Course Overview
One of the most valuable data-related technologies in the world is ETL. For students, freshers, software workers, and IT professionals who aspire to lead in the IT sector, Digital Hub Tech offers in-depth database training from highly experienced, real-time professionals, as well as a number of prestigious Oracle courses in Hyderabad.
Pre-Requisite
- Basics of Computers, Windows Operating System.
- Basic Database knowledge might be very helpful.
Course Objective
Improve your grasp of ETL analysis, design, and testing, and deepen your understanding of databases, data warehouses, and the ETL processes used in real environments. These skills also provide a foundation for the next level of the skill set, such as Big Data, Spark, Data Analytics, Data Science, and Data Engineering.
Eligibility/Qualification
Highest Degree: B.E/ B.Tech/ MCA/ MBA/ M.Sc/ MS/ ME/ M.Tech/ BCA/ B.Sc/ B.Com/ BA
Experienced Candidates: interested in building a career in the same or a related IT field
Professional Role: Developer, Tester, Production Support
College Students & Freshers: any stream of graduates and post-graduates
Career Opportunities
ETL Testing / Data Warehouse Testing has gained much prominence in recent years. Currently, around 15,000 ETL Testing job positions are open for this exciting career across India.
ABOUT ETL AND DATA WAREHOUSE
A data warehouse is a database in which historical data collected from different sources is stored in tables for analysis purposes. Any database can serve as the warehouse (e.g., Teradata, MySQL, SQL Server). A data warehouse is constructed by integrating data from multiple heterogeneous sources. It is designed for query and analysis rather than for transaction processing, and it supports decision making and analytical processing.
ETL comes from the concept of the data warehouse. ETL stands for Extract, Transform, Load: a process in which data is extracted from the sources, transformed, and loaded into the destination. Data can be extracted from an OLTP database and transformed for the target data warehouse area; data is also often extracted from non-OLTP systems such as text files and spreadsheets. ETL testing usually refers to the process of validating, verifying, and qualifying data while preventing duplicate records and data loss.
Example
Suppose we have a CUSTOMER table in the OLTP system and a DIM_CUSTOMER table in the data warehouse.
The developer writes an ETL program which moves the data from the CUSTOMER table to the DIM_CUSTOMER table.
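As a minimal sketch of such a program, the following uses two in-memory SQLite databases to stand in for the OLTP source and the warehouse target. The table contents and the upper-casing rule are hypothetical, chosen only to show the extract, transform, and load steps:

```python
import sqlite3

# Hypothetical OLTP source with a CUSTOMER table.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE CUSTOMER (cust_id INTEGER, name TEXT, country TEXT)")
oltp.executemany("INSERT INTO CUSTOMER VALUES (?, ?, ?)",
                 [(1, "alice", "IN"), (2, "bob", "US")])

# Hypothetical data warehouse with a DIM_CUSTOMER table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE DIM_CUSTOMER (cust_key INTEGER, name TEXT, country TEXT)")

# Extract from the source, transform (upper-case the names), load into the target.
rows = oltp.execute("SELECT cust_id, name, country FROM CUSTOMER").fetchall()
transformed = [(cid, name.upper(), country) for cid, name, country in rows]
dw.executemany("INSERT INTO DIM_CUSTOMER VALUES (?, ?, ?)", transformed)

print(dw.execute("SELECT COUNT(*) FROM DIM_CUSTOMER").fetchone()[0])  # 2
```

Real ETL tools generate equivalent logic from a graphical mapping, but the three steps remain the same.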
Extract
- Extract the relevant data.
- Extraction can be of two types: homogeneous and heterogeneous.
- Homogeneous: the data is extracted from sources of the same type.
- Heterogeneous: the data is extracted from sources of different types.
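A heterogeneous extract might combine a relational table with a flat text file, as mentioned above. The sketch below uses a SQLite table and an in-memory CSV string as stand-ins (the `orders` table and its sample rows are hypothetical):

```python
import csv
import io
import sqlite3

# Source 1: a relational table (hypothetical orders data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
db.execute("INSERT INTO orders VALUES (101, 250.0)")

# Source 2: a flat text file, simulated with an in-memory CSV.
csv_source = io.StringIO("order_id,amount\n102,99.5\n")

# Heterogeneous extract: pull from both source types into one row set.
rows = db.execute("SELECT order_id, amount FROM orders").fetchall()
rows += [(int(r["order_id"]), float(r["amount"]))
         for r in csv.DictReader(csv_source)]
print(rows)  # [(101, 250.0), (102, 99.5)]
```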
Transformation
- A transformation is a set of rules that generates, modifies, or passes data.
- Data is transferred to the data warehouse using rules and keys, where a key is one or more data attributes that uniquely identify an entity. There are different types of keys, such as primary key, foreign key, alternate key, surrogate key, and composite key.
- Before the data is transferred from the source, it is cleaned. Cleaning removes bad records and identifies and fixes errors; this improves data quality and resolves problems originating in the source system.
- Transformation can be of 2 types: Active and Passive.
- Active: in an active transformation, the number of rows produced as output can change after the transformation occurs.
- Passive: in a passive transformation, the number of rows produced as output cannot change after the transformation occurs.
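The distinction is easy to see in code. Below, a filter is an active transformation (it can drop rows), while a row-wise calculation is passive (every input row yields exactly one output row). The sample rows and the conversion rate are hypothetical:

```python
# Three input rows (hypothetical sales records).
rows = [{"id": 1, "amount": 100}, {"id": 2, "amount": -5}, {"id": 3, "amount": 40}]

# Active transformation: filtering may change the row count (3 in, 2 out).
active = [r for r in rows if r["amount"] > 0]

# Passive transformation: a derived column keeps the row count (3 in, 3 out).
passive = [{**r, "amount_usd": r["amount"] / 83} for r in rows]

print(len(active), len(passive))  # 2 3
```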
Load
- Load the data into the data warehouse.
- Loads are of two types: incremental load and full load.
- Incremental/Delta Load: the incremental load is also called a delta load; only new records are appended. For example, if there are 10,000 records in the current build and the new load brings 3,000 records, after loading there will be 13,000 records, ignoring duplicate records.
- Full Load/Initial Load: all data is loaded freshly, and the next time new data comes, the old data is replaced.
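The two load types above can be sketched with sets, which ignore duplicates automatically. The record IDs are hypothetical; the incoming batch deliberately overlaps the existing data so that only the net-new records are added, reproducing the 10,000 + 3,000 = 13,000 arithmetic:

```python
# Current build: 10,000 existing records.
existing = {f"rec{i}" for i in range(10_000)}

# Incoming batch: 4,000 records, of which 1,000 duplicate existing ones.
delta = {f"rec{i}" for i in range(9_000, 13_000)}

# Incremental/delta load: append, ignoring duplicates -> 13,000 records.
incremental = existing | delta

# Full/initial load: old data is replaced entirely -> only the new batch.
full = set(delta)

print(len(incremental), len(full))  # 13000 4000
```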
ETL testing is performed to ensure that the data has been loaded from the source to the destination properly. It also involves verifying the data at the various stages between source and destination. ETL testing is quite different from application testing because it requires a data-centric testing approach, and that approach brings its own challenges. Below are a few challenges we usually face:
- ETL testing involves comparing large volumes of data, typically millions of records.
- The data needs to be tested across heterogeneous data sources.
- Data is often transformed, so verification may need complex queries.
- It is largely dependent on the availability of test data covering different test scenarios.
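Because row-by-row inspection does not scale to millions of records, a common data-centric check compares row counts and an order-independent content checksum between source and target. The helper below is a hypothetical sketch of that idea:

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum over all rows (hypothetical helper):
    hash each row, sort the digests, then hash the concatenation."""
    digests = sorted(hashlib.md5(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.md5("".join(digests).encode()).hexdigest()

# Hypothetical source and target extracts: same data, different row order.
source = [(1, "alice"), (2, "bob")]
target = [(2, "bob"), (1, "alice")]

print(len(source) == len(target))                        # counts match
print(table_checksum(source) == table_checksum(target))  # contents match
```

In practice the counts and checksums would be computed by SQL on each side, but the comparison logic is the same.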
A typical ETL-tool-based data warehouse uses staging, data integration, and access layers to perform its functions. It is normally a three-layer architecture.
- Staging Layer -> the staging layer, or staging database, is used to store the data extracted from the different source data systems.
- Data Integration Layer -> the integration layer transforms the data from the staging layer and moves it to a database where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of fact and dimension tables in a DW system is called a schema.
- Access Layer -> the access layer is used by end users to retrieve the data for analytical reporting and information.
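The three layers above can be sketched end to end with plain Python structures standing in for the staging, integration, and access databases (all data and names are hypothetical):

```python
# Staging layer: raw extracts stored as-is.
staging = [{"cust": "alice", "amount": 100},
           {"cust": "bob", "amount": 50},
           {"cust": "alice", "amount": 25}]

# Integration layer: arrange the data into a dimension and a fact table.
dim_customer = {name: key
                for key, name in enumerate(sorted({r["cust"] for r in staging}))}
fact_sales = [(dim_customer[r["cust"]], r["amount"]) for r in staging]

# Access layer: an end-user query, e.g. total sales per customer key.
totals = {}
for key, amount in fact_sales:
    totals[key] = totals.get(key, 0) + amount
print(totals)  # {0: 125, 1: 50}
```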
Example
Let us assume there is a manufacturing company with multiple departments such as Sales, HR, and Material Management. All these departments have separate databases which they use to maintain information related to their work, and each database has a different technology, landscape, table names, columns, etc. Now, if the company wants to analyze historical data and generate reports, all the data from these data sources should be extracted and loaded into a data warehouse for analytical work. An ETL tool extracts the data from all these heterogeneous data sources, transforms the data (applying calculations, joining fields and keys, removing incorrect data fields, etc.), and loads it into the data warehouse. Later, you can use various Business Intelligence (BI) tools to generate meaningful reports, dashboards, and visualizations using this data.