Friday, December 13, 2024

ETL Testing

ETL - Extract, Transform and Load
Data Warehouse - A storage area used for analytical purposes. A data warehouse combines data from multiple sources and generally includes historical data. Its key characteristics:
1. Non-Volatile (data is not modified once loaded)
2. Subject Oriented
3. Integrated Data
4. Time-Variant

Extract - Raw data is copied or exported from the source systems into a staging area during data extraction.

Transform - In this stage, the raw data is processed and consolidated for its intended analytical use case.

Load - The converted data is moved from the staging area into the target data warehouse in this final stage.

An ETL pipeline (or data pipeline) is the system that allows ETL activities to take place. Data pipelines are a collection of tools and actions used to transfer data from one system, with its own manner of data storage and processing, to another system where it might be kept and managed differently.
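A minimal sketch of the three stages in Python, assuming a CSV source file, the column names shown, and a SQLite file standing in for the warehouse (all of these names are illustrative):

```python
import csv
import sqlite3

# --- Extract: copy raw rows from the source into a staging structure ---
# (sales_source.csv and its columns are assumed for illustration)
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# --- Transform: clean and consolidate for the analytical use case ---
def transform(rows):
    out = []
    for row in rows:
        out.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": round(float(row["amount"]), 2),
        })
    return out

# --- Load: move the converted data into the target warehouse table ---
def load(rows, db_path="warehouse.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS fact_sales (order_id INTEGER, customer TEXT, amount REAL)")
    con.executemany("INSERT INTO fact_sales VALUES (:order_id, :customer, :amount)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales_source.csv")))
```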

OLAP - Online Analytical Processing - is an approach to answering multi-dimensional analytical queries. It operates on a data warehouse.

OLTP - Online Transaction Processing - facilitates and manages transaction-oriented applications. It operates on a database.
CRUD - Create, Read, Update and Delete operations are performed on the database.
OLTP Architecture - Database Server, Application Server, User Interface
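A quick sketch of the CRUD operations an OLTP application performs on its database, using SQLite; the customers table and its columns are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create
con.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Asha", "Chennai"))

# Read
print(con.execute("SELECT id, name, city FROM customers").fetchall())

# Update
con.execute("UPDATE customers SET city = ? WHERE name = ?", ("Bengaluru", "Asha"))

# Delete
con.execute("DELETE FROM customers WHERE name = ?", ("Asha",))
con.commit()
```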

OLTP (Online Transactional Processing)               | OLAP (Online Analytical Processing)
Current data                                          | Historical data
Day-to-day transactional operations                   | Data analysis and decision making
Normalised data structures                            | Star schema or snowflake schema
Simple queries                                        | Complex queries
Requires fast response                                | Can have longer response times
Used by front-line employees and managers             | Used by analysts, executives and decision makers
Data is processed in real time as transactions occur  | Data is periodically refreshed from OLTP systems
Used for running the business                         | Used for analysing the business
Examples: Oracle, MySQL, SQL Server, DB2              | Examples: Tableau, Power BI and SAP
Provides a detailed and flat relational view of data  | Provides summarized and consolidated data
Example: Database                                     | Example: Data warehouse
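To illustrate the simple-versus-complex-queries row, a sketch of both query styles in SQLite; the orders table and its data are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT, amount REAL, order_date TEXT);
INSERT INTO orders VALUES
  (1, 'Asha', 'South', 120.0, '2024-11-02'),
  (2, 'Ravi', 'North', 80.0,  '2024-11-15'),
  (3, 'Asha', 'South', 200.0, '2024-12-01');
""")

# OLTP-style simple query: look up a single current record
print(con.execute("SELECT * FROM orders WHERE order_id = ?", (2,)).fetchall())

# OLAP-style complex query: aggregate history across dimensions
print(con.execute("""
    SELECT region, strftime('%Y-%m', order_date) AS month,
           COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region, month
    ORDER BY region, month
""").fetchall())
```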


In Database Testing, the focus is on tables, relationships, columns and datatypes.
In ETL Testing, the focus is on the data itself.
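A minimal sketch of the kind of data-focused checks ETL Testing performs, comparing row counts and a measure total between source and target; the database files and table names are assumptions:

```python
import sqlite3

# Compare row counts and a measure total between source and target
# (database file names, table names and columns are assumed for illustration).
def validate_load(source_db="source.db", target_db="warehouse.db"):
    src = sqlite3.connect(source_db)
    tgt = sqlite3.connect(target_db)

    src_count = src.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    tgt_count = tgt.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    assert src_count == tgt_count, f"Row count mismatch: {src_count} vs {tgt_count}"

    src_sum = src.execute("SELECT ROUND(SUM(amount), 2) FROM orders").fetchone()[0]
    tgt_sum = tgt.execute("SELECT ROUND(SUM(amount), 2) FROM fact_sales").fetchone()[0]
    assert src_sum == tgt_sum, f"Amount total mismatch: {src_sum} vs {tgt_sum}"
```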

Data Model
1. Conceptual Data Model
  • Identifies high-level relationships between entities.
  • Attributes are not defined. Keys are not defined.
2. Logical Data Model
  • Identifies entities
  • All attributes are defined
  • Keys are defined and normalisation is performed
  • Relationships are defined
3. Physical Data Model
  • Entities, attributes and datatypes are defined (see the sketch after the comparison table below)


    Feature              | Conceptual | Logical | Physical
    Entity Names         | ✓          | ✓       |
    Entity Relationships | ✓          | ✓       |
    Attributes           |            | ✓       |
    Primary Keys         |            | ✓       | ✓
    Foreign Keys         |            | ✓       | ✓
    Table Names          |            |         | ✓
    Column Names         |            |         | ✓
    Column Data Types    |            |         | ✓
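A sketch of what the physical data model looks like in practice, where table names, column names, data types and keys are all defined; the tables themselves are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Physical model: table names, column names, data types, primary and foreign keys
con.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    city          TEXT
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,
    amount      REAL NOT NULL
);
""")
```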

    Types of Schema - Schema is a collection of entities and their relationships

    Star Schema
    A star schema is a type of data modeling technique used in data warehousing to represent data in a structured way. It contains a fact table in the middle and dimension tables around it.
    The fact table in a star schema contains the measures or metrics and holds foreign-key relationships to the dimension tables; its primary key is usually a composite of those foreign keys.
    The dimension tables in a star schema contain the descriptive attributes that give context to the measures, for example product, customer or date.
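A minimal sketch of a star schema in SQLite, with a fact table in the middle and two dimension tables around it (table and column names are assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables hold the descriptive attributes
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);

-- Fact table holds the measures plus foreign keys to the dimensions;
-- its primary key is a composite of those foreign keys
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL,
    PRIMARY KEY (product_id, date_id)
);
""")
```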
    Snowflake Schema
    A snowflake schema is a variant of the star schema in which the dimension tables are further normalised into multiple related tables, forming a snowflake shape around the fact table.


    Slowly Changing Dimensions (SCD) - how changes to dimension data are handled over time (see the sketch after this list)
  • Type 1: The old entry is overwritten with the new entry; no history is kept.
  • Type 2: A new entry is inserted as a new row in the table, so history is preserved.
  • Type 3: The existing entry is modified to hold the new value in an additional column, keeping limited history.
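A sketch of how Type 1 and Type 2 handling differ when a customer's city changes; the dim_customer table and its columns are assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE dim_customer (
    customer_id INTEGER, name TEXT, city TEXT,
    effective_from TEXT, current_flag INTEGER)""")
con.execute("INSERT INTO dim_customer VALUES (101, 'Asha', 'Chennai', '2023-01-01', 1)")

# --- SCD Type 1: overwrite the old value, no history kept ---
con.execute("UPDATE dim_customer SET city = 'Bengaluru' WHERE customer_id = 101")

# --- SCD Type 2: expire the current row and insert a new row, keeping history ---
con.execute("""UPDATE dim_customer SET current_flag = 0
               WHERE customer_id = 101 AND current_flag = 1""")
con.execute("""INSERT INTO dim_customer VALUES
               (101, 'Asha', 'Mumbai', '2024-12-13', 1)""")

for row in con.execute("SELECT * FROM dim_customer"):
    print(row)
```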


  • ETL Tools
  • Informatica PowerCenter
  • Query Search
  • Microsoft SQL Server Integration Services (SSIS)
  • BI (Business Intelligence) Tools - Used for Reporting, and the Reports are used for Analysis
  • Business Objects
  • Cognos
  • Tableau
  • QlikView


  • Types of Data Load in ETL Testing
  • Full Load - The data loading process done when we load the target for the first time. It can also be referred to as Bulk Load or Fresh Load.
  • Incremental Load - After the initial full load, only the modified data is loaded into the target. The changes are captured by comparing the created or modified date against the last run date of the job (see the sketch below).
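A minimal sketch of capturing changes for an incremental load by comparing the modified date against the job's last run date; the databases, table names, columns and last-run value are assumptions:

```python
import sqlite3

source = sqlite3.connect("source.db")
target = sqlite3.connect("warehouse.db")

# Last successful run of the job (normally read from a control/audit table)
last_run_date = "2024-12-12"

# Capture only the rows created or modified since the last run
changed_rows = source.execute(
    "SELECT order_id, customer, amount, modified_date FROM orders WHERE modified_date > ?",
    (last_run_date,),
).fetchall()

# Load only the changed rows into the target
target.executemany(
    "INSERT OR REPLACE INTO fact_sales (order_id, customer, amount, modified_date) VALUES (?, ?, ?, ?)",
    changed_rows,
)
target.commit()
```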
  • Data Cleansing (Data Scrubbing) - The process of removing irrelevant or redundant data and correcting incorrect and incomplete data (see the sketch after this list)
  • Irrelevant - Deleting data which is not required for the business or not needed any more.
  • Redundant - Deleting the duplicate data.
  • Incorrect - Updating incorrect values with the correct value.
  • Incomplete - Updating incomplete values with suitable default or derived values.
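A sketch of these cleansing steps in SQLite; the staging table, its columns and the chosen defaults are assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect("staging.db")

# Irrelevant: delete rows not required for the business
con.execute("DELETE FROM stg_orders WHERE status = 'CANCELLED'")

# Redundant: delete duplicate rows, keeping the first occurrence
con.execute("""DELETE FROM stg_orders
               WHERE rowid NOT IN (SELECT MIN(rowid) FROM stg_orders GROUP BY order_id)""")

# Incorrect: update incorrect values with the correct value
con.execute("UPDATE stg_orders SET country = 'IN' WHERE country = 'INDIA'")

# Incomplete: fill missing values with a default
con.execute("UPDATE stg_orders SET region = 'UNKNOWN' WHERE region IS NULL")

con.commit()
```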


  • Masking - Hiding/Encrypting Sensitive Data is called Masking
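A sketch of masking sensitive values; the specific rules shown (keeping the last four digits of a card number, hashing an email) are assumptions for illustration:

```python
import hashlib

def mask_card_number(card_number):
    # Keep only the last four digits visible
    return "*" * (len(card_number) - 4) + card_number[-4:]

def mask_email(email):
    # Replace the email with an irreversible hash
    return hashlib.sha256(email.encode()).hexdigest()

print(mask_card_number("4111111111111111"))  # ************1111
print(mask_email("asha@example.com"))
```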
