Friday, December 13, 2024

ETL Testing

ETL - Extract, Transform and Load
Data Warehouse - A storage area used for analytical purposes. A data warehouse combines data from multiple sources and generally includes historical data. Its key characteristics:
1. Non-Volatile (data is not modified once loaded)
2. Subject Oriented
3. Integrated Data
4. Time-Variant

Extract - Raw data is copied or exported from the source systems into a staging area during data extraction.

Transform - In this stage, the raw data is processed and consolidated for its intended analytical use case.

Load - The converted data is moved from the staging area into the target data warehouse in this final stage.

An ETL pipeline (or data pipeline) is the system that allows ETL activities to take place. Data pipelines are a collection of tools and actions used to transfer data from one system, with its own manner of data storage and processing, to another system where it might be kept and managed differently.
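A minimal sketch of the three stages in Python, assuming a CSV source file, the column names shown, and a SQLite file standing in for the warehouse (all of these names are illustrative):

```python
import csv
import sqlite3

# --- Extract: copy raw rows from the source into a staging structure ---
# (sales_source.csv and its columns are assumed for illustration)
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# --- Transform: clean and consolidate for the analytical use case ---
def transform(rows):
    out = []
    for row in rows:
        out.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().title(),
            "amount": round(float(row["amount"]), 2),
        })
    return out

# --- Load: move the converted data into the target warehouse table ---
def load(rows, db_path="warehouse.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS fact_sales (order_id INTEGER, customer TEXT, amount REAL)")
    con.executemany("INSERT INTO fact_sales VALUES (:order_id, :customer, :amount)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("sales_source.csv")))
```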

OLAP - Online Analytical Processing - is an approach to answering multi-dimensional analytical queries. It operates on a data warehouse.

OLTP - Online Transaction Processing - facilitates and manages transaction-oriented applications. It operates on a database.
CRUD - Create, Read, Update and Delete operations are performed on the database.
OLTP Architecture - Database Server, Application Server, User Interface
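A quick sketch of the CRUD operations an OLTP application performs on its database, using SQLite; the customers table and its columns are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create
con.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Asha", "Chennai"))

# Read
print(con.execute("SELECT id, name, city FROM customers").fetchall())

# Update
con.execute("UPDATE customers SET city = ? WHERE name = ?", ("Bengaluru", "Asha"))

# Delete
con.execute("DELETE FROM customers WHERE name = ?", ("Asha",))
con.commit()
```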

OLTP (Online Transactional Processing)               | OLAP (Online Analytical Processing)
Current data                                          | Historical data
Day-to-day transactional operations                   | Data analysis and decision making
Normalised data structures                            | Star schema or snowflake schema
Simple queries                                        | Complex queries
Requires fast response                                | Can have longer response times
Used by front-line employees and managers             | Used by analysts, executives and decision makers
Data is processed in real time as transactions occur  | Data is periodically refreshed from OLTP systems
Used for running the business                         | Used for analysing the business
Examples: Oracle, MySQL, SQL Server, DB2              | Examples: Tableau, Power BI and SAP
Provides a detailed and flat relational view of data  | Provides summarized and consolidated data
Example: Database                                     | Example: Data warehouse
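To illustrate the simple-versus-complex-queries row, a sketch of both query styles in SQLite; the orders table and its data are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT, amount REAL, order_date TEXT);
INSERT INTO orders VALUES
  (1, 'Asha', 'South', 120.0, '2024-11-02'),
  (2, 'Ravi', 'North', 80.0,  '2024-11-15'),
  (3, 'Asha', 'South', 200.0, '2024-12-01');
""")

# OLTP-style simple query: look up a single current record
print(con.execute("SELECT * FROM orders WHERE order_id = ?", (2,)).fetchall())

# OLAP-style complex query: aggregate history across dimensions
print(con.execute("""
    SELECT region, strftime('%Y-%m', order_date) AS month,
           COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region, month
    ORDER BY region, month
""").fetchall())
```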


In Database Testing, the focus is on tables, relationships, columns and datatypes.
In ETL Testing, the focus is on the data itself.
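A minimal sketch of the kind of data-focused checks ETL Testing performs, comparing row counts and a measure total between source and target; the database files and table names are assumptions:

```python
import sqlite3

# Compare row counts and a measure total between source and target
# (database file names, table names and columns are assumed for illustration).
def validate_load(source_db="source.db", target_db="warehouse.db"):
    src = sqlite3.connect(source_db)
    tgt = sqlite3.connect(target_db)

    src_count = src.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    tgt_count = tgt.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    assert src_count == tgt_count, f"Row count mismatch: {src_count} vs {tgt_count}"

    src_sum = src.execute("SELECT ROUND(SUM(amount), 2) FROM orders").fetchone()[0]
    tgt_sum = tgt.execute("SELECT ROUND(SUM(amount), 2) FROM fact_sales").fetchone()[0]
    assert src_sum == tgt_sum, f"Amount total mismatch: {src_sum} vs {tgt_sum}"
```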

Data Model
1. Conceptual Data Model
  • Identifies high-level relationships between entities.
  • Attributes are not defined. Keys are not defined.
2. Logical Data Model
  • Identifies entities
  • All attributes are defined
  • Keys are defined and normalisation is performed
  • Relationships are defined
3. Physical Data Model
  • Entities, attributes and datatypes are defined (see the sketch after the comparison table below)


    Feature              | Conceptual | Logical | Physical
    Entity Names         | ✓          | ✓       |
    Entity Relationships | ✓          | ✓       |
    Attributes           |            | ✓       |
    Primary Keys         |            | ✓       | ✓
    Foreign Keys         |            | ✓       | ✓
    Table Names          |            |         | ✓
    Column Names         |            |         | ✓
    Column Data Types    |            |         | ✓
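A sketch of what the physical data model looks like in practice, where table names, column names, data types and keys are all defined; the tables themselves are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Physical model: table names, column names, data types, primary and foreign keys
con.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    city          TEXT
);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL,
    amount      REAL NOT NULL
);
""")
```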

    Types of Schema - Schema is a collection of entities and their relationships

    Star Schema
    A star schema is a type of data modeling technique used in data warehousing to represent data in a structured way. It contains a fact table in the middle and dimension tables around it.
    The fact table in a star schema contains the measures or metrics and holds foreign-key relationships to the dimension tables; its primary key is usually a composite of those foreign keys.
    The dimension tables in a star schema contain the descriptive attributes that give context to the measures, for example product, customer or date.
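A minimal sketch of a star schema in SQLite, with a fact table in the middle and two dimension tables around it (table and column names are assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables hold the descriptive attributes
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);

-- Fact table holds the measures plus foreign keys to the dimensions;
-- its primary key is a composite of those foreign keys
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL,
    PRIMARY KEY (product_id, date_id)
);
""")
```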
    Snowflake Schema
    A snowflake schema is a variant of the star schema in which the dimension tables are further normalised into multiple related tables, forming a snowflake shape around the fact table.


    Slowly Changing Dimensions (SCD) - how changes to dimension data are handled over time (see the sketch after this list)
  • Type 1: The old entry is overwritten with the new entry; no history is kept.
  • Type 2: A new entry is inserted as a new row in the table, so history is preserved.
  • Type 3: The existing entry is modified to hold the new value in an additional column, keeping limited history.
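A sketch of how Type 1 and Type 2 handling differ when a customer's city changes; the dim_customer table and its columns are assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE dim_customer (
    customer_id INTEGER, name TEXT, city TEXT,
    effective_from TEXT, current_flag INTEGER)""")
con.execute("INSERT INTO dim_customer VALUES (101, 'Asha', 'Chennai', '2023-01-01', 1)")

# --- SCD Type 1: overwrite the old value, no history kept ---
con.execute("UPDATE dim_customer SET city = 'Bengaluru' WHERE customer_id = 101")

# --- SCD Type 2: expire the current row and insert a new row, keeping history ---
con.execute("""UPDATE dim_customer SET current_flag = 0
               WHERE customer_id = 101 AND current_flag = 1""")
con.execute("""INSERT INTO dim_customer VALUES
               (101, 'Asha', 'Mumbai', '2024-12-13', 1)""")

for row in con.execute("SELECT * FROM dim_customer"):
    print(row)
```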


  • ETL Tools
  • Informatica PowerCenter
  • Query Search
  • Microsoft SQL Server Integration Services (SSIS)
  • BI (Business Intelligence) Tools - Used for Reporting, and the Reports are used for Analysis
  • Business Objects
  • Cognos
  • Tableau
  • QlikView


  • Types of Data Load in ETL Testing
  • Full Load - The data loading process done when we load the target for the first time. It can also be referred to as Bulk Load or Fresh Load.
  • Incremental Load - After the initial full load, only the modified data is loaded into the target. The changes are captured by comparing the created or modified date against the last run date of the job (see the sketch below).
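A minimal sketch of capturing changes for an incremental load by comparing the modified date against the job's last run date; the databases, table names, columns and last-run value are assumptions:

```python
import sqlite3

source = sqlite3.connect("source.db")
target = sqlite3.connect("warehouse.db")

# Last successful run of the job (normally read from a control/audit table)
last_run_date = "2024-12-12"

# Capture only the rows created or modified since the last run
changed_rows = source.execute(
    "SELECT order_id, customer, amount, modified_date FROM orders WHERE modified_date > ?",
    (last_run_date,),
).fetchall()

# Load only the changed rows into the target
target.executemany(
    "INSERT OR REPLACE INTO fact_sales (order_id, customer, amount, modified_date) VALUES (?, ?, ?, ?)",
    changed_rows,
)
target.commit()
```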
  • Data Cleansing (Data Scrubbing) - The process of removing irrelevant or redundant data and correcting incorrect and incomplete data (see the sketch after this list)
  • Irrelevant - Deleting data which is not required for the business or not needed any more.
  • Redundant - Deleting the duplicate data.
  • Incorrect - Updating incorrect values with the correct value.
  • Incomplete - Updating incomplete values with suitable default or derived values.
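A sketch of these cleansing steps in SQLite; the staging table, its columns and the chosen defaults are assumptions for illustration:

```python
import sqlite3

con = sqlite3.connect("staging.db")

# Irrelevant: delete rows not required for the business
con.execute("DELETE FROM stg_orders WHERE status = 'CANCELLED'")

# Redundant: delete duplicate rows, keeping the first occurrence
con.execute("""DELETE FROM stg_orders
               WHERE rowid NOT IN (SELECT MIN(rowid) FROM stg_orders GROUP BY order_id)""")

# Incorrect: update incorrect values with the correct value
con.execute("UPDATE stg_orders SET country = 'IN' WHERE country = 'INDIA'")

# Incomplete: fill missing values with a default
con.execute("UPDATE stg_orders SET region = 'UNKNOWN' WHERE region IS NULL")

con.commit()
```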


  • Masking - Hiding/Encrypting Sensitive Data is called Masking
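A sketch of masking sensitive values; the specific rules shown (keeping the last four digits of a card number, hashing an email) are assumptions for illustration:

```python
import hashlib

def mask_card_number(card_number):
    # Keep only the last four digits visible
    return "*" * (len(card_number) - 4) + card_number[-4:]

def mask_email(email):
    # Replace the email with an irreversible hash
    return hashlib.sha256(email.encode()).hexdigest()

print(mask_card_number("4111111111111111"))  # ************1111
print(mask_email("asha@example.com"))
```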
