ETL - Extract Transform and Load
Dataware House - Storage Area used for Analytical Purpose is stored in
Dataware House. Dataware house combines data from multiple sources. Generally,
it includes historical Data.
1. Non-Volatile Data (Non-Modified).
2. Subject Oriented
3. Integrated Data
4. Time-Variant
Extract - Raw data is copied or exported from source locations to a
staging area during data extraction data mangement.
Transform - In this stage, raw data undergoes data processing. In this
step, the data is processed and consolidated for its intended analytical use
case.
Load - The converted data is moved from the staging area into the target
data warehouse in this final stage.
An ETL pipeline (or data pipeline) is the system that allows ETL activities to
take place. Data pipelines are a collection of tools and actions used to
transfer data from one system with ites own manner of data storage and
processing to another system where it might be kept and managed differently.
OLAP - Online Analytical Processing- is an approach to answer
multi-dimensional analytical queries. Operates on Data Warehouse.
OLTP - Online Transaction Processing - facilitates and manages
transaction oriented applications. Operates on Database.
CRUD - Create, Read, Update and Delete Operations are performed on the
Database
OLTP Architecture - Database Server, Application Server, User Interface
OLTP (Online Transactional Processing) |
OLAP (Online Analytical Processing) |
Current data |
Historical data |
Day-to-day Transactional operations |
Data analysis and decision making |
Normalised data structures |
Star schema or Snow flake Schema |
Simple Queries |
Complex Queries |
Requires Fast Response |
Can have Longer Response Times |
Used by Front Line Employees, Managers |
Analyst, Executives and Decision Makers |
Data in OLTP is processed |
Data in OLAP is processed is periodically refreshed |
Used for Analysing the Business |
Used for Running the Business |
Oracle, MySQL, SQL Server, DB2 |
Tableau, Power BI and SAP |
Provides Summarized and Consolidated Data |
Provides Detailed and flat relation view of Data |
Example: Database |
Example: Data warehouse |
In
Database Testing, we will focus on tables, relationship, columns and
datatypes.
In
ETL Testing, we will focus on data.
Data Model
1.
Conceptual Data Model Identifies high level relationship between entities. Attributes are not defined. Keys are not defined
2.
Logical Data Model
Identifies Entities
All Attributes are defined
Keys and Perform Normalisation
Relationships are defined
3. Physical Data Model - Entities, Attributes and Datatypes are defined
Feature | Conceptual | Logical | Physical |
Entity Name | ✓ | ✓ | |
Entity Relationships | ✓ | ✓ | |
Attributes | | ✓ | |
Primary Key | | ✓ | ✓ |
Foreign Keys | | ✓ | ✓ |
Table Names | | | ✓ |
Column Names | | | ✓ |
Column Data Types | | | ✓ |
Types of Schema - Schema is a collection of entities and their relationships
Star Schema
A star schema is a type of data modeling technique used in data warehousing to represent data in a structured way. It cotains fact table in the middle and dimensional tables around it
Fact Table in a star schema contains the measures or metric. Fact Table contains relationship with other tables. Every column in the fact table is the primary key
The dimension table in a star schema contain the descriptive
Snowflake Schema
Slowly Changing Dimensions (SCD)
Type 1: Replaces Old Entry with New Entry
Type 2: Inserts a New Entry in the Table
Type 3: Original Entry gets modified to Reflect the New Entry
ETL Tools
Informatical Power Center
Query Search
Microservice MS SQL Server SSIS
(Business Intelligence) BI Tools - Used for Reporting and Reports used for Analysis
Business Objects
Cognos
Tableau
Quick View
Types of Data Load in ETL Testing
Full Load - This is the data loading process, when we do it for the first time. It can be referred as Bulk Load or Fresh Load
Incremental Load- The modified data alone will be updated in target followed by full load. The changes will be captured by comparing created or modified date against last run date of the job.
Data Cleansing (Data Scrubbing) - Process of removing irrelevant or redundant data and correcting incorrect and incomplete data
Irrelevant - Deleting Data which is not required for business or not needed any more.
Redundant - Deleting the Duplicate Data
InCorrect - Updating Incorrect values with Correct value
InComplete - Updating Incomplete values with
Masking - Hiding/Encrypting Sensitive Data is called Masking
API - Application Programming Interface.
They are basically collection of procedures and functions which allow us to communicate between two applications or libraries.
REST API - It is REpresentational State Transfer API.
XMATCH
XMATCH function searches for a specified item in an array or range of cells, and then returns the item's relative position.
The XMATCH function returns the relative position of an item in an array or range of cells.
=XMATCH(lookup_value, lookup_array, [match_mode], [search_mode])
PROPER
PROPER function is a Text function, and its purpose is to capitalize the first letter in each word of a text value that will remain in lowercase or convert to lowercase.
XLOOKUP function: Use =XLOOKUP("last", range, return_array) to get the value of the last non-empty cell1.
LOOKUP function: Use =LOOKUP("last", range, return_array) to find the last non-blank cell in a row2.
COUNTA and OFFSET functions: Use =COUNTA(range) to count the number of non-empty cells, and =OFFSET(A1, COUNTA(A:A)-1, 0) to get the last non-blank cell34.
ADDRESS and LOOKUP functions: Use =ADDRESS(LOOKUP(2, 1/(M:M<>""), ROW(M:M)), 13) to return the address of the last non-blank cell in column M
Basic Functions & Operations | SUM & COUNT |
A1+B1 | Addition | SUM(A1: A10) | Sum of a range |
A1-B1 | Subtraction | AVERAGE(A1:A10) | Average of a range |
A1*B1 | Multiplication | COUNT(A1:A10) | Count of numeric entries |
A1/B1 | Division | COUNTIF(A1:A10, ">10") | Count cells that meet a condition |
A1*2 | Exponentation | COUNTIFS | Counts the number of cells in a range where one or more conditionsa are met |
MOD(A1, B1) | Remainder of division | SUMIF(A1: A10, ">10", B1: B10) | Sum based on a condition |
Logical Functions | MIN(A1:A10) | Calculates the smallest number in a range |
IF(A1> 10, "Yes", "No") | Conditional Statement | MEDIAN | Finds the median value in a group of values |
AND(A1> 0, B1 < 5) | Logical AND | MAX | Calculates the highest number a range |
OR(A1 > 10, B1 < 5) | Logical OR | Text Functions |
NOT(A1 = 10) | Logical NOT | CONCATENATE(A1, "", B1) | Join text strings |
IFERROR(A1/B1, "Error") | Handle errors gracefully | TEXT(A1, "0.00") | Format number as text |
Date & Time | LEFT(A1, 3) | Extract Left Most characters |
Today() | Current date | RIGHT(A1, 3) | Extract Right Most characters |
NOW() | Currnet date and time | MID(A1, 2, 3) | Extract chracters from the middle |
Lookup Functions | LEN(A1) | Length of a Text String |
VLOOKUP | Vertifically searches for a value in a table | FIND("text", A1) | Find position of text |
INDEX | A lookup function that searches a table both vertically and horizontally | SUBSTITUTE(A1, "old", "new") | Replace text in a string |
XLOOKUP | An updated version of VLOOKUP available in newer Excel versions | UPPER | Converts a string of text to all uppercase |
HLOOKUP | Horizontally searches for a value in a table | LOWER | Converts a string of text to all lowercase |
MATCH | Returns the location of a value in a sequence | VALUE | Converts a text to a value |
OFFSET | Moves the cell reference by the number of given rows and/or columns
|
Java Annotations:
@Entity
@Author
@Override - annotation informs the compiler that the element is meant to override an element declared in a superclass.
@SuppressWarnings - annotation tells the compiler to suppress specific warnings that it would otherwise generate.
@Deprecated - annotation indicates that the marked element is deprecated and should no longer be used.
Lambda Expressions: Lambda can be used with certain type of interfaces.
parameter -> expression
(parameter1, parameter2) -> expression
(parameter1, parameter2) -> {expression}
Any interface with a SAM(Single Abstract Method) is a functional interface, and its implementation may be treated as
lambda expressions. When a method inside an interface does not have implementation, it is called
abstract method. When an interface has one abstract method, the interface is called
Functional Interface. Abstract method in the interface uses the lambda expressions. Interface is preceded with Functional Interface Tag
Example Interface:
@Functional Interface
public Interface printable {
void print();
}
For Loop using Lambda Expressions on Array List:
List
names = Arrays.asList("John", "Freddy", "Samuel");
names.forEach(name -> System.out.println("Hello, " + name));
For Loop using Lambda Expressions on Hash Map:
Map ages = new HashMap<>();
ages.put("John", 25);
ages.put("Freddy", 24);
ages.put("Samuel", 30);
ages.forEach((name, age) -> System.out.println(name + " is " + age + " years old"));
Filter list using the Stream API and keep only the names that start with the letter “A”.
List names = Arrays.asList("Angela", "Aaron", "Bob", "Claire", "David");
List namesWithA = names.stream()
.filter(name -> name.startsWith("A"))
.collect(Collectors.toList());
System.out.println(namesWithA.get(0).toLowerCase());
System.out.println(namesWithA.get(1).toLowerCase());
Replace All in List:
List names = Arrays.asList("bob", "josh", "megan");
names.replaceAll(name -> name.toUpperCase());
names.forEach(name -> System.out.println("Hello, " + name));
Replace All in HashMap:
Map salaries = new HashMap<>();
salaries.put("John", 40000);
salaries.put("Freddy", 30000);
salaries.put("Samuel", 50000);
salaries.replaceAll((name, oldValue) ->
name.equals("Freddy") ? oldValue : oldValue + 10000);
salaries.forEach((name, value) -> System.out.println(name + " is " + value + " years old"));
Calculator Operations using Lambda Expressions:
package org.example;
public class Main {
@FunctionalInterface
interface Operation {
int calculate(int x);
}
public static void main(String[] args) {
int a = 5;
Operation square = (int x) -> x * x;
Operation add = (int x) -> x + x;
Operation subtract = (int x) -> x - x;
int ans = square.calculate(a);
int addedvalue = add.calculate(a);
System.out.println(ans);
System.out.println(addedvalue);
System.out.println(subtract.calculate(6));
}
}
Synchronise List of files from Linux Server to Window using Batch file
@echo off
"C:\Users\mmendu\AppData\Local\Programs\WinSCP\WinSCP.com" ^
/ini=nul ^
/command ^
"open sftp://a_MMendu:P
"synchronize -filemask=S_EDI_202202*.TXT local C:\Users\mmendu\Documents\UFT /app/retek/rtkstst/transport/extract" ^
set WINSCP_RESULT=%ERRORLEVEL%
if %WINSCP_RESULT% equ 0 (
echo Success
) else (
echo Error
)
Get the files by Masking in synchorize command
synchronize -filemask=RMS_EDI_202202*.TXT local "+FilesFolderpath+" /app/retek/rtkstst/transport/extract"
Constant Names
Constant Names
Names of constants should be meaningful noun or
noun phrases with all letters capitalized. Multiple words were separated using
the underscore (_) character.
Example: MY_VARIABLE
Function Names
Function Names (or) Module names should be meaningful words or phrases that describe the
abstraction of the module. Use nouns for
object abstractions and verbs for functional abstractions. Module names should be without spaces, in
mixed case, with the first letter uppercase, and the first letter of each
subsequent word capitalized.
Boolean Function
Names
Functions that return the results of a test of a Boolean condition
T should be named IsT, HasT, or CanT.
Variable Names
- Append
computation qualifiers (Avg, Sum, Min, Max, Index) to the
end of a variable name where appropriate.
- Boolean
variable names should contain Is which implies Yes/No or True/False
values, such as fileIsFound.
- Do not
use literal numbers or literal strings, such as For i = 1 To 7. Instead,
use named constants, such as For i = 1 To NUM_DAYS_IN_WEEK for ease
of maintenance and understanding.