Wednesday, January 18, 2017

MERGE in SQL is a good friend for Data Warhouse

If your need is SCD2 type table, where you would like to keep the history. Then SQL merge could be your good friend. You could produce history with different way but SQL merge is clean; it's one placeholder for inserting and updating the data.

Lets start with very simple example, in data warehouse many of your table must need to hold history where you take each full load at the first load and then you start loading delta each day. In data warehouse you have many layers but in my example, it's source and target table is taken.

Step 1: Creating Source Table
First make a source table:
Create table Source_Table (id int, ename varchar(50), UpdateDate DateTime);

Step 2: Creating Target table
We need to make Target table to load the data

Create table Target_Table (id int, ename varchar(50),EFF_DT date, END_DT date)

Step 3:  Merge Actions in general
Merge have three different actions, those are INSERT, UPDATE and DELETE.

Below SQL query will show the current inserted row to the table, you don't need to write select * from syntax.

insert into Source_Table
OUTPUT INSERTED.*
values (1,'Diponkar',getdate()), (2,'San',getdate()), (3,'Derek',getdate()), (4,'Paul',GETDATE())

Fig 1: OUTPUT from the Insert                 


Step 4: Data warehouse Full load/Init load

Since we don't have any data in the target table. So, if we execute below merge query it will only affect the insert part of the merge.

Merge target_Table T
Using Source_Table S
on (S.id=T.id)
When NOT Matched
Then insert (id,ename,EFF_DT,END_DT)
values(S.id,S.ename,GETDATE(),'2999-12-31')
When Matched
Then Update SET T.ename=s.ename
OUTPUT $action, DELETED.*, INSERTED.*;

And we are able to see what is ACTION of this merge.

Fig 2: Merge Action


Above figure 2, clearly shown that only insert action is taken place by the merge,

Step 5:   Delta load
Source data is daily delta data, so we can delete the old data from source table and insert two new rows. One with new key and other with existing key.

delete from Source_Table

insert into Source_Table (updated data with same business key)
values(1,'Dipu2',GETDATE())

insert into Source_Table (completely new row)
values(5,'Drina',GETDATE())

Data in the source table is like below:

Fig 3: Delta load



Now, lets run the below merge query and catch the action:

Merge target_Table T
Using Source_Table S
on (S.id=T.id)
When NOT Matched
Then insert (id,ename,EFF_DT,END_DT)
values(S.id,S.ename,GETDATE(),'2999-12-31')
When Matched
Then Update SET T.ename=s.ename
OUTPUT $action, DELETED.*, INSERTED.*;


There should be one insert and one update, lets have a look at the output:

Fig 4:Output after merge applied


Step 6: Data in the target table
As we see from the below fig: 5, merge query updated the target table accordingly.

Fig 5: Insert and Update shown