As continuity of earlier post
Maintain Historical and Current data in data Warehouse, I am going to write how to implement it in Data Warehouse.
First we
create a table that holds Customer information, and assume the table ‘CUSTOMER’
is part of OLTP database. And table doesn’t have any column with END Date. It has Effective date(EFF_DT).
create volatile table
Customer
(
Customer_id int,
CUSTOMER_NAME Varchar(120),
Address VArchar(200),
EFF_DT date
) ON
COMMIT PRESERVE
ROWS;
And Insert
data into the table.
INSERT INTO Customer VALUES ( 101, 'PAUL','Bondhev 50',
date '2015-01-03'
);
INSERT INTO Customer VALUES ( 101,'PAUL','Bondhev 51',
date '2015-03-01' );
INSERT INTO Customer VALUES ( 101,'PAULA','Bondhev 51',
date '2015-04-01');
INSERT INTO Customer VALUES ( 101, 'PAULA','Bondhev 52',
date '2015-04-02');
INSERT INTO Customer VALUES ( 101, 'PAULAS','Bondhev 52',
date '2015-04-10'
);
INSERT INTO Customer VALUES ( 102, 'MADS',Borupvang 52',
date '2015-01-03'
);
INSERT INTO Customer VALUES ( 103, 'LARS',Hellerup 52',
date '2015-04-01'
);
INSERT INTO Customer VALUES ( 202, 'DAN','Oxford street
30', date '2014-04-10'
);
INSERT INTO Customer VALUES ( 202, 'DANS','Montix road
52', date '2014-04-10');
INSERT INTO Customer VALUES ( 202,'DANS','boulevard 52',
date '2014-04-10'
);
INSERT INTO Customer VALUES ( 202, 'DANSA','boulevard 52',
date '2014-04-20'
);
Data look
like below table:
Fig 1: Customer table in OLTP database
As we
discussed in earlier post that when we load data from OLTP database to data
warehouse then we add END_DT at the time of loading data.
From the
above data, customer id 101 has more than one entry and the customer changes
his name and address a few times. To add
END DATE you need to do following:
Find
out max EFF_DT and set that as previous row's END_DT
e.g.
Customer id 101 data is arranged in descending order like below, lowest EFF_DT
is 03-01-2015.
Fig 2: End date manages in Data Warehouse
We need to
look at EFF_DT of previous row which is 01-03-2015 and make that as END_DT for
the row.
And we do
same way for other rows to find out END_DT.
SQL Code is
like below to implement above:
Fig 3: Preceding and Preceding to generate End Date
“ROWS BETWEEN
1 PRECEDING AND 1 PRECEDING” is used in an ordered analytical function to tell
it to include only preceding row in the partition in the calculation being
performed.
If you look
at the above data (Fig: 2), most updated row EFF_DT =10-04-2015 doesn’t have any end
date. Since the window function (Fig: 3) we have used to get previous row which did not find any previous row. So it return NULL.
We can use
COALESEC to replace NULL and put unlimited date ‘9999-12-31’ to present as
active row.
Select CUSTOMER.*,
COALESCE(MAX(EFF_DT) OVER ( PARTITION
BY Customer_id ORDER BY EFF_DT desc ROWS BETWEEN 1 preceding AND 1 preceding ),date '9999-12-31')
as END_DT
FROM CUSTOMER
Final output should look like below:
Fig 4: Latest row updated with ever ending date