Thursday, April 30, 2015

Window function: Rows between Preceding and Preceding in Teradata

As continuity of earlier post Maintain Historical and Current data in data Warehouse, I am going to write how to implement it in Data Warehouse.

First we create a table that holds Customer information, and assume the table ‘CUSTOMER’ is part of OLTP database. And table doesn’t have any column with END Date. It has Effective date(EFF_DT).

create volatile table Customer
(
Customer_id int,
CUSTOMER_NAME Varchar(120),
Address VArchar(200),
EFF_DT date

) ON COMMIT PRESERVE ROWS;

And Insert data into the table.

INSERT INTO Customer VALUES ( 101, 'PAUL','Bondhev 50', date '2015-01-03' );
INSERT INTO Customer VALUES ( 101,'PAUL','Bondhev 51', date '2015-03-01'  );
INSERT INTO Customer VALUES ( 101,'PAULA','Bondhev 51', date '2015-04-01');
INSERT INTO Customer VALUES ( 101, 'PAULA','Bondhev 52', date '2015-04-02');
INSERT INTO Customer VALUES ( 101, 'PAULAS','Bondhev 52', date '2015-04-10' );
INSERT INTO Customer VALUES ( 102, 'MADS',Borupvang 52', date '2015-01-03' );
INSERT INTO Customer VALUES ( 103, 'LARS',Hellerup 52', date '2015-04-01' );

INSERT INTO Customer VALUES ( 202, 'DAN','Oxford street 30', date '2014-04-10' );
INSERT INTO Customer VALUES ( 202, 'DANS','Montix road 52', date '2014-04-10');
INSERT INTO Customer VALUES ( 202,'DANS','boulevard 52', date '2014-04-10' );

INSERT INTO Customer VALUES ( 202, 'DANSA','boulevard 52', date '2014-04-20' );

Data look like below table:


Fig 1: Customer table in OLTP database

As we discussed in earlier post that when we load data from OLTP database to data warehouse then we add END_DT at the time of loading data.

From the above data, customer id 101 has more than one entry and the customer changes his name and address a few times.  To add END DATE you need to do following:

  Find out max EFF_DT and set that as previous row's  END_DT

e.g. Customer id 101 data is arranged in descending order like below, lowest EFF_DT is 03-01-2015.


Fig 2: End date manages in Data Warehouse

We need to look at EFF_DT of previous row which is 01-03-2015 and make that as END_DT for the row.

And we do same way for other rows to find out END_DT.


SQL Code is like below to implement above:

                                Fig 3: Preceding and Preceding to generate End Date

“ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING” is used in an ordered analytical function to tell it to include only preceding row in the partition in the calculation being performed.

If you look at the above data (Fig: 2), most updated row EFF_DT =10-04-2015 doesn’t have any end date. Since the window function (Fig: 3) we have used to get previous row which did not find any previous row. So it return NULL.

We can use COALESEC to replace NULL and put unlimited date ‘9999-12-31’ to present as active row.

Select CUSTOMER.*,
COALESCE(MAX(EFF_DT) OVER ( PARTITION BY Customer_id  ORDER BY EFF_DT desc ROWS BETWEEN 1 preceding AND 1 preceding ),date '9999-12-31') as END_DT

FROM CUSTOMER

Final output should look like below:


Fig 4: Latest row updated with ever ending date


No comments: