When you want
to keep history in Data Warehouse then need to manipulate END_DT along with
EFF_DT in the table to find out active rows and inactive rows.
For example source
table Customer look like below:
Fig: 1.1 Source table ‘Customer’
The above
source table doesn’t have any end date but when you will get the date from
source to target you need to add end date. As we can see from the above data,
we have two customers and each customer has many rows, sometimes they changes
address and sometimes name.
When you get data
into target table only one row should have active for one customer by saying end_date
as ‘9999-12-31’ and all other rows for the customer will have end_date as
previous row’s EFF_DT. The data should look like below:
Fig: 1.2 How End_date works in ‘Customer’ table
Why we care
about End date (END_DT) in data warehouse?
In data warehouse
initially we take full load and then we take delta (each day’s data) every day.
You may ask, can’t we do exact copy from
source table? Do we need to add End Date
(END_DT)??!!
And simple
answer of your question will be YES and one of the reasons of having data
warehouse is isolate active data from the historical data by putting END_DT.
Take an
example, if you want to make query and get the latest attributes of customer_id=101
then how will you find the latest information of the customer. If you make
query like below :
Select * from Customer
where customer_id=101;
--Result: you will find
all five rows like above
And to get latest
information of the customer ‘PAUL’, please run below query:
Select * from Customer
Where customer_id=101
and date between EFF_DT and END_DT
Result:
Customer_id
|
CUSTOMER_NAME
|
Address
|
EFF_DT
|
END_DT
|
101
|
PAULAS
|
Bondhev 52
|
10-04-2015
|
31-12-9999
|
Fig 1.3: Find out the latest
information of the customer
Without
having END_DT your data warehouse will behave same like as OLTP database where
you have to make complex query to find out the latest information.
Now we will
find out how we can populate END_DT from EFF_DT by using Window function: Rows
between 1 preceding and 1 preceding (Coming in next post.)
No comments:
Post a Comment