Sunday, May 19, 2019

When and why unpivot require for your Data Warehouse(DW) project?

You may involve with a Data warehouse (DW) project where you find source system got many repeated columns in a table which can be reduced to a few columns table. It's not only reducing the number of columns rather you would like to find out attributes those can be aggregated for the reporting purpose, in DW world, which will be your fact attributes. Let's start with an example:

In Medical data, Hospital can have different type of emergency bed type e.g. below table holds bed availability data for different type of bed.

Fig 1: Hospital Bed data

Let's make a table with above data, use below SQL query to populate the data.

create table #HospitalBed (TranID int,SubmissionDate datetime, GenAcuteCom int, 
ShortstayCrisisComp int, MoodAnxietyComp int, EatDisorderComp int)

 insert into #HospitalBed
 select 101,getutcdate(),10,3,9,4
 union 
 select 102,getutcdate(),5,6,2,8
 union 
select 104,getutcdate(),6,3,9,1
 union 
 select 105,getutcdate(),5,6,2,8



If you would like to get the data under, SELECT * FROM #HospitalBed

Fig 2: Data from Table

Now to fit this data for data warehouse model, we can clearly see columns need to turned into rows. These different type of bed should go under one column called 'BedType' and values for those columns  under another column can be called 'complementbed'. Now how do we do it? We have function in SQL called 'unpivot', we are going to use the function to populate expected data set.

select u.tranID, u.Bedtype, u.complementbed
from #HospitalBed s
unpivot
(
  complementbed
  for Bedtype in (GenAcuteCom, ShortstayCrisisComp, MoodAnxietyComp,EatDisorderComp)
) u;

Fig 3: After unpivot

As you can see , after applying UNPIVOT you get more rows but column got reduced.

But what is the benefit? This will help to make dimensional model easy, where all these BedType can be kept in a dimension table and fact table can have the key of BedType along the with ComplementBed value. Below Fig (4) shows how dimensional model can be established for the earlier mentioned source table (Fig 1) eventually.

Fig 4: Sample Dimension and Fact table 


And consider the source table we started have 4 different type of bed, however, it could be around 20 columns and more. Dimensional model can be super complicated and even may not able to build proper dimensional model without transposing the columns into rows for the above scenario,