Saturday, November 16, 2019

SQL Server: Easy way to add Columns to Multiple tables

When you work in Data Warehouse/BI Project or any project where database is required and you are in a situation where need to add one or more same columns to the tables you already built then this post will help you to achieve the goal quicker and error free.

For example a BI project you may working with require to add load timestamp to all the target tables which at the design phase somehow missed or you find out at the later part of the project that you require one or two particular columns to all the tables.

Boring task!!! right?? Going through all the tables and changing it one by one is time consumable and error prone.  For example if you have 50 tables and you need to add one column say column called 'UpdatedAt'. How do we do it by using SQL script?

You need to list all the table name and run the below query by replacing the table name e.g. 'testTable1', 'testTable2' etc.
             Declare @tbl_name as varchar(100)
              Declare @tsql_code Varchar(500)

              Declare cur_table cursor local for
                           select name
                           from sys.tables
                           Where name in ('testTable1','testTable2')
                           order by 1
                           for read only

              open cur_table
              fetch cur_table into @tbl_name
                           while @@fetch_status = 0
                           Select @tsql_code='Alter Table ' + @tbl_name  +' Add Col1 Varchar(10), Col2 Varchar(10)'
                           Exec (@tsql_code)
                           fetch cur_table into @tbl_name
                     close cur_table
                     deallocate cur_table

However,  the above code may not work since schema name is missing there, so you need to add the schema name along with the table name like below example:

   Declare @tbl_name as varchar(100)
              Declare @tsql_code Varchar(500)

     Declare cur_table cursor local for
                           select '['+SCHEMA_NAME(schema_id)+'].['+name+']'
                           from sys.tables
                           Where SCHEMA_NAME(schema_id)+'.'+name in ('dbo.Currency','dbo.Sales')
                           order by 1
                           for read only

              open cur_table
              fetch cur_table into @tbl_name
                           while @@fetch_status = 0
                           Select @tsql_code='Alter Table ' + @tbl_name  +' Add UpdatedAt [DATETIME]'
                           Exec (@tsql_code)
                           fetch cur_table into @tbl_name
                     close cur_table
                     deallocate cur_table

Hope this will make your life easier.

Sunday, October 20, 2019

SSIS: Execute SQL Task Limiation

I would like to share an experience that found recently. In general, when you work with SSIS and using SQL Execute task, you tend to make query by using  SQL Server Management Studio (SSMS) and when query works fine then copy the code into the Execute SQL Task. However,  when I copied the code and pasted to Execute SQL task and run the package it failed at Execute SQL task, then figure it out only part of the SQL code is pasted.

Fig 1: SQL Execute Task

The SQL code I copied was around 1600 lines, however when pasted it's only took 891 lines of code to Execute SQL Task and rest have been discarded. It means Execute SQL task got limitation, out of curiosity took the pasted part of the code from Execute SQL task then count it and found it can hold up to 30,987 characters with space. (see below fig 2)

Fig 1: SQL Execution task limiation

How do you solve it?
Well, It's easy, you wrap the SQL code either by using view or store procedure and then use that in your Execute SQL task.

Sunday, September 15, 2019

Referential Integrity in Data warehouse Design?

I was involved in a project and the project was asking about implementing referential integrity for the data warehouse design. I was surprised, since I never did such design for Data Warehouse(DW).

If someone ask you to put the referential integrity in your data warehouse project, how do you explain them that it's not required for DW project?

If you design the control /config tables to maintain logs or control the flow of your ETL run then you may follow the referential integrity for only those tables, but when you start working with data load for the tables, don't go with referential integrity which will kill the performance.

So No 1 : Poor performance while loading

If your DW tables (dimensions and facts) have referential integrity which will consume extra time to insert or update the records. So standard for data warehouse design is, you don't maintain referential integrity.

Let's explain why you don't need it?

When you are doing dimensional modeling it's not require that you need to have referential integrity because ETL does it for you. Because ETL process is the source of all key values that will be inserted into the star. Any ETL process should have surrogate keys in the dimension tables and while loading facts natural keys will be transformed with those surrogate keys from dimension. Any data that would violate referential integrity will be caught by these process.

In summary, If your ETL is built correct then you don't need to have referential integrity in the Data Warehouse design.

Saturday, August 24, 2019

Table variable using in SSIS packages

When you need it and how to use it?

As you may know table variable is similar to temp table in Microsoft SQL, however; table variable generally uses fewer resources. I am going to explain when table variable could be useful at the time of SSIS package development.

Scenario 1: When you need to store data like array and need to use it later in your SSIS package

For example, below scenario I only had to insert/update data where the periods exist in the source data and then making next operation depend on the value exist or not.

DECLARE @periodACD TABLE (periodACD char(2))



FROM [ETL].[SourceTable];

SELECT CASE WHEN periodACD='4' THEN 'Insert or update for April'  END /*You can use any transformation or calculation at THEN clause.*/
FROM @periodACD WHERE periodACD='4'

Scenario 2: Keep insert or updated row count for logging purposes while you use SQL Merge

For example, you are using merge to update the dimension and want to catch the number of rows you insert, you can use table variable to store the data for merge.

DECLARE @RowCount TABLE (InsertedID INT) --Declare Table variable

MERGE  [Dim].[Table] AS Target
 USING    [Staging].[Data] as Source
    ON Target.SourceID = Source.SourceID --BusinessKey

AND (ISNULL(Source.SourceName,'')<>ISNULL(target.[BrandDesc],'')
OR ISNULL(Source.SourceDate,0)<>ISNULL(Target.SourceDate,0)
        OR ISNULL(Source.SourceBit,'')<>ISNULL(Target.SourceBit,'')
   UPDATE SET Target.SourceName = Source.SourceName,
  Target.SourceDate = Source.SourceDate,
  Target.SourceBit = Source.SourceBit

OUTPUT inserted.SourceID 'InsertedID' INTO @RowCount; --output IDs for rowcount

FROM @RowCount

Saturday, July 20, 2019

New to Git and Github? How Azure DevOps connect with Git? Must know.

Those who have been using TFS for a long and now you know 'Git' is a 'thing' that you can't avoid.  When you start with Git, please don’t expect it will be exact same like TFS. When I have started with TFS 10 years back, it was super easy to start with, however for Git, start with a mindset that, command line is required.

As you know Git, other name come along which is 'GitHub' and then confusion grows. 
Many people think Git and GitHub are the same, But it’s not. Git is the version-Control software and GitHub is your storage with sharing capability. If you don’t want to share your code with anyone but still having versions maintenance you don’t need GitHub (which is unlikely though). Moreover, alternative to GitHub you can use Azure DevOps.

Those of you used to work with TFS, maybe already got to know about Azure Devops which give you option to either use Git or TFS as your version-control tool. It means in this scenario, you don’t need GitHub. Below picture depicts Git relate with Azure DevOps and GitHub.

Fig 1: Git, GitHub and Azure Devops

Let's start with beginner steps to work with Git.

Step 1: Install Git at your PC.
Follow the link and choose your right platform to download the file and then install it.

Step 2: Open the Git command window, which look like below

Fig 2: Installed Git software

Now, you would like to create a solution or file which you would like to add to the Git and would like to work with other colleagues.

Adding file to Git means, the folder where you keep the files/code should be recognized by Git. To achieve that, you need to do following:

For example, your code is located under C\Code and you want to make this as Git repository.

After opening the Git Bash, write below command:

1 cd C:\                                                                                                            

2 cd Code
3 git init

Now the folder is ready to use as Git repository

'Initialized empty local Git repository in C:/Code/.git/'

And you will a find a git folder created in the folder C:/Code/

Now you can copy all your files under the folder C:/Code and let's add a file to the local Git repository.

4. git add Testcode.txt

and Commit the code in local repository

5.  git commit -m 'added test file'

Now your solution or all code files is under git repository, If you work on your own and wanted to have version control so you can always go back to your earlier code, which is ready at this stage.

Step 3:
Let's get to the next step where you would like to push this code to either Github or Azure Devops. In my case I have uploaded the code to Azure DevOps.

How do you do this?

Login to the Azure devops :

Create repository , choose Type as 'Git'

Fig 3: Create repo under Azure Devops
After finishing the repository creation, you will find like below window, where you need to copy the command line code.

Now go back to your  'Git Bash' command window, and write two lines of code:

6. git remote add origin https://d*****
7. git push -u origin –all

Now if you login to the Azure DevOps, you will see the local file you have committed earlier is uploaded to the Azure DevOps.

Saturday, June 29, 2019

Creating script along with Data for database objects- Why require? and Easy way to accomplish.

Sometimes you may have a table which have populated by business or tech to support the solution you develop. This type of table can be called as Master data table which is not derived from any source system. So whenever you need to move table DDL/script to higher environment (from DEV to UAT or PROD) you need to move not only DDL but also data for that table.

Now let's find out How do you do it?

There are a few ways to do it but my preferable way to do it by using SQL Task, we mostly use this for generating scripts; however, it can also take the data.

Step 1: Select the database and find generate script option
Fig 1: Finding out generate scripts for the database
Step 2: Follow the wizard
Fig 2: Go thought the wizard
Step 3: Choose the database object, the table you would like to move to the higher environment. And click Next.
Fig 3: Choose your database object
Step 4: Before you generate the script, you also need to have data,so click at the 'Advanced' button.
Fig 4: Advanced scripting options
Step 5: Change 'Type of data to script' to Schema and data
Fig 5: get schema and data together

You are almost done, after completing the addition few steps in the wizard script will be ready with DDL and data. Now you can move the saved file to the upper environment and run the script, you will find table and data together.

Sunday, May 19, 2019

When and why unpivot require for your Data Warehouse(DW) project?

You may involve with a Data warehouse (DW) project where you find source system got many repeated columns in a table which can be reduced to a few columns table. It's not only reducing the number of columns rather you would like to find out attributes those can be aggregated for the reporting purpose, in DW world, which will be your fact attributes. Let's start with an example:

In Medical data, Hospital can have different type of emergency bed type e.g. below table holds bed availability data for different type of bed.

Fig 1: Hospital Bed data

Let's make a table with above data, use below SQL query to populate the data.

create table #HospitalBed (TranID int,SubmissionDate datetime, GenAcuteCom int, 
ShortstayCrisisComp int, MoodAnxietyComp int, EatDisorderComp int)

 insert into #HospitalBed
 select 101,getutcdate(),10,3,9,4
 select 102,getutcdate(),5,6,2,8
select 104,getutcdate(),6,3,9,1
 select 105,getutcdate(),5,6,2,8

If you would like to get the data under, SELECT * FROM #HospitalBed

Fig 2: Data from Table

Now to fit this data for data warehouse model, we can clearly see columns need to turned into rows. These different type of bed should go under one column called 'BedType' and values for those columns  under another column can be called 'complementbed'. Now how do we do it? We have function in SQL called 'unpivot', we are going to use the function to populate expected data set.

select u.tranID, u.Bedtype, u.complementbed
from #HospitalBed s
  for Bedtype in (GenAcuteCom, ShortstayCrisisComp, MoodAnxietyComp,EatDisorderComp)
) u;

Fig 3: After unpivot

As you can see , after applying UNPIVOT you get more rows but column got reduced.

But what is the benefit? This will help to make dimensional model easy, where all these BedType can be kept in a dimension table and fact table can have the key of BedType along the with ComplementBed value. Below Fig (4) shows how dimensional model can be established for the earlier mentioned source table (Fig 1) eventually.

Fig 4: Sample Dimension and Fact table 

And consider the source table we started have 4 different type of bed, however, it could be around 20 columns and more. Dimensional model can be super complicated and even may not able to build proper dimensional model without transposing the columns into rows for the above scenario,