
Wednesday, December 25, 2019

Environment variable setup in SSIS Catalogue

If your SSIS solution is built with the project deployment model (not the package deployment model) and you would like to introduce environment variables, then this post is for you.

Let's start with a few steps before going into environment variables. In general, after building an SSIS solution you get an .ispac file, which you take from DEV to the UAT/TEST/PREPROD/PROD environment. You then deploy the .ispac file to the SSIS catalogue by following the wizard. For the step-by-step process of deploying an SSIS package to the catalogue, please follow: Deploying SSIS package to Catalogue
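As a side note, the deployment itself can also be scripted instead of clicking through the wizard, using the SSISDB catalogue procedure catalog.deploy_project. A minimal sketch, where the folder name 'MyFolder', project name 'MyProject', and the file path are hypothetical placeholders:

USE SSISDB;
GO
-- Read the .ispac from disk (the path is as seen by the SQL Server instance)
DECLARE @ProjectBinary VARBINARY(MAX);
SELECT @ProjectBinary = BulkColumn
FROM OPENROWSET(BULK N'C:\Deploy\MyProject.ispac', SINGLE_BLOB) AS ispac;

DECLARE @operation_id BIGINT;
EXEC catalog.deploy_project
     @folder_name    = N'MyFolder',
     @project_name   = N'MyProject',
     @project_stream = @ProjectBinary,
     @operation_id   = @operation_id OUTPUT;
GO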

After deployment, you will find the packages under the SSIS catalogue, as in figure 1 below:


Fig 1: SSIS Catalogue


In the next step you will create a job (or jobs) to run the packages. Now, if you don't create any environment variable, the job will look like the one below:

Fig 2: Job Properties without Environment variable

Will this job execute without an environment variable? Surely it will. But the connection string will still retain the development server information, even though the solution has already been moved to the UAT or Test environment, so you would need to change the connection string manually. If environment variables have been defined, they can be assigned to the job, so the same job (process) can run against different environments (UAT/Pre-PROD/PROD) by pointing at the right environment.

A job with an environment variable looks like figure 3 below:


Fig 3: Job with Environment variable

With this setup, you can choose an environment before running the job, and each environment can hold a different source and target database as well as different source and destination file locations. Now let's see how to create the environment variable.


Step 1: Create environment variable
Under SSISDB, you will find the Environments folder as in the picture below; right-click it and follow the steps.

Fig 4: Create environment variable


Step 2: Property setup for the environment variable

In this step, you define the connection strings, source/destination file paths, etc.

Fig 5: Set up properties for the environment



Step 3: Connect the environment variable to the project. Under the SSISDB catalogue, choose the project and then right-click to find 'Configure'.

Fig 6: Configure environment variable



Then add a reference to the project: References->Add, and the environment can be chosen.

Fig 7: adding environment variable reference to the project



Step 4: In steps 1 to 3 we worked with SSISDB. Now let's look at the job, under SQL Server Agent->Jobs, where you will have the option to choose an environment.

Fig 8: Job Properties
From the job properties, click Steps, and then you will find the environment to choose from.



In this example only a UAT environment has been created, but Pre-PROD and PROD environments can be created as needed, with source and target databases assigned accordingly. Those environments can then be assigned to the job.
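If you prefer scripting over clicking through SSMS, the same setup can be done with the SSISDB catalogue stored procedures. Below is a minimal sketch; the folder name 'MyFolder', project name 'MyProject', parameter name 'SourceConnectionString', and the connection string itself are all hypothetical placeholders:

USE SSISDB;
GO

-- Step 1: create the environment (here: UAT)
EXEC catalog.create_environment
     @folder_name = N'MyFolder',
     @environment_name = N'UAT';

-- Step 2: add a variable that holds the UAT connection string
EXEC catalog.create_environment_variable
     @folder_name = N'MyFolder',
     @environment_name = N'UAT',
     @variable_name = N'SourceConnectionString',
     @data_type = N'String',
     @sensitive = 0,
     @value = N'Data Source=UATServer;Initial Catalog=StagingDB;Integrated Security=SSPI;',
     @description = N'Source connection string for UAT';

-- Step 3: reference the environment from the project
DECLARE @reference_id BIGINT;
EXEC catalog.create_environment_reference
     @folder_name = N'MyFolder',
     @project_name = N'MyProject',
     @environment_name = N'UAT',
     @reference_type = 'R',  -- 'R' = relative: the environment lives in the same folder
     @reference_id = @reference_id OUTPUT;

-- Bind the project parameter to the environment variable
EXEC catalog.set_object_parameter_value
     @object_type = 20,      -- 20 = project-level parameter
     @folder_name = N'MyFolder',
     @project_name = N'MyProject',
     @parameter_name = N'SourceConnectionString',
     @parameter_value = N'SourceConnectionString',  -- name of the environment variable
     @value_type = 'R';      -- 'R' = referenced from an environment variable
GO

After running this, the environment appears in the job step's configuration exactly as in fig 8, ready to be selected.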

Saturday, November 16, 2019

SQL Server: Easy way to add Columns to Multiple tables

When you work on a Data Warehouse/BI project, or any project involving a database, and you need to add one or more of the same columns to tables you have already built, this post will help you achieve that goal quickly and error-free.

For example, a BI project you are working on may require a load timestamp on all the target tables, something that was somehow missed in the design phase, or you may find out later in the project that you need one or two particular columns on all the tables.

Boring task, right? Going through all the tables and changing them one by one is time-consuming and error-prone. For example, say you have 50 tables and you need to add one column called 'UpdatedAt'. How do we do it with a SQL script?

You need to list all the table names and run the query below, replacing the table names, e.g. 'testTable1', 'testTable2', etc.
 
DECLARE @tbl_name  VARCHAR(100)
DECLARE @tsql_code VARCHAR(500)

DECLARE cur_table CURSOR LOCAL FOR
    SELECT name
    FROM sys.tables
    WHERE name IN ('testTable1', 'testTable2')
    ORDER BY 1
    FOR READ ONLY

OPEN cur_table
FETCH cur_table INTO @tbl_name

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @tsql_code = 'ALTER TABLE ' + @tbl_name + ' ADD Col1 VARCHAR(10), Col2 VARCHAR(10)'
    EXEC (@tsql_code)
    FETCH cur_table INTO @tbl_name
END

CLOSE cur_table
DEALLOCATE cur_table


However, the above code may not work if your tables are not in the default schema, since the schema name is missing; so add the schema name along with the table name, as in the example below:

DECLARE @tbl_name  VARCHAR(100)
DECLARE @tsql_code VARCHAR(500)

DECLARE cur_table CURSOR LOCAL FOR
    SELECT '[' + SCHEMA_NAME(schema_id) + '].[' + name + ']'
    FROM sys.tables
    WHERE SCHEMA_NAME(schema_id) + '.' + name IN ('dbo.Currency', 'dbo.Sales')
    ORDER BY 1
    FOR READ ONLY

OPEN cur_table
FETCH cur_table INTO @tbl_name

WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @tsql_code = 'ALTER TABLE ' + @tbl_name + ' ADD UpdatedAt DATETIME'
    EXEC (@tsql_code)
    FETCH cur_table INTO @tbl_name
END

CLOSE cur_table
DEALLOCATE cur_table
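If the script might be run more than once, you can guard against 'column already exists' errors by checking sys.columns inside the loop. A minimal sketch of the dynamic SQL, assuming the same cursor loop as above:

    -- Only add the column if it is not already there
    SELECT @tsql_code =
        'IF NOT EXISTS (SELECT 1 FROM sys.columns ' +
        ' WHERE object_id = OBJECT_ID(''' + @tbl_name + ''') AND name = ''UpdatedAt'') ' +
        ' ALTER TABLE ' + @tbl_name + ' ADD UpdatedAt DATETIME'
    EXEC (@tsql_code)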


Hope this will make your life easier.

Sunday, October 20, 2019

SSIS: Execute SQL Task Limitation

I would like to share an experience I had recently. In general, when you work with SSIS and use the Execute SQL Task, you tend to build the query in SQL Server Management Studio (SSMS) and, once the query works fine, copy the code into the Execute SQL Task. However, when I copied the code, pasted it into the Execute SQL Task, and ran the package, it failed at the Execute SQL Task; it turned out that only part of the SQL code had been pasted.

Fig 1: Execute SQL Task


The SQL code I copied was around 1,600 lines; however, only 891 lines made it into the Execute SQL Task, and the rest were discarded. It means the Execute SQL Task has a limit. Out of curiosity I took the pasted part of the code from the Execute SQL Task and counted it: the task held up to 30,987 characters including spaces (see fig 2 below).

Fig 2: Execute SQL Task limitation


How do you solve it?
Well, it's easy: you wrap the SQL code in either a view or a stored procedure and then use that in your Execute SQL Task.
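For example, here is a minimal sketch with a hypothetical procedure name; the long query moves into the procedure once, in SSMS:

CREATE PROCEDURE dbo.usp_LoadMonthlySales  -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;
    /* ...the original ~1,600 lines of SQL go here... */
END
GO

The SQLStatement in the Execute SQL Task then shrinks to a single line:

EXEC dbo.usp_LoadMonthlySales;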


Sunday, September 15, 2019

Referential Integrity in Data Warehouse Design?

I was involved in a project where implementing referential integrity in the data warehouse design was requested. I was surprised, since I had never used such a design for a Data Warehouse (DW).

If someone asks you to put referential integrity into your data warehouse project, how do you explain to them that it's not required for a DW project?

If you design control/config tables to maintain logs or control the flow of your ETL runs, you may apply referential integrity to those tables only; but when it comes to the data load for the main tables, don't go with referential integrity, which will kill the performance.

So reason no. 1: poor performance while loading.

If your DW tables (dimensions and facts) have referential integrity, it will consume extra time to insert or update records. So the standard for data warehouse design is that you don't maintain referential integrity.

Let's explain why you don't need it.

When you are doing dimensional modeling, referential integrity is not required because the ETL does the job for you: the ETL process is the source of all key values that are inserted into the star. The ETL process should generate surrogate keys in the dimension tables, and while the facts are loaded, natural keys are replaced with those surrogate keys from the dimensions. Any data that would violate referential integrity is caught by this process.
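To make that concrete, here is a minimal sketch of a typical fact load; the staging, dimension, and fact table names are hypothetical. The inner join resolves natural keys to surrogate keys, and rows with no matching dimension member are diverted to an orphan table for inspection instead of silently violating integrity:

-- Resolve the surrogate key from the dimension while loading the fact
INSERT INTO dbo.FactSales (CustomerKey, SalesAmount, OrderDate)
SELECT d.CustomerKey,   -- surrogate key generated by the ETL
       s.SalesAmount,
       s.OrderDate
FROM Staging.Sales AS s
JOIN dbo.DimCustomer AS d
    ON d.CustomerID = s.CustomerID;  -- natural (business) key

-- The inner join drops rows with no matching dimension member,
-- so capture those separately for inspection
INSERT INTO ETL.OrphanSales (CustomerID, SalesAmount, OrderDate)
SELECT s.CustomerID, s.SalesAmount, s.OrderDate
FROM Staging.Sales AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DimCustomer AS d
                  WHERE d.CustomerID = s.CustomerID);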

In summary, if your ETL is built correctly, you don't need referential integrity in your Data Warehouse design.


Saturday, August 24, 2019

Using table variables in SSIS packages

When you need it and how to use it?

As you may know, a table variable is similar to a temp table in Microsoft SQL Server; however, a table variable generally uses fewer resources. I am going to explain when a table variable can be useful during SSIS package development.

Scenario 1: When you need to store data like an array and use it later in your SSIS package

For example, in the scenario below I only had to insert/update data for the periods that exist in the source data, and then make the next operation depend on whether the value exists or not.

DECLARE @periodACD TABLE (periodACD CHAR(2))

INSERT INTO @periodACD
SELECT DISTINCT [PERIOD]
FROM [ETL].[SourceTable];

SELECT CASE WHEN periodACD = '4' THEN 'Insert or update for April' END /* You can use any transformation or calculation in the THEN clause. */
FROM @periodACD
WHERE periodACD = '4'


Scenario 2: Keeping inserted or updated row counts for logging purposes while you use SQL MERGE

For example, if you are using MERGE to update a dimension and want to capture the number of rows affected, you can use a table variable to store the output of the MERGE.

DECLARE @RowCount TABLE (InsertedID INT)  -- declare table variable

MERGE [Dim].[Table] AS Target
USING [Staging].[Data] AS Source
    ON Target.SourceID = Source.SourceID  -- business key
WHEN MATCHED
    AND (   ISNULL(Source.SourceName, '') <> ISNULL(Target.SourceName, '')
         OR ISNULL(Source.SourceDate, 0)  <> ISNULL(Target.SourceDate, 0)
         OR ISNULL(Source.SourceBit, '')  <> ISNULL(Target.SourceBit, ''))
THEN
    UPDATE SET Target.SourceName = Source.SourceName,
               Target.SourceDate = Source.SourceDate,
               Target.SourceBit  = Source.SourceBit
OUTPUT inserted.SourceID AS 'InsertedID' INTO @RowCount;  -- output IDs for row count

SELECT COUNT(*)
FROM @RowCount
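If your MERGE also inserts new rows (WHEN NOT MATCHED), you can capture $action in the OUTPUT clause to separate insert and update counts. A minimal sketch, reusing the hypothetical table names from above:

DECLARE @MergeLog TABLE (MergeAction NVARCHAR(10), SourceID INT)

MERGE [Dim].[Table] AS Target
USING [Staging].[Data] AS Source
    ON Target.SourceID = Source.SourceID
WHEN MATCHED THEN
    UPDATE SET Target.SourceName = Source.SourceName
WHEN NOT MATCHED THEN
    INSERT (SourceID, SourceName)
    VALUES (Source.SourceID, Source.SourceName)
OUTPUT $action, inserted.SourceID INTO @MergeLog;  -- $action is 'INSERT' or 'UPDATE'

SELECT MergeAction, COUNT(*) AS RowsAffected
FROM @MergeLog
GROUP BY MergeAction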

Saturday, July 20, 2019

New to Git and GitHub? How does Azure DevOps connect with Git? Must know.


Those who have been using TFS for a long time now know that 'Git' is a 'thing' you can't avoid. When you start with Git, please don't expect it to be exactly like TFS. When I started with TFS 10 years back, it was super easy to get going; for Git, however, start with the mindset that the command line is required.

As you get to know Git, another name comes along, 'GitHub', and the confusion grows.
Many people think Git and GitHub are the same, but they're not. Git is the version-control software and GitHub is your storage with sharing capability. If you don't want to share your code with anyone but still want to maintain versions, you don't need GitHub (which is unlikely, though). Moreover, as an alternative to GitHub you can use Azure DevOps.

Those of you used to working with TFS may already know about Azure DevOps, which gives you the option to use either Git or TFVC (TFS-style version control) as your version-control tool. It means that in this scenario you don't need GitHub. The picture below depicts how Git relates to Azure DevOps and GitHub.

Fig 1: Git, GitHub and Azure DevOps

Let's start with beginner steps to work with Git.

Step 1: Install Git on your PC.
Follow the link below, choose the right platform to download the file, and then install it.
https://git-scm.com/downloads

Step 2: Open the Git command window, which looks like below

Fig 2: Installed Git software

Now, suppose you have created a solution or file that you would like to add to Git and work on with other colleagues.

Adding files to Git means that the folder where you keep the files/code should be recognized by Git. To achieve that, you need to do the following:

For example, your code is located under C:\Code and you want to make this a Git repository.

After opening Git Bash, write the commands below:

1. cd C:\
2. cd Code
3. git init

Now the folder is ready to use as a Git repository; Git confirms:

'Initialized empty local Git repository in C:/Code/.git/'

You will find a hidden .git folder created in C:/Code/.

Now you can copy all your files into C:/Code. Let's add a file to the local Git repository:

4. git add Testcode.txt

and commit the code to the local repository:

5. git commit -m 'added test file'


Now your solution and all its code files are under the Git repository. If you work on your own and want version control so you can always go back to your earlier code, you are all set at this stage.

Step 3:
Let's get to the next step, where you push this code to either GitHub or Azure DevOps. In my case I uploaded the code to Azure DevOps.

How do you do this?

Log in to Azure DevOps: https://dev.azure.com/

Create a repository and choose 'Git' as the type.

Fig 3: Create repo under Azure DevOps
After finishing the repository creation, you will see a window like the one below, from which you need to copy the command-line code.



Now go back to your 'Git Bash' command window and write two lines of code:

6. git remote add origin https://d*****@dev.azure.com/dpaul0628/DpaulGithub/_git/LoveGit
7. git push -u origin --all

Now if you log in to Azure DevOps, you will see that the local files you committed earlier have been uploaded to Azure DevOps.

Saturday, June 29, 2019

Creating a script along with data for database objects: why is it required, and an easy way to accomplish it

Sometimes you may have a table populated by the business or tech team to support the solution you develop. This type of table can be called a master data table, one that is not derived from any source system. So whenever you need to move that table's DDL/script to a higher environment (from DEV to UAT or PROD), you need to move not only the DDL but also the data for that table.

Now let's find out how to do it.

There are a few ways to do it, but my preferred way is to use the 'Generate Scripts' task in SSMS. We mostly use it for generating the schema scripts; however, it can also include the data.

Step 1: Select the database and find the Generate Scripts option
Fig 1: Finding the Generate Scripts option for the database
Step 2: Follow the wizard
Fig 2: Go through the wizard
Step 3: Choose the database objects, i.e. the tables you would like to move to the higher environment, and click Next.
Fig 3: Choose your database objects
Step 4: Before you generate the script, you also need to include the data, so click the 'Advanced' button.
Fig 4: Advanced scripting options
Step 5: Change 'Types of data to script' to 'Schema and data'
Fig 5: Get schema and data together




You are almost done. After completing the few additional steps in the wizard, the script will be ready with both DDL and data. Now you can move the saved file to the upper environment and run the script, and you will find the table and its data together.
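For illustration, the generated file looks roughly like the sketch below (using a hypothetical master data table; the real wizard output also includes SET options and existence checks):

CREATE TABLE [dbo].[CountryMaster](
    [CountryCode] CHAR(2) NOT NULL,
    [CountryName] VARCHAR(100) NOT NULL
)
GO

INSERT [dbo].[CountryMaster] ([CountryCode], [CountryName]) VALUES (N'DK', N'Denmark')
GO
INSERT [dbo].[CountryMaster] ([CountryCode], [CountryName]) VALUES (N'GB', N'United Kingdom')
GO

Running that file in the upper environment creates the table and loads its data in one go.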