Wednesday, January 18, 2017

MERGE in SQL is a good friend for the Data Warehouse

If you need an SCD2-type table, where you would like to keep history, then SQL MERGE could be your good friend. You could produce history in different ways, but SQL MERGE is clean: it is one single statement for inserting and updating the data.

Let's start with a very simple example. In a data warehouse many of your tables need to hold history, where you take a full load the first time and then load a delta each day. A data warehouse has many layers, but in my example only a source table and a target table are used.

Step 1: Creating Source Table
First, create the source table:
Create table Source_Table (id int, ename varchar(50), UpdateDate DateTime);

Step 2: Creating Target Table
We need to create a target table to load the data into:

Create table Target_Table (id int, ename varchar(50), EFF_DT date, END_DT date);

Step 3:  Merge Actions in general
MERGE has three different actions: INSERT, UPDATE and DELETE.

The SQL query below returns the rows it has just inserted via the OUTPUT clause, so you don't need to run a separate SELECT * FROM afterwards.

insert into Source_Table
OUTPUT INSERTED.*
values (1,'Diponkar',getdate()), (2,'San',getdate()), (3,'Derek',getdate()), (4,'Paul',GETDATE())

Fig 1: OUTPUT from the Insert                 


Step 4: Data warehouse Full load/Init load

Since we don't have any data in the target table yet, executing the MERGE query below will only trigger the insert part of the merge.

Merge target_Table T
Using Source_Table S
on (S.id=T.id)
When NOT Matched
Then insert (id,ename,EFF_DT,END_DT)
values(S.id,S.ename,GETDATE(),'2999-12-31')
When Matched
Then Update SET T.ename=s.ename
OUTPUT $action, DELETED.*, INSERTED.*;

The OUTPUT clause lets us see which action this MERGE performed.

Fig 2: Merge Action


Figure 2 above clearly shows that only the insert action has taken place in the merge.

Step 5:   Delta load
The source data is a daily delta, so we can delete the old data from the source table and insert two new rows: one with a new key and one with an existing key.

delete from Source_Table

-- updated data with the same business key
insert into Source_Table
values(1,'Dipu2',GETDATE())

-- a completely new row
insert into Source_Table
values(5,'Drina',GETDATE())

Data in the source table is like below:

Fig 3: Delta load



Now, let's run the merge query below and capture the action:

Merge target_Table T
Using Source_Table S
on (S.id=T.id)
When NOT Matched
Then insert (id,ename,EFF_DT,END_DT)
values(S.id,S.ename,GETDATE(),'2999-12-31')
When Matched
Then Update SET T.ename=s.ename
OUTPUT $action, DELETED.*, INSERTED.*;


There should be one insert and one update; let's have a look at the output:

Fig 4: Output after merge applied


Step 6: Data in the target table
As we can see in Fig 5 below, the merge query updated the target table accordingly.

Fig 5: Insert and Update shown
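
Note that the MERGE above only overwrites ename when a key matches, so on its own it does not keep the old version of a row. A common SCD2 pattern (shown here only as a sketch against the same Source_Table and Target_Table, not the exact query used in this post) is to close the current row by setting its END_DT and re-insert the changed row as a new version, using the composable form where an outer INSERT consumes the OUTPUT of the MERGE:

INSERT INTO Target_Table (id, ename, EFF_DT, END_DT)
SELECT id, ename, GETDATE(), '2999-12-31'
FROM
(
    MERGE Target_Table T
    USING Source_Table S
        ON S.id = T.id
       AND T.END_DT = '2999-12-31'        -- only compare against the current version
    WHEN NOT MATCHED THEN
        INSERT (id, ename, EFF_DT, END_DT)
        VALUES (S.id, S.ename, GETDATE(), '2999-12-31')
    WHEN MATCHED AND T.ename <> S.ename THEN
        UPDATE SET T.END_DT = GETDATE()   -- close the old version
    OUTPUT $action AS merge_action, S.id, S.ename
) AS M
WHERE M.merge_action = 'UPDATE';          -- changed rows come back in as new current versions

With this shape, a matched and changed row produces both steps in one statement: the UPDATE closes the previous version and the outer INSERT opens the new one.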

Friday, December 30, 2016

How to fix an SSAS database ID and Name mismatch?


You can run into a situation where your SSAS database ID and Name are mismatched. This happens, for example, when you move a solution from Development to Test and manually change the database Name, but the ID stays the same.

What can happen if the ID and Name are not the same:

1) If you deploy the SSAS project via Visual Studio, it will look as if it is processing, but the cube is never actually updated (you can check the Last Processed property of the database).
2) If you run the processing through an Agent job, it will also appear to work, but there will be no update of the cube. Again, look at the Last Processed timestamp.

In short, your SSAS database will no longer be updated.

How to check whether Name and ID are mismatched?

After deployment, your Analysis Services instance looks like below, where you have both a Tabular and an OLAP cube.

Fig 1: Analysis Server Database


If you look at the Tabular database's properties closely, you can see that the Name and ID are mismatched.

Fig 2: Name and ID mismatch for the Tabular cube




What is the fix?

First, script out the database as shown below, which will generate an .xmla file:
Fig 3: Create script of the Tabular database

The generated .xmla file will look something like below, where you can find the ID and Name.
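
The exact content depends on the model and compatibility level, but as a rough, hedged illustration (the names below are placeholders, not the real database), the top of such a script contains the ID and Name elements like this:

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ObjectDefinition>
    <Database>
      <ID>MyTabularDB_Dev</ID>        <!-- the old ID -->
      <Name>MyTabularDB</Name>        <!-- the visible Name -->
      ...
    </Database>
  </ObjectDefinition>
</Create>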



Second, change the ID and Name in the .xmla script so they match, and press 'F5' to execute the change. If you look at the database now, you will find that the ID and Name match completely.

Fig 4: ID and Name are updated


Now you will be able to deploy your project with the new database name but the same metadata.

Fig 5: Deploy package with new database Name

You may not be happy at this point, since you wanted your original name back. That is simple: open SQL Server Data Tools and, from the project properties, change the Database name as in the screenshot above (Fig 5) back to your original database name, and then deploy; you are all set now. As for the extra database you created along the way, no worries; go ahead and delete it.

Monday, December 5, 2016

How to insert data into a table with an Identity column

In general, you don't insert values manually into an identity column. However, if you do need to insert data into such a table manually, a standard INSERT query will fail with an error:

For example, you have a table (DDL) like below:
  Create table PurchaseOrder
   (orderId int identity(1,1),
    item nvarchar(100))

If you would like to insert data into the table, you might write the standard insert syntax like below:

    Insert into PurchaseOrder(orderId,item)
     values (101,'strawberry')


However, the above query will fail with the error message below:

Fig 1: Error inserting data into the table with Identity column

To insert data into a table with an identity column, you should not supply any value for the identity column, e.g.:

Insert into PurchaseOrder (item)
values ('strawberry')

And then you will see the result set like below:


Fig 2. Output of the above insert command
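
If you also need to know which orderId was generated for the row you just inserted, a small sketch is to read SCOPE_IDENTITY() immediately after the insert in the same batch:

Insert into PurchaseOrder (item)
values ('strawberry');

-- returns the identity value generated by the insert above, within the current scope
SELECT SCOPE_IDENTITY() AS NewOrderId;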

However, sometimes you would like to put an explicit value into the identity column, e.g. a negative orderId to mark an item as invalid (you may need this for data warehouse dimension and fact tables). The SQL for inserting a value of your own choice is:

SET IDENTITY_INSERT PurchaseOrder ON
insert into PurchaseOrder(orderId,item)
values (-1,'An Island')
SET IDENTITY_INSERT PurchaseOrder OFF


Now the result will look like below:

Fig 3: Output after the above SQL command
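
A negative orderId like this is lower than the current seed, so it does not disturb the identity counter. If you want to confirm that after a manual insert, a quick sketch with DBCC CHECKIDENT is below (the RESEED value is only an example of how you would reset the counter if you ever needed to):

-- show the current identity value without changing it
DBCC CHECKIDENT ('PurchaseOrder', NORESEED);

-- reset the counter explicitly (example value only)
-- DBCC CHECKIDENT ('PurchaseOrder', RESEED, 100);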




Thursday, November 24, 2016

Is your Pre-Production database environment aligned with Production?

As a database or BI developer you may get the questions below many times:

1) Does our Pre-Production database exactly match the Production database?
2) How do we ensure the Test environment has good data quality?
3) Does our Test environment deviate much from the Production system? And so on.

To answer those questions you have different ways of investigating the database tables. However, the most obvious first step is to count rows in the two database environments. You may think it will take a huge amount of time to count every table's rows in two environments and list them manually.

Well, you don't need to do this task manually. You can get the row counts of all the tables in a database with a small piece of code.

Run the code below to find the number of rows for each table in the chosen database.

USE AdventureWorks2012
GO
SELECT  @@SERVERNAME AS Server_Name,
        DB_NAME() AS Current_Database,
        SCHEMA_NAME(A.schema_id) + '.' + A.name AS Table_Name,
        SUM(B.rows) AS Row_Count
FROM    sys.objects A
INNER JOIN sys.partitions B ON A.object_id = B.object_id
WHERE   A.type = 'U'              -- user tables only
  AND   B.index_id IN (0, 1)      -- heap or clustered index, so rows are not double-counted
GROUP BY A.schema_id, A.name
ORDER BY Table_Name;



I have used the AdventureWorks2012 database, and the result set looks like below:

Fig 1: result set of the query

In the same way you can get the table row counts from the other database environment. You can then place the results side by side in Excel and easily compare the row counts and the number of tables between the two environments. If you wish to automate this, you can put the SQL code in an SSIS package and load the comparison results into a table.
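
If you prefer to compare inside SQL Server rather than Excel, below is a minimal sketch. It assumes you have already loaded each environment's output into two tables, here called PreProd_RowCounts and Prod_RowCounts (placeholder names, each with Table_Name and Row_Count columns):

SELECT  COALESCE(p.Table_Name, q.Table_Name) AS Table_Name,
        p.Row_Count AS PreProd_Rows,
        q.Row_Count AS Prod_Rows
FROM    PreProd_RowCounts p
FULL OUTER JOIN Prod_RowCounts q ON p.Table_Name = q.Table_Name
WHERE   ISNULL(p.Row_Count, -1) <> ISNULL(q.Row_Count, -1)   -- missing tables or different counts
ORDER BY Table_Name;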



Friday, November 4, 2016

Fix the SSIS Error: AcquireConnection method call to the connection manager (error code 0xC0202009)

I was doing ETL to move data from the staging area to the data warehouse, and while running the data flow task it showed the error: The AcquireConnection method call to the connection manager [DatabaseName] failed with error code 0xC0202009.

      Fig 1: Error code 0xC0202009





Interestingly, the same SSIS package runs without error in Test and Production. After a bit of internet searching I found that a few people suggested changing the debug option from 64-bit to 32-bit, i.e. setting Run64BitRuntime = False, as below:

Fig 2: Change 64bitRuntime from debug options
However, that did not work for me, so I then tried a property called 'DelayValidation'. Open the Data Flow Task's properties and change 'DelayValidation' from FALSE to TRUE.

Fig 3: Change Delay Validation Property

After changing the property you should be able to run the package without error.



Saturday, October 29, 2016

Excel PowerPivot: How to execute a SQL query inside PowerPivot

PowerPivot is one of the best features in Excel. PowerPivot can serve many different purposes, but this post only covers how you can run your own SQL code inside PowerPivot and work with the data in Excel.

Why?


If you use Excel PowerPivot as a front-end tool, then by following this process you can avoid asking a developer for a new Excel sheet every time the data is updated.

How?
Before you start, you need to have PowerPivot enabled in your Excel; if it is, your Excel tabs will look like below:
Fig 1: Different tabs in the excel

If you don't find the 'POWERPIVOT' tab in your Excel, then you need to do a little more work to add it. Please follow the link below to make it work:


Step 1: Make a connection with your data source

Fig 2: Manage option under POWERPIVOT

After you click Manage, a window like the one below opens; from there you have the option to connect to a data source.


Fig 3. Connect with the data source


You need to follow the wizard to complete the connection. 

Step 2: PowerPivot Design

After you establish the connection with the data source, click on the Design tab as below:


                         Fig 4: Design Tab from powerpivot manager

Under the Design tab you will find the options shown in the picture; choose 'Table Properties'.


Fig 5: Design- table properties


Then click Table Properties, where you can write or paste the SQL code that pulls the data directly from the data source.


Fig 6: Add SQL to the table property window

After putting the SQL code in the window above, click the 'Validate' button before you click 'Save'; it will validate your SQL code. When you then press 'Save', the SQL query runs against the database and populates the data in your Excel sheet.
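
Any valid SELECT for the connected database will work in that window. For illustration only (the table and column names below are placeholders, not objects used elsewhere in this post), a query could look like:

SELECT  o.OrderId,
        o.OrderDate,
        c.CustomerName,
        o.OrderAmount
FROM    dbo.Orders o
INNER JOIN dbo.Customers c ON c.CustomerId = o.CustomerId
WHERE   o.OrderDate >= '2016-01-01';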




Tuesday, September 6, 2016

Excel Connection Manager Error: Error Code 0x80004005

I was loading data from a list of Excel files into a SQL database, and the package used an SSIS Foreach Loop container, an Excel source and an OLE DB destination.

While I was running the package, the Excel Connection Manager gave the error shown in the screenshot below (Fig 1).




Fig 1: Error found at excel connection manager



The problem started when I changed the source from a specific file to the whole folder path via the expression window:



                         Fig 2: Expression window from excel connection manager property



The error message was like below:

"Excel Connection Manager"     Description: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.  An OLE DB record is available.  Source: "Microsoft Access Database Engine"  Hresult: 0x80004005 "


I looked into fixing the issue; after googling it, I found that different things can cause the same error code 0x80004005. However, after trying a few different approaches, I found the fix, which is changing the Run64BitRunTime property from TRUE to FALSE.

How to do it?


First, right-click on the project under Solution Explorer and open the property window, which looks like below:



Fig 3: Find out the property window

Now, from the property window, go to Configuration Properties -> Debugging, change the value of Run64BitRunTime from TRUE to FALSE, and click OK to save the changes.
                        

Fig 4. Update property value

It may still show an error when you click the Excel source and edit it; however, if you run the package or the loop container, it will run and save the data to the database.