Sunday, May 7, 2017

Exporting data as a flat file with SSIS: step-by-step guideline


Exporting data from a database to a comma-separated file (.csv) is a straightforward task with SSIS.

As an example, I need to export reseller information from AdventureWorksDW2012 and generate a .csv file with SSIS.

The SSIS Toolbox includes a) the Data Flow Task, b) the OLE DB Source, and c) the Flat File Destination; to complete the exercise we need those two data flow components and one task.

Fig 1: SSIS Toolbox (Data flow Task)



Fig 2: SSIS Toolbox
In the OLE DB Source you write the SQL query that produces the result set; in this case, I have the SQL below:

  SELECT
       [ResellerName]
      ,[NumberEmployees]
      ,[OrderFrequency]
      ,[OrderMonth]
      ,[FirstOrderYear]
      ,[LastOrderYear]
      ,[ProductLine]
      ,[AddressLine1]
      ,[BankName]
      ,[MinPaymentType]
      ,[YearOpened]
  FROM [AdventureWorksDW2012].[dbo].[DimReseller]

In your package, drag in a Data Flow Task and double-click it; then, inside the data flow, drag and drop the two components: the OLE DB Source and the Flat File Destination.

Fig 3: Under Data Flow task

In the design above, 'Populate Data' is the OLE DB Source and 'Destination CSV' is the Flat File Destination.

Once 'Populate Data' is connected to 'Destination CSV', we need to configure 'Destination CSV', the Flat File Destination. It must be attached to a Flat File Connection Manager, as shown below:

Fig 4: Flat file destination is connected with the Flat File Connection Manager

The Flat File Connection Manager needs at least a connection manager name and the destination file path and name. You can also choose among different delimiters, such as comma, semicolon, etc.

Fig 5: Configure flat file connection


Now, from your package, you can run the data flow task "Generate CSV for Reseller" by right-clicking it.

Fig 6: Execute Data flow task



You should then find the .csv file in the location you specified in the Flat File Connection Manager.
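For comparison, the kind of comma-separated output the Flat File Destination produces (comma delimiter, column names in the first row) can be sketched outside SSIS; the Python below uses made-up rows and a subset of the columns standing in for the query result:

```python
import csv

# Hypothetical rows standing in for the DimReseller query result
columns = ["ResellerName", "NumberEmployees", "YearOpened"]
rows = [
    ("A Bike Store", 12, 1990),
    ("Progressive Sports", 30, 1984),
]

with open("reseller.csv", "w", newline="") as f:
    writer = csv.writer(f)     # default delimiter is a comma
    writer.writerow(columns)   # header row, like "column names in the first data row"
    writer.writerows(rows)
```

The resulting file has one header line followed by one comma-delimited line per row, which is exactly the shape of the .csv the package generates.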

Fig 7: CSV result set at the specified location


Friday, April 21, 2017

SSIS Tricks: Controlling whether the next SSIS task executes

Your SSIS package may require control logic that decides whether the next SSIS tasks execute or not. I have an example where the Data Flow Task that loads the fact table only starts if an Execute SQL Task returns TRUE.

Here, as an example, I have a sequence container that looks like below:
Fig 1: Execute SQL task and  Data flow task

Let's look at the code in the Execute SQL Task named "Control Fact Load":

Fig 2: SQL query for SQL Execute Task

The SQL code in the above diagram:

SELECT CASE WHEN MAX([TimeKey]) < CONVERT(varchar(8), GETDATE(), 112)
            THEN 'Y'
            ELSE 'N'
       END AS GoToNextStep
FROM [dbo].[Fact_XXX]

The query above checks whether today's data already exists in the fact table. If the fact table's latest "TimeKey" is earlier than today's date, we return 'Y': no data has been loaded for today yet. If today's date already exists, the data has already been loaded into the fact table, so we return 'N' and the next SSIS task will not execute.
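The same decision can be sketched in plain Python for illustration (the function and its name are mine, not part of the package; TimeKey is assumed to be a yyyymmdd value, matching style 112 of CONVERT):

```python
from datetime import date

def go_to_next_step(max_time_key, today=None):
    """Mirror of the control query: return 'Y' when the fact table's
    latest TimeKey (a yyyymmdd string) is earlier than today, meaning
    today's load has not run yet; otherwise return 'N'."""
    today = today or date.today().strftime("%Y%m%d")
    return "Y" if max_time_key < today else "N"

go_to_next_step("20170420", today="20170421")  # 'Y': fact table is behind
go_to_next_step("20170421", today="20170421")  # 'N': today already loaded
```

String comparison works here because yyyymmdd strings sort the same way as the dates they represent.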

The return value of the query needs to be saved in a variable, so the user variable GoToNextStep is created, as below:
Fig 4: User variable created
The value is then assigned to the variable on the Execute SQL Task's Result Set page:

Fig 5: Binding value to the variable


The control between the two tasks is implemented with a precedence constraint.

Fig 6: Precedence Constraint Editor

Open the Precedence Constraint Editor (right-click the connector between the two tasks) and set the values as below:
Evaluation operation: Expression and Constraint
Value=Success
Expression: @[User::GoToNextStep]=="Y"

Now, the Data Flow Task (Load Data To Target table) will only execute if the preceding Execute SQL Task (Control Fact Load) returns "Y". If 'N' is returned, the Data Flow Task will not execute at all.



Sunday, April 2, 2017

How to automate SFTP file transfer in Windows

When it comes to loading files to an FTP/SFTP server, we typically use a free client like WinSCP or FileZilla. That process is completely manual. If your business needs to upload files repeatedly, the process should be automated.

This post covers how that can be done in Microsoft Windows.
For example, suppose you have a file called UploadMe.txt located in C:\temp and you would like to upload it to the SFTP server. Before we start scripting, let's see how we do it in WinSCP.

Fig 1: WinSCP  User Interface



As we can see, we need the following information to connect to the FTP/SFTP server (example values shown):

Host name: ftp.myftp.com
Port number: 22
User name: myuser
Password: Password

Once connected to the SFTP server, you can drag files from your local machine and drop them onto the server.
Fig 2: Moving file from local PC to SFTP Server

How do we do the same thing by using script?

Step 1: Install WinSCP: go to https://winscp.net/eng/download.php and download it.
       
Fig 3: Download WinSCP 

Step 2: Write a script that connects to the SFTP server and moves the file from your machine to the server. The script should cover at least the following:

a) Opening a connection to the SFTP server; the syntax below opens the connection:
open sftp://myuser:Password@ftp.myftp.com:22


b) Syntax for copying data from local machine to FTP server:
put -nopermissions -nopreservetime "C:\temp\test_data.txt"  test_script.txt

Let's put the full script in one place and save it as 'uploadscript.txt':

option echo off
option batch on
option confirm off
open sftp://myuser:Password@ftp.myftp.com:22
#Change LOCAL directory
lcd "C:\temp\"
#copy an individual file
put -nopermissions -nopreservetime "C:\temp\test_data.txt"  test_data.txt
exit


Before we run the script through a .bat file, let's test from the Windows command line that we can connect to the SFTP server and load the file.

Fig 4: Windows command line
Change the directory to the WinSCP installation folder so we can use the WinSCP executables:

Fig 5: Accessing WinSCP file

Run the script, which connects to the SFTP server and copies the file to it.

Fig 6: Make connection and move file

As you can see from the above screenshot (fig 6), the SFTP connection is established and the file upload starts.

The process above was done through the Windows command line to verify that we can establish the connection and transfer the file. Since we want it automated, we create a batch file that runs the same script ('uploadscript.txt') we just ran by hand. The last step is to schedule the batch file via Windows Task Scheduler; your files will then upload to the SFTP server automatically on schedule.
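The batch file itself is only a one-line configuration sketch; the WinSCP installation path below is an assumption (it depends on where WinSCP was installed), and the /log switch is optional:

```bat
REM upload.bat -- run the WinSCP script non-interactively
"C:\Program Files (x86)\WinSCP\WinSCP.com" /script="C:\temp\uploadscript.txt" /log="C:\temp\upload.log"
```

To schedule it, use the Task Scheduler GUI or an equivalent schtasks command, for example: schtasks /Create /TN "SFTP upload" /TR "C:\temp\upload.bat" /SC DAILY /ST 02:00 (the task name and start time here are just examples).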















Sunday, March 26, 2017

How to fix your corrupted SSAS database?

If your SSAS database is corrupted, then you can either 1) restore the database or 2) delete the existing SSAS database and then process the cube to load the fresh data. 

I would like to share the second one: Delete the existing database and then process the SSAS cube to load the fresh data.

For example, say my SSAS database 'HelloWorld' got corrupted and I would like to fix it; the following steps should be taken.

Step 1: Delete the existing database


Fig: Find your corrupted database

Hit the delete button

Fig 2: Delete the database
Choose any one of the options that you prefer:

Fig 3: 



With the database deleted, we can open the SSAS project and hit the Process button to deploy and process it, as below:


Fig 4: Deploy the SSAS Cube


However, you may be unfortunate and run into an error like this:

Fig 5: Deployment error

Step 2:  Delete the .xml file from data directory

To fix the error, we need to delete the database's .xml file from the SSAS data directory. Depending on your Windows version (32/64-bit) and instance, the data directory will be something like:
C:\Program Files\Microsoft SQL Server\MSAS11.TABULAR\OLAP\Data


The XML file name follows the format <CubeID>.<VersionNum>.db.xml; in this case, HelloWorld 1.0.db.xml. You should also delete the folder with the same name: HelloWorld 1.0.db.

Fig 6: Delete the .xml file



Step 3: Restart the SSAS service

Restart the SSAS service.
Fig 7: Restart the SSAS service

Step 4: Run the SSAS package

You are all set to process the cube. Open your SSAS solution in SQL Server Data Tools and process the cube; it will load the fresh data structure and data, and your database will be back and functional as before.





Thursday, February 9, 2017

Measures (Calculations) in SSAS Tabular

When you work with SSAS Tabular or OLAP, you have the option to create measures. In my experience, creating calculations in SSAS OLAP is easier than in Tabular. I have recently built an SSAS Tabular cube, and I am going to share how to create measures/calculations in the Tabular model.

An SSAS Tabular project looks like the figure below; I have marked the measures/calculation area.

Fig 1: Measures/Calculation area.

How do we make Measures?

Select any cell in the measure grid (fig 1); then, just as in Excel, you can enter the calculation or formula in the formula bar, as in fig 2 below (yellow shaded).

Fig 2: Calculation bar
Here, I am going to calculate how many products were sold in the Canada region, so I count the rows where ProductKey is not blank, filtering the sales territory to Canada.

The measure name is Prod_Cnt_Canada (product count in the Canada region):

Prod_Cnt_Canada:=COUNTAX(FILTER(FactInternetSales;[SalesTerritoryKey]=6);[ProductKey])

The example above counts the number of non-blank ProductKey values over the FactInternetSales table that results from filtering on SalesTerritoryKey = 6 (Canada).

The COUNTAX function counts the rows where the expression is non-blank, and the FILTER function applies the filter.
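As a side note, the same measure can be written with CALCULATE and COUNTROWS; this is a sketch I have not run against this model:

```dax
Prod_Cnt_Canada := CALCULATE (
    COUNTROWS ( FactInternetSales ),
    FactInternetSales[SalesTerritoryKey] = 6
)
```

COUNTROWS counts every filtered row, while COUNTAX skips rows where ProductKey evaluates to blank, so the two agree unless ProductKey can be blank. Also, the semicolon argument separators in the original formula come from regional settings; comma-separated locales would write FILTER(FactInternetSales, [SalesTerritoryKey]=6).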

The result of the calculation is shown in figure 3 below:

Fig 3: Measure after the calculation



Wednesday, January 18, 2017

MERGE in SQL is a good friend for the Data Warehouse

If you need an SCD2-type table, where you would like to keep history, then SQL MERGE can be your good friend. You could produce history in different ways, but SQL MERGE is clean: it is one place for inserting and updating the data.

Let's start with a very simple example. In a data warehouse, many tables need to hold history: you take a full load the first time and then load a delta each day. A data warehouse has many layers, but my example uses just a source and a target table.

Step 1: Creating Source Table
First make a source table:
Create table Source_Table (id int, ename varchar(50), UpdateDate DateTime);

Step 2: Creating Target table
We need a target table to load the data into:

Create table Target_Table (id int, ename varchar(50),EFF_DT date, END_DT date)

Step 3:  Merge Actions in general
MERGE supports three different actions: INSERT, UPDATE, and DELETE.

The query below returns the rows just inserted into the table via the OUTPUT clause, so you don't need a separate SELECT * FROM:

insert into Source_Table
OUTPUT INSERTED.*
values (1,'Diponkar',getdate()), (2,'San',getdate()), (3,'Derek',getdate()), (4,'Paul',GETDATE())

Fig 1: OUTPUT from the Insert                 


Step 4: Data warehouse Full load/Init load

Since we don't have any data in the target table yet, executing the MERGE query below will only trigger the insert branch of the merge.

Merge target_Table T
Using Source_Table S
on (S.id=T.id)
When NOT Matched
Then insert (id,ename,EFF_DT,END_DT)
values(S.id,S.ename,GETDATE(),'2999-12-31')
When Matched
Then Update SET T.ename=s.ename
OUTPUT $action, DELETED.*, INSERTED.*;

And we can see the action of this merge:

Fig 2: Merge Action


Figure 2 above clearly shows that only the INSERT action took place.

Step 5:   Delta load
Source data is a daily delta, so we can delete the old data from the source table and insert two new rows: one with a new key and one with an existing key.

delete from Source_Table

-- updated data with the same business key
insert into Source_Table
values(1,'Dipu2',GETDATE())

-- completely new row
insert into Source_Table
values(5,'Drina',GETDATE())

The data in the source table now looks like this:

Fig 3: Delta load



Now, let's run the merge query again and catch the action:

Merge target_Table T
Using Source_Table S
on (S.id=T.id)
When NOT Matched
Then insert (id,ename,EFF_DT,END_DT)
values(S.id,S.ename,GETDATE(),'2999-12-31')
When Matched
Then Update SET T.ename=s.ename
OUTPUT $action, DELETED.*, INSERTED.*;


There should be one insert and one update; let's look at the output:

Fig 4: Output after the merge is applied


Step 6: Data in the target table
As we can see in fig 5 below, the merge query updated the target table accordingly.

Fig 5: Insert and Update shown
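For illustration, the matched/not-matched control flow of the MERGE above can be simulated in a few lines of Python (a sketch, not T-SQL: the dictionary stands in for the target table, and the history columns are omitted):

```python
def merge(target, source_rows):
    """Simulate the WHEN MATCHED / WHEN NOT MATCHED branches of T-SQL
    MERGE. target maps id -> ename; source_rows is a list of
    (id, ename) tuples. Returns the $action list MERGE would OUTPUT."""
    actions = []
    for sid, ename in source_rows:
        if sid in target:                 # WHEN MATCHED -> UPDATE
            target[sid] = ename
            actions.append(("UPDATE", sid))
        else:                             # WHEN NOT MATCHED -> INSERT
            target[sid] = ename
            actions.append(("INSERT", sid))
    return actions

target = {}
# Full load: the target is empty, so every source row triggers an INSERT
full_load = merge(target, [(1, "Diponkar"), (2, "San"), (3, "Derek"), (4, "Paul")])
# Delta load: id 1 already exists (UPDATE), id 5 is new (INSERT)
delta_load = merge(target, [(1, "Dipu2"), (5, "Drina")])
```

Running the full load and then the delta reproduces the $action output above: four INSERTs, then one UPDATE and one INSERT.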

Friday, December 30, 2016

How to fix an SSAS database ID and Name mismatch?


You can run into a situation where your SSAS database ID and Name are mismatched. This can happen when you move a solution from development to test and manually change the database name: the Name changes, but the ID stays the same.

What can happen if the ID and Name are not the same:

1) If you deploy the SSAS project via Visual Studio, it will look like it is processing, but you will not find an updated, processed cube (check the Last Processed property of the database).
2) If you run the processing job via a job agent, it will appear to work, but the cube will not actually be updated; again, check the Last Processed timestamp.

In other words, your SSAS database will no longer be updated.

How to check whether Name and ID are mismatched

After deployment, your Analysis Services databases look like below, with both Tabular and OLAP cubes.

Fig 1: Analysis Server Database


If you look closely at the Tabular database's properties, you can see that the Name and ID are mismatched.

Fig 2: Name and ID mismatch for the Tabular cube




What is the fix?

First, script the database as shown below, which generates an .xmla file:
Fig 3: Create script of the Tabular database

The generated .xmla file will look like below, where you can find the ID and Name elements.



Second, change the ID and Name in the .xmla script and press 'F5' to execute the change. Now, if you look at the database, the ID and Name match.
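For illustration only (a trimmed sketch, not a full generated script), the part of the .xmla file to edit looks roughly like this:

```xml
<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ObjectDefinition>
    <Database>
      <ID>HelloWorld</ID>      <!-- edit this -->
      <Name>HelloWorld</Name>  <!-- to match this -->
      <!-- rest of the database definition elided -->
    </Database>
  </ObjectDefinition>
</Create>
```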

Fig 4: ID and Name is updated


Now you can deploy your project with the new database name but the same metadata.

Fig 5: Deploy package with new database Name

You may not be happy at this point, since you wanted your original name back. It's simple: open SQL Server Data Tools and, in the solution properties, change the database name as in the screenshot above (Fig 5) back to your original name, then deploy; you are all set. You may also want to clean up the extra database you created; go ahead and delete the old one.