Sunday, March 20, 2016

Git for Dummies: A basic guide to using Git for your project.

As per the Git website "Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency." [1]

I have worked with Microsoft TFS before, but this is my first time using Git, which I am using for my current project. I am going to write about my experiences with Git and the command set necessary to work with it.

Download and Install Git:

You can download Git from https://git-scm.com/download and then install it by following the wizard. After installing Git, you will see the three items below added as programs.

Fig 1: Git installed on the PC

Among the three, use 'Git Bash' to start with.

Get the source code:
To get the source code from the repository, you need to clone it. For example, if the source code repository is on a network drive or another server, then you first need to clone that code to your own PC.

As you have seen in Fig 1, click Git Bash among the three items, and you will get the window below where you write commands:
Fig 2: Git Bash Window


If your source repository is on a network drive in your organization, e.g. the network drive location is //SourceRepository/Project1, then the command looks like this:

$ git clone //SourceRepository/Project1 myproject

The code from the network drive will be copied into the myproject folder under the directory where you run the command on your local PC.

If you want to clone into a different path, change to that path first and then clone:
Change path:
$ cd /D/myproject
Then clone:
$ git clone //SourceRepository/Project1 myproject

Now the code will be in your folder /D/myproject.

Note that you do not need to run git init after cloning; a cloned repository is already initialized.
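git init is only used when starting a brand-new repository from scratch. A minimal sketch (the folder name is just an example):

$ mkdir newproject
$ cd newproject
$ git init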


Working with Git
Now you are ready to work on your code: you can add new files, and edit or update existing ones.

Add new file:
git add paul.html

Commit file:
git commit -m 'added new page'   # the -m flag attaches a comment to the commit

Check status:
git status

To see the list of files in the folder:

git ls-files

Fig 3: List of the files in the Git folder


It is a good idea to create a branch and then work on that branch, e.g.:

Make your branch:
git branch MyTestBranch

Checkout the branch:
git checkout MyTestBranch

Then you can add new files to your branch or change existing ones:

Create new files:

touch test.pli

touch paul.html

Add those files:
git add .

and commit them:
git commit -m 'two new files are added'


These files are now in my branch but not yet in master.

If you check out master, you will not find those two files:

$ git checkout master


To merge the changes from MyTestBranch into master, you must be on the master branch; then write the command below:
git merge MyTestBranch
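Putting it all together, a typical branch workflow looks like this:

$ git checkout MyTestBranch    # work on the branch
$ git add .                    # stage your changes
$ git commit -m 'my changes'   # commit them on the branch
$ git checkout master          # switch back to master
$ git merge MyTestBranch       # bring the branch's commits into master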

 [1] https://git-scm.com/

Tuesday, February 23, 2016

Common Table Expression (CTE) in MS SQL and Teradata

Common Table Expression (CTE) is SQL syntax that is used in MS SQL, Teradata, and other databases. This post covers why we would use a CTE, and the SQL syntax of a CTE.

By using a CTE, a complex query can be written in a simple, readable, and maintainable way. It is something like a temporary result set, but not exactly a temporary/volatile table; you can avoid a volatile table if you use a CTE. A CTE is like a dynamic view; however, a view's metadata is stored in the database, whereas a CTE doesn't require that.

A recursive CTE can simplify the code required to run a recursive query within a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. For example, in an organization chart you need to display employees' positions in the organization; or, when showing product hierarchies, you need to find the product segment, then the product template, and eventually the product. In both cases a recursive CTE can be helpful to build the hierarchical chart.
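To make this concrete, below is a minimal sketch of a recursive CTE for the organization chart example. The EMPLOYEE table and its columns are hypothetical; note that Teradata requires the RECURSIVE keyword, while MS SQL uses plain WITH.

WITH RECURSIVE CTE_ORG_CHART (EMP_ID, EMP_NAME, MANAGER_ID, LVL)
AS
(
/* seed: start from the top of the organization */
SELECT EMP_ID, EMP_NAME, MANAGER_ID, 1
FROM EMPLOYEE
WHERE MANAGER_ID IS NULL
UNION ALL
/* recursive step: walk down one level at a time */
SELECT E.EMP_ID, E.EMP_NAME, E.MANAGER_ID, C.LVL + 1
FROM EMPLOYEE E
JOIN CTE_ORG_CHART C
 ON E.MANAGER_ID = C.EMP_ID
)
SELECT * FROM CTE_ORG_CHART;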

Coming back to the basics, a simple CTE query will look like this:

with CTE_PRODUCT_X_CUSTOMER (PD_ID, PROD_NAME, CUSTOMER_ID, CUSTOMER_NAME, SALES_DATE)
AS
(
Select P.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, C.Customer_ID as CUSTOMER_ID, C.CUSTOMER_NAME as CUSTOMER_NAME, S.Sales_date as SALES_DATE
From Sales as S
JOIN Product as P
 ON S.PD_ID = P.PD_ID
JOIN Customer as C
 ON S.CST_ID = C.CST_ID
)
select * from CTE_PRODUCT_X_CUSTOMER;

The above example delivers sales information for particular customers. Of course, this can be done without a CTE; a simple join or a view can achieve the same output (see the comparison below). However, the example introduces the CTE syntax.
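For comparison, the same result without a CTE is just the join itself:

SELECT P.PD_ID, P.PROD_NAME, C.CUSTOMER_ID, C.CUSTOMER_NAME, S.SALES_DATE
FROM Sales AS S
JOIN Product AS P ON S.PD_ID = P.PD_ID
JOIN Customer AS C ON S.CST_ID = C.CST_ID;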

Now, let's look at an example where a CTE can replace a volatile/temporary table. Say you need to update Manufacture_id in your PRODUCT_SALES table, and to update the column Manufacture_id you depend on Product_ID matches from other subqueries. The example can go like this:

create volatile table TMP
as
(
Select K.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, Ck.Customer_ID as customer_id, P.EFF_DT as EFF_DT

From Product_STG as P
join Product_KEY as k
on   p.PRD_CODE=k.PRD_CODE
 and p.EFF_DT> '2016-01-29'
JOIN Customer_key as CK
 on CK.SRC_ID=P.Source_Id
 and p.eff_dt>'2016-01-29'

) with data
on commit preserve rows;

 update PRODUCT_SALES
 set Manf_id=(select customer_id from TMP
              where PRODUCT_SALES.PD_ID=TMP.PD_ID
               and  PRODUCT_SALES.EFF_DT=TMP.EFF_DT)

The above example required creating a volatile table. Now let's see how the volatile table can be avoided by using CTE syntax instead:

with CTE_TEMP (PD_ID,PROD_NAME,customer_id,EFF_DT)
AS
(
Select K.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, Ck.Customer_ID as customer_id, P.EFF_DT as EFF_DT

From Product_STG as P
join Product_KEY as k
on   p.PRD_CODE=k.PRD_CODE
 and p.EFF_DT> '2016-01-29'
JOIN Customer_key as CK
 on CK.SRC_ID=P.Source_Id
 and p.eff_dt>'2016-01-29'
)

update PRODUCT_SALES
set Manf_id=(select customer_id from CTE_TEMP
              where PRODUCT_SALES.PD_ID=CTE_TEMP.PD_ID
               and  PRODUCT_SALES.EFF_DT=CTE_TEMP.EFF_DT) 

As you can see, you can avoid creating a temporary table by using a CTE. Please remember, you can't create a view on top of a CTE.



Saturday, January 16, 2016

Teradata Error Code 2652: Operation not allowed, table is being loaded. How to solve it?

I was working with a SAS DI job, using the transformation called Teradata Table Loader to load data into a Teradata table named 'Table1'.

Fig 1: Job  (dummy) with Teradata Table Loader

After running the job I got the error: Operation not allowed: 'Table1' is being Loaded. After investigating a little, I found that you can get this error when loading data into Teradata using the TPT (Teradata Parallel Transporter) utility with Teradata MultiLoad. The Teradata Table Loader property in the job looks like this:

Fig 2: Properties of the Teradata Table Loader transformation

The error means you will not be able to perform any operation on this table; it is locked.

Then I started looking into the solution; so far I have found a few solutions:

1) Connect an empty file to the target table and run the job so the error disappears; then run again with the right source file.
2) Use code to release the lock, syntax is:
       RELEASE MLOAD Table1;
       OR
       RELEASE MLOAD Table1 IN APPLY;
     
3) Drop and recreate the table.

I tried solutions 1 and 2, but they did not work. The DBA killed my SQL session, but it was still the same; that did not fix the problem.

However, solution 3 worked for me. After that, I was able to run the job and load the data into the table.
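For reference, a minimal sketch of solution 3 goes like below; the column list here is hypothetical, so capture your real DDL first (e.g. with SHOW TABLE) before dropping anything.

SHOW TABLE Table1;              /* capture the current DDL first */

DROP TABLE Table1;              /* dropping also removes the load lock */

CREATE TABLE Table1             /* recreate from the captured DDL; columns are hypothetical */
(
 PD_ID INTEGER,
 PROD_NAME VARCHAR(100)
);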

Saturday, December 19, 2015

How to change CSV format from comma(,) delimited to pipe(|) delimited?

By default, when you save an Excel file to .csv, it will be saved as a comma-separated file.
For example, you have data in excel that looks like:

Fig 1: Original format of data

Let's save this in .csv format:

Fig 2: Save as Comma separated file

After saving the file, the data will look like below:

Fig 3: Data after converting into CSV 

However, what if you need it pipe-delimited instead of comma-delimited? Please follow the steps below to save the file as pipe-delimited.

If you are using Windows 10, go to Settings and you will find the screen below:


Fig 4: Settings in Windows 10

Now click the "Time and Language" option; you will then see the screen below, where you need to click the "Additional time & regional settings" button.


Fig 5: Related Setting area
Now you will find the option below:

Fig 6: One more click
If you click "Change date, time and number formats" under the Region section, you will find the window below (left); if you then click "Additional settings", the right-side window will pop up.

Fig 7: Additional settings
Finally, you need to change the List separator from comma (,) to pipe (|).

Fig 8: List separator
And now, if you save the same Excel file as CSV, it will look like below:

So you have achieved your goal. However, please remember to change the setting back if you want your data comma-delimited again.

Thursday, November 12, 2015

MS SQL: How to insert data into a table from Excel

In one of my previous posts I described how to insert data from an Excel/CSV file into a Teradata table (URL: http://pauldhip.blogspot.dk/2015/02/insert-data-from-excel-to-terdata-table.html). This time, I am going to explain how to do that in MS SQL.

Let's say we have a table in the database named Customer_dim, which has the following columns:

CREATE TABLE [dbo].[Customer_dim](
[Customer_id] [nchar](10) NULL,
[Customer_First_Name] [nchar](10) NULL,
[Customer_Last_Name] [nchar](10) NULL,
[Marital_Status] [nchar](10) NULL
) ON [PRIMARY]

The table needs to be filled with data from Excel. To do so, we need to follow these steps:


Step 1: Save MS excel file into CSV
The Excel file should have the same number of columns as the table.

Fig 1: Data in the excel

Now save the Excel data as a .csv file.
Fig 2: Save as .csv





Step 2: Write code to insert data from CSV

BULK INSERT [dbo].[Customer_dim] FROM 'C:\BI Projects\Blog\Dim_Customer_CSV.csv'
WITH (
    CHECK_CONSTRAINTS,
    DATAFILETYPE='widechar',
    FIELDTERMINATOR='|',
    ROWTERMINATOR='\n',
    KEEPIDENTITY,
    TABLOCK
);

Since my CSV file is saved pipe-delimited, the SQL code above has FIELDTERMINATOR='|'. You need to change that to match your CSV format, and you also need to change the CSV file location accordingly.

After running the SQL code above, you should be able to see the records below inserted into the table.

Fig 3: Result set after inserting the data
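As a side note, if your file had instead been saved with the default comma delimiter, a minimal variant of the statement might look like this:

BULK INSERT [dbo].[Customer_dim] FROM 'C:\BI Projects\Blog\Dim_Customer_CSV.csv'
WITH (
    DATAFILETYPE='widechar',
    FIELDTERMINATOR=',',   -- comma-delimited file
    ROWTERMINATOR='\n'
);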



Sunday, October 18, 2015

Teradata: How to check that DDLs (table structures) are in sync among different environments (Development/Test/Production)?

The title of this post is best explained with an example. Suppose your project involves populating data from source to target for ten (10) new tables. Then you have to create the DDL for those ten tables in at least three different environments (Development/Test/Production). You start in the Development environment, develop the jobs, and populate data into the target area; when everything works fine, you take the DDL from Development to Test. If a tester or the business then finds that you need to add a new column or change a data type, you need to make the change. Moreover, your go-live date may be close, so the same DDL that went into Test may also be deployed to Production. This means that for any change caught by the tester or by you, you now need to change the SQL in three different environments to keep them consistent. Of course you follow a change order to keep track of this, but as a developer you will prefer to be able to check whether all three environments' table structures look the same.

If your database is Teradata, you have options to check across environments: you can compare tables, table columns, and column data types, so you can quickly find out whether you have any inconsistency between environments.

The code below will help you find the differences between the Test and Production environments.

select
upper(trim(tablename)) Table_name,
upper(trim(columnname)) Column_name,
upper(trim(ColumnFormat)) Column_format
from dbc.columns
where upper(trim(tablename)) in ('CUSTOMER','CUST_ADDRESS','GEOGRAPHY','ADDRESS_X_GEO','CUSTOMER_X_PRODUCT','CUSTOMER_X_SALES')
and upper(trim(databasename))='DWH_T'
except
select
upper(trim(tablename)) Table_name,
upper(trim(columnname)) Column_name,
upper(trim(ColumnFormat)) Column_format
from dbc.columns
where upper(trim(tablename)) in ('CUSTOMER','CUST_ADDRESS','GEOGRAPHY','ADDRESS_X_GEO','CUSTOMER_X_PRODUCT','CUSTOMER_X_SALES')
and upper(trim(databasename))='DWH_P';
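Note that EXCEPT is one-directional: the query above only returns entries that exist in Test (DWH_T) but are missing or different in Production (DWH_P). To catch differences the other way around, run the same query with the two database names swapped:

select
upper(trim(tablename)) Table_name,
upper(trim(columnname)) Column_name,
upper(trim(ColumnFormat)) Column_format
from dbc.columns
where upper(trim(tablename)) in ('CUSTOMER','CUST_ADDRESS','GEOGRAPHY','ADDRESS_X_GEO','CUSTOMER_X_PRODUCT','CUSTOMER_X_SALES')
and upper(trim(databasename))='DWH_P'
except
select
upper(trim(tablename)) Table_name,
upper(trim(columnname)) Column_name,
upper(trim(ColumnFormat)) Column_format
from dbc.columns
where upper(trim(tablename)) in ('CUSTOMER','CUST_ADDRESS','GEOGRAPHY','ADDRESS_X_GEO','CUSTOMER_X_PRODUCT','CUSTOMER_X_SALES')
and upper(trim(databasename))='DWH_T';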


Sometimes it is challenging to verify that the table structures (DDL) for a particular project are the same across the different data warehouse environments. An inconsistent table structure, especially between Test and Production, can lead to ETL job failures in Production.

As a precaution, by running the code above you can verify whether your Production DDL differs from the Test DDL where your ETL jobs ran fine, so you can fix the DDL before the jobs fail in Production.

Saturday, September 26, 2015

Developing jobs via SAS DI Studio: What is SQL Pass-Through and why would you use it?

If you are an ETL developer using SAS DI Studio as your ETL tool, this post can be interesting for you. The SQL Pass-Through Facility is part of the SQL procedure; it enables a developer to send DBMS-specific statements to a DBMS and to retrieve data from the database.
You can specify database SQL syntax instead of SAS SQL syntax when you use the Pass-Through Facility.

What is SQL Pass-Through?
When we build a job via SAS DI Studio and use PROC SQL to perform operations in the database, the SQL statements communicate directly with the database, all operations run on the database side, and only the final result is brought back to SAS. To activate this feature, set the SAS DI Studio property SQL Pass Through to 'Yes'.

Fig 1: How to activate SQL Pass through



Why would you use SQL Pass-Through?

1) It is handy when you use database-specific features like QUALIFY, RANK, etc.
2) If your SQL query contains calculations, they run on the database, so it is faster.
3) Very little processing time is spent on the SAS side.
4) If you use stored procedures or SQL macros in your ETL job, then SQL Pass-Through will surely pay off.
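To illustrate what explicit pass-through looks like in code, below is a minimal sketch; the server, credentials, database, table, and column names are all hypothetical.

proc sql;
   connect to teradata (server=tdprod user=myuser password=mypass);
   create table work.top_products as
   select * from connection to teradata
   (
      /* this inner query runs entirely on the Teradata side */
      select PROD_NAME, SALES_AMT
      from DWH_P.PRODUCT_SALES
      qualify rank() over (order by SALES_AMT desc) <= 10
   );
   disconnect from teradata;
quit;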