Wednesday, May 13, 2015

A newly developed job in SAS DI Studio: why is the target table not found after check-in?

When you start working with a new tool, you may find it buggy, or at least assume so; in most cases it is actually you who doesn't yet know all the features of the tool. In a few cases, however, the tool does have a known issue or fails to report an error message properly.

If you are new to SAS DI, I am sure you will find this post interesting.

Let's start with an ETL job that gets data from an external file and loads it into a Teradata table.



       Fig 1: An example job that gets data from an external file and loads it into a Teradata table


When the job is completed, we check it in so that other developers can review it and it can be deployed to Production.

The Check In window looks like below:


                                        Fig 2:  Check In Window



After clicking Check In, you will go through the following three steps to check in the job.

Fig 3: Title and description while checking in.

  Fig 4: Verify the objects that you are checking in

Fig 5: Summary of Check In Wizard

After completing the check-in, when you try to open the job you will find the error below:
Fig 6: Missing Objects

Looking at the information message, you can't tell what has changed until you observe that your job is missing two objects: one is the external source file and the other is the target Teradata table.

But you have checked in the job, so why are those objects missing?

Well, checking in the job alone does not check in all its objects automatically. You must also check in the source file and the target table; then you will find the job as it should be.

Tip: It's always a good idea to check in even if the job is not completed, and to check in all the objects related to the job.

Thursday, April 30, 2015

Window function: Rows between Preceding and Preceding in Teradata

As a continuation of the earlier post, Maintain Historical and Current data in data Warehouse, I am going to show how to implement it in a data warehouse.

First we create a table that holds customer information; assume the table 'CUSTOMER' is part of an OLTP database. The table doesn't have any END date column, only an effective date (EFF_DT).

create volatile table Customer
(
 Customer_id int,
 CUSTOMER_NAME varchar(120),
 Address varchar(200),
 EFF_DT date
) on commit preserve rows;

And Insert data into the table.

INSERT INTO Customer VALUES ( 101, 'PAUL','Bondhev 50', date '2015-01-03' );
INSERT INTO Customer VALUES ( 101,'PAUL','Bondhev 51', date '2015-03-01'  );
INSERT INTO Customer VALUES ( 101,'PAULA','Bondhev 51', date '2015-04-01');
INSERT INTO Customer VALUES ( 101, 'PAULA','Bondhev 52', date '2015-04-02');
INSERT INTO Customer VALUES ( 101, 'PAULAS','Bondhev 52', date '2015-04-10' );
INSERT INTO Customer VALUES ( 102, 'MADS','Borupvang 52', date '2015-01-03' );
INSERT INTO Customer VALUES ( 103, 'LARS','Hellerup 52', date '2015-04-01' );

INSERT INTO Customer VALUES ( 202, 'DAN','Oxford street 30', date '2014-04-10' );
INSERT INTO Customer VALUES ( 202, 'DANS','Montix road 52', date '2014-04-10');
INSERT INTO Customer VALUES ( 202,'DANS','boulevard 52', date '2014-04-10' );

INSERT INTO Customer VALUES ( 202, 'DANSA','boulevard 52', date '2014-04-20' );

The data looks like the table below:


Fig 1: Customer table in OLTP database

As discussed in the earlier post, when we load data from the OLTP database into the data warehouse, we add END_DT at load time.

From the above data, customer id 101 has more than one entry; the customer changed his name and address a few times. To add the END date you need to do the following:

  For each row, take the EFF_DT of the preceding row (with rows ordered by EFF_DT descending) and set it as that row's END_DT.

e.g. customer id 101's data is arranged in descending order as below; the lowest EFF_DT is 03-01-2015.


Fig 2: End date manages in Data Warehouse

We look at the EFF_DT of the previous row, which is 01-03-2015, and make that the END_DT for this row.

And we do the same for the other rows to find their END_DT.


The SQL code to implement the above looks like below:

                                Fig 3: Preceding and Preceding to generate End Date

"ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING" is used in an ordered analytical function to tell it to include only the immediately preceding row of the partition in the calculation being performed.

If you look at the above data (Fig 2), the most recently updated row (EFF_DT = 10-04-2015) doesn't have any end date. The window function (Fig 3) looks for a previous row, and since there is none, it returns NULL.

We can use COALESCE to replace the NULL with the high date '9999-12-31', marking the row as the active one.

SELECT CUSTOMER.*,
       COALESCE(MAX(EFF_DT) OVER (PARTITION BY Customer_id
                                  ORDER BY EFF_DT DESC
                                  ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
                date '9999-12-31') AS END_DT
FROM CUSTOMER;

Final output should look like below:


Fig 4: Latest row updated with the open-ended end date
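Teradata's volatile tables aside, the frame clause itself is standard SQL, so the derivation above can be checked on any engine with window functions. Here is a minimal runnable sketch using Python's sqlite3 (SQLite 3.25+ supports the same ROWS BETWEEN frame); the dates are stored as ISO strings, which sort correctly as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Customer_id INTEGER,
    Customer_name TEXT,
    Address TEXT,
    EFF_DT TEXT            -- ISO date strings compare correctly as text
);
INSERT INTO Customer VALUES
    (101, 'PAUL',   'Bondhev 50', '2015-01-03'),
    (101, 'PAUL',   'Bondhev 51', '2015-03-01'),
    (101, 'PAULA',  'Bondhev 51', '2015-04-01'),
    (101, 'PAULA',  'Bondhev 52', '2015-04-02'),
    (101, 'PAULAS', 'Bondhev 52', '2015-04-10');
""")

# MAX over a one-row frame (the preceding row in EFF_DT-descending order)
# picks up the next-later effective date; the newest row has no preceding
# row, so the frame is empty, MAX returns NULL, and COALESCE fills 9999-12-31.
rows = conn.execute("""
SELECT Customer_id, EFF_DT,
       COALESCE(
           MAX(EFF_DT) OVER (PARTITION BY Customer_id
                             ORDER BY EFF_DT DESC
                             ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING),
           '9999-12-31') AS END_DT
FROM Customer
ORDER BY EFF_DT
""").fetchall()

for cid, eff, end in rows:
    print(cid, eff, end)
```

Each row's END_DT is the following row's EFF_DT, and only the newest row gets 9999-12-31, matching Fig 4.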


Sunday, April 19, 2015

Maintain Historical and Current data in data Warehouse

When you want to keep history in a data warehouse, you need to maintain END_DT along with EFF_DT in the table to tell active rows from inactive rows.
For example, the source table Customer looks like below:

                                     
                                       Fig: 1.1 Source table ‘Customer’

The above source table doesn't have any end date, but when you move the data from source to target you need to add one. As we can see from the data, we have two customers, each with several rows; sometimes they change address and sometimes name.

When you load data into the target table, only one row per customer should be active, marked by end_date '9999-12-31'; every other row for that customer gets the previous row's EFF_DT as its end_date. The data should look like below:


 
Fig: 1.2 How End_date works in  ‘Customer’ table

Why do we care about the end date (END_DT) in a data warehouse?

In a data warehouse we initially take a full load and then take the delta (each day's changes) every day. You may ask: can't we keep an exact copy of the source table? Do we really need to add an end date (END_DT)?
The simple answer is YES: one of the reasons for having a data warehouse is to isolate active data from historical data, and END_DT is what makes that possible.

Take an example: if you want to query the latest attributes of customer_id = 101, how will you find the customer's latest information? If you run a query like below:

Select * from Customer
where customer_id=101;

--Result: you will find all five rows like above

And to get the latest information for the customer 'PAUL', run the query below:

Select * from Customer
Where customer_id=101 and current_date between EFF_DT and END_DT;

Result:

Customer_id  CUSTOMER_NAME  Address     EFF_DT      END_DT
101          PAULAS         Bondhev 52  10-04-2015  31-12-9999

             Fig 1.3: Find out the latest information of the customer
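The as-of filter above can be sketched outside Teradata as well. Below is a minimal illustration using Python's sqlite3, with a fixed as-of date standing in for Teradata's CURRENT_DATE and the table contents following Fig 1.2 (dates here are ISO strings rather than the dd-mm-yyyy display format):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Customer_id INTEGER, Customer_name TEXT, Address TEXT,
    EFF_DT TEXT, END_DT TEXT);
INSERT INTO Customer VALUES
    (101, 'PAUL',   'Bondhev 50', '2015-01-03', '2015-03-01'),
    (101, 'PAUL',   'Bondhev 51', '2015-03-01', '2015-04-01'),
    (101, 'PAULA',  'Bondhev 51', '2015-04-01', '2015-04-02'),
    (101, 'PAULA',  'Bondhev 52', '2015-04-02', '2015-04-10'),
    (101, 'PAULAS', 'Bondhev 52', '2015-04-10', '9999-12-31');
""")

as_of = '2015-05-01'   # in Teradata this would be CURRENT_DATE

# Only the active row has a 9999-12-31 end date, so the BETWEEN filter
# returns exactly one row per customer.
row = conn.execute("""
SELECT Customer_name, Address FROM Customer
WHERE Customer_id = 101 AND ? BETWEEN EFF_DT AND END_DT
""", (as_of,)).fetchone()
print(row)
```

The single row returned is the latest version of the customer, without any MAX/subquery gymnastics.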

Without END_DT, your data warehouse behaves just like an OLTP database, where you have to write complex queries to find the latest information.


Next we will see how to populate END_DT from EFF_DT using the window function ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING (coming in the next post).

Monday, March 23, 2015

ETL Job via SAS DI Studio: Why are the Precode and Postcode properties of a job disabled?

If you are an ETL developer working with SAS DI Studio, a bad day could start like this: you are not able to insert code in the Precode and Postcode boxes of your job properties, because the check boxes for Precode and Postcode are disabled. It may seem like nonsense; you have never seen this kind of error before, right? The Precode and Postcode section of the job properties looks like below:


                       Fig 1: Precode and Postcode of the Job Property is disabled.

What can go wrong?

If you look at the picture below, you will find three different code generation modes.


                         Fig 2: Three different code generation modes

Generally, the code generation mode 'Automatic' is selected by default.


                                        Fig 3: By default 'automatic' is selected.

If we want to look at user-defined code, we usually select 'All user written', which looks like below:
                                    Fig 4: Code generation mode 'All user written' selected.

If, by mistake, you choose 'All user written' and click 'Yes' (see the image below), the code generation mode will be changed and saved.


                                Fig 5:Saving code generation mode 'All user written'.

In this situation, if you open the properties and want to insert Precode and Postcode for the job, you will not be able to; the option is disabled for the job.

How can you enable Precode and Postcode?

To enable Precode and Postcode, change the code generation mode from 'All user written' back to 'Automatic' and save it.

Then you will be able to add Precode and Postcode to the job, and the properties screen will look like below:


                                        Fig 6: Enabled Precode and Postcode




Friday, March 6, 2015

SQL Command: WITH Clause (CTE)

Rather than writing jargon, I will go straight to the code:

WITH EMP_HIER(EMP_ID,MGR_ID,EMP_NAME)
AS
(Select
EMP_ID,MGR_ID,EMP_NAME
from V_Employee)
 
Select * from EMP_HIER

When I first saw the above code, the first thing that came to mind was: the WITH clause is used to make a derived table 'EMP_HIER' from the source table 'V_Employee', and it must be used instead of making a volatile table.

And then I tried to run only the first part:
WITH EMP_HIER(EMP_ID,MGR_ID,EMP_NAME)
AS
(Select
EMP_ID,MGR_ID,EMP_NAME
from V_Employee)
 
Ya, that was a naive move! It doesn't work on its own; it only works together with a SELECT statement. So when I ran the whole code, it worked fine:

WITH EMP_HIER(EMP_ID,MGR_ID,EMP_NAME)
AS
(Select
EMP_ID,MGR_ID,EMP_NAME
from V_Employee)

Select * from EMP_HIER
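The same WITH shape runs on most SQL engines, not just Teradata. A quick runnable sketch using Python's sqlite3 (the V_Employee rows here are made-up sample data for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE V_Employee (EMP_ID INTEGER, MGR_ID INTEGER, EMP_NAME TEXT);
INSERT INTO V_Employee VALUES (1, NULL, 'ANNA'), (2, 1, 'BO'), (3, 1, 'CARL');
""")

# The CTE EMP_HIER only exists for the duration of this one statement,
# which is exactly why running the WITH part alone fails.
rows = conn.execute("""
WITH EMP_HIER (EMP_ID, MGR_ID, EMP_NAME) AS (
    SELECT EMP_ID, MGR_ID, EMP_NAME FROM V_Employee
)
SELECT * FROM EMP_HIER
""").fetchall()
print(rows)
```

Unlike a volatile table, nothing is materialized between statements: the name EMP_HIER is gone as soon as the query finishes.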

The next question that came to mind was why I would use the above syntax when I could create a volatile table instead. I did some investigation, shared my thoughts with colleagues, and so far the reasons I have found for using the WITH clause are below:

1)     Data retrieval need not be slower
Explanation: 

When we create a volatile table from another table, we generally don't define a full table structure for it, so the volatile table doesn't get an index automatically. With the WITH clause you access the source table directly, so it can be faster than going through such a volatile table.

 2)      Portability

Explanation: 
Different database systems use different syntax for creating temporary/volatile tables, e.g.:

--In MS SQL:

CREATE TABLE #Temp1
( Name Char( 30 ), seqid integer )           

--In Teradata:

create volatile table V_Employee
(
EMP_ID int,
MGR_ID int,
EMP_NAME varchar(80)
)
on commit preserve rows;

But the WITH clause is the same for both MS SQL and Teradata, so you don't need to change the syntax if your code has to work on both databases.

3)  Not enough spool space to create a volatile table

Explanation: 
Sometimes you may not have enough spool space to create a volatile table in your development environment; in that case you can use the WITH clause to achieve the same result.

In addition, my study found that the WITH clause is helpful when a recursive query is needed, e.g. to find the different levels of managers above a particular employee in a big organization. However, I will cover recursive queries in another post.

FYI, the WITH clause has an official name: Common Table Expression (CTE).

Sunday, February 15, 2015

Insert data from Excel into a Teradata table via SQL Assistant

Sometimes you need to get data from Excel into a Teradata table. There are a few ways to import data from Excel to Teradata; one of them is using Teradata SQL Assistant. I am going to explain the step-by-step process of inserting data into a Teradata table from an Excel sheet.

For example, you have customer information in Excel (Fig 1) and want to import it into a Teradata table.
Fig 1: Data in Excel

Step 1: Save the Excel sheet as a .csv file
First you need to save the Excel file as .CSV.


Fig 2: Save as .CSV



Step 2: Remove column names from the .csv file
You can edit the .csv file like below (Fig 3):
Fig 3: Edit .csv file

Then remove the column names from the .csv file (see Fig 4).
Fig 4: remove column names

Step 3: Create a temp table to receive the data from Excel

Create a volatile table in Teradata (see Fig 5). SQL code for creating the volatile table:

create volatile table Customer_Info
(Customer_no int,
 Customer_First_Name varchar(30),
 Customer_LastName varchar(30),
 Phone int)
on commit preserve rows;

Fig 5: Create table to store data from excel

Initially there are no records in the table.
                                       
Fig 6: Newly created table, waiting to get data from .csv

Step 4: Choose 'Import Data' from SQL Assistant

From Teradata SQL Assistant, go to File -> Import Data and click Import Data.

Fig 7: Import data from Teradata SQL assistant.

You will then see the message "Ready for Import operation" above the SQL editor in Teradata SQL Assistant (see Fig 8).


                                             Fig 8: Message 'Ready for Import operation'

Step 5: Execute insert statement

Now you need to write the SQL statement below and press F5.

                                             Fig 9: SQL syntax for inserting data
When you press F5, you will be prompted to select the .csv file you prepared in steps 1 and 2 (see Fig 10).

Fig 10: Choose .csv file to get data into the table

When you select the .csv file, SQL Assistant's import functionality will start reading the data from the .csv file and inserting records into the table 'Customer_Info'; you will find a message at the bottom of the window (see Fig 11).


Fig 11: data imported to the table


Now you can select from the table to verify that it is filled with the data from Excel.

Note: Click 'Import Data' (File -> Import Data) again to disable import mode in SQL Assistant.
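SQL Assistant's import essentially pairs each .csv row with the ? parameter markers in the INSERT statement. The same idea can be scripted; here is a small sketch using Python's csv and sqlite3 modules, where the CSV content is hypothetical sample data standing in for the file produced in steps 1 and 2:

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; the header row has already been removed (step 2).
csv_data = "1001,John,Smith,5551234\n1002,Jane,Doe,5555678\n"

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer_Info (
    Customer_no INTEGER,
    Customer_First_Name TEXT,
    Customer_LastName TEXT,
    Phone INTEGER)
""")

# One parameter marker per column, mirroring the ? placeholders that
# SQL Assistant substitutes with each CSV field during import.
for row in csv.reader(io.StringIO(csv_data)):
    conn.execute("INSERT INTO Customer_Info VALUES (?, ?, ?, ?)", row)

count = conn.execute("SELECT COUNT(*) FROM Customer_Info").fetchone()[0]
print(count)
```

After the loop, a simple COUNT(*) confirms that every CSV row landed in the table, just like the final SELECT check in step 5.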

Thursday, January 29, 2015

Nested Case Statement in Teradata

Like most other databases, Teradata supports nested CASE statements. I will describe nested CASE with a simple example. First, we create a temporary table called 'SKN_COUNTRY' and insert dummy data into it.

create volatile table SKN_Country
 (Country varchar(50)
,CITY varchar(50)
,CITY_CODE Decimal(15,0))
on commit preserve rows;

insert into SKN_Country Values ('DK','CPH',001);
insert into SKN_Country Values ('DK','Aalborg',002);
insert into SKN_Country Values ('DK','Arhus',003);
insert into SKN_Country Values ('SE','Stockholm',001);
insert into SKN_Country Values ('SE','Gothenburg',002);
insert into SKN_Country Values ('NO','Oslo',001);

Country CITY CITY_CODE
DK Aalborg 2
DK Arhus 3
DK CPH 1
NO Oslo 1
SE Stockholm 1
SE Gothenburg 2

Approach 1 (Failed):
-------------------------
select
case T1.Country
 WHEN 'DK'
 Then
 case when T1.CITY='CPH'
 then 'DK-CPH'
 else 'other-DK'
end
When 'SE'
case when T1.CITY='Stockholm'
 then 'SE-STOCKHOLM'
 else 'other-SE'
end

END as  CITY_TYPE_CODE
FROM SKN_Country AS T1

But unfortunately, the above approach will not work. You will get the error "SELECT Failed. 3707: Syntax error, expected something like a 'BETWEEN' keyword or an 'IN' keyword or a 'LIKE' keyword or a 'CONTAINS' keyword between a string or a Unicode character literal and the 'case' keyword." The cause is the WHEN 'SE' branch: the nested CASE follows WHEN directly, and the required THEN keyword is missing.

Approach 2 (Success):
----------------------------
select T1.country,
case When T1.Country='DK' Then
  case T1.CITY
    When 'CPH' THEN 'DK-CPH'
    ELSE 'NODK'
   END
 WHEN T1.Country='SE' THEN
  case T1.CITY
    When 'stockholm' THEN 'SE-stock'
    ELSE 'noSE'
   END
END as CITY_TYPE_CODE
FROM SKN_Country AS T1

Result Set:
-------------
Country CITY_TYPE_CODE
DK NODK
DK NODK
DK DK-CPH
NO ?
SE SE-stock
SE noSE
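Approach 2 uses only standard CASE syntax, so it can be tried on any engine. Here is a small runnable check using Python's sqlite3; note that string comparison in SQLite is case-sensitive, so 'Stockholm' must be spelled exactly, whereas Teradata's default comparison is NOT CASESPECIFIC:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SKN_Country (Country TEXT, CITY TEXT, CITY_CODE INTEGER);
INSERT INTO SKN_Country VALUES
    ('DK','CPH',1), ('DK','Aalborg',2), ('DK','Arhus',3),
    ('SE','Stockholm',1), ('SE','Gothenburg',2), ('NO','Oslo',1);
""")

# Outer searched CASE picks the country; each branch nests a simple CASE
# on CITY. Countries matched by no branch (NO) fall through to NULL.
rows = conn.execute("""
SELECT Country,
       CASE WHEN Country = 'DK' THEN
                CASE CITY WHEN 'CPH' THEN 'DK-CPH' ELSE 'NODK' END
            WHEN Country = 'SE' THEN
                CASE CITY WHEN 'Stockholm' THEN 'SE-stock' ELSE 'noSE' END
       END AS CITY_TYPE_CODE
FROM SKN_Country
""").fetchall()
for country, code in rows:
    print(country, code)
```

The NULL for Norway is what SQL Assistant displays as '?' in the result set above.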

Approach 3 (Success):
---------------------------
Select T1.country,
CASE When T1.Country='DK' THEN
case when T1.CITY='CPH'
then 'DK-CPH'
else 'otherDK'
end
ELSE
Case when T1.country='SE' then
Case when T1.CITY='Stockholm' then
'SE-stock'
else 'other.SE'
end
end
END as  CITY_TYPE_CODE
FROM SKN_Country AS T1

Result Set:
-------------
Country  CITY_TYPE_CODE
DK otherDK
DK otherDK
DK DK-CPH
NO ?
SE SE-stock
SE other.SE