It's all about Data: 2014

Saturday, December 27, 2014

Analytical function: ROW_NUMBER () with Qualify in Teradata

This post will help to understand QUALIFY ROW_NUMBER OVER (PARTITION BY ...) phrase with simple example.

We take an example like below:

INDUST_CODE	INDUSTRY_NAME	ACTIVATE_DATE
1011	GOOGLE	13-12-2009
1013	YOUTUBE	01-09-2008
1011	GOOGLE	15-12-2009
1012	MICROSOFT	14-10-1999
1011	GOOGLE	11-12-2009

The data listed above have more than one row with Industry named ‘Google’. Business needs the industry name without duplication, they don’t want one industry name more than once. They like to see the data like below:

INDUST_CODE	INDUSTRY_NAME
1011	GOOGLE
1012	MICROSOFT
1013	YOUTUBE

How do you do that?

You are expert in SQL, so wrote the script like below which works fine:

select INDUST_CODE,INDUSTRY_NAME

from INDUSTRY

QUALIFY

row_number() Over (partition by INDUST_CODE order by ACTIVATE_DATE desc)=1

If you want to know how the code worked? Then please find the explanation step by step:

1) Group by (Partition by)with Industry Code (INDUST_CODE)

2) Order by with ACTIVATE_DATE which means latest date will be used to find the first row

3) Then uses ‘QUALIFY’ to select the row.

Lets explain the syntax :

Make the translation easy:

qualify row_number() over (Partition and order by brace) = 1

which means,

Selecting first row from the result set after applying group by and order by function.

Thursday, December 4, 2014

Extra Attention while using Multiple Tables in Update Statement (UPDATE From SELECT Statement/Using JOIN in UPDATE Statement)

While updating data in a table you may discover error with “Error: 3993 Illegal usage of alias name” , it will be really difficult to find out at first glance what went wrong?
As you wrote the code, at least you did not expect to have alias error. You are almost sure that you did not do any mistake like alias problem.

Let’s have a look at the piece of code:

Update ACCOUNTS AS ACC
FROM
(SELECT
UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN
TRANSACTIONS AS TRN ON
CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') AS UPD_ACC
SET ACC.ACC_NO= UPD_ACC.ACC_NO
,ACC.DEB_AMT= UPD_ACC.DEB_AMT
Where
ACC.Customer_ID= UPD_ACC.Customer_ID
And ACC. City_ID =UPD_ACC.City_ID

After you run the code, you can’t expect to have alias error because you have used alias correctly in the code.

However, the main reason of the error is; you are not allow to use alias left side of the SET syntax like you did at above: Set ACC.ACC_NO= UPD_ACC.ACC_NO

So correct syntax will be: Set ACC_NO= UPD_ACC.ACC_NO

As well as, at where clause you need to use the table name not the alias(the table you are updating).

e.g.

Where

ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID --NOT ACC.Customer_ID

And ACCOUNTS. City_ID =UPD_ACC.City_ID

Now we can fix the code and rewrite again like below which will work without error:

Update ACCOUNTS

FROM

(SELECT

UPD_ACC.ACC_NO

,UPD_ACC.DEB_AMT

FROM CUSTOMER AS CUST

JOIN

TRANSACTIONS AS TRN ON

CUST. City_ID =TRN.City_ID

AND CUST.Customer_ID=TRN.Customer_ID

Where TRN.CTY='NY') as UPD_ACC

SETACC_NO= UPD_ACC.ACC_NO

,DEB_AMT= UPD_ACC.DEB_AMT

Where

ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID

AND ACCOUNTS. City_ID =UPD_ACC.City_ID

Tuesday, November 18, 2014

Data Import failure in the midway: What to DO?

If you work with data warehouse, sometime you may get task to insert data into a table from a .csv file. It's simple task, isn't it?? Just use import functionality from Terdadata SQL Assistant and insert the records. (How to import data from .csv to teradata table is not scope of this article).

But it's not always true that it can straight forward. Considering a task you got from business where you received a .csv file with 50,000 rows and 15 columns in it. And this data you have to insert into a table (not more than a table). You didn't think much and press F5 to run the query and it's started getting data into the table. You feel happy :) But your happiness will be blurred if you see an error message saying "Duplicate row error in...."after inserting 10,000 records.

Now you will start thinking "I should have think through the situation before pressing the F5 key, I am stupid bla. bla." No worries, human being make error, so you do!!

We will find out the way out :) Assume your table already had few hundred thousands rows and you inserted just now 10,000 and then it failed. Smart way to get rid off the problem is to find out the deviation of the records and insert the missing rows.

What will be steps to fix the problem?

We start with an example to explain the steps involved to solve the problem, say table name is Customer and the received .csv file from business called 'Customer_from_Mars.csv"

Step 1: Insert the data into a Temporary Table
Take the .csv file and insert data from .csv to a temporary table called "TMP_TBL_Customer_Mars".

Step 2: Find out the deviation
To find out the deviation you can use MINUS operator. The Code will look like below:

--Select the all customers from .csv file that you have inserted into temp table

Select TMP_CUST_TBL.Cust_ID

,TMP_CUST_TBL.First_Name

,TMP_CUST_TBL.PLANET_ID

,TMP_CUST_TBL.XtraField1

,TMP_CUST_TBL.XtraField2

,TMP_CUST_TBL.XtraField3

,TMP_CUST_TBL.XtraField4

,TMP_CUST_TBL.XtraField5

TMP_CUST_TBL.XtraField6

FROM TMP_TBL_Customer_Mars as TMP_CUST_TBL

MINUS

--Find out the customer which already inserted before the failure occurred.

Select TMP_CUST_MARS.Cust_ID

,TMP_CUST_MARS.First_Name

,TMP_CUST_MARS.PLANET_ID

,TMP_CUST_MARS.XtraField1

,TMP_CUST_MARS.XtraField2

,TMP_CUST_MARS.XtraField3

,TMP_CUST_MARS.XtraField4

,TMP_CUST_MARS.XtraField5

TMP_CUST_MARS.XtraField6

FROM CUSTOMER AS CUST

JOIN

TMP_TBL_Customer_Mars AS TMP_CUST_MARS

CUST.cust_id=TMP_CUST_MAR.Cust_ID

AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

At first you are selecting whole data from .csv file that you have stored in temp table and find out the already inserted record by joining with main table and finally, use MINUS operator between them to get the records those did not insert.

Step 3: Insert the missing data to the table
Now you are ready to insert the rest of the data into the table "Customer"

Insert into CUSTOMER

Select TMP_CUST_TBL.Cust_ID

,TMP_CUST_TBL.First_Name

,TMP_CUST_TBL.PLANET_ID

,TMP_CUST_TBL.XtraField1

,TMP_CUST_TBL.XtraField2

,TMP_CUST_TBL.XtraField3

,TMP_CUST_TBL.XtraField4

,TMP_CUST_TBL.XtraField5

TMP_CUST_TBL.XtraField6

FROM TMP_TBL_Customer_Mars as TMP_CUST_TBL

MINUS

--Find out the customer which already inserted before the failure occurred.

Select TMP_CUST_MARS.Cust_ID

,TMP_CUST_MARS.First_Name

,TMP_CUST_MARS.PLANET_ID

,TMP_CUST_MARS.XtraField1

,TMP_CUST_MARS.XtraField2

,TMP_CUST_MARS.XtraField3

,TMP_CUST_MARS.XtraField4

,TMP_CUST_MARS.XtraField5

TMP_CUST_MARS.XtraField6

FROM CUSTOMER AS CUST

JOIN

TMP_TBL_Customer_Mars AS TMP_CUST_MARS

CUST.cust_id=TMP_CUST_MAR.Cust_ID

AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

Friday, November 14, 2014

Column length Limitation In Teradata

Maybe you got a task to create a view. And you are making alias of the columns names as per your customer request. Sometimes by giving the sensible name you will experience the limitation of teradata column length.

e.g. your query can look like below:

Create view View_generate_error3737

SELECT

Customer.Name as customer_name_from_Mars_with_tedius_huge_ear

,Customer.Age as Customer_age_from_march

FROM

Alian_Information AS Customer

where CTY='Mars' and behavior='tedius_huge_ear'

As long as you pressed F5, and will found the error like below:

Fig: 3737 error

Error: "3737: name requires more than 30 bytes in LATIN internal form" which is self explanatory. It means you can't define a column name that has characters more than 30.

So by putting bigger column name you can be ended up with a compile error in Teradata.

Monday, October 20, 2014

SAS DI Error: "CLI error trying to establish connection...."

I have been working with SAS DI studio for couple of months and never found the error like this before,“CLI error trying to establish connection: [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. : [Teradata][ODBC Teradata Driver] Not enough information to log on ”

Error is self-explanatory, it means the user id and password I have given is incorrect. I was really surprised!!! It was working fine last week. What happened in the mean time?? Who changes my SAS password??

No No, it’s not your SAS credential, look at the error carefully; it’s your Tera data credential. I tried to remember and finally recalled my credential for Teradata has been changed in the meantime (few days ago).

Oooo Yes, I know how to fix it!!??? I opened ODBC data source administration and changed the password (Fig:1). It should work now. Don’t you think same like me?

Fig 1: ODBC setup for Teradata

Nope, it did not work; still got the same error.

After searching a little bit found there is a SAS tool called ‘SAS Personal Login Manager’ where you should update your database password if your database credential changes.

How to do that?

It’s simple; open the tool “SAS Personal Login Manager” and updates the password and closes it (Fig:2).

Fig2: SAS personal Login Manager

Now if you reopen your SAS DI studio and run the job then the error will disappear.

Tuesday, October 14, 2014

Teradata: How to update duplicate records

Generally update statement will not allow updating duplicated records; records must be unique if you want to update a table.

But sometimes your business needs duplicate records in a table. Then your task is to provide the solution. How do you do that?

You need to make a new column that can generate sequential numbers and you need to add that column to duplicate rows to make a unique row.

For example:

Your data look like below, say table name is Weather_DK:

Country	CTY	AVG_TEMP	AVG_Humidity
DK	CPH	12.3	83%
DK	CPH	12.3	83%
DK	CPH	12.3	83%
DK	Arhus	11.3	85%
DK	Aalborg	12.3	87%

Fig 1.1: Table Weather_DK

For example, now you want to update the data in Weather_SE table which has same column name as Weather_DK table and data like below:

Country	AVG_TEMP	AVG_Humidity
SE	19.3	43%
SE	23.3	83%
SE	12.3	53%

Fig 1.2: Table Weather_SE

Say; your business informed, Swedish average tempreture and humidity is exactly same as Denmark's capital city Copenhagen's (CPH) average temperature and humidity.

So your job is to update the Weather_SE table’s two column AVG_TEMP and AVG_Humidity from Weather_DK data. When you write SQL query like below:

Update Weather_SE
Set
AVG_TEMP= DK.AVG_TEMP
AVG_Humidity=DK.AVG_Humidity
from
(Select AVG_TEMP , AVG_Humidity
From Weather_DK
group by 1,2) as DK
where DK.CTY='CPH'

When you will run the above script you will get error :”Failure 7547 Target row updated by multiple source rows”

To eliminate the error, you need to have unique rows. So you add a new columns with your result set, you can select like below:

select ROW_NUMBER() over (order by AVG_TEMP) as row_S, AVG_TEMP, AVG_Humidity

from Weather_DK as DK

Now you can easily update.

Tuesday, September 30, 2014

Challenge Scenario: Inserting data into a third Table

Sometimes you may find problem like below:

You have table called “Country_Code” that holds country code and name like below:

CTY_CODE	Country_name
001	USA
86	China
46	Sweden
45	Denmark

Now you have same group_code and communication rating scale value for all above country.

e.g.

group_code=09 and rating_scale_value=7

Your task is to insert data to the second table which called “Code_Rating_value”

Data should look like below:

CTY_CODE	group_code	Rating_scale_value
001	09	7
86	09	7
46	09	7
45	09	7

If it’s small amount of data then it’s easy to insert by following:

Insert into Code_Rating_value

Values(001,09,7)

Execute the above statement by changing the value four times and you are done!!

But if it’s a big table with thousands of records you will not going to do this thousands of times, will you??

You will find the below code to insert the data at once.

insert into Code_Rating_value

select CTY. CTY_Code, Rating_value .group_code, Rating_value. Rating_scale_value

from

(select '09' as group_code, '07’ as Rating_scale_value , '1' as key) as Rating_value

left join (select distinct CTY_Code, '1' as key

from Country_Code) as CTY

on CTY. key = Rating_value. Key

Wednesday, August 6, 2014

While do cast (Decimal to Char): follow me!!!

As a part of my task; I was making query something like below,

select cast (city_code as char(10)), city_name||’_’ ||city_type as CITY_INFO

From CITY

I wondered why my data for City_Code look like this:

City_Code	CITY_INFO
10134.	COPENHAGEN_CYCKLE
101.	STOCKHOLM_TOG
3497.	DHAKA_RICKSH
2789.	LISBON_TOUR

City_code ended with ‘.’ Then looked carefully the original datatype for the column (CITY_CODE), it’s decimal (18, 0). That is the reason why it’s giving the problem.

Googled it and found some solution:

select cast(cast(city_code as integer) as char(10)), city_name||’_’ ||city_type as CITY_INFO

From CITY

Since the column is decimal, it’s first cast to integer and then cast to char. It works perfectly. But senior developer who reviewed the code asked not to two time casting rather use trim function to do this.

Fair enough!! Got motivation to do something else J

Then I have tried to implement the code like below:

select TRIM(TRAILING '.' FROM (cast(UNQ_ID_SRC_DEC as char(10))))

,city_name||’_’ ||city_type as CITY_INFO

From CITY

I have trimmed with trailing ‘.’ But result did not change!!

City_Code	CITY_INFO
10134.	COPENHAGEN_CYCKLE
101.	STOCKHOLM_TOG
3497.	DHAKA_RICKSH
2789.	LISBON_TOUR

I was really curious to know where the problem is. After investigating a little more time on it, I made the query like below:

select TRIM(TRAILING '.' FROM (cast(UNQ_ID_SRC_DEC as varchar(10))))

,city_name||’_’ ||city_type as CITY_INFO

From CITY

Yaahuuu!! It works as I expected. ‘.’ has been removed.

City_Code	CITY_INFO
10134	COPENHAGEN_CYCKLE
101	STOCKHOLM_TOG
3497	DHAKA_RICKSH
2789	LISBON_TOUR

I just changes cast data type from char to varchar and then it works fine!!

Hope my experience will help some of you sometime!!!

Tuesday, August 5, 2014

Tricks: UNION and UNION ALL

While you are making union in Teradata it’s very important to follow below rules:

1) Select statement should have equal number of columns

For example:

select city_code, city_name, city_type

From CITY

union

select EMP_CITY_CODE as city_code, EMP_CITY as city_name, EMP_city_type as City_type

from CITY_EMP

2) Type should be same

If data type is different; then it should be cast and then make union, if you don’t define/cast the type then value will be truncated. So when you define type consider maximum size of data type.

For example:

select cast (city_code as char(10)), city_name, city_type

From CITY

union

select cast(EMP_CITY_CODE as char(10)) as city_code, EMP_CITY as city_name, EMP_city_type as City_type

from CITY_EMP

3) Where clause at UNION

At the time you are selecting if you cast a type , always remember to do the same when you filter at where clause.

For example:

insert DW_EMP_CITY (city_code,city_name,city_type)

select cast (city_code as char(10)), city_name, city_type

From CITY

union

select cast(EMP_CITY_CODE as char(10)) as city_code, EMP_CITY as city_name, EMP_city_type as City_type

from CITY_EMP

where not exists (select 1 from DW_EMP_CITY as EDW_CITY where EDW_CITY.city_code=cast(CITY_EMP.EMP_CITY_CODE as char(10))

4) When to use UNION/UNION ALL

If you want to avoid duplicates then use ‘UNION’ otherwise you use ‘UNION ALL’.

Thursday, May 29, 2014

SQL Left Join: Condition on where clause or on clause

As a database developer, we know about SQL left, right or inner join clearly. But when it comes to left join or right join with condition putting on where clause or on clause can make big difference that we sometimes overlook or get panic why those making difference.

I will go directly to the example, e.g. we have Account and Customer table and we want all the accounts those are exist in Account table and also consider if any customer is added on '2013-05-27', we take him as well.

Here we have Account and Customer table and data look like below:

Account
Account_No	Status	Eff_DT	Customr_ID
123	actv_	21-05-2014	301
124	actv_	22-05-2014	307
125	actv_	23-05-2014	303

Customer
Customer_ID	Customer_Name	Eff_DT
301	john	21-05-2014
302	san	21-05-2014
303	paul	22-05-2014
304	helle	23-05-2014
305	jon	24-05-2014


SQL Query :
select Account_no, status,A. eff-dt, A.customer_id, Customer_Name from ACCOUNT as A
join customer as C on A.customer_id=C.customer_id

Result Set:

Account_No	Status	Eff_DT	Customr_ID	Customer_Name
123	actv_	21-05-2014	301	john
124	actv_	22-05-2014	307	null
125	actv_	23-05-2014	303	paul


Changes the query with 'on' clause:

select Account_no, status,A. eff-dt, A.customer_id, Customer_Name, C.EFF_DT as Customer_EFF_DT from ACCOUNT as A
join customer as C on A.customer_id=C.customer_id
and C.eff_dt= date '2013-05-24'

Result will look like:

Account_No	Status	Eff_DT	Customr_ID	Customer_Name	Customer_EFF_DT
123	actv_	21-05-2014	301	john	null
124	actv_	22-05-2014	307	null	null
125	actv_	23-05-2014	303	paul	null

Changes the query with 'Where' clause

select Account_no, status,A. eff-dt, A.customer_id, Customer_Name, C.EFF_DT as Customer_EFF_DT from ACCOUNT as A
join customer as C on A.customer_id=C.customer_id
where C.eff_dt= date '2013-05-24'

Result will return nothing

It means where clause filter after join has established. When we did same condition C.eff_dt= date '2013-05-27' on clause it was still under left join on condition
and that returns you data as how left join work. But where clause work after making join, if we look at the last result, consider we are implying where clause on the last record set.

Pages

Saturday, December 27, 2014

Thursday, December 4, 2014

Tuesday, November 18, 2014

Friday, November 14, 2014

Monday, October 20, 2014

Tuesday, October 14, 2014

Tuesday, September 30, 2014

Wednesday, August 6, 2014

Tuesday, August 5, 2014

Thursday, May 29, 2014