Saturday, December 27, 2014

Analytical function: ROW_NUMBER () with Qualify in Teradata

This post will help to understand QUALIFY  ROW_NUMBER OVER (PARTITION BY ...) phrase with simple example.

We take an example like below:

INDUST_CODE
INDUSTRY_NAME
ACTIVATE_DATE
1011
GOOGLE
13-12-2009
1013
YOUTUBE
01-09-2008
1011
GOOGLE
15-12-2009
1012
MICROSOFT
14-10-1999
1011
GOOGLE
11-12-2009

The data listed above have more than one row with Industry named ‘Google’.  Business needs the industry name without duplication, they don’t want one industry name more than once. They like to see the data like below:

INDUST_CODE
 INDUSTRY_NAME

1011
GOOGLE

1012
MICROSOFT

1013
YOUTUBE


How do you do that?
You are expert in SQL, so wrote the script like below which works fine:
select INDUST_CODE,INDUSTRY_NAME
from INDUSTRY
QUALIFY
row_number() Over (partition by INDUST_CODE order by ACTIVATE_DATE desc)=1

If you want to know how the code worked? Then please find the explanation step by step:
1) Group by (Partition by)with Industry Code (INDUST_CODE)
2) Order by with ACTIVATE_DATE which means latest date will be used to find the first row
3) Then uses ‘QUALIFY’ to select the row.

Lets explain the syntax :





















Make the translation easy:
qualify row_number() over (Partition and order by brace) = 1
    which means,
    Selecting first row from the result set after applying group by and order by function. 



Thursday, December 4, 2014

Extra Attention while using Multiple Tables in Update Statement (UPDATE From SELECT Statement/Using JOIN in UPDATE Statement)

While updating data in a table you may discover error with “Error: 3993 Illegal usage of alias name” , it will be really difficult to find out at first glance what went wrong?
As you wrote the code, at least you did not expect to have alias error. You are almost sure that you did not do any mistake like alias problem.

Let’s have a look at the piece of code:

Update ACCOUNTS AS ACC
FROM
(SELECT
 UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN
TRANSACTIONS AS TRN ON
 CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') AS UPD_ACC
SET ACC.ACC_NO= UPD_ACC.ACC_NO
     ,ACC.DEB_AMT= UPD_ACC.DEB_AMT
Where
ACC.Customer_ID= UPD_ACC.Customer_ID
And ACC. City_ID =UPD_ACC.City_ID

After you run the code, you can’t expect to have alias error because you have used alias correctly in the code.
However, the main reason of the error is; you are not allow to use alias left side of the SET syntax like you did at above:  Set ACC.ACC_NO= UPD_ACC.ACC_NO

So correct syntax will be: Set ACC_NO= UPD_ACC.ACC_NO

As well as, at where clause you need to use the table name not the alias(the table you are updating).
e.g. 

Where  
        ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID --NOT ACC.Customer_ID
And ACCOUNTS. City_ID =UPD_ACC.City_ID

Now we can fix the code and rewrite again like below which will work without error:

Update ACCOUNTS 
FROM 
(SELECT
 UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN 
TRANSACTIONS AS TRN ON
CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') as UPD_ACC
SETACC_NO= UPD_ACC.ACC_NO
     ,DEB_AMT= UPD_ACC.DEB_AMT
Where  
         ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID
AND ACCOUNTS. City_ID =UPD_ACC.City_ID

Tuesday, November 18, 2014

Data Import failure in the midway: What to DO?

If you work with data warehouse, sometime you may get task to insert data into a table from a .csv file. It's simple task, isn't it?? Just use import functionality from Terdadata SQL Assistant and insert the records. (How to import data from .csv to teradata table is not scope of this article).

But it's not always true that it can straight forward. Considering a task you got from business where you received a .csv file with 50,000 rows and 15 columns in it. And  this data you have to insert into a table (not more than a table). You didn't think much and press F5 to run the query and it's started getting data into the table. You feel happy :) But your happiness will be blurred if you see an error message saying "Duplicate row error in...."after inserting 10,000 records.

Now you will start thinking "I should have think through the situation before pressing the F5 key, I am stupid bla. bla." No worries, human being make error, so you do!!

We will find out the way out :) Assume your table already had few hundred thousands rows and you inserted just now 10,000 and then it failed. Smart way to get rid off the problem is to find out the deviation of the records and insert the missing rows.

What will be steps to fix the problem?

We start with an example to explain the steps involved to solve the problem, say table name is Customer and the received  .csv file from business called 'Customer_from_Mars.csv"

Step 1: Insert the data into a Temporary Table
Take the .csv file and insert data from .csv to a temporary table called "TMP_TBL_Customer_Mars".

Step 2: Find out the deviation
To find out the deviation you can use MINUS operator. The Code will look like below:

--Select the all customers from .csv file that you have inserted into temp table

Select TMP_CUST_TBL.Cust_ID
,TMP_CUST_TBL.First_Name
,TMP_CUST_TBL.PLANET_ID
,TMP_CUST_TBL.XtraField1
,TMP_CUST_TBL.XtraField2
,TMP_CUST_TBL.XtraField3
,TMP_CUST_TBL.XtraField4
,TMP_CUST_TBL.XtraField5
 TMP_CUST_TBL.XtraField6 
FROM TMP_TBL_Customer_Mars  as TMP_CUST_TBL

MINUS
--Find out the customer which already inserted before the failure occurred.
Select TMP_CUST_MARS.Cust_ID
,TMP_CUST_MARS.First_Name
,TMP_CUST_MARS.PLANET_ID
,TMP_CUST_MARS.XtraField1
,TMP_CUST_MARS.XtraField2
,TMP_CUST_MARS.XtraField3
,TMP_CUST_MARS.XtraField4
,TMP_CUST_MARS.XtraField5
TMP_CUST_MARS.XtraField6
FROM CUSTOMER AS CUST
JOIN
TMP_TBL_Customer_Mars AS TMP_CUST_MARS
ON
CUST.cust_id=TMP_CUST_MAR.Cust_ID
AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

At first you are selecting whole data from .csv file that you have stored in temp table and find out the already inserted record by joining with main table and finally, use MINUS operator between them to get the records those did not insert.

Step 3: Insert the missing data to the table
Now you are ready to insert the rest of the data into the table "Customer"

Insert into CUSTOMER

Select TMP_CUST_TBL.Cust_ID
,TMP_CUST_TBL.First_Name
,TMP_CUST_TBL.PLANET_ID
,TMP_CUST_TBL.XtraField1
,TMP_CUST_TBL.XtraField2
,TMP_CUST_TBL.XtraField3
,TMP_CUST_TBL.XtraField4
,TMP_CUST_TBL.XtraField5
 TMP_CUST_TBL.XtraField6 
FROM TMP_TBL_Customer_Mars  as TMP_CUST_TBL

MINUS
--Find out the customer which already inserted before the failure occurred.
Select TMP_CUST_MARS.Cust_ID
,TMP_CUST_MARS.First_Name
,TMP_CUST_MARS.PLANET_ID
,TMP_CUST_MARS.XtraField1
,TMP_CUST_MARS.XtraField2
,TMP_CUST_MARS.XtraField3
,TMP_CUST_MARS.XtraField4
,TMP_CUST_MARS.XtraField5
TMP_CUST_MARS.XtraField6
FROM CUSTOMER AS CUST
JOIN
TMP_TBL_Customer_Mars AS TMP_CUST_MARS
ON
CUST.cust_id=TMP_CUST_MAR.Cust_ID
AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID


Friday, November 14, 2014

Column length Limitation In Teradata


Maybe you got a task to create a view. And you are making alias of the columns names as per your customer request. Sometimes by giving the sensible name you will experience the limitation of teradata column length.

e.g. your query can look like below:

Create view View_generate_error3737
AS
SELECT
 
      Customer.Name as customer_name_from_Mars_with_tedius_huge_ear
      ,Customer.Age as Customer_age_from_march
     
FROM
Alian_Information AS Customer
where CTY='Mars' and behavior='tedius_huge_ear'

As long as you pressed F5, and will found the error like below:



Fig: 3737 error






Error: "3737: name requires more than 30 bytes in LATIN internal form" which is self explanatory. It means you can't define a column name that has characters more than 30.

So by putting bigger column name you can be ended up with a compile error in Teradata.

Monday, October 20, 2014

SAS DI Error: "CLI error trying to establish connection...."

I have been working with SAS DI studio for couple of months and never found the error like this before,“CLI error trying to establish connection: [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. : [Teradata][ODBC Teradata Driver] Not enough information to log on ”

Error is self-explanatory, it means the user id and password I have given is incorrect. I was really surprised!!! It was working fine last week. What happened in the mean time?? Who changes my SAS password??
No No, it’s not your SAS credential, look at the error carefully; it’s your Tera data credential.  I tried to remember and finally recalled my credential for Teradata has been changed in the meantime (few days ago).


Oooo Yes, I know how to fix it!!??? I opened ODBC data source administration and changed the password (Fig:1). It should work now.  Don’t you think same like me?


Fig 1: ODBC setup for Teradata

Nope, it did not work; still got the same error.

After searching a little bit found there is a SAS tool called ‘SAS Personal Login Manager’ where you should update your database password if your database credential changes.
How to do that?

It’s simple; open the tool “SAS Personal Login Manager” and updates the password and closes it (Fig:2).  

                                             Fig2: SAS personal Login Manager


Now if you reopen your SAS DI studio and run the job then the error will disappear.





Tuesday, October 14, 2014

Teradata: How to update duplicate records

Generally update statement will not allow updating duplicated records; records must be unique if you want to update a table.

But sometimes your business needs duplicate records in a table. Then your task is to provide the solution. How do you do that?

You need to make a new column that can generate sequential numbers and you need to add that column to duplicate rows to make a unique row.

For example:
Your data look like below, say table name is Weather_DK:

Country
CTY
AVG_TEMP
AVG_Humidity
DK
CPH
12.3
83%
DK
CPH
12.3
83%
DK
CPH
12.3
83%
DK
Arhus
11.3
85%
DK
Aalborg
12.3
87%
                                        Fig 1.1: Table Weather_DK

For example, now you want to update the data in Weather_SE table which has same column name as Weather_DK table  and data like below:
Country
AVG_TEMP
AVG_Humidity
SE
19.3
43%
SE
23.3
83%
SE
12.3
53%
                                    Fig 1.2: Table Weather_SE

Say; your business informed, Swedish average tempreture and humidity is exactly same as Denmark's capital city Copenhagen's (CPH) average temperature and humidity.

So your job is to update the Weather_SE table’s two column AVG_TEMP and AVG_Humidity  from Weather_DK data. When you write SQL query like below:

Update Weather_SE
Set
AVG_TEMP= DK.AVG_TEMP
AVG_Humidity=DK.AVG_Humidity
from
(Select  AVG_TEMP , AVG_Humidity
From Weather_DK
group by 1,2) as DK
where DK.CTY='CPH'

When you will run the above script you will get  error :”Failure 7547 Target row updated by multiple source rows”

To eliminate the error, you need to have unique rows. So you add a new columns with your result set, you can select like below:

select ROW_NUMBER() over (order by AVG_TEMP) as row_S, AVG_TEMP, AVG_Humidity
from Weather_DK  as  DK

Now you can easily update.


Tuesday, September 30, 2014

Challenge Scenario: Inserting data into a third Table

Sometimes you may find problem like below:

You have table called “Country_Code” that holds country code and name like below:
CTY_CODE
Country_name
001
USA
86
China
46
Sweden
45
Denmark

Now you have same group_code and communication rating scale value for all above country.
e.g.
group_code=09 and rating_scale_value=7

Your task is to insert data to the second table which called “Code_Rating_value”

Data should look like below:
CTY_CODE
group_code
Rating_scale_value
001
09
7
86
09
7
46
09
7
45
09
7

If it’s small amount of data then it’s easy to insert by following:
Insert into Code_Rating_value
Values(001,09,7)

Execute the above statement by changing the value four times and you are done!!
But if it’s a big table with thousands of records you will not going to do this thousands of times, will you??

You will find the below code to insert the data at once.

insert into Code_Rating_value
select  CTY. CTY_Code, Rating_value .group_code, Rating_value. Rating_scale_value
from
(select '09' as group_code, '07’ as Rating_scale_value , '1' as key) as Rating_value
left join (select distinct CTY_Code, '1' as key
from Country_Code) as CTY

 on CTY. key = Rating_value. Key