Saturday, December 27, 2014

Analytical function: ROW_NUMBER () with Qualify in Teradata

This post will help to understand QUALIFY  ROW_NUMBER OVER (PARTITION BY ...) phrase with simple example.

We take an example like below:

INDUST_CODE
INDUSTRY_NAME
ACTIVATE_DATE
1011
GOOGLE
13-12-2009
1013
YOUTUBE
01-09-2008
1011
GOOGLE
15-12-2009
1012
MICROSOFT
14-10-1999
1011
GOOGLE
11-12-2009

The data listed above have more than one row with Industry named ‘Google’.  Business needs the industry name without duplication, they don’t want one industry name more than once. They like to see the data like below:

INDUST_CODE
 INDUSTRY_NAME

1011
GOOGLE

1012
MICROSOFT

1013
YOUTUBE


How do you do that?
You are expert in SQL, so wrote the script like below which works fine:
select INDUST_CODE,INDUSTRY_NAME
from INDUSTRY
QUALIFY
row_number() Over (partition by INDUST_CODE order by ACTIVATE_DATE desc)=1

If you want to know how the code worked? Then please find the explanation step by step:
1) Group by (Partition by)with Industry Code (INDUST_CODE)
2) Order by with ACTIVATE_DATE which means latest date will be used to find the first row
3) Then uses ‘QUALIFY’ to select the row.

Lets explain the syntax :





















Make the translation easy:
qualify row_number() over (Partition and order by brace) = 1
    which means,
    Selecting first row from the result set after applying group by and order by function. 



Thursday, December 4, 2014

Extra Attention while using Multiple Tables in Update Statement (UPDATE From SELECT Statement/Using JOIN in UPDATE Statement)

While updating data in a table you may discover error with “Error: 3993 Illegal usage of alias name” , it will be really difficult to find out at first glance what went wrong?
As you wrote the code, at least you did not expect to have alias error. You are almost sure that you did not do any mistake like alias problem.

Let’s have a look at the piece of code:

Update ACCOUNTS AS ACC
FROM
(SELECT
 UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN
TRANSACTIONS AS TRN ON
 CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') AS UPD_ACC
SET ACC.ACC_NO= UPD_ACC.ACC_NO
     ,ACC.DEB_AMT= UPD_ACC.DEB_AMT
Where
ACC.Customer_ID= UPD_ACC.Customer_ID
And ACC. City_ID =UPD_ACC.City_ID

After you run the code, you can’t expect to have alias error because you have used alias correctly in the code.
However, the main reason of the error is; you are not allow to use alias left side of the SET syntax like you did at above:  Set ACC.ACC_NO= UPD_ACC.ACC_NO

So correct syntax will be: Set ACC_NO= UPD_ACC.ACC_NO

As well as, at where clause you need to use the table name not the alias(the table you are updating).
e.g. 

Where  
        ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID --NOT ACC.Customer_ID
And ACCOUNTS. City_ID =UPD_ACC.City_ID

Now we can fix the code and rewrite again like below which will work without error:

Update ACCOUNTS 
FROM 
(SELECT
 UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN 
TRANSACTIONS AS TRN ON
CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') as UPD_ACC
SETACC_NO= UPD_ACC.ACC_NO
     ,DEB_AMT= UPD_ACC.DEB_AMT
Where  
         ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID
AND ACCOUNTS. City_ID =UPD_ACC.City_ID

Tuesday, November 18, 2014

Data Import failure in the midway: What to DO?

If you work with data warehouse, sometime you may get task to insert data into a table from a .csv file. It's simple task, isn't it?? Just use import functionality from Terdadata SQL Assistant and insert the records. (How to import data from .csv to teradata table is not scope of this article).

But it's not always true that it can straight forward. Considering a task you got from business where you received a .csv file with 50,000 rows and 15 columns in it. And  this data you have to insert into a table (not more than a table). You didn't think much and press F5 to run the query and it's started getting data into the table. You feel happy :) But your happiness will be blurred if you see an error message saying "Duplicate row error in...."after inserting 10,000 records.

Now you will start thinking "I should have think through the situation before pressing the F5 key, I am stupid bla. bla." No worries, human being make error, so you do!!

We will find out the way out :) Assume your table already had few hundred thousands rows and you inserted just now 10,000 and then it failed. Smart way to get rid off the problem is to find out the deviation of the records and insert the missing rows.

What will be steps to fix the problem?

We start with an example to explain the steps involved to solve the problem, say table name is Customer and the received  .csv file from business called 'Customer_from_Mars.csv"

Step 1: Insert the data into a Temporary Table
Take the .csv file and insert data from .csv to a temporary table called "TMP_TBL_Customer_Mars".

Step 2: Find out the deviation
To find out the deviation you can use MINUS operator. The Code will look like below:

--Select the all customers from .csv file that you have inserted into temp table

Select TMP_CUST_TBL.Cust_ID
,TMP_CUST_TBL.First_Name
,TMP_CUST_TBL.PLANET_ID
,TMP_CUST_TBL.XtraField1
,TMP_CUST_TBL.XtraField2
,TMP_CUST_TBL.XtraField3
,TMP_CUST_TBL.XtraField4
,TMP_CUST_TBL.XtraField5
 TMP_CUST_TBL.XtraField6 
FROM TMP_TBL_Customer_Mars  as TMP_CUST_TBL

MINUS
--Find out the customer which already inserted before the failure occurred.
Select TMP_CUST_MARS.Cust_ID
,TMP_CUST_MARS.First_Name
,TMP_CUST_MARS.PLANET_ID
,TMP_CUST_MARS.XtraField1
,TMP_CUST_MARS.XtraField2
,TMP_CUST_MARS.XtraField3
,TMP_CUST_MARS.XtraField4
,TMP_CUST_MARS.XtraField5
TMP_CUST_MARS.XtraField6
FROM CUSTOMER AS CUST
JOIN
TMP_TBL_Customer_Mars AS TMP_CUST_MARS
ON
CUST.cust_id=TMP_CUST_MAR.Cust_ID
AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

At first you are selecting whole data from .csv file that you have stored in temp table and find out the already inserted record by joining with main table and finally, use MINUS operator between them to get the records those did not insert.

Step 3: Insert the missing data to the table
Now you are ready to insert the rest of the data into the table "Customer"

Insert into CUSTOMER

Select TMP_CUST_TBL.Cust_ID
,TMP_CUST_TBL.First_Name
,TMP_CUST_TBL.PLANET_ID
,TMP_CUST_TBL.XtraField1
,TMP_CUST_TBL.XtraField2
,TMP_CUST_TBL.XtraField3
,TMP_CUST_TBL.XtraField4
,TMP_CUST_TBL.XtraField5
 TMP_CUST_TBL.XtraField6 
FROM TMP_TBL_Customer_Mars  as TMP_CUST_TBL

MINUS
--Find out the customer which already inserted before the failure occurred.
Select TMP_CUST_MARS.Cust_ID
,TMP_CUST_MARS.First_Name
,TMP_CUST_MARS.PLANET_ID
,TMP_CUST_MARS.XtraField1
,TMP_CUST_MARS.XtraField2
,TMP_CUST_MARS.XtraField3
,TMP_CUST_MARS.XtraField4
,TMP_CUST_MARS.XtraField5
TMP_CUST_MARS.XtraField6
FROM CUSTOMER AS CUST
JOIN
TMP_TBL_Customer_Mars AS TMP_CUST_MARS
ON
CUST.cust_id=TMP_CUST_MAR.Cust_ID
AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID


Friday, November 14, 2014

Column length Limitation In Teradata


Maybe you got a task to create a view. And you are making alias of the columns names as per your customer request. Sometimes by giving the sensible name you will experience the limitation of teradata column length.

e.g. your query can look like below:

Create view View_generate_error3737
AS
SELECT
 
      Customer.Name as customer_name_from_Mars_with_tedius_huge_ear
      ,Customer.Age as Customer_age_from_march
     
FROM
Alian_Information AS Customer
where CTY='Mars' and behavior='tedius_huge_ear'

As long as you pressed F5, and will found the error like below:



Fig: 3737 error






Error: "3737: name requires more than 30 bytes in LATIN internal form" which is self explanatory. It means you can't define a column name that has characters more than 30.

So by putting bigger column name you can be ended up with a compile error in Teradata.

Monday, October 20, 2014

SAS DI Error: "CLI error trying to establish connection...."

I have been working with SAS DI studio for couple of months and never found the error like this before,“CLI error trying to establish connection: [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. : [Teradata][ODBC Teradata Driver] Not enough information to log on ”

Error is self-explanatory, it means the user id and password I have given is incorrect. I was really surprised!!! It was working fine last week. What happened in the mean time?? Who changes my SAS password??
No No, it’s not your SAS credential, look at the error carefully; it’s your Tera data credential.  I tried to remember and finally recalled my credential for Teradata has been changed in the meantime (few days ago).


Oooo Yes, I know how to fix it!!??? I opened ODBC data source administration and changed the password (Fig:1). It should work now.  Don’t you think same like me?


Fig 1: ODBC setup for Teradata

Nope, it did not work; still got the same error.

After searching a little bit found there is a SAS tool called ‘SAS Personal Login Manager’ where you should update your database password if your database credential changes.
How to do that?

It’s simple; open the tool “SAS Personal Login Manager” and updates the password and closes it (Fig:2).  

                                             Fig2: SAS personal Login Manager


Now if you reopen your SAS DI studio and run the job then the error will disappear.





Tuesday, October 14, 2014

Teradata: How to update duplicate records

Generally update statement will not allow updating duplicated records; records must be unique if you want to update a table.

But sometimes your business needs duplicate records in a table. Then your task is to provide the solution. How do you do that?

You need to make a new column that can generate sequential numbers and you need to add that column to duplicate rows to make a unique row.

For example:
Your data look like below, say table name is Weather_DK:

Country
CTY
AVG_TEMP
AVG_Humidity
DK
CPH
12.3
83%
DK
CPH
12.3
83%
DK
CPH
12.3
83%
DK
Arhus
11.3
85%
DK
Aalborg
12.3
87%
                                        Fig 1.1: Table Weather_DK

For example, now you want to update the data in Weather_SE table which has same column name as Weather_DK table  and data like below:
Country
AVG_TEMP
AVG_Humidity
SE
19.3
43%
SE
23.3
83%
SE
12.3
53%
                                    Fig 1.2: Table Weather_SE

Say; your business informed, Swedish average tempreture and humidity is exactly same as Denmark's capital city Copenhagen's (CPH) average temperature and humidity.

So your job is to update the Weather_SE table’s two column AVG_TEMP and AVG_Humidity  from Weather_DK data. When you write SQL query like below:

Update Weather_SE
Set
AVG_TEMP= DK.AVG_TEMP
AVG_Humidity=DK.AVG_Humidity
from
(Select  AVG_TEMP , AVG_Humidity
From Weather_DK
group by 1,2) as DK
where DK.CTY='CPH'

When you will run the above script you will get  error :”Failure 7547 Target row updated by multiple source rows”

To eliminate the error, you need to have unique rows. So you add a new columns with your result set, you can select like below:

select ROW_NUMBER() over (order by AVG_TEMP) as row_S, AVG_TEMP, AVG_Humidity
from Weather_DK  as  DK

Now you can easily update.


Tuesday, September 30, 2014

Challenge Scenario: Inserting data into a third Table

Sometimes you may find problem like below:

You have table called “Country_Code” that holds country code and name like below:
CTY_CODE
Country_name
001
USA
86
China
46
Sweden
45
Denmark

Now you have same group_code and communication rating scale value for all above country.
e.g.
group_code=09 and rating_scale_value=7

Your task is to insert data to the second table which called “Code_Rating_value”

Data should look like below:
CTY_CODE
group_code
Rating_scale_value
001
09
7
86
09
7
46
09
7
45
09
7

If it’s small amount of data then it’s easy to insert by following:
Insert into Code_Rating_value
Values(001,09,7)

Execute the above statement by changing the value four times and you are done!!
But if it’s a big table with thousands of records you will not going to do this thousands of times, will you??

You will find the below code to insert the data at once.

insert into Code_Rating_value
select  CTY. CTY_Code, Rating_value .group_code, Rating_value. Rating_scale_value
from
(select '09' as group_code, '07’ as Rating_scale_value , '1' as key) as Rating_value
left join (select distinct CTY_Code, '1' as key
from Country_Code) as CTY

 on CTY. key = Rating_value. Key

Wednesday, August 6, 2014

While do cast (Decimal to Char): follow me!!!

As a part of my task; I was making query something like below,
select cast (city_code as char(10)), city_name||’_’ ||city_type as CITY_INFO
From CITY

I wondered why my data  for City_Code look like this:
City_Code
CITY_INFO
10134.
COPENHAGEN_CYCKLE
101.
STOCKHOLM_TOG
3497.
DHAKA_RICKSH
2789.
LISBON_TOUR

City_code ended with ‘.’ Then looked carefully the original datatype for the column (CITY_CODE), it’s decimal (18, 0). That is the reason why it’s giving the problem.
Googled it and found some solution:
select cast(cast(city_code as integer) as char(10)), city_name||’_’ ||city_type as CITY_INFO
From CITY

Since the column is decimal, it’s first cast to integer and then cast to char. It works perfectly. But senior developer who reviewed the code asked not to two time casting rather use trim function to do this.
Fair enough!! Got motivation to do something else J
Then I have tried to implement the code like below:
select TRIM(TRAILING '.' FROM (cast(UNQ_ID_SRC_DEC as char(10))))
,city_name||’_’ ||city_type as CITY_INFO
 From CITY
I have trimmed with trailing ‘.’ But result did not change!!
City_Code
CITY_INFO
10134.
COPENHAGEN_CYCKLE
101.
STOCKHOLM_TOG
3497.
DHAKA_RICKSH
2789.
LISBON_TOUR
I was really curious to know where the problem is. After investigating a little more time on it, I made the query like below:
select TRIM(TRAILING '.' FROM (cast(UNQ_ID_SRC_DEC as varchar(10))))
,city_name||’_’ ||city_type as CITY_INFO
 From CITY
Yaahuuu!! It works as I expected. ‘.’ has been removed.
City_Code
CITY_INFO
10134
COPENHAGEN_CYCKLE
101
STOCKHOLM_TOG
3497
DHAKA_RICKSH
2789
LISBON_TOUR

I just changes cast data type from char to varchar and then it works fine!!

Hope my experience will  help some of you sometime!!!

Tuesday, August 5, 2014

Tricks: UNION and UNION ALL

While you are making union in Teradata it’s very important to follow below rules:

1) Select statement should have equal number of columns
For example:
select city_code, city_name, city_type
From CITY
union
select EMP_CITY_CODE as city_code, EMP_CITY as city_name, EMP_city_type as City_type
from CITY_EMP

2) Type should be same
If data type is different; then it should be cast and then make union, if you don’t define/cast the type then value will be truncated. So when you define type consider maximum size of data type.
For example:
select cast (city_code as char(10)), city_name, city_type
From CITY
union
select cast(EMP_CITY_CODE as char(10)) as city_code, EMP_CITY as city_name, EMP_city_type as City_type
from CITY_EMP

3) Where clause at UNION
At the time you are selecting if you cast a type , always remember to do the same when you filter at where clause.
For example:
insert DW_EMP_CITY (city_code,city_name,city_type)
select cast (city_code as char(10)), city_name, city_type
From CITY
union
select cast(EMP_CITY_CODE as char(10)) as city_code, EMP_CITY as city_name, EMP_city_type as City_type
from CITY_EMP
where not exists (select 1 from DW_EMP_CITY as EDW_CITY where EDW_CITY.city_code=cast(CITY_EMP.EMP_CITY_CODE as char(10))

4) When to use UNION/UNION ALL

If you want to avoid duplicates then use ‘UNION’ otherwise you use ‘UNION ALL’.

Thursday, May 29, 2014

SQL Left Join: Condition on where clause or on clause

As a database developer, we know about SQL left, right or inner join clearly. But when it comes to left join or right join with condition putting on where clause or on clause can make big  difference that we sometimes overlook or get panic why those making difference.
I will go directly to the example, e.g. we have Account and Customer table and we want all the accounts those are exist in Account table and also consider if any customer is added on '2013-05-27', we take him as well.


Here we have Account and Customer table and data look like below:

Account
Account_No
Status
Eff_DT
Customr_ID
123
actv_
21-05-2014
301
124
actv_
22-05-2014
307
125
actv_
23-05-2014
303


Customer
Customer_ID
Customer_Name
Eff_DT
301
john
21-05-2014
302
san
21-05-2014
303
paul
22-05-2014
304
helle
23-05-2014
305
jon
24-05-2014

SQL Query :
select Account_no, status,A. eff-dt, A.customer_id, Customer_Name
 from ACCOUNT as A
join customer as C on A.customer_id=C.customer_id
Result Set:
Account_No
Status
Eff_DT
Customr_ID
Customer_Name
123
actv_
21-05-2014
301
john
124
actv_
22-05-2014
307
null
125
actv_
23-05-2014
303
paul
Changes the query with 'on' clause:
select Account_no, status,A. eff-dt, A.customer_id, Customer_Name, C.EFF_DT as Customer_EFF_DT
 from ACCOUNT as A
join customer as C on A.customer_id=C.customer_id
and C.eff_dt= date '2013-05-24'
Result will look like:
Account_No
Status                        
Eff_DT
Customr_ID
Customer_Name
Customer_EFF_DT
123
actv_
21-05-2014
301
john
null
124
actv_
22-05-2014
307
null
null
125
actv_
23-05-2014
303
paul
null
Changes the query with 'Where' clause
select Account_no, status,A. eff-dt, A.customer_id, Customer_Name, C.EFF_DT as Customer_EFF_DT 
from ACCOUNT as A
join customer as C on A.customer_id=C.customer_id
where C.eff_dt= date '2013-05-24'
Result will return nothing
It means where clause filter after join has established. When we did same condition 
 C.eff_dt= date '2013-05-27' on clause it was still under left join on condition
and that returns you data as how left join work. But where clause work after making join, 
if we look at the last result, consider we are implying where clause on the last record set.