Wednesday, January 21, 2015

Teradata: Row returned message via Macro can make you puzzle

I am going to share a interesting problem that I found while I was executing a macro.













By looking at the result set window(above image), if you click ‘NO’ button how many rows you expect to get?  You may think how stupid question it is? It’s clearly showing ‘427755’ rows so it must be returning ‘427755’ rows.

Yap!! I believed so but I was completely wrong. 

After clicking, ‘NO’ I got 142,585 rows as a result set. Now question is, who is stupid? Me or Teradata SQL Assistant??  What’s going on inside? Let’s find out.

Let closer look at the problem. To explain the problem, I am taking similar example with small number of rows.

Say, we have two tables one is ‘Customer_Info’ and other is ‘Transaction’ Table and data look like below:
   Customer_id  Name Address         Active
101                  John 37 Lucas Aven     Y
103                  Paul 47 Honey bee       Y
105                  Mile 37 Oxford Street   Y
     
          Fig 1: Customer_Info table data

Customer_id Trn_AMT Trn_Date
101 10000   2014-11-28
102 5000 2014-11-29
103 8900 2015-01-09
105 6000 2015-01-10
                 
Fig 2: Transaction table with data

Lets join  two tables:

Select CUS.Name as Customer_Name
 , CUS.Address as Customer_Address
 , T1.TRN_AMT as Amount
 , T1.TRN_Date as Transaction_DATE
 FROM Customer_Info AS CUS
 LEFT JOIN Transaction AS T1
 on T1.customer_id=CUS.customer_id;

Which return below result set:

Customer_Name Customer_Address Amount   Transaction_DATE
Mile       37 Oxford Street 6000           2015-01-10
John       37 Lucas Aven       10000          2014-11-28
Paul     47 Honey bee 8900    2015-01-09
            
Fig 3: result set after Left joining with two tables 

And Teradata SQL assistant, show below message in the bottom pane which is exactly same as returned value.
                            
                                   Fig 4: Row returned message

However, we will try now below code:

 Delete from CUSTOMER_TRANSACTION;   --3 rows deleted

 Insert into CUSTOMER_TRANSACTION   --3 rows inserted
 select CUS.Name as Customer_Name
 , CUS.Address as Customer_Address
 , T1.TRN_AMT as TRansaction_Amount
 , T1.TRN_Date as Transaction_DATE

 FROM Customer_Info AS CUS
 LEFT JOIN Transsaction AS T1
 on T1.customer_id=CUS.customer_id;
Select * from CUSTOMER_TRANSACTION;  --3 rows selected

It means total 9 rows have been processed by above code, and exactly shows like below:









Fig 5: rows processed message

At this point, we will make a macro with above code which is:

create macro TRN_Process_by_Customer
 AS
 (
 delete from CUSTOMER_TRANSACTION;       --delete 3 rows

 insert into CUSTOMER_TRANSACTION        --insert 3 rows
 select CUS.Name as Customer_Name
 , CUS.Address as Customer_Address
 , T1.TRN_AMT as TRansaction_Amount
 , T1.TRN_Date as Transaction_DATE

 FROM Customer_Info AS CUS
 LEFT JOIN Transsaction AS T1
 on T1.customer_id=CUS.customer_id;

 select * from CUSTOMER_TRANSACTION;      --select 3 rows, which is return result
 )
And execute the macro:
exec TRN_Process_by_Customer    ---what do you expect to have message from Teradata SQL Assistant?
If you look at the beginning of this article, we have seen number of rows returned ‘427755’ but actual rows were ‘142,585’. 

And this case, it’s same, we expect to get result saying “Execute completed, 3rows returned….” But actually you got the below message. 








Fig 6: misguiding message


It means when we execute a macro it show return rows as processed rows.  To avoid confusion, message could be look like “Execute completed, 9rows processed, 3 rows returned….”
Since above example return only 3 rows so you will be happy to see that you got the right result though display message saying something different.
But for huge number of rows return can give you big confusion and you will not have any clue what’s going on until you press the “NO” button and count actual rows. 


Saturday, December 27, 2014

Analytical function: ROW_NUMBER () with Qualify in Teradata

This post will help to understand QUALIFY  ROW_NUMBER OVER (PARTITION BY ...) phrase with simple example.

We take an example like below:

INDUST_CODE
INDUSTRY_NAME
ACTIVATE_DATE
1011
GOOGLE
13-12-2009
1013
YOUTUBE
01-09-2008
1011
GOOGLE
15-12-2009
1012
MICROSOFT
14-10-1999
1011
GOOGLE
11-12-2009

The data listed above have more than one row with Industry named ‘Google’.  Business needs the industry name without duplication, they don’t want one industry name more than once. They like to see the data like below:

INDUST_CODE
 INDUSTRY_NAME

1011
GOOGLE

1012
MICROSOFT

1013
YOUTUBE


How do you do that?
You are expert in SQL, so wrote the script like below which works fine:
select INDUST_CODE,INDUSTRY_NAME
from INDUSTRY
QUALIFY
row_number() Over (partition by INDUST_CODE order by ACTIVATE_DATE desc)=1

If you want to know how the code worked? Then please find the explanation step by step:
1) Group by (Partition by)with Industry Code (INDUST_CODE)
2) Order by with ACTIVATE_DATE which means latest date will be used to find the first row
3) Then uses ‘QUALIFY’ to select the row.

Lets explain the syntax :





















Make the translation easy:
qualify row_number() over (Partition and order by brace) = 1
    which means,
    Selecting first row from the result set after applying group by and order by function. 



Thursday, December 4, 2014

Extra Attention while using Multiple Tables in Update Statement (UPDATE From SELECT Statement/Using JOIN in UPDATE Statement)

While updating data in a table you may discover error with “Error: 3993 Illegal usage of alias name” , it will be really difficult to find out at first glance what went wrong?
As you wrote the code, at least you did not expect to have alias error. You are almost sure that you did not do any mistake like alias problem.

Let’s have a look at the piece of code:

Update ACCOUNTS AS ACC
FROM
(SELECT
 UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN
TRANSACTIONS AS TRN ON
 CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') AS UPD_ACC
SET ACC.ACC_NO= UPD_ACC.ACC_NO
     ,ACC.DEB_AMT= UPD_ACC.DEB_AMT
Where
ACC.Customer_ID= UPD_ACC.Customer_ID
And ACC. City_ID =UPD_ACC.City_ID

After you run the code, you can’t expect to have alias error because you have used alias correctly in the code.
However, the main reason of the error is; you are not allow to use alias left side of the SET syntax like you did at above:  Set ACC.ACC_NO= UPD_ACC.ACC_NO

So correct syntax will be: Set ACC_NO= UPD_ACC.ACC_NO

As well as, at where clause you need to use the table name not the alias(the table you are updating).
e.g. 

Where  
        ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID --NOT ACC.Customer_ID
And ACCOUNTS. City_ID =UPD_ACC.City_ID

Now we can fix the code and rewrite again like below which will work without error:

Update ACCOUNTS 
FROM 
(SELECT
 UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN 
TRANSACTIONS AS TRN ON
CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') as UPD_ACC
SETACC_NO= UPD_ACC.ACC_NO
     ,DEB_AMT= UPD_ACC.DEB_AMT
Where  
         ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID
AND ACCOUNTS. City_ID =UPD_ACC.City_ID

Tuesday, November 18, 2014

Data Import failure in the midway: What to DO?

If you work with data warehouse, sometime you may get task to insert data into a table from a .csv file. It's simple task, isn't it?? Just use import functionality from Terdadata SQL Assistant and insert the records. (How to import data from .csv to teradata table is not scope of this article).

But it's not always true that it can straight forward. Considering a task you got from business where you received a .csv file with 50,000 rows and 15 columns in it. And  this data you have to insert into a table (not more than a table). You didn't think much and press F5 to run the query and it's started getting data into the table. You feel happy :) But your happiness will be blurred if you see an error message saying "Duplicate row error in...."after inserting 10,000 records.

Now you will start thinking "I should have think through the situation before pressing the F5 key, I am stupid bla. bla." No worries, human being make error, so you do!!

We will find out the way out :) Assume your table already had few hundred thousands rows and you inserted just now 10,000 and then it failed. Smart way to get rid off the problem is to find out the deviation of the records and insert the missing rows.

What will be steps to fix the problem?

We start with an example to explain the steps involved to solve the problem, say table name is Customer and the received  .csv file from business called 'Customer_from_Mars.csv"

Step 1: Insert the data into a Temporary Table
Take the .csv file and insert data from .csv to a temporary table called "TMP_TBL_Customer_Mars".

Step 2: Find out the deviation
To find out the deviation you can use MINUS operator. The Code will look like below:

--Select the all customers from .csv file that you have inserted into temp table

Select TMP_CUST_TBL.Cust_ID
,TMP_CUST_TBL.First_Name
,TMP_CUST_TBL.PLANET_ID
,TMP_CUST_TBL.XtraField1
,TMP_CUST_TBL.XtraField2
,TMP_CUST_TBL.XtraField3
,TMP_CUST_TBL.XtraField4
,TMP_CUST_TBL.XtraField5
 TMP_CUST_TBL.XtraField6 
FROM TMP_TBL_Customer_Mars  as TMP_CUST_TBL

MINUS
--Find out the customer which already inserted before the failure occurred.
Select TMP_CUST_MARS.Cust_ID
,TMP_CUST_MARS.First_Name
,TMP_CUST_MARS.PLANET_ID
,TMP_CUST_MARS.XtraField1
,TMP_CUST_MARS.XtraField2
,TMP_CUST_MARS.XtraField3
,TMP_CUST_MARS.XtraField4
,TMP_CUST_MARS.XtraField5
TMP_CUST_MARS.XtraField6
FROM CUSTOMER AS CUST
JOIN
TMP_TBL_Customer_Mars AS TMP_CUST_MARS
ON
CUST.cust_id=TMP_CUST_MAR.Cust_ID
AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

At first you are selecting whole data from .csv file that you have stored in temp table and find out the already inserted record by joining with main table and finally, use MINUS operator between them to get the records those did not insert.

Step 3: Insert the missing data to the table
Now you are ready to insert the rest of the data into the table "Customer"

Insert into CUSTOMER

Select TMP_CUST_TBL.Cust_ID
,TMP_CUST_TBL.First_Name
,TMP_CUST_TBL.PLANET_ID
,TMP_CUST_TBL.XtraField1
,TMP_CUST_TBL.XtraField2
,TMP_CUST_TBL.XtraField3
,TMP_CUST_TBL.XtraField4
,TMP_CUST_TBL.XtraField5
 TMP_CUST_TBL.XtraField6 
FROM TMP_TBL_Customer_Mars  as TMP_CUST_TBL

MINUS
--Find out the customer which already inserted before the failure occurred.
Select TMP_CUST_MARS.Cust_ID
,TMP_CUST_MARS.First_Name
,TMP_CUST_MARS.PLANET_ID
,TMP_CUST_MARS.XtraField1
,TMP_CUST_MARS.XtraField2
,TMP_CUST_MARS.XtraField3
,TMP_CUST_MARS.XtraField4
,TMP_CUST_MARS.XtraField5
TMP_CUST_MARS.XtraField6
FROM CUSTOMER AS CUST
JOIN
TMP_TBL_Customer_Mars AS TMP_CUST_MARS
ON
CUST.cust_id=TMP_CUST_MAR.Cust_ID
AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID


Friday, November 14, 2014

Column length Limitation In Teradata


Maybe you got a task to create a view. And you are making alias of the columns names as per your customer request. Sometimes by giving the sensible name you will experience the limitation of teradata column length.

e.g. your query can look like below:

Create view View_generate_error3737
AS
SELECT
 
      Customer.Name as customer_name_from_Mars_with_tedius_huge_ear
      ,Customer.Age as Customer_age_from_march
     
FROM
Alian_Information AS Customer
where CTY='Mars' and behavior='tedius_huge_ear'

As long as you pressed F5, and will found the error like below:



Fig: 3737 error






Error: "3737: name requires more than 30 bytes in LATIN internal form" which is self explanatory. It means you can't define a column name that has characters more than 30.

So by putting bigger column name you can be ended up with a compile error in Teradata.

Monday, October 20, 2014

SAS DI Error: "CLI error trying to establish connection...."

I have been working with SAS DI studio for couple of months and never found the error like this before,“CLI error trying to establish connection: [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. : [Teradata][ODBC Teradata Driver] Not enough information to log on ”

Error is self-explanatory, it means the user id and password I have given is incorrect. I was really surprised!!! It was working fine last week. What happened in the mean time?? Who changes my SAS password??
No No, it’s not your SAS credential, look at the error carefully; it’s your Tera data credential.  I tried to remember and finally recalled my credential for Teradata has been changed in the meantime (few days ago).


Oooo Yes, I know how to fix it!!??? I opened ODBC data source administration and changed the password (Fig:1). It should work now.  Don’t you think same like me?


Fig 1: ODBC setup for Teradata

Nope, it did not work; still got the same error.

After searching a little bit found there is a SAS tool called ‘SAS Personal Login Manager’ where you should update your database password if your database credential changes.
How to do that?

It’s simple; open the tool “SAS Personal Login Manager” and updates the password and closes it (Fig:2).  

                                             Fig2: SAS personal Login Manager


Now if you reopen your SAS DI studio and run the job then the error will disappear.





Tuesday, October 14, 2014

Teradata: How to update duplicate records

Generally update statement will not allow updating duplicated records; records must be unique if you want to update a table.

But sometimes your business needs duplicate records in a table. Then your task is to provide the solution. How do you do that?

You need to make a new column that can generate sequential numbers and you need to add that column to duplicate rows to make a unique row.

For example:
Your data look like below, say table name is Weather_DK:

Country
CTY
AVG_TEMP
AVG_Humidity
DK
CPH
12.3
83%
DK
CPH
12.3
83%
DK
CPH
12.3
83%
DK
Arhus
11.3
85%
DK
Aalborg
12.3
87%
                                        Fig 1.1: Table Weather_DK

For example, now you want to update the data in Weather_SE table which has same column name as Weather_DK table  and data like below:
Country
AVG_TEMP
AVG_Humidity
SE
19.3
43%
SE
23.3
83%
SE
12.3
53%
                                    Fig 1.2: Table Weather_SE

Say; your business informed, Swedish average tempreture and humidity is exactly same as Denmark's capital city Copenhagen's (CPH) average temperature and humidity.

So your job is to update the Weather_SE table’s two column AVG_TEMP and AVG_Humidity  from Weather_DK data. When you write SQL query like below:

Update Weather_SE
Set
AVG_TEMP= DK.AVG_TEMP
AVG_Humidity=DK.AVG_Humidity
from
(Select  AVG_TEMP , AVG_Humidity
From Weather_DK
group by 1,2) as DK
where DK.CTY='CPH'

When you will run the above script you will get  error :”Failure 7547 Target row updated by multiple source rows”

To eliminate the error, you need to have unique rows. So you add a new columns with your result set, you can select like below:

select ROW_NUMBER() over (order by AVG_TEMP) as row_S, AVG_TEMP, AVG_Humidity
from Weather_DK  as  DK

Now you can easily update.