It's all about Data

Wednesday, January 21, 2015

Teradata: Row returned message via Macro can make you puzzle

I am going to share a interesting problem that I found while I was executing a macro.

By looking at the result set window(above image), if you click ‘NO’ button how many rows you expect to get? You may think how stupid question it is? It’s clearly showing ‘427755’ rows so it must be returning ‘427755’ rows.

Yap!! I believed so but I was completely wrong.

After clicking, ‘NO’ I got 142,585 rows as a result set. Now question is, who is stupid? Me or Teradata SQL Assistant?? What’s going on inside? Let’s find out.

Let closer look at the problem. To explain the problem, I am taking similar example with small number of rows.

Say, we have two tables one is ‘Customer_Info’ and other is ‘Transaction’ Table and data look like below:

Customer_id Name Address Active

101 John 37 Lucas Aven Y

103 Paul 47 Honey bee Y

105 Mile 37 Oxford Street Y

Fig 1: Customer_Info table data

Customer_id Trn_AMT Trn_Date

101 10000 2014-11-28

102 5000 2014-11-29

103 8900 2015-01-09

105 6000 2015-01-10

Fig 2: Transaction table with data

Lets join two tables:

Select CUS.Name as Customer_Name

, CUS.Address as Customer_Address

, T1.TRN_AMT as Amount

, T1.TRN_Date as Transaction_DATE

FROM Customer_Info AS CUS

LEFT JOIN Transaction AS T1

on T1.customer_id=CUS.customer_id;

Which return below result set:

Customer_Name Customer_Address Amount Transaction_DATE

Mile 37 Oxford Street 6000 2015-01-10

John 37 Lucas Aven 10000 2014-11-28

Paul 47 Honey bee 8900 2015-01-09

Fig 3: result set after Left joining with two tables

And Teradata SQL assistant, show below message in the bottom pane which is exactly same as returned value.

Fig 4: Row returned message

However, we will try now below code:

Delete from CUSTOMER_TRANSACTION; --3 rows deleted

Insert into CUSTOMER_TRANSACTION --3 rows inserted

select CUS.Name as Customer_Name

, CUS.Address as Customer_Address

, T1.TRN_AMT as TRansaction_Amount

, T1.TRN_Date as Transaction_DATE

FROM Customer_Info AS CUS

LEFT JOIN Transsaction AS T1

on T1.customer_id=CUS.customer_id;

Select * from CUSTOMER_TRANSACTION; --3 rows selected

It means total 9 rows have been processed by above code, and exactly shows like below:

Fig 5: rows processed message

At this point, we will make a macro with above code which is:

create macro TRN_Process_by_Customer

(

delete from CUSTOMER_TRANSACTION; --delete 3 rows

insert into CUSTOMER_TRANSACTION --insert 3 rows

select CUS.Name as Customer_Name

, CUS.Address as Customer_Address

, T1.TRN_AMT as TRansaction_Amount

, T1.TRN_Date as Transaction_DATE

FROM Customer_Info AS CUS

LEFT JOIN Transsaction AS T1

on T1.customer_id=CUS.customer_id;

select * from CUSTOMER_TRANSACTION; --select 3 rows, which is return result

)

And execute the macro:

exec TRN_Process_by_Customer ---what do you expect to have message from Teradata SQL Assistant?

If you look at the beginning of this article, we have seen number of rows returned ‘427755’ but actual rows were ‘142,585’.

And this case, it’s same, we expect to get result saying “Execute completed, 3rows returned….” But actually you got the below message.

Fig 6: misguiding message

It means when we execute a macro it show return rows as processed rows. To avoid confusion, message could be look like “Execute completed, 9rows processed, 3 rows returned….”

Since above example return only 3 rows so you will be happy to see that you got the right result though display message saying something different.

But for huge number of rows return can give you big confusion and you will not have any clue what’s going on until you press the “NO” button and count actual rows.

Saturday, December 27, 2014

Analytical function: ROW_NUMBER () with Qualify in Teradata

This post will help to understand QUALIFY ROW_NUMBER OVER (PARTITION BY ...) phrase with simple example.

We take an example like below:

INDUST_CODE	INDUSTRY_NAME	ACTIVATE_DATE
1011	GOOGLE	13-12-2009
1013	YOUTUBE	01-09-2008
1011	GOOGLE	15-12-2009
1012	MICROSOFT	14-10-1999
1011	GOOGLE	11-12-2009

The data listed above have more than one row with Industry named ‘Google’. Business needs the industry name without duplication, they don’t want one industry name more than once. They like to see the data like below:

INDUST_CODE	INDUSTRY_NAME
1011	GOOGLE
1012	MICROSOFT
1013	YOUTUBE

How do you do that?

You are expert in SQL, so wrote the script like below which works fine:

select INDUST_CODE,INDUSTRY_NAME

from INDUSTRY

QUALIFY

row_number() Over (partition by INDUST_CODE order by ACTIVATE_DATE desc)=1

If you want to know how the code worked? Then please find the explanation step by step:

1) Group by (Partition by)with Industry Code (INDUST_CODE)

2) Order by with ACTIVATE_DATE which means latest date will be used to find the first row

3) Then uses ‘QUALIFY’ to select the row.

Lets explain the syntax :

Make the translation easy:

qualify row_number() over (Partition and order by brace) = 1

which means,

Selecting first row from the result set after applying group by and order by function.

Thursday, December 4, 2014

Extra Attention while using Multiple Tables in Update Statement (UPDATE From SELECT Statement/Using JOIN in UPDATE Statement)

While updating data in a table you may discover error with “Error: 3993 Illegal usage of alias name” , it will be really difficult to find out at first glance what went wrong?
As you wrote the code, at least you did not expect to have alias error. You are almost sure that you did not do any mistake like alias problem.

Let’s have a look at the piece of code:

Update ACCOUNTS AS ACC
FROM
(SELECT
UPD_ACC.ACC_NO
,UPD_ACC.DEB_AMT
FROM CUSTOMER AS CUST
JOIN
TRANSACTIONS AS TRN ON
CUST. City_ID =TRN.City_ID
AND CUST.Customer_ID=TRN.Customer_ID
Where TRN.CTY='NY') AS UPD_ACC
SET ACC.ACC_NO= UPD_ACC.ACC_NO
,ACC.DEB_AMT= UPD_ACC.DEB_AMT
Where
ACC.Customer_ID= UPD_ACC.Customer_ID
And ACC. City_ID =UPD_ACC.City_ID

After you run the code, you can’t expect to have alias error because you have used alias correctly in the code.

However, the main reason of the error is; you are not allow to use alias left side of the SET syntax like you did at above: Set ACC.ACC_NO= UPD_ACC.ACC_NO

So correct syntax will be: Set ACC_NO= UPD_ACC.ACC_NO

As well as, at where clause you need to use the table name not the alias(the table you are updating).

e.g.

Where

ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID --NOT ACC.Customer_ID

And ACCOUNTS. City_ID =UPD_ACC.City_ID

Now we can fix the code and rewrite again like below which will work without error:

Update ACCOUNTS

FROM

(SELECT

UPD_ACC.ACC_NO

,UPD_ACC.DEB_AMT

FROM CUSTOMER AS CUST

JOIN

TRANSACTIONS AS TRN ON

CUST. City_ID =TRN.City_ID

AND CUST.Customer_ID=TRN.Customer_ID

Where TRN.CTY='NY') as UPD_ACC

SETACC_NO= UPD_ACC.ACC_NO

,DEB_AMT= UPD_ACC.DEB_AMT

Where

ACCOUNTS.Customer_ID= UPD_ACC.Customer_ID

AND ACCOUNTS. City_ID =UPD_ACC.City_ID

Tuesday, November 18, 2014

Data Import failure in the midway: What to DO?

If you work with data warehouse, sometime you may get task to insert data into a table from a .csv file. It's simple task, isn't it?? Just use import functionality from Terdadata SQL Assistant and insert the records. (How to import data from .csv to teradata table is not scope of this article).

But it's not always true that it can straight forward. Considering a task you got from business where you received a .csv file with 50,000 rows and 15 columns in it. And this data you have to insert into a table (not more than a table). You didn't think much and press F5 to run the query and it's started getting data into the table. You feel happy :) But your happiness will be blurred if you see an error message saying "Duplicate row error in...."after inserting 10,000 records.

Now you will start thinking "I should have think through the situation before pressing the F5 key, I am stupid bla. bla." No worries, human being make error, so you do!!

We will find out the way out :) Assume your table already had few hundred thousands rows and you inserted just now 10,000 and then it failed. Smart way to get rid off the problem is to find out the deviation of the records and insert the missing rows.

What will be steps to fix the problem?

We start with an example to explain the steps involved to solve the problem, say table name is Customer and the received .csv file from business called 'Customer_from_Mars.csv"

Step 1: Insert the data into a Temporary Table
Take the .csv file and insert data from .csv to a temporary table called "TMP_TBL_Customer_Mars".

Step 2: Find out the deviation
To find out the deviation you can use MINUS operator. The Code will look like below:

--Select the all customers from .csv file that you have inserted into temp table

Select TMP_CUST_TBL.Cust_ID

,TMP_CUST_TBL.First_Name

,TMP_CUST_TBL.PLANET_ID

,TMP_CUST_TBL.XtraField1

,TMP_CUST_TBL.XtraField2

,TMP_CUST_TBL.XtraField3

,TMP_CUST_TBL.XtraField4

,TMP_CUST_TBL.XtraField5

TMP_CUST_TBL.XtraField6

FROM TMP_TBL_Customer_Mars as TMP_CUST_TBL

MINUS

--Find out the customer which already inserted before the failure occurred.

Select TMP_CUST_MARS.Cust_ID

,TMP_CUST_MARS.First_Name

,TMP_CUST_MARS.PLANET_ID

,TMP_CUST_MARS.XtraField1

,TMP_CUST_MARS.XtraField2

,TMP_CUST_MARS.XtraField3

,TMP_CUST_MARS.XtraField4

,TMP_CUST_MARS.XtraField5

TMP_CUST_MARS.XtraField6

FROM CUSTOMER AS CUST

JOIN

TMP_TBL_Customer_Mars AS TMP_CUST_MARS

CUST.cust_id=TMP_CUST_MAR.Cust_ID

AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

At first you are selecting whole data from .csv file that you have stored in temp table and find out the already inserted record by joining with main table and finally, use MINUS operator between them to get the records those did not insert.

Step 3: Insert the missing data to the table
Now you are ready to insert the rest of the data into the table "Customer"

Insert into CUSTOMER

Select TMP_CUST_TBL.Cust_ID

,TMP_CUST_TBL.First_Name

,TMP_CUST_TBL.PLANET_ID

,TMP_CUST_TBL.XtraField1

,TMP_CUST_TBL.XtraField2

,TMP_CUST_TBL.XtraField3

,TMP_CUST_TBL.XtraField4

,TMP_CUST_TBL.XtraField5

TMP_CUST_TBL.XtraField6

FROM TMP_TBL_Customer_Mars as TMP_CUST_TBL

MINUS

--Find out the customer which already inserted before the failure occurred.

Select TMP_CUST_MARS.Cust_ID

,TMP_CUST_MARS.First_Name

,TMP_CUST_MARS.PLANET_ID

,TMP_CUST_MARS.XtraField1

,TMP_CUST_MARS.XtraField2

,TMP_CUST_MARS.XtraField3

,TMP_CUST_MARS.XtraField4

,TMP_CUST_MARS.XtraField5

TMP_CUST_MARS.XtraField6

FROM CUSTOMER AS CUST

JOIN

TMP_TBL_Customer_Mars AS TMP_CUST_MARS

CUST.cust_id=TMP_CUST_MAR.Cust_ID

AND CUST.PLANET_ID=TMP_CUST_MAR.PLANET_ID

Friday, November 14, 2014

Column length Limitation In Teradata

Maybe you got a task to create a view. And you are making alias of the columns names as per your customer request. Sometimes by giving the sensible name you will experience the limitation of teradata column length.

e.g. your query can look like below:

Create view View_generate_error3737

SELECT

Customer.Name as customer_name_from_Mars_with_tedius_huge_ear

,Customer.Age as Customer_age_from_march

FROM

Alian_Information AS Customer

where CTY='Mars' and behavior='tedius_huge_ear'

As long as you pressed F5, and will found the error like below:

Fig: 3737 error

Error: "3737: name requires more than 30 bytes in LATIN internal form" which is self explanatory. It means you can't define a column name that has characters more than 30.

So by putting bigger column name you can be ended up with a compile error in Teradata.

Monday, October 20, 2014

SAS DI Error: "CLI error trying to establish connection...."

I have been working with SAS DI studio for couple of months and never found the error like this before,“CLI error trying to establish connection: [Teradata][ODBC Teradata Driver][Teradata Database] The UserId, Password or Account is invalid. : [Teradata][ODBC Teradata Driver] Not enough information to log on ”

Error is self-explanatory, it means the user id and password I have given is incorrect. I was really surprised!!! It was working fine last week. What happened in the mean time?? Who changes my SAS password??

No No, it’s not your SAS credential, look at the error carefully; it’s your Tera data credential. I tried to remember and finally recalled my credential for Teradata has been changed in the meantime (few days ago).

Oooo Yes, I know how to fix it!!??? I opened ODBC data source administration and changed the password (Fig:1). It should work now. Don’t you think same like me?

Fig 1: ODBC setup for Teradata

Nope, it did not work; still got the same error.

After searching a little bit found there is a SAS tool called ‘SAS Personal Login Manager’ where you should update your database password if your database credential changes.

How to do that?

It’s simple; open the tool “SAS Personal Login Manager” and updates the password and closes it (Fig:2).

Fig2: SAS personal Login Manager

Now if you reopen your SAS DI studio and run the job then the error will disappear.

Tuesday, October 14, 2014

Teradata: How to update duplicate records

Generally update statement will not allow updating duplicated records; records must be unique if you want to update a table.

But sometimes your business needs duplicate records in a table. Then your task is to provide the solution. How do you do that?

You need to make a new column that can generate sequential numbers and you need to add that column to duplicate rows to make a unique row.

For example:

Your data look like below, say table name is Weather_DK:

Country	CTY	AVG_TEMP	AVG_Humidity
DK	CPH	12.3	83%
DK	CPH	12.3	83%
DK	CPH	12.3	83%
DK	Arhus	11.3	85%
DK	Aalborg	12.3	87%

Fig 1.1: Table Weather_DK

For example, now you want to update the data in Weather_SE table which has same column name as Weather_DK table and data like below:

Country	AVG_TEMP	AVG_Humidity
SE	19.3	43%
SE	23.3	83%
SE	12.3	53%

Fig 1.2: Table Weather_SE

Say; your business informed, Swedish average tempreture and humidity is exactly same as Denmark's capital city Copenhagen's (CPH) average temperature and humidity.

So your job is to update the Weather_SE table’s two column AVG_TEMP and AVG_Humidity from Weather_DK data. When you write SQL query like below:

Update Weather_SE
Set
AVG_TEMP= DK.AVG_TEMP
AVG_Humidity=DK.AVG_Humidity
from
(Select AVG_TEMP , AVG_Humidity
From Weather_DK
group by 1,2) as DK
where DK.CTY='CPH'

When you will run the above script you will get error :”Failure 7547 Target row updated by multiple source rows”

To eliminate the error, you need to have unique rows. So you add a new columns with your result set, you can select like below:

select ROW_NUMBER() over (order by AVG_TEMP) as row_S, AVG_TEMP, AVG_Humidity

from Weather_DK as DK

Now you can easily update.

Pages

Wednesday, January 21, 2015

Saturday, December 27, 2014

Thursday, December 4, 2014

Tuesday, November 18, 2014

Friday, November 14, 2014

Monday, October 20, 2014

Tuesday, October 14, 2014