It's all about Data: Tera Data: delete duplicate records

Wednesday, December 4, 2013

Tera Data: delete duplicate records

As I am from MS Sql background,could not think deleting duplicate records could be such cumbersome. It is not straight forward like MS SQL. In teradata, you will need help of temporary table to remove duplicates data from your table. My writing is only concerned deleting records from a single table, deleting records from more than one table joined by reference is different.

--While you do group by and count is more than one, you get duplicate records
select StoreId, Smonth, ProdID, Sales from Paul_db.Test_sales_tbl group by 1,2,3,4 having count (*)>1

--or
select distinct * from Paul_db.Test_sales_tbl

--Now need to delete the duplicate records, in tera data, first you move your data in a temporary table

--create temp table without copying data

Create volatile table Tmp_Tst_sales2 as (select * from Paul_db.Test_sales_tbl) WITH NO DATA

--The above code worked well and created the table structure

--Now insert data into the table
insert into Tmp_Tst_sales2 select distinct * from Paul_db.Test_sales_tbl
select * from Tmp_Tst_sales2

--Unfortunately, above code can't insert the data.

--So We do other way; create table with records
drop table Tmp_Tst_sales
create volatile table Tmp_Tst_sales as (select * from Paul_db.Test_sales_tbl) WITH DATA
ON COMMIT PRESERVE ROWS;

select * from Tmp_Tst_sales

--delete the duplicate records from volatile table and then insert only distinct records
delete from Tmp_Tst_sales

insert into Tmp_Tst_sales select distinct * from edwwrk_U.Test_sales_tbl

--Now the above code works

--So finally insert the distinct records from temp table to real table

delete from edwwrk_U.Test_sales_tbl

insert into edwwrk_U.Test_sales_tbl select distinct * from Tmp_Tst_sales

drop Tmp_Tst_sales

-----Done---

It's all about Data

Pages

Wednesday, December 4, 2013

Tera Data: delete duplicate records

No comments: