Wednesday, December 4, 2013

Tera Data: delete duplicate records

As I am from MS Sql background,could not think deleting duplicate records could be such cumbersome. It is not straight forward like MS SQL. In teradata, you will need help of temporary table to remove duplicates data from your table. My writing is only concerned deleting records from a single table, deleting records from more than one table joined by reference is different.


 --While you do group by and  count is more than one, you get duplicate records
 select StoreId, Smonth, ProdID, Sales from Paul_db.Test_sales_tbl group by 1,2,3,4 having count (*)>1

 --or
 select distinct * from Paul_db.Test_sales_tbl

 --Now need to delete the duplicate records, in tera data, first you move your data in a temporary table

--create temp table without copying data

   Create volatile table Tmp_Tst_sales2 as (select * from Paul_db.Test_sales_tbl) WITH NO DATA
 
 --The above code worked well and created the table structure

 --Now insert data into the table
    insert into Tmp_Tst_sales2 select distinct * from Paul_db.Test_sales_tbl
    select * from Tmp_Tst_sales2

--Unfortunately, above code can't insert the data.

  --So We do other way; create table with records
  drop table Tmp_Tst_sales
  create volatile table Tmp_Tst_sales as (select * from Paul_db.Test_sales_tbl) WITH DATA
  ON COMMIT PRESERVE ROWS;

  select * from Tmp_Tst_sales

  --delete the duplicate records from volatile table and then insert only distinct records
  delete from Tmp_Tst_sales


  insert into Tmp_Tst_sales select distinct * from edwwrk_U.Test_sales_tbl

--Now the above code works

  --So finally insert the distinct records from temp table to real table

  delete from edwwrk_U.Test_sales_tbl

  insert into  edwwrk_U.Test_sales_tbl select distinct * from Tmp_Tst_sales

  drop Tmp_Tst_sales

  -----Done---

No comments: