It's all about Data

Saturday, December 19, 2015

How to change CSV format from comma(,) delimited to pipe(|) delimited?

By default, when you save an excel file to .csv, it will be save as comma seperated file.
For example, you have data in excel that looks like:

Fig 1: Original format of data

Lets save this as .csv format:

Fig 2: Save as Comma separated file

After saving the file data will look like below:

Fig 3: Data after converting into CSV

However, you need to have pipe delimited instead of comma delimited, how do you do that? Please follow the below steps to save file as pipe delimited.

If you are using Windows 10, then you need to go to Settings and then you will find below:

Fig 4: Settings in Windows 10

Now please click "Time and Language" option then you will find below screen, where you need to click "Additional time & regional settings" button.

Fig 5: Related Setting area

Now you will find below option:

Fig 6: More click

If you click on "Change date, time and number formats" under Region section; will find below window (left) and then if you click "Additional settings" then right side window will pop up.

Fig 7: Additional settings

Finally you need to change, List separator from comma(,) delimited to pipe (|) delimited.

Fig 8: List seperator

And now, if you save the same excel as CSV then it will look like below:

So achieved your goal. However, please remember to change your settings back if you want to have your data as comma delimited.

Thursday, November 12, 2015

MS SQL: How to insert data into a table from Excel

One of my previous post I have written how to insert data from excel/csv file to Teradata table (URL: http://pauldhip.blogspot.dk/2015/02/insert-data-from-excel-to-terdata-table.html).This time, I am going to explain how to do that in MS SQL.

Lets say, we have a table in the database named Customer_dim which have following columns:

CREATE TABLE [dbo].[Customer_dim](
[Customer_id] [nchar](10) NULL,
[Customer_First_Name] [nchar](10) NULL,
[Customer_Last_Name] [nchar](10) NULL,
[Martial_Status] [nchar](10) NULL
) ON [PRIMARY]

The table need to filled by data from excel. To do so, we need to follow the following steps:

Step 1: Save MS excel file into CSV
The excel file should have same number of columns as in the table.

Fig 1: Data in the excel

Now save the excel data as .csv file.

Fig 2: Save as .csv

Step 2: Write code to insert data from CSV

BULK INSERT [dbo].[Customer_dim] FROM 'C:\BI Projects\Blog\Dim_Customer_CSV.csv'
WITH (
CHECK_CONSTRAINTS,
DATAFILETYPE='widechar',
FIELDTERMINATOR='|',
ROWTERMINATOR='\n',
KEEPIDENTITY,
TABLOCK
);

Since, my csv file is saved as pipe delimiter so that above SQL code has FIELDTERMINATOR='|'. You need to change that as per your csv format. As well as,vyou need to change your csv file loacation accordingly.

After running the above SQL code, you should able to see below records are inserted to the table.

Fig 3: Result set after inserting the data

Sunday, October 18, 2015

Teradata:How to check DDL(table structures) are in sync among different environments(development/Test/Production) ?

Title of this post will be explained with an example. For example, If your projects involve populating data from source to target for ten (10) new tables. Then you have to create the DDL for ten tables at least in three different environments(development/Test/Production). Well, you start with development environment, develop the jobs and populate data into target area and when those are working fine then you take DDL from development environment to Test. And if tester or business find that you need to add a new column or data type then you need to change. Moreover, maybe your go-live date is closer so same DDL that put into test also may go for deploy into Production. It means , any change caught by tester or you; now you need to change the SQL in three different environment to keep consistent. Offourse, you follow change order to keep track of that, but as a developer you will prefer to see if all three different environments's table structure looks same.

If you use database as Teradata you have options to test through different environment, you can compare tables, table's column and column's data type. So you can quickly find out if you have any inconsistency through different environments.

Below code will help you to find out the difference between Test and Production environments.

select
upper(trim(tablename)) Table_name,
upper(trim(columnname)) Column_name,
upper(trim(ColumnFormat)) Column_format
from dbc.columns

where upper(trim(tablename)) in ('Custmer', 'Cust_Addreess', 'Geography', ,'Address_X_Geo','Customer_X_Product', 'Customer_X_Sales')
and upper(trim(databasename))='DWH_T'
except
select
upper(trim(tablename)) Table_name,
upper(trim(columnname)) Column_name,
upper(trim(ColumnFormat)) Column_format
from dbc.columns
where upper(trim(tablename)) in ('Custmer', 'Cust_Addreess', 'Geography', ,'Address_X_Geo','Customer_X_Product', 'Customer_X_Sales')
and upper(trim(databasename))='DWH_P';

Sometimes it's challenging to find out if table structures (DDL) for a particular projects in different environments in data warehouse has same DDL. And due to the inconsistent table structure especially between test and production can lead into ETL job failure in production.

As precaution, by running above mentioned code you can verify if your production DDL is different than Test DDL where your ETL job ran fine. So you can fix the DDL before the jobs failed in production.

Saturday, September 26, 2015

Develop job via SAS DI studio: What is SQL Pass Through and why you use it?

If you are an ETL Developer and using SAS DI Studio as your ETL tool then this post can be interesting for you. SQL Pass-Through Facility is SQL procedure that enables developer to send DBMS-specific statements to a DBMS and to retrieve data from database.
You can specify database SQL syntax instead of SAS SQL syntax when you use the Pass-Through Facility.

What is SQL Pass Through?
When we make job via SAS DI studio and use PROC SQL to make operation in database then your SQL statement will directly communicate with database and make all the operation on your SQL database and final result will bring back to SAS. To activate the feature what you need to do is to set your SAS DI studio property SQL Pass Through 'Yes'.

Fig 1: How to activate SQL Pass through

Why will you use SQL Pass Through?

1) It's handy when you use database specific function like QUALIFY, RANK etc.
2) If your SQL query has calculation then it will be on SQL database so it's faster
3) Very little processing time on SAS side
4) If you are using store procedure or SQL macro in your ETL job then SQL Pass Through will surely pay off.

Thursday, August 20, 2015

PERIOD: An useful function in Teradata

Let’s start with an example where a table is defined with a column with the type PERIOD. PERIOD data type in Teradata can hold two dates or two time stamps. To ease your date and time calculation PERIOD can play a big role. The below example will show university students’ enrollment history with sample data.

We created a table where column 'Enrolled_period' is defined as PERIOD type.

CREATE MULTISET TABLE person_University_period (

person_id INTEGER NOT NULL,

University_Name varchar(40) NOT NULL,

Enrolled_period PERIOD(DATE) FORMAT 'YYYY-MM-DD' NULL

)

PRIMARY INDEX (person_id);

And insert some dummy data in the table:

Insert into TD8245.person_University_period

Values(24568,'Gothenburg University', PERIOD(DATE '2007-08-09',DATE '2009-08-25'));

Insert into TD8245.person_University_period

Values(24569,'Gothenburg University', PERIOD(DATE '2009-07-09',DATE '2011-06-25'));

Data looks like below:

Fig 1 : Student's enrollment data

From the above table we can see Enrolled_period holds start date and end date. Lets see different type of operation you can do on Enrolled_period field.

e.g. you to know how many months or year or hour student is enrolled to the university. To find out that you don't need to do use any formula. PERIOD can find out months or year or day from the column 'enrolled_period'. Below Syntax will find out the number of months the students have studied.

Select INTERVAL(Enrolled_period) MONTH as enrolled_month from person_University_period

Fig 2: Number of months studied by students

At the same way you can calculate interval in day, hour, minutes and seconds.

From the above 'enrolled_period' column you may need to find out start date, so syntax goes like:

select begin(Enrolled_period) as Enrolled_dt from person_University_period

And to find end date SQL will be:

select END(Enrolled_period) as Enrolled_dt from person_University_period

This period function is helpful when you consider banking operation, e.g. you open a bank account and you should have account opening date and closing date. Generally in the database table we make two fields to serve the purpose which are Effective date and End date.

Instead of using two fields in the table, we can use one column that can hold both start and end date and calculation on two dates or time stamp become easier.

As an example:

INSERT INTO Account

VALUES (

21222250,

'Paul Sandal',

PERIOD(

CURRENT_DATE,

UNTIL_CHANGED

)

);

Result will look like below:

Account_number Account_Name Account_Period

21222250 Paul Sandal ('2015-08-20','9999-12-31')

Fig 3: Date with current and until changed

Here, UNTIL_CHANGED set date as '9999-12-31'. If the account is updated for any reason; then the date '9999-12-31' will be updated with the date which account get updated.

Saturday, August 1, 2015

Teradata DBC : Easy way to get DDL of views, tables and macros!!

As a database developer you may need to take back up of DDL of views, macros and tables for a particular database that you are working on. Or when your development work is done then you need to move table/macro structure (DDL) from development environment to test and test to production.

And those cases, how do you move DDL from Development environment to Test environment and test to Production?

You can copy DDL from database one by one and then paste it to a document to save it and then forward the document to the DBA to execute your DDL in Test or Production environment.

If you would like to copy DDL of the table then you do like below:

Fig 1: Manually copy DDL one by one

If you have twenty(20) tables, 10 macros or 5 views then you have to do the above action 35 times which is time consumable, error prone and boring tasks.

Then, what could be the better way to do it?

You can use DBC to get DDL of views, macros and tables for one particular database by using below SQL:

select * from dbc.tables where databasename='Customer_Service'

order by TableKind desc

Result look like below:

Fig 2: DBC result set

From the above result set (Fig 2), you will find a column named RequestText that holds DDL for all elements exist in the database.

To find DDL for table; SQL will look like below:

Select RequestText from dbc.tables where tablekind='T' and databasename='Customer_Service'

tablekind=T means table

tablekind=V means views

tablekind=M means macro

By changing tablekind column's value you can get DDL for different object in the database.

Saturday, July 11, 2015

SET and MULTISET Table in Teradata

After moving into Teradata world, discovered a few challenges; one of them was not knowing about SET and MULTISET concept in Teradata. My story goes like this:

I was making ETL job where I had three different tables and inserting into a target table. While I have selected columns after joining three tables I got 8,000 rows. However, when I inserted them into target table then I got around 5000 rows which puzzled me.

After doing some investigation found the Teradata table looks like below:

CREATE TABLE INTEREST_MAP

(

INT_ID,

BANK_NAME,

INT_RATE,

EFF_DT,

RATING_SCALE

)

PRIMARY INDEX(INT_ID);

Fig 1: DDL without MULTISET

If the table is made above way, then the table can't accept duplicate values. So, my select statement returned 8000 rows which has 3000 duplicate rows that can not be inserted into the table.

What will be solution??

The table DDL should look like below so that it can accept duplicate values. Added MULTISET in the DDL.

CREATE MULTISET TABLE INTEREST_MAP

(

INT_ID,

BANK_NAME,

INT_RATE,

EFF_DT,

RATING_SCALE

)

PRIMARY INDEX(INT_ID);

Fig 1: DDL with MULTISET

Now, I am able to insert 8000 rows into the table. So Lesson learned; without having MULTISET in your DDL you will not be able to insert duplicate rows.

NOTE: If you have unique index in your DDL; then MULTISET will be over ruled by the Unique Index.

We know MULTISET now; What is SET?

When you work with Teradata mode then by default it is SET. Since I was working with Teradata mode so It means above DDL in Fig-1 is work as SET though DDL did not mentioned explicitly SET.

And SET doesn't allow duplicate rows to be inserted into the table.

Pages