Saturday, June 18, 2016

ABC of MDX expression

If you work with BI (Business Intelligence) and you have a cube, then as a developer you will be asked to produce data sets for the business from that cube or cubes. Your preferred way of populating those data sets will be an MDX query.

MDX stands for Multidimensional Expressions. MDX is not specific to one database; rather, it is a standard expression language for querying multidimensional databases and solutions. For example, MDX is used by MS SQL Server, Oracle, SAP, Teradata and many other vendors.

When we write a SQL query, it works on two-dimensional data, i.e. rows and columns; that is how SQL is designed. In contrast, MDX can process one, two, three or more dimensions in a single query.

If you have experience working with SQL, you will be tempted to compare MDX expressions with SQL syntax. While I was reading through different blogs, I found that some suggested 'Never compare it with SQL syntax; that will only hurt you.' On the other hand, some wrote posts explaining how SQL syntax can be converted into MDX expressions. So, right now, I can't decide which advice to follow!

So, without comparing the two, I will only explain the basics of MDX expressions.
An MDX expression has three different parts:

SELECT

FROM

WHERE

Part 1: SELECT

You put both measures (facts) and dimension columns in the SELECT part. When you select columns from the fact table, you need to add 'ON COLUMNS' at the end of the column selection, e.g.

SELECT
{
  [Measures].[Internet Order Quantity],
  [Measures].[Internet Sales Amount]
} ON COLUMNS

And when you SELECT from a dimension, you need to add 'ON ROWS':

            [Date].[Calendar Year].MEMBERS ON ROWS

Part 2:  FROM

As you know, in SQL you generally SELECT FROM a table or view; in MDX, however, it is a cube name, e.g. FROM [Adventure Works], and a cube can hold a number of fact and dimension tables.

So the complete syntax, with the SELECT and FROM parts included, will look like below:

SELECT {[Measures].[Internet Sales Amount]} ON COLUMNS, 
[Date].[Calendar Year].MEMBERS ON ROWS 
FROM [Adventure Works] 


Part 3: WHERE (Optional)


The 'WHERE' part is not mandatory in an MDX expression. This part is called the slicer axis; it is similar to the 'WHERE' clause in SQL, which filters the data. The syntax looks like:

WHERE ([Customer].[Customer Geography].[Country].&[Denmark])  

You can filter by a completely different dimension than the ones used in the SELECT statement. As above, we filter the data on the Customer dimension because we are only interested in Danish customers for the result set. Let's make the full MDX expression:

SELECT {[Measures].[Internet Sales Amount]} ON COLUMNS, 
[Date].[Calendar Year].MEMBERS ON ROWS 
FROM [Adventure Works] 
WHERE ([Customer].[Customer Geography].[Country].&[Denmark]) 






Sunday, April 24, 2016

Basic understanding of Data Warehouse(DW) and Business Intelligence(BI)

We are familiar with the keywords 'Business Intelligence', 'Data Warehouse', 'ETL' and so on. These keywords are interdependent; sometimes people say one but mean another. Even in the job market the three are often mixed up, e.g. an 'ETL Developer is needed' post can be filled by a BI or data warehouse developer.

So, if you are new to the area, this post will give you a basic understanding of the above keywords.

Everyone who works in IT knows about databases. As we know, data-driven applications/solutions store valuable data in a database, and mid-size to big companies use many different solutions across the organization. The huge volume of data, spread across more than one solution, demands a separate solution that can produce intelligent decision output.

First, look at the diagram below, which shows how a software application or desktop/web-based solution is built:

Fig 1: database driven application/solution



A Data Warehouse (DW) gathers information from a wide range of operational systems within a company and from its external systems. Getting data from the different operational systems and loading it into the data warehouse requires a process called ETL (Extract, Transform and Load).


Fig 2: Data Warehouse basic diagram

The data from the different solutions ends up in the data warehouse. Now we can build BI solutions on top of the data warehouse; this is where the split between data warehouse and BI solutions lies. BI solutions are built on the data warehouse, so first make the data warehouse and then build the BI solutions on top of it. Building both the BI solutions and the data warehouse requires ETL (Extract, Transform, Load) tools.

                         
Fig 3: BI solutions top of Data Warehouse

A Small History:


In the early '80s, the data warehouse was the place where aggregated data was stored in a special data model; user-facing presentation of dashboards, reports, scorecards, and graphical analysis did not exist at that time.

In the '90s the BI concept came along with presentation layers, where dashboards, reports, scorecards, and graphical analysis produce a clear business presentation. By looking at the data presentation, the business can make future decisions. Over time it became clear that BI cannot exist without a data warehouse: the data warehouse is the foundation for BI.

My next post will be related to Technologies around BI and a simple guideline to become BI/Data Warehouse developer/Expert.


Sunday, March 20, 2016

Git for Dummies: A basic guide to using Git for your project.

As per the Git website "Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency." [1]

I have worked with Microsoft TFS before, but this is the first time I am using Git, for my current project. I am going to write about my experiences with Git and the set of commands necessary to work with it.

Download and Install Git:

You can download Git from https://git-scm.com/download
and then install it by following the wizard. After installing Git, you will see the three items below added as programs.

Fig 1: Installed Git in the PC

Among the three, use 'Git Bash' to start with.

Get the source code:
To get the source code from a repository, you need to clone it. For example, if the source code repository is on a network drive or on a server, then you first need to clone that code to your own PC.

As you have seen in Fig 1, click Git Bash among the three items and you will find the window below where you write commands:
Fig 2: Git Bash Window


If your source repository is on a network drive in your organization, e.g. the network drive location is //SourceRepository/Project1, then the command looks like below:

$ git clone //SourceRepository/Project1 myproject

The code from the network drive will be copied into a myproject folder under your current directory on your local PC.

If you want to clone to a different path, change to that path first and then clone:
Change path:
$ cd /D
Then clone:
$ git clone //SourceRepository/Project1 myproject

Now the code will be in your folder /D/myproject.

Note that a cloned repository is already initialized; you only need to initialize a repository yourself when starting a brand-new one in an empty folder:
git init


Working with Git
Now you are ready to work on your code: you can add new files, and edit and update existing ones.

Add new file:
git add paul.html

Commit file:
git commit -m 'added new page'  # the -m flag attaches a comment (message) to the commit

Check status :
git status

To see list of the files in the folder:

git ls-files

Fig 3: List of the file in the Git  folder


It is a good idea to make a branch and then work on that branch, e.g.

Make your branch:
git branch MyTestBranch

Checkout the branch:
git checkout MyTestBranch

Then you can add new files to your branch or change existing files:

Create new file:

touch test.pli

touch paul.html

Add those files:
git add .

and commit them:
git commit -m 'two new files are added'


These files are now in my branch but not in master.

If you check out master, you will not find those two files:

$ git checkout master


To merge the changes from MyTestBranch into master, you must be on the master branch; then write the command below:
git merge MyTestBranch






 [1] https://git-scm.com/

Tuesday, February 23, 2016

Common Table Expression (CTE) in MS SQL and Teradata

Common Table Expression (CTE) is SQL syntax that is used in MS SQL, Teradata and other databases. This post covers why we would use a CTE, and the SQL syntax of CTEs.

By using a CTE, a complex query can be written in a simple, readable and maintainable way. It is something like a temporary result set, but not exactly a temporary/volatile table; you can avoid a volatile table if you use a CTE. A CTE is like a dynamic view; however, a view's metadata is stored in the database, whereas a CTE doesn't require that.

A recursive CTE can simplify the code required to run a recursive query within a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. For example, in an organization chart you need to display employees' positions in the organization; or, when showing product hierarchies, you need to find the product segment, then the product template, and eventually the product. In both cases, a recursive CTE can help produce the hierarchical chart.

A simple CTE query will look like below:

with CTE_PRODUCT_X_CUSTOMER (PD_ID, PROD_NAME, CUSTOMER_ID, CUSTOMER_NAME, SALES_DATE)
AS
(
Select P.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, C.Customer_ID as CUSTOMER_ID, C.CUSTOMER_NAME as CUSTOMER_NAME, S.Sales_date as SALES_DATE
From Sales as S
JOIN Product as P
 ON S.PD_ID=P.PD_ID
JOIN Customer as C
 ON S.CST_ID=C.CST_ID
)
select * from CTE_PRODUCT_X_CUSTOMER

The example above delivers sales information for particular customers. Of course, this can be done without a CTE; a simple join or a view can achieve the same output. However, the example introduces the CTE syntax.
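If you want to try the idea end-to-end, the same shape of query runs on any database that supports CTEs. Below is a minimal, self-contained sketch using Python's built-in sqlite3 module; the tables, columns and sample rows are invented for the demo, not taken from a real system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (PD_ID INTEGER, PROD_NAME TEXT);
CREATE TABLE Customer (CST_ID INTEGER, CUSTOMER_NAME TEXT);
CREATE TABLE Sales    (PD_ID INTEGER, CST_ID INTEGER, SALES_DATE TEXT);

INSERT INTO Product  VALUES (1, 'Bike'), (2, 'Helmet');
INSERT INTO Customer VALUES (10, 'Anna'), (11, 'Lars');
INSERT INTO Sales    VALUES (1, 10, '2016-01-05'), (2, 11, '2016-01-07');
""")

# The CTE names the joined result set; the outer SELECT then reads from it
# as if it were a table, with no temporary table created in the database.
rows = conn.execute("""
WITH CTE_PRODUCT_X_CUSTOMER (PD_ID, PROD_NAME, CST_ID, CUSTOMER_NAME, SALES_DATE) AS
(
    SELECT P.PD_ID, P.PROD_NAME, C.CST_ID, C.CUSTOMER_NAME, S.SALES_DATE
    FROM Sales AS S
    JOIN Product  AS P ON S.PD_ID  = P.PD_ID
    JOIN Customer AS C ON S.CST_ID = C.CST_ID
)
SELECT PROD_NAME, CUSTOMER_NAME FROM CTE_PRODUCT_X_CUSTOMER ORDER BY PD_ID
""").fetchall()

print(rows)  # [('Bike', 'Anna'), ('Helmet', 'Lars')]
```

The same WITH ... AS (...) block, with vendor-specific column types, works in MS SQL and Teradata.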

Now, let's look at an example where a CTE can avoid a volatile/temporary table. E.g. you need to update Manufacture_id in your PRODUCT_SALES table, and to update the column Manufacture_id you depend on Product_ID matches from other subqueries. The example can go like this:

create volatile table TMP
as
(
Select K.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, Ck.Customer_ID as customer_id, P.EFF_DT as EFF_DT

From Product_STG as P
join Product_KEY as k
on   p.PRD_CODE=k.PRD_CODE
 and p.EFF_DT> '2016-01-29'
JOIN Customer_key as CK
 on CK.SRC_ID=P.Source_Id
 and p.eff_dt>'2016-01-29'

) with data
on commit preserve rows;

 update PRODUCT_SALES
 set Manf_id=(select customer_id from TMP
              where PRODUCT_SALES.PD_ID=TMP.PD_ID
               and  PRODUCT_SALES.EFF_DT=TMP.EFF_DT)

The example above requires creating a volatile table. Now let's see how we can avoid the volatile table by using CTE syntax instead:

with CTE_TEMP (PD_ID,PROD_NAME,customer_id,EFF_DT)
AS
(
Select K.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, Ck.Customer_ID as customer_id, P.EFF_DT as EFF_DT

From Product_STG as P
join Product_KEY as k
on   p.PRD_CODE=k.PRD_CODE
 and p.EFF_DT> '2016-01-29'
JOIN Customer_key as CK
 on CK.SRC_ID=P.Source_Id
 and p.eff_dt>'2016-01-29'
)

update PRODUCT_SALES
set Manf_id=(select customer_id from CTE_TEMP
              where PRODUCT_SALES.PD_ID=CTE_TEMP.PD_ID
               and  PRODUCT_SALES.EFF_DT=CTE_TEMP.EFF_DT) 

As you see, you can avoid creating a temporary table by using a CTE. Please remember, you can't create a view on top of a CTE.
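As for the recursive CTE mentioned earlier, the classic use case is walking a hierarchy such as an organization chart. Here is a minimal sketch of that idea, again runnable via Python's sqlite3 with an invented Employee table (note that SQLite spells the keyword WITH RECURSIVE, as Teradata does; MS SQL uses plain WITH):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EMP_ID INTEGER, EMP_NAME TEXT, MANAGER_ID INTEGER);
INSERT INTO Employee VALUES
  (1, 'CEO',       NULL),
  (2, 'Manager',   1),
  (3, 'Developer', 2);
""")

# Anchor member: the top of the chart (no manager).
# Recursive member: everyone whose MANAGER_ID matches an already-found
# employee, one level deeper each pass, until no rows are produced.
rows = conn.execute("""
WITH RECURSIVE ORG_CHART (EMP_ID, EMP_NAME, ORG_LEVEL) AS
(
    SELECT EMP_ID, EMP_NAME, 1 FROM Employee WHERE MANAGER_ID IS NULL
    UNION ALL
    SELECT E.EMP_ID, E.EMP_NAME, O.ORG_LEVEL + 1
    FROM Employee AS E
    JOIN ORG_CHART AS O ON E.MANAGER_ID = O.EMP_ID
)
SELECT EMP_NAME, ORG_LEVEL FROM ORG_CHART ORDER BY ORG_LEVEL
""").fetchall()

print(rows)  # [('CEO', 1), ('Manager', 2), ('Developer', 3)]
```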



Saturday, January 16, 2016

Teradata Error Code 2652: Operation not allowed, table is being loaded. How to solve it?

I was working with a SAS DI job, using a transformation called Teradata Table Loader to load data into a Teradata table named 'Table1'.

Fig 1: Job  (dummy) with Teradata Table Loader

After running the job I found the error: Operation is not allowed: 'Table1' is being Loaded. After investigating a little, I found the reason: when you load data into Teradata using the TPT (Teradata Parallel Transporter) utility with Teradata MultiLoad, you can hit this error. The Teradata Table Loader properties in the job look like below:

Fig 2: Property for transformation Teradata table loader.

The error means you will not be able to perform any operation on this table; it's locked.

Then I started looking for a solution; so far I have found a few:

1) Use an empty file, connect it to the target table and run the job so the error disappears; then run with the right source file.
2) Use code to release the lock; the syntax is:
       RELEASE MLOAD Table1;
       OR
       RELEASE MLOAD Table1 IN APPLY;
3) Drop and recreate the table.

I tried solutions 1 and 2, but they did not work. The DBA killed my SQL session, but it was still the same; that did not fix the problem.

However, solution 3 worked for me. I was then able to run the job and load the data into the table.

Saturday, December 19, 2015

How to change CSV format from comma(,) delimited to pipe(|) delimited?

By default, when you save an Excel file as .csv, it will be saved as a comma-separated file.
For example, you have data in excel that looks like:

Fig 1: Original format of data

Let's save this in .csv format:

Fig 2: Save as Comma separated file

After saving the file data will look like below:

Fig 3: Data after converting into CSV 

However, if you need it pipe-delimited instead of comma-delimited, how do you do that? Please follow the steps below to save the file as pipe-delimited.

If you are using Windows 10, go to Settings, where you will find the below:


Fig 4: Settings in Windows 10

Now click the "Time and Language" option; you will find the screen below, where you need to click the "Additional time & regional settings" button.


Fig 5: Related Setting area
Now you will find below option:

Fig 6: More click
If you click "Change date, time and number formats" under the Region section, you will find the window below (left); if you then click "Additional settings", the right-side window will pop up.

Fig 7: Additional settings
Finally, you need to change the List separator from comma (,) to pipe (|).

Fig 8: List separator
And now, if you save the same excel as CSV then it will look like below:

So you have achieved your goal. However, please remember to change the setting back if you want your data comma-delimited again.
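If you would rather not touch the Windows regional settings (or need to convert a file on a machine you don't control), the same conversion can be scripted. Here is a small sketch using Python's built-in csv module; the function name and file names are just placeholders for this example:

```python
import csv

def comma_to_pipe(src_path, dst_path):
    """Rewrite a comma-delimited CSV file as a pipe-delimited one."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=",")
        writer = csv.writer(dst, delimiter="|")
        # Rows are parsed field by field, so commas inside quoted
        # values survive the conversion instead of being split.
        for row in reader:
            writer.writerow(row)

# Example usage:
# comma_to_pipe("customers.csv", "customers_pipe.csv")
```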

Thursday, November 12, 2015

MS SQL: How to insert data into a table from Excel

In one of my previous posts I wrote about how to insert data from an Excel/CSV file into a Teradata table (URL: http://pauldhip.blogspot.dk/2015/02/insert-data-from-excel-to-terdata-table.html). This time, I am going to explain how to do it in MS SQL.

Let's say we have a table in the database named Customer_dim, which has the following columns:

CREATE TABLE [dbo].[Customer_dim](
[Customer_id] [nchar](10) NULL,
[Customer_First_Name] [nchar](10) NULL,
[Customer_Last_Name] [nchar](10) NULL,
[Martial_Status] [nchar](10) NULL
) ON [PRIMARY]

The table needs to be filled with data from Excel. To do so, we need to follow these steps:


Step 1: Save MS excel file into CSV
The Excel file should have the same number of columns as the table.

Fig 1: Data in the excel

Now save the Excel data as a .csv file.
Fig 2: Save as .csv





Step 2: Write code to insert data from CSV

BULK INSERT [dbo].[Customer_dim] FROM 'C:\BI Projects\Blog\Dim_Customer_CSV.csv'
WITH (
    CHECK_CONSTRAINTS,
    DATAFILETYPE='widechar',
    FIELDTERMINATOR='|',
    ROWTERMINATOR='\n',
    KEEPIDENTITY,
    TABLOCK
);

Since my csv file is saved as pipe-delimited, the SQL code above has FIELDTERMINATOR='|'. You need to change that to match your csv format, and you also need to change the csv file location accordingly.
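A malformed row will make BULK INSERT fail partway through, so it can save time to sanity-check that every row of the file really has the expected number of fields first. Here is a small sketch in Python (the function name, file name and column count are examples, not part of BULK INSERT itself):

```python
import csv

def check_delimited_file(path, expected_fields, delimiter="|"):
    """Return (line_number, field_count) for every row that does not
    have the expected number of delimited fields."""
    bad = []
    with open(path, newline="", encoding="utf-8") as f:
        for lineno, row in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            if len(row) != expected_fields:
                bad.append((lineno, len(row)))
    return bad

# Example usage, for the 4-column Customer_dim file:
# check_delimited_file("Dim_Customer_CSV.csv", expected_fields=4)
```

An empty result list means every row matches; anything else pinpoints the lines to fix before loading.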

After running the SQL code above, you should be able to see the records below inserted into the table.

Fig 3: Result set after inserting the data