Saturday, July 20, 2019

New to Git and GitHub? How does Azure DevOps connect with Git? Must know.


If you have been using TFS for a long time, you know by now that Git is a 'thing' you can't avoid. When you start with Git, please don't expect it to be exactly like TFS. When I started with TFS 10 years back, it was super easy to get going; for Git, start with the mindset that the command line is required.

As soon as you hear about Git, another name comes along: 'GitHub', and then the confusion grows.
Many people think Git and GitHub are the same, but they are not. Git is the version-control software; GitHub is your remote storage with sharing capability. If you don't want to share your code with anyone but still want version history, you don't need GitHub (which is unlikely, though). Moreover, as an alternative to GitHub you can use Azure DevOps.

Those of you who used to work with TFS may already know about Azure DevOps, which gives you the option to use either Git or TFVC (the classic TFS version control) as your version-control tool. In that scenario, you don't need GitHub. The picture below depicts how Git relates to Azure DevOps and GitHub.

Fig 1: Git, GitHub and Azure DevOps

Let's start with beginner steps to work with Git.

Step 1: Install Git on your PC.
Follow the link below, choose the right platform to download the file, and then install it.
https://git-scm.com/downloads
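Once the installation finishes, you can confirm Git is available by opening a command prompt (or Git Bash) and checking the version:

git --version

If it prints a version number (something like 'git version 2.22.0'), the installation worked.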

Step 2: Open the Git command window (Git Bash), which looks like the one below.

Fig 2: Installed Git software

Now, say you have a solution or some files that you would like to add to Git and work on with other colleagues.

Adding files to Git means the folder where you keep the files/code should be recognized by Git. To achieve that, you need to do the following:

For example, your code is located under C:\Code and you want to make this folder a Git repository.

After opening Git Bash, run the commands below:

1. cd C:\
2. cd Code
3. git init

Now the folder is ready to use as a Git repository, and you will see a message like:

'Initialized empty local Git repository in C:/Code/.git/'

You will also find a hidden .git folder created under C:/Code/.

Now you can copy all your files under C:/Code, and let's add a file to the local Git repository:

4. git add Testcode.txt

and commit the file to the local repository:

5.  git commit -m 'added test file'


Now your solution and all your code files are under a Git repository. If you work on your own and just want version control so you can always go back to your earlier code, you are done at this stage.
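At this point, two handy commands (run in the same Git Bash window inside the repository folder) let you check what Git is doing:

git status
git log --oneline

git status shows files that are new or changed but not yet committed, and git log --oneline lists the commits you can go back to.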

Step 3:
Let's get to the next step, where you push this code to either GitHub or Azure DevOps. In my case I have uploaded the code to Azure DevOps.

How do you do this?

Log in to Azure DevOps: https://dev.azure.com/

Create a repository and choose 'Git' as the type.

Fig 3: Create repo under Azure DevOps
After finishing the repository creation, you will see a window like the one below, from which you need to copy the command-line code.



Now go back to your 'Git Bash' command window and run two more commands:

6. git remote add origin https://d*****@dev.azure.com/dpaul0628/DpaulGithub/_git/LoveGit
7. git push -u origin --all

Now if you log in to Azure DevOps, you will see that the local file you committed earlier has been uploaded.
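From here on, the day-to-day routine is just a repeat of the same commands; for example, after editing Testcode.txt you would run:

8. git add Testcode.txt
9. git commit -m 'updated test file'
10. git push

(The -u flag used in step 7 already set the upstream, linking your local branch to origin, so a plain git push is enough afterwards.)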

Saturday, June 29, 2019

Creating a script along with data for database objects: why is it required, and an easy way to accomplish it.

Sometimes you may have a table that has been populated by the business or tech team to support the solution you develop. This type of table can be called a master data table, since it is not derived from any source system. So whenever you need to move the table DDL/script to a higher environment (from DEV to UAT or PROD), you need to move not only the DDL but also the data for that table.

Now let's find out how to do it.

There are a few ways to do it, but my preferred way is to use the Generate Scripts task in SQL Server Management Studio. We mostly use it for generating schema scripts; however, it can also include the data.

Step 1: Select the database and find the Generate Scripts option
Fig 1: Finding out generate scripts for the database
Step 2: Follow the wizard
Fig 2: Go through the wizard
Step 3: Choose the database object, i.e. the table you would like to move to the higher environment, and click Next.
Fig 3: Choose your database object
Step 4: Before you generate the script, you also need to include the data, so click the 'Advanced' button.
Fig 4: Advanced scripting options
Step 5: Change 'Types of data to script' to 'Schema and data'
Fig 5: get schema and data together




You are almost done; after completing the few additional steps in the wizard, the script will be ready with DDL and data. Now you can move the saved file to the higher environment and run the script, and you will find the table and the data together.
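The exact output depends on your table and the options you pick, but for a small master data table the saved .sql file typically looks something like this (the Currency table here is just an illustration):

CREATE TABLE [dbo].[Currency](
    [CurrencyCode] [char](3) NOT NULL,
    [CurrencyName] [varchar](50) NOT NULL
)
GO
INSERT [dbo].[Currency] ([CurrencyCode], [CurrencyName]) VALUES (N'AUD', N'Australian Dollar')
GO
INSERT [dbo].[Currency] ([CurrencyCode], [CurrencyName]) VALUES (N'CAD', N'Canadian Dollar')
GO

Running this one file in the higher environment creates the table and loads the rows in one go.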



Sunday, May 19, 2019

When and why is UNPIVOT required for your Data Warehouse (DW) project?

You may be involved with a data warehouse (DW) project where you find that the source system has many repeated columns in a table which can be reduced to a table with just a few columns. It's not only about reducing the number of columns; you also want to identify the attributes that can be aggregated for reporting purposes, which in the DW world become your fact attributes. Let's start with an example:

In medical data, a hospital can have different types of emergency beds; e.g. the table below holds bed availability data for different bed types.

Fig 1: Hospital Bed data

Let's make a table with the above data; use the SQL below to create and populate it.

create table #HospitalBed (TranID int, SubmissionDate datetime, GenAcuteCom int,
ShortstayCrisisComp int, MoodAnxietyComp int, EatDisorderComp int)

insert into #HospitalBed
select 101, getutcdate(), 10, 3, 9, 4
union
select 102, getutcdate(), 5, 6, 2, 8
union
select 104, getutcdate(), 6, 3, 9, 1
union
select 105, getutcdate(), 5, 6, 2, 8



If you would like to see the data, run: SELECT * FROM #HospitalBed

Fig 2: Data from Table

Now, to fit this data into a data warehouse model, we can clearly see that columns need to be turned into rows. The different bed types should go under one column called 'BedType', and the values for those columns should go under another column called 'ComplementBed'. How do we do it? SQL has an operator called UNPIVOT, and we are going to use it to produce the expected data set.

select u.TranID, u.BedType, u.ComplementBed
from #HospitalBed s
unpivot
(
  ComplementBed
  for BedType in (GenAcuteCom, ShortstayCrisisComp, MoodAnxietyComp, EatDisorderComp)
) u;

Fig 3: After unpivot

As you can see, after applying UNPIVOT you get more rows but fewer columns.
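For example, the single row for TranID 101 turns into four rows after the UNPIVOT:

TranID  BedType              ComplementBed
101     GenAcuteCom          10
101     ShortstayCrisisComp  3
101     MoodAnxietyComp      9
101     EatDisorderComp      4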

But what is the benefit? This helps to keep the dimensional model simple: all the bed types can be kept in a dimension table, and the fact table can hold the BedType key along with the ComplementBed value. Fig 4 below shows how a dimensional model can eventually be established for the source table mentioned earlier (Fig 1); a rough DDL sketch of these two tables follows the figure.

Fig 4: Sample Dimension and Fact table 
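As a rough sketch (the table and column names here are just illustrative, not taken from any source system), the dimension and fact tables could be created like this:

create table DimBedType
(
    BedTypeKey int identity(1,1) primary key,
    BedTypeName varchar(50)   -- e.g. 'GenAcuteCom', 'ShortstayCrisisComp'
)

create table FactBedAvailability
(
    TranID int,
    SubmissionDate datetime,
    BedTypeKey int references DimBedType(BedTypeKey),
    ComplementBed int         -- the measure that can be aggregated
)

The unpivoted output maps straight into this model: each BedType value looks up (or creates) a row in DimBedType, and each ComplementBed value lands in the fact table against that key.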


And consider that the source table we started with has only 4 different bed types; in reality it could have 20 columns or more. The dimensional model could become super complicated, and you may not even be able to build a proper dimensional model without transposing the columns into rows in the above scenario.

Sunday, April 21, 2019

A misleading popup message in an SSIS solution can give you trouble!!

SQL Server Data Tools (2017) is smart and tries to help you in many ways. But sometimes a performance-improvement popup message can lead you into another issue.

I would like to share a recent experience. I was working with SSIS packages and at the end of the day committed and pushed the code to GitHub successfully. But the very next day when I opened the solution, I surprisingly got an incompatibility error (Fig 1) and was not able to open the project.

Fig 1: Solution incompatibility error.
I was really surprised and tried to remember what I had done wrong that might have caused the issue. The 'project incompatible' error (Fig 1) generally happens when there is a Visual Studio version mismatch, but I had not done any update!! My mind was stuck for a while, then I recalled that I had clicked a popup message in SSDT the previous day, though I couldn't even remember how I responded.

The popup message was the one below (Fig 2), and I had surely clicked 'Disable this extension'.

Fig 2: Misleading pop up
The popup message in SSDT also says 'Disabling it may improve your experience' (Fig 2). I did not pay close attention and trusted the part saying it might improve performance, which is misleading. And this is what caused the solution to become incompatible.

How to solve this issue?

There are two ways you can solve:

1) Repair the software or 2) Enable the extension

1) Repair the software

You can execute the SQL Server Data Tools setup (SSDT-Setup-ENU.exe), click Repair as shown in figure 3 below, and complete a few steps in the wizard. When the repair is done, open the solution; the compatibility error will be gone.
Fig 3: Repair SSDT

2) Enable the extension
Open SQL Server Data Tools and go to Tools -> Extensions and Updates. You will find the 'Microsoft Integration Services Project' extension is disabled (Fig 4), so you need to enable it.


Fig 4: Extension and update





Saturday, March 16, 2019

Unions in the OLEDB Source can kill performance in SSIS; how to handle this?

If you write SQL with many UNIONs in the OLEDB source transformation, you will find the SSIS package running slow.

For example, you may require many SQL UNIONs to populate the data set, like below:
SELECT x,y,z,z1,z2
FROM tblA

UNION
SELECT x,y,z,z1,z2
FROM tblB

UNION
SELECT x,y,z,z1,z2
FROM tblC

UNION
SELECT x,y,z,z1,z2
FROM tblD

UNION
SELECT x,y,z,z1,z2
FROM tblN


Fig 1: OLEDB source 

Instead of using the above SQL in the OLEDB source, you can use the Union All transformation, which is much faster compared with running the UNION SQL inside the OLEDB source; each source then just reads one table, as sketched after Fig 2.

Fig 2: Union All Transformation
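With the Union All approach, each OLEDB source holds only a simple single-table query (the table names below just mirror the earlier example):

-- OLEDB Source 1
SELECT x, y, z, z1, z2 FROM tblA

-- OLEDB Source 2
SELECT x, y, z, z1, z2 FROM tblB

-- ... one source per table, up to tblN

The Union All transformation then merges the rows from all the sources inside the data flow. One thing to keep in mind: unlike the SQL UNION above, Union All does not remove duplicate rows, so if you were relying on UNION for de-duplication you need to handle that separately (for example, a Sort transformation with 'Remove rows with duplicate sort values' checked).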



Saturday, February 16, 2019

SQL Analytical Function: Teradata Vs. MS SQL syntax

A few years back, I posted about Teradata analytical functions, where I described how they work and why we need to use them: https://pauldhip.blogspot.com/2014/12/analytical-function-rownumber-with.html

This post covers the equivalent syntax for MS SQL and how it differs from Teradata.

In my earlier post, I wrote the following Teradata SQL to find the latest version of the data:
select INDUST_CODE,INDUSTRY_NAME
from INDUSTRY
QUALIFY
row_number() Over (partition by INDUST_CODE order by ACTIVATE_DATE desc)=1

Do you think the above syntax will work in MS SQL? Not really!! MS SQL doesn't have the QUALIFY clause. You need to write the syntax a little differently, and a few extra lines are required to achieve the same result.

Let's translate the above syntax for MS SQL:

SELECT INDUST_CODE, INDUSTRY_NAME, RANKRESULT
FROM(
SELECT  INDUST_CODE, INDUSTRY_NAME ,
RANK() OVER (PARTITION BY INDUST_CODE ORDER BY ACTIVATE_DATE DESC) AS RANKRESULT  
 FROM INDUSTRY
) TT
WHERE TT.RANKRESULT=1 

As you see, for MS SQL we had to use a subquery and the RANK function to achieve the same result we got in Teradata.
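One note: the Teradata version used ROW_NUMBER(), which always returns exactly one row per INDUST_CODE even if two records share the same ACTIVATE_DATE, whereas RANK() returns all tied rows. If you want the exact same behaviour, the same subquery pattern works with ROW_NUMBER():

SELECT INDUST_CODE, INDUSTRY_NAME
FROM (
    SELECT INDUST_CODE, INDUSTRY_NAME,
           ROW_NUMBER() OVER (PARTITION BY INDUST_CODE ORDER BY ACTIVATE_DATE DESC) AS RN
    FROM INDUSTRY
) TT
WHERE TT.RN = 1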

Sunday, January 13, 2019

Why Big Data and What is Big Data?

Before we ask what big data is, let's start with why big data came into the picture.

Big data gets generated in multi-petabyte quantities every day. The data changes fast and comes in different formats, e.g. audio, video, pictures, text, structured and unstructured, which are difficult to manage and process using an RDBMS or other traditional technologies. Tech companies like Google and Yahoo found it challenging in the early 2000s to handle these various types of data at huge volume with existing technologies, so they started looking into alternative solutions, and that's how big data got to where it is today. You will find more about the history of big data at the end of this post.


Now let's get to: what is big data?

Is big data a Tool? Language? Solution? Or what? ...

Well, it's a platform that comprises many tools, and fortunately most of them are open source. However, since there are so many tools available in the market to solve big data challenges, the next confusion arises: which tools to use when. I will write about this in my next post.

Let's focus on the concept of big data. People think big data is always about huge data, but that's not the case. We can say that to be a candidate for a big data solution, a workload should meet at least one of the three elements, the 3 Vs:
 1) Volume 
 2) Velocity and 
 3) Variety

Fig 1: Elements to meet big data challenge

High volume: social media like Facebook has billions of users, huge amounts of content are created on YouTube every hour, organizations like NASA were generating 1.73 gigabytes of data every few seconds by the end of 2014, and Maersk vessels send huge volumes of data over the network every minute.

High velocity: the speed of the data matters; you need to capture real-time data from IoT devices, and your mobile devices produce tons of data every day. Some businesses can't wait long, so you may have to capture near-real-time data and process it immediately. Some businesses, like the retail industry, require real-time data.

High variety: different types of data are mixed in the same platform; e.g. Wikipedia, Twitter or Facebook have a mix of text, audio, video, images, etc. Regular businesses also receive data in different formats which needs to be transformed into useful output.

So when your organization deals with the above 3 Vs, it's time to consider moving to a big data platform. As Forbes research shows [1], of the companies that said in 2015 they had no plan to use big data, 11% had already started using big data by 2017. And while in 2015 17% said they were using big data, that number increased to 53% in 2017. The research also added that, among all industries, finance and telecom are ahead in adopting big data.

History of big data (literally, how Hadoop was invented):

Data started growing exponentially, with various types of data arriving at great velocity that existing transactional databases could not handle. Many say Google was the first to face this challenge: trying to gain an advantage in search, Google wanted to create a catalog of the entire Internet. To be successful, they had to meet the challenges presented by the 3 Vs (as mentioned above) in an innovative way.

Google tackled the big data problem by making groups of interconnected, inexpensive computers work together. This was revolutionary. Over a span of a couple of years, Google Labs released papers describing parts of their big data solution. From these, Doug Cutting and Mike Cafarella began developing a project, later continued at Yahoo!, which was eventually open sourced as the Apache Foundation project called Hadoop, named after the toy elephant of Mr. Cutting's son.

When people talk about big data, the first name that comes up is 'Hadoop'. Hadoop is a distributed platform used for maintaining, scaling, error handling, self-healing and securing data at large scale. The data can be structured or unstructured. As mentioned earlier, if data is huge and varied and needs to be processed quickly, traditional systems are unable to handle it; that's where Hadoop comes into the picture.

But please remember, big data is not only Hadoop; there are many other tools in the Hadoop ecosystem that you will need to use to solve big data challenges, which I am going to write about in my next post.


[1] https://www.forbes.com/sites/louiscolumbus/2017/12/24/53-of-companies-are-adopting-big-data-analytics/#19852cfd39a1 (accessed on 11th Jan 2019)