It's all about Data: MS SQL

Showing posts with label MS SQL. Show all posts

Sunday, April 26, 2020

Basic differences between SQL and NoSQL.

Around three decades Oracle, IBM, Microsoft relational databases were consistent leaders for Operational Database Management Systems. However, currently all of these vendors offer NoSQL platform parallel to their traditional database platform e.g. Microsoft got Azure Cosmos DB as NoSQL database and so on.

Journey of NoSQL:

The world of NoSQL emerged in 1998 and it's gained popularity in 2010s. In 2015, the popular NoSQL database provider MongoDB appeared in Gartners magic Quardent as a leader for operational database management system.

Fig 1:Gartners Magic Quadrant in 2015

Though recent trends shows that NoSQL is getting more popular in Big data landscape than relational database management system.

What is NoSql and Why?

NoSQL stands for 'Not only SQL'. NoSQL database is for storing both structure and semi structure data.

NoSQL stores data in one of four categories:

Key-Value storage
Document storage
Wide Column storage
Graph database

The most popular NoSQL database MongoDB is document storage, however, it can be used as a key/value store, which is just a subset of features.

Today data become more broader in terms of shape and size. Data is spread around documents, social media, complex web application, IoT sources. The traditional SQL database can't handle these data due to the continuous changing behavior. Traditional database need to know shape of the data (schema) beforehand to store those, hence it failed to capture continuous evolving data. And thus, NoSQL emerge and conquer the world!!!

Fixed schema vs. No Schema

For structure database (traditional SQL database) you need to have schema ready before you insert data to a table but for NoSQL you don't need to have schema created at first rather you can directly insert the data. Though, it's better to say 'Schema agnostice' than 'Schemaless' e.g. Microsoft's NoSQL database Azure Cosmos DB known as schema agnostic database, which is not bound by schemas but are aware of the schemas.

Relationship vs. No Relationship

Traditional SQL require to maintain relationship to perform, that's why normalization is there. Whereas, NoSQL doesn't require to maintain the relationship. You can embed a few tables into one table in NoSQL database.

Below table (fig:2) shows basic differences between SQL and NoSQL database:

Fig 2: Comparison between SQL and NoSQL

Conclusion: NoSQL denotes 'Not only SQL', it means NoSQL database can perform what traditional relational database can do as well as do more. But these two types of Database have different expertise, as per business requirements you can choose from them. A way to find out which fit best for you could be asking the question. Do you know the shape of data well advance? If answer is 'Yes' which means you can define schema earlier and can build the relationship before data arrive then it's good to choose traditional SQL. On the other hand, when you don't know the shape of data and behavior changes continuously then go with NoSQL. Nevertheless, some complex enterprise solution can be built by using both SQL and NoSQL database to leverage best of each.

Saturday, March 21, 2020

Finding unused Schemas in SQL database

Fun part of a developer life is: 'Today what is unknown, tomorrow you know it by learning and doing". And by learning the unknown and implementing you become expert :) I will share something which I learned and develop recently. Long back I got a task to find out used schemas in database where the solution was pretty straight forward. However, this time the task was to get unused schemas. Firstly, I thought this must be fun, never did it but someone must did it so started with google and got some clue but not with proper explanation so thought about writing it.

To find out the solution, you need to go through three tables:
1) Sys.Schemas
2) Sys.database_principals
3) Sys.objects

Easy way to find unused schemas is : If the schema_id from sys.schemas table is not found in Sys.objects table means those are never used. And both tables schemas and objects got the column ‘Schema_ID’. Woww!! We know the solution, yes all most 😊 Why we need database_principals table? When schema name is not specified for those cases we need to use this table. Relations among three tables can be depicted with below diagram:

Fig 1: Relationship among tables

We covered theory part, let's see in action, below is the SQL code to find out unused schemas in database:

SELECT name

FROM sys.schemas SCH

where SCH.schema_id not between 1 and 4 --This is for dbo, guest, information_schema,sys

and SCH.principal_id not between 16384 and 16399 --- it's related to db_* schema, so don't touch this.

and not exists

( select 1

from sys.database_principals P

where P.default_schema_name=SCH.name ---Name to be used when SQL name does not specify a schema, so we are taking this off

)

and not exists

( select 1

from sys.objects O

where O.[schema_id]=SCH.[schema_id] ---Object uses the schema, so if schema id is not exist in the object; it means the schema is not used.

)

order by SCH.name

You know how to find unused schema, now you can delete them if you want.

Wednesday, February 19, 2020

What is Linked services in Azure Data factory(ADF)?

If you are new to Azure Data Factory and started to build the very first pipeline. Assuming, you already found the source and target. Your source data either from on premise or cloud database or blob storage or from any Web API. Whether you connect with storage account or on-premise /cloud database you require Linked Services. Linked Services is like Connection string.

For example, if you like to get data from Blob storage and insert into the cloud database then you will require two linked services. One linked service is your source connection from blog storage to the data factory and another is the target connection which connect from data factory to database.

Creating Linked service is easy, following a few steps via wizard, however; when you create Linked service one of the step will be creating or using called Integration Runtime. Since there are three different type of Integration Runtime (IR), you may get confuse with which one to choose:

Let's explain three different type of Integration Runtime,

1. Azure
2. Self-Hosted
3. Linked Self-Hosted

Please note that, there is one more Integration Runtime which called Azure-SSIS integration Runtime, this IR is required when you shift your SSIS package to Azure. We will not cover SSIS IR in this post.

1. Azure
It's the default IR, means if you don't create anything then it will create automatically and give the name as 'AutoResolveIntegrationRuntime' like as below figure 1.

Fig 1: AutoResolveIntegrationRuntime

If the data integration capabilities like Data Flow, Data movement and Activity dispatch is under public network then 'Azure IR' is used. If Azure IR is not automatically created then you can create manually by clicking the '+New' like below fig 2

Fig 2: Choose to create IR

and you will find below window to create Azure IR (Fig 3)

Fig 3: Choose from different IR

When IR is created, you can find them running under data factory-> Connections

Fig 4: IR running under Data Factory

2. Self-Hosted:

If the source or target is under private network e.g. database maybe resides on premise in that case you must need to create Self-Hosted Runtime. There are a few steps to create Self-Hosted Runtime. While you are working with Azure Data factory, you can create it via data factory UI.
please follow the link to create Self-Hosted IR : https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#create-a-self-hosted-ir-via-azure-data-factory-ui

3. Linked Self-Hosted:

If you already have Self-Hosted IR and need to use the same for other Data Factory then you don't need to create new Self-Hosted IR rather you can link the existing Self-Hosted to the other data factory.

Reference Resources:

https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime

https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime#integration-runtime-location

Saturday, November 25, 2017

How to handle trailing space while you use LEN function in SQL

LEN function in MS SQL returns the number of characters of the specified string expression, excluding trailing blanks. [1]

It means if your data has trailing blank /space in the string then LEN function will ignore that.

e.g.

SELECT LEN('DK123456789 ') as LengthofString

Which give you as length 11, though with space it’s 12.

Well, you can say we know it will be 11 why you should have 12? Or what is problem if it returns length as 11?

Let’s discuss the above scenario, consider above string 'DK123456789’ is a valid company VAT registration number. And your customer asked to get only the number part, so output should be ‘123456789’

So you use RIGHT function as like below:

Fig 1: RIGHT function to find number part

And you get the right result.

However, if your data has issue like trailing blank then can you expect the result`? Let’s see:

Fig 2: When you have data issue

So you lost the first digit since RIGHT function consider space/blank as character so you get 9 character from the right side of the value which include one space but discard first character ‘1’.

The above particular example we already know the length, however; when we query SQL table then we don’t know the length so we must calculate length as well.

so let’s create a table and insert some data:

CREATE Table #Table1 (

[VATRegistration] nvarchar(50))

insert into #Table1 values('DK123456789 ')

insert into #Table1 values('DE7891012879')

insert into #Table1 values('BD989741258')

select * from #Table1

Fig 3: VAT registration data

We make the SQL query to find out only number part from the above dataset, the query is look like:

SELECT [VATRegistration],RIGHT([VATRegistration],LEN([VATRegistration])-2) as [VATRegistrationNo]

FROM #Table1 /*Since we know first two characters need to exclude from the VAT number*/

Fig 4: Output with error data

Does our result correct? Look closer and will find first row has discarded first digit. Why this happened? Because the data has issue, If you look at the above insert statement you will find it has trailing space. So how do we solve this?

The main problem here is to calculate the LEN, since data LEN will not count space as length. Well can’t we try RTRIM then? Sure let’s do it.

SELECT [VATRegistration],RIGHT([VATRegistration],LEN(RTRIM([VATRegistration]))-2) as [VATRegistrationNo]

FROM #Table1 /*Since we know first two characters need to exclude from the VAT number*/

Unfortunately, result did not change, well; it’s not unfortunate; rather this how it’s built, TRIM function will not have any effect under LEN function.

What can we do then? Yes…, REPLACE must be a good option here.

SELECT [VATRegistration],RIGHT([VATRegistration],LEN(REPLACE([VATRegistration],' ','_'))-2) as [VATRegistrationNo]

FROM #Table1 /*Since we know first two characters need to exclude from the VAT number*/

Result look fine now:

Fig 5: Correct output by using REPLACE

Since REPLACE make removed the space and put ‘_’ which is count as character. And RIGHT function return 10 character from right to the left which include space as well.

Fig 6: Where is the difference??

After you get the result you can RTRIM the final value before you present to the report or sending to the business.

[1] https://docs.microsoft.com/en-us/sql/t-sql/functions/len-transact-sql

Tuesday, February 23, 2016

Common Table Expression (CTE) in MS SQL and Teradata

Common Table Expression (CTE) is a SQL syntax that uses in MS SQL, Teradata and other databases. This post will include why will we use CTE? and SQL syntax of CTE.

By using CTE; complex query can be written as simple readable and maintainable way. It’s something like temporary result set but not exactly temporary/volatile table. You can avoid having volatile table if you use CTE. CTE is like dynamic view, however views meta data is stored in the database but CTE doesn't require that.

A recursive CTE can simplify the code required to run a recursive query within a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. For example, in an organization chart you need display employee’s positions in the organization or e.g. Showing product hierarchies, you need to find out product segment, and then Product template, eventually the product. Above both cases, CTE can be helpful to find out the hierarchical chart.

A simple CTE query will look like below:

with CTE_PRODUCT_X_CUSTOMER (PD_ID,PROD_NAME,CUSTOMER_ID,CUSTOMER_NAME,SALES_DATE)

(

Select P.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, C.Customer_ID as CUSTOMER_ID, C.CUSTOMER_NAME as CUSTOMER_NAME, S.Sales_date as SALES_DATE

From Sales as S

JOIN

From Product as P on

S.PD_ID=P.PD_ID

JOIN Customer as C

ON S.CST_ID=P.CST_ID

)

select * from CTE_PRODUCT_X_CUSTOMER

Above example deliver sales information of particular customer. of course, this can be done without having CTE, a simple join or view can achieve same output. However, the above example can introduce you CTE syntax.

Now, Lets look at an example where CTE can avoid volatile/temporary table. e.g. you need to update Manufacture_id from your PRODUCT_SALES table. And to update the column Manufacture_id, you are depend on Product_ID matches from other sub queries. The example can go like this:

create volatile table TMP

(

Select K.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, Ck.Customer_ID as customer_id, P.EFF_DT as EFF_DT

From Product_STG as P

join Product_KEY as k

on p.PRD_CODE=k.PRD_CODE

and p.EFF_DT> '2016-01-29'

JOIN Customer_key as CK

on CK.SRC_ID=P.Source_Id

and p.eff_dt>'2016-01-29'

) with data

on commit preserve rows;

update PRODUCT_SALES

set Manf_id=(select customer_id from TMP

where PRODUCT_SALES.PD_ID=TMP.PD_ID

and PRODUCT_SALES.EFF_DT=TMP.EFF_DT)

Above example required to create volatile table. Now we can see how we can avoid using volatile table; instead, lets use CTE syntax.

with CTE_TEMP (PD_ID,PROD_NAME,customer_id,EFF_DT)

(

Select K.PD_ID as PD_ID, P.PROD_NAME as PROD_NAME, Ck.Customer_ID as customer_id, P.EFF_DT as EFF_DT

From Product_STG as P

join Product_KEY as k

on p.PRD_CODE=k.PRD_CODE

and p.EFF_DT> '2016-01-29'

JOIN Customer_key as CK

on CK.SRC_ID=P.Source_Id

and p.eff_dt>'2016-01-29'

)

update PRODUCT_SALES

set Manf_id=(select customer_id from CTE_TEMP

where PRODUCT_SALES.PD_ID=CTE_TEMP.PD_ID

and PRODUCT_SALES.EFF_DT=CTE_TEMP.EFF_DT)

As you see, you can avoid creating temporary table by using CTE. Please remember, you can’t create view on top of CTE.

Friday, March 6, 2015

SQL Command- WITH Clause (CTE)

Rather writing jargon words I will go straight to the code:

WITH EMP_HIER(EMP_ID,MGR_ID,EMP_NAME)

(Select

EMP_ID,MGR_ID,EMP_NAME

from V_Employee)

Select * from EMP_HIER

When I have seen the above code, first thing I have in my mind is; WITH clause is used to make derived table ‘EMP_HIER’ by using a source table ‘V_Employee’. And it must be used instead of making volatile table.

And then tried to run only first part:

WITH EMP_HIER(EMP_ID,MGR_ID,EMP_NAME)

(Select

EMP_ID,MGR_ID,EMP_NAME

from V_Employee)

Ya!! that was stupid move!! It doesn’t work separately; it works when select statement is there. So when I run whole code, then it works fine J

WITH EMP_HIER(EMP_ID,MGR_ID,EMP_NAME)

(Select

EMP_ID,MGR_ID,EMP_NAME

from V_Employee)

Select * from EMP_HIER

Now next question came in my mind why I am going to use the above syntax, instead I can create volatile table. I did some investigation and share my thoughts with my colleagues and so far I found reason for using ‘WITH’ Clause is below:

1) Data retrieve will not be slower

Explanation:

When we make volatile table by getting data from other table then we generally don’t create table structure for volatile table. So your volatile table doesn’t have index automatically. Whereas if you use WITH clause then you are directly accessing to the source table so it will be faster than accessing to volatile table.

2) Portability

Explanation:

There is different syntax for making volatile in different database systems. E.g.

--In MS SQL:

CREATE TABLE #Temp1

( Name Char( 30 ), seqid integer )

--In Teradata:

create volatile table V_Employee

(

EMP_ID int,

MGR_ID int,

EMP_NAME varchar(80)

)

on commit preserve rows;

But ‘WITH clause’ is same for both MS SQL and Teradata database. So you don’t need to change the syntax if your code needs to work for the both databases.

3) Not enough pool space to create volatile table

Explanation:

Sometimes you may not have enough pool space to create volatile table in your development environment and then you can use WITH clause to do the same.

In addition, my study found, ‘WITH clause’ will be helpful when recursive data query will be needed e.g. find out different level of manager of a particular employee in a big organization. However, I will cover recursive queries in other post.

FYI, ‘WITH clause’ has official name which called Common Table Expression (CTE).

Pages

Sunday, April 26, 2020

Basic differences between SQL and NoSQL.

Saturday, March 21, 2020

Finding unused Schemas in SQL database

Wednesday, February 19, 2020

What is Linked services in Azure Data factory(ADF)?

Saturday, November 25, 2017

How to handle trailing space while you use LEN function in SQL

Tuesday, February 23, 2016

Common Table Expression (CTE) in MS SQL and Teradata

Friday, March 6, 2015

SQL Command- WITH Clause (CTE)