Saturday, May 23, 2020

Parameterize data source in Power BI and Tricks for source as Salesforce

Parameters in Power BI can be used for different purposes, one of the useful case to hold environment variable. For example, while you are building Power BI report source can be connected to DEV environment and as soon as report is ready to release you may want to connect the source from PRE-PROD/PROD environment. If you use parameter then it's become easy to handle.

How to create Parameter?

Open power query editor and then find Manage parameter and create New parameter as like below fig 1.0

Fig 1.0: Find New Parameter

My source is Salesforce and have three different environments, QA, UAT and PROD.  At first provide the name of the parameter, e.g. my parameter name is: Environment, then choose Type as 'Text' and for Suggested values please choose 'List of values' as like below figure 2.0.

Fig 2.0: Creating parameter for different environment

Under List of values you will list down all your source environment and Default and Current value you need to set from those list of values. e.g. if would like to use QA environment then you need to choose the URL for QA for both default and current value.

Fig 3.0: Default and current value

As soon as you click 'OK' button then you are done. Yes, you are done with current value you set up; if you are connected for QA that's work fine. However, as soon as you change your environment from current value QA to UAT that works also fine if your source is database, but not for Salesforce source as expected.

What special about source as Salesforce?

After changing the source from QA to UAT and refresh the model, I got the below error:

Fig 3.0: Error after changing environment

How to fix it or What do you do after environment update for source as Salesforce ?

If you run into the same situation then please go to the power query editor then go to the source query as like below figure 4.0 for each table and then validate it by enter or the validation button (tick mark as like in the below figure). You will find error disappear. It's now can identity correct environment to connect.

Fig 4.0: Refresh data source 

Update Environment from Power BI service:
Since you have created the parameter, so you can update the parameter value from Power  BI service as well. Firstly login to the power BI portal and then find your published datasets as like below figure 5.0.

Fig 5.0: Settings under Datasets

And clicking ellipsis (...) you will find settings 

                                          Fig 6.0: Settings menu

Under settings you will see the parameter, where you can update the environment, you can replace UAT by PROD URL and click Apply which will connect to PROD environment and refresh your power BI model.

Sunday, April 26, 2020

Basic differences between SQL and NoSQL.

Around three decades Oracle, IBM, Microsoft relational databases were consistent leaders for Operational Database Management Systems. However, currently all of these vendors offer NoSQL platform parallel to their traditional database platform e.g. Microsoft got Azure Cosmos DB as NoSQL database and so on.

Journey of NoSQL:
The world of NoSQL emerged in 1998 and it's gained popularity in 2010s. In 2015, the popular NoSQL database provider MongoDB appeared in Gartners magic Quardent as a leader for operational database management system.

Fig 1:Gartners Magic Quadrant in 2015

Though recent trends shows that NoSQL is getting more popular in Big data landscape than relational database management system.

What is NoSql and Why?

NoSQL stands for 'Not only SQL'. NoSQL database is for storing both structure and semi structure data.

NoSQL stores data in one of four categories:
  1. Key-Value storage
  2. Document storage
  3. Wide Column storage
  4. Graph database
The most popular NoSQL  database MongoDB is document storage, however, it can be used as a key/value store, which is just a subset of features.

Today data become more broader in terms of shape and size. Data is spread around documents, social media, complex web application, IoT sources. The traditional SQL database can't handle these data due to the continuous changing behavior. Traditional database need to know shape of the data (schema) beforehand to store those, hence it failed to capture continuous evolving data. And thus, NoSQL emerge and conquer the world!!!

Fixed schema vs. No Schema

For structure database (traditional SQL database) you need to have schema ready before you insert data to a table but for NoSQL you don't need to have schema created at first rather you can directly insert the data. Though, it's better to say 'Schema agnostice' than 'Schemaless' e.g. Microsoft's NoSQL database Azure Cosmos DB known as schema agnostic database, which is not bound by schemas but are aware of the schemas. 

Relationship vs. No Relationship

Traditional SQL require to maintain relationship to perform, that's why normalization is there. Whereas, NoSQL doesn't require to maintain the relationship. You can embed a few tables into one table in NoSQL database.

Below table (fig:2) shows basic differences between SQL and NoSQL database:
Fig 2: Comparison between SQL and NoSQL

Conclusion: NoSQL denotes 'Not only SQL', it means NoSQL database can perform what traditional relational database can do as well as do more. But these two types of Database have different expertise,  as per business requirements you can choose from them. A way to find out which fit best for you could be asking the question. Do you know the shape of data well advance? If answer is 'Yes' which means you can define schema earlier and can build the relationship before data arrive then it's good to choose traditional SQL. On the other hand, when you don't know the shape of data and behavior changes continuously then go with NoSQL.  Nevertheless, some complex enterprise solution can be built by using both SQL and NoSQL database to leverage best of each.

Saturday, March 21, 2020

Finding unused Schemas in SQL database

Fun part of a developer life is: 'Today what is unknown, tomorrow you know it by learning and doing". And by learning the unknown and implementing you become expert :) I will share something which I learned and develop recently. Long back I got a task to find out used schemas in database where the solution was pretty straight forward. However, this time the task was to get unused schemas. Firstly, I thought this must be fun, never did it but someone must did it so started with google and got some clue but not with proper explanation so thought about writing it.

To find out the solution, you need to go through three tables:
1) Sys.Schemas
2) Sys.database_principals
3) Sys.objects

Easy way to find unused schemas is : If the schema_id from sys.schemas table is not found in Sys.objects table means those are never used. And both tables schemas and objects got the column ‘Schema_ID’. Woww!! We know the solution, yes all most 😊 Why we need database_principals table? When schema name is not specified for those cases we need to use this table. Relations among three tables can be depicted with below diagram:

Fig 1: Relationship among tables

We covered theory part, let's see in action, below is the SQL code to find out unused schemas in database:


FROM sys.schemas SCH

where SCH.schema_id not between 1 and 4    --This is for dbo, guest, information_schema,sys

and SCH.principal_id not between 16384 and 16399 ---  it's related to db_* schema, so don't touch this.

and not exists

( select 1

from sys.database_principals P

where     ---Name to be used when SQL name does not specify a schema, so we are taking this off


and not exists

( select 1

from sys.objects O

where O.[schema_id]=SCH.[schema_id]   ---Object uses the schema, so if schema id is not exist in the object; it means the schema is not used.


order by

You know how to find unused schema, now you can delete them if you want.

Wednesday, February 19, 2020

Azure Data factory(ADF): Important components you need to know.

If you are new to Azure Data Factory and started to build the very first pipeline. Assuming, you already found the source and target. Your source data either from on premise or cloud database or blob storage or from any Web API. Whether you connect with storage account or on-premise /cloud database you require Linked Services. Linked Services is like Connection string.

For example, if you like to get data from Blob storage and insert into the cloud database then you will require two linked services. One linked service is your source connection from blog storage to the data factory and another is the target connection which connect from data factory to database.

Creating Linked service is easy, following a few steps via wizard, however; when you create Linked service  one of the step will be creating or using called Integration Runtime.  Since there are three different type of Integration Runtime (IR), you may get confuse with which one to choose:

Let's explain three different type of Integration Runtime,

1. Azure
2. Self-Hosted
3. Linked Self-Hosted

Please note that, there is one more Integration Runtime which called Azure-SSIS integration Runtime, this IR is required when you shift your SSIS package to Azure. We will not cover SSIS IR in this post.

1. Azure
It's the default IR, means if you don't create anything then it will create automatically and give the name as 'AutoResolveIntegrationRuntime' like as below figure 1.

Fig 1: AutoResolveIntegrationRuntime

If the data integration capabilities like Data Flow, Data movement and Activity dispatch is under public network then 'Azure IR' is used. If Azure IR is not automatically created then you can create manually by clicking the '+New' like below fig 2

Fig 2: Choose to create IR

 and you will find below window to create Azure IR (Fig 3)

Fig 3: Choose from different IR

When IR is created, you can find them running under data factory-> Connections

Fig 4: IR running under Data Factory

2. Self-Hosted:

If the source or target is under private network e.g. database maybe resides on premise in that case you must need to create Self-Hosted Runtime. There are a few steps to create Self-Hosted Runtime. While you are working with Azure Data factory, you can create it via data factory UI.
please follow the link to create Self-Hosted IR :

3. Linked Self-Hosted: 

If you already have Self-Hosted IR and need to use the same for other Data Factory then you don't need to create  new Self-Hosted IR rather you can link the existing Self-Hosted to the other data factory.

Saturday, January 18, 2020

Azure Data Factory: How to add and remove code repository to ADF

If you are working with Azure data Factory (ADF), the very first step of the development is to connect with code repository, well!!  you can develop without connecting code repository but  collaboration work and version control can't be there.

In general when you work with ADF and publish code is saved under ADF service but no repository is there means you can't see who saved the code or which version of the code you are working with.
For example, below Fig 01 give me option to choose from ADF mode or use Github. If I choose ADF mode then there will be no repository for the project, however; with Github I will have that option.

Fig 01: ADF repo ot ADF mode

Step 1: Create source source repository

If you have visual studio subscription then you can get free azure DevOps account and then create repository there, if don't then please go to  Github ( and create a repository which is completely free. It's very simple to create a repository in Github, just a few click.

Step 2: Login to Azure Data Factory and choose relevant repository type

Let's log into the Azure Data Factory and after clicking 'Author and Monitor' you will have the page for the development work. At the top left of the page like below Fig 02, you will find option called 'Set up Code repository'.

Fig 02: Set up Code repository

And you get options to choose from either Azure DevOps Git or Github (see above fig: 02). If you already have visual studio account then you can choose 'Azure DevOps', otherwise you need to use Github.

For example, in my case; I logged in Azure portal with my personal account and the account doesn't have any association with Visual studio account. So when choose 'Azure DevOps' nothing is showing(Fig 03)
Fig 03: Azure Devops require VS account

Step 2: Settings for the chosen repository type

Let's go to the next step where you have already chosen repository type 'Github' and now please fill up the necessary information as like shown in figure 04.

Fig 04: Detail for GitHub

Github repository is added to the Azure data factory, so it's become easy to collaborate and deploy the pipeline.

Fig 05: Repository is connected

Removing Code Repository:

The code repository have been added for ADF, however, for any reason if you want to remove the repository, then?

You need to find out 'repo settings', Please come out from Author tab and switch to Home tab to find 'repo settings' as like figure 06.

Fig 06: Find Git repo

As long as you click the 'repo settings', you will find below screen (Fig 07)  there will be option to remove the repository, just click the 'remove git' you are done!!

Fig 07: Removing Git repo