Saturday, June 20, 2020

Adding Active Directory users to Azure SQL databases

If you are working with Azure SQL Database, you will find that adding users is a bit different from a general SQL database, especially when adding Active Directory users. It takes only a few easy steps, of course, if you know the steps. While working on it I went through several different routes, so I thought I would share the easy one, which worked like a gem.

There are two authentication approaches for Azure SQL DB and SQL Managed Instance:

1) SQL Authentication
2) Azure Active Directory Authentication


Fig 1.0: Authentication type in Azure


SQL authentication is straightforward: while you are creating a database in Azure, you must provide SQL server credentials, which become your Azure database credentials if you don't create separate database credentials. As long as you have administrative credentials to log in to the database, you can create all the necessary users. However, you can't create Active Directory users in Azure while logged in as a SQL-authenticated user.
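You can, however, still create contained SQL-authenticated users that way. A minimal sketch (the user name and password below are placeholders, not from this project):

-- Contained database user with SQL authentication (no server-level login needed in Azure SQL DB)
CREATE USER [report_user] WITH PASSWORD = 'Str0ng!Passw0rd1';

-- Grant read access through a traditional database role
ALTER ROLE db_datareader ADD MEMBER [report_user];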

For example, I logged in with the SQL admin user and then tried to execute the query below to add an Active Directory user:

CREATE USER [user1@domain.com] 
FROM EXTERNAL PROVIDER 
WITH DEFAULT_SCHEMA = dbo;  
  
--add user to role(s) for the particular database
ALTER ROLE dbmanager ADD MEMBER [user1@domain.com]; 
ALTER ROLE loginmanager ADD MEMBER [user1@domain.com]; 

However, I got the error below:

Fig 2.0: Error creating AD user

The error message is self-explanatory: "Principal 'user1@domain.com' could not be created. Only connections established with Active Directory accounts can create other Active Directory users."

So, how do you solve it? How do you add the AD user to Azure SQL DB?

Step 1: Log in to your Azure portal and find your SQL server resource; you will find 'Active Directory admin' on the left side under Settings. Click 'Active Directory admin' to get a UI like the one below, then click 'Set admin'. Now your Active Directory user account becomes an admin of the SQL server.

Fig 3.0:  Active Directory Admin

Step 2: Now use either SSMS or Azure Data Studio and log in with Active Directory authentication. Since my organization has Multi-Factor Authentication (MFA) enabled, I have chosen 'Azure Active Directory - Universal with MFA', but you can also choose 'Active Directory - Integrated' or 'Active Directory - Password'.

Fig 4.0: Login with AD user


Step 3: Now you can run the query to add the Active Directory user and grant whatever permissions you want. The traditional database-level roles like db_datareader, db_datawriter, db_ddladmin, etc. are the same in an Azure database, but roles like sysadmin, serveradmin, etc. don't exist in Azure SQL Database. Instead there are two admin roles: dbmanager (similar to dbcreator), which can create and drop databases, and loginmanager (similar to securityadmin), which can create new logins, as you can see in the code below:


CREATE USER [user1@domain.com] 
FROM EXTERNAL PROVIDER 
WITH DEFAULT_SCHEMA = dbo;  
  
-- add user to role(s) for the particular database
ALTER ROLE dbmanager ADD MEMBER [user1@domain.com]; 
ALTER ROLE loginmanager ADD MEMBER [user1@domain.com]; 


As soon as you execute the above code, the AD user gets all the necessary access to perform its activities.
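If the user only needs data access rather than the ability to create databases or logins, you could assign the traditional database roles from step 3 instead of dbmanager and loginmanager. A small sketch with the same example user:

-- Read/write access only, no administrative roles
ALTER ROLE db_datareader ADD MEMBER [user1@domain.com];
ALTER ROLE db_datawriter ADD MEMBER [user1@domain.com];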



  

Saturday, May 23, 2020

Parameterize data source in Power BI and Tricks for source as Salesforce

Parameters in Power BI can be used for different purposes; one useful case is holding an environment variable. For example, while you are building a Power BI report the source can be connected to the DEV environment, and as soon as the report is ready to release you may want to connect the source to the PRE-PROD/PROD environment. If you use a parameter, this becomes easy to handle.

How do you create a parameter?

Open the Power Query Editor, then find 'Manage Parameters' and create a 'New Parameter', as in fig 1.0 below.

Fig 1.0: Find New Parameter

My source is Salesforce, and I have three different environments: QA, UAT and PROD. First provide the name of the parameter (e.g. my parameter name is Environment), then choose 'Text' as the Type, and for Suggested Values choose 'List of values', as in figure 2.0 below.

Fig 2.0: Creating parameter for different environment


Under 'List of values' you list all your source environments, and you then set the Default Value and Current Value from that list. E.g. if you would like to use the QA environment, you need to choose the QA URL for both the default and current value.

Fig 3.0: Default and current value


As soon as you click the 'OK' button, you are done. Well, you are done with the current value you set up; if you are connected to QA, that works fine. Changing the environment's current value from QA to UAT also works fine if your source is a database, but not, as you might expect, when the source is Salesforce.

What is special about Salesforce as a source?

After changing the source from QA to UAT and refreshing the model, I got the error below:

Fig 3.1: Error after changing environment


How do you fix it? What do you do after an environment update when the source is Salesforce?

If you run into the same situation, go to the Power Query Editor, open the source query for each table as in figure 4.0 below, and re-validate it by pressing Enter or clicking the validation button (the tick mark shown in the figure). You will find that the error disappears; Power BI can now identify the correct environment to connect to.

Fig 4.0: Refresh data source 

Update the environment from the Power BI service:
Since you have created the parameter, you can update its value from the Power BI service as well. First, log in to the Power BI portal and find your published dataset, as in figure 5.0 below.

Fig 5.0: Settings under Datasets

Clicking the ellipsis (...), you will find Settings.

                              
Fig 6.0: Settings menu


Under Settings you will see the parameter, where you can update the environment: replace the UAT URL with the PROD URL and click Apply, which will connect to the PROD environment and refresh your Power BI model.


Sunday, April 26, 2020

Basic differences between SQL and NoSQL.

For around three decades, the relational databases from Oracle, IBM and Microsoft were the consistent leaders among operational database management systems. Currently, however, all of these vendors offer a NoSQL platform parallel to their traditional database platform; e.g. Microsoft has Azure Cosmos DB as its NoSQL database.

Journey of NoSQL:
 
The world of NoSQL emerged in 1998, and it gained popularity in the 2010s. In 2015, the popular NoSQL database provider MongoDB appeared in Gartner's Magic Quadrant as a leader for operational database management systems.

Fig 1: Gartner's Magic Quadrant in 2015


Recent trends show that NoSQL is getting more popular in the big data landscape than relational database management systems.



What is NoSQL and why?

NoSQL stands for 'Not only SQL'. A NoSQL database stores both structured and semi-structured data.

NoSQL stores data in one of four categories:
  1. Key-Value storage
  2. Document storage
  3. Wide Column storage
  4. Graph database
The most popular NoSQL database, MongoDB, is document storage; however, it can also be used as a key/value store, which is just a subset of its features.

Today data has become broader in terms of shape and size. Data is spread across documents, social media, complex web applications and IoT sources. A traditional SQL database can't handle this data because of its continuously changing behavior: a traditional database needs to know the shape of the data (the schema) beforehand in order to store it, so it fails to capture continuously evolving data. And thus NoSQL emerged to conquer the world!

Fixed schema vs. No Schema

For a structured database (a traditional SQL database) you need to have the schema ready before you insert data into a table, but for NoSQL you don't need to create a schema first; you can insert the data directly. Though it's better to say 'schema-agnostic' than 'schemaless': e.g. Microsoft's NoSQL database Azure Cosmos DB is known as a schema-agnostic database, which is not bound by schemas but is aware of them. The sketch below shows the schema-first side of this contrast.
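As a minimal T-SQL illustration of the schema-first approach (the table and column names here are made up for the example):

-- The shape of the data must be declared before any row can be stored...
CREATE TABLE dbo.Customer
(
    CustomerId INT NOT NULL PRIMARY KEY,
    FullName   NVARCHAR(100) NOT NULL,
    Email      NVARCHAR(255) NULL
);

-- ...and every insert must match that shape; a new attribute would
-- need an ALTER TABLE first, unlike in a schema-agnostic document store.
INSERT INTO dbo.Customer (CustomerId, FullName, Email)
VALUES (1, N'Jane Doe', N'jane@example.com');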


Relationship vs. No Relationship

A traditional SQL database requires relationships to be maintained in order to perform; that's why normalization is there. NoSQL, on the other hand, doesn't require relationships to be maintained: you can embed what would be several tables into a single document in a NoSQL database, as contrasted in the sketch below.
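Continuing the same made-up example, the relational side makes the relationship explicit with a foreign key, whereas a document store would typically embed the orders inside the customer document:

-- Normalized design: the child table points at the parent via a foreign key
CREATE TABLE dbo.CustomerOrder
(
    OrderId    INT NOT NULL PRIMARY KEY,
    CustomerId INT NOT NULL
        REFERENCES dbo.Customer (CustomerId),   -- enforced relationship
    OrderDate  DATE NOT NULL
);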


The table below (fig 2) shows basic differences between SQL and NoSQL databases:
Fig 2: Comparison between SQL and NoSQL



Conclusion: NoSQL denotes 'Not only SQL', meaning a NoSQL database can do what a traditional relational database does, and more. But these two types of database have different strengths, so you can choose between them as your business requirements dictate. One way to find out which fits you best is to ask: do you know the shape of your data well in advance? If the answer is 'Yes', which means you can define the schema and build the relationships before the data arrives, then it's good to choose traditional SQL. On the other hand, when you don't know the shape of the data and its behavior changes continuously, go with NoSQL. Nevertheless, some complex enterprise solutions can be built using both SQL and NoSQL databases to leverage the best of each.




Saturday, March 21, 2020

Finding unused Schemas in SQL database

The fun part of a developer's life is: 'What is unknown today, you know tomorrow by learning and doing.' And by learning the unknown and implementing it, you become an expert :) I will share something I learned and developed recently. Long ago I got a task to find the used schemas in a database, where the solution was pretty straightforward. This time, however, the task was to get the unused schemas. At first I thought this must be fun; I had never done it, but someone must have, so I started with Google and got some clues, but no proper explanation, so I thought I would write it up.

To get to the solution, you need to go through three catalog views:
1) sys.schemas
2) sys.database_principals
3) sys.objects


The easy way to find unused schemas is this: if a schema_id from sys.schemas is not found in sys.objects, that schema is never used. Both schemas and objects have the column schema_id. Wow, we know the solution! Well, almost 😊 Why do we need the database_principals view? It holds the default schema that is used when a SQL name does not specify one, so we need to exclude those schemas as well. The relationships among the three views can be depicted with the diagram below:

Fig 1: Relationship among tables

We have covered the theory; let's see it in action. Below is the SQL code to find unused schemas in a database:

SELECT SCH.name
FROM sys.schemas SCH
WHERE SCH.schema_id NOT BETWEEN 1 AND 4              -- dbo, guest, INFORMATION_SCHEMA, sys
AND SCH.principal_id NOT BETWEEN 16384 AND 16399     -- db_* fixed-role schemas, so don't touch these
AND NOT EXISTS
(   SELECT 1
    FROM sys.database_principals P
    WHERE P.default_schema_name = SCH.name           -- name used when a SQL name does not specify a schema, so we take these off
)
AND NOT EXISTS
(   SELECT 1
    FROM sys.objects O
    WHERE O.[schema_id] = SCH.[schema_id]            -- objects use the schema; if the schema_id is absent from sys.objects, the schema is unused
)
ORDER BY SCH.name


Now that you know how to find unused schemas, you can delete them if you want.
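Dropping an unused schema is then a one-liner; a sketch with a hypothetical schema name taken from the query's output:

-- DROP SCHEMA fails if the schema still contains objects,
-- so the query above doubles as a safety check.
DROP SCHEMA [Sales_Archive];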

Wednesday, February 19, 2020

Azure Data Factory (ADF): Important components you need to know.

Say you are new to Azure Data Factory and have started to build your very first pipeline, and assume you have already identified the source and target. Your source data may come from an on-premises or cloud database, from blob storage, or from any web API. Whether you connect to a storage account or an on-premises/cloud database, you require a Linked Service. A Linked Service is like a connection string.

For example, if you would like to get data from blob storage and insert it into a cloud database, you will require two linked services: one is your source connection from blob storage to the data factory, and the other is the target connection from the data factory to the database.


Creating a linked service is easy, following a few steps via a wizard; however, when you create a linked service, one of the steps will be creating or using something called an Integration Runtime. Since there are three different types of Integration Runtime (IR), you may get confused about which one to choose.

Let's explain the three different types of Integration Runtime:

1. Azure
2. Self-Hosted
3. Linked Self-Hosted

Please note that there is one more Integration Runtime, called the Azure-SSIS Integration Runtime; this IR is required when you shift your SSIS packages to Azure. We will not cover the SSIS IR in this post.

1. Azure
This is the default IR, meaning that if you don't create anything, one is created automatically and given the name 'AutoResolveIntegrationRuntime', as in figure 1 below.

Fig 1: AutoResolveIntegrationRuntime

If the data integration capabilities (data flow, data movement and activity dispatch) run over a public network, then the Azure IR is used. If an Azure IR is not created automatically, you can create one manually by clicking '+New', as in fig 2 below,


Fig 2: Choose to create IR

and you will find the window below for creating the Azure IR (fig 3).

Fig 3: Choose from different IR

When the IR is created, you can find it running under Data Factory -> Connections.

Fig 4: IR running under Data Factory



2. Self-Hosted:

If the source or target is on a private network, e.g. a database that resides on-premises, then you must create a Self-Hosted IR. There are a few steps to create one; while you are working with Azure Data Factory, you can create it via the data factory UI.
Please follow this link to create a Self-Hosted IR: https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#create-a-self-hosted-ir-via-azure-data-factory-ui



3. Linked Self-Hosted: 

If you already have a Self-Hosted IR and need to use it for another data factory, you don't need to create a new Self-Hosted IR; instead, you can link the existing one to the other data factory.


Saturday, January 18, 2020

Azure Data Factory: How to add and remove code repository to ADF

If you are working with Azure Data Factory (ADF), the very first step of development is to connect to a code repository. Well, you can develop without connecting a code repository, but then there is no collaboration or version control.

In general, when you work with ADF and publish, the code is saved under the ADF service, but there is no repository, which means you can't see who saved the code or which version of the code you are working with.
For example, fig 01 below gives me the option to choose ADF mode or use GitHub. If I choose ADF mode, there will be no repository for the project; with GitHub, however, I will have one.


Fig 01: ADF repo or ADF mode

Step 1: Create the source repository

If you have a Visual Studio subscription, you can get a free Azure DevOps account and create a repository there; if not, go to GitHub (https://github.com/) and create a repository, which is completely free. It's very simple to create a repository in GitHub, just a few clicks.


Step 2: Log in to Azure Data Factory and choose the relevant repository type

Let's log into Azure Data Factory; after clicking 'Author and Monitor' you will get the page for development work. At the top left of the page, as in fig 02 below, you will find an option called 'Set up Code Repository'.

Fig 02: Set up Code repository

You get the option to choose either Azure DevOps Git or GitHub (see fig 02 above). If you already have a Visual Studio account, you can choose 'Azure DevOps'; otherwise you need to use GitHub.

For example, in my case I logged in to the Azure portal with my personal account, and that account doesn't have any association with a Visual Studio account, so when I choose 'Azure DevOps' nothing shows up (fig 03).
Fig 03: Azure DevOps requires a VS account


Step 3: Settings for the chosen repository type

Let's go to the next step: you have already chosen the repository type 'GitHub', so now fill in the necessary information as shown in figure 04.


Fig 04: Detail for GitHub

The GitHub repository is now added to the Azure data factory, so it becomes easy to collaborate on and deploy the pipelines.

Fig 05: Repository is connected


Removing Code Repository:

The code repository has been added to ADF; however, what if for some reason you want to remove it?


You need to find 'Repo settings'. Come out of the Author tab and switch to the Home tab to find 'Repo settings', as in figure 06.

Fig 06: Find Git repo


As soon as you click 'Repo settings', you will get the screen below (fig 07), where there is an option to remove the repository. Just click 'Remove Git' and you are done!

Fig 07: Removing Git repo






Wednesday, December 25, 2019

Environment variable setup in SSIS Catalogue

If your SSIS solution is built in project deployment mode (not package deployment mode) and you would like to introduce environment variables, then this post is for you.

Let's start with a few steps before going into environment variables. In general, after building the SSIS solution you get an .ispac file, which you take from the DEV environment to UAT/TEST/PREPROD/PROD. Then, by following the wizard, you deploy the .ispac file under the SSIS catalogue. For the step-by-step process of deploying an SSIS package to the catalogue, please follow: Deploying SSIS package to Catalogue

After deployment, you will find the packages under the SSIS catalogue, as in figure 01 below:


Fig 01: SSIS Catalogue


In the next step you will create a job (or jobs) to run the packages. Now, if you don't create any environment variables, the job will look like the one below:

Fig 2: Job Properties without Environment variable

Will this job execute without an environment variable? It surely does, but the connection string still retains the development server information, even though the solution has already moved to the UAT or test environment, so you would need to change the connection string manually. If environment variables have been defined, they can be assigned to the job, so the same job (process) can run against different environments (UAT/Pre-PROD/PROD) by pointing at the right one.

A job with an environment variable looks like figure 3 below:


Fig 3: Job with Environment variable

With this setup, you can choose from different environments before running the job, and each environment can hold a different source and target database as well as source and destination file locations. Now let's see how to create the environment variable.


Step 1: Create environment variable
Under SSISDB, you will find the Environments folder as in the picture below; right-click it and follow the steps.

Fig 4: Create environment variable


Step 2: Set up properties for the environment variable

In this step you will define connection strings, source/destination file paths, etc. (A scripted alternative for steps 1 and 2 is sketched after fig 5.)

Fig 5: setup properties for environment
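By the way, everything the wizard does in steps 1 and 2 can also be scripted against the SSISDB catalog stored procedures. A minimal T-SQL sketch, assuming a catalog folder named 'MyFolder' and a variable holding a connection string (the folder, variable name and connection details are placeholders for the example):

-- Step 1 (scripted): create the UAT environment under an existing catalog folder
EXEC SSISDB.catalog.create_environment
     @folder_name = N'MyFolder',
     @environment_name = N'UAT',
     @environment_description = N'UAT connection settings';

-- Step 2 (scripted): add a variable holding the UAT connection string
EXEC SSISDB.catalog.create_environment_variable
     @folder_name = N'MyFolder',
     @environment_name = N'UAT',
     @variable_name = N'SourceConnectionString',
     @data_type = N'String',
     @sensitive = 0,
     @value = N'Data Source=UatServer;Initial Catalog=SourceDB;Integrated Security=SSPI;',
     @description = N'Source database connection for UAT';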



Step 3: Connect the environment variable to the project. Under the SSISDB catalogue, choose the project and right-click to find 'Configure'.

Fig 6: Configure environment variable



Then add a reference to the project: References -> Add, and the environment can be chosen. (A scripted version of this step is sketched below as well.)

Fig 7: adding environment variable reference to the project
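The equivalent T-SQL for step 3, again a sketch where the project name 'MyProject' and the project parameter 'SourceConnectionString' are placeholders for the example:

-- Step 3 (scripted): reference the UAT environment from the project
DECLARE @reference_id BIGINT;
EXEC SSISDB.catalog.create_environment_reference
     @folder_name = N'MyFolder',
     @project_name = N'MyProject',
     @environment_name = N'UAT',
     @reference_type = 'R',            -- 'R' = environment lives in the same folder as the project
     @reference_id = @reference_id OUTPUT;

-- Bind a project parameter to the environment variable
EXEC SSISDB.catalog.set_object_parameter_value
     @object_type = 20,                -- 20 = project-level parameter
     @folder_name = N'MyFolder',
     @project_name = N'MyProject',
     @parameter_name = N'SourceConnectionString',
     @parameter_value = N'SourceConnectionString',  -- name of the environment variable
     @value_type = 'R';                -- 'R' = referenced (use the environment variable)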



Step 4: In steps 1 to 3 we worked with SSISDB. Now let's look at the job, under SQL Server Agent -> Jobs, where you will have the option to choose an environment.

Fig 8: Job Properties
From the job properties, click on Steps, and then you will find the environment to choose from.



In this example only the UAT environment has been created, but Pre-PROD and PROD environments can be created as needed, with source and target databases assigned accordingly, and those environments can then be assigned to the job.