Wednesday, November 29, 2023

How to solve available workspace capacity exceeded error (Livy session) in Azure Synapse Analytics?

Livy session errors in Azure Synapse notebooks

If you work with notebooks in Azure Synapse Analytics and multiple people use the same Spark pool at the same time, chances are high that you have seen one of the errors below:

"Livy Session has failed. Session state: error code: AVAILBLE_WORKSPACE_CAPACITY_EXCEEDED.You job requested 24 vcores . However, the workspace only has 2 vcores availble out of quota of 50 vcores for node size family [MemoryOptmized]."

"Failed to create Livy session for executing notebook. Error: Your pool's capacity (3200 vcores) exceeds your workspace's total vcore quota (50 vcores). Try reducing the pool capacity or increasing your workspace's vcore quota. HTTP status code: 400."

"InvalidHttpRequestToLivy: Your Spark job requested 56 vcores. However, the workspace has a 50 core limit. Try reducing the numbers of vcores requested or increasing your vcore quota. Quota can be increased using Azure Support request"

Figure 1 below shows one of these errors while running a Synapse notebook.

Fig 1: Available workspace capacity exceeded error

What is Livy?

Your first thought is probably, "What exactly is this Livy session?" Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface [1]. Whenever you execute a notebook in Azure Synapse Analytics, Livy handles the interaction with the Spark engine.
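To make the REST interface concrete, here is a minimal sketch of listing Spark sessions through Livy's REST API. The host is a placeholder; a plain Apache Livy server listens on port 8998, while Synapse fronts Livy behind its workspace endpoint and requires Azure AD authentication, so this exact call won't work there without a token.

```python
import requests

# Placeholder Livy endpoint; a standalone Livy server listens on port 8998.
livy_url = "http://<livy-host>:8998"

# List the Spark sessions currently managed by Livy.
response = requests.get(f"{livy_url}/sessions")
response.raise_for_status()
for session in response.json().get("sessions", []):
    print(session["id"], session["state"])
```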


Will increasing vCores fix it?

Looking at the error message, your first instinct may be to increase the vCores; however, increasing the vCores alone will not solve the problem.

Let's look at how a Spark pool works. By definition, when a Spark pool is instantiated, it creates a Spark instance that processes the data. Spark instances are created when you connect to a Spark pool, create a session, and run a job, and multiple users may have access to a single Spark pool. When multiple users run jobs at the same time, the first job may already have consumed most of the vCores, so the next job submitted by another user hits the Livy session error. For example, if the pool uses Medium (8 vCore) nodes and the first session holds a driver plus five executors (6 × 8 = 48 vCores), only 2 of the 50-vCore quota remain, and a second session requesting 24 vCores fails with exactly the first error shown above.

How to solve it?

The problem occurs when multiple people work on the same Spark pool, or when the same user runs more than one Synapse notebook in parallel. When multiple data engineers share a Spark pool in Synapse, each of them can configure their session and save that configuration to their DevOps branch. Let's go through it step by step:

1. You will find the configuration button (a gear icon) as shown below in Fig 2:
Fig 2: Configuration button

Clicking the configuration button shows the Spark pool configuration details, as shown in the diagram below:
Fig 3: Session details
This is your session, as shown in Fig 3 above; at this point there is no active session.

2. Activate your session by attaching the Spark pool and assigning the right resources. Fig 4 below and the detailed steps that follow show how to avoid the Livy session error.

Fig 4: Fine-tune the Settings

a) Attach the Spark pool from the available pools (if you have more than one). I had only one Spark pool.
b) Select the session size; I chose Small by clicking the 'Use' button.
c) Enable 'Dynamically allocate executors'; this lets the Spark engine allocate executors dynamically.
d) You can change the number of executors. For example, by default the Small session size allows up to 47 executors, but I chose 3 to 18 executors to free up the rest for other users. Sometimes you may need to go down to 1 to 3 executors to avoid the errors.
e) Finally, apply the changes to your session (a code-based alternative is sketched right after these steps).
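If you prefer to pin these settings in code rather than through the gear icon, Synapse notebooks support a %%configure session magic that is applied when the Livy session starts. The values below are only a sketch assuming a Small node size; adjust them to your pool and verify the property names against your Spark runtime's documentation.

```
%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "28g",
    "executorCores": 4,
    "conf": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "3",
        "spark.dynamicAllocation.maxExecutors": "18"
    }
}
```

Because the cell is part of the notebook, the session settings travel with it when you commit the notebook to your branch.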

You can commit these changes to your DevOps branch for the particular notebook you are working on so that you don't need to apply the same settings again.

In addition, since the workspace's total vCore quota (50 vCores in this case) caps what all sessions can use together, you can also raise an Azure support request to Microsoft to increase the quota for your organization.

In summary, the Livy session error is common when multiple data engineers work in the same Spark pool, so it's important to understand how to set up your session correctly so that more than one session can run at the same time.



Saturday, July 1, 2023

Provisioning Microsoft Fabric: A Step-by-Step Guide for Your Organization

Learn how to provision Microsoft Fabric with ease! Unveiled at Microsoft Build 2023, Microsoft Fabric is a cutting-edge data intelligence platform that has taken the industry by storm. This blog post covers a simple step-by-step guide to provisioning Microsoft Fabric using a Microsoft Fabric capacity.

Before we look into Fabric capacity, let's go over the different licensing models. There are three types of licenses you can choose from to start working with Microsoft Fabric:
1. Microsoft Fabric trial
It's free for two months, and you need to use your organizational email; a personal email doesn't work. See how you can activate your free trial.
2. Power BI Premium Per Capacity (P SKUs)
If you already have a Power BI Premium capacity license, you can work with Microsoft Fabric.
3. Microsoft Fabric Capacity (F SKUs)
With your organizational Azure subscription, you can create a Microsoft Fabric capacity. You will find more details about Microsoft Fabric licenses in the Microsoft documentation.
Below, we go through in detail how a Microsoft Fabric capacity can be provisioned from the Azure Portal.

Step 1: Log in to your organization's Azure Portal and search for Microsoft Fabric, as shown below in Fig 1.
Fig 1: Finding Microsoft Fabric in Azure Portal

Step 2: To create the Fabric capacity, the very first step is to create a resource group, as shown in Fig 2.


Fig 2: Creating resource group

Step 3: Choose the capacity size that you require, e.g. I have chosen F8, as shown below in Fig 3.

Fig 3: Choose the right resource size


You will find more capacity details in this Microsoft blog post.

Step 4: As shown below in Fig 4, before creating the capacity, please review all the information you have provided, including resource group, region, capacity size, etc., and then hit the Create button.

Fig 4: Review and create


When it's done, you will be able to see that the Microsoft Fabric capacity has been created (see Fig 5 below).

Fig 5: Microsoft Fabric capacity created

You can also go to the Fabric admin portal to validate the recently created capacity. Fig 6 shows the Fabric capacity under the admin portal.

Fig 6: Fabric capacity under admin portal


To explore Microsoft Fabric, browse to the Fabric site; you will find the capacity you just created, and the home page will look like Fig 7 below.

Fig 7: Fabric Home page


In summary, there are a few ways to use Microsoft Fabric, and you have now learned how to provision it by using Fabric capacity. However, it's important to remember that you must first enable Microsoft Fabric for your organization.


Wednesday, May 24, 2023

What is OneLake in Microsoft Fabric?

Get ready to be blown away! The highly anticipated Microsoft Build 2023 has finally unveiled its latest and greatest creation: the incredible Microsoft Fabric - an unparalleled Data Intelligence platform that is guaranteed to revolutionize the tech world!



fig 1: OneLake for all Data

One of the most exciting things I found in Fabric is OneLake. I was amazed to discover that OneLake is as simple to use as OneDrive! It's a single, unified, logical SaaS data lake for the whole organization (no data silos). Over the past couple of months, I've had the incredible opportunity to engage with the product team and dive into the private preview of Microsoft Fabric. I'm sharing what I learned during the private preview in this blog post, emphasizing that it is not an exhaustive list of what OneLake encompasses.

 

I have OneLake (the OneLake file explorer) installed on my PC and can easily access the data in OneLake just like OneDrive, as shown in Fig 2:


Fig 2: OneLake is like OneDrive on a PC


Single unified Managed and Governed SaaS Data Lake 

All Fabric items keep their data in OneLake, so there are no data silos. OneLake is compatible with Azure Data Lake Storage (ADLS) Gen2 at the API layer, which means it can be accessed as if it were an ADLS Gen2 account.
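To illustrate the API compatibility, here is a minimal sketch using the Azure Storage Python SDK against the OneLake endpoint. The workspace and lakehouse names are placeholders, and you need permissions on the workspace for the calls to succeed.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes the same DFS endpoint shape as ADLS Gen2.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# A Fabric workspace behaves like a container (file system); placeholder name.
file_system = service.get_file_system_client("MyWorkspace")

# List the files under the lakehouse's Files folder; placeholder item name.
for path in file_system.get_paths(path="MyLakehouse.Lakehouse/Files"):
    print(path.name)
```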

Let's investigate some of the benefits of OneLake:

Fig 3: Unified management and governance
  • OneLake comes automatically provisioned with every Microsoft Fabric tenant with no infrastructure to manage.

  • Any data in OneLake works with out-of-the-box governance such as data lineage, data protection, certification, catalog integration, etc. Please note that this feature is not part of the public preview.

  • OneLake enables distributed ownership. Different workspaces allow different parts of the organization to work independently while still contributing to the same data lake
  • Each workspace can have its own administrator, access control, region, and capacity for billing

Do you have requirements that data must reside in specific countries?

Fig 4: OneLake covers data residency

Yes, your requirement is covered by OneLake. If you're concerned about how to effectively manage data across multiple countries while meeting local data residency requirements, fear not: OneLake has you covered! With its global span, OneLake enables you to create different workspaces in different regions, ensuring that any data stored in those workspaces also resides in the respective countries. Built on top of Azure Data Lake Storage Gen2, OneLake can leverage multiple storage accounts across different regions while virtualizing them into one seamless, logical lake. So go ahead and take the plunge; OneLake is ready to help you navigate the global data landscape with ease and confidence!


Data Mesh as a Service:


OneLake gives a true data mesh as a service. Business groups can now operate autonomously within a shared data lake, eliminating the need to manage separate storage resources. The implementation of the data mesh pattern has become more streamlined. OneLake enhances this further by introducing domains as a core concept. A business domain can have multiple workspaces, which typically align with specific projects or teams.



Open Data Format

Simply put, no matter which item you start with, they all store their data in OneLake, similar to how Word, Excel, and PowerPoint save documents in OneDrive.

 

You will see files and folders just as you would in a data lake today: every workspace is a folder, and each data item is a folder. Any tabular data is stored in Delta Lake format; there are no new proprietary file formats in Microsoft Fabric, because proprietary formats create data silos. Even the data warehouse natively stores its data in Delta Lake (Parquet) format. While Microsoft Fabric data items standardize on Delta Parquet for tabular data, OneLake is still a data lake built on top of ADLS Gen2, and it supports any file type, structured or unstructured.
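As a small illustration (the table name and sample rows are just examples), writing tabular data from a Fabric or Synapse Spark notebook lands it as a Delta table, with no special format handling required:

```python
# 'spark' is the session pre-created by the notebook.
df = spark.createDataFrame(
    [(1, "Contoso"), (2, "Fabrikam")],
    ["CustomerId", "CustomerName"],
)

# Tabular items default to the Delta Lake format.
df.write.format("delta").mode("overwrite").saveAsTable("customers")

spark.read.table("customers").show()
```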


Shortcuts/Data Virtualization


Shortcuts virtualize data across domains and clouds. A shortcut is nothing more than a symbolic link that points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.


fig 5: Shortcuts

As shown in Fig 5 above, you may have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed externally while being surfaced through OneLake in Microsoft Fabric.

Shortcuts help avoid data movement and duplication. It's easy to create shortcuts from Microsoft Fabric, as shown in Fig 6:

 Fig 6: Shortcuts from Microsoft Fabric


OneLake Security


In the current preview, data in OneLake is secured at the item or workspace level: a user either has access or does not. Additional engine-specific security can be defined in the T-SQL engine, but those security definitions do not apply to other engines. Direct access to the item in the lake can therefore be restricted to users who are allowed to see all the data for that warehouse.

In addition, Power BI reports continue to work against data in OneLake: Analysis Services can still leverage the security defined in the T-SQL engine through DirectQuery mode, and it can sometimes still optimize to Direct Lake mode depending on the security defined.


In summary, OneLake is a revolutionary advancement in the data and analytics industry, surpassing my initial expectations. It turns data lakes into user-friendly, OneDrive-like folders, providing unprecedented convenience, and the Delta file format is an excellent choice for data engineering workloads. With OneLake, data engineers, data scientists, BI professionals, and business stakeholders can collaborate more effectively than ever before. To find out more about Microsoft Fabric and OneLake, please visit the Microsoft Fabric documentation.


Sunday, April 23, 2023

How to solve Azure hosted Integration Runtime (IR) validation error in Synapse Analytics?

This blog post shares recent learning while working with Data flows in Azure Synapse Analytics.

The ETL pipeline was developed using Azure Synapse Data Flows. However, when attempting to merge the code changes made in the feature branch into the main branch of the DevOps code repository, a validation error occurred, as shown below in Figure 1:

Fig 1: Validation error

It is worth noting that the same pipeline was executed in Debug mode, and it ran successfully without encountering any errors, as depicted in Figure 2:

Fig 2: Successfully run on debug mode


On the one hand, merging the code from the feature branch into the main branch throws a validation error; on the other hand, the pipeline executes successfully in Debug mode. It seems to be a bug, and I reported it to the Microsoft Synapse team.

The validation error needed to be fixed so that the code could be released to the Production environment. The Azure Integration Runtime (IR) used in the Data flows had been created with the Canada Central region; however, the IR must use 'Auto Resolve' as the region, as shown in Fig 3.

Fig 3: Region as 'Auto Resolve'

Using the newly created IR in the pipeline resolved the issue.

In summary, even though your Azure resources may be created in one particular region (e.g. all our resources are created in Canada Central), for the Synapse Data flow activity you need to create the Azure IR with 'Auto Resolve' as the region.



Sunday, February 5, 2023

How to implement Continuous integration and delivery (CI/CD) in Azure Data Factory

Continuous Integration and Continuous Deployment (CI/CD) are integral parts of the software development lifecycle. As data engineers/data professionals, when we build ETL pipelines using Azure Data Factory (ADF), we need to move our code from the Development environment to the Pre-Prod and Production environments. One way to do this is by using Azure DevOps.

Target Audience:

This article will be helpful for the two types of audiences mentioned below.

Audience Type 1: Using DevOps code repository in ADF but CI/CD is missing

Audience Type 2: Using ADF for ETL development but never used a DevOps repository or CI/CD

If you are Audience Type 1, you can follow the rest of this blog to implement CI/CD. If you are Audience Type 2, you first need to connect ADF with the DevOps repository; to do so, please follow this blog post, and then continue with the rest of this post for the CI/CD.

This blog post will describe how to set up CI/CD for ADF. There are two parts to this process: 1) Continuous Integration (CI) and 2) Continuous Deployment (CD).

1) Continuous Integration (CI)

First, you need to log in to the Azure DevOps site from your organization and click the pipelines as shown in fig 1.


Fig 1: DevOps pipelines

And then click “New pipeline” as shown in fig 2


Fig 2: New pipeline creation

And then follow the few steps to create the pipeline:

Step 1: Choose the option depending on where your code is located. In this blog post, we are using the classic editor, as shown below in Fig 3.

Fig 3: Choose the right source or use the classic editor

Step 2: In this step, choose the repository that you are using in ADF and make sure to select the default branch. We have chosen ADF_publish as the default branch for the builds.

                         

Fig 4: selecting the branch for the pipeline

Step 3: Create an empty job by clicking "Empty job", as shown below in Fig 5.

Fig 5: Selected Empty job

Step 4: In this step, you need to provide the agent pool and agent specification. Please choose "Azure Pipelines" for the agent pool and "windows-latest" for the agent specification, as shown in Fig 6.

Fig 6: Agent pool and Agent specification

 

Now click "Save and Queue" to save the build pipeline with the name “Test_Build Pipeline” as shown below figure:

Fig 7: Saved the build pipeline

The build pipeline is complete. The next part is creating the release pipeline, which performs the continuous deployment (CD), i.e. the movement of the code/artifacts from Development to the Pre-Prod and Production environments.

2) Continuous Deployment (CD)

The very first step of Continuous Deployment is to create a new release pipeline. This release pipeline will be connected to the previously created build pipeline named "Test_Build Pipeline".


 Fig 8: Release Pipeline

As soon as you create the new release pipeline, the following screen will appear. Please close the pop-up on the right and then click "+Add" on the artifact section, as shown in the figure below:

 


  Fig 9: Create the artifact

The next step is to connect the build pipeline and the release pipeline: find the build pipeline you created earlier in the CI process and choose the item "Test_Build Pipeline".

               

Fig 10: connecting the build pipeline from the release pipeline

Click on Stage 2 and select an empty job, as shown in Fig 11.

  Fig 11: Empty job under release pipeline

After the last steps, the pipeline will look like the figure below. Now please click on "1 job, 0 task"; we need to create an ARM template deployment task.


Fig 12: Release pipeline job and task

Search for "ARM template deployment", as shown in Fig 13. The ARM template deployment task will create or update the resources and artifacts, e.g. linked services, datasets, pipelines, and so on.

  Fig 13: search for the ARM template

After adding the ARM template deployment task, you need to fill in the information for items 1 to 12, as shown in the diagram below:


Fig 14: ARM Template information

  1. Display name: Simply put any name to display for the deployment task.
  2. Deployment scope: You can choose the deployment scope from Resource Group, Subscription, or Management Group. Please choose Resource Group for ADF CI/CD.
  3. Azure Resource Manager connection: This is a service connection; if you already have a service principal, you can set up a service connection, or create a completely new one. The cloud infrastructure team in your organization can set it up for you. You will also find details about service connections on Microsoft Learn.
  4. Subscription: Please choose the subscription where the resource group for the ADF instance of the Pre-Prod or Production environment resides.
  5. Action: Please choose "Create or update resource group" from the list.
  6. Resource group: The name of the resource group where the ADF instance lives.
  7. Location: The resource location, e.g. I have used Canada Central.
  8. Template location: Please choose either "Linked artifact" or "URL of the file". I have chosen "Linked artifact", which connects the ARM template already built by the Continuous Integration (CI) process.
  9. Template: Please choose the file "ARMTemplateForFactory.json".
  10. Template parameters: Please choose the template parameter file "ARMTemplateParametersForFactory.json".
  11. Override template parameters: Make sure to override all the parameters, including the Pre-Prod or Production environment's database server details and so on. Go through all the parameters that exist in the Development environment; you need to update those for the upper environments (see the sketch after this list for a quick way to enumerate them).
  12. Deployment mode: Please choose "Incremental" for the deployment mode.
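Item 11 is the one people most often get wrong, because a missed parameter silently carries a Development value into Pre-Prod or Production. A small helper like the sketch below (the file name is the one ADF exports; the output format is just my own convention) lists every parameter in the exported parameter file so you can tick them off while building the override string:

```python
import json

# Read the parameter file that the ADF publish step exports to the publish branch.
with open("ARMTemplateParametersForFactory.json") as f:
    parameters = json.load(f).get("parameters", {})

# Print each parameter with its current (Development) value so you can
# decide what to override for Pre-Prod/Production.
for name, details in parameters.items():
    print(f"-{name}  # dev value: {details.get('value')}")
```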

After entering all the above information, please save it. Your release pipeline is now complete.

However, to make this pipeline run automatically when you publish code from the master branch to the publish branch, you need to set up a continuous deployment trigger, as shown in Fig 15.


fig 15: Continuous deployment Trigger

 

The last step of the CD part is creating a Release from the Release pipeline you just created. To do so please select the Release pipeline and create a release as shown in fig 16. 


Fig 16: Create a Release from the release pipeline

You are done with CI/CD. To test it, go to your ADF instance, choose the master branch, and then click the 'Publish' button, as shown below in Fig 17. The CI/CD process starts, and the code for pipelines, linked services, and datasets moves from the Development environment to Pre-Prod and then to Production.


Fig 17: Publish from Azure Data Factory (ADF)

This blog post demonstrates the step-by-step process of implementing CI/CD for ADF by using ARM templates. In addition to the ARM template deployment task, there is another release task named "Azure Data Factory Deployment (ARM)" that can also be used to implement CI/CD for ADF.

 

Friday, December 2, 2022

Step-by-step guidelines to provision Azure Synapse Analytics for your organization

This blog post will cover end-to-end deployment guidelines including the right roles and permission required to provision Azure Synapse Analytics for your organization.

Prerequisites: You need to have an Azure Portal login.

After logging into the Azure Portal, search with the keyword "Synapse"; you will find Azure Synapse Analytics, as shown below in Fig 1:


Fig 1: Finding out Azure Synapse Analytics


After you find it, click the item and hit the "Create" button, which will take you to the next screen (Fig 2).

There are five different sections (tabs) you need to fill in to provision Azure Synapse Analytics. Each section is elaborated in detail below.

Basics: You need to fill in the basic information about the Synapse workspace.



Fig 2: Basic steps of Synapse configuration

1. Choose your subscription: Please choose the subscription where you want to provision the Azure Synapse Analytics resource

2. Choose or create a Resource Group: If you already have a resource group then you can choose the resource group or create a new one.

3. Managed resource group: A managed resource group will be created automatically; if you want to give it a name, fill in this field, otherwise a name will be chosen automatically.

4. Workspace name: You need to pick a globally unique name; the portal will indicate if the name is not unique.

5. Region: Please pick the region where you want to provision the Azure Synapse Analytics resource. Since I am in Canada, I chose "Canada Central".

6. Data Lake Storage: Please choose the Data Lake Storage from the subscription or enter it manually.

7. Data Lake Storage account: This is the Data Lake Storage account you have already created; if you need to create one, please do so by choosing "Create new".

8. Data Lake Storage file system: The file system name is a container name in the Data Lake Storage account.


This is how it looks after filling in the Basics information (shown in Fig 3).


Fig 3: After filling up the basic information


Security:

Let's move to the Security tab as shown below in figure 4:

Fig 4: Security 


The Security tab configures how you connect to both the serverless and dedicated SQL pools. You can choose either local (SQL) and AAD login, or AAD-only login. I have chosen both SQL and AAD login, like in the old days when you provisioned SQL database instances, so both options are available whenever required.

The check box allowing network access to the Data Lake Storage Gen2 account will be selected automatically if you provided the Data Lake Storage information under the "Basics" tab. The Synapse serverless SQL pool requires communication with Data Lake Storage, and this setting allows the Synapse workspace network to access the Data Lake Storage account.


Networking: You need to choose the right networking options for your organization. If you are provisioning for demo purposes, you can allow public network access and outbound traffic. However, if data security is top of mind, I would suggest the setup below (Fig 5) for networking.

Fig 5: Networking

1. Managed virtual network: Enable the Managed Virtual network so that communication between resources inside Synapse happens through the private network.

2. Managed private endpoint: Create a managed private endpoint for the primary storage account (the storage account we set up under the Basics tab, step #6).

3. Allow outbound traffic: I have set this to "No", i.e. outbound traffic is not limited to approved Azure AD tenants only. However, data security is tightened through the next point, #4.

4. Public Network Access: Public network access has been disabled, which means there is no risk of exposing the data to the public network, and communication between resources will take place via private endpoints.

Tags: It's best practice to use tags; tagging helps identify resources, deployment environments, and so on.

Fig 6: Tags

Review and create: This is the final step, which shows you a summary to review. Please verify your storage account, database details, security, and networking settings before you hit the Create button (shown below in Fig 7).


Fig 7: Review and Create



You are done with provisioning Azure Synapse Analytics, and as an admin you can now work with it.

However, if additional team members want to work with Azure Synapse Analytics, you need to complete a few more steps.

You need to add a user to the Synapse workspace, as shown in fig 8




Fig 8: Synapse workspace
After clicking "Access control"  you will find "+Add" button to add user with the right role. At first you need to choose role as shown in below figure 9.



                                                           Fig 9: Choosing the right role
If the users are data engineers or developers, you may want to choose the "Contributor" role, which I have chosen as shown in Fig 9. After choosing the role, you need to choose the members; they can be individual members or AD group members.

Fig 10: Choosing the right member




Fig 10 above shows that I have chosen a member; then click the "Next" button to review and assign the role to the members. You have now completed the steps for adding the right role and members to the Synapse workspace.

After adding the right role and member to the Synapse workspace, you also need to add the user in the Azure Synapse portal (Synapse Studio), as shown below in Fig 11. First click "Access Control", and then by clicking the "+Add" button you can assign members or an AD group to the right role. If you are giving access to data engineers or developers, they will require the Contributor role. In Fig 11 below, I have given the Contributor role to the member.



Fig 11: Synapse administrator from the Synapse Portal

However, to have access to the serverless SQL pool and to create linked services, the members will require additional permissions. To learn more about Synapse roles, please go through the Microsoft documentation.

In summary, by following the step-by-step guidelines above, you can provision Azure Synapse Analytics for your organization. Throughout this process, please make sure to work closely with your organization's cloud infrastructure team, who can guide you through any networking and security questions you may have.






Saturday, February 12, 2022

How do you secure sensitive data in a modern Data Warehouse?

In 2019, Canadian Broadcasting Corporation (CBC) news reported a massive data breach at the Desjardins Group, a Canadian financial services cooperative and the largest federation of credit unions in North America. The report indicated that a "malicious" employee copied sensitive personal information collected by Desjardins from their data warehouse. The breach compromised the data of nearly 9.7 million Canadians.

Now the question is, how do we secure data warehouses so that employees of the same organization can't breach the data? We need to make sure sensitive data is protected inside the organization.

When any IT solution is built in an organization, two types of environments exist: non-production and production. Production and non-production environments are physically separated. A non-production environment is used for development, testing, or quality assurance, and end users do not consume the solution from it on a daily basis. The production environment, on the other hand, is where the latest version of the software or IT solution is available and ready to be used by the intended users.

As stated at the beginning, a rogue employee was involved in the massive data breach at the Desjardins Group. Hence, an organization building a data-driven IT solution needs to set up both its production and non-production environments in a secure way. This article describes how sensitive data can be protected in both.

A. Protecting Sensitive Data in Non-Production Environment in a Data Warehouse:

In general, a non-production environment is not well guarded. Different personas can have access to a non-production data warehouse environment, e.g. developers, testers, and business stakeholders.

So it's important to protect sensitive data inside the organization. The very first thing we need to do is scramble sensitive data whenever we copy it from any application to the non-production data warehouse.

Fig 1: Moving data from IT application to non-prod data warehouse

A few steps can help us scramble the data in the non-production data warehouse:

Step 1: The business or a data steward identifies the list of sensitive or Personally Identifiable Information (PII) columns.

Step 2: A data engineer or ETL developer uses a standard tool such as Azure Data Factory (ADF) to mask (scramble) the data before storing it in the data warehouse (see the sketch after these steps).

Step 3: Either a test engineer or a business stakeholder verifies all the sensitive columns in the database before it is released to the rest of the team.
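As a minimal sketch of step 2 (the column value and the salt are placeholders, and in practice the salt would live in a key vault rather than in code), an irreversible, salted hash keeps scrambled values consistent across tables while making the original values unrecoverable:

```python
import hashlib

SALT = "replace-with-a-secret-salt"  # placeholder; keep the real salt in a key vault

def scramble(value: str) -> str:
    """Irreversibly scramble a sensitive value with a salted SHA-256 hash."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

# The same input always yields the same scrambled output,
# so joins between tables on the scrambled column still work.
print(scramble("jane.doe@example.com"))
```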


B. Protecting Sensitive Data in Production Environment in a Data Warehouse:

In a production environment, we can't scramble the data irreversibly. We need to keep the original data intact while making sure only the intended users have access to it. The goal is to mask the columns that hold sensitive or PII data so that only privileged users see the real values. The figure below shows what is expected from the solution:




Fig 2: Protect sensitive data from Data Warehouse (PROD)


As shown in Fig 2, when users try to access the data via an application such as Power BI, only the intended users will be able to see the real data; everyone else will see obfuscated values. This can be achieved by using the dynamic data masking feature of Microsoft databases (e.g. Azure SQL Database and Synapse dedicated SQL pools), which masks the data on the fly at query time. If you would like to learn more about dynamic data masking, please follow the Microsoft documentation.
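To make this concrete, here is a minimal sketch of applying dynamic data masking to one column. The server, database, table, and column names are placeholders, and the masking itself is plain T-SQL submitted here through pyodbc:

```python
import pyodbc

# Placeholder connection details; use your production server and database.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-data-warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

cursor = conn.cursor()

# Mask the Email column so that non-privileged users see obfuscated values;
# users with the UNMASK permission continue to see the real data.
cursor.execute(
    "ALTER TABLE dbo.Customer "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');"
)
conn.commit()
```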

In summary, whenever PII data is taken from an operational system to the non-production environment to build an analytics solution, the data needs to be scrambled. In the production environment, dynamic data masking can prevent unintended users from viewing the data; however, it's also important to properly manage database permissions on the data warehouse and to enable auditing to track all activities taking place in the production data warehouse.