It's all about Data: July 2020

Sunday, July 26, 2020

How to deploy a SQL Server Big Data Cluster by using Azure Data Studio?

There are several ways to deploy a SQL Server 2019 Big Data Cluster. This blog post uses Azure Data Studio for the deployment.

The very first thing you need is Azure Data Studio. If you don't have it installed, you can download it here: https://docs.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio?view=sql-server-ver15

Step 1: After you open Azure Data Studio, you will see the UI below:

Fig 1: Azure Data Studio

Step 2: Choose SQL Server Big Data Cluster

Fig 2: Choose SQL Server Big Data Cluster

Step 3: Missing prerequisites

The deployment needs the following tools:

kubectl
Azure CLI
azdata

Please install any of them that are missing.
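If you want to check from a command shell which of these tools are already installed, a small sketch like the one below can help. It only looks for the executables on the PATH (it assumes the Azure CLI executable is called `az`), and `missing_tools` is a helper name made up for this example, not something Azure Data Studio provides.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# The three prerequisites from the list above; "az" is the Azure CLI executable.
missing=$(missing_tools kubectl az azdata)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

This does not check versions, so the wizard may still ask you to update a tool that is found here.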

Fig 3: Missing prerequisite

After the prerequisites have been installed, you will see the UI below:

Fig 4: After the prerequisites are installed

Step 4: Deployment Configuration Profile

Fig 5: Configuration profile

Step 5.A: Azure Settings:

In the Azure settings you specify where the resource group will be created. Make sure you have sufficient permissions to create a resource group; if you don't, the deployment will fail.

Fig 5: Azure settings

If it fails, the error message will look like the one below:

Fig 5.1: Error while creating resource group
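One way to find out about a permission problem before running the wizard is to try creating a resource group yourself with the Azure CLI. The sketch below wraps `az group create` in a hypothetical helper called `create_resource_group`; the group name and region in the example are placeholders. Keep in mind that it really creates the resource group, so use the same name and region you plan to enter in the wizard, or delete the group afterwards.

```shell
#!/bin/sh
# Try to create the resource group up front so that a permission problem
# shows up before the deployment starts. Both arguments are placeholders.
create_resource_group() {
    rg_name=$1
    rg_location=$2
    if az group create --name "$rg_name" --location "$rg_location" >/dev/null; then
        echo "Resource group '$rg_name' is ready."
    else
        echo "Could not create '$rg_name'; check your role on the subscription." >&2
        return 1
    fi
}

# Example (run 'az login' first; the names are hypothetical):
# create_resource_group my-bdc-rg westeurope
```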

Step 5.B: Cluster Settings: Set the cluster name and the credentials. You will need them later to connect to the cluster, so please keep them safe.

Fig 7: SQL Server cluster settings

Step 5.C: Service and storage settings: These are filled in automatically, but please adjust them as per your needs.

Fig 8: Service settings.

Step 6: Script to Notebook

This is the last step of the wizard. As soon as you click 'Script to Notebook', the deployment notebook opens in Azure Data Studio.

Fig 9: Script to Notebook

However, if this is the first time you are deploying a big data cluster, Python needs to be installed, so you will see the UI below:

Fig 10: Install Python

When you are done, click 'Run All'.

Fig 11: Execute the script

You may get an error saying that pandas is still missing:

Fig 12: Pandas is missing

So you need to install the pandas package from Azure Data Studio. Go to the package manager:

Fig 13: Finding package manager in Azure data studio

Then search for pandas in the package manager and, as soon as you find it, click the Install button.

Fig 14: Install pandas
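If you prefer a command shell over the package manager UI, pandas can also be installed with pip. The sketch below is only an illustration: `ensure_pandas` is a made-up helper, and it assumes that the interpreter you pass in is the same Python that Azure Data Studio uses for notebooks, which you need to verify on your own machine.

```shell
#!/bin/sh
# Install pandas for one specific Python interpreter unless it can already be imported.
# $1 is the interpreter to use; whether it is the one Azure Data Studio uses
# is an assumption you have to check yourself.
ensure_pandas() {
    py=$1
    if "$py" -c "import pandas" >/dev/null 2>&1; then
        echo "pandas is already installed."
    else
        "$py" -m pip install pandas
    fi
}

# Example (interpreter name is a placeholder):
# ensure_pandas python3
```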

When the installation is done, click 'Run All' in the notebook again. This time it will start running successfully. One of the steps logs in to Azure: you will be redirected to the portal automatically and will see a UI like the one below, where you don't need to do anything.

Fig 15: Azure login

When the deployment is over, your big data cluster should be ready to use, so the next thing to do is connect to it. All the endpoints are listed once the deployment has completed; please take the endpoint for 'SQL Server Master Instance Front-End'.
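The endpoints can also be listed later from a command shell with azdata. The sketch below assumes the cluster name `mssql-cluster` in the example and uses a hypothetical helper called `list_bdc_endpoints`; `azdata login` will ask for the user name and password you set in the cluster settings step.

```shell
#!/bin/sh
# Log in to the cluster and list its endpoints as a table.
# $1 is the namespace, which is the cluster name chosen in the wizard.
list_bdc_endpoints() {
    namespace=$1
    azdata login -n "$namespace" && azdata bdc endpoint list -o table
}

# Example, assuming the cluster is named mssql-cluster:
# list_bdc_endpoints mssql-cluster
```

The row for the SQL Server master instance in that table is the one to use for the connection.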

Fig 16: Click to connect the Cluster

After connecting to the big data cluster, it will look like below:

Fig 17: Connect with big data cluster

If you would like to remove the cluster, you can delete it by using the azdata command in a command shell.

Delete cluster:

azdata bdc delete -n mssql-cluster

Fig 18: delete cluster

Or you can completely remove the resource group from the Azure portal.
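Removing the resource group does not have to be done in the portal; the Azure CLI can do it as well. The sketch below is a minimal example with a made-up helper name, `delete_resource_group`, and a placeholder resource group name. It deletes everything inside the resource group, not just the cluster, so double-check the name before running it.

```shell
#!/bin/sh
# Delete a whole resource group, including the cluster inside it.
# --yes skips the confirmation prompt; --no-wait returns without waiting
# for the deletion to finish.
delete_resource_group() {
    rg_name=$1
    if [ -z "$rg_name" ]; then
        echo "usage: delete_resource_group <resource-group-name>" >&2
        return 2
    fi
    az group delete --name "$rg_name" --yes --no-wait
}

# Example (the name is a placeholder):
# delete_resource_group my-bdc-rg
```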
