Wednesday, February 19, 2020

Azure Data Factory (ADF): Important components you need to know

If you are new to Azure Data Factory and are about to build your very first pipeline, I assume you have already identified the source and the target. The source data may come from an on-premises or cloud database, from Blob storage, or from a Web API. Whether you connect to a storage account or to an on-premises or cloud database, you need Linked Services. A Linked Service is like a connection string: it holds the information Data Factory needs to connect to an external resource.

For example, if you want to get data from Blob storage and insert it into a cloud database, you need two linked services. One is the source connection from Blob storage to the data factory, and the other is the target connection from the data factory to the database.
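To make the "two linked services" idea concrete, here is a minimal sketch of what the two definitions could look like, written as Python dictionaries that mirror the JSON Data Factory stores for a linked service. The names BlobSourceLinkedService and SqlTargetLinkedService are made up for illustration, the target is assumed to be an Azure SQL Database, and the connection strings are placeholders you would replace with your own values.

```python
import json

# Source: Blob storage -> data factory.
# The name is hypothetical; the connection string is a placeholder.
blob_source_linked_service = {
    "name": "BlobSourceLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<accountName>;AccountKey=<accountKey>"
            )
        },
    },
}

# Target: data factory -> cloud database (assumed here to be Azure SQL Database).
sql_target_linked_service = {
    "name": "SqlTargetLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Server=tcp:<server>.database.windows.net,1433;"
                "Database=<database>;User ID=<user>;Password=<password>"
            )
        },
    },
}


def describe(linked_services):
    """Return (name, type) pairs so the source/target split is easy to see."""
    return [(ls["name"], ls["properties"]["type"]) for ls in linked_services]


# Print the JSON form of each definition.
for linked_service in (blob_source_linked_service, sql_target_linked_service):
    print(json.dumps(linked_service, indent=2))
```

In a real project you would not keep the account key or password in plain text like this; the sketch only shows the shape of the two connections.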


Creating a Linked Service is easy and takes only a few steps in the wizard. However, one of those steps is to create or select an Integration Runtime (IR). Since there are three different types of IR, you may be confused about which one to choose.

Let's go through the three types of Integration Runtime:

1. Azure
2. Self-Hosted
3. Linked Self-Hosted

Please note that there is one more Integration Runtime, called the Azure-SSIS Integration Runtime. This IR is required when you move your SSIS packages to Azure. We will not cover the SSIS IR in this post.

1. Azure
It's the default IR: if you don't create one, it is created automatically with the name 'AutoResolveIntegrationRuntime', as shown in Fig 1 below.

Fig 1: AutoResolveIntegrationRuntime

If the data integration capabilities such as Data Flow, data movement, and activity dispatch run over a public network, then the Azure IR is used. If the Azure IR is not created automatically, you can create one manually by clicking '+New', as shown in Fig 2 below.


Fig 2: Choose to create IR

You will then see the window below, where you can create the Azure IR (Fig 3).

Fig 3: Choose from different IR

Once an IR is created, you can find it running under Data Factory -> Connections.

Fig 4: IR running under Data Factory



2. Self-Hosted:

If the source or target is in a private network, e.g. the database resides on premises, then you must create a Self-Hosted IR. There are a few steps to create one, and you can do it through the Data Factory UI.
Please follow this link to create a Self-Hosted IR: https://docs.microsoft.com/en-us/azure/data-factory/create-self-hosted-integration-runtime#create-a-self-hosted-ir-via-azure-data-factory-ui
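Once a Self-Hosted IR exists, a linked service points at it through a connectVia reference in its definition. The sketch below shows roughly how that looks for an on-premises SQL Server, again as a Python dictionary mirroring the JSON. The names OnPremSqlLinkedService and MySelfHostedIR and the helper with_integration_runtime are made up for illustration, and the connection string is a placeholder.

```python
import copy
import json

# An on-premises SQL Server linked service without an explicit IR.
# The name is hypothetical; the connection string is a placeholder.
on_prem_sql_linked_service = {
    "name": "OnPremSqlLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": (
                "Data Source=<serverName>;Initial Catalog=<database>;"
                "Integrated Security=False;User ID=<user>;Password=<password>"
            )
        },
    },
}


def with_integration_runtime(linked_service, ir_name):
    """Return a copy of the definition that connects via the named IR."""
    updated = copy.deepcopy(linked_service)
    updated["properties"]["connectVia"] = {
        "referenceName": ir_name,
        "type": "IntegrationRuntimeReference",
    }
    return updated


# "MySelfHostedIR" is an assumed name for the Self-Hosted IR you created.
private_network_linked_service = with_integration_runtime(
    on_prem_sql_linked_service, "MySelfHostedIR"
)
print(json.dumps(private_network_linked_service, indent=2))
```

When connectVia is left out, the linked service falls back to the default Azure IR, which cannot reach a database inside a private network.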



3. Linked Self-Hosted: 

If you already have a Self-Hosted IR and need to use it from another Data Factory, you don't need to create a new Self-Hosted IR. Instead, you can link the existing Self-Hosted IR to the other data factory.
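The choice between the three types can be summarized as a small decision function. This is only a sketch of the rule of thumb described in this post, not an official API: the function name and its two parameters are made up for illustration.

```python
def choose_integration_runtime(
    data_in_private_network: bool,
    self_hosted_ir_in_another_factory: bool = False,
) -> str:
    """Pick an IR type following the rule of thumb from this post."""
    if not data_in_private_network:
        # Source and target are reachable over a public network.
        return "Azure"
    if self_hosted_ir_in_another_factory:
        # Reuse the existing Self-Hosted IR instead of creating a new one.
        return "Linked Self-Hosted"
    # Private network and nothing to reuse: create a new Self-Hosted IR.
    return "Self-Hosted"


# Blob storage to a cloud database over a public network.
print(choose_integration_runtime(data_in_private_network=False))

# On-premises database, no existing Self-Hosted IR.
print(choose_integration_runtime(data_in_private_network=True))

# On-premises database, another data factory already has a Self-Hosted IR.
print(choose_integration_runtime(True, self_hosted_ir_in_another_factory=True))
```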