When you work with ETL and the source file is JSON, many documents may get nested attributes in the JSON file. Your requirements will often dictate that you flatten those nested attributes. There are many ways you can flatten the JSON hierarchy, however; I am going to share my experiences with Azure Data Factory (ADF) to flatten JSON.
The ETL process involved taking a JSON source file, flattening it, and storing in an Azure SQL database. The attributes in the JSON files were nested, which required flattening them. The source JSON look like this:
"name": "Tom Hanks",
The above JSON document has a nested attribute, Cars. We would like to flatten these values that produce a final outcome look like below:
"name": "Tom Hanks",
How do we do it by using ADF?
Let's create a pipeline that includes the Copy activity, which has the capabilities to flatten the JSON attributes. Let's do that step by step.
First, create a new ADF Pipeline and add a copy activity.
Next, we need datasets. You need to have both source and target datasets to move data from one place to another. In this case source is Azure Data Lake Storage (Gen 2). The target is Azure SQL database. The below figure shows the source dataset. We are using a JSON file in Azure Data Lake.
We will insert data into the target after flattening the JSON. the below figure shows the sink dataset, which is an Azure SQL Database.
Please note that, you will need Linked services to create both the datasets, this article will not go into details about Linked Services, to know details you can look into the Microsoft document.
3. Flattening JSON
After you create source and target dataset, you need to click on mapping as shown below figure 4 and follow the steps:
After you have done above, then save it and execute the pipeline. You will find flatten records are inserted to the database as shown in fig 5.
Make sure to choose "Collection Reference" as mentioned 3.b, if you forget to choose that then the mapping will look like below Fig 6:
In summary, I found Copy Activity in Azure Data Factory make easier to flatten the JSON, you don't need to write any custom code which is super cool.