When you work with ETL and the
source file is JSON, many documents may get nested attributes in the JSON file.
Your requirements will often dictate that you flatten those nested attributes.
There are many ways you can flatten the JSON hierarchy, however; I am going to
share my experiences with Azure Data Factory (ADF) to flatten JSON.
The ETL process involved taking a
JSON source file, flattening it, and storing in an Azure SQL database. The
attributes in the JSON files were nested, which required flattening them. The
source JSON look like this:
{
"id":
"01",
"name":
"Tom Hanks",
"age":
20.0,
"email":
"th@hollywood.com",
"Cars":
{
"make": "Bentley",
"year": 1973.0,
"color": "White"
}
}
The above JSON document has a nested
attribute, Cars. We would like to flatten these values that produce a final
outcome look like below:
{
"id":
"01",
"name":
"Tom Hanks",
"age":
20.0,
"email":
"th@hollywood.com",
"Cars_make":
"Bentley",
"Cars_year":
"1973.0",
"Cars_color":
"White"
}
How do we do it by using ADF?
Let's create a pipeline that
includes the Copy activity, which has the capabilities to flatten the JSON
attributes. Let's do that step by step.
First, create a new ADF Pipeline and
add a copy activity.
Next, we need datasets. You need to
have both source and target datasets to move data from one place to another. In
this case source is Azure Data Lake Storage (Gen 2). The target is Azure SQL
database. The below figure shows the source dataset. We are using a JSON file
in Azure Data Lake.
We will insert data into the target
after flattening the JSON. the below figure shows the sink dataset, which is an
Azure SQL Database.
Please note that, you will need Linked services to create both the datasets, this article will not go into details about Linked Services, to know details you can look into the Microsoft document.
3. Flattening JSON
After you create source and target dataset, you need to click on mapping as shown below figure 4 and follow the steps:
After you have done above, then save it and execute the pipeline. You will find flatten records are inserted to the database as shown in fig 5.
Be
cautious
Make sure to choose "Collection Reference" as mentioned 3.b, if you forget to choose that then the mapping will look like below Fig 6:
In summary, I found Copy Activity in Azure Data Factory make easier to flatten the JSON, you don't need to write any custom code which is super cool.