Sunday, April 25, 2021

Handling SQL DB row-level errors in ADF (Azure data factory) Data Flows

If you are working with ADF (Azure data factory) data flows then you may have noticed there is a new feature released in Nov 2020 which is useful to capture any error while inserting/updating the records to the SQL database.

Fig 1: Error row handling at sink database

For error handling there are two options to choose from:

1) Stop on first error (default)

2) Continue on error

                                                    Fig 2: Error row handling options

By default, ADF pipeline will stop at the error. However, the main purpose of this feature to use option "Continue on error" to catch and log the error so that we can look at later and take action accordingly. 

Let's fill up the settings to catch errors rows, below figures show the settings and will also describe each setup (Please follow the numbering in the figure 3).

1) Error row handling: Since we wanted to catch the error so we have chosen "Continue of error" at Error row handling.

Fig 3: Settings Continue on error

2) Transaction Commit: Choose whether the data flow will be written in a single transaction or in batches, I have chosen single, it means whenever there is failure it will store the record on the other hand batch will store error records when full batch is completed.

3)Output rejected data: You need to make this check mark TRUE to store the error rows. The whole point of error row handling is you want to know the error records; if so, please tick check mark. Though you can avoid this, in that case pipeline will run but if there is any error you will not know which records causes the error.

4) Linked Service: Put the linked service and test the connection

5)Storage folder path: Storage path need to mention here, it's the path where you would like to store the error records in a file.

6)Report success on error: I don't put report on success checkbox to TRUE since I wanted to know if there is a failure.

After the settings, when you run the pipeline and if there is any error in the dataset, it will be stored in your storage folder as you have provided at point no 5 in the settings.

In general, when there is a failure at the time of inserting records to the database it takes sometime to find out the reason of failure. You may have to go through large chunk of dataset and look for miss match of data types or NULL value etc. to find out the root cause. Through this feature the error records will be captured and stored in the storage so you will be able to identify the reason for any error very quickly. And if you would like to ingest those error rows then you can fix those records and re-run the pipeline.