Wednesday, December 30, 2020

ADF data flow: can particular date format generate NULL value?

I am going to share recent finding in ADF data flow where my source data in .csv got correct date. However, as long as when I did some transformation and saved in .parquet found those date all got empty values. This blog post will describe the issue and the resolution.

Source data in .csv have startDate and CloseDate like below figure 1.0 where format of the date is MM/dd/yyyy

Fig 1.0: Date in the source .csv

I have used ADF data flow to cleanup/transform the files and saved into .parquet format. However, in the .parquet file these two date columns 'StartDate' and 'CloseDate' become empty.


Fig 1.1: Dates become empty in .parquet

After looking into the dataflow and specific looking at the projection of the source found auto detect date format ‘MM/dd/YYYY’ which is original source date format.

Fig 1.3: Date format auto detected in the data flow

 And when previewed the data those date shown as NULL which was kind of weird.


Fig 1.4: Date shown as NULL in the data preview

How to solve it? 

To fix this issue, what you need to do is, go under projection and change the date format to ‘yyyy-MM-dd’ as like below figure 2.0

Fig 2.0: Change date format


 And you can go and see the preview, it looks good now.


Fig 2.1: After changing the date format preview looks perfect


 Note that, I have tried with other format from projection such as yyyy/MM/dd and so on but those did not resolve the issue.

Fig 2.2: 

Other Solution?

 You can also take other approach,  change the format from 'date' to 'string'  under the projection

Fig 3.0: change data type from date to string

And then use derive column activity 

Fig 3.1: get 'derived column' in the data flow

Now, use expression to convert into the correct date format : toDate(OpenDate,'yyyy-MM-dd') 

Fig 3.2: expression to convert from string to date

In summary, if you find any discrepancy at output file's date columns then look closely the date format, preview the data in the ADF data flow then either change the format from source projections or use derived column activities to fix it.