I was working on a pipeline that moves data from a source and loads it into the RAW zone in ADLS (Azure Data Lake Storage). As soon as a file is loaded, a date-time stamp is appended to the end of its name. For example, if the source file is customer.csv, the file landed in the RAW zone is named customer_2020-09-28T12:15:32.csv
How do you add a date-time stamp to the end of the file name?
Adding dynamic content to the file name in the pipeline's sink, as shown below (Fig 1), does the job.
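The exact expression from Fig 1 isn't reproduced here. As a rough illustration of the naming rule only, here is a minimal Python sketch. `add_timestamp` is a hypothetical helper, and Python's strftime codes (`%Y-%m-%dT%H:%M:%S`) stand in for the `yyyy-MM-ddTHH:mm:ss` pattern used in the pipeline. In ADF itself the timestamp would typically come from an expression built on `formatDateTime(utcnow(), ...)`, but treat that as an assumption rather than the contents of Fig 1.

```python
from datetime import datetime
from pathlib import PurePosixPath


def add_timestamp(file_name: str, ts: datetime, fmt: str = "%Y-%m-%dT%H:%M:%S") -> str:
    """Return file_name with '_<timestamp>' inserted before the extension.

    The default fmt is the strftime equivalent of yyyy-MM-ddTHH:mm:ss.
    """
    p = PurePosixPath(file_name)
    # stem = name without extension, suffix = extension including the dot
    return f"{p.stem}_{ts.strftime(fmt)}{p.suffix}"


landed = add_timestamp("customer.csv", datetime(2020, 9, 28, 12, 15, 32))
print(landed)  # customer_2020-09-28T12:15:32.csv
```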
Azure Data Lake Storage had no problem with the ':' in the file name, either when creating the file or when viewing it.
However, as soon as I used that file in a Data Flow activity and ran it in debug mode (which fires up the Apache Spark engine), it failed with the following error: java.lang.IllegalArgumentException: java.net.URISyntaxException: ....
How do you resolve the error?
Spark can't work with a file whose name contains ':'. The same is true on Windows, which throws an error if you try to create a file with ':' in its name. Interestingly, that is not the case when you create a file with the same name in Azure Data Lake Storage (which is HDFS-compatible): the name is accepted, and the problem only surfaces once Spark tries to read the file.
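Why would Spark choke on a name that ADLS accepts? A likely cause is that Spark goes through Hadoop's file system layer, and Hadoop's `Path` class treats a ':' that appears before any '/' as the end of a URI scheme. The sketch below is a simplified Python model of that rule, not Hadoop's actual code, and the function names are hypothetical. It shows that customer_2020-09-28T12:15:32.csv is split into a "scheme" `customer_2020-09-28T12` and a relative path `15:32.csv`, a combination that `java.net.URI` rejects with "Relative path in absolute URI".

```python
def split_like_hadoop_path(path_string: str):
    """Simplified model of scheme detection in Hadoop's Path class:
    a ':' appearing before any '/' is read as the end of a URI scheme.
    Windows drive letters and other special cases are ignored here."""
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon], path_string[colon + 1:]
    return None, path_string


def check_like_java_uri(scheme, path):
    """Mimic java.net.URI's rule that a URI with a scheme
    cannot have a non-empty path that doesn't start with '/'."""
    if scheme is not None and path and not path.startswith("/"):
        raise ValueError(f"Relative path in absolute URI: {scheme}:{path}")


for name in ["customer_2020-09-28T12:15:32.csv", "customer_2020-09-28T121532.csv"]:
    scheme, path = split_like_hadoop_path(name)
    try:
        check_like_java_uri(scheme, path)
        print(f"{name}: OK")
    except ValueError as err:
        print(f"{name}: {err}")
```

The colon-free name never triggers the scheme branch, so it passes through as a plain relative file name.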
To resolve this, I changed the format of the timestamp appended to the end of each file name: instead of yyyy-MM-ddTHH:mm:ss, I used yyyy-MM-ddTHHmmss. The file name now looks like: customer_2020-09-28T121532.csv
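Continuing the Python illustration, here is a minimal sketch of the fix, assuming the same hypothetical naming helper as before. `%Y-%m-%dT%H%M%S` is the strftime equivalent of the colon-free `yyyy-MM-ddTHHmmss` pattern.

```python
from datetime import datetime
from pathlib import PurePosixPath

# strftime equivalent of the colon-free pattern yyyy-MM-ddTHHmmss
SAFE_FORMAT = "%Y-%m-%dT%H%M%S"


def add_safe_timestamp(file_name: str, ts: datetime) -> str:
    """Append a colon-free timestamp before the extension."""
    p = PurePosixPath(file_name)
    stamped = f"{p.stem}_{ts.strftime(SAFE_FORMAT)}{p.suffix}"
    # Guard against a ':' arriving via the source file name itself
    if ":" in stamped:
        raise ValueError(f"file name still contains ':': {stamped}")
    return stamped


print(add_safe_timestamp("customer.csv", datetime(2020, 9, 28, 12, 15, 32)))
# customer_2020-09-28T121532.csv
```

The ':' check is an optional safeguard, not part of the original pipeline: changing the timestamp format only removes the colons the timestamp adds, not any that come from the source file name.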
 