I have a large XML file with Orders and Orderlines which I want to (merge) join into a new destination. To join the two outputs I need to sort them, but the SORT transformation takes too much time. Is there a faster alternative?
|XML Source with two joined outputs|
The solution is surprisingly simple: the outputs are already sorted, and you only have to tell SSIS so (similar to a source with an ORDER BY in the query).
For XML files with multiple levels (the first for orders and the second for orderlines), SSIS will create two outputs.
The outputs will have an extra bigint column which allows you to connect the orderlines to the correct order.
|Two outputs with additional ID column|
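To make the idea concrete, here is a minimal Python sketch of what such a split looks like. It is not how SSIS works internally. The sample XML, its element and attribute names, and the `flatten` helper are all made up for illustration, and the ID values SSIS generates may differ. The point is only that a key handed out in document order comes out ascending in both row sets.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
SAMPLE = """
<Orders>
  <Order number="A100">
    <OrderLine product="Widget" quantity="2"/>
    <OrderLine product="Gadget" quantity="1"/>
  </Order>
  <Order number="A101">
    <OrderLine product="Sprocket" quantity="5"/>
  </Order>
</Orders>
"""


def flatten(xml_text):
    """Split nested XML into two flat row lists linked by a generated Order_Id."""
    orders, order_lines = [], []
    root = ET.fromstring(xml_text)
    # Assumption: the key is simply a counter assigned in document order.
    for order_id, order in enumerate(root.findall("Order"), start=1):
        orders.append({"Order_Id": order_id, "number": order.get("number")})
        for line in order.findall("OrderLine"):
            order_lines.append({
                "Order_Id": order_id,
                "product": line.get("product"),
                "quantity": int(line.get("quantity")),
            })
    return orders, order_lines


orders, order_lines = flatten(SAMPLE)
order_keys = [row["Order_Id"] for row in orders]
line_keys = [row["Order_Id"] for row in order_lines]
print(order_keys)
print(line_keys)
```

Because the rows are emitted in the order they appear in the file, both key lists are already ascending and no extra sort is needed before joining on `Order_Id`.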
Instead of sorting on these ID columns with SORT transformations, you can use the advanced editor of the XML source to tell SSIS that these columns are already sorted. Right-click the XML source and choose 'Show Advanced Editor...'.
|Show Advanced Editor...|
Then go to the last tab, 'Input and Output Properties', and select the OrderLine output. In the properties of this output, set IsSorted to True to tell SSIS that the output is sorted.
|Set IsSorted to true|
Next expand OrderLine and then Output Columns and click on the additional ID column 'Order_Id'. In its properties locate the SortKeyPosition and change it from 0 to 1.
|Set SortKeyPosition to 1|
Repeat this for the second output, called 'Order', and then close the advanced editor. If you still have the SORT transformations, you will notice a yellow warning triangle with an exclamation mark on them. It tells you that the data is already sorted and that you can remove the SORT transformations.
If you edit the Data Flow Path and view the metadata, you will see that the column is now marked as sorted.
|Sorted! Remove the SORT transformations!|
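Why does a merge join care about sorting at all? The sketch below is a simplified Python illustration, not the SSIS implementation. The `merge_join` function and the sample rows are hypothetical. It walks both inputs once, in step, which only works when both are ordered on the join key.

```python
def merge_join(left, right, key):
    """Inner-join two row lists that are both sorted ascending on `key`.

    Assumes `left` has unique keys (one row per order); `right` may repeat them.
    """
    joined = []
    i = 0
    for row in right:
        # Advance the left cursor until it catches up with the right key.
        while i < len(left) and left[i][key] < row[key]:
            i += 1
        if i < len(left) and left[i][key] == row[key]:
            joined.append({**left[i], **row})
    return joined


# Hypothetical rows, already in key order as they would come from the file.
orders = [
    {"Order_Id": 1, "number": "A100"},
    {"Order_Id": 2, "number": "A101"},
]
order_lines = [
    {"Order_Id": 1, "product": "Widget"},
    {"Order_Id": 1, "product": "Gadget"},
    {"Order_Id": 2, "product": "Sprocket"},
]

result = merge_join(orders, order_lines, "Order_Id")
for row in result:
    print(row)
```

Each input is read only once, so no sort step is needed when the rows already arrive in key order. The flip side is that IsSorted is only a promise: it does not sort anything. If the data is not really in that order, a join like this one silently skips rows, so only set the property when you are sure of the order.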
The solution is very simple and perhaps this should have been the default sort key position anyway? It smells like a bug to me...