Thursday, 24 March 2011

Sorting in SQL vs sorting in SSIS

Case
When I join two flows in SSIS, the flows need to be sorted. According to the Performance Best Practices I try to sort them in the source components because that's better for performance. Ofcource that's not always possible, for instance when your source is a flat file, and then you need the SSIS sort component.

But watch out when combining these two sort methods. Because SSIS and SQL don't sort the same way! The sort of SSIS is case sensitive and the sort in SQL Server is case insensitive. See the result when you combine these two. This will result in unwanted situations and missing joins. Is there a solution?
Click to enlarge





















Solution
This has to do with the different sorting collations. SSIS uses the Windows collation (Case-Sensitive) and SQL Server uses SQL collation (Case-Insensitive, by default). You can either adjust the SQL sort to SSIS or the SSIS sort to SQL.

Adjust SQL Sort to SSIS: Case-Sensitive ORDER BY
You can solve this by changing the source query of the sorted source:
-- Notice the extra COLLATE
SELECT     myStringColumn
FROM       myTable
ORDER BY   myStringColumn
COLLATE    Latin1_General_CS_AS_WS


The suffix CS_AS_WS stands for Case-sensitive, accent-sensitive, kana-insensitive, width-sensitive. See the compleet suffix list at msdn. This new query will result in a correct join in SSIS.
Both the same sort






















Adjust SSIS Sort to SQL: Case-Insensitive Sort Transformation
Important: This solution has one side affect. The merge join will also be Case Insensitive!

1) Source
In the sorted OLE DB source you already used the advanced editor to tell SSIS this source is sorted with an ORDER BY in the query, but you now also need to tell SSIS that it is sorted Case-Insensitive by setting the ComparisonFlags property to Ignore case. Note: this doesn't change the actual sorting. You are just telling SSIS how it is sorted.
Indicate that source is sorted Case-Insensitive
























2) Sort
In the SORT Transformation you also need to set the Comparison Flags property to Ignore case. Note: This will change the actual sorting.

Setting the Comparison Flags property to Ignore case

























3) Result
Now both flows are sorted the same and the Merge Join works as a Case-Insensitive Inner join (similar to a T-SQL INNER JOIN).
Like a T-SQL Inner Join


















3 comments:

  1. Actually - the sort order in SQL Server will be dependent on your collation. If you're using a case sensitive collation, the SQL Server sort will also be case sensitive.

    ReplyDelete
    Replies
    1. 100% correct, but this is just an example to make you aware of the differences... and most servers I see are (luckily) case-insensitive.

      Delete
  2. I didn't find that either of these methods worked. The only way I couldn't the merge join to work is to use the SSIS sort. I'm using a multi text column join criteria

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.