Sunday, 23 January 2011

Cast and Convert to avoid the Data Conversion Transformation

Case
The Data Conversion transformation works fine for converting for example a non-unicode string to an unicode string, but it has one annoying feature. It creates an extra column (with the prefix "Copy of ") which can be confusing. Is there an alternative?

Solution
There is, but only for database sources (not for flatfiles and suchlike). You can use the Cast or Convert statement in your source query for data conversions. It moves some of the 'workload' from SSIS to your database engine.
Data Conversion in source



















And of cource it also works in a Lookup


















--CAST (ANSI standard)
SELECT  CAST(someColumn AS nvarchar(50)) AS someColumn
FROM    someTable

--CONVERT (Microsoft specific)
SELECT  CONVERT(nvarchar(50), someColumn) AS someColumn
FROM    someTable

The difference between the Cast and Convert is that the CAST is an ANSI standard and the CONVERT is Microsoft specific and the CONVERT has an extra optional Style option. More information about that is available at MSDN.

Performance
My simple test case was processing a million records and convert a column via a CAST and via the standard SSIS component. And run those two different packages a thousand times.
Testing with a million records














After a thousand times the CAST had an average of 4 seconds and the Data Conversion transformation had an average of 5,3 seconds. A negligible result on simple test like this, that was not that thoroughly. But no annoying, unnecessary and confusing columns! :-)

What to do for the non-database sources?
Well, Todd McDermid developed a pattern to bypass the confusing extra columns. His workaround consists of using a Data Conversion transformation immediately followed by an Union All filtering the unnecessary columns and renaming the converted columns. And he even developed an open source replacement for the Data Conversion transformation if the CAST or CONVERT option is not possible for your source. But be warned. These solution will decrease the performance because they are asynchronous.

You could also support this feedback at Microsoft.com to just hide unnecessary columns.

2 comments:

  1. How to handle when error fails? How do you capture those data?

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.