Sunday, 14 April 2013

Performance Best Practice: Flat File (Fast) Parse

Here are some tips to speed up the reading of flat files within the SSIS data flow. They are especially handy for importing large flat files (or when you merge join your small flat file to a large dataset). Start, where possible, by reading the files from a fast drive (a Solid State Disk / preferably not used by Windows or SQL Server) instead of some share.

Minimum datatype
By default all columns are string 50 in a Flat File Connection Manager. To get as many rows in a data flow buffer it's important to use a minimum data type for each column. And if it's for example an integer column then parse it to an integer in the Flat File Connection Manager instead of parsing it in the data flow with a Data Conversion or Derived Column Transformation. Otherwise you end up with two columns in your buffer instead of one.

A good start is to use the Suggest Types button in the Flat File Connection Manager editor. It will scan a couple of rows from your flat file and come up with a minimum data type.

Suggest Types



















Unused columns
In the Flat File Connection Manager it's impossible to skip columns that you don't need, but to minimize pressure on performance you should not parse unneeded columns. Parsing/converting is expensive. Just leave it string. In the Flat File Source editor you can uncheck the unneeded columns.
Leave it string, don't parse.




















Uncheck unneeded flat file columns



















Fast Parse
If you have a so called 'trusted' source (for example a database export) then you don't have to worry for mistakes in the data types. To speed up the performance you can enable fast parse for all non-string columns. But for date columns you have to be sure that the format is correct. Try using the ISO format YYYY-MM-DD.
You can enable fast parse in the Advanced Editor of the Flat File Source. Right click it and choose "Show Advanced Editor...".
Show Advanced Editor...






















Then go to the Input and Output Properties tab and then to the Output Columns. Select a non-string column and set the FastParse property to true. Repeat this for all non-string columns.
Enable FastParse





















Bulk Insert Task
If your destination is a SQL Server table and you don't need data transformations then you might want to consider/test the Bulk Insert Task as a alternative for the Data Flow Task.

Summary
Minimum data types
Parse in Connection Manager
No unnecessary parsing
Fast Parse

More info: Blog of Jamie Thomson and Henk van der Valk or MSDN

No comments:

Post a Comment

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.