Saturday, 9 July 2011

Balanced Data Distributor

A few weeks ago Microsoft quietly released a new Data Flow transformation for SSIS 2008 and 2008 R2: Balanced Data Distributor. This transform takes a single input and distributes the incoming rows to one or more outputs uniformly via multi-threading.

UPDATE: 2012 version has been released

How does it work? If you load a large file in to a staging table it looks something like the flow below. Out of the box SSIS doesn't process this via multi-threading; one flow, one thread.
Single flow













The Balanced Data Distributor introduces parallelism in the data flow and multi-processor and multi-core servers will profit from that new feature. SSIS can now distribute the work over multiple threads which makes your data flow a lot faster.

Balanced Data Distributor


















When do you use this new Balanced Data Distributor?
- You have a large amount of data coming in
- You can read faster than you can process it in your data flow
- The destination supports parallelism (a flat file target wont work because it gets locked)
- Your package runs on a multi-processor and multi-core server
More info about that and the bottlenecks in this MSDN Blog.

If your destination doesn't support parallelism or you want for example to aggregate all data then you can use a Union All. Now only part of the flow is multi-threading.
Part of the data flow is multi-threading.























Alternatives for Balanced Data Distributor.
There are a few constructions already available in SSIS to get parallelism.
1) A Conditional Split with a modulo expression to split the rows into multiple streams
2) A Script Component with multiple outputs and a modulo construction in .Net
Conditional Split with a Modulo expression












Example of a modulo expression
























For more details about parallelism with a Condition Split can be found in this great blog article from fellow Dutchman Henk van der Valk.

Conclusion
A nice start to get parallelism into SSIS, but there already where some possibilities. Hopefully this component wont be necessary in the future versions of SSIS. An automatic form of multi-threading would be nice!

No comments:

Post a Comment

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.