UPDATE: 2012 version has been released
How does it work? If you load a large file in to a staging table it looks something like the flow below. Out of the box SSIS doesn't process this via multi-threading; one flow, one thread.
The Balanced Data Distributor introduces parallelism in the data flow and multi-processor and multi-core servers will profit from that new feature. SSIS can now distribute the work over multiple threads which makes your data flow a lot faster.
|Balanced Data Distributor|
When do you use this new Balanced Data Distributor?
- You have a large amount of data coming in
- You can read faster than you can process it in your data flow
- The destination supports parallelism (a flat file target wont work because it gets locked)
- Your package runs on a multi-processor and multi-core server
More info about that and the bottlenecks in this MSDN Blog.
If your destination doesn't support parallelism or you want for example to aggregate all data then you can use a Union All. Now only part of the flow is multi-threading.
|Part of the data flow is multi-threading.|
Alternatives for Balanced Data Distributor.
There are a few constructions already available in SSIS to get parallelism.
1) A Conditional Split with a modulo expression to split the rows into multiple streams
2) A Script Component with multiple outputs and a modulo construction in .Net
|Conditional Split with a Modulo expression|
|Example of a modulo expression|
For more details about parallelism with a Condition Split can be found in this great blog article from fellow Dutchman Henk van der Valk.
A nice start to get parallelism into SSIS, but there already where some possibilities. Hopefully this component wont be necessary in the future versions of SSIS. An automatic form of multi-threading would be nice!