A client of mine had a fact table with a lot of unknown dimension values because the source had a lot of garbage. He built a construction to log all facts with unknown dimensions (id = -1) to a log table with the original lookup value so he could make reports of all garbage values to correct the source. This is a simplified version of the construction:
Conditional Split, Multicast and Union All |
This works fine, but there is a more elegant solution with a script component. Let's call it the Conditional Multicast script. I stole this idea from SSIS Junkie.
Solution
1) Remove the Conditional Split, Multicast and Union All. Add a Script Component instead.
Script component (transformation) |
2) Select all dimensionkeys as ReadOnly input columns.
ReadOnly Input Columns |
3) Rename the default Output port to "All" and change the ExclusionGroup to a non-zero value (let's say: 1).
Change default output port |
4) Now add a second Output port with the name "Unknown". Change the ExclusionGroup to the same value as in the previous step ("1") and synchronise it with the Input port.
Second output port |
5) Now the script (I used C# in this example). SSIS will generate some default methods but we only need Input0_ProcessInputRow for this solution. Create an If-statement to check if there are any unknown dimensions and redirect them to the second port (Unknown). After the If-statement redirect all facts to de default output port (All).
// This script redirects all fact with unknown dimensions to an other target using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { // Method that will be started for each record in you dataflow public override void Input0_ProcessInputRow(Input0Buffer Row) { // Check if there are unknown dimensions if (Row.TimeKey == -1 || Row.LocationKey == -1 || Row.ColorKey == -1) { // Redirect all facts with unknown dimensions Row.DirectRowToUnknown(); } // Direct all facts (including the unknown) to the fact table Row.DirectRowToAll(); } }
6) Now reconnect the targets and run the package to see the result:
More elegant solution |
Note: you don't have to do this for the performance improvement (because there hardly is any), but do it for the simplicity.
No comments:
Post a Comment
Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.
All comments are moderated manually to prevent spam.