Sunday, 19 December 2010

Conditional Multicast Script

Case
A client of mine had a fact table with a lot of unknown dimension values because the source had a lot of garbage. He built a construction to log all facts with unknown dimensions (id = -1) to a log table with the original lookup value so he could make reports of all garbage values to correct the source. This is a simplified version of the construction:
Conditional Split, Multicast and Union All









































This works fine, but there is a more elegant solution with a script component. Let's call it the Conditional Multicast script. I stole this idea from SSIS Junkie.

Solution
1) Remove the Conditional Split, Multicast and Union All. Add a Script Component instead.
Script component (transformation)





















2) Select all dimensionkeys as ReadOnly input columns.
ReadOnly Input Columns



















3) Rename the default Output port to "All" and change the ExclusionGroup to a non-zero value (let's say: 1).
Change default output port



















4) Now add a second Output port with the name "Unknown". Change the ExclusionGroup to the same value as in the previous step ("1") and synchronise it with the Input port.

Second output port


















5) Now the script (I used C# in this example). SSIS will generate some default methods but we only need Input0_ProcessInputRow for this solution. Create an If-statement to check if there are any unknown dimensions and redirect them to the second port (Unknown). After the If-statement redirect all facts to de default output port (All).
// This script redirects all fact with unknown dimensions to an other target
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Method that will be started for each record in you dataflow 
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Check if there are unknown dimensions
        if (Row.TimeKey == -1 || Row.LocationKey == -1 || Row.ColorKey == -1) 
        {
            // Redirect all facts with unknown dimensions
            Row.DirectRowToUnknown();
        }
        // Direct all facts (including the unknown) to the fact table
        Row.DirectRowToAll();
    }

}

6) Now reconnect the targets and run the package to see the result:
More elegant solution





















Note: you don't have to do this for the performance improvement (because there hardly is any), but do it for the simplicity.

No comments:

Post a Comment

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.