Wednesday, 12 January 2011

Create a Row Id

Case
If you add records to a database table with an ID column, you preferably would use an identity column (Identity Specification, Is Identity = Yes. But what if your destination does not support an auto-identity or you're not allowed to modify it? You could do this with our Rownumber Component or a Third Party component, or..

Solution
You can use a Script component to accomplish an auto-identity column. In this example I will get the highest ID from a table and use that number as a starting number for new records.

1) Add variable
Add an integer variable named Counter to store an ID.
Right click in Control or Data Flow to show variables














2) Add Execute SQL Task and Data Flow Task
Add an Execute SQL Task to your Control Flow and add a Data Flow Task right behind it.
Execute SQL Task















3) Get max ID
Edit the SQL task and change the ResultSet from None to Single Row. Select the right Connection and enter the query to get the highest ID.
General Tab of  the Execute SQL Task



















-- Get highest ID
SELECT  MAX([Id]) as MaxId
FROM    [YourTable]

4) Result Set
Continue editing the SQL task and go to the Result Set tab. Connect the field MaxId from the query to your variable Counter. After this the Execute SQL Task is ready.
Result Set



















5) DataFlow
Now go to your Data Flow. Add a random source, a Script Component (transformation) and a destination (the same table as in your Execute SQL Task. Give them suitable names. The result should look something like this.
Data Flow






















6) The script Component
We need a new column to store the RowId in. Add a new column on the tab Inputs and Outputs. The type should be an integer, size depends on the column size in your database table.
New column RowId



















7) The script itself
SSIS create 3 methods for you: PreExecute to get the MaxId from the variable, Input0_ProcessInputRow to fill the new column RowId and optional PostExecute to fill the variable with the new MaxId after all the records have passed. This third method is only required if your need that number somewhere else.
// C# code: surrogate key script
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // New internal variable to store the rownumber
    private int rowCounter = 0;

    // Method that will be started before the rows start to pass
    public override void PreExecute()
    {
        base.PreExecute();

        // Lock variable for read 
        VariableDispenser variableDispenser = (VariableDispenser)this.VariableDispenser;
        variableDispenser.LockForRead("User::Counter");
    
        IDTSVariables100 vars;
        variableDispenser.GetVariables(out vars);

        // Fill the internal variable with the value of the SSIS variable
        rowCounter = (int)vars["User::Counter"].Value;

        // Unlock variable
        vars.Unlock();
    }

    // Method that will be started after all rows have passed
    // This method is optional. Only add it if you are gonna
    // use the SSIS variable after the dataflow is finished.
    public override void PostExecute()
    {
        base.PostExecute();

        // Lock variable for write
        VariableDispenser variableDispenser = (VariableDispenser)this.VariableDispenser;
        variableDispenser.LockForWrite("User::Counter");

        IDTSVariables100 vars;
        variableDispenser.GetVariables(out vars);

        // Fill the SSIS variable with the value of the internal variable
        vars["User::Counter"].Value = rowCounter;

        // Unlock variable
        vars.Unlock();
    }

    // Method that will be started for each record in you dataflow  
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Seed counter
        rowCounter++;
        // Fill the new column
        Row.RowId = rowCounter;
    }
}

Or VB.net

' VB.Net code: surrogate key script 
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' New internal variable to store the rownumber
    Private rowCounter As Integer = 0

    ' Method that will be started before the rows start to pass 
    Public Overrides Sub PreExecute()
        MyBase.PreExecute()

        ' Lock variable for read  
        Dim variableDispenser As VariableDispenser = CType(Me.VariableDispenser, VariableDispenser)
        variableDispenser.LockForRead("User::Counter")

        'Use IDTSVariables90 if you're using SSIS 2005
        Dim vars As IDTSVariables100
        variableDispenser.GetVariables(vars)

        ' Fill the internal variable with the value of the SSIS variable
        rowCounter = CInt(vars("User::Counter").Value)

        ' Unlock(Variable)
        vars.Unlock()
    End Sub

    ' Method that will be started after all rows have passed
    ' This method is optional. Only add it if you are gonna
    ' use the SSIS variable after the dataflow is finished.
    Public Overrides Sub PostExecute()
        MyBase.PostExecute()

        ' Lock variable for write 
        Dim variableDispenser As VariableDispenser = CType(Me.VariableDispenser, VariableDispenser)
        VariableDispenser.LockForWrite("User::Counter")

        'Use IDTSVariables90 if you're using SSIS 2005
        Dim vars As IDTSVariables100
        variableDispenser.GetVariables(vars)

        ' Fill the SSIS variable with the value of the internal variable 
        vars("User::Counter").Value = rowCounter
        ' Unlock variable 
        vars.Unlock()
    End Sub

    ' Method that will be started for each record in you dataflow   
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        'Seed counter
        rowCounter = rowCounter + 1
        ' Fill the new column
        Row.RowId = rowCounter
    End Sub
End Class

8) Map in Destination
Make sure you don't forget to map the new column RowId in your destination. Now run your package to see the result.

An other option to create an unique RowId is to use a GUID instead of an integer. The Script component needs only one method for this solution.
// C# code
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Create a Globally Unique Identifier with SSIS
    Row.Guid = System.Guid.NewGuid();
}

' VB.Net code
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Create a Globally Unique Identifier with SSIS
    Row.Guid = System.Guid.NewGuid()
End Sub

Detailed information about that can be found here.

8 comments:

  1. i have no idea, why i run your scrip in my ssis, there is exception Arithmetic operation resulted in an overflow, i have no idea why, could you help me to fix up

    ReplyDelete
  2. @Anonymous: make sure that the SSIS variable and the Script Component variable have the right data types. They should be integers. If int32 doesn't fit the number of records then you could use int64.
    If this isn't the solution, then please contact me and give me the exact error message.

    ReplyDelete
  3. Seems like a lot of work for a surrogate key. I hope they address this in SQL server 2012 with a sequence generator transformation. You shouldn't have to be a c# or vb coder to do a simple task like this.

    ReplyDelete
  4. @Anonymous: there are already third party components for those who don't want to use C# or VB.net

    ReplyDelete
  5. Did you mean autoident instead of autoindent?

    ReplyDelete
  6. Thanks a lot....

    ReplyDelete
  7. Thanks a lot. It works.... I was looking for this such a long time.... Thanks you so much.... :)

    ReplyDelete
  8. Thank you very much. It really helped!!

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.