Friday, 30 March 2012

Custom SSIS Component: Rownumber Transformation

Adding a rownumber in a data flow is an often seen activity. This isn't a standard feature of SSIS so you will have to use a Script Component or a Third Party Component for that. My Microsoft.Net Colleague Marc Potters and I created a custom component for that which should make thinks easier for those who have less programming experience.
Rownumber
























You can either create a new rownumber column and specify the datatype or use one of the existing columns. In the second part you can specify the start number and in the increment. The startnumber can also be specified by a variable which is for example populated by a query in an Execute SQL Task. The last block if for storing the final rownumber in a variable. However this number will only be available when the data flow task has been finished.

Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

Version 1.1: Added the option to add one increment at the end. This makes it easier to use if you want to continue with that number in the next data flow task.

Installation
The installer registers the DLL in the GAC and copies it to the component folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup.

How add the task the the toolbox
When you open a SSIS project in Visual Studio/Bids, right click the toolbox and choose "Choose Items...". After this Visual Studio will be busy for a couple of seconds!
Right click toolbox






















When the Choose Toolbox Items window appears, go to the SSIS Data Flow Items and search for the newly installed  Rownumber component and select it. Click ok to finish.
Choose Toolbox Items




















Now the new component will appear in the toolbox. Ready to use! Have fun.
New component added























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom component.

Thursday, 1 March 2012

Adding a header row to a Flat File

Case
I want to add a header row with totals (total number of records, total count) on the first row:
ROWCOUNT: 5  TOTAL: 57
MOUNTAINBIKE BLUE   10
MOUNTAINBIKE RED     8
MOUNTAINBIKE BLACK  21
MOUNTAINBIKE SILVER 13
MOUNTAINBIKE WHITE   5

This information is only available after the Data Flow Task has finished, so you can't dynamicly alter the header property of a Flat File Destination.

Solution
A) You can add a Multicast after your source component and use one flow for the regular data and one flow for calculating the totals and add some header row. Read this post from Keith Hyer for more details for that solution. The flow should look something like this:
Example for adding header.



























B) If your header doesn't contain data that's calculated in the data flow it self, you could try something like this with an expression on the header:
Header with no calculated data

















C) In all other cases you could use a Script Task to add a header to a Flat File. Let's elaborate that solution.

1) Variables
I created two integer variables that will be filled by the Data Flow Task in step 2.
The variables











2) The Data Flow
The Data Flow I will be using for this example looks something like the flow below. You can adjust it for your own needs. There is a Row Count Transformation that counts the number of records and there is a Script Component (Transformation) that calculates the total of my sales column.
My data flow



























And the corresponding Script for calculating the sales total:
// C# Code
// Basic script for calculating
// row totals without aggregate.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Declare variable for the total sales
    int ColumnTotal = 0;

    public override void PostExecute()
    {
        base.PostExecute();
        // Copy value of script variable to SSIS variable
        this.Variables.TotalSales = ColumnTotal;
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Add value of sales column to script variable
        ColumnTotal += Row.Sales;
    }
}
or VB.Net

' VB.Net Code
' Basic script for calculating
' row totals without aggregate.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Dim ColumnTotal As Integer = 0

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        ' Copy value of script variable to SSIS variable
        Me.Variables.TotalSales2 = ColumnTotal
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ColumnTotal = ColumnTotal + Row.Sales
    End Sub
End Class

3) The Script Task
Add a Script Task after your Data Flow Task and give it a suitable name. Note the name of the Flat File Connection Manager. It will be used in step 5.
Script Task after Data Flow Task





















4) ReadOnly variables
Add the two variables from step 1 as ReadOnly variables.
ReadOnly variables
























5) The Script
Add the following code to the main method in your Script Task.
// C# Code
// Script Task for appending header row on top of textfile
using System;
using System.Data;
using System.IO;      // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_cc10f7cb927344e7a8239df8ebf9bca3.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            try
            {
                // Determine Filenames using the Flat File Connection manager
                string FileName = Dts.Connections["MyFlatFile"].ConnectionString;
                string tempFileName = FileName + ".temp";

                // Create temporary copy of source file
                File.Move(FileName, tempFileName);

                using (StreamReader input = new StreamReader(tempFileName))
                {
                    using (StreamWriter output = new StreamWriter(FileName, false))
                    {
                        // Create header in empty file
                        output.WriteLine("ROWCOUNT: " + Dts.Variables["RowCount"].Value.ToString() +
                            " TOTAL: " + Dts.Variables["TotalSales"].Value.ToString());

                        // Create a buffer. This is needed for large files that won't fit in the servers RAM
                        var buf = new char[4096];

                        // Read temporary copy of source file in blocks
                        // and write in blocks to empty file with header
                        int read = 0;
                        do
                        {
                            read = input.ReadBlock(buf, 0, buf.Length);
                            output.Write(buf, 0, read);
                        } while (read > 0);

                        // Clear and close
                        output.Flush();
                        output.Close();
                        input.Close();
                    }
                }

                // temporary copy of source file
                File.Delete(tempFileName);

                // Close Script Task with Succes
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Log error
                Dts.Events.FireError(0, "Write header", ex.Message, string.Empty, 0);

                // Close Script Task with Failure
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}
or VB.Net

' VB.Net Code
' Script Task for appending header row on top of textfile
Imports System
Imports System.Data
Imports System.IO       ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        Try
            ' Determine Filenames using the Flat File Connection manager
            Dim FileName As String = Dts.Connections("MyFlatFile").ConnectionString
            Dim tempFileName As String = FileName + ".temp"

            ' Create temporary copy of source file
            File.Move(FileName, tempFileName)

            Using input As StreamReader = New StreamReader(tempFileName)
                Using output As StreamWriter = New StreamWriter(FileName, False)
                    ' Create header in empty file
                    output.WriteLine("ROWCOUNT: " + Dts.Variables("RowCount").Value.ToString() + _
                                     " TOTAL: " + Dts.Variables("TotalSales").Value.ToString())

                    ' Create a buffer. This is needed for large files that won't fit in the servers RAM
                    Dim buf = New Char(4095) {}

                    ' Read temporary copy of source file in blocks
                    ' and write in blocks to empty file with header
                    Dim read As Integer = 0
                    Do
                        read = input.ReadBlock(buf, 0, buf.Length)
                        output.Write(buf, 0, read)
                    Loop While read > 0

                    ' Clear and close
                    output.Flush()
                    output.Close()
                    input.Close()

                End Using
            End Using

            ' temporary copy of source file
            File.Delete(tempFileName)

            ' Close Script Task with Succes
            Dts.TaskResult = ScriptResults.Success

        Catch ex As Exception
            ' Log error
            Dts.Events.FireError(0, "Write header", ex.Message, String.Empty, 0)

            ' Close Script Task with Failure
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub
End Class


6) The Result
The result is an extra header line on top of the existing file.
The Result