Friday, 22 April 2011

How to configure a Foreach Loop Container: Sorted File Enumerator

Case
I want my files to be process in filedate order, but the foreach loop doesn't provide a sort mechanism. Is there a way to order by filedate or filename?

Solution
The standard foreach loop container can't be sorted. See/vote this Feedback suggestion at Microsoft. It looks like it's standard ordered by filename. You could use this custom Sorted File Enumerator or you can accomplish it with a Script Task. This Script Task solution let's you order (asc or desc) by filename, creation date, last modified date or any other file property.

1) Control Flow
Drag a Script Task and a Foreach Loop Container to the Control Flow like the image below.
Sorted foreach loop container




















2) Variables
Create three variables:
  1. startlocation (string) that indicates the folder wherein we gonna search for files. Fill it with a path like "d:\mySourceFiles\".
  2. dataset  (object) which we gonna use for communicating between the Script Task and the Foreach Loop Container.
  3. filepath (string) which is used for the variable mapping in the Foreach Loop Container.
The variables for this solution












3) Script Task
Select the startlocation as a readonly variable (we only read the path inside it) and select dataset as a readwritevariable (an ADO object will be stored in it).
Variables in Script Task



















4) The Script
Copy the following script to your Script Task. Line 52 does the filtering and sorting. It is almost the same as a SQL query. See more examples on msdn.
// C# code 
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_2e776e26793b45939128add3d850f70d.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Create a dataset. I named it unsorted, because it's not yet sorted
            DataSet dsUnsorted = new DataSet();

            // Create a new table in the dataset
            DataTable filelistTable = dsUnsorted.Tables.Add();
            filelistTable.Columns.Add("FilePath", typeof(string));  // Filepath needed for connectionstring.
            filelistTable.Columns.Add("FileName", typeof(string));  // Filename used for sorting [optional]. 
            filelistTable.Columns.Add("FileDate", typeof(DateTime));// Filedate used for sorting [optional].

            // Get all files within the folder       
            string[] allFiles = Directory.GetFiles(Dts.Variables["User::startlocation"].Value.ToString());

            // Variable for storing file properties
            FileInfo fileInfo;

            // Loop through the files in the folder       
            foreach (string currentFile in allFiles)
            {
                // Fill fileInfo variable with file information         
                fileInfo = new FileInfo(currentFile);

                // Choose which the file properties you will use
                // Columns:            FilePath           FileName       FileDate
                filelistTable.Rows.Add(fileInfo.FullName, fileInfo.Name, fileInfo.CreationTime);
            }

            // Filtering on *.txt extension. Note: like uses * instead of %
            // Sorting the files on filename (or filedate: FileName DESC)
            DataRow[] rows = dsUnsorted.Tables[0].Select("FileName like '*.txt'", "FileDate ASC");

            // Create a new sorted dataset that the SSIS foreach loop uses.
            DataSet dsSorted = new DataSet();
            DataTable filelistTableSorted = dsSorted.Tables.Add();

            // Only interested in the filepath which is needed for the connectionstring 
            filelistTableSorted.Columns.Add("FilePath", typeof(string));

            // Fill the new dataset with the sorted rows.
            foreach (DataRow row in rows)
            {
                filelistTableSorted.Rows.Add(row["FilePath"].ToString());
            }

            // Store the dataset in the SSIS variable       
            Dts.Variables["dataset"].Value = dsSorted;

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

or with VB.Net
' VB.Net Code
Imports System
Imports System.Data
Imports System.IO       ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        ' Create a dataset. I named it unsorted, because it's not yet sorted
        Dim dsUnsorted As New DataSet()

        ' Create a new table in the dataset
        Dim filelistTable As DataTable = dsUnsorted.Tables.Add()
        filelistTable.Columns.Add("FilePath", GetType(String))   ' Filepath needed for connectionstring.
        filelistTable.Columns.Add("FileName", GetType(String))   ' Filename used for sorting [optional]. 
        filelistTable.Columns.Add("FileDate", GetType(DateTime)) ' Filedate used for sorting [optional].

        ' Get all files within the folder       
        Dim allFiles As String() = Directory.GetFiles(Dts.Variables("User::startlocation").Value.ToString())

        ' Variable for storing file properties
        Dim fileInfo As FileInfo

        ' Loop through the files in the folder       
        For Each currentFile As String In allFiles
            ' Fill fileInfo variable with file information         
            fileInfo = New FileInfo(currentFile)

            ' Choose which the file properties you will use
            ' Columns:            FilePath           FileName       FileDate
            filelistTable.Rows.Add(fileInfo.FullName, fileInfo.Name, fileInfo.CreationTime)
        Next

        ' Filtering on *.txt extension. Note: like uses * instead of %
        ' Sorting the files on filename (or filedate: FileName DESC)
        Dim rows As DataRow() = dsUnsorted.Tables(0).[Select]("FileName like '*.txt'", "FileDate ASC")

        ' Create a new sorted dataset that the SSIS foreach loop uses.
        Dim dsSorted As New DataSet()
        Dim filelistTableSorted As DataTable = dsSorted.Tables.Add()

        ' Only interested in the filepath which is needed for the connectionstring 
        filelistTableSorted.Columns.Add("FilePath", GetType(String))

        ' Fill the new dataset with the sorted rows.
        For Each row As DataRow In rows
            filelistTableSorted.Rows.Add(row("FilePath").ToString())
        Next

        ' Store the dataset in the SSIS variable       
        Dts.Variables("dataset").Value = dsSorted
        Dts.TaskResult = ScriptResults.Success
    End Sub

End Class


5) Foreach Loop Container
Edit the Foreach Loop Container and change the enumerator on the Collection tab to Foreach ADO Enumerator. This means that it will loop through an ADO object. Select the dataset variable as the ADO object source variable. This is the variable that contains the sorted dataset from the Script Task.
Foreach ADO Enumerator



















6) Variable Mappings
Map the variable filepath to Index 0. This will store the path in this variable.
Variable mapping


















7) Test the result
Now you can add your own tasks to the Foreach Loop container and use the variable filepath in the expression for Connection String. I added a simple Script Task for testing that shows the filepath in a MessageBox.
The Result: a sorted Foreach Loop Container





















Note: there are other ways to accomplish this, like a first Foreach Loop Container that adds all filenames to a database table and then a second Foreach Loop Container to loop through that database table. Let me know what your solution is.

Example package 2008

Update: I have created a Custom Foreach Sorted File Enumerator if you don't like scripting.

Saturday, 16 April 2011

Compare values of two rows

Case
I get the cumulative sales each week, but I want to know the sales per week.

This is what I have:
WeekNrBikeSales
1Red bike4
1Green bike2
1Black bike5
1Blue bike1
2Red bike6
2Green bike7
2Black bike7
2Blue bike4
3Red bike7
3Green bike8
3Black bike9
3Blue bike6

And this is what I want:

WeekNrBikeCumuSalesWeekSales
1Black bike55
2Black bike72
3Black bike92
1Blue bike11
2Blue bike43
3Blue bike62
1Green bike22
2Green bike75
3Green bike81
1Red bike44
2Red bike62
3Red bike71

Solution
One of the solutions is to sort the rows on the key column (Bike in this case) and then on the week number. Then you can compare each row to the previous row to calculate the week sales.

1) Source and Sorting
This example uses a CSV source, so I have to add a Sort Transformation. If your source is a database you can add an ORDER BY clause in the source query. Sort the rows on Bike (1) and WeekNr (2).
The WeekNr and Sales are integers and the Bike is a varchar/string column.
Sorting is important for this solution.


















2) Script Component
Add a Script Component (type transformation) and add all rows as readonly input columns.
ReadOnly Input Columns



















3) Add new output column
On the Inputs and Outputs tab, add a new integer (DT_I4) column named WeekSales to store the sales per week.
Add new output column



















4) The Script
Go to the Script tab and add the following C# code:
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Variables to store the previous row
    string Bike = "";
    int WeekNr = 0;
    int Sales = 0;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Compare current key with previous key
        if (Row.Bike == Bike)
        {
            // Keys match, so you can compare current sales with previous sales
            Row.WeekSales = Row.Sales - Sales;
        }
        else
        {
            // Keys don't match, so this is the first week
            Row.WeekSales = Row.Sales;
        }

        // Store current row values in the variables for the next row
        Bike = Row.Bike;
        WeekNr = Row.WeekNr;
        Sales = Row.Sales;
    }
}

or VB.net code:

' VB.net Code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<microsoft .sqlserver.dts.pipeline.ssisscriptcomponententrypointattribute=".sqlserver.dts.pipeline.ssisscriptcomponententrypointattribute"> _
<clscompliant false="false"> _
Public Class ScriptMain
    Inherits UserComponent

    ' Variables to store the previous row
    Dim Bike As String = ""
    Dim WeekNr As Integer = 0
    Dim Sales As Integer = 0

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Compare current key with previous key
        If (Row.Bike = Bike) Then
            ' Keys match, so you can compare current sales with previous sales
            Row.WeekSales = Row.Sales - Sales
        Else

            ' Keys don't match, so this is the first week
            Row.WeekSales = Row.Sales
        End If

        ' Store current row values in the variables for the next row
        Bike = Row.Bike
        WeekNr = Row.WeekNr
        Sales = Row.Sales
    End Sub
End Class

5) The result
I added an empty Derived Column and a Data Viewer for testing purposes.
The result

Thursday, 14 April 2011

Pause in SSIS

Case
How can I pause or delay the Control Flow of my package?

Solution
There is an open source pause task at codeplex, but there are three other way's to create a delay in your package.

1) For Loop Container
Add an empty For Loop Container to your control flow and one of the following expressions.
Use this expression to wait 5 seconds:
DATEADD("ss", 5, @[System::ContainerStartTime]) > GETDATE()
or 2 minutes:
DATEADD("mi", 2, @[System::ContainerStartTime]) > GETDATE()
Add the next task behind the container and it get's a 5 second delay.

2) Script Task
Add a Script Task to your control flow and add the following C# code:
// C# Code
        public void Main()
        {
            // Sleep for 5 seconds
            System.Threading.Thread.Sleep(5000);
            Dts.TaskResult = (int)ScriptResults.Success;
        }
or in VB.net
' VB.Net code
    Public Sub Main()
        'Sleep for 5 seconds
        System.Threading.Thread.Sleep(5000)
        Dts.TaskResult = ScriptResults.Success
    End Sub

3) Execute SQL Task
Add an Execute SQL Task to your Control Flow. Add a connection to a random SQL Server and add the following Transact-SQL statement:
-- T-SQL
WAITFOR DELAY '00:00:05'
Waiting in SSIS
Let me know if you thought of a different way to wait for a few seconds.

Note: the For Loop pause is the most processor intensive method. OK for a few seconds, but not for long periods

Saturday, 9 April 2011

IsNumeric or IsNumber expression in SSIS

Case
How do I check if a column value contains a number? I can't find the SSIS equivalent of the .net methods IsNumeric or IsNumber.

Solution
That expression isn't available in SSIS. See/vote for this Feedback request on Microsoft.com.
There are basicly two different workarounds available: (A) Script Component or (B) try casting the value to an int. A third option is the (C) FINDSTRING expression, but that works only single positions.

Solution A
Script Component with a .Net method to check whether a value is a number.

1) Script Component
Add a Script Component (type: transformation) where you need to do the check.
Script Component Transformation


















2) Select Input Columns
Add the column that needs to be checked as a ReadOnly input column.
Input Columns Tab



















3) Add Output Column
Add a new column to the Output columns on the tab Inputs and Outputs. The type shoot be Boolean and give it a suitable name.
Inputs and Outputs Tab



















4) The Script
The easiest way is to use Visual Basic.net as the language because vb.net has a method named IsNumeric and C# doesn't.
' VB.Net code
' Check whether the string value contains a number
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

 _
 _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Standard VB.net method, indicating whether
        ' an expression can be evaluated as a number
        If (IsNumeric(Row.TextNumbers)) Then
            Row.IsNumeric = True
        Else
            Row.IsNumeric = False
        End If
    End Sub

End Class

And the C# example
// C# Code
// Check whether the string value contains a number
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (IsNumeric(Row.TextNumbers))
        {
            Row.IsNumeric = true;
        }
        else
        {
            Row.IsNumeric = false;
        }
    }

    // Custom C# method, indicating whether an 
    // expression can be evaluated as a number
    static bool IsNumeric(string myString)
    {
        try
        {
            Int32.Parse(myString);
        }
        catch
        {
            return false;
        }
        return true;
    } 
}
More C# equivalent methods for the IsNumeric are available here.

5) The Result
I added an empty Derived Column with a Data Viewer for testing.
The Result






















Note: you could also add a second output port and create a conditional split construction with a Script Component. An example of two outputs can be found here.


Solution B
Try to cast a string to an integer. If it works, it's an integer and if it raises an error, it isn't. The casting can be done by a Derived Column Transformation or a Data Conversion Transformation.

1) Add Derived Column
Add a Derived Column where you need to do the check.
Derived Column


















2) Add Expression
Add a new column with the following expression and give it a suitable name: !ISNULL((DT_I8)TextNumbers). All numbers will result in True and all non-numbers will raise an error.

3) Ignore error
Go to the Configure Error Output window in the Derived column and ignore errors for the new field.
Ignore error














4) The Result
I added an empty Derived Column with a Data Viewer for testing. Notice the NULL value for non-numbers. That's the difference between the two methods. You can add an ISNULL expression in the next Derived column to replace the null values with false.
The Result






















5) Alternative with Data Conversion
An alternative could be to try convert the value to an int via a Data Conversion Transformation and also ignore any errors. Than add a Derived Column with an expression to check for null values: !ISNULL(IntTextNumbers)
Same Result




























The results of methods A and B are equal. Choose the method that fits you.

Solution C
If you want to check one position for a numeric value, you could also use an FINDSTRING expression:
FINDSTRING("0123456789", SUBSTRING([YourColumn], 1, 1), 1) != 0

Tuesday, 5 April 2011

Breakpoint does not work within SSIS Script Component

Case
I cannot debug (use breakpoints) in a Script Component. What's wrong?

Solution
The Script Component does not support the use of breakpoints. Therefore, you cannot step through your code and examine values as the package runs. There are a few workarounds to still get some form of debugging.

* Script Task not debugging? Switch Project Properties to 32bit for SSIS 2005 and 2008! *
1) MessageBox
The good old messagebox is a simple quick way of displaying some value. But it could be a little annoying with a lot of records.
// C# Code
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    System.Windows.Forms.MessageBox.Show("SomeMessage: " + Row.YourColumn);
}

Messagebox.Show

















2) Fire events
You can fire events and watch the execution result tab.
// C# Code
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    bool fireAgain = true;
    this.ComponentMetaData.FireInformation(0, "ScriptComponent", "SomeMessage: " + Row.YourColumn, string.Empty, 0, ref fireAgain);
}

Partial Execution Results








3) Trace log
The .Net framework has it's own trace features which you can use the write messages to a listner. There are a lot of listners (third party, open source or your own custom handmade .net application), but your can also download one from Microsoft.com: DebugView for Windows.
// C# Code
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    System.Diagnostics.Trace.WriteLine("SomeMessage: " + Row.YourColumn);
}











Let me know if you have an other workaround. And also see/vote for this Feedback request on Microsoft.com.
* Update 17 November: Debugging has been added in SQL Server 2012 RC0 *