Sunday, 27 February 2011

How to skip the trailer records

Case
Some flat files have one or more trailer / footer records at the bottom of the file. For example:

Name    Address
Joost   Street 1
William Avenue 4b
John    Plaza 3
Bill    Road 15
Records 4

The header is easy to skip in the Flat File Connection Manager, but there isn't an option for skipping the trailer record. How do I skip trailer records?
No option for skipping trailer records




















Solution
It all depends on whether you can distinguish the trailer record from the other records.

A distinguishable trailer
If you can distinguish the trailer record, you should use an expression in a Conditional Split to filter it out. The Data Flow below is for the file example above:
Filter with an expression























No distinguishable trailer
For files where you can't distinguish the trailer record from the others there are roughly two kind of solutions:
  1. Write your own source with a Script Component (or a custom component) which skips the last record.
  2. Make the trailer record(s) distinguishable so you can still use a Conditional Split to skip it.
Downside of both solutions is that you have to count the total number of records/lines in the textfile to know when to start skipping records. This means you have to loop through the file twice.

// Even in C# you have to loop through the file to count the records
var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

// There is a more efficient way of counting lines, but it's
// only available/efficient in .Net Framework 4 and later.
var lineCount = File.ReadLines(@"C:\file.txt").Count(); 


Let's elaborate the second solution.
This solution will first count the number of records in the file and then adds a numberfield with an increasing number and uses that number to filter out the last record.

1) Add Variable
Add an integer variable named Counter to store the number of records.
Right Click in your Data Flow













2) Add first dataflow
In the first data flow there is only a Flat File Source Component that reads your text file. Select only 1 column (preferably the smallest) because we are only counting the rows. Add a Row Count Component that uses the variable of step 1 to store the number of records. It should look like this:
Only counting the rows.














3) Add second dataflow
Add a second dataflow with the same Flat File Source Component that reads your text file (also reuse the Flat File Connection Manager). And add a Script Component (transformation) behind it. The Script Component will add a RowId column.
Script Component (transformation)














4) Script Component
Edit the Script Component. We need a new column to store the Row ID in. So, add a new column on the tab Inputs and Outputs. The type should be an integer. A four-byte signed integer should be enough, but you can change that for smaller or larger files.
Add new integer column



















5) The Script
SSIS will create three methods on default, but you will only need Input0_ProcessInputRow (remove the others). Copy the following method and variable declaration to your script.
// C# code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // New internal variable to store the rownumber 
    private int rowCounter = 0;

    // Method that will be started for each record in you dataflow 
    // It fills the new column with an increasing number
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Indent counter 
        rowCounter++;
        // Fill the new column 
        Row.RowId = rowCounter;
    }
}

The result will be something like this:
RowId Name    Address
1     Joost   Street 1
2     William Avenue 4b
3     John    Plaza 3
4     Bill    Road 15
5     Records 4


6) Conditional Split
Now you can use the RowId column and Counter variable to split the trailer record(s). Add a Conditional Split behind your Script Component. If you have only one trailer record the expression should be: RowId < @[User::Counter]
Split the trailer from the rest





















7) The result
The solution may look clumsy, but it works rather fast (even on my laptop with over a million records).
Overview of the solution




Note: merging the two Data Flows or adding a post method in the Script Component wont work because they fill the variable too late and the conditional split wont work correct.
Won't work






















Please let me know if you have an other solution to skip the trailer record.

Tuesday, 15 February 2011

Eventlog as a source

Case
I want to use the Windows Eventlog as a source in SSIS.

Solution
You can either read the saved eventlog files (*.evt / *.evtx) with for example this open source component or you can read directly from the eventlog itself. Let's elaborate the last option.

1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source




















2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add four columns at the output section:
  1. EntryType (string 255)
  2. Source (string 255)
  3. Message (string 4000)
  4. TimeGenerated (database timestamp)
Add output columns



















This example uses only four columns, but there are more columns available in the eventlog.

3) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code
using System;
using System.Data;
using System.Diagnostics;       // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Get all events from the Application(/System/Security) log from the local server (.)
        EventLog myEvenLog = new EventLog("Application", ".");

        // Create variable to store the entry
        EventLogEntry myEntry;

        // Loop trough all entries (oldest first)
        for (int i = 0; i < myEvenLog.Entries.Count; i++)
        {
            // Get single entry
            myEntry = myEvenLog.Entries[i];

            // Add new records
            this.Output0Buffer.AddRow();

            // Fill columns
            this.Output0Buffer.EntryType = myEntry.EntryType.ToString();
            this.Output0Buffer.Source = myEntry.Source;
            this.Output0Buffer.TimeGenerated = myEntry.TimeGenerated;
            // Take a max of 4000 chars
            this.Output0Buffer.Message = myEntry.Message.Substring(0, (myEntry.Message.Length > 4000 ? 3999 : myEntry.Message.Length - 1));
        }
    }
}

Note 1: with variables you can avoid hard coded strings in your Script Component.

Note 2: You should add some error handling to your script (Try Catch) to avoid unexpected errors (authorization, too large texts, etc.).


4) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
The result















Note: It's also possible to write to the eventlog with .net code, but that is not a best practice. Use the SSIS script event logging instead!

Monday, 14 February 2011

Script Task and Component Logging

Case
I want to add logging to my Script Tasks and Script Components. How do I implement that?

Solution
You can raise events in the Script Task / Script Component and those events can be logged by SSIS. This article will explain the event raising.

Script Component
The Script Component has the following events:
  • FireCustomEvent
  • FireError
  • FireInformation
  • FireProgress
  • FireWarning
Fire Event in Script Component






















You can raise an error event with the following code
//C# code 
bool pbCancel = false;
this.ComponentMetaData.FireError(0, "myScriptComponent", "An error occurred.", "", 0, out pbCancel);

But you can also combine the event raising with the try catch statements from .Net:
//C# code 
try
{
    // your code
}
catch(Exception e)
{
    bool pbCancel = false;
    this.ComponentMetaData.FireError(0, "myScriptComponent", "An error occurred: " + e.Message, "", 0, out pbCancel);
}

You can even combine that with SSIS System Variables:
//C# code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Declare script variable
    private string myTask;

    public override void PreExecute()
    {
        base.PreExecute();
        // Fill script variable with SSIS variable
        // Don't forget to add the system variable
        // to the ReadOnlyVariables in the Script
        // Component.
        myTask = Variables.TaskName; 
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        try
        {
            // Throwing exception for testing
            // Add your own code here.
            throw new ArgumentNullException();
        }
        catch (Exception e)
        {
            // Raise event so it can be logged by SSIS
            bool pbCancel = false;
            this.ComponentMetaData.FireError(0, myTask, "An error occurred: " + e.Message.ToString(), "", 0, out pbCancel);
        }
    }
}


Script Task
The Script Task has the following events:
  • FireBreakpointHit
  • FireCustomEvent
  • FireError
  • FireInformation
  • FireProgress
  • FireQueryCancel
  • FireWarning  
Fire Event in Script Task






















You can combine the event raising with the try catch statements from .Net and the SSIS System Variables:
//C# code 
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_c9f1ec34c2ee4dbbb0bf0b0db3f3ae58.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Don't forget to add the system variable to the ReadOnlyVariables in the Script Task
            bool fireAgain = true;
            Dts.Events.FireInformation(0, Dts.Variables["System::TaskName"].Value.ToString(), "Starting", string.Empty, 0, ref fireAgain); 
            try
            {
                // Throwing exception for testing
                // Add your own code here.
                // Dts.TaskResult = (int)ScriptResults.Success;
                Dts.Events.FireWarning(0, Dts.Variables["System::TaskName"].Value.ToString(), "About to crash", string.Empty, 0);
                throw new ArgumentNullException();
            }
            catch (Exception e)
            {
                // Raise event so it can be logged by SSIS
                Dts.Events.FireError(0, Dts.Variables["System::TaskName"].Value.ToString(), "An error occurred: " + e.Message.ToString(), "", 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}





Saturday, 12 February 2011

Script Component as source 2

Case
This is an follow up on another strangely formatted file article: My CSV file has splitted rows. There is only one attribute on each row:
id,1
name,Joost
address,Street 2
id,2
name,Jack
address,Avenue 4b
id,3
name,William
address,Plaza 5

I want to convert it to:
id,name,address
1,Joost,Street 1
2,Jack,Avenue 4b
3,William,Plaza 5

The pivot isn't usable because the records aren't grouped with a key. What's the solution?

Solution
You can use a Script Component as a source to solve this.

1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source




















2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add three columns at the output section:
  1. ID (four-byte signed integer)
  2. Name (string)
  3. Address (string)
Add output columns



















3) Connection Manager
Go to the Connection Managers tab and add a connection (1) to avoid a hardcoded connection string in your Script. The Connection Manager Type (2) should be FILE and the Usage type (3) should be Existing file. After that change the default name from "Connection" to "myFile".
Add FILE connection to avoid hardcoded connection strings


























4) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Variables to store column values
        int ID = 0;             // Column 1
        string Name = "";       // Column 2
        string Address = "";    // Column 3
        int NumberOfColumns = 3;

        // Counter to keep track of the current column        
        int counter = 0;

        // Read file (use the FILE connection added in the script component, named myFile)
        using (StreamReader sr = new StreamReader(this.Connections.myFile.ConnectionString))
        {
            String line;
            // Read lines from the file until the end of the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                // Raising the counter
                counter++;

                // Split the line into columns
                string[] columns = line.Split(',');

                // Fill the right variable
                if (counter.Equals(1))
                {
                    // Column 1
                    ID = System.Convert.ToInt32(columns[1]);
                }
                else if (counter.Equals(2))
                {
                    // Column 2
                    Name = columns[1];
                }
                else if (counter.Equals(3))
                {
                    // Column 3
                    Address = columns[1];
                }

                // Add a row if the last column has been reached
                if (counter.Equals(NumberOfColumns))
                {
                    // Add one new row and fill columns
                    this.Output0Buffer.AddRow();
                    Output0Buffer.ID = ID;
                    Output0Buffer.Name = Name;
                    Output0Buffer.Address = Address;

                    // Last column, reset counter
                    counter = 0;
                }
            }
        }
    }
}

5) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
The result




















Note: You should add some error handling to your script (Try Catch) to avoid unexpected errors (no file, other delimiter, empty rows, etc.).

Let me know if you have another strangely formatted file. Maybe I can find a solution to process it.

Thursday, 10 February 2011

Script Component as source 1

Case
My CSV file has some rows where not all columns at the end of the row are filled:

"test1";"abc";"xyz";"123"
"test2";"cba";"zyx";"321"
"test3";"abc"
"test4";"efg";"zyx"
"test5";"cba";"zyx";"321"

If you use a flat file connection manager, you will get strange things:
Some rows will 'merge'




















And if the last row is incomplete you will get:








Is there a workaround for this poorly formatted file?

Solution
You can use a Script Component as a source to solve this.
* UPDATE: This has been solved in SQL Server 2012 RC0 *

1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source




















2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add four string columns at the output section (this example only uses strings).
Add output columns



















3) Connection Manager
Go to the Connection Managers tab and add a connection (1) to avoid a hardcoded connection string in your Script. The Connection Manager Type (2) should be FILE and the Usage type (3) should be Existing file. After that change the default name from "Connection" to "myFile".
Add FILE connection to avoid hardcoded connection strings


























4) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Read file (use the FILE connection added in the script component, named myFile)
        using (StreamReader sr = new StreamReader(this.Connections.myFile.ConnectionString, System.Text.Encoding.UTF7)) // Encoding is optional
        {
            String line;
            // Read lines from the file until the end of the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                // Split the line into columns
                string[] columns = line.Split(';');

                // Add one new row
                this.Output0Buffer.AddRow();

                // Fill columns, but check if they exist
                if (columns.Length > 0)
                {
                    // Remove the " at the start and end of the string
                    // with a trim or use a substring.
                    Output0Buffer.MyFirstColumn = columns[0].TrimStart('"').TrimEnd('"');
                }
                if (columns.Length > 1)
                {
                    Output0Buffer.MySecondColumn = columns[1].TrimStart('"').TrimEnd('"');
                }
                if (columns.Length > 2)
                {
                    Output0Buffer.MyThirdColumn = columns[2].TrimStart('"').TrimEnd('"');
                }
                if (columns.Length > 3)
                {
                    Output0Buffer.MyFourthColumn = columns[3].TrimStart('"').TrimEnd('"');
                }
            }
        }
    }
}
or VB.Net
// C# code
' VB.Net code
Imports System
Imports System.Data
Imports System.IO  'Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub CreateNewOutputRows()
        ' Read file (use the FILE connection added in the script component, named myFile)
        Using sr As New StreamReader(Me.Connections.myFile.ConnectionString, System.Text.Encoding.UTF7) ' Encoding is optional

            ' Create variable to store a line of text
            Dim line As String = sr.ReadLine

            ' Read lines from the file until the end of the file is reached.
            Do While (Not line Is Nothing)

                ' Split the line into columns
                Dim columns As String() = line.Split(";")

                ' Add one new row
                Me.Output0Buffer.AddRow()

                ' Fill columns, but check if they exist
                If columns.Length > 0 Then
                    ' Remove the " at the start and end of the string
                    ' with a trim or use a substring.
                    Output0Buffer.MyFirstColumn = columns(0).TrimStart("""").TrimEnd("""")
                End If
                If columns.Length > 1 Then
                    Output0Buffer.MySecondColumn = columns(1).TrimStart("""").TrimEnd("""")
                End If
                If columns.Length > 2 Then
                    Output0Buffer.MyThirdColumn = columns(2).TrimStart("""").TrimEnd("""")
                End If
                If columns.Length > 3 Then
                    Output0Buffer.MyFourthColumn = columns(3).TrimStart("""").TrimEnd("""")
                End If

                ' Goto next line
                line = sr.ReadLine
            Loop
        End Using
    End Sub
End Class

5) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
Notice the NULL values

















Note: You should add some error handling to your script (Try Catch) to avoid unexpected errors (no file, other delimiter, empty rows, etc.).

Note: If you are using this in a foreach loop, you don't need the connection manager. The variable from the foreach loop already contains the filepath.

Saturday, 5 February 2011

Generating a new SSIS Package GUID

dtutil /FILE "myFirstPackage.dtsx" /I
This action creates a new GUID for the newly-copied package to distinguish it from the original

You can even use the FOR command to give a whole bunch of file-based packages a new GUID at once:
for %f in (N:\folder\*.dtsx) do dtutil /IDRegenerate /FILE %f
Use a single percent sign (%) when typing the command at the command prompt. Use a double percent sign (%%) if the command is used inside a batch file.

C) Script Task
If you have a whole bunch of packages (file- or server-based) and you're not sure which one have duplicate GUIDS, you can use a Script Task to give them all new GUIDS.

Create a new package and add a Script Task to the Control Flow. Edit the Script Task and replace you Main()-method with this one.

For file-based packages:
//File-based GUID changer
using System;
using System.Data;
using System.IO; // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_fb40a32cd5d74b8da11ec9720443960a.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            /////////////////////////////////////////             
            // Get all package from a folder.
            string[] packages = Directory.GetFiles(@"d:\Integration Services Project1\", "*.dtsx");

            /////////////////////////////////////////             
            // Set application context
            Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();

            /////////////////////////////////////////             
            // Loop through the packages
            foreach (string filepath in packages)
            {
                /////////////////////////////////////////             
                // Open package, Generate new GUID and save package.
                Package localPackage = app.LoadPackage(filepath, null);
                string oldGuid = localPackage.ID.ToString();
                localPackage.RegenerateID();
                app.SaveToXml(filepath, localPackage, null);

                /////////////////////////////////////////             
                // Log old and new GUID
                bool fireAgain = true;
                Dts.Events.FireInformation(0, "Change GUID", "Package: " + localPackage.Name + ", Old GUID: " + oldGuid + ", New GUID: " + localPackage.ID.ToString(), string.Empty, 0, ref fireAgain);   
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

For SQL server-based packages:
//SQL server-based GUID changer
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_fb40a32cd5d74b8da11ec9720443960a.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            /////////////////////////////////////////             
            // The SQL Server
            String SSISServer = "Name_Of_Your_Server"; 
            
            /////////////////////////////////////////             
            // Set application context
            Microsoft.SqlServer.Dts.Runtime.Application app = new Microsoft.SqlServer.Dts.Runtime.Application();

            /////////////////////////////////////////             
            // Create a package variable to temporary store the server packages
            Package serverPackage;
            
            /////////////////////////////////////////
            // Loop through packages and folders in the root: / (or change the path for your folder)
            foreach (PackageInfo serverPackageInfo in app.GetPackageInfos("/", SSISServer, null, null))
            {
                // Get only the packages. You could make a recursive function to loop through all folders
                if (serverPackageInfo.Flags.ToString().ToLower().Equals("package"))
                {
                    /////////////////////////////////////////             
                    // Open package, Generate new GUID and save package.
                    serverPackage = app.LoadFromSqlServer(serverPackageInfo.Folder + serverPackageInfo.Name, SSISServer, null, null, null);
                    string oldGuid = serverPackage.ID.ToString();
                    serverPackage.RegenerateID();
                    app.SaveToSqlServer(serverPackage, null, SSISServer, null, null);

                    /////////////////////////////////////////             
                    // Log old and new GUID
                    bool fireAgain = true;
                    Dts.Events.FireInformation(0, "Change GUID", "Package: " + serverPackage.Name + ", Old GUID: " + oldGuid + ", New GUID: " + serverPackage.ID.ToString(), string.Empty, 0, ref fireAgain); 
                }
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

Note: the folderpath and servername are hardcoded because you probably won't run this package on a regular base. But you can use variables to remove the hardcoded strings.

Note: If you want to reset the GUIDS of tasks, you should use this open source application: BIDS Helpeer

Friday, 4 February 2011

Create a GUID column in SSIS

Case
How do you create a new Guid column in SSIS?

Solution
There is no SSIS function for that, but there are a few workarounds.

A) If you already have a valid GUID in your source, but it's still a string type, then you can use a Derived Column to create a real guid.
(DT_GUID) ("{" + YourGuid + "}")



















B) If your source is a SQL Server table, you can use the TSQL statement to generate a Guid column.
NEWID() as Guid


















C) Or you can use a Script Component to generate a new Guid Column:

1) Script Component
Add a Script Component in your Data Flow and select Transformation as the Script Component Type.
Transformation Type













2) Create new column
Edit the Script Component and goto the Inputs and Outputs tab. Expand the Output 0 and add a new column. The column type should be unique identifier [DT_GUID].
Add new Guid column



















3) The Script
Edit the Script. Remove the PreExecute and PostExecute methods and add the following code to the Input0_ProcessInputRow method. That's all. Only one row of code.
// C# code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Create a Globally Unique Identifier with SSIS
        Row.Guid = System.Guid.NewGuid(); 
    }
}

' VB.net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<microsoft.sqlserver.dts.pipeline.ssisscriptcomponententrypointattribute> _
<clscompliant(false)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Create a Globally Unique Identifier with SSIS
        Row.Guid = System.Guid.NewGuid()
    End Sub
End Class

Thursday, 3 February 2011

How to configure a Foreach Loop Container: File Enumerator

How to configure a Foreach Loop Container? The Foreach Loop Container is a repeating container that loops through a collection. These are the collections:
This article explains the File Enumerator. The others will follow in the coming time.

Foreach File Enumerator
The file enumerator loops through a collection of files within a folder and makes it possible to execute for example a Data Flow Task foreach of the files without manualy changing the connection string.

1) Create variabel
We need a variable to store the filepath in. Create a string variable with the name FilePath.
Variabel










Note: I addad a value (the filepath of one of the sourcefiles) to show in the expression later on. It's optional.

2) Foreach Loop
Add a Foreach Loop Container to the Control Flow and give it a suitable name. Go to the Collection tab and:
A) Select Foreach File Enumerator. You have to selected it although it is already selected by default, but that's a little bug in the user interface. They eventually solved it in SQL 2012.
B) Select the folder where your sourcefiles are and use wildcard characters to specify the files to include in the collection. There is one little strange behaviour here that could cause unexpected results: *.xls will also return files with the extension .xlsx (it's simulair to a dos command dir *.xls).
C) Choose Fully qualified because that returns the complete filepath and not only the name and extension.
Configure Foreach Loop Container




















3) Variable mapping
Go to the Vaiable Mappings tab. Now we are gonna use the variable from step 1 and fill it with the filepath from the collection.
Map variable to Index 0 of the collection



















4) Dataflow
The loop is now ready. Add a Data Flow Task to the loop and create a simple dataflow that reads one of your source files and add the data to a database table. (Flat File Source/Connection Manager can be replaced by an Excel Source/Connection Manager)
Add Data Flow Task. => Add Source & Destination




















5) Connection Manager Expression
The Flat File Connection Manager YourSourceFile from the previous step is still hardcoded with the path of one of your sourcefiles and the variable FilePath contains the filepath from the Foreach Loop.

To replace this hardcoded path with the value of the variable we use an expression on the Connection Manager. Goto the properties of the FlatFile Connection Manager and add a new expression.
Add an expression






















Note: This step is the same if you use an Excel Connection Manager.


6) Expression
Select Connection String in the Expression Editor and press the button to edit the expression. In the Expression Builder drag the variable FilePath to the textbox.

Build the expression. Use Evaluate to test the expression.



















Note: If you used an Excel Connection Manager, then don't add an expression on the Connection String, but on ExcelFilePath. The rest is the same.


7) Testing
Now your Foreach Loop is ready for testing, but you could also add a File System Task to move the imported files to an archive folder or add a Script Task that archivces the source files in a zip file.
File System Task





















Note: the File Enumerator only loops through files. It doesn't return folders. Read this arcticle for a Foreach Folder Enumerator.
Note: the File Enumerator can't be sorted. Read this article for a sorted File Enumerator.

Tuesday, 1 February 2011

How to configure a For Loop Container

How to configure a simple For Loop Container? The For Loop Container is a repeating container that does something untill a condition evaluates true.

Four examples:
- Do something ten times
- Wait untill a certain time
- Wait for a couple of seconds / minutes
- Package Queue

A) Do something ten times
This example does ten times what's within the loop.

1) Variable
Create a integer variable named Counter. This variable is used the count till 10.
Variable Counter (right click in Control Flow)









2) Add For Loop Container
Drag a For Loop Container to your Control Flow and give it a suitable name.
For Loop Container




















3) Edit For Loop Container
  • Edit the For Loop Container and set the InetExpression to: "@[User::Counter] = 0". This is the initial value of the counter. This example starts at zero.
  • Set the required EvalExpression to: "@[User::Counter] < 10". This is the evaluation expression that contains the expression used to test whether the loop should stop or continue. The example stops after 10 times (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).
  • Set the requires AssignExpression to: "@[User::Counter] = @[User::Counter] + 1". This is an optional iteration expression that increments the loop counter.
Set the expressions



4) Testing
For testing purposes I added a Script Task with a MessageBox that shows de value of the counter.
Testing the For Loop






B) Wait untill a certain time
This example uses the For Loop as a Wait statement. Let's wait untill 12:20.

1) For Loop Container
Add a For Loop Container to your Control Flow and give it a suitable name. I added a Annotation to clarify that it does nothing within the loop.
Waiting For Loop














2) Edit For Loop Container
Only enter this EvalExpression: GETDATE() <= DATEADD("Mi",20,DATEADD("Hh",12,(DT_DBDATE)GETDATE()))
EvalExpression



















Clarification of the expression
It removes the time from the current datetime. So 01-02-2011 12:01:45.002 becomes 01-02-2011 0:00:00.000:
(DT_DBDATE)GETDATE()
After that it adds 20 minutes and 12 hours. So it becomes 01-02-2011 12:20:00.000:
DATEADD("Mi",20,DATEADD("Hh",12,(DT_DBDATE)GETDATE()))
That datetime is compared to the current datetime:
GETDATE() <= DATEADD("Mi",20,DATEADD("Hh",12,(DT_DBDATE)GETDATE()))

3) Testing
For testing purposes I added a Script Task after the loop that shows the current time in a messagebox.
Testing














C)  Wait for a couple of seconds / minutes
With a slightly different approach then example B, you can wait for a couple of seconds, minutes, etc.
Use this expression in the same way as example B to wait 30 seconds:
DATEADD("ss", 30, @[System::ContainerStartTime]) > GETDATE()
or 5 minutes:
DATEADD("mi", 5, @[System::ContainerStartTime]) > GETDATE()

Click here for more information about DATEADD.

Note: Delay techniques with a For Loop causes unnecessary CPU processing. A Script Task with a Thread.Sleep doesn't have that drawback.