Friday, 22 April 2011

How to configure a Foreach Loop Container: Sorted File Enumerator

Case
I want my files to be process in filedate order, but the foreach loop doesn't provide a sort mechanism. Is there a way to order by filedate or filename?

Solution
The standard foreach loop container can't be sorted. See/vote this Feedback suggestion at Microsoft. It looks like it's standard ordered by filename. You could use this custom Sorted File Enumerator or you can accomplish it with a Script Task. This Script Task solution let's you order (asc or desc) by filename, creation date, last modified date or any other file property.

1) Control Flow
Drag a Script Task and a Foreach Loop Container to the Control Flow like the image below.
Sorted foreach loop container




















2) Variables
Create three variables:
  1. startlocation (string) that indicates the folder wherein we gonna search for files. Fill it with a path like "d:\mySourceFiles\".
  2. dataset  (object) which we gonna use for communicating between the Script Task and the Foreach Loop Container.
  3. filepath (string) which is used for the variable mapping in the Foreach Loop Container.
The variables for this solution












3) Script Task
Select the startlocation as a readonly variable (we only read the path inside it) and select dataset as a readwritevariable (an ADO object will be stored in it).
Variables in Script Task



















4) The Script
Copy the following script to your Script Task. Line 52 does the filtering and sorting. It is almost the same as a SQL query. See more examples on msdn.
// C# code 
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_2e776e26793b45939128add3d850f70d.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Create a dataset. I named it unsorted, because it's not yet sorted
            DataSet dsUnsorted = new DataSet();

            // Create a new table in the dataset
            DataTable filelistTable = dsUnsorted.Tables.Add();
            filelistTable.Columns.Add("FilePath", typeof(string));  // Filepath needed for connectionstring.
            filelistTable.Columns.Add("FileName", typeof(string));  // Filename used for sorting [optional]. 
            filelistTable.Columns.Add("FileDate", typeof(DateTime));// Filedate used for sorting [optional].

            // Get all files within the folder       
            string[] allFiles = Directory.GetFiles(Dts.Variables["User::startlocation"].Value.ToString());

            // Variable for storing file properties
            FileInfo fileInfo;

            // Loop through the files in the folder       
            foreach (string currentFile in allFiles)
            {
                // Fill fileInfo variable with file information         
                fileInfo = new FileInfo(currentFile);

                // Choose which the file properties you will use
                // Columns:            FilePath           FileName       FileDate
                filelistTable.Rows.Add(fileInfo.FullName, fileInfo.Name, fileInfo.CreationTime);
            }

            // Filtering on *.txt extension. Note: like uses * instead of %
            // Sorting the files on filename (or filedate: FileName DESC)
            DataRow[] rows = dsUnsorted.Tables[0].Select("FileName like '*.txt'", "FileDate ASC");

            // Create a new sorted dataset that the SSIS foreach loop uses.
            DataSet dsSorted = new DataSet();
            DataTable filelistTableSorted = dsSorted.Tables.Add();

            // Only interested in the filepath which is needed for the connectionstring 
            filelistTableSorted.Columns.Add("FilePath", typeof(string));

            // Fill the new dataset with the sorted rows.
            foreach (DataRow row in rows)
            {
                filelistTableSorted.Rows.Add(row["FilePath"].ToString());
            }

            // Store the dataset in the SSIS variable       
            Dts.Variables["dataset"].Value = dsSorted;

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

or with VB.Net
' VB.Net Code
Imports System
Imports System.Data
Imports System.IO       ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        ' Create a dataset. I named it unsorted, because it's not yet sorted
        Dim dsUnsorted As New DataSet()

        ' Create a new table in the dataset
        Dim filelistTable As DataTable = dsUnsorted.Tables.Add()
        filelistTable.Columns.Add("FilePath", GetType(String))   ' Filepath needed for connectionstring.
        filelistTable.Columns.Add("FileName", GetType(String))   ' Filename used for sorting [optional]. 
        filelistTable.Columns.Add("FileDate", GetType(DateTime)) ' Filedate used for sorting [optional].

        ' Get all files within the folder       
        Dim allFiles As String() = Directory.GetFiles(Dts.Variables("User::startlocation").Value.ToString())

        ' Variable for storing file properties
        Dim fileInfo As FileInfo

        ' Loop through the files in the folder       
        For Each currentFile As String In allFiles
            ' Fill fileInfo variable with file information         
            fileInfo = New FileInfo(currentFile)

            ' Choose which the file properties you will use
            ' Columns:            FilePath           FileName       FileDate
            filelistTable.Rows.Add(fileInfo.FullName, fileInfo.Name, fileInfo.CreationTime)
        Next

        ' Filtering on *.txt extension. Note: like uses * instead of %
        ' Sorting the files on filename (or filedate: FileName DESC)
        Dim rows As DataRow() = dsUnsorted.Tables(0).[Select]("FileName like '*.txt'", "FileDate ASC")

        ' Create a new sorted dataset that the SSIS foreach loop uses.
        Dim dsSorted As New DataSet()
        Dim filelistTableSorted As DataTable = dsSorted.Tables.Add()

        ' Only interested in the filepath which is needed for the connectionstring 
        filelistTableSorted.Columns.Add("FilePath", GetType(String))

        ' Fill the new dataset with the sorted rows.
        For Each row As DataRow In rows
            filelistTableSorted.Rows.Add(row("FilePath").ToString())
        Next

        ' Store the dataset in the SSIS variable       
        Dts.Variables("dataset").Value = dsSorted
        Dts.TaskResult = ScriptResults.Success
    End Sub

End Class


5) Foreach Loop Container
Edit the Foreach Loop Container and change the enumerator on the Collection tab to Foreach ADO Enumerator. This means that it will loop through an ADO object. Select the dataset variable as the ADO object source variable. This is the variable that contains the sorted dataset from the Script Task.
Foreach ADO Enumerator



















6) Variable Mappings
Map the variable filepath to Index 0. This will store the path in this variable.
Variable mapping


















7) Test the result
Now you can add your own tasks to the Foreach Loop container and use the variable filepath in the expression for Connection String. I added a simple Script Task for testing that shows the filepath in a MessageBox.
The Result: a sorted Foreach Loop Container





















Note: there are other ways to accomplish this, like a first Foreach Loop Container that adds all filenames to a database table and then a second Foreach Loop Container to loop through that database table. Let me know what your solution is.

Example package 2008

Update: I have created a Custom Foreach Sorted File Enumerator if you don't like scripting.

19 comments:

  1. My requirement demands me to use a extract based on a date. And on success i need to pass the VDate variable value to Vsource variableand update the VDate variable to system date
    i have initially given values for VDate and Vsource .
    How can i use loop container , also i have some issues with the script that i've written , NEW to SSIS

    ReplyDelete
  2. @Vikram: For general SSIS questions you should post a question at the MSDN SSIS Forum. This is a very active forum with lot's of helpful people.

    If you have problems with the script posted on this page, then contact me via the contact page and let me know what the exact error was.

    ReplyDelete
  3. I can't get this to run in 2012. It would be helpful if you had the SSIS package for download.

    ReplyDelete
  4. Hi Joost,

    I Use SQL 2005 BIDS, is there an SSIS pacakge to download for 2005 please.

    Thanks
    Senthil

    ReplyDelete
    Replies
    1. I don't have 2005 on my pc, but you can use the VB.Net version of the Script Task. Copy the lines of code from the main method to your own main method. Only the last row (setting script task result) has been changed in 2008, so you need to keep that from your 2005 script task method. And check if all the imports rows are there in 2005. If not add them.

      Delete
  5. Thanks Joost i am able to work through it.

    ReplyDelete
  6. Hi Joost,

    I am using 2008 version and did the exact same as this post but still its not working. I am trying to loop through the files in a folder in sorted order and import the text file data to DB.

    It will be really helpful to me if you could share your project.

    Thank you in advance.

    ReplyDelete
    Replies
    1. I will create an example package and put it online...

      Delete
    2. Example Package has been uploaded. Let me know if works for you.

      Delete
    3. Hi Joost,

      Thank you very much for taking time and creating sample package to help me.

      Could you please provide me the link to download the package?

      I appreciate your help.

      Delete
    4. It's at the bottom of the post or click here

      Delete
    5. Hello Joost,

      Thank you once again.

      I found the issue with my package, its not setting the correct variable for my dataflow task connection string. hence its not picking the flatfile. finally i got it working.

      Delete
  7. I tried your code today,
    simply fabulous, thanks!

    ReplyDelete
  8. I found that I was not able to save my variables to a configuration file because the variable dataset was an object, even though I unselected it in so it wouldn't export.Have you encountered that? I'm going to try to work around that by using your component. I'm using BIDS 2008R2

    ReplyDelete
    Replies
    1. For this example you should only configure the string variable startlocation. The other variables are filled by the package. Giving them a default value via configurations isn't useful.

      Delete
  9. Instead of assigning dsSorted to the dataset should you assign filelistTableSorted? When I tried this the dsSorted had no rows but the filelistTableSorted did so I changed line 61 to use filelistTableSorted instead of dsSorted and it worked. I'm uncomfortable since no one else seemed to have the problem but it seems to be working now.

    ReplyDelete
  10. This is great! Worked flawlessly.

    Thanks!

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.