Thursday, 20 January 2011

Foreach folder enumerator

Case
I want a Foreach Folder Enumerator, but the Foreach Loop component only loops through files.

Solution
Unfortunately this isn't supported by the standard Foreach Loop component, but there are a couple of workarounds:
- Use an open source component Directory Foreach Enumerator or my own Custom Foreach Folder Enumerator
- Use a Script task to accomplish this.

This example uses the Script task and a Foreach Loop to enumerate through the folders. The Script tasks generates a list of Directories and the Foreach Loop loops through that list.

1) Control Flow
Drag a Script task and a Foreach Loop container to the Control Flow like the image below.
Control Flow




















2) Variables
Create three string variables:
  1. startlocation that indicates the parent folder wherein we gonna search for subfolders. Fill it with a path like "d:\foreachfoldertest\".
  2. xmldoc which we gonna use for communicating between the Script task and the Foreach Loop.
  3. folder which is used for the variable mapping in the Foreach Loop container.
Varibles (right click in Control Flow)










3) Script Task
Select the startlocation as a readonly variable (we only read the path inside it) and select xmldoc as a readwritevariable (an xml string will be stored in it).
Variables



















4) The Script
Copy the following script to your Script task
// C# code 
// This script loops through a folder
// and adds all subfolders to an xml string
using System;
using System.Data;
using System.IO;        // Added
using System.Xml;       // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_5da96344f1c4411ab56207579f2e5e91.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Get all subfolders within the parent folder
            string[] subFolders = Directory.GetDirectories(Dts.Variables["User::startlocation"].Value.ToString());

            // Create variables for the XML string
            XmlDocument xmldoc = new XmlDocument();
            XmlElement xmlRootElem;
            XmlElement xmlChildElem;
            XmlText xmltext;

            // Add the root element: <ROOT>
            xmlRootElem = xmldoc.CreateElement("", "ROOT", "");

            // Variable for directory information
            DirectoryInfo directoryInfo;

            // Loop through the parent folder
            foreach (string currentFolder in subFolders)
            {
                // Fill directoryInfo variable with folder information
                directoryInfo = new DirectoryInfo(currentFolder);

                // Create the child element that contains the path:
                // <Folder>d:\foreachfoldertest\subfolder1\</Folder>
                xmlChildElem = xmldoc.CreateElement("", "Folder", "");
                xmltext = xmldoc.CreateTextNode(directoryInfo.FullName);
                xmlChildElem.AppendChild(xmltext);

                // Add the child element to the root element
                xmlRootElem.AppendChild(xmlChildElem);
            }
            // Add the root element to the xml document
            xmldoc.AppendChild(xmlRootElem);
            
            // Store the xml in the SSIS variable
            Dts.Variables["xmldoc"].Value = xmldoc.InnerXml.ToString();

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

This script will result in:
<ROOT>
  <Folder>d:\foreachfoldertest\subfolder1</Folder>
  <Folder>d:\foreachfoldertest\subfolder2</Folder>
  <Folder>d:\foreachfoldertest\subfolder3</Folder>
</ROOT>

5) Foreach Loop Container
Edit the Foreach Loop Container and change the enumerator on the Collection tab to Foreach NodeList Enumerator. This means that it will loop through an xml string. Change the DocumentSourceType to Variable and select the xmldoc variable as the DocumentSource. This is the variable that contains the xml string from the Script task. Last... add the following string to the OuterXPathString: "/ROOT/*" (without the quotes). This means that it's gonna look within the root element.
Foreach NodeList Enumerator



















6) Variable Mappings
Map the variable folder to Index 0. This will store the path in this variable.
Variable Mappings



















7) Test the result
Now you can add your own tasks to the Foreach Loop container and use the variable folder to get the path. I added a simple Script Task for testing that shows the path in a MessageBox.
Example with MessageBox














8) More advanced script
You can even add a Traverse subfolders option with this recursive method.
// C# code
// This script loops through a folder
// and adds all subfolders to an xml string
using System;
using System.Data;
using System.IO;        // Added
using System.Xml;       // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_5da96344f1c4411ab56207579f2e5e91.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        // Variables for the xml string
        private XmlDocument xmldoc;
        private XmlElement xmlRootElem;

        public void Main()
        {
            // Inialize XMLdoc
            xmldoc = new XmlDocument();

            // Add the root element: <ROOT>
            xmlRootElem = xmldoc.CreateElement("", "ROOT", "");

            // Add Subfolders as Child elements to the root element
            GetSubFolders(Dts.Variables["User::startlocation"].Value.ToString());

            // Add root element to XMLdoc
            xmldoc.AppendChild(xmlRootElem);

            // Fill SSIS variable with XMLdoc
            Dts.Variables["xmldoc"].Value = xmldoc.InnerXml.ToString();

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        // Recursive method that loops through subfolders
        private void GetSubFolders(String parentFolder)
        {
            // Get subfolders of the parent folder
            string[] subFolders = Directory.GetDirectories(parentFolder);

            // XML child element
            XmlElement xmlChildElem;
            XmlText xmltext;

            // Variable for file information
            DirectoryInfo directoryInfo;

            // Loop through subfolders
            foreach (string currentFolder in subFolders)
            {
                // Fill fileInfo variable with file information
                directoryInfo = new DirectoryInfo(currentFolder);

                // Create child element "Folder":
                // <Folder>d:\foreachfoldertest\subfolder1\</Folder>
                xmlChildElem = xmldoc.CreateElement("", "Folder", "");
                xmltext = xmldoc.CreateTextNode(directoryInfo.FullName);
                xmlChildElem.AppendChild(xmltext);
                
                // Add child element to root element
                xmlRootElem.AppendChild(xmlChildElem);

                // Recursive call
                GetSubFolders(directoryInfo.FullName);
            }
        }
    }
}

Note: there are more options to accomplish the communication between the Script Task and the Foreach Loop container, but I wanted you to show an infrequently used option of the Foreach Loop container. Let me know what your solution was.

23 comments:

  1. Thanks bro. Helped a ton.

    ReplyDelete
  2. Hey thanks for your article actually I am very new to SSIS ad tryig to learn this .
    I am getting this error when I follow the article

    Error: Failed to lock variable "User::startloaction" for read access with error 0xC0010001 "The variable cannot be found. This occurs when an attempt is made to retrieve a variable from the Variables collection on a container during execution of the package, and the variable is not there. The variable name may have changed or the variable is not being created.".


    All the three variables are in the Foreach scope

    ReplyDelete
  3. Thank you for your response,
    I am using 2008 version, i don't have permissions to install custom enumerator.
    Actually i am looking some change in the script task (to include parent folder) which you provide for looping over subfolders, the output of the xml path contains only subfolders

    * MOVED FROM Custom SSIS Component: Foreach Folder Enumerator *

    ReplyDelete
    Replies
    1. add something like this before calling GetSubFolders from the main method:
      xmlChildElem = xmldoc.CreateElement("", "Folder", "");
      xmltext = xmldoc.CreateTextNode(Dts.Variables["User::startlocation"].Value.ToString());
      xmlChildElem.AppendChild(xmltext);

      Delete
    2. Thank you Joost

      Delete
  4. Hi joost,

    Thank you for your response,
    I am using 2008 version, i don't have permissions to install custom enumerator.
    Actually i am looking some change in the script task (to include parent folder) , the output of the xml path contains parent folder and subfolders.so that below
    foreachloop container loops all the files with out bypass the files of parent folder.this is my goal
    Please consider it and let me known what change i need make in the scripttask

    Thanks

    ReplyDelete
    Replies
    1. @Anonymous: Not sure I'm getting you...This foreach loop doesn't loop through files. Please explain your self or try this forum: http://social.msdn.microsoft.com/forums/en-us/sqlintegrationservices/threads/

      Delete
  5. Joost, I got an error on the actually For Each Loop

    Error: 0xC0014023 at For Each Folder: The GetEnumerator method of the ForEach Enumerator has failed with error 0x80131943 "(null)". This occurs when the ForEach Enumerator cannot enumerate.

    Cracking piece of code but unfortunately I have no experience with C# so debugging is difficult.

    Any ideas?

    ReplyDelete
    Replies
    1. For debugging you have to switch the project to 32bit. You can also add a messagebox to check the value of the xml:

      // Fill SSIS variable with XMLdoc
      Dts.Variables["xmldoc"].Value = xmldoc.InnerXml.ToString();
      MessageBox.Show(xmldoc.InnerXml.ToString());

      And if you don't have programming experience at all you could also try my custom enumerator:
      http://microsoft-ssis.blogspot.com/2012/02/custom-ssis-component-foreach-folder.html

      Delete
  6. Good article....However, when i tried this and executed my package i got the following error:

    DTS Script has encountered an exception in user code:

    Project name: ST_01a52907a688442d8776aa9bf03b3a41

    Cannot load script for execution.


    I have also tried to install the custom component but unfortunately after installation i cannot see the changes to my ForEach Loop Component.

    ReplyDelete
    Replies
    1. Which version of SSIS are you using and which script did you use? 4 or 8? Did you copy the contents of the method or did you copy the whole script?

      Delete
    2. encountered the same issue. did you get figured out what did you missed? I copied the whole script #4 using MVS2010/C#2010

      Delete
    3. Do not copy the whole script, but only the missing usings and then the contents of the Main method.

      Delete
  7. Worked very well for me in 2008 - thanks!

    ReplyDelete
  8. Is XML really necessary? Can't you just use an array or a list and make the variable type an object? For loop does have the option for looping a variable.

    ReplyDelete
    Replies
    1. No! It just one of the possible solutions... you could also store a datatable in an object variable and then use a foreach ado enumerator... and there are even more possibilities.

      Delete
    2. facing similar issue ssdt -2014. can you suggest me a solution ?

      Delete
    3. @Chaithanya Reddy Cheruku26: Please be more specific. Or contact me via the webform in the menu.

      Delete
  9. Thanks. This works very well for all folders. How do I exclude some folders.

    ReplyDelete
    Replies
    1. Just a view and always the same? You could add a simple if statement within the loop.
      if (currentFolder.Equals("xxxxx"))
      {
      }

      Delete
  10. Hi Joost, Can you please help me how to exclude several folders with the loop? I have tried and still not working. Thanks.

    ReplyDelete
    Replies
    1. I would add a IF construction in the subfolders method. Something like
      if (!currentFolder.Equals("xxxxx"))
      {
      }
      Or check one of the properties of directoryInfo:
      if (!directoryInfo.FullName.Equals("xxxxx"))
      {
      }

      Delete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.