Thursday 3 February 2011

How to configure a Foreach Loop Container: File Enumerator

How to configure a Foreach Loop Container? The Foreach Loop Container is a repeating container that loops through a collection. These are the collections:
This article explains the File Enumerator. The others will follow in the coming time.

Foreach File Enumerator
The file enumerator loops through a collection of files within a folder and makes it possible to execute for example a Data Flow Task foreach of the files without manualy changing the connection string.

1) Create variabel
We need a variable to store the filepath in. Create a string variable with the name FilePath.
Variabel










Note: I addad a value (the filepath of one of the sourcefiles) to show in the expression later on. It's optional.

2) Foreach Loop
Add a Foreach Loop Container to the Control Flow and give it a suitable name. Go to the Collection tab and:
A) Select Foreach File Enumerator. You have to selected it although it is already selected by default, but that's a little bug in the user interface. They eventually solved it in SQL 2012.
B) Select the folder where your sourcefiles are and use wildcard characters to specify the files to include in the collection. There is one little strange behaviour here that could cause unexpected results: *.xls will also return files with the extension .xlsx (it's simulair to a dos command dir *.xls).
C) Choose Fully qualified because that returns the complete filepath and not only the name and extension.
Configure Foreach Loop Container




















3) Variable mapping
Go to the Vaiable Mappings tab. Now we are gonna use the variable from step 1 and fill it with the filepath from the collection.
Map variable to Index 0 of the collection



















4) Dataflow
The loop is now ready. Add a Data Flow Task to the loop and create a simple dataflow that reads one of your source files and add the data to a database table. (Flat File Source/Connection Manager can be replaced by an Excel Source/Connection Manager)
Add Data Flow Task. => Add Source & Destination




















5) Connection Manager Expression
The Flat File Connection Manager YourSourceFile from the previous step is still hardcoded with the path of one of your sourcefiles and the variable FilePath contains the filepath from the Foreach Loop.

To replace this hardcoded path with the value of the variable we use an expression on the Connection Manager. Goto the properties of the FlatFile Connection Manager and add a new expression.
Add an expression






















Note: This step is the same if you use an Excel Connection Manager.


6) Expression
Select Connection String in the Expression Editor and press the button to edit the expression. In the Expression Builder drag the variable FilePath to the textbox.

Build the expression. Use Evaluate to test the expression.



















Note: If you used an Excel Connection Manager, then don't add an expression on the Connection String, but on ExcelFilePath. The rest is the same.


7) Testing
Now your Foreach Loop is ready for testing, but you could also add a File System Task to move the imported files to an archive folder or add a Script Task that archivces the source files in a zip file.
File System Task





















Note: the File Enumerator only loops through files. It doesn't return folders. Read this arcticle for a Foreach Folder Enumerator.
Note: the File Enumerator can't be sorted. Read this article for a sorted File Enumerator.

28 comments:

  1. Hi there,

    Great tutorial. I am having trouble with the File System Task and archiving the files. I have two files that are correctly reading into SQL Server, but cannot find the file when copying or moving. I think something is up with the scope of the SourceFile variable.

    Has anyone been able to move the processed file immediately after the Data Flow task is complete?

    ReplyDelete
  2. In some (rare) cases the first task has not yet releases the file, while the subsequent task is already attempting to access the same file. You could create a pause in your control flow. Or your could wait until the file is unlocked with a Script.

    You could also consult the SSIS MSDN forum for other solutions.

    ReplyDelete
  3. I have a small issue,
    for some reason I don't have the foreach file enumerator in my collection.
    any ideas how to configure it?
    regrads,
    hkravitz

    ReplyDelete
  4. Scope of variable should be SSIS package not Foreach Loop. When I created User::FilePath variable in Foreach Loop scope, I couldn't see User::FilePath varibale in sourcefile connection expression editor.

    ReplyDelete
    Replies
    1. Maybe a bit confusing, but it's the name of the package... The package doesn't contain a Foreach Loop yet in step 1.

      Delete
  5. Hi Thanks for posting wonderful article. But you have missed to tell ExcleFilePath under [5) Expression] section. We have to select ExcleFilePath under Expression part under Excle connection properties.

    Note: Reader please make correction while reading this article

    ReplyDelete
    Replies
    1. Hi Unknown,

      Thanks for your comment, but I don't get it. I think all you need to know about looping through excel files is mentioned... please explain yourself. Or contact me via de Contact me link in the menu.

      Kind regards,

      Joost

      Delete
  6. Hi,

    Is there any way we can fetch the text file name which is being used in each enumeration and can be mapped in a variable.
    I need to load data from a text file to excel file having same name s text file name.

    ReplyDelete
    Replies
    1. Hi Shiwam,

      Not sure what you mean... you have the name of a text file in a variable filled by the Foreach Loop. Let's say fileA.csv and you now need the corresponding filename from excel? fileA.xls?

      Please explain or post a message in the SSIS MSDN forum.

      Delete
    2. This is what I do. I use the filename variable aquired in de foreach loop to write a new file to a different server.

      However, during a build something goes wrong. During the build the foreach loop is not running and so the variable stays empty. So the connection manager that is using the variable as a filename is not exported during a build because it needs a filename to be valid.

      Any solution to this?

      Delete
    3. Hi Anonymous,

      Just fill the variable with a default value...

      Delete
  7. I was surprised with how easy that was. But then, no sigar. While deploying the Connection Manager is removed from the package, since it is deemed incomplete even with a default value for the variable.

    ReplyDelete
  8. What you mentioned did have an effect, though. The Connection String - for instance in the properties of the File Connection - wasn't erased during Build or Deployment. However, on the server whereto the package was deployed the Connection Manager is nowhere to be found. So, we're not there yet. But thanks for the help sofar!
    Paul

    ReplyDelete
  9. There was apparently no way to preserve the connection manager so I decided not to waste anymore time on it. It wrote a script to write the file to the new location.

    ReplyDelete
  10. Great tutorial - a real good help. Thank you very much
    Scott

    ReplyDelete
  11. I'm kind of a SSIS newbie so I have to ask . . . I'd like to import the file name itself into an additional column in my staging table. Is it possible to map the variable to a column in my OLE DB Destination?

    Thanks,

    Andre Ranieri

    ReplyDelete
    Replies
    1. You can use the variable in a Derived Column Transformation to create/fill a filename column or you could use the Flat File Source component to create a filename column. Also check out the SSIS forum on MSDN.

      Delete
  12. Hi,

    I am using foreach file enumarator to load multiple files in to Sql server.Along this i have implimented SCD1 using scd component.My issue is this Foreach component takes latest file first for loading making my SCD logic useless. Any idea how to get old file first?

    Regards,

    Chaitanya

    ReplyDelete
  13. Well explained....
    One tricky question...how to read multiple CSV files each having different number of columns??

    Regards,
    Nishant

    ReplyDelete
    Replies
    1. That is only possible when you can distinguish the difference on the filename. That way you can have multiple Data Flow Tasks within your loop and start the correct Data Flow Task depending on the filename.
      Another option is to read everything as one big column.... or to take a look at the third party data flow task from cozyroc

      Delete
  14. Great article. Just what I needed.

    But now, I want to upload processed files to a Successful folder on the FTP site and failed imports to a Failed folder.

    I have not been able to figure out how to do that. It doesn't like the connections I have.

    Any ideas?

    Regards,

    Dan

    ReplyDelete
    Replies
    1. In the FTP Task you need to overwrite the LocalPath property with an expression that uses the variable from the foreach loop. Edit the FTP Task and go to the Expressions page. Locate that property and use the foreach loop variable to overwrite its value. To have differnt remote folders you need to thing of an expression on the RemotePath propery (or use two FTP Tasks. One for succeeded files an one for failed file).

      If the variable contains and empty string then you might want to enter a dummy value or set the DelayValidation property of the FTP Task to true to get it validated in designtime.

      Delete
    2. Great help, thanks!

      Delete
  15. More I read about File Enumerator, more confused I get. I understand, that we have a collection of files and our FileName variable will be changing the value as we go through the list of files. But.. am I missing something here or all these examples are lacking the explanation of the why the FIleName variable has an index 0? How do we map the FileName variable, if we don't know the index of the respective property for each item?

    ReplyDelete
    Replies
    1. It's a zero based collection. You can either use the column names (if you know them) or use the 0-th column and if there are more columns the 0, 1, 2 - column....

      Delete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.

Related Posts Plugin for WordPress, Blogger...