Sunday, 7 August 2011

Foreach ftp file enumerator

Case
I want to loop through all files in a FTP folder and process them. I need a Foreach ftp file enumerator, but the Foreach Loop component only loops through local files.

Solution
There are a couple of options. You could use a Script Task, a wildcard in the RemotePath Property of the FTP Task or a Third Party component to download all files to a certain folder and loop through those files with a Foreach Loop file enumerator. This is the most common approach.
This example uses a Script Task to get a list of ftp files and uses a Foreach From Variable Enumerator to loop through those files.

Update: There is now a custom Foreach FTP File Enumerator available.



1) Variables
We need a couple of variables to keep things flexible:
  • FtpWorkingDirectory: String variable with the remote folder path. Example: /root/data/
  • DownloadDirectory: String variable that contains the path of the download folder: Example: d:\SourceFiles\
  • FtpFileURL: String variable used in the foreach loop to store the remote filename in. Example: 01-01-2011.csv
  • FtpFileList: Object variable to store the remote file list in.
Variables to keep things flexible











2) Ftp Connection Manager
Create a FTP Connection Manager by right clicking in the Connection Manager pane and select New Connection... In the new window select FTP and click Add. After that fill in the FTP Connection Manager Editor. Test the connection manager and rename it to myFtpServer (this name is used in the Script Task).
New FTP connection




















After closing you can rename it.

















3) Script Task
Add a Script Task to the Control Flow and give it a suitable name. Edit the Script Task and add variable FtpWorkingDirectory as ReadOnly and FtpFileList as ReadWrite.
Script Task to get list of files






















4) The Script
Edit the script and copy the following code.
// #C Code
using System;
using System.Collections;       // Added
using System.Data;
using System.Net;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_f20cc76ba3bc47849f70c6cbfd4701dc.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            try
            {
                // Get the ftp connection from the Connection Managers collection
                ConnectionManager ftpServer = Dts.Connections["myFtpServer"];

                // Create a FTP connection object and us the credentials of connection manager
                FtpClientConnection myFtpConnection = new FtpClientConnection(ftpServer.AcquireConnection(null));

                // Open the connection
                myFtpConnection.Connect();

                // Set work folder with the value of the variable
                myFtpConnection.SetWorkingDirectory(Dts.Variables["FtpWorkingDirectory"].Value.ToString());

                // Create StringArrays for filenames and folders
                // The folderNames aren't used, but is mandatory
                // for the next method.
                String[] fileNames;
                String[] folderNames;

                // Get a directory listing and fill the StringArray variables
                myFtpConnection.GetListing(out folderNames, out fileNames);

                // Copy StringArray to ArrayList to fit in Object variable
                ArrayList fileNamesArray = new ArrayList(fileNames);
                
                // Optional sorter
                fileNamesArray.Sort();

                // Fill ssis object variable
                Dts.Variables["FtpFileList"].Value = fileNamesArray;

                // Close connection
                myFtpConnection.Close();

                // Close Script Task, set result to success
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Fire error and set result to failure
                Dts.Events.FireError(0, "FTP Script Task", "Error: " + ex.Message, string.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}

Or VB.Net

' VB.Net code
Imports System
Imports System.Collections
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        Try
            ' Get the ftp connection from the Connection Managers collection
            Dim ftpServer As ConnectionManager = Dts.Connections("myFtpServer")

            ' Create a FTP connection object and us the credentials of connection manager
            Dim myFtpConnection As FtpClientConnection = New FtpClientConnection(ftpServer.AcquireConnection(Nothing))

            ' Open the connection
            myFtpConnection.Connect()

            ' Set work folder with the value of the variable
            myFtpConnection.SetWorkingDirectory(Dts.Variables("FtpWorkingDirectory").Value.ToString())

            ' Create StringArrays for filenames and folders
            ' The folderNames aren't used, but is mandatory
            ' for the next method.
            Dim fileNames() As String
            Dim folderNames() As String

            ' Get a directory listing and fill the StringArray variables
            myFtpConnection.GetListing(folderNames, fileNames)

            ' Copy StringArray to ArrayList to fit in Object variable
            Dim fileNamesArray As ArrayList = New ArrayList(fileNames)

            ' Optional sorter
            fileNamesArray.Sort()

            ' Fill ssis object variable
            Dts.Variables("FtpFileList").Value = fileNamesArray

            ' Close connection
            myFtpConnection.Close()

            ' Close Script Task, set result to success
            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            ' Fire error and set result to failure
            Dts.Events.FireError(0, "FTP Script Task", "Error: " + ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub
End Class

5) Foreach Loop
Add a Foreach Loop after the Script Task. Edit the Foreach Loop and on the Collection tab select Foreach From Variable Enumerator as enumerator and select the object variable FtpFileList as the variable.
Foreach From Variable Enumerator


















6) Foreach Loop Variable Mappings
On the Variable Mappings tab of the Foreach Loop select the variable FtpFileURL.
Variable Mapping


















7) FTP Task General
Add a FTP task within the Foreach Loop. Edit the FTP Task and on the General tab you can select the FTP Connection which we created in step 2. And you can give it a suitable name.
General: select the ftp connection


















8) FTP Task File Transfer
On the File Transfer tab you can select the DownloadDirectory variable for the local folder. In the Operation category you should select Receive files as the operation. For RemotePath you can enter a dummy value because we are overriding that value with an expression in the next step.
File Transfer


















9) FTP Task Expressions
At the Expressions tab you must add an expression for the RemotePath: @[User::FtpWorkingDirectory] +  @[User::FtpFileURL] This expression combines the working folder (/root/data/) and the filename (01-01-2011.csv) into: /root/data/01-01-2011.csv.
Click at 2 to open the Expression Editor


















10) Process file in Data Flow
Add a Data Flow Task within the Foreach Loop to process the downloaded files.
Process files






















10) Process file in Connection Manager
Create a Data Flow like the example below. I named the connection for the csv file myDownloadedFtpFile. Go to the properties of that connection and add an expression for the ConnectionString so we can use it in the Foreach Loop: @[User::DownloadDirectory] + @[User::FtpFileURL]
A simple data flow





















Expression to override the ConnectionString property













11) The result
Run the package to see the result.
The result

















Download example package

Note: In the Script Task you can use the .Net variable folderNames to create some recursive function to loop through all subfolders as well.

Note: If you just want to download all files with a Script Task and use a Foreach Loop file enumerator to process all files, you can replace the C# lines 48 - 55 or the VB.Net lines 41 - 48 with the following code:

// C# Code
// Download all files at once. Don't forget to add the SSIS variable DownloadDirectory to the ReadOnlyVariables
myFtpConnection.ReceiveFiles(fileNames, Dts.Variables["DownloadDirectory"].Value.ToString(), true, false);



' VB.Net code
' Download all files at once. Don't forget to add the SSIS variable DownloadDirectory to the ReadOnlyVariables
myFtpConnection.ReceiveFiles(fileNames, Dts.Variables("DownloadDirectory").Value.ToString(), true, false)



Update: There is a custom Foreach FTP File Enumerator available.

35 comments:

  1. Hi,
    Can you please setup this package for download? Thanks for replying to my question in the forums.

    ReplyDelete
  2. @PeaceOut: I have added the example package.

    ReplyDelete
  3. Grate post. For some reason I had to set UsePassiveMode property to true on the FTP connection manager.

    ReplyDelete
  4. @AikoAiko: An FTP connection manager can send and receive files using either active mode or passive mode. In active mode, the server initiates the data connection, and in passive mode, the client initiates the data connection.

    ReplyDelete
  5. Hi,Great post! Quick question, I have recreated the package but nothing is being passed into the data flow tasks to be written to the database table. Any tips on what i have messed up? Hoping to be able to files the files downloaded from the FTP site into the database table.

    ReplyDelete
  6. @RDH: First check if the Script is working with a debug or by adding a MessageBox.Show in the script to check wether the variable is getting filled.
    Something like MessageBox.Show(fileNamesArray.Length().ToString()) or MessageBox.Show(fileNames.ToString())

    If it's filled then something is wrong with you Foreach Loop Container (Variable mapping or Expression). If it's not filled then something is probably wrong with your FTP Connection manager (or the script).

    There is also an example package that you could try. Let me know your observations...

    ReplyDelete
  7. Great post, very helpful. I have one suggestion to improve it. The package will fail on the Get File List of FTP Site Script Task if there are no files to FTP. To avoid this I've modified the script to pass an empty list if there are no files present. With this modification it will still make it to the Foreach Loop Container and obviously won't do anything since the list is empty. The difference being it finishes with Success.

    The changes to the script is as follows:
    ...

    // Get a directory listing and fill the StringArray variables
    myFtpConnection.GetListing(out folderNames, out fileNames);
    ArrayList fileNamesArray = new ArrayList();

    if (fileNames != null)
    {
    // Copy StringArray to ArrayList to fit in Object variable
    fileNamesArray = new ArrayList(fileNames);

    // Optional sorter
    fileNamesArray.Sort();
    }

    // Fill ssis object variable
    Dts.Variables["FtpFileList"].Value = fileNamesArray;

    ...

    ReplyDelete
  8. Joost, could you please tell me where should i insert the "anonymous" code in above script...
    i am using VB code from above and please compile for errors...as i am not expert in .Net
    Thanks in Advance!!!!!!

    ReplyDelete
    Replies
    1. @Anonymous: are you talking about anonymous access? The script uses the ftp connection manager to contact the ftp server. So access should be handled in the connection manager. See this msdn page for more details about the ftpconnection in the script. You can 'overrule' the connection manager part and use your own credentials.

      Delete
  9. i am talking about the following script...
    please post it in VB.net and tell me in where should i insert below code in main script...please compile for errors...i am not .net Pro

    // Get a directory listing and fill the StringArray variables
    myFtpConnection.GetListing(out folderNames, out fileNames);
    ArrayList fileNamesArray = new ArrayList();

    if (fileNames != null)
    {
    // Copy StringArray to ArrayList to fit in Object variable
    fileNamesArray = new ArrayList(fileNames);

    // Optional sorter
    fileNamesArray.Sort();
    }

    // Fill ssis object variable
    Dts.Variables["FtpFileList"].Value = fileNamesArray;

    ReplyDelete
    Replies
    1. try something like this:

      ' Get a directory listing and fill the StringArray variables
      myFtpConnection.GetListing(folderNames, fileNames)

      Dim fileNamesArray As ArrayList

      If Not (fileNames Is Nothing) Then
      ' Copy StringArray to ArrayList to fit in Object variable
      fileNamesArray = New ArrayList(fileNames)
      ' Optional sorter
      fileNamesArray.Sort()
      End If
      ' Fill ssis object variable
      Dts.Variables("FtpFileList").Value = fileNamesArray

      Delete
  10. code fails again when there are no files, can u tell me where shud i insert the above code in Main code..

    that helps....

    ReplyDelete
    Replies
    1. replace the part between (including these two lines):
      ' Get a directory listing and fill the StringArray variables

      and

      Dts.Variables("FtpFileList").Value = fileNamesArray

      with the code above... tested it myself and it works. If it doesn't work then contact me via the "contact me page" and include the code and errormessage.

      Delete
  11. Joost...this works great for downloading the files, but what if I needed to create the parent directory on my side and then download the files into that directory? So if the files on the FTP server reside in FolderA, I need to create FolderA on my side and the download all of the files in FolderA.

    ReplyDelete
    Replies
    1. The code is for populating the foreach loop only, but you can use the variable from the foreach loop (that contains the complete ftp file path) to create a folder. You could for example create a second string variable with an expression on it that converts the string from the first variable into a local path. Now you can use that variable to create a folder with an Execute File Task and you can also use it in the FTP Task as a download (destination) folder.

      Delete
  12. is it possible to modify this code to not only download the files but also it's parent folder?

    ReplyDelete
    Replies
    1. Not sure what you mean... this code is for populating the foreach loop only. It only returns filepaths. You can make a recursive function to also loop through subfolders (or if you don't know .Net you can use my custom ftp file enumerator.
      Creating local folders should be done as described above.

      An other great resources is the SSIS MSDN Forum: http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/threads/

      Delete
  13. Excellent Example. Solved all my problems. Thanks a lot!!!!

    ReplyDelete
  14. Thank you for the code. Very good job.

    One small problem I have. I want to download only certain files.

    All files that start with "test" as an example.

    I've already tried with String.StartsWith but that does not work.

    Can you help me?

    ReplyDelete
    Replies
    1. One option could be to loop trough the string array or the arraylist and add an if-statement inside. Something like (haven't tested it):

      // Loop through arraylist of ftp files
      ArrayList fileNamesArrayFiltered = new ArrayList();
      foreach (string value in fileNamesArray)
      {
      // Check if it starts with "test"
      if (value.StartsWith("test"))
      {
      fileNamesArrayFiltered.Add(value);
      }
      }

      // Optional sorter
      fileNamesArrayFiltered.Sort();

      // Fill ssis object variable
      Dts.Variables["FtpFileList"].Value = fileNamesArrayFiltered;

      An other option could be to use my custom ftp enumerator. It has a filter in it.

      Delete
  15. Joost,
    Thanks for this, and for all your work responding to users. Without using too much of your time, could you point me in the best direction for using your ftp enumerator for only grabbing the most recent file in a dir? I've searched for ways to alter the script or to use the sort desc and then only run the loop one time, but I haven't hit on the answer.

    Thanks again,
    scott

    ReplyDelete
    Replies
    1. Hi Scott,

      The standard FTP Connection in SSIS doesn't support filedates. So the FTP FILE Enumerator wont help you unless the date is stored in de filename.

      I think you have two options:
      1) .Net coding in a Script Task (any experience?)
      2) Use a FTP client that does support filedates and call it via the Execue Process Task.

      Regards,

      Joost

      Delete
  16. how would i get all the files in the subfolders?

    ReplyDelete
    Replies
    1. You have t make a recursive method that gets all the files and sub folders in a certain folder. And for each folder you call this method again...

      I created a custom foreach ftp file enumerator that does it for you if you don't want to program.

      Delete
  17. i made th same example but using excel as source data , but i can't excute the package i have an error :

    Erreur sur Tâche de flux de données [Source Excel [1]] : Code d'erreur SSIS DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. Échec de l'appel de la méthode AcquireConnection vers le gestionnaire de connexions « Gestionnaire de connexions Excel ». Code d'erreur : 0xC0202009. Des messages d'erreur peuvent être envoyés au préalable avec des informations indiquant la raison de l'échec de la méthode AcquireConnection.


    Erreur sur Tâche de flux de données [SSIS.Pipeline] : Échec de la validation de composant « Source Excel » (1). Code d'erreur : 0xC020801C.


    Erreur sur Tâche de flux de données [SSIS.Pipeline] : Échec de la validation d'un ou de plusieurs composants.


    Erreur sur Tâche de flux de données : Des erreurs se sont produites au cours de la validation de la tâche.


    ReplyDelete
    Replies
    1. Sorry don't speak/read French. But I think you have to set delay validation property to true because the excel doesn't yet exist when starting the package. Try setting it for the Excel Connection Manager and/or for the Data Flow Task.

      Delete
  18. Thanks, this article was really helpful. I do wish the SSIS FTP Task was more robust though.

    ReplyDelete
  19. thanks for sharing .. it helped :)

    ReplyDelete
  20. Hi there, this is a superb script. However what ammendments would you sugget for automatically moving the file on the FTP to a "Processed" folder; so that the file doesn't get downloaded again when the script re-runs. I have added a "Remove" file ammendment, however I would like to store it in a separate folder.

    ReplyDelete
  21. Hi Joost, I need to download the file from local to ftp one by one from the folder with foreach loop container and make sure that if any file loading fail then it try again 2 time and if its still fails then its sent us the meessage in a catch blog.

    ReplyDelete
    Replies
    1. Hello Unknown,

      If you want to upload multiple files to the FTP server then you should use the regular FILE enumerator (not this custom FTP FILE enumerator). Within the loop you can add a FTP task to upload the file. With a retry trick http://microsoft-ssis.blogspot.com/2014/06/retry-task-on-failure.html you can accomplish your other request.

      Delete
  22. i did not understand the data flow part, can you please add a video.
    I am downloading images files.

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.