Sunday, 7 August 2011

Foreach ftp file enumerator

Case
I want to loop through all files in a FTP folder and process them. I need a Foreach ftp file enumerator, but the Foreach Loop component only loops through local files.

Solution
There are a couple of options. You could use a Script Task, a wildcard in the RemotePath Property of the FTP Task or a Third Party component to download all files to a certain folder and loop through those files with a Foreach Loop file enumerator. This is the most common approach.
This example uses a Script Task to get a list of ftp files and uses a Foreach From Variable Enumerator to loop through those files.

Update: There is now a custom Foreach FTP File Enumerator available.



1) Variables
We need a couple of variables to keep things flexible:
  • FtpWorkingDirectory: String variable with the remote folder path. Example: /root/data/
  • DownloadDirectory: String variable that contains the path of the download folder: Example: d:\SourceFiles\
  • FtpFileURL: String variable used in the foreach loop to store the remote filename in. Example: 01-01-2011.csv
  • FtpFileList: Object variable to store the remote file list in.
Variables to keep things flexible











2) Ftp Connection Manager
Create a FTP Connection Manager by right clicking in the Connection Manager pane and select New Connection... In the new window select FTP and click Add. After that fill in the FTP Connection Manager Editor. Test the connection manager and rename it to myFtpServer (this name is used in the Script Task).
New FTP connection




















After closing you can rename it.

















3) Script Task
Add a Script Task to the Control Flow and give it a suitable name. Edit the Script Task and add variable FtpWorkingDirectory as ReadOnly and FtpFileList as ReadWrite.
Script Task to get list of files






















4) The Script
Edit the script and copy the following code.
// #C Code
using System;
using System.Collections;       // Added
using System.Data;
using System.Net;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_f20cc76ba3bc47849f70c6cbfd4701dc.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            try
            {
                // Get the ftp connection from the Connection Managers collection
                ConnectionManager ftpServer = Dts.Connections["myFtpServer"];

                // Create a FTP connection object and us the credentials of connection manager
                FtpClientConnection myFtpConnection = new FtpClientConnection(ftpServer.AcquireConnection(null));

                // Open the connection
                myFtpConnection.Connect();

                // Set work folder with the value of the variable
                myFtpConnection.SetWorkingDirectory(Dts.Variables["FtpWorkingDirectory"].Value.ToString());

                // Create StringArrays for filenames and folders
                // The folderNames aren't used, but is mandatory
                // for the next method.
                String[] fileNames;
                String[] folderNames;

                // Get a directory listing and fill the StringArray variables
                myFtpConnection.GetListing(out folderNames, out fileNames);

                // Copy StringArray to ArrayList to fit in Object variable
                ArrayList fileNamesArray = new ArrayList(fileNames);
                
                // Optional sorter
                fileNamesArray.Sort();

                // Fill ssis object variable
                Dts.Variables["FtpFileList"].Value = fileNamesArray;

                // Close connection
                myFtpConnection.Close();

                // Close Script Task, set result to success
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Fire error and set result to failure
                Dts.Events.FireError(0, "FTP Script Task", "Error: " + ex.Message, string.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}

Or VB.Net

' VB.Net code
Imports System
Imports System.Collections
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        Try
            ' Get the ftp connection from the Connection Managers collection
            Dim ftpServer As ConnectionManager = Dts.Connections("myFtpServer")

            ' Create a FTP connection object and us the credentials of connection manager
            Dim myFtpConnection As FtpClientConnection = New FtpClientConnection(ftpServer.AcquireConnection(Nothing))

            ' Open the connection
            myFtpConnection.Connect()

            ' Set work folder with the value of the variable
            myFtpConnection.SetWorkingDirectory(Dts.Variables("FtpWorkingDirectory").Value.ToString())

            ' Create StringArrays for filenames and folders
            ' The folderNames aren't used, but is mandatory
            ' for the next method.
            Dim fileNames() As String
            Dim folderNames() As String

            ' Get a directory listing and fill the StringArray variables
            myFtpConnection.GetListing(folderNames, fileNames)

            ' Copy StringArray to ArrayList to fit in Object variable
            Dim fileNamesArray As ArrayList = New ArrayList(fileNames)

            ' Optional sorter
            fileNamesArray.Sort()

            ' Fill ssis object variable
            Dts.Variables("FtpFileList").Value = fileNamesArray

            ' Close connection
            myFtpConnection.Close()

            ' Close Script Task, set result to success
            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            ' Fire error and set result to failure
            Dts.Events.FireError(0, "FTP Script Task", "Error: " + ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub
End Class

5) Foreach Loop
Add a Foreach Loop after the Script Task. Edit the Foreach Loop and on the Collection tab select Foreach From Variable Enumerator as enumerator and select the object variable FtpFileList as the variable.
Foreach From Variable Enumerator


















6) Foreach Loop Variable Mappings
On the Variable Mappings tab of the Foreach Loop select the variable FtpFileURL.
Variable Mapping


















7) FTP Task General
Add a FTP task within the Foreach Loop. Edit the FTP Task and on the General tab you can select the FTP Connection which we created in step 2. And you can give it a suitable name.
General: select the ftp connection


















8) FTP Task File Transfer
On the File Transfer tab you can select the DownloadDirectory variable for the local folder. In the Operation category you should select Receive files as the operation. For RemotePath you can enter a dummy value because we are overriding that value with an expression in the next step.
File Transfer


















9) FTP Task Expressions
At the Expressions tab you must add an expression for the RemotePath: @[User::FtpWorkingDirectory] +  @[User::FtpFileURL] This expression combines the working folder (/root/data/) and the filename (01-01-2011.csv) into: /root/data/01-01-2011.csv.
Click at 2 to open the Expression Editor


















10) Process file in Data Flow
Add a Data Flow Task within the Foreach Loop to process the downloaded files.
Process files






















10) Process file in Connection Manager
Create a Data Flow like the example below. I named the connection for the csv file myDownloadedFtpFile. Go to the properties of that connection and add an expression for the ConnectionString so we can use it in the Foreach Loop: @[User::DownloadDirectory] + @[User::FtpFileURL]
A simple data flow





















Expression to override the ConnectionString property













11) The result
Run the package to see the result.
The result

















Download example package

Note: In the Script Task you can use the .Net variable folderNames to create some recursive function to loop through all subfolders as well.

Note: If you just want to download all files with a Script Task and use a Foreach Loop file enumerator to process all files, you can replace the C# lines 48 - 55 or the VB.Net lines 41 - 48 with the following code:

// C# Code
// Download all files at once. Don't forget to add the SSIS variable DownloadDirectory to the ReadOnlyVariables
myFtpConnection.ReceiveFiles(fileNames, Dts.Variables["DownloadDirectory"].Value.ToString(), true, false);



' VB.Net code
' Download all files at once. Don't forget to add the SSIS variable DownloadDirectory to the ReadOnlyVariables
myFtpConnection.ReceiveFiles(fileNames, Dts.Variables("DownloadDirectory").Value.ToString(), true, false)



Update: There is a custom Foreach FTP File Enumerator available.

Monday, 1 August 2011

Exporting images with SSIS

Case
In a previous article I showed you how to import images (or other files) with SSIS into a SQL Server table. This article shows you how to get them out again with the Export Column Transformation.

Solution
This example  assumes that you have a filled table named ImageStore which I created in the previous article.
CREATE TABLE [dbo].[ImageStore](
 [id] [int] IDENTITY(1,1) NOT NULL,
 [imageName] [nvarchar](50) NULL,
 [imageFile] [varbinary](max) NULL
) ON [PRIMARY]

1) Variables
Add a string variable to the package to store the path of the export folder.
I called mine ExportFolder. Fill it with a value like: C:\Users\Joost\Pictures\Export\
Variable to store the export folder









2) OLE DB Source
Add an OLE DB Source component that reads from the table ImageStore. We need the columns imageName and imageFile.
OLE DB Source component













3) Derived Column
With the folderpath from the variable and the filename from the column imageName we create an export filepath which is needed for the Export Column component. Name it NewFilePath.
Expression: @[User::ExportFolder] + imageName






















4) Export Column
Add an Export Column component and select the imageFile column as Extract Column and the NewFilePath column as File Path Column.
Export Column



















5) The result
Run the package and watch the folder for the result.
The result

















Note: it also works with other filetypes, not just images.

Importing images with SSIS

Case
I want to import files (images) with SSIS to a SQL Server table. And export those images to an other folder.

Solution
The easiest way to import files/images with SSIS, is with the Import Column Transformation. You throw in a filepath and it creates a blob column with binary data. But the Script Component can be useful too. For export you can use the Export Column Transformation.

Three examples:
A) Importing images with Import Column
B) Importing images with Script Component and Import Column
C) Importing images with Script Component


A) Importing images with Import Column
For the first example I have created a CSV file with filepaths of images named bikes.csv:
C:\Users\Joost\Pictures\MountainBike1.jpg
C:\Users\Joost\Pictures\MountainBike2.jpg
C:\Users\Joost\Pictures\MountainBike3.jpg

My target table looks like this:
CREATE TABLE [dbo].[ImageStore](
 [id] [int] IDENTITY(1,1) NOT NULL,
 [imageName] [nvarchar](50) NULL,
 [imageFile] [varbinary](max) NULL
) ON [PRIMARY]

Note: If you want to store images in a SQL Server table use the VARBINARY datatype because the IMAGE datatype is due to be removed from SQL Server.

A1) Flat File Source
Go to your Data Flow and use a Flat File Source to read the CSV file from above.
Import file with only one column: ImagePath















A2) Get filename from path
I also want to store the original filename in my table. I use an expression in a Derived Column to get the filename from the filepath. Add a Derived Column and add a new column named FileName. The expression is:
RIGHT(ImagePath,FINDSTRING(REVERSE(ImagePath),"\\",1) - 1)
Extract filename from filepath



















A3) Import Column
Now the actual importing of the image files into SSIS. Add a Import Column to the Control Flow.
Import Column






















A4) Input Columns
Edit the Import Column transformation and go to the Input Columns tab. Select the column that contains the filepath of the images.
Select column




















A5) Input and Output Properties
Go to the Input and Output Properties tab and add a new output column of the type DT_IMAGE and give it the name ImageData. Remember the generated ID because you need it for the next step.
New output column: DT_IMAGE




















A6) Connect input and output
Now go to the Input Columns. Select the input column and change the FileDataColumnId to the ID of the output column (24 in my case).
FileDataColumnId




















A7) OLE DB Destination
Now you're ready to import the data in the database table. Add an OLE DB Destination and select the table ImageStore. Connect FileName to imageName and ImageData to imageFile.
OLE DB Destination



















A8) The result
Run the package and watch the table for the result.
The result
























B) Importing images with Script Component and Import Column
This example only replaces the Flat File Source Component from example A into a Script Component that reads all files in a certain folder.
Replace CSV file by Script Component



















B1) Variable
Add a string variable to the package to store the path of the folder with images.
I called mine imageLocation. Fill it with a value like: C:\Users\Joost\Pictures\
Variable to store the import folder











B2) Script Component
Add a Script Component to the Data Flow and select Source as the Script Component Type.
Script Component Source





















B3) Select variable
Select the variable from step 1 as a ReadOnly variable.
Select ReadOnlyVariables



















B4) Inputs and Outputs
Go to the Inputs and Outputs tab and add an output column: imageName (Unicode string [DT_WSTR] 150).
Column for storing the image path



















B5) The Script
Edit the Script and replace the script with this one (we only need CreateNewOutputRows).
//C# Code
using System;
using System.Data;
using System.IO;        // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Get all files within the folder
        string[] allFiles = Directory.GetFiles(this.Variables.imageLocation.ToString());

        // Loop through the files in the folder
        foreach (string currentFile in allFiles)
        {
            // Create a new record foreach file
            this.Output0Buffer.AddRow();
            Output0Buffer.ImagePath = currentFile;
        }
    }
}

Or VB.Net

' VB.Net Code
Imports System
Imports System.Data
Imports System.IO       ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub CreateNewOutputRows()
        ' Get all files within the folder
        Dim allFiles() As String = Directory.GetFiles(Me.Variables.imageLocation.ToString())

        ' Loop through the files in the folder
        Dim currentFile As String
        For Each currentFile In allFiles
            ' Create a new record foreach file
            Me.Output0Buffer.AddRow()
            Output0Buffer.imagePath = currentFile
        Next
    End Sub
End Class

B6) Restore Invalid Column References
Connect the new Script Component to the Derived Column and map the old imagePath column to new imagePath Column by double clicking on the Derived Column and pressing OK to confirm the mapping.
Restore Invalid Column References editor

















B7) The result
Run the package and watch the table for the result.
The result

























C) Importing images with Script Component
This example uses a Script Component to import all files in a folder to the same database table as in the first examples. The Script Component replaces all components except the OLE DB Destination. The Script Component loads the file into a Bit array via a FileStream object. That array can be used to fill an image column in SSIS.

Note: this is just an example to show that there are other ways to import files. Where possible use the Import Column Transformation.

C1) Variable
Add a string variable to the package to store the path of the folder with images.
I called mine imageLocation. Fill it with a value like: C:\Users\Joost\Pictures\
Variable to store the import folder








C2) Script Component
Add a Script Component to the Data Flow and select Source as the Script Component Type.
Script Component Source





















C3) Select variable
Select the variable from step 1 as a ReadOnly variable.
Select ReadOnlyVariables



















C4) Inputs and Outputs
Go to the Inputs and Outputs tab and add two output columns: imageName (Unicode string [DT_WSTR] 50) and imageFile (image [DT_IMAGE]).
Create two output columns



















C5) The Script
Edit the Script and replace the script with this one (we only need CreateNewOutputRows).
// C# Code
using System;
using System.Data;
using System.IO;        // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Get all files within the folder
        string[] allFiles = Directory.GetFiles(this.Variables.imageLocation.ToString());

        // Variable for storing file properties
        FileInfo fileInfo;
        
        // Loop through the files in the folder
        foreach (string currentFile in allFiles)
        {
            // Fill fileInfo variable with file information
            fileInfo = new FileInfo(currentFile);

            // Create new record
            this.Output0Buffer.AddRow();

            // Fill filename column
            Output0Buffer.imageName = fileInfo.Name;

            // Stream the file into a byte[] object which can be used as column data
            FileStream fs = File.OpenRead(fileInfo.FullName);
            byte[] b = new byte[fs.Length];
            fs.Read(b, 0, b.Length);
            
            // Fill column and close filestream
            Output0Buffer.imageFile.AddBlobData(b);
            fs.Close();
        } 
    }
}

or VB.Net

' VB.Net code
Imports System
Imports System.Data
Imports System.IO           ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub CreateNewOutputRows()
        ' Get all files within the folder
        Dim allFiles As String() = Directory.GetFiles(Me.Variables.imageLocation.ToString())

        ' Variable for storing file properties
        Dim fileInfo As FileInfo

        ' Loop through the files in the folder
        Dim currentFile As String
        For Each currentFile In allFiles

            ' Fill fileInfo variable with file information
            fileInfo = New FileInfo(currentFile)

            ' Create new record
            Me.Output0Buffer.AddRow()

            ' Fill filename column
            Output0Buffer.imageName = fileInfo.Name

            ' Stream the file into a byte() object which can be used as column data
            Dim fs As FileStream = File.OpenRead(fileInfo.FullName)
            Dim b(fs.Length - 1) As Byte
            fs.Read(b, 0, b.Length)

            ' Fill column and close filestream
            Output0Buffer.imageFile.AddBlobData([b])
            fs.Close()
        Next
    End Sub
End Class

Note: Don't forget to add error handling and logging.

C6) OLE DB Destination
Now you're ready to import the data in the database table. Add an OLE DB Destination and select the table ImageStore. Connect imageName to imageName and imageFile to imageFile.
Connect input and destination columns



















C7) The result
Run the package and watch the table for the result.
The result

















Note: it also works with other filetypes, not just images.