Saturday, 15 January 2011

Use filedates in SSIS

Case
A while ago someone asked me if it was possible to use the actual filedate of a file in SSIS.

Solution
Well, a Script task can retrieve the creation and modify date of a file. In this example I will import the latest file in a folder and delete all files older than 5 days. And as a bonus I will add an audit column with the filedate. A lot of steps (may be a bit too much for the experienced users, but not for the novice ones).

1) Create variables
Create three variables named FilePath (string), FileDate (datetime) and SourceFolder (string). Fill the SourceFolder variable with an existing path where the sourcefiles are stored.
Variables










2) Script task
Drag a Script task and a Data Flow task to you Control Flow and give them suitable names.
Script and Data Flow task















3) Map variables in Script task
Edit the Script task and select the FilePath and FileDate as ReadWrite variables and the SourceFolder as ReadOnly variable.
Map variables



















4) The Script
Copy the code from the main method to your code and add the extra using.
// C# code 
// Script to get the last modified file
// Note: if there are two files with both
// the highest modified date, the script
// will pick a random one.
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_9f1bb3d696fe44b9b5e50ee8f5f648b5.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Get all files in the SourceFolder
            string[] sourceFiles = Directory.GetFiles(Dts.Variables["User::SourceFolder"].Value.ToString());

            // Store highest date en file
            DateTime highestDate = new DateTime();
            string lastFile = string.Empty;

            // Variable for file information
            FileInfo fileInfo;

            // Loop through sourceFiles
            foreach (string currentFile in sourceFiles)
            {
                // Fill fileInfo variable with file information
                fileInfo = new FileInfo(currentFile);

                // Choose between creation date and lastmodified date
                // Compare the file date to the variable
                if (fileInfo.LastWriteTime > highestDate)
                {
                    // Fill variables if the current file is the newest
                    lastFile = currentFile;
                    highestDate = fileInfo.CreationTime;
                }
            }

            // Return values to the SSIS variables
            Dts.Variables["User::FilePath"].Value = lastFile;
            Dts.Variables["User::FileDate"].Value = highestDate;

            // Finish script
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

5) Edit Data Flow
Create a Flat file source, Derived Column and a target of your choice. It should look something like this:
Data Flow






















6) Source File
Add a connection manager to your Flat File Source. Select one of the files in your source folder for this connection manager. It doesn't matter which one, because the connection string will be overwritten by an expression. Add the colums to finish it. Now go to the properties of your newly created connection manager and add an expression.
Right Click if you don't see the properties


Add an expression






















7) Expression
Add an expression for the ConnectionString Property so the ConnectionString will be overwritten by the value of the FilePath variable.
Create an expression




















8) Derived Column
Edit the Derived Column and add a new column that is filled with the variable FileDate.
New column with FileDate



















9) Target
Finish the dataflow by adding a target and test the first part of this mapping.
The test result

















Now the second part which deletes files older than 5 days. We could do this in one simple Script task, but I will try to limit the .Net code because this is an SSIS blog and not a .Net blog.


10) Delete old files
Add a Foreach Loop to the Control Flow with a Script Task and a File System Task in it. Your Control Flow should look something like this.
Continuing Control Flow






















11) Configure Foreach Loop
Edit the Foreach Loop and select the Foreach File Enumerator. After that enter the path of you sourcefolder.
Foreach File



















12) Expression
This step is not really necessary for this example, but it's just a bit nicer. Goto the Expressions on the Collection tab of the Foreach Loop and add an expression for the Directory which will overwrite the Folder property (don't let the different names confuse you) with the variable SourceFolder.
Add an expression for the Directory/Folder






















13) Variable Mapping
Map the FilePath variable to Index 0 in the Foreach Loop.
Variable Mappings



















14) Script task
Copy the code from the main method to your code and add the extra using. Don't fill in the variables this time (it's done by script).
// C# code 
// Fill SSIS variable with the last modified date
// of the current source file
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_cb8dd466d98149fcb2e3852ead6b6a09.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Lock SSIS variables
            Dts.VariableDispenser.LockForRead("User::FilePath");
            Dts.VariableDispenser.LockForWrite("User::FileDate");

            // Create a variables 'container' to store variables
            Variables vars = null;

            // Add variables from the VariableDispenser to the variables 'container'
            Dts.VariableDispenser.GetVariables(ref vars);

            // Variable for file information
            FileInfo fileInfo;

            // Fill fileInfo variable with file information
            fileInfo = new FileInfo(vars["User::FilePath"].Value.ToString());

            // Choose between creation date and lastmodified date
            // Fill SSIS variable with last modified date
            vars["User::FileDate"].Value = fileInfo.LastWriteTime;

            // Release the locks
            vars.Unlock();
            
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

15)  Precedence Constraint
Enter the following Contraint to the line between the Script Task and the File System Task:
DATEDIFF("dd", @[User::FileDate], @[System::StartTime] ) > 5
Date check in the Expression

















16) File System Task
Delete the file which is stored in the variabe FilePath. You need to enter some dummy default value in the variable. Otherwise you will get an error message: Variable "FilePath" is used as an source or destination and is empty.
Delete the old file



















17) The result
Now test the package with a couple of files.
Last file wasn't old.






















That's that! Let me know if you have some other interesting SSIS ideas with file properties.

Update: a whole bunch of file properties.

18 comments:

  1. Do you know how to change the "last modified date" on a file using SSIS? I'm using a File System Task to move & rename a file, but I can't get it to update the "last modified date". Thanks.

    ReplyDelete
  2. @Andy Abel: Copy or rename wont change the LastModifiedDate. So the File System Task wont help you here. You could use a Script Task with some .net code:
    File.SetLastWriteTime(Dts.Variables["User::FilePathVariable"].Value.TosString(), DateTime.Now);

    More info here

    ReplyDelete
  3. Thank your for this useful post.

    In my package the the audit column is already created in the destination table.
    How can I update this column with the FileDate variable ?

    Thanks in advance

    ReplyDelete
    Replies
    1. Hi,

      You can use the Derived Column in your Data Flow to get the values of the variables in a column. An alternative could be to run an update query with these variables in a Execute SQL Task. Let me know if there are any problems.

      Joost

      Delete
  4. Hi Joost,

    In my package I want to move the flat files to an archive folder after they are imported in DB. The archive-Folder is in Source folder. From foreach loop i get the source directory and save it in a variable named sourceDirectory. Now I want to say archiveDirectory = sourceDirectory + "\\Archive". Could you please help me to solve this problem?

    thanks

    sead

    ReplyDelete
    Replies
    1. Hi Sead,

      Create a variable with an expression (don't forget to switch the validate as expresion property to true). The expression looks something like: @[User::sourceDirectory] + "\\Archive".
      You can use that variable in a File System Task to indicate the destination folder.

      Delete
  5. Helpful. Thanks!

    ReplyDelete
  6. Well done - thanks for sharing that.

    ReplyDelete
  7. Thanks for the post...

    I have a set of files in the folder such as Cusomer.CSV,Product.CSV,Geography.CSV...I got to mention the filename some where to ensure the package picks up the correct file and load into database..The above code works for latest file in a folder...But I need to pick up the latest file from a group of different files and load the same...any help?

    ReplyDelete
    Replies
    1. Do you have products_A.CSV, products_B.CSV and products_C.CSV and you want to determine the latest file of all files starting with products?

      Delete
  8. ya I have something like products1.csv,products2.csv,products3.csv....and each file with different creation date/modified date..... I ll have 3 seperate packages for product,customer , geography and the product package has to compare all the product files and take the latest file and load it.

    any help?thank u in advance

    ReplyDelete
    Replies
    1. The getfiles method has a searchpattern option:

      string[] sourceFiles = Directory.GetFiles(Dts.Variables["User::SourceFolder"].Value.ToString(), "product*.csv");

      Delete
  9. thank u for yoour help:) worked perfectly!!!

    ReplyDelete
  10. Perfect explanation, well documented. Thank you

    ReplyDelete
  11. Thank you very much. unfortunately I having trouble by getting the error:
    Error: A deadlock was detected while trying to lock variables "User::FilePath" for read access and variables "User::FileDate" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out.

    Is there some thing I can do about it?

    Thanks,
    Sasha

    ReplyDelete
    Replies
    1. If the error is in the second script. See step 14 (don't add them as readonly or readwrite variables)

      Delete
    2. I had a similar problem. Solution? Create my files and then copy them by giving it a unique name and add the link of the unique name to the user email.
      Why? when creating the file, it should not be open by anyone. unique name save it all.

      Delete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.