A while ago someone asked me if it was possible to use the actual filedate of a file in SSIS.
Solution
Well, a Script task can retrieve the creation and modify date of a file. In this example I will import the latest file in a folder and delete all files older than 5 days. And as a bonus I will add an audit column with the filedate. A lot of steps (may be a bit too much for the experienced users, but not for the novice ones).
1) Create variables
Create three variables named FilePath (string), FileDate (datetime) and SourceFolder (string). Fill the SourceFolder variable with an existing path where the sourcefiles are stored.
Variables |
2) Script task
Drag a Script task and a Data Flow task to you Control Flow and give them suitable names.
Script and Data Flow task |
3) Map variables in Script task
Edit the Script task and select the FilePath and FileDate as ReadWrite variables and the SourceFolder as ReadOnly variable.
Map variables |
4) The Script
Copy the code from the main method to your code and add the extra using.
// C# code // Script to get the last modified file // Note: if there are two files with both // the highest modified date, the script // will pick a random one. using System; using System.Data; using System.IO; // Added using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; namespace ST_9f1bb3d696fe44b9b5e50ee8f5f648b5.csproj { [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { #region VSTA generated code enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion public void Main() { // Get all files in the SourceFolder string[] sourceFiles = Directory.GetFiles(Dts.Variables["User::SourceFolder"].Value.ToString()); // Store highest date en file DateTime highestDate = new DateTime(); string lastFile = string.Empty; // Variable for file information FileInfo fileInfo; // Loop through sourceFiles foreach (string currentFile in sourceFiles) { // Fill fileInfo variable with file information fileInfo = new FileInfo(currentFile); // Choose between creation date and lastmodified date // Compare the file date to the variable if (fileInfo.LastWriteTime > highestDate) { // Fill variables if the current file is the newest lastFile = currentFile; highestDate = fileInfo.CreationTime; } } // Return values to the SSIS variables Dts.Variables["User::FilePath"].Value = lastFile; Dts.Variables["User::FileDate"].Value = highestDate; // Finish script Dts.TaskResult = (int)ScriptResults.Success; } } }
5) Edit Data Flow
Create a Flat file source, Derived Column and a target of your choice. It should look something like this:
Data Flow |
6) Source File
Add a connection manager to your Flat File Source. Select one of the files in your source folder for this connection manager. It doesn't matter which one, because the connection string will be overwritten by an expression. Add the colums to finish it. Now go to the properties of your newly created connection manager and add an expression.
Right Click if you don't see the properties |
Add an expression |
7) Expression
Add an expression for the ConnectionString Property so the ConnectionString will be overwritten by the value of the FilePath variable.
Create an expression |
8) Derived Column
Edit the Derived Column and add a new column that is filled with the variable FileDate.
New column with FileDate |
9) Target
Finish the dataflow by adding a target and test the first part of this mapping.
The test result |
Now the second part which deletes files older than 5 days. We could do this in one simple Script task, but I will try to limit the .Net code because this is an SSIS blog and not a .Net blog.
10) Delete old files
Add a Foreach Loop to the Control Flow with a Script Task and a File System Task in it. Your Control Flow should look something like this.
Continuing Control Flow |
11) Configure Foreach Loop
Edit the Foreach Loop and select the Foreach File Enumerator. After that enter the path of you sourcefolder.
Foreach File |
12) Expression
This step is not really necessary for this example, but it's just a bit nicer. Goto the Expressions on the Collection tab of the Foreach Loop and add an expression for the Directory which will overwrite the Folder property (don't let the different names confuse you) with the variable SourceFolder.
Add an expression for the Directory/Folder |
13) Variable Mapping
Map the FilePath variable to Index 0 in the Foreach Loop.
Variable Mappings |
14) Script task
Copy the code from the main method to your code and add the extra using. Don't fill in the variables this time (it's done by script).
// C# code // Fill SSIS variable with the last modified date // of the current source file using System; using System.Data; using System.IO; // Added using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; namespace ST_cb8dd466d98149fcb2e3852ead6b6a09.csproj { [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { #region VSTA generated code enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion public void Main() { // Lock SSIS variables Dts.VariableDispenser.LockForRead("User::FilePath"); Dts.VariableDispenser.LockForWrite("User::FileDate"); // Create a variables 'container' to store variables Variables vars = null; // Add variables from the VariableDispenser to the variables 'container' Dts.VariableDispenser.GetVariables(ref vars); // Variable for file information FileInfo fileInfo; // Fill fileInfo variable with file information fileInfo = new FileInfo(vars["User::FilePath"].Value.ToString()); // Choose between creation date and lastmodified date // Fill SSIS variable with last modified date vars["User::FileDate"].Value = fileInfo.LastWriteTime; // Release the locks vars.Unlock(); Dts.TaskResult = (int)ScriptResults.Success; } } }
Date check in the Expression |
Delete the old file |
Last file wasn't old. |
Great Post!
ReplyDeleteDo you know how to change the "last modified date" on a file using SSIS? I'm using a File System Task to move & rename a file, but I can't get it to update the "last modified date". Thanks.
ReplyDelete@Andy Abel: Copy or rename wont change the LastModifiedDate. So the File System Task wont help you here. You could use a Script Task with some .net code:
ReplyDeleteFile.SetLastWriteTime(Dts.Variables["User::FilePathVariable"].Value.TosString(), DateTime.Now);
More info here
Thank your for this useful post.
ReplyDeleteIn my package the the audit column is already created in the destination table.
How can I update this column with the FileDate variable ?
Thanks in advance
Hi,
DeleteYou can use the Derived Column in your Data Flow to get the values of the variables in a column. An alternative could be to run an update query with these variables in a Execute SQL Task. Let me know if there are any problems.
Joost
Hi Joost,
ReplyDeleteIn my package I want to move the flat files to an archive folder after they are imported in DB. The archive-Folder is in Source folder. From foreach loop i get the source directory and save it in a variable named sourceDirectory. Now I want to say archiveDirectory = sourceDirectory + "\\Archive". Could you please help me to solve this problem?
thanks
sead
Hi Sead,
DeleteCreate a variable with an expression (don't forget to switch the validate as expresion property to true). The expression looks something like: @[User::sourceDirectory] + "\\Archive".
You can use that variable in a File System Task to indicate the destination folder.
Helpful. Thanks!
ReplyDeleteWell done - thanks for sharing that.
ReplyDeleteThanks for the post...
ReplyDeleteI have a set of files in the folder such as Cusomer.CSV,Product.CSV,Geography.CSV...I got to mention the filename some where to ensure the package picks up the correct file and load into database..The above code works for latest file in a folder...But I need to pick up the latest file from a group of different files and load the same...any help?
Do you have products_A.CSV, products_B.CSV and products_C.CSV and you want to determine the latest file of all files starting with products?
Deleteya I have something like products1.csv,products2.csv,products3.csv....and each file with different creation date/modified date..... I ll have 3 seperate packages for product,customer , geography and the product package has to compare all the product files and take the latest file and load it.
ReplyDeleteany help?thank u in advance
The getfiles method has a searchpattern option:
Deletestring[] sourceFiles = Directory.GetFiles(Dts.Variables["User::SourceFolder"].Value.ToString(), "product*.csv");
thank u for yoour help:) worked perfectly!!!
ReplyDeletePerfect explanation, well documented. Thank you
ReplyDeleteThank you very much. unfortunately I having trouble by getting the error:
ReplyDeleteError: A deadlock was detected while trying to lock variables "User::FilePath" for read access and variables "User::FileDate" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out.
Is there some thing I can do about it?
Thanks,
Sasha
If the error is in the second script. See step 14 (don't add them as readonly or readwrite variables)
DeleteI had a similar problem. Solution? Create my files and then copy them by giving it a unique name and add the link of the unique name to the user email.
DeleteWhy? when creating the file, it should not be open by anyone. unique name save it all.