The standard Foreach Loop File Enumerator only has a wildcard filter, but in some cases a regular expression filter would be more useful. See/vote this request at Microsoft Connect.
Solution
At the moment I'm working on a custom File Enumerator with regular expression support, but for those who don't want to wait or don't want to use custom components... You could achieve the same result with a Script Task.
*update: Now also available as Custom Foreach Enumerator. *
1) Foreach Loop
For this case I will use a standard Foreach File Enumerator that fills a variable FilePath. The filter is *.* so all files will be returned.
Standard foreach loop |
2) Variables
I will use two extra variables in this example: RegexFilter (string) for storing the regular expression and PassesRegexFilter (boolean) for indicating whether the filename passes the regular expression filter.
Variables |
3) Script Task
Add a Script Task in front of the Data Flow Task and give it a suitable name.
Script Task |
4) Add variables
Edit the Script Task and add the FilePath and RegexFilter as ReadOnlyVariables and the PassesRegexFilter as ReadWriteVariable.
Variables |
5) The Script
Copy the following script to the Script Task.
// C# Code for filtering filenames with Regex using System; using System.Data; using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; using System.Text.RegularExpressions; // Added using System.IO; // Added namespace ST_02b6595da2274d7182409fb43af929ae.csproj { [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { #region VSTA generated code enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion public void Main() { // Get the filename from the complete filepath String FileName = Path.GetFileName(Dts.Variables["User::FilePath"].Value.ToString()); // Create a regex object with the pattern from the SSIS variable Regex FileFilterRegex = new Regex(Dts.Variables["User::RegexFilter"].Value.ToString()); // Check if it is match and return that value (boolean) to the SSIS variable Dts.Variables["User::PassesRegexFilter"].Value = FileFilterRegex.IsMatch(FileName); Dts.TaskResult = (int)ScriptResults.Success; } } }or with VB.Net
' VB.Net Code for filtering filenames with Regex Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports System.Text.RegularExpressions ' Added Imports System.IO ' Added <System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _ <System.CLSCompliantAttribute(False)> _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase Enum ScriptResults Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure End Enum Public Sub Main() ' Get the filename from the complete filepath Dim FileName As String = Path.GetFileName(Dts.Variables("User::FilePath").Value.ToString()) ' Create a regex object with the pattern from the SSIS variable Dim FileFilterRegex As Regex = New Regex(Dts.Variables("User::RegexFilter").Value.ToString()) ' Check if it is match and return that value (boolean) to the SSIS variable Dts.Variables("User::PassesRegexFilter").Value = FileFilterRegex.IsMatch(FileName) Dts.TaskResult = ScriptResults.Success End Sub End Class
6) Precedence Constraints Expression
Connect the Script Task to the Data Flow Task and add an expression that does the real filtering:
@[User::PassesRegexFilter] == true
Precedence Constraint with expression |
7) The result
For testing the result, I added an other Script Task below the Data Flow Task that logs all files. My regular expression in this test case: .*[0-9]\.csv$ (all csv files that end with a number in the filename)
The result |
Note 1: to gain some performance I should have changed the Foreach Loop File Enumerator wildcard from *.* to *.csv
Note 2: this method could be a bit inefficient if your file collection contains thousands of files and you only need two of them.
This has helped me significantly, but I'm running into problems with this part:
ReplyDelete" _ _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase "
SSIS 2005 doesn't acknowledge System.AddIn.AddIn, or VSTARTScriptObjectModelBase. I tried using ScriptObjectModel, but now receive the error "Error 30387: Class 'ScriptTask_ed44248e579b4e0baaac2b3f49154d1d.ScriptMain' must declare a 'Sub New' because its base class 'Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptObjectModel' does not have an accessible 'Sub New' that can be called with no arguments."
Talking about the number of files in the collection, I guess my project is relatively straightforward since there is a beginning portion of the file that remains constant, so I can use that in Foreach Loop setup to only return files that begin with that portion, it's just there are now two separate processes that are based on the ending of the file names.
Thanks for your help.
Code is from SSIS 2008. Just copy the two new imports rows and the first 6 lines (not counting the empty lines) of the main method to your own main method. Let me know if it worked.
Deletehi joost!
ReplyDeletejust wondering if regex can handle two different file names (i.e., circle files with circle patterns and square files with square patterns) instead of two different extensions in one foreach loop container?
for example:
square_a.xlsx
square_b.xlsx
square_c.xlsx
circle_a.xlsx
circle_b.xlsx
Should be no problem at all. There is an or available within regex. The or operator is an pipe |
DeleteYou have to google for the right syntax.
He there. It would be easier, probably, to retrieve the names of the relevant files through a script task and put them into a variable (object) that would hold the collection and then loop over it in a foreach component. Just a thought... You would then avoid going through potentially unnecessary files. It would have the potential to be extremely fast regardless of the number of files in the relevant folder.
ReplyDeleteHelpful post. Thank you
ReplyDelete