Case
The standard Foreach Loop File Enumerator only has a wildcard filter, but in
some cases a regular expression filter would be more useful. See/vote this
request at Microsoft Connect.
Solution
At the moment I'm working on a custom
File Enumerator with regular expression support, but for those who don't want to wait or don't want to use custom components... You could achieve the same result with a Script Task.
*update: Now also available as
Custom Foreach Enumerator. *
1) Foreach Loop
For this case I will use a
standard Foreach File Enumerator that fills a variable FilePath.
The filter is *.* so all files will be returned.
|
Standard foreach loop |
2) Variables
I will use two extra variables in this example:
RegexFilter (string) for storing the regular expression and
PassesRegexFilter (boolean) for indicating whether the filename passes the regular expression filter.
|
Variables |
3) Script Task
Add a Script Task in front of the Data Flow Task and give it a suitable name.
|
Script Task |
4) Add variables
Edit the Script Task and add the FilePath and RegexFilter as ReadOnlyVariables and the PassesRegexFilter as ReadWriteVariable.
|
Variables |
5) The Script
Copy the following script to the Script Task.
// C# Code for filtering filenames with Regex
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Text.RegularExpressions; // Added
using System.IO; // Added
namespace ST_02b6595da2274d7182409fb43af929ae.csproj
{
[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
#region VSTA generated code
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
public void Main()
{
// Get the filename from the complete filepath
String FileName = Path.GetFileName(Dts.Variables["User::FilePath"].Value.ToString());
// Create a regex object with the pattern from the SSIS variable
Regex FileFilterRegex = new Regex(Dts.Variables["User::RegexFilter"].Value.ToString());
// Check if it is match and return that value (boolean) to the SSIS variable
Dts.Variables["User::PassesRegexFilter"].Value = FileFilterRegex.IsMatch(FileName);
Dts.TaskResult = (int)ScriptResults.Success;
}
}
}
or with VB.Net
' VB.Net Code for filtering filenames with Regex
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Text.RegularExpressions ' Added
Imports System.IO ' Added
<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
Enum ScriptResults
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
End Enum
Public Sub Main()
' Get the filename from the complete filepath
Dim FileName As String = Path.GetFileName(Dts.Variables("User::FilePath").Value.ToString())
' Create a regex object with the pattern from the SSIS variable
Dim FileFilterRegex As Regex = New Regex(Dts.Variables("User::RegexFilter").Value.ToString())
' Check if it is match and return that value (boolean) to the SSIS variable
Dts.Variables("User::PassesRegexFilter").Value = FileFilterRegex.IsMatch(FileName)
Dts.TaskResult = ScriptResults.Success
End Sub
End Class
6) Precedence Constraints Expression
Connect the Script Task to the Data Flow Task and add an expression that does the real filtering:
@[User::PassesRegexFilter] == true
|
Precedence Constraint with expression |
7) The result
For testing the result, I added an other Script Task below the Data Flow Task that logs all files. My regular expression in this test case: .*[0-9]\.csv$ (all csv files that end with a number in the filename)
|
The result |
Note 1: to gain some performance I should have changed the Foreach Loop File Enumerator wildcard from *.* to *.csv
Note 2: this method could be a bit inefficient if your file collection contains thousands of files and you only need two of them.