Friday, 4 March 2011

Regular Expressions in SSIS

I want to use a Regular Expression to filter incorrect emailaddresses in SSIS.

There are a couple of options:
Let's elaborate the last option and use this file as a source:

1) Add source
Add a Flat File Source Component in your dataflow for the file example above.
Flat File mailinglist

2) Add Script Component
Add a Script Component, type transformation, after your Flat File Source Component.
Script Component (transformation)

3) Input Columns
 Edit the Script Component and select the email address column as an input column (readonly).
Input Columns tab

4) Outputs
Change the name of the default output port to CorrectEmail and set the ExclusionGroup to 1. Add an extra output port for the IncorrectEmail and set the ExclusionGroup to 1. Also select the input port as the SynchronousInputID. Now you have two output ports. One for the correct and one for the incorrect email address.
Inputs and Outputs tab

4) The Script
Hit the Edit Script button to open the VSTA editor. SSIS creates some standard methods, but we only use Input0_ProcessInputRow. You can remove the rest of the methods. We are also adding a new method that validates the email address. It uses a .Net Regex method. You can search the web for your own regex pattern and change it in the extra method.
// C# code
using System;
using System.Data;
using System.Text.RegularExpressions;   // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

public class ScriptMain : UserComponent

    // Method that will be executed for each row.
    public override void Input0_ProcessInputRow(Input0Buffer Row)
        // Validate the email address
        if (IsCorrectEmail(Row.Email))
            // Redirect the correct email address
            // Redirect the incorrect email address
    // A boolean method that validates an email address
    // with a regex pattern.
    public bool IsCorrectEmail(String emailAddress)
        // The pattern for email
        string emailAddressPattern = @"^(([^<>()[\]\\.,;:\s@\""]+"
                        + @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
                        + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
                        + @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
                        + @"[a-zA-Z]{2,}))$";
        // Create a regex object with the pattern
        Regex emailAddressRegex = new Regex(emailAddressPattern);
        // Check if it is match and return that value (boolean)
        return emailAddressRegex.IsMatch(emailAddress);

' VB.Net code (SSIS 2008)
Imports System
Imports System.Data
Imports System.Math
Imports System.Text.RegularExpressions    ' Added
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    ' Method that will be executed for each row.
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Validate the email address
        If (IsCorrectEmail(Row.Email)) Then
            ' Redirect the correct email address
            ' Redirect the incorrect email address
        End If
    End Sub

    ' A boolean method that validates an email address
    ' with a regex pattern.
    Public Function IsCorrectEmail(ByVal emailAddress As String) As Boolean
        ' The pattern for email
        Dim emailAddressPattern As String = "^(([^<>()[\]\\.,;:\s@\""]+" + _
        "(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@" + _
        "((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" + _
        "\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+" + _

        ' Create a regex object with the pattern
        Dim emailAddressRegex As Regex = New Regex(emailAddressPattern)

        ' Check if it is match and return that value (boolean)
        Return emailAddressRegex.IsMatch(emailAddress)
    End Function

End Class

5) The result
For testing purposes I added two Derived Columns with data viewes on the Data Flow Paths.
The result

Jamie Thomson, alias SSIS Junkie, has an other example in available. And you can also use Regular Expression to clean column values. More about that is described in this article.


  1. Keep in mind I'm not a C# programmer, but your code doesn't work for me. I get too many errors when placing in the code of a script task. Obviously this is most likely because you ommited to explain how this is accomplished in your example. I would like to see you put a little more step by step information instead of assuming that all of us reading your blog know what you know.

  2. @Kevin Ward: Thanks for the feedback. Try the example of SSIS MVP Jamie Thomson. May be you find his example easier.

    And an even better option for non-programmers is to use one of the third party/open source components I mentioned. Then you don't need my difficult coding.

    If you do want to use my coding. Let me know what errors you got or where you lost me.

  3. can you plz tell me where to place c# code???

    1. Double click the script component and hit the Edit Script-button on the first page.

  4. Hi Joost

    Thanks for the script. I created the script, it runs ok, but I do not see my data after the scripting step. I am trying to validate a 5 digits US zip code (12345), I used "^\\d{5}$" as the zipCode pattern.

    Best Regards

  5. For my previous question, I got it to work; the problem is that I forgot to save the script after I ran it, believing that it will be saved automatically.

    I followed the tutorial and got my package to work.

    Thank you.

    1. Code in the .cs or .vb file is automatically saved when you close the vsta editor. Only changes in the internal script component project have to be saved by pressing Save All. For example when you add a reference to a dll.


Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.

Related Posts Plugin for WordPress, Blogger...