Thursday, 29 December 2011

ProperCase in SSIS

Case
I have a source where all data is in uppercase and I want to use ProperCase (similar synonyms: UpperCamelCase, PascalCase, TitleCase, StartCase) instead. How do I do that in SSIS?
All uppercase








Solution
A) If your source is a SQL Server table then you could use a User-Defined Function with some TSQL code (example 1, example 2) to proper case your data.

B) You could come up with some clever and complex, but probably unreadable expression.

C) You could use a Script Component with one line of code for the actual proper case.

Let’s elaborate that Script Component solution. VB.net has a standard method for it: StrConv, but it’s also possible in C#.

1) Add Script Component
Go to the Data Flow and add a Script Component (type transformation) after your source. Connect it to your source and give it a suitable name.
Script Component (type transformation)
























2) Input Columns
Edit the Script Component and go to the Input Columns tab. Select the columns that you want to change as ReadWrite.
Input Columns: ReadWrite























3) The Script
Select the programming language (Visual Basic), edit the Script and copy the following VB.net code.
' VB.Net code
' The standard VB.net string function to convert
' a string to ProperCase (also called TitleCase)
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' Just one line of code foreach column
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Row.FirstName = StrConv(Row.FirstName, VbStrConv.ProperCase)
        Row.LastName = StrConv(Row.LastName, VbStrConv.ProperCase)
    End Sub

End Class

or use the official C# variant:
// C# code
// The C# variant of Proper Case (also title case)
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Globalization;     // Added
using System.Threading;         // Added

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // C# alternative for VB.Net StrConv uses culterinfo and threading
        // See: http://support.microsoft.com/kb/312890/
        CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
        TextInfo textInfo = cultureInfo.TextInfo;

        // The actual Proper Casing
        Row.FirstName = textInfo.ToTitleCase(Row.FirstName);
        Row.LastName = textInfo.ToTitleCase(Row.LastName);

        // Update: Or try this!
        // Row.FirstName = new System.Globalization.CultureInfo("en").TextInfo.ToTitleCase(Row.FirstName);
        // Row.LastName = new System.Globalization.CultureInfo("en").TextInfo.ToTitleCase(Row.LastName);
    }
}

NOTE: for some reason it doesn't work if all chars are uppercase, but it's also possible to use the VB.net VbStrConv.ProperCase in your C# code by adding a reference to Visual Basic.Net

// C# code
// Using the Visual Basic VbStrConv.ProperCase
// by adding a reference to Visual Basic.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Microsoft.VisualBasic;         // Added
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        Row.FirstName = Strings.StrConv(Row.FirstName, VbStrConv.ProperCase, 0);
        Row.LastName = Strings.StrConv(Row.LastName, VbStrConv.ProperCase, 0);
    }
}

Here is how you add the Visual Basic Reference.
Adding Visual Basic reference


















4) The result
I added a destination and two Data Viewers for testing purposes.
The Result

Saturday, 3 December 2011

Redirect duplicate rows

Case
I have duplicate records in my source which I want to redirect to a separate destination in my data flow task. Most solutions like the Aggregate Transformation or the Sort Transformation with "Remove rows with duplicate sort values"  selected, will remove the records instead of redirecting them.

Example: If there are three identical records (two columns), then two of them should be redirected as duplicate so that I can capture them in an error destination.

Solution
This solution uses a TSQL order by or a SSIS sort in combination with a Script Component to detect and redirect all duplicate records.

If your source is a SQL Server table you could probably come up with a fancy TSQL deduplication script to do the same in less processing time, but this solution is easy and also works for other kind of sources. Even if for example you only want to compare two out of five columns.

1) Source and Sort
Add your source to the Data Flow Task. You need to sort on the columns you want to use for deduplication. If your source is a database table, then you should add a GROUP BY clause to the query else you should add a SORT Transformation right after your source.
Add a sorted source


















2) Add Script Component
Add a Script Component type Transformation after your sorted source and give it a suitable name.
Add Script Component





















3) Edit Script Component: Input Columns
Edit the Script Component and go to the Input Columns tab and select all the columns you want to use for deduplication as ReadOnly (the same fields as you used for sorting).
Select columns.Type doesn't matter.





















4) Edit Script Components: Inputs and Outputs, Edit Output
Go to the Inputs and Outputs tab and change the name of the Output port from "Output 0" to "Unique". Also change the ExclusionGroup to 1.
Rename default output port





















5) Edit Script Components: Inputs and Outputs, New output
Add a new Output and give it the name Duplicate and change the ExlusionGroup to 1. To connect this new output port to the input port, change the SynchronousInputID property and select the input port.
New Output port






















6) The Script
Edit the Script and copy the following code. This script uses reflection to get all selected columns so that you don't have to change the script if you change the input columns. But read the coding comments.
// C# code
// This script automaticly compares the selected columns, but there is one 'bug':
// You have to edit and close this script again if you change input columns.
using System;
using System.Data;
using System.Reflection;                                // Added
using System.Text;                                      // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Create a variable to store the concatenated values for the previous row
    string previousRow = "";

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Create a variable to store the concatenated values of current row
        StringBuilder currentRow = new StringBuilder();

        foreach (PropertyInfo p in Row.GetType().GetProperties())
        {
            // We can't use the _IsNull columns, so ignore them. Also ignore the new output column
            if ((p.Name.ToLower().EndsWith("_IsNull") == false) && (p.Name.Equals("Duplicate") == false))
            {
                try
                {
                    // Concatenate value as string to variable
                    currentRow.Append(p.GetValue(Row, null).ToString() + "|");
                }
                catch (ArgumentException)
                {
                    // If the value is NULL (empty) then you can't get the value of it
                    currentRow.Append("NULL|");
                }
                catch (Exception ex)
                {
                    // Raise error because something unexpected went wrong
                    bool pbCancel = false;
                    this.ComponentMetaData.FireError(0, "MarkDuplicates", p.Name + ": " + ex.Message, string.Empty, 0, out pbCancel);
                }
            }
        }

        // Check if the current row and previous row are the same
        if (currentRow.ToString().Equals(previousRow))
        {
            // Redirect to duplicate output
            Row.DirectRowToDuplicate();
        }
        else
        {
            // Redirect to unique output
            Row.DirectRowToUnique();
        }

        // Fill previous row with current value for next check
        previousRow = currentRow.ToString();
    }
}

or VB.Net
' VB.Net code
' This script automaticly compares the selected columns, but there is one 'bug':
' You have to edit and close this script again if you change input columns.

Imports System
Imports System.Data
Imports System.Math
Imports System.Reflection                           ' Added
Imports System.Text                                 ' Added
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' Create a variable to store the concatenated values for the previous row
    Dim previousRow As String = ""

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Create a variable to store the concatenated values of current row
        Dim currentRow As StringBuilder = New StringBuilder()

        Dim p As PropertyInfo
        For Each p In Row.GetType().GetProperties()
            ' We can't use the _IsNull columns, so ignore them. Also ignore the new output column
            If ((p.Name.ToLower().EndsWith("_IsNull") = False) And (p.Name.Equals("Duplicate") = False)) Then
                Try
                    ' Concatenate value as string to variable
                    currentRow.Append(p.GetValue(Row, Nothing).ToString() + "|")
                Catch ex As ArgumentException
                    ' If the value is NULL (empty) then you can't get the value of it
                    currentRow.Append("NULL|")
                Catch ex As Exception
                    ' Raise error because something unexpected went wrong
                    Dim pbCancel As Boolean
                    Me.ComponentMetaData.FireError(0, "MarkDuplicates", p.Name + ": " + ex.Message, String.Empty, 0, pbCancel)
                End Try
            End If
        Next

        ' Check if the current row and previous row are the same
        If (currentRow.ToString().Equals(previousRow)) Then
            ' Redirect to duplicate output
            Row.DirectRowToDuplicate()
        Else
            ' Redirect to unique output
            Row.DirectRowToUnique()
        End If

        ' Fill previous row with current value for next check
        previousRow = currentRow.ToString()
    End Sub
End Class


6) Destinations
Add two destinations and connect the Data Flow Paths to the destinations. When connecting you will have to select the output port.
Choose right output















7) Testing
I added some dataviewers for testing purposes and selected the first three columns for deduplication.
The Result

























All roads lead to Rome, so let me know what your solution is!
Update: here is an other cool solution that doesn't need sorting.

Sunday, 27 November 2011

Long Validation of SSIS Packages in Visual Studio

Case
Opening my package in BIDS lasts for an eternity because the package validation takes forever. Is there a way to prevent long validation on opening?

Solution
SSIS validates all connections and the related components to make sure the metadata is valid. When you have a slow connection or just a very big package it will take some time to validate (up to several minutes). A big package could be divided in a couple of smaller packages, but the speed of your connections is usually something you can’t change.

Here are a couple of solutions/workarounds:
- Work Offline;
- DelayValidation;
- ValidateExternalMetadata;
- Offline Connection Managers.

Work Offline
There is an option in the SSIS menu called Work Offline. This project level property prevents validation if your connection is unavailable (or very slow). You must set it before opening a package.
Work Offline


















There are two concerns/downsides for this solution:
1) You can’t debug/run your package when you are in offline mode (of course, can’t do anything about that).
2) This offline property is only available when you have a SSIS project opened and if you closed the project while there where still packages opened, then it will reopen these packages when the project is loaded in BIDS… That will prevent you to switch to Offline before opening the packages.
You can overcome that by locating the [project].dtproj.user file in the project folder. Open this file in notepad and change the <OfflineMode> property from false to true.
<OfflineMode>












Also see of vote for this change request at Microsoft Connect: Open package in offline mode.

DelayValidation
Setting the DelayValidation property to true will prevent validation at design time. This property is available for the package, containers, connections and tasks.
Delay Validation






















There are two concerns/downsides for this solution:
1) Setting it on package level won’t help. Best thing is to set it on all tasks that use the slow connection.
2) On runtime the package will normally validate twice. Always right before executing a task and if DelayValidation is set to false it will also be validated up front. This will catch problems before the package runs for a long time. If your package runs for example 10 hours it would be very disappointing to have a validation error at the end.

So you have to keep that in mind. You could set this property to true while developing the packages and change it back to true before deploying it on the server.

ValidateExternalMetadata
This solution is similar to the DelayValidation property solution. While the DelayValidation property can only be set for the complete data flow task, this property can be set for individual data flow components. Set it to false to gain a shorter package opening time.

ValidateExternalMetadata























There are two concerns/downsides for this solution:
1) If a lot of components in the data flow task use the same slow connection, it could be a bit annoying to set it for all these components. In that case the DelayValidation property is probably faster.
2) Components that have this property on false will not be aware of changes in the metadata of external data sources.


Offline Connection Managers
This is a new option in Denali CTP 3 (SQL Server Integration Services 2012). It will allow you set a single connection to work offline by right clicking the connection manager. See Matt Massons blog post for more info about this new feature.
Offline Connection Managers






















There are two concerns/downsides for this solution:
1) You have to open the package before you can set this option. A workaround for that could be to first set the project to work offline before opening the package.
2) If you close the project all connection managers will be reset to work online.


Let me know if you have an other solution/workaround. Also see: Troubleshooting Package Development.

Thursday, 3 November 2011

How to pivot in SSIS

Case
The Pivot Transformation in SSIS has a big drawback: It has no interface which makes it hard to work with, even for seasoned users.

I have a CSV source that looks like this:
Unpivoted data 















And I want to transfer it to this:
Pivoted data







How does the Pivot Transformation work?

Solution
First vote for this feedback on Microsoft Connect because even in Denali (ctp3) there is still no good gui and that's a bit of a shame!
* UPDATE 17 November: Pivot GUI for SQL Server 2012 RC0 *

1) Source
For this example I use a CSV file. Add a Flat File Source for the file.
CSV as Source
















2) Sort
The data needs to be sorted on the key column. In this case that is the day column (it will be the unique key after the pivot). If your source is a database table, then you should add an order by clause in the source query. If your source can't be sorted, then you have to use a Sort Component. (see Performance Best Practices step 5)
Sort on key column



















If you forget to sort you will get more rows and a lot of null values.
Forgot to sort













3) Pivot input columns
Select all the columns you need as (readonly) input columns. In this case select them all.
Input columns




















4) Pivot Usage
Now the hard part. You need to determine the so called pivot usage for each column. There are 4 values (0 to 3):
means that the column will pass through unaffected
1 means that the column becomes the key of the pivot (aka the Set Key)
2 means that the column values become the column names of the pivot (aka the Pivot Column)
3 means that the column values are pivoted in the pivot

In our case:
Day gets pivot usage 1 because it will be the new key
Period gets pivot usage 0 because it will be passed through unaffected
Product gets pivot usage 2, because its values will be the new column names
Volume gets pivot usage 3, because its values will be used for the new columns
Pivot Usage (notice the LineageID for the next step)





















5) Output columns
Now we have to create new output columns. For the first column we will keep the same name: Day.
Create a new column named Day and in the SourceColumn property you need to fill in the LineageID from the source column Day. See picture of previous step. In my case it's 100, but yours could be different! Name and SourceColumn are the only things you need to fill in.
New column





















Do the same for the Period column: Give it the name Period and lookup the LineageID for the SourceColumn.

For the next three columns there are more things to fill in. First create three new columns and give them suitable names. I used the same names as the values in the excel sheet, but you can change that. Now the tricky part. The PivotKeyValue has to be the exact value of the various row values in the product column (Mountainbike, Tricycle, Recumbentbike). The SourceColumn refers to the column where the new value comes from. So in our case it will be the LineageID from the Volume column.
New columns





















6) The result
Now you can add a destination and test the result. I used data viewers to show you the result.
The result
























Download example package (2008 R2)
Download example file (CSV)













Monday, 31 October 2011

Encrypt files in SSIS

Case
Last month I did a post about encrypting certain columns with a Script Component. But I got questions about encrypting the whole file. So here are a couple of solutions.

Solutions
A) You could use an Execute Process Task to call an external executable which does the encryption for you. I personally don't like this task (feels you're a bit out of control), but it works. See this blog example and there are plenty of examples in the SSIS forum.
B) There are a couple of third party tasks that will do the encryption for you: BizCrypto, CozyRoc OpenPGP and PGP Encryption Task.
C) You could use the Script Task with some .Net coding to do it yourself.

Let's elaborate the Script Task solution:

1) Baseline Situation
The baseline situation of this case is an export from a database table to a CSV file. We will encrypt the CSV file when the data flow task is finished.
Baseline Situation






















2) Variable
Create a string variable named EncryptionKey to store your encryption key in and give it a value.
Rightclick in the Control Flow









3) Script Task
Add a Script Task to the Control Flow. Give it a suitable name and connect it to the Data Flow Task.
Script Task
















4) Select Variable
Edit the Script Task and select the variable EncryptionKey from step 2 as a ReadOnlyVariable.
Readonly variable



















5) The Script
Hit the Edit Script button and copy the code below. The script contains an extra row to show you how you can decrypt the file again. I have created two new static methods (Encrypt and Decrypt) that will do the job. They are simplified versions of these examples at CodeProject. If you want to use these scripts commercially, make sure you read the documentation and test it thoroughly! I don't take any responsibility whatsoever.

// C# code
using System;
using System.Data;
using System.IO;                        // Added 
using System.Security.Cryptography;     // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_2b17b0d6160d4866b944d20ce45d0994.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Get the filepath of the file that needs to be encrypted. This is
            // the name of the connection manager.
            string filepath = Dts.Connections["UsersExport"].ConnectionString;

            // Determine new name: d:\folder\file.csv => d:\folder\file_ENCRYPTED.csv
            string newFilepath = filepath.Substring(0, filepath.LastIndexOf(".") -1) + "_ENCRYPTED" + filepath.Substring(filepath.LastIndexOf("."));

            // Get password from SSIS variable
            string encryptionKey = Dts.Variables["EncryptionKey"].ToString();
            
            // Create an encrypted copy of the file
            Encrypt(filepath, newFilepath, encryptionKey);

            // ADDED JUST FOR TESTING: Create a decrypted copy of the encrypted file
            Decrypt(newFilepath, newFilepath.Replace("_ENCRYPTED","_DECRYPTED"), encryptionKey);

            // Close Script Task
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        // Encrypt textfile with Rijndael encryption 
        public static void Encrypt(string fileIn, string fileOut, string Password)
        {
            // Open filestream for source file
            FileStream fsIn = new FileStream(fileIn, FileMode.Open, FileAccess.Read);

            // Open filestream for encrypted file
            FileStream fsOut = new FileStream(fileOut, FileMode.OpenOrCreate, FileAccess.Write);

            // Create Key and IV from the password with salt technique 
            PasswordDeriveBytes pdb = new PasswordDeriveBytes(Password, new byte[] {0x49, 0x76, 0x61, 0x6e, 0x20, 0x4d, 0x65, 0x64, 0x76, 0x65, 0x64, 0x65, 0x76});

            // Create a symmetric algorithm with Rijndael 
            Rijndael alg = Rijndael.Create();

            // Set Key and IV 
            alg.Key = pdb.GetBytes(32);
            alg.IV = pdb.GetBytes(16);

            // Create a CryptoStream 
            CryptoStream cs = new CryptoStream(fsOut, alg.CreateEncryptor(), CryptoStreamMode.Write);

            // Initialize a buffer and process the input file in chunks. 
            // This is done to avoid reading the whole file (which can be huge) into memory. 
            int bufferLen = 4096;
            byte[] buffer = new byte[bufferLen];
            int bytesRead;

            do
            {
                // read a chunk of data from the input file 
                bytesRead = fsIn.Read(buffer, 0, bufferLen);

                // encrypt it 
                cs.Write(buffer, 0, bytesRead);
            } while (bytesRead != 0);

            // close everything 
            // this will also close the unrelying fsOut stream
            cs.Close();
            fsIn.Close();
        }

        // Decrypt textfile with Rijndael encryption 
        public static void Decrypt(string fileIn, string fileOut, string Password)
        {
            // Open filestream for encrypted source file
            FileStream fsIn = new FileStream(fileIn, FileMode.Open, FileAccess.Read);

            // Open filestream for decrypted file
            FileStream fsOut = new FileStream(fileOut, FileMode.OpenOrCreate, FileAccess.Write);

            // Create Key and IV from the password with salt technique 
            PasswordDeriveBytes pdb = new PasswordDeriveBytes(Password, new byte[] {0x49, 0x76, 0x61, 0x6e, 0x20, 0x4d, 0x65, 0x64, 0x76, 0x65, 0x64, 0x65, 0x76});

            // Create a symmetric algorithm with Rijndael 
            Rijndael alg = Rijndael.Create();

            // Set Key and IV 
            alg.Key = pdb.GetBytes(32);
            alg.IV = pdb.GetBytes(16);

            // Create a CryptoStream 
            CryptoStream cs = new CryptoStream(fsOut, alg.CreateDecryptor(), CryptoStreamMode.Write);

            // Initialize a buffer and process the input file in chunks. 
            // This is done to avoid reading the whole file (which can be huge) into memory.
            int bufferLen = 4096;
            byte[] buffer = new byte[bufferLen];
            int bytesRead;

            do
            {
                // read a chunk of data from the input file 
                bytesRead = fsIn.Read(buffer, 0, bufferLen);

                // Decrypt it 
                cs.Write(buffer, 0, bytesRead);
            } while (bytesRead != 0);

            // close everything 
            // this will also close the unrelying fsOut stream 
            cs.Close(); 
            fsIn.Close();
        }
    }
}

Or VB.Net

' VB.Net code
Imports System
Imports System.Data
Imports System.IO                       ' Added
Imports System.Security.Cryptography    ' Added 
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 

 Public Sub Main()
        ' Get the filepath of the file that needs to be encrypted. This is
        ' the name of the connection manager.
        Dim filepath As String = Dts.Connections("UsersExport").ConnectionString

        ' Determine new name: d:\folder\file.csv => d:\folder\file_ENCRYPTED.csv
        Dim newFilepath As String = filepath.Substring(0, filepath.LastIndexOf(".") - 1) & "_ENCRYPTED" & filepath.Substring(filepath.LastIndexOf("."))

        ' Get password from SSIS variable
        Dim encryptionKey As String = Dts.Variables("EncryptionKey").ToString()

        ' Create an encrypted copy of the file
        Encrypt(filepath, newFilepath, encryptionKey)

        ' ADDED JUST FOR TESTING: Create a decrypted copy of the encrypted file
        Decrypt(newFilepath, newFilepath.Replace("_ENCRYPTED", "_DECRYPTED"), encryptionKey)

        ' Close Script Task
        Dts.TaskResult = ScriptResults.Success
 End Sub

    ' Encrypt textfile with Rijndael encryption 
    Public Shared Sub Encrypt(ByVal fileIn As String, ByVal fileOut As String, ByVal Password As String)
        ' Open filestream for source file
        Dim fsIn As New FileStream(fileIn, FileMode.Open, FileAccess.Read)

        ' Open filestream for encrypted file
        Dim fsOut As New FileStream(fileOut, FileMode.OpenOrCreate, FileAccess.Write)

        ' Create Key and IV from the password with salt technique 
        Dim pdb As New PasswordDeriveBytes(Password, New Byte() {&H49, &H76, &H61, &H6E, &H20, &H4D, &H65, &H64, &H76, &H65, &H64, &H65, &H76})

        ' Create a symmetric algorithm with Rijndael 
        Dim alg As Rijndael = Rijndael.Create()

        ' Set Key and IV 
        alg.Key = pdb.GetBytes(32)
        alg.IV = pdb.GetBytes(16)

        ' Create a CryptoStream 
        Dim cs As New CryptoStream(fsOut, alg.CreateEncryptor(), CryptoStreamMode.Write)

        ' Initialize a buffer and process the input file in chunks. 
        ' This is done to avoid reading the whole file (which can be huge) into memory. 
        Dim bufferLen As Integer = 4096
        Dim buffer As Byte() = New Byte(bufferLen - 1) {}
        Dim bytesRead As Integer

        Do
            ' read a chunk of data from the input file 
            bytesRead = fsIn.Read(buffer, 0, bufferLen)

            ' encrypt it 
            cs.Write(buffer, 0, bytesRead)
        Loop While bytesRead <> 0

        ' close everything 
        ' this will also close the unrelying fsOut stream
        cs.Close()
        fsIn.Close()
    End Sub

    ' Decrypt textfile with Rijndael encryption 
    Public Shared Sub Decrypt(ByVal fileIn As String, ByVal fileOut As String, ByVal Password As String)
        ' Open filestream for encrypted source file
        Dim fsIn As New FileStream(fileIn, FileMode.Open, FileAccess.Read)

        ' Open filestream for decrypted file
        Dim fsOut As New FileStream(fileOut, FileMode.OpenOrCreate, FileAccess.Write)

        ' Create Key and IV from the password with salt technique 
        Dim pdb As New PasswordDeriveBytes(Password, New Byte() {&H49, &H76, &H61, &H6E, &H20, &H4D, &H65, &H64, &H76, &H65, &H64, &H65, &H76})

        ' Create a symmetric algorithm with Rijndael 
        Dim alg As Rijndael = Rijndael.Create()

        ' Set Key and IV 
        alg.Key = pdb.GetBytes(32)
        alg.IV = pdb.GetBytes(16)

        ' Create a CryptoStream 
        Dim cs As New CryptoStream(fsOut, alg.CreateDecryptor(), CryptoStreamMode.Write)

        ' Initialize a buffer and process the input file in chunks. 
        ' This is done to avoid reading the whole file (which can be huge) into memory.
        Dim bufferLen As Integer = 4096
        Dim buffer As Byte() = New Byte(bufferLen - 1) {}
        Dim bytesRead As Integer

        Do
            ' read a chunk of data from the input file 
            bytesRead = fsIn.Read(buffer, 0, bufferLen)

            ' Decrypt it 
            cs.Write(buffer, 0, bytesRead)
        Loop While bytesRead <> 0

        ' close everything 
        ' this will also close the unrelying fsOut stream 
        cs.Close()
        fsIn.Close()
    End Sub
End Class

6) The result
See the result of encryption and decryption.
Good luck reading that encrypted file

















Note: This example uses Rijndael, but there are more methods available in .Net

Sunday, 25 September 2011

Encrypt column data in SSIS

Case
I want to encrypt a column with sensitive data. How do I do that in SSIS?

Solutions
A) If your source is a SQL Server table, you could use T-SQL to encrypt your data.
B) An other option is the Script Component with some .Net coding to encrypt your sensitive data.

Note: see this post for encrypting a whole file instead of certain columns.

Let's elaborate that Script Component solution. Let's assume you have an user table that you want to export to a csv file, but you don't want to export the password (or salary) column in plaintext.
Usertable in SQL Server












1) Variable
Create a string variable named EncryptionKey to store your encryption key in and give it a value.
Rightclick in the Control Flow









2) Data Flow Task
Create a new Data Flow Task with a source connection to your table. This example has two columns: Name and Password (see screenshot above step 1).
Data Flow Task with source

















3) Script Component
Add a Script Component after the source and choose Transformation. Connect it to the source and give it a suitable name.
Script Component Type Transformation



















4) Select Variable
Edit the Script Component and select the variable EncryptionKey from step 1 as a ReadOnlyVariable.
ReadOnlyVariables




















5) Input Column
Go to the Input Columns section and select the column that you want to encrypt as a ReadOnly input column.
Readonly input column




















6) Output Column
Go to the Inputs and Outputs section and create a new output column to store the excrypted password in. The encypted string will be larger than the original string so make sure it will fit.
New output column named EncryptedPassword




















7) The Script
Hit the Edit Script button and remove the PreExecute and PostExecute methods because we don't need them. Copy the code below. I have created two new static methods (Encrypt and Decrypt) that will do the job. They are simplified versions of these examples at CodeProject. If you want to use these scripts commercially, make sure you read the documentation and test it thoroughly! I don't take any responsibility whatsoever.
// C# code
using System;
using System.Data;
using System.IO;                        // Added
using System.Security.Cryptography;     // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Windows.Forms;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
       // Fill the new column with an encrypted version of the password column.
        Row.EncryptedPassword = Encrypt(Row.Password.ToString(), this.Variables.EncryptionKey.ToString());
    }

    // Encrypt text with Rijndael encryption
    public static string Encrypt(string clearText, string Password)
    {
        // Convert password string into byte array
        byte[] clearBytes = System.Text.Encoding.Unicode.GetBytes(clearText);

        // Create Key and IV from the password with salt technique
        PasswordDeriveBytes pdb = new PasswordDeriveBytes(Password, new byte[] { 0x49, 0x76, 0x61, 0x6e, 0x20, 0x4d, 0x65, 0x64, 0x76, 0x65, 0x64, 0x65, 0x76 });

        // Create a symmetric algorithm with Rijndael
        Rijndael alg = Rijndael.Create();

        // Set Key and IV
        alg.Key = pdb.GetBytes(32);
        alg.IV = pdb.GetBytes(16);

        // Create a MemoryStream  
        MemoryStream ms = new MemoryStream();

        // Create a CryptoStream
        CryptoStream cs = new CryptoStream(ms, alg.CreateEncryptor(), CryptoStreamMode.Write);

        // Write the data and make it do the encryption 
        cs.Write(clearBytes, 0, clearBytes.Length);

        // Close CryptoStream
        cs.Close();

        // Get Encypted data from MemoryStream
        byte[] encryptedData = ms.ToArray();

        // return the Encypted data as a String
        return Convert.ToBase64String(encryptedData);
    }

    // Decrypt text with Rijndael encryption
    public static string Decrypt(string cipherText, string Password)
    {
        // Convert password string into byte array
        byte[] cipherBytes = Convert.FromBase64String(cipherText);

        // Create Key and IV from the password with salt technique
        PasswordDeriveBytes pdb = new PasswordDeriveBytes(Password, new byte[] { 0x49, 0x76, 0x61, 0x6e, 0x20, 0x4d, 0x65, 0x64, 0x76, 0x65, 0x64, 0x65, 0x76 });

        // Create a symmetric algorithm with Rijndael
        Rijndael alg = Rijndael.Create();

        // Set Key and IV
        alg.Key = pdb.GetBytes(32);
        alg.IV = pdb.GetBytes(16);

        // Create a MemoryStream  
        MemoryStream ms = new MemoryStream();

        // Create a CryptoStream 
        CryptoStream cs = new CryptoStream(ms, alg.CreateDecryptor(), CryptoStreamMode.Write);

        // Write the data and make it do the decryption 
        cs.Write(cipherBytes, 0, cipherBytes.Length);

        // Close CryptoStream
        cs.Close();

        // Get Decypted data from MemoryStream
        byte[] decryptedData = ms.ToArray();

        // return the Decypted data as a String
        return System.Text.Encoding.Unicode.GetString(decryptedData);
    }
}

Or VB.Net

' VB.Net code
Imports System
Imports System.Data
Imports System.IO                       ' Added
Imports System.Security.Cryptography    ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Fill the new column with an encrypted version of the password column.
        Row.EncryptedPassword = Encrypt(Row.Password.ToString(), Me.Variables.EncryptionKey.ToString())
    End Sub

    ' Encrypt text with Rijndael encryption
    Public Shared Function Encrypt(ByVal clearText As String, ByVal Password As String) As String
        ' Convert password string into byte array
        Dim clearBytes As Byte() = System.Text.Encoding.Unicode.GetBytes(clearText)

        ' Create Key and IV from the password with salt technique
        Dim pdb As New PasswordDeriveBytes(Password, New Byte() {&H49, &H76, &H61, &H6E, &H20, &H4D, &H65, &H64, &H76, &H65, &H64, &H65, &H76})

        ' Create a symmetric algorithm with Rijndael
        Dim alg As Rijndael = Rijndael.Create()

        ' Set Key and IV
        alg.Key = pdb.GetBytes(32)
        alg.IV = pdb.GetBytes(16)

        ' Create a MemoryStream  
        Dim ms As MemoryStream = New MemoryStream()

        ' Create a CryptoStream
        Dim cs As CryptoStream = New CryptoStream(ms, alg.CreateEncryptor(), CryptoStreamMode.Write)

        ' Write the data and make it do the encryption 
        cs.Write(clearBytes, 0, clearBytes.Length)

        ' Close CryptoStream
        cs.Close()

        ' Get Encypted data from MemoryStream
        Dim encryptedData As Byte() = ms.ToArray()

        ' return the Encypted data as a String
        Encrypt = Convert.ToBase64String(encryptedData)
    End Function

    ' Decrypt text with Rijndael encryption
    Public Shared Function Decrypt(ByVal cipherText As String, ByVal Password As String) As String
        ' Convert password string into byte array
        Dim cipherBytes As Byte() = Convert.FromBase64String(cipherText)

        ' Create Key and IV from the password with salt technique
        Dim pdb As New PasswordDeriveBytes(Password, New Byte() {&H49, &H76, &H61, &H6E, &H20, &H4D, &H65, &H64, &H76, &H65, &H64, &H65, &H76})

        ' Create a symmetric algorithm with Rijndael
        Dim alg As Rijndael = Rijndael.Create()

        ' Set Key and IV
        alg.Key = pdb.GetBytes(32)
        alg.IV = pdb.GetBytes(16)

        ' Create a MemoryStream  
        Dim ms As MemoryStream = New MemoryStream()

        ' Create a CryptoStream 
        Dim cs As CryptoStream = New CryptoStream(ms, alg.CreateDecryptor(), CryptoStreamMode.Write)

        ' Write the data and make it do the decryption 
        cs.Write(cipherBytes, 0, cipherBytes.Length)

        ' Close CryptoStream
        cs.Close()

        ' Get Decypted data from MemoryStream
        Dim decryptedData As Byte() = ms.ToArray()

        ' return the Decypted data as a String
        Decrypt = System.Text.Encoding.Unicode.GetString(decryptedData)
    End Function

End Class



8) The result
Now finish the Data Flow by adding a destination. I added a Data Viewer to see the result.
The result: encrypted data


















9) Reverse
By reversing the Data Flow we can decrypt the data.
The result: decrypted data
















// C# code
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Fill the new column with the decrypted version of the encrypted password column.
    Row.DecryptedPassword = Decrypt(Row.Password.ToString(), this.Variables.EncryptionKey.ToString());
}



Or VB.Net

' VB.Net code
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Fill the new column with the decrypted version of the encrypted password column.
    Row.DecryptedPassword = Decrypt(Row.Password.ToString(), Me.Variables.EncryptionKey.ToString())
End Sub



Note: This example uses Rijndael, but there are more methods available in .Net

Thursday, 1 September 2011

SSIS Transactions with TSQL

Case
A couple of months ago I did a post on Transactions in SSIS, but that solution requires enabling the windows service Microsoft Distributed Transaction Coordinator (MS DTC). What if you can't (or prefer not to) use that service?

Solution
You can use the Transact SQL transactions to accomplish the same result. Same example as before. I want to empty and refill a table with values from a CSV file, but I want to keep the old data when the refill fails. My package:
Example



















1) Container
Add a Sequence Container and drag the existing Execute SQL Task (which empties the table) and the Data Flow Task (which fills the table) to it.
Sequence Container



















2) Start Transaction
Add an Execute SQL Task before the Sequence Container and use the same connection as in the other tasks. Enter the following statement in the SQLStatement field: BEGIN TRAN MyTran.
Start Transaction






















BEGIN TRAN



















3) Commit Transaction
Add an Execute SQL Task after the Sequence Container and use the same connection as in the other tasks. Enter the following statement in the SQLStatement field: COMMIT TRAN MyTran.
Commit Transaction






















COMMIT TRAN



















4) Rollback Transaction
Add an other Execute SQL Task after the Sequence Container and use the same connection as in the other tasks. Enter the following statement in the SQLStatement field: ROLLBACK TRAN MyTran.
Rollback Transaction






















ROLLBACK TRAN



















5) Precedence Contraint
Change the Value property of the Precedence Contraint between the Sequence Container and the Rollback from Success to Failure. If something fails in the Sequence Container the Rollback command will be executed.
Precedence Contraint






















Failure

















6) RetainSameConnection
Now the most important thing. Change the RetainSameConnection property of the database connection from false to True.
RetainSameConnection






















7) The Result
That's all there is. Now you can test your package. You can open the CSV file in Excel to lock the file and fail the package.
The Result
















* UPDATE *
Added the optional transactionname in case you want to re-execute. See comment Arthur Zubarev.