Monday, 29 October 2012

Split multi value column into multiple columns

Case
 I have a source where one column has a divider that splits the sales values for each month. How do I split that value over multiple columns?
Mountainbike;black;10,4,7,3,11,5,8,6,10,4,12,12
Mountainbike;blue;12,2,9,1,13,7,6,4,8,12,3,4
The last column contains the sales per month and is divided by a comma instead of a semicolon.

Solution
There are various options to split that value.
A) Substring/Findstring
B) Script Component
C) Token

Solution A: Substring/Findstring
You can use an expression in the Derived Column. The first and last deviate from the rest:
SUBSTRING(Sales,1,FINDSTRING(Sales,",",1) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",1) + 1,FINDSTRING(Sales,",",2) - FINDSTRING(Sales,",",1) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",2) + 1,FINDSTRING(Sales,",",3) - FINDSTRING(Sales,",",2) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",3) + 1,FINDSTRING(Sales,",",4) - FINDSTRING(Sales,",",3) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",4) + 1,FINDSTRING(Sales,",",5) - FINDSTRING(Sales,",",4) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",5) + 1,FINDSTRING(Sales,",",6) - FINDSTRING(Sales,",",5) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",6) + 1,FINDSTRING(Sales,",",7) - FINDSTRING(Sales,",",6) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",7) + 1,FINDSTRING(Sales,",",8) - FINDSTRING(Sales,",",7) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",8) + 1,FINDSTRING(Sales,",",9) - FINDSTRING(Sales,",",8) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",9) + 1,FINDSTRING(Sales,",",10) - FINDSTRING(Sales,",",9) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",10) + 1,FINDSTRING(Sales,",",11) - FINDSTRING(Sales,",",10) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",11) + 1,LEN(Sales) - FINDSTRING(Sales,",",11))
Substring/Findstring solution




















Solution B: Script Component
Add a Script Component (type transformation) and select the Sales column as ReadOnly input column in the Input Columns pane. Then go to the Inputs and Outputs pane and create a column foreach month. After that choose your language and hit the Edit Script button and copy the following method.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Split input column on comma fill output columns
        // Added the Convert.ToInt32(XXXX) to convert it to int
        string[] sales = Row.Sales.ToString().Split(new char[]{','}, StringSplitOptions.None);
        Row.Jan = Convert.ToInt32(sales[0]);
        Row.Feb = Convert.ToInt32(sales[1]);
        Row.Mar = Convert.ToInt32(sales[2]);
        Row.Apr = Convert.ToInt32(sales[3]);
        Row.May = Convert.ToInt32(sales[4]);
        Row.Jun = Convert.ToInt32(sales[5]);
        Row.Jul = Convert.ToInt32(sales[6]);
        Row.Aug = Convert.ToInt32(sales[7]);
        Row.Sep = Convert.ToInt32(sales[8]);
        Row.Oct = Convert.ToInt32(sales[9]);
        Row.Nov = Convert.ToInt32(sales[10]);
        Row.Dec = Convert.ToInt32(sales[11]);
    }
}

or VB.Net

'VB.Net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()g _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Split input column on comma fill output columns
        ' Added the Convert.ToInt32(XXXX) to convert it to int
        Dim sales As String() = Row.Sales.ToString().Split(New Char() {","c}, StringSplitOptions.None)
        Row.Jan = Convert.ToInt32(sales(0))
        Row.Feb = Convert.ToInt32(sales(1))
        Row.Mar = Convert.ToInt32(sales(2))
        Row.Apr = Convert.ToInt32(sales(3))
        Row.May = Convert.ToInt32(sales(4))
        Row.Jun = Convert.ToInt32(sales(5))
        Row.Jul = Convert.ToInt32(sales(6))
        Row.Aug = Convert.ToInt32(sales(7))
        Row.Sep = Convert.ToInt32(sales(8))
        Row.Oct = Convert.ToInt32(sales(9))
        Row.Nov = Convert.ToInt32(sales(10))
        Row.Dec = Convert.ToInt32(sales(11))
    End Sub
End Class
This is more readable than the Substring/Findstring expressions!
Script Component solution




















Solution C: Token
SSIS 2012 has a new expression called Token which makes life a lot easier than the Substring/Findstring solution:
TOKEN(Sales,",",1)
TOKEN(Sales,",",2)
TOKEN(Sales,",",3)
TOKEN(Sales,",",4)
TOKEN(Sales,",",5)
TOKEN(Sales,",",6)
TOKEN(Sales,",",7)
TOKEN(Sales,",",8)
TOKEN(Sales,",",9)
TOKEN(Sales,",",10)
TOKEN(Sales,",",11)
TOKEN(Sales,",",12)

Token solution

Sunday, 28 October 2012

Creating a comma separated list of related records

Case
I have a list of teacher and student combinations with one combination per record and I want to created a comma delimited list of students per teacher.

Solution
If your source is a database then the easiest solution is a TSQL query like this:
--TSQL Query
WITH UniqueTeachers AS
(
 SELECT  DISTINCT Teacher
 FROM  TeacherStudentTable
)
SELECT  Teacher
,   Students = STUFF((
      SELECT  ',' + Student
      FROM  TeacherStudentTable
      WHERE  Teacher = UniqueTeachers.Teacher
      ORDER BY Student
      FOR XML PATH(''), TYPE).value('.','varchar(100)'), 1, 1, '')
FROM  UniqueTeachers
ORDER BY Teacher
The query in SSIS as source



























If your source is for example a flat file or a database that doesn't support a query like this, then there are also options within SSIS. For this solution I use a asynchronous Script Component.

1) Sorted source
We need a sorted source because we are comparing records with each other. In this case make sure the source is sorted on teacher first (and optional secondly on student).
Add Sort transformation if source isn't sorted




















2) Script Component
Add a Script Component (type transformation) and select the Teacher and Student columns as ReadOnly input columns.
Input columns: Teacher and Student




















3) Asynchronous
We need to make the Script Component asynchronous because it throws out a different number of rows than there are incomming. Go to the Inputs and Outputs pane, click on Output 0 and change the SynchronousInputID to None.
Asynchonous



















4) Output
We now need to create an output for the Script Component. Expand the Output 0 and add two columns:
Teacher (same data type and size as the input column teacher)
Students (same data type as the input column student, but larger to fit multiple student names)
Output columns




















5) The Script
Copy the three variables and the two methods to your Script Component (and remove any other existing methods).
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    bool initialRow = true;     // Indicater for the first row
    string teacher = "";        // Name of the teacher to track teacherchanges between rows
    string students = "";       // The comma delimited list of students

    public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        // Loop through buffer
        while (Buffer.NextRow())
        {
            // Process an input row
            Input0_ProcessInputRow(Buffer);

            // Change the indicator after the first row has been processed
            initialRow = false;
        }
        
        // Check if this is the last row
        if (Buffer.EndOfRowset())
        {
            // Fill the columns of the existing output row with values
            // from the variable before closing this Script Component
            Output0Buffer.Teacher = teacher;
            Output0Buffer.Students = students;
        }
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (initialRow)
        {
            // This is for the first input row only
            
            // Create a new output row
            Output0Buffer.AddRow();

            // Now fill the variables with the values from the input row
            teacher = Row.Teacher;
            students = Row.Student;
        }
        else if ((!initialRow) & (teacher != Row.Teacher))
        {
            // This isn't the first row, but the teacher did change

            // Fill the columns of the existing output row with values
            // from the variable before creating a new output row
            Output0Buffer.Teacher = teacher;
            Output0Buffer.Students = students;

            // Create a new output row
            Output0Buffer.AddRow();

            // Now fill the variables with the values from the input row
            teacher = Row.Teacher;
            students = Row.Student;
        }
        else if ((!initialRow) & (teacher == Row.Teacher))
        {
            // This isn't the first row, and the teacher did not change

            // Concatenate the studentsname to the variable
            students += "," + Row.Student;
        }
    }

    // Little explanation:
    // Rows are created in memory with .AddRow()
    // and will be submitted to the output when a
    // new / subsequent row is created or when
    // the last buffer has been finished.
}

or in VB.Net

'VB.Net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Private initialRow As Boolean = True    ' Indicater for the first row
    Private teacher As String = ""          ' Name of the teacher to track teacherchanges between rows
    Private students As String = ""         ' The comma delimited list of students

    Public Overrides Sub Input0_ProcessInput(Buffer As Input0Buffer)
        ' Loop through buffer
        While Buffer.NextRow()
            ' Process an input row
            Input0_ProcessInputRow(Buffer)

            ' Change the indicator after the first row has been processed
            initialRow = False
        End While

        ' Check if this is the last row
        If Buffer.EndOfRowset() Then
            ' Fill the columns of the existing output row with values
            ' from the variable before closing this Script Component
            Output0Buffer.Teacher = teacher
            Output0Buffer.Students = students
        End If
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(Row As Input0Buffer)
        If initialRow Then
            ' This is for the first input row only

            ' Create a new output row
            Output0Buffer.AddRow()

            ' Now fill the variables with the values from the input row
            teacher = Row.Teacher
            students = Row.Student
        ElseIf (Not initialRow) And (teacher <> Row.Teacher) Then
            ' This isn't the first row, but the teacher did change

            ' Fill the columns of the existing output row with values
            ' from the variable before creating a new output row
            Output0Buffer.Teacher = teacher
            Output0Buffer.Students = students

            ' Create a new output row
            Output0Buffer.AddRow()

            ' Now fill the variables with the values from the input row
            teacher = Row.Teacher
            students = Row.Student
        ElseIf (Not initialRow) And (teacher = Row.Teacher) Then
            ' This isn't the first row, and the teacher did not change

            ' Concatenate the studentsname to the variable
            students += "," & Convert.ToString(Row.Student)
        End If
    End Sub

    ' Little explanation:
    ' Rows are created in memory with .AddRow()
    ' and will be submitted to the output when a
    ' new / subsequent row is created or when
    ' the last buffer has been finished.
End Class

Note: You can change the delimiter in C# line 70 and VB.Net line 64.


6) Testing
For testing purposes I added a derived column and a couple of data viewer.
The result



















Conclusion: both methods have the same result. For a SQL database source, the T-SQL method is probably a little faster. If you can do the sort in the source the performance differences will diminish.

In one of my next blog posts I will show you how to split a comma separated list in to records.

Sunday, 21 October 2012

Replace Null Transformation a.k.a. the defaulter

An often used task in a data flow task is to replace all null values for dimension lookups. If you're lucky you can use the short 2012 expression REPLACENULL or else you're stuck with ? :

Replace Null



















Doing that a dozen times for each fact package over and over is boring, so my collegue Marc Potters developped a Replace Null Transformation (a.k.a. Defaulter Transformation) that can replace null values in several columns at once. We made it a bit more general so it can be used to replace null values for all data types.
Replace Null


















Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

BETA2: solved bug that could result in double output.
BETA3: solved bug that could result in value 0 if only one column is selected.

Installation
The installer registers the DLL in the GAC and copies it to the component folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup. (Restart not required for SSDT)

How add the transformation to the toolbox (2008 only)
When you open a SSIS project in Visual Studio/Bids, right click the toolbox and choose "Choose Items...". After this Visual Studio will be busy for a couple of seconds!
Right click toolbox






















When the Choose Toolbox Items window appears, go to the SSIS Data Flow Items and search for the newly installed transformation and select it. Click ok to finish.
Choose Toolbox Items





















Now the new transformation will appear in the toolbox. Ready to use! Have fun.

New component added






























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom component.

Sunday, 14 October 2012

Custom SSIS Component: UnZip Task

In january 2011 I did a post about unzipping files within SSIS with a Script Task. Because not everybody is fond about programming, I decided to make my own UnZip (and Zip) Task. For the actual unzipping I used the well know opensource DotNetZip library.





This UnZip Task is still beta and more unzip features will follow. Please use it, test it and let me know your findings or wishes in the comments below.
The UnZip Task is now stable, but please submit suggestions for new features.






















Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

BETA2: UNC path support, cancel variable window bug solved.
BETA3: Various validations added, New option added to store unzipped filepath in variable.
BETA4: Switched to DotNetZip for the actual unzipping * Probably last beta version before official release. *
V1.0: Bug solved that didn't validate driveletters with smallcaps
V1.1: Option added to delete zip file after unzipping it (see properties). Code added for easier upgrading to new release.
V1.2: SSIS 2016 added and now upgradable. Added 64 installer.

Installation
The installer registers the DLL in the GAC and copies it to the task folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\Tasks\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup. Restarting SQL Server Data Tools is not necessary.

How add the task the the toolbox (2008 only)
When you open a SSIS project in Visual Studio/Bids, right click the toolbox and choose "Choose Items...". After this Visual Studio will be busy for a couple of seconds!

Right click toolbox






















When the Choose Toolbox Items window appears, go to the SSIS Control Flow Items and search for the newly installed UnZip Task and select it. Click ok to finish.
Choose Toolbox Items























Now the new task will appear in the toolbox. Ready to use! Have fun.
New task added



























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom task.




Friday, 12 October 2012

Custom SSIS Component: Zip Task

In January 2011 I did a post about archiving processed source files in a zipfile via a Script Task. Because not everybody is fond about programming, I decided to make my own Zip Task (and UnZip task). For the actual zipping I used the well know opensource DotNetZip library.

This Zip Task is still beta and more zip features will follow. Please use it, test it and let me know your findings or wishes in the comments below.
The Zip Task is now stable, but please keep submitting suggestions for new features.
Zip Task 2008 Beta




















Zip Task 2012 V1.2























Here is an example on how to add this Zip Task in your BIML Script.

Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

BETA2: Various validations added, New option added to append to an existing zip file, Switch from SharpZipLib to DotNetZip because it had better append possibilities.
BETA3: Bug solved when using a variable as destination.
V1.0: Bug solved that didn't validate driveletters with smallcaps
V1.1: Option added to delete source files after zipping (see properties). Bug solved that permitted Asian chars in filenames. Bug solved that added extra folder in zipfile. Code added for easier upgrading to new release.
V1.2: Finetuned code to allow more than 65535 files and to allow very big files (zip64).
V1.3: 2016 version added and made task upgradable in SSDT 2016. Added 64 installer.

Installation
The installer registers the DLL in the GAC and copies it to the task folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\Tasks\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup. Restarting SQL Server Data Tools is not necessary.

How add the task the the toolbox (2008 only)
When you open a SSIS project in Visual Studio/Bids, right click the toolbox and choose "Choose Items...". After this Visual Studio will be busy for a couple of seconds!

Right click toolbox






















When the Choose Toolbox Items window appears, go to the SSIS Control Flow Items and search for the newly installed Zip Task and select it. Click ok to finish.


Choose Toolbox Items




















Now the new task will appear in the toolbox. Ready to use! Have fun.
New task added



























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom task.





Monday, 1 October 2012

Limit memory usage SQL Server

Case
I have SSIS and SQL Server on the same machine and the SQL Server proces (SQLSERVR.EXE) is taking a lot of memory. How can I limit the memory usage of SQL Server so that there is more left for SSIS?

Solution
SQL Server is a bit greedy when it comes to memory usage. It will take what's there for optimal SQL Server performance, but sharing(releasing) it isn't it's best quality. Luckily you can limit SQL Server. The screenshots are from 2012, but it works the same with 2005 and 2008.

1) Open SSMS
Open SQL Server Management Studio (SSMS) and connect to your server. Right click the server and choose Properties. A new window will open with Server Properties.
SQL Server Properties



















2) Memory
Select the Memory page and check the value of the "Maximum server memory (in MB)" field. It has a has a gigantic limit of 2,147,483,647MB.
2,147,483,647MB























3) Limit Memory
Now you can limit this amount. My machine has 8GB, so I have limited SQL Server to 3GB: 3 * 1024 = 3072MB. Click OK and you will see that SQL Server will gradually release memory until it reaches your limit (or restart the SQL Server Service).
Limited to 3GB























Note: be a little careful changing settings on production servers. Misuse may cause damage...