Monday, 29 October 2012

Split multi value column into multiple columns

Case
 I have a source where one column has a divider that splits the sales values for each month. How do I split that value over multiple columns?
Mountainbike;black;10,4,7,3,11,5,8,6,10,4,12,12
Mountainbike;blue;12,2,9,1,13,7,6,4,8,12,3,4
The last column contains the sales per month and is divided by a comma instead of a semicolon.

Solution
There are various options to split that value.
A) Substring/Findstring
B) Script Component
C) Token

Solution A: Substring/Findstring
You can use an expression in the Derived Column. The first and last deviate from the rest:
SUBSTRING(Sales,1,FINDSTRING(Sales,",",1) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",1) + 1,FINDSTRING(Sales,",",2) - FINDSTRING(Sales,",",1) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",2) + 1,FINDSTRING(Sales,",",3) - FINDSTRING(Sales,",",2) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",3) + 1,FINDSTRING(Sales,",",4) - FINDSTRING(Sales,",",3) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",4) + 1,FINDSTRING(Sales,",",5) - FINDSTRING(Sales,",",4) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",5) + 1,FINDSTRING(Sales,",",6) - FINDSTRING(Sales,",",5) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",6) + 1,FINDSTRING(Sales,",",7) - FINDSTRING(Sales,",",6) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",7) + 1,FINDSTRING(Sales,",",8) - FINDSTRING(Sales,",",7) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",8) + 1,FINDSTRING(Sales,",",9) - FINDSTRING(Sales,",",8) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",9) + 1,FINDSTRING(Sales,",",10) - FINDSTRING(Sales,",",9) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",10) + 1,FINDSTRING(Sales,",",11) - FINDSTRING(Sales,",",10) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",11) + 1,LEN(Sales) - FINDSTRING(Sales,",",11))
Substring/Findstring solution




















Solution B: Script Component
Add a Script Component (type transformation) and select the Sales column as ReadOnly input column in the Input Columns pane. Then go to the Inputs and Outputs pane and create a column foreach month. After that choose your language and hit the Edit Script button and copy the following method.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Split input column on comma fill output columns
        // Added the Convert.ToInt32(XXXX) to convert it to int
        string[] sales = Row.Sales.ToString().Split(new char[]{','}, StringSplitOptions.None);
        Row.Jan = Convert.ToInt32(sales[0]);
        Row.Feb = Convert.ToInt32(sales[1]);
        Row.Mar = Convert.ToInt32(sales[2]);
        Row.Apr = Convert.ToInt32(sales[3]);
        Row.May = Convert.ToInt32(sales[4]);
        Row.Jun = Convert.ToInt32(sales[5]);
        Row.Jul = Convert.ToInt32(sales[6]);
        Row.Aug = Convert.ToInt32(sales[7]);
        Row.Sep = Convert.ToInt32(sales[8]);
        Row.Oct = Convert.ToInt32(sales[9]);
        Row.Nov = Convert.ToInt32(sales[10]);
        Row.Dec = Convert.ToInt32(sales[11]);
    }
}

or VB.Net

'VB.Net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()g _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Split input column on comma fill output columns
        ' Added the Convert.ToInt32(XXXX) to convert it to int
        Dim sales As String() = Row.Sales.ToString().Split(New Char() {","c}, StringSplitOptions.None)
        Row.Jan = Convert.ToInt32(sales(0))
        Row.Feb = Convert.ToInt32(sales(1))
        Row.Mar = Convert.ToInt32(sales(2))
        Row.Apr = Convert.ToInt32(sales(3))
        Row.May = Convert.ToInt32(sales(4))
        Row.Jun = Convert.ToInt32(sales(5))
        Row.Jul = Convert.ToInt32(sales(6))
        Row.Aug = Convert.ToInt32(sales(7))
        Row.Sep = Convert.ToInt32(sales(8))
        Row.Oct = Convert.ToInt32(sales(9))
        Row.Nov = Convert.ToInt32(sales(10))
        Row.Dec = Convert.ToInt32(sales(11))
    End Sub
End Class
This is more readable than the Substring/Findstring expressions!
Script Component solution




















Solution C: Token
SSIS 2012 has a new expression called Token which makes life a lot easier than the Substring/Findstring solution:
TOKEN(Sales,",",1)
TOKEN(Sales,",",2)
TOKEN(Sales,",",3)
TOKEN(Sales,",",4)
TOKEN(Sales,",",5)
TOKEN(Sales,",",6)
TOKEN(Sales,",",7)
TOKEN(Sales,",",8)
TOKEN(Sales,",",9)
TOKEN(Sales,",",10)
TOKEN(Sales,",",11)
TOKEN(Sales,",",12)

Token solution

3 comments:

  1. I know this is an older article, but it saved my cookies! Thanks guys!

    ReplyDelete
  2. Thanks, this post helps me to know more about how to handling string using different method.

    ReplyDelete
  3. Thanks a lot, this post helps me to understand more about how to handle string using different methods

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.