Thursday, 1 November 2012

Split multi value column into multiple records

Case
I have a column with multiple values and I want to split them into multiple records.












Solution
You could solve it with a fancy TSQL query. I saw split functions with common table expressions, but a relatively easy script could to the trick in SSIS as well. (don't hesitate to post your query/solution in the comments)

1) Source
Add your source to the Data Flow Task












2) Script Component - input columns
This solution uses an asynchronous Script Component, so add a Script Component (type transformation) to your Data Flow Task. Edit it, go to the Input Columns pane and select all the columns you need downstream the data flow as readonly. In this case we need the columns Teacher and Students.
Input columns (readonly)




















3) Script Component - output port
Go to the Inputs and Outputs pane and click on Output 0. Set the SynchronousInputID property to none to make this Script Component asynchronous.
asynchronous




















4) Script Component - output columns
Add output columns for each input column that you need downstream the data flow. In this case we need Teacher (same datatype and size) and a new column named Student which will contain one value from the input column Students (same datatype, but size could probably be smaller).
Output columns



















5) The script
Copy the Inputs0 _ProcessInputRow method to your script and remove all the other methods (PreExecute, PostExecute and CreateNewOutputRows) because we don't need them.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Method that will execute for each row passing
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // First we are converting the comma seperated list into a string array.
        // You can change the comma if you are using an other seperator like | or ;
        string[] Students = Row.Students.ToString().Split(new char[] { ',' }, StringSplitOptions.None);

        // Counter var used the loop through the string array
        int i = 0;

        // Looping through string array with student names
        while (i < Students.Length)
        {
            // Start a new row in the output
            Output0Buffer.AddRow();

            // Pass through all columns that you need downstream the data flow
            Output0Buffer.Teacher = Row.Teacher;

            // This is the splitted column. Take the [n] element from the array
            // and put it in the new column.
            Output0Buffer.Student = Students[i];

            // Increase counter to go the next value
            i++;
        }
    }
}

or VB.Net

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' Method that will execute for each row passing
    Public Overrides Sub Input0_ProcessInputRow(Row As Input0Buffer)
        ' First we are converting the comma seperated list into a string array.
        ' You can change the comma if you are using an other seperator like | or ;
        Dim Students As String() = Row.Students.ToString().Split(New Char() {","c}, StringSplitOptions.None)

        ' Counter var used the loop through the string array
        Dim i As Integer = 0

        ' Looping through string array with student names
        While i < Students.Length
            ' Start a new row in the output
            Output0Buffer.AddRow()

            ' Pass through all columns that you need downstream the data flow
            Output0Buffer.Teacher = Row.Teacher

            ' This is the splitted column. Take the [n] element from the array
            ' and put it in the new column.
            Output0Buffer.Student = Students(i)

            ' Increase counter to go the next value
            i += 1
        End While
    End Sub
End Class


6) The result
For testing purposes I added a derived column and a couple of data viewer
The result



























Note: if you want to do this backwards, see this post.


16 comments:

  1. Thanks man, it helped me a lot. really a simple and helpful way to solve the problem.
    ~omair

    ReplyDelete
  2. +1 very helpful, thanks a lot!

    ReplyDelete
  3. When we need to make store of Multiple values in one cell of a table ????

    Please give me real example of that .

    Regards ,

    Mahmoud

    ReplyDelete
    Replies
    1. It's not a best practice to use/create multi-value columns. But sometimes you just get them and you have to deal with it.

      Delete
  4. How does the Output0Buffer write the new rows to the DB? My DB table isn't getting updated.

    ReplyDelete
    Replies
    1. Have you set the SynchronousInputID property to none? (step 3)

      Delete
    2. yes, I ensured that prior to posting. If I set up the data viewer then I can see the expected results, but the DB table remains the same.

      Delete
    3. So the solution works (you see data in the Data Viewer), but you have a problem with your destination component? Try removing and adding it or check the properties/settings/mappings. If that doesn't work then post your problem with in one of the forums mentioned below) with more details and screenshots.

      Delete
  5. Dear Steve Cook,
    Blogging doesn't pay the bills. Sometimes I have to work and earn money ;-) So please be a little polite and patient. If you need a faster response then you could try the msdn or stackoverflow forums.

    ReplyDelete
  6. hi please tell me tsql code for this

    ReplyDelete
  7. This guide was very helpful to me today. Thank you!
    Input row had three columns with arbitrary number of comma separated date quantity pairs in each. Output to a table by looping on each of the three splits was the happy result.

    ReplyDelete
  8. i keep getting errors on pass through rows that a null has been found...i want to pass through nulls....how would you handle this?

    ReplyDelete
  9. so i have one column that is pipe delimited and 67 other rows that need to pass through....i'm finding that if there is a null, it blows up....how would you handle that since I want to pass through nulls as well? Obviously i dont want a null for the column I'm splitting, so I've accounted for that in sql query.

    ReplyDelete
  10. What if the Student column has 3rd row and its null. Can you help me with code.

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.