Saturday, 12 February 2011

Script Component as source 2

Case
This is an follow up on another strangely formatted file article: My CSV file has splitted rows. There is only one attribute on each row:
id,1
name,Joost
address,Street 2
id,2
name,Jack
address,Avenue 4b
id,3
name,William
address,Plaza 5

I want to convert it to:
id,name,address
1,Joost,Street 1
2,Jack,Avenue 4b
3,William,Plaza 5

The pivot isn't usable because the records aren't grouped with a key. What's the solution?

Solution
You can use a Script Component as a source to solve this.

1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source




















2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add three columns at the output section:
  1. ID (four-byte signed integer)
  2. Name (string)
  3. Address (string)
Add output columns



















3) Connection Manager
Go to the Connection Managers tab and add a connection (1) to avoid a hardcoded connection string in your Script. The Connection Manager Type (2) should be FILE and the Usage type (3) should be Existing file. After that change the default name from "Connection" to "myFile".
Add FILE connection to avoid hardcoded connection strings


























4) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code
using System;
using System.Data;
using System.IO;    // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Variables to store column values
        int ID = 0;             // Column 1
        string Name = "";       // Column 2
        string Address = "";    // Column 3
        int NumberOfColumns = 3;

        // Counter to keep track of the current column        
        int counter = 0;

        // Read file (use the FILE connection added in the script component, named myFile)
        using (StreamReader sr = new StreamReader(this.Connections.myFile.ConnectionString))
        {
            String line;
            // Read lines from the file until the end of the file is reached.
            while ((line = sr.ReadLine()) != null)
            {
                // Raising the counter
                counter++;

                // Split the line into columns
                string[] columns = line.Split(',');

                // Fill the right variable
                if (counter.Equals(1))
                {
                    // Column 1
                    ID = System.Convert.ToInt32(columns[1]);
                }
                else if (counter.Equals(2))
                {
                    // Column 2
                    Name = columns[1];
                }
                else if (counter.Equals(3))
                {
                    // Column 3
                    Address = columns[1];
                }

                // Add a row if the last column has been reached
                if (counter.Equals(NumberOfColumns))
                {
                    // Add one new row and fill columns
                    this.Output0Buffer.AddRow();
                    Output0Buffer.ID = ID;
                    Output0Buffer.Name = Name;
                    Output0Buffer.Address = Address;

                    // Last column, reset counter
                    counter = 0;
                }
            }
        }
    }
}

5) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
The result




















Note: You should add some error handling to your script (Try Catch) to avoid unexpected errors (no file, other delimiter, empty rows, etc.).

Let me know if you have another strangely formatted file. Maybe I can find a solution to process it.

No comments:

Post a Comment

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.