Case
This is an follow up on another strangely formatted file article: My CSV file has splitted rows. There is only one attribute on each row:
id,1 name,Joost address,Street 2 id,2 name,Jack address,Avenue 4b id,3 name,William address,Plaza 5 |
id,name,address 1,Joost,Street 1 2,Jack,Avenue 4b 3,William,Plaza 5 |
The pivot isn't usable because the records aren't grouped with a key. What's the solution?
Solution
You can use a Script Component as a source to solve this.
1) Script Component
Go to your Data Flow and add a Script Component. The Type should be Source.
Script Type Source |
2) Add Columns
Edit the Script Component and go to the Inputs and Outputs tab. Add three columns at the output section:
- ID (four-byte signed integer)
- Name (string)
- Address (string)
Add output columns |
3) Connection Manager
Go to the Connection Managers tab and add a connection (1) to avoid a hardcoded connection string in your Script. The Connection Manager Type (2) should be FILE and the Usage type (3) should be Existing file. After that change the default name from "Connection" to "myFile".
Add FILE connection to avoid hardcoded connection strings |
4) The Script
Go to the first tab and hit the Edit Script... button. This script is in the default C# language. SSIS will create three methods: PreExecute, PostExecute and CreateNewOutputRows. Remove the first two, because we don't need them. Replace your method with this one and add the using row at the top.
// C# code using System; using System.Data; using System.IO; // Added using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void CreateNewOutputRows() { // Variables to store column values int ID = 0; // Column 1 string Name = ""; // Column 2 string Address = ""; // Column 3 int NumberOfColumns = 3; // Counter to keep track of the current column int counter = 0; // Read file (use the FILE connection added in the script component, named myFile) using (StreamReader sr = new StreamReader(this.Connections.myFile.ConnectionString)) { String line; // Read lines from the file until the end of the file is reached. while ((line = sr.ReadLine()) != null) { // Raising the counter counter++; // Split the line into columns string[] columns = line.Split(','); // Fill the right variable if (counter.Equals(1)) { // Column 1 ID = System.Convert.ToInt32(columns[1]); } else if (counter.Equals(2)) { // Column 2 Name = columns[1]; } else if (counter.Equals(3)) { // Column 3 Address = columns[1]; } // Add a row if the last column has been reached if (counter.Equals(NumberOfColumns)) { // Add one new row and fill columns this.Output0Buffer.AddRow(); Output0Buffer.ID = ID; Output0Buffer.Name = Name; Output0Buffer.Address = Address; // Last column, reset counter counter = 0; } } } } }
5) The result
Add a random transformation component after your source and add a dataviewer between them to see the result.
The result |
Note: You should add some error handling to your script (Try Catch) to avoid unexpected errors (no file, other delimiter, empty rows, etc.).
Let me know if you have another strangely formatted file. Maybe I can find a solution to process it.
No comments:
Post a Comment
Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.
All comments are moderated manually to prevent spam.