Sunday, 27 February 2011

How to skip the trailer records

Case
Some flat files have one or more trailer / footer records at the bottom of the file. For example:

Name    Address
Joost   Street 1
William Avenue 4b
John    Plaza 3
Bill    Road 15
Records 4

The header is easy to skip in the Flat File Connection Manager, but there isn't an option for skipping the trailer record. How do I skip trailer records?
No option for skipping trailer records




















Solution
It all depends on whether you can distinguish the trailer record from the other records.

A distinguishable trailer
If you can distinguish the trailer record, you should use an expression in a Conditional Split to filter it out. The Data Flow below is for the file example above:
Filter with an expression























No distinguishable trailer
For files where you can't distinguish the trailer record from the others there are roughly two kind of solutions:
  1. Write your own source with a Script Component (or a custom component) which skips the last record.
  2. Make the trailer record(s) distinguishable so you can still use a Conditional Split to skip it.
Downside of both solutions is that you have to count the total number of records/lines in the textfile to know when to start skipping records. This means you have to loop through the file twice.

// Even in C# you have to loop through the file to count the records
var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

// There is a more efficient way of counting lines, but it's
// only available/efficient in .Net Framework 4 and later.
var lineCount = File.ReadLines(@"C:\file.txt").Count(); 


Let's elaborate the second solution.
This solution will first count the number of records in the file and then adds a numberfield with an increasing number and uses that number to filter out the last record.

1) Add Variable
Add an integer variable named Counter to store the number of records.
Right Click in your Data Flow













2) Add first dataflow
In the first data flow there is only a Flat File Source Component that reads your text file. Select only 1 column (preferably the smallest) because we are only counting the rows. Add a Row Count Component that uses the variable of step 1 to store the number of records. It should look like this:
Only counting the rows.














3) Add second dataflow
Add a second dataflow with the same Flat File Source Component that reads your text file (also reuse the Flat File Connection Manager). And add a Script Component (transformation) behind it. The Script Component will add a RowId column.
Script Component (transformation)














4) Script Component
Edit the Script Component. We need a new column to store the Row ID in. So, add a new column on the tab Inputs and Outputs. The type should be an integer. A four-byte signed integer should be enough, but you can change that for smaller or larger files.
Add new integer column



















5) The Script
SSIS will create three methods on default, but you will only need Input0_ProcessInputRow (remove the others). Copy the following method and variable declaration to your script.
// C# code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // New internal variable to store the rownumber 
    private int rowCounter = 0;

    // Method that will be started for each record in you dataflow 
    // It fills the new column with an increasing number
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Indent counter 
        rowCounter++;
        // Fill the new column 
        Row.RowId = rowCounter;
    }
}

The result will be something like this:
RowId Name    Address
1     Joost   Street 1
2     William Avenue 4b
3     John    Plaza 3
4     Bill    Road 15
5     Records 4


6) Conditional Split
Now you can use the RowId column and Counter variable to split the trailer record(s). Add a Conditional Split behind your Script Component. If you have only one trailer record the expression should be: RowId < @[User::Counter]
Split the trailer from the rest





















7) The result
The solution may look clumsy, but it works rather fast (even on my laptop with over a million records).
Overview of the solution




Note: merging the two Data Flows or adding a post method in the Script Component wont work because they fill the variable too late and the conditional split wont work correct.
Won't work






















Please let me know if you have an other solution to skip the trailer record.

7 comments:

  1. Hi, this looks great. I have a large text file and need to read only the footer only. Is there a quick way (without reading all the records) of reading this?

    ReplyDelete
    Replies
    1. I'm afraid that isn't possible with SSIS. But there are some cleaver .Net Scripts available (which I haven't tested myself). For example: http://stackoverflow.com/questions/398378/get-last-10-lines-of-very-large-text-file-10gb-c-sharp
      http://stackoverflow.com/questions/4619735/how-to-read-last-n-lines-of-log-file

      Delete
    2. Some of the images are missing - in particular 4) Script Component. I hope you can re-add it. Thanks.

      Delete
    3. Hi Thanks for leaving a message! Please try refreshing the page... can see missing images. The image a 4) is http://3.bp.blogspot.com/-VEA4tmJdQJo/TV5_no0jqXI/AAAAAAAAAOA/GVV3Z1BcJWc/s320/Footer5.jpg. If it's still not working then please contact me via the Contact Me in the menu.

      Delete
  2. How to remove header and footer row in flat file and move the flat file to another location without having that header and footer rows .

    ReplyDelete
    Replies
    1. Just read the file as in the solutions above and add a Flat File destination.

      Delete
  3. This worked wonders. Thank you so much!

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.

Related Posts Plugin for WordPress, Blogger...