Some flat files have one or more trailer / footer records at the bottom of the file. For example:
Name Address Joost Street 1 William Avenue 4b John Plaza 3 Bill Road 15 Records 4 |
The header is easy to skip in the Flat File Connection Manager, but there isn't an option for skipping the trailer record. How do I skip trailer records?
No option for skipping trailer records |
Solution
It all depends on whether you can distinguish the trailer record from the other records.
A distinguishable trailer
If you can distinguish the trailer record, you should use an expression in a Conditional Split to filter it out. The Data Flow below is for the file example above:
Filter with an expression |
No distinguishable trailer
For files where you can't distinguish the trailer record from the others there are roughly two kind of solutions:
- Write your own source with a Script Component (or a custom component) which skips the last record.
- Make the trailer record(s) distinguishable so you can still use a Conditional Split to skip it.
// Even in C# you have to loop through the file to count the records var lineCount = 0; using (var reader = File.OpenText(@"C:\file.txt")) { while (reader.ReadLine() != null) { lineCount++; } } // There is a more efficient way of counting lines, but it's // only available/efficient in .Net Framework 4 and later. var lineCount = File.ReadLines(@"C:\file.txt").Count();
Let's elaborate the second solution.
This solution will first count the number of records in the file and then adds a numberfield with an increasing number and uses that number to filter out the last record.
1) Add Variable
Add an integer variable named Counter to store the number of records.
Right Click in your Data Flow |
2) Add first dataflow
In the first data flow there is only a Flat File Source Component that reads your text file. Select only 1 column (preferably the smallest) because we are only counting the rows. Add a Row Count Component that uses the variable of step 1 to store the number of records. It should look like this:
Only counting the rows. |
3) Add second dataflow
Add a second dataflow with the same Flat File Source Component that reads your text file (also reuse the Flat File Connection Manager). And add a Script Component (transformation) behind it. The Script Component will add a RowId column.
Script Component (transformation) |
4) Script Component
Edit the Script Component. We need a new column to store the Row ID in. So, add a new column on the tab Inputs and Outputs. The type should be an integer. A four-byte signed integer should be enough, but you can change that for smaller or larger files.
Add new integer column |
5) The Script
SSIS will create three methods on default, but you will only need Input0_ProcessInputRow (remove the others). Copy the following method and variable declaration to your script.
// C# code using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { // New internal variable to store the rownumber private int rowCounter = 0; // Method that will be started for each record in you dataflow // It fills the new column with an increasing number public override void Input0_ProcessInputRow(Input0Buffer Row) { // Indent counter rowCounter++; // Fill the new column Row.RowId = rowCounter; } }
The result will be something like this:
RowId Name Address 1 Joost Street 1 2 William Avenue 4b 3 John Plaza 3 4 Bill Road 15 5 Records 4 |
6) Conditional Split
Now you can use the RowId column and Counter variable to split the trailer record(s). Add a Conditional Split behind your Script Component. If you have only one trailer record the expression should be: RowId < @[User::Counter]
Split the trailer from the rest |
7) The result
The solution may look clumsy, but it works rather fast (even on my laptop with over a million records).
Overview of the solution |
Note: merging the two Data Flows or adding a post method in the Script Component wont work because they fill the variable too late and the conditional split wont work correct.
Won't work |
Please let me know if you have an other solution to skip the trailer record.
Hi, this looks great. I have a large text file and need to read only the footer only. Is there a quick way (without reading all the records) of reading this?
ReplyDeleteI'm afraid that isn't possible with SSIS. But there are some cleaver .Net Scripts available (which I haven't tested myself). For example: http://stackoverflow.com/questions/398378/get-last-10-lines-of-very-large-text-file-10gb-c-sharp
Deletehttp://stackoverflow.com/questions/4619735/how-to-read-last-n-lines-of-log-file
Some of the images are missing - in particular 4) Script Component. I hope you can re-add it. Thanks.
DeleteHi Thanks for leaving a message! Please try refreshing the page... can see missing images. The image a 4) is http://3.bp.blogspot.com/-VEA4tmJdQJo/TV5_no0jqXI/AAAAAAAAAOA/GVV3Z1BcJWc/s320/Footer5.jpg. If it's still not working then please contact me via the Contact Me in the menu.
DeleteHow to remove header and footer row in flat file and move the flat file to another location without having that header and footer rows .
ReplyDeleteJust read the file as in the solutions above and add a Flat File destination.
DeleteThis worked wonders. Thank you so much!
ReplyDelete