I have an Excel file with multiple identical worksheets (one foreach month) and I want to add the data to a single database table. Is there a foreach loop solution so that I don't need to add multiple data flows or sources.?
Solution
a) You could use a union all query in the Excel Source. Not very flexible, but very easy.
SELECT Column1 , Column2 , Column3 FROM [Sheet1$] UNION ALL SELECT Column1 , Column2 , Column3 FROM [Sheet2$] UNION ALL SELECT Column1 , Column2 , Column3 FROM [Sheet3$]b) You could loop through Excel tables by using the Foreach ADO.NET Schema Rowset enumerator. One downside: it returns both worksheets (which have the $ suffix) and named ranges. So you need an extra dummy task in your foreach loop with an expression on the precedence constraint to the next task. Something like: RIGHT(@[User::WorksheetName], 1) == "$"
c) You could use a Script Task to fill a SSIS object variable and use that to loop through. Let's elaborate that solution.
*update: Custom Excel Worksheet Enumerator.
1) Excel Connection Manager and Variables
Add an Excel Connection manager with a link to an Excel File with multiple Worksheets. I named mine "MyExcelFile". If you choose an other name, then make sure to also change that in the script task of step 3.
And we also need two SSIS variables. One object variable (ExcelWorksheets) to loop through in the foreach loop and one string variable (WorksheetName) to be filled by the foreach loop.
Two variables |
2) Script Task
Add a Script Task to your Control Flow and give it a suitable name. Edit the Script Task and add the SSIS Object variable "ExcelWorksheets" from step 1 as ReadWrite variable.
ReadWriteVariables |
3) The script
Edit the Script Task (open VSTA editor) and copy the following script.
// C# Code using System; using System.Data; using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; using System.Data.OleDb; // Added namespace ST_0607edbf2c834733a551cd01039cb715.csproj { [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { #region VSTA generated code enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion public void Main() { try { // Get connectionstring from Excel Connection Manager and use it to connect through OLE DB OleDbConnection excelConnection = new OleDbConnection(Dts.Connections["MyExcelFile"].ConnectionString); excelConnection.Open(); // Once connected, get the table schema and close the OLE DB connection DataTable dtDatasetsInExcel = excelConnection.GetSchema("Tables"); excelConnection.Close(); // NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix) // and named ranges. So we need to exclude those named ranges. // Create a dataset. DataSet dsWorksheetsInExcel = new DataSet(); // Create a new table in the dataset DataTable dtWorksheetsInExcel = dsWorksheetsInExcel.Tables.Add(); dtWorksheetsInExcel.Columns.Add("WorksheetName", typeof(string)); // Loop through all tables and only get those ending with a $ foreach (DataRow drWorksheet in dtDatasetsInExcel.Rows) { // Check for $ suffix if (drWorksheet["TABLE_NAME"].ToString().EndsWith("$")) { dtWorksheetsInExcel.Rows.Add(drWorksheet["TABLE_NAME"].ToString()); } } // Fire information event with the total number of worksheets bool fireAgain = true; Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() + " worksheets found.", string.Empty, 0, ref fireAgain); // Fill SSIS Object variable with worksheet dataset. Dts.Variables["User::ExcelWorksheets"].Value = dsWorksheetsInExcel; // Success Dts.TaskResult = (int)ScriptResults.Success; } catch (Exception ex) { // Something went wrong. Log error and fail Script Task Dts.Events.FireError(-1,"Foreach Worksheet", ex.Message, string.Empty, 0); Dts.TaskResult = (int)ScriptResults.Failure; } } } }
or VB.Net
' VB.Net code Imports System Imports System.Data Imports System.Data.OleDb ' Added Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime <System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="" > _ <System.CLSCompliantAttribute(False)> _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase Enum ScriptResults Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure End Enum Public Sub Main() Try ' Get connectionstring from Excel Connection Manager and use it to connect through OLE DB Dim excelConnection As New OleDbConnection(Dts.Connections("MyExcelFile").ConnectionString) excelConnection.Open() ' Once connected, get the table schema and close the OLE DB connection Dim dtDatasetsInExcel As DataTable = excelConnection.GetSchema("Tables") excelConnection.Close() ' NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix) ' and named ranges. So we need to exclude those named ranges. ' Create a dataset. Dim dsWorksheetsInExcel As New DataSet() ' Create a new table in the dataset Dim dtWorksheetsInExcel As DataTable = dsWorksheetsInExcel.Tables.Add() dtWorksheetsInExcel.Columns.Add("WorksheetName", GetType(String)) ' Loop through all tables and only get those ending with a $ For Each drWorksheet As DataRow In dtDatasetsInExcel.Rows ' Check for $ suffix If drWorksheet("TABLE_NAME").ToString().EndsWith("$") Then dtWorksheetsInExcel.Rows.Add(drWorksheet("TABLE_NAME").ToString()) End If Next ' Fire information event with the total number of worksheets Dim fireAgain As Boolean = True Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() & " worksheets found.", String.Empty, 0, fireAgain) ' Fill SSIS Object variable with worksheet dataset. Dts.Variables("User::ExcelWorksheets").Value = dsWorksheetsInExcel ' Success Dts.TaskResult = ScriptResults.Success Catch ex As Exception ' Something went wrong. Log error and fail Script Task Dts.Events.FireError(-1, "Foreach Worksheet", ex.Message, String.Empty, 0) Dts.TaskResult = ScriptResults.Failure End Try End Sub End Class
Worksheets and named ranges |
4) Foreach Loop
Add a Foreach Loop Container to the Control Flow and give it a suitable name. Connect it to the Script Task from step 2.
Foreach Loop |
5) Foreach ADO enumerator
Edit the Foreach Loop Container and select the ADO enumerator in the collection tab. After that select the SSIS object variable from step 1 as ADO object source variable and select Rows in first table as the enumeration mode.
Foreach ADO enumerator |
6) Variable mappings
In the Variable mappings tab select the string variable from step 1. Now the Foreach Loop will fill this variable with the currect Worksheet name.
Variable mappings |
7) Excel Source
Add a Data Flow Task in the foreach loop and add a Excel Source to that Data Flow Task. Select the Excel Connection manager created in step 1 and select Table name or view name variable. Now you can select the variable that contains the Worksheet name.
Excel Source using variable for table name |
Note 1: hidden worksheets are not returned by the OLE DB provider.
Note 2: Worksheets are always sorted ascending on their name
Note 3: If you have different data structures, but worksheets all have some common columns, then you could create a query in a variable and use that as a source query. Something like "SELECT column1, column4 FROM " + @[User::WorksheetName]
thanks very much for publishing this.
ReplyDeletejust a quick note:
your check for $ suffix
if(drWorksheet["TABLE_NAME"].ToString().EndsWith("$"))
does not always work as Microsoft, on their wisdom, can set worksheet names with single quotes to surround non-alpha characters as in:
tab name = JUL-12
will be loaded by your script as 'JUL-12$'
hence the last character is no longer $ but instead '
with regards,
Nicolas
www.brainpowered.net
@Nicolas: Thanks for the valuable feedback!
DeleteAn easy work-around could be to extend the if statement with an OR. Something like:
if (drWorksheet["TABLE_NAME"].ToString().EndsWith("$")) or drWorksheet["TABLE_NAME"].ToString().EndsWith("$!"))
But will test it and come up with a solution.
Hi Joost, I downloaded the ForeachEnumerator, however when I try to install it, I get an error stating "Could not access network location ForEachEumerators\."
ReplyDeleteCould you shed some light on this?
Have you tried running it as administrator? Which version/edition of SSIS are you using?
DeleteI know this post was from a while ago but maybe someone can help. It appears that the oledb connection does not update if this code is placed inside a nested foreach loop to loop through multiple files and multiple worksheets. For whatever reason, design value sticks even though connection is being changed during the runtime. I ended up updating the connection string inside the script task. This isn't probably much to go on but anyone has ideas on why this may be so?
ReplyDelete