Thursday, 5 July 2012

Foreach Excel Worksheet Enumerator

Case
I have an Excel file with multiple identical worksheets (one foreach month) and I want to add the data to a single database table. Is there a foreach loop solution so that I don't need to add multiple data flows or sources.?

Solution
a) You could use a union all query in the Excel Source. Not very flexible, but very easy.
SELECT  Column1
,  Column2
,  Column3
FROM  [Sheet1$]
UNION ALL
SELECT  Column1
,  Column2
,  Column3
FROM  [Sheet2$]
UNION ALL
SELECT  Column1
,  Column2
,  Column3
FROM  [Sheet3$]
b) You could loop through Excel tables by using the Foreach ADO.NET Schema Rowset enumerator. One downside: it returns both worksheets (which have the $ suffix) and named ranges. So you need an extra dummy task in your foreach loop with an expression on the precedence constraint to the next task. Something like: RIGHT(@[User::WorksheetName], 1) == "$"
c) You could use a Script Task to fill a SSIS object variable and use that to loop through. Let's elaborate that solution.

*update: Custom Excel Worksheet Enumerator.


1) Excel Connection Manager and Variables
Add an Excel Connection manager with a link to an Excel File with multiple Worksheets. I named mine "MyExcelFile". If you choose an other name, then make sure to also change that in the script task of step 3.

And we also need two SSIS variables. One object variable (ExcelWorksheets) to loop through in the foreach loop and one string variable (WorksheetName) to be filled by the foreach loop.
Two variables



















2) Script Task
Add a Script Task to your Control Flow and give it a suitable name. Edit the Script Task and add the SSIS Object variable "ExcelWorksheets" from step 1 as ReadWrite variable.
ReadWriteVariables






















3) The script
Edit the Script Task (open VSTA editor) and copy the following script.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.OleDb;        // Added

namespace ST_0607edbf2c834733a551cd01039cb715.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            try
            {
                // Get connectionstring from Excel Connection Manager and use it to connect through OLE DB
                OleDbConnection excelConnection = new OleDbConnection(Dts.Connections["MyExcelFile"].ConnectionString);
                excelConnection.Open();

                // Once connected, get the table schema and close the OLE DB connection
                DataTable dtDatasetsInExcel = excelConnection.GetSchema("Tables");
                excelConnection.Close();


                // NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix)
                // and named ranges. So we need to exclude those named ranges.

                // Create a dataset. 
                DataSet dsWorksheetsInExcel = new DataSet();

                // Create a new table in the dataset
                DataTable dtWorksheetsInExcel = dsWorksheetsInExcel.Tables.Add();
                dtWorksheetsInExcel.Columns.Add("WorksheetName", typeof(string));


                // Loop through all tables and only get those ending with a $
                foreach (DataRow drWorksheet in dtDatasetsInExcel.Rows)
                {
                    // Check for $ suffix
                    if (drWorksheet["TABLE_NAME"].ToString().EndsWith("$"))
                    {
                        dtWorksheetsInExcel.Rows.Add(drWorksheet["TABLE_NAME"].ToString());
                    }
                }

                // Fire information event with the total number of worksheets
                bool fireAgain = true;
                Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() + " worksheets found.", string.Empty, 0, ref fireAgain);


                // Fill SSIS Object variable with worksheet dataset.
                Dts.Variables["User::ExcelWorksheets"].Value = dsWorksheetsInExcel;

                // Success
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Something went wrong. Log error and fail Script Task
                Dts.Events.FireError(-1,"Foreach Worksheet", ex.Message, string.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}


or VB.Net

' VB.Net code
Imports System
Imports System.Data
Imports System.Data.OleDb   ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="" > _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        Try
            ' Get connectionstring from Excel Connection Manager and use it to connect through OLE DB
            Dim excelConnection As New OleDbConnection(Dts.Connections("MyExcelFile").ConnectionString)
            excelConnection.Open()

            ' Once connected, get the table schema and close the OLE DB connection
            Dim dtDatasetsInExcel As DataTable = excelConnection.GetSchema("Tables")
            excelConnection.Close()

            ' NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix)
            ' and named ranges. So we need to exclude those named ranges.

            ' Create a dataset. 
            Dim dsWorksheetsInExcel As New DataSet()

            ' Create a new table in the dataset
            Dim dtWorksheetsInExcel As DataTable = dsWorksheetsInExcel.Tables.Add()
            dtWorksheetsInExcel.Columns.Add("WorksheetName", GetType(String))

            ' Loop through all tables and only get those ending with a $
            For Each drWorksheet As DataRow In dtDatasetsInExcel.Rows
                ' Check for $ suffix
                If drWorksheet("TABLE_NAME").ToString().EndsWith("$") Then
                    dtWorksheetsInExcel.Rows.Add(drWorksheet("TABLE_NAME").ToString())
                End If
            Next

            ' Fire information event with the total number of worksheets
            Dim fireAgain As Boolean = True
            Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() & " worksheets found.", String.Empty, 0, fireAgain)

            ' Fill SSIS Object variable with worksheet dataset.
            Dts.Variables("User::ExcelWorksheets").Value = dsWorksheetsInExcel

            ' Success
            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            ' Something went wrong. Log error and fail Script Task
            Dts.Events.FireError(-1, "Foreach Worksheet", ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub
End Class



Worksheets and named ranges











4) Foreach Loop
Add a Foreach Loop Container to the Control Flow and give it a suitable name. Connect it to the Script Task from step 2.
Foreach Loop


























5) Foreach ADO enumerator
Edit the Foreach Loop Container and select the ADO enumerator in the collection tab. After that select the SSIS object variable from step 1 as ADO object source variable and select Rows in first table as the enumeration mode.
Foreach ADO enumerator
























6) Variable mappings
In the Variable mappings tab select the string variable from step 1. Now the Foreach Loop will fill this variable with the currect Worksheet name.
Variable mappings
























7) Excel Source
Add a Data Flow Task in the foreach loop and add a Excel Source to that Data Flow Task. Select the Excel Connection manager created in step 1 and select Table name or view name variable. Now you can select the variable that contains the Worksheet name.
Excel Source using variable for table name
























Note 1: hidden worksheets are not returned by the OLE DB provider.
Note 2: Worksheets are always sorted ascending on their name
Note 3: If you have different data structures, but worksheets all have some common columns, then you could create a query in a variable and use that as a source query. Something like "SELECT column1, column4 FROM " + @[User::WorksheetName]

5 comments:

  1. thanks very much for publishing this.

    just a quick note:
    your check for $ suffix
    if(drWorksheet["TABLE_NAME"].ToString().EndsWith("$"))

    does not always work as Microsoft, on their wisdom, can set worksheet names with single quotes to surround non-alpha characters as in:

    tab name = JUL-12

    will be loaded by your script as 'JUL-12$'

    hence the last character is no longer $ but instead '

    with regards,

    Nicolas

    www.brainpowered.net

    ReplyDelete
    Replies
    1. @Nicolas: Thanks for the valuable feedback!
      An easy work-around could be to extend the if statement with an OR. Something like:
      if (drWorksheet["TABLE_NAME"].ToString().EndsWith("$")) or drWorksheet["TABLE_NAME"].ToString().EndsWith("$!"))

      But will test it and come up with a solution.

      Delete
  2. Hi Joost, I downloaded the ForeachEnumerator, however when I try to install it, I get an error stating "Could not access network location ForEachEumerators\."

    Could you shed some light on this?

    ReplyDelete
    Replies
    1. Have you tried running it as administrator? Which version/edition of SSIS are you using?

      Delete
  3. I know this post was from a while ago but maybe someone can help. It appears that the oledb connection does not update if this code is placed inside a nested foreach loop to loop through multiple files and multiple worksheets. For whatever reason, design value sticks even though connection is being changed during the runtime. I ended up updating the connection string inside the script task. This isn't probably much to go on but anyone has ideas on why this may be so?

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.