Tuesday, 17 July 2012

Custom SSIS Component: Foreach Excel Worksheet Enumerator

Recently I did a post on how to loop through all sheets in an Excel file. For those who don't like scripting and for those who think the Foreach ADO.NET Schema Rowset solution is ugly, I created a custom enumerator. Same result, but with an easier interface and a possibility to filter with both wildcards and regular expressions.

Foreach Excel Worksheet Enumerator























Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

Installation
The installer registers the DLL in the GAC and copies it to the ForEachEnumerators folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\ForEachEnumerators\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup.

SSIS
The enumerator is automatically added to the Foreach Loop when you open a SSIS project.
The new enumerator

Tuesday, 10 July 2012

Disable multiple checkout for SSIS 2008 R2 and before

Case
A while ago I did a post on how to get SSIS 2008 working with Team Foundation Server (TFS). But beware, you shouldn't work with two developers on one package at the same time! That will cause problems with for example lineage id's. Merging two versions will be nearly impossible. So how do you prevent that?

Solution
Go to the Team Explorer pane in Visual Studio (BIDS). Right click on the solution and select Team Project Settings and then Source Control. Now you can uncheck the checkbox for Enable multiple check-out.  
Disable multiple check-out for SSIS 2008
























Note: SSIS 2012 seems to work better with merging.

Thursday, 5 July 2012

Foreach Excel Worksheet Enumerator

Case
I have an Excel file with multiple identical worksheets (one foreach month) and I want to add the data to a single database table. Is there a foreach loop solution so that I don't need to add multiple data flows or sources.?

Solution
a) You could use a union all query in the Excel Source. Not very flexible, but very easy.
SELECT  Column1
,  Column2
,  Column3
FROM  [Sheet1$]
UNION ALL
SELECT  Column1
,  Column2
,  Column3
FROM  [Sheet2$]
UNION ALL
SELECT  Column1
,  Column2
,  Column3
FROM  [Sheet3$]
b) You could loop through Excel tables by using the Foreach ADO.NET Schema Rowset enumerator. One downside: it returns both worksheets (which have the $ suffix) and named ranges. So you need an extra dummy task in your foreach loop with an expression on the precedence constraint to the next task. Something like: RIGHT(@[User::WorksheetName], 1) == "$"
c) You could use a Script Task to fill a SSIS object variable and use that to loop through. Let's elaborate that solution.

*update: Custom Excel Worksheet Enumerator.


1) Excel Connection Manager and Variables
Add an Excel Connection manager with a link to an Excel File with multiple Worksheets. I named mine "MyExcelFile". If you choose an other name, then make sure to also change that in the script task of step 3.

And we also need two SSIS variables. One object variable (ExcelWorksheets) to loop through in the foreach loop and one string variable (WorksheetName) to be filled by the foreach loop.
Two variables



















2) Script Task
Add a Script Task to your Control Flow and give it a suitable name. Edit the Script Task and add the SSIS Object variable "ExcelWorksheets" from step 1 as ReadWrite variable.
ReadWriteVariables






















3) The script
Edit the Script Task (open VSTA editor) and copy the following script.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Data.OleDb;        // Added

namespace ST_0607edbf2c834733a551cd01039cb715.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            try
            {
                // Get connectionstring from Excel Connection Manager and use it to connect through OLE DB
                OleDbConnection excelConnection = new OleDbConnection(Dts.Connections["MyExcelFile"].ConnectionString);
                excelConnection.Open();

                // Once connected, get the table schema and close the OLE DB connection
                DataTable dtDatasetsInExcel = excelConnection.GetSchema("Tables");
                excelConnection.Close();


                // NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix)
                // and named ranges. So we need to exclude those named ranges.

                // Create a dataset. 
                DataSet dsWorksheetsInExcel = new DataSet();

                // Create a new table in the dataset
                DataTable dtWorksheetsInExcel = dsWorksheetsInExcel.Tables.Add();
                dtWorksheetsInExcel.Columns.Add("WorksheetName", typeof(string));


                // Loop through all tables and only get those ending with a $
                foreach (DataRow drWorksheet in dtDatasetsInExcel.Rows)
                {
                    // Check for $ suffix
                    if (drWorksheet["TABLE_NAME"].ToString().EndsWith("$"))
                    {
                        dtWorksheetsInExcel.Rows.Add(drWorksheet["TABLE_NAME"].ToString());
                    }
                }

                // Fire information event with the total number of worksheets
                bool fireAgain = true;
                Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() + " worksheets found.", string.Empty, 0, ref fireAgain);


                // Fill SSIS Object variable with worksheet dataset.
                Dts.Variables["User::ExcelWorksheets"].Value = dsWorksheetsInExcel;

                // Success
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Something went wrong. Log error and fail Script Task
                Dts.Events.FireError(-1,"Foreach Worksheet", ex.Message, string.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}


or VB.Net

' VB.Net code
Imports System
Imports System.Data
Imports System.Data.OleDb   ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="" > _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        Try
            ' Get connectionstring from Excel Connection Manager and use it to connect through OLE DB
            Dim excelConnection As New OleDbConnection(Dts.Connections("MyExcelFile").ConnectionString)
            excelConnection.Open()

            ' Once connected, get the table schema and close the OLE DB connection
            Dim dtDatasetsInExcel As DataTable = excelConnection.GetSchema("Tables")
            excelConnection.Close()

            ' NOTE: The datatable from GetSchema includes both worksheets (which have the $ suffix)
            ' and named ranges. So we need to exclude those named ranges.

            ' Create a dataset. 
            Dim dsWorksheetsInExcel As New DataSet()

            ' Create a new table in the dataset
            Dim dtWorksheetsInExcel As DataTable = dsWorksheetsInExcel.Tables.Add()
            dtWorksheetsInExcel.Columns.Add("WorksheetName", GetType(String))

            ' Loop through all tables and only get those ending with a $
            For Each drWorksheet As DataRow In dtDatasetsInExcel.Rows
                ' Check for $ suffix
                If drWorksheet("TABLE_NAME").ToString().EndsWith("$") Then
                    dtWorksheetsInExcel.Rows.Add(drWorksheet("TABLE_NAME").ToString())
                End If
            Next

            ' Fire information event with the total number of worksheets
            Dim fireAgain As Boolean = True
            Dts.Events.FireInformation(-1, "Foreach Worksheet", dtWorksheetsInExcel.Rows.Count.ToString() & " worksheets found.", String.Empty, 0, fireAgain)

            ' Fill SSIS Object variable with worksheet dataset.
            Dts.Variables("User::ExcelWorksheets").Value = dsWorksheetsInExcel

            ' Success
            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            ' Something went wrong. Log error and fail Script Task
            Dts.Events.FireError(-1, "Foreach Worksheet", ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub
End Class



Worksheets and named ranges











4) Foreach Loop
Add a Foreach Loop Container to the Control Flow and give it a suitable name. Connect it to the Script Task from step 2.
Foreach Loop


























5) Foreach ADO enumerator
Edit the Foreach Loop Container and select the ADO enumerator in the collection tab. After that select the SSIS object variable from step 1 as ADO object source variable and select Rows in first table as the enumeration mode.
Foreach ADO enumerator
























6) Variable mappings
In the Variable mappings tab select the string variable from step 1. Now the Foreach Loop will fill this variable with the currect Worksheet name.
Variable mappings
























7) Excel Source
Add a Data Flow Task in the foreach loop and add a Excel Source to that Data Flow Task. Select the Excel Connection manager created in step 1 and select Table name or view name variable. Now you can select the variable that contains the Worksheet name.
Excel Source using variable for table name
























Note 1: hidden worksheets are not returned by the OLE DB provider.
Note 2: Worksheets are always sorted ascending on their name
Note 3: If you have different data structures, but worksheets all have some common columns, then you could create a query in a variable and use that as a source query. Something like "SELECT column1, column4 FROM " + @[User::WorksheetName]