Wednesday, 6 February 2013

Value of variable during runtime

Case
I want to know the value of a variable during runtime of an SSIS package.
Value of variable between two tasks


















Solution
You can either use a breakpoint or a Script Task to find out the value of a variable.

A) Script Task

A1) Add Script Task
Add a Script Task between the two tasks and select the FilePath variable as readdonly variable.
Script Task - Readonly variable



















A2) The script
There are three options that you could choose. Pick one.
//C# code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_d730d75a40304a6bb675bc184c2aa717
{
 [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
 public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
 {

  public void Main()
  {
        // Choose one of these methods:

        // 1) Fire event and watch the execution result tab
        bool fireAgain = true;
        Dts.Events.FireInformation(-1, "Value of FilePath:", Dts.Variables["User::FilePath"].Value.ToString(), string.Empty, -1, ref fireAgain);


        // 2) Use the .Net framework trace log and see Debug View: http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx
        System.Diagnostics.Trace.WriteLine("Value of FilePath: " + Dts.Variables["User::FilePath"].Value.ToString());


        // 3) Good old messagebox and click to continue
        System.Windows.Forms.MessageBox.Show("Value of FilePath: " + Dts.Variables["User::FilePath"].Value.ToString());


        Dts.TaskResult = (int)ScriptResults.Success;
  }

        #region ScriptResults declaration
        /// <summary>
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// </summary>
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
 }
}

or VB.Net

'VB.Net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute()> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Public Sub Main()
        ' Choose one of these methods:

        ' 1) Fire event and watch the execution result tab
        Dim fireAgain As Boolean = True
        Dts.Events.FireInformation(-1, "Value of FilePath:", Dts.Variables("User::FilePath").Value.ToString(), String.Empty, -1, fireAgain)


        ' 2) Use the .Net framework trace log and see Debug View: http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx
        System.Diagnostics.Trace.WriteLine("Value of FilePath: " + Dts.Variables("User::FilePath").Value.ToString())


        ' 3) Good old messagebox and click to continue
        System.Windows.Forms.MessageBox.Show("Value of FilePath: " + Dts.Variables("User::FilePath").Value.ToString())


        Dts.TaskResult = ScriptResults.Success
    End Sub

#Region "ScriptResults declaration"
    'This enum provides a convenient shorthand within the scope of this class for setting the
    'result of the script.

    'This code was generated automatically.
    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum

#End Region

End Class

A3) The Result
Now run the package and watch the result.
Oops, no flat file but an Excel file























B) Breakpoints

B1) Breakpoint
Right click the Data Flow Task, Choose "Edit Breakpoints..." and then select the OnPreExecute event. This is the event right before starting the Data Flow Task.
Add breakpoint























B2) Excute package
Now execute the package and wait for it to hit the breakpoint.
Run package and wait for breakpoint

















B3) Locals
Wait for the package to hit the breakpoint. Then go to the Debug menu and click Windows and then Locals (Ctrl+Alt+V,L). This wil open a new window.
Locals
















B4) The result
Now downdrill the variables in the Locals window and search for your variable and its value.
Oops, no flat file but an Excel file

























This last method is probably a lot easier.

Friday, 1 February 2013

T-SQL Merge in SSIS as SCD alternative

Case
About two year ago I did a post about alternatives for the poorly performing SCD Transformation in SSIS. And even in SSIS 2012 it is still unusably slow for larger data sets. In that post I showed a couple of alternatives, but I didn't workout the T-SQL Merge solution.

Solution
A small recap of the alternatives in that post:
  1. The SSIS Dimension Merge SCD from Pragmatic Works which is according to the developers about a 100 times faster.
  2. And there are lot of third party components that can do the trick like CozyRoc Table Difference ($), the Pragmatic Works Upsert Destination ($), Konesans Checksum Transformation and our own Checksum Transformation.
  3. And of course the standard Lookup Transformation solution which works quite well for 2005.
  4. But if you have SSIS 2008 or later, then you could also use the TSQL Merge statement. Let's elaborate that solution.
Before we start, there is also a MERGE Destination that uses a T-SQL Merge statement, but it is still beta (since 2008) and the reviews aren't that good. I did a small test my self, but stopped when I got some meaningless error messages.

T-SQL Merge solution
For this example I use a text file with client information that I want to use to populate my Client dimension. To keep it simple I just use 4 columns: BusinessKey, Title, Firstname and Lastname.


1) Variables
For logging purposes we need to add three integer variables: staged, inserts, updates:
Variables for logging








2) Staging
Next step is to load the flat file with clients into a staging table with a Data Flow Task. The staging table is truncated before loading and the number of records in the Data Flow Task is stored in the variable staged with a rowcount transformation. It should look something like this below:
















3) Execute SQL Task
Now we have two tables (staging and dimension) and we need to merge them:
Staging and dimension





















For this we need to add an Execute SQL Task.
Execute SQL Task






















4) Execute SQL Task - General
Edit the task and set the ResultSet to Single row, select the OLE DB connection and enter the MERGE query. The MERGE query below does INSERTS and UPDATES only. We don't delete records from the dimension table. For logging purposes I added a count of the number of inserts and updates.
General pane




















The query to use:
-- Create a table variable to store the number of updates and inserts
DECLARE @tableVar TABLE (MergeAction VARCHAR(20))

-- The actual MERGE statement
MERGE  dwh.Dim_Client as MyTarget
USING  (
   SELECT BusinessKey
   ,  Title
   ,  FirstName
   ,  LastName
   FROM staging.Clients
   ) as MySource
   ON MyTarget.BusinessKey = MySource.BusinessKey
   WHEN MATCHED AND NOT (
    MySource.Title   = ISNULL(MyTarget.Title, '')
    AND MySource.FirstName = ISNULL(MyTarget.FirstName, '')
    AND MySource.LastName = ISNULL(MyTarget.LastName, '')
   )
   THEN -- BusinessKey of client exists, but data is different
    UPDATE  
    Set MyTarget.Title  = MySource.Title
    , MyTarget.FirstName = MySource.FirstName
    , MyTarget.LastName = MySource.LastName
   WHEN NOT MATCHED BY TARGET
   THEN -- BusinessKey of client does not exist
    INSERT (BusinessKey, Title, FirstName, LastName)
    VALUES (MySource.BusinessKey, MySource.Title, MySource.FirstName, MySource.LastName)
OUTPUT $action INTO @tableVar;

-- Get the number of inserts and updates on one line
SELECT  SUM(Inserted) as Inserted
,   SUM(Updated) as Updated
FROM  (
   -- Count the number of inserts
   SELECT COUNT(*) as Inserted, 0 as Updated
   FROM @tableVar  
   WHERE MergeAction = 'INSERT'

   UNION ALL
   -- Count the number of updates
   SELECT 0 as Inserted, COUNT(*) as Updated
   FROM @tableVar  
   WHERE MergeAction = 'UPDATE'
   ) as CountTable;


5) Execute SQL Task - Result Set
Go to the Result Set pane and select the insert and update variables for storing the counts.
Result Set pane























6) The Result
For testing purposes I added a ridiculous number of clients in my text file and instead of logging to a table I added a MessageBox in a Script Task. My 4 year old, 4GB machine did quite well with the MERGE statement (around 30 second), but the SCD hadn't processed any records after two hours.
The result