Sunday, 30 December 2012

SSIS wishlist for 2013

It's almost 2013. A good time to look back on the past year, but also a good time for my SSIS wishlist for 2013. It's only a top 10, because I don't want to look greedy. ;-)
  1. Package parts for reusability. Vote
  2. Folder options in Visual Studio for SSIS projects. Vote
  3. BI templates for the latest Visual Studio (want to look cool among my .Net colleagues). Vote and Vote
  4. Toolbox improvements (ordering and folders). Vote and Vote
  5. Rename refactoring option (try to rename an used variable). Vote
  6. Change column order of Flat File connection manager. Vote
  7. Validate without running option. Vote
  8. Expressions on Project Parameters. Vote
  9. XML Source using a connection manager. Vote
  10. Replicate option for server environments. Vote
 
Submit your own SSIS wishes in the comments or at Microsoft Connect.

Monday, 17 December 2012

SSIS 2012 Copy Environments

Case
I created a new environment in the Integration Services Catalogs, but I can't copy it. If I want a duplicate / clone the enviroment then I have to create it from scratch with all the variables. Where is the copy option?
Where is the copy environment option?
















Solution
Well there isn't one yet (please see/vote this Microsoft Connect suggestion).  So I copied the create_environment stored procedure and altered it to a copy_environment stored procedure. You have to know the environment id from the environment you want to copy. Double click the existing environment and get the number from the Identifier property.
Get identifier of existing environment



















Add the new stored procedure and execute it like:
EXEC [catalog].[copy_environment] 3, 'Test', 'Test environment';

The result: a new environment with
same variables as 'Development'














PS you still have to reference the enviroment to your project manually:
Update: I adjusted the stored procedure and now it also links the new environment to the same project as the existing environment.
Reference the enviroment to your project
















The new stored procedure (use at own risk!):
USE [SSISDB]
GO

-- USE AT OWN RISK! This stored procedure is altered from create_environment that was shipped with:
--      Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64) 
--      Oct 19 2012 13:38:57 
--      Copyright (c) Microsoft Corporation
--      Developer Edition (64-bit) on Windows NT 6.1 <x64> (Build 7601: Service Pack 1)

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


CREATE PROCEDURE [catalog].[copy_environment]
        --@folder_name        nvarchar(128),         -- Changed from parameter to variable
        @current_environment_id bigint,              -- Added to store the ID of the environment you want to copy
        @new_environment_name   nvarchar(128),         -- Renamed (added new_)
        @new_environment_description    nvarchar(1024)= NULL    -- Renamed (added new_)
WITH EXECUTE AS 'AllSchemaOwner'
AS
    SET NOCOUNT ON
    
    DECLARE @folder_id bigint
    DECLARE @folder_name        nvarchar(128)     -- Added (moved from parameter)
    DECLARE @new_environment_id bigint        -- Renamed (added new_)
    DECLARE @result bit
    
    
    DECLARE @caller_id     int
    DECLARE @caller_name   [internal].[adt_sname]
    DECLARE @caller_sid    [internal].[adt_sid]
    DECLARE @suser_name    [internal].[adt_sname]
    DECLARE @suser_sid     [internal].[adt_sid]
    
    EXECUTE AS CALLER
        EXEC [internal].[get_user_info]
            @caller_name OUTPUT,
            @caller_sid OUTPUT,
            @suser_name OUTPUT,
            @suser_sid OUTPUT,
            @caller_id OUTPUT;
          
          
        IF(
            EXISTS(SELECT [name]
                    FROM sys.server_principals
                    WHERE [sid] = @suser_sid AND [type] = 'S')  
            OR
            EXISTS(SELECT [name]
                    FROM sys.database_principals
                    WHERE ([sid] = @caller_sid AND [type] = 'S')) 
            )
        BEGIN
            RAISERROR(27123, 16, 1) WITH NOWAIT
            RETURN 1
        END
    REVERT
    
    IF(
            EXISTS(SELECT [name]
                    FROM sys.server_principals
                    WHERE [sid] = @suser_sid AND [type] = 'S')  
            OR
            EXISTS(SELECT [name]
                    FROM sys.database_principals
                    WHERE ([sid] = @caller_sid AND [type] = 'S')) 
            )
    BEGIN
            RAISERROR(27123, 16, 1) WITH NOWAIT
            RETURN 1
    END
    
    --IF (@folder_name IS NULL OR @environment_name IS NULL)  -- Changed, not checking the folder_name any more. Renamed @environment_name to @new_environment_name
 IF (@new_environment_name IS NULL)
    BEGIN
        RAISERROR(27138, 16 , 6) WITH NOWAIT 
        RETURN 1 
    END
    
    IF [internal].[is_valid_name](@new_environment_name) = 0
    BEGIN
        RAISERROR(27142, 16, 1, @new_environment_name ) WITH NOWAIT
        RETURN 1
    END
    
    
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    
    
    
    DECLARE @tran_count INT = @@TRANCOUNT;
    DECLARE @savepoint_name NCHAR(32);
    IF @tran_count > 0
    BEGIN
        SET @savepoint_name = REPLACE(CONVERT(NCHAR(36), NEWID()), N'-', N'');
        SAVE TRANSACTION @savepoint_name;
    END
    ELSE
        BEGIN TRANSACTION;                                                                                          
    BEGIN TRY
    
        EXECUTE AS CALLER
            --SET @folder_id = (SELECT [folder_id] FROM [catalog].[folders] WHERE [name] = @folder_name)  -- Changed, getting the folder_id via the environment_id
   SET @folder_id = (SELECT [folder_id] FROM [internal].[environments] WHERE [environment_id] = @current_environment_id)
   SET @folder_name = (SELECT [name] FROM [catalog].[folders] WHERE [folder_id] = @folder_id)    -- Added, filling the folder_name which was a parameter
        REVERT
        
        IF @folder_id IS NULL
        BEGIN
            RAISERROR(27104 , 16 , 1, @folder_name) WITH NOWAIT
        END
        
        EXECUTE AS CALLER   
            SET @result =  [internal].[check_permission] 
                (
                    1,
                    @folder_id,
                    100
                 ) 
        REVERT
        
        IF @result = 0
        BEGIN
            RAISERROR(27209 , 16 , 1, @folder_name) WITH NOWAIT    
        END
         
        IF EXISTS(SELECT env.[environment_name] 
                      FROM [internal].[folders] fld INNER JOIN [internal].[environments] env
                      ON fld.[folder_id] = env.[folder_id] AND 
                      fld.[name] = @folder_name   AND 
                      env.[environment_name] = @new_environment_name)
        BEGIN
            RAISERROR(27157 , 16 , 1, @new_environment_name) WITH NOWAIT
        END
    
  -- CHANGED, old Insert query
        --INSERT INTO [internal].[environments] 
        --    VALUES (@environment_name, @folder_id, @environment_description, @caller_sid, @caller_name, SYSDATETIMEOFFSET())
        --    
        --SET @environment_id = SCOPE_IDENTITY()
       
        ------------------------------------------------------------------------------------------------------------
  -- NEW INSERT INTO QUERIES
  ------------------------------------------------------------------------------------------------------------
  -- Copy the environment
  INSERT INTO [internal].[environments]
  (
    [environment_name]
  ,  [folder_id]
  ,  [description]
  ,  [created_by_sid]
  ,  [created_by_name]
  ,  [created_time]
  )
  SELECT  @new_environment_name as [environment_name]  -- My new environment name from the parameter
  ,  [folder_id]
  ,  @new_environment_description as [description] -- My new environment description from the parameter
  ,  [created_by_sid]
  ,  [created_by_name]
  ,  SYSDATETIMEOFFSET() as [created_time]
  FROM [internal].[environments]
  WHERE [environment_id] = @current_environment_id   -- My existing environment from the parameter
  
  
  -- Get ID from the new environment
  SET @new_environment_id = SCOPE_IDENTITY()
  
  -- Copy the variables from the existing environment to the new environment
  INSERT INTO [internal].[environment_variables]
  (
    [environment_id]
  ,  [name]
  ,  [description]
  ,  [type]
  ,  [sensitive]
  ,  [value]
  ,  [sensitive_value]
  ,  [base_data_type]
  )
  SELECT @new_environment_id as [environment_id]   -- Id from the new environment
  ,  [name]
  ,  [description]
  ,  [type]
  ,  [sensitive]
  ,  [value]
  ,  [sensitive_value]
  ,  [base_data_type]
  FROM [internal].[environment_variables]
  WHERE [environment_id] = @current_environment_id   -- My existing environment from the parameter

  -- Reference this new environment to the same project
  INSERT INTO [internal].[environment_references]
  (
     [project_id]
  ,   [reference_type]
  ,   [environment_folder_name]
  ,   [environment_name]
  ,   [validation_status]
  ,   [last_validation_time]
  )
  SELECT  p.project_id
  ,   r.reference_type
  ,   r.environment_folder_name
  ,   @new_environment_name as environment_name -- My new environment name from the parameter
  ,   r.validation_status
  ,   r.last_validation_time
  FROM  [internal].[environment_references] as r
  INNER JOIN [internal].[projects] as p
     on r.project_id = p.project_id
  INNER JOIN [internal].[folders] as f
     on p.folder_id = f.folder_id
  INNER JOIN [internal].[environments] as e
     on e.folder_id = f.folder_id
     and e.environment_name = r.environment_name
  WHERE  e.environment_id = @current_environment_id -- My existing environment id from the parameter
  
  ------------------------------------------------------------------------------------------------------------
  -- END NEW INSERT INTO QUERIES
  ------------------------------------------------------------------------------------------------------------
        
        DECLARE @sqlString    nvarchar(1024)
        DECLARE @key_name               [internal].[adt_name]
        DECLARE @certificate_name       [internal].[adt_name]
        DECLARE @encryption_algorithm   nvarchar(255)
        
        SET @encryption_algorithm = (SELECT [internal].[get_encryption_algorithm]())
        
        IF @encryption_algorithm IS NULL
        BEGIN
            RAISERROR(27156, 16, 1, 'ENCRYPTION_ALGORITHM') WITH NOWAIT
        END
        
        
        SET @key_name = 'MS_Enckey_Env_'+CONVERT(varchar,@new_environment_id)
        SET @certificate_name = 'MS_Cert_Env_'+CONVERT(varchar,@new_environment_id)
        
        SET @sqlString = 'CREATE CERTIFICATE ' + @certificate_name + ' WITH SUBJECT = ''ISServerCertificate'''
        
        IF  NOT EXISTS (SELECT [name] FROM [sys].[certificates] WHERE [name] = @certificate_name)
            EXECUTE sp_executesql @sqlString 
        
        SET @sqlString = 'CREATE SYMMETRIC KEY ' + @key_name +' WITH ALGORITHM = ' 
                            + @encryption_algorithm + ' ENCRYPTION BY CERTIFICATE ' + @certificate_name
        
        IF  NOT EXISTS (SELECT [name] FROM [sys].[symmetric_keys] WHERE [name] = @key_name)
            EXECUTE sp_executesql @sqlString 
        
        
        DECLARE @retval int
        EXECUTE AS CALLER
            EXEC @retval = [internal].[init_object_permissions] 3, @new_environment_id, @caller_id
        REVERT
        IF @retval <> 0
        BEGIN
            
            RAISERROR(27153, 16, 1) WITH NOWAIT
        END    
         
        
        
        IF @tran_count = 0
            COMMIT TRANSACTION;                                                                                 
    END TRY

    BEGIN CATCH
        
        IF @tran_count = 0 
            ROLLBACK TRANSACTION;
        
        ELSE IF XACT_STATE() <> -1
            ROLLBACK TRANSACTION @savepoint_name;                                                                           
        
        THROW 

    END CATCH

    RETURN 0 


GO
Download as SQL file

NOTE: Please use at own risk and let me know it things could be improved!

Saturday, 15 December 2012

DTAP package configurations - part 2

Case
I have a DTAP (Development, Test, Acceptance, Production). How do I use SSIS configurations over these multiple environments?

Solutions
The most commonly used solutions for configurating multiple environments are:
a) XML Configuration File
b) SQL Server Configuration (in combination with environment variable)
c) SSIS 2012 (in combination with the new project deployment)

In this post I will eleborate SQL Server Configuration solution. For this scenario I have a simple package with one Data Flow Task that has one CSV source and one SQL destination. Let's start by adding configurations for the two Connection Managers.
Simple Scenario



















1) Add configurations
Go to the SSIS menu and select Package Configurations... and then Enable package configurations. After that hit the Add button.
Enable package configurations




















2) Configuration Type
Select SQL Server in the Configuration type selectbox. Create a new Connection Managers that links to the database with your configuration table (1). Create a new Configuration table or select an existing one (2). Enter a name that will describe the configurations (3). After that click Next.
Configuration Type




















3) Select Properties to Export
In this step you select all the properties you want to configure in the config table. In our case we select the ConnectionString property of both Connection Managers (but not the one from the new connection manager that you created in the previous step!). After that click next.
Select the two ConnectionString properties






















4) Configuration Name
Give your configuration a suitable name that will explain its purpose if you have multiple configurations.
Configuration name



















5) Add Environment variable
We now have a third Connection Manager, but its connectionstring will be different on all servers in your dtap environment. So we need to configure that new Connection Manager as well. We will store its connectionstring in a Windows Environment Variable.
This step will vary for each Operating System. The screenshots are from my Windows 7 laptop. Go to the properties of "My Computer", then to Advanced system settings, then to the Advanced tab and then hit he Environment Variables button. Now add a new System variable with the name SsisConfigurationDatabase and the value must me the connectionstring of that new Connection Manager. Repeat this step for each ssis server in your dtap environment.
New System Variable


















6) Add second configuration
Before we add a second configuration, you probably need to restart Visual Studio because it doesn't know your new Windows Environment variable. Add a new configuration and select Environment variable as Configuration type. Then select your new variable named SsisConfigurationDatabase. And click Next.
Configuration type




















7) Select Target Property
Now select the connectionstring of the new connection manager and click Next.
Select the ConnectionString




















8) Configuration name and move up
Give your configuration a suitable name and move it upwards so that it will be the first configuration. After that close the window.
Move it upwards!






















9) Copy configuration database
Now copy the new configuration database table to all database servers in your dtap enviroment and make sure its values are adjusted to the server. Also make sure the environment variable is available on each server. Now you can deploy your package to each server and it will use its own configuration.

Friday, 14 December 2012

DTAP package configurations - part 1

Case
I have a DTAP (Development, Test, Acceptance, Production). How do I use SSIS configurations over these multiple environments?

Solutions
The most commonly used solutions for configurating multiple environments are:
a) XML Configuration File
b) SQL Server Configuration (in combination with environment variable)
c) SSIS 2012 (in combination with the new project deployment)

In this post I will eleborate the XML Config file solution. For this scenario I have a simple package with one Data Flow Task that has one CSV source and one SQL destination. Let's start by adding configurations for the two Connection Managers.
Simple Scenario




















1) Add configurations
Go to the SSIS menu and select Package Configurations... and then Enable package configurations. After that hit the Add button.
Enable package configurations




















2) Configuration Type
Select XML configuration file (default value) in the Configuration type select box and enter a filepath for your config file. After that click Next.
Configuration Type





















3) Select Properties to Export
In this step you select all the properties you want to configure in the config file. In our case we select the ConnectionString property of both Connection Managers. After that click next.
Select both ConnectionString properties




















4) Configuration Name
Give your configuration a suitable name that will explain its purpose if you have multiple configurations. After that hit the Finish button and close the configuration window.
Configuration name






















5) View the file
If you browse to your config file and open it in for example Internet Explorer then you can see what's in it. Now you can just copy the xml config file to all environments. The path to the config file is embedded in the packages. So if you keep the file path the same, then you can just copy the packages and config files between environments. You can edit the config file in notepad to change the values for each environment.









6) Deployment Manifest
If you use different paths to store the config files then you should use the Deployment Manifest to change the embedded file path of the config file. Go to the properties of your project and go to the Deployment Utility page and select True at the CreateDeploymentUtility.
CreateDeploymentUtility





















7) Build to create the Deployment Utility
Build your project (right click project in Solution Explorer and select Build). This will create the 3 files in the bin folder of your project:

Folder: \ProjectName\bin\Deployment\




8) Deployment
Now you can give these three files to the server administrator and let him/her double click on PackageConfig.SSISDeploymentManifest to deploy the package(s) and determine the config file folder. It's even possible to change the config value while deploying.

Sunday, 9 December 2012

BUG: Could not decrypt file

Case
I have an Excel (xls) file as a source and it has cell protection turned on to prevent editing the wrong cells. It's an official form from a national care authority, so I can't change that. I don't even have the password.

When I try to read that file in SSIS it gives me an error:
Error at Package.dtsx [connection manager "xxxxx"]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occured. Error code: 0x80004005
An OLE DB record is available. Source "Microsoft JET Database Engine"Hresult: 0x80004005
Description: Could Not Decrypt File

Error at xxxxx [Excel Source]"SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "xxxxx" failed with error code 0xC0202009.

SSIS 2005 - 2012: Could not decrypt file














The error doesn't occur when the file is still opened in Excel, but that's not really an option on a server.

Solution
You either have to remove the cell protection or convert the XLS file to XLSX (The ACE OLE DB provider doesn't have the same problem as the JET OLE DB provider). Also see/vote for this Microsoft Connect Bug report.

But can you do this automatically if you can't (or don't want to) ask that to the people who provided the excel files or if you have a whole bunch of those files? Here are a couple of options I have examined:

1) Interop Excel in Script Task
There is a Microsoft library (Microsoft.Office.Interop.Excel) available that can do that for you, but the BIG downside is that it requires an Office installation! Even if you download the Redistributable Primary Interop Assemblies (PIA), you still need Office installed. PIA is only a .Net wrapper that lets you communicate in .Net to the COM dll's from Office.
Serverside Office installation is a bad practice and Microsoft doesn't support/recommend an Office installation on a server.

2) Third party / open source dll's  in Script Task
There are a lot of third party and open source dll's for excel. A .Net colleague of mine tried a whole bunch them, but they either don't work for the old (xls) excel files or they don't have an option to convert to xlsx or to remove the cell protection.

3) Cozyroc Excel Task
The COZYROC SSIS+ Library has an Excel Task that can do the conversion to XLSX for you. It's not for free, but I think it's worth it. Hiring an external BI/ETL consultant for two days is probably more expensive. Moreover you get a whole bunch of other cool tasks and transformations with it and you can test them freely within BIDS or SSDT before you buy them. This is my solution with the Cozyroc tasks.
(with a little help from Cozyroc Consultant Ivan Peev)


















Note: Although the Cozyroc Excel Task works like a charm. I still think this is a bug that Microsoft should fix. Cell protection is an edit prevention and I 'm not editing...

Tip: You need to know the password to remove the cell protection in Excel. If you don't have it, use this vba macro to get it.

Saturday, 17 November 2012

SSIS OLEDB or ODBC

Case
A while ago I read this MSDN Blog post that stated that (short version) OLE DB is outdated and we should all use ODBC again. SQL 2012 will be the last release to support OLE DB. Should we all start using ODBC in SSIS?


To OLE DB or not to OLE DB?















Solution
Today, at the Dutch SQL saturdays, I had the change to ask that question to some of the brightest SQL people from The Netherlands (a Microsoft Certified Master and a couple of Microsoft Premier Field Engineers). Their answer: ODBC will become the standard in the future versions, but at the moment (SQL 2012) OLE DB is still much faster! So use OLE DB where possible.

Besides that... a lot of components (like the lookup) only support OLEDB. See complete list here.

Thursday, 1 November 2012

Split multi value column into multiple records

Case
I have a column with multiple values and I want to split them into multiple records.












Solution
You could solve it with a fancy TSQL query. I saw split functions with common table expressions, but a relatively easy script could to the trick in SSIS as well. (don't hesitate to post your query/solution in the comments)

1) Source
Add your source to the Data Flow Task












2) Script Component - input columns
This solution uses an asynchronous Script Component, so add a Script Component (type transformation) to your Data Flow Task. Edit it, go to the Input Columns pane and select all the columns you need downstream the data flow as readonly. In this case we need the columns Teacher and Students.
Input columns (readonly)




















3) Script Component - output port
Go to the Inputs and Outputs pane and click on Output 0. Set the SynchronousInputID property to none to make this Script Component asynchronous.
asynchronous




















4) Script Component - output columns
Add output columns for each input column that you need downstream the data flow. In this case we need Teacher (same datatype and size) and a new column named Student which will contain one value from the input column Students (same datatype, but size could probably be smaller).
Output columns



















5) The script
Copy the Inputs0 _ProcessInputRow method to your script and remove all the other methods (PreExecute, PostExecute and CreateNewOutputRows) because we don't need them.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Method that will execute for each row passing
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // First we are converting the comma seperated list into a string array.
        // You can change the comma if you are using an other seperator like | or ;
        string[] Students = Row.Students.ToString().Split(new char[] { ',' }, StringSplitOptions.None);

        // Counter var used the loop through the string array
        int i = 0;

        // Looping through string array with student names
        while (i < Students.Length)
        {
            // Start a new row in the output
            Output0Buffer.AddRow();

            // Pass through all columns that you need downstream the data flow
            Output0Buffer.Teacher = Row.Teacher;

            // This is the splitted column. Take the [n] element from the array
            // and put it in the new column.
            Output0Buffer.Student = Students[i];

            // Increase counter to go the next value
            i++;
        }
    }
}

or VB.Net

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' Method that will execute for each row passing
    Public Overrides Sub Input0_ProcessInputRow(Row As Input0Buffer)
        ' First we are converting the comma seperated list into a string array.
        ' You can change the comma if you are using an other seperator like | or ;
        Dim Students As String() = Row.Students.ToString().Split(New Char() {","c}, StringSplitOptions.None)

        ' Counter var used the loop through the string array
        Dim i As Integer = 0

        ' Looping through string array with student names
        While i < Students.Length
            ' Start a new row in the output
            Output0Buffer.AddRow()

            ' Pass through all columns that you need downstream the data flow
            Output0Buffer.Teacher = Row.Teacher

            ' This is the splitted column. Take the [n] element from the array
            ' and put it in the new column.
            Output0Buffer.Student = Students(i)

            ' Increase counter to go the next value
            i += 1
        End While
    End Sub
End Class


6) The result
For testing purposes I added a derived column and a couple of data viewer
The result



























Note: if you want to do this backwards, see this post.


Monday, 29 October 2012

Split multi value column into multiple columns

Case
 I have a source where one column has a divider that splits the sales values for each month. How do I split that value over multiple columns?
Mountainbike;black;10,4,7,3,11,5,8,6,10,4,12,12
Mountainbike;blue;12,2,9,1,13,7,6,4,8,12,3,4
The last column contains the sales per month and is divided by a comma instead of a semicolon.

Solution
There are various options to split that value.
A) Substring/Findstring
B) Script Component
C) Token

Solution A: Substring/Findstring
You can use an expression in the Derived Column. The first and last deviate from the rest:
SUBSTRING(Sales,1,FINDSTRING(Sales,",",1) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",1) + 1,FINDSTRING(Sales,",",2) - FINDSTRING(Sales,",",1) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",2) + 1,FINDSTRING(Sales,",",3) - FINDSTRING(Sales,",",2) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",3) + 1,FINDSTRING(Sales,",",4) - FINDSTRING(Sales,",",3) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",4) + 1,FINDSTRING(Sales,",",5) - FINDSTRING(Sales,",",4) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",5) + 1,FINDSTRING(Sales,",",6) - FINDSTRING(Sales,",",5) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",6) + 1,FINDSTRING(Sales,",",7) - FINDSTRING(Sales,",",6) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",7) + 1,FINDSTRING(Sales,",",8) - FINDSTRING(Sales,",",7) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",8) + 1,FINDSTRING(Sales,",",9) - FINDSTRING(Sales,",",8) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",9) + 1,FINDSTRING(Sales,",",10) - FINDSTRING(Sales,",",9) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",10) + 1,FINDSTRING(Sales,",",11) - FINDSTRING(Sales,",",10) - 1)
SUBSTRING(Sales,FINDSTRING(Sales,",",11) + 1,LEN(Sales) - FINDSTRING(Sales,",",11))
Substring/Findstring solution




















Solution B: Script Component
Add a Script Component (type transformation) and select the Sales column as ReadOnly input column in the Input Columns pane. Then go to the Inputs and Outputs pane and create a column foreach month. After that choose your language and hit the Edit Script button and copy the following method.
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Split input column on comma fill output columns
        // Added the Convert.ToInt32(XXXX) to convert it to int
        string[] sales = Row.Sales.ToString().Split(new char[]{','}, StringSplitOptions.None);
        Row.Jan = Convert.ToInt32(sales[0]);
        Row.Feb = Convert.ToInt32(sales[1]);
        Row.Mar = Convert.ToInt32(sales[2]);
        Row.Apr = Convert.ToInt32(sales[3]);
        Row.May = Convert.ToInt32(sales[4]);
        Row.Jun = Convert.ToInt32(sales[5]);
        Row.Jul = Convert.ToInt32(sales[6]);
        Row.Aug = Convert.ToInt32(sales[7]);
        Row.Sep = Convert.ToInt32(sales[8]);
        Row.Oct = Convert.ToInt32(sales[9]);
        Row.Nov = Convert.ToInt32(sales[10]);
        Row.Dec = Convert.ToInt32(sales[11]);
    }
}

or VB.Net

'VB.Net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()g _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Split input column on comma fill output columns
        ' Added the Convert.ToInt32(XXXX) to convert it to int
        Dim sales As String() = Row.Sales.ToString().Split(New Char() {","c}, StringSplitOptions.None)
        Row.Jan = Convert.ToInt32(sales(0))
        Row.Feb = Convert.ToInt32(sales(1))
        Row.Mar = Convert.ToInt32(sales(2))
        Row.Apr = Convert.ToInt32(sales(3))
        Row.May = Convert.ToInt32(sales(4))
        Row.Jun = Convert.ToInt32(sales(5))
        Row.Jul = Convert.ToInt32(sales(6))
        Row.Aug = Convert.ToInt32(sales(7))
        Row.Sep = Convert.ToInt32(sales(8))
        Row.Oct = Convert.ToInt32(sales(9))
        Row.Nov = Convert.ToInt32(sales(10))
        Row.Dec = Convert.ToInt32(sales(11))
    End Sub
End Class
This is more readable than the Substring/Findstring expressions!
Script Component solution




















Solution C: Token
SSIS 2012 has a new expression called Token which makes life a lot easier than the Substring/Findstring solution:
TOKEN(Sales,",",1)
TOKEN(Sales,",",2)
TOKEN(Sales,",",3)
TOKEN(Sales,",",4)
TOKEN(Sales,",",5)
TOKEN(Sales,",",6)
TOKEN(Sales,",",7)
TOKEN(Sales,",",8)
TOKEN(Sales,",",9)
TOKEN(Sales,",",10)
TOKEN(Sales,",",11)
TOKEN(Sales,",",12)

Token solution

Sunday, 28 October 2012

Creating a comma separated list of related records

Case
I have a list of teacher and student combinations with one combination per record and I want to created a comma delimited list of students per teacher.

Solution
If your source is a database then the easiest solution is a TSQL query like this:
--TSQL Query
WITH UniqueTeachers AS
(
 SELECT  DISTINCT Teacher
 FROM  TeacherStudentTable
)
SELECT  Teacher
,   Students = STUFF((
      SELECT  ',' + Student
      FROM  TeacherStudentTable
      WHERE  Teacher = UniqueTeachers.Teacher
      ORDER BY Student
      FOR XML PATH(''), TYPE).value('.','varchar(100)'), 1, 1, '')
FROM  UniqueTeachers
ORDER BY Teacher
The query in SSIS as source



























If your source is for example a flat file or a database that doesn't support a query like this, then there are also options within SSIS. For this solution I use a asynchronous Script Component.

1) Sorted source
We need a sorted source because we are comparing records with each other. In this case make sure the source is sorted on teacher first (and optional secondly on student).
Add Sort transformation if source isn't sorted




















2) Script Component
Add a Script Component (type transformation) and select the Teacher and Student columns as ReadOnly input columns.
Input columns: Teacher and Student




















3) Asynchronous
We need to make the Script Component asynchronous because it throws out a different number of rows than there are incomming. Go to the Inputs and Outputs pane, click on Output 0 and change the SynchronousInputID to None.
Asynchonous



















4) Output
We now need to create an output for the Script Component. Expand the Output 0 and add two columns:
Teacher (same data type and size as the input column teacher)
Students (same data type as the input column student, but larger to fit multiple student names)
Output columns




















5) The Script
Copy the three variables and the two methods to your Script Component (and remove any other existing methods).
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    bool initialRow = true;     // Indicater for the first row
    string teacher = "";        // Name of the teacher to track teacherchanges between rows
    string students = "";       // The comma delimited list of students

    public override void Input0_ProcessInput(Input0Buffer Buffer)
    {
        // Loop through buffer
        while (Buffer.NextRow())
        {
            // Process an input row
            Input0_ProcessInputRow(Buffer);

            // Change the indicator after the first row has been processed
            initialRow = false;
        }
        
        // Check if this is the last row
        if (Buffer.EndOfRowset())
        {
            // Fill the columns of the existing output row with values
            // from the variable before closing this Script Component
            Output0Buffer.Teacher = teacher;
            Output0Buffer.Students = students;
        }
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (initialRow)
        {
            // This is for the first input row only
            
            // Create a new output row
            Output0Buffer.AddRow();

            // Now fill the variables with the values from the input row
            teacher = Row.Teacher;
            students = Row.Student;
        }
        else if ((!initialRow) & (teacher != Row.Teacher))
        {
            // This isn't the first row, but the teacher did change

            // Fill the columns of the existing output row with values
            // from the variable before creating a new output row
            Output0Buffer.Teacher = teacher;
            Output0Buffer.Students = students;

            // Create a new output row
            Output0Buffer.AddRow();

            // Now fill the variables with the values from the input row
            teacher = Row.Teacher;
            students = Row.Student;
        }
        else if ((!initialRow) & (teacher == Row.Teacher))
        {
            // This isn't the first row, and the teacher did not change

            // Concatenate the studentsname to the variable
            students += "," + Row.Student;
        }
    }

    // Little explanation:
    // Rows are created in memory with .AddRow()
    // and will be submitted to the output when a
    // new / subsequent row is created or when
    // the last buffer has been finished.
}

or in VB.Net

'VB.Net code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Private initialRow As Boolean = True    ' Indicater for the first row
    Private teacher As String = ""          ' Name of the teacher to track teacherchanges between rows
    Private students As String = ""         ' The comma delimited list of students

    Public Overrides Sub Input0_ProcessInput(Buffer As Input0Buffer)
        ' Loop through buffer
        While Buffer.NextRow()
            ' Process an input row
            Input0_ProcessInputRow(Buffer)

            ' Change the indicator after the first row has been processed
            initialRow = False
        End While

        ' Check if this is the last row
        If Buffer.EndOfRowset() Then
            ' Fill the columns of the existing output row with values
            ' from the variable before closing this Script Component
            Output0Buffer.Teacher = teacher
            Output0Buffer.Students = students
        End If
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(Row As Input0Buffer)
        If initialRow Then
            ' This is for the first input row only

            ' Create a new output row
            Output0Buffer.AddRow()

            ' Now fill the variables with the values from the input row
            teacher = Row.Teacher
            students = Row.Student
        ElseIf (Not initialRow) And (teacher <> Row.Teacher) Then
            ' This isn't the first row, but the teacher did change

            ' Fill the columns of the existing output row with values
            ' from the variable before creating a new output row
            Output0Buffer.Teacher = teacher
            Output0Buffer.Students = students

            ' Create a new output row
            Output0Buffer.AddRow()

            ' Now fill the variables with the values from the input row
            teacher = Row.Teacher
            students = Row.Student
        ElseIf (Not initialRow) And (teacher = Row.Teacher) Then
            ' This isn't the first row, and the teacher did not change

            ' Concatenate the studentsname to the variable
            students += "," & Convert.ToString(Row.Student)
        End If
    End Sub

    ' Little explanation:
    ' Rows are created in memory with .AddRow()
    ' and will be submitted to the output when a
    ' new / subsequent row is created or when
    ' the last buffer has been finished.
End Class

Note: You can change the delimiter in C# line 70 and VB.Net line 64.


6) Testing
For testing purposes I added a derived column and a couple of data viewer.
The result



















Conclusion: both methods have the same result. For a SQL database source, the T-SQL method is probably a little faster. If you can do the sort in the source the performance differences will diminish.

In one of my next blog posts I will show you how to split a comma separated list in to records.

Sunday, 21 October 2012

Replace Null Transformation a.k.a. the defaulter

An often used task in a data flow task is to replace all null values for dimension lookups. If you're lucky you can use the short 2012 expression REPLACENULL or else you're stuck with ? :

Replace Null



















Doing that a dozen times for each fact package over and over is boring, so my collegue Marc Potters developped a Replace Null Transformation (a.k.a. Defaulter Transformation) that can replace null values in several columns at once. We made it a bit more general so it can be used to replace null values for all data types.
Replace Null


















Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

BETA2: solved bug that could result in double output.
BETA3: solved bug that could result in value 0 if only one column is selected.

Installation
The installer registers the DLL in the GAC and copies it to the component folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup. (Restart not required for SSDT)

How add the transformation to the toolbox (2008 only)
When you open a SSIS project in Visual Studio/Bids, right click the toolbox and choose "Choose Items...". After this Visual Studio will be busy for a couple of seconds!
Right click toolbox






















When the Choose Toolbox Items window appears, go to the SSIS Data Flow Items and search for the newly installed transformation and select it. Click ok to finish.
Choose Toolbox Items





















Now the new transformation will appear in the toolbox. Ready to use! Have fun.

New component added






























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom component.

Sunday, 14 October 2012

Custom SSIS Component: UnZip Task

In january 2011 I did a post about unzipping files within SSIS with a Script Task. Because not everybody is fond about programming, I decided to make my own UnZip (and Zip) Task. For the actual unzipping I used the well know opensource DotNetZip library.





This UnZip Task is still beta and more unzip features will follow. Please use it, test it and let me know your findings or wishes in the comments below.
The UnZip Task is now stable, but please submit suggestions for new features.






















Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.

BETA2: UNC path support, cancel variable window bug solved.
BETA3: Various validations added, New option added to store unzipped filepath in variable.
BETA4: Switched to DotNetZip for the actual unzipping * Probably last beta version before official release. *
V1.0: Bug solved that didn't validate driveletters with smallcaps
V1.1: Option added to delete zip file after unzipping it (see properties). Code added for easier upgrading to new release.
V1.2: SSIS 2016 added and now upgradable. Added 64 installer.

Installation
The installer registers the DLL in the GAC and copies it to the task folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\Tasks\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup. Restarting SQL Server Data Tools is not necessary.

How add the task the the toolbox (2008 only)
When you open a SSIS project in Visual Studio/Bids, right click the toolbox and choose "Choose Items...". After this Visual Studio will be busy for a couple of seconds!

Right click toolbox






















When the Choose Toolbox Items window appears, go to the SSIS Control Flow Items and search for the newly installed UnZip Task and select it. Click ok to finish.
Choose Toolbox Items























Now the new task will appear in the toolbox. Ready to use! Have fun.
New task added



























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom task.