Friday, 27 May 2016

TSQL Merge for Slowly Changing Dimension or Persistent Staging Area

Case What is a fast way to load a Slowly Changing Dimension or Persistent Staging Area in SSIS. When using a Data Flow Task for this, the process could become very slow when there are a lot of updates.

Solution
Instead of using the OLE DB Command for updating records you could load all those records to a temporary table and then use a batch update command to update all records in the target table with the values from the temporary table.

An even fancier way is by using the TSQL MERGE statement. This statement has one downside and that is that it can only update records when there is a match and not update the old record and insert a new record. You can overcome this by using the output of the MERGE statement.

The MERGE statement will update the existing record in the destination table, but it can output the old version of the updated record. You can then use this output to do an insert on the destination table.


First create a source and destination table for testing purposes:
-- Drop if exist
IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL 
    DROP TABLE dbo.Employees; 

-- Create source table
CREATE TABLE [dbo].[Employees](
 [EmployeeNumber] [varchar](5) NULL,
 [FirstName] [varchar](50) NULL,
 [LastName] [varchar](50) NULL,
 [DateOfBirth] [date] NULL,
 [Salary] [money] NULL
);

-- Insert test records in source
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00001', N'John',  N'Williams', CAST(N'1972-02-15' AS Date), 5100.00);
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00002', N'Jane',  N'Smith',    CAST(N'1965-09-02' AS Date), 4900.00);
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00003', N'Marc',  N'Brown',    CAST(N'1981-12-01' AS Date), 3300.00);
INSERT [dbo].[Employees] ([EmployeeNumber], [FirstName], [LastName], [DateOfBirth], [Salary]) VALUES (N'00004', N'David', N'Garcia',   CAST(N'1975-01-01' AS Date), 3700.00);

-- Drop if exist
IF OBJECT_ID('dbo.DimEmployee', 'U') IS NOT NULL 
    DROP TABLE dbo.DimEmployee;

-- Create destination table
CREATE TABLE [dbo].[DimEmployee](
 [EmployeeID] [int] IDENTITY(1,1) NOT NULL,
 [EmployeeNumber] [varchar](5) NULL,
 [FirstName] [varchar](50) NULL,
 [LastName] [varchar](50) NULL,
 [DateOfBirth] [date] NULL,
 [Salary] [money] NULL,
 [Active] [bit] NULL,
 [DateFrom] [datetime] NULL,
 [DateEnd] [datetime] NULL
);


The destination table is a Slowly Changing Dimension, but it could also be a Persistent Staging Area. It has the same columns as the source with a few extra columns:
  1. EmployeeID is the dimension id it's populated by the identity setting. You could skip this column for the Persistent Staging Area.
  2. Active: This is a Boolean to quickly filter all active records. It's a bit redundant, but also easy.
  3. DateFrom: This is the datetime to indicate the insertion of this record. It never changes.
  4. DateFrom: This is the datetime to indicate when this record was inactivated. I use NULL for active records, but you could also use a future date like '9999-31-12'

And now the MERGE script. Below I will try to describe each section that starts with a comment and number: /***** 1 *****/.
--Merge script
/***** 6b *****/
INSERT INTO DimEmployee 
   (EmployeeNumber
   , FirstName
   , LastName
   , DateOfBirth
   , Salary
   , Active
   , DateFrom
   , DateEnd)
SELECT  MergeOutput.EmployeeNumber
,   MergeOutput.FirstName
,   MergeOutput.LastName
,   MergeOutput.DateOfBirth
,   MergeOutput.Salary
,   0         -- InActivate the record
,   MergeOutput.DateFrom    -- Keep the old from date
,   GETDATE()       -- Close the record
FROM (
  /***** 1 *****/
  MERGE DimEmployee as T     -- Target
  USING Employees as S     -- Source
  ON T.EmployeeNumber = S.EmployeeNumber -- Compare key
  AND T.Active = 1      -- Only compare open records
  /***** 2 *****/
  WHEN NOT MATCHED BY TARGET THEN   -- Not found in destination
   INSERT 
   ( EmployeeNumber
   , FirstName
   , LastName
   , DateOfBirth
   , Salary
   , Active
   , DateFrom)
   VALUES
   ( S.EmployeeNumber
   , S.FirstName
   , S.LastName
   , S.DateOfBirth
   , S.Salary
   , 1         -- Activate the record
   , GETDATE())      -- Open the record
  /***** 3 *****/
  WHEN NOT MATCHED BY SOURCE    -- Not found in source
   AND T.Active = 1 THEN    -- Only compare open records
   UPDATE 
   SET T.Active  = 0    -- Inactivate record
   , T.DateEnd  = GETDATE()  -- Close date
  /***** 4 *****/
  WHEN MATCHED       -- Found in source and destination
   AND T.Active = 1     -- Only compare open records
   AND EXISTS (SELECT S.FirstName
      ,  S.LastName
      ,  S.DateOfBirth
      ,  S.Salary
      EXCEPT
      SELECT T.FirstName
      ,  T.LastName
      ,  T.DateOfBirth
      ,  T.Salary) THEN
   UPDATE
   SET T.FirstName  = S.FirstName
   , T.LastName  = S.LastName
   , T.DateOfBirth = S.DateOfBirth
   , T.Salary  = S.Salary
   , T.Active  = 1    -- Make record active
   , T.DateFrom  = GETDATE()  -- Open record with current datetime
   , T.DateEnd  = null   -- Keep record open
  /***** 5 *****/
  OUTPUT $action as MergeAction, Deleted.*, Inserted.Active as NewActiveCheck
  /***** 6a *****/
  ) as MergeOutput WHERE MergeAction = 'UPDATE' and NewActiveCheck = 1;


  1. In this section you provide the name of the target and source table and which key to use to compare those records. I also added a filter on active to only compare open records. You could replace it by T.DateFrom is null.
  2. This section is for new records. Source records that are not found in the destination will be inserted with Active set to 1 (true), the FromDate set to now and the EndDate set to null.
  3. This section is for deleted records. Active destination records that are not found in the source are deactived and closed by setting the EndDate. Other columns remain unchanged.
  4. This secion is for active updating record with new values. To prevent unnecessary updates I have added EXISTS-EXCEPT part. This is  a very handy way to compare all (non key) columns for changes and above all it can even compare NULL values.
  5. This is the last part of the MERGE statement and it can output the old and new values of deletes, updates and insertions. In this case I'm interested in the old values of the changed records (Deleted.* or Deleted.column1, Deleted.column2, etc). I also output the Active column from the new record to filter inactivated records (deleted records from the source shouldn't be inserted again). The $active indicates whether this is an 'INSERT', 'UPDATE', or 'DELETE'.
  6. In 6a I filter on the action to only keep the old values of the updated records. In 6b I insert a new record with the old values of the changed records. I inactivate the new record and I set the EndDate to close the new record. Other columns remain unchanged.


Testing the script:
One update




















Second test:
One update, one delete and one insert




















I use this script primarily for the Persistent Staging Area. When you want to use it for an SCD you have to reload the facttable because the dimension ID changes. The fact pointing to ID 4 with David's old salary now points to the record with David's new salary.

The alternative script below could be a solution for that. Instead of comparing the key columns, I compare the CHECKSUM (or HASHBYTES) of all columns and remove the WHEN matched part (if the checksum matches, then we don't have to do anything). The benefit of this is that the dimension ID never changes. A second benefit is that you don't need to know the key columns. One downside is that CHECKSUM may not be unique and the HASHBYTES can only handle 8000bytes and can't compare NULL values. So the script below is NOT yet foolproof!!! Will work on that, but let me know if you have a solution.


--Alternative Merge script with checkum or hashbytes
/***** 6b *****/
INSERT INTO DimEmployee 
   ( EmployeeNumber
   , FirstName
   , LastName
   , DateOfBirth
   , Salary
   , Active
   , DateFrom
   , DateEnd)
SELECT  MergeOutput.EmployeeNumber
,   MergeOutput.FirstName
,   MergeOutput.LastName
,   MergeOutput.DateOfBirth
,   MergeOutput.Salary
,   0         -- InActivate the record
,   MergeOutput.DateFrom    -- Keep the old from date
,   GETDATE()       -- Close the record
FROM (
  /***** 1 *****/
  MERGE DimEmployee as T     -- Target
  USING Employees as S     -- Source
  ON    CHECKSUM(S.EmployeeNumber + '|' + S.FirstName + '|' + S.LastName + '|' + CAST(S.DateOfBirth as varchar(10))  + '|' + CAST(S.Salary as varchar(20))) = 
        CHECKSUM(T.EmployeeNumber + '|' + T.FirstName + '|' + T.LastName + '|' + CAST(T.DateOfBirth as varchar(10))  + '|' + CAST(T.Salary as varchar(20)))
  --ON    HASHBYTES('MD5 ', S.EmployeeNumber + '|' + S.FirstName + '|' + S.LastName + '|' + CAST(S.DateOfBirth as varchar(10))  + '|' + CAST(S.Salary as varchar(20))) = 
  --      HASHBYTES('MD5 ', T.EmployeeNumber + '|' + T.FirstName + '|' + T.LastName + '|' + CAST(T.DateOfBirth as varchar(10))  + '|' + CAST(T.Salary as varchar(20)))

  AND T.Active = 1      -- Only compare open records
  /***** 2 *****/
  WHEN NOT MATCHED BY TARGET THEN   -- Not found in destination
   INSERT 
   ( EmployeeNumber
   , FirstName
   , LastName
   , DateOfBirth
   , Salary
   , Active
   , DateFrom)
   VALUES
   ( S.EmployeeNumber
   , S.FirstName
   , S.LastName
   , S.DateOfBirth
   , S.Salary
   , 1         -- Activate the record
   , GETDATE())      -- Open the record
  /***** 3 *****/
  WHEN NOT MATCHED BY SOURCE    -- Not found in source
   AND T.Active = 1 THEN    -- Only compare open records
   UPDATE 
   SET T.Active  = 0    -- Inactivate record
   , T.DateEnd  = GETDATE()  -- Close date
  /***** 4 *****/
  /***** REMOVED *****/
  /***** 5 *****/
  OUTPUT $action as MergeAction, Deleted.*, Inserted.Active as NewActiveCheck
  /***** 6a *****/
  ) as MergeOutput WHERE MergeAction = 'UPDATE' and NewActiveCheck = 1;


Testing the script:
One update



















Second test:
One update, one delete and one insert

Friday, 22 April 2016

Get packages from SSIS Catalog

Case
I'm at a client and they lost the original SSIS Visual Studio projects. I have the packages deployed in the SSIS catalog. Is there a way to retrieve the packages from the catalog and create SSIS project from them?
Integration Services Catalog













Solution
First you should tell your client that they need to start using TFS (or similar versioning software)! And then you could use one of these three methods to recreate the SSIS projects. The catalog contains the packages, connection managers and parameters, but not the original Visual Studio (SSDT) projects, but recreating is quite easy.

A. Integration Service Import Project Wizard
B. Export in SSMS
C. PowerShell to the rescue



A. Integration Service Import Project Wizard
If your development PC, with SSDT/Visual Studio installed, can connect to the SQL Server with the catalog on it then this method is the easiest. Open SSDT, because it has an SSIS project with an import wizard!

1) Create a new Integration Services project
Instead of using the standard "Integration Services" - project, we need to use the "Integration Services Import Project Wizard" - project. For the project name you should use the name of the project in the SSIS catalog and for the solution name you could use the name of project folder from the SSIS catalog.
Integration Services Import Project Wizard




















2) Wizard
Now the wizard will appear which allows you to select an SSIS Catalog project and export its content to your new SSIS project in Visual Studio. With just a few clicks your project is ready to use.
Animated GIF of wizard


























B. Export in SSMS
If you cannot reach the server from within Visual Studio then there is an other option. Start SQL Server Management Studio and go to the SSIS Catalog. It has an option the export projects to an ISPAC file.

1) Right click your project and choose Export
When you export the project, you need to save the ISPAC file to a diskdrive or network folder. Use the project name from the catalog as the filename. 
Add caption
















1b) Integration Service Import Project Wizard
If you saved the ISPAC file to a folder which can be reached from your development PC you could either go to method A and use the wizard to import the ISPAC file to your project or continue to step 2 and use an alternative method to recreate an SSIS project in SSDT.


2) Extract ISPAC file
If you saved the file as XXXX.ispac then you first need to rename the .ISPAC file to .ZIP, after that you can to extract all files from that zipfile.
Unzip the ISPAC file




















3) Create new SSIS project
Because the solution and project files aren't saved within the Catalog you need to create a new/empty project in SSDT. For the projectname you should use the name of the project in the SSIS catalog and for the solution name you could use the name of project folder from the SSIS catalog.
Create new SSIS project in SSDT/Visual Studio
















4) Add items from unzipped ISPAC
First remove the default package.dtsx file and then right click the project and choose Add, Existing Item... Then browse to your unzipped ISPAC file and select all files, except "@Project.manifest" and "[Content_Types].xml" (if you forget that the will end up in the Miscellaneous folder from your project).

If you import the Project.params file it will notice that there is already one. Replace it and after that reload the file.











After that you have your SSIS project back and it's probably time to put it under Source Control. Don't forget to change/check the package protection level of your packages and project.
Add project/solution to source control












C. PowerShell
If you have a lot ispac files to download then you could also use PowerShell. The only thing this solution doesn't do is creating a project in Visual Studio. You could either use method A or B for that or you could even try to create the XML for the xxxx.dtproj file with PowerShell. (download example)

 
#PowerShell: DownloadIspac.ps1
################################
########## PARAMETERS ##########
################################ 
# Change Server, folder, project and download folder
$SsisServer = "MyServer\SQL2014" # Mandatory
$FolderName = "MyFolder"         # Can be empty to download multiple projects
$ProjectName = "MyProject"       # Can be empty to download multiple projects
$DownloadFolder = "D:\MyIspacs\" # Mandatory
$CreateSubfolders = $true        # Mandatory
$UnzipIspac = $false             # Mandatory


#################################################
########## DO NOT EDIT BELOW THIS LINE ##########
#################################################
clear
Write-Host "========================================================================================================================================================"
Write-Host "== Used parameters =="
Write-Host "========================================================================================================================================================"
Write-Host "SSIS Server             :" $SsisServer
Write-Host "Folder Name             :" $FolderName
Write-Host "Project Name            :" $ProjectName
Write-Host "Local Download Folder   :" $DownloadFolder
Write-Host "Create Subfolders       :" $CreateSubfolders
Write-Host "Unzip ISPAC (> .NET4.5) :" $UnzipIspac
Write-Host "========================================================================================================================================================"


##########################################
########## Mandatory parameters ##########
##########################################
if ($SsisServer -eq "")
{
    Throw [System.Exception] "SsisServer parameter is mandatory"
}
if ($DownloadFolder -eq "")
{
    Throw [System.Exception] "DownloadFolder parameter is mandatory"
}
elseif (-not $DownloadFolder.EndsWith("\"))
{
    # Make sure the download path ends with an slash
    # so we can concatenate an subfolder and filename
    $DownloadFolder = $DownloadFolder = "\"
}


############################
########## SERVER ##########
############################
# Load the Integration Services Assembly
Write-Host "Connecting to server $SsisServer "
$SsisNamespace = "Microsoft.SqlServer.Management.IntegrationServices"
[System.Reflection.Assembly]::LoadWithPartialName($SsisNamespace) | Out-Null;

# Create a connection to the server
$SqlConnectionstring = "Data Source=" + $SsisServer + ";Initial Catalog=master;Integrated Security=SSPI;"
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection $SqlConnectionstring

# Create the Integration Services object
$IntegrationServices = New-Object $SsisNamespace".IntegrationServices" $SqlConnection

# Check if connection succeeded
if (-not $IntegrationServices)
{
    Throw [System.Exception] "Failed to connect to server $SsisServer "
}
else
{
    Write-Host "Connected to server" $SsisServer
}


#############################
########## CATALOG ##########
#############################
# Create object for SSISDB Catalog
$Catalog = $IntegrationServices.Catalogs["SSISDB"]

# Check if the SSISDB Catalog exists
if (-not $Catalog)
{
    # Catalog doesn't exists. Different name used?
    Throw [System.Exception] "SSISDB catalog doesn't exist."
}
else
{
    Write-Host "Catalog SSISDB found"
}


############################
########## FOLDER ##########
############################
if ($FolderName -ne "")
{
    # Create object to the folder
    $Folder = $Catalog.Folders[$FolderName]
    # Check if folder exists
    if (-not $Folder)
    {
        # Folder doesn't exists, so throw error.
        Write-Host "Folder" $FolderName "not found"
        Throw [System.Exception] "Aborting, folder not found"
    }
    else
    {
        Write-Host "Folder" $FolderName "found"
    }
}


#############################
########## Project ##########
#############################
if ($ProjectName -ne "" -and $FolderName -ne "")
{
    $Project = $Folder.Projects[$ProjectName]
    # Check if project already exists
    if (-not $Project)
    {
        # Project doesn't exists, so throw error.
        Write-Host "Project" $ProjectName "not found"
        Throw [System.Exception] "Aborting, project not found"
    }
    else
    {
        Write-Host "Project" $ProjectName "found"
    }
}


##############################
########## DOWNLOAD ##########
##############################
Function DownloadIspac
{
    Param($DownloadFolder, $Project, $CreateSubfolders, $UnzipIspac)
    if ($CreateSubfolders)
    {
        $DownloadFolder = ($DownloadFolder + $Project.Parent.Name)
    }

    # Create download folder if it doesn't exist
    New-Item -ItemType Directory -Path $DownloadFolder -Force > $null

    # Check if new ispac already exists
    if (Test-Path ($DownloadFolder + $Project.Name + ".ispac"))
    {
        Write-Host ("Downloading [" + $Project.Name + ".ispac" + "] to " + $DownloadFolder + " (Warning: replacing existing file)")
    }
    else
    {
        Write-Host ("Downloading [" + $Project.Name + ".ispac" + "] to " + $DownloadFolder)
    }

    # Download ispac
    $ISPAC = $Project.GetProjectBytes()
    [System.IO.File]::WriteAllBytes(($DownloadFolder + "\" + $Project.Name + ".ispac"),$ISPAC)
    if ($UnzipIspac)
    {
        # Add reference to compression namespace
        Add-Type -assembly "system.io.compression.filesystem"

        # Extract ispac file to temporary location (.NET Framework 4.5) 
        Write-Host ("Unzipping [" + $Project.Name + ".ispac" + "]")

        # Delete unzip folder if it already exists
        if (Test-Path ($DownloadFolder + "\" + $Project.Name))
        {
            [System.IO.Directory]::Delete(($DownloadFolder + "\" + $Project.Name), $true)
        }

        # Unzip ispac
        [io.compression.zipfile]::ExtractToDirectory(($DownloadFolder + "\" + $Project.Name + ".ispac"), ($DownloadFolder + "\" + $Project.Name))

        # Delete ispac
        Write-Host ("Deleting [" + $Project.Name + ".ispac" + "]")
        [System.IO.File]::Delete(($DownloadFolder + "\" + $Project.Name + ".ispac"))
    }
    Write-Host ""
}


#############################
########## LOOPING ##########
#############################
# Counter for logging purposes
$ProjectCount = 0

# Finding projects to download
if ($FolderName -ne "" -and $ProjectName -ne "")
{
    # We have folder and project
    $ProjectCount++
    DownloadIspac $DownloadFolder $Project $CreateSubfolders $UnzipIspac
}
elseif ($FolderName -ne "" -and $ProjectName -eq "")
{
    # We have folder, but no project => loop projects
    foreach ($Project in $Folder.Projects)
    {
        $ProjectCount++
        DownloadIspac $DownloadFolder $Project $CreateSubfolders $UnzipIspac
    }
}
elseif ($FolderName -eq "" -and $ProjectName -ne "")
{
    # We only have a projectname, so search
    # in all folders
    foreach ($Folder in $Catalog.Folders)
    {
        foreach ($Project in $Folder.Projects)
        {
            if ($Project.Name -eq $ProjectName)
            {
                Write-Host "Project" $ProjectName "found in" $Folder.Name
                $ProjectCount++
                DownloadIspac $DownloadFolder $Project $CreateSubfolders $UnzipIspac
            }
        }
    }
}
else
{
    # Download all projects in all folders
    foreach ($Folder in $Catalog.Folders)
    {
        foreach ($Project in $Folder.Projects)
        {
            $ProjectCount++
            DownloadIspac $DownloadFolder $Project $CreateSubfolders $UnzipIspac
        }
    }
}

###########################
########## READY ##########
###########################
# Kill connection to SSIS
$IntegrationServices = $null
Write-Host "Finished, total downloads" $ProjectCount

Output example

Related Posts Plugin for WordPress, Blogger...