Wednesday, 5 April 2017

Dynamically unpivot data

Case
For a client I need to read hundreds of bus route matrices and they all vary in size. This makes it hard to read them dynamically with a Foreach Loop Container because the number of columns differs per file. And I don't want to create hundreds of Data Flow Tasks by hand. Even BIML won't help this time, because the routes change regularly and I don't want to generate and deploy packages every day.
I need to dynamically unpivot data within the Data Flow Task. How do I solve this within SSIS?
Dynamically unpivot data



















Solution
The trick for this case is to read everything as one big column and then dynamically split and unpivot the column in a Script Component Transformation. The unpivot output will always have three columns: Start Station, End Station and Distance. And the good news is that it has only a few lines of relatively easy code.
The solution


























1) Source with one big column
Change your Flat File Connection Manager so that it will read everything as one big column. Make sure the column is big enough to fit all data. For this example I called the column 'ColumnOne'.
Flat File with one column only














2) Script Component Transformation Input
Drag a Script Component on the surface and choose Transformation. Connect it to your source. Then edit the Script Component  and go to the 'Input Columns' page. On that page select the column with all the matrix data as ReadOnly.
Input Columns
























3) Script Component Transformation Input
On the 'Inputs and Outputs' page we need to add the new output columns. For this example I need a StartStation (string), EndStation (string) and the Distance (int).
An other important step is setting the SynchronousInputID property (of Output 0) to 'None'. This makes the transformation asynchronous which means the number of row in could be unequal to the number of rows out. And that means the input buffer with records isn't reused in this component, but a new output buffer will be created.
Inputs and Outputs
























4) The script
Go to the script page, choose C# as scripting language and hit the Edit Script button. And now copy the contents of my Input0_ProcessInputRow method to your Input0_ProcessInputRow method. And there are also two variables called Stations and Distances. They are declared above this method. Copy those to your code and put them on the same place.
I also remove the unused methods PreExecute, PostExecute and CreateNewOutputRows to keep the code clean and mean.
#C# Code
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion

/// <summary>
/// Split and unpivot data
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Define two arrays for distances and stations
    // The Stations array will be filled only once
    // The Distances array will change for each row
    string[] Stations;
    string[] Distances;

    /// <summary>
    /// This method is called once for every row that passes through the component from Input0.
    /// </summary>
    /// <param name="Row">The row that is currently passing through the component</param>
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // The first time this method executes the Stations array
        // is still empty (null). In the true clause of the if-
        // statement we will fill the Stations array.
        // Therefore the second, third, etc. time this method
        // executes we will go to the false clause of the if-
        // statement.
        if (Stations == null)
        {
            // We know that the first row contains the stations.
            // We will add those to the stations array and use
            // it to determine the end station later on.

            // Split the string from ColumnOne on ; (or your own
            // column separator). The Split returns an array.
            Stations = Row.ColumnOne.Split(';');
        }
        else
        {
            // Now the rows will contain distances (and the StartStation)
            // Split the distances on ; (or your own column separator)
            Distances = Row.ColumnOne.Split(';');

            // Now loop through distances array, but start on 1 (not on 0)
            // because 0 contains the StartStation in the distances array
            for (int counter = 1; counter < Distances.Length; counter++)
            {
                // Add new Row and then fill the columns
                Output0Buffer.AddRow();
                // Get the Distance from the Distance array and convert it to int
                Output0Buffer.Distance = Convert.ToInt32(Distances[counter]);
                // Get the Start station from the distance array (the first item)
                Output0Buffer.StartStation = Distances[0];
                // Get the End station from stations array
                Output0Buffer.EndStation = Stations[counter];
            }
        }
    }
}

4) The result
Now close the Script Component and add more transformations or a destination and see what the Script Component does with your data. I added a dummy Derived Column and Data Viewer to see the data before and after the Script Component. For this file I had 27 rows and columns as input and 676 rows as output (26 * 26).



Saturday, 25 February 2017

Change Protection Level for all packages at once

Case
I created dozens of packages in my project but I forgot to change the default Protection Level in the project properties from "EncryptSensitiveWithUserKey" to "DontSaveSensitive". Now I have to change all packages one by one. Is there an alternative? I tried search and replace in the XML, but I can't find the Protection Level property.




















Solution
Of course the best option is to prevent this from happening by setting the default before you start. You can do this in the properties of the project. All new packages will then inherit the Protection Level from the project.
Setting Protection Level on project














First, when trying to search and replace in the XML code of the packages you will notice that you cannot find the default 'EncryptSensitiveWithUserKey' which makes it hard to replace.
Default Protection Level is not in package
















Secondly, the Protection Level is also stored in the Visual Studio project file (*.dtproj). When you open a package in design mode and press the save button it also updates metadata in the project file.
Protection Level in project file as well


















Solution A
Good old Command Prompt to the rescue! The dtutil Utility can do the package conversion for you. If you are afraid of the Command Prompt or even never heard about it, then don't use this solution.

1) Command Prompt
Open a Command Prompt and use CD (Change Directory) command to navigate to your folder with packages.
Navigate to your project folder with packages













2) Foreach Loop Container in DOS
Now you can call the dtutil Utility for each package in that folder with something similar as a Foreach Loop Container:
FOR %p IN (*.dtsx) DO dtutil.exe /file %p /encrypt file;%p;0 /quiet
The colors explain the command













3) Execute
When you execute the command, dtutil Utility will quickly change the Protection Level of all your packages.
101 packages changed within 5 seconds. Try that in Visual Studio!





















4) Project Protection Level
If you haven't already done it, change the Protection Level in the Project Properties. See second screenshot of this blog post.

5) dtproj file
Now the project and all its packages have the same Protection Level, but the project doesn't now that yet. If you try to execute a package it will complain about the Protection Level inconsistencies.
Failed to execute the package or element. Build errors were encountered.








Error : Project consistency check failed. The following inconsistencies were detected:
 MyPackage000.dtsx has a different ProtectionLevel than the project.
 MyPackage001.dtsx has a different ProtectionLevel than the project.

To update the dtproj file you have to open all packages and then Rebuild the project. This will update the project file. Now you can execute the packages without the consistency error.
Open all packages and rebuild the project





















Solution B
Good old PowerShell to the rescue! This PowerShell script does the same as above, but also changes the project file. So no manual labour at all. Because the dtutil utility was so fast, I didn't edit the packages with .net libraries. It just executes dtutil in a hidden window.

The script is thoroughly tested for SSIS 2012-2016 from 'EncryptSensitiveWithUserKey' to 'DontSaveSensitive'. Other situations require more testing. Make sure to keep a copy of your project before using this script and let me know which situations require some more attention.
Change the protection level of the entire project in seconds


















#PowerShell script
################################
########## PARAMETERS ##########
################################ 
$projectFolder = "C:\SSIS\myProject\myProject"
$dtutilPath = "C:\Program Files\Microsoft SQL Server\130\DTS\Binn\dtutil.exe"
# The number changes per SQL Server version
# 130=2016, 120=2014, 110=2012
# Also check the drive where SQL Server is
# installed


#################################################
########## DO NOT EDIT BELOW THIS LINE ##########
#################################################
clear
Write-Host "========================================================================================="
Write-Host "==                                 Used parameters                                     =="
Write-Host "========================================================================================="
Write-Host "Project Folder          :" $projectFolder                                                 
Write-Host "dtutil Path             :" $dtutilPath                                                    
Write-Host "========================================================================================="
 
 
######################################
########## Check parameters ##########
######################################
# Test whether the paths are filled
# and exists.
if ($projectFolder -eq "")
{
    Throw [System.Exception] "Project path parameter is mandatory"
}
elseif (-Not (Test-Path $projectFolder))
{
    Throw  [System.IO.FileNotFoundException] "Project path $($projectFolder) doesn't exists!"
}
elseif (-Not $projectFolder.EndsWith("\"))
{
    # Make sure path ends with \ for command
    $projectFolder = $projectFolder + "\"
}
if ($dtutilPath -eq "")
{
    Throw [System.Exception] "dtutil parameter is mandatory"
}
elseif (-Not (Test-Path $dtutilPath))
{
    Throw  [System.IO.FileNotFoundException] "dtutil not found at $($dtutilPath)"
}

 
#############################################
########## dtutil for loop command ##########
#############################################
# In this script we are executing dtutil.exe
# Perhaps a bit quick & dirty, but more quick
# than dirty. It changes 100 packages within
# seconds.
$command = "/C FOR %p IN ($($projectFolder)*.dtsx) DO dtutil.exe /file %p /encrypt file;%p;0 /quiet"
Write-Host "Editing packages in $($projectFolder)... " -NoNewline

# Open the command prompt (hidden) and execute
# dtutil.exe with the parameters from above.
Start-Process "C:\Windows\System32\cmd.exe" -ArgumentList $command -WindowStyle Hidden -Wait
Write-Host "Done."


##########################################
########## Editing project file ##########
##########################################
# Find the project file. There should be
# only one dtproj file.
$projectFile = get-childitem $projectFolder -name -filter *.dtproj
Write-Host "Editing project file $($projectFile)... "  -NoNewline                                                  

# Edit the project file and replace the
# protection level. First replace is for
# all the packages and the second replace
# is for the project itself. It uses a
# regular expression for the replace, but
$projectFilePath = Join-Path -Path $projectFolder -ChildPath $projectFile
(Get-Content $projectFilePath) -replace 'ProtectionLevel">[0-9]', 'ProtectionLevel">0' -replace 'ProtectionLevel="[A-Za-z]*"', 'ProtectionLevel="DontSaveSensitive"' | Set-Content $projectFilePath
Write-Host "Done."

##############################
########## Finished ##########
##############################
# Finished editing packages and project file
Write-Host "Finished editing $($projectFile) and $((get-childitem $projectFolder -name -filter *.dtsx).Count) packages" -ForegroundColor Magenta



Related Posts Plugin for WordPress, Blogger...