Thursday, 15 February 2018

Pause and resume Integration Runtime to save money

Case
Azure Data Factory (V2) now supports running SSIS packages in an Integration Runtime, but you are charged by the hour. How can I automatically pause (and resume) my SSIS environment in Azure to save some money on my Azure bill?
Pause your Integration Runtime in the portal













Solution
For this example I'm using an Integration Runtime (IR) that runs an SSIS package each hour during working hours. After working hour it is no longer necessary to refresh the data warehouse. Therefore the IR could be suspended to save money. I will also suspend the trigger that runs the pipeline (package) each hour to prevent errors. For this solution I will use a PowerShell script that runs in an Azure Automation Runbook.

1) Collect parameters
Before we start coding we first we need to get the name of the Azure Data Factory and its Resource group.
Your Azure Data Factory (V2)
















If you also want to disable the trigger then we need its name. This is probably only necessary if you are running hourly and didn't create separate triggers for each hour. You can find it by clicking on Author & Monitor under Quick links.
Your trigger













2) Azure Automation Account
Create an Azure Automation Account. You can find it under + New, Monitoring + Management. Make sure that Create Azure Run As account is turned on.
Azure Automation Account























3) Import Module
We need to tell our code about Integration Runtimes in Azure Data Factory. You do this by adding a modules. Scroll down in the menu of the Automation Account and click on Module. Now you see all installed modules. We need to add the module called AzureRM.DataFactoryV2, however it depends on AzureRM.Profile (≥ 4.2.0). Click on Browse gallery and search for AzureRM.Profile and import it and then repeat it for AzureRM.DataFactoryV2. Make sure to add the V2 version!
Import modules


















4) Connections
This step is for your information only and to understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tendant id and the subscription id. To prevent hardcoded connection details we will retrieve some of these fields in the PowerShell code.
AzureRunAsConnection













5) Runbooks
Now it is time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (There are also five example runbooks of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type. There will be two separate runbooks/scripts: one for pause and one for resume. When finished with the Pause script you need to repeat this for the Resume script.
Create new Rubook
















6) Edit Script
After clicking Create in the previous step the editor will we opened. When editing an existing Runbook you need to click on the Edit button to edit the code. You can copy and paste the code of one of these scripts below to your editor. Study the green comments to understand the code. Also make sure to fill in the right value for the variables (see parameters).

The first script is the pause script and the second script is the resume script. You could merge both scripts and use an if statement on the status property to either pause or resume, but I prefer two separate scripts both with their own schedule.

# This scripts pauses your Integration Runtime (and its trigger) if it is running

# Parameters
$ConnectionName = 'AzureRunAsConnection'
$DataFactoryName = 'ADF-SSISJoost'
$ResourceGroup = 'joost_van_rossum'
$TriggerName = 'Hourly'

# Do not continue after an error
$ErrorActionPreference = "Stop"

########################################################
# Log in to Azure (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'
try
{
    # Get the connection "AzureRunAsConnection "
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName         

    'Log in to Azure...'
    $null = Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################


# Get Integration Runtime in Azure Data Factory
$IntegrationRuntime = Get-AzureRmDataFactoryV2IntegrationRuntime `
                            -DataFactoryName $DataFactoryName `
                            -ResourceGroupName $ResourceGroup

# Check if Integration Runtime was found
if (!$IntegrationRuntime)
{
    # Your ADF does not have a Integration Runtime
    # or the ADF does not exist
    $ErrorMessage = "No Integration Runtime found in ADF $($DataFactoryName)."
    throw $ErrorMessage
}
# Check if the Integration Runtime is running
elseif ($IntegrationRuntime.State -eq "Started")
{
    <# Start Trigger Deactivation #>
    # Getting trigger to check if it exists
    $Trigger = Get-AzureRmDataFactoryV2Trigger `
  -DataFactoryName $DataFactoryName `
  -Name $TriggerName `
  -ResourceGroupName $ResourceGroup

    # Check if the trigger was found
    if (!$Trigger)
    {
        # Fail options:
        # The ADF does not exist (typo)
        # The trigger does not exist (typo)
        $ErrorMessage = "Trigger $($TriggerName) not found."
        throw $ErrorMessage
    }
    # Check if the trigger is activated
    elseif ($Trigger.RuntimeState -eq "Started")
    {
        Write-Output "Stopping Trigger $($TriggerName)"
        $null = Stop-AzureRmDataFactoryV2Trigger `
                    -DataFactoryName $DataFactoryName `
                    -Name $TriggerName `
                    -ResourceGroupName $ResourceGroup `
                    -Force
    }
    else
    {
        # Write message to screen (not throwing error)
        Write-Output "Trigger $($TriggerName) is not activated."
    }
    <# End Trigger Deactivation #>

    # Stop the integration runtime
    Write-Output "Pausing Integration Runtime $($IntegrationRuntime.Name)."
    $null = Stop-AzureRmDataFactoryV2IntegrationRuntime `
                -DataFactoryName $IntegrationRuntime.DataFactoryName `
                -Name $IntegrationRuntime.Name `
                -ResourceGroupName $IntegrationRuntime.ResourceGroupName `
                -Force
    Write-Output "Done"
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Integration Runtime $($IntegrationRuntime.Name) is not running."
}




# This scripts resumes your Integration Runtime (and its trigger) if it is stopped

# Parameters
$ConnectionName = 'AzureRunAsConnection'
$DataFactoryName = 'ADF-SSISJoost'
$ResourceGroup = 'joost_van_rossum'
$TriggerName = 'Hourly'

# Do not continue after an error
$ErrorActionPreference = "Stop"

########################################################
# Log in to Azure (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'
try
{
    # Get the connection "AzureRunAsConnection "
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName         

    'Log in to Azure...'
    $null = Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################


# Get Integration Runtime in Azure Data Factory
$IntegrationRuntime = Get-AzureRmDataFactoryV2IntegrationRuntime `
                            -DataFactoryName $DataFactoryName `
                            -ResourceGroupName $ResourceGroup

# Check if Integration Runtime was found
if (!$IntegrationRuntime)
{
    # Your ADF does not have a Integration Runtime
    # or the ADF does not exist
    $ErrorMessage = "No Integration Runtime found in ADF $($DataFactoryName)."
    throw $ErrorMessage
}
# Check if the Integration Runtime is running
elseif ($IntegrationRuntime.State -ne "Started")
{
    # Resume the integration runtime
    Write-Output "Resuming Integration Runtime $($IntegrationRuntime.Name)."
    $null = Start-AzureRmDataFactoryV2IntegrationRuntime `
                -DataFactoryName $IntegrationRuntime.DataFactoryName `
                -Name $IntegrationRuntime.Name `
                -ResourceGroupName $IntegrationRuntime.ResourceGroupName `
                -Force
    Write-Output "Done"
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Integration Runtime $($IntegrationRuntime.Name) is already running."
}

<# Start Trigger Activation #>
# Getting trigger to check if it exists
$Trigger = Get-AzureRmDataFactoryV2Trigger `
    -DataFactoryName $DataFactoryName `
    -Name $TriggerName `
    -ResourceGroupName $ResourceGroup

# Check if the trigger was found
if (!$Trigger)
{
    # Fail options:
    # The ADF does not exist (typo)
    # The trigger does not exist (typo)
    $ErrorMessage = "Trigger $($TriggerName) not found."
    throw $ErrorMessage
}
# Check if the trigger is activated
elseif ($Trigger.RuntimeState -ne "Started")
{
    Write-Output "Resuming Trigger $($TriggerName)"
    $null = Start-AzureRmDataFactoryV2Trigger `
                -DataFactoryName $DataFactoryName `
                -Name $TriggerName `
                -ResourceGroupName $ResourceGroup `
                -Force
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Trigger $($TriggerName) is already activated."
}
<# End Trigger Deactivation #>

Note: when you don't want to disable your trigger then remove the lines between <# Start Trigger Deactivation #> and <# End Trigger Deactivation #>.

7) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When clicking on Run it will first Queue the script before Starting it. Running takes a couple of minutes.
Pausing Integration Runtime











Resuming Integration Runtime (20 minutes+)














8) Publish
When your script is ready, it is time to publish it. Above the editor click on the Publish button. Confirm overriding any previously published versions.
Publish your script















9) Schedule
And now that we have a working and published Azure Runbook, we need to schedule it. Click on Schedule to create a new schedule for your runbook. My packages run each hour during working hours. So for the resume script I created a schedule that runs every working day on 7:00AM. The pause script could for example be scheduled on working days at 9:00PM (21:00).
Now you need to hit the refresh button in the Azure Data Factory dashboard to see if it really works. It takes a few minutes to run, so don't worry too soon.
Add schedule




















Summary
In this post you saw how you can pause and resume your Integration Runtime in ADF to save some money on your Azure bill during the quiet hours. As said before pausing and resuming the trigger is optional. When creating the schedule, keep in mind that resuming/starting takes around 20 minutes to finish and note that you also pay during this startup phase.
Steps 5 to 9 need to be repeated for the resume script after you finished the pause script.

Update: you can also do a pause and resume in the pipeline itself if you only have one ETL job.

Update 2: now even easier done via Rest API

Saturday, 10 February 2018

SSIS Snack: The semaphore timeout period has expired

Case
My SSIS package that runs in Azure Data Factory V2 (and gets data from an on-premises source) suddenly stops working and throws an communication error. After this error it won't run again. What is happening?

TCP Provider: The semaphore timeout period has expired









DFT - DIM_xxxxx:Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "Communication link failure".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 11.0"  Hresult: 0x80004005  Description: "TCP Provider: The semaphore timeout period has expired.".



Solution
The error seems to be caused by a network communication hiccup with our on-premises source which is connected with a VNET in Azure. Although I do not know the actual cause of the error. There is a solution: restart your Integration Runtime.

1) ADF dashboard
Go to your Azure Data Factory (ADF) that hosts your Integration Runtime and click on Author & Monitor within the Quick links section. This will open the ADF dashboard.
Author & Monitor




















2) Stop & Start IR
Click on Author (pencil), on Connections and then on Integration Runtimes (IR). Then Stop and start your IR. This could take up to 30 minutes! After that rerun your package and it should be working again.
Stop & Start IR

















Please let me know in the comments whether it worked for you and if you found the actual cause of the error.

Sunday, 4 February 2018

Schedule package in Azure Integration Services (ADF V2)

Case
I have deployed my SSIS project to an SSIS Catalog in Azure (Data Factory) and now I want to schedule the package to run each day at 7:00AM. How do I do that?
How to schedule this SSIS package in Azure?





















Solution
Microsoft suggests the following options:
  • SQL Server Agent. Here you use an on-premises SQL Server Agent to schedule the SSIS package in the cloud. This sounds like the least attractive solution. However if you still have a lot of on-premises projects that all run via SQL Server Agent then it is easy to use this solution and have all scheduled executions on one location.
  • SQL Database elastic jobs. This requires adding an extra component to your subscription, but if you already use a SQL Elastic database pool then this could be a good solution. Will try this in a future post.
  • Azure Data Factory. Since we already use ADF V2 to host the Integration Runtime, this is most obvious solution. It executes a stored procedure that executes the package.
This post explains the ADF solution with the stored procedure, but executing an stored procedure can be done in various other Azure components such as a Runbook in Azure Automation, Logic Apps or even Azure Functions with a trigger on arriving new files in a Blob Storage container.

*update 13-04-2018: new Execute SSIS Package activity*

1) ADF V2 - Author & Monitor
Go to your Azure Data Factory that hosts your SSIS Integration Runtime. Under Quick links, click on the Author & Monitor link. A new tab will be opened with the Azure Data Factory dashboard. Next click on Create Pipeline circle and go to the next step.
Azure Data Factory dashboard






















2) Create pipeline with Stored Procedure
You just created a new pipeline in the previous step. Give it a descriptive name (like 'SSIS Job MyAzureProject') in the General tab. A description is optional. Next collapse the General activities and drag a Stored Procedure to the canvas. Again, give it a descriptive name in the General tab.
Add Stored Procedure to Pipeline





















3) Linked service
Next step is to create a linked service to the SSISDB to execute the Stored Procedure. Go to the SQL Account tab and add a new Linked service. Point it to the SSISDB that hosts the package that you want to execute. Hit the test button after filling in all fields to make sure the everything is correct. Then click Finish and continue to the next step.
Most important fields of the Linked service









Add Linked service

































4) Creating Stored Procedure call
The code to execute a package consists of multiple Stored Procedures. To keep it simple we will use sp_executesql to execute a string of SQL code containing all Stored Procedure calls. You can easily create the SQL for this in SSMS.

Go to your package in the Catalog. Right click it and choose Execute... Now set all options like Logging Level, Environment and 32/64bit. After setting all options hit the Script button instead of the Ok button. This is the code you want to use. You can finetune it with some code to check whether the package finished successfully.
Generating code in SSMS















The code below was generated and finetuned. Copy your code we need it in the next step.
-- Variables for execution and error message
DECLARE @execution_id bigint, @err_msg NVARCHAR(150)

-- Create execution and fill @execution_id variable
EXEC [SSISDB].[catalog].[create_execution] @package_name=N'Package.dtsx', @execution_id=@execution_id OUTPUT, @folder_name=N'SSISJoost', @project_name=N'MyAzureProject', @use32bitruntime=False, @reference_id=Null, @useanyworker=True, @runinscaleout=True

-- Set logging level: 0=None, 1=Basic, 2=Performance, 3=Verbose 
EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,  @object_type=50, @parameter_name=N'LOGGING_LEVEL', @parameter_value=1 

-- Set synchonized option 0=A-SYNCHRONIZED, 1=SYNCHRONIZED 
-- A-SYNCHRONIZED: don't wait for the result
EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,  @object_type=50, @parameter_name=N'SYNCHRONIZED',  @parameter_value=1 

-- Execute the package with parameters from above
EXEC [SSISDB].[catalog].[start_execution] @execution_id, @retry_count=0

-- Check if the package executed succesfully (only for SYNCHRONIZED execution)
IF(SELECT [status] FROM [SSISDB].[catalog].[executions] WHERE execution_id=@execution_id)<>7
BEGIN
 SET @err_msg=N'Your package execution did not succeed for execution ID: ' + CAST(@execution_id AS NVARCHAR(20))
 RAISERROR(@err_msg,15,1)
END

The last part of this T-SQL code is very useful. It will cause an error in de pipeline monitor of ADF when a package fails. When you click on the text balloon it will show which SSIS execution failed.
Showing when a package fails in the Pipeline monitor















5) Adding Stored Procedure
Go to the Stored Procedure tab and enter a Stored Procedure name manually by first checking the Edit option. Then enter sp_executesql. Then add a new parameter with the name stmt (type string) and paste the complete code of the previous step the value field. After this we are ready to test and schedule this ADF Pipeline.
Add Stored Procedure code


















6) Publish and trigger
Now it is time to test the Pipeline, but first hit the Publish All button (on the left side) to publish your new pipeline to ADF. Then click on the Trigger button and choose Trigger Now to execute the pipeline immediately.

After that click on the Monitor button to check your execution and/or go to the Catalog and open an execution report to see the result.
Publish and trigger manually















7) Schedule
If everything went well it is time to schedule your package. Go back to the Author page by clicking on the pencil (on the left side). Then click on the Trigger button, but now choose New/Edit to create a new schedule for your pipeline (package). For this example I choose Daily at 7:00AM. After adding the new schedule you have to Publish your pipeline (again).
Add Schedule















Note: that it uses UTC time.