
Wednesday, 30 January 2019

Introducing Data Flows in Azure Data Factory

Case
Since last year we can run the good old SSIS packages in the Azure cloud: not on a self-created and self-maintained virtual machine, but with the Integration Runtime service in Azure Data Factory.

However, Microsoft also introduced Azure Databricks and called it ETL 2.0, so I'm not sure how much effort Microsoft will put into future releases of SSIS. But Azure Databricks is not a visual tool and not everybody is into coding. Is there an alternative?
Visual ETL in Azure















Solution
You can probably keep on using SSIS for another decade, but if the development of SSIS indeed stagnates while the rest of the data platform doesn't, then it is good to learn a couple of new tricks. Within the Microsoft Data Platform we have a couple of alternatives for SSIS:


  1. Azure Databricks
    As mentioned above this requires learning some new coding skills since this isn't a visual development tool. I will post an introduction in a later blog post.
  2. Azure Data Factory with Pipelines and T-SQL
    You could use the Copy Data activity in combination with the Stored Procedure activity and build all transformations in T-SQL. If you are a diehard SSIS developer then this is probably not your cup of tea.
  3. Power BI Dataflows
    The Power BI team just introduced self-service ETL within Power BI. We just added a new post to explain this new option, which will make some people very excited.
  4. Azure Data Factory Dataflows
    This is a new preview feature in Azure Data Factory to visually create ETL flows. Below I will show you the steps to create your own first simple Data Flow.


1) Request preview access
If you are reading this during the preview period (early 2019), then first request access via this form. Microsoft will then turn this new feature on for the specific Azure subscription whose GUID you provide.
Check 'Version' for the preview version


























2) Create Databricks Service
Yes, you are reading this correctly. Under the hood Data Factory uses Databricks to execute the Data Flows, but don't worry, you don't have to write code.
Create a Databricks Service and choose the right region. This should be the same as your storage region to prevent high data movement costs. As Pricing Tier you can use Standard for this introduction. Creating the service itself doesn't cost anything.
Create Databricks Service



















3) Create Databricks cluster
Now go to the newly created service and click on the Launch Workspace button under the big Databricks logo in the center.
Launch Workspace













A new tab will open (just like with data factory) and in this workspace we need to create a cluster. A cluster is a collection of virtual machines that will do the ETL work for you. This is also where you start paying! Click on Clusters in the left menu and then on Create Cluster (or click on New Cluster under Common Tasks in the center).
Create new Databricks cluster























For the Cluster Mode use the default option: Standard. For the Runtime Version choose Runtime 5.0 (Scala 2.11, Spark 2.4.0) and pick a Python version. Another option to change is the number of minutes of inactivity before terminating the cluster. For this example 60 minutes should be enough to create and execute your new data flow. And most important for your wallet are the Worker Type and the number of workers. For this example one of the cheapest types (Standard_DS3_v2) should be more than enough.
Create cluster manually




















Note: In a future post I will show you how to create a cluster automatically. For now we will create one manually, but let it terminate automatically when you are ready (= not doing anything with the cluster).
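
For those who are curious: creating a cluster automatically boils down to one call to the Databricks Clusters REST API. Below is a hedged sketch of a cluster with the same settings as above; the workspace URL, the token (see the debug step later on) and the runtime label are assumptions that you need to adjust to your own workspace.

# Hedged sketch: create a cluster with the settings above via the Databricks Clusters REST API
# The workspace URL, token and runtime label are assumptions; adjust them to your own workspace
$WorkspaceUrl = "https://westeurope.azuredatabricks.net"     # regional URL of your workspace
$Token        = "<personal access token, see the debug step below>"

$ClusterConfig = @{
    cluster_name            = "bitools-cluster"
    spark_version           = "5.0.x-scala2.11"              # Runtime 5.0 (Scala 2.11, Spark 2.4.0)
    num_workers             = 1
    node_type_id            = "Standard_DS3_v2"              # one of the cheapest worker types
    autotermination_minutes = 60                             # terminate after 60 minutes of inactivity
} | ConvertTo-Json

Invoke-RestMethod -Method Post `
                  -Uri "$WorkspaceUrl/api/2.0/clusters/create" `
                  -Headers @{ Authorization = "Bearer $Token" } `
                  -Body $ClusterConfig `
                  -ContentType "application/json"
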
Creating a cluster takes a few minutes












4) Create Data Factory V2 with data flow (preview)
Create a new Data Factory and choose 'V2 with data flow (preview)' as the version. At the moment of writing this new feature is only available in 4 regions, but more regions will be added on the way from preview to general availability. So watch out with moving lots of data between regions. Also notice the option to add a couple of examples which will help you to master the Data Flows.
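
By the way, creating the Data Factory itself can also be done with PowerShell. A minimal sketch with placeholder names; note that the 'V2 with data flow (preview)' flavour seems to be a portal-only choice at the moment, so this creates a regular V2 factory.

# Hedged sketch: create a Data Factory V2 with PowerShell (resource group and name are placeholders)
Set-AzureRmDataFactoryV2 -ResourceGroupName "bitools" `
                         -Name "bitools-adf-dataflows" `
                         -Location "East US"                 # replace with one of the 4 preview regions
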
Only available in 4 regions at the moment


























5) Add new Data Flow
Now go to the newly created Data Factory and click on Author & Monitor to go to the Data Factory portal.
Azure Data Factory, open portal














In the portal go to the Author page (pencil icon in the left menu) and then click on the three dots behind Data Flows and choose Add Dataflow. A new empty Dataflow will be created and we can start adding a source.
Add Dataflow















6) Add source
For this example I will use an existing file that is located in an Azure Blob Storage Container. Go to the new dataflow and click on the source to specify the file from the Blob Storage Container. First give the source a suitable name. In the second field you can find all existing datasets, but if you click on the +New button behind it, you can create a new dataset. Now you first need to give the dataset a name and then you can specify the connection to the Azure Blob Storage Container.
Adding source part 1















When creating a new Blob Storage Container dataset it only creates a link to the storage account itself. You still need to specify the container, filename and properties like delimiter and column names. To do that hit the Edit button and go to the Connections tab. Here you can specify all properties like location and file format. Next go to the Schema tab to specify the columns. You can do that manually or by clicking on the Import Schema button that will read the column names from the file. After this you can specify the datatypes for each column. Now the source dataset is ready, but we still have to map this to the source in the dataflow.
Adding source part 2















Go back to the new dataflow and go to the Define Schema tab. Use the Import from dataset button to import all columns from the dataset. The datatype is not copied from the dataset; I'm not sure whether this is deliberate or a bug. Now the source is ready and we can start adding transformations.
Adding source part 3















7) Add transformation
The number of different transformations is not yet as rich as that of the SSIS Data Flow Task, but the most important transformations are available and there are also some handy new ones like Surrogate Key and Window (for aggregation on time windows) which are not available in SSIS.

On the other hand the expression language in the Derived Column is much more comprehensive, with for example functions for hashing and a lot of string functions that are not available in SSIS.

For this example I will add a Derived Column to uppercase the location column. Click on the little + behind the source to add a new transformation. Click on Derived Column and edit an existing column. When editing the expression a new window will appear. The upper function can be found under String and the column name can be found under Input. If you know SSIS then this should be very familiar.
Adding a Derived Column transformation















8) Add destination
For this example I will use an existing table in an Azure SQL Database as destination. Click on the little + behind the Derived Column to add a new item to the dataflow. Scroll down to the bottom and click on Sink. Now you can add a new destination dataset (sink dataset) for this output. First give the output stream a name and then click on the + New button to add a dataset. Choose Azure SQL Server and give the dataset a name in the next step. After that we need to create a new connection to the database server. See the animated gifs for the details.
Destination - Add sink dataset and connection















When creating a new sink dataset and its connection, it only creates a link to the SQL Server database. You still need to specify the table name and columns. To do that hit the Edit button on the dataflow destination and go to the Connections tab. Here you can specify which table you want to use. Next go to the Schema tab to specify the columns. You can do that by clicking on the Import Schema button that will read the columns from the database table. Now the sink dataset is ready, but you still need to create a column mapping.
Destination - Choose table and columns















Go back to the dataflow, click on the destination and then go to the Mapping tab. If the column names in your dataflow are the same as in the table then you can use Auto Mapping. Otherwise turn off the auto mapping and do it manually.
Destination - Mapping
















9) Debug
You probably already clicked on the Data Preview tab of the source, derived column or destination and saw that you first have to turn on debug mode.
Please turn on the debug mode and wait until cluster is ready to preview data




















To do this we first need to get a new token from Azure Databricks to connect from Data Factory. Go to the Databricks portal and click on the person icon in the top right. Then choose User Settings and hit the Generate New Token button. Copy it and keep it safe in, for example, KeePass because you won't be able to retrieve it again.
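
The token is a so-called bearer token. If you want to check that it works before handing it over to Data Factory, a quick call to the Databricks REST API will do. A sketch; the workspace URL is an assumption.

# Sketch: verify the personal access token by listing the clusters in the workspace
$WorkspaceUrl = "https://westeurope.azuredatabricks.net"     # regional URL of your workspace
$Token        = "<the token you just generated>"

$Clusters = Invoke-RestMethod -Method Get `
                              -Uri "$WorkspaceUrl/api/2.0/clusters/list" `
                              -Headers @{ Authorization = "Bearer $Token" }

# Show the name and state of each cluster
$Clusters.clusters | Select-Object cluster_name, state
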
Generate New Token in Azure Databricks















Now go back to your dataflow and switch on the debug mode in the top left of the dataflow. A new form appears on the right side where you can create a new linked service to the Databricks service. This is where you need the token from the previous step. For details see the animated gif. After that select the cluster that you created and hit the start button. Now it will connect to the cluster and you will be able to see the contents of the Data Preview tabs.
Debugging dataflow (not the actual speed)















10) Create pipeline
The dataflow is ready and we can now add it to a data factory pipeline with the new dataflow preview activity. You need to provide the name of the dataflow you want to execute, but also the link to the Azure Databricks Service. You can reuse the linked service from the previous step.
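
After publishing, the pipeline can of course be triggered from the portal, but also from PowerShell. A minimal sketch with placeholder names for the resource group, factory and pipeline:

# Hedged sketch: start the pipeline with the dataflow activity and check its status (placeholder names)
$RunId = Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName "bitools" `
                                             -DataFactoryName "bitools-adf-dataflows" `
                                             -PipelineName "PL_Dataflow_Example"

# Check the status of this particular pipeline run
Get-AzureRmDataFactoryV2PipelineRun -ResourceGroupName "bitools" `
                                    -DataFactoryName "bitools-adf-dataflows" `
                                    -PipelineRunId $RunId
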
Adding dataflow to the pipeline (not the actual speed)















And the result viewed with SSMS


























Summary
Microsoft created a new visual ETL tool with lots of promising features that could become a great successor to SSIS. However, it does not yet have the Visual Studio feel of SSIS, so that will take some time to get used to. Debugging is another thing: it works, but it takes way too much waiting time at the moment. It will probably be faster with a more expensive cluster, but I'm not sure it will match the SSDT experience. Besides these two notes, I can't wait to get some actual working experience with it.







Sunday, 1 April 2018

Start and stop Integration Runtime in ADF pipeline

Case
You showed me how to schedule a pause and resume of the Integration Runtime (IR) in Azure Automation, but can you also start and stop IR in the Azure Data Factory (ADF) pipeline with one of the activities? This will save the most money possible, especially when you only have one ETL job.
Pause and Resume IR in ADF pipeline












Solution
Yes you can, and even better... you can reuse the existing Runbook PowerShell scripts that pause and resume the IR. Instead of scheduling them, which is more appropriate when you have multiple projects to run, we will call the scripts via their webhooks.

Update: this is now easier done via the REST API.

Prerequisites



A) Azure Automation Runbook
If you already have the two Runbooks that pause and resume the IR, then you can skip steps 1 to 8 and only do step 9 (create webhook) for both scripts. But then make sure to delete, disable or unlink the schedules of the two Runbooks and remove the optional trigger part of the code.

1) Collect parameters
Before we start coding we first need to get the name of the Azure Data Factory and its Resource Group.
Your Azure Data Factory (V2)

















2) Azure Automation Account
Create an Azure Automation Account. You can find it under + New, Monitoring + Management. Make sure that Create Azure Run As account is turned on.
Azure Automation Account























3) Import Module
We need to tell our code about Integration Runtimes in Azure Data Factory. You do this by adding modules. Scroll down in the menu of the Automation Account and click on Modules. Now you see all installed modules. We need to add the module called AzureRM.DataFactoryV2, however it depends on AzureRM.Profile (≥ 4.2.0). Click on Browse gallery, search for AzureRM.Profile and import it, and then repeat this for AzureRM.DataFactoryV2. Make sure to add the V2 version!
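
Tip: if you want to verify which modules and versions your runbooks can actually see, a tiny test runbook with the snippet below will list them (so you can confirm that AzureRM.Profile is at least 4.2.0 and that the DataFactoryV2 module is present).

# Sketch: list the relevant modules and their versions from within a test runbook
Get-Module -ListAvailable AzureRM.Profile, AzureRM.DataFactoryV2 |
    Select-Object Name, Version |
    Sort-Object Name
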
Import modules


















4) Connections
This step is for your information only and to understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tenant id and the subscription id. To prevent hardcoded connection details we will retrieve some of these fields in the PowerShell code.
AzureRunAsConnection













5) Runbooks
Now it is time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (There are also five example runbooks of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type. There will be two separate runbooks/scripts: one for pause and one for resume. When finished with the Pause script you need to repeat this for the Resume script (steps 5 to 9).
Create new Runbook

















6) Edit Script
After clicking Create in the previous step the editor will be opened. When editing an existing Runbook you need to click on the Edit button to edit the code. You can copy and paste the code of one of the scripts below to your editor. Study the green comments to understand the code. Also make sure to fill in the right values for the variables (see parameters).

The first script is the pause script and the second script is the resume script. You could merge both scripts and use an if statement on the status property or some parameters to either pause or resume, but I prefer two separate scripts.

# This script pauses your Integration Runtime if it is running

# Parameters
$ConnectionName = 'AzureRunAsConnection'
$DataFactoryName = 'ADF-SSISJoost'
$ResourceGroup = 'joost_van_rossum'

# Do not continue after an error
$ErrorActionPreference = "Stop"

########################################################
# Log in to Azure (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'
try
{
    # Get the connection "AzureRunAsConnection "
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName         

    'Log in to Azure...'
    $null = Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################


# Get Integration Runtime in Azure Data Factory
$IntegrationRuntime = Get-AzureRmDataFactoryV2IntegrationRuntime `
                            -DataFactoryName $DataFactoryName `
                            -ResourceGroupName $ResourceGroup

# Check if Integration Runtime was found
if (!$IntegrationRuntime)
{
    # Your ADF does not have an Integration Runtime
    # or the ADF does not exist
    $ErrorMessage = "No Integration Runtime found in ADF $($DataFactoryName)."
    throw $ErrorMessage
}
# Check if the Integration Runtime is running
elseif ($IntegrationRuntime.State -eq "Started")
{
    # Stop the integration runtime
    Write-Output "Pausing Integration Runtime $($IntegrationRuntime.Name)."
    $null = Stop-AzureRmDataFactoryV2IntegrationRuntime `
                -DataFactoryName $IntegrationRuntime.DataFactoryName `
                -Name $IntegrationRuntime.Name `
                -ResourceGroupName $IntegrationRuntime.ResourceGroupName `
                -Force
    Write-Output "Done"
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Integration Runtime $($IntegrationRuntime.Name) is not running."
}




# This script resumes your Integration Runtime if it is stopped

# Parameters
$ConnectionName = 'AzureRunAsConnection'
$DataFactoryName = 'ADF-SSISJoost'
$ResourceGroup = 'joost_van_rossum'

# Do not continue after an error
$ErrorActionPreference = "Stop"

########################################################
# Log in to Azure (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'
try
{
    # Get the connection "AzureRunAsConnection "
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName         

    'Log in to Azure...'
    $null = Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################


# Get Integration Runtime in Azure Data Factory
$IntegrationRuntime = Get-AzureRmDataFactoryV2IntegrationRuntime `
                            -DataFactoryName $DataFactoryName `
                            -ResourceGroupName $ResourceGroup

# Check if Integration Runtime was found
if (!$IntegrationRuntime)
{
    # Your ADF does not have an Integration Runtime
    # or the ADF does not exist
    $ErrorMessage = "No Integration Runtime found in ADF $($DataFactoryName)."
    throw $ErrorMessage
}
# Check if the Integration Runtime is running
elseif ($IntegrationRuntime.State -ne "Started")
{
    # Resume the integration runtime
    Write-Output "Resuming Integration Runtime $($IntegrationRuntime.Name)."
    $null = Start-AzureRmDataFactoryV2IntegrationRuntime `
                -DataFactoryName $IntegrationRuntime.DataFactoryName `
                -Name $IntegrationRuntime.Name `
                -ResourceGroupName $IntegrationRuntime.ResourceGroupName `
                -Force
    Write-Output "Done"
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Integration Runtime $($IntegrationRuntime.Name) is already running."
}

7) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When clicking on Run it will first Queue the script before Starting it. Running takes a couple of minutes.
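
Once the runbook is published (next step) you can also start it from your own machine and follow its output, which comes in handy when testing. A hedged sketch; the Automation Account and runbook names are placeholders.

# Hedged sketch: start the published runbook from your own machine and read its output
# The Automation Account and runbook names are placeholders
$Job = Start-AzureRmAutomationRunbook -ResourceGroupName "joost_van_rossum" `
                                      -AutomationAccountName "bitools-automation" `
                                      -Name "Pause-SSIS-IR"

# Poll the job output (repeat until the job has finished)
Get-AzureRmAutomationJobOutput -ResourceGroupName "joost_van_rossum" `
                               -AutomationAccountName "bitools-automation" `
                               -Id $Job.JobId `
                               -Stream Any
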
Pausing Integration Runtime











Resuming Integration Runtime (20 minutes+)














8) Publish
When your script is ready, it is time to publish it. Above the editor click on the Publish button. Confirm overriding any previously published versions.
Publish your script















9) Adding a Webhook
To start the runbooks we will be using a Webhook. This is a URL with a long key in it. Do not share this URL, because with it others could execute your runbook. Treat it like a password.
On the Runbook overview page click on Webhook and create a new Webhook. Give it a suitable name (similar to its runbook). Set the expiration date and don't forget to copy the URL. This is the only time you can see the URL; if you lose it you need to recreate the Webhook. The URL looks something like: https://s2events.azure-automation.net/webhooks?token=vfIx%2fOAHcsJCn95abSXbklPrPXNlFUHwpr%2bSWyANlk0%3d
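
The Web activity that we will add in part B does nothing more than an HTTP POST to this URL. You can test the webhook up front with a single line of PowerShell (a sketch; use the URL you just copied):

# Sketch: test the webhook with a simple HTTP POST (this is exactly what the ADF Web activity will do)
# The response only contains the id of the queued runbook job
$WebhookUrl = "https://s2events.azure-automation.net/webhooks?token=<your token>"
Invoke-RestMethod -Method Post -Uri $WebhookUrl
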
Add Webhook



















B) Azure Data Factory
The scripts are now ready. Next go to your existing ADF pipeline that executes the SSIS package. We will be adding two activities before the activity that executes the package and one after it.


1) Resume IR Add Web Activity
Collapse the General activities and drag a Web activity as the first activity in your pipeline. This activity will be calling the Webhook of the Resume-SSIS runbook. Give it a suitable name like 'Resume-SSIS'.
Add Web Activity for Resume IR

















2) Resume IR Web activity settings
Select the newly added Web activity and go to the Settings page. In the URL field you must paste the URL from step A9 (the URL/Webhook that resumes the IR) and as method you need to select Post.
Add Webhook URL

















3) Wait until online Stored Procedure
Because calling the Webhook is asynchronous (it doesn't wait for the result) and starting the IR takes 15 to 20 minutes, we need to wait that amount of time. The database view [catalog].[worker_agents] in the SSISDB database will tell you when the IR is active.

In the SQL Account tab you can select the existing connection to the SSISDB. Then go to the Stored Procedure tab and add 'sp_executesql' as Stored procedure name. Next add a string parameter called stmt (statement) and paste the code below as value.

When finished connect the Resume-SSIS activity to this new Wait activity and then connect the Wait activity to the activity that executes the package.
-- Wait until Azure-SSIS IR is started
WHILE NOT EXISTS (SELECT * FROM [SSISDB].[catalog].[worker_agents] WHERE IsEnabled = 1 AND LastOnlineTime > DATEADD(MINUTE, -1, SYSDATETIMEOFFSET()))
BEGIN
    WAITFOR DELAY '00:00:30';
END
Add wait

















Tip: add a 30 minute time-out on this activity to prevent endless waiting in case of unexpected errors.

4) Pause IR Web Activity
The last step is to add another Web activity called Pause-SSIS. This is similar to the Resume-SSIS activity, but with another Webhook URL. When finished, connect the activity that executes the package to this new Pause activity. Make sure to change the dependency from the default Success to Completion. Otherwise the IR keeps on running in case a package fails.

The only thing left is some realignment and then you can publish and test your ADF pipeline.
Add Web Activity for Pausing IR

















5) Test result
If you run the pipeline and check its activities in the monitor you see all the individual steps. Starting the IR took almost 25 minutes of the 32 minutes in total, so now you only have to pay for those 32 minutes.
Activities










Summary
In this post I showed you how to pause and resume the Integration Runtime in the ADF pipeline. This is especially useful if you only have one job or just a few that don't run at the same time. For all other cases I recommend scheduling the pause and resume.

Thursday, 1 March 2018

Azure Blob Source ≠ Flat File Source

Case
I'm running my SSIS packages in Azure and my source is a flat file in an Azure Blob Storage container. Therefore I use the Azure Blob Source as a source in my Data Flow Task. However this source has just a few formatting options compared to the Flat File Source (and its connection manager). I want to specify things like qualifiers and data types. How do I do that?
Azure Blob Source has too few options


























Solution
With the current version of the Azure Blob Source you can only specify the column separator, but there is a workaround available. Your Integration Runtime (IR) that is hosted in ADF is actually a virtual machine with Integration Services on it. A simple Script Task running on that IR reveals the drives and their available space. It shows that we have several drives available on that Virtual Machine.
Log with drive details











// C# Code to replace your Main() method
public void Main()
{
    // Create array with drive information
    System.IO.DriveInfo[] allDrives = System.IO.DriveInfo.GetDrives();

    // Create string to store a message
    String DriveDetails = "";

    // Loop through all drives to get info about it
    foreach (System.IO.DriveInfo d in allDrives)
    {
        // Get drive name (e.g. C:\) and drive type (e.g. Fixed)
        DriveDetails = d.Name + "(" + d.DriveType + ")" + Environment.NewLine;

        // If drive is ready you can get more details
        if (d.IsReady == true)
        {
            DriveDetails += " - Volume label: " + d.VolumeLabel + Environment.NewLine;
            DriveDetails += " - File system: " + d.DriveFormat + Environment.NewLine;
            DriveDetails += " - Available space to current user: " + d.AvailableFreeSpace + Environment.NewLine;
            DriveDetails += " - Total available space: " + d.TotalFreeSpace + Environment.NewLine;
            DriveDetails += " - Total size of drive: " + d.TotalSize;
        }

        // Fire the message as warning to stand out between other messages
        Dts.Events.FireWarning(0, "Details", DriveDetails, "", 0);
    }

    // End Script Task
    Dts.TaskResult = (int)ScriptResults.Success;
}

So the solution is to first use the Azure Blob Download Task to download the file from the Blob Storage Container to the Virtual Machine. After that you can use a regular Flat File Source in the Data Flow Task.
Azure Blob Download Task



























I'm not sure what the purpose of these disks is and whether one of them is for non-persistent data (disks that are automatically cleaned), but I recommend using the E drive to temporarily store the downloaded files and to clean up afterwards.

Windows Temp folder
An alternative for picking a temporary folder on your IR machine is to use a very simple Script Task with only one line of code that retrieves the path of the Windows temp folder. The path looks something like D:\Users\WATASK_1\AppData\Local\Temp\. If you store this path in an SSIS string variable, then you can use it in expressions on your tasks and Flat File connection manager. After the next reboot Windows removes all old files in this folder.
// C# code to replace your Main() method
public void Main()
{
    // Store the path of the Windows temp folder in the SSIS string variable
    Dts.Variables["User::tempPath"].Value = System.IO.Path.GetTempPath();

    Dts.TaskResult = (int)ScriptResults.Success;
}

Script Task that fills a variable




Thursday, 15 February 2018

Pause and resume Integration Runtime to save money

Case
Azure Data Factory (V2) now supports running SSIS packages in an Integration Runtime, but you are charged by the hour. How can I automatically pause (and resume) my SSIS environment in Azure to save some money on my Azure bill?
Pause your Integration Runtime in the portal













Solution
For this example I'm using an Integration Runtime (IR) that runs an SSIS package each hour during working hours. After working hours it is no longer necessary to refresh the data warehouse, therefore the IR can be suspended to save money. I will also suspend the trigger that runs the pipeline (package) each hour to prevent errors. For this solution I will use a PowerShell script that runs in an Azure Automation Runbook.

1) Collect parameters
Before we start coding we first need to get the name of the Azure Data Factory and its Resource Group.
Your Azure Data Factory (V2)
















If you also want to disable the trigger then we need its name. This is probably only necessary if you are running hourly and didn't create separate triggers for each hour. You can find it by clicking on Author & Monitor under Quick links.
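
You can also list the triggers and their state with PowerShell if you are not sure about the exact name. A minimal sketch using the names collected above:

# Sketch: list all triggers of the Data Factory with their name and current state
Get-AzureRmDataFactoryV2Trigger -ResourceGroupName "joost_van_rossum" `
                                -DataFactoryName "ADF-SSISJoost" |
    Select-Object Name, RuntimeState
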
Your trigger













2) Azure Automation Account
Create an Azure Automation Account. You can find it under + New, Monitoring + Management. Make sure that Create Azure Run As account is turned on.
Azure Automation Account























3) Import Module
We need to tell our code about Integration Runtimes in Azure Data Factory. You do this by adding modules. Scroll down in the menu of the Automation Account and click on Modules. Now you see all installed modules. We need to add the module called AzureRM.DataFactoryV2, however it depends on AzureRM.Profile (≥ 4.2.0). Click on Browse gallery, search for AzureRM.Profile and import it, and then repeat this for AzureRM.DataFactoryV2. Make sure to add the V2 version!
Import modules


















4) Connections
This step is for your information only and to understand the code. Under Connections you will find a default connection named 'AzureRunAsConnection' that contains information about the Azure environment, like the tenant id and the subscription id. To prevent hardcoded connection details we will retrieve some of these fields in the PowerShell code.
AzureRunAsConnection













5) Runbooks
Now it is time to add a new Azure Runbook for the PowerShell code. Click on Runbooks and then add a new runbook (There are also five example runbooks of which AzureAutomationTutorialScript could be useful as an example). Give your new Runbook a suitable name and choose PowerShell as type. There will be two separate runbooks/scripts: one for pause and one for resume. When finished with the Pause script you need to repeat this for the Resume script.
Create new Runbook
















6) Edit Script
After clicking Create in the previous step the editor will be opened. When editing an existing Runbook you need to click on the Edit button to edit the code. You can copy and paste the code of one of the scripts below to your editor. Study the green comments to understand the code. Also make sure to fill in the right values for the variables (see parameters).

The first script is the pause script and the second script is the resume script. You could merge both scripts and use an if statement on the status property to either pause or resume, but I prefer two separate scripts both with their own schedule.

# This script pauses your Integration Runtime (and its trigger) if it is running

# Parameters
$ConnectionName = 'AzureRunAsConnection'
$DataFactoryName = 'ADF-SSISJoost'
$ResourceGroup = 'joost_van_rossum'
$TriggerName = 'Hourly'

# Do not continue after an error
$ErrorActionPreference = "Stop"

########################################################
# Log in to Azure (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'
try
{
    # Get the connection "AzureRunAsConnection "
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName         

    'Log in to Azure...'
    $null = Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################


# Get Integration Runtime in Azure Data Factory
$IntegrationRuntime = Get-AzureRmDataFactoryV2IntegrationRuntime `
                            -DataFactoryName $DataFactoryName `
                            -ResourceGroupName $ResourceGroup

# Check if Integration Runtime was found
if (!$IntegrationRuntime)
{
    # Your ADF does not have an Integration Runtime
    # or the ADF does not exist
    $ErrorMessage = "No Integration Runtime found in ADF $($DataFactoryName)."
    throw $ErrorMessage
}
# Check if the Integration Runtime is running
elseif ($IntegrationRuntime.State -eq "Started")
{
    <# Start Trigger Deactivation #>
    # Getting trigger to check if it exists
    $Trigger = Get-AzureRmDataFactoryV2Trigger `
                    -DataFactoryName $DataFactoryName `
                    -Name $TriggerName `
                    -ResourceGroupName $ResourceGroup

    # Check if the trigger was found
    if (!$Trigger)
    {
        # Fail options:
        # The ADF does not exist (typo)
        # The trigger does not exist (typo)
        $ErrorMessage = "Trigger $($TriggerName) not found."
        throw $ErrorMessage
    }
    # Check if the trigger is activated
    elseif ($Trigger.RuntimeState -eq "Started")
    {
        Write-Output "Stopping Trigger $($TriggerName)"
        $null = Stop-AzureRmDataFactoryV2Trigger `
                    -DataFactoryName $DataFactoryName `
                    -Name $TriggerName `
                    -ResourceGroupName $ResourceGroup `
                    -Force
    }
    else
    {
        # Write message to screen (not throwing error)
        Write-Output "Trigger $($TriggerName) is not activated."
    }
    <# End Trigger Deactivation #>

    # Stop the integration runtime
    Write-Output "Pausing Integration Runtime $($IntegrationRuntime.Name)."
    $null = Stop-AzureRmDataFactoryV2IntegrationRuntime `
                -DataFactoryName $IntegrationRuntime.DataFactoryName `
                -Name $IntegrationRuntime.Name `
                -ResourceGroupName $IntegrationRuntime.ResourceGroupName `
                -Force
    Write-Output "Done"
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Integration Runtime $($IntegrationRuntime.Name) is not running."
}




# This script resumes your Integration Runtime (and its trigger) if it is stopped

# Parameters
$ConnectionName = 'AzureRunAsConnection'
$DataFactoryName = 'ADF-SSISJoost'
$ResourceGroup = 'joost_van_rossum'
$TriggerName = 'Hourly'

# Do not continue after an error
$ErrorActionPreference = "Stop"

########################################################
# Log in to Azure (standard code)
########################################################
Write-Verbose -Message 'Connecting to Azure'
try
{
    # Get the connection "AzureRunAsConnection "
    $ServicePrincipalConnection = Get-AutomationConnection -Name $ConnectionName         

    'Log in to Azure...'
    $null = Add-AzureRmAccount `
        -ServicePrincipal `
        -TenantId $ServicePrincipalConnection.TenantId `
        -ApplicationId $ServicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $ServicePrincipalConnection.CertificateThumbprint 
}
catch 
{
    if (!$ServicePrincipalConnection)
    {
        # You forgot to turn on 'Create Azure Run As account' 
        $ErrorMessage = "Connection $ConnectionName not found."
        throw $ErrorMessage
    }
    else
    {
        # Something went wrong
        Write-Error -Message $_.Exception.Message
        throw $_.Exception
    }
}
########################################################


# Get Integration Runtime in Azure Data Factory
$IntegrationRuntime = Get-AzureRmDataFactoryV2IntegrationRuntime `
                            -DataFactoryName $DataFactoryName `
                            -ResourceGroupName $ResourceGroup

# Check if Integration Runtime was found
if (!$IntegrationRuntime)
{
    # Your ADF does not have an Integration Runtime
    # or the ADF does not exist
    $ErrorMessage = "No Integration Runtime found in ADF $($DataFactoryName)."
    throw $ErrorMessage
}
# Check if the Integration Runtime is running
elseif ($IntegrationRuntime.State -ne "Started")
{
    # Resume the integration runtime
    Write-Output "Resuming Integration Runtime $($IntegrationRuntime.Name)."
    $null = Start-AzureRmDataFactoryV2IntegrationRuntime `
                -DataFactoryName $IntegrationRuntime.DataFactoryName `
                -Name $IntegrationRuntime.Name `
                -ResourceGroupName $IntegrationRuntime.ResourceGroupName `
                -Force
    Write-Output "Done"
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Integration Runtime $($IntegrationRuntime.Name) is already running."
}

<# Start Trigger Activation #>
# Getting trigger to check if it exists
$Trigger = Get-AzureRmDataFactoryV2Trigger `
    -DataFactoryName $DataFactoryName `
    -Name $TriggerName `
    -ResourceGroupName $ResourceGroup

# Check if the trigger was found
if (!$Trigger)
{
    # Fail options:
    # The ADF does not exist (typo)
    # The trigger does not exist (typo)
    $ErrorMessage = "Trigger $($TriggerName) not found."
    throw $ErrorMessage
}
# Check if the trigger is activated
elseif ($Trigger.RuntimeState -ne "Started")
{
    Write-Output "Resuming Trigger $($TriggerName)"
    $null = Start-AzureRmDataFactoryV2Trigger `
                -DataFactoryName $DataFactoryName `
                -Name $TriggerName `
                -ResourceGroupName $ResourceGroup `
                -Force
}
else
{
    # Write message to screen (not throwing error)
    Write-Output "Trigger $($TriggerName) is already activated."
}
<# End Trigger Activation #>

Note: when you don't want to disable your trigger, then remove the lines between <# Start Trigger Deactivation #> and <# End Trigger Deactivation #> in the pause script and between <# Start Trigger Activation #> and <# End Trigger Activation #> in the resume script.

7) Testing
You can use the Test Pane menu option in the editor to test your PowerShell scripts. When clicking on Run it will first Queue the script before Starting it. Running takes a couple of minutes.
Pausing Integration Runtime











Resuming Integration Runtime (20 minutes+)














8) Publish
When your script is ready, it is time to publish it. Above the editor click on the Publish button. Confirm overriding any previously published versions.
Publish your script















9) Schedule
And now that we have a working and published Azure Runbook, we need to schedule it. Click on Schedule to create a new schedule for your runbook. My packages run each hour during working hours, so for the resume script I created a schedule that runs every working day at 7:00 AM. The pause script could for example be scheduled on working days at 9:00 PM (21:00).
Now you need to hit the refresh button in the Azure Data Factory dashboard to see if it really works. It takes a few minutes to run, so don't worry too soon.
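
Creating and linking the schedules can also be scripted. A hedged sketch; the Automation Account, runbook and schedule names are placeholders.

# Hedged sketch: create a weekday 7:00 AM schedule and link it to the resume runbook
# The Automation Account, runbook and schedule names are placeholders
$ResourceGroup     = "joost_van_rossum"
$AutomationAccount = "bitools-automation"

# Every working day at 7:00 AM, starting tomorrow
$null = New-AzureRmAutomationSchedule -ResourceGroupName $ResourceGroup `
                                      -AutomationAccountName $AutomationAccount `
                                      -Name "Weekdays-0700" `
                                      -StartTime (Get-Date "07:00").AddDays(1) `
                                      -WeekInterval 1 `
                                      -DaysOfWeek @("Monday","Tuesday","Wednesday","Thursday","Friday")

# Link the schedule to the resume runbook
$null = Register-AzureRmAutomationScheduledRunbook -ResourceGroupName $ResourceGroup `
                                                   -AutomationAccountName $AutomationAccount `
                                                   -RunbookName "Resume-SSIS-IR" `
                                                   -ScheduleName "Weekdays-0700"
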
Add schedule




















Summary
In this post you saw how you can pause and resume your Integration Runtime in ADF to save some money on your Azure bill during the quiet hours. As said before, pausing and resuming the trigger is optional. When creating the schedules, keep in mind that resuming/starting takes around 20 minutes to finish, and note that you also pay during this startup phase.
Steps 5 to 9 need to be repeated for the resume script after you have finished the pause script.

Update: you can also do a pause and resume in the pipeline itself if you only have one ETL job.

Update 2: this is now even easier done via the REST API.
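
As a teaser: the REST API approach boils down to a single authenticated POST per action. A hedged sketch; the api-version, the names and the way you obtain the bearer token are assumptions (in an ADF Web activity you could also authenticate with its Managed Identity).

# Hedged sketch: start (or stop) the Azure-SSIS IR via the Azure management REST API
# Subscription, resource group, factory and IR names are placeholders; $Token must be a valid
# bearer token for https://management.azure.com (for example from a service principal)
$SubscriptionId = "<your subscription guid>"
$ResourceGroup  = "joost_van_rossum"
$DataFactory    = "ADF-SSISJoost"
$RuntimeName    = "<name of your Azure-SSIS IR>"
$Token          = "<bearer token for management.azure.com>"

$Uri = "https://management.azure.com/subscriptions/$SubscriptionId" +
       "/resourceGroups/$ResourceGroup/providers/Microsoft.DataFactory/factories/$DataFactory" +
       "/integrationRuntimes/$RuntimeName/start?api-version=2018-06-01"   # use .../stop to pause

Invoke-RestMethod -Method Post -Uri $Uri -Headers @{ Authorization = "Bearer $Token" }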