Microsoft released the SSIS Feature Pack for Microsoft Azure (2012, 2014), but how do the Azure HDInsight cluster tasks work?
[Image: Create / Delete HDInsight Cluster]
Solution
With these tasks you can create an HDInsight cluster, run some Hive or Pig tasks on it, and then delete it when you're done with the cluster.
1) Storage Account
First make sure you have an Azure account with a storage account in it. Besides the name, the location is also important: you need to use the same location in the Azure HDInsight Create Cluster Task. Mine is called SSISJoost.
[Image: Storage Account]
2) Access Keys
Click on the Manage Access Keys icon to get the primary access key. We need this for the Azure Storage Connection Manager.
[Image: Storage Account Access Keys]
3) Connection Manager for Azure Storage
If you haven't already installed the SSIS Feature Pack for Microsoft Azure, now is the time to do that. Create a new Connection Manager for Azure Storage by right-clicking the Connection Managers pane. Then choose New Connection... and then AzureStorage. Fill in the Storage account name from step 1 and the Access key from step 2. Test the connection and click OK to save the new Connection Manager.
[Image: SSIS Connection Manager for Azure Storage]
4) Makecert.exe (Certificate Creation Tool)
Open the Visual Studio Command Prompt to create a new certificate with Makecert.exe. The command is as follows (replace both occurrences of SSISJoostCertificate with your own name):
makecert -sky exchange -r -n "CN=SSISJoostCertificate" -pe -a sha1 -len 2048 -ss My "SSISJoostCertificate.cer"
[Image: Makecert.exe]
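Because the certificate name appears twice in the command, it is easy to change one occurrence and forget the other. The following is a minimal sketch, not part of the original walkthrough, that builds the same argument list from a single name. The function name build_makecert_command is hypothetical; the flags are copied from the command above.

```python
from typing import List


def build_makecert_command(cert_name: str) -> List[str]:
    """Build the makecert argument list from step 4 for one certificate name.

    Hypothetical helper: it only assembles arguments, it does not run anything.
    """
    if not cert_name.strip() or '"' in cert_name:
        raise ValueError("certificate name must be non-empty and contain no quotes")
    return [
        "makecert",
        "-sky", "exchange",       # subject key type: exchange
        "-r",                     # create a self-signed certificate
        "-n", f"CN={cert_name}",  # subject name (first use of the name)
        "-pe",                    # mark the private key as exportable
        "-a", "sha1",             # signature algorithm
        "-len", "2048",           # key length in bits
        "-ss", "My",              # certificate store that receives the certificate
        f"{cert_name}.cer",       # output file (second use of the name)
    ]


if __name__ == "__main__":
    # On a machine with makecert on the PATH, this list could be handed to
    # subprocess.run(command, check=True); here it is only printed.
    command = build_makecert_command("SSISJoostCertificate")
    print(" ".join(command))
```

Passing the arguments as a list avoids having to quote the CN value by hand when the command is eventually executed from a script.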
5) Azure Subscription ID
Go to manage.windowsazure.com and locate your Subscription ID under settings. You need this in one of the next steps.
[Image: Locate Subscription ID]
6) Upload Certificate
Go to manage.windowsazure.com, then to Settings (1), and then to Management certificates (2). Upload (3) the .cer file created in step 4 and note the thumbprint (4).
[Image: Management certificates]
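To double-check the thumbprint without opening the portal, it can be computed locally: a certificate thumbprint is the SHA-1 hash of the certificate's binary (DER) encoding. The sketch below assumes the .cer file from step 4 is DER-encoded, which is what makecert normally writes; a Base64 (PEM) encoded file would have to be decoded first. The function names are hypothetical.

```python
import hashlib


def certificate_thumbprint(der_bytes: bytes) -> str:
    """Return the SHA-1 thumbprint (uppercase hex) of DER-encoded certificate bytes."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def thumbprint_of_file(path: str) -> str:
    """Read a DER-encoded .cer file and return its thumbprint.

    Assumption: the file is binary DER, not Base64/PEM text.
    """
    with open(path, "rb") as cer_file:
        return certificate_thumbprint(cer_file.read())


if __name__ == "__main__":
    import sys

    # Usage: python thumbprint.py SSISJoostCertificate.cer
    if len(sys.argv) == 2:
        print(thumbprint_of_file(sys.argv[1]))
```

The result should match the value shown under Management certificates, ignoring any spaces the portal or certificate viewer may insert.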
7) Azure Subscription Connection Manager
Create a new Azure Subscription Connection Manager by right-clicking the Connection Managers pane. Then choose New Connection... and then AzureSubscription. Fill in the Azure Subscription ID from step 5 and browse to find your certificate. The thumbprint should be the same as in step 6. Test the connection and click OK to save the new Connection Manager.
[Image: Azure Subscription]
8) Azure HDInsight Create Cluster Task
Add the Azure HDInsight Create Cluster Task to the surface of the control flow. Give it a suitable name and then edit the task. Under connections you must select the two newly created connection managers (step 3 and 7). And then change the General properties:
ClusterName: the name of your cluster
ClusterSizeInNodes: the number of nodes (be careful or you get a high invoice if you choose a high number)
StorageContainerName: specify a container from your storage account to store the data in
UserName: specify a new username
Password: specify a new password
Location: choose the same location as your storage account (see step 1)
FailIfExists: specify whether the task should fail if the cluster already exists
[Image: Azure HDInsight Create Cluster Task]
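The two properties that are easiest to get wrong are Location (it must match the storage account) and ClusterSizeInNodes (it drives the invoice). As an illustration only, the sketch below checks a set of values before they are typed into the task. Everything here is hypothetical: the class, the function, and the max_nodes budget guard are not part of SSIS or Azure, and the field names merely mirror the properties listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSettings:
    """Mirrors the General properties of the Create Cluster Task (step 8)."""
    cluster_name: str
    cluster_size_in_nodes: int
    storage_container_name: str
    user_name: str
    password: str
    location: str
    fail_if_exists: bool = True


def check_settings(settings: ClusterSettings,
                   storage_account_location: str,
                   max_nodes: int = 4) -> List[str]:
    """Return a list of problems; an empty list means none were found.

    max_nodes is a self-imposed budget guard, not an Azure limit.
    """
    problems = []
    for field_name in ("cluster_name", "storage_container_name",
                       "user_name", "password"):
        if not getattr(settings, field_name).strip():
            problems.append(f"{field_name} is empty")
    if settings.cluster_size_in_nodes < 1:
        problems.append("cluster_size_in_nodes must be at least 1")
    elif settings.cluster_size_in_nodes > max_nodes:
        problems.append(
            f"cluster_size_in_nodes exceeds the budget guard of {max_nodes}")
    # Compare locations case-insensitively; step 1 requires them to match.
    if settings.location.strip().lower() != storage_account_location.strip().lower():
        problems.append("location differs from the storage account location")
    return problems


if __name__ == "__main__":
    settings = ClusterSettings("ssisjoost", 1, "data", "admin", "secret",
                               "North Europe")
    for problem in check_settings(settings, "North Europe"):
        print(problem)
```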
9) Test
Run the task and check whether an HDInsight cluster is created. It could take a while (a single node in North Europe took about 20 minutes). When it is ready you can run a Hive or Pig Task on this cluster.
[Image: HDInsight Cluster]
10) Delete cluster
When your Pig and/or Hive Tasks are finished, you can delete the cluster with the Azure HDInsight Delete Cluster Task. Drag it to the control flow surface and give it a suitable name. Edit it and select the Azure Subscription Connection Manager from step 7. ClusterName is the name of the cluster you want to delete (the same name as in step 8), and FailIfNotExists indicates whether the task should fail if the cluster does not exist, for example because it was already deleted. Now run the task to delete the cluster. This should be a lot faster than creating a new cluster.
[Image: Azure HDInsight Delete Cluster Task]
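The FailIfExists and FailIfNotExists switches, and the create, work, delete order of the package, can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the tasks' real implementation: FakeSubscription and all functions are hypothetical, nothing talks to Azure, and it assumes that with a switch set to false the task simply succeeds without doing anything.

```python
from typing import Callable, Set


class FakeSubscription:
    """In-memory stand-in for an Azure subscription (hypothetical)."""

    def __init__(self) -> None:
        self.clusters: Set[str] = set()


def create_cluster(sub: FakeSubscription, name: str, fail_if_exists: bool) -> bool:
    """Return True if a cluster was created, False if it already existed."""
    if name in sub.clusters:
        if fail_if_exists:
            raise RuntimeError(f"cluster {name!r} already exists")
        return False  # assumed behaviour: succeed without creating anything
    sub.clusters.add(name)
    return True


def delete_cluster(sub: FakeSubscription, name: str, fail_if_not_exists: bool) -> bool:
    """Return True if a cluster was deleted, False if there was none."""
    if name not in sub.clusters:
        if fail_if_not_exists:
            raise RuntimeError(f"cluster {name!r} does not exist")
        return False  # assumed behaviour: succeed without deleting anything
    sub.clusters.remove(name)
    return True


def run_with_cluster(sub: FakeSubscription, name: str,
                     work: Callable[[], None]) -> None:
    """Create the cluster, run the work, and always delete the cluster."""
    create_cluster(sub, name, fail_if_exists=False)
    try:
        work()  # stands in for the Hive and/or Pig tasks
    finally:
        # Delete even when the work failed, so the cluster stops costing money.
        delete_cluster(sub, name, fail_if_not_exists=False)


if __name__ == "__main__":
    subscription = FakeSubscription()
    run_with_cluster(subscription, "ssisjoost", lambda: print("running Hive/Pig"))
    print(sorted(subscription.clusters))
```

In a package, the try/finally part roughly corresponds to connecting the delete task with a precedence constraint that fires on completion rather than only on success, so a failed Hive or Pig task does not leave the cluster running.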
Hi Joost, great write-up as usual. Do you know if there's any way to modify the configuration options of the HDInsight cluster using this component? It creates 2 Head Nodes and n Worker nodes, all on A3 instances by default.
I'm wondering if you know of a way to play around with this config, say specify 2 Head Nodes as D12 instances and n Workers as D3's, for example?
I think you will have to use some .NET or PowerShell script for that. The out of the box task doesn't have such options.