Thursday, 1 March 2018

Azure Blob Source ≠ Flat File Source

Case
I'm running my SSIS packages in Azure and my source is a flat file in an Azure Blob Storage container. Therefor I use the Azure Blob Source as a source in my Data Flow Task. However this source has just a few formatting options compared to the Flat File Source (and its connection manager). I want to specify things like qualifiers and data types. How do I do that?
Azure Blob Source has too few options


























Solution
With the current version of the Azure Blob Source you can only specify the column separator, but there is a workaround available. Your Integration Runtime (IR) that is hosted in ADF is actually a virtual machine with Integration Services on it. A simple Script Task running on that IR reveals the drives and their available space. It shows that we have several drives available on that Virtual Machine.
Log with drive details











// C# Code to replace your Main() method
public void Main()
{
    // Create array with drive information
    System.IO.DriveInfo[] allDrives = System.IO.DriveInfo.GetDrives();

    // Create string to store a message
    String DriveDetails = "";

    // Loop through all drives to get info about it
    foreach (System.IO.DriveInfo d in allDrives)
    {
        // Get drive letter (C:) and type (NTFS)
        DriveDetails = d.Name + "(" + d.DriveType + ")" + Environment.NewLine;

        // If drive is ready you can get more details
        if (d.IsReady == true)
        {
            DriveDetails += " - Volume label: " + d.VolumeLabel + Environment.NewLine;
            DriveDetails += " - File system: " + d.DriveFormat + Environment.NewLine;
            DriveDetails += " - Available space to current user: " + d.AvailableFreeSpace + Environment.NewLine;
            DriveDetails += " - Total available space: " + d.TotalFreeSpace + Environment.NewLine;
            DriveDetails += " - Total size of drive: " + d.TotalSize;
        }

        // Fire the message as warning to stand out between other messages
        Dts.Events.FireWarning(0, "Details", DriveDetails, "", 0);
    }

    // End Script Task
    Dts.TaskResult = (int)ScriptResults.Success;
}

So the solution is to first use the Azure Blob Download Task to download the file from the Blob Storage Container to the Virtual Machine. After that you can use a regular Flat File Source in the Data Flow Task.
Azure Blob Download Task



























I'm not sure what the purpose is of these disks and if one of them is for non-persistent data (disk that are automatically cleaned), but I recommend using the E drive to temporarily store the downloaded files and clean up afterwards.

Windows Temp folder
An alternative to pick a temporarily folder on your IR machine is to use a very simple Script Task with only one line of code that retrieves the path of the Windows temp folder. The path looks something like D:\Users\WATASK_1\AppData\Local\Temp\. If you store this path in an SSIS string variable, then you can use that for expressions on your tasks and Flat File connection manager. After the next reboot Windows removes all old files in this folder.
// C# code (see line 5)
public void Main()
{
    // TODO: Add your code here
    Dts.Variables["User::tempPath"].Value = System.IO.Path.GetTempPath();

    Dts.TaskResult = (int)ScriptResults.Success;
}

Script Task that fills a variable




No comments:

Post a Comment

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.