Tuesday, 4 January 2011

Unzip files with SSIS

Case
My source files are zipped and I want to unzip them within the SSIS package.

Solution
There are a lot of options to unzip files within SSIS. For example:
* Update: my own unzip task *
 
 
 
 

This solution uses the Microsoft Visual J# Redistributable Packages (don't worry you don't have to write J#), because it's free and you don't have use a third party or opensource dll on a production server which is often a big deal.

1) Visual J# Redistributable Packages
Install Microsoft Visual J# Redistributable Packages on your machine(s). You can download the correct version (x84, x64, IA64) at Microsoft.

2) Variables
Create two String variables named ZipFile and UnZipFolder to store the zip filename and the unzip folder.
Two String variables









3) Loop through folder with zip files
Create a foreach loop that loops through a folder and looks for zipfiles.
Foreach (zip)file enumerator



















4) Map variable
Map the ZipFile variable to index 0 in the Foreach Loop.
Variable Mappings



















5) Script task with variables
Create a Script task within the Foreach loop and select the two variables as ReadOnly.
ReadOnlyVariables


















6) Unzip with Script task
Add a reference to the Microsoft Visual J# Redistributable Packages dll in your script. The name of the .NET dll is vjslib. After you have added it, it shows on the right site (3).
In Project menu, Add Reference...
















After that add the following code (method and usings)
// C# code
using System;
using System.Data;
using System.IO;        // Added
using java.util.zip;    // Added
using java.io;          // Added
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_2651c39f95ad4031897db646b9d61275.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion


        public void Main()
        {
            // Variables for zipfile and unzip folder
            String zipfilePath;
            zipfilePath = Dts.Variables["ZipFile"].Value.ToString();
            String unZipFolderPath;
            unZipFolderPath = Dts.Variables["UnZipFolder"].Value.ToString();

            // Two variables needed for unzipping
            sbyte[] buf = new sbyte[1024];
            int len;

            // Filestreams for the zipfile
            FileInputStream fis = new FileInputStream(zipfilePath);
            ZipInputStream zis = new ZipInputStream(fis);
            ZipEntry ze;

            // Loop through all files in the zipfile
            while ((ze = zis.getNextEntry()) != null)
            {
                // File name format in zip file is:
                // folder/subfolder/filename
                string fileName = ze.getName();

                // Get the filename out of the complete path
                int index = fileName.LastIndexOf('/');
                // If file
                if (index > 1)
                {
                    // Determine directory
                    string folder = fileName.Substring(0, index);
                    DirectoryInfo di = new DirectoryInfo(unZipFolderPath + folder);

                    // Create directory in UnZipFolder if not exists
                    if (!di.Exists)
                    {
                        di.Create();
                    }
                }
                // Extract file
                java.io.FileOutputStream fos = new java.io.FileOutputStream(unZipFolderPath + fileName);
                while ((len = zis.read(buf)) >= 0)
                {
                    fos.write(buf, 0, len);
                }
                fos.close();
            }
            zis.close();
            fis.close();

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

5) Continuing...
Now add your own tasks to continue the Control Flow. It could look something like this.
Loop through unzipped
files and do something



















Note: This is a very basic script. You can add tons of checks and other features to make sure the script always works. Let me known if you have any questions. Zipping of files will be handled in another article.

Download the SSIS 2008 example package.

24 comments:

  1. Thank you for the post. That's brilliant.

    One more question that how can I know if unzip process is done and make the next control-flow step start as a natural followup?

    ReplyDelete
  2. @Anonymous: The unzip process is finished when the Script Task is finnished. So just connect the next task with a Precedence Constraints.

    ReplyDelete
  3. Thank you for the reply.

    You're right. I added one variable inside my SSIS package to double confirm:
    1. unzip is done on one hand
    2. the file is ready there for the next step to pick up.

    ReplyDelete
  4. Hey mate, thanks for your solution, works almost perfectly! My only problem is, that I should unzip files with a size of 50MB+, and I get the error below...

    Do you have any ideas, how to solve this problem? With smaller Zips, everything works like a charm...

    Error: System.Reflection.TargetInvocationException: Ein Aufrufziel hat einen Ausnahmefehler verursacht.
    ---> java.io.FileNotFoundException: Die Dateigröße überschreitet die maximal zulässige Größe und kann nicht gespeichert werden.

    --- Ende der internen Ausnahmestapelüberwachung ---
    bei System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
    bei System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
    bei System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
    bei System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
    bei System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
    bei System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args, CultureInfo culture)
    bei Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

    ReplyDelete
  5. @Anonymous:
    I can't find a maximum size in the documentation of J# java.io. Could you debug the Script Task or add some error handling (try catch) to see where it fails?

    ReplyDelete
  6. @Joost

    Thanks for your input!

    The problem was a limitation of WebDav... After changing the limits on the registry keys, everything works finde now with your script...

    Thanks!

    http://www.deploymentzone.com/2008/08/01/sharepoint-webdav-and-http-not-https/

    ReplyDelete
  7. I am using the same code above but the folder is creating but the content is not copying it take the filename as the zipfilename but it should take the filename inside the zip file.
    Please let me wat to do?

    ReplyDelete
  8. @Gurushankar: Not sure what you could have done wrong. I have added the example package so you can download it. Let me know if it worked.

    ReplyDelete
    Replies
    1. Do you have a version that works with SSIS 2012?

      Delete
    2. Did you find a suitable Visual J# Redistributable Packages for your O.S.? Which O.S. are you using?

      Delete
  9. This works well in 2008, but does not work in 2012. I get the following DTS Script Task Runtime Error:

    at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
    at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
    at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
    at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

    ReplyDelete
    Replies
    1. Did you find a suitable Visual J# Redistributable Packages for your O.S.? Which O.S. are you using?

      Delete
    2. Sorry, I replied to the wrong user. I am using SSIS in SQL Server 2012. I am using Windows 7 as my development environment, but for production, it will be Windows 2012. I have searched for a Visual J# Redistributable Packages for my O.S. The version that I have installed is version 2.0. In SSIS when I go to add the reference in the script task, I do not see it under the .Net tab. Thanks. I do not know if I have installed the correct version for my O.S.
      Thanks,
      Natasha

      Delete
    3. Not gonna work for 2012. Try one of the third party zip tasks (I created one myself) or use one of the .NET Zip libs.

      Other alternatives are the execute process task.

      Delete
    4. Thanks for your help. I will use your zip task.

      Delete
  10. after i'm download your package, and i execute that package in SSIS i have error like this bellow, can you help me

    Error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.IO.FileNotFoundException: Could not load file or assembly 'vjslib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.
    File name: 'vjslib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
    at ST_2651c39f95ad4031897db646b9d61275.csproj.ScriptMain.Main()

    WRN: Assembly binding logging is turned OFF.
    To enable assembly bind failure logging, set the registry value [HKLM\Software\Microsoft\Fusion!EnableLog] (DWORD) to 1.
    Note: There is some performance penalty associated with assembly bind failure logging.
    To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog].

    --- End of inner exception stack trace ---
    at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
    at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
    at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

    ReplyDelete
    Replies
    1. Which version of SSIS are you using? And on which operating system is SSIS installed? And did you find a suitable Visual J# Redistributable Packages for your O.S.?

      Delete
    2. what versions would you recommend us to implement the above pacakge. thanks.

      Delete
    3. Visual J# Redistributable Packages is getting a bit outdated... could getting it to work on newer O.S. versions. Try the free zip task or one of the other solutions.

      Delete
  11. Hi Joost,

    Thank you for the great post. But can you fix the link of Script task with System.IO.Compression (works only for gzip)?

    I really need to do unzipping of gzip format and I am kinda stuck. Also appreciated if any other posts and solutions to gzip you might know.

    ReplyDelete
    Replies
    1. I refreshed the URL... I haven't yet studied the gzip possibilities, but I get more request about it lately. So maybe I will create a custom task for it. (perhaps January if I have time)

      Delete
    2. Thank you Joost. Yes I also noticed that. I managed to do it with simple method using IO.Compression. But it will still be a good extenstion to your custom task :)

      Enjoy.

      Delete
  12. great post, thanks a lot.

    ReplyDelete
  13. With not having dot NET 4.5 installed this is a brilliant and easy solution. Took 5 minutes to install the package and add the code. Ran perfectly first time. Thanks.

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.