Monday, 6 August 2012

Validating XML file against XSD with SSIS - part II Script Task

Case
I have an XML file which I want to validate against an XSD file before processing it. However my XSD contains an include (or import) tag and the XML Task will not validate it and throws an error about certain parts that are not declared.

How can I validate an XML file against an XSD with an include (or import) tag?

Solution
Although the XML source component honors includes (and imports), the XML task doesn't. See(/vote for) this Microsoft Connect feedback.

You can use a Script Task to validate your XML file against the XSD file. For this example I use three files: movies.xmlmovies2.xsd and movies2_include.xsd.
<?xml version="1.0"?>
<movies>
 <movie id="123">
  <title>Pulp Fiction</title>
  <year>1994</year>
  <rating>8.9</rating>
 </movie>
 <movie id="124">
  <title>Memento</title>
  <year>2000</year>
  <rating>8.6</rating>
 </movie>
 <movie id="124">
  <title>The Usual Suspects</title>
  <year>2000</year>
  <rating>8.6</rating>
 </movie>
</movies>

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include schemaLocation="movies2_include.xsd"/>

  <xs:element name="movies">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" name="movie">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="title" type="LimitedString" />
              <xs:element minOccurs="0" name="year" type="xs:unsignedShort" />
              <xs:element minOccurs="0" name="rating" type="xs:decimal" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:unsignedByte" use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="LimitedString"> 
    <xs:restriction base="xs:string"> 
      <xs:maxLength value="50" /> 
    </xs:restriction> 
  </xs:simpleType> 
</xs:schema>

1) New File Connection
I will use Connection Managers in this example to make things easier to configure. Feel free to use something else such as variables. Right click in the Connection Managers pane and select "New File Connection...".
New File Connection for XML and XSD file






















Select the xml file in the File Connection Manager Editor and repeat these steps for the XSD file (movies2.xsd) so that you have two Connection Managers named movies.xml and movies2.xsd. You don't have to create a Connection Manager for the include xsd, because it's included by code.
File Connection Manager Editor













2) Script Task
Add an Script Task to your Control Flow and give it a suitable name. You can connect it to your Data Flow Task that processes the XML file.
Script Task to validate XML






















3) The script
Edit the Script Task and choose your Script Language, hit the edit button to edit the script and copy the code from the main method below to your main method and add the import/include row.
// C# Code
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Xml;   // Added
#endregion

namespace ST_d2a4d346c17d488da0cd327617293c52
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            // create object for the XSD file that will be used
            // for validating the XML file. Use the Connection
            // Manager to get the path from the XSD file.
            XmlReaderSettings xmlrs = new XmlReaderSettings();
            xmlrs.ValidationType = ValidationType.Schema;
            xmlrs.Schemas.Add(null, Dts.Connections["movies2.xsd"].ConnectionString);
            xmlrs.Schemas.Compile();

            // Try reading the XML file using the XSD. Use the
            // Connection Manager to get the path from the XML file
            try
            {
                XmlReader xmlr = XmlReader.Create(Dts.Connections["movies.xml"].ConnectionString, xmlrs);
                while (xmlr.Read())
                {
                }
                // The XML file was succesfully read.
                
                // Close XML file
                xmlr.Close();

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Validation failed, fire error event with errormessage
                Dts.Events.FireError(-1, "Validate XML", "Validation error: " + ex.Message, string.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

        #region ScriptResults declaration
        /// 
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// 
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
    }
}
or VB.net
' VB.Net Code
#Region "Imports"
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Xml      ' Added
#End Region

<Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute()> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Public Sub Main()
        ' create object for the XSD file that will be used
        ' for validating the XML file. Use the Connection
        ' Manager to get the path from the XSD file.
        Dim xmlrs As New XmlReaderSettings()
        xmlrs.ValidationType = ValidationType.Schema
        xmlrs.Schemas.Add(Nothing, Dts.Connections("movies2.xsd").ConnectionString)
        xmlrs.Schemas.Compile()

        ' Try reading the XML file using the XSD. Use the
        ' Connection Manager to get the path from the XML file
        Try
            Dim xmlr As XmlReader = XmlReader.Create(Dts.Connections("movies.xml").ConnectionString, xmlrs)
            While xmlr.Read()
            End While
            ' The XML file was succesfully read.

            ' Close XML file
            xmlr.Close()

            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            ' Validation failed, fire error event with errormessage
            Dts.Events.FireError(-1, "Validate XML", "Validation error: " + ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub

#Region "ScriptResults declaration"
    'This enum provides a convenient shorthand within the scope of this class for setting the
    'result of the script.

    'This code was generated automatically.
    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum

#End Region

End Class

Note: For those using SSIS 2005: use the taskresult row from 2005 that can be found in your main method and replace the followin lines: C# line 34 & 40 / VB.Net line 31 & 35.


4) Testing
Run the package. Change something in the XML file to make it fail (a movie title longer than 50 chars).
Test result










Sunday, 5 August 2012

Validating XML file against XSD with SSIS - part I

Case
I have an XML file which I want to validate before processing it. How can I validate an XML file against an XSD file with SSIS?

Solution
There are two solutions in SSIS for validating an XML file.
A) XML Task, but the big downside is that it can't validate against XSD files with import or include tags. See(/vote for) this Microsoft Connect feedback.
B) Script Task, but that requires some .Net knowledge.

In this post I will elaborate the XML Task solution and in the next post I will workout the Script Task solution.

XML Task
For this example I use two files: movies.xml and movies.xsd
<?xml version="1.0"?>
<movies>
 <movie id="123">
  <title>Pulp Fiction</title>
  <year>1994</year>
  <rating>8.9</rating>
 </movie>
 <movie id="124">
  <title>Memento</title>
  <year>2000</year>
  <rating>8.6</rating>
 </movie>
 <movie id="124">
  <title>The Usual Suspects</title>
  <year>2000</year>
  <rating>8.6</rating>
 </movie>
</movies>

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="LimitedString"> 
    <xs:restriction base="xs:string"> 
      <xs:maxLength value="50" /> 
    </xs:restriction> 
  </xs:simpleType> 

  <xs:element name="movies">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" name="movie">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="title" type="LimitedString" />
              <xs:element minOccurs="0" name="year" type="xs:unsignedShort" />
              <xs:element minOccurs="0" name="rating" type="xs:decimal" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:unsignedByte" use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

1) New File Connection
I will use Connection Managers in this example to make things easier to configure. Feel free to use something else such as variables. Right click in the Connection Managers pane and select "New File Connection...".
New File Connection for XML and XSD file






















Select the xml file in the File Connection Manager Editor and repeat these steps for the XSD file so that you have two Connection Managers named movies.xml and movies.xsd
File Connection Manager Editor













2) XML Task
Add an XML Task to your Control Flow and give it a suitable name. You can connect it to your Data Flow Task that processes the XML file.
XML Task






















3) XML Task editor
Edit the XML Task and select the XML Connection Manager as input and the XSD Connection Manager as Second Operand. OperationType should be Validate and ValidationType should be XSD.
Set XML Task editor for validation






















4) Testing
Run the package. Change something in the XML file to make it fail (a movie title longer than 50 chars).
The result

















Note: If you try to validate against an XSD with an include or import, you will get an error that certain parts are not declared. The next post will be about validating an XML file against an XSD file with a Script Task.

Friday, 3 August 2012

SSMS: Prevent saving changes that require table re-creation

Case
When changing a column in a table I get this message preventing me to save the changes:
Saving changes is not permitted. The changes you have made require the
following tables to be dropped and recreated. You have either made
changes to a table that can't be re-created or enabled the option
Prevent saving changes that require the table to be re-created.


























This question got nothing to do with SSIS self, but editing tables is a common task for SSIS developers. And I'm always browsing a couple of minutes to find the right option to disable.


Solution
1) Menu Tools
Go to the Tools menu and select "Options..."
Tools


















2) Options, Designers
Go to Designers and disable Prevent saving changes that require table re-creation.
Designers

Wednesday, 1 August 2012

Loop through all connection managers

Case
I want log all my connection managers, delete all my excel files, check whether my flat files exists. And I don't want to add task for all of them.

Solution
You can use a Script Task to loop through the collection of Connection Managers and do something with the connection manager (log, delete, check, etc.). Either do the task in .Net code or fill a SSIS object variable which can be used in Foreach From Variable Enumerator. I will show you both.

Note: these solutions don't work for dynamic Connection Managers. For example with a foreach loop and expressions on the connectionstring.


A) Script Task only solution
Add a Script Task to the Control Flow and edit it. Copy the code of this main method to your main method. There are a couple of examples so adjust it to your own needs.
// C# Code
// This isn't a complete solution.
// There are a couple of examples.
// Adjust them to you own needs.    
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO; // Added
namespace ST_ae645b8974b54c7abd6d5058ded524b6.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Used for information events
            Boolean fireAgain = true;

  
            // Loop through all your package connections
            foreach (ConnectionManager connectionManager in Dts.Connections)
            {
                // Optional filter on TYPE. For example FLATFILE, OLEDB, EXCEL
                // Let's check if the flatfile exists
                if (connectionManager.CreationName.Equals("FLATFILE"))
                {
                    if (File.Exists(connectionManager.ConnectionString))
                    {
                        Dts.Events.FireInformation(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString + ") does exist", string.Empty, 0, ref fireAgain);
                    }
                    else
                    {
                        Dts.Events.FireError(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString + ") does not exist", string.Empty, 0);
                    }
                }


                // Optional filter on PREFIX. For example starting with "tmp"
                // Let's delete all tmp files.
                if (connectionManager.Name.StartsWith("tmp"))
                {
                    try
                    {
                        File.Delete(connectionManager.ConnectionString);
                        Dts.Events.FireInformation(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString + ") was deleted", string.Empty, 0, ref fireAgain);
                    }
                    catch (Exception Ex)
                    {
                        Dts.Events.FireError(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString + ") not deleted. Error: " + Ex.Message, string.Empty,0);
                    }
                }


                // Optional filter on TYPE.
                // Let's move all excel files to a certain folder.
                if (connectionManager.CreationName.Equals("EXCEL"))
                {
                    // Excel connection string contains a lot more then just the file path. We need to extract the ExcelFilePath property
                    // Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
                    string ExcelFilePath = connectionManager.Properties["ExcelFilePath"].GetValue(connectionManager).ToString();
                    
                    FileInfo myExcelFile = new FileInfo(ExcelFilePath);
                    File.Move(ExcelFilePath, @"D:\MyExcelFiles\Archive\" + myExcelFile.Name);
                }


                // Just log all connection managers
                Dts.Events.FireInformation(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString + ")", string.Empty, 0, ref fireAgain);
            }

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

or with VB.Net
' VB.Net code
' This isn't a complete solution.
' There are a couple of examples.
' Adjust them to you own needs.  
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO ' Added

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum

    Public Sub Main()
        ' Used for information events
        Dim fireAgain As [Boolean] = True


        ' Loop through all your package connections
        For Each connectionManager As ConnectionManager In Dts.Connections
            ' Optional filter on TYPE. For example FLATFILE, OLEDB, EXCEL
            ' Let's check if the flatfile exists
            If connectionManager.CreationName.Equals("FLATFILE") Then
                If File.Exists(connectionManager.ConnectionString) Then
                    Dts.Events.FireInformation(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString & ") does exist", String.Empty, 0, fireAgain)
                Else
                    Dts.Events.FireError(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString & ") does not exist", String.Empty, 0)
                End If
            End If


            ' Optional filter on PREFIX. For example starting with "tmp"
            ' Let's delete all tmp files.
            If connectionManager.Name.StartsWith("tmp") Then
                Try
                    File.Delete(connectionManager.ConnectionString)
                    Dts.Events.FireInformation(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString & ") was deleted", String.Empty, 0, fireAgain)
                Catch Ex As Exception
                    Dts.Events.FireError(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString & ") not deleted. Error: " & Ex.Message, String.Empty, 0)
                End Try
            End If


            ' Optional filter on TYPE.
            ' Let's move all excel files to a certain folder.
            If connectionManager.CreationName.Equals("EXCEL") Then
                ' Excel connection string contains a lot more then just the file path. We need to extract the ExcelFilePath property
                ' Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
                Dim ExcelFilePath As String = connectionManager.Properties("ExcelFilePath").GetValue(connectionManager).ToString()

                Dim myExcelFile As New FileInfo(ExcelFilePath)
                File.Move(ExcelFilePath, "D:\MyExcelFiles\Archive\" & myExcelFile.Name)
            End If


            ' Just log all connection managers
            Dts.Events.FireInformation(-1, "Connection Manager Loop", connectionManager.Name + " (" + connectionManager.ConnectionString & ")", String.Empty, 0, fireAgain)
        Next

        Dts.TaskResult = ScriptResults.Success
    End Sub
End Class

B) Foreach Connection Manager Enumerator

1) Variables
Add an Object variable and name it connectionManagers and add a String variable named connectionString.
Variables










2) Script Task
Add a Script Task to the Control Flow and give it a suitable name.
Script Task
















3) ReadWriteVariable
Add the object variable from step 1 as ReadWriteVariable.
ReadWriteVariable
























4) The Script
Edit the script and copy the code from the main method to your main method.
// C# Code
// This isn't a complete solution.
// There are a couple of examples.
// Adjust them to you own needs.    
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_ae645b8974b54c7abd6d5058ded524b6.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // Create array list for storing the connection managers
            System.Collections.ArrayList connectionManagers = new System.Collections.ArrayList();

            // Loop through all your package connections
            foreach (ConnectionManager connectionManager in Dts.Connections)
            {
                // Optional filter on TYPE. For example FLATFILE, OLEDB, EXCEL
                // Let's check if the flatfile exists
                if (connectionManager.CreationName.Equals("FLATFILE"))
                {
                    // Add item to array list
                    connectionManagers.Add(connectionManager.ConnectionString);
                }


                // Optional filter on TYPE.
                // Let's move all excel files to a certain folder.
                if (connectionManager.CreationName.Equals("EXCEL"))
                {
                    // Excel connection string contains a lot more then just the file path. We need to extract the ExcelFilePath property
                    // Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
                    string ExcelFilePath = connectionManager.Properties["ExcelFilePath"].GetValue(connectionManager).ToString();

                    // Add item to array list
                    connectionManagers.Add(ExcelFilePath);
                }
            }

            // Fill object variable with array list
            Dts.Variables["User::connectionManagers"].Value = connectionManagers;

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

or VB.Net

' VB.Net code
' This isn't a complete solution.
' There are a couple of examples.
' Adjust them to you own needs.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        ' Create array list for storing the connection managers
        Dim connectionManagers As New System.Collections.ArrayList()

        ' Loop through all your package connections
        For Each connectionManager As ConnectionManager In Dts.Connections
            ' Optional filter on TYPE. For example FLATFILE, OLEDB, EXCEL
            ' Let's check if the flatfile exists
            If connectionManager.CreationName.Equals("FLATFILE") Then
                ' Add item to array list
                connectionManagers.Add(connectionManager.ConnectionString)
            End If


            ' Optional filter on TYPE.
            ' Let's move all excel files to a certain folder.
            If connectionManager.CreationName.Equals("EXCEL") Then
                ' Excel connection string contains a lot more then just the file path. We need to extract the ExcelFilePath property
                ' Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
                Dim ExcelFilePath As String = connectionManager.Properties("ExcelFilePath").GetValue(connectionManager).ToString()

                ' Add item to array list
                connectionManagers.Add(ExcelFilePath)
            End If
        Next

        ' Fill object variable with array list
        Dts.Variables("User::connectionManagers").Value = connectionManagers

        Dts.TaskResult = ScriptResults.Success
    End Sub
End Class

5) Add Foreach Loop
Add a foreach loop and choose Foreach From Variable Enumerator. Select the Object variable from step 1 as the variable to loop through.
Foreach From Variable Enumerator




























6) Variable Mapping
Go to the Variable Mappings tab and add the String variable from step 1.
Variable Mappings






















7) The Result
I added a Script Task with a Messagebox to test the loop.
The result
Related Posts Plugin for WordPress, Blogger...