Monday, 6 August 2012

Validating XML file against XSD with SSIS - part II Script Task

Case
I have an XML file which I want to validate against an XSD file before processing it. However my XSD contains an include (or import) tag and the XML Task will not validate it and throws an error about certain parts that are not declared.

How can I validate an XML file against an XSD with an include (or import) tag?

Solution
Although the XML source component honors includes (and imports), the XML task doesn't. See(/vote for) this Microsoft Connect feedback.

You can use a Script Task to validate your XML file against the XSD file. For this example I use three files: movies.xmlmovies2.xsd and movies2_include.xsd.
<?xml version="1.0"?>
<movies>
 <movie id="123">
  <title>Pulp Fiction</title>
  <year>1994</year>
  <rating>8.9</rating>
 </movie>
 <movie id="124">
  <title>Memento</title>
  <year>2000</year>
  <rating>8.6</rating>
 </movie>
 <movie id="124">
  <title>The Usual Suspects</title>
  <year>2000</year>
  <rating>8.6</rating>
 </movie>
</movies>

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include schemaLocation="movies2_include.xsd"/>

  <xs:element name="movies">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" name="movie">
          <xs:complexType>
            <xs:sequence>
              <xs:element minOccurs="0" name="title" type="LimitedString" />
              <xs:element minOccurs="0" name="year" type="xs:unsignedShort" />
              <xs:element minOccurs="0" name="rating" type="xs:decimal" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:unsignedByte" use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="LimitedString"> 
    <xs:restriction base="xs:string"> 
      <xs:maxLength value="50" /> 
    </xs:restriction> 
  </xs:simpleType> 
</xs:schema>

1) New File Connection
I will use Connection Managers in this example to make things easier to configure. Feel free to use something else such as variables. Right click in the Connection Managers pane and select "New File Connection...".
New File Connection for XML and XSD file






















Select the xml file in the File Connection Manager Editor and repeat these steps for the XSD file (movies2.xsd) so that you have two Connection Managers named movies.xml and movies2.xsd. You don't have to create a Connection Manager for the include xsd, because it's included by code.
File Connection Manager Editor













2) Script Task
Add an Script Task to your Control Flow and give it a suitable name. You can connect it to your Data Flow Task that processes the XML file.
Script Task to validate XML






















3) The script
Edit the Script Task and choose your Script Language, hit the edit button to edit the script and copy the code from the main method below to your main method and add the import/include row.
// C# Code
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Xml;   // Added
#endregion

namespace ST_d2a4d346c17d488da0cd327617293c52
{
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        public void Main()
        {
            // create object for the XSD file that will be used
            // for validating the XML file. Use the Connection
            // Manager to get the path from the XSD file.
            XmlReaderSettings xmlrs = new XmlReaderSettings();
            xmlrs.ValidationType = ValidationType.Schema;
            xmlrs.Schemas.Add(null, Dts.Connections["movies2.xsd"].ConnectionString);
            xmlrs.Schemas.Compile();

            // Try reading the XML file using the XSD. Use the
            // Connection Manager to get the path from the XML file
            try
            {
                XmlReader xmlr = XmlReader.Create(Dts.Connections["movies.xml"].ConnectionString, xmlrs);
                while (xmlr.Read())
                {
                }
                // The XML file was succesfully read.
                
                // Close XML file
                xmlr.Close();

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                // Validation failed, fire error event with errormessage
                Dts.Events.FireError(-1, "Validate XML", "Validation error: " + ex.Message, string.Empty, 0);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

        #region ScriptResults declaration
        /// 
        /// This enum provides a convenient shorthand within the scope of this class for setting the
        /// result of the script.
        /// 
        /// This code was generated automatically.
        /// 
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
    }
}
or VB.net
' VB.Net Code
#Region "Imports"
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Xml      ' Added
#End Region

<Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute()> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

    Public Sub Main()
        ' create object for the XSD file that will be used
        ' for validating the XML file. Use the Connection
        ' Manager to get the path from the XSD file.
        Dim xmlrs As New XmlReaderSettings()
        xmlrs.ValidationType = ValidationType.Schema
        xmlrs.Schemas.Add(Nothing, Dts.Connections("movies2.xsd").ConnectionString)
        xmlrs.Schemas.Compile()

        ' Try reading the XML file using the XSD. Use the
        ' Connection Manager to get the path from the XML file
        Try
            Dim xmlr As XmlReader = XmlReader.Create(Dts.Connections("movies.xml").ConnectionString, xmlrs)
            While xmlr.Read()
            End While
            ' The XML file was succesfully read.

            ' Close XML file
            xmlr.Close()

            Dts.TaskResult = ScriptResults.Success
        Catch ex As Exception
            ' Validation failed, fire error event with errormessage
            Dts.Events.FireError(-1, "Validate XML", "Validation error: " + ex.Message, String.Empty, 0)
            Dts.TaskResult = ScriptResults.Failure
        End Try
    End Sub

#Region "ScriptResults declaration"
    'This enum provides a convenient shorthand within the scope of this class for setting the
    'result of the script.

    'This code was generated automatically.
    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum

#End Region

End Class

Note: For those using SSIS 2005: use the taskresult row from 2005 that can be found in your main method and replace the followin lines: C# line 34 & 40 / VB.Net line 31 & 35.


4) Testing
Run the package. Change something in the XML file to make it fail (a movie title longer than 50 chars).
Test result










6 comments:

  1. Thank you So much. It has really helped me.

    ReplyDelete
  2. Thank you very much! Your script helped me a lot.
    Best wishes :)

    ReplyDelete
  3. Hello,

    How can you validate the XML file before validating it against the structure defined in the XSD??
    I've a file containing HTML instead of xml and the package identifies it correctly as not valid regarding the XSD, however, the failure fails also the complete package....
    As I have more than 300K xml files to parse, I don't want to restart the job everytime such a content is encountered.

    Many thanks for your reactions

    ReplyDelete
    Replies
    1. It's best to ask this question in the SSIS forum. More people see your question and it makes it easier to reply.

      Do you use a Foreach Loop to handle all your files? You can ignore errors within the loop and continue looping.

      Delete
  4. Good Stuff.. Thanks a lot.. it resolved my issue...

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.