I have an XML file which I want to validate against an XSD file before processing it. However my XSD contains an include (or import) tag and the XML Task will not validate it and throws an error about certain parts that are not declared.
How can I validate an XML file against an XSD with an include (or import) tag?
Solution
Although the XML source component honors includes (and imports), the XML task doesn't. See(/vote for) this Microsoft Connect feedback.
You can use a Script Task to validate your XML file against the XSD file. For this example I use three files: movies.xml, movies2.xsd and movies2_include.xsd.
<?xml version="1.0"?> <movies> <movie id="123"> <title>Pulp Fiction</title> <year>1994</year> <rating>8.9</rating> </movie> <movie id="124"> <title>Memento</title> <year>2000</year> <rating>8.6</rating> </movie> <movie id="124"> <title>The Usual Suspects</title> <year>2000</year> <rating>8.6</rating> </movie> </movies>
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:include schemaLocation="movies2_include.xsd"/> <xs:element name="movies"> <xs:complexType> <xs:sequence> <xs:element minOccurs="0" maxOccurs="unbounded" name="movie"> <xs:complexType> <xs:sequence> <xs:element minOccurs="0" name="title" type="LimitedString" /> <xs:element minOccurs="0" name="year" type="xs:unsignedShort" /> <xs:element minOccurs="0" name="rating" type="xs:decimal" /> </xs:sequence> <xs:attribute name="id" type="xs:unsignedByte" use="optional" /> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:simpleType name="LimitedString"> <xs:restriction base="xs:string"> <xs:maxLength value="50" /> </xs:restriction> </xs:simpleType> </xs:schema>
1) New File Connection
I will use Connection Managers in this example to make things easier to configure. Feel free to use something else such as variables. Right click in the Connection Managers pane and select "New File Connection...".
New File Connection for XML and XSD file |
Select the xml file in the File Connection Manager Editor and repeat these steps for the XSD file (movies2.xsd) so that you have two Connection Managers named movies.xml and movies2.xsd. You don't have to create a Connection Manager for the include xsd, because it's included by code.
File Connection Manager Editor |
2) Script Task
Add an Script Task to your Control Flow and give it a suitable name. You can connect it to your Data Flow Task that processes the XML file.
Script Task to validate XML |
3) The script
Edit the Script Task and choose your Script Language, hit the edit button to edit the script and copy the code from the main method below to your main method and add the import/include row.
// C# Code #region Namespaces using System; using System.Data; using Microsoft.SqlServer.Dts.Runtime; using System.Windows.Forms; using System.Xml; // Added #endregion namespace ST_d2a4d346c17d488da0cd327617293c52 { [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute] public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase { public void Main() { // create object for the XSD file that will be used // for validating the XML file. Use the Connection // Manager to get the path from the XSD file. XmlReaderSettings xmlrs = new XmlReaderSettings(); xmlrs.ValidationType = ValidationType.Schema; xmlrs.Schemas.Add(null, Dts.Connections["movies2.xsd"].ConnectionString); xmlrs.Schemas.Compile(); // Try reading the XML file using the XSD. Use the // Connection Manager to get the path from the XML file try { XmlReader xmlr = XmlReader.Create(Dts.Connections["movies.xml"].ConnectionString, xmlrs); while (xmlr.Read()) { } // The XML file was succesfully read. // Close XML file xmlr.Close(); Dts.TaskResult = (int)ScriptResults.Success; } catch (Exception ex) { // Validation failed, fire error event with errormessage Dts.Events.FireError(-1, "Validate XML", "Validation error: " + ex.Message, string.Empty, 0); Dts.TaskResult = (int)ScriptResults.Failure; } } #region ScriptResults declaration ///or VB.net/// This enum provides a convenient shorthand within the scope of this class for setting the /// result of the script. /// /// This code was generated automatically. /// enum ScriptResults { Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success, Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure }; #endregion } }
' VB.Net Code #Region "Imports" Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Runtime Imports System.Xml ' Added #End Region <Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute()> _ <System.CLSCompliantAttribute(False)> _ Partial Public Class ScriptMain Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase Public Sub Main() ' create object for the XSD file that will be used ' for validating the XML file. Use the Connection ' Manager to get the path from the XSD file. Dim xmlrs As New XmlReaderSettings() xmlrs.ValidationType = ValidationType.Schema xmlrs.Schemas.Add(Nothing, Dts.Connections("movies2.xsd").ConnectionString) xmlrs.Schemas.Compile() ' Try reading the XML file using the XSD. Use the ' Connection Manager to get the path from the XML file Try Dim xmlr As XmlReader = XmlReader.Create(Dts.Connections("movies.xml").ConnectionString, xmlrs) While xmlr.Read() End While ' The XML file was succesfully read. ' Close XML file xmlr.Close() Dts.TaskResult = ScriptResults.Success Catch ex As Exception ' Validation failed, fire error event with errormessage Dts.Events.FireError(-1, "Validate XML", "Validation error: " + ex.Message, String.Empty, 0) Dts.TaskResult = ScriptResults.Failure End Try End Sub #Region "ScriptResults declaration" 'This enum provides a convenient shorthand within the scope of this class for setting the 'result of the script. 'This code was generated automatically. Enum ScriptResults Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure End Enum #End Region End Class
Note: For those using SSIS 2005: use the taskresult row from 2005 that can be found in your main method and replace the followin lines: C# line 34 & 40 / VB.Net line 31 & 35.
4) Testing
Run the package. Change something in the XML file to make it fail (a movie title longer than 50 chars).
Test result |
Thank you So much. It has really helped me.
ReplyDeleteThanks a lot!
ReplyDeleteThank you very much! Your script helped me a lot.
ReplyDeleteBest wishes :)
Hello,
ReplyDeleteHow can you validate the XML file before validating it against the structure defined in the XSD??
I've a file containing HTML instead of xml and the package identifies it correctly as not valid regarding the XSD, however, the failure fails also the complete package....
As I have more than 300K xml files to parse, I don't want to restart the job everytime such a content is encountered.
Many thanks for your reactions
It's best to ask this question in the SSIS forum. More people see your question and it makes it easier to reply.
DeleteDo you use a Foreach Loop to handle all your files? You can ignore errors within the loop and continue looping.
Good Stuff.. Thanks a lot.. it resolved my issue...
ReplyDelete