Monday, 27 December 2010

Flexible XML Destination

Case
A client wants to export data to XML files, but SSIS has no XML destination. A script component could help him but he doesn't want to create a new script for each file/source just because the file/source lay-out is different.

Starting point
Col1Col2Col3
testTestTEST
blaBlaBLA
xxxXxxXXX

Desired situation
<?xml version="1.0" ?>
<ROOT>
 <ROW>
  <col1>test</col1>
  <col2>Test</col2>
  <col3>TEST</col3>
 </ROW>
 <ROW>
  <col1>bla</col1>
  <col2>Bla</col2>
  <col3>BLA</col3>
 </ROW>
 <ROW>
  <col1>xxx</col1>
  <col2>Xxx</col2>
  <col3>XXX</col3>
 </ROW>
</ROOT>

Solution
There is a request for an XML destination on the Microsoft site which you can support. There is an open source component: XML Destination and there are some third party components, but you can do it yourself with a Script component.
Let's create a flexible XML destination that is the same for each file/source. This could be resolved by reflection (again).

1) Source
Create a random source (flatfile / OLE DB).

2) Script component
Drag a script component to your data flow and select Destination. Connect it to the Source component.
Script component destination




















3) Input columns
Edit the Script components and select all input columns (or just the ones you need)
Select all columns as input



















4) Destination
Create a new connection named xmldocument in the Connection Managers tab. The file type should be FILE and create a new file.
FILE


















Create file











5) The Script
Now the script (C# and VB.net for this example). SSIS creates 3 methods which we will adjust and we need a fourth method to remove forbidden xml characters such as < and >.
// C# Code
// Flexible script that creates a XML document
// using the SSIS columns as nodes.
using System;
using System.Data;
using System.Reflection; // Added
using System.Xml;        // Added
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Variable that contains the XML document
    XmlTextWriter textWriter;

    // Start of XML document
    public override void PreExecute()
    {
        base.PreExecute();

        // Create a new XML document and use the filepath in the connection as XML-file
        textWriter = new XmlTextWriter(this.Connections.xmldocument.ConnectionString.ToString(), null);

        // Start writing the XML document: 
        textWriter.WriteStartDocument();

        // Create root element 
        textWriter.WriteStartElement("ROOT");
    }

    // Close of XML document
    public override void PostExecute()
    {
        base.PostExecute();

        // Close root element: 
        textWriter.WriteEndElement();

        // Stop writing the XML document
        textWriter.WriteEndDocument();

        // Close document
        textWriter.Close();
    }

    // Method that will be started for each record in you dataflow  
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Row type to get the value of a column
        Type rowType = Row.GetType();
        String columnValue = "";

        // Create row element: 
        textWriter.WriteStartElement("ROW");

        // Loop through all columns and create a column element: valuevalue
        foreach (IDTSInputColumn100 column in this.ComponentMetaData.InputCollection[0].InputColumnCollection)
        {
            // Use the SSIS column name as element name: 
            textWriter.WriteStartElement(column.Name);
            
            // Get column value, will fail if null
            try
            {
                columnValue = rowType.GetProperty(column.Name).GetValue(Row, null).ToString();
            }
            catch
            {
                // Default value for null values: "null", "" or null
                columnValue = "null";
            }
            finally
            {
                textWriter.WriteString(removeForbiddenXmlChars(columnValue));
               }
            // Close column element: 
            textWriter.WriteEndElement();
        }
        // Close row element: 
        textWriter.WriteEndElement();


        // Output the number of processed rows. 103 = RowsWritten
        this.ComponentMetaData.IncrementPipelinePerfCounter(103, 1); 
    }

    // Remove forbidden chars that could damage your XML document
    private string removeForbiddenXmlChars(string columnValue)
    {
        return columnValue.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    }
}

And the VB.net code
' VB.Net code
' Flexible script that creates a XML document 
' using the SSIS columns as nodes. 
Imports System
Imports System.Data
Imports System.Reflection       ' Added
Imports System.Xml              ' Added
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<microsoft.sqlserver.dts.pipeline.ssisscriptcomponententrypointattribute> _
<clscompliant(false)> _
Public Class ScriptMain
    Inherits UserComponent

    Dim textWriter As XmlTextWriter

    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Create a new XML document and use the filepath in the connection as XML-file 
        textWriter = New XmlTextWriter(Me.Connections.xmldocument.ConnectionString.ToString(), System.Text.Encoding.Default)

        'Start writing the XML document:
        textWriter.WriteStartDocument()

        'Create root element 
        textWriter.WriteStartElement("ROOT")
    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()

        'Close root element: 
        textWriter.WriteEndElement()

        'Stop writing the XML document
        textWriter.WriteEndDocument()

        'Close document
        textWriter.Close()
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Row type to get the value of a column 
        Dim rowType As Type = Row.GetType()
        Dim columnValue As String = ""

        ' Create row element:  
        textWriter.WriteStartElement("ROW")

        ' Loop through all columns and create a column element: valuevalue 
        For Each column As IDTSInputColumn100 In Me.ComponentMetaData.InputCollection(0).InputColumnCollection


            ' Use the SSIS column name as element name:  
            textWriter.WriteStartElement(column.Name)

            ' Get column value, will fail if null 
            Try
                columnValue = rowType.GetProperty(column.Name).GetValue(Row, Nothing).ToString()
            Catch
                ' Default value for null values: "null", "" or null 
                columnValue = "null"
            Finally
                textWriter.WriteString(removeForbiddenXmlChars(columnValue))
            End Try

            ' Close column element:  
            textWriter.WriteEndElement()
        Next
        ' Close row element:  
        textWriter.WriteEndElement()
        ' Output the number of processed rows. 103 = RowsWritten 
        Me.ComponentMetaData.IncrementPipelinePerfCounter(103, 1)

    End Sub


    'Remove forbidden chars that could damage your XML document
    Private Function removeForbiddenXmlChars(ByVal columnValue As String) As String
        removeForbiddenXmlChars = columnValue.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;")
    End Function

End Class

Note: there are probably some more forbidden XML chars like ' (&apos;) and " (&quot;) that you need to replace.


UPDATE:
I have a newer version which doesn't use the ugly try catch for null values, but you have to change the .Net Framework version to 3.5.

// C# Code
// Flexible script that creates a XML document 
// using the SSIS columns as nodes. 
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Reflection;    // Added 
using System.Xml;           // Added 
using System.Linq;          // Added

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    // Variable that contains the XML document 
    XmlTextWriter textWriter;
    
    // Start of XML document 
    public override void PreExecute()
    {
        base.PreExecute();
        // Create a new XML document and use the filepath in the connection as XML-file 
        textWriter = new XmlTextWriter(this.Connections.xmldocument.ConnectionString.ToString(), null);
        // Start writing the XML document: 
        textWriter.WriteStartDocument();
        // Create root element  
        textWriter.WriteStartElement("ROOT");
    }

    // Close of XML document 
    public override void PostExecute()
    {
        base.PostExecute();
        // Close root element:  
        textWriter.WriteEndElement();
        // Stop writing the XML document 
        textWriter.WriteEndDocument();
        // Close document 
        textWriter.Close();
    }

    // Method that will be started for each record in you dataflow 
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Row type to get the value of a column 
        var properties = Row.GetType().GetProperties(BindingFlags.Instance | BindingFlags.Public);;

        String columnValue = "";
        // Create row element:  
        textWriter.WriteStartElement("ROW");
        // Loop through all columns and create a column element: valuevalue 

        foreach (var property in properties.Where(p => !p.Name.EndsWith("_IsNull", StringComparison.OrdinalIgnoreCase)))
        {
            // Use the SSIS column name as element name:  
            textWriter.WriteStartElement(property.Name);

            // Get column value, but that method will fail if empty
            // so first check if column value is null
            if (!CheckNull(Row, properties, property.Name))
            {
                // Get column value
                columnValue = property.GetValue(Row, null).ToString();
            }
            else
            {
                // Default value for null values: "null", "" or null 
                columnValue = "";
            }
            
            // write column value, but first remove forbidden chars
            textWriter.WriteString(removeForbiddenXmlChars(columnValue)); 

            // Close column element:  
            textWriter.WriteEndElement(); 
        }
        // Close row element:  
        textWriter.WriteEndElement();
        // Output the number of processed rows. 103 = RowsWritten 
        this.ComponentMetaData.IncrementPipelinePerfCounter(103, 1);
    }

    // Remove forbidden chars that could damage your XML document 
    private string removeForbiddenXmlChars(string columnValue)
    {
        return columnValue.Replace("&", "&").Replace("<", "<").Replace(">", ">");
    }

    // Method that checks wether a column value is empty.
    // It uses the standard boolean [columnName]_IsNull.
    private bool CheckNull(Input0Buffer Row, PropertyInfo[] properties, string propertyName)
    {
        var property = properties.Where(p => p.Name.Equals(propertyName + "_IsNull", StringComparison.OrdinalIgnoreCase)).FirstOrDefault();

        if (property != null)
        {
            if ((bool)property.GetValue(Row, null))
            {
                return true;
            }
        }
        return false;
    }
}

And the VB.net code
' VB.Net code
' Flexible script that creates a XML document 
' using the SSIS columns as nodes.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Reflection   ' Added 
Imports System.Xml          ' Added 
Imports System.Linq         ' Added
<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    ' Variable that contains the XML document 
    Dim textWriter As XmlTextWriter

    ' Start of XML document 
    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        ' Create a new XML document and use the filepath in the connection as XML-file 
        textWriter = New XmlTextWriter(Me.Connections.xmldocument.ConnectionString.ToString(), System.Text.Encoding.Default)
        'Start writing the XML document: 
        textWriter.WriteStartDocument()
        'Create root element  
        textWriter.WriteStartElement("ROOT")
    End Sub

    ' Close of XML document
    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        'Close root element:  
        textWriter.WriteEndElement()
        'Stop writing the XML document 
        textWriter.WriteEndDocument()
        'Close document 
        textWriter.Close()
    End Sub

    ' Method that will be started for each record in you dataflow
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Row type to get the value of a column 
        Dim properties = Row.[GetType]().GetProperties(BindingFlags.Instance Or BindingFlags.[Public])

        Dim columnValue As [String] = ""
        ' Create row element:  
        textWriter.WriteStartElement("ROW")
        ' Loop through all columns and create a column element: valuevalue 

        For Each [property] As Object In properties.Where(Function(p) Not p.Name.EndsWith("_IsNull", StringComparison.OrdinalIgnoreCase))
            ' Use the SSIS column name as element name:  
            textWriter.WriteStartElement([property].Name)

            ' Get column value, but that method will fail if empty
            ' so first check if column value is null
            If Not CheckNull(Row, properties, [property].Name) Then
                ' Get column value
                columnValue = [property].GetValue(Row, Nothing).ToString()
            Else
                ' Default value for null values: "null", "" or null 
                columnValue = ""
            End If

            ' write column value, but first remove forbidden chars
            textWriter.WriteString(removeForbiddenXmlChars(columnValue))

            ' Close column element:  
            textWriter.WriteEndElement()
        Next
        ' Close row element:  
        textWriter.WriteEndElement()
        ' Output the number of processed rows. 103 = RowsWritten 
        Me.ComponentMetaData.IncrementPipelinePerfCounter(103, 1)
    End Sub

    'Remove forbidden chars that could damage your XML document 
    Private Function removeForbiddenXmlChars(ByVal columnValue As String) As String
        removeForbiddenXmlChars = columnValue.Replace("&", "&").Replace("<", "<").Replace(">", ">")
    End Function

    ' Method that checks wether a column value is empty.
    ' It uses the standard boolean [columnName]_IsNull.
    Private Function CheckNull(ByVal Row As Input0Buffer, ByVal properties As PropertyInfo(), ByVal propertyName As String) As Boolean
        Dim [property] = properties.Where(Function(p) p.Name.Equals(propertyName & "_IsNull", StringComparison.OrdinalIgnoreCase)).FirstOrDefault()

        If [property] IsNot Nothing Then
            If CBool([property].GetValue(Row, Nothing)) Then
                Return True
            End If
        End If
        Return False
    End Function
End Class

14 comments:

  1. Getting this error when i pasted C# Code in the edit script. Please let me know how to reolve this as i am pretty much new to ssis & C#.
    Error:--
    'Connections' does not contain a definition for 'xmldocument' and no extension method 'xmldocument' accepting a first argument of type 'Connections' could be found (are you missing a using directive or an assembly reference

    ReplyDelete
  2. @Gautam: I think you forgot step 4 or gave the connection a different name. Let me know if that solved your problem.

    ReplyDelete
  3. Great article. Any thoughts on why the 2nd & 3rd column values would be returning null? i.e. "Get column value, will fail if null"

    ReplyDelete
  4. What is the datatype of the columns? And what kind of values do you expect?

    ReplyDelete
  5. Thanks for the reply.

    In the OLEDB source (Step 1), I am repeating the same column three times using an alias.
    SELECT x AS COLUMN1, x as COLUMN2, x as COLUMN3 FROM table (The data type is numeric)

    I'm trying to understand why the code below is not retrieving the data afer the first column.
    rowType.GetProperty(column.Name).GetValue(Row, Nothing).ToString()

    The interesting part is that I can tap the Row object directly - and the script returns the correct values.
    Row.COLUMN1.ToString()
    Row.COLUMN2.ToString()
    Row.COLUMN3.ToString()

    --Bob

    ReplyDelete
    Replies
    1. Hi Bob,

      I have an improved version. Will add it after some testing. Contact me if you want now...

      Delete
  6. Thanks again for the post. I continued to have issues with the Input0_ProcessInputRow subroutine. "object reference not set to an instance of an object" error thrown on this call: rowType.GetProperty(column.Name).GetValue(Row, Nothing).ToString()

    Patched a fix togeher, the code is below. No error checking since my data was not null.

    --Bob


    Private inputBuffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer
    Public Overrides Sub ProcessInput(ByVal InputID As Integer, ByVal Buffer As Microsoft.SqlServer.Dts.Pipeline.PipelineBuffer)
    inputBuffer = Buffer
    MyBase.ProcessInput(InputID, Buffer)
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim columnValue As String = ""
    Dim counter As Integer = 0

    textWriter.WriteStartElement("ROW")

    'For counter = 0 To inputBuffer.ColumnCount - 1
    For Each column As IDTSInputColumn100 In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
    textWriter.WriteStartElement(column.Name)
    columnValue = inputBuffer.Item(counter).ToString()
    textWriter.WriteString(columnValue.ToString())
    counter += 1
    textWriter.WriteEndElement()
    Next
    textWriter.WriteEndElement()
    Me.ComponentMetaData.IncrementPipelinePerfCounter(103, 1)

    End Sub

    ReplyDelete
    Replies
    1. Cool code, thanks! Learning C#

      Delete
  7. Any idea why the call to:

    columnValue = rowType.GetProperty(column.Name).GetValue(Row, null).ToString();

    fails when the column name has an underscore in it? Other than that, it works fine.

    ReplyDelete
    Replies
    1. In a Script Component SSIS removes some chars from the columnnames like underscores and also numbers if they are the first char in the name.

      I think the easiest to overcome this is to rename the field in the Output Alias (see image step 3).

      Delete
    2. Yes, I finally figured that out. Using column.Name in the call to GetProperty retains the underscores, which will fail; using something like column.Name.Replace("_","") works just fine. Thanks.

      Delete
  8. Thanks for the script. For some reason it kept getting stuck in PostExecute method.
    After some trial and error I commented calls to WriteEndElement and WriteEndDocument and it worked. It also put appropriate end tags. so I guess explicit calls to those methods were not required.

    My PostExecute method looks as below

    public override void PostExecute()
    {
    textWriter.Close();

    base.PostExecute();

    }

    ReplyDelete
  9. Thanks for this, looks like a great start for the work I'm currently doing.

    One thing: I'm a little confused by the removeForbiddenXmlChars function, which appears to replace three characters with the same characters. I'm not sure this would be necessary even if it did work, as the XMLTextWriter's WriteString method automatically deals with encoding those characters correctly...

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.