Tuesday, 21 February 2012

Replace multiple spaces with one space

Case
I have a column with lots of extra spaces in the tekst which I want to remove:
"This     has  a lot of     spaces     in          it  ."

The standard REPLACE ([MyColumnWithLotsOfSpaces], "  ", " ") function, where I replace a double space by a single space, only replaces one set of double spaces but leaves the rest.
Still double spaces left













Solution A
You could use the triple replace trick to replace duplicate spaces:

1) Identify the pairs of double spaces and replace them with a space and an unlikely character. I used a pipeline char:
REPLACE([MyColumnWithLotsOfSpaces], "  ", " |")

2) Replace the inverted version of the new pairs by nothing:
REPLACE(REPLACE([MyColumnWithLotsOfSpaces], "  ", " |"), "| ", "")

3) Replace the unlikely character by nothing:
REPLACE(REPLACE(REPLACE([MyColumnWithLotsOfSpaces], "  ", " |"), "| ", ""), "|", "")

4) The Result
I added some data viewers for testing purposes:
No doubles spaces after triple replace













Solution B
Or you could use this one-line Script Component solution where you replace the double spaces by one space using a regular expression REPLACE.

1) Script Component
Add a Script Component type Transformation and connect it to your flow.
Script Component, type Transformation





















2) Add ReadWrite input column
Select the column where you want to get rid of the double spaces as ReadWrite.























3) The Script
Edit the script and copy the following code to your Input0_ProcessInputRow method to replace the double spaces:
// C# Code
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Replace double spaces with a regular expression
        Row.MyColumnWithLotsOfSpaces = System.Text.RegularExpressions.Regex.Replace(Row.MyColumnWithLotsOfSpaces, @"\s+", " "); 
    }
}
or VB.Net
' VB.Net Code
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Row.MyColumnWithLotsOfSpaces = System.Text.RegularExpressions.Regex.Replace(Row.MyColumnWithLotsOfSpaces, "\s+", " ")
    End Sub
End Class

4) The Result
For testing purposes I added some data viewers.
The result













Conclusion
The first method is easier for people who don't know .Net, but the second method is saver because you don't have to come up with the unlikely character.

Wednesday, 1 February 2012

Custom SSIS Component: Foreach Folder Enumerator


About a year ago I did a post on creating a Foreach Folder Enumerator with a Script Task.
Foreach Folder Enumerator with Script Task














Now I have created a real Foreach Folder Enumerator and you can even sort on name and date. There is also an option to include the root folder (if it passes the filter) and the standard Traverse Subfolders is also included. In version 1.1 has regular expression support.
Foreach Folder Enumerator V1.1






























Disclaimer
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Downloads:
You can download a SSIS 2008 and 2012 version on the download page.
Version 1.1: Regular Expression support
Version 1.2: 2014 and 2016 version added, Option to fail if folder is empty. Upgradable

Installation
The installer registers the DLL in the GAC and copies it to the ForEachEnumerators folder of SSIS (example: C:\Program Files\Microsoft SQL Server\100\DTS\ForEachEnumerators\). After that you have to close Visual Studio/BIDS because it caches the GAC on startup.

SSIS
The enumerator is automatically added to the Foreach Loop when you open a SSIS project.
The new enumerator V1.0



























Bugs or feature suggestions
Please contact me or leave a comment if you have any suggestions or found a bug in this Custom Enumerator.