I have a column with lots of extra spaces in the tekst which I want to remove:
"This has a lot of spaces in it ."
The standard REPLACE ([MyColumnWithLotsOfSpaces], " ", " ") function, where I replace a double space by a single space, only replaces one set of double spaces but leaves the rest.
Still double spaces left |
Solution A
You could use the triple replace trick to replace duplicate spaces:
1) Identify the pairs of double spaces and replace them with a space and an unlikely character. I used a pipeline char:
REPLACE([MyColumnWithLotsOfSpaces], " ", " |")
2) Replace the inverted version of the new pairs by nothing:
REPLACE(REPLACE([MyColumnWithLotsOfSpaces], " ", " |"), "| ", "")
3) Replace the unlikely character by nothing:
REPLACE(REPLACE(REPLACE([MyColumnWithLotsOfSpaces], " ", " |"), "| ", ""), "|", "")
4) The Result
I added some data viewers for testing purposes:
No doubles spaces after triple replace |
Solution B
Or you could use this one-line Script Component solution where you replace the double spaces by one space using a regular expression REPLACE.
1) Script Component
Add a Script Component type Transformation and connect it to your flow.
Script Component, type Transformation |
2) Add ReadWrite input column
Select the column where you want to get rid of the double spaces as ReadWrite.
3) The Script
Edit the script and copy the following code to your Input0_ProcessInputRow method to replace the double spaces:
// C# Code using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void Input0_ProcessInputRow(Input0Buffer Row) { // Replace double spaces with a regular expression Row.MyColumnWithLotsOfSpaces = System.Text.RegularExpressions.Regex.Replace(Row.MyColumnWithLotsOfSpaces, @"\s+", " "); } }or VB.Net
' VB.Net Code Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Runtime.Wrapper <Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _ <CLSCompliant(False)> _ Public Class ScriptMain Inherits UserComponent Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) Row.MyColumnWithLotsOfSpaces = System.Text.RegularExpressions.Regex.Replace(Row.MyColumnWithLotsOfSpaces, "\s+", " ") End Sub End Class
4) The Result
For testing purposes I added some data viewers.
The result |
Conclusion
The first method is easier for people who don't know .Net, but the second method is saver because you don't have to come up with the unlikely character.