Saturday, 25 June 2011

Mixed data types in Excel column

My column in Excel contains mixed datatypes (strings and numbers). The string values are shown as NULL. How can I get all values?

The Excel driver reads the first 8 rows to determine the datatype. If the majority of the rows are numbers it will choose a numeric datatype instead of a string data type and visa versa.

So if your Excelsheet looks like this. The first and second column get DT_R8 and the third column gets DT_WSTR.
The standard result in SSIS

You can change the default of checking 8 rows (1 to16) by changing the windows registry, but that doesn't change the majority rule! If you want to change that number... Start the Windows Registry-editor and search for "TypeGuessRows". The place of that key depends on the version of Windows and Office. Examples:

For the ACE provider (Excel 2007, 2010 & 2013):
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Note: Setting the TypeGuessRows value to zero (0) will result in checking the first 16384 rows in excel. 
Change at your own risk!

Import Mode
If you want to get the data of all rows, regardless the datatypes in Excel, you should add the string ;IMEX=1 to the Excel Connection String. Changing to Import Mode (0 is Export mode, 1 is Import mode and 2 is Linked mode) will result in getting everything (including numbers) as a string. Goto the properties of the Excel Connection manager and change the ConnectionString property:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile1.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\MyExcelFile.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES;IMEX=1";

The preview of the same Excelsheet looks very different now.
The result in SSIS with IMEX=1

There is one big but.... if the first 8 rows contain a number the datatype will be DT_R8 again (despite of IMEX=1). This could be a problem if the ninth row contains a string.
Still null values!

Friday, 24 June 2011

Passing variables from a Parent Package To a Child - part II

A few months ago I did a post on how to pass the value of a parent package to a child package. That solution consisted on a peace of .Net coding in a Script Task. But there is an other solution for those who don't like .Net coding.

This solution consists of using the package configuration.

1) Create parent variable
Create a variable in your parent package. Right click in the Control Flow if the variables are not visible. I used a string variable named FilePath in this test case to store some filepath.
Create new variable, think about the scope.

2) Create Execute Package Task
Drag an Execute Package Task in your Control Flow and configure it to start your child package.
Execute Package Task

3) Create child variable
Goto to your child package and create a variable with the same name as your parent package variable.
Same name!

4) Add configurations
In the SSIS menu choose Package Configurations.
Package Configurations

5) Configuration Type
Enable package configuration(1) and add a configuration. Choose Parent package variable(2) and type the name of the variable(3).
Parent package variable

6) Select Target Property
After you typed the name of the variable, hit the next button and select the value property of the child package variable. After that hit Next, Finish and OK to complete the configuration.
Value property of the child package variable

7) Testing
To test the package I added a simple Script Task (sorry for .net coding) with a messagebox to show the value of the parent package.
Little bit of coding, but just for the show.

The SSIS solution can be downloaded here.

Friday, 3 June 2011

Often used SSIS expressions

Here is a list of often used SSIS expression which I will expand regularly. You can leave your own handy expression in the comments. Or let me know if you have a better or alternative expression for the expressions below.

And because the expression builder of SSIS is quite worthless, the opensource addon SSIS Expression Editor & Tester could be very useful.


Filename from pathstring

This expression reverses the string to find the position of the last \ and uses that number (minus 1) in the Right expression:
  • RIGHT([FilePath],FINDSTRING(REVERSE([FilePath]),"\\",1) - 1)
  • RIGHT(@[User::FilePath],FINDSTRING(REVERSE(@[User::FilePath]),"\\",1) - 1)
Folderpath from pathstring
This expressions reverses the string to find the position of the last \ and deduct that number of the total length. After that you can use that number in a substring.
  • SUBSTRING(@[User::FilePath], 1,
    LEN(@[User::FilePath]) - FINDSTRING(REVERSE(@[User::FilePath] ), "\\" ,1 ) + 1
Foldername from the pathstring
This expression finds the last and second last / in the filepath and uses the those positions to find the foldername.
  • SUBSTRING(@[User::FilePath],
    LEN(@[User::FilePath]) - FINDSTRING(REVERSE(@[User::FilePath]),"\\",2) + 2,
    (LEN(@[User::FilePath]) - FINDSTRING(REVERSE(@[User::FilePath]),"\\",1)) - (LEN(@[User::FilePath]) - FINDSTRING(REVERSE(@[User::FilePath]),"\\",2)) - 1
 In SSIS 2012 it's a lot easier/shorter with the token expression:
  • TOKEN(@[User::FilePath],"\\",TOKENCOUNT(@[User::FilePath],"\\") - 1)
Default value for an empty string
If you want to give a default value for empty columns, you can check the length or you can check for NULL or you can even check both:
  • LEN([STRING_COLUMN]) == 0 ? "default value" : [STRING_COLUMN]
  • ISNULL([STRING_COLUMN]) ? "default value" : [STRING_COLUMN]

Leading zeros
These two expression will add up to 5 leading zeros to a number. So "123" becomes "00123":
  • RIGHT(("00000" + [STRING_COLUMN]), 5)
Note: you will have to convert a numeric field to string field first with DT_WSTR or DT_STR.

Date from datetime
If you want to remove the time element in a datetime object, you should cast it to DT_DBDATE. But because that datatype is very inconvenient to use, you should cast it back to the original datatype. That will set the time to 0:00.
  • (DT_DATE)(DT_DBDATE)@[User::datetimeVariable]
  • (DT_DATE)(DT_DBDATE)[datetimeColumn]

Time from datetime
If you want to remove the date element in a datetime object, you should cast it to DT_DBTIME. And optional cast it to a string.
  • (DT_STR,8,1252)(DT_DBTIME)@[User::datetimeVariable]
  • (DT_STR,8,1252)(DT_DBTIME)[datetimeColumn]

First day of the current month
If you want to get the first day of the current month, you take the current datetime and deduct the current day number (minus 1). Optional you can remove the time part:
  • DATEADD("d", -DAY(GETDATE()) + 1, GETDATE())

Last dat of the current month
If you want to get the last day of the current month, you add 1 month and deduct the current day number. Optional you can remove the time part:
And if you realy want the last second of the current month 30-06-2011 23:59:59

Weeknumber of the month (see)
1-june-2012 is weeknumber 23 in the year, but weeknumber 1 of the month june 2012.
  • (DATEPART("ww",[YourDate]) - DATEPART("ww",DATEADD("d", -DAY([YourDate]) + 1, [YourDate]))) + 1
Related Posts Plugin for WordPress, Blogger...