Saturday, 25 June 2011

Mixed data types in Excel column

Case
My column in Excel contains mixed datatypes (strings and numbers). The string values are shown as NULL. How can I get all values?






Solution
The Excel driver reads the first 8 rows to determine the datatype. If the majority of the rows are numbers it will choose a numeric datatype instead of a string data type and visa versa.

So if your Excelsheet looks like this. The first and second column get DT_R8 and the third column gets DT_WSTR.
The standard result in SSIS






















TypeGuessRows
You can change the default of checking 8 rows (1 to16) by changing the windows registry, but that doesn't change the majority rule! If you want to change that number... Start the Windows Registry-editor and search for "TypeGuessRows". The place of that key depends on the version of Windows and Office. Examples:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.0\Engines\Excel

For the ACE provider (Excel 2007, 2010 & 2013):
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Note: Setting the TypeGuessRows value to zero (0) will result in checking the first 16384 rows in excel. 
Change at your own risk!











Import Mode
If you want to get the data of all rows, regardless the datatypes in Excel, you should add the string ;IMEX=1 to the Excel Connection String. Changing to Import Mode (0 is Export mode, 1 is Import mode and 2 is Linked mode) will result in getting everything (including numbers) as a string. Goto the properties of the Excel Connection manager and change the ConnectionString property:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile1.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\MyExcelFile.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES;IMEX=1";

The preview of the same Excelsheet looks very different now.
The result in SSIS with IMEX=1






















There is one big but.... if the first 8 rows contain a number the datatype will be DT_R8 again (despite of IMEX=1). This could be a problem if the ninth row contains a string.
Still null values!

26 comments:

  1. Hi,

    very interesing post. Isn't there a way to bypass the "one big but"?
    Like Forcing excel/SSIS to read all as strings? get the value as it is?

    BR,
    Pinela

    ReplyDelete
  2. @Pinela:
    option 1: set TypeGuessRows to 0, then it will apparently scan all rows, but that could be slow for large excel files. I wouldn't recommend that
    option 2: save excel as csv with a script task
    option 3: search for thirdparty alternatives

    ReplyDelete
  3. Joost, thank you.

    option 2: through VBA? What is the best way in your opinion?

    BR,
    Pinela.

    ReplyDelete
  4. @Pinela: Sorry, I just did a test with some c# code, but it uses the same OLEDB provider as SSIS (with the same result). You could try something with the Microsoft.Office.Interop.Excel DLL, but that will require Excel on your machine.

    Try to find a third party dll or a third party SSIS component. Or even better stop using Excel as a source :-)

    ReplyDelete
  5. Thank You! - saved another week of investigation

    ReplyDelete
  6. I just disovered that the positional location for IMEX=1 in the Extended Properties is CRITICAL (at the end) and has corrected misinterpretation of text as numerics in Excel 2007+ files on import (see Connection String used that worked) ConnectionString: Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\transferDATA\Christian_co\CHRISTIAN.xls;Extended Properties="EXCEL 12.0;HDR=YES;IMEX=1";

    ReplyDelete
    Replies
    1. I tried the same thing, exact same thing, and it didn't work for me :..(

      Delete
  7. A little trick i have used. Set IMEX = 1. Then if you have col headings, add them into the query, but then use HDR=NO. That way the first row will have text and everything else will be text. Then after the import has happened delete all header rows.

    D

    ReplyDelete
    Replies
    1. Man! God bless you! This is excatly what I was looking for :-)))

      Delete
    2. I didn't get this....CAn you please elaborate for me?

      Delete
  8. This is my connection string and it doesnt work with me
    connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + uploadFileName + ";Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text\"";

    ReplyDelete
    Replies
    1. Let me know what is not working.... still get null values?

      TypeGuessRows isn't part of the connectionstring, but a windows registry value. And ImportMixedTypes=Text isn't part of the connectionstring either...

      Delete
  9. Hi guys. This is my connection string:

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=pathname_variable;Extended Properties="Text;HDR=YES;IMEX=1"

    The non-digit values are still treated as null. Anything wrong with the string?
    Please note pathename_variable displays the pathname upon selecting file in dialog. Also, I'm only working on importing a CSV file. Thanks.

    Jared

    ReplyDelete
    Replies
    1. Hi Jared, Are you using ssis to read an Excel file? Are the first 8 rows numeric and are the null values after row 8?

      Delete
    2. Hi Joost-

      I have excel sheet which have first 8 rows integer data after that it has string data. I have done workaround as you suggest but it doesn't work.

      Please suggest in this way.


      Delete
    3. Hi MUKESH BHAGAT9,

      What do you see in SSIS (8 int and then null values?) and what have you done exactly?

      Delete
  10. This is my solution.

    Use HDR=NO and skip one value when retrieve the values or delete it later.
    This is not the best solution but it works ... :(

    Regards.

    ReplyDelete
  11. Hi Guys, I am using

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=pathname_variable;Extended Properties="Excel 8.0;HDR=NO;IMEX=1"

    But still digital columns are NULL,my column holds both Text and Digits.

    IMEX=1 working if I am not using connection string in expression but my requirement is dynamic path, so I have to use expression.

    Please suggest any alternate solution

    Thanks in advance.
    Suvarna

    ReplyDelete
    Replies
    1. Use an expression on the excel filepath property instead of on the connectionstring. Or make sure IMEX=1 is also available in the expression in the connectionstring property.

      Delete
  12. This is awesome! We wasted a lot of time trying to figure out the solution and this one was spot on! Thanks a lot for this wonderful post.

    ReplyDelete
    Replies
    1. Apurva, HOw did you solve this issue? nothing worked for me.. can u post the connection string u used?

      Udhay

      Delete
  13. We have tried all the setting mentioned in the post but still we facing the issue., by setting up IMEX = 1 and by changing the registry setting from default 8 to 0, I am unable to read mixed datatypes. Could please help me what needs to be done in order to read mixed data types in SSIS.

    ReplyDelete
    Replies
    1. You could take a look at a blog from Koen or if these two blogs can't help you, you could post your problem on the SSIS forum. Give a lot of details like SSIS version, Excel version, excel datatype, ssis datatype, etc.

      Delete
  14. The best way is to convert the column data type in excel to text and then load

    ReplyDelete
  15. thank you very much; I was one day unable to solve the problem and I did it as described here and worked again thank you very much

    ReplyDelete

Please use the SSIS MSDN forum for general SSIS questions that are not about this post. I'm a regular reader of that forum and will gladly answer those questions over there.

All comments are moderated manually to prevent spam.