My column in Excel contains mixed datatypes (strings and numbers). The string values are shown as NULL. How can I get all values?
Solution
The Excel driver reads the first 8 rows to determine the datatype. If the majority of the rows are numbers it will choose a numeric datatype instead of a string data type and visa versa.
So if your Excelsheet looks like this. The first and second column get DT_R8 and the third column gets DT_WSTR.
The standard result in SSIS |
TypeGuessRows
You can change the default of checking 8 rows (1 to16) by changing the windows registry, but that doesn't change the majority rule! If you want to change that number... Start the Windows Registry-editor and search for "TypeGuessRows". The place of that key depends on the version of Windows and Office. Examples:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.0\Engines\Excel
For the ACE provider (Excel 2007, 2010 & 2013):
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Note: Setting the TypeGuessRows value to zero (0) will result in checking the first 16384 rows in excel.
Change at your own risk! |
Import Mode
If you want to get the data of all rows, regardless the datatypes in Excel, you should add the string ;IMEX=1 to the Excel Connection String. Changing to Import Mode (0 is Export mode, 1 is Import mode and 2 is Linked mode) will result in getting everything (including numbers) as a string. Goto the properties of the Excel Connection manager and change the ConnectionString property:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile1.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\MyExcelFile.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES;IMEX=1";
The preview of the same Excelsheet looks very different now.
The result in SSIS with IMEX=1 |
There is one big but.... if the first 8 rows contain a number the datatype will be DT_R8 again (despite of IMEX=1). This could be a problem if the ninth row contains a string.
Still null values! |
Hi,
ReplyDeletevery interesing post. Isn't there a way to bypass the "one big but"?
Like Forcing excel/SSIS to read all as strings? get the value as it is?
BR,
Pinela
@Pinela:
ReplyDeleteoption 1: set TypeGuessRows to 0, then it will apparently scan all rows, but that could be slow for large excel files. I wouldn't recommend that
option 2: save excel as csv with a script task
option 3: search for thirdparty alternatives
Joost, thank you.
ReplyDeleteoption 2: through VBA? What is the best way in your opinion?
BR,
Pinela.
@Pinela: Sorry, I just did a test with some c# code, but it uses the same OLEDB provider as SSIS (with the same result). You could try something with the Microsoft.Office.Interop.Excel DLL, but that will require Excel on your machine.
ReplyDeleteTry to find a third party dll or a third party SSIS component. Or even better stop using Excel as a source :-)
Thank You! - saved another week of investigation
ReplyDeleteI just disovered that the positional location for IMEX=1 in the Extended Properties is CRITICAL (at the end) and has corrected misinterpretation of text as numerics in Excel 2007+ files on import (see Connection String used that worked) ConnectionString: Provider=Microsoft.ACE.OLEDB.12.0;Data Source=E:\transferDATA\Christian_co\CHRISTIAN.xls;Extended Properties="EXCEL 12.0;HDR=YES;IMEX=1";
ReplyDeleteI tried the same thing, exact same thing, and it didn't work for me :..(
DeleteA little trick i have used. Set IMEX = 1. Then if you have col headings, add them into the query, but then use HDR=NO. That way the first row will have text and everything else will be text. Then after the import has happened delete all header rows.
ReplyDeleteD
Man! God bless you! This is excatly what I was looking for :-)))
DeleteI didn't get this....CAn you please elaborate for me?
DeleteThis is my connection string and it doesnt work with me
ReplyDeleteconnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + uploadFileName + ";Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text\"";
Let me know what is not working.... still get null values?
DeleteTypeGuessRows isn't part of the connectionstring, but a windows registry value. And ImportMixedTypes=Text isn't part of the connectionstring either...
Hi guys. This is my connection string:
ReplyDeleteProvider=Microsoft.Jet.OLEDB.4.0;Data Source=pathname_variable;Extended Properties="Text;HDR=YES;IMEX=1"
The non-digit values are still treated as null. Anything wrong with the string?
Please note pathename_variable displays the pathname upon selecting file in dialog. Also, I'm only working on importing a CSV file. Thanks.
Jared
Hi Jared, Are you using ssis to read an Excel file? Are the first 8 rows numeric and are the null values after row 8?
DeleteHi Joost-
DeleteI have excel sheet which have first 8 rows integer data after that it has string data. I have done workaround as you suggest but it doesn't work.
Please suggest in this way.
Hi MUKESH BHAGAT9,
DeleteWhat do you see in SSIS (8 int and then null values?) and what have you done exactly?
This is my solution.
ReplyDeleteUse HDR=NO and skip one value when retrieve the values or delete it later.
This is not the best solution but it works ... :(
Regards.
Thank you, this helped Me.
DeleteRegards,
Durga
Hi Guys, I am using
ReplyDeleteProvider=Microsoft.Jet.OLEDB.4.0;Data Source=pathname_variable;Extended Properties="Excel 8.0;HDR=NO;IMEX=1"
But still digital columns are NULL,my column holds both Text and Digits.
IMEX=1 working if I am not using connection string in expression but my requirement is dynamic path, so I have to use expression.
Please suggest any alternate solution
Thanks in advance.
Suvarna
Use an expression on the excel filepath property instead of on the connectionstring. Or make sure IMEX=1 is also available in the expression in the connectionstring property.
DeleteThis is awesome! We wasted a lot of time trying to figure out the solution and this one was spot on! Thanks a lot for this wonderful post.
ReplyDeleteApurva, HOw did you solve this issue? nothing worked for me.. can u post the connection string u used?
DeleteUdhay
We have tried all the setting mentioned in the post but still we facing the issue., by setting up IMEX = 1 and by changing the registry setting from default 8 to 0, I am unable to read mixed datatypes. Could please help me what needs to be done in order to read mixed data types in SSIS.
ReplyDeleteYou could take a look at a blog from Koen or if these two blogs can't help you, you could post your problem on the SSIS forum. Give a lot of details like SSIS version, Excel version, excel datatype, ssis datatype, etc.
DeleteThe best way is to convert the column data type in excel to text and then load
ReplyDeletethank you very much; I was one day unable to solve the problem and I did it as described here and worked again thank you very much
ReplyDelete