Problem with using OleDbDataAdapter to fetch data from a Excel sheet - c#

First, I want to say that I'm out on deep water here, since I'm just doing some changes to code that is written by someone else in the company, using OleDbDataAdapter to "talk" to Excel and I'm not familiar with that. There is one bug there I just can't follow.
I'm trying to use a OleDbDataAdapter to read in a excel file with around 450 lines.
In the code it's done like this:
connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source='" + path + "';" + "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\"");
connection.Open();
OleDbDataAdapter objAdapter = new OleDbDataAdapter(objCommand.CommandText, connection);
objAdapter.Fill(objDataSet, "Excel");
foreach (DataColumn dataColumn in objTable.Columns) {
if (dataColumn.Ordinal > objDataSet.Tables[0].Columns.Count - 1) {
objDataSet.Tables[0].Columns.Add();
}
objDataSet.Tables[0].Columns[dataColumn.Ordinal].ColumnName = dataColumn.ColumnName;
objImport.Columns.Add(dataColumn.ColumnName);
}
foreach (DataRow dataRow in objDataSet.Tables[0].Rows) {
...
}
Everything seems to be working fine except for one thing. The second column is filled with mostly four digit numbers like 6739, 3920 and so one, but fice rows have alphanumeric values like 8201NO and 8205NO. Those five cells are reported as having blank contents instead of their alphanumeric content. I have checked in excel, and all the cells in this columns are marked as Text.
This is an xls file by the way, and not xlsx.
Do anyone have any clue as why these cells are shown as blank in the DataRow, but the numeric ones are shown fine? There are other columns with alphanumeric content that are shown just fine.

What's happening is that excel is trying to assign a data type to the spreadsheet column based on the first several values in that column. I suspect that if you look at the properties in that column it will say it is a numerical column.
The problem comes when you start trying to query that spreadsheet using jet. When it thinks it's dealing with a numerical column and it finds a varchar value it quietly returns nothing. Not even a cryptic error message to go off of.
As a possible work around can you move one of the alpha numeric values to the first row of data and then try parsing. I suspect you will start getting values for the alpha numeric rows then...
Take a look at this article. It goes into more detail on this issue. it also talks about a possible work around which is:
However, as per JET documentation, we
can override the registry setting thru
the Connection String, if we set
IMEX=1( as part of Extended
Properties), the JET will set the all
column type as UNICODE VARCHAR or
ADVARWCHAR irrespective of
‘ImportMixedTypes’ key value.hey

IMEX=1 means "Read mixed data as text."
There are some gotchas, however. Jet will only use several rows to determine whether the data is mixed, and if so happens these rows are all numeric, you'll get this behaviour.
See connectionstrings.com for details:
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.

I would advise against using the OleDb data provider stuff to access Excel if you can help it. I've had nothing but problems, for exactly the reasons that others have pointed out. The performance tends to be atrocious as well when you are dealing with large spreadsheets.
You might try this open source solution:
http://exceldatareader.codeplex.com/

Related

reading excel with oledb not displaying correct values

This is old question I posted:
Reading one & Update some other Excel with c#
As suggested I created schema.ini file. My excel files have so many columns (many of them are not fixed) with mixed data. Even a cell contains numbers along with text. I observed that NOT ALL values are shown when I read excel using OLEDB and populate into a DataTable.
I can't assume ALL columns are put them into .ini file. Columns in my excel will go up to "DX". I observed that only 1st row which has number+text value are shown but similar text appears somewhere down aren't shown. It's shows as blank.
Here is connection string:
string strConn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + FilePath+ "';Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text\"";
Is there any solution so it reads all types of data?
This comes up a lot and it's very understandable because the documentation is somewhat lacking
Microsoft.ACE.OLEDB.12.0 does not handle columns of mixed data types very well. So what happens is that the driver will always read the first n values in each column and assign a datatype depending on what it finds in the first n cells of the column. n is determined by the setting of a registry key. It moves around depending on whether you have a 64 bit implementation or a 32 bit one but the 64 bit key is in...
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Sadly it's not always convenient to go around modifying registry keys and it would have been much better to leave this setting on the connection string but it is what it is. The default value for this is 8 rows.
If the driver finds mixed data types then and only then does the setting of IMEX come into play. If IMEX=1 is included then a column of mixed data types is returned as text. If it is not specified then any values which do not correspond to the assigned data type are returned as null.
This is where HDR=No is useful. If you have a header then specify HDR=No and read it. This will help ensure that the column is returned as text as long as your headers are all text as well of course. You can then discard the header before processing the data. It won't help if you have a majority numeric/date time data types in the first n cells of the column.
As an aside the driver will read all types of Excel files including .xls, .xlsm and .xlsx - there is no need to change the extended properties away from Excel 12.0 to do so. This is a considerable advantage.
The older Microsoft.Jet.OLEDB.4.0 was good in that you could specify TypeGuessRows and ImportMixedTypes in the connection string but Microsoft.ACE.OLEDB.12.0 completely ignores them so you can remove them from your connection string as their presence is misleading. The older driver can only read .xls files.
Both drivers will only read 255 columns without amending the SELECT statement. To read more than 255 columns you specify a range. E.g.
Select * From [Sheet1$IV:SP]
will read columns 256-510. If your sheet ends on DX it is well within the 255 column limit.
Hidden columns are always returned.
There are a couple of nasties with this driver. Firstly leading empty rows or columns are ignored completely. This can really mess things up if you are expecting data in a specific rows/columns. Secondly Excel incorrectly treats 29/Feb/1900 as a valid date but OLEDB does not. You can stick 29/Feb/1900 into an Excel spreadsheet just fine but OLEDB will return it as 28/Feb/1900. I can't see anything else it could do really.
The driver is a very handy and cheap way of reading well formatted Excel spreadsheets as long as you are aware of the limitations and can code around them.
Good luck.

OleDbDataReader does not see data in the cell(.xlsx)

Have a really weird problem with reading xlsx file(I'm using OleDbDataReader).
I have a column there that consist of the following data:
50595855
59528522
C_213154
23141411
The problem is that when I read this column the reader shows me that the third row is empty. The column format in Excel is set to 'General'. But when I set the format to 'Text', everything works fine and reader sees the data in that row.
So just for a sake of experiment, I prefixed first two rows with letter and made it look like following :
C_50595855
C_59528522
C_213154
23141411
And the reader reads everything without problem even when the column format is set to 'General'.
So Excel apparently somehow analyses data in the column before loading it, and it gets confused when first cells of the column look like numeric and some of the rest are texts..
It is really weird to me as either there is data in the cell or there isn't.
Anyone have any ideas why this is happening?
Any help will be much appreciated.
Regards,
Igor
As you surmised it's an issue caused by mixed data types. If you search on "OleDBDataReader mixed types" you'll get some answers. Here's an MSDN page that describes the problem:
"This problem is caused by a limitation of the Excel ISAM driver in that once it determines the datatype of an Excel column, it will return a Null for any value that is not of the datatype the ISAM driver has defaulted to for that Excel column. The Excel ISAM driver determines the datatype of an Excel column by examining the actual values in the first few rows and then chooses a datatype that represents the majority of the values in its sampling."
... and the solution(s):
"Insure that the data in Excel is entered as text. Just reformatting the Excel column to Text will not accomplish this. You must re-enter the existing values after reformatting the Excel column. In Excel, you can use F5 to re-enter existing values in the selected cell.
You can add the option IMEX=1; to the Excel connect string in the OpenDatabase method. For example:
Set Db = OpenDatabase("C:\Temp\Book1.xls", False, True, "Excel 8.0; HDR=NO; IMEX=1;")
"

c# Excel import query

I've been requested to import an excel spreadsheet which is fine but Im getting a problem with importing a certain cell that contains both numeric and alphanumeric characters.
excel eg
Col
A B C
Row 0123 8 Fake Address CF11 1XX
XX123 8 Fake Address CF11 1XX
As per the example above when the dataset is being loaded its treating Row 2, col (A) as a numeric field resulting in an empty column in the array.
My connection for the OleDb is
var dbImportConn = new OleDbConnection(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + dataSource
+ #";Extended Properties=""Excel 8.0;HDR=No;IMEX=1"";")
In this connection i have set the IMEX = 1 which should parse all contents as string into the dataset. Also if i change Row 1 Col (A) to have 'XX123' the entire Col (A) successfully parses as string! Unfortunately this is not going to help my scenario as the excel file is passed from an external client who have also advised that do not have the means to pass through the file with a header row which would solve my issue.
My one thought at this point is when I receive the file to edit the file (programmatically) to insert a header but again as the client may change how many columns are contained this would not be a safe option for me.
So basically I need to find a solution for dealing with the current format on the spreadsheet and to pass through all cells into the array. Has anyone come across this issue before ? Or know how to solve this ?
I await your thoughts
Thanks
Scott
ps If this is not clear just shout
Hi There is a registry setting called TypeGuessRows that you can change that tells Excel to scan all the column before deciding it's type. Currently, it seems, this is set to read an x number of rows in a column and decides the type of the column e.g. if your first x rows are integers and x+1 is string, the import will fail because it has already decided that this is an integer column. You can change the registry setting to read the whole column before deciding..
please see this also
http://jingyangli.wordpress.com/2009/02/13/imex1-revisit-and-typeguessrows-setting-change-to-0-watch-for-performance/
This isn't a direct answer, but I would like to recommend you use the Excel Data Reader, which is opensource under the LGPL licence and is Lightweight and fast library written in C# for reading Microsoft Excel files ('97-2007).

Is there a better way to indicate "null" values in Excel?

I have an Excel 2007 workbook that contains tables of data that I'm importing into DataTable objects using ADO.NET.
Through some experimentation, I've managed to find two different ways to indicate that a cell should be treated as "null" by ADO.NET:
The cell is completely blank.
The cell contains #N/A.
Unfortunately, both of these are problematic:
Most of my columns of data in Excel are generated via formulas, but it's not possible in Excel to generate a formula that results in a completely blank cell. And only a completely blank cell will be considered null (an empty string will not work).
Any formula that evaluates to #N/A (either due to an actual lookup error or because the NA() function was used) will be considered null. This seemed like the ideal solution until I discovered that the Excel workbook must be open for this to work. As soon as you close the workbook, OLEDB suddenly starts seeing all those #N/As as strings. This causes exceptions like the following to be thrown when filling the DataTable:
Input string was not in a correct format. Couldn't store <#N/A> in Value Column. Expected type is Int32.
Question: How can I indicate a null value via an Excel formula without having to have the workbook open when I fill the DataTable? Or what can be done to make #N/A values be considered null even when the workbook is closed?
In case it's important, my connection string is built using the following method:
var builder = new OleDbConnectionStringBuilder
{
Provider = "Microsoft.ACE.OLEDB.12.0",
DataSource = _workbookPath
};
builder.Add("Extended Properties", "Excel 12.0 Xml;HDR=Yes;IMEX=0");
return builder.ConnectionString;
(_workbookPath is the full path to the workbook).
I've tried both IMEX=0 and IMEX=1 but it makes no difference.
You're hitting the brickwall that many very frustrated users of Excel are experiencing. Unfortunately Excel as a company tool is widespread and seems quite robust, unfortunately because each cell/column/row has a variant data type it makes it a nightmare to handle with other tools such as MySQL, SQL Server, R, RapidMiner, SPSS and the list goes on. It seems that Excel 2007/2010 is not very well supported and even more so when taking 32/64 bit versions into account, which is scandalous in this day and age.
The main problem is that when ACE/Jet access each field in Excel they use a registry setting 'TypeGuessRows' to determine how many rows to use to assess the datatype. The default for "Rows to Scan" is 8 rows. The registry setting 'TypeGuessRows' can specify an integer value from one (1) to sixteen (16) rows, or you can specify zero (0) to scan all existing rows. If you can't change the registry setting (such as in 90% of office environments) it makes life difficult as the rows to guess are limited to the first 8.
For example, without the registry change
If the first occurrence of #N/A is within the first 8 rows then IMEX = 1 will return the error as a string "#N/A". If IMEX = 0 then an #N/A will return 'Null'.
If the first occurrence of #N/A is beyond the first 8 rows then both IMEX = 0 & IMEX = 1 both return 'Null' (assuming required data type is numeric).
With the registry change (TypeGuessRows = 0) then all should be fine.
Perhaps there are 4 options:
Change the registry setting TypeGuessRows = 0
List all possible type variations in the first 8 rows as 'dummy data' (eg memo fields/nchar(max)/ errors #N/A etc)
Correct ALL data type anomalies in Excel
Don't use Excel - Seriously worth considering!
Edit:
Just to put the boot in :) another 2 things that really annoy me are; if the first field on a sheet is blank over the first 8 rows and you can't edit the registry setting then the whole sheet is returned as blank (Many fun conversations telling managers they're fools for merging cells!). Also, if in Excel 2007/2010 you have a department return a sheet with >255 columns/fields then you have huge problems if you need non-contiguous import (eg key in col 1 and data in cols 255+)

Accessing Excel Spreadsheet with C# occasionally returns blank value for some cells

I need to access an excel spreadsheet and insert the data from the spreadsheet into a SQL Database. However the Primary Keys are mixed, most are numeric and some are alpha-numeric.
The problem I have is that when the numeric and alpha-numeric Keys are in the same spreadsheet the alpha-numeric cells return blank values, whereas all the other cells return their data without problems.
I am using the OleDb method to access the Excel file. After retrieving the data with a Command string I put the data into a DataAdapter and then I fill a DataSet. I iterate through all the rows (dr) in the first DataTable in the DataSet.
I reference the columns by using, dr["..."].ToString()
If I debug the project in Visual Studio 2008 and I view the "extended properties", by holding my mouse over the "dr" I can view the values of the DataRow, but the Primary Key that should be alpha-numeric is {}. The other values are enclosed in quotes, but the blank value has braces.
Is this a C# problem or an Excel problem?
Has anyone ever encountered this problem before, or maybe found a workaround/fix?
Thanks in advance.
Solution:
Connection String:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended
Properties="Excel 8.0;HDR=Yes;IMEX=1";
HDR=Yes; indicates that the first row contains columnnames, not data. HDR=No; indicates the opposite.
IMEX=1; tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect excel sheet write access negative.
SQL syntax SELECT * FROM [sheet1$]. I.e. excel worksheet name followed by a $ and wrapped in [ ] brackets.
Important:
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance.
If the Excel workbook is protected by a password, you cannot open it for data access, even by supplying the correct password with your connection string. If you try, you receive the following error message: "Could not decrypt file."
The Excel data source picks a column type for the entire column. If one of the cells doesn't match that type exactly, it leaves blanks like that. We had issues where our typist entered a " 8" (a space before the number, so Excel converted it to a string for that cell) in a numeric column. It would make sense to me that it would try the .Net Parse methods as they are more robust, but I guess that's not how the Excel driver works.
Our fix, since we were using database import services, was to log all the rows that 'failed' this way. Then, we went back to the XLS document and re-typed those cells, to ensure the underlying type was correct. (We found just deleting the space didn't fix it--we had to Clear the whole cell first, than re-type the '8'.) Feels hacky and isn't elagent, but that was the best method we found. If the Excel driver can't read it in correctly by itself, there's nothing you can do to get that data out of there once you're in .Net.
Just another case where Office hides the important details from users in the name of simplicity, and therefore making it more difficult when you have to be exact for power uses.
The {} means this is some sort of empty object and not a string. When you hover over the object you should be able to see its type. Likewise, when you use quickwatch to view dr["..."] you should see the object type. What type is the object you receive?
The ItemArray is an Object Array. So I assume that the "column" in the DataRow, that I am trying to reference, is of type object.
For VISTA compatibility you can use EXCEL 12.0 driver in connection string. This should resolve your issue. It did mine.
Solution:
You put HDR=No so that the first row is not considered the column header.
Connection String: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=No;IMEX=1";
You ignore the first row and you acces the data by any means you want (DataTable, DataReader ect). You acces the columns by numeric indexes, instead of column names.
It worked for me. This way you don't have to modify registers!
I answered a similar question here. Here I've copied and pasted the same answer for your convenience:
I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.
First read in the data to get the column names
Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
Read the data in again into this new
dataset. Voila - the scientific
notation is now gone and everything is read in as a string.
Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!
public void ImportSpreadsheet(string path)
{
string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
string connectionString = string.Format(
CultureInfo.CurrentCulture,
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
path,
extendedProperties);
using (OleDbConnection connection = new OleDbConnection(connectionString))
{
using (OleDbCommand command = connection.CreateCommand())
{
command.CommandText = "SELECT * FROM [Worksheet1$]";
connection.Open();
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
using (DataSet columnDataSet = new DataSet())
using (DataSet dataSet = new DataSet())
{
columnDataSet.Locale = CultureInfo.CurrentCulture;
adapter.Fill(columnDataSet);
if (columnDataSet.Tables.Count == 1)
{
var worksheet = columnDataSet.Tables[0];
// Now that we have a valid worksheet read in, with column names, we can create a
// new DataSet with a table that has preset columns that are all of type string.
// This fixes a problem where the OLEDB provider is trying to guess the data types
// of the cells and strange data appears, such as scientific notation on some cells.
dataSet.Tables.Add("WorksheetData");
DataTable tempTable = dataSet.Tables[0];
foreach (DataColumn column in worksheet.Columns)
{
tempTable.Columns.Add(column.ColumnName, typeof(string));
}
adapter.Fill(dataSet, "WorksheetData");
if (dataSet.Tables.Count == 1)
{
worksheet = dataSet.Tables[0];
foreach (var row in worksheet.Rows)
{
// TODO: Consume some data.
}
}
}
}
}
}
}
Order the records in the xls file by ascii code in descending order so that alpha-numeric fields will appear at the top below the header row. This ensures that the first row of data read will define the data type as "varchar" or "nvarchar"
hi all this code is gets alphanumeric values also
using System.Data.OleDb;
string ConnectionString = #"Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + filepath + ";" + "Extended Properties="+(char)34+"Excel 8.0;IMEX=1;"+(char)34;
string CommandText = "select * from [Sheet1$]";
OleDbConnection myConnection = new OleDbConnection(ConnectionString);
myConnection.Open();
OleDbDataAdapter myAdapter = new OleDbDataAdapter(CommandText, myConnection);
ds = null;
ds = new DataSet();
myAdapter.Fill(ds);
This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1. Even when I made the rows read to 0 in the registry, I still had the same problem. This was the only sure fire way to get it to work:
try
{
Console.Write(wsReader.GetDouble(j).ToString());
}
catch //Lame unfixable bug
{
Console.Write(wsReader.GetString(j));
}

Categories

Resources