i am getting a weird problem. i am using OLEDB for excel connection with
connection string = Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Execute.xls;Extended Properties=Excel 8.0;");
excel file contains columns with string/integer values.
the problem is that sometimes connection read values from sheet absolutly fine but sometimes it missed out some data values and shows them as System.DBNull.
the behavior is very inconsistent.
please help.
Check out http://blog.lab49.com/archives/196.
My first guess would be to check for your regional parameters. Number formats would be different from one regional setting to another, and this could cause the problem. Although Excel is supposed to manage it for you automatically, some times it just doesn't as it is confused or something, then render some strange data like those DBNull values.
Here is your problem. IIRC, the driver only reads the first 8 rows of data and determines the data type of the columns based on that.
So let's say, in the first 8 rows of column 1, you only have numbers. The driver will decide that the column is an integer. Then, if it encounters a string in row 9, it will not be able to convert it to an integer and thus return DBNull to you.
There are several things you can do.
Pre-process your spreadsheet and convert everything to strings
There is a registry entry (the location escapes me at the moment), that allows you to increase the number of rows that the driver uses to determine data type.
Use a commercial Excel reader control
Open the sheet via the Excel Interop library and read the cells on your own
use this code
string pathcpnn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source =" + textBox1.Text + ";Extended Properties=\""Excel 8.0;HDR=Yes;IMEX=1;";";
OleDbConnection con = new OleDbConnection(pathcpnn);
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("Select * from[" + textBox2.Text + "$]", con);
myDataAdapter.Fill(dt);
DAtagridview1.datasource=dt;
Related
I have got an Excel file in this form :
Column 1 Column 2 Column 3
data1 data2
data1 data2
data1 data2
data1 data2
data1 data2 data3
That is, the whole Column 3 is empty except for the last row.
I am accessing the Excel file via OleDbDataAdapter, returning a DataTable: here's the code.
query = "SELECT * FROM [" + query + "]";
objDT = new DataTable();
objCmdSQL = this.GetCommand();
objCmdSQL.CommandText = query;
objSQLDad = new OleDbDataAdapter(objCmdSQL);
objSQLDad.Fill(objDT);
return objDT;
The point is, in this scenario my code returns a DataTable with just Column 1 and Column 2.
My guess is that JET engine tries to infer column type by the type of the very first cell in every column; being the first value null, the whole column is ignored.
I tried to fill in zeros and this code is actually returning all three columns; this is obviously the least preferable solution because I have to process large numbers of small files.
Inverting the selection range (from, i.e. "A1:C5" to "C5:A1" ) doesn't work either.
I'm looking for something more elegant.
I have already found a couple of posts discussing type mismatch (varchar cells in int columns and vice versa) but actually haven't found anything related to this one.
Thanks for reading!
edit
Weird behavior again. I have to work on mostly Excel 2003 .xls files, but since this question has been answered I thought I could test my code against Excel 2007 .xslx files.
The connection string is the following:
string strConn = #"Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" + _fileName.Trim() + #";Extended Properties=""Excel 12.0;HDR=No;IMEX=1;""";
I get the "External table is not in the expected format" exception which I reckon is the standard exception when there is a version mismatch between ACE/JET and the file being opened.
The string
Provider=Microsoft.ACE.OLEDB.12.0
means that I am using the most recent version of OLEDB, I took a quick peek around and this version is used everywhere there is need of connecting to .xlsx files.
I have tried with just a vanilla provider ( just Excel 12.0, without IMEX nor HDR ) but I get the same exception.
I am on .NET 2.0.50727 SP2, maybe time to upgrade?
I recreated your situation and following returned the 3 columns correctly. That is, the first two columns fully populated with data and the third containing null until the last row, which had data.
string connString = #"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcel.xls;Extended Properties=""Excel 8.0;HDR=No;IMEX=1"";";
DataTable dt = new DataTable();
OleDbConnection conn = new OleDbConnection(connString);
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn);
adapter.Fill(dt);
Note I used the Access Database Engine(ACE) provider, which succeeded the old Joint Engine Technology(JET) provider, and my results may represent a behavior difference between the two. Of course, if you aren't already using it I suggest using the ACE provider as I believe Microsoft would too. Also, note the connection's Extended Properties:
"HDR=Yes;" indicates that the first
row contains columnnames, not data.
"HDR=No;" indicates the opposite.
"IMEX=1;" tells the driver to always
read "intermixed" (numbers, dates,
strings etc) data columns as text.
Note that this option might affect
excel sheet write access negative.
Let me know if this helps.
First, I want to say that I'm out on deep water here, since I'm just doing some changes to code that is written by someone else in the company, using OleDbDataAdapter to "talk" to Excel and I'm not familiar with that. There is one bug there I just can't follow.
I'm trying to use a OleDbDataAdapter to read in a excel file with around 450 lines.
In the code it's done like this:
connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source='" + path + "';" + "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\"");
connection.Open();
OleDbDataAdapter objAdapter = new OleDbDataAdapter(objCommand.CommandText, connection);
objAdapter.Fill(objDataSet, "Excel");
foreach (DataColumn dataColumn in objTable.Columns) {
if (dataColumn.Ordinal > objDataSet.Tables[0].Columns.Count - 1) {
objDataSet.Tables[0].Columns.Add();
}
objDataSet.Tables[0].Columns[dataColumn.Ordinal].ColumnName = dataColumn.ColumnName;
objImport.Columns.Add(dataColumn.ColumnName);
}
foreach (DataRow dataRow in objDataSet.Tables[0].Rows) {
...
}
Everything seems to be working fine except for one thing. The second column is filled with mostly four digit numbers like 6739, 3920 and so one, but fice rows have alphanumeric values like 8201NO and 8205NO. Those five cells are reported as having blank contents instead of their alphanumeric content. I have checked in excel, and all the cells in this columns are marked as Text.
This is an xls file by the way, and not xlsx.
Do anyone have any clue as why these cells are shown as blank in the DataRow, but the numeric ones are shown fine? There are other columns with alphanumeric content that are shown just fine.
What's happening is that excel is trying to assign a data type to the spreadsheet column based on the first several values in that column. I suspect that if you look at the properties in that column it will say it is a numerical column.
The problem comes when you start trying to query that spreadsheet using jet. When it thinks it's dealing with a numerical column and it finds a varchar value it quietly returns nothing. Not even a cryptic error message to go off of.
As a possible work around can you move one of the alpha numeric values to the first row of data and then try parsing. I suspect you will start getting values for the alpha numeric rows then...
Take a look at this article. It goes into more detail on this issue. it also talks about a possible work around which is:
However, as per JET documentation, we
can override the registry setting thru
the Connection String, if we set
IMEX=1( as part of Extended
Properties), the JET will set the all
column type as UNICODE VARCHAR or
ADVARWCHAR irrespective of
‘ImportMixedTypes’ key value.hey
IMEX=1 means "Read mixed data as text."
There are some gotchas, however. Jet will only use several rows to determine whether the data is mixed, and if so happens these rows are all numeric, you'll get this behaviour.
See connectionstrings.com for details:
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.
I would advise against using the OleDb data provider stuff to access Excel if you can help it. I've had nothing but problems, for exactly the reasons that others have pointed out. The performance tends to be atrocious as well when you are dealing with large spreadsheets.
You might try this open source solution:
http://exceldatareader.codeplex.com/
I'm creating a utility to import data from Excel to Oracle database,
I have a fixed template for the excel file,
Now, when I'm trying to import the data by Jet provider and ADO.Net - Ole connection tools, I found the following problem: there're some columns haven't been imported because there are mixed data types in their columns [string and number],
I looked for this problem on the internet I found the reason is guessing data types from Excel
The load code:
connection = new OleDbConnection(#"Provider=Microsoft.Jet.OLEDB.4.0; Data Source={0};Extended Properties=Excel 8.0;");
string columns = "P_ID, FULL_NAME_AR, job_no, GENDER, BIRTH_DATE, RELIGION, MARITAL_STATUS, NAT_ID, JOB_Name, FIRST_HIRE_DATE, HIRE_DATE, CONTRACT_TYPE, GRADE_CODE, QUALIFICATION";
string sheetName = "[Emps$]";
OleDbCommand command = new OleDbCommand(string.Format("select {0} from {1} where p_id is not null", columns, sheetName), connection);
connection.Open();
dr = command.ExecuteReader();
DataTable table = new DataTable();
table.Load(dr);
What should I do to tell Excel STOP GUESSING and give me the data as Text ?
if there isn't, can you help me with any workarounds ?
Thanks in advance
I found a solution by adding IMEX=1 for the connection string, but there's a special format for it which descriped in the following link.
The IMEX parameter is for columns that use mixed numeric and alpha values.
The Excel driver will typically scan the first several rows
in order to determine what data type to use for each column. If a column is determined to be numeric
based upon a scan of the first several rows, then any rows with alpha characters in this column will
be returned as Null. The IMEX parameter (1 is input mode) forces the data type of the column to
text so that alphanumeric values are handled properly.
Regards
This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1, and always uses a numeric type if the first 8 rows are numbers (again, regardless of IMEX=1). Even when I made the rows read to 0 in the registry, I still had the same problem. This was the only sure fire way to get it to work:
try
{
Console.Write(wsReader.GetDouble(j).ToString());
}
catch //Lame unfixable bug
{
Console.Write(wsReader.GetString(j));
}
Can you work from the excel end? This example run in Excel will put mixed data tyoes into an SQL Server table:
Dim cn As New ADODB.Connection
scn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" _
& sFullName _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
cn.Open scn
s = "SELECT Col1, Col2, Col3 INTO [ODBC;Description=TEST;DRIVER=SQL Server;" _
& "SERVER=Some\Instance;Trusted_Connection=Yes;" _
& "DATABASE=test].TableZ FROM [Sheet1$]"
cn.Execute s
An alternative solution is to add or change the setting TypeGuessRows in the registry. By setting its value to 0, the complete document will be scanned.
Unfortunately, the settings may be found on various locations in the registry, depending on the which libraries and versions of them you have installed.
For instance:
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel]
"TypeGuessRows"=dword:00000000
This will also prevent truncation of textual data longer than 255 characters. This happens if you have a number for TypeGuessRows larger than 0 and the first text longer than 255 characters occurs beyond that number.
See also Setting TypeGuessRows for excel ACE Driver.
I'm writing to Excel file using OLEDB (C#).
What I need is just RAW data format.
I've noticed all cells (headers and values) are prefixed by apostrophe (')
Is it a way to avoid adding them in all text cells?
Here is my connection string:
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
filePath + ";Extended Properties='Excel 8.0;HDR=Yes'";
I've tried use IMEX=1 like this:
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
filePath + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\"";
But after that I'm receiving below error:
The Microsoft Jet database engine could not find the object
'C:\Temp\New Folder\MF_2009_04_19_2008-11-182009_DMBHCSAM1118.xls'.
Make sure the object exists and that you spell its name and the path name correctly.
Finally I've tried use IMEX=0 like this:
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
filePath + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=0\"";
This time no exeptions raised.
Unfortunately there is still problem with apostrophes
(so each my values looks as: '123, 'abc etc...)
Any idea?
http://support.microsoft.com/kb/257819 has a statement to the effect that this behaviour might be unavoidable when inserting text into Excel via ADO:
A caution about editing Excel data with ADO: When you insert text data into Excel with ADO, the text value is preceded with a single quote. This may cause problems later in working with the new data.
Is the data explicitly text, might it be coerced into a numeric format? (clutching at straws...)
Could you just use the Excel DSN? It seems to be pretty ubiquitous. I don't know .NET, so take this with a grain of salt, but here's my connection string for an OLEDB Query straight from a stock table:
"Provider=MSDASQL.1;Persist Security Info=True;Extended Properties
=""DSN=Excel Files;DBQ=" & filePath & "\" & fileName &
";DriverId=1046;MaxBufferSize=2048;PageTimeout=5;"""
And I used this basic INSERT statement:
INSERT INTO rngOutput VALUES (1, 'ABC', '$1.00', 1300)
When I did this, I didn't have any apostrophes in my data range. I'm also using Excel 2007, and I see you're using Excel 8.0 as your driver?
Hopefully that nudges you toward a solution!
Insert some dummy values for the columns which has apostrophe attached in the template file. Say for example for Name column put something like this dummyname, and age column put 99. Instead of inserting a new row in the template just update the row (Update..... where Name = 'dummyname' and age =99).
This has worked for me..
Hope it works for you also!
Remove IMEX=1 from your connection string.
http://www.connectionstrings.com/excel
I know that when entering data into Excel, prefixing it with an apostrophe is an easy way to make it into a text field. Are you sure the data does not actually contain the apostrophe? If it's added to the data at entry time, your only option would be to catch them at import time and dealing with them in some custom code.
Check the resistry
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows
This decides how many rows should be scanned before deciding the format for the column.
Default is 8, and 0 will force ADO to scan all column values before choosing the appropriate data type.
It may not suit your requirements, particularly if you need to set up formatting in the Excel sheet, but have you considered writing the data out to a CSV file and saving it with an .XLS extension?
The other thing you could try would be to explicitly set format data types of your target cells in the Excel sheet to type "Text" (as through Format > Cells within Excel) before you attempt to load data. By default, the cells will be of type "General", and the driver may be adding the apostrophe to force your data to be displayed at text.
Try the following hack to resolve the issue. Modify the template as per the instructions
In the first data row just below the header row. Format the columns in the required format.
Enter some dummy values like space for characters, 0 for numeric values etc.
Hide the first data row that has the dummy values and save the template
Now run your insert script using ADO.net
-Venkat Kolla
I need to access an excel spreadsheet and insert the data from the spreadsheet into a SQL Database. However the Primary Keys are mixed, most are numeric and some are alpha-numeric.
The problem I have is that when the numeric and alpha-numeric Keys are in the same spreadsheet the alpha-numeric cells return blank values, whereas all the other cells return their data without problems.
I am using the OleDb method to access the Excel file. After retrieving the data with a Command string I put the data into a DataAdapter and then I fill a DataSet. I iterate through all the rows (dr) in the first DataTable in the DataSet.
I reference the columns by using, dr["..."].ToString()
If I debug the project in Visual Studio 2008 and I view the "extended properties", by holding my mouse over the "dr" I can view the values of the DataRow, but the Primary Key that should be alpha-numeric is {}. The other values are enclosed in quotes, but the blank value has braces.
Is this a C# problem or an Excel problem?
Has anyone ever encountered this problem before, or maybe found a workaround/fix?
Thanks in advance.
Solution:
Connection String:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended
Properties="Excel 8.0;HDR=Yes;IMEX=1";
HDR=Yes; indicates that the first row contains columnnames, not data. HDR=No; indicates the opposite.
IMEX=1; tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect excel sheet write access negative.
SQL syntax SELECT * FROM [sheet1$]. I.e. excel worksheet name followed by a $ and wrapped in [ ] brackets.
Important:
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance.
If the Excel workbook is protected by a password, you cannot open it for data access, even by supplying the correct password with your connection string. If you try, you receive the following error message: "Could not decrypt file."
The Excel data source picks a column type for the entire column. If one of the cells doesn't match that type exactly, it leaves blanks like that. We had issues where our typist entered a " 8" (a space before the number, so Excel converted it to a string for that cell) in a numeric column. It would make sense to me that it would try the .Net Parse methods as they are more robust, but I guess that's not how the Excel driver works.
Our fix, since we were using database import services, was to log all the rows that 'failed' this way. Then, we went back to the XLS document and re-typed those cells, to ensure the underlying type was correct. (We found just deleting the space didn't fix it--we had to Clear the whole cell first, than re-type the '8'.) Feels hacky and isn't elagent, but that was the best method we found. If the Excel driver can't read it in correctly by itself, there's nothing you can do to get that data out of there once you're in .Net.
Just another case where Office hides the important details from users in the name of simplicity, and therefore making it more difficult when you have to be exact for power uses.
The {} means this is some sort of empty object and not a string. When you hover over the object you should be able to see its type. Likewise, when you use quickwatch to view dr["..."] you should see the object type. What type is the object you receive?
The ItemArray is an Object Array. So I assume that the "column" in the DataRow, that I am trying to reference, is of type object.
For VISTA compatibility you can use EXCEL 12.0 driver in connection string. This should resolve your issue. It did mine.
Solution:
You put HDR=No so that the first row is not considered the column header.
Connection String: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=No;IMEX=1";
You ignore the first row and you acces the data by any means you want (DataTable, DataReader ect). You acces the columns by numeric indexes, instead of column names.
It worked for me. This way you don't have to modify registers!
I answered a similar question here. Here I've copied and pasted the same answer for your convenience:
I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.
First read in the data to get the column names
Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
Read the data in again into this new
dataset. Voila - the scientific
notation is now gone and everything is read in as a string.
Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!
public void ImportSpreadsheet(string path)
{
string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
string connectionString = string.Format(
CultureInfo.CurrentCulture,
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
path,
extendedProperties);
using (OleDbConnection connection = new OleDbConnection(connectionString))
{
using (OleDbCommand command = connection.CreateCommand())
{
command.CommandText = "SELECT * FROM [Worksheet1$]";
connection.Open();
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
using (DataSet columnDataSet = new DataSet())
using (DataSet dataSet = new DataSet())
{
columnDataSet.Locale = CultureInfo.CurrentCulture;
adapter.Fill(columnDataSet);
if (columnDataSet.Tables.Count == 1)
{
var worksheet = columnDataSet.Tables[0];
// Now that we have a valid worksheet read in, with column names, we can create a
// new DataSet with a table that has preset columns that are all of type string.
// This fixes a problem where the OLEDB provider is trying to guess the data types
// of the cells and strange data appears, such as scientific notation on some cells.
dataSet.Tables.Add("WorksheetData");
DataTable tempTable = dataSet.Tables[0];
foreach (DataColumn column in worksheet.Columns)
{
tempTable.Columns.Add(column.ColumnName, typeof(string));
}
adapter.Fill(dataSet, "WorksheetData");
if (dataSet.Tables.Count == 1)
{
worksheet = dataSet.Tables[0];
foreach (var row in worksheet.Rows)
{
// TODO: Consume some data.
}
}
}
}
}
}
}
Order the records in the xls file by ascii code in descending order so that alpha-numeric fields will appear at the top below the header row. This ensures that the first row of data read will define the data type as "varchar" or "nvarchar"
hi all this code is gets alphanumeric values also
using System.Data.OleDb;
string ConnectionString = #"Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + filepath + ";" + "Extended Properties="+(char)34+"Excel 8.0;IMEX=1;"+(char)34;
string CommandText = "select * from [Sheet1$]";
OleDbConnection myConnection = new OleDbConnection(ConnectionString);
myConnection.Open();
OleDbDataAdapter myAdapter = new OleDbDataAdapter(CommandText, myConnection);
ds = null;
ds = new DataSet();
myAdapter.Fill(ds);
This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1. Even when I made the rows read to 0 in the registry, I still had the same problem. This was the only sure fire way to get it to work:
try
{
Console.Write(wsReader.GetDouble(j).ToString());
}
catch //Lame unfixable bug
{
Console.Write(wsReader.GetString(j));
}