I have an Excel file in this form:
Column 1 Column 2 Column 3
data1 data2
data1 data2
data1 data2
data1 data2
data1 data2 data3
That is, the whole Column 3 is empty except for the last row.
I am accessing the Excel file via OleDbDataAdapter, returning a DataTable: here's the code.
query = "SELECT * FROM [" + query + "]";
objDT = new DataTable();
objCmdSQL = this.GetCommand();
objCmdSQL.CommandText = query;
objSQLDad = new OleDbDataAdapter(objCmdSQL);
objSQLDad.Fill(objDT);
return objDT;
The point is, in this scenario my code returns a DataTable with just Column 1 and Column 2.
My guess is that the JET engine tries to infer each column's type from the very first cell in that column; since the first value is null, the whole column is ignored.
I tried filling in zeros, and then the code does return all three columns; this is obviously the least preferable solution because I have to process large numbers of small files.
Inverting the selection range (e.g. from "A1:C5" to "C5:A1") doesn't work either.
I'm looking for something more elegant.
I have already found a couple of posts discussing type mismatch (varchar cells in int columns and vice versa) but actually haven't found anything related to this one.
Thanks for reading!
edit
Weird behavior again. I mostly have to work on Excel 2003 .xls files, but since this question has been answered I thought I could test my code against Excel 2007 .xlsx files.
The connection string is the following:
string strConn = @"Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" + _fileName.Trim() + @";Extended Properties=""Excel 12.0;HDR=No;IMEX=1;""";
I get the "External table is not in the expected format" exception which I reckon is the standard exception when there is a version mismatch between ACE/JET and the file being opened.
The string
Provider=Microsoft.ACE.OLEDB.12.0
means that I am using the most recent version of OLEDB. I took a quick peek around, and this version is used everywhere there is a need to connect to .xlsx files.
I have tried with just a vanilla provider (just Excel 12.0, without IMEX or HDR) but I get the same exception.
I am on .NET 2.0.50727 SP2, maybe time to upgrade?
I recreated your situation and the following returned the 3 columns correctly. That is, the first two columns fully populated with data and the third containing null until the last row, which had data.
string connString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcel.xls;Extended Properties=""Excel 8.0;HDR=No;IMEX=1"";";
DataTable dt = new DataTable();
OleDbConnection conn = new OleDbConnection(connString);
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn);
adapter.Fill(dt);
Note I used the Access Database Engine (ACE) provider, which succeeded the old Joint Engine Technology (JET) provider, and my results may represent a behavior difference between the two. Of course, if you aren't already using it, I suggest using the ACE provider, as I believe Microsoft would too. Also, note the connection's Extended Properties:
"HDR=Yes;" indicates that the first row contains column names, not data. "HDR=No;" indicates the opposite.
"IMEX=1;" tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect Excel sheet write access negatively.
Let me know if this helps.
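As a rough sketch of how the Extended Properties above fit together, the helper below picks the Excel version token from the file extension and fills in HDR and IMEX. The class and method names are made up for illustration, and the extension-to-token mapping (".xls" to "Excel 8.0", ".xlsx" to "Excel 12.0 Xml") simply mirrors the connection strings quoted in this thread; it is an assumption that this covers your files.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of any library): builds an ACE connection
// string for an Excel workbook from its path.
public static class ExcelConnectionStrings
{
    public static string Build(string path, bool hasHeader)
    {
        string extension = Path.GetExtension(path).ToLowerInvariant();
        string excelVersion;
        switch (extension)
        {
            case ".xls":
                excelVersion = "Excel 8.0";      // Excel 97-2003 workbooks
                break;
            case ".xlsx":
                excelVersion = "Excel 12.0 Xml"; // Excel 2007+ workbooks
                break;
            default:
                throw new ArgumentException("Unsupported extension: " + extension, "path");
        }

        // HDR: is the first row a header? IMEX=1: read mixed columns as text.
        string extended = string.Format(
            "{0};HDR={1};IMEX=1", excelVersion, hasHeader ? "Yes" : "No");

        return string.Format(
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\";",
            path, extended);
    }
}
```

The result can be passed straight to new OleDbConnection(...), for example ExcelConnectionStrings.Build(@"C:\MyExcel.xls", false).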
Initially I had an issue with the data type "guesses" when dealing with the jet driver (through oledb). If a sheet had mixed types, it would bring in null/empty values.
-Edit-
There is an IMEX setting in the connection string as well as in the registry that will tell jet/ace to use text for columns with multiple data types. This way if the first 6 rows have an integer value and the 7th cell has a text value, there won't be a type cast failure. There is also a setting in the registry (and connection string) that will allow you to say how many rows jet should use for sampling.
-end edit-
I changed the connection string, and the registry settings on the server. So now the program is reading fine. It will read values as text, and not use {n} rows for sampling. I thought it was working fine.
Now I have a data source that lists files in order to be read. If I have multiple files in there, it will have the same type casting issues... or at least the same symptoms. If I upload the files one at a time without using the queue then it works fine. It's when I have multiple files in a row that it seems to have the type casting issue.
I'm not really sure what is causing this to happen when reading multiple files in a row, but not when reading one at a time. The connection opens, reads all the data, and then closes... so I don't think it has to do with that.
I am just looking for any ideas. It was hard enough to find the original problem. Working with Jet seems to be asking for a butt ache.
Added relevant code as per request
public static readonly String CONNECTION_STRING = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES; ReadOnly=True;IMEX=1;\"";
private System.Data.DataTable Query(String worksheetName, String selectList = "*")
{
    DataTable table = new DataTable();
    _connection.Open();
    var query = String.Format(Constants.DATA_QUERY, selectList, worksheetName);
    new OleDbDataAdapter(query, _connection).Fill(table);
    _connection.Close();
    return table;
}
I'd recommend using a native library if possible, something like Excel Data Reader or EPPlus instead of OLEDB
I found the solution here
https://www.codeproject.com/Tips/702769/How-to-Get-Data-from-Multiple-Workbooks-using-One
Provider setup:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\path\fileName1.xls;Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
The SQL Statement must be set like this:
Select * From [Hoja1$]
UNION ALL
Select * From [Hoja1$] IN 'C:\path\fileName2.xls' 'Excel 8.0;HDR=Yes;IMEX=1'
If you want to make an inner join
Select * from [Hoja1$] as a
INNER JOIN (select * from [Hoja1$] IN 'C:\path\fileName2.xls' 'Excel 8.0;HDR=Yes;IMEX=1') as b
ON a.FOLIO=b.FOLIO
I'm trying to read an Excel sheet data to a datatable for binding to a GridView. My Excel sheet contains data as follows,
ID Value1 Value2
-------------------------------------------------
1 $312976.97530297 $30790.0614862584
etc
I'm using the following code to read the values to a datatable.
DataTable table = new DataTable();
string filePath = @"D:\Book1.xlsx";
string strConn = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text\"", filePath);
using (OleDbConnection dbConnection = new OleDbConnection(strConn))
{
    using (OleDbDataAdapter dbAdapter = new OleDbDataAdapter("SELECT ID, value1,value2 FROM [Sheet1$]", dbConnection))
        dbAdapter.Fill(table);
}
Problem: since value1 and value2 contain the $ symbol, the data adapter retrieves both values without full precision, i.e. instead of 312976.97530297, it returns only 312976.9753.
Note:
1. I cannot change the input Excel sheet, since the end user will upload it to the web site.
2. If I remove the $ symbol in the Excel sheet, it returns the full precision, but the $ is present in the input sheet.
3. I tried using Microsoft.Office.Interop.Excel and it works, but the performance is very low.
Alternatively, I could format all the Excel cells as Text/General type before filling the datatable. Does anyone know how to do that?
Please suggest a method using OleDbDataAdapter.
Thanks in Advance,
Wilson.
Maybe you should give http://epplus.codeplex.com/releases/view/79802 a try. It can read data very fast, at least compared to interop. Also, interop is not reliable enough for a webservice.
For your problem: maybe there is a way to escape the "$".
Microsoft advises using the double or float data type for currencies. What format do you use?
@All,
Finally I have done it in a complex way :),
1. Format all the Excel columns to General type after uploading, using interop.
2. Read the Excel file into a datatable using OleDbDataAdapter (now I'm getting the full precision in the datatable, since the cell type is General).
3. Convert the datatable row type to string, since I need to insert the datatable values into a SQL table using a UDT (without converting we would lose the precision here as well).
Anyway, it's working fine now (an alternative approach is always welcome :) )
Wilz...
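One possible alternative to the string round trip in step 3, sketched under a loud assumption: that the cell reaches .NET as text with the "$" still attached (for example "$312976.97530297"). Nothing in this thread confirms the driver delivers it that way. If it does, decimal.Parse with NumberStyles.Currency keeps every digit. The class name is hypothetical, and "en-US" is assumed as the culture whose currency symbol is "$".

```csharp
using System.Globalization;

// Hypothetical helper: turns currency-formatted cell text into a decimal
// without rounding to four decimal places.
public static class CurrencyText
{
    public static decimal Parse(string cellText)
    {
        // NumberStyles.Currency accepts the currency symbol, thousands
        // separators and a decimal point; decimal keeps up to 28-29 digits.
        return decimal.Parse(
            cellText,
            NumberStyles.Currency,
            CultureInfo.GetCultureInfo("en-US"));
    }
}
```

For instance, CurrencyText.Parse("$312976.97530297") yields the decimal 312976.97530297, which can then be passed to SQL Server as a decimal parameter instead of a string.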
I'm creating a utility to import data from Excel into an Oracle database.
I have a fixed template for the Excel file.
Now, when I try to import the data with the Jet provider and the ADO.NET OleDb classes, I run into the following problem: some columns are not imported because they contain mixed data types (string and number).
I looked for this problem on the internet and found that the cause is the driver guessing data types from the Excel data.
The load code:
connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0; Data Source={0};Extended Properties=Excel 8.0;");
string columns = "P_ID, FULL_NAME_AR, job_no, GENDER, BIRTH_DATE, RELIGION, MARITAL_STATUS, NAT_ID, JOB_Name, FIRST_HIRE_DATE, HIRE_DATE, CONTRACT_TYPE, GRADE_CODE, QUALIFICATION";
string sheetName = "[Emps$]";
OleDbCommand command = new OleDbCommand(string.Format("select {0} from {1} where p_id is not null", columns, sheetName), connection);
connection.Open();
dr = command.ExecuteReader();
DataTable table = new DataTable();
table.Load(dr);
What should I do to tell Excel to STOP GUESSING and give me the data as text?
If there isn't a way, can you help me with any workarounds?
Thanks in advance
I found a solution by adding IMEX=1 to the connection string, but there's a special format for it, which is described in the following link.
The IMEX parameter is for columns that use mixed numeric and alpha values. The Excel driver will typically scan the first several rows in order to determine what data type to use for each column. If a column is determined to be numeric based upon a scan of the first several rows, then any rows with alpha characters in this column will be returned as Null. The IMEX parameter (1 is input mode) forces the data type of the column to text so that alphanumeric values are handled properly.
Regards
This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1, and always uses a numeric type if the first 8 rows are numbers (again, regardless of IMEX=1). Even when I made the rows read to 0 in the registry, I still had the same problem. This was the only sure fire way to get it to work:
try
{
    Console.Write(wsReader.GetDouble(j).ToString());
}
catch //Lame unfixable bug
{
    Console.Write(wsReader.GetString(j));
}
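A variation on the try/catch above that avoids using exceptions for control flow: read the cell with GetValue, which returns whatever type the driver chose, and format that. This is only a sketch; the CellText class is hypothetical, and formatting with the invariant culture is my assumption, not something the answer specifies.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts a cell value of unknown type to text.
public static class CellText
{
    public static string FromValue(object value)
    {
        // Empty cells come back as DBNull from a data reader.
        if (value == null || value == DBNull.Value)
        {
            return string.Empty;
        }

        // Numbers and dates implement IFormattable; strings do not.
        IFormattable formattable = value as IFormattable;
        return formattable != null
            ? formattable.ToString(null, CultureInfo.InvariantCulture)
            : value.ToString();
    }
}
```

With the reader from the snippet above, the call would be Console.Write(CellText.FromValue(wsReader.GetValue(j)));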
Can you work from the Excel end? This example, run in Excel, will put mixed data types into an SQL Server table:
Dim cn As New ADODB.Connection
scn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" _
& sFullName _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
cn.Open scn
s = "SELECT Col1, Col2, Col3 INTO [ODBC;Description=TEST;DRIVER=SQL Server;" _
& "SERVER=Some\Instance;Trusted_Connection=Yes;" _
& "DATABASE=test].TableZ FROM [Sheet1$]"
cn.Execute s
An alternative solution is to add or change the setting TypeGuessRows in the registry. By setting its value to 0, the complete document will be scanned.
Unfortunately, the setting may be found in various locations in the registry, depending on which libraries (and which versions of them) you have installed.
For instance:
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel]
"TypeGuessRows"=dword:00000000
This will also prevent truncation of textual data longer than 255 characters. This happens if you have a number for TypeGuessRows larger than 0 and the first text longer than 255 characters occurs beyond that number.
See also Setting TypeGuessRows for excel ACE Driver.
I am getting a weird problem. I am using OLEDB for an Excel connection with
connection string = Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Execute.xls;Extended Properties=Excel 8.0;
The Excel file contains columns with string/integer values.
The problem is that sometimes the connection reads values from the sheet absolutely fine, but sometimes it misses some data values and shows them as System.DBNull.
The behavior is very inconsistent.
Please help.
Check out http://blog.lab49.com/archives/196.
My first guess would be to check your regional settings. Number formats differ from one regional setting to another, and this could cause the problem. Although Excel is supposed to manage it for you automatically, sometimes it just doesn't, and it then renders strange data like those DBNull values.
Here is your problem. IIRC, the driver only reads the first 8 rows of data and determines the data type of the columns based on that.
So let's say, in the first 8 rows of column 1, you only have numbers. The driver will decide that the column is an integer. Then, if it encounters a string in row 9, it will not be able to convert it to an integer and thus return DBNull to you.
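To make the failure mode concrete, here is a toy model of the behavior just described. It is not the driver's actual algorithm (the real rules are more involved); it only assumes, as above, that a column is typed from its first sampleRows cells and that later cells which don't fit come back as DBNull. All names are hypothetical.

```csharp
using System;
using System.Globalization;

// Toy model only: imitates "sample the first N rows, then coerce the rest".
public static class TypeGuessModel
{
    public static object[] ReadColumn(string[] cells, int sampleRows)
    {
        int sampled = Math.Min(sampleRows, cells.Length);

        // The column counts as numeric if every sampled, non-empty cell parses.
        bool numeric = sampled > 0;
        for (int i = 0; i < sampled; i++)
        {
            double ignored;
            if (!string.IsNullOrEmpty(cells[i]) &&
                !double.TryParse(cells[i], NumberStyles.Float, CultureInfo.InvariantCulture, out ignored))
            {
                numeric = false;
                break;
            }
        }

        var result = new object[cells.Length];
        for (int i = 0; i < cells.Length; i++)
        {
            double number;
            if (!numeric)
            {
                result[i] = cells[i];     // text column: keep everything
            }
            else if (double.TryParse(cells[i], NumberStyles.Float, CultureInfo.InvariantCulture, out number))
            {
                result[i] = number;       // fits the guessed type
            }
            else
            {
                result[i] = DBNull.Value; // does not fit: value is lost
            }
        }
        return result;
    }
}
```

With eight numeric cells followed by "abc" and sampleRows set to 8, the ninth entry in this model is DBNull; sampling all nine rows keeps it as text.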
There are several things you can do.
1. Pre-process your spreadsheet and convert everything to strings.
2. There is a registry entry (the location escapes me at the moment) that allows you to increase the number of rows that the driver uses to determine the data type.
3. Use a commercial Excel reader control.
4. Open the sheet via the Excel Interop library and read the cells on your own.
Use this code:
// IMEX=1 makes the driver read mixed-type columns as text.
string pathcpnn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + textBox1.Text + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\";";
OleDbConnection con = new OleDbConnection(pathcpnn);
// textBox2 holds the worksheet name; the trailing $ is required.
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("Select * from [" + textBox2.Text + "$]", con);
DataTable dt = new DataTable();
myDataAdapter.Fill(dt);
dataGridView1.DataSource = dt;
I need to access an excel spreadsheet and insert the data from the spreadsheet into a SQL Database. However the Primary Keys are mixed, most are numeric and some are alpha-numeric.
The problem I have is that when the numeric and alpha-numeric Keys are in the same spreadsheet the alpha-numeric cells return blank values, whereas all the other cells return their data without problems.
I am using the OleDb method to access the Excel file. After retrieving the data with a Command string I put the data into a DataAdapter and then I fill a DataSet. I iterate through all the rows (dr) in the first DataTable in the DataSet.
I reference the columns by using, dr["..."].ToString()
If I debug the project in Visual Studio 2008 and I view the "extended properties", by holding my mouse over the "dr" I can view the values of the DataRow, but the Primary Key that should be alpha-numeric is {}. The other values are enclosed in quotes, but the blank value has braces.
Is this a C# problem or an Excel problem?
Has anyone ever encountered this problem before, or maybe found a workaround/fix?
Thanks in advance.
Solution:
Connection String:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
HDR=Yes; indicates that the first row contains column names, not data. HDR=No; indicates the opposite.
IMEX=1; tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect Excel sheet write access negatively.
SQL syntax SELECT * FROM [sheet1$]. I.e. excel worksheet name followed by a $ and wrapped in [ ] brackets.
Important:
Check out the REG_DWORD "TypeGuessRows" located under [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel]. That's the key to not letting Excel use only the first 8 rows to guess the column's data type. Set this value to 0 to scan all rows. This might hurt performance.
If the Excel workbook is protected by a password, you cannot open it for data access, even by supplying the correct password with your connection string. If you try, you receive the following error message: "Could not decrypt file."
The Excel data source picks a column type for the entire column. If one of the cells doesn't match that type exactly, it leaves blanks like that. We had issues where our typist entered a " 8" (a space before the number, so Excel converted it to a string for that cell) in a numeric column. It would make sense to me that it would try the .Net Parse methods as they are more robust, but I guess that's not how the Excel driver works.
Our fix, since we were using database import services, was to log all the rows that 'failed' this way. Then, we went back to the XLS document and re-typed those cells, to ensure the underlying type was correct. (We found just deleting the space didn't fix it: we had to Clear the whole cell first, then re-type the '8'.) It feels hacky and isn't elegant, but that was the best method we found. If the Excel driver can't read it in correctly by itself, there's nothing you can do to get that data out of there once you're in .Net.
Just another case where Office hides the important details from users in the name of simplicity, thereby making things more difficult when you have to be exact for power uses.
The {} means this is some sort of empty object and not a string. When you hover over the object you should be able to see its type. Likewise, when you use quickwatch to view dr["..."] you should see the object type. What type is the object you receive?
The ItemArray is an Object Array. So I assume that the "column" in the DataRow, that I am trying to reference, is of type object.
For VISTA compatibility you can use EXCEL 12.0 driver in connection string. This should resolve your issue. It did mine.
Solution:
You put HDR=No so that the first row is not considered the column header.
Connection String: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FilePath;Extended Properties="Excel 8.0;HDR=No;IMEX=1";
You ignore the first row and access the data by any means you want (DataTable, DataReader, etc.). You access the columns by numeric index instead of by column name.
It worked for me. This way you don't have to modify the registry!
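A minimal sketch of that approach, assuming the sheet has already been loaded into a DataTable with HDR=No, so row 0 holds the header text and the columns only have generated names. The HeaderlessRows class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: reads one column by numeric index from a table that
// was filled with HDR=No, skipping the first row (the header text).
public static class HeaderlessRows
{
    public static List<string> ReadColumn(DataTable table, int columnIndex)
    {
        var values = new List<string>();
        for (int i = 1; i < table.Rows.Count; i++) // start at 1: row 0 is the header
        {
            // Convert.ToString turns DBNull into an empty string.
            values.Add(Convert.ToString(table.Rows[i][columnIndex]));
        }
        return values;
    }
}
```

For a table filled by an OleDbDataAdapter, HeaderlessRows.ReadColumn(table, 0) would return the first column's data without the header cell.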
I answered a similar question here. Here I've copied and pasted the same answer for your convenience:
I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.
1. First read in the data to get the column names.
2. Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
3. Read the data in again into this new dataset. Voila - the scientific notation is now gone and everything is read in as a string.
Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!
public void ImportSpreadsheet(string path)
{
    string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
    string connectionString = string.Format(
        CultureInfo.CurrentCulture,
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
        path,
        extendedProperties);
    using (OleDbConnection connection = new OleDbConnection(connectionString))
    {
        using (OleDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT * FROM [Worksheet1$]";
            connection.Open();
            using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
            using (DataSet columnDataSet = new DataSet())
            using (DataSet dataSet = new DataSet())
            {
                columnDataSet.Locale = CultureInfo.CurrentCulture;
                adapter.Fill(columnDataSet);
                if (columnDataSet.Tables.Count == 1)
                {
                    var worksheet = columnDataSet.Tables[0];
                    // Now that we have a valid worksheet read in, with column names, we can create a
                    // new DataSet with a table that has preset columns that are all of type string.
                    // This fixes a problem where the OLEDB provider is trying to guess the data types
                    // of the cells and strange data appears, such as scientific notation on some cells.
                    dataSet.Tables.Add("WorksheetData");
                    DataTable tempTable = dataSet.Tables[0];
                    foreach (DataColumn column in worksheet.Columns)
                    {
                        tempTable.Columns.Add(column.ColumnName, typeof(string));
                    }
                    adapter.Fill(dataSet, "WorksheetData");
                    if (dataSet.Tables.Count == 1)
                    {
                        worksheet = dataSet.Tables[0];
                        foreach (var row in worksheet.Rows)
                        {
                            // TODO: Consume some data.
                        }
                    }
                }
            }
        }
    }
}
Order the records in the xls file by ascii code in descending order so that alpha-numeric fields will appear at the top below the header row. This ensures that the first row of data read will define the data type as "varchar" or "nvarchar"
Hi all, this code also gets alphanumeric values:
using System.Data.OleDb;
string ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source=" + filepath + ";" + "Extended Properties="+(char)34+"Excel 8.0;IMEX=1;"+(char)34;
string CommandText = "select * from [Sheet1$]";
OleDbConnection myConnection = new OleDbConnection(ConnectionString);
myConnection.Open();
OleDbDataAdapter myAdapter = new OleDbDataAdapter(CommandText, myConnection);
ds = null;
ds = new DataSet();
myAdapter.Fill(ds);