I'm writing to an Excel file using OLEDB (C#).
All I need is the raw data, without any formatting.
I've noticed that all cells (headers and values) are prefixed by an apostrophe (').
Is there a way to avoid adding it to every text cell?
Here is my connection string:
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
filePath + ";Extended Properties='Excel 8.0;HDR=Yes'";
I've tried using IMEX=1 like this:
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
filePath + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\"";
But after that I receive the error below:
The Microsoft Jet database engine could not find the object
'C:\Temp\New Folder\MF_2009_04_19_2008-11-182009_DMBHCSAM1118.xls'.
Make sure the object exists and that you spell its name and the path name correctly.
Finally, I've tried using IMEX=0 like this:
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
filePath + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=0\"";
This time no exceptions were raised.
Unfortunately, the problem with the apostrophes is still there
(so each of my values looks like '123, 'abc, etc.).
Any ideas?
http://support.microsoft.com/kb/257819 has a statement to the effect that this behaviour might be unavoidable when inserting text into Excel via ADO:
A caution about editing Excel data with ADO: When you insert text data into Excel with ADO, the text value is preceded with a single quote. This may cause problems later in working with the new data.
Is the data explicitly text, or might it be coerced into a numeric format? (clutching at straws...)
Could you just use the Excel DSN? It seems to be pretty ubiquitous. I don't know .NET, so take this with a grain of salt, but here's my connection string for an OLEDB Query straight from a stock table:
"Provider=MSDASQL.1;Persist Security Info=True;Extended Properties
=""DSN=Excel Files;DBQ=" & filePath & "\" & fileName &
";DriverId=1046;MaxBufferSize=2048;PageTimeout=5;"""
And I used this basic INSERT statement:
INSERT INTO rngOutput VALUES (1, 'ABC', '$1.00', 1300)
When I did this, I didn't have any apostrophes in my data range. I'm also using Excel 2007, and I see you're using Excel 8.0 as your driver?
Hopefully that nudges you toward a solution!
In the template file, insert dummy values into the columns that get the apostrophe. For example, put something like dummyname in the Name column and 99 in the Age column. Then, instead of inserting a new row into the template, just update that row (Update ... where Name = 'dummyname' and age = 99).
This has worked for me..
Hope it works for you also!
Remove IMEX=1 from your connection string.
http://www.connectionstrings.com/excel
I know that when entering data into Excel, prefixing it with an apostrophe is an easy way to make it a text field. Are you sure the data does not actually contain the apostrophe? If it's added to the data at entry time, your only option would be to catch the apostrophes at import time and deal with them in some custom code.
Check the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
This decides how many rows should be scanned before deciding the format for the column.
Default is 8, and 0 will force ADO to scan all column values before choosing the appropriate data type.
It may not suit your requirements, particularly if you need to set up formatting in the Excel sheet, but have you considered writing the data out to a CSV file and saving it with an .XLS extension?
The other thing you could try would be to explicitly set the format of your target cells in the Excel sheet to "Text" (through Format > Cells within Excel) before you attempt to load data. By default, the cells will be of type "General", and the driver may be adding the apostrophe to force your data to be displayed as text.
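For the CSV suggestion above, here is a minimal sketch of producing the text yourself. The class and method names (CsvWriter, Escape, ToCsv) are made up for illustration and are not part of any library; the quoting follows the usual CSV convention of doubling embedded quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvWriter
{
    // Quotes a field only when it contains a comma, a quote or a line break.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Joins the rows into CSV text, one line per row.
    public static string ToCsv(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder();
        foreach (string[] row in rows)
        {
            sb.AppendLine(string.Join(",", Array.ConvertAll<string, string>(row, Escape)));
        }
        return sb.ToString();
    }
}
```

The result can then be written with File.WriteAllText(path, csv). No OLEDB is involved, so nothing is there to add the apostrophes, but you also lose any control over cell formatting.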
Try the following hack to resolve the issue. Modify the template as per these instructions:
In the first data row, just below the header row, format the columns in the required format.
Enter some dummy values, such as a space for character columns and 0 for numeric columns.
Hide the first data row that holds the dummy values and save the template.
Now run your insert script using ADO.NET.
-Venkat Kolla
I am trying to read the following excel file with C#.
I've tried both of these connection strings:
First connection string: I get the correct values for the header and null for all the other cells.
sbConnection.Provider = "Microsoft.ACE.OLEDB.12.0";
strExtendedProperties = "Excel 12.0;HDR=Yes;";
Second connection string: I get incorrect values for the header and the correct ones for all the other cells.
sbConnection.Provider = "Microsoft.ACE.OLEDB.12.0";
strExtendedProperties = "Excel 12.0;HDR=Yes;IMEX=1";
Using the extended properties solved my problem with reading mixed-type data (the header). But now it is not reading the decimal, date and percent values in my data.
With both connection strings I get null for some cells even though they contain values. How can I modify the connection string in order to read the Excel file properly?
Any help would be most appreciated.
Check out the following, which should answer your question.
I found it at this source: http://www.codeproject.com/Questions/385351/decimal-data-in-xls-file-not-being-read-and-or-con
"HDR=Yes;" indicates that the first row contains column names, not data. "HDR=No;" indicates the opposite.
"IMEX=1;" tells the driver to always read "intermixed" (numbers, dates, strings etc.) data columns as text. Note that this option might affect Excel sheet write access negatively.
SQL syntax "SELECT [Column Name One], [Column Name Two] FROM [Sheet One$]". I.e. excel worksheet name followed by a "$" and wrapped in "[" "]" brackets.
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.
If the Excel workbook is protected by a password, you cannot open it for data access, even by supplying the correct password with your connection string. If you try, you receive the following error message: "Could not decrypt file."
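To keep the HDR and IMEX options above from getting lost to quoting mistakes, the connection string can be assembled in one place. This is only a sketch: ExcelConnectionStrings.Build is a hypothetical helper, not a framework API, and it assumes the Jet provider and "Excel 8.0" used elsewhere in this thread.

```csharp
using System;

public static class ExcelConnectionStrings
{
    // Builds a Jet connection string for an .xls file.
    // imex: null leaves the IMEX setting out entirely.
    public static string Build(string filePath, bool hasHeader, int? imex = null)
    {
        string extended = "Excel 8.0;HDR=" + (hasHeader ? "Yes" : "No");
        if (imex.HasValue)
        {
            extended += ";IMEX=" + imex.Value;
        }
        // Extended Properties holds several ';'-separated settings, so the whole
        // value is wrapped in double quotes to keep it a single property.
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath +
               ";Extended Properties=\"" + extended + "\"";
    }
}
```

For example, Build(path, true, 1) gives the "read mixed columns as text" variant, and Build(path, true) omits IMEX.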
I am trying to use C# to read excel file which has intermixed datatype. Below is my connection string
var path = //xls location
var MyConnection = new OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0; Data Source='" + path + "';Extended Properties='Excel 8.0;IMEX=1;'");
Research taught me that the complete Extended Properties in the connection string is supposed to be
Excel 8.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text
However, I was informed that in connection string, the TypeGuessRows=0 has no meaning as the value will be taken directly from the Registry. Hence I need to modify the key manually and remove this property from connection string.
The particular registry key that was involved is:
Path:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
Key:
TypeGuessRows
Original value = 8, in order to make it work change into = 0
Without doing this, IMEX won't work even though I add TypeGuessRows=0 to the Extended Properties.
However, my company forbids modifying registry value (strictly). I was told to find alternatives doing this.
In short:
Is there a way to read intermixed datatype excel file without having to modify any registry key (which is quite a common practice)?
Further topic:
Have you experienced this before? Is there any possibility of setting TypeGuessRows=0 from the connection string only, without having to modify the registry key (which would cancel out my premise above)?
If things don't work out with OleDb:
Are there alternatives beside OleDb?
I'd appreciate any advice or suggestions.
Regards
What you can do is require a header in the first row of the Excel file and set the connection string to
var MyConnection = new OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0; Data Source='" + path + "';Extended Properties='Excel 8.0;HDR=No;IMEX=1;'");
The key here is to set HDR=No (no header). Since the sheet does have a header row, each column will then be treated as a string (text), and you can parse or validate each cell value yourself. Of course, you will need to skip or remove the first row, since it contains the header information.
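Roughly, the post-processing could look like this once the sheet has been loaded into a DataTable with HDR=No. TextSheet and its two methods are names I made up for the sketch; it assumes the header cells are unique, since duplicate column names would make the rename throw.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class TextSheet
{
    // Renames the columns from the first row's values, then removes that row.
    public static void PromoteFirstRowToHeader(DataTable table)
    {
        if (table.Rows.Count == 0) return;
        DataRow header = table.Rows[0];
        for (int i = 0; i < table.Columns.Count; i++)
        {
            string name = Convert.ToString(header[i]);
            if (!string.IsNullOrWhiteSpace(name)) table.Columns[i].ColumnName = name.Trim();
        }
        table.Rows.RemoveAt(0);
    }

    // Parses a cell that was read as text; returns null if it is not a number.
    public static double? ParseNumber(object cell)
    {
        double value;
        string text = Convert.ToString(cell, CultureInfo.InvariantCulture);
        return double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value)
            ? value
            : (double?)null;
    }
}
```

Values such as 8201NO then simply come back as null from ParseNumber while the original text is still in the table, so you decide what to do with them.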
Instead of using OleDb, I now use Excel Data Reader. It works great! Highly recommended!
First, I want to say that I'm in deep water here, since I'm just making some changes to code written by someone else in the company. It uses OleDbDataAdapter to "talk" to Excel, and I'm not familiar with that. There is one bug there I just can't track down.
I'm trying to use an OleDbDataAdapter to read in an Excel file with around 450 lines.
In the code it's done like this:
connection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;" + "Data Source='" + path + "';" + "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\"");
connection.Open();
OleDbDataAdapter objAdapter = new OleDbDataAdapter(objCommand.CommandText, connection);
objAdapter.Fill(objDataSet, "Excel");
foreach (DataColumn dataColumn in objTable.Columns) {
if (dataColumn.Ordinal > objDataSet.Tables[0].Columns.Count - 1) {
objDataSet.Tables[0].Columns.Add();
}
objDataSet.Tables[0].Columns[dataColumn.Ordinal].ColumnName = dataColumn.ColumnName;
objImport.Columns.Add(dataColumn.ColumnName);
}
foreach (DataRow dataRow in objDataSet.Tables[0].Rows) {
...
}
Everything seems to be working fine except for one thing. The second column is filled with mostly four-digit numbers like 6739, 3920 and so on, but five rows have alphanumeric values like 8201NO and 8205NO. Those five cells are reported as having blank contents instead of their alphanumeric content. I have checked in Excel, and all the cells in this column are marked as Text.
This is an xls file by the way, and not xlsx.
Does anyone have any clue as to why these cells are shown as blank in the DataRow, while the numeric ones are shown fine? There are other columns with alphanumeric content that are shown just fine.
What's happening is that excel is trying to assign a data type to the spreadsheet column based on the first several values in that column. I suspect that if you look at the properties in that column it will say it is a numerical column.
The problem comes when you start trying to query that spreadsheet using jet. When it thinks it's dealing with a numerical column and it finds a varchar value it quietly returns nothing. Not even a cryptic error message to go off of.
As a possible work around can you move one of the alpha numeric values to the first row of data and then try parsing. I suspect you will start getting values for the alpha numeric rows then...
Take a look at this article. It goes into more detail on this issue. It also talks about a possible workaround, which is:
However, as per JET documentation, we can override the registry setting thru the Connection String, if we set IMEX=1 (as part of Extended Properties), the JET will set the all column type as UNICODE VARCHAR or ADVARWCHAR irrespective of ‘ImportMixedTypes’ key value.
IMEX=1 means "Read mixed data as text."
There are some gotchas, however. Jet will only use the first several rows to determine whether the data is mixed, and if it so happens that these rows are all numeric, you'll get this behaviour.
See connectionstrings.com for details:
Check out the [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] located registry REG_DWORD "TypeGuessRows". That's the key to not letting Excel use only the first 8 rows to guess the columns data type. Set this value to 0 to scan all rows. This might hurt performance. Please also note that adding the IMEX=1 option might cause the IMEX feature to set in after just 8 rows. Use IMEX=0 instead to be sure to force the registry TypeGuessRows=0 (scan all rows) to work.
I would advise against using the OleDb data provider stuff to access Excel if you can help it. I've had nothing but problems, for exactly the reasons that others have pointed out. The performance tends to be atrocious as well when you are dealing with large spreadsheets.
You might try this open source solution:
http://exceldatareader.codeplex.com/
I'm creating a utility to import data from Excel to Oracle database,
I have a fixed template for the excel file,
Now, when I try to import the data using the Jet provider and the ADO.NET OleDb classes, I run into the following problem: some columns are not imported because they contain mixed data types [string and number].
I searched for this problem on the internet and found that the cause is Excel guessing the data types.
The load code:
connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0; Data Source={0};Extended Properties=Excel 8.0;");
string columns = "P_ID, FULL_NAME_AR, job_no, GENDER, BIRTH_DATE, RELIGION, MARITAL_STATUS, NAT_ID, JOB_Name, FIRST_HIRE_DATE, HIRE_DATE, CONTRACT_TYPE, GRADE_CODE, QUALIFICATION";
string sheetName = "[Emps$]";
OleDbCommand command = new OleDbCommand(string.Format("select {0} from {1} where p_id is not null", columns, sheetName), connection);
connection.Open();
dr = command.ExecuteReader();
DataTable table = new DataTable();
table.Load(dr);
What should I do to tell Excel to STOP GUESSING and give me the data as text?
If there is no way to do that, can you help me with any workarounds?
Thanks in advance
I found a solution by adding IMEX=1 to the connection string, but it needs a special format, which is described in the following link.
The IMEX parameter is for columns that use mixed numeric and alpha values.
The Excel driver will typically scan the first several rows
in order to determine what data type to use for each column. If a column is determined to be numeric
based upon a scan of the first several rows, then any rows with alpha characters in this column will
be returned as Null. The IMEX parameter (1 is input mode) forces the data type of the column to
text so that alphanumeric values are handled properly.
Regards
This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1, and always uses a numeric type if the first 8 rows are numbers (again, regardless of IMEX=1). Even when I set the number of rows to read to 0 in the registry, I still had the same problem. This was the only surefire way to get it to work:
try
{
Console.Write(wsReader.GetDouble(j).ToString());
}
catch //Lame unfixable bug
{
Console.Write(wsReader.GetString(j));
}
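A variation on the same idea that avoids using exceptions for control flow is to ask the reader for the raw value and format whatever comes back. This is just a sketch: CellReader.ReadAsText is a hypothetical helper, and it cannot recover a cell the driver has already returned as null.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class CellReader
{
    // Returns the cell as text, whatever type the provider chose for the column.
    public static string ReadAsText(IDataRecord record, int ordinal)
    {
        if (record.IsDBNull(ordinal)) return null;
        object value = record.GetValue(ordinal);
        // Numbers and dates are formatted culture-independently; strings pass through.
        IFormattable formattable = value as IFormattable;
        return formattable != null
            ? formattable.ToString(null, CultureInfo.InvariantCulture)
            : value.ToString();
    }
}
```

In the loop above, the try/catch would then become Console.Write(CellReader.ReadAsText(wsReader, j));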
Can you work from the Excel end? This example, run in Excel, will put mixed data types into an SQL Server table:
Dim cn As New ADODB.Connection
scn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" _
& sFullName _
& ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
cn.Open scn
s = "SELECT Col1, Col2, Col3 INTO [ODBC;Description=TEST;DRIVER=SQL Server;" _
& "SERVER=Some\Instance;Trusted_Connection=Yes;" _
& "DATABASE=test].TableZ FROM [Sheet1$]"
cn.Execute s
An alternative solution is to add or change the setting TypeGuessRows in the registry. By setting its value to 0, the complete document will be scanned.
Unfortunately, the setting may be found in various locations in the registry, depending on which libraries you have installed and their versions.
For instance:
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel]
"TypeGuessRows"=dword:00000000
This will also prevent truncation of textual data longer than 255 characters. This happens if you have a number for TypeGuessRows larger than 0 and the first text longer than 255 characters occurs beyond that number.
See also Setting TypeGuessRows for excel ACE Driver.
I am getting a weird problem. I am using OLEDB for an Excel connection with this
connection string: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Execute.xls;Extended Properties=Excel 8.0;
The Excel file contains columns with string/integer values.
The problem is that sometimes the connection reads the values from the sheet absolutely fine, but sometimes it misses some data values and shows them as System.DBNull.
the behavior is very inconsistent.
please help.
Check out http://blog.lab49.com/archives/196.
My first guess would be to check your regional settings. Number formats differ from one regional setting to another, and this could cause the problem. Although Excel is supposed to manage this for you automatically, sometimes it just doesn't, and it then renders some strange data like those DBNull values.
Here is your problem. IIRC, the driver only reads the first 8 rows of data and determines the data type of the columns based on that.
So let's say, in the first 8 rows of column 1, you only have numbers. The driver will decide that the column is an integer. Then, if it encounters a string in row 9, it will not be able to convert it to an integer and thus return DBNull to you.
There are several things you can do.
Pre-process your spreadsheet and convert everything to strings
Change the registry entry (the location escapes me at the moment) that lets you increase the number of rows the driver uses to determine the data type
Use a commercial Excel reader control
Open the sheet via the Excel Interop library and read the cells on your own
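If you go the pre-processing route, a quick check like the one below can tell you which values are in danger. It is a toy model of the behaviour described above (the first 8 values are all numbers, a later one is not), not the driver's actual algorithm, and TypeGuessCheck.ValuesAtRisk is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TypeGuessCheck
{
    static bool IsNumeric(string s)
    {
        double ignored;
        return double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out ignored);
    }

    // Returns the zero-based indexes of values that a numeric guess would lose:
    // the first sampleRows values are all numeric, but a later one is not.
    public static List<int> ValuesAtRisk(IList<string> column, int sampleRows = 8)
    {
        var atRisk = new List<int>();
        int sample = Math.Min(sampleRows, column.Count);
        for (int i = 0; i < sample; i++)
        {
            // A non-numeric value inside the sample means no numeric guess in this model.
            if (!IsNumeric(column[i])) return atRisk;
        }
        for (int i = sample; i < column.Count; i++)
        {
            if (!IsNumeric(column[i])) atRisk.Add(i);
        }
        return atRisk;
    }
}
```

Any column for which this returns a non-empty list is a candidate for converting to strings before you read it through the driver.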
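Use this code:
// Extended Properties is quoted as one value; IMEX=1 reads mixed columns as text.
string pathcpnn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + textBox1.Text + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1;\"";
OleDbConnection con = new OleDbConnection(pathcpnn);
// textBox2 holds the worksheet name; a sheet is addressed as [Name$].
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("SELECT * FROM [" + textBox2.Text + "$]", con);
DataTable dt = new DataTable();
myDataAdapter.Fill(dt);
dataGridView1.DataSource = dt;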