OLEDB Excel read cannot process data that starts with an apostrophe ( ' ) - C#

The following code reads an Excel sheet and copies all the data into a C# DataTable:
string strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=2\"";
conn = new OleDbConnection(strConn);
conn.Open();
string strExcel = "";
OleDbDataAdapter myCommand = null;
DataSet ds = null;
strExcel = "select * from [Sheet1$]";
myCommand = new OleDbDataAdapter(strExcel, strConn);
ds = new DataSet();
myCommand.Fill(ds, "table1");
DataTable DT = ds.Tables[0];
When I inspect DataTable DT, I find that Excel values that start with an apostrophe (e.g. '0010, '0026, etc.) are not read; they come through as empty cells in DT.
Is there a way to solve this?

This is one of the fun things about working with Excel VBA.
The only way to see whether there is an apostrophe is with this syntax:
Dim s As String
Cells(1, 2).Value = "'abc"
s = Cells(1, 2).Formula
s = Cells(1, 2).PrefixCharacter
When you step through this VBA code, the apostrophe only shows up if you include PrefixCharacter in your output.
So the only solution I know of is to loop through ALL THE CELLS, replace the apostrophe with something silly like &#8217; (the HTML apostrophe), and then fix it up in your database.
And if you feel like you've been slapped in the face, yes, VBA does that to you now and then. This is not an OLEDB issue but an Excel issue.

Related

How to insert data from excel to datatable which has headers?

I am reading an Excel file with around a million records. First I query my (empty) table to get a DataTable; the query uses aliases so that the column names match the ones defined in my Excel sheet.
var dal = new clsConn();
var sqlQuery = "SELECT FETAPE_THEIR_TRANDATE \"Date\" ,ISSUER Issuer, ISSU_BRAN Branch , STAN_NUMB STAN, TERMID TermID, ACQUIRER Acquirer,DEBIT_AMOUNT Debit,CREDIT_AMOUNT Credit,CARD_NUMB \"Card Number\" , DESCRIPTION Description FROM ALLTRANSACTIONS";
var returntable = dal.ReadData(sqlQuery);
DataRow ds = returntable.NewRow();
var dtExcelData = returntable;
So my datatable looks like this,
Then i read records from excel sheet
OleDbConnection con = null;
if (ext == ".xls")
{
con = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filepath + ";Extended Properties=Excel 8.0;");
}
else if (ext == ".xlsx")
{
con = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filepath + ";Extended Properties=\"Excel 12.0;IMEX=2;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text\"");
}
con.Open();
dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string getExcelSheetName = dt.Rows[0]["Table_Name"].ToString();
//OleDbCommand ExcelCommand = new OleDbCommand(@"SELECT * FROM [" + getExcelSheetName + @"]", con);
OleDbCommand ExcelCommand = new OleDbCommand("SELECT F1, F2, F3, F4, F5,F6,F7,F8,F9,F10 FROM [Sheet1$]", con);
OleDbDataAdapter ExcelAdapter = new OleDbDataAdapter(ExcelCommand);
try
{
ExcelAdapter.Fill(dtExcelData); //Here I give the datatable which i made previously
}
catch (Exception ex)
{
//lblAlert2.CssClass = "message-error";
//lblAlert2.Text = ex.Message;
}
It reads successfully and fills the DataTable, but it creates its own columns in the DataTable (F1 to F10). How can I move this data so that it lines up exactly with the columns I defined in the DataTable?
How can I stop it from creating the extra columns (F1, F2, ..., F10)?
Any workaround would be appreciated, or please explain what I am doing wrong and how I can achieve this.
UPDATE :
My Excel file looks like this
The Microsoft.ACE.OLEDB.12.0 driver will handle both types of Excel spreadsheet with the same Extended Properties, i.e. "Excel 12.0" will open both .xls and .xlsx.
Leave HDR=NO, as OLEDB expects the headers in the first row and they are actually in row 11.
Sadly "TypeGuessRows=0;ImportMixedTypes=Text" is completely ignored by Microsoft.ACE.OLEDB.12.0; you've got to play around with the registry (yuk). Change your IMEX=2 to IMEX=1 to ensure that mixed data types are handled as text.
Change back to using "Select * From [Sheet1$]", and then I'm afraid you are going to have to handle the source data manually.
OleDbConnection con = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filepath + ";Extended Properties=\"Excel 12.0;IMEX=1;HDR=NO\"");
con.Open();
DataTable dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string getExcelSheetName = (string)dt.Rows[0]["Table_Name"];
DataTable xlWorksheet = new DataTable();
xlWorksheet.Load(new OleDbCommand("Select * From [" + getExcelSheetName + "]", con).ExecuteReader());
//More than 11 rows implies 11 header rows and at least 1 data row
if (xlWorksheet.Rows.Count > 11 && xlWorksheet.Columns.Count >= 10)
{
for (int nRow = 11; nRow < xlWorksheet.Rows.Count; nRow++)
{
DataRow returnRow = returntable.NewRow();
for (int nColumn = 0; nColumn < 10; nColumn++)
{
//Note you will probably get conversion problems here that you will have to handle
returnRow[nColumn] = xlWorksheet.Rows[nRow].ItemArray[nColumn];
}
returntable.Rows.Add(returnRow);
}
}
I'm guessing you simply want to add the Excel data to your ALLTRANSACTIONS table? You don't specify, but it seems the likely goal. If so, this is a terrible way to do it. You don't need to read the whole table into memory, append data, and then update the database. All you need to do is read the Excel file and insert the data into the Oracle table.
Some thoughts: your returntable will contain data, so if you just want the structure of the table, add a "Where RowNum=0" to the Select statement. To add the data to your Oracle database you could 1) convert to the Oracle Data Provider (ODP) and then use the OracleBulkCopy class, or 2) simply modify the above to insert row by row as you read the data. As long as you don't have a LOT of data in your Excel spreadsheet, option 2 will work just fine. Having said that, a million rows is a LOT, so it is perhaps not the best option. You will also need to validate the input, as Excel is not really the best data source.
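The conversion problems mentioned in the loop above can be handled in one place. The following is only a sketch, assuming the worksheet was loaded as text (IMEX=1) and that the target DataTable has typed columns in the same order as the worksheet; WorksheetCopier, CopyRows and their parameters are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class WorksheetCopier
{
    // Copies rows from 'source' into 'target' by column position, skipping the
    // first 'headerRows' rows. Each value is converted to the target column's
    // type; rows that fail conversion are skipped and counted.
    // Assumption: target columns allow nulls and use IConvertible types.
    public static int CopyRows(DataTable source, DataTable target, int headerRows, int columnCount)
    {
        int skipped = 0;
        for (int r = headerRows; r < source.Rows.Count; r++)
        {
            DataRow newRow = target.NewRow();
            bool ok = true;
            for (int c = 0; c < columnCount && ok; c++)
            {
                object raw = source.Rows[r][c];
                string text = raw as string;
                // Treat DBNull and blank text as a database null.
                if (raw == DBNull.Value || (text != null && text.Trim().Length == 0))
                {
                    newRow[c] = DBNull.Value;
                    continue;
                }
                try
                {
                    newRow[c] = Convert.ChangeType(raw, target.Columns[c].DataType, CultureInfo.InvariantCulture);
                }
                catch (FormatException) { ok = false; }
                catch (InvalidCastException) { ok = false; }
                catch (OverflowException) { ok = false; }
            }
            if (ok) target.Rows.Add(newRow);
            else skipped++;
        }
        return skipped;
    }
}
```

With the names from the answer, a call might look like WorksheetCopier.CopyRows(xlWorksheet, returntable, 11, 10); the returned count tells you how many rows need manual attention.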

The Microsoft Jet database engine could not find the object 'Sheet1$_'

I am reading data from an Excel file. When I read a normal Excel file it works fine, but when I read an Excel file that has columns like the ones shown below, it does not find the worksheet and throws an exception:
The Microsoft Jet database engine could not find the object 'Sheet1$_'. Make sure the object exists and that you spell its name and the path name correctly.
My Code to read the excel is-
private static DataTable getExcelData(string ExcelPath)
{
OleDbConnection con;
string connectionString;
string[] pathArray = ExcelPath.Split('.');
var Extention = pathArray[pathArray.Length - 1];
if (Extention == "xlsx")
//read a 2007 file
connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" +
ExcelPath + ";Extended Properties=\"Excel 8.0;HDR=YES;\"";
else
//read a 97-2003 file
connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
ExcelPath + ";Extended Properties=Excel 8.0;";
con = new OleDbConnection(connectionString);
if (con.State == ConnectionState.Closed)
{
con.Open();
}
DataTable dbSchema = con.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, null);
var firstSheetName = dbSchema.Rows[0]["TABLE_NAME"];
OleDbDataAdapter cmd = new OleDbDataAdapter("select * from [" + firstSheetName + "] Where NOT [Event Code]=''", con);
DataSet ds = new DataSet();
cmd.Fill(ds);
con.Close();
return ds.Tables[0];
}
}
I have to get all the columns inside Mon,Tues etc.
GetOleDbSchemaTable also returns hidden tables in your Excel file: a name like Sheet1$_ usually indicates a hidden table created when you apply a filter on Sheet1$.
You need to change your code: search for a table whose name ends with $ and use that to set firstSheetName.
Please note that OLEDB does not preserve the order in which the sheets appear in Excel.
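As a rough sketch of the "name ends with $" check, the helper below filters a schema table such as the one returned by GetOleDbSchemaTable. ExcelSchemaHelper and GetWorksheetNames are hypothetical names, and the handling of quoted names (ending in $') is an assumption about how names containing spaces are reported.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ExcelSchemaHelper
{
    // Returns only the schema entries that look like real worksheets.
    // Assumption: worksheet names end with "$", or with "$'" when the name is
    // quoted; entries such as "Sheet1$_" are treated as hidden tables or
    // named ranges and skipped.
    public static List<string> GetWorksheetNames(DataTable schemaTable)
    {
        var names = new List<string>();
        foreach (DataRow row in schemaTable.Rows)
        {
            string name = Convert.ToString(row["TABLE_NAME"]);
            if (string.IsNullOrEmpty(name))
            {
                continue;
            }
            if (name.EndsWith("$", StringComparison.Ordinal) ||
                name.EndsWith("$'", StringComparison.Ordinal))
            {
                names.Add(name);
            }
        }
        return names;
    }
}
```

Because the sheet order is not preserved, it is probably safer to pick the sheet you want from this list by name than to take the first entry.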
Also note that to read an Excel file with multi-row titles you need to:
set HDR=No in the Extended Properties of your connection string
specify the column names and select a range in your OleDbCommand in order to skip the first two rows
For example:
SELECT [F1] AS Location,
[F2] AS EmpId,
[F3] AS EmpName,
[F4] AS MondayShift,
[F5] AS Monday15Min,
[F6] AS Monday30Min,
[F7] AS Monday15Min2
FROM [Sheet1$A3:G]

Reading multiple Excel worksheets inside a single xlsx using c# [duplicate]

This question already has answers here:
Reading multiple excel sheets with different worksheet names
(2 answers)
Closed 8 years ago.
Using c# I can successfully open an excel document and read the data in the first worksheet with the code below. However, my .xlsx has multiple worksheets so I would like to loop through the worksheet collection rather than hard coding the name of each worksheet. Many thanks.
string path = @"C:\Extract\Extract.xlsx";
string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=Excel 12.0;";
string sql = "SELECT * FROM [Sheet1$]";
using (OleDbDataAdapter adaptor = new OleDbDataAdapter(sql, connStr))
{
DataSet ds = new DataSet();
adaptor.Fill(ds);
DataTable dt = ds.Tables[0];
}
I used most of the code in the answer to "Reading multiple excel sheets with different worksheet names", which was kindly pointed out to me in a comment on my question.
It wouldn't compile for me in VS 2013, though, as the DataRow object does not have an Item property (r.Item(0).ToString in that code), so I just changed that little bit. It also brought back a worksheet with Print_Area in its name, which wasn't valid, so I excluded that in my loop. Here is the code as it worked for me.
string path = @"C:\Extract\Extract.xlsx";
string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=Excel 12.0;";
DataTable sheets = GetSchemaTable(connStr);
string sql = string.Empty;
DataSet ds = new DataSet();
foreach (DataRow dr in sheets.Rows)
{ //Print_Area
string WorkSheetName = dr["TABLE_NAME"].ToString().Trim();
if (!WorkSheetName.Contains("Print_Area"))
{
sql = "SELECT * FROM [" + WorkSheetName + "]";
ds.Clear();
OleDbDataAdapter data = new OleDbDataAdapter(sql, connStr);
data.Fill(ds);
DataTable dt1 = ds.Tables[0];
foreach (DataRow dr1 in dt1.Rows)
{
//parsing work
}
}
}
static DataTable GetSchemaTable(string connectionString)
{
using (OleDbConnection connection = new
OleDbConnection(connectionString))
{
connection.Open();
DataTable schemaTable = connection.GetOleDbSchemaTable(
OleDbSchemaGuid.Tables,
new object[] { null, null, null, "TABLE" });
return schemaTable;
}
}
I'm about to work on almost the same problem.
I found the guide at http://www.dotnetperls.com/excel quite useful.
In short, to open worksheet no. 3, add the following code after opening the excel workbook:
var worksheet = workbook.Worksheets[3] as
Microsoft.Office.Interop.Excel.Worksheet;
Hope this answered your question.
I'd recommend using EPPlus (available via NuGet: https://www.nuget.org/packages/EPPlus/ ). It's a great wrapper for working with .xlsx spreadsheets in .NET. In it, worksheets are a collection, so you can do what you want by just looping over them, regardless of name or index.
For example,
using (ExcelPackage package = new ExcelPackage(new FileInfo(sourceFilePath)))
{
foreach (var excelWorksheet in package.Workbook.Worksheets)
...
}
You should try the Open XML Format SDK (Nuget: Link). The link below explains both reading and writing Excel documents:
http://www.codeproject.com/Articles/670141/Read-and-Write-Microsoft-Excel-with-Open-XML-SDK
Oh by the way, office doesn't have to be installed to use...

All of a sudden can't open excel file with microsoft.jet.4.0 with c#. I changed nothing at all with the connection string?

So I'm working on this thing for work, that converts an excel list of instructions into a better looking, formatted word document. I've been connecting to the excel document and then storing the file into a datatable for easier access.
I had just finally gotten the borders and stuff right for my Word document when I started getting an error:
External table is not in the expected format.
Here is the full connection algorithm:
public static DataTable getWorkSheet(string excelFile =
"C:\\Users\\Mitch\\Dropbox\\Work tools\\Excel for andrew\\Air Compressor PM's.xlsx") {
string connection = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + excelFile
+ ";Extended Properties='Excel 8.0;HDR=YES;'";
string sql = null;
string worksheetName = null;
string[] Headers = new string[4];
DataTable schema = null;
DataTable worksheet = null;
DataSet workbook = new DataSet();
//Preparing and opening connection
OleDbConnection objconn = new OleDbConnection(connection);
objconn.Open();
//getting the schema data table
schema = objconn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
worksheetName = schema.Rows[0]["Table_Name"].ToString();
//Each worksheet will have a varying name, so the name is just called from
//the dataTable.rows array. Can be later modified to use multiple
//worksheets within a workbook.
sql = "SELECT * FROM[" + worksheetName + "]";
//data adapter
OleDbDataAdapter objAdapter = new OleDbDataAdapter();
//pass the sql
objAdapter.SelectCommand = new OleDbCommand(sql, objconn);
//populate the dataset
objAdapter.Fill(workbook);
//Remove spaces from the headers.
worksheet = workbook.Tables[0];
for (int x = 0; x < Headers.Count(); x++) {
Headers[x] = worksheet.Columns[x].ColumnName;
worksheet.Columns[x].ColumnName = worksheet.Columns[x].ColumnName.Replace(" ", "");
}
return worksheet;
}//end of getWorksheet
EDIT: I pulled up my old code from Dropbox's previous versions, which was definitely working, and re-downloaded a copy of the Excel doc that I know was working... What gives? Has something changed on my computer?
You are connecting to a 2007/2010 Excel file (*.xlsx, *.xlsm). You need the updated 2010 drivers (Ace), which can be downloaded for free. The correct connection string can be obtained from http://connectionstrings.com/Excel and http://connectionstrings.com/Excel-2007
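One way to avoid this mismatch is to pick the provider from the file extension. This is a minimal sketch, assuming Jet 4.0 for .xls and ACE 12.0 for .xlsx with the Extended Properties values used elsewhere on this page; ExcelConnectionStrings and Build are hypothetical names, and the ACE provider still has to be installed on the machine.

```csharp
using System;
using System.IO;

public static class ExcelConnectionStrings
{
    // Builds an OLEDB connection string based on the workbook's extension.
    // Assumption: only .xls and .xlsx are supported in this sketch.
    public static string Build(string path, bool hasHeader = true)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        string hdr = hasHeader ? "YES" : "NO";
        switch (ext)
        {
            case ".xls":
                // Excel 97-2003 workbook: the older Jet provider can read it.
                return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
                       ";Extended Properties=\"Excel 8.0;HDR=" + hdr + ";\"";
            case ".xlsx":
                // Excel 2007+ workbook: requires the ACE provider.
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
                       ";Extended Properties=\"Excel 12.0 Xml;HDR=" + hdr + ";\"";
            default:
                throw new NotSupportedException("Unsupported Excel extension: " + ext);
        }
    }
}
```

In the question's code, the connection string could then be produced with ExcelConnectionStrings.Build(excelFile) instead of being hard-coded to Jet.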

Reading Excel-file using Oledb - treating content of excel file as Text only

I am using C# and OleDb to read data from an excel 2007 file.
Connection string I am using is:
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";
Following is the code to read excel:
private OleDbConnection con = null;
private OleDbCommand cmd = null;
private OleDbDataReader dr = null;
private OleDbDataAdapter adap = null;
private DataTable dt = null;
private DataSet ds = null;
private string query;
private string conStr;
public MainWindow()
{
this.InitializeComponent();
this.query = "SELECT * FROM [Sheet1$]";
this.conStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\\Users\\301591\\Desktop\\Fame.xlsx;Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1;TypeGuessRows=0;ImportMixedTypes=Text\"";
}
private void btnImport_Click(object sender, RoutedEventArgs e)
{
this.ImportingDataSetWay();
}
private void ImportingDataSetWay()
{
con = new OleDbConnection(conStr);
cmd = new OleDbCommand(query, con);
adap = new OleDbDataAdapter(cmd);
ds = new DataSet();
adap.Fill(ds);
this.grImport.ItemsSource = ds.Tables[0].DefaultView;
}
Here grImport is my WPF Data-Grid and I am using auto-generated columns.
How can I make sure the content stored in Excel is always read as a string?
I am not allowed to modify any registry values to achieve this. Is there a better way to read Excel? Please guide me. If you need any other information, do let me know.
Regards,
Priyank
Could you try the OLEDB provider connection string below?
HDR=NO means OLEDB will read all rows as data (no header). Since your header cells are all text, it will treat the data in every column as text. After filling the DataSet, you have to remove the first row, as it is not data.
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcel2007file.xlsx;Extended Properties="Excel 12.0;IMEX=1;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text";
One fix we found, is to ensure that the first row contains a header.
i.e. make sure that your column names are in the first row. If that's possible.
Then in your code, you have to programmatically ignore the first row, while
at the same time scarfing your column names from it, if need be.
Use this in your connection string.
IMEX=1;HDR=NO;
I'm not sure of this
TypeGuessRows=0;ImportMixedTypes=Text
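The "ignore the first row but take the column names from it" step could look roughly like this. It is a sketch, assuming the DataTable was filled with HDR=NO so that the first row holds the header text; HeaderRowHelper and PromoteFirstRowToHeader are hypothetical names.

```csharp
using System;
using System.Data;

public static class HeaderRowHelper
{
    // Renames the columns (F1, F2, ...) using the values in the first row,
    // then removes that row so only data rows remain.
    // Blank or duplicate header values leave the generated name in place.
    public static void PromoteFirstRowToHeader(DataTable table)
    {
        if (table.Rows.Count == 0)
        {
            return;
        }
        DataRow header = table.Rows[0];
        for (int i = 0; i < table.Columns.Count; i++)
        {
            string name = Convert.ToString(header[i]).Trim();
            if (name.Length > 0 && !table.Columns.Contains(name))
            {
                table.Columns[i].ColumnName = name;
            }
        }
        table.Rows.RemoveAt(0);
    }
}
```

In the question's code, this would be called on ds.Tables[0] after adap.Fill(ds) and before the table is bound to the grid.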
I had a similar issue. I resolved it by splitting the connection string as shown below. Please note that after Extended Properties there is a (char)34 (a double quote) on each side, so that the value containing IMEX=1 is enclosed in quotes. Without the surrounding (char)34 it gives the "could not find installable ISAM" error. Hope this resolves your issue for the ACE provider as well.
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;" +
"Data Source=" + Server.MapPath("UploadedExcel/" + FileName + ".xls") +
";Extended Properties=" +
(char)34 + "Excel 8.0;IMEX=1;" + (char)34;
