how to Read xls file using OLEDB? - c#

I want to read all data from an xls file using OLEDB, but I don't have any experience in that.
string filename = #"C:\Users\sasa\Downloads\user-account-creation_2.xls";
string connString = #"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filename + ";Extended Properties='Excel 8.0;HDR=YES'";
using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(connString))
{
conn.Open();
System.Data.OleDb.OleDbCommand selectCommand = new System.Data.OleDb.OleDbCommand("select * from [Sheet1$]", conn);
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter(selectCommand);
DataTable dt = new DataTable();
adapter.Fill(dt);
int counter = 0;
foreach (DataRow row in dt.Rows)
{
String dataA = row["email"].ToString();
// String dataB= row["DataB"].ToString();
Console.WriteLine(dataA + " = ");
counter++;
if (counter >= 40) break;
}
}
I want to read all data from email row
I get this error
'Sheet$' is not a valid name. Make sure that it does not include invalid characters or punctuation and that it is not too long

Well, you don't have a sheet called Sheet1 do you? Your sheet seems to be called "email address from username" so your query should be....
Select * From ['email address from username$']
Also please don't use Microsoft.Jet.OLEDB.4.0 as it's pretty much obsolete now. Use Microsoft.ACE.OLEDB.12.0. If you specify Excel 12.0 in the extended properties it will open both .xls and .xlsx files.
You can also load the DataTable with a single line...
dt.Load(new System.Data.OleDb.OleDbCommand("Select * From ['email address from username$']", conn).ExecuteReader());
To read the names of the tables in the file use...
DataTable dtTablesList = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
foreach (DataRow drTable in dtTablesList.Rows)
{
//Do Something
//But be careful as this will also return Defined Names. i.e ranges created using the Defined Name functionality
//Actual Sheet names end with $ or $'
if (drTable["Table_Name"].ToString().EndsWith("$") || drTable["Table_Name"].ToString().EndsWith("$'"))
{
Console.WriteLine(drTable["Table_Name"]);
}
}

Is it possible to use the Open XML SDK?
https://learn.microsoft.com/en-us/office/open-xml/how-to-retrieve-the-values-of-cells-in-a-spreadsheet

Related

How to insert data from excel to datatable which has headers?

I am reading excel having like million of records first i query my table (no records) and get Datable. i query my table to get columns name as define in my excel sheet using alias.
var dal = new clsConn();
var sqlQuery = "SELECT FETAPE_THEIR_TRANDATE \"Date\" ,ISSUER Issuer, ISSU_BRAN Branch , STAN_NUMB STAN, TERMID TermID, ACQUIRER Acquirer,DEBIT_AMOUNT Debit,CREDIT_AMOUNT Credit,CARD_NUMB \"Card Number\" , DESCRIPTION Description FROM ALLTRANSACTIONS";
var returntable = dal.ReadData(sqlQuery);
DataRow ds = returntable.NewRow();
var dtExcelData = returntable;
So my datatable looks like this,
Then i read records from excel sheet
OleDbConnection con = null;
if (ext == ".xls")
{
con = new OleDbConnection(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filepath + ";Extended Properties=Excel 8.0;");
}
else if (ext == ".xlsx")
{
con = new OleDbConnection(#"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filepath + ";Extended Properties=\"Excel 12.0;IMEX=2;HDR=NO;TypeGuessRows=0;ImportMixedTypes=Text\"");
}
con.Open();
dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string getExcelSheetName = dt.Rows[0]["Table_Name"].ToString();
//OleDbCommand ExcelCommand = new OleDbCommand(#"SELECT * FROM [" + getExcelSheetName + #"]", con);
OleDbCommand ExcelCommand = new OleDbCommand("SELECT F1, F2, F3, F4, F5,F6,F7,F8,F9,F10 FROM [Sheet1$]", con);
OleDbDataAdapter ExcelAdapter = new OleDbDataAdapter(ExcelCommand);
try
{
ExcelAdapter.Fill(dtExcelData); //Here I give the datatable which i made previously
}
catch (Exception ex)
{
//lblAlert2.CssClass = "message-error";
//lblAlert2.Text = ex.Message;
}
It reads successfully and fill data in datatable but creating its own column in data table like F1 to F10 how can i move this data to exactly match with my defined columns in datatable
How Will i manage this to not create other columns (f1,f2..f10)
any workaround will be appreciable or Please explain what i am doing wrong and how can i achieve this.
UPDATE :
My Excel file looks like this
The Microsoft.ACE.OLEDB.12.0 driver will handle both types of excel spreadsheets and using the same Extended Properties. i.e. "Excel 12.0" will open both .xls and .xlsx.
Leave the HDR=NO as OLEDB expects them in the first row and they are actually in row 11.
Sadly "TypeGuessRows=0;ImportMixedTypes=Text" is completely ignored by Microsoft.ACE.OLEDB.12.0, you've got to play around with the registry (yuk). Change your IMEX=2 to IMEX=1 to ensure that mixed data types as handled as text.
Change back to using "Select * From [Sheet1$]" and then I'm afriad that you are going to have to handle the source data manually.
OleDbConnection con = new OleDbConnection(#"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filepath + ";Extended Properties=\"Excel 12.0;IMEX=1;HDR=NO\"");
con.Open();
DataTable dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string getExcelSheetName = (string)dt.Rows[0]["Table_Name"];
DataTable xlWorksheet = new DataTable();
xlWorksheet.Load(new OleDbCommand("Select * From [" + getExcelSheetName + "]", con).ExecuteReader());
//More than 11 rows implies 11 header rows and at least 1 data row
if (xlWorksheet.Rows.Count > 11 & xlWorksheet.Columns.Count >= 10)
{
for (int nRow = 11; nRow < xlWorksheet.Rows.Count; nRow++)
{
DataRow returnRow = returntable.NewRow();
for (int nColumn = 0; nColumn < 10; nColumn++)
{
//Note you will probably get conversion problems here that you will have to handle
returnRow[nColumn] = xlWorksheet.Rows[nRow].ItemArray[nColumn];
}
returntable.Rows.Add(returnRow);
}
}
I'm guessing you simply want to add the excel data into your ALLTRANSACTION table? You don't specify but it seems the likely outcome of this. If so this is a terrible way to do it. You don't need to read the whole table into memory append data and then update the database. All you need to do is read the excel file and insert the data to the Oracle table.
Some thoughts, your returntable will contain data so if you just want the structure of the table then add a "Where RowNum=0" to the Select statement. To add the data to your Oracle Database you could 1) Convert to using the Oracle Data Provider (ODP) and then use using OracleBulkCopy Class or 2) simply modify the above to insert row by row as you read the data. As long as you don't have a LOT of data in your Excel spreadsheet it will work just fine. Having said that a Million rows is a LOT so perhaps not the best option. You will need to validate the input as Excel is not the best data source really.

Query string selections for XLS spreadsheet C#

I am trying to grab cells in XLS spreadsheets, assign them to string arrays, then manipulate the data and export to multiple CVS files.
The trouble is the XLS spreadsheet contains information that is not relevant, useable data doesn't start till row 17 and columns have no headings with just the default Sheet1.
I have looked at related questions and tried figuring it out myself with no success. The following code to read the XLS kinda works but is messy to work with as the row lengths vary from one XLS file to another and it is automatically pulling empty columns and rows.
CODE
public static void xlsReader()
{
string fileName = string.Format("{0}\\LoadsAvailable.xls", Directory.GetCurrentDirectory());
string connectionString = #"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + ";" + #"Extended Properties='Excel 8.0;HDR=Yes;'";
string queryString = "SELECT * FROM [Sheet1$]";
using (OleDbConnection connection = new OleDbConnection(connectionString))
{
OleDbCommand command = new OleDbCommand(queryString, connection);
connection.Open();
OleDbDataReader reader = command.ExecuteReader();
int counter = 0;
while (reader.Read())
{
Console.WriteLine("Line " + counter + ":" + reader[28].ToString()); // Just for testing
counter++;
}
}
}
I could do a bunch of trickery with loops to get the data that is required but there has to be a query string that could get the data from row 17 with only 8 columns(not 26 columns with 18 empty)?
I have tried many query string examples and can not seam to get any to work with a starting row index or filter out the empty data.
Here is a handy method that converts an excel file to a flat file.
You may want to change the connection string properties to suit your needs. I needed headers for my case.
Note you will need the Access database engine installed on your machine. I needed the 32 bit version since the app i dev'd was 32 bit. I bet you will also need it.
I parameterized the delimiter for the flat file, because I had cases where I didn't need a comma but a pipe symbol.
How to call method ex: ConvertExcelToFlatFile(openFileName, savePath, '|'); // pipe delimited
// Converts Excel To Flat file
private void ConvertExcelToFlatFile(string excelFilePath, string csvOutputFile, char delimeter, int worksheetNumber = 1)
{
if (!File.Exists(excelFilePath)) throw new FileNotFoundException(excelFilePath);
if (File.Exists(csvOutputFile)) throw new ArgumentException("File exists: " + csvOutputFile);
// connection string
var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml; IMEX=1; HDR=NO\"", excelFilePath);
var cnn = new OleDbConnection(cnnStr);
// get schema, then data
var dt = new DataTable();
try
{
cnn.Open();
var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
string sql = String.Format("select * from [{0}]", worksheet);
var da = new OleDbDataAdapter(sql, cnn);
da.Fill(dt);
}
catch (Exception e)
{
throw e;
}
finally
{
// free resources
cnn.Close();
}
// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile)) // disposes file handle when done
{
foreach (DataRow row in dt.Rows)
{
//MessageBox.Show(row.ItemArray.ToString());
bool firstLine = true;
foreach (DataColumn col in dt.Columns)
{
// skip the first line the initial
if (!firstLine)
{
wtr.Write(delimeter);
}
else
{
firstLine = false;
}
var data = row[col.ColumnName].ToString();//.Replace("\"", "\"\""); // replace " with ""
wtr.Write(String.Format("{0}", data));
}
wtr.WriteLine();
}
}
}

Can only read Excel file when it is actually open in Ms Excel

I am using the following code to open an excel file (XLS) and populate a DataTable with the first worksheet:
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", filename);
OleDbConnection connExcel = new OleDbConnection(connectionString);
connExcel.Open();
DataTable dtExcelSchema;
dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
connExcel.Close();
var adapter = new OleDbDataAdapter("SELECT * FROM [" + SheetName + "]", connectionString);
var ds = new DataSet();
int count = 0;
adapter.Fill(ds, SheetName);
DataTable dt = ds.Tables[0];
It works only when the file is already open in Ms Excel. Why could that be?
If the file is not open, I get an error message (on line connExcel.Open): External table is not in the expected format.
I'm facing the same problem and accordingly to this site, many developers are struggling for the same:
-When I try read Excel with OLE DB all values are empty
-Can't connect to excel file unless file is already open
Actually I'm using the classic connection string (note that I'm trying to read a 97/2003 file):
Provider=Microsoft.Jet.OLEDB.4.0; Data Source = " + GetFilename(filename) + "; Extended Properties ='Excel 8.0;HDR=NO;IMEX=1'
but the file can be read properly only if:
Is open in Excel or even in Word! (the file of course appears corrupted and unreadable, but then the OleDb procedure can read every line of the file), I didn't try with other Office apps
The file is not in read-only mode
I also tried to lock the file manually or to open it with other non-office applications, but the result is not the same. If I follow the two previous rules (file opened in Word or Excel in not read-only mode) I can see all the cells, otherwise it seems the first column is ignored completely (so F2 became F1, F3 became F2,... and F6, the last one, should became F5 otherwise it throws and out-of-index error).
In order to keep compatibility with OleDb without using 3rd parties libraries I found a very stupid workaround using Microsoft.Office.Interop.Excel assembly.
Excel.Application _app = new Excel.Application();
var workbooks = _app.Workbooks;
workbooks.Open(_filename);
// OleDb Connection
using (OleDbConnection conn = new OleDbConnection(connectionOleDb))
{
try
{
conn.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
cmd.CommandText = String.Format("SELECT * FROM [{0}$]", tableName);
OleDbDataReader myReader = cmd.ExecuteReader();
int i = 0;
while (myReader.Read())
{
//Here I read through all Excel rows
}
}
catch (Exception E)
{
MessageBox.Show("Error!\n" + E.Message);
}
finally
{
conn.Close();
workbooks.Close();
if (workbooks != null)
System.Runtime.InteropServices.Marshal.ReleaseComObject(workbooks);
_app.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(_app);
}
}
Essentially the first 3 lines run an Excel instance that lasts exactly the time needed to OleDb to perform its tasks.
The last 4 lines, inside the finally block, let the Excel instance to be closed correctly, immediately after the task and avoid ghost Excel processes.
I repeat it's a very stupid workaround that also requires a 1,5 MB dll (Microsoft.Office.Interop.Excel.dll) to be added to the project.
Anyway seems impossible that OleDb cannot manage by itself the missing data...
I had the same problem. If the file was open the read was ok but if the file was closed... some thing was strange... in my case I received strange data from columns and values.. Debugging I found the name of the first sheet and was strange ["xls _xlnm#_FilterDatabase"] looking on the internet I found that's a name of hidden sheet and a trick to avoid read this sheet (HERE) and so I've implemented a method:
private string getFirstVisibileSheet(DataTable dtSheet, int index = 0)
{
string sheetName = String.Empty;
if (dtSheet.Rows.Count >= (index + 1))
{
sheetName = dtSheet.Rows[index]["TABLE_NAME"].ToString();
if (sheetName.Contains("FilterDatabase"))
{
return getFirstVisibileSheet(dtSheet, ++index);
}
}
return sheetName;
}
To me worked very well.
My complete example code is:
string excelFilePath = String.Empty;
string stringConnection = String.Empty;
using (OpenFileDialog openExcelDialog = new OpenFileDialog())
{
openExcelDialog.Filter = "Excel 2007 (*.xlsx)|*.xlsx|Excel 2003 (*.xls)|*.xls";
openExcelDialog.FilterIndex = 1;
openExcelDialog.RestoreDirectory = true;
DialogResult windowsResult = openExcelDialog.ShowDialog();
if (windowsResult != System.Windows.Forms.DialogResult.OK)
{
return;
}
excelFilePath = openExcelDialog.FileName;
using (DataTable dt = new DataTable())
{
try
{
if (!excelFilePath.Equals(String.Empty))
{
stringConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFilePath + ";Extended Properties='Excel 8.0; HDR=YES;';";
using (OleDbConnection conn = new OleDbConnection(stringConnection))
{
conn.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
DataTable dtSheet = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string sheetName = getFirstVisibileSheet(dtSheet);
cmd.CommandText = "SELECT * FROM [" + sheetName + "]";
dt.TableName = sheetName;
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(dt);
cmd = null;
conn.Close();
}
}
//Read and Use my DT
foreach (DataRow row in dt.Rows)
{
//On my case I need data on first and second Columns
if ((row.ItemArray.Count() < 2) ||
(row[0] == null || String.IsNullOrWhiteSpace(row[0].ToString()))
||
(row[1] == null ||String.IsNullOrWhiteSpace(row[1].ToString())))
{
continue;
}
//Get the number from the first COL
int colOneNumber = 0;
Int32.TryParse(row[0].ToString(), out colOneNumber);
//Get the string from the second COL
string colTwoString = row[1].ToString();
//Get the string from third COL if is a file path valid
string colThree = (row.ItemArray.Count() >= 3
&& !row.IsNull(2)
&& !String.IsNullOrWhiteSpace(row[2].ToString())
&& File.Exists(row[2].ToString())
) ? row[2].ToString() : String.Empty;
}
}
catch (Exception ex)
{
MessageBox.Show("Import error.\n" + ex.Message, "::ERROR::", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
}
}
private string getFirstVisibileSheet(DataTable dtSheet, int index = 0)
{
string sheetName = String.Empty;
if (dtSheet.Rows.Count >= (index + 1))
{
sheetName = dtSheet.Rows[index]["TABLE_NAME"].ToString();
if (sheetName.Contains("FilterDatabase"))
{
return getFirstVisibileSheet(dtSheet, ++index);
}
}
return sheetName;
}
Is it failing on ToString(), like here?
Error is "Object reference not set to an instance of an object"
Does Convert.ToString() fix anything?

Issue with importing .csv using MS Jet OLEDB

I'm importing a CSV file to SQL for an ASP .NET application. I am able to import the .csv, however one column contains null values if there is anything other than numbers in it.
This row imports fine:
1109,003,IN,0093219095,3/17/2013,3/21/2013,,Sobeys Warehouse,4819.13,61.37,4880.50,RV,1109-003
The fourth column is NULL in SQL:
1109,999,IN,REF 44308/S. DRA,3/18/2013,3/21/2013,,"EC Rebates W/E -02 14, 2013",-200.02,0.00,-200.02,SA,1109-999
All other columns that have text in them import fine, just the fourth one is an issue. I can't figure out what could possibly be different about the affected column over the others. If I replace the text with numbers it imports so its something to do with text data. SQL field is nvarchar(50) so its not a datatype issue.
My connection string (dir contains the path to the folder):
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + dir + "\\\";Extended Properties='text;HDR=Yes;FMT=Delimited(,)';";
Import code:
public DataTable GetCSV(string path)
{
if (!File.Exists(path))
{
return null;
}
DataTable dt = new DataTable();
string fullPath = Path.GetFullPath(path);
string file = Path.GetFileName(fullPath);
string dir = Path.GetDirectoryName(fullPath);
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + dir + "\\\";Extended Properties='text;HDR=Yes;FMT=Delimited(,)';";
string query = "SELECT * FROM " + file;
System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter(query, connString);
try
{
da.Fill(dt);
}
catch (InvalidOperationException)
{
}
da.Dispose();
return dt;
}
This solved my issue. Thanks to Pondlife for getting me pointed in the right direction. I figured it had something to do with data types, just not sure where it was getting walloped.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353%28v=vs.85%29.aspx

All of a sudden can't open excel file with microsoft.jet.4.0 with c#. I changed nothing at all with the connection string?

So I'm working on this thing for work, that converts an excel list of instructions into a better looking, formatted word document. I've been connecting to the excel document and then storing the file into a datatable for easier access.
I had just finally gotten the borders and stuff right for my word document when i started getting an error:
External table is not in the expected format.
Here is the full connection algorithm:
public static DataTable getWorkSheet(string excelFile =
"C:\\Users\\Mitch\\Dropbox\\Work tools\\Excel for andrew\\Air Compressor PM's.xlsx") {
string connection = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + excelFile
+ ";Extended Properties='Excel 8.0;HDR=YES;'";
string sql = null;
string worksheetName = null;
string[] Headers = new string[4];
DataTable schema = null;
DataTable worksheet = null;
DataSet workbook = new DataSet();
//Preparing and opening connection
OleDbConnection objconn = new OleDbConnection(connection);
objconn.Open();
//getting the schema data table
schema = objconn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
worksheetName = schema.Rows[0]["Table_Name"].ToString();
//Each worksheet will have a varying name, so the name is just called from
//the dataTable.rows array. Can be later modified to use multiple
//worksheets within a workbook.
sql = "SELECT * FROM[" + worksheetName + "]";
//data adapter
OleDbDataAdapter objAdapter = new OleDbDataAdapter();
//pass the sql
objAdapter.SelectCommand = new OleDbCommand(sql, objconn);
//populate the dataset
objAdapter.Fill(workbook);
//Remove spaces from the headers.
worksheet = workbook.Tables[0];
for (int x = 0; x < Headers.Count(); x++) {
Headers[x] = worksheet.Columns[x].ColumnName;
worksheet.Columns[x].ColumnName = worksheet.Columns[x].ColumnName.Replace(" ", "");
}
return worksheet;
}//end of getWorksheet
EDIT: i pulled up my old code from dropbox previous versions that was definetly working as well as redownload a copy of the excel doc i know was working..... what gives? has something changed in my computer?
You are connecting to a 2007/2010 Excel file (*.xlsx, *.xlsm). You need the updated 2010 drivers (Ace), which can be downloaded for free. The correct connection string can be obtained from http://connectionstrings.com/Excel and http://connectionstrings.com/Excel-2007

Categories

Resources