Trimming all cells in DataTable - c#

I am using the below code to trim all cells in my DataTable.
The problem is that I am doing it in a loop, and depending on what the DataTable is filled with (for example, 1500 rows and 20 columns), the loop takes a very long time.
DataColumn[] stringColumns = dtDataTable.Columns.Cast<DataColumn>().Where(c => c.DataType == typeof(string)).ToArray();
foreach (DataRow row in dtDataTable.Rows)
{
foreach (DataColumn col in stringColumns)
{
if (row[col] != DBNull.Value)
{
row.SetField<string>(col, row.Field<string>(col).Trim());
}
}
}
And here is how I am importing my Excel sheet to the DataTable:
using (OpenFileDialog ofd = new OpenFileDialog() { Title = "Select File", Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97-2003|*.xls|All Files(*.*)|*.*", Multiselect = false, ValidateNames = true })
{
if (ofd.ShowDialog() == DialogResult.OK)
{
String PathName = ofd.FileName;
FileName = System.IO.Path.GetFileNameWithoutExtension(ofd.FileName);
strConn = string.Empty;
FileInfo file = new FileInfo(PathName);
if (!file.Exists) { throw new Exception("Error, file doesn't exists!"); }
string extension = file.Extension;
switch (extension)
{
case ".xls":
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PathName + ";Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break; // C# does not allow falling through from one case to the next
case ".xlsx":
strConn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + PathName + ";Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
break;
default:
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PathName + ";Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
}
}
else
{
return;
}
}
using (OleDbConnection cnnxls = new OleDbConnection(strConn))
{
using (OleDbDataAdapter oda = new OleDbDataAdapter(string.Format("select * from [{0}$]", "Sheet1"), cnnxls))
{
oda.Fill(dtDataImportInitial);
}
}
//Clone dtDataImportInitial so that I can have the new DataTable in String Type
dtDataTable = dtDataImportInitial.Clone();
foreach (DataColumn col in dtDataTable.Columns)
{
col.DataType = typeof(string);
}
foreach (DataRow row in dtDataImportInitial.Rows)
{
dtDataTable.ImportRow(row);
}
Is there a more efficient way of accomplishing this?
EDIT: As per JQSOFT's suggestion, I am now using OleDbDataReader, but I am still running into two issues:
One: SELECT RTRIM(LTRIM(*)) FROM [Sheet1$] doesn't seem to work.
I know that it is possible to select each column one by one, but the number and headers of the columns in the Excel sheet vary, and I am not sure how to adjust my SELECT string to account for this.
Two: In a column whose rows are mostly populated with numbers but which has a few rows with letters, the rows with letters seem to be omitted. For example:
Col1
1
2
3
4
5
6
a
b
Becomes:
Col1
1
2
3
4
5
6
However, I have discovered that if I manually go into the Excel sheet and convert the cell format of the entire table to "Text", this issue is resolved. Doing this, though, converts any dates in that sheet into unrecognizable strings of numbers, so I want to avoid it if at all possible.
For example: 7/2/2020 becomes 44014 if converted to "Text".
Here is my new code:
private void Something()
{
if (ofd.ShowDialog() == DialogResult.OK)
{
PathName = ofd.FileName;
FileName = System.IO.Path.GetFileNameWithoutExtension(ofd.FileName);
strConn = string.Empty;
FileInfo file = new FileInfo(PathName);
if (!file.Exists) { throw new Exception("Error, file doesn't exists!"); }
}
using (OleDbConnection cn = new OleDbConnection { ConnectionString = ConnectionString(PathName, "No") })
{
using (OleDbCommand cmd = new OleDbCommand { CommandText = query, Connection = cn })
{
cn.Open();
OleDbDataReader dr = cmd.ExecuteReader();
dtDataTable.Load(dr);
}
}
dataGridView1.DataSource = dtDataTable;
}
public string ConnectionString(string FileName, string Header)
{
OleDbConnectionStringBuilder Builder = new OleDbConnectionStringBuilder();
if (Path.GetExtension(FileName).ToUpper() == ".XLS")
{
Builder.Provider = "Microsoft.Jet.OLEDB.4.0";
// Use the Header argument ("Yes"/"No") instead of a hard-coded HDR value
Builder.Add("Extended Properties", string.Format("Excel 8.0;IMEX=1;HDR={0};", Header));
}
else
{
Builder.Provider = "Microsoft.ACE.OLEDB.12.0";
Builder.Add("Extended Properties", string.Format("Excel 12.0;IMEX=1;HDR={0};", Header));
}
Builder.DataSource = FileName;
return Builder.ConnectionString;
}

OleDb Objects
Actually, what I meant is this: to get formatted/trimmed string values from the Excel sheet and create a DataTable with DataColumn objects of string type only, use the forward-only OleDbDataReader to create both the DataColumn and the DataRow objects as it reads. That way the data is modified and filled in one step, so there is no need to call another routine that loops again and wastes more time. Also, consider using asynchronous calls to avoid freezing the UI while the lengthy task executes.
Something like this might help you get going:
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var conString = string.Empty;
var msg = "Loading... Please wait.";
try
{
switch (ofd.FilterIndex)
{
case 1: //xlsx
conString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={ofd.FileName};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
break;
case 2: //xls
conString = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={ofd.FileName};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
default:
throw new FileFormatException();
}
var sheetName = "sheet1";
var dt = new DataTable();
//Optional: a label to show the current status
//or maybe show a ProgressBar with ProgressBarStyle = Marquee
lblStatus.Text = msg;
await Task.Run(() =>
{
using (var con = new OleDbConnection(conString))
using (var cmd = new OleDbCommand($"SELECT * From [{sheetName}$]", con))
{
con.Open();
using (var r = cmd.ExecuteReader())
while (r.Read())
{
if (dt.Columns.Count == 0)
for (var i = 0; i < r.FieldCount; i++)
dt.Columns.Add(r.GetName(i).Trim(), typeof(string));
object[] values = new object[r.FieldCount];
r.GetValues(values);
dt.Rows.Add(values.Select(x => x?.ToString().Trim()).ToArray());
}
}
});
//If you want...
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
lblStatus.Text = msg;
}
}
}
In a demo reading sheets with 8 columns and 5000 rows from both xls and xlsx files, loading took less than a second. Not bad.
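The core of the approach, separated from the dialog, the UI and the OleDb objects, can be sketched as a small helper that accepts any IDataReader. The class and method names (TrimmedLoader, LoadTrimmed) are made up for illustration; an OleDbDataReader obtained from cmd.ExecuteReader() could be passed in the same way as the in-memory reader used in the usage note.

```csharp
using System;
using System.Data;
using System.Linq;

public static class TrimmedLoader
{
    // Builds a string-only DataTable from any IDataReader, trimming
    // column names and cell values in the same pass that reads them.
    // Note: header names that are equal after trimming would make
    // Columns.Add throw a DuplicateNameException.
    public static DataTable LoadTrimmed(IDataReader reader)
    {
        var table = new DataTable();
        for (var i = 0; i < reader.FieldCount; i++)
            table.Columns.Add(reader.GetName(i).Trim(), typeof(string));

        var values = new object[reader.FieldCount];
        table.BeginLoadData(); // suspends notifications and index maintenance while loading
        while (reader.Read())
        {
            reader.GetValues(values);
            table.Rows.Add(values
                .Select(v => v == null || v is DBNull
                    ? (object)DBNull.Value      // keep empty cells as DBNull
                    : v.ToString().Trim())
                .ToArray());
        }
        table.EndLoadData();
        return table;
    }
}
```

For a quick check without an Excel file, an existing DataTable can act as the source: TrimmedLoader.LoadTrimmed(someTable.CreateDataReader()).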
However, this will not work correctly if the sheet has mixed-type columns, as in your case, where the third column has string and int values in different rows. That is because the data type of a column is guessed by the OLE DB provider by examining the first 8 rows by default. Changing this behavior requires changing the registry value of TypeGuessRows in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\x.0\Engines\Excel from 8 to 0 to force checking all the rows instead of just the first 8. This will slow down the read considerably.
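If editing the registry is not an option, a commonly cited workaround is to open the sheet with HDR=No together with IMEX=1. The header text then counts as data in the first sampled rows, so a numeric column looks mixed and, assuming the provider's ImportMixedTypes setting is left at its default of Text, is read as text. The columns then arrive named F1, F2, ... and the real header sits in row 0, so it has to be promoted afterwards. The sketch below shows only that promotion step on a plain DataTable; HeaderPromoter and the "ColumnN" fallback name are hypothetical, and the driver behaviour itself should be verified against your own files.

```csharp
using System;
using System.Data;

public static class HeaderPromoter
{
    // Assumes 'raw' was filled with HDR=No, so its columns are named
    // F1, F2, ... and its first row holds the real header text.
    public static DataTable PromoteFirstRowToHeader(DataTable raw)
    {
        var result = new DataTable();
        if (raw.Rows.Count == 0) return result;

        DataRow header = raw.Rows[0];
        for (var c = 0; c < raw.Columns.Count; c++)
        {
            var name = header[c].ToString().Trim();
            if (name.Length == 0) name = "Column" + (c + 1); // hypothetical fallback name
            result.Columns.Add(name, typeof(string));        // throws on duplicate names
        }

        // Copy the remaining rows as trimmed strings; empty cells stay DBNull.
        for (var r = 1; r < raw.Rows.Count; r++)
        {
            var values = new object[raw.Columns.Count];
            for (var c = 0; c < values.Length; c++)
                values[c] = raw.Rows[r].IsNull(c) ? null : raw.Rows[r][c].ToString().Trim();
            result.Rows.Add(values);
        }
        return result;
    }
}
```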
Office Interop Objects
Alternatively, you could use the Microsoft.Office.Interop.Excel objects to read the Excel Sheet, get and format the values of the cells regardless of their types.
using System.Runtime.InteropServices; // for Marshal
using Excel = Microsoft.Office.Interop.Excel;
//...
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var msg = "Loading... Please wait.";
Excel.Application xlApp = null;
Excel.Workbook xlWorkBook = null;
try
{
var dt = new DataTable();
lblStatus.Text = msg;
await Task.Run(() =>
{
xlApp = new Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(ofd.FileName, Type.Missing, true);
var xlSheet = xlWorkBook.Sheets[1] as Excel.Worksheet;
var xlRange = xlSheet.UsedRange;
dt.Columns.AddRange((xlRange.Rows[xlRange.Row] as Excel.Range)
.Cells.Cast<Excel.Range>()
.Where(h => h.Value2 != null)
.Select(h => new DataColumn(h.Value2.ToString()
.Trim(), typeof(string))).ToArray());
foreach (var r in xlRange.Rows.Cast<Excel.Range>().Skip(1))
dt.Rows.Add(r.Cells.Cast<Excel.Range>()
.Take(dt.Columns.Count)
.Select(v => v.Value2 is null
? string.Empty
: v.Value2.ToString().Trim()).ToArray());
});
(dataGridView1.DataSource as DataTable)?.Dispose();
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
xlWorkBook?.Close(false);
xlApp?.Quit();
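// FinalReleaseComObject throws on null, e.g. when Excel failed to start
if (xlWorkBook != null) Marshal.FinalReleaseComObject(xlWorkBook);
if (xlApp != null) Marshal.FinalReleaseComObject(xlApp);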
xlWorkBook = null;
xlApp = null;
GC.Collect();
GC.WaitForPendingFinalizers();
lblStatus.Text = msg;
}
}
}
Note: You need to add a reference to the mentioned library.
It is not fast, especially with a large number of cells, but it gets the desired output.
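One caveat with Value2: it returns date cells as serial numbers (doubles), which is the same 44014-instead-of-7/2/2020 effect described in the question. A minimal sketch of converting such a value back, assuming you already know which columns hold dates (the ExcelDates helper and its default format string are illustrative, not part of the Interop library):

```csharp
using System;
using System.Globalization;

public static class ExcelDates
{
    // Converts a Value2 result from a date cell (an OLE Automation
    // serial number) to a formatted date string. Non-numeric values
    // are returned as trimmed text, and null as an empty string.
    public static string SerialToDateString(object value2, string format = "M/d/yyyy")
    {
        if (value2 is double serial)
            return DateTime.FromOADate(serial).ToString(format, CultureInfo.InvariantCulture);
        return value2?.ToString().Trim() ?? string.Empty;
    }
}
```

For example, ExcelDates.SerialToDateString(44014.0) gives "7/2/2020". Applying it to every numeric cell would also turn ordinary numbers into dates, so it should only be used on columns known to contain dates.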

Related

No value given for one or more required parameters error - Excel

I am getting data from Excel and showing it in a DataGridView.
I have two textboxes: one for the starting index of the first record and the other for the last record.
The code works fine. But let's suppose the starting record is 1 and the ending record is 10; when I change 10 to 1 or 2, it gives me an error on this line:
adapter.Fill(dataTable);
Full Code is below:
public DataSet Parse(string fileName)
{
string connectionString = string.Format("provider = Microsoft.Jet.OLEDB.4.0; data source = {0}; Extended Properties = Excel 8.0;", fileName);
DataSet data = new DataSet();
foreach (var sheetName in GetExcelSheetNames(connectionString))
{
using (OleDbConnection con = new OleDbConnection(connectionString))
{
string query = "";
var dataTable = new DataTable();
if(tbStarting.Text.Trim()=="" && tbEnding.Text.Trim() == "")
{
query = string.Format("SELECT * FROM [{0}]", sheetName);
}
else
{
query = string.Format("SELECT * FROM [{0}] where SrNo between " + int.Parse(tbStarting.Text.Trim()) + " and " + int.Parse(tbEnding.Text.Trim()) + " order by SrNo", sheetName);
}
con.Open();
OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
adapter.Fill(dataTable);
data.Tables.Add(dataTable);
con.Close();
}
}
return data;
}
static string[] GetExcelSheetNames(string connectionString)
{
OleDbConnection con = null;
DataTable dt = null;
con = new OleDbConnection(connectionString);
con.Open();
dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt == null)
{
return null;
}
String[] excelSheetNames = new String[dt.Rows.Count];
int i = 0;
foreach (DataRow row in dt.Rows)
{
excelSheetNames[i] = row["TABLE_NAME"].ToString();
i++;
}
return excelSheetNames;
}
Why is this happening? Please help.
Looking at the code, it seems that your procedure works when you ask to retrieve all the records in each table. But you are not showing which table (Sheet) is actually used afterwards.
Chances are, you are using the first one only.
When you submit some parameters, only one of the tables (Sheets) can fulfill those requirements. The other(s) don't, possibly because a field named [SrNo] is not present.
The provider treats a name it cannot resolve to a column as a parameter, which causes the "No value given for one or more required parameters" error when trying to apply a filter.
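One way to guard against this is to check that a sheet actually has the column before adding the WHERE clause. The sketch below assumes the column list was obtained with con.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, null), whose rows carry TABLE_NAME and COLUMN_NAME fields; the SheetSchema helper itself is hypothetical and works on any DataTable with those two columns.

```csharp
using System;
using System.Data;
using System.Linq;

public static class SheetSchema
{
    // Returns true when the schema table lists 'columnName' for 'sheetName'.
    // 'schemaColumns' is expected to have TABLE_NAME and COLUMN_NAME columns.
    public static bool HasColumn(DataTable schemaColumns, string sheetName, string columnName)
    {
        return schemaColumns.Rows.Cast<DataRow>().Any(row =>
            string.Equals(row["TABLE_NAME"].ToString(), sheetName, StringComparison.OrdinalIgnoreCase) &&
            string.Equals(row["COLUMN_NAME"].ToString(), columnName, StringComparison.OrdinalIgnoreCase));
    }
}
```

Inside the loop over the sheets, the filtered query would then be built only when SheetSchema.HasColumn(schema, sheetName, "SrNo") is true, falling back to SELECT * (or skipping the sheet) otherwise.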
Not related to the error, but worth noting: you don't need to recreate the whole DataSet + DataTables to filter your DataSources.
The DataSet.Tables[N].DefaultView.RowFilter can be used to get the same result without destroying all the objects each time a filter is required.
RowFilter has some limitations in the language (e.g. does not support BETWEEN, Field >= Value1 AND Field <= Value2 must be used), but it's quite effective.
This is a possible setup:
(xDataSet is a placeholder for your actual DataSet)
//Collect the values in the TextBoxes in a string array
private void button1_Click(object sender, EventArgs e)
{
string[] Ranges = new string[] { tbStarting.Text.Trim(), tbEnding.Text.Trim() };
if (xDataSet != null)
FilterDataset(Ranges);
}
private void FilterDataset(string[] Ranges)
{
if (string.IsNullOrEmpty(Ranges[0]) && string.IsNullOrEmpty(Ranges[1]))
xDataSet.Tables[0].DefaultView.RowFilter = null;
else if (string.IsNullOrEmpty(Ranges[0]) || string.IsNullOrEmpty(Ranges[1]))
return;
else if (int.Parse(Ranges[0]) < int.Parse(Ranges[1]))
xDataSet.Tables[0].DefaultView.RowFilter = string.Format("SrNo >= {0} AND SrNo <= {1}", Ranges[0], Ranges[1]);
else
xDataSet.Tables[0].DefaultView.RowFilter = string.Format("SrNo = {0}", Ranges[0]);
this.dataGridView1.Update();
}
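To see the RowFilter behaviour in isolation, here is a small self-contained sketch on an in-memory table. RowFilterDemo and CountInRange are illustrative names; the only assumption carried over from the question is an integer SrNo column. Ordering the two bounds with Math.Min and Math.Max means that changing the end value from 10 to 1 simply yields a single-record range.

```csharp
using System;
using System.Data;

public static class RowFilterDemo
{
    // Applies a range filter on the SrNo column through the table's
    // DefaultView and returns how many rows remain visible.
    public static int CountInRange(DataTable table, int from, int to)
    {
        int low = Math.Min(from, to);
        int high = Math.Max(from, to);
        // RowFilter has no BETWEEN operator, so the range is two comparisons.
        table.DefaultView.RowFilter =
            string.Format("SrNo >= {0} AND SrNo <= {1}", low, high);
        return table.DefaultView.Count;
    }
}
```

The underlying DataTable is untouched; setting RowFilter back to null (or an empty string) shows all rows again.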
I've modified your code a bit to handle those requirements.
(I've left here those filters anyway; they're not used, but if you still want them, they are in a working condition)
DataSet xDataSet = new DataSet();
string WorkBookPath = @"[Excel WorkBook Path]";
//Query one Sheet only. More can be added if necessary
string[] WBSheetsNames = new string[] { "Sheet1" };
//Open the Excel document and assign the DataSource to a dataGridView
xDataSet = Parse(WorkBookPath, WBSheetsNames, null);
dataGridView1.DataSource = xDataSet.Tables[0];
dataGridView1.Refresh();
public DataSet Parse(string fileName, string[] WorkSheets, string[] ranges)
{
if (!File.Exists(fileName)) return null;
string connectionString = string.Format("provider = Microsoft.ACE.OLEDB.12.0; " +
"data source = {0}; " +
"Extended Properties = \"Excel 12.0;HDR=YES\"",
fileName);
DataSet data = new DataSet();
string query = string.Empty;
foreach (string sheetName in GetExcelSheetNames(connectionString))
{
foreach (string WorkSheet in WorkSheets)
if (sheetName == (WorkSheet + "$"))
{
using (OleDbConnection con = new OleDbConnection(connectionString))
{
DataTable dataTable = new DataTable();
if ((ranges == null) ||
(string.IsNullOrEmpty(ranges[0]) || string.IsNullOrEmpty(ranges[1])) ||
(int.Parse(ranges[0]) > int.Parse(ranges[1])))
query = string.Format("SELECT * FROM [{0}]", sheetName);
else if ((int.Parse(ranges[0]) == int.Parse(ranges[1])))
query = string.Format("SELECT * FROM [{0}] WHERE SrNo = {1}", sheetName, ranges[0]);
else
query = string.Format("SELECT * FROM [{0}] WHERE (SrNo BETWEEN {1} AND {2}) " +
"ORDER BY SrNo", sheetName, ranges[0], ranges[1]);
con.Open();
OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
adapter.Fill(dataTable);
data.Tables.Add(dataTable);
};
}
}
return data;
}
static string[] GetExcelSheetNames(string connectionString)
{
string[] excelSheetNames = null;
using (OleDbConnection con = new OleDbConnection(connectionString))
{
con.Open();
using (DataTable dt = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null))
{
if (dt != null)
{
excelSheetNames = new string[dt.Rows.Count];
for (int i = 0; i < dt.Rows.Count; i++)
{
excelSheetNames[i] = dt.Rows[i]["TABLE_NAME"].ToString();
}
}
}
}
return excelSheetNames;
}

Can only read Excel file when it is actually open in Ms Excel

I am using the following code to open an excel file (XLS) and populate a DataTable with the first worksheet:
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", filename);
OleDbConnection connExcel = new OleDbConnection(connectionString);
connExcel.Open();
DataTable dtExcelSchema;
dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
connExcel.Close();
var adapter = new OleDbDataAdapter("SELECT * FROM [" + SheetName + "]", connectionString);
var ds = new DataSet();
int count = 0;
adapter.Fill(ds, SheetName);
DataTable dt = ds.Tables[0];
It works only when the file is already open in Ms Excel. Why could that be?
If the file is not open, I get an error message (on line connExcel.Open): External table is not in the expected format.
I'm facing the same problem and, according to this site, many developers are struggling with it too:
-When I try read Excel with OLE DB all values are empty
-Can't connect to excel file unless file is already open
Actually I'm using the classic connection string (note that I'm trying to read a 97/2003 file):
Provider=Microsoft.Jet.OLEDB.4.0; Data Source = " + GetFilename(filename) + "; Extended Properties ='Excel 8.0;HDR=NO;IMEX=1'
but the file can be read properly only if:
- It is open in Excel or even in Word! (In Word the file of course appears corrupted and unreadable, but the OleDb procedure can then read every line of the file.) I didn't try other Office apps.
- The file is not in read-only mode.
I also tried to lock the file manually or to open it with other non-Office applications, but the result is not the same. If I follow the two previous rules (file opened in Word or Excel, not in read-only mode) I can see all the cells; otherwise it seems the first column is ignored completely (so F2 becomes F1, F3 becomes F2, ... and F6, the last one, has to become F5, otherwise it throws an out-of-index error).
To keep compatibility with OleDb without using third-party libraries, I found a very stupid workaround using the Microsoft.Office.Interop.Excel assembly.
Excel.Application _app = new Excel.Application();
var workbooks = _app.Workbooks;
workbooks.Open(_filename);
// OleDb Connection
using (OleDbConnection conn = new OleDbConnection(connectionOleDb))
{
try
{
conn.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
cmd.CommandText = String.Format("SELECT * FROM [{0}$]", tableName);
OleDbDataReader myReader = cmd.ExecuteReader();
int i = 0;
while (myReader.Read())
{
//Here I read through all Excel rows
}
}
catch (Exception E)
{
MessageBox.Show("Error!\n" + E.Message);
}
finally
{
conn.Close();
workbooks.Close();
if (workbooks != null)
System.Runtime.InteropServices.Marshal.ReleaseComObject(workbooks);
_app.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(_app);
}
}
Essentially, the first 3 lines start an Excel instance that lasts exactly as long as OleDb needs to perform its tasks.
The last 4 lines, inside the finally block, close the Excel instance correctly immediately after the task and avoid ghost Excel processes.
I repeat: it's a very stupid workaround that also requires a 1.5 MB DLL (Microsoft.Office.Interop.Excel.dll) to be added to the project.
Anyway, it seems impossible that OleDb cannot manage the missing data by itself...
I had the same problem. If the file was open, the read was OK, but if the file was closed, something strange happened: in my case I received strange data for columns and values. While debugging I found that the name of the first sheet was strange: ["xls _xlnm#_FilterDatabase"]. Looking on the internet, I found that this is the name of a hidden sheet, along with a trick to avoid reading it, so I implemented this method:
private string getFirstVisibileSheet(DataTable dtSheet, int index = 0)
{
string sheetName = String.Empty;
if (dtSheet.Rows.Count >= (index + 1))
{
sheetName = dtSheet.Rows[index]["TABLE_NAME"].ToString();
if (sheetName.Contains("FilterDatabase"))
{
return getFirstVisibileSheet(dtSheet, ++index);
}
}
return sheetName;
}
It worked very well for me.
My complete example code is:
string excelFilePath = String.Empty;
string stringConnection = String.Empty;
using (OpenFileDialog openExcelDialog = new OpenFileDialog())
{
openExcelDialog.Filter = "Excel 2007 (*.xlsx)|*.xlsx|Excel 2003 (*.xls)|*.xls";
openExcelDialog.FilterIndex = 1;
openExcelDialog.RestoreDirectory = true;
DialogResult windowsResult = openExcelDialog.ShowDialog();
if (windowsResult != System.Windows.Forms.DialogResult.OK)
{
return;
}
excelFilePath = openExcelDialog.FileName;
using (DataTable dt = new DataTable())
{
try
{
if (!excelFilePath.Equals(String.Empty))
{
stringConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFilePath + ";Extended Properties='Excel 8.0; HDR=YES;';";
using (OleDbConnection conn = new OleDbConnection(stringConnection))
{
conn.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
DataTable dtSheet = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string sheetName = getFirstVisibileSheet(dtSheet);
cmd.CommandText = "SELECT * FROM [" + sheetName + "]";
dt.TableName = sheetName;
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(dt);
cmd = null;
conn.Close();
}
}
//Read and Use my DT
foreach (DataRow row in dt.Rows)
{
//On my case I need data on first and second Columns
if ((row.ItemArray.Count() < 2) ||
(row[0] == null || String.IsNullOrWhiteSpace(row[0].ToString()))
||
(row[1] == null ||String.IsNullOrWhiteSpace(row[1].ToString())))
{
continue;
}
//Get the number from the first COL
int colOneNumber = 0;
Int32.TryParse(row[0].ToString(), out colOneNumber);
//Get the string from the second COL
string colTwoString = row[1].ToString();
//Get the string from third COL if is a file path valid
string colThree = (row.ItemArray.Count() >= 3
&& !row.IsNull(2)
&& !String.IsNullOrWhiteSpace(row[2].ToString())
&& File.Exists(row[2].ToString())
) ? row[2].ToString() : String.Empty;
}
}
catch (Exception ex)
{
MessageBox.Show("Import error.\n" + ex.Message, "::ERROR::", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
}
}
private string getFirstVisibileSheet(DataTable dtSheet, int index = 0)
{
string sheetName = String.Empty;
if (dtSheet.Rows.Count >= (index + 1))
{
sheetName = dtSheet.Rows[index]["TABLE_NAME"].ToString();
if (sheetName.Contains("FilterDatabase"))
{
return getFirstVisibileSheet(dtSheet, ++index);
}
}
return sheetName;
}
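A non-recursive variant of the same idea is to filter the schema rows up front. The sketch below rests on an assumption worth verifying against your own files: in the table returned by GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null), real worksheets have names ending in "$" (wrapped in single quotes when the name contains spaces), while entries such as the hidden _xlnm#_FilterDatabase range do not. SheetNames is an illustrative helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SheetNames
{
    // Returns only the entries that look like real worksheets,
    // i.e. names ending in "$" once surrounding quotes are removed.
    public static List<string> GetWorksheetNames(DataTable schemaTables)
    {
        return schemaTables.Rows.Cast<DataRow>()
            .Select(row => row["TABLE_NAME"].ToString())
            .Where(name => name.Trim('\'').EndsWith("$", StringComparison.Ordinal))
            .ToList();
    }
}
```

The first element of the returned list can then be used in place of the result of getFirstVisibileSheet; an empty list means no worksheet was found.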
Is it failing on ToString(), like here?
Error is "Object reference not set to an instance of an object"
Does Convert.ToString() fix anything?

Read excel table data using c#

I have an Excel file with several Excel tables. I want to read all the data in the Excel tables of a given sheet and copy it to a different Excel table.
One excel sheet may have several tables
I have done that through VBA. I want to find a C# code to achieve it.
This is the VBA code I used.
Dim tableObject As ListObject
Dim oLastRow As ListRow
Dim srcRow As Range
Set sheet = ThisWorkbook.Worksheets("Sheet1")
For Each tableObject In sheet.ListObjects
Set srcRow = tableObject.DataBodyRange
Set oLastRow = Worksheets("Sheet2").ListObjects("table1").ListRows.Add
srcRow.Copy
oLastRow.Range.PasteSpecial xlPasteValues
Next
Try to follow this:
public static DataTable exceldata(string filePath)
{
DataTable dtexcel = new DataTable();
bool hasHeaders = false;
string HDR = hasHeaders ? "Yes" : "No";
string strConn;
if (filePath.Substring(filePath.LastIndexOf('.')).ToLower() == ".xlsx")
strConn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + ";Extended Properties=\"Excel 12.0;HDR=" + HDR + ";IMEX=0\"";
else
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath + ";Extended Properties=\"Excel 8.0;HDR=" + HDR + ";IMEX=0\"";
OleDbConnection conn = new OleDbConnection(strConn);
conn.Open();
DataTable schemaTable = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
//Looping Total Sheet of Xl File
/*foreach (DataRow schemaRow in schemaTable.Rows)
{
}*/
//Looping a first Sheet of Xl File
DataRow schemaRow = schemaTable.Rows[0];
string sheet = schemaRow["TABLE_NAME"].ToString();
if (!sheet.EndsWith("_"))
{
string query = "SELECT * FROM [" + sheet + "]";
OleDbDataAdapter daexcel = new OleDbDataAdapter(query, conn);
dtexcel.Locale = CultureInfo.CurrentCulture;
daexcel.Fill(dtexcel);
}
conn.Close();
return dtexcel;
}
You can also use the following part of code to import data from tables inside your excel sheet
static void Main(string[] args)
{
try
{
// ReadInData is a static class, so call GetData directly instead of instantiating it
IEnumerable<Recipient> recipients = ReadInData.GetData(@"C:\SC.xlsx", "sc_2014");
}
catch (Exception ex)
{
if(!(ex is FileNotFoundException || ex is ArgumentException || ex is FileToBeProcessedIsNotInTheCorrectFormatException))
throw;
Console.WriteLine(ex.Message);
}
Console.Write("Press any key to continue...");
Console.ReadKey(true);
}
public static class ReadInData
{
public static IEnumerable<Recipient> GetData(string path, string worksheetName, bool isFirstRowAsColumnNames = true)
{
return new ExcelData(path).GetData(worksheetName, isFirstRowAsColumnNames)
.Select(dataRow => new Recipient()
{
Municipality = dataRow["Municipality"].ToString(),
Sexe = dataRow["Sexe"].ToString(),
LivingArea = dataRow["LivingArea"].ToString()
});
}
}
private static IExcelDataReader GetExcelDataReader(string path, bool isFirstRowAsColumnNames)
{
using (FileStream fileStream = File.Open(path, FileMode.Open, FileAccess.Read))
{
IExcelDataReader dataReader;
if (path.EndsWith(".xls"))
dataReader = ExcelReaderFactory.CreateBinaryReader(fileStream);
else if (path.EndsWith(".xlsx"))
dataReader = ExcelReaderFactory.CreateOpenXmlReader(fileStream);
else
throw new FileToBeProcessedIsNotInTheCorrectFormatException("The file to be processed is not an Excel file");
dataReader.IsFirstRowAsColumnNames = isFirstRowAsColumnNames;
return dataReader;
}
}
private static DataSet GetExcelDataAsDataSet(string path, bool isFirstRowAsColumnNames)
{
return GetExcelDataReader(path, isFirstRowAsColumnNames).AsDataSet();
}
private static DataTable GetExcelWorkSheet(string path, string workSheetName, bool isFirstRowAsColumnNames)
{
DataTable workSheet = GetExcelDataAsDataSet(path, isFirstRowAsColumnNames).Tables[workSheetName];
if (workSheet == null)
throw new WorksheetDoesNotExistException(string.Format("The worksheet {0} does not exist, has an incorrect name, or does not have any data in the worksheet", workSheetName));
return workSheet;
}
private static IEnumerable<DataRow> GetData(string path, string workSheetName, bool isFirstRowAsColumnNames = true)
{
return from DataRow row in GetExcelWorkSheet(path, workSheetName, isFirstRowAsColumnNames).Rows select row;
}
I think you'll find that Excel's COM interface is very similar to VBA in that the class libraries are all identical. The only challenge is in grabbing the Excel instance and then managing the syntactical differences (i.e., VBA uses parentheses for indexers, and C# uses square brackets).
Assuming your VBA works, this should be the identical code in C# to grab an open instance of Excel and do what you have done. If you want to open a new instance of Excel, that's actually even easier (and very easy to Google):
Excel.Application excel;
Excel.Workbook wb;
try
{
excel = (Excel.Application)Marshal.GetActiveObject("Excel.Application");
excel.Visible = true;
wb = (Excel.Workbook)excel.ActiveWorkbook;
}
catch (Exception ex)
{
ErrorMessage = "Trouble Locating Open Excel Instance";
return;
}
Excel.Worksheet sheet = wb.Worksheets["Sheet1"];
foreach (Excel.ListObject lo in sheet.ListObjects)
{
Excel.Range srcRow = lo.DataBodyRange;
Excel.ListRow oLastRow = wb.Worksheets["Sheet2"].ListObjects["table1"].ListRows.Add();
srcRow.Copy();
oLastRow.Range.PasteSpecial(Excel.XlPasteType.xlPasteValues);
}
This all presupposes you have referenced Excel COM and set your using:
using Excel = Microsoft.Office.Interop.Excel;

How to optimize insert into sql table of 3600 rows from excel file

I have an Excel file that originally had about 600 rows, and I was able to convert the Excel file to a DataTable and everything got inserted into the SQL table correctly.
The Excel file now has 3,600 rows, and something is going wrong: no error is thrown, but after 5 minutes or so not all the rows have been inserted into the SQL table.
Converting the Excel file to an in-memory DataTable happens very quickly, but looping over the DataTable and inserting into the SQL table is very slow and is where I'm losing data, and I'm receiving no errors whatsoever.
For one, on each insert I have to make a new connection to the database and insert the record. I already know this is very, very wrong, and I'm hoping to get some guidance from one of the SQL pros on this one.
What is the correct way to process an in-memory DataTable with 3,600 records/rows without making 3,600 new connections?
--Here is the code that processes the Excel file. This happens very quickly.--
public static async Task<DataTable> ProcessExcelToDataTableAsync(string pathAndNewFileName, string hasHeader/*Yes or No*/)
{
return await Task.Run(() =>
{
string conStr = "", SheetName = "";
switch (Path.GetExtension(pathAndNewFileName))
{
case ".xls": //Excel 97-03
conStr = ConfigurationManager.ConnectionStrings["Excel03ConString"].ConnectionString;
break;
case ".xlsx":
conStr = ConfigurationManager.ConnectionStrings["Excel07ConString"].ConnectionString;
break;
}
conStr = String.Format(conStr, pathAndNewFileName, hasHeader);
OleDbConnection connExcel = new OleDbConnection(conStr);
OleDbCommand cmdExcel = new OleDbCommand();
OleDbDataAdapter oda = new OleDbDataAdapter();
DataTable dt = new DataTable();
cmdExcel.Connection = connExcel;
//Get the name of First Sheet
connExcel.Open();
DataTable dtExcelSchema;
dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
connExcel.Close();
//Read Data from First Sheet
connExcel.Open();
cmdExcel.CommandText = "SELECT * From [" + SheetName + "]";
oda.SelectCommand = cmdExcel;
oda.Fill(dt);
connExcel.Close();
cmdExcel.Dispose();
oda.Dispose();
if (File.Exists(pathAndNewFileName))
{
File.Delete(pathAndNewFileName);
}
return dt;
});
}
--Here is the code that processes the in-memory DataTable and inserts each new record into the SQL table. This is where things stop working: there are no visible errors, but it just does not return or work.--
I need a better way to optimize this function, where the records get inserted into the SQL table.
static async Task<ProcessDataTablePartsResult> ProcessDataTablePartsAsync(int genericCatalogID, DataTable initialExcelData)
{
//@GenericCatalogID INT,
//@Number VARCHAR(50),
//@Name VARCHAR(200),
//@Length DECIMAL(8,4),
//@Width DECIMAL(8,4),
//@Height DECIMAL(8,4),
//@ProfileID TINYINT,
//@PackageQty DECIMAL(9,4),
//@CategoryID INT,
//@UnitMeasure VARCHAR(10),
//@Cost MONEY,
//@PartID INT OUT
return await Task.Run(() =>
{
DataTable badDataTable = null,
goodDataTable = initialExcelData.Clone();
goodDataTable.Clear();
int newPartID = 0,
currIx = 0,
numGoodRows = initialExcelData.Rows.Count,
numBadRows = 0;
List<int> badIndexes = new List<int>();
List<int> goodIndexes = new List<int>();
List<Profile> profiles = GenericCatalogManagerBL.GetProfiles(_genericCNN);
List<Category> categories = GenericCatalogManagerBL.GetAllCategoryNameID(_genericCNN);
Func<string, byte> getProfileID = delegate(string x)
{
return profiles.Where(p => p.TheProfile.ToLower().Replace(" ", "") == x.ToLower().Replace(" ", "")).FirstOrDefault().ID;
};
Func<string, int> getCategoryID = delegate(string x)
{
return categories.Where(c => c.Name.ToLower().Replace(" ", "") == x.ToLower().Replace(" ", "")).FirstOrDefault().ID;
};
foreach (DataRow r in initialExcelData.Rows)
{
try
{
IPart p = new Part
{
GenericCatalogID = genericCatalogID,
Number = r["Number"].ToString(),
Name = r["Name"].ToString(),
Length = decimal.Parse(r["Length"].ToString()),
Width = decimal.Parse(r["Width"].ToString()),
Height = decimal.Parse(r["Height"].ToString()),
ProfileID = getProfileID(r["Profile"].ToString()),
CategoryID = getCategoryID(r["Category"].ToString()),
PackageQty = int.Parse(r["PackageQty"].ToString()),
UnitMeasure = r["UnitMeasure"].ToString(),
Cost = decimal.Parse(r["Cost"].ToString())
};
GenericCatalogManagerBL.InsertPart(_genericCNN, p, out newPartID);
goodIndexes.Add(currIx);
}
catch (Exception)
{
numBadRows++;
numGoodRows--;
badIndexes.Add(currIx);
}
currIx++;
}
for (int i = 0; i < goodIndexes.Count; i++)
{
goodDataTable.ImportRow(initialExcelData.Rows[goodIndexes[i]]);
initialExcelData.Rows[goodIndexes[i]].Delete();
}
initialExcelData.AcceptChanges();
goodDataTable.AcceptChanges();
if (initialExcelData.Rows.Count > 0)
{
badDataTable = initialExcelData;
}
return new ProcessDataTablePartsResult(numGoodRows, numBadRows, badDataTable, goodDataTable);
});
}
**--Here is the entire flow of the function--**
public static async Task<GenericPartsReport> ProcessGenericPartsAsync(int genericCatalogID, MembershipUser user, HttpRequest request, bool emailReport, bool hasHeaders)
{
byte[] fbytes = new byte[request.ContentLength];
request.InputStream.Read(fbytes, 0, fbytes.Length);
string pathAndNewFileName = Path.GetRandomFileName() + Path.GetExtension(request.Headers["X-FILE-NAME"]),
badReportTableString = "",
goodReportTableString = "";
GenericPartsReport report = new GenericPartsReport();
//get the users temp folder
pathAndNewFileName = UtilCommon.SiteHelper.GetUserTempFolder(user, request) + pathAndNewFileName;
File.WriteAllBytes(pathAndNewFileName, fbytes);
//process the excel file first
DataTable excelDataTable = await ProcessExcelToDataTableAsync(pathAndNewFileName, hasHeaders ? "Yes" : "No");
ProcessDataTablePartsResult processedResult = await ProcessDataTablePartsAsync(genericCatalogID, excelDataTable);
if (processedResult.BadDataTable != null)
{
if (processedResult.BadDataTable.Rows.Count > 0)
{
badReportTableString = await BuildTableReportAsync(processedResult.BadDataTable, "AlumCloud Parts Not Added Report");
processedResult.BadDataTable.Dispose();
}
}
if (processedResult.GoodDataTable != null)
{
if (processedResult.GoodDataTable.Rows.Count > 0)
{
goodReportTableString = await BuildTableReportAsync(processedResult.GoodDataTable, "AlumCloud Parts Added Report");
processedResult.GoodDataTable.Dispose();
}
}
report.Report = "A total number of (" + processedResult.NumberOfGoodRows + ") records were added to your generic catalog.<br/><br/>A total number of (" + processedResult.NumberOfBadRows + ") records were excluded from being added to your generic catalog.";
if (processedResult.NumberOfBadRows > 0)
{
report.Report += "<br/><br/>You can review an excel file that meets the standards here: <a href='" + _exampleExcelFile + "'>How to format a part excel file</a>.";
report.HasBadRows = true;
}
if (processedResult.NumberOfGoodRows > 0)
{
report.Report += "<br/><br/><b>Below is all of the parts that were added to your generic catalog</b><br/><br/>" + goodReportTableString;
}
if (processedResult.NumberOfBadRows > 0)
{
report.Report += "<br/><br/><b>Below is all of the parts that were not added to your generic catalog</b><br/><br/>" + badReportTableString;
}
if (emailReport)
{
AFCCIncCommonUtil.EmailUtil.SendMailToThreadPool(user.Email, _supportEmail, report.Report, "AlumCloud Generic Catalog Parts Report", true);
}
excelDataTable.Dispose();
return report;
}
--This is the function that never returns or is in some state of limbo--
ProcessDataTablePartsResult processedResult = await ProcessDataTablePartsAsync(genericCatalogID, excelDataTable);

Missing First Columns and First Row in Excel C#

I am trying to read an Excel file in C#, but for some reason the first column and the first row are sometimes missing from the data.
When I open the file in Excel and save it without making any changes, the file is read correctly.
Any ideas about how this might happen?
Below is the code I am using to read the file:
string xlConn = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source="
+ txt_InputFile.Text
+ ";Extended Properties=Excel 8.0;";
using (OleDbConnection dbConnection = new OleDbConnection(xlConn))
{
dbConnection.Open();
// Get the name of the first worksheet:
DataTable dbSchema = dbConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dbSchema == null || dbSchema.Rows.Count < 1)
{
//"Error: Could not determine the name of the first worksheet."
throw new Exception(Program.lm_GetMethodLanguage(this.GetType().Name, "wp_InputFile_CloseFromNext", 5) );
}
string firstSheetName = dbSchema.Rows[0]["TABLE_NAME"].ToString();
using (
OleDbDataAdapter dbCommand = new OleDbDataAdapter("SELECT * FROM [" + firstSheetName + "]",
dbConnection))
{
using (DataSet myDataSet = new DataSet())
{
dbCommand.Fill(myDataSet);
inputData = myDataSet.Tables[0];
}
}
}
Use this. It retrieves the names of all the sheets in the Excel workbook.
private String[] GetExcelSheetNames(string excelFile)
{
// Declared outside the try block so that the finally block can dispose them.
OleDbConnection excelConnection = null;
DataTable dt = null;
try
{
// excelFile is the full path to the workbook.
string excelConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFile + ";Extended Properties=Excel 12.0;Persist Security Info=False";
excelConnection = new OleDbConnection(excelConnectionString);
excelConnection.Open();
dt = excelConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt == null)
{
return null;
}
String[] excelSheets = new String[dt.Rows.Count];
int i = 0;
foreach (DataRow row in dt.Rows)
{
excelSheets[i] = row["TABLE_NAME"].ToString();
i++;
}
return excelSheets;
}
catch (Exception ex)
{
return null;
}
finally
{
if (excelConnection != null)
{
excelConnection.Close();
excelConnection.Dispose();
}
if (dt != null)
{
dt.Dispose();
}
}
}
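Two things may be worth checking once you have the sheet names, although neither is confirmed as the cause in this case. First, as far as I know the schema table can also list named ranges (print areas, filter ranges) next to the worksheets, and taking `Rows[0]` may pick one of those instead of the whole sheet. Second, the provider defaults to `HDR=Yes`, so the first row is consumed as column headers unless you ask for `HDR=No`. The sketch below is a minimal illustration under those assumptions: `ExcelSheetHelper`, `IsWorksheetName`, `FilterWorksheets` and `BuildConnectionString` are hypothetical names, and the rule "a worksheet name ends with `$`, optionally wrapped in single quotes" is the usual naming convention rather than something taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExcelSheetHelper
{
    // Builds an ACE connection string. Passing hasHeaders = false emits HDR=No,
    // so the first row comes back as data instead of being used as column names.
    public static string BuildConnectionString(string excelFile, bool hasHeaders)
    {
        string hdr = hasHeaders ? "Yes" : "No";
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFile
             + ";Extended Properties='Excel 12.0;HDR=" + hdr + ";IMEX=1;'";
    }

    // Assumption: a TABLE_NAME that denotes a whole worksheet ends with '$'
    // (for example "Sheet1$"), or with "$'" when the name is quoted
    // (for example "'My Sheet$'"). Named ranges carry a suffix after the '$'.
    public static bool IsWorksheetName(string tableName)
    {
        if (string.IsNullOrEmpty(tableName))
        {
            return false;
        }
        // Strip the surrounding quotes before looking at the last character.
        string name = tableName.Trim('\'');
        return name.EndsWith("$", StringComparison.Ordinal);
    }

    // Keeps only the entries that look like worksheets, preserving their order.
    public static List<string> FilterWorksheets(IEnumerable<string> tableNames)
    {
        return tableNames.Where(IsWorksheetName).ToList();
    }
}
```

You would feed the array returned by `GetExcelSheetNames` into `FilterWorksheets` and select from the first remaining name, rather than from `Rows[0]` directly.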
