Read a particular column of an Excel file and add it to a query using C#

I want to read a particular column from an Excel file, pick each value, and put it into a query using C#. I have written code that reads an Excel file and shows it in a DataGridView, but I got stuck reading a particular column.
I need some help. Below is the code.
private void button1_Click(object sender, EventArgs e)
{
using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "Excel Workbook|*.xls", ValidateNames = true })
{
if (ofd.ShowDialog() == DialogResult.OK)
{
FileStream fs = File.Open(ofd.FileName, FileMode.Open, FileAccess.Read);
IExcelDataReader reader = ExcelReaderFactory.CreateBinaryReader(fs);
var conf = new ExcelDataSetConfiguration
{
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
UseHeaderRow = true
}
};
dataSet = reader.AsDataSet(conf);
cboSheet.Items.Clear();
foreach (DataTable dt in dataSet.Tables)
cboSheet.Items.Add(dt.TableName);
reader.Close();
}
}
}
private void cboSheet_SelectedIndexChanged(object sender, EventArgs e)
{
dataGridView.DataSource = dataSet.Tables[cboSheet.SelectedIndex];
}

Do you know whether the Excel file will be the same each time? If you can ensure the file's layout stays consistent, without any column modifications, you can do the following with the EPPlus package (the OfficeOpenXml namespace).
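public IEnumerable<string> ReadFile(string path)
{
using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
using (var memory = new MemoryStream())
{
file.CopyTo(memory);
// Rewind so the package reads the copied bytes from the start.
memory.Position = 0;
using (var package = new ExcelPackage(memory))
{
foreach (ExcelWorksheet worksheet in package.Workbook.Worksheets)
{
// Dimension is null for an empty worksheet.
if (worksheet.Dimension == null) continue;
for (var row = worksheet.Dimension.Start.Row; row <= worksheet.Dimension.End.Row; row++)
// Column 2 is the column being read; Value is an object, so convert it to a string.
yield return worksheet.Cells[row, 2].Value?.ToString();
}
}
}
}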
For brevity I didn't abstract anything; you could make the code clearer by separating the nested loops, one for the worksheets and one for the rows. As you can see, the "2" represents the specified column. This gives you a collection you can iterate over to feed the values into your query.
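To illustrate the last step, putting the collected values into a query, here is a minimal sketch that builds a parameterized IN clause rather than concatenating cell values into the SQL text. The class name `QueryBuilder`, the method `BuildInQuery`, and the table and column names in the usage note are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryBuilder
{
    // Builds "SELECT * FROM <table> WHERE <column> IN (@p0, @p1, ...)" and returns
    // it together with the parameter name/value pairs, in order.
    // The table and column names must come from trusted code, never from the spreadsheet.
    public static (string Sql, List<KeyValuePair<string, string>> Parameters) BuildInQuery(
        string table, string column, IEnumerable<string> values)
    {
        var parameters = values
            .Where(v => !string.IsNullOrWhiteSpace(v))   // skip empty cells
            .Select(v => v.Trim())
            .Distinct()
            .Select((v, i) => new KeyValuePair<string, string>("@p" + i, v))
            .ToList();

        if (parameters.Count == 0)
            throw new ArgumentException("No usable values were supplied.", nameof(values));

        var names = string.Join(", ", parameters.Select(p => p.Key));
        var sql = $"SELECT * FROM {table} WHERE {column} IN ({names})";
        return (sql, parameters);
    }
}
```

Assuming a SqlCommand named cmd, you would assign the returned Sql to cmd.CommandText and call cmd.Parameters.AddWithValue(p.Key, p.Value) for each returned pair, for example after calling BuildInQuery("Orders", "CustomerCode", ReadFile(path)).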

You can use ExcelDataReader, which will give you the entire Excel content as a DataTable.
Or you can use OleDbCommand:
OleDbConnection conn = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + ExcelPath + "; Extended Properties = 'Excel 12.0;HDR=YES;';");
conn.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
cmd.CommandText = "SELECT [Colum1], [Colum2] FROM [Sheet1$]";
OleDbDataReader reader = cmd.ExecuteReader();
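Once the sheet is in a DataTable (for example, one of the tables produced by ExcelDataReader's AsDataSet in the question), pulling out a single column is a short loop. The following is only a sketch; the helper name `ReadColumn` is hypothetical, and it assumes the first row was used as the header so the column can be addressed by name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnReader
{
    // Returns the non-empty values of one column as trimmed strings.
    public static List<string> ReadColumn(DataTable table, string columnName)
    {
        return table.Rows.Cast<DataRow>()
            .Select(row => row[columnName])
            .Where(value => value != null && value != DBNull.Value)  // skip empty cells
            .Select(value => value.ToString().Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

With the question's code, this could be called as ColumnReader.ReadColumn(dataSet.Tables[cboSheet.SelectedIndex], "SomeHeader"), where "SomeHeader" stands in for whatever header the sheet actually has.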

Related

Trimming all cells in DataTable

I am using the below code to trim all cells in my DataTable.
The problem is that I am doing it through a loop, and depending on what I fill the DataTable with (say, 1500 rows and 20 columns), the loop takes a really long time.
DataColumn[] stringColumns = dtDataTable.Columns.Cast<DataColumn>().Where(c => c.DataType == typeof(string)).ToArray();
foreach (DataRow row in dtDataTable.Rows)
{
foreach (DataColumn col in stringColumns)
{
if (row[col] != DBNull.Value)
{
row.SetField<string>(col, row.Field<string>(col).Trim());
}
}
}
And here is how I am importing my Excel sheet to the DataTable:
using (OpenFileDialog ofd = new OpenFileDialog() { Title = "Select File", Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97-2003|*.xls|All Files(*.*)|*.*", Multiselect = false, ValidateNames = true })
{
if (ofd.ShowDialog() == DialogResult.OK)
{
String PathName = ofd.FileName;
FileName = System.IO.Path.GetFileNameWithoutExtension(ofd.FileName);
strConn = string.Empty;
FileInfo file = new FileInfo(PathName);
if (!file.Exists) { throw new Exception("Error, file doesn't exist!"); }
string extension = file.Extension;
switch (extension)
{
case ".xls":
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PathName + ";Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
case ".xlsx":
strConn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + PathName + ";Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
break;
default:
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + PathName + ";Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
}
}
else
{
return;
}
}
using (OleDbConnection cnnxls = new OleDbConnection(strConn))
{
using (OleDbDataAdapter oda = new OleDbDataAdapter(string.Format("select * from [{0}$]", "Sheet1"), cnnxls))
{
oda.Fill(dtDataTableInitial);
}
}
// Clone dtDataTableInitial so that the new DataTable can have string-typed columns
dtDataTable = dtDataTableInitial.Clone();
foreach (DataColumn col in dtDataTable.Columns)
{
col.DataType = typeof(string);
}
foreach (DataRow row in dtDataTableInitial.Rows)
{
dtDataTable.ImportRow(row);
}
Is there a more efficient way of accomplishing this?
EDIT: As per JQSOFT's suggestion, I am now using OleDbDataReader, but I am still running into two issues:
One: SELECT RTRIM(LTRIM(*)) FROM [Sheet1$] doesn't seem to work.
I know it is possible to select each column one by one, but the number and headers of the columns in the Excel sheet vary, and I am not sure how to adjust my SELECT string to account for this.
Two: In a column whose rows are mostly populated with numbers but a few contain letters, the rows with letters seem to be omitted. For example:
Col1
1
2
3
4
5
6
a
b
Becomes:
Col1
1
2
3
4
5
6
I have discovered that if I manually go into the Excel sheet and convert the cell format of the entire table to "Text", this issue is resolved. However, doing so converts any dates in that sheet into unrecognizable strings of numbers, so I want to avoid it if at all possible.
For example: 7/2/2020 becomes 44014 if converted to "Text".
Here is my new code:
private void Something()
{
if (ofd.ShowDialog() == DialogResult.OK)
{
PathName = ofd.FileName;
FileName = System.IO.Path.GetFileNameWithoutExtension(ofd.FileName);
strConn = string.Empty;
FileInfo file = new FileInfo(PathName);
if (!file.Exists) { throw new Exception("Error, file doesn't exist!"); }
}
using (OleDbConnection cn = new OleDbConnection { ConnectionString = ConnectionString(PathName, "No") })
{
using (OleDbCommand cmd = new OleDbCommand { CommandText = query, Connection = cn })
{
cn.Open();
OleDbDataReader dr = cmd.ExecuteReader();
dtDataTable.Load(dr);
}
}
dataGridView1.DataSource = dtDataTable;
}
public string ConnectionString(string FileName, string Header)
{
OleDbConnectionStringBuilder Builder = new OleDbConnectionStringBuilder();
if (Path.GetExtension(FileName).ToUpper() == ".XLS")
{
Builder.Provider = "Microsoft.Jet.OLEDB.4.0";
// Use the Header argument ("Yes" or "No") instead of ignoring it.
Builder.Add("Extended Properties", string.Format("Excel 8.0;IMEX=1;HDR={0};", Header));
}
else
{
Builder.Provider = "Microsoft.ACE.OLEDB.12.0";
Builder.Add("Extended Properties", string.Format("Excel 12.0;IMEX=1;HDR={0};", Header));
}
Builder.DataSource = FileName;
return Builder.ConnectionString;
}
OleDb Objects
Actually, what I meant is this: to get formatted/trimmed string values from the Excel sheet and create a DataTable with DataColumn objects of string type only, use the forward-only OleDbDataReader to create both the DataColumn and the DataRow objects as it reads. That way the data is modified and filled in one step, so there is no need to call another routine that loops again and wastes more time. Also, consider using asynchronous calls so the UI does not freeze while the lengthy task runs.
Something like this might help you get started:
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var conString = string.Empty;
var msg = "Loading... Please wait.";
try
{
switch (ofd.FilterIndex)
{
case 1: //xlsx
conString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={ofd.FileName};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
break;
case 2: //xls
conString = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={ofd.FileName};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
default:
throw new FileFormatException();
}
var sheetName = "sheet1";
var dt = new DataTable();
//Optional: a label to show the current status
//or maybe show a ProgressBar with ProgressBarStyle = Marquee
lblStatus.Text = msg;
await Task.Run(() =>
{
using (var con = new OleDbConnection(conString))
using (var cmd = new OleDbCommand($"SELECT * From [{sheetName}$]", con))
{
con.Open();
using (var r = cmd.ExecuteReader())
while (r.Read())
{
if (dt.Columns.Count == 0)
for (var i = 0; i < r.FieldCount; i++)
dt.Columns.Add(r.GetName(i).Trim(), typeof(string));
object[] values = new object[r.FieldCount];
r.GetValues(values);
dt.Rows.Add(values.Select(x => x?.ToString().Trim()).ToArray());
}
}
});
//If you want...
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
lblStatus.Text = msg;
}
}
}
In a demo reading sheets with 8 columns and 5000 rows from both xls and xlsx files, loading took less than a second. Not bad.
However, this will not work correctly if the sheet has mixed-type columns, as in your case, where the third column has string and int values in different rows. That is because the data type of a column is guessed by the OLE DB provider by examining the first 8 rows by default. Changing this behavior requires changing the registry value TypeGuessRows under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\x.0\Engines\Excel from 8 to 0, which forces checking all the rows instead of just the first 8. This will dramatically slow down performance.
Office Interop Objects
Alternatively, you could use the Microsoft.Office.Interop.Excel objects to read the Excel Sheet, get and format the values of the cells regardless of their types.
using Excel = Microsoft.Office.Interop.Excel;
//...
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var msg = "Loading... Please wait.";
Excel.Application xlApp = null;
Excel.Workbook xlWorkBook = null;
try
{
var dt = new DataTable();
lblStatus.Text = msg;
await Task.Run(() =>
{
xlApp = new Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(ofd.FileName, Type.Missing, true);
var xlSheet = xlWorkBook.Sheets[1] as Excel.Worksheet;
var xlRange = xlSheet.UsedRange;
dt.Columns.AddRange((xlRange.Rows[xlRange.Row] as Excel.Range)
.Cells.Cast<Excel.Range>()
.Where(h => h.Value2 != null)
.Select(h => new DataColumn(h.Value2.ToString()
.Trim(), typeof(string))).ToArray());
foreach (var r in xlRange.Rows.Cast<Excel.Range>().Skip(1))
dt.Rows.Add(r.Cells.Cast<Excel.Range>()
.Take(dt.Columns.Count)
.Select(v => v.Value2 is null
? string.Empty
: v.Value2.ToString().Trim()).ToArray());
});
(dataGridView1.DataSource as DataTable)?.Dispose();
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
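xlWorkBook?.Close(false);
xlApp?.Quit();
// FinalReleaseComObject throws on null, so guard against a failed startup.
if (xlWorkBook != null) Marshal.FinalReleaseComObject(xlWorkBook);
if (xlApp != null) Marshal.FinalReleaseComObject(xlApp);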
xlWorkBook = null;
xlApp = null;
GC.Collect();
GC.WaitForPendingFinalizers();
lblStatus.Text = msg;
}
}
}
Note: You need to add a reference to the mentioned library.
This is not fast, especially with a large number of cells, but it produces the desired output.
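One caveat with reading Value2: Excel hands dates back as serial numbers (doubles), which is the same effect the question describes where 7/2/2020 shows up as 44014. A rough sketch of converting such a value back while formatting a cell follows. The class `ExcelDates`, the method `FormatCell`, and the `isDateColumn` flag are hypothetical, and the sketch assumes the caller already knows which columns hold dates.

```csharp
using System;
using System.Globalization;

public static class ExcelDates
{
    // Formats a raw cell value as a trimmed string.
    // For columns known to contain dates, a double is treated as an
    // OLE Automation date serial and rendered as yyyy-MM-dd.
    public static string FormatCell(object value2, bool isDateColumn)
    {
        if (value2 == null) return string.Empty;

        if (isDateColumn && value2 is double serial)
            return DateTime.FromOADate(serial)
                .ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        return Convert.ToString(value2, CultureInfo.InvariantCulture).Trim();
    }
}
```

In the Interop loop above, the Select over the cells could call ExcelDates.FormatCell(v.Value2, isDate) instead of calling ToString() directly, with isDate decided per column.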

How can I create a new worksheet and save data there whenever the date changes its month in EPPlus

I have a column named datein which has dates in it. The question is: how can I create a new worksheet and save the data there whenever the date changes its month? A new month means a new worksheet.
All I have right now is my code for saving to Excel using EPPlus.
SaveFileDialog saveFileDialog1 = new SaveFileDialog();
using (MySqlConnection con = new MySqlConnection(connectionString))
{
using (MySqlCommand cmd = new MySqlCommand("SELECT * FROM statusrouted.routed", con))
{
cmd.CommandType = CommandType.Text;
using (MySqlDataAdapter sda = new MySqlDataAdapter(cmd))
{
using (DataTable dt = new DataTable())
{
using (ExcelPackage pck = new ExcelPackage())
{
sda.Fill(dt);
ExcelWorksheet ws = pck.Workbook.Worksheets.Add(DateTime.Today.ToString("MMMM-yyyy"));
ws.Cells.Style.Font.Size = 11;
ws.Cells["B1:K1"].Merge = true;
ws.Cells["B1"].Value = "INCOMING AND OUTGOING ROUTED COMMUNICATIONS";
ws.Cells["B1"].Style.Font.Size = 12;
ws.Cells["B1"].Style.Font.Bold = true;
ws.Cells["B1"].Style.HorizontalAlignment = OfficeOpenXml.Style.ExcelHorizontalAlignment.Center;
ws.Cells["B1"].Style.VerticalAlignment = OfficeOpenXml.Style.ExcelVerticalAlignment.Center;
ws.Cells["A2"].LoadFromDataTable((this.maindgv.DataSource as DataTable).DefaultView.ToTable(), true);
ws.DeleteColumn(1);
saveFileDialog1.Title = "Save as Excel";
saveFileDialog1.FileName = "";
saveFileDialog1.Filter = "Excel files(2007)|*.xlsx";
if (saveFileDialog1.ShowDialog() != DialogResult.Cancel)
{
try
{
pck.SaveAs(new FileInfo(@"" + saveFileDialog1.FileName));
recentsToolStripMenuItem1.AddRecentItem(@"" + saveFileDialog1.FileName);
}
catch (Exception)
{
DialogResult reminder = MessageBox.Show("Cannot save file, file opened in another program.\nClose it first! ", "Save Failed", MessageBoxButtons.OK);
}
}
}
}
}
}
}
The logic would work as follows:
var dateGroups = dt.AsEnumerable().GroupBy(row => row["ColumnName of Date Time"]);
foreach (var group in dateGroups)
{
// the group.Key will give the Date on the basis of which you can create sheets.
// and then your logic of excel
foreach (var rows in group)
{
// excel filling logic
}
}
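Note that grouping on the raw date value yields one group per distinct date rather than per month. A minimal sketch of grouping by calendar month instead, assuming the date column (datein in the question) holds DateTime values, could look like this. The class `MonthGrouping` and the method `GroupByMonth` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Linq;

public static class MonthGrouping
{
    // Splits the rows by the calendar month of the given date column.
    // Each key is a sheet name such as "July-2020"; groups are in chronological order.
    public static List<KeyValuePair<string, List<DataRow>>> GroupByMonth(
        DataTable table, string dateColumn)
    {
        return table.Rows.Cast<DataRow>()
            .Where(row => row[dateColumn] != DBNull.Value)
            .GroupBy(row =>
            {
                var d = Convert.ToDateTime(row[dateColumn], CultureInfo.InvariantCulture);
                return new DateTime(d.Year, d.Month, 1);   // first day of the month as the key
            })
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<string, List<DataRow>>(
                g.Key.ToString("MMMM-yyyy", CultureInfo.InvariantCulture), g.ToList()))
            .ToList();
    }
}
```

For each returned pair you could then call pck.Workbook.Worksheets.Add(pair.Key) and fill the sheet with LoadFromDataTable(pair.Value.CopyToDataTable(), true), reusing the formatting code from the question.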

How to create a Treeview from CSV after uploading it into a Dataset in C#?

I am new to C#. I decided to try to make a program for my department in which users can load a CSV and look up serial numbers and the names associated with them. In my program, I import a CSV file into a DataGridView and display it.
Is it possible to then take the information I just loaded and display it in a TreeView?
The information in the CSV currently contains only 2 headers, but I would like to add more. The first column is a Serial Number, and the second column is a Name.
Thanks in advance for any help.
Below is how I import the CSV.
public DataTable ReadCsv(string fileName)
{
DataTable dt = new DataTable("Data");
using (OleDbConnection cn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
Path.GetDirectoryName(fileName) + "\";Extended Properties='text;HDR=yes;FMT=Delimited(,)';"))
{
using (OleDbCommand cmd = new OleDbCommand(string.Format("select * from [{0}]", new FileInfo(fileName).Name), cn))
{
cn.Open();
using (OleDbDataAdapter adapter = new OleDbDataAdapter(cmd))
{
adapter.Fill(dt);
}
}
}
return dt;
}
//Button Open
private void btnOpen_Click(object sender, EventArgs e)
{
try
{
using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "CSV|*.csv", ValidateNames = true, Multiselect = false })
{
if (ofd.ShowDialog() == DialogResult.OK)
{
dataGridView1.DataSource = ReadCsv(ofd.FileName);
dataGridView1.AutoResizeColumns(DataGridViewAutoSizeColumnsMode.AllCells); //Autosize columns
label1.Text = ofd.FileName + " Loaded"; //Label update, only after a file was actually chosen
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message, "Unable To Read CSV", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
}
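As a possible starting point for the TreeView part, the loaded DataTable can first be grouped into a parent/child shape. The sketch below assumes the layout described above (serial number in the first column, name in the second) and that names should become the parent nodes; the class `TreeData` and the method `GroupSerialsByName` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TreeData
{
    // Groups serial numbers under their name, ignoring case and blank cells.
    // The sorted dictionary gives the parent nodes a stable alphabetical order.
    public static SortedDictionary<string, List<string>> GroupSerialsByName(
        DataTable table, int serialColumn = 0, int nameColumn = 1)
    {
        var groups = new SortedDictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (DataRow row in table.Rows)
        {
            var name = Convert.ToString(row[nameColumn]).Trim();
            var serial = Convert.ToString(row[serialColumn]).Trim();
            if (name.Length == 0 || serial.Length == 0) continue;

            if (!groups.TryGetValue(name, out var serials))
                groups[name] = serials = new List<string>();
            serials.Add(serial);
        }
        return groups;
    }
}
```

In the form, each entry could then become a node: var parent = treeView1.Nodes.Add(entry.Key); followed by parent.Nodes.Add(serial) for each serial in entry.Value, where treeView1 is assumed to be a TreeView placed on the form.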

Excel Data Reader Issues, column Names, and Sheet Selection

I am using Excel Data Reader to read some data into an Entity Framework database.
The code below is working, but I need some further refinements.
First of all, IsFirstRowAsColumnNames does not seem to be working as intended, and I have to use .Read instead.
The fudge I originally had in place to select a particular sheet has scuppered my plans. Can anyone help with this? excelReader.Name is pointless at the moment unless I can specifically loop through or select a sheet, which I originally used .Read to achieve, hence the conflict.
It would also be nice to refer to the actual column header names to retrieve the data rather than indexes, such as var name = reader["applicationname"].ToString() in SqlClient.
Is there perhaps a better extension I could use to read Excel data if I can't achieve the above?
public static void DataLoadAliases(WsiContext context)
{
const string filePath = @"Alias Master.xlsx";
var stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
var excelReader = filePath.Contains(".xlsx")
? ExcelReaderFactory.CreateOpenXmlReader(stream)
: ExcelReaderFactory.CreateBinaryReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
excelReader.Read(); //skip first row
while (excelReader.Read())
{
if (excelReader.Name == "Alias Master")
{
var aliasId = excelReader.GetInt16(0);
var aliasName = excelReader.GetString(1);
//Prevent blank lines coming in from excel;
if (String.IsNullOrEmpty(aliasName)) continue;
context.Aliases.Add(new ApplicationAlias
{
AliasId = aliasId,
Name = aliasName,
});
}
else
{
excelReader.NextResult();
}
}
excelReader.Close();
context.SaveChanges();
}
For .xlsx files I use the OpenXML SDK:
http://www.microsoft.com/en-us/download/details.aspx?id=30425
For .xls files I use an OleDbConnection, as shown below:
OleDbConnection oledbConn = new OleDbConnection(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + FilePath+ ";Extended Properties='Excel 12.0;HDR=NO;IMEX=1;';");
oledbConn.Open();
OleDbCommand cmd = new OleDbCommand();
OleDbDataAdapter oleda = new OleDbDataAdapter();
DataSet ds = new DataSet();
DataTable dt = oledbConn.GetOleDbSchemaTable(System.Data.OleDb.OleDbSchemaGuid.Tables, null);
string workSheetName = (string)dt.Rows[0]["TABLE_NAME"];
cmd.Connection = oledbConn;
cmd.CommandType = CommandType.Text;
cmd.CommandText = "SELECT * FROM [" + workSheetName + "]";
oleda = new OleDbDataAdapter(cmd);
oleda.Fill(ds, "Donnees");
oledbConn.Close();
return ds.Tables[0];
DataTable DT = new DataTable();
FileStream stream = File.Open(Filepath, FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
DataSet result = excelReader.AsDataSet();
excelReader.Close();
DT = result.Tables[0];
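Once the workbook is available as a DataSet, each sheet is a DataTable, so a sheet can be picked by name and, provided the first row was used as the header row, cells can be read by column header instead of by index. The sketch below shows the idea on a plain DataSet. The class `SheetAccess`, the method `ReadAliases`, and the header names "AliasId" and "AliasName" are assumptions for illustration; the real headers in the workbook may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class SheetAccess
{
    // Reads (id, name) pairs from the named sheet, addressing cells by header name.
    // Rows with a blank name are skipped, mirroring the blank-line check in the question.
    public static List<(int Id, string Name)> ReadAliases(DataSet workbook, string sheetName)
    {
        var sheet = workbook.Tables[sheetName]
            ?? throw new ArgumentException($"Sheet '{sheetName}' not found.", nameof(sheetName));

        var result = new List<(int Id, string Name)>();
        foreach (DataRow row in sheet.Rows)
        {
            var name = Convert.ToString(row["AliasName"]).Trim();
            if (name.Length == 0) continue;

            // Numeric cells often arrive as double, so convert rather than cast.
            var id = Convert.ToInt32(row["AliasId"], CultureInfo.InvariantCulture);
            result.Add((id, name));
        }
        return result;
    }
}
```

With the snippet above, this would be called as SheetAccess.ReadAliases(result, "Alias Master"), assuming the DataSet was built with the header row turned into column names.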

Read columns names issue with reader.GetName and OLEDB Excel provider

I have an issue retrieving the column names in an Excel sheet.
I have an Excel sheet with only 3 cells in the first row with these 3 values:
in A1: A
in B1: B
in C1: A.B.C
When I try to execute my method the label shows:
A,B,A#B#C
And not:
A,B,A.B.C
My Code:
protected void btnExecute_Click(object sender, EventArgs e)
{
string fullFileName = @"C:\TEST.xls";
List<string> columns = new List<string>();
string connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;", fullFileName);
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
// Retrieves the first sheet
DataTable dt = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string firstSheet = dt.Rows[0]["TABLE_NAME"].ToString();
// Retrieves the list column name
string query = string.Format("SELECT TOP 1 * FROM [{0}]", firstSheet);
OleDbCommand cmd = new OleDbCommand(query, conn);
OleDbDataReader reader = cmd.ExecuteReader();
for (int i = 0; i < reader.FieldCount; i++)
{
columns.Add(reader.GetName(i));
}
}
lblCols.Text = string.Join(",", columns.ToArray());
}
Do you have any idea how to fix this issue?
Thanks in advance.
Daniel
Try this OleDbDataAdapter Excel Q&A I posted on Stack Overflow.
I just tested it, and it picked up "A.B.C" from a sample .xls like the one you described.
That is, add the following to the bottom for a quick test:
Object o = ds.Tables["xlsImport"].Rows[0]["LocationID"];
Object oa = ds.Tables["xlsImport"].Rows[0]["PartID"];
Object row0Col3 = ds.Tables["xlsImport"].Rows[0][2];
string valLocationID = o.ToString();
string valPartID = oa.ToString();
string rowZeroColumn3 = row0Col3.ToString();
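Another workaround that is sometimes used, assuming the provider is what substitutes '#' for '.' when it turns header cells into column names, is to open the sheet with HDR=No so that the header row comes back as ordinary data (the columns are then named F1, F2, and so on), and to promote that first row to column names yourself. The helper below is only a sketch of that second step on a plain DataTable; the class `HeaderFix` and the method `PromoteFirstRowToHeaders` are hypothetical.

```csharp
using System;
using System.Data;

public static class HeaderFix
{
    // Renames the columns using the values of the first row, then removes that row.
    // Blank or duplicate header cells keep their original column name.
    public static void PromoteFirstRowToHeaders(DataTable table)
    {
        if (table.Rows.Count == 0) return;

        var header = table.Rows[0];
        for (var i = 0; i < table.Columns.Count; i++)
        {
            var name = Convert.ToString(header[i]).Trim();
            if (name.Length > 0 && !table.Columns.Contains(name))
                table.Columns[i].ColumnName = name;
        }
        table.Rows.RemoveAt(0);
    }
}
```

After filling a DataTable through a connection string that uses HDR=No, calling HeaderFix.PromoteFirstRowToHeaders(table) should leave the third column named exactly as it is typed in the sheet, since the text was read as a cell value rather than as a field name.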
