I'm using OpenXML to read an Excel file and write the data into a database.
Data from the spreadsheet is stored in a DataTable and then mapped into an object array. The problem is: if a user deletes a row in the Excel file, I get the exception 'Specified argument was out of the range of valid values. Parameter name: index' when reading the file.
To counter this, I tried checking whether the cell value is null and, if so, treating it as an empty string. However, the error occurs when I reach the 9th column in the last "deleted" row (8 columns out of 10 are taken as empty strings).
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Row row in rows)
{
DataRow tempRow = dt.NewRow();
for (int i = 0; i < tempRow.ItemArray.Count(); i++)
{
//I get the exception here
Cell c = row.Descendants<Cell>().ElementAt(i);
if (c.CellValue != null)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
else if (c.CellValue == null )
{
tempRow[i] = "";
}
}
dt.Rows.Add(tempRow);
}
Solved by checking that the index is within the range of cells available in the row. I also check whether the first cell in the row is empty and discard the row if it is.
foreach (Row row in rows)
{
bool isEmpty = false;
DataRow tempRow = dt.NewRow();
for (int i = 0; i < tempRow.ItemArray.Count(); i++)
{
if (i < row.Descendants<Cell>().Count())
{
Cell c = row.Descendants<Cell>().ElementAt(i);
if (c.CellValue != null)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
else
{
if (i == 0)
{
isEmpty = true;
break;
}
else tempRow[i] = "";
}
}
else
{
tempRow[i] = "";
}
}
if (isEmpty) continue;
dt.Rows.Add(tempRow);
}
I am using Visual Studio and writing C# code to read an Excel file, format the worksheet the way I want, and then export it back to Excel. After exporting, when I open the file, Excel treats the date column as "General" unless I right-click it and set its type to "Date". I need Excel to read it as "Date". The date is formatted correctly, but Excel is not reading it as a date. Any suggestions on how to fix this would be greatly appreciated. It's a lot of code, so I hope this is enough for someone to understand what I am trying to do.
// Excel rows are cell arrays so loop through once to determine
// the actual table geometry
int firstRow = xlSheet.FirstRowNum;
int lastRow = xlSheet.LastRowNum;
int firstCol = xlSheet.GetRow(firstRow).FirstCellNum;
int lastCol = xlSheet.GetRow(lastRow).LastCellNum;
for (int rowIdx = firstRow; rowIdx < lastRow; rowIdx++)
{
var r = xlSheet.GetRow(rowIdx);
if (r == null) { continue; } // Nothing here, move on to the next record
int thisFirstCol = xlSheet.GetRow(rowIdx).FirstCellNum;
int thisLastCol = xlSheet.GetRow(rowIdx).LastCellNum;
firstCol = Math.Min(thisFirstCol, firstCol);
lastCol = Math.Max(thisLastCol, lastCol);
}
// System.Diagnostics.Debug.WriteLine($"First Row/Col:{firstRow} / {firstCol} Last Row/Col {lastRow} / {lastCol}");
// Set up the column headers
for (int colIdx = firstCol; colIdx < lastCol; colIdx++)
{
string colName = $"field{colIdx}"; //default
if (useHeader)
{
try
{
// Only if firstRow isn't the header, otherwise you should choose useHeader = false
// when calling this from the adapter.
// Duplicate column name is rare, but it happens.
colName = xlSheet.GetRow(firstRow).GetCell(colIdx).ToString().Trim();
if (String.IsNullOrEmpty(colName) || dt.Columns.Contains(colName))
{
colName = $"field{colIdx}";
}
}
catch { } // May not have one, so the default will be used
}
dt.Columns.Add(colName, typeof(string));
}
// Now loop again to populate the data
DataRow dr = null;
if (useHeader) {
firstRow++;
lastRow++;
}
for (int rowIdx = firstRow; rowIdx < lastRow; rowIdx++)
{
var xlRow = xlSheet.GetRow(rowIdx);
if (xlRow == null) { continue; } // Nothing here, move on to the next record
dr = dt.NewRow();
// We have to account for a bizarre -1 first column index on some Excel files, Gordon is an example
// Datacolumns always start at 0, so we'll increment it independently, the count should be the same
// in theory...
int dtColIdx = -1;
for (int colIdx = firstCol; colIdx < lastCol; colIdx++)
{
dtColIdx++;
try
{
// Convert the value to a reasonable format
ICell cell = xlRow.GetCell(colIdx);
if (cell != null)
{
switch (cell.CellType)
{
case CellType.Formula:
dr[dtColIdx] = cell.NumericCellValue.ToString();
break;
case CellType.Numeric:
if (DateUtil.IsCellDateFormatted(cell))
{
// TODO: DateTime vs Date vs Time check
dr[dtColIdx] = cell.DateCellValue.ToString("MM/dd/yyyy");
// string xx = dr[dtColIdx].ToString();
// xlSheet.GetColumnStyle(dtColIdx).DataFormat = "mm/dd/yyyy";
}
else
{
dr[dtColIdx] = cell.NumericCellValue.ToString();
}
break;
default:
dr[dtColIdx] = cell.ToString();
break;
}
}
}
catch
{
// May be nothing at this address, plug it with a blank
dr[dtColIdx] = String.Empty;
}
}
dt.Rows.Add(dr);
}
dt.AcceptChanges();
//GenFieldParseCode(dt);
return dt;
}
My code saves a DataGridView to a CSV file. When it reaches the line
value = dr.Cells[i].Value.ToString();
I get the following error: System.Windows.Forms.DataGridViewCell.Value.get returned null.
I then added a check for null cells that writes "Null" in their place:
foreach (DataGridViewRow rw in this.dataGridView1.Rows)
{
for (int i = 0; i < rw.Cells.Count; i++)
{
if (rw.Cells[i].Value == System.DBNull.Value)
{
swOut.Write("Null");
}
}
}
But the error still occurs.
Here is my full code:
public void writeCSV(DataGridView gridIn, string outputFile)
{
//test to see if the DataGridView has any rows
if (gridIn.RowCount > 0)
{
string value = "";
DataGridViewRow dr = new DataGridViewRow();
StreamWriter swOut = new StreamWriter(outputFile);
foreach (DataGridViewRow rw in this.dataGridView1.Rows)
{
for (int i = 0; i < rw.Cells.Count; i++)
{
if (rw.Cells[i].Value == System.DBNull.Value)
{
swOut.Write("Null");
}
}
}
//write header rows to csv
for (int i = 0; i <= gridIn.Columns.Count - 1; i++)
{
if (i > 0)
{
swOut.Write(",");
}
swOut.Write(gridIn.Columns[i].HeaderText);
}
swOut.WriteLine();
//write DataGridView rows to csv
for (int j = 0; j <= gridIn.Rows.Count - 1; j++)
{
if (j > 0)
{
swOut.WriteLine();
}
dr = gridIn.Rows[j];
for (int i = 0; i <= gridIn.Columns.Count - 1; i++)
{
if (i > 0)
{
swOut.Write(",");
}
value = dr.Cells[i].Value.ToString();
//replace comma's with spaces
value = value.Replace(',', ' ');
//replace embedded newlines with spaces
value = value.Replace(Environment.NewLine, " ");
swOut.Write(value);
}
}
swOut.Close();
}
}
The current code appears to work… IF the DataGridView.AllowUserToAddRows is false! The default is true. If the grid allows users to add rows, the code will crash at the line…
value = dr.Cells[i].Value.ToString();
when it hits the “new” row. The cells in the “new” row are null not DBNull. If you want to allow the user to add rows (which I assume is the case since the code is stripping out commas and new lines), then the code will need to check for this “new” row and ignore it when exporting the grid.
With that said, I believe you are making this way more complicated than it has to be. The goal is to export the cells in a DataGridView to a comma delimited file (CSV). This can be done with much less code and still avoid the dreaded null values in the cells.
From the perspective of the CSV file, a “null” cell should be written as an “empty” string, which maintains the column schema. A simple double loop through the grid's cells is all that is needed. While writing the values to the file, check each cell's Value before calling ToString() on it: if Value is null, the call throws a NullReferenceException. So if Value “is” null, write an empty string to the file… problem solved!
Therefore, I recommend a different strategy for exporting the grid's cells. There appears to be no reason for the dr variable or the value variable. In addition, I assume that the cell text does “NOT” contain commas (,). If you “know” the grid is going to be exported to a CSV file… I would set up the grid's cells so that the user is “not” able to type a comma. Therefore, the code below does not strip out commas or new lines. Hope that makes sense.
public void writeCSV(DataGridView gridIn, string outputFile) {
try {
using (StreamWriter swOut = new StreamWriter(outputFile)) {
//write header rows to csv
for (int i = 0; i < gridIn.Columns.Count; i++) {
swOut.Write(gridIn.Columns[i].HeaderText);
if (i < gridIn.ColumnCount - 1) {
swOut.Write(",");
}
else {
swOut.WriteLine();
}
}
//write DataGridView rows to csv
for (int row = 0; row < gridIn.Rows.Count; row++) {
if (!gridIn.Rows[row].IsNewRow) {
for (int col = 0; col < gridIn.Columns.Count; col++) {
if (gridIn.Rows[row].Cells[col].Value != null) {
swOut.Write(gridIn.Rows[row].Cells[col].Value.ToString());
}
else {
swOut.Write("");
}
if (col < gridIn.Columns.Count - 1) {
swOut.Write(",");
}
else {
swOut.WriteLine();
}
}
}
}
}
}
catch (Exception e) {
MessageBox.Show("Error: " + e.Message);
}
}
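If the grid cannot be restricted so that users never type commas, an alternative to stripping them is to quote the field. The following is a minimal sketch of that idea, following the quoting convention described in RFC 4180; the CsvField class and its Escape method are hypothetical names introduced here for illustration, not part of the code above.

```csharp
using System;

public static class CsvField
{
    // Hypothetical helper: returns the text to write for one CSV field.
    // A field is wrapped in double quotes when it contains a comma, a double
    // quote, or a line break; embedded double quotes are doubled.
    public static string Escape(object value)
    {
        // Null and DBNull both become an empty field so the column count stays intact.
        string text = (value == null || value == DBNull.Value)
            ? ""
            : (value.ToString() ?? "");

        bool needsQuotes = text.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return text;
        }
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

With a helper like this, the inner loop could write CsvField.Escape(gridIn.Rows[row].Cells[col].Value) and would not need a separate null check.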
Task
Import data from excel to DataTable
Problem
Cells that do not contain any data are skipped, and the next cell in the row that has data is used as the value of the empty column.
E.g.
A1 is empty and A2 has the value Tom; while importing the data, A1 gets the value of A2 and A2 remains empty.
To make it very clear I am providing some screen shots below
This is the excel data
This is the DataTable after importing the data from excel
Code
public class ImportExcelOpenXml
{
public static DataTable Fill_dataTable(string fileName)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
dt.Rows.Add(tempRow);
}
}
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
return dt;
}
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
}
My Thoughts
I think there is some problem with
public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;
If I want the count of columns using Descendants:
IEnumerable<Row> rows = sheetData.Descendants<Row>();
int colCnt = rows.ElementAt(0).Count();
OR
if I want the count of rows using Descendants:
IEnumerable<Row> rows = sheetData.Descendants<Row>();
int rowCnt = rows.Count();
In both cases Descendants skips the empty cells.
Is there any alternative to Descendants?
Your suggestions are highly appreciated.
P.S.: I have also thought of getting the cell values by using cell names like A1, A2, but in order to do that I would need the exact count of columns and rows, which is not possible using the Descendants function.
If every cell in a row contains data, everything works fine. But if there is even a single empty cell in a row, things go haywire.
Why is it happening in the first place?
The reason lies in the line of code below:
row.Descendants<Cell>().Count()
Count() gives you the number of non-empty cells in the row: empty cells are typically not stored as Cell elements in the sheet XML, so they are neither counted nor returned. So, when you pass row.Descendants<Cell>().ElementAt(i) as an argument to the GetCellValue method like this:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
Then it finds the content of the i-th stored (non-empty) cell, not necessarily the content of the cell at column index i. E.g. if the first column is empty and we call ElementAt(0), it returns the value in the second column instead, and our program logic gets messed up.
Solution: We need to deal with the occurrence of empty cells in the row, i.e. we need to figure out the actual/effective column index of the target cell in case there were some empty cells before it in the given row. So, you need to replace your for loop below:
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
with
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
Cell cell = row.Descendants<Cell>().ElementAt(i);
int actualCellIndex = CellReferenceToIndex(cell);
tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}
Also, add the method below to your code; the modified snippet above uses it to obtain the actual/effective column index of any cell:
private static int CellReferenceToIndex(Cell cell)
{
// Convert the letters of a reference such as "AB12" to a zero-based column index
// (A -> 0, Z -> 25, AA -> 26) by treating them as base-26 digits.
int index = 0;
string reference = cell.CellReference.ToString().ToUpper();
foreach (char ch in reference)
{
if (!Char.IsLetter(ch))
{
break; // reached the row number
}
index = (index * 26) + (ch - 'A' + 1);
}
return index - 1;
}
Note: Column numbering in Excel starts at 1, unlike most programming languages, where indexes start at 0.
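To make the idea concrete without the OpenXML types, here is a minimal sketch that places values by their cell reference rather than by their position in the sequence. The SparseRow class, its method names, and the use of KeyValuePair for (reference, value) pairs are assumptions made for this illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class SparseRow
{
    // Converts the letter part of a reference such as "AB12" into a
    // zero-based column index (A -> 0, Z -> 25, AA -> 26).
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
            {
                break; // reached the row number
            }
            index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1;
    }

    // Places each value at the column named by its reference. Columns that
    // have no cell in the input stay as empty strings.
    public static string[] Pad(IEnumerable<KeyValuePair<string, string>> cells, int columnCount)
    {
        var result = new string[columnCount];
        for (int i = 0; i < columnCount; i++)
        {
            result[i] = "";
        }
        foreach (var cell in cells)
        {
            int col = ColumnIndex(cell.Key);
            if (col >= 0 && col < columnCount)
            {
                result[col] = cell.Value;
            }
        }
        return result;
    }
}
```

For a row that only stores B2 = "Tom" and D2 = "42", Pad with a column count of 4 yields an empty string, "Tom", an empty string, and "42", so each value lands under its own column.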
public void Read2007Xlsx()
{
try
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"D:\File.xlsx", false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
// Gets the column index of the cell with data
int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));
cellColumnIndex--; //zero based index
if (columnIndex < cellColumnIndex)
{
do
{
tempRow[columnIndex] = ""; //Insert blank data here;
columnIndex++;
}
while (columnIndex < cellColumnIndex);
}//end if block
tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);
columnIndex++;
}//end inner foreach loop
dt.Rows.Add(tempRow);
}//end outer foreach loop
}//end using block
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
}//end try
catch (Exception ex)
{
}
}//end Read2007Xlsx method
/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
} //end GetColumnName method
/// <summary>
/// Given just the column name (no row index), returns the one-based column index
/// (A = 1, Z = 26, AA = 27). The caller subtracts 1 to get a zero-based index.
/// Note: This method expects upper-case letters.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>One-based column index</returns>
public static int? GetColumnIndexFromName(string columnName)
{
//return columnIndex;
string name = columnName;
int number = 0;
int pow = 1;
for (int i = name.Length - 1; i >= 0; i--)
{
number += (name[i] - 'A' + 1) * pow;
pow *= 26;
}
return number;
} //end GetColumnIndexFromName method
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
if (cell.CellValue ==null)
{
return "";
}
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}//end GetCellValue method
foreach (Cell cell in row.Descendants<Cell>())
{
while (columnRef[i] + (dt.Rows.Count + 1) != cell.CellReference)
{
dt.Rows[dt.Rows.Count - 1][i] = "";
i += 1;
}
dt.Rows[dt.Rows.Count - 1][i] = GetValue(doc, cell);
i++;
}
Try this code. I made a few modifications and it worked for me:
public static DataTable Fill_dataTable(string filePath)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, false))
{
Sheet sheet = doc.WorkbookPart.Workbook.Sheets.GetFirstChild<Sheet>();
Worksheet worksheet = ((WorksheetPart)doc.WorkbookPart.GetPartById(sheet.Id.Value)).Worksheet;
IEnumerable<Row> rows = worksheet.GetFirstChild<SheetData>().Descendants<Row>();
List<string> columnRef = new List<string>();
foreach (Row row in rows)
{
if (row.RowIndex != null)
{
if (row.RowIndex.Value == 1)
{
foreach (Cell cell in row.Descendants<Cell>())
{
dt.Columns.Add(GetValue(doc, cell));
columnRef.Add(cell.CellReference.ToString().Substring(0, cell.CellReference.ToString().Length - 1));
}
}
else
{
dt.Rows.Add();
int i = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
while (columnRef[i] + (dt.Rows.Count + 1) != cell.CellReference)
{
dt.Rows[dt.Rows.Count - 1][i] = "";
i += 1;
}
dt.Rows[dt.Rows.Count - 1][i] = GetValue(doc, cell);
i += 1;
}
}
}
}
}
return dt;
}
I am trying to search every cell in my DataGridView for the value "test". However, it is only searching the first row (I believe it is searching all the columns). Any ideas on how I can fix this?
dataGridView1.SelectionMode = DataGridViewSelectionMode.CellSelect;
string searchValue = "test";
int searching = -1;
while (searching < 7)
{
searching++;
try
{
foreach (DataGridViewRow row in dataGridView1.Rows)
{
if (row.Cells[searching].Value.ToString().Equals(searchValue))
{
row.Cells[searching].Selected = true;
break;
}
}
}
catch (Exception exc)
{
// MessageBox.Show(exc.Message);
}
}
Use this snippet. Basically, we iterate through every row and column and mark a cell as selected if we find a match.
dataGridView1.SelectionMode = DataGridViewSelectionMode.CellSelect;
string searchValue = "test";
for (int row = 0; row < dataGridView1.Rows.Count; ++row)
{
for (int col = 0; col < dataGridView1.Columns.Count; ++col)
{
var cellValue = dataGridView1.Rows[row].Cells[col].Value;
if (cellValue != null && cellValue.ToString().Equals(searchValue))
{
dataGridView1.Rows[row].Cells[col].Selected = true;
// if you want to search every cell for the searchValue then you shouldn't break.
// break;
}
}
}
You can also do the above as follows, using concise LINQ code:
dataGridView1.SelectionMode = DataGridViewSelectionMode.CellSelect;
string searchValue = "test";
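// Requires "using System.Linq;". The grid's collections are non-generic, so Cast<T>() is needed before ToList().
dataGridView1.Rows.Cast<DataGridViewRow>().ToList().ForEach(row => row.Cells.Cast<DataGridViewCell>().ToList().ForEach(cell =>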
{
cell.Selected = (cell.Value != null && cell.Value.ToString().Equals(searchValue));
}));
I have a problem.
I have two loops (one for rows, one for columns) that create data in a DataTable. I want to check whether the cell in the column named "Name" is empty and, if it is, not add that row. So my question is: how do I cancel adding a row?
Got some code:
for (int i = 0; i < data.Count(); i++)
{
cell = data.ElementAt(i);
DataRow row;
row = dataTable.NewRow();
foreach (string column in columns)
{
if (row["Name"] == "")
{
row = null;
}
else
{
row[column] = cell;
}
}
if (row != null)
{
dataTable.Rows.Add(row);
}
}
But when the next iteration of the loop starts, it throws a NullReferenceException: Object reference not set to an instance of an object.
In general, I want to add to the DataTable only those rows where the cell in the column called "Name" is not empty (i.e. not "").
What is the best or easiest way to do this correctly?
Change it as:
....
foreach (string column in columns)
{
if (row["Name"] == "")
{
row = null;
break; //--> Add this line
}
else
{
row[column] = cell;
}
}
....
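Another way to look at it is to decide whether the row is wanted before creating it, so there is nothing to cancel. The following is a rough sketch of that approach under stated assumptions: each input record is taken to be a dictionary from column name to text, and the RowFilter class and AddNamedRows method are hypothetical names, since the question does not show how data and columns are defined.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowFilter
{
    // Adds one DataRow per record, skipping records whose "Name" is missing or blank.
    // Assumption: each record maps column names to their text values.
    public static void AddNamedRows(DataTable table, IEnumerable<IDictionary<string, string>> records)
    {
        foreach (var record in records)
        {
            string name;
            if (!record.TryGetValue("Name", out name) || string.IsNullOrWhiteSpace(name))
            {
                continue; // no row was created, so nothing has to be undone
            }

            DataRow row = table.NewRow();
            foreach (DataColumn column in table.Columns)
            {
                string value;
                bool found = record.TryGetValue(column.ColumnName, out value) && value != null;
                row[column] = found ? value : "";
            }
            table.Rows.Add(row);
        }
    }
}
```

Because the check happens before NewRow() is called, the row variable is never set to null, and the NullReferenceException from the original loop cannot occur.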