I'm using OpenXML to read an Excel file and write its data into a database.
The spreadsheet data is stored in a DataTable and then mapped into an object array. The problem: if a user deletes a row in the Excel file, I get the exception 'Specified argument was out of the range of valid values. Parameter name: index' when reading the file.
To counter this, I tried checking whether the cell value is null and, if so, treating it as an empty string. However, the error still occurs when I reach the 9th column of the last "deleted" row (8 of the 10 columns are read as empty strings).
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Row row in rows)
{
DataRow tempRow = dt.NewRow();
for (int i = 0; i < tempRow.ItemArray.Count(); i++)
{
//I get the exception here
Cell c = row.Descendants<Cell>().ElementAt(i);
if (c.CellValue != null)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
else if (c.CellValue == null )
{
tempRow[i] = "";
}
}
dt.Rows.Add(tempRow);
}
Solved by checking whether the index is within the range of cells actually present in the row, since a row can contain fewer Cell elements than the DataTable has columns. I also check whether the first cell in the row is empty and discard the row if it is.
foreach (Row row in rows)
{
bool isEmpty = false;
DataRow tempRow = dt.NewRow();
for (int i = 0; i < tempRow.ItemArray.Count(); i++)
{
if (i < row.Descendants<Cell>().Count())
{
Cell c = row.Descendants<Cell>().ElementAt(i);
if (c.CellValue != null)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
else
{
if (i == 0)
{
isEmpty = true;
break;
}
else tempRow[i] = "";
}
}
else
{
tempRow[i] = "";
}
}
if (isEmpty) continue;
dt.Rows.Add(tempRow);
}
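One caveat with the fix above: ElementAt(i) is positional, so if a cell in the middle of a row is missing from the XML, every value after it shifts one column to the left. A minimal sketch of a way around this, assuming each Cell carries an A1-style CellReference such as "C5": convert the reference into a zero-based column index and write the value at that index. The class and method names here are hypothetical.

```csharp
using System;

public static class CellReferenceHelper
{
    // Converts an A1-style reference ("A1", "C5", "AA10") to a zero-based column index.
    public static int ColumnIndexFromReference(string cellReference)
    {
        if (string.IsNullOrEmpty(cellReference))
            throw new ArgumentException("Cell reference is empty.", nameof(cellReference));

        int index = 0;
        int letters = 0;
        foreach (char ch in cellReference)
        {
            char up = char.ToUpperInvariant(ch);
            if (up < 'A' || up > 'Z') break;      // stop at the row digits
            index = index * 26 + (up - 'A' + 1);  // base-26 with A = 1
            letters++;
        }
        if (letters == 0)
            throw new ArgumentException("No column letters in reference.", nameof(cellReference));

        return index - 1;
    }
}
```

Inside the row loop you could then pre-fill tempRow with empty strings and, for each cell c in the row, assign tempRow[CellReferenceHelper.ColumnIndexFromReference(c.CellReference.Value)], guarding against indexes beyond the table's column count.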
I know this question has been asked multiple times, but I could not find much help in any of those threads.
I don't want to convert the Excel file into a DataTable; I want to convert it to a list of objects and send it to the server side for processing.
If it has more than 2K rows, it should throw an error. Currently I am doing something like this:
using (var excel = new ExcelPackage(hpf.InputStream))
{
var ws = excel.Workbook.Worksheets["Sheet1"];
for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
{
if (ws.Cells[rw, 1].Value != null)
{
int headerRow = 2;
GroupMembershipUploadInput gm = new GroupMembershipUploadInput();
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
{
var s = ws.Cells[rw, col].Value;
if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))
{
gm.cnst_mstr_id = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
}
else if (ws.Cells[headerRow, col].Value.ToString().Equals("Prefix of the constituent(Mr, Mrs etc)"))
{
gm.cnst_prefix_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
}
else if (ws.Cells[headerRow, col].Value.ToString().Equals("First Name of the constituent(Mike)"))
{
gm.cnst_first_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
}
.....................
.....................
}
}
iUploadedCnt = iUploadedCnt + 1; //Increase the count by 1
}
if (lgl.GroupMembershipUploadInputList.Count < 2003) //Check for the uploaded list count
{
//throw the error
}
But this approach seems slow.
Converting the Excel file to a list seems slow to me. For example, when I upload more than 2K records, the whole sheet is first converted to a list and only then is the count checked against 2003. This is definitely slower than it needs to be.
How can this be achieved in a faster or better way?
You do a lot of repeated string processing, which is unnecessary. For each row you check again whether the column headers match some predefined value (for instance if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))).
You could do this once, before you start parsing the rows, and create for instance a Dictionary<int, SomeEnum> that maps each column number to a specific enum value. When parsing the rows you can then do a quick dictionary lookup to find which column maps to which property.
Furthermore, you define var s = ws.Cells[rw, col].Value; but never use it. Instead, you read the cell value again when you assign it to a property of your object. You could do the necessary conversions and checks once here and then use only s.
// define this enum somewhere
enum ColumnPropEnum {
cnst_mstr_id, cnst_prefix_nm, ...
}
//define this prop somewhere
Dictionary<int, ColumnPropEnum> colprops = new Dictionary<int, ColumnPropEnum>();
//do this once before processing all rows
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++) {
if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))
colprops.Add(col, ColumnPropEnum.cnst_mstr_id);
else if (ws.Cells[headerRow, col].Value.ToString().Equals(" ..."))
colprops.Add(col, ColumnPropEnum.cnst_prefix_nm);
...
}
//now use this dictionary in each row
for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
{
....
for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++) {
//The ?. operator returns null if Value is null and otherwise calls ToString(). The ?? operator then substitutes "" if that result is null.
var s = ws.Cells[rw, col].Value?.ToString() ?? "";
ColumnPropEnum cp;
if (colprops.TryGetValue(col, out cp)) {
switch (cp) {
case ColumnPropEnum.cnst_mstr_id: gm.cnst_mstr_id = s; break;
case ColumnPropEnum.cnst_prefix_nm: gm.cnst_prefix_nm = s; break;
...
}
}
}
}
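The same idea can be sketched without the enum and switch by mapping each header text directly to a setter delegate. This is a variation on the approach above, not the exact code: plain arrays stand in for the worksheet's header row and data rows, GroupMembershipUploadInput is a stand-in containing only the three properties shown in the question, and the HeaderMapping class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the question's class; only the three properties shown there.
public class GroupMembershipUploadInput
{
    public string cnst_mstr_id { get; set; }
    public string cnst_prefix_nm { get; set; }
    public string cnst_first_nm { get; set; }
}

public static class HeaderMapping
{
    // Header text -> setter. Declared once instead of being compared for every row.
    static readonly Dictionary<string, Action<GroupMembershipUploadInput, string>> SettersByHeader =
        new Dictionary<string, Action<GroupMembershipUploadInput, string>>
        {
            { "Existing Constituent Master Id", (gm, s) => gm.cnst_mstr_id = s },
            { "Prefix of the constituent(Mr, Mrs etc)", (gm, s) => gm.cnst_prefix_nm = s },
            { "First Name of the constituent(Mike)", (gm, s) => gm.cnst_first_nm = s },
        };

    // Resolves each column index to its setter once, from the header row.
    public static Dictionary<int, Action<GroupMembershipUploadInput, string>> BuildColumnMap(string[] headers)
    {
        var map = new Dictionary<int, Action<GroupMembershipUploadInput, string>>();
        for (int col = 0; col < headers.Length; col++)
        {
            Action<GroupMembershipUploadInput, string> setter;
            if (headers[col] != null && SettersByHeader.TryGetValue(headers[col], out setter))
                map[col] = setter;
        }
        return map;
    }

    // Applies the precomputed map to one row of cell values (null cells become "").
    public static GroupMembershipUploadInput ParseRow(
        object[] cells, Dictionary<int, Action<GroupMembershipUploadInput, string>> map)
    {
        var gm = new GroupMembershipUploadInput();
        foreach (var pair in map)
        {
            object value = pair.Key < cells.Length ? cells[pair.Key] : null;
            pair.Value(gm, value?.ToString() ?? "");
        }
        return gm;
    }
}
```

With this shape, supporting a new column means adding one dictionary entry, and unknown headers are simply ignored rather than compared against on every row.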
I'm not sure where you add this object to a list or upload it to the server, as that is not part of the code shown. But it could be faster to first check only the first column of each row to see whether you have the necessary count of non-null values, throw an error if not, and do all the other processing only if no error was thrown.
int rowcount = 0;
//If you need at minimum 2000 rows, you can stop after you count 2000 valid rows
for (int rw = 4; rw <= ws.Dimension.End.Row && rowcount < 2000; rw++)
{
if (ws.Cells[rw, 1].Value != null) rowcount++;
}
if (rowcount < 2000) {
//throw error and return
}
//else do the list building and uploading
I have not found a method to normalize a DataTable that came from an Excel file with merged cells. When I get the DataTable from that file, only the first cell of each merged range has the value; the others are blank.
An example of this DataTable is:
and the expected result:
To summarize: blank cells should be filled with the value of the nearest non-blank cell above them, since that is what the merged cells in Excel represented.
I'm using Excel.dll to read this file, and it doesn't auto-fill merged cells, which is why I'm looking for a way to do it in C#.
I suppose the logic should be: if a cell is blank, use the value of the cell above it. The logic seems clear, but I'm having trouble writing the code that applies it.
This is just a sample; in the end, I'm looking for a method that does this however many columns or rows the DataTable has.
Edit:
Thanks for your quick feedback.
Attached is what I have so far. It handles only one column and has errors, since it doesn't take care of the first and last rows, but it shows the idea. What I'm trying to achieve is a method that works for any number of columns and rows (it would be OK if the columns were fixed by name; if I get more columns, I will adapt it).
private void NormalizeDataTable(DataTable dtRawTable)
{
DataTable dtFinalized = new DataTable();
dtFinalized.Columns.Add("Col1", typeof(String));
string previousValue = "";
for (int index = 0; index <= dtRawTable.Rows.Count; index++)
{
DataRow dr = dtFinalized.NewRow();
if (index != 0 || index == dtRawTable.Rows.Count -1)
{
if (dtRawTable.Rows[index]["Modelo"].ToString() == "")
{
dr["Col1"] = previousValue;
}
else
{
dr["Col1"] = Convert.ToString(dtRawTable.Rows[index]["Modelo"].ToString());
previousValue = (string)dr["Col1"];
}
}
dtFinalized.Rows.Add(dr);
dtFinalized.AcceptChanges();
}
}
Here is the function I'm using in my project for the same requirement.
public static DataTable AutoFillBlankCellOfTable(DataTable outputTable)
{
for (int i = 0; i < outputTable.Rows.Count; i++)
{
for (int j = 0; j < outputTable.Columns.Count; j++)
{
if (outputTable.Rows[i][j] == DBNull.Value)
{
if (i > 0)
outputTable.Rows[i][j] = outputTable.Rows[i - 1][j];
}
}
}
return outputTable;
}
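The function above only fills cells that are DBNull.Value. If the reader returns blank cells as empty strings instead (as the question's own attempt seems to assume with its == "" check), a variant along the following lines may be needed. This is a sketch under that assumption; the class and method names are hypothetical.

```csharp
using System;
using System.Data;

public static class DataTableFill
{
    // Fills blank cells (DBNull or empty/whitespace text) with the value
    // from the row above, modifying the table in place.
    public static void FillDown(DataTable table)
    {
        // Start at row 1: the first row has nothing above it to copy from.
        for (int row = 1; row < table.Rows.Count; row++)
        {
            for (int col = 0; col < table.Columns.Count; col++)
            {
                object value = table.Rows[row][col];
                string text = value as string;
                bool isBlank = value is DBNull
                    || (text != null && string.IsNullOrWhiteSpace(text));
                if (isBlank)
                    table.Rows[row][col] = table.Rows[row - 1][col];
            }
        }
    }
}
```

Because rows are processed top to bottom, a filled value carries through a run of consecutive blanks. Note that this fills every column, so a cell that was genuinely empty in the sheet (rather than part of a merged range) is filled as well.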
Using a Copy/Paste/Delete ContextMenuStrip, I'm copying one row of a DataGridView and then pasting it as a new row, with a change to one column to maintain uniqueness. The DGV is bound to a DataTable (DT), and the DT has several columns defined as INT, which the DGV columns inherit.
Some of the copied INT columns are NULL and cause a conversion error if I leave them as is. Convert and Parse both turn the NULL into 0, but I need NULL values to remain NULL since this is an interface to SQL. A second issue: after the paste, the '*' remains on the row as if it hasn't been committed, and it stays that way until I make another selection in my ListView.
How do I get the paste to accept NULL in an INT column?
How do I force a commit so the * moves to the blank line?
// paste
for (int iRow = 0; iRow < rowInClipboard.Length; iRow++)
{
if (iRow + currentRow < grid.Rows.Count)
{
string[] cellsInRow = rowInClipboard[iRow].Split(columnSplitter);
for (int iCol = 0; iCol < cellsInRow.Length; iCol++)
{
if (grid.ColumnCount > currentColumn + iCol)
{
DataGridViewCell currentCell = grid.Rows[currentRow + iRow].Cells[currentColumn + iCol];
if (!currentCell.ReadOnly) // H.NH added to avoid Read only case.
switch (grid.Columns[iCol].ValueType.Name)
{
case "Int32":
string s = cellsInRow[iCol];
int result;
if (int.TryParse(s, out result))
{
// The string was a valid integer => use result here
}
else
{
}
currentCell.Value = 999;
//currentCell.Value = cellsInRow
break;
case "String":
if (iCol == 3)
{
cellsInRow[3] += "*******";
}
currentCell.Value = cellsInRow[iCol];
break;
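For the NULL question, one possible approach is to assign DBNull.Value to the cell whenever the clipboard text is not a valid integer, rather than converting it to 0. The helper below is a minimal sketch of that idea; it assumes the underlying DataTable column allows nulls, and the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ClipboardCellParser
{
    // Returns a boxed int when the text parses as an integer; otherwise returns
    // DBNull.Value so the bound DataTable column stays NULL instead of becoming 0.
    public static object ParseIntOrDbNull(string text)
    {
        int result;
        if (int.TryParse(text, NumberStyles.Integer, CultureInfo.CurrentCulture, out result))
            return result;
        return DBNull.Value;
    }
}
```

In the "Int32" case of the paste loop, the hard-coded assignment could then become currentCell.Value = ClipboardCellParser.ParseIntOrDbNull(cellsInRow[iCol]);.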
I have user-supplied Excel files that need to be converted to PDF. Using Excel interop, I can do this fine with .ExportAsFixedFormat(). My problem comes up when a workbook has millions of rows, which turns into a file with 50k+ pages. That would be fine if the workbook had content in all of those rows. Every time one of these files shows up, though, there are maybe 50 rows with content and the rest are blank. How can I go about removing the empty rows so I can export to a decent-sized PDF?
1. I've tried starting at the end row and, one by one, using CountA to check whether the row has content and deleting it if it doesn't. Not only does this take forever, it also seems to fail after about 100k rows with the following error:
Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.
2. I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues), but that includes a row if any cell has formatting (like a background color).
3. I've tried using Worksheet.UsedRange and then deleting everything after that, but UsedRange has the same problem as point two.
This is the code I've tried:
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
currentRowIndex = rows.Count;
bool contentFound = false;
while (!contentFound && currentRowIndex > 0)
{
currentRow = rows[currentRowIndex];
if (Application.WorksheetFunction.CountA(currentRow) == 0)
{
currentRow.Delete();
}
else
{
contentFound = true;
}
Marshal.FinalReleaseComObject(currentRow);
currentRowIndex--;
}
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
int startRow = lastCell.Row;
Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
range.Delete();
Marshal.FinalReleaseComObject(range);
Marshal.FinalReleaseComObject(lastCell);
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
Do I have a problem with my code, is this an interop problem or maybe it's just a limitation on what Excel can do? Is there a better way to do what I'm attempting?
I would suggest getting the count of rows that contain values, using CountA (as you tried in point 1). Then copy those rows into a new sheet and export from there. It is easier to copy a few rows to a new sheet and work on that than to delete a huge number of rows from the source sheet.
To create the new sheet and copy the rows, you can use the following code:
excel.Worksheet tempSheet = workbook.Worksheets.Add();
tempSheet.Name = sheetName;
workbook.Save();
//create a new method for copy new rows
//as the rowindex you can pass the total no of rows you have found out using CountA
public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
{
excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
excel.Range source = (excel.Range)sourceSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
excel.Range dest = (excel.Range)destSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
source.Copy(dest);
excel.Range newRow = (excel.Range)destSheet.Rows[rowIndex+1];
newRow.Insert();
workbook.Save();
}
Have you tried Sheet1.Range("A1").CurrentRegion.ExportAsFixedFormat() where Sheet1 is a valid sheet name and "A1" is a cell you can test to ensure it is located in the range you want to export?
The question remains, why does Excel think there is data in those "empty" cells? Formatting? A pre-existing print area that needs to be cleared? I know I've encountered situations like that before, those are the only possibilities that come to mind at this moment.
Try these steps:
1. Copy Worksheet.UsedRange to a separate sheet (sheet2).
2. Use Paste Special so that formatting is retained.
3. Try parsing sheet2 for unused rows.
If this doesn't help, try repeating step 2 with the formatting information cleared, and then parse sheet2. You can always copy the format information back later (if it is simple enough).
If you can first load the Excel file into a DataSet via the OleDBAdapter, it's relatively easy to remove blank rows on the import...
Try this OleDBAdapter Excel QA I posted via stack overflow.
Then export the DataSet to a new Excel file and convert that file to PDF. That may be a big "if", of course, depending on the Excel layout (or lack thereof).
I had to solve this problem today for what might be a subset of your possible cases.
If your spreadsheet meets the following conditions:
All columns with data have header text in line 1.
All rows with data are in sequence until the first BLANK row.
Then, the following code may help:
private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
{
int countCols = CountColsToFirstBlank(excel, sheet);
int countRows = CountRowsToFirstBlank(excel, sheet);
cellData = new string[countCols, countRows];
string datum;
for (int i = 0; i < countCols; i++)
{
for (int j = 0; j < countRows; j++)
{
try
{
// Cells is indexed [row, column]; here i is the column index and j the row index.
if (null != sheet.Cells[j + 1, i + 1].Value)
{
datum = sheet.Cells[j + 1, i + 1].Value.ToString();
cellData[i, j] = datum;
}
}
catch (Exception ex)
{
lastException = ex;
//Console.WriteLine(String.Format("LoadCellData [{1}, {2}] reported an error: [{0}]", ex.Message, i, j));
}
}
}
return cellData;
}
private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
{
int count = 0;
for (int j = 0; j < sheet.UsedRange.Rows.Count; j++)
{
if (IsBlankRow(excel, sheet, j + 1))
break;
count++;
}
return count;
}
private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
{
int count = 0;
for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
{
if (IsBlankCol(excel, sheet, i + 1))
break;
count++;
}
return count;
}
private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
{
for (int i = 0; i < sheet.UsedRange.Rows.Count; i++)
{
if (null != sheet.Cells[i + 1, col].Value)
{
return false;
}
}
return true;
}
private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
{
for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
{
if (null != sheet.Cells[row, i + 1].Value)
{
return false;
}
}
return true;
}
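Each sheet.Cells[...].Value access in the helpers above is a separate COM call, which gets slow on large sheets. A commonly used alternative is to read the whole used range once (for example via sheet.UsedRange.Value2, which as far as I know yields a 1-based two-dimensional object array for multi-cell ranges and a single scalar for a one-cell range) and scan it in memory. The sketch below shows only the in-memory scan, written against the array's own bounds so it works for 0-based and 1-based arrays alike; the class and method names are hypothetical.

```csharp
using System;

public static class RangeScanner
{
    // Returns the index (in the array's own base) of the last row that contains
    // any non-blank value, or (lower bound - 1) if every row is blank.
    public static int LastNonEmptyRow(object[,] values)
    {
        int rowLow = values.GetLowerBound(0), rowHigh = values.GetUpperBound(0);
        int colLow = values.GetLowerBound(1), colHigh = values.GetUpperBound(1);

        // Walk upward from the bottom so trailing blank rows are skipped quickly.
        for (int row = rowHigh; row >= rowLow; row--)
        {
            for (int col = colLow; col <= colHigh; col++)
            {
                object v = values[row, col];
                if (v != null && !string.IsNullOrWhiteSpace(v.ToString()))
                    return row;
            }
        }
        return rowLow - 1;
    }
}
```

Because the scan looks at values only, rows that Excel counts as used purely because of formatting do not affect the result.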
Can you try the code below:
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
{
//Assume the first row is the header. Then use the column match ups by name to determine the index.
//This will allow you to have the order of the header.Keys change without any affect.
var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
// check if the row and column cells are empty
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // skip this row
else{
//here read header
if (rowIndex == workSheet.Dimension.Start.Row) // header row, per the assumption above
{
//some code
}
else
{
//some code to read body
}
}
}
Hope this helps; otherwise let me know if you need an explanation of the code.
Updated:
The loop below walks every row of the worksheet, from the first row of its used dimension to the last:
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
Here we check whether all cells in the row are empty, using LINQ:
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // if true then skip this row
else
// read headers(assuming it is presented in worksheet)
// else read row wise data
and then do the necessary steps.
Hopefully this is clearer now.
I had the same problem and managed to fix it using the CurrentRegion:
var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
sheet.Cells[lastcell.Row - 1, lastcell.Column]]
.CurrentRegion;
filledcells.ExportAsFixedFormat(
and so on. The CurrentRegion is said to expand to the borders where cells are empty, and apparently that means it also shrinks if it contains many empty cells.
Please try the following code:
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
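// C# interop syntax: indexers use [] and Delete is a method call.
// Note that SpecialCells throws if no blank cells are found.
sheet.Columns["A:A"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete();
sheet.Rows["1:1"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete();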
Marshal.FinalReleaseComObject(sheet);
}