I am trying to build a data import tool that accepts an Excel file from the user and parses it to import the data into my application.
I am running into a strange issue with DeleteRow that I cannot find any information about online, although it seems like someone would have come across it before. If this is a duplicate question, I apologize; I could not find anything related to my issue after searching the web, except for this one, which still isn't solving my problem.
So the issue:
I use the following code to attempt to "remove" any row that has blank data through ExcelPackage.
    for (int rowNum = 1; rowNum <= ws.Dimension.End.Row; rowNum++)
    {
        var rowCells = from cell in ws.Cells
                       where (cell.Start.Row == rowNum)
                       select cell;
        if (rowCells.Any(cell => cell.Value != null))
        {
            nonEmptyRowsInFile += 1;
            continue;
        }
        else ws.DeleteRow(rowNum);
        //Update: ws.DeleteRow(rowNum, 1, true) also does not affect dimension
    }
Stepping through that code, I can see that the DeleteRow is indeed getting called for the proper row numbers, but the issue is when I go to set the "total rows in file" count on the returned result object:
parseResult.RowsFoundInFile = (ws.Dimension.End.Row);
ws.Dimension.End.Row will still return the original row count even after the calls to DeleteRow.
My question is...do I have to "save" the worksheet or call something in order for the worksheet to realize that those rows have been removed? What is the point of calling "DeleteRow" if the row still "exists"? Any insight on this would be greatly appreciated...
Thanks
I think I figured out the problem. This is yet another closure issue in C#. The problem is that the reference to "ws" is still the same reference from before the DeleteRow call.
In order to get the "updated" dimension, you have to redeclare the worksheet, for example:
ws = excelPackage.Workbook.Worksheets.First();
Once you get a new reference to the worksheet, it will have the updated dimensions, including any removed/added rows/columns.
Hopefully this helps someone.
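Separately from the stale-reference question, a forward loop that deletes rows can skip a row: once row N is deleted, the old row N+1 becomes row N, and the loop then moves on to N+1 without checking it. Walking from the bottom up avoids this. The sketch below models the sheet as a plain List<string[]>, which is an assumption made for illustration and not the EPPlus API; RemoveAt stands in for ws.DeleteRow, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlankRowRemover
{
    // Removes rows whose cells are all null, walking from the last row to the
    // first so that a removal never shifts a row that has not been checked yet.
    // Returns the number of rows that were kept.
    public static int RemoveBlankRows(List<string[]> rows)
    {
        for (int i = rows.Count - 1; i >= 0; i--)
        {
            bool hasData = rows[i].Any(cell => cell != null);
            if (!hasData)
            {
                rows.RemoveAt(i); // stands in for ws.DeleteRow(rowNum)
            }
        }
        return rows.Count;
    }
}
```

With two consecutive blank rows, a forward loop would leave the second one behind; the reverse loop removes both.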
Related
I am trying to read excel null/blank values.
I have looked into hundreds of solutions and either I am implementing it wrong or it just does not seem to work and results in Microsoft.CSharp.RuntimeBinder.RuntimeBinderException:'Cannot perform runtime binding on a null reference'
This is one of the last pieces of code I tried (I was trying to put NA in all the null cells):
    for (int i = 2; i <= rowCount; i++)
    {
        string natext = xlRange.Value2[rowCount, colCount];
        if (natext == null)
        {
            natext = "NA";
        }
    }
Any ideas that can help me with some examples?
If I click the details, it shows:
    Microsoft.CSharp.RuntimeBinder.RuntimeBinderException
    HResult=0x80131500
    Message=Cannot perform runtime binding on a null reference
    Source=
    StackTrace:
First, the Excel object model is really weird. Value2 returns an object, and that object can be of all sorts of different types. If xlRange is a cell, then it returns the value of that cell, which could be a string or a double or something else. If xlRange is multiple cells then that object is an array of values. And then each of those values is an object. For each value you don't know if it's a string or a double or something else.
That's not fun to deal with. It's actually really, really bad. C# is a strongly-typed language, which means that you know what type everything is and you don't have to guess. Excel Interop takes that away from you and says, "Here's an object. It could be anything or lots of things that could each be anything. Figure it out. Good luck."
Instead of getting the Value2 property of the range and then looping through the array, it's much easier to deal with the cells in the range instead.
Given that excelRange is a Range of cells:
    for (var row = 1; row <= excelRange.Rows.Count; row++)
    {
        // The inner loop must test and increment column, not row.
        for (var column = 1; column <= excelRange.Columns.Count; column++)
        {
            var cellText = excelRange[row, column].Text.ToString();
        }
    }
This does two things. First, you're looking at one cell at a time. Second, you're using the Text property. The Text property should always be a string so you could just do this and it would almost certainly work:
string cellText = excelRange.Cells[row, column].Text;
It's just that the object model returns dynamic, so even though it is a string, the possibility is left open that maybe it won't be.
My strong recommendation - and I think most developers would agree - is to abandon Excel Interop and run from it, and use a library like EPPlus instead. There are tons of examples.
Excel Interop works by actually starting an instance of Excel and giving you access to the clunky VBA object model. It's evil. Chances are that if you open your task manager right now you'll see several extra instances of Excel open that you didn't expect to see. Fixing that is a whole separate frustrating problem.
For some years Excel files have just been collections of XML documents, and EPPlus helps you to work with them as documents, but providing all sorts of helper methods so that you can interact with sheets, ranges, cells, and so forth. Try it. Trust me, you'll never look back.
Here's an example after adding the EPPlus Nuget package:
    var pathToYourExcelWorkbook = @"c:\somepath\document.xlsx";
    using (var workbookPackage = new ExcelPackage(new FileInfo(pathToYourExcelWorkbook)))
    {
        var workbook = workbookPackage.Workbook;
        var sheet = workbook.Worksheets[1]; // 1-based, or use the name.
        for (var row = 1; row <= 10; row++)
        {
            for (var column = 1; column <= 10; column++)
            {
                var cellText = sheet.Cells[row, column].Text;
            }
        }
    }
It's awesome. No starting or closing an application - you're just reading from a file. No weird COM objects. And the objects are all strongly-typed. The Text property returns a string.
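If you still have to handle raw cell values typed as object (for example from Value2), a small helper can turn them into text and substitute "NA" for empty cells, which is what the question was trying to do. This is only a sketch: CellTextOrNa is a hypothetical name, and the set of types it handles is an assumption about what a cell might contain.

```csharp
using System;
using System.Globalization;

public static class CellValues
{
    // Converts a loosely typed cell value to text, using "NA" for empty cells.
    public static string CellTextOrNa(object value)
    {
        switch (value)
        {
            case null:
                return "NA";
            case string s:
                // Treat blank strings the same way as missing values.
                return string.IsNullOrWhiteSpace(s) ? "NA" : s;
            case double d:
                return d.ToString(CultureInfo.InvariantCulture);
            default:
                return Convert.ToString(value, CultureInfo.InvariantCulture) ?? "NA";
        }
    }
}
```

Checking for null before touching the value is what avoids the RuntimeBinderException: the failure comes from calling a member on a null dynamic value.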
I am working with a client to import a rather large Excel file (over 37K rows) into a custom system and utilizing the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% in and dug a little further. The reason it fails is that the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015; however, starting in 2016, the structure changes to look like this: '1/4/2016 (note the "tick" at the beginning of the date) and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
    var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
    foreach (var row in excel.Worksheet(file.WorksheetName))
    {
        data.Add(this.FillEntity(row));
    }
The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.
Edit 2
I downloaded the LinqToExcel code from GitHub and connected it to my project and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values and the cells in question that aren't being read are empty from that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutor class that is failing:
    private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
    {
        var results = new List<object>();
        var columnIndexMapping = new Dictionary<string, int>();
        for (var i = 0; i < columns.Count(); i++)
            columnIndexMapping[columns.ElementAt(i)] = i;
        while (data.Read())
        {
            IList<Cell> cells = new List<Cell>();
            for (var i = 0; i < columns.Count(); i++)
            {
                var value = data[i];
                //I added this in, since the worksheet has over 37K rows and
                //I needed to snag right before it hit the values I was looking for
                //to see what the IDataReader was exposing. The row inside the
                //IDataReader relevant to the column I'm referencing is null,
                //even though the data definitely exists in the Excel file
                if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
                {
                }
                value = TrimStringValue(value);
                cells.Add(new Cell(value));
            }
            results.CallMethod("Add", new Row(cells, columnIndexMapping));
        }
        return results.AsEnumerable();
    }
Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.
To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.
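A rough sketch of that workaround, assuming the ACE OLE DB provider is installed and that the dates use the M/d/yyyy layout shown in the question. HDR=NO makes the driver treat the header row as data, and IMEX=1 asks it to read mixed-type columns as text. The class and method names are hypothetical and this is not LinqToExcel code; the apostrophe handling is a precaution in case the tick shows up in the value.

```csharp
using System;
using System.Globalization;

public static class ExcelDateImport
{
    // Builds an OLE DB connection string that reads the header row as data
    // (HDR=NO) and asks for mixed-type columns to be returned as text (IMEX=1).
    public static string BuildConnectionString(string path)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
               ";Extended Properties=\"Excel 12.0 Xml;HDR=NO;IMEX=1\"";
    }

    // Parses a cell that arrived as text, tolerating a leading apostrophe.
    // Returns null when the text is not a date (for example the header cell).
    public static DateTime? ParseDateCell(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw)) return null;
        string text = raw.Trim().TrimStart('\'');
        DateTime parsed;
        bool ok = DateTime.TryParseExact(text, "M/d/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

Because the header row now comes back as an ordinary data row, the caller has to skip it; here it simply fails to parse and yields null.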
Source: What is IMEX in the OLEDB connection string?
I am populating a ListObject with data from a database, and am pre-formatting ListColumns which come from VarChar (& similar) as Text before inserting the data.
This works well, but some affected cells now are showing the 'Number Stored As Text' error.
The answer https://stackoverflow.com/a/21869098/1281429 suppresses the error correctly, but requires looping through all cells (as it is not possible to perform the action on a range).
Unfortunately for large ranges this is unacceptably slow.
(n.b. - if you do it manually in Excel it's lightning fast)
Here is a code snippet in C# (for a particular column):
    var columnDataRange = listColumn.DataBodyRange;
    var cells = columnDataRange.Cells;
    // Excel collections are 1-based, so the last cell is at index cells.Count.
    for (var i = 1; i <= cells.Count; i++)
    {
        InteropExcel.Range cell = cells[i, 1];
        if (cell.Count > 1) break;
        if (cell.Errors != null)
        {
            var item = cell.Errors.Item[InteropExcel.XlErrorChecks.xlNumberAsText];
            item.Ignore = true;
        }
    }
Does anyone know of a faster way of doing this?
(Or, more generally, a faster way of iterating through cells in a range?)
Hope someone can help - thanks.
Edit: this is a VSTO Application-Level add-in for Excel 2010/2013.
Just to be sure - you are going from a database to an Excel export? Are you creating a new, clean spreadsheet or overwriting existing data in an existing spreadsheet?
If you are overwriting data in an existing spreadsheet, I would first clear the columns and format the columns in Excel (programmatically of course). It is likely old data and new data going into the same space are causing type issues.
So something like:
thisExcel.xlWorksheet.Range[yourrange].Value = ""
thisExcel.xlWorksheet.Range[yourrange].NumberFormat = chooseyourformat
http://msdn.microsoft.com/en-us/library/office/ff196401(v=office.15).aspx
You should be able to apply that to a larger area.
I am trying to get the index of the last used row in a spreadsheet. I've found that with Excel it can be done like this:
    int lastUsedRow = worksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell,
        Type.Missing).Row;
But this doesn't seem to work with GemBox. The idea is that I have a template Excel file that I want to fill with more information, so I need the last row in order to continue on the next one.
Hi, you can just use the ExcelWorksheet.Rows.Count property.
Gets the number of currently allocated elements (dynamically changes when worksheet is modified)
Try the following:
int lastUsedRow = worksheet.Rows.Count - 1;
Regarding shahkalpesh's suggestion: yes, you can achieve your task with that approach as well. Here is how:
var usedRange = worksheet.GetUsedCellRange(true);
int lastUsedRow = usedRange.LastRowIndex;
Note: I haven't used Gembox. My answer is based on searching in the documentation.
GetUsedCellRange returns a CellRange, which has a property named LastRowIndex.
Does this work the same way as Excel?
I'm copying and inserting rows in an Excel sheet, like so:
    while (rowsToAdd > 0)
    {
        // copy the existing row
        insertionCell.EntireRow.Copy(Type.Missing);
        // location of the new row
        Range newRow = insertionCell.EntireRow.get_Offset(1, 0).EntireRow;
        // insert the new row
        newRow.Insert(XlInsertShiftDirection.xlShiftDown, Type.Missing);
        rowsToAdd--;
    }
The problem I have is that sometimes, I'm left with a selection marquee around the row I originally copied.
Is there a way I can cancel the selection marquee (the way you'd normally do it with the Escape key?)
In VBA it's Application.CutCopyMode = False
Adding
myExcelApplication.CutCopyMode = XlCutCopyMode.xlCopy;
seems to do the trick, though the documentation does not explain it very well, and seems to be wrong, since bools are mentioned.
I know this is pretty old, but I had a similar issue copying one row to another and found a solution that worked in my case, so I'm sharing it in case it fixes your problem too (it copies a row directly to a destination instead of copying and then inserting).
Here is the solution:
    Microsoft.Office.Interop.Excel.Range xlSourceRow; // the row to copy (assign before use)
    Microsoft.Office.Interop.Excel.Range xlNewRow;    // the destination row (assign before use)
    xlSourceRow.Copy(xlNewRow); // passing a destination copies directly to it
Using this approach prevents any selection marquee from appearing in your MS Excel file.