Read excel null/blank values - c#

I am trying to read excel null/blank values.
I have looked into hundreds of solutions, and either I am implementing them wrong or they just do not work. Each attempt results in Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: 'Cannot perform runtime binding on a null reference'.
This is one of the last things I tried (I was trying to put NA in all the null cells):
for (int i = 2; i <= rowCount; i++)
{
string natext = xlRange.Value2[rowCount, colCount];
if (natext == null)
{
natext = "NA";
}
Any ideas that can help me with some examples?
If I click the details, it shows:
Microsoft.CSharp.RuntimeBinder.RuntimeBinderException
HResult=0x80131500
Message=Cannot perform runtime binding on a null reference
Source=
StackTrace:

First, the Excel object model is really weird. Value2 returns an object, and that object can be of all sorts of different types. If xlRange is a cell, then it returns the value of that cell, which could be a string or a double or something else. If xlRange is multiple cells then that object is an array of values. And then each of those values is an object. For each value you don't know if it's a string or a double or something else.
That's not fun to deal with. It's actually really, really bad. C# is a strongly-typed language, which means that you know what type everything is and you don't have to guess. Excel Interop takes that away from you and says, "Here's an object. It could be anything or lots of things that could each be anything. Figure it out. Good luck."
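If you do stay with Value2, one way to tame it is to funnel every value through a single helper that accepts object and always returns a string. The following is only a sketch: the class name CellText, the method FromValue, and the "NA" default are made up for illustration. It assumes the exception in the question comes from using a null dynamic value (an empty cell gives a null Value2, and calling or indexing into a null dynamic fails at runtime).

```csharp
using System;
using System.Globalization;

public static class CellText
{
    // Hypothetical helper: turns whatever Value2 handed back into a string,
    // substituting a placeholder for null or blank cells.
    public static string FromValue(object value, string fallback = "NA")
    {
        switch (value)
        {
            case null:
                return fallback;                       // empty cell
            case string s:
                return string.IsNullOrWhiteSpace(s) ? fallback : s;
            case IFormattable f:
                // Numbers (Value2 reports them as double) and similar types.
                return f.ToString(null, CultureInfo.InvariantCulture);
            default:
                return value.ToString() ?? fallback;   // bool and anything else
        }
    }
}
```

Casting to object before calling the helper, for example CellText.FromValue((object)xlRange.Cells[i, colCount].Value2), takes the value out of dynamic binding so a null is just a null argument rather than a runtime binder failure.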
Instead of getting the Value2 property of the range and then looping through the array, it's much easier to deal with the cells in the range instead.
Given that excelRange is a Range of cells:
for (var row = 1; row <= excelRange.Rows.Count; row++)
{
for (var column = 1; column <= excelRange.Columns.Count; column++)
{
var cellText = excelRange[row, column].Text.ToString();
}
}
This does two things. First, you're looking at one cell at a time. Second, you're using the Text property. The Text property should always be a string so you could just do this and it would almost certainly work:
string cellText = excelRange.Cells[row, column].Text;
It's just that the object model returns dynamic, so even though it is a string, the possibility is left open that maybe it won't be.
My strong recommendation - and I think most developers would agree - is to abandon Excel Interop and run from it, and use a library like EPPlus instead. There are tons of examples.
Excel Interop works by actually starting an instance of Excel and giving you access to the clunky VBA object model. It's evil. Chances are that if you open your task manager right now you'll see several extra instances of Excel open that you didn't expect to see. Fixing that is a whole separate frustrating problem.
For some years Excel files have just been collections of XML documents, and EPPlus helps you work with them as documents while providing all sorts of helper methods so that you can interact with sheets, ranges, cells, and so forth. Try it. Trust me, you'll never look back.
Here's an example after adding the EPPlus Nuget package:
var pathToYourExcelWorkbook = @"c:\somepath\document.xlsx";
using (var workbookPackage = new ExcelPackage(new FileInfo(pathToYourExcelWorkbook)))
{
var workbook = workbookPackage.Workbook;
var sheet = workbook.Worksheets[1]; // 1-based, or use the name.
for (var row = 1; row <= 10; row++)
{
for (var column = 1; column <= 10; column++)
{
var cellText = sheet.Cells[row, column].Text;
}
}
}
It's awesome. No starting or closing an application - you're just reading from a file. No weird COM objects. And the objects are all strongly-typed. The Text property returns a string.

Related

LinqToExcel Not Parsing Date

I am working with a client to import a rather large Excel file (over 37K rows) into a custom system and am using the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% of the way in and dug a little further. Most of the records (with associated dates ranging from 2011 to 2015) are normal, e.g. 1/3/2015. Starting in 2016, however, the structure changes to look like this: '1/4/2016 (note the "tick" at the beginning of the date), and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.
Edit 2
I downloaded the LinqToExcel code from GitHub and connected it to my project, and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values, and the cells in question that aren't being read are empty from that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutor class that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
var results = new List<object>();
var columnIndexMapping = new Dictionary<string, int>();
for (var i = 0; i < columns.Count(); i++)
columnIndexMapping[columns.ElementAt(i)] = i;
while (data.Read())
{
IList<Cell> cells = new List<Cell>();
for (var i = 0; i < columns.Count(); i++)
{
var value = data[i];
//I added this in, since the worksheet has over 37K rows and
//I needed to snag right before it hit the values I was looking for
//to see what the IDataReader was exposing. The row inside the
//IDataReader relevant to the column I'm referencing is null,
//even though the data definitely exists in the Excel file
if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
{
}
value = TrimStringValue(value);
cells.Add(new Cell(value));
}
results.CallMethod("Add", new Row(cells, columnIndexMapping));
}
return results.AsEnumerable();
}
Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.
To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.
Source: What is IMEX in the OLEDB connection string?
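Once the column arrives as text, the "parse accordingly" step could look something like the sketch below. It is not the code from the answer: the class name LooseDate and the list of formats are assumptions, and it treats the dates as month/day/year, which the samples in the question (1/3/2015, '1/4/2016) do not actually confirm.

```csharp
using System;
using System.Globalization;

public static class LooseDate
{
    // Assumed formats: a bare date, or a date with a time part appended.
    private static readonly string[] Formats =
    {
        "M/d/yyyy",
        "M/d/yyyy h:mm:ss tt"
    };

    // Returns null when the cell is empty or cannot be read as a date.
    public static DateTime? Parse(object raw)
    {
        if (raw == null || raw is DBNull) return null;
        if (raw is DateTime alreadyDate) return alreadyDate;

        // Drop surrounding whitespace and any leading "tick" text marker.
        var text = raw.ToString().Trim().TrimStart('\'');

        DateTime parsed;
        return DateTime.TryParseExact(text, Formats, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out parsed)
            ? parsed
            : (DateTime?)null;
    }
}
```

Returning a nullable value keeps the header row and any unparseable cells from throwing, so the caller can decide whether to skip or report them.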

C# ExcelPackage (EPPlus) DeleteRow does not change sheet dimension?

I am trying to build a data import tool that accepts an EXCEL file from the user and parses the data from the file to import data into my application.
I am running into a strange issue with DeleteRow that I cannot find any information about online, although it seems like someone would have come across it before. If this is a duplicate question, I apologize; I could not find anything related to my issue after searching the web, except for this one, which still isn't solving my problem.
So the issue:
I use the following code to attempt to "remove" any row that has blank data through ExcelPackage.
for (int rowNum = 1; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var rowCells = from cell in ws.Cells
where (cell.Start.Row == rowNum)
select cell;
if (rowCells.Any(cell => cell.Value != null))
{
nonEmptyRowsInFile += 1;
continue;
}
else ws.DeleteRow(rowNum);
//Update: ws.DeleteRow(rowNum, 1, true) also does not affect dimension
}
Stepping through that code, I can see that the DeleteRow is indeed getting called for the proper row numbers, but the issue is when I go to set the "total rows in file" count on the returned result object:
parseResult.RowsFoundInFile = (ws.Dimension.End.Row);
ws.Dimension.End.Row will still return the original row count even after the calls to DeleteRow.
My question is...do I have to "save" the worksheet or call something in order for the worksheet to realize that those rows have been removed? What is the point of calling "DeleteRow" if the row still "exists"? Any insight on this would be greatly appreciated...
Thanks
I think I figured out the problem. This is yet another closure issue in C#. The problem is that the reference to "ws" is still the same reference from before the DeleteRow call.
In order to get the "updated" dimension, you have to redeclare the worksheet, for example:
ws = excelPackage.Workbook.Worksheets.First();
Once you get a new reference to the worksheet, it will have the updated dimensions, including any removed/added rows/columns.
Hopefully this helps someone.
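A separate point about the loop in the question: deleting a row shifts every later row up by one, so a loop that counts upward steps over the row that follows each deleted one. Walking from the last row back to the first avoids that. The sketch below shows the idea on a plain List<string[]> standing in for the worksheet; RowPruner and RemoveBlankRows are hypothetical names, not EPPlus API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowPruner
{
    // Removes rows whose cells are all null or whitespace and
    // returns the number of rows that remain.
    public static int RemoveBlankRows(List<string[]> rows)
    {
        // Iterate from the bottom up so a removal never shifts
        // a row that has not been examined yet.
        for (var i = rows.Count - 1; i >= 0; i--)
        {
            if (rows[i].All(string.IsNullOrWhiteSpace))
            {
                rows.RemoveAt(i);
            }
        }
        return rows.Count;
    }
}
```

The same bottom-up order should carry over to a worksheet loop that calls ws.DeleteRow(rowNum), starting at ws.Dimension.End.Row and counting down to 1.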

SpreadsheetLight working with multiple worksheets

I am using SpreadsheetLight to write log files from a WinForms project. My intent is to write log entries to three worksheets in the same file, and I really want to avoid using Interop if I can avoid it.
I start with a template file made in Excel which has the three worksheets pre-populated with row titles, and since each worksheet has the same basic properties (which can vary independently), I encapsulate each sheet in a class, the basics of which look like this:
/// <summary>
/// Encapsulate the info we need to know about each worksheet in order to populate properly
/// </summary>
public class LogSheet
{
public SLDocument data;
public SLWorksheetStatistics stats;
public int RowCount;
public int ColumnCount;
public int currentColumn; //indicates what column you want to be writing to
public List<string> rowNames = new List<string>(); //used to make sure you're writing new data to the right row
public List<string> columnNames = new List<string>(); //used by GetLatestRun() to check if data already exists for a given serial number
public LogSheet(string sheet)
{
this.data = new SLDocument(_path, sheet);
this.stats = this.data.GetWorksheetStatistics();
this.RowCount = this.stats.EndRowIndex;
this.ColumnCount = this.stats.EndColumnIndex;
currentColumn = GetLatestRun();
for (int i = 1; i < RowCount + 1; i++)
{
this.rowNames.Add(this.data.GetCellValueAsString(i, 1));
}
for (int i = 1; i < ColumnCount + 1; i++)
{
this.columnNames.Add(this.data.GetCellValueAsString(1, i));
}
}
}
There are also some methods not shown in the LogSheet class that handle writing data to the right places.
This all seems to work fine, and when debugging, I can see that each of the three worksheets instantiated with new LogSheet(<sheetName>) contain the data they are supposed to after I've written things to them.
The problem is that when I want to save the data, I can get away with this.data.Save(), but it only saves one worksheet, and the other two are now left in limbo because the Save() method is terminal and closes the Excel file. Trying the Save() method on either of the other two sheets ends up with an exception, "Object reference not set to an object", because, of course, Save() killed my spreadsheet, and the sheets no longer have anything to reference. The resulting file only has data from the first time I saved it.
My best guess for how to get around this is to not instantiate a new SLDocument for each sheet and instead use SLDocument.SelectWorksheet() each time I want to write to a specific worksheet, but I still want to keep things encapsulated in the LogSheet class because everything else in there is still relevant.
Any other suggestions?
The recommended and efficient way is to store all the logs to be written in memory first (with a List<> or something). Then when writing, you select the worksheet, write everything from the first List<>, select the second worksheet, write everything from the second List<>, select the third worksheet, write everything from the third List<>.
If memory is an issue, then select first worksheet, write log chunk into cell value, select second worksheet, write log chunk into cell value (will be in second worksheet because second worksheet is currently selected), select third worksheet, write log chunk. Then iterate over every log chunk with the above.
The latter method takes less memory at any one time, but takes more CPU cycles because you keep going back and forth between the worksheets. Going back and forth is equivalent to loading one worksheet, unloading it, then loading another worksheet, and so on.
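The first approach (buffer everything, then write sheet by sheet and save once) might be sketched like this. SheetLogBuffer and its members are hypothetical names. The document operations are passed in as delegates so the sketch does not depend on any particular library.

```csharp
using System;
using System.Collections.Generic;

public class SheetLogBuffer
{
    // One pending cell write.
    private sealed class Pending
    {
        public int Row;
        public int Column;
        public string Text;
    }

    // Sheet name -> writes waiting to be flushed to that sheet.
    private readonly Dictionary<string, List<Pending>> _pending =
        new Dictionary<string, List<Pending>>();

    // Queue a write in memory; nothing touches the file yet.
    public void Add(string sheet, int row, int column, string text)
    {
        List<Pending> list;
        if (!_pending.TryGetValue(sheet, out list))
        {
            list = new List<Pending>();
            _pending[sheet] = list;
        }
        list.Add(new Pending { Row = row, Column = column, Text = text });
    }

    // Select each sheet once, write all of its entries, then save a single time.
    public void Flush(Action<string> selectSheet,
                      Action<int, int, string> setCell,
                      Action save)
    {
        foreach (var entry in _pending)
        {
            selectSheet(entry.Key);
            foreach (var write in entry.Value)
            {
                setCell(write.Row, write.Column, write.Text);
            }
        }
        save();
        _pending.Clear();
    }
}
```

With a single SLDocument named doc (an assumption about how the calling code is arranged), the flush could be wired up as buffer.Flush(name => doc.SelectWorksheet(name), (r, c, t) => doc.SetCellValue(r, c, t), () => doc.Save()), so Save() is called only once for all three sheets.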

PERFORMANCE - looping over cells - suppress “number stored as text” warning in Excel VSTO with C#

I am populating a ListObject with data from a database, and am pre-formatting ListColumns which come from VarChar (& similar) as Text before inserting the data.
This works well, but some affected cells now are showing the 'Number Stored As Text' error.
The answer https://stackoverflow.com/a/21869098/1281429 suppresses the error correctly, but requires looping through all cells (as it is not possible to perform the action on a range).
Unfortunately for large ranges this is unacceptably slow.
(n.b. - if you do it manually in Excel it's lightning fast)
Here is a code snippet in C# (for a particular column):
var columnDataRange = listColumn.DataBodyRange;
var cells = columnDataRange.Cells;
for (var i = 1; i <= cells.Count; i++) // 1-based, so include the last cell
{
InteropExcel.Range cell = cells[i, 1];
if (cell.Count > 1) break;
if (cell.Errors != null)
{
var item = cell.Errors.Item[InteropExcel.XlErrorChecks.xlNumberAsText];
item.Ignore = true;
}
}
Does anyone know of a faster way of doing this?
(Or, more generally, a faster way of iterating through cells in a range?)
Hope someone can help - thanks.
Edit: this is a VSTO Application-Level add-in for Excel 2010/2013.
Just to be sure - you are going from a database to an Excel export? Are you creating a new, clean spreadsheet or overwriting existing data in an existing spreadsheet?
If you are overwriting data in an existing spreadsheet, I would first clear the columns and format the columns in Excel (programmatically of course). It is likely old data and new data going into the same space are causing type issues.
So something like:
thisExcel.xlWorksheet.Range[yourrange].Value = "";
thisExcel.xlWorksheet.Range[yourrange].NumberFormat = chooseyourformat;
http://msdn.microsoft.com/en-us/library/office/ff196401(v=office.15).aspx
You should be able to apply that to a larger area.

GemBox.Spreadsheet last used row

I am trying to get the index of the last used row in a spreadsheet. I've found that in Excel it can be done like this:
int lastUsedRow = worksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell,
Type.Missing).Row;
But this doesn't seem to work with GemBox. The idea is that I have a template excel file that I want to fill with more information and therefore need the last row, so that I can continue on the next one.
Hi, you can just use the worksheet's Rows.Count property.
Gets the number of currently allocated elements (dynamically changes when worksheet is modified)
Try the following:
int lastUsedRow = worksheet.Rows.Count - 1;
Regarding shahkalpesh's suggestion: yes, you can achieve your task with that approach as well. Here is how:
var usedRange = worksheet.GetUsedCellRange(true);
int lastUsedRow = usedRange.LastRowIndex;
Note: I haven't used Gembox. My answer is based on searching in the documentation.
GetUsedCellRange returns a CellRange, which has a property named LastRowIndex.
Does this work the same way as Excel?
