I have written a C# program that runs many iterative calculations and returns a large list of data. Because the data changes on every run, I write it to an Excel spreadsheet with predefined formulas and charts that help interpret it. All the charts in the spreadsheet depend on a single column of data, from which the other columns and axes are calculated using formulas. However, the amount of data is not constant.
For instance, sometimes I get 22 elements in the list, and sometimes the count runs into the hundreds. To have a stable bound, I cap the charts to the first 50 rows of data, and in my program I fill the remaining cells with the value "#N/A". However, when I open the spreadsheet, the rows with superfluous data are graphed as 0s. I want the charts to graph only the rows with valid data.
Here is what my code looks like. It is very simple, so I am not going to modify it; I want to know what changes I can make in the spreadsheet.
FileInfo newFile = new FileInfo("Report.xlsx");
ExcelPackage pack = new ExcelPackage(newFile);
ExcelWorksheet ws = pack.Workbook.Worksheets[1];
int cellCount = 2;
for (int i = 0; i < 49; i++)
{
    String cell = "B" + cellCount;
    if (i < data.Count)
        ws.Cells[cell].Value = data.ElementAt(i);
    else
        ws.Cells[cell].Value = "#N/A";
    cellCount++;
}
Console.Out.WriteLine("saving");
pack.Save();
System.Diagnostics.Process.Start("Report.xlsx");
To access the Excel documents, I use EPPlus. Here is what my charts look like:
As the graph shows, the last 5-6 rows contain NULL values, but they are graphed anyway, with values of 0. The blue line represents the data in the third column, and the red line represents the last column (which is never going to be null because it depends on a fixed row).
How do I force Excel to ignore the last few NULL rows?
I am working with a client to import a rather large Excel file (over 37K rows) into a custom system, using the excellent LinqToExcel library to do so. While reading the data in, I noticed it was breaking on records about 80% of the way through and dug a little further. The majority of records (with associated dates ranging from 2011 to 2015) are normal, e.g. 1/3/2015. Starting in 2016, however, the values look like this: '1/4/2016 (note the "tick" at the beginning of the date), and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.
Edit 2
I downloaded the LinqToExcel code from GitHub and connected it to my project, and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values, and the cells in question that aren't being read are empty at that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutor class that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
    var results = new List<object>();
    var columnIndexMapping = new Dictionary<string, int>();
    for (var i = 0; i < columns.Count(); i++)
        columnIndexMapping[columns.ElementAt(i)] = i;
    while (data.Read())
    {
        IList<Cell> cells = new List<Cell>();
        for (var i = 0; i < columns.Count(); i++)
        {
            var value = data[i];
            //I added this in, since the worksheet has over 37K rows and
            //I needed to snag right before it hit the values I was looking for
            //to see what the IDataReader was exposing. The row inside the
            //IDataReader relevant to the column I'm referencing is null,
            //even though the data definitely exists in the Excel file
            if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
            {
            }
            value = TrimStringValue(value);
            cells.Add(new Cell(value));
        }
        results.CallMethod("Add", new Row(cells, columnIndexMapping));
    }
    return results.AsEnumerable();
}
Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.
To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.
Source: What is IMEX in the OLEDB connection string?
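A minimal sketch of the workaround described above, for readers who talk to the OLE DB provider directly. It only builds the connection string and parses the resulting text values. The class and method names (ExcelConnection, Build, ParseDate) are hypothetical, the ACE 12.0 provider is assumed to be installed, and the "M/d/yyyy" format is an assumption based on the sample values in the question.

```csharp
using System;
using System.Globalization;

public static class ExcelConnection
{
    // Builds an ACE OLE DB connection string for an .xlsx file.
    // HDR=NO treats the first row as data, so a text header sitting above
    // date values makes the column mixed; IMEX=1 asks the driver to read
    // mixed columns as text instead of returning null for mismatches.
    public static string Build(string path, bool firstRowIsHeader, bool mixedAsText)
    {
        string extended = "Excel 12.0 Xml;HDR=" + (firstRowIsHeader ? "YES" : "NO")
                          + (mixedAsText ? ";IMEX=1" : "");
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path
               + ";Extended Properties=\"" + extended + "\"";
    }

    // Parses a cell value that now arrives as text (hypothetical helper).
    // A leading apostrophe is stripped in case it survives the read.
    public static DateTime? ParseDate(object raw)
    {
        if (raw == null || raw is DBNull) return null;
        if (raw is DateTime) return (DateTime)raw;

        string text = raw.ToString().Trim().TrimStart('\'');
        DateTime parsed;
        bool ok = DateTime.TryParseExact(text, "M/d/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

With this approach the header row comes back as an ordinary data row, so it has to be skipped (or used to map column names) before the remaining rows are parsed.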
I am populating a ListObject with data from a database, and am pre-formatting ListColumns which come from VarChar (& similar) as Text before inserting the data.
This works well, but some affected cells now are showing the 'Number Stored As Text' error.
The answer https://stackoverflow.com/a/21869098/1281429 suppresses the error correctly, but requires looping through all cells (as it is not possible to perform the action on a range).
Unfortunately for large ranges this is unacceptably slow.
(n.b. - if you do it manually in Excel it's lightning fast)
Here is a code snippet in C# (for a particular column):
var columnDataRange = listColumn.DataBodyRange;
var cells = columnDataRange.Cells;
// Interop ranges are 1-based, so the loop must include cells.Count
for (var i = 1; i <= cells.Count; i++)
{
    InteropExcel.Range cell = cells[i, 1];
    if (cell.Count > 1) break;
    if (cell.Errors != null)
    {
        var item = cell.Errors.Item[InteropExcel.XlErrorChecks.xlNumberAsText];
        item.Ignore = true;
    }
}
Does anyone know of a faster way of doing this?
(Or, more generally, a faster way of iterating through cells in a range?)
Hope someone can help - thanks.
Edit: this is a VSTO Application-Level add-in for Excel 2010/2013.
Just to be sure - you are going from a database to an Excel export? Are you creating a new, clean spreadsheet or overwriting existing data in an existing spreadsheet?
If you are overwriting data in an existing spreadsheet, I would first clear the columns and format the columns in Excel (programmatically of course). It is likely old data and new data going into the same space are causing type issues.
So something like:
thisExcel.xlWorksheet.Range[yourRange].Value = "";
thisExcel.xlWorksheet.Range[yourRange].NumberFormat = chooseYourFormat;
http://msdn.microsoft.com/en-us/library/office/ff196401(v=office.15).aspx
You should be able to apply that to a larger area.
I calculate the number of rows I want in my second column from the number of records in a file that has been opened, and add them in a for loop. I have researched and tried various solutions, but nothing works, even though it seems so simple. Below is my current code: I read the file's length, do a quick calculation, and enter a for loop where (at the moment) I am only able to populate the first column.
long Count = 1;
FileInfo Fi = new FileInfo(file);
long sum = (Fi.Length / 1024) - Count;
for (int i = 0; i < sum; i++)
{
    DataGridView1.Rows.Add(Count++);
}
I'm not sure how to do it but I know the above code adds to the first column by default - I don't know how to modify it. I know by:
DataGridView1.Rows.Add("a","b");
... The 'b' value is displayed in the second column, but I don't want anything for now in the first where 'a' is.
I have looked at "insert a row with one column datagridview c#", but it is about merging columns, which I don't want either.
DataGridView1.Rows.Add("",Count++);
Works to an extent, but is not the right way to do it. I'm going to be adding data to the first column later on.
If you want to omit the value for the first column, just add null or DBNull.Value, e.g.:
DataGridView1.Rows.Add(DBNull.Value, Count++);
This way, the first column will be empty while the second column contains the value of Count.
How do I set the source data of an excel interop chart to several entire rows?
I have a .csv file that is created by my program to display some results that are produced. For the sake of simplicity let's say these results and chart are displayed like this: (which is exactly how I want it to be)
Now the problem I am having is that the number of people is variable. So I really need to access the entire rows data.
Right now, I am doing this:
var range = worksheet.get_Range("A1", "D3");
xlExcel.ActiveChart.SetSourceData(range);
and this works great if you only have three Persons, but I need to access the entire row of data.
So to restate my question, how can I set the source data of my chart to several entire rows?
I tried looking here but couldn't seem to make that work with rows instead of columns.
var range = worksheet.get_Range("A1").CurrentRegion;
xlExcel.ActiveChart.SetSourceData(range);
EDIT: I am assuming that the cells in the data region won't be blank.
To test this,
1) place cursor on cell A1
2) press F5
3) click on "Special"
4) choose "Current Region" as option
5) click "OK"
This will select the cells surrounding A1 which are filled, which I believe is what you are looking for.
In VBA code, that selection corresponds to the CurrentRegion property, so I think it should work.
Check out the Range.EntireRow property. I'm not 100% sure how to expand that to a single range containing 3 entire rows, but it shouldn't be that difficult to accomplish.
Another thing you can do is scan to get the actual maximum column index you need (this is assuming that there are guaranteed to be no gaps in the names), then use that index as you declare your range.
Add Code
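int c = 2; // column B
while (true)
{
    // Cells[row, column] is 1-based; Value2 is null for an empty cell
    object value = worksheet.Cells[1, c].Value2;
    if (value == null || value.ToString() == "")
    {
        c--; // step back to the last filled column
        break;
    }
    c++;
}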
Take a column from A to D that you're sure has no empty cells.
Do some loop to find the first empty one in that column and it will be one after the last.
Range Cell = Sheet.Range["A1"]; //or another column you're sure there's no empty data
int LineOffset = 0;
//an empty cell returns null rather than "", so check for both
while (Cell.Offset[LineOffset, 0].Value != null && Cell.Offset[LineOffset, 0].Value.ToString() != "")
{
    LineOffset++;
}
int LastLine = LineOffset; //offsets are 0-based, Cells rows are 1-based
Then you can get Range[Sheet.Cells[1,1], Sheet.Cells[LastLine, 4]]
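The scan itself can be sketched without Excel, which makes the off-by-one between 0-based offsets and 1-based rows easier to check. This is only an illustration: FilledRange, LastFilledRow and Address are hypothetical names, and the list of values stands in for whatever you read from the column you trust to have no gaps.

```csharp
using System;
using System.Collections.Generic;

public static class FilledRange
{
    // Returns the 1-based row number of the last filled cell before the
    // first blank one, or 0 when the very first cell is blank.
    public static int LastFilledRow(IReadOnlyList<object> columnValues)
    {
        int row = 0;
        while (row < columnValues.Count
               && columnValues[row] != null
               && columnValues[row].ToString() != "")
        {
            row++;
        }
        return row;
    }

    // Builds an A1-style address from A1 to the given column and row.
    public static string Address(int lastRow, string lastColumn)
    {
        if (lastRow < 1)
            throw new ArgumentOutOfRangeException("lastRow", "No filled rows.");
        return "A1:" + lastColumn + lastRow;
    }
}
```

For three filled rows followed by a blank, LastFilledRow returns 3 and Address(3, "D") gives "A1:D3", which is the range string you would hand to the worksheet.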
Out of the box here, but why not transpose the data? Three columns for Name, Height, Weight. Convert this from an ordinary range to a Table.
When any formula, including a chart's SERIES formula, references a column of a table, it always references the whole column, no matter how long the table gets. Add another person (another row) and the chart displays the data with the added person. Remove a few people, and the chart adjusts without leaving blanks at the end.
This is illustrated in my tutorial, Easy Dynamic Charts Using Lists or Tables.
I need to know how to get the count of cells in a Row object using OpenXML. Currently, I am using
row.Descendants<Cell>().Count<Cell>()
but this is not correct at all. Any ideas what method/property gives me the count of cells?
You can get the count using any of the following:
row.Descendants<Cell>().Count<Cell>() //The one you used
row.Elements<Cell>().Count<Cell>()
row.Count()
I tested all of them and they are working correctly. Keep in mind the following points while validating the correctness of the returned count:
- Hidden columns are included in the calculations
- Merged cells are counted separately. If your Excel sheet contains 3 columns A, B & C, but A & B are merged, then the count of the cells is 3, not 2
You can extract a list of all the merged cells in a sheet using:
SpreadsheetDocument ExcelSpreadSheet = SpreadsheetDocument.Open(ReportFile, true);
WorksheetPart ExcelWSP = GetWorksheetPartByName(ExcelSpreadSheet, TemplateEntity.ExcelSheetName);
MergeCells mergeList = ExcelWSP.Worksheet.Elements<MergeCells>().First();
The merge information is saved in the Reference property, and you can access it using the following:
((MergeCell)mergeList.ElementAt(i)).Reference
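If you need to adjust the raw cell count for merged cells, one option is to work out how many columns each merge reference covers. The sketch below is plain string handling and does not call the Open XML SDK. MergeRef, ColumnIndex and ColumnSpan are hypothetical names, and it assumes references of the form "A1:B1" with no "$" markers.

```csharp
using System;

public static class MergeRef
{
    // Converts column letters ("A", "AB") to a 1-based column index.
    public static int ColumnIndex(string letters)
    {
        int index = 0;
        foreach (char ch in letters.ToUpperInvariant())
            index = index * 26 + (ch - 'A' + 1);
        return index;
    }

    // Returns how many columns a reference such as "A1:B1" covers.
    // A single-cell reference ("C4") covers one column.
    public static int ColumnSpan(string reference)
    {
        string[] parts = reference.Split(':');
        int first = ColumnIndex(LettersOf(parts[0]));
        int last = ColumnIndex(LettersOf(parts[parts.Length - 1]));
        return Math.Abs(last - first) + 1;
    }

    // Extracts the leading column letters from a cell reference.
    private static string LettersOf(string cellRef)
    {
        int i = 0;
        while (i < cellRef.Length && char.IsLetter(cellRef[i])) i++;
        return cellRef.Substring(0, i);
    }
}
```

In the example above, where A and B are merged, ColumnSpan("A1:B1") is 2, so subtracting (span - 1) from the raw count of 3 gives the 2 cells a user actually sees in that row.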