Intermittent “InteropServices.COMException / ForwardCallToInvokeMember” accessing Value2 on Range cell - c#

In my last question: ExcelDNA throwing exception accessing Range.Value2
I blamed ExcelDNA for the reason why Value2 was throwing a COMException. But I'm not sure this is the case any longer.
I disabled the IsMacroType flag (which, when enabled, completely stops the COMException from happening) and noticed that sometimes Range.Value2 doesn't throw an exception at all.
So sometimes it works, and sometimes it doesn't. My question is, what would cause range.Value2 to throw intermittent COMExceptions?
It's annoying because the stacktrace gives me no useful information, and IsMacroType fixes the problem entirely.
My suspicion is if a cell is constantly changing, by the time Value2 is accessed the cell might get invalidated, but it's a guess and I'm not sure how excel works.
But also, it doesn't make sense as there aren't multiple threads in the code.
Have you encountered this problem?
This is the code:
var valueCell = (Range)row.Cells[1, 30];
if (valueCell.Value2 != null)
{
    //do something
}
Seriously, Value2 fails to evaluate on the if statement

I'm a bit surprised it ever works without IsMacroType=true. It might depend on whether the cell has been calculated or not.
Excel normally prevents a UDF from reading other parts of the sheet, unless the UDF is registered as a "macro sheet equivalent" (with a # in the registration string).
Functions that are registered as "macro sheet equivalents" have the following behaviour (see towards the bottom of the documentation from xlfRegister):
Placing a # character after the last parameter code in pxTypeText gives the function the same calling permissions as functions on a macro sheet. These are as follows:
The function can retrieve the values of cells that have not yet been calculated in this recalculation cycle.
The function can call any of the XLM information (Class 2) functions, for example, xlfGetCell.
If the number sign (#) is not present: evaluating an uncalculated cell results in an xlretUncalced error, and the current function is called again once the cell has been calculated; calling any XLM information function other than xlfCaller results in an xlretInvXlfn error.
Your error in the case without IsMacroType=true might be the uncalculated-cell case described above: you're reading a cell that has not yet been calculated, hence the error.
The side effects of setting IsMacroType=true for a UDF are not entirely clear. One effect is that functions registered as IsMacroType=true and having a parameter that is marked as AllowReference=true will automatically be considered volatile (even if registered with IsVolatile=false). Another side effect is that the recalculation sequence is affected, particularly if you are reading uncalculated cells from inside your UDF.
You also have to be really careful in reading other cells from a UDF, regarding your expectation of what should recalculate when, since you are kind of undermining the calculation dependency tree. In your example your UDF is reading cell A30, but will changes to cell A30 automatically cause your function to recalculate? Certainly you can't use the Excel dependency tracking tools to understand that your cell depends on A30. Really you rather want to have a function that takes the explicit parameter, and is called as =DoSomething(A30) making everything clear and avoiding all these problems.
One reason you might be trying to read Value2 is to determine the formatting of the cell instead of the underlying value that Excel stores, but that's really dangerous since it is not part of the recalculation and dependency tree.
So I would say the fact that you are seeing some unexpected behaviour is a sign that you are going in a direction that Excel does not like.
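A minimal sketch of the explicit-parameter approach suggested above, assuming an Excel-DNA add-in. The class name and the body of DoSomething are hypothetical placeholders. In a real add-in the method would carry Excel-DNA's [ExcelFunction] attribute; it is left as a comment here so the snippet compiles without the library.

```csharp
using System;

public static class MyFunctions
{
    // In an Excel-DNA add-in, mark this method with [ExcelFunction]
    // (from ExcelDna.Integration) and call it as =DoSomething(A30).
    // The function never reads the sheet itself: Excel passes the value in,
    // so the dependency on the input cell is visible to Excel.
    public static object DoSomething(object value)
    {
        // Hypothetical logic: numbers are doubled, text is upper-cased,
        // anything else (including an empty input) yields an empty string.
        if (value is double number) return number * 2.0;
        if (value is string text) return text.ToUpperInvariant();
        return string.Empty;
    }
}
```

Because the cell is an argument, a change to A30 triggers a recalculation of the formula through the normal dependency tree, and no Range.Value2 call is needed inside the UDF.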

Related

Excel not happily displaying large 2D Range FormulaArray

I have an XLL which returns an LPXLOPER result of type 2D array for a Range with FormulaArray.
Things go happily <1s update until I hit about size 50x200. At that point, Excel gets stuck blinking "Ready (pretty Excel graphic)" and "Filling cells (empty progress bar)" at 100% usage of 1 core which goes on for less than half a minute before returning values.
At 100x100 it takes 8-10 minutes.
At 200x100 I'm still waiting for it to return.
The code is identical in all cases. I step through the VB and it hangs on calling RUN(...) to populate the data array. No further code is executed. I put breakpoints in my XLL and it doesn't hit any of them. I break into Excel and it's doing Excel stuff in EXCEL.EXE or in libraries I didn't even know existed.
Anyone know (a) what Excel is doing when it says Ready / Filling Cells even though it is obviously NOT ready, and (b) why the nonlinear growth wrt data size?
I tested an XLL returning a 200x100 array and it's virtually instantaneous. So the problem must be either dependency building or calculation or erasing/building the cell table.
Try
- switching calculation to manual to turn off calculation
- setting ForceFullCalculation to true to switch off dependency building
- test with an empty workbook to see if it is caused by the workbook contents

C# Interop Excel Range get_End is returning with 1 less element

I need to parse an Excel file. First I wrote an extension in Visual Basic inside the Excel file, and it all worked well. Now I need to port it to C# so it can be a separate application. While the functions I use are the same, the result is not the same...
When I choose from the GUI which Worksheet to parse, I do something like:
range = (workbook.Worksheets.get_Item(itemIndex) as Excel.Worksheet).UsedRange;
Then, for the first row I need to parse I do something like:
range.get_Range(range.Cells.get_Item(6, 2),
    range.Cells.get_Offset(6, 2).get_End(Excel.XlDirection.xlToRight))
And I get the right result with all the fields I need.
The second time when I need to get another row, I do:
range.get_Range(range.Cells.get_Item(13, 3),
    range.Cells.get_Offset(13, 3).get_End(Excel.XlDirection.xlToRight))
This time it gives me all the elements except the last one. And I have more functions like this, some with XlDirection.xlDown and all of them return me the range without the last element.
I tried to swap the functions, thinking maybe I need to release the range and then acquire it again or something (I wanted to check whether it only works for the first function being executed), but it always works only for the first example, whenever that function is executed...
This is even stranger because it worked in VBA Excel.
I also tried with Excel.Application get_Range and Excel.Worksheet get_Range...
Does anyone know why this happens?
I managed to solve this strange behavior. The code above is not the correct way of getting the data out: get_Offset(6, 2) shifts the range by 6 rows and 2 columns rather than addressing the cell at row 6, column 2.
The correct way would be: range.get_Range(range.Cells[6, 2], (range.Cells[6, 2] as Excel.Range).get_End(Excel.XlDirection.xlToRight)) - for the first example.
Hope it helps somebody...

Unable to assign FormulaArray on a selected Excel Range object in C#

My C# code manipulates Excel Ranges using Microsoft.Office.Interop.Excel library. I need to assign a Formula Array to a selected Range. I've tried a variety of methods recommended online, including Microsoft recommendations, but so far was unable to make it work properly.
I observe 2 issues:
Issue 1.
Assignment looks fine on the surface: it does not fail, cell objects in the range show the .FormulaArray property assigned, and on the spreadsheet the formula in every cell appears in curly brackets. However, the Formula Array is actually disjointed: each cell in the range can be changed separately, which a normal Formula Array would not permit. It behaves as if every cell had its own single-cell Formula Array, independent from the others. Regardless of my best efforts, this is ALWAYS the case.
Is there actually a properly working solution for this issue?
Issue 2.
My Array Formula contains a reference to another Range (Range A), which I need to refer to in R1C1 style. I need Array Formula in every cell in the target Range point to the same Range A. Somehow I always end up with every cell in target Range having its own version of the formula, referring to shifted "Range A" area. How do I make the reference stay in place, regardless of a cell?
N.B. You may assume that Issue 2 is causing Issue 1, but this is not the case: for example, when array formula is simple, like "=SIN(1)", the Issue 1 still occurs.
I would really appreciate any WORKING suggestions. Thanks a lot in advance.
No one seemed interested, however I found a solution and will answer to my own question.
Apparently, assignment of an Excel Array Formula within C# code works only if the formula is in A1 style, not in R1C1 style. In my case, I was starting with a R1C1-style formula, so it required conversion to A1 style. This is achieved by assigning the original R1C1-style formula to the top left cell of the target range:
topLeftCell.Formula = myR1C1Formula;
// topLeftCell.FormulaR1C1 = myR1C1Formula also works
Assignment to that particular cell will ensure that A1-style formula contains correct references. Get back the converted formula as a string:
string formulaA1 = topLeftCell.Formula;
Get a reference to the whole target range by resizing the top left cell:
Excel.Range newArrayRange = topLeftCell.Resize[height, width];
Resize operation must precede the following assignment. Finally, assign the A1-style formula to the FormulaArray property of the whole target range:
newArrayRange.FormulaArray = formulaA1;
This works perfectly without issues or side-effects.

Problem with C# multiline textbox memory usage

I am using a multiline text box in C# to just log some trace information. I simply use AppendText("text-goes-here\r\n") as I need to add lines.
I've let this program run for a few days (with a lot of active trace) and I noticed it was using a lot of memory. Long story short, it appears that even with the MaxLength value set to something very small (256), the content of the text box just keeps expanding.
I thought it worked like a FIFO (throwing away the oldest text that exceeds the maxlength size). It doesn't, it just keeps increasing in size. This is apparently the cause of my memory waste. Anybody know what I'm doing wrong?
Added a few hours after initial question...
Ok, I tried the suggested code below. To quickly test it, I simply added a timer to my app, and from that timer tick I now call a method that does essentially the same thing as the code below. The tick rate is high so that I can observe the memory usage of the process and quickly determine if there is a leak. There wasn't. That was good; however, I put this in my application and memory usage did not change (still leaking). That sure seems to imply that I have a leak somewhere else :-( However, if I simply add a return at the top of that method, the usage drops back to stable. Any thoughts on this? The timer-tick-invoked code did not accumulate memory but my real code (same method) does. The difference is that I'm calling the method from a variety of different places in the real code. Can the context of the call affect this somehow? (Note, if it isn't already obvious, I'm not a .NET expert by any means)...
TextBox will allow you to append text regardless of MaxLength value - it's only used to control user entry. You can create a method that will be adding new text after verifying that maxlength is not reached, and if it is, just remove x lines from the beginning.
You could use a simple function to append text:
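int maxLength = 256;

private void AppendText(string text)
{
    textBox1.AppendText(text);
    // MaxLength is not enforced for programmatic changes, so trim manually:
    // keep only the last maxLength characters.
    if (textBox1.Text.Length > maxLength)
    {
        textBox1.Text = textBox1.Text.Substring(textBox1.Text.Length - maxLength);
    }
}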
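Cutting at a fixed character count can leave a partial line at the top of the log. A minimal sketch of a trimming helper that keeps only whole lines where possible follows; the LogBuffer class and TrimToTail method are hypothetical names, not part of Windows Forms.

```csharp
using System;

public static class LogBuffer
{
    // Returns the tail of `text` that fits in `maxLength` characters,
    // starting at a line boundary when one exists inside that tail.
    public static string TrimToTail(string text, int maxLength)
    {
        if (text.Length <= maxLength) return text;

        string tail = text.Substring(text.Length - maxLength);
        int firstBreak = tail.IndexOf('\n');

        // Drop the partial first line, unless that would discard everything.
        if (firstBreak >= 0 && firstBreak < tail.Length - 1)
            return tail.Substring(firstBreak + 1);

        return tail;
    }
}
```

It could be used in place of the Substring call, for example textBox1.Text = LogBuffer.TrimToTail(textBox1.Text, maxLength);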

Excel Interop - Efficiency and performance

I was wondering what I could do to improve the performance of Excel automation, as it can be quite slow if you have a lot going on in the worksheet...
Here's a few I found myself:
ExcelApp.ScreenUpdating = false -- turn off the redrawing of the screen
ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual -- turning off the calculation engine so Excel doesn't automatically recalculate when a cell value changes (turn it back on after you're done)
Reduce calls to Worksheet.Cells.Item(row, col) and Worksheet.Range -- I had to poll hundreds of cells to find the cell I needed. Implementing some caching of cell locations reduced the execution time from ~40 to ~5 seconds.
What kind of interop calls take a heavy toll on performance and should be avoided? What else can you do to avoid unnecessary processing being done?
When using C# or VB.Net to either get or set a range, figure out what the total size of the range is, and then get one large 2 dimensional object array...
//get values
object[,] objectArray = (object[,])shtName.get_Range("A1:Z100").Value2;
iFace = Convert.ToInt32(objectArray[1,1]);
//set values
object[,] objectArray = new object[3,1] {{"A"}, {"B"}, {"C"}};
rngName.Value2 = objectArray;
Note that it's important you know what datatype Excel is storing (text or numbers), as it won't automatically do this for you when you are converting the type back from the object array. Add tests if necessary to validate the data if you can't be sure beforehand of the type of data.
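The type check mentioned above can be sketched as a small helper. This is only an illustration: the CellValues class and TryGetInt32 method are hypothetical names, and it assumes numeric cells arrive as double, text as string, and empty cells as null.

```csharp
using System;
using System.Globalization;

public static class CellValues
{
    // Tries to read one element of a Value2 array as a whole number.
    // Returns false for empty cells, non-numeric text, fractions,
    // out-of-range numbers, and any other type.
    public static bool TryGetInt32(object cell, out int result)
    {
        result = 0;
        if (cell is double number)
        {
            bool isWhole = number == Math.Floor(number);
            if (!isWhole || number < int.MinValue || number > int.MaxValue)
                return false;
            result = (int)number;
            return true;
        }
        if (cell is string text)
        {
            return int.TryParse(text, NumberStyles.Integer,
                                CultureInfo.InvariantCulture, out result);
        }
        return false;
    }
}
```

With the array read above, the Convert.ToInt32 call could become if (CellValues.TryGetInt32(objectArray[1,1], out iFace)) { ... }, which avoids an exception when the cell holds unexpected content.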
This is for anyone wondering what the best way is to populate an excel sheet from a db result set. This is not meant to be a full list by any means but it does list a few options.
Here are some performance numbers from populating an Excel sheet with 155 columns and 4200 records on an old Pentium 4 3GHz box, including data retrieval time (which was never more than 10 seconds). In order from slowest to fastest:
1. One cell at a time - Just under 11 minutes
2. Populating a dataset by converting to html + Saving html to disk + Loading html into excel and saving worksheet as xls/xlsx - 5 minutes
3. One column at a time - 4 minutes
4. Using the deprecated sp_makewebtask procedure in SQL 2005 to create an HTML file - 9 Seconds + Followed by loading the html file in excel and saving as XLS/XLSX - About 2 minutes.
5. Convert .Net dataset to ADO RecordSet and use the WorkSheet.Range[].CopyFromRecordset function to populate excel - 45 seconds!
I ended up using option 5. Hope this helps.
If you're polling values of many cells you can get all the cell values in a range stored in a variant array in one fell swoop:
Dim CellVals() as Variant
CellVals = Range("A1:B1000").Value
There is a tradeoff here, in terms of the size of the range you're getting values for. I'd guess if you need a thousand or more cell values this is probably faster than just looping through different cells and polling the values.
Use Excel's built-in functionality whenever possible. For example, instead of searching a whole column for a given string, use the Find command (the one available in the GUI via Ctrl-F):
Set Found = Cells.Find(What:=SearchString, LookIn:=xlValues, _
SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False)
If Not Found Is Nothing Then
Found.Activate
(...)
End If
If you want to sort some lists, use the excel sort command, don't do it manually in VBA:
Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
As Anonymous Type says: reading/writing large range blocks is very important to performance.
In cases where the COM-Interop overhead is still too large you may want to switch to using the XLL interface, which is the fastest Excel interface.
Although the XLL interface is primarily meant for C++ users, both XL DNA and Addin Express provide .NET to XLL bridge capability which is significantly faster than COM-Interop.
Performance also depends a lot on how you automate Excel. VBA is faster than COM automation, which in turn is faster than .NET automation. And typically early (compile time) binding is faster than late binding, too.
If you have serious performance problems you could think of moving the critical parts of the code to a VBA module and call that code from your COM/.NET automation code.
If you use .NET you should also use the optimized primary interop assemblies available from Microsoft and not use custom-built interop assemblies.
Another big thing you can do in VBA is to use Option Explicit and avoid Variants wherever possible. Variants are not 100% avoidable in VBA, but they make the interpreter do more work at runtime and waste memory.
I found this article very helpful when I was starting with VBA in Excel.
http://www.ozgrid.com/VBA/SpeedingUpVBACode.htm
And this book
http://www.amazon.com/VB-VBA-Nutshell-Language-OReilly/dp/1565923588
Similar to
app.ScreenUpdating = false //and
app.Calculation = xlCalculationManual
you can also set
app.EnableEvents = false //Prevent Excel events
app.Interactive = false //Prevent user clicks and keystrokes
although they don't seem to make as big a difference as the first two.
Similar to setting Range values to arrays, if you are working with data that is mostly tables with the same formula in every row of a column, you can use R1C1 formula notation for your formula and set an entire column equal to the formula string to set the whole thing in one call.
app.ReferenceStyle = xlR1C1
app.ActiveSheet.Columns(2) = "=SUBSTITUTE(C[-1],""foo"",""bar"")"
Creating XLL add-ins using ExcelDNA & .NET (or the hard way in C) is also the only way you can get UDFs to run on multiple threads. (See Excel DNA's ExcelFunction attribute's IsThreadSafe property.)
Before I transitioned to Excel DNA completely, I also experimented with creating COM visible libraries in .NET to reference in VBA projects. Heavy text processing is a bit faster than VBA that way, as are using wrapped .NET List classes instead of VBA's Collection, but Excel DNA is better.
