Excel not happily displaying large 2D Range FormulaArray - c#

I have an XLL which returns an LPXLOPER result of type 2D array for a Range with FormulaArray.
Things go happily <1s update until I hit about size 50x200. At that point, Excel gets stuck blinking "Ready (pretty Excel graphic)" and "Filling cells (empty progress bar)" at 100% usage of 1 core which goes on for less than half a minute before returning values.
At 100x100 it takes 8-10 minutes.
At 200x100 I'm still waiting for it to return.
The code is identical in all cases. I step through the VB and it hangs on calling RUN(...) to populate the data array. No further code is executed. I put breakpoints in my XLL and it doesn't hit any of them. I break into Excel and it's doing Excel stuff in EXCEL.EXE or in libraries I didn't even know existed.
Anyone know (a) what Excel is doing when it says Ready / Filling Cells even though it is obviously NOT ready, and (b) why the nonlinear growth wrt data size?

I tested an XLL returning a 200x100 array and it's virtually instantaneous. So the problem must be either dependency building, calculation, or erasing/building the cell table.
Try
- switching calculation to manual to turn off calculation
- setting ForceFullCalculation to true to switch off dependency building
- test with an empty workbook to see if it is caused by the workbook contents

Related

Intermittent “InteropServices.COMException / ForwardCallToInvokeMember” accessing Value2 on Range cell

In my last question: ExcelDNA throwing exception accessing Range.Value2
I blamed ExcelDNA for the reason why Value2 was throwing a COMException. But I'm not sure this is the case any longer.
I disabled the IsMacroType flag, which otherwise completely stops the COMException from happening, and noticed that sometimes Range.Value2 doesn't throw an exception at all.
So sometimes it works, and sometimes it doesn't. My question is, what would cause range.Value2 to throw intermittent COMExceptions?
It's annoying because the stacktrace gives me no useful information, and IsMacroType fixes the problem entirely.
My suspicion is if a cell is constantly changing, by the time Value2 is accessed the cell might get invalidated, but it's a guess and I'm not sure how excel works.
But also, it doesn't make sense as there aren't multiple threads in the code.
Have you encountered this problem?
This is the code:
var valueCell = (Range)row.Cells[1, 30];
if (valueCell.Value2 != null)
{
//do something
}
Seriously, Value2 fails to evaluate on the if statement
I'm a bit surprised it ever works without IsMacroType=true. It might depend on whether the cell has been calculated or not.
Excel normally prevents a UDF from reading other parts of the sheet, unless the UDF is registered as a "macro sheet equivalent" (with a # in the registration string).
Functions that are registered as "macro sheet equivalents" have the following behaviour (see towards the bottom of the documentation from xlfRegister):
Placing a # character after the last parameter code in pxTypeText gives the function the same calling permissions as functions on a macro sheet. These are as follows:
The function can retrieve the values of cells that have not yet been calculated in this recalculation cycle.
The function can call any of the XLM information (Class 2) functions, for example, xlfGetCell.
If the number sign (#) is not present: evaluating an uncalculated cell results in an xlretUncalced error, and the current function is called again once the cell has been calculated; calling any XLM information function other than xlfCaller results in an xlretInvXlfn error.
Your error in the case without IsMacroType=true might be the last one of these - you're reading an uncalculated cell hence getting an error.
The side effects of setting IsMacroType=true for a UDF are not entirely clear. One effect is that functions registered as IsMacroType=true and having a parameter that is marked as AllowReference=true will automatically be considered volatile (even if registered with IsVolatile=false). Another side effect is that the recalculation sequence is affected, particularly if you are reading uncalculated cells from inside your UDF.
You also have to be really careful in reading other cells from a UDF, regarding your expectation of what should recalculate when, since you are kind of undermining the calculation dependency tree. In your example your UDF is reading cell A30, but will changes to cell A30 automatically cause your function to recalculate? Certainly you can't use the Excel dependency tracking tools to understand that your cell depends on A30. Really you rather want to have a function that takes the explicit parameter, and is called as =DoSomething(A30) making everything clear and avoiding all these problems.
One reason you might be trying to read Value2 is to determine the formatting of the cell instead of the underlying value that Excel stores, but that's really dangerous since it is not part of the recalculation and dependency tree.
So I would say the fact that you are seeing some unexpected behaviour is a sign that you are going in a direction that Excel does not like.

C# Interop Excel Range get_End is returning with 1 less element

I need to parse an Excel file. First I wrote an extension in Visual Basic inside the Excel file, all worked good. Now I need to port it to C# so it can be a separate application. While the functions I use are the same, the result is not the same...
When I choose from the GUI which Worksheet to parse, I do something like:
range = (workbook.Worksheets.get_Item(itemIndex) as Excel.Worksheet).UsedRange;
Then, for the first row I need to parse I do something like:
range.get_Range(range.Cells.get_Item(6, 2),
range.Cells.get_Offset(6, 2).get_End(Excel.XlDirection.xlToRight))
And I get the right result with all the fields I need.
The second time when I need to get another row, I do:
range.get_Range(range.Cells.get_Item(13, 3),
range.Cells.get_Offset(13, 3).get_End(Excel.XlDirection.xlToRight))
This time it gives me all the elements except the last one. And I have more functions like this, some with XlDirection.xlDown and all of them return me the range without the last element.
I tried swapping the functions, thinking maybe I need to release the range and then acquire it again (I wanted to check whether it only works for whichever function is executed first), but it always works only for the first example, no matter when that function is executed...
This is even stranger because it worked in VBA Excel.
I also tried with Excel.Application get_Range and Excel.Worksheet get_Range...
Does anyone know why this happens?
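I managed to solve this strange behavior. The code above is not the correct way of getting the data out: get_Offset(6, 2) shifts the range by 6 rows and 2 columns, so it does not start at the same cell as get_Item(6, 2).
The correct way would be: range.get_Range(range.Cells[6, 2], (range.Cells[6, 2] as Excel.Range).get_End(Excel.XlDirection.xlToRight)) - for the first example.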
Hope it helps somebody...

Create excel async function

I have an Excel function that gets data from the internet. The problem is that the function takes a long time to execute and it slows down everything.
It would be amazing if I could change the value of the cell without changing its formula! So if I call the function =GetNumOfEmployee(), it returns 0 immediately, and then from my add-in, when I get the result, I can replace the value of that cell with 100, for example. If I do that, the formula gets lost and I do not want that to happen.
What I did in order to preserve the formula was to change the formatting of the cell by doing:
this.Application.ActiveCell.NumberFormat = 5;
if the active cell had a 0, then that line of code will replace the value to a 5 which is very cool. I get to preserve the formula and I have a new value.
The problem with that approach is that I get a lot of calculations and every time I write the line:
this.Application.ActiveCell.NumberFormat = 1234;
Excel saves that format in its list of custom number formats,
and very soon I reach the maximum number of formats that Excel allows. In 1 minute that list has about 500 formats and soon I am not able to delete them. If I programmatically delete them, Excel gets slow and I get the Windows spinning mouse icon every time I delete a format.
So in short, I would like to replace the value of a cell from my add-in, without changing its formula, once I have the result that I need back. Should I place the value to the right of the cell?

Writing huge amounts of text to a textbox

I am writing a log of lots and lots of formatted text to a textbox in a .net windows form app.
It is slow once the data gets over a few megs. Since I am appending, the string has to be reallocated every time, right? I only need to set the value of the text box once, but in my code I am doing line+=data tens of thousands of times.
Is there a faster way to do this? Maybe a different control? Is there a linked list string type I can use?
StringBuilder will not help if the text box is added to incrementally, like log output for example.
But, if the above is true and if your updates are frequent enough it may behoove you to cache some number of updates and then append them in one step (rather than appending constantly). That would save you many string reallocations... and then StringBuilder would be helpful.
Notes:
1. Create a class-scoped StringBuilder member (_sb)
2. Start a timer (or use a counter)
3. Append text updates to _sb
4. When the timer ticks or the counter reaches its limit, reset and append to the text box
5. Restart the process from #1
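The batching steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the class name BatchedLog, the counter-based threshold, and the sink delegate (standing in for something like text => textBox.AppendText(text)) are all assumptions made for the example.

```csharp
using System;
using System.Text;

// Hypothetical helper: buffers log lines and hands them to a sink in one call.
public sealed class BatchedLog
{
    private readonly StringBuilder _sb = new StringBuilder();
    private readonly Action<string> _sink;   // assumed sink, e.g. text => textBox.AppendText(text)
    private readonly int _flushEvery;        // assumed threshold: lines per batch
    private int _pending;

    public BatchedLog(Action<string> sink, int flushEvery)
    {
        if (sink == null) throw new ArgumentNullException(nameof(sink));
        if (flushEvery < 1) throw new ArgumentOutOfRangeException(nameof(flushEvery));
        _sink = sink;
        _flushEvery = flushEvery;
    }

    // Append one line to the buffer; flush when the counter reaches the threshold.
    public void WriteLine(string line)
    {
        _sb.AppendLine(line);
        _pending++;
        if (_pending >= _flushEvery) Flush();
    }

    // Send everything buffered so far to the sink in a single call, then reset.
    public void Flush()
    {
        if (_pending == 0) return;
        _sink(_sb.ToString());
        _sb.Clear();
        _pending = 0;
    }
}
```

A timer tick handler could call Flush() as well, so that a partially filled batch still reaches the text box after a quiet period.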
No one has mentioned virtualization yet, which is really the only way to provide predictable performance for massive volumes of data. Even using a StringBuilder and converting it to a string every half a second will be very slow once the log gets large enough.
With data virtualization, you would only hold the necessary data in memory (i.e. what the user can see, and perhaps a little more on either side) whilst the rest would be stored on disk. Old data would "roll out" of memory as new data comes in to replace it.
In order to make the TextBox appear as though it has a lot of data in it, you would tell it that it does. As the user scrolls around, you would replace the data in the buffer with the relevant data from the underlying source (using random file access). So your UI would be monitoring a file, not listening for logging events.
Of course, this is all a lot more work than simply using a StringBuilder, but I thought it worth mentioning just in case.
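As a rough sketch of the random-access part of that idea, the class below keeps only the byte offset of each line in memory and reads just the requested window of lines from disk. The name LogFileWindow and its members are hypothetical, and the sketch assumes a UTF-8 log file without a byte order mark that is indexed once. A real log viewer would also have to extend the index as the file grows and wire the window to the scroll position.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch: index line starts once, then read only the visible lines.
public sealed class LogFileWindow
{
    private readonly string _path;
    private readonly List<long> _lineStarts = new List<long>();
    private readonly long _length;

    public LogFileWindow(string path)
    {
        _path = path;
        // One pass over the file to record the byte offset where each line starts.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            _length = fs.Length;
            if (_length > 0) _lineStarts.Add(0);
            long pos = 0;
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                pos++;
                if (b == '\n' && pos < _length) _lineStarts.Add(pos);
            }
        }
    }

    public int LineCount { get { return _lineStarts.Count; } }

    // Read up to 'count' lines starting at zero-based line index 'first'.
    public IList<string> ReadLines(int first, int count)
    {
        var result = new List<string>();
        if (first < 0 || first >= _lineStarts.Count || count <= 0) return result;

        int last = Math.Min(first + count, _lineStarts.Count);   // exclusive
        long start = _lineStarts[first];
        long end = last < _lineStarts.Count ? _lineStarts[last] : _length;

        var buffer = new byte[end - start];
        int read = 0;
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            fs.Seek(start, SeekOrigin.Begin);   // jump straight to the window
            while (read < buffer.Length)
            {
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
        }

        // Split the window into lines; ReadLine handles both \n and \r\n.
        using (var reader = new StringReader(Encoding.UTF8.GetString(buffer, 0, read)))
        {
            string line;
            while ((line = reader.ReadLine()) != null) result.Add(line);
        }
        return result;
    }
}
```

Only the offsets (8 bytes per line) stay in memory, so the cost of showing a screenful of text no longer depends on the total size of the log.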
Build your string with a StringBuilder, then convert it to a string using ToString(), and assign this to the textbox.
I have found that setting the textbox's WordWrap property to false greatly improves performance, as long as you're ok with having to scroll to the right to see all of your text. In my case, I wanted to paste a 20-50 MB file into a MultiLine textbox to do some processing on it. That took several minutes with WordWrap on, and just several seconds with WordWrap off.

Excel Interop - Efficiency and performance

I was wondering what I could do to improve the performance of Excel automation, as it can be quite slow if you have a lot going on in the worksheet...
Here's a few I found myself:
ExcelApp.ScreenUpdating = false -- turn off the redrawing of the screen
ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual -- turning off the calculation engine so Excel doesn't automatically recalculate when a cell value changes (turn it back on after you're done)
Reduce calls to Worksheet.Cells.Item(row, col) and Worksheet.Range -- I had to poll hundreds of cells to find the cell I needed. Implementing some caching of cell locations, reduced the execution time from ~40 to ~5 seconds.
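The cell-location caching from the last item might look something like the sketch below. It is not the asker's code: CellLocationCache is a hypothetical name, and the readCell delegate stands in for a single interop call such as reading one cell's Value2. The sketch also assumes the sheet layout does not change while the cache is alive, because a cached location would go stale otherwise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: remembers where a label was found so that repeated
// lookups do not scan the sheet (and pay for the interop calls) again.
public sealed class CellLocationCache
{
    private readonly Func<int, int, object> _readCell;   // stand-in for one interop call
    private readonly int _rows;
    private readonly int _cols;
    private readonly Dictionary<string, Tuple<int, int>> _found =
        new Dictionary<string, Tuple<int, int>>();

    public CellLocationCache(Func<int, int, object> readCell, int rows, int cols)
    {
        if (readCell == null) throw new ArgumentNullException(nameof(readCell));
        _readCell = readCell;
        _rows = rows;
        _cols = cols;
    }

    // Returns the 1-based (row, column) of the first cell equal to 'label',
    // or null if no cell matches. Misses are not cached in this sketch.
    public Tuple<int, int> Find(string label)
    {
        Tuple<int, int> hit;
        if (_found.TryGetValue(label, out hit)) return hit;

        for (int r = 1; r <= _rows; r++)
        {
            for (int c = 1; c <= _cols; c++)
            {
                if (Equals(_readCell(r, c), label))
                {
                    hit = Tuple.Create(r, c);
                    _found[label] = hit;
                    return hit;
                }
            }
        }
        return null;
    }
}
```

The first lookup still pays for the scan. Every later lookup of the same label is a dictionary hit with no interop calls at all.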
What kind of interop calls take a heavy toll on performance and should be avoided? What else can you do to avoid unnecessary processing being done?
When using C# or VB.Net to either get or set a range, figure out what the total size of the range is, and then get one large 2 dimensional object array...
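// Get values: read the whole range in one call (the returned array is 1-based)
object[,] readValues = (object[,])shtName.get_Range("A1:Z100").Value2;
iFace = Convert.ToInt32(readValues[1, 1]);
// Set values: write a 3x1 block in one call
object[,] writeValues = new object[3, 1] { { "A" }, { "B" }, { "C" } };
rngName.Value2 = writeValues;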
Note that it's important to know what data type Excel is storing (text or numbers), as the conversion will not happen automatically when you read values back out of the object array. Add tests if necessary to validate the data if you can't be sure of its type beforehand.
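One way to add such a test is a small helper that checks the runtime type of each element before converting it. The sketch below is an illustration only: CellValues.TryGetDouble is a hypothetical name, and it assumes the usual shape of a Value2 array, where numbers arrive as double, text as string, and empty cells as null. Parsing text with the invariant culture is also an assumption that may not match your data.

```csharp
using System;
using System.Globalization;

public static class CellValues
{
    // Try to read one element of a Value2-style object[,] as a number.
    // Returns false for empty cells, booleans, unparsable text, and other types.
    public static bool TryGetDouble(object[,] cells, int row, int col, out double value)
    {
        value = 0;
        object raw = cells[row, col];

        if (raw == null) return false;            // empty cell
        if (raw is double)                        // numeric cell
        {
            value = (double)raw;
            return true;
        }

        var text = raw as string;                 // text cell: accept it only if it parses
        if (text != null)
        {
            return double.TryParse(text, NumberStyles.Float,
                                   CultureInfo.InvariantCulture, out value);
        }

        return false;                             // booleans, error codes, anything else
    }
}
```

Calling it as CellValues.TryGetDouble(objectArray, 1, 1, out number) replaces a bare Convert.ToInt32 call, which throws when the cell holds text that is not a number.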
This is for anyone wondering what the best way is to populate an excel sheet from a db result set. This is not meant to be a full list by any means but it does list a few options.
Here are some performance numbers from populating an Excel sheet with 155 columns and 4200 records on an old Pentium 4 3GHz box, listed from slowest to fastest. The timings include data retrieval, which never took more than 10 seconds:
1. One cell at a time - Just under 11 minutes
2. Populating a dataset by converting to html + Saving html to disk + Loading html into excel and saving worksheet as xls/xlsx - 5 minutes
3. One column at a time - 4 minutes
4. Using the deprecated sp_makewebtask procedure in SQL 2005 to create an HTML file - 9 seconds + Followed by loading the html file in excel and saving as XLS/XLSX - About 2 minutes.
5. Convert .Net dataset to ADO RecordSet and use the WorkSheet.Range[].CopyFromRecordset function to populate excel - 45 seconds!
I ended up using option 5. Hope this helps.
If you're polling values of many cells you can get all the cell values in a range stored in a variant array in one fell swoop:
Dim CellVals() as Variant
CellVals = Range("A1:B1000").Value
There is a tradeoff here, in terms of the size of the range you're getting values for. I'd guess if you need a thousand or more cell values this is probably faster than just looping through different cells and polling the values.
Use Excel's built-in functionality whenever possible. For example, instead of searching a whole column for a given string, use the Find command that is available in the GUI via Ctrl-F:
Set Found = Cells.Find(What:=SearchString, LookIn:=xlValues, _
SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False)
If Not Found Is Nothing Then
Found.Activate
(...)
End If
If you want to sort some lists, use the excel sort command, don't do it manually in VBA:
Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
As Anonymous Type says: reading/writing large range blocks is very important to performance.
In cases where the COM-Interop overhead is still too large you may want to switch to using the XLL interface, which is the fastest Excel interface.
Although the XLL interface is primarily meant for C++ users, both XL DNA and Addin Express provide .NET to XLL bridge capability which is significantly faster than COM-Interop.
Performance also depends a lot on how you automate Excel. VBA is faster than COM automation is faster than .NET automation. And typically early (compile time) binding is faster than late binding, too.
If you have serious performance problems you could think of moving the critical parts of the code to a VBA module and call that code from your COM/.NET automation code.
If you use .NET you should also use the optimized primary interop assemblies available from Microsoft and not use custom-built interop assemblies.
Another big thing you can do in VBA is to use Option Explicit and avoid Variants wherever possible. Variants are not 100% avoidable in VBA, but they make the interpreter do more work at runtime and waste memory.
I found this article very helpful when I was starting with VBA in Excel.
http://www.ozgrid.com/VBA/SpeedingUpVBACode.htm
And this book
http://www.amazon.com/VB-VBA-Nutshell-Language-OReilly/dp/1565923588
Similar to
app.ScreenUpdating = false //and
app.Calculation = xlCalculationManual
you can also set
app.EnableEvents = false //Prevent Excel events
app.Interactive = false //Prevent user clicks and keystrokes
although they don't seem to make as big a difference as the first two.
Similar to setting Range values to arrays, if you are working with data that is mostly tables with the same formula in every row of a column, you can use R1C1 formula notation for your formula and set an entire column equal to the formula string to set the whole thing in one call.
app.ReferenceStyle = xlR1C1
app.ActiveSheet.Columns(2) = "=SUBSTITUTE(C[-1],""foo"",""bar"")"
Creating XLL add-ins using ExcelDNA & .NET (or the hard way in C) is also the only way you can get UDFs to run on multiple threads. (See Excel DNA's ExcelFunction attribute's IsThreadSafe property.)
Before I transitioned to Excel DNA completely, I also experimented with creating COM visible libraries in .NET to reference in VBA projects. Heavy text processing is a bit faster than VBA that way, as are using wrapped .NET List classes instead of VBA's Collection, but Excel DNA is better.
