I am using the Excel PIAs to write to and read from Excel spreadsheets. I may just be paranoid, but I have the following questions:
As far as I can tell, Excel recalculates the formulas in the worksheet on every write, but...
Is this the case? That is, is it possible to do a series of write, read, write, read and not read the correct recalculated values (e.g. if a formula is complex and takes too long, could I end up reading a value that has not been recalculated yet)?
Is there any way to do something like:
BeginUpdate(); write lots of values; EndUpdate(); Recalculate(); read lots of values?
I have not seen any dodgy results, but I would like to know "for sure" ;)
Some VBA functions that will work are here; to use these you can use the SpreadsheetClass in Interop.
In C#, you have the Calculate() method (available on the Application, Worksheet and Range objects).
I hope someone can help me. Is there a way to embed a specific file (.txt) into an Excel cell? I'm currently using EPPlus, and I would like to programmatically embed a file into a specific Excel cell. I did manage to add a hyperlink, but my goal is to have the file embedded.
Worksheet.Cells[rowNumber, colNumber].Value = ....
Is there any way to do it? I couldn't find anything online.
As mentioned in the comments, you can certainly put text within a cell, but bear in mind Excel does have a limit to the number of characters it will allow in a single cell. It's pretty large, but conceivably the contents of a text file could exceed that limit -- even if future versions of Excel keep increasing what the limit is (as they have in the past).
You can also embed an OLE object in your worksheet, and a text file qualifies for that. I don't know that you can assign it to a cell, per se. You can change the location, shape and behavior to fit in a cell and behave as though it's part of a cell, but I don't know that it ever belongs to a range the way formulas do. I could be wrong.
The basic construct of how to embed an OLE object into a worksheet is as follows:
Excel.OLEObject ole = ws.OLEObjects().Add(Filename: @"C:\Users\hambone\Documents\foo.txt");
This is the equivalent of the VBA:
Set ole = sh.OLEObjects.Add(Filename:="C:\Users\hambone\Documents\foo.txt")
The method returns an OLEObject object, which you can then shape to behave the way you want:
ole.Height = 5;
Using OpenXML, you can get a list of the named ranges in an Excel document using something similar to:
IEnumerable<DefinedName> names = document
    .WorkbookPart
    .Workbook
    .DefinedNames
    .Cast<DefinedName>();
Each of these DefinedName objects has a Text property, which defines the range that it refers to, e.g.
Sheet1!$B$3:$D$8
which we can then parse, and use to retrieve the data. At least that's how I understand the process so far.
However, with a dynamic range, the text property can contain something like:
OFFSET(Sheet1!$F$3,0,0,COUNTA(Sheet1!$F:$F),1)
This is not a range, it is a formula which returns a range, and it is the result of this formula that I need.
Is it possible to calculate this formula, or is the result already stored somewhere in the spreadsheet that I can read? Or is there some other way in which I can read a dynamic named range?
This question is specifically about OpenXML. I know that it can be done using other tools.
Excel Defined Names are really named formulas rather than named ranges. So you would need a method such as VBA Evaluate to coerce the formula to a range or a result.
AFAIK OpenXML does not have such a method, so you would have to write your own formula parser and evaluator, or use some other tool.
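For the static case described in the question, where the Text of a defined name is a plain reference such as Sheet1!$B$3:$D$8, the parsing step could look roughly like the sketch below. It is written in Python purely for illustration and is not part of OpenXML; the names parse_static_ref and col_to_index are hypothetical. It deliberately returns None for anything that is not a plain reference (such as the OFFSET formula above), since evaluating those needs a real formula engine.

```python
import re

# Matches "Sheet1!$B$3:$D$8", "Sheet1!B3" or "'My Sheet'!$A$1:$C$2".
# Assumption: only simple cell/area references are supported.
_REF = re.compile(
    r"^(?:'(?P<quoted>(?:[^']|'')+)'|(?P<plain>[A-Za-z0-9_.]+))!"
    r"\$?(?P<c1>[A-Z]{1,3})\$?(?P<r1>\d+)"
    r"(?::\$?(?P<c2>[A-Z]{1,3})\$?(?P<r2>\d+))?$"
)


def col_to_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_static_ref(text):
    """Return (sheet, (row1, col1), (row2, col2)), or None if not a plain reference."""
    m = _REF.match(text)
    if m is None:
        return None  # e.g. a formula such as OFFSET(...)
    sheet = m.group("plain") or m.group("quoted").replace("''", "'")
    c1, r1 = col_to_index(m.group("c1")), int(m.group("r1"))
    # A single-cell reference has no second corner; reuse the first one.
    c2 = col_to_index(m.group("c2")) if m.group("c2") else c1
    r2 = int(m.group("r2")) if m.group("r2") else r1
    return sheet, (r1, c1), (r2, c2)
```

A caller would treat a None result as "this name is a formula, not a range" and fall back to another tool for those names.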
I need to parse an Excel file. First I wrote an extension in Visual Basic inside the Excel file, and it all worked well. Now I need to port it to C# so it can be a separate application. Although the functions I use are the same, the results are not...
When I choose from the GUI which Worksheet to parse, I do something like:
range = (workbook.Worksheets.get_Item(itemIndex) as Excel.Worksheet).UsedRange;
Then, for the first row I need to parse I do something like:
range.get_Range(range.Cells.get_Item(6, 2),
range.Cells.get_Offset(6, 2).get_End(Excel.XlDirection.xlToRight))
And I get the right result with all the fields I need.
The second time when I need to get another row, I do:
range.get_Range(range.Cells.get_Item(13, 3),
range.Cells.get_Offset(13, 3).get_End(Excel.XlDirection.xlToRight))
This time it gives me all the elements except the last one. I have more functions like this, some with XlDirection.xlDown, and all of them return the range without the last element.
I tried swapping the functions, thinking maybe I need to release the range and then acquire it again or something (I wanted to check whether it only ever works for the first function executed), but it always works only for the first example, whenever that function is executed...
This is even stranger because it worked in Excel VBA.
I also tried Excel.Application get_Range and Excel.Worksheet get_Range...
Does anyone know why this happens?
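I managed to solve this strange behavior. The code above is not the correct way to get the data: get_Offset(6, 2) shifts the range by 6 rows and 2 columns rather than indexing a cell, so it does not address the same cell as get_Item(6, 2).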
The correct way would be: range.get_Range(range.Cells[6, 2], (range.Cells[6, 2] as Excel.Range).get_End(Excel.XlDirection.xlToRight)) - for the first example.
Hope it helps somebody...
I'm opening up xlsx files as a package and reading the contents of the xml files. I'm able to get the shared strings, borders, etc that I need and it's orders of magnitude faster than when I was using Interop. The only issue I have is when it comes to pulling out numbers and formatting them properly based on what the formatting is in the Excel file.
Is there a generic function somewhere that takes a value and a format and returns the formatted string? For example, if I have the value 31502008 and the custom format "$* #,##0_);$* (#,##0)", is there a simple way to get what Excel shows (which is $31,502,008)? Obviously Excel knows how to handle it, but I have some sheets that have a crazy number of custom formats and I'm wondering how best to ensure that the string I get back in code matches what is seen in Excel.
Any ideas?
Thanks a lot for any help.
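To make the example in the question concrete, here is a minimal Python sketch that reproduces only that one format, "$* #,##0_);$* (#,##0)", for whole numbers. It is not a general Excel format engine: the function name format_dollars is hypothetical, and the fill (`* `) and padding (`_)`) directives are simply ignored, which is an assumption that matches the unpadded string quoted above.

```python
def format_dollars(value):
    """Format a whole number like "$* #,##0_);$* (#,##0)", ignoring fill and padding."""
    # "#,##0" means: no decimals, thousands separated by commas.
    digits = f"{abs(int(value)):,}"
    # The second format section wraps negative numbers in parentheses.
    return f"$({digits})" if value < 0 else f"${digits}"


print(format_dollars(31502008))  # prints $31,502,008
```

Every additional custom format would need its own handling, which is why a sheet with many custom formats really calls for a full format-code parser.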
I was wondering what I could do to improve the performance of Excel automation, as it can be quite slow if you have a lot going on in the worksheet...
Here are a few I found myself:
ExcelApp.ScreenUpdating = false -- turn off the redrawing of the screen
ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual -- turning off the calculation engine so Excel doesn't automatically recalculate when a cell value changes (turn it back on after you're done)
Reduce calls to Worksheet.Cells.Item(row, col) and Worksheet.Range -- I had to poll hundreds of cells to find the cell I needed. Caching the cell locations reduced the execution time from ~40 to ~5 seconds.
What kind of interop calls take a heavy toll on performance and should be avoided? What else can you do to avoid unnecessary processing being done?
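The caching idea in the last item can be sketched independently of Excel. In the Python sketch below, a list of rows stands in for a block of cell values that has already been read in a single call; build_cell_index is a hypothetical helper, not an interop API. The point is that the grid is scanned once, after which each lookup is a dictionary access instead of another round of cell-by-cell polling.

```python
def build_cell_index(values):
    """Map each non-empty cell value to its first (row, col), 1-based like Excel."""
    index = {}
    for r, row in enumerate(values, start=1):
        for c, value in enumerate(row, start=1):
            # Keep the first occurrence only; skip empty cells (None).
            if value is not None and value not in index:
                index[value] = (r, c)
    return index


# A small stand-in for values read from a worksheet in one call.
grid = [["Name", "Qty"], ["apple", 3.0], ["pear", None]]
index = build_cell_index(grid)
print(index["pear"])  # prints (3, 1)
```

The cache is only valid while the sheet layout stays unchanged, so it has to be rebuilt after rows or columns are inserted or deleted.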
When using C# or VB.Net to either get or set a range, figure out what the total size of the range is, and then get one large 2 dimensional object array...
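// Get values: read the whole block in one interop call.
// Arrays returned by Excel are 1-based.
object[,] objectArray = (object[,])shtName.get_Range("A1:Z100").Value2;
iFace = Convert.ToInt32(objectArray[1, 1]);
// Set values: write a 3-row, 1-column block in one interop call.
object[,] newValues = new object[3, 1] { { "A" }, { "B" }, { "C" } };
rngName.Value2 = newValues;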
Note that it's important to know what data type Excel is storing (text or numbers), as the value won't be converted for you automatically when you read it back from the object array. Add tests where necessary to validate the data if you can't be sure of its type beforehand.
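The kind of validation test mentioned above might look like the following Python sketch. It assumes a cell read in bulk arrives as a float for numbers, a string for text, a bool for booleans, or None when empty; the helper name as_int is hypothetical. It refuses to guess: anything that is not a whole number raises instead of being silently converted.

```python
def as_int(cell):
    """Return the cell value as an int, or raise if it is not a whole number."""
    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    if isinstance(cell, bool) or not isinstance(cell, (int, float)):
        raise TypeError(f"expected a number, got {type(cell).__name__}: {cell!r}")
    if isinstance(cell, float) and not cell.is_integer():
        raise ValueError(f"expected a whole number, got {cell!r}")
    return int(cell)


print(as_int(42.0))  # prints 42
```

Failing loudly on text such as "42" is a design choice for this sketch; a more lenient version could try to parse strings, at the cost of hiding data-entry problems in the sheet.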
This is for anyone wondering what the best way is to populate an Excel sheet from a database result set. It is not meant to be a full list by any means, but it does list a few options.
Here are some performance numbers for populating an Excel sheet with 155 columns and 4,200 records on an old Pentium 4 3 GHz box, including data retrieval time (which was never more than 10 seconds), in order from slowest to fastest:
1. One cell at a time - Just under 11 minutes
2. Populating a dataset by converting to html + Saving html to disk + Loading html into excel and saving worksheet as xls/xlsx - 5 minutes
3. One column at a time - 4 minutes
4. Using the deprecated sp_makewebtask procedure in SQL 2005 to create an HTML file - 9 Seconds + Followed by loading the html file in excel and saving as XLS/XLSX - About 2 minutes.
5. Convert .Net dataset to ADO RecordSet and use the WorkSheet.Range[].CopyFromRecordset function to populate excel - 45 seconds!
I ended up using option 5. Hope this helps.
If you're polling values of many cells you can get all the cell values in a range stored in a variant array in one fell swoop:
Dim CellVals() as Variant
CellVals = Range("A1:B1000").Value
There is a tradeoff here, in terms of the size of the range you're getting values for. I'd guess if you need a thousand or more cell values this is probably faster than just looping through different cells and polling the values.
Use Excel's built-in functionality whenever possible. For example, instead of searching a whole column for a given string, use the Find command that is available in the GUI via Ctrl-F:
Set Found = Cells.Find(What:=SearchString, LookIn:=xlValues, _
SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False)
If Not Found Is Nothing Then
Found.Activate
(...)
End If
If you want to sort some lists, use Excel's Sort command rather than sorting manually in VBA:
Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
As Anonymous Type says: reading/writing large range blocks is very important to performance.
In cases where the COM-Interop overhead is still too large you may want to switch to using the XLL interface, which is the fastest Excel interface.
Although the XLL interface is primarily meant for C++ users, both Excel-DNA and Add-in Express provide a .NET-to-XLL bridge, which is significantly faster than COM-Interop.
Performance also depends a lot on how you automate Excel. VBA is faster than COM automation, which in turn is faster than .NET automation. Early (compile-time) binding is typically faster than late binding, too.
If you have serious performance problems you could think of moving the critical parts of the code to a VBA module and call that code from your COM/.NET automation code.
If you use .NET you should also use the optimized primary interop assemblies available from Microsoft and not use custom-built interop assemblies.
Another big thing you can do in VBA is to use Option Explicit and avoid Variants wherever possible. Variants are not 100% avoidable in VBA, but they make the interpreter do more work at runtime and waste memory.
I found this article very helpful when I was starting with VBA in Excel.
http://www.ozgrid.com/VBA/SpeedingUpVBACode.htm
And this book
http://www.amazon.com/VB-VBA-Nutshell-Language-OReilly/dp/1565923588
Similar to
app.ScreenUpdating = false //and
app.Calculation = xlCalculationManual
you can also set
app.EnableEvents = false //Prevent Excel events
app.Interactive = false //Prevent user clicks and keystrokes
although they don't seem to make as big a difference as the first two.
Similar to setting Range values to arrays, if you are working with data that is mostly tables with the same formula in every row of a column, you can use R1C1 formula notation for your formula and set an entire column equal to the formula string to set the whole thing in one call.
app.ReferenceStyle = xlR1C1
app.ActiveSheet.Columns(2) = "=SUBSTITUTE(C[-1],""foo"",""bar"")"
Creating XLL add-ins using Excel-DNA and .NET (or the hard way in C) is also the only way you can get UDFs to run on multiple threads. (See the IsThreadSafe property of Excel-DNA's ExcelFunction attribute.)
Before I transitioned to Excel-DNA completely, I also experimented with creating COM-visible libraries in .NET to reference in VBA projects. Heavy text processing is a bit faster than in VBA that way, as is using wrapped .NET List classes instead of VBA's Collection, but Excel-DNA is better.