I'm an intermediate C# programmer, but I'm just starting out with Office automation, specifically Excel for now. I have to say, the Office API is lacking, or at least it forces you to think about problems differently. One thing that's driving me nuts is cell addresses such as A1 and B5. I have to manipulate them often, but there's no easy way to do it. For example, if I'm on cell C7 and want to copy or move something to B7, I can't just write --C7. Instead I have to work out the numeric value of C, decrement it, turn it back into a letter, and then concatenate it with the row number again.
I could write methods to do this myself (e.g. decrementColumn(), decrementRow(), addColumns(String currentCellName, int howManyToAdd)), but I don't want to reinvent the wheel. Does a library of functions exist for such frequently needed conversions, or am I going to have to roll my own?
To copy or move values easily, you can use the Offset property, which returns a Range.
For example, if the cell you are working with is C7 and rng is the Range object for it:
rng.Offset[0, -1].Value2 = rng.Value2;
Offset[0, -1] returns the range shifted one column to the left (-1 columns), here B7. In VBA the same call is written rng.Offset(0, -1).
rng.Offset[10, 15] returns the cell or range 10 rows below and 15 columns to the right, and so on.
You may also look at the R1C1 reference style in Excel, although I have never been fond of it. This link is for Excel 2007, but it should apply to most versions of Excel.
http://msdn.microsoft.com/en-us/library/office/ee264226(v=office.12).aspx
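If you do end up needing address strings (for example to build an argument for get_Range), the letter/number conversion is short enough to roll yourself. A minimal sketch; the A1Address class and its methods are hypothetical helpers, not part of the interop API, and Offset here only handles single-cell addresses such as "C7":

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helpers for A1-style addresses (not part of Microsoft.Office.Interop.Excel).
public static class A1Address
{
    // "A" -> 1, "Z" -> 26, "AA" -> 27 (bijective base 26: there is no zero digit).
    public static int ColumnToNumber(string letters)
    {
        if (string.IsNullOrEmpty(letters))
            throw new ArgumentException("Column letters required.", nameof(letters));
        int number = 0;
        foreach (char c in letters.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z')
                throw new ArgumentException($"Invalid column letter '{c}'.", nameof(letters));
            number = number * 26 + (c - 'A' + 1);
        }
        return number;
    }

    // 1 -> "A", 26 -> "Z", 27 -> "AA".
    public static string NumberToColumn(int number)
    {
        if (number < 1)
            throw new ArgumentOutOfRangeException(nameof(number));
        var sb = new StringBuilder();
        while (number > 0)
        {
            int rem = (number - 1) % 26;        // 0..25 maps to 'A'..'Z'
            sb.Insert(0, (char)('A' + rem));
            number = (number - 1) / 26;
        }
        return sb.ToString();
    }

    // Shifts a single-cell address such as "C7" by rows/columns, mirroring Range.Offset.
    public static string Offset(string address, int rows, int columns)
    {
        Match m = Regex.Match(address, @"^([A-Za-z]+)(\d+)$");
        if (!m.Success)
            throw new FormatException($"Not a single-cell A1 address: {address}");
        int col = ColumnToNumber(m.Groups[1].Value) + columns;
        int row = int.Parse(m.Groups[2].Value) + rows;
        if (col < 1 || row < 1)
            throw new ArgumentOutOfRangeException(nameof(address), "Offset moves off the sheet.");
        return NumberToColumn(col) + row;
    }
}
```

With this, A1Address.Offset("C7", 0, -1) gives "B7", the string equivalent of what rng.Offset[0, -1] does on a live Range. When you already hold a Range, Offset on the object is still the simpler route.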
I want to copy a 1D array to a column range in Excel, using interop.
I have already tried these things:
range.get_Resize(Ary.Length, 1).Value2 = Ary;
range.set_Value(Excel.XlRangeValueDataType.xlRangeValueDefault, Ary);
and simply range.Value = Ary;
I have even tried range.Value2, but all of these copy the very first value in the array into every cell of the range.
For example, if the range has 200 rows and the array contains the integers 101-300, all of the above methods copy only 101 throughout the range.
Can anyone please help me with this? It would be even more helpful if someone could explain this strange behavior. Thanks in advance.
Note: I know I can do it with a for loop, but that takes time. I'd like a method that handles a million rows in under a minute.
I don't know exactly what's wrong with the methods above, but I found a solution:
Excel.Range range = sheetSource.UsedRange.Columns[columnIndx];
range.Value = application.WorksheetFunction.Transpose(Ary);
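The behavior in the question is consistent with Excel treating a one-dimensional array as a single row: that row is repeated down the target range, and since the range is only one column wide, every cell receives Ary[0]. Transpose turns the row into a column. An alternative that avoids the WorksheetFunction call is to build an [n, 1] object array yourself. A sketch; ColumnArrays.ToColumn is a hypothetical helper:

```csharp
using System;

public static class ColumnArrays
{
    // Wraps a 1-D array in an [n, 1] two-dimensional array, which Excel
    // reads as n rows by 1 column when assigned to Range.Value2.
    public static object[,] ToColumn<T>(T[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        var column = new object[values.Length, 1];
        for (int i = 0; i < values.Length; i++)
            column[i, 0] = values[i];
        return column;
    }
}
```

Assigning it would look like range.get_Resize(Ary.Length, 1).Value2 = ColumnArrays.ToColumn(Ary); which is still a single interop call, so the per-cell overhead of a for loop over the sheet is avoided.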
I have an ETL that's saving data to an Excel file. The issue is that the decimals are not being written out for whole numbers. For example:
14.00
is being written out as
14
My code for writing out that line is
loWorksheet.Cells[liRowNum, 5] = lcAmount.ToString("0.00");
When I step through the code, it shows as 14.00, but on the Excel file it is not retaining the decimal places. Is this something that can be fixed in my code or is this an Excel issue? Any suggestions?
I'm fairly sure you have to set the number format for your cells. I can't check right now, but it will be something like
xlYourRange.NumberFormat = "0.00";
See also the question "Set data type like number, text and date in excel column using Microsoft.Office.Interop.Excel in c#".
If you really want the data displayed literally the way it is in the source file, you have to accept trade-offs. The simplest way is to format the data as text, either a cell at a time or for entire columns. Set the format before writing the values:
loWorksheet.Columns["A:E"].NumberFormat = "@";
The trade-off is that the values are now just text: you can't add, sum or average them.
On the other hand, if your data looks like this:
4.0
4.00
4.000
You can't really keep it as numbers and expect to retain the original format without doing some funny business.
If it's consistently two decimal places, and you know it will stay that way, then I agree with @RenatZamaletdinov's solution.
You might also want to consider other strings and what Excel might do to them:
0000123 becomes 123
10/23 will probably render as a date, depending on your locale
12345678901234567890 will probably render in scientific notation (and Excel keeps only 15 significant digits)
These are all avoided if you set the number format to text (@), but again, without knowing what you plan to do with the data, it's hard to say whether this is the right approach.
Wrap lcAmount.ToString("0.00") in a pair of double quotes and put an equals sign in front of it. The cell then holds a formula that returns the text "14.00", which prevents Excel from overriding the format:
loWorksheet.Cells[liRowNum, 5] = "=\"" + lcAmount.ToString("0.00") + "\"";
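Building the ="..." string by hand gets fiddly once the text may itself contain double quotes, which Excel requires to be doubled inside a string literal. A small sketch; TextFormula is a hypothetical helper, and it assumes lcAmount is a decimal and that you want '.' as the decimal separator regardless of the machine's culture:

```csharp
using System;
using System.Globalization;

public static class TextFormula
{
    // Builds ="text", doubling any embedded quotes as Excel formula syntax requires.
    public static string Literal(string text) =>
        "=\"" + (text ?? string.Empty).Replace("\"", "\"\"") + "\"";

    // Two fixed decimals, formatted with the invariant culture so the separator is '.'.
    public static string FixedTwoDecimals(decimal amount) =>
        Literal(amount.ToString("0.00", CultureInfo.InvariantCulture));
}
```

Usage would be loWorksheet.Cells[liRowNum, 5] = TextFormula.FixedTwoDecimals(lcAmount); keep in mind that the result is text as far as Excel is concerned, with the same summing trade-off as the "@" format.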
I am using ExcelLibrary (http://code.google.com/p/excellibrary/) to generate an Excel 2003 spreadsheet. Everything works fine except when some large values are used.
These are some reference numbers that are used by a client and I simply need to present them as integer values in the spreadsheet.
int val = 1420007117;
worksheet.Cells[row, col] = new Cell(val); // Displays 352108063
This results in the value 352108063 being displayed in the spreadsheet. If the value is lower, then it displays fine.
Does anyone know what the issue might be, or how to work around it? Writing the value as a string isn't an option, because it leaves a green "Number Stored as Text" error indicator.
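I would guess that ExcelLibrary doesn't encode such large integers correctly. Note that 1420007117 still fits in a 32-bit int, so this is not a 64-bit overflow; Excel itself stores every cell number as a 64-bit floating-point value.
For such big numbers you're better off using floating point, e.g. new Cell((double)val). This is how Excel handles big numbers.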
My C# code manipulates Excel ranges using the Microsoft.Office.Interop.Excel library. I need to assign an array formula to a selected range. I've tried a variety of methods recommended online, including Microsoft's recommendations, but so far I have been unable to make it work properly.
I observe 2 issues:
Issue 1.
The assignment looks fine on the surface: it does not fail, the cell objects in the range show the .FormulaArray property as assigned, and on the spreadsheet the formula in every cell appears in curly braces. However, the array formula is actually disjointed: each cell in the range can be changed separately, which a normal array formula would not permit. It behaves as if every cell had its own single-cell array formula, independent of the others. Despite my best efforts, this is ALWAYS the case.
Is there actually a properly working solution for this issue?
Issue 2.
My array formula contains a reference to another range (Range A), which I need to refer to in R1C1 style. I need the array formula in every cell of the target range to point to the same Range A. Somehow I always end up with every cell in the target range having its own version of the formula, referring to a shifted copy of the Range A area. How do I make the reference stay in place regardless of the cell?
N.B. You may assume that Issue 2 is causing Issue 1, but this is not the case: for example, when the array formula is as simple as "=SIN(1)", Issue 1 still occurs.
I would really appreciate any WORKING suggestions. Thanks a lot in advance.
No one seemed interested, but I found a solution, so I'll answer my own question.
Apparently, assigning an Excel array formula from C# code works only if the formula is in A1 style, not R1C1 style. In my case I was starting with an R1C1-style formula, so it needed converting to A1 style. This is done by assigning the original R1C1-style formula to the top-left cell of the target range:
topLeftCell.FormulaR1C1 = myR1C1Formula;
// topLeftCell.Formula = myR1C1Formula also worked in my case
Assigning to that particular cell ensures that the A1-style formula contains the correct references. Get the converted formula back as a string:
string formulaA1 = topLeftCell.Formula;
Get a reference to the whole target range by resizing the top-left cell:
Excel.Range newArrayRange = topLeftCell.Resize[height, width];
The resize must happen before the following assignment. Finally, assign the A1-style formula to the FormulaArray property of the whole target range:
newArrayRange.FormulaArray = formulaA1;
This works perfectly without issues or side-effects.
I was wondering what I could do to improve the performance of Excel automation, as it can be quite slow if you have a lot going on in the worksheet...
Here are a few I found myself:
ExcelApp.ScreenUpdating = false -- turn off the redrawing of the screen
ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual -- turning off the calculation engine so Excel doesn't automatically recalculate when a cell value changes (turn it back on after you're done)
Reduce calls to Worksheet.Cells.Item(row, col) and Worksheet.Range -- I had to poll hundreds of cells to find the cell I needed. Caching cell locations reduced the execution time from ~40 to ~5 seconds.
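The caching idea in the last point can be sketched like this, assuming you read the searched block once with Value2 and then build a dictionary from cell text to position. CellIndex is a hypothetical helper; the positions it returns are array indices, which match sheet row/column numbers only when the block starts at A1 (otherwise add the block's origin):

```csharp
using System;
using System.Collections.Generic;

public static class CellIndex
{
    // Maps each non-empty cell's text to its (row, column) index in the array.
    // Uses the array's own lower bounds, since Value2 returns 1-based arrays.
    public static Dictionary<string, (int Row, int Column)> Build(object[,] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        var index = new Dictionary<string, (int Row, int Column)>(StringComparer.OrdinalIgnoreCase);
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                string key = values[r, c]?.ToString();
                // Keep the first occurrence of each value, scanning row by row.
                if (!string.IsNullOrEmpty(key) && !index.ContainsKey(key))
                    index[key] = (r, c);
            }
        }
        return index;
    }
}
```

After one Value2 read and one Build call, each lookup is a dictionary access instead of a round trip to Excel. The case-insensitive comparer is a design choice here; use the default comparer if case matters.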
What kind of interop calls take a heavy toll on performance and should be avoided? What else can you do to avoid unnecessary processing being done?
When using C# or VB.NET to get or set a range, work out the total size of the range and then get or set it as one large two-dimensional object array:
// get values (Excel returns a 1-based array)
object[,] values = (object[,])shtName.get_Range("A1:Z100").Value2;
int iFace = Convert.ToInt32(values[1, 1]);
// set values (an [n, 1] array fills a single column)
object[,] newValues = new object[3, 1] { { "A" }, { "B" }, { "C" } };
rngName.Value2 = newValues;
Note that it's important to know what data type Excel is storing (text or numbers), because nothing converts the values for you when you read them back out of the object array: Value2 returns numbers as double, text as string and empty cells as null. Add checks to validate the data if you can't be sure of its type beforehand.
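A defensive conversion along those lines might look like the sketch below. CellValues.TryGetInt is a hypothetical helper; it assumes the usual Value2 types (double for numbers and dates, string for text, null for empty cells) and treats anything else, such as booleans or error values, as "not an integer":

```csharp
using System;
using System.Globalization;

public static class CellValues
{
    // Returns true only when the cell holds a whole number that fits in an int,
    // either as a numeric cell (double) or as text that parses as an integer.
    public static bool TryGetInt(object cell, out int result)
    {
        result = 0;
        switch (cell)
        {
            case double d when d == Math.Floor(d) && d >= int.MinValue && d <= int.MaxValue:
                result = (int)d;
                return true;
            case string s:
                // NumberStyles.Integer allows surrounding whitespace and a leading sign.
                return int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out result);
            default:
                return false; // null (empty cell), fractional numbers, booleans, errors, ...
        }
    }
}
```

Compared with Convert.ToInt32, this doesn't silently round 3.5 or throw on text, so unexpected data shows up as a false return you can log.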
This is for anyone wondering how best to populate an Excel sheet from a database result set. It's by no means a full list, but it covers a few options.
Here are some timings for populating a sheet with 155 columns and 4,200 records on an old Pentium 4 3 GHz box, ordered from slowest to fastest. They include the data retrieval time, which was never more than 10 seconds.
1. One cell at a time - just under 11 minutes
2. Converting the dataset to HTML, saving the HTML to disk, loading it into Excel and saving the worksheet as XLS/XLSX - 5 minutes
3. One column at a time - 4 minutes
4. Using the deprecated sp_makewebtask procedure in SQL Server 2005 to create an HTML file (9 seconds), then loading the HTML file in Excel and saving it as XLS/XLSX (about 2 minutes)
5. Converting the .NET dataset to an ADO Recordset and using WorkSheet.Range[].CopyFromRecordset to populate Excel - 45 seconds!
I ended up using option 5. Hope this helps.
If you're polling values of many cells you can get all the cell values in a range stored in a variant array in one fell swoop:
Dim CellVals() as Variant
CellVals = Range("A1:B1000").Value
There is a trade-off in terms of the size of the range you read. My guess is that if you need a thousand or more cell values, this is faster than looping through the cells and polling them one by one.
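The same one-call-per-block idea works from C#. For very large sheets, one way to manage that trade-off is to read in chunks of a few thousand rows, one Value2 call per chunk. This sketch only builds the A1 addresses; RangeChunks is a hypothetical helper, and the chunk size is a tuning knob to measure, not a recommended value:

```csharp
using System;
using System.Collections.Generic;

public static class RangeChunks
{
    // Splits rows firstRow..lastRow into A1 addresses of at most chunkSize rows,
    // e.g. "A1:Z1000", "A1001:Z2000", ... so each block can be read in one call.
    public static IEnumerable<string> Addresses(string firstCol, string lastCol,
                                                int firstRow, int lastRow, int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        for (int start = firstRow; start <= lastRow; start += chunkSize)
        {
            int end = Math.Min(start + chunkSize - 1, lastRow);
            yield return $"{firstCol}{start}:{lastCol}{end}";
        }
    }
}
```

Each address would then be passed to get_Range and read with a single Value2 access, processing one object[,] block at a time.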
Use Excel's built-in functionality whenever possible. For example, instead of looping over a whole column to search for a string, use the Find command (the same one the GUI offers via Ctrl+F):
Dim Found As Range
Set Found = Cells.Find(What:=SearchString, LookIn:=xlValues, _
SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False)
If Not Found Is Nothing Then
    Found.Activate
    ' ...
End If
To sort a list, use Excel's Sort command rather than sorting manually in VBA:
Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
As Anonymous Type says: reading/writing large range blocks is very important to performance.
In cases where the COM-Interop overhead is still too large you may want to switch to using the XLL interface, which is the fastest Excel interface.
Although the XLL interface is primarily meant for C++ users, both Excel-DNA and Add-in Express provide a .NET-to-XLL bridge, which is significantly faster than COM interop.
Performance also depends a lot on how you automate Excel. VBA is faster than COM automation, which in turn is faster than .NET automation. Early (compile-time) binding is also typically faster than late binding.
If you have serious performance problems, consider moving the critical parts of the code into a VBA module and calling it from your COM/.NET automation code.
If you use .NET you should also use the optimized primary interop assemblies available from Microsoft and not use custom-built interop assemblies.
Another big thing you can do in VBA is to use Option Explicit and avoid Variants wherever possible. Variants are not 100% avoidable in VBA, but they make the interpreter do more work at runtime and waste memory.
I found this article very helpful when I was starting with VBA in Excel.
http://www.ozgrid.com/VBA/SpeedingUpVBACode.htm
And this book
http://www.amazon.com/VB-VBA-Nutshell-Language-OReilly/dp/1565923588
Similar to
app.ScreenUpdating = false;
app.Calculation = Excel.XlCalculation.xlCalculationManual;
you can also set
app.EnableEvents = false;  // prevent Excel events
app.Interactive = false;   // prevent user clicks and keystrokes
although these don't seem to make as big a difference as the first two.
Similar to setting range values from arrays: if you are working with tables that have the same formula in every row of a column, you can write the formula once in R1C1 notation and assign it to the entire column in a single call. FormulaR1C1 always reads the string as R1C1, so you don't need to switch app.ReferenceStyle first. RC[-1] means "same row, one column to the left":
app.ActiveSheet.Columns(2).FormulaR1C1 = "=SUBSTITUTE(RC[-1],""foo"",""bar"")"
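Hand-writing R1C1 strings is error-prone, especially the difference between relative references (RC[-1], an offset from the cell holding the formula) and absolute ones (R2C3, a fixed cell that every formula cell points to). The absolute form is also what keeps a referenced range from shifting when one formula is applied to many cells. A sketch of hypothetical helpers for building them:

```csharp
using System;
using System.Globalization;

public static class R1C1
{
    // "[n]" for a non-zero offset, nothing for zero (R[0]C[0] is written as RC).
    // Invariant culture keeps the minus sign ASCII on every machine.
    static string Bracket(int offset) =>
        offset == 0 ? "" : "[" + offset.ToString(CultureInfo.InvariantCulture) + "]";

    // Relative reference, measured from the cell that holds the formula.
    public static string Relative(int rowOffset, int colOffset) =>
        "R" + Bracket(rowOffset) + "C" + Bracket(colOffset);

    // Absolute reference: the same sheet cell from every formula cell.
    public static string Absolute(int row, int col)
    {
        if (row < 1 || col < 1)
            throw new ArgumentOutOfRangeException(row < 1 ? nameof(row) : nameof(col));
        return $"R{row}C{col}";
    }

    public static string AbsoluteRange(int firstRow, int firstCol, int lastRow, int lastCol) =>
        Absolute(firstRow, firstCol) + ":" + Absolute(lastRow, lastCol);
}
```

For example, "=SUM(" + R1C1.AbsoluteRange(1, 1, 10, 1) + ")" produces =SUM(R1C1:R10C1), which refers to A1:A10 from whichever cell it is written to.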
Also, creating XLL add-ins with Excel-DNA and .NET (or the hard way in C) is the only way to get UDFs to run on multiple threads. (See the IsThreadSafe property of Excel-DNA's ExcelFunction attribute.)
Before I switched to Excel-DNA completely, I also experimented with creating COM-visible libraries in .NET to reference from VBA projects. Heavy text processing is a bit faster that way than in VBA, as is using wrapped .NET List classes instead of VBA's Collection, but Excel-DNA is better.