How to efficiently retrieve all strings from a large Excel document - C#

The Excel spreadsheet is to be read from .NET. Reading all values from a range through the Value property is very efficient: it transfers all values as a two-dimensional array in a single call to Excel.
However, reading strings this way is not possible for a range that contains more than one cell. Therefore we have to iterate over all cells and use the Text property, which performs very poorly on larger documents.
The reason for using strings rather than values is to obtain the displayed format (for instance for dates or the number of digits).
Here is sample code written in C# to demonstrate the approach.
static void Main(string[] args)
{
Excel.Application xlApp = (Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
var worksheet = xlApp.ActiveSheet;
var cells = worksheet.UsedRange;
// read all values in array -> fast
object[,] arrayValues = cells.Value;
// create array for text with the same dimensions and lower bounds
object[,] arrayText = (object[,])Array.CreateInstance(typeof(object),
new int[] { arrayValues.GetLength(0), arrayValues.GetLength(1) },
new int[] { arrayValues.GetLowerBound(0), arrayValues.GetLowerBound(1) });
// read text for each cell -> slow
for (int row = arrayValues.GetLowerBound(0); row <= arrayValues.GetUpperBound(0); ++row)
{
for (int col = arrayValues.GetLowerBound(1); col <= arrayValues.GetUpperBound(1); ++col)
{
object obj = cells[row, col].Text;
arrayText[row, col] = obj;
}
}
}
The question is whether there is a more efficient way to read the complete string content from an Excel document. One idea was to use cells.Copy to copy the content to the clipboard and read it from there. However, this has some restrictions and could of course interfere with users who are working with the clipboard at the same time. So I wonder if there are better approaches to solve this performance issue.

You can use the code below:
using (MSExcel.Application app = MSExcel.Application.CreateApplication())
{
MSExcel.Workbook book1 = app.Workbooks.Open( this.txtOpen_FilePath.Text);
MSExcel.Worksheet sheet = (MSExcel.Worksheet)book1.Worksheets[1];
MSExcel.Range range = sheet.GetRange("A1", "F13");
object value = range.Value; //the value is boxed two-dimensional array
}
The code is taken from this post. It should be much more efficient than your code, but may not be the best.

Related

C# Efficient way to iterate over excel worksheet

I have the following code:
string result = "";
for(int i=2; i<=excelRange.Rows.Count; i++)
{
result += excelRange.Cells[i, 2].Value2;
}
For an excel file with a couple hundred entries this takes 5 seconds. Is there a more efficient way, perhaps? I only want the values from B2 to Bn.
Yes, there is a more efficient way.
1. Create a range that exactly matches the cells that you really need.
2. Get the Value2 property of this range. The result will be an array type.
3. Iterate through the array.
The problem with your approach is the large number of inter-process requests between your application and Excel. Your approach requires two or three requests per cell. The proposed approach is much faster because it requires a few requests up-front but not additional requests per cell.
Note that this works up to about 4000 cells. If you need to process more cells, you will need to split it into several ranges, each one containing less than 4000 cells.
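If the range does have to be split as described, the addresses of the sub-ranges can be computed up front. The following is only a sketch: the class and method names are hypothetical, and the chunk size is a parameter so that the limit mentioned above is not hard-coded.

```csharp
using System;
using System.Collections.Generic;

static class RangeChunker
{
    // Yields A1-style addresses that cover rows firstRow..lastRow of a single
    // column, each address spanning at most maxRows rows.
    // RangeChunker and ColumnChunks are illustrative names, not part of any Excel API.
    public static IEnumerable<string> ColumnChunks(string column, int firstRow, int lastRow, int maxRows)
    {
        if (maxRows <= 0) throw new ArgumentOutOfRangeException(nameof(maxRows));
        for (int start = firstRow; start <= lastRow; start += maxRows)
        {
            int end = Math.Min(start + maxRows - 1, lastRow);
            yield return string.Format("{0}{1}:{0}{2}", column, start, end);
        }
    }
}
```

Each returned address (for example "B2:B4001") could then be passed to workSheet.Range[address] and read with a single Value2 call, so the number of requests grows with the number of chunks rather than with the number of cells.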
Update
Assuming Excel is already running, it would look something like this (the correct number of rows in the B column is automatically selected):
var excelApp = (Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
Excel._Worksheet workSheet = (Excel.Worksheet)excelApp.ActiveSheet;
var range = (Excel.Range)workSheet.Range[workSheet.Range["B2"],
workSheet.Range["B2"].End[Excel.XlDirection.xlDown]];
var cellData = (Object[,])range.Value2;
string result = "";
foreach (var cell in cellData) {
result += cell.ToString();
}
The basic skeleton is the following:
var xlApp = new Excel.Application { Visible = true };
var xlBook = xlApp.Workbooks.Open(@"C:\Temp\Results.xlsx");
var xlSheet = xlBook.Sheets[1] as Excel.Worksheet;
var arr = (object[,])xlSheet.Range["B2:B100000"].Value;
var sb = new StringBuilder();
for (int x = 1; x <= arr.GetUpperBound(0); ++x)
{
sb.Append(arr[x, 1]);
}
var final_string = sb.ToString();
// Close workbook, close Excel...
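One caveat with casting Value or Value2 directly to an array, as both snippets above do: for a range that consists of a single cell, Excel returns the cell's value itself rather than an array, so the cast fails. A small helper can normalize both cases. This is a sketch with hypothetical names (RangeValues, AsArray); it wraps a scalar in a 1-based 1x1 array to match the indexing of the arrays Excel returns.

```csharp
using System;

static class RangeValues
{
    // Returns the value unchanged if it already is a two-dimensional array;
    // otherwise wraps the scalar (or null for an empty cell) in a 1x1 array
    // whose indexes start at 1, like the arrays returned for multi-cell ranges.
    public static object[,] AsArray(object value)
    {
        var array = value as object[,];
        if (array != null)
        {
            return array;
        }

        var single = (object[,])Array.CreateInstance(
            typeof(object), new[] { 1, 1 }, new[] { 1, 1 });
        single[1, 1] = value;
        return single;
    }
}
```

With this in place, a call such as RangeValues.AsArray(range.Value2) could replace the direct cast, and the loop bounds can be taken from GetLowerBound and GetUpperBound as before.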

How to get current selected excel sheet data without using oledb connection in c#

I am working on a VSTO application and have one open workbook. I want to read the selected sheet's data from that workbook without using any OleDb connection. Is there a way to read the data and store it in a DataTable?
The tricky part is figuring out if the current selection is valid for what you want to do. In Excel's VBA world you'd work with the VBA information function TypeName to determine whether the current Selection is a Range object. C# doesn't have a direct equivalent, so you have to work around it. If all you're interested in is a Range, then you can check whether a direct conversion to an Excel.Range is valid and proceed from there. A Range object will return an array, which you can put in a data set.
The following code sample shows how to test the Selection and work with the resulting array. It doesn't do anything with a dataset - that would be a different question.
object oSel = Globals.ThisAddIn.Application.Selection;
if ((oSel as Excel.Range) != null)
{
Excel.Range rngSelection = (Excel.Range)oSel;
// for a multi-cell range, Value2 returns a two-dimensional array
object[,] data = rngSelection.Value2;
// dimension 0 holds the rows, dimension 1 the columns
for (int i = data.GetLowerBound(0); i <= data.GetUpperBound(0); i++)
{
for (int l = data.GetLowerBound(1); l <= data.GetUpperBound(1); l++)
{
// Convert.ToString tolerates empty cells, which arrive as null
System.Diagnostics.Debug.Print(Convert.ToString(data[i, l]));
}
}
}
An alternative to using the cast test involves working with the COM APIs. If you needed to take various actions depending on the type of Selection this approach might be more effective. It's described here: https://www.add-in-express.com/creating-addins-blog/2011/12/20/type-name-system-comobject/
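Since the question asks for a DataTable, here is one possible way to copy such an array into one. It is a minimal sketch, not part of the original answer: the class name, the method name and the generated column names ("Column1", "Column2", ...) are assumptions, and every column is typed as object because the array carries no schema.

```csharp
using System;
using System.Data;

static class RangeData
{
    // Copies a two-dimensional array (such as the one read from a Range)
    // into a DataTable, whatever the array's lower bounds are.
    public static DataTable ToDataTable(object[,] data)
    {
        var table = new DataTable();
        int rowLo = data.GetLowerBound(0), rowHi = data.GetUpperBound(0);
        int colLo = data.GetLowerBound(1), colHi = data.GetUpperBound(1);

        // Placeholder column names; a header row is not assumed
        for (int c = colLo; c <= colHi; c++)
        {
            table.Columns.Add("Column" + (c - colLo + 1), typeof(object));
        }

        for (int r = rowLo; r <= rowHi; r++)
        {
            DataRow row = table.NewRow();
            for (int c = colLo; c <= colHi; c++)
            {
                // Empty cells are null in the array; a DataRow expects DBNull
                row[c - colLo] = data[r, c] ?? DBNull.Value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

If the first row of the selection contains headers, the column names could be taken from that row instead and the data loop started one row later.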

C# how to iterate over excel columns

I want to get a specific column of an Excel sheet and then iterate through its cells. I want it to look something like this:
Excel.Workbook workbook = app.Workbooks.Open(svDropPath);
Excel.Worksheet xlWorkSheet = (Excel.Worksheet)workbook.Sheets["Sheet Name"];
var col = xlWorkSheet.UsedRange.Columns["C:C", Type.Missing]; // I want the 3rd column
foreach(Cell c in col)
....
How do I actually make this foreach loop?
Your loop will look as follows:
foreach (Excel.Range item in col.Cells)
{
//whatever you want to do with your cells, here- msgbox of cells value
MessageBox.Show(Convert.ToString(item.Value));
}
I believe there is no nice way of doing it other than to loop through the indices in question and use either Cells or Rows:
for (int i = 1; i <= max; i++)
{
Range cell = col.Cells[i, 1];
// or, equivalently for a single column:
// Range cell = col.Rows[i];
}
However, note that if you are reading and/or writing all the cells, you are much better off reading/writing the whole column to/from an array of object, and then looping through the array items, as outlined in my answer https://stackoverflow.com/a/18058144/1737957 . Not only is this much faster, you can also use nicer language constructs for looping since you are now dealing with a straightforward C# array.
The only reason you would have to loop rather than do this AFAIK is if you were accessing something like conditional formats etc., rather than just cell contents, and you couldn't write a whole range of them in one statement. However there may be ways of doing these too using arrays.

How to put array into excel range

I know how to write a single cell to Excel, but when I try it with an array, the sheet is filled with only the last value.
This is my range:
Excel.Range ServiceName = (Excel.Range)_sheet.get_Range(_sheet.Cells[38, "B"] as Excel.Range, _sheet.Cells[45, "B"] as Excel.Range);
_ServiceName is a List which contains 1,2,3,4,5,6
for (int i = 0; i < _ServiceName.Count; i++)
{
ServiceNameArray[0, i] = _ServiceName[i];
}
This is my attempt to write to Excel, but as I said, only the last item (6) ends up in the workbook:
for (int i = 0; i < _ServiceName.Count; i++)
{
ServiceName.set_Value(Type.Missing, ServiceNameArray[0,i]);
}
Does anyone have an idea?
Davide Piras is right. And you're doing a few other strange things there; I can elaborate on request.
For now I just want to point out that you can directly assign an array to the Value2 property of a Range:
ServiceName.Value2 = _ServiceName.ToArray();
This is much, much faster for bigger amounts of data.
(Side note: If you want to do the same with Formulas, for some strange reason you have to take an extra step (doubling the time):
range.Formula = array;
range.Formula = range.Formula;
unless there is a better way I don't know about yet.)
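One detail to watch when assigning an array to a vertical range such as B38:B45: as far as I know, Excel treats a one-dimensional array as a single row, so the values would not be spread down the column. Building a two-dimensional array with one column avoids that. The helper below is a sketch with hypothetical names (ColumnArray, FromList), not something taken from the answers here.

```csharp
using System.Collections.Generic;

static class ColumnArray
{
    // Builds an n x 1 array so that each list item lands in its own row
    // when the array is assigned to a single-column range.
    public static object[,] FromList<T>(IList<T> items)
    {
        var result = new object[items.Count, 1];
        for (int i = 0; i < items.Count; i++)
        {
            result[i, 0] = items[i];
        }
        return result;
    }
}
```

The assignment would then read ServiceName.Value2 = ColumnArray.FromList(_ServiceName); ideally with a target range that has exactly as many rows as the list has items.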
I see you looping over the ServiceName array to write all values one after the other, but I do not see you changing the target cell inside the range on each iteration. Of course you see only the last value, because you are writing all values one over the other, always in the same place.

Optimized way of adding multiple hyperlinks in excel file with C#

I wanted to ask if there is a practical way of adding multiple hyperlinks to an Excel worksheet with C#. I want to generate a list of websites and anchor hyperlinks to them, so the user can click such a hyperlink and get to that website.
So far I have come with simple nested for statement, which loops through every cell in a given excel range and adds hyperlink to that cell:
for (int i = 0; i < _range.Rows.Count; i++)
{
Microsoft.Office.Interop.Excel.Range row = _range.Rows[i];
for (int j = 0; j < row.Cells.Count; j++)
{
Microsoft.Office.Interop.Excel.Range cell = row.Cells[j];
cell.Hyperlinks.Add(cell, adresses[i, j], _optionalValue, _optionalValue, _optionalValue);
}
}
The code is working as intended, but it is extremely slow due to the thousands of calls to the Hyperlinks.Add method.
One thing that intrigues me is that the method set_Value from Office.Interop.Excel can add thousands of strings with one simple call, but there is no similar method for adding hyperlinks (Hyperlinks.Add can add just one hyperlink).
So my question is, is there some way to optimize adding hyperlinks to excel file in C# when you need to add a large number of hyperlinks...?
Any help would be appreciated.
I am using VS2010 and MS Excel 2010.
I have the very same problem (adding 300 hyperlinks via Range.Hyperlinks.Add takes approx. 2 min).
The runtime issue is because of the many Range-Instances.
Solution:
Use a single range instance and add Hyperlinks with the "=HYPERLINK(target, [friendlyName])" Excel-Formula.
Example:
List<string> urlsList = new List<string>();
urlsList.Add("http://www.gin.de");
// ^^ n times ...
// create shaped array with content (one column, one row per URL)
object[,] content = new object[urlsList.Count, 1];
for (int i = 0; i < urlsList.Count; i++)
{
content[i, 0] = string.Format("=HYPERLINK(\"{0}\")", urlsList[i]);
}
// get Range
string rangeDescription = string.Format("A1:A{0}", urlsList.Count); // Excel row indexes start at 1
Xl.Range xlRange = worksheet.Range[rangeDescription, XlTools.missing];
// set value finally
xlRange.Value2 = content;
... takes just 1 sec ...
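The formula above only sets the target. To also use the optional friendlyName argument, the formula string could be built by a small helper like the following. This is a sketch under assumptions: the class and method names are hypothetical, and embedded double quotes are doubled, which is how Excel escapes them inside a formula string.

```csharp
static class HyperlinkFormula
{
    // Doubles embedded quotes so they survive inside an Excel string literal
    private static string Escape(string text)
    {
        return text.Replace("\"", "\"\"");
    }

    // Builds =HYPERLINK("url") or, when a friendly name is given,
    // =HYPERLINK("url","friendly name")
    public static string Build(string url, string friendlyName = null)
    {
        if (friendlyName == null)
        {
            return string.Format("=HYPERLINK(\"{0}\")", Escape(url));
        }
        return string.Format("=HYPERLINK(\"{0}\",\"{1}\")", Escape(url), Escape(friendlyName));
    }
}
```

Each element of the content array would then be filled with HyperlinkFormula.Build(url, name) before the single Value2 assignment, so the approach still needs only one call to Excel.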
