C# Efficient way to iterate over Excel worksheet

I have the following code:
string result = "";
for(int i=2; i<=excelRange.Rows.Count; i++)
{
result += excelRange.Cells[i, 2].Value2;
}
For an Excel file with a couple hundred entries, this takes 5 seconds. Is there a more efficient way, perhaps? I only want the values from B2 to Bn.

Yes, there is a more efficient way.
Create a range that exactly matches the cells that you really need.
Get the Value2 property of this range. The result will be an array type.
Iterate through the array.
The problem with your approach is the large number of inter-process requests between your application and Excel. Your approach requires two or three requests per cell. The proposed approach is much faster because it requires a few requests up front but no additional requests per cell.
Note that this works up to about 4000 cells. If you need to process more cells, you will need to split it into several ranges, each one containing less than 4000 cells.
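If a range does need to be split as described above, the row blocks can be computed up front. The following is only a minimal sketch: the class name RangeChunker, the method SplitRows and the maxCells parameter are hypothetical names introduced here, and the block size is left to the caller rather than fixed at the figure quoted above.

```csharp
using System;
using System.Collections.Generic;

public static class RangeChunker
{
    // Hypothetical helper: splits the inclusive row span [firstRow, lastRow]
    // of a single column into consecutive blocks of at most maxCells rows.
    public static IEnumerable<(int First, int Last)> SplitRows(int firstRow, int lastRow, int maxCells)
    {
        if (maxCells <= 0) throw new ArgumentOutOfRangeException(nameof(maxCells));
        for (int start = firstRow; start <= lastRow; start += maxCells)
        {
            // The last block may be shorter than maxCells.
            int end = Math.Min(start + maxCells - 1, lastRow);
            yield return (start, end);
        }
    }
}
```

Each block can then be turned into an address such as "B" + First + ":B" + Last and read with a single Value2 call, so the number of requests grows with the number of blocks rather than the number of cells.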
Update
Assuming Excel is already running, it would look something like this (the correct number of rows in the B column is automatically selected):
var excelApp = (Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
Excel._Worksheet workSheet = (Excel.Worksheet)excelApp.ActiveSheet;
var range = (Excel.Range)workSheet.Range[workSheet.Range["B2"],
workSheet.Range["B2"].End[Excel.XlDirection.xlDown]];
var cellData = (Object[,])range.Value2;
string result = "";
foreach (var cell in cellData) {
result += cell.ToString();
}

The basic skeleton is the following:
var xlApp = new Excel.Application { Visible = true };
var xlBook = xlApp.Workbooks.Open(@"C:\Temp\Results.xlsx");
var xlSheet = xlBook.Sheets[1] as Excel.Worksheet;
var arr = (object[,])xlSheet.Range["B2:B100000"].Value;
var sb = new StringBuilder();
for (int x = 1; x <= arr.GetUpperBound(0); ++x)
{
sb.Append(arr[x, 1]);
}
var final_string = sb.ToString();
// Close workbook, close Excel...

Related

How can I efficiently compare contiguous and sequential rows using C# in Excel?

I am developing a VSTO add-in for Excel in C# that needs to compare potentially large datasets (100 columns x ~10000 or more rows). It is being done in Excel so an end user can view some pictorial representation of the provided data on a row-by-row basis. This application must be done in Excel despite the potential pitfalls of using these large datasets.
Regardless, my question pertains to an efficient way to compare contiguous and sequential rows. My goal is to compare one row to the row directly after it; if any of the elements changes between row1 and row2, this counts as an "event" and row2 is output into a separate sheet. I'm sure you can see that a row-wise comparison takes a long time when the row count is around 10000 (in practice, this is about 150ms-200ms per row for the current code).
Currently, I have used the SequenceEqual() method to compare two lists of strings as follows:
private void FilterRawDataForEventReader(Excel.Application xlApp)
{
List<string> row1 = new List<string>();
List<string> row2 = new List<string>();
xlWsRaw = xlApp.Worksheets["Full Raw Data"];
xlWsEventRaw = xlApp.Worksheets["Event Data"];
Excel.Range xlRawRange = xlWsRaw.Range["A3"].Resize[xlWsRaw.UsedRange.Rows.Count-2, xlWsRaw.UsedRange.Columns.Count];
var array = xlRawRange.Value;
Excel.Range xlRange = (Excel.Range)xlWsEventRaw.Cells[xlWsEventRaw.UsedRange.Rows.Count, 1];
int lastRow = xlRange.get_End(Excel.XlDirection.xlUp).Row;
int newRow = lastRow + 2;
for (int i = 1; i < xlWsRaw.UsedRange.Rows.Count - 2; i++)
{
row1.Clear();
row2.Clear();
for (int j = 1; j <= xlWsRaw.UsedRange.Columns.Count-1; j++)
{
row1.Add(array[i, j].ToString());
row2.Add(array[i + 1, j].ToString());
}
if (!row1.SequenceEqual(row2))
{
row2.Add(array[i + 1, xlWsRaw.UsedRange.Columns.Count].ToString()); // Add timestamp to row2.
for (int j = 0; j < row2.Count; j++)
{
xlWsEventRaw.Cells[newRow, j + 1] = row2[j];
}
newRow++;
}
}
}
During testing, I placed timers at various parts of this method to see how long certain operations take. For 100 columns, the first loop, which builds the string lists for row1 and row2, takes around 100ms per iteration, and the whole operation takes between 150ms and 200ms when an "event" has been found.
My intuition is that building the two List<string> is the problem but I do not know how else to approach this kind of problem in my experience. I should emphasize, the actual values of the data in the two List<string> don't matter; what matters is if the data are different at all. In that way, I feel that I am approaching this problem incorrectly but don't know how to "re-approach" so to say.
I am wondering if, instead of building arrays of strings through iteration and comparing them with the SequenceEqual() method, anyone can suggest a faster way to compare contiguous and sequential rows?
In case this solution may be useful for someone else trying to use Excel in C# and do some comparisons:
This problem was largely an optimization exercise. I eliminated the multiple loops and used Excel instead to generate the comparison lists:
for (int i = 3; i < xlWsRaw.UsedRange.Rows.Count - 2; i++)
{
rng1 = (Excel.Range)xlWsRaw.Range[xlWsRaw.Cells[i, 1], xlWsRaw.Cells[i, xlWsRaw.UsedRange.Columns.Count - 1]];
rng2 = (Excel.Range)xlWsRaw.Range[xlWsRaw.Cells[i+1, 1], xlWsRaw.Cells[i+1, xlWsRaw.UsedRange.Columns.Count - 1]];
rng3 = (Excel.Range)xlWsEventRaw.Range[xlWsEventRaw.Cells[newRow, 1], xlWsEventRaw.Cells[newRow, xlWsRaw.UsedRange.Columns.Count - 1]];
object[,] cellValues1 = (object[,])rng1.Value2;
object[,] cellValues2 = (object[,])rng2.Value2;
List<string> test1 = cellValues1.Cast<object>().ToList().ConvertAll(x => Convert.ToString(x));
List<string> test2 = cellValues2.Cast<object>().ToList().ConvertAll(x => Convert.ToString(x));
if (!test1.SequenceEqual(test2))
{
rng2.Copy(rng3);
xlWsEventRaw.Cells[newRow, xlWsRaw.UsedRange.Columns.Count].Value = xlWsRaw.Cells[i + 1, xlWsRaw.UsedRange.Columns.Count].Value; // Outputs the timestamp of the event to the events worksheet.
newRow++;
}
}
I believe this can be optimized further but in my case, the ranges contain multiple types including strings so I convert everything to List<string> for the purpose of comparison. The SequenceEqual() method, however it works behind the scenes, is nearly instantaneous and reduces the time to compare 120 columns to around 3ms.
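One possible further optimization, sketched here under assumptions rather than taken from the answer above: if the whole block has already been read into an object[,] with a single Value2 call, two rows can be compared in place without building any lists and without further calls to Excel. The names RowComparer and RowsDiffer are hypothetical. The column bounds are passed explicitly because arrays returned by Excel are 1-based.

```csharp
using System;

public static class RowComparer
{
    // Hypothetical helper: returns true when any column in [firstCol, lastCol]
    // differs between rowA and rowB of an already-loaded value array.
    // object.Equals compares boxed doubles and strings by value and treats
    // two nulls (empty cells) as equal.
    public static bool RowsDiffer(object[,] data, int rowA, int rowB, int firstCol, int lastCol)
    {
        for (int col = firstCol; col <= lastCol; col++)
        {
            if (!object.Equals(data[rowA, col], data[rowB, col]))
                return true; // stop at the first difference
        }
        return false;
    }
}
```

Because the loop exits at the first difference and never converts values to strings, the cost per row is at most one pass over the columns in memory.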

How to efficiently retrieve all strings from a large Excel document

The Excel spreadsheet should be read by .NET. It is very efficient to read all values from the active range by using the Value property. This transfers all values in a two-dimensional array with one single call to Excel.
However, reading strings is not possible for a range which contains more than one single cell. Therefore we have to iterate over all cells and use the Text property. This shows very poor performance for larger documents.
The reason for using strings rather than values is to obtain the correct format (for instance for dates or the number of digits).
Here is a sample code written in C# to demonstrate the approach.
static void Main(string[] args)
{
Excel.Application xlApp = (Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
var worksheet = xlApp.ActiveSheet;
var cells = worksheet.UsedRange();
// read all values in array -> fast
object[,] arrayValues = cells.Value;
// create array for text of the same extension
object[,] arrayText = (object[,])Array.CreateInstance(typeof(object),
new int[] { arrayValues.GetUpperBound(0), arrayValues.GetUpperBound(1) },
new int[] { arrayValues.GetLowerBound(0), arrayValues.GetLowerBound(1) });
// read text for each cell -> slow
for (int row = arrayValues.GetLowerBound(0); row <= arrayValues.GetUpperBound(0); ++row)
{
for (int col = arrayValues.GetLowerBound(1); col <= arrayValues.GetUpperBound(1); ++col)
{
object obj = cells[row, col].Text;
arrayText[row, col] = obj;
}
}
}
The question is whether there is a more efficient way to read the complete string content from an Excel document. One idea was to use cells.Copy to copy the content to the clipboard and get it from there. However, this has some restrictions and could of course interfere with users who are working with the clipboard at the same time. So I wonder if there are better approaches to solve this performance issue.
You can use code below:
using (MSExcel.Application app = MSExcel.Application.CreateApplication())
{
MSExcel.Workbook book1 = app.Workbooks.Open( this.txtOpen_FilePath.Text);
MSExcel.Worksheet sheet = (MSExcel.Worksheet)book1.Worksheets[1];
MSExcel.Range range = sheet.GetRange("A1", "F13");
object value = range.Value; //the value is boxed two-dimensional array
}
The code is provided from this post. It should be much more efficient than your code, but may not be the best.

Array not filling with .Split Method

I am having an issue getting the array "whyWontYouWork" to populate with a value. In the following example, the value of rangeNames[j] is "$A$1:$A$10".
The string "group" fills in correctly as "$A$1:$A$10", but the line above it shows up as "The name 'whyWontYouWork' does not exist in this context". I am at a loss, since it works once, and when I try to split the string, I get nothing. Any ideas?
private void CutStates(string[] sheetNames,string[] rangeNames, string[] idNums)
{
Excel.Application xlApp = (Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
Excel.Workbook wkbk = null;
wkbk = xlApp.ActiveWorkbook;
for (int i = 0; i < idNums.Length; i++)
{
string stateId = idNums[i];
for (int j = 0; j < sheetNames.Length; j++)
{
string[] sheet = sheetNames[j].Split('!');
List<string> rowsToDelete = new List<string>();
List<string> reverseDelete = new List<string>();
string tabName = sheet[0];
string[] whyWontYouWork = rangeNames[j].Split(':');
string group = rangeNames[j];
Excel.Range range = wkbk.Sheets[tabName].Range[group];
foreach (Excel.Range cell in range)
{
string val2 = cell.Value.Substring(0, 2);
string cellAdd = cell.Address.ToString();
if (val2 != stateId)
{
string delCell = cell.Address.ToString();
rowsToDelete.Add(delCell);
}
}
reverseDelete = rowsToDelete.ToList();
reverseDelete.Reverse();
foreach (string item in reverseDelete)
{
Excel.Range delete = wkbk.Sheets[tabName].Range[item];
delete.Delete();
}
}//j
}//i
}
I plan on using the first part ($A$1) as the starting point of a group to delete from the top down, and the second part ($A$10) to be the starting point from the bottom up to delete.
I want to iterate through the cells in "group" one at a time and skip each cell whose first two characters don't match the two-character stateId, moving on until I find one that matches. At that point I move back one row, grab that address, then grab the start of the list ($A$1), select from there to the last row that doesn't match, and delete the block. I will do the same from the row after the last match to the end ($A$10). I could do this line by line, but I do this over 15K rows, so one at a time is terribly slow. I hope that makes more sense.
From code it seems you are iterating through sheets (for (int j = 0; j < sheetNames.Length; j++)) so it is possible that for first sheet you have rangeNames[j] value ("$A$1:$A$10") and on other sheets you don't.
From what I remember Excel by default creates 3 sheets, so that's probably the problem.
Thanks to a few of you pointing out that the variable wasn't used, I checked my settings and found that the compiler was optimizing the variable away. I thought I had changed that. Adding a simple Console.WriteLine (instead of using the Watch window) pushed the variable into use and made it stick.
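The block-delete plan described in the question can be sketched without touching Excel, assuming the cell texts have already been read into a list and that the rows matching a state are contiguous. The names StateBlocks and FindMatchSpan are hypothetical, and stateId is assumed to be exactly two characters long, as in the question.

```csharp
using System;
using System.Collections.Generic;

public static class StateBlocks
{
    // Hypothetical helper: returns the zero-based index of the first and last
    // value that starts with stateId, or (-1, -1) when nothing matches.
    public static (int First, int Last) FindMatchSpan(IReadOnlyList<string> values, string stateId)
    {
        int first = -1, last = -1;
        for (int i = 0; i < values.Count; i++)
        {
            string v = values[i];
            if (v != null && v.StartsWith(stateId, StringComparison.Ordinal))
            {
                if (first < 0) first = i; // remember the first match only once
                last = i;                 // keep moving the last match forward
            }
        }
        return (first, last);
    }
}
```

With the span known, everything before First and everything after Last forms at most two blocks, each of which could be deleted with one Range.Delete call instead of one call per row. Deleting the lower block first keeps the upper block's row numbers valid.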

how to set values to a two-dimensional Excel range?

I need to build an Excel sheet from a list of test cases in a specific format in order to upload it to the server.
I have trouble populating the two-dimensional range of "expected" and "actual" in the file.
I use the same methods to populate the headers, which are a one-dimensional array, and the steps (which are two-dimensional).
The flow is:
1. Define the TestCase range (some headers + steps). Let's say: A1 to E14 for the 1st iteration.
2. Define a sub (local) range within the TestCase range for the headers (e.g. A1 to C1).
3. Define another sub (local) range within the TestCase range for the steps (in my case: D1 to E14).
4. Populate the two sub-ranges with the test-case values (headers and steps).
5. Repeat by defining the next spreadsheet range (A14 to E28 in my case) with the same local ranges (steps 2-3), and populate them, and so on...
The source value is a Dictionary which represents the test-case's steps (key = expected and value = actual).
Here is the code I use:
public class TestCase
{
public Dictionary<string, string> Steps;
}
Microsoft.Office.Interop.Excel.Application excelApp = new Microsoft.Office.Interop.Excel.Application();
Workbooks workBooks = excelApp.Workbooks;
Workbook workbook = workBooks.Add(XlWBATemplate.xlWBATWorksheet);
Worksheet worksheet = (Worksheet)workbook.Worksheets[1];
excelApp.Visible = true;
foreach (TestCase testCase in TestCaseList.Items.GetList())
{
Range worksheetRange = GetRangeForTestCase(worksheet);
//The step above is equivalent to:
// Range worksheetRange = worksheet.get_Range("A1", "E14");
//for the first foreach iteration.
Range test = worksheetRange.get_Range("D1", "E14");
//This is a local range within the worksheetRange, not the worksheet,
//so it always has to be D1 to E14 for a 14-step test case.
//Anyway, it should work at least for the 1st iteration.
//For test evaluation only. All cells between D1 and E14 are filled with "ccc"
test.Value = "ccc";
//This list of pairs is about to be converted to a two-dimensional array
var list = new List<object>();
foreach (KeyValuePair<string, string> item in testCase.Steps)
{
//Here I build the inside Array[,]
object[] tempArgs = {item.Key, item.Value};
list.Add(tempArgs);
}
object[] args = { Type.Missing, list.ToArray() };
test.GetType().InvokeMember("Value", System.Reflection.BindingFlags.SetProperty, null, test, args);
//Now all the "ccc" values in the Excel worksheet have disappeared, so the Range is correct, but the values of args[] are not set!!
}
The actual results of running this code is that the range is defined (probably correctly) but its values are set to null,
although - I can see the correct values in the args array in run time.
I've also tried to set a wider range and populate it with range.Value = "Fake value" and saw that, after running my piece of code, the correct range of steps becomes blank!
So, the range is correct, the array is filled with my values, the InvokeMember method is correctly invoked :)
But, all values are set to null..
Help...
One cell or 1dim array can be set via one of the following:
Range range = SetRange();//Let's say range is set between A1 to D1
object[] args = {1, 2, 3, 4 };
//Directly
range.Value = args;
//By Reflection
range.GetType().InvokeMember("Value", System.Reflection.BindingFlags.SetProperty, null, range, new object[] { args });
A 2dim array has to be a real matrix (object[,]) that is built before the set. With the reflection flow, it looks like this:
Range range = SetRange();//Let's say range is set between A1 to C5
int rows = 5;
int columns = 3;
object[,] data = new object[rows, columns];
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
//Here I build the inside Array[,]
string uniqueValue = (i + j).ToString();
data[i, j] = "Insert your string value here, e.g: " + uniqueValue;
}
}
object[] args = { data };
range.GetType().InvokeMember("Value", System.Reflection.BindingFlags.SetProperty, null, range, args);
As for your issue, with the whole range being set to null, I think this is due to wrong arguments.
Indeed why the Type.Missing in the arguments list?
Hence this should be a step in the right direction:
object[] args = { list.ToArray() };
test.GetType().InvokeMember("Value", System.Reflection.BindingFlags.SetProperty, null, test, args);
Moreover list.ToArray will only generate an array of arrays not a matrix, so you should build your matrix differently, e.g.:
object[,] data = new object[14, 2];
int row = 0;
foreach (KeyValuePair<string, string> item in testCase.Steps)
{
//Here I build the inside Array[,]
data[row, 0] = item.Key;
data[row, 1] = item.Value;
++row;
}
object[] args = { data };
And what's the rationale behind the use of InvokeMember instead of a simpler:
test.Value = data;
?
Hope this helps...
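The matrix-building step above hard-codes 14 rows. A minimal sketch that sizes the matrix from the dictionary instead could look like the following; the names StepMatrix and FromSteps are hypothetical and not part of the original code.

```csharp
using System.Collections.Generic;

public static class StepMatrix
{
    // Hypothetical helper: copies each (expected, actual) pair into one row
    // of a two-column matrix whose row count follows the dictionary size.
    public static object[,] FromSteps(IDictionary<string, string> steps)
    {
        var data = new object[steps.Count, 2];
        int row = 0;
        foreach (KeyValuePair<string, string> item in steps)
        {
            data[row, 0] = item.Key;   // "expected" column
            data[row, 1] = item.Value; // "actual" column
            row++;
        }
        return data;
    }
}
```

The target range then has to have the same shape as the matrix (steps.Count rows by 2 columns) before the result is assigned to its Value.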

C# - How do I iterate all the rows in Excel._Worksheet?

I am looking to programmatically pull data from an Excel worksheet and insert it into a database table.
How do I determine the number of columns and rows in a worksheet or otherwise iterate the rows?
I have
Excel._Worksheet worksheet = (Excel._Worksheet)workbook.ActiveSheet;
I tried worksheet.Range.Rows.Count
which tosses up
Indexed property 'Microsoft.Office.Interop.Excel._Worksheet.Range' has
non-optional arguments which must be provided
What needs to be done?
using Excel = Microsoft.Office.Interop.Excel;
...
public void IterateRows(Excel.Worksheet worksheet)
{
//Get the used Range
Excel.Range usedRange = worksheet.UsedRange;
//Iterate the rows in the used range
foreach(Excel.Range row in usedRange.Rows)
{
//Do something with the row.
//Ex. Iterate through the row's data and put in a string array
String[] rowData = new String[row.Columns.Count];
for(int i = 0; i < row.Columns.Count; i++)
rowData[i] = Convert.ToString(row.Cells[1, i + 1].Value2);
}
}
This compiles and runs just great for me! I'm using it to extract rows with missing fields to an error log.
I presume you are actually looking for the last used row. In that case you need to write it like this:
Range UsedRange = worksheet.UsedRange;
int lastUsedRow = UsedRange.Row + UsedRange.Rows.Count - 1;
