C# how to iterate over Excel columns

I want to get a specific column of an Excel sheet and then iterate through its cells. I want it to look something like this:
Excel.Workbook workbook = app.Workbooks.Open(svDropPath);
Excel.Worksheet xlWorkSheet = (Excel.Worksheet)workbook.Sheets["Sheet Name"];
var col = xlWorkSheet.UsedRange.Columns["C:C", Type.Missing]; // I want the 3rd column
foreach(Cell c in col)
....
How do I actually make this foreach loop?

Your loop will look as follows:
foreach (Excel.Range item in col.Cells)
{
    // do whatever you want with each cell here, e.g. show its value in a message box
    MessageBox.Show(Convert.ToString(item.Value));
}

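I believe there is no nice way of doing it other than to loop through the indices in question and use either Cells or Rows:
// max is the number of rows you want to visit, e.g. col.Rows.Count
for (int i = 1; i <= max; i++)
{
    Range cell = (Range)col.Cells[i, 1];
    // or, for a single-column range:
    // Range cell = (Range)col.Rows[i];
}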
However, note that if you are reading and/or writing all the cells, you are much better off reading/writing the whole column to/from an array of object, and then looping through the array items, as outlined in my answer https://stackoverflow.com/a/18058144/1737957 . Not only is this much faster, you can also use nicer language constructs for looping since you are now dealing with a straightforward C# array.
The only reason you would have to loop rather than do this AFAIK is if you were accessing something like conditional formats etc., rather than just cell contents, and you couldn't write a whole range of them in one statement. However there may be ways of doing these too using arrays.
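As a rough sketch of the array approach, assuming the column has already been read in one call (for example with col.Value2), the helper below walks the result in plain C#. ColumnValues, ToStrings and MakeOneBased are made-up names for illustration only. MakeOneBased just fabricates the kind of 1-based object[,] that Excel interop hands back for a multi-cell range, so the sketch can be run without Excel installed.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnValues
{
    // Converts the value read from a one-column range into a list of strings.
    // A multi-cell range yields a 1-based object[,]; a single cell yields a scalar.
    public static List<string> ToStrings(object rangeValue)
    {
        var result = new List<string>();
        if (rangeValue is object[,] grid)
        {
            int col = grid.GetLowerBound(1);
            for (int row = grid.GetLowerBound(0); row <= grid.GetUpperBound(0); row++)
            {
                // Convert.ToString(object) returns "" for empty (null) cells
                result.Add(Convert.ToString(grid[row, col]));
            }
        }
        else
        {
            result.Add(Convert.ToString(rangeValue));
        }
        return result;
    }

    // Hypothetical stand-in for interop: builds a 1-based, one-column array.
    public static object[,] MakeOneBased(params object[] values)
    {
        var grid = (object[,])Array.CreateInstance(
            typeof(object), new[] { values.Length, 1 }, new[] { 1, 1 });
        for (int i = 0; i < values.Length; i++)
        {
            grid[i + 1, 1] = values[i];
        }
        return grid;
    }
}
```

With interop this would be called as something like ColumnValues.ToStrings(col.Value2), after which an ordinary foreach over the returned list replaces the per-cell COM calls.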

Related

How to delete multiple blank lines in a WPF DataGrid imported from an Excel file

I have a WPF DataGrid which I fill with data imported from an Excel file (*.xlsx) through a class. The problem is that multiple blank lines are added to the end of the DataGrid, and I don't see how to delete them. I attach my code.
<DataGrid Name="dgvMuros" Height="210" Margin="8" VerticalAlignment="Top" Padding="5,6" ColumnWidth="50" IsReadOnly="False"
AlternatingRowBackground="Azure" GridLinesVisibility="All" HeadersVisibility="Column"
Loaded="dgvMuros_Loaded" CellEditEnding="DataGrid_CellEditEnding" ItemsSource="{Binding Data}"
HorizontalGridLinesBrush="LightGray" VerticalGridLinesBrush="LightGray" >
</DataGrid>
With this method I import the data from the Excel file.
public void ImportarMuros()
{
ExcelData dataFronExcel = new ExcelData();
this.dgvMuros.DataContext = dataFronExcel;
txtTotMuros.Text = dataFronExcel.numMuros.ToString();
cmdAgregarMuros.IsEnabled = false;
cmdBorrarMuros.IsEnabled = false;
cmdImportar.IsEnabled = false;
}
public class ExcelData
{
public int numMuros { get; set; }
public DataView Data
{
get
{
Excel.Application excelApp = new Excel.Application();
Excel.Workbook workbook;
Excel.Worksheet worksheet;
Excel.Range range;
workbook = excelApp.Workbooks.Open(Environment.CurrentDirectory + "\\MurosEjemplo.xlsx");
worksheet = (Excel.Worksheet)workbook.Sheets["DatMuros"];
int column = 0;
int row = 0;
range = worksheet.UsedRange;
DataTable dt = new DataTable();
dt.Columns.Add("Muro");
dt.Columns.Add("Long");
dt.Columns.Add("Esp");
dt.Columns.Add("X(m)");
dt.Columns.Add("Y(m)");
dt.Columns.Add("Dir");
for (row = 2; row < range.Rows.Count; row++)
{
DataRow dr = dt.NewRow();
for (column = 1; column <= range.Columns.Count; column++)
{
dr[column - 1] = Convert.ToString((range.Cells[row, column] as Excel.Range).Value);
}
dt.Rows.Add(dr);
dt.AcceptChanges();
numMuros = dt.Rows.Count;
}
workbook.Close(true, Missing.Value, Missing.Value);
excelApp.Quit();
return dt.DefaultView;
}
}
}
As mentioned in the comments, below is an example of removing the extra “empty” rows from the DataTable.
There are a couple of ways to approach this. One is to clean the extra rows out of the Excel file itself, since Excel’s UsedRange property has a nasty habit of flagging rows that have no apparent data as NOT empty. This may be caused by formatting or other issues. I have a solution for that if you want to go down that rabbit hole: Fastest method to remove Empty rows and Columns From Excel Files using Interop
However, that solution was aimed at LARGE Excel files with many rows and columns. If the files are not large, then the solution below should work.
Your posted code is missing some much-needed range checking (more below). Even so, using the posted code, I was able to read an Excel file that produced extra “empty” rows at the end. It is these rows we want to remove from the DataTable.
I am sure there are other ways to do this, however, a basic approach would be to simply loop through the DataTable rows, and check each cell… and, if ALL the cells on that row are “empty” then remove that row. This is the approach I used below.
To get this done quickly, the goal is to keep it to one loop through the table. In other words, we want to loop through the table and remove rows from that SAME table, which means extra care is needed. A foreach loop through the rows will not work, because the row collection cannot be modified while it is being enumerated.
However, a simple for loop will work, as long as we start at the bottom and work up. We also need to make sure NOT to use dt.Rows.Count directly as a loop bound, since it changes as rows are removed. This is easily avoided by capturing the last row index in a variable before the loop and starting from there. The code can then delete rows from the bottom up without the row and loop indexes getting mixed up.
A walkthrough of the code goes like this. First, a bool variable allEmpty is created to indicate whether ALL the cells in a row are “empty.” For each row, we set this variable to true, assuming the row is empty. Then we loop through each cell of that row and check whether it is NOT empty. If at least one of the cells in that row is NOT empty, we set allEmpty to false and break out of the columns loop. After the columns loop exits, the code checks whether that row is empty and, if so, deletes it.
Note the else branch of the last if statement, the one that checks for the empty row. Since in this context we only want to delete the trailing “empty” rows, we are done as soon as the FIRST non-empty row is found (counting from the bottom), and can break out of the rows loop.
If you comment out the else portion of the bottom if statement, the code will remove ALL the empty rows.
bool allEmpty;
int rowCount = dt.Rows.Count - 1;
for (int dtRowIndex = rowCount; dtRowIndex >= 0; dtRowIndex--) {
allEmpty = true;
for (int dtColIndex = 0; dtColIndex < dt.Columns.Count; dtColIndex++) {
if (dt.Rows[dtRowIndex].ItemArray[dtColIndex].ToString() != "") {
allEmpty = false;
break;
}
}
if (allEmpty) {
dt.Rows.RemoveAt(dtRowIndex);
}
else {
break;
}
}
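For anyone who wants to try this outside the WPF project, here is the same bottom-up idea as a small self-contained sketch. DataTableTrimmer and RemoveTrailingEmptyRows are names I made up for illustration, and unlike the snippet above it also treats whitespace-only and DBNull cells as empty, which is an assumption about what “empty” should mean.

```csharp
using System;
using System.Data;

public static class DataTableTrimmer
{
    // Removes empty rows from the end of the table and returns how many were removed.
    // Stops at the first non-empty row found when walking up from the bottom.
    public static int RemoveTrailingEmptyRows(DataTable dt)
    {
        int removed = 0;
        for (int rowIndex = dt.Rows.Count - 1; rowIndex >= 0; rowIndex--)
        {
            bool allEmpty = true;
            foreach (object cell in dt.Rows[rowIndex].ItemArray)
            {
                // Convert.ToString gives "" for both null and DBNull.Value
                if (!string.IsNullOrWhiteSpace(Convert.ToString(cell)))
                {
                    allEmpty = false;
                    break;
                }
            }
            if (!allEmpty)
            {
                break; // everything above this row is kept, including interior blanks
            }
            dt.Rows.RemoveAt(rowIndex);
            removed++;
        }
        return removed;
    }
}
```

In the posted ExcelData class this could be called on dt just before return dt.DefaultView, with numMuros set from dt.Rows.Count afterwards.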
Eyebrow raiser for the posted code…
The posted code makes some dangerous assumptions about what is returned from UsedRange and about the dt column indexes. For example, the code starts by grabbing the worksheet's UsedRange.
range = worksheet.UsedRange;
We obviously NEED this info. However, at this point in the code, we have NO clue how many rows or columns have been returned. Then, when the code gets to the second for loop through the columns, it uses the column index as an index into the data row dr…
dr[column - 1] = …
Since the data table dt only has 6 columns, this is a risky assignment without checking the index range. UsedRange grabs every used cell, so if a user adds some text in column 7, 8 or ANY column beyond 6, this code will crash and burn. The code MUST check the number of columns returned from UsedRange to avoid an index out of range exception.
There are a couple of ways you could fix this. One would be to set the column loop's ending condition to the number of columns in the data table. Unfortunately, you would still have to check the number of columns returned by the used range, because it may return fewer columns than the data table has. In that case the code will crash on the same line as above, only this time on the right-hand side of the “=” sign.
= Convert.ToString((range.Cells[row, column] as Excel.Range).Value);
In both cases it is clear your code needs to check these ranges BEFORE you start the looping through the used range.
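One way to do that check, sketched under the assumption that the used range has first been read into an object[,] (for example via range.Value2), is to clamp the column loop to whichever is smaller: the number of columns in the array or the number of columns in the table. SafeImport and Fill are hypothetical names, not part of your code or of Interop.

```csharp
using System;
using System.Data;

public static class SafeImport
{
    // Copies rows from a 2-D array of cell values into dt without ever indexing
    // past the array's columns or the table's columns.
    // firstDataRow is the array row index where data starts (e.g. 2 to skip a header
    // in the 1-based array that Excel interop returns).
    public static void Fill(DataTable dt, object[,] values, int firstDataRow)
    {
        int rowLow = values.GetLowerBound(0);
        int rowHigh = values.GetUpperBound(0);
        int colLow = values.GetLowerBound(1);
        // Guard against both "more columns than the table" and "fewer columns than the table"
        int colCount = Math.Min(values.GetLength(1), dt.Columns.Count);

        for (int row = Math.Max(firstDataRow, rowLow); row <= rowHigh; row++)
        {
            DataRow dr = dt.NewRow();
            for (int c = 0; c < colCount; c++)
            {
                dr[c] = Convert.ToString(values[row, colLow + c]);
            }
            dt.Rows.Add(dr);
        }
    }
}
```

Table columns that the sheet does not cover are simply left as DBNull, and sheet columns beyond the table's are ignored. Whether ignoring them silently is acceptable, or whether the user should be warned, is a design decision for your application.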
Lastly, if you must use Excel Interop, which is usually a last resort, then you need to minimize the possibility of leaking COM objects (leaking resources), so that when something goes wrong your code still releases the COM objects it created. When using Interop, I suggest you wrap all the Excel code in a try/catch/finally statement. The try portion holds the working code. The finally portion is where you close the Excel workbook, quit the Excel application and release the COM objects.
You will need to decide what to do in the catch portion. A simple message box may suffice to tell the user there was an error; the user clicks OK, and the finally code then runs. The point is that you want to display something instead of simply swallowing the error.
This approach may look something like…
Microsoft.Office.Interop.Excel.Application ExcelApp = null;
Microsoft.Office.Interop.Excel.Workbook Workbook = null;
Microsoft.Office.Interop.Excel.Worksheet Worksheet = null;
try {
// code that works with excel interop
}
catch (Exception e) {
MessageBox.Show("Error Excel: " + e.Message);
}
finally {
if (Worksheet != null) {
Marshal.ReleaseComObject(Worksheet);
}
if (Workbook != null) {
//Workbook.Save();
Workbook.Close();
Marshal.ReleaseComObject(Workbook);
}
if (ExcelApp != null) {
ExcelApp.Quit();
Marshal.ReleaseComObject(ExcelApp);
}
}
I hope this makes sense and helps.

How to efficiently retrieve all strings from a large Excel document

The Excel spreadsheet should be read by .NET. It is very efficient to read all values from the active range using the Value property. This transfers all values into a two-dimensional array with one single call to Excel.
However, reading strings this way is not possible for a range which contains more than one cell. Therefore we have to iterate over all cells and use the Text property. This performs very poorly for larger documents.
The reason for using strings rather than values is to obtain the displayed format (for instance for dates or the number of digits).
Here is some sample code written in C# to demonstrate the approach.
static void Main(string[] args)
{
    Excel.Application xlApp = (Excel.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Excel.Application");
    var worksheet = xlApp.ActiveSheet;
    var cells = worksheet.UsedRange;
    // read all values into an array with a single call -> fast
    object[,] arrayValues = cells.Value;
    // create an array for the text with the same dimensions and lower bounds
    object[,] arrayText = (object[,])Array.CreateInstance(typeof(object),
        new int[] { arrayValues.GetLength(0), arrayValues.GetLength(1) },
        new int[] { arrayValues.GetLowerBound(0), arrayValues.GetLowerBound(1) });
    // read the text of each cell -> slow
    for (int row = arrayValues.GetLowerBound(0); row <= arrayValues.GetUpperBound(0); ++row)
    {
        for (int col = arrayValues.GetLowerBound(1); col <= arrayValues.GetUpperBound(1); ++col)
        {
            object obj = cells[row, col].Text;
            arrayText[row, col] = obj;
        }
    }
}
The question is whether there is a more efficient way to read the complete string content from an Excel document. One idea was to use cells.Copy to copy the content to the clipboard and get it from there. However, this has some restrictions and could of course interfere with users who are working with the clipboard at the same time. So I wonder if there are better approaches to solve this performance issue.
You can use the code below:
using (MSExcel.Application app = MSExcel.Application.CreateApplication())
{
MSExcel.Workbook book1 = app.Workbooks.Open( this.txtOpen_FilePath.Text);
MSExcel.Worksheet sheet = (MSExcel.Worksheet)book1.Worksheets[1];
MSExcel.Range range = sheet.GetRange("A1", "F13");
object value = range.Value; //the value is boxed two-dimensional array
}
The code is provided from this post. It should be much more efficient than your code, but may not be the best.

How can I get calculated values after Worksheet.Calculate()?

I tried the trial version of GemBox.Spreadsheet.
Reading Cells[,].Value in a for or foreach loop takes a long time.
So I tried calling Calculate() first and then reading Cell[].Value, but that takes just as long.
It seems to recalculate every time I read Cell[].Value.
workSheet.Calculate(); <- after this, the values are calculated, am I right?
for( int i =0; i <worksheet.GetUsedCellRange(true).LastRowIndex+1;++i)
{
~~~~for Iteration~~~
var value = workSheet.Cells[i,j].Value; <- re-calculates the value(?)
}
so here is a Question.
Can i Get calculated values? or you guys know pre-Calculate function or Get more Speed?
Unfortunately, I'm not sure what exactly you're asking. Can you please try reformulating your question a bit so that it's easier to understand?
Nevertheless, here is some information which I hope you'll find useful.
To iterate through all cells, you should use one of the following:
1.
foreach (ExcelRow row in workSheet.Rows)
{
foreach (ExcelCell cell in row.AllocatedCells)
{
var value = cell.Value;
// ...
}
}
2.
for (CellRangeEnumerator enumerator = workSheet.Cells.GetReadEnumerator(); enumerator.MoveNext(); )
{
ExcelCell cell = enumerator.Current;
var value = cell.Value;
// ...
}
3.
for (int r = 0, rCount = workSheet.Rows.Count; r < rCount; ++r)
{
for (int c = 0, cCount = workSheet.CalculateMaxUsedColumns(); c < cCount; ++c)
{
var value = workSheet.Cells[r, c].Value;
// ...
}
}
I believe all of them will have pretty much the same performance.
However, depending on the spreadsheet's content, the last one could end up a bit slower. This is because it does not iterate exclusively through allocated cells.
For instance, let's say you have a spreadsheet with 2 rows. The first row is empty, it has no data, and the second row has 3 cells. If you use approach 1 or 2, you will iterate only through those 3 cells in the second row. If you use approach 3, you will iterate through 3 cells in the first row (which previously were not allocated and now are, because we accessed them) and then through the 3 cells in the second row.
Now regarding the calculation, note that when you save the file with some Excel application it will save the last calculated formula values in it. In this case you don't have to call Calculate method because you already have the required values in cells.
You should call Calculate method when you need to update, re-calculate the formulas in your spreadsheet, for instance after you have added or modified some cell values.
Last, regarding your question again it is hard to understand it, but nevertheless:
Can i Get calculated values?
Yes, that line of code var value = workSheet.Cells[i,j].Value; should give you the calculated value because you used Calculate method before it. However, if you have formulas that are currently not supported by GemBox.Spreadsheet's calculation engine then it will not be able to calculate the value. You can find a list of currently supported Excel formula functions here.
or you guys know pre-Calculate function or Get more Speed?
I don't know what "pre-Calculate function" means and for speed please refer to first part of this answer.

How to put array into excel range

I know how to write a single cell into Excel, but when I try it with an array, the Excel sheet is filled with only the last value.
This is my range:
Excel.Range ServiceName = (Excel.Range)_sheet.get_Range(_sheet.Cells[38, "B"] as Excel.Range, _sheet.Cells[45, "B"] as Excel.Range);
_ServiceName is a List which contains 1,2,3,4,5,6
for (int i = 0; i < _ServiceName.Count; i++)
{
ServiceNameArray[0, i] = _ServiceName[i];
}
This is my attempt to write into Excel, but as I said, only the last item (6) ends up in the workbook:
for (int i = 0; i < _ServiceName.Count; i++)
{
ServiceName.set_Value(Type.Missing, ServiceNameArray[0,i]);
}
Does anyone have an idea?
Davide Piras is right. And you're doing a few other strange things there; I can elaborate on request.
For now I just want to point out that you can directly assign an array to the .Value2 (or .Value) property of a Range:
ServiceName.Value2 = _ServiceName.ToArray();
This is much, much faster for bigger amounts of data.
(Side note: If you want to do the same with formulas, for some strange reason you have to take an extra step (doubling the time):
range.Formula = array;
range.Formula = range.Formula;
unless there is a better way I don't know about yet.)
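One caveat, as far as I know: Excel generally treats a one-dimensional array as a single row, so for a vertical range like B38:B45 the array should be column-shaped, i.e. a two-dimensional array with n rows and 1 column. Below is a minimal sketch of building such an array. RangeArrays and ToColumn are made-up helper names, and the interop assignment is only shown as a comment because it needs a running Excel instance.

```csharp
using System;
using System.Collections.Generic;

public static class RangeArrays
{
    // Builds an n-by-1 array (n rows, 1 column) from a list of values,
    // matching the shape of a vertical range such as B38:B45.
    public static object[,] ToColumn<T>(IReadOnlyList<T> items)
    {
        var column = new object[items.Count, 1];
        for (int i = 0; i < items.Count; i++)
        {
            column[i, 0] = items[i];
        }
        return column;
    }
}

// Intended use with interop (not run here; ServiceName is the range from the question):
// ServiceName.Value2 = RangeArrays.ToColumn(_ServiceName);
```

Note that the ServiceNameArray[0, i] layout in the question is the row-shaped equivalent (1 row, n columns), which would suit a horizontal range instead.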
I see you looping over the ServiceName array to get the values one after the other, but I don't see you changing the target cell inside the range on each iteration. Of course you see only the last value: you are writing every value over the previous one, always in the same place.

Loop through all cells in an Excel column to the end

This is probably simple once I see the correct code, but what is the best way to loop through a specific column in a worksheet until the end?
It's pretty simple. Just create a range object that points to the cell you want to start at, then loop through each offset from that range until you get to a blank cell.
int i = 0;
// in C#, Offset is an indexed property and an empty cell's Value2 is null
while (target_range.Offset[i, 0].Value2 != null)
{
    // work with target_range.Offset[i, 0] here
    i++;
}
