I am trying to get the last row of an Excel sheet programmatically using the Microsoft.Office.Interop.Excel library and C#. I need to loop through all the records of a spreadsheet and perform an operation on each of them. Specifically, I need the actual number of the last row, because I will pass this number into a function. Does anybody have any idea how to do that?
A couple of ways:
using Excel = Microsoft.Office.Interop.Excel;
Excel.ApplicationClass excel = new Excel.ApplicationClass();
Excel.Application app = excel.Application;
Excel.Range all = app.get_Range("A1:H10", Type.Missing);
OR
Excel.Range last = sheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
Excel.Range range = sheet.get_Range("A1", last);
int lastUsedRow = last.Row;
int lastUsedColumn = last.Column;
This is a common issue in Excel.
Here is some C# code:
// Find the last real row
nInLastRow = oSheet.Cells.Find("*", System.Reflection.Missing.Value,
    System.Reflection.Missing.Value, System.Reflection.Missing.Value,
    Excel.XlSearchOrder.xlByRows, Excel.XlSearchDirection.xlPrevious,
    false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Row;
// Find the last real column
nInLastCol = oSheet.Cells.Find("*", System.Reflection.Missing.Value,
    System.Reflection.Missing.Value, System.Reflection.Missing.Value,
    Excel.XlSearchOrder.xlByColumns, Excel.XlSearchDirection.xlPrevious,
    false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Column;
found here
or using SpecialCells
Excel.Range last = sheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
Excel.Range range = sheet.get_Range("A1", last);
[EDIT] Similar threads:
VB.NET - Reading ENTIRE content of an excel file
How to get the range of occupied cells in excel sheet
Pryank's answer is what worked closest for me. I added a little bit towards the end (.Row) so I am not just returning a range, but an integer.
int lastRow = wkSheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell, Type.Missing).Row;
The only way I could get it to work in ALL scenarios (except Protected sheets):
It supports:
Scanning hidden rows / columns
Ignoring formatted cells that contain no data / formula
Code:
// Clear formats so that formatted-but-empty cells are not counted (note: this modifies the sheet)
sheet.Columns.ClearFormats();
sheet.Rows.ClearFormats();
// Detect Last used Row - Ignore cells that contains formulas that result in blank values
int lastRowIgnoreFormulas = sheet.Cells.Find(
"*",
System.Reflection.Missing.Value,
InteropExcel.XlFindLookIn.xlValues,
InteropExcel.XlLookAt.xlWhole,
InteropExcel.XlSearchOrder.xlByRows,
InteropExcel.XlSearchDirection.xlPrevious,
false,
System.Reflection.Missing.Value,
System.Reflection.Missing.Value).Row;
// Detect Last Used Column - Ignore cells that contains formulas that result in blank values
int lastColIgnoreFormulas = sheet.Cells.Find(
"*",
System.Reflection.Missing.Value,
System.Reflection.Missing.Value,
System.Reflection.Missing.Value,
InteropExcel.XlSearchOrder.xlByColumns,
InteropExcel.XlSearchDirection.xlPrevious,
false,
System.Reflection.Missing.Value,
System.Reflection.Missing.Value).Column;
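// Detect last used row / column, including cells that contain formulas that result in blank values.
// UsedRange does not have to start at A1, so add its starting offset.
int lastColIncludeFormulas = sheet.UsedRange.Column + sheet.UsedRange.Columns.Count - 1;
int lastRowIncludeFormulas = sheet.UsedRange.Row + sheet.UsedRange.Rows.Count - 1;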
For questions involving the Excel object model, it's often easier to try it out in VBA first; translating to C# is then fairly trivial.
In this case one way to do it in VBA is:
Worksheet.UsedRange.Row + Worksheet.UsedRange.Rows.Count - 1
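Translated to C#, the arithmetic is the same. The sketch below isolates it in a plain helper so it can be checked without Excel. `UsedRangeMath` and `LastUsedRow` are hypothetical names; in real code the two arguments would come from `worksheet.UsedRange.Row` and `worksheet.UsedRange.Rows.Count`.

```csharp
using System;

public static class UsedRangeMath
{
    // firstRow: 1-based row where the used range starts (UsedRange.Row).
    // rowCount: number of rows in the used range (UsedRange.Rows.Count).
    public static int LastUsedRow(int firstRow, int rowCount)
    {
        if (firstRow < 1) throw new ArgumentOutOfRangeException(nameof(firstRow));
        if (rowCount < 1) throw new ArgumentOutOfRangeException(nameof(rowCount));

        // The used range covers rows firstRow .. firstRow + rowCount - 1.
        return firstRow + rowCount - 1;
    }
}
```

For a used range that starts at row 3 and spans 5 rows, `UsedRangeMath.LastUsedRow(3, 5)` gives 7, whereas `Rows.Count` on its own would give 5.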
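ActiveSheet.UsedRange.Value returns a two-dimensional object array indexed as [row, column] (when the range contains more than one cell). Checking the length of both dimensions gives the row count and the column count, which equal the last row and last column indexes when the used range starts at A1. The example below uses C#.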
Excel.Worksheet activeSheet;
Excel.Range activeRange;
public virtual object[,] RangeArray
{
get { return (object[,])activeRange.Value; } // use the activeRange field declared above
}
public virtual int ColumnCount
{
get { return RangeArray.GetLength(1); }
}
public virtual int RowCount
{
get { return RangeArray.GetLength(0); }
}
public virtual int LastRow
{
get { return RowCount; }
}
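Because UsedRange can include rows that carry formatting but no data, the array length may overstate the last row that actually holds a value. A minimal sketch of one way to tighten it is to scan the array from the bottom up. It assumes the array has the shape returned for a multi-cell range (two-dimensional, typically 1-based), treats null and whitespace-only cells as empty, and uses the hypothetical names `RangeArrayScan` and `LastNonEmptyRow`.

```csharp
using System;

public static class RangeArrayScan
{
    // Returns the index (in the array's own indexing) of the last row that
    // contains at least one non-blank cell, or -1 if every cell is blank.
    public static int LastNonEmptyRow(object[,] cells)
    {
        if (cells == null) throw new ArgumentNullException(nameof(cells));

        // Use the array's bounds rather than assuming 0-based indexing,
        // since arrays coming from Excel are typically 1-based.
        int firstRow = cells.GetLowerBound(0), lastRow = cells.GetUpperBound(0);
        int firstCol = cells.GetLowerBound(1), lastCol = cells.GetUpperBound(1);

        for (int r = lastRow; r >= firstRow; r--)
        {
            for (int c = firstCol; c <= lastCol; c++)
            {
                if (!string.IsNullOrWhiteSpace(Convert.ToString(cells[r, c])))
                    return r;
            }
        }
        return -1;
    }
}
```

The result is an index into the array, not a worksheet row. With a 1-based array, the worksheet row would be the used range's starting row plus the result minus 1.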
This issue is even worse when there are possibly empty cells. But you have to read a row even if only one value is filled. It can take a while when there are a lot of unfilled cells but if the input is close to correct it is rather fast.
My solution ignores completely empty rows and returns the longest column's row count:
private static int GetLastRow(Worksheet worksheet)
{
int lastUsedRow = 1;
Range range = worksheet.UsedRange;
for (int i = 1; i <= range.Columns.Count; i++) // columns are 1-based, so include the last one
{
int lastRow = range.Rows.Count;
for (int j = range.Rows.Count; j > 0; j--)
{
if (lastUsedRow < lastRow)
{
lastRow = j;
if (!String.IsNullOrWhiteSpace(Convert.ToString((worksheet.Cells[j, i] as Range).Value)))
{
if (lastUsedRow < lastRow)
lastUsedRow = lastRow;
if (lastUsedRow == range.Rows.Count)
return lastUsedRow - 1;
break;
}
}
else
break;
}
}
return lastUsedRow;
}
For those who use the SpecialCells method (I'm not sure about the others), please note: if your last cell is merged, you won't be able to get the last row and column numbers from Range.Row and Range.Column.
You need to unmerge the range first and then get the last cell again.
It cost me a lot.
private int[] GetLastRowCol(Ex.Worksheet ws)
{
Ex.Range last = ws.Cells.SpecialCells(Ex.XlCellType.xlCellTypeLastCell, Type.Missing);
bool isMerged = (bool)last.MergeCells;
if (isMerged)
{
last.UnMerge();
last = ws.Cells.SpecialCells(Ex.XlCellType.xlCellTypeLastCell, Type.Missing);
}
return new int[2] { last.Row, last.Column };
}
As previously discussed, the techniques above (xlCellTypeLastCell etc.) do not always provide expected results. Although it's not difficult to iterate down through a column checking for values, sometimes you may find that there are empty cells or rows with data that you want to consider in subsequent rows. When using Excel directly, a good way of finding the last row is to press CTRL + Down Arrow a couple of times (you'll end up at row 1048576 for an XLSX worksheet) and then press CTRL + Up Arrow which will select the last populated cell. If you do this within Excel while recording a Macro you'll get the code to replicate this, and then it's just a case of tweaking it for C# using the Microsoft.Office.Interop.Excel libraries. For example:
private int GetLastRow()
{
Excel.Application ExcelApp;
ExcelApp = new Excel.Application();
ExcelApp.Selection.End(Excel.XlDirection.xlDown).Select();
ExcelApp.Selection.End(Excel.XlDirection.xlDown).Select();
ExcelApp.Selection.End(Excel.XlDirection.xlDown).Select();
ExcelApp.Selection.End(Excel.XlDirection.xlUp).Select();
return ExcelApp.ActiveCell.Row;
}
It may not be the most elegant solution (I guess instead you could navigate to the final row within the spreadsheet first directly before using XlUp) but it seems to be more reliable.
As CtrlDot and Leo Guardian say, this method is not very accurate: in some files, cell formatting affects what SpecialCells returns.
So I used a combination of SpecialCells and a while loop.
Range last = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell, Type.Missing);
Range range = sheet.get_Range("A1", last);
int lastrow = last.Row;
// Confirm the result: keep advancing while the next row still has a value in column 1
object nextCell = sheet.Cells[lastrow + 1, 1].Value;
while (nextCell != null)
{
lastrow++;
nextCell = sheet.Cells[lastrow + 1, 1].Value;
}
If you are using OfficeOpenXml (EPPlus) nowadays:
using OfficeOpenXml;
using System.IO;
FileInfo excelFile = new FileInfo(filename);
ExcelPackage package = new ExcelPackage(excelFile);
ExcelWorksheet sheet = package.Workbook.Worksheets[1];
int lastRow = sheet.Dimension.End.Row;
int lastColumn = sheet.Dimension.End.Column;
I don't know whether Microsoft.Office.Interop.Excel is still state of the art or more of a legacy library. In my experience, replacing it with OfficeOpenXml has worked well, so this answer might be useful for future search results.
Related
I have a WPF DataGrid which I fill with data imported from an Excel file (*.xlsx) through a class. The problem is that multiple blank lines are added to the end of the DataGrid, and I don't see how to delete them. I attach my code.
<DataGrid Name="dgvMuros" Height="210" Margin="8" VerticalAlignment="Top" Padding="5,6" ColumnWidth="50" IsReadOnly="False"
AlternatingRowBackground="Azure" GridLinesVisibility="All" HeadersVisibility="Column"
Loaded="dgvMuros_Loaded" CellEditEnding="DataGrid_CellEditEnding" ItemsSource="{Binding Data}"
HorizontalGridLinesBrush="LightGray" VerticalGridLinesBrush="LightGray" >
</DataGrid>
With this method I import the data from the Excel file.
public void ImportarMuros()
{
ExcelData dataFronExcel = new ExcelData();
this.dgvMuros.DataContext = dataFronExcel;
txtTotMuros.Text = dataFronExcel.numMuros.ToString();
cmdAgregarMuros.IsEnabled = false;
cmdBorrarMuros.IsEnabled = false;
cmdImportar.IsEnabled = false;
}
public class ExcelData
{
public int numMuros { get; set; }
public DataView Data
{
get
{
Excel.Application excelApp = new Excel.Application();
Excel.Workbook workbook;
Excel.Worksheet worksheet;
Excel.Range range;
workbook = excelApp.Workbooks.Open(Environment.CurrentDirectory + "\\MurosEjemplo.xlsx");
worksheet = (Excel.Worksheet)workbook.Sheets["DatMuros"];
int column = 0;
int row = 0;
range = worksheet.UsedRange;
DataTable dt = new DataTable();
dt.Columns.Add("Muro");
dt.Columns.Add("Long");
dt.Columns.Add("Esp");
dt.Columns.Add("X(m)");
dt.Columns.Add("Y(m)");
dt.Columns.Add("Dir");
for (row = 2; row < range.Rows.Count; row++)
{
DataRow dr = dt.NewRow();
for (column = 1; column <= range.Columns.Count; column++)
{
dr[column - 1] = Convert.ToString((range.Cells[row, column] as Excel.Range).Value);
}
dt.Rows.Add(dr);
dt.AcceptChanges();
numMuros = dt.Rows.Count;
}
workbook.Close(true, Missing.Value, Missing.Value);
excelApp.Quit();
return dt.DefaultView;
}
}
}
Below, as commented, is an example of removing the extra “empty” rows from the DataTable.
There are a couple of ways to approach this. One is to clean the Excel file of the extra rows, since Excel’s UsedRange property has a nasty habit of flagging rows that have no apparent data as NOT empty. This may be caused by formatting or other issues. I have a solution for that if you want to go down that rabbit hole: Fastest method to remove Empty rows and Columns From Excel Files using Interop
However, this solution was heavily based on LARGE Excel files with many rows and columns. If the files are not large, then the solution below should work.
Even though your posted code has some much-needed range checking (more below), using the posted code, I was able to read an Excel file that produced extra “empty” rows at the end. It is these rows we want to remove from the DataTable.
I am sure there are other ways to do this, however, a basic approach would be to simply loop through the DataTable rows, and check each cell… and, if ALL the cells on that row are “empty” then remove that row. This is the approach I used below.
To help get this done quickly, keeping this to one loop through the table is a goal. In other words, we want to loop through the table and remove rows from that SAME table. This will mean that extra care is needed. Obviously a foreach loop through the rows will not work.
However, a simple for loop will work, as long as we start at the bottom and work up. We also need to make sure NOT to rely on dt.Rows.Count inside the for loop through the rows, because the count changes as rows are removed. This is easily avoided by storing the row count in a variable before the loop and using that variable in the for statement. This allows the code to delete the rows from the bottom up without having to worry about getting the row and loop indexes mixed up.
A walkthrough of the code would go like this. First, a bool variable allEmpty is created to indicate whether ALL the cells in a row are “empty.” For each row, we set this variable to true to indicate that the row is empty. Then we loop through each cell of that row and check whether the cell is NOT empty. If at least one of the cells in that row is NOT empty, we set allEmpty to false and break out of the columns loop. After the columns loop is exited, the code simply checks whether that row is empty and, if so, deletes it.
Note the else branch of the last if statement, which checks for the empty row. Since we only want to delete the trailing “empty” rows, we are done as soon as the FIRST non-empty row is found (scanning from the bottom), so the code breaks out of the rows loop.
If you comment out the else portion of the bottom if code, then, the code will remove ALL the empty rows.
bool allEmpty;
int rowCount = dt.Rows.Count - 1;
for (int dtRowIndex = rowCount; dtRowIndex >= 0; dtRowIndex--) {
allEmpty = true;
for (int dtColIndex = 0; dtColIndex < dt.Columns.Count; dtColIndex++) {
if (dt.Rows[dtRowIndex].ItemArray[dtColIndex].ToString() != "") {
allEmpty = false;
break;
}
}
if (allEmpty) {
dt.Rows.RemoveAt(dtRowIndex);
}
else {
break;
}
}
Eyebrow raiser for the posted code…
The current posted code makes some dangerous assumptions about what is returned from UsedRange and about the dt column indexes. For example, the code starts by grabbing the worksheet's UsedRange.
range = worksheet.UsedRange;
We obviously NEED this info, however, at this point in the code, we have NO clue how many rows or columns have been returned. Therefore, when the code gets to the second for loop through the columns... The code uses this column index as an index into the data row dr…
dr[column - 1] = …
Since the data table dt only has 6 columns, this is a risky assignment without checking the index range. Since used range grabs the used cells, what if a user added some text into column 7, 8 or ANY cell greater than 6, then this code will crash and burn. The code MUST check the number of columns returned from UsedRange to avoid an index out of range exception.
There are a couple of ways you could fix this. One would be to set the column loop's ending condition to the number of columns in the data table. Unfortunately, this still requires checking the number of columns returned by the used range, since it may return fewer columns than the data table has, and the code would then crash on the same line as above, only on the right side of the “=” assignment.
= Convert.ToString((range.Cells[row, column] as Excel.Range).Value);
In both cases it is clear your code needs to check these ranges BEFORE you start the looping through the used range.
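One way to express that check is to compute, once, the number of columns that is safe to copy: the smaller of the used range's column count and the data table's column count. The sketch below shows the idea without Excel. `ColumnGuard`, `SafeColumnCount` and `CopyRow` are hypothetical names, and the `object[]` argument stands in for one row of cell values already read from the used range.

```csharp
using System;
using System.Data;

public static class ColumnGuard
{
    // Number of columns that can be copied without running past either
    // the used range or the DataTable.
    public static int SafeColumnCount(int usedRangeColumns, DataTable table)
    {
        if (table == null) throw new ArgumentNullException(nameof(table));
        return Math.Max(0, Math.Min(usedRangeColumns, table.Columns.Count));
    }

    // Copies one row of cell values (a 0-based array in this sketch) into a
    // new DataRow, ignoring any source columns the table has no room for.
    public static DataRow CopyRow(object[] sourceCells, DataTable table)
    {
        if (sourceCells == null) throw new ArgumentNullException(nameof(sourceCells));

        int count = SafeColumnCount(sourceCells.Length, table);
        DataRow row = table.NewRow();
        for (int i = 0; i < count; i++)
            row[i] = Convert.ToString(sourceCells[i]);

        table.Rows.Add(row);
        return row;
    }
}
```

With a six-column table, a used range that reports eight columns would copy only the first six, and one that reports four would leave the last two table columns empty instead of throwing.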
Lastly, if you must use Excel Interop, which is usually a last option case, then you need to minimize the possibility of leaking the COM objects (leaking resources), such that when something goes wrong your code still releases the COM objects the code creates. When using Interop, I suggest you wrap all the Excel code in a try/catch/finally statement. In the try portion you have the code. And the Finally portion is where you close the excel workbook, quit the excel application and release the COM objects.
You will need to decide what to do in the catch portion of code. A simple message box displayed to the user may suffice to tell the user there was an error, the user clicks OK, and the code executes the finally code. Point being, that you want to display something instead of simply swallowing the error.
This approach may look something like…
Microsoft.Office.Interop.Excel.Application ExcelApp = null;
Microsoft.Office.Interop.Excel.Workbook Workbook = null;
Microsoft.Office.Interop.Excel.Worksheet Worksheet = null;
try {
// code that works with excel interop
}
catch (Exception e) {
MessageBox.Show("Error Excel: " + e.Message);
}
finally {
if (Worksheet != null) {
Marshal.ReleaseComObject(Worksheet);
}
if (Workbook != null) {
//Workbook.Save();
Workbook.Close();
Marshal.ReleaseComObject(Workbook);
}
if (ExcelApp != null) {
ExcelApp.Quit();
Marshal.ReleaseComObject(ExcelApp);
}
}
I hope this makes sense and helps.
Alright so I'm trying to receive the last row of an Excel sheet. My code to do so looked like this
public static int FindLastRow(Worksheet sheet)
{
int lastRow;
try
{
lastRow = sheet.Cells.Find("*", System.Reflection.Missing.Value, System.Reflection.Missing.Value, System.Reflection.Missing.Value, Xcl.XlSearchOrder.xlByColumns, Xcl.XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Row;
}
catch (Exception)
{
lastRow = 1;
}
return lastRow;
}
but it isn't working as needed because it's returning the wrong row. Basically it returns the row of the widest column instead of the row# of last row.
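That means this one returns 7:
[Image: worksheet for which the function returns 7]
And this one returns 10:
[Image: worksheet for which the function returns 10]
As you can see, it does not return the last row but rather the row of the widest column. How do I return the last row of a worksheet in C#?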
I'm quite new to handling huge data sets and I'm using C# for this. The data that I'm handling (a CSV) has 19 columns and 9,831 rows. When it comes to writing the data into an existing Excel file, the program takes 6 minutes to accomplish its task. I'm looking for suggestions or tips that can reduce the time to seconds. Here's my class for writing it to an Excel file:
using System;
using System.Data;
using Excel = Microsoft.Office.Interop.Excel;
namespace Project
{
class WriteCsv
{
public WriteCsv(DataTable dt)
{
//sets the existing excel file to be written
Microsoft.Office.Interop.Excel.Application excel = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook sheet = excel.Workbooks.Open(@"path to excel file");
Microsoft.Office.Interop.Excel.Worksheet x = excel.ActiveSheet as Microsoft.Office.Interop.Excel.Worksheet;
//selects a specific worksheet to written on
x = (Excel.Worksheet)sheet.Sheets[2];
int rowCount = 1;
int dataColumns = dt.Columns.Count;
//this is where the writing starts
foreach (DataRow dr in dt.Rows)
{
int columnCount = 0;
while (columnCount < dataColumns)
{
x.Cells[rowCount, columnCount + 1] = dr[columnCount];
columnCount++;
}
Console.WriteLine("=====================ROW COMPLETED " + rowCount + "========================");
rowCount++;
}
sheet.Close(true, Type.Missing, Type.Missing);
excel.Quit();
}
}
}
I've dealt with this a few ways in the past. Especially when consuming a DataReader from a SQL source which is always a few hops, skips, and jumps from playing nice and fast with Excel.
Excel really likes 2-dim arrays though. What I've done with the DataTable in the past is converted it to a 2-dim array and then just dump that array into the spreadsheet all at once. You are still iterating through every row/column in the DataTable, but C# is fast about that.
string[,] data = new string[dt.Rows.Count, dt.Columns.Count];
int i = 0;
foreach (DataRow row in dt.Rows)
{
int j = 0;
foreach (DataColumn col in dt.Columns)
{
data[i,j++] = row[col].ToString();
}
i++;
}
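// Dump the whole array into a range of matching size in one call
// (x is the worksheet from the question, so a target range is needed)
Excel.Range target = x.Range[x.Cells[1, 1], x.Cells[dt.Rows.Count, dt.Columns.Count]];
target.Value = data;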
There's some other crafty ways of doing this by writing to a file with comma delimited rows (although I would use tab to make it more excel friendly), then opening the file, but that seems even more cumbersome. Check out some interesting answers here
You may also have some luck converting that datatable to an array using Linq, although I haven't tried yet.
Perhaps something like:
x.Value = dt.AsEnumerable().Select(row => row.ItemArray).ToArray()
I'm not convinced that's 100%, but it may be a step in the right direction.
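One caveat with the LINQ expression: `Select(row => row.ItemArray).ToArray()` produces a jagged array (`object[][]`), whereas, as far as I know, a range's value needs a rectangular two-dimensional array. A minimal sketch of a conversion that keeps the rectangular shape is below. `DataTableExport` and `ToRectangular` are hypothetical names, and mapping DBNull to null (so the cell stays empty) is an assumption about the desired behavior.

```csharp
using System;
using System.Data;

public static class DataTableExport
{
    // Builds a rectangular [row, column] array from a DataTable,
    // keeping each value's original type instead of converting to string.
    public static object[,] ToRectangular(DataTable table)
    {
        if (table == null) throw new ArgumentNullException(nameof(table));

        int rows = table.Rows.Count, cols = table.Columns.Count;
        var data = new object[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            DataRow row = table.Rows[r];
            for (int c = 0; c < cols; c++)
            {
                object value = row[c];
                // Assumption: a database null should end up as an empty cell.
                data[r, c] = value == DBNull.Value ? null : value;
            }
        }
        return data;
    }
}
```

The result can then be assigned in one call to a range with the same number of rows and columns. Keeping the values as objects rather than strings should let numbers and dates arrive in Excel as numbers and dates rather than as text.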
I have a data set in Excel and am using C# to open the worksheet and access some of the data.
I am trying to get all the rows that contain data from a particular column. For example in column B starting from cell 'B3' going down I want to store all the rows that contain data in a collection like an Array.
This is what I have so far:
Application excelApplication;
_Workbook workbook;
_Worksheet sheet;
excelApplication = new Excel.Application
{
Visible = true,
ScreenUpdating = true
};
workbook = excelApplication.Workbooks.Open(@"C:\Documents and Settings\user\Desktop\Book1.xls");
sheet = (Worksheet)workbook.Worksheets[2];
Excel.Range range = sheet.Range["b3:b145"];
foreach (Range cell in range)
{
// Do something with rows which contain data
}
As you can see above, I have hard-coded the range from B3 to B145, which I don't want. I want to get all the rows in column B that contain data, starting from B3.
How would I achieve this?
In general, when I get stuck in these situations I record a macro and convert the VBA code to C#. The object model in VSTO is pretty much exactly the same (remember this, it's a great tip), and from .NET 4.0 onwards optional parameters save a lot of code.
In your particular instance I envisage the larger the spreadsheet the longer it will take to read all the Excel cells in column B using VSTO. My advice is to use this technique to read them all at once:
//Work out the number of rows with data in column B:
//int lastColumn = range.Columns.Count;
int lastRow = range.Rows.Count;
//Get all the column values:
object[,] objectArray = shtName.get_Range("B3:B" + lastRow.ToString()).Value2;
rngName.Value2 = objectArray;
I am looking to programmatically pull data from an Excel worksheet and insert it into a database table.
How do I determine the number of columns and rows in a worksheet or otherwise iterate the rows?
I have
Excel._Worksheet worksheet = (Excel._Worksheet)workbook.ActiveSheet;
I tried worksheet.Range.Rows.Count
which tosses up
Indexed property 'Microsoft.Office.Interop.Excel._Worksheet.Range' has
non-optional arguments which must be provided
What needs to be done?
using Excel = Microsoft.Office.Interop.Excel;
...
public void IterateRows(Excel.Worksheet worksheet)
{
//Get the used Range
Excel.Range usedRange = worksheet.UsedRange;
//Iterate the rows in the used range
foreach(Excel.Range row in usedRange.Rows)
{
//Do something with the row.
//Ex. Iterate through the row's data and put in a string array
String[] rowData = new String[row.Columns.Count];
for(int i = 0; i < row.Columns.Count; i++)
rowData[i] =Convert.ToString(row.Cells[1, i + 1].Value2);
}
}
This compiles and runs just great for me! I'm using it to extract rows with missing fields to an error log.
I presume you are actually looking for the last used row. In that case you need to write it like this:
Range UsedRange = worksheet.UsedRange;
int lastUsedRow = UsedRange.Row + UsedRange.Rows.Count - 1;