Interop Excel .Find() whole match not finding value, partial match does

I have a spreadsheet in which I'm trying to find the index of certain column headers.
I've found that XlLookAt.xlWhole does not find the value, but XlLookAt.xlPart does.
I cannot use xlPart because it returns the wrong match in some instances.
I have confirmed that the cell in the actual spreadsheet contains only AMT_ISSUED, with no whitespace on either end.
Does anyone know why XlLookAt.xlWhole doesn't work? Here's the code I'm using:
List<int> columnNumbers = new List<int>();
object misValue = System.Reflection.Missing.Value;
var columnIndex = range.EntireRow.Find("AMT_ISSUED",
misValue, XlFindLookIn.xlValues, XlLookAt.xlWhole,
XlSearchOrder.xlByColumns, XlSearchDirection.xlNext,
false);
var index = columnIndex?.Column ?? 0;
columnNumbers.Add(index);
UPDATE:
I have even done this:
var value = ((Range)range.Cells[1, 4]).Value2.ToString();
var columnIndex = range.EntireRow.Find(value,
misValue, XlFindLookIn.xlValues, XlLookAt.xlWhole,
XlSearchOrder.xlByColumns, XlSearchDirection.xlNext,
false);
var index = columnIndex?.Column ?? 0;
columnNumbers.Add(index);
value holds the correct text, but columnIndex is still null.
WORKAROUND (I don't like it but it will get me past this hurdle)
Note: the headings could be in row 1 or 2, if it is in row 2 then row one has the first cell populated
for (int i = 1; i < 3; i++)
{
for (var h = 1; h <= colCount; h++)
{
object cellValue = ((Range)range.Cells[i, h]).Value2;
if ((h== 1 || h==2) && cellValue == null)
{
break;
}
if (columns.Contains(cellValue))
{
columnNumbers.Add(h);
}
}
}

The Range.Find method returns a Range object. So first obtain the Range and then, if it's not null, retrieve the column index:
// Create Excel instance
Excel.Application excel = new Excel.Application { Visible = true };
Excel.Workbook book = excel.Workbooks.Open(@"PATH_TO_FILE");
Excel.Worksheet sheet = book.Sheets[1] as Excel.Worksheet;
// Search in the first row
Excel.Range header = sheet.Range["1:1"].Find("AMT_ISSUED", LookAt: Excel.XlLookAt.xlWhole);
if (header != null)
{
// Header is found
int index = header.Column;
}
else
{
// Header is not found
}
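If Find keeps returning null, an alternative in the spirit of the question's workaround is to read the header row once and do the exact comparison in memory. The sketch below is only an illustration under assumptions: it takes a 1-based object[,] of the kind Range.Value2 returns for a multi-cell range, and the names HeaderLookup, FindExact and OneBasedRow are hypothetical helpers, not part of the interop API.

```csharp
using System;

public static class HeaderLookup
{
    // Returns the 1-based position of the first cell in the given row whose text
    // equals `header` exactly (after trimming), or 0 when there is no such cell.
    // `row` is an index into `values`, which is assumed to be 1-based like the
    // array that Range.Value2 returns for a multi-cell range.
    public static int FindExact(object[,] values, int row, string header)
    {
        int firstCol = values.GetLowerBound(1);
        int lastCol = values.GetUpperBound(1);
        for (int col = firstCol; col <= lastCol; col++)
        {
            string text = values[row, col]?.ToString().Trim();
            if (string.Equals(text, header, StringComparison.Ordinal))
            {
                return col - firstCol + 1;
            }
        }
        return 0;
    }

    // Hypothetical helper: builds a 1-based, single-row array that mimics
    // the shape of the array interop hands back.
    public static object[,] OneBasedRow(params object[] cells)
    {
        var values = (object[,])Array.CreateInstance(
            typeof(object), new[] { 1, cells.Length }, new[] { 1, 1 });
        for (int i = 0; i < cells.Length; i++)
        {
            values[1, i + 1] = cells[i];
        }
        return values;
    }
}
```

With real data the array would come from something like the header row's Value2, and the returned position is relative to the first column of that range. Because the comparison is exact, a header such as AMT_ISSUED_TOTAL does not match AMT_ISSUED.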

Related

Get Merged Cell Area with EPPLus

I'm using EPPlus to read excel files.
I have a single cell that is part of merged cells. How do I get the merged range that this cell is part of?
For example:
Assume Range ("A1:C1") has been merged.
Given Range "B1", its Merge property will be true, but there isn't a way to get the merged range from a single cell.
How do you get the merged range?
I was hoping for a .MergedRange which would return Range("A1:C1")

There is no such property out of the box but the worksheet has a MergedCells property with an array of all the merged cell addresses in the worksheet and a GetMergeCellId() method which will give you the index for a given cell address.
We can therefore combine these into a little extension method you can use to get the address. Something like this:
public static string GetMergedRangeAddress(this ExcelRange @this)
{
    if (@this.Merge)
    {
        var idx = @this.Worksheet.GetMergeCellId(@this.Start.Row, @this.Start.Column);
        return @this.Worksheet.MergedCells[idx - 1]; // the array is 0-indexed but the merge ID is 1-indexed
    }
    else
    {
        return @this.Address;
    }
}
which you can use as follows:
using (var excel = new ExcelPackage(new FileInfo("inputFile.xlsx")))
{
var ws = excel.Workbook.Worksheets["sheet1"];
var b3address = ws.Cells["B3"].GetMergedRangeAddress();
}
(Note that in the event that you use this method on a multi-celled range it will return the merged cell address for the first cell in the range only)

You can get all merged cells from worksheet, hence
you can find the merged range a specific cell belongs to using the following:
public string GetMergedRange(ExcelWorksheet worksheet, string cellAddress)
{
ExcelWorksheet.MergeCellsCollection mergedCells = worksheet.MergedCells;
foreach (var merged in mergedCells)
{
ExcelRange range = worksheet.Cells[merged];
ExcelCellAddress cell = new ExcelCellAddress(cellAddress);
if (range.Start.Row<=cell.Row && range.Start.Column <= cell.Column)
{
if (range.End.Row >= cell.Row && range.End.Column >= cell.Column)
{
return merged.ToString();
}
}
}
return "";
}
Update:
It turns out there is a much easier way using EPPlus; just do the following:
var mergedAddress = worksheet.MergedCells[row, column];
For example, if B1 is in a merged range "A1:C1":
var mergedAddress = worksheet.MergedCells[1, 2]; // value of mergedAddress will be "A1:C1"
2 is the column number because B is the 2nd column.
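The bounds check that the GetMergedRange loop performs can also be looked at in isolation, without EPPlus. The following is a minimal sketch that assumes simple A1-style addresses with no sheet names or $ signs; MergedAddress, ColumnNumber, ParseCell and Contains are hypothetical names used only for illustration.

```csharp
using System;

public static class MergedAddress
{
    // Converts column letters such as "B" or "AA" to a 1-based column number.
    public static int ColumnNumber(string letters)
    {
        int n = 0;
        foreach (char c in letters.ToUpperInvariant())
        {
            n = n * 26 + (c - 'A' + 1);
        }
        return n;
    }

    // Splits a simple A1 reference such as "B12" into (row, column).
    public static (int Row, int Col) ParseCell(string cell)
    {
        int i = 0;
        while (i < cell.Length && char.IsLetter(cell[i])) i++;
        return (int.Parse(cell.Substring(i)), ColumnNumber(cell.Substring(0, i)));
    }

    // True when `cell` lies inside `range`, for example "B1" inside "A1:C1".
    public static bool Contains(string range, string cell)
    {
        string[] parts = range.Split(':');
        var start = ParseCell(parts[0]);
        var end = ParseCell(parts.Length > 1 ? parts[1] : parts[0]);
        var target = ParseCell(cell);
        return start.Row <= target.Row && target.Row <= end.Row
            && start.Col <= target.Col && target.Col <= end.Col;
    }
}
```

Looping over the worksheet's merged addresses and returning the first one for which Contains is true gives the same result as the nested if statements in GetMergedRange.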

This will provide you exact width of merged cells:
workSheet.Cells[workSheet.MergedCells[row, col]].Columns

Not a direct answer, as Stewart's answer is perfect, but I was led here looking for a way to get the value of a cell whether or not it is part of a larger merged cell, so I improved on Stewart's code:
public static string GetVal(this ExcelRange @this)
{
    if (@this.Merge)
    {
        var idx = @this.Worksheet.GetMergeCellId(@this.Start.Row, @this.Start.Column);
        string mergedCellAddress = @this.Worksheet.MergedCells[idx - 1];
        string firstCellAddress = @this.Worksheet.Cells[mergedCellAddress].Start.Address;
        return @this.Worksheet.Cells[firstCellAddress].Value?.ToString()?.Trim() ?? "";
    }
    else
    {
        return @this.Value?.ToString()?.Trim() ?? "";
    }
}
And call it like this
var worksheet = package.Workbook.Worksheets[i];
var rowCount = worksheet.Dimension.Rows;
var columnCount = worksheet.Dimension.Columns;
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= columnCount; col++)
{
string val = worksheet.Cells[row, col].GetVal();
}
}

How to make a foreach loop with 3 or more excel ranges using C#

After searching a lot and not finding anything I could adapt to my needs, I figured I'd try asking.
Basically, what I need is a loop that goes through my entire range (UsedRange)
and checks whether, on the same row, the cells from 3 or more columns are empty (for example A10, B10, C10); if so, the entire row should be removed.
One of my attempts as follows :
xl.Range emptyCellsDel = MySheet.UsedRange.SpecialCells(xl.XlCellType.xlCellTypeBlanks);
xl.Range myRange = emptyCellsDel.Range["A:C"];
const int aCol = 1; const int bCol = 2; const int cCol = 3;
for (int i = 1; i < MySheet.UsedRange.Rows.Count; i++)
{
if ((MySheet.Cells[i, aCol].Value ?? "").ToString() == "" &&
(MySheet.Cells[i, bCol].Value ?? "").ToString() == "" &&
(MySheet.Cells[i, cCol].Value ?? "").ToString() == "")
{
myRange.EntireRow.Delete();
}
}
Any sort of Idea / suggestion would be awesome and much appreciated.
Thank you in advance !

As SmStroble's answer indicates, and as I have mentioned before, if you start at the top of the Excel file and delete empty rows as you go, you will run into an indexing issue: deleting a row shifts the rows below it up, which throws off the row index used by the loop (the variable i in the for loop you posted). This approach will miss empty rows and may delete non-empty ones.
SmStroble's answer is the correct way to fix this, but there are a couple of issues. First, as commented, we want to delete rows from the bottom up, so SmStroble's solution loops through the rows, finds the row indexes of the empty rows and places them in a List<int>. The only catch is that a foreach loop over that list starts with the first item added, not the last.
The second issue you may run into is execution time. Looping through Excel rows can be expensive, for example when MySheet.UsedRange is evaluated in the condition of a for loop:
for (int i = 1; i < MySheet.UsedRange.Rows.Count; i++) {…
This line is expensive because the loop condition is re-evaluated on every iteration. If the sheet has a lot of rows, this can take a significant amount of time. There are other solutions, but a simple one is to store the row count in an int variable and use that in the for loop, as below:
int totalRows = MySheet.UsedRange.Rows.Count;
for (int i = 1; i < totalRows; i++) {
This small change can speed up execution significantly, depending on the number of rows in the spreadsheet. Using a test sheet with 730+ rows, it took 2447 milliseconds with UsedRange inside the for loop and 1118 milliseconds with UsedRange outside it. Hope this helps.
string filePath = @"C:\YourPathToExcelFile\YourExcelFile.xls";
Microsoft.Office.Interop.Excel.Application ExcelApp = new Microsoft.Office.Interop.Excel.Application();
ExcelApp.Visible = true;
Workbooks wbs = ExcelApp.Workbooks;
Workbook xlWorkbook = wbs.Open(filePath, 0, false, 5, "", "", false, XlPlatform.xlWindows, "", true, false, 0, true, false, false);
Excel._Worksheet MySheet = xlWorkbook.Sheets[1];
const int aCol = 1;
const int bCol = 2;
const int cCol = 3;
List<int> rowsToDelete = new List<int>();
int totalRows = MySheet.UsedRange.Rows.Count;
for (int i = totalRows; i > 0; i--) {
if ((MySheet.Cells[i, aCol].Value ?? "").ToString() == "" &&
(MySheet.Cells[i, bCol].Value ?? "").ToString() == "" &&
(MySheet.Cells[i, cCol].Value ?? "").ToString() == "") {
rowsToDelete.Add(i);
}
}
foreach (int row in rowsToDelete) {
((Range)(MySheet.Rows[row])).Delete(XlDirection.xlUp);
}
xlWorkbook.Save();
xlWorkbook.Close();
ExcelApp.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkbook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(ExcelApp);
Console.WriteLine("Finished processing! Press any key to exit");
Console.ReadKey();
EDIT: changed the code above to loop from the bottom of the excel file to the top. This avoids having to reverse the rowsToDelete list.
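The index-shifting problem described above is not specific to Excel, and it can be reproduced with a plain list. The sketch below uses a List<string> as a stand-in for worksheet rows (an assumption made purely for illustration; RowDeletion and its methods are hypothetical names) and contrasts a top-down pass with a bottom-up pass.

```csharp
using System;
using System.Collections.Generic;

public static class RowDeletion
{
    // Removes empty strings while walking forward. The index keeps advancing after
    // a removal, so the element that shifted into the current slot is never checked.
    public static List<string> DeleteTopDown(IEnumerable<string> rows)
    {
        var list = new List<string>(rows);
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] == "") list.RemoveAt(i);
        }
        return list;
    }

    // Removes empty strings while walking backward. A removal only shifts elements
    // that have already been visited, so nothing is skipped.
    public static List<string> DeleteBottomUp(IEnumerable<string> rows)
    {
        var list = new List<string>(rows);
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i] == "") list.RemoveAt(i);
        }
        return list;
    }
}
```

For the input "a", "", "", "b", the top-down pass leaves one empty entry behind, while the bottom-up pass removes both.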

You need to make note of the rows you want to delete and remove them outside of the for loop so that you are not modifying the range accessed by the for loop.
Also, if you are shifting the rows up you also need to remove them last row first so the row indexes do not get messed up. In the example below I have done this by reversing the order of the for loop and storing the row numbers in a list which will enforce the ordering.
xl.Range myRange = emptyCellsDel.Range["A:C"];
const int aCol = 1; const int bCol = 2; const int cCol = 3;
List<int> rowsToDelete = new List<int>();
int rowCount = MySheet.UsedRange.Rows.Count;
for (int i = rowCount; i >= 1; i--)
{
if ((MySheet.Cells[i, aCol].Value2 ?? "").ToString() == "" &&
(MySheet.Cells[i, bCol].Value2 ?? "").ToString() == "" &&
(MySheet.Cells[i, cCol].Value2 ?? "").ToString() == "")
{
rowsToDelete.Add(i);
}
}
foreach (int row in rowsToDelete) { ((Range)(MySheet.Rows[row])).Delete(XlDirection.xlUp); }

find the last used row in a column in windows c#

I have an Excel report and I need to draw charts based on the data in it. I am able to get the range from a particular column to the last filled row, as shown below. The report has many columns and I need only the data in a particular column, such as ("c1", "c12"). The column length may vary; it need not be 12. How can I get the range up to the last filled row of a column?
Excel.Range last1 = xlWorkSheet2.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
oRange = xlWorkSheet2.get_Range("A6", last1);

Try the following code. It works by selecting the top cell in a column and then searching downwards until the end of the range is found. The range column is simply the range between start and end. Note that this will only find the last contiguous cell in the range, and will not search past blank rows.
Excel.Range start = xlWorkSheet2.Range["A1"];
Excel.Range column;
if (start.Offset[1].Value != null)
column = xlWorkSheet2.Range[start, start.End[Excel.XlDirection.xlDown]];
else
column = start;
The following code will allow you to retrieve the full used range of the column even if there are blank rows. This code works in a similar manner, but searches upwards from the bottom of the used range in the worksheet to find the last cell in the column containing a value.
Excel.Range start = xlWorkSheet2.Range["A1"];
Excel.Range bottom = xlWorkSheet2.Range["A" + (xlWorkSheet2.UsedRange.Rows.Count + 1)];
Excel.Range end = bottom.End[Excel.XlDirection.xlUp];
Excel.Range column = xlWorkSheet2.Range[start, end];
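The difference between the two approaches (stop at the first blank versus search up from the bottom) can be seen on a plain array. This is only a sketch: it assumes the column's values have already been read into an object[] and treats null or whitespace-only entries as empty; ColumnScan, LastFilled and ContiguousFilled are hypothetical names.

```csharp
using System;

public static class ColumnScan
{
    // 1-based position of the last non-empty entry, scanning up from the bottom.
    // Returns 0 when every entry is empty. Gaps in the middle are skipped over.
    public static int LastFilled(object[] column)
    {
        for (int i = column.Length - 1; i >= 0; i--)
        {
            if (!IsEmpty(column[i])) return i + 1;
        }
        return 0;
    }

    // Number of non-empty entries before the first blank, scanning down from the top.
    public static int ContiguousFilled(object[] column)
    {
        int count = 0;
        while (count < column.Length && !IsEmpty(column[count])) count++;
        return count;
    }

    private static bool IsEmpty(object value) =>
        value == null || value.ToString().Trim().Length == 0;
}
```

For a column holding a header, a value, a blank, another value and then trailing blanks, ContiguousFilled stops at the first blank while LastFilled reports the position of the final value.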

I found that none of the above methods did what I wanted, so here is my solution:
public object GetLastNotEmptyRowOfColumn(string sheet, string column,int startRow,int endRow)
{
try
{
var validColumn = Regex.IsMatch(column, @"^[a-zA-Z]+$");
if(!validColumn)
{
throw new Exception($"column can only contain letters. value entered : {column}");
}
xlBook = xlApp.ActiveWorkbook;
xlSheet = xlBook.Sheets[sheet];
xlRange = xlSheet.Range[$"{column}{startRow}", $"{column}{endRow}"];
object[,] returnVal = xlRange.Value;
var rows = returnVal.GetLength(0);
// var cols = returnVal.GetLength(1);
int count = 1;
for (int r = 1; r <= rows; r++)
{
var row = returnVal[r, 1];
if (row == null) break;
count++;
}
//returns an object : {Count:10,Cell:A9}
return new { Count=count-1, Cell=$"{column}{startRow+count-1}" };
}
catch (Exception ex)
{
......
}
return null;
}
Usage: var response = GetLastNotEmptyRowOfColumn("Sheet1", "A",1,100);

how to get the cell column of a range

I'm iterating over a row, which I got from a range, to find a specific word in the cell content, and then I want to get the column where I found it. For example, if I find the desired content at the 19th position, the Excel column is "S".
Here is the code I'm using so far:
Excel.Worksheet xlWorkSheet = GetWorkSheet(currentWorkBook, "sheet");
var row = xlWorkSheet.Rows["5"];
int rowLength = xlWorkSheet.UsedRange.Columns.Count;
Excel.Range currentTitle = row[1]; //in order to iterate only over the 5th row in this example
for (int i = 1; i < rowLength; i++)
{
string title = currentTitle.Value2[1,i];
if (title == null)
{
continue;
}
if (title.Contains(wordToSearch))
{
string column = THIS IS THE QUESTION - WHAT DO I NEED TO WRITE HERE?
Excel.Range valueCell = xlWorkSheet.Range[column + "5"];
return valueCell.Value2;
}
Notice the line that assigns string column, which is where I need to add the code.

Since you don't want to rely on any calculation, the other option I see is extracting the column letter from the Address property, as in the code below:
Excel.Range currentRange = (Excel.Range)currentTitle.Cells[1, i];
string columnLetter = currentRange.get_AddressLocal(true, false, Excel.XlReferenceStyle.xlA1, missing, missing).Split('$')[0];
string title = null;
if (currentRange.Value2 != null)
{
title = currentRange.Value2.ToString();
}
As you can see, I am forcing a "$" to appear in order to ease the column letter retrieval.
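For comparison, the calculation-based route that this answer avoids is short as well. The sketch below converts a 1-based column number to its letter form; ColumnName and FromIndex are hypothetical names, and the code has no dependency on interop.

```csharp
using System;
using System.Text;

public static class ColumnName
{
    // Converts a 1-based column number to Excel column letters,
    // treating the letters as a base-26 system without a zero digit.
    public static string FromIndex(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException(nameof(column));
        var letters = new StringBuilder();
        while (column > 0)
        {
            column--; // shift to 0-based so that 26 maps to 'Z' rather than rolling over
            letters.Insert(0, (char)('A' + column % 26));
            column /= 26;
        }
        return letters.ToString();
    }
}
```

In the question's loop, ColumnName.FromIndex(i) would give the letter for the i-th column, so position 19 yields "S".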

Delete Empty Rows with Excel Interop

I have user supplied excel files that need to be converted to PDF. Using excel interop, I can do this fine with .ExportAsFixedFormat(). My problem comes up when a workbook has millions of rows. This turns into a file that has 50k+ pages. That would be fine if the workbook had content in all of those rows. Every time one of these files shows up though, there are maybe 50 rows that have content and the rest are blank. How can I go about removing the empty rows so I can export it to a decent sized PDF?
I've tried starting at the last row and, one by one, using CountA to check whether the row has content and deleting it if it does not. Not only does this take forever, it also seems to fail after about 100k rows with the following error:
Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.
I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues) but that includes a row if any cell has formatting (like a bg color).
I've tried using Worksheet.UsedRange and then deleting everything after that but UsedRange has the same problem as point two.
This is the code I've tried:
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
currentRowIndex = rows.Count;
bool contentFound = false;
while (!contentFound && currentRowIndex > 0)
{
currentRow = rows[currentRowIndex];
if (Application.WorksheetFunction.CountA(currentRow) == 0)
{
currentRow.Delete();
}
else
{
contentFound = true;
}
Marshal.FinalReleaseComObject(currentRow);
currentRowIndex--;
}
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
int startRow = lastCell.Row;
Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
range.Delete();
Marshal.FinalReleaseComObject(range);
Marshal.FinalReleaseComObject(lastCell);
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
Do I have a problem with my code, is this an interop problem or maybe it's just a limitation on what Excel can do? Is there a better way to do what I'm attempting?

I would suggest getting the count of rows that contain values, using CountA (as you tried in point 1). Then copy those rows into a new sheet and export from there. It is easier to copy a few rows to a new sheet and work on that than to delete a huge number of rows from the source sheet.
For creating new sheet and copying rows you can use the following code:
excel.Worksheet tempSheet = workbook.Worksheets.Add();
tempSheet.Name = sheetName;
workbook.Save();
//create a new method for copy new rows
//as the rowindex you can pass the total no of rows you have found out using CountA
public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
{
excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
excel.Range source = (excel.Range)sourceSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
excel.Range dest = (excel.Range)destSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
source.Copy(dest);
excel.Range newRow = (excel.Range)destSheet.Rows[rowIndex+1];
newRow.Insert();
workbook.Save();
}

Have you tried Sheet1.Range("A1").CurrentRegion.ExportAsFixedFormat() where Sheet1 is a valid sheet name and "A1" is a cell you can test to ensure it is located in the range you want to export?
The question remains, why does Excel think there is data in those "empty" cells? Formatting? A pre-existing print area that needs to be cleared? I know I've encountered situations like that before, those are the only possibilities that come to mind at this moment.
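If formatting is what inflates the used range, one option is to decide the export area from cell values alone. The following is a rough sketch under assumptions: it expects the values of a bounded range to have been read into an object[,] already (reading millions of rows this way would itself be slow), and ContentBounds and Find are hypothetical names.

```csharp
using System;

public static class ContentBounds
{
    // Returns the 1-based position (relative to the array's first row and column)
    // of the last row and last column that hold a non-blank value, or (0, 0) when
    // the whole array is blank. Works for both 0-based and 1-based arrays.
    public static (int LastRow, int LastCol) Find(object[,] values)
    {
        int rowBase = values.GetLowerBound(0);
        int colBase = values.GetLowerBound(1);
        int lastRow = 0, lastCol = 0;
        for (int r = rowBase; r <= values.GetUpperBound(0); r++)
        {
            for (int c = colBase; c <= values.GetUpperBound(1); c++)
            {
                object value = values[r, c];
                if (value == null || value.ToString().Trim().Length == 0) continue;
                lastRow = Math.Max(lastRow, r - rowBase + 1);
                lastCol = Math.Max(lastCol, c - colBase + 1);
            }
        }
        return (lastRow, lastCol);
    }
}
```

The resulting row and column could then be used to build the range that is exported, instead of relying on UsedRange or xlCellTypeLastCell.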

Try these steps:
1. Copy Worksheet.UsedRange to a separate sheet (sheet2).
2. Use Paste Special so that formatting is retained.
3. Try parsing sheet2 for unused rows.
If this doesn't help, try repeating step 2 with the formatting cleared and then parse sheet2 again. You can always copy the format information later (if it is simple enough).

If you can first load the Excel file into a DataSet via the OleDBAdapter, it's relatively easy to remove blank rows on the import...
Try this OleDBAdapter Excel QA I posted via stack overflow.
Then export the DataSet to a new Excel file and convert that file to PDF. That may be a big "IF", of course, depending on the Excel layout (or lack thereof).

I had to solve this problem today for what might be a subset of your possible cases.
If your spreadsheet meets the following conditions:
All columns with data have header text in line 1.
All rows with data are in sequence until the first BLANK row.
Then, the following code may help:
private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
{
int countCols = CountColsToFirstBlank(excel, sheet);
int countRows = CountRowsToFirstBlank(excel, sheet);
cellData = new string[countCols, countRows];
string datum;
for (int i = 0; i < countCols; i++)
{
for (int j = 0; j < countRows; j++)
{
try
{
if (null != sheet.Cells[j + 1, i + 1].Value)
{
// Cells is indexed [row, column]; i is the column index and j the row index
datum = sheet.Cells[j + 1, i + 1].Value.ToString();
cellData[i, j] = datum;
}
}
catch (Exception ex)
{
lastException = ex;
//Console.WriteLine(String.Format("LoadCellData [{1}, {2}] reported an error: [{0}]", ex.Message, i, j));
}
}
}
return cellData;
}
private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
{
int count = 0;
for (int j = 0; j < sheet.UsedRange.Rows.Count; j++)
{
if (IsBlankRow(excel, sheet, j + 1))
break;
count++;
}
return count;
}
private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
{
int count = 0;
for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
{
if (IsBlankCol(excel, sheet, i + 1))
break;
count++;
}
return count;
}
private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
{
for (int i = 0; i < sheet.UsedRange.Rows.Count; i++)
{
if (null != sheet.Cells[i + 1, col].Value)
{
return false;
}
}
return true;
}
private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
{
for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
{
if (null != sheet.Cells[row, i + 1].Value) // Cells is indexed [row, column]
{
return false;
}
}
return true;
}

Can you try with below code :
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
{
//Assume the first row is the header. Then use the column match ups by name to determine the index.
//This will allow you to have the order of the header.Keys change without any affect.
var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
// check if the row and column cells are empty
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // skip this row
else{
//here read header
if (rowIndex == workSheet.Dimension.Start.Row) // the first row is assumed to be the header
{
//some code
}
else
{
//some code to read body
}
}
}
Hope this helps; let me know if you need an explanation of the code.
Updated:
The for loop below traverses every row of the worksheet, from the first row of its dimension to the last:
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
Here we check whether all cells in the row are empty, using LINQ:
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // if true then skip this row
else
// read headers(assuming it is presented in worksheet)
// else read row wise data
and then do the necessary steps.
Hopefully this is clear now.

I had the same problem and managed to fix it using the CurrentRegion:
var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
sheet.Cells[lastcell.Row - 1, lastcell.Column]]
.CurrentRegion;
filledcells.ExportAsFixedFormat(
and so on. The CurrentRegion is said to expand to the borders where cells are empty, and apparently that means it also shrinks if it contains many empty cells.

Please try the following code:
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
sheet.Columns["A:A"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete();
sheet.Rows["1:1"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete();
Marshal.FinalReleaseComObject(sheet);
}
