Comparing a formula column with a value column in Excel

I have two columns in an Excel sheet. I populate the first column with a formula, and the second column contains plain values. I want to compare these two columns and display True/False in a third column, but when I use an 'if' condition, all I get is False. Here is my code.
Writing the formula into the column:
using (ExcelPackage xlPackage = new ExcelPackage(newFile))
{
    ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets[GetConfigValue("Reconsheet")];
    int totalRows = worksheet.Dimension.End.Row;
    for (int row = startupRow; row <= totalRows; row++)
    {
        // Formula
        string vlookforH = "IF(ISNA(VLOOKUP(C" + row + ",PWA!A:B,2,FALSE)),0,VLOOKUP(C" + row + ",PWA!A:B,2,FALSE))";
        worksheet.Cells[row, 8].Formula = vlookforH;
    }
    xlPackage.Save();
    MessageBox.Show("PWA hours received");
}
Comparing the formula column with the plain-value column:
for (int row = startupRow; row <= totalRows; row++)
{
    if (Convert.ToInt32(worksheet.Cells[row, 18].Value) != 0)
    {
        decimal hvalue = (worksheet.Cells[row, 8].Value) != null ? Convert.ToDecimal(worksheet.Cells[row, 8].Value.ToString()) : 0;
        decimal rvalue = (worksheet.Cells[row, 18].Value) != null ? Convert.ToDecimal(worksheet.Cells[row, 18].Value.ToString()) : 0;
        if (hvalue == rvalue)
        {
            worksheet.Cells[row, 31].Value = "True";
        }
        else
        {
            worksheet.Cells[row, 31].Value = "False";
            Count = Count + 1;
        }
    }
}
While debugging, I noticed that hvalue is always zero because column H contains formulas.
I've tried several approaches but haven't found a solution. Can anyone help? What am I doing wrong?

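You have to call worksheet.Calculate(); after writing the formulas into the cells. EPPlus stores the formula text but does not compute its result until you do. Until then the cell has no calculated value, and your code converts it to 0. Calling it once after your first for loop is enough.
I verified this in a test project.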
EDIT:
If worksheet.Calculate() does not work, you can try xlPackage.Workbook.Calculate();
Here is a link to the documentation: EPPlus Calculate Documentation
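The sketch below isolates the comparison step, which shows why hvalue stays at zero. It is a minimal, EPPlus-free sketch. It assumes cell values arrive as `object`, as `ExcelRange.Value` does, and that a formula cell which has not been calculated reports null. `CellCompare` and its methods are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class CellCompare
{
    // Null (e.g. a formula cell that has not been calculated yet) counts as 0,
    // matching the question's "!= null ? ... : 0" logic. Converting the object
    // directly with the invariant culture (an assumption) avoids a
    // culture-dependent ToString()/parse round trip.
    public static decimal ToDecimalOrZero(object value)
    {
        if (value == null)
            return 0m;
        return Convert.ToDecimal(value, CultureInfo.InvariantCulture);
    }

    // Produces the "True"/"False" text the question writes into column 31.
    public static string Compare(object formulaValue, object plainValue)
    {
        return ToDecimalOrZero(formulaValue) == ToDecimalOrZero(plainValue) ? "True" : "False";
    }
}
```

Before the formulas are calculated, `Compare(null, 8.0)` yields "False" even though the lookup would return 8. Once a calculated number is present, the same call with that number matches.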

Related

Get Merged Cell Area with EPPlus

I'm using EPPlus to read excel files.
I have a single cell that is part of a merged range. How do I get the merged range that this cell belongs to?
For example:
Assume Range("A1:C1") has been merged.
Given Range("B1"), its Merge property will be true, but there is no way to get the merged range from that single cell.
How do you get the merged range?
I was hoping for a .MergedRange property that would return Range("A1:C1").

There is no such property out of the box. However, the worksheet has a MergedCells property that holds the addresses of all merged ranges in the worksheet, and a GetMergeCellId() method that gives you the index for a given cell.
We can therefore combine these into a small extension method that returns the address. Something like this:
// Extension methods must be declared inside a static class.
public static string GetMergedRangeAddress(this ExcelRange @this)
{
    if (@this.Merge)
    {
        var idx = @this.Worksheet.GetMergeCellId(@this.Start.Row, @this.Start.Column);
        return @this.Worksheet.MergedCells[idx - 1]; // the array is 0-indexed but the merge id is 1-indexed
    }
    else
    {
        return @this.Address;
    }
}
which you can use as follows:
using (var excel = new ExcelPackage(new FileInfo("inputFile.xlsx")))
{
    var ws = excel.Workbook.Worksheets["sheet1"];
    var b3address = ws.Cells["B3"].GetMergedRangeAddress();
}
(Note that if you call this method on a multi-cell range, it returns the merged-range address for the first cell in the range only.)

You can get all merged ranges from the worksheet. From those you can find the merged range that a specific cell belongs to, as follows:
public string GetMergedRange(ExcelWorksheet worksheet, string cellAddress)
{
    ExcelWorksheet.MergeCellsCollection mergedCells = worksheet.MergedCells;
    // Parse the target cell once instead of on every iteration
    ExcelCellAddress cell = new ExcelCellAddress(cellAddress);
    foreach (var merged in mergedCells)
    {
        ExcelRange range = worksheet.Cells[merged];
        // The cell is inside the range if it lies between the range's start and end corners
        if (range.Start.Row <= cell.Row && range.Start.Column <= cell.Column &&
            range.End.Row >= cell.Row && range.End.Column >= cell.Column)
        {
            return merged;
        }
    }
    return "";
}
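The containment test in the loop above compares row and column bounds. The sketch below applies the same idea directly to A1-style address strings, without EPPlus. `MergeLookup`, `ParseCell` and `FindMergedRange` are hypothetical helpers written for illustration. The parser assumes plain references such as "B1" or "AA10", with no sheet names or `$` signs.

```csharp
using System;
using System.Collections.Generic;

public static class MergeLookup
{
    // Converts an A1-style reference such as "B1" or "AA10" into 1-based
    // (row, column) numbers: letters are read as base-26 digits (A = 1).
    public static (int Row, int Col) ParseCell(string address)
    {
        int i = 0, col = 0;
        while (i < address.Length && char.IsLetter(address[i]))
        {
            col = col * 26 + (char.ToUpperInvariant(address[i]) - 'A' + 1);
            i++;
        }
        int row = int.Parse(address.Substring(i));
        return (row, col);
    }

    // Returns the merged range (e.g. "A1:C1") that contains cellAddress,
    // or "" if the cell is not inside any of the given ranges.
    public static string FindMergedRange(IEnumerable<string> mergedRanges, string cellAddress)
    {
        var cell = ParseCell(cellAddress);
        foreach (string range in mergedRanges)
        {
            string[] parts = range.Split(':');
            var start = ParseCell(parts[0]);
            var end = ParseCell(parts[parts.Length - 1]);
            if (start.Row <= cell.Row && cell.Row <= end.Row &&
                start.Col <= cell.Col && cell.Col <= end.Col)
                return range;
        }
        return "";
    }
}
```

For example, with the ranges "A1:C1" and "E5:F8", looking up "B1" returns "A1:C1" and "D1" returns "".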
Update:
It turns out there is a much easier way in EPPlus:
var mergedAddress = worksheet.MergedCells[row, column];
For example, if B1 is in the merged range "A1:C1":
var mergedAddress = worksheet.MergedCells[1, 2]; // mergedAddress will be "A1:C1"
Here 2 is the column number, because B is the second column.

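This gives you the number of columns a merged range spans. It is a column count, not a width in Excel units. Note that MergedCells[row, col] returns null if the cell is not part of a merged range:
workSheet.Cells[workSheet.MergedCells[row, col]].Columns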

This is not a direct answer, since Stewart's answer is perfect. I was led here looking for a way to get the value of a cell whether or not it is part of a larger merged range, so I improved on Stewart's code:
public static string GetVal(this ExcelRange @this)
{
    if (@this.Merge)
    {
        // The value of a merged range is stored in its top-left cell
        var idx = @this.Worksheet.GetMergeCellId(@this.Start.Row, @this.Start.Column);
        string mergedCellAddress = @this.Worksheet.MergedCells[idx - 1];
        string firstCellAddress = @this.Worksheet.Cells[mergedCellAddress].Start.Address;
        return @this.Worksheet.Cells[firstCellAddress].Value?.ToString()?.Trim() ?? "";
    }
    else
    {
        return @this.Value?.ToString()?.Trim() ?? "";
    }
}
And call it like this:
var worksheet = package.Workbook.Worksheets[i];
// Use End.Row/End.Column: Dimension.Rows/Columns count from the first used cell,
// so they undercount when the data does not start in A1
var rowCount = worksheet.Dimension.End.Row;
var columnCount = worksheet.Dimension.End.Column;
for (int row = 1; row <= rowCount; row++)
{
    for (int col = 1; col <= columnCount; col++)
    {
        string val = worksheet.Cells[row, col].GetVal();
    }
}

EPPlus: delete all rows from a specific row

Is it possible to delete all rows that follow a specific (empty) row? I tried a for loop:
for (int rowNum = 1; rowNum <= worksheet.Dimension.End.Row; rowNum++)
{
    var rowCells = from cell in worksheet.Cells
                   where (cell.Start.Row == rowNum)
                   select cell;
    if (!rowCells.Any(cell => cell.Value != null))
    {
        worksheet.DeleteRow(rowNum);
    }
}
but it takes minutes when the Excel file contains millions of empty rows.
EPPlus offers the method worksheet.DeleteRow(int rowFrom, int rows), but I do not know how many trailing empty rows there are.
In my example I need to delete all rows from row 12 onward, but I do not know in advance at which row the empty rows begin.
An alternative approach would be to find the last non-empty row and delete everything after it as one range. That would be faster, but an empty row inside the table causes another problem:
ws.DeleteRow(lastFilledTableRow, workSheet.Dimension.End.Row - tableRowsCount,true);
In my example the problem is the row highlighted in red, but maybe I will tell users that this kind of Excel layout is invalid and sidestep the problem.

I know this is old, but I could not find a solution, so I made my own.
It checks whether the last row is empty. If it is, it deletes that row, and it repeats this until it finds a non-empty row. (Here a row counts as non-empty if at least one of its cells has a value.)
worksheet.TrimLastEmptyRows();

// Extension methods must be declared inside a static class.
public static void TrimLastEmptyRows(this ExcelWorksheet worksheet)
{
    // Dimension becomes null once the sheet has no cells left, so check it first
    while (worksheet.Dimension != null && worksheet.IsLastRowEmpty())
        worksheet.DeleteRow(worksheet.Dimension.End.Row);
}

public static bool IsLastRowEmpty(this ExcelWorksheet worksheet)
{
    int lastRow = worksheet.Dimension.End.Row;
    // The row is empty only if no column up to the last used one has a value
    for (int col = 1; col <= worksheet.Dimension.End.Column; col++)
    {
        if (worksheet.Cells[lastRow, col].Value != null)
            return false;
    }
    return true;
}
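The question's row-by-row loop takes minutes on large sheets. One way to avoid many single-row deletions is to locate the last row that has any content. Everything below it can then be removed with one `DeleteRow(int rowFrom, int rows)` call. The sketch below shows only the search step. It runs on an in-memory `object[,]` grid standing in for the worksheet, which is an assumption, with `null` meaning an empty cell. `TrailingRows` and its methods are hypothetical names.

```csharp
using System;

public static class TrailingRows
{
    // Returns the 1-based number of the last row that has at least one
    // non-null cell, or 0 if every row is empty. grid[r, c] is 0-based.
    public static int LastNonEmptyRow(object[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        // Scan from the bottom so the search stops at the first row with content.
        for (int r = rows - 1; r >= 0; r--)
        {
            for (int c = 0; c < cols; c++)
            {
                if (grid[r, c] != null)
                    return r + 1;
            }
        }
        return 0;
    }

    // Number of trailing empty rows, i.e. the "rows" argument for a single
    // DeleteRow(rowFrom, rows) call that starts at LastNonEmptyRow + 1.
    public static int TrailingEmptyCount(object[,] grid)
    {
        return grid.GetLength(0) - LastNonEmptyRow(grid);
    }
}
```

With EPPlus, the result would feed a call such as `worksheet.DeleteRow(lastRow + 1, worksheet.Dimension.End.Row - lastRow)` when that count is positive. Like the answer above, this treats a row as non-empty if any cell has a value, so empty rows in the middle of the table are not touched.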

The solution above deletes only the empty rows at the end of the sheet. It does not work if the sheet has empty rows somewhere in the middle of the data.
Below is a solution that also finds the empty rows in the middle.
I combined the approach above with mine to delete both the empty rows at the end and the empty rows in the middle of the data.
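private void TrimEmptyRows(ExcelWorksheet worksheet)
{
    if (worksheet.Dimension == null)
        return; // the sheet is empty

    // Capture the bounds before deleting anything, because Dimension changes as rows are removed
    int startRow = worksheet.Dimension.Start.Row;
    int endRow = worksheet.Dimension.End.Row;
    int startColumn = worksheet.Dimension.Start.Column;
    int endColumn = worksheet.Dimension.End.Column;

    // Loop over the rows backwards: DeleteRow shifts the following rows up,
    // so a forward loop would skip the row directly after each deleted one
    for (int i = endRow; i >= startRow; i--)
    {
        bool isRowEmpty = true;
        // Loop over all columns in the row
        for (int j = startColumn; j <= endColumn; j++)
        {
            if (worksheet.Cells[i, j].Value != null)
            {
                isRowEmpty = false;
                break;
            }
        }
        if (isRowEmpty)
        {
            worksheet.DeleteRow(i);
        }
    }
}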
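Deleting a row shifts the following rows up by one, whether in a worksheet or in a list. A loop that walks forward therefore skips the row that moves into the deleted position, so two consecutive empty rows leave one behind. Walking from the last row to the first avoids this. The sketch below demonstrates the effect on a `List<string[]>` standing in for the sheet (an assumption for illustration), comparing a forward pass with a backward pass. `EmptyRowRemoval` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmptyRowRemoval
{
    // A row is empty if every cell is null or "".
    static bool IsEmpty(string[] row) => row.All(string.IsNullOrEmpty);

    // Buggy pattern: after RemoveAt(i), the next row slides into index i,
    // but i++ moves past it without checking it.
    public static List<string[]> RemoveForward(List<string[]> rows)
    {
        var copy = new List<string[]>(rows);
        for (int i = 0; i < copy.Count; i++)
        {
            if (IsEmpty(copy[i]))
                copy.RemoveAt(i);
        }
        return copy;
    }

    // Correct pattern: removing row i only shifts rows that were already checked.
    public static List<string[]> RemoveBackward(List<string[]> rows)
    {
        var copy = new List<string[]>(rows);
        for (int i = copy.Count - 1; i >= 0; i--)
        {
            if (IsEmpty(copy[i]))
                copy.RemoveAt(i);
        }
        return copy;
    }
}
```

For the rows `["a"], [""], [""], ["b"]`, the forward pass keeps one of the two empty rows, while the backward pass removes both.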

Is EPPlus faster than regular loop logic for mapping a DataTable to an Excel spreadsheet?

I'm looking for a way to speed up mapping a DataTable to an Excel spreadsheet.
The current code uses loops to import the data from the DataTable into Excel:
for (int i = 0; i < dt.Columns.Count; i++)
{
    thc = new TableHeaderCell();
    thc.BorderWidth = 1;
    thc.BorderStyle = BorderStyle.Solid;
    //thc.Style.Add("backgroundColor", "#cacab5");
    thc.BorderColor = System.Drawing.Color.White;
    thc.BackColor = System.Drawing.Color.BurlyWood;
    thc.Text = dt.Columns[i].ColumnName;
    thr.Cells.Add(thc);
}
tblReport.Rows.Add(thr);
for (int j = 0; j < dt.Rows.Count; j++)
{
    tr = new TableRow();
    tr.BorderStyle = BorderStyle.Solid;
    if (bt == "IE")
        tr.BorderWidth = 1;
    else
        tr.BorderWidth = 2;
    tr.BorderColor = System.Drawing.Color.White;
    long n;
    for (int i = 0; i < dt.Columns.Count; i++)
    {
        td = new TableCell();
        td.BorderWidth = 1;
        td.BorderStyle = BorderStyle.Solid;
        td.BorderColor = System.Drawing.Color.White;
        td.BackColor = System.Drawing.Color.Bisque;
        td.Text = dt.Rows[j][i].ToString();
        n = 0;
        bool isNumeric = long.TryParse(td.Text, out n);
        // Keep long numeric strings as text (mso-number-format:\@) so Excel does not convert them to numbers
        if (isNumeric && td.Text.Length > 10)
            td.Attributes.Add("style", @"mso-number-format:\@");
        tr.Cells.Add(td);
    }
    tblReport.Rows.Add(tr);
}
I found another way to do it through EPPlus.
Which way is faster?

EPPlus is already fast and efficient. Any method that appears to copy your recordset to the workbook in one call still loops over the rows and columns internally in one way or another. So the real question is how much work you do per cell and how you can reduce it.
Based on the code snippet you provided, I would recommend two things. First, you do not have to add rows to the worksheet explicitly before writing values into the cells. Doing so slows things down considerably as your in-memory EPPlus object grows. If you always write to the next row, you never have to add a row explicitly; just reference the next row by index.
Second, do not set the display attributes on every cell each time you write a value to it. Set them for the whole row (or column) using a range. Better yet, set them for the whole worksheet at once after you have written all the data. The following example sets a Style property for columns A through E in a single call:
using (ExcelRange er = myWorksheet.Cells["A" + lineNo.ToString() + ":E" + lineNo.ToString()])
{
    er.Style.Font.Bold = true;
}

The fastest way I have found is to pass the DataTable to the LoadFromDataTable method of a cell range:
_workSheet.Cells[startRow,startColumn,endRow,endColumn].LoadFromDataTable(dataTable);

DataTable: normalize blank cells

I have not found a way to normalize a DataTable that comes from an Excel file with merged cells. When I read that file into a DataTable, only the first cell of each merged range has the value; the others are blank.
An example of this DataTable is:
and the expected result:
To summarize: blank cells should be filled with the value of the nearest non-blank cell above them, since that is what the merge did in Excel.
I'm using Excel.dll to read this Excel file, and it doesn't autofill the cells. That's why I'm looking for a way to do it in C#.
I suppose the logic should be: if a cell is blank, use the value of the cell above. The logic seems clear, but I'm having trouble writing the code to apply it.
This is just a sample. In the end, I'm looking for a method that does this for any number of columns and rows in the DataTable.
Edit:
Thanks for your quick feedback.
Attached is what I have so far, for just one column. It has errors, since it doesn't handle the first and last rows, but it shows the idea. What I'm trying to achieve is a method for any number of columns and rows. It would be OK if the columns were fixed by name; if I have more columns, I will adapt it.
private void NormalizeDataTable(DataTable dtRawTable)
{
    DataTable dtFinalized = new DataTable();
    dtFinalized.Columns.Add("Col1", typeof(String));
    string previousValue = "";
    for (int index = 0; index <= dtRawTable.Rows.Count; index++)
    {
        DataRow dr = dtFinalized.NewRow();
        if (index != 0 || index == dtRawTable.Rows.Count - 1)
        {
            if (dtRawTable.Rows[index]["Modelo"].ToString() == "")
            {
                dr["Col1"] = previousValue;
            }
            else
            {
                dr["Col1"] = Convert.ToString(dtRawTable.Rows[index]["Modelo"].ToString());
                previousValue = (string)dr["Col1"];
            }
        }
        dtFinalized.Rows.Add(dr);
        dtFinalized.AcceptChanges();
    }
}

Here is the function I use in my project for the same requirement:
public static DataTable AutoFillBlankCellOfTable(DataTable outputTable)
{
    for (int i = 0; i < outputTable.Rows.Count; i++)
    {
        for (int j = 0; j < outputTable.Columns.Count; j++)
        {
            // Fill a blank (DBNull) cell with the value from the row above;
            // that row has already been filled, so runs of blanks propagate
            if (outputTable.Rows[i][j] == DBNull.Value)
            {
                if (i > 0)
                    outputTable.Rows[i][j] = outputTable.Rows[i - 1][j];
            }
        }
    }
    return outputTable;
}
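The function above fills only `DBNull` cells. Depending on the reader, blank Excel cells may arrive as empty strings instead; the question's own code checks for `""`. Here is a variant that treats both as blank. It is a sketch under that assumption, and `DataTableFill`, `FillDown` and `IsBlank` are hypothetical names. Like the original, it walks the rows top to bottom and copies from the row just above. A run of blanks therefore inherits the last non-blank value, which is what a vertical merge produces.

```csharp
using System;
using System.Data;

public static class DataTableFill
{
    // A cell counts as blank if it is null, DBNull, or a whitespace-only string.
    static bool IsBlank(object value) =>
        value == null || value is DBNull ||
        (value is string s && string.IsNullOrWhiteSpace(s));

    // Copies the value from the row above into every blank cell. Rows are
    // processed top to bottom, so a run of blanks inherits the last
    // non-blank value above it.
    public static DataTable FillDown(DataTable table)
    {
        for (int i = 1; i < table.Rows.Count; i++)
        {
            for (int j = 0; j < table.Columns.Count; j++)
            {
                if (IsBlank(table.Rows[i][j]))
                    table.Rows[i][j] = table.Rows[i - 1][j];
            }
        }
        return table;
    }
}
```

Blank cells in the first row stay blank, because there is nothing above them to copy.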

Delete Empty Rows with Excel Interop

I have user-supplied Excel files that need to be converted to PDF. Using Excel interop, I can do this fine with .ExportAsFixedFormat(). My problem comes up when a workbook has millions of rows, because it turns into a file with 50k+ pages. That would be fine if the workbook had content in all of those rows. However, every time one of these files shows up, there are maybe 50 rows with content and the rest are blank. How can I remove the empty rows so I can export a reasonably sized PDF?
1. I've tried starting at the last row and, one by one, using CountA to check whether the row has content and deleting it if it doesn't. Not only does this take forever, it also seems to fail after about 100k rows with the following error:
Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.
2. I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues), but that includes a row if any cell has formatting (such as a background color).
3. I've tried using Worksheet.UsedRange and then deleting everything after it, but UsedRange has the same problem as point 2.
This is the code I've tried:
for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;
    currentRowIndex = rows.Count;
    bool contentFound = false;
    while (!contentFound && currentRowIndex > 0)
    {
        currentRow = rows[currentRowIndex];
        if (Application.WorksheetFunction.CountA(currentRow) == 0)
        {
            currentRow.Delete();
        }
        else
        {
            contentFound = true;
        }
        Marshal.FinalReleaseComObject(currentRow);
        currentRowIndex--;
    }
    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}
for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;
    lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
    int startRow = lastCell.Row;
    Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
    range.Delete();
    Marshal.FinalReleaseComObject(range);
    Marshal.FinalReleaseComObject(lastCell);
    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}
Do I have a problem with my code, is this an interop problem, or is it just a limitation of what Excel can do? Is there a better way to do what I'm attempting?

I suggest getting the number of rows that contain values using CountA (as you tried in point 1). Then copy those rows into a new sheet and export that sheet instead. Copying a few rows to a new sheet and working with them is easier than deleting a huge number of rows from the source sheet.
To create the new sheet and copy the rows, you can use the following code:
// Assumes: using excel = Microsoft.Office.Interop.Excel;
excel.Worksheet tempSheet = workbook.Worksheets.Add();
tempSheet.Name = sheetName;
workbook.Save();

// Copies rows 1..rowIndex from the source sheet to the top of the destination sheet.
// Pass as rowIndex the number of rows with content that you found using CountA.
public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
{
    excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
    excel.Range source = (excel.Range)sourceSheet.Range["A1", "A" + rowIndex.ToString()].EntireRow;
    excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
    excel.Range dest = (excel.Range)destSheet.Range["A1", Type.Missing];
    source.Copy(dest);
    workbook.Save();
}

Have you tried Sheet1.Range("A1").CurrentRegion.ExportAsFixedFormat()? Here Sheet1 is a valid sheet and "A1" is a cell you know lies inside the range you want to export.
The open question is why Excel thinks there is data in those "empty" cells. Formatting? A pre-existing print area that needs to be cleared? I've encountered situations like that before, and those are the only possibilities that come to mind at the moment.

Try these steps:
1. Copy Worksheet.UsedRange to a separate sheet (sheet2).
2. Use Paste Special so that the formatting is retained.
3. Parse sheet2 for unused rows.
If this doesn't help, try repeating step 2 with the formatting cleared, and then parse sheet2. You can always copy the formatting later (if it is simple enough).

If you can first load the Excel file into a DataSet via an OleDbDataAdapter, it's relatively easy to remove blank rows during the import...
Try this OleDbDataAdapter Excel Q&A I posted on Stack Overflow.
Then export the DataSet to a new Excel file and convert that file to PDF. That may be a big "if", of course, depending on the Excel layout (or lack thereof).
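Once the sheet is in a `DataSet`, removing blank rows is plain `DataTable` work. The sketch below assumes a row is blank when every field is `DBNull` or a whitespace-only string. The OleDb import itself is not shown, and `BlankRowFilter` and `RemoveBlankRows` are hypothetical names.

```csharp
using System;
using System.Data;
using System.Linq;

public static class BlankRowFilter
{
    // True if every field in the row is null, DBNull, or a whitespace-only string.
    static bool IsBlankRow(DataRow row) =>
        row.ItemArray.All(v => v is null || v is DBNull ||
                               (v is string s && string.IsNullOrWhiteSpace(s)));

    // Removes blank rows in place and returns how many were removed.
    // Iterates backwards so a removal never shifts a row that is still unchecked.
    public static int RemoveBlankRows(DataTable table)
    {
        int removed = 0;
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            if (IsBlankRow(table.Rows[i]))
            {
                table.Rows.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```

Applying this to each table of the `DataSet` before writing the new Excel file leaves only rows that have at least one non-blank value.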

I had to solve this problem today for what might be a subset of your possible cases.
If your spreadsheet meets the following conditions:
All columns with data have header text in line 1.
All rows with data are in sequence until the first BLANK row.
Then, the following code may help:
// lastException is assumed to be a static field of the enclosing class
private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
{
    int countCols = CountColsToFirstBlank(excel, sheet);
    int countRows = CountRowsToFirstBlank(excel, sheet);
    // Indexed [column, row], matching the loops below
    var cellData = new string[countCols, countRows];
    for (int i = 0; i < countCols; i++)
    {
        for (int j = 0; j < countRows; j++)
        {
            try
            {
                // Excel's Cells is indexed [row, column], so row j + 1, column i + 1
                var value = sheet.Cells[j + 1, i + 1].Value;
                if (null != value)
                {
                    cellData[i, j] = value.ToString();
                }
            }
            catch (Exception ex)
            {
                lastException = ex;
                //Console.WriteLine(String.Format("LoadCellData [{1}, {2}] reported an error: [{0}]", ex.Message, i, j));
            }
        }
    }
    return cellData;
}

private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
{
    int count = 0;
    for (int j = 0; j < sheet.UsedRange.Rows.Count; j++)
    {
        if (IsBlankRow(excel, sheet, j + 1))
            break;
        count++;
    }
    return count;
}

private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
{
    int count = 0;
    for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
    {
        if (IsBlankCol(excel, sheet, i + 1))
            break;
        count++;
    }
    return count;
}

private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
{
    for (int i = 0; i < sheet.UsedRange.Rows.Count; i++)
    {
        if (null != sheet.Cells[i + 1, col].Value)
        {
            return false;
        }
    }
    return true;
}

private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
{
    for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
    {
        // [row, column]: check every column of the given row
        if (null != sheet.Cells[row, i + 1].Value)
        {
            return false;
        }
    }
    return true;
}

Can you try the code below:
// Requires using System.Linq; for All()
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
{
    // Assume the first row is the header. Then use the column matchups by name to determine the index.
    // This allows the order of the header keys to change without any effect.
    var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
    // Check whether every cell in the row is empty
    bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
    if (allEmpty)
        continue; // skip this row

    if (rowIndex == workSheet.Dimension.Start.Row)
    {
        // read the header
    }
    else
    {
        // read the body
    }
}
I hope this helps; otherwise, let me know if you need a description of the code.
Update:
The for loop below walks through the worksheet up to its last used row:
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
Here we use LINQ to check whether all cells in the row are empty:
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
    continue; // if true, skip this row
else
    // read the headers (assuming the worksheet has them)
    // otherwise read the row data
Then do the necessary steps.
I hope this clears it up.

I had the same problem and managed to fix it using the CurrentRegion:
var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
sheet.Cells[lastcell.Row - 1, lastcell.Column]]
.CurrentRegion;
filledcells.ExportAsFixedFormat(
and so on. The CurrentRegion is said to expand to the borders where cells are empty, and apparently that means it also shrinks if it contains many empty cells.
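The answer above describes CurrentRegion as expanding until it reaches empty cells. The sketch below makes that concrete on an in-memory grid. It is a simplified approximation, not Excel's actual algorithm, and rests on several assumptions:

- `null` marks an empty cell.
- The region starts at A1 (`grid[0, 0]`), so it only grows down and to the right.
- A diagonally adjacent cell counts as connected.

`RegionSketch` and `CurrentRegionFromA1` are hypothetical names.

```csharp
using System;

public static class RegionSketch
{
    static bool AnyInRow(object[,] grid, int row, int fromCol, int toCol)
    {
        for (int c = fromCol; c <= toCol; c++)
            if (grid[row, c] != null) return true;
        return false;
    }

    static bool AnyInCol(object[,] grid, int col, int fromRow, int toRow)
    {
        for (int r = fromRow; r <= toRow; r++)
            if (grid[r, col] != null) return true;
        return false;
    }

    // Returns the size (rows, columns) of the region grown from grid[0, 0].
    // The region keeps growing while the row directly below it or the column
    // directly to its right (including the diagonal corner cell) has content;
    // it stops once it is bordered by an entirely empty row and column.
    public static (int Rows, int Cols) CurrentRegionFromA1(object[,] grid)
    {
        int maxRow = grid.GetLength(0) - 1, maxCol = grid.GetLength(1) - 1;
        int bottom = 0, right = 0; // inclusive 0-based bounds of the region
        bool grew = true;
        while (grew)
        {
            grew = false;
            if (bottom < maxRow && AnyInRow(grid, bottom + 1, 0, Math.Min(right + 1, maxCol)))
            {
                bottom++;
                grew = true;
            }
            if (right < maxCol && AnyInCol(grid, right + 1, 0, Math.Min(bottom + 1, maxRow)))
            {
                right++;
                grew = true;
            }
        }
        return (bottom + 1, right + 1);
    }
}
```

Because this model only checks cell contents, a value that sits beyond a fully empty row and column is never reached. Rows that carry only formatting never extend the region here, which is the behavior the answer relies on.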

Please try the following code:
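for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    // Deletes every row whose cell in column A is blank, then every column whose cell in row 1 is blank.
    // SpecialCells throws a COMException if no blank cells are found.
    sheet.Range["A:A"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete();
    sheet.Range["1:1"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete();
    Marshal.FinalReleaseComObject(sheet);
}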
