Loop over Excel rows in different Excel worksheets in C#

I have an Excel file, myexcel.xlsx, with multiple worksheets. All worksheets have the same column names in the first row. One column is called ID and another is called Total. I loop over each row in every worksheet, and for each row I want to check whether its ID also appears in any other row of the same or another worksheet. If the ID is not found anywhere else, the total is just the Total column of that row. If the same ID exists in another row of the same or another worksheet, I want to add that row's Total as well, and then skip all rows with that ID in the loop so they are not counted twice.
Excel.Application myapp = new Excel.Application();
Excel.Workbook myworkbook = myapp.Workbooks.Open(@"myexcel.xlsx");
for (int i = 1; i <= myworkbook.Worksheets.Count; i++)
{
Excel._Worksheet myworksheet = myworkbook.Worksheets[i];
Excel.Range myrange = myworksheet.UsedRange;
int myrowCount = myrange.Rows.Count;
}

I don't know the correct syntax for working with Excel sheets, so I'll give you a basic example for what I think you are asking. You'll have to adjust the code so it works, but if you want to aggregate (get the sum) of the "Total" in all the rows with the same "ID" you should be able to do something like this:
var totalsDictionary = new Dictionary<int, int>();
for (var ws = 0; ws < worksheets.Count; ws++)
{
var worksheet = worksheets[ws];
for (var row = 0; row < worksheet.Rows.Count; row++)
{
var id = worksheet.Rows[row]["ID"];
var total = worksheet.Rows[row]["Total"];
if (totalsDictionary.ContainsKey(id))
{
totalsDictionary[id] += total;
}
else
{
totalsDictionary.Add(id, total);
}
}
}
// totalsDictionary now contains the sum of Totals for each ID
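The same idea can be sketched without any Excel library, which makes the grouping logic easy to check on its own. This is a minimal sketch under stated assumptions: each worksheet is modelled as a sequence of (Id, Total) pairs that you would fill from the ID and Total columns yourself, and the names TotalsById and SumTotals are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TotalsById
{
    // Hypothetical helper: each sheet is a sequence of (Id, Total) rows that the
    // caller has already read from the worksheet (via interop, EPPlus, etc.).
    public static Dictionary<string, decimal> SumTotals(
        IEnumerable<IEnumerable<(string Id, decimal Total)>> sheets)
    {
        var totals = new Dictionary<string, decimal>();
        foreach (var sheet in sheets)
        {
            foreach (var (id, total) in sheet)
            {
                // TryGetValue leaves 'running' at 0 when the ID has not been seen yet.
                totals.TryGetValue(id, out var running);
                totals[id] = running + total;
            }
        }
        return totals;
    }
}
```

Because every ID ends up as exactly one dictionary key, there is no need to track which rows have already been visited: duplicates are folded into the running sum as they are met.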

Related

How to Check and merge two rows if next value is same or not in excel with epPlus

I am working on dynamic Excel creation with the help of the EPPlus library, and I have an Excel sheet whose data looks like this:
Name EmpCode Department attendance
Prashant 111 CSE 70% for Sep
Prashant 111 CSE 90% for Oct
XYZ 112 HR 50% for Sep
XYZ 112 HR 90% for Oct
What I want is:
If the current EmpCode is equal to the value in the next row, merge the matching cells of both rows.
I am sure that each EmpCode is repeated only twice.
The code I have tried:
for (var rowNum = 1; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var row = ws.Cells[string.Format("{0}:{0}", rowNum)];
}
This code only works if each EmpCode is repeated exactly twice. Since you are sure that is the case it should be okay, but it is not very scalable.
Once you have the data in your spreadsheet, you loop through all the rows in your dataset. At the beginning of each iteration you set the range of the current row, and at the end of the iteration you store it as the range of the prior row.
If the previous range is set, you compare the columns of the two rows to determine whether the cells should be merged.
using (var p = new OfficeOpenXml.ExcelPackage(new FileInfo(@"c:\FooFolder\Foo.xlsx")))
{
ExcelWorkbook wb = p.Workbook;
ExcelWorksheet ws = wb.Worksheets[1];
//create variable for previous range that will persist through each loop
ExcelRange previousRange = null;
//set position of first column to merge
int mergecellBegin = 1;
//set position of last column to merge
int mergeCellEnd = 3;
//create variable to check the cells of your rows
bool areCellsEqual;
//iterate through each row in the dataset
for (var rowNum = 2; rowNum <= ws.Dimension.End.Row; rowNum++)
{
ExcelRange currentRange = ws.Cells[rowNum, 1, rowNum, mergeCellEnd];
//will skip if we haven't set previous range yet
if (previousRange != null)
{
//reset your check variable
areCellsEqual = true;
//check if all cells in the ranges are equal to each other
for (int i = 1; i <= mergeCellEnd; i++)
{
//if the cells from the ranges are not equal then set check variable to false and break the loop
//object.Equals is used so that an empty cell (null Value) does not throw
if (!object.Equals(currentRange[rowNum, i].Value, previousRange[rowNum - 1, i].Value))
{
areCellsEqual = false;
break;
}
}
//if all cells from the two ranges match, merge them together.
if (areCellsEqual)
{
//merge each cell in the ranges
for (int i = 1; i <= mergeCellEnd; i++)
{
ExcelRange mergeRange = ws.Cells[rowNum - 1, i, rowNum, i];
mergeRange.Merge = true;
}
}
}
//sets the previous range to the current range to be used in next iteration
previousRange = currentRange;
}
p.Save();
}
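If an EmpCode could repeat more than twice, the pairwise comparison above can be generalised by first finding runs of consecutive equal keys and then merging each run once. The following is only a sketch of that run detection, not part of the original answer: MergeRuns and FindRuns are hypothetical names, and the keys are assumed to have been read from the EmpCode column already.

```csharp
using System;
using System.Collections.Generic;

public static class MergeRuns
{
    // Returns 1-based (FirstRow, LastRow) pairs for every run of two or more
    // consecutive rows that share the same key. 'firstDataRow' is the worksheet
    // row that keys[0] came from (2 if row 1 holds the headers).
    public static List<(int FirstRow, int LastRow)> FindRuns(IReadOnlyList<string> keys, int firstDataRow)
    {
        var runs = new List<(int FirstRow, int LastRow)>();
        int start = 0;
        for (int i = 1; i <= keys.Count; i++)
        {
            // A run ends at the end of the data or when the key changes.
            bool runEnds = i == keys.Count || !string.Equals(keys[i], keys[start]);
            if (!runEnds) continue;
            if (i - start > 1)
                runs.Add((firstDataRow + start, firstDataRow + i - 1));
            start = i;
        }
        return runs;
    }
}
```

Each returned pair could then be applied per column in the same way as the answer does, for example with ws.Cells[firstRow, col, lastRow, col].Merge = true.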

Epplus delete all rows from specific row

Is it possible to delete all rows that follow a specific (empty) row? I tried a for loop:
for (int rowNum = 1; rowNum <= worksheet.Dimension.End.Row; rowNum++)
{
var rowCells = from cell in worksheet.Cells
where (cell.Start.Row == rowNum)
select cell;
if (!rowCells.Any(cell => cell.Value != null))
{
worksheet.DeleteRow(rowNum);
}
}
but it takes minutes if there are millions of empty rows in the Excel file.
EPPlus offers the method worksheet.DeleteRow(int rowFrom, int rows), but I do not know the count of all the additional empty rows.
In the following example I need to delete all rows from 12 onward, but the problem is that I do not know the specific row where the empty rows begin.
An alternative approach is to find the last non-empty row and delete everything after it as one range, which would be faster, but there is another issue with an empty row inside the table.
ws.DeleteRow(lastFilledTableRow, workSheet.Dimension.End.Row - tableRowsCount,true);
In this example the problem is the red row, but maybe I will tell the users that this kind of Excel format is invalid and circumvent the problem that way.
I know this is old, but I could not find any solution, so I made one of my own.
It checks whether the last row is empty and, if so, deletes it, repeating this until it finds a non-empty row. (Here non-empty means that at least one column in the row has a value.)
worksheet.TrimLastEmptyRows();
public static void TrimLastEmptyRows(this ExcelWorksheet worksheet)
{
while (worksheet.IsLastRowEmpty())
worksheet.DeleteRow(worksheet.Dimension.End.Row);
}
public static bool IsLastRowEmpty(this ExcelWorksheet worksheet)
{
var empties = new List<bool>();
for (int i = 1; i <= worksheet.Dimension.End.Column; i++)
{
var rowEmpty = worksheet.Cells[worksheet.Dimension.End.Row, i].Value == null ? true : false;
empties.Add(rowEmpty);
}
return empties.All(e => e);
}
The solution above deletes the trailing empty rows in the file. It will not work if the file has empty rows somewhere in the middle of the list of rows.
Below is a solution that identifies the empty rows in the middle of the list of rows.
I used a combination of the above and my own code to delete empty rows both at the end and in the middle of the list of rows.
private void TrimEmptyRows(ExcelWorksheet worksheet)
{
//loop over all rows from the bottom up, so that deleting a row
//does not shift the rows that still have to be checked
for (int i = worksheet.Dimension.End.Row; i >= worksheet.Dimension.Start.Row; i--)
{
bool isRowEmpty = true;
//loop all columns in a row
for (int j = worksheet.Dimension.Start.Column; j <= worksheet.Dimension.End.Column; j++)
{
if (worksheet.Cells[i, j].Value != null)
{
isRowEmpty = false;
break;
}
}
if (isRowEmpty)
{
worksheet.DeleteRow(i);
}
}
}
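Deleting one row at a time can be slow when there are long stretches of empty rows. A possible refinement, sketched here under assumptions, is to group consecutive empty rows into blocks and delete each block with a single call to the DeleteRow(int rowFrom, int rows) overload mentioned in the question. The names EmptyRowBlocks and FindEmptyBlocks are hypothetical, and the emptiness flags are assumed to have been computed beforehand (for example with the inner column loop above).

```csharp
using System;
using System.Collections.Generic;

public static class EmptyRowBlocks
{
    // isEmpty[k] says whether worksheet row (firstRow + k) is empty.
    // Blocks are returned bottom-up, so deleting them in order never shifts
    // a block that is still waiting to be deleted.
    public static List<(int RowFrom, int Rows)> FindEmptyBlocks(IReadOnlyList<bool> isEmpty, int firstRow)
    {
        var blocks = new List<(int RowFrom, int Rows)>();
        int i = isEmpty.Count - 1;
        while (i >= 0)
        {
            if (!isEmpty[i]) { i--; continue; }
            int end = i;
            // Walk up to the first non-empty row above this block.
            while (i >= 0 && isEmpty[i]) i--;
            // Zero-based rows i+1 .. end are empty.
            blocks.Add((firstRow + i + 1, end - i));
        }
        return blocks;
    }
}
```

Each block could then be passed to worksheet.DeleteRow(block.RowFrom, block.Rows), in the order returned.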

Read excel file row by row, cell by cell C#

I want (as title states) to programmatically read values from an Excel file. Row by row and then cell by cell, to have the freedom of creating custom collections out of cell's data.
This question helped me.
But I need more flexible code. Can I, for example, write the following (* just stands for all columns)?
Range range1 = worksheet.get_Range("*1", Missing.Value);
foreach (Range r in range1)
{
string user = r.Text;
string value = r.Value2;
}
And iterate all cells in row 1 as long as there is next.
There must be some elegant way to iterate through rows and cells in C#.
You can rely on the Rows/Columns properties and then iterate through all the contained ranges (Cells). Sample code:
Range range1 = worksheet.Rows[1]; //For all columns in row 1
//Range range1 = worksheet.Columns[1]; //for all rows in column 1
foreach (Range r in range1.Cells) //range1.Cells represents all the columns/rows
{
// r is the range of the corresponding cell
}
Try this:
Excel.Range r = worksheet.get_Range("*1", Missing.Value);
for (int j = 0; j < r.Rows.Count; j++) {
Excel.Range currentCell = r.Rows[j + 1] as Excel.Range;
}
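Another common pattern is to read a whole range into memory once and iterate the resulting array instead of touching each cell through COM. One of the answers further down this page does this by assigning xlRange.Value to an object[,]. The helper below is a sketch of the second half of that pattern only: RangeValues and ToRows are hypothetical names, and the array is assumed to come from a multi-cell range (a single cell returns a scalar rather than an array). Interop arrays are typically 1-based, so the bounds are queried rather than assumed.

```csharp
using System;
using System.Collections.Generic;

public static class RangeValues
{
    // Converts a two-dimensional value array into a list of rows,
    // whatever lower bounds the array happens to use.
    public static List<object[]> ToRows(object[,] values)
    {
        var rows = new List<object[]>();
        int rowLo = values.GetLowerBound(0), rowHi = values.GetUpperBound(0);
        int colLo = values.GetLowerBound(1), colHi = values.GetUpperBound(1);
        for (int r = rowLo; r <= rowHi; r++)
        {
            var row = new object[colHi - colLo + 1];
            for (int c = colLo; c <= colHi; c++)
                row[c - colLo] = values[r, c]; // null for an empty cell
            rows.Add(row);
        }
        return rows;
    }
}
```

With the rows in a plain list, building custom collections from the cell data no longer depends on the Excel object model.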

find the last used row in a column in windows c#

I have an Excel report and I need to draw charts based on the data in the report. I am able to get the range from a particular cell to the last filled cell, as shown below. I have many columns in my report and I need only the data in a particular column, such as ("c1","c12"). The column length may vary; it need not be 12. How can I get the range up to the last filled row of a column?
Excel.Range last1 = xlWorkSheet2.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
oRange = xlWorkSheet2.get_Range("A6", last1);
Try the following code. It works by selecting the top cell in a column and then searching downwards until the end of the range is found. The column range is simply the range between start and end. Note that this only finds the last contiguous cell in the range and will not search past blank rows.
Excel.Range start = xlWorkSheet2.Range["A1"];
Excel.Range column;
if (start.Offset[1].Value != null)
column = xlWorkSheet2.Range[start, start.End[Excel.XlDirection.xlDown]];
else
column = start;
The following code will allow you to retrieve the full used range of the column even if there are blank rows. This code works in a similar manner, but searches upwards from the bottom of the used range in the worksheet to find the last cell in the column containing a value.
Excel.Range start = xlWorkSheet2.Range["A1"];
Excel.Range bottom = xlWorkSheet2.Range["A" + (xlWorkSheet2.UsedRange.Rows.Count + 1)];
Excel.Range end = bottom.End[Excel.XlDirection.xlUp];
Excel.Range column = xlWorkSheet2.Range[start, end];
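The bottom-up search can also be done in memory once the column values have been read into an array. The sketch below shows only that search and is not taken from the answers: ColumnScan and LastFilledIndex are hypothetical names, the values are assumed to be a zero-based list with one entry per row, and treating an empty string as blank is an assumption you may want to change.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnScan
{
    // Returns the zero-based index of the last entry that holds a value,
    // or -1 when the whole column is blank. Blank rows in the middle are
    // skipped over because the scan starts from the bottom.
    public static int LastFilledIndex(IReadOnlyList<object> column)
    {
        for (int i = column.Count - 1; i >= 0; i--)
        {
            object v = column[i];
            bool blank = v == null || (v is string s && s.Length == 0);
            if (!blank) return i;
        }
        return -1;
    }
}
```

Adding the worksheet row of the first entry to the returned index gives the row number to use as the end of the range.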
I found that none of the above methods worked for what I wanted to do, so here is my solution:
public object GetLastNotEmptyRowOfColumn(string sheet, string column,int startRow,int endRow)
{
try
{
var validColumn = Regex.IsMatch(column, @"^[a-zA-Z]+$");
if(!validColumn)
{
throw new Exception($"column can only a letter. value entered : {column}");
}
xlBook = xlApp.ActiveWorkbook;
xlSheet = xlBook.Sheets[sheet];
xlRange = xlSheet.Range[$"{column}{startRow}", $"{column}{endRow}"];
object[,] returnVal = xlRange.Value;
var rows = returnVal.GetLength(0);
// var cols = returnVal.GetLength(1);
int count = 1;
for (int r = 1; r <= rows; r++)
{
var row = returnVal[r, 1];
if (row == null) break;
count++;
}
//returns an object : {Count:10,Cell:A9}
return new { Count = count - 1, Cell = $"{column}{startRow + count - 1}" };
}
catch (Exception ex)
{
......
}
return null;
}
Usage: var response = GetLastNotEmptyRowOfColumn("Sheet1", "A",1,100);
Result:

Delete Empty Rows with Excel Interop

I have user supplied excel files that need to be converted to PDF. Using excel interop, I can do this fine with .ExportAsFixedFormat(). My problem comes up when a workbook has millions of rows. This turns into a file that has 50k+ pages. That would be fine if the workbook had content in all of those rows. Every time one of these files shows up though, there are maybe 50 rows that have content and the rest are blank. How can I go about removing the empty rows so I can export it to a decent sized PDF?
1. I've tried starting at the end row and, one by one, using CountA to check whether the row has content and, if it doesn't, deleting it. Not only does this take forever, it also seems to fail after about 100k rows with the following error:
Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.
2. I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues), but that includes a row if any cell has formatting (like a background color).
3. I've tried using Worksheet.UsedRange and then deleting everything after that, but UsedRange has the same problem as point two.
This is the code I've tried:
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
currentRowIndex = rows.Count;
bool contentFound = false;
while (!contentFound && currentRowIndex > 0)
{
currentRow = rows[currentRowIndex];
if (Application.WorksheetFunction.CountA(currentRow) == 0)
{
currentRow.Delete();
}
else
{
contentFound = true;
}
Marshal.FinalReleaseComObject(currentRow);
currentRowIndex--;
}
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
int startRow = lastCell.Row;
Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
range.Delete();
Marshal.FinalReleaseComObject(range);
Marshal.FinalReleaseComObject(lastCell);
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
Is there a problem with my code, is this an interop problem, or is it just a limitation of what Excel can do? Is there a better way to do what I'm attempting?
I would suggest getting the count of rows that contain values, using CountA (as you tried in point 1). Then copy those rows into a new sheet and export from there. It is easier to copy a few rows to a new sheet and work on that than to delete a huge number of rows from the source sheet.
To create the new sheet and copy rows, you can use the following code:
excel.Worksheet tempSheet = workbook.Worksheets.Add();
tempSheet.Name = sheetName;
workbook.Save();
//create a new method for copy new rows
//as the rowindex you can pass the total no of rows you have found out using CountA
public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
{
excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
excel.Range source = (excel.Range)sourceSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
excel.Range dest = (excel.Range)destSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
source.Copy(dest);
excel.Range newRow = (excel.Range)destSheet.Rows[rowIndex+1];
newRow.Insert();
workbook.Save();
}
Have you tried Sheet1.Range("A1").CurrentRegion.ExportAsFixedFormat() where Sheet1 is a valid sheet name and "A1" is a cell you can test to ensure it is located in the range you want to export?
The question remains, why does Excel think there is data in those "empty" cells? Formatting? A pre-existing print area that needs to be cleared? I know I've encountered situations like that before, those are the only possibilities that come to mind at this moment.
Try these steps:
1. Copy Worksheet.UsedRange to a separate sheet (sheet2).
2. Use paste special so that formatting is retained.
3. Try parsing sheet2 for unused rows.
If this doesn't help, try repeating step 2 with the formatting information cleared and then parsing sheet2. You can always copy the format information later (if it is simple enough).
If you can first load the Excel file into a DataSet via the OleDBAdapter, it's relatively easy to remove blank rows on the import...
Try this OleDBAdapter Excel QA I posted via stack overflow.
Then export the DataSet to a new Excel file and convert that file to PDF. That may be a big "if", of course, depending on the Excel layout (or lack thereof).
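Once the data is in a DataTable, dropping blank rows needs no Excel calls at all. The following is a minimal sketch of that step only, with loading and exporting left out. BlankRowFilter and RemoveBlankRows are hypothetical names, and treating whitespace-only text as blank is an assumption.

```csharp
using System;
using System.Data;
using System.Linq;

public static class BlankRowFilter
{
    // Removes every row whose fields are all null, DBNull or whitespace.
    // Returns the number of rows removed.
    public static int RemoveBlankRows(DataTable table)
    {
        int removed = 0;
        // Walk backwards so that removing a row does not shift the rows still to be visited.
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            bool blank = table.Rows[i].ItemArray.All(
                v => v == null || v is DBNull || string.IsNullOrWhiteSpace(v.ToString()));
            if (blank)
            {
                table.Rows.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```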
I had to solve this problem today for what might be a subset of your possible cases.
If your spreadsheet meets the following conditions:
All columns with data have header text in line 1.
All rows with data are in sequence until the first BLANK row.
Then, the following code may help:
private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
{
int countCols = CountColsToFirstBlank(excel, sheet);
int countRows = CountRowsToFirstBlank(excel, sheet);
cellData = new string[countCols, countRows];
string datum;
for (int i = 0; i < countCols; i++)
{
for (int j = 0; j < countRows; j++)
{
try
{
//Cells is indexed as [row, column]; here i is the column index and j the row index
if (null != sheet.Cells[j + 1, i + 1].Value)
{
datum = sheet.Cells[j + 1, i + 1].Value.ToString();
cellData[i, j] = datum;
}
}
catch (Exception ex)
{
lastException = ex;
//Console.WriteLine(String.Format("LoadCellData [{1}, {2}] reported an error: [{0}]", ex.Message, i, j));
}
}
}
return cellData;
}
private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
{
int count = 0;
for (int j = 0; j < sheet.UsedRange.Rows.Count; j++)
{
if (IsBlankRow(excel, sheet, j + 1))
break;
count++;
}
return count;
}
private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
{
int count = 0;
for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
{
if (IsBlankCol(excel, sheet, i + 1))
break;
count++;
}
return count;
}
private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
{
for (int i = 0; i < sheet.UsedRange.Rows.Count; i++)
{
if (null != sheet.Cells[i + 1, col].Value)
{
return false;
}
}
return true;
}
private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
{
for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
{
if (null != sheet.Cells[row, i + 1].Value) //[row, column]: i walks the columns of this row
{
return false;
}
}
return true;
}
Can you try the code below:
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
{
//Assume the first row is the header. Then use the column match ups by name to determine the index.
//This will allow you to have the order of the header.Keys change without any affect.
var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
// check if the row and column cells are empty
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // skip this row
else{
//here read header
if()
{
//some code
}
else
{
//some code to read body
}
}
}
Hope this helps; otherwise let me know if you need a description of the code.
Updated:
The code below loops over the rows in the worksheet. The for loop runs until the last row of the worksheet.
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
Here we check whether all cells in the row are empty, using LINQ:
bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // if true then skip this row
else
// read headers(assuming it is presented in worksheet)
// else read row wise data
and then do the necessary steps.
Hopefully this is clearer now.
I had the same problem and managed to fix it using the CurrentRegion:
var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
sheet.Cells[lastcell.Row - 1, lastcell.Column]]
.CurrentRegion;
filledcells.ExportAsFixedFormat(
and so on. The CurrentRegion is said to expand to the borders where cells are empty, and apparently that means it also shrinks if it contains many empty cells.
Please try the following code:
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
sheet.Columns["A:A"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete();
sheet.Rows["1:1"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete();
Marshal.FinalReleaseComObject(sheet);
}
