Deleting duplicate rows in Excel using Epplus

I have a worksheet with a number of rows and several columns. I want to delete all duplicate rows in this worksheet. In other words, the highlighted rows in this screenshot should be deleted, and the rows below should be moved up:
[screenshot: worksheet with the duplicate rows highlighted]
and should result in the following:
[screenshot: worksheet after the duplicate rows are removed]
I'm using the following snippet of code:
List<int> rowsToDelete = new List<int>();
for (int row = 1; row <= worksheet.Dimension.End.Row; row++)
{
    string a = worksheet.Cells[row, 1].Value.ToString();
    string b = worksheet.Cells[row, 2].Value.ToString();
    string c = worksheet.Cells[row, 3].Value.ToString();
    int i = row + 1;
    while (worksheet.Cells[i, 1].Value.ToString().Equals(a) &&
           worksheet.Cells[i, 2].Value.ToString().Equals(b) &&
           worksheet.Cells[i, 3].Value.ToString().Equals(c))
    {
        rowsToDelete.Add(i);
        i++;
    }
}
foreach (var row in rowsToDelete)
{
    worksheet.DeleteRow(row);
}
It is not deleting the correct rows. How can I fix this?
This is using Epplus 4.5.3.3 and .NET Framework 4.6.1

I can only assume you are misunderstanding my comment in reference to the posted while statement…
while (worksheet.Cells[i, 1].Value.ToString().Equals(a) &&
       worksheet.Cells[i, 2].Value.ToString().Equals(b) &&
       worksheet.Cells[i, 3].Value.ToString().Equals(c)) { …
This will work ONLY if the duplicate rows are contiguous. For example, using the first posted picture, let's assume there is a row nine (9) and that this row holds the “duplicate” cell values “a”, “b” and “c”. When the while loop starts, row 2 will evaluate to true, as that row is a duplicate of row 1, so row index 2 is added to the list. On the next iteration of the while loop we will add row 3 as a duplicate. However, when we get to row 4, the while condition will evaluate to false, as row 4 is NOT a duplicate of row 1. Therefore, the while loop will “EXIT” and the code will loop back up to the initial for loop to check the next row for duplicates. At this point the duplicate at row 9 will never get compared against row 1, thus leaving it as a duplicate row.
The point is that you do NOT want to stop checking for duplicate rows if one of the rows is NOT a duplicate. You need to continue through all the rows as the duplicate row could be on ANY row.
It should also be noted that it may be helpful to avoid “checking” for duplicates of a row that has already been marked as a duplicate. For example, using the same first picture, the first pass through the rows for the “first” row will add rows 2 and 3 as “duplicate” rows. So, when the while loop exits and we loop back up to the next row to check, it will be row 2. However, row 2 is ALREADY marked as a duplicate, so really there is no need to check that row for duplicates. In the solution below, a check will be made to see if the row that we are checking is already marked as a duplicate. If the row IS marked as a duplicate, then we will simply skip that row.
Next, the last foreach loop that actually removes the rows may have some issues. For example, let’s say that the rows-to-remove list contains rows 2, 3 and 7. Inside the foreach loop, the code removes row 2. After this row is removed, the old row 3 is now row 2, the old row 4 is now row 3, and so on. Therefore, on the next iteration of the loop, it will remove the current row 3, which was ORIGINALLY row four (4), while the original row 3 (now row 2) survives. I hope it is clear that removing the rows in a top-down fashion WILL NOT WORK, because as soon as the first row is removed, all the row indexes below that row will change.
So, if we want to delete the proper rows in the list of row indexes, we can accomplish this by removing the rows in a bottom-up fashion. If we remove the rows from the bottom up, then we do not have to worry about getting the indexes mixed up as we do when deleting the rows in a top-down fashion.
Given all this, I suggest you break this problem up into two steps. The first step simply fills the list of duplicate rows. Bear in mind, since we will be checking for the duplicate rows in a top-down fashion, the list of row indexes may not necessarily be in an ordered fashion. Example, if we add a duplicate row 9 as previously suggested, then the list of row indexes to delete would be { 2, 3, 9, 7 }. The 9 is BEFORE the 7 because row 9 was found to be a duplicate of row 1 and row 7 was found to be a duplicate of row 6. The point here is that the list may not necessarily be in an ordered fashion and this will create problems as described above.
Therefore, after we get the list of row indexes to delete, we will SORT the list. This will set the list as { 2, 3, 7, 9 }. At this point we could simply start deleting the rows from the bottom of the list on up, OR, as in the example below, we can simply REVERSE the list so it becomes { 9, 7, 3, 2 }. Then we will have a list of ints that is ordered from high to low. NOW the loop through the list should work without mixing up the row indexes.
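The effect of the deletion order can be seen without EPPlus at all. The following is only a minimal sketch: it uses a plain List<string> as a stand-in for the worksheet rows, the class and method names are hypothetical, and row numbers are 1-based as in a worksheet. It deletes the same set of row numbers once in the order supplied and once from the highest row number down.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowDeletionOrderDemo
{
    // Removes the given 1-based row numbers from a copy of 'rows',
    // in exactly the order they are supplied.
    public static List<string> DeleteInGivenOrder(IEnumerable<string> rows, IEnumerable<int> rowNumbers)
    {
        var result = new List<string>(rows);
        foreach (int rowNumber in rowNumbers)
        {
            int index = rowNumber - 1;
            // Skip row numbers that no longer exist after earlier removals.
            if (index >= 0 && index < result.Count)
            {
                result.RemoveAt(index); // every row below shifts up by one
            }
        }
        return result;
    }

    // Same removal, but from the highest row number down, so rows that
    // are still waiting to be deleted never change position.
    public static List<string> DeleteBottomUp(IEnumerable<string> rows, IEnumerable<int> rowNumbers)
    {
        return DeleteInGivenOrder(rows, rowNumbers.OrderByDescending(n => n));
    }
}
```

With nine rows named r1 to r9 and the list { 2, 3, 9, 7 }, DeleteBottomUp leaves r1, r4, r5, r6, r8, whereas DeleteInGivenOrder leaves r1, r3, r5, r6, r7, r8: the wrong rows are gone and row 9 is never reached.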
To help, I suggest you create a method that takes an open worksheet and returns our “unsorted” list of row indexes we want to delete. To simplify things, all the code does is add the row indexes of duplicate rows. Walking through the code below we start by looping through all the rows in the worksheet. If we get to a row that has already been marked as a duplicate, then we will skip that row.
If the row is not marked as a duplicate, then the code will start another for loop that starts on the next row and ends on the last row. Again if we get to a row that has already been marked as a duplicate, then we will skip that row. Once the code has looped through all the rows we simply return the list of row indexes to delete.
private List<int> GetDuplicateRowsToDelete(ExcelWorksheet worksheet) {
    List<int> rowsToDelete = new List<int>();
    string a, b, c;
    for (int i = 1; i <= worksheet.Dimension.End.Row; i++) {
        if (!rowsToDelete.Contains(i)) {
            a = worksheet.Cells[i, 1].Value.ToString();
            b = worksheet.Cells[i, 2].Value.ToString();
            c = worksheet.Cells[i, 3].Value.ToString();
            for (int j = i + 1; j <= worksheet.Dimension.End.Row; j++) {
                if (!rowsToDelete.Contains(j)) {
                    if (worksheet.Cells[j, 1].Value.ToString().Equals(a) &&
                        worksheet.Cells[j, 2].Value.ToString().Equals(b) &&
                        worksheet.Cells[j, 3].Value.ToString().Equals(c)) {
                        rowsToDelete.Add(j);
                    }
                }
            }
        }
    }
    return rowsToDelete;
}
Finally we could make use of this method to get the indexes to delete, then we will sort and reverse the list, then delete the rows from the bottom up. Something like…
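private void button1_Click(object sender, EventArgs e) {
    // Verbatim string literal (@"...") so the backslashes in the path are not treated as escapes.
    FileInfo newFile = new FileInfo(@"D:\Test\Excel_Test\RemoveDup1.xlsx");
    using (ExcelPackage pck = new ExcelPackage(newFile)) {
        // Depending on the EPPlus version and target framework, the Worksheets
        // collection may be 1-based; adjust the index if [0] throws.
        using (ExcelWorksheet worksheet = pck.Workbook.Worksheets[0]) {
            List<int> rowsToDel = GetDuplicateRowsToDelete(worksheet);
            // Sort ascending, then reverse, so rows are deleted from the bottom up.
            rowsToDel.Sort();
            rowsToDel.Reverse();
            foreach (int rowIndex in rowsToDel) {
                worksheet.DeleteRow(rowIndex);
            }
            pck.Save();
        }
    }
    MessageBox.Show("Removed duplicates complete");
}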
I hope this makes sense and helps.
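As a side note, the nested loops compare every row with every later row and call rowsToDelete.Contains each time, which can get slow on large sheets. One alternative (a different technique, not the method above) is to remember each row's key in a HashSet<string> and flag any row whose key has already been seen. The sketch below works on in-memory string arrays rather than an ExcelWorksheet; the names DuplicateRowFinder and FindDuplicateRowNumbers are hypothetical, and the unit-separator character used as a delimiter is an assumption that the cell values never contain it.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateRowFinder
{
    // Returns the 1-based row numbers of every row that repeats an earlier
    // row, highest first, so they can be deleted from the bottom up.
    public static List<int> FindDuplicateRowNumbers(IList<string[]> rows)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<int>();
        for (int i = 0; i < rows.Count; i++)
        {
            // Assumption: null cells are treated as empty strings.
            string[] cells = Array.ConvertAll(rows[i], c => c ?? string.Empty);
            // Assumption: "\u001F" never occurs inside a cell value.
            string key = string.Join("\u001F", cells);
            // HashSet.Add returns false when the key is already present.
            if (!seen.Add(key))
            {
                duplicates.Add(i + 1);
            }
        }
        // Duplicates were collected in ascending order; reverse for bottom-up deletion.
        duplicates.Reverse();
        return duplicates;
    }
}
```

With EPPlus, each string array would be built from worksheet.Cells[row, col].Value for the columns of interest, and each returned number passed to worksheet.DeleteRow in the order given.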

I have solved your issue in another way: I have created two extra columns, "CONCAT" and "COUNT":
"CONCAT" contains the formula =A2&B2&C2 (filled down to the end of the array)
"COUNT" contains the formula =COUNTIF(D$2:D$9,D2) (also filled down to the end of the array)
From then on, just write a VBA macro that checks the values from "E9" back up to "E2" and, where the value is larger than 1, removes the entire row.
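One caveat with a concatenated helper column is that different rows can produce the same text: the cells "ab", "c" and the cells "a", "bc" both concatenate to "abc". The following sketch reproduces the count-based idea in C#, to match the rest of this page, and shows how a delimiter avoids the collision. The names ConcatCountDemo and CountPerRow are illustrative, and the "|" delimiter is an assumption that only holds if that character never occurs in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatCountDemo
{
    // Mirrors the CONCAT and COUNT helper columns: builds one key per row
    // and returns, for each row, how many rows share that key.
    public static List<int> CountPerRow(IList<string[]> rows, string delimiter)
    {
        List<string> keys = rows.Select(r => string.Join(delimiter, r)).ToList();
        Dictionary<string, int> counts = keys
            .GroupBy(k => k)
            .ToDictionary(g => g.Key, g => g.Count());
        return keys.Select(k => counts[k]).ToList();
    }
}
```

For the three rows { "ab", "c" }, { "a", "bc" }, { "a", "bc" }, an empty delimiter reports a count of 3 for every row, while "|" reports 1, 2, 2. In the worksheet itself the equivalent would be a formula such as =A2&"|"&B2&"|"&C2.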

Related

Displaying original rowindex after filter in datagridview

Just want to ask on how to get the original row index of selected row in DataGridView after filter.
I have a DataGridView with two columns: name and age. I also have a TextBox that serves as a filter. Let's say I have 8 records and, after filtering, only 4 records remain. When I click the last record, I get a row index of 4, while I need the original index of this row so I can display it in a MessageBox. How can I do that?
Thank you.

Original row index means the index of the DataRow in the DataTable which can be found by DataTable.Rows.IndexOf(row). So to find the original index of the row you can use the following code:
var r = ((DataRowView)BindingContext[dataGridView1.DataSource].Current).Row;
var index = r.Table.Rows.IndexOf(r);
In case you are interested to do that for all rows in the DataGridView, as also is mentioned by Taw in comments, you can look into the DataBoundItem of the DataGridViewRow:
var r = ((DataRowView)dgvRow.DataBoundItem).Row; // dgvRow is a row of the DataGridView
var index = r.Table.Rows.IndexOf(r);
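The same lookup can be tried outside a form. The sketch below is only a stand-in, assuming the grid is bound to a DataView over a DataTable with Name and Age columns; the sample data and the FilteredIndexDemo name are made up. It takes a row from a filtered view and maps it back to its position in the underlying table.

```csharp
using System;
using System.Data;

public static class FilteredIndexDemo
{
    // Builds a small sample table (illustrative data only).
    public static DataTable BuildPeople()
    {
        var table = new DataTable("People");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Age", typeof(int));
        table.Rows.Add("Ann", 31);
        table.Rows.Add("Bob", 17);
        table.Rows.Add("Cid", 45);
        table.Rows.Add("Dee", 12);
        return table;
    }

    // Returns the index in the underlying DataTable of the row that is
    // shown at position 'viewIndex' in the (possibly filtered) view.
    public static int OriginalIndex(DataView view, int viewIndex)
    {
        DataRow row = view[viewIndex].Row;
        return row.Table.Rows.IndexOf(row);
    }
}
```

With a view created as new DataView(table) { RowFilter = "Age >= 18" }, the second visible row is Cid, and OriginalIndex(view, 1) returns 2, its position in the unfiltered table.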

From your comments, you need to display the currently selected record, so I will not directly answer how to get the row index in a filtered table, but rather how to get the currently selected record.
To get the currently selected record, use one of the following:
//Use this one if your datagridview SelectionMode is not FullRowSelect
DataGridViewRow row = dataGridView1.Rows[dataGridView1.SelectedCells[0].RowIndex];
string name = row.Cells["Name"].Value.ToString();
int age = Convert.ToInt32(row.Cells["Age"].Value);
//If your datagridview SelectionMode is FullRowSelect then use this instead
DataGridViewRow row = dataGridView1.SelectedRows[0];
string name = row.Cells["Name"].Value.ToString();
int age = Convert.ToInt32(row.Cells["Age"].Value);
Reza answered the main part of your question, but if for some reason that is not working, you can use this, since your primary key is your NAME:
foreach (DataRow r in yourDataTable.Rows)
{
    if (r["NAME"].ToString() == row.Cells["Name"].Value.ToString()) //This row.Cells... is the one from the code above
    {
        int originalRowIndexInDataTable = yourDataTable.Rows.IndexOf(r);
        return;
    }
}

does OnAutoGeneratingColumn e.cancel affect Table.Column.Count?

If I use OnAutoGeneratingColumn to cancel some columns that I don't necessarily want to generate, will it affect the number of columns in Table.Columns.Count?
Context
I'm iterating through a table, row by row, taking each value and passing it through to an insert SQL command. Right now it lines up so that each entry is associated properly. Will I disrupt this with e.cancel? Will row[1] no longer point to what it once did if row[0] was e.cancel'd?
for (int i = 0; i < table.Dummy.Columns.Count; i++)
{
    // if we're past our first entry, add room for the next before entering it
    if (i != 0)
    {
        InsertIntoTableQuery.AddIntPrm();
    }
    // if our column has an entry, add it into our table.
    if (row[i] != null)
    {
        InsertIntoTableQuery.Prms[i].Val = row[i];
    }
}

No! I figured out why I was erroring and this wasn't the cause. You can cancel column generation in the WPF DataGrid without actually altering the table's indexes. In hindsight that's pretty obvious.

Adding DataRow[] data to each column in an unbound DataGridView?

I am trying to query my DataSet and display the results in an unbound DataGridView. I feel like I am quite close with my programming logic here, but I keep getting the error ArgumentOutOfRange Exception. Index was out of range. Must be non-negative and less than the size of the collection.
My code snippet:
DataRow[] foundRows;
//Queries the Reservations table with the 'searchExpression' variable
foundRows = this.reservationMasterDataSet.Tables["Reservations"].Select(searchExpression);
//If there is at least one record found...
if (foundRows.Length > 0)
{
    //Used to count our row indexes
    int i = 0;
    //Populate the DataGridView with the queried response
    foreach (DataRow row in foundRows)
    {
        //Used to count our column indexes
        for (int j = 0; j < reservationMasterDataSet.Tables["Reservations"].Columns.Count; j++)
        {
            //THIS LINE IS THROWING AN EXCEPTION
            dataGridView1.Rows[i].Cells[j].Value = row.ItemArray[j];
        }
        i++;
    }
}
My DataRow contains 12 objects so I made sure that the DataGridView has 12 columns to correspond (and there are 12 in the original database). I think I am getting the exception right away (i is still 0 in debugger). I first tried it using just row[i] but got the same error.
This is meant to be a search results pane, not an editable thing, which is why I want to only return certain results. I figured the DataGridView is the nicest and easiest way to layout the record on a Windows form.

Before you access dataGridView1.Rows[i].Cells[j], you need to make sure dataGridView1.Rows[i] exists. If it does not, you need to add it to the DataGridViewRowCollection first.
You can find a lot of samples on this page.

How to find column count in a TableHeaderRow with only table id?

I have a situation where I want to make a generic codebehind function to show a message row that spans the whole table. I have previously passed the Table object and the number of columns so that it could set the column span, but this is somewhat error prone as we sometimes add new columns and I have to update the column count numbers for the messages.
There doesn't seem to be any column count in the Table object and neither any way to get the TableHeaderRow that has been added in the aspx file. I'd like to avoid having to add id's to all the TableHeaderRow's as well.

Try this, should work (assuming the TableHeaderRow is the first child of the Table):
int j = 0;
foreach (Control current in tableId.Controls[0].Controls)
{
    if (current.ToString() == "System.Web.UI.WebControls.TableHeaderCell")
    {
        j++;
    }
}

Error: Deleted row information cannot be accessed through the row

To whom this may concern, I have searched a considerable amount of time, to work a way out of this error
"Deleted row information cannot be accessed through the row"
I understand that once a row has been deleted from a datatable that it cannot be accessed in a typical fashion and this is why I am getting this error. The big issue is that I am not sure what to do to get my desired result, which I will outline below.
Basically, when a row in "dg1" is deleted, the row beneath it takes the place of the deleted row (obviously) and thus inherits the deleted row's index. The purpose of this method is to replace and reset the index of the row that took the deleted row's place (by grabbing it from the corresponding value in the dataset).
Right now I am just using a label (lblTest) to try and get a response from the process, but it crashes when the last nested if statement tries to compare values.
Here is the code:
void dg1_Click(object sender, EventArgs e)
{
    rowIndex = dg1.CurrentRow.Index; //gets the current row's index
    string value = Convert.ToString(dg1.Rows[rowIndex].Cells[0].Value);
    if (ds.Tables[0].Rows[rowIndex].RowState.ToString() == "Deleted")
    {
        for (int i = 0; i < dg1.Rows.Count; i++)
        {
            if (Convert.ToString(ds.Tables[0].Rows[i][0].ToString()) == value)
            // ^ **where the error is occurring**
            {
                lblTest.Text = "Aha!";
                //when working, will place index of compared dataset value into rowState, which is displaying the current index of the row I am focussed on in 'dg1'
            }
        }
    }
}
Thanks ahead of time for the help, I really did search, and if it is easy to figure out through a simple google search then allow myself to repeatably hate on me, because I DID try.
gc

You can also use the DataSet's AcceptChanges() method to apply the deletes fully.
ds.Tables[0].Rows[0].Delete();
ds.AcceptChanges();

The current value for the data column in the inner if statement will not be available for deleted rows. To retrieve a value for deleted rows, specify that you want the original value. This should fix your error:
if (Convert.ToString(ds.Tables[0].Rows[i][0, DataRowVersion.Original].ToString()) == value)
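The behaviour is easy to reproduce on a small table. In this sketch (the table contents and the DeletedRowDemo name are illustrative only), a row is deleted after AcceptChanges() has been called, so it stays in the Rows collection with a RowState of Deleted. Reading its current value throws, while the original version is still readable.

```csharp
using System;
using System.Data;

public static class DeletedRowDemo
{
    // Builds a two-row table and marks the first row as deleted.
    public static DataTable BuildTableWithDeletedRow()
    {
        var table = new DataTable("Items");
        table.Columns.Add("Id", typeof(string));
        table.Rows.Add("A");
        table.Rows.Add("B");
        table.AcceptChanges();   // both rows are now Unchanged
        table.Rows[0].Delete();  // marked Deleted, but still in the collection
        return table;
    }

    // Reads column 0 of a row whether or not the row has been deleted.
    public static string ReadFirstColumn(DataRow row)
    {
        return row.RowState == DataRowState.Deleted
            ? Convert.ToString(row[0, DataRowVersion.Original])
            : Convert.ToString(row[0]);
    }
}
```

Here ReadFirstColumn(table.Rows[0]) returns "A", whereas table.Rows[0][0] throws a DeletedRowInaccessibleException. Calling table.AcceptChanges() afterwards removes the deleted row from the collection entirely.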

In your "crashing if", you can check whether the row is deleted before accessing its values:
if (ds.Tables[0].Rows[i].RowState != DataRowState.Deleted &&
    Convert.ToString(ds.Tables[0].Rows[i][0].ToString()) == value)
{
    // blaaaaa
}
Also, I'm not sure why you call ToString() on the RowState instead of comparing it to DataRowState.Deleted.

After deleting the row, rebind your grid to the DataTable. There is no need to reset the index manually; the DataTable handles it.
So you only need to rebind the grid's DataSource.
