I have a worksheet with a number of rows and several columns. I want to delete all duplicate rows in this worksheet. In other words, the highlighted rows in this screenshot should be deleted, and the rows below should be moved up:
and should result in the following:
I'm using the following snippet of code:
List<int> rowsToDelete = new List<int>();
for (int row = 1; row <= worksheet.Dimension.End.Row; row++)
{
string a = worksheet.Cells[row,1].Value.ToString();
string b = worksheet.Cells[row,2].Value.ToString();
string c = worksheet.Cells[row,3].Value.ToString();
int i = row + 1;
while (worksheet.Cells[i,1].Value.ToString().Equals(a) &&
worksheet.Cells[i,2].Value.ToString().Equals(b) &&
worksheet.Cells[i,3].Value.ToString().Equals(c))
{
rowsToDelete.Add(i);
i++;
}
}
foreach (var row in rowsToDelete)
{
worksheet.DeleteRow(row);
}
It is not deleting the correct rows. How can I fix this?
This is using Epplus 4.5.3.3 and .NET Framework 4.6.1
I can only assume you are misunderstanding my comment in reference to the posted while statement…
while (worksheet.Cells[i,1].Value.ToString().Equals(a) &&
worksheet.Cells[i,2].Value.ToString().Equals(b) &&
worksheet.Cells[i,3].Value.ToString().Equals(c)) { …
This will work ONLY if the duplicate rows are contiguous. For example, using the first posted picture, let’s assume there is a row nine (9) and that this row holds the same “duplicate” cell values “a”, “b” and “c” as row 1. When the while loop starts for row 1, row 2 evaluates to true because it is a duplicate of row 1, so row index 2 is added to the list. On the next iteration of the while loop, row 3 is added as a duplicate. However, when we get to row 4, the while condition evaluates to false because row 4 is NOT a duplicate of row 1. The while loop therefore exits and the code loops back up to the initial for loop to check the next row for duplicates. The duplicate at row 9 is never compared against row 1, so it is left in the worksheet as a duplicate row.
The point is that you do NOT want to stop checking for duplicate rows if one of the rows is NOT a duplicate. You need to continue through all the rows as the duplicate row could be on ANY row.
It should also be noted that it may be helpful to avoid “checking” a row that has already been marked as a duplicate. For example, using the same first picture, the first pass through the rows for the “first” row will add rows 2 and 3 as “duplicate” rows. When the while loop exits and we loop back up to the next row to check, it will be row 2. However, row 2 is ALREADY marked as a duplicate, so there is no need to check that row for duplicates. In the solution below, a check is made to see whether the row we are about to check is already marked as a duplicate. If it IS marked as a duplicate, we simply skip that row.
Next, the last foreach loop that actually removes the rows has a problem of its own. For example, let’s say the list of rows to remove contains rows 2, 3 and 7. Inside the foreach loop, the code first removes row 2. After this row is removed, the original row 3 is now row 2, the original row 4 is now row 3, and so on. On the next iteration the loop asks to remove row 3, but the original row 3 is NOW row two (2), so the row actually removed is the original row 4. I hope it is clear that removing the rows in a top-down fashion WILL NOT WORK, because as soon as the first row is removed, all the row indexes below that row change.
So, if we want to delete the proper rows in the list of row indexes, we can remove the rows in a bottom-up fashion. Removing a row never changes the indexes of the rows above it, so when we work from the bottom up we do not have to worry about getting the indexes mixed up as we do when deleting top-down.
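To make the index shifting concrete, here is a minimal sketch that uses a plain List<string> as a stand-in for the worksheet rows (no EPPlus involved). The class and method names are made up for illustration, and RemoveAt is assumed to behave like a row deletion in that everything below the removed item moves up by one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowDeletionDemo
{
    // Deletes the given 1-based row numbers in exactly the order supplied,
    // the way repeated row deletions would act on a worksheet.
    public static List<string> DeleteRows(IEnumerable<string> rows, IEnumerable<int> rowNumbers)
    {
        var remaining = new List<string>(rows);
        foreach (int rowNumber in rowNumbers)
        {
            remaining.RemoveAt(rowNumber - 1); // every row below shifts up by one
        }
        return remaining;
    }

    public static void Main()
    {
        // Ten stand-in rows labelled with their original row numbers.
        List<string> rows = Enumerable.Range(1, 10).Select(n => "r" + n).ToList();
        var toDelete = new List<int> { 2, 3, 7 };

        // Top-down: only the first deletion hits the intended row.
        Console.WriteLine(string.Join(",", DeleteRows(rows, toDelete)));
        // prints r1,r3,r5,r6,r7,r8,r10 (r2, r4 and r9 were removed)

        // Bottom-up: highest row number first, so the remaining indexes stay valid.
        Console.WriteLine(string.Join(",", DeleteRows(rows, toDelete.OrderByDescending(n => n))));
        // prints r1,r4,r5,r6,r8,r9,r10 (r2, r3 and r7 were removed)
    }
}
```

With the top-down order, two of the three deletions remove the wrong row; with the descending order, all three remove the intended one.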
Given all this, I suggest you break this problem up into two steps. The first step simply fills the list of duplicate rows. Bear in mind that, since we check for duplicate rows in a top-down fashion, the list of row indexes will not necessarily be in order. For example, if we add a duplicate row 9 as previously suggested, then the list of row indexes to delete would be { 2, 3, 9, 7 }. The 9 is BEFORE the 7 because row 9 was found to be a duplicate of row 1, while row 7 was found later to be a duplicate of row 6. The point here is that the list may not be ordered, and deleting from an unordered list runs into the index problems described above.
Therefore, after we get the list of row indexes to delete, we will SORT the list. This gives { 2, 3, 7, 9 }. At this point we could simply start deleting the rows from the bottom of the list on up, OR, as in the example below, we can REVERSE the list so it becomes { 9, 7, 3, 2 }. Then we have a list of ints ordered from high to low. NOW the foreach loop through the list should work without mixing up the row indexes.
To help, I suggest you create a method that takes an open worksheet and returns our “unsorted” list of row indexes we want to delete. To simplify things, all the code does is add the row indexes of duplicate rows. Walking through the code below we start by looping through all the rows in the worksheet. If we get to a row that has already been marked as a duplicate, then we will skip that row.
If the row is not marked as a duplicate, then the code will start another for loop that starts on the next row and ends on the last row. Again if we get to a row that has already been marked as a duplicate, then we will skip that row. Once the code has looped through all the rows we simply return the list of row indexes to delete.
private List<int> GetDuplicateRowsToDelete(ExcelWorksheet worksheet) {
    List<int> rowsToDelete = new List<int>();
    string a, b, c;
    for (int i = 1; i <= worksheet.Dimension.End.Row; i++) {
        if (!rowsToDelete.Contains(i)) {
            a = worksheet.Cells[i, 1].Value.ToString();
            b = worksheet.Cells[i, 2].Value.ToString();
            c = worksheet.Cells[i, 3].Value.ToString();
            for (int j = i + 1; j <= worksheet.Dimension.End.Row; j++) {
                if (!rowsToDelete.Contains(j)) {
                    if (worksheet.Cells[j, 1].Value.ToString().Equals(a) &&
                        worksheet.Cells[j, 2].Value.ToString().Equals(b) &&
                        worksheet.Cells[j, 3].Value.ToString().Equals(c)) {
                        rowsToDelete.Add(j);
                    }
                }
            }
        }
    }
    return rowsToDelete;
}
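As a possible alternative to the nested loops above (a different technique, not what the method above does), the duplicates can be found in a single pass by remembering every row already seen in a HashSet. The sketch below is an illustration only: it works on rows that have already been read into string arrays, the names DuplicateFinder and FindDuplicateRows are hypothetical, and the sample data is made up so that the duplicates fall on the row numbers discussed earlier.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // rows[k] holds the cell values of worksheet row k + 1.
    // Returns the 1-based numbers of rows that repeat an earlier row, in ascending order.
    public static List<int> FindDuplicateRows(IReadOnlyList<string[]> rows)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<int>();
        for (int i = 0; i < rows.Count; i++)
        {
            // Join with a control character (assumed not to occur in cell text)
            // so that ("ab", "c") and ("a", "bc") produce different keys.
            string key = string.Join("\u001F", rows[i]);
            if (!seen.Add(key)) // Add returns false when the key is already present
            {
                duplicates.Add(i + 1);
            }
        }
        return duplicates;
    }

    public static void Main()
    {
        // Made-up sample data: rows 2, 3 and 9 repeat row 1, and row 7 repeats row 6.
        var rows = new List<string[]>
        {
            new[] { "a", "b", "c" },
            new[] { "a", "b", "c" },
            new[] { "a", "b", "c" },
            new[] { "d", "e", "f" },
            new[] { "g", "h", "i" },
            new[] { "j", "k", "l" },
            new[] { "j", "k", "l" },
            new[] { "m", "n", "o" },
            new[] { "a", "b", "c" },
        };
        Console.WriteLine(string.Join(",", FindDuplicateRows(rows))); // prints 2,3,7,9
    }
}
```

Because each row is visited once in order, the result is already ascending, so in this variant only the reversal (or a backwards loop) would be needed before deleting.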
Finally we could make use of this method to get the indexes to delete, then we will sort and reverse the list, then delete the rows from the bottom up. Something like…
private void button1_Click(object sender, EventArgs e) {
    FileInfo newFile = new FileInfo(@"D:\Test\Excel_Test\RemoveDup1.xlsx");
    using (ExcelPackage pck = new ExcelPackage(newFile)) {
        // Note: EPPlus 4.x defaults to 1-based worksheet indexes on .NET Framework,
        // so this may need to be Worksheets[1] there.
        using (ExcelWorksheet worksheet = pck.Workbook.Worksheets[0]) {
            List<int> rowsToDel = GetDuplicateRowsToDelete(worksheet);
            // Sort ascending, then reverse, so rows are deleted from the bottom up.
            rowsToDel.Sort();
            rowsToDel.Reverse();
            foreach (int rowIndex in rowsToDel) {
                worksheet.DeleteRow(rowIndex);
            }
            pck.Save();
        }
    }
    MessageBox.Show("Removed duplicates complete");
}
I hope this makes sense and helps.
I have solved your issue in another way: I have created two extra columns, "CONCAT" and "COUNT":
"CONCAT" contains the formula =A2&B2&C2 (till the end of the array)
"COUNT" contains the formula =COUNTIF(D$2:D$9,D2) (also till the end of the array)
From then on, just write a VBA macro, checking the values "E9" back to "E2" and in case the value is larger than 1, remove the entire row.
I can't figure out how to solve this. I want to calculate in datagridview
Column 3 like this
using C#
Check this
int sum = 0;
for(int i=0;i<dataGridView1.Rows.Count;i++)
{
sum += Convert.ToInt32(dataGridView1.Rows[i].Cells[0].Value);
}
MessageBox.Show(sum.ToString());
To calculate values from your DataGridView, there are several approaches.
For example, you can go through the rows and reference the cell in each row by column index or column name.
Adding up the values in "Column 3" would be:
var rows = dataGridView1.Rows;
double sum = 0;
foreach (DataGridViewRow row in rows)
{
    // Convert.ToDouble also handles boxed ints and returns 0 for a null cell value.
    sum += Convert.ToDouble(row.Cells["Column 3"].Value);
}
EDIT:
If you're using a DataTable (which I presume from your comments), you can get individual cell data either by referencing the row first and then indexing into it by column, or by storing the row in a temporary DataRow variable and reading the value from that. The records in a DataTable are basically just a collection of rows.
Example:
int col3 = workTable.Columns.IndexOf("Column 3"); // zero-based index of the column
double amount = Convert.ToDouble(workTable.Rows[0][col3]); // 1st value in column 3
OR (extended for clarity)
DataRow row1 = workTable.Rows[0]; // rows are zero-based too
double amount = Convert.ToDouble(row1[col3]);
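As a self-contained illustration of this kind of DataTable access, the sketch below builds a small table and reads from it. The table contents and the names DataTableIndexDemo, BuildTable and SumColumn are made up for the example; the point it shows is that both the row collection and Columns.IndexOf are zero-based, and that a column can be totalled by looping over the rows.

```csharp
using System;
using System.Data;

public static class DataTableIndexDemo
{
    // Builds a small sample table (made-up data) with a numeric "Column 3".
    public static DataTable BuildTable()
    {
        var table = new DataTable();
        table.Columns.Add("Column 1", typeof(string));
        table.Columns.Add("Column 2", typeof(int));
        table.Columns.Add("Column 3", typeof(double));
        table.Rows.Add("apples", 2, 1.5);
        table.Rows.Add("pears", 4, 2.25);
        return table;
    }

    // Adds up one column, skipping cells that hold DBNull.
    public static double SumColumn(DataTable table, string columnName)
    {
        double sum = 0;
        foreach (DataRow row in table.Rows)
        {
            if (!row.IsNull(columnName))
            {
                sum += Convert.ToDouble(row[columnName]);
            }
        }
        return sum;
    }

    public static void Main()
    {
        DataTable table = BuildTable();

        int col3 = table.Columns.IndexOf("Column 3");          // 2: third column, zero-based
        double first = Convert.ToDouble(table.Rows[0][col3]);  // 1.5: first row is Rows[0]
        double total = SumColumn(table, "Column 3");           // 1.5 + 2.25

        Console.WriteLine(col3);
        Console.WriteLine(first);
        Console.WriteLine(total);
    }
}
```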
If you need to transpose values into a DataGridView, you can just set the DataSource property of a DataGridView instance i.e.
dataGridView1.DataSource = newFilledTable;
It's kind of a subjective question to ask, but I still hope I will find help.
I was learning about merging two tables from a database into a single DataTable. Then I came across the following block of code.
DataTable mdt = new DataTable();
mdt.Columns.Add("product");
mdt.Columns.Add("price");
for (int i = 0; i < A.Rows.Count; i++)
{
DataRow dr = mdt.NewRow();
dr["product"] = A.Rows[i]["product"].ToString();
dr["price"] = A.Rows[i]["price"].ToString();
//A is a DataTable
mdt.Rows.Add(dr);
}
I can understand that the data is being added to a row of a DataTable.
This is what I understood:
The product column of the new row is assigned a value through dr["product"]. Correct me if I am wrong. But how does A.Rows[i]["product"].ToString(); work?
This should help:
A = DataTable
A.Rows = Collection of DataRows in A
A.Rows[i] = i-th row from collection
A.Rows[i]["product"] = Column "product" in row (return type of expression is object)
So when you do dr["product"] = A.Rows[i]["product"].ToString();, you are assigning the value of the product column of the current row from datatable A to the product column in your new data row. Similarly for the price column.
Rows[i] is the row at index i. On the first pass through the loop i is 0, on the second pass it is 1, and so on. So on the first pass the product and price values are read from the row at index 0 of A and added to mdt as its first row. Similarly, on the second pass they are read from the row at index 1 and added as the second row.
As for the second part, i.e. ["product"].ToString(), it simply converts the value of the product column to a string.
EDIT
Since A is a DataTable that is already filled, what that statement does is take the i-th row from A's row collection, read the value in its "product" column, convert that value to a string, and assign it to the "product" column of the new DataRow.
The ["product"].ToString() is taking the value of the cell with the column name product, and converting it to a string to be assigned to your new row.
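The same loop can be tried outside a database with a small hand-built table. This is only a sketch under assumptions: the source table, its values and the names CopyRowsDemo and CopyAsStrings are invented for illustration. It stores each indexer result in an object variable first, to show that A.Rows[i]["product"] returns an object and that ToString() is what turns it into the string written to the new row.

```csharp
using System;
using System.Data;

public static class CopyRowsDemo
{
    // Copies "product" and "price" from source into a new table, converting each value to a string.
    public static DataTable CopyAsStrings(DataTable source)
    {
        var target = new DataTable();
        target.Columns.Add("product"); // no type given, so the column type defaults to string
        target.Columns.Add("price");

        for (int i = 0; i < source.Rows.Count; i++)
        {
            DataRow sourceRow = source.Rows[i];       // i-th row of the source table
            object rawProduct = sourceRow["product"]; // the indexer returns object
            object rawPrice = sourceRow["price"];     // a boxed int in this sample

            DataRow dr = target.NewRow();
            dr["product"] = rawProduct.ToString();
            dr["price"] = rawPrice.ToString();
            target.Rows.Add(dr);
        }
        return target;
    }

    public static void Main()
    {
        // Made-up stand-in for the filled DataTable called A in the question.
        var a = new DataTable();
        a.Columns.Add("product", typeof(string));
        a.Columns.Add("price", typeof(int));
        a.Rows.Add("keyboard", 250);
        a.Rows.Add("mouse", 120);

        DataTable copy = CopyAsStrings(a);
        Console.WriteLine(a.Rows[1]["price"].GetType().Name);    // prints Int32
        Console.WriteLine(copy.Rows[1]["price"].GetType().Name); // prints String
        Console.WriteLine(copy.Rows[1]["price"]);                // prints 120
    }
}
```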
I have a datatable with one column:
this.callsTable.Columns.Add("Call", typeof(String));
I then want to add a row to that datatable, but want to give a specific index, the commented number is the desired index:
this.callsTable.Rows.Add("Legs"); //11
Update:
Must be able to handle inputting hundreds of rows with unique
indexes.
The index must be what is defined by me no matter if there are enough
rows in the table or not for the insertat function.
You can use DataTable.Rows.InsertAt method.
DataRow dr = callsTable.NewRow(); //Create New Row
dr["Call"] = "Legs"; // Set Column Value
callsTable.Rows.InsertAt(dr, 11); // InsertAt specified position
See: DataRowCollection.InsertAt Method
If the value specified for the pos parameter is greater than the
number of rows in the collection, the new row is added to the end.
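The documented behaviour quoted above can be checked with a small sketch. The sample call names and the helper names (InsertAtDemo, BuildCallsTable, InsertCall) are made up for illustration. It shows that an in-range position is honoured, while a position past the end silently becomes "append", so the row does not end up at index 11 unless the table already has at least 11 rows.

```csharp
using System;
using System.Data;

public static class InsertAtDemo
{
    // Builds a one-column table like the one in the question, pre-filled with the given calls.
    public static DataTable BuildCallsTable(params string[] calls)
    {
        var table = new DataTable();
        table.Columns.Add("Call", typeof(string));
        foreach (string call in calls)
        {
            table.Rows.Add(call);
        }
        return table;
    }

    // Inserts a row at the requested position and returns the index it actually received.
    public static int InsertCall(DataTable table, string call, int position)
    {
        DataRow dr = table.NewRow();
        dr["Call"] = call;
        table.Rows.InsertAt(dr, position);
        return table.Rows.IndexOf(dr);
    }

    public static void Main()
    {
        DataTable table = BuildCallsTable("Arms", "Chest", "Back");

        // Position within the current row count: the row lands exactly there.
        Console.WriteLine(InsertCall(table, "Legs", 1));  // prints 1

        // Position beyond the row count (now 4): the row is appended at the end.
        Console.WriteLine(InsertCall(table, "Core", 11)); // prints 4
    }
}
```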
I am using a DataGridView to display some data in my application.
The data in the table gets changed dynamically according to the users input.
I am able to retrieve the data according to the user.
I have to add an extra column named ID and fill in the values serially starting from 1 to the number of rows which are generated dynamically.
I had added the column using dgrid.columns.add("UID");
But how to insert values at runtime?
Seeing your code, it is not correct to do:
dgrid.Columns.Add("UID");
You will have to do:
dgrid.Columns.Add("uidColumn", "UID");
To modify/add the value of an existing cell, if the row already exists, you can do:
dgrid.Rows[0].Cells["uidColumn"].Value = myValue;
That will modify the value of the column with name uidColumn and row 0. According to your problem, all you have to do is:
for (int i = 0; i < dgrid.Rows.Count; i++) {
dgrid.Rows[i].Cells["uidColumn"].Value = GetValueOfRow(i);
}
supposing that you have a method GetValueOfRow that receives a row index and returns the value you need in the ID column in that row.