Remove duplicates data from a dataset based on 2 column values

Here is my code. I want to compare a column in two datasets and exclude the records that appear in both. The code below excludes the matching records, but it also generates duplicate rows.
for (int i = 0; i < ds2.Tables[0].Rows.Count; i++)
{
    foreach (DataRow dr2 in ds1.Tables[0].Rows)
    {
        string table2 = dr2["code"].ToString();
        if (i.ToString() != table2.ToString())
        {
            dsnew.Tables[0].Rows.Add(i);
        }
    }
}

This is a strange comparison: you don't seem to be using the records in ds2, only its row count.
I think you are getting duplicates because you compare each element of ds2 with one element of ds1 at a time and add it to dsnew on every mismatch.
Instead, compare each element of ds2 with all elements of ds1 before deciding whether to place it in dsnew:
for (int i = 0; i < ds2.Tables[0].Rows.Count; i++)
{
    bool found = false;
    foreach (DataRow dr2 in ds1.Tables[0].Rows)
    {
        string table2 = dr2["code"].ToString();
        if (i.ToString() == table2)
        {
            found = true;
            break; // no need to scan the remaining rows once a match is found
        }
    }
    if (!found) dsnew.Tables[0].Rows.Add(i);
}
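The title asks about two column values, while the answer above compares a single one. A minimal sketch of the two-column case follows. It assumes both tables contain the two named columns; the class name TwoColumnFilter and the method name ExceptByTwoColumns are made up for illustration and do not come from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TwoColumnFilter
{
    // Returns a new table holding the rows of 'source' whose (col1, col2) pair
    // does not occur in 'exclude' and has not already been emitted.
    public static DataTable ExceptByTwoColumns(
        DataTable source, DataTable exclude, string col1, string col2)
    {
        // Tuple<string, string> compares by value, so it works as a composite key.
        var seen = new HashSet<Tuple<string, string>>();
        foreach (DataRow row in exclude.Rows)
        {
            seen.Add(Tuple.Create(Convert.ToString(row[col1]), Convert.ToString(row[col2])));
        }

        DataTable result = source.Clone(); // same schema, no rows
        foreach (DataRow row in source.Rows)
        {
            var key = Tuple.Create(Convert.ToString(row[col1]), Convert.ToString(row[col2]));
            // HashSet.Add returns false when the key is already present,
            // which filters both excluded pairs and repeats within 'source'.
            if (seen.Add(key))
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

Called as TwoColumnFilter.ExceptByTwoColumns(ds2.Tables[0], ds1.Tables[0], "code", "name"), where "name" is a hypothetical second column, this would return the rows of ds2 that have no counterpart in ds1, each pair at most once.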

Related

Unable to move row from one datatable to another

I asked a question previously about the errors in this code. After applying the suggestions, the errors are gone, but the data from the row in the queue table still does not move to the missedQueue table.
I'm not sure why it won't work.
This is my code:
DataSet queue = DBMgr.GetDataSet("SELECT * FROM queue");
DataTable missedQueue = queue.Tables[0].Clone();
DataRow dr = queue.Tables[0].NewRow();
for (int i = 0; i < queue.Tables[0].Columns.Count; i++)
{
    dr[queue.Tables[0].Columns[i].ColumnName] = queue.Tables[0].Rows[0][i];
}
missedQueue.Rows.Add(dr.ItemArray);
}

Your DataRow should be created from the missedQueue table. Create it once, copy every column inside the loop, and add the row after the loop; creating and adding a row inside the loop would produce one row per column:
DataRow dr = missedQueue.NewRow();
for (int i = 0; i < queue.Tables[0].Columns.Count; i++)
{
    dr[queue.Tables[0].Columns[i].ColumnName] = queue.Tables[0].Rows[0][i];
}
missedQueue.Rows.Add(dr);
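As an alternative to copying column by column, DataTable.ImportRow copies a whole row into a table with a matching schema. The sketch below assumes the target was created with Clone(), as in the question; the RowMover class and MoveRow method are hypothetical names. It only changes the in-memory tables, so persisting the move to the database would still need separate INSERT and DELETE commands.

```csharp
using System.Data;

public static class RowMover
{
    // Copies 'row' into 'target' (which must share the source schema, e.g. via Clone())
    // and then removes it from the table it currently belongs to.
    public static void MoveRow(DataRow row, DataTable target)
    {
        target.ImportRow(row);      // copies the values and row state into target
        row.Table.Rows.Remove(row); // removes the row from its original table
    }
}
```

For the code in the question, the call would look like RowMover.MoveRow(queue.Tables[0].Rows[0], missedQueue).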

How to read data fast from an excel and convert it to list from file stream

I am using EPPlus.
The Excel file I am uploading has its column headers in row 2. The data starts at row 4 and may run to about 2,000 records.
The way I am doing it, reading those 2,000 records and putting them into a list takes a long time.
using (var excel = new ExcelPackage(hpf.InputStream))
{
    var ws = excel.Workbook.Worksheets["Sheet1"];
    //Read the file into memory
    for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
    {
        if (!ws.Cells[rw, 1, rw, 24].All(c => c.Value == null))
        {
            int headerRow = 2;
            GroupMembershipUploadInput gm = new GroupMembershipUploadInput();
            for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
            {
                var s = ws.Cells[rw, col].Value;
                if (ws.Cells[headerRow, col].Value.ToString().Equals("Existing Constituent Master Id"))
                {
                    gm.cnst_mstr_id = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                }
                else if (ws.Cells[headerRow, col].Value.ToString().Equals("Prefix of the constituent(Mr, Mrs etc)"))
                {
                    gm.cnst_prefix_nm = (ws.Cells[rw, col].Value ?? (Object)"").ToString();
                }
            }
            lgl.GroupMembershipUploadInputList.Add(gm);
        }
    }
}
GroupMembershipUploadInputList is the list of GroupMembershipUploadInput objects to which I add the Excel values after reading them cell by cell.
Can this be done faster? What am I missing here?
Please help me improve the performance.

You are doing a lot of iterations there: for every row you loop over every column and re-read the header cell each time. I assume that you only need those two values per row; if so, the following code should reduce the time drastically:
using (var excel = new ExcelPackage(hpf.InputStream))
{
    var ws = excel.Workbook.Worksheets["Sheet1"];
    int headerRow = 2;
    // hold the column index based on the value in the header
    int col_cnst_mstr_id = 2;
    int col_cnst_prefix_nm = 4;
    // loop once over the columns to fetch the column index
    for (int col = ws.Dimension.Start.Column; col <= ws.Dimension.End.Column; col++)
    {
        if ("Existing Constituent Master Id".Equals(ws.Cells[headerRow, col].Value))
        {
            col_cnst_mstr_id = col;
        }
        if ("Prefix of the constituent(Mr, Mrs etc)".Equals(ws.Cells[headerRow, col].Value))
        {
            col_cnst_prefix_nm = col;
        }
    }
    //Read the file into memory
    // loop over all rows
    for (int rw = 4; rw <= ws.Dimension.End.Row; rw++)
    {
        // check if both values are not null
        if (ws.Cells[rw, col_cnst_mstr_id].Value != null &&
            ws.Cells[rw, col_cnst_prefix_nm].Value != null)
        {
            // the correct cell will be selected based on the column index;
            // Convert.ToString also handles numeric cells, where a (string) cast would throw
            var gm = new GroupMembershipUploadInput
            {
                cnst_mstr_id = Convert.ToString(ws.Cells[rw, col_cnst_mstr_id].Value),
                cnst_prefix_nm = Convert.ToString(ws.Cells[rw, col_cnst_prefix_nm].Value)
            };
            lgl.GroupMembershipUploadInputList.Add(gm);
        }
    }
}
I removed the inner column loop and moved it to the start of the method, where it is used only to find the column index of each field you're interested in. The expensive null check over the whole row can also be reduced to the two relevant cells. Fetching a value is then a simple index lookup in the row.
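If more than two fields are needed, the same idea can be generalized by building a header-to-index map once. The sketch below uses a plain, zero-based object[,] grid as a stand-in for the worksheet so that it runs without EPPlus (EPPlus cells are one-based); the HeaderIndex class and its Build method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderIndex
{
    // Builds a header-text -> column-index map from one row of a grid.
    public static Dictionary<string, int> Build(object[,] grid, int headerRow)
    {
        var map = new Dictionary<string, int>();
        for (int col = 0; col < grid.GetLength(1); col++)
        {
            // Convert.ToString returns "" for a null cell
            string header = Convert.ToString(grid[headerRow, col]);
            if (!string.IsNullOrEmpty(header))
            {
                map[header] = col; // on duplicate headers the last occurrence wins
            }
        }
        return map;
    }
}
```

Each data row can then read map["Existing Constituent Master Id"] and similar keys directly instead of comparing header strings again for every cell.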

DataTable normalize blanks cells

I have not found a way to normalize a DataTable that came from an Excel sheet with merged cells. When I load the DataTable from that Excel file, only the first cell of each merged range has the value; the others are blank.
[The example DataTable and the expected result were shown as images in the original post.]
To summarize: each blank cell should be filled with the value of the nearest cell above it that has a value, since that is what the merged cells in Excel represented.
I'm using Excel.dll to read the file, and it doesn't auto-fill the cells, which is why I'm looking for a way to do this in C#.
I suppose the logic should be: if a cell is blank, use the value of the cell above it. The logic seems clear, but I'm having trouble writing the code for it.
This is only a sample; in the end I'm looking for a method that works however many columns or rows the DataTable has.
Edit:
Thanks for your quick feedback.
Attached is what I have so far. It handles only one column and has errors, since it doesn't take care of the first and last rows, but it shows the idea. What I'm trying to achieve is a method that works for any number of columns and rows (fixed column names would be fine; if I get more columns I will adapt it).
private void NormalizeDataTable(DataTable dtRawTable)
{
    DataTable dtFinalized = new DataTable();
    dtFinalized.Columns.Add("Col1", typeof(String));
    string previousValue = "";
    for (int index = 0; index <= dtRawTable.Rows.Count; index++)
    {
        DataRow dr = dtFinalized.NewRow();
        if (index != 0 || index == dtRawTable.Rows.Count -1)
        {
            if (dtRawTable.Rows[index]["Modelo"].ToString() == "")
            {
                dr["Col1"] = previousValue;
            }
            else
            {
                dr["Col1"] = Convert.ToString(dtRawTable.Rows[index]["Modelo"].ToString());
                previousValue = (string)dr["Col1"];
            }
        }
        dtFinalized.Rows.Add(dr);
        dtFinalized.AcceptChanges();
    }
}

Here is the function I use in my project for the same requirement.
public static DataTable AutoFillBlankCellOfTable(DataTable outputTable)
{
    for (int i = 0; i < outputTable.Rows.Count; i++)
    {
        for (int j = 0; j < outputTable.Columns.Count; j++)
        {
            if (outputTable.Rows[i][j] == DBNull.Value)
            {
                if (i > 0)
                    outputTable.Rows[i][j] = outputTable.Rows[i - 1][j];
            }
        }
    }
    return outputTable;
}
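The function above only treats DBNull.Value as blank, while the question's own code compares against an empty string. A minimal variant that handles both is sketched here, assuming string-typed columns as in the question; BlankCellFiller and FillDown are hypothetical names.

```csharp
using System;
using System.Data;

public static class BlankCellFiller
{
    // Fills every blank cell (DBNull or empty string) with the value directly above it.
    // Rows are processed top to bottom, so the row above has already been filled
    // and a run of blanks inherits the nearest non-blank value.
    public static void FillDown(DataTable table)
    {
        // Start at 1: the first row has nothing above it to copy from.
        for (int i = 1; i < table.Rows.Count; i++)
        {
            for (int j = 0; j < table.Columns.Count; j++)
            {
                // Convert.ToString yields "" for DBNull.Value
                if (string.IsNullOrEmpty(Convert.ToString(table.Rows[i][j])))
                {
                    table.Rows[i][j] = table.Rows[i - 1][j];
                }
            }
        }
    }
}
```

Because it loops over all rows and columns of whatever table it receives, it does not depend on a fixed column name such as "Modelo".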

Remove blank row from dataset

ds.Tables.Add(dt);
return ds;
In the above code snippet, how can I return my dataset but exclude all blank rows, i.e. rows with null or an empty string in all of their columns?

You will have to do that check beforehand and then return the DataTable, something like the example below:
for (int i = dt.Rows.Count - 1; i >= 0; i--)
{
    if (dt.Rows[i]["col1"] == DBNull.Value && dt.Rows[i]["col2"] == DBNull.Value)
    {
        dt.Rows[i].Delete();
    }
}
dt.AcceptChanges();
ds.Tables.Add(dt);
return ds;

In case anyone stumbles across this article, this is the solution I came up with:
// REMOVE ALL EMPTY ROWS
dt_Parsed.Rows.Cast<DataRow>().ToList()
    .FindAll(Row => String.IsNullOrEmpty(String.Join("", Row.ItemArray)))
    .ForEach(Row => dt_Parsed.Rows.Remove(Row));

Here is a helper function: pass it the table from which you want to delete the rows whose columns are all empty. (I assumed all columns are of type string; in that case it will work.)
For other types you can check the DataColumn type and do the relevant check.
public DataTable DeleteEmptyRows(DataTable dt)
{
    DataTable formattedTable = dt.Copy();
    List<DataRow> drList = new List<DataRow>();
    foreach (DataRow dr in formattedTable.Rows)
    {
        int count = dr.ItemArray.Length;
        int nullcounter = 0;
        for (int i = 0; i < dr.ItemArray.Length; i++)
        {
            if (dr.ItemArray[i] == null || string.IsNullOrEmpty(Convert.ToString(dr.ItemArray[i])))
            {
                nullcounter++;
            }
        }
        if (nullcounter == count)
        {
            drList.Add(dr);
        }
    }
    for (int i = 0; i < drList.Count; i++)
    {
        formattedTable.Rows.Remove(drList[i]);
    }
    formattedTable.AcceptChanges();
    return formattedTable;
}

You can loop over the DataTables in the DataSet and call this method on each:
public void Clear_DataTableEmptyRows(DataTable dataTableControl)
{
    for (int i = dataTableControl.Rows.Count - 1; i >= 0; i--)
    {
        DataRow currentRow = dataTableControl.Rows[i];
        // the row is empty only if every column is null or an empty string
        bool isEmpty = true;
        foreach (var colValue in currentRow.ItemArray)
        {
            if (!string.IsNullOrEmpty(colValue.ToString()))
            {
                isEmpty = false;
                break;
            }
        }
        if (isEmpty)
            dataTableControl.Rows[i].Delete();
    }
}
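The loop over the DataSet's tables mentioned above could look roughly like the following sketch. It assumes none of the rows is already in the Deleted state (reading ItemArray on a deleted row throws); EmptyRowCleaner, IsBlank and RemoveBlankRows are hypothetical names.

```csharp
using System;
using System.Data;
using System.Linq;

public static class EmptyRowCleaner
{
    // A row counts as blank when every cell is DBNull or an empty string.
    public static bool IsBlank(DataRow row)
    {
        return row.ItemArray.All(v => string.IsNullOrEmpty(Convert.ToString(v)));
    }

    // Removes blank rows from every table in the DataSet.
    public static void RemoveBlankRows(DataSet ds)
    {
        foreach (DataTable table in ds.Tables)
        {
            // Iterate backwards so removals do not shift the indexes still to be visited.
            for (int i = table.Rows.Count - 1; i >= 0; i--)
            {
                if (IsBlank(table.Rows[i]))
                {
                    table.Rows.RemoveAt(i);
                }
            }
        }
    }
}
```

Calling EmptyRowCleaner.RemoveBlankRows(ds) just before return ds would then cover the case in the question.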

Delete a data table from data set in c#

In C#, I want to delete a DataTable from a DataSet if all the values in the DataTable are zero.
How can I achieve this?
I am using this code to add values to the DataTable:
for (int row = startRowParcel + 1; row <= endRowParcel; row++) {
    List<string> rateRow = new List<string>();
    for (int col = startColumnNsa; col <= endColumnNsa; col++) {
        if (Convert.ToString(ws.Cells[row, col].Value) == null)
            rateRow.Add("0");
        else if (Convert.ToString(ws.Cells[row, col].Value) == "1/2")
            rateRow.Add("0.5");
        else
            rateRow.Add(Convert.ToString(ws.Cells[row, col].Value));
    }
    tbPriority.Rows.Add(rateRow.ToArray());
}
Thanks in advance.

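You can achieve this with the code below:
bool hasNonZero = false;
for (int i = 0; i < dt.Rows.Count && !hasNonZero; i++)
{
    for (int j = 0; j < dt.Columns.Count; j++)
    {
        // the question stores its values as strings, so compare against "0"
        if (Convert.ToString(dt.Rows[i][j]) != "0")
        {
            hasNonZero = true;
            break;
        }
    }
}
if (!hasNonZero)
{
    // every value is zero: remove the table
    ds.Tables.Remove(dt);
}
Iterate through the DataTable and check for non-zero values; if all values are zero, remove the table, otherwise keep it.
Hope this helps.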

This LINQ approach finds all tables where all rows' fields are "0":
var allZeroTables = dsPriorities.Tables.Cast<DataTable>()
    .Where(tbl => tbl.AsEnumerable()
        .All(r => tbl.Columns.Cast<DataColumn>()
            .All(c => r.Field<string>(c) == "0")));
foreach (DataTable zeroTable in allZeroTables.ToList())
    dsPriorities.Tables.Remove(zeroTable);
Enumerable.All is a short circuiting method that stops on the first non-match.
Note that the ToList() is required since you cannot modify the DataSet's DataTableCollection from within the foreach without creating a new collection.

if (!dt.AsEnumerable().Any(x => x.Field<double>("nameOfColumn") != 0)) {
    //delete data table
}
For more information see LINQ query on a DataTable

for (int row = startRowParcel + 1; row <= endRowParcel; row++) {
    List<string> rateRow = new List<string>();
    for (int col = startColumnNsa; col <= endColumnNsa; col++) {
        if (Convert.ToString(ws.Cells[row, col].Value) == null)
            rateRow.Add("0");
        else if (Convert.ToString(ws.Cells[row, col].Value) == "1/2")
            rateRow.Add("0.5");
        else
            rateRow.Add(Convert.ToString(ws.Cells[row, col].Value));
    }
    if (rateRow.Any(x => x != "0"))
        tbPriority.Rows.Add(rateRow.ToArray());
}
// Then check whether tbPriority has any rows; if it doesn't, remove it from the dataset
if (tbPriority.Rows.Count == 0)
{
    // delete dataTable
}
A row is added only if rateRow contains a value other than "0".
