How to avoid adding duplicate rows to a datatable - c#

I'm using a SqlDataReader to add rows one at a time to a DataTable, as follows:
while (reader.Read())
{
dataTable.LoadDataRow(reader.CurrentRow(), LoadOption.PreserveChanges);
}
This works, but I need to be able to avoid adding duplicate rows to the dataTable. I would love to be able to use the Contains or Find methods from the dataTable, but I can't find a way to turn the object[] from reader.CurrentRow() into a DataRow to compare to without adding it to a datatable.
I've looked into the option of making a hashset of the object[]s, and then adding them all at once to the datatable at the end, but I forgot that the default object IEqualityComparer only compares the reference.
Is there a feasible way of doing this without removing the duplicates at the end?
If removing the duplicates is the only way to go, what is the best way to do that?
EDIT:
I'm splitting distinct rows from the database into separate datatables in code. Each row from the query result is distinct, but sections of each row are not. Unfortunately I need to do exactly what my question is asking, as the results from the query are already distinct.

You didn't provide a ton of detail, but I hope this is comprehensive.
If you need a single column to be unique, set Unique on that column when you define it in the table's Columns collection:
DataTable appeals = new DataTable("Appeals");
DataColumn keyField = new DataColumn("AppealNumber", typeof(string));
keyField.Unique = true; // a row repeating an existing AppealNumber is rejected
appeals.Columns.Add(keyField);
If the uniqueness needs to span multiple columns, this is the method:
var myUniqueConstraint = new UniqueConstraint( new DataColumn[] {appeals.Columns[0], appeals.Columns[1], appeals.Columns[2]} );
appeals.Constraints.Add(myUniqueConstraint);
That will enforce the constraints BEFORE you try to commit back to the source database.
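To connect this back to the question's wish to use Find, here is a minimal sketch, assuming a primary key that spans every column that defines a "duplicate". With such a key in place, Rows.Find can test for an existing row before adding, which avoids the ConstraintException that Rows.Add would otherwise throw on a repeat. The table layout, the sample values and the AddIfNew helper are invented for illustration.

```csharp
using System;
using System.Data;

public static class UniqueRowsDemo
{
    // Builds a table whose primary key spans both columns, so the pair
    // (AppealNumber, PriorAppealNumber) must be unique.
    // Assumption: these two columns are the whole row; real tables may differ.
    public static DataTable BuildAppeals()
    {
        var appeals = new DataTable("Appeals");
        DataColumn number = appeals.Columns.Add("AppealNumber", typeof(string));
        DataColumn prior = appeals.Columns.Add("PriorAppealNumber", typeof(string));
        appeals.PrimaryKey = new[] { number, prior };
        return appeals;
    }

    // Hypothetical helper: adds the values only when no row with the same key
    // exists. Returns true when a row was added.
    // Find expects exactly one value per primary-key column, in key order.
    public static bool AddIfNew(DataTable table, object[] keyValues)
    {
        if (table.Rows.Find(keyValues) != null)
            return false; // a row with this key is already present: skip it

        table.Rows.Add(keyValues);
        return true;
    }

    public static void Main()
    {
        DataTable appeals = BuildAppeals();
        AddIfNew(appeals, new object[] { "A-1", "P-1" });
        AddIfNew(appeals, new object[] { "A-1", "P-1" }); // same key, skipped
        AddIfNew(appeals, new object[] { "A-1", "P-2" });
        Console.WriteLine(appeals.Rows.Count);
    }
}
```

If the table has more columns than the key, only the key values would be passed to Find, and the full value array to Rows.Add.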

The easiest way is to make sure there are no duplicate rows in the first place: if you're querying a relational database, use SELECT DISTINCT, which returns only unique rows.
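When DISTINCT cannot do the job (the question's edit describes rows that are distinct as a whole while sections of them are not), the hash-set idea from the question can be made to work by supplying a comparer that looks at the array contents. The following is a sketch under that assumption; RowValuesComparer and LoadDistinct are hypothetical names, and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Compares object[] rows by their contents instead of by reference.
public sealed class RowValuesComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (!object.Equals(x[i], y[i])) return false;
        return true;
    }

    public int GetHashCode(object[] values)
    {
        int hash = 17;
        foreach (object value in values)
            hash = unchecked(hash * 31 + (value == null ? 0 : value.GetHashCode()));
        return hash;
    }
}

public static class DistinctLoadDemo
{
    // Adds each distinct value array to the table once. HashSet.Add returns
    // false when an array with the same contents was already seen.
    // Assumption: every array is a fresh instance, not a reused buffer.
    public static void LoadDistinct(DataTable table, IEnumerable<object[]> rows)
    {
        var seen = new HashSet<object[]>(new RowValuesComparer());
        foreach (object[] values in rows)
            if (seen.Add(values))
                table.Rows.Add(values);
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("City", typeof(string));
        table.Columns.Add("Zip", typeof(int));

        var rows = new List<object[]>
        {
            new object[] { "Springfield", 12345 },
            new object[] { "Springfield", 12345 }, // same contents, different array
            new object[] { "Shelbyville", 54321 },
        };

        LoadDistinct(table, rows);
        Console.WriteLine(table.Rows.Count);
    }
}
```

Inside a reader loop, the arrays passed in would be the per-table slices of each record, so each target table would get its own HashSet.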

Related

How do I populate a DataRow and assign it to one of two tables based on a condition?

I have data being processed by an app which needs to sort the data based on whether or not a bit is flipped. The tables are identical. The code as it stands looks something like this:
DataTable dt2 = dt1.Clone();
DataRow r = dt1.NewRow();
FillUp(ref r);
if(bitISetEarlier)
dt2.ImportRow(r);
else
dt1.ImportRow(r);
Now, a clear problem I was having is that if the row wasn't already attached to a table, ImportRow() fails silently and I end up with an empty table. When I changed this to:
if(bitISetEarlier)
dt2.Rows.Add(r);
else
dt1.Rows.Add(r);
I started getting an exception saying that a function was trying to add a row that existed for another table. So when I tried this:
if(bitISetEarlier)
    if(r.RowState == DataRowState.Detached)
        dt2.Rows.Add(r);
    else dt2.ImportRow(r);
else
    if(r.RowState == DataRowState.Detached)
        dt1.Rows.Add(r);
    else dt1.ImportRow(r);
the exception stopped, but any attempt to assign to dt2 still states that the row belongs to another table, but if I comment out the dt2 if statement and just attempt ImportRow(), the dt2.Rows.Count remains at 0 and no records assigned.
I need to populate the DataRow before knowing which table it belongs in, but I have no idea what columns the row will have before it hits this function. The condition that indicates which table it should go to is not stored with the data in the DataTable.
Is the problem that even though they have identical columns, NewRow() is adding an attribute to the row that makes it incompatible with the sister table? Is there a way I can get the same functionality as NewRow() (copy schema without knowing what any of the columns are ahead of time) but that I can dynamically assign? I'm aware I could probably manually construct a row that is compatible with either table by wrapping it in a for loop and building out the columns from the DataTable.Columns property every time I need a new row, but I'd like to avoid doing that if possible.
I found my solution. Since I can't add the row built off of one table to the other table directly, I figured it was the DataRow object that was problematic, but the ItemArray property was probably all I needed.
if (isErrorRow)
{
//nr is the NewRow for dt1
var nr2 = dt2.NewRow();
nr2.ItemArray = nr.ItemArray;
dt2.Rows.Add(nr2);
}
This effectively cloned the rows.
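A self-contained sketch of the same idea follows. It assumes the two tables share a schema because one was produced from the other with Clone(); the Route and AddCopy helpers, the column names and the values are made up for illustration.

```csharp
using System;
using System.Data;

public static class RowRoutingDemo
{
    // Copies the values of a populated row into a new row created by the
    // target table and adds it. Both tables must have the same column layout.
    public static void AddCopy(DataTable target, DataRow source)
    {
        DataRow copy = target.NewRow();
        copy.ItemArray = source.ItemArray;
        target.Rows.Add(copy);
    }

    // Hypothetical router: the row was created by dt1.NewRow(), so it can be
    // added to dt1 directly, while dt2 needs a copy built by dt2.NewRow().
    public static void Route(DataTable dt1, DataTable dt2, DataRow row, bool isErrorRow)
    {
        if (isErrorRow)
            AddCopy(dt2, row);
        else
            dt1.Rows.Add(row);
    }

    public static void Main()
    {
        var dt1 = new DataTable();
        dt1.Columns.Add("Id", typeof(int));
        dt1.Columns.Add("Name", typeof(string));
        DataTable dt2 = dt1.Clone(); // same columns, no rows

        DataRow first = dt1.NewRow();
        first["Id"] = 1;
        first["Name"] = "goes to dt2";
        Route(dt1, dt2, first, true);

        DataRow second = dt1.NewRow();
        second["Id"] = 2;
        second["Name"] = "stays in dt1";
        Route(dt1, dt2, second, false);

        Console.WriteLine(dt1.Rows.Count + " / " + dt2.Rows.Count);
    }
}
```

The row can be filled before the destination is known, because only its values travel to the other table, not the DataRow object itself.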

.net Datatable Row insertion order

When I add rows to a DataTable and later iterate over that DataTable, will I get the rows in the same order in which they were inserted? First in, first out?
Or can I not rely on that order?
DataRow row = datatable.NewRow();
row["id"] = 1;
datatable.Rows.Add(row);
row = datatable.NewRow();
row["id"] = 2;
datatable.Rows.Add(row);
foreach (DataRow r in datatable.AsEnumerable())
{
// FIFO here? always get row with id 1 as first row and row with id 2 as second row?
}
If you do this manually, YES of course.
MSDN - Adding Data to a DataTable
Note that values in the array are matched sequentially to the columns,
based on the order in which they appear in the table.
Yes, normally that is what should happen.
If you need an explicit ordering of the rows I would suggest you sort them before iterating through them. Either by doing an OrderBy<> linq type query or by using the Select method on the DataTable.
Also, you can use the Rows property of the DataTable to get an enumerable of all rows instead of doing the AsEnumerable call.
Edit:
By inspecting the decompiled sources for the DataTable I found that the Rows property of the DataTable returns a DataRowCollection object. The DataRowCollection stores the rows in a binary-tree-based structure that lets you fetch items by their array index, so it returns the rows in the same order as they were added. This follows from the collection exposing an indexer that takes a numerical index.
In addition, the AsEnumerable extension method wraps the rows in an EnumerableRowCollection<DataRow>, which exposes the same rows as an IEnumerable<DataRow>.
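As a small illustration of both points, the sketch below adds rows in a deliberately unsorted order, reads them back through Rows, and then applies an explicit sort using either LINQ's OrderBy or DataTable.Select with a sort expression. The table, the column name and the helper names are assumptions made for the example.

```csharp
using System;
using System.Data;
using System.Linq;

public static class RowOrderDemo
{
    // Builds a one-column table with ids added in the order 3, 1, 2.
    public static DataTable Build()
    {
        var table = new DataTable();
        table.Columns.Add("id", typeof(int));
        foreach (int id in new[] { 3, 1, 2 })
            table.Rows.Add(id);
        return table;
    }

    // Ids in the order the Rows collection yields them.
    public static int[] InsertionOrder(DataTable table)
    {
        return table.Rows.Cast<DataRow>().Select(r => (int)r["id"]).ToArray();
    }

    // Ids after an explicit LINQ sort.
    public static int[] SortedWithLinq(DataTable table)
    {
        return table.Rows.Cast<DataRow>()
                    .Select(r => (int)r["id"])
                    .OrderBy(id => id)
                    .ToArray();
    }

    // Ids after DataTable.Select with an empty filter and a sort expression.
    public static int[] SortedWithSelect(DataTable table)
    {
        DataRow[] sorted = table.Select("", "id ASC");
        return sorted.Select(r => (int)r["id"]).ToArray();
    }

    public static void Main()
    {
        DataTable table = Build();
        Console.WriteLine(string.Join(",", InsertionOrder(table)));
        Console.WriteLine(string.Join(",", SortedWithLinq(table)));
        Console.WriteLine(string.Join(",", SortedWithSelect(table)));
    }
}
```

Sorting explicitly keeps the code correct even if rows are later inserted at a position with InsertAt or arrive from a source whose order is not guaranteed.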

Can I access entire DataTable if all I have is a single DataRow?

DataRow contains a Table property, which seems to return the entire Table for which this row belongs.
I'd like to know if I can use that table safely, or if there are gotcha's.
In http://msdn.microsoft.com/en-us/library/system.data.datarow.table.aspx documentation, it says "A DataRow does not necessarily belong to any table's collection of rows. This behavior occurs when the DataRow has been created but not added to the DataRowCollection.", but I know for a fact my row belongs to a table.
In terms of pointers, if each row from a DataTable points back to the original DataTable, then I'm good to go. Is that all the Table property does?
Just to explain why I'm trying to get entire Table based on a single DataRow:
I'm using linq to join two (sometimes more) tables. I'd like to have a generic routine which takes the output of linq (var), and generate a single DataTable with all results.
I had opened another question at stackoverflow (Join in LINQ that avoids explicitly naming properties in "new {}"?), but so far there doesn't seem to be a generic solution, so I'm trying to write one.
If you know the row is part of a table, then yes, you can access it without any problem. If there is a possibility that the row is not associated with a table, check whether the property is null.
if(row.Table == null)
{
}
else
{
}
As long as it's not null, you can use it freely.
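One caveat, offered as my understanding rather than something stated above: a row created with NewRow() already reports its table through Table before it is added to Rows, so a null check alone may not tell you whether the row is actually in the collection. Checking RowState for Detached is a way to make that distinction. The sketch below shows both checks; the IsAttached helper and the sample table are hypothetical.

```csharp
using System;
using System.Data;

public static class RowTableDemo
{
    // True when the row is currently part of its table's Rows collection.
    public static bool IsAttached(DataRow row)
    {
        return row.RowState != DataRowState.Detached;
    }

    public static void Main()
    {
        var table = new DataTable("Example");
        table.Columns.Add("Id", typeof(int));

        DataRow row = table.NewRow();
        row["Id"] = 1;

        // Created but not yet added: Table is set, the row is detached.
        Console.WriteLine(ReferenceEquals(row.Table, table));
        Console.WriteLine(IsAttached(row));

        table.Rows.Add(row);

        // After adding, the whole table is reachable from the single row.
        Console.WriteLine(IsAttached(row));
        Console.WriteLine(row.Table.Rows.Count);
    }
}
```

Once the row is attached, row.Table returns the same DataTable instance that holds it, so its Columns and Rows can be used as usual.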

How to merge repeated DataSet values into a single DataSet

I am using a foreach loop containing code that retrieves data from a database. The first iteration returns some rows, the second iteration returns more rows, and so on. Can I merge the rows from every iteration into a single DataSet?
Please help me merge those rows into a single DataSet.
As @Tim Schmelter mentioned, there is also the Merge() method on DataSets. It allows you to perform different types of merges, including updates, which stops you from getting duplicate rows if the same row appears in each DataSet. Depending on the type of data you have, this may be better than using a for loop to add the rows from one to the other.
you can read more on this here:
http://msdn.microsoft.com/en-us/library/803bh6bc.aspx
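A minimal sketch of the Merge approach, assuming every loop iteration yields a DataTable with the same name, the same columns and a primary key on Id. FetchBatch is a hypothetical stand-in for the database call, and the data is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class MergeDemo
{
    // Hypothetical stand-in for one database query inside the loop.
    // The primary key lets Merge match rows that appear in several batches.
    public static DataTable FetchBatch(params object[][] rows)
    {
        var table = new DataTable("Results");
        DataColumn id = table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.PrimaryKey = new[] { id };
        foreach (object[] row in rows)
            table.Rows.Add(row);
        return table;
    }

    // Merges every batch into one DataSet. Tables are matched by name and
    // rows by primary key, so a row present in two batches is stored once.
    public static DataSet MergeAll(IEnumerable<DataTable> batches)
    {
        var combined = new DataSet();
        foreach (DataTable batch in batches)
            combined.Merge(batch);
        return combined;
    }

    public static void Main()
    {
        DataTable first = FetchBatch(new object[] { 1, "a" }, new object[] { 2, "b" });
        DataTable second = FetchBatch(new object[] { 2, "b" }, new object[] { 3, "c" });

        DataSet combined = MergeAll(new[] { first, second });
        Console.WriteLine(combined.Tables["Results"].Rows.Count);
    }
}
```

Without a primary key, Merge has nothing to match rows on, so rows from later batches would simply be appended.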
This is provided I understood you correctly ...
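DataTable tbl = data.Clone(); // copy the schema; a row that belongs to one table cannot be added to another
foreach (DataRow row in data.Rows)
{
tbl.ImportRow(row); // copies the row's values into tbl
}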
Then just add the DataTable to a DataSet of your choice.

Join multiple DataRows into a single DataRow

I am writing this in C# using .NET 3.5. I have a System.Data.DataSet object with a single DataTable that uses the following schema:
Id : uint
AddressA: string
AddressB: string
Bytes : uint
When I run my application, let's say the DataTable gets filled with the following:
1 192.168.0.1 192.168.0.10 300
2 192.168.0.1 192.168.0.20 400
3 192.168.0.1 192.168.0.30 300
4 10.152.0.13 167.10.2.187 80
I'd like to be able to query this DataTable where AddressA is unique and the Bytes column is summed together (I'm not sure I'm saying that correctly). In essence, I'd like to get the following result:
1 192.168.0.1 1000
2 10.152.0.13 80
I ultimately want this result in a DataTable that can be bound to a DataGrid, and I need to update/regenerate this result every 5 seconds or so.
How do I do this? DataTable.Select() method? If so, what does the query look like? Is there an alternate/better way to achieve my goal?
EDIT: I do not have a database. I'm simply using an in-memory DataSet to store the data, so a pure SQL solution won't work here. I'm trying to figure out how to do it within the DataSet itself.
For readability (and because I love it) I would try to use LINQ:
var aggregatedAddresses = from DataRow row in dt.Rows
group row by row["AddressA"] into g
select new {
Address = g.Key,
Byte = g.Sum(row => (uint)row["Bytes"])
};
int i = 1;
foreach(var row in aggregatedAddresses)
{
result.Rows.Add(i++, row.Address, row.Byte);
}
If a performance issue is discovered with the LINQ solution, I would go with a manual solution: sum up the rows in a loop over the original table and insert the totals into the result table.
You can also bind the aggregatedAddresses directly to the grid instead of putting it into a DataTable.
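The manual alternative mentioned above could look roughly like the sketch below. It assumes the schema from the question (Id, AddressA, AddressB, Bytes) and accumulates into a ulong so the running total has headroom beyond a single uint. ManualSumDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ManualSumDemo
{
    // Rebuilds the sample table from the question.
    public static DataTable BuildSource()
    {
        var source = new DataTable("Traffic");
        source.Columns.Add("Id", typeof(uint));
        source.Columns.Add("AddressA", typeof(string));
        source.Columns.Add("AddressB", typeof(string));
        source.Columns.Add("Bytes", typeof(uint));
        source.Rows.Add(1u, "192.168.0.1", "192.168.0.10", 300u);
        source.Rows.Add(2u, "192.168.0.1", "192.168.0.20", 400u);
        source.Rows.Add(3u, "192.168.0.1", "192.168.0.30", 300u);
        source.Rows.Add(4u, "10.152.0.13", "167.10.2.187", 80u);
        return source;
    }

    // Sums Bytes per AddressA in one pass and writes the totals to a new table.
    public static DataTable SumBytesByAddress(DataTable source)
    {
        var totals = new Dictionary<string, ulong>();
        var order = new List<string>(); // remembers first-seen order of addresses

        foreach (DataRow row in source.Rows)
        {
            string address = (string)row["AddressA"];
            ulong bytes = (uint)row["Bytes"];
            if (!totals.ContainsKey(address))
            {
                totals[address] = 0;
                order.Add(address);
            }
            totals[address] += bytes;
        }

        var result = new DataTable("Totals");
        result.Columns.Add("Id", typeof(uint));
        result.Columns.Add("AddressA", typeof(string));
        result.Columns.Add("Bytes", typeof(ulong));

        uint id = 1;
        foreach (string address in order)
            result.Rows.Add(id++, address, totals[address]);
        return result;
    }

    public static void Main()
    {
        DataTable result = SumBytesByAddress(BuildSource());
        foreach (DataRow row in result.Rows)
            Console.WriteLine(row["Id"] + " " + row["AddressA"] + " " + row["Bytes"]);
    }
}
```

Since the result is rebuilt from scratch on each call, regenerating it every few seconds amounts to calling SumBytesByAddress again and rebinding the grid.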
The most efficient solution would be to do the sum in SQL directly:
select AddressA, SUM(bytes) from ... group by AddressA
I agree with Steven as well that doing this on the server side is the best option. If you are using .NET 3.5 though, you don't have to go through what Rune suggests. Rather, use the extension methods for datasets to help query and sum the values.
Then, you can map it easily to an anonymous type which you can set as the data source for your grid (assuming you don't allow edits to this, which I don't see how you can, since you are aggregating the data).
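A sketch of what that might look like with the DataSet extension methods AsEnumerable and Field<T> (on .NET Framework these live in the System.Data.DataSetExtensions assembly). The schema is taken from the question; AddressTotal is a hypothetical named type standing in for the anonymous type so the result can be returned from a method. Enumerable.Sum has no uint overload, so each value is widened to long before summing.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Named stand-in for the anonymous type, so a method can return it.
public sealed class AddressTotal
{
    public string Address { get; set; }
    public long Bytes { get; set; }
}

public static class ExtensionSumDemo
{
    // Rebuilds the sample table from the question.
    public static DataTable BuildSource()
    {
        var source = new DataTable("Traffic");
        source.Columns.Add("Id", typeof(uint));
        source.Columns.Add("AddressA", typeof(string));
        source.Columns.Add("AddressB", typeof(string));
        source.Columns.Add("Bytes", typeof(uint));
        source.Rows.Add(1u, "192.168.0.1", "192.168.0.10", 300u);
        source.Rows.Add(2u, "192.168.0.1", "192.168.0.20", 400u);
        source.Rows.Add(3u, "192.168.0.1", "192.168.0.30", 300u);
        source.Rows.Add(4u, "10.152.0.13", "167.10.2.187", 80u);
        return source;
    }

    // Groups rows by AddressA and sums Bytes using typed field access.
    public static List<AddressTotal> Summarize(DataTable source)
    {
        return source.AsEnumerable()
            .GroupBy(row => row.Field<string>("AddressA"))
            .Select(g => new AddressTotal
            {
                Address = g.Key,
                Bytes = g.Sum(row => (long)row.Field<uint>("Bytes"))
            })
            .ToList();
    }

    public static void Main()
    {
        foreach (AddressTotal total in Summarize(BuildSource()))
            Console.WriteLine(total.Address + " " + total.Bytes);
    }
}
```

The returned list can be assigned as a grid's data source for read-only display.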
I agree with Steven that the best way to do this is to do it in the database. But if that isn't an option you can try the following:
Make a new datatable and add the columns you need manually using DataTable.Columns.Add(name, datatype)
Step through the first datatables Rows collection and for each row create a new row in your new datatable using DataTable.NewRow()
Copy the values of the columns found in the first table into the new row
Find the matching row in the other data table using Select() and copy out the final value into the new data row
Add the row to your new data table using DataTable.Rows.Add(newRow)
This will give you a new data table containing the combined data from the two tables. It won't be very fast, but unless you have huge amounts of data it will probably be fast enough. But try to avoid doing a LIKE-query in the Select, for that one is slow.
One optimization is possible if both tables contain rows with identical primary keys. You could then sort both tables and step through them in parallel, fetching both data rows by their array index. This would rid you of the Select call.
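The steps listed above for combining two tables can be sketched as follows. The table layouts (Id and Name in the first table, Id and Total in the second) and the Combine helper are assumptions made only for this example; the matching row is looked up with Select() on an exact numeric key rather than a LIKE expression.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class CombineTablesDemo
{
    // Builds a new table holding the columns of the first table plus the
    // Total value looked up from the second table by Id.
    public static DataTable Combine(DataTable first, DataTable second)
    {
        var combined = new DataTable("Combined");
        combined.Columns.Add("Id", typeof(int));
        combined.Columns.Add("Name", typeof(string));
        combined.Columns.Add("Total", typeof(int));

        foreach (DataRow row in first.Rows)
        {
            DataRow newRow = combined.NewRow();
            newRow["Id"] = row["Id"];
            newRow["Name"] = row["Name"];

            // Exact-match filter on the key; formatted with the invariant culture.
            string filter = "Id = " + ((int)row["Id"]).ToString(CultureInfo.InvariantCulture);
            DataRow[] matches = second.Select(filter);
            if (matches.Length > 0)
                newRow["Total"] = matches[0]["Total"]; // stays DBNull when no match

            combined.Rows.Add(newRow);
        }
        return combined;
    }

    public static void Main()
    {
        var first = new DataTable();
        first.Columns.Add("Id", typeof(int));
        first.Columns.Add("Name", typeof(string));
        first.Rows.Add(1, "one");
        first.Rows.Add(2, "two");

        var second = new DataTable();
        second.Columns.Add("Id", typeof(int));
        second.Columns.Add("Total", typeof(int));
        second.Rows.Add(1, 10);

        DataTable combined = Combine(first, second);
        foreach (DataRow row in combined.Rows)
            Console.WriteLine(row["Id"] + " " + row["Name"] + " " + row["Total"]);
    }
}
```

Each Select call scans or searches the second table, which is where the cost noted above comes from on large inputs.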
