Why is Union() not excluding duplicates like it should? - c#

EDIT: Why is Union() not excluding duplicates like it should?
I should have read the documentation before asking the original question. I didn't because every time I used Union(), it was on lists of objects that didn't override Equals() and GetHashCode(), so even if the field values of the objects in the lists were the same, they would all end up in the new list Union() created. At first it seemed as if Union() didn't exclude duplicates, and that is what I believed. But Union() does, in fact, exclude duplicates. And not only duplicates across both lists, but also duplicates within the same list. If my objects don't override Equals() and GetHashCode(), they are not compared by value, and that means they are not seen as duplicates.
This was the confusion that made me ask this question.
Once I create a new List using Union() and Select() the fields explicitly, "T" becomes an anonymous type, which is compared by value. This way, objects with the same field values are seen as duplicates. That is what causes Union() to behave differently (or rather appear to behave differently). It always excludes duplicates, but not every type is compared by value, so objects with the same field values may or may not be seen as duplicates. It depends on the implementation of your custom class.
I guess that should have been the question: Why is Union() not excluding duplicates like it should? (as we've seen, it's because my objects were not really duplicates). Is that right?
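A minimal sketch of the point above, using two hypothetical classes (PlainBall and ValueBall are illustrative names, not part of the original code). One class keeps the default reference equality, the other overrides Equals() and GetHashCode(), and Union() treats only the second pair as duplicates:

```csharp
using System;
using System.Linq;

// Hypothetical class: default (reference) equality.
class PlainBall
{
    public int Id;
}

// Hypothetical class: compared by value because Equals/GetHashCode are overridden.
class ValueBall
{
    public int Id;
    public override bool Equals(object obj) => obj is ValueBall other && other.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();
}

static class EqualityDemo
{
    // Unions the items with an empty sequence, so only duplicates
    // inside the single input can be removed.
    public static int CountAfterUnion<T>(T[] items) => items.Union(new T[0]).Count();

    static void Main()
    {
        var plain = new[] { new PlainBall { Id = 1 }, new PlainBall { Id = 1 } };
        var byValue = new[] { new ValueBall { Id = 1 }, new ValueBall { Id = 1 } };

        Console.WriteLine(CountAfterUnion(plain));   // 2: two distinct references
        Console.WriteLine(CountAfterUnion(byValue)); // 1: equal by value
    }
}
```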
----------------------
Original Question: LINQ Union + Select is removing duplicates automatically. Why?
I've always thought that Union() in Linq would return all values from the two lists even if they are the same. But my code is removing duplicates from the first list when I use 'Select()' right after a Union().
Imagine the classic probability problem of ball extraction, where I have different containers and I extract some number of different balls from the containers.
I have two lists of BallExtraction. Each list shows me the Id of the ball, the Id of the container that the ball was in, the number of balls I have extracted (Value) and its Color. But, for some reason, I have two different lists and I want to merge them.
Example Code:
class BallExtraction
{
public enum BallColor
{
Blue = 0,
Red = 1
}
public int Id { get; set; }
public int IdContainer { get; set; }
public int ValueExtracted { get; set; }
public BallColor Color { get; set; }
public BallExtraction() { }
public BallExtraction(int id, int idContainer, int valueExtracted, BallColor color)
{
this.Id = id;
this.IdContainer = idContainer;
this.ValueExtracted = valueExtracted;
this.Color = color;
}
}
And now I run the program that follows:
class Program
{
static void Main(string[] args)
{
List<BallExtraction> list1 = new List<BallExtraction>();
list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Blue));
list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Blue));
list1.Add(new BallExtraction(1, 1, 20, BallExtraction.BallColor.Red));
list1.Add(new BallExtraction(1, 2, 70, BallExtraction.BallColor.Blue));
list1.Add(new BallExtraction(2, 1, 10, BallExtraction.BallColor.Blue));
List<BallExtraction> list2 = new List<BallExtraction>();
list2.Add(new BallExtraction(3, 2, 80, BallExtraction.BallColor.Blue));
list2.Add(new BallExtraction(3, 2, 80, BallExtraction.BallColor.Red));
var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Select(s => new
{
Id = s.Id,
IdContainer = s.IdContainer,
ValueExtracted = s.ValueExtracted
}).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue).Select(s => new
{
Id = s.Id,
IdContainer = s.IdContainer,
ValueExtracted = s.ValueExtracted
}));
Console.WriteLine("Number of items: {0}", mergedList.Count());
foreach (var item in mergedList)
{
Console.WriteLine("Id: {0}. IdContainer: {1}. # of balls extracted: {2}", item.Id, item.IdContainer, item.ValueExtracted);
}
Console.ReadLine();
}
}
The expected output is:
Number of items: 5
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 2. Value: 70.
Id: 2. IdContainer: 1. Value: 10.
Id: 3. IdContainer: 2. Value: 80.
But the actual output is:
Number of items: 4
Id: 1. IdContainer: 1. Value: 20.
Id: 1. IdContainer: 2. Value: 70.
Id: 2. IdContainer: 1. Value: 10.
Id: 3. IdContainer: 2. Value: 80.
Notice that the first list contains two extractions with the same values. The Id of the ball is 1, the Id of the container is 1, the number of balls extracted is 20 and they are both blue.
I found that when I switch the 'mergedList' to the code below, I get the expected output:
var mergedList = list1.Where(w => w.Color == BallExtraction.BallColor.Blue).Union(list2.Where(w => w.Color == BallExtraction.BallColor.Blue));
So, it seems that the 'Select' used right after the Union() is removing the duplicates from the first list.
The real problem is that I don't actually have a list of a simple type like in the example but I have a list of IEnumerable< T > (T is an anonymous type) and T has a lot of fields. I only want specific fields but I want all the new anonymous type duplicates. The only workaround I have found is if in the 'Select()' I add some field that is unique to each object T.
Is this working as intended? Should Union + Select remove duplicates?

Yes, it's the expected behaviour.
Union's doc states
Return Value Type: System.Collections.Generic.IEnumerable<TSource>. An IEnumerable<T> that contains the elements from both input sequences, excluding duplicates.
To keep duplicates, you have to use Concat(), not Union()
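As a small illustration of the difference (the data and the UnionVsConcatDemo class name are made up for this sketch), the same two sequences of anonymous objects give different counts with Union() and Concat():

```csharp
using System;
using System.Linq;

static class UnionVsConcatDemo
{
    // Returns the element counts produced by Union() and Concat()
    // on the same pair of sequences of anonymous objects.
    public static int[] Counts()
    {
        // Anonymous types are compared by value, so the first two items are equal.
        var first = new[] { new { Id = 1, Value = 20 }, new { Id = 1, Value = 20 } };
        var second = new[] { new { Id = 3, Value = 80 } };

        return new[] { first.Union(second).Count(), first.Concat(second).Count() };
    }

    static void Main()
    {
        var counts = Counts();
        Console.WriteLine("Union:  {0}", counts[0]); // Union:  2
        Console.WriteLine("Concat: {0}", counts[1]); // Concat: 3
    }
}
```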

Related

How to sort a collection of an object that has a predecessor field?

Suppose that I have a Color class
class Color
{
public int id;
public string name;
public int? predecessorId;
public Color(int _id, string _name, int? _predecessorId)
{
id = _id;
name = _name;
predecessorId = _predecessorId;
}
}
The purpose of the predecessor ID is that I can have a collection of colors and have them sorted arbitrarily. (I did not design this data structure.)
Suppose I have the following seven colors:
var colors = new []
{
new Color(0, "red", null),
new Color(1, "orange", 0),
new Color(2, "yellow", 1),
new Color(3, "green", 2),
new Color(4, "blue", 3),
new Color(5, "indigo", 4),
new Color(6, "violet", 5)
};
Now, suppose that I receive data from an external source. The data looks like the above, but the colors don't necessarily arrive in the order implied by the predecessorId field. However, I will always assume that the predecessorId values form an unbroken linked list.
Given an array of seven of these color objects, how do I sort them so that the first one is the one with no predecessor, the second one has the first one as its predecessor, and so on?
I am well aware that there are any number of ways to skin this cat, but I am using C# .net and I would like to use the built-in framework classes as much as possible to avoid writing more code that must be maintained.
Pick the first (and hopefully unique) color by finding the one with predecessorId == null. Then find the next by indexing a dictionary of predecessorId to color until no more matches are found:
var current = colors.Single(c => c.predecessorId == null);
var sorted = new List<Color> { current };
var colorsByPredecessorId = colors.Where(c => c.predecessorId.HasValue)
.ToDictionary(c => c.predecessorId.Value, c => c);
while (colorsByPredecessorId.ContainsKey(current.id))
{
current = colorsByPredecessorId[current.id];
sorted.Add(current);
}
return sorted;
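A self-contained sketch of the same dictionary walk, run on shuffled input whose ids are deliberately not in numeric order. The property names, the SortByPredecessor helper and the sample ids are assumptions made for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Color
{
    public int Id;
    public string Name;
    public int? PredecessorId;

    public Color(int id, string name, int? predecessorId)
    {
        Id = id;
        Name = name;
        PredecessorId = predecessorId;
    }
}

static class PredecessorSortDemo
{
    // Follows the predecessor links starting from the single item without a predecessor.
    public static List<Color> SortByPredecessor(IEnumerable<Color> colors)
    {
        var all = colors.ToList();
        var current = all.Single(c => c.PredecessorId == null);
        var byPredecessor = all.Where(c => c.PredecessorId.HasValue)
                               .ToDictionary(c => c.PredecessorId.Value);

        var sorted = new List<Color> { current };
        while (byPredecessor.TryGetValue(current.Id, out var next))
        {
            sorted.Add(next);
            current = next;
        }
        return sorted;
    }

    static void Main()
    {
        // Shuffled input; the ids carry no ordering information of their own.
        var colors = new[]
        {
            new Color(7, "yellow", 42),
            new Color(3, "red", null),
            new Color(42, "orange", 3)
        };

        Console.WriteLine(string.Join(", ", SortByPredecessor(colors).Select(c => c.Name)));
        // red, orange, yellow
    }
}
```

Building the dictionary once keeps the walk at one lookup per item, rather than searching the array again for every step.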
I posted the same thing as ryanyuyu but deleted it as he was a few seconds faster, but seeing as it seems to do what you want and he hasn't submitted it as answer yet, I will:
var orderedColors = colors.OrderBy(x => x.predecessorId);
Which provides the output like so:
So I'm not sure what you mean in your comment:
"solution here would only work if we could guarantee that the predecessorIds are always in numeric order. If I changed the predecessor IDs to reverse the order, it wouldn't work anymore"
It doesn't matter what order the source data is in. That's the point of OrderBy. It puts the collection in the order you dictate. Unless you can clarify on the requirements, this should suffice.

Sorting multiple collections of objects by looking at each object's own path

This is my code problem and it's killing me. I can't figure out how to do this ... I find myself creating temp arrays to hold references, loops in loops.. it's a mess.
The setup
Look at the image. We have 3 Something collections (layers) with cross-references between collections.
The goal is to order the Somethings as indicated by the arrows. However, C is more important than B who is more important than A (it's a bottom-up importance). A Something can change the natural order by following another Something in a different "Layer".
So the desired order is:
+----+----+----+---+---+---+---+---+----+----+----+---+---+---+---+
| 14 | 15 | 17 | 7 | 8 | 1 | 2 | 9 | 18 | 19 | 12 | 3 | 4 | 5 | 6 |
+----+----+----+---+---+---+---+---+----+----+----+---+---+---+---+
You will probably have to look a few times before you arrive at the same conclusion. Let me break it down for you.
We start in layer C (C outranks B and A, remember?) ... first column on the left is the start column.
We have 14, 15 and 17. We don't want 18, because 18 does not have a reference back to 17. So we move up a layer and start at the beginning again with 7 and 8. It ends there, since 9 does not reference 8.
We move up one layer again and get 1 and 2.
Then it gets difficult -- 2 is followed by 9, so we get 9 first. Then we see that 9 is followed by 18, so we grab that one. Here is the important part -- C outranks B and A, so we first get 19, then go up to fetch 12 and then we go back to layer A to continue after 2 and get 3,4,5,6.
There must be some ingenious way to do this and still keep it fast. This is a stripped-down example. The real thing has dozens of objects and layers.
The real thing has a virtual property that completes the 1-to-many relationship. I want to avoid that property, because it adds another collection to the mix, but in case it would make things easier I'll add that here.
class Something
{
public int Id {get;set;}
public int FollowsId {get;set;}
public IEnumerable<Something> FollowedBy {get;set;}
}
I renamed the properties so it would be easier to grasp.
You can just treat this as a tree structure, and then just walk it left-to-right (or in your case, nodes directly linked from a sublayer prior to nodes from the same layer) ensuring to yield the current node before navigating sub-nodes...
Working code:
public class Layer
{
public string Name { get; set; }
public int Priority { get; set; }
public Something Head { get; set; }
public Something Add(Something s) {
if (this.Head == null) this.Head = s;
s.Layer = this;
this.Items.Add(s);
return s;
}
public Something this[int id] {
get { return this.Items.SingleOrDefault(s => s.Id == id); }
}
public List<Something> Items = new List<Something>();
private void BuildTree(List<Something> list, Something s = null)
{
list.Add(s);
foreach(Something ss in s.Followers.OrderBy(sss => sss.Layer.Priority))
{
BuildTree(list, ss);
}
}
public List<Something> Tree
{
get
{
List<Something> list = new List<Something>();
if (this.Head != null) BuildTree(list, this.Head);
return list;
}
}
}
public class Something
{
public int Id { get; set; }
public Layer Layer { get; set; }
public List<Something> Followers = new List<Something>();
public void Follows(Something s) { s.Followers.Add(this); }
}
void Main()
{
Layer A = new Layer() {
Name="A",
Priority = 3
};
A.Add(new Something() { Id = 1 });
A.Add(new Something() { Id = 2 }).Follows(A[1]);
A.Add(new Something() { Id = 3 }).Follows(A[2]);
A.Add(new Something() { Id = 4 }).Follows(A[3]);
A.Add(new Something() { Id = 5 }).Follows(A[4]);
A.Add(new Something() { Id = 6 }).Follows(A[5]);
Layer B = new Layer() {
Name = "B",
Priority = 2
};
B.Add(new Something() { Id = 7 });
B.Add(new Something() { Id = 8 }).Follows(B[7]);
B.Add(new Something() { Id = 9 }).Follows(A[2]);
B.Add(new Something() { Id = 12 }).Follows(B[9]);
Layer C = new Layer() {
Name = "C",
Priority = 1
};
C.Add(new Something() { Id = 14 });
C.Add(new Something() { Id = 15 }).Follows(C[14]);
C.Add(new Something() { Id = 17 }).Follows(C[15]);
C.Add(new Something() { Id = 18 }).Follows(B[9]);
C.Add(new Something() { Id = 19 }).Follows(C[18]);
List<Something> orderedItems = new List<Something>();
List<Layer> layers = new List<Layer>() { A, B, C };
foreach(Layer l in layers.OrderBy(ll => ll.Priority)) orderedItems.AddRange(l.Tree);
}
If you run this in LinqPad, then after the last line, you can:
string.Join(",", orderedItems.Select(s => s.Id.ToString())).Dump();
to see the output:
14,15,17,7,8,1,2,9,18,19,12,3,4,5,6
What you're describing sounds a lot like a topological sort, with a few extra modifications. If you have a set of transitive constraints on a group of objects, then a topological ordering of the objects is one where the elements are listed out in an order where all of the constraints are satisfied. In your case, you have two different classes of constraints in play:
The explicit constraints that you've given above.
The implicit constraint that elements in group C precede elements in group B and elements in group B precede elements in group A whenever possible.
To explicitly convert this into a topological sorting problem, you can do the following. First, create a graph where each array cell is a node. Next, add in the explicit constraints that you already know about. Then, add in the implicit constraints by adding in edges from group A to group B and then from group B to group C. Finally, run a topological sorting algorithm - there are many of them, and they're very efficient - to get back a linearized ordering obeying all of your constraints.
I'm not actually a C# programmer and so I don't know the best way to actually code this up, but I suspect that there are good topological sorting libraries out there online. In the worst case, you should be able to code one up on your own; the DFS-based topological sorting algorithm is rather straightforward.
This approach assumes that, like in your picture, the "follows" relation never has something in a higher array following something in a lower array. If that's not the case, this setup might not work because you may get cycles in the graph. In that case, you can consider using a different approach where you use a normal topological sort, but whenever you have a choice about which node to include next in the topological ordering, you always choose the one from the deepest level.
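For reference, a minimal sketch of a DFS-based topological sort in C#. It assumes nodes are identified by int ids and that an edge from u to v means "u must come before v". The TopologicalSort helper and the sample edges are illustrative only, and the sketch does not detect cycles:

```csharp
using System;
using System.Collections.Generic;

static class TopoSortDemo
{
    // edges[u] lists the nodes that must appear after u.
    // Assumes the graph is acyclic; no cycle detection is attempted.
    public static List<int> TopologicalSort(Dictionary<int, List<int>> edges)
    {
        var visited = new HashSet<int>();
        var postOrder = new List<int>();

        void Visit(int node)
        {
            if (!visited.Add(node)) return; // already handled
            if (edges.TryGetValue(node, out var successors))
            {
                foreach (var next in successors) Visit(next);
            }
            postOrder.Add(node); // added only after everything that must follow it
        }

        foreach (var node in edges.Keys) Visit(node);

        postOrder.Reverse(); // reversed post-order is a topological order
        return postOrder;
    }

    static void Main()
    {
        // A small slice of the example: 1 -> 2, 2 -> 3 and 9, 9 -> 12.
        var edges = new Dictionary<int, List<int>>
        {
            [1] = new List<int> { 2 },
            [2] = new List<int> { 3, 9 },
            [9] = new List<int> { 12 }
        };

        Console.WriteLine(string.Join(",", TopologicalSort(edges)));
    }
}
```

A plain topological sort only guarantees that every node comes after its predecessors. The "deepest layer first" preference would still have to be added on top, for example by choosing the order in which successors are visited.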

Creating a new instance of IList<T> from an existing one and modifying it [duplicate]

This question already has answers here:
ToList()-- does it create a new list?
(12 answers)
Closed 10 years ago.
Given the code below:
public class Item
{
private int _id;
private int _order;
private string _name;
public int Id
{
get { return _id; }
set { _id = value; }
}
public int Order
{
get { return _order; }
set { _order = value; }
}
public string Name
{
get { return _name; }
set { _name = value; }
}
public static IList<Item> InitList1()
{
var list = new List<Item>
{
new Item { Id = 1, Order = 1, Name = "Alpha" },
new Item { Id = 2, Order = 2, Name = "Bravo" },
new Item { Id = 3, Order = 3, Name = "Charlie" },
new Item { Id = 4, Order = 4, Name = "Delta" }
};
return list;
}
}
class Program
{
static void Main(string[] args)
{
// Initialize the lists
IList<Item> list1 = Item.InitList1();
IList<Item> list2 = list1.ToList();
IList<Item> list3 = new List<Item>(list1);
// Modify list2
foreach (Item item in list2)
item.Order++;
// Modify list3
foreach (Item item in list3)
item.Order++;
// Output the lists
Console.WriteLine(string.Format("\nList1\n====================="));
foreach (Item item in list1)
Console.WriteLine(string.Format("Item - id: {0} order: {1} name: {2}", item.Id, item.Order, item.Name));
Console.WriteLine(string.Format("\nList2\n====================="));
foreach (Item item in list2)
Console.WriteLine(string.Format("Item - id: {0} order: {1} name: {2}", item.Id, item.Order, item.Name));
Console.WriteLine(string.Format("\nList3\n====================="));
foreach (Item item in list3)
Console.WriteLine(string.Format("Item - id: {0} order: {1} name: {2}", item.Id, item.Order, item.Name));
Console.Write("\nAny key to exit...");
Console.ReadKey();
}
}
The output will be:
List1
=====================
Item - id: 1 order: 3 name: Alpha
Item - id: 2 order: 4 name: Bravo
Item - id: 3 order: 5 name: Charlie
Item - id: 4 order: 6 name: Delta
List2
=====================
Item - id: 1 order: 3 name: Alpha
Item - id: 2 order: 4 name: Bravo
Item - id: 3 order: 5 name: Charlie
Item - id: 4 order: 6 name: Delta
List3
=====================
Item - id: 1 order: 3 name: Alpha
Item - id: 2 order: 4 name: Bravo
Item - id: 3 order: 5 name: Charlie
Item - id: 4 order: 6 name: Delta
Any key to exit...
Can someone please explain to me:
Why after creating the new lists (list2 and list3) that the actions on those lists affects list1 (and subsequently the two other lists)? and
How I can create a new instance of list1 and modify it without affecting list1?
You've effectively got a "shallow copy". That is, yes, the lists are copied, but they still point to the original items.
Think of it like this. A list doesn't actually contain the items it contains, instead it has a reference to it. So, when you copy your list the new list just contains a reference to the original item. What you need is something like this
IList<Item> newList = new List<Item>();
foreach (Item anItem in myList)
{
newList.Add(anItem.ReturnCopy());
}
where return copy looks something like this:
public Item ReturnCopy()
{
Item newItem = new Item();
newItem._id = _id;
newItem._order = _order;
newItem._name = _name;
return newItem;
}
That will copy all the data from the item, but leave the original intact. There are loads of patterns and interfaces that can offer better implementations, but I just wanted to give you a flavour of how it works.
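To see the difference end to end, here is a small sketch with a cut-down, hypothetical Item class (only an Order property) and an assumed DeepCopy helper. ToList() copies the references, while the deep copy creates new objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Item class in the question.
class Item
{
    public int Order { get; set; }
    public Item Copy() => new Item { Order = Order };
}

static class CopyDemo
{
    // Builds a new list whose elements are new objects with the same values.
    public static List<Item> DeepCopy(List<Item> source) =>
        source.Select(i => i.Copy()).ToList();

    static void Main()
    {
        var original = new List<Item> { new Item { Order = 1 } };
        var shallow = original.ToList(); // new list, same Item objects
        var deep = DeepCopy(original);   // new list, new Item objects

        shallow[0].Order++;  // also changes original[0]
        deep[0].Order += 10; // leaves original[0] alone

        Console.WriteLine(original[0].Order);                        // 2
        Console.WriteLine(deep[0].Order);                            // 11
        Console.WriteLine(ReferenceEquals(original[0], shallow[0])); // True
    }
}
```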
You have 3 lists, but they contain the same 4 elements (i.e. references to the same 4 objects in memory). So when you modify Order in an item in List1, it also takes effect in the item in List2, because it is the same object.
Neither ToList nor the List constructor makes a deep copy.
If you want to copy the objects too, you need to copy them as well, creating new instances to add to the new lists. In .NET, you would typically implement ICloneable in order to provide a Clone method. If you don't feel you need that, you can just create new Items and copy their properties.
static class Extension
{
public static IList<T> Clone<T>(this IList<T> list) where T: ICloneable
{
return list.Select(i => (T)i.Clone()).ToList();
}
}
Now you can use IList<T>.Clone() to return objects.
You have 3 distinct lists, and so editing those lists (i.e. adding a new item to the list, removing an item, setting a new item at a given position) is a change that won't affect the other variables.
However, each item in each of the lists only contains a reference to an actual Item instance, and all three lists have the same four references. When you change the item that is referenced by one list, you are making a change that is "visible" from the other lists.
In order to not see this behavior you need to not only create a new list, but ensure that the items in the new list(s) are brand new references to new objects that happen to contain the same values. In the general case, this isn't a trivial task, nor is it often desirable.
You need to clone the objects within the lists. Otherwise you create new lists and they all point to the same objects.
var listToClone = new List<Item>();
var clonedList = listToClone.Select(item => (Item)item.Clone()).ToList();

How to reverse the order of displaying content of a list?

Let's say I have a list of integers.
var myList = new List<int>();
1, 2, 3, 4, 5, ..., 10
Is there any function that allows me to display them in reverse order, i.e.
10, 9, 8, ..., 1
EDIT
public List<Location> SetHierarchyOfLocation(Location location)
{
var list = new List<Location>(7);
var juridiction = location.Juridiction;
if (juridiction > 0)
{
while (juridiction > 0)
{
var loc = repository.GetLocationByID(juridiction);
list.Add(loc);
juridiction = loc.Juridiction;
}
}
return list;
}
Since the list contains location by location, I want to be able to display it by reversed order as well.
Now, when I write return list.Reversed(), I get the error.
Thanks for helping
var reversed = myList.Reverse() should do the trick.
EDIT:
Correction- as the OP found out, List.Reverse works in-place, unlike Enumerable.Reverse. Thus, the answer is simply myList.Reverse(); - you don't assign it to a new variable.
Is there any function that allows me to display them in reverse order, i.e.
It depends if you want to reverse them in place, or merely produce a sequence of values that is the reverse of the underlying sequence without altering the list in place.
If the former, just use List<T>.Reverse.
// myList is List<int>
myList.Reverse();
Console.WriteLine(String.Join(", ", myList));
If the latter, the key is Enumerable.Reverse:
// myList is List<int>
Console.WriteLine(
String.Join(
", ",
myList.AsEnumerable().Reverse()
)
);
for a beautiful one-liner. Be sure you have using System.Linq;.
To loop over a List<Location> in reverse with foreach, you should use:
foreach (Location item in myList.AsEnumerable().Reverse())
{
// do something ...
}
List<T>.Reverse() reverses in place and returns void, so it cannot be chained. Calling AsEnumerable() first makes the call resolve to Enumerable.Reverse, which yields the items in reverse order without modifying the list.

Replacing groups of items in a list with new computed rows

I have a list of objects which have 2 string attributes and a few double attributes each - there are duplicate rows in the sense that the string attributes are the same, while the double attributes are different. I'm trying to merge these duplicates into a single new row that is computed from the values of the double attributes. Simplified example:
1. A, Text1, 5
2. A, Text2, 4
3. B, Text1, 7
4. A, Text1, 3
Taking the 'computed row' as a simple average, I should end up with:
1. A, Text1, 4 (5+3)/2
2. A, Text2, 4
3. B, Text1, 7
This is what I'm doing at the moment:
var groups = (from t in MyList group t by new { t.Field1, t.Field2});
foreach (var @group in groups)
{
if (@group.Count() > 1)
{
var newRow = new MyObject
{
Field1 = @group.ElementAt(0).Field1,
Field2 = @group.ElementAt(0).Field2,
Field3 = @group.Average(i => i.Field3)
};
}
}
Which works fine. I'm just not sure how to replace the rows in the groups I'm iterating over, as there are no remove methods available for the groups. I had originally tried to do this with a couple of nested for loop comparisons, but since I couldn't modify the list I was iterating over I stored matching indexes, but that meant that I had n*(n-1)/2 matches...
Am I missing a perfectly simple way to do this? I can't believe it's very difficult, but I haven't been able to work it out yet.
Answering my own question rather than deleting it as it might be of use to someone who had a similar brainblock:
I was overcomplicating things - I have ended up just creating and returning a new list with the new rows + the rows that did not need processing.
var groups = (from t in MyList group t by new { t.Field1, t.Field2 });
var returnList = new List<MyObject>();
foreach (var @group in groups)
{
returnList.Add(new MyObject
{
Field1 = @group.ElementAt(0).Field1,
Field2 = @group.ElementAt(0).Field2,
Field3 = @group.Average(i => i.Field3)
});
}
return returnList;
Simples.
Note this does do the processing on all groups, even those with only 1 item. Although they don't need to be processed, the average of one value is obviously that value, and it avoids the extra if(...) {process and add} else {add}. I doubt the overhead will be noticeable but it's something to keep an eye on.
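The same idea can also be written as a single LINQ query. This is only a sketch: the MyObject shape (two string fields and one double) and the MergeDuplicates helper are assumptions based on the simplified example above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the rows: two string keys and one numeric value.
class MyObject
{
    public string Field1 { get; set; }
    public string Field2 { get; set; }
    public double Field3 { get; set; }
}

static class GroupDemo
{
    // Collapses rows sharing Field1 and Field2 into one row holding the average of Field3.
    public static List<MyObject> MergeDuplicates(IEnumerable<MyObject> rows) =>
        rows.GroupBy(t => new { t.Field1, t.Field2 })
            .Select(g => new MyObject
            {
                Field1 = g.Key.Field1,
                Field2 = g.Key.Field2,
                Field3 = g.Average(i => i.Field3)
            })
            .ToList();

    static void Main()
    {
        var myList = new List<MyObject>
        {
            new MyObject { Field1 = "A", Field2 = "Text1", Field3 = 5 },
            new MyObject { Field1 = "A", Field2 = "Text2", Field3 = 4 },
            new MyObject { Field1 = "B", Field2 = "Text1", Field3 = 7 },
            new MyObject { Field1 = "A", Field2 = "Text1", Field3 = 3 }
        };

        foreach (var row in MergeDuplicates(myList))
        {
            Console.WriteLine("{0}, {1}, {2}", row.Field1, row.Field2, row.Field3);
        }
        // A, Text1, 4
        // A, Text2, 4
        // B, Text1, 7
    }
}
```

Reading the key from g.Key avoids the repeated ElementAt(0) calls on each group.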
