Why does LINQ treat two methods that do the "same" thing differently?

I came across an interesting question today where I have two methods that, at a quick glance, both do the same thing: that is, return an IEnumerable of Foo objects.
I have defined them below as List1 and List2:
public class Foo
{
    public int ID { get; set; }
    public bool Enabled { get; set; }
}
public static class Data
{
    public static IEnumerable<Foo> List1
    {
        get
        {
            return new List<Foo>
            {
                new Foo {ID = 1, Enabled = true},
                new Foo {ID = 2, Enabled = true},
                new Foo {ID = 3, Enabled = true}
            };
        }
    }
    public static IEnumerable<Foo> List2
    {
        get
        {
            yield return new Foo {ID = 1, Enabled = true};
            yield return new Foo {ID = 2, Enabled = true};
            yield return new Foo {ID = 3, Enabled = true};
        }
    }
}
Now consider the following tests:
IEnumerable<Foo> listOne = Data.List1;
listOne.Where(item => item.ID.Equals(2)).First().Enabled = false;
Assert.AreEqual(false, listOne.ElementAt(1).Enabled);
Assert.AreEqual(false, listOne.ToList()[1].Enabled);
IEnumerable<Foo> listTwo = Data.List2;
listTwo.Where(item => item.ID.Equals(2)).First().Enabled = false;
Assert.AreEqual(false, listTwo.ElementAt(1).Enabled);
Assert.AreEqual(false, listTwo.ToList()[1].Enabled);
These two methods seem to do the "same" thing.
Why do the second assertions in the test code fail?
Why is listTwo's second "Foo" item not getting set to false when it is in listOne?
NOTE: I'm after an explanation of why this is allowed to happen and what the differences between the two are, not how to fix the second assertion, as I know that if I add a ToList call to List2 it will work.

The first block of code builds the items once and returns a list with the items.
The second block of code builds those items each time the IEnumerable is walked through.
This means that the second and third lines of the first block operate on the same object instance. The second block's second and third lines operate on different instances of Foo (new instances are created as you iterate through).
The best way to see this would be to set breakpoints in the methods and run this code under the debugger. The first version will only hit the breakpoint once. The second version will hit it twice, once during the .Where() call, and once during the .ElementAt call. (edit: with the modified code, it will also hit the breakpoint a third time, during the ToList() call.)
The thing to remember here is that an iterator method (i.e., one that uses yield return) runs its body again every time the sequence is enumerated, not just once when the initial return value is constructed.
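A minimal sketch of that behaviour, assuming a C# 9+ top-level program. Items() is a hypothetical local iterator standing in for Data.List2 (it yields plain objects rather than Foo instances), and the runs counter exists only to show how often the iterator body starts executing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many times the iterator body has started executing.
int runs = 0;

// Hypothetical stand-in for Data.List2: a local iterator function.
IEnumerable<object> Items()
{
    runs++;                      // executes on the first MoveNext of each new enumeration
    yield return new object();
    yield return new object();
    yield return new object();
}

IEnumerable<object> lazy = Items();   // the body has not run yet
Console.WriteLine(runs);              // 0

object first = lazy.ElementAt(1);     // enumeration #1
object second = lazy.ElementAt(1);    // enumeration #2: the body runs again
Console.WriteLine(runs);                           // 2
Console.WriteLine(ReferenceEquals(first, second)); // False: two different objects

// ToList enumerates once more (#3) and stores the results.
List<object> cached = lazy.ToList();
Console.WriteLine(runs);                                            // 3
Console.WriteLine(ReferenceEquals(cached[1], cached.ElementAt(1))); // True: same stored object
```

With List1 the situation is reversed: the getter runs once, and every later ElementAt or ToList reads the same stored objects.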

Those are definitely not the same thing.
The first builds and returns a list the moment you call it. You can cast it back to a List<Foo> and do list-y things with it if you want, including adding or removing items, and once you've put the result in a variable you're acting on that single set of results. Calling it again would produce another set of results, but reusing the result of a single call acts on the same objects.
The second builds an iterator. You can enumerate it, but you can't treat it as a list without first calling .ToList() on it. In fact, reading the property doesn't run any of its code until you actually iterate over it. Consider:
var fooList = Data.List2.Where(f => f.ID > 1);
// NO Foo objects have been created yet.
foreach (var foo in fooList)
{
    // a new Foo object is created, but NOT until the loop actually asks for it
    Console.WriteLine(foo.Enabled.ToString());
}
Note that the code above will create the first (unused) Foo instance, but not until entering the foreach loop. So the items aren't actually created until called for. But that means every time you call for them, you're building a new set of items.

listTwo is an iterator: a compiler-generated state machine.
ElementAt has to walk the sequence from the beginning to reach the element at index i (it only takes a shortcut through the indexer when the source implements IList<T>, as a List<Foo> does). Because listTwo is an iterator, every such walk reruns the iterator body, so listTwo starts over with freshly created Foo objects whose Enabled is true, exactly as the yield return statements set it.

Suggestion: compile the code and open it in Reflector. yield is syntactic sugar: you will be able to see the difference between the logic you wrote and the code the compiler generates for the yield keyword. The two are not the same.

Related

Does the GetEnumerator() in c# return a copy or the iterates the original source?

I have a simple GetEnumerator usage.
private ConcurrentQueue<string> queue = new ConcurrentQueue<string>();
public IEnumerator GetEnumerator()
{
    return queue.GetEnumerator();
}
I want to update the queue outside of this class.
So, I'm doing:
var list = _queue.GetEnumerator();
while (list.MoveNext())
{
    list.Current as string = "aaa";
}
Does GetEnumerator() return a copy of the queue, or does it iterate over the original?
So, while updating, am I updating the original?
Thank you :)
It depends on the exact underlying implementation.
As far as I remember, most of the built-in .NET collections enumerate the current data, not a snapshot.
You will likely get an exception if you modify a collection while iterating over it -- this is to protect against exactly this issue.
This is not the case for ConcurrentQueue<T>, as its GetEnumerator method returns a snapshot of the contents of the queue (as of .NET 4.6; see the documentation).
The IEnumerator interface does not have a setter on the Current property, so you cannot modify the collection this way (see the documentation).
Modifying a collection (adding, removing, or replacing elements) while iterating is risky in general, as you cannot rely on knowing how the iterator is implemented.
To add to this, a queue is designed for taking the first element and adding elements at the end; in any case it does not allow replacing an element "in the middle".
Here are two approaches that could work:
Approach 1 - Create a new queue with updated elements
Iterate over the original queue and recreate a new collection in the process.
var newQueueUpdated = new ConcurrentQueue<string>();
var iterator = _queue.GetEnumerator();
while (iterator.MoveNext())
{
    // derive the new value from iterator.Current as needed; "aaa" is a placeholder
    newQueueUpdated.Enqueue("aaa");
}
_queue = newQueueUpdated;
This is naturally done in one go by using LINQ's Select and feeding the resulting IEnumerable to the ConcurrentQueue constructor:
_queue = new ConcurrentQueue<string>(_queue.Select(x => "aaa"));
Beware: this can be resource-consuming, since it copies the whole collection. Other implementations are of course possible, especially if your collection is large.
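A runnable sketch of the one-liner, assuming a C# 9+ top-level program; ToUpperInvariant is an arbitrary example transformation standing in for whatever update you actually need.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

var queue = new ConcurrentQueue<string>(new[] { "a", "b", "c" });

// Enumerate a snapshot of the old queue, transform each element,
// and build a brand-new queue from the results (FIFO order is preserved).
queue = new ConcurrentQueue<string>(queue.Select(s => s.ToUpperInvariant()));

Console.WriteLine(string.Join(",", queue)); // A,B,C
```

One design trade-off to keep in mind: swapping the reference is not atomic with respect to other threads. An item enqueued on the old queue after its snapshot was taken would be lost, so this sketch assumes there are no concurrent writers during the rebuild (or that the rebuild is guarded by a lock).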
Approach 2 - Collection of mutable elements
You could use a wrapper class to enable mutation of objects stored:
public class MyObject
{
    public string Value { get; set; }
}
Then you create a private ConcurrentQueue<MyObject> queue = new ConcurrentQueue<MyObject>(); instead.
And now you can mutate the elements, without having to change any reference in the collection itself:
var enumerator = _queue.GetEnumerator();
while (enumerator.MoveNext())
{
    enumerator.Current.Value = "aaa";
}
In the code above, the references stored by the container never change. Their internal state has changed, though.
In the question code, you were actually trying to replace one object (a string) with another, which is not a well-defined operation for a queue, and cannot be done through .Current, which is read-only. For some containers it should even be forbidden.
Here's some test code to see if I can modify the ConcurrentQueue<string> while it is iterating.
ConcurrentQueue<string> queue = new ConcurrentQueue<string>(new[] { "a", "b", "c" });
var e = queue.GetEnumerator();
while (e.MoveNext())
{
    Console.Write(e.Current);
    if (e.Current == "b")
    {
        queue.Enqueue("x");
    }
}
e = queue.GetEnumerator(); //e.Reset(); is not supported
while (e.MoveNext())
{
    Console.Write(e.Current);
}
That runs successfully and produces abcabcx.
However, if we change the collection to a standard List<string> then it fails.
Here's the implementation:
List<string> list = new List<string>(new[] { "a", "b", "c" });
var e = list.GetEnumerator();
while (e.MoveNext())
{
    Console.Write(e.Current);
    if (e.Current == "b")
    {
        list.Add("x");
    }
}
e = list.GetEnumerator();
while (e.MoveNext())
{
    Console.Write(e.Current);
}
That produces ab before throwing an InvalidOperationException.
For ConcurrentQueue this is specifically addressed by the documentation:
The enumeration represents a moment-in-time snapshot of the contents
of the queue. It does not reflect any updates to the collection after
GetEnumerator was called. The enumerator is safe to use concurrently
with reads from and writes to the queue.
So the answer is: It acts as if it returns a copy. (It doesn't actually make a copy, but the effect is as if it was a copy - i.e. changing the original collection while enumerating it will not change the items produced by the enumeration.)
This behaviour is NOT guaranteed for other types - for example, attempting to enumerate a List<T> will fail if the list is modified during the enumeration.

IEnumerable<T> from Enumerable.Range().Select() vs ToList()

This really stumped me, as I expected 'pass by reference' behavior. I expected this code to print "5,5,5", but instead it prints "7,7,7".
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 });
Alter(list);
Console.WriteLine(string.Join(',', list.Select(x => x.Name.ToString())));
Console.ReadLine();
void Alter(IEnumerable<MyObj> list)
{
    foreach (MyObj obj in list)
    {
        obj.Name = 5;
    }
}
class MyObj
{
    public int Name { get; set; }
}
Whereas this prints "5,5,5" as expected.
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 }).ToList();
Alter(list);
Console.WriteLine(string.Join(',', list.Select(x => x.Name.ToString())));
Console.ReadLine();
void Alter(IEnumerable<MyObj> list)
{
    foreach (MyObj obj in list)
    {
        obj.Name = 5;
    }
}
class MyObj
{
    public int Name { get; set; }
}
Obviously this is a simplified version of the actual code I was writing. This feels a lot more like behavior I've run into with a property that instantiates a new instance of an object. Here I understand why I would get a new instance every time I reference Mine.
public MyObj Mine => new MyObj();
I was just very surprised to see this behavior in the above code, where it feels more like I've "locked in" the enumerated objects. Can anyone help me understand this?
This feels a lot more like behavior I've run into with a property that instantiates a new instance of an object
Your feelings are correct
As a preface to this answer I want to point out that in C# we can store methods inside variables like we can store data:
var method = () => new MyObj();
The () => new MyObj() is the "method"; it has no name, takes no parameters and returns a new MyObj. We don't usually call it a method (we call it a lambda), but for all intents and purposes it behaves like what you're familiar with as "a method":
public MyObj SomeName()
{
    return new MyObj();
}
You can probably pick out the common parts: the compiler infers the return type, gives the method an internal name because we don't care what it's called, and, when the body is a single expression that produces a value, fills in the return keyword for us too. It's a very compact way of writing a method.
So, back to this variable called method:
var method = () => new MyObj();
You could run it like:
var myObj = method.Invoke();
You could even drop the Invoke and just write method(), but I'll leave Invoke in for now, for clarity. When invoked, the method runs and returns a new MyObj, and that object is stored in myObj.
So, most of the time a variable stores data, but sometimes it stores a method, and this is handy.
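A short sketch of storing a method in a variable, assuming C# 10 or later for the var form (earlier versions need an explicit delegate type such as Func<object>, shown in the second declaration). new object() stands in for new MyObj() so the block runs on its own.

```csharp
using System;

// C# 10+: the lambda gets a natural type (Func<object>), so var works.
var method = () => new object();

// Any C# version: spell out the delegate type explicitly.
Func<object> sameIdea = () => new object();

object a = method.Invoke();
object b = method();          // shorthand for method.Invoke()

// Each invocation runs the lambda body again and builds a new object.
Console.WriteLine(ReferenceEquals(a, b));                    // False
Console.WriteLine(ReferenceEquals(sameIdea(), sameIdea()));  // False
```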
When you make a LINQ Select it requires you to give it a method that it will run every time the resulting enumerable is enumerated:
var enumerable = new []{1,2,3}.Select(x => new MyObj());
It doesn't run the method x => new MyObj() at the time you call Select, and it doesn't generate any data; it just stores the method for later.
Every time you loop over (enumerate) the result, it will run that method (x => new MyObj()) up to 3 times. I say "up to" because you don't have to fully enumerate, but if you do, there are 3 elements in the array, so the enumeration can cause at most 3 invocations of the method.
LINQ's Select doesn't run the method and grab the data; it effectively creates a collection of data-providing methods rather than a collection of data.
Conceptually, it's as if Select produced an array full of methods:
var enumerable = new Func<MyObj>[]
{
    () => new MyObj(),
    () => new MyObj(),
    () => new MyObj()
};
You could enumerate this and run these methods yourself:
foreach (var method in enumerable)
    method.Invoke();
Your Alter() ran the enumeration (did a loop), got each new object returned from each method, modified it, then it was thrown away. Conceptually your Alter method did this:
foreach (var method in enumerable)
    method.Invoke().Name = 5;
Each method was run, a new MyObj was made with its initial Name (7 in your code), then Name was changed to 5, and then the object was thrown away.
After Alter was done with it, you enumerated it again. New objects were made again because the methods were run again, and Name was of course unaltered: new object, no changes. Nothing here is geared up to remember the data being generated; the only thing that is remembered is the list of methods that generate the data.
When you put a ToList() on the end, that's different. ToList internally enumerates the enumerable, which causes the methods that generate the objects to run, but, critically, it stashes the resulting objects in a list and gives you that list. Instead of a collection of methods that provide values, you have a collection of values that you can alter.
It's like it does this:
var r = new List<MyObj>();
foreach (var method in enumerable)
    r.Add(method.Invoke());
return r;
This means you get a list of data back (not a list of methods that provide data), and if you then loop over this list, you're altering data that is stored in the list, rather than running methods, altering the returned value, and throwing it away.
Select produces an object that has lazy evaluation. This means all steps are executed on demand.
So, when this line gets executed:
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 });
no MyObj instance is created yet - only the enumerable that has all the information to compute what you've indicated.
When you call Alter, the query is executed during the foreach iteration, and each newly created object gets 5 assigned to its Name property; those objects are then discarded.
But when you print it, everything gets executed again here:
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
So during execution of string.Join, brand new MyObj instances are created with new MyObj { Name = 7 } and printed.
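To make the on-demand execution visible, here is a sketch that counts how often the Select lambda runs, assuming a C# 9+ top-level program. The calls counter and the use of new object() in place of new MyObj { Name = 7 } are additions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int calls = 0;

// Nothing runs here: Select only stores the lambda.
IEnumerable<object> lazy = Enumerable.Range(0, 3).Select(x => { calls++; return new object(); });
Console.WriteLine(calls); // 0

foreach (object o in lazy) { }  // like Alter: runs the lambda 3 times
foreach (object o in lazy) { }  // like string.Join: runs it 3 more times
Console.WriteLine(calls); // 6

// ToList runs the lambda once per element and keeps the resulting objects.
List<object> stored = lazy.ToList();
Console.WriteLine(calls); // 9

foreach (object o in stored) { } // iterating the list does not run the lambda
Console.WriteLine(calls); // 9
```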

Why does the IEnumerable<T> successfully enumerate twice without me doing anything to reset it?

I noticed something interesting today when I was making changes for a pull request. Below is my code:
public List<MatColor> Colors { get; set; }
public List<Customer> Customers { get; set; }
public ProductViewModel()
{
    this.Colors = new List<MatColor>();
    this.Customers = new List<Customer>();
    var products = this.LoadProductViewList();
    this.LoadColorComboBox(products);
    this.LoadCustomerComboBox(products);
}
public void LoadColorComboBox(IEnumerable<Product> products)
{
    this.Colors.Clear();
    this.Colors = products.Select(p => new MatColor()
        { Code = p.ColorCode, Description = p.ColorDescription })
        .DistinctBy(p => p.Code).DistinctBy(p => p.Description).ToList();
}
private void LoadCustomerComboBox(IEnumerable<Product> products)
{
    this.Customers.Clear();
    this.Customers = products.Select(p => new Customer()
        { Name = p.CustomerName, Number = p.CustomerNumber })
        .DistinctBy(p => p.Number).DistinctBy(p => p.Name).ToList();
}
This code does everything I want it to. It successfully populates both the Colors and Customers lists. I understand why it would always successfully populate the Colors list. That's because LoadColorComboBox(...) gets called first.
But an IEnumerable<T> can only get enumerated, ONCE, right? So once it gets enumerated in LoadColorComboBox(...), how is it successfully getting reset and thus enumerated again in LoadCustomerComboBox(...)? I've already checked the underlying type being returned by LoadProductViewList() -- it calls a REST service which returns a Task<IEnumerable<Product>>. Is my IEnumerable<Product> somehow getting passed as a value? It's not a primitive so I was under the impression it's a reference type, thus, would get passed in by reference as default, which would cause the second method to blow up. Can someone please tell me what I'm not seeing here?
An IEnumerable is an object with a GetEnumerator method that is able to give you an enumerator. Generally one would expect it to be able to give you any number of enumerators. Some specific implementations might not support that, but the contract of the interface is that it should be able to give you as many as you want.
As for the IEnumerator instances that it hands out, they do technically have a Reset method, which is supposed to set the enumerator back to the "start" of the sequence. In practice, however, many implementations (including compiler-generated iterators) don't support resetting and just throw an exception when you call the method.
In your case, you're not actually resetting an IEnumerator and using it to get the values of a sequence twice (which wouldn't work, since none of the iterators produced by the LINQ methods you're using support being reset). What you're doing is simply getting multiple different enumerators, which those LINQ methods all support.
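A small sketch of both points, assuming a C# 9+ top-level program; Doubled is a hypothetical local iterator standing in for the LINQ query in the question. Two enumerators taken from the same sequence advance independently, while calling Reset on a compiler-generated iterator throws NotSupportedException.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical iterator standing in for the query built from products.
IEnumerable<int> Doubled(int[] values)
{
    foreach (int v in values)
        yield return v * 2;
}

IEnumerable<int> doubled = Doubled(new[] { 1, 2, 3 });

// Each GetEnumerator call hands out a separate cursor over the sequence.
using IEnumerator<int> first = doubled.GetEnumerator();
using IEnumerator<int> second = doubled.GetEnumerator();

first.MoveNext();
first.MoveNext();   // first is now on the second element (4)
second.MoveNext();  // second is independently on the first element (2)
string positions = $"{first.Current} {second.Current}";
Console.WriteLine(positions); // 4 2

// Compiler-generated iterators do not support Reset.
bool resetThrew = false;
try
{
    first.Reset();
}
catch (NotSupportedException)
{
    resetThrew = true;
}
Console.WriteLine(resetThrew); // True
```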
But an IEnumerable can only get enumerated, ONCE, right?
No. An IEnumerable<T> can be enumerated any number of times. But for each enumeration, each item can be yielded only once.
In your case, products will be enumerated once for each LINQ query that needs the list of products... so once in LoadColorComboBox and once in LoadCustomerComboBox.

Adding items to an IEnumerable through an extension method does not work?

In most of the methods I write that return some kind of collection, I return IEnumerable rather than the specific type (e.g. List). In many cases I have another collection that I want to combine with the resulting IEnumerable; this would be exactly like taking a List and adding another List to it using the AddRange method. In the example below I have created an extension method that should take a collection of items and add them to a base collection. While debugging, this appears to work, but the items are never added to the original collection. I don't understand why they aren't added; is there something about the implementation of IEnumerable that I am missing? I understand that IEnumerable is a read-only interface, but I am not adding to the list in the example below, I am replacing it, and yet the original IEnumerable does not change.
class Program
{
    static void Main(string[] args)
    {
        var collectionOne = new CollectionContainerOne();
        var collectionTwo = new CollectionContainerTwo();
        // Starts at 1-50 //
        collectionOne.Orders.AddRange(collectionTwo.Orders);
        // Should now be 100 items but remains original 50 //
    }
}
public class CollectionContainerOne
{
    public IEnumerable<Order> Orders { get; set; }
    public CollectionContainerOne()
    {
        var testIds = Enumerable.Range(1, 50);
        var orders = new List<Order>();
        foreach (int i in testIds)
        {
            orders.Add(new Order() { Id = i, Name = "Order #" + i.ToString() });
        }
        this.Orders = orders;
    }
}
public class CollectionContainerTwo
{
    public IEnumerable<Order> Orders { get; set; }
    public CollectionContainerTwo()
    {
        var testIds = Enumerable.Range(51, 50);
        var orders = new List<Order>();
        foreach (int i in testIds)
        {
            orders.Add(new Order() { Id = i, Name = "Order #" + i.ToString() });
        }
        this.Orders = orders;
    }
}
public class Order
{
    public int Id { get; set; }
    public string Name { get; set; }
    public override string ToString()
    {
        return this.Name;
    }
}
public static class IEnumerable
{
    public static void AddRange<T>(this IEnumerable<T> enumerationToAddTo, IEnumerable<T> itemsToAdd)
    {
        var addingToList = enumerationToAddTo.ToList();
        addingToList.AddRange(itemsToAdd);
        // Neither of the following works //
        enumerationToAddTo.Concat(addingToList);
        // OR
        enumerationToAddTo = addingToList;
        // OR
        enumerationToAddTo = new List<T>(addingToList);
    }
}
You are modifying the parameter enumerationToAddTo, which holds a reference. However, that reference is itself passed by value, so when you reassign the parameter, the change is not observable in the caller. Furthermore, you cannot make the this parameter of an extension method a ref parameter for a reference type such as IEnumerable<T> (C# 7.2 allows ref extension methods only on value types).
You are better off using Enumerable.Concat<T>. Alternatively, you can use ICollection<T>, which has an Add(T) method. Unfortunately, List<T>.AddRange isn't defined in any interface.
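A minimal sketch of the ICollection<T> route, assuming a C# 9+ top-level program. AddAll is a hypothetical helper, written as a plain local function rather than an extension method only to keep the sketch to top-level statements. It mutates the collection object the caller already holds, so no reassignment, and therefore no ref parameter, is needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: appends items to any mutable ICollection<T>.
void AddAll<T>(ICollection<T> target, IEnumerable<T> items)
{
    foreach (T item in items)
        target.Add(item);   // changes the shared list object, visible through every reference to it
}

ICollection<int> orders = new List<int> { 1, 2, 3 };
AddAll(orders, new[] { 4, 5 });

Console.WriteLine(string.Join(",", orders)); // 1,2,3,4,5
```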
Here is an example to illustrate the passing of reference types by reference. As Nikola points out, this is not really useful code. Don't try this at home!
void Caller()
{
    // think of ss as a piece of paper that tells you where to find the list.
    List<string> ss = new List<string> { "a", "b" };
    // passing by value: we take another piece of paper and copy the information on ss to that piece of paper; we pass that to the method
    DoNotReassign(ss);
    // at this point, ss refers to the same list, which now contains { "a", "b", "c" }
    // passing by reference: we pass the actual original piece of paper to the method.
    Reassign(ref ss);
    // now, ss refers to a different list, whose contents are { "x", "y", "z" }
}
void DoNotReassign(List<string> strings)
{
    strings.Add("c");
    strings = new List<string> { "x", "y", "z" }; // the caller will not see the change of reference
    // in the piece of paper analogy, we have erased the piece of paper and written the location
    // of the new list on it. Because this piece of paper is a copy of ss, the caller doesn't see the change.
}
void Reassign(ref List<string> strings)
{
    strings.Add("d");
    // at this point, strings contains { "a", "b", "c", "d" }, but we're about to throw that away:
    strings = new List<string> { "x", "y", "z" };
    // because strings is a reference to the caller's variable ss, the caller sees the reassignment to a new collection
    // in the piece of paper analogy, when we erase the paper and put the new object's
    // location on it, the caller sees that, because we are operating on the same
    // piece of paper ("ss") as the caller
}
EDIT
Consider this program fragment:
string originalValue = "Hello, World!";
string workingCopy = originalValue;
workingCopy = workingCopy.Substring(0, workingCopy.Length - 1);
workingCopy = workingCopy + "?";
Console.WriteLine(originalValue.Equals("Hello, World!")); // writes "True"
Console.WriteLine(originalValue.Equals(workingCopy)); // writes "False"
If your assumption about reference types were true, the output would be "False" then "True".
Calling your extension method like this:
collectionOne.Orders.AddRange(collectionTwo.Orders);
Is essentially the same as:
IEnumerable.AddRange(collectionOne.Orders, collectionTwo.Orders);
What happens there is that you pass a copy of the reference stored in collectionOne.Orders to the AddRange method. In your AddRange implementation you assign a new value to that copy, and it gets "lost" inside: you are not assigning a new value to collectionOne.Orders, you are assigning it to a local copy whose scope is only the method body itself. As a result, the outside world notices none of the modifications happening inside AddRange.
You either need to return a new enumerable or work on lists directly. Having mutating methods on IEnumerable<T> is rather counterintuitive; I'd stay away from doing that.
What you want exists and is called Concat. Essentially, when you do this in your Main:
var combined = collectionOne.Orders.Concat(collectionTwo.Orders);
Here, combined will refer to an IEnumerable that will traverse both source collections when enumerated.
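A sketch of the "return a new enumerable" approach, assuming a C# 9+ top-level program. Orders are modeled as plain ints from Enumerable.Range for brevity, in place of the Order class; the point is that Concat's result has to be kept, because the sources are never modified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> orders = Enumerable.Range(1, 50);      // stands in for collectionOne.Orders
IEnumerable<int> moreOrders = Enumerable.Range(51, 50); // stands in for collectionTwo.Orders

// Concat never modifies its sources; it returns a new lazy sequence.
orders.Concat(moreOrders);            // result discarded, so orders is unchanged
Console.WriteLine(orders.Count());    // 50

// Keep the result to get the combined view.
IEnumerable<int> combined = orders.Concat(moreOrders);
Console.WriteLine(combined.Count());  // 100
```

In the question's code, the equivalent would be assigning the result back to the property, e.g. collectionOne.Orders = collectionOne.Orders.Concat(collectionTwo.Orders); the right-hand side is evaluated before the assignment, so the new sequence wraps the old one rather than itself.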
IEnumerable does not support adding. What you in essence did in your code is create a new collection from your enumerable and add items to that new collection. Your old collection still has the same items.
E.g., you create a collection of numbers like this:
Collection1 = [ 1, 2, 3, 4, 5 ]
When you do Collection1.ToList().Add(...), you get a new collection with the same members, and add new members to it like so:
Collection1 = [ 1, 2, 3, 4, 5, 6, 7, ... ]
Your old collection, however, will still hold the old members, as ToList creates a new collection.
Solution #1:
Instead of using IEnumerable use IList which supports modification.
Solution #2 (bad):
Cast your IEnumerable back to its derived type and add members to it. This is quite bad, though; in fact, it's better to just return List in the first place.
IEnumerable<Order> collectionOne = ...;
List<Order> collectionOneList = (List<Order>)collectionOne;
collectionOneList.Add(new Order());
General guideline (best):
If you are returning collections that are standard in .NET, there is no reason to return their interfaces. In this case it's best to use the original type. If, however, you are returning a collection that you implemented yourself, then you should return an interface.
It's a completely different case when you are thinking about input parameters. If your method needs to enumerate over items, then you should ask for IEnumerable. This way you can do what you need with it, and you place the least constraint on the caller: they can pass any enumerable. If you need to add to that collection, you may require IList so that you can also modify it in your method.
Basically, the problem is that assigning a value to enumerationToAddTo has no effect on the caller, because it isn't a ref parameter. Also, as phoog mentions, ToList() creates a new list; it does not cast the existing IEnumerable to a list.
This isn't really a good use of an extension method. I would recommend adding a method to your container class that allows you to add new items to the IEnumerable instance. This would better encapsulate the logic that's particular to that class.

Why does setting a property on an enumerated object not work?

I know a lot about C# but this one is stumping me and Google isn't helping.
I have an IEnumerable range of objects. I want to set a property on the first one. I do so, but when I enumerate over the range of objects after the modification, I don't see my change.
Here's a good example of the problem:
public static void GenericCollectionModifier()
{
    // 1, 2, 3, 4... 10
    var range = Enumerable.Range(1, 10);
    // Convert range into SubItem classes
    var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i});
    Write(items); // Expect to output 1,2,3,4,5,6,7,8,9,10
    // Make a change
    items.First().MagicNumber = 42;
    Write(items); // Expect to output 42,2,3,4,5,6,7,8,9,10
    // Actual output: 1,2,3,4,5,6,7,8,9,10
}
public static void Write(IEnumerable<SubItem> items)
{
    Console.WriteLine(string.Join(", ", items.Select(item => item.MagicNumber.ToString()).ToArray()));
}
public class SubItem
{
    public string Name;
    public int MagicNumber;
}
What aspect of C# stops my "MagicNumber = 42" change from being output? Is there a way I can get my change to "stick" without doing some funky converting to List<> or array?
Thanks!
-Mike
When you call First() it enumerates over the result of this bit of code:
Select(i => new SubItem() {Name = "foo", MagicNumber = i});
Note that the Select is a lazy enumerator, meaning that it only does the select when you ask for an item from it (and does it every time you ask it). The results are not stored anywhere, so when you call items.First() you get a new SubItem instance. When you then pass items to Write, it gets a whole bunch of new SubItem instances - not the one you got before.
If you want to store the result of your select and modify it, you need to do something like:
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i}).ToList();
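A sketch of the same experiment with and without ToList, assuming a C# 9+ top-level program. StrongBox<int> (from System.Runtime.CompilerServices, a class with a public mutable Value field) stands in for SubItem so the block needs no type declarations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// Lazy: every First() call re-runs the Select lambda and builds a new box.
IEnumerable<StrongBox<int>> lazy = Enumerable.Range(1, 10).Select(i => new StrongBox<int>(i));
lazy.First().Value = 42;                 // modifies an object that is immediately discarded
Console.WriteLine(lazy.First().Value);   // 1

// Materialized: First() returns the object stored in the list.
List<StrongBox<int>> stored = Enumerable.Range(1, 10).Select(i => new StrongBox<int>(i)).ToList();
stored.First().Value = 42;               // modifies the object held by the list
Console.WriteLine(stored.First().Value); // 42
```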
I suspect something is going on in the background, most likely due to the fact that IEnumerables can only be iterated once.
Does it work if you add a 'ToList()' after the call to Select() when assigning to 'items'?
The only thing I can think of is that items.First() passes you a copy of SubItem instead of the reference, so when you set it the change isn't carried through.
I have to assume it has something to do with IQueryable only being able to be iterated once. You may want to try changing this:
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i});
to
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i}).ToList();
And see if there are any different results.
You can't/shouldn't modify a collection through an enumerator. I'm surprised this doesn't throw an exception.
.First() is a method, not a property. It returns a new instance of the object in the first position of your Enumerable.
