IEnumerable<T> from Enumerable.Range().Select() vs ToList() - c#

This really stumped me, as I expected 'pass by reference' behavior. I expected this code to print "5,5,5", but instead it prints "7,7,7".
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 });
Alter(list);
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
Console.ReadLine();
void Alter(IEnumerable<MyObj> list)
{
foreach(MyObj obj in list)
{
obj.Name = 5;
}
}
class MyObj
{
public int Name { get; set; }
}
Whereas this prints "5,5,5" as expected.
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 }).ToList();
Alter(list);
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
Console.ReadLine();
void Alter(IEnumerable<MyObj> list)
{
foreach(MyObj obj in list)
{
obj.Name = 5;
}
}
class MyObj
{
public int Name { get; set; }
}
Obviously this is a simplified version of the actual code I was writing. This feels a lot more like behavior I've run into with a property that instantiates a new instance of an object. Here I understand why I would get a new instance every time I reference Mine.
public MyObj Mine => new MyObj();
I was just very surprised to see this behavior in the above code, where it feels more like I've "locked in" the enumerated objects. Can anyone help me understand this?

This feels a lot more like behavior I've run into with a property that instantiates a new instance of an object
Your feelings are correct
As a preface to this answer I want to point out that in C# we can store methods inside variables like we can store data:
var method = () => new MyObj();
The () => new MyObj() is the "method"; it has no name, takes no parameters and returns a new MyObj. We don't actually refer to it as a method, we tend to call it a lambda, but for all intents and purposes it behaves like what you're familiar with as "a method":
public MyObj SomeName(){
return new MyObj();
}
You can probably pick out the common parts: the compiler infers the return type, provides a name for it internally because we don't care, and when it's a single expression that produces a value, the compiler fills in the return keyword for us too. It's a very compact way of writing method logic.
So, back to this variable called method:
var method = () => new MyObj();
You could run it like:
var myObj = method.Invoke();
You could even remove the Invoke word and just write method(), but I'll leave Invoke in for now, for clarity. When invoked, the method will run and return a new MyObj, and that object would be stored in myObj.
So, most times when we code, a variable stores data, but sometimes it stores a method, and this is handy.
When you make a LINQ Select it requires you to give it a method that it will run every time the resulting enumerable is enumerated:
var enumerable = new []{1,2,3}.Select(x => new MyObj());
It doesn't run the method x => new MyObj() you give at the time you call Select, and it doesn't generate any data; it just stores the method for later.
Every time you loop over (enumerate) this it will run that method (x => new MyObj()) up to 3 times - I say up to, because you don't have to fully enumerate but if you do, there are 3 elements in the array so the enumeration can cause at most 3 invocations of the method.
LINQ Select doesn't run the method and grab the data; it effectively creates a collection of data-providing methods rather than a collection of data.
Conceptually it's like Select produces an array full of methods:
var enumerable = new[]{
() => new MyObj(),
() => new MyObj(),
() => new MyObj()
};
You could enumerate this and run these methods yourself:
foreach(var method in enumerable)
method.Invoke();
Your Alter() ran the enumeration (did a loop), got each new object returned from each method, modified it, then it was thrown away. Conceptually your Alter method did this:
foreach(var method in enumerable)
method.Invoke().Name = 5;
The method was run, a new MyObj was made with some default value for Name, then Name was changed to 5, then the object was thrown away.
After your Alter was done, you enumerated it again. New objects were made again because the methods were run again, and the Name was of course unaltered: new object, no changes. Nothing here is geared up to remember the data being generated; the only thing that is remembered is the list of methods that generate the data.
When you put a ToList() on the end, that's different - ToList internally enumerates the enumerable, causes the methods that generate the objects to be run but critically it stashes the resulting objects in a list and gives you the list of objects. Instead of being a collection of methods that provides values, it's a collection of values that you can alter.
It's like it does this:
var r = new List<MyObj>();
foreach(var method in enumerable)
r.Add(method.Invoke());
return r;
This means you get a list of data back (not a list of methods that provide data), and if you then loop over this list you're altering data that is stored in the list, rather than running methods, altering the returned value and throwing it away.
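To make the "methods versus data" point concrete, here is a minimal sketch that counts how often the Select lambda actually runs. The created counter and the one-element int[] (a mutable reference type standing in for MyObj) are my own assumptions for illustration, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical counter: tracks how many times the Select lambda runs.
int created = 0;

IEnumerable<int[]> lazy = Enumerable.Range(0, 3).Select(_ =>
{
    created++;
    return new[] { 7 };  // mutable reference type standing in for MyObj
});

Console.WriteLine(created);  // 0: Select only stored the lambda

// Enumerating runs the lambda 3 times; the altered arrays are then discarded.
foreach (int[] item in lazy) item[0] = 5;
Console.WriteLine(created);  // 3

// Enumerating again builds 3 brand new arrays, so the 5s are gone.
Console.WriteLine(string.Join(",", lazy.Select(a => a[0])));  // 7,7,7
Console.WriteLine(created);  // 6

// ToList runs the lambda 3 more times but keeps the results.
List<int[]> stored = lazy.ToList();
foreach (int[] item in stored) item[0] = 5;
Console.WriteLine(string.Join(",", stored.Select(a => a[0])));  // 5,5,5
Console.WriteLine(created);  // 9: looping over the list created nothing new
```

The counter only moves when the lazy sequence is enumerated; once the results live in a List, looping over them no longer touches the lambda.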

Select produces an object that has lazy evaluation. This means all steps are executed on demand.
So, when this line gets executed:
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 });
no MyObj instance is created yet - only the enumerable that has all the information to compute what you've indicated.
When you call Alter, the query gets executed during iteration via foreach, and each freshly created object gets 5 assigned to its property. Those objects are not stored anywhere, though.
But when you print it, everything gets executed again here:
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
So during execution of string.Join, brand new MyObj instances are created with new MyObj { Name = 7 } and printed.

Related

Enumerable.Repeat perform badly with for loop on initializing List<> [duplicate]

I have a question about the Enumerable.Repeat function.
Suppose I have a class:
class A
{
//code
}
And I create an array of objects of that type:
A [] arr = new A[50];
Next, I want to initialize those objects by calling Enumerable.Repeat:
arr = Enumerable.Repeat(new A(), 50).ToArray();
Will those objects have the same address in memory?
If I check their hash codes, for example like this:
bool theSameHashCode = arr[0].GetHashCode() == arr[1].GetHashCode();
this returns true, and if I change one object's properties, all the other objects change too.
So my question is: is this the proper way to initialize reference type objects? If not, what is a better way?
Using Enumerable.Repeat this way will initialize only one object and return that object every time when you iterate over the result.
Will those objects have the same address in memory?
There is only one object.
To achieve what you want, you can do this:
Enumerable.Range(1, 50).Select(i => new A()).ToArray();
This will return an array of 50 distinct objects of type A.
By the way, the fact that GetHashCode() returns the same value does not imply that the objects are referentially equal (or simply equal, for that matter). Two non-equal objects can have the same hash code.
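Since hash codes can't prove identity, a more direct check is object.ReferenceEquals. The following is a small sketch of that idea, using plain object in place of the question's class A (an assumption made so the snippet stands alone).

```csharp
using System;
using System.Linq;

// object stands in for the question's class A.
object[] repeated = Enumerable.Repeat(new object(), 3).ToArray();
object[] distinct = Enumerable.Range(0, 3).Select(_ => new object()).ToArray();

// Repeat received one already-built instance, so every slot points at it.
bool repeatedShareInstance = object.ReferenceEquals(repeated[0], repeated[1]);

// Select received a function, so each slot got its own instance.
bool distinctShareInstance = object.ReferenceEquals(distinct[0], distinct[1]);

Console.WriteLine(repeatedShareInstance);  // True
Console.WriteLine(distinctShareInstance);  // False
```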
Just to help clarify for Camilo, here's some test code that shows the issue at hand:
void Main()
{
var foos = Enumerable.Repeat(new Foo(), 2).ToArray();
foos[0].Name = "Jack";
foos[1].Name = "Jill";
Console.WriteLine(foos[0].Name);
}
public class Foo
{
public string Name;
}
This prints "Jill". Thus it shows that Enumerable.Repeat is only creating one instance of the Foo class.
When using the following code to create an array:
var foos = Enumerable.Repeat(new Foo(), 2).ToArray();
Every location in the array holds the same instance because you are passing an object, not a function that creates an object. The code above is the same as:
var foo = new Foo();
var foos = Enumerable.Repeat(foo , 2).ToArray();
The reason above also explains why using a Select statement, like in the code below, creates a new object for each entry, because you are passing a function that dictates how each object is created, rather than the object itself.
Enumerable.Range(1, 2).Select(i => new Foo()).ToArray();
I would use a simple for loop to populate an array with new reference types.

Why does the IEnumerable<T> successfully enumerate twice without me doing anything to reset it?

I noticed something interesting today when I was making changes for a pull request. Below is my code:
public List<MatColor> Colors { get; set; }
public List<Customer> Customers { get; set; }
public ProductViewModel()
{
this.Colors = new List<MatColor>();
this.Customers = new List<Customer>();
var products = this.LoadProductViewList();
this.LoadColorComboBox(products);
this.LoadCustomerComboBox(products);
}
public void LoadColorComboBox(IEnumerable<Product> products)
{
this.Colors.Clear();
this.Colors = products.Select(p => new MatColor()
{ Code = p.ColorCode, Description = p.ColorDescription })
.DistinctBy(p => p.Code).DistinctBy(p => p.Description).ToList();
}
private void LoadCustomerComboBox(IEnumerable<Product> products)
{
this.Customers.Clear();
this.Customers = products.Select(p => new Customer()
{ Name = p.CustomerName, Number = p.CustomerNumber })
.DistinctBy(p => p.Number).DistinctBy(p => p.Name).ToList();
}
This code does everything I want it to. It successfully populates both the Colors and Customers lists. I understand why it would always successfully populate the Colors list. That's because LoadColorComboBox(...) gets called first.
But an IEnumerable<T> can only get enumerated, ONCE, right? So once it gets enumerated in LoadColorComboBox(...), how is it successfully getting reset and thus enumerated again in LoadCustomerComboBox(...)? I've already checked the underlying type being returned by LoadProductViewList() -- it calls a REST service which returns a Task<IEnumerable<Product>>. Is my IEnumerable<Product> somehow getting passed as a value? It's not a primitive so I was under the impression it's a reference type, thus, would get passed in by reference as default, which would cause the second method to blow up. Can someone please tell me what I'm not seeing here?
An IEnumerable is an object with a GetEnumerator method, which is able to get you an enumerator. Generally one would expect it to be able to give you any number of enumerators. Some specific implementations might not support it, but the contract of the interface is that it should be able to give you as many as you want.
As for the IEnumerator instances that it spits out, they do technically have a Reset method, which is supposed to set it back to the "start" of the sequence. In practice however, most implementations tend to not support the "reset" operator and just throw an exception when you call the method.
In your case, you're not actually resetting an IEnumerator and trying to use it to get the values of a sequence twice (which wouldn't work, as none of the iterators from the LINQ methods that you're using to create your sequence support being reset). What you're doing is simply getting multiple different enumerators, which those LINQ methods all support.
But an IEnumerable can only get enumerated, ONCE, right?
No. An IEnumerable<T> can be enumerated any number of times. But for each enumeration, each item can be yielded only once.
In your case, products will be enumerated once for each LINQ query that needs the list of products... so once in LoadColorComboBox and once in LoadCustomerComboBox.
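As a rough illustration of "multiple different enumerators", the sketch below asks one LINQ sequence for two enumerators and advances them independently. The array and the x * 10 projection are made up for the example; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> source = new[] { 1, 2, 3 }.Select(x => x * 10);

// Each GetEnumerator call hands back its own independent cursor.
using IEnumerator<int> first = source.GetEnumerator();
using IEnumerator<int> second = source.GetEnumerator();

first.MoveNext();
first.MoveNext();   // first is now on the second element
second.MoveNext();  // second is still on the first element

Console.WriteLine(first.Current);   // 20
Console.WriteLine(second.Current);  // 10

// foreach and LINQ operators do the same thing: every pass gets a new enumerator.
int total = source.Sum() + source.Sum();
Console.WriteLine(total);  // 120
```

This is why passing the same products sequence to two methods works: neither method resets anything, each simply starts its own pass.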

What is the correct way of ignoring arguments in LINQ?

I have the following code:
foreach (var b in userNames.Select(a => new User()))
{
...
}
This works quite well, since it gives me all "fresh" user objects, however Code Analysis complains that I shouldn't create unused locals, so my question is, is there a way of ignoring the arguments (similar to the "_" in Haskell).
PS: perhaps my example is not the best. I am sorry for this.
Thanks!
Update 1
I got the following code analysis error:
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance", "CA1804:RemoveUnusedLocals", MessageId = "a"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance", "CA1804:RemoveUnusedLocals", MessageId = "b")]
_ is a perfectly valid variable name in C#. So writing
foreach(var b in userNames.Select(_ => new User()))
{
}
is perfectly valid code. It depends on your analysis rules whether it accepts such cases or not.
However, your code is indeed quite suspicious: you're mapping a collection of user names to a collection of users but you're not specifying a direct relation between the two: maybe you wanted to write something like this:
foreach(var b in userNames.Select(username => new User(username)))
To create a collection of objects of a given size, just use the length from the original collection.
var newCollection = Enumerable.Repeat(false, input.Length)
    .Select(_ => new User());
but perhaps better would be your own helper method
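using System;
using System.Collections.Generic;

static class MyEnumerable {
    // Calls generator once per element, so every item is a fresh instance.
    public static IEnumerable<T> Repeat<T>(Func<T> generator, int count) {
        for (var i = 0; i < count; ++i) {
            yield return generator();
        }
    }
}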
and then use
var newCollection = MyEnumerable.Repeat(() => new User(), oldCollection.Length);
If quantity is your concern and you need LINQ, you can rewrite it as:
foreach(var user in Enumerable.Repeat(()=>new User(),userNames.Count).Select(x=>x()))
{
}
But, it may look ugly based on how you see it.
I'm sure there is a valid case where you would want to ignore arguments like this, but this doesn't seem like one of them. If you are creating N User objects for N userNames, surely you want to couple those together?
foreach (var b in userNames.Select(a => new { name = a, user = new User() }))
{
...
}
Then you won't have any unused arguments.
But the question remains why you aren't just doing this:
foreach (var name in userNames)
{
var user = new User();
// ...
}
As far as I can see, your use of .Select() makes no sense here.
Select performs a projection on the collection on which it is called, performing a specified operation on each element and returning the transformed results in another collection. If you do not need to perform any operation on the lambda element, you'd better simply create an array of User objects directly.
Answering your edit, as I said above, you can simply do this:
var NewColl = new User[userNames.Length];
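As for initialization, you could have done this, though note that it yields the same User instance userNames.Length times rather than distinct objects: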
Enumerable.Repeat<User>(new User(), userNames.Length);

Why does LINQ treat two methods that do the "same" thing differently?

I came across an interesting question today where I have two methods that, at a quick glance, both do the same thing. That is, both return an IEnumerable of Foo objects.
I have defined them below as List1 and List2:
public class Foo
{
public int ID { get; set; }
public bool Enabled { get; set;}
}
public static class Data
{
public static IEnumerable<Foo> List1
{
get
{
return new List<Foo>
{
new Foo {ID = 1, Enabled = true},
new Foo {ID = 2, Enabled = true},
new Foo {ID = 3, Enabled = true}
};
}
}
public static IEnumerable<Foo> List2
{
get
{
yield return new Foo {ID = 1, Enabled = true};
yield return new Foo {ID = 2, Enabled = true};
yield return new Foo {ID = 3, Enabled = true};
}
}
}
Now consider the following tests:
IEnumerable<Foo> listOne = Data.List1;
listOne.Where(item => item.ID.Equals(2)).First().Enabled = false;
Assert.AreEqual(false, listOne.ElementAt(1).Enabled);
Assert.AreEqual(false, listOne.ToList()[1].Enabled);
IEnumerable<Foo> listTwo = Data.List2;
listTwo.Where(item => item.ID.Equals(2)).First().Enabled = false;
Assert.AreEqual(false, listTwo.ElementAt(1).Enabled);
Assert.AreEqual(false, listTwo.ToList()[1].Enabled);
These two methods seem to do the "same" thing.
Why do the second assertions in the test code fail?
Why is listTwo's second "Foo" item not getting set to false when it is in listOne?
NOTE: I'm after an explanation of why this is allowed to happen and what the differences in the two are. Not how to fix the second assertion as I know that if I add a ToList call to List2 it will work.
The first block of code builds the items once and returns a list with the items.
The second block of code builds those items each time the IEnumerable is walked through.
This means that the second and third line of the first block operate on the same object instance. The second block's second and third line operate on different instances of Foo (new instances are created as you iterate through).
The best way to see this would be to set breakpoints in the methods and run this code under the debugger. The first version will only hit the breakpoint once. The second version will hit it twice, once during the .Where() call, and once during the .ElementAt call. (edit: with the modified code, it will also hit the breakpoint a third time, during the ToList() call.)
The thing to remember here is that an iterator method (ie. it uses yield return) will be run every time the enumerator is iterated through, not just when the initial return value is constructed.
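If you don't have a debugger handy, a counter can play the role of the breakpoint. This is only a sketch: Items is a hypothetical local iterator analogous to List2, and int[] stands in for Foo so the snippet needs no extra types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int bodyRuns = 0;

// Hypothetical iterator, analogous to List2 in the question.
IEnumerable<int[]> Items()
{
    bodyRuns++;  // runs at the start of each enumeration, not when Items() is called
    yield return new[] { 1 };
    yield return new[] { 2 };
}

IEnumerable<int[]> items = Items();
Console.WriteLine(bodyRuns);  // 0: calling the iterator ran none of its body

items.First()[0] = 99;        // first enumeration; the modified array is discarded
Console.WriteLine(bodyRuns);  // 1

int firstValue = items.First()[0];  // second enumeration builds a fresh array
Console.WriteLine(firstValue);      // 1, not 99
Console.WriteLine(bodyRuns);        // 2
```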
Those are definitely not the same thing.
The first builds and returns a list the moment you call it, and you can cast it back to a list and do list-y things with it if you want, including adding or removing items. Once you've put the results in a variable, you're acting on that single set of results. Calling the property again would produce another set of results, but re-using the result of a single call acts on the same objects.
The second builds an IEnumerable. You can enumerate it, but you can't treat it as a list without first calling .ToList() on it. In fact, calling the method doesn't do anything until you actually iterate over it. Consider:
var fooList = Data.List2.Where(f => f.ID > 1);
// NO foo objects have been created yet.
foreach (var foo in fooList)
{
// a new Foo object is created, but NOT until it's actually used here
Console.WriteLine(foo.Enabled.ToString());
}
Note that the code above will create the first (unused) Foo instance, but not until entering the foreach loop. So the items aren't actually created until called for. But that means every time you call for them, you're building a new set of items.
listTwo is an iterator - a state machine.
ElementAt must start at the beginning of the iterator to get the element at the i-th index (unlike with a List<T>, which it can index directly), and as such, listTwo will be re-run from the start, yielding new items with the default value of Enabled = true for all three.
Suggestion: compile the code and open it with Reflector. yield is syntactic sugar. You will be able to see the difference between the code you wrote and the code generated for the yield keyword. The two are not the same.

Why does setting a property on an enumerated object not work?

I know a lot about C# but this one is stumping me and Google isn't helping.
I have an IEnumerable range of objects. I want to set a property on the first one. I do so, but when I enumerate over the range of objects after the modification, I don't see my change.
Here's a good example of the problem:
public static void GenericCollectionModifier()
{
// 1, 2, 3, 4... 10
var range = Enumerable.Range(1, 10);
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i});
Write(items); // Expect to output 1,2,3,4,5,6,7,8,9,10
// Make a change
items.First().MagicNumber = 42;
Write(items); // Expect to output 42,2,3,4,5,6,7,8,9,10
// Actual output: 1,2,3,4,5,6,7,8,9,10
}
public static void Write(IEnumerable<SubItem> items)
{
Console.WriteLine(string.Join(", ", items.Select(item => item.MagicNumber.ToString()).ToArray()));
}
public class SubItem
{
public string Name;
public int MagicNumber;
}
What aspect of C# stops my "MagicNumber = 42" change from being output? Is there a way I can get my change to "stick" without doing some funky converting to List<> or array?
Thanks!
-Mike
When you call First() it enumerates over the result of this bit of code:
Select(i => new SubItem() {Name = "foo", MagicNumber = i});
Note that the Select is a lazy enumerator, meaning that it only does the select when you ask for an item from it (and does it every time you ask it). The results are not stored anywhere, so when you call items.First() you get a new SubItem instance. When you then pass items to Write, it gets a whole bunch of new SubItem instances - not the one you got before.
If you want to store the result of your select and modify it, you need to do something like:
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i}).ToList();
I suspect something is going on in the background, most likely due to the fact that IEnumerables can only be iterated once.
Does it work if you add a 'ToList()' after the call to Select() when assigning to 'items'?
The only thing I can think of is that items.First() passes a copy of SubItem to you instead of the reference, so when you set it the change isn't carried through.
I have to assume it has something to do with IQueryable only being able to be iterated once. You may want to try changing this:
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i});
to
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i}).ToList();
And see if there are any different results.
You can't/shouldn't modify a collection through an enumerator. I'm surprised this doesn't throw an exception.
.First() is a method, not a property. Here each call re-runs the Select, so it returns a new instance of the object in the first position of your enumerable.
