Why does setting a property on an enumerated object not work? - c#

I know a lot about C# but this one is stumping me and Google isn't helping.
I have an IEnumerable range of objects. I want to set a property on the first one. I do so, but when I enumerate over the range of objects after the modification, I don't see my change.
Here's a good example of the problem:
public static void GenericCollectionModifier()
{
// 1, 2, 3, 4... 10
var range = Enumerable.Range(1, 10);
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i});
Write(items); // Expect to output 1,2,3,4,5,6,7,8,9,10
// Make a change
items.First().MagicNumber = 42;
Write(items); // Expect to output 42,2,3,4,5,6,7,8,9,10
// Actual output: 1,2,3,4,5,6,7,8,9,10
}
public static void Write(IEnumerable<SubItem> items)
{
Console.WriteLine(string.Join(", ", items.Select(item => item.MagicNumber.ToString()).ToArray()));
}
public class SubItem
{
public string Name;
public int MagicNumber;
}
What aspect of C# stops my "MagicNumber = 42" change from being output? Is there a way I can get my change to "stick" without doing some funky converting to List<> or array?
Thanks!
-Mike

When you call First() it enumerates over the result of this bit of code:
Select(i => new SubItem() {Name = "foo", MagicNumber = i});
Note that Select is a lazy enumerator: it only performs the projection when you ask it for an item, and it does so every time you ask. The results are not stored anywhere, so when you call items.First() you get a new SubItem instance. When you then pass items to Write, it gets a whole new set of SubItem instances, not the one you modified.
If you want to store the result of your select and modify it, you need to do something like:
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i}).ToList();
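Below is a minimal sketch of the behavior this answer describes. It assumes .NET 6 or later so it can run as top-level statements. To avoid a class declaration, StrongBox<int> from System.Runtime.CompilerServices stands in for SubItem, and its public Value field plays the role of MagicNumber. A counter records when the Select lambda actually runs.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;

// Counts how often the Select projection actually runs.
int projections = 0;

// StrongBox<int> stands in for SubItem: a small BCL class with one public, mutable field (Value).
var items = Enumerable.Range(1, 3).Select(i =>
{
    projections++;
    return new StrongBox<int>(i);
});

int beforeAnyEnumeration = projections; // 0: Select has only stored the lambda

items.First().Value = 42;              // projects element 1, mutates it, then drops it
int firstAgain = items.First().Value;  // projects element 1 again: a fresh box holding 1

var boxes = items.ToList();            // projects every element once and keeps the results
boxes.First().Value = 42;              // mutates the object stored in the list
int storedValue = boxes.First().Value; // 42: the same object that was just mutated

Console.WriteLine($"{beforeAnyEnumeration} {firstAgain} {storedValue}"); // 0 1 42
```

The first mutation is lost because every items.First() call re-runs the projection. After ToList(), each object exists exactly once, so the change sticks.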

I suspect something is going on in the background, most likely related to the fact that the IEnumerable is re-evaluated every time it is iterated.
Does it work if you add a ToList() after the call to Select() when assigning to items?

The only thing I can think of is that items.First() passes you a copy of SubItem instead of the reference, so when you set it the change isn't carried through.
I have to assume it has something to do with how the IEnumerable returned by Select is evaluated. You may want to try changing this:
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i});
to
// Convert range into SubItem classes
var items = range.Select(i => new SubItem() {Name = "foo", MagicNumber = i}).ToList();
and see whether the results differ.

You can't/shouldn't modify a collection through an enumerator. I'm surprised this doesn't throw an exception.

.First() is a method, not a property. It returns the first element the enumeration produces, and because this Select creates a new object each time it is enumerated, every call gives you a new SubItem instance.

Related

IEnumerable<T> from Enumerable.Range().Select() vs ToList()

This really stumped me, as I expected 'pass by reference' behavior. I expected this code to print "5,5,5", but instead it prints "7,7,7".
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 });
Alter(list);
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
Console.ReadLine();
void Alter(IEnumerable<MyObj> list)
{
    foreach(MyObj obj in list)
    {
        obj.Name = 5;
    }
}
class MyObj
{
    public int Name { get; set; }
}
Whereas this prints "5,5,5" as expected.
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 }).ToList();
Alter(list);
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
Console.ReadLine();
void Alter(IEnumerable<MyObj> list)
{
    foreach(MyObj obj in list)
    {
        obj.Name = 5;
    }
}
class MyObj
{
    public int Name { get; set; }
}
Obviously this is a simplified version of the actual code I was writing. This feels a lot more like behavior I've run into with a property that instantiates a new instance of an object. Here I understand why I would get a new instance every time I reference Mine.
public MyObj Mine => new MyObj();
I was just very surprised to see this behavior in the above code, where it feels more like I've "locked in" the enumerated objects. Can anyone help me understand this?
"This feels a lot more like behavior I've run into with a property that instantiates a new instance of an object"
Your feelings are correct.
As a preface to this answer I want to point out that in C# we can store methods inside variables like we can store data:
var method = () => new MyObj();
The () => new MyObj() is the "method": it has no name, takes no parameters, and returns a new MyObj. We don't usually call it a method (we tend to call it a lambda), but for all intents and purposes it behaves like what you're familiar with as "a method":
public MyObj SomeName(){
    return new MyObj();
}
You can probably pick out the common parts: the compiler infers the return type, gives the method an internal name because we don't care what it's called, and, when the body is a single expression that produces a value, fills in the return keyword for us too. It's a very compact way to write a method.
So, back to this variable called method:
var method = () => new MyObj();
You could run it like:
var myObj = method.Invoke();
You could even drop the word Invoke and just write method(), but I'll keep Invoke for now, for clarity. When invoked, the method runs and returns a new MyObj, and that object is stored in myObj.
So, most of the time when we code, a variable stores data, but sometimes it stores a method, and this is handy.
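Here is the same idea as a runnable sketch, assuming .NET 6 or later with top-level statements. StrongBox<int> stands in for MyObj and its Value field for Name. Each invocation of the stored lambda builds a separate object:

```csharp
using System;
using System.Runtime.CompilerServices;

// A lambda stored in a variable; StrongBox<int> stands in for MyObj, Value for Name.
Func<StrongBox<int>> method = () => new StrongBox<int>(7);

var first = method.Invoke(); // explicit Invoke
var second = method();       // shorthand for the same call

first.Value = 5;             // changes only the object that 'first' refers to

Console.WriteLine(second.Value);                          // 7
Console.WriteLine(object.ReferenceEquals(first, second)); // False
```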
When you call LINQ's Select, you give it a method that it will run every time the resulting enumerable is enumerated:
var enumerable = new []{1,2,3}.Select(x => new MyObj());
It doesn't run the method x => new MyObj() at the time you call Select, and it doesn't generate any data; it just stores the method for later.
Every time you loop over (enumerate) this it will run that method (x => new MyObj()) up to 3 times - I say up to, because you don't have to fully enumerate but if you do, there are 3 elements in the array so the enumeration can cause at most 3 invocations of the method.
LINQ Select doesn't run the method and grab the data; it effectively creates a collection of data-providing methods rather than a collection of data.
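The following sketch (assuming .NET 6 or later) counts invocations of the Select lambda to illustrate the points above. Nothing runs when Select is called. A partial enumeration such as First() runs the lambda only as far as needed. Every full enumeration runs it again.

```csharp
using System;
using System.Linq;

// Counts how many times the Select lambda runs.
int calls = 0;
var enumerable = new[] { 1, 2, 3 }.Select(x => { calls++; return x * 10; });

int afterSelect = calls;             // 0: the lambda is stored, not run

enumerable.First();                  // partial enumeration: stops after one element
int afterFirst = calls;              // 1

foreach (var item in enumerable) { } // full enumeration: one call per element
int afterLoop = calls;               // 4

foreach (var item in enumerable) { } // enumerating again runs the lambda again
int afterSecondLoop = calls;         // 7

Console.WriteLine($"{afterSelect} {afterFirst} {afterLoop} {afterSecondLoop}"); // 0 1 4 7
```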
Conceptually, it's as if Select gives you an array full of methods:
var enumerable = new Func<MyObj>[]{
    () => new MyObj(),
    () => new MyObj(),
    () => new MyObj()
};
You could enumerate this and run these methods yourself:
foreach(var method in enumerable)
    method.Invoke();
Your Alter() ran the enumeration (did a loop), got each new object returned from each method, and modified it; then the object was thrown away. Conceptually your Alter method did this:
foreach(var method in enumerable)
    method.Invoke().Name = 5;
The method was run, a new MyObj was made with some default value for Name, Name was changed to 5, and then the object was thrown away.
When Alter was done and you enumerated again, new objects were made again because the methods ran again, so Name was of course unaltered: new object, no changes. Nothing here is set up to remember the data being generated; the only thing that is remembered is the list of methods that generate the data.
When you put a ToList() on the end, that's different - ToList internally enumerates the enumerable, causes the methods that generate the objects to be run but critically it stashes the resulting objects in a list and gives you the list of objects. Instead of being a collection of methods that provides values, it's a collection of values that you can alter.
It's like it does this:
var r = new List<MyObj>();
foreach(var method in enumerable)
    r.Add(method.Invoke());
return r;
This means you get a list of data back (not a list of methods that provide data), and if you then loop over this list you're altering data that is stored in the list, rather than running methods, altering the returned value, and throwing it away.
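The ToList() idea can be sketched with a hand-written helper. Materialize is a hypothetical name used only for illustration; in practice you would just call ToList(). StrongBox<int> stands in for MyObj, and the sketch assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical helper mirroring the loop above: run each projection once and keep the result.
List<T> Materialize<T>(IEnumerable<T> source)
{
    var r = new List<T>();
    foreach (var item in source) // each step runs the Select lambda once
        r.Add(item);             // ...and the resulting object is stashed in the list
    return r;
}

// StrongBox<int> stands in for MyObj; Value plays the role of Name.
var lazy = Enumerable.Range(0, 3).Select(x => new StrongBox<int>(7));
var stored = Materialize(lazy);

foreach (var box in stored) box.Value = 5; // alters objects held by the list
foreach (var box in lazy) box.Value = 5;   // alters fresh objects that are then thrown away

Console.WriteLine(string.Join(",", stored.Select(b => b.Value))); // 5,5,5
Console.WriteLine(string.Join(",", lazy.Select(b => b.Value)));   // 7,7,7
```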
Select produces an object that has lazy evaluation. This means all steps are executed on demand.
So, when this line gets executed:
IEnumerable<MyObj> list = Enumerable.Range(0, 3).Select(x => new MyObj { Name = 7 });
no MyObj instance is created yet - only the enumerable that has all the information to compute what you've indicated.
When you call Alter, this gets executed during iteration via foreach and all objects get 5 assigned to their properties.
But when you print it, everything gets executed again here:
Console.WriteLine(string.Join(',',list.Select(x => x.Name.ToString())));
So during execution of string.Join, brand new MyObj instances are created with new MyObj { Name = 7 } and printed.

Generating a new instance of a List in C#

I have a problem with C#. If I initialize a list, say List<T> exampleList, from another pre-existing list, say toModify, like this: List<T> exampleList = new List<T>(toModify), then when I later modify the toModify list, the newly created list changes as well. If it passes the value by reference, shouldn't the value of exampleList stay the same, since it was generated from the other one?
TLDR: The contents of a list I initialize from another list (the second list) change when I change the second list. I come from a Java background and can't understand why this happens. Will I always have to use clone?
Let's use this example:
List<A> firstList = new List<A>()
{
    new A() { Id = 3 },
    new A() { Id = 5 }
};
List<A> secondList = new List<A>(firstList);
secondList[1].Id = 999;
Console.WriteLine(firstList[1].Id);
Output: 999
The main reason for this is that even though we created a new List<T>, which points to newly allocated memory on the heap, its elements are still references that point to the same objects.
To create a list that points to new (!) objects with the same values, we need to clone the elements somehow. One way to do it is to use the LINQ .Select() method to create new objects and then ToList() to copy the list itself:
List<A> firstList = new List<A>()
{
    new A() { Id = 3 },
    new A() { Id = 5 }
};
List<A> secondList = firstList.Select(el => new A() { Id = el.Id }).ToList();
secondList[1].Id = 999;
Console.WriteLine(firstList[1].Id);
Output: 5
Yes.
You're creating a new list containing the same items as the old list. If you clear the first list, the items in the second stay.
But if you change a property for one of the items in the first list, then it is the same object in the second list.
So both lists reference the same items in memory. When you write list1[0].SomeProperty = 1, you're changing that object through a reference that list2 also holds, so the change is visible through the second list.
For how to clone a List and generate new references for items, check this SO Answer.
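Here is a short sketch of the shared-reference behavior these answers describe. StrongBox<int> stands in for the question's class A (Value for Id), and the sketch assumes .NET 6 or later. It contrasts the copy constructor with the Select-based clone shown earlier:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// StrongBox<int> stands in for class A; Value plays the role of Id.
var first = new List<StrongBox<int>> { new StrongBox<int>(3), new StrongBox<int>(5) };
var shallow = new List<StrongBox<int>>(first);                      // new list, same element objects
var deep = first.Select(b => new StrongBox<int>(b.Value)).ToList(); // new list, new element objects

first[1].Value = 999; // mutate an element through the first list

Console.WriteLine(shallow[1].Value); // 999: shares the object with 'first'
Console.WriteLine(deep[1].Value);    // 5: owns a separate copy

first.Clear();                       // changes only the first list's own slots
Console.WriteLine(shallow.Count);    // 2
```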
In the following line:
List<T> exampleList = new List<T>(toModify)
you create a list of T calling List<T>'s constructor that takes one argument of type IEnumerable<T>. For further info on the latter, please have a look here.
Method arguments in C# are passed by value by default, not by reference. They can be passed by reference, but you have to state this explicitly with the ref keyword, both in the signature of the method and again at the point where you call it. So toModify is passed by value to the constructor of List<T>.
What's the importance of this?
In C# types can be divided into two categories (despite the fact that all types inherit from the System.Object):
Value types
Reference types
When we pass a value type as an argument, we pass a copy of its value. A modification to either the original value or the copy is not reflected in the other. On the other hand, when we pass a reference type as an argument, we pass a copy of the reference. So now we have two references (pointers) that point to the same location in memory. It follows that if we change any property of the object both references point to, the change is visible through both of them.
This is what is happening in your case. toModify is a list of reference types (under the hood there is an array whose items are references to other objects). So any change to the objects held by the initial list, toModify, is reflected in the list you construct from it.
A simple example that you could use to verify the above is the following:
using System;
using System.Collections.Generic;
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
    public override string ToString() => $"X: {X}, Y: {Y}";
}
class Program
{
    static void Main(string[] args)
    {
        var listA = new List<int> {1, 2, 3};
        var listB = new List<int>(listA);
        // Before the modification
        Console.WriteLine(listA[0]); // prints 1
        Console.WriteLine(listB[0]); // prints 1
        listA[0] = 2;
        // After the modification
        Console.WriteLine(listA[0]); // prints 2
        Console.WriteLine(listB[0]); // prints 1
        Console.ReadKey();
        var pointsA = new List<Point>
        {
            new Point {X = 3, Y = 4},
            new Point {X = 4, Y = 5},
            new Point {X = 6, Y = 8},
        };
        var pointsB = new List<Point>(pointsA);
        // Before the modification
        Console.WriteLine(pointsA[0]); // prints X: 3, Y: 4
        Console.WriteLine(pointsB[0]); // prints X: 3, Y: 4
        pointsA[0].X = 4;
        pointsA[0].Y = 3;
        // After the modification
        Console.WriteLine(pointsA[0]); // prints X: 4, Y: 3
        Console.WriteLine(pointsB[0]); // prints X: 4, Y: 3
        Console.ReadKey();
    }
}
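For contrast, here is a sketch of the same experiment with value-type elements. It uses a named value tuple (int X, int Y) instead of the Point class, purely to keep the snippet self-contained, and assumes .NET 6 or later. Because each list slot holds the whole value, the copied list is unaffected:

```csharp
using System;
using System.Collections.Generic;

// Value-type elements: each slot stores the whole (X, Y) value rather than a reference.
var pointsA = new List<(int X, int Y)> { (3, 4), (4, 5), (6, 8) };
var pointsB = new List<(int X, int Y)>(pointsA); // copies the values themselves

// pointsA[0].X = 4; would not compile (CS1612): the indexer returns a copy.
var p = pointsA[0]; // take a copy out
p.X = 4;
p.Y = 3;
pointsA[0] = p;     // write the modified copy back

Console.WriteLine(pointsA[0]); // (4, 3)
Console.WriteLine(pointsB[0]); // (3, 4)
```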

Function to linq conversion

I have a function which I believe can be simplified into LINQ but have been unable to do so yet.
The function looks like this:
private IList<Colour> GetDifference(IList<Colour> firstList, IList<Colour> secondList)
{
    // Create a new list
    var list = new List<Colour>();
    // Loop through the first list
    foreach (var first in firstList)
    {
        // Create a boolean and set to false
        var found = false;
        // Loop through the second list
        foreach (var second in secondList)
        {
            // If the first item id is the same as the second item id
            if (first.Id == second.Id)
            {
                // Mark it as found
                found = true;
            }
        }
        // After we have looped through the second list, if we haven't found a match
        if (!found)
        {
            // Add the item to our list
            list.Add(first);
        }
    }
    // Return our differences
    return list;
}
Can this be converted to a LINQ expression easily?
What is Colour? If it overrides Equals to compare by Id then this would work:
firstList.Except(secondList);
If Colour does not override Equals or it would be wrong for you to do so in the wider context, you could implement an IEqualityComparer<Colour> and pass this as a parameter:
firstList.Except(secondList, comparer);
See the documentation
As noted in the comments below, Except has the added side effect of removing any duplicates in the source (firstList in this example). This may or may not be an issue to you, but should be considered.
If keeping any duplicates in firstList is of importance, then this is the alternative:
var secondSet = new HashSet<Colour>(secondList, comparer);
var result = firstList.Where(c => !secondSet.Contains(c));
As before, comparer is optional if Colour implements appropriate equality.
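Here is a sketch of both approaches under stated assumptions: (int Id, string Name) value tuples stand in for Colour, and matching is by Id only. Instead of a custom IEqualityComparer<Colour>, the set-based version compares keys directly. The sketch also uses ExceptBy (available from .NET 6) to make the duplicate-removal difference visible:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value tuples stand in for the question's Colour type; only Id matters for matching.
var firstList = new List<(int Id, string Name)> { (1, "Red"), (2, "Green"), (2, "Green"), (3, "Blue") };
var secondList = new List<(int Id, string Name)> { (3, "Blue") };

// Set-based filter: one hash lookup per element, duplicates in firstList are kept.
var secondIds = new HashSet<int>(secondList.Select(c => c.Id));
var kept = firstList.Where(c => !secondIds.Contains(c.Id)).ToList();

// ExceptBy compares by key without a custom comparer, but yields each key only once.
var distinct = firstList.ExceptBy(secondList.Select(c => c.Id), c => c.Id).ToList();

Console.WriteLine(kept.Count);     // 3: (1, Red), (2, Green), (2, Green)
Console.WriteLine(distinct.Count); // 2: (1, Red), (2, Green)
```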
Try the following:
var result = firstList.Where(x => !secondList.Any(y => y.Id == x.Id));
Edit:
If you care about runtime and don't mind creating your own IEqualityComparer<>, I would suggest you use Except, as Charles suggested in his answer. Except seems to use a hash table for the second list, which speeds it up quite a bit compared to my O(n*m) query. However, be aware that Except also removes duplicates from the result, that is, from firstList.
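To check that the Where/Any query returns the same thing as the original loop, here is a sketch that runs both on the same input. (int Id, string Name) value tuples stand in for Colour, the break in the loop is an added early exit, and the method names GetDifferenceLoop and GetDifferenceLinq are hypothetical. It assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// (int Id, string Name) value tuples stand in for the question's Colour type.
List<(int Id, string Name)> GetDifferenceLoop(
    IList<(int Id, string Name)> firstList, IList<(int Id, string Name)> secondList)
{
    var list = new List<(int Id, string Name)>();
    foreach (var first in firstList)
    {
        var found = false;
        foreach (var second in secondList)
        {
            if (first.Id == second.Id) { found = true; break; } // no need to keep scanning
        }
        if (!found) list.Add(first);
    }
    return list;
}

// The Where/Any version: same O(n*m) comparisons, same order, duplicates kept.
List<(int Id, string Name)> GetDifferenceLinq(
    IList<(int Id, string Name)> firstList, IList<(int Id, string Name)> secondList) =>
    firstList.Where(x => !secondList.Any(y => y.Id == x.Id)).ToList();

var a = new List<(int Id, string Name)> { (1, "Red"), (2, "Green"), (2, "Green"), (3, "Blue") };
var b = new List<(int Id, string Name)> { (3, "Blue"), (4, "Black") };

var viaLoop = GetDifferenceLoop(a, b);
var viaLinq = GetDifferenceLinq(a, b);

Console.WriteLine(string.Join(", ", viaLinq));     // (1, Red), (2, Green), (2, Green)
Console.WriteLine(viaLoop.SequenceEqual(viaLinq)); // True
```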

Adding items to an IEnumerable through an extension method does not work?

In most of the methods I use that return some kind of collection, I return IEnumerable rather than the specific type (e.g. List). In many cases I have another collection that I want to combine with the resulting IEnumerable; this would be exactly like taking a List and adding another List to it using the AddRange method. In the following example I have created an extension method that should take a collection of items and add them to a base collection. While debugging, this appears to work, but the items are never added to the original collection. I don't understand why they aren't added. Is there something about the implementation of IEnumerable that I am missing? I understand that IEnumerable is a read-only interface, but I am not adding to this list in the example below; I am replacing it, and yet the original IEnumerable does not change.
class Program
{
    static void Main(string[] args)
    {
        var collectionOne = new CollectionContainerOne();
        var collectionTwo = new CollectionContainerTwo();
        // Starts at 1- 50 //
        collectionOne.Orders.AddRange(collectionTwo.Orders);
        // Should now be 100 items but remains original 50 //
    }
}
public class CollectionContainerOne
{
    public IEnumerable<Order> Orders { get; set; }
    public CollectionContainerOne()
    {
        var testIds = Enumerable.Range(1, 50);
        var orders = new List<Order>();
        foreach (int i in testIds)
        {
            orders.Add(new Order() { Id = i, Name = "Order #" + i.ToString() });
        }
        this.Orders = orders;
    }
}
public class CollectionContainerTwo
{
    public IEnumerable<Order> Orders { get; set; }
    public CollectionContainerTwo()
    {
        var testIds = Enumerable.Range(51, 50);
        var orders = new List<Order>();
        foreach (int i in testIds)
        {
            orders.Add(new Order() { Id = i, Name = "Order #" + i.ToString() });
        }
        this.Orders = orders;
    }
}
public class Order
{
    public int Id { get; set; }
    public string Name { get; set; }
    public override string ToString()
    {
        return this.Name;
    }
}
public static class IEnumerable
{
    public static void AddRange<T>(this IEnumerable<T> enumerationToAddTo, IEnumerable<T> itemsToAdd)
    {
        var addingToList = enumerationToAddTo.ToList();
        addingToList.AddRange(itemsToAdd);
        // Neither of the following works //
        enumerationToAddTo.Concat(addingToList);
        // OR
        enumerationToAddTo = addingToList;
        // OR
        enumerationToAddTo = new List<T>(addingToList);
    }
}
You are modifying the parameter enumerationToAddTo, which is a reference. However, the reference is not itself passed by reference, so when you modify the reference, the change is not observable in the caller. Furthermore, you cannot make the this parameter of this extension method a ref parameter: ref extension methods (C# 7.2 and later) are only allowed on value types, and IEnumerable<T> is a reference type.
You are better off using Enumerable.Concat<T>. Alternatively, you can use ICollection, which has an Add(T) method. Unfortunately, List<T>.AddRange isn't defined in any interface.
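Here is a sketch of the distinction, assuming .NET 6 or later, with ints standing in for Order objects. AddItems and Reassign are hypothetical helpers, written as local functions rather than extension methods so the snippet fits in one file. Mutating the object through ICollection<T>.Add is visible to the caller; reassigning the parameter is not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: mutates the collection object itself, so every reference to it sees the new items.
void AddItems<T>(ICollection<T> target, IEnumerable<T> itemsToAdd)
{
    foreach (var item in itemsToAdd)
        target.Add(item);
}

// Hypothetical helper: reassigning the parameter only changes the method's local copy of the reference.
void Reassign<T>(IEnumerable<T> target, IEnumerable<T> itemsToAdd)
{
    target = target.Concat(itemsToAdd).ToList(); // invisible to the caller
}

var orders = new List<int>(Enumerable.Range(1, 50));
var more = Enumerable.Range(51, 50);

Reassign(orders, more);
Console.WriteLine(orders.Count); // 50

AddItems(orders, more);
Console.WriteLine(orders.Count); // 100
```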
Here is an example to illustrate the passing of reference types by reference. As Nikola points out, this is not really useful code. Don't try this at home!
void Caller()
{
    // think of ss as a piece of paper that tells you where to find the list.
    List<string> ss = new List<string> { "a", "b" };
    // passing by value: we take another piece of paper and copy the information on ss to that piece of paper; we pass that to the method
    DoNotReassign(ss);
    // at this point, ss refers to the same list, which now contains { "a", "b", "c" }
    // passing by reference: we pass the actual original piece of paper to the method.
    Reassign(ref ss);
    // now, ss refers to a different list, whose contents are { "x", "y", "z" }
}
void DoNotReassign(List<string> strings)
{
    strings.Add("c");
    strings = new List<string> { "x", "y", "z" }; // the caller will not see the change of reference
    // in the piece of paper analogy, we have erased the piece of paper and written the location
    // of the new list on it. Because this piece of paper is a copy of ss, the caller doesn't see the change.
}
void Reassign(ref List<string> strings)
{
    strings.Add("d");
    // at this point, strings contains { "a", "b", "c", "d" }, but we're about to throw that away:
    strings = new List<string> { "x", "y", "z" };
    // because strings is a reference to the caller's variable ss, the caller sees the reassignment to a new collection
    // in the piece of paper analogy, when we erase the paper and put the new object's
    // location on it, the caller sees that, because we are operating on the same
    // piece of paper ("ss") as the caller
}
EDIT
Consider this program fragment:
string originalValue = "Hello, World!";
string workingCopy = originalValue;
workingCopy = workingCopy.Substring(0, workingCopy.Length - 1);
workingCopy = workingCopy + "?";
Console.WriteLine(originalValue.Equals("Hello, World!")); // writes "True"
Console.WriteLine(originalValue.Equals(workingCopy)); // writes "False"
If your assumption about reference types were true, the output would be "False" then "True"
Calling your extensions method like this:
collectionOne.Orders.AddRange(collectionTwo.Orders);
Is essentially the same as:
IEnumerable.AddRange(collectionOne.Orders, collectionTwo.Orders);
What happens there is that you pass a copy of the reference held in collectionOne.Orders to the AddRange method. In your AddRange implementation you try to assign a new value to that copy, and it gets "lost" inside: you are not assigning a new value to collectionOne.Orders, you are assigning it to its local copy, whose scope is only the method body itself. As a result, none of the modifications happening inside AddRange are noticed by the outside world.
You either need to return new enumerable, or work on lists directly. Having mutating methods on IEnumerable<T> is rather counterintuitive, I'd stay away from doing that.
What you want exists and is called Concat. Essentially, when you do this in your Main:
var combined = collectionOne.Orders.Concat(collectionTwo.Orders);
Here, combined will refer to an IEnumerable that will traverse both source collections when enumerated.
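Here is a small sketch (assuming .NET 6 or later, with ints standing in for Order) showing that Concat is lazy as well. It copies nothing when called, reads its sources when enumerated, and never modifies them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ints stand in for Order objects to keep the sketch short.
var ordersOne = new List<int> { 1, 2, 3 };
var ordersTwo = new List<int> { 4, 5 };

IEnumerable<int> combined = ordersOne.Concat(ordersTwo); // nothing is copied yet

ordersTwo.Add(6); // change a source after Concat was called

Console.WriteLine(string.Join(", ", combined)); // 1, 2, 3, 4, 5, 6
Console.WriteLine(ordersOne.Count);             // 3: Concat never modifies its sources
```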
IEnumerable does not support adding. What you in essence did in your code is create new collection from your enumerable, and add items to that new collection. Your old collection still has same items.
E.g., you create a collection of numbers like this
Collection1 = [ 1, 2, 3, 4, 5 ]
When you do Collection1.ToList().Add(...), you get a new collection with the same members and add new members to it, like so:
new list = [ 1, 2, 3, 4, 5, 6, 7, ... ]
Your old collection, however, still holds the old members, as ToList creates a new collection.
Solution #1:
Instead of using IEnumerable use IList which supports modification.
Solution #2 (bad):
Cast your IEnumerable back to its derived type and add members to it. This is quite bad, though; in fact it's better to just return List in the first place.
IEnumerable<Order> collectionOne = ...;
List<Order> collectionOneList = (List<Order>)collectionOne;
collectionOneList.Add(new Order());
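Here is a sketch of why the direct cast is fragile, assuming .NET 6 or later. The cast only succeeds when the object behind the IEnumerable really is a List<T>, which is not the case for the result of a LINQ query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> fromList = new List<int> { 1, 2, 3 };    // really a List<int> underneath
IEnumerable<int> fromQuery = fromList.Select(x => x * 2); // a LINQ iterator, not a List<int>

bool castWorked;
try
{
    _ = (List<int>)fromQuery; // the direct cast from Solution #2
    castWorked = true;
}
catch (InvalidCastException)
{
    castWorked = false;
}

// A type test avoids the exception, but still only helps when the object really is a List<int>.
if (fromList is List<int> realList)
{
    realList.Add(4);
}

Console.WriteLine(castWorked);       // False
Console.WriteLine(fromList.Count()); // 4
```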
General guideline (best):
If you are returning collections which are standard in .NET there is no reason to return their interfaces. In this case it's best to use original type. If you are however returning collection which you implemented yourself, then you should return an interface
It's a completely different case when you are thinking about input parameters. If your method is asking to enumerate over items, then you should ask for IEnumerable. This way you can do what you need over it, and you are placing least constraint on person who is calling it. They can send any enumerable. If you need to add to that collection, you may require IList so that you can also modify it in your method.
Basically, the problem is that assigning a value to enumerationToAddTo has no effect on the caller, partly because it isn't a reference parameter. Also, as phoog mentions, ToList() creates a new list and does not cast the existing IEnumerable to a list.
This isn't really a good use of an extension method. I would recommend that you add a method to your container class that allows you to add new items to the IEnumerable instance. This would better encapsulate the logic that's particular to that class.

Why does LINQ treat two methods that do the "same" thing differently?

I came across an interesting question today where I have two methods that, at a quick glance, both do the same thing: return an IEnumerable of Foo objects.
I have defined them below as List1 and List2:
public class Foo
{
    public int ID { get; set; }
    public bool Enabled { get; set;}
}
public static class Data
{
    public static IEnumerable<Foo> List1
    {
        get
        {
            return new List<Foo>
            {
                new Foo {ID = 1, Enabled = true},
                new Foo {ID = 2, Enabled = true},
                new Foo {ID = 3, Enabled = true}
            };
        }
    }
    public static IEnumerable<Foo> List2
    {
        get
        {
            yield return new Foo {ID = 1, Enabled = true};
            yield return new Foo {ID = 2, Enabled = true};
            yield return new Foo {ID = 3, Enabled = true};
        }
    }
}
Now consider the following tests:
IEnumerable<Foo> listOne = Data.List1;
listOne.Where(item => item.ID.Equals(2)).First().Enabled = false;
Assert.AreEqual(false, listOne.ElementAt(1).Enabled);
Assert.AreEqual(false, listOne.ToList()[1].Enabled);
IEnumerable<Foo> listTwo = Data.List2;
listTwo.Where(item => item.ID.Equals(2)).First().Enabled = false;
Assert.AreEqual(false, listTwo.ElementAt(1).Enabled);
Assert.AreEqual(false, listTwo.ToList()[1].Enabled);
These two methods seem to do the "same" thing.
Why do the second assertions in the test code fail?
Why is listTwo's second "Foo" item not getting set to false when it is in listOne?
NOTE: I'm after an explanation of why this is allowed to happen and what the differences in the two are. Not how to fix the second assertion as I know that if I add a ToList call to List2 it will work.
The first block of code builds the items once and returns a list with the items.
The second block of code builds those items each time the IEnumerable is walked through.
This means that the second and third line of the first block operate on the same object instance. The second block's second and third line operate on different instances of Foo (new instances are created as you iterate through).
The best way to see this would be to set breakpoints in the methods and run this code under the debugger. The first version will only hit the breakpoint once. The second version will hit it twice, once during the .Where() call, and once during the .ElementAt call. (edit: with the modified code, it will also hit the breakpoint a third time, during the ToList() call.)
The thing to remember here is that an iterator method (i.e., one that uses yield return) runs its body again every time it is enumerated, not just when the initial return value is constructed.
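Here is a sketch of that point, assuming .NET 6 or later. A local iterator function stands in for Data.List2, StrongBox<bool> stands in for Foo (Value for Enabled), and a counter records how often the iterator body starts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

int bodyRuns = 0;

// Local iterator function mirroring Data.List2: its body runs on every enumeration.
IEnumerable<StrongBox<bool>> List2()
{
    bodyRuns++;
    yield return new StrongBox<bool>(true);
    yield return new StrongBox<bool>(true);
    yield return new StrongBox<bool>(true);
}

var listTwo = List2();                          // the body has not started yet
int runsAfterCall = bodyRuns;                   // 0

listTwo.ElementAt(1).Value = false;             // runs the body, mutates a throwaway object
bool stillEnabled = listTwo.ElementAt(1).Value; // runs the body again: a fresh object, true

Console.WriteLine($"{runsAfterCall} {bodyRuns} {stillEnabled}"); // 0 2 True
```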
Those are definitely not the same thing.
The first builds and returns a list the moment you call it, and you can cast it back to a list and do list-y things with it if you want, including adding or removing items. Once you've put the results in a variable, you're acting on that single set of results. Calling the function again would produce another set of results, but re-using the result of a single call acts on the same objects.
The second builds an IEnumerable. You can enumerate it, but you can't treat it as a list without first calling .ToList() on it. In fact, calling the method doesn't do anything until you actually iterate over it. Consider:
var fooList = Data.List2.Where(f => f.ID > 1);
// NO foo objects have been created yet.
foreach (var foo in fooList)
{
    // a new Foo object is created, but NOT until it's actually used here
    Console.WriteLine(foo.Enabled.ToString());
}
Note that the code above will create the first (unused) Foo instance, but not until entering the foreach loop. So the items aren't actually created until called for. But that means every time you call for them, you're building a new set of items.
listTwo is an iterator - a state machine.
ElementAt must start at the beginning of the iterator to correctly get the i-th index in the IEnumerable (whether or not it is an iterator state machine or a true IEnumerable instance), and as such, listTwo will be reinitialized with the default values of Enabled = true for all three items.
Suggestion: compile the code and open it with Reflector. yield is syntactic sugar, so you would be able to see the difference between the logic in the code you wrote and the code generated for the yield keyword. The two are not the same.
