Impact of IEnumerable.ToList() - c#

I'm just wondering what goes on when calling .ToList() on an IEnumerable in C#. Do the items actually get copied to completely new duplicated items on the heap or does the new List simply refer to the original items on the heap?
I'm wondering because someone told me it's expensive to call ToList, whereas if it's simply about assigning existing objects to a new list, that's a lightweight call.
I've written this fiddle https://dotnetfiddle.net/s7xIc2
Is simply checking the hashcode enough to know?

IEnumerable doesn't have to contain a list of anything. It can (and often does) resolve each current item at the time it is requested.
On the other hand, a List<T> (the usual IList<T> implementation) holds all of its items in memory.
So the answer is... It depends.
What is backing your IEnumerable? If it's the file system, then yes, calling .ToList can be quite expensive. If it's already an in-memory list, then no, calling .ToList would not be terribly expensive.
As an example, let's say you created an IEnumerable that generated and returned a random number each time MoveNext was called. In this case, calling .ToList on the IEnumerable would never return and would eventually throw an OutOfMemoryException.
However, an IEnumerable of database objects has finite bounds (usually :) ), and as long as all the data fits in memory, calling .ToList could be entirely appropriate.
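To make the "it depends" concrete, here is a minimal sketch. The names LazyDemo, Numbers and Produced are hypothetical and exist only for this illustration. It shows that an iterator does no work until it is enumerated, and that ToList() forces the sequence to be produced, so an unbounded source has to be bounded (here with Take) before it is materialized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items the iterator has actually produced.
    public static int Produced;

    // Hypothetical unbounded source: yields 0, 1, 2, ... without end.
    public static IEnumerable<int> Numbers()
    {
        int i = 0;
        while (true)
        {
            Produced++;
            yield return i++;
        }
    }

    public static void Main()
    {
        IEnumerable<int> lazy = Numbers();               // nothing has run yet
        Console.WriteLine(Produced);                     // 0

        List<int> firstThree = lazy.Take(3).ToList();    // bounded, so ToList returns
        Console.WriteLine(string.Join(",", firstThree)); // 0,1,2

        // lazy.ToList() without Take would never return and would
        // eventually run out of memory.
    }
}
```

The cost of ToList() is therefore the cost of producing every element plus the cost of storing it, which is why the answer depends on what sits behind the IEnumerable.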

Here is one version of ToList:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) throw Error.ArgumentNull("source");
    return new List<TSource>(source);
}
It creates a new list from the source. Here is the constructor:
// Constructs a List, copying the contents of the given collection. The
// size and capacity of the new list will both be equal to the size of the
// given collection.
//
public List(IEnumerable<T> collection) {
    if (collection==null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    Contract.EndContractBlock();
    ICollection<T> c = collection as ICollection<T>;
    if( c != null) {
        int count = c.Count;
        if (count == 0)
        {
            _items = _emptyArray;
        }
        else {
            _items = new T[count];
            c.CopyTo(_items, 0);
            _size = count;
        }
    }
    else {
        _size = 0;
        _items = _emptyArray;
        // This enumerable could be empty. Let Add allocate a new array, if needed.
        // Note it will also go to _defaultCapacity first, not 1, then 2, etc.
        using(IEnumerator<T> en = collection.GetEnumerator()) {
            while(en.MoveNext()) {
                Add(en.Current);
            }
        }
    }
}
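It copies the items into the new list's own backing array. For reference types, what gets copied is the references, not the objects they point to.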
The code is from here: referencesource.microsoft.com

ToList() creates a new List object that contains references to the original objects, or copies of the objects if they are structs.
For instance, a List of int would be a full copy. A List of "Product" would hold only references to the products, not full copies. If an original product is modified, the product seen through the list is modified too.
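The difference between the two cases can be checked directly. The sketch below uses two hypothetical types, Product (a class) and Point (a struct), which are not from the question. It uses ReferenceEquals rather than hash codes, because GetHashCode can be overridden and therefore does not prove object identity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, for illustration only.
public class Product { public string Name { get; set; } }
public struct Point { public int X; }

public static class CopyDemo
{
    public static void Main()
    {
        var products = new List<Product> { new Product { Name = "old" } };
        List<Product> productCopy = products.ToList();

        // A new list object, but the same Product instance inside it.
        Console.WriteLine(ReferenceEquals(products, productCopy));       // False
        Console.WriteLine(ReferenceEquals(products[0], productCopy[0])); // True

        products[0].Name = "new";
        Console.WriteLine(productCopy[0].Name);                          // new

        var points = new List<Point> { new Point { X = 1 } };
        List<Point> pointCopy = points.ToList();

        // Structs are copied by value, so the two lists hold independent values.
        points[0] = new Point { X = 2 };
        Console.WriteLine(pointCopy[0].X);                               // 1
    }
}
```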

Related

Return Linq.Where object (IEnumerable) from within a lock - is it thread safe?

Consider the following code block
public class Data
{
    public bool Init { get; set; }
    public string Value { get; set; }
}
public class Example
{
    private Object myObject = new Object();
    private List<Data> myList = new List<Data>
    {
        new Data { Init = true, Value = "abc" },
        new Data { Init = false, Value = "def" },
    };
    public IEnumerable<string> Get()
    {
        lock(this.myObject)
        {
            return this.myList.Where(i => i.Init == true).Select(i => i.Value);
        }
    }
    public void Set(string value)
    {
        lock (this.myObject)
        {
            this.myList.Add(new Data { Init = false, Value = value });
        }
    }
}
If multiple threads are calling Get(), will the method be thread-safe?
In addition, will invoking .ToList() on the LINQ query make it thread-safe?
return this.myList.Where(i => i.Init == true).Select(i => i.Value).ToList();
Note that you do not lock here:
public void Set(string value)
{
    this.myList.Add(new Data { Init = false, Value = value });
}
So it's not thread-safe in any case.
Assuming you just forgot to do that, it's still not safe, because Get returns a "lazy" IEnumerable. It holds a reference to myList and enumerates it only when the returned IEnumerable is itself enumerated. So you are leaking a reference to myList, which you are trying to protect with the lock, to arbitrary code outside the lock statement.
You can test it like this:
var example = new Example();
var original = example.Get();
example.Clear(); // new method which clears underlying myList
foreach (var x in original)
    Console.WriteLine(x);
Here we call Get, then we clear myList, and then we enumerate what we got from Get. Naively, one might assume original will still yield what the list held when Get was called, but it yields nothing, because the query is evaluated only when we enumerate original, and at that point the list has already been cleared and is empty.
If you use
public IList<string> Get()
{
    lock(this.myObject)
    {
        return this.myList.Where(i => i.Init == true).Select(i => i.Value).ToList();
    }
}
Then it will be "safe". Now you return not a "lazy" IEnumerable but a new instance of List<> with copies of the values you have in myList. Note that it's a good idea to change the return type to IList here; otherwise the caller might pay additional overhead (like calling ToArray or ToList, which makes another copy) when that is not necessary.
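The two behaviours can be compared side by side in a self-contained sketch. It reuses the Data and Example shapes from the question; the method names GetLazy and GetSnapshot, the Clear helper and the SnapshotDemo class are assumptions added for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data
{
    public bool Init { get; set; }
    public string Value { get; set; }
}

public class Example
{
    private readonly object myObject = new object();
    private readonly List<Data> myList = new List<Data>
    {
        new Data { Init = true, Value = "abc" },
        new Data { Init = false, Value = "def" },
    };

    // Lazy: the query is only built here; it runs later, outside the lock.
    public IEnumerable<string> GetLazy()
    {
        lock (myObject)
        {
            return myList.Where(i => i.Init).Select(i => i.Value);
        }
    }

    // Eager: ToList runs the query here, while the lock is held.
    public IList<string> GetSnapshot()
    {
        lock (myObject)
        {
            return myList.Where(i => i.Init).Select(i => i.Value).ToList();
        }
    }

    // Hypothetical helper that empties the underlying list.
    public void Clear()
    {
        lock (myObject) { myList.Clear(); }
    }
}

public static class SnapshotDemo
{
    public static void Main()
    {
        var example = new Example();
        IEnumerable<string> lazy = example.GetLazy();
        IList<string> snapshot = example.GetSnapshot();
        example.Clear();

        Console.WriteLine(lazy.Count());   // 0: evaluated after Clear
        Console.WriteLine(snapshot.Count); // 1: captured before Clear
    }
}
```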
You have to be aware of the difference between the potential to enumerate a sequence (the IEnumerable) and the enumerated sequence itself (the List, Array, etc. that you get after enumerating the sequence).
So you have a class Example, which internally holds a List<Data> in member MyList. Every Data has at least a string property 'Value'.
Class Example has methods to extract Value and to add new elements to MyList.
I'm not sure it is wise to call them Set and Get; these names are quite confusing. Maybe you've simplified your example (which, by the way, made it more difficult to talk about).
You have an object of class Example and two threads that both have access to this object. You worry that while one thread is enumerating the elements of the sequence, the other thread is adding elements to the sequence.
Your Get method returns the "potential to enumerate". The sequence has not been enumerated yet when you return from Get and the lock is released.
This means that when you start enumerating the sequence, the data is not locked anymore. If you've ever returned an IEnumerable from data that you fetched from a database, you have probably seen the same problem: the connection to the database is disposed before you start enumerating.
Solution 1: return enumerated data: inefficient
You already mention one solution: enumerate the data into a List before you return. This way, property MyList will not be accessed after you return from Get, so the lock is not needed anymore:
public IEnumerable<string> GetInitializedValues()
{
    lock(this.MyList)
    {
        return this.MyList
            .Where(data => data.Init == true)
            .Select(data => data.Value)
            .ToList();
    }
}
In words: lock MyList, which is a sequence of Data. Keep only those Data in this sequence that have a true value for property Init. From every remaining Data, take the value of property Value and put these values in a List. Release the lock and return the List.
This is not efficient if the caller doesn't need the complete list.
// Create an object of class Example which has a zillion Data in MyList:
Example example = CreateFilledClassExample();
// I only want the first element of MyList:
string onlyOneString = example.GetInitializedValues().FirstOrDefault();
GetInitializedValues creates a list of a zillion elements and returns it. The caller only takes the first initialized value and throws the rest of the list away. What a waste of processing power.
Solution 2: use yield return: only enumerate what must be enumerated
The keyword yield means something like: return the next element of the sequence, and keep everything alive until the caller disposes the IEnumerator:
public IEnumerable<string> GetInitializedValues()
{
    lock(this.MyList)
    {
        IEnumerable<string> initializedValues = this.MyList
            .Where(data => data.Init == true)
            .Select(data => data.Value);
        foreach (string initializedValue in initializedValues)
        {
            yield return initializedValue;
        }
    }
}
Because the yield is inside the lock, the lock remains held until you dispose the enumerator (or enumerate to the end):
List<string> someInitializedValues = GetInitializedValues()
    .Take(3)
    .ToList();
This one is safe, and only enumerates the first three elements.
Deep inside it will do something like this:
List<string> someInitializedValues = new List<string>();
IEnumerable<string> enumerableInitializedValues = GetInitializedValues();
// MyList is not locked yet!
// Create the enumerator. This is Disposable, so use a using statement:
using (IEnumerator<string> initializedValuesEnumerator = enumerableInitializedValues.GetEnumerator())
{
    // Enumerate the first 3 elements (remember: we used Take(3)).
    // The count is checked first, so MoveNext is not called a fourth time.
    while (someInitializedValues.Count < 3 && initializedValuesEnumerator.MoveNext())
    {
        // there is a next element, fetch it and add the fetched value to the list
        string fetchedInitializedValue = initializedValuesEnumerator.Current;
        someInitializedValues.Add(fetchedInitializedValue);
    }
    // the enumerator is not disposed yet, MyList is still locked.
}
// the enumerator is disposed. MyList is not locked anymore
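The lifetime of the lock can be observed with Monitor.IsEntered, which reports whether the current thread holds a given lock. The sketch below is a single-threaded illustration only; YieldLockDemo, Gate, GetValues and LockHeld are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class YieldLockDemo
{
    private static readonly object Gate = new object();
    private static readonly List<string> Values = new List<string> { "a", "b", "c" };

    // Iterator with the yield inside the lock, as in Solution 2.
    public static IEnumerable<string> GetValues()
    {
        lock (Gate)
        {
            foreach (string value in Values)
            {
                yield return value;
            }
        }
    }

    // Reports whether the calling thread currently holds the lock.
    public static bool LockHeld() => Monitor.IsEntered(Gate);

    public static void Main()
    {
        IEnumerable<string> sequence = GetValues();
        Console.WriteLine(LockHeld());     // False: the iterator body has not started

        using (IEnumerator<string> e = sequence.GetEnumerator())
        {
            e.MoveNext();
            Console.WriteLine(LockHeld()); // True: suspended at yield, inside the lock
        }

        Console.WriteLine(LockHeld());     // False: Dispose released the lock
    }
}
```

One consequence worth keeping in mind: the lock is held for as long as the caller keeps the enumerator open, so a slow consumer blocks every other thread that needs the same lock for that whole time.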

Does GetEnumerator() in C# return a copy, or does it iterate the original source?

I have a simple GetEnumerator usage.
private ConcurrentQueue<string> queue = new ConcurrentQueue<string>();
public IEnumerator GetEnumerator()
{
    return queue.GetEnumerator();
}
I want to update the queue outside of this class.
So, I'm doing:
var list = _queue.GetEnumerator();
while (list.MoveNext())
{
    list.Current as string = "aaa";
}
Does GetEnumerator() return a copy of the queue, or does it iterate the original values?
So while updating, do I update the original?
Thank you :)
It depends on the exact underlying implementation.
As far as I remember, most of the built-in .NET containers enumerate the current data, not a snapshot.
You will likely get an exception if you modify a collection while iterating over it; this is to protect against exactly this issue.
This is not the case for ConcurrentQueue<T>, as the GetEnumerator method returns a snapshot of the contents of the queue (as of .NET 4.6 - Docs).
The IEnumerator interface does not have a setter on the Current property, so you cannot modify the collection this way (Docs).
Modifying a collection (adding, removing or replacing elements) while iterating is risky in general, because you usually cannot know how the iterator is implemented.
To add to this, a queue is made for taking the first element and adding elements at the end; it does not allow replacing an element "in the middle".
Here are two approaches that could work:
Approach 1 - Create a new queue with updated elements
Iterate over the original queue and recreate a new collection in the process.
var newQueueUpdated = new ConcurrentQueue<string>();
var iterator = _queue.GetEnumerator();
while (iterator.MoveNext())
{
    newQueueUpdated.Enqueue("aaa"); // ConcurrentQueue<T> has Enqueue, not Add
}
_queue = newQueueUpdated;
This is naturally done in one go by using LINQ's .Select and feeding the resulting IEnumerable to the ConcurrentQueue constructor:
_queue = new ConcurrentQueue<string>(_queue.Select(x => "aaa"));
Beware, this could be resource-consuming. Of course, other implementations are possible, especially if your collection is large.
Approach 2 - Collection of mutable elements
You could use a wrapper class to enable mutation of objects stored:
public class MyObject
{
    public string Value { get; set; }
}
Then you create a private ConcurrentQueue<MyObject> queue = new ConcurrentQueue<MyObject>(); instead.
And now you can mutate the elements without having to change any reference in the collection itself:
var enumerator = _queue.GetEnumerator();
while (enumerator.MoveNext())
{
    enumerator.Current.Value = "aaa";
}
In the code above, the references stored by the container have never changed. Their internal state has changed, though.
In the question's code, you were actually trying to replace one object (a string) with another object. That operation is not well defined for a queue, and it cannot be done through .Current, which is read-only. For some containers it should even be forbidden.
Here's some test code to see if I can modify the ConcurrentQueue<string> while it is iterating.
ConcurrentQueue<string> queue = new ConcurrentQueue<string>(new[] { "a", "b", "c" });
var e = queue.GetEnumerator();
while (e.MoveNext())
{
    Console.Write(e.Current);
    if (e.Current == "b")
    {
        queue.Enqueue("x");
    }
}
e = queue.GetEnumerator(); //e.Reset(); is not supported
while (e.MoveNext())
{
    Console.Write(e.Current);
}
That runs successfully and produces abcabcx.
However, if we change the collection to a standard List<string> then it fails.
Here's the implementation:
List<string> list = new List<string>(new[] { "a", "b", "c" });
var e = list.GetEnumerator();
while (e.MoveNext())
{
    Console.Write(e.Current);
    if (e.Current == "b")
    {
        list.Add("x");
    }
}
e = list.GetEnumerator();
while (e.MoveNext())
{
    Console.Write(e.Current);
}
That produces ab before throwing an InvalidOperationException.
For ConcurrentQueue this is specifically addressed by the documentation:
The enumeration represents a moment-in-time snapshot of the contents
of the queue. It does not reflect any updates to the collection after
GetEnumerator was called. The enumerator is safe to use concurrently
with reads from and writes to the queue.
So the answer is: It acts as if it returns a copy. (It doesn't actually make a copy, but the effect is as if it was a copy - i.e. changing the original collection while enumerating it will not change the items produced by the enumeration.)
This behaviour is NOT guaranteed for other types - for example, attempting to enumerate a List<T> will fail if the list is modified during the enumeration.

Create a copy of IEnumerable<T> to modify collection from different threads?

I am using a third-party library which uses its own custom data model. The hierarchy of the data model is as below:
Model
---Tables(type of Table)
-----Rows(type of Row)
-------Cells( type of Cell)
Table has a Rows property, like DataTable, and I have to access this property from more than one task. Now I need the row from the table whose column value equals a specified value.
To do this, I have created a method with a lock statement so that only one thread can access it at a time.
public static Row GetRowWithColumnValue(Model model, string tableKey, string indexColumnKey, string indexColumnValue)
{
    Row simObj = null;
    lock (syncRoot)
    {
        SimWrapperFromValueFactory wrapperSimSystem = new SimWrapperFromValueFactory(model, tableKey, indexColumnKey);
        simObj = wrapperSimSystem.GetWrapper(indexColumnValue);
    }
    return simObj;
}
To create the lookup for one of the columns in the Table, I have created a method which always tries to create a copy of the rows to avoid the collection-modified exception:
Private Function GetTableRows(table As Table) As List(Of Row)
    Dim rowsList As New List(Of Row)(table.Rows) 'Case 1
    'rowsList.AddRange(table.Rows) 'Case 2
    ' Case 3
    'For i As Integer = 0 To table.Rows.Count - 1
    '    rowsList.Add(table.Rows.ElementAt(i))
    'Next
    Return rowsList
End Function
But other threads can modify the table (e.g. add or remove rows, or update a column value in any row). I am getting the "Collection modified" exception below:
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
I cannot change this third-party library to use concurrent collections, and the same data model is shared between multiple projects.
Question: I am hunting for a solution that allows multiple readers on this collection even while it is being modified on other threads. Is it possible to get a copy of the collection without getting an exception?
Referenced below SO threads but did not find exact solution:
Lock vs. ToArray for thread safe foreach access of List collection
Can ToArray() throw an exception?
Is returning an IEnumerable<> thread-safe?
The simplest solution is to retry on exception, like this:
private List<Row> CopyVolatileList(IEnumerable<Row> original)
{
    while (true)
    {
        try
        {
            List<Row> copy = new List<Row>();
            foreach (Row row in original) {
                copy.Add(row);
            }
            // Validate.
            if (copy.Count != 0 && copy[copy.Count - 1] == null) // Assuming Row is a reference type.
            {
                // At least one element was removed from the list while we were copying.
                continue;
            }
            return copy;
        }
        catch (InvalidOperationException)
        {
            // Check ex.Message?
        }
        // Keep trying.
    }
}
Eventually you'll get a run where the exception isn't thrown and the data integrity validation passes.
Alternatively, you can dive deep (and I mean very, very deep).
DISCLAIMER: Never ever use this in production. Unless you're desperate and really have no other option.
So we've established that you're working with a custom collection (TableRowCollection) which ultimately uses List<Row>.Enumerator to iterate through the rows. This strongly suggests that your collection is backed by a List<Row>.
First things first, you need to get a reference to that list. Your collection will not expose it publicly, so you'll need to fiddle a bit. You will need to use Reflection to find and get the value of the backing list. I recommend looking at your TableRowCollection in the debugger. It will show you non-public members and you will know what to reflect.
If you can't find your List<Row>, then take a closer look at TableRowCollection.GetEnumerator() - specifically GetEnumerator().GetType(). If that returns List<Row>.Enumerator, then bingo: we can get the backing list out of it, like so:
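List<Row> list;
using (IEnumerator<Row> enumerator = table.GetEnumerator())
{
    // "list" is the private field name in the .NET Framework reference source;
    // other runtimes may name it differently (for example "_list").
    list = (List<Row>)typeof(List<Row>.Enumerator)
        .GetField("list", BindingFlags.Instance | BindingFlags.NonPublic)
        .GetValue(enumerator);
}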
If the above methods of getting your List<Row> have failed, there is no need to read further. You might as well give up.
In case you've succeeded, now that you have the backing List<Row>, we'll have to look at Reference Source for List<T>.
What we see is 3 fields being used:
private T[] _items;
private int _size; // Accessible via "Count".
private int _version;
Our goal is to copy the items whose indexes are between zero and _size - 1 from the _items array into a new array, and to do so in between _version changes.
Observations re thread safety: List<T> does not use locks, none of the fields are marked as volatile and _version is incremented via ++, not Interlocked.Increment. Long story short this means that it is impossible to read all 3 field values and confidently say that we're looking at stable data. We'll have to read the field values repeatedly in order to be somewhat confident that we're looking at a reasonable snapshot (we will never be 100% confident, but you might choose to settle for "good enough").
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
using System.Threading;
private Row[] CopyVolatileList(List<Row> original)
{
    while (true)
    {
        // Get _items and _size values which are safe to use in tandem.
        int version = GetVersion(original); // _version.
        Row[] items = GetItems(original); // _items.
        int count = original.Count; // _size.
        if (items.Length < count)
        {
            // Definitely a torn read. Copy will fail.
            continue;
        }
        // Copy.
        Row[] copy = new Row[count];
        Array.Copy(items, 0, copy, 0, count);
        // Stabilization window.
        Thread.Sleep(1);
        // Validate.
        if (version == GetVersion(original)) {
            return copy;
        }
        // Keep trying.
    }
}
static Func<List<Row>, int> GetVersion = CompilePrivateFieldAccessor<List<Row>, int>("_version");
static Func<List<Row>, Row[]> GetItems = CompilePrivateFieldAccessor<List<Row>, Row[]>("_items");
static Func<TObject, TField> CompilePrivateFieldAccessor<TObject, TField>(string fieldName)
{
    ParameterExpression param = Expression.Parameter(typeof(TObject), "o");
    MemberExpression fieldAccess = Expression.PropertyOrField(param, fieldName);
    return Expression
        .Lambda<Func<TObject, TField>>(fieldAccess, param)
        .Compile();
}
Note re stabilization window: the bigger it is, the more confidence you have that you're not dealing with a torn read (because the list is in process of modifying all 3 fields). I've settled on the smallest value I couldn't fail in my tests where I called CopyVolatileList in a tight loop on one thread, and used another thread to add items to the list, remove them or clear the list at random intervals between 0 and 20ms.
If you remove the stabilization window, you will occasionally get a copy with uninitialized elements at the end of the array because the other thread has removed a row while you were copying - that's why it's needed.
You should obviously validate the copy once it's built, to the best of your ability (at least check for uninitialized elements at the end of the array in case the stabilization window fails).
Good luck.

Remove and Return First Item of List

I was wondering if there is a built-in method to remove and return the first item of a list with one method/command.
I used this, which was not pretty:
Item currentItem = items.First();
items.RemoveAt(0);
So I wrote an extension method:
public static class ListExtensions
{
    public static T RemoveAndReturnFirst<T>(this List<T> list)
    {
        T currentFirst = list.First();
        list.RemoveAt(0);
        return currentFirst;
    }
}
//Example code
Item currentItem = items.RemoveAndReturnFirst();
Is this the best possibility or is there any built-in method?
The list is returned from a nHibernate-Query and therefore it should remain a List<T>.
The most suitable collection for this operation is Queue:
var queue = new Queue<int>();
queue.Enqueue(10); //add first
queue.Enqueue(20); //add to the end
var first = queue.Dequeue(); //removes first and returns it (10)
Queue makes Enqueue and Dequeue operations very fast. But if you need to search inside the queue or get an item by index, it's a bad choice. Consider which kinds of operations you perform and how often, and choose the most suitable collection accordingly: queue, stack, list or simple array.
You can also create a Queue from a List:
var list = new List<int>();
var queue = new Queue<int>(list);
There is no built-in method. Your code looks fine to me.
One small thing, I would use the indexer, not the First extension method:
T currentFirst = list[0];
And check that the list has a Count > 0:
public static T RemoveAndReturnFirst<T>(this List<T> list)
{
    if (list == null || list.Count == 0)
    {
        // Instead of returning the default,
        // an exception might be more compliant to the method signature.
        return default(T);
    }
    T currentFirst = list[0];
    list.RemoveAt(0);
    return currentFirst;
}
If you have to worry about concurrency, I would advise using another collection type, since this one isn't thread-safe.
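As one possible thread-safe alternative, ConcurrentQueue<T> offers TryDequeue, which removes and returns the first item in a single atomic call. The sketch below is only an illustration: DequeueDemo and its RemoveAndReturnFirst helper are hypothetical names, and copying the List<T> into a queue is an assumption that only makes sense if the original list does not have to stay in sync.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class DequeueDemo
{
    // Removes and returns the first item, or default(T) if the queue is empty.
    public static T RemoveAndReturnFirst<T>(ConcurrentQueue<T> queue)
    {
        return queue.TryDequeue(out T first) ? first : default(T);
    }

    public static void Main()
    {
        var items = new List<int> { 10, 20, 30 };

        // The queue is filled in list order, so 10 ends up at the front.
        var queue = new ConcurrentQueue<int>(items);

        Console.WriteLine(RemoveAndReturnFirst(queue)); // 10
        Console.WriteLine(queue.Count);                 // 2
    }
}
```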

List<String> ByRef

I'm wondering how one can prove what the .Net framework is doing behind the scenes.
I have a method that accepts a parameter of a List<String> originalParameterList.
In my method I have another List<String> newListObj if I do the following:
List<String> newListObj = originalParameterList;
newListObj.Add(value);
newListObj.Add(value1);
newListObj.Add(value2);
The count of the originalParameterList grows (+3).
If I do this:
List<String> newListObj = new List<String>(originalParameterList);
newListObj.Add(value);
newListObj.Add(value1);
newListObj.Add(value2);
The count of the originalParameterList stays the same (+0).
I also found that this code behaves the same:
List<String> newListObj = new List<String>(originalParameterList.ToArray());
newListObj.Add(value);
newListObj.Add(value1);
newListObj.Add(value2);
The count of the originalParameterList stays the same (+0).
My question is, is there a way to see what the .Net Framework is doing behind the scenes in a definitive way?
You can load your assembly into ILDASM and, once it is loaded, find your method and double-click it; it will show the CIL code of that method. Just type "IL" into the search box of the Windows Start menu.
Alternatively, you can use the following ways to also create a new independent list:
private void GetList(List<string> lst)
{
    List<string> NewList = lst.Cast<string>().ToList();
    NewList.Add("6");
    // lst is unchanged: NewList is a separate list.
    // or....
    List<string> NewList2 = lst.ConvertAll(s => s);
    NewList2.Add("6");
    // again, lst is unchanged
}
Normally, the documentation should give enough information to use the API.
In your specific example, the documentation for public List(IEnumerable<T> collection) says:
Initializes a new instance of the List class that contains elements
copied from the specified collection and has sufficient capacity to
accommodate the number of elements copied.
For the reference here is the source code for the constructor:
public List (IEnumerable <T> collection)
{
    if (collection == null)
        throw new ArgumentNullException ("collection");
    // initialize to needed size (if determinable)
    ICollection <T> c = collection as ICollection <T>;
    if (c == null) {
        _items = EmptyArray<T>.Value;
        AddEnumerable (collection);
    } else {
        _size = c.Count;
        _items = new T [Math.Max (_size, DefaultCapacity)];
        c.CopyTo (_items, 0);
    }
}
void AddEnumerable (IEnumerable <T> enumerable)
{
    foreach (T t in enumerable)
    {
        Add (t);
    }
}
The simplest way to do it is to go to MSDN:
http://msdn.microsoft.com/en-us/library/fkbw11z0.aspx
It says that
Initializes a new instance of the List class that contains elements copied from the specified collection and has sufficient capacity to accommodate the number of elements copied.
so internally it simply adds all elements of the passed IEnumerable to the new list. It also says that
this is an O(n) operation
which means the cost grows linearly with the number of elements copied; there is no shortcut that avoids the copy.
That's because in the first case you referenced the original list (since it is a reference type), and you modified its contents via newListObj. In the second and third cases you copied the original list's contents via the List constructor, and you modified the new collection, which has no effect on the original.
As others already said, there are various tools that let you examine the source code of the .NET framework. I personally prefer dotPeek from JetBrains, which is free.
In the specific case that you have mentioned, I think when you pass a list into the constructor of another list, that list's elements are copied. If you just assign one variable to another, both variables simply refer to the same list.
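Besides reading the documentation or decompiling, the behaviour can also be confirmed with a small experiment. The following is a minimal sketch (AliasDemo and the variable names are hypothetical) that uses ReferenceEquals to tell an alias from a copy:

```csharp
using System;
using System.Collections.Generic;

public static class AliasDemo
{
    public static void Main()
    {
        var original = new List<string> { "a" };

        List<string> alias = original;                  // same list object
        List<string> copy = new List<string>(original); // new list, elements copied

        alias.Add("b"); // also visible through original
        copy.Add("c");  // only visible through copy

        Console.WriteLine(ReferenceEquals(original, alias)); // True
        Console.WriteLine(ReferenceEquals(original, copy));  // False
        Console.WriteLine(string.Join(",", original));       // a,b
        Console.WriteLine(string.Join(",", copy));           // a,c
    }
}
```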
You can either
read the documentation over at MSDN
decompile the resulting MSIL-code, for instance using Telerik's free JustDecompile
or step through the .NET Framework code using the debugger.
This is the code from the List constructor:
public List(IEnumerable<T> collection)
{
    if (collection == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    }
    ICollection<T> collection2 = collection as ICollection<T>;
    if (collection2 != null)
    {
        int count = collection2.Count;
        this._items = new T[count];
        collection2.CopyTo(this._items, 0);
        this._size = count;
        return;
    }
    this._size = 0;
    this._items = new T[4];
    using (IEnumerator<T> enumerator = collection.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            this.Add(enumerator.Current);
        }
    }
}
As you can see, when you call the constructor that takes an IEnumerable, it copies all the data into itself.
