Create a copy of IEnumerable<T> to modify collection from different threads? - c#

I am using a third-party library which has its own custom data model. The hierarchy of the data model is as follows:
Model
---Tables(type of Table)
-----Rows(type of Row)
-------Cells( type of Cell)
Table has a Rows property, much like DataTable, and I have to access this property from more than one task. I need to get a row from the table whose column has a specified value.
To do this, I have created a method with a lock statement so it can be accessed by only one thread at a time.
public static Row GetRowWithColumnValue(Model model, string tableKey, string indexColumnKey, string indexColumnValue)
{
Row simObj = null;
lock (syncRoot)
{
SimWrapperFromValueFactory wrapperSimSystem = new SimWrapperFromValueFactory(model, tableKey, indexColumnKey);
simObj = wrapperSimSystem.GetWrapper(indexColumnValue);
}
return simObj;
}
To create the lookup for one of the columns in the Table, I have created a method which always tries to create a copy of the rows to avoid the "collection was modified" exception:
Private Function GetTableRows(table As Table) As List(Of Row)
Dim rowsList As New List(Of Row)(table.Rows) 'Case 1
'rowsList.AddRange(table.Rows) 'Case 2
' Case 3
'For i As Integer = 0 To table.Rows.Count - 1
'rowsList.Add(table.Rows.ElementAt(i))
'Next
Return rowsList
End Function
But other threads can modify the table (e.g. add or remove rows, or update a column value in any row). I am getting the "Collection was modified" exception below:
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
I cannot modify this third-party library to use concurrent collections, and the same data model is shared between multiple projects.
Question: I am looking for a solution that allows multiple readers on this collection even while it is being modified by other threads. Is it possible to get a copy of the collection without getting an exception?
I referenced the SO threads below but did not find an exact solution:
Lock vs. ToArray for thread safe foreach access of List collection
Can ToArray() throw an exception?
Is returning an IEnumerable<> thread-safe?

The simplest solution is to retry on exception, like this:
private List<Row> CopyVolatileList(IEnumerable<Row> original)
{
while (true)
{
try
{
List<Row> copy = new List<Row>();
foreach (Row row in original) {
copy.Add(row);
}
// Validate.
if (copy.Count != 0 && copy[copy.Count - 1] == null) // Assuming Row is a reference type.
{
// At least one element was removed from the list while we were copying.
continue;
}
return copy;
}
catch (InvalidOperationException)
{
// Check ex.Message?
}
// Keep trying.
}
}
Eventually you'll get a run where the exception isn't thrown and the data integrity validation passes.
Alternatively, you can dive deep (and I mean very, very deep).
DISCLAIMER: Never ever use this in production. Unless you're desperate and really have no other option.
So we've established that you're working with a custom collection (TableRowCollection) which ultimately uses List<Row>.Enumerator to iterate through the rows. This strongly suggests that your collection is backed by a List<Row>.
First things first, you need to get a reference to that list. Your collection will not expose it publicly, so you'll need to fiddle a bit. You will need to use Reflection to find and get the value of the backing list. I recommend looking at your TableRowCollection in the debugger. It will show you non-public members and you will know what to reflect.
If you can't find your List<Row>, then take a closer look at TableRowCollection.GetEnumerator() - specifically GetEnumerator().GetType(). If that returns List<Row>.Enumerator, then bingo: we can get the backing list out of it, like so:
List<Row> list;
using (IEnumerator<Row> enumerator = table.GetEnumerator())
{
list = (List<Row>)typeof(List<Row>.Enumerator)
.GetField("list", BindingFlags.Instance | BindingFlags.NonPublic)
.GetValue(enumerator);
}
If the above methods of getting your List<Row> have failed, there is no need to read further. You might as well give up.
In case you've succeeded, now that you have the backing List<Row>, we'll have to look at Reference Source for List<T>.
What we see is 3 fields being used:
private T[] _items;
private int _size; // Accessible via "Count".
private int _version;
Our goal is to copy the items whose indexes are between zero and _size - 1 from the _items array into a new array, and to do so in between _version changes.
Observations re thread safety: List<T> does not use locks, none of the fields are marked as volatile and _version is incremented via ++, not Interlocked.Increment. Long story short this means that it is impossible to read all 3 field values and confidently say that we're looking at stable data. We'll have to read the field values repeatedly in order to be somewhat confident that we're looking at a reasonable snapshot (we will never be 100% confident, but you might choose to settle for "good enough").
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
using System.Threading;
private Row[] CopyVolatileList(List<Row> original)
{
while (true)
{
// Get _items and _size values which are safe to use in tandem.
int version = GetVersion(original); // _version.
Row[] items = GetItems(original); // _items.
int count = original.Count; // _size.
if (items.Length < count)
{
// Definitely a torn read. Copy will fail.
continue;
}
// Copy.
Row[] copy = new Row[count];
Array.Copy(items, 0, copy, 0, count);
// Stabilization window.
Thread.Sleep(1);
// Validate.
if (version == GetVersion(original)) {
return copy;
}
// Keep trying.
}
}
static Func<List<Row>, int> GetVersion = CompilePrivateFieldAccessor<List<Row>, int>("_version");
static Func<List<Row>, Row[]> GetItems = CompilePrivateFieldAccessor<List<Row>, Row[]>("_items");
static Func<TObject, TField> CompilePrivateFieldAccessor<TObject, TField>(string fieldName)
{
ParameterExpression param = Expression.Parameter(typeof(TObject), "o");
MemberExpression fieldAccess = Expression.PropertyOrField(param, fieldName);
return Expression
.Lambda<Func<TObject, TField>>(fieldAccess, param)
.Compile();
}
Note re stabilization window: the bigger it is, the more confidence you have that you're not dealing with a torn read (because the list is in the process of modifying all 3 fields). I've settled on the smallest value I could not get to fail in my tests, where I called CopyVolatileList in a tight loop on one thread and used another thread to add items to the list, remove them or clear the list at random intervals between 0 and 20ms.
If you remove the stabilization window, you will occasionally get a copy with uninitialized elements at the end of the array because the other thread has removed a row while you were copying - that's why it's needed.
You should obviously validate the copy once it's built, to the best of your ability (at least check for uninitialized elements at the end of the array in case the stabilization window fails).
Good luck.

Related

Impact of IEnumerable.ToList()

I'm just wondering what goes on when calling .ToList() on an IEnumerable in C#. Do the items actually get copied to completely new duplicated items on the heap or does the new List simply refer to the original items on the heap?
I'm wondering because someone told me it's expensive to call ToList, whereas if it's simply about assigning existing objects to a new list, that's a lightweight call.
I've written this fiddle https://dotnetfiddle.net/s7xIc2
Is simply checking the hashcode enough to know?
IEnumerable doesn't have to contain a list of anything. It can (and often does) resolve each current item at the time it is requested.
On the other hand, an IList is a complete in-memory copy of all the items.
So the answer is... It depends.
What is backing your IEnumerable? If it's the file system then yes, calling .ToList() can be quite expensive. If it's an in-memory list already, then no, calling .ToList() would not be terribly expensive.
As an example, let's say you created an IEnumerable that generated and returned a random number each time the next item was requested. In this case calling .ToList() on the IEnumerable would never return, and would eventually throw an OutOfMemoryException.
However, an IEnumerable of database objects has a finite bounds (usually :) ) and as long as all the data fits in memory, calling .ToList could be entirely appropriate.
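For illustration, a minimal sketch of such an unbounded sequence (the names here are made up for this example):
using System;
using System.Collections.Generic;
using System.Linq;

static class RandomSequence
{
    // Yields a fresh random number on every iteration step; the sequence never ends.
    public static IEnumerable<int> Numbers()
    {
        var rng = new Random();
        while (true)
        {
            yield return rng.Next();
        }
    }
}

// Safe: only pulls as many items as requested.
// var firstTen = RandomSequence.Numbers().Take(10).ToList();
// Dangerous: enumerates forever and eventually throws OutOfMemoryException.
// var all = RandomSequence.Numbers().ToList();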
Here is one version of ToList:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source == null) throw Error.ArgumentNull("source");
return new List<TSource>(source);
}
It creates a new list from the source, here is the constructor:
// Constructs a List, copying the contents of the given collection. The
// size and capacity of the new list will both be equal to the size of the
// given collection.
//
public List(IEnumerable<T> collection) {
if (collection==null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
Contract.EndContractBlock();
ICollection<T> c = collection as ICollection<T>;
if( c != null) {
int count = c.Count;
if (count == 0)
{
_items = _emptyArray;
}
else {
_items = new T[count];
c.CopyTo(_items, 0);
_size = count;
}
}
else {
_size = 0;
_items = _emptyArray;
// This enumerable could be empty. Let Add allocate a new array, if needed.
// Note it will also go to _defaultCapacity first, not 1, then 2, etc.
using(IEnumerator<T> en = collection.GetEnumerator()) {
while(en.MoveNext()) {
Add(en.Current);
}
}
}
}
It copies the items.
The code is from here: referencesource.microsoft.com
ToList() creates a new List object that will contain references to the original objects, or copies of the objects if they are structs.
For instance, a List of int would be a full copy. A list of "Product" would contain only references to the products, not full copies. If an original Product is modified, the one seen through the new list is also modified.
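A minimal sketch illustrating the difference (the Product class here is hypothetical):
using System;
using System.Collections.Generic;
using System.Linq;

class Product { public string Name = ""; }

class Demo
{
    static void Main()
    {
        var products = new List<Product> { new Product { Name = "Widget" } };
        List<Product> copy = products.ToList();      // new list, same Product instances

        copy[0].Name = "Gadget";
        Console.WriteLine(products[0].Name);         // prints "Gadget" - the object is shared

        var numbers = new List<int> { 1, 2, 3 };
        List<int> numbersCopy = numbers.ToList();    // int is a value type, so the values are copied
        numbersCopy[0] = 99;
        Console.WriteLine(numbers[0]);               // still prints 1
    }
}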

Exception when modifying collection in foreach loop

I know the basic principle of not modifying collection inside a foreach, that's why I did something like this:
public void UpdateCoverages(Dictionary<PlayerID, double> coverages)
{
// TODO: temp
var keys = coverages.Select(pair => pair.Key);
foreach (var key in keys)
{
coverages[key] = 0.84;
}
}
And:
class PlayerID : IEquatable<PlayerID>
{
public PlayerID(byte value)
{
Value = value;
}
public byte Value { get; private set; }
public bool Equals(PlayerID other)
{
return Value == other.Value;
}
}
First I save all my keys so as not to get the "collection was modified" exception, and then I go through them. But I still get the exception, which I cannot understand.
How to correct this and what is causing the problem?
First I save all my keys
No you don't; keys is a live sequence that is actively iterating the collection as it is iterated by the foreach. To create an isolated copy of the keys, you need to add .ToList() (or similar) to the end:
var keys = coverages.Select(pair => pair.Key).ToList();
Although personally I'd probably go for:
var keys = new PlayerID[coverages.Count];
coverages.Keys.CopyTo(keys, 0);
(which allows for correct-length allocation, and memory-copy)
What is a live sequence actually?
The Select method creates a non-buffered spooling iterator over another sequence... that is a really complicated thing to understand, but basically: when you first start iterating var key in keys, it grabs the enumerator of the inner sequence coverages (via coverages.GetEnumerator()), and then every time the foreach asks it for the next item, it asks the inner enumerator for the next item. Yeah, that sounds complicated. The good news is that the C# compiler has it all built in automatically, generating the state machines etc. for you, mainly via the yield return syntax. Jon Skeet gives an excellent discussion of this in Chapter 6 of C# in Depth. IIRC this used to be the "free chapter", but now it is not.
However, consider the following:
static IEnumerable<int> OneTwoOneTwoForever()
{
while(true) {
yield return 1;
yield return 2;
}
}
It might surprise you to learn that you can consume the above, using the same non-buffered "when you ask for another value, it runs just enough code to give you the next value" approach:
var firstTwenty = OneTwoOneTwoForever().Take(20).ToList(); // works!

List<> optimization, what are the possibilities?

In the Add() method of my generic EF repository, I have implemented a check for whether the row I'm going to insert already exists in the table; if it does, it is updated with the currently available info.
private List<T> _previousEntries;
//Try and find one with same PK in the temporary tables...
var previousRow = _previousEntries.Single(n => (int)n.GetType().GetProperty(_PKName).GetValue(n, null) == entPKValue);
//If we did find one...
if (previousRow != null)
{
//Update the row...
return;
}
//Add it...
So I know I'm using reflection, which is slow, but I have not found another way since different entities have different SQL PK names.
But I'm not sure that reflection is the biggest issue here, sometimes, _previousEntries will hold up to 800,000 items.
_previousEntries has its items assigned to it in the class constructor of the repository class. _PKName is also assigned a value in the class constructor depending on the type of T.
If I just set a breakpoint on the Single() statement, it can be processing for 2-3 seconds, so I don't know how to determine what the bottleneck is here: reflection, or Single() on 800,000 items... It sure goes way faster on a 5,000-item list.
Any opinions ? Is there anything I can do to optimize my List ?
You could move the reflection out of the LINQ statement to avoid it being evaluated repeatedly:
var property = typeof(T).GetProperty(_PKName);
var previousRow = _previousEntries.Single(n => (int)property.GetValue(n, null) == entPKValue);
Or perhaps pass a Func<T, int> to your class constructor and avoid the need for reflection altogether.
private Func<T, int> _getKeyForItem; // Set by constructor
...
var previousRow = _previousEntries.Single(n => _getKeyForItem(n) == entPKValue);
Provide a primary key accessor as a delegate
public class Repository<T>
{
private Func<T,int> _getPK;
private Dictionary<int,T> _previousEntries;
public Repository(Func<T,int> getPK)
{
_getPK = getPK;
_previousEntries = new Dictionary<int,T>();
}
public void Add(T item) {
...
int pk = _getPK(item);
T previousEntry;
if (_previousEntries.TryGetValue(pk, out previousEntry)) {
// Update
} else {
// Add
_previousEntries.Add(pk, item);
}
}
}
You would create a repository like this:
var clientRepository = new Repository<Client>(c => c.ClientID);
There is no way to make searching through an unsorted list fast. It will be O(number of items).
You need to use some other data structure to make the lookup faster - a Dictionary or a list sorted by PK are possible options.
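For illustration, a minimal sketch of the sorted-list option (the names are made up; the Dictionary approach is shown in the answer above):
using System;
using System.Collections.Generic;

// Keeps entries sorted by primary key so lookups are O(log n) via BinarySearch.
class SortedRepository<T>
{
    private readonly Func<T, int> _getPK;
    private readonly List<int> _keys = new List<int>();
    private readonly List<T> _items = new List<T>();

    public SortedRepository(Func<T, int> getPK)
    {
        _getPK = getPK;
    }

    public void Add(T item)
    {
        int pk = _getPK(item);
        int index = _keys.BinarySearch(pk);    // O(log n) instead of a full scan
        if (index >= 0)
        {
            _items[index] = item;              // key already present: update the row
        }
        else
        {
            int insertAt = ~index;             // complement gives the insertion point
            _keys.Insert(insertAt, pk);
            _items.Insert(insertAt, item);
        }
    }
}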

C# Locking mechanism - write only locking

In continuation for my latest ponders about locks in C# and .NET,
Consider the following scenario:
I have a class which contains a specific collection (for this example, I've used a Dictionary<string, int>) which is updated from a data source every few minutes using a specific method, whose body you can see below:
DataTable dataTable = dbClient.ExecuteDataSet(i_Query).GetFirstTable();
lock (r_MappingLock)
{
i_MapObj.Clear();
foreach (DataRow currRow in dataTable.Rows)
{
i_MapObj.Add(Convert.ToString(currRow[i_Column1]), Convert.ToInt32(currRow[i_Column2]));
}
}
r_MappingLock is an object dedicated to lock the critical section which refreshes the dictionary's contents.
i_MapObj is the dictionary object
i_Column1 and i_Column2 are the datatable's column names which contain the desired data for the mapping.
Now, I also have a class method which receives a string and returns the correct mapped int based on the mentioned dictionary.
I want this method to wait until the refresh method completes its execution, so at first glance one would consider the following implementation:
lock (r_MappingLock)
{
int? retVal = null;
if (i_MapObj.ContainsKey(i_Key))
{
retVal = i_MapObj[i_Key];
}
return retVal;
}
This will prevent unexpected behaviour and return values while the dictionary is being updated, but another issue arises:
Since every thread which executes the above method tries to claim the lock, if multiple threads try to execute this method at the same time, each will have to wait until the previous thread has finished executing the method before it can claim the lock. This is obviously undesirable behaviour, since the above method is only for reading purposes.
I was thinking of adding a boolean member to the class which would be set to true or false depending on whether the dictionary is being updated, and checking it within the "read only" method, but this raises other race-condition-based issues...
Any ideas how to solve this gracefully?
Thanks again,
Mikey
Have a look at the built in ReaderWriterLock.
I would just switch to using a ConcurrentDictionary to avoid this situation altogether - manual locking is error-prone. Also, as I can gather from "C#: The Curious ConcurrentDictionary", ConcurrentDictionary is already read-optimized.
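A minimal sketch of that approach, assuming the same string-to-int mapping as in the question (the class and member names are illustrative):
using System.Collections.Concurrent;
using System.Collections.Generic;

class Mapping
{
    // Safe for concurrent readers and writers; no explicit locks needed.
    private readonly ConcurrentDictionary<string, int> _map = new ConcurrentDictionary<string, int>();

    public void Refresh(IEnumerable<KeyValuePair<string, int>> rows)
    {
        // Note: this sketch only adds/updates entries; unlike the Clear()-and-rebuild
        // in the question, it does not remove keys that disappeared from the source.
        foreach (var row in rows)
        {
            _map[row.Key] = row.Value;   // the indexer has add-or-update semantics
        }
    }

    public int? GetValue(string key)
    {
        int value;
        return _map.TryGetValue(key, out value) ? (int?)value : null;
    }
}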
Albin correctly pointed to ReaderWriterLock. I will add an even nicer one: ReaderWriterGate by Jeffrey Richter. Enjoy!
You might consider creating a new dictionary when updating, instead of locking. This way, you will always have consistent results, but reads during updates would return previous data:
private volatile Dictionary<string, int> i_MapObj = new Dictionary<string, int>();
private void Update()
{
DataTable dataTable = dbClient.ExecuteDataSet(i_Query).GetFirstTable();
var newData = new Dictionary<string, int>();
foreach (DataRow currRow in dataTable.Rows)
{
newData.Add(Convert.ToString(currRow[i_Column1]), Convert.ToInt32(currRow[i_Column2]));
}
// Start using new data - reference assignments are atomic
i_MapObj = newData;
}
private int? GetValue(string key)
{
int value;
if (i_MapObj.TryGetValue(key, out value))
return value;
return null;
}
Since .NET 3.5 there is the ReaderWriterLockSlim class, which is a lot faster!
Almost as fast as a lock().
Keep the policy that disallows recursion (LockRecursionPolicy.NoRecursion) to keep performance that high.
Look at this page for more info.
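A minimal sketch of how this could be applied to the dictionary in the question (the class shape and member names are assumptions for this example):
using System.Collections.Generic;
using System.Threading;

class Mapping
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
    private readonly Dictionary<string, int> i_MapObj = new Dictionary<string, int>();

    public void Refresh(IEnumerable<KeyValuePair<string, int>> rows)
    {
        _lock.EnterWriteLock();          // exclusive: blocks readers and other writers
        try
        {
            i_MapObj.Clear();
            foreach (var row in rows)
                i_MapObj.Add(row.Key, row.Value);
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }

    public int? GetValue(string key)
    {
        _lock.EnterReadLock();           // shared: many readers can hold this at once
        try
        {
            int value;
            return i_MapObj.TryGetValue(key, out value) ? (int?)value : null;
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }
}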

"Possible multiple enumeration of IEnumerable" vs "Parameter can be declared with base type"

In Resharper 5, the following code led to the warning "Parameter can be declared with base type" for list:
public void DoSomething(List<string> list)
{
if (list.Any())
{
// ...
}
foreach (var item in list)
{
// ...
}
}
In Resharper 6, this is not the case. However, if I change the method to the following, I still get that warning:
public void DoSomething(List<string> list)
{
foreach (var item in list)
{
// ...
}
}
The reason is that in this version, list is only enumerated once, so changing it to IEnumerable<string> will not automatically introduce another warning.
Now, if I change the first version manually to use an IEnumerable<string> instead of a List<string>, I will get that warning ("Possible multiple enumeration of IEnumerable") on both occurrences of list in the body of the method:
public void DoSomething(IEnumerable<string> list)
{
if (list.Any()) // <- here
{
// ...
}
foreach (var item in list) // <- and here
{
// ...
}
}
I understand why, but I wonder how to solve this warning, assuming that the method really only needs an IEnumerable<T> and not a List<T>, because I just want to enumerate the items and I don't want to change the list.
Adding a list = list.ToList(); at the beginning of the method makes the warning go away:
public void DoSomething(IEnumerable<string> list)
{
list = list.ToList();
if (list.Any())
{
// ...
}
foreach (var item in list)
{
// ...
}
}
I understand why that makes the warning go away, but it looks a bit like a hack to me...
Any suggestions on how to solve that warning better while still using the most general type possible in the method signature?
The following problems should all be solved for a good solution:
No call to ToList() inside the method, because it has a performance impact
No usage of ICollection<T> or even more specialized interfaces/classes, because they change the semantics of the method as seen from the caller.
No multiple iterations over an IEnumerable<T> and thus risking accessing a database multiple times or similar.
Note: I am aware that this is not a Resharper issue, and thus, I don't want to suppress this warning, but fix the underlying cause as the warning is legit.
UPDATE:
Please don't worry about the Any and the foreach. I don't need help merging those statements to get just one enumeration of the enumerable.
It could really be anything in this method that enumerates the enumerable multiple times!
You should probably take an IEnumerable<T> and ignore the "multiple iterations" warning.
This message is warning you that if you pass a lazy enumerable (such as an iterator or a costly LINQ query) to your method, parts of the iterator will execute twice.
There is no perfect solution; choose one according to the situation.
Call enumerable.ToList(); you may optimize it by first trying "enumerable as List", as long as you don't modify the list (see the sketch below)
Iterate twice over the IEnumerable, but make it clear to the caller (document it)
Split into two methods
Take a List parameter to avoid the cost of "as"/ToList and the potential cost of double enumeration
The first solution (ToList) is probably the most "correct" for a public method that could be working on any Enumerable.
You can ignore Resharper issues, the warning is legit in a general case but may be wrong in your specific situation. Especially if the method is intended for internal usage and you have full control on callers.
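A minimal sketch of the first option, assuming the method only reads the sequence:
using System.Collections.Generic;
using System.Linq;

public void DoSomething(IEnumerable<string> list)
{
    // Materialize at most once: reuse the caller's list if one was passed in,
    // otherwise copy the sequence so it is enumerated a single time.
    IList<string> items = list as IList<string> ?? list.ToList();

    if (items.Count > 0)
    {
        // ...
    }
    foreach (var item in items)
    {
        // ...
    }
}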
This class will give you a way to split the first item off of the enumeration and then have an IEnumerable for the rest of the enumeration, without a double enumeration, thus avoiding the potentially nasty performance hit. Its usage is like this (where T is whatever type you are enumerating):
var split = new SplitFirstEnumerable<T>(currentIEnumerable);
T firstItem = split.First;
IEnumerable<T> remaining = split.Remaining;
Here is the class itself:
/// <summary>
/// Use this class when you want to pull the first item off of an IEnumerable
/// and then enumerate over the remaining elements and you want to avoid the
/// warning about "possible double iteration of IEnumerable" AND without constructing
/// a list or other duplicate data structure of the enumerable. You construct
/// this class from your existing IEnumerable and then use its First and
/// Remaining properties for your algorithm.
/// </summary>
/// <typeparam name="T">The type of item you are iterating over; there are no
/// "where" restrictions on this type.</typeparam>
public class SplitFirstEnumerable<T>
{
private readonly IEnumerator<T> _enumerator;
/// <summary>
/// Constructor
/// </summary>
/// <remarks>Will throw an exception if there are zero items in enumerable or
/// if the enumerable is already advanced past the last element.</remarks>
/// <param name="enumerable">The enumerable that you want to split</param>
public SplitFirstEnumerable(IEnumerable<T> enumerable)
{
_enumerator = enumerable.GetEnumerator();
if (_enumerator.MoveNext())
{
First = _enumerator.Current;
}
else
{
throw new ArgumentException("Parameter 'enumerable' must have at least 1 element to be split.");
}
}
/// <summary>
/// The first item of the original enumeration, equivalent to calling
/// enumerable.First().
/// </summary>
public T First { get; private set; }
/// <summary>
/// The items of the original enumeration minus the first, equivalent to calling
/// enumerable.Skip(1).
/// </summary>
public IEnumerable<T> Remaining
{
get
{
while (_enumerator.MoveNext())
{
yield return _enumerator.Current;
}
}
}
}
This does presuppose that the IEnumerable has at least one element to start. If you want to do more of a FirstOrDefault type setup, you'll need to catch the exception that would otherwise be thrown in the constructor.
There exists a general solution to address both Resharper warnings: the lack of guarantee for repeat-ability of IEnumerable, and the List base class (or potentially expensive ToList() workaround).
Create a specialized class, i.e. "RepeatableEnumerable", implementing IEnumerable<T>, with GetEnumerator() implemented along the following logic outline:
Yield all items already collected so far from the inner list.
If the wrapped enumerator has more items,
While the wrapped enumerator can move to the next item,
Get the current item from the inner enumerator.
Add the current item to the inner list.
Yield the current item
Mark the inner enumerator as having no more items.
Add extension methods and appropriate optimizations where the wrapped parameter is already repeatable. Resharper will no longer flag the indicated warnings on the following code:
public void DoSomething(IEnumerable<string> list)
{
var repeatable = list.ToRepeatableEnumerable();
if (repeatable.Any()) // <- no warning here anymore.
// Further, this will read at most one item from list. A
// query (SQL LINQ) with a 10,000 items, returning one item per second
// will pass this block in 1 second, unlike the ToList() solution / hack.
{
// ...
}
foreach (var item in repeatable) // <- and no warning here anymore, either.
// Further, this will read in lazy fashion. In the 10,000 item, one
// per second, query scenario, this loop will process the first item immediately
// (because it was read already for Any() above), and then proceed to
// process one item every second.
{
// ...
}
}
With a little work, you can also turn RepeatableEnumerable into LazyList, a full implementation of IList. That's beyond the scope of this particular problem though. :)
UPDATE: Code implementation requested in comments -- not sure why the original PDL wasn't enough, but in any case, the following faithfully implements the algorithm I suggested (My own implementation implements the full IList interface; that is a bit beyond the scope I want to release here... :) )
public class RepeatableEnumerable<T> : IEnumerable<T>
{
readonly List<T> innerList;
IEnumerator<T> innerEnumerator;
public RepeatableEnumerable( IEnumerator<T> innerEnumerator )
{
this.innerList = new List<T>();
this.innerEnumerator = innerEnumerator;
}
public IEnumerator<T> GetEnumerator()
{
// 1. Yield all items already collected so far from the inner list.
foreach( var item in innerList ) yield return item;
// 2. If the wrapped enumerator has more items
if( innerEnumerator != null )
{
// 2A. while the wrapped enumerator can move to the next item
while( innerEnumerator.MoveNext() )
{
// 1. Get the current item from the inner enumerator.
var item = innerEnumerator.Current;
// 2. Add the current item to the inner list.
innerList.Add( item );
// 3. Yield the current item
yield return item;
}
// 3. Mark the inner enumerator as having no more items.
innerEnumerator.Dispose();
innerEnumerator = null;
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
// Add extension methods and appropriate optimizations where the wrapped parameter is already repeatable.
public static class RepeatableEnumerableExtensions
{
public static RepeatableEnumerable<T> ToRepeatableEnumerable<T>( this IEnumerable<T> items )
{
var result = ( items as RepeatableEnumerable<T> )
?? new RepeatableEnumerable<T>( items.GetEnumerator() );
return result;
}
}
I realize this question is old and already marked as answered, but I was surprised that nobody suggested manually iterating over the enumerator:
// NOTE: list is of type IEnumerable<T>.
// The name was taken from the OP's code.
var enumerator = list.GetEnumerator();
if (enumerator.MoveNext())
{
// Run your list.Any() logic here
...
do
{
var item = enumerator.Current;
// Run your foreach (var item in list) logic here
...
} while (enumerator.MoveNext());
}
Seems a lot more straightforward than the other answers here.
Generally speaking, what you need is some state object into which you can PUSH the items (within a foreach loop), and out of which you then get your final result.
The downside of the enumerable LINQ operators is that they actively enumerate the source instead of accepting items being pushed to them, so they don't meet your requirements.
If you e.g. just need the minimum and maximum values of a sequence of 1'000'000 integers which cost $1'000 worth of processor time to retrieve, you end up writing something like this:
public class MinMaxAggregator
{
private bool _any;
private int _min;
private int _max;
public void OnNext(int value)
{
if (!_any)
{
_min = _max = value;
_any = true;
}
else
{
if (value < _min) _min = value;
if (value > _max) _max = value;
}
}
public MinMax GetResult()
{
if (!_any) throw new InvalidOperationException("Sequence contains no elements.");
return new MinMax(_min, _max);
}
}
public static MinMax DoSomething(IEnumerable<int> source)
{
var aggr = new MinMaxAggregator();
foreach (var item in source) aggr.OnNext(item);
return aggr.GetResult();
}
In fact, you just re-implemented the logic of the Min() and Max() operators. Of course that's easy, but they are only examples of arbitrarily complex logic that you might otherwise easily express in a LINQish way.
The solution came to me on yesterday's night walk: we need to PUSH... that's REACTIVE! All the beloved operators also exist in a reactive version built for the push paradigm. They can be chained together at will to whatever complexity you need, just as their enumerable counterparts.
So the min/max example boils down to:
public static MinMax DoSomething(IEnumerable<int> source)
{
// bridge over to the observable world
var connectable = source.ToObservable(Scheduler.Immediate).Publish();
// express the desired result there (note: connectable is observed by multiple observers)
var combined = connectable.Min().CombineLatest(connectable.Max(), (min, max) => new MinMax(min, max));
// subscribe
var resultAsync = combined.GetAwaiter();
// unload the enumerable into connectable
connectable.Connect();
// pick up the result
return resultAsync.GetResult();
}
Why not:
bool any = false;
foreach (var item in list)
{
any = true;
// ...
}
if(any)
{
//...
}
Update: Personally, I wouldn't drastically change the code just to get around a warning like this. I would just disable the warning and continue on. The warning is suggesting you change the general flow of the code to make it better; if you're not making the code better (and arguably making it worse) to address the warning, then the point of the warning is missed.
For example:
// ReSharper disable PossibleMultipleEnumeration
public void DoSomething(IEnumerable<string> list)
{
if (list.Any()) // <- here
{
// ...
}
foreach (var item in list) // <- and here
{
// ...
}
}
// ReSharper restore PossibleMultipleEnumeration
UIMS* - Fundamentally, there is no great solve. IEnumerable<T> used to be the "very basic thing that represents a bunch of things of the same type, so using it in method sigs is Correct." It has now also become a "thing that might evaluate behind the scenes, and might take a while, so now you always have to worry about that."
It's as if IDictionary suddenly were extended to support lazy loading of values, via a LazyLoader property of type Func<TKey,TValue>. Actually that'd be neat to have, but not so neat to be added to IDictionary, because now every time we receive an IDictionary we have to worry about that. But that's where we are.
So it would seem that "if a method takes an IEnumerable and evals it twice, always force eval via ToList()" is the best you can do. And nice work by Jetbrains to give us this warning.
*(Unless I'm Missing Something . . . just made it up but it seems useful)
Be careful when accepting enumerables in your method. The "warning" for the base type is only a hint; the enumeration warning is a true warning.
However, your list will be enumerated at least two times because you do an Any and then a foreach. If you add a ToList(), there will be three enumerations in total - so remove the ToList().
I would suggest to set resharpers warning settings for the base type to a hint. So you still have a hint (green underline) and the possibility to quickfix it (alt+enter) and no "warnings" in your file.
You should take care if enumerating the IEnumerable is an expensive action like loading something from file or database, or if you have a method which calculates values and uses yield return. In this case do a ToList() or ToArray() first to load/calculate all data only ONCE.
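For illustration, a minimal sketch of that pattern (the expensive yield return source here is hypothetical):
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class Example
{
    // Hypothetical expensive source: each yielded value involves a costly step.
    static IEnumerable<int> LoadValues()
    {
        for (int i = 0; i < 5; i++)
        {
            Thread.Sleep(100);      // stand-in for I/O or a heavy calculation
            yield return i;
        }
    }

    public static void DoSomething()
    {
        // Materialize once, then enumerate the in-memory array as often as needed.
        int[] values = LoadValues().ToArray();

        if (values.Any())
        {
            // ...
        }
        foreach (var value in values)
        {
            // ...
        }
    }
}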
You could use ICollection<T> (or IList<T>). It's less specific than List<T>, but doesn't suffer from the multiple-enumeration problem.
Still I'd tend to use IEnumerable<T> in this case. You can also consider to refactor the code to enumerate only once.
Use IList as your parameter type rather than IEnumerable - IEnumerable has different semantics to List, whereas IList has the same.
An IEnumerable could be based on a non-seekable stream, which is why you get the warnings.
You can iterate only once:
public void DoSomething(IEnumerable<string> list)
{
bool isFirstItem = true;
foreach (var item in list)
{
if (isFirstItem)
{
isFirstItem = false;
// ...
}
// ...
}
}
There is something no one has said before (#Zebi): Any() already iterates, trying to find a matching element. If you call ToList(), it will iterate as well, in order to create the list. The whole idea of using IEnumerable is only to iterate; anything else provokes an iteration in order to execute. You should try to do everything inside a single loop, and include your .Any() check in it.
If you pass a list of Actions to your method, you get cleaner code that iterates only once:
public void DoSomething(IEnumerable<string> list, params Action<string>[] actions)
{
foreach (var item in list)
{
for (int i = 0; i < actions.Length; i++)
{
actions[i](item);
}
}
}
