Does Distinct() method keep original ordering of sequence intact?

I want to remove duplicates from a list without changing the order of the unique elements in the list.
Jon Skeet and others have suggested using the following:
list = list.Distinct().ToList();
Reference:
How to remove duplicates from a List<T>?
Remove duplicates from a List<T> in C#
Is it guaranteed that the order of the unique elements will be the same as before? If yes, please give a reference that confirms this, as I couldn't find anything about it in the documentation.

It's not guaranteed, but it's the most obvious implementation. It would be hard to implement in a streaming manner (i.e. such that it returned results as soon as it could, having read as little as it could) without returning them in order.
You might want to read my blog post on the Edulinq implementation of Distinct().
Note that even if this were guaranteed for LINQ to Objects (which personally I think it should be) that wouldn't mean anything for other LINQ providers such as LINQ to SQL.
The level of guarantees provided within LINQ to Objects is a little inconsistent sometimes, IMO. Some optimizations are documented, others not. Heck, some of the documentation is flat out wrong.

In the .NET Framework 3.5, disassembling the CIL of the Linq-to-Objects implementation of Distinct() shows that the order of elements is preserved - however this is not documented behavior.
I did a little investigation with Reflector. After disassembling System.Core.dll, Version=3.5.0.0 you can see that Distinct() is an extension method, which looks like this:
public static class Enumerable
{
public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
throw new ArgumentNullException("source");
return DistinctIterator<TSource>(source, null);
}
}
So, the interesting part here is DistinctIterator, which implements IEnumerable and IEnumerator. Here is a simplified implementation of this IEnumerator (goto and labels removed):
private sealed class DistinctIterator<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
{
private bool _enumeratingStarted;
private IEnumerator<TSource> _sourceListEnumerator;
public IEnumerable<TSource> _source;
private HashSet<TSource> _hashSet;
private TSource _current;
private bool MoveNext()
{
if (!_enumeratingStarted)
{
_sourceListEnumerator = _source.GetEnumerator();
_hashSet = new HashSet<TSource>();
_enumeratingStarted = true;
}
while(_sourceListEnumerator.MoveNext())
{
TSource element = _sourceListEnumerator.Current;
if (!_hashSet.Add(element))
continue;
_current = element;
return true;
}
return false;
}
void IEnumerator.Reset()
{
throw new NotSupportedException();
}
TSource IEnumerator<TSource>.Current
{
get { return _current; }
}
object IEnumerator.Current
{
get { return _current; }
}
}
As you can see, enumeration proceeds in the order provided by the source enumerable (the list on which we call Distinct). The HashSet is used only to determine whether we have already returned such an element. If not, we return it; otherwise we continue enumerating the source.
So, with this implementation, Distinct() returns elements in exactly the same order as the collection to which Distinct was applied.

According to the documentation the sequence is unordered.

Yes, Enumerable.Distinct preserves order. Assuming the method is lazy (it "yields distinct values as soon as they are seen"), this follows automatically. Think about it.
The .NET Reference source confirms. It returns a subsequence, the first element in each equivalence class.
foreach (TSource element in source)
if (set.Add(element)) yield return element;
The .NET Core implementation is similar.
Frustratingly, the documentation for Enumerable.Distinct is confused on this point:
The result sequence is unordered.
I can only imagine they mean "the result sequence is not sorted." You could implement Distinct by presorting then comparing each element to the previous, but this would not be lazy as defined above.
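For contrast, here is a minimal sketch of the presort-then-compare approach mentioned above. The name DistinctBySorting is hypothetical and not part of LINQ, and the sketch assumes the element type works with Comparer<T>.Default. Because it has to sort, it reads the whole source before yielding anything, and its output comes back in sorted order rather than source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedDistinctSketch
{
    // Hypothetical helper (not part of LINQ): removes duplicates by sorting
    // first and then skipping items that compare equal to their predecessor.
    public static IEnumerable<T> DistinctBySorting<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterate(source);
    }

    private static IEnumerable<T> Iterate<T>(IEnumerable<T> source)
    {
        var comparer = Comparer<T>.Default;
        // Sorting forces the entire source to be read before the first yield.
        var sorted = source.OrderBy(x => x, comparer).ToList();

        bool hasPrevious = false;
        T previous = default(T);
        foreach (T item in sorted)
        {
            if (!hasPrevious || comparer.Compare(previous, item) != 0)
                yield return item;
            previous = item;
            hasPrevious = true;
        }
    }
}
```

Calling it on 3, 1, 3, 2, 1 yields 1, 2, 3, so the source order is lost, which is presumably not what anyone wants from Distinct.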

A bit late to the party, but no one really posted the best complete code to accomplish this IMO, so let me offer this (which is essentially identical to what .NET Framework does with Distinct())*:
public static IEnumerable<T> DistinctOrdered<T>(this IEnumerable<T> items)
{
HashSet<T> returnedItems = new HashSet<T>();
foreach (var item in items)
{
if (returnedItems.Add(item))
yield return item;
}
}
This guarantees the original order without reliance on undocumented or assumed behavior. I also believe this is more efficient than using multiple LINQ methods though I'm open to being corrected here.
(*) The .NET Framework source uses an internal Set class, which appears to be substantively identical to HashSet.

By default, the Distinct LINQ operator uses the type's own Equals and GetHashCode methods, but you can supply your own IEqualityComparer<T> object, implementing GetHashCode and Equals, to define with custom logic when two objects are equal.
Remember that:
GetHashCode should not do CPU-heavy comparisons (e.g. use only some obvious basic checks). It is used first to establish whether two objects are surely different (different hash codes are returned) or potentially the same (same hash code). In the latter case, when two objects have the same hash code, the framework falls back to the Equals method for the final decision about the equality of the given objects.
Once you have a MyType class and a MyTypeEqualityComparer class, the following code does not ensure that the sequence maintains its order:
var cmp = new MyTypeEqualityComparer();
var lst = new List<MyType>();
// add some to lst
var q = lst.Distinct(cmp);
In the following sci library I implemented an extension method, DistinctKeepOrder, to ensure that a Vector3D set maintains its order.
The relevant code follows:
/// <summary>
/// support class for DistinctKeepOrder extension
/// </summary>
public class Vector3DWithOrder
{
public int Order { get; private set; }
public Vector3D Vector { get; private set; }
public Vector3DWithOrder(Vector3D v, int order)
{
Vector = v;
Order = order;
}
}
public class Vector3DWithOrderEqualityComparer : IEqualityComparer<Vector3DWithOrder>
{
Vector3DEqualityComparer cmp;
public Vector3DWithOrderEqualityComparer(Vector3DEqualityComparer _cmp)
{
cmp = _cmp;
}
public bool Equals(Vector3DWithOrder x, Vector3DWithOrder y)
{
return cmp.Equals(x.Vector, y.Vector);
}
public int GetHashCode(Vector3DWithOrder obj)
{
return cmp.GetHashCode(obj.Vector);
}
}
In short, Vector3DWithOrder encapsulates the type and an order integer, while Vector3DWithOrderEqualityComparer encapsulates the original type's comparer.
And this is the helper method that ensures the order is maintained:
/// <summary>
/// retrieve distinct of given vector set ensuring to maintain given order
/// </summary>
public static IEnumerable<Vector3D> DistinctKeepOrder(this IEnumerable<Vector3D> vectors, Vector3DEqualityComparer cmp)
{
var ocmp = new Vector3DWithOrderEqualityComparer(cmp);
return vectors
.Select((w, i) => new Vector3DWithOrder(w, i))
.Distinct(ocmp)
.OrderBy(w => w.Order)
.Select(w => w.Vector);
}
Note: further research could find a more general way (using interfaces) and a more optimized one (without encapsulating the object).
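One possible generalization, offered only as a sketch: a generic extension that tracks seen elements in a HashSet<T> and yields first occurrences as it goes, so no wrapper type or re-sorting is needed. The class name and the optional comparer parameter are assumptions made for illustration; this is not code from the library mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctKeepOrderSketch
{
    // Hypothetical generic variant: keeps the first occurrence of each
    // element, in source order, using the supplied comparer (or the default).
    public static IEnumerable<T> DistinctKeepOrder<T>(
        this IEnumerable<T> items, IEqualityComparer<T> comparer = null)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        return Iterate(items, comparer);
    }

    private static IEnumerable<T> Iterate<T>(
        IEnumerable<T> items, IEqualityComparer<T> comparer)
    {
        // HashSet<T> falls back to EqualityComparer<T>.Default when comparer is null.
        var seen = new HashSet<T>(comparer);
        foreach (T item in items)
        {
            // Add returns false for an element that is already in the set.
            if (seen.Add(item))
                yield return item;
        }
    }
}
```

With StringComparer.OrdinalIgnoreCase, the input "b", "A", "a", "B", "c" comes out as "b", "A", "c": the first spelling of each value wins and the source order is kept.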

This highly depends on your LINQ provider. With LINQ to Objects you can rely on the internal source code for Distinct, which suggests that the original order is preserved.
However, for other providers that resolve to some kind of SQL, for example, that isn't necessarily the case, as an ORDER BY clause usually comes after any aggregation (such as Distinct). So if your code is this:
myArray.OrderBy(x => x.anothercol).GroupBy(x => x.mycol);
this is translated to something similar to the following in SQL:
SELECT * FROM mytable GROUP BY mycol ORDER BY anothercol;
This obviously groups your data first and sorts it afterwards. Now you're stuck with the DBMS's own logic of how to execute that. On some DBMSs this isn't even allowed. Imagine the following data:
mycol anothercol
1 2
1 1
1 3
2 1
2 3
When executing myArray.OrderBy(x => x.anothercol).GroupBy(x => x.mycol), we assume the following result:
mycol anothercol
1 1
2 1
But the DBMS may aggregate the anothercol column so that the value of the first row is always used, resulting in the following data:
mycol anothercol
1 2
2 1
which after ordering will result in this:
mycol anothercol
2 1
1 2
This is similar to the following:
SELECT mycol, First(anothercol) from mytable group by mycol order by anothercol;
which is the complete reverse of the order you expected.
As you see, the execution plan may vary depending on the underlying provider. This is why there's no guarantee about that in the docs.
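For comparison, the same query over an in-memory collection can be sketched as below. This only illustrates LINQ to Objects, where OrderBy is a stable sort and GroupBy yields groups in the order each key first appears; the Row type and method names are made up for the example, and nothing here says how a SQL provider would translate the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InMemoryGroupingSketch
{
    // Hypothetical row type mirroring the two-column table above.
    public sealed class Row
    {
        public int MyCol { get; }
        public int AnotherCol { get; }
        public Row(int myCol, int anotherCol) { MyCol = myCol; AnotherCol = anotherCol; }
    }

    // The sample rows from the table above, in the same order.
    public static List<Row> SampleData() => new List<Row>
    {
        new Row(1, 2), new Row(1, 1), new Row(1, 3), new Row(2, 1), new Row(2, 3),
    };

    // Sort first, then group, then take the first row of each group.
    public static List<Row> FirstRowPerGroup(IEnumerable<Row> rows) =>
        rows.OrderBy(r => r.AnotherCol)
            .GroupBy(r => r.MyCol)
            .Select(g => g.First())
            .ToList();
}
```

In memory, FirstRowPerGroup(SampleData()) gives the rows (1, 1) and (2, 1), i.e. the result the question assumes. A database is free to plan the grouping and the sort differently.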

Related

C# Interface IEnumerable Any() without specifying generic types

I have cast
var info = property.Info;
object data = info.GetValue(obj);
...
var enumerable = (IEnumerable)data;
if (enumerable.Any()) ///Does not compile
{
}
if (enumerable.GetEnumerator().Current != null) // Run time error
{
}
and I would like to see whether this enumerable has any elements by using the LINQ Any() method. Unfortunately, even with LINQ in scope, I can't.
How would I do this without specifying the generic type?

While you can't do this directly, you could do it via Cast:
if (enumerable.Cast<object>().Any())
That should always work, as any IEnumerable can be wrapped as an IEnumerable<object>. It will end up boxing the first element if it's actually an IEnumerable<int> or similar, but it should work fine. Unlike most LINQ methods, Cast and OfType target IEnumerable rather than IEnumerable<T>.
You could write your own subset of extension methods like the LINQ ones but operating on the non-generic IEnumerable type if you wanted to, of course. Implementing LINQ to Objects isn't terribly hard - you could use my Edulinq project as a starting point, for example.
There are cases where you could implement Any(IEnumerable) slightly more efficiently than using Cast - for example, taking a shortcut if the target implements the non-generic ICollection interface. At that point, you wouldn't need to create an iterator or take the first element. In most cases that won't make much performance difference, but it's the kind of thing you could do if you were optimizing.
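A minimal sketch of that ICollection shortcut follows. HasAny is a hypothetical name, chosen so it does not clash with Enumerable.Any; treat it as an illustration of the idea, not as how LINQ itself is implemented.

```csharp
using System;
using System.Collections;

public static class NonGenericAnySketch
{
    // Hypothetical extension for the non-generic IEnumerable.
    public static bool HasAny(this IEnumerable source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Shortcut: a non-generic ICollection already knows its count,
        // so no enumerator has to be created.
        if (source is ICollection collection)
            return collection.Count > 0;

        IEnumerator enumerator = source.GetEnumerator();
        try
        {
            // True only if there is a first element to move to.
            return enumerator.MoveNext();
        }
        finally
        {
            // Non-generic enumerators are not required to be disposable.
            (enumerator as IDisposable)?.Dispose();
        }
    }
}
```

Arrays, ArrayList and List<T> all implement the non-generic ICollection, so they take the shortcut; lazy sequences fall through to the enumerator path.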

One method is to use foreach, as noted in IEnumerable "Remarks". It also provides details on the additional methods off of the result of GetEnumerator.
bool hasAny = false;
foreach (object i in (IEnumerable)(new int[1] /* IEnumerable of any type */)) {
hasAny = true;
break;
}
(Which is itself easily transferable to an Extension method.)

Your attempt to use GetEnumerator().Current tried to get the current value of an enumerator that had not yet been moved to the first position. It would also have given the wrong result if the first item existed but was null. What you could have done (and what the Any() in Enumerable does) is see whether it is possible to move to that first item; i.e. is there a first item to move to:
internal static class UntypedLinq
{
public static bool Any(this IEnumerable source)
{
if (source == null) throw new ArgumentNullException(nameof(source));
IEnumerator ator = source.GetEnumerator();
// Unfortunately unlike IEnumerator<T>, IEnumerator does not implement
// IDisposable. (A design flaw fixed when IEnumerator<T> was added).
// We need to test whether disposal is required or not.
if (ator is IDisposable disp)
{
using(disp)
{
return ator.MoveNext();
}
}
return ator.MoveNext();
}
// Not completely necessary. Causes any typed enumerables to be handled by the existing Any
// in Linq via a short method that will be inlined.
public static bool Any<T>(this IEnumerable<T> source) => Enumerable.Any(source);
}

Is it possible to turn an IEnumerable into an IOrderedEnumerable without using OrderBy?

Say there is an extension method to order an IQueryable based on several types of Sorting (i.e. sorting by various properties) designated by a SortMethod enum.
public static IOrderedEnumerable<AClass> OrderByX(this IQueryable<AClass> values,
SortMethod? sortMethod)
{
IOrderedEnumerable<AClass> queryRes = null;
switch (sortMethod)
{
case SortMethod.Method1:
queryRes = values.OrderBy(a => a.Property1);
break;
case SortMethod.Method2:
queryRes = values.OrderBy(a => a.Property2);
break;
case null:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
default:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
}
return queryRes;
}
In the case where sortMethod is null (i.e. where it is specified that I don't care about the order of the values), is there a way, instead of ordering by some default property, to just pass the values through as "ordered" without having to perform the actual sort?
I would like the ability to call this extension, and then possibly perform some additional ThenBy orderings.

All you need to do for the default case is:
queryRes = values.OrderBy(a => 1);
This will effectively be a noop sort. Because OrderBy performs a stable sort, the original order will be maintained in the event that the selected objects are equal. Note that since this is an IQueryable and not an IEnumerable, it's possible for the query provider to not perform a stable sort. In that case, you need to know whether it's important that order be maintained, or whether it's appropriate to just say "I don't care what order the result is, so long as I can call ThenBy on the result".
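For LINQ to Objects only, the constant-key approach can be sketched as a small helper. AsOrderedUnchanged is a hypothetical name, and the sketch says nothing about how a query provider would treat the same expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConstantKeySortSketch
{
    // Hypothetical helper: wraps a sequence as "ordered" using a constant key.
    // Enumerable.OrderBy is a stable sort, so items with equal keys (here:
    // all of them) keep their source order.
    public static IOrderedEnumerable<T> AsOrderedUnchanged<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return source.OrderBy(_ => 1);
    }
}
```

For "pear", "fig", "apple", AsOrderedUnchanged() returns the items in the same order, and adding ThenBy(w => w.Length) gives "fig", "pear", "apple". The sort still runs and buffers the whole sequence, so this saves no work; it only keeps the order.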
Another option, that allows you to avoid the actual sort is to create your own IOrderedEnumerable implementation:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
if (descending)
{
return source.OrderByDescending(keySelector, comparer);
}
else
{
return source.OrderBy(keySelector, comparer);
}
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}
With that your query can be:
queryRes = new NoopOrder<AClass>(values);
Note that the consequence of the above class is that if there is a call to ThenBy, that ThenBy will effectively be a top-level sort. It is in effect turning the subsequent ThenBy into an OrderBy call. (This should not be surprising; ThenBy will call the CreateOrderedEnumerable method, and in there this code is calling OrderBy, basically turning that ThenBy into an OrderBy. From a conceptual sorting point of view, this is a way of saying that "all of the items in this sequence are equal in the eyes of this sort, but if you specify that equal objects should be tiebroken by something else, then do so.")
Another way of thinking of a "no op sort" is that it orders the items based on the index of the input sequence. This means that the items are not all "equal"; it means that the order of the input sequence will be the final order of the output sequence, and since each item in the input sequence is always larger than the one before it, adding additional "tiebreaker" comparisons will do nothing, making any subsequent ThenBy calls pointless. If this behavior is desired, it is even easier to implement than the previous one:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
return new NoopOrder<T>(source);
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}

If you always return the same key value, you will get an IOrderedEnumerable that preserves the original list order:
case null:
queryRes = values.OrderBy(a => 1);
break;
Btw, I don't think this is the right thing to do. You will get a collection that is supposed to be ordered but actually is not.

Bottom line, IOrderedEnumerable exists solely to provide a grammar structure to the OrderBy()/ThenBy() methods, preventing you from trying to start an ordering clause with ThenBy(). It's not intended to be a "marker" that identifies the collection as ordered, unless it was actually ordered by OrderBy(). So, the answer is that if the sorting method being null is supposed to indicate that the enumerable is in some "default order", you should specify that default order (as your current implementation does). It's disingenuous to state that the enumerable is ordered when in fact it isn't, even if, by not specifying a SortingMethod, you are implying it's "ordered by nothing" and don't care about the actual order.
The "problem" inherent in trying to simply mark the collection as ordered using the interface is that there's more to the process than simply sorting. By executing an ordering method chain, such as myCollection.OrderBy().ThenBy().ThenByDescending(), you're not actually sorting the collection with each call; not yet anyway. You are instead defining the behavior of an "iterator" class, named OrderedEnumerable, which will use the projections and comparisons you define in the chain to perform the sorting at the moment you need an actual sorted element.
Servy's answer, stating that OrderBy(x=>1) is a noop and should be optimized out of SQL providers ignores the reality that this call, made against an Enumerable, will still do quite a bit of work, and that most SQL providers in fact do not optimize this kind of call; OrderBy(x=>1) will, in most Linq providers, produce a query with an "ORDER BY 1" clause, which not only forces the SQL provider to perform its own sorting, it will actually result in a change to the order, because in T-SQL at least "ORDER BY 1" means to order by the first column of the select list.

Implementing IEquatable<T> to avoid duplicates from List<T>

I have a List<CustomObject> and want to remove duplicates from it.
If two Custom Objects have same value for property: City, then I will call them duplicate.
I have implemented IEquatable as follows, but not able to remove duplicates from the list.
What is missing?
public class CustomAddress : IAddress, IEqualityComparer<IAddress>
{
//Other class members go here
//IEqualityComparer members
public bool Equals(IAddress x, IAddress y)
{
// Check whether the compared objects reference the same data.
if (ReferenceEquals(x, y)) return true;
// Check whether any of the compared objects is null.
if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
return false;
// Check whether the Objects' properties are equal.
return x.City.Equals(y.City);
}
public int GetHashCode(IAddress obj)
{
// Check whether the object is null.
if (ReferenceEquals(obj, null)) return 0;
int hashAreaName = City == null ? 0 : City.GetHashCode();
return hashAreaName;
}
}
I am using .NET 3.5

With your overrides of Equals and GetHashCode in place, if you have an existing list that you need to filter, simply invoke Distinct() (available through the namespace System.Linq) on the list.
var noDupes = list.Distinct();
This will give you a duplicate-free sequence. If you need that to be a concrete list, simply add a ToList() to the end of the invocation.
var noDupes = list.Distinct().ToList();
Another answer mentions implementing an IEqualityComparer<CustomObject>. This is useful when overriding Equals and GetHashCode directly is either impossible (you don't control the source) or does not make sense (your idea of equality in this particular case is not universal for the class). In that case, define the comparer as demonstrated and provide an instance of the comparer to an overload of Distinct.
Finally, if you're building a list from the ground-up and want to avoid duplicates being inserted, you can use a HashSet<T> as mentioned here. The HashSet also accepts a custom comparer in the constructor, so you can optionally include that.
var mySet = new HashSet<CustomObject>();
bool isAdded = mySet.Add(myElement);
// isAdded will be false if myElement already exists in set, and
// myElement would not be added a second time.
// or you could use
if (!mySet.Contains(myElement))
mySet.Add(myElement);
One more option that does not use .NET library methods but can be useful in a pinch is Jon Skeet's DistinctBy, a rough implementation of which you can see here. The idea is that you submit a Func<MyObject, Key> lambda expression directly and omit the overrides of Equals and GetHashCode (or the custom comparer) entirely.
var noDupes = list.DistinctBy(obj => obj.City); // NOT part of BCL
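To make the idea concrete, here is a rough sketch of such a key-based operator. It is not Jon Skeet's code; the name DistinctByKey is hypothetical and deliberately differs from DistinctBy, because newer .NET versions (6 and later) ship their own Enumerable.DistinctBy and an identically named extension could become ambiguous there.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeySketch
{
    // Hypothetical extension: keeps the first item seen for each key.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // Keys are compared with the key type's default equality comparer.
        var seenKeys = new HashSet<TKey>();
        foreach (TSource item in source)
        {
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

With the question's type this would read list.DistinctByKey(obj => obj.City), leaving the class's own Equals and GetHashCode untouched.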

Just by implementing .Equals the way you did (which you implemented correctly) you will not prevent duplicates from being added to a List<T>. You will actually have to remove them manually.
Instead of List<CustomObject> use HashSet<CustomObject>. It will never contain duplicates.

That's because List<CustomObject> tests whether your class (CustomObject) implements IEquatable<CustomObject>, not IEquatable<IAddress> as you did.
I assume that for the duplicate check you are using the Contains method before adding a new member.
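As an illustration of that point, a class that implements IEquatable<T> for its own type might look like the sketch below. CityAddress is a hypothetical stand-in for the question's class, and ordinal (case-sensitive) comparison of City is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's class; only City matters here.
public sealed class CityAddress : IEquatable<CityAddress>
{
    public string City { get; }

    public CityAddress(string city) { City = city; }

    // Picked up by List<T>.Contains, Distinct() and HashSet<T> through
    // EqualityComparer<T>.Default, because T here is CityAddress itself.
    public bool Equals(CityAddress other) =>
        other != null && string.Equals(City, other.City, StringComparison.Ordinal);

    // Keep the object-based overload consistent with the typed one.
    public override bool Equals(object obj) => Equals(obj as CityAddress);

    // Equal objects must return equal hash codes.
    public override int GetHashCode() => City == null ? 0 : City.GetHashCode();
}
```

With this in place, a list containing new CityAddress("Oslo") reports true for Contains(new CityAddress("Oslo")), and a HashSet<CityAddress> refuses the second copy.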

To match duplicates on only a specific property you need a comparer.
class MyComparer : IEqualityComparer<CustomObject>
{
public bool Equals(CustomObject x, CustomObject y)
{
return x.City.Equals(y.City);
}
public int GetHashCode(CustomObject x)
{
return x.City.GetHashCode();
}
}
Usage:
var yourDistinctObjects = yourObjects.Distinct(new MyComparer());
Edit: Found this thread that does what you need and I think I referred to it in the past:
Remove duplicates in the list using linq
One answer that I thought was kind of interesting (but not how I had done it) was:
var distinctItems = items.GroupBy(x => x.Id).Select(y => y.First());
It's a one liner that does what you need but might not be as efficient as the other methods.

"Possible multiple enumeration of IEnumerable" vs "Parameter can be declared with base type"

In Resharper 5, the following code led to the warning "Parameter can be declared with base type" for list:
public void DoSomething(List<string> list)
{
if (list.Any())
{
// ...
}
foreach (var item in list)
{
// ...
}
}
In Resharper 6, this is not the case. However, if I change the method to the following, I still get that warning:
public void DoSomething(List<string> list)
{
foreach (var item in list)
{
// ...
}
}
The reason is, that in this version, list is only enumerated once, so changing it to IEnumerable<string> will not automatically introduce another warning.
Now, if I change the first version manually to use an IEnumerable<string> instead of a List<string>, I will get that warning ("Possible multiple enumeration of IEnumerable") on both occurrences of list in the body of the method:
public void DoSomething(IEnumerable<string> list)
{
if (list.Any()) // <- here
{
// ...
}
foreach (var item in list) // <- and here
{
// ...
}
}
I understand, why, but I wonder, how to solve this warning, assuming, that the method really only needs an IEnumerable<T> and not a List<T>, because I just want to enumerate the items and I don't want to change the list.
Adding a list = list.ToList(); at the beginning of the method makes the warning go away:
public void DoSomething(IEnumerable<string> list)
{
list = list.ToList();
if (list.Any())
{
// ...
}
foreach (var item in list)
{
// ...
}
}
I understand, why that makes the warning go away, but it looks a bit like a hack to me...
Any suggestions, how to solve that warning better and still use the most general type possible in the method signature?
The following problems should all be solved for a good solution:
No call to ToList() inside the method, because it has a performance impact
No usage of ICollection<T> or even more specialized interfaces/classes, because they change the semantics of the method as seen from the caller.
No multiple iterations over an IEnumerable<T> and thus risking accessing a database multiple times or similar.
Note: I am aware that this is not a Resharper issue, and thus, I don't want to suppress this warning, but fix the underlying cause as the warning is legit.
UPDATE:
Please don't care about Any and the foreach. I don't need help in merging those statements to have only one enumeration of the enumerable.
It could really be anything in this method that enumerates the enumerable multiple times!

You should probably take an IEnumerable<T> and ignore the "multiple iterations" warning.
This message is warning you that if you pass a lazy enumerable (such as an iterator or a costly LINQ query) to your method, parts of the iterator will execute twice.

There is no perfect solution; choose one according to the situation.
Call enumerable.ToList(); you may optimize it by first trying "enumerable as List", as long as you don't modify the list
Iterate twice over the IEnumerable, but make that clear to the caller (document it)
Split it into two methods
Take a List to avoid the cost of "as"/ToList and the potential cost of double enumeration
The first solution (ToList) is probably the most "correct" for a public method that could be working on any Enumerable.
You can ignore Resharper issues, the warning is legit in a general case but may be wrong in your specific situation. Especially if the method is intended for internal usage and you have full control on callers.
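The first option (ToList with an "as" shortcut) could be sketched as follows. AsCollection and CountNonEmpty are hypothetical names; the second method only stands in for a body that needs more than one pass over its argument.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeSketch
{
    // Hypothetical helper: reuses the caller's collection when there is one,
    // and copies only when the source may be lazy.
    public static ICollection<T> AsCollection<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return source as ICollection<T> ?? source.ToList();
    }

    // Example method that keeps IEnumerable<string> in its signature.
    public static int CountNonEmpty(IEnumerable<string> list)
    {
        // A lazy source is enumerated exactly once, here.
        ICollection<string> items = list.AsCollection();
        if (items.Count == 0) return 0; // takes the place of list.Any()

        int count = 0;
        foreach (string item in items)
        {
            if (!string.IsNullOrEmpty(item)) count++;
        }
        return count;
    }
}
```

The trade-off is the one listed above: a lazy source is buffered in full, while a List<T> or array passed by the caller is used as-is and must not be modified inside the method.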

This class will give you a way to split the first item off of the enumeration and then have an IEnumerable for the rest of the enumeration without giving you a double enumeration, thus avoiding the potentially nasty performance hit. Its usage is like this (where T is whatever type you are enumerating):
var split = new SplitFirstEnumerable<T>(currentIEnumerable);
T firstItem = split.First;
IEnumerable<T> remaining = split.Remaining;
Here is the class itself:
/// <summary>
/// Use this class when you want to pull the first item off of an IEnumerable
/// and then enumerate over the remaining elements and you want to avoid the
/// warning about "possible double iteration of IEnumerable" AND without constructing
/// a list or other duplicate data structure of the enumerable. You construct
/// this class from your existing IEnumerable and then use its First and
/// Remaining properties for your algorithm.
/// </summary>
/// <typeparam name="T">The type of item you are iterating over; there are no
/// "where" restrictions on this type.</typeparam>
public class SplitFirstEnumerable<T>
{
private readonly IEnumerator<T> _enumerator;
/// <summary>
/// Constructor
/// </summary>
/// <remarks>Will throw an exception if there are zero items in enumerable or
/// if the enumerable is already advanced past the last element.</remarks>
/// <param name="enumerable">The enumerable that you want to split</param>
public SplitFirstEnumerable(IEnumerable<T> enumerable)
{
_enumerator = enumerable.GetEnumerator();
if (_enumerator.MoveNext())
{
First = _enumerator.Current;
}
else
{
throw new ArgumentException("Parameter 'enumerable' must have at least 1 element to be split.");
}
}
/// <summary>
/// The first item of the original enumeration, equivalent to calling
/// enumerable.First().
/// </summary>
public T First { get; private set; }
/// <summary>
/// The items of the original enumeration minus the first, equivalent to calling
/// enumerable.Skip(1).
/// </summary>
public IEnumerable<T> Remaining
{
get
{
while (_enumerator.MoveNext())
{
yield return _enumerator.Current;
}
}
}
}
This does presuppose that the IEnumerable has at least one element to start. If you want to do more of a FirstOrDefault type setup, you'll need to catch the exception that would otherwise be thrown in the constructor.

There exists a general solution to address both Resharper warnings: the lack of guarantee for repeatability of IEnumerable, and the List base class (or potentially expensive ToList() workaround).
Create a specialized class, e.g. "RepeatableEnumerable", implementing IEnumerable, with "GetEnumerator()" implemented according to the following logic outline:
Yield all items already collected so far from the inner list.
If the wrapped enumerator has more items,
While the wrapped enumerator can move to the next item,
Get the current item from the inner enumerator.
Add the current item to the inner list.
Yield the current item
Mark the inner enumerator as having no more items.
Add extension methods and appropriate optimizations where the wrapped parameter is already repeatable. Resharper will no longer flag the indicated warnings on the following code:
public void DoSomething(IEnumerable<string> list)
{
var repeatable = list.ToRepeatableEnumerable();
if (repeatable.Any()) // <- no warning here anymore.
// Further, this will read at most one item from list. A
// query (SQL LINQ) with 10,000 items, returning one item per second,
// will pass this block in 1 second, unlike the ToList() solution / hack.
{
// ...
}
foreach (var item in repeatable) // <- and no warning here anymore, either.
// Further, this will read in lazy fashion. In the 10,000 item, one
// per second, query scenario, this loop will process the first item immediately
// (because it was read already for Any() above), and then proceed to
// process one item every second.
{
// ...
}
}
With a little work, you can also turn RepeatableEnumerable into LazyList, a full implementation of IList. That's beyond the scope of this particular problem though. :)
UPDATE: Code implementation requested in comments -- not sure why the original PDL wasn't enough, but in any case, the following faithfully implements the algorithm I suggested (My own implementation implements the full IList interface; that is a bit beyond the scope I want to release here... :) )
public class RepeatableEnumerable<T> : IEnumerable<T>
{
readonly List<T> innerList;
IEnumerator<T> innerEnumerator;
public RepeatableEnumerable( IEnumerator<T> innerEnumerator )
{
this.innerList = new List<T>();
this.innerEnumerator = innerEnumerator;
}
public IEnumerator<T> GetEnumerator()
{
// 1. Yield all items already collected so far from the inner list.
foreach( var item in innerList ) yield return item;
// 2. If the wrapped enumerator has more items
if( innerEnumerator != null )
{
// 2A. while the wrapped enumerator can move to the next item
while( innerEnumerator.MoveNext() )
{
// 1. Get the current item from the inner enumerator.
var item = innerEnumerator.Current;
// 2. Add the current item to the inner list.
innerList.Add( item );
// 3. Yield the current item
yield return item;
}
// 3. Mark the inner enumerator as having no more items.
innerEnumerator.Dispose();
innerEnumerator = null;
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
// Add extension methods and appropriate optimizations where the wrapped parameter is already repeatable.
public static class RepeatableEnumerableExtensions
{
public static RepeatableEnumerable<T> ToRepeatableEnumerable<T>( this IEnumerable<T> items )
{
var result = ( items as RepeatableEnumerable<T> )
?? new RepeatableEnumerable<T>( items.GetEnumerator() );
return result;
}
}

I realize this question is old and already marked as answered, but I was surprised that nobody suggested manually iterating over the enumerator:
// NOTE: list is of type IEnumerable<T>.
// The name was taken from the OP's code.
// The using block disposes the enumerator when done, as foreach would.
using (var enumerator = list.GetEnumerator())
{
if (enumerator.MoveNext())
{
// Run your list.Any() logic here
...
do
{
var item = enumerator.Current;
// Run your foreach (var item in list) logic here
...
} while (enumerator.MoveNext());
}
}
Seems a lot more straightforward than the other answers here.

Generally speaking, what you need is some state object into which you can PUSH the items (within a foreach loop), and out of which you then get your final result.
The downside of the enumerable LINQ operators is that they actively enumerate the source instead of accepting items being pushed to them, so they don't meet your requirements.
If you e.g. just need the minimum and maximum values of a sequence of 1'000'000 integers which cost $1'000 worth of processor time to retrieve, you end up writing something like this:
public class MinMaxAggregator
{
private bool _any;
private int _min;
private int _max;
public void OnNext(int value)
{
if (!_any)
{
_min = _max = value;
_any = true;
}
else
{
if (value < _min) _min = value;
if (value > _max) _max = value;
}
}
public MinMax GetResult()
{
if (!_any) throw new InvalidOperationException("Sequence contains no elements.");
return new MinMax(_min, _max);
}
}
public static MinMax DoSomething(IEnumerable<int> source)
{
var aggr = new MinMaxAggregator();
foreach (var item in source) aggr.OnNext(item);
return aggr.GetResult();
}
In fact, you just re-implemented the logic of the Min() and Max() operators. Of course that's easy, but they are only examples of arbitrarily complex logic you might otherwise easily express in a LINQish way.
The solution came to me on yesterday's night walk: we need to PUSH... that's REACTIVE! All the beloved operators also exist in a reactive version built for the push paradigm. They can be chained together at will to whatever complexity you need, just as their enumerable counterparts.
So the min/max example boils down to:
public static MinMax DoSomething(IEnumerable<int> source)
{
// bridge over to the observable world
var connectable = source.ToObservable(Scheduler.Immediate).Publish();
// express the desired result there (note: connectable is observed by multiple observers)
var combined = connectable.Min().CombineLatest(connectable.Max(), (min, max) => new MinMax(min, max));
// subscribe
var resultAsync = combined.GetAwaiter();
// unload the enumerable into connectable
connectable.Connect();
// pick up the result
return resultAsync.GetResult();
}

Why not:
bool any = false;
foreach (var item in list)
{
any = true;
// ...
}
if(any)
{
//...
}
Update: Personally, I wouldn't drastically change the code just to get around a warning like this. I would just disable the warning and continue on. The warning is suggesting you change the general flow of the code to make it better; if addressing the warning doesn't make the code better (and arguably makes it worse), then the point of the warning is missed.
For example:
// ReSharper disable PossibleMultipleEnumeration
public void DoSomething(IEnumerable<string> list)
{
if (list.Any()) // <- here
{
// ...
}
foreach (var item in list) // <- and here
{
// ...
}
}
// ReSharper restore PossibleMultipleEnumeration

UIMS* - Fundamentally, there is no great solve. IEnumerable<T> used to be the "very basic thing that represents a bunch of things of the same type, so using it in method sigs is Correct." It has now also become a "thing that might evaluate behind the scenes, and might take a while, so now you always have to worry about that."
It's as if IDictionary suddenly were extended to support lazy loading of values, via a LazyLoader property of type Func<TKey,TValue>. Actually that'd be neat to have, but not so neat to be added to IDictionary, because now every time we receive an IDictionary we have to worry about that. But that's where we are.
So it would seem that "if a method takes an IEnumerable and evals it twice, always force eval via ToList()" is the best you can do. And nice work by JetBrains to give us this warning.
*(Unless I'm Missing Something . . . just made it up but it seems useful)

Be careful when accepting enumerables in your method. The "warning" for the base type is only a hint, the enumeration warning is a true warning.
However, your list will be enumerated at least twice, because you call Any() and then run a foreach. If you add a ToList(), the sequence will be enumerated three times, so remove the ToList().
I would suggest setting ReSharper's warning level for the base-type inspection to a hint. That way you still get a hint (green underline) and can quick-fix it (Alt+Enter), but there are no "warnings" in your file.
You should take care if enumerating the IEnumerable is an expensive action like loading something from file or database, or if you have a method which calculates values and uses yield return. In this case do a ToList() or ToArray() first to load/calculate all data only ONCE.
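As a rough illustration of the point above, here is a minimal sketch of what happens when a yield-return method is consumed with Any() followed by foreach, compared with materializing it once via ToList(). The names MultipleEnumerationDemo, LoadNames and SourceRuns are made up for this example; LoadNames merely stands in for an expensive file or database read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Counts how many times the (hypothetical) expensive source was started.
    public static int SourceRuns;

    // Stand-in for an expensive producer, e.g. a file or database read.
    public static IEnumerable<string> LoadNames()
    {
        SourceRuns++;
        yield return "alice";
        yield return "bob";
    }

    // Any() followed by foreach: the iterator body is started twice.
    public static int EnumerateLazily(IEnumerable<string> names)
    {
        int count = 0;
        if (names.Any())
        {
            foreach (var name in names) count++;
        }
        return count;
    }

    // Materialize once, then query the in-memory list as often as needed.
    public static int EnumerateMaterialized(IEnumerable<string> names)
    {
        List<string> cached = names.ToList();
        int count = 0;
        if (cached.Count > 0)
        {
            foreach (var name in cached) count++;
        }
        return count;
    }
}
```

Calling EnumerateLazily(LoadNames()) should leave SourceRuns at 2, while EnumerateMaterialized(LoadNames()) should start the source only once.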

You could use ICollection<T> (or IList<T>). It's less specific than List<T>, but doesn't suffer from the multiple-enumeration problem.
Still, I'd tend to use IEnumerable<T> in this case. You can also consider refactoring the code to enumerate only once.
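If you keep IEnumerable<T> in the signature, one possible middle ground is to copy the sequence only when it is not already an in-memory collection. The sketch below assumes that approach; EnumerableGuard, AsCollection and CountIfAny are hypothetical names, not framework APIs.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableGuard
{
    // Returns the argument itself when it is already an ICollection<T>;
    // otherwise copies it into a List<T> exactly once.
    public static ICollection<T> AsCollection<T>(IEnumerable<T> source)
    {
        return source as ICollection<T> ?? source.ToList();
    }

    // Example consumer: checks for emptiness and iterates, both against
    // the in-memory collection rather than the original sequence.
    public static int CountIfAny(IEnumerable<string> list)
    {
        ICollection<string> items = AsCollection(list);
        int seen = 0;
        if (items.Count > 0)
        {
            foreach (var item in items) seen++;
        }
        return seen;
    }
}
```

Arrays and List<T> pass through without a copy, while lazy sequences are enumerated exactly once.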

Use an IList as your parameter type rather than IEnumerable. IEnumerable has different semantics from List, whereas IList has the same semantics.
An IEnumerable could be backed by a non-seekable stream, which is why you get the warnings.

You can iterate only once:
public void DoSomething(IEnumerable<string> list)
{
bool isFirstItem = true;
foreach (var item in list)
{
if (isFirstItem)
{
isFirstItem = false;
// ...
}
// ...
}
}

There is something no one has mentioned yet (@Zebi). Any() already iterates while trying to find an element. If you call ToList(), it iterates as well, to create the list. The original idea behind IEnumerable is simply iteration; anything else triggers an iteration in order to do its work. You should try to do everything inside a single loop,
including your .Any() check.
If you pass a list of Action delegates to your method, you get cleaner code that iterates only once:
public void DoSomething(IEnumerable<string> list, params Action<string>[] actions)
{
foreach (var item in list)
{
for (int i = 0; i < actions.Length; i++)
{
actions[i](item);
}
}
}

Extract item in a list if Contains returns true

I have two lists A and B, at the beginning of my program, they are both filled with information from a database (List A = List B). My program runs, List A is used and modified, List B is left alone. After a while I reload List B with new information from the database, and then do a check with that against List A.
foreach (CPlayer player in ListA)
if (ListB.Contains(player))
-----
Firstly, the object player is created from a class, its main identifier is player.Name.
If the Name is the same, but the other variables are different, would the .Contains still return true?
class CPlayer
{
private string _Name;
public string Name { get { return _Name; } }
public CPlayer(string name) { _Name = name; }
}
At the ---- I need to use the item from ListB that causes the .Contains to return true, how do I do that?

The default behaviour of List.Contains is that it uses the default equality comparer. If your items are reference types this means that it will use an identity comparison unless your class provides another implementation via Equals.
If you are using .NET 3.5 then you can change your second line to this which will do what you want:
if (ListB.Any(x => x.Name == player.Name))
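To make the difference concrete, here is a minimal sketch, assuming a stripped-down CPlayer with no Equals override (the real class in the question has more members). ContainsDemo and its method names are invented for illustration. FindByName also returns the matching item from ListB, which is what the question asks for at the "-----" line.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's class, with no Equals override.
public class CPlayer
{
    public string Name;
    public CPlayer(string name) { Name = name; }
}

public static class ContainsDemo
{
    // Default comparison for a class without Equals: same instance only.
    public static bool ContainsInstance(List<CPlayer> listB, CPlayer player)
    {
        return listB.Contains(player);
    }

    // Name-based lookup; returns the matching item from listB, or null.
    public static CPlayer FindByName(List<CPlayer> listB, CPlayer player)
    {
        return listB.FirstOrDefault(x => x.Name == player.Name);
    }
}
```

With two distinct CPlayer instances that share a name, ContainsInstance reports false while FindByName still returns the instance stored in ListB.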
For .NET 2.0 you could implement Equals and GetHashCode for your class, but this might give undesirable behaviour in other situations where you don't want two player objects to compare equal if they have the same name but differ in other fields.
An alternative way is to adapt Jon Skeet's answer for .NET 2.0. Create a Dictionary<string, object> and fill it with the names of all players in listB. Then to test if a player with a certain name is in listB you can use dict.ContainsKey(name).
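A small variation on the dictionary idea, sketched here under the assumption that you also want the matching object back: store the players themselves as values (Dictionary<string, CPlayer> rather than Dictionary<string, object>) and use TryGetValue. The Score field, PlayerLookup, IndexByName and FindMatch are hypothetical names used only for this example, and the code avoids LINQ so it fits .NET 2.0.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the question's class; Score is a hypothetical extra field.
public class CPlayer
{
    public string Name;
    public int Score;
    public CPlayer(string name, int score) { Name = name; Score = score; }
}

public static class PlayerLookup
{
    // Index listB by name. If a name occurs twice, the later entry wins.
    public static Dictionary<string, CPlayer> IndexByName(IEnumerable<CPlayer> listB)
    {
        var byName = new Dictionary<string, CPlayer>();
        foreach (CPlayer player in listB)
        {
            byName[player.Name] = player;
        }
        return byName;
    }

    // Returns the player from listB with the same name, or null if none exists.
    public static CPlayer FindMatch(Dictionary<string, CPlayer> byName, CPlayer player)
    {
        CPlayer match;
        return byName.TryGetValue(player.Name, out match) ? match : null;
    }
}
```

Build the index once after reloading ListB, then call FindMatch for each player in ListA; each lookup is a hash lookup rather than a scan of the list.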

An alternative to Mark's suggestion is to build a set of names and use that:
HashSet<string> namesB = new HashSet<string>(ListB.Select(x => x.Name));
foreach (CPlayer player in ListA)
{
if (namesB.Contains(player.Name))
{
...
}
}

Assuming you are using the System.Collections.Generic.List class, if the CPlayer class does not implement IEquatable<T>, Contains will use the Equals method of the CPlayer class (it does not call GetHashCode) to check whether the list has a member that equals the argument. Assuming that implementation is OK for you, you could do something like
CPlayer listBItem = ListB.First(p => p.Equals(player));
to get the instance from ListB.

It sounds like this is what you need to accomplish:
For each player in list A, find each player in list B with the same name and bring both players into the same scope.
Here is an approach which joins the two lists in a query:
var playerPairs =
from playerA in ListA
join playerB in ListB on playerA.Name equals playerB.Name
select new { playerA, playerB };
foreach(var playerPair in playerPairs)
{
Console.Write(playerPair.playerA.Name);
Console.Write(" -> ");
Console.WriteLine(playerPair.playerB.Name);
}

If you want the .Contains method to match only on CPlayer.Name, then in the CPlayer class implement these methods:
public override bool Equals(object obj)
{
if (!(obj is CPlayer))
return false;
return Name == (obj as CPlayer).Name;
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
If you want the Name comparison to be case-insensitive, use this Equals method instead (and make GetHashCode case-insensitive as well, so that equal players still produce equal hash codes):
public override bool Equals(object obj)
{
if (!(obj is CPlayer))
return false;
return Name.Equals((obj as CPlayer).Name, StringComparison.OrdinalIgnoreCase);
}
If you do this, your .Contains call will work just as you want it.
Secondly, if you want to select this item in the list, do this:
var playerB = ListB[ListB.IndexOf(player)];
It uses the same .Equals method.
UPD:
This is probably a subjective statement, but you could also squeeze some performance out of it if your .Equals method compared the int hashes before doing the string comparison.
Looking at the .NET sources (Reflector FTW) I can see that seemingly only hash-based collections such as Hashtable use GetHashCode to improve their performance, instead of using .Equals to compare objects every single time. In the case of a small class like this, the equality comparer is simple: a single string comparison. If you were comparing all properties, though, then comparing two integers would be much faster (especially if they were cached :) )
List.Contains and List.IndexOf don't use the hash code; they use the .Equals method, hence I proposed checking the hash code inside it. It probably won't be anything noticeable, but when you're itching to shave off every single ms of execution (not always a good thing, but hey! :P ) this might help someone. Just saying... :)
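One caveat with the case-insensitive Equals shown earlier in this answer: GetHashCode should follow the same rule, otherwise hash-based collections (HashSet, Dictionary) may treat "Bob" and "BOB" as different players even though Equals says they match. The following is a minimal sketch of a consistent pair, assuming Name is never null and that the class has no other identifying members.

```csharp
using System;

public class CPlayer
{
    public string Name { get; private set; }
    public CPlayer(string name) { Name = name; }

    public override bool Equals(object obj)
    {
        var other = obj as CPlayer;
        return other != null
            && string.Equals(Name, other.Name, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: names differing only in case hash identically.
    // Assumes Name is not null.
    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Name);
    }
}
```

List<T>.Contains and IndexOf only need Equals, so the hash code matters mainly once the same objects end up in a HashSet or as Dictionary keys.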

