AsEnumerable and how it affects a yield and a SqlDataReader

I'm trying to understand what effect AsEnumerable() has on my data when iterating over it. I have a mock in-memory list. If I foreach over it after first calling ToList(), this forces evaluation and my printout looks like this (see the code at the bottom of this post for an explanation of the output):
entering da
yield return
yield return
yield return
exiting da
doing something to aaron
doing something to jeremy
doing something to brendan
All makes sense. The ToList() forces the yields in the repository to execute first into a list, then we get our foreach iteration. All good so far.
When I do the same except use AsEnumerable(), based on what I've read regarding IQueryable (I understand this isn't IQueryable), I would have thought this also forces evaluation, but it does not. It looks like this:
entering da
yield return
doing something to aaron
yield return
doing something to jeremy
yield return
doing something to brendan
exiting da
As it would if I never even called AsEnumerable(), so my question is:
Why does AsEnumerable behave differently for an in memory collection vs linq to sql and its IQueryable return type?
How would all this change if my repository used a SqlDataReader and did a yield return inside the reader loop (while calling Read())? Would the rows coming from SQL Server, which are buffered in the client's network buffer, be fully evaluated before the foreach executes? (Normally a yield here causes a "pause" in the repo while each row is processed by the foreach block.) I know that if I call ToList() first in this case, I can force evaluation of the SqlDataReader, so does AsEnumerable do the same here?
Note: I am not interested in whether putting yield into a SqlDataReader is a good idea, given it might hold the connection open, I've beaten this topic to death already :)
Here is my test code:
using System;
using System.Collections.Generic;
using System.Linq;
public class TestClient
{
    public void Execute()
    {
        var data = MockRepo.GetData();
        foreach (var p in data.AsEnumerable()) // or .ToList()
        {
            Console.WriteLine("doing something to {0}", p.Name);
        }
        Console.ReadKey();
    }
}
public class Person
{
    public Person(string name)
    {
        Name = name;
    }
    public string Name { get; set; }
}
public class MockRepo
{
    private static readonly List<Person> items = new List<Person>(3)
    {
        new Person("aaron"),
        new Person("jeremy"),
        new Person("brendan")
    };
    public static IEnumerable<Person> GetData()
    {
        Console.WriteLine("entering da");
        var enumerator = items.GetEnumerator();
        while (enumerator.MoveNext())
        {
            Console.WriteLine("yield return");
            yield return enumerator.Current;
        }
        Console.WriteLine("exiting da");
    }
}

AsEnumerable does nothing except change the expression type to IEnumerable<T>. When it's used in a query like this:
var query = db.Customers
    .Where(x => x.Foo)
    .AsEnumerable()
    .Where(x => x.Bar);
... that just means you'll use Queryable.Where for the first predicate (so that's converted to SQL), and Enumerable.Where for the second predicate (so that's executed in your .NET code).
It doesn't force evaluation. It doesn't do anything. It doesn't even check whether it's called on null.
See my Edulinq blog post on AsEnumerable for more information.
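To make the "it doesn't do anything" point concrete, here is a minimal sketch. The class name AsEnumerableDemo and the Produced counter are made up for illustration; the only library calls are Enumerable.AsEnumerable and Enumerable.ToList.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    // Counts how many items the iterator below has produced so far.
    public static int Produced;

    // A lazy source: nothing in the body runs until someone enumerates it.
    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 3; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        IEnumerable<int> source = Source();
        IEnumerable<int> same = source.AsEnumerable();

        // AsEnumerable hands back the very same object and enumerates nothing.
        Console.WriteLine(object.ReferenceEquals(source, same)); // True
        Console.WriteLine(Produced);                             // 0

        // ToList, by contrast, drains the iterator immediately.
        List<int> buffered = source.ToList();
        Console.WriteLine(Produced);                             // 3
        Console.WriteLine(buffered.Count);                       // 3
    }
}
```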

@Jon Skeet has already posted what AsEnumerable() does - it just changes the compile-time type. But why would you use it?
Essentially by changing the expression from an IQueryable to an IEnumerable you can now use Linq to Objects (instead of the IQueryable implementation by your database provider) without any restriction - there does not have to be an equivalent method on the database side, so you can freely perform object transformation, remote calls (if required) or any sort of string manipulation.
That said you will want to do all the filtering you can while you are still working on the database (IQueryable) - otherwise you would be bringing all these rows into memory which will cost you - and only then use AsEnumerable() to do your final transformations afterwards.
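A rough sketch of that filter-first, transform-later pattern follows. No database is involved: AsQueryable() over an in-memory list stands in for a real provider's table, and IsPalindrome is a hypothetical helper chosen only because it has no SQL equivalent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SwitchToLinqToObjects
{
    // Hypothetical helper with no SQL equivalent; stands in for any logic
    // a database provider could not translate.
    public static bool IsPalindrome(string s)
    {
        char[] reversed = s.ToCharArray();
        Array.Reverse(reversed);
        return s == new string(reversed);
    }

    public static List<string> Run()
    {
        // AsQueryable over a list stands in for a real provider's table.
        IQueryable<string> names =
            new List<string> { "anna", "bob", "carol", "ab" }.AsQueryable();

        return names
            .Where(n => n.Length > 2)      // Queryable.Where: expression tree
            .AsEnumerable()                // from here on: LINQ to Objects
            .Where(n => IsPalindrome(n))   // Enumerable.Where: plain delegate
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // anna, bob
    }
}
```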

According to the MSDN documentation:
The AsEnumerable(Of TSource)(IEnumerable(Of TSource)) method has no
effect other than to change the compile-time type of source from a
type that implements IEnumerable(Of T) to IEnumerable(Of T) itself.
It should not cause any evaluation, just hint that you want to use IEnumerable methods vs. some other implementation (IQueryable, etc.).

Related

C# Interface IEnumerable Any() without specifying generic types

I have cast
var info = property.Info;
object data = info.GetValue(obj);
...
var enumerable = (IEnumerable)data;
if (enumerable.Any()) ///Does not compile
{
}
if (enumerable.GetEnumerator().Current != null) // Run time error
{
}
and I would like to see whether this enumerable has any elements, using the LINQ Any() method. But unfortunately, even with a using directive for System.Linq, I can't.
How would I do this without specifying the generic type?

While you can't do this directly, you could do it via Cast:
if (enumerable.Cast<object>().Any())
That should always work, as any IEnumerable can be wrapped as an IEnumerable<object>. It will end up boxing the first element if it's actually an IEnumerable<int> or similar, but it should work fine. Unlike most LINQ methods, Cast and OfType target IEnumerable rather than IEnumerable<T>.
You could write your own subset of extension methods like the LINQ ones but operating on the non-generic IEnumerable type if you wanted to, of course. Implementing LINQ to Objects isn't terribly hard - you could use my Edulinq project as a starting point, for example.
There are cases where you could implement Any(IEnumerable) slightly more efficiently than using Cast - for example, taking a shortcut if the target implements the non-generic ICollection interface. At that point, you wouldn't need to create an iterator or take the first element. In most cases that won't make much performance difference, but it's the kind of thing you could do if you were optimizing.
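As a sketch of both ideas (the Cast fallback and the ICollection shortcut), something like the following would do. The name HasAny is hypothetical, not a framework method, and the Lazy helper exists only to produce a sequence that is not an ICollection.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class UntypedAny
{
    // Hypothetical name: HasAny is not a framework method.
    public static bool HasAny(this IEnumerable source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Shortcut: arrays, ArrayList, List<T> etc. implement the non-generic
        // ICollection, so Count answers the question without an enumerator.
        if (source is ICollection collection) return collection.Count > 0;

        // General case: wrap as IEnumerable<object> and let LINQ do the work.
        return source.Cast<object>().Any();
    }

    // An iterator that is an IEnumerable but not an ICollection.
    public static IEnumerable Lazy(int count)
    {
        for (int i = 0; i < count; i++) yield return i;
    }

    public static void Main()
    {
        Console.WriteLine(new int[0].HasAny());          // False
        Console.WriteLine(new ArrayList { 1 }.HasAny()); // True
        Console.WriteLine(Lazy(2).HasAny());             // True
        Console.WriteLine(Lazy(0).HasAny());             // False
    }
}
```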

One method is to use foreach, as noted in IEnumerable "Remarks". It also provides details on the additional methods off of the result of GetEnumerator.
bool hasAny = false;
foreach (object i in (IEnumerable)(new int[1] /* IEnumerable of any type */)) {
hasAny = true;
break;
}
(Which is itself easily transferable to an Extension method.)

Your attempt to use GetEnumerator().Current tried to get the current value of an enumerator that had not yet been moved to the first position. It would also have given the wrong result if the first item existed but was null. What you could have done (and what the Any() in Enumerable does) is see whether it was possible to move to that first item; i.e. is there a first item to move to:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
internal static class UntypedLinq
{
    public static bool Any(this IEnumerable source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        IEnumerator ator = source.GetEnumerator();
        // Unfortunately unlike IEnumerator<T>, IEnumerator does not implement
        // IDisposable. (A design flaw fixed when IEnumerator<T> was added).
        // We need to test whether disposal is required or not.
        if (ator is IDisposable disp)
        {
            using (disp)
            {
                return ator.MoveNext();
            }
        }
        return ator.MoveNext();
    }
    // Not completely necessary. Causes any typed enumerables to be handled by the existing Any
    // in Linq via a short method that will be inlined.
    public static bool Any<T>(this IEnumerable<T> source) => Enumerable.Any(source);
}

decorate IEnumerable without looping

I need to create an IEnumerable of DocumentSearch objects from an IQueryable.
The following code causes the database to load the entire result, which makes my app slow.
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
    var enumerator = documents.GetEnumerator();
    while (enumerator.MoveNext())
    {
        yield return new DocumentSearch(enumerator.Current);
    }
}

The natural way of writing this is:
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
    return documents.Select(doc => new DocumentSearch(doc));
}
When you call one of the IEnumerable extension methods like Select, Where, OrderBy etc, you are still adding to the recipe for the results that will be returned. When you try to access an element of an IEnumerable (as in your example), the result set must be resolved at that time.
For what it's worth, your while loop would be more naturally written as a foreach loop, though it should have the same semantics about when the query is executed.
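The "recipe" behaviour can be seen in a small in-memory sketch. Document and DocumentSearch here are hypothetical stand-ins for the question's types, the Created counter is added only to observe when objects are built, and the parameter is an IEnumerable<Document> so that the example runs without a database; whether a real provider can translate the constructor call is a separate matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
public class Document { public string Title { get; set; } }

public class DocumentSearch
{
    public static int Created;           // counts constructor calls
    public string Title { get; }
    public DocumentSearch(Document doc) { Title = doc.Title; Created++; }
}

public static class DocumentSearchDemo
{
    public static IEnumerable<DocumentSearch> BuildDocumentSearch(IEnumerable<Document> documents)
    {
        // Select only describes the projection; nothing is built here.
        return documents.Select(doc => new DocumentSearch(doc));
    }

    public static void Main()
    {
        var docs = new List<Document>
        {
            new Document { Title = "a" },
            new Document { Title = "b" },
            new Document { Title = "c" }
        };

        IEnumerable<DocumentSearch> results = BuildDocumentSearch(docs);
        Console.WriteLine(DocumentSearch.Created); // 0: nothing built yet

        DocumentSearch first = results.First();
        Console.WriteLine(DocumentSearch.Created); // 1: only the first item
        Console.WriteLine(first.Title);            // a
    }
}
```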

Does Distinct() method keep original ordering of sequence intact?

I want to remove duplicates from list, without changing order of unique elements in the list.
Jon Skeet & others have suggested to use the following:
list = list.Distinct().ToList();
Reference:
How to remove duplicates from a List<T>?
Remove duplicates from a List<T> in C#
Is it guaranteed that the order of unique elements would be same as before? If yes, please give a reference that confirms this as I couldn't find anything on it in documentation.

It's not guaranteed, but it's the most obvious implementation. It would be hard to implement in a streaming manner (i.e. such that it returned results as soon as it could, having read as little as it could) without returning them in order.
You might want to read my blog post on the Edulinq implementation of Distinct().
Note that even if this were guaranteed for LINQ to Objects (which personally I think it should be) that wouldn't mean anything for other LINQ providers such as LINQ to SQL.
The level of guarantees provided within LINQ to Objects is a little inconsistent sometimes, IMO. Some optimizations are documented, others not. Heck, some of the documentation is flat out wrong.

In the .NET Framework 3.5, disassembling the CIL of the Linq-to-Objects implementation of Distinct() shows that the order of elements is preserved - however this is not documented behavior.
I did a little investigation with Reflector. After disassembling System.Core.dll, Version=3.5.0.0 you can see that Distinct() is an extension method, which looks like this:
public static class Enumerable
{
    public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        return DistinctIterator<TSource>(source, null);
    }
}
So, the interesting part here is DistinctIterator, which implements IEnumerable and IEnumerator. Here is a simplified implementation of this IEnumerator (goto and labels removed):
private sealed class DistinctIterator<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
{
    private bool _enumeratingStarted;
    private IEnumerator<TSource> _sourceListEnumerator;
    public IEnumerable<TSource> _source;
    private HashSet<TSource> _hashSet;
    private TSource _current;
    private bool MoveNext()
    {
        if (!_enumeratingStarted)
        {
            _sourceListEnumerator = _source.GetEnumerator();
            _hashSet = new HashSet<TSource>();
            _enumeratingStarted = true;
        }
        while (_sourceListEnumerator.MoveNext())
        {
            TSource element = _sourceListEnumerator.Current;
            if (!_hashSet.Add(element))
                continue;
            _current = element;
            return true;
        }
        return false;
    }
    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }
    TSource IEnumerator<TSource>.Current
    {
        get { return _current; }
    }
    object IEnumerator.Current
    {
        get { return _current; }
    }
}
As you can see - enumerating goes in order provided by source enumerable (list, on which we are calling Distinct). Hashset is used only for determining whether we already returned such element or not. If not, we are returning it, else - continue enumerating on source.
So this implementation of Distinct() returns elements in exactly the order in which the source collection provides them, although, as noted above, that is not documented behavior.

According to the documentation the sequence is unordered.

Yes, Enumerable.Distinct preserves order. Assuming the method is lazy ("yields distinct values as soon as they are seen"), it follows automatically. Think about it.
The .NET Reference source confirms. It returns a subsequence, the first element in each equivalence class.
foreach (TSource element in source)
    if (set.Add(element)) yield return element;
The .NET Core implementation is similar.
Frustratingly, the documentation for Enumerable.Distinct is confused on this point:
The result sequence is unordered.
I can only imagine they mean "the result sequence is not sorted." You could implement Distinct by presorting then comparing each element to the previous, but this would not be lazy as defined above.
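A quick check of the behaviour described here, as a sketch only. It shows what the LINQ to Objects implementations discussed above do, not something the documentation promises; the class name DistinctOrderDemo is made up.

```csharp
using System;
using System.Linq;

public static class DistinctOrderDemo
{
    public static int[] Run()
    {
        int[] source = { 3, 1, 3, 2, 1 };
        return source.Distinct().ToArray();
    }

    public static void Main()
    {
        // With the LINQ to Objects implementations discussed above this
        // prints the first occurrence of each value, in source order.
        Console.WriteLine(string.Join(", ", Run())); // 3, 1, 2
    }
}
```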

A bit late to the party, but no one really posted the best complete code to accomplish this IMO, so let me offer this (which is essentially identical to what .NET Framework does with Distinct())*:
public static IEnumerable<T> DistinctOrdered<T>(this IEnumerable<T> items)
{
    HashSet<T> returnedItems = new HashSet<T>();
    foreach (var item in items)
    {
        if (returnedItems.Add(item))
            yield return item;
    }
}
This guarantees the original order without reliance on undocumented or assumed behavior. I also believe this is more efficient than using multiple LINQ methods though I'm open to being corrected here.
(*) The .NET Framework source uses an internal Set class, which appears to be substantively identical to HashSet.

By default the Distinct LINQ operator uses the Equals method, but you can supply your own IEqualityComparer<T> to decide when two objects are equal, implementing custom logic in its GetHashCode and Equals methods.
Remember that:
GetHashCode should not do CPU-heavy comparison (e.g. use only a few obvious basic checks). It is used first, to tell whether two objects are surely different (different hash codes are returned) or potentially the same (same hash code). In the latter case, when two objects have the same hash code, the framework goes on to call the Equals method as the final decision about the equality of the given objects.
Once you have MyType and MyTypeEqualityComparer classes, the following code does not guarantee that the sequence keeps its order:
var cmp = new MyTypeEqualityComparer();
var lst = new List<MyType>();
// add some to lst
var q = lst.Distinct(cmp);
In the sci library I implemented an extension method, DistinctKeepOrder, that ensures a set of Vector3D keeps its order.
The relevant code follows:
/// <summary>
/// support class for DistinctKeepOrder extension
/// </summary>
public class Vector3DWithOrder
{
    public int Order { get; private set; }
    public Vector3D Vector { get; private set; }
    public Vector3DWithOrder(Vector3D v, int order)
    {
        Vector = v;
        Order = order;
    }
}
public class Vector3DWithOrderEqualityComparer : IEqualityComparer<Vector3DWithOrder>
{
    Vector3DEqualityComparer cmp;
    public Vector3DWithOrderEqualityComparer(Vector3DEqualityComparer _cmp)
    {
        cmp = _cmp;
    }
    public bool Equals(Vector3DWithOrder x, Vector3DWithOrder y)
    {
        return cmp.Equals(x.Vector, y.Vector);
    }
    public int GetHashCode(Vector3DWithOrder obj)
    {
        return cmp.GetHashCode(obj.Vector);
    }
}
In short, Vector3DWithOrder encapsulates the type and an order integer, while Vector3DWithOrderEqualityComparer encapsulates the original type's comparer.
And this is the helper method that ensures the order is maintained:
/// <summary>
/// retrieve distinct of given vector set ensuring to maintain given order
/// </summary>
public static IEnumerable<Vector3D> DistinctKeepOrder(this IEnumerable<Vector3D> vectors, Vector3DEqualityComparer cmp)
{
    var ocmp = new Vector3DWithOrderEqualityComparer(cmp);
    return vectors
        .Select((w, i) => new Vector3DWithOrder(w, i))
        .Distinct(ocmp)
        .OrderBy(w => w.Order)
        .Select(w => w.Vector);
}
Note: further research could find a more general way (using interfaces) and a more optimized one (without encapsulating the object).
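One possible shape for such a general version is sketched below. It is not taken from the sci library: the generic signature, the optional comparer parameter, and the helper name Iterate are all assumptions made for illustration. It keeps a HashSet of what has already been returned, so no wrapper object or re-sort is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedDistinct
{
    // Hypothetical generic variant; the name and signature are illustrative.
    public static IEnumerable<T> DistinctKeepOrder<T>(
        this IEnumerable<T> items, IEqualityComparer<T> comparer = null)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        return Iterate(items, comparer);
    }

    private static IEnumerable<T> Iterate<T>(IEnumerable<T> items, IEqualityComparer<T> comparer)
    {
        // HashSet.Add returns false for an element it already holds, so each
        // value is yielded only the first time it is seen, in source order.
        var seen = new HashSet<T>(comparer); // null means the default comparer
        foreach (T item in items)
        {
            if (seen.Add(item)) yield return item;
        }
    }

    public static void Main()
    {
        var words = new[] { "b", "A", "a", "B", "c" };
        var result = words.DistinctKeepOrder(StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(", ", result)); // b, A, c
    }
}
```

Splitting the argument check from the iterator body means a null source fails at the call site rather than on first enumeration.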

This depends heavily on your LINQ provider. With LINQ to Objects you can rely on the internal source code of Distinct, which leads one to assume that the original order is preserved.
However, for other providers that translate to some kind of SQL, for example, that isn't necessarily the case, as an ORDER BY clause usually comes after any aggregation (such as DISTINCT). So if your code is this:
myArray.OrderBy(x => x.anothercol).GroupBy(x => x.mycol);
this is translated to something similar to the following in SQL:
SELECT * FROM mytable GROUP BY mycol ORDER BY anothercol;
This obviously groups your data first and sorts it afterwards. Now you are at the mercy of the DBMS's own logic for executing that. On some DBMSs this isn't even allowed. Imagine the following data:
mycol anothercol
1 2
1 1
1 3
2 1
2 3
When executing myArray.OrderBy(x => x.anothercol).GroupBy(x => x.mycol) we expect the following result:
mycol anothercol
1 1
2 1
But the DBMS may aggregate the anothercol column so that the value of the first row is always used, resulting in the following data:
mycol anothercol
1 2
2 1
which after ordering will result in this:
mycol anothercol
2 1
1 2
This is similar to the following:
SELECT mycol, First(anothercol) from mytable group by mycol order by anothercol;
which is the complete reverse of the order you expected.
As you can see, the execution plan may vary depending on the underlying provider. This is why there's no guarantee about that in the docs.
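For comparison, here is what LINQ to Objects does with the sample data above; a minimal sketch in which anonymous types stand in for the table rows and the class name GroupOrderDemo is made up. In LINQ to Objects, OrderBy is a stable sort and GroupBy yields groups in the order their keys first appear, so the expected result is what comes out.

```csharp
using System;
using System.Linq;

public static class GroupOrderDemo
{
    public static string[] Run()
    {
        var rows = new[]
        {
            new { mycol = 1, anothercol = 2 },
            new { mycol = 1, anothercol = 1 },
            new { mycol = 1, anothercol = 3 },
            new { mycol = 2, anothercol = 1 },
            new { mycol = 2, anothercol = 3 }
        };

        return rows
            .OrderBy(x => x.anothercol)   // stable sort in LINQ to Objects
            .GroupBy(x => x.mycol)        // groups appear in first-seen key order
            .Select(g => g.Key + " " + g.First().anothercol)
            .ToArray();
    }

    public static void Main()
    {
        foreach (string line in Run()) Console.WriteLine(line);
        // 1 1
        // 2 1
    }
}
```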

How to make this linq efficient

I have this code snippet where we get a collection from COM Dll
public BOCollection SelectedObjects
{
    get
    {
        IMSICDPInterfacesLib.IJMonikerElements oIJMonikerElements;
        oIJMonikerElements = m_oIJSelectSet.Elements as IMSICDPInterfacesLib.IJMonikerElements;
        BOCollection oBusinessObjects = new BOCollection(oIJMonikerElements);
        return oBusinessObjects;
    }
}
Now BOCollection does implement IEnumerable. So would it be better to change it to
public IEnumerable<BusinessObject> SelectedObjects
So as to get the iterator goodness ? Or is there another way ?
thanks
Sunit

Are you wanting to return IEnumerable so you get deferred execution? First off, you wouldn't want to do this in a property, as I'm sure FxCop will yell at you for that. Here's how I suggest you change things so you can benefit from both deferred execution and LINQ.
Change the m_oIJSelectSet.Elements property to a method that returns IEnumerable like so:
public IEnumerable<IJMonikerElements> GetElements() {
    // Do some magic here to determine which elements are selected
    return (from e in this.allElements where e.IsSelected select e).AsEnumerable();
    // This could also be a complicated loop
    // while (someCondition()) {
    //     bool isSelected = false;
    //     var item = this.allItems[i++];
    //     // Complicated logic to determine if item is selected
    //     if (isSelected) {
    //         yield return item;
    //     }
    // }
}
public IEnumerable<BusinessObject> GetSelectedObjects() {
    return m_oIJSelectSet.GetElements().Cast<BusinessObject>();
}
Now, you'll have complete deferred execution and LINQ support.

If BOCollection implements IEnumerable, then you've already got the iterator goodness. Just throw it in a for or foreach loop.

The problem with IEnumerable<T> is yes, it will give you "Linq goodness", but the lowest common denominator of Linq goodness. Better to return IList<T> or even IQueryable<T> (if you can do this).
For example, if somebody wanted to get the 4th element, IEnumerable<T> doesn't make sense if you are already storing the objects in an array or list.
To get IQueryable<T> from a List<T> do this:
IQueryable<int> query = list.AsQueryable();

Does it load the data from database?

Assume we have a method like this:
public IEnumerable<T> FirstMethod()
{
    var entities = from t in context.Products
                   where {some conditions}
                   select t;
    foreach( var entity in entities )
    {
        entity.SomeProperty = {SomeValue};
        yield return entity;
    }
}
where context is a DataContext that is generated by Linq to SQL designer.
Does "FirstMethod" load the data into memory from database (because of the foreach loop) or will it still defer-load it until another foreach loop that doesn't have "yield return" is found in another method like the following?
public void SecondMethod()
{
    foreach( var item in FirstMethod() )
    {
        {Do Something}
    }
}

The latter (deferred); FirstMethod is an iterator block (because of yield return); this means that you have a chain of iterators. Nothing is read until the final caller starts iterating the data; then each record is read in turn during the final caller's foreach (between which the connection/command is open).
The using that surrounds foreach (under the bonnet) ensures that the connection is closed if the foreach is abandoned half-way-through.
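That disposal behaviour can be observed without a database. In this sketch the try/finally inside the iterator stands in for opening and closing a connection; the names EarlyExitDemo, Rows and Log are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EarlyExitDemo
{
    public static readonly List<string> Log = new List<string>();

    // The try/finally stands in for opening and closing a connection.
    public static IEnumerable<int> Rows()
    {
        Log.Add("open");
        try
        {
            for (int i = 1; i <= 3; i++) yield return i;
        }
        finally
        {
            Log.Add("close");
        }
    }

    public static void Main()
    {
        foreach (int row in Rows())
        {
            Log.Add("row " + row);
            break; // abandon the loop after the first row
        }
        // foreach disposes the enumerator on exit, which runs the finally block.
        Console.WriteLine(string.Join(", ", Log)); // open, row 1, close
    }
}
```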
If you want to load the data earlier, use .ToList() or .ToArray() to buffer the data locally - but note that this breaks "composition" - i.e. the caller can no longer add extra Where etc clauses (which they can if it returns a raw IQueryable<T>).
Re your question:
public IEnumerable<T> FirstMethod()
{
    var entities = from t in context.Products
                   where {some conditions}
                   select t;
    foreach( var entity in entities.AsEnumerable() )
    {
        entity.SomeProperty = {SomeValue};
        yield return entity;
    }
}
The AsEnumerable is the key here; it ends the composable IQueryable<T> chain, and uses LINQ-to-Objects for the rest.

In short, it doesn't load until SecondMethod performs the iteration...
Read here for more...

Loading is deferred until the GetEnumerator method is called on the entities query and that won't happen until the GetEnumerator method is called on the IEnumerable<T> you're returning.
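The chain of deferred iterators can be traced with an in-memory stand-in. This is only a sketch: Query plays the role of the LINQ to SQL query and logs when a "row" is actually read, and the Log list and class name DeferredDemo are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    public static readonly List<string> Log = new List<string>();

    // Stand-in for the LINQ to SQL query: logs when a row is actually read.
    public static IEnumerable<int> Query()
    {
        foreach (int id in new[] { 1, 2 })
        {
            Log.Add("read " + id);
            yield return id;
        }
    }

    // Mirrors FirstMethod: an iterator block wrapped around another sequence.
    public static IEnumerable<int> FirstMethod()
    {
        foreach (int id in Query())
        {
            yield return id * 10;
        }
    }

    public static void Main()
    {
        IEnumerable<int> items = FirstMethod();
        Log.Add("called FirstMethod");

        // Mirrors SecondMethod: rows are read one at a time, on demand.
        foreach (int item in items) Log.Add("use " + item);

        Console.WriteLine(string.Join(", ", Log));
        // called FirstMethod, read 1, use 10, read 2, use 20
    }
}
```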
