Assume we have a method like this:
public IEnumerable<T> FirstMethod()
{
    var entities = from t in context.Products
                   where {some conditions}
                   select t;
    foreach (var entity in entities)
    {
        entity.SomeProperty = {SomeValue};
        yield return entity;
    }
}
where context is a DataContext generated by the LINQ to SQL designer.
Does "FirstMethod" load the data from the database into memory (because of the foreach loop), or does it still defer loading until another foreach loop without "yield return" is found in another method, like the following?
public void SecondMethod()
{
    foreach (var item in FirstMethod())
    {
        {Do Something}
    }
}
The latter (deferred). FirstMethod is an iterator block (because of yield return), which means that you have a chain of iterators. Nothing is read until the final caller starts iterating the data; then each record is read in turn during the final caller's foreach, and the connection/command stays open for the duration of that foreach.
The using that foreach generates (under the bonnet) disposes the enumerator, which ensures that the connection is closed if the foreach is abandoned halfway through.
If you want to load the data earlier, use .ToList() or .ToArray() to buffer the data locally, but note that this breaks "composition": the caller can no longer add extra Where etc. clauses (which they can if it returns a raw IQueryable<T>).
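The deferral can be observed without a database. The following is a minimal sketch in which a logging iterator, the hypothetical `Query`, stands in for the LINQ to SQL query and `log` records the order of events. Nothing runs until the caller's foreach starts, after which reads and caller work interleave. Buffering with ToList() instead reads every row before the loop body runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Hypothetical stand-in for the LINQ to SQL query: logs when it starts and as it reads each "row".
IEnumerable<int> Query()
{
    log.Add("query started");
    for (int i = 1; i <= 3; i++)
    {
        log.Add($"read {i}");
        yield return i;
    }
}

// Plays the role of FirstMethod: an iterator block layered on top of the query.
IEnumerable<int> FirstMethod()
{
    foreach (var entity in Query())
    {
        yield return entity * 10; // stands in for "entity.SomeProperty = {SomeValue}"
    }
}

var chain = FirstMethod();              // only builds the iterator chain
int loggedBeforeIteration = log.Count;  // 0: no row has been read yet

foreach (var item in chain)             // SecondMethod's foreach drives the whole chain
{
    log.Add($"caller got {item}");
}
var deferredLog = log.ToList();

log.Clear();
foreach (var item in FirstMethod().ToList()) // ToList reads every row before the loop body runs
{
    log.Add($"caller got {item}");
}
var bufferedLog = log.ToList();

Console.WriteLine(string.Join(", ", deferredLog));
// query started, read 1, caller got 10, read 2, caller got 20, read 3, caller got 30
Console.WriteLine(string.Join(", ", bufferedLog));
// query started, read 1, read 2, read 3, caller got 10, caller got 20, caller got 30
```

With a real DataContext, the "read" steps correspond to rows being pulled from the open data reader.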
Re your question:
public IEnumerable<T> FirstMethod()
{
    var entities = from t in context.Products
                   where {some conditions}
                   select t;
    foreach (var entity in entities.AsEnumerable())
    {
        entity.SomeProperty = {SomeValue};
        yield return entity;
    }
}
The AsEnumerable is the key here; it ends the composable IQueryable<T> chain, and uses LINQ-to-Objects for the rest.
In short, it doesn't load until SecondMethod performs the iteration...
Loading is deferred until the GetEnumerator method is called on the entities query and that won't happen until the GetEnumerator method is called on the IEnumerable<T> you're returning.
public async IAsyncEnumerable<Entity> FindByIds(List<string> ids)
{
    List<List<string>> splitIdsList = ids.Split(5);
    var entityList = splitIdsList.Select(x => FindByIdsQuery(x)).ToList();
    foreach (var entities in entityList)
    {
        await foreach (var entity in entities)
        {
            yield return entity;
        }
    }
}
private async IAsyncEnumerable<Entity> FindByIdsQuery(List<string> ids)
{
    var result = await Connection.QueryAsync(query, new { ids });
    foreach (var entity in result)
    {
        yield return entity;
    }
}
If I send 25 ids to this function, the first FindByIdsQuery takes 5000ms and the other 4 FindByIdsQuery calls take 100ms each. This solution then won't output any entities until after 5000ms. Is there any solution that will start outputting entities as soon as there are any to output? Or could you do something like Task.WhenAny does for Tasks?
To be clear: any of the 5 queries can take 5000ms.
From your comments, I understand your problem. What you are basically looking for is some kind of "SelectMany" operator. This operator would start awaiting all of the IAsyncEnumerables at once and return items in the order in which they arrive, irrespective of the order of the source async enumerables.
I was hoping that the default AsyncEnumerable.SelectMany would do this, but I found that it does not. It goes through the source enumerables and enumerates each whole inner enumerable before continuing on to the next. So I hacked together a SelectMany variant that awaits all inner async enumerables at the same time. Be warned: I do not guarantee correctness or safety. There is zero error handling.
/// <summary>
/// Starts all inner IAsyncEnumerables and returns items from all of them in the order in which they arrive.
/// </summary>
public static async IAsyncEnumerable<TItem> SelectManyAsync<TItem>(IEnumerable<IAsyncEnumerable<TItem>> source)
{
    // get enumerators from all inner IAsyncEnumerables
    var enumerators = source.Select(x => x.GetAsyncEnumerator()).ToList();
    var runningTasks = new List<Task<(IAsyncEnumerator<TItem>, bool)>>();
    try
    {
        // start all inner IAsyncEnumerables
        foreach (var asyncEnumerator in enumerators)
        {
            runningTasks.Add(MoveNextWrapped(asyncEnumerator));
        }
        // while there are any running tasks
        while (runningTasks.Any())
        {
            // get the next finished task and remove it from the list
            var finishedTask = await Task.WhenAny(runningTasks);
            runningTasks.Remove(finishedTask);
            // get the result from the finished IAsyncEnumerator
            var (asyncEnumerator, hasItem) = await finishedTask;
            // if it has an item, return it and put it back as running for the next item
            if (hasItem)
            {
                yield return asyncEnumerator.Current;
                runningTasks.Add(MoveNextWrapped(asyncEnumerator));
            }
        }
    }
    finally
    {
        // dispose even if the consumer stops early (still no error handling:
        // MoveNextAsync calls that are still pending are not awaited first)
        foreach (var asyncEnumerator in enumerators)
        {
            await asyncEnumerator.DisposeAsync();
        }
    }
}
/// <summary>
/// Helper method that returns a Task with a tuple of the IAsyncEnumerator and the result of its MoveNextAsync.
/// </summary>
private static async Task<(IAsyncEnumerator<TItem>, bool)> MoveNextWrapped<TItem>(IAsyncEnumerator<TItem> asyncEnumerator)
{
    var res = await asyncEnumerator.MoveNextAsync();
    return (asyncEnumerator, res);
}
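You can then use it to merge all the enumerables instead of the first foreach (FindByIds then has to drop its async modifier and return the IAsyncEnumerable<Entity> directly, because an async iterator cannot return a value):
var entities = SelectManyAsync(splitIdsList.Select(x => FindByIdsQuery(x)));
return entities;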
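The following is a minimal runnable sketch that checks the interleaving without a database. FakeQuery is a hypothetical stand-in for FindByIdsQuery that waits a fixed delay and then yields its ids. SelectManyAsync is repeated as a local function (same algorithm as above, with disposal in a finally block) so that the snippet is self-contained. The delays are arbitrary. The point is that the fast sources are emitted before the slow one finishes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for FindByIdsQuery: waits delayMs (a slow or fast "query"), then yields its ids.
async IAsyncEnumerable<string> FakeQuery(List<string> ids, int delayMs)
{
    await Task.Delay(delayMs);
    foreach (var id in ids)
    {
        yield return id;
    }
}

// Same merging algorithm as SelectManyAsync, written as a local function.
async IAsyncEnumerable<T> SelectManyAsync<T>(IEnumerable<IAsyncEnumerable<T>> source)
{
    // Pairs an enumerator with the result of its MoveNextAsync (the role of MoveNextWrapped).
    async Task<(IAsyncEnumerator<T> Enumerator, bool HasItem)> MoveNextWrapped(IAsyncEnumerator<T> e)
        => (e, await e.MoveNextAsync());

    var enumerators = source.Select(x => x.GetAsyncEnumerator()).ToList();
    var running = enumerators.Select(x => MoveNextWrapped(x)).ToList();
    try
    {
        while (running.Count > 0)
        {
            var finished = await Task.WhenAny(running); // whichever source produced something first
            running.Remove(finished);
            var (enumerator, hasItem) = await finished;
            if (hasItem)
            {
                yield return enumerator.Current;
                running.Add(MoveNextWrapped(enumerator)); // ask that source for its next item
            }
        }
    }
    finally
    {
        foreach (var toDispose in enumerators)
        {
            await toDispose.DisposeAsync();
        }
    }
}

var order = new List<string>();
var sources = new List<IAsyncEnumerable<string>>
{
    FakeQuery(new List<string> { "slow-1", "slow-2" }, 600),
    FakeQuery(new List<string> { "fast-1" }, 50),
    FakeQuery(new List<string> { "fast-2" }, 150),
};
await foreach (var item in SelectManyAsync(sources))
{
    order.Add(item);
}
Console.WriteLine(string.Join(", ", order)); // expected: fast-1, fast-2, slow-1, slow-2
```

Although the slow source is listed first, its items come out last, so a consumer starts receiving entities after about 50ms instead of waiting for the slowest query.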
The problem is that your code makes them wait. There is NO sense in an async foreach here, because you are not actually doing anything async.
You do this:
var entityList = splitIdsList.Select(x => FindByIdsQuery(x)).ToList();
This is the part of the query that could run async, but it does not, because you materialize the whole result set into a list. You then go on to loop over it asynchronously, but at that point all results are already in memory.
The way to get async is simply to get rid of ToList. Feed the query straight into the foreach; do not materialize it into memory. The async foreach should hit the EF-level query (not the query result) so you can process information as you get it from the database. ToList effectively bypasses this.
Also understand that EF cannot efficiently handle multiple id lookups. The only possible way for it to do so is to put them into an array and use Contains, which becomes a SQL "IN" clause: terribly inefficient for larger numbers, as it can force a table scan. The efficient SQL way would be to load them into a table-valued variable with statistics and use a join, but there is no way to do that in EF; it is one of its limitations. The SQL limitations of long IN clauses are well documented. The limitations on the EF side are not, but they are still there.
I have a method that takes an IEnumerable, filters it further and loops through the filtered collection to modify one property.
I am observing a very weird behaviour. While the method loops through the filtered IEnumerable<Entity>, after a few iterations (I haven't counted exactly how many), one of the items in it gets deleted.
private async Task<bool> UpdateSomeValue(IEnumerable<BusinessEntity> entities, BusinessEntity entityToDelete)
{
    // Filter the IEnumerable
    var entitiesToUpdateSequence = entities
        .Where(f => f.Sequence > entityToDelete.Sequence);
    if (entitiesToUpdateSequence.Any())
    {
        var testList = new List<BusinessEntity>(entitiesToUpdateSequence);
        Debug.WriteLine(entitiesToUpdateSequence.Count()); // 5
        // During this loop, after a few iterations, one item gets deleted
        foreach (var entity in testList)
        {
            entity.Sequence -= 1;
        }
        Debug.WriteLine(entitiesToUpdateSequence.Count()); // 4
        return await _someRepo.UpdateEntitySequence(entityToDelete.Id1, entityToDelete.ID2, testList);
    }
    return await Task.FromResult(true);
}
This method is called like this:
var entities = await entitiesTask.ConfigureAwait(false);
var entityToDelete = entities.Single(f => f.Key.Equals("someValue"));
var updated = await UpdateSomeValue(entities, entityToDelete);
And that's it; there's no other reference to the entities collection, so it cannot be modified from any other thread.
I've temporarily found a workaround by copying the filtered IEnumerable into a List and then using the List for further operations (the List's content remains the same after the loop).
What may be causing this issue?
Check out the documentation on Enumerable.Where. Specifically, the Remarks.
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
Which means that when you call Where you're not necessarily getting back an object such as a List or Array that just has X number of items in it. You're getting back an object that knows how to filter the IEnumerable<T> you called Where on, based on the predicate you provided. When you iterate that object, such as with a foreach loop or a call to Enumerable.Count() each item in the source IEnumerable<T> is evaluated against the predicate you provided and only the items that satisfy that predicate are returned.
Since the predicate you're providing checks the Sequence property, and you're modifying that property inside the first foreach loop, the second time you iterate entitiesToUpdateSequence fewer items match the predicate you provided and so you get a lower count. If you were to increment Sequence instead of decrement it, you might end up with a higher count the second time you iterate entitiesToUpdateSequence.
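The effect can be reproduced without the question's types. In this sketch, the BusinessEntity objects are replaced by hypothetical keys whose Sequence values live in a dictionary so that they can be changed in place. The filtered query is re-evaluated on every Count() call, while the List snapshot keeps the items it captured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the BusinessEntity objects: each key's Sequence value is kept in a
// dictionary so it can be changed in place.
var sequence = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2, ["c"] = 3, ["d"] = 4, ["e"] = 5 };
int deletedSequence = 2; // plays the role of entityToDelete.Sequence

// Deferred: this is only a recipe, re-run against the current Sequence values on every enumeration.
var toUpdate = sequence.Keys.Where(k => sequence[k] > deletedSequence);

int countBefore = toUpdate.Count();  // 3 (c, d, e)

var snapshot = toUpdate.ToList();    // evaluated once, like testList in the question
foreach (var key in snapshot)
{
    sequence[key] -= 1;              // c: 3 -> 2, d: 4 -> 3, e: 5 -> 4
}

int countAfter = toUpdate.Count();   // 2: c's Sequence is now 2, which fails "> 2"
int snapshotCount = snapshot.Count;  // 3: the List keeps the items it captured

Console.WriteLine($"{countBefore} {countAfter} {snapshotCount}"); // 3 2 3
```

Nothing is deleted. The item at the boundary simply stops satisfying the predicate, which is why materializing once with ToList() (or filtering on a value that the loop does not modify) gives stable results.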
When using a foreach loop with a nested condition inside, I always write it in the following way:
foreach (RadioButton item in listOfRadioButtons)
{
    if (item.IsChecked == true)
    {
        // something
    }
}
But I've installed ReSharper and it suggests to change this loop to the following form (removing the if and using a lambda):
foreach (RadioButton item in listOfRadioButtons.Where(item => item.IsChecked == true))
{
    // something
}
In my experience, the ReSharper way will loop two times: one to generate the filtered IEnumerable, and after to loop the results of the .Where query again.
Am I correct? If so, why is ReSharper suggesting this? In my opinion, the first is also more reliable.
Note: The IsChecked property of the WPF RadioButton is a nullable bool (bool?), so it needs == true, .Value, or a cast to bool inside a condition to produce a bool.
In my experience, the ReSharper way will loop two times: one to generate the filtered IEnumerable, and after to loop the results of the .Where query again.
Nope, it will loop only once. Where does not loop over your collection; it only creates an iterator that will be used to enumerate your collection. Here is roughly what the LINQ solution looks like:
using (var iterator = listOfRadioButtons.Where(rb => rb.IsChecked == true).GetEnumerator())
{
    while (iterator.MoveNext())
    {
        RadioButton item = iterator.Current;
        // something
    }
}
Your original code is better for performance: you avoid creating a delegate and passing it to an instance of WhereEnumerableIterator, and then executing the delegate for each item in the source sequence. But note, as @dcastro pointed out, that the difference is really small and not worth worrying about until you have to optimize this particular loop.
The solution suggested by ReSharper is (maybe) better for readability. I personally like a simple if condition in a loop.
UPDATE: The Where iterator can be simplified to the following (some interfaces are omitted):
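using System;
using System.Collections;
using System.Collections.Generic;

public class WhereEnumerableIterator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> _enumerator;
    private readonly Func<T, bool> _predicate;

    public WhereEnumerableIterator(IEnumerable<T> source, Func<T, bool> predicate)
    {
        _predicate = predicate;
        _enumerator = source.GetEnumerator();
    }

    public bool MoveNext()
    {
        // pull items from the source one at a time until one matches the predicate
        while (_enumerator.MoveNext())
        {
            if (_predicate(_enumerator.Current))
            {
                Current = _enumerator.Current;
                return true; // the source is not read further until the next MoveNext call
            }
        }
        return false;
    }

    public T Current { get; private set; }
    object IEnumerator.Current => Current;

    public void Reset() => throw new NotSupportedException();

    public void Dispose()
    {
        if (_enumerator != null)
            _enumerator.Dispose();
    }
}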
The main idea here is that it enumerates the original source only when you ask it to move to the next item. The iterator then moves to the next item in the original source and checks whether it matches the predicate. If a match is found, it returns the current item and puts the enumeration of the source on hold.
So, until you ask this iterator for items, it will not enumerate the source. If you call ToList() on this iterator, it will enumerate the source sequence and return all matched items, which will be saved to a new list.
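One way to check that the Where version makes a single pass is to count the work. In this hypothetical sketch, a bool sequence stands in for listOfRadioButtons (true meaning IsChecked). Counters record how often the source enumeration starts, how many items it produces, and how many times the predicate runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerationsStarted = 0;
int itemsProduced = 0;
int predicateCalls = 0;

// Hypothetical stand-in for listOfRadioButtons; each bool plays the role of IsChecked.
IEnumerable<bool> Source()
{
    enumerationsStarted++;           // runs when an enumeration of the source actually begins
    foreach (var isChecked in new[] { true, false, true, false })
    {
        itemsProduced++;
        yield return isChecked;
    }
}

var filtered = Source().Where(checkedState => { predicateCalls++; return checkedState; });
int afterWhere = enumerationsStarted; // 0: Where only built an iterator

int bodyRuns = 0;
foreach (var item in filtered)
{
    bodyRuns++;                      // "// something"
}

Console.WriteLine($"{afterWhere} {enumerationsStarted} {itemsProduced} {predicateCalls} {bodyRuns}"); // 0 1 4 4 2
```

The source is started once and each of its 4 items is produced once. The predicate runs once per item, and the loop body runs only for the 2 matches, which is the same amount of source traversal as the if-inside-foreach version.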
I need to create an IEnumerable of DocumentSearch objects from an IQueryable.
The following code causes the database to load the entire result, which makes my app slow.
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
    var enumerator = documents.GetEnumerator();
    while (enumerator.MoveNext())
    {
        yield return new DocumentSearch(enumerator.Current);
    }
}
The natural way of writing this is:
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
    return documents.Select(doc => new DocumentSearch(doc));
}
When you call one of the IEnumerable extension methods like Select, Where, OrderBy, etc., you are still adding to the recipe for the results that will be returned. When you try to access an element of an IEnumerable (as in your example), the result set must be resolved at that time.
For what it's worth, your while loop would be more naturally written as a foreach loop, though it should have the same semantics about when the query is executed.
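For comparison, here is a sketch of both versions side by side. Strings stand in for Document, a "search:" prefix stands in for the DocumentSearch constructor, and an in-memory AsQueryable() source replaces the database (all hypothetical). The foreach version keeps the deferred semantics of the while loop and also disposes the enumerator. The Select version returns an object that is still an IQueryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The asker's while loop written as foreach: still deferred, and foreach disposes the enumerator.
IEnumerable<string> BuildWithForeach(IQueryable<string> documents)
{
    foreach (var doc in documents)
    {
        yield return "search:" + doc; // hypothetical stand-in for new DocumentSearch(doc)
    }
}

// The Select version: the projection becomes part of the query expression.
IEnumerable<string> BuildWithSelect(IQueryable<string> documents)
    => documents.Select(doc => "search:" + doc);

// In-memory stand-in for the database-backed IQueryable<Document>.
var docs = new List<string> { "a", "b" }.AsQueryable();

var viaForeach = BuildWithForeach(docs).ToList();
var viaSelect = BuildWithSelect(docs).ToList();
bool selectIsQueryable = BuildWithSelect(docs) is IQueryable<string>;   // true
bool foreachIsQueryable = BuildWithForeach(docs) is IQueryable<string>; // false

Console.WriteLine(string.Join(", ", viaForeach)); // search:a, search:b
```

Both produce the same items. Only the Select version leaves the projection in the query that the provider sees.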
I'm trying to understand what effect AsEnumerable() has on my data when iterating over it. I have a mock in-memory list. If I foreach over it after first calling ToList(), this forces evaluation and my printout looks like this (see the code at the bottom of this post to explain the output):
entering da
yield return
yield return
yield return
exiting da
doing something to aaron
doing something to jeremy
doing something to brendan
All makes sense. The ToList() forces the yields in the repository to execute first into a list, then we get our foreach iteration. All good so far.
When I do the same except use AsEnumerable(), based on what I've read regarding IQueryable (I understand this isn't IQueryable), I would have thought this also forces evaluation, but it does not. It looks like this:
entering da
yield return
doing something to aaron
yield return
doing something to jeremy
yield return
doing something to brendan
exiting da
As it would if I never even called AsEnumerable(), so my question is:
Why does AsEnumerable behave differently for an in-memory collection vs. LINQ to SQL and its IQueryable return type?
How would all this change if my repository used a SqlDataReader and did a yield return inside the reader loop (while calling its Read() method)? Would the rows coming from SQL Server that are buffered in the client's network buffer be fully evaluated before the foreach executes? (Normally a yield here causes a "pause" in the repo while each row is processed by the foreach block.) I know that if I call ToList() first in this case, I can force evaluation of the SqlDataReader, so does AsEnumerable do the same here?
Note: I am not interested in whether putting yield into a SqlDataReader is a good idea, given it might hold the connection open, I've beaten this topic to death already :)
Here is my test code:
using System;
using System.Collections.Generic;
using System.Linq;

public class TestClient
{
    public void Execute()
    {
        var data = MockRepo.GetData();
        foreach (var p in data.AsEnumerable()) //or .ToList()
        {
            Console.WriteLine("doing something to {0}", p.Name);
        }
        Console.ReadKey();
    }
}
public class Person
{
    public Person(string name)
    {
        Name = name;
    }
    public string Name { get; set; }
}
public class MockRepo
{
    private static readonly List<Person> items = new List<Person>(3)
    {
        new Person("aaron"),
        new Person("jeremy"),
        new Person("brendan")
    };
    public static IEnumerable<Person> GetData()
    {
        Console.WriteLine("entering da");
        var enumerator = items.GetEnumerator();
        while (enumerator.MoveNext())
        {
            Console.WriteLine("yield return");
            yield return enumerator.Current;
        }
        Console.WriteLine("exiting da");
    }
}
AsEnumerable does nothing except change the expression type to IEnumerable<T>. When it's used in a query like this:
var query = db.Customers
    .Where(x => x.Foo)
    .AsEnumerable()
    .Where(x => x.Bar);
... that just means you'll use Queryable.Where for the first predicate (so that's converted to SQL), and Enumerable.Where for the second predicate (so that's executed in your .NET code).
It doesn't force evaluation. It doesn't do anything. It doesn't even check whether it's called on null.
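Here is a small sketch of that behaviour, using an in-memory AsQueryable() list as a hypothetical stand-in for db.Customers, so nothing is actually translated to SQL. AsEnumerable hands back the very same object. The first Where stays on the IQueryable, and the second Where produces an ordinary LINQ to Objects iterator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for db.Customers (hypothetical data: each "customer" is just an int).
IQueryable<int> customers = new List<int> { 1, 2, 3, 4, 5, 6 }.AsQueryable();

// AsEnumerable returns its argument unchanged: no copy, no evaluation.
bool sameObject = ReferenceEquals(customers, customers.AsEnumerable()); // true

var query = customers
    .Where(x => x > 1)        // Queryable.Where: becomes part of the expression tree
    .AsEnumerable()
    .Where(x => x % 2 == 0);  // Enumerable.Where: an ordinary delegate run in .NET code

bool stillQueryable = query is IQueryable<int>;                        // false
bool nullPassesThrough = Enumerable.AsEnumerable<int>(null!) == null;  // true: no null check

Console.WriteLine(string.Join(",", query)); // 2,4,6
```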
See my Edulinq blog post on AsEnumerable for more information.
@Jon Skeet has already posted what AsEnumerable() does: it just changes the compile-time type. But why would you use it?
Essentially by changing the expression from an IQueryable to an IEnumerable you can now use Linq to Objects (instead of the IQueryable implementation by your database provider) without any restriction - there does not have to be an equivalent method on the database side, so you can freely perform object transformation, remote calls (if required) or any sort of string manipulation.
That said you will want to do all the filtering you can while you are still working on the database (IQueryable) - otherwise you would be bringing all these rows into memory which will cost you - and only then use AsEnumerable() to do your final transformations afterwards.
According to the MSDN documentation:
The AsEnumerable(Of TSource)(IEnumerable(Of TSource)) method has no effect other than to change the compile-time type of source from a type that implements IEnumerable(Of T) to IEnumerable(Of T) itself.
It should not cause any evaluation, just hint that you want to use IEnumerable methods vs. some other implementation (IQueryable, etc.).