I have a method that takes an IEnumerable, filters it further, and loops through the filtered collection to modify one property.
I am observing very strange behaviour: while the method loops through the filtered IEnumerable<BusinessEntity>, after a few iterations (I've not counted exactly how many), one of the items in it disappears.
private async Task<bool> UpdateSomeValue(IEnumerable<BusinessEntity> entities, BusinessEntity entityToDelete)
{
    // Filter the IEnumerable
    var entitiesToUpdateSequence = entities
        .Where(f => f.Sequence > entityToDelete.Sequence);
    if (entitiesToUpdateSequence.Any())
    {
        var testList = new List<BusinessEntity>(entitiesToUpdateSequence);
        Debug.WriteLine(entitiesToUpdateSequence.Count()); // 5
        // During this loop, after a few iterations, one item gets deleted
        foreach (var entity in testList)
        {
            entity.Sequence -= 1;
        }
        Debug.WriteLine(entitiesToUpdateSequence.Count()); // 4
        return await _someRepo.UpdateEntitySequence(entityToDelete.Id1, entityToDelete.Id2, testList);
    }
    return true;
}
This method is called like this:
var entities = await entitiesTask.ConfigureAwait(false);
var entityToDelete = entities.Single(f => f.Key.Equals("someValue"));
var updated = await UpdateSomeValue(entities, entityToDelete);
and that's it; there's no other reference to the entities collection, so it cannot be modified from any other thread.
I've temporarily found a workaround by copying the filtered IEnumerable into a List and then using the List for further operations (the List's contents remain the same after the loop).
What may be causing this issue?
Check out the documentation on Enumerable.Where. Specifically, the Remarks.
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
Which means that when you call Where you're not necessarily getting back an object such as a List or Array that just has X number of items in it. You're getting back an object that knows how to filter the IEnumerable<T> you called Where on, based on the predicate you provided. When you iterate that object, such as with a foreach loop or a call to Enumerable.Count(), each item in the source IEnumerable<T> is evaluated against the predicate you provided, and only the items that satisfy that predicate are returned.
Since the predicate you're providing checks the Sequence property, and you're modifying that property inside the first foreach loop, the second time you iterate entitiesToUpdateSequence fewer items match the predicate you provided and so you get a lower count. If you were to increment Sequence instead of decrement it, you might end up with a higher count the second time you iterate entitiesToUpdateSequence.
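The effect is easy to reproduce without any entities or repositories at all. A minimal sketch with plain ints (all names here are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4, 5 };

// Deferred: this is a recipe, not a snapshot. The predicate re-runs on every enumeration.
IEnumerable<int> bigNumbers = numbers.Where(n => n > 1);

var before = bigNumbers.Count(); // 4 (items 2, 3, 4, 5 match)

// Change the data the predicate depends on, just like decrementing Sequence in the loop.
numbers[1] = 0; // the element that used to be 2

var after = bigNumbers.Count(); // 3 (the query re-ran against the modified source)

Console.WriteLine($"{before} -> {after}");
```

The "deleted" item was never removed from anything; it simply stopped matching the predicate the next time the query ran.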
Related
I'm trying to upload multiple files for an ASP.NET website. As such, I have the following simplified example:
public ContentResult UploadFiles(IList<HttpPostedFileBase> files)
{
    if (files == null || !files.Any())
    {
        // Nothing uploaded.
        return Content("No files were uploaded.");
    }

    myService.UploadCsvFilesToDatabaseAsync(files.Select(x => x.InputStream));

    return Content("Success!");
}
Notice this: myService.UploadCsvFilesToDatabaseAsync(files.Select(x => x.InputStream))
I do not want to enumerate through the files at this point, but later on, when I'm inside that method.
So, does files.Select(x => x.InputStream) do any enumerating at this point? Or does it just pass in a new collection, which contains the start of each input stream, ready to be enumerated?
Clarification: I do not want to read any data from the files at this point, but a little later on, inside that method.
So, does files.Select(x => x.InputStream) do any enumerating at this point?
No. Quoting from the documentation:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach
So until you actually enumerate it, the source list isn't processed. A simple test:
List<int> intlist = new List<int>() { 1, 2, 3 };
var result = intlist.Select(x => x);
intlist.Add(12);
foreach (var item in result)
{
    Console.WriteLine(item);
}
This picks up even the element that was added after the LINQ expression was created. Had the query been eagerly evaluated, the element 12 would not have been included.
Results In:
1
2
3
12
Update: For a complete example of both deferred execution AND already-executed, have a look at this .NET fiddle example.
I need to create an IEnumerable of DocumentSearch objects from an IQueryable.
The following code causes the database to load the entire result which makes my app slow.
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
    var enumerator = documents.GetEnumerator();
    while (enumerator.MoveNext())
    {
        yield return new DocumentSearch(enumerator.Current);
    }
}
The natural way of writing this is:
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
    return documents.Select(doc => new DocumentSearch(doc));
}
When you call one of the IEnumerable extension methods like Select, Where, OrderBy, etc., you are still adding to the recipe for the results that will be returned. Only when you try to access an element of the IEnumerable (as in your example) must the result set be resolved.
For what it's worth, your while loop would be more naturally written as a foreach loop, though it should have the same semantics about when the query is executed.
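For reference, a sketch of that foreach version; the yield return keeps it just as deferred as the Select version. (Document and DocumentSearch are the question's types, stubbed here as records so the sketch compiles.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the question's types, just so the sketch is self-contained.
public record Document(string Title);
public record DocumentSearch(Document Source);

public static class SearchBuilder
{
    // foreach version of the question's while loop; still lazy because of yield return.
    public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
    {
        foreach (var document in documents)
        {
            yield return new DocumentSearch(document);
        }
    }
}
```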
I have two tables: Transactions and TransactionAgents. TransactionAgents has a foreign key to Transactions called TransactionID. Pretty standard.
I also have this code:
BrokerManagerDataContext db = new BrokerManagerDataContext();

var transactions = from t in db.Transactions
                   where t.SellingPrice != 0
                   select t;

var taAgents = from ta in db.TransactionAgents
               select ta;

foreach (var transaction in transactions)
{
    foreach (var agent in taAgents)
    {
        agent.AgentCommission = ((transaction.CommissionPercent / 100) * (agent.CommissionPercent / 100) * transaction.SellingPrice) - agent.BrokerageSplit;
    }
}

dataGridView1.DataSource = taAgents;
Basically, a TransactionAgent has a property/column named AgentCommission, which is null for all TransactionAgents in my database.
My goal is to perform the math you see in the foreach(var agent in taAgents) to patch up the value for each agent so that it isn't null.
Oddly, when I run this code and breakpoint on agent.AgentCommission = (formula), it shows the value being calculated for AgentCommission and the object being updated, but after it displays in my datagrid (used only for testing), it does not show the calculated value.
So, to me, it seems that the property isn't being permanently set on the object. What's more, if I persist this newly updated object back to the database with an update, I doubt the calculated AgentCommission will be set there.
Without having my table set up the same way, is there anyone that can look at the code and see why I am not retaining the property's value?
IEnumerable<T>s do not guarantee that updated values will persist across enumerations. For instance, a List will return the same set of objects on every iteration, so if you update a property, it will be saved across iterations. However, many other implementations of IEnumerables return a new set of objects each time, so any changes made will not persist.
If you need to store and update the results, pull the IEnumerable<T> down to a List<T> using .ToList() or project it into a new IEnumerable<T> using .Select() with the changes applied.
To specifically apply that to your code, it would look like this:
var transactions = (from t in db.Transactions
                    where t.SellingPrice != 0
                    select t).ToList();

var taAgents = (from ta in db.TransactionAgents
                select ta).ToList();

foreach (var transaction in transactions)
{
    foreach (var agent in taAgents)
    {
        agent.AgentCommission = ((transaction.CommissionPercent / 100) * (agent.CommissionPercent / 100) * transaction.SellingPrice) - agent.BrokerageSplit;
    }
}

dataGridView1.DataSource = taAgents;
Specifically, the problem is that each time you access the IEnumerable, it enumerates over the collection. In this case, the collection is a call to the database. In the first part, you're getting the values from the database and updating them. In the second part, you're getting the values from the database again and setting that as the datasource (or, pedantically, you're setting the enumerator as the datasource, and then that is getting the values from the database).
Use .ToList() or similar to keep the results in memory, and access the same collection every time.
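The general behaviour can be seen without a database at all. A toy sketch, using single-element arrays as stand-in mutable objects (nothing here is from the question's code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3 };

// Deferred projection: a NEW wrapper object is built on every enumeration.
var projected = source.Select(v => new int[] { v });

projected.First()[0] = 99;            // mutate the wrapper from one enumeration...
var lostValue = projected.First()[0]; // ...the next enumeration builds a fresh wrapper: still 1

// Materialized: the same wrapper objects are returned on every pass.
var materialized = source.Select(v => new int[] { v }).ToList();
materialized.First()[0] = 99;
var keptValue = materialized.First()[0]; // 99: the change survived

Console.WriteLine($"{lostValue} vs {keptValue}");
```

The mutation "sticks" only when every pass hands back the same objects, which is what ToList() guarantees.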
Assuming you are using LINQ to SQL, if EnableObjectTracking is false, then the objects will be constructed new every time the query is run. Otherwise, you would be getting the same object instances each time and your changes would survive. However, like others have shown, instead of having the query execute multiple times, cache the results in a list. Not only will you get what you want working, you'll have fewer database round trips.
I found that I had to locate the item in the list that I wanted to modify, extract a copy, modify the copy (by incrementing its count property), remove the original from the list, and add the modified copy.
var x = stats.Where(d => d.word == s).FirstOrDefault();
var statCount = stats.IndexOf(x);
x.count++;
stats.RemoveAt(statCount);
stats.Add(x);
It is helpful to rewrite your LINQ expression using lambdas so that we can consider the code in more explicit terms.
// Original code from question
var taAgents = from ta in db.TransactionAgents
               select ta;

// Rewritten to explicitly call attention to what Select() is actually doing
var taAgents = db.TransactionAgents.Select(ta => new TransactionAgents(/* database row's data */));
In the rewritten code, we can clearly see that Select() is constructing a new object based on each row returned from the database. What's more, this object construction occurs every time the IEnumerable taAgents is iterated through.
So, explained more concretely, if there are 5 TransactionAgents rows in the database, in the following example, the TransactionAgents() constructor is called a total of 10 times.
// Assume there are 5 rows in the TransactionAgents table
var taAgents = from ta in db.TransactionAgents
               select ta;

// foreach will iterate through the IEnumerable, thus calling the TransactionAgents() constructor 5 times
foreach (var ta in taAgents)
{
    Console.WriteLine($"first iteration through taAgents - element {ta}");
}
// These first 5 TransactionAgents objects are now out of scope and are destroyed by the GC

// foreach will iterate through the IEnumerable, thus calling the TransactionAgents() constructor 5 MORE times
foreach (var ta in taAgents)
{
    Console.WriteLine($"second iteration through taAgents - element {ta}");
}
// These second 5 TransactionAgents objects are now out of scope and are destroyed by the GC
As we can see, all 10 of our TransactionAgents objects were created by the lambda in our Select() method, and do not exist outside of the scope of the foreach statement.
I'm building a repository method (Entity Framework) to take in a collection of ids supplied by checkboxes in a form as part of a CMS, and updating a lookup table (entity set) that relates topics to publications.
I have this method in a respository:
public void AttachToTopics(int pubId, IQueryable<int> associatedTopics, IQueryable<int> topicsSubset, int primaryTopicId)
{
    // EVERYTHING IS FINE IF I INSERT A MANUAL COLLECTION OF int LIKE THIS:
    // var priorAssociatedTopics = new[] { 2 }.AsQueryable();
    // BUT WHAT I REALLY NEED TO WORK IS THIS:
    IQueryable<int> priorAssociatedTopics = ListTopicIdsForPublication(pubId);

    var priorAssociatedTopicsToExamine = priorAssociatedTopics.Intersect(topicsSubset);
    var topicsToAdd = associatedTopics.Intersect(topicsSubset).Except(priorAssociatedTopicsToExamine);

    foreach (var topicToAdd in topicsToAdd)
        AttachToTopic(pubId, topicToAdd);

    foreach (var topicToRemove in priorAssociatedTopicsToExamine.Except(associatedTopics))
        DetachFromTopic(pubId, topicToRemove);
}
AttachToTopics chokes on the first foreach loop, yielding this error message:
This method supports the LINQ to Entities infrastructure and is not intended to be used directly from your code.
But the problem is really with the first line: the repository method called on that line provides the appropriately typed collection into priorAssociatedTopics, and Intellisense has no problem with explicitly typing it as IQueryable (normally, I'd use var), and the debugger shows that this variable holds a collection of integers.
public IQueryable<int> ListTopicIdsForPublication(int pubId)
{
    var topics = from x in DataContext.TopicPublications
                 where x.PublicationId == pubId
                 select x;
    return topics.Select(t => t.Id);
}
However, back in attachToTopics, my topicsToAdd collection doesn't get populated, and its Results View in debug holds the aforementioned error message.
Curiously, if I switch in a manually generated IQueryable collection of ints for priorAssociatedTopics (see comment in code, above), the foreach loop works fine. So I believe I need to find some other way to get priorAssociatedTopics populated with ints from a method call in my repository.
Any clue out there?
Is there any reason that ListTopicIdsForPublication can't return an IEnumerable<int> or an IList<int> in this case? If not, adding a .ToList() at the end of topics.Select(t => t.Id) will ensure that the query gets run at that point.
It's not so much that it's IQueryable<int> that's causing the problem, but rather IQueryable<int> from DataContext.TopicPublications. It looks like it's losing its contextual data information, and that's why you're getting the exception.
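A sketch of what that eager variant could look like, using an in-memory stand-in for DataContext.TopicPublications (the row shape here is assumed, not taken from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape standing in for the question's TopicPublications table.
public record TopicPublication(int Id, int PublicationId);

public static class Repo
{
    // Eager variant: ToList() runs the query while the source is still in scope,
    // so the caller gets plain ints with no query provider attached.
    public static IList<int> ListTopicIdsForPublication(
        IQueryable<TopicPublication> topicPublications, int pubId)
    {
        var topics = from x in topicPublications
                     where x.PublicationId == pubId
                     select x;
        return topics.Select(t => t.Id).ToList();
    }
}
```

Because the list is already materialized, later Intersect/Except calls run as plain LINQ to Objects and never touch the original provider.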
Given the following LINQ Statement(s), which will be more efficient?
ONE:
public List<Log> GetLatestLogEntries()
{
    var logEntries = from entry in db.Logs
                     select entry;
    return logEntries.ToList().Take(10);
}
TWO:
public List<Log> GetLatestLogEntries()
{
    var logEntries = from entry in db.Logs
                     select entry;
    return logEntries.Take(10).ToList();
}
I am aware that .ToList() executes the query immediately.
The first version wouldn't even compile - because the return value of Take is an IEnumerable<T>, not a List<T>. So you'd need it to be:
public List<Log> GetLatestLogEntries()
{
    var logEntries = from entry in db.Logs
                     select entry;
    return logEntries.ToList().Take(10).ToList();
}
That would fetch all the data from the database and convert it to a list, then take the first 10 entries, then convert it to a list again.
Getting the Take(10) to occur in the database (i.e. the second form) certainly looks a heck of a lot cheaper to me...
Note that there's no Queryable.ToList() method - you'll end up calling Enumerable.ToList() which will fetch all the entries. In other words, the call to ToList doesn't participate in SQL translation, whereas Take does.
Also note that using a query expression here doesn't make much sense either. I'd write it as:
public List<Log> GetLatestLogEntries()
{
    return db.Logs.Take(10).ToList();
}
Mind you, you may want an OrderBy call - otherwise it'll just take the first 10 entries it finds, which may not be the latest ones...
Your first option won't work, because .Take(10) converts it to IEnumerable<Log>. Your return type is List<Log>, so you would have to do return logEntries.ToList().Take(10).ToList(), which is more inefficient.
By doing .ToList().Take(10), you are forcing the .Take(10) to be LINQ to objects, while the other way the filter could be passed on to the database or other underlying data source. In other words, if you first do .ToList(), ALL the objects have to be transferred from the database and allocated in memory. THEN you filter to the first 10. If you're talking about millions of database rows (and objects) you can imagine how this is VERY inefficient and not scalable.
The second one will also run immediately because you have .ToList(), so no difference there.
The second version will be more efficient (in both time and memory usage). For example, imagine that you have a sequence containing 1,000,000 items:
The first version iterates through all 1,000,000 items, adding them to a list as it goes. Then, finally, it will take the first 10 items from that large list.
The second version only needs to iterate the first 10 items, adding them to a list as it goes. (The remaining 999,990 items don't even need to be considered.)
How about this?
I have 5000 records in "items"
version 1:
IQueryable<T> items = Items; // my items
items = ApplyFilteringCriteria(items, filter); // my filter BL
items = ApplySortingCriteria(items, sortBy, sortDir); // my sorting BL
items = items.Skip(0);
items = items.Take(25);
return items.ToList();
this took : 20 sec on server
version 2:
IQueryable<T> items = Items; // my items
items = ApplyFilteringCriteria(items, filter); // my filter BL
items = ApplySortingCriteria(items, sortBy, sortDir); // my sorting BL
List<T> x = items.ToList();
items = x.Skip(0).ToList();
items = x.Take(25).ToList();
return x;
this took : 1 sec on server
What do you think now? Any idea why?
The second option.
The first will evaluate the entire enumerable, slurping it into a List; then it sets up the iterator that will walk through the first ten objects and exit.
The second sets up the Take() iterator first, so no matter what happens after that, only 10 objects will ever be evaluated and sent to the "downstream" processing (in this case the ToList(), which takes those ten elements and returns them as the concrete List).
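One way to see that concretely is to count how many source items each version actually evaluates. A sketch with a lazy, counting source (the million-item sequence is made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluated = 0;

// A lazy source that counts every item it is asked to produce.
IEnumerable<int> Source()
{
    for (int i = 0; i < 1_000_000; i++)
    {
        evaluated++;
        yield return i;
    }
}

// Take(10) first: the downstream ToList() only ever pulls 10 items.
var lazyTen = Source().Take(10).ToList();
Console.WriteLine(evaluated); // 10

evaluated = 0;

// ToList() first: all 1,000,000 items are materialized before Take(10) runs.
var eagerTen = Source().ToList().Take(10).ToList();
Console.WriteLine(evaluated); // 1000000
```

Both versions return the same ten elements; the difference is purely in how much of the source gets evaluated along the way.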