IEnumerable<T> and .Where Linq method behaviour? - c#

I thought I knew everything about IEnumerable<T>, but I just ran into a case I cannot explain. When we call the .Where LINQ method on an IEnumerable, execution is deferred until the object is enumerated, isn't it?
So how do I explain the sample below?
public class CTest
{
    public CTest(int amount)
    {
        Amount = amount;
    }
    public int Amount { get; set; }
    public override string ToString()
    {
        return $"Amount:{Amount}";
    }
    public static IEnumerable<CTest> GenerateEnumerableTest()
    {
        var tab = new List<int> { 2, 5, 10, 12 };
        return tab.Select(t => new CTest(t));
    }
}
Nothing bad so far!
But the following test gives me an unexpected result, despite what I know about IEnumerable<T> and the .Where LINQ method:
[TestMethod]
public void TestCSharp()
{
    var tab = CTest.GenerateEnumerableTest();
    foreach (var item in tab.Where(i => i.Amount > 6))
    {
        item.Amount = item.Amount * 2;
    }
    foreach (var t in tab)
    {
        var s = t.ToString();
        Debug.Print(s);
    }
}
None of the items in tab is multiplied by 2. The output is:
Amount:2
Amount:5
Amount:10
Amount:12
Can anyone explain why, after enumerating tab, I get the original values?
Of course, everything works fine if I call .ToList() right after calling the GenerateEnumerableTest() method.

var tab = CTest.GenerateEnumerableTest();
This tab is a LINQ query that generates CTest instances initialized from int values, which come from a list of integers that never changes. So whenever you enumerate this query, you get the "same" instances again (new objects with the original Amount).
If you want to "materialize" this query, you can use ToList and then change the items.
Otherwise you are modifying CTest instances that exist only in the first foreach loop. The second loop enumerates other CTest instances with the unmodified Amount.
So the query only contains the information on how to produce the items; you could just as well call the method directly:
foreach (var item in CTest.GenerateEnumerableTest().Where(i => i.Amount > 6))
{
    item.Amount = item.Amount * 2;
}
foreach (var t in CTest.GenerateEnumerableTest())
{
    // now you don't expect them to be changed, do you?
}

Like many LINQ operations, Select is lazy and uses deferred execution: your lambda expression is not executed when you call Select, but again every time tab is enumerated, so each foreach loop works on its own freshly created CTest objects. This is why everything works fine if you call .ToList() right after calling the GenerateEnumerableTest() method:
var tab = CTest.GenerateEnumerableTest().ToList();
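A minimal sketch of the difference, using a stripped-down copy of CTest. `DeferredSelectDemo`, `SameInstanceTwice` and `DoubleThenRead` are illustrative names, not part of the question: enumerating the lazy query twice yields two different CTest objects, while the materialized list hands back the same objects, so the doubling survives.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stripped-down copy of the question's CTest class.
public class CTest
{
    public CTest(int amount) { Amount = amount; }
    public int Amount { get; set; }
}

public static class DeferredSelectDemo
{
    // Lazy query: every enumeration re-runs the Select lambda and creates new CTest objects.
    public static IEnumerable<CTest> Lazy() =>
        new List<int> { 2, 5, 10, 12 }.Select(t => new CTest(t));

    // Materialized: the CTest objects are created once and stored in a List.
    public static IEnumerable<CTest> Materialized() => Lazy().ToList();

    // Illustrative helper: true if two separate enumerations yield the same first object.
    public static bool SameInstanceTwice(IEnumerable<CTest> source) =>
        ReferenceEquals(source.First(), source.First());

    // Doubles the items above 6, then returns the amounts seen by a second enumeration.
    public static List<int> DoubleThenRead(IEnumerable<CTest> source)
    {
        foreach (var item in source.Where(i => i.Amount > 6))
            item.Amount *= 2;
        return source.Select(i => i.Amount).ToList();
    }
}
```

Calling `DoubleThenRead(DeferredSelectDemo.Lazy())` mirrors the test in the question; passing `Materialized()` instead mirrors the ToList() fix.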

Related

IEnumerable<T>.Count() returns 0

I have a variable of type IEnumerable, and if I call the Count() method on it before a foreach loop, it returns the correct count, but after the foreach loop it returns 0. Why is that?
[UPDATE]
Based on the answers, I found out that my IEnumerable is a kind of one-shot thing. But I had already converted it to a list and returned it as an IEnumerable, so I've attached my code. Where am I going wrong?
public async Task<IEnumerable<WorkItem>> Get(int[] workItemIds)
{
    return await context.WorkItems
        .Where(it => workItemIds.Contains(it.Id))
        .ToListAsync();
}
private async Task<int> ApproveOrRejectWorkItems(IEnumerable<WorkItem> workItems, int status)
{
    // var workItemsToBeUpdated = workItems.Count();
    workItems = workItems.Where(it => it.StatusId == (int)WorkItemStatus.Submitted);
    foreach (var workItem in workItems)
    {
        workItem.StatusId = status;
    }
    // here value becomes 0
    await _unitOfWork.WorkItemRepository.Update(workItems);
    return workItems.Count();
}
Thank you for clarifying that you were filtering the list after materializing it in the Get() method.
Your issue is that a LINQ query is a view. So if you iterate an IEnumerable twice, it goes through the source items twice, applying any filters, projections, etc. to the source items each time. This means that if you change the source items, the enumerable can yield different items the second time you iterate through it, because those items no longer match the filter.
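To make the "view" behaviour concrete, here is a small sketch with a hypothetical `Item` class standing in for WorkItem (the status values 1 and 2 are placeholders for Submitted and the new status). Counting through the filtered query after changing the items re-runs the filter, while a snapshot taken with ToList() keeps its count:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the WorkItem entity.
public class Item
{
    public int StatusId { get; set; }
}

public static class FilterViewDemo
{
    public const int Submitted = 1;   // placeholder for WorkItemStatus.Submitted
    public const int Approved = 2;    // placeholder for the new status

    // Counts through a filtered view before and after updating the items.
    public static (int before, int after) CountThroughView(List<Item> items)
    {
        IEnumerable<Item> submitted = items.Where(it => it.StatusId == Submitted);
        int before = submitted.Count();   // runs the filter over the source
        foreach (var item in submitted)
            item.StatusId = Approved;     // the items no longer satisfy the filter
        int after = submitted.Count();    // runs the filter again over the same source
        return (before, after);
    }

    // Same steps, but the filtered items are copied into a list first (a snapshot).
    public static (int before, int after) CountThroughSnapshot(List<Item> items)
    {
        List<Item> submitted = items.Where(it => it.StatusId == Submitted).ToList();
        int before = submitted.Count;
        foreach (var item in submitted)
            item.StatusId = Approved;
        int after = submitted.Count;
        return (before, after);
    }
}
```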
I would suggest you modify the method to be like this:
private async Task<int> ApproveOrRejectWorkItems(IEnumerable<WorkItem> workItems, int status)
{
    var workItemsToBeUpdated = workItems.Count();
    foreach (var workItem in workItems.Where(it => it.StatusId == (int)WorkItemStatus.Submitted))
    {
        workItem.StatusId = status;
    }
    await _unitOfWork.WorkItemRepository.Update(workItems);
    return workItems.Count();
}
You're most likely dealing with one of two scenarios:
Outside circumstances change between enumerations
Please consider the following scenario:
var badRecords = repository.GetBadRecords(); // returns IEnumerable<T>
if (badRecords.Count() > n)
{
    repository.DeleteBadRecords();
}
// This enumeration goes to the db again and selects 0 records because we just deleted them.
foreach (var badRecord in badRecords)
{
    Log(badRecord);
}
The solution is to call .ToList() early.
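A sketch of that scenario, assuming a hypothetical in-memory `Repository` whose `GetBadRecords()` returns a deferred query (a real repository would hit the database instead). Without the early ToList(), nothing is left to log after the delete:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory repository; negative numbers play the role of "bad records".
public class Repository
{
    private readonly List<int> records = new List<int> { -1, 4, -7, 9 };

    // Deferred query: re-reads the current records every time it is enumerated.
    public IEnumerable<int> GetBadRecords() => records.Where(r => r < 0);

    public void DeleteBadRecords() => records.RemoveAll(r => r < 0);
}

public static class ToListEarlyDemo
{
    // Returns how many bad records are still available to log after deleting them.
    public static int LoggedAfterDelete(bool materializeEarly)
    {
        var repository = new Repository();
        IEnumerable<int> badRecords = repository.GetBadRecords();
        if (materializeEarly)
            badRecords = badRecords.ToList();   // snapshot taken before the delete
        repository.DeleteBadRecords();

        var logged = new List<int>();
        foreach (var badRecord in badRecords)
            logged.Add(badRecord);              // stands in for Log(badRecord)
        return logged.Count;
    }
}
```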
Single use generator
This is similar but slightly different: here we're dealing with code that, by design, allows just one iteration.
public static IEnumerable<int> GetAllNumbers(List<int> availableNumbers) {
    while (availableNumbers.Count > 0)
    {
        var x = availableNumbers[0];
        availableNumbers.RemoveAt(0);
        yield return x;
    }
}
static void Main(string[] args)
{
    var numbers = GetAllNumbers(new List<int> { 1, 2, 3 });
    Console.WriteLine(numbers.Count()); // 3
    Console.WriteLine(numbers.Count()); // 0
    numbers = GetAllNumbers(new List<int> { 1, 2, 3 }).ToList();
    Console.WriteLine(numbers.Count()); // 3
    Console.WriteLine(numbers.Count()); // 3
}
ReSharper's Possible multiple enumeration of IEnumerable warning
By the way, ReSharper has a dedicated warning for this: Code Inspection: Possible multiple enumeration of IEnumerable

IEnumerable performs differently on Array vs List

This question is more of a "is my understanding accurate", and if not, please help me get my head around it. I have this bit of code to explain my question:
class Example
{
    public string MyString { get; set; }
}
var wtf = new[] { "string1", "string2" };
IEnumerable<Example> transformed = wtf.Select(s => new Example { MyString = s });
IEnumerable<Example> transformedList = wtf.Select(s => new Example { MyString = s }).ToList();
foreach (var i in transformed)
    i.MyString = "somethingDifferent";
foreach (var i in transformedList)
    i.MyString = "somethingDifferent";
foreach (var i in transformed)
    Console.WriteLine(i.MyString);
foreach (var i in transformedList)
    Console.WriteLine(i.MyString);
It outputs:
string1
string2
somethingDifferent
somethingDifferent
At first glance, both variables are IEnumerable<Example>. However, the underlying types are WhereSelectArrayIterator<string, Example> and List<Example>.
This is where my sanity started to come into question. From my understanding the difference in output above is because of the way both underlying types implement the GetEnumerator() method.
Using this handy website, I was able to (I think) track down the bit of code that was causing the difference.
class WhereSelectArrayIterator<TSource, TResult> : Iterator<TResult>
{ }
Looking at that class (line 169 on that site) points me to Iterator<TResult>, since that appears to be where GetEnumerator() is implemented.
Starting on line 90 I see:
public IEnumerator<TSource> GetEnumerator() {
    if (threadId == Thread.CurrentThread.ManagedThreadId && state == 0) {
        state = 1;
        return this;
    }
    Iterator<TSource> duplicate = Clone();
    duplicate.state = 1;
    return duplicate;
}
What I gather from that is that when you enumerate over it, you're actually enumerating over a cloned source (as written in the WhereSelectArrayIterator class's Clone() method).
This will satisfy my need to understand for now, but as a bonus, could someone help me figure out why this isn't returned the first time I enumerate over the data? From what I can tell, state should be 0 on the first pass. Unless, perhaps, there is magic happening under the hood that calls the same method from different threads.
Update
At this point I'm thinking my 'findings' were a bit misleading (damn Clone method taking me down the wrong rabbit hole) and it was indeed due to deferred execution. I mistakenly thought that even though I deferred execution, once it was enumerated the first time it would store those values in my variable. I should have known better; after all I was using the new keyword in the Select. That said, it still did open my eyes to the idea that a particular class' GetEnumerator() implementation could still return a clone which would present a very similar problem. It just so happened that my problem was different.
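Whatever the internal Clone() details are, the observable behaviour is that every GetEnumerator() call on a LINQ query gives you an independent enumerator that runs the Select lambda on its own. A small sketch (the class and method names are illustrative) that advances two enumerators of the same query side by side:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Example
{
    public string MyString { get; set; }
}

public static class IndependentEnumeratorsDemo
{
    public static (string e1Current, string e2Current, bool sameObject) Run()
    {
        var wtf = new[] { "string1", "string2" };
        IEnumerable<Example> transformed = wtf.Select(s => new Example { MyString = s });

        using (IEnumerator<Example> e1 = transformed.GetEnumerator())
        using (IEnumerator<Example> e2 = transformed.GetEnumerator())
        {
            e1.MoveNext();
            Example firstFromE1 = e1.Current;   // "string1", created by e1
            e1.MoveNext();                      // e1 moves on to "string2"
            e2.MoveNext();                      // e2 starts from the beginning on its own
            Example firstFromE2 = e2.Current;   // "string1" again, but a new object
            return (e1.Current.MyString, firstFromE2.MyString,
                    ReferenceEquals(firstFromE1, firstFromE2));
        }
    }
}
```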
Update2
This is an example of what I thought my problem was. Thanks everyone for the information.
IEnumerable<Example> friendly = new FriendlyExamples();
IEnumerable<Example> notFriendly = new MeanExamples();
foreach (var example in friendly)
    example.MyString = "somethingDifferent";
foreach (var example in notFriendly)
    example.MyString = "somethingDifferent";
foreach (var example in friendly)
    Console.WriteLine(example.MyString);
foreach (var example in notFriendly)
    Console.WriteLine(example.MyString);
// somethingDifferent
// somethingDifferent
// string1
// string2
Supporting classes:
class Example
{
    public string MyString { get; set; }
    public Example(Example example)
    {
        MyString = example.MyString;
    }
    public Example(string s)
    {
        MyString = s;
    }
}
class FriendlyExamples : IEnumerable<Example>
{
    Example[] wtf = new[] { new Example("string1"), new Example("string2") };
    public IEnumerator<Example> GetEnumerator()
    {
        return wtf.Cast<Example>().GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return wtf.GetEnumerator();
    }
}
class MeanExamples : IEnumerable<Example>
{
    Example[] wtf = new[] { new Example("string1"), new Example("string2") };
    public IEnumerator<Example> GetEnumerator()
    {
        return wtf.Select(e => new Example(e)).Cast<Example>().GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return wtf.Select(e => new Example(e)).GetEnumerator();
    }
}
LINQ works by making each method return another IEnumerable that is typically a deferred processor. No actual execution occurs until the IEnumerable returned at the end is enumerated. This allows for the creation of efficient pipelines.
When you do
var transformed = wtf.Select(s => new Example { MyString = s });
The Select code has not actually executed yet. Only when you finally enumerate transformed will the Select run, i.e. here:
foreach (var i in transformed)
    i.MyString = "somethingDifferent";
Note that if you run the same loop again
foreach (var i in transformed)
    i.MyString = "somethingDifferent";
the pipeline will be executed again. Here that is not a big deal, but it can be a huge cost if I/O is involved.
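One way to see the re-execution is to count how often the Select lambda runs. In this sketch (names are illustrative), the lazy query runs the lambda once per element per loop, while the list runs it once per element in total:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Example
{
    public string MyString { get; set; }
}

public static class ReexecutionDemo
{
    // Returns how many times the Select lambda ran for the lazy query and for the list.
    public static (int lazyCalls, int listCalls) CountSelectorCalls()
    {
        var wtf = new[] { "string1", "string2" };

        int lazyCalls = 0;
        IEnumerable<Example> transformed =
            wtf.Select(s => { lazyCalls++; return new Example { MyString = s }; });
        foreach (var i in transformed) i.MyString = "somethingDifferent";  // runs the lambda
        foreach (var i in transformed) i.MyString = "somethingDifferent";  // runs it again

        int listCalls = 0;
        List<Example> transformedList =
            wtf.Select(s => { listCalls++; return new Example { MyString = s }; }).ToList();
        foreach (var i in transformedList) i.MyString = "somethingDifferent";  // no lambda calls
        foreach (var i in transformedList) i.MyString = "somethingDifferent";

        return (lazyCalls, listCalls);
    }
}
```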
This line
var transformedList = wtf.Select(s => new Example { MyString = s }).ToList();
is the same as
var transformedList = transformed.ToList();
The real eye-opener is to put debug statements or breakpoints inside a Where or Select lambda to actually watch the deferred pipeline execute.
Reading the LINQ implementation is also useful. Here is Select: https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,5c652c53e80df013,references
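Instead of breakpoints, a sketch like the following records each step in a list (the `PipelineTraceDemo` name is illustrative). Nothing runs when the query is built, and each element flows through Select and Where before the next one is pulled:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PipelineTraceDemo
{
    // Records every Select, Where and loop-body step in the order it happens.
    public static List<string> Run()
    {
        var log = new List<string>();
        var query = new[] { 1, 2, 3 }
            .Select(n => { log.Add($"select {n}"); return n * 10; })
            .Where(n => { log.Add($"where {n}"); return n != 20; });

        log.Add("query built");   // nothing above has executed yet
        foreach (var n in query)
            log.Add($"loop {n}");
        return log;
    }
}
```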

Is it the same to iterate over Linq expression result than to assign it first to a variable?

This is hard to explain in words, so I'll use code examples.
Let's suppose I already have a list of clients that I want to filter.
Basically, I want to know if this:
foreach (var client in list.Where(c => c.Age > 20))
{
    // Do something
}
is the same as this:
var filteredClients = list.Where(c => c.Age > 20);
foreach (var client in filteredClients)
{
    // Do something
}
I've been told that the first approach executes the .Where() in every iteration.
I'm sorry if this is a duplicate; I couldn't find any related question.
Thanks in advance.
Yes, both those examples are functionally identical. One just stores the result from Enumerable.Where in a variable before accessing it while the other just accesses it directly.
To really see why this will not make a difference, you have to understand what a foreach loop essentially does. The code in your examples (both of them) is basically equivalent to this (I’ve assumed a known type Client here):
IEnumerable<Client> x = list.Where(c => c.Age > 20);
// foreach loop
IEnumerator<Client> enumerator = x.GetEnumerator();
while (enumerator.MoveNext())
{
    Client client = enumerator.Current;
    // Do something
}
So what actually happens here is that the IEnumerable returned by the LINQ method is not consumed directly; an enumerator for it is requested first. The foreach loop then does nothing more than repeatedly ask the enumerator for the next object and process the current element in the loop body (a real foreach also disposes the enumerator at the end).
Looking at this, it doesn't matter whether x in the above code really is a previously stored variable or the list.Where() call itself. Only the enumerator object, which is created just once, is used in the loop.
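A quick check of that reasoning, as a sketch with illustrative names: the Where predicate runs once per source element in both styles, so Where is not re-executed on every iteration. The third method spells out the foreach expansion; unlike the simplified version above, it also disposes the enumerator, as a real foreach does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ForeachExpansionDemo
{
    // Counts predicate calls when the Where result is stored in a variable first.
    public static int WithVariable(List<int> ages)
    {
        int calls = 0;
        var filtered = ages.Where(a => { calls++; return a > 20; });
        foreach (var age in filtered) { }
        return calls;
    }

    // Counts predicate calls when Where is called directly in the foreach header.
    public static int WithoutVariable(List<int> ages)
    {
        int calls = 0;
        foreach (var age in ages.Where(a => { calls++; return a > 20; })) { }
        return calls;
    }

    // Manual expansion of foreach: one GetEnumerator() call, then MoveNext() in a loop.
    public static List<int> Expanded(List<int> ages)
    {
        var result = new List<int>();
        using (IEnumerator<int> enumerator = ages.Where(a => a > 20).GetEnumerator())
        {
            while (enumerator.MoveNext())
                result.Add(enumerator.Current);
        }
        return result;
    }
}
```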
Now to cover that SharePoint example which Colin posted. It looks like this:
SPList activeList = SPContext.Current.List;
for (int i = 0; i < activeList.Items.Count; i++)
{
    SPListItem listItem = activeList.Items[i];
    // do stuff
}
This is a fundamentally different thing though. Since this is not using a foreach loop, we do not get that one enumerator object which we use to iterate through the list. Instead, we repeatedly access activeList.Items: Once in the loop body to get an item by index, and once in the continuation condition of the for loop where we get the collection’s Count property value.
Unfortunately, Microsoft does not always follow its own guidelines: even though Items is a property on the SPList object, it actually creates a new SPListItemCollection object every time it is accessed. That object is empty by default and only lazily loads the actual items when you first access an item from it. So the code above ends up creating a large number of SPListItemCollection objects, each of which fetches the items from the database. This behavior is also mentioned in the remarks section of the property documentation.
This generally violates Microsoft’s own guidelines on choosing a property vs a method:
Do use a method, rather than a property, in the following situations.
The operation returns a different result each time it is called, even if the parameters do not change.
Note that if we used a foreach loop for that SharePoint example again, then everything would have been fine, since we would have again only requested a single SPListItemCollection and created a single enumerator for it:
foreach (SPListItem listItem in activeList.Items.Cast<SPListItem>())
{ … }
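The difference between the two loops can be reproduced without SharePoint. This sketch uses hypothetical `FakeList` and `FakeItemCollection` classes whose `Items` property builds a new collection on every read, similar to the SPList.Items behaviour described above, and counts how many collections each loop creates:

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for SPList: every read of Items builds a new collection.
public class FakeList
{
    private readonly string[] data = { "a", "b", "c" };
    public int ItemsCreated { get; private set; }

    public FakeItemCollection Items
    {
        get { ItemsCreated++; return new FakeItemCollection(data); }
    }
}

public class FakeItemCollection : IEnumerable<string>
{
    private readonly string[] data;
    public FakeItemCollection(string[] data) { this.data = data; }
    public int Count => data.Length;
    public string this[int index] => data[index];
    public IEnumerator<string> GetEnumerator() => ((IEnumerable<string>)data).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class PropertyAccessDemo
{
    // Items is read once per condition check and once per loop body.
    public static int CollectionsCreatedByForLoop()
    {
        var list = new FakeList();
        for (int i = 0; i < list.Items.Count; i++)
        {
            _ = list.Items[i];
        }
        return list.ItemsCreated;
    }

    // Items is read once; the single enumerator walks that one collection.
    public static int CollectionsCreatedByForeach()
    {
        var list = new FakeList();
        foreach (string item in list.Items) { }
        return list.ItemsCreated;
    }
}
```

With three items, the for loop reads Items four times in the condition and three times in the body.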
They are not quite the same:
Here is the original C# code:
static void ForWithVariable(IEnumerable<Person> clients)
{
    var adults = clients.Where(x => x.Age > 20);
    foreach (var client in adults)
    {
        Console.WriteLine(client.Age.ToString());
    }
}
static void ForWithoutVariable(IEnumerable<Person> clients)
{
    foreach (var client in clients.Where(x => x.Age > 20))
    {
        Console.WriteLine(client.Age.ToString());
    }
}
Here is the code this results in, as decompiled by ILSpy from the compiled Intermediate Language (IL):
private static void ForWithVariable(IEnumerable<Person> clients)
{
    Func<Person, bool> arg_21_1;
    if ((arg_21_1 = Program.<>c.<>9__1_0) == null)
    {
        arg_21_1 = (Program.<>c.<>9__1_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithVariable>b__1_0));
    }
    IEnumerable<Person> enumerable = clients.Where(arg_21_1);
    foreach (Person current in enumerable)
    {
        Console.WriteLine(current.Age.ToString());
    }
}
private static void ForWithoutVariable(IEnumerable<Person> clients)
{
    Func<Person, bool> arg_22_1;
    if ((arg_22_1 = Program.<>c.<>9__2_0) == null)
    {
        arg_22_1 = (Program.<>c.<>9__2_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithoutVariable>b__2_0));
    }
    foreach (Person current in clients.Where(arg_22_1))
    {
        Console.WriteLine(current.Age.ToString());
    }
}
As you can see, there is a key difference:
IEnumerable<Person> enumerable = clients.Where(arg_21_1);
A more practical question, however, is whether the differences hurt performance. I concocted a test to measure that.
class Program
{
    public static void Main()
    {
        Measure(ForEachWithVariable);
        Measure(ForEachWithoutVariable);
        Console.ReadKey();
    }
    static void Measure(Action<List<Person>, List<Person>> action)
    {
        var clients = new[]
        {
            new Person { Age = 10 },
            new Person { Age = 20 },
            new Person { Age = 30 },
        }.ToList();
        var adultClients = new List<Person>();
        var sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < 1E6; i++)
            action(clients, adultClients);
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds.ToString());
        Console.WriteLine($"{adultClients.Count} adult clients found");
    }
    static void ForEachWithVariable(List<Person> clients, List<Person> adultClients)
    {
        var adults = clients.Where(x => x.Age > 20);
        foreach (var client in adults)
            adultClients.Add(client);
    }
    static void ForEachWithoutVariable(List<Person> clients, List<Person> adultClients)
    {
        foreach (var client in clients.Where(x => x.Age > 20))
            adultClients.Add(client);
    }
}
class Person
{
    public int Age { get; set; }
}
After several runs of the program, I was not able to find any significant difference between ForEachWithVariable and ForEachWithoutVariable. They were always close in time, and neither was consistently faster than the other. Interestingly, if I change 1E6 to just 1000, ForEachWithVariable is actually consistently slower, by about 1 millisecond.
So, I conclude that for LINQ to Objects, there is no practical difference. The same type of test could be run if your particular use case involves LINQ to Entities (or SharePoint).

EntityFramework : does IQueryable or IEnumerable get all the results at the first place?

I learned that IQueryable and IEnumerable do not return their results up front and only produce them when needed. However, when I open that object in the watch inspector, I see that all the objects are there.
Is there something wrong with my code, or are they only showing because I evaluated it in the watch window?
public IQueryable<PurchasePendingView> PurchasePendings() {
    var pendings = db.PurchasePendingViews
        .Where(m => m.AccountStatusID != StructAccountStatus.Succeed); // the watch window already shows all the items here, although they shouldn't be loaded yet
    if (ApplicationEnvironment.DEBUGGING) {
        return pendings;
    } else if (IsMobileAccount()) {
        var showroom = db.ShowRooms.FirstOrDefault(m => m.MemberID == employee.MemberID);
        if (showroom != null) {
            return pendings.Where(m => m.ShowRoomID == showroom.ShowRoomID);
        } else {
            return pendings.Where(m => m.CountryDivisionID == employee.CountryDivisionID);
        }
    } else {
        // normal salary employee can see every detail
        return pendings;
    }
}
Note: Currently my lazy loading is off.
The collections are evaluated the first time you iterate through the results.
Since you're iterating through the results in the watch inspector, they are evaluated then.
This is easier to demonstrate than to explain:
public class MeanException : Exception
{
    public MeanException() : base() { }
    public MeanException(string message) : base(message) { }
}
public static IEnumerable<T> Break<T>(this IEnumerable<T> source)
    where T : new()
{
    if (source != null)
    {
        throw new MeanException("Sequence was evaluated");
    }
    if (source == null)
    {
        throw new MeanException("Sequence was evaluated");
    }
    // unreachable
    // this makes the method an iterator block so that it has deferred execution,
    // just like most other LINQ extension methods
    yield return new T();
}
public static IEnumerable<int> getQuery()
{
    var list = new List<int> { 1, 2, 3, 4, 5 };
    var query = list.Select(n => n + 1)
        .Break()
        .Where(n => n % 2 == 0);
    return query;
}
So, what do we have here. We have a custom exception so we can catch it independently. We have an extension method for IEnumerable<T> that will always throw an exception as soon as the sequence is evaluated, but it uses deferred execution, just like Select and Where. Finally we have a method to get a query. We can see a LINQ method both before and after the Break call, and we can see that a List is used as the underlying data source. (In your example it could be either some collection in memory, or an object that will go query a database and then iterate over the results when iterated.)
Now let's use this query and see what happens:
try
{
    Console.WriteLine("Before fetching query");
    IEnumerable<int> query = getQuery();
    Console.WriteLine("After fetching query");
    foreach (var number in query)
    {
        Console.WriteLine("Inside foreach loop");
    }
    Console.WriteLine("After foreach loop");
}
catch (MeanException ex)
{
    Console.WriteLine("Exception thrown: \n{0}", ex.ToString());
}
What you'll see if you run this code is the print before the query (obviously), then the print after the query (meaning we just returned the query from a method and the mean exception was never thrown), and next the message that the exception was thrown (meaning we never got inside of, or past the end of, the foreach loop).
This is obviously a bit of a contrived example to demonstrate a concept, but this is something you'll actually see in practice often enough. For example, if you lose your connection with the database after creating your data context, you won't actually get an exception until you iterate the query; likewise, if your data holder objects are out of date and no longer match the DB, you'll get an exception at that same point. In a less obvious example, if you hold onto the query for an extended period of time, you will get the data as it is when you fetch the results of the query, not as it was when you built it. Here is another demonstration of that:
var list = new List<int> { 1, 2, 3, 4, 5 };
var query = list.Where(num => num < 5);
Console.WriteLine(query.Count());
list.RemoveAll(num => num < 4);
Console.WriteLine(query.Count());
Here we have a list of data, and count the number of items less than 5 (it's 4). Then we modify the list (without changing query at all). We re-query query and end up with a count of 1.

IQueryable remove from the collection, best way?

IQueryable<SomeType> collection = GetCollection();
foreach (var c in collection)
{
    // do some complex checking that can't be embedded in a query
    // based on results from prev line we want to discard the 'c' object
}
// here I only want the items of collection minus the discarded objects
So, given that simple code, what is the best way to get the results? Should I create a List just before the foreach and insert the objects I want to keep, or is there a better way to do this kind of thing?
I know there are other posts on similar topics but I just don't feel I'm getting what I need out of them.
Edit: I tried this:
var collection = GetCollection().Where(s =>
{
    if (s.property == 1)
    {
        int num = Number(s);
        double avg = Avg(s.x);
        if (num > avg)
            return true;
        else
            return false;
    }
    else return false;
});
I tried this, but the compiler reported "A lambda expression with a statement body cannot be converted to an expression tree". What am I doing wrong?
//do some complex checking that can't be embedded in a query
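I don't get it. You can pass a delegate that points to an arbitrarily complex (Turing-complete) function which decides whether to discard each item. Because GetCollection() returns an IQueryable, call AsEnumerable() first: Queryable.Where expects an expression tree, which cannot be built from a statement-bodied lambda, while Enumerable.Where accepts an ordinary delegate:
var result = GetCollection().AsEnumerable().Where(c => {
    // ...
    // process "c"
    // return true if you want it in the collection
    return true; // placeholder for your own check
});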
If you want, you can refactor it into another function:
var result = GetCollection().AsEnumerable().Where(FunctionThatChecksToDiscardOrNot);
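A self-contained sketch of the AsEnumerable() approach, using a list turned into an IQueryable with AsQueryable() in place of a real data source. `SomeType`, `Keep` and the threshold of 10 are hypothetical placeholders for the question's Number(s)/Avg(s.x) logic. Keep in mind that with a real database provider, everything after AsEnumerable() runs in memory, so all rows reaching that point are fetched first.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type standing in for the question's SomeType.
public class SomeType
{
    public int Property { get; set; }
    public int X { get; set; }
}

public static class ComplexFilterDemo
{
    // Hypothetical check; the threshold replaces the Number(s) > Avg(s.x) comparison.
    public static bool Keep(SomeType s)
    {
        if (s.Property != 1)
            return false;
        const int threshold = 10;
        return s.X > threshold;
    }

    public static List<SomeType> Filter(IQueryable<SomeType> collection)
    {
        // AsEnumerable() switches from Queryable.Where (expression trees) to
        // Enumerable.Where (delegates), so a statement-bodied lambda compiles.
        return collection
            .AsEnumerable()
            .Where(s =>
            {
                // any number of statements can go here
                return Keep(s);
            })
            .ToList();
    }
}
```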
If you wrap it into another method, you can use yield return and then iterate over the returned collection, like so:
public IEnumerable<SomeType> FindResults(IQueryable<SomeType> collection) {
    foreach (var c in collection)
    {
        if (doComplicatedQuery(c)) {
            yield return c;
        }
    }
}
// elsewhere
foreach (var goodItem in FindResults(GetCollection())) {
    // do stuff.
}
