Will this LINQ query enumerate through the entire enumeration? - c#

I'm trying to upload multiple files for an ASP.NET website. As such, I have the following simplified example:
public ContentResult UploadFiles(IList<HttpPostedFileBase> files)
{
if (files == null ||
!files.Any())
{
// Nothing uploaded.
return Content("No files were uploaded.");
}
myService.UploadCsvFilesToDatabaseAsync(files.Select(x => x.InputStream));
return Content("Success!");
}
Notice this: myService.UploadCsvFilesToDatabaseAsync(files.Select(x => x.InputStream))
I do not want to enumerate through the files at this point, but later on, when I'm inside that method.
So, does files.Select(x => x.InputStream) do any enumerating at this point? Or does it just pass in a new collection, which contains the start of each input stream, ready to be enumerated?
Clarification: I do not want to read any data in from the files at this point, but a little bit later on, inside that method.

So, does files.Select(x => x.InputStream) do any enumerating at this point?
No. Quoting from the documentation:
This method is implemented by using deferred execution. The immediate
return value is an object that stores all the information that is
required to perform the action. The query represented by this method
is not executed until the object is enumerated either by calling its
GetEnumerator method directly or by using foreach
So until you actually enumerate the result, the list is not processed. A simple test:
List<int> intlist = new List<int>() {1,2,3};
var result = intlist.Select(x => x);
intlist.Add(12);
foreach (var item in result)
{
Console.WriteLine(item);
}
The output includes even the element that was added after the LINQ expression was built. Had the query been evaluated eagerly, the element 12 would not have been included.
Result:
1
2
3
12
Update: For a complete example of both deferred execution AND already-executed, have a look at this .NET fiddle example.
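To tie this back to the upload question, here is a minimal sketch that counts how often the selector passed to Select actually runs. The class name DeferredSelectDemo, the method CountSelectorCalls and the sample file names are made up for illustration; the counter stands in for "touching" each file's InputStream.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo type, not part of the original question.
public static class DeferredSelectDemo
{
    // Returns the number of selector calls before and after enumeration.
    public static (int Before, int After) CountSelectorCalls()
    {
        var names = new List<string> { "a.csv", "b.csv", "c.csv" };
        int calls = 0;

        // Building the query does not invoke the selector.
        IEnumerable<string> query = names.Select(n =>
        {
            calls++;
            return n.ToUpperInvariant();
        });
        int before = calls;

        // Enumerating (here via ToList) runs the selector once per element.
        _ = query.ToList();

        return (before, calls);
    }

    public static void Main()
    {
        var (before, after) = CountSelectorCalls();
        Console.WriteLine($"Selector calls before enumeration: {before}");
        Console.WriteLine($"Selector calls after enumeration: {after}");
    }
}
```

Under these assumptions the selector has not run at all when the query is handed to another method, which is the behaviour the question asks for.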

Related

One item gets deleted while iterating an IEnumerable<T>

I have a method that takes an IEnumerable, filters it further and loops through the filtered collection
to modify one property.
I am observing a very weird behaviour.
While the method loops through the filtered IEnumerable<Entity>, after a few iterations (I've not exactly counted how many),
one of the items in it gets deleted.
private async Task<bool> UpdateSomeValue(IEnumerable<BusinessEntity> entities, BusinessEntity entityToDelete)
{
// Filter the IEnumerable
var entitiesToUpdateSequence = entities
.Where(f => f.Sequence > entityToDelete.Sequence);
if (entitiesToUpdateSequence.Any())
{
var testList = new List<FormBE>(entitiesToUpdateSequence);
Debug.WriteLine(entitiesToUpdateSequence.Count()); // 5
// During this loop, after a few iterations, one item gets deleted
foreach (var entity in testList)
{
entity.Sequence -= 1;
}
Debug.WriteLine(entitiesToUpdateSequence.Count()); // 4
return await _someRepo.UpdateEntitySequence(entityToDelete.Id1, entityToDelete.ID2, testList);
}
return await Task.FromResult(true);
}
This method is called like this:
var entities = await entitiesTask.ConfigureAwait(false);
var entityToDelete = entities.Single(f => f.Key.Equals("someValue"));
var updated = await UpdateSomeValue(entities, entityToDelete);
and that's it, there's no other reference to the entities collection. Therefore, it cannot be modified from any other thread.
I've temporarily found a workaround by copying the filtered IEnumerable into a List and then using the List for further operations
(the List's content remains the same after the loop).
What may be causing this issue?
Check out the documentation on Enumerable.Where. Specifically, the Remarks.
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
Which means that when you call Where you're not necessarily getting back an object such as a List or Array that just has X number of items in it. You're getting back an object that knows how to filter the IEnumerable<T> you called Where on, based on the predicate you provided. When you iterate that object, such as with a foreach loop or a call to Enumerable.Count() each item in the source IEnumerable<T> is evaluated against the predicate you provided and only the items that satisfy that predicate are returned.
Since the predicate you're providing checks the Sequence property, and you're modifying that property inside the first foreach loop, the second time you iterate entitiesToUpdateSequence fewer items match the predicate you provided and so you get a lower count. If you were to increment Sequence instead of decrement it, you might end up with a higher count the second time you iterate entitiesToUpdateSequence.
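The effect can be reproduced in isolation. The following is a sketch with a made-up Item class and sample data (six items with Sequence 1 to 6, filtering on Sequence > 1); it is not the asker's code, but it follows the same pattern of mutating the filtered property between two enumerations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's BusinessEntity.
public sealed class Item
{
    public int Sequence { get; set; }
}

public static class WhereReevaluationDemo
{
    public static (int Before, int AfterLive, int AfterSnapshot) Run()
    {
        List<Item> items = Enumerable.Range(1, 6)
            .Select(i => new Item { Sequence = i })
            .ToList();

        // A deferred query: the predicate runs again on every enumeration.
        IEnumerable<Item> live = items.Where(x => x.Sequence > 1);

        // A snapshot: the predicate runs once, now, and the results are stored.
        List<Item> snapshot = live.ToList();

        int before = live.Count();

        // Mutating the property that the predicate inspects.
        foreach (Item item in snapshot)
        {
            item.Sequence -= 1;
        }

        // The item that had Sequence 2 now has Sequence 1 and no longer matches.
        return (before, live.Count(), snapshot.Count);
    }

    public static void Main()
    {
        var (before, afterLive, afterSnapshot) = Run();
        Console.WriteLine($"live before: {before}, live after: {afterLive}, snapshot: {afterSnapshot}");
    }
}
```

The snapshot keeps all five items, which is why copying to a List works as a workaround.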

Converting foreach to Linq

Current Code:
For each element in MapEntryTable, check the properties IsDisplayedColumn and IsReturnColumn, and if either is true, add the element's OutputColumnId to the corresponding list. The running time is O(n). Many elements have both properties set to false, so they are not added to any list in the loop.
foreach (var mapEntry in MapEntryTable)
{
if (mapEntry.IsDisplayedColumn)
Type1.DisplayColumnId.Add(mapEntry.OutputColumnId);
if (mapEntry.IsReturnColumn)
Type1.ReturnColumnId.Add(mapEntry.OutputColumnId);
}
Following is the Linq version of doing the same:
MapEntryTable.Where(x => x.IsDisplayedColumn == true).ToList().ForEach(mapEntry => Type1.DisplayColumnId.Add(mapEntry.OutputColumnId));
MapEntryTable.Where(x => x.IsReturnColumn == true).ToList().ForEach(mapEntry => Type1.ReturnColumnId.Add(mapEntry.OutputColumnId));
I am converting all such foreach code to LINQ as I am learning it, but my questions are:
Do I get any advantage from the LINQ conversion in this case, or is it a disadvantage?
Is there a better way to do the same using LINQ?
UPDATE:
Consider the case where, out of 1000 elements in the list, 80% have both properties false. Does Where give me the benefit of quickly finding the elements that satisfy a given condition?
Type1 is a custom type with set of List<int> structures, DisplayColumnId and ReturnColumnId
ForEach isn't a LINQ method. It's a method of List<T>. And not only is it not a part of LINQ, it goes very much against the values and patterns of LINQ. Eric Lippert explains this in a blog post that was written when he was a principal developer on the C# compiler team.
Your "LINQ" approach also:
Completely unnecessarily copies all of the items to be added into a list, which is both wasteful in time and memory and also conflicts with LINQ's goals of deferred execution when executing queries.
Isn't actually a query with the exception of the Where operator. You're acting on the items in the query, rather than performing a query. LINQ is a querying tool, not a tool for manipulating data sets.
Iterates the source sequence twice. This may or may not be a problem, depending on what the source sequence actually is and what it costs to iterate.
A solution that uses LINQ for what it is designed for would look like this:
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsDisplayedColumn))
list1.DisplayColumnId.Add(mapEntry.OutputColumnId);
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsReturnColumn))
list2.ReturnColumnId.Add(mapEntry.OutputColumnId);
I would say stick with the original foreach loop, since it only iterates through the list once.
Also, your LINQ should look more like this:
list1.DisplayColumnId.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn).Select(mapEntry => mapEntry.OutputColumnId));
list2.ReturnColumnId.AddRange(MapEntryTable.Where(x => x.IsReturnColumn).Select(mapEntry => mapEntry.OutputColumnId));
The performance of foreach and List<T>.ForEach is almost exactly the same, within nanoseconds of each other, assuming the same logic runs inside the loop in both versions.
However, a for loop such as for (int i = 0; i < count; ++i) can be faster than both, because it doesn't go through an enumerator. It maintains an index, and it's up to you to retrieve the item by its index inside the loop. How large the difference is depends on the collection type and on how much work the loop body does.
Using ForEach does have a big disadvantage, though: you cannot break out of the loop. If you need to, you have to maintain a boolean such as breakLoop, set it to true, and have every remaining iteration return immediately when it is set, which performs badly. Secondly, you cannot use continue; you use return instead.
I never use the ForEach method.
If you are dealing with linq, e.g.
List<Thing> things = .....;
var oldThings = things.Where(p => p.DateTime.Year < DateTime.Now.Year);
Internally, that iterates the list for you when enumerated and gives you back only the items with a year earlier than the current year. Cool.
But if I am doing this:
List<Thing> things = new List<Thing>();
foreach(XElement node in Results) {
things.Add(new Thing(node));
}
I don't need LINQ for that loop. Even if I did use it...
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing")) {
if (node.Ignore) {
continue;
}
things.Add(node);
}
even though I could write that more cleanly as
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing" && !p.Ignore)) {
things.Add(node);
}
There is no real reason I can think of to do this:
things.ForEach(thing => {
//do something
//can't break
//can't continue
return; //<- continue
});
And if I want the fastest loop possible,
for (int i = 0; i < things.Count; ++i) {
var thing = things[i];
//do something
}
Will be faster.
Your LINQ isn't quite right as you're converting the results of Where to a List and then pseudo-iterating over those results with ForEach to add to another list. Use ToList or AddRange for converting or adding sequences to lists.
For example, to overwrite list1 (if it were actually a List<T>):
list1 = MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId).ToList();
or to append:
list1.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId));
In C#, to do what you want functionally in one call, you have to write your own partition method. If you are open to using F#, you can use List.partition:
https://msdn.microsoft.com/en-us/library/ee353782.aspx
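A hand-written partition method in C# could look roughly like the sketch below. The extension method name Partition and its tuple return shape are assumptions for illustration, not an existing framework API. Note that a partition splits a sequence into two disjoint groups, whereas in the question an entry may belong to both lists, so this fits only when the two conditions are mutually exclusive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the .NET base class library.
public static class PartitionExtensions
{
    // Splits the source in a single pass into items that satisfy
    // the predicate and items that do not.
    public static (List<T> Matches, List<T> Rest) Partition<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        var matches = new List<T>();
        var rest = new List<T>();
        foreach (T item in source)
        {
            (predicate(item) ? matches : rest).Add(item);
        }
        return (matches, rest);
    }
}

public static class PartitionDemo
{
    public static void Main()
    {
        var (even, odd) = new[] { 1, 2, 3, 4, 5 }.Partition(n => n % 2 == 0);
        Console.WriteLine($"even: {string.Join(",", even)}; odd: {string.Join(",", odd)}");
    }
}
```

Unlike the two separate Where calls, this iterates the source only once, at the cost of materializing both result lists eagerly.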

Repeat list endlessly (Zip a finite list with an Infinite repeating sequence)

UserList is a list of dictionaries, like:
[
{Name:"Alex",Age:25},
{Name:"Peter",Age:35},
{Name:"Muhammad",Age:28},
{Name:"Raul",Age:29}
]
RowColorList is a list of colors: [#bcf,#fc0]
The new UserList should contain one RowColor for every name, taken in sequence from RowColorList:
[
{Name:"Alex",Age:25,RowColor:#bcf},
{Name:"Peter",Age:35,RowColor:#fc0},
{Name:"Muhammad",Age:28,RowColor:#bcf},
{Name:"Raul",Age:29,RowColor:#fc0}
]
I tried the following code:
UserList.Zip(RowColorList,(user,color) => user.Add("RowColor",color))
With this code, the new UserList will only contain as many entries as there are in RowColorList. I would like it to start again from the beginning of RowColorList whenever the available colors are used up. How?
You can create a function to return an infinite enumerable of Color / string (or whatever the type of RowColor is), by using yield return as a lazy generator:
public IEnumerable<Color> InfiniteColors()
{
while (true)
{
foreach (var color in RowColorList)
{
yield return color;
}
}
}
This can then be used with any of the Linq IEnumerable extension methods such as Zip.
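// Add returns void, so return the user explicitly; ToList forces the deferred Zip to run.
var coloredUsers = UserList.Zip(InfiniteColors(), (user, color) => { user.Add("RowColor", color); return user; }).ToList();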
Edit - Explanation
InfiniteColors doesn't hang because the iterator's state machine yields control back to the caller after each result, and Zip stops as soon as either sequence ends. Since the other sequence being zipped (UserList) is finite, the enumeration terminates.
Obviously you shouldn't Zip the InfiniteColors enumerable with itself, nor should you try to materialize it: don't call InfiniteColors().ToList() or similar.
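Here is a self-contained sketch of the same idea, assuming users are Dictionary<string, object> and colors are strings (the question does not state the exact types). The names Cycle and AssignColors are invented for this example. The Zip lambda returns a new dictionary rather than calling Add, since Dictionary.Add returns void, and ToList is what triggers the otherwise deferred Zip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type for illustration only.
public static class CycleDemo
{
    // Lazily repeats the items forever; yields nothing for an empty list
    // so that callers never spin on an empty infinite loop.
    public static IEnumerable<T> Cycle<T>(IReadOnlyList<T> items)
    {
        if (items.Count == 0) yield break;
        while (true)
        {
            foreach (T item in items)
            {
                yield return item;
            }
        }
    }

    // Pairs each user with the next color, wrapping around as needed.
    public static List<Dictionary<string, object>> AssignColors(
        IEnumerable<Dictionary<string, object>> users, IReadOnlyList<string> colors)
    {
        return users
            .Zip(Cycle(colors), (user, color) =>
                new Dictionary<string, object>(user) { ["RowColor"] = color })
            .ToList(); // Zip stops when the finite user sequence ends.
    }

    public static void Main()
    {
        var users = new List<Dictionary<string, object>>
        {
            new Dictionary<string, object> { ["Name"] = "Alex", ["Age"] = 25 },
            new Dictionary<string, object> { ["Name"] = "Peter", ["Age"] = 35 },
            new Dictionary<string, object> { ["Name"] = "Muhammad", ["Age"] = 28 },
            new Dictionary<string, object> { ["Name"] = "Raul", ["Age"] = 29 },
        };
        foreach (var user in AssignColors(users, new[] { "#bcf", "#fc0" }))
        {
            Console.WriteLine($"{user["Name"]}: {user["RowColor"]}");
        }
    }
}
```

Copying each dictionary keeps the input untouched; mutating the original user inside the lambda would also work but mixes side effects into the query.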
Something like this should do the trick:
var i = 0;
var l = RowColorList.Count;
UserList.ForEach(user => user.Add("RowColor", RowColorList[(i++) % l]));
The % operator will guarantee "cyclic" access to the RowColorList.

decorate IEnumerable without looping

I need to create an IEnumerable of DocumentSearch objects from an IQueryable.
The following code causes the database to load the entire result which makes my app slow.
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
var enumerator = documents.GetEnumerator();
while(enumerator.MoveNext())
{
yield return new DocumentSearch(enumerator.Current);
}
}
The natural way of writing this is:
public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
{
return documents.Select(doc => new DocumentSearch(doc));
}
When you call one of the IEnumerable extension methods like Select, Where, OrderBy etc, you are still adding to the recipe for the results that will be returned. When you try to access an element of an IEnumerable (as in your example), the result set must be resolved at that time.
For what it's worth, your while loop would be more naturally written as a foreach loop, though it should have the same semantics about when the query is executed.
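The "recipe" behaviour can be observed without a database. The sketch below uses an in-memory list exposed through AsQueryable as a stand-in for a real query provider, and made-up Document and DocumentSearch classes with a construction counter. Whether an actual database provider can translate a constructor call inside Select is provider-specific and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the asker's types.
public sealed class Document
{
    public string Title { get; set; } = "";
}

public sealed class DocumentSearch
{
    public static int Created; // counts constructor calls for the demo
    public string Title { get; }

    public DocumentSearch(Document document)
    {
        Created++;
        Title = document.Title;
    }
}

public static class DocumentSearchDemo
{
    public static IEnumerable<DocumentSearch> BuildDocumentSearch(IQueryable<Document> documents)
    {
        return documents.Select(doc => new DocumentSearch(doc));
    }

    // Returns constructor calls after building the query and after taking one result.
    public static (int AfterBuild, int AfterTakeOne) Run()
    {
        DocumentSearch.Created = 0;
        IQueryable<Document> source = Enumerable.Range(1, 100)
            .Select(i => new Document { Title = "Doc " + i })
            .AsQueryable();

        IEnumerable<DocumentSearch> results = BuildDocumentSearch(source);
        int afterBuild = DocumentSearch.Created;

        // Only the elements actually consumed are projected.
        _ = results.Take(1).ToList();
        return (afterBuild, DocumentSearch.Created);
    }

    public static void Main()
    {
        var (afterBuild, afterTakeOne) = Run();
        Console.WriteLine($"created after build: {afterBuild}, after Take(1): {afterTakeOne}");
    }
}
```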

C# - Populate a list using lambda expressions or LINQ

It's been a while since I've used lambda expressions or LINQ and am wondering how I would do the following (I know I can use a foreach loop, this is just out of curiosity) using both methods.
I have an array of string paths (does it make a difference if it's an array or list here?) from which I want to return a new list of just the filenames.
i.e. using a foreach loop it would be:
string[] paths = getPaths();
List<string> listToReturn = new List<string>();
foreach (string path in paths)
{
listToReturn.Add(Path.GetFileName(path));
}
return listToReturn;
How would I do the same thing with both lambda and LINQ?
EDIT: In my case, I'm using the returned list as an ItemsSource for a ListBox (WPF) so I'm assuming it's going to need to be a list as opposed to an IEnumerable?
Your main tool would be the .Select() method.
string[] paths = getPaths();
var fileNames = paths.Select(p => Path.GetFileName(p));
does it make a difference if it's an array or list here?
No, an array also implements IEnumerable<T>
Note that this minimal approach involves deferred execution, meaning that fileNames is an IEnumerable<string> and only starts iterating over the source array when you get elements from it.
If you want a List (to be safe), use
string[] paths = getPaths();
var fileNames = paths.Select(p => Path.GetFileName(p)).ToList();
But when there are many files, you might want to go in the opposite direction and stream the results as they are found, by also using a deferred-execution source:
var filePaths = Directory.EnumerateFiles(...); // requires .NET Framework 4 or later
var fileNames = filePaths.Select(p => Path.GetFileName(p));
It depends on what you want to do next with fileNames.
I think by "LINQ" you really mean "a query expression" but:
// Query expression
var listToReturn = (from path in paths
select Path.GetFileName(path)).ToList();
// Extension methods and a lambda
var listToReturn = paths.Select(path => Path.GetFileName(path))
.ToList();
// Extension methods and a method group conversion
var listToReturn = paths.Select(Path.GetFileName)
.ToList();
Note how the last one works by constructing the projection delegate from a method group, like this:
Func<string, string> projection = Path.GetFileName;
var listToReturn = paths.Select(projection).ToList();
(Just in case that wasn't clear.)
Note that if you don't need to use this as a list - if you just want to iterate over it, in other words - you can drop the ToList() call from each of these approaches.
It's just:
var listToReturn = getPaths().Select(x => Path.GetFileName(x)).ToList();
As already stated in other answers, if you don't actually need a List<string>, you can omit the ToList() and simply return an IEnumerable<string>. For example, if you just need to iterate over it, IEnumerable<> is better because it avoids creating another list of strings.
Also, given that Select() method takes a delegate, and there's an implicit conversion between method groups and delegates having the same signature, you can skip the lambda and just do:
getPaths().Select(Path.GetFileName)
You could do it like this:
return getPaths().Select(Path.GetFileName).ToList();
listToReturn = paths.Select(p => Path.GetFileName(p)).ToList();
