Removing items from collection while iterating IEnumerable


I have the following code:
foreach (var bar in dataFromDataFeed.Where(bar => bar.Key < fromTicks
|| bar.Key > toTicks))
{
dataFromDataFeed.Remove(bar.Key);
}
Is this safe or do I need to convert the IEnumerable within foreach to a Dictionary<T,U> first?
Thanks.

No, that will either bomb or get you a result that still has bad elements. Just invert the Where expression:
var filtered = dataFromDataFeed.Where(bar => bar.Key >= fromTicks && bar.Key <= toTicks);
dataFromDataFeed = filtered.ToDictionary(bar => bar.Key, bar => bar.Value); // optional
It isn't clear whether you actually need the dictionary itself to be updated. Often that isn't necessary, since you already have a perfectly good enumerable, which is why the last statement is marked // optional.
Keep in mind that calling Remove() in a loop, as your original code does, costs O(n*m) on a list-based collection, which is very bad. Building a new collection with ToList() or ToDictionary() is only O(m), but requires O(m) extra storage. Trading speed for memory is a common programmer's decision, but this one is a slam dunk unless m is huge (hundreds of millions, and you're fighting OOM) or very small. Neither should apply, given the expression.
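A minimal runnable sketch of the filter-and-rebuild approach, assuming dataFromDataFeed is a Dictionary<long, decimal>; the key/value types and the sample values are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: tick values as keys, prices as values.
var dataFromDataFeed = new Dictionary<long, decimal>
{
    [100] = 1.0m,
    [200] = 2.0m,
    [300] = 3.0m,
    [400] = 4.0m,
};
long fromTicks = 150, toTicks = 350;

// Keep only the entries inside [fromTicks, toTicks] by building a new dictionary.
// The original dictionary is only read here, never modified while enumerated.
dataFromDataFeed = dataFromDataFeed
    .Where(bar => bar.Key >= fromTicks && bar.Key <= toTicks)
    .ToDictionary(bar => bar.Key, bar => bar.Value);

Console.WriteLine(string.Join(", ", dataFromDataFeed.Keys.OrderBy(k => k))); // 200, 300
```

Because ToDictionary() consumes the whole query before the assignment happens, the old dictionary is simply abandoned rather than mutated mid-enumeration.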

It is considered bad practice to modify a collection while it is being enumerated, and in many cases this will cause an InvalidOperationException to be thrown.
You should copy the values into another List or array (e.g. by calling ToList() after your Where() call); then you won't be modifying the collection you are enumerating.
foreach (var bar in dataFromDataFeed.Where(bar => bar.Key < fromTicks || bar.Key > toTicks).ToList())
{
dataFromDataFeed.Remove(bar.Key);
}
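The same idea as a self-contained sketch, again assuming a Dictionary<long, decimal> with made-up data. Selecting only the keys before ToList() keeps the copy small. (On .NET Core 3.0 and later, Dictionary<TKey,TValue>.Remove no longer invalidates an active enumerator, so the original loop may not throw there; on .NET Framework it does, and materializing first behaves the same on every runtime.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the feed.
var dataFromDataFeed = new Dictionary<long, decimal>
{
    [100] = 1.0m, [200] = 2.0m, [300] = 3.0m, [400] = 4.0m,
};
long fromTicks = 150, toTicks = 350;

// ToList() runs the query to completion first, so the loop below
// enumerates a separate List<long>, not the dictionary itself.
var keysToRemove = dataFromDataFeed.Keys
    .Where(key => key < fromTicks || key > toTicks)
    .ToList();

foreach (var key in keysToRemove)
{
    dataFromDataFeed.Remove(key);
}

Console.WriteLine(dataFromDataFeed.Count); // 2
```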

It will throw an error because you are still iterating over the collection: modifying a collection is not supported while you are enumerating it.
Create a new List using ToList():
foreach (var bar in dataFromDataFeed.Where(bar => bar.Key < fromTicks
|| bar.Key > toTicks).ToList())
{
dataFromDataFeed.Remove(bar.Key);
}

Although Dictionary could almost certainly have included a method which would call a predicate function on each item and remove all items where that predicate returned true, it doesn't. Consequently, if one has a Dictionary that includes some items which match a predicate and others which don't, there are only two ways to get a dictionary containing just the items which don't satisfy the predicate: build a list of all the items satisfying the predicate and then remove each of them from the dictionary, or build a new dictionary containing only the items that don't satisfy the predicate and abandon the original in favor of the new one. Which approach is better will depend upon the relative numbers of items to be kept and discarded.
As an alternative, one could switch to using a ConcurrentDictionary. Unlike a Dictionary, a ConcurrentDictionary will allow items to be removed without invalidating any enumeration in progress. If one only removes items as they are enumerated, I would expect a ConcurrentDictionary to enumerate exactly as one would expect. If enumerating one item would sometimes cause code to delete a different item, then code must be prepared for the fact that removing an item which has not yet been enumerated might, but is not required to, cause that item to be omitted from the enumeration.
Although Dictionary is apt to generally be faster than ConcurrentDictionary, it may be worthwhile to use the latter if "delete items where..." operations are common and would have to either delete or copy a significant fraction of the items in the collection.
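A small sketch of the ConcurrentDictionary alternative, with made-up tick data. Each entry is removed only while it is the current item, which is the case described above as behaving predictably:

```csharp
using System;
using System.Collections.Concurrent;

var data = new ConcurrentDictionary<long, decimal>();
for (long tick = 100; tick <= 400; tick += 100)
{
    data[tick] = tick / 100m;
}
long fromTicks = 150, toTicks = 350;

// ConcurrentDictionary's enumerator does not throw when the dictionary
// changes, so the current entry can be removed during the foreach.
foreach (var bar in data)
{
    if (bar.Key < fromTicks || bar.Key > toTicks)
    {
        data.TryRemove(bar.Key, out _);
    }
}

Console.WriteLine(data.Count); // 2
```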

Related

How to use conditional in List.ForEach()?

I need to remove items from the HttpSession collection. In the following code, myList contains the same items as Session. If there are items in myList/Session that are not in itemsToRemove, they should be deleted from the session collection.
However, I'm not sure what the lambda syntax should look like. The following isn't correct.
myList.ForEach(x => !itemsToRemove.Contains(x) { Session.Remove(x) });
Any ideas how I can use a lambda expression to put everything on one line to accomplish this task?
Also, is there a way to avoid creating the intermediate list (myList)? I'm only doing that because I can't remove items from Session while iterating through it.

The most naïve way:
myList.Where(x => !itemsToRemove.Contains(x)) // LINQ extension method
    .ToList()                                 // <---- required because...
    .ForEach(x => Session.Remove(x));         // ...ForEach is a List<T> method
Also you can use this:
myList.Except(itemsToRemove)
.ToList()
.ForEach(x => Session.Remove(x));
But to use ForEach the underlying type has to be List<T>, so you need to call ToList() first, which causes one extra enumeration of the whole collection.
I would do this instead:
foreach (var x in myList.Except(itemsToRemove))
{
    Session.Remove(x);
}
This will minimize the number of enumerations.
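HttpSessionState only exists inside ASP.NET (System.Web), so this sketch uses a Dictionary<string, object> as a hypothetical stand-in for Session. It follows the question's rule literally: names in myList that are not in itemsToRemove get removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HttpSessionState: a dictionary keyed by item name.
var session = new Dictionary<string, object>
{
    ["cart"] = 1, ["user"] = 2, ["temp1"] = 3, ["temp2"] = 4,
};
var myList = session.Keys.ToList();            // the intermediate copy from the question
var itemsToRemove = new List<string> { "cart", "user" };

// Except is lazy, but it enumerates myList, not the session, so removing
// from the session inside the loop is safe and no extra ToList() is needed.
foreach (var x in myList.Except(itemsToRemove))
{
    session.Remove(x);
}

Console.WriteLine(string.Join(", ", session.Keys.OrderBy(k => k))); // cart, user
```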

First off, abatischev's answer is excellent. It's ideal from both a performance perspective and a readability perspective. If, however, you really want to cram all the functionality into one statement (which I don't recommend), you could try the following:
Session.OfType<string>()
.Except(itemsToRemove)
.ToList()
.ForEach(x => Session.Remove(x));
As abatischev mentioned, the ToList() call costs you an extra enumeration through the collection, which could have a non-trivial performance impact if the collection has a large number of elements in it. However, it means the ForEach() call iterates over a newly created List<string>, which fills the role of your myList and lets you remove items from the Session (since you're iterating through that temporary list, rather than the Session).
(Note that I haven't worked with HttpSessionState objects myself, merely looked at their MSDN article. You may need to replace the string generic type with something else if strings aren't what HttpSessionState holds.)
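A rough, runnable approximation of the one-liner above. A non-generic Hashtable stands in for HttpSessionState (an assumption for illustration only). HttpSessionState, per its documentation, enumerates the session-variable names directly, whereas a Hashtable enumerates DictionaryEntry values, so the sketch enumerates session.Keys instead:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: a non-generic Hashtable instead of HttpSessionState.
var session = new Hashtable { ["cart"] = 1, ["user"] = 2, ["temp1"] = 3, ["temp2"] = 4 };
var itemsToRemove = new List<string> { "cart", "user" };

// OfType<string>() turns the non-generic key collection into IEnumerable<string>.
// ToList() copies the keys before ForEach starts removing entries,
// so the Hashtable is never modified while one of its enumerators is active.
session.Keys.OfType<string>()
    .Except(itemsToRemove)
    .ToList()
    .ForEach(x => session.Remove(x));

Console.WriteLine(session.Count); // 2
```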

Does foreach do the same as a classic for?

Regarding collections implementing this[int], and assuming the collection won't change during the enumeration, does the foreach (var item in list) loop always produce the same sequence as for (var i = 0; i < list.Count; ++i)?
In other words, when I need ascending order by index, can I use foreach, or is it simply safer to use for? Or does it depend on the current collection implementation, and might it vary or change over time?

foreach (var item in list)
{
// do things
}
is translated by the compiler into roughly
using (var enumerator = list.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        var item = enumerator.Current;
        // do things
    }
}
So as you can see, it doesn't use the indexer list[i] in the general case.
For most collection types, however, the semantics are the same.
Edit:
If you implement IList<T> as a linked list, for example, it's very unlikely that your enumerator will use the indexer, as that would be very inefficient.
As a rule of thumb, using foreach ensures you use the most efficient traversal for the class at hand, since it is the one chosen by the class's author. In the worst case, you will just suffer a small indirection overhead that is very unlikely to be noticeable.
Edit 2 (after nos's comment):
There is one case where the semantics of the two constructs vary wildly: modifying the collection.
With a simple for loop, nothing in particular happens if you change the collection while iterating through it; the program behaves as if you know what you're doing. This can result in some values being visited more than once or others being skipped, but there is no exception as long as you don't access an index outside the range of the indexer (which would require a multithreaded program to happen).
With a foreach loop, if you modify the collection while iterating through it, you enter undefined behavior. The documentation tells us:
An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.
In that case, expect most of .NET's built-in collection types to throw an InvalidOperationException, but anything can happen in a custom implementation, from missed values to repeated values, including infinite loops...
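A small sketch of both behaviors with List<T>, whose enumerator does throw InvalidOperationException after a modification; the values are arbitrary:

```csharp
using System;
using System.Collections.Generic;

// foreach: List<T>'s enumerator notices the Remove() on its next MoveNext() and throws.
var numbers = new List<int> { 1, 2, 2, 3 };
bool threw = false;
try
{
    foreach (var n in numbers)
    {
        if (n == 2)
        {
            numbers.Remove(n);
        }
    }
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine($"foreach threw: {threw}"); // foreach threw: True

// Forward for loop: no exception, but RemoveAt(i) shifts the next element
// into slot i, and i++ then steps past it, so the second 2 is never checked.
var forwardList = new List<int> { 1, 2, 2, 3 };
for (int i = 0; i < forwardList.Count; i++)
{
    if (forwardList[i] == 2)
    {
        forwardList.RemoveAt(i);
    }
}
Console.WriteLine(string.Join(", ", forwardList)); // 1, 2, 3
```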

Generally speaking, yes, but strictly speaking, no. It really depends on the implementation.
Usually with for you would use the this[] indexer property, while foreach uses GetEnumerator() to get an enumerator that iterates over the collection. Depending on the implementation, the enumerator might yield a different result than the for loop.
The implied logic of a list is that it has a specific order, so when you implement IList, it is reasonable to state (and to assume) that the indexer and the enumerator produce elements in the same order.

There is no guarantee that this would be the case. The code paths can be completely separate. Of course collections like List will produce the same result but you can write data structures (even useful ones) that do not.
The indexer is just a property with an additional index argument. You can return a random value if you feel like it.

One important difference to keep in mind is that inside a foreach you can't make changes to the collection being enumerated (adding, removing, or replacing items).
If you wish to alter the collection (basically, delete items from it) while looping, you must use a for loop.
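One common way to delete while looping with for, sketched below with arbitrary values, is to walk the list backwards; for List<T> specifically, the built-in RemoveAll method does the same job in one call:

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 2, 3, 2 };

// Walking backwards means RemoveAt(i) only shifts elements that
// have already been visited, so nothing is skipped.
for (int i = numbers.Count - 1; i >= 0; i--)
{
    if (numbers[i] == 2)
    {
        numbers.RemoveAt(i);
    }
}
Console.WriteLine(string.Join(", ", numbers)); // 1, 3

// For List<T>, RemoveAll does the same in a single pass and returns how many it removed.
var others = new List<int> { 1, 2, 2, 3, 2 };
int removed = others.RemoveAll(n => n == 2);
Console.WriteLine($"{removed} removed: {string.Join(", ", others)}"); // 3 removed: 1, 3
```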

Empty HashSet - Count vs Any

I am only interested to know whether a HashSet hs is empty or not.
I am NOT interested to know exactly how many elements it contains.
So I could use this:
bool isEmpty = (hs.Count == 0);
...or this:
bool isEmpty = hs.Any(x=>true);
Which one provides better results performance-wise (especially when the HashSet contains a large number of elements)?

On a HashSet you can use both, since HashSet internally manages the count.
However, if your data is in an IEnumerable<T> or IQueryable<T> object, using result.Any() is preferable over result.Count() (both LINQ methods).
On a plain IEnumerable<T>, LINQ's .Count() has to iterate through the whole sequence (unless the source is a collection that already knows its count), while .Any() only checks whether at least one element exists.
Update:
Just small addition:
In your case, with the HashSet, .Count may be preferable, as .Any() requires an IEnumerator to be created and returned, which is a small overhead if you are not going to use the enumerator anywhere else in your code (foreach, LINQ, etc.). But I think that would be considered a micro-optimization.

HashSet<T> implements ICollection<T>, which has a Count property, so a call to Count() will just call HashSet<T>.Count, which is an O(1) operation (meaning it doesn't actually have to count; it just returns the current size of the HashSet).
Any will iterate until it finds an item that matches the condition, then stop.
So in your case, it will just iterate one item, then stop, so the difference will probably be negligible.
If you had a filter that you wanted to apply (e.g. x => x.IsValid), then Any would definitely be faster, since Count(x => x.IsValid) would iterate over the entire collection, while Any would stop as soon as it finds a match.
For those reasons I generally prefer to use Any() rather than Count()==0 since it's more direct and avoids any potential performance problems. I would only switch to Count()==0 if it provided a significant performance boost over Any().
Note that Any(x=>true) is logically the same as calling Any(). That doesn't change your question, but it looks cleaner without the lambda.
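To see the difference the answers describe, this sketch counts how many times the predicate runs; the set contents are arbitrary:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var hs = new HashSet<int>();
for (int i = 0; i < 1000; i++) hs.Add(i);

// Count is a property on HashSet<T>: no enumeration at all.
bool isEmptyByCount = hs.Count == 0;

// Any() only asks whether the first MoveNext() succeeds.
bool isEmptyByAny = !hs.Any();

// With a predicate, count how often it actually runs.
int anyCalls = 0, countCalls = 0;
bool anyMatch = hs.Any(x => { anyCalls++; return true; });
int matches = hs.Count(x => { countCalls++; return true; });

Console.WriteLine($"{isEmptyByCount} {isEmptyByAny}");                         // False False
Console.WriteLine($"Any: {anyCalls} call(s), Count: {countCalls} call(s)");   // Any: 1 call(s), Count: 1000 call(s)
```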

Depending on the type of collection, it may or may not matter performance-wise. So why not just use hs.Any() since that is designed for exactly what you need to know?
And the lambda expression x => true has no meaning here. You can leave that out.

Find a set number of values from collection

I have some code that filters through a collection of sorted objects according to a filter value. For instance, I want to find the objects where Name=="searchquery". Then I want to take the top X values from that collection.
My questions:
My collection is a List<T>. Does this collection guarantee the sort order?
If so, is there a built-in way to find the top X objects that satisfy the condition? I'm looking for something like
collection.FindAll(o=>o.Name=="searchquery",100);
That would give me the top 100 objects that satisfy the condition. The reason is performance: once I've found my 100 objects, I don't want to keep checking the entire collection.
If i write:
collection.FindAll(o=>o.Name=="searchquery").Take(100);
will the runtime be intelligent enough to stop checking once it hits 100?
I can of course implement this myself, but if there is a built-in way (like a LINQ method) I'd prefer to use it.

collection.Where(o=>o.Name=="searchquery").Take(100)
The order should be in the same order as the original list, and it will stop checking once it takes 100 elements (Where returns an enumeration which is only evaluated as you take elements). From the documentation:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
If you need a different sort order, you will have to specify it (this of course means you have no choice but to examine all elements though).

Ok,
My collection is a List<T>. Does this collection guarantee the sort order?
No, but it will preserve the order of insertion.
If so, is there a built-in way to find the top X objects that satisfy the condition?
someEnumerable.Where(r => r.Name == "searchquery").Take(100)
If i write:
// Some linq that works
will the runtime be intelligent enough to stop checking once it hits 100?
Yes, probably
Now, if you have an IList that has been sorted and you want to quickly iterate over the top 100 items, do this:
var list = sourceEnumerable.OrderBy(r => r.Name).ToList();
foreach(var r in list.Where(r => r.Name == "searchquery").Take(100))
{
// Do something
}

collection.Where(o=>o.Name=="searchquery").Take(100)
Is the most correct answer, because behind the scenes Where uses deferred execution. Below is a simplified version of how the Where method is implemented:
public static IEnumerable<T> Where<T>(this IEnumerable<T> collection, Func<T, bool> func)
{
    foreach (var item in collection)
    {
        if (func(item))
        {
            yield return item;
        }
    }
}
So when calling Take(100), the loop only runs until it has found the first 100 items which satisfy the criteria.
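A quick way to check the deferred-execution claim is to count predicate calls. In this sketch plain strings stand in for objects with a Name property, and Take(5) replaces Take(100) to keep the data small:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: 1000 names, every 10th one matches the search.
var collection = new List<string>();
for (int i = 0; i < 1000; i++)
{
    collection.Add(i % 10 == 0 ? "searchquery" : $"other{i}");
}

// Lazy: Where hands items to Take one at a time and stops being pulled after 5 matches.
int whereChecks = 0;
var top = collection
    .Where(name => { whereChecks++; return name == "searchquery"; })
    .Take(5)
    .ToList();

// Eager: FindAll tests every element and builds the full result list before Take runs.
int findAllChecks = 0;
var topEager = collection
    .FindAll(name => { findAllChecks++; return name == "searchquery"; })
    .Take(5)
    .ToList();

Console.WriteLine($"Where: {whereChecks} checks, FindAll: {findAllChecks} checks");
```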

If you know for sure that the objects in your collection are not repeated (e.g. a primary key), then you can use SortedList instead of List<T>. This will guarantee that your list is sorted when you filter it using a certain criterion. Have a look here for a sorted list example:
http://msdn.microsoft.com/en-us/library/system.collections.sortedlist(v=vs.100).aspx

The best way to get a count of IEnumerable<T>

What's the best/easiest way to obtain a count of the items in an IEnumerable collection without enumerating over all of them?
Possible with LINQ or Lambda?

In general, you have to loop through it. LINQ offers the Count method (which reads the Count property directly when the source implements ICollection<T>):
var result = myenum.Count();

The solution depends on why you don't want to enumerate through the collection.
If it's because enumerating the collection might be slow, then there is no solution that will be faster. You might want to consider using an ICollection instead if possible. Unless the enumeration is remarkably slow (e.g. it reads items from disk) speed shouldn't be a problem though.
If it's because enumerating the collection will require more code then it's already been written for you in the form of the .Count() extension method. Just use MyEnumerable.Count().
If it's because you want to be able to enumerate the collection after you've counted then the .Count() extension method allows for this. You can even call .Count() on a collection you're in the middle of enumerating and it will carry on from where it was before the count. For example:
var myEnumerable = Series.Generate(5);
foreach (int item in myEnumerable)
{
    Console.WriteLine(item + " (" + myEnumerable.Count() + ")");
}
will give the results
0 (5)
1 (5)
2 (5)
3 (5)
4 (5)
If it's because the enumeration has side effects (e.g. writes to disk/console) or is dependent on variables that may change between counting and enumerating (e.g. reads from disk) [N.B. If possible, I would suggest rethinking the architecture, as this can cause a lot of problems], then one possibility to consider is reading the enumeration into intermediate storage. For example:
List<int> seriesAsList = Series.Generate(5).ToList();
All of the above assume you can't change the type (i.e. it is returned from a library that you do not own). If possible you might want to consider changing to use an ICollection or IList (ICollection being more widely scoped than IList) which has a Count property on it.
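On .NET 6 and later (an assumption about your target framework), Enumerable.TryGetNonEnumeratedCount reports the count only when it is available without enumerating, which matches the ICollection versus plain IEnumerable distinction drawn above. Generate is a hypothetical iterator used for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int yielded = 0;

// Hypothetical lazy sequence that records how many items it has produced.
IEnumerable<int> Generate(int n)
{
    for (int i = 0; i < n; i++)
    {
        yielded++;
        yield return i;
    }
}

IEnumerable<int> lazy = Generate(5);
IEnumerable<int> list = new List<int> { 0, 1, 2, 3, 4 };

// Succeeds only when the count is known without enumerating.
bool listKnown = list.TryGetNonEnumeratedCount(out int listCount); // true, 5
bool lazyKnown = lazy.TryGetNonEnumeratedCount(out int lazyCount); // false

// Count() can use ICollection<T>.Count for the list, but must walk the iterator.
int total = lazy.Count();
Console.WriteLine($"{listKnown}/{listCount} {lazyKnown} {total} yielded={yielded}");
```

If TryGetNonEnumeratedCount returns false, the remaining options are the ones listed above: enumerate once with Count(), or materialize with ToList() and use its Count property.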

You will have to enumerate to get a count. Other constructs like the List keep a running count.

Use this.
IEnumerable list =..........;
list.OfType<T>().Count()
it will return the count.

There's also IList or ICollection, if you want to use a construct that is still somewhat flexible, but also has the feature you require. They both imply IEnumerable.

It also depends on what you want to achieve by counting. If you only want to find out whether the enumerable collection has any elements, you could use
myEnumerable.Any() rather than myEnumerable.Count(), since the former only needs the first element while the latter has to go through all of them.

An IEnumerable will have to iterate through every item to get the full count.
If you just need to check whether there are one or more items in an IEnumerable, a more efficient method is to check whether there are any. Any() only checks whether there is a value and does not loop through everything.
IEnumerable<string> myStrings = new List<string> { "one", "two", "three" };
bool hasValues = myStrings.Any();

Not possible with LINQ, as calling .Count(...) does enumerate the collection. If you're running into the problem where you can't iterate through a collection twice, try this:
List<MyTableItem> myList = dataContext.MyTable.ToList();
int myTableCount = myList.Count;
foreach (MyTableItem item in myList)
{
...
}

If you need to count and then loop you may be better off with a list.
If you're using count to check for members you can use Any() to avoid enumerating the entire collection.

The best solution, as I think, is to do the following:
using System.Linq; // AsQueryable() is defined in System.Linq, not System.Linq.Dynamic
myEnumerable.AsQueryable().Count()

When I want to use the Count property, I use IList<T>, which extends the IEnumerable<T> and ICollection<T> interfaces. In the example below, the object behind the IList<string> is an array (Directory.GetFiles returns string[]). I stepped through using the VS debugger and found that the .Count property below returns the array's Length.
IList<string> FileServerVideos = Directory.GetFiles(VIDEOSERVERPATH, "*.mp4");
if (FileServerVideos.Count == 0)
return;
