What's the best/easiest way to obtain a count of the items in an IEnumerable collection without enumerating all of the items in the collection?
Is it possible with LINQ or a lambda?
In any case, you have to loop through it. LINQ offers the Count() extension method:
var result = myenum.Count();
The solution depends on why you don't want to enumerate through the collection.
If it's because enumerating the collection might be slow, then there is no solution that will be faster. You might want to consider using an ICollection instead, if possible. Unless the enumeration is remarkably slow (e.g. it reads items from disk), speed shouldn't be a problem, though.
If it's because enumerating the collection will require more code, then that code has already been written for you in the form of the .Count() extension method. Just use MyEnumerable.Count().
If it's because you want to be able to enumerate the collection after you've counted it, the .Count() extension method allows for this. You can even call .Count() on a collection you're in the middle of enumerating, and the outer loop will carry on from where it was before the count, because Count() works with its own enumerator. For example:
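// Series.Generate(5) is not a .NET API; it stands for a helper that yields 0..4.
// Enumerable.Range(0, 5) from System.Linq behaves the same way here.
IEnumerable<int> myEnumerable = Series.Generate(5);
foreach (int item in myEnumerable)
{
    Console.WriteLine(item + " (" + myEnumerable.Count() + ")");
}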
will output:
0 (5)
1 (5)
2 (5)
3 (5)
4 (5)
If it's because the enumeration has side effects (e.g. writes to disk/console) or depends on variables that may change between counting and enumerating (e.g. reads from disk) [N.B. If possible, I would suggest rethinking the architecture, as this can cause a lot of problems], then one possibility to consider is reading the enumeration into intermediate storage. For example:
List<int> seriesAsList = Series.Generate(5).ToList();
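To see why reading the sequence into a list helps when enumeration has side effects, here is a small sketch. The Generate iterator and its Executions counter are hypothetical stand-ins for Series.Generate; the sketch assumes a plain C# iterator method, which re-runs its body every time it is enumerated.

```csharp
using System.Collections.Generic;
using System.Linq;

static class MaterializeDemo
{
    // Counts how many times the generator body actually ran.
    public static int Executions;

    // A deferred sequence with a side effect: every enumeration re-runs the body.
    static IEnumerable<int> Generate(int n)
    {
        Executions++;
        for (int i = 0; i < n; i++)
            yield return i;
    }

    public static (int count, int sum, int executions) CountThenSum(bool materialize)
    {
        Executions = 0;
        IEnumerable<int> source = Generate(5);
        if (materialize)
            source = source.ToList(); // runs the generator exactly once

        int count = source.Count();   // a List<int> answers this from its Count property
        int sum = source.Sum();       // enumerates again (the list, or the generator)
        return (count, sum, Executions);
    }
}
```

CountThenSum(false) runs the generator twice (once for the count, once for the sum); CountThenSum(true) runs it once.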
All of the above assume you can't change the type (i.e. it is returned from a library that you do not own). If possible, you might want to consider changing to an ICollection or IList (ICollection being the more general of the two), which has a Count property.
You will have to enumerate to get a count. Other constructs, such as List<T>, keep a running count.
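Use this:
IEnumerable list =..........;
int count = list.OfType<T>().Count();
It will return the number of elements of type T. OfType<T> skips elements of any other type (and nulls), so use list.Cast<object>().Count() if you want every element counted.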
There's also IList or ICollection, if you want to use a construct that is still somewhat flexible but also has the feature you require. Both extend IEnumerable.
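A sketch of "use the count only if the type already has one": the helper below (TryGetCountWithoutEnumerating is a hypothetical name) checks for ICollection<T>, IReadOnlyCollection<T> and the non-generic ICollection, and never enumerates. On .NET 6 and later, the built-in Enumerable.TryGetNonEnumeratedCount performs a similar check, so it is worth preferring where available.

```csharp
using System.Collections;
using System.Collections.Generic;

static class CountHelpers
{
    // Returns the element count only when the source already tracks it,
    // so the sequence is never enumerated. Otherwise returns false.
    public static bool TryGetCountWithoutEnumerating<T>(IEnumerable<T> source, out int count)
    {
        switch (source)
        {
            case ICollection<T> generic:          // List<T>, T[], HashSet<T>, ...
                count = generic.Count;
                return true;
            case IReadOnlyCollection<T> readOnly: // types exposing only a read-only count
                count = readOnly.Count;
                return true;
            case ICollection nonGeneric:          // older non-generic collections
                count = nonGeneric.Count;
                return true;
            default:                              // iterators, LINQ queries, streams, ...
                count = 0;
                return false;
        }
    }
}
```

When the helper returns false, the only way to get the count is to enumerate, e.g. with Count().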
It also depends on what you want to achieve by counting. If you are only interested in whether the enumerable collection has any elements, you could use
myEnumerable.Any() rather than myEnumerable.Count(): the former only needs to fetch the first element, while the latter fetches all of them.
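To make the difference concrete, the sketch below counts how many elements are pulled from a lazy sequence. Numbers and the Pulled counter are hypothetical helpers for illustration; the source is assumed to be an iterator that cannot report its count up front.

```csharp
using System.Collections.Generic;
using System.Linq;

static class AnyVsCountDemo
{
    // Number of elements the consumer has pulled from the sequence.
    public static int Pulled;

    static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static int PulledByAny(int n)
    {
        Pulled = 0;
        Numbers(n).Any();   // stops after the first element
        return Pulled;
    }

    public static int PulledByCount(int n)
    {
        Pulled = 0;
        Numbers(n).Count(); // walks the whole sequence
        return Pulled;
    }
}
```

With 1,000 elements, PulledByAny(1000) returns 1 and PulledByCount(1000) returns 1000.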
An IEnumerable will have to iterate through every item to get the full count.
If you just need to check whether there are one or more items in an IEnumerable, a more efficient method is to check whether there are any. Any() only checks whether there is a value and does not loop through everything.
IEnumerable<string> myStrings = new List<string> { "one", "two", "three" };
bool hasValues = myStrings.Any();
Not possible with LINQ, as calling .Count(...) does enumerate the collection. If you're running into the problem where you can't iterate through a collection twice, try this:
List<MyTableItem> myList = dataContext.MyTable.ToList();
int myTableCount = myList.Count;
foreach (MyTableItem item in myList)
{
    // ...
}
If you need to count and then loop you may be better off with a list.
If you're using a count to check for members, you can use Any() to avoid enumerating the entire collection.
In my opinion, the best solution is the following:
using System.Linq; // AsQueryable() comes from System.Linq (Queryable), not System.Linq.Dynamic
myEnumerable.AsQueryable().Count()
When I want to use the Count property, I use IList, which extends the IEnumerable and ICollection interfaces. In the example below, the object behind the IList is an array (Directory.GetFiles returns string[]); stepping through with the VS debugger, I found that the .Count property below returns the array's Length property.
IList<string> FileServerVideos = Directory.GetFiles(VIDEOSERVERPATH, "*.mp4");
if (FileServerVideos.Count == 0)
return;
Related
I have the following code:
foreach (var bar in dataFromDataFeed.Where(bar => bar.Key < fromTicks
|| bar.Key > toTicks))
{
dataFromDataFeed.Remove(bar.Key);
}
Is this safe or do I need to convert the IEnumerable within foreach to a Dictionary<T,U> first?
Thanks.
No, that will either bomb or get you a result that still has bad elements. Just invert the Where expression:
var filtered = dataFromDataFeed.Where(bar => bar.Key >= fromTicks && bar.Key <= toTicks);
dataFromDataFeed = filtered.ToDictionary(bar => bar.Key, bar => bar.Value); // optional
It isn't clear whether you actually need the collection to be updated; often it is not necessary, since you have a perfectly good enumerator, which is why the last statement is marked // optional.
Keep in mind that calling Remove() the way your original code does has O(n*m) complexity if the collection is a list, which is very bad. Materializing with ToList() (or ToDictionary()) is only O(m) but requires O(m) storage. Trading speed for memory is a common programmer's decision, but this one is a slam dunk unless m is huge (hundreds of millions, and you're fighting OOM) or very small. Neither should apply, given the expression.
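A sketch of the inverted-filter approach, assuming dataFromDataFeed is a Dictionary<long, TValue> keyed by ticks (the question doesn't show its declaration, and KeepRange is a hypothetical name). It builds a new dictionary instead of removing entries from the one being enumerated.

```csharp
using System.Collections.Generic;
using System.Linq;

static class TickFilter
{
    // Keeps only entries whose key lies in [fromTicks, toTicks] by building a new
    // dictionary, so nothing is removed from the dictionary being enumerated.
    public static Dictionary<long, TValue> KeepRange<TValue>(
        Dictionary<long, TValue> data, long fromTicks, long toTicks)
    {
        return data
            .Where(bar => bar.Key >= fromTicks && bar.Key <= toTicks)
            .ToDictionary(bar => bar.Key, bar => bar.Value);
    }
}
```

The original dictionary is left untouched; assigning the result back (dataFromDataFeed = TickFilter.KeepRange(dataFromDataFeed, fromTicks, toTicks)) is the optional last step.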
It is considered bad practice to modify a collection while it is being enumerated, and in many cases this will cause an InvalidOperationException to be thrown.
You should copy the values into another List or array (e.g. by calling ToList() after your Where() call), so that you enumerate the copy rather than the collection you are modifying.
foreach (var bar in dataFromDataFeed.Where(bar => bar.Key < fromTicks || bar.Key > toTicks).ToList())
{
dataFromDataFeed.Remove(bar.Key);
}
It will throw an error because you are modifying the collection while iterating over it; modifying a collection is not supported while you are enumerating it.
Create a new list using ToList():
foreach (var bar in dataFromDataFeed.Where(bar => bar.Key < fromTicks
|| bar.Key > toTicks).ToList())
{
dataFromDataFeed.Remove(bar.Key);
}
Although Dictionary could almost certainly have included a method which would call a predicate function on each item and remove all items where that predicate returned true, it doesn't. Consequently, if one has a Dictionary that includes some items which match a predicate and others which don't, the only ways to get a dictionary including only the items which don't satisfy the predicate are either to build a list of all the items satisfying the predicate and then remove from the dictionary all the items on the list, or else to build a dictionary containing only items that don't satisfy the predicate and abandon the original in favor of the new one. Which approach is better will depend upon the relative numbers of items to be kept and discarded.
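The first approach (collect the matching keys, then remove them) can be wrapped in an extension method. RemoveWhere here is a hypothetical helper, not part of Dictionary<TKey, TValue>; the ToList() call is what ensures the dictionary is no longer being enumerated when Remove runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DictionaryExtensions
{
    // Hypothetical helper: Dictionary<TKey, TValue> has no built-in RemoveWhere.
    // Step 1 snapshots the matching keys; step 2 removes them after enumeration ends.
    public static int RemoveWhere<TKey, TValue>(
        this Dictionary<TKey, TValue> dictionary,
        Func<KeyValuePair<TKey, TValue>, bool> predicate)
    {
        List<TKey> keysToRemove = dictionary.Where(predicate).Select(pair => pair.Key).ToList();
        foreach (TKey key in keysToRemove)
            dictionary.Remove(key);
        return keysToRemove.Count;
    }
}
```

The snapshot costs memory proportional to the number of discarded items, which is why this approach suits the case where few items are removed.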
As an alternative, one could switch to using a ConcurrentDictionary. Unlike a Dictionary, a ConcurrentDictionary will allow items to be removed without invalidating any enumeration in progress. If one only removes items as they are enumerated, I would expect a ConcurrentDictionary to enumerate exactly as one would expect. If enumerating one item would sometimes cause code to delete a different item, then code must be prepared for the fact that removing an item which has not yet been enumerated might, but is not required to, cause that item to be omitted from the enumeration.
Although Dictionary is apt to generally be faster than ConcurrentDictionary, it may be worthwhile to use the latter if "delete items where..." operations are common and would have to either delete or copy a significant fraction of the items in the collection.
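A minimal sketch of the ConcurrentDictionary approach, in the simple case where each item is removed only as it is enumerated. The demo class and the key/value data are made up for illustration.

```csharp
using System.Collections.Concurrent;
using System.Linq;

static class ConcurrentRemovalDemo
{
    // Removes every even key while enumerating the same ConcurrentDictionary.
    // Its enumerator does not throw when the dictionary changes underneath it.
    public static ConcurrentDictionary<int, string> RemoveEvenKeys()
    {
        var map = new ConcurrentDictionary<int, string>(
            Enumerable.Range(0, 10).ToDictionary(i => i, i => "item" + i));

        foreach (var pair in map)
        {
            if (pair.Key % 2 == 0)
                map.TryRemove(pair.Key, out _); // removes only the item just enumerated
        }
        return map;
    }
}
```

Afterwards the dictionary holds the five odd keys.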
Is there any speed improvement or indeed point in checking the Count() of an Enumerable before iterating/foreaching over the collection?
List<int> numbers = new List<int>();
if(numbers.Count() > 0)
{
foreach(var i in numbers) {/*Do something*/}
}
No, the opposite can be true. If it's not a collection (like a List or an array) but a query with deferred execution, it must be executed completely just to determine the count, which can be very expensive. In your example it's actually a List, and Enumerable.Count is clever enough to try to cast it to ICollection<T>/ICollection first. If that succeeds, the Count property is used.
So just use the foreach. It doesn't hurt if the sequence is empty, the loop will be exited immediately.
For the same reason it's better to use Enumerable.Any instead of Count() > 0 if you just want to check if a sequence contains at least one element. The intention is also more clear.
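A related cost of checking Count() > 0 on a deferred query is that the query then runs twice: once for the count and once for the foreach. The sketch below (hypothetical names, with a counter inside the Where predicate) makes that visible.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DoubleExecutionDemo
{
    // Counts how often the Where predicate is invoked.
    public static int PredicateCalls;

    static bool IsEven(int x)
    {
        PredicateCalls++;
        return x % 2 == 0;
    }

    public static int CallsWithCountCheck(IEnumerable<int> source)
    {
        PredicateCalls = 0;
        IEnumerable<int> query = source.Where(IsEven); // deferred: nothing runs yet
        if (query.Count() > 0)                         // runs the whole query once...
        {
            foreach (int i in query) { }               // ...and then a second time
        }
        return PredicateCalls;
    }

    public static int CallsWithForeachOnly(IEnumerable<int> source)
    {
        PredicateCalls = 0;
        foreach (int i in source.Where(IsEven)) { }    // runs the query once
        return PredicateCalls;
    }
}
```

For a 10-element source, the Count() check doubles the predicate calls from 10 to 20.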
If your enumerable is lazily evaluated (LINQ), the call to Count() is actually very bad, since it evaluates the whole enumerable.
Since foreach doesn't execute its body if the enumerable is empty, it's best not to use Count.
I don't believe there is any speed improvement, or indeed any point, in checking the Count() of an Enumerable before iterating/foreaching over the collection, as no code is executed within the foreach block if there aren't any items in the collection anyway.
Even if you want to check the count, use the Count property instead of the Count() extension method, since the performance of Count() is worse than that of Count.
Short answer: No, on the contrary.
Longer answer: The foreach keyword calls methods on the IEnumerator interface. These methods implicitly check whether items are present (MoveNext() returns false straight away when there are none), so calling Count() is purely redundant.
I am only interested to know whether a HashSet hs is empty or not.
I am NOT interested to know exactly how many elements it contains.
So I could use this:
bool isEmpty = (hs.Count == 0);
...or this:
bool isEmpty = hs.Any(x=>true);
Which one performs better (especially when the HashSet contains a large number of elements)?
On a HashSet you can use both, since HashSet internally manages the count.
However, if your data is in an IEnumerable<T> or IQueryable<T> object, using result.Any() is preferable to result.Count() (both LINQ methods).
LINQ's .Count() may have to iterate through the whole enumerable (it only avoids that when the source already knows its count, as a HashSet does), whereas .Any() only checks whether any object exists in the enumerable.
Update:
Just a small addition:
In your case with the HashSet, .Count may be preferable, as .Any() requires an IEnumerator to be created and returned, which is a small overhead if you are not going to use the enumerator anywhere else in your code (foreach, LINQ, etc.). But I think that would be considered a micro-optimization.
HashSet<T> implements ICollection<T>, which has a Count property, so a call to Count() will just call HashSet<T>.Count, which I'm assuming is an O(1) operation (meaning it doesn't actually have to count; it just returns the current size of the HashSet).
Any will iterate until it finds an item that matches the condition, then stop.
So in your case, it will just iterate one item, then stop, so the difference will probably be negligible.
If you had a filter that you wanted to apply (e.g. x => x.IsValid), then Any would definitely be faster, since Count(x => x.IsValid) would iterate over the entire collection, while Any would stop as soon as it finds a match.
For those reasons I generally prefer to use Any() rather than Count()==0 since it's more direct and avoids any potential performance problems. I would only switch to Count()==0 if it provided a significant performance boost over Any().
Note that Any(x=>true) is logically the same as calling Any(). That doesn't change your question, but it looks cleaner without the lambda.
Depending on the type of collection, it may or may not matter performance-wise. So why not just use hs.Any() since that is designed for exactly what you need to know?
And the lambda expression x => true has no meaning here. You can leave that out.
I have some code that filters through a collection of sorted objects according to a filter value. For instance, I want to find the objects where Name=="searchquery". Then I want to take the top X values from that collection.
My questions:
My collection is a List<T>. Does this collection guarantee the sort order?
If so, is there a built-in way to find the top X objects that satisfy the condition? I'm looking for something like
collection.FindAll(o=>o.Name=="searchquery",100);
That would give me the top 100 objects that satisfy the condition. The reason is performance: once I've found my 100 objects, I don't want to keep checking the entire collection.
If I write:
collection.FindAll(o=>o.Name=="searchquery").Take(100);
will the runtime be intelligent enough to stop checking once it hits 100?
I can of course implement this myself, but if there is a built-in way (like a LINQ method), I'd prefer to use it.
collection.Where(o=>o.Name=="searchquery").Take(100)
The order should be in the same order as the original list, and it will stop checking once it takes 100 elements (Where returns an enumeration which is only evaluated as you take elements). From the documentation:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
If you need a different sort order, you will have to specify it (this of course means you have no choice but to examine all elements though).
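The difference between the question's FindAll(...).Take(100) and Where(...).Take(100) can be observed by counting predicate calls. List<T>.FindAll is eager and always tests every element before Take runs; Where is deferred, so Take stops the search early. TakeDemo and its counter are hypothetical names for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

static class TakeDemo
{
    public static int PredicateCalls;

    static bool Matches(string name)
    {
        PredicateCalls++;
        return name == "searchquery";
    }

    // List<T>.FindAll is eager: it tests every element and builds a new list
    // before Take ever runs.
    public static int CallsWithFindAll(List<string> names, int top)
    {
        PredicateCalls = 0;
        names.FindAll(Matches).Take(top).ToList();
        return PredicateCalls;
    }

    // Where is deferred: Take stops pulling once it has 'top' matches,
    // so only a handful of elements are tested.
    public static int CallsWithWhere(List<string> names, int top)
    {
        PredicateCalls = 0;
        names.Where(Matches).Take(top).ToList();
        return PredicateCalls;
    }
}
```

With 1,000 matching names and top = 2, FindAll tests all 1,000 elements, while Where/Take tests only the first few.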
Ok,
My collection is a List<T>. Does this collection guarantee the sort order?
No, but it will preserve the order of insertion.
If so, is there a built-in way to find the top X objects that satisfy the condition?
someEnumerable.Where(r => r.Name == "searchquery").Take(100)
If I write:
// Some linq that works
will the runtime be intelligent enough to stop checking once it hits 100?
Yes, probably
Now, if you have an IList that has been sorted and you want to quickly iterate over the top 100 items, do this:
var list = sourceEnumerable.OrderBy(r => r.Name).ToList();
foreach(var r in list.Where(r => r.Name == "searchquery").Take(100))
{
// Do something
}
collection.Where(o=>o.Name=="searchquery").Take(100)
is the most correct answer, because behind the scenes Where uses deferred execution. Below is a simplified sketch of how the Where method is implemented:
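// Simplified sketch: the real Enumerable.Where also validates its arguments
// and lives in a static class, as all extension methods must.
public static IEnumerable<T> Where<T>(this IEnumerable<T> collection, Func<T, bool> func)
{
    foreach (var item in collection)
    {
        if (func(item))
        {
            yield return item; // produced only when the caller asks for the next item
        }
    }
}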
So when calling Take(100), the loop just finds first 100 items which satisfy the criteria.
If you know for sure that the objects in your collection are not repeated (e.g. like a primary key), then you can use SortedList instead of List<T>. This guarantees that your list will be sorted when you filter it using a certain criterion. Have a look here for a sorted list example:
http://msdn.microsoft.com/en-us/library/system.collections.sortedlist(v=vs.100).aspx
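A small sketch with the generic SortedList<TKey, TValue> (the linked page covers the non-generic SortedList), assuming integer IDs as the unique keys: entries are kept in key order however they were added, so Where/Take return the first matches in key order. SortedListDemo and the sample data are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SortedListDemo
{
    // Returns the keys of the first 'top' entries whose value matches,
    // in ascending key order.
    public static List<int> FirstMatchingKeys(int top)
    {
        // Entries are added out of order; SortedList keeps them sorted by key.
        // Keys must be unique: Add throws ArgumentException on a duplicate key.
        var byId = new SortedList<int, string>
        {
            { 30, "searchquery" },
            { 10, "other" },
            { 20, "searchquery" },
            { 40, "searchquery" },
        };

        return byId
            .Where(entry => entry.Value == "searchquery")
            .Take(top)
            .Select(entry => entry.Key)
            .ToList();
    }
}
```

FirstMatchingKeys(2) returns the keys 20 and 30, in that order.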
Is
foreach (T value in new List<T>(oldList))
dangerous (costly) when oldList contains 1 million objects of type T?
More generally, what is the best way to enumerate over oldList, given that elements can be added/removed during the enumeration?
The general rule is, you should not modify the same collection in which you are enumerating. If you want to do something like that, keep another collection which will keep track of which elements to add/remove from the original collection and then after exiting from the loop, perform the add/remove operation on the original collection.
I usually just create a list for all the objects to be removed or added.
Within the foreach I just add the items to the appropriate collections, and modify the original collection after the foreach has completed (looping through the removeItems and addItems collections).
Just like this:
var itemsToBeRemoved = new List<T>();
foreach (T item in myHugeList)
{
if (/*<condition>*/)
itemsToBeRemoved.Add(item);
}
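// List<T>.RemoveRange takes (index, count), so remove the collected items one by one
foreach (T item in itemsToBeRemoved)
    myHugeList.Remove(item);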
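If the goal is only removal, List<T>.RemoveAll(Predicate<T>) does the same job in a single O(n) pass and returns the number of elements removed, with no separate "to remove" list. A minimal sketch (the demo names are hypothetical):

```csharp
using System.Collections.Generic;

static class RemoveAllDemo
{
    // List<T>.RemoveAll(Predicate<T>) removes every matching element in one
    // pass and returns how many were removed.
    public static (int removed, List<int> remaining) RemoveMultiplesOfThree(int n)
    {
        var numbers = new List<int>();
        for (int i = 1; i <= n; i++)
            numbers.Add(i);

        int removed = numbers.RemoveAll(x => x % 3 == 0);
        return (removed, numbers);
    }
}
```

For n = 10, three elements (3, 6, 9) are removed and seven remain.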
You could iterate through the list by index instead of with an enumerator, for example:
for(int i = 0;i<oldList.Count;i++) {
var value = oldList[i];
...
if(itemRemoveCondition) {
oldList.RemoveAt(i--);
}
}
If you mean that objects can be added/removed from another thread, I would:
1. Synchronize the threads.
2. In the add/remove threads, create a list of items to be added or deleted.
3. Then delete these items in a critical section (so it is short: you don't have to synchronize while adding items to the delete list).
If you don't want to do that, you can use for instead of foreach. That avoids the exception, but you have to take extra care not to get other kinds of exceptions.
Give this a try (it iterates over a copy of the list):
foreach (T value in oldList.ToList())
For me, the first thing is that you should consider using some kind of data paging, because a list with a million items could be dangerous in itself.
Have you heard about Unit of Work pattern?
You can implement it so that you mark objects for create, update or delete, and later call "SaveChanges", "Commit" or whatever method does the job of applying the changes, and you're done.
For example, you iterate over the enumerable (oldList) and mark objects as "delete". Later, you call "SaveChanges", and the more abstract, generic unit of work iterates over the small, filtered list of objects to work with.
http://martinfowler.com/eaaCatalog/unitOfWork.html
Anyway, avoid lists of a million items. You should work with paged lists of objects.
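A minimal sketch of the mark-then-commit idea, assuming an in-memory List<T> rather than a database. PendingChanges<T>, MarkNew, MarkDeleted and Commit are hypothetical names, not the API of any particular unit-of-work library.

```csharp
using System.Collections.Generic;

// Hypothetical, minimal unit-of-work-style change tracker: callers record
// additions and deletions while enumerating, and Commit applies them afterwards.
class PendingChanges<T>
{
    private readonly List<T> _toAdd = new List<T>();
    private readonly HashSet<T> _toDelete = new HashSet<T>();

    public void MarkNew(T item) => _toAdd.Add(item);
    public void MarkDeleted(T item) => _toDelete.Add(item);

    // Applies all recorded changes to the target list in one step, then clears them.
    public void Commit(List<T> target)
    {
        target.RemoveAll(_toDelete.Contains); // one pass, hash-set lookups
        target.AddRange(_toAdd);
        _toAdd.Clear();
        _toDelete.Clear();
    }
}
```

The loop over the list only records changes, so the list is never modified while it is being enumerated; Commit runs after the loop ends.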
It will be 'slow' but there is not much more you can do about it, except running it on a background thread. E.g. using a BackgroundWorker.
If your operations on the list only occur on one thread, the correct approach is to add the items to add/remove to separate lists, and perform those operations after your iteration has finished.
If you use multiple threads, you will have to look into multithreaded programming, and e.g. use locks or, probably better, a ReaderWriterLock.
UPDATE:
As mentioned in another Stack Overflow question, this is now possible without any effort in .NET 4.0 when using concurrent collections.
If you use a foreach loop to modify a collection, you will get an InvalidOperationException ("Collection was modified; enumeration operation may not execute."), as in the code below.
List<string> li = new List<string>();
li.Add("bhanu");
li.Add("test");
foreach (string s in li)
{
li.Remove(s);
}
Solution: use a for loop instead, as below.
for (int i = 0; i < li.Count; i++)
{
li.RemoveAt(i);
i--;
}
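An alternative to the i-- adjustment is to walk the list backwards, so RemoveAt(i) only shifts elements that have already been visited. A sketch with a hypothetical condition (removing strings that start with a given character):

```csharp
using System.Collections.Generic;

static class BackwardRemovalDemo
{
    // Walking the list from the end means RemoveAt(i) only shifts elements that
    // have already been visited, so no index correction (i--) is needed.
    public static List<string> RemoveStartingWith(List<string> items, char prefix)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (items[i].Length > 0 && items[i][0] == prefix)
                items.RemoveAt(i);
        }
        return items;
    }
}
```

For example, removing the items starting with 'b' from { "bhanu", "test", "bob", "tree" } leaves "test" and "tree".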
You can use a flag to redirect modifications to a temporary list while the original is being enumerated.
// Where you are enumerating:
isBeingEnumerated = true;
foreach (T value in new List<T>(oldList))
{
    // ... process value ...
}
isBeingEnumerated = false;
SyncList(oldList, temporaryList); // apply the changes recorded in temporaryList

// Where you are modifying while enumerating:
if (isBeingEnumerated)
{
    // record the change in temporaryList instead of modifying oldList
}