Is
foreach (T value in new List<T>(oldList))
dangerous (costly) when oldList contains 1 million objects of type T?
More generally, what is the best way to enumerate over oldList given that elements can be added or removed during the enumeration?
The general rule is that you should not modify the collection you are enumerating. If you need to do something like that, keep a second collection that tracks which elements to add to or remove from the original, and apply those additions and removals to the original collection after exiting the loop.
I usually just create a list for all the objects to be removed or added.
Within the foreach I just add the items to the appropriate collections, and modify the original collection after the foreach has completed (by looping through the removeItems and addItems collections), like this:
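var itemsToBeRemoved = new List<T>();
foreach (T item in myHugeList)
{
    if (/*<condition>*/)
        itemsToBeRemoved.Add(item);
}
// List<T>.RemoveRange takes an index and a count, so remove the collected items one by one.
foreach (T item in itemsToBeRemoved)
    myHugeList.Remove(item);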
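When the removal condition depends only on the item itself, List<T>.RemoveAll can do the same job in a single pass, without a second list. A minimal sketch; the ShouldRemove predicate and the list of integers are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class RemoveAllSketch
{
    // Hypothetical condition, standing in for /*<condition>*/ above.
    public static bool ShouldRemove(int value) => value % 2 == 0;

    // Removes every matching item in place and returns how many were removed.
    public static int RemoveMatching(List<int> myHugeList)
    {
        return myHugeList.RemoveAll(ShouldRemove);
    }
}
```

Each call to Remove(item) searches the list from the start, so for a very large list with many removals RemoveAll is likely the cheaper option.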
You could iterate through the list without using an enumerator, so do something like...
for(int i = 0;i<oldList.Count;i++) {
var value = oldList[i];
...
if(itemRemoveCondition) {
oldList.RemoveAt(i--);
}
}
If you mean that objects can be added or removed from another thread, I would:
1- synchronize the threads
2- in the add/remove threads, create a list of items to be added or deleted
3- then delete these items in a critical section (so it is small: you don't have to synchronize while adding the items to the delete list)
If you don't want to do that, you can use for instead of foreach. That avoids the exception, but you have to take extra care not to get other kinds of errors, such as an index running past the end of a list that has shrunk.
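The three steps above could look roughly like this. It is only a sketch: the SharedList class, its member names and the use of a plain lock are my assumptions, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper around the shared list; all names are illustrative.
public class SharedList<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_gate) { _items.Add(item); }
    }

    // Each worker builds its own toRemove list without any locking,
    // then hands it over here; only this short section is synchronized.
    public void RemoveAll(IEnumerable<T> toRemove)
    {
        lock (_gate)
        {
            foreach (T item in toRemove)
                _items.Remove(item);
        }
    }

    // Returns a copy taken under the lock, so callers can enumerate it freely.
    public List<T> Snapshot()
    {
        lock (_gate) { return new List<T>(_items); }
    }
}
```

A worker would enumerate Snapshot(), collect the items it wants gone in a local list, and call RemoveAll once at the end.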
foreach (T value in new List<T>(oldList).ToList()) - give it a try
For me, the first thing is that you should consider some kind of data paging, because keeping a list of a million items in memory could itself be dangerous.
Have you heard about Unit of Work pattern?
You can implement it so you mark objects for create, update or delete, and later, you call "SaveChanges", "Commit" or any other doing the job of "apply changes", and you'll get done.
For example, you iterate over the enumerable (oldList) and you mark them as "delete". Later, you call "SaveChanges" and the more abstract, generic unit of work will iterate over the small, filtered list of objects to work with.
http://martinfowler.com/eaaCatalog/unitOfWork.html
Anyway, avoid lists of a million items. You should work with paged lists of objects.
It will be 'slow' but there is not much more you can do about it, except running it on a background thread. E.g. using a BackgroundWorker.
If your operations on the list only occur on one thread, the correct approach is to put the items to add or remove in separate lists, and perform those operations after your iteration has finished.
If you use multiple threads you will have to look into multithreaded programming and, for example, use locks or, probably better, a ReaderWriterLock.
UPDATE:
As mentioned in another Stack Overflow question, this is now possible without any effort in .NET 4.0 when using concurrent collections.
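For instance, ConcurrentDictionary lets you remove entries while a foreach over it is in progress. A small sketch; the keys and values are invented, and since there is no concurrent counterpart of List<T>, a dictionary stands in for the list here:

```csharp
using System;
using System.Collections.Concurrent;

public static class ConcurrentRemoveSketch
{
    // Builds a dictionary with keys 0..count-1, then removes every entry
    // with an even key while enumerating that same dictionary.
    public static ConcurrentDictionary<int, string> RemoveEvenKeys(int count)
    {
        var items = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < count; i++)
            items[i] = "item" + i;

        foreach (var pair in items)
        {
            if (pair.Key % 2 == 0)
                items.TryRemove(pair.Key, out _);   // does not invalidate the enumerator
        }
        return items;
    }
}
```

Keep in mind that this enumerator is not a snapshot: with other threads writing at the same time, the loop may or may not see their changes.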
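If you use a foreach loop to modify the collection you are enumerating, you will get an InvalidOperationException ("Collection was modified; enumeration operation may not execute"), as in the code below.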
List<string> li = new List<string>();
li.Add("bhanu");
li.Add("test");
foreach (string s in li)
{
li.Remove(s);
}
Solution: use a for loop, as below.
for (int i = 0; i < li.Count; i++)
{
li.RemoveAt(i);
i--;
}
You can use a flag to redirect modifications to a temporary list while the original is being enumerated.
/// where you are enumerating
isBeingEnumerated = true
foreach(T value in new List<T>(oldList) )
isBeingEnumerated = false
SyncList(oldList with temporaryList)
/// where you are modifying while enumerating
if isBeingEnumerated then
use a temporaryList to make the changes.
I have a ConcurrentBag of objects, and I want to do the following with it:
enumerate all items with a Where filter.
for each item, check some properties and, based on their values, make a method call. After the method call, it would be better to remove the item from the bag.
modify the value of some properties and save the item back to the bag.
So basically I need something like following:
foreach (var item in myBag.Where(it => it.Property1 == true))
{
    if (item.Property2 == true)
    {
        SomeMethodToReadTheItem(item);
        // it's better to remove this item from the bag here, but
        // if there is a performance hit, then just leave it.
    }
    else
    {
        item.Property3 = "new value";
        // now how do I save the item back to the bag?
    }
}
Of course it should be done in a thread-safe way. I know that enumerating a ConcurrentBag actually enumerates a "snapshot" of the real bag, but what about with a Where filter? Should I call ToList to prevent it from making a new "snapshot"?
Also, if you want to modify one specific item, you just call bag.TryTake(out item). But since I already have the item from the enumeration, should I "take" it again?
Any explanation/comment/sample would be very much appreciated.
Thank you.
I'll try to answer specific parts of your question without addressing the performance.
First off, the Where method takes an IEnumerable<T> as its first parameter and iterates over it itself. It calls GetEnumerator() once, so you will only take one snapshot of the underlying ConcurrentBag.
Secondly, the thread-safety of your code is not very clear; there may be some implicit guarantees in the rest of your code which are not specified. For example, you have a ConcurrentBag, so your collection is thread-safe; however, you modify the items contained within that collection without any thread synchronisation. If other code runs the same method, or another method that reads or modifies the items in the ConcurrentBag concurrently, then you may see data races.
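Note that it is not necessary to call TryTake if you already have a reference to the item. TryTake cannot fetch a specific item anyway: it removes and returns whichever item the bag picks. Assuming your items are reference types, changing a property through the reference you already hold changes the object that is in the bag, so there is nothing to save back.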
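To illustrate the snapshot behaviour with a Where filter, here is a small sketch. The bag of integers and the v > 0 filter are invented for the example; the loop adds to the bag while enumerating it and still visits only the items that were there when enumeration started.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class BagSnapshotSketch
{
    // Returns how many items the loop visited, even though the loop body
    // keeps adding new items to the bag it is enumerating.
    public static int CountVisited(ConcurrentBag<int> bag)
    {
        int visited = 0;
        foreach (int value in bag.Where(v => v > 0))
        {
            visited++;
            bag.Add(value + 100);   // not seen by this loop: it walks the snapshot
        }
        return visited;
    }
}
```

So an extra ToList() is not needed just to get a stable view; it would only copy the snapshot once more.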
I recommend you just create a new list and, if an item passes the Where filter, add it to this new list.
It would look something like this:
List<T> myNewList = new List<T>();
foreach (var item in myBag.Where(it => it.Property1))
{
if (!item.Property2)
{
myNewList.Add(item);
}
}
Pay attention to the "!".
I have a class that has several List<T> objects in it. These Lists are "associated" so that the first items in each are related, and the second ones, and so on (kind of like fields within a single record). I want to loop through the Lists together to alter some of the data simultaneously per "record".
With a foreach loop, I can loop through one List without tracking the record via i or some such. However, I don't know how to simultaneously access the related items in the other Lists. Do I have to count it out using a variable like i, or is there a better way? I'm still pretty new to generics and class-based programming. Am I totally missing a better way to arrange this data?
So this is kind of a fun problem... Note that I suspect some different data modeling might have been able to get around this issue, but if you stored the related items together in a Tuple you could get away from having sync'ed lists... It seems very dangerous to have these sync'ed lists and rely on the fact that they should all correspond at "i" in that any sorting, grouping, or paging (Skip/Take) could break this paradigm.
If you stored them in a List<Tuple<ItemTypeFromList1, ItemTypeFromList2, ... ItemTypeFromListN>> then you could keep the items together in a single list such that you could do a single iteration over the list and then just act on the N items in the tuple appropriately
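As a rough sketch of the tuple idea, assuming two made-up fields (a name and an age) in place of your real lists:

```csharp
using System;
using System.Collections.Generic;

public static class TupleListSketch
{
    // Upper-cases every name and adds one year to every age, one "record" at a time.
    public static void Update(List<Tuple<string, int>> records)
    {
        for (int i = 0; i < records.Count; i++)
        {
            Tuple<string, int> r = records[i];
            // Tuple<> is immutable, so a changed record replaces the old one.
            records[i] = Tuple.Create(r.Item1.ToUpperInvariant(), r.Item2 + 1);
        }
    }
}
```

Because Tuple<> cannot be modified in place, altering data means writing a new tuple back into the list, which is one reason a small class of your own may be more comfortable.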
Use a standard for loop and an index (your i) that will allow you to access the same element in each array. There is no better way to do it.
How about collecting all data for the 'row' in a single class and place instances of this class in a single list as opposed to multiple lists you are trying to keep in synch
The easiest way I can think of would be to use a standard for-loop. When the index is important I always prefer for-loops instead of foreach.
for (int i = 0; i < list1.Count; i++)
{
list1[i].someMethod();
list2[i].someMethod();
...
}
I assume all lists are of equal length when they are related as you say.
You might want to look into grouping the related items together in a single class and then have only one list, instead of multiple.
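Grouping the related items could look something like this. The PersonRecord class and its three properties are purely hypothetical stand-ins for whatever your lists hold:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type replacing three parallel lists.
public class PersonRecord
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public bool IsActive { get; set; }
}

public static class RecordListSketch
{
    // A plain foreach now reaches every field of a record; no index bookkeeping.
    public static void DeactivateMinors(List<PersonRecord> people)
    {
        foreach (PersonRecord person in people)
        {
            if (person.Age < 18)
                person.IsActive = false;
        }
    }
}
```

Sorting or filtering the single list then keeps each record's fields together automatically.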
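Try the following code. Note that LastIndexOf is a linear search, so this is O(n²) overall, and it picks the wrong index if firstList contains duplicate items.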
foreach (var i in firstList )
{
var s1 = secondList[firstList.LastIndexOf(i)];
var s2 = thirdList[firstList.LastIndexOf(i)];
}
Hope this is the answer you want..:)
Sometimes it is useful to enumerate a list while it is changing.
e.g.
foreach (var item in listOfEntities)
item.Update();
// somewhere else (with someEntity contained in listOfEntities)
// an add or remove is made:
someEntity.OnUpdate += (s,e) => listOfEntities.Remove(someEntity);
This will fail if listOfEntities is a List<T>.
There are workarounds like making a copy or a simple for-loop, each with different drawbacks, but I would like to know if there is a list type in the framework (or open source) that supports this.
Look at the collections in System.Collections.Concurrent. There's no list there, but the enumerators of ConcurrentBag, ConcurrentQueue and ConcurrentStack each represent "a moment-in-time snapshot of the contents of the [collection]".
These collections are designed for access from multiple threads, so they will be better suited to applications like the code sample you posted.
This has nothing to do with List<T>; it is a limitation of the enumerator. If you change the state of the collection underneath the enumerator it will throw, period.
You could use a for loop, but you will then run into logical errors as you index into a collection after the number of items has changed.
It's probably a bad idea to swap items in and out of a collection while you are enumerating it in another thread. I would stick with the tried and true method of recording the items to be removed in another collection or locking the collection while it is being enumerated.
I'm not claiming this is an impossible problem to solve, I just don't know of an easy way to do it.
I came across a method to change a list in a foreach loop by converting to a list in itself like this:
foreach (var item in myList.ToList())
{
//add or remove items from myList
}
(If you attempt to modify myList directly an error is thrown since the enumerator basically locks it)
This works because it's not the original myList that's being enumerated. My question is, does this method create garbage when the loop is over (namely the List that's returned from the ToList method)? For small loops, would it be preferable to use a for loop to avoid the creation of garbage?
The second list is going to be garbage, there will be garbage for an enumerator that is used in building the second list, and add in the enumerator that the foreach would spawn, which you would have had with or without the second list.
Should you switch to a for? Maybe, if you can point to this region of code being a true performance bottleneck. Otherwise, code for simplicity and maintainability.
Yes. ToList() would create another list that would need to be garbage collected.
That's an interesting technique which I will keep in mind for the future! (I can't believe I've never thought of that!)
Anyway, yes, the list that you are building doesn't magically unallocate itself. The possible performance problems with this technique are:
Increased memory usage (building a List, separate from the IEnumerable). Probably not that big of a deal, unless you do this very frequently, or the IEnumerable is very large.
Decreased speed, since it has to go through the IEnumerable at once to build the List.
Also, if enumerating the IEnumerable has side effects, they will all be triggered by this process.
Unless this is actually inside an inner loop, or you're working with very large data sets, you can probably do this without any problems.
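The point about speed and side effects can be seen in a small sketch. The Source method and its Produced counter are invented for the demonstration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerToListSketch
{
    public static int Produced;   // counts how many items the source has generated

    // Lazy source with a visible side effect per item.
    public static IEnumerable<int> Source(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Returns how many items had been produced when the loop body first ran.
    public static int ProducedAtFirstIteration(bool copyFirst)
    {
        Produced = 0;
        IEnumerable<int> items = copyFirst ? Source(5).ToList() : Source(5);
        foreach (int item in items)
            return Produced;
        return Produced;
    }
}
```

With copyFirst set to true the whole source has already run before the loop body sees its first item; without the copy only one item has been produced at that point.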
Yes, the ToList() method creates "garbage". I would just use indexing.
for (int i = myList.Count - 1; i >= 0; --i)
{
    var item = myList[i];
    // add or remove items from myList
}
It's non-deterministic. But the reference created from the call ToList() will be GCd eventually.
I wouldn't worry about it too much, since all it would be holding at most would be references or small value types.
What's the best/easiest way to obtain a count of items within an IEnumerable collection without enumerating over all of the items in the collection?
Possible with LINQ or Lambda?
In any case, you have to loop through it. Linq offers the Count method:
var result = myenum.Count();
The solution depends on why you don't want to enumerate through the collection.
If it's because enumerating the collection might be slow, then there is no solution that will be faster. You might want to consider using an ICollection instead if possible. Unless the enumeration is remarkably slow (e.g. it reads items from disk) speed shouldn't be a problem though.
If it's because enumerating the collection will require more code then it's already been written for you in the form of the .Count() extension method. Just use MyEnumerable.Count().
If it's because you want to be able to enumerate the collection after you've counted it, the .Count() extension method allows for this. You can even call .Count() on a collection you're in the middle of enumerating: it uses its own enumerator, so the outer loop carries on from where it was. For example:
var myEnumerable = Series.Generate(5);
foreach (int item in myEnumerable)
{
    Console.WriteLine(item + " (" + myEnumerable.Count() + ")");
}
will give the results
0 (5)
1 (5)
2 (5)
3 (5)
4 (5)
If it's because the enumeration has side effects (e.g. writes to disk/console) or is dependent on variables that may change between counting and enumerating (e.g. reads from disk) [N.B. If possible, I would suggest rethinking the architecture as this can cause a lot of problems] then one possibility to consider is reading the enumeration into intermediate storage. For example:
List<int> seriesAsList = Series.Generate(5).ToList();
All of the above assume you can't change the type (i.e. it is returned from a library that you do not own). If possible you might want to consider changing to use an ICollection or IList (ICollection being more widely scoped than IList) which has a Count property on it.
You will have to enumerate to get a count. Other constructs like the List keep a running count.
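If the object behind the IEnumerable happens to be one of those constructs, you can read its running count directly (LINQ's Count() already takes the same shortcut for ICollection<T>). A sketch of that check; the helper name TryGetCountWithoutEnumerating is my own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountSketch
{
    // Returns true and the count when it is available without enumerating.
    public static bool TryGetCountWithoutEnumerating<T>(IEnumerable<T> source, out int count)
    {
        if (source is ICollection<T> collection)
        {
            count = collection.Count;   // List<T>, arrays, HashSet<T>, ...
            return true;
        }
        if (source is IReadOnlyCollection<T> readOnly)
        {
            count = readOnly.Count;
            return true;
        }
        count = 0;
        return false;                   // only enumerating can tell
    }
}
```

When it returns false, the sequence is something like an iterator or a LINQ query, and counting it means walking it.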
Use this.
IEnumerable list =..........;
list.OfType<T>().Count()
it will return the count.
There's also IList or ICollection, if you want to use a construct that is still somewhat flexible but also has the feature you require. They both extend IEnumerable.
It also depends on what you want to achieve by counting. If you only want to find out whether the enumerable collection has any elements, you could use
myEnumerable.Any() rather than myEnumerable.Count(): the former reads only the first element, while the latter reads all the elements.
An IEnumerable will have to iterate through every item to get the full count.
If you just need to check whether there are one or more items in an IEnumerable, a more efficient method is to check if there are any. Any() only checks whether there is a value and does not loop through everything.
IEnumerable<string> myStrings = new List<string> { "one", "two", "three" };
bool hasValues = myStrings.Any();
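To see the difference, here is a sketch with a source that counts how many items were actually requested. The Lines method and the Pulled counter are made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountSketch
{
    public static int Pulled;   // how many items the consumer has requested so far

    // Lazy source that records every item it hands out.
    public static IEnumerable<string> Lines()
    {
        foreach (string s in new[] { "one", "two", "three" })
        {
            Pulled++;
            yield return s;
        }
    }

    // Number of items read from the source by Any().
    public static int PulledByAny()
    {
        Pulled = 0;
        Lines().Any();
        return Pulled;
    }

    // Number of items read from the source by Count().
    public static int PulledByCount()
    {
        Pulled = 0;
        Lines().Count();
        return Pulled;
    }
}
```

Any() stops after the first item, while Count() has to read all three.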
Not possible with LINQ, as calling .Count(...) does enumerate the collection. If you're running into the problem where you can't iterate through a collection twice, try this:
List<MyTableItem> myList = dataContext.MyTable.ToList();
int myTableCount = myList.Count;
foreach (MyTableItem item in myList)
{
...
}
If you need to count and then loop you may be better off with a list.
If you're using count to check for members you can use Any() to avoid enumerating the entire collection.
The best solution, I think, is to do the following:
using System.Linq.Dynamic;
myEnumerable.AsQueryable().Count()
When I want to use the Count property, I use IList<T>, which extends the ICollection<T> and IEnumerable<T> interfaces. In the example below the underlying data structure is an array, because Directory.GetFiles returns a string[]. I stepped through with the VS debugger and found that the .Count property below returns the array's Length property.
IList<string> FileServerVideos = Directory.GetFiles(VIDEOSERVERPATH, "*.mp4");
if (FileServerVideos.Count == 0)
return;