Is the following pseudocode thread-safe?
IList<T> dataList = SomeNhibernateRepository.GetData();
Parallel.For(..i..)
{
foreach(var item in dataList)
{
DoSomething(item);
}
}
The list never gets changed, it's only iterated and read in parallel. No writing to fields or something like that whatsoever.
Thanks.
Yes, List<T> is fine to read from multiple threads concurrently, so long as nothing's writing.
From the documentation:
A List<T> can support multiple readers concurrently, as long as the collection is not modified.
EDIT: Note that your code doesn't necessarily use List<T> - just an IList<T>. Do you know the type returned by GetData()? If you're in control of GetData() you probably want to document that the list returned by it is thread-safe for reading, if it's actually returning a List<T>.
It's fully thread-safe as long as DoSomething(item) doesn't modify dataList. Since you said it doesn't, then yes, that is thread-safe.
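To illustrate the read-only case, here is a minimal sketch. It assumes the data is a plain List<int>; the class and method names are made up for the example. The list is only read inside the parallel loop, and the one piece of shared state that is written (the running total) is updated atomically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentReadDemo
{
    // Sums the list once per parallel iteration; the list itself is never modified.
    public static long SumInParallel(IList<int> dataList, int iterations)
    {
        long total = 0;
        Parallel.For(0, iterations, i =>
        {
            long local = 0;
            foreach (var item in dataList)
            {
                local += item; // read-only access to the shared list
            }
            // The accumulator is the only shared state written, so update it atomically.
            Interlocked.Add(ref total, local);
        });
        return total;
    }

    public static void Main()
    {
        IList<int> dataList = Enumerable.Range(1, 100).ToList();
        Console.WriteLine(SumInParallel(dataList, 8)); // 8 * 5050 = 40400
    }
}
```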
To make it harder for anyone to change your list, you could access it through an IEnumerable<T> (a caller can still cast it back to IList<T>, so this is a hint rather than a guarantee):
IEnumerable<T> dataList = SomeNhibernateRepository.GetData();
Parallel.For(..i..)
{
foreach(var item in dataList)
{
DoSomething(item);
}
}
If what you say is correct, then I would say so. But what you say or think may not be what happens in reality. How can you express in code what you have said? How do you enforce the constraint that the list is never modified?
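One way to express that constraint in code, assuming you control the method that hands out the list, is to return a read-only wrapper. The sketch below uses List<T>.AsReadOnly(); GetData here is a hypothetical stand-in for the repository call, not the real NHibernate API.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyDemo
{
    // Hypothetical stand-in for SomeNhibernateRepository.GetData().
    public static ReadOnlyCollection<string> GetData()
    {
        var inner = new List<string> { "a", "b", "c" };
        return inner.AsReadOnly(); // a wrapper around the list, not a copy
    }

    // Tries to add through the IList<T> interface; returns false if the list rejects it.
    public static bool TryAdd(IList<string> list, string value)
    {
        try
        {
            list.Add(value);
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }

    public static void Main()
    {
        var data = GetData();
        Console.WriteLine(TryAdd(data, "d")); // False
        Console.WriteLine(data.Count);        // 3
    }
}
```

The wrapper only protects against changes made through the returned reference. Code that still holds the inner List<T> can modify it, so the inner list should stay private.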
Related
I have a ConcurrentBag of objects, and I want to do the following with it:
enumerate all items with a where filter.
for each item, check some properties and, based on the values, make some method call. After the method call, it would be better to remove the item from the bag.
modify the value of some properties and save the item back to the bag.
So basically I need something like the following:
foreach (var item in myBag.Where(it => it.Property1 == true))
{
if (item.Property2 == true)
{
SomeMethodToReadTheItem(item);
//it's better to remove this item from the bag here, but
//there is a performance hit, so just leave it.
}
else
{
item.Property3= "new value";
//now how do I save the item back to the bag?
}
}
Of course, it should be done in a thread-safe way. I know that enumerating a ConcurrentBag actually enumerates a "snapshot" of the real bag, but what about with a Where clause filter? Should I call ToList to prevent it from making a new "snapshot"?
Also, if you want to modify one specific item, you just call bag.TryTake(out item). But since I already have the item from the enumeration, should I "take" it again?
Any explanation/comment/sample would be very much appreciated.
Thank you.
I'll try to answer specific parts of your question without addressing the performance.
First off, the Where method takes an IEnumerable<T> as its first parameter and iterates over it lazily. GetEnumerator() is called once, when the foreach starts, so you only take one snapshot of the underlying ConcurrentBag. A ToList is not needed for that.
Secondly, the thread-safety of your code is not very clear; there may be implicit guarantees in the rest of your code that are not specified. For example, you have a ConcurrentBag, so the collection itself is thread-safe, but you modify the items contained in it without any thread synchronisation. If other code, in the same method or in another one, reads or modifies the items in the ConcurrentBag concurrently, you may see data races.
Note that it is not necessary to call TryTake if you already have a reference to the item. The bag stores references, so (assuming the items are class instances) a change made through your reference is visible through the bag without saving anything back. TryTake also cannot target a specific item: it removes and returns whichever item the bag chooses.
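The snapshot behaviour described above can be checked with a small sketch. The bag contents and the filter are made up for the example. Items added during the loop do not show up in the running enumeration, and nothing throws.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class BagSnapshotDemo
{
    // Adds to the bag while enumerating a filtered view of it.
    // Returns how many items the loop actually visited.
    public static int EnumerateWhileAdding(ConcurrentBag<int> bag)
    {
        int visited = 0;
        foreach (var item in bag.Where(x => x > 0))
        {
            bag.Add(item + 100); // does not affect the snapshot being enumerated
            visited++;
        }
        return visited;
    }

    public static void Main()
    {
        var bag = new ConcurrentBag<int>(new[] { 1, 2, 3 });
        Console.WriteLine(EnumerateWhileAdding(bag)); // 3
        Console.WriteLine(bag.Count);                 // 6
    }
}
```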
I recommend you just create a new list and add each item that passes the Where filter (and the Property2 check) to this new list.
It would look something like this:
List<T> myNewList = new List<T>();
foreach (var item in myBag.Where(it => it.Property1))
{
if (!item.Property2)
{
myNewList.Add(item);
}
}
Pay attention to the "!" (negation) in the condition.
Sometimes it is useful to enumerate a list while it is changing.
e.g.
foreach (var item in listOfEntities)
item.Update();
// somewhere else (with someEntity contained in listOfEntities)
// an add or remove is made:
someEntity.OnUpdate += (s,e) => listOfEntities.Remove(someEntity);
This will fail if listOfEntities is a List<T>.
There are workarounds like making a copy or a simple for-loop, each with different drawbacks, but I would like to know if there is a list type in the framework (or open source) that supports this.
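Look at the collections in System.Collections.Concurrent. There's no list there, but the enumerator of several of these collections (ConcurrentBag<T>, ConcurrentQueue<T> and ConcurrentStack<T>, for example) is documented as one that "represents a moment-in-time snapshot of the contents of the [collection]".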
These collections are designed for access from multiple threads, so they will be better suited to applications like the code sample you posted.
This has nothing to do with List<T>; it is a limitation of the enumerator. If you change the state of the collection underneath the enumerator it will throw, period.
You could use a for loop, but you will then run into logical errors as you index into a collection after the number of items has changed.
It's probably a bad idea to swap items in and out of a collection while you are enumerating it in another thread. I would stick with the tried and true method of recording the items to be removed in another collection or locking the collection while it is being enumerated.
I'm not claiming this is an impossible problem to solve, I just don't know of an easy way to do it.
Is
foreach(T value in new List<T>(oldList) )
dangerous (costly) when oldList contains 1 million objects of type T?
More generally, what is the best way to enumerate over oldList given that elements can be added or removed during the enumeration?
The general rule is that you should not modify the collection you are enumerating. If you want to do something like that, keep another collection that tracks which elements to add to or remove from the original collection, and perform the add/remove operations on the original collection after exiting the loop.
I usually just create a list for all the objects to be removed or added.
Within the foreach I just add the items to the appropriate collections, and I modify the original collection after the foreach has completed (by looping through the removeItems and addItems collections),
like this:
var itemsToBeRemoved = new List<T>();
foreach (T item in myHugeList)
{
if (/*<condition>*/)
itemsToBeRemoved.Add(item);
}
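// List<T>.RemoveRange takes (index, count), so remove the collected items one by one.
foreach (T item in itemsToBeRemoved)
myHugeList.Remove(item);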
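When the removal condition depends only on the item itself, List<T>.RemoveAll can do the collect-then-remove step in one call and one pass over the list. A minimal sketch; the condition (multiples of three) is just a placeholder:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemoveAllDemo
{
    // Removes every item matching the placeholder condition and
    // returns how many items were removed.
    public static int RemoveMultiplesOfThree(List<int> myHugeList)
    {
        return myHugeList.RemoveAll(item => item % 3 == 0);
    }

    public static void Main()
    {
        var myHugeList = Enumerable.Range(1, 10).ToList();
        Console.WriteLine(RemoveMultiplesOfThree(myHugeList)); // 3 (removes 3, 6, 9)
        Console.WriteLine(myHugeList.Count);                   // 7
    }
}
```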
You could iterate through the list without using an enumerator, so do something like...
for(int i = 0;i<oldList.Count;i++) {
var value = oldList[i];
...
if(itemRemoveCondition) {
oldList.RemoveAt(i--);
}
}
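A common variation on the index-based loop is to walk the list backwards, so that removing an element never shifts the elements still to be visited and no index adjustment is needed. A sketch with a placeholder removal condition (even numbers):

```csharp
using System;
using System.Collections.Generic;

public static class ReverseRemoveDemo
{
    // Walks the list from the end; RemoveAt(i) only shifts elements
    // that have already been visited.
    public static void RemoveEvens(List<int> list)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i] % 2 == 0)
            {
                list.RemoveAt(i);
            }
        }
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveEvens(list);
        Console.WriteLine(string.Join(",", list)); // 1,3,5
    }
}
```

Like the forward loop, this is only safe when a single thread touches the list.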
If you mean that objects can be added or removed from another thread, I would:
1- synchronize the threads
2- in the add/remove threads, create a list of items to be added or deleted
3- then apply these changes in a critical section (so it stays small; you don't have to synchronize while adding items to the delete list)
If you don't want to do that, you can use for instead of foreach. That avoids the exception, but you have to take extra care not to get other kinds of exceptions.
foreach(T value in oldList.ToList() ) - give it a try (ToList already makes a copy, so wrapping it in another new List is not needed)
For me, the first thing is that you should consider using some kind of data paging, because having a list of 1 million items could be dangerous in itself.
Have you heard about Unit of Work pattern?
You can implement it so you mark objects for create, update or delete, and later, you call "SaveChanges", "Commit" or any other doing the job of "apply changes", and you'll get done.
For example, you iterate over the enumerable (oldList) and you mark them as "delete". Later, you call "SaveChanges" and the more abstract, generic unit of work will iterate over the small, filtered list of objects to work with.
http://martinfowler.com/eaaCatalog/unitOfWork.html
Anyway, avoid lists of a million items. You should work with paged lists of objects.
It will be 'slow', but there is not much more you can do about it, except running it on a background thread, e.g. using a BackgroundWorker.
If your operations on the list only occur on one thread, the correct approach is to put the items to add/remove in separate lists, and perform those operations after your iteration has finished.
If you use multiple threads you will have to look into multithreaded programming, and e.g. use locks or, probably better, a ReaderWriterLock.
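As a rough sketch of the reader/writer locking idea: the example below uses ReaderWriterLockSlim (the newer class in System.Threading) rather than ReaderWriterLock, and GuardedList<T> is a made-up wrapper, not a framework type. Many threads may enumerate at once; a writer waits until it has exclusive access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: all access to the inner list goes through the lock.
public sealed class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Enumerates under a read lock. The action must not call Add on this list:
    // the lock is not recursive by default and would throw.
    public void ForEach(Action<T> action)
    {
        _lock.EnterReadLock();
        try
        {
            foreach (var item in _items) action(item);
        }
        finally { _lock.ExitReadLock(); }
    }

    public int Count
    {
        get
        {
            _lock.EnterReadLock();
            try { return _items.Count; }
            finally { _lock.ExitReadLock(); }
        }
    }
}

public static class GuardedListDemo
{
    // Adds and enumerates from many threads at once; returns the final count.
    public static int Run()
    {
        var list = new GuardedList<int>();
        Parallel.For(0, 1000, i =>
        {
            list.Add(i);
            int seen = 0;
            list.ForEach(_ => seen++); // safe: enumeration holds the read lock
        });
        return list.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 1000
    }
}
```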
UPDATE:
As mentioned in another Stack Overflow question, this is now possible without any effort in .NET 4.0 when using concurrent collections.
If you use a foreach loop to modify the collection you are iterating over, you will get the "Collection was modified" error, as in the code below.
List<string> li = new List<string>();
li.Add("bhanu");
li.Add("test");
foreach (string s in li)
{
li.Remove(s);
}
Solution: use a for loop, as below.
for (int i = 0; i < li.Count; i++)
{
li.RemoveAt(i);
i--;
}
You can use a flag to redirect modifications to a temporary list while the original is being enumerated.
/// where you are enumerating
isBeingEnumerated = true
foreach(T value in new List<T>(oldList) )
isBeingEnumerated = false
SyncList(oldList with temporaryList)
/// where you are modifying while enumerating
if isBeingEnumerated then
use a temporaryList to make the changes.
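The flag idea above could look roughly like this in C#. This is a single-threaded sketch for the case where a callback removes items during enumeration; DeferredRemovalList<T> and its members are invented for the example, and only removals are deferred.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical list wrapper that postpones removals requested during enumeration.
public sealed class DeferredRemovalList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly List<T> _pendingRemovals = new List<T>();
    private bool _isBeingEnumerated;

    public int Count { get { return _items.Count; } }

    // Not deferred in this sketch: calling Add from inside ForEach would still throw.
    public void Add(T item)
    {
        _items.Add(item);
    }

    public void Remove(T item)
    {
        if (_isBeingEnumerated)
            _pendingRemovals.Add(item); // remember it, apply after the loop
        else
            _items.Remove(item);
    }

    // Not re-entrant: nested ForEach calls would clear the flag too early.
    public void ForEach(Action<T> action)
    {
        _isBeingEnumerated = true;
        try
        {
            foreach (var item in _items) action(item);
        }
        finally
        {
            _isBeingEnumerated = false;
            foreach (var item in _pendingRemovals) _items.Remove(item);
            _pendingRemovals.Clear();
        }
    }
}

public static class DeferredRemovalDemo
{
    public static int Run()
    {
        var list = new DeferredRemovalList<int>();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        // Removing inside the loop does not throw; it is applied afterwards.
        list.ForEach(x => { if (x == 2) list.Remove(x); });
        return list.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 2
    }
}
```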
What's the best/easiest way to obtain a count of items within an IEnumerable collection without enumerating over all of the items in the collection?
Possible with LINQ or Lambda?
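In general, you have to loop through it. LINQ offers the Count method, which enumerates the sequence unless it implements ICollection<T>, in which case it reads the Count property instead: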
var result = myenum.Count();
The solution depends on why you don't want to enumerate through the collection.
If it's because enumerating the collection might be slow, then there is no solution that will be faster. You might want to consider using an ICollection instead if possible. Unless the enumeration is remarkably slow (e.g. it reads items from disk) speed shouldn't be a problem though.
If it's because enumerating the collection will require more code then it's already been written for you in the form of the .Count() extension method. Just use MyEnumerable.Count().
If it's because you want to be able to enumerate the collection after you've counted then the .Count() extension method allows for this. You can even call .Count() on a collection you're in the middle of enumerating and it will carry on from where it was before the count. For example:
foreach (int item in Series.Generate(5))
{
Console.WriteLine(item + " (" + myEnumerable.Count() + ")");
}
will give the results
0 (5)
1 (5)
2 (5)
3 (5)
4 (5)
If it's because the enumeration has side effects (e.g. writes to disk/console) or is dependent on variables that may change between counting and enumerating (e.g. reads from disk) [N.B. If possible, I would suggest rethinking the architecture as this can cause a lot of problems] then one possibility to consider is reading the enumeration into intermediate storage. For example:
List<int> seriesAsList = Series.Generate(5).ToList();
All of the above assume you can't change the type (i.e. it is returned from a library that you do not own). If possible you might want to consider changing to use an ICollection or IList (ICollection being more widely scoped than IList) which has a Count property on it.
You will have to enumerate to get a count. Other constructs like the List keep a running count.
Use this.
IEnumerable list =..........;
list.OfType<T>().Count()
it will return the count.
There's also IList or ICollection, if you want to use a construct that is still somewhat flexible but also has the feature you require. Both extend IEnumerable.
It also depends on what you want to achieve by counting. If you only want to know whether the enumerable collection has any elements, you could use
myEnumerable.Any() rather than myEnumerable.Count(): the former stops after the first element, while the latter enumerates all the elements.
An IEnumerable will have to iterate through every item to get the full count.
If you just need to check whether there are one or more items in an IEnumerable, a more efficient method is to check if there are any. Any() only checks whether there is a value and does not loop through everything.
IEnumerable<string> myStrings = new List<string> { "one", "two", "three" };
bool hasValues = myStrings.Any();
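The difference can be made visible with a lazy sequence that counts how many items it has produced. This is a sketch; Lazy and Yielded are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountDemo
{
    // How many items the iterator below has produced so far.
    public static int Yielded;

    // A lazy sequence of n items that records each item it hands out.
    public static IEnumerable<int> Lazy(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    public static void Main()
    {
        Yielded = 0;
        bool hasValues = Lazy(5).Any();
        Console.WriteLine(hasValues + " " + Yielded); // True 1

        Yielded = 0;
        int count = Lazy(5).Count();
        Console.WriteLine(count + " " + Yielded);     // 5 5
    }
}
```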
Not possible with LINQ, as calling .Count(...) does enumerate the collection. If you're running into the problem where you can't iterate through a collection twice, try this:
List<MyTableItem> myList = dataContext.MyTable.ToList();
int myTableCount = myList.Count;
foreach (MyTableItem item in myList)
{
...
}
If you need to count and then loop you may be better off with a list.
If you're using count to check for members you can use Any() to avoid enumerating the entire collection.
The best solution, I think, is to do the following:
using System.Linq.Dynamic;
myEnumerable.AsQueryable().Count()
When I want to use the Count property, I use IList<T>, which extends the ICollection<T> and IEnumerable<T> interfaces. In the example below the underlying object is an array (Directory.GetFiles returns string[]). I stepped through with the VS debugger and found that the .Count property below returns the array's Length property.
IList<string> FileServerVideos = Directory.GetFiles(VIDEOSERVERPATH, "*.mp4");
if (FileServerVideos.Count == 0)
return;
I need to enumerate through a generic IList<> of objects. The contents of the list may change, as in items being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a thread-safe foreach on an IList<>? Preferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
There is no such operation. The best you can do is
lock(collection){
foreach (object o in collection){
...
}
}
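The lock only helps if every writer takes the same lock before changing the collection. A minimal sketch of that pairing, with a made-up shared lock object (Gate) and a background writer that only appends:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedEnumerationDemo
{
    // Hypothetical shared lock object; readers and writers must both use it.
    private static readonly object Gate = new object();

    // Enumerates the list repeatedly while another thread appends to it.
    // Returns the final number of items.
    public static int Run()
    {
        var collection = new List<int>();

        var writer = Task.Run(() =>
        {
            for (int i = 0; i < 10000; i++)
            {
                lock (Gate) { collection.Add(i); } // writer takes the same lock
            }
        });

        long sum = 0;
        for (int pass = 0; pass < 100; pass++)
        {
            lock (Gate)
            {
                // No writer can change the list while this loop runs.
                foreach (int o in collection) sum += o;
            }
        }

        writer.Wait();
        lock (Gate) { return collection.Count; }
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 10000
    }
}
```

Holding the lock for the whole enumeration blocks writers for that long, which is the trade-off against cloning the list.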
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
ICollection MyCollection;
// Instantiate and populate the collection
lock(MyCollection.SyncRoot) {
// Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock, which used to have big performance issues due to the so-called Convoy Problem.
The best article I've found on the subject is this one by Jeffrey Richter, which presents his own method for a high-performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
Foreach depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, b-tree, or hash table is to enumerate in order from the first to the last. It would not cause a problem to insert an element in the data structure after the iterator had already passed that point, or to insert one that the iterator would enumerate once it had arrived, and such an event could be detected by the application and handled if the application required it. Detecting a change in the collection and throwing an error during enumeration can, as far as I can imagine, only have been someone's (bad) idea of what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections concurrent collections (System.Collections.Concurrent) in .NET 4.0.
I recently spent some time multi-threading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for-loop and immediately assign the object to a copy to use inside the loop. Just keep in mind that all threads writing to the objects of your list should write to different data of the objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach(var p in Points)
{
// work with p...
}
Can be replaced by:
for(int i = 0; i < Points.Count; i ++)
{
Point p = Points[i];
// work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, that allows multiple concurrent readers but also a single writer (when there are no readers).
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND another thread can only ADD to the end of the list, then maybe you just switch out to a FOR loop with a counter. At the point you grab the counter, you're only seeing X numbers of elements in the list. You can walk through the list (while others are adding to the end of it) . . . should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor to use a dictionary).