How to process n items in a collection concurrently - c#

I am building a class that inherits from List. Items will be added to this collection at runtime, and I want the class to automatically do something with each block of n items once they have been added.
Here is the scenario:
1] Create a new class that inherits from List - CollectionX
2] At runtime we will call ColX.Add(T) many times
3] When ColX has 500 or more items, it should move them into a temporary area, do work on them, then delete them, keeping in mind that items will still be added to ColX the whole time.
So my question is: how do I implement this cleanly while ensuring that it is thread safe?
The work must be performed in blocks, so I don't think a queue will work, as you can only dequeue one item at a time.
I think I'm looking for a pattern more than for actual types or libraries.
Can anyone help?

Don't let CollectionX inherit from List.
Instead, use 2 Lists internally, Add() to 1 and process the other.
This way you only have to lock the swapping of the Lists. If there are timing problems, you could even use a third List to prevent blocking.
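The swap idea above might look roughly like the following. This is only a sketch under a few assumptions: the name BatchBuffer and the process callback are made up for illustration, a fresh List is allocated on each swap instead of reusing two fixed ones, and the batch is processed on whichever thread happened to fill it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical name; a sketch of the list-swap idea, not a definitive implementation.
public sealed class BatchBuffer<T>
{
    private readonly object _gate = new object();
    private readonly int _batchSize;
    private readonly Action<IReadOnlyList<T>> _process;
    private List<T> _active = new List<T>();

    public BatchBuffer(int batchSize, Action<IReadOnlyList<T>> process)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _batchSize = batchSize;
        _process = process ?? throw new ArgumentNullException(nameof(process));
    }

    public void Add(T item)
    {
        List<T> full = null;
        lock (_gate)
        {
            _active.Add(item);
            if (_active.Count >= _batchSize)
            {
                full = _active;          // hand the full list to this caller
                _active = new List<T>(); // later Adds go to a fresh list
            }
        }
        // The work runs outside the lock, so other threads can keep adding.
        if (full != null) _process(full);
    }
}
```

Only the Add and the swap are serialized. If the work is slow, the full list could be handed to a worker thread instead of being processed inline.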

Related

Thread-Safe Random Array

Is there a way to create a random Array/List/HashSet WITHOUT using lock() or the Concurrent* set of classes?
My goal:
Be able to Add and Remove strings as I please (or be able to Add and "disable" strings)
Be able to Clear the list or swap out the entire set of strings
Be able to get an item from it (kind of like TryPeek), choosing ANY item RANDOMLY without using a mixture of .ElementAt with Random over .Count/.Count()
I'm attempting to create a system that chooses a proxy completely at random from a collection that another thread can modify at any time by removing or adding proxies.
Here are some "randomizing" solutions that get thrown around a lot, and why they are bad and shouldn't be used:
.ElementAt(Random.Next(List.Count)) is NOT a good way to pick a random item from a list
This is extremely unsuitable for a multi-threaded scenario, for multiple reasons.
Even with a lock(){} wrapped around it and around all other code that touches the collection, it can still conflict with a modified enumerable.
What I mean is that the List's count can change (perhaps decrease), so .ElementAt fails: say it chose the final element, but that element was removed from the list JUST before ElementAt got to it, causing an exception.
.OrderBy(Random)
This is another bad way, as it randomizes the entire list before choosing an element, and it is susceptible to the collection being modified while OrderBy executes, causing an exception.
Both of these bad ways of randomly choosing one item from a list can be "solved" by simply calling .ToArray() before the .OrderBy(Random) or before the .ElementAt, but then you must also use the array's count for the Random passed to .ElementAt.
The issue there is that this is bad for memory: depending on what you're doing, you're essentially doubling the memory usage of the list.
This is why I'm asking whether there is any way to randomize efficiently without the possibility of multi-threaded modifications to the collection causing conflicts.
Current Code
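object _Lock = new object();
...
lock (_Lock) {
    // Random.Next's upper bound is exclusive, so pass Count rather than Count - 1;
    // otherwise the last element can never be chosen.
    proxy = proxyArray.ElementAt(proxyArray.Count == 1 ? 0 : rand.Next(0, proxyArray.Count));
}
I'm doing count == 1 ? 0 so that it doesn't waste CPU randomizing when the answer is obvious.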
My attempt with ConcurrentBag
object proxiesRefill = new object(); // for lock
...
while (!concurrentProxiesBag.TryTake(out proxy)) { // keep attempting until it takes a proxy
    lock (proxiesRefill) { // it failed, possibly because the bag is empty; lock and check on one thread
        if (concurrentProxiesBag.IsEmpty) { // if it's empty, refill
            concurrentProxiesBag = new ConcurrentBag<string>(hashsetDisabledProxies); // refill by creating a new ConcurrentBag from all disabled proxies [properly thread-safe? not sure]
        }
    }
    Thread.Sleep(100); // sleep for 100 ms just to cool down a bit
}
// Got proxy
hashsetDisabledProxies.Add(proxy); // disable the proxy, as TryTake has taken it out of the active proxy bag
// It's a HashSet so that no duplicates can get added.
I made some changes to your code:
while (!concurrentProxiesBag.TryTake(out proxy)) { // keep attempting until it takes a proxy
    if (concurrentProxiesBag.IsEmpty) { // if it's empty, refill
        concurrentProxiesBag = new ConcurrentBag<string>(hashsetDisabledProxies); // refill by creating a new ConcurrentBag from all disabled proxies [properly thread-safe? not sure]
    }
}
Thread.Sleep(100); // sleep for 100 ms just to cool down a bit
IsEmpty, Count, ToArray() and GetEnumerator() lock the entire structure.
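One way to get a random pick with neither lock() nor the Concurrent* collections is a copy-on-write array: readers grab the current array reference once and index into it, while writers build a new array and publish it with Interlocked. The sketch below assumes this is acceptable for a read-heavy proxy list, since every write copies the array. The class name SnapshotProxyPool and its members are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical class; copy-on-write array so readers never take a lock.
public sealed class SnapshotProxyPool
{
    private string[] _items = new string[0];

    // Random is not thread-safe, so each thread gets its own instance.
    private static readonly ThreadLocal<Random> Rng =
        new ThreadLocal<Random>(() => new Random(Guid.NewGuid().GetHashCode()));

    public bool TryPickRandom(out string proxy)
    {
        string[] snapshot = Volatile.Read(ref _items); // one consistent array
        if (snapshot.Length == 0) { proxy = null; return false; }
        proxy = snapshot[Rng.Value.Next(snapshot.Length)]; // upper bound is exclusive
        return true;
    }

    public void Add(string proxy)
    {
        while (true)
        {
            string[] old = Volatile.Read(ref _items);
            var updated = new string[old.Length + 1];
            Array.Copy(old, updated, old.Length);
            updated[old.Length] = proxy;
            // Publish only if nobody replaced the array in the meantime; otherwise retry.
            if (Interlocked.CompareExchange(ref _items, updated, old) == old) return;
        }
    }

    public bool Remove(string proxy)
    {
        while (true)
        {
            string[] old = Volatile.Read(ref _items);
            int index = Array.IndexOf(old, proxy);
            if (index < 0) return false;
            var updated = new string[old.Length - 1];
            Array.Copy(old, 0, updated, 0, index);
            Array.Copy(old, index + 1, updated, index, old.Length - index - 1);
            if (Interlocked.CompareExchange(ref _items, updated, old) == old) return true;
        }
    }

    // Swap in a whole new set of proxies (pass an empty array to clear).
    public void ReplaceAll(string[] proxies) =>
        Interlocked.Exchange(ref _items, (string[])proxies.Clone());
}
```

Because the index is computed from the same array it is applied to, a concurrent removal cannot make it go out of range. The reader may still return a proxy that was removed a moment earlier, so callers should tolerate that.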

Is there List type that can be Enumerated while it is Changing?

Sometimes it is useful to enumerate a list while it is changing.
e.g.
foreach (var item in listOfEntities)
    item.Update();
// somewhere else (with someEntity contained in listOfEntities)
// an add or remove is made:
someEntity.OnUpdate += (s,e) => listOfEntities.Remove(someEntity);
This will fail if listOfEntities is a List<T>.
There are workarounds like making a copy or a simple for-loop, each with different drawbacks, but I would like to know if there is a list type in the framework (or open source) that supports this.
Look at the collections in System.Collections.Concurrent. There's no list there, but the enumerators of some of these collections (ConcurrentQueue<T> and ConcurrentBag<T>, for example) are documented as representing "a moment-in-time snapshot of the contents of the [collection]".
These collections are designed for access from multiple threads, so they will be better suited to applications like the code sample you posted.
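As a small illustration of that snapshot behaviour, the sketch below enqueues into a ConcurrentQueue<int> from inside the foreach that is enumerating it. The helper name is made up. Note that ConcurrentQueue has no way to remove an arbitrary item, so it is not a drop-in replacement for the List<T> in the question.

```csharp
using System;
using System.Collections.Concurrent;

public static class SnapshotEnumerationDemo
{
    // Returns how many items the loop visited while the queue was growing.
    public static int EnumerateWhileAdding(ConcurrentQueue<int> queue)
    {
        int visited = 0;
        foreach (int item in queue)
        {
            queue.Enqueue(item * 10); // modifying during enumeration does not throw
            visited++;
        }
        return visited;
    }
}
```

The loop sees only the items that were present when enumeration started; the items enqueued inside the loop show up on the next enumeration.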
This has nothing to do with List<T>; it is a limitation of the enumerator. If you change the state of the collection underneath the enumerator it will throw, period.
You could use a for loop, but you will then run into logical errors as you index into a collection after the number of items has changed.
It's probably a bad idea to swap items in and out of a collection while you are enumerating it in another thread. I would stick with the tried and true method of recording the items to be removed in another collection or locking the collection while it is being enumerated.
I'm not claiming this is an impossible problem to solve, I just don't know of an easy way to do it.
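The "record the items to be removed in another collection" approach can be sketched like this for the single-threaded case. The helper name and the callback shape (return true to request removal) are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredRemoval
{
    // Calls the callback on every item; items for which it returns true
    // are removed only after the enumeration has finished.
    public static void UpdateAll<T>(List<T> items, Func<T, bool> updateAndShouldRemove)
    {
        var toRemove = new List<T>();
        foreach (T item in items)
        {
            if (updateAndShouldRemove(item)) toRemove.Add(item);
        }
        foreach (T item in toRemove) items.Remove(item);
    }
}
```

With an event-based design like the one in the question, the OnUpdate handler would add the entity to the pending list instead of calling Remove directly.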

Transaction support in an observable collection

I'm interested in the most efficient way to change an observable collection such that only one change notification is fired. Let's say that I want to populate the list with 3 items. There is no addCollection method or anything like that, so I have to do a Clear plus three Adds. Do I need to create a different observable collection and assign it? Or what techniques do others use?
The .NET Framework's ObservableCollection class sends an individual notification as each item is added to the collection and provides no mechanism for AddRange-type functionality. However, you can very easily create your own collection that implements INotifyCollectionChanged and sends whatever notifications you like.
One issue you may encounter is that the INotifyCollectionChanged interface includes the ability to specify that multiple items were added to the collection in a single message, but no standard .NET Framework classes actually create these notifications. Because of this, some third-party and open source controls assume that only one item has been added when they receive an Add notification. Even the built-in .NET Framework classes may have undiscovered bugs related to this.
For these reasons I would recommend that your custom collection have a mode in which it always sends a Reset notification at the end of an AddRange instead of a single multi-item Add notification. You could optimize this further by sending either multiple single-item Add notifications or a Reset notification, depending on the actual number of items added.
Of course there are situations in which it is just as easy to replace the ObservableCollection with a new one. At times this will be much less efficient than looping Add() because event handlers and CollectionViews are rebuilt. Other times it will be more efficient if the collection is large and your loop only adds a few items at a time.
And sometimes it won't work at all.
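A minimal sketch of the Reset-after-AddRange idea, here done by subclassing ObservableCollection<T> rather than implementing INotifyCollectionChanged from scratch. The class name and the ReplaceWith method are hypothetical; the sketch always sends Reset and does not implement the size-based optimization.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;
using System.ComponentModel;

// Hypothetical class name; always raises a single Reset per bulk change.
public class RangeObservableCollection<T> : ObservableCollection<T>
{
    public void AddRange(IEnumerable<T> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        CheckReentrancy();
        // Items is the underlying list, so these adds raise no per-item events.
        foreach (T item in items) Items.Add(item);
        RaiseReset();
    }

    // Clear plus AddRange with one notification.
    public void ReplaceWith(IEnumerable<T> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        CheckReentrancy();
        var copy = new List<T>(items); // materialize first in case items depends on this collection
        Items.Clear();
        foreach (T item in copy) Items.Add(item);
        RaiseReset();
    }

    private void RaiseReset()
    {
        OnPropertyChanged(new PropertyChangedEventArgs("Count"));
        OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
        OnCollectionChanged(new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
    }
}
```

Populating the list with three items then becomes one ReplaceWith call and one CollectionChanged event, at the cost of bound views rebuilding as they do for any Reset.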

Lockless list help!

Hi, I'm trying to write a lockless list. I got the adding part working, I think, but the code that extracts objects from the list does not work too well :(
Well, the list is not a normal list. I have the interface IWorkItem:
interface IWorkItem
{
    DateTime ExecuteTime { get; }
    bool Cancelled { get; }
    void Execute(DateTime now);
}
and I have a list that I can add these to :P The idea is that when I run Get() on the list, it should loop until it finds an IWorkItem for which
if (item.ExecuteTime < DateTime.Now)
holds, then remove it from the list and return it.
I have run tests with many threads on my dual-core CPU, and it seems that Add works (it has never failed so far), but the Get function loses some work items somewhere, and I have no idea what's wrong.
PS: if I get this working, anyone is free to use the code :) Well, you are anyway, but I don't see the point when it's bugged :P
The code is here: http://www.easy-share.com/1903474734/LinkedList.zip and if you try to run it, you will see that it is sometimes unable to get as many work items as it put into the list.
Edit: I have got a lockless list working. It was faster than using the lock(obj) statement, but I have a lock object that uses Interlocked that still outperformed the lockless list. I'm going to try to make a lockless ArrayList and see if I get the same results there. When I'm done I'll upload the result here.
The problem is your algorithm: Consider this sequence of events:
Thread 1 calls list.Add(workItem1), which completes fully.
Status is:
first=workItem1, workItem1.next = null
Then thread 1 calls list.Add(workItem2) and reaches the spot right before the second Replace (where you have the comment "//lets try").
Status is:
first=workItem1, workItem1.next = null, nextItem=workItem1
At this point thread 2 takes over and calls list.Get(). Assume workItem1's ExecuteTime is now, so the call succeeds and returns workItem1.
After this status is:
first = null, workItem1.next = null
(and in the other thread, nextItem is still workItem1).
Now we get back to the first thread, and it completes the Add() by setting workItem1.next:=workItem2.
If we call list.Get() now, we will get null, even though the Add() completed successfully.
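The interleaving above can be replayed step by step on a single thread to make the lost item visible. This is only an illustration: the Node class and variable names are stand-ins, since the asker's actual code is in the linked archive and is not reproduced here.

```csharp
using System;

public static class LostAddReplay
{
    // Stand-in for the list's node type (hypothetical).
    private sealed class Node
    {
        public readonly string Name;
        public Node Next;
        public Node(string name) { Name = name; }
    }

    // Replays the sequence of events and reports what the head of the list is at the end.
    public static string Replay()
    {
        // Thread 1: Add(workItem1) completes fully.
        var workItem1 = new Node("workItem1");
        Node first = workItem1;

        // Thread 1: Add(workItem2) reads the current tail, then is preempted.
        var workItem2 = new Node("workItem2");
        Node nextItem = first;       // nextItem = workItem1

        // Thread 2: Get() removes and returns workItem1.
        first = first.Next;          // first = null

        // Thread 1 resumes and links workItem2 behind the already removed node.
        nextItem.Next = workItem2;

        // workItem2 now hangs off a node that is no longer reachable from first.
        return first == null ? "empty" : first.Name;
    }
}
```

Replay() returns "empty": the second Add reported success, yet nothing reachable from first refers to workItem2.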
You should probably look up a real peer-reviewed lock-free linked list algorithm. I think the standard one is this by John Valois. There is a C++ implementation here. This article on lock-free priority queues might also be of use.
You can use a timestamping protocol for datastructures just fine, mirroring this example from the database world:
Concurrency
But be clear that each item needs both a read and write timestamp, and be sure to follow the rules of the algorithm clearly.
There are some additional difficulties of implementing this on a linked list though, I think. The database example would be fine for a vector where you know the array index of what you want. However, in a linked list, you may need to walk down the pointers -- and the structure of the list could change while you are searching! I guess you could solve that by some sort of nuance (or if you just want to traverse the "new" list as it is, do nothing), but it poses a problem. Try to solve it without introducing some rollback condition that makes it worse than locking the list!
So are you sure that it needs to be lockless? Depending on your work load the non-blocking solution can sometimes be slower. Check out this MSDN article for a little more. Also proving that a lockless data structure is correct can be very difficult.
I am in no way an expert on the subject, but as far as I can see, you need to either make the ExecuteTime field in the implementation of IWorkItem volatile (of course, it might already be) or insert a memory barrier either after you set ExecuteTime or before you read it. Note that C# does not permit volatile on a DateTime field, which leaves the memory barrier.

Threadsafe foreach enumeration of lists

I need to enumerate through a generic IList<> of objects. The contents of the list may change, with items being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a thread-safe foreach on an IList<>? Preferably without cloning the entire list. It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
There is no such operation. The best you can do is
lock (collection) {
    foreach (object o in collection) {
        ...
    }
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
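The first two options can be combined: hold the lock only long enough to copy the references into an array, then enumerate the copy with the lock released. The sketch below assumes every writer takes the same gate object before modifying the list; the helper name is made up. Only the references are copied, so the objects themselves never need to be cloned.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotEnumeration
{
    // Copies the references while holding the lock, then runs the (possibly slow)
    // per-item work on the copy with the lock released.
    public static void ForEachSnapshot<T>(IList<T> list, object gate, Action<T> action)
    {
        T[] snapshot;
        lock (gate)
        {
            snapshot = new T[list.Count];
            list.CopyTo(snapshot, 0);
        }
        foreach (T item in snapshot) action(item);
    }
}
```

The trade-off is that the loop may process an item that another thread removed after the snapshot was taken.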
ICollection MyCollection;
// Instantiate and populate the collection
lock (MyCollection.SyncRoot) {
    // Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock, which used to have big performance issues due to the so-called convoy problem.
The best article I've found on the subject is this one by Jeffrey Richter, which presents his own method for a high-performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
Foreach depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, b-tree, or hash table is to enumerate in order from the first to the last. It would not cause a problem to insert an element in the data structure after the iterator had already passed that point, or to insert one that the iterator would enumerate once it had arrived, and such an event could be detected by the application and handled if the application required it. Detecting a change in the collection and throwing an error during enumeration was, I can only imagine, someone's (bad) idea of what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections ConcurrentCollections (System.Collections.Concurrent) in .NET 4.0.
I recently spent some time multi-threading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for loop and immediately assign the object to a local copy to use inside the loop. Just keep in mind that all threads writing to the objects of your list should write to different data of the objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach (var p in Points)
{
    // work with p...
}
Can be replaced by:
for (int i = 0; i < Points.Count; i++)
{
    Point p = Points[i];
    // work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, that allows multiple concurrent readers but also a single writer (when there are no readers).
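A rough sketch of such a wrapper, using ReaderWriterLockSlim from System.Threading as the multiple-reader, single-writer lock. The GuardedList name and its small surface (Add, Remove, ForEach) are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper; all access to the inner list goes through the lock.
public sealed class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    public bool Remove(T item)
    {
        _lock.EnterWriteLock();
        try { return _items.Remove(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Several threads may run ForEach at once; writers wait until all readers have left.
    // The callback must not call Add or Remove, since the default lock is not recursive.
    public void ForEach(Action<T> action)
    {
        _lock.EnterReadLock();
        try { foreach (T item in _items) action(item); }
        finally { _lock.ExitReadLock(); }
    }
}
```

Writers are blocked for as long as any enumeration is running, so this suits short loop bodies; for long-running work per item, copying under the lock first may be the better fit.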
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND another thread can only ADD to the end of the list, then maybe you just switch to a FOR loop with a counter. At the point you grab the counter, you're only seeing X number of elements in the list. You can walk through the list (while others are adding to the end of it), and that should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor to use a dictionary).
