[Sorry, English is not my native language.]
So, I have a UI and one worker on another thread. The worker calls Update() on the UI at irregular intervals, so a lot of Update() calls may be invoked on the UI.
But when there are multiple Update() calls, only the latest one is meaningful, and I have no way to skip the ones in between.
So, before each Update(), I want to:
"detect whether an Update() is ongoing, and if so, just queue 1 more Update()"
"check whether 1 more Update() is already pending, in which case there is no need to queue more"
But I am not sure what the best way to do it is. Surely someone has encountered this problem before, but Googling only gives me unrelated results. So I am looking for patterns, best practices, search terms, or any advice or suggestions about this.
Thank you very much
It isn't clear what kind of class library you are using. However, invoking Update() is fundamentally wrong. Painting the UI is a low priority task, it should only be done when nothing more important needs to be taken care of.
The proper thing to do is call Invalidate(). You can call it as many times as you want; the calls cannot 'back up'. When the UI thread is ready and willing, it will paint the user interface. If the changes happen faster than the UI thread can keep up with, no harm is done: the intermediate paints just don't happen.
That, in general, is something else you need to take care of. It is pretty easy to shoot yourself in the foot and invoke hundreds of times per second, which is pointless: a human cannot perceive changes that fast. Forty times per second is plenty; it looks as smooth as a movie in a cinema. Realistically, you can use fewer.
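To keep the worker from refreshing hundreds of times per second, the refreshes can be rate-limited on the worker side. The following is a minimal sketch, assuming a 40 ms minimum interval (about 25 refreshes per second) and a hypothetical UpdateThrottle class that only the worker thread uses; the worker would call something like Invalidate() only when ShouldRefresh() returns true. The clock is injectable so the logic can be tested without real delays.

```csharp
using System;
using System.Diagnostics;

// Sketch: lets at most one UI refresh through per interval.
// Not thread-safe by design: only the worker thread calls it.
public sealed class UpdateThrottle
{
    private readonly long _minIntervalMs;
    private readonly Func<long> _nowMs;   // injectable clock (milliseconds)
    private bool _hasFired;
    private long _lastMs;

    public UpdateThrottle(long minIntervalMs, Func<long> nowMs = null)
    {
        _minIntervalMs = minIntervalMs;
        if (nowMs == null)
        {
            Stopwatch clock = Stopwatch.StartNew();
            nowMs = () => clock.ElapsedMilliseconds;
        }
        _nowMs = nowMs;
    }

    // True when enough time has passed since the last refresh that was let through.
    public bool ShouldRefresh()
    {
        long now = _nowMs();
        if (_hasFired && now - _lastMs < _minIntervalMs)
            return false;
        _hasFired = true;
        _lastMs = now;
        return true;
    }
}
```

One caveat of this design: a throttle can skip the very last change, so the worker should send one final, unconditional refresh when it finishes.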
Seems like you need a queue of Update requests with a length of one.
Produce update requests into the queue and discard them if the queue is full, then consume them from another thread that does the final Invoke to the main thread.
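A minimal sketch of that length-one queue, assuming .NET's BlockingCollection<T> with a bounded capacity of 1. UpdateCoalescer and RequestUpdate are hypothetical names, and performUpdate is a placeholder for whatever does the final Invoke/BeginInvoke to the UI thread. Because the consumer removes a request before running it, one more request can be pending while an update is in progress, which is the behaviour asked for in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Sketch: a queue of update requests with a capacity of one.
public sealed class UpdateCoalescer : IDisposable
{
    // Bounded capacity 1: at most one request can wait while an update runs.
    private readonly BlockingCollection<bool> _requests =
        new BlockingCollection<bool>(boundedCapacity: 1);
    private readonly Task _consumer;
    private int _processed;

    // performUpdate is a placeholder for the final Invoke/BeginInvoke to the UI thread.
    public UpdateCoalescer(Action performUpdate)
    {
        _consumer = Task.Run(() =>
        {
            // Taking a request frees the slot, so another one can queue up during the update.
            foreach (bool request in _requests.GetConsumingEnumerable())
            {
                performUpdate();
                Interlocked.Increment(ref _processed);
            }
        });
    }

    public int Processed => Volatile.Read(ref _processed);

    // Returns false (the request is discarded) when one is already pending.
    public bool RequestUpdate() => _requests.TryAdd(true);

    public void Dispose()
    {
        _requests.CompleteAdding();   // let the consumer drain what is left, then stop
        _consumer.Wait();
        _requests.Dispose();
    }
}
```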
If you always want to guarantee that the value retrieved by the UI thread is the last one pushed, consider a stack structure for the data you're sharing between the UI and background threads. You will still need to put a lock on the shared data to ensure that the UI thread doesn't get a "stale" update.
Here's a reference to the Stack class in C#:
http://msdn.microsoft.com/en-us/library/system.collections.stack.aspx
According to the documentation:
Thread Safety
Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe. To guarantee the thread safety of the Stack, all operations must be done through the wrapper returned by the Synchronized method.
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
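Following that advice, a stack guarded by one lock can hand the UI thread only the newest update. This sketch substitutes the generic Stack<T> plus an explicit lock for the non-generic Stack.Synchronized wrapper; LatestUpdateStack and TryTakeLatest are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Sketch: the worker pushes updates, the UI thread takes only the newest one.
public sealed class LatestUpdateStack<T>
{
    private readonly object _gate = new object();   // one lock shared by both threads
    private readonly Stack<T> _stack = new Stack<T>();

    // Called from the background thread.
    public void Push(T update)
    {
        lock (_gate) { _stack.Push(update); }
    }

    // Called from the UI thread: returns the most recent update and drops older ones.
    public bool TryTakeLatest(out T latest)
    {
        lock (_gate)
        {
            if (_stack.Count == 0)
            {
                latest = default(T);
                return false;
            }
            latest = _stack.Pop();
            _stack.Clear();   // anything still on the stack is older, i.e. stale
            return true;
        }
    }
}
```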
EDIT:
Jorge is right: there is also a Queue class that might be better suited:
http://msdn.microsoft.com/en-us/library/system.collections.queue.aspx
Related
I have a program that is constantly reading and parsing a large stream of data from a WebSocket. All of the parsing happens on one thread within the client, and the data is organized into a SortedSet<T> tree for fast operation.
All of the data is added, updated, and removed without a hitch.
The problem comes when I try to access the data from another thread. It runs fine, but somewhere along the line there is a race condition that gets hit within a minute or two.
Consider this code (running on its own thread) to update the UI in near real-time:
private async Task RenderOrderBook()
{
    var book = _client.OrderBook;
    while (true)
    {
        try
        {
            var asks = book.Asks.OrderBy(i => i.Price).Take(5).OrderByDescending(i => i.Price);
            var bids = book.Bids.OrderByDescending(i => i.Price).Take(5);
            orderBookView.BeginInvoke(new MethodInvoker(() =>
            {
                // ...omitted due to irrelevance
            }));
            await Task.Delay(500);
        }
        catch (Exception ex)
        {
            ex.ToString();
        }
    }
}
The race condition lies within the LINQ operations on book. The common error is that i.Price (a decimal variable), or perhaps just the object i is referring to, is null. Additionally, my shoddy attempt to just swallow the exception does not actually work.
Regardless, my guess is that the data is being parsed and manipulated so fast that eventually, when using the LINQ OrderBy operation, it will hit a case where a node has been removed by the client, attempt to read from it, and throw an exception.
The book.Asks and book.Bids properties were initially of type SortedSet<T> and pointed directly to the data member itself. In an attempt to mitigate this race condition, I changed them to arrays of nodes and used an _asks.ToArray() call to essentially make a copy to read from. This made the problem occur a bit less frequently, but it still happens.
How can I make this thread-safe?
Additional Code Snippets
public PriceNode[] Asks
{
    get { return _asks.ToArray(); }
}
public PriceNode[] Bids
{
    get { return _bids.ToArray(); }
}
My first rule of UI development is that you never perform I/O on the UI thread. Sounds like you've got that one covered.
My second rule is that once something is visible to the UI thread, you can't touch it from any other thread. There is exactly one exception to this rule, and that is for immutable data: if an object will not change, then any thread can touch it. Mutable data? No touch. Keep in mind that "mutable data" includes most collections.
Your life will be so much easier if you can follow these two rules. Following one without breaking the other can be tricky, but there are ways to do it, and once you have a decent grip of them, you'll be in a better place. The path to enlightenment begins here:
Your read thread (the thread reading off the socket) is allowed to create all the new objects it wants, but it can't update existing objects. It also can't modify any collections that the UI thread is using. If you're only adding new objects, this isn't so bad: your read thread can pull data off the socket and use it to cook up new objects. When those objects are ready, it has to hand them over to the UI thread, and the UI thread can add them to the relevant collections. The bulk of the work (and all of the I/O) happens on the read thread, which is what we want, per Strobel's Rule #1. The act of "committing" the already-populated objects should be trivial by comparison. Per Rule #2, once any mutable objects get handed off to the UI thread, your read thread can't touch them again. Ever.
Updating existing objects is trickier. There are a couple of ways you can approach this. One is to have the read thread use the latest data to create new objects, which it then hands off to the UI thread. If you have very simple object graphs, the easiest option might be to simply replace the old objects with their newer versions, keeping in mind that any UI code referencing an old object will need to know that it's been replaced. Alternatively, the UI thread can use the data from the new object to update the existing object. If you're following Rule #2, this will be totally thread-safe, and any UI code that pointed to the old object automatically sees the new data without any torn reads or other race-related nastiness. This approach is probably your best bet.
If, after trying out the approaches in the previous paragraph, you find that you are generating unacceptable amounts of garbage, there is a third option. The read thread can copy the raw data for each object into a temporary buffer, then hand the buffers over to the UI thread, which can use the data in the buffers to update the existing objects. This means more work occurring on the UI thread, but at least the data is already in memory (the socket I/O is already done). Since the point of this approach is to create less garbage, it only makes sense if you reuse the buffers. That means you need a thread-safe buffer pool. The read thread acquires a temporary buffer, fills it from the socket, hands it to the UI thread, which returns it to the pool when it's done. Astute readers will note that passing mutable buffers between threads bumps up against Rule #2, so take care that once a thread hands over a buffer, it immediately forgets about it. Because this approach requires a stronger grasp of thread safety to make the pool work, I recommend it only as a last resort. If you can get away with one of the options in the previous paragraph, please do so.
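A minimal sketch of the thread-safe buffer pool this option needs, assuming fixed-size byte[] buffers and ConcurrentBag<T> as the free list; BufferPool, Rent, and Return are hypothetical names. (On newer .NET versions, System.Buffers.ArrayPool<T> offers a ready-made pool.)

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: a pool of fixed-size buffers. Rent and Return may be called from any thread.
public sealed class BufferPool
{
    private readonly ConcurrentBag<byte[]> _free = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize)
    {
        _bufferSize = bufferSize;
    }

    // Reuses a returned buffer when one is available, otherwise allocates a new one.
    public byte[] Rent()
    {
        byte[] buffer;
        return _free.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    // After calling Return, the caller must forget the buffer (Rule #2).
    public void Return(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (buffer.Length != _bufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.", nameof(buffer));
        _free.Add(buffer);
    }
}
```

In the flow described above, the read thread would Rent a buffer and fill it from the socket, then hand it to the UI thread, which calls Return once it has copied the data into the existing objects.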
Regardless of which approach you use for updating existing objects, you'll need a way to match up the new objects/data with the old objects. If each object has a unique identifier, you can use a Dictionary<,> as an efficient lookup mechanism. Replacing old objects with their newer copies is a bit more involved, because the old versions may be scattered across multiple collections, some of which may not support efficient replacement.
One last thing: when you hand over new/updated objects to the UI thread, it is vastly preferable to do it in batches. For example, you're better off posting a single operation to your UI thread to update 100 objects than posting 100 separate operations that each update one object.
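The second approach (the UI thread updates existing objects from new data), the Dictionary<,> lookup, and batching can be combined in one sketch. PriceLevelSnapshot, PriceLevel, UiOrderBook, and the rule that a size of 0 removes a level are all hypothetical; the read thread would build an array of immutable snapshots and post a single call such as orderBookView.BeginInvoke(new MethodInvoker(() => book.ApplyBatch(batch))).

```csharp
using System;
using System.Collections.Generic;

// Immutable snapshot built on the read thread; safe to hand between threads.
public sealed class PriceLevelSnapshot
{
    public PriceLevelSnapshot(decimal price, decimal size) { Price = price; Size = size; }
    public decimal Price { get; }
    public decimal Size { get; }
}

// Mutable object that only the UI thread may touch (Rule #2).
public sealed class PriceLevel
{
    public PriceLevel(decimal price) { Price = price; }
    public decimal Price { get; }
    public decimal Size { get; set; }
}

// Owned by the UI thread; ApplyBatch runs once per posted batch.
public sealed class UiOrderBook
{
    private readonly Dictionary<decimal, PriceLevel> _levels = new Dictionary<decimal, PriceLevel>();

    public IReadOnlyDictionary<decimal, PriceLevel> Levels => _levels;

    public void ApplyBatch(IEnumerable<PriceLevelSnapshot> batch)
    {
        foreach (PriceLevelSnapshot snap in batch)
        {
            PriceLevel level;
            if (snap.Size == 0m)
            {
                _levels.Remove(snap.Price);          // sketch convention: size 0 removes the level
            }
            else if (_levels.TryGetValue(snap.Price, out level))
            {
                level.Size = snap.Size;              // update the existing object in place
            }
            else
            {
                _levels.Add(snap.Price, new PriceLevel(snap.Price) { Size = snap.Size });
            }
        }
    }
}
```

Because existing PriceLevel instances are updated rather than replaced, any UI code holding a reference to one sees the new size without extra bookkeeping.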
Consider two threads running simultaneously: A is reading and B is writing. While A is reading, its CPU time slice may end in the middle of the code, and then thread B continues.
Is there any way to avoid giving up the CPU until A finishes, while still letting B start or continue?
You need to understand that you have almost no control over when the CPU is given back and to whom it is given. The operating system does that. To have control over that, you'd need to be the operating system. The only things you can usually do are:
start a thread
set thread priority, so that some threads may be more likely to get time than others
put a thread to sleep immediately and ask the operating system to wake it up upon some condition, maybe with some timeout (waiting time limit)
as a special case, or a typical use case, the previous point is often also provided as a shorthand:
put a thread to sleep immediately for a specified amount of time
By "sleep" I mean that this thread is paused and will not get any CPU time, even if all CPUs are idle, unless the thread is woken up by the OS due to some condition.
Furthermore, in a typical case, there is no "thread A and thread B that switch CPU time between them", but there is "lots of threads from various processes and the operating system itself, and your two threads". This means that when your thread A loses the CPU, most probably it will not be thread B that gets the time now. Some other thread from somewhere else will get it, and at some future point in time, maybe your thread A or maybe thread B will get it back.
This means that there is very little you can be sure of. You can be sure that each of your threads is
either dead
or sleeping
or proceeding 'forward' in a hard to determine order
If you need to ensure that some threads are synchronized, you must either not start them simultaneously, or put them to sleep at precise moments and wake them up in a precise order.
You've just said in comments:
You know, if in the middle of A CPU time finishes, data that has been retrieved is not complete
This means that you need to ensure that thread B does not try to touch the data before thread A finishes writing it. But also, if you think about it, you need to ensure that thread A doesn't start writing next data if the thread B is now reading previous data.
This means synchronization. This means that threads A and B must wait if the other thread is touching the data. This means that they need to be put to sleep and woken up when the other thread finishes.
In C#, the easiest way to do that is to use the lock(x) statement. When a thread enters a lock() section, it proceeds only if it is able to get the lock. If not, it is put to sleep. It can't get the lock if another thread was faster and got it first. A thread releases the lock when it finishes its job; at that point, one of the sleeping threads is woken up and given the lock.
lock(foo) { // <- this line means 'acquire the lock or sleep'
    iam.doing(myjob);
    very.important(things);
    thatshouldnt.be.interrupted();
    byother(threads);
} // <- this line means 'release the lock'
So, when a thread gets through the lock(foo){ line, you can't be sure it won't be interrupted. Oh, surely it will be: the OS will switch threads back and forth to other processes, and so on. But you can be sure that no other thread of your app will be inside that code block. If other threads try to get inside while your thread holds the lock, they immediately fall asleep at the lock line. One of them will later be woken up when your thread leaves that code.
There's one more thing: lock() requires a parameter. I wrote foo there. You need to pass something there that will act as the lock. It can be any object (any reference-type instance), even a plain object:
private readonly object thelock = new object();
private void dosomething()
{
    lock(thelock)
    {
        foobarize(thebaz);
    }
}
However, you must ensure that all threads try to use the same lock instance. Writing code like
private void dosomething()
{
    object thelock = new object();
    lock(thelock)
    {
        foobarize(thebaz);
    }
}
is nonsense, since every thread executing those lines will try locking on its own new object instance, will see it as "free" (it's new, just created, no one took it earlier), and will immediately get into the protected code block.
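To see why the shared instance matters, this sketch has several threads increment one counter under a single readonly lock object; SharedCounter and Run are hypothetical names. With the per-call new object() version, the final count could come out lower than expected, because the increments would no longer exclude each other.

```csharp
using System;
using System.Threading;

// Sketch: many threads, one lock instance, one shared counter.
public sealed class SharedCounter
{
    private readonly object _theLock = new object();   // shared by every thread
    private int _count;

    public void Increment()
    {
        lock (_theLock)
        {
            _count++;   // read-modify-write: not atomic without the lock
        }
    }

    public int Count
    {
        get { lock (_theLock) { return _count; } }
    }

    // Starts `threads` threads that each call Increment `perThread` times, then waits for all of them.
    public static int Run(int threads, int perThread)
    {
        var counter = new SharedCounter();
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; i++) counter.Increment();
            });
            workers[t].Start();
        }
        foreach (Thread worker in workers) worker.Join();
        return counter.Count;
    }
}
```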
Now, you wrote about using ConcurrentQueue. This class provides built-in protection against concurrent access. You can be sure that adding, reading, or removing items from that queue is already safe; the collection makes it safe, and you don't need to add your own synchronization to add or remove items. If you observe any ill effects, then most probably you put an item into that collection and then kept modifying that item. A concurrent collection will not guard you against that. It can only make sure that add/remove/etc. are safe; it has no knowledge of, or control over, what you do to the items:
In short, if some thread B tries to read items from the collection, then in thread A this is NOT safe:
concurrentcoll.Add(item);
item.x = 5;
item.foobarize();
but this is safe:
item.x = 5;
item.foobarize();
concurrentcoll.Add(item);
// and do not touch the item anymore here.
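A sketch of the safe ordering, assuming ConcurrentQueue<T> (which exposes Enqueue/TryDequeue rather than Add) and a hypothetical Item type: the producer (thread A) finishes configuring each item before publishing it, and the consumer (thread B, here the calling thread) counts any item it receives in a half-configured state.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class Item
{
    public int X { get; set; }
    public string Label { get; set; }
}

public static class SafePublishDemo
{
    // Returns how many dequeued items looked half-configured (expected: 0).
    public static int ProduceAndConsume(int count)
    {
        var queue = new ConcurrentQueue<Item>();

        // Thread A: configure each item completely, then publish it.
        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < count; i++)
            {
                var item = new Item { X = i, Label = "item " + i };
                queue.Enqueue(item);
                // From here on, this thread must not touch the item again.
            }
        });

        // Thread B: read items as they arrive.
        int consumed = 0, halfConfigured = 0;
        var spinner = new SpinWait();
        while (consumed < count)
        {
            Item received;
            if (queue.TryDequeue(out received))
            {
                if (received.Label != "item " + received.X) halfConfigured++;
                consumed++;
            }
            else if (producer.IsFaulted)
            {
                break;
            }
            else
            {
                spinner.SpinOnce();
            }
        }

        producer.Wait();   // rethrows if the producer failed
        return halfConfigured;
    }
}
```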
I have several actions that I want to execute in the background, but they have to be executed sequentially, one after the other.
I was wondering if it's a good idea to use the Task.ContinueWith method to achieve this. Do you foresee any problems with this?
My code looks something like this:
private object syncRoot = new object();
private Task latestTask;
public void EnqueueAction(System.Action action)
{
    lock (syncRoot)
    {
        if (latestTask == null)
            latestTask = Task.Factory.StartNew(action);
        else
            latestTask = latestTask.ContinueWith(tsk => action());
    }
}
There is one flaw with this, which I recently discovered myself because I am also using this method of ensuring tasks execute sequentially.
In my application I had thousands of instances of these mini-queues and quickly discovered I was having memory issues. Since these queues were often idle I was holding onto the last completed task object for a long time and preventing garbage collection. Since the result object of the last completed task was often over 85,000 bytes it was allocated to Large Object Heap (which does not perform compaction during garbage collection). This resulted in fragmentation of the LOH and the process continuously growing in size.
As a hack to avoid this, you can schedule a no-op task right after the real one within your lock. For a real solution, I will need to move to a different method of controlling the scheduling.
This should work as designed (using the fact that TPL will schedule the continuation immediately if the corresponding task already has completed).
Personally, in this case I would just use a dedicated thread that draws tasks from a concurrent queue (ConcurrentQueue). This is more explicit and easier to follow when reading the code, especially if you want to find out, e.g., how many tasks are currently queued.
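A minimal sketch of the dedicated-thread approach, assuming BlockingCollection<T> backed by a ConcurrentQueue<T> so the thread can block while the queue is empty; SerialActionQueue and PendingCount are hypothetical names. Unlike the ContinueWith chain, no completed Task objects are kept around once an action has run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Sketch: runs enqueued actions one after the other on a single dedicated thread.
public sealed class SerialActionQueue : IDisposable
{
    // FIFO order comes from the ConcurrentQueue<T> backing store.
    private readonly BlockingCollection<Action> _actions =
        new BlockingCollection<Action>(new ConcurrentQueue<Action>());
    private readonly Thread _worker;

    public SerialActionQueue()
    {
        _worker = new Thread(Run) { IsBackground = true, Name = "SerialActionQueue" };
        _worker.Start();
    }

    // How many actions are waiting (not counting the one currently running).
    public int PendingCount => _actions.Count;

    public void Enqueue(Action action) => _actions.Add(action);

    private void Run()
    {
        // Blocks while the queue is empty; ends after CompleteAdding once the queue is drained.
        foreach (Action action in _actions.GetConsumingEnumerable())
        {
            try { action(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); }   // keep the queue alive
        }
    }

    public void Dispose()
    {
        _actions.CompleteAdding();
        _worker.Join();   // wait for the remaining actions to finish
        _actions.Dispose();
    }
}
```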
I used this snippet and it seems to work as designed.
The number of instances in my case does not run into the thousands; it is in single digits.
Nevertheless, no issues so far.
I would be interested in the ConcurrentQueue example, if there is any?
Thanks
My problem is this:
I have two threads: my UI thread and a worker thread. My worker thread runs in a separate class that is instantiated by the form, which passes itself as an ISynchronizeInvoke to the worker class. The worker then uses Invoke on that interface to raise its events, which provide status updates to the UI for display. This works wonderfully.
I noticed that my background thread seemed to be running slowly though, so I changed the call to Invoke to BeginInvoke, thinking that "I'm just providing progress updates, it doesn't need to be exactly synchronous, no harm done" except that now I'm getting oddities with the progress update. My progress bar updates, but the label's text doesn't, and if I change to another window and try to change back, it acts like the UI thread is locked up, so I'm wondering if perhaps my progress calls (which happen very often) are overloading the UI thread so much that it never processes messages. Is this possible at all, or is there something else at work here?
You're definitively overloading the UI thread.
In your first sample, you were (behind the scenes) sending a message to the UI thread, waiting for it to be processed (that's the purpose of Invoke, which behaves like SendMessage: it blocks until the message has been handled), and only then sending another one. In the meantime, other messages (WM_PAINT messages, for example) were probably enqueued and processed.
In your second sample, by using BeginInvoke (which ultimately relies on PostMessage), you enqueued a huge number of messages in the message queue, which the message pump must handle sequentially. And of course, while it's handling those thousands of messages, it cannot handle the OS messages (WM_PAINT, etc.), which makes your UI look "frozen".
You're probably providing too many status updates; try lowering the feedback frequency.
If you want to understand better how messages work in windows, this is the place to start.
A few thoughts:
try batching your updates; for example, there is no point updating on every iteration of a loop; depending on the speed, perhaps every 50 or 500. In the case of lists, you would buffer in a local list variable, hand the list over via Invoke / BeginInvoke, and process the buffer on the UI thread
variable capture; if you are using BeginInvoke and anonymous methods, you could have problems... I'll add an example below
making the UI update efficient, especially if you are processing a list; some controls (especially list-based controls such as ListBox and ListView) have a pair of methods, BeginUpdate / EndUpdate, that stop the UI redrawing while you are making lots of updates; instead, it waits until EndUpdate is called
capture problem... imagine (worker):
List<string> stuff = new List<string>();
for(int i = 0 ; i < 50000 ; i++) {
    stuff.Add(i.ToString());
    if((i % 100) == 0) {
        // update UI
        BeginInvoke((MethodInvoker) delegate {
            foreach(string s in stuff) {
                listBox.Items.Add(s);
            }
        });
    }
}
Did you notice that at some point both threads are talking to stuff? The UI thread can be iterating it while the worker thread (which has kept running past BeginInvoke) keeps adding. This can cause issues. Not usually performance issues (unless you are catching the exceptions and taking a long time to log them), but definitely issues. Options here would include:
using Invoke to run the update synchronously
create a new buffer per update, so that the two threads never share the same list instance (you'd need to look very carefully at the variable scope to make sure, though)
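The second option can be sketched without WinForms by passing the marshalling call in as a delegate. BatchedWorker, post (standing in for BeginInvoke), and deliver are hypothetical. The key detail is that each lambda captures a variable declared inside the if block, so it keeps the list instance that was current when it was posted, while the worker moves on to a fresh list.

```csharp
using System;
using System.Collections.Generic;

public static class BatchedWorker
{
    // post stands in for BeginInvoke; deliver receives a batch the worker never touches again.
    public static void Run(int count, int batchSize, Action<Action> post, Action<List<string>> deliver)
    {
        var buffer = new List<string>(batchSize);
        for (int i = 0; i < count; i++)
        {
            buffer.Add(i.ToString());
            if (buffer.Count == batchSize)
            {
                List<string> batch = buffer;           // a fresh variable per batch
                post(() => deliver(batch));            // the UI side gets this instance...
                buffer = new List<string>(batchSize);  // ...and the worker starts a new one
            }
        }
        if (buffer.Count > 0)
        {
            List<string> last = buffer;                // flush the partial final batch
            post(() => deliver(last));
        }
    }
}
```

Capturing buffer directly in the lambda would reintroduce the bug: by the time the UI thread ran the callback, buffer would already refer to a later list.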
Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if someone more knowledgeable could take a peek at my code and see if there are any obvious problems. No one I work with knows multithreading, so I'm really on my own on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
using System.Collections.Generic;
using System.Threading;
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();
    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems = new HashSet<int>();
    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems = new HashSet<int>();
    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if (
                !this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id)
            ) {
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method
    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }
        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************
        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have light weight sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items, then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items to process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
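A minimal sketch of that master/worker model, assuming BlockingCollection<T> as the bounded blocking queue and a fixed number of worker threads (for CPU-bound work, Environment.ProcessorCount is a common choice). WorkerPool, Add, and the maxQueued limit are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: the master adds item ids; a fixed set of worker threads consumes them.
public sealed class WorkerPool : IDisposable
{
    private readonly BlockingCollection<int> _queue;
    private readonly Thread[] _workers;

    public WorkerPool(int workerCount, int maxQueued, Action<int> process)
    {
        // Bounded: Add blocks the master once maxQueued items are waiting.
        _queue = new BlockingCollection<int>(maxQueued);
        _workers = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            _workers[w] = new Thread(() =>
            {
                // Blocks while empty; ends after CompleteAdding once the queue is drained.
                foreach (int itemId in _queue.GetConsumingEnumerable())
                    process(itemId);
            }) { IsBackground = true };
            _workers[w].Start();
        }
    }

    // Called by the master thread for each item it reads from the source.
    public void Add(int itemId) => _queue.Add(itemId);

    // Signals that no more work is coming and waits for the workers to finish.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread worker in _workers) worker.Join();
        _queue.Dispose();
    }
}
```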
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU-bound, then I would create no more than one worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems sets before queueing a work item, you do not add the item to the validatingItems set until the new thread spins up. That means there could be a time gap where another thread calls ValidateItem(id) with the same id (unless this is all running on a single main thread).
I would add the item to the validatingItems set just before queueing the work item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued and THEN add it to the validatingItems set.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is not checking the return value of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list and handle the error (probably by throwing an exception).
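Putting the two fixes from these answers together, the item is marked as in progress inside the same lock that checks it, and the return value of QueueUserWorkItem is checked. This is a sketch, not the asker's code: the validate delegate is a hypothetical stand-in for the time-consuming routine, and the lock is per instance rather than static. (The current .NET documentation appears to say QueueUserWorkItem throws NotSupportedException when an item cannot be queued, so the false branch here is a defensive check.)

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch combining the fixes discussed above; not the original poster's code.
public class ItemValidationService
{
    private readonly object locker = new object();
    private readonly HashSet<int> validatedItems = new HashSet<int>();
    private readonly HashSet<int> validatingItems = new HashSet<int>();
    private readonly Action<int> validate;   // hypothetical stand-in for the slow routine

    public ItemValidationService(Action<int> validate)
    {
        this.validate = validate;
    }

    // Returns true if a validation was queued, false if the item is already known.
    public bool ValidateItem(int id)
    {
        lock (locker)
        {
            if (validatedItems.Contains(id) || validatingItems.Contains(id))
                return false;

            // Mark the item as in progress inside the same lock, before queueing.
            validatingItems.Add(id);

            if (!ThreadPool.QueueUserWorkItem(_ => Validate(id)))
            {
                // Queueing failed: undo the bookkeeping and report the error.
                validatingItems.Remove(id);
                throw new InvalidOperationException("Could not queue validation for item " + id + ".");
            }
            return true;
        }
    }

    private void Validate(int itemId)
    {
        bool succeeded = false;
        try
        {
            validate(itemId);
            succeeded = true;
        }
        finally
        {
            lock (locker)
            {
                validatingItems.Remove(itemId);
                if (succeeded) validatedItems.Add(itemId);
            }
        }
    }
}
```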
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
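A sketch of the semaphore idea, assuming SemaphoreSlim caps how many validations run at once; ThrottledRunner and Peak (the highest concurrency observed) are hypothetical. Under light load fewer slots are used, so the effective pool size is dynamic with a cap.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: at most maxConcurrent pieces of work run at the same time.
public sealed class ThrottledRunner
{
    private readonly SemaphoreSlim _slots;
    private readonly object _statsLock = new object();
    private int _running;
    private int _peak;

    public ThrottledRunner(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Highest number of concurrently running work items observed so far.
    public int Peak
    {
        get { lock (_statsLock) { return _peak; } }
    }

    public async Task RunAsync(Action work)
    {
        await _slots.WaitAsync().ConfigureAwait(false);   // wait for a free slot
        try
        {
            lock (_statsLock)
            {
                _running++;
                if (_running > _peak) _peak = _running;
            }
            await Task.Run(work).ConfigureAwait(false);
        }
        finally
        {
            lock (_statsLock) { _running--; }
            _slots.Release();                              // let the next caller in
        }
    }
}
```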
When using the thread pool, I also find it more difficult to monitor the execution of the pool threads as they perform the work. So, unless it's fire-and-forget, I am in favor of more controlled execution. I know you mentioned that your app exits after the 65K items are all completed. How are you monitoring your threads to determine whether they have completed their work, i.e. whether all queued workers are done? Are you monitoring the status of all items in the HashSets? I think by queueing your items up and having your own worker threads consume from that queue, you can gain more control. That said, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-robotics work (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - DataFlow. You can find examples of implementing the producer-consumer pattern there; in your case, the database would be the producer of items to validate, and the validation routine would be the consumer.
Also recommend using ConcurrentDictionary<TKey, TValue> as a "Concurrent" hash set where you just populate the keys with no values :). You can potentially make your code lock-free.
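A minimal sketch of using ConcurrentDictionary<TKey, TValue> as a concurrent set, assuming byte values that are simply ignored; LockFreeValidationTracker and TryClaim are hypothetical names. TryAdd returns true for exactly one caller per key, so no explicit lock is needed in your own code (the dictionary still uses fine-grained locking internally for writes).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Sketch: claims each item id at most once, without an explicit lock in this class.
public sealed class LockFreeValidationTracker
{
    // Keys act as the set; the byte values are ignored.
    private readonly ConcurrentDictionary<int, byte> _claimed = new ConcurrentDictionary<int, byte>();

    // Returns true for exactly one caller per id, however many threads race on it.
    public bool TryClaim(int id) => _claimed.TryAdd(id, 0);

    public int Count => _claimed.Count;
}
```

In the question's service, ValidateItem would queue work only when TryClaim(id) returns true, which closes the gap between checking and marking an item.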