Best way to prevent a multi-threaded application from running the same code concurrently - c#

I have an automatic betting BOT.
I use a Windows Service and timers to set off a job every 30 seconds in its own thread that takes bets from the DB, loops through and places them.
However, when a job runs long (over 30 seconds), the same bet can be placed twice with the same BetPK (unique ID), because the new job runs at the same time as a previously started thread.
I am using C#, NET 4, VS 2012.
At the moment I set a "locked" flag in a table when the job to place bets runs and then unset it on finishing, so if another job runs while the flag is set it returns immediately. However, this relies on the DB and network traffic.
What would be the best way in C# to prevent a job started by a timer thread from clashing with a previously started thread. I am thinking I could set a flag IN the service controller that spawns the threads so if a job is running another one won't spawn.
However I would like to learn the correct way to handle multi-threaded clashes like this. I just lost a couple of hundred pounds today due to 2 LAY bets being placed at exactly the same time. As only one record existed for the bet, the last bet placed had the Betfair ID updated, so I had no clue about the duplicate until I checked Betfair's own page.
I do already do checks to see if the bet has already been placed before trying to place it but in cases where the "placebet" method is running on the same Bet record at exactly the same time then this is no good.
Any help much appreciated.
Thanks

No, the best solution is to keep the locks in the database. The app should be as stateless as possible; you already have a good solution.
Locking inside your app is error-prone, and the errors are catastrophic (deadlock: the app stops working until manually restarted). Locking using the database is much easier, and errors there are recoverable.
Just get the locking with the database right. Ask a new question where you post details on what you're doing. I recommend that you XLOCK any betting jobs that you're working on, so that each can only be executed once. Use the power of database locks and transactions to make this work. This is far easier than app-level threading.
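A rough sketch of claiming a bet row under a lock (table and column names are illustrative, and I'm using the common UPDLOCK/READPAST queue-claiming hints rather than XLOCK; requires System.Data.SqlClient):

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (SqlTransaction tran = conn.BeginTransaction())
    {
        // Claim one unplaced bet. READPAST skips rows another job has locked,
        // so two concurrent jobs can never claim the same BetPK.
        var claim = new SqlCommand(
            @"SELECT TOP 1 BetPK
              FROM Bets WITH (UPDLOCK, ROWLOCK, READPAST)
              WHERE PlacedOn IS NULL", conn, tran);
        object betPk = claim.ExecuteScalar(); // null if nothing to place

        if (betPk != null)
        {
            PlaceBet((int)betPk); // hypothetical method that calls the betting API

            var mark = new SqlCommand(
                "UPDATE Bets SET PlacedOn = GETUTCDATE() WHERE BetPK = @pk",
                conn, tran);
            mark.Parameters.AddWithValue("@pk", betPk);
            mark.ExecuteNonQuery();
        }

        tran.Commit(); // releases the row lock
    }
}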

You could always try a db like Redis (redis.io), which offers built-in POP functions (http://redis.io/commands/lpop). Redis has a C# client and is super useful for any kind of app where speed is crucial, as it keeps the entire db in memory. It's also single-threaded, which makes it easy to implement distributors for multi-consumer type applications.
I'd also recommend checking out http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis as it lays out the pros and cons for Redis and other dbs. Might help you make future db decisions.
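A sketch of the idea, assuming the StackExchange.Redis client (any C# client that exposes LPOP would do). Because LPOP removes the item atomically, two workers can never pop the same bet:

ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost");
IDatabase db = redis.GetDatabase();

RedisValue betId = db.ListLeftPop("pending-bets"); // LPOP: atomic removal
if (!betId.IsNull)
{
    PlaceBet((int)betId); // hypothetical; on failure you might push the id back for retry
}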

Old question, I know, but I wanted to throw this out there for anybody that stumbles across it.
C# (and presumably VB.NET) offers a couple of nice options for handling thread synchronization. You can use the lock keyword to block execution until a given lock is available, or Monitor.TryEnter() if you want to specify a timeout (possibly zero, i.e. give up immediately) for taking the lock.
For either of these approaches, you need an object to use for locking. Pretty much any object will do; if you aren't synchronizing access to some object itself (collection, database connection, whatever), you can even just instantiate a throwaway object. For a polling timer, the latter is typical.
First, make sure you have an object to use for synchronization:
public class DatabasePollingClass {
    object PollingTimerLock = new object();
    ...
Now, if you want the polling threads to block indefinitely waiting for their turn, use the lock keyword:
public class DatabasePollingClass {
    object PollingTimerLock = new object();
    ...
    protected void PollingTimerCallback() {
        lock (PollingTimerLock) {
            //Useful stuff here
        }
    }
}
Only a single thread will be allowed within the lock (PollingTimerLock) block of code at a time. All other threads will wait indefinitely, then resume executing as soon as they can acquire the lock for themselves.
However, you probably don't want that behavior. If you'd rather have the subsequent threads abort immediately (or after a short wait) if another polling thread is still running, you can use Monitor.TryEnter() when taking the lock. This does require slightly more caution, however:
public class DatabasePollingClass {
    object PollingTimerLock = new object();
    ...
    protected void PollingTimerCallback() {
        if (Monitor.TryEnter(PollingTimerLock)) { //Acquires lock on PollingTimerLock object
            try {
                //Useful stuff here
            } finally {
                //Releases lock.
                //You MUST do this in a finally block! (See below.)
                Monitor.Exit(PollingTimerLock);
            }
        } else {
            Console.WriteLine("Warning: Polling timer overlap. Skipping.");
        }
    }
}
The additional caution stems from the fact that, unlike the lock keyword, Monitor.TryEnter() requires you to manually release the lock when you're finished with it. In order to guarantee that this happens, you need to wrap your whole critical section in a try block, and release the lock in the finally block. This is to ensure that the lock will be released, even if the polling method fails or returns early. If the method returned without releasing the lock, your program would effectively be hung, as no further threads would be able to acquire the lock.
Another option, which doesn't use locking mechanisms, would be to configure your Timer without a repeat period, i.e. a one-shot Timer. At the end of your polling method, you would dispose the old Timer, and set a new one (you would also need to do this within a finally block to guarantee that the Timer gets reset by the end of the method). This approach would be useful if you want to poll the database at a certain interval since the end of the previous polling. It's a subtle distinction, but it also solves the problem of concurrent polling attempts.
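A minimal sketch of that approach, using System.Threading.Timer with no repeat period (the field and method names are mine):

private System.Threading.Timer _pollTimer;

public void Start() {
    // First run in 30 seconds; Timeout.Infinite means the timer never repeats on its own.
    _pollTimer = new System.Threading.Timer(PollingTimerCallback, null, 30000, Timeout.Infinite);
}

private void PollingTimerCallback(object state) {
    try {
        //Useful stuff here
    } finally {
        // Re-arm: next poll is 30 seconds after this one *ends*, so runs never overlap.
        _pollTimer.Change(30000, Timeout.Infinite);
    }
}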
Note that this is a really simple thread concurrency example. As long as all of your locking is happening on threads separate from your UI thread (the message pump itself can become a point of contention), and you're only ever locking a single object, you shouldn't have to worry too much about deadlocks. Those can be really unpleasant to debug; the symptom is usually "application stops responding, and now you get to guess which threads are waiting on what".

Related

Is ReaderWriterLockSlim resistant to ThreadAbortException?

I would like to check whether the following code is resistant to ThreadAbortException and will not lead to an orphaned lock. If it is not, what is the best pattern to avoid orphaned locks here?
ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

public void DoStuff()
{
    _lock.EnterWriteLock();
    // Is this the place where a ThreadAbortException can corrupt my code,
    // or is there a JIT optimization which prevents this from happening?
    try
    {
        ...
    }
    finally
    {
        _lock.ExitWriteLock();
    }
}
According to the following link, http://chabster.blogspot.cz/2013/07/a-story-of-orphaned-readerwriterlockslim.html, there is (or at least was) a possible way to create orphaned locks, but I ran the sample code for a while without any luck.
I am using .NET 4.0
Is there any difference between behavior in Debug and Release?
Yes, ThreadAbortException could occur there, in which case the try wouldn't be entered and therefore you would never exit the write lock.
There's no good general solution to the problem. Which is why Eric Lippert (among others) says that Locks and exceptions do not mix.
You're asking specifically about ThreadAbortException, which leads me to believe that you're contemplating using Thread.Abort for some kind of threading control in your application. I urge you to reconsider. If you want the ability to cancel your threads, you should use Cancellation or something similar. Using Thread.Abort in any other than the most dire circumstances is a horrifically bad idea. It certainly should not be part of your program's overall design.
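A minimal sketch of cooperative cancellation, the alternative to Thread.Abort (the worker body is illustrative):

var cts = new CancellationTokenSource();
var worker = new Thread(() => {
    while (!cts.Token.IsCancellationRequested) {
        DoOneUnitOfWork(); // hypothetical; keep units short and re-check the token between them
    }
});
worker.Start();

// Later, instead of worker.Abort():
cts.Cancel();  // asks the thread to finish its current unit and exit
worker.Join(); // waits for it to do so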
In order for code which uses a locking primitive to be robust in the face of thread aborts, it is necessary that every lock-acquisition and lock-release request pass, or be performed through, an unshared token which can be given "ownership" of the lock. Depending upon the design of the locking API, the token may be an object of some specific type, an arbitrary Object, or a variable passed as a ref parameter. It's imperative, however, that the token be created and stored by some means before the lock is acquired, so that if the token gets created but the store fails, the token may be abandoned without difficulty. Unfortunately, although monitor locks have added (in .NET 4.0) overloads of Monitor.Enter and Monitor.TryEnter which use a ref bool as a token, I know of no equivalent for reader-writer locks.
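For monitor locks, that .NET 4.0 pattern looks roughly like this (_sync is an assumed field); lockTaken is set inside Monitor.Enter itself, so even an abort cannot leave the lock owned without the finally block knowing about it:

bool lockTaken = false;
try
{
    Monitor.Enter(_sync, ref lockTaken);
    // ... critical section ...
}
finally
{
    if (lockTaken) Monitor.Exit(_sync);
}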
If one wants abort-safe reader-writer lock functionality, I would suggest one would need a class designed around that; it should keep track of which threads hold reader or writer access and, rather than relying upon threads to release locks, it should, when waiting for a lock to be released, make sure the thread holding it is still alive. If a thread dies while holding read access, the lock should be released. If a thread dies while holding write access, any pending or future attempts to acquire the lock should throw an immediate exception.
Otherwise, there are some tricks via which a block of code can be protected against Thread.Abort(). Unfortunately, I don't know any clean way to bracket the code around a lock-acquisition request in such a way that Abort will work when the request itself can be cleanly aborted without having succeeded, but will be deferred if the request succeeds.
There are ways via which a framework could safely allow a thread which is in an endless loop to be killed by another thread, but designing mechanisms which could be used safely would require more effort than was put into Thread.Abort().

Is it OK to use Control.Invoke instead of using a lock?

I'll have a database object that can be accessed from multiple threads as well as from the main thread. I don't want them to access the underlying database object concurrently, so I'll write a set of thread-safe public methods that can be accessed from multiple threads.
My first idea was to use a lock around my connection, such as lock(oleDbConnection), but the problem is that I would have to take that lock on the main thread as well, since it is one more thread that can access the connection, which would mean rewriting lots of code.
But since these threads and the main thread won't access the database very often, how about just using some control's (maybe the main form's) Invoke method every time I call any of the database methods from another thread? This way, as far as I understand, these methods would never be called concurrently, and I wouldn't need to worry about the main thread. I guess the only problem would be degrading performance a little bit, but as I said, the database is not accessed that often; the reason I use threads is not so that they can access the database concurrently but so that they can perform other operations concurrently.
So does this sound like a good idea? Am I missing something? Sounds a bit too easy so I'm suspicious.
It sounds like it would work AFAIK, but it also sounds like a really bad idea.
The problem is that when writing lock you are saying "I want this code to be a critical section", whereas when writing Invoke you are saying "I want this to be executed on the UI thread". These two things are certainly not equivalent, which can lead to lots of problems. For example:
Invoke is normally used to access UI controls. What if a developer sees Invoke and nothing UI-related, and goes "gee, that's an unneeded Invoke; let's get rid of it"?
What if more than one UI thread ends up existing?
What if the database operation takes a long time (or times out)? Your UI would stop responding.
I would definitely go for the lock. You typically want the UI thread to stay responsive when performing operations that may take time, which includes any sort of DB access; you don't know whether the connection is alive, for instance.
Also, the typical way to handle connections is to create, use and dispose the connection for each request, rather than reusing the same connection. This might perhaps solve some of your concurrency problems.
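A sketch of that per-request pattern (the connection string is assumed); each caller gets its own connection, so threads no longer contend for one shared object, and pooling keeps the repeated Open() cheap:

public DataTable RunQuery(string sql)
{
    using (var conn = new OleDbConnection(connectionString))
    using (var cmd = new OleDbCommand(sql, conn))
    using (var adapter = new OleDbDataAdapter(cmd))
    {
        var table = new DataTable();
        conn.Open();
        adapter.Fill(table);
        return table; // the connection is closed and returned to the pool here
    }
}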
Why don't you try using connection pooling? Every thread can do its work on a different DB connection and send the result to the main thread with Invoke. Connection pooling is a very common approach in servers.
See Using Connection Pooling with SQL Server

Using Rx to queue operations I don't want executed until particular time?

Summary: I have a web app that executes workflows on business objects and sometimes needs to deliberately wait several seconds or minutes between steps. I'm looking (perhaps via Rx.NET) to improve the execution of these workflows so I do not exhaust the ThreadPool and make the website unresponsive when the system is under heavy load.
A very simplified version of the workflow is:
Create an object
Load data into it from System A
POST this data to System B
If System A is down, my app waits and retries later. The wait time is modeled after GMail's escalating delays in retry: Wait 1 second, double on each subsequent retry (maxing out at 1 hour). The app saves state to the database obsessively so if the whole app blows up, when it restarts it will resume all workflows where it left off.
Currently (please be gentle) each step in the workflow is executed by calling ThreadPool.QueueUserWorkItem to queue up a method that calls Thread.Sleep if necessary for the retry delay described above, then actually executes the step.
If the system is performing well (no errors), it can easily handle all the traffic we throw at it, and the ThreadPool nicely manages parallel execution of all these workflow instances. But if System B is down for a while, retry count and thus delay grows, and pretty soon the ThreadPool is filled with all the sleeping threads, causing the website to become unresponsive to new requests.
Essentially I want to throw all these pending workflows into a queue ordered by (last execution time + desired retry delay). Despite reading a lot about Rx and being excited by it, I've never had an opportunity to use it, but it seems like it might be a helpful way to handle this. If Rx can magically manage spitting out these objects when they're ready to fire, it seems like it would:
Greatly simplify and clarify this logic, and
Prevent the wasteful use of lots of threads that are just sleeping 99% of the time
Any guidance to an Rx newbie would be greatly appreciated, even if it's just to explain why this is in fact not a good use case for Rx.
In this case, I might stick with your current solution, because of this bit:
The app saves state to the database obsessively so if the whole app blows up, when it restarts it will resume all workflows where it left off.
"Resuming" a pipeline (i.e. x.Where().Select().Timeout().Bla()) via deserialization on startup is...tricky.
It's hard to give you a more detailed solution without more info. It might actually work pretty well with Rx if you don't try to model the entire flow, just the transaction bit (i.e. load from A, send to B).
Anyway, the way to solve your thread pool exhaustion is via the System.Threading.Timer class, which tells the thread pool to simply wait until the timeout before queueing a new item.
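For instance, a sketch of Timer-based rescheduling with escalating delay (all names here are mine, not from the question); note the timers must stay referenced so they aren't garbage collected before firing:

private readonly List<Timer> _pendingRetries = new List<Timer>();

private void ScheduleRetry(Action step, TimeSpan delay)
{
    Timer timer = null;
    timer = new Timer(_ =>
    {
        lock (_pendingRetries) _pendingRetries.Remove(timer);
        timer.Dispose();
        try
        {
            step();
        }
        catch (Exception)
        {
            // Double the delay, capping at one hour, and reschedule;
            // no thread sleeps while we wait.
            ScheduleRetry(step, TimeSpan.FromSeconds(Math.Min(delay.TotalSeconds * 2, 3600)));
        }
    }, null, (int)delay.TotalMilliseconds, Timeout.Infinite);
    lock (_pendingRetries) _pendingRetries.Add(timer);
}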
You will definitely have to adapt:
public IDisposable StartProcess<T>(Action<T> load, Action<T> post) where T : new()
{
    return StartProcess(TimeSpan.FromSeconds(1), new T())
        .Do(load)
        .Subscribe(post);
}

private IObservable<T> StartProcess<T>(TimeSpan span, T obj) where T : new()
{
    return Observable
        .Interval(span)
        .Take(1)          // fire once after the delay, then complete
        .Select(_ => obj) // hand the state object downstream
        .OnErrorResumeNext(Observable.Defer(() => StartProcess(IncreaseSpan(span), obj)))
        .Concat(Observable.Defer(() => StartProcess(TimeSpan.FromSeconds(1), new T())));
}

private TimeSpan IncreaseSpan(TimeSpan span)
{
    return TimeSpan.FromSeconds(span.TotalSeconds < 1800 ? span.TotalSeconds * 2 : 3600);
}
Now I'd much rather have load instantiate and fill the object than do it explicitly here, since functional programming discourages mutability, and you may wish load to actually go to a database and restore the state like you mentioned.
I wasn't sure whether you wanted to preserve the state object when the call to load or post crashes, so you will need to adapt this: currently it preserves the state whichever of the two crashes, and will call load again without a fresh state if post crashes, which may well not be what you want.
I didn't test the code, but Rx is suitable for what you want to do.
Check out this post on the Rx forums; it describes a pretty handy operator for the kind of problem you want to solve: http://social.msdn.microsoft.com/Forums/en-US/rx/thread/af43b14e-fb00-42d4-8fb1-5c45862f7796/
Rx is a great way to deal with problems like this (and in particular), because you can have your async functions/observables and apply generic operators like the described Retry operator to them.

Looking at what happens when a c#/ASP.NET thread is terminated and how to get around problems

I'm working on a ASP.NET website that on some requests will run a very lengthy caching process. I'm wondering what happens exactly if the execution timeout is reached while it is still running in terms of how the code handles it.
In particular I am wondering: if the code is in the try of a try/finally block, will the finally still be run?
Also, given that I am not sure I want the caching to terminate even if it goes on that long, is there a way, by spawning new threads or similar, to circumvent this execution timeout? It would be much nicer to return to the user immediately and say "a cache build is happening" rather than just letting them time out. I have recently started playing with some locking code to make sure only one cache build happens at a time, and am thinking about extending this so the build runs asynchronously.
I've not really played with creating threads and such myself, so I am not sure exactly how they work, particularly in terms of interacting with ASP.NET. E.g. if the parent thread that launched it is terminated, will that have any effect on the spawned thread?
I know there are quite a few different questions in here, and I can split them if that is deemed best, but they all seem to go together. I'll try to summarise them:
Will a finally block still be executed if a thread is terminated by ASP.NET while in the try block?
Would newly created threads be subject to the same timeouts as the original thread?
Would newly created threads die at the same time as the parent thread that created them?
And the general one of what is the best way to do long running background processes on an ASP.NET site?
Sorry for some noobish questions; I've never really played with threads and they still intimidate me a bit (my brain says they are hard). I could probably test the answer to a lot of these questions but I wouldn't be confident enough of my tests. :)
Edit to add:
In response to Capital G:
The problem I have is that the ASP.NET execution timeout is currently set to one hour, which is not always long enough for some of these processes, I reckon. I've put some stuff in with locks to prevent more than one person setting off these long processes, and I was worried the locks might not be released (which, if finally blocks aren't always run, might happen I guess).
Your comments on not running long processes in ASP.NET are why I was thinking of moving them to other threads rather than blocking the request thread, but I don't know if that still counts as running within the ASP.NET architecture that you said was bad.
The code is not actually mine, so I'm not allowed (and don't 100% understand it enough) to rework it into a service, though that is certainly where it would best live.
Would using a BackgroundWorker for something that could take an hour be feasible in this situation (with respect to your comments on long-running processes in ASP.NET)? I would then make the request return a "Cache is building" page until it's finished and then go back to serving normally. It's all a bit of a nightmare, but it's my job, so I've got to find a way to improve it. :)
Interesting question. I just tested, and no, it's not guaranteed to execute the code in the finally block; if a thread is aborted, it could stop at any point in the processing. You can design some sanity checking and other mechanisms to handle special cleanup routines and such, but it has a lot to do with your thread handling as well.
Not necessarily; it depends on how you're implementing your threads. If you are working with threads yourself, then you can easily get into situations where the parent thread is killed while its child threads are still out there processing; you generally want to do some cleanup in the parent thread that ends the child threads as well. Some objects might do a lot of this for you, so it's a tough call to say one way or the other. Never assume this at the very least.
No, not necessarily; don't assume this at least. Again, it has to do with your design and whether you're doing the threading yourself or using some higher-level threading object/pattern. I would never assume this regardless.
I don't recommend long-running processes within the ASP.NET architecture unless they're within the typical timeout. If it's 10-20s, okay, but if it's minutes, no; the reason is resource usage within ASP.NET, and it's awfully hard on the user. That being said, you could perform asynchronous operations where you hand off the work to the server and return to the user when the processing is finished (this is great for those 10-20s+ processes); the user can be given a little animation or otherwise not have their browser stuck waiting for whatever is happening on the server.
If it is a long-running process, taking longer than 30-60s, then unless it absolutely has to be done in ASP.NET due to the nature of the process, I suggest moving it to a Windows service and scheduling it in some way to occur when required.
Note: threading CAN be complicated. It's not that it's hard so much as that you have to be very aware of what you're doing, which requires a firm understanding of what threads are and how they work. I'm no expert, but I'm also not completely new, and I'll tell you that in most situations you don't need to get into the realm of threading, even when it seems like you do. If you must, however, I would suggest looking into the BackgroundWorker object, as it is simplified for the purposes of doing batched processing etc. (Honestly, for many situations that DO need threads, this is a very simple solution.)
http://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker.aspx
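A minimal BackgroundWorker sketch (the method names are illustrative):

var worker = new BackgroundWorker();
worker.DoWork += (sender, e) =>
{
    e.Result = BuildCache(); // the long-running work, off the request/UI thread
};
worker.RunWorkerCompleted += (sender, e) =>
{
    if (e.Error != null)
        LogError(e.Error);      // hypothetical error handling
    else
        OnCacheReady(e.Result); // hypothetical completion callback
};
worker.RunWorkerAsync(); // returns immediately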
Long or time-consuming processes started behind the web page; not hitting the ASP.NET execution timeout; freeing the user's page; running the requests under a lock; etc.: all these requirements point towards using async services. In one of the products I architected, we used services for such scenarios. The service exposes an async method to initiate the work, and the status of the progress can be queried using another method. Every request is given an id, and no duplicate requests are ever fired. The progress proceeds even if the user logs out, and the user can see the results at a later time.
If you have looked at such options already, let me know if there is any issue. Or if you are yet to look in this direction, please do. For any help, just send in your comments.

Why is the explicit management of threads a bad thing?

In a previous question, I made a bit of a faux pas. You see, I'd been reading about threads and had got the impression that they were the tastiest things since kiwi jello.
Imagine my confusion then, when I read stuff like this:
[T]hreads are A Very Bad Thing. Or, at least, explicit management of threads is a bad thing
and
Updating the UI across threads is usually a sign that you are abusing threads.
Since I kill a puppy every time something confuses me, consider this your chance to get your karma back in the black...
How should I be using threads?
Enthusiasm for learning about threading is great; don't get me wrong. Enthusiasm for using lots of threads, by contrast, is symptomatic of what I call Thread Happiness Disease.
Developers who have just learned about the power of threads start asking questions like "how many threads can I possibly create in one program?" This is rather like an English major asking "how many words can I use in a sentence?" Typical advice for writers is to keep your sentences short and to the point, rather than trying to cram as many words and ideas into one sentence as possible. Threads are the same way; the right question is not "how many can I get away with creating?" but rather "how can I write this program so that the number of threads is the minimum necessary to get the job done?"
Threads solve a lot of problems, it's true, but they also introduce huge problems:
Performance analysis of multi-threaded programs is often extremely difficult and deeply counterintuitive. I've seen real-world examples in heavily multi-threaded programs in which making a function faster without slowing down any other function or using more memory makes the total throughput of the system smaller. Why? Because threads are often like streets downtown. Imagine taking every street and magically making it shorter without re-timing the traffic lights. Would traffic jams get better, or worse? Writing faster functions in multi-threaded programs drives the processors towards congestion faster.
What you want is for threads to be like interstate highways: no traffic lights, highly parallel, intersecting at a small number of very well-defined, carefully engineered points. That is very hard to do. Most heavily multi-threaded programs are more like dense urban cores with stoplights everywhere.
Writing your own custom management of threads is insanely difficult to get right. The reason is because when you are writing a regular single-threaded program in a well-designed program, the amount of "global state" you have to reason about is typically small. Ideally you write objects that have well-defined boundaries, and that do not care about the control flow that invokes their members. You want to invoke an object in a loop, or a switch, or whatever, you go right ahead.
Multi-threaded programs with custom thread management require global understanding of everything that a thread is going to do that could possibly affect data that is visible from another thread. You pretty much have to have the entire program in your head, and understand all the possible ways that two threads could be interacting in order to get it right and prevent deadlocks or data corruption. That is a large cost to pay, and highly prone to bugs.
Essentially, threads make your methods lie. Let me give you an example. Suppose you have:
if (!queue.IsEmpty) queue.RemoveWorkItem().Execute();
Is that code correct? If it is single threaded, probably. If it is multi-threaded, what is stopping another thread from removing the last remaining item after the call to IsEmpty is executed? Nothing, that's what. This code, which locally looks just fine, is a bomb waiting to go off in a multi-threaded program. Basically that code is actually:
if (queue.WasNotEmptyAtSomePointInThePast) ...
which obviously is pretty useless.
So suppose you decide to fix the problem by locking the queue. Is this right?
lock(queue) {if (!queue.IsEmpty) queue.RemoveWorkItem().Execute(); }
That's not right either, necessarily. Suppose the execution causes code to run which waits on a resource currently locked by another thread, but that thread is waiting on the lock for queue - what happens? Both threads wait forever. Putting a lock around a hunk of code requires you to know everything that code could possibly do with any shared resource, so that you can work out whether there will be any deadlocks. Again, that is an extremely heavy burden to put on someone writing what ought to be very simple code. (The right thing to do here is probably to extract the work item in the lock and then execute it outside the lock. But... what if the items are in a queue because they have to be executed in a particular order? Now that code is wrong too because other threads can then execute later jobs first.)
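The pattern that last parenthetical describes looks roughly like this (using the answer's hypothetical queue API):

WorkItem item = null;
lock (queue)
{
    if (!queue.IsEmpty)
        item = queue.RemoveWorkItem(); // take it under the lock...
}
if (item != null)
    item.Execute(); // ...but run it outside, so it cannot deadlock on the queue's lock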
It gets worse. The C# language spec guarantees that a single-threaded program will have observable behaviour that is exactly as the program is specified. That is, if you have something like "if (M(ref x)) b = 10;" then you know that the code generated will behave as though x is accessed by M before b is written. Now, the compiler, jitter and CPU are all free to optimize that. If one of them can determine that M is going to be true and if we know that on this thread, the value of b is not read after the call to M, then b can be assigned before x is accessed. All that is guaranteed is that the single-threaded program seems to work like it was written.
Multi-threaded programs do not make that guarantee. If you are examining b and x on a different thread while this one is running then you can see b change before x is accessed, if that optimization is performed. Reads and writes can logically be moved forwards and backwards in time with respect to each other in single threaded programs, and those moves can be observed in multi-threaded programs.
This means that in order to write multi-threaded programs where there is a dependency in the logic on things being observed to happen in the same order as the code is actually written, you have to have a detailed understanding of the "memory model" of the language and the runtime. You have to know precisely what guarantees are made about how accesses can move around in time. And you cannot simply test on your x86 box and hope for the best; the x86 chips have pretty conservative optimizations compared to some other chips out there.
That's just a brief overview of just a few of the problems you run into when writing your own multithreaded logic. There are plenty more. So, some advice:
Do learn about threading.
Do not attempt to write your own thread management in production code.
Use higher-level libraries written by experts to solve problems with threads. If you have a bunch of work that needs to be done in the background and want to farm it out to worker threads, use a thread pool rather than writing your own thread creation logic. If you have a problem that is amenable to solution by multiple processors at once, use the Task Parallel Library. If you want to lazily initialize a resource, use the lazy initialization class rather than trying to write lock-free code yourself. (A sketch of these options follows this list.)
Avoid shared state.
If you can't avoid shared state, share immutable state.
If you have to share mutable state, prefer using locks to lock-free techniques.
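A minimal sketch of those higher-level options, all standard .NET 4 APIs (the work being done is just placeholder output):

using System;
using System.Threading;
using System.Threading.Tasks;

class HigherLevelExamples
{
    // Lazy<T> gives thread-safe one-time initialization without hand-rolled locking.
    static readonly Lazy<string> Resource =
        new Lazy<string>(() => "expensive resource", true);

    static void Main()
    {
        // Thread pool instead of new Thread(...):
        ThreadPool.QueueUserWorkItem(_ => Console.WriteLine("background work"));

        // Task Parallel Library for work that can use multiple processors at once:
        Parallel.ForEach(new[] { 1, 2, 3, 4 }, n => Console.WriteLine(n * n));

        Console.WriteLine(Resource.Value); // first access runs the factory exactly once

        Thread.Sleep(100); // crude: let the queued work item run before exiting
    }
}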
Explicit management of threads is not intrinsically a bad thing, but it's fraught with dangers and shouldn't be done unless absolutely necessary.
Saying threads are absolutely a good thing would be like saying a propeller is absolutely a good thing: propellers work great on airplanes (when jet engines aren't a better alternative), but wouldn't be a good idea on a car.
You cannot appreciate what kind of problems threading can cause unless you've debugged a three-way deadlock. Or spent a month chasing a race condition that happens only once a day. So, go ahead and jump in with both feet and make all the kind of mistakes you need to make to learn to fear the Beast and what to do to stay out of trouble.
There's no way I could offer a better answer than what's already here. But I can offer a concrete example of some multithreaded code that we actually had at my work that was disastrous.
One of my coworkers, like you, was very enthusiastic about threads when he first learned about them. So there started to be code like this throughout the program:
Thread t = new Thread(LongRunningMethod);
t.Start(GetThreadParameters());
Basically, he was creating threads all over the place.
So eventually another coworker discovered this and told the developer responsible: don't do that! Creating threads is expensive, you should use the thread pool, etc. etc. So a lot of places in the code that originally looked like the above snippet started getting rewritten as:
ThreadPool.QueueUserWorkItem(LongRunningMethod, GetThreadParameters());
Big improvement, right? Everything's sane again?
Well, except that there was a particular call in that LongRunningMethod that could potentially block -- for a long time. Suddenly every now and then we started seeing it happen that something our software should have reacted to right away... it just didn't. In fact, it might not have reacted for several seconds (clarification: I work for a trading firm, so this was a complete catastrophe).
What had ended up happening was that the thread pool was actually filling up with long-blocking calls, leading to other code that was supposed to happen very quickly getting queued up and not running until significantly later than it should have.
The moral of this story is not, of course, that the first approach of creating your own threads is the right thing to do (it isn't). It's really just that using threads is tough, and error-prone, and that, as others have already said, you should be very careful when you use them.
In our particular situation, many mistakes were made:
Creating new threads in the first place was wrong because it was far more costly than the developer realized.
Queuing all background work on the thread pool was wrong because it treated all background tasks indiscriminately and did not account for the possibility of asynchronous calls actually being blocked.
Having a long-blocking method by itself was the result of some careless and very lazy use of the lock keyword.
Insufficient attention was given to ensuring that the code that was being run on background threads was thread-safe (it wasn't).
Insufficient thought was given to the question of whether making a lot of the affected code multithreaded was even worth doing to begin with. In plenty of cases, the answer was no: multithreading just introduced complexity and bugs, made the code less comprehensible, and (here's the kicker): hurt performance.
I'm happy to say that today, we're still alive and our code is in a much healthier state than it once was. And we do use multithreading in plenty of places where we've decided it's appropriate and have measured performance gains (such as reduced latency between receiving a market data tick and having an outgoing quote confirmed by the exchange). But we learned some pretty important lessons the hard way. Chances are, if you ever work on a large, highly multithreaded system, you will too.
Unless you are at the level of being able to write a fully-fledged kernel scheduler, you will always get explicit thread management wrong.
Threads can be the most awesome thing since hot chocolate, but parallel programming is incredibly complex. However, if you design your threads to be independent then you can't shoot yourself in the foot.
As a rule of thumb, if a problem is decomposed into threads, they should be as independent as possible, with few but well-defined shared resources and the most minimal management scheme achievable.
I think the first statement is best explained as such: with the many advanced APIs now available, manually writing your own thread code is almost never necessary. The new APIs are a lot easier to use, and a lot harder to mess up, whereas with old-style threading you have to be quite good to not mess up. The old-style APIs (Thread et al.) are still available, but the new APIs (Task Parallel Library, Parallel LINQ, and Reactive Extensions) are the way of the future.
The second statement is from more of a design perspective, IMO. In a design that has a clean separation of concerns, a background task should not really be reaching directly into the UI to report updates. There should be some separation there, using a pattern like MVVM or MVC.
I would start by questioning this perception:
I'd been reading about threads and had got the impression that they were the tastiest things since kiwi jello.
Don’t get me wrong – threads are a very versatile tool – but this degree of enthusiasm seems weird. In particular, it indicates that you might be using threads in a lot of situations where they simply don’t make sense (but then again, I might just mistake your enthusiasm).
As others have indicated, thread handling is also quite complex and complicated. Wrappers for threads exist, and only on rare occasions do they have to be handled explicitly; for most applications threads can remain implicit.
For example, if you just want to push a computation to the background while leaving the GUI responsive, a better solution is often to either use callbacks (which make it seem as though the computation is done in the background while really being executed on the same thread), or to use a convenience wrapper such as the BackgroundWorker that takes over and hides all the explicit thread handling.
One last thing: creating a thread is actually very expensive. Using a thread pool mitigates this cost because the runtime creates a number of threads that are subsequently reused. When people say that explicit management of threads is bad, this may be all they are referring to.
Many advanced GUI applications consist of two threads: one for the UI, and one (or sometimes more) for processing data (copying files, making heavy calculations, loading data from a database, etc.).
The processing threads shouldn't update the UI directly; the UI should be a black box to them (check Wikipedia for Encapsulation).
They just say "I'm done processing" or "I completed task 7 of 9" and raise an event or call some other callback method. The UI subscribes to the event, checks what has changed and updates itself accordingly.
If you update the UI from the processing thread, you won't be able to reuse your code, and you will have bigger problems if you want to change parts of it.
I think you should experiment as much as possible with threads and get to know the benefits and pitfalls of using them. Only by experimentation and usage will your understanding of them grow. Read as much as you can on the subject.
When it comes to C# and the user interface (which is single-threaded: you may only modify user-interface elements from code executing on the UI thread), I use the following utility to keep myself sane and sleep soundly at night.
public static class UIThreadSafe {

    public static void Perform(Control c, MethodInvoker inv) {
        if (c == null)
            return;

        if (c.InvokeRequired) {
            c.Invoke(inv, null);
        }
        else {
            inv();
        }
    }
}
You can use this in any thread that needs to change a UI element, like thus:
UIThreadSafe.Perform(myForm, delegate() {
    myForm.Text = "I Love Threads!"; // Form has no Title property; Text sets the caption
});
A huge reason to try to keep the UI thread and the processing thread as independent as possible is that if the UI thread freezes, the user will notice and be unhappy. Having the UI thread be blazing fast is important. If you start moving UI stuff out of the UI thread or moving processing stuff into the UI thread, you run a higher risk of having your application become unresponsive.
Also, a lot of the framework code is deliberately written with the expectation that you will separate the UI and processing; programs will just work better when you separate the two, and will hit errors and problems when you don't. I don't recall any specific issues that I encountered as a result of this, though I have vague recollections of trying, in the past, to set certain properties of stuff the UI was responsible for from outside the UI and having the code refuse to work; I don't recall whether it didn't compile or threw an exception.
Threads are a very good thing, I think. But working with them is very hard and needs a lot of knowledge and training. The main problem arises when we access shared resources from two threads at once, which can cause undesirable effects.
Consider a classic example: you have two threads which get items from a shared list and, after doing something, remove them from the list.
The thread method that is called periodically could look like this:
void Thread()
{
    if (list.Count > 0)
    {
        // Do stuff
        list.RemoveAt(0);
    }
}
Remember that threads can, in theory, switch at any line of your code that is not synchronized. So if the list contains only one item, one thread could pass the list.Count condition; then, just before its list.RemoveAt, the threads switch, and the other thread also passes the list.Count check (the list still contains one item). Now the first thread continues to its list.RemoveAt, and after that the second thread continues to its list.RemoveAt, but the last item has already been removed by the first thread, so the second one crashes. That's why the method has to be synchronized using a lock statement, as shown below, so that two threads can never be inside the if statement at the same time.
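The same method made safe with a lock (listLock is an assumed field, e.g. a plain object used only for locking):

void Thread()
{
    lock (listLock)
    {
        if (list.Count > 0)
        {
            // Do stuff
            list.RemoveAt(0); // the Count check and the removal now happen atomically
        }
    }
}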
That is also the reason why the UI, which is not synchronized, must always run on a single thread, and why no other thread should interfere with it.
In previous versions of .NET, if you wanted to update the UI from another thread, you had to synchronize using Invoke methods, but as that was hard enough to implement, newer versions of .NET come with the BackgroundWorker class, which simplifies things by wrapping it all up and letting you do the asynchronous work in the DoWork event and update the UI in the ProgressChanged event.
A couple of things are important to note when updating the UI from a non-UI thread:
If you use "Invoke" frequently, the performance of your non-UI thread may be severely adversely affected if other stuff makes the UI thread run sluggishly. I prefer to avoid using "Invoke" unless the non-UI thread needs to wait for the UI-thread action to be performed before it continues.
If you use "BeginInvoke" recklessly for things like control updates, an excessive number of invocation delegates may get queued, some of which may well be pretty useless by the time they actually occur.
My preferred style in many cases is to have each control's state encapsulated in an immutable class, and then have a flag which indicates whether an update is not needed, pending, or needed but not pending (the latter situation may occur if a request is made to update a control before it is fully created). The control's update routine should, if an update is needed, start by clearing the update flag, grabbing the state, and drawing the control. If the update flag is set when it finishes, it should re-loop. To request an update from another thread, a routine should use Interlocked.Exchange to set the update flag to "pending" and, if it wasn't already pending, try to BeginInvoke the update routine; if the BeginInvoke fails, set the update flag to "needed but not pending".
If an update request occurs just after the control's update routine checks and clears its update flag, it may well happen that the first update already reflects the new value but the flag gets set anyway, forcing an extra screen redraw. On the occasions when this happens, it is relatively harmless. The important thing is that the control ends up drawn in the correct state, without there ever having been more than one BeginInvoke pending.
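A rough sketch of that scheme as I read it (all names here are mine; assumes System.Threading and System.Windows.Forms; the three flag states are encoded as ints):

public class StatusControl : Control
{
    // Immutable snapshot of everything the control needs in order to draw itself.
    private sealed class ControlState
    {
        public readonly string Caption;
        public ControlState(string caption) { Caption = caption; }
    }

    private const int NotNeeded = 0, Pending = 1, NeededNotPending = 2;
    private int _updateFlag;
    private volatile ControlState _state = new ControlState("");

    // May be called from any worker thread.
    public void RequestUpdate(string caption)
    {
        _state = new ControlState(caption);
        if (Interlocked.Exchange(ref _updateFlag, Pending) != Pending)
        {
            try { BeginInvoke(new Action(RunUpdate)); }
            catch (InvalidOperationException) { _updateFlag = NeededNotPending; } // handle not created yet
        }
    }

    // Always runs on the UI thread.
    private void RunUpdate()
    {
        do
        {
            _updateFlag = NotNeeded; // clear first: a racing request will set it back to Pending
            Text = _state.Caption;   // draw from the latest snapshot
        } while (Thread.VolatileRead(ref _updateFlag) != NotNeeded);
    }
}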
