I have a .NET 4 C# service that uses the TPL libraries for threading. We recently switched it to also use connection pooling, since one connection was becoming a bottleneck for processing.
Previously, we were using a lock statement to control thread safety on the connection object. As work backed up, the queue would exist as tasks, and many threads (tasks) would be waiting on the lock. Now, in most scenarios, threads wait on database IO and work processes MUCH faster.
However, now that I'm using connection pooling, we have a new issue. Once the max number of connections is reached (100 by default), further connection requests time out (see the pooling documentation). When this happens, an exception is thrown saying "Connection request timed out".
All of my IDisposables are within using statements, and I am properly managing my connections. This scenario happens due to more work being requested than the pool can process (which is expected). I understand why this exception is thrown, and am aware of ways of handling it. A simple retry feels like a hack. I also realize that I can increase the timeout period via the connection string, however that doesn't feel like a solid solution. In the previous design (without pooling), work items would process because of the lock within the application.
What is a good way of handling this scenario to ensure that all work gets processed?
Another approach is to use a semaphore around the code that retrieves connections from the pool (and, hopefully, returns them). A semaphore is like a lock statement, except that it allows a configurable number of requestors at a time, not just one.
Something like this should do:
// Assuming mySemaphore is a shared semaphore instance sized to the pool, e.g.
// public static Semaphore mySemaphore = new Semaphore(100, 100);

// Acquire outside the try, so Release only runs if the wait succeeded.
mySemaphore.WaitOne(); // Blocks until a slot is available.
try {
    DoSomeDatabaseLogic();
} finally {
    mySemaphore.Release(); // Always free the slot, even if the work throws.
}
You could look to control the degree of parallelism by using the Parallel.ForEach() method as follows:
var items = GetWorkItems(); // your collection of work items
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 100 };
Parallel.ForEach(items, parallelOptions, ProcessItem);
In this case I chose to set the degree to 100, but you can choose a value that makes sense for your current connection pool implementation.
This solution of course assumes that you have a collection of work items up front. If, however, you're creating new Tasks through some external mechanism such as incoming web requests, the exception is actually a good thing. At that point I would suggest that you make use of a concurrent queue data structure where you can place the work items and pop them off as worker threads become available; a sketch follows.
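A minimal sketch of that queue-based approach, assuming a hypothetical WorkItem type and ProcessItem method (names are illustrative, not from the original post):

using System.Collections.Concurrent;
using System.Threading.Tasks;

class WorkItem { /* your data here */ }

class WorkQueue
{
    // Bounded, so producers block (instead of the pool timing out) when workers fall behind.
    private readonly BlockingCollection<WorkItem> workItems =
        new BlockingCollection<WorkItem>(boundedCapacity: 1000);

    public void StartWorkers(int workerCount) // size this to your connection pool
    {
        for (int i = 0; i < workerCount; i++)
        {
            Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks until an item is available
                // and completes when CompleteAdding() is called.
                foreach (var item in workItems.GetConsumingEnumerable())
                    ProcessItem(item); // at most one pooled connection per worker
            }, TaskCreationOptions.LongRunning);
        }
    }

    public void Enqueue(WorkItem item) { workItems.Add(item); } // called by producers
    public void Shutdown() { workItems.CompleteAdding(); }

    private void ProcessItem(WorkItem item) { /* your database work here */ }
}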
The simplest solution is to increase the connection timeout to the length of time you are willing to block a request before returning failure. There must be some length of time that is "too long".
This effectively uses the connection pool as a work queue with a timeout. It's a lot easier than trying to implement one yourself. You would have to check that the connection pool is fair (FIFO).
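For example, a sketch of the connection string tweak (the server and database names are placeholders, and the values are yours to tune):

// Connect Timeout (in seconds) controls how long a request waits for a
// pooled connection before throwing; Max Pool Size defaults to 100.
var connectionString =
    "Data Source=myServer;Initial Catalog=myDb;Integrated Security=True;" +
    "Max Pool Size=100;Connect Timeout=120";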
I have an application that handles TCP connections, and when a connection is made, BeginRead is called on the stream to wait for data, and the main thread resumes waiting for new connections.
Under normal circumstances, there will be only a few connections at a time, and so generally the number of worker threads created by BeginRead is not an issue. However, in the theoretical situation that many, many connections exist at the same time, eventually when a new connection is made, the call to BeginRead causes an OutOfMemoryException. I would like to prevent the thread from being created in this situation (or be informed of a better way to wait for data from multiple streams).
What are some decent ways of accomplishing this? All I can think to do is to either
a) only allow a certain number of active connections at a time, or
b) attempt to use something called a MemoryFailPoint. After reading, I think this might be the better option, but how do I know how much memory a call to BeginRead will need to do its thing safely?
Look at this thread; it can give you many answers for that.
But you can read the current memory usage of your process like this:
using System.Diagnostics;

Process currentProcess = Process.GetCurrentProcess();
long memorySize = currentProcess.PrivateMemorySize64; // private memory, in bytes
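Since the question mentions MemoryFailPoint, here is a minimal sketch of how it is typically used. The 16 MB figure is an assumption you would need to tune; there is no documented figure for what BeginRead needs. stream, buffer, OnRead and state stand in for your own code:

using System;
using System.Runtime;

try
{
    // Checks that roughly this much memory could be allocated without an
    // OutOfMemoryException; throws InsufficientMemoryException if it can't.
    using (new MemoryFailPoint(16)) // size in megabytes (assumed estimate)
    {
        stream.BeginRead(buffer, 0, buffer.Length, OnRead, state);
    }
}
catch (InsufficientMemoryException)
{
    // Reject or queue the new connection instead of crashing the process.
}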
You should be using the thread pool to handle these operations, rather than creating whole new threads for every operation you need to perform. Not only will using a thread pool that can re-use threads greatly reduce the effort spent creating and tearing down threads (which isn't cheap), but the thread pool will work to ensure that threads are only created when it will be beneficial to do so, and will simply let requests queue up when adding more threads wouldn't be beneficial.
Inspired by my current problem, which is much the same as this one: Analogue of Queue.Peek() for BlockingCollection when listening to consuming IEnumerable<T> - with the difference that I am currently using ConcurrentQueue<T> instead of BlockingCollection<T> - I wonder what use case there may be for ConcurrentQueue<T>.TryPeek().
Of course I mean a use case without manual lock(myQueue) stuff to serialize queue accesses, as the TPL is meant to improve on/substitute for such locking.
I had an application that used ConcurrentQueue<T>.TryPeek to good effect. One thread was set up to monitor the queue. Mostly it was looking at queue size, but we also wanted to get an idea of latency. Because the items in the queue had a time stamp field that said what time they were put into the queue, my monitoring thread could call TryPeek to get the item at the head of the queue, subtract the insertion time from the current time, and tell me how long the item had been in the queue. Over time and many samples, that gave me a very clear picture of how long it was taking for a received item to be processed.
It didn't matter that some other thread might dequeue the item while my monitoring code was still examining it.
I can think of a few other scenarios in which it would be useful to see what's at the head of the queue, even though it might be pulled off immediately.
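A minimal sketch of that monitoring idea, assuming a hypothetical TimestampedItem type with an EnqueuedAt field set by the producer:

using System;
using System.Collections.Concurrent;

class TimestampedItem
{
    public DateTime EnqueuedAt; // set when the item is enqueued
    // ... payload fields ...
}

static void ReportQueueStats(ConcurrentQueue<TimestampedItem> queue)
{
    TimestampedItem head;
    if (queue.TryPeek(out head))
    {
        // Even if another thread dequeues 'head' right after this call,
        // the values we read are still a valid latency sample.
        TimeSpan latency = DateTime.UtcNow - head.EnqueuedAt;
        Console.WriteLine("Depth: {0}, head latency: {1}", queue.Count, latency);
    }
}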
I have a ConcurrentQueue where many threads may Enqueue, but I limit TryPeek and TryDequeue to a single thread with a lock:
lock (dequeueLock)
{
    if (queue.TryPeek(out item) && item.Value <= threshold)
    {
        queue.TryDequeue(out item);
    }
}
Note: other threads may continue to Enqueue while this code runs.
It would be nicer to have some atomic peek - check - dequeue operation, but the lock is fine for my scenario.
TryPeek can be used to wait for an object to reach the front of the queue; TryDequeue will dequeue whatever object is there. So, for instance, I wrote a multithreaded webserver, and during authorization, when authorization is enabled for certain requests, they need at one point to be processed in the order they were received. I don't want to lock up the whole thread function, or half of it, only so that for some clients I can process their requests in order.
So, I created a Dictionary<string, ConcurrentQueue<HttpListenerContext>>. At the very beginning of the server thread, I lock temporarily and check whether authorization will be required; if so, I store the HttpListenerContext in a queue keyed by the client IP, so that different clients don't block each other's threads unnecessarily. Then I process the headers and compute the hashes as normal; since a page may make two or three requests using ajax and websockets connections after the initial one, it is better to multithread the hashing of the authorization information (digest authorization I implemented for HttpListener myself, so that I am not restricted to using Active Directory). Then, when the authorization check requires that the client nonce count be exactly one greater than that of the last request for that client's session (a security feature), I use the queue I created and TryPeek with Thread.Yield() to wait until that thread's HttpListenerContext is first in the queue, finish authorization, and then dequeue it.
In short, it can be used when, for most of each thread's work, you want things to run in parallel to take advantage of different cores, but at some point within those threads you need everything to get back in order.
My understanding is that you use this method when you want to peek but are not sure there is an item in the queue. Normally, Peek on an empty queue will throw an exception; TryPeek instead returns false if the item is not there. This can be extremely useful in multithreaded scenarios where another thread may dequeue the item in between your check for an empty queue and your actual peek at the value.
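A tiny illustration of the race TryPeek avoids (a sketch):

using System.Collections.Concurrent;

var queue = new ConcurrentQueue<int>();

// With a plain Queue<T>, checking Count and then calling Peek is racy:
// another thread could dequeue in between, and Peek would throw.
// TryPeek folds the check and the read into one thread-safe call:
int head;
if (queue.TryPeek(out head))
{
    // use head
}
else
{
    // the queue was empty at the moment of the call
}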
Try it:
// bc is a BlockingCollection<T>. Note: this blocks until an item is
// available (or until CompleteAdding() is called), and it consumes the
// item - it removes it from the collection rather than just peeking.
T item = bc.GetConsumingEnumerable().FirstOrDefault();
if (item != null)
{
    //...
}
Having a look at an object to see if it is valid before taking it out is an option; just remember that when you do this, the ConcurrentQueue will keep a reference and not release the object from memory when you dequeue it. If you memory-profile while doing this, as I did with my ConcurrentQueue, you will see something like the following (from a memory profiler screenshot): a ConcurrentQueueSegment with 11,060 instances while the queue only holds 8 items.
I have an automatic betting BOT.
I use a Windows Service and timers to set off a job every 30 seconds in its own thread that takes bets from the DB, loops through and places them.
However, on occasions when the job runs long (over 30 seconds), the same bet can be placed twice using the same BetPK (unique ID), because the job placing it runs at the same time as a previously started thread.
I am using C#, NET 4, VS 2012.
At the moment I set a "locked" flag in a table when the job to place bets runs and then unset it on finishing. So if another job runs and the job is locked it will return ASAP. However this is relying on the DB and network traffic.
What would be the best way in C# to prevent a job started by a timer thread from clashing with a previously started thread. I am thinking I could set a flag IN the service controller that spawns the threads so if a job is running another one won't spawn.
However, I would like to learn the correct way to handle multithreaded clashes like this. I just lost a couple of hundred pounds today due to two LAY bets being placed at exactly the same time. As only one record existed for the bet, the last bet placed had its Betfair ID updated, so I had no clue about the duplicate until I checked Betfair's own page.
I do already check whether the bet has been placed before trying to place it, but in cases where the "placebet" method is running on the same bet record at exactly the same time, that check is no good.
Any help much appreciated.
Thanks
No, the best solution is to keep the locks in the database. The app should be as stateless as possible. You already have a great solution.
Locking inside of your app is error prone and the errors are catastrophic (deadlock, the app stops to work until manually restarted). Locking using the database is much easier, and errors are recoverable.
Just get the locking with the database right. Ask a new question where you post details on what you're doing. I recommend that you XLOCK any betting jobs that you're working on. That way they can only be executed once. Use the power of database locks and transactions to make this work. This is by far easier than app-level threading.
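A minimal sketch of the XLOCK idea, assuming SQL Server, a Bets table, a BetPK column and a PlacedFlag column (all names are illustrative; connectionString and betPk come from your own code):

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        // Exclusively lock the bet row for the duration of the transaction;
        // a second job hitting the same BetPK blocks here until we commit.
        using (var cmd = new SqlCommand(
            "SELECT PlacedFlag FROM Bets WITH (XLOCK, ROWLOCK) WHERE BetPK = @betPk",
            conn, tx))
        {
            cmd.Parameters.AddWithValue("@betPk", betPk);
            object placed = cmd.ExecuteScalar();
            // If the bet is already placed, roll back and return here.
        }

        // Place the bet and record its Betfair ID, then:
        tx.Commit();
    }
}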
You could always try implementing a db like Redis (redis.io) that offers built in POP functions (http://redis.io/commands/lpop). Redis has a C# client and is super useful for any kind of app where speed is crucial as it keeps the entire db in memory. It's also single threaded which makes it easy to implement distributors for multi-consumer type applications.
I'd also recommend checking out http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis as it lays out the pros and cons for Redis and other dbs. Might help you make future db decisions.
Old question, I know, but I wanted to throw this out there for anybody that stumbles across it.
C# (and presumably VB.NET) offers a couple of nice options for handling thread synchronization. You can use the lock keyword to block execution until a given lock is available, or Monitor.TryEnter() if you want to specify a timeout (possibly immediately) for taking the lock.
For either of these approaches, you need an object to use for locking. Pretty much any object will do; if you aren't synchronizing access to some object itself (collection, database connection, whatever), you can even just instantiate a throwaway object. For a polling timer, the latter is typical.
First, make sure you have an object to use for synchronization:
public class DatabasePollingClass {
object PollingTimerLock = new object();
...
Now, if you want the polling threads to block indefinitely waiting for their turn, use the lock keyword:
public class DatabasePollingClass {
object PollingTimerLock = new object();
...
protected void PollingTimerCallback() {
lock (PollingTimerLock) {
//Useful stuff here
}
}
}
Only a single thread will be allowed within the lock (PollingTimerLock) block of code at a time. All other threads will wait indefinitely, then resume executing as soon as they can acquire the lock for themselves.
However, you probably don't want that behavior. If you'd rather have the subsequent threads abort immediately (or after a short wait) if another polling thread is still running, you can use Monitor.TryEnter() when taking the lock. This does require slightly more caution, however:
public class DatabasePollingClass {
object PollingTimerLock = new object();
...
protected void PollingTimerCallback() {
if (Monitor.TryEnter(PollingTimerLock)) { //Acquires lock on PollingTimerLock object
try {
//Useful stuff here
} finally {
//Releases lock.
//You MUST do this in a finally block! (See below.)
Monitor.Exit(PollingTimerLock);
}
} else {
Console.WriteLine("Warning: Polling timer overlap. Skipping.");
}
}
}
The additional caution stems from the fact that, unlike the lock keyword, Monitor.TryEnter() requires you to manually release the lock when you're finished with it. In order to guarantee that this happens, you need to wrap your whole critical section in a try block, and release the lock in the finally block. This is to ensure that the lock will be released, even if the polling method fails or returns early. If the method returned without releasing the lock, your program would effectively be hung, as no further threads would be able to acquire the lock.
Another option, which doesn't use locking mechanisms, would be to configure your Timer without a repeat period, i.e. a one-shot Timer. At the end of your polling method, you would dispose the old Timer, and set a new one (you would also need to do this within a finally block to guarantee that the Timer gets reset by the end of the method). This approach would be useful if you want to poll the database at a certain interval since the end of the previous polling. It's a subtle distinction, but it also solves the problem of concurrent polling attempts.
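A sketch of that one-shot timer approach, assuming System.Threading.Timer and a hypothetical PollDatabase method. (This variant re-arms the same timer with Change rather than disposing and creating a new one, which achieves the same effect with less churn):

using System.Threading;

public class DatabasePollingClass
{
    private Timer pollingTimer;
    private const int IntervalMs = 30000; // 30 seconds between polls

    public void Start()
    {
        // period = Timeout.Infinite: the timer fires once, then stays quiet.
        pollingTimer = new Timer(PollingTimerCallback, null, IntervalMs, Timeout.Infinite);
    }

    private void PollingTimerCallback(object state)
    {
        try
        {
            PollDatabase(); // your actual polling work (hypothetical)
        }
        finally
        {
            // Re-arm for one more shot, measured from the END of this run,
            // so overlapping callbacks are impossible by construction.
            pollingTimer.Change(IntervalMs, Timeout.Infinite);
        }
    }

    private void PollDatabase() { /* ... */ }
}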
Note that this is a really simple thread concurrency example. As long as all of your locking is happening on threads separate from your UI thread (the message pump itself can become a point of contention), and you're only ever locking a single object, you shouldn't have to worry too much about deadlocks. Those can be really unpleasant to debug; the symptom is usually "application stops responding, and now you get to guess which threads are waiting on what".
My problem is that I'm apparently using too many tasks (threads?) that call a method that queries a SQL Server 2008 database. Here is the code:
for (int i = 0; i < 100000; i++)
{
    Task.Factory.StartNew(() => MethodThatQueriesDataBase())
        .ContinueWith(t => OtherMethod(t));
}
After a while I get a SQL timeout exception. I want to keep the actual number of threads lower than 100000, to a buffer of, say, "no more than 10 at a time". I know I can manage my own threads using the ThreadPool, but I want to be able to use the beauty of TPL with the ContinueWith.
I looked at the Task.Factory.Scheduler.MaximumConcurrencyLevel but it has no setter.
How do I do that?
Thanks in advance!
UPDATE 1
I just tested the LimitedConcurrencyLevelTaskScheduler class (pointed out by Skeet) and still doing the same thing (SQL Timeout).
BTW, this database receives more than 800000 events per day and has never had crashes or timeouts from those. It seems kinda weird that this would cause them.
You could create a TaskScheduler with a limited degree of concurrency, as explained here, then create a TaskFactory from that, and use that factory to start the tasks instead of Task.Factory.
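A sketch, assuming the LimitedConcurrencyLevelTaskScheduler class from the MSDN "Samples for Parallel Programming" is available in your project (the limit of 10 is illustrative):

using System.Threading.Tasks;

// Allows at most 10 queued tasks to execute at any one time.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(10);
var factory = new TaskFactory(scheduler);

for (int i = 0; i < 100000; i++)
{
    factory.StartNew(() => MethodThatQueriesDataBase())
           .ContinueWith(t => OtherMethod(t)); // note: the continuation runs
                                               // on the default scheduler
}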
Tasks are not 1:1 with threads - tasks are assigned threads for execution out of a pool of threads, and the pool of threads is normally kept fairly small (number of threads == number of CPU cores) unless a task/thread is blocked waiting for a long-running synchronous result - such as perhaps a synchronous network call or file I/O.
So spinning up 10,000 tasks should not result in the production of 10,000 actual threads. However, if every one of those tasks immediately dives into a blocking call, then you may wind up with more threads, but it still shouldn't be 10,000.
What may be happening here is you are overwhelming the SQL db with too many requests all at once. Even if the system only sets up a handful of threads for your thousands of tasks, a handful of threads can still cause a pileup if the destination of the call is single-threaded. If every task makes a call into the SQL db, and the SQL db interface or the db itself coordinates multithreaded requests through a single thread lock, then all the concurrent calls will pile up waiting for the thread lock to get into the SQL db for execution. There is no guarantee of which threads will be released to call into the SQL db next, so you could easily end up with one "unlucky" thread that starts waiting for access to the SQL db early but doesn't get into the SQL db call before the blocking wait times out.
It's also possible that the SQL back-end is multithreaded, but limits the number of concurrent operations due to licensing level. That is, a SQL demo engine only allows 2 concurrent requests but the fully licensed engine supports dozens of concurrent requests.
Either way, you need to do something to reduce your concurrency to more reasonable levels. Jon Skeet's suggestion of using a TaskScheduler to limit the concurrency sounds like a good place to start.
I suspect there is something wrong with the way you're handling DB connections. Web servers could have thousands of concurrent page requests running all in various stages of SQL activity. I'm betting that attempts to reduce the concurrent task count is really masking a different problem.
Can you profile the SQL connections? Check out perfmon to see how many active connections there are. See if you can grab-use-release connections as quickly as possible.
I'm creating a server-type application at the moment which will do the usual listening for connections from external clients and, when they connect, handle requests, etc.
At the moment, my implementation creates a pair of threads every time a client connects. One thread simply reads requests from the socket and adds them to a queue, and the second reads the requests from the queue and processes them.
I'm basically looking for opinions on whether or not you think having all of these threads is overkill, and importantly whether this approach is going to cause me problems.
It is important to note that most of the time these threads will be idle - I use wait handles (ManualResetEvent) in both threads. The Reader thread waits until a message is available and if so, reads it and dumps it in a queue for the Process thread. The Process thread waits until the reader signals that a message is in the queue (again, using a wait handle). Unless a particular client is really hammering the server, these threads will be sat waiting. Is this costly?
I've done a bit of testing - had 1,000 clients connected, continually nagging the server (so, 2,000+ threads) - and it seemed to cope quite well.
I think your implementation is flawed. This kind of design doesn't scale because creating threads is expensive and there is a limit on how many threads can be created.
That is the reason that most implementations of this type use a thread pool. That makes it easy to put a cap on the maximum amount of threads while easily managing new connections and reusing the threads when the work is finished.
If all you are doing with your thread is putting items in a queue, then use the ThreadPool.QueueUserWorkItem method to use the default .NET thread pool.
You haven't given enough information in your question to say for definite, but perhaps you now only need one other thread, constantly running, to clear down the queue; you can use a wait handle to signal when something has been added.
Just make sure to synchronise access to your queue or things will go horribly wrong.
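A minimal sketch of that single-consumer arrangement (Message and Handle are hypothetical names):

using System.Collections.Generic;
using System.Threading;

class RequestProcessor
{
    private readonly Queue<Message> queue = new Queue<Message>();
    private readonly AutoResetEvent signal = new AutoResetEvent(false);

    // Called from reader threads / thread-pool work items.
    public void Enqueue(Message msg)
    {
        lock (queue) { queue.Enqueue(msg); } // synchronised access
        signal.Set();                        // wake the consumer
    }

    // The single consumer thread runs this loop.
    public void ProcessLoop()
    {
        while (true)
        {
            signal.WaitOne(); // sleep until something is added
            while (true)
            {
                Message msg;
                lock (queue)
                {
                    if (queue.Count == 0) break;
                    msg = queue.Dequeue();
                }
                Handle(msg); // process outside the lock
            }
        }
    }

    private void Handle(Message msg) { /* your processing here */ }
}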
I advise using the following pattern. First, you need a thread pool - built in or custom. Have a thread that checks whether there is something available to read; if there is, it hands off to a reader thread. The reader thread then puts the item into a queue, and a thread from the pool of processing threads picks it up. This minimises the number of threads and the time spent in a waiting state.