Threading an unkown amount of threads in C# - c#

I'm currently writing a sitemap generator that scrapes a site for urls and builds an xml sitemap. As most of the waiting is spent on requests to uri's I'm using threading, specifically the build in ThreadPool object.
In order to let the main thread wait for the unknown amount of threads to complete I have implemented the following setup. I don't feel this is a good solution though, can any threading gurus advise me of any problems this solution has, or suggest a better way to implement it?
The EventWaitHandle is set to EventResetMode.ManualReset
Here is the thread method
protected void CrawlUri(object o)
{
try
{
Interlocked.Increment(ref _threadCount);
Uri uri = (Uri)o;
foreach (Match match in _regex.Matches(GetWebResponse(uri)))
{
Uri newUri = new Uri(uri, match.Value);
if (!_uriCollection.Contains(newUri))
{
_uriCollection.Add(newUri);
ThreadPool.QueueUserWorkItem(_waitCallback, newUri);
}
}
}
catch
{
// Handle exceptions
}
finally
{
Interlocked.Decrement(ref _threadCount);
}
// If there are no more threads running then signal the waithandle
if (_threadCount == 0)
_eventWaitHandle.Set();
}
Here is the main thread method
// Request first page (based on host)
Uri root = new Uri(context.Request.Url.GetLeftPart(UriPartial.Authority));
// Begin threaded crawling of the Uri
ThreadPool.QueueUserWorkItem(_waitCallback, root);
Thread.Sleep(5000); // TEMP SOLUTION: Sleep for 5 seconds
_eventWaitHandle.WaitOne();
// Server the Xml Sitemap
context.Response.ContentType = "text/xml";
context.Response.Write(GetXml().OuterXml);
Any ideas are much appreciated :)

Well, first off you can create a ManualResetEvent that starts unset, so you don't have to sleep before waiting on it. Secondly you're going to need to put thread synchronization around your Uri collection. You could get a race condition where one two threads pass the "this Uri does not exist yet" check and they add duplicates. Another race condition is that two threads could pass the if (_threadCount == 0) check and they could both set the event.
Last, you can make the whole thing much more efficient by using the asynchronous BeginGetRequest. Your solution right now keeps a thread around to wait for every request. If you use async methods and callbacks, your program will use less memory (1MB per thread) and won't need to do context switches of threads nearly as much.
Here's an example that should illustrate what I'm talking about. Out of curiosity, I did test it out (with a depth limit) and it does work.
public class CrawlUriTool
{
private Regex regex;
private int pendingRequests;
private List<Uri> uriCollection;
private object uriCollectionSync = new object();
private ManualResetEvent crawlCompletedEvent;
public List<Uri> CrawlUri(Uri uri)
{
this.pendingRequests = 0;
this.uriCollection = new List<Uri>();
this.crawlCompletedEvent = new ManualResetEvent(false);
this.StartUriCrawl(uri);
this.crawlCompletedEvent.WaitOne();
return this.uriCollection;
}
private void StartUriCrawl(Uri uri)
{
Interlocked.Increment(ref this.pendingRequests);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.BeginGetResponse(this.UriCrawlCallback, request);
}
private void UriCrawlCallback(IAsyncResult asyncResult)
{
HttpWebRequest request = asyncResult.AsyncState as HttpWebRequest;
try
{
HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(asyncResult);
string responseText = this.GetTextFromResponse(response); // not included
foreach (Match match in this.regex.Matches(responseText))
{
Uri newUri = new Uri(response.ResponseUri, match.Value);
lock (this.uriCollectionSync)
{
if (!this.uriCollection.Contains(newUri))
{
this.uriCollection.Add(newUri);
this.StartUriCrawl(newUri);
}
}
}
}
catch (WebException exception)
{
// handle exception
}
finally
{
if (Interlocked.Decrement(ref this.pendingRequests) == 0)
{
this.crawlCompletedEvent.Set();
}
}
}
}

When doing this kind of logic I generally try to make an object representing each asynchronous task and the data it needs to run. I would typically add this object to the collection of tasks to be done. The thread pool gets these tasks secheduled, and I would let the object itself remove itself from the "to be done" collection when the task finishes, possibly signalling on the collection itself.
So you're finished when the "to be done" collection is empty; the main thread is probably awoken once by each task that finishes.

You could look into the CTP of the Task Parallel Library which should make this simpler for you. What you're doing can be divided into "tasks", chunks or units of work, and the TPL can parallelize this for you if you supply the tasks. It uses a thread pool internally as well, but it's easier to use and comes with a lot of options like waiting for all tasks to finish. Check out this Channel9 video where the possibilities are explained and where a demo is shown of traversing a tree recursively in parallel, which seems very applicable to your problem.
However, it's still a preview and won't be released until .NET 4.0, so it comes with no warranties and you'll have to manually include the supplied System.Threading.dll (found in the install folder) into your project and I don't know if that's an option to you.

Related

Network Command Processing with TPL Dataflow

I'm working on a system that involves accepting commands over a TCP network connection, then sending responses upon execution of those commands. Fairly basic stuff, but I'm looking to support a few requirements:
Multiple clients can connect at the same time and establish separate sessions. Sessions can last as long or as short as desired, with the same client IP able to establish multiple parallel sessions, if desired.
Each session can process multiple commands at the same time, as some of the requested operations can be performed in parallel.
I'd like to implement this cleanly using async/await and, based on what I've read, TPL Dataflow sounds like a good way to cleanly break up the processing into nice chunks that can run on the thread pool instead of tying up threads for different sessions/commands, blocking on wait handles.
This is what I'm starting with (some parts stripped out to simplify, such as details of exception handling; I've also omitted a wrapper that provides an efficient awaitable for the network I/O):
private readonly Task _serviceTask;
private readonly Task _commandsTask;
private readonly CancellationTokenSource _cancellation;
private readonly BufferBlock<Command> _pendingCommands;
public NetworkService(ICommandProcessor commandProcessor)
{
_commandProcessor = commandProcessor;
IsRunning = true;
_cancellation = new CancellationTokenSource();
_pendingCommands = new BufferBlock<Command>();
_serviceTask = Task.Run((Func<Task>)RunService);
_commandsTask = Task.Run((Func<Task>)RunCommands);
}
public bool IsRunning { get; private set; }
private async Task RunService()
{
_listener = new TcpListener(IPAddress.Any, ServicePort);
_listener.Start();
while (IsRunning)
{
Socket client = null;
try
{
client = await _listener.AcceptSocketAsync();
client.Blocking = false;
var session = RunSession(client);
lock (_sessions)
{
_sessions.Add(session);
}
}
catch (Exception ex)
{ //Handling here...
}
}
}
private async Task RunCommands()
{
while (IsRunning)
{
var command = await _pendingCommands.ReceiveAsync(_cancellation.Token);
var task = Task.Run(() => RunCommand(command));
}
}
private async Task RunCommand(Command command)
{
try
{
var response = await _commandProcessor.RunCommand(command.Content);
Send(command.Client, response);
}
catch (Exception ex)
{
//Deal with general command exceptions here...
}
}
private async Task RunSession(Socket client)
{
while (client.Connected)
{
var reader = new DelimitedCommandReader(client);
try
{
var content = await reader.ReceiveCommand();
_pendingCommands.Post(new Command(client, content));
}
catch (Exception ex)
{
//Exception handling here...
}
}
}
The basics seem straightforward, but one part is tripping me up: how do I make sure that when I'm shutting down the application, I wait for all pending command tasks to complete? I get the Task object when I use Task.Run to execute the command, but how do I keep track of pending commands so that I can make sure that all of them are complete before allowing the service to shut down?
I've considered using a simple List, with removal of commands from the List as they finish, but I'm wondering if I'm missing some basic tools in TPL Dataflow that would allow me to accomplish this more cleanly.
EDIT:
Reading more about TPL Dataflow, I'm wondering if what I should be using is a TransformBlock with an increased MaxDegreeOfParallelism to allow processing parallel commands? This sets an upper limit on the number of commands that can run in parallel, but that's a sensible limitation for my system, I think. I'm curious to hear from those who have experience with TPL Dataflow to know if I'm on the right track.
Yeah, so... you're kinda half using the power of TPL here. The fact that you're still manually receiving items from the BufferBlock in your own while loop in a background Task is not the "way" you want to do it if you're subscribing to the TPL DataFlow style.
What you would do is link an ActionBlock to the BufferBlock and do your command processing/sending from within that. This is also the block where you would set the MaxDegreeOfParallelism to control just how many concurrent commands you want to process. So that setup might look something like this:
// Initialization logic to build up the TPL flow
_pendingCommands = new BufferBlock<Command>();
_commandProcessor = new ActionBlock<Command>(this.ProcessCommand);
_pendingCommands.LinkTo(_commandProcessor);
private Task ProcessCommand(Command command)
{
var response = await _commandProcessor.RunCommand(command.Content);
this.Send(command.Client, response);
}
Then, in your shutdown code, you would need to signal that you're done adding items into the pipeline by calling Complete on the _pipelineCommands BufferBlock and then wait on the _commandProcessor ActionBlock to complete to ensure that all items have made their way through the pipeline. You do this by grabbing the Task returned by the block's Completion property and calling Wait on it:
_pendingCommands.Complete();
_commandProcessor.Completion.Wait();
If you want to go for bonus points, you can even separate the command processing from the command sending. This would allow you to configure those steps separately from one another. For example, maybe you need to limit the number of threads processing commands, but want to have more sending out the responses. You would do this by simply introducing a TransformBlock into the middle of the flow:
_pendingCommands = new BufferBlock<Command>();
_commandProcessor = new TransformBlock<Command, Tuple<Client, Response>>(this.ProcessCommand);
_commandSender = new ActionBlock<Tuple<Client, Response>(this.SendResponseToClient));
_pendingCommands.LinkTo(_commandProcessor);
_commandProcessor.LinkTo(_commandSender);
private Task ProcessCommand(Command command)
{
var response = await _commandProcessor.RunCommand(command.Content);
return Tuple.Create(command, response);
}
private Task SendResponseToClient(Tuple<Client, Response> clientAndResponse)
{
this.Send(clientAndResponse.Item1, clientAndResponse.Item2);
}
You probably want to use your own data structure instead of Tuple, it was just for illustrative purposes, but the point is this is exactly the kind of structure you want to use to break up the pipeline so that you can control the various aspects of it exactly how you might need to.
Tasks are by default background, which means that when application terminates they are also immediately terminated. You should use a Thread not a Task. Then you can set:
Thread.IsBackground = false;
This will prevent your application from terminating while the worker thread is running.
Although of course this will require some changes in your above code.
What's more you, when executing the shutdown method, you could also just wait for any outstanding tasks from the main thread.
I do not see a better solution to this.

Out of Memory Threading - Perf Test Tool

I'm creating a tool to load test (sends http: GETs) and it runs fine but eventually dies because of an out of memory error.
ASK: How can I reset the threads so this loop can continually run and not err?
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
while (true)
{
for (int i = 0; i < 1000; i++)
{
new Thread(LoadTest).Start(); //<-- EXCEPTION!.eventually errs out of memory
}
Thread.Sleep(2);
}
}
static void LoadTest()
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends http get from above url ... and displays the repose in the console....
}
You are instantiating Threads left right and centre. This is likely you problem. You want to replace the
new Thread(LoadTest).Start();
with
Task.Run(LoadTest);
This will run your LoadTest on a Thread in the ThreadPool, instead of using resources to create a new Thread each time. HOWEVER. This will then expose a different issue.
Threads on the ThreadPool are a limited resource and you want to return Threads to the ThreadPool as soon as possible. I assume you are using the synchronous download methods as opposed to the APM methods. This means that whilst the request is being sent out to the server, the thread spawning the request is sleeping as opposed to going off to do some other work.
Either use (assuming .net 4.5)
var client = new WebClient();
var response = await client.DownloadStringTaskAsync(url);
Console.WriteLine(response);
Or use a callback (if not .net 4.5)
var client = new WebClient();
client.OnDownloadStringCompleted(x => Console.WriteLine(x));
client.BeginDownloadString(url);
Use a ThreadPool and use QueueUserWorkItem instead of creating thousands of threads. Threads are expensive objects and it is no surprise you are running out of memory and besides you won't be able to have any performance (in your test tool) with so many threads.
You code snippet creates lots of threads and no wonder it eventually runs out of memory. It would be better to use a Thread Pool here.
You code would look like this:
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
ThreadPool.SetMaxThreads(500, 300);
while (true)
{
ThreadPool.QueueUserWorkItem(LoadTest);
}
}
static void LoadTest(object state)
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends http get from above url ... and displays the repose in the console....
}

How do I know when it's safe to call Dispose?

I have a search application that takes some time (10 to 15 seconds) to return results for some requests. It's not uncommon to have multiple concurrent requests for the same information. As it stands, I have to process those independently, which makes for quite a bit of unnecessary processing.
I've come up with a design that should allow me to avoid the unnecessary processing, but there's one lingering problem.
Each request has a key that identifies the data being requested. I maintain a dictionary of requests, keyed by the request key. The request object has some state information and a WaitHandle that is used to wait on the results.
When a client calls my Search method, the code checks the dictionary to see if a request already exists for that key. If so, the client just waits on the WaitHandle. If no request exists, I create one, add it to the dictionary, and issue an asynchronous call to get the information. Again, the code waits on the event.
When the asynchronous process has obtained the results, it updates the request object, removes the request from the dictionary, and then signals the event.
This all works great. Except I don't know when to dispose of the request object. That is, since I don't know when the last client is using it, I can't call Dispose on it. I have to wait for the garbage collector to come along and clean up.
Here's the code:
class SearchRequest: IDisposable
{
public readonly string RequestKey;
public string Results { get; set; }
public ManualResetEvent WaitEvent { get; private set; }
public SearchRequest(string key)
{
RequestKey = key;
WaitEvent = new ManualResetEvent(false);
}
public void Dispose()
{
WaitEvent.Dispose();
GC.SuppressFinalize(this);
}
}
ConcurrentDictionary<string, SearchRequest> Requests = new ConcurrentDictionary<string, SearchRequest>();
string Search(string key)
{
SearchRequest req;
bool addedNew = false;
req = Requests.GetOrAdd(key, (s) =>
{
// Create a new request.
var r = new SearchRequest(s);
Console.WriteLine("Added new request with key {0}", key);
addedNew = true;
return r;
});
if (addedNew)
{
// A new request was created.
// Start a search.
ThreadPool.QueueUserWorkItem((obj) =>
{
// Get the results
req.Results = DoSearch(req.RequestKey); // DoSearch takes several seconds
// Remove the request from the pending list
SearchRequest trash;
Requests.TryRemove(req.RequestKey, out trash);
// And signal that the request is finished
req.WaitEvent.Set();
});
}
Console.WriteLine("Waiting for results from request with key {0}", key);
req.WaitEvent.WaitOne();
return req.Results;
}
Basically, I don't know when the last client will be released. No matter how I slice it here, I have a race condition. Consider:
Thread A Creates a new request, starts Thread 2, and waits on the wait handle.
Thread B Begins processing the request.
Thread C detects that there's a pending request, and then gets swapped out.
Thread B Completes the request, removes the item from the dictionary, and sets the event.
Thread A's wait is satisfied, and it returns the result.
Thread C wakes up, calls WaitOne, is released, and returns the result.
If I use some kind of reference counting so that the "last" client calls Dispose, then the object would be disposed by Thread A in the above scenario. Thread C would then die when it tried to wait on the disposed WaitHandle.
The only way I can see to fix this is to use a reference counting scheme and protect access to the dictionary with a lock (in which case using ConcurrentDictionary is pointless) so that a lookup is always accompanied by an increment of the reference count. Whereas that would work, it seems like an ugly hack.
Another solution would be to ditch the WaitHandle and use an event-like mechanism with callbacks. But that, too, would require me to protect the lookups with a lock, and I have the added complication of dealing with an event or a naked multicast delegate. That seems like a hack, too.
This probably isn't a problem currently, because this application doesn't yet get enough traffic for those abandoned handles to add up before the next GC pass comes and cleans them up. And maybe it won't ever be a problem? It worries me, though, that I'm leaving them to be cleaned up by the GC when I should be calling Dispose to get rid of them.
Ideas? Is this a potential problem? If so, do you have a clean solution?
Consider using Lazy<T> for SearchRequest.Results maybe? But that would probably entail a bit of redesign. Haven't thought this out completely.
But what would probably be almost a drop-in replacement for your use case is to implement your own Wait() and Set() methods in SearchRequest. Something like:
object _resultLock;
void Wait()
{
lock(_resultLock)
{
while (!_hasResult)
Monitor.Wait(_resultLock);
}
}
void Set(string results)
{
lock(_resultLock)
{
Results = results;
_hasResult = true;
Monitor.PulseAll(_resultLock);
}
}
No need to dispose. :)
I think that your best bet to make this work is to use the TPL for all of you multi-threading needs. That's what it is good at.
As per my comment on your question, you need to keep in mind that ConcurrentDictionary does have side-effects. If multiple threads try to call GetOrAdd at the same time then the factory can be invoked for all of them, but only one will win. The values produced for the other threads will just be discarded, however by then the compute has been done.
Since you also said that doing searches is expensive then the cost of taking a lock ad then using a standard dictionary would be minimal.
So this is what I suggest:
private Dictionary<string, Task<string>> _requests
= new Dictionary<string, Task<string>>();
public string Search(string key)
{
Task<string> task;
lock (_requests)
{
if (_requests.ContainsKey(key))
{
task = _requests[key];
}
else
{
task = Task<string>
.Factory
.StartNew(() => DoSearch(key));
_requests[key] = task;
task.ContinueWith(t =>
{
lock(_requests)
{
_requests.Remove(key);
}
});
}
}
return task.Result;
}
This option nicely runs the search, remembers the task throughout the duration of the search and then removes it from the dictionary when it completes. All requests for the same key while a search is executing get the same task and so will get the same result once the task is complete.
I've test the code and it works.

Threadpool/WaitHandle resource leak/crash

I think I may need to re-think my design. I'm having a hard time narrowing down a bug that is causing my computer to completely hang, sometimes throwing an HRESULT 0x8007000E from VS 2010.
I have a console application (that I will later convert to a service) that handles transferring files based on a database queue.
I am throttling the threads allowed to transfer. This is because some systems we are connecting to can only contain a certain number of connections from certain accounts.
For example, System A can only accept 3 simultaneous connections (which means 3 separate threads). Each one of these threads has their own unique connection object, so we shouldn't run in to any synchronization problems since they aren't sharing a connection.
We want to process the files from those systems in cycles. So, for example, we will allow 3 connections that can transfer up to 100 files per connection. This means, to move 1000 files from System A, we can only process 300 files per cycle, since 3 threads are allowed with 100 files each. Therefore, over the lifetime of this transfer, we will have 10 threads. We can only run 3 at a time. So, there will be 3 cycles, and the last cycle will only use 1 thread to transfer the last 100 files. (3 threads x 100 files = 300 files per cycle)
The current architecture by example is:
A System.Threading.Timer checks the queue every 5 seconds for something to do by calling GetScheduledTask()
If there's nothing to, GetScheduledTask() simply does nothing
If there is work, create a ThreadPool thread to process the work [Work Thread A]
Work Thread A sees that there are 1000 files to transfer
Work Thread A sees that it can only have 3 threads running to the system it is getting files from
Work Thread A starts three new work threads [B,C,D] and transfers
Work Thread A waits for B,C,D [WaitHandle.WaitAll(transfersArray)]
Work Thread A sees that there are still more files in the queue (should be 700 now)
Work Thread A creates a new array to wait on [transfersArray = new TransferArray[3] which is the max for System A, but could vary on system
Work Thread A starts three new work threads [B,C,D] and waits for them [WaitHandle.WaitAll(transfersArray)]
The process repeats until there are no more files to move.
Work Thread A signals that it is done
I am using ManualResetEvent to handle the signaling.
My questions are:
Is there any glaring circumstance which would cause a resource leak or problem that I am experiencing?
Should I loop thru the array after every WaitHandle.WaitAll(array) and call array[index].Dispose()?
The Handle count under the Task Manager for this process slowly creeps up
I am calling the initial creation of Worker Thread A from a System.Threading.Timer. Is there going to be any problems with this? The code for that timer is:
(Some class code for scheduling)
private ManualResetEvent _ResetEvent;
private void Start()
{
_IsAlive = true;
ManualResetEvent transferResetEvent = new ManualResetEvent(false);
//Set the scheduler timer to 5 second intervals
_ScheduledTasks = new Timer(new TimerCallback(ScheduledTasks_Tick), transferResetEvent, 200, 5000);
}
private void ScheduledTasks_Tick(object state)
{
ManualResetEvent resetEvent = null;
try
{
resetEvent = (ManualResetEvent)state;
//Block timer until GetScheduledTasks() finishes
_ScheduledTasks.Change(Timeout.Infinite, Timeout.Infinite);
GetScheduledTasks();
}
finally
{
_ScheduledTasks.Change(5000, 5000);
Console.WriteLine("{0} [Main] GetScheduledTasks() finished", DateTime.Now.ToString("MMddyy HH:mm:ss:fff"));
resetEvent.Set();
}
}
private void GetScheduledTask()
{
try
{
//Check to see if the database connection is still up
if (!_IsAlive)
{
//Handle
_ConnectionLostNotification = true;
return;
}
//Get scheduled records from the database
ISchedulerTask task = null;
using (DataTable dt = FastSql.ExecuteDataTable(
_ConnectionString, "hidden for security", System.Data.CommandType.StoredProcedure,
new List<FastSqlParam>() { new FastSqlParam(ParameterDirection.Input, SqlDbType.VarChar, "#ProcessMachineName", Environment.MachineName) })) //call to static class
{
if (dt != null)
{
if (dt.Rows.Count == 1)
{ //Only 1 row is allowed
DataRow dr = dt.Rows[0];
//Get task information
TransferParam.TaskType taskType = (TransferParam.TaskType)Enum.Parse(typeof(TransferParam.TaskType), dr["TaskTypeId"].ToString());
task = ScheduledTaskFactory.CreateScheduledTask(taskType);
task.Description = dr["Description"].ToString();
task.IsEnabled = (bool)dr["IsEnabled"];
task.IsProcessing = (bool)dr["IsProcessing"];
task.IsManualLaunch = (bool)dr["IsManualLaunch"];
task.ProcessMachineName = dr["ProcessMachineName"].ToString();
task.NextRun = (DateTime)dr["NextRun"];
task.PostProcessNotification = (bool)dr["NotifyPostProcess"];
task.PreProcessNotification = (bool)dr["NotifyPreProcess"];
task.Priority = (TransferParam.Priority)Enum.Parse(typeof(TransferParam.SystemType), dr["PriorityId"].ToString());
task.SleepMinutes = (int)dr["SleepMinutes"];
task.ScheduleId = (int)dr["ScheduleId"];
task.CurrentRuns = (int)dr["CurrentRuns"];
task.TotalRuns = (int)dr["TotalRuns"];
SchedulerTask scheduledTask = new SchedulerTask(new ManualResetEvent(false), task);
//Queue up task to worker thread and start
ThreadPool.QueueUserWorkItem(new WaitCallback(this.ThreadProc), scheduledTask);
}
}
}
}
catch (Exception ex)
{
//Handle
}
}
private void ThreadProc(object taskObject)
{
SchedulerTask task = (SchedulerTask)taskObject;
ScheduledTaskEngine engine = null;
try
{
engine = SchedulerTaskEngineFactory.CreateTaskEngine(task.Task, _ConnectionString);
engine.StartTask(task.Task);
}
catch (Exception ex)
{
//Handle
}
finally
{
task.TaskResetEvent.Set();
task.TaskResetEvent.Dispose();
}
}
0x8007000E is an out-of-memory error. That and the handle count seem to point to a resource leak. Ensure you're disposing of every object that implements IDisposable. This includes the arrays of ManualResetEvents you're using.
If you have time, you may also want to convert to using the .NET 4.0 Task class; it was designed to handle complex scenarios like this much more cleanly. By defining child Task objects, you can reduce your overall thread count (threads are quite expensive not only because of scheduling but also because of their stack space).
I'm looking for answers to a similar problem (Handles Count increasing over time).
I took a look at your application architecture and like to suggest you something that could help you out:
Have you heard about IOCP (Input Output Completion Ports).
I'm not sure of the dificulty to implement this using C# but in C/C++ it is a piece of cake.
By using this you create a unique thread pool (The number of threads in that pool is in general defined as 2 x the number of processors or processors cores in the PC or server)
You associate this pool to a IOCP Handle and the pool does the work.
See the help for these functions:
CreateIoCompletionPort();
PostQueuedCompletionStatus();
GetQueuedCompletionStatus();
In General creating and exiting threads on the fly could be time consuming and leads to performance penalties and memory fragmentation.
There are thousands of literature about IOCP in MSDN and in google.
I think you should reconsider your architecture altogether. The fact that you can only have 3 simultaneously connections is almost begging you to use 1 thread to generate the list of files and 3 threads to process them. Your producer thread would insert all files into a queue and the 3 consumer threads will dequeue and continue processing as items arrive in the queue. A blocking queue can significantly simplify the code. If you are using .NET 4.0 then you can take advantage of the BlockingCollection class.
public class Example
{
private BlockingCollection<string> m_Queue = new BlockingCollection<string>();
public void Start()
{
var threads = new Thread[]
{
new Thread(Producer),
new Thread(Consumer),
new Thread(Consumer),
new Thread(Consumer)
};
foreach (Thread thread in threads)
{
thread.Start();
}
}
private void Producer()
{
while (true)
{
Thread.Sleep(TimeSpan.FromSeconds(5));
ScheduledTask task = GetScheduledTask();
if (task != null)
{
foreach (string file in task.Files)
{
m_Queue.Add(task);
}
}
}
}
private void Consumer()
{
// Make a connection to the resource that is assigned to this thread only.
while (true)
{
string file = m_Queue.Take();
// Process the file.
}
}
}
I have definitely oversimplified things in the example above, but I hope you get the general idea. Notice how this is much simpler as there is not much in the way of thread synchronization (most will be embedded in the blocking queue) and of course there is no use of WaitHandle objects. Obviously you would have to add in the correct mechanisms to shut down the threads gracefully, but that should be fairly easy.
It turns out the source of this strange problem was not related to architecture but rather because of converting the solution from 3.5 to 4.0. I re-created the solution, performing no code changes, and the problem never occurred again.

How to effectively log asynchronously?

I am using Enterprise Library 4 on one of my projects for logging (and other purposes). I've noticed that there is some cost to the logging that I am doing that I can mitigate by doing the logging on a separate thread.
The way I am doing this now is that I create a LogEntry object and then I call BeginInvoke on a delegate that calls Logger.Write.
new Action<LogEntry>(Logger.Write).BeginInvoke(le, null, null);
What I'd really like to do is add the log message to a queue and then have a single thread pulling LogEntry instances off the queue and performing the log operation. The benefit of this would be that logging is not interfering with the executing operation and not every logging operation results in a job getting thrown on the thread pool.
How can I create a shared queue that supports many writers and one reader in a thread safe way? Some examples of a queue implementation that is designed to support many writers (without causing synchronization/blocking) and a single reader would be really appreciated.
Recommendation regarding alternative approaches would also be appreciated, I am not interested in changing logging frameworks though.
I wrote this code a while back, feel free to use it.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace MediaBrowser.Library.Logging {
public abstract class ThreadedLogger : LoggerBase {
Queue<Action> queue = new Queue<Action>();
AutoResetEvent hasNewItems = new AutoResetEvent(false);
volatile bool waiting = false;
public ThreadedLogger() : base() {
Thread loggingThread = new Thread(new ThreadStart(ProcessQueue));
loggingThread.IsBackground = true;
loggingThread.Start();
}
void ProcessQueue() {
while (true) {
waiting = true;
hasNewItems.WaitOne(10000,true);
waiting = false;
Queue<Action> queueCopy;
lock (queue) {
queueCopy = new Queue<Action>(queue);
queue.Clear();
}
foreach (var log in queueCopy) {
log();
}
}
}
public override void LogMessage(LogRow row) {
lock (queue) {
queue.Enqueue(() => AsyncLogMessage(row));
}
hasNewItems.Set();
}
protected abstract void AsyncLogMessage(LogRow row);
public override void Flush() {
while (!waiting) {
Thread.Sleep(1);
}
}
}
}
Some advantages:
It keeps the background logger alive, so it does not need to spin up and spin down threads.
It uses a single thread to service the queue, which means there will never be a situation where 100 threads are servicing the queue.
It copies the queues to ensure the queue is not blocked while the log operation is performed
It uses an AutoResetEvent to ensure the bg thread is in a wait state
It is, IMHO, very easy to follow
Here is a slightly improved version, keep in mind I performed very little testing on it, but it does address a few minor issues.
public abstract class ThreadedLogger : IDisposable {
Queue<Action> queue = new Queue<Action>();
ManualResetEvent hasNewItems = new ManualResetEvent(false);
ManualResetEvent terminate = new ManualResetEvent(false);
ManualResetEvent waiting = new ManualResetEvent(false);
Thread loggingThread;
public ThreadedLogger() {
loggingThread = new Thread(new ThreadStart(ProcessQueue));
loggingThread.IsBackground = true;
// this is performed from a bg thread, to ensure the queue is serviced from a single thread
loggingThread.Start();
}
void ProcessQueue() {
while (true) {
waiting.Set();
int i = ManualResetEvent.WaitAny(new WaitHandle[] { hasNewItems, terminate });
// terminate was signaled
if (i == 1) return;
hasNewItems.Reset();
waiting.Reset();
Queue<Action> queueCopy;
lock (queue) {
queueCopy = new Queue<Action>(queue);
queue.Clear();
}
foreach (var log in queueCopy) {
log();
}
}
}
public void LogMessage(LogRow row) {
lock (queue) {
queue.Enqueue(() => AsyncLogMessage(row));
}
hasNewItems.Set();
}
protected abstract void AsyncLogMessage(LogRow row);
public void Flush() {
waiting.WaitOne();
}
public void Dispose() {
terminate.Set();
loggingThread.Join();
}
}
Advantages over the original:
It's disposable, so you can get rid of the async logger
The flush semantics are improved
It will respond slightly better to a burst followed by silence
Yes, you need a producer/consumer queue. I have one example of this in my threading tutorial - if you look my "deadlocks / monitor methods" page you'll find the code in the second half.
There are plenty of other examples online, of course - and .NET 4.0 will ship with one in the framework too (rather more fully featured than mine!). In .NET 4.0 you'd probably wrap a ConcurrentQueue<T> in a BlockingCollection<T>.
The version on that page is non-generic (it was written a long time ago) but you'd probably want to make it generic - it would be trivial to do.
You would call Produce from each "normal" thread, and Consume from one thread, just looping round and logging whatever it consumes. It's probably easiest just to make the consumer thread a background thread, so you don't need to worry about "stopping" the queue when your app exits. That does mean there's a remote possibility of missing the final log entry though (if it's half way through writing it when the app exits) - or even more if you're producing faster than it can consume/log.
Here is what I came up with... also see Sam Saffron's answer. This answer is community wiki in case there are any problems that people see in the code and want to update.
/// <summary>
/// A singleton queue that manages writing log entries to the different logging sources (Enterprise Library Logging) off the executing thread.
/// This queue ensures that log entries are written in the order that they were executed and that logging is only utilizing one thread (backgroundworker) at any given time.
/// </summary>
public class AsyncLoggerQueue
{
//create singleton instance of logger queue
public static AsyncLoggerQueue Current = new AsyncLoggerQueue();
private static readonly object logEntryQueueLock = new object();
private Queue<LogEntry> _LogEntryQueue = new Queue<LogEntry>();
private BackgroundWorker _Logger = new BackgroundWorker();
private AsyncLoggerQueue()
{
//configure background worker
_Logger.WorkerSupportsCancellation = false;
_Logger.DoWork += new DoWorkEventHandler(_Logger_DoWork);
}
public void Enqueue(LogEntry le)
{
//lock during write
lock (logEntryQueueLock)
{
_LogEntryQueue.Enqueue(le);
//while locked check to see if the BW is running, if not start it
if (!_Logger.IsBusy)
_Logger.RunWorkerAsync();
}
}
private void _Logger_DoWork(object sender, DoWorkEventArgs e)
{
while (true)
{
LogEntry le = null;
bool skipEmptyCheck = false;
lock (logEntryQueueLock)
{
if (_LogEntryQueue.Count <= 0) //if queue is empty than BW is done
return;
else if (_LogEntryQueue.Count > 1) //if greater than 1 we can skip checking to see if anything has been enqueued during the logging operation
skipEmptyCheck = true;
//dequeue the LogEntry that will be written to the log
le = _LogEntryQueue.Dequeue();
}
//pass LogEntry to Enterprise Library
Logger.Write(le);
if (skipEmptyCheck) //if LogEntryQueue.Count was > 1 before we wrote the last LogEntry we know to continue without double checking
{
lock (logEntryQueueLock)
{
if (_LogEntryQueue.Count <= 0) //if queue is still empty than BW is done
return;
}
}
}
}
}
I suggest to start with measuring actual performance impact of logging on the overall system (i.e. by running profiler) and optionally switching to something faster like log4net (I've personally migrated to it from EntLib logging a long time ago).
If this does not work, you can try using this simple method from .NET Framework:
ThreadPool.QueueUserWorkItem
Queues a method for execution. The method executes when a thread pool thread becomes available.
MSDN Details
If this does not work either then you can resort to something like John Skeet has offered and actually code the async logging framework yourself.
In response to Sam Safrons post, I wanted to call flush and make sure everything was really finished writting. In my case, I am writing to a database in the queue thread and all my log events were getting queued up but sometimes the application stopped before everything was finished writing which is not acceptable in my situation. I changed several chunks of your code but the main thing I wanted to share was the flush:
public static void FlushLogs()
{
bool queueHasValues = true;
while (queueHasValues)
{
//wait for the current iteration to complete
m_waitingThreadEvent.WaitOne();
lock (m_loggerQueueSync)
{
queueHasValues = m_loggerQueue.Count > 0;
}
}
//force MEL to flush all its listeners
foreach (MEL.LogSource logSource in MEL.Logger.Writer.TraceSources.Values)
{
foreach (TraceListener listener in logSource.Listeners)
{
listener.Flush();
}
}
}
I hope that saves someone some frustration. It is especially apparent in parallel processes logging lots of data.
Thanks for sharing your solution, it set me into a good direction!
--Johnny S
I wanted to say that my previous post was kind of useless. You can simply set AutoFlush to true and you will not have to loop through all the listeners. However, I still had crazy problem with parallel threads trying to flush the logger. I had to create another boolean that was set to true during the copying of the queue and executing the LogEntry writes and then in the flush routine I had to check that boolean to make sure something was not already in the queue and the nothing was getting processed before returning.
Now multiple threads in parallel can hit this thing and when I call flush I know it is really flushed.
public static void FlushLogs()
{
int queueCount;
bool isProcessingLogs;
while (true)
{
//wait for the current iteration to complete
m_waitingThreadEvent.WaitOne();
//check to see if we are currently processing logs
lock (m_isProcessingLogsSync)
{
isProcessingLogs = m_isProcessingLogs;
}
//check to see if more events were added while the logger was processing the last batch
lock (m_loggerQueueSync)
{
queueCount = m_loggerQueue.Count;
}
if (queueCount == 0 && !isProcessingLogs)
break;
//since something is in the queue, reset the signal so we will not keep looping
Thread.Sleep(400);
}
}
Just an update:
Using enteprise library 5.0 with .NET 4.0 it can easily be done by:
static public void LogMessageAsync(LogEntry logEntry)
{
Task.Factory.StartNew(() => LogMessage(logEntry));
}
See:
http://randypaulo.wordpress.com/2011/07/28/c-enterprise-library-asynchronous-logging/
An extra level of indirection may help here.
Your first async method call can put messages onto a synchonized Queue and set an event -- so the locks are happening in the thread-pool, not on your worker threads -- and then have yet another thread pulling messages off the queue when the event is raised.
If you log something on a separate thread, the message may not be written if the application crashes, which makes it rather useless.
The reason goes why you should always flush after every written entry.
If what you have in mind is a SHARED queue, then I think you are going to have to synchronize the writes to it, the pushes and the pops.
But, I still think it's worth aiming at the shared queue design. In comparison to the IO of logging and probably in comparison to the other work your app is doing, the brief amount of blocking for the pushes and the pops will probably not be significant.

Categories

Resources