How do I know when it's safe to call Dispose? - c#

I have a search application that takes some time (10 to 15 seconds) to return results for some requests. It's not uncommon to have multiple concurrent requests for the same information. As it stands, I have to process those independently, which makes for quite a bit of unnecessary processing.
I've come up with a design that should allow me to avoid the unnecessary processing, but there's one lingering problem.
Each request has a key that identifies the data being requested. I maintain a dictionary of requests, keyed by the request key. The request object has some state information and a WaitHandle that is used to wait on the results.
When a client calls my Search method, the code checks the dictionary to see if a request already exists for that key. If so, the client just waits on the WaitHandle. If no request exists, I create one, add it to the dictionary, and issue an asynchronous call to get the information. Again, the code waits on the event.
When the asynchronous process has obtained the results, it updates the request object, removes the request from the dictionary, and then signals the event.
This all works great. Except I don't know when to dispose of the request object. That is, since I don't know when the last client is using it, I can't call Dispose on it. I have to wait for the garbage collector to come along and clean up.
Here's the code:
class SearchRequest: IDisposable
{
public readonly string RequestKey;
public string Results { get; set; }
public ManualResetEvent WaitEvent { get; private set; }
public SearchRequest(string key)
{
RequestKey = key;
WaitEvent = new ManualResetEvent(false);
}
public void Dispose()
{
WaitEvent.Dispose();
GC.SuppressFinalize(this);
}
}
ConcurrentDictionary<string, SearchRequest> Requests = new ConcurrentDictionary<string, SearchRequest>();
string Search(string key)
{
SearchRequest req;
bool addedNew = false;
req = Requests.GetOrAdd(key, (s) =>
{
// Create a new request.
var r = new SearchRequest(s);
Console.WriteLine("Added new request with key {0}", key);
addedNew = true;
return r;
});
if (addedNew)
{
// A new request was created.
// Start a search.
ThreadPool.QueueUserWorkItem((obj) =>
{
// Get the results
req.Results = DoSearch(req.RequestKey); // DoSearch takes several seconds
// Remove the request from the pending list
SearchRequest trash;
Requests.TryRemove(req.RequestKey, out trash);
// And signal that the request is finished
req.WaitEvent.Set();
});
}
Console.WriteLine("Waiting for results from request with key {0}", key);
req.WaitEvent.WaitOne();
return req.Results;
}
Basically, I don't know when the last client will be released. No matter how I slice it here, I have a race condition. Consider:
Thread A creates a new request, starts the worker (Thread B), and waits on the wait handle.
Thread B Begins processing the request.
Thread C detects that there's a pending request, and then gets swapped out.
Thread B Completes the request, removes the item from the dictionary, and sets the event.
Thread A's wait is satisfied, and it returns the result.
Thread C wakes up, calls WaitOne, is released, and returns the result.
If I use some kind of reference counting so that the "last" client calls Dispose, then the object would be disposed by Thread A in the above scenario. Thread C would then die when it tried to wait on the disposed WaitHandle.
The only way I can see to fix this is to use a reference counting scheme and protect access to the dictionary with a lock (in which case using ConcurrentDictionary is pointless) so that a lookup is always accompanied by an increment of the reference count. While that would work, it seems like an ugly hack.
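(To make that concrete, the reference-counted variant I have in mind would look roughly like this. It's only a sketch: Requests here is a plain Dictionary guarded by a RequestsLock object, and RefCount is an int field added to SearchRequest just for this illustration.)
// Sketch of the reference-counting idea (not what I want to do): every lookup bumps
// the count under the same lock that guards the dictionary, and the last waiter
// to release disposes the request.
string Search(string key)
{
    SearchRequest req;
    bool addedNew = false;
    lock (RequestsLock)
    {
        if (!Requests.TryGetValue(key, out req))
        {
            req = new SearchRequest(key);
            Requests.Add(key, req);
            addedNew = true;
        }
        req.RefCount++; // hypothetical int field on SearchRequest
    }
    if (addedNew)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            req.Results = DoSearch(req.RequestKey);
            lock (RequestsLock) { Requests.Remove(key); }
            req.WaitEvent.Set();
        });
    }
    req.WaitEvent.WaitOne();
    string results = req.Results;
    lock (RequestsLock)
    {
        if (--req.RefCount == 0)
            req.Dispose(); // the last client out closes the handle
    }
    return results;
}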
Another solution would be to ditch the WaitHandle and use an event-like mechanism with callbacks. But that, too, would require me to protect the lookups with a lock, and I have the added complication of dealing with an event or a naked multicast delegate. That seems like a hack, too.
This probably isn't a problem currently, because this application doesn't yet get enough traffic for those abandoned handles to add up before the next GC pass comes and cleans them up. And maybe it won't ever be a problem? It worries me, though, that I'm leaving them to be cleaned up by the GC when I should be calling Dispose to get rid of them.
Ideas? Is this a potential problem? If so, do you have a clean solution?

Consider using Lazy<T> for SearchRequest.Results maybe? But that would probably entail a bit of redesign. Haven't thought this out completely.
But what would probably be almost a drop-in replacement for your use case is to implement your own Wait() and Set() methods in SearchRequest. Something like:
object _resultLock = new object();
bool _hasResult;
void Wait()
{
lock(_resultLock)
{
while (!_hasResult)
Monitor.Wait(_resultLock);
}
}
void Set(string results)
{
lock(_resultLock)
{
Results = results;
_hasResult = true;
Monitor.PulseAll(_resultLock);
}
}
No need to dispose. :)
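For illustration, here is a minimal sketch of how the question's Search method might use those two methods in place of the ManualResetEvent (everything else is assumed from the code above):
string Search(string key)
{
    bool addedNew = false;
    SearchRequest req = Requests.GetOrAdd(key, s =>
    {
        addedNew = true;
        return new SearchRequest(s);
    });
    if (addedNew)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            string results = DoSearch(req.RequestKey);
            SearchRequest trash;
            Requests.TryRemove(req.RequestKey, out trash);
            req.Set(results); // stores the result and wakes every waiter via Monitor.PulseAll
        });
    }
    req.Wait(); // blocks until Set has published a result; nothing to dispose afterwards
    return req.Results;
}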

I think that your best bet to make this work is to use the TPL for all of your multi-threading needs. That's what it is good at.
As per my comment on your question, you need to keep in mind that ConcurrentDictionary does have side-effects. If multiple threads try to call GetOrAdd at the same time then the factory can be invoked for all of them, but only one will win. The values produced for the other threads will just be discarded; by then, however, the computation has already been done.
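(As an aside, if you did want to stay with ConcurrentDictionary, a common way to avoid that duplicate factory work is to cache Lazy<T> values so that at most one search is ever started per key. A rough sketch only, not what I'm suggesting below:)
// Sketch: ConcurrentDictionary of Lazy<Task<string>> so DoSearch starts at most once
// per key even when GetOrAdd races. Removing the entry on completion would still be
// needed, as in the version below.
ConcurrentDictionary<string, Lazy<Task<string>>> _lazyRequests =
    new ConcurrentDictionary<string, Lazy<Task<string>>>();

string Search(string key)
{
    var lazy = _lazyRequests.GetOrAdd(
        key,
        k => new Lazy<Task<string>>(
            () => Task.Factory.StartNew(() => DoSearch(k)),
            LazyThreadSafetyMode.ExecutionAndPublication));
    return lazy.Value.Result;
}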
Since you also said that doing searches is expensive, the cost of taking a lock and then using a standard dictionary would be minimal.
So this is what I suggest:
private Dictionary<string, Task<string>> _requests
= new Dictionary<string, Task<string>>();
public string Search(string key)
{
Task<string> task;
lock (_requests)
{
if (_requests.ContainsKey(key))
{
task = _requests[key];
}
else
{
task = Task<string>
.Factory
.StartNew(() => DoSearch(key));
_requests[key] = task;
task.ContinueWith(t =>
{
lock(_requests)
{
_requests.Remove(key);
}
});
}
}
return task.Result;
}
This option nicely runs the search, remembers the task throughout the duration of the search and then removes it from the dictionary when it completes. All requests for the same key while a search is executing get the same task and so will get the same result once the task is complete.
I've tested the code and it works.
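For example, a hypothetical caller can check the sharing behaviour: several concurrent requests for the same key should trigger only one DoSearch call and all return the same result.
// Hypothetical demo: three concurrent calls for the same key share one search.
Parallel.Invoke(
    () => Console.WriteLine(Search("widgets")),
    () => Console.WriteLine(Search("widgets")),
    () => Console.WriteLine(Search("widgets")));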

Related

How do I prevent my Rx test from hanging?

I am reproducing my Rx issue with a simplified test case below. The test below hangs. I am sure it is a small, but fundamental, thing that I am missing, but can't put my finger on it.
public class Service
{
private ISubject<double> _subject = new Subject<double>();
public void Reset()
{
_subject.OnNext(0.0);
}
public IObservable<double> GetProgress()
{
return _subject;
}
}
public class ObTest
{
[Fact]
private async Task SimpleTest()
{
var service = new Service();
var result = service.GetProgress().Take(1);
var task = Task.Run(async () =>
{
service.Reset();
});
await result;
}
}
UPDATE
My attempt above was to simplify the problem a little and understand it. In my case GetProgress() is a merge of various Observables that publish the download progress; one of these Observables is a Subject<double> that publishes 0 every time somebody calls a method to delete the download.
The race condition identified by Enigmativity and Theodor Zoulias may(??) happen in real life. I display a view which attempts to get the progress, however, quick fingers delete it just in time.
What I need to understand a bit more is if the download is started again (subscription has taken place by now, by virtue of displaying a view, which has already made the subscription) and somebody again deletes it.
public class Service
{
private ISubject<double> _deleteSubject = new Subject<double>();
public void Reset()
{
_deleteSubject.OnNext(0.0);
}
public IObservable<double> GetProgress()
{
return _deleteSubject.Merge(downloadProgress);
}
}
Your code isn't hanging. It's awaiting an observable that sometimes never gets a value.
You have a race condition.
The Task.Run is sometimes executing to completion before the await result creates the subscription to the observable - so it never sees the value.
Try this code instead:
private async Task SimpleTest()
{
var service = new Service();
var result = service.GetProgress().Take(1);
var awaiter = result.GetAwaiter();
var task = Task.Run(() =>
{
service.Reset();
});
await awaiter;
}
The line await result creates a subscription to the observable. The problem is that the notification _subject.OnNext(0.0) may occur before this subscription, in which case the value will pass unobserved and the await result will continue waiting for a notification forever. In this particular example the notification is always missed, at least on my PC, because the subscription is delayed by around 30 msec (measured with a Stopwatch), which is longer than the time needed for the task that resets the service to complete, probably because the JITer must load and compile some Rx-related assembly. The situation changes when I do a warm-up by calling new Subject<int>().FirstAsync().Subscribe() before running the example. In that case the notification is observed almost always, and the hanging is avoided.
I can think of two robust solutions to this problem.
The solution suggested by Enigmativity, to create an awaitable subscription before starting the task that resets the service. This can be done with either GetAwaiter or ToTask.
To use a ReplaySubject<T> instead of a plain vanilla Subject<T>.
Represents an object that is both an observable sequence as well as an observer. Each notification is broadcasted to all subscribed and future observers, subject to buffer trimming policies.
The ReplaySubject will cache the value so that it can be observed by the future subscription, eliminating the race condition. You could initialize it with a bufferSize of 1 to minimize the memory footprint of the buffer.
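For the second option, the only change to the Service class from the question would be roughly:
public class Service
{
    // ReplaySubject with bufferSize 1 caches the latest notification, so a
    // subscription created after OnNext still observes the value.
    private ISubject<double> _subject = new ReplaySubject<double>(1);
    public void Reset()
    {
        _subject.OnNext(0.0);
    }
    public IObservable<double> GetProgress()
    {
        return _subject;
    }
}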

Should/Could this "recursive Task" be expressed as a TaskContinuation?

In my application I have the need to continually process some piece(s) of Work on some set interval(s). I had originally written a Task to continually check a given Task.Delay to see if it was completed; if so, the Work that corresponded to that Task.Delay would be processed. The drawback to this method is that the Task that checks these Task.Delays would be in a pseudo-infinite loop when no Task.Delay is completed.
To solve this problem I found that I could create a "recursive Task" (I am not sure what the jargon for this would be) that processes the work at the given interval as needed.
// New Recurring Work can be added by simply creating
// the Task below and adding an entry into this Dictionary.
// Recurring Work can be removed/stopped by looking
// it up in this Dictionary and calling its CTS.Cancel method.
private readonly object _LockRecurWork = new object();
private Dictionary<Work, Tuple<Task, CancellationTokenSource>> RecurringWork { get; set; }
...
private Task CreateRecurringWorkTask(Work workToDo, CancellationTokenSource taskTokenSource)
{
return Task.Run(async () =>
{
// Do the Work, then wait the prescribed amount of time before doing it again
DoWork(workToDo);
await Task.Delay(workToDo.RecurRate, taskTokenSource.Token);
// If this Work's CancellationTokenSource is not
// cancelled then "schedule" the next Work execution
if (!taskTokenSource.IsCancellationRequested)
{
lock(_LockRecurWork)
{
RecurringWork[workToDo] = new Tuple<Task, CancellationTokenSource>
(CreateRecurringWorkTask(workToDo, taskTokenSource), taskTokenSource);
}
}
}, taskTokenSource.Token);
}
Should/Could this be represented with a chain of Task.ContinueWith? Would there be any benefit to such an implementation? Is there anything majorly wrong with the current implementation?
Yes!
Calling ContinueWith tells the Task to call your code as soon as it finishes. This is far faster than manually polling it.
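As a rough sketch (reusing the members from the question, and only one of several possible wirings), the rescheduling could be expressed as continuations instead of an await inside Task.Run:
private Task CreateRecurringWorkTask(Work workToDo, CancellationTokenSource taskTokenSource)
{
    // Run the work, then wait the recur interval, then schedule the next run
    // as a continuation. Cancelling the token stops the chain.
    return Task.Run(() => DoWork(workToDo), taskTokenSource.Token)
        .ContinueWith(_ => Task.Delay(workToDo.RecurRate, taskTokenSource.Token),
            taskTokenSource.Token)
        .Unwrap()
        .ContinueWith(_ =>
        {
            lock (_LockRecurWork)
            {
                RecurringWork[workToDo] = Tuple.Create(
                    CreateRecurringWorkTask(workToDo, taskTokenSource), taskTokenSource);
            }
        }, taskTokenSource.Token);
}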

BrokeredMessage Automatically Disposed after calling OnMessage()

I am trying to queue up items from an Azure Service Bus so I can process them in bulk. I am aware that the Azure Service Bus has a ReceiveBatch() but it seems problematic for the following reasons:
I can only get a max of 256 messages at a time and even this then can be random based on message size.
Even if I peek to see how many messages are waiting, I don't know how many ReceiveBatch calls to make because I don't know how many messages each call will give me back. Since messages will keep coming in, I can't just continue to make requests until it's empty since it will never be empty.
I decided to just use the message listener which is cheaper than doing wasted peeks and will give me more control.
Basically I am trying to let a set number of messages build up and
then process them at once. I use a timer to force a delay but I need
to be able to queue my items as they come in.
Based on my timer requirement it seemed like the blocking collection was not a good option so I am trying to use ConcurrentBag.
var batchingQueue = new ConcurrentBag<BrokeredMessage>();
myQueueClient.OnMessage((m) =>
{
Console.WriteLine("Queueing message");
batchingQueue.Add(m);
});
while (true)
{
var sw = WaitableStopwatch.StartNew();
BrokeredMessage msg;
while (batchingQueue.TryTake(out msg)) // <== Object is already disposed
{
...do this until I have a thousand ready to be written to DB in batch
Console.WriteLine("Completing message");
msg.Complete(); // <== ERRORS HERE
}
sw.Wait(MINIMUM_DELAY);
}
However as soon as I access the message outside of the OnMessage
pipeline it shows the BrokeredMessage as already being disposed.
I am thinking this must be some automatic behavior of OnMessage and I don't see any way to do anything with the message other than process it right away which I don't want to do.
This is incredibly easy to do with BlockingCollection.
var batchingQueue = new BlockingCollection<BrokeredMessage>();
myQueueClient.OnMessage((m) =>
{
Console.WriteLine("Queueing message");
batchingQueue.Add(m);
});
And your consumer thread:
foreach (var msg in batchingQueue.GetConsumingEnumerable())
{
Console.WriteLine("Completing message");
msg.Complete();
}
GetConsumingEnumerable returns an iterator that consumes items in the queue until the IsCompleted property is set and the queue is empty. If the queue is empty but IsCompleted is False, it does a non-busy wait for the next item.
To cancel the consumer thread (i.e. shut down the program), you stop adding things to the queue and have the main thread call batchingQueue.CompleteAdding. The consumer will empty the queue, see that the IsCompleted property is True, and exit.
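For example, a hypothetical shutdown sequence (names assumed from the snippets above, plus a consumerThread that runs the foreach loop):
// Hypothetical shutdown: stop producing, let the consumer drain, then exit.
myQueueClient.Close();          // stop the OnMessage pump (assumed Service Bus API usage)
batchingQueue.CompleteAdding(); // tells GetConsumingEnumerable to finish once the queue is empty
consumerThread.Join();          // consumerThread is assumed to be running the foreach loop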
Using BlockingCollection here is better than ConcurrentBag or ConcurrentQueue, because the BlockingCollection interface is easier to work with. In particular, the use of GetConsumingEnumerable relieves you from having to worry about checking the count or doing busy waits (polling loops). It just works.
Also note that ConcurrentBag has some rather strange removal behavior. In particular, the order in which items are removed differs depending on which thread removes the item. The thread that created the bag removes items in a different order than other threads. See Using the ConcurrentBag Collection for the details.
You haven't said why you want to batch items on input. Unless there's an overriding performance reason to do so, it doesn't seem like a particularly good idea to complicate your code with that batching logic.
If you want to do batch writes to the database, then I would suggest using a simple List<T> to buffer the items. If you have to process the items before they're written to the database, then use the technique I showed above to process them. Then, rather than writing directly to the database, add the item to a list. When the list reaches 1,000 items, or a given amount of time elapses, allocate a new list and start a task to write the old list to the database. Like this:
// at class scope
// Flush every 5 minutes.
private readonly TimeSpan FlushDelay = TimeSpan.FromMinutes(5);
private const int MaxBufferItems = 1000;
// Create a timer for the buffer flush. (In real code this would be assigned in the
// constructor, since the callback and the delay are instance members.)
System.Threading.Timer _flushTimer = new System.Threading.Timer(TimedFlush, null, (int)FlushDelay.TotalMilliseconds, Timeout.Infinite);
// A lock for the list. Unless you're getting hundreds of thousands
// of items per second, this will not be a performance problem.
object _listLock = new Object();
List<BrokeredMessage> _recordBuffer = new List<BrokeredMessage>();
Then, in your consumer:
foreach (var msg in batchingQueue.GetConsumingEnumerable())
{
// process the message
Console.WriteLine("Completing message");
msg.Complete();
lock (_listLock)
{
_recordBuffer.Add(msg);
if (_recordBuffer.Count >= MaxBufferItems)
{
// Stop the timer
_flushTimer.Change(Timeout.Infinite, Timeout.Infinite);
// Save the old list and allocate a new one
var myList = _recordBuffer;
_recordBuffer = new List<BrokeredMessage>();
// Start a task to write to the database
Task.Factory.StartNew(() => FlushBuffer(myList));
// Restart the timer
_flushTimer.Change((int)FlushDelay.TotalMilliseconds, Timeout.Infinite);
}
}
}
private void TimedFlush(object state)
{
bool lockTaken = false;
List<BrokeredMessage> myList = null;
try
{
Monitor.TryEnter(_listLock, 0, ref lockTaken);
if (lockTaken)
{
// Save the old list and allocate a new one
myList = _recordBuffer;
_recordBuffer = new List<BrokeredMessage>();
}
}
finally
{
if (lockTaken)
{
Monitor.Exit(_listLock);
}
}
if (myList != null)
{
FlushBuffer(myList);
}
// Restart the timer
_flushTimer.Change((int)FlushDelay.TotalMilliseconds, Timeout.Infinite);
}
The idea here is that you get the old list out of the way, allocate a new list so that processing can continue, and then write the old list's items to the database. The lock is there to prevent the timer and the record counter from stepping on each other. Without the lock, things would likely appear to work fine for a while, and then you'd get weird crashes at unpredictable times.
I like this design because it eliminates polling by the consumer. The only thing I don't like is that the consumer has to be aware of the timer (i.e. it has to stop and then restart the timer). With a little more thought, I could eliminate that requirement. But it works well the way it's written.
Switching to OnMessageAsync solved the problem for me
_queueClient.OnMessageAsync(async receivedMessage =>
I reached out to Microsoft about the BrokeredMessage being disposed issue on MSDN, this is the response:
Very basic rule and I am not sure if this is documented. The received message needs to be processed in the callback function's life time. In your case, messages will be disposed when async callback completes, this is why your complete attempts are failing with ObjectDisposedException in another thread.
I don't really see how queuing messages for further processing helps on the throughput. This will add more burden to client for sure. Try processing the message in the async callback, that should be performant enough.
In my case that means I can't use ServiceBus in the way I wanted to, and I have to re-think how I wanted things to work. Bugger.
I had the same issue when I started to work with the Azure Service Bus service.
I found that the OnMessage method always disposes the BrokeredMessage object. The approach proposed by Jim Mischel didn't help me (but it was very interesting to read - thanks!).
After some investigation I found that the whole approach is wrong. Let me explain the right way to do what you want.
Use the BrokeredMessage.Complete() method only inside the OnMessage handler.
If you need to process the message outside of this method, you should use QueueClient.Complete(Guid lockToken). LockToken is a property of the BrokeredMessage object.
Example:
var messageOptions = new OnMessageOptions {
AutoComplete = false,
AutoRenewTimeout = TimeSpan.FromMinutes( 5 ),
MaxConcurrentCalls = 1
};
var buffer = new Dictionary<string, Guid>();
// get message from queue
myQueueClient.OnMessage(
m => buffer.Add(key: m.GetBody<string>(), value: m.LockToken),
messageOptions // this option tells Service Bus to keep the message locked in the queue until we complete or defer it
);
foreach(var item in buffer){
try {
Console.WriteLine($"Process item: {item.Key}");
myQueueClient.Complete(item.Value);// you can also use method CompleteBatch(...) to improve performance
}
catch{
// "unfroze" message in ServiceBus. Message would be delivered to other listener
myQueueClient.Defer(item.Value);
}
}
My solution was to get the message SequenceNumber, then defer the message and add the SequenceNumber to the BlockingCollection. Once the BlockingCollection picks up a new item, it can receive the deferred message by its SequenceNumber and mark the message as complete. If for some reason the BlockingCollection doesn't process the SequenceNumber, it will remain in the queue as deferred so it can be picked up later when the process is restarted. This protects against losing messages if the process terminates abnormally while there are still items in the BlockingCollection.
BlockingCollection<long> queueSequenceNumbers = new BlockingCollection<long>();
//This finds any deferred/unfinished messages on startup.
BrokeredMessage existingMessage = client.Peek();
while (existingMessage != null)
{
if (existingMessage.State == MessageState.Deferred)
{
queueSequenceNumbers.Add(existingMessage.SequenceNumber);
}
existingMessage = client.Peek();
}
//setup the message handler
Action<BrokeredMessage> processMessage = new Action<BrokeredMessage>((message) =>
{
try
{
//skip deferred messages if they are already in the queueSequenceNumbers collection.
if (message.State != MessageState.Deferred || (message.State == MessageState.Deferred && !queueSequenceNumbers.Any(x => x == message.SequenceNumber)))
{
message.Defer();
queueSequenceNumbers.Add(message.SequenceNumber);
}
}
catch (Exception ex)
{
// Indicates a problem, unlock message in queue
message.Abandon();
}
});
// Callback to handle newly received messages
client.OnMessage(processMessage, new OnMessageOptions() { AutoComplete = false, MaxConcurrentCalls = 1 });
//start the blocking loop to process messages as they are added to the collection
foreach (var queueSequenceNumber in queueSequenceNumbers.GetConsumingEnumerable())
{
var message = client.Receive(queueSequenceNumber);
//mark the message as complete so it's removed from the queue
message.Complete();
//do something with the message
}

How to effectively log asynchronously?

I am using Enterprise Library 4 on one of my projects for logging (and other purposes). I've noticed that there is some cost to the logging that I am doing that I can mitigate by doing the logging on a separate thread.
The way I am doing this now is that I create a LogEntry object and then I call BeginInvoke on a delegate that calls Logger.Write.
new Action<LogEntry>(Logger.Write).BeginInvoke(le, null, null);
What I'd really like to do is add the log message to a queue and then have a single thread pulling LogEntry instances off the queue and performing the log operation. The benefit of this would be that logging is not interfering with the executing operation and not every logging operation results in a job getting thrown on the thread pool.
How can I create a shared queue that supports many writers and one reader in a thread safe way? Some examples of a queue implementation that is designed to support many writers (without causing synchronization/blocking) and a single reader would be really appreciated.
Recommendation regarding alternative approaches would also be appreciated, I am not interested in changing logging frameworks though.
I wrote this code a while back, feel free to use it.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace MediaBrowser.Library.Logging {
public abstract class ThreadedLogger : LoggerBase {
Queue<Action> queue = new Queue<Action>();
AutoResetEvent hasNewItems = new AutoResetEvent(false);
volatile bool waiting = false;
public ThreadedLogger() : base() {
Thread loggingThread = new Thread(new ThreadStart(ProcessQueue));
loggingThread.IsBackground = true;
loggingThread.Start();
}
void ProcessQueue() {
while (true) {
waiting = true;
hasNewItems.WaitOne(10000,true);
waiting = false;
Queue<Action> queueCopy;
lock (queue) {
queueCopy = new Queue<Action>(queue);
queue.Clear();
}
foreach (var log in queueCopy) {
log();
}
}
}
public override void LogMessage(LogRow row) {
lock (queue) {
queue.Enqueue(() => AsyncLogMessage(row));
}
hasNewItems.Set();
}
protected abstract void AsyncLogMessage(LogRow row);
public override void Flush() {
while (!waiting) {
Thread.Sleep(1);
}
}
}
}
Some advantages:
It keeps the background logger alive, so it does not need to spin up and spin down threads.
It uses a single thread to service the queue, which means there will never be a situation where 100 threads are servicing the queue.
It copies the queues to ensure the queue is not blocked while the log operation is performed
It uses an AutoResetEvent to ensure the bg thread is in a wait state
It is, IMHO, very easy to follow
Here is a slightly improved version, keep in mind I performed very little testing on it, but it does address a few minor issues.
public abstract class ThreadedLogger : IDisposable {
Queue<Action> queue = new Queue<Action>();
ManualResetEvent hasNewItems = new ManualResetEvent(false);
ManualResetEvent terminate = new ManualResetEvent(false);
ManualResetEvent waiting = new ManualResetEvent(false);
Thread loggingThread;
public ThreadedLogger() {
loggingThread = new Thread(new ThreadStart(ProcessQueue));
loggingThread.IsBackground = true;
// this is performed from a bg thread, to ensure the queue is serviced from a single thread
loggingThread.Start();
}
void ProcessQueue() {
while (true) {
waiting.Set();
int i = ManualResetEvent.WaitAny(new WaitHandle[] { hasNewItems, terminate });
// terminate was signaled
if (i == 1) return;
hasNewItems.Reset();
waiting.Reset();
Queue<Action> queueCopy;
lock (queue) {
queueCopy = new Queue<Action>(queue);
queue.Clear();
}
foreach (var log in queueCopy) {
log();
}
}
}
public void LogMessage(LogRow row) {
lock (queue) {
queue.Enqueue(() => AsyncLogMessage(row));
}
hasNewItems.Set();
}
protected abstract void AsyncLogMessage(LogRow row);
public void Flush() {
waiting.WaitOne();
}
public void Dispose() {
terminate.Set();
loggingThread.Join();
}
}
Advantages over the original:
It's disposable, so you can get rid of the async logger
The flush semantics are improved
It will respond slightly better to a burst followed by silence
Yes, you need a producer/consumer queue. I have one example of this in my threading tutorial - if you look at my "deadlocks / monitor methods" page you'll find the code in the second half.
There are plenty of other examples online, of course - and .NET 4.0 will ship with one in the framework too (rather more fully featured than mine!). In .NET 4.0 you'd probably wrap a ConcurrentQueue<T> in a BlockingCollection<T>.
The version on that page is non-generic (it was written a long time ago) but you'd probably want to make it generic - it would be trivial to do.
You would call Produce from each "normal" thread, and Consume from one thread, just looping round and logging whatever it consumes. It's probably easiest just to make the consumer thread a background thread, so you don't need to worry about "stopping" the queue when your app exits. That does mean there's a remote possibility of missing the final log entry though (if it's half way through writing it when the app exits) - or even more if you're producing faster than it can consume/log.
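For reference, a rough sketch of the .NET 4.0 shape of that approach (LogEntry and Logger.Write being the Enterprise Library pieces from the question):
// Many producers, one consumer. ConcurrentQueue preserves FIFO order of log entries.
BlockingCollection<LogEntry> _logQueue =
    new BlockingCollection<LogEntry>(new ConcurrentQueue<LogEntry>());

// Producers, from any thread:
//   _logQueue.Add(logEntry);

// Single consumer, started once on a background thread:
void ConsumeLoop()
{
    foreach (LogEntry entry in _logQueue.GetConsumingEnumerable())
    {
        Logger.Write(entry); // the actual Enterprise Library write, off the callers' threads
    }
}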
Here is what I came up with... also see Sam Saffron's answer. This answer is community wiki in case there are any problems that people see in the code and want to update.
/// <summary>
/// A singleton queue that manages writing log entries to the different logging sources (Enterprise Library Logging) off the executing thread.
/// This queue ensures that log entries are written in the order that they were executed and that logging is only utilizing one thread (backgroundworker) at any given time.
/// </summary>
public class AsyncLoggerQueue
{
//create singleton instance of logger queue
public static AsyncLoggerQueue Current = new AsyncLoggerQueue();
private static readonly object logEntryQueueLock = new object();
private Queue<LogEntry> _LogEntryQueue = new Queue<LogEntry>();
private BackgroundWorker _Logger = new BackgroundWorker();
private AsyncLoggerQueue()
{
//configure background worker
_Logger.WorkerSupportsCancellation = false;
_Logger.DoWork += new DoWorkEventHandler(_Logger_DoWork);
}
public void Enqueue(LogEntry le)
{
//lock during write
lock (logEntryQueueLock)
{
_LogEntryQueue.Enqueue(le);
//while locked check to see if the BW is running, if not start it
if (!_Logger.IsBusy)
_Logger.RunWorkerAsync();
}
}
private void _Logger_DoWork(object sender, DoWorkEventArgs e)
{
while (true)
{
LogEntry le = null;
bool skipEmptyCheck = false;
lock (logEntryQueueLock)
{
if (_LogEntryQueue.Count <= 0) //if the queue is empty then the BW is done
return;
else if (_LogEntryQueue.Count > 1) //if greater than 1 we can skip checking to see if anything has been enqueued during the logging operation
skipEmptyCheck = true;
//dequeue the LogEntry that will be written to the log
le = _LogEntryQueue.Dequeue();
}
//pass LogEntry to Enterprise Library
Logger.Write(le);
if (skipEmptyCheck) //if LogEntryQueue.Count was > 1 before we wrote the last LogEntry we know to continue without double checking
{
lock (logEntryQueueLock)
{
if (_LogEntryQueue.Count <= 0) //if the queue is still empty then the BW is done
return;
}
}
}
}
}
I suggest starting by measuring the actual performance impact of logging on the overall system (i.e. by running a profiler) and optionally switching to something faster like log4net (I personally migrated to it from EntLib logging a long time ago).
If this does not work, you can try using this simple method from .NET Framework:
ThreadPool.QueueUserWorkItem
Queues a method for execution. The method executes when a thread pool thread becomes available.
MSDN Details
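For instance, roughly (using the LogEntry le and Logger.Write from the question):
// Queue the write on the thread pool instead of calling BeginInvoke on a delegate.
ThreadPool.QueueUserWorkItem(state => Logger.Write((LogEntry)state), le);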
If this does not work either, then you can resort to something like Jon Skeet has offered and actually code the async logging framework yourself.
In response to Sam Saffron's post, I wanted to call flush and make sure everything was really finished writing. In my case, I am writing to a database in the queue thread and all my log events were getting queued up, but sometimes the application stopped before everything was finished writing, which is not acceptable in my situation. I changed several chunks of your code but the main thing I wanted to share was the flush:
public static void FlushLogs()
{
bool queueHasValues = true;
while (queueHasValues)
{
//wait for the current iteration to complete
m_waitingThreadEvent.WaitOne();
lock (m_loggerQueueSync)
{
queueHasValues = m_loggerQueue.Count > 0;
}
}
//force MEL to flush all its listeners
foreach (MEL.LogSource logSource in MEL.Logger.Writer.TraceSources.Values)
{
foreach (TraceListener listener in logSource.Listeners)
{
listener.Flush();
}
}
}
I hope that saves someone some frustration. It is especially apparent in parallel processes logging lots of data.
Thanks for sharing your solution, it set me into a good direction!
--Johnny S
I wanted to say that my previous post was kind of useless. You can simply set AutoFlush to true and you will not have to loop through all the listeners. However, I still had a crazy problem with parallel threads trying to flush the logger. I had to create another boolean that was set to true during the copying of the queue and executing the LogEntry writes, and then in the flush routine I had to check that boolean to make sure nothing was in the queue and nothing was getting processed before returning.
Now multiple threads in parallel can hit this thing and when I call flush I know it is really flushed.
public static void FlushLogs()
{
int queueCount;
bool isProcessingLogs;
while (true)
{
//wait for the current iteration to complete
m_waitingThreadEvent.WaitOne();
//check to see if we are currently processing logs
lock (m_isProcessingLogsSync)
{
isProcessingLogs = m_isProcessingLogs;
}
//check to see if more events were added while the logger was processing the last batch
lock (m_loggerQueueSync)
{
queueCount = m_loggerQueue.Count;
}
if (queueCount == 0 && !isProcessingLogs)
break;
//since something is still in the queue or being processed, wait a bit before checking again
Thread.Sleep(400);
}
}
Just an update:
Using Enterprise Library 5.0 with .NET 4.0 it can easily be done by:
static public void LogMessageAsync(LogEntry logEntry)
{
Task.Factory.StartNew(() => LogMessage(logEntry));
}
See:
http://randypaulo.wordpress.com/2011/07/28/c-enterprise-library-asynchronous-logging/
An extra level of indirection may help here.
Your first async method call can put messages onto a synchronized Queue and set an event -- so the locks happen in the thread pool, not on your worker threads -- and then have yet another thread pulling messages off the queue when the event is raised.
If you log something on a separate thread, the message may not be written if the application crashes, which makes it rather useless.
That is why you should always flush after every written entry.
If what you have in mind is a SHARED queue, then I think you are going to have to synchronize the writes to it, the pushes and the pops.
But, I still think it's worth aiming at the shared queue design. In comparison to the IO of logging and probably in comparison to the other work your app is doing, the brief amount of blocking for the pushes and the pops will probably not be significant.

Threading an unknown amount of threads in C#

I'm currently writing a sitemap generator that scrapes a site for URLs and builds an XML sitemap. As most of the time is spent waiting on requests to URIs, I'm using threading, specifically the built-in ThreadPool class.
In order to let the main thread wait for the unknown amount of threads to complete I have implemented the following setup. I don't feel this is a good solution though, can any threading gurus advise me of any problems this solution has, or suggest a better way to implement it?
The EventWaitHandle is set to EventResetMode.ManualReset
Here is the thread method
protected void CrawlUri(object o)
{
try
{
Interlocked.Increment(ref _threadCount);
Uri uri = (Uri)o;
foreach (Match match in _regex.Matches(GetWebResponse(uri)))
{
Uri newUri = new Uri(uri, match.Value);
if (!_uriCollection.Contains(newUri))
{
_uriCollection.Add(newUri);
ThreadPool.QueueUserWorkItem(_waitCallback, newUri);
}
}
}
catch
{
// Handle exceptions
}
finally
{
Interlocked.Decrement(ref _threadCount);
}
// If there are no more threads running then signal the waithandle
if (_threadCount == 0)
_eventWaitHandle.Set();
}
Here is the main thread method
// Request first page (based on host)
Uri root = new Uri(context.Request.Url.GetLeftPart(UriPartial.Authority));
// Begin threaded crawling of the Uri
ThreadPool.QueueUserWorkItem(_waitCallback, root);
Thread.Sleep(5000); // TEMP SOLUTION: Sleep for 5 seconds
_eventWaitHandle.WaitOne();
// Server the Xml Sitemap
context.Response.ContentType = "text/xml";
context.Response.Write(GetXml().OuterXml);
Any ideas are much appreciated :)
Well, first off you can create a ManualResetEvent that starts unset, so you don't have to sleep before waiting on it. Secondly, you're going to need to put thread synchronization around your Uri collection. You could get a race condition where two threads pass the "this Uri does not exist yet" check and they add duplicates. Another race condition is that two threads could pass the if (_threadCount == 0) check and they could both set the event.
Last, you can make the whole thing much more efficient by using the asynchronous BeginGetResponse. Your solution right now keeps a thread around to wait for every request. If you use async methods and callbacks, your program will use less memory (1 MB per thread) and won't need to do nearly as many thread context switches.
Here's an example that should illustrate what I'm talking about. Out of curiosity, I did test it out (with a depth limit) and it does work.
public class CrawlUriTool
{
private Regex regex;
private int pendingRequests;
private List<Uri> uriCollection;
private object uriCollectionSync = new object();
private ManualResetEvent crawlCompletedEvent;
public List<Uri> CrawlUri(Uri uri)
{
this.pendingRequests = 0;
this.uriCollection = new List<Uri>();
this.crawlCompletedEvent = new ManualResetEvent(false);
this.StartUriCrawl(uri);
this.crawlCompletedEvent.WaitOne();
return this.uriCollection;
}
private void StartUriCrawl(Uri uri)
{
Interlocked.Increment(ref this.pendingRequests);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.BeginGetResponse(this.UriCrawlCallback, request);
}
private void UriCrawlCallback(IAsyncResult asyncResult)
{
HttpWebRequest request = asyncResult.AsyncState as HttpWebRequest;
try
{
HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(asyncResult);
string responseText = this.GetTextFromResponse(response); // not included
foreach (Match match in this.regex.Matches(responseText))
{
Uri newUri = new Uri(response.ResponseUri, match.Value);
lock (this.uriCollectionSync)
{
if (!this.uriCollection.Contains(newUri))
{
this.uriCollection.Add(newUri);
this.StartUriCrawl(newUri);
}
}
}
}
catch (WebException exception)
{
// handle exception
}
finally
{
if (Interlocked.Decrement(ref this.pendingRequests) == 0)
{
this.crawlCompletedEvent.Set();
}
}
}
}
When doing this kind of logic I generally try to make an object representing each asynchronous task and the data it needs to run. I would typically add this object to the collection of tasks to be done. The thread pool gets these tasks scheduled, and I would let the object remove itself from the "to be done" collection when the task finishes, possibly signalling on the collection itself.
So you're finished when the "to be done" collection is empty; the main thread is probably awoken once by each task that finishes.
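A rough sketch of that shape (all names here are hypothetical, not from the question):
// Hypothetical sketch: each work item removes itself from the pending set when done;
// the main thread waits until the set is empty.
class PendingWork
{
    private readonly HashSet<object> _pending = new HashSet<object>();
    private readonly object _sync = new object();

    public void Start(Action work)
    {
        var token = new object();
        lock (_sync) { _pending.Add(token); }
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { work(); }
            finally
            {
                lock (_sync)
                {
                    _pending.Remove(token);
                    if (_pending.Count == 0) Monitor.PulseAll(_sync); // wake the waiter
                }
            }
        });
    }

    public void WaitAll()
    {
        lock (_sync)
        {
            while (_pending.Count > 0) Monitor.Wait(_sync);
        }
    }
}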
You could look into the CTP of the Task Parallel Library which should make this simpler for you. What you're doing can be divided into "tasks", chunks or units of work, and the TPL can parallelize this for you if you supply the tasks. It uses a thread pool internally as well, but it's easier to use and comes with a lot of options like waiting for all tasks to finish. Check out this Channel9 video where the possibilities are explained and where a demo is shown of traversing a tree recursively in parallel, which seems very applicable to your problem.
However, it's still a preview and won't be released until .NET 4.0, so it comes with no warranties and you'll have to manually include the supplied System.Threading.dll (found in the install folder) into your project and I don't know if that's an option to you.
