Threadpool/WaitHandle resource leak/crash - c#

I think I may need to re-think my design. I'm having a hard time narrowing down a bug that is causing my computer to completely hang, sometimes throwing an HRESULT 0x8007000E from VS 2010.
I have a console application (that I will later convert to a service) that handles transferring files based on a database queue.
I am throttling the threads allowed to transfer. This is because some systems we are connecting to can only contain a certain number of connections from certain accounts.
For example, System A can only accept 3 simultaneous connections (which means 3 separate threads). Each one of these threads has their own unique connection object, so we shouldn't run in to any synchronization problems since they aren't sharing a connection.
We want to process the files from those systems in cycles. So, for example, we will allow 3 connections that can transfer up to 100 files per connection. This means, to move 1000 files from System A, we can only process 300 files per cycle, since 3 threads are allowed with 100 files each. Therefore, over the lifetime of this transfer, we will have 10 threads. We can only run 3 at a time. So, there will be 3 cycles, and the last cycle will only use 1 thread to transfer the last 100 files. (3 threads x 100 files = 300 files per cycle)
The current architecture by example is:
A System.Threading.Timer checks the queue every 5 seconds for something to do by calling GetScheduledTask()
If there's nothing to, GetScheduledTask() simply does nothing
If there is work, create a ThreadPool thread to process the work [Work Thread A]
Work Thread A sees that there are 1000 files to transfer
Work Thread A sees that it can only have 3 threads running to the system it is getting files from
Work Thread A starts three new work threads [B,C,D] and transfers
Work Thread A waits for B,C,D [WaitHandle.WaitAll(transfersArray)]
Work Thread A sees that there are still more files in the queue (should be 700 now)
Work Thread A creates a new array to wait on [transfersArray = new TransferArray[3] which is the max for System A, but could vary on system
Work Thread A starts three new work threads [B,C,D] and waits for them [WaitHandle.WaitAll(transfersArray)]
The process repeats until there are no more files to move.
Work Thread A signals that it is done
I am using ManualResetEvent to handle the signaling.
My questions are:
Is there any glaring circumstance which would cause a resource leak or problem that I am experiencing?
Should I loop thru the array after every WaitHandle.WaitAll(array) and call array[index].Dispose()?
The Handle count under the Task Manager for this process slowly creeps up
I am calling the initial creation of Worker Thread A from a System.Threading.Timer. Is there going to be any problems with this? The code for that timer is:
(Some class code for scheduling)
private ManualResetEvent _ResetEvent;
private void Start()
{
_IsAlive = true;
ManualResetEvent transferResetEvent = new ManualResetEvent(false);
//Set the scheduler timer to 5 second intervals
_ScheduledTasks = new Timer(new TimerCallback(ScheduledTasks_Tick), transferResetEvent, 200, 5000);
}
private void ScheduledTasks_Tick(object state)
{
ManualResetEvent resetEvent = null;
try
{
resetEvent = (ManualResetEvent)state;
//Block timer until GetScheduledTasks() finishes
_ScheduledTasks.Change(Timeout.Infinite, Timeout.Infinite);
GetScheduledTasks();
}
finally
{
_ScheduledTasks.Change(5000, 5000);
Console.WriteLine("{0} [Main] GetScheduledTasks() finished", DateTime.Now.ToString("MMddyy HH:mm:ss:fff"));
resetEvent.Set();
}
}
private void GetScheduledTask()
{
try
{
//Check to see if the database connection is still up
if (!_IsAlive)
{
//Handle
_ConnectionLostNotification = true;
return;
}
//Get scheduled records from the database
ISchedulerTask task = null;
using (DataTable dt = FastSql.ExecuteDataTable(
_ConnectionString, "hidden for security", System.Data.CommandType.StoredProcedure,
new List<FastSqlParam>() { new FastSqlParam(ParameterDirection.Input, SqlDbType.VarChar, "#ProcessMachineName", Environment.MachineName) })) //call to static class
{
if (dt != null)
{
if (dt.Rows.Count == 1)
{ //Only 1 row is allowed
DataRow dr = dt.Rows[0];
//Get task information
TransferParam.TaskType taskType = (TransferParam.TaskType)Enum.Parse(typeof(TransferParam.TaskType), dr["TaskTypeId"].ToString());
task = ScheduledTaskFactory.CreateScheduledTask(taskType);
task.Description = dr["Description"].ToString();
task.IsEnabled = (bool)dr["IsEnabled"];
task.IsProcessing = (bool)dr["IsProcessing"];
task.IsManualLaunch = (bool)dr["IsManualLaunch"];
task.ProcessMachineName = dr["ProcessMachineName"].ToString();
task.NextRun = (DateTime)dr["NextRun"];
task.PostProcessNotification = (bool)dr["NotifyPostProcess"];
task.PreProcessNotification = (bool)dr["NotifyPreProcess"];
task.Priority = (TransferParam.Priority)Enum.Parse(typeof(TransferParam.SystemType), dr["PriorityId"].ToString());
task.SleepMinutes = (int)dr["SleepMinutes"];
task.ScheduleId = (int)dr["ScheduleId"];
task.CurrentRuns = (int)dr["CurrentRuns"];
task.TotalRuns = (int)dr["TotalRuns"];
SchedulerTask scheduledTask = new SchedulerTask(new ManualResetEvent(false), task);
//Queue up task to worker thread and start
ThreadPool.QueueUserWorkItem(new WaitCallback(this.ThreadProc), scheduledTask);
}
}
}
}
catch (Exception ex)
{
//Handle
}
}
private void ThreadProc(object taskObject)
{
SchedulerTask task = (SchedulerTask)taskObject;
ScheduledTaskEngine engine = null;
try
{
engine = SchedulerTaskEngineFactory.CreateTaskEngine(task.Task, _ConnectionString);
engine.StartTask(task.Task);
}
catch (Exception ex)
{
//Handle
}
finally
{
task.TaskResetEvent.Set();
task.TaskResetEvent.Dispose();
}
}

0x8007000E is an out-of-memory error. That and the handle count seem to point to a resource leak. Ensure you're disposing of every object that implements IDisposable. This includes the arrays of ManualResetEvents you're using.
If you have time, you may also want to convert to using the .NET 4.0 Task class; it was designed to handle complex scenarios like this much more cleanly. By defining child Task objects, you can reduce your overall thread count (threads are quite expensive not only because of scheduling but also because of their stack space).

I'm looking for answers to a similar problem (Handles Count increasing over time).
I took a look at your application architecture and like to suggest you something that could help you out:
Have you heard about IOCP (Input Output Completion Ports).
I'm not sure of the dificulty to implement this using C# but in C/C++ it is a piece of cake.
By using this you create a unique thread pool (The number of threads in that pool is in general defined as 2 x the number of processors or processors cores in the PC or server)
You associate this pool to a IOCP Handle and the pool does the work.
See the help for these functions:
CreateIoCompletionPort();
PostQueuedCompletionStatus();
GetQueuedCompletionStatus();
In General creating and exiting threads on the fly could be time consuming and leads to performance penalties and memory fragmentation.
There are thousands of literature about IOCP in MSDN and in google.

I think you should reconsider your architecture altogether. The fact that you can only have 3 simultaneously connections is almost begging you to use 1 thread to generate the list of files and 3 threads to process them. Your producer thread would insert all files into a queue and the 3 consumer threads will dequeue and continue processing as items arrive in the queue. A blocking queue can significantly simplify the code. If you are using .NET 4.0 then you can take advantage of the BlockingCollection class.
public class Example
{
private BlockingCollection<string> m_Queue = new BlockingCollection<string>();
public void Start()
{
var threads = new Thread[]
{
new Thread(Producer),
new Thread(Consumer),
new Thread(Consumer),
new Thread(Consumer)
};
foreach (Thread thread in threads)
{
thread.Start();
}
}
private void Producer()
{
while (true)
{
Thread.Sleep(TimeSpan.FromSeconds(5));
ScheduledTask task = GetScheduledTask();
if (task != null)
{
foreach (string file in task.Files)
{
m_Queue.Add(task);
}
}
}
}
private void Consumer()
{
// Make a connection to the resource that is assigned to this thread only.
while (true)
{
string file = m_Queue.Take();
// Process the file.
}
}
}
I have definitely oversimplified things in the example above, but I hope you get the general idea. Notice how this is much simpler as there is not much in the way of thread synchronization (most will be embedded in the blocking queue) and of course there is no use of WaitHandle objects. Obviously you would have to add in the correct mechanisms to shut down the threads gracefully, but that should be fairly easy.

It turns out the source of this strange problem was not related to architecture but rather because of converting the solution from 3.5 to 4.0. I re-created the solution, performing no code changes, and the problem never occurred again.

Related

Free up memory once Thread finished its task

Put break point before Thread start and you will notice console consuming about 5 to 8 MB memory but once Thread started it spike to 17 to 20 MB memory. And this memory stay used until close console. How can i freeup memory after Thread finished it task? Any better solution?
Now question is: Why i need it since garbage collector will automatically free up memory when needed. I need it because i am doing web scraping and i got a global class to store all scraped html text there and i have to scrape like 10k pages and store that html to global class. What happen is: when i run this app after scrape 500 html data to global class it eat almost 100% of my pc RAM which is 20 GB. So i need to free up RAM. I cant close console app to free up ram bcoz i have some calculation after collect all html.
class DemoData
{
public int Id { get; set; }
public string Text { get; set; }
public static List<DemoData> data = new List<DemoData>();
}
class Program
{
public static void Main()
{
for (var i = 0; i < 5000; i++)
{
DemoData.data.Add(new DemoData
{
Id = i,
Text = "something....",
});
}
foreach (var item in DemoData.data)
{
var t = new Thread(new ThreadStart(DoSomething));//put break point here and see.
t.Name = item.Id.ToString(); ;
t.Start();
}
Console.WriteLine("wait");
Console.ReadLine();
}
public static void DoSomething()
{
Thread thr = Thread.CurrentThread;
Console.WriteLine(thr.Name);
}
}
you just need to wait for for the garbage collector to run, your current app isn't complex enough to require this except on close so you aren't seeing this occur,
so the framework will fire off the GC when it feels it is needed, this will then look though the callstack and decide which objects it no longer needs and delete them freeing up memory, this will happen completely automatically with out you needing to do anything.
if you want to help the GC out you can check your variable scopes so that the GC can see its not needed anymore because its completely out of scope, so avoiding global variables, not creating links between data objects that aren't needed making correct choices between value and reference types, ect
however if you ever do end up in a position where you need to manually fire the GC you can call GC.Collect()
see
https://learn.microsoft.com/en-us/dotnet/api/system.gc.collect?view=net-5.0
You might need to use a memory profiler to check for potential memory leaks.
A possible reason for your issues is the large number of threads, there is no reason to use more threads than there are cpu cores. Each thread used will need some memory for a stack and other house keeping.
One fairly simple way would be to put addresses that needs visiting in a collection, and use a parallel.Foreach loop. This will try to adjust the number of threads used to maximize thruput. More complex variants could use one of the concurrent collections and multiple consumers. There is also async variants of many IO calls to avoid the memory overhead of blocking a thread while waiting for IO. I would recommend reading some examples of multiple producers/consumers pattern for more details.
if your problem is feeding the data into the threads for processing then i would suggest you look at https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/ this allows you to then create a thread safe processing buffer
here is working example, note this is using the .net 5 syntax
using System;
using System.Threading.Channels;
using System.Net.Http;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Linq;
var sites = Enumerable.Range(0, 5000).Select(i => #"http:\\www.example.com");
//create thread safe buffer no limit on size
var sitebuffer = Channel.CreateUnbounded<string>();
//create thread safe buffer limited to 10 elements
var htmlbuffer = Channel.CreateBounded<string>(10);
async Task Feed()
{
//while the buffer hasn't closed, wait for new data to be available
while(await sitebuffer.Reader.WaitToReadAsync())
{
//read the next available url from the buffer
var uri = await sitebuffer.Reader.ReadAsync();
var http = new HttpClient();
var html= await http.GetAsync(uri);
Console.WriteLine("reading site");
//load the return text to the htmlbuffer, if buffer is full wait for space
await htmlbuffer.Writer.WriteAsync(await html.Content.ReadAsStringAsync());
Console.WriteLine("reading site complete");
}
}
async Task Process()
{
//while the buffer hasn't closed, wait for new data to be available
while (await htmlbuffer.Reader.WaitToReadAsync())
{
//read html from buffer send to doSomething then read next element
var html = await htmlbuffer.Reader.ReadAsync();
await doSomethingWithHTML(html);
}
}
async Task doSomethingWithHTML(string html)
{
await Task.Delay(2);
Console.WriteLine("done something");
}
//start 4 feeders threads
var feeders = new[]
{
Feed(),
Feed(),
Feed(),
Feed(),
};
//start 2 worker threads
var workers = new[]
{
Process(),
Process(),
};
//start of feeding in sites
foreach (var item in sites)
{
await sitebuffer.Writer.WriteAsync(item);
}
//mark that all sites have been fed into the systems
sitebuffer.Writer.Complete();
//wait for all feeders to finish
await Task.WhenAll(feeders);
//mark that no more sites will be read
htmlbuffer.Writer.Complete();
//wait for all workers to finish
await Task.WhenAll(workers);
Console.WriteLine("all tasks complete");
notice the async Task's and awaits this is a newer wrapper around threads that simplifies a lot of the complexity in managing threads

How to use tasks with ConcurrentDictionary

I have to write a program where I'm reading from a database the queues to process and all the queues are run in parallel and managed on the parent thread using a ConcurrentDictionary.
I have a class that represents the queue, which has a constructor that takes in the queue information and the parent instance handle. The queue class also has the method that processes the queue.
Here is the Queue Class:
Class MyQueue {
protected ServiceExecution _parent;
protect string _queueID;
public MyQueue(ServiceExecution parentThread, string queueID)
{
_parent = parentThread;
_queueID = queueID;
}
public void Process()
{
try
{
//Do work to process
}
catch()
{
//exception handling
}
finally{
_parent.ThreadFinish(_queueID);
}
The parent thread loops through the dataset of queues and instantiates a new queue class. It spawns a new thread to execute the Process method of the Queue object asynchronously. This thread is added to the ConcurrentDictionary and then started as follows:
private ConcurrentDictionary<string, MyQueue> _runningQueues = new ConcurrentDictionary<string, MyQueue>();
Foreach(datarow dr in QueueDataset.rows)
{
MyQueue queue = new MyQueue(this, dr["QueueID"].ToString());
Thread t = new Thread(()=>queue.Process());
if(_runningQueues.TryAdd(dr["QueueID"].ToString(), queue)
{
t.start();
}
}
//Method that gets called by the queue thread when it finishes
public void ThreadFinish(string queueID)
{
MyQueue queue;
_runningQueues.TryRemove(queueID, out queue);
}
I have a feeling this is not the right approach to manage the asynchronous queue processing and I'm wondering if perhaps I can run into deadlocks with this design? Furthermore, I would like to use Tasks to run the queues asynchronously instead of the new Threads. I need to keep track of the queues because I will not spawn a new thread or task for the same queue if the previous run is not complete yet. What is the best way to handle this type of parallelism?
Thanks in advance!
About your current approach
Indeed it is not the right approach. High number of queues read from database will spawn high number of threads which might be bad. You will create a new thread each time. Better to create some threads and then re-use them. And if you want tasks, better to create LongRunning tasks and re-use them.
Suggested Design
I'd suggest the following design:
Reserve only one task to read queues from the database and put those queues in a BlockingCollection;
Now start multiple LongRunning tasks to read a queue each from that BlockingCollection and process that queue;
When a task is done with processing the queue it took from the BlockingCollection, it will then take another queue from that BlockingCollection;
Optimize the number of these processing tasks so as to properly utilize the cores of your CPU. Usually since DB interactions are slow, you can create tasks 3 times more than the number of cores however YMMV.
Deadlock possibility
They will at least not happen at the application side. However, since the queues are of database transactions, the deadlock may happen at the database end. You may have to write some logic to make your task start a transaction again if the database rolled it back because of deadlock.
Sample Code
private static void TaskDesignedRun()
{
var expectedParallelQueues = 1024; //Optimize it. I've chosen it randomly
var parallelProcessingTaskCount = 4 * Environment.ProcessorCount; //Optimize this too.
var baseProcessorTaskArray = new Task[parallelProcessingTaskCount];
var taskFactory = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.None);
var itemsToProcess = new BlockingCollection<MyQueue>(expectedParallelQueues);
//Start a new task to populate the "itemsToProcess"
taskFactory.StartNew(() =>
{
// Add code to read queues and add them to itemsToProcess
Console.WriteLine("Done reading all the queues...");
// Finally signal that you are done by saying..
itemsToProcess.CompleteAdding();
});
//Initializing the base tasks
for (var index = 0; index < baseProcessorTaskArray.Length; index++)
{
baseProcessorTaskArray[index] = taskFactory.StartNew(() =>
{
while (!itemsToProcess.IsAddingCompleted && itemsToProcess.Count != 0) {
MyQueue q;
if (!itemsToProcess.TryTake(out q)) continue;
//Process your queue
}
});
}
//Now just wait till all queues in your database have been read and processed.
Task.WaitAll(baseProcessorTaskArray);
}

Adding a multithreading scenario for an application in c#

I have developed an application in c#. The class structure is as follows.
Form1 => The UI form. Has a backgroundworker, processbar, and a "ok" button.
SourceReader, TimedWebClient, HttpWorker, ReportWriter //clases do some work
Controller => Has the all over control. From "ok" button click an instance of this class called "cntrl" is created. This cntrlr is a global variable in Form1.cs.
(At the constructor of the Controler I create SourceReader, TimedWebClient,HttpWorker,ReportWriter instances. )
Then I call the RunWorkerAsync() of the background worker.
Within it code is as follows.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
int iterator = 1;
for (iterator = 1; iterator <= this.urlList.Count; iterator++)
{
cntrlr.Vmain(iterator-1);
backgroundWorker1.ReportProgress(iterator);
}
}
At themoment ReportProgress updates the progressbar.
The urlList mentioned above has 1000 of urls. cntlr.Vamin(int i) process the whole process at themoment. I want to give the task to several threads, each one having to process 100 of urls. Though access for other instances or methods of them is not prohibited, access to ReportWriter should be limited to only one thread at a time. I can't find a way to do this. If any one have an idea or an answer, please explain.
If you do want to restrict multiple threads using the same method concurrently then I would use the Semaphore class to facilitate the required thread limit; here's how...
A semaphore is like a mean night club bouncer, it has been provide a club capacity and is not allowed to exceed this limit. Once the club is full, no one else can enter... A queue builds up outside. Then as one person leaves another can enter (analogy thanks to J. Albahari).
A Semaphore with a value of one is equivalent to a Mutex or Lock except that the Semaphore has no owner so that it is thread ignorant. Any thread can call Release on a Semaphore whereas with a Mutex/Lock only the thread that obtained the Mutex/Lock can release it.
Now, for your case we are able to use Semaphores to limit concurrency and prevent too many threads from executing a particular piece of code at once. In the following example five threads try to enter a night club that only allows entry to three...
class BadAssClub
{
static SemaphoreSlim sem = new SemaphoreSlim(3);
static void Main()
{
for (int i = 1; i <= 5; i++)
new Thread(Enter).Start(i);
}
// Enfore only three threads running this method at once.
static void Enter(int i)
{
try
{
Console.WriteLine(i + " wants to enter.");
sem.Wait();
Console.WriteLine(i + " is in!");
Thread.Sleep(1000 * (int)i);
Console.WriteLine(i + " is leaving...");
}
finally
{
sem.Release();
}
}
}
Note, that SemaphoreSlim is a lighter weight version of the Semaphore class and incurs about a quarter of the overhead. it is sufficient for what you require.
I hope this helps.
I think I would have used the ThreadPool, instead of background worker, and given each thread 1, not 100 url's to process. The thread pool will limit the number of threads it starts at once, so you wont have to worry about getting 1000 requests at once. Have a look here for a good example
http://msdn.microsoft.com/en-us/library/3dasc8as.aspx
Feeling a little more adventurous? Consider using TPL DataFlow to download a bunch of urls:
var urls = new[]{
"http://www.google.com",
"http://www.microsoft.com",
"http://www.apple.com",
"http://www.stackoverflow.com"};
var tb = new TransformBlock<string, string>(async url => {
using(var wc = new WebClient())
{
var data = await wc.DownloadStringTaskAsync(url);
Console.WriteLine("Downloaded : {0}", url);
return data;
}
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 4});
var ab = new ActionBlock<string>(data => {
//process your data
Console.WriteLine("data length = {0}", data.Length);
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 1});
tb.LinkTo(ab); //join output of producer to consumer block
foreach(var u in urls)
{
tb.Post(u);
}
tb.Complete();
Note how you can control the parallelism of each block explicitly, so you can gather in parallel but process without going concurrent (for example).
Just grab it with nuget. Easy.

Shared Object Pool without Thread.Sleep?

I have developed an "object pool" and cannot seem to do it without using Thread.Sleep() which is "bad practice" I believe.
This relates to my other question "Is there a standard way of implementing a proprietary connection pool in .net?". The idea behind the object pool is similar to the one behind the connection pool used for database connections. However, in my case I am using it to share a limited resource in a standard ASP.NET Web Service (running in IIS6). This means that many threads will be requesting access to this limited resource. The pool would dish out these objects (a "Get) and once all the available pool objects have been used, the next thread requesting one would simply waits a set amount of time for one of these object to become available again (a thread would do a "Put" once done with the object). If an object does not become available in this set time, a timeout error will occur.
Here is the code:
public class SimpleObjectPool
{
private const int cMaxGetTimeToWaitInMs = 60000;
private const int cMaxGetSleepWaitInMs = 10;
private object fSyncRoot = new object();
private Queue<object> fQueue = new Queue<object>();
private SimpleObjectPool()
{
}
private static readonly SimpleObjectPool instance = new SimpleObjectPool();
public static SimpleObjectPool Instance
{
get
{
return instance;
}
}
public object Get()
{
object aObject = null;
for (int i = 0; i < (cMaxGetTimeToWaitInMs / cMaxGetSleepWaitInMs); i++)
{
lock (fSyncRoot)
{
if (fQueue.Count > 0)
{
aObject = fQueue.Dequeue();
break;
}
}
System.Threading.Thread.Sleep(cMaxGetSleepWaitInMs);
}
if (aObject == null)
throw new Exception("Timout on waiting for object from pool");
return aObject;
}
public void Put(object aObject)
{
lock (fSyncRoot)
{
fQueue.Enqueue(aObject);
}
}
}
To use use it, one would do the following:
public void ExampleUse()
{
PoolObject lTestObject = (PoolObject)SimpleObjectPool.Instance.Get();
try
{
// Do something...
}
finally
{
SimpleObjectPool.Instance.Put(lTestObject);
}
}
Now the question I have is: How do I write this so I get rid of the Thread.Sleep()?
(Why I want to do this is because I suspect that it is responsible for the "false" timeout I am getting in my testing. My test application has a object pool with 3 objects in it. It spins up 12 threads and each thread gets an object from the pool 100 times. If the thread gets an object from the pool, it holds on to if for 2,000 ms, if it does not, it goes to the next iteration. Now logic dictates that 9 threads will be waiting for an object at any point in time. 9 x 2,000 ms is 18,000 ms which is the maximum time any thread should have to wait for an object. My get timeout is set to 60,000 ms so no thread should ever timeout. However some do so something is wrong and I suspect its the Thread.Sleep)
Since you are already using lock, consider using Monitor.Wait and Monitor.Pulse
In Get():
lock (fSyncRoot)
{
while (fQueue.Count < 1)
Monitor.Wait(fSyncRoot);
aObject = fQueue.Dequeue();
}
And in Put():
lock (fSyncRoot)
{
fQueue.Enqueue(aObject);
if (fQueue.Count == 1)
Monitor.Pulse(fSyncRoot);
}
you should be using a semaphore.
http://msdn.microsoft.com/en-us/library/system.threading.semaphore.aspx
UPDATE:
Semaphores are one of the basic constructs of multi-threaded programming.
A semaphore can be used different ways, but the basic idea is when you have a limited resource and many clients who want to use that resource, you can limit the number of clients that can access the resource at any given time.
below is a very crude example. I didn't add any error checking or try/finally blocks but you should.
You can also check:
http://en.wikipedia.org/wiki/Semaphore_(programming)
Say you have 10 buckets and 100 people who want to use those buckets.
We can represent the buckets in a queue.
At the start, add all of your buckets to the queue
for(int i=0;i<10;i++)
{
B.Push(new Bucket());
}
Now create a semaphore to guard your bucket queue. This semaphore is created with no items triggered and a capacity of 10.
Semaphore s = new Semaphore(0, 10);
All clients should check the semaphore before accessing the queue. You might have 100 threads running the thread method below. The first 10 will pass the semaphore. All others will wait.
void MyThread()
{
while(true)
{
// thread will wait until the semaphore is triggered once
// there are other ways to call this which allow you to pass a timeout
s.WaitOne();
// after being triggered once, thread is clear to get an item from the queue
Bucket b = null;
// you still need to lock because more than one thread can pass the semaphore at the sam time.
lock(B_Lock)
{
b = B.Pop();
}
b.UseBucket();
// after you finish using the item, add it back to the queue
// DO NOT keep the queue locked while you are using the item or no other thread will be able to get anything out of it
lock(B_Lock)
{
B.Push(b);
}
// after adding the item back to the queue, trigger the semaphore and allow
// another thread to enter
s.Release();
}
}

How to effectively log asynchronously?

I am using Enterprise Library 4 on one of my projects for logging (and other purposes). I've noticed that there is some cost to the logging that I am doing that I can mitigate by doing the logging on a separate thread.
The way I am doing this now is that I create a LogEntry object and then I call BeginInvoke on a delegate that calls Logger.Write.
new Action<LogEntry>(Logger.Write).BeginInvoke(le, null, null);
What I'd really like to do is add the log message to a queue and then have a single thread pulling LogEntry instances off the queue and performing the log operation. The benefit of this would be that logging is not interfering with the executing operation and not every logging operation results in a job getting thrown on the thread pool.
How can I create a shared queue that supports many writers and one reader in a thread safe way? Some examples of a queue implementation that is designed to support many writers (without causing synchronization/blocking) and a single reader would be really appreciated.
Recommendation regarding alternative approaches would also be appreciated, I am not interested in changing logging frameworks though.
I wrote this code a while back, feel free to use it.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace MediaBrowser.Library.Logging {
public abstract class ThreadedLogger : LoggerBase {
Queue<Action> queue = new Queue<Action>();
AutoResetEvent hasNewItems = new AutoResetEvent(false);
volatile bool waiting = false;
public ThreadedLogger() : base() {
Thread loggingThread = new Thread(new ThreadStart(ProcessQueue));
loggingThread.IsBackground = true;
loggingThread.Start();
}
void ProcessQueue() {
while (true) {
waiting = true;
hasNewItems.WaitOne(10000,true);
waiting = false;
Queue<Action> queueCopy;
lock (queue) {
queueCopy = new Queue<Action>(queue);
queue.Clear();
}
foreach (var log in queueCopy) {
log();
}
}
}
public override void LogMessage(LogRow row) {
lock (queue) {
queue.Enqueue(() => AsyncLogMessage(row));
}
hasNewItems.Set();
}
protected abstract void AsyncLogMessage(LogRow row);
public override void Flush() {
while (!waiting) {
Thread.Sleep(1);
}
}
}
}
Some advantages:
It keeps the background logger alive, so it does not need to spin up and spin down threads.
It uses a single thread to service the queue, which means there will never be a situation where 100 threads are servicing the queue.
It copies the queues to ensure the queue is not blocked while the log operation is performed
It uses an AutoResetEvent to ensure the bg thread is in a wait state
It is, IMHO, very easy to follow
Here is a slightly improved version, keep in mind I performed very little testing on it, but it does address a few minor issues.
public abstract class ThreadedLogger : IDisposable {
Queue<Action> queue = new Queue<Action>();
ManualResetEvent hasNewItems = new ManualResetEvent(false);
ManualResetEvent terminate = new ManualResetEvent(false);
ManualResetEvent waiting = new ManualResetEvent(false);
Thread loggingThread;
public ThreadedLogger() {
loggingThread = new Thread(new ThreadStart(ProcessQueue));
loggingThread.IsBackground = true;
// this is performed from a bg thread, to ensure the queue is serviced from a single thread
loggingThread.Start();
}
void ProcessQueue() {
while (true) {
waiting.Set();
int i = ManualResetEvent.WaitAny(new WaitHandle[] { hasNewItems, terminate });
// terminate was signaled
if (i == 1) return;
hasNewItems.Reset();
waiting.Reset();
Queue<Action> queueCopy;
lock (queue) {
queueCopy = new Queue<Action>(queue);
queue.Clear();
}
foreach (var log in queueCopy) {
log();
}
}
}
public void LogMessage(LogRow row) {
lock (queue) {
queue.Enqueue(() => AsyncLogMessage(row));
}
hasNewItems.Set();
}
protected abstract void AsyncLogMessage(LogRow row);
public void Flush() {
waiting.WaitOne();
}
public void Dispose() {
terminate.Set();
loggingThread.Join();
}
}
Advantages over the original:
It's disposable, so you can get rid of the async logger
The flush semantics are improved
It will respond slightly better to a burst followed by silence
Yes, you need a producer/consumer queue. I have one example of this in my threading tutorial - if you look my "deadlocks / monitor methods" page you'll find the code in the second half.
There are plenty of other examples online, of course - and .NET 4.0 will ship with one in the framework too (rather more fully featured than mine!). In .NET 4.0 you'd probably wrap a ConcurrentQueue<T> in a BlockingCollection<T>.
The version on that page is non-generic (it was written a long time ago) but you'd probably want to make it generic - it would be trivial to do.
You would call Produce from each "normal" thread, and Consume from one thread, just looping round and logging whatever it consumes. It's probably easiest just to make the consumer thread a background thread, so you don't need to worry about "stopping" the queue when your app exits. That does mean there's a remote possibility of missing the final log entry though (if it's half way through writing it when the app exits) - or even more if you're producing faster than it can consume/log.
Here is what I came up with... also see Sam Saffron's answer. This answer is community wiki in case there are any problems that people see in the code and want to update.
/// <summary>
/// A singleton queue that manages writing log entries to the different logging sources (Enterprise Library Logging) off the executing thread.
/// This queue ensures that log entries are written in the order that they were executed and that logging is only utilizing one thread (backgroundworker) at any given time.
/// </summary>
public class AsyncLoggerQueue
{
//create singleton instance of logger queue
public static AsyncLoggerQueue Current = new AsyncLoggerQueue();
private static readonly object logEntryQueueLock = new object();
private Queue<LogEntry> _LogEntryQueue = new Queue<LogEntry>();
private BackgroundWorker _Logger = new BackgroundWorker();
private AsyncLoggerQueue()
{
//configure background worker
_Logger.WorkerSupportsCancellation = false;
_Logger.DoWork += new DoWorkEventHandler(_Logger_DoWork);
}
public void Enqueue(LogEntry le)
{
//lock during write
lock (logEntryQueueLock)
{
_LogEntryQueue.Enqueue(le);
//while locked check to see if the BW is running, if not start it
if (!_Logger.IsBusy)
_Logger.RunWorkerAsync();
}
}
private void _Logger_DoWork(object sender, DoWorkEventArgs e)
{
while (true)
{
LogEntry le = null;
bool skipEmptyCheck = false;
lock (logEntryQueueLock)
{
if (_LogEntryQueue.Count <= 0) //if queue is empty than BW is done
return;
else if (_LogEntryQueue.Count > 1) //if greater than 1 we can skip checking to see if anything has been enqueued during the logging operation
skipEmptyCheck = true;
//dequeue the LogEntry that will be written to the log
le = _LogEntryQueue.Dequeue();
}
//pass LogEntry to Enterprise Library
Logger.Write(le);
if (skipEmptyCheck) //if LogEntryQueue.Count was > 1 before we wrote the last LogEntry we know to continue without double checking
{
lock (logEntryQueueLock)
{
if (_LogEntryQueue.Count <= 0) //if queue is still empty than BW is done
return;
}
}
}
}
}
I suggest to start with measuring actual performance impact of logging on the overall system (i.e. by running profiler) and optionally switching to something faster like log4net (I've personally migrated to it from EntLib logging a long time ago).
If this does not work, you can try using this simple method from .NET Framework:
ThreadPool.QueueUserWorkItem
Queues a method for execution. The method executes when a thread pool thread becomes available.
MSDN Details
If this does not work either then you can resort to something like John Skeet has offered and actually code the async logging framework yourself.
In response to Sam Safrons post, I wanted to call flush and make sure everything was really finished writting. In my case, I am writing to a database in the queue thread and all my log events were getting queued up but sometimes the application stopped before everything was finished writing which is not acceptable in my situation. I changed several chunks of your code but the main thing I wanted to share was the flush:
public static void FlushLogs()
{
bool queueHasValues = true;
while (queueHasValues)
{
//wait for the current iteration to complete
m_waitingThreadEvent.WaitOne();
lock (m_loggerQueueSync)
{
queueHasValues = m_loggerQueue.Count > 0;
}
}
//force MEL to flush all its listeners
foreach (MEL.LogSource logSource in MEL.Logger.Writer.TraceSources.Values)
{
foreach (TraceListener listener in logSource.Listeners)
{
listener.Flush();
}
}
}
I hope that saves someone some frustration. It is especially apparent in parallel processes logging lots of data.
Thanks for sharing your solution, it set me into a good direction!
--Johnny S
I wanted to say that my previous post was kind of useless. You can simply set AutoFlush to true and you will not have to loop through all the listeners. However, I still had crazy problem with parallel threads trying to flush the logger. I had to create another boolean that was set to true during the copying of the queue and executing the LogEntry writes and then in the flush routine I had to check that boolean to make sure something was not already in the queue and the nothing was getting processed before returning.
Now multiple threads in parallel can hit this thing and when I call flush I know it is really flushed.
public static void FlushLogs()
{
int queueCount;
bool isProcessingLogs;
while (true)
{
//wait for the current iteration to complete
m_waitingThreadEvent.WaitOne();
//check to see if we are currently processing logs
lock (m_isProcessingLogsSync)
{
isProcessingLogs = m_isProcessingLogs;
}
//check to see if more events were added while the logger was processing the last batch
lock (m_loggerQueueSync)
{
queueCount = m_loggerQueue.Count;
}
if (queueCount == 0 && !isProcessingLogs)
break;
//since something is in the queue, reset the signal so we will not keep looping
Thread.Sleep(400);
}
}
Just an update:
Using enteprise library 5.0 with .NET 4.0 it can easily be done by:
static public void LogMessageAsync(LogEntry logEntry)
{
Task.Factory.StartNew(() => LogMessage(logEntry));
}
See:
http://randypaulo.wordpress.com/2011/07/28/c-enterprise-library-asynchronous-logging/
An extra level of indirection may help here.
Your first async method call can put messages onto a synchonized Queue and set an event -- so the locks are happening in the thread-pool, not on your worker threads -- and then have yet another thread pulling messages off the queue when the event is raised.
If you log something on a separate thread, the message may not be written if the application crashes, which makes it rather useless.
The reason goes why you should always flush after every written entry.
If what you have in mind is a SHARED queue, then I think you are going to have to synchronize the writes to it, the pushes and the pops.
But, I still think it's worth aiming at the shared queue design. In comparison to the IO of logging and probably in comparison to the other work your app is doing, the brief amount of blocking for the pushes and the pops will probably not be significant.

Categories

Resources