C# simple serial task queue memory leak

C# simple serial task queue memory leak - c#

I needed a very basic serial execution queue.
I wrote the following based on this idea, but I needed a queue to ensure FIFO, so I added a intermediate ConcurrentQueue<>.
Here is the code:
public class SimpleSerialTaskQueue
{
private SemaphoreSlim _semaphore = new SemaphoreSlim(0);
private ConcurrentQueue<Func<Task>> _taskQueue = new ConcurrentQueue<Func<Task>>();
public SimpleSerialTaskQueue()
{
Task.Run(async () =>
{
Func<Task> dequeuedTask;
while (true)
{
if (await _semaphore.WaitAsync(1000))
{
if (_taskQueue.TryDequeue(out dequeuedTask) == true)
{
await dequeuedTask();
}
}
else
{
Console.WriteLine("Nothing more to process");
//If I don't do that , memory pressure is never released
//GC.Collect();
}
}
});
}
public void Add(Func<Task> o_task)
{
_taskQueue.Enqueue(o_task);
_semaphore.Release();
}
}
When I run that in a loop, simulating heavy load, I get some kind of memory leak. Here is the code:
static void Main(string[] args)
{
SimpleSerialTaskQueue queue = new SimpleSerialTaskQueue();
for (int i = 0; i < 100000000; i++)
{
queue.Add(async () =>
{
await Task.Delay(0);
});
}
Console.ReadLine();
}
EDIT:
I don't understand why once the tasks have been executed, I still get like 750MB of memory used (based on VS2015 diagnostic tools). I thought once executed it would be very low. The GC doesnt seem to collect anything.
Can anyone tell me what is happening? Is this related to the state machine

Related

Multi Thread & Semaphore & Events

I am trying to execute some commands that come from RabbitMQ. Its about 5 msgs/sec. So as are too many msg, I have to send to a thread to execute, but I dont have so many threads, so I put a limit of 10.
so the ideia was that the msgs would come to the worker, put in a queue and any of the 10 threads would peak and execute. All these using semaphore.
After some experiments, I don´t know why, but my thread only executes 3 or 4 items, after that it just stops with no error...
The problem I think is the logic when the event calls the method to execute, could not think in a better way...
Why just the first 4 msgs are processed??
What pattern or better way to do this?
Here are some parts of my code:
const int MaxThreads = 10;
private static Semaphore sem = new Semaphore(MaxThreads, MaxThreads);
private static Queue<BasicDeliverEventArgs> queue = new Queue<BasicDeliverEventArgs>();
static void Main(string[] args)
{
consumer.Received += (sender, ea) =>
{
var m = JsonConvert.DeserializeObject<Mail>(ea.Body.GetString());
Console.WriteLine($"Sub-> {m.Subject}");
queue.Enqueue(ea);
RUN();
};
channel.BasicConsume(queueName, false, consumer);
Console.Read();
}
private static void RUN()
{
while (queue.Count > 0)
{
sem.WaitOne();
var item = queue.Dequeue();
ThreadPool.QueueUserWorkItem(sendmail, item);
}
}
private static void sendmail(Object item)
{
//.....soem processing stuff....
//tell rabbitMq that everything was OK
channel.BasicAck(deliveryTag: x.DeliveryTag, multiple: true);
//release thread
sem.Release();
}

I think that you could use a blocking collection here. It will simplify the code.
So your email sender would look something like that:
public class ParallelEmailSender : IDisposable
{
private readonly BlockingCollection<string> blockingCollection;
public ParallelEmailSender(int threadsCount)
{
blockingCollection = new BlockingCollection<string>(new ConcurrentQueue<string>());
for (int i = 0; i < threadsCount; i++)
{
Task.Factory.StartNew(SendInternal);
}
}
public void Send(string message)
{
blockingCollection.Add(message);
}
private void SendInternal()
{
foreach (string message in blockingCollection.GetConsumingEnumerable())
{
// send method
}
}
public void Dispose()
{
blockingCollection.CompleteAdding();
}
}
Of course you will need to add error catching logic and you could also improve the app shutting down process by using cancellation tokens.
I strongly suggest to read the great e-book about multithreading programming written by Joseph Albahari.

How to handle threads that hang when using SemaphoreSlim

I have some code that runs thousands of URLs through a third party library. Occasionally the method in the library hangs which takes up a thread. After a while all threads are taken up by processes doing nothing and it grinds to a halt.
I am using a SemaphoreSlim to control adding new threads so I can have an optimal number of tasks running. I need a way to identify tasks that have been running too long and then to kill them but also release a thread from the SemaphoreSlim so a new task can be created.
I am struggling with the approach here so I made some test code that immitates what I am doing. It create tasks that have a 10% chance of hanging so very quickly all threads have hung.
How should I be checking for these and killing them off?
Here is the code:
class Program
{
public static SemaphoreSlim semaphore;
public static List<Task> taskList;
static void Main(string[] args)
{
List<string> urlList = new List<string>();
Console.WriteLine("Generating list");
for (int i = 0; i < 1000; i++)
{
//adding random strings to simulate a large list of URLs to process
urlList.Add(Path.GetRandomFileName());
}
Console.WriteLine("Queueing tasks");
semaphore = new SemaphoreSlim(10, 10);
Task.Run(() => QueueTasks(urlList));
Console.ReadLine();
}
static void QueueTasks(List<string> urlList)
{
taskList = new List<Task>();
foreach (var url in urlList)
{
Console.WriteLine("{0} tasks can enter the semaphore.",
semaphore.CurrentCount);
semaphore.Wait();
taskList.Add(DoTheThing(url));
}
}
static async Task DoTheThing(string url)
{
Random rand = new Random();
// simulate the IO process
await Task.Delay(rand.Next(2000, 10000));
// add a 10% chance that the thread will hang simulating what happens occasionally with http request
int chance = rand.Next(1, 100);
if (chance <= 10)
{
while (true)
{
await Task.Delay(1000000);
}
}
semaphore.Release();
Console.WriteLine(url);
}
}

As people have already pointed out, Aborting threads in general is bad and there is no guaranteed way of doing it in C#. Using a separate process to do the work and then kill it is a slightly better idea than attempting Thread.Abort; but still not the best way to go. Ideally, you want co-operative threads/processes, which use IPC to decide when to bail out themselves. This way the cleanup is done properly.
With all that said, you can use code like below to do what you intend to do. I have written it assuming your task will be done in a thread. With slight changes, you can use the same logic to do your task in a process
The code is by no means bullet-proof and is meant to be illustrative. The concurrent code is not really tested well. Locks are held for longer than needed and some places I am not locking (like the Log function)
class TaskInfo {
public Thread Task;
public DateTime StartTime;
public TaskInfo(ParameterizedThreadStart startInfo, object startArg) {
Task = new Thread(startInfo);
Task.Start(startArg);
StartTime = DateTime.Now;
}
}
class Program {
const int MAX_THREADS = 1;
const int TASK_TIMEOUT = 6; // in seconds
const int CLEANUP_INTERVAL = TASK_TIMEOUT; // in seconds
public static SemaphoreSlim semaphore;
public static List<TaskInfo> TaskList;
public static object TaskListLock = new object();
public static Timer CleanupTimer;
static void Main(string[] args) {
List<string> urlList = new List<string>();
Log("Generating list");
for (int i = 0; i < 2; i++) {
//adding random strings to simulate a large list of URLs to process
urlList.Add(Path.GetRandomFileName());
}
Log("Queueing tasks");
semaphore = new SemaphoreSlim(MAX_THREADS, MAX_THREADS);
Task.Run(() => QueueTasks(urlList));
CleanupTimer = new Timer(CleanupTasks, null, CLEANUP_INTERVAL * 1000, CLEANUP_INTERVAL * 1000);
Console.ReadLine();
}
// TODO: Guard against re-entrancy
static void CleanupTasks(object state) {
Log("CleanupTasks started");
lock (TaskListLock) {
var now = DateTime.Now;
int n = TaskList.Count;
for (int i = n - 1; i >= 0; --i) {
var task = TaskList[i];
Log($"Checking task with ID {task.Task.ManagedThreadId}");
// kill processes running for longer than anticipated
if (task.Task.IsAlive && now.Subtract(task.StartTime).TotalSeconds >= TASK_TIMEOUT) {
Log("Cleaning up hung task");
task.Task.Abort();
}
// remove task if it is not alive
if (!task.Task.IsAlive) {
Log("Removing dead task from list");
TaskList.RemoveAt(i);
continue;
}
}
if (TaskList.Count == 0) {
Log("Disposing cleanup thread");
CleanupTimer.Dispose();
}
}
Log("CleanupTasks done");
}
static void QueueTasks(List<string> urlList) {
TaskList = new List<TaskInfo>();
foreach (var url in urlList) {
Log($"Trying to schedule url = {url}");
semaphore.Wait();
Log("Semaphore acquired");
ParameterizedThreadStart taskRoutine = obj => {
try {
DoTheThing((string)obj);
} finally {
Log("Releasing semaphore");
semaphore.Release();
}
};
var task = new TaskInfo(taskRoutine, url);
lock (TaskListLock)
TaskList.Add(task);
}
Log("All tasks queued");
}
// simulate all processes get hung
static void DoTheThing(string url) {
while (true)
Thread.Sleep(5000);
}
static void Log(string msg) {
Console.WriteLine("{0:HH:mm:ss.fff} Thread {1,2} {2}", DateTime.Now, Thread.CurrentThread.ManagedThreadId.ToString(), msg);
}
}

Why this C# code throws SemaphoreFullException?

I have following code which throws SemaphoreFullException, I don't understand why ?
If I change _semaphore = new SemaphoreSlim(0, 2) to
_semaphore = new SemaphoreSlim(0, int.MaxValue)
then all works fine.
Can anyone please find fault with this code and explain to me.
class BlockingQueue<T>
{
private Queue<T> _queue = new Queue<T>();
private SemaphoreSlim _semaphore = new SemaphoreSlim(0, 2);
public void Enqueue(T data)
{
if (data == null) throw new ArgumentNullException("data");
lock (_queue)
{
_queue.Enqueue(data);
}
_semaphore.Release();
}
public T Dequeue()
{
_semaphore.Wait();
lock (_queue)
{
return _queue.Dequeue();
}
}
}
public class Test
{
private static BlockingQueue<string> _bq = new BlockingQueue<string>();
public static void Main()
{
for (int i = 0; i < 100; i++)
{
_bq.Enqueue("item-" + i);
}
for (int i = 0; i < 5; i++)
{
Thread t = new Thread(Produce);
t.Start();
}
for (int i = 0; i < 100; i++)
{
Thread t = new Thread(Consume);
t.Start();
}
Console.ReadLine();
}
private static Random _random = new Random();
private static void Produce()
{
while (true)
{
_bq.Enqueue("item-" + _random.Next());
Thread.Sleep(2000);
}
}
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
}
}
}

If you want to use the semaphore to control the number of concurrent threads, you're using it wrong. You should acquire the semaphore when you dequeue an item, and release the semaphore when the thread is done processing that item.
What you have right now is a system that allows only two items to be in the queue at any one time. Initially, your semaphore has a count of 2. Each time you enqueue an item, the count is reduced. After two items, the count is 0 and if you try to release again you're going to get a semaphore full exception.
If you really want to do this with a semaphore, you need to remove the Release call from the Enqueue method. And add a Release method to the BlockingQueue class. You then would write:
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
bq.Release();
}
}
That would make your code work, but it's not a very good solution. A much better solution would be to use BlockingCollection<T> and two persistent consumers. Something like:
private BlockingCollection<int> bq = new BlockingCollection<int>();
void Test()
{
// create two consumers
var c1 = new Thread(Consume);
var c2 = new Thread(Consume);
c1.Start();
c2.Start();
// produce
for (var i = 0; i < 100; ++i)
{
bq.Add(i);
}
bq.CompleteAdding();
c1.Join();
c2.Join();
}
void Consume()
{
foreach (var i in bq.GetConsumingEnumerable())
{
Console.WriteLine("Consumed-" + i);
Thread.Sleep(1000);
}
}
That gives you two persistent threads consuming the items. The benefit is that you avoid the cost of spinning up a new thread (or having the RTL assign a pool thread) for each item. Instead, the threads do non-busy waits on the queue. You also don't have to worry about explicit locking, etc. The code is simpler, more robust, and much less likely to contain a bug.

How to Perform Interlocked.Increment/Decrement Properly in IO Bound?

I have simple IO bound 4.0 console application, which send 1 to n requests to a web-service and wait for their completion and then exit. Here is a sample,
static int counter = 0;
static void Main(string[] args)
{
foreach (my Loop)
{
......................
WebClientHelper.PostDataAsync(... =>
{
................................
................................
Interlocked.Decrement(ref counter);
});
Interlocked.Increment(ref counter);
}
while(counter != 0)
{
Thread.Sleep(500);
}
}
Is this is correct implementation?

You can use Tasks. Let TPL manage those things.
Task<T>[] tasks = ...;
//Started the tasks
Task.WaitAll(tasks);
Another way is to use TaskCompletionSource as mentioned here.

As suggested by Hans, here's your code implemented with CountdownEvent:
static void Main(string[] args)
{
var counter = new CountdownEvent();
foreach (my Loop)
{
......................
WebClientHelper.PostDataAsync(... =>
{
................................
................................
counter.Signal();
});
counter.AddCount();
}
counter.Wait();
}

NullReferenceException when creating a thread

I was looking at this thread on creating a simple thread pool. There, I came across #MilanGardian's response for .NET 3.5 which was elegant and served my purpose:
using System;
using System.Collections.Generic;
using System.Threading;
namespace SimpleThreadPool
{
public sealed class Pool : IDisposable
{
public Pool(int size)
{
this._workers = new LinkedList<Thread>();
for (var i = 0; i < size; ++i)
{
var worker = new Thread(this.Worker) { Name = string.Concat("Worker ", i) };
worker.Start();
this._workers.AddLast(worker);
}
}
public void Dispose()
{
var waitForThreads = false;
lock (this._tasks)
{
if (!this._disposed)
{
GC.SuppressFinalize(this);
this._disallowAdd = true; // wait for all tasks to finish processing while not allowing any more new tasks
while (this._tasks.Count > 0)
{
Monitor.Wait(this._tasks);
}
this._disposed = true;
Monitor.PulseAll(this._tasks); // wake all workers (none of them will be active at this point; disposed flag will cause then to finish so that we can join them)
waitForThreads = true;
}
}
if (waitForThreads)
{
foreach (var worker in this._workers)
{
worker.Join();
}
}
}
public void QueueTask(Action task)
{
lock (this._tasks)
{
if (this._disallowAdd) { throw new InvalidOperationException("This Pool instance is in the process of being disposed, can't add anymore"); }
if (this._disposed) { throw new ObjectDisposedException("This Pool instance has already been disposed"); }
this._tasks.AddLast(task);
Monitor.PulseAll(this._tasks); // pulse because tasks count changed
}
}
private void Worker()
{
Action task = null;
while (true) // loop until threadpool is disposed
{
lock (this._tasks) // finding a task needs to be atomic
{
while (true) // wait for our turn in _workers queue and an available task
{
if (this._disposed)
{
return;
}
if (null != this._workers.First && object.ReferenceEquals(Thread.CurrentThread, this._workers.First.Value) && this._tasks.Count > 0) // we can only claim a task if its our turn (this worker thread is the first entry in _worker queue) and there is a task available
{
task = this._tasks.First.Value;
this._tasks.RemoveFirst();
this._workers.RemoveFirst();
Monitor.PulseAll(this._tasks); // pulse because current (First) worker changed (so that next available sleeping worker will pick up its task)
break; // we found a task to process, break out from the above 'while (true)' loop
}
Monitor.Wait(this._tasks); // go to sleep, either not our turn or no task to process
}
}
task(); // process the found task
this._workers.AddLast(Thread.CurrentThread);
task = null;
}
}
private readonly LinkedList<Thread> _workers; // queue of worker threads ready to process actions
private readonly LinkedList<Action> _tasks = new LinkedList<Action>(); // actions to be processed by worker threads
private bool _disallowAdd; // set to true when disposing queue but there are still tasks pending
private bool _disposed; // set to true when disposing queue and no more tasks are pending
}
public static class Program
{
static void Main()
{
using (var pool = new Pool(5))
{
var random = new Random();
Action<int> randomizer = (index =>
{
Console.WriteLine("{0}: Working on index {1}", Thread.CurrentThread.Name, index);
Thread.Sleep(random.Next(20, 400));
Console.WriteLine("{0}: Ending {1}", Thread.CurrentThread.Name, index);
});
for (var i = 0; i < 40; ++i)
{
var i1 = i;
pool.QueueTask(() => randomizer(i1));
}
}
}
}
}
I am using this as follows:
static void Main(string[] args)
{
...
...
while(keepRunning)
{
...
pool.QueueTask(() => DoTask(eventObject);
}
...
}
private static void DoTask(EventObject e)
{
// Do some computations
pool.QueueTask(() => DoAnotherTask(eventObject)); // this is a relatively smaller computation
}
I am getting the following exception after running the code for about two days:
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.LinkedList`1.InternalInsertNodeBefore(LinkedListNode`1 node, LinkedListNode`1 newNode)
at System.Collections.Generic.LinkedList`1.AddLast(T value)
at MyProg.Pool.Worker()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
I am unable to figure out what is causing this as I am unable to get this error again. Any suggestions on how to fix this?

Seems like access to _workers linked list is not properly synchronized. Consider this scenario:
Lets assume that at some point this._workets list contains one item.
First thread calls this._workers.AddLast(Thread.CurrentThread); but gets interrupted at a very special place - inside AddLast() method:
public void AddLast(LinkedListNode<T> node)
{
this.ValidateNewNode(node);
if (this.head == null)
{
this.InternalInsertNodeToEmptyList(node);
}
else
{
// here we got interrupted - the list was not empty,
// but it would be pretty soon, and this.head becomes null
// InternalInsertNodeBefore() does not expect that
this.InternalInsertNodeBefore(this.head, node);
}
node.list = (LinkedList<T>) this;
}
Other thread calls this._workers.RemoveFirst();. There is no lock() around that statement so it completes and now list is empty. AddLast() now should call InternalInsertNodeToEmptyList(node); but it can't as the condition was already evaluated.
Putting a simple lock(this._tasks) around single this._workers.AddLast() line should prevent such scenario.
Other bad scenarios include adding item to the same list at the same time by two threads.

Think I found the issue. The code sample has a missed lock()
private void Worker()
{
Action task = null;
while (true) // loop until threadpool is disposed
{
lock (this._tasks) // finding a task needs to be atomic
{
while (true) // wait for our turn in _workers queue and an available task
{
....
}
}
task(); // process the found task
this._workers.AddLast(Thread.CurrentThread);
task = null;
}
}
The lock should be extended or wrapped around this._workers.AddLast(Thread.CurrentThread);
If you look at the other code that modifies LinkedList (Pool.QueueTask), it is wrapped in a lock.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# simple serial task queue memory leak - c#

Related

Multi Thread & Semaphore & Events

How to handle threads that hang when using SemaphoreSlim

Why this C# code throws SemaphoreFullException?

How to Perform Interlocked.Increment/Decrement Properly in IO Bound?

NullReferenceException when creating a thread

Categories

Resources