I am trying to execute some commands that come from RabbitMQ. Its about 5 msgs/sec. So as are too many msg, I have to send to a thread to execute, but I dont have so many threads, so I put a limit of 10.
so the ideia was that the msgs would come to the worker, put in a queue and any of the 10 threads would peak and execute. All these using semaphore.
After some experiments, I donĀ“t know why, but my thread only executes 3 or 4 items, after that it just stops with no error...
The problem I think is the logic when the event calls the method to execute, could not think in a better way...
Why just the first 4 msgs are processed??
What pattern or better way to do this?
Here are some parts of my code:
const int MaxThreads = 10;
private static Semaphore sem = new Semaphore(MaxThreads, MaxThreads);
private static Queue<BasicDeliverEventArgs> queue = new Queue<BasicDeliverEventArgs>();
static void Main(string[] args)
{
consumer.Received += (sender, ea) =>
{
var m = JsonConvert.DeserializeObject<Mail>(ea.Body.GetString());
Console.WriteLine($"Sub-> {m.Subject}");
queue.Enqueue(ea);
RUN();
};
channel.BasicConsume(queueName, false, consumer);
Console.Read();
}
private static void RUN()
{
while (queue.Count > 0)
{
sem.WaitOne();
var item = queue.Dequeue();
ThreadPool.QueueUserWorkItem(sendmail, item);
}
}
private static void sendmail(Object item)
{
//.....soem processing stuff....
//tell rabbitMq that everything was OK
channel.BasicAck(deliveryTag: x.DeliveryTag, multiple: true);
//release thread
sem.Release();
}
I think that you could use a blocking collection here. It will simplify the code.
So your email sender would look something like that:
public class ParallelEmailSender : IDisposable
{
private readonly BlockingCollection<string> blockingCollection;
public ParallelEmailSender(int threadsCount)
{
blockingCollection = new BlockingCollection<string>(new ConcurrentQueue<string>());
for (int i = 0; i < threadsCount; i++)
{
Task.Factory.StartNew(SendInternal);
}
}
public void Send(string message)
{
blockingCollection.Add(message);
}
private void SendInternal()
{
foreach (string message in blockingCollection.GetConsumingEnumerable())
{
// send method
}
}
public void Dispose()
{
blockingCollection.CompleteAdding();
}
}
Of course you will need to add error catching logic and you could also improve the app shutting down process by using cancellation tokens.
I strongly suggest to read the great e-book about multithreading programming written by Joseph Albahari.
Related
I needed a very basic serial execution queue.
I wrote the following based on this idea, but I needed a queue to ensure FIFO, so I added a intermediate ConcurrentQueue<>.
Here is the code:
public class SimpleSerialTaskQueue
{
private SemaphoreSlim _semaphore = new SemaphoreSlim(0);
private ConcurrentQueue<Func<Task>> _taskQueue = new ConcurrentQueue<Func<Task>>();
public SimpleSerialTaskQueue()
{
Task.Run(async () =>
{
Func<Task> dequeuedTask;
while (true)
{
if (await _semaphore.WaitAsync(1000))
{
if (_taskQueue.TryDequeue(out dequeuedTask) == true)
{
await dequeuedTask();
}
}
else
{
Console.WriteLine("Nothing more to process");
//If I don't do that , memory pressure is never released
//GC.Collect();
}
}
});
}
public void Add(Func<Task> o_task)
{
_taskQueue.Enqueue(o_task);
_semaphore.Release();
}
}
When I run that in a loop, simulating heavy load, I get some kind of memory leak. Here is the code:
static void Main(string[] args)
{
SimpleSerialTaskQueue queue = new SimpleSerialTaskQueue();
for (int i = 0; i < 100000000; i++)
{
queue.Add(async () =>
{
await Task.Delay(0);
});
}
Console.ReadLine();
}
EDIT:
I don't understand why once the tasks have been executed, I still get like 750MB of memory used (based on VS2015 diagnostic tools). I thought once executed it would be very low. The GC doesnt seem to collect anything.
Can anyone tell me what is happening? Is this related to the state machine
I have some code that runs thousands of URLs through a third party library. Occasionally the method in the library hangs which takes up a thread. After a while all threads are taken up by processes doing nothing and it grinds to a halt.
I am using a SemaphoreSlim to control adding new threads so I can have an optimal number of tasks running. I need a way to identify tasks that have been running too long and then to kill them but also release a thread from the SemaphoreSlim so a new task can be created.
I am struggling with the approach here so I made some test code that immitates what I am doing. It create tasks that have a 10% chance of hanging so very quickly all threads have hung.
How should I be checking for these and killing them off?
Here is the code:
class Program
{
public static SemaphoreSlim semaphore;
public static List<Task> taskList;
static void Main(string[] args)
{
List<string> urlList = new List<string>();
Console.WriteLine("Generating list");
for (int i = 0; i < 1000; i++)
{
//adding random strings to simulate a large list of URLs to process
urlList.Add(Path.GetRandomFileName());
}
Console.WriteLine("Queueing tasks");
semaphore = new SemaphoreSlim(10, 10);
Task.Run(() => QueueTasks(urlList));
Console.ReadLine();
}
static void QueueTasks(List<string> urlList)
{
taskList = new List<Task>();
foreach (var url in urlList)
{
Console.WriteLine("{0} tasks can enter the semaphore.",
semaphore.CurrentCount);
semaphore.Wait();
taskList.Add(DoTheThing(url));
}
}
static async Task DoTheThing(string url)
{
Random rand = new Random();
// simulate the IO process
await Task.Delay(rand.Next(2000, 10000));
// add a 10% chance that the thread will hang simulating what happens occasionally with http request
int chance = rand.Next(1, 100);
if (chance <= 10)
{
while (true)
{
await Task.Delay(1000000);
}
}
semaphore.Release();
Console.WriteLine(url);
}
}
As people have already pointed out, Aborting threads in general is bad and there is no guaranteed way of doing it in C#. Using a separate process to do the work and then kill it is a slightly better idea than attempting Thread.Abort; but still not the best way to go. Ideally, you want co-operative threads/processes, which use IPC to decide when to bail out themselves. This way the cleanup is done properly.
With all that said, you can use code like below to do what you intend to do. I have written it assuming your task will be done in a thread. With slight changes, you can use the same logic to do your task in a process
The code is by no means bullet-proof and is meant to be illustrative. The concurrent code is not really tested well. Locks are held for longer than needed and some places I am not locking (like the Log function)
class TaskInfo {
public Thread Task;
public DateTime StartTime;
public TaskInfo(ParameterizedThreadStart startInfo, object startArg) {
Task = new Thread(startInfo);
Task.Start(startArg);
StartTime = DateTime.Now;
}
}
class Program {
const int MAX_THREADS = 1;
const int TASK_TIMEOUT = 6; // in seconds
const int CLEANUP_INTERVAL = TASK_TIMEOUT; // in seconds
public static SemaphoreSlim semaphore;
public static List<TaskInfo> TaskList;
public static object TaskListLock = new object();
public static Timer CleanupTimer;
static void Main(string[] args) {
List<string> urlList = new List<string>();
Log("Generating list");
for (int i = 0; i < 2; i++) {
//adding random strings to simulate a large list of URLs to process
urlList.Add(Path.GetRandomFileName());
}
Log("Queueing tasks");
semaphore = new SemaphoreSlim(MAX_THREADS, MAX_THREADS);
Task.Run(() => QueueTasks(urlList));
CleanupTimer = new Timer(CleanupTasks, null, CLEANUP_INTERVAL * 1000, CLEANUP_INTERVAL * 1000);
Console.ReadLine();
}
// TODO: Guard against re-entrancy
static void CleanupTasks(object state) {
Log("CleanupTasks started");
lock (TaskListLock) {
var now = DateTime.Now;
int n = TaskList.Count;
for (int i = n - 1; i >= 0; --i) {
var task = TaskList[i];
Log($"Checking task with ID {task.Task.ManagedThreadId}");
// kill processes running for longer than anticipated
if (task.Task.IsAlive && now.Subtract(task.StartTime).TotalSeconds >= TASK_TIMEOUT) {
Log("Cleaning up hung task");
task.Task.Abort();
}
// remove task if it is not alive
if (!task.Task.IsAlive) {
Log("Removing dead task from list");
TaskList.RemoveAt(i);
continue;
}
}
if (TaskList.Count == 0) {
Log("Disposing cleanup thread");
CleanupTimer.Dispose();
}
}
Log("CleanupTasks done");
}
static void QueueTasks(List<string> urlList) {
TaskList = new List<TaskInfo>();
foreach (var url in urlList) {
Log($"Trying to schedule url = {url}");
semaphore.Wait();
Log("Semaphore acquired");
ParameterizedThreadStart taskRoutine = obj => {
try {
DoTheThing((string)obj);
} finally {
Log("Releasing semaphore");
semaphore.Release();
}
};
var task = new TaskInfo(taskRoutine, url);
lock (TaskListLock)
TaskList.Add(task);
}
Log("All tasks queued");
}
// simulate all processes get hung
static void DoTheThing(string url) {
while (true)
Thread.Sleep(5000);
}
static void Log(string msg) {
Console.WriteLine("{0:HH:mm:ss.fff} Thread {1,2} {2}", DateTime.Now, Thread.CurrentThread.ManagedThreadId.ToString(), msg);
}
}
I am trying to implement producer/consumer pattern with multiple or parallel consumers.
I did an implementation but I would like to know how good it is. Can somebody do better? Can any of you spot any errors?
Unfortunately I can not use TPL dataflow, because we are at the end of our project and to put in an extra library in our package would take to much paperwork and we do not have that time.
What I am trying to do is to speed up the following portion:
anIntermediaryList = StepOne(anInputList); // I will put StepOne as Producer :-) Step one is remote call.
aResultList = StepTwo(anIntermediaryList); // I will put StepTwo as Consumer, however he also produces result. Step two is also a remote call.
// StepOne is way faster than StepTwo.
For this I came up with the idea that I will chunk the input list (anInputList)
StepOne will be inside of a Producer and will put the intermediary chunks into a queue.
There will be multiple Producers and they will take the intermediary results and process it with StepTwo.
Here is a simplified version of of the implementation later:
Task.Run(() => {
aChunkinputList = Split(anInputList)
foreach(aChunk in aChunkinputList)
{
anIntermediaryResult = StepOne(aChunk)
intermediaryQueue.Add(anIntermediaryResult)
}
})
while(intermediaryQueue.HasItems)
{
anItermediaryResult = intermediaryQueue.Dequeue()
Task.Run(() => {
aResultList = StepTwo(anItermediaryResult);
resultQueue.Add(aResultList)
}
}
I also thought that the best number for the parallel running Consumers would be: "Environment.ProcessorCount / 2". I would like to know if this also is a good idea.
Now here is my mock implementation and the question is can somebody do better or spot any error?
class Example
{
protected static readonly int ParameterCount_ = 1000;
protected static readonly int ChunkSize_ = 100;
// This might be a good number for the parallel consumers.
protected static readonly int ConsumerCount_ = Environment.ProcessorCount / 2;
protected Semaphore mySemaphore_ = new Semaphore(Example.ConsumerCount_, Example.ConsumerCount_);
protected ConcurrentQueue<List<int>> myIntermediaryQueue_ = new ConcurrentQueue<List<int>>();
protected ConcurrentQueue<List<int>> myResultQueue_ = new ConcurrentQueue<List<int>>();
public void Main()
{
List<int> aListToProcess = new List<int>(Example.ParameterCount_ + 1);
aListToProcess.AddRange(Enumerable.Range(0, Example.ParameterCount_));
Task aProducerTask = Task.Run(() => Producer(aListToProcess));
List<Task> aTaskList = new List<Task>();
while(!aProducerTask.IsCompleted || myIntermediaryQueue_.Count > 0)
{
List<int> aChunkToProcess;
if (myIntermediaryQueue_.TryDequeue(out aChunkToProcess))
{
mySemaphore_.WaitOne();
aTaskList.Add(Task.Run(() => Consumer(aChunkToProcess)));
}
}
Task.WaitAll(aTaskList.ToArray());
List<int> aResultList = new List<int>();
foreach(List<int> aChunk in myResultQueue_)
{
aResultList.AddRange(aChunk);
}
aResultList.Sort();
if (aListToProcess.SequenceEqual(aResultList))
{
Console.WriteLine("All good!");
}
else
{
Console.WriteLine("Bad, very bad!");
}
}
protected void Producer(List<int> elements_in)
{
List<List<int>> aChunkList = Example.SplitList(elements_in, Example.ChunkSize_);
foreach(List<int> aChunk in aChunkList)
{
Console.WriteLine("Thread Id: {0} Producing from: ({1}-{2})",
Thread.CurrentThread.ManagedThreadId,
aChunk.First(),
aChunk.Last());
myIntermediaryQueue_.Enqueue(ProduceItemsRemoteCall(aChunk));
}
}
protected void Consumer(List<int> elements_in)
{
Console.WriteLine("Thread Id: {0} Consuming from: ({1}-{2})",
Thread.CurrentThread.ManagedThreadId,
Convert.ToInt32(Math.Sqrt(elements_in.First())),
Convert.ToInt32(Math.Sqrt(elements_in.Last())));
myResultQueue_.Enqueue(ConsumeItemsRemoteCall(elements_in));
mySemaphore_.Release();
}
// Dummy Remote Call
protected List<int> ProduceItemsRemoteCall(List<int> elements_in)
{
return elements_in.Select(x => x * x).ToList();
}
// Dummy Remote Call
protected List<int> ConsumeItemsRemoteCall(List<int> elements_in)
{
return elements_in.Select(x => Convert.ToInt32(Math.Sqrt(x))).ToList();
}
public static List<List<int>> SplitList(List<int> masterList_in, int chunkSize_in)
{
List<List<int>> aReturnList = new List<List<int>>();
for (int i = 0; i < masterList_in.Count; i += chunkSize_in)
{
aReturnList.Add(masterList_in.GetRange(i, Math.Min(chunkSize_in, masterList_in.Count - i)));
}
return aReturnList;
}
}
Main function:
class Program
{
static void Main(string[] args)
{
Example anExample = new Example();
anExample.Main();
}
}
Bye
Laszlo
Based on the comments I've posted a second and third version:
https://codereview.stackexchange.com/questions/71182/producer-consumer-in-c-with-multiple-parallel-consumers-and-no-tpl-dataflow/71233#71233
I have following code which throws SemaphoreFullException, I don't understand why ?
If I change _semaphore = new SemaphoreSlim(0, 2) to
_semaphore = new SemaphoreSlim(0, int.MaxValue)
then all works fine.
Can anyone please find fault with this code and explain to me.
class BlockingQueue<T>
{
private Queue<T> _queue = new Queue<T>();
private SemaphoreSlim _semaphore = new SemaphoreSlim(0, 2);
public void Enqueue(T data)
{
if (data == null) throw new ArgumentNullException("data");
lock (_queue)
{
_queue.Enqueue(data);
}
_semaphore.Release();
}
public T Dequeue()
{
_semaphore.Wait();
lock (_queue)
{
return _queue.Dequeue();
}
}
}
public class Test
{
private static BlockingQueue<string> _bq = new BlockingQueue<string>();
public static void Main()
{
for (int i = 0; i < 100; i++)
{
_bq.Enqueue("item-" + i);
}
for (int i = 0; i < 5; i++)
{
Thread t = new Thread(Produce);
t.Start();
}
for (int i = 0; i < 100; i++)
{
Thread t = new Thread(Consume);
t.Start();
}
Console.ReadLine();
}
private static Random _random = new Random();
private static void Produce()
{
while (true)
{
_bq.Enqueue("item-" + _random.Next());
Thread.Sleep(2000);
}
}
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
}
}
}
If you want to use the semaphore to control the number of concurrent threads, you're using it wrong. You should acquire the semaphore when you dequeue an item, and release the semaphore when the thread is done processing that item.
What you have right now is a system that allows only two items to be in the queue at any one time. Initially, your semaphore has a count of 2. Each time you enqueue an item, the count is reduced. After two items, the count is 0 and if you try to release again you're going to get a semaphore full exception.
If you really want to do this with a semaphore, you need to remove the Release call from the Enqueue method. And add a Release method to the BlockingQueue class. You then would write:
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
bq.Release();
}
}
That would make your code work, but it's not a very good solution. A much better solution would be to use BlockingCollection<T> and two persistent consumers. Something like:
private BlockingCollection<int> bq = new BlockingCollection<int>();
void Test()
{
// create two consumers
var c1 = new Thread(Consume);
var c2 = new Thread(Consume);
c1.Start();
c2.Start();
// produce
for (var i = 0; i < 100; ++i)
{
bq.Add(i);
}
bq.CompleteAdding();
c1.Join();
c2.Join();
}
void Consume()
{
foreach (var i in bq.GetConsumingEnumerable())
{
Console.WriteLine("Consumed-" + i);
Thread.Sleep(1000);
}
}
That gives you two persistent threads consuming the items. The benefit is that you avoid the cost of spinning up a new thread (or having the RTL assign a pool thread) for each item. Instead, the threads do non-busy waits on the queue. You also don't have to worry about explicit locking, etc. The code is simpler, more robust, and much less likely to contain a bug.
Example for threading queue book "Accelerated C# 2008" (CrudeThreadPool class) not work correctly. If I insert long job in WorkFunction() on 2-processor machine executing for next task don't run before first is over. How to solve this problem? I want to load the processor to 100 percent
public class CrudeThreadPool
{
static readonly int MAX_WORK_THREADS = 4;
static readonly int WAIT_TIMEOUT = 2000;
public delegate void WorkDelegate();
public CrudeThreadPool()
{
stop = 0;
workLock = new Object();
workQueue = new Queue();
threads = new Thread[MAX_WORK_THREADS];
for (int i = 0; i < MAX_WORK_THREADS; ++i)
{
threads[i] = new Thread(new ThreadStart(this.ThreadFunc));
threads[i].Start();
}
}
private void ThreadFunc()
{
lock (workLock)
{
int shouldStop = 0;
do
{
shouldStop = Interlocked.Exchange(ref stop, stop);
if (shouldStop == 0)
{
WorkDelegate workItem = null;
if (Monitor.Wait(workLock, WAIT_TIMEOUT))
{
// Process the item on the front of the queue
lock (workQueue)
{
workItem = (WorkDelegate)workQueue.Dequeue();
}
workItem();
}
}
} while (shouldStop == 0);
}
}
public void SubmitWorkItem(WorkDelegate item)
{
lock (workLock)
{
lock (workQueue)
{
workQueue.Enqueue(item);
}
Monitor.Pulse(workLock);
}
}
public void Shutdown()
{
Interlocked.Exchange(ref stop, 1);
}
private Queue workQueue;
private Object workLock;
private Thread[] threads;
private int stop;
}
public class EntryPoint
{
static void WorkFunction()
{
Console.WriteLine("WorkFunction() called on Thread 0}", Thread.CurrentThread.GetHashCode());
//some long job
double s = 0;
for (int i = 0; i < 100000000; i++)
s += Math.Sin(i);
}
static void Main()
{
CrudeThreadPool pool = new CrudeThreadPool();
for (int i = 0; i < 10; ++i)
{
pool.SubmitWorkItem(
new CrudeThreadPool.WorkDelegate(EntryPoint.WorkFunction));
}
pool.Shutdown();
}
}
I can see 2 problems:
Inside ThreadFunc() you take a lock(workLock) for the duration of the method, meaning your threadpool is no longer async.
in the Main() method, you close down the threadpool w/o waiting for it to finish. Oddly enough that is why it is working now, stopping each ThreadFunc after 1 loop.
It's hard to tell because there's no indentation, but it looks to me like it's executing the work item while still holding workLock - which is basically going to serialize all the work.
If at all possible, I suggest you start using the Parallel Extensions framework in .NET 4, which has obviously had rather more time spent on it. Otherwise, there's the existing thread pool in the framework, and there are other implementations around if you're willing to have a look. I have one in MiscUtil although I haven't looked at the code for quite a while - it's pretty primitive.