I have a listener that waits for messages to arrive in a queue. I want to throttle the number of tasks so that after every 1000 messages taken from the queue, I wait until they have completed before processing the next set of messages. The reason for this is that the ProcessMessage code calls a WCF service, which seems to get overloaded by too many concurrent calls at once.
Is this the best way to achieve this throttling? The code below looks a bit hacky.
var isEmpty = false;
var maxThreads = 1000;
var currentThreadCount = 0;
List<Task> taskList = new List<Task>();
while (!isEmpty)
{
    var message = GetMessageFromServer();
    if (String.IsNullOrEmpty(message))
    {
        // An empty message means the queue has been drained.
        isEmpty = true;
    }
    else
    {
        if (currentThreadCount == maxThreads)
        {
            // Wait for the current batch to finish before starting the next one.
            Task.WaitAll(taskList.ToArray());
            taskList.Clear();
            currentThreadCount = 0;
        }
        taskList.Add(Task.Run(() => ProcessMessage(message)));
        currentThreadCount++;
    }
}
Assuming that you are interested in the result of ProcessMessage, I would suggest considering the use of Channels (System.Threading.Channels).
Compared with your solution, this lets you keep a steady amount of work in flight, instead of stopping when 1000 is reached and waiting until all tasks are finished.
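As a rough illustration of the Channels idea, here is a minimal sketch. It assumes .NET Core 3.0 or later (for ReadAllAsync and await foreach); ChannelThrottle, ProcessAllAsync and the processMessage delegate are hypothetical names standing in for your own code, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelThrottle
{
    // Hypothetical helper: processes every message with at most maxConcurrency
    // calls to processMessage in flight, and returns how many were processed.
    public static async Task<int> ProcessAllAsync(
        IEnumerable<string> messages,
        int maxConcurrency,
        Func<string, Task> processMessage)
    {
        // A bounded channel makes the producer wait when the consumers fall behind.
        var channel = Channel.CreateBounded<string>(maxConcurrency);
        var processed = 0;

        // A fixed number of consumers; each one picks up the next message
        // as soon as it has finished the previous one.
        var consumers = Enumerable.Range(0, maxConcurrency)
            .Select(_ => Task.Run(async () =>
            {
                await foreach (var message in channel.Reader.ReadAllAsync())
                {
                    await processMessage(message);
                    Interlocked.Increment(ref processed);
                }
            }))
            .ToArray();

        foreach (var message in messages)
        {
            await channel.Writer.WriteAsync(message);
        }

        // Tell the consumers that no more messages will arrive, then wait for them.
        channel.Writer.Complete();
        await Task.WhenAll(consumers);
        return processed;
    }
}
```

With maxConcurrency set to, say, 50, the WCF service would never see more than 50 concurrent calls, and a new call starts as soon as any earlier one finishes.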
Hi, is there any way to get the status of the threads from a Thread.Join, or can I break out of a Thread.Join after a specified period?
For example:
I have a loop with n jobs and 3 free cores for 3 parallel threads. After joining the 3 threads, I wonder if there is a way to check whether a thread has finished its job, so that I can start another job in its place.
I want to keep the 3 cores working all the time, rather than waiting for all threads to stop and then starting another 3 of them.
The simplest, and most likely best, solution is to use the threadpool. The threadpool automatically scales based on available processors and cores.
ThreadPool.QueueUserWorkItem(state => TaskOne());
ThreadPool.QueueUserWorkItem(state => TaskTwo());
ThreadPool.QueueUserWorkItem(state => TaskThree());
ThreadPool.QueueUserWorkItem(state => TaskFour());
If you need to do this the hard way, you could keep a queue of pending tasks and a list of currently running tasks, and use a timeout for the Join() call so that it returns false if the thread is not ready.
I can't think of any reason to prefer the complex to the simple solution, but there might be one, of course.
var MAX_RUNNING = 3;
var JOIN_TIMEOUT_MS = 50;
var waiting = new Queue<ThreadStart>();
var running = new List<Thread>();
waiting.Enqueue(new ThreadStart(TaskOne));
waiting.Enqueue(new ThreadStart(TaskTwo));
waiting.Enqueue(new ThreadStart(TaskThree));
waiting.Enqueue(new ThreadStart(TaskFour));
while (waiting.Any() || running.Any())
{
while (running.Count < MAX_RUNNING && waiting.Any())
{
var next = new Thread(waiting.Dequeue());
next.Start();
running.Add(next);
}
for (var i = running.Count - 1; i >= 0; --i)
{
var t = running[i];
if(t.ThreadState == System.Threading.ThreadState.Stopped) {
running.RemoveAt(i);
break;
}
if (t.Join(JOIN_TIMEOUT_MS))
{
running.RemoveAt(i);
break;
}
}
}
I have to send 10000 messages. At the moment, it happens synchronously and takes up to 20 minutes to send them all.
// sending messages in a sync way
foreach (var message in messages)
{
var result = Send(message);
_logger.Info($"Successfully sent {message.Title}.");
}
To shorten the sending time, I'd like to use async and await, but my concern is whether the C# runtime can handle 15000 tasks in the worker process.
var tasks = new List<Task>();
foreach (var message in messages)
{
tasks.Add(Task.Run(() => Send(message)));
}
var t = Task.WhenAll(tasks);
t.Wait();
...
Also, in terms of memory, I'm not sure if it's a good idea to create a list of 15000 tasks.
Since I came home from work, I have played with this a bit, and here is my answer.
First of all, Parallel.ForEach is pretty cool to use, and on my 8-core machine it runs very fast.
I suggest limiting the CPU usage so you do not use 100% of the capacity, but that depends on your system. I have included two options for this.
The other thing is that you need to monitor the receiving server and make sure it can handle all these jobs without getting into trouble.
Here is the implementation:
public void MessMessageSender(List<Message> messages)
{
try
{
var parallelOptions = new ParallelOptions();
_cancelToken = new CancellationTokenSource();
parallelOptions.CancellationToken = _cancelToken.Token;
var maxProc = System.Environment.ProcessorCount;
// this option uses around 75% of the available cores
parallelOptions.MaxDegreeOfParallelism = Convert.ToInt32(Math.Ceiling(maxProc * 0.75));
// the following option uses all cores except 1
//parallelOptions.MaxDegreeOfParallelism = maxProc - 1;
try
{
Parallel.ForEach(messages, parallelOptions, message =>
{
try
{
Send(message);
//_logger.Info($"Successfully sent {message.Title}.");
}
catch (Exception ex)
{
//_logger.Error($"Something went wrong {ex}.");
}
});
}
catch (OperationCanceledException e)
{
//User has cancelled this request.
}
}
finally
{
//What ever dispose of clients;
}
}
My answer is inspired by this page.
Documentation:
Parallel.ForEach
Environment.ProcessorCount
I have an application that works with a queue of strings (which correspond to different tasks that the application needs to perform). The queue can be filled with strings at random moments (sometimes several times a minute, but it can also take a few hours).
Until now I have always used a timer that checks the queue every few seconds and removes any items it finds.
I think there must be a nicer solution than this. Is there any way to get an event or something similar when an item is added to the queue?
Yes. Take a look at TPL Dataflow, in particular the BufferBlock<T>, which does more or less the same as BlockingCollection but, by leveraging async/await, avoids the nasty side effect of blocking your threads while waiting.
So you can:
void Main()
{
var b = new BufferBlock<string>();
AddToBlockAsync(b);
ReadFromBlockAsync(b);
}
public async Task AddToBlockAsync(BufferBlock<string> b)
{
while (true)
{
b.Post("hello");
await Task.Delay(1000);
}
}
public async Task ReadFromBlockAsync(BufferBlock<string> b)
{
await Task.Delay(10000); //let some messages buffer up...
while(true)
{
var msg = await b.ReceiveAsync();
Console.WriteLine(msg);
}
}
I'd take a look at BlockingCollection.GetConsumingEnumerable. The collection will be backed with a queue by default, and it is a nice way to automatically take values from the queue as they are added using a simple foreach loop.
There is also an overload that allows you to supply a CancellationToken meaning you can cleanly break out.
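A minimal sketch of the cancellable overload might look like the following. CancellableConsumer and Consume are hypothetical names used only for illustration; GetConsumingEnumerable(CancellationToken) is the overload mentioned above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class CancellableConsumer
{
    // Hypothetical helper: takes items until the collection is marked complete
    // or the token is cancelled, and returns everything it consumed.
    public static List<string> Consume(BlockingCollection<string> queue, CancellationToken token)
    {
        var consumed = new List<string>();
        try
        {
            // The loop blocks while the queue is empty, so no timer or polling is needed.
            foreach (var item in queue.GetConsumingEnumerable(token))
            {
                consumed.Add(item);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation surfaces as an exception; treat it as a clean stop.
        }
        return consumed;
    }
}
```

The loop ends normally once the producer calls CompleteAdding() and the queue is empty; cancelling the token ends it early.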
Have you looked at BlockingCollection? The GetConsumingEnumerable() method lets the consumer run an indefinite loop that yields each new item as soon as it becomes available, with no need for timers or Thread.Sleep calls:
// Common:
BlockingCollection<string> _blockingCollection =
new BlockingCollection<string>();
// Producer
for (var i = 0; i < 100; i++)
{
_blockingCollection.Add(i.ToString());
Thread.Sleep(500); // So you can track the consumer synchronization. Remove.
}
// Consumer:
foreach (var item in _blockingCollection.GetConsumingEnumerable())
{
Debug.WriteLine(item);
}
I'm using C# Parallel.ForEach to process more than a thousand subsets of data. One set takes 5-30 minutes to process, depending on the size of the set. On my computer, with the options
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;
I get 8 parallel processes. As I understand it, the jobs are divided equally between the parallel tasks (e.g. the first task gets jobs number 1, 9, 17 etc., the second gets 2, 10, 18 etc.); therefore, one task can finish its own jobs sooner than the others if its sets of data take less time.
The problem is that four parallel tasks finish their jobs within 24 hours, but the last one finishes in 48 hours. Is there some way to organize the parallelism so that all parallel tasks finish at about the same time, meaning that all of them keep working until all jobs are done?
Since the jobs are not equal, you can't split the number of jobs between processors and have them finish at about the same time. I think what you need here is 8 worker threads that retrieve the next job in line. You will have to use a lock on the function to get the next job.
Somebody correct me if I'm wrong, but off the top of my head... a worker thread could be given a function like this:
public void ProcessJob()
{
for (Job myJob = GetNextJob(); myJob != null; myJob = GetNextJob())
{
// process job
}
}
And the function to get the next job would look like:
private List<Job> jobs;
private int currentJob = 0;
private Job GetNextJob()
{
lock (jobs)
{
Job job = null;
if (currentJob < jobs.Count)
{
job = jobs[currentJob];
currentJob++;
}
return job;
}
}
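To show how the two functions above might be wired together, here is a minimal sketch that starts the worker threads and waits for them. JobRunner, its generic TJob parameter and the process delegate are hypothetical names introduced for illustration; the locking logic is the same as in GetNextJob above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class JobRunner<TJob> where TJob : class
{
    private readonly List<TJob> jobs;
    private readonly Action<TJob> process;
    private int currentJob = 0;

    public JobRunner(List<TJob> jobs, Action<TJob> process)
    {
        this.jobs = jobs;
        this.process = process;
    }

    // Hands out each job exactly once; returns null when none are left.
    private TJob GetNextJob()
    {
        lock (jobs)
        {
            return currentJob < jobs.Count ? jobs[currentJob++] : null;
        }
    }

    // Body of each worker thread: keep pulling jobs until the list is exhausted.
    private void ProcessJobs()
    {
        for (TJob job = GetNextJob(); job != null; job = GetNextJob())
        {
            process(job);
        }
    }

    // Starts workerCount threads and blocks until all jobs are done.
    public void Run(int workerCount)
    {
        var workers = new List<Thread>();
        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(ProcessJobs);
            worker.Start();
            workers.Add(worker);
        }
        foreach (var worker in workers)
        {
            worker.Join();
        }
    }
}
```

Because every worker asks for the next job only when it is free, a thread that happens to get short jobs simply processes more of them, so all threads finish at roughly the same time.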
It seems that there is no ready-to-use solution and it has to be created.
My previous code was:
var ListOfSets = (from x in Database
group x by x.SetID into z
select new { ID = z.Key}).ToList();
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;
Parallel.ForEach(ListOfSets, po, SingleSet=>
{
AnalyzeSet(SingleSet.ID);
});
To share work equally between all CPUs, I still use Parallel to do the work, but instead of ForEach I use For and an idea from Matt. The new code is:
Parallel.For(0, Environment.ProcessorCount, i =>
{
    while (true)
    {
        double SetID;
        lock (ListOfSets)
        {
            // Check and remove under the same lock, so that another worker
            // cannot empty the list between the check and the removal.
            if (ListOfSets.Count == 0)
                break;
            SetID = ListOfSets[0].ID;
            ListOfSets.RemoveAt(0);
        }
        AnalyzeSet(SetID);
    }
});
So, thank you for your advice.
One option, as suggested by others, is to manage your own producer consumer queue. I'd like to note that using the BlockingCollection makes this very easy to do.
BlockingCollection<JobData> queue = new BlockingCollection<JobData>();
//add data to queue; if it can be done quickly, just do it inline.
//If it's expensive, start a new task/thread just to add items to the queue.
foreach (JobData job in data)
queue.Add(job);
queue.CompleteAdding();
for (int i = 0; i < Environment.ProcessorCount; i++)
{
Task.Factory.StartNew(() =>
{
foreach (var job in queue.GetConsumingEnumerable())
{
ProcessJob(job);
}
}, TaskCreationOptions.LongRunning);
}
Say I have 10N items (which I need to fetch via HTTP). In the code, N tasks are started to get the data, and each task fetches 10 items in sequence. I put the items in a ConcurrentQueue<Item>. After that, the items are processed one by one in a thread-unsafe method.
async Task<Item> GetItemAsync()
{
//fetch one item from the internet
}
async Task DoWork()
{
var tasks = new List<Task>();
var items = new ConcurrentQueue<Item>();
var handles = new List<ManualResetEvent>();
for i 1 -> N
{
var handle = new ManualResetEvent(false);
handles.Add(handle);
tasks.Add(Task.Factory.StartNew(async delegate
{
for j 1 -> 10
{
var item = await GetItemAsync();
items.Enqueue(item);
}
handle.Set();
}));
}
//begin to process the items when any handle is set
WaitHandle.WaitAny(handles);
while(true)
{
if (all handles are set && items collection is empty) //***
break;
//in other words: all tasks are really completed
while(items.TryDequeue(out item))
{
AThreadUnsafeMethod(item); //process items one by one
}
}
}
I don't know what if condition can be placed in the statement marked ***. I can't use Task.IsCompleted property here, because I use await in the task, so the task is completed very soon. And a bool[] that indicates whether the task is executed to the end looks really ugly, because I think ManualResetEvent can do the same work. Can anyone give me a suggestion?
Well, you could build this yourself, but I think it's tons easier with TPL Dataflow.
Something like:
static async Task DoWork()
{
// By default, ActionBlock uses MaxDegreeOfParallelism == 1,
// so AThreadUnsafeMethod is not called in parallel.
var block = new ActionBlock<Item>(AThreadUnsafeMethod);
// Start off N tasks, each asynchronously acquiring 10 items.
// Each item is sent to the block as it is received.
var tasks = Enumerable.Range(0, N).Select(_ => Task.Run(
async () =>
{
for (int i = 0; i != 10; ++i)
block.Post(await GetItemAsync());
})).ToArray();
// Complete the block when all tasks have completed.
Task.WhenAll(tasks).ContinueWith(_ => { block.Complete(); });
// Wait for the block to complete.
await block.Completion;
}
You can do a WaitOne with a timeout of zero to check the state. Something like this should work:
if (handles.All(handle => handle.WaitOne(TimeSpan.Zero)) && !items.Any())
break;
http://msdn.microsoft.com/en-us/library/cc190477.aspx
Thanks all. In the end I found that CountdownEvent is very suitable for this scenario. The general implementation looks like this (for others' information):
for i 1 -> N
{
//start N tasks
//invoke CountdownEvent.Signal() at the end of each task
}
//see if CountdownEvent.IsSet here
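Fleshed out, the outline above could look roughly like the sketch below. CountdownExample, Run, fetchItemAsync and handleItem are hypothetical names; fetchItemAsync stands in for GetItemAsync and handleItem for AThreadUnsafeMethod. Instead of polling IsSet in a tight loop, this version uses a short Wait timeout, which reports the same state without spinning.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CountdownExample
{
    // Hypothetical helper: starts n producer tasks, handles every item on the
    // calling thread, and returns the number of items handled.
    public static int Run(int n, int itemsPerTask, Func<Task<int>> fetchItemAsync, Action<int> handleItem)
    {
        var items = new ConcurrentQueue<int>();
        var handled = 0;

        using (var countdown = new CountdownEvent(n))
        {
            for (var i = 0; i < n; i++)
            {
                Task.Run(async () =>
                {
                    try
                    {
                        for (var j = 0; j < itemsPerTask; j++)
                        {
                            items.Enqueue(await fetchItemAsync());
                        }
                    }
                    finally
                    {
                        // Signal at the real end of the task, after the last await.
                        countdown.Signal();
                    }
                });
            }

            while (true)
            {
                // Check the event before draining: if every producer had already
                // signalled, the drain below is guaranteed to see all items.
                var producersDone = countdown.Wait(10);
                while (items.TryDequeue(out var item))
                {
                    handleItem(item); // called from this thread only
                    handled++;
                }
                if (producersDone)
                {
                    break;
                }
            }
        }
        return handled;
    }
}
```

Checking the event before the drain, rather than after it, avoids the case where a producer enqueues its last item and signals between the drain and the check.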