C# multiple asynchronous HttpRequest with one callback - c#

I want to make 10 asynchronous http requests at once and only process the results when all have completed and in a single callback function. I also do not want to block any threads using WaitAll (it is my understanding that WaitAll blocks until all are complete). I think I want to make a custom IAsyncResult which will handle multiple calls. Am I on the right track? Are there any good resources or examples out there that describe handling this?

I like Darin's solution. But, if you want something more traditional, you can try this.
I would definitely use an array of wait handles and the WaitAll mechanism:
static void Main(string[] args)
{
WaitCallback del = state =>
{
ManualResetEvent[] resetEvents = new ManualResetEvent[10];
WebClient[] clients = new WebClient[10];
Console.WriteLine("Starting requests");
for (int index = 0; index < 10; index++)
{
resetEvents[index] = new ManualResetEvent(false);
clients[index] = new WebClient();
clients[index].OpenReadCompleted += new OpenReadCompletedEventHandler(client_OpenReadCompleted);
clients[index].OpenReadAsync(new Uri(#"http:\\www.google.com"), resetEvents[index]);
}
bool succeeded = ManualResetEvent.WaitAll(resetEvents, 10000);
Complete(succeeded);
for (int index = 0; index < 10; index++)
{
resetEvents[index].Dispose();
clients[index].Dispose();
}
};
ThreadPool.QueueUserWorkItem(del);
Console.WriteLine("Waiting...");
Console.ReadKey();
}
static void client_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
// Do something with data...Then close the stream
e.Result.Close();
ManualResetEvent readCompletedEvent = (ManualResetEvent)e.UserState;
readCompletedEvent.Set();
Console.WriteLine("Received callback");
}
static void Complete(bool succeeded)
{
if (succeeded)
{
Console.WriteLine("Yeah!");
}
else
{
Console.WriteLine("Boohoo!");
}
}

In .NET 4.0 there's a nice parallel Task library that allows you to do things like:
using System;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
class Program
{
public static void Main()
{
var urls = new[] { "http://www.google.com", "http://www.yahoo.com" };
Task.Factory.ContinueWhenAll(
urls.Select(url => Task.Factory.StartNew(u =>
{
using (var client = new WebClient())
{
return client.DownloadString((string)u);
}
}, url)).ToArray(),
tasks =>
{
var results = tasks.Select(t => t.Result);
foreach (var html in results)
{
Console.WriteLine(html);
}
});
Console.ReadLine();
}
}
As you can see for each url in the list a different task is started and once all tasks are completed the callback is invoked and passed the result of all tasks.

I think you are better off using the WaitAll approach. Otherwise you will be processing 10 IAsyncResult callbacks, and using a semaphore to determine that all 10 are finally complete.
Keep in mind that WaitAll is very efficient; it is not like the silliness of having a thread "sleep." When a thread sleeps, it continues to use processing time. When a thread is "de-scheduled" because it hit a WaitAll, then the thread no longer consumes any processor time. It is very efficient.

Related

Limit the number of cores: new Task will block until core free from other tasks [duplicate]

I have a collection of 1000 input message to process. I'm looping the input collection and starting the new task for each message to get processed.
//Assume this messages collection contains 1000 items
var messages = new List<string>();
foreach (var msg in messages)
{
Task.Factory.StartNew(() =>
{
Process(msg);
});
}
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time?
How to ensure this message get processed in the same sequence/order of the Collection?
You could use Parallel.Foreach and rely on MaxDegreeOfParallelism instead.
Parallel.ForEach(messages, new ParallelOptions {MaxDegreeOfParallelism = 10},
msg =>
{
// logic
Process(msg);
});
SemaphoreSlim is a very good solution in this case and I higly recommend OP to try this, but #Manoj's answer has flaw as mentioned in comments.semaphore should be waited before spawning the task like this.
Updated Answer: As #Vasyl pointed out Semaphore may be disposed before completion of tasks and will raise exception when Release() method is called so before exiting the using block must wait for the completion of all created Tasks.
int maxConcurrency=10;
var messages = new List<string>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach(var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Answer to Comments
for those who want to see how semaphore can be disposed without Task.WaitAll
Run below code in console app and this exception will be raised.
System.ObjectDisposedException: 'The semaphore has been disposed.'
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
// Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static void Process(string msg)
{
Thread.Sleep(2000);
Console.WriteLine(msg);
}
I think it would be better to use Parallel LINQ
Parallel.ForEach(messages ,
new ParallelOptions{MaxDegreeOfParallelism = 4},
x => Process(x);
);
where x is the MaxDegreeOfParallelism
With .NET 5.0 and Core 3.0 channels were introduced.
The main benefit of this producer/consumer concurrency pattern is that you can also limit the input data processing to reduce resource impact.
This is especially helpful when processing millions of data records.
Instead of reading the whole dataset at once into memory, you can now consecutively query only chunks of the data and wait for the workers to process it before querying more.
Code sample with a queue capacity of 50 messages and 5 consumer threads:
/// <exception cref="System.AggregateException">Thrown on Consumer Task exceptions.</exception>
public static async Task ProcessMessages(List<string> messages)
{
const int producerCapacity = 10, consumerTaskLimit = 3;
var channel = Channel.CreateBounded<string>(producerCapacity);
_ = Task.Run(async () =>
{
foreach (var msg in messages)
{
await channel.Writer.WriteAsync(msg);
// blocking when channel is full
// waiting for the consumer tasks to pop messages from the queue
}
channel.Writer.Complete();
// signaling the end of queue so that
// WaitToReadAsync will return false to stop the consumer tasks
});
var tokenSource = new CancellationTokenSource();
CancellationToken ct = tokenSource.Token;
var consumerTasks = Enumerable
.Range(1, consumerTaskLimit)
.Select(_ => Task.Run(async () =>
{
try
{
while (await channel.Reader.WaitToReadAsync(ct))
{
ct.ThrowIfCancellationRequested();
while (channel.Reader.TryRead(out var message))
{
await Task.Delay(500);
Console.WriteLine(message);
}
}
}
catch (OperationCanceledException) { }
catch
{
tokenSource.Cancel();
throw;
}
}))
.ToArray();
Task waitForConsumers = Task.WhenAll(consumerTasks);
try { await waitForConsumers; }
catch
{
foreach (var e in waitForConsumers.Exception.Flatten().InnerExceptions)
Console.WriteLine(e.ToString());
throw waitForConsumers.Exception.Flatten();
}
}
As pointed out by Theodor Zoulias:
On multiple consumer exceptions, the remaining tasks will continue to run and have to take the load of the killed tasks. To avoid this, I implemented a CancellationToken to stop all the remaining tasks and handle the exceptions combined in the AggregateException of waitForConsumers.Exception.
Side note:
The Task Parallel Library (TPL) might be good at automatically limiting the tasks based on your local resources. But when you are processing data remotely via RPC, it's necessary to manually limit your RPC calls to avoid filling the network/processing stack!
If your Process method is async you can't use Task.Factory.StartNew as it doesn't play well with an async delegate. Also there are some other nuances when using it (see this for example).
The proper way to do it in this case is to use Task.Run. Here's #ClearLogic answer modified for an async Process method.
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Run(async () =>
{
try
{
await Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static async Task Process(string msg)
{
await Task.Delay(2000);
Console.WriteLine(msg);
}
You can create your own TaskScheduler and override QueueTask there.
protected virtual void QueueTask(Task task)
Then you can do anything you like.
One example here:
Limited concurrency level task scheduler (with task priority) handling wrapped tasks
You can simply set the max concurrency degree like this way:
int maxConcurrency=10;
var messages = new List<1000>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
foreach(var msg in messages)
{
Task.Factory.StartNew(() =>
{
concurrencySemaphore.Wait();
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
}
}
If you need in-order queuing (processing might finish in any order), there is no need for a semaphore. Old fashioned if statements work fine:
const int maxConcurrency = 5;
List<Task> tasks = new List<Task>();
foreach (var arg in args)
{
var t = Task.Run(() => { Process(arg); } );
tasks.Add(t);
if(tasks.Count >= maxConcurrency)
Task.WaitAny(tasks.ToArray());
}
Task.WaitAll(tasks.ToArray());
I ran into a similar problem where I wanted to produce 5000 results while calling apis, etc. So, I ran some speed tests.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id).GetAwaiter().GetResult();
});
produced 100 results in 30 seconds.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id);
});
Moving the GetAwaiter().GetResult() to the individual async api calls inside GetProductMetaData resulted in 14.09 seconds to produce 100 results.
foreach (var id in ids.Take(100))
{
GetProductMetaData(productsMetaData, client, id);
}
Complete non-async programming with the GetAwaiter().GetResult() in api calls resulted in 13.417 seconds.
var tasks = new List<Task>();
while (y < ids.Count())
{
foreach (var id in ids.Skip(y).Take(100))
{
tasks.Add(GetProductMetaData(productsMetaData, client, id));
}
y += 100;
Task.WhenAll(tasks).GetAwaiter().GetResult();
Console.WriteLine($"Finished {y}, {sw.Elapsed}");
}
Forming a task list and working through 100 at a time resulted in a speed of 7.36 seconds.
using (SemaphoreSlim cons = new SemaphoreSlim(10))
{
var tasks = new List<Task>();
foreach (var id in ids.Take(100))
{
cons.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
GetProductMetaData(productsMetaData, client, id);
}
finally
{
cons.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Using SemaphoreSlim resulted in 13.369 seconds, but also took a moment to boot to start using it.
var throttler = new SemaphoreSlim(initialCount: take);
foreach (var id in ids)
{
throttler.WaitAsync().GetAwaiter().GetResult();
tasks.Add(Task.Run(async () =>
{
try
{
skip += 1;
await GetProductMetaData(productsMetaData, client, id);
if (skip % 100 == 0)
{
Console.WriteLine($"started {skip}/{count}, {sw.Elapsed}");
}
}
finally
{
throttler.Release();
}
}));
}
Using Semaphore Slim with a throttler for my async task took 6.12 seconds.
The answer for me in this specific project was use a throttler with Semaphore Slim. Although the while foreach tasklist did sometimes beat the throttler, 4/6 times the throttler won for 1000 records.
I realize I'm not using the OPs code, but I think this is important and adds to this discussion because how is sometimes not the only question that should be asked, and the answer is sometimes "It depends on what you are trying to do."
Now to answer the specific questions:
How to limit the maximum number of parallel tasks in c#: I showed how to limit the number of tasks that are completed at a time.
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time? I cannot guess how many will be processed at a time unless I set an upper limit but I can set an upper limit. Obviously different computers function at different speeds due to CPU, RAM etc. and how many threads and cores the program itself has access to as well as other programs running in tandem on the same computer.
How to ensure this message get processed in the same sequence/order of the Collection? If you want to process everything in a specific order, it is synchronous programming. The point of being able to run things asynchronously is ensuring that they can do everything without an order. As you can see from my code, the time difference is minimal in 100 records unless you use async code. In the event that you need an order to what you are doing, use asynchronous programming up until that point, then await and do things synchronously from there. For example, task1a.start, task2a.start, then later task1a.await, task2a.await... then later task1b.start task1b.await and task2b.start task 2b.await.
public static void RunTasks(List<NamedTask> importTaskList)
{
List<NamedTask> runningTasks = new List<NamedTask>();
try
{
foreach (NamedTask currentTask in importTaskList)
{
currentTask.Start();
runningTasks.Add(currentTask);
if (runningTasks.Where(x => x.Status == TaskStatus.Running).Count() >= MaxCountImportThread)
{
Task.WaitAny(runningTasks.ToArray());
}
}
Task.WaitAll(runningTasks.ToArray());
}
catch (Exception ex)
{
Log.Fatal("ERROR!", ex);
}
}
you can use the BlockingCollection, If the consume collection limit has reached, the produce will stop producing until a consume process will finish. I find this pattern more easy to understand and implement than the SemaphoreSlim.
int TasksLimit = 10;
BlockingCollection<Task> tasks = new BlockingCollection<Task>(new ConcurrentBag<Task>(), TasksLimit);
void ProduceAndConsume()
{
var producer = Task.Factory.StartNew(RunProducer);
var consumer = Task.Factory.StartNew(RunConsumer);
try
{
Task.WaitAll(new[] { producer, consumer });
}
catch (AggregateException ae) { }
}
void RunConsumer()
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Start();
}
}
void RunProducer()
{
for (int i = 0; i < 1000; i++)
{
tasks.Add(new Task(() => Thread.Sleep(1000), TaskCreationOptions.AttachedToParent));
}
}
Note that the RunProducer and RunConsumer has spawn two independent tasks.

How can I check whether this interval is even running?

I've set a bunch of Console.WriteLines and as far as I can tell none of them are being invoked when I run the following in .NET Fiddle.
using System;
using System.Net;
using System.Linq.Expressions;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Timers;
using System.Collections.Generic;
public class Program
{
private static readonly object locker = new object();
private static readonly string pageFormat = "http://www.letsrun.com/forum/forum.php?board=1&page={0}";
public static void Main()
{
var client = new WebClient();
// Queue up the requests we are going to make
var tasks = new Queue<Task<string>>(
Enumerable
.Repeat(0,50)
.Select(i => new Task<string>(() => client.DownloadString(string.Format(pageFormat,i))))
);
// Create set of 5 tasks which will be the at most 5
// requests we wait on
var runningTasks = new HashSet<Task<string>>();
for(int i = 0; i < 5; ++i)
{
runningTasks.Add(tasks.Dequeue());
}
var timer = new System.Timers.Timer
{
AutoReset = true,
Interval = 2000
};
// On each tick, go through the tasks that are supposed
// to have started running and if they have completed
// without error then store their result and run the
// next queued task if there is one. When we run out of
// any more tasks to run or wait for, stop the ticks.
timer.Elapsed += delegate
{
lock(locker)
{
foreach(var task in runningTasks)
{
if(task.IsCompleted)
{
if(!task.IsFaulted)
{
Console.WriteLine("Got a document: {0}",
task.Result.Substring(Math.Min(30, task.Result.Length)));
runningTasks.Remove(task);
if(tasks.Any())
{
runningTasks.Add(tasks.Dequeue());
}
}
else
{
Console.WriteLine("Uh-oh, task faulted, apparently");
}
}
else if(!task.Status.Equals(TaskStatus.Running)) // task not started
{
Console.WriteLine("About to start a task.");
task.Start();
}
else
{
Console.WriteLine("Apparently a task is running.");
}
}
if(!runningTasks.Any())
{
timer.Stop();
}
}
};
}
}
I'd also appreciate advice on how I can simplify or fix any faulty logic in this. The pattern I'm trying to do is like
(1) Create a queueu of N tasks
(2) Create a set of M tasks, the first M dequeued items from (1)
(3) Start the M tasks running
(4) After X seconds, check for completed tasks.
(5) For any completed task, do something with the result, remove the task from the set and replace it with another task from the queue (if any are left in the queueu).
(6) Repeat (4)-(5) indefinitely.
(7) If the set has no tasks left, we're done.
but perhaps there's a better way to implement it, or perhaps there's some .NET function that easily encapsulates what I'm trying to do (web requests in parallel with a specified max degree of parallelism).
There are several issues in your code, but since you are looking for better way to implement it - you can use Parallel.For or Parallel.ForEach:
Parallel.For(0, 50, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, (i) =>
{
// surround with try-catch
string result;
using (var client = new WebClient()) {
result = client.DownloadString(string.Format(pageFormat, i));
}
// do something with result
Console.WriteLine("Got a document: {0}", result.Substring(Math.Min(30, result.Length)));
});
It will execute the body in parallel (not more than 5 tasks at any given time). When one task is completed - next one is started, until they are all done, just like you want.
UPDATE. There are several waits to throttle tasks with this approach, but the most straightforward is just sleep:
Parallel.For(0, 50, new ParallelOptions() { MaxDegreeOfParallelism = 5 },
(i) =>
{
// surround with try-catch
var watch = Stopwatch.StartNew();
string result;
using (var client = new WebClient()) {
result = client.DownloadString(string.Format(pageFormat, i));
}
// do something with result
Console.WriteLine("Got a document: {0}", result.Substring(Math.Min(30, result.Length)));
watch.Stop();
var sleep = 2000 - watch.ElapsedMilliseconds;
if (sleep > 0)
Thread.Sleep((int)sleep);
});
This isn't a direct answer to your question. I just wanted to suggest an alternative approach.
I'd recommend that you look into using Microsoft's Reactive Framework (NuGet "System.Reactive") for doing this kind of thing.
You could then do something like this:
var query =
Observable
.Range(0, 50)
.Select(i => string.Format(pageFormat, i))
.Select(u => Observable.Using(
() => new WebClient(),
wc => Observable.Start(() => new { url = u, content = wc.DownloadString(u) })))
.Merge(5);
IDisposable subscription = query.Subscribe(x =>
{
Console.WriteLine(x.url);
Console.WriteLine(x.content);
});
It's all async and the process can be stopped at any time by calling subscription.Dispose();

Best practices for implementing a thread to do fast, bulk, and continuous reading in C#?

How should the reading of bulk data from a device in C# be handled in .NET 4.0? Specifically I need to read quickly from a USB HID device that emits reports over 26 packets where order must be preserved.
I've tried doing this in a BackgroundWorker thread. It reads one packet from the device at a time, and process it, before reading more. This gives reasonably good response times, but it is liable to lose a packet here and there, and the overhead costs of a single packet read adds up.
while (!( sender as BackgroundWorker ).CancellationPending) {
//read a single packet
//check for header or footer
//process packet data
}
}
What is the best practice in C# for reading a device like this?
Background:
My USB HID device continuously reports a large amount of data. The data is split over 26 packets and I must preserver the order. Unfortunately the device only marks the first the last packets in each report, so I need to be able to catch every other packet in between.
For .Net 4 you can use a BlockingCollection to provide a threadsafe queue that can be used by a producer and a consumer. The BlockingCollection.GetConsumingEnumerable() method provides an enumerator which automatically terminates when the queue has been marked as completed using CompleteAdding() and is empty.
Here's some sample code. The payload is an array of ints in this example, but of course you would use whatever data type you need.
Note that for your specific example, you can use the overload of GetConsumingEnumerable() which accepts an argument of type CancellationToken.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
public static class Program
{
private static void Main()
{
var queue = new BlockingCollection<int[]>();
Task.Factory.StartNew(() => produce(queue));
consume(queue);
Console.WriteLine("Finished.");
}
private static void consume(BlockingCollection<int[]> queue)
{
foreach (var item in queue.GetConsumingEnumerable())
{
Console.WriteLine("Consuming " + item[0]);
Thread.Sleep(25);
}
}
private static void produce(BlockingCollection<int[]> queue)
{
for (int i = 0; i < 1000; ++i)
{
Console.WriteLine("Producing " + i);
var payload = new int[100];
payload[0] = i;
queue.Add(payload);
Thread.Sleep(20);
}
queue.CompleteAdding();
}
}
}
For .Net 4.5 and later, you could use the higher-level classes from Microsoft's Task Parallel Library, which has a wealth of functionality (and can be somewhat daunting at first sight).
Here's the same example using TPL DataFlow:
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace Demo
{
public static class Program
{
private static void Main()
{
var queue = new BufferBlock<int[]>();
Task.Factory.StartNew(() => produce(queue));
consume(queue).Wait();
Console.WriteLine("Finished.");
}
private static async Task consume(BufferBlock<int[]> queue)
{
while (await queue.OutputAvailableAsync())
{
var payload = await queue.ReceiveAsync();
Console.WriteLine("Consuming " + payload[0]);
await Task.Delay(25);
}
}
private static void produce(BufferBlock<int[]> queue)
{
for (int i = 0; i < 1000; ++i)
{
Console.WriteLine("Producing " + i);
var payload = new int[100];
payload[0] = i;
queue.Post(payload);
Thread.Sleep(20);
}
queue.Complete();
}
}
}
If missing packets is a concern do not do your processing and your reading on the same thread. Starting with .NET 4.0 they added the System.Collections.Concurrent namespace which makes this very easy to do. All you need is a BlockingCollection which behaves as a queue for your incoming packets.
BlockingCollection<Packet> _queuedPackets = new BlockingCollection<Packet>(new ConcurrentQueue<Packet>());
void readingBackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
while (!( sender as BackgroundWorker ).CancellationPending)
{
Packet packet = GetPacket();
_queuedPackets.Add(packet);
}
_queuedPackets.CompleteAdding();
}
void processingBackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
List<Packet> report = new List<Packet>();
foreach(var packet in _queuedPackets.GetConsumingEnumerable())
{
report.Add(packet);
if(packet.IsLastPacket)
{
ProcessReport(report);
report = new List<Packet>();
}
}
}
What will happen is while _queuedPackets is empty _queuedPackets.GetConsumingEnumerable() will block the thread not consuming any resources. As soon as a packet arrives it will unblock and do the next iteration of the foreach.
When you call _queuedPackets.CompleteAdding(); the foreach on your processing thread will run till the collection is empty then exit the foreach loop. If you don't want it to "finish up the queue" when you cancel you can easily change it up to quit early. I also am going to switch to using Tasks instead of Background workers because it makes the passing in parameters much easier to do.
void ReadingLoop(BlockingCollection<Packet> queue, CancellationToken token)
{
while (!token.IsCancellationRequested)
{
Packet packet = GetPacket();
queue.Add(packet);
}
queue.CompleteAdding();
}
void ProcessingLoop(BlockingCollection<Packet> queue, CancellationToken token)
{
List<Packet> report = new List<Packet>();
try
{
foreach(var packet in queue.GetConsumingEnumerable(token))
{
report.Add(packet);
if(packet.IsLastPacket)
{
ProcessReport(report);
report = new List<Packet>();
}
}
}
catch(OperationCanceledException)
{
//Do nothing, we don't care that it happened.
}
}
//This would replace your backgroundWorker.RunWorkerAsync() calls;
private void StartUpLoops()
{
var queue = new BlockingCollection<Packet>(new ConcurrentQueue<Packet>());
var cancelRead = new CancellationTokenSource();
var cancelProcess = new CancellationTokenSource();
Task.Factory.StartNew(() => ReadingLoop(queue, cancelRead.Token));
Task.Factory.StartNew(() => ProcessingLoop(queue, cancelProcess.Token));
//You can stop each loop indpendantly by calling cancelRead.Cancel() or cancelProcess.Cancel()
}

Ensure a long running task is only fired once and subsequent request are queued but with only one entry in the queue

I have a compute intensive method Calculate that may run for a few seconds, requests come from multiple threads.
Only one Calculate should be executing, a subsequent request should be queued until the initial request completes. If there is already a request queued then the the subsequent request can be discarded (as the queued request will be sufficient)
There seems to be lots of potential solutions but I just need the simplest.
UPDATE: Here's my rudimentaryattempt:
private int _queueStatus;
private readonly object _queueStatusSync = new Object();
public void Calculate()
{
lock(_queueStatusSync)
{
if(_queueStatus == 2) return;
_queueStatus++;
if(_queueStatus == 2) return;
}
for(;;)
{
CalculateImpl();
lock(_queueStatusSync)
if(--_queueStatus == 0) return;
}
}
private void CalculateImpl()
{
// long running process will take a few seconds...
}
The simplest, cleanest solution IMO is using TPL Dataflow (as always) with a BufferBlock acting as the queue. BufferBlock is thread-safe, supports async-await, and more important, has TryReceiveAll to get all the items at once. It also has OutputAvailableAsync so you can wait asynchronously for items to be posted to the buffer. When multiple requests are posted you simply take the last and forget about the rest:
var buffer = new BufferBlock<Request>();
var task = Task.Run(async () =>
{
while (await buffer.OutputAvailableAsync())
{
IList<Request> requests;
buffer.TryReceiveAll(out requests);
Calculate(requests.Last());
}
});
Usage:
buffer.Post(new Request());
buffer.Post(new Request());
Edit: If you don't have any input or output for the Calculate method you can simply use a boolean to act as a switch. If it's true you can turn it off and calculate, if it became true again while Calculate was running then calculate again:
public bool _shouldCalculate;
public void Producer()
{
_shouldCalculate = true;
}
public async Task Consumer()
{
while (true)
{
if (!_shouldCalculate)
{
await Task.Delay(1000);
}
else
{
_shouldCalculate = false;
Calculate();
}
}
}
A BlockingCollection that only takes 1 at a time
The trick is to skip if there are any items in the collection
I would go with the answer from I3aron +1
This is (maybe) a BlockingCollection solution
public static void BC_AddTakeCompleteAdding()
{
using (BlockingCollection<int> bc = new BlockingCollection<int>(1))
{
// Spin up a Task to populate the BlockingCollection
using (Task t1 = Task.Factory.StartNew(() =>
{
for (int i = 0; i < 100; i++)
{
if (bc.TryAdd(i))
{
Debug.WriteLine(" add " + i.ToString());
}
else
{
Debug.WriteLine(" skip " + i.ToString());
}
Thread.Sleep(30);
}
bc.CompleteAdding();
}))
{
// Spin up a Task to consume the BlockingCollection
using (Task t2 = Task.Factory.StartNew(() =>
{
try
{
// Consume consume the BlockingCollection
while (true)
{
Debug.WriteLine("take " + bc.Take());
Thread.Sleep(100);
}
}
catch (InvalidOperationException)
{
// An InvalidOperationException means that Take() was called on a completed collection
Console.WriteLine("That's All!");
}
}))
Task.WaitAll(t1, t2);
}
}
}
It sounds like a classic producer-consumer. I'd recommend looking into BlockingCollection<T>. It is part of the System.Collection.Concurrent namespace. On top of that you can implement your queuing logic.
You may supply to a BlockingCollection any internal structure to hold its data, such as a ConcurrentBag<T>, ConcurrentQueue<T> etc. The latter is the default structure used.

Run x number of web requests simultaneously

Our company has a web service which I want to send XML files (stored on my drive) via my own HTTPWebRequest client in C#. This already works. The web service supports 5 synchronuous requests at the same time (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes for each request.
Throwing too many requests (> 5) results in timeouts for my client. Also, this can lead to errors on the server side and incoherent data. Making changes on the server side is not an option (from different vendor).
Right now, my Webrequest client will send the XML and wait for the response using result.AsyncWaitHandle.WaitOne();
However, this way, only one request can be processed at a time although the web service supports 5. I tried using a Backgroundworker and Threadpool but they create too many requests at same, which make them useless to me. Any suggestion, how one could solve this problem? Create my own Threadpool with exactly 5 threads? Any suggestions, how to implement this?
The easy way is to create 5 threads ( aside: that's an odd number! ) that consume the xml files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();
for ( int i = 0 ; i < 5 ; i++ )
{
new Thread( () =>
{
foreach ( var xml in bc.GetConsumingEnumerable() )
{
// do work
}
}
).Start();
}
bc.Add( xml_1 );
bc.Add( xml_2 );
...
bc.CompleteAdding(); // threads will end when queue is exhausted
If you're on .Net 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which means you are guaranteed that no more items are processed at one time.
Parallel.ForEach(items,
new ParallelOptions { MaxDegreeOfParallelism = 5 },
ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
Creating your own threadpool of five threads isn't tricky - Just create a concurrent queue of objects describing the request to make, and have five threads that loop through performing the task as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling.
It can though be tricky to return the response to the correct calling thread. If this is the case for how the rest of your code works, I'd take a different approach and create a limiter that acts a bit like a monitor but allowing 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
private static AutoResetEvent _are = new AutoResetEvent(false);
private static int _reqCnt = 0;
public ResponseObject DoRequest(RequestObject req)
{
for(;;)
{
if(Interlocked.Increment(ref _reqCnt) <= 5)
{
//code to create response object "resp".
Interlocked.Decrement(ref _reqCnt);
_are.Set();
return resp;
}
else
{
if(Interlocked.Decrement(ref _reqCnt) >= 5)//test so we don't end up waiting due to race on decrementing from finished thread.
_are.WaitOne();
}
}
}
}
You could write a little helper method, that would block the current thread until all the threads have finished executing the given action delegate.
static void SpawnThreads(int count, Action action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
new Thread(() =>
{
action();
countdown.Signal();
}).Start();
}
countdown.Wait();
}
And then use a BlockingCollection<string> (thread-safe collection), to keep track of your xml files. By using the helper method above, you could write something like:
static void Main(string[] args)
{
var xmlFiles = new BlockingCollection<string>();
// Add some xml files....
SpawnThreads(5, () =>
{
using (var web = new WebClient())
{
web.UploadFile(xmlFiles.Take());
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
Update
An even better approach would be to upload the files async, so that you don't waste resources on using threads for an IO task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
action(countdown);
}
countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
var urlXML = new BlockingCollection<Tuple<string, string>>();
urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
// Add some more to collection...
SpawnAsyncs(5, c =>
{
using (var web = new WebClient())
{
var current = urlXML.Take();
web.UploadFileCompleted += (s, e) =>
{
// some code to mess with e.Result (response)
c.Signal();
};
web.UploadFileAsyncAsync(new Uri(current.Item1), current.Item2);
}
});
Console.WriteLine("Done");
Console.ReadKey();
}

Categories

Resources