Check open ports for a list of IPs from a file, in parallel - C#

I want to check whether a port is open on a group of IPs. I do not want to check the IPs one after another sequentially; I want to check them in parallel, with the user determining the number of threads. The IPs are divided among the threads:
private async void button3_Click(object sender, EventArgs e)
{
    await Task.Run(() =>
    {
        var lines = File.ReadAllLines("ip.txt");
        int cc = lines.Length / int.Parse(thread.Text);
        if (lines.Length % int.Parse(thread.Text) == 0)
        {
            for (int s = 0; s < lines.Length; s = s + cc)
            {
                Parallel.For(0, 1, a =>
                {
                    checkopen(s, s + cc);
                });
            }
        }
        else
        {
            MessageBox.Show("enter true thread num");
        }
    });
}
This is the method that checks whether the port is open:
void checkopen(int first, int last)
{
    int port = Convert.ToInt32(portnum.Text);
    var lines = File.ReadAllLines("ip.txt");
    for (int i = first; i < last; ++i)
    {
        var line = lines[i];
        using (TcpClient tcpClient = new TcpClient())
        {
            try
            {
                tcpClient.Connect(line, port);
                this.Invoke(new Action(() =>
                {
                    listBox1.Items.Add(line); // open port
                }));
            }
            catch (Exception)
            {
                this.Invoke(new Action(() =>
                {
                    listBox2.Items.Add(line); // closed port
                }));
            }
        }
    }
}

I see this problem every day.
There is no point putting IO-bound operations in Parallel.For; it's not designed for IO work or for the async pattern (and trust me, that's what you want).
Why do we want the async/await pattern?
Because it is designed to give threads back when it's awaiting an IO completion port or an awaitable workload.
When you run IO work in Parallel.For/Parallel.ForEach like this, the task scheduler just won't give you threads to block; it uses all sorts of heuristics to work out how many threads you should have, and it takes a dim view of blocking.
So what should we use?
The async and await pattern.
Why?
Because we can let IO be IO: the system creates an IO completion port, .NET gives the thread back to the thread pool until the completion port calls back, and the method continues.
So, there are many options for this. But first and foremost, await the awaitable async methods of the libraries you are using.
From here you can either create a list of tasks and use something like an awaitable SemaphoreSlim to limit concurrency followed by a WhenAll (see the sketch just below),
or you could use something like an ActionBlock out of TPL Dataflow, which is designed to work with both CPU- and IO-bound workloads.
The real-world benefits can't be overstated. Your Parallel.For approach will just run a handful of threads and block them. With an async version you'll be able to run hundreds simultaneously.
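For the question's port check, a minimal sketch of the SemaphoreSlim + WhenAll option could look like this (assumptions: the same WinForms controls and ip.txt as in the question; TcpClient.ConnectAsync is the awaitable counterpart of Connect, and since we start on the UI thread the continuations return to it, so no Invoke is needed):
private async void button3_Click(object sender, EventArgs e)
{
    int port = Convert.ToInt32(portnum.Text);
    int maxConcurrency = int.Parse(thread.Text); // the user-chosen limit
    var throttler = new SemaphoreSlim(maxConcurrency);
    var tasks = File.ReadAllLines("ip.txt").Select(async ip =>
    {
        await throttler.WaitAsync(); // at most maxConcurrency checks in flight
        try
        {
            using (var tcpClient = new TcpClient())
            {
                await tcpClient.ConnectAsync(ip, port);
                listBox1.Items.Add(ip); // open port
            }
        }
        catch (SocketException)
        {
            listBox2.Items.Add(ip); // closed port
        }
        finally
        {
            throttler.Release();
        }
    }).ToList();
    await Task.WhenAll(tasks);
}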
Dataflow example
You can get the NuGet package (System.Threading.Tasks.Dataflow) from nuget.org.
public async Task DoWorkLoads(List<WorkLoad> workloads)
{
    var options = new ExecutionDataflowBlockOptions
    {
        // add pepper and salt to taste
        MaxDegreeOfParallelism = 100,
        EnsureOrdered = false
    };
    // create an action block
    var block = new ActionBlock<WorkLoad>(MyMethodAsync, options);
    // queue them up
    foreach (var workLoad in workloads)
        block.Post(workLoad);
    // wait for them to finish
    block.Complete();
    await block.Completion;
}
...
// Notice we are using the async / await pattern
public async Task MyMethodAsync(WorkLoad workLoad)
{
    try
    {
        Console.WriteLine("Doing some IO work async");
        await DoIoWorkAsync(); // placeholder for your actual awaitable IO call
    }
    catch (Exception)
    {
        // probably best to add some error checking somehow
    }
}
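Adapted to the port check in the question, the work item is just the IP string (a sketch, assuming the same WinForms controls as above; Invoke is needed here because the block runs on thread-pool threads):
int port = Convert.ToInt32(portnum.Text);
var block = new ActionBlock<string>(async ip =>
{
    using (var tcpClient = new TcpClient())
    {
        try
        {
            await tcpClient.ConnectAsync(ip, port);
            this.Invoke(new Action(() => listBox1.Items.Add(ip))); // open port
        }
        catch (SocketException)
        {
            this.Invoke(new Action(() => listBox2.Items.Add(ip))); // closed port
        }
    }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 100 });

foreach (var line in File.ReadAllLines("ip.txt"))
    block.Post(line);
block.Complete();
await block.Completion;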

Related

Limit the number of cores: new Task will block until core free from other tasks [duplicate]

I have a collection of 1000 input messages to process. I'm looping over the input collection and starting a new task for each message to get it processed.
// Assume this messages collection contains 1000 items
var messages = new List<string>();
foreach (var msg in messages)
{
    Task.Factory.StartNew(() =>
    {
        Process(msg);
    });
}
Can we guess the maximum number of messages that will be processed simultaneously (assuming a normal quad-core processor), or can we limit the maximum number of messages processed at a time?
How can we ensure that the messages get processed in the same sequence/order as the collection?
You could use Parallel.ForEach and rely on MaxDegreeOfParallelism instead.
Parallel.ForEach(messages, new ParallelOptions { MaxDegreeOfParallelism = 10 },
    msg =>
    {
        // logic
        Process(msg);
    });
SemaphoreSlim is a very good solution in this case and I highly recommend the OP to try this, but @Manoj's answer has a flaw, as mentioned in the comments: the semaphore should be waited on before spawning the task, like this.
Updated answer: as @Vasyl pointed out, the semaphore may be disposed before the tasks complete and will raise an exception when the Release() method is called, so before exiting the using block you must wait for the completion of all created tasks.
int maxConcurrency = 10;
var messages = new List<string>();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
    List<Task> tasks = new List<Task>();
    foreach (var msg in messages)
    {
        concurrencySemaphore.Wait();
        var t = Task.Factory.StartNew(() =>
        {
            try
            {
                Process(msg);
            }
            finally
            {
                concurrencySemaphore.Release();
            }
        });
        tasks.Add(t);
    }
    Task.WaitAll(tasks.ToArray());
}
Answer to comments:
For those who want to see how the semaphore can be disposed without Task.WaitAll, run the code below in a console app and this exception will be raised:
System.ObjectDisposedException: 'The semaphore has been disposed.'
static void Main(string[] args)
{
    int maxConcurrency = 5;
    List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
    using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
    {
        List<Task> tasks = new List<Task>();
        foreach (var msg in messages)
        {
            concurrencySemaphore.Wait();
            var t = Task.Factory.StartNew(() =>
            {
                try
                {
                    Process(msg);
                }
                finally
                {
                    concurrencySemaphore.Release();
                }
            });
            tasks.Add(t);
        }
        // Task.WaitAll(tasks.ToArray());
    }
    Console.WriteLine("Exited using block");
    Console.ReadKey();
}

private static void Process(string msg)
{
    Thread.Sleep(2000);
    Console.WriteLine(msg);
}
I think it would be better to use Parallel.ForEach with a bounded degree of parallelism:
Parallel.ForEach(messages,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    x => Process(x));
where MaxDegreeOfParallelism = 4 caps how many messages are processed concurrently.
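If you actually want Parallel LINQ (PLINQ), a rough equivalent sketch would be:
// PLINQ sketch: WithDegreeOfParallelism caps the number of concurrent workers.
messages.AsParallel()
    .WithDegreeOfParallelism(4)
    .ForAll(x => Process(x));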
With .NET Core 3.0, channels were introduced (System.Threading.Channels, also available as a NuGet package for earlier targets).
The main benefit of this producer/consumer concurrency pattern is that you can also limit the input data processing to reduce resource impact.
This is especially helpful when processing millions of data records.
Instead of reading the whole dataset at once into memory, you can now consecutively query only chunks of the data and wait for the workers to process it before querying more.
Code sample with a bounded queue capacity of 10 messages and 3 consumer tasks:
/// <exception cref="System.AggregateException">Thrown on Consumer Task exceptions.</exception>
public static async Task ProcessMessages(List<string> messages)
{
    const int producerCapacity = 10, consumerTaskLimit = 3;
    var channel = Channel.CreateBounded<string>(producerCapacity);
    _ = Task.Run(async () =>
    {
        foreach (var msg in messages)
        {
            await channel.Writer.WriteAsync(msg);
            // blocking when channel is full,
            // waiting for the consumer tasks to pop messages from the queue
        }
        channel.Writer.Complete();
        // signaling the end of the queue so that
        // WaitToReadAsync will return false to stop the consumer tasks
    });
    var tokenSource = new CancellationTokenSource();
    CancellationToken ct = tokenSource.Token;
    var consumerTasks = Enumerable
        .Range(1, consumerTaskLimit)
        .Select(_ => Task.Run(async () =>
        {
            try
            {
                while (await channel.Reader.WaitToReadAsync(ct))
                {
                    ct.ThrowIfCancellationRequested();
                    while (channel.Reader.TryRead(out var message))
                    {
                        await Task.Delay(500);
                        Console.WriteLine(message);
                    }
                }
            }
            catch (OperationCanceledException) { }
            catch
            {
                tokenSource.Cancel();
                throw;
            }
        }))
        .ToArray();
    Task waitForConsumers = Task.WhenAll(consumerTasks);
    try { await waitForConsumers; }
    catch
    {
        foreach (var e in waitForConsumers.Exception.Flatten().InnerExceptions)
            Console.WriteLine(e.ToString());
        throw waitForConsumers.Exception.Flatten();
    }
}
As pointed out by Theodor Zoulias:
On multiple consumer exceptions, the remaining tasks will continue to run and have to take the load of the killed tasks. To avoid this, I implemented a CancellationToken to stop all the remaining tasks and handle the exceptions combined in the AggregateException of waitForConsumers.Exception.
Side note:
The Task Parallel Library (TPL) might be good at automatically limiting the tasks based on your local resources. But when you are processing data remotely via RPC, it's necessary to manually limit your RPC calls to avoid filling the network/processing stack!
If your Process method is async, you can't use Task.Factory.StartNew, as it doesn't play well with an async delegate. There are also some other nuances when using it (see this for example).
The proper way to do it in this case is to use Task.Run. Here's @ClearLogic's answer modified for an async Process method.
static void Main(string[] args)
{
    int maxConcurrency = 5;
    List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
    using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
    {
        List<Task> tasks = new List<Task>();
        foreach (var msg in messages)
        {
            concurrencySemaphore.Wait();
            var t = Task.Run(async () =>
            {
                try
                {
                    await Process(msg);
                }
                finally
                {
                    concurrencySemaphore.Release();
                }
            });
            tasks.Add(t);
        }
        Task.WaitAll(tasks.ToArray());
    }
    Console.WriteLine("Exited using block");
    Console.ReadKey();
}

private static async Task Process(string msg)
{
    await Task.Delay(2000);
    Console.WriteLine(msg);
}
You can create your own TaskScheduler and override QueueTask there.
protected virtual void QueueTask(Task task)
Then you can do anything you like.
One example here:
Limited concurrency level task scheduler (with task priority) handling wrapped tasks
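If you don't want to write a scheduler from scratch, a built-in alternative (a sketch of the idea, not the linked implementation) is ConcurrentExclusiveSchedulerPair, whose concurrent scheduler limits how many tasks run at once:
// Sketch: the built-in ConcurrentExclusiveSchedulerPair caps concurrency.
var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrencyLevel: 4);
var factory = new TaskFactory(pair.ConcurrentScheduler);
foreach (var msg in messages)
    factory.StartNew(() => Process(msg)); // at most 4 execute at any moment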
You can simply set the maximum degree of concurrency like this:
int maxConcurrency = 10;
var messages = new List<string>(); // assume 1000 items
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
    foreach (var msg in messages)
    {
        Task.Factory.StartNew(() =>
        {
            concurrencySemaphore.Wait();
            try
            {
                Process(msg);
            }
            finally
            {
                concurrencySemaphore.Release();
            }
        });
    }
}
If you need in-order queuing (processing might still finish in any order), there is no need for a semaphore; old-fashioned if statements work fine:
const int maxConcurrency = 5;
List<Task> tasks = new List<Task>();
foreach (var arg in args)
{
    var t = Task.Run(() => { Process(arg); });
    tasks.Add(t);
    if (tasks.Count >= maxConcurrency)
    {
        Task.WaitAny(tasks.ToArray());
        tasks.RemoveAll(x => x.IsCompleted); // drop finished tasks, or WaitAny returns immediately next time
    }
}
Task.WaitAll(tasks.ToArray());
I ran into a similar problem where I wanted to produce 5000 results while calling APIs, etc. So, I ran some speed tests.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
    // note: this options instance is never passed to Parallel.ForEach, so it has no effect
    new ParallelOptions { MaxDegreeOfParallelism = 100 };
    GetProductMetaData(productsMetaData, client, id).GetAwaiter().GetResult();
});
produced 100 results in 30 seconds.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
    new ParallelOptions { MaxDegreeOfParallelism = 100 };
    GetProductMetaData(productsMetaData, client, id);
});
Moving the GetAwaiter().GetResult() to the individual async API calls inside GetProductMetaData resulted in 14.09 seconds to produce 100 results.
foreach (var id in ids.Take(100))
{
    GetProductMetaData(productsMetaData, client, id);
}
Completely non-async programming with GetAwaiter().GetResult() in the API calls resulted in 13.417 seconds.
var tasks = new List<Task>();
while (y < ids.Count())
{
    foreach (var id in ids.Skip(y).Take(100))
    {
        tasks.Add(GetProductMetaData(productsMetaData, client, id));
    }
    y += 100;
    Task.WhenAll(tasks).GetAwaiter().GetResult();
    Console.WriteLine($"Finished {y}, {sw.Elapsed}");
}
Forming a task list and working through 100 at a time resulted in a speed of 7.36 seconds.
using (SemaphoreSlim cons = new SemaphoreSlim(10))
{
    var tasks = new List<Task>();
    foreach (var id in ids.Take(100))
    {
        cons.Wait();
        var t = Task.Factory.StartNew(() =>
        {
            try
            {
                GetProductMetaData(productsMetaData, client, id);
            }
            finally
            {
                cons.Release();
            }
        });
        tasks.Add(t);
    }
    Task.WaitAll(tasks.ToArray());
}
Using SemaphoreSlim resulted in 13.369 seconds, but it also took a moment to spin up before being used.
var throttler = new SemaphoreSlim(initialCount: take);
foreach (var id in ids)
{
    throttler.WaitAsync().GetAwaiter().GetResult();
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            skip += 1;
            await GetProductMetaData(productsMetaData, client, id);
            if (skip % 100 == 0)
            {
                Console.WriteLine($"started {skip}/{count}, {sw.Elapsed}");
            }
        }
        finally
        {
            throttler.Release();
        }
    }));
}
Using SemaphoreSlim with a throttler for my async task took 6.12 seconds.
The answer for me in this specific project was to use a throttler with SemaphoreSlim. Although the while/foreach task list did sometimes beat the throttler, 4/6 times the throttler won for 1000 records.
I realize I'm not using the OP's code, but I think this is important and adds to the discussion, because "how" is sometimes not the only question that should be asked, and the answer is sometimes "it depends on what you are trying to do."
Now to answer the specific questions:
How to limit the maximum number of parallel tasks in C#: I showed how to limit the number of tasks in flight at a time.
Can we guess how many messages will be processed simultaneously (assuming a normal quad-core processor), or can we limit that number? I cannot guess how many will be processed at a time unless I set an upper limit, but I can set an upper limit. Obviously, different computers run at different speeds due to CPU, RAM, etc., to how many threads and cores the program has access, and to what other programs are running in tandem on the same computer.
How to ensure the messages get processed in the same sequence/order as the collection? If you want to process everything in a specific order, that is synchronous programming; the point of running things asynchronously is that they can complete in any order. As you can see from my code, the time difference is minimal over 100 records unless you use async code. If you need ordering up to some point, use asynchronous programming until that point, then await and do things synchronously from there. For example: task1a.start, task2a.start, then later task1a.await, task2a.await... then later task1b.start, task1b.await and task2b.start, task2b.await.
public static void RunTasks(List<NamedTask> importTaskList)
{
    List<NamedTask> runningTasks = new List<NamedTask>();
    try
    {
        foreach (NamedTask currentTask in importTaskList)
        {
            currentTask.Start();
            runningTasks.Add(currentTask);
            if (runningTasks.Where(x => x.Status == TaskStatus.Running).Count() >= MaxCountImportThread)
            {
                Task.WaitAny(runningTasks.ToArray());
            }
        }
        Task.WaitAll(runningTasks.ToArray());
    }
    catch (Exception ex)
    {
        Log.Fatal("ERROR!", ex);
    }
}
You can use a BlockingCollection. If the collection's bounded capacity has been reached, the producer stops producing until a consume operation finishes. I find this pattern easier to understand and implement than SemaphoreSlim.
int TasksLimit = 10;
BlockingCollection<Task> tasks = new BlockingCollection<Task>(new ConcurrentBag<Task>(), TasksLimit);

void ProduceAndConsume()
{
    var producer = Task.Factory.StartNew(RunProducer);
    var consumer = Task.Factory.StartNew(RunConsumer);
    try
    {
        Task.WaitAll(new[] { producer, consumer });
    }
    catch (AggregateException ae) { }
}

void RunConsumer()
{
    foreach (var task in tasks.GetConsumingEnumerable())
    {
        task.Start();
    }
}

void RunProducer()
{
    for (int i = 0; i < 1000; i++)
    {
        tasks.Add(new Task(() => Thread.Sleep(1000), TaskCreationOptions.AttachedToParent));
    }
}
Note that RunProducer and RunConsumer are spawned as two independent tasks.

Can you avoid Task Continuations when using async/await if you want execution to continue immediately

I am working on a protocol and trying to use as much async/await as I can to make it scale well. The protocol will have to support hundreds to thousands of simultaneous connections. Below is a little bit of pseudo code to illustrate my problem.
private static async void DoSomeWork()
{
    var protocol = new FooProtocol();
    await protocol.Connect("127.0.0.1", 1234);
    var i = 0;
    while (i != int.MaxValue)
    {
        i++;
        var request = new FooRequest();
        request.Payload = "Request Nr " + i;
        var task = protocol.Send(request);
        _ = task.ContinueWith(async tmp =>
        {
            var resp = await task;
            Console.WriteLine($"Request {resp.SequenceNr} Successful: {(resp.Status == 0)}");
        });
    }
}
And below is a little pseudo code for the protocol.
public class FooProtocol
{
    private int sequenceNr = 0;
    private SemaphoreSlim ss = new SemaphoreSlim(20, 20);

    public Task<FooResponse> Send(FooRequest fooRequest)
    {
        var tcs = new TaskCompletionSource<FooResponse>();
        ss.Wait();
        var tmp = Interlocked.Increment(ref sequenceNr);
        fooRequest.SequenceNr = tmp;
        // Faking some arbitrary delay. This work is done over sockets.
        Task.Run(async () =>
        {
            await Task.Delay(1000);
            tcs.SetResult(new FooResponse() { SequenceNr = tmp });
            ss.Release();
        });
        return tcs.Task;
    }
}
I have a protocol with request and response pairs. I have used asynchronous socket programming. The FooProtocol takes care of matching up requests with responses (sequence numbers) and also takes care of the maximum number of pending requests. (Done in the pseudo code and in my real code with a SemaphoreSlim, so I am not worried about runaway requests.) The DoSomeWork method calls Protocol.Send, but I don't want to await the response; I want to spin around and send the next one until I am blocked by the maximum number of pending requests. When the task does complete, I want to check the response and maybe do some work.
I would like to fix two things:
I would like to avoid using Task.ContinueWith(), because it doesn't seem to fit cleanly with the async/await pattern.
Because I have awaited the connection, I have had to use the async modifier. Now I get warnings from the IDE: "Because this call is not awaited, execution of the current method continues before the call is complete. Consider applying the 'await' operator to the result of the call." I don't want to do that, because as soon as I do, it ruins the protocol's ability to have many requests in flight. The only way I can get rid of the warning is to use a discard. Which isn't the worst thing, but I can't help feeling that I am missing a trick and fighting this too hard.
Side note: I hope your actual code is using SemaphoreSlim.WaitAsync rather than SemaphoreSlim.Wait.
In most socket code, you do end up with a list of connections, and along with each connection is a "processor" of some kind. In the async world, this is naturally represented as a Task.
So you will need to keep a list of Tasks; at the very least, your consuming application will need to know when it is safe to shut down (i.e., all responses have been received).
Don't preemptively worry about using Task.Run; as long as you aren't blocking (e.g., SemaphoreSlim.Wait), you probably will not starve the thread pool. Remember that during the awaits, no thread pool thread is used.
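A sketch of that idea (hypothetical names; one task per request, collected so shutdown can await them all):
// Sketch: track one task per in-flight request so shutdown can await them.
var pending = new List<Task>();
foreach (var i in Enumerable.Range(0, requestCount)) // requestCount is hypothetical
{
    var request = new FooRequest { Payload = "Request Nr " + i };
    pending.Add(SendAndHandleAsync(protocol, request));
}
await Task.WhenAll(pending); // all responses received; safe to shut down

static async Task SendAndHandleAsync(FooProtocol protocol, FooRequest request)
{
    var resp = await protocol.Send(request);
    Console.WriteLine($"Request {resp.SequenceNr} Successful: {resp.Status == 0}");
}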
I am not sure that it's a good idea to enforce the maximum concurrency at the protocol level. It seems to me that this responsibility belongs to the caller of the protocol. So I would remove the SemaphoreSlim, and let it do the one thing that it knows to do well:
public class FooProtocol
{
    private int sequenceNr = 0;

    public async Task<FooResponse> Send(FooRequest fooRequest)
    {
        var tmp = Interlocked.Increment(ref sequenceNr);
        fooRequest.SequenceNr = tmp;
        await Task.Delay(1000); // Faking some arbitrary delay
        return new FooResponse() { SequenceNr = tmp };
    }
}
Then I would use an ActionBlock from the TPL Dataflow library in order to coordinate the process of sending a massive number of requests through the protocol, by handling the concurrency, the backpressure (BoundedCapacity), the cancellation (if needed), the error handling, and the status of the whole operation (running, completed, failed etc). Example:
private static async Task DoSomeWorkAsync()
{
    var protocol = new FooProtocol();
    var actionBlock = new ActionBlock<FooRequest>(async request =>
    {
        var resp = await protocol.Send(request);
        Console.WriteLine($"Request {resp.SequenceNr} Status: {resp.Status}");
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 20,
        BoundedCapacity = 100
    });
    await protocol.Connect("127.0.0.1", 1234);
    foreach (var i in Enumerable.Range(0, Int32.MaxValue))
    {
        var request = new FooRequest();
        request.Payload = "Request Nr " + i;
        var accepted = await actionBlock.SendAsync(request);
        if (!accepted) break; // The block has failed irrecoverably
    }
    actionBlock.Complete();
    await actionBlock.Completion; // Propagate any exceptions
}
The BoundedCapacity = 100 configuration means that the ActionBlock will store at most 100 requests in its internal buffer. When this threshold is reached, anyone who wants to send more requests to it will have to wait. The waiting happens on the await actionBlock.SendAsync line.

Random tasks from Task.Factory.StartNew never finish

I am using async/await with the Task.Factory method.
public async Task<JobDto> ProcessJob(JobDto jobTask)
{
    try
    {
        var T = Task.Factory.StartNew(() =>
        {
            JobWorker jobWorker = new JobWorker();
            jobWorker.Execute(jobTask);
        });
        await T;
    }
    // (rest of the method was not shown in the question)
}
I am calling this method inside a loop, like this:
for (int i = 0; i < jobList.Count(); i++)
{
    tasks[i] = ProcessJob(jobList[i]);
}
What I notice is that new tasks open up inside Process Explorer and they also start working (based on the log file); however, out of 10, sometimes only 8 or 7 finish. The rest just never come back.
Why would that be happening?
Are they timing out? Where can I set a timeout for my tasks?
UPDATE
Basically, I would like each task to start running as soon as it is called, and to wait for the response at the await T keyword. I am assuming that once each one finishes, it will come back at await T and do the next action. I already see this for 7 out of 10 tasks, but 3 of them are not coming back.
Thanks
It is hard to say what the issue is without the rest of the code, but your code can be simplified by making ProcessJob synchronous and then calling Task.Run with it.
public JobDto ProcessJob(JobDto jobTask)
{
    JobWorker jobWorker = new JobWorker();
    return jobWorker.Execute(jobTask);
}
Start the tasks and wait for all of them to finish. Prefer Task.Run over Task.Factory.StartNew, as it provides more favourable defaults for pushing work to the background. See here.
for (int i = 0; i < jobList.Count(); i++)
{
    tasks[i] = Task.Run(() => ProcessJob(jobList[i]));
}
try
{
    await Task.WhenAll(tasks);
}
catch (Exception ex)
{
    // handle exception
}
First, let's make a reproducible version of your code. This is NOT the best way to achieve what you are doing, but it shows what is happening in your code!
I'll keep the code almost the same as yours, except that I'll use a simple int rather than your JobDto, and on completion of the job's Execute() I'll write to a file that we can verify later. Here's the code:
public class SomeMainClass
{
    public void StartProcessing()
    {
        var jobList = Enumerable.Range(1, 10).ToArray();
        var tasks = new Task[10];
        //[1] start 10 jobs, one-by-one
        for (int i = 0; i < jobList.Count(); i++)
        {
            tasks[i] = ProcessJob(jobList[i]);
        }
        //[4] here we have 10 awaitable Tasks in tasks
        //[5] do all other unrelated operations
        Thread.Sleep(1500); //assume it works for 1.5 sec
        // Task.WaitAll(tasks); //[6] wait for tasks to complete
        // The PROCESS IS COMPLETE here
    }

    public async Task ProcessJob(int jobTask)
    {
        try
        {
            //[2] start job in a ThreadPool, background thread
            var T = Task.Factory.StartNew(() =>
            {
                JobWorker jobWorker = new JobWorker();
                jobWorker.Execute(jobTask);
            });
            //[3] await here will keep the context of the calling thread
            await T; //... and release the calling thread
        }
        catch (Exception) { /*handle*/ }
    }
}

public class JobWorker
{
    static object locker = new object();
    const string _file = @"C:\YourDirectory\out.txt";

    public void Execute(int jobTask) //on complete, writes to file
    {
        Thread.Sleep(500); //let's assume it does something for 0.5 sec
        lock (locker)
        {
            File.AppendAllText(_file,
                Environment.NewLine + "Writing the value-" + jobTask);
        }
    }
}
After running just StartProcessing(), this is what I get in the file:
Writing the value-4
Writing the value-2
Writing the value-3
Writing the value-1
Writing the value-6
Writing the value-7
Writing the value-8
Writing the value-5
So, 8 out of 10 jobs completed. Obviously, every time you run this, the number and order might change. But the point is: not all the jobs completed!
Now, if I un-comment step [6], Task.WaitAll(tasks);, this is what I get in the file:
Writing the value-2
Writing the value-3
Writing the value-4
Writing the value-1
Writing the value-5
Writing the value-7
Writing the value-8
Writing the value-6
Writing the value-9
Writing the value-10
So, all my jobs completed here!
Why the code behaves like this is already explained in the code comments. The main thing to note is that your tasks run on ThreadPool-based background threads. So, if you do not wait for them, they will be killed when the MAIN process ends and the main thread exits!
If you still don't want to await the tasks there, you can return the list of tasks from the first method and await them at the very end of the process, something like this:
public Task[] StartProcessing()
{
    ...
    for (int i = 0; i < jobList.Count(); i++)
    {
        tasks[i] = ProcessJob(jobList[i]);
    }
    ...
    return tasks;
}

// in the MAIN METHOD of your application/process
var tasks = new SomeMainClass().StartProcessing();
// do all the other stuff here, and just at the end of the process
Task.WaitAll(tasks);
Hope this clears up the confusion.
It's possible your code is swallowing exceptions. I would add a ContinueWith call to the end of the part of the code that starts the new task. Something like this untested code:
var T = Task.Factory.StartNew(() =>
{
    JobWorker jobWorker = new JobWorker();
    jobWorker.Execute(jobTask);
}).ContinueWith(tsk =>
{
    var flattenedException = tsk.Exception.Flatten();
    Console.WriteLine("Exception! " + flattenedException);
}, TaskContinuationOptions.OnlyOnFaulted); // only runs if the task faulted
Another possibility is that something in one of the tasks is timing out (like you mentioned) or deadlocking. To track down whether a timeout (or maybe deadlock) is the root cause, you could add some timeout logic (as described in this SO answer):
int timeout = 1000; // set to something much greater than the time your task should take (at least for testing)
var task = TheMethodWhichWrapsYourAsyncLogic(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
    // Task completed within timeout.
    // Consider that the task may have faulted or been canceled.
    // We re-await the task so that any exceptions/cancellation is rethrown.
    await task;
}
else
{
    // timeout/cancellation logic
}
Check out the documentation on exception handling in the TPL on MSDN.

Throttling asynchronous tasks in asp .net, with a limit on N successful tasks

I'm using ASP.NET 4.5.1.
I have tasks to run which call a web service, and some might fail. I need to run N successful tasks which perform some light CPU work and mainly call a web service, then stop, and I want to throttle.
For example, let's assume we have 300 URLs in some collection. We need to run a function called Task<bool> CheckUrlAsync(url) on each of them, with throttling, meaning, for example, having only 5 running "at the same time" (in other words, using at most 5 connections at any given time). Also, we only want to perform N (say 100) successful operations, and then stop.
I've read this and this and still I'm not sure what would be the correct way to do it.
How would you do it?
Assume ASP.NET.
Assume IO calls (HTTP calls to a web service), no heavy CPU operations.
Use SemaphoreSlim:
var semaphore = new SemaphoreSlim(5);
var tasks = urlCollection.Select(async url =>
{
    await semaphore.WaitAsync();
    try
    {
        return await CheckUrlAsync(url);
    }
    finally
    {
        semaphore.Release();
    }
}).ToList(); // materialize, so the tasks are started exactly once

while (tasks.Where(t => t.IsCompleted).Count() < 100)
{
    await Task.WhenAny(tasks);
}
Although I would prefer to use Rx.Net to produce some better code.
using (var semaphore = new SemaphoreSlim(5))
{
    var results = urlCollection.ToObservable()
        .Select(async url =>
        {
            await semaphore.WaitAsync();
            try
            {
                return await CheckUrlAsync(url);
            }
            finally
            {
                semaphore.Release();
            }
        }).Take(100).ToList();
}
Okay...this is going to be fun.
public static class SemaphoreHelper
{
    public static Task<T> ContinueWith<T>(
        this SemaphoreSlim semaphore,
        Func<Task<T>> action)
    {
        var ret = semaphore.WaitAsync()
            .ContinueWith(_ => action())
            .Unwrap(); // unwrap the Task<Task<T>> produced by ContinueWith
        ret.ContinueWith(_ => semaphore.Release(), TaskContinuationOptions.None);
        return ret;
    }
}

var semaphore = new SemaphoreSlim(5);
var results = urlCollection.Select(
    url => semaphore.ContinueWith(() => CheckUrlAsync(url))).ToList();
I do need to add that the code as it stands will still run all 300 URLs; it will just return quicker, that's all. You would need to pass a cancellation token to semaphore.WaitAsync(token) to cancel the queued work. Again, I suggest using Rx.Net for that; it's just easier to get the cancellation token to work with .Take(100).
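Without Rx, a sketch of that cancellation wiring might look like this (hypothetical: it counts successes and cancels the still-queued work once 100 have succeeded):
// Sketch: stop queued work after 100 successful checks.
var cts = new CancellationTokenSource();
var semaphore = new SemaphoreSlim(5);
int successes = 0;

var tasks = urlCollection.Select(async url =>
{
    try
    {
        await semaphore.WaitAsync(cts.Token); // queued work is cancellable while waiting
        try
        {
            if (await CheckUrlAsync(url) &&
                Interlocked.Increment(ref successes) >= 100)
            {
                cts.Cancel(); // enough successes; drop whatever is still queued
            }
        }
        finally
        {
            semaphore.Release();
        }
    }
    catch (OperationCanceledException) { /* skipped: the limit was already reached */ }
}).ToList();

await Task.WhenAll(tasks);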
Try something like this?
private const int u_limit = 100;
private const int c_limit = 5;

List<Task> tasks = new List<Task>();
int totalRun = 0;
while (totalRun < u_limit)
{
    for (int i = 0; i < c_limit; i++)
    {
        tasks.Add(Task.Run(() =>
        {
            // Your code here.
        }));
    }
    Task.WaitAll(tasks.ToArray());
    totalRun += c_limit;
}

How to aggregate the data from an async producer and write it to a file?

I'm learning about async/await patterns in C#. Currently I'm trying to solve a problem like this:
There is a producer (a hardware device) that generates 1000 packets per second. I need to log this data to a file.
The device only has a ReadAsync() method to report a single packet at a time.
I need to buffer the packets and write them to the file, in the order they were generated, once a second.
The write operation should fail if the write process has not finished by the time the next batch of packets is ready to be written.
So far I have written something like the code below. It works, but I am not sure if this is the best way to solve the problem. Any comments or suggestions? What is the best practice for this kind of producer/consumer problem, where the consumer needs to aggregate the data received from the producer?
static async Task TestLogger(Device device, int seconds)
{
    const int bufLength = 1000;
    bool firstIteration = true;
    Task writerTask = null;
    using (var writer = new StreamWriter("test.log"))
    {
        do
        {
            var buffer = new byte[bufLength][];
            for (int i = 0; i < bufLength; i++)
            {
                buffer[i] = await device.ReadAsync();
            }
            if (!firstIteration)
            {
                if (!writerTask.IsCompleted)
                    throw new Exception("Write Time Out!");
            }
            writerTask = Task.Run(() =>
            {
                foreach (var b in buffer)
                    writer.WriteLine(ToHexString(b));
            });
            firstIteration = false;
        } while (--seconds > 0);
    }
}
You could use the following idea, provided the criterion for a flush is the number of packets (up to 1000). I did not test it. It makes use of Stephen Cleary's AsyncProducerConsumerQueue<T> featured in this question.
AsyncProducerConsumerQueue<byte[]> _queue;
Stream _stream;

// producer
async Task ReceiveAsync(CancellationToken token)
{
    while (true)
    {
        var list = new List<byte>();
        while (true)
        {
            token.ThrowIfCancellationRequested();
            var packet = await _device.ReadAsync(token);
            list.Add(packet);
            if (list.Count == 1000)
                break;
        }
        // push next batch
        await _queue.EnqueueAsync(list.ToArray(), token);
    }
}

// consumer
async Task LogAsync(CancellationToken token)
{
    Task previousFlush = Task.FromResult(0);
    CancellationTokenSource cts = null;
    while (true)
    {
        token.ThrowIfCancellationRequested();
        // get next batch
        var nextBatch = await _queue.DequeueAsync(token);
        if (!previousFlush.IsCompleted)
        {
            cts.Cancel(); // cancel the previous flush if not ready
            throw new Exception("failed to flush on time.");
        }
        await previousFlush; // it's completed, observe for any errors
        // start flushing
        cts = CancellationTokenSource.CreateLinkedTokenSource(token);
        previousFlush = _stream.WriteAsync(nextBatch, 0, nextBatch.Length, cts.Token);
    }
}
If you don't want to fail the logger but rather prefer to cancel the flush and proceed to the next batch, you can do so with a minimal change to this code.
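For instance, the check could become something like this (a sketch: observe the cancelled flush instead of throwing):
// Sketch: cancel a late flush and move on, instead of failing the logger.
if (!previousFlush.IsCompleted)
{
    cts.Cancel(); // abandon the late flush
    try { await previousFlush; }
    catch (OperationCanceledException) { /* expected */ }
    catch (Exception) { /* log the write failure and continue with the next batch */ }
}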
In response to @l3arnon's comment:
1. A packet is not a byte, it's byte[]. 2. You haven't used the OP's ToHexString. 3. AsyncProducerConsumerQueue is much less robust and tested than .Net's TPL Dataflow. 4. You await previousFlush for errors just after you throw an exception, which makes that line redundant. etc. In short: I think the possible added value doesn't justify this very complicated solution.
"A packet is not a byte, it's byte[]" - A packet is a byte, this is obvious from the OP's code: buffer[i] = await device.ReadAsync(). Then, a batch of packets is byte[].
"You haven't used the OP's ToHexString." - The goal was to show how to use Stream.WriteAsync which natively accepts a cancellation token, instead of WriteLineAsync which doesn't allow cancellation. It's trivial to use ToHexString with Stream.WriteAsync and still take advantage of cancellation support:
var hexBytes = Encoding.ASCII.GetBytes(ToHexString(nextBatch) +
Environment.NewLine);
_stream.WriteAsync(hexBytes, 0, hexBytes.Length, token);
"AsyncProducerConsumerQueue is much less robust and tested than .Net's TPL Dataflow" - I don't think this is a determined fact. However, if the OP is concerned about it, he can use regular BlockingCollection, which doesn't block the producer thread. It's OK to block the consumer thread while waiting for the next batch, because writing is done in parallel. As opposed to this, your TPL Dataflow version carries one redundant CPU and lock intensive operation: moving data from producer pipeline to writer pipleline with logAction.Post(packet), byte by byte. My code doesn't do that.
"You await previousFlush for errors just after you throw an exception which makes that line redundant." - This line is not redundant. Perhaps, you're missing this point: previousFlush.IsCompleted can be true when previousFlush.IsFaulted or previousFlush.IsCancelled is also true. So, await previousFlush is relevant there to observe any errors on the completed tasks (e.g., a write failure), which otherwise will be lost.
A better approach, IMHO, would be to have two "workers": a producer and a consumer. The producer reads from the device and simply fills a list. The consumer "wakes up" every second and writes the batch to a file.
List<byte[]> _data = new List<byte[]>();

async Task Producer(Device device)
{
    while (true)
    {
        _data.Add(await device.ReadAsync());
    }
}

async Task Consumer(Device device)
{
    using (var writer = new StreamWriter("test.log"))
    {
        while (true)
        {
            Stopwatch watch = Stopwatch.StartNew();
            var batch = _data;
            _data = new List<byte[]>();
            foreach (var packet in batch)
            {
                writer.WriteLine(ToHexString(packet));
                if (watch.Elapsed >= TimeSpan.FromSeconds(1))
                {
                    throw new Exception("Write Time Out!");
                }
            }
            await Task.Delay(TimeSpan.FromSeconds(1) - watch.Elapsed);
        }
    }
}
The while (true) should probably be replaced by a system wide cancellation token.
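For example, the producer loop might become (a sketch, assuming a CancellationToken named token is passed in):
// Sketch: honor a shared CancellationToken instead of looping forever.
while (!token.IsCancellationRequested)
{
    _data.Add(await device.ReadAsync());
}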
Assuming you can batch by amount (1000) instead of time (1 second), the simplest solution is probably using TPL Dataflow's BatchBlock which automatically batches a flow of items by size:
async Task TestLogger(Device device, int seconds)
{
    var writer = new StreamWriter("test.log");
    var batch = new BatchBlock<byte[]>(1000);
    var logAction = new ActionBlock<byte[]>(
        packet =>
        {
            return writer.WriteLineAsync(ToHexString(packet));
        });
    ActionBlock<byte[][]> transferAction;
    transferAction = new ActionBlock<byte[][]>(
        bytes =>
        {
            foreach (var packet in bytes)
            {
                if (transferAction.InputCount > 0)
                {
                    return; // or throw new Exception("Write Time Out!");
                }
                logAction.Post(packet);
            }
        }
    );
    batch.LinkTo(transferAction);
    logAction.Completion.ContinueWith(_ => writer.Dispose());
    while (true)
    {
        batch.Post(await device.ReadAsync());
    }
}
