I have this function which checks for proxy servers and currently it checks only a number of threads and waits for all to finish until the next set is starting. Is it possible to start a new thread as soon as one is finished from the maximum allowed?
for (int i = 0; i < listProxies.Count(); i+=nThreadsNum)
{
for (nCurrentThread = 0; nCurrentThread < nThreadsNum; nCurrentThread++)
{
if (nCurrentThread < nThreadsNum)
{
string strProxyIP = listProxies[i + nCurrentThread].sIPAddress;
int nPort = listProxies[i + nCurrentThread].nPort;
tasks.Add(Task.Factory.StartNew<ProxyAddress>(() => CheckProxyServer(strProxyIP, nPort, nCurrentThread)));
}
}
Task.WaitAll(tasks.ToArray());
foreach (var tsk in tasks)
{
ProxyAddress result = tsk.Result;
UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
}
tasks.Clear();
}
This seems much more simple:
int numberProcessed = 0;
Parallel.ForEach(listProxies,
new ParallelOptions { MaxDegreeOfParallelism = nThreadsNum },
(p)=> {
var result = CheckProxyServer(p.sIPAddress, s.nPort, Thread.CurrentThread.ManagedThreadId);
UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
Interlocked.Increment(numberProcessed);
});
With slots:
var obj = new Object();
var slots = new List<int>();
Parallel.ForEach(listProxies,
new ParallelOptions { MaxDegreeOfParallelism = nThreadsNum },
(p)=> {
int threadId = Thread.CurrentThread.ManagedThreadId;
int slot = slots.IndexOf(threadId);
if (slot == -1)
{
lock(obj)
{
slots.Add(threadId);
}
slot = slots.IndexOf(threadId);
}
var result = CheckProxyServer(p.sIPAddress, s.nPort, slot);
UpdateProxyDBRecord(result.sIPAddress, result.bOnlineStatus);
});
I took a few shortcuts there to guarantee thread safety. You don't have to do the normal check-lock-check dance because there will never be two threads attempting to add the same threadid to the list, so the second check will always fail and isn't needed. Secondly, for the same reason, I don't believe you need to ever lock around the outer IndexOf either. That makes this a very highly efficient concurrent routine that rarely locks (it should only lock nThreadsNum times) no matter how many items are in the enumerable.
Another solution is to use a SemaphoreSlim or the Producer-Consumer Pattern using a BlockinCollection<T>. Both solution support cancellation.
SemaphoreSlim
private async Task CheckProxyServerAsync(IEnumerable<object> proxies)
{
var tasks = new List<Task>();
int currentThreadNumber = 0;
int maxNumberOfThreads = 8;
using (semaphore = new SemaphoreSlim(maxNumberOfThreads, maxNumberOfThreads))
{
foreach (var proxy in proxies)
{
// Asynchronously wait until thread is available if thread limit reached
await semaphore.WaitAsync();
string proxyIP = proxy.IPAddress;
int port = proxy.Port;
tasks.Add(Task.Run(() => CheckProxyServer(proxyIP, port, Interlocked.Increment(ref currentThreadNumber)))
.ContinueWith(
(task) =>
{
ProxyAddress result = task.Result;
// Method call must be thread-safe!
UpdateProxyDbRecord(result.IPAddress, result.OnlineStatus);
Interlocked.Decrement(ref currentThreadNumber);
// Allow to start next thread if thread limit was reached
semaphore.Release();
},
TaskContinuationOptions.OnlyOnRanToCompletion));
}
// Asynchronously wait until all tasks are completed
// to prevent premature disposal of semaphore
await Task.WhenAll(tasks);
}
}
Producer-Consumer Pattern
// Uses a fixed number of same threads
private async Task CheckProxyServerAsync(IEnumerable<ProxyInfo> proxies)
{
var pipe = new BlockingCollection<ProxyInfo>();
int maxNumberOfThreads = 8;
var tasks = new List<Task>();
// Create all threads (count == maxNumberOfThreads)
for (int currentThreadNumber = 0; currentThreadNumber < maxNumberOfThreads; currentThreadNumber++)
{
tasks.Add(
Task.Run(() => ConsumeProxyInfo(pipe, currentThreadNumber)));
}
proxies.ToList().ForEach(pipe.Add);
pipe.CompleteAdding();
await Task.WhenAll(tasks);
}
private void ConsumeProxyInfo(BlockingCollection<ProxyInfo> proxiesPipe, int currentThreadNumber)
{
while (!proxiesPipe.IsCompleted)
{
if (proxiesPipe.TryTake(out ProxyInfo proxy))
{
int port = proxy.Port;
string proxyIP = proxy.IPAddress;
ProxyAddress result = CheckProxyServer(proxyIP, port, currentThreadNumber);
// Method call must be thread-safe!
UpdateProxyDbRecord(result.IPAddress, result.OnlineStatus);
}
}
}
If I'm understanding your question properly, this is actually fairly simple to do with await Task.WhenAny. Basically, you keep a collection of all of the running tasks. Once you reach a certain number of tasks running, you wait for one or more of your tasks to finish, and then you remove the tasks that were completed from your collection and continue to add more tasks.
Here's an example of what I mean below:
var tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
// I want my list of tasks to contain at most 5 tasks at once
if (tasks.Count == 5)
{
// Wait for at least one of the tasks to complete
await Task.WhenAny(tasks.ToArray());
// Remove all of the completed tasks from the list
tasks = tasks.Where(t => !t.IsCompleted).ToList();
}
// Add some task to the list
tasks.Add(Task.Factory.StartNew(async delegate ()
{
await Task.Delay(1000);
}));
}
I suggest changing your approach slightly. Instead of starting and stopping threads, put your proxy server data in a concurrent queue, one item for each proxy server. Then create a fixed number of threads (or async tasks) to work on the queue. This is more likely to provide smooth performance (you aren't starting and stopping threads over and over, which has overhead) and is a lot easier to code, in my opinion.
A simple example:
class ProxyChecker
{
private ConcurrentQueue<ProxyInfo> _masterQueue = new ConcurrentQueue<ProxyInfo>();
public ProxyChecker(IEnumerable<ProxyInfo> listProxies)
{
foreach (var proxy in listProxies)
{
_masterQueue.Enqueue(proxy);
}
}
public async Task RunChecks(int maximumConcurrency)
{
var count = Math.Max(maximumConcurrency, _masterQueue.Count);
var tasks = Enumerable.Range(0, count).Select( i => WorkerTask() ).ToList();
await Task.WhenAll(tasks);
}
private async Task WorkerTask()
{
ProxyInfo proxyInfo;
while ( _masterList.TryDequeue(out proxyInfo))
{
DoTheTest(proxyInfo.IP, proxyInfo.Port)
}
}
}
Related
I want to generate an enumerable of tasks, the tasks will complete at different times.
How can I make a generator in C# that:
yields tasks
every few iterations, resolves previously yielded tasks with results that are only now known
The reason I want to do this is because I am processing a long iterable of inputs, and every so often I accumulate enough data from these inputs to send a batch API request and finalise my outputs.
Pseudocode:
IEnumerable<Task<Output>> Process(IEnumerable<Input> inputs)
{
var queuedInputs = Queue<Input>();
var cumulativeLength = 0;
foreach (var input in inputs)
{
yield return waiting task for this input
queuedInputs.Enqueue(input);
cumulativeLength += input.Length;
if (cumulativeLength > 10)
{
cumulativeLength = 0
GetFromAPI(queue).ContinueWith((apiTask) => {
Queue<BatchResult> batchResults = apiTask.Result;
while (queuedInputs.Count > 0)
{
batchResult = batchResults.Dequeue();
historicalInput = queuedInputs.Dequeue();
var output = MakeOutput(historicalInput, batchResult);
resolve earlier input's task with this output
}
});
}
}
}
The shape of your solution is going to be driven by the shape of your problem. There's a couple of questions I have because your problem domain seems odd:
Are all your inputs known at the outset? The (synchronous) IEnumerable<Input> implies they are.
Are you sure you want to wait for a batch of inputs before sending any query? What about the "remainder" if you're batching by 10 but have 55 inputs?
Assuming you do have synchronous inputs, and that you want to batch with remainders, you can just accumulate all your inputs immediately, batch them, and walk the batches, asynchronously providing outputs:
async IAsyncEnumerable<Output> Process(IEnumerable<Input> inputs)
{
foreach (var batchedInput in inputs.Batch(10))
{
var batchResults = await GetFromAPI(batchedInput);
for (int i = 0; i != batchedInput.Count; ++i)
yield return MakeOutput(batchedInput[i], batchResults[i]);
}
}
public static IEnumerable<IReadOnlyList<TSource>> Batch<TSource>(this IEnumerable<TSource> source, int size)
{
List<TSource>? batch = null;
foreach (var item in source)
{
batch ??= new List<TSource>(capacity: size);
batch.Add(item);
if (batch.Count == size)
{
yield return batch;
batch = null;
}
}
if (batch?.Count > 0)
yield return batch;
}
Update:
If you want to start the API calls immediately, you can move those out of the loop:
async IAsyncEnumerable<Output> Process(IEnumerable<Input> inputs)
{
var batchedInputs = inputs.Batch(10).ToList();
var apiCallTasks = batchedInputs.Select(GetFromAPI).ToList();
foreach (int i = 0; i != apiCallTasks.Count; ++i)
{
var batchResults = await apiCallTasks[i];
var batchedInput = batchedInputs[i];
for (int j = 0; j != batchedInput.Count; ++j)
yield return MakeOutput(batchedInput[j], batchResults[j]);
}
}
One approach is to use the TPL Dataflow library. This library offers a variety of components named "blocks" (TransformBlock, ActionBlock etc), where each block is processing its input data, and then propagates the results to the next block. The blocks are linked together so that the completion of the previous block in the pipeline triggers the completion of the next block etc, until the final block which is usually an ActionBlock<T> with no output. Here is an example:
var block1 = new TransformBlock<int, string>(item =>
{
Thread.Sleep(1000); // Simulate synchronous work
return item.ToString();
}, new()
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
EnsureOrdered = false
});
var block2 = new BatchBlock<string>(batchSize: 10);
var block3 = new ActionBlock<string[]>(async batch =>
{
await Task.Delay(1000); // Simulate asynchronous work
}); // The default MaxDegreeOfParallelism is 1
block1.LinkTo(block2, new() { PropagateCompletion = true });
block2.LinkTo(block3, new() { PropagateCompletion = true });
// Provide some input in the pipeline
block1.Post(1);
block1.Post(2);
block1.Post(3);
block1.Post(4);
block1.Post(5);
block1.Complete(); // Mark the first block as completed
await block3.Completion; // Await the completion of the last block
The TPL Dataflow library is powerful and flexible, but is has a weak point in the propagation of exceptions. There is no built-in way to instruct the block1 to stop working, if the block3 fails. You can read more about this issue here. It might not be a serious issue, if you don't expect your blocks to fail very often.
Assuming MyGenerator() returns List<Task<T>>, and the number of tasks is relatively small (even in the hundreds is probably fine) then you can use Task.WhenAny(), which returns the first Task that completes. Then remove that Task from the list, process the result, and move on to the next:
var tasks = MyGenerator();
while (tasks.Count > 0) {
var t = Task.WhenAny(tasks);
tasks.Remove(t);
var result = await t; // this won't actually wait since the task is already done
// Do something with result
}
There is a good discussion of this in an article by Stephen Toub, which explains in more detail, and gives alternatives if your task list is in the thousands: Processing tasks as they complete
There's also this article, but I think Stephen's is better written: Process asynchronous tasks as they complete (C#)
Using TaskCompletionSource:
IEnumerable<Task<Output>> Process(IEnumerable<Input> inputs)
{
var tcss = new List<TaskCompletionSource<Output>>();
var queue = new Queue<(Input, TaskCompletionSource<Output>)>();
var cumulativeLength = 0;
foreach (var input in inputs)
{
var tcs = new TaskCompletionSource<Output>();
queue.Enqueue((input, tcs));
tcss.Add(tcs);
cumulativeLength += input.Length;
if (cumulativeLength > 10)
{
cumulativeLength = 0
var queueClone = Queue<(Input, TaskCompletionSource<Input>)>(queue);
queue.Clear();
GetFromAPI(queueClone.Select(x => x.Item1)).ContinueWith((apiTask) => {
Queue<BatchResult> batchResults = apiTask.Result;
while (queueClone.Count > 0)
{
var batchResult = batchResults.Dequeue();
var (queuedInput, queuedTcs) = queueClone.Dequeue();
var output = MakeOutput(queuedInput, batchResult);
queuedTcs.SetResult(output)
}
});
}
}
GetFromAPI(queue.Select(x => x.Item1)).ContinueWith((apiTask) => {
Queue<BatchResult> batchResults = apiTask.Result;
while (queue.Count > 0)
{
var batchResult = batchResults.Dequeue();
var (queuedInput, queuedTcs) = queue.Dequeue();
var output = MakeOutput(queuedInput, batchResult);
queuedTcs.SetResult(output)
}
});
foreach (var tcs in tcss)
{
yield return tcs.Task;
}
}
Suppose I kick off 5 async tasks, and I want to print the results in the order they were requested:
public async void RunTasks()
{
var tasks = new List<Task<int>>();
for(int i=1; i<=5; i++)
{
tasks.Add(DoSomething(i));
}
var results = await Task.WhenAll(tasks);
Console.WriteLine(String.Join(',', results));
}
public async Task<int> DoSomething(int taskNumber)
{
var random = new Random();
await Task.Delay(random.Next(5000));
return taskNumber;
}
This will always print "1,2,3,4,5" - because Task.WhenAll() orders the results by the order requested, not by the order in which they finished.
Unfortunately this means I have to wait for ALL Tasks to finish until I can print anything.
How might I instead print the result of each task as soon as it's finished, but still respecting the order they were requested?
So I should always see "1,2,3,4,5" - but it may arrive gradually:
"1"
"1,2,3"
"1,2,3,4"
"1,2,3,4,5"
(no need to worry about the actual reasoning for doing this, treat it as a fun problem)
var tasks = new List<Task<int>>();
for(int i=1; i<=5; i++)
{
tasks.Add(DoSomething(i));
}
foreach (var task in tasks)
{
var result = await task;
Console.WriteLine(result);
}
We kick off all of the tasks first, then loop over them in order, awaiting each in turn. If the task being awaited has previously completed, the await just returns its result. Otherwise we wait until it completes.
Try a TransformBlock it will output the items it processes one by one in the order the were received by default even if the elements are processed in parallel.
public async Task Order()
{
var tBlock = new TransformBlock<int, string>(async x =>
{
await Task.Delay(100);
return x.ToString();
}, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = 10 });
var sub = tBlock.AsObservable().Subscribe(x => Console.Write(x));
foreach (var num in Enumerable.Range(0, 10))
{
tBlock.Post(num);
}
tBlock.Complete();
await tBlock.Completion;
sub.Dispose();
}
Output:
0123456789
I need to start tasks in parallel, but I choose to use Task.Run instead of Parallel.Foreach, so I can get some feedback when all tasks finished and enable UI controls.
private async void buttonStart_Click(object sender, EventArgs e)
{
var cells = objectListView.CheckedObjects;
if(cells != null)
{
List<Task> tasks = new List<Task>();
foreach (Cell c in cells)
{
Cell cell = c;
var progressHandler = new Progress<string>(value =>
{
cell.Status = value;
});
var progress = progressHandler as IProgress<string>;
Task t = Task.Run(() =>
{
progress.Report("Starting...");
int a = 123;
for (int i = 0; i < 200000; i++)
{
a = a + i;
Task.Delay(500).Wait();
}
progress.Report("Done");
});
tasks.Add(t);
}
await Task.WhenAll(tasks);
Console.WriteLine("Done, enabld UI controls");
}
}
So what I expect is that I see in UI "Starting..." almost instantly for all items. What I actually see is first 4 items are "Starting..." (I guess because all 4 CPU cores are used per thread), then each second or less new item is "Starting". I have total 37 items and it takes around 30 seconds for all items to start all tasks.
How can I make it as parallel as possible?
How can I make it as parallel as possible?
The part of inner for loop is simulating long running CPU-bound job, which I would like to start at the same time as much as possible.
It's already as parallel as possible. Starting 37 threads that all have CPU-bound work to do will not make it go any faster, since you're apparently running it on a 4-core machine. There are 4 cores, so only 4 threads can actually run at a time. The other 33 threads are going to be waiting while 4 are running. They would only appear to run simultaneously.
That said, if you really want to start up all those thread pool threads, you can do this by calling ThreadPool.SetMinThreads.
I need to start tasks in parallel, but I choose to use Task.Run instead of Parallel.Foreach, so I can get some feedback when all tasks finished and enable UI controls.
Since you have parallel work to do, you should use Parallel. If you want the nice resume-on-the-UI-thread behavior of await, then you can use a single await Task.Run, something like this:
private async void buttonStart_Click(object sender, EventArgs e)
{
var cells = objectListView.CheckedObjects;
if (cells == null)
return;
var workItems = cells.Select(c => new
{
Cell = c,
Progress = new Progress<string>(value => { c.Status = value; }),
}).ToList();
await Task.Run(() => Parallel.ForEach(workItems, item =>
{
var progress = item.Progress as IProgress<string>();
progress.Report("Starting...");
int a = 123;
for (int i = 0; i < 200000; i++)
{
a = a + i;
Thread.Sleep(500);
}
progress.Report("Done");
}));
Console.WriteLine("Done, enabld UI controls");
}
I'd say, it is as parallel as possible. If you have 4 cores, you can run 4 threads in parallel.
If you can do stuff while waiting for the "delay", have a look into asynchronous programming (where one thread can run multiple tasks "at once", because most of them are waiting for something).
EDIT: you can also run Parallel.ForEach in its own task and await that:
private async void buttonStart_Click(object sender, EventArgs e)
{
var cells = objectListView.CheckedObjects;
if(cells != null)
{
await Task.Run( () => Parallel.ForEach( cells, c => ... ) );
}
}
I think it relies on your taskcreation-options.
TaskCreationOptions.LongRunning
Here you can find further informations:
https://msdn.microsoft.com/en-us/library/system.threading.tasks.taskcreationoptions(v=vs.110).aspx
But you have to know, that task uses a threadpool with a finite maximum amount of threads. You can use LongRunning to signal, that this task needs a long time and should not clog your pool. I thinks it's more complex to create a long-running task, because the scheduler may create a new thread.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace TaskTest
{
internal class Program
{
private static void Main(string[] args)
{
var demo = new Program();
demo.SimulateClick();
Console.ReadLine();
}
public void SimulateClick()
{
buttonStart_Click(null, null);
}
private async void buttonStart_Click(object sender, EventArgs e)
{
var tasks = new List<Task>();
for (var i = 0; i < 36; i++)
{
var taskId = i;
var t = Task.Factory.StartNew((() =>
{
Console.WriteLine($"Starting Task ({taskId})");
for (var ii = 0; ii < 200000; ii++)
{
Task.Delay(TimeSpan.FromMilliseconds(500)).Wait();
var s1 = new string(' ', taskId);
var s2 = new string(' ', 36-taskId);
Console.WriteLine($"Updating Task {s1}X{s2} ({taskId})");
}
Console.Write($"Done ({taskId})");
}),TaskCreationOptions.LongRunning);
tasks.Add(t);
}
await Task.WhenAll(tasks);
Console.WriteLine("Done, enabld UI controls");
}
}
}
I have a windows service (written in C#) that use the task parallel library dll to perform some parallel tasks (5 tasks a time)
After the tasks are executed once I would like to repeat the same tasks on an on going basis (hourly). Call the QueuePeek method
Do I use a timer or a counter like I have setup in the code snippet below?
I am using a counter to set up the tasks, once I reach five I exit the loop, but I also use a .ContinueWith to decrement the counter, so my thought is that the counter value would be below 5 hence the loop would continue. But my ContinueWith seems to be executing on the main thread and the loop then exits.
The call to DecrementCounter using the ContinueWith does not seem to work
FYI : The Importer class is to load some libraries using MEF and do the work
This is my code sample:
private void QueuePeek()
{
var list = SetUpJobs();
while (taskCounter < 5)
{
int j = taskCounter;
Task task = null;
task = new Task(() =>
{
DoLoad(j);
});
taskCounter += 1;
tasks[j] = task;
task.ContinueWith((t) => DecrementTaskCounter());
task.Start();
ds.SetJobStatus(1);
}
if (taskCounter == 0)
Console.WriteLine("Completed all tasks.");
}
private void DoLoad(int i)
{
ILoader loader;
DataService.DataService ds = new DataService.DataService();
Dictionary<int, dynamic> results = ds.AssignRequest(i);
var data = results.Where(x => x.Key == 2).First();
int loaderId = (int)data.Value;
Importer imp = new Importer();
loader = imp.Run(GetLoaderType(loaderId));
LoaderProcessor lp = new LoaderProcessor(loader);
lp.ExecuteLoader();
}
private void DecrementTaskCounter()
{
Console.WriteLine(string.Format("Decrementing task counter with threadId: {0}",Thread.CurrentThread.ManagedThreadId) );
taskCounter--;
}
I see a few issues with your code that can potentially lead to some hard to track-down bugs. First, if using a counter that all of the tasks can potentially be reading and writing to at the same time, try using Interlocked. For example:
Interlocked.Increment(ref _taskCounter); // or Interlocked.Decrement(ref _taskCounter);
If I understand what you're trying to accomplish, I think what you want to do is to use a timer that you re-schedule after each group of tasks is finished.
public class Worker
{
private System.Threading.Timer _timer;
private int _timeUntilNextCall = 3600000;
public void Start()
{
_timer = new Timer(new TimerCallback(QueuePeek), null, 0, Timeout.Infinite);
}
private void QueuePeek(object state)
{
int numberOfTasks = 5;
Task[] tasks = new Task[numberOfTasks];
for(int i = 0; i < numberOfTasks; i++)
{
tasks[i] = new Task(() =>
{
DoLoad();
});
tasks[i].Start();
}
// When all tasks are complete, set to run this method again in x milliseconds
Task.Factory.ContinueWhenAll(tasks, (t) => { _timer.Change(_timeUntilNextCall, Timeout.Infinite); });
}
private void DoLoad() { }
}
I am attempting to monitor a long running process. Right now the process created new Task objects for all the small pieces, but I need some way to monitor their progress to send status to a UI.
ExecutionContext ctx = new ExecutionContext()
{
Result = result,
LastCount = result.Values.Count
};
Task t = Task.Factory.StartNew(() =>
{
foreach (var dataSlice in dataObjects)
{
Task.Factory.StartNew(() =>
{
// Do Some Work
}, TaskCreationOptions.AttachedToParent);
}
});
ctx.ParentTask = t;
Task monitor = Task.Factory.StartNew( () =>
{
ctx.LastCount = ctx.Result.Values.Count;
}, TaskCreationOptions.LongRunning);
My problem, or perhaps question is, if I force my monitor task to wait (via a SpinWait or Sleep) will it possibly lock part of the Tasks created above it? I need the monitor to check status every now and then, but I don't want it's wait condition to kill another task that needs to run.
EDIT:
So I found an interesting approach that's very similar to what Hans suggested in the comments below. It comes in two pieces. One Task to happen multiple times in the middle, and one completion task to do the final clean-up. Still in testing, but it looks promising.
Here's what it looks like:
Task t = new Task(() =>
{
int i = 0;
for (int j = 0; j < 200; j++)
{
foreach (var session in sessions)
{
Task work = action.Invoke(SomeParameter);
if (i == 50 || i == 0)
{
work.ContinueWith(task => Task.Factory.StartNew(UpdateAction));
i = 1;
}
else
{
i++;
}
}
}
});
ctx.ParentTask = t;
t.ContinueWith(CompletionAction => Task.Factory.StartNew(() => CompleteExecution(SomeParameter)));
t.Start();