I am trying this code (just spawn some tasks and simulate work):
var tasks = Enumerable.Range(1, 10).Select(d => Task.Factory.StartNew(() =>
{
Console.Out.WriteLine("Processing [{0}]", d);
Task.Delay(20000).Wait(); // Simulate work. Here will be some web service calls taking 7/8+ seconds.
Console.Out.WriteLine("Task Complete [{0}]", d);
return (2 * d).ToString();
})).ToList();
var results = Task.WhenAll(tasks).Result;
Console.Out.WriteLine("All processing were complete with results: {0}", string.Join("|", results));
I was expecting to see all 10 Processing ... lines in the console at once, but when I run it, initially I see this output:
Processing [1]
Processing [2]
Processing [3]
Processing [4]
Then, 1-2 seconds later, Processing [5], Processing [6] and the rest appear slowly, one after another.
Can you explain this? Does this mean the tasks are being started with a delay? Why?
As mentioned in another answer, using TaskCreationOptions.LongRunning will solve your problem.
But this is not how you should approach the problem. Your example simulates CPU-bound work, yet you say your tasks will be making calls to a web service - meaning they will be IO-bound.
As such, they should run asynchronously. However, Task.Delay(20000).Wait(); waits synchronously, so it doesn't represent what will/should actually be going on.
Take this example instead:
var tasks = Enumerable.Range(1, 10).Select(async d =>
{
Console.Out.WriteLine("Processing [{0}]", d);
await Task.Delay(5000); // Simulate IO work. Here will be some web service calls taking 7/8+ seconds.
Console.Out.WriteLine("Task Complete [{0}]", d);
return (2*d).ToString();
}).ToList();
var results = Task.WhenAll(tasks).Result;
Console.Out.WriteLine("All processing were complete with results: {0}", string.Join("|", results));
All tasks start instantly as expected.
I expect you have 4 CPU cores.
Having two CPU-bound threads fighting over a core makes the work take longer than having one thread do the first task and then the second.
Until it knows otherwise, the task system assumes tasks are short-running, CPU-bound, and using non-blocking IO.
Therefore I expect the task system defaults to a number of threads close to the number of cores.
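If you want to verify this on your own machine, here is a quick sketch (the numbers vary by machine and runtime, but the pool's starting worker-thread count is typically the core count):
ThreadPool.GetMinThreads(out int workerThreads, out int ioThreads);
Console.WriteLine("Cores: {0}, starting worker threads: {1}",
    Environment.ProcessorCount, workerThreads);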
Using TaskCreationOptions.LongRunning
provides a hint to the TaskScheduler that oversubscription may be
warranted. Oversubscription lets you create more threads than the
available number of hardware threads.
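Applied to the question's code, the hint looks roughly like this (the delay still blocks, but LongRunning gives each task its own dedicated thread up front):
var tasks = Enumerable.Range(1, 10).Select(d => Task.Factory.StartNew(() =>
{
    Console.Out.WriteLine("Processing [{0}]", d);
    Task.Delay(20000).Wait(); // still blocking - LongRunning just avoids starving the pool
    Console.Out.WriteLine("Task Complete [{0}]", d);
    return (2 * d).ToString();
}, TaskCreationOptions.LongRunning)).ToList();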
And lastly, tasks are not threads; they are designed to hide a lot of the details of threads from you, including controlling the number of threads in use. It is reasonable to create hundreds of tasks, but if you created hundreds of threads all trying to run at the same time, the CPU cache etc. would have a very hard time.
However, let's get back to what you are trying to do. Your example simulates CPU-bound work, yet you say your tasks will be making calls to a web service - meaning they will be IO-bound.
As such, they should run asynchronously. However, Task.Delay(20000).Wait(); waits synchronously, so it doesn't represent what will/should actually be going on. See Gediminas Masaitis' answer for a code sample using await to make the delay asynchronous. However, as soon as you use more asynchronous code, you need to think more about locking etc.
Asynchronous IO is clearly better if you have hundreds of requests going on at the same time. However, if you just have a handful, and no other usage of await in your application, then TaskCreationOptions.LongRunning may be good enough.
Related
I have a procedure that needs to be executed on another thread, asynchronously. The procedure will process data in batches (it can receive 2 items or 40,000). From local tests, the longest runtime was about 2 minutes (for 40,000 items).
The scenario is the following: the UI calls the back-end, the back-end starts an asynchronous task that will run the procedure, and then returns a boolean to signal whether the request was received or not (these are the requirements; I do not use await/wait). I am not quite sure what to use here, between:
Task.Run(()=> MyProcedure())
OR
Task.Factory.StartNew(()=> MyProcedure(), TaskCreationOptions.LongRunning)
What would be the best option for this situation?
The procedure will process data in batches
This can mean multiple things. If you are going to use async/await in those batches, such that your MyProcedure is actually asynchronous, then you should be fine with Task.Run; the fact that MyProcedure releases itself back to the thread pool whenever it goes async means this should work fine, for example:
async Task MyProcedure()
{
while (ThereIsWorkToDo)
{
var batch = GetNextBatch(); // hypothetical helper - gather some work
await ProcessBatchAsync(batch).ConfigureAwait(false);
}
}
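The back-end entry point can then kick this off without awaiting it; a minimal sketch (ReceiveRequest is a made-up name):
public bool ReceiveRequest()
{
    // Fire-and-forget: the boolean only acknowledges receipt.
    // Exceptions should be caught and logged inside MyProcedure itself.
    _ = Task.Run(() => MyProcedure());
    return true;
}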
However: if MyProcedure() is not asynchronous, but you just want to run it in the background, then yes, TaskCreationOptions.LongRunning might be reasonable; but for something that takes 2 minutes, so might a regular dedicated thread.
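A minimal sketch of that last option, assuming a synchronous MyProcedure:
var worker = new Thread(() => MyProcedure()) { IsBackground = true };
worker.Start();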
I have this code
Lines.ToList().ForEach(y =>
{
globalQueue.AddRange(GetTasks(y.LineCode).ToList());
});
So for each line in my list of lines, I get the tasks and add them to a global production queue. I can have 8 lines. Each GetTasks(y.LineCode) request takes 1 minute. I would like to use parallelism to make sure I issue all 8 calls together, not one by one.
What should I do?
Should I use another ForEach loop or another extension method? Is there a ForEachAsync? Should I make the GetTasks request itself async?
Parallelism isn't concurrency. Concurrency isn't asynchrony. Running multiple slow queries in parallel won't make them run faster, quite the opposite. These are different problems and require very different solutions. Without a specific problem one can only give generic advice.
Parallelism - processing an 800K item array
Parallelism means processing a ton of data using multiple cores in parallel. To do that, you need to partition your data and feed each partition to a "worker" for processing. You need to minimize communication between workers and the need of synchronization to get the best performance, otherwise your workers will spend CPU time doing nothing. That means, no global queue updating.
If you have a lot of lines, or if line processing is CPU-bound, you can use PLINQ to process it :
var query = from y in lines.AsParallel()
from t in GetTasks(y.LineCode)
select t;
var theResults = query.ToList();
That's it. No need to synchronize access to a queue, either through locking or a concurrent collection. This will use all available cores, though; you can add WithDegreeOfParallelism() to reduce the number of cores used and avoid freezing the machine.
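For example, a sketch capping PLINQ at 4 cores:
var query = from y in lines.AsParallel().WithDegreeOfParallelism(4)
            from t in GetTasks(y.LineCode)
            select t;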
Concurrency - calling 2000 servers
Concurrency on the other hand means doing several different things at the same time. No partitioning is involved.
For example, if I had to query 8 or 2000 servers for monitoring data (true story) I wouldn't use Parallel or PLINQ. For one thing, Parallel and PLINQ use all available cores. In this case though they won't be doing anything, they'll just wait for responses. Parallelism classes can't handle async methods either because there's no point - they aren't meant to wait for responses.
A very quick & dirty solution would be to start multiple tasks and wait for them to return, e.g.:
var tasks = lines.Select(y => Task.Run(() => GetTasks(y.LineCode)));
//Array of individual results
var resultsArray=await Task.WhenAll(tasks);
//flatten the results
var resultList=resultsArray.SelectMany(r=>r).ToList();
This will start all requests at once. Network Security didn't like the 2000 concurrent requests, since it looked like a hack attack and caused a bit of network flooding.
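If flooding is a concern, a SemaphoreSlim can cap the number of requests in flight; a minimal sketch, capping at 8 concurrent calls:
var throttle = new SemaphoreSlim(8); // at most 8 requests in flight
var tasks = lines.Select(async y =>
{
    await throttle.WaitAsync();
    try
    {
        return await Task.Run(() => GetTasks(y.LineCode));
    }
    finally
    {
        throttle.Release();
    }
});
var resultList = (await Task.WhenAll(tasks)).SelectMany(r => r).ToList();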
Concurrency with Dataflow
We can use the TPL Dataflow library and e.g. an ActionBlock or TransformManyBlock to make the requests with a controlled degree of parallelism:
var options=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 4 ,
BoundedCapacity=10,
};
var spamBlock=new TransformManyBlock<Line,Result>(
y=>GetTasks(y.LineCode),
options);
var outputBlock=new BufferBlock<Result>();
spamBlock.LinkTo(outputBlock);
foreach(var line in lines)
{
await spamBlock.SendAsync(line);
}
spamBlock.Complete();
//Wait for all 4 workers to finish
await spamBlock.Completion;
Once the spamBlock completes, the results can be found in outputBlock. By setting a BoundedCapacity I ensure that the posting loop will wait if there are too many unprocessed messages in spamBlock's input queue.
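Once spamBlock.Completion has been awaited, one way (a sketch) to drain the buffered results is:
if (outputBlock.TryReceiveAll(out IList<Result> allResults))
{
    Console.WriteLine("Received {0} results", allResults.Count);
}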
These blocks can handle asynchronous methods too. Assuming GetTasksAsync returns a Task<Result[]>, we can use:
var spamBlock = new TransformManyBlock<Line, Result>(
    async y => await GetTasksAsync(y.LineCode),
    options);
You can use Parallel.ForEach:
Parallel.ForEach(Lines, line =>
{
    var tasks = GetTasks(line.LineCode).ToList();
    lock (globalQueue) // List<T> is not thread-safe, so guard the shared update
        globalQueue.AddRange(tasks);
});
A Parallel.ForEach loop works like a Parallel.For loop. The loop
partitions the source collection and schedules the work on multiple
threads based on the system environment. The more processors on the
system, the faster the parallel method runs.
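Note that globalQueue must be protected because Parallel.ForEach runs the body on multiple threads at once. If you would rather avoid the lock, a concurrent collection works too; a sketch (ConcurrentBag and the ProductionTask element type are assumptions here):
var bag = new ConcurrentBag<ProductionTask>();
Parallel.ForEach(Lines, line =>
{
    foreach (var t in GetTasks(line.LineCode))
        bag.Add(t);
});
globalQueue.AddRange(bag); // back on a single thread once the loop completes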
We're developing a WebAPI which has some logic for decrypting around 200 items (can be more). Each decryption takes around 20 ms.
We've tried to parallelize the tasks so we get it done as soon as possible, but it seems we're hitting some kind of limit: the threads are being reused after waiting for older threads to complete (and only a few are used), and the overall action takes around 1-2 seconds to complete...
What we basically want to achieve is to get X threads to start at the same time and finish after those ~20 ms.
We tried this:
Await multiple async Task while setting max running task at a time
But it seems this only describes setting a limit while we want to release it...
Here's a snippet:
var tasks = new List<Task>();
foreach (var element in Elements)
{
var task = new Task(() =>
{
    element.Value = Cipher.Decrypt((string)element.Value);
});
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
What are we missing here?
I cannot recommend parallelism on ASP.NET. It will certainly impact the scalability of your service, particularly if it is public-facing. I have thought "oh, I'm smart enough to do this" a couple of times and added parallelism in an ASP.NET app, only to have to tear it right back out a week later.
However, if you really want to...
it seems we're getting some kind of a limit
Is it the limit of physical cores on your machine?
We tried this: Await multiple async Task while setting max running task at a time
That solution is specifically for asynchronous concurrent code (e.g., I/O-bound). What you want is parallel (threaded) concurrent code (e.g., CPU-bound). Completely different use cases and solutions.
What are we missing here?
Your current code is throwing a ton of simultaneous tasks at the thread pool, which will attempt to handle them as best as it can. You can make this more efficient by using a higher-level abstraction, e.g., Parallel:
Parallel.ForEach(Elements, element =>
{
element.Value = Cipher.Decrypt((string)element.Value);
});
Parallel is more intelligent in terms of its partitioning and (re-)use of threads (i.e., not exceeding the number of cores). So you should see some speedup.
However, I would expect it only to be a minor speedup. You are likely being limited by your number of physical cores.
Assuming no hyper-threading:
If it takes 20 ms for one item, you can look at it as taking one core 20 ms. If you want 200 items to complete in 20 ms, you need 200 cores all to yourself. If you don't have that many, it just can't be done...
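To put numbers on it: with 8 cores and perfect scheduling, 200 items / 8 cores = 25 items per core, and 25 × 20 ms = 500 ms; add thread-pool ramp-up and scheduling overhead and you land in the 1-2 seconds you measured.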
Under normal circumstances, as many tasks will be scheduled in parallel as is optimal for your system.
I have used the Task Parallel Library in a couple of places in my WCF application.
In one place I am using it like this:
Place 1
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 100 };
Parallel.ForEach(objList, options, recurringOrder =>
{
Task.Factory.StartNew(() => ProcessSingleRequestForDebitOrder(recurringOrder));
//var th = new Thread(() => ProcessSingleRequestForDebitOrder(recurringOrder)) { Priority = ThreadPriority.Normal };
//th.Start();
//ProcessSingleRequestForDebitOrder( recurringOrder);
});
And in another method I have used it like this:
Place 2
System.Threading.Tasks.Task.Factory.StartNew(() => ProcessTransaction(objInput.Clone()));
The problem is time-slicing between the two places. That is, if I have called the method where the parallel loop (Place 1) is processing hundreds of records, my task at Place 2 waits until all those records have been processed. Is there some way I can time-slice the processing?
I am using the Task Parallel Library for .NET 3.5 from:
https://www.nuget.org/packages/TaskParallelLibrary/
The problem is that you have spawned a lot of tasks in place 1, and place 2 is now queued behind them. The Parallel loop in place 1 does next to nothing, because its body only starts a task, which returns very quickly.
You should probably remove the StartNew call from place 1 so that the degree of parallelism is lower. I'm not sure this will completely remove the problem, because the Parallel loop might still fully utilize all available pool threads.
Doing IO with Parallel is an anti-pattern anyway, because the system-chosen degree of parallelism is almost always a bad choice for IO. The TPL has no idea how to efficiently schedule IO.
You can make place 2 a LongRunning task so that it does not depend on the thread pool and is guaranteed to run.
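A sketch of what that looks like for place 2:
Task.Factory.StartNew(() => ProcessTransaction(objInput.Clone()),
    TaskCreationOptions.LongRunning);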
You also can investigate using async IO so that you do not depend on the thread pool anymore.
I was writing some code to process a lot of data, and I thought it would be useful to have Parallel.ForEach create a file for each thread it creates so the output doesn't need to be synchronized (by me at least).
It looks something like this:
Parallel.ForEach(vals,
new ParallelOptions { MaxDegreeOfParallelism = 8 },
()=>GetWriter(), // returns a new BinaryWriter backed by a file with a guid name
(item, state, writer)=>
{
if(something)
{
state.Break();
return writer;
}
List<Result> results = new List<Result>();
foreach(var subItem in item.SubItems)
results.Add(ProcessItem(subItem));
if(results.Count > 0)
{
foreach(var result in results)
result.Write(writer);
}
return writer;
},
(writer)=>writer.Dispose());
What I expected to happen was that up to 8 files would be created and would persist through the entire run time. Then each would be Disposed when the entire ForEach call finishes. What really happens is that the localInit seems to be called once for each item, so I end up with hundreds of files. The writers are also getting disposed at the end of each item that is processed.
This shows the same thing happening:
var vals = Enumerable.Range(0, 10000000).ToArray();
long sum = 0;
Parallel.ForEach(vals,
new ParallelOptions { MaxDegreeOfParallelism = 8 },
() => { Console.WriteLine("init " + Thread.CurrentThread.ManagedThreadId); return 0L; },
(i, state, common) =>
{
Thread.Sleep(10);
return common + i;
},
(common) => Interlocked.Add(ref sum, common));
I see:
init 10
init 14
init 11
init 13
init 12
init 14
init 11
init 12
init 13
init 11
... // hundreds of lines over < 30 seconds
init 14
init 11
init 18
init 17
init 10
init 11
init 14
init 11
init 14
init 11
init 18
Note: if I leave out the Thread.Sleep call, it sometimes seems to function "correctly". localInit only gets called once each for the 4 threads that it decides to use on my pc. Not every time, however.
Is this the desired behavior of the function? What's going on behind the scenes that causes it to do this? And lastly, what's a good way to get my desired functionality, ThreadLocal?
This is on .NET 4.5, by the way.
Parallel.ForEach does not work the way you think it does. It's important to note that the method is built on top of the Task classes and that the relationship between Task and Thread is not 1:1. You can have, for example, 10 tasks that run on 2 managed threads.
Try using this line in your method body instead of the current one:
Console.WriteLine("ThreadId {0} -- TaskId {1} ",
Thread.CurrentThread.ManagedThreadId, Task.CurrentId);
You should see that the ThreadId will be reused across many different tasks, shown by their unique ids. You'll see this more if you left in, or increased, your call to Thread.Sleep.
The (very) basic idea of how the Parallel.ForEach method works is that it takes your enumerable and creates a series of tasks that will process sections of the enumeration; how this is done depends a lot on the input. There is also some special logic that checks for the case of a task exceeding a certain number of milliseconds without completing. If that happens, a new task may be spawned to help relieve the work.
If you look at the documentation for the localInit function in Parallel.ForEach, you'll notice that it says it returns the initial state of the local data for each _task_, not each thread.
You might ask why there are more than 8 tasks being spawned. That answer is similar to the last, found in the documentation for ParallelOptions.MaxDegreeOfParallelism.
Changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.
This limit is only on the number of concurrent tasks, not a hard limit on the number of tasks that will be created during the entire processing time. And as I mentioned above, there are times when a separate task will be spawned, which results in your localInit function being called multiple times and writing hundreds of files to disk.
Writing to disk is certainly an operation with a bit of latency, particularly if you're using synchronous I/O. When the disk operation happens, it blocks the entire thread; the same happens with Thread.Sleep. If a Task does this, it will block the thread it is currently running on, and no other tasks can run on it. Usually in these cases, the scheduler will spawn a new Task to help pick up the slack.
And lastly, what's a good way to get my desired functionality, ThreadLocal?
The bottom line is that thread locals don't make sense with Parallel.ForEach because you're not dealing with threads; you're dealing with tasks. A thread local could be shared between tasks because many tasks can use the same thread at the same time. Also, a task's thread local could change mid-execution, because the scheduler could preempt it from running and then continue its execution on a different thread, which would have a different thread local.
I'm not sure the best way to do it, but you could rely on the localinit function to pass in whatever resource you'd like, only allowing a resource to be used in one thread at a time. You can use the localfinally to mark it as no longer in use and thus available for another task to acquire. This is what those methods were designed for; each method is only called once per task that is spawned (see the remarks section of the Parallel.ForEach MSDN documentation).
You can also split the work yourself and create your own set of threads to run it. However, this is less ideal, in my opinion, since the Parallel class already does this heavy lifting for you.
What you're seeing is the implementation trying to get your work done as quickly as possible.
To do this, it tries using different numbers of tasks to maximize throughput. It grabs a certain number of threads from the thread pool and runs your work for a bit. It then tries adding and removing threads to see what happens. It continues doing this until all your work is done.
The algorithm is quite dumb in that it doesn't know if your work is using a lot of CPU, or a lot of IO, or even if there is a lot of synchronization and the threads are blocking each other. All it can do is add and remove threads and measure how fast each unit of work completes.
This means it is continually calling your localInit and localFinally functions as it injects and retires threads - which is what you have found.
Unfortunately, there is no easy way to control this algorithm. Parallel.ForEach is a high-level construct that intentionally hides much of the thread-management code.
Using a ThreadLocal might help a bit, but it relies on the thread pool reusing the same threads when Parallel.ForEach asks for new ones. This is not guaranteed - in fact, it is unlikely that the thread pool will use exactly 8 threads for the whole call. This means you will again be creating more files than necessary.
One thing that is guaranteed is that Parallel.ForEach will never use more than MaxDegreeOfParallelism threads at any one time.
You can use this to your advantage by creating a fixed-size "pool" of files that can be re-used by whichever threads are running at a particular time. You know that only MaxDegreeOfParallelism threads can run at once, so you can create that number of files before calling ForEach. Then grab one in your localInit and release it in your localFinally.
Of course, you will have to write this pool yourself and it must be thread-safe as it will be called concurrently. A simple locking strategy should be good enough, though, because threads are not injected and retired very quickly compared to the cost of a lock.
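A minimal sketch of such a pool, reusing the question's GetWriter factory (the names are made up, and the locking strategy is deliberately simple):
class WriterPool : IDisposable
{
    private readonly Stack<BinaryWriter> _writers = new Stack<BinaryWriter>();
    private readonly object _gate = new object();

    public WriterPool(int size, Func<BinaryWriter> factory)
    {
        // Pre-create exactly as many writers as can ever be in use at once.
        for (int i = 0; i < size; i++)
            _writers.Push(factory());
    }

    public BinaryWriter Acquire() { lock (_gate) return _writers.Pop(); }
    public void Release(BinaryWriter writer) { lock (_gate) _writers.Push(writer); }

    public void Dispose()
    {
        lock (_gate)
            foreach (var w in _writers)
                w.Dispose();
    }
}

// Usage: MaxDegreeOfParallelism guarantees at most 8 concurrent tasks,
// so Acquire never finds the pool empty.
using (var pool = new WriterPool(8, GetWriter))
{
    Parallel.ForEach(vals,
        new ParallelOptions { MaxDegreeOfParallelism = 8 },
        () => pool.Acquire(),
        (item, state, writer) => { /* ... write results ... */ return writer; },
        writer => pool.Release(writer));
}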
According to MSDN the localInit method is called once for each task, not for each thread:
The localInit delegate is invoked once for each task that participates in the loop's execution and returns the initial local state for each of those tasks.
localInit is called when a thread is created.
If the body takes long enough, the scheduler may create another thread and suspend the current one, and each new thread triggers another call to localInit.
Also, when Parallel.ForEach is called, it creates threads up to the MaxDegreeOfParallelism value. For example:
var k = Enumerable.Range(0, 1);
Parallel.ForEach(k, new ParallelOptions { MaxDegreeOfParallelism = 4 }, i => { /* body */ });
it creates up to 4 threads when it is first called.