Parallel execution of function and reasonable number of instances - c#

I have a list of objects and need to run some processing on each of them in the least possible time.
Since the processing of each object is independent of the others, we decided to do it in parallel with Parallel.ForEach.
Parallel.ForEach(hugeObjectList,
new ParallelOptions { MaxDegreeOfParallelism = 50 },
obj => DoSomeWork(obj)
);
Since setting a huge number for ParallelOptions.MaxDegreeOfParallelism (e.g. 50 or 100) seems unreasonable to me, how can we find the optimal number of parallel tasks to crunch this list?
Does Parallel.ForEach start each DoSomeWork on a different core? (So, since we have 4 cores, would the correct degree of parallelism be 4?)

I think this says it all
By default, For and ForEach will utilize however many threads the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.
MSDN

Asking the platform should get you close to the optimum (for CPU bound work).
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
Doing nothing is another very good option, i.e.
//new ParallelOptions { MaxDegreeOfParallelism = 50 },
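As a minimal sketch of the CPU-bound case described above, the following caps the loop at the number of logical processors. The `CpuBoundExample` class and the body of `DoSomeWork` are hypothetical stand-ins (the question does not show the real work), and the sum is only there to give the loop an observable result.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundExample
{
    // Hypothetical stand-in for the question's DoSomeWork: pure CPU work.
    public static long DoSomeWork(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += (long)i * i;
        return sum;
    }

    public static long ProcessAll(IReadOnlyList<int> items)
    {
        long total = 0;
        var options = new ParallelOptions
        {
            // One worker per logical processor is a reasonable ceiling for CPU-bound work.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(items, options, item =>
        {
            long result = DoSomeWork(item);
            // Several iterations run at once, so the shared total is updated atomically.
            Interlocked.Add(ref total, result);
        });

        return total;
    }
}
```

Removing the `options` argument gives the "do nothing" variant, where the scheduler decides how many workers to use.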
Edit
there's a lot of io with a database ...
That makes MaxDegreeOfParallelism = 1 another very good candidate. Or maybe 2.
What you really should be looking into is async/await and async database calls. Not the Parallel class.

The only way to know for sure is to test it. More threads does not equal better performance, and may often yield worse performance. Some thoughts:
Designing an algorithm for a single thread, and then adding Parallel.For around it is pointless. You must change your algorithm to take advantage of multiple threads or the benefits to parallel processing will be minor or negative.
If you are reading from disk or downloading data over a network connection where the server is able to feed you as fast as you get the data, you may find that a producer/consumer pattern performs best. If the processing is computationally expensive, use many consumer threads (I tend to use Num Cores - 2. One for the UI, one for the producer). If not computationally expensive, it won't matter how many consumer threads you use.
If you are downloading data from the Internet from a variety of sources, and the servers take time to respond, you should start up quite a few threads (50-100 is not crazy). This is because the threads will just sit there waiting for the server to respond.
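Since the advice above boils down to "test it", here is a rough sketch of a timing harness that runs the same workload at several degrees of parallelism. The `DegreeBenchmark` class and its `Measure` method are made-up names for illustration; a serious benchmark would also add warm-up runs and repeat each measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class DegreeBenchmark
{
    // Runs the same work once per candidate degree and records the elapsed time.
    public static Dictionary<int, TimeSpan> Measure<T>(
        IReadOnlyList<T> items, Action<T> work, IEnumerable<int> degrees)
    {
        var timings = new Dictionary<int, TimeSpan>();
        foreach (int degree in degrees)
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
            var stopwatch = Stopwatch.StartNew();
            Parallel.ForEach(items, options, work);
            stopwatch.Stop();
            timings[degree] = stopwatch.Elapsed;
        }
        return timings;
    }
}
```

Calling it with candidate degrees such as 1, 2, 4, 8 and 16 and comparing the timings shows where adding workers stops helping for a given workload.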

Related

How many threads are too many threads?

I need to develop a procedure that will retrieve 100 rows (product items) performing some very complex calculations.
On average, 0.3s is required for each item, which means that I may face a 30s delay if I perform the calculation serially.
The calculation for each item does not depend on the results of the other items, so I am thinking of using C# asynchronous programming features to create threads that perform the calculations in parallel.
The above calculation will be performed in an ASP.NET Core app (.NET 6) that will serve about 10 users.
Until now, I used asynchronous programming for the purposes of keeping the main thread responsive, so I had no worries about system resources. Now I have to design a procedure that may require 100 x 10 = 1000 threads.
Keep in mind that the calculations are performed in the database, so calculation does not require any additional resources.
Should I do it?
If whatever calculation you are running is compute limited, there is no reason to use more threads than you have logical CPU cores. There might be reasons to use fewer threads to reserve some resources for other things, like keeping the UI responsive. A Parallel.For would be a typical solution to run compute-limited code concurrently. It automatically scales the number of threads used, but also allows a maximum to be set if you want to reserve some cores.
If you are IO limited you do not really need to use any threads. As long as you are using "true" asynchronous calls, no threads will be used while the IO system is working. But note that IO operations may not scale well with higher concurrency, since there will be hardware limits. This is especially true if you have spinning disks.
If your workload is mixed compute and IO you might want to pipeline the IO and the compute, so take a look at DataFlow. If most of the work is performed by the database you may need to just try and see how many threads can be used before performance starts to drop. Databases involve a mix of IO and compute, but also things like locks and semaphores that might add additional limits. You should also check your queries to ensure that they are as efficient as possible; there is no need to spend a bunch of time optimizing concurrency if an index will make the queries 100 times faster.

Type of threading to use in c# for heavy IO operations

I am tasked with updating a C# application (non-GUI) that is very single-threaded in its operation, adding multi-threading to it so that it turns queues of work over quicker.
Each thread will need to perform a very minimal amount of calculations, but most of the work will be calling on and wait on SQL Server requests. So, lots of waiting as compared to CPU time.
A couple of requirements will be:
Running on some limited hardware (that is, just a couple of cores). The current system, when it's being "pushed", only takes about 25% CPU. But, since it's mostly waiting for the SQL Server to respond (different server), we would like the capability to have more threads than cores.
Be able to limit the number of threads. I also can't just have an unlimited number of threads going either. I don't mind doing the limiting myself via an Array, List, etc.
Be able to keep track of when these threads complete so that I can do some post-processing.
It just seems to me that the .NET Framework has so many different ways of doing threads, I'm not sure if one is better than the other for this task. I'm not sure if I should be using Task, Thread, ThreadPool, something else... It appears to me that the async \ await model would not be a good fit in this case though as it waits on one specific task to complete.
I'm not sure if I should be using Task, Thread, ThreadPool, something else...
In your case it matters less than you would think. You can focus on what fits your (existing) code style and dataflow the best.
since it's mostly doing waits for the SQL Server to respond
Your main goal would be to get as many of those SQL queries going in parallel as possible.
Be able to limit the number of threads.
Don't worry about that too much. On 4 cores, with 25% CPU, you can easily have 100 threads going. More on 64-bit. But you don't want 1000s of threads. A .NET thread reserves about 1 MB for its stack by default, so estimate how much RAM you can spare.
So it depends on your application, how many queries can you get running at the same time. Worry about thread-safety first.
When the number of parallel queries is > 1000, you will need async/await to run on fewer threads.
As long as it is < 100, just let threads block on I/O. Parallel.ForEach() , Parallel.Invoke() etc look like good tools.
The 100 - 1000 range is the grey area.
add multi-threading to it to get it to turn queues of work over quicker.
Each thread will need to perform a very minimal amount of calculations, but most of the work will be calling on and wait on SQL Server requests. So, lots of waiting as compared to CPU time.
With that kind of processing, it's not clear how multithreading will benefit you. Multithreading is one form of concurrency, and since your workload is primarily I/O-bound, asynchrony (and not multithreading) would be the first thing to consider.
It just seems to me that the .NET Framework has so many different ways of doing threads, I'm not sure if one is better than the other for this task.
Indeed. For reference, Thread and ThreadPool are pretty much legacy these days; there are much better higher-level APIs. Task should also be rare if used as a delegate task (e.g., Task.Factory.StartNew).
It appears to me that the async \ await model would not be a good fit in this case though as it waits on one specific task to complete.
await will wait on one task at a time, yes. Task.WhenAll can be used to combine multiple tasks and then you can await the combined task.
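A minimal sketch of that combination follows. `QueryAsync` is a hypothetical stand-in for an asynchronous SQL Server call (simulated here with `Task.Delay`), and the class and method names are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllExample
{
    // Hypothetical stand-in for an asynchronous SQL Server request.
    public static async Task<int> QueryAsync(int id)
    {
        await Task.Delay(50); // simulates waiting on the server without blocking a thread
        return id * 2;
    }

    public static async Task<int[]> RunAllAsync(IEnumerable<int> ids)
    {
        // Start every query first, so they are all in flight at the same time...
        List<Task<int>> tasks = ids.Select(id => QueryAsync(id)).ToList();

        // ...then await the combined task. Results come back in input order.
        int[] results = await Task.WhenAll(tasks);

        // Post-processing can go here: every query has completed at this point.
        return results;
    }
}
```

Note that this version starts all queries at once, so on its own it does not limit concurrency.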
get it to turn queues of work over quicker.
Be able to limit the number of threads.
Be able to keep track of when these threads complete so that I can do some post-processing.
It sounds to me that TPL Dataflow would be the best approach for your system. Dataflow allows you to define a "pipeline" through which data flows, with some steps being asynchronous (e.g., querying SQL Server) and other steps being parallel (e.g., data processing).
I was asking a high-level question to try and get back a high-level answer.
You may be interested in my book.
The TPL Dataflow library is probably one of the best options for this job. Here is how you could construct a simple dataflow pipeline consisting of two blocks. The first block accepts a filepath and produces some intermediate data, that can be later inserted to the database. The second block consumes the data coming from the first block, by sending them to the database.
var inputBlock = new TransformBlock<string, IntermediateData>(filePath =>
{
    return GetIntermediateDataFromFilePath(filePath);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount // What the local machine can handle
});
var databaseBlock = new ActionBlock<IntermediateData>(item =>
{
    SaveItemToDatabase(item);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 20 // What the database server can handle
});
inputBlock.LinkTo(databaseBlock);
Now every time a user uploads a file, you just save the file in a temp path, and post the path to the first block:
inputBlock.Post(filePath);
And that's it. The data will flow from the first to the last block of the pipeline automatically, transformed and processed along the way, according to the configuration of each block.
This is an intentionally simplified example to demonstrate the basic functionality. A production-ready implementation will probably have more options defined, like the CancellationToken and BoundedCapacity, will watch the return value of inputBlock.Post to react in case the block can't accept the job, may have completion propagation, watch the databaseBlock.Completion property for errors etc.
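To make a few of the options mentioned above concrete, here is a sketch that adds BoundedCapacity, completion propagation and waiting on the Completion property. It is only one possible arrangement: the `PipelineSketch` class is invented, the intermediate data is reduced to an `int`, and the database write is simulated with `Task.Delay`, all of which are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PipelineSketch
{
    public static async Task<int> RunAsync(IEnumerable<string> filePaths)
    {
        var saved = new ConcurrentBag<int>();

        var inputBlock = new TransformBlock<string, int>(
            path => path.Length, // hypothetical stand-in for GetIntermediateDataFromFilePath
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 100 // back-pressure: the block buffers at most 100 items
            });

        var databaseBlock = new ActionBlock<int>(
            async item =>
            {
                await Task.Delay(10); // hypothetical stand-in for an async database write
                saved.Add(item);
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 20,
                BoundedCapacity = 100
            });

        // Completing the first block also completes the second once it has drained.
        inputBlock.LinkTo(databaseBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (string path in filePaths)
        {
            // SendAsync waits while the block is full, whereas Post would return false.
            await inputBlock.SendAsync(path);
        }

        inputBlock.Complete();
        await databaseBlock.Completion; // rethrows if the database block faulted
        return saved.Count;
    }
}
```

One caveat of this sketch: with PropagateCompletion, a fault in the first block reaches the awaiter wrapped inside the second block's exception.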
If you are interested at following this route, it would be a good idea to study the library a bit, in order to become familiar with the options available. For example there is a TransformManyBlock available, suitable for producing multiple outputs from a single input. The BatchBlock may also be useful in some cases.
The TPL Dataflow library is built into .NET Core, and is available as a package for .NET Framework. It has some learning curve, and some gotchas to be aware of, but it's nothing terrible.
It appears to me that the async \ await model would not be a good fit in this case though as it waits on one specific task to complete.
That is wrong. Async/await is just syntax that simplifies a state-machine mechanism for asynchronous code. It waits without consuming any thread. In other words, the async keyword does not create a thread and await does not hold up any thread.
Be able to limit the number of threads
see How to limit the amount of concurrent async I/O operations?
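One common way to cap concurrent async operations is a SemaphoreSlim used as a gate. The sketch below is an assumption about what such a limiter might look like, not the content of the linked answer; `ThrottledAsync` and its parameters are names made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledAsync
{
    // Runs 'operation' for every item, with at most 'maxConcurrency' in flight at once.
    public static async Task<TResult[]> RunAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task<TResult>> operation)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        List<Task<TResult>> tasks = items.Select(async item =>
        {
            await gate.WaitAsync(); // waits asynchronously for a free slot
            try
            {
                return await operation(item);
            }
            finally
            {
                gate.Release(); // frees the slot even if the operation throws
            }
        }).ToList();

        // Completes when every operation has finished; results keep input order.
        return await Task.WhenAll(tasks);
    }
}
```

No thread is blocked while an item waits for a slot, so the limit applies to in-flight operations rather than to threads.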
Be able to keep track of when these threads complete so that I can do some post-processing.
If you don't use "fire and forget" pattern then you can keep track of the task and its exceptions just by writing await task
var task = MethodAsync();
await task;
PostProcessing();
async Task MethodAsync(){ ... }
Or for a similar approach you can use ContinueWith:
var task = MethodAsync();
await task.ContinueWith(_ => PostProcessing()); // ContinueWith passes the antecedent task to the delegate
async Task MethodAsync(){ ... }
read more:
Releasing threads during async tasks
https://learn.microsoft.com/en-us/dotnet/standard/asynchronous-programming-patterns/?redirectedfrom=MSDN

Performances of PLINQ vs TPL

I have some DB operations to perform and I tried using PLINQ:
someCollection.AsParallel()
    .WithCancellation(token)
    .ForAll(element => ExecuteDbOperation(element));
And I notice it is quite slow compared to:
var tasks = someCollection.Select(element =>
    Task.Run(() => ExecuteDbOperation(element), token))
    .ToList();
await Task.WhenAll(tasks);
I prefer the PLINQ syntax, but I am forced to use the second version for performances.
Can someone explain the big difference in performances?
My guess is that this is because of the number of threads created.
In the first example this number will be roughly equal to the number of cores of your computer. By contrast, the second example creates one task per element of someCollection, and the thread pool keeps adding threads while the running ones are blocked. For IO operations that is generally more efficient.
The Microsoft guide "Patterns_of_Parallel_Programming_CSharp" recommends for IO operation to create more threads than default (p. 33):
var addrs = new[] { addr1, addr2, ..., addrN };
var pings = from addr in addrs.AsParallel().WithDegreeOfParallelism(16)
select new Ping().Send(addr);
Both PLINQ and Parallel.ForEach() were primarily designed to deal with CPU-bound workloads, which is why they don't work so well for your IO-bound work. For some specific IO-bound work, there is an optimal degree of parallelism, but it doesn't depend on the number of CPU cores, while the degree of parallelism in PLINQ and Parallel.ForEach() does depend on the number of CPU cores, to a greater or lesser degree.
Specifically, the way PLINQ works is to use a fixed number of Tasks, by default based on the number of CPU cores on your computer. This is meant to work well for a chain of PLINQ methods. But it seems this number is smaller than the ideal degree of parallelism for your work.
On the other hand Parallel.ForEach() delegates deciding how many Tasks to run to the ThreadPool. And as long as its threads are blocked, ThreadPool slowly keeps adding them. The result is that, over time, Parallel.ForEach() might get closer to the ideal degree of parallelism.
The right solution is to figure out what the right degree of parallelism for your work is by measuring, and then using that.
Ideally, you would make your code asynchronous and then use some approach to limit the degree of parallelism for async code.
Since you said you can't do that (yet), I think a decent solution might be to avoid the ThreadPool and run your work on dedicated threads (you can create those by using Task.Factory.StartNew() with TaskCreationOptions.LongRunning).
If you're okay with sticking to the ThreadPool, another solution would be to use PLINQ ForAll(), but also call WithDegreeOfParallelism().
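The PLINQ variant with an explicit degree of parallelism might look like the following sketch. The blocking database call is simulated with `Thread.Sleep`, and `PlinqDegreeExample`, `Run` and the completed-item counter are hypothetical names added only so the example has something to return.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqDegreeExample
{
    public static int Run(IEnumerable<int> elements, int degree, CancellationToken token)
    {
        var done = new ConcurrentBag<int>();

        elements.AsParallel()
                .WithCancellation(token)
                .WithDegreeOfParallelism(degree) // e.g. 16 for IO-bound work, found by measuring
                .ForAll(element =>
                {
                    Thread.Sleep(20); // hypothetical stand-in for a blocking ExecuteDbOperation
                    done.Add(element);
                });

        return done.Count;
    }
}
```

The right value for `degree` is whatever measurement shows the database can sustain, not a function of the local core count.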
I believe that if you have, let's say, more than 10000 elements, it will be better to use PLINQ, because it won't create a task for each element of your collection: it uses a Partitioner internally. Each task creation carries some initialization overhead. The Partitioner will create only as many tasks as suit the currently available cores, and will re-use these tasks with new data to process. You can read more about it here: http://blogs.msdn.com/b/pfxteam/archive/2009/05/28/9648672.aspx

best use of Parallel.ForEach / Multithreading

I need to scrape data from a website.
I have over 1,000 links I need to access. Previously I was dividing the links 10 per thread and starting 100 threads, each pulling 10. After a few test cases, 100 threads turned out to be the best count to minimize the time needed to retrieve the content for all the links.
I realized that .NET 4.0 offers better support for multi-threading out of the box, but this is based on how many cores you have, which in my case does not spawn enough threads. I guess what I am asking is: what is the best way to optimize pulling the 1,000 links? Should I be using .ForEach and let the Parallel extension control the number of threads that get spawned, or find a way to tell it how many threads to start and divide the work?
I have not worked with Parallel before, so my approach may be wrong.
You can use the MaxDegreeOfParallelism property in Parallel.ForEach to control the maximum number of threads that will be spawned.
Here's the code snippet:
ParallelOptions opt = new ParallelOptions();
opt.MaxDegreeOfParallelism = 5;
Parallel.ForEach(Directory.GetDirectories(Constants.RootFolder), opt, MyMethod);
In general, Parallel.ForEach() is quite good at optimizing the number of threads. It accounts for the number of cores in the system, but also takes into account what the threads are doing (CPU bound, IO bound, how long the method runs, etc.).
You can control the maximum degree of parallelization, but there's no mechanism to force more threads to be used.
Make sure your benchmarks are correct and can be compared in a fair manner (e.g. same websites, allow for a warm-up period before you start measuring, and do many runs since response time variance can be quite high scraping websites). If after careful measurement your own threading code is still faster, you can conclude that you have optimized for your particular case better than .NET and stick with your own code.
Something worth checking out is the TPL Dataflow library.
DataFlow on MSDN.
See Nesting await in Parallel.ForEach
The whole idea behind Parallel.ForEach() is that you have a set of threads and each processes part of the collection. As you noticed, this doesn't work with async-await, where you want to release the thread for the duration of the async call.
Also, the walkthrough Creating a Dataflow Pipeline specifically sets up and processes multiple web page downloads. TPL Dataflow really was designed for that scenario.
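On .NET 6 and later, Parallel.ForEachAsync is another option that combines async-await with a concurrency limit. The sketch below assumes that runtime; the `ScrapeSketch` class is invented, and the `fetch` delegate is a placeholder for whatever actually downloads a page.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ScrapeSketch
{
    // Downloads every link with at most 'maxConcurrency' requests in flight.
    public static async Task<IReadOnlyDictionary<string, string>> FetchAllAsync(
        IEnumerable<string> links,
        Func<string, CancellationToken, Task<string>> fetch,
        int maxConcurrency)
    {
        var pages = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };

        await Parallel.ForEachAsync(links, options, async (link, ct) =>
        {
            // The thread is released while the request is pending.
            pages[link] = await fetch(link, ct);
        });

        return pages;
    }
}
```

With a shared `HttpClient` named `http`, the delegate could be `(url, ct) => http.GetStringAsync(url, ct)`, and a value such as 100 for `maxConcurrency` would mirror the thread count the question found to work best.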
It is hard to say without looking at your code and how the collection is defined. I've found that Parallel.Invoke is the most flexible. Try MSDN; it sounds like you are looking for the Parallel.For Method (Int32, Int32, Action<Int32, ParallelLoopState>).

C# Multithreading File IO (Reading)

We have a situation where our application needs to process a series of files and rather than perform this function synchronously, we would like to employ multi-threading to have the workload split amongst different threads.
Each item of work is:
1. Open a file for read only
2. Process the data in the file
3. Write the processed data to a Dictionary
We would like to perform each file's work on a new thread.
Is this possible, and would we be better off using the ThreadPool or spawning new threads, keeping in mind that each item of "work" only takes 30ms but that hundreds of files may need to be processed?
Any ideas to make this more efficient is appreciated.
EDIT: At the moment we are making use of the ThreadPool to handle this. If we have 500 files to process we cycle through the files and allocate each "unit of processing work" to the threadpool using QueueUserWorkItem.
Is it suitable to make use of the threadpool for this?
I would suggest you use ThreadPool.QueueUserWorkItem(...); with it, threads are managed by the system and the .NET Framework. The chances of messing up with your own thread pool are much higher, so I would recommend using the ThreadPool provided by .NET.
It's very easy to use,
ThreadPool.QueueUserWorkItem(new WaitCallback(YourMethod), ParameterToBeUsedByMethod);
void YourMethod(object o)
{
    // Your code here...
}
For more reading please follow the link http://msdn.microsoft.com/en-us/library/3dasc8as%28VS.80%29.aspx
Hope, this helps
I suggest you have a finite number of threads (say 4) and then have 4 pools of work. I.e. If you have 400 files to process have 100 files per thread split evenly. You then spawn the threads, and pass to each their work and let them run until they have finished their specific work.
You only have a certain amount of I/O bandwidth so having too many threads will not provide any benefits, also remember that creating a thread also takes a small amount of time.
Instead of having to deal with threads or manage thread pools directly, I would suggest using a higher-level library like Parallel Extensions (PLINQ):
// AsParallel() goes on the source so that reading and processing both run in parallel
var filesContent = from file in enumerableOfFilesToProcess.AsParallel()
                   select new
                   {
                       File = file,
                       Content = File.ReadAllText(file)
                   };
var processedContent = from content in filesContent
                       select new
                       {
                           content.File,
                           ProcessedContent = ProcessContent(content.Content)
                       };
var dictionary = processedContent
    .ToDictionary(c => c.File);
PLINQ will handle thread management according to available cores and load while you get to concentrate on the business logic at hand (wow, that sounded like a commercial!)
Parallel Extensions are part of the .NET Framework 4.0, but a back-port to 3.5 is also available as part of the Reactive Framework.
I suggest using the CCR (Concurrency and Coordination Runtime) it will handle the low-level threading details for you. As for your strategy, one thread per work item may not be the best approach depending on how you attempt to write to the dictionary, because you may create heavy contention since dictionaries aren't thread safe.
Here's some sample code using the CCR, an Interleave would work nicely here:
Arbiter.Activate(dispatcherQueue, Arbiter.Interleave(
new TeardownReceiverGroup(Arbiter.Receive<bool>(
false, mainPort, new Handler<bool>(Teardown))),
new ExclusiveReceiverGroup(Arbiter.Receive<object>(
true, mainPort, new Handler<object>(WriteData))),
new ConcurrentReceiverGroup(Arbiter.Receive<string>(
true, mainPort, new Handler<string>(ReadAndProcessData)))));
public void WriteData(object data)
{
// write data to the dictionary
// this code is never executed in parallel so no synchronization code needed
}
public void ReadAndProcessData(string s)
{
// this code gets scheduled to be executed in parallel
// CCR takes care of the task scheduling for you
}
public void Teardown(bool b)
{
// clean up when all tasks are done
}
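The contention concern raised above (a plain Dictionary is not thread-safe) can also be handled without the CCR. The following sketch swaps in a different technique, Parallel.ForEach writing into a ConcurrentDictionary; the `FileProcessingSketch` class and the two delegates are hypothetical, chosen so the file reading and processing steps can be supplied by the caller.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FileProcessingSketch
{
    public static ConcurrentDictionary<string, string> ProcessFiles(
        IEnumerable<string> paths,
        Func<string, string> readFile,
        Func<string, string> processContent)
    {
        // ConcurrentDictionary allows writes from several threads without explicit locks.
        var results = new ConcurrentDictionary<string, string>();

        Parallel.ForEach(paths, path =>
        {
            string content = readFile(path);          // step 1: open and read the file
            results[path] = processContent(content);  // steps 2 and 3: process and store
        });

        return results;
    }
}
```

A call such as `ProcessFiles(paths, File.ReadAllText, ProcessContent)` would match the three steps listed in the question, with `ProcessContent` being the application's own method.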
In the long run, I think you'll be happier if you manage your own threads. This will let you control how many are running and make it easy to report status.
Build a worker class that does the processing and give it a callback routine to return results and status.
For each file, create a worker instance and a thread to run it. Put the thread in a Queue.
Peel threads off of the queue up to the maximum you want to run simultaneously. As each thread completes go get another one. Adjust the maximum and measure throughput. I prefer to use a Dictionary to hold running threads, keyed by their ManagedThreadId.
To stop early, just clear the queue.
Use locking around your thread collections to preserve your sanity.
Use ThreadPool.QueueUserWorkItem to execute each independent task. Definitely don't create hundreds of threads. That is likely to cause major headaches.
The general rule is to use the ThreadPool if you don't want to worry about when the threads finish (or use Mutexes to track them), or about stopping the threads.
So do you need to worry about when the work is done? If not, the ThreadPool is the best option. If you want to track the overall progress or stop threads, then your own collection of threads is best.
ThreadPool is generally more efficient if you are re-using threads. This question will give you a more detailed discussion.
Hth
Using the ThreadPool for each individual task is definitely a bad idea. From my experience this tends to hurt performance more than it helps. The first reason is that a considerable amount of overhead is required just to allocate a task for the ThreadPool to execute. By default, each application is assigned its own ThreadPool that is initialized with ~100 thread capacity. When you are executing 400 operations in parallel, it does not take long to fill the queue with requests, and now you have ~100 threads all competing for CPU cycles. Yes, the .NET framework does a great job of throttling and prioritizing the queue; however, I have found that the ThreadPool is best left for long-running operations that probably won't occur very often (loading a configuration file, or random web requests). Using the ThreadPool to fire off a few operations at random is much more efficient than using it to execute hundreds of requests at once. Given the current information, the best course of action would be something similar to this:
Create a System.Threading.Thread (or use a SINGLE ThreadPool thread) with a queue that the application can post requests to
Use the FileStream's BeginRead and BeginWrite methods to perform the IO operations. This will cause the .NET framework to use native APIs to thread and execute the IO (IOCP).
This will give you 2 leverages, one is that your requests will still get processed in parallel while allowing the operating system to manage file system access and threading. The second is that because the bottleneck of the vast majority of systems will be the HDD, you can implement a custom priority sort and throttling to your request thread to give greater control over resource usage.
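In current .NET versions the Task-based ReadAsync and CopyToAsync methods are the usual replacement for BeginRead and BeginWrite. The sketch below uses that newer style rather than the Begin/End pattern the answer describes; `AsyncFileRead` is an invented name, and the 4096-byte buffer is an arbitrary choice.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileRead
{
    public static async Task<byte[]> ReadAllBytesAsync(string path)
    {
        // useAsync: true asks the OS for overlapped (asynchronous) file IO where supported.
        using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true);

        using var buffer = new MemoryStream();
        await stream.CopyToAsync(buffer); // no thread is blocked while the read is pending
        return buffer.ToArray();
    }
}
```

Throttling how many of these reads run at once remains the caller's job, which matches the answer's point about controlling disk pressure from a single request queue.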
I am currently writing a similar application, and this method is both efficient and fast... Without any threading or throttling my application was only using 10-15% CPU, which can be acceptable for some operations depending on the processing involved; however, it made my PC as slow as if an application was using 80%+ of the CPU. This was the file system access. The ThreadPool and IOCP functions do not care if they are bogging the PC down, so don't get confused: they are optimized for performance, even if that performance means your HDD is squealing like a pig.
The only problem I have had is that memory usage ran a little high (50+ MB) during the testing phase with approximately 35 streams open at once. I am currently working on a solution similar to the MSDN recommendation for SocketAsyncEventArgs, using a pool to allow x number of requests to be operating simultaneously, which ultimately led me to this forum post.
Hope this helps somebody with their decision making in the future :)
