Batching Tasks and Running them in Parallel - c#

I have a List of items, and I would like to go through each item, create a task and launch the task. But, I want it to do batches of 10 tasks at once.
For example, if I have 100 URL's in a list, I want it to group them into batches of 10, and loop through batches getting the web response from 10 URL's per batch iteration.
Is this possible?
I am using C# 5 and .NET 4.5.

You can use Parallel.For() or Parallel.ForEach(), they will execute the work on a number of Tasks.
When you need precise control over the batches you could use a custom Partitioner but given that the problem is about URLs it will probably make more sense to use the more common MaxDegreeOfParallelism option.

The Partitioner has a good algorithm for creating the batches depending also on the number of cores.
Parallel.ForEach(Partitioner.Create(from, to), range =>
{
for (int i = range.Item1; i < range.Item2; i++)
{
// ... process i
}
});

Related

Concurrently running multiple tasks C#

I have REST web API service in IIS which takes a collection of request objects. The user can enter more than 100 request objects.
I want to run this 100 request concurrently and then aggregate the result and send it back. This involves both I/O operation (calling to backend services for each request) and CPU bound operations (to compute few response elements)
Code snippet -
using System.Threading.Tasks;
....
var taskArray = new Task<FlightInformation>[multiFlightStatusRequest.FlightRequests.Count];
for (int i = 0; i < multiFlightStatusRequest.FlightRequests.Count; i++)
{
var z = i;
taskArray[z] = Tasks.Task.Run(() =>
PerformLogic(multiFlightStatusRequest.FlightRequests[z],lite, fetchRouteByAnyLeg)
);
}
Task.WaitAll(taskArray);
for (int i = 0; i < taskArray.Length; i++)
{
flightInformations.Add(taskArray[i].Result);
}
public Object PerformLogic(Request,...)
{
//multiple IO operations each depends on the outcome of the previous result
//Computations after getting the result from all I/O operations
}
If i individually run the PerformLogic operation (for 1 object) it is taking 300 ms, now my requirement is when I run this PerformLogic() for 100 objects in a single request it should take around 2 secs.
PerformLogic() has the following steps - 1. Call a 3rd Party web service to get some details 2. Based on the details call another 3rd Party webservice 3. Collect the result from the webservice, apply few transformation
But with Task.run() it takes around 7 secs, I would like to know the best approach to handle concurrency and achieve the desired NFR of 2 secs.
I can see that at any point of time 7-8 threads are working concurrently
not sure if I can spawn 100 threads or tasks may be we can see some better performance. Please suggest an approach to handle this efficiently.
Judging by this
public Object PerformLogic(Request,...)
{
//multiple IO operations each depends on the outcome of the previous result
//Computations after getting the result from all I/O operations
}
I'd wager that PerformLogic spends most its time waiting on the IO operations. If so, there's hope with async. You'll have to rewrite PerformLogicand maybe even the IO operations - async needs to be present in all levels, from the top to the bottom. But if you can do it, the result should be a lot faster.
Other than that - get faster hardware. If 8 cores take 7 seconds, then get 32 cores. It's pricey, but could still be cheaper than rewriting the code.
First, don't reinvent the wheel. PLINQ is perfectly capable of doing stuff in parallel, there is no need for manual task handling or result merging.
If you want 100 tasks each taking 300ms done in 2 seconds, you need at least 15 parallel workers, ignoring the cost of parallelization itself.
var results = multiFlightStatusRequest.FlightRequests
.AsParallel()
.WithDegreeOfParallelism(15)
.Select(flightRequest => PerformLogic(flightRequest, lite, fetchRouteByAnyLeg)
.ToList();
Now you have told PLinq to use 15 concurrent workers to work on your queue of tasks. Are you sure your machine is up to the task? You could put any number you want in there, that doesn't mean that your computer magically gets the power to do that.
Another option is to look at your PerformLogic method and optimize that. You call it 100 times, maybe it's worth optimizing.

Parallel.ForEach performance

I am using Parallel.ForEach to extract a bunch of zipped files and copy them to a shared folder on a different machine, where then a BULK INSERT process is started. This all works well but i have noticed that, as soon as some big files come along, no new tasks are started. I assume this is because some files take longer than others, that the TPL starts scaling down, and stops creating new Tasks. I have set the MaxDegreeOfParallelism to a reasonable number (8). When i look at the CPU activity, i can see, that most of the time the SQL Server machine is below 30%, even less when it sits on a single BULK INSERT task. I think it could do more work. Can i somehow force the TPL to create more simultanously processed Tasks?
The reason is most likely the way Parallel.ForEach processes items by default. If you use it on array or something that implements IList (so that total length and indexer is available) - it will split whole workload in batches. Then separate thread will process each batch. That means if batches has different "size" (by size I mean time to processes them) - "small" batches will complete faster.
For example, let's look at this code:
var delays = Enumerable.Repeat(100, 24).Concat(Enumerable.Repeat(2000, 4)).ToArray();
Parallel.ForEach(delays, new ParallelOptions() {MaxDegreeOfParallelism = 4}, d =>
{
Thread.Sleep(d);
Console.WriteLine("Done with " + d);
});
If you run it, you will see all "100" (fast) items are processed fast and in parallel. However, all "2000" (slow) items are processed in the end one by one, without any parallelizm at all. That's because all "slow" items are in the same batch. Workload was splitted in 4 batches (MaxDegreeOfParallelism = 4), and first 3 contain only fast items. They are completed fast. Last batch has all slow items and so thread dedicated to this batch will process them one by one.
You can "fix" that for your situation either by ensuring that items are distributed evenly (so that "slow" items are not all together in source collection), or for example with custom partitioner:
var delays = Enumerable.Repeat(100, 24).Concat(Enumerable.Repeat(2000, 4)).ToArray();
var partitioner = Partitioner.Create(delays, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, new ParallelOptions {MaxDegreeOfParallelism = 4}, d =>
{
Thread.Sleep(d);
Console.WriteLine("Done with " + d);
});
NoBuffering ensures that items are taken one at a time, so avoids the problem.
Using another means to parallelize your work (such as SemaphoreSlim, or BlockingCollection) are also an option.

C# run method multiple time concurrently

I have the following C# code:
string[] files = Directory.GetFiles(#"C:\Users\Documents\DailyData", "*.txt", SearchOption.AllDirectories);
for (int ii = 0; ii < files.Length; ii++)
{
perFile(files[ii]);
}
Where per file opens each file and creates a series of generic lists, then writes the results to a file.
Each iteration takes about 15 minutes.
I want the perFile method to execute concurrently eg not wait for the current iteration to finish before starting the next one.
My questions are:
How would I do this?
How can I control how many instances of perFile are running concurrently
How can I determine how many concurrent instances of perFile my computer can handle at one time
Try to use Parallel.ForEach. You can use ParallelOptions.MaxDegreeOfParallelism property to set max count of running perFile.
You can use Parallel.For to do what you need if you're using .Net 4.5 and above, You can control how many threads to use by providing a ParallelOptions with MaxDegreeOfParalleism set to a number you like.
Example code:
ParallelOptions POptions = new ParallelOptions();
POptions.MaxDegreeOfParalleism = ReplaceThisWithSomeNumber;
Parallel.For(0,files.Length,POptions,(x) => {perFile(files[x]);});
Note that your program will be IO bound though (read Asynchronous IO for more information on how to tackle that)

Chunk partitioning IEnumerable in Parallel.Foreach

Does anyone know of a way to get the Parallel.Foreach loop to use chunk partitioning versus, what i believe is range partitioning by default. It seems simple when working with arrays because you can just create a custom partitioner and set load-balancing to true.
Since the number of elements in an IEnumerable isn't known until runtime I can't seem to figure out a good way to get chunk partitioning to work.
Any help would be appreciated.
thanks!
The tasks i'm trying to perform on each object take significantly different times to perform. At the end i'm usually waiting hours for the last thread to finish its work. What I'm trying to achieve is to have the parallel loop request chunks along the way instead of pre-allocating items to each thread.
If your IEnumerable was really something that had a an indexer (i.e you could do obj[1] to get a item out) you could do the following
var rangePartitioner = Partitioner.Create(0, source.Length);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
// Loop over each range element without a delegate invocation.
for (int i = range.Item1; i < range.Item2; i++)
{
var item = source[i]
//Do work on item
}
});
However if it can't do that you must write a custom partitioner by creating a new class derived from System.Collections.Concurrent.Partitioner<TSource>. That subject is too broad to cover in a SO answer but you can take a look at this guide on the MSDN to get you started.
UPDATE: As of .NET 4.5 they added a Partitioner.Create overload that does not buffer data, it has the same effect of making a custom partitioner with a range max size of 1. With this you won't get a single thread that has a bunch of queued up work if it got unlucky with a bunch of slow items in a row.
var partitoner = Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitoner, item =>
{
//Do work
}

Start new threads in each loop and control amount of threads to be not more than x-amount at the same time

I have been writing "linear" winforms for couple of months and now I am trying to figure out threading.
this is my loop that has around 40,000 rows and it takes around 1 second to perform a task on this row:
foreach (String CASE in MAIN_CASES_LIST)
{
//bunch of code here
}
How do I
put each loop into separate thread
maintain no more than x-amount of threads at the same time
If you're on .NET 4 you can utilize Parallel.ForEach
Parallel.ForEach(MAIN_CASES_LIST, CASE =>
{
//bunch of code here
});
There's a great library called SmartThreadPool which may be useful here, it does a lot of useful stuff with threading and queueing, abstracting most of this away from you
Not sure if it will help you, but you can queue up a crapload of work items, limit the number of threads, etc etc
http://www.codeproject.com/Articles/7933/Smart-Thread-Pool
Of course if you want to get your hands dirty with multi-threading or use Parallel go for it, it's just a suggestion :)
To mix the answers above, and add a limit to the maximum number of threads created, you can use this overloaded call. Just be sure to add "using System.Threading.Tasks;" at the top.
LinkedList<String> theList = new LinkedList<string>();
ParallelOptions parOptions = new ParallelOptions();
parOptions.MaxDegreeOfParallelism = 5; //only up to 5 threads allowed.
Parallel.ForEach(theList.AsEnumerable(), parOptions , (string CASE) =>
{
//bunch of code here
});

Categories

Resources