Does anyone know of a way to get the Parallel.ForEach loop to use chunk partitioning instead of what I believe is range partitioning by default? It seems simple when working with arrays, because you can just create a custom partitioner and set load balancing to true.
Since the number of elements in an IEnumerable isn't known until runtime, I can't figure out a good way to get chunk partitioning to work.
Any help would be appreciated.
Thanks!
The tasks I'm trying to perform on each object take significantly different amounts of time, and at the end I'm usually waiting hours for the last thread to finish its work. What I'm trying to achieve is to have the parallel loop request chunks along the way instead of pre-allocating items to each thread.
If your IEnumerable is really something that has an indexer (i.e., you can do obj[1] to get an item out), you could do the following:
var rangePartitioner = Partitioner.Create(0, source.Length);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
    // Loop over each range element without a delegate invocation.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        var item = source[i];
        // Do work on item
    }
});
However, if it can't do that, you must write a custom partitioner by creating a new class derived from System.Collections.Concurrent.Partitioner&lt;TSource&gt;. That subject is too broad to cover fully in an SO answer, but you can take a look at this guide on MSDN to get you started.
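To give you a feel for it, here is a minimal sketch of such a partitioner, one that hands out a single item at a time from a shared enumerator so no thread ever has items queued up behind a slow one. The class and member names are mine, not from any library, and for brevity it never disposes the shared enumerator, which a production partitioner should handle.

// Requires: using System.Collections.Generic; using System.Collections.Concurrent;
public class SingleItemPartitioner<T> : Partitioner<T>
{
    private readonly IEnumerable<T> source;

    public SingleItemPartitioner(IEnumerable<T> source) { this.source = source; }

    public override bool SupportsDynamicPartitions { get { return true; } }

    public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
    {
        var dynamicPartitions = GetDynamicPartitions();
        var partitions = new List<IEnumerator<T>>(partitionCount);
        for (int i = 0; i < partitionCount; i++)
            partitions.Add(dynamicPartitions.GetEnumerator());
        return partitions;
    }

    public override IEnumerable<T> GetDynamicPartitions()
    {
        return new DynamicPartitions(source.GetEnumerator());
    }

    private class DynamicPartitions : IEnumerable<T>
    {
        private readonly IEnumerator<T> shared;

        public DynamicPartitions(IEnumerator<T> shared) { this.shared = shared; }

        // Each enumerator returned here is one partition; all of them pull
        // from the same shared enumerator under a lock, so a worker only
        // takes a new item once it has finished its previous one.
        public IEnumerator<T> GetEnumerator()
        {
            while (true)
            {
                T item;
                bool hasItem;
                lock (shared)
                {
                    hasItem = shared.MoveNext();
                    item = hasItem ? shared.Current : default(T);
                }
                if (!hasItem) yield break;
                yield return item;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

You would consume it like any other partitioner: Parallel.ForEach(new SingleItemPartitioner&lt;MyItem&gt;(source), item => { /* do work */ });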
UPDATE: As of .NET 4.5 there is a Partitioner.Create overload that does not buffer data; it has the same effect as a custom partitioner with a maximum chunk size of 1. With it you won't get a single thread stuck with a long queue of work just because it was unlucky enough to receive a run of slow items.
var partitioner = Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, item =>
{
    // Do work
});
I am attempting to check, for each element X in a list, ListA, whether two properties of X, X.Code and X.Rate, match the Code and Rate of any element Y in ListB. The current solution uses LINQ and AsParallel to execute these comparisons (time is a factor, and each list can contain anywhere from zero to a couple hundred elements).
So far the AsParallel method seems much faster; however, I am not sure these operations are thread-safe. My understanding is that because this comparison only reads values and does not modify them, it should be safe, but I am not 100% confident. How can I determine whether this operation is thread-safe before unleashing it on my production environment?
Here is the code I am working with:
var s1 = System.Diagnostics.Stopwatch.StartNew();
ListA.AsParallel().ForAll(x => x.IsMatching = ListB.AsParallel().Any(y => x.Code == y.Code && x.Rate == y.Rate));
s1.Stop();

var s2 = System.Diagnostics.Stopwatch.StartNew();
ListA.ForEach(x => x.IsMatching = ListB.Any(y => x.Code == y.Code && x.Rate == y.Rate));
s2.Stop();
Currently each method returns the same result, but the AsParallel() version executes in roughly a third of the time of the plain ForEach, so I hope to benefit from that if there is a way to perform this operation safely.
The code you have is thread-safe. The lists are accessed as read-only, and the synchronization implicit in the parallelized version is sufficient to ensure any writes have been committed. You do modify the elements within the list, but again, the synchronization implicit in the parallel operation, which the current thread necessarily has to wait on, ensures that any writes to the element objects are visible in the current thread.
That said, the thread safety is irrelevant, because you are doing the whole thing wrong. You are applying a brute-force O(N²) algorithm to a problem that can be addressed with a more elegant and efficient solution, the LINQ join:
var join = from x in list1
           join y in list2 on new { x.Code, x.Rate } equals new { y.Code, y.Rate }
           select x;

foreach (A a in join)
{
    a.IsMatching = true;
}
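If you prefer to avoid query syntax, a roughly equivalent sketch builds a hash lookup from one list; the anonymous-type key sidesteps the (unstated) types of Code and Rate. Note that unlike the join above, this also resets IsMatching to false for non-matches:

var keys = list2.ToLookup(y => new { y.Code, y.Rate });

foreach (A x in list1)
{
    x.IsMatching = keys.Contains(new { x.Code, x.Rate });
}

Either way the matching cost drops from O(N*M) comparisons to a single hash lookup per element.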
Your code example didn't include any initialization of sample data, so I can't reproduce your results with any reliability. Indeed, in my test set, where I initialized list1 and list2 identically, each with the same 1000 elements (I simply set Code and Rate to the element's index in the list, i.e. 0 through 999), I found the AsParallel() version slower than the serial version by a little more than 25% (250 iterations of the parallel version took around 2.7 seconds, while 250 iterations of the serial version took about 1.9 seconds).
But neither came close to the join version, which completed 250 iterations of that particular test data in about 60 milliseconds, almost 20 times faster than the faster of the other two implementations.
I'm reasonably confident that, despite my lack of a data set comparable to your scenario, the basic result will still stand, and that you will find the join approach far superior to either of the options you've tried so far.
I have a List of items, and I would like to go through each item, create a task, and launch the task, but I want it to run batches of 10 tasks at once.
For example, if I have 100 URLs in a list, I want it to group them into batches of 10 and loop through the batches, getting the web response from 10 URLs per batch iteration.
Is this possible?
I am using C# 5 and .NET 4.5.
You can use Parallel.For() or Parallel.ForEach(); they will execute the work across a number of tasks.
When you need precise control over the batches you could use a custom Partitioner, but given that the problem is about URLs it will probably make more sense to use the more common MaxDegreeOfParallelism option (see the sketch after the partitioner example below).
The Partitioner has a good algorithm for creating the batches, depending also on the number of cores.
Parallel.ForEach(Partitioner.Create(from, to), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        // ... process i
    }
});
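For the URL scenario specifically, a minimal sketch might look like this (urls is a hypothetical List&lt;string&gt;; WebClient is used only for brevity):

var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

Parallel.ForEach(urls, options, url =>
{
    using (var client = new System.Net.WebClient())
    {
        string response = client.DownloadString(url); // blocking download
        // process the response here
    }
});

This keeps at most 10 downloads in flight at once, without the fixed batch boundaries that grouping into batches of 10 would impose.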
I have a very large list of a custom class. I often need to perform a task on only those elements of the list whose custom value is above or below a specific threshold.
Currently, I do something like this:
// Sort the customList by its X value (sometimes ascending, sometimes descending)
customList.Sort((a, b) => b.X.CompareTo(a.X));

// Iterate through the list until the X value is not within the necessary range
for (int i = 0; i < customList.Count; i++)
{
    if (customList[i].X < .5f) break;
    PerformTask(customList[i]);
}
This isn't a huge bottleneck, but it would be best if I could speed up this kind of task for this application (not to mention I always want to learn things like this).
So the question is: is there a much faster sorting method without writing it myself, and/or is there a faster way to run PerformTask on the elements meeting specific criteria without iterating over all elements?
My question might also be better asked with regard to keeping a list sorted, not just when adding or removing items, but also when changing the values they are sorted on...
Thanks,
Tim
Sorting is the wrong approach here. It's O(n log n) even with a very efficient algorithm, whereas a single filtering pass is only O(n). Use Enumerable.Where:
foreach (var item in customList.Where(n => n.X > 0.5f))
{
    PerformTask(item);
}
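If PerformTask is expensive and thread-safe, the same filter also parallelizes in one line; a sketch assuming .NET 4:

customList.AsParallel().Where(n => n.X > 0.5f).ForAll(PerformTask);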
I have a for loop with more than 20k iterations; each iteration takes around two or three seconds, and the total is around 20 minutes. How can I optimize this for loop? I am using .NET 3.5, so Parallel.ForEach is not possible. So I split the 20,000 items into small chunks and implemented some threading, and now I am able to reduce the time by 50%. Is there any other way to optimize these kinds of for loops?
My sample code is given below:
// Requires: using System.Threading; using System.Runtime.Remoting.Messaging;
public delegate double MyDelegate(int id);

static double sum = 0.0;
static readonly object sumLock = new object();

public double AsyncTest()
{
    List<Item> ItemsList = GetItem(); // around 20k items
    int count = 0;
    var newItemsList = ItemsList.Take(62).ToList();
    while (newItemsList.Count > 0) // stop once the source list is exhausted
    {
        int j = 0;
        WaitHandle[] waitHandles = new WaitHandle[newItemsList.Count];
        foreach (Item item in newItemsList)
        {
            var delegateInstance = new MyDelegate(MyMethod);
            IAsyncResult asyncResult = delegateInstance.BeginInvoke(item.id, new AsyncCallback(MyAsyncResults), null);
            waitHandles[j] = asyncResult.AsyncWaitHandle;
            j++;
        }
        WaitHandle.WaitAll(waitHandles); // wait for the whole batch of 62 before starting the next
        count = count + 62;
        newItemsList = ItemsList.Skip(count).Take(62).ToList();
    }
    return sum;
}

public double MyMethod(int id)
{
    // Calculations
    return sum;
}

static public void MyAsyncResults(IAsyncResult iResult)
{
    AsyncResult asyncResult = (AsyncResult)iResult;
    MyDelegate del = (MyDelegate)asyncResult.AsyncDelegate;
    double mySum = del.EndInvoke(iResult);
    lock (sumLock) { sum = sum + mySum; } // callbacks run concurrently, so guard the shared sum
}
It's possible to reduce the number of loops by various techniques; however, this won't give you any noticeable improvement, since the heavy computation is performed inside your loops. If you've already parallelized it to use all your CPU cores, there is not much to be done. There is a certain amount of computation to be done and a certain amount of computing power available; you can't squeeze more from your machine than it can provide.
You can try to:
Implement your algorithm more efficiently, if possible
Switch to faster environment/language, such as unmanaged C/C++.
Is there a rationale behind your batch size (62)?
Is the "MyMethod" method IO-bound or CPU-bound?
What you do in each cycle is wait until the whole batch completes, and this wastes some cycles (you are actually waiting for all 62 calls to complete before taking the next batch).
Why not change the approach a bit, so that you still keep N operations running simultaneously, but fire a new operation as soon as one of the executing operations completes? A minimal sketch of that pattern follows.
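Here is one way to do that on .NET 3.5, using a Semaphore and the ThreadPool. This is a sketch under assumptions: MyMethod returns the partial result for a single id (rather than the shared static sum), ItemsList is non-empty, and N = 16 is an arbitrary degree of parallelism to tune.

const int N = 16;
var semaphore = new Semaphore(N, N);   // at most N calls in flight
var done = new ManualResetEvent(false);
int pending = ItemsList.Count;
object sumLock = new object();
double total = 0.0;

foreach (Item item in ItemsList)
{
    semaphore.WaitOne();               // blocks while N calls are running
    int id = item.id;
    ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            double partial = MyMethod(id);
            lock (sumLock) { total += partial; }
        }
        finally
        {
            semaphore.Release();       // free a slot for the next item
            if (Interlocked.Decrement(ref pending) == 0) done.Set();
        }
    });
}

done.WaitOne();                        // wait for the last work item to finish

With this, a slow item only occupies one of the N slots; the other slots keep draining the list instead of idling until the whole batch finishes.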
According to this blog, for loops are faster than foreach when iterating collections. Try looping with for. It will help.
It sounds like you have a CPU intensive MyMethod. For CPU intensive tasks, you can gain a significant improvement by parallelization, but only to the point of better utilizing all CPU cores. Beyond that point, too much parallelization can start to hurt performance -- which I think is what you're doing. (This is unlike I/O intensive tasks where you pretty much parallelize as much as possible.)
What you need to do, in my opinion, is write another method that takes a "chunk" of items (not a single item) and returns their "sum":
double SumChunk(IEnumerable<Item> items)
{
    return items.Sum(x => MyMethod(x.id)); // MyMethod takes the item's id
}
Then divide the number of items by n (n being the degree of parallelism; try n = the number of CPU cores, and compare that against 2× the core count) and pass each chunk to an async task running SumChunk. Finally, sum up the sub-results.
Also, watch whether any of the chunks completes much earlier than the others. If that's the case, your task distribution is not homogeneous; you'd need to create smaller chunks (say, chunks of 300 items) and pass those to SumChunk. A rough sketch of the chunked approach is shown below.
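As a sketch of that idea on .NET 3.5 (no TPL), using plain threads and the SumChunk method above, with ItemsList and Item taken from the question:

int n = Environment.ProcessorCount;    // degree of parallelism to experiment with
int chunkSize = (ItemsList.Count + n - 1) / n;
var threads = new List<Thread>();
var partials = new double[n];

for (int t = 0; t < n; t++)
{
    int index = t;                     // capture the loop variable for the closure
    var chunk = ItemsList.Skip(index * chunkSize).Take(chunkSize).ToList();
    var thread = new Thread(() => partials[index] = SumChunk(chunk));
    threads.Add(thread);
    thread.Start();
}

foreach (var thread in threads) thread.Join();
double grandTotal = partials.Sum();    // combine the sub-results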
Correct me if I'm wrong, but it looks to me like your threading is at the individual item level - I wonder if this may be a little too granular.
You are already doing your work in blocks of 62 items. What if you were to take those items and process all of them within a single thread? I.e., you would have something like this:
void RunMyMethods(IEnumerable<Item> items)
{
    foreach (Item item in items)
    {
        var result = MyMethod(item.id); // MyMethod takes the item's id
        // ... accumulate the result
    }
}
Keep in mind that WaitHandle objects can be slower than using Monitor objects: http://www.yoda.arachsys.com/csharp/threads/waithandles.shtml
Otherwise, the usual advice holds: profile the performance to find the true bottlenecks. In your question you state that it takes 2-3 seconds per iteration - with 20000 iterations, it would take a fair bit more than 20 minutes.
Edit:
If you are wanting to maximise your usage of CPU time, then it may be best to split your 20000 items into, say, four groups of 5000 and process each group in its own thread. I would imagine that this sort of "thick 'n chunky" concurrency would be more efficient than a very fine-grained approach.
To start with, the numbers just don't add up:
20k iterations; each iteration takes around two or three seconds, and the total is around 20 minutes
That's a ×40 "parallelism factor", which you can never achieve running on a normal machine: 20,000 iterations at 2-3 seconds each is over 11 hours of sequential work, not 20 minutes.
Second, when "optimizing" a CPU-intensive computation, there's no sense in parallelizing beyond the number of cores. Try dropping that magical 62 down to 16 and benchmark; it will actually run faster.
I ran a modified version of your code on my laptop and got some 10-20% improvement using Parallel.ForEach.
So maybe you can make it run in 17 minutes instead of 20. Does it really matter?
I have been writing "linear" WinForms for a couple of months and now I am trying to figure out threading.
This is my loop; it iterates over around 40,000 rows, and it takes around one second to perform a task on each row:
foreach (String CASE in MAIN_CASES_LIST)
{
    // bunch of code here
}
How do I
put each iteration into a separate thread
maintain no more than X threads at the same time?
If you're on .NET 4 you can utilize Parallel.ForEach
Parallel.ForEach(MAIN_CASES_LIST, CASE =>
{
    // bunch of code here
});
There's a great library called SmartThreadPool which may be useful here. It does a lot of useful stuff with threading and queueing, abstracting most of this away from you.
Not sure if it will help you, but you can queue up a crapload of work items, limit the number of threads, etc.
http://www.codeproject.com/Articles/7933/Smart-Thread-Pool
Of course, if you want to get your hands dirty with multi-threading or use Parallel, go for it; it's just a suggestion :)
To mix the answers above and add a limit on the maximum number of threads created, you can use this overloaded call. Just be sure to add "using System.Threading.Tasks;" at the top.
LinkedList<String> theList = new LinkedList<string>();

ParallelOptions parOptions = new ParallelOptions();
parOptions.MaxDegreeOfParallelism = 5; // only up to 5 threads allowed

Parallel.ForEach(theList.AsEnumerable(), parOptions, (string CASE) =>
{
    // bunch of code here
});