Nested Parallel.ForEach loops on the same list? - C#

I need to parallelize a method that does an exhaustive pairwise comparison on elements in a list. The serial implementation is straightforward:
foreach (var element1 in list)
    foreach (var element2 in list)
        foo(element1, element2);
In this case, foo won't alter the state of element1 or element2. I know it's not safe to simply do nested Parallel.ForEach statements:
Parallel.ForEach(list, delegate(A element1)
{
    Parallel.ForEach(list, delegate(A element2)
    {
        foo(element1, element2);
    });
});
What would be the ideal way to implement this using the Task Parallel Library?

Unless you are executing the code on a machine where the number of cores is at least twice the number of items in the list, I'm not sure it is a good idea to nest Parallel.ForEach calls.
In other words, if you target a quad-core and the list has one thousand items, just parallelize the parent loop. Parallelizing both loops would not make the code faster, but rather much, much slower, since parallel tasks have a performance cost.
At each iteration, a few milliseconds will be lost by Parallel.ForEach to determine which thread must execute the next iteration. Let's say you have a set of 7 items. If you parallelize the parent loop, those milliseconds will be lost 7 times. If you parallelize both loops, they will be lost 7×7 = 49 times instead. The larger the set, the greater the overhead.

Couldn't you just have one Parallel and one normal loop? So either
Parallel.ForEach(list, delegate(A element1)
{
    foreach (A element2 in list)
        foo(element1, element2);
});
or
foreach (A element1 in list)
{
    Parallel.ForEach(list, delegate(A element2)
    {
        foo(element1, element2);
    });
}
Either should speed it up. There was never going to be a thread per iteration anyway, so this would probably be just as fast as, or only slightly slower than, nested parallel loops.

The two nested loops essentially mean that you want to apply foo to the Cartesian product of the list with itself. You can parallelize the entire operation by first creating all pairs in a temporary list, then iterating over that list with Parallel.ForEach.
EDIT: Instead of creating a list of all combinations, you can use an iterator to return a 2-element tuple with the combination. Parallel.ForEach will still parallelize the processing of the tuples.
The following sample prints out the current iteration step to show that results come back out-of-order, as would be expected during parallel processing:
const int SIZE = 10;

static void Main(string[] args)
{
    List<int> list = new List<int>(SIZE);
    for (int i = 0; i < SIZE; i++)
    {
        list.Add(i);
    }
    Parallel.ForEach(GetCombinations(list), (t, state, l) =>
        Console.WriteLine("{0},{1},{2}", l, t.Item1, t.Item2));
}

static IEnumerable<Tuple<int, int>> GetCombinations(List<int> list)
{
    for (int i = 0; i < list.Count; i++)
        for (int j = 0; j < list.Count; j++)
            yield return Tuple.Create(list[i], list[j]);
}


Parallel.ForEach loop is not working "it skips some and double do others"

I have two methods that can do the work for me; one is serial and the other is parallel.
The reason for parallelizing is that there are lots of iterations (about 100,000 or so).
For some reason, the parallel one skips some iterations and does others twice, and I don't have any clue how to debug it.
The serial method
for (int i = somenum; i >= 0; i--)
{
    foreach (var nue in nuelist)
    {
        foreach (var path in nue.pathlist)
        {
            foreach (var conn in nue.connlist)
            {
                Func(conn, path);
            }
        }
    }
}
The parallel method
for (int i = somenum; i >= 0; i--)
{
    Parallel.ForEach(nuelist, nue =>
    {
        Parallel.ForEach(nue.pathlist, path =>
        {
            Parallel.ForEach(nue.connlist, conn =>
            {
                Func(conn, path);
            });
        });
    });
}
Inside the Path class:
Nue firstnue;

public void Func(Conn conn, Path path)
{
    List<Conn> list = new() { conn };
    list.AddRange(path.list);
    _ = new Path(list);
}

public Path(List<Conn> list)
{
    // other things
    firstnue.pathlist.Add(this);
    /*
    firstnue is another nue that will be
    in the next iteration of the for loop
    */
}
They are both the same method except, of course, for the foreach versus Parallel.ForEach loops.
The full code is here (GitHub page).
List<T>, which I assume you use for firstnue.pathlist, isn't thread-safe. That means that when you add/remove items in the same List<T> from multiple threads at the same time, your data will get corrupted. The simplest way to avoid that problem is to use a lock, so multiple threads don't try to modify the list at once.
However, a lock essentially serializes the list operations, and if the only thing you do in Func is change a list, you may not gain much by parallelizing the code. But if you still want to give it a try, you just need to change this:
firstnue.pathlist.Add(this);
to this:
lock (firstnue.pathlist)
{
    firstnue.pathlist.Add(this);
}
Thanks to sedat-kapanoglu, I found that the problem really is thread safety. The solution was to change every List<T> to ConcurrentBag<T>.
For everyone like me: the solution to "parallel not working with collections" is to switch from System.Collections.Generic to System.Collections.Concurrent.
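A minimal sketch of that fix, with stand-in Conn/Path types since the real ones live in the linked repo:
using System.Collections.Concurrent;

class Conn { /* ... */ }
class Path { /* ... */ }

class Nue
{
    // ConcurrentBag<T> tolerates concurrent Add calls from multiple threads;
    // List<T>.Add does not, and silently corrupts its internal state under races.
    public ConcurrentBag<Path> pathlist = new();
    public ConcurrentBag<Conn> connlist = new();
}
Note that ConcurrentBag<T> makes no ordering guarantees, so this swap is only safe if nothing downstream depends on insertion order.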

Iterate through a sequence and then call Count(), or create a List at the start and then call Count

The language I use is C#.
Suppose we want to iterate through the elements of a sequence called customers, which is a sequence of objects of a fictional type called Customer. In code, suppose we have the following:
IEnumerable<Customer> customers = module.GetCustomers();
where module is a service-layer class through whose methods we can retrieve all the customers. That being said, the iteration through the elements of customers would be:
foreach(var customer in customers)
{
}
Suppose now that, after having iterated through the elements of customers, we want to get the number of customers. That could be done as below:
int numberOfCustomers = customers.Count();
My concern/question now is the following:
Using the Count() method, we iterate again through the elements of customers. However, we could instead have created an in-memory collection of these objects by calling, for instance, the ToList() method:
List<Customer> customers = module.GetCustomers()
.ToList();
we would have the number of customers in O(1), using the Count property of the list customers.
In order to find out which of these two options is the better one, I wrote a simple console app and used the Stopwatch class to profile them. However, I didn't get a clear result.
Which of these two options is the best one?
UPDATE
I ran the following console app:
class Program
{
    static void Main(string[] args)
    {
        IEnumerable<int> numbers = Enumerable.Range(0, 1000);

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        foreach (var number in numbers)
            Console.WriteLine(number);
        Console.WriteLine(numbers.Count());
        stopwatch.Stop();
        // I got 175ms
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
        Console.ReadKey();

        stopwatch.Restart();
        List<int> numbers2 = numbers.ToList();
        foreach (var number in numbers2)
            Console.WriteLine(number);
        Console.WriteLine(numbers2.Count);
        stopwatch.Stop();
        // I got 86ms
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
        Console.ReadKey();
    }
}
Then I ran this:
class Program
{
    static void Main(string[] args)
    {
        IEnumerable<int> numbers = Enumerable.Range(0, 1000);

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        List<int> numbers2 = numbers.ToList();
        foreach (var number in numbers2)
            Console.WriteLine(number);
        Console.WriteLine(numbers2.Count);
        stopwatch.Stop();
        // I got 167ms
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
        Console.ReadKey();

        stopwatch.Restart();
        foreach (var number in numbers)
            Console.WriteLine(number);
        Console.WriteLine(numbers.Count());
        stopwatch.Stop();
        // I got 104ms
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
        Console.ReadKey();
    }
}
I usually prefer to make my repository methods return an IReadOnlyCollection<>, which helps callers to know that they can safely iterate it multiple times:
IReadOnlyCollection<Customer> customers = module.GetCustomers();
If I can't do that, and I know that I'm going to iterate over what I'm given multiple times, I'll typically use .ToList() to make sure I'm dealing with an in-memory collection:
var customers = module.GetCustomers().ToList();
In cases where customers was already an in-memory collection, this adds a little overhead by creating a list, but it helps to avoid the risk of creating an enormous amount of overhead by doing something like retrieving the data from the database multiple times.
Your benchmark is flawed for a few reasons, but one of the biggest reasons is that it's using Console.WriteLine(), which performs an I/O operation. That operation will take far, far longer than iterating the collections and counting the results, combined. In fact, the variance in the amount of time spent in Console.WriteLine() will outweigh the differences in the code you're testing.
But this actually illustrates my point well: I/O operations take vastly longer than CPU and memory operations, so it's often worthwhile to add .ToList(), which will probably add microseconds to the run time, in order to avoid the slightest possibility of adding I/O operations, which can add milliseconds.
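As a rough sketch of a fairer measurement (my own example, not the original poster's code): keep all I/O out of the timed region, and do some stand-in per-item work instead of printing.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        IEnumerable<int> numbers = Enumerable.Range(0, 1000);

        // Time only the iteration and the count; no I/O inside the measured region.
        var stopwatch = Stopwatch.StartNew();
        long sum = 0;
        foreach (var number in numbers)
            sum += number;           // stand-in for real per-item work
        int count = numbers.Count(); // re-enumerates the lazy sequence
        stopwatch.Stop();

        // Report outside the timed region.
        Console.WriteLine("sum={0}, count={1}, elapsed={2} ticks",
            sum, count, stopwatch.ElapsedTicks);
    }
}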

What is the most efficient loop in C#

There are a number of different ways to accomplish the same simple loop through the items of a collection in C#.
This has made me wonder whether there is any reason, be it performance or ease of use, to use one over the others, or whether it just comes down to personal preference.
Take a simple object:
var myList = new List<MyObject>();
Let's assume the object is filled and we want to iterate over the items.
Method 1.
foreach (var item in myList)
{
    // Do stuff
}
Method 2
myList.ForEach(ml =>
{
    // Do stuff
});
Method 3
var enumerator = myList.GetEnumerator();
while (enumerator.MoveNext())
{
    // Do stuff with enumerator.Current
}
Method 4
for (int i = 0; i < myList.Count; i++)
{
    // Do stuff
}
What I was wondering is: does each of these compile down to the same thing? Is there a clear performance advantage to using one over the others?
Or is this just down to personal preference when coding?
Have I missed any?
The answer, the majority of the time, is that it does not matter. The number of items in the loop (even what one might consider a "large" number of items, say in the thousands) isn't going to have an impact on the code.
Of course, if you identify this as a bottleneck in your situation, by all means, address it, but you have to identify the bottleneck first.
That said, there are a number of things to take into consideration with each approach, which I'll outline here.
Let's define a few things first:
All of the tests were run on .NET 4.0 on a 32-bit processor.
TimeSpan.TicksPerSecond on my machine = 10,000,000
All tests were performed in separate unit test sessions, not in the same one (so as not to possibly interfere with garbage collections, etc.)
Here are some helpers that are needed for each test.
The MyObject class:
public class MyObject
{
    public int IntValue { get; set; }
    public double DoubleValue { get; set; }
}
A method to create a List<T> of any length of MyObject instances:
public static List<MyObject> CreateList(int items)
{
    // Validate parameters.
    if (items < 0)
        throw new ArgumentOutOfRangeException("items", items,
            "The items parameter must be a non-negative value.");

    // Return the items in a list.
    return Enumerable.Range(0, items).
        Select(i => new MyObject { IntValue = i, DoubleValue = i }).
        ToList();
}
An action to perform for each item in the list (needed because Method 2 uses a delegate, and a call needs to be made to something to measure impact):
public static void MyObjectAction(MyObject obj, TextWriter writer)
{
    // Validate parameters.
    Debug.Assert(obj != null);
    Debug.Assert(writer != null);

    // Write.
    writer.WriteLine("MyObject.IntValue: {0}, MyObject.DoubleValue: {1}",
        obj.IntValue, obj.DoubleValue);
}
A method to create a TextWriter which writes to a null Stream (basically a data sink):
public static TextWriter CreateNullTextWriter()
{
    // Create a stream writer off a null stream.
    return new StreamWriter(Stream.Null);
}
And let's fix the number of items at one million (1,000,000), which should be sufficiently high to show that, in general, these all have about the same performance impact:
// The number of items to test.
public const int ItemsToTest = 1000000;
Let's get into the methods:
Method 1: foreach
The following code:
foreach (var item in myList)
{
    // Do stuff
}
Compiles down into the following:
using (var enumerator = myList.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        var item = enumerator.Current;
        // Do stuff.
    }
}
There's quite a bit going on there. You have the method calls (and it may or may not be against the IEnumerator<T> or IEnumerator interfaces, as the compiler respects duck-typing in this case) and your // Do stuff is hoisted into that while structure.
Here's the test to measure the performance:
[TestMethod]
public void TestForEachKeyword()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        foreach (var item in list)
        {
            // Write the values.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Foreach loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
Foreach loop ticks: 3210872841
Method 2: .ForEach method on List<T>
The code for the .ForEach method on List<T> looks something like this:
public void ForEach(Action<T> action)
{
    // Error handling omitted.

    // Cycle through the items, perform action.
    for (int index = 0; index < Count; ++index)
    {
        // Perform action.
        action(this[index]);
    }
}
Note that this is functionally equivalent to Method 4, with one exception: the code that is hoisted into the for loop is passed as a delegate. This requires a dereference to get to the code that needs to be executed. While the performance of delegates has improved from .NET 3.0 onwards, that overhead is there.
However, it's negligible. The test to measure the performance:
[TestMethod]
public void TestForEachMethod()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        list.ForEach(i => MyObjectAction(i, writer));

        // Write out the number of ticks.
        Debug.WriteLine("ForEach method ticks: {0}", s.ElapsedTicks);
    }
}
The output:
ForEach method ticks: 3135132204
That's actually ~7.5 seconds faster than using the foreach loop. Not completely surprising, given that it uses direct array access instead of using IEnumerable<T>.
Remember though, this translates to 0.0000075740637 seconds per item being saved. That's not worth it for small lists of items.
Method 3: while (enumerator.MoveNext())
As shown in Method 1, this is exactly what the compiler does (with the addition of the using statement, which is good practice). You're not gaining anything here by hand-writing the code that the compiler would otherwise generate.
For kicks, let's do it anyway:
[TestMethod]
public void TestEnumerator()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    // Get the enumerator.
    using (IEnumerator<MyObject> enumerator = list.GetEnumerator())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle through the items.
        while (enumerator.MoveNext())
        {
            // Write.
            MyObjectAction(enumerator.Current, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Enumerator loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
Enumerator loop ticks: 3241289895
Method 4: for
In this particular case, you're going to gain some speed, as the list indexer is going directly to the underlying array to perform the lookup (that's an implementation detail, BTW, there's nothing to say that it can't be a tree structure backing the List<T> up).
[TestMethod]
public void TestListIndexer()
{
    // Create the list.
    List<MyObject> list = CreateList(ItemsToTest);

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle by index.
        for (int i = 0; i < list.Count; ++i)
        {
            // Get the item.
            MyObject item = list[i];

            // Perform the action.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("List indexer loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
List indexer loop ticks: 3039649305
However, the place where this can make a difference is arrays. Arrays can be unwound by the compiler to process multiple items at a time.
Instead of doing ten iterations of one item in a ten-item loop, the compiler can unwind this into five iterations of two items in a ten-item loop.
However, I'm not positive that this is actually happening (I'd have to look at the IL and the output of the compiled IL).
Here's the test:
[TestMethod]
public void TestArray()
{
    // Create the array.
    MyObject[] array = CreateList(ItemsToTest).ToArray();

    // Create the writer.
    using (TextWriter writer = CreateNullTextWriter())
    {
        // Create the stopwatch.
        Stopwatch s = Stopwatch.StartNew();

        // Cycle by index.
        for (int i = 0; i < array.Length; ++i)
        {
            // Get the item.
            MyObject item = array[i];

            // Perform the action.
            MyObjectAction(item, writer);
        }

        // Write out the number of ticks.
        Debug.WriteLine("Array loop ticks: {0}", s.ElapsedTicks);
    }
}
The output:
Array loop ticks: 3102911316
It should be noted that, out of the box, ReSharper offers a refactoring suggestion to change the above for statements to foreach statements. That's not to say this is right, but the basis is to reduce the amount of technical debt in code.
TL;DR
You really shouldn't be concerned with the performance of these things, unless testing in your situation shows that you have a real bottleneck (and you'll have to have massive numbers of items to have an impact).
Generally, you should go for what's most maintainable, in which case, Method 1 (foreach) is the way to go.
Regarding the final bit of the question, "Have I missed any?": yes, and I feel I would be remiss not to mention this even though the question is quite old. While those four ways of doing it execute in relatively the same amount of time, there is a way not shown above that runs faster than all of them, quite significantly so as the number of items in the iterated list increases. It is exactly the same as the last method, but instead of reading .Count in the loop's condition check, you assign that value to a variable before setting up the loop and use it instead. That leaves you with something like this:
var countVar = list.Count;
for (int i = 0; i < countVar; i++)
{
    // loop logic
}
By doing it this way, you're only reading a local variable at each iteration, rather than resolving the Count or Length property on every pass, which is considerably less efficient.
I would suggest an even better and lesser-known approach for faster iteration over a list. I recommend you first read about Span<T>. Note that you can use this if you are using .NET 5 or later.
List<MyObject> list = new();

foreach (MyObject item in CollectionsMarshal.AsSpan(list))
{
    // Do something
}
Be aware of the caveats:
The CollectionsMarshal.AsSpan method is unsafe and should be used only if you know what you're doing. CollectionsMarshal.AsSpan returns a Span<T> over the private array of the List<T>. Iterating over a Span<T> is fast, as the JIT uses the same tricks as for optimizing arrays. When you use this method, nothing checks that the list is not modified during the enumeration.
This is a more detailed explanation of what it does behind the scenes and more, super interesting!
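To make that caveat concrete, here is a small sketch of my own (not from the original answer) showing what can go wrong when the list reallocates while a span is live:
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var list = new List<int>(3) { 1, 2, 3 }; // capacity exactly 3
Span<int> span = CollectionsMarshal.AsSpan(list);

list.Add(4);  // exceeds capacity, so the list reallocates its backing array
span[0] = 42; // writes to the old, orphaned array

Console.WriteLine(list[0]); // prints 1, not 42: the span and the list have diverged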

Fastest and still safe way to exchange reference variables values

Basically, what I need is to be able to add items to a List (or another collection) constantly, around 3000 times per second, in one thread, and to get and remove all items from that list once every 2 seconds.
I don't like the classic ways to do this, like using concurrent collections or locking on something every time I need to access the collection, because they would be slower than I need.
What I'm trying to do is have two collections, one for each thread, and find a way to make a thread-safe switch from one collection to the other.
Simplified and not thread-safe example:
var listA = new List<int>();
var listB = new List<int>();

// method is called externally 3000 times per second
void ProducerThread(int a)
{
    listA.Add(a);
}

void ConsumerThread()
{
    while (true)
    {
        Thread.Sleep(2000);
        listB = Interlocked.Exchange(ref listA, listB);
        // ... processing listB data
        // at this point, when I'm done reading the data,
        // the producer may still add an item, because listA.Add is not atomic
        // (correct me if I'm wrong)
        listB.Clear();
    }
}
Is there any way to make the above code work as intended (i.e., be thread-safe) while blocking the producer thread as little as possible? Or is there another solution?
I would start out by using a BlockingCollection or another IProducerConsumerCollection<T> from System.Collections.Concurrent. That is exactly what you have: a producer/consumer queue that is accessed from multiple threads. Those collections are also heavily optimized for performance. They don't use a naive "lock the whole structure any time anyone does any operation" approach. They are smart enough to avoid locking wherever possible using lock-free synchronization techniques, and when they do need critical sections, they minimize what needs to be locked, so that the structure can often be accessed concurrently despite a certain amount of locking.
Before I move from there to anything else, I would use one of those collections and verify that it really is too slow. If, after using that as your solution, you have demonstrated that you are spending an unacceptable amount of time adding/removing items from the collection, then you could consider investigating other solutions.
If, as I suspect will be the case, they perform quickly enough, I'm sure you'll find that it makes the code much easier to write and clearer to read.
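A rough sketch of that approach (the 2-second drain cadence and int payload come from the question; the class and method names are my own):
using System;
using System.Collections.Concurrent;
using System.Threading;

class BatchBuffer
{
    private readonly BlockingCollection<int> queue = new BlockingCollection<int>();

    // Called ~3000 times per second from the producer thread.
    public void Produce(int a) => queue.Add(a);

    // Runs on the consumer thread.
    public void ConsumeLoop()
    {
        while (true)
        {
            Thread.Sleep(2000);
            // Drain everything that accumulated since the last pass;
            // TryTake returns false (without blocking) once the queue is momentarily empty.
            while (queue.TryTake(out int item))
            {
                // ... process item ...
            }
        }
    }
}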
I am assuming that you just want to process new additions to listA, and that while you process these additions more additions are made.
var listA = new List<int>();
var dictA = new Dictionary<int, int>();
int rangeStart = 0;
int rangeEnd = 0;
bool protectRange = false;

// method is called externally 3000 times per second
void ProducerThread(int a)
{
    listA.Add(a);
    dictA.Add(rangeEnd++, a);
}

void ConsumerThread()
{
    while (true)
    {
        Thread.Sleep(2000);
        int rangeInstance = rangeEnd;
        var listB = new List<int>();
        for (int start = rangeStart; start < rangeInstance; start++)
        {
            listB.Add(dictA[start]);
            rangeStart++;
        }
        // ... processing listB data
    }
}
If the table has a fixed maximum size, why use a List? You could also pre-set the list size.
List<int> listA = new List<int>(6000);
Now, I haven't really tested the following, but I think it would do what you want:
int[] listA = new int[6000]; // 3000 times * 2 seconds
int i = 0;

// method is called externally 3000 times per second
void ProducerThread(int a)
{
    if (Monitor.TryEnter(listA)) // If true, the consumer is in cooldown.
    {
        listA[i] = a;
        i++;
        Monitor.Exit(listA);
    }
}

void ConsumerThread()
{
    Monitor.Enter(listA); // Acquire the lock.
    while (true)
    {
        Monitor.Wait(listA, 2000); // Release the lock for 2000 ms; automatically retake it after the producer releases it.
        foreach (int a in listA) { } // Processing...
        listA = new int[6000];
        i = 0;
    }
}
You just need to be sure to have ConsumerThread run first, so that it queues itself and waits.

Top 100 values in a Dictionary<string, int> - Why is LINQ so much faster than a foreach loop?

I am writing a simple app to parse a huge text file (60 GB) and store all the words and the number of times each appears in the file. For testing's sake, I cut the file down to 2 GB.
I have the words and the counts in a Dictionary, though I'm finding it hard to believe the results I'm seeing.
Total words in the dictionary: 1128495
Code I'm using:
sw.Start();
StringBuilder sb = new StringBuilder();
sb.AppendFormat("<html><head></head><body>");
lock (Container.values)
{
    int i = int.Parse(ctx.Request.QueryString["type"]);
    switch (i)
    {
        case 1: // LINQ
            var values = Container.values.OrderByDescending(a => a.Value.Count).Take(100);
            foreach (var value in values)
            {
                sb.AppendFormat("{0} - {1}<br />", value.Key, value.Value.Count);
            }
            break;
        case 2: // Foreach
            foreach (var y in Container.values)
            {
            }
            break;
        case 3: // For
            for (int x = 0; x < Container.values.Count; x++)
            {
            }
            break;
    }
}
sw.Stop();
sb.AppendFormat("<br /><br /> {0}", sw.ElapsedMilliseconds);
sb.AppendFormat("</body>");
I ran it twice; the speeds below are in milliseconds:
LINQ: #1 598, #2 609
Foreach: #1 1000, #2 1020
Why is LINQ faster than a foreach? I assume LINQ has to loop through the Dictionary itself, so how does it manage that, plus sorting it all, in such a timely manner?
Edit:
After compiling in Release mode, the results are as follows:
LINQ: 796 (slower?)
foreach: 945
The app is a simple console app; the code is executing in an HttpListener.
Edit 2:
I have managed to figure out what the issue was. When I initialized the dictionary, I set its capacity to 89,000,000 (when processing the 60 GB file, it would throw an OutOfMemoryException otherwise). For some reason this drastically slows down the foreach loop. If I set the capacity to 1,128,495, the foreach loop executes in 56 milliseconds.
Why is this happening? If I put a counter in the loop, it only runs 1,128,495 times, even with a capacity of 89,000,000.
A foreach loop is implemented by the compiler by calling GetEnumerator() and then calling MoveNext and Current repeatedly on the enumerator. LINQ's OrderByDescending normally works exactly the same way: it basically does a foreach to extract all the elements, and then it sorts them.
A quick look in ILSpy shows that OrderByDescending puts the container in an internal type called Buffer<T>, which has an optimization: in case the container implements ICollection<T>, it uses ICollection<T>.CopyTo instead of a foreach loop. Usually OrderByDescending would still not be faster than a foreach loop, because after extracting the elements it has to sort them.
Are you leaving out the code in your foreach loop, code that might explain why it's slower? If you really are using an empty foreach loop, perhaps the explanation is that the IEnumerator<T> type (or GetEnumerator method) of Container.values is slow compared to its CopyTo method.
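If you want to see the CopyTo shortcut in isolation, here is a sketch of my own (not from the original answer) that times copying a dictionary's entries both ways; the exact numbers will vary by machine:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class CopyToVersusForeach
{
    static void Main()
    {
        var dict = Enumerable.Range(0, 1_000_000).ToDictionary(i => i.ToString(), i => i);

        // Path 1: ToArray() goes through ICollection<T>.CopyTo, the same shortcut
        // that OrderByDescending's internal buffer takes for ICollection<T> sources.
        var sw = Stopwatch.StartNew();
        var viaCopyTo = dict.ToArray();
        sw.Stop();
        Console.WriteLine("CopyTo path:  {0} ticks", sw.ElapsedTicks);

        // Path 2: copy the same data via the enumerator, as a foreach loop would.
        sw.Restart();
        var viaForeach = new KeyValuePair<string, int>[dict.Count];
        int n = 0;
        foreach (var kvp in dict)
            viaForeach[n++] = kvp;
        sw.Stop();
        Console.WriteLine("foreach path: {0} ticks", sw.ElapsedTicks);
    }
}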
Your LINQ version only takes the first 100 elements!
Remove the .Take(100) in order to compare!
