Thread Index in PLINQ Query? - c#

Is there a way to pass the thread "index" of a PLINQ query into one of its operators, like a Select?
The background is that I have a PLINQ query that does some deserialization; the deserialization method uses a structure (probably an array) to pass data to another method:
ParallelEnumerable.Range(0, nrObjects)
.WithDegreeOfParallelism(8)
.Select(i => this.Deserialize(serializer, i, arrays[threadIndex]))
.ToList();
(threadIndex is the new variable I'd like)
If I knew the thread index I could create 8 of these arrays upfront (even if all might not be used) and reuse them. The Deserialization method might be called millions of times, so every small optimization counts.

Re the thread index; note that the degree of parallelism is (IIRC) only the maximum number of threads to use; it doesn't have to use that number. Relying on threadIndex in the way you describe would seem to mean that you might actually only access arrays[0]. So in short, no: I don't think so.
You can, however, get the item index; so if that is what you mean, simply:
.Select((value, itemIndex) => this.Deserialize(
    serializer, value, arrays[itemIndex])).ToList();
It sounds (comments) like the intent is for obtaining a working buffer; in which case, I would (and indeed, do) keep a few convenient buffers in a container (with synchronized access etc), i.e.
attempt to obtain an existing buffer from a container (synchronized)
if not found, create a new buffer
(do work)
return/add buffer to the container (synchronized)
This will generally be very fast (much faster than buffer-allocation per iteration)
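That container approach can be sketched with a ConcurrentBag<T> as the synchronized store (a sketch only; the buffer element type and size here are arbitrary placeholders):

```csharp
using System;
using System.Collections.Concurrent;

// Minimal buffer pool: take an existing buffer if one is available,
// allocate otherwise, and return it to the container when done.
class BufferPool
{
    private readonly ConcurrentBag<int[]> _buffers = new ConcurrentBag<int[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize) { _bufferSize = bufferSize; }

    public int[] Rent()
    {
        // Reuse a pooled buffer when the bag has one; allocate otherwise.
        return _buffers.TryTake(out int[] buffer) ? buffer : new int[_bufferSize];
    }

    public void Return(int[] buffer) => _buffers.Add(buffer);
}
```

Each parallel task would Rent() before deserializing and Return() afterwards, so the number of live buffers tracks the actual number of concurrent workers rather than a guessed thread count.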

You can create a ConcurrentBag of integers, take one from it before deserialization to use as an index, and return it afterwards. Every thread will occupy one array element at a time.
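A sketch of that idea, with a stand-in Deserialize and arbitrary array sizes: the bag is pre-filled with one index per concurrent task, so a TryTake at the start of each selector should always succeed while the degree of parallelism is respected:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    // Stand-in for the question's Deserialize(serializer, i, buffer).
    static int Deserialize(int i, int[] buffer) { buffer[0] = i; return i; }

    public static List<int> Run(int nrObjects, int dop)
    {
        // Pre-fill a bag with one index per worker slot; each task takes an
        // index, uses the matching scratch array, and puts the index back.
        var freeIndexes = new ConcurrentBag<int>(Enumerable.Range(0, dop));
        var arrays = Enumerable.Range(0, dop).Select(_ => new int[1024]).ToArray();

        return ParallelEnumerable.Range(0, nrObjects)
            .WithDegreeOfParallelism(dop)
            .Select(i =>
            {
                // At most `dop` selectors run at once, so a slot is free.
                freeIndexes.TryTake(out int idx);
                try { return Deserialize(i, arrays[idx]); }
                finally { freeIndexes.Add(idx); } // release the slot
            })
            .ToList();
    }

    static void Main() => Console.WriteLine(Run(100_000, 8).Count);
}
```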
Consider using the Parallel class; it is more appropriate when it comes to arrays. There are limitations on the indexed versions of Select if you use ParallelEnumerable (not in your case, but be cautious).

I would recommend using Parallel.ForEach which offers an overload that allows you to allocate (and cleanup) locals that will be passed into one parallel instance of the loop at a time. I've used this approach for processing of large sets of image data where I require buffers for various steps of the process.
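A minimal sketch of those overloads: localInit allocates one scratch buffer per worker, the loop body reuses it, and localFinally runs once per worker (the buffer size and the summing are placeholders for real processing):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Demo
{
    // Sum `data` while giving each worker thread a private scratch buffer.
    public static long SumWithBuffers(int[] data)
    {
        long total = 0;
        Parallel.ForEach(
            data,
            () => (buffer: new int[1024], sum: 0L),   // localInit: once per worker
            (item, state, local) =>
            {
                local.buffer[0] = item;               // use the private buffer
                return (local.buffer, local.sum + item);
            },
            local => Interlocked.Add(ref total, local.sum)); // once per worker
        return total;
    }

    static void Main()
    {
        var data = new int[100];
        for (int i = 0; i < data.Length; i++) data[i] = i + 1;
        Console.WriteLine(SumWithBuffers(data)); // 5050
    }
}
```

The key point is that the expensive allocation happens once per parallel worker, not once per item, and the only shared-state touch is the single Interlocked.Add at the end of each worker's run.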

Related

Fast way to build a hash set, single or multiple thread?

I want to know which way is faster to build a HashSet.
My process is like this:
1. DB access (single thread): get a large list of IDs.
2. Build the HashSet, either:
Plan A
foreach (var oneID in IDs)
{
    myHashSet.Add(oneID);
}
Plan B
Parallel.ForEach(IDs, myPallOpt, (oneID) =>
{
    myHashSet.Add(oneID);
});
So which is faster, Plan A or B?
Thanks
HashSet<T> is not thread safe, so the second option (using Parallel.ForEach) will likely cause bugs. It should definitely be avoided.
The best option is likely to just build the hashset directly from the results:
var myHashSet = new HashSet<int>(IDs);
Note that this only works if the HashSet is only intended to contain the items from this collection. If you're adding to an existing HashSet<T>, a foreach (your first option) is likely the best option.
Plan B likely won't work because HashSet<T> is not thread-safe (most .NET collection classes are not). You could fix it by making access to it thread-safe, but that essentially means serializing access, which is no better than single-threaded. The only case where it would make sense is if, between the beginning of your loop body and the actual addition, you have some CPU-intensive processing that would be worth parallelizing.
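If the per-item work really is CPU-intensive, one hedged option is to parallelize only the transformation and keep the HashSet itself single-threaded (Expensive here is a stand-in for the real work, not part of the original question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    // Stand-in for CPU-heavy per-item work; only this part benefits
    // from parallelism.
    static int Expensive(int id) => id * 2;

    public static HashSet<int> Build(IEnumerable<int> ids)
    {
        // Transform in parallel, then let the HashSet constructor consume
        // the results on a single thread - no locking needed.
        return new HashSet<int>(ids.AsParallel().Select(Expensive));
    }

    static void Main() =>
        Console.WriteLine(Build(Enumerable.Range(0, 1000)).Count); // 1000
}
```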

When to use BlockingCollection and when ConcurrentBag instead of List<T>?

The accepted answer to the question "Why does this Parallel.ForEach code freeze the program up?" advises to substitute the List usage by ConcurrentBag in a WPF application.
I'd like to understand whether a BlockingCollection can be used in this case instead?
You can indeed use a BlockingCollection, but there is absolutely no point in doing so.
First off, note that BlockingCollection is a wrapper around a collection that implements IProducerConsumerCollection<T>. Any type that implements that interface can be used as the underlying storage:
When you create a BlockingCollection<T> object, you can specify not
only the bounded capacity but also the type of collection to use. For
example, you could specify a ConcurrentQueue<T> object for first in,
first out (FIFO) behavior, or a ConcurrentStack<T> object for last
in, first out (LIFO) behavior. You can use any collection class that
implements the IProducerConsumerCollection<T> interface. The default
collection type for BlockingCollection<T> is ConcurrentQueue<T>.
This includes ConcurrentBag<T>, which means you can have a blocking concurrent bag. So what's the difference between a plain IProducerConsumerCollection<T> and a blocking collection? The documentation of BlockingCollection says (emphasis mine):
BlockingCollection<T> is used as a wrapper for an
IProducerConsumerCollection<T> instance, allowing removal attempts
from the collection to block until data is available to be removed.
Similarly, a BlockingCollection<T> can be created to enforce an
upper-bound on the number of data elements allowed in the
IProducerConsumerCollection<T> [...]
Since in the linked question there is no need to do either of these things, using BlockingCollection simply adds a layer of functionality that goes unused.
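For illustration, a BlockingCollection<T> can indeed wrap a ConcurrentBag<T>, and the default backing store is a ConcurrentQueue<T> (a minimal sketch; the values are arbitrary):

```csharp
using System;
using System.Collections.Concurrent;

class Demo
{
    // A "blocking bag": BlockingCollection delegates storage to whatever
    // IProducerConsumerCollection<T> you hand it.
    public static int RoundTripThroughBag(int value)
    {
        var blockingBag = new BlockingCollection<int>(new ConcurrentBag<int>());
        blockingBag.Add(value);
        return blockingBag.Take();
    }

    // With the default ConcurrentQueue backing store and a bound of 2,
    // a third Add would block until a consumer takes an item.
    public static int FirstOutOfBoundedQueue()
    {
        var bounded = new BlockingCollection<int>(boundedCapacity: 2);
        bounded.Add(1);
        bounded.Add(2);
        return bounded.Take(); // FIFO order from the queue
    }

    static void Main()
    {
        Console.WriteLine(RoundTripThroughBag(42));  // 42
        Console.WriteLine(FirstOutOfBoundedQueue()); // 1
    }
}
```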
List<T> is a collection designed for use in single-threaded applications.
ConcurrentBag<T> is a class in the System.Collections.Concurrent namespace, designed to simplify using collections in multi-threaded environments. If you use a concurrent collection, you do not have to lock it to prevent corruption by other threads; you can insert or take data without writing special locking code.
BlockingCollection<T> is designed to remove the need to poll for new data in a collection shared between threads. If new data is inserted into the shared collection, your consumer thread wakes up immediately, so you do not have to check for new data at certain time intervals, typically in a while loop.
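A minimal sketch of that wake-up behavior (the producer/consumer split and item counts are arbitrary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Demo
{
    public static long ProduceAndConsume(int n)
    {
        var queue = new BlockingCollection<int>();

        var consumer = Task.Run(() =>
        {
            long sum = 0;
            // Blocks while the queue is empty, wakes as soon as an item is
            // added, and finishes when CompleteAdding is called - no polling.
            foreach (var item in queue.GetConsumingEnumerable())
                sum += item;
            return sum;
        });

        for (int i = 1; i <= n; i++) queue.Add(i);
        queue.CompleteAdding(); // signal "no more data"
        return consumer.Result;
    }

    static void Main() => Console.WriteLine(ProduceAndConsume(100)); // 5050
}
```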
Whenever you find the need for a thread-safe List<T>, in most cases neither the ConcurrentBag<T> nor the BlockingCollection<T> are going to be your best option. Both collections are specialized for facilitating producer-consumer scenarios, so unless you have more than one threads that are concurrently adding and removing items from the collection, you should look for other options (with the best candidate being the ConcurrentQueue<T> in most cases).
Regarding especially the ConcurrentBag<T>, it's an extremely specialized class targeting mixed producer-consumer scenarios. This means that each worker-thread is expected to be both a producer and a consumer (that adds and removes items from the same collection). It could be a good candidate for the internal storage of an ObjectPool class, but beyond that it is hard to imagine any advantageous usage scenario for this class.
People usually think that the ConcurrentBag<T> is the thread-safe equivalent of a List<T>, but it's not. The similarity of the two APIs is misleading. Calling Add on a List<T> results in an item being added at the end of the list. Calling Add on a ConcurrentBag<T> instead results in the item being added at a random slot inside the bag. The ConcurrentBag<T> is essentially unordered. It is not optimized for being enumerated, and does a lousy job when it is commanded to do so. It maintains internally a bunch of thread-local queues, so the order of its contents is dominated by which thread did what, not by when something happened. Before each enumeration of the ConcurrentBag<T>, all these thread-local queues are copied to an array, adding pressure to the garbage collector (source code). So for example the line var item = bag.First(); results in a copy of the whole collection, just to return one element.
These characteristics make the ConcurrentBag<T> a less than ideal choice for storing the results of a Parallel.For/Parallel.ForEach loop.
A better thread-safe substitute of the List<T>.Add is the ConcurrentQueue<T>.Enqueue method. "Enqueue" is a less familiar word than "Add", but it actually does what you expect it to do.
There is nothing that a ConcurrentBag<T> can do that a ConcurrentQueue<T> can't. For example neither collection offers a way to remove a specific item from the collection. If you want a concurrent collection with a TryRemove method that has a key parameter, you could look at the ConcurrentDictionary<K,V> class.
The ConcurrentBag<T> appears frequently in the Task Parallel Library-related examples in Microsoft's documentation. Like here for example.
Whoever wrote the documentation, apparently they valued more the tiny usability advantage of writing Add instead of Enqueue, than the behavioral/performance disadvantage of using the wrong collection. This makes some sense considering that the examples were authored at a time when the TPL was new, and the goal was the fast adoption of the library by developers who were mostly unfamiliar with parallel programming. I get it, Enqueue is a scary word when you see it for the first time. Unfortunately now there is a whole generation of developers that have incorporated the ConcurrentBag<T> in their mental tools, although it has no business being there, considering
how specialized this collection is.
In case you want to collect the results of a Parallel.ForEach loop in exactly the same order as the source elements, you can use a List<T> protected with a lock. In most cases the overhead will be negligible, especially if the work inside the loop is chunky. An example is shown below, featuring the Select LINQ operator for getting the index of each element.
var indexedSource = source.Select((item, index) => (item, index));
List<TResult> results = new();
Parallel.ForEach(indexedSource, parallelOptions, entry =>
{
    var (item, index) = entry;
    TResult result = GetResult(item);
    lock (results)
    {
        while (results.Count <= index) results.Add(default);
        results[index] = result;
    }
});
This is for the case that the source is a deferred sequence with unknown size. If you know its size beforehand, it is even simpler. Just preallocate a TResult[] array, and update it in parallel without locking:
TResult[] results = new TResult[source.Count];
Parallel.For(0, source.Count, parallelOptions, i =>
{
    results[i] = GetResult(source[i]);
});
The TPL includes memory barriers at the end of task executions, so all the values of the results array will be visible from the current thread (citation).
Yes, you could use BlockingCollection for that. finishedProxies would be defined as:
BlockingCollection<string> finishedProxies = new BlockingCollection<string>();
and to add an item, you would write:
finishedProxies.Add(checkResult);
And when it's done, you could create a list from the contents.
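Putting those pieces together, a minimal sketch might look like this (the Parallel.For producer and the checkResult value are stand-ins for the asker's proxy-checking code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    public static List<string> CheckAll(int count)
    {
        var finishedProxies = new BlockingCollection<string>();

        Parallel.For(0, count, i =>
        {
            string checkResult = "proxy-" + i;  // placeholder for the real check
            finishedProxies.Add(checkResult);   // thread-safe add
        });

        // When it's done, create a list from the contents.
        return finishedProxies.ToList();
    }

    static void Main() => Console.WriteLine(CheckAll(50).Count); // 50
}
```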

ReaderWriterLock for array

I have an array that represents an inventory, with nearly 50 elements (items: some custom objects), and I need a ReaderWriterLock for it (okay, I think a simple lock would be enough too). It should support both reference changes and value changes.
As reading and writing to different positions of the array is thread-safe (Proof), I want to ensure that multiple read/write operations on the same array position are also thread-safe.
I surely could create 50 ReaderWriterLocks, but I don't want that ;)
Is there a way to achieve this? (I know ConcurrentList/Dictionary/etc., but I want an array...)
Thanks
If you are replacing the references in the array, then this is already safe, since reference swaps are inherently atomic. So you can use:
var val = arr[index];
// now work with val
and
var newVal = ...
arr[index] = newVal;
perfectly safely, at least in terms of avoiding torn references. So one pragmatic option is to make the object immutable, and just employ the above. If you need to change the value, take a local copy, make a new version based from that, and then swap them. If lost updates are a problem, then Interlocked.CompareExchange and a re-apply loop can be used very successfully (i.e. you keep reapplying your change until you "win"). This avoids the need for any locking.
If, however, you are mutating the individual objects, then the game changes. You could make the object internally thread-safe, but this is usually not pretty. You could have a single lock for all the objects. But if you want granular locking then you will need multiple locks.
My advice: make the object immutable and just use the atomic reference-swap trick.
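A minimal sketch of that advice, assuming a hypothetical immutable Item type with a Quantity field: the re-apply loop retries Interlocked.CompareExchange until the swap "wins":

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Demo
{
    // Immutable item stored in the array, per the "make it immutable" advice.
    sealed class Item
    {
        public readonly int Quantity;
        public Item(int quantity) { Quantity = quantity; }
    }

    static readonly Item[] inventory = { new Item(0) };

    public static void AddQuantity(int index, int delta)
    {
        while (true) // keep re-applying until we "win"
        {
            Item current = inventory[index];               // take a local copy
            Item updated = new Item(current.Quantity + delta);
            // Swap in the new version only if nobody changed the slot in
            // the meantime; otherwise loop against the fresh value.
            if (Interlocked.CompareExchange(ref inventory[index], updated, current) == current)
                return;
        }
    }

    public static int Quantity(int index) => inventory[index].Quantity;

    static void Main()
    {
        Parallel.For(0, 1000, _ => AddQuantity(0, 1));
        Console.WriteLine(Quantity(0)); // 1000 - no lost updates, no locks
    }
}
```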
First off, you may not need any locks. Reading and writing an array of a type where the CPU handles each read and write atomically is in and of itself thread-safe (but you might want to put in a memory barrier to avoid stale reads).
That said, just as x = 34 for an integer is thread-safe but x++ is not, if you have writes that depend upon the current value (and which are hence a read and a write), then that is not thread-safe.
If you do need locks, but don't want as many as 50, you could stripe. First set up your striped locks (I'll use simple locks rather than ReaderWriterSlim for smaller example code, the same principle applies):
var lockArray = new object[8];
for (var i = 0; i != lockArray.Length; ++i)
    lockArray[i] = new object();
Then when you go to use it:
lock (lockArray[idx % 8])
{
    // operate on item idx of your array here
}
It's a balance between the simplicity and size of one lock for everything, vs the memory use of one lock for each element.
The big difficulty comes in if an operation on one element depends on that of another, if you need to resize the array, or any other case where you need to have more than one lock. A lot of deadlock situations can be avoided by always acquiring the locks in the same order (so no other thread needing more than one lock will try to get one you already have while holding one you need), but you need to be very careful in these cases.
You also want to make sure that if you are dealing with say, index 3 and index 11, you avoid locking on object 3 twice (I can't think of a way this particular recursive locking would go wrong, but why not just avoid it rather than have to prove it's one of the cases where recursive locking is safe?)
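A sketch of the multi-lock case, with a hypothetical Transfer operation that touches two elements: the stripe locks are always acquired in ascending order, and the duplicate-stripe case (e.g. indices 3 and 11 with 8 stripes) is detected up front so the same lock is never taken twice:

```csharp
using System;

class Demo
{
    static readonly object[] lockArray = new object[8];
    static readonly int[] items = new int[50];

    static Demo()
    {
        for (var i = 0; i != lockArray.Length; ++i)
            lockArray[i] = new object();
    }

    // Move `amount` units from one slot to another. Ascending lock order
    // means no two threads can each hold the lock the other needs.
    public static void Transfer(int from, int to, int amount)
    {
        int a = from % lockArray.Length;
        int b = to % lockArray.Length;
        int first = Math.Min(a, b), second = Math.Max(a, b);

        lock (lockArray[first])
        {
            if (second == first)
            {
                // Both indices map to the same stripe; one lock covers both.
                items[from] -= amount;
                items[to] += amount;
            }
            else
            {
                lock (lockArray[second])
                {
                    items[from] -= amount;
                    items[to] += amount;
                }
            }
        }
    }

    public static int Item(int i) => items[i];

    static void Main() => Transfer(3, 11, 5); // 3 and 11 share stripe 3
}
```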

Thread safety of List<T> with One Writer, No Enumerators

While going through some database code looking for a bug unrelated to this question, I noticed that in some places List<T> was being used inappropriately. Specifically:
There were many threads concurrently accessing the List as readers, but using indexes into the list instead of enumerators.
There was a single writer to the list.
There was zero synchronization, readers and writers were accessing the list at the same time, but because of code structure the last element would never be accessed until the method that executed the Add() returned.
No elements were ever removed from the list.
By the C# documentation, this should not be thread safe.
Yet it has never failed. I am wondering whether, because of the specific implementation of List (I am assuming internally it's an array that re-allocates when it runs out of space), the 1-writer, 0-enumerator, n-reader, add-only scenario is accidentally thread-safe, or whether there is some unlikely scenario where this could blow up in the current .NET 4 implementation.
edit: Important detail I left out reading some of the replies. The readers treat the List and its contents as read-only.
This can and will blow. It just hasn't yet. Stale indices are usually the first thing to go. It will blow just when you don't want it to. You are probably lucky at the moment.
As you are using .Net 4.0, I'd suggest changing the list to a suitable collection from System.Collections.Concurrent which is guaranteed to be thread safe. I'd also avoid using array indices and switch to ConcurrentDictionary if you need to look up something:
http://msdn.microsoft.com/en-us/library/dd287108.aspx
Just because it has never failed or your application doesn't crash, that doesn't mean this scenario is thread-safe. For instance, suppose the writer thread updates a field within the list, let's say a long field, at the same time a reader thread is reading that field. The value returned may be a bitwise combination of the two values, the old one and the new one! That could happen because the reader thread starts reading the value from memory, but before it finishes reading, the writer thread updates it.
Edit: that is, of course, if we suppose that the reader threads just read all the data without updating anything. I am sure they don't change the values of the arrays themselves, but they could change a property or field within the value they read. For instance:
for (int index = 0; index < list.Count; index++)
{
    MyClass myClass = list[index]; // ok, we are just reading the value from the list
    myClass.SomeInteger++;         // boom: the same variable may be updated from other threads...
}
This example is not about the thread safety of the list itself, but rather about the shared variables that the list exposes.
The conclusion is that you have to use a synchronization mechanism such as lock before interacting with the list, even if it has only one writer and no items are removed. That will help you prevent tiny bugs and failure scenarios you would otherwise be responsible for.
Thread safety only matters when data is modified by more than one party at a time. The number of readers does not matter. Even when someone is writing while someone reads, the reader gets either the old data or the new; it still works. The fact that elements can only be accessed after Add() returns prevents parts of an element from being read separately. If you started using the Insert() method, readers could get the wrong data.
It follows then, that if the architecture is 32 bits, writing a field bigger than 32 bits, such as long and double, is not a thread safe operation; see the documentation for System.Double:
Assigning an instance of this type is not thread safe on all hardware platforms because the
binary representation of that instance might be too large to assign in a single atomic
operation.
If the list is fixed in size, however, this situation matters only if the List is storing value types greater than 32 bits. If the list is only holding reference types, then any thread safety issues stem from the reference types themselves, not from their storage and retrieval from the List. For instance, immutable reference types are less likely to cause thread safety issues than mutable reference types.
Moreover, you can't control the implementation details of List: that class was mainly designed for performance, and it's likely to change in the future with that aspect, rather than thread safety, in mind.
In particular, adding elements to a list or otherwise changing its size is not thread safe even if the list's elements are 32 bits long, since there is more involved in inserting, adding, or removing than just placing the element in the list. If such operations are needed after other threads have access to the list, then locking access to the list or using a concurrent list implementation is a better choice.
First off, to some of the posts and comments, since when was documentation reliable?
Second, this answer is more to the general question than the specifics of the OP.
I agree with MrFox in theory because this all boils down to two questions:
Is the List class implemented as a flat array?
If yes, then:
Can a write instruction be preempted in the middle of a write?
I believe this is not the case -- the full write will happen before anything can read that DWORD or whatever. In other words, it will never happen that I write two of the four bytes of a DWORD and then you read 1/2 of the new value and 1/2 of the old one.
So, if you're indexing an array by providing an offset to some pointer, you can read safely without thread-locking. If the List is doing more than just simple pointer math, then it is not thread safe.
If the List was not using a flat array, I think you would have seen it crash by now.
My own experience is that it is safe to read a single item from a List via index without thread-locking. This is all just IMHO though, so take it for what it's worth.
Worst case, such as if you need to iterate through the list, the best thing to do is:
lock the List
create an array the same size
use CopyTo() to copy the List to the array
unlock the List
then iterate through the array instead of the list.
in (whatever you call the .net) C++:
List<Object^>^ objects = gcnew List<Object^>();

// in some reader thread:
Monitor::Enter(objects);
array<Object^>^ objs = gcnew array<Object^>(objects->Count);
objects->CopyTo(objs);
Monitor::Exit(objects);
// use objs array
Even with the memory allocation, this will be faster than locking the List and iterating through the entire thing before unlocking it.
Just a heads up though: if you want a fast system, thread-locking is your worst enemy. Use ZeroMQ instead. I can speak from experience, message-based synch is the right way to go.

How to speed up routines making use of collections in multithreading scenario

I've an application that makes use of parallelization for processing data.
The main program is in C#, while one of the routines for analyzing data is in an external C++ dll. This library scans data and calls a callback every time a certain signal is found within the data. Data should be collected, sorted and then stored to disk.
Here is my first simple implementation of the method invoked by the callback and of the method for sorting and storing data:
// collection where found signals are saved
List<MySignal> mySignalList = new List<MySignal>();

// method invoked by the callback
private void Collect(int type, long time)
{
    lock (locker) { mySignalList.Add(new MySignal(type, time)); }
}

// store signals to disk
private void Store()
{
    // sort the signals
    mySignalList.Sort();
    // file is an object that manages writing data to a FileStream
    file.Write(mySignalList.ToArray());
}
Data is made up of a two-dimensional array (short[][] data) of size 10000 x n, with n variable. I use parallelization in this way:
Parallel.For(0, 10000, (int i) =>
{
    // wrapper for the external C++ dll
    ProcessData(data[i]);
});
Now for each of the 10000 arrays I estimate that 0 to 4 callbacks could be fired. I'm facing a bottleneck, and given that my CPU resources are not over-utilized, I suppose that the lock (together with thousands of callbacks) is the problem (am I right, or could there be something else?). I've tried the ConcurrentBag collection, but performance was still worse (in line with other users' findings).
I thought that a possible solution for lock-free code would be to have multiple collections. A strategy would then be necessary to make each thread of the parallel process work on a single collection. Collections could, for instance, be kept inside a dictionary with the thread ID as key, but I do not know of any .NET facility for this (I would have to know the thread IDs to initialize the dictionary before launching the parallelization). Is this idea feasible and, if yes, does some .NET facility exist for it? Or alternatively, any other idea to speed up the process?
[EDIT]
I've followed Reed Copsey's suggestion and used the following solution (according to the VS2010 profiler, the burden of locking and adding to the list previously took 15% of the resources, while now it takes only 1%):
// master collection where found signals are saved
List<MySignal> mySignalList = new List<MySignal>();

// thread-local storage of data (each thread works on its own List<MySignal>)
ThreadLocal<List<MySignal>> threadLocal;

// analyze data
private void AnalyzeData()
{
    using (threadLocal = new ThreadLocal<List<MySignal>>(
        () => new List<MySignal>()))
    {
        Parallel.For<int>(0, 10000,
            () => { return 0; },
            (i, loopState, localState) =>
            {
                // wrapper for the external C++ dll
                ProcessData(data[i]);
                return 0;
            },
            (localState) =>
            {
                lock (this)
                {
                    // add the thread-local list to the master collection
                    mySignalList.AddRange(threadLocal.Value);
                    threadLocal.Value.Clear();
                }
            });
    }
}

// method invoked by the callback
private void Collect(int type, long time)
{
    threadLocal.Value.Add(new MySignal(type, time));
}
You might want to look at using ThreadLocal<T> to hold your collections. This automatically allocates a separate collection per thread.
That being said, there are overloads of Parallel.For which work with local state and have a final collection pass at the end. This, potentially, would allow you to spawn your ProcessData wrapper where each loop body works on its own collection, and then recombine at the end. This would eliminate the need for locking (since each thread works on its own data set) until the recombination phase, which happens once per thread (instead of once per task, i.e. 10000 times). This could reduce the number of locks you're taking from ~25000 (0-4 x 10000) down to a few (system and algorithm dependent, but on a quad core system, probably around 10 in my experience).
For details, see my blog post on aggregating data with Parallel.For/ForEach. It demonstrates the overloads and explains how they work in more detail.
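A minimal sketch of those local-state overloads (the collected values are placeholders for real results): each worker accumulates into its own List<int>, and the shared lock is taken only once per worker in the final step, not once per iteration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    public static List<int> Collect(int n)
    {
        var master = new List<int>();
        object gate = new object();

        Parallel.For(0, n,
            () => new List<int>(),                            // localInit, once per worker
            (i, state, local) => { local.Add(i); return local; }, // lock-free body
            local => { lock (gate) master.AddRange(local); }); // localFinally, once per worker

        return master;
    }

    static void Main() => Console.WriteLine(Collect(10_000).Count); // 10000
}
```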
You don't say how much of a "bottleneck" you're encountering. But let's look at the locks.
On my machine (quad core, 2.4 GHz), a lock costs about 70 nanoseconds if it's not contended. I don't know how long it takes to add an item to a list, but I can't imagine that it takes more than a few microseconds. But let's say it takes 100 microseconds (I would be very surprised to find that it's even 10 microseconds) to add an item to the list, taking lock contention into account. So if you're adding 40,000 items to the list, that's 4,000,000 microseconds, or 4 seconds. And I would expect one core to be pegged if this were the case.
I haven't used ConcurrentBag, but I've found the performance of BlockingCollection to be very good.
I suspect, though, that your bottleneck is somewhere else. Have you done any profiling?
The basic collections in C# aren't thread safe.
The problem you're having is due to the fact that you're locking the entire collection just to call an add() method.
You could create a thread-safe collection that only locks single elements inside the collection, instead of the whole collection.
Lets look at a linked list for example.
Implement an add(item (or list)) method that does the following:
Lock collection.
A = get last item.
set last item reference to the new item (or last item in new list).
lock last item (A).
unlock collection.
add new items/list to the end of A.
unlock locked item.
This will lock the whole collection for just 3 simple tasks when adding.
Then when iterating over the list, just do a trylock() on each object. If it's locked, wait for the lock to be free (that way you're sure that the add() finished).
In C# you can use an empty lock() block on the object as a way to wait until any concurrent add() has released it.
So now you can add safely and still iterate over the list at the same time.
Similar solutions can be implemented for the other commands if needed.
Any built-in solution for a collection is going to involve some locking. There may be ways to avoid it, perhaps by segregating the actual data constructs being read/written, but you're going to have to lock SOMEWHERE.
Also, understand that Parallel.For() will use the thread pool. While simple to implement, you lose fine-grained control over creation/destruction of threads, and the thread pool involves some serious overhead when starting up a big parallel task.
From a conceptual standpoint, I would try two things in tandem to speed up this algorithm:
Create threads yourself, using the Thread class. This frees you from the scheduling slowdowns of the thread pool; a thread starts processing (or waiting for CPU time) when you tell it to start, instead of the thread pool feeding requests for threads into its internal workings at its own pace. You should be aware of the number of threads you have going at once; the rule of thumb is that the benefits of multithreading are overcome by the overhead when you have more than twice the number of active threads as "execution units" available to execute threads. However, you should be able to architect a system that takes this into account relatively simply.
Segregate the collection of results, by creating a dictionary of collections of results. Each results collection is keyed to some token carried by the thread doing the processing and passed to the callback. The dictionary can have multiple elements READ at one time without locking, and as each thread is WRITING to a different collection within the Dictionary there shouldn't be a need to lock those lists (and even if you did lock them you wouldn't be blocking other threads). The result is that the only collection that has to be locked such that it would block threads is the main dictionary, when a new collection for a new thread is added to it. That shouldn't have to happen often if you're smart about recycling tokens.
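A sketch of the segregated-collections idea, using the managed thread id as the token and a ConcurrentDictionary so a new per-thread list can be added safely on first use (the collected values are placeholders):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    // Each thread writes only to its own list, so the lists need no
    // locking; the dictionary handles its own synchronization when a
    // new thread's list is added.
    public static int ProcessAll(int n)
    {
        var perThread = new ConcurrentDictionary<int, List<int>>();

        Parallel.For(0, n, i =>
        {
            var mine = perThread.GetOrAdd(Environment.CurrentManagedThreadId,
                                          _ => new List<int>());
            mine.Add(i); // only this thread ever touches `mine`
        });

        // Combine once, after all workers are finished.
        return perThread.Values.Sum(list => list.Count);
    }

    static void Main() => Console.WriteLine(ProcessAll(10_000)); // 10000
}
```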
