I am using the code below:
var processed = new List<Guid>();
Parallel.ForEach(items, item =>
{
processed.Add(SomeProcessingFunc(item));
});
Is the above code thread-safe? Is there a chance of the processed list getting corrupted? Or should I use a lock before adding?
var processed = new List<Guid>();
Parallel.ForEach(items, item =>
{
lock(items.SyncRoot)
processed.Add(SomeProcessingFunc(item));
});
Thanks.
No! It is not safe at all, because processed.Add is not thread-safe. You can do the following:
items.AsParallel().Select(item => SomeProcessingFunc(item)).ToList();
Keep in mind that Parallel.ForEach was created mostly for imperative operations on each element of a sequence. What you are doing is a map: projecting each value of a sequence. That is what Select was created for. AsParallel scales it across threads in the most efficient manner.
This code works correctly:
var processed = new List<Guid>();
Parallel.ForEach(items, item =>
{
lock(items.SyncRoot)
processed.Add(SomeProcessingFunc(item));
});
but it makes no sense in terms of multithreading. Locking at each iteration forces totally sequential execution; a bunch of threads will be waiting for a single thread.
Use:
var processed = new ConcurrentBag<Guid>();
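The loop body from the question then needs no lock, since ConcurrentBag<T>.Add is thread-safe; a minimal sketch:

Parallel.ForEach(items, item =>
{
    // Safe without a lock: ConcurrentBag<T>.Add is thread-safe.
    processed.Add(SomeProcessingFunc(item));
});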
See parallel foreach loop - odd behavior.
From Jon Skeet's Book C# in Depth:
As part of Parallel Extensions in .NET 4, there are several new collections in a new System.Collections.Concurrent namespace. These are designed to be safe in the face of concurrent operations from multiple threads, with relatively little locking.
These include:
IProducerConsumerCollection<T>
BlockingCollection<T>
ConcurrentBag<T>
ConcurrentQueue<T>
ConcurrentStack<T>
ConcurrentDictionary<TKey, TValue>
and others
As an alternative to Andrey's answer:
items.AsParallel().Select(item => SomeProcessingFunc(item)).ToList();
You could also write
items.AsParallel().ForAll(item => SomeProcessingFunc(item));
This makes the underlying query even more efficient, because no merge step is required (see MSDN).
Make sure the SomeProcessingFunc function is thread-safe.
And I think, but didn't test it, that you still need a lock if the list can be modified in another thread (adding or removing elements).
Using a ConcurrentBag of type Something:
var bag = new ConcurrentBag<Something>();
var items = GetAllItemsINeed();
Parallel.ForEach(items, i =>
{
    bag.Add(i.DoSomethingInEachI());
});
Reading is thread-safe, but adding is not. You need a reader/writer lock setup, as adding may cause the internal array to resize, which would mess up a concurrent read.
If you can guarantee the array won't resize on add, you may be safe to add while reading, but don't quote me on that.
But really, a List<T> is just a wrapper around an array.
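A minimal sketch of that reader/writer lock setup, using ReaderWriterLockSlim; the Guid element type here is just a placeholder:

var rwLock = new ReaderWriterLockSlim();
var list = new List<Guid>();

// Writers take the write lock, which excludes all readers and other writers.
rwLock.EnterWriteLock();
try { list.Add(Guid.NewGuid()); }
finally { rwLock.ExitWriteLock(); }

// Readers take the read lock, which allows other readers to run in parallel.
rwLock.EnterReadLock();
try
{
    foreach (var g in list) { /* read-only work */ }
}
finally { rwLock.ExitReadLock(); }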
My question is a little theoretical: I want to know whether List<object> is thread-safe if I use Parallel.For in this way. Please see below:
public static List<uint> AllPrimesParallelAggregated(uint from, uint to)
{
    List<uint> result = new List<uint>();
    Parallel.For((int)from, (int)to,
        () => new List<uint>(),   // Local state initializer
        (i, pls, local) =>        // Loop body
        {
            if (IsPrime((uint)i))
            {
                local.Add((uint)i);
            }
            return local;
        },
        local =>                  // Local to global state combiner
        {
            lock (result)
            {
                result.AddRange(local);
            }
        });
    return result;
}
Is the local list thread-safe? Will I have the correct data in the result list, without the data being altered by multiple threads, just as with a normal loop?
Note: I'm not worried about list order. I only care about the length of the list and the data.
Is this solution thread-safe? Technically yes, in reality no.
The very notion of a thread-safe List (as opposed to a queue or a bag) is that the list is safe with respect to order, or, more strictly, index, since there is no key other than an ascending integer. In a parallel world, it's sort of a nonsensical concept, when you think about it. That is why the System.Collections.Concurrent namespace contains a ConcurrentBag and a ConcurrentQueue but no ConcurrentList.
Since you are asking about the thread safety of a list, I am assuming the requirement for your software is to generate a list that is in ascending order. If that is the case, no, your solution will not work. While the code is technically thread-safe, the threads may finish in any order and your result variable will end up unsorted.
If you wish to use parallel computation, you must store your results in a bag, then when all threads are finished sort the bag to generate the ordered list. Otherwise you must perform the computation in series.
And since you have to use a bag anyway, you may as well use ConcurrentBag, and then you won't have to bother with the lock{} statement.
List<T> is not thread-safe, but your current algorithm works, as explained in the other answer and the comments.
This is the description of the localInit parameter of Parallel.For:
The <paramref name="localInit"/> delegate is invoked once for each thread that participates in the loop's execution and returns the initial local state for each of those threads. These initial states are passed to the first body invocations on each thread.
IMO you are adding unnecessary complexity inside your loop. I would use ConcurrentBag instead, which is thread-safe by design:
ConcurrentBag<uint> result = new ConcurrentBag<uint>();
Parallel.For((long)from, (long)to,
    (i, pls) =>
    {
        if (IsPrime((uint)i))
        {
            result.Add((uint)i); // this is thread-safe; don't worry
        }
    });
return result.OrderBy(x => x).ToList(); // order, if that matters
See ConcurrentBag here:
All public and protected members of ConcurrentBag are thread-safe and may be used concurrently from multiple threads.
A List<T> is not thread-safe. For the way you are using it, thread safety is not required though. You need thread safety when you access a resource concurrently from multiple threads. You are not doing that, since you are working on local lists.
At the end you are adding the contents of the local lists into the result variable. Since you use lock for this operation, you are thread-safe within that block.
So your solution is probably fine.
I have an application that has to create RegistrationInputs. For every Input I turn into a RegistrationInput, I save the ID in a HashSet of integers to make sure I never process the same Input more than once. I need to create the RegistrationInputs from my array of Inputs asynchronously, but if I see during the creation of one of the RegistrationInputs that any of the values isn't correct, I return null and remove the ID from the HashSet.
Is what I'm doing thread-safe? Also, is this the best way to asynchronously process data? I already tried Parallel.ForEach with an async lambda, but that returns async void, so I can't await it.
Inputs[] events = GetInputs();
List<Task<RegistrationInput>> tasks = new List<Task<RegistrationInput>>();
foreach (var ev in events)
    tasks.Add(ProcessEvent(ev));
tempInputs = await Task.WhenAll<RegistrationInput>(tasks);
Is what I'm doing thread-safe?
No, a HashSet<T> is not thread-safe. If you need to modify it from multiple threads, you'll need to use a lock:
Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
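A minimal sketch of such a lock around the HashSet; the TryReserve/Release names are hypothetical:

private readonly object _gate = new object();
private readonly HashSet<int> _seenIds = new HashSet<int>();

// Returns true only if the ID was not seen before (atomic check-and-add).
bool TryReserve(int id)
{
    lock (_gate) { return _seenIds.Add(id); }
}

// Makes the ID available again when processing fails.
void Release(int id)
{
    lock (_gate) { _seenIds.Remove(id); }
}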
The best thing you could do is make those concurrent operations completely unaware of each other, and have some higher level mechanism that makes sure no two IDs are queried twice.
Also, is this the best way to asynchronously process data?
It seems to me that you are on the right track, executing ProcessEvent concurrently for each event. The only thing I would do is perhaps rewrite the foreach loop to use Enumerable.Select, but that is a matter of taste:
Inputs[] events = GetInputs();
var tasks = events.Select(ev => ProcessEvent(ev));
tempInputs = await Task.WhenAll<RegistrationInput>(tasks);
You can use a ConcurrentDictionary<int, byte> and just use the Keys.
The usage of byte as the type of the value is to minimize the amount of memory used. If you don't have any memory considerations, you can use anything else.
It is thread-safe, and it gives you all the HashSet functionality you need.
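A minimal sketch of that idea; inputId stands in for whatever ID you are processing:

var seenIds = new ConcurrentDictionary<int, byte>();

// TryAdd is atomic: it returns false if the ID is already present.
if (seenIds.TryAdd(inputId, 0))
{
    // ... create the RegistrationInput; if any value isn't correct,
    // release the ID again:
    seenIds.TryRemove(inputId, out _);
}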
I want to know which way is faster to build a HashSet.
My process is like this:
1. DB access (single thread): get a large list of IDs.
2. Build the HashSet, using one of the following plans.
Plan A
foreach (var oneID in IDs)
{
    myHashSet.Add(oneID);
}
Plan B
Parallel.ForEach(IDs, myPallOpt, (oneID) =>
{
    myHashSet.Add(oneID);
});
So which is faster, Plan A or Plan B?
Thanks.
HashSet<T> is not thread-safe, so the second option (using Parallel.ForEach) will likely cause bugs. It should definitely be avoided.
The best option is likely to just build the hashset directly from the results:
var myHashSet = new HashSet<int>(IDs);
Note that this only works if the HashSet is only intended to contain the items from this collection. If you're adding to an existing HashSet<T>, a foreach (your first option) is likely the best option.
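As a side note, if you are pouring a whole sequence into an existing set, HashSet<T>.UnionWith expresses the same foreach in one call:

existingSet.UnionWith(IDs); // adds each ID, skipping duplicates already in the set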
Plan B won't work, because HashSet<T> is not thread-safe (most .NET collection classes are not). You could fix it by making access to it thread-safe, but that essentially means serializing access to it, which is no better than single-threaded. The only case where it would make sense is if, between the beginning of your loop body and the actual addition, you have some CPU-intensive processing that would be worth parallelizing.
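A minimal sketch of that last case, with a hypothetical CPU-heavy ExpensiveTransform standing in for the real work; only the cheap Add is serialized:

var myHashSet = new HashSet<int>();
var gate = new object();
Parallel.ForEach(IDs, myPallOpt, oneID =>
{
    int transformed = ExpensiveTransform(oneID); // heavy work runs in parallel
    lock (gate)
    {
        myHashSet.Add(transformed); // only this tiny section is serialized
    }
});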
The accepted answer to the question "Why does this Parallel.ForEach code freeze the program up?" advises substituting ConcurrentBag for the List used in a WPF application.
I'd like to understand whether a BlockingCollection can be used in this case instead?
You can indeed use a BlockingCollection, but there is absolutely no point in doing so.
First off, note that BlockingCollection is a wrapper around a collection that implements IProducerConsumerCollection<T>. Any type that implements that interface can be used as the underlying storage:
When you create a BlockingCollection<T> object, you can specify not only the bounded capacity but also the type of collection to use. For example, you could specify a ConcurrentQueue<T> object for first in, first out (FIFO) behavior, or a ConcurrentStack<T> object for last in, first out (LIFO) behavior. You can use any collection class that implements the IProducerConsumerCollection<T> interface. The default collection type for BlockingCollection<T> is ConcurrentQueue<T>.
This includes ConcurrentBag<T>, which means you can have a blocking concurrent bag. So what's the difference between a plain IProducerConsumerCollection<T> and a blocking collection? The documentation of BlockingCollection says (emphasis mine):
BlockingCollection<T> is used as a wrapper for an IProducerConsumerCollection<T> instance, allowing removal attempts from the collection to block until data is available to be removed. Similarly, a BlockingCollection<T> can be created to enforce an upper-bound on the number of data elements allowed in the IProducerConsumerCollection<T> [...]
Since in the linked question there is no need to do either of these things, using BlockingCollection simply adds a layer of functionality that goes unused.
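For illustration, backing a BlockingCollection with a ConcurrentBag is just a matter of the constructor argument:

// Same blocking/bounding semantics, but unordered bag storage underneath.
var blockingBag = new BlockingCollection<int>(new ConcurrentBag<int>());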
List<T> is a collection designed for use in single-threaded applications.
ConcurrentBag<T> is a class in the Collections.Concurrent namespace designed to simplify using collections in multi-threaded environments. If you use a concurrent collection, you will not have to lock your collection to prevent corruption by other threads. You can insert or take data from your collection with no need to write special locking code.
BlockingCollection<T> is designed to get rid of the requirement of checking whether new data is available in the shared collection between threads. If new data is inserted into the shared collection, your consumer thread will wake up immediately. So you do not have to check whether new data is available for the consumer thread at certain time intervals, typically in a while loop.
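A minimal producer/consumer sketch of that behavior; GetConsumingEnumerable blocks when the collection is empty instead of polling:

var queue = new BlockingCollection<int>();

var consumer = Task.Run(() =>
{
    // Wakes up as soon as an item arrives; ends after CompleteAdding
    // has been called and the collection is drained.
    foreach (var item in queue.GetConsumingEnumerable())
        Console.WriteLine(item);
});

for (int i = 0; i < 10; i++)
    queue.Add(i);       // producer side

queue.CompleteAdding(); // signal: no more items
consumer.Wait();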
Whenever you find the need for a thread-safe List<T>, in most cases neither the ConcurrentBag<T> nor the BlockingCollection<T> are going to be your best option. Both collections are specialized for facilitating producer-consumer scenarios, so unless you have more than one threads that are concurrently adding and removing items from the collection, you should look for other options (with the best candidate being the ConcurrentQueue<T> in most cases).
Regarding especially the ConcurrentBag<T>, it's an extremely specialized class targeting mixed producer-consumer scenarios. This means that each worker-thread is expected to be both a producer and a consumer (that adds and removes items from the same collection). It could be a good candidate for the internal storage of an ObjectPool class, but beyond that it is hard to imagine any advantageous usage scenario for this class.
People usually think that the ConcurrentBag<T> is the thread-safe equivalent of a List<T>, but it's not. The similarity of the two APIs is misleading. Calling Add on a List<T> results in an item being added at the end of the list. Calling Add on a ConcurrentBag<T> results in the item being added at a random slot inside the bag. The ConcurrentBag<T> is essentially unordered. It is not optimized for being enumerated, and does a lousy job when it is commanded to do so. It maintains internally a bunch of thread-local queues, so the order of its contents is dominated by which thread did what, not by when something happened. Before each enumeration of the ConcurrentBag<T>, all these thread-local queues are copied to an array, adding pressure to the garbage collector (source code). So, for example, the line var item = bag.First(); results in a copy of the whole collection, just to return one element.
These characteristics make the ConcurrentBag<T> a less than ideal choice for storing the results of a Parallel.For/Parallel.ForEach loop.
A better thread-safe substitute of the List<T>.Add is the ConcurrentQueue<T>.Enqueue method. "Enqueue" is a less familiar word than "Add", but it actually does what you expect it to do.
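A minimal sketch of the substitution, reusing the items/SomeProcessingFunc names from the first question in this thread:

var results = new ConcurrentQueue<Guid>();
Parallel.ForEach(items, item =>
{
    results.Enqueue(SomeProcessingFunc(item)); // thread-safe append
});
List<Guid> processed = results.ToList();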
There is nothing that a ConcurrentBag<T> can do that a ConcurrentQueue<T> can't. For example neither collection offers a way to remove a specific item from the collection. If you want a concurrent collection with a TryRemove method that has a key parameter, you could look at the ConcurrentDictionary<K,V> class.
The ConcurrentBag<T> appears frequently in the Task Parallel Library-related examples in Microsoft's documentation. Like here for example.
Whoever wrote the documentation apparently valued the tiny usability advantage of writing Add instead of Enqueue more than the behavioral/performance disadvantage of using the wrong collection. This makes some sense considering that the examples were authored at a time when the TPL was new, and the goal was fast adoption of the library by developers who were mostly unfamiliar with parallel programming. I get it, Enqueue is a scary word when you see it for the first time. Unfortunately there is now a whole generation of developers that have incorporated the ConcurrentBag<T> into their mental toolbox, although it has no business being there, considering how specialized this collection is.
In case you want to collect the results of a Parallel.ForEach loop in exactly the same order as the source elements, you can use a List<T> protected with a lock. In most cases the overhead will be negligible, especially if the work inside the loop is chunky. An example is shown below, featuring the Select LINQ operator for getting the index of each element.
var indexedSource = source.Select((item, index) => (item, index));
List<TResult> results = new();

Parallel.ForEach(indexedSource, parallelOptions, entry =>
{
    var (item, index) = entry;
    TResult result = GetResult(item);
    lock (results)
    {
        while (results.Count <= index) results.Add(default);
        results[index] = result;
    }
});
This is for the case that the source is a deferred sequence with unknown size. If you know its size beforehand, it is even simpler. Just preallocate a TResult[] array, and update it in parallel without locking:
TResult[] results = new TResult[source.Count];

Parallel.For(0, source.Count, parallelOptions, i =>
{
    results[i] = GetResult(source[i]);
});
The TPL includes memory barriers at the end of task executions, so all the values of the results array will be visible from the current thread (citation).
Yes, you could use BlockingCollection for that. finishedProxies would be defined as:
BlockingCollection<string> finishedProxies = new BlockingCollection<string>();
and to add an item, you would write:
finishedProxies.Add(checkResult);
And when it's done, you could create a list from the contents.
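A sketch of that last step, assuming no producer will add further items:

finishedProxies.CompleteAdding();                // no more items will arrive
List<string> proxies = finishedProxies.ToList(); // snapshot the contents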
My code does very simple stuff
The list already has elements. I have approximately 25,000 elements in the list (and I'm expecting to have more), and each element is small (a DateTime).
List<DateTime> newList = new List<DateTime>();
Parallel.ForEach(list, l => newList.Add(new DateTime(l.Ticks + 5000)));
i.e., based on each element, I'm creating new elements and adding them to a different list.
But this doesn't seem to be a good programming approach. I hit this exception sometimes, but not every time:
IndexOutOfRangeException : {"Index was outside the bounds of the array."}
Can we add elements to a list using Parallel.ForEach()? If yes, why do I hit the error? If no, why?
What you would really want in this situation is more like this:
newList = list.AsParallel().Select(l => new DateTime(l.Ticks + 5000)).ToList();
Although you should measure the performance to see if this situation even benefits from parallelization.
Try a thread-local variable, with a final step that merges each thread's local results into newList, as such...
Parallel.ForEach(list,
    () => new List<DateTime>(),   // initial thread-local state: one list per thread
    (l, state, localList) =>
    {
        localList.Add(new DateTime(l.Ticks + 5000));
        return localList;
    },
    finalResult =>
    {
        lock (newList)
        {
            newList.AddRange(finalResult);
        }
    });
The first param is your old list. The second param is the initializer for each thread's local state (here, an empty list per thread). In the third param block, l is the same as in your code; state is a ParallelLoopState object, which lets you exit the parallel loop early if you choose; the last is the thread-local variable itself. The finalResult param represents the end result of each thread's local variable and is called once per thread; it is there that you can lock newList and add to the shared variable. In theory this works. I have used similar coding in my own code. Hope this helps you or someone else.
This will effectively call List<T>.Add concurrently, yet according to MSDN documentation for List<T>:
"Any instance members are not guaranteed to be thread safe."
Even if it were (thread-safe), this is far too cheap to benefit from parallel execution (compared to the overhead of parallel execution). Did you actually measure your performance? 25,000 elements is not that many.
As everyone has mentioned, there seems to be no case for doing this in parallel. It will certainly be far, far slower. However, for completeness, the reason this sometimes fails is that there is no lock on the list object being written to by multiple threads. Add this:
object _locker = new object();
List<DateTime> newList = new List<DateTime>();
Parallel.ForEach(list, l =>
{
    lock (_locker)
    {
        newList.Add(new DateTime(l.Ticks + 5000));
    }
});
There simply is not enough work to do for this to warrant using Parallel.ForEach, and List<T> is not thread-safe, so you would have to lock if you wanted to add to the same list in parallel. Just use a regular for loop.
Do you really need these in a list? If all you need is to enumerate the list in a foreach, you should probably do this instead, as it will use far less memory:
IEnumerable<DateTime> newSequence = list.Select(d => new DateTime(d.Ticks + 5000));
If you really need these in a list, just add .ToList() at the end:
var newSequence = list.Select(d => new DateTime(d.Ticks + 5000)).ToList();
This will almost certainly be fast enough that you don't need to parallelize it. In fact, this is probably faster than doing it in parallel, as it will have better memory performance.