I am populating a ConcurrentDictionary in a Parallel.ForEach loop:
var result = new ConcurrentDictionary<int, ItemCollection>();
Parallel.ForEach(allRoutes, route =>
{
// Some heavy operations
lock(result)
{
if (!result.ContainsKey(someKey))
{
result[someKey] = new ItemCollection();
}
result[someKey].Add(newItem);
}
});
How do I perform the last steps in a thread-safe manner without using the lock statement?
EDIT: Assume that ItemCollection is thread-safe.
I think you want GetOrAdd, which is explicitly designed to either fetch an existing item, or add a new one if there's no entry for the given key.
var collection = result.GetOrAdd(someKey, _ => new ItemCollection());
collection.Add(newItem);
As noted in the question comments, this assumes that ItemCollection is thread-safe.
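For a self-contained illustration of the GetOrAdd pattern, here is a minimal sketch. It assumes ConcurrentQueue<int> as a stand-in for the thread-safe ItemCollection, and the names GetOrAddDemo and Group are made up for the example:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class GetOrAddDemo
{
    // Groups the numbers 0..count-1 by (n % buckets) in parallel.
    // ConcurrentQueue<int> is an assumed stand-in for a thread-safe ItemCollection.
    public static ConcurrentDictionary<int, ConcurrentQueue<int>> Group(int count, int buckets)
    {
        var result = new ConcurrentDictionary<int, ConcurrentQueue<int>>();
        Parallel.ForEach(Enumerable.Range(0, count), n =>
        {
            int someKey = n % buckets;
            // The factory may run more than once under contention, but only one
            // queue is ever stored per key, and every caller receives that one.
            var collection = result.GetOrAdd(someKey, _ => new ConcurrentQueue<int>());
            collection.Enqueue(n);
        });
        return result;
    }

    public static void Main()
    {
        var grouped = Group(1000, 4);
        Console.WriteLine(grouped.Count);                    // 4
        Console.WriteLine(grouped.Values.Sum(q => q.Count)); // 1000
    }
}
```

Because the value factory can be invoked by several threads racing on the same key, it should be cheap and free of side effects; any extra instances it creates are simply discarded.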
You need to use the GetOrAdd method.
var result = new ConcurrentDictionary<int, ItemCollection>();
int someKey = ...;
var newItem = ...;
ItemCollection collection = result.GetOrAdd(someKey, _ => new ItemCollection());
collection.Add(newItem);
Assuming ItemCollection.Add is not thread-safe, you will need a lock, but you can reduce the size of the critical region.
var collection = result.GetOrAdd(someKey, k => new ItemCollection());
lock(collection)
collection.Add(...);
Update: Since ItemCollection turns out to be thread-safe, you don't need the lock at all:
var collection = result.GetOrAdd(someKey, k => new ItemCollection());
collection.Add(...);
I am currently working with parallel tasks in C#, and I am unsure whether to use ConcurrentBag<T> or List<T>.
Here is my code:
public async Task<ConcurrentDictionary<string, ConcurrentBag<T>>> MethodA(SearchResults<Result> response)
{
var a = new ConcurrentDictionary<string, ConcurrentBag<DeviceAlertAlarm>>();
var tasks = response.GetResults().Select(async result =>
{
var b = new List<T>();
// do something
a["xxx"] = b;
});
await Task.WhenAll(tasks);
return a;
}
My question is about the line var b = new List<T>();
Is ConcurrentBag<T> mandatory in multi-threaded code, or can I use List<T> here? Which of the two is the better choice for this part of the code with respect to performance?
Because your inner list is never used concurrently you also do not need to use a ConcurrentBag<T> here.
I have commented your example a bit. From what I expect your code to do, I would store ICollection<T> or IEnumerable<T> as the value type, i.e. ConcurrentDictionary<string, IEnumerable<T>>.
// IEnumerable, List, Collection ... is enough here.
public async Task<ConcurrentDictionary<string, IEnumerable<T>>> MethodA(SearchResults<Result> response)
{
var a = new ConcurrentDictionary<string, IEnumerable<T>>();
var tasks = response.GetResults().Select(async (result) =>
{
//
// This list just exists and is accessed in this 'task/thread/delegate'
var b = new List<T>();
//
// do something ...
//
// The surrounding IDictionary as you have it here
// is used *concurrently* so this is enough to be thread-safe
a["xxx"] = b;
});
await Task.WhenAll(tasks);
return a;
}
If I have the following code:
var dictionary = new ConcurrentDictionary<int, HashSet<string>>();
foreach (var user in users)
{
if (!dictionary.ContainsKey(user.GroupId))
{
dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
dictionary[user.GroupId].Add(user.Id.ToString());
}
Is the act of adding an item into the HashSet inherently thread safe because HashSet is a value property of the concurrent dictionary?
No. Putting a container in a thread-safe container does not make the inner container thread safe.
dictionary[user.GroupId].Add(user.Id.ToString());
calls HashSet<T>.Add after retrieving the set from the ConcurrentDictionary. If the same GroupId is looked up from two threads at once, both threads mutate the same set concurrently, which would break your code with strange failure modes. I saw the result of one of my teammates making the mistake of not locking his sets, and it wasn't pretty.
This is a plausible solution. I'd do something different myself but this is closer to your code.
if (!dictionary.ContainsKey(user.GroupId))
{
dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
var groups = dictionary[user.GroupId];
lock(groups)
{
groups.Add(user.Id.ToString());
}
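As a runnable sketch of the same idea, the following combines GetOrAdd with a lock on the inner set. The tuple-based user type and the names LockedSetDemo and Build are assumptions made for the example, not part of the original code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LockedSetDemo
{
    // Builds group -> set of user ids in parallel.
    // The (GroupId, Id) tuple is a hypothetical stand-in for the user type.
    public static ConcurrentDictionary<int, HashSet<string>> Build(
        IEnumerable<(int GroupId, int Id)> users)
    {
        var dictionary = new ConcurrentDictionary<int, HashSet<string>>();
        Parallel.ForEach(users, user =>
        {
            // The dictionary lookup is thread-safe on its own.
            var group = dictionary.GetOrAdd(user.GroupId, _ => new HashSet<string>());
            // HashSet<T> is not thread-safe, so every mutation takes the set's lock.
            lock (group)
            {
                group.Add(user.Id.ToString());
            }
        });
        return dictionary;
    }

    public static void Main()
    {
        var users = Enumerable.Range(0, 10000).Select(i => (GroupId: i % 3, Id: i % 100));
        var result = Build(users);
        Console.WriteLine(result.Count);    // 3
        Console.WriteLine(result[0].Count); // 100
    }
}
```

Any code that later reads or enumerates one of these sets while writers may still be running has to take the same lock.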
No, the collection (the dictionary itself) is thread-safe, not whatever you put in it. You have a couple of options:
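Use AddOrUpdate as @TheGeneral mentioned. The update delegate has to return the set (HashSet<T>.Add returns a bool), and because HashSet<T> is not thread-safe the mutation still needs a lock:
dictionary.AddOrUpdate(user.GroupId, _ => new HashSet<string> { user.Id.ToString() }, (k, v) => { lock (v) { v.Add(user.Id.ToString()); } return v; });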
Use a concurrent collection, like the ConcurrentBag<T>:
ConcurrentDictionary<int, ConcurrentBag<string>>
Whenever you are building the Dictionary, as in your code, you should be better off accessing it as little as possible. Think of something like this:
var dictionary = new ConcurrentDictionary<int, ConcurrentBag<string>>();
var groupedUsers = users.GroupBy(u => u.GroupId);
foreach (var group in groupedUsers)
{
// get the bag from the dictionary or create it if it doesn't exist
var currentBag = dictionary.GetOrAdd(group.Key, new ConcurrentBag<string>());
// load it with the users required
foreach (var user in group)
{
if (!currentBag.Contains(user.Id.ToString()))
{
currentBag.Add(user.Id.ToString());
}
}
}
If you actually want a built-in concurrent HashSet-like collection, you'd need to use ConcurrentDictionary<int, ConcurrentDictionary<string, string>>, and care either about the key or the value from the inner one.
I have a ConcurrentDictionary whose key is a long and whose value is a HashSet of int. If the key isn't in the dictionary, I want to add a new HashSet containing the first element. If the key exists, I want to add the new element to the existing HashSet.
I am trying something like that:
ConcurrentDictionary<long, HashSet<int>> myDic = new ConcurrentDictionary<long, HashSet<int>>();
int myElement = 1;
myDic.AddOrUpdate(1, new HashSet<int>(){myFirstElement},
(key, actualValue) => actualValue.Add(myElement));
The problem with this code is the third parameter, because .Add() method returns a bool and the AddOrUpdate expects a hashset. The first and second parameters are right.
So my question is how I can add a new element to the hashset in thread-safe way and avoid duplicates (it is the reason why I am using a hashset as value). The problem of the hashset is that it is not thread-safe and if I get it first and later add the new element, I am doing outside of the dictionary and I could have problems.
Thanks.
To fix the compiler error you can do this:
myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
actualValue.Add(myFirstElement);
return actualValue;
});
BUT this is not thread-safe, because the "update" function is not run inside any lock, so you are potentially adding to a non-thread-safe HashSet from multiple threads. This might result in, for example, lost values (you add 1000 items to the HashSet but end up with only 970 in it). The update function passed to AddOrUpdate should not have any side effects, and here it does.
You can take a lock yourself while adding values to the HashSet:
myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
lock (actualValue) {
actualValue.Add(myFirstElement);
return actualValue;
}
});
But then the question is why you are using a lock-free structure (ConcurrentDictionary) in the first place. Besides that, any other code might get the HashSet from your dictionary and add a value to it without any lock, making the whole thing useless. So if you decide to go that way for some reason, you have to ensure that all code takes the lock when accessing a HashSet from that dictionary.
Instead of all that, just use a concurrent collection instead of HashSet. There is no ConcurrentHashSet as far as I know, but you can use another ConcurrentDictionary with dummy values as a replacement (or look on the internet for custom implementations).
Side note. Here
myDic.AddOrUpdate(1, new HashSet<int>(){myFirstElement},
you create a new HashSet every time you call AddOrUpdate, even if that HashSet is not needed because the key is already there. Instead, use the overload that takes an add-value factory:
myDic.AddOrUpdate(1, (key) => new HashSet<int>() { myFirstElement },
Edit: sample usage of ConcurrentDictionary as hash set:
var myDic = new ConcurrentDictionary<long, ConcurrentDictionary<int, byte>>();
long key = 1;
int element = 1;
var hashSet = myDic.AddOrUpdate(key,
_ => new ConcurrentDictionary<int, byte>(new[] {new KeyValuePair<int, byte>(element, 0)}),
(_, oldValue) => {
oldValue.TryAdd(element, 0);
return oldValue;
});
If you wrap the anonymous function definition in curly braces, you can define multiple statements in the body of the function and thus specify the return value like this:
myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
actualValue.Add(myElement);
return actualValue;
});
I'm replacing an old parallelisation helper class of mine with the TPL classes now. My old code has proven very unreliable when errors occur in the action code and it doesn't seem to be built for what I'm doing now.
The first list of jobs was easily translated to Parallel.ForEach. But here comes a nested and indexed loop that I can't resolve so easily.
int streamIndex = 0;
foreach (var playlist in selectedPlaylists)
{
var localPlaylist = playlist;
foreach (var streamFile in playlist.StreamFiles)
{
var localStreamFile = streamFile;
var localStreamIndex = streamIndex++;
// Action that uses localPlaylist, localStreamFile and localStreamIndex
...
// Save each job's result to its assigned place in the list
lock (streamsList)
{
streamsList[localStreamIndex] = ...;
}
}
}
The local variables are for proper closure support as the foreach iteration variable was shared.
I'm thinking of something like
selectedPlaylists.SelectMany(p => p.StreamFiles)
but then I'm losing the association of where each streamFile came from, and the index which should be deterministic as it's used for ordering the results in the results list. Is there a way to keep these associations with Linq and also add that counter while enumerating the list? Maybe like this (made-up pseudocode):
selectedPlaylists
.SelectMany(p => new
{
Playlist = p,
StreamFile = ~~each one of p.StreamFiles~~,
Index = ~~Counter()~~
})
I could keep those old nested foreach loops and collect all jobs in a list, then use Parallel.Invoke, but that seems more complex than it needs to be. I'd like to know if there's a simple Linq feature I don't know yet.
Well you could do something like this...
// Results keyed by stream index
Dictionary<int, object> streamsList = new Dictionary<int, object>();
// First create a composition that holds the playlist and the stream file
selectedPlaylists.SelectMany(playList => playList.StreamFiles.Select(streamFile => new { PlayList = playList, StreamFile = streamFile }))
// then add the respective index to each of these
.Select((composition, i) => new { StreamFile = composition.StreamFile, PlayList = composition.PlayList, LocalStreamIndex = i })
.AsParallel()
.WithCancellation(yourTokenGoesHere)
.WithDegreeOfParallelism(theDegreeGoesHere)
.ForAll(indexedComposition =>
{
object result = somefunc(indexedComposition.LocalStreamIndex, indexedComposition.PlayList, indexedComposition.StreamFile);
// Don't call the function inside the lock, or AsParallel is useless.
lock (streamsList)
streamsList[indexedComposition.LocalStreamIndex] = result;
});
To flatten the StreamFiles, keep the association with the PlayList, and index them, you can use this query:
int index = 0;
var query = selectedPlaylists
.SelectMany(p => p.StreamFiles
.Select(s =>
new {
PlayList = p,
Index = index++,
StreamFile = s
}));
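One way to put the flattened, indexed jobs to work is sketched below. It assumes the results can live in a pre-sized array, in which case each job writes only to its own slot and no lock is needed. The tuple-based playlist shape and the names IndexedFlattenDemo and Process are invented for the example, and the index comes from the Select overload that supplies it rather than from a captured counter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class IndexedFlattenDemo
{
    // Flattens playlists into jobs with a deterministic index, then runs them in parallel.
    // The (Name, StreamFiles) tuple is a hypothetical stand-in for the playlist type.
    public static string[] Process(IEnumerable<(string Name, string[] StreamFiles)> playlists)
    {
        var jobs = playlists
            .SelectMany(p => p.StreamFiles.Select(s => (Playlist: p.Name, StreamFile: s)))
            // This Select overload passes the position in the flattened sequence.
            .Select((job, index) => (Playlist: job.Playlist, StreamFile: job.StreamFile, Index: index))
            .ToList();

        var results = new string[jobs.Count];
        Parallel.ForEach(jobs, job =>
        {
            // Each job writes a different array element, so no lock is required here.
            results[job.Index] = $"{job.Index}:{job.Playlist}/{job.StreamFile}";
        });
        return results;
    }

    public static void Main()
    {
        var playlists = new List<(string Name, string[] StreamFiles)>
        {
            ("A", new[] { "a1", "a2" }),
            ("B", new[] { "b1" })
        };
        Console.WriteLine(string.Join(", ", Process(playlists)));
        // prints: 0:A/a1, 1:A/a2, 2:B/b1
    }
}
```

Materializing the jobs with ToList before starting the parallel loop fixes the indices once, so they do not depend on how the work is scheduled.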
I am trying to use multithreading to process a list of results faster. I tried using Parallel.ForEach, but when the process method is run I do not receive the correct results.
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
var listOfFooLists = CreateListOfFooLists(Foos);
var bulkProcessorResults = new List<BulkProcessorResult>();
Parallel.ForEach(listOfFooLists, FooList =>
{
foreach (var Foo in FooList)
{
var processClaimResult = _processor.Process(Foo);
var bulkProcessorResult = new BulkProcessorResult()
{
ClaimStatusId = (int) processClaimResult.ClaimStatusEnum,
Property1 = Foo.Property1
};
bulkProcessorResults.Add(bulkProcessorResult);
}
});
return bulkProcessorResults;
}
If I use a normal foreach I get the correct output. If I use the above code, every result has a status of 2, when there should be three with a status of 1 and one with a status of 3.
I am really new to threading so any help would be great.
The most obvious issue is that you're working with multiple threads (okay, this is somewhat hidden by calling Parallel.ForEach, but you should be aware that it achieves parallelism by using multiple threads/tasks) but you're using a List<T>, which isn't a thread-safe collection class:
A List<T> can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization
Rather than implementing your own synchronization, though, and whilst not altering much else in your code, I would switch to using a ConcurrentQueue<T>:
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
var listOfFooLists = CreateListOfFooLists(Foos);
var bulkProcessorResults = new ConcurrentQueue<BulkProcessorResult>();
Parallel.ForEach(listOfFooLists, FooList =>
{
foreach (var Foo in FooList)
{
var processClaimResult = _processor.Process(Foo);
var bulkProcessorResult = new BulkProcessorResult()
{
ClaimStatusId = (int) processClaimResult.ClaimStatusEnum,
Property1 = Foo.Property1
};
bulkProcessorResults.Enqueue(bulkProcessorResult);
}
});
return bulkProcessorResults;
}
How about treating the entire thing as a Parallel Linq query?
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
var listOfFooLists = CreateListOfFooLists(Foos);
return listOfFooLists.AsParallel()
.SelectMany(FooList => FooList)
.Select(Foo =>
new BulkProcessorResult {
ClaimStatusId = (int)_processor.Process(Foo).ClaimStatusEnum,
Property1 = Foo.Property1
}).ToList();
}
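Note that PLINQ does not preserve the source order by default. If the caller relies on results coming back in input order, AsOrdered can be added to the query. A minimal sketch, where the squaring function is a made-up stand-in for _processor.Process and the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedPlinqDemo
{
    // Hypothetical stand-in for _processor.Process(foo).
    private static int Process(int n) => n * n;

    // Flattens the batches and processes every item in parallel,
    // returning the results in the original input order.
    public static List<int> ProcessAll(IEnumerable<IEnumerable<int>> batches)
    {
        return batches
            .AsParallel()
            .AsOrdered() // keep source order in the output
            .SelectMany(batch => batch)
            .Select(n => Process(n))
            .ToList();
    }

    public static void Main()
    {
        var batches = new List<int[]> { new[] { 1, 2 }, new[] { 3 }, new[] { 4, 5 } };
        Console.WriteLine(string.Join(", ", ProcessAll(batches)));
        // prints: 1, 4, 9, 16, 25
    }
}
```

Ordering adds some overhead, so it is worth leaving out when the order of the results does not matter.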