I'm trying to run multiple functions that connect to a remote site (by network) and return a generic list. But I want to run them simultaneously.
For example:
public static List<SearchResult> Search(string title)
{
//Initialize a new temp list to hold all search results
List<SearchResult> results = new List<SearchResult>();
//Loop all providers simultaneously
Parallel.ForEach(Providers, currentProvider =>
{
List<SearchResult> tmpResults = currentProvider.SearchTitle((title));
//Add results from current provider
results.AddRange(tmpResults);
});
//Return all combined results
return results;
}
As I see it, multiple insertions into 'results' may happen at the same time, which may crash my application.
How can I avoid this?
You can use a concurrent collection.
The System.Collections.Concurrent namespace provides several thread-safe collection classes that should be used in place of the corresponding types in the System.Collections and System.Collections.Generic namespaces whenever multiple threads are accessing the collection concurrently.
You could for example use ConcurrentBag since you have no guarantee which order the items will be added.
Represents a thread-safe, unordered collection of objects.
//In the class scope:
Object lockMe = new Object();
//In the function
lock (lockMe)
{
results.AddRange(tmpResults);
}
Basically a lock means that only one thread can have access to that critical section at the same time.
For those who prefer code:
public static ConcurrentBag<SearchResult> Search(string title)
{
var results = new ConcurrentBag<SearchResult>();
Parallel.ForEach(Providers, currentProvider =>
{
//SearchTitle returns a list, so add each result individually
foreach (var result in currentProvider.SearchTitle(title))
{
results.Add(result);
}
});
return results;
}
The concurrent collections are new in .NET 4; they are designed to work with the new parallel functionality.
See Concurrent Collections in the .NET Framework 4:
Before .NET 4, you had to provide your own synchronization mechanisms if multiple threads might be accessing a single shared collection. You had to lock the collection ...
... the [new] classes and interfaces in System.Collections.Concurrent [added in .NET 4] provide a consistent implementation for [...] multi-threaded programming problems involving shared data across threads.
This could be expressed concisely using PLINQ's AsParallel and SelectMany:
public static List<SearchResult> Search(string title)
{
return Providers.AsParallel()
.SelectMany(p => p.SearchTitle(title))
.ToList();
}
I created a list of objects, and I want to fill the list from different tasks. It looks correct, but it doesn't work.
This is my code:
var splittedDataList = Extensions.ListExtensions.SplitList(source, 500);
// Create a list of tasks
var poolTasks = new List<Task>();
var objectList = new List<Car>();
for (int i = 0; i < splittedDataList.Count; i++)
{
var data = splittedDataList[i];
poolTasks.Add(Task.Factory.StartNew(() =>
{
// Collect list of car
objectList = CollectCarList(data);
}));
}
// Wait all tasks to finish
Task.WaitAll(poolTasks.ToArray());
public List<Car> CollectCarList(List<Car> list)
{
///
return list;
}
The code is using Tasks as if they were threads to flatten a nested list. Tasks aren't threads; they're a promise that something will produce a result in the future. In JavaScript they're actually called promises.
The question's exact code is flattening a nested list. This can easily be done with Enumerable.SelectMany(), e.g.:
var cars = source.SelectMany(data => data).ToList();
Flattening isn't an expensive operation so there shouldn't be any need for parallelism. If there are really that many items, Parallel LINQ can be used with .AsParallel(). LINQ operators after that are executed using parallel algorithms and collected at the end :
var cars = source.AsParallel()
    .SelectMany(data => data)
    .ToList();
Parallel LINQ is far more useful when it's used to parallelize the real, time-consuming processing before flattening:
var cars = source.AsParallel()
    .Select(data => DoSomethingExpensive(data))
    .SelectMany(data => data)
    .ToList();
Parallel LINQ is built for data parallelism: processing large amounts of in-memory data by partitioning the input and using worker tasks to process each partition with minimal synchronization between workers. It's definitely not meant for executing lots of asynchronous operations concurrently; there are other high-level classes for that.
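For the many-concurrent-async-operations case, Task.WhenAll is the usual high-level tool. A minimal sketch, where FetchAsync is a hypothetical stand-in for a real I/O call such as a database query:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Hypothetical async operation standing in for e.g. a database call
    static async Task<int> FetchAsync(int id)
    {
        await Task.Delay(10);   // simulate I/O latency
        return id * 2;
    }

    static async Task Main()
    {
        var ids = Enumerable.Range(1, 5);
        // Start all operations, then await them together;
        // no Parallel.ForEach and no shared mutable list is needed
        int[] results = await Task.WhenAll(ids.Select(FetchAsync));
        Console.WriteLine(string.Join(",", results)); // 2,4,6,8,10
    }
}
```

Task.WhenAll returns the results in the same order as the input tasks, so no post-hoc sorting is needed.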
First off, List<T> is not thread-safe. If you really want to fill a list from different async tasks, you would probably want to use some sort of concurrent collection.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent?view=net-6.0
The second question is: why would you want to do this? In your current example all this work is CPU-bound anyway, so creating multiple tasks does not really get you anywhere. It's not going to speed anything up; in fact quite the contrary, as the async state-machine calls add overhead to the processing.
If your input lists were coming from various other async tasks, e.g. calls to a database, then this might make more sense. In any case, based on what I see above, this would do what you're asking.
object ListLock = new object();
async void Main()
{
var splittedDataList = new List<List<int>> { Enumerable.Range(0, 500).ToList(), Enumerable.Range(0, 500).ToList() };
// Create a list of tasks
var poolTasks = new List<Task>();
var objectList = new List<int>();
for (int i = 0; i < splittedDataList.Count; i++)
{
var data = splittedDataList[i];
poolTasks.Add(Task.Factory.StartNew(() =>
{
lock (ListLock)
{
// Collect list of car
objectList.AddRange(CollectCarList(data));
}
}));
}
// Wait all tasks to finish
Task.WaitAll(poolTasks.ToArray());
objectList.Dump(); // LINQPad's Dump() extension; use Console.WriteLine outside LINQPad
}
// You can define other methods, fields, classes and namespaces here
public List<int> CollectCarList(List<int> list)
{
///
return list;
}
I changed the list to a simple List<int> as I didn't know what the definition of Car was in your application. The lock is required to overcome the thread-safety issue with List<T>; it could be removed if you used some kind of concurrent collection. I just want to reiterate that what this code is doing in its current state is pointless: you would be better off doing all of this on a single thread unless there is some actual async I/O going on somewhere else.
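As a sketch of the concurrent-collection variant mentioned above (keeping the same int stand-in for Car), ConcurrentBag removes the need for the explicit lock:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var splittedDataList = new List<List<int>>
        {
            Enumerable.Range(0, 500).ToList(),
            Enumerable.Range(0, 500).ToList()
        };
        // ConcurrentBag<T> is thread-safe, so no lock is needed around Add
        var objectList = new ConcurrentBag<int>();
        var poolTasks = splittedDataList
            .Select(data => Task.Run(() =>
            {
                foreach (var item in data)
                    objectList.Add(item);
            }))
            .ToList();
        // Wait for all tasks to finish
        Task.WaitAll(poolTasks.ToArray());
        Console.WriteLine(objectList.Count); // 1000
    }
}
```

Using Select to project each chunk into a Task also sidesteps the classic loop-variable-capture pitfall of starting tasks inside a for loop.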
In my code I'm getting a list of menus from the database and mapping them to DTO objects.
Due to the nested children, I decided to use Parallel.ForEach for mapping the entities, but I bumped into a weird issue: when the ForEach is finished, some of the records are not mapped!
The number of missing records is different each time: sometimes one, other times more!
public List<TreeStructureDto> GetParentNodes()
{
var data = new List<TreeStructureDto>();
var result = MenuDLL.Instance.GetTopParentNodes();
Parallel.ForEach(result, res =>
{
data.Add( new Mapper().Map(res));
});
return data;
}
But when debugging I see that the count of my original data is 59, while after mapping the count of my final list is 58!
My mapper class is as follows:
public TreeStructureDto Map(Menu menu)
{
return new TreeStructureDto()
{
id = menu.Id.ToString(),
children = true,
text = menu.Name,
data = new MenuDto()
{
Id = menu.Id,
Name = menu.Name,
ParentId = menu.ParentId,
Script = menu.Script,
SiblingsOrder = menu.SiblingsOrder,
systemGroups = menu.MenuSystemGroups.Select(x => Map(x)).ToList()
}
};
}
I appreciate your help in advance.
You are adding to a single list concurrently, which is not valid because List<T> is not thread-safe (most types are not thread-safe; this isn't a fault of List<T> - the fault is simply: never assume something is thread-safe unless you've checked).
If the bulk of the CPU work in that per-item callback is the new Mapper().Map(res) part, then you may be able to fix this with synchronization, i.e.
Parallel.ForEach(result, res =>
{
var item = new Mapper().Map(res);
lock (data)
{
data.Add(item);
}
});
which prevents threads fighting while adding, but still allows the Map part to run concurrently and independently. Note that the order is going to be undefined, though; you might want some kind of data.Sort(...) after the Parallel.ForEach has finished.
An alternative solution to locking inside a Parallel.ForEach would be to use PLINQ:
public List<TreeStructureDto> GetParentNodes()
{
var mapper = new Mapper();
return MenuDLL.Instance.GetTopParentNodes()
.AsParallel()
.Select(mapper.Map)
.ToList();
}
AsParallel uses multiple threads to perform the mappings, but no collection needs to be accessed via multiple threads concurrently.
As mentioned by Marc, this may or may not prove more efficient for your situation, so you should benchmark both approaches, as well as comparing to a single-threaded approach.
If I have the following code:
var dictionary = new ConcurrentDictionary<int, HashSet<string>>();
foreach (var user in users)
{
if (!dictionary.ContainsKey(user.GroupId))
{
dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
dictionary[user.GroupId].Add(user.Id.ToString());
}
Is the act of adding an item into the HashSet inherently thread safe because HashSet is a value property of the concurrent dictionary?
No. Putting a container in a thread-safe container does not make the inner container thread safe.
dictionary[user.GroupId].Add(user.Id.ToString());
is calling HashSet's add after retrieving it from the ConcurrentDictionary. If this GroupId is looked up from two threads at once this would break your code with strange failure modes. I saw the result of one of my teammates making the mistake of not locking his sets, and it wasn't pretty.
This is a plausible solution. I'd do something different myself but this is closer to your code.
if (!dictionary.ContainsKey(user.GroupId))
{
dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
var groups = dictionary[user.GroupId];
lock(groups)
{
groups.Add(user.Id.ToString());
}
No, the collection (the dictionary itself) is thread-safe, not whatever you put in it. You have a couple of options:
Use AddOrUpdate as @TheGeneral mentioned:
dictionary.AddOrUpdate(
    user.GroupId,
    new HashSet<string> { user.Id.ToString() },
    (k, v) => { v.Add(user.Id.ToString()); return v; });
Use a concurrent collection, like the ConcurrentBag<T>:
ConcurrentDictionary<int, ConcurrentBag<string>>
Whenever you are building the dictionary, as in your code, you are better off accessing it as little as possible. Think of something like this:
var dictionary = new ConcurrentDictionary<int, ConcurrentBag<string>>();
var grouppedUsers = users.GroupBy(u => u.GroupId);
foreach (var group in grouppedUsers)
{
// get the bag from the dictionary or create it if it doesn't exist
var currentBag = dictionary.GetOrAdd(group.Key, new ConcurrentBag<string>());
// load it with the users required
foreach (var user in group)
{
if (!currentBag.Contains(user.Id.ToString()))
{
currentBag.Add(user.Id.ToString());
}
}
}
If you actually want a built-in concurrent HashSet-like collection, you'd need to use ConcurrentDictionary<int, ConcurrentDictionary<string, string>>, and care either about the key or the value from the inner one.
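A minimal sketch of that nested-dictionary pattern; the AddUser helper is hypothetical, and the inner dictionary's keys act as the set while its values are ignored:

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    static void Main()
    {
        // The inner ConcurrentDictionary acts as a thread-safe set:
        // only its keys matter, the values are just placeholders
        var groups = new ConcurrentDictionary<int, ConcurrentDictionary<string, string>>();

        void AddUser(int groupId, string userId)
        {
            // GetOrAdd with a factory avoids allocating the inner
            // dictionary when the group already exists
            var set = groups.GetOrAdd(groupId,
                _ => new ConcurrentDictionary<string, string>());
            set.TryAdd(userId, userId); // a duplicate key is simply rejected
        }

        AddUser(1, "a");
        AddUser(1, "a"); // duplicate: ignored
        AddUser(1, "b");
        Console.WriteLine(groups[1].Count); // 2
    }
}
```

Both levels are safe to call from multiple threads, which is the property the HashSet-in-a-ConcurrentDictionary version lacks.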
I have a shared System.Collections.Generic.SynchronizedCollection. Our code uses the .NET 4.0 Task library to spawn threads and passes the synchronized collection to them. So far the threads have not been adding or removing items in the collection, but a new requirement means one thread has to remove items from the collection while another thread just reads it. Do I need to add a lock before removing items from the collection? If so, would the reader thread be thread-safe? Or please suggest the best way to get thread safety.
No it is not fully thread-safe. Try the following in a simple Console-Application and see how it crashes with an exception:
var collection = new SynchronizedCollection<int>();
var n = 0;
Task.Run(
() =>
{
while (true)
{
collection.Add(n++);
Thread.Sleep(5);
}
});
Task.Run(
() =>
{
while (true)
{
Console.WriteLine("Elements in collection: " + collection.Count);
var x = 0;
if (collection.Count % 100 == 0)
{
foreach (var i in collection)
{
Console.WriteLine("They are: " + i);
x++;
if (x == 100)
{
break;
}
}
}
}
});
Console.ReadKey();
Note, that if you replace the SynchronizedCollection with a ConcurrentBag, you will get thread-safety:
var collection = new ConcurrentBag<int>();
SynchronizedCollection is simply not thread-safe in this application. Use Concurrent Collections instead.
As Alexander already pointed out, SynchronizedCollection is not thread-safe for this scenario.
The SynchronizedCollection actually wraps a normal generic list and just delegates every call to the underlying list with a lock surrounding the call. This is also done in GetEnumerator. So the getting of the enumerator is synchronized but NOT the actual enumeration.
var collection = new SynchronizedCollection<string>();
collection.Add("Test1");
collection.Add("Test2");
collection.Add("Test3");
collection.Add("Test4");
var enumerator = collection.GetEnumerator();
enumerator.MoveNext();
collection.Add("Test5");
//The next call will throw a InvalidOperationException ("Collection was modified")
enumerator.MoveNext();
A foreach uses the enumerator in exactly this way, so it hits the same problem. Adding a ToArray() before enumerating will not help either, because ToArray() itself must enumerate the collection to build the array. That enumeration may be quicker than the work you do inside your foreach, so it can reduce the probability of hitting the concurrency issue, but it does not eliminate it.
As Richard pointed out: for true thread safety go for the System.Collections.Concurrent classes.
Yes, SynchronizedCollection will do the locking for you.
If you have multiple readers and just one writer, you may want to look at using a ReaderWriterLock, instead of SynchronizedCollection.
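A minimal sketch of the reader/writer approach using the newer ReaderWriterLockSlim; the wrapper type and its methods here are illustrative, not from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SafeList
{
    private readonly List<int> _items = new List<int>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(int item)
    {
        _lock.EnterWriteLock();          // exclusive: blocks all readers
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    public void Remove(int item)
    {
        _lock.EnterWriteLock();
        try { _items.Remove(item); }
        finally { _lock.ExitWriteLock(); }
    }

    public int[] Snapshot()
    {
        _lock.EnterReadLock();           // shared: many readers in parallel
        try { return _items.ToArray(); } // copy out, then enumerate freely
        finally { _lock.ExitReadLock(); }
    }
}

class Program
{
    static void Main()
    {
        var list = new SafeList();
        list.Add(1);
        list.Add(2);
        list.Remove(1);
        Console.WriteLine(list.Snapshot().Length); // 1
    }
}
```

Readers take a shared lock and copy the contents out, so the one writer never invalidates an enumeration in progress.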
Also, if you are .Net 4+ then take a look at System.Collections.Concurrent. These classes have much better performance than SynchronizedCollection.
I have a class with two properties, say
public class Book {
public string TitleSource { get; set; }
public string TitleTarget { get; set; }
}
I have an IList<Book> where the TitleTarget is null and for each item in the list, I need to copy the TitleSource property to the TitleTarget property. I could do this through a loop, sure, but it seems like there's a LINQ or nice declarative way to do this. Is there?
LINQ was designed as a way to consume things. If you look at web discussions about why there is no IEnumerable.ForEach(...) extension, you'll see that the LINQ designers purposefully avoided LINQ-to-Objects scenarios where the methods were designed to change object values.
That said, you can cheat by "selecting" values and not using the results. But that creates items which are thrown away, so a foreach loop is much more efficient.
Edit for people who really want something besides foreach
Another "cheat" that wouldn't produce a new list would be to use a method that does little work of its own, like Aggregate, All, or Any.
// Return true so All will go through the whole list.
books.All(book => { book.TitleTarget = book.TitleSource; return true; });
It's not LINQ as such, but there's:
books.Where(book => book.TitleTarget == null).ToList()
.ForEach(book => book.TitleTarget = book.TitleSource);
The main point is the ToList method call: there's no ForEach extension method (I don't think?) but there is one on List<T> directly. It wouldn't be hard to write your own ForEach extension method as well.
As to whether this would be better than a simple foreach loop, I'm not so sure. I would personally choose the foreach loop, since it makes the intention (that you want to modify the collection) a bit clearer.
@John Fisher is correct: there is no IEnumerable.ForEach.
There is however a ForEach on List<T>. So you could do the following:
List<Book> books = GetBooks();
books.ForEach(b => b.TitleTarget = b.TitleSource);
If you wanted a IEnumerable.ForEach it would be easy to create one:
public static class LinqExtensions
{
public static void ForEach<TSource>(this IEnumerable<TSource> source, Action<TSource> action)
{
foreach (var item in source)
{
action(item);
}
}
}
You can then use the following snippet to perform your action across your collection:
IList<Book> books = GetBooks();
books.ForEach(b => b.TitleTarget = b.TitleSource);
If you can use .NET 4.0, and you are using a thread-safe collection then you can use the new parallel ForEach construct:
using System.Threading.Tasks;
...
Parallel.ForEach(
books.Where(book => book.TitleTarget == null),
book => book.TitleTarget = book.TitleSource);
This will queue tasks to be run on the thread pool - one task that will execute the assignment delegate for each book in the collection.
For large data sets this may give a performance boost, but for smaller sets may actually be slower, given the overhead of managing the thread synchronization.
books.Select(b => b.TitleTarget = b.TitleSource);
Beware: because of LINQ's deferred execution, this only builds a query; the assignment lambda never runs until something enumerates it (e.g. ToList() or a foreach), so on its own this line does nothing.
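A quick demonstration of the deferred-execution pitfall, using a minimal hypothetical book list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Book
{
    public string TitleSource { get; set; }
    public string TitleTarget { get; set; }
}

class Program
{
    static void Main()
    {
        var books = new List<Book> { new Book { TitleSource = "A" } };

        // Deferred: the lambda never executes because nothing enumerates the query
        books.Select(b => b.TitleTarget = b.TitleSource);
        Console.WriteLine(books[0].TitleTarget ?? "null"); // null

        // Forcing enumeration with ToList() actually runs the assignments
        books.Select(b => b.TitleTarget = b.TitleSource).ToList();
        Console.WriteLine(books[0].TitleTarget); // A
    }
}
```

This is why the foreach-based answers above are the safer choice: the side effect is explicit rather than hidden inside a lazily-evaluated query.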