How to multithread a process method - C#

I am trying to use multithreading to process a list of results faster. I tried using Parallel.ForEach, but when the process method runs I do not receive the correct results.
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
    var listOfFooLists = CreateListOfFooLists(Foos);
    var bulkProcessorResults = new List<BulkProcessorResult>();

    Parallel.ForEach(listOfFooLists, FooList =>
    {
        foreach (var Foo in FooList)
        {
            var processClaimResult = _processor.Process(Foo);
            var bulkProcessorResult = new BulkProcessorResult()
            {
                ClaimStatusId = (int)processClaimResult.ClaimStatusEnum,
                Property1 = Foo.Property1
            };
            bulkProcessorResults.Add(bulkProcessorResult);
        }
    });

    return bulkProcessorResults;
}
If I use a normal foreach I get the correct output. With the code above, every result comes back with a status of 2, when there should be three results with a status of 1 and one with a status of 3.
I am really new to threading, so any help would be great.

The most obvious issue is that you're working with multiple threads (okay, this is somewhat hidden by calling Parallel.ForEach, but you should be aware that it achieves parallelism by using multiple threads/tasks) but you're using a List<T>, which isn't a thread-safe collection class:
A List<T> can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization
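For completeness, the "implement your own synchronization" route would look something like the following sketch: keep the List<T>, but guard every Add with a lock. This reuses the question's Foo, _processor and BulkProcessorResult types and assumes Process itself is safe to call from multiple threads.
// Sketch only: same shape as the question's code, with a lock guarding the shared list.
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
    var listOfFooLists = CreateListOfFooLists(Foos);
    var bulkProcessorResults = new List<BulkProcessorResult>();
    var syncRoot = new object();

    Parallel.ForEach(listOfFooLists, FooList =>
    {
        foreach (var Foo in FooList)
        {
            var processClaimResult = _processor.Process(Foo);
            var bulkProcessorResult = new BulkProcessorResult
            {
                ClaimStatusId = (int)processClaimResult.ClaimStatusEnum,
                Property1 = Foo.Property1
            };

            // Only one thread at a time may touch the non-thread-safe List<T>.
            lock (syncRoot)
            {
                bulkProcessorResults.Add(bulkProcessorResult);
            }
        }
    });

    return bulkProcessorResults;
}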
Rather than implementing your own synchronization, though, and whilst not altering much else in your code, I would switch to using a ConcurrentQueue<T>:
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
    var listOfFooLists = CreateListOfFooLists(Foos);
    var bulkProcessorResults = new ConcurrentQueue<BulkProcessorResult>();

    Parallel.ForEach(listOfFooLists, FooList =>
    {
        foreach (var Foo in FooList)
        {
            var processClaimResult = _processor.Process(Foo);
            var bulkProcessorResult = new BulkProcessorResult()
            {
                ClaimStatusId = (int)processClaimResult.ClaimStatusEnum,
                Property1 = Foo.Property1
            };
            bulkProcessorResults.Enqueue(bulkProcessorResult);
        }
    });

    return bulkProcessorResults;
}

How about treating the entire thing as a Parallel LINQ (PLINQ) query?
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
    var listOfFooLists = CreateListOfFooLists(Foos);
    return listOfFooLists.AsParallel()
        .SelectMany(FooList => FooList)
        .Select(Foo => new BulkProcessorResult
        {
            ClaimStatusId = (int)_processor.Process(Foo).ClaimStatusEnum,
            Property1 = Foo.Property1
        })
        .ToList();
}

Related

Usage of ConcurrentBag<T> vs List<T>

I am currently working with parallel threads in C#, and I am confused about when to use ConcurrentBag<T> versus List<T>.
Here is my code:
public async Task<ConcurrentDictionary<string, ConcurrentBag<T>>> MethodA(SearchResults<Result> response)
{
    var a = new ConcurrentDictionary<string, ConcurrentBag<DeviceAlertAlarm>>();
    var tasks = response.GetResults().Select(async (result) =>
    {
        var b = new List<T>();
        // do something
        a["xxx"] = b;
    });
    await Task.WhenAll(tasks);
    return a;
}
Regarding var b = new List<T>();: is it mandatory to use ConcurrentBag<T> in multithreading, or can I use List<T>?
Which one is better in the above code with respect to performance: ConcurrentBag<T> or List<T>?
Because your inner list is never used concurrently, you do not need a ConcurrentBag<T> here.
I have commented your example a bit. Given what I expect your code is doing, I would use ICollection<T> or IEnumerable<T> as the value type, i.e. a ConcurrentDictionary<string, IEnumerable<T>>.
// IEnumerable, List, Collection ... is enough here.
public async Task<ConcurrentDictionary<string, IEnumerable<T>>> MethodA(SearchResults<Result> response)
{
    var a = new ConcurrentDictionary<string, IEnumerable<T>>();
    var tasks = response.GetResults().Select(async (result) =>
    {
        // This list just exists and is accessed in this 'task/thread/delegate',
        // so it does not need to be a concurrent collection.
        var b = new List<T>();

        // do something ...

        // The surrounding ConcurrentDictionary, as you have it here,
        // is used *concurrently*, so this assignment is already thread-safe.
        a["xxx"] = b;
    });
    await Task.WhenAll(tasks);
    return a;
}

.NET: collecting into the same list from different tasks

I created a list of objects, and I want to fill the list from different tasks. It looks correct, but it doesn't work.
This is my code:
var splittedDataList = Extensions.ListExtensions.SplitList(source, 500);

// Create a list of tasks
var poolTasks = new List<Task>();
var objectList = new List<Car>();

for (int i = 0; i < splittedDataList.Count; i++)
{
    var data = splittedDataList[i];
    poolTasks.Add(Task.Factory.StartNew(() =>
    {
        // Collect list of car
        objectList = CollectCarList(data);
    }));
}

// Wait all tasks to finish
Task.WaitAll(poolTasks.ToArray());

public List<Car> CollectCarList(List<Car> list)
{
    ///
    return list;
}
The code is using Tasks as if they were threads to flatten a nested list. Tasks aren't threads, they're a promise that something will produce a result in the future. In JavaScript they're actually called promises.
The question's exact code is flattening a nested list. This can easily be done with Enumerable.SelectMany(), e.g.:
var cars = source.SelectMany(data => data).ToList();
Flattening isn't an expensive operation, so there shouldn't be any need for parallelism. If there really are that many items, Parallel LINQ can be used with .AsParallel(). LINQ operators after that are executed using parallel algorithms and the results are collected at the end:
var cars = source.AsParallel()
               .SelectMany(data => data)
               .ToList();
Parallel LINQ is far more useful when it's used to parallelize the real, time-consuming processing before flattening:
var cars = source.AsParallel()
               .Select(data => DoSomethingExpensive(data))
               .SelectMany(data => data)
               .ToList();
Parallel LINQ is built for data parallelism: processing large amounts of in-memory data by partitioning the input and using worker tasks to process each partition with minimal synchronization between workers. It's definitely not meant for executing lots of asynchronous operations concurrently; there are other high-level classes for that.
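To illustrate that last point with a rough sketch (LoadCarsAsync here is hypothetical, standing in for a database or HTTP call; Car and splittedDataList mirror the names in the question): if the per-chunk work were genuinely asynchronous I/O, the usual pattern is to start the operations and await them together with Task.WhenAll rather than reach for PLINQ.
// Sketch only, assuming a hypothetical async I/O call LoadCarsAsync(data).
async Task<List<Car>> LoadAllCarsAsync(List<List<Car>> splittedDataList)
{
    // Start one asynchronous operation per chunk (no extra threads needed for I/O)...
    var tasks = splittedDataList.Select(data => LoadCarsAsync(data));

    // ...await them all together...
    var results = await Task.WhenAll(tasks);

    // ...and flatten the per-chunk results into a single list.
    return results.SelectMany(cars => cars).ToList();
}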
First off, List<T> is not thread-safe. If you really want to fill a list from different tasks, you would probably want to use some sort of concurrent collection:
https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent?view=net-6.0
The second question is why you would want to do this at all. In your current example all this work is CPU-bound anyway, so creating multiple tasks does not really get you anywhere. It's not going to speed anything up; in fact it will do quite the contrary, as the extra task scheduling adds overhead to the processing.
If your input lists were coming from various other async tasks, e.g. calls to a database, then this might make more sense. In any case, based on what I see above, this would do what you're asking.
object ListLock = new object();

async void Main()
{
    var splittedDataList = new List<List<int>> { Enumerable.Range(0, 500).ToList(), Enumerable.Range(0, 500).ToList() };

    // Create a list of tasks
    var poolTasks = new List<Task>();
    var objectList = new List<int>();

    for (int i = 0; i < splittedDataList.Count; i++)
    {
        var data = splittedDataList[i];
        poolTasks.Add(Task.Factory.StartNew(() =>
        {
            lock (ListLock)
            {
                // Collect list of car
                objectList.AddRange(CollectCarList(data));
            }
        }));
    }

    // Wait all tasks to finish
    Task.WaitAll(poolTasks.ToArray());
    objectList.Dump();
}

// You can define other methods, fields, classes and namespaces here
public List<int> CollectCarList(List<int> list)
{
    ///
    return list;
}
I changed the list to a simple List<int>, as I didn't know what the definition of Car was in your application. The lock is required to overcome the thread-safety issue with List<T>. It could be removed if you used some kind of concurrent collection (see the sketch below). I just want to reiterate that what this code is doing in its current state is pointless; you would be better off doing all of this on a single thread unless there is some actual async I/O going on somewhere else.
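For reference, a rough sketch of the concurrent-collection variant mentioned above might look like this. It swaps the locked List<int> for a ConcurrentBag<int> and otherwise keeps the same LINQPad-style shape; CollectCarList is the same placeholder method as in the example above.
// Sketch only: ConcurrentBag<T> is safe for concurrent adds, so no lock is required.
async void Main()
{
    var splittedDataList = new List<List<int>> { Enumerable.Range(0, 500).ToList(), Enumerable.Range(0, 500).ToList() };

    var poolTasks = new List<Task>();
    var objectList = new ConcurrentBag<int>();

    for (int i = 0; i < splittedDataList.Count; i++)
    {
        var data = splittedDataList[i];
        poolTasks.Add(Task.Factory.StartNew(() =>
        {
            foreach (var item in CollectCarList(data))
            {
                objectList.Add(item);
            }
        }));
    }

    Task.WaitAll(poolTasks.ToArray());
    objectList.Dump();
}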

Why are some records missing when using Parallel.ForEach? [duplicate]

This question already has answers here:
multiple threads adding elements to one list. why are there always fewer items in the list than expected?
(2 answers)
Closed 2 years ago.
In my code, I'm getting a list of menus from the database and mapping them to DTO objects.
Due to the nested children, I decided to use Parallel.ForEach to map the entities, but I bumped into a weird issue: when the ForEach is finished, some of the records are not mapped!
The number of missed records is different each time, sometimes one and sometimes more!
public List<TreeStructureDto> GetParentNodes()
{
    var data = new List<TreeStructureDto>();
    var result = MenuDLL.Instance.GetTopParentNodes();

    Parallel.ForEach(result, res =>
    {
        data.Add(new Mapper().Map(res));
    });

    return data;
}
But when I'm debugging, the count of my original data is 59, while after mapping the count of my final list is 58!
My mapper class is as follows:
public TreeStructureDto Map(Menu menu)
{
    return new TreeStructureDto()
    {
        id = menu.Id.ToString(),
        children = true,
        text = menu.Name,
        data = new MenuDto()
        {
            Id = menu.Id,
            Name = menu.Name,
            ParentId = menu.ParentId,
            Script = menu.Script,
            SiblingsOrder = menu.SiblingsOrder,
            systemGroups = menu.MenuSystemGroups.Select(x => Map(x)).ToList()
        }
    };
}
I appreciate your help in advance.
You are adding to a single list concurrently, which is not valid because List<T> is not thread-safe (most types are not thread-safe; this isn't a fault of List<T> - the fault is simply: never assume something is thread-safe unless you've checked).
If the bulk of the CPU work in that per-item callback is the new Mapper().Map(res) part, then you may be able to fix this with synchronization, i.e.
Parallel.ForEach(result, res =>
{
    var item = new Mapper().Map(res);
    lock (data)
    {
        data.Add(item);
    }
});
which prevents threads fighting while adding, but still allows the Map part to run concurrently and independently. Note that the order is going to be undefined, though; you might want some kind of data.Sort(...) after the Parallel.ForEach has finished.
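For example, a minimal sketch of such a sort, assuming you want to order by the SiblingsOrder value that the mapper copies onto each MenuDto (and that it is a comparable value such as an int); substitute whatever key actually matters to you:
// After Parallel.ForEach has finished, restore a deterministic order.
data.Sort((a, b) => a.data.SiblingsOrder.CompareTo(b.data.SiblingsOrder));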
An alternative solution to locking inside a Parallel.ForEach would be to use PLINQ:
public List<TreeStructureDto> GetParentNodes()
{
    var mapper = new Mapper();
    return MenuDLL.Instance.GetTopParentNodes()
        .AsParallel()
        .Select(mapper.Map)
        .ToList();
}
AsParallel uses multiple threads to perform the mappings, but no collection needs to be accessed via multiple threads concurrently.
As mentioned by Marc, this may or may not prove more efficient for your situation, so you should benchmark both approaches, as well as comparing to a single-threaded approach.
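A very rough way to compare them, for illustration only (this is a hypothetical timing harness, not a rigorous benchmark; a proper comparison would use something like BenchmarkDotNet):
// Time one variant of GetParentNodes (lock-based, PLINQ, or single-threaded) at a time.
var sw = System.Diagnostics.Stopwatch.StartNew();
var nodes = GetParentNodes();   // the variant under test
sw.Stop();
Console.WriteLine($"{nodes.Count} nodes mapped in {sw.ElapsedMilliseconds} ms");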

Is iterating over a LINQ expression result the same as assigning it to a variable first?

This is difficult to explain in words, so I will give code examples.
Let's suppose I already have a list of clients that I want to filter.
Basically, I want to know if this:
foreach (var client in list.Where(c => c.Age > 20))
{
    // Do something
}
is the same as this:
var filteredClients = list.Where(c => c.Age > 20);
foreach (var client in filteredClients)
{
    // Do something
}
I've been told that the first approach executes the .Where() in every iteration.
I'm sorry if this is a duplicate; I couldn't find any related question.
Thanks in advance.
Yes, both those examples are functionally identical. One just stores the result from Enumerable.Where in a variable before accessing it, while the other accesses it directly.
To really see why this will not make a difference, you have to understand what a foreach loop essentially does. The code in your examples (both of them) is basically equivalent to this (I’ve assumed a known type Client here):
IEnumerable<Client> x = list.Where(c => c.Age > 20);

// foreach loop
IEnumerator<Client> enumerator = x.GetEnumerator();
while (enumerator.MoveNext())
{
    Client client = enumerator.Current;
    // Do something
}
So what actually happens here is that the IEnumerable result from the LINQ method is not consumed directly; an enumerator for it is requested first. The foreach loop then does nothing other than repeatedly ask the enumerator for the next object and process the current element in each loop body.
Looking at this, it doesn't matter whether the x in the above code is really an x (i.e. a previously stored variable), or whether it's the list.Where() call itself. Only the enumerator object—which is created just once—is used in the loop.
Now to cover that SharePoint example which Colin posted. It looks like this:
SPList activeList = SPContext.Current.List;
for (int i = 0; i < activeList.Items.Count; i++)
{
    SPListItem listItem = activeList.Items[i];
    // do stuff
}
This is a fundamentally different thing though. Since this is not using a foreach loop, we do not get that one enumerator object which we use to iterate through the list. Instead, we repeatedly access activeList.Items: once in the loop body to get an item by index, and once in the continuation condition of the for loop where we get the collection's Count property value.
Unfortunately, Microsoft does not follow its own guidelines all the time, so even if Items is a property on the SPList object, it actually is creating a new SPListItemCollection object every time. And that object is empty by default and will only lazily load the actual items when you first access an item from it. So above code will eventually create a large amount of SPListItemCollections which will each fetch the items from the database. This behavior is also mentioned in the remarks section of the property documentation.
This generally violates Microsoft’s own guidelines on choosing a property vs a method:
Do use a method, rather than a property, in the following situations.
The operation returns a different result each time it is called, even if the parameters do not change.
Note that if we used a foreach loop for that SharePoint example again, then everything would have been fine, since we would have again only requested a single SPListItemCollection and created a single enumerator for it:
foreach (SPListItem listItem in activeList.Items.Cast<SPListItem>())
{ … }
They are not quite the same:
Here is the original C# code:
static void ForWithVariable(IEnumerable<Person> clients)
{
    var adults = clients.Where(x => x.Age > 20);
    foreach (var client in adults)
    {
        Console.WriteLine(client.Age.ToString());
    }
}

static void ForWithoutVariable(IEnumerable<Person> clients)
{
    foreach (var client in clients.Where(x => x.Age > 20))
    {
        Console.WriteLine(client.Age.ToString());
    }
}
Here is the decompiled code this results in (according to ILSpy):
private static void ForWithVariable(IEnumerable<Person> clients)
{
    Func<Person, bool> arg_21_1;
    if ((arg_21_1 = Program.<>c.<>9__1_0) == null)
    {
        arg_21_1 = (Program.<>c.<>9__1_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithVariable>b__1_0));
    }
    IEnumerable<Person> enumerable = clients.Where(arg_21_1);
    foreach (Person current in enumerable)
    {
        Console.WriteLine(current.Age.ToString());
    }
}

private static void ForWithoutVariable(IEnumerable<Person> clients)
{
    Func<Person, bool> arg_22_1;
    if ((arg_22_1 = Program.<>c.<>9__2_0) == null)
    {
        arg_22_1 = (Program.<>c.<>9__2_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithoutVariable>b__2_0));
    }
    foreach (Person current in clients.Where(arg_22_1))
    {
        Console.WriteLine(current.Age.ToString());
    }
}
As you can see, there is a key difference:
IEnumerable<Person> enumerable = clients.Where(arg_21_1);
A more practical question, however, is whether the differences hurt performance. I concocted a test to measure that.
class Program
{
    public static void Main()
    {
        Measure(ForEachWithVariable);
        Measure(ForEachWithoutVariable);
        Console.ReadKey();
    }

    static void Measure(Action<List<Person>, List<Person>> action)
    {
        var clients = new[]
        {
            new Person { Age = 10 },
            new Person { Age = 20 },
            new Person { Age = 30 },
        }.ToList();
        var adultClients = new List<Person>();
        var sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < 1E6; i++)
            action(clients, adultClients);
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds.ToString());
        Console.WriteLine($"{adultClients.Count} adult clients found");
    }

    static void ForEachWithVariable(List<Person> clients, List<Person> adultClients)
    {
        var adults = clients.Where(x => x.Age > 20);
        foreach (var client in adults)
            adultClients.Add(client);
    }

    static void ForEachWithoutVariable(List<Person> clients, List<Person> adultClients)
    {
        foreach (var client in clients.Where(x => x.Age > 20))
            adultClients.Add(client);
    }
}

class Person
{
    public int Age { get; set; }
}
After several runs of the program, I was not able to find any significant difference between ForEachWithVariable and ForEachWithoutVariable. They were always close in time, and neither was consistently faster than the other. Interestingly, if I change 1E6 to just 1000, the ForEachWithVariable is actually consistently slower, by about 1 millisecond.
So, I conclude that for LINQ to Objects, there is no practical difference. The same type of test could be run if your particular use case involves LINQ to Entities (or SharePoint).

Thread-safe changes to a ConcurrentDictionary

I am populating a ConcurrentDictionary in a Parallel.ForEach loop:
var result = new ConcurrentDictionary<int, ItemCollection>();

Parallel.ForEach(allRoutes, route =>
{
    // Some heavy operations

    lock (result)
    {
        if (!result.ContainsKey(someKey))
        {
            result[someKey] = new ItemCollection();
        }
        result[someKey].Add(newItem);
    }
});
How do I perform the last steps in a thread-safe manner without using the lock statement?
EDIT: Assume that ItemCollection is thread-safe.
I think you want GetOrAdd, which is explicitly designed to either fetch an existing item, or add a new one if there's no entry for the given key.
var collection = result.GetOrAdd(someKey, _ => new ItemCollection());
collection.Add(newItem);
As noted in the question comments, this assumes that ItemCollection is thread-safe.
You need to use the GetOrAdd method.
var result = new ConcurrentDictionary<int, ItemCollection>();
int someKey = ...;
var newItem = ...;
ItemCollection collection = result.GetOrAdd(someKey, _ => new ItemCollection());
collection.Add(newItem);
Assuming ItemCollection.Add is not thread-safe, you will need a lock, but you can reduce the size of the critical region.
var collection = result.GetOrAdd(someKey, k => new ItemCollection());
lock(collection)
collection.Add(...);
Update: Since it seems to be thread-safe, you don't need the lock at all
var collection = result.GetOrAdd(someKey, k => new ItemCollection());
collection.Add(...);
