Parallel.ForEach Usage - c#

I've got the following:
[HttpPost]
public async Task<IEnumerable<PlotAutocompleteModel>> Get()
{
IEnumerable<PlotDomain> plots = await plotService.RetrieveAllPlots();
var concurrent = ConcurrentQueue<PlotAutoCompleteModel>();
Parallel.ForEach(plots, (plot) =>
{
concurrent.Enqueue(new PlotAutocompleteModel(plot);
});
return concurrent;
}
With this usage, it takes about two seconds. Compared to: return plots.Select(plot => new PlotsAutocompleteModel(plot)).ToList(); which takes about four and a half seconds.
But I've always been told that for a simple transformation of a domain model into a view model, a Parallel.ForEach isn't ideal, mostly because it should be for more com-putative code. Which my usage clearly doesn't do.
Clarification: Where you would use significantly more resources, for instance you have a bitmap, a large quantity, which you have to rasterize and create new images from.
Is this the proper option for this code? I clearly see a performance gain due to the raw amount of records I'm iterating then transforming. Does a better approach and exist?
Update:
public class ProductAutocompleteModel
{
private readonly PlotDomain plot;
public ProductAutocompleteModel(PlotDomain plot)
{
this.plot = plot;
}
public string ProductName => plot.Project.Name;
// Another fourteen exist.
}

With this usage, it takes about two seconds. Compared to... about four and a half seconds.
But I've always been told that for a simple transformation of a domain model into a view model, a Parallel.ForEach isn't ideal, mostly because it should be for more com-putative code.
Yeah, um... there's no way - absolutely no way - that a "simple transformation of a domain model into a view model" should take four and a half seconds. There is something seriously wrong there. It should take maybe half a millisecond or so. So, your PlotAutocompleteModel constructor is doing something like 10,000 times the amount of work that is normal.
Is this the proper option for this code? I clearly see a performance gain due to the raw amount of records I'm iterating then transforming.
Probably not, because you're hosting on ASP.NET. If you use parallelism on ASP.NET, you will see individual requests complete faster, but it will negatively impact the scalability of your web server as a whole. For this reason, I never recommend parallelism in ASP.NET handlers. (There are specific situations where it would be acceptable - such as a non-public server where you know you have a hard upper limit on the number of simultaneous users - but as a general rule, it's not a good idea).
Since your PlotAutocompleteModel constructor is taking several orders of magnitude longer than expected, I suspect that it's doing blocking I/O as part of its work. The best solution here is to change the blocking I/O to asynchronous I/O, and then use concurrent asynchrony, something like this:
class PlotAutocompleteModel
{
public static async Task<PlotAutocompleteModel> CreateAsync(PlotDomain plot)
{
... // do asynchronous I/O to create a PlotAutocompleteModel.
}
}
[HttpPost]
public async Task<IEnumerable<PlotAutocompleteModel>> Get()
{
IEnumerable<PlotDomain> plots = await plotService.RetrieveAllPlots();
var tasks = plots.Select(plot => PlotAutocompleteModel.CreateAsync(plot));
return await Task.WhenAll(tasks);
}

Related

How to run in parallel a query from synchronous code in C# via Entity Framework

My goal is to speed up a query, and I thought to leverage parallelism, lets assume that I have 2,000 items in ids list, and I split them to 4 lists each one with 500 ids, and I want to open 4 treads that each one will create a DB call and to unite their results, in order to achieve that I used Parallel.ForEach, but it did not improved the performance of the query because apparently it does not well suited to io bound operations: Parallel execution for IO bound operations
The code in the if block uses parallel for each, vs the code in the else block that do it in a regular foreach.
The problem is that the method that contains this query is not async (because it is in a very legacy component) and it can not be change to async, and basically I want to do parallel io bound calculation inside non async method (via Entity Framework).
What are the best practices to achieve this goal? I saw that maybe I can use Task.WaitAll() for that, I do not care to blocking the thread that runs this query, I am more concerned that something will went wrong with the Task.WaitAll() that is called from a non async method
I use Entity Framework as ORM over a SQL database, for each thread I opens a separate context because the context is not thread safe.
Maybe the lock that I use is the one that cause me the problem, I can change it to a ConcurrentDictionary.
The scenario depicted in the code below is simplified from the one I need to improve, in our real application I do need to read the related entities after I loaded there ids, and to perform a complicated calculation on them.
Code:
//ids.Bucketize(bucketSize: 500) -> split one big list, to few lists each one with 500 ids
IEnumerable<IEnumerable<long>> idsToLoad = ids.Bucketize(bucketSize: 500);
if (ShouldLoadDataInParallel())
{
object parallelismLock = new object();
Parallel.ForEach(idsToLoad,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
(IEnumerable<long> bucket) =>
{
List<long> loadedIds = GetIdsQueryResult(bucket);
lock (parallelismLock)
{
allLoadedIds.AddRange(loadedIds );
}
});
}
else
{
foreach (IEnumerable<long> bucket in idsToLoad)
{
List<long> loadedIds = GetIdsQueryResult(bucket);
allLoadedIds.AddRange(loadedIds);
}
}
What are the best practices [for running multiple queries in parallel]?
Parallel.ForEach with seperate DbContext/SqlConnection is a fine approach.
It's just that running your queries in parallel is not really helpful here.
If your 4 queries hit 4 separate databases, then you might get a nice improvement. But there's many reasons why running 4 separate queries in parallel on a single instance might not be faster than running a single large query. Among these are blocking, resource contention, server-side query parallelism, and duplicating work between the queries.
And so
My goal is to speed up a query, and I thought to leverage parallelism
And so this is not usually a good approach to speeding up a query. There are, however, many good ways to speed up queries, so if you post a new question with the details of the query and perhaps some sample data you might get some better suggestions.

Is it possible to throttle Parallel.ForEachAsync in .NET 6.0 to avoid rate limiting?

I'm fairly new to programming (< 3 years exp), so I don't have a great understanding of the subjects in this post. Please bear with me.
My team is developing an integration with a third party system, and one of the third party's endpoints lacks a meaningful way to get a list of entities matching a condition.
We have been fetching these entities by looping over the collection of requests, and adding the results of each awaited call to a list. This works just fine, but getting the entities takes a lot longer than getting entities from other endpoints that lets us get a list of entities by providing a list of ids.
.NET 6.0 introduced Parallel.ForEachAsync(), which lets us execute multiple awaitable tasks asynchronously in parallel.
For example:
public async Task<List<TEntity>> GetEntitiesInParallelAsync<TEntity>(List<IRestRequest> requests)
where TEntity : IEntity
{
var entities = new ConcurrentBag<TEntity>();
// Create a function that takes a RestRequest and returns the
// result of the request's execution, for each request
var requestExecutionTasks = requests.Select(i =>
new Func<Task<TEntity>>(() => GetAsync<TEntity>(i)));
// Execute each of the functions asynchronously in parallel,
// and add the results to the aggregate as they come in
await Parallel.ForEachAsync(requestExecutionTasks, new ParallelOptions
{
// This lets us limit the number of threads to use. -1 is unlimited
MaxDegreeOfParallelism = -1
}, async (func, _) => entities.Add(await func()));
return entities.ToList();
}
Using this code rather than the simple foreach-loop sped up the time it takes to get the ~30 entities on my test instance, by 91% on average. That's awesome. However, we are worried about the rate limiting that is likely to occur when we use it on a client's system with possibly thousands of entities. We have a system in place that detects the "you are rate limited"-message from their API, and cues the requests for a second or so before trying again, but this is not as much a good solution as it is a safety measure.
If we where just looping over the requests, we could have throttled the calls by doing something like await Task.Delay(minimumDelay) in each iteration of the loop. Correct me if I'm wrong, but from what I understand this wouldn't actually work when executing the requests in parallel foreach, as it would make all requests wait the same amount of time before the execution. Is there a way to make each individual request wait a certain amount of time before execution, only if we are close to being rate limited? If at all possible, I would like to do this without limiting the number of threads to use.
Edit
I wanted to let this question sit a little so more people could answer. Since no new answers or comments have been added, I'm marking the one answer I got as correct. That being said, the answer suggests a different approach than using Parallel.ForEachAsync.
If I understand the current answer correctly, the answer to my original question of whether or not it's possible to throttle Parallel.ForEachAsync, would be: "no, it's not".
My suggestion is to ditch the Parallel.ForEachAsync approach, and use instead the new Chunk LINQ operator in combination with the Task.WhenAll method. You can launch 100 asynchronous operations every second like this:
public async Task<List<TEntity>> GetEntitiesInParallelAsync<TEntity>(
List<IRestRequest> requests) where TEntity : IEntity
{
var tasks = new List<Task<TEntity>>();
foreach (var chunk in requests.Chunk(100))
{
tasks.AddRange(chunk.Select(request => GetAsync<TEntity>(request)));
await Task.Delay(TimeSpan.FromSeconds(1.0));
}
return (await Task.WhenAll(tasks)).ToList();
}
It is assumed that the time required to launch an asynchronous operation (to invoke the GetAsync method) is negligible.
This approach has the inherent disadvantage that in case of an exception, the failure will not be propagated before all operations are completed. For comparison the Parallel.ForEachAsync method stops invoking the async delegate and completes ASAP, after the first failure is detected.

Is a good idea to async-ize a sync method?

In my ASP.NET Core app, at some points I'm querying a couple ADs for data. This being AD, the queries take some time to complete and the DirectoryServices API contains only synchronous calls.
Is it a good practice to try and wrap the AD sync calls as async? I think it's done like this (just an example, not the real query):
private async Task<string[]> GetUserGroupsAsync(string samAccountName)
{
var func = new Func<string, string[]>(sam =>
{
var result = new List<string>();
using (var ctx = new PrincipalContext(ContextType.Domain, "", "", ""))
{
var p = new UserPrincipal(ctx)
{
SamAccountName = sam
};
using (var search_obj = new PrincipalSearcher(p))
{
var query_result = search_obj.FindOne();
if (query_result != null)
{
var usuario = query_result as UserPrincipal;
var directory_entry = usuario.GetUnderlyingObject() as DirectoryEntry;
var grupos = usuario.GetGroups(ctx).OfType<GroupPrincipal>().ToArray();
if (grupos != null)
{
foreach (GroupPrincipal g in grupos)
{
result.Add(g.Name);
}
}
}
}
}
return result.ToArray();
});
var result = await Task.Run(() => func(samAccountName));
return result;
}
Is it a good practice
Usually not.
In a desktop app where you don't want to hold up the UI thread, then this idea can actually be a good idea. That Task.Run moves the work to a different thread and the UI thread can continue responding to user input while you're waiting for a response.
You tagged ASP.NET. The answer there is also "it depends". ASP.NET has a limited amount of worker threads that it's allowed to use. The benefit of asynchronous code is to allow a thread to go and work on some other request while you're waiting for a response. Thus, you can serve more requests with the same amount of available threads. It helps the overall performance of your application.
If you're calling await GetUserGroupsAsync(), then there is absolutely no benefit to doing what you're doing. You're freeing up the calling thread, but you've created a new thread that is going to sit locked until a response is returned. So your net thread savings is zero, and you have the additional CPU overhead of setting up the task.
If you intend on calling GetUserGroupsAsync() and then going out and getting other data while you wait for a response, then this can save time. It won't save threads, but just time. But you should be conscious that you are now taking up two threads for each request instead of just one, which means you can hit the ASP.NET max thread count faster, potentially hurting the overall performance of your application.
But whether you want to save time in ASP.NET, or if you want to free up the UI thread in a desktop app, I would still argue that you should not use Task.Run inside GetUserGroupsAsync(). If the caller wants to offload that waiting to another thread so it can then go get other data, then the caller can use Task.Run, like this:
var groupsTask = Task.Run(() => GetUserGroupsAsync());
// make HTTP request or get some other external data while we wait
var groups = await groupsTask;
The decision on whether you should create a method for a class should depend on the answer to the question: if someone thinks of what this class represents, would he think that this class will have this functionality?
Compare this with class string and methods about string equality. Most people would think that two strings are equal if they have exactly the same characters in the same order. However, for a lot of applications, it might be handy to be able to compare two strings with case insensitivity. Instead of changing the equality method of string, a new class is created. This StringComparer class contains a lot of methods to compare strings using different definitions of equality.
If someone would say: "Okay, I've just created a class that represents several methods to compare two strings for equality". Would you expect that comparing with case insensitivity is one of the methods of this class? Of course you would!
The same should be with your class. I don't know what your class represents. However, apparently you thought, that someone who has an object of this class would be happy to "Get User Groups". He is happy that he doesn't have to know how that someone made this method for him, and that he doesn't need to know the insides of the class to be able to get the user groups.
This information hiding is an important thing of classes. It gives the creator of the class the freedom to internally change how the class works, without having to change usage of the class.
So if everyone who knows what your class represents would think: "of course getting user groups will take a considerable amount of time", and "of course, my thread will be waiting idly when getting user groups", then users of your class would expect the presence of asyn-await, to prevent idly waiting.
On the other hand, it might be that users of your class would say: "Well, I know that getting user groups will take some heavy calculations. It will take some time, but my thread will be very busy". In that case, they won't expect an async method.
Assuming that you have a non-async method to get the user groups:
string[] GetUserGroups(string samAccountName) {...}
The async method would be very simple:
Task<string[] GetUserGroupsAsync(string samAccountName)
{
return Task.Run(() => GetUserGroups(samAccountName));
}
The only thing you would have to decide is: do the users of my class expect this method?
Advantages and Disadvantages
Disadvantage of having a Sync and an Async method:
People who learn about your class have to learn about more methods
Users of your class can't decide how the async method calls the sync one, without creating an extra async method, which will only add to the confusion
You'll have to add an extra unit test
You'll have to maintain the async method forever.
Advantages of having an async method:
If in future a user group would be fetched from another process, for instance a database, or an XML file, or maybe the internet, then you can internally change the class, without having to change the many, many users (after all, all your classes are very popular, aren't they :)
Conclusion
If people look at your class, and they wouldn't even think that fetching user groups would be an async method, then don't create it.
If you think that maybe in future it could be that another process provides the user groups, then it would be wise to prepare your users about this.

Correctly parallelising lots of little tasks within a method using C#.NET

I'm implementing image processing algorithms in C# using .NET Framework 4.72 and need to decrease the computation code. Overall the code is sequential but there are quite a few methods with parameters that do not depend on each other. For example, it might be something like this
public void Algorithm(Object x, Object y) {
x = Filter(x);
x = Morphology(x);
y = Filter(y);
y = Morphology(y);
var z = Add(x,y);
//Similar pattern of separate operation that are then combined.
}
These functions generally take around 100ms to 500ms. They can be parallelised, and my approach has been something like this:
public void Algorithm(Object x, Object y) {
var xTask = Task.Run(() => {
x = Filter(x);
x = Morphology(x);
});
var yTask = Task.Run(() => {
y = Filter(y);
y = Morphology(y);
});
Task.WaitAll(xTask, yTask);
var z = Add(x,y);
}
It seems to work, a similar bit of code runs approximately twice as fast. (Note that the whole thing is wrapped in another Task.Run in the top most level function, so that is why I'm not awaiting here.
Question: Is this a valid approach, or is there another method for parallelising lots of little method calls that is more safe or efficient?
Update: This is not for parallelising processing a batch of images. It is about processing a single image as quick as possible.
This is valid enough - if you can process your workload in parallel then you should. You just need to be very aware of WHEN your workload can and should be parallel - and when it needs to be performed in order.
You also need to consider the cost of creating a new task, versus the benefits of doing so (i.e. sometimes avoid very small, very fast tasks).
I would strongly recommend you create additional methods and collections for managing your tasks - when they complete, and handle running lots of separate sets in parallel. Avoiding locking, managing shared memory/variables etc. For example, are you only ever processing one image at a time, or can you start processing the next one if you have cores available?
You need to be very careful with Task.WaitAll() - obviously you need to draw all your work together at some point, but be careful not to lock or block other work.
There's lots of articles out there on the various patterns you can use (pipelines sounds like a good match here).
Here's a few starters:
https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/tpl-and-traditional-async-programming
https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism

Async slower than Sync

I have been working on Async calls and I found that the Async version of a method is running much slower than the Sync version. Can anyone comment on what I may be missing. Thanks.
Statistics
Sync method time is 00:00:23.5673480
Async method time is 00:01:07.1628415
Total Records/Entries returned per call = 19972
Below is the code that i am running.
-------------------- Test class ----------------------
[TestMethod]
public void TestPeoplePerformanceSyncVsAsync()
{
DateTime start;
DateTime end;
start = DateTime.Now;
for (int i = 0; i < 10; i++)
{
using (IPersonRepository repository = kernel.Get<IPersonRepository>())
{
IList<IPerson> people1 = repository.GetPeople();
IList<IPerson> people2 = repository.GetPeople();
}
}
end = DateTime.Now;
var diff = start - end;
Console.WriteLine(diff);
start = DateTime.Now;
for (int i = 0; i < 10; i++)
{
using (IPersonRepository repository = kernel.Get<IPersonRepository>())
{
Task<IList<IPerson>> people1 = GetPeopleAsync();
Task<IList<IPerson>> people2 = GetPeopleAsync();
Task.WaitAll(new Task[] {people1, people2});
}
}
end = DateTime.Now;
diff = start - end;
Console.WriteLine(diff);
}
private async Task<IList<IPerson>> GetPeopleAsync()
{
using (IPersonRepository repository = kernel.Get<IPersonRepository>())
{
return await repository.GetPeopleAsync();
}
}
-------------------------- Repository ----------------------------
public IList<IPerson> GetPeople()
{
List<IPerson> people = new List<IPerson>();
using (PersonContext context = new PersonContext())
{
people.AddRange(context.People);
}
return people;
}
public async Task<IList<IPerson>> GetPeopleAsync()
{
List<IPerson> people = new List<IPerson>();
using (PersonContext context = new PersonContext())
{
people.AddRange(await context.People.ToListAsync());
}
return people;
}
So we've got a whole bunch of issues here, so I'll just say right off the bat that this isn't going to be an exhaustive list.
First off, the point of asynchrony is not strictly to improve performance. It can be, in certain contexts, used to improve performance, but that's not necessarily its goal. It can also be used to keep a UI responsive, for example. Paralleization is usually used to increase performance, but parallelization and asynchrony aren't equivalent. On top of that, parallelization has an overhead. You're spending time creating threads, scheduling them, synchronizing data between them, etc. The benefit of performing some operations in parallel may or may not surpass this overhead. If it doesn't, a synchronous solution may well be more performant.
Next, your "asynchronous" example isn't asynchronous "all the way up". You're calling WaitAll on the tasks inside the loop. For the example to be properly asynchronous one would like to see it be asynchronous all the way up to a single operation, namely some form of message loop.
Next, the two aren't don't the exact same thing in an asynchronous and synchronous manor. They are doing different things, which will obviously affect performance:
Your "asynchronous" solution creates 3 repositories. Your synchronous solution creates one. There is going to be some overhead here.
GetPeopleAsync takes a list, then pulls all of the items out of the list and puts them into another list. That's unnecessary overhead.
Then there are problems with your benchmarking:
You're using DateTime.Now, which is not designed for timing how long an operation takes. it's precision isn't particularly high, for example. You should use a StopWatch to time how long code takes.
You aren't performing all that many iterations. There's plenty of opportunity for the variation to affect the results here.
You aren't accounting for the fact that the first few runs through a section of code will take longer. The JITter needs to "warm up".
Garbage collections can be affecting your timings, namely that the objects created in the first test can end up being cleaned up during the second test.
It may depend on your data, or rather the amount of it. You didn't post what test metrics you're using to run your tests but this is my experience:
Usually when you see a slowdown in the performance of parallel algorithms when you're expecting improvement it's that the overhead of loading the extra libraries and spawning threads etc. slows down the parallel algorithm and makes it look like the linear/single-threaded version is performing better.
A greater amount of data should show better performance. Also try running the same test twice when all the libraries are loaded to avoid the load overhead.
If you don't see improvement, something is seriously wrong.
Note: You're getting voted down, I'm guessing, because you posted much more code than context, metrics etc. in the OP. IMO, very few SOers will actually bother to read and grok even that much code without being able to execute it while also being presented with metrics that are not at all useful!
Why I didn't read the code: When I see a code block with scroll bars along with the kind of text that was present in the original OP, my brain says: Don't bother. I think many if not most, probably do this.
Things to try:
Two different synch times does not mean statistically significant data. You should run each algorithm a number of times (5 at least) to see if you're experiencing anomalies. If your results for the same algorithms vary wildly then you may have other issues such as bandwidth restriction, server load etc. and the issue is external.
Try a .NET memory performance and/or memory profiler to help you track down the issue.
See #servy's great answer for more clues. It seems that he actually took the time to look at your code more closely.

Categories

Resources