Thread management with ASP.NET async/await


I've got a database entity type Entity, a long list of Thingy and method
private Task<Entity> MakeEntity(Thingy thingy) {
...
}
MakeEntity does lots of stuff, and is CPU bound. I would like to convert all my thingies to entities, and save them in a db.context. Considering that
I don't want to finish as fast as possible
The number of entities is large, and I want to use the database effectively, so I want to start saving changes and waiting for the remote database to do its thing
how can I do this performantly? What I would really like is to loop while waiting for the database to do its thing, and offer all the newly made entities so far, until the database has processed them all. What's the best route there? I've run into SaveChanges throwing if it's called concurrently, so I can't do that. What I'd really like is to have a thread pool of eight threads (or rather, as many threads as I have cores) to do the CPU-bound work, and a single thread doing the SaveChanges().

This is a kind of "asynchronous stream", which is always a bit awkward.
In this case (assuming you really do want to multithread on ASP.NET, which is not recommended in general), I'd say TPL Dataflow is your best option. You can use a TransformBlock with MaxDegreeOfParallelism set to 8 (or unbounded, for that matter), and link it to an ActionBlock that does the SaveChanges.
Remember, use synchronous signatures (not async/await) for CPU-bound code, and asynchronous methods for I/O-bound code (i.e., SaveChangesAsync).
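To make the TransformBlock/ActionBlock suggestion concrete, here is a minimal sketch. It assumes the System.Threading.Tasks.Dataflow library is available; the int and string types, the stand-in MakeEntity, the added BatchBlock with a batch size of 100, and the Task.Delay that simulates SaveChangesAsync are all illustrative assumptions, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for the CPU-bound MakeEntity (note the synchronous signature).
static string MakeEntity(int thingy) => $"entity-{thingy}";

var saved = new List<string>();   // only touched by the save block, which runs one batch at a time

// CPU-bound stage: one worker per core.
var transform = new TransformBlock<int, string>(
    thingy => MakeEntity(thingy),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

// Assumed extra stage: group entities so each save covers many rows.
var batch = new BatchBlock<string>(100);

// I/O-bound stage: the default MaxDegreeOfParallelism is 1, so saves never overlap.
var save = new ActionBlock<string[]>(async entities =>
{
    // Real code would add the entities to the context and await SaveChangesAsync here.
    await Task.Delay(10);          // simulated database round trip
    saved.AddRange(entities);
});

var link = new DataflowLinkOptions { PropagateCompletion = true };
transform.LinkTo(batch, link);
batch.LinkTo(save, link);

foreach (int thingy in Enumerable.Range(0, 1000))
    transform.Post(thingy);

transform.Complete();              // no more input; completion flows down the links
await save.Completion;

Console.WriteLine($"Saved {saved.Count} entities");
```

Because the save block processes one batch at a time, SaveChanges is never called concurrently, while the transform block keeps the cores busy.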

You could set up a pipeline of N CPU workers feeding into a database worker. The database worker could batch items up.
Since MakeEntity is CPU bound there is no need to use async and await there. await does not create tasks or threads (a common misconception).
var thingies = ...;
var entities = thingies.AsParallel().WithDegreeOfParallelism(8).Select(MakeEntity);
var batches = CreateBatches(entities, batchSize: 100);
foreach (var batch in batches) {
Insert(batch);
}
You need to provide a method that creates batches from an IEnumerable. Implementations are easy to find on the web, and from .NET 6 onward Enumerable.Chunk does this out of the box.
If you don't need batching for the database part you can delete that code.
For the database part you probably don't need async IO because it seems to be a low-frequency operation.
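A runnable sketch of the PLINQ-plus-batching idea, assuming .NET 6 or later so that Enumerable.Chunk can play the role of CreateBatches. MakeEntity here is a hypothetical stand-in, and the loop body only counts where real code would insert and save.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the CPU-bound MakeEntity.
static string MakeEntity(int thingy) => $"entity-{thingy}";

var thingies = Enumerable.Range(0, 250);

// The CPU-bound work runs on up to 8 worker threads.
var entities = thingies
    .AsParallel()
    .WithDegreeOfParallelism(8)
    .Select(MakeEntity);

int batchCount = 0, entityCount = 0;

// Chunk yields arrays of up to 100 items; this loop runs on the calling thread only.
foreach (string[] batch in entities.Chunk(100))
{
    // Real code would insert the batch and call SaveChanges here.
    batchCount++;
    entityCount += batch.Length;
}

Console.WriteLine($"{entityCount} entities in {batchCount} batches");
```

All database work stays on one thread, so the context is never used concurrently.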

Related

Thread Contention on a ConcurrentDictionary in C#

I have a C# .NET program that uses an external API to process events for real-time stock market data. I use the API callback feature to populate a ConcurrentDictionary with the data it receives on a stock-by-stock basis.
I have a set of algorithms that each run in a constant loop until a terminal condition is met. They are called like this (but all from separate calling functions elsewhere in the code):
Task.Run(() => ExecutionLoop1());
Task.Run(() => ExecutionLoop2());
...
Task.Run(() => ExecutionLoopN());
Each one of those functions calls SnapTotals():
public void SnapTotals()
{
foreach (KeyValuePair<string, MarketData> kvpMarketData in
new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime))
{
...
The Handler.MessageEventHandler.Realtime object is the ConcurrentDictionary that is updated in real-time by the external API.
At a certain specific point in the day, there is an instant burst of data that comes in from the API. That is the precise time I want my ExecutionLoop() functions to do some work.
As I've grown the program and added more of those execution loop functions, and grown the number of elements in the ConcurrentDictionary, the performance of the program as a whole has seriously degraded. Specifically, those ExecutionLoop() functions all seem to freeze up and take much longer to meet their terminal condition than they should.
I added some logging to all of the functions above, and to the function that updates the ConcurrentDictionary. From what I can gather, the ExecutionLoop() functions appear to access the ConcurrentDictionary so often that they block the API from updating it with real-time data. The loops are dependent on that data to meet their terminal condition so they cannot complete.
I'm stuck trying to figure out a way to re-architect this. I would like for the thread that updates the ConcurrentDictionary to have a higher priority but the message events are handled from within the external API. I don't know if ConcurrentDictionary was the right type of data structure to use, or what the alternative could be, because obviously a regular Dictionary would not work here. Or is there a way to "pause" my execution loops for a few milliseconds to allow the market data feed to catch up? Or something else?

Your basic approach is sound except for one fatal flaw: all the loops are hitting the same dictionary at the same time via iterators, sets, and gets. So you must do one thing: in SnapTotals, iterate over a cheap snapshot of the concurrent dictionary rather than over the dictionary itself.
When you iterate over Handler.MessageEventHandler.Realtime, or even build a new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime) from it, you are going through the ConcurrentDictionary<>'s iterator. Even though that iterator is thread-safe, iterating directly keeps you working against the live dictionary for the entire period of iteration (including however long it takes to do the processing for each and every entry in the dictionary). That is most likely where the contention occurs.
Copying the dictionary into an array is much faster, so it should lower contention.
Change SnapTotals to
public void SnapTotals()
{
var copy = Handler.MessageEventHandler.Realtime.ToArray();
foreach (var kvpMarketData in copy)
{
...
Now, each ExecutionLoopX can execute in peace without write-side contention (your API updates) and without read-side contention from the other loops. The write-side can execute without read-side contention as well.
The only "contention" should be for the short duration needed to do each copy.
And by the way, the dictionary copy (an array) is not threadsafe; it's just a plain array, but that is ok because each task is executing in isolation on its own copy.
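As a small illustration of the snapshot behaviour, the sketch below uses a ConcurrentDictionary<string, decimal> with made-up symbols and prices in place of the real market data. It shows that the array returned by ToArray() is unaffected by later updates.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for Handler.MessageEventHandler.Realtime.
var realtime = new ConcurrentDictionary<string, decimal>();
realtime["AAA"] = 10m;
realtime["BBB"] = 20m;

// Take a moment-in-time copy of the entries.
KeyValuePair<string, decimal>[] snapshot = realtime.ToArray();

// Updates that arrive afterwards change the dictionary, not the snapshot.
realtime["AAA"] = 11m;
realtime["CCC"] = 30m;

decimal total = 0m;
foreach (KeyValuePair<string, decimal> entry in snapshot)
    total += entry.Value;          // slow per-entry processing would go here

Console.WriteLine($"Snapshot entries: {snapshot.Length}, total: {total}");
```

The loop can take as long as it likes without holding up writers, at the cost of working on slightly stale data.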

I think that your main problem is not related to the ConcurrentDictionary, but to the large number of ExecutionLoopX methods. Each of these methods saturates a CPU core, and since the methods are more than the cores of your machine, the whole CPU is saturated. My assumption is that if you find a way to limit the degree of parallelism of the ExecutionLoopX methods to a number smaller than the Environment.ProcessorCount, your program will behave and perform better. Below is my suggestion for implementing this limitation.
The main obstacle is that currently your ExecutionLoopX methods are monolithic: they can't be separated into pieces so that they can be parallelized. My suggestion is to change them from void methods to async methods that return Task, and place an await Task.Yield(); inside the outer loop. This way it will be possible to execute them in steps, with each step being the code from one await to the next.
Then create a TaskScheduler with limited concurrency, and a TaskFactory that uses this scheduler:
int maxDegreeOfParallelism = Environment.ProcessorCount - 1;
TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxDegreeOfParallelism).ConcurrentScheduler;
TaskFactory taskFactory = new TaskFactory(scheduler);
Now you can parallelize the execution of the methods, by starting the tasks with the taskFactory.StartNew method instead of the Task.Run:
List<Task> tasks = new();
tasks.Add(taskFactory.StartNew(() => ExecutionLoop1(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop2(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop3(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop4(data)).Unwrap());
//...
Task.WaitAll(tasks.ToArray());
The .Unwrap() is needed because the taskFactory.StartNew returns a nested task (Task<Task>). The Task.Run method is also doing this unwrapping internally, when the action is asynchronous.
An online demo of this idea can be found here.
The Environment.ProcessorCount - 1 configuration means that one CPU core will be available for other work, like the communication with the external API and the updating of the ConcurrentDictionary.
A more cumbersome implementation of the same idea, using iterators and the Parallel.ForEach method instead of async/await, can be found in the first revision of this answer.
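The pieces above can be put together in a self-contained sketch. The ExecutionLoop body, the limit of 2, and the peak-concurrency bookkeeping are assumptions added purely to show that the limit holds; this is intended for a console application with no SynchronizationContext.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

int maxDegreeOfParallelism = 2;   // assumed limit, kept small so the effect is visible
TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair(
    TaskScheduler.Default, maxDegreeOfParallelism).ConcurrentScheduler;
TaskFactory taskFactory = new TaskFactory(scheduler);

object gate = new object();
int running = 0, peak = 0;

// Hypothetical execution loop: one unit of CPU work per step, then yield.
async Task ExecutionLoop(int steps)
{
    for (int i = 0; i < steps; i++)
    {
        lock (gate) { running++; peak = Math.Max(peak, running); }
        Thread.SpinWait(50_000);              // simulated CPU-bound step
        lock (gate) { running--; }
        await Task.Yield();                   // hand the slot back to the limited scheduler
    }
}

var tasks = new List<Task>();
for (int n = 0; n < 6; n++)
    tasks.Add(taskFactory.StartNew(() => ExecutionLoop(20)).Unwrap());

await Task.WhenAll(tasks);
Console.WriteLine($"Peak concurrency: {peak} (limit {maxDegreeOfParallelism})");
```

Six loops are started, but no more than two steps run at the same moment, because each continuation after Task.Yield is queued back to the same scheduler.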

If you're not squeamish about mixing operations in a task, you could redesign such that instead of task A doing A things, B doing B things, C doing C things, etc. you can reduce the number of tasks to the number of processors, and thus run fewer concurrently, greatly easing contention.
So, for example, say you have just two processors. Make a "general purpose/pluggable" task wrapper that accepts delegates. So, wrapper 1 would accept delegates to do A and B work. Wrapper 2 would accept delegates to do C and D work. Then ask each wrapper to spin up a task that calls the delegates in a loop over the dictionary.
This would of course need to be measured. What I am proposing is, say, 4 tasks each doing 4 different types of processing. This is 4 units of work per loop over 4 loops. This is not the same as 16 tasks each doing 1 unit of work. In that case you have 16 loops.
16 loops intuitively would cause more contention than 4.
Again, this is a potential solution that should be measured. There is one drawback for sure: you will have to ensure that a piece of work within a task doesn't affect any of the others.

How to run in parallel a query from synchronous code in C# via Entity Framework

My goal is to speed up a query, and I thought to leverage parallelism. Let's assume that I have 2,000 items in an ids list, and I split them into 4 lists of 500 ids each. I want to start 4 threads that each make a DB call and then unite their results. To achieve that I used Parallel.ForEach, but it did not improve the performance of the query, apparently because it is not well suited to IO-bound operations: Parallel execution for IO bound operations
The code in the if block uses Parallel.ForEach, whereas the code in the else block does it in a regular foreach.
The problem is that the method that contains this query is not async (because it is in a very legacy component) and it cannot be changed to async, so basically I want to do parallel IO-bound work inside a non-async method (via Entity Framework).
What are the best practices to achieve this goal? I saw that maybe I can use Task.WaitAll() for that. I do not mind blocking the thread that runs this query; I am more concerned that something will go wrong with a Task.WaitAll() that is called from a non-async method.
I use Entity Framework as the ORM over a SQL database, and for each thread I open a separate context because the context is not thread safe.
Maybe the lock that I use is what causes the problem; I can change it to a ConcurrentDictionary.
The scenario depicted in the code below is simplified from the one I need to improve. In our real application I do need to read the related entities after I have loaded their ids, and to perform a complicated calculation on them.
Code:
//ids.Bucketize(bucketSize: 500) -> split one big list, to few lists each one with 500 ids
IEnumerable<IEnumerable<long>> idsToLoad = ids.Bucketize(bucketSize: 500);
if (ShouldLoadDataInParallel())
{
object parallelismLock = new object();
Parallel.ForEach(idsToLoad,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
(IEnumerable<long> bucket) =>
{
List<long> loadedIds = GetIdsQueryResult(bucket);
lock (parallelismLock)
{
allLoadedIds.AddRange(loadedIds );
}
});
}
else
{
foreach (IEnumerable<long> bucket in idsToLoad)
{
List<long> loadedIds = GetIdsQueryResult(bucket);
allLoadedIds.AddRange(loadedIds);
}
}

What are the best practices [for running multiple queries in parallel]?
Parallel.ForEach with separate DbContext/SqlConnection is a fine approach.
It's just that running your queries in parallel is not really helpful here.
If your 4 queries hit 4 separate databases, then you might get a nice improvement. But there's many reasons why running 4 separate queries in parallel on a single instance might not be faster than running a single large query. Among these are blocking, resource contention, server-side query parallelism, and duplicating work between the queries.
My goal is to speed up a query, and I thought to leverage parallelism
And so this is not usually a good approach to speeding up a query. There are, however, many good ways to speed up queries, so if you post a new question with the details of the query and perhaps some sample data you might get some better suggestions.
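On the narrower question of blocking on parallel work from a non-async method, here is a minimal sketch using Task.Run and Task.WaitAll. GetIdsQueryResult is a hypothetical stand-in that filters in memory (real code would open its own context per call), and Enumerable.Chunk (.NET 6+) stands in for Bucketize. Whether this is actually faster than one large query still has to be measured, for the reasons given above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the per-bucket query; real code would create a separate DbContext here.
static List<long> GetIdsQueryResult(IEnumerable<long> bucket) =>
    bucket.Where(id => id % 2 == 0).ToList();

List<long> ids = Enumerable.Range(1, 2000).Select(i => (long)i).ToList();

// One task per bucket of 500 ids.
Task<List<long>>[] queries = ids
    .Chunk(500)
    .Select(bucket => Task.Run(() => GetIdsQueryResult(bucket)))
    .ToArray();

// Blocks the calling thread; throws AggregateException if any query failed.
Task.WaitAll(queries);

// Each task owns its own result list, so no lock is needed to merge them.
List<long> allLoadedIds = queries.SelectMany(q => q.Result).ToList();

Console.WriteLine($"Loaded {allLoadedIds.Count} ids from {queries.Length} buckets");
```

Since the tasks run on thread pool threads that do not need to return to the caller's context, blocking in Task.WaitAll here does not create the classic sync-over-async deadlock; it simply ties up the calling thread.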

Long API Call - Async calls the answer?

I am calling an external API which is slow. Currently, if I haven't called the API to get orders for a while, the result can be broken up into pages (pagination).
So fetching orders could mean making multiple calls rather than just one. Sometimes each call takes around 10 seconds, so this could be about a minute in total, which is far too long.
GetOrdersCall getOrders = new GetOrdersCall();
getOrders.DetailLevelList.Add(DetailLevelCodeType.ReturnSummary);
getOrders.CreateTimeFrom = lastOrderDate;
getOrders.CreateTimeTo = DateTime.Now;
PaginationType paging = new PaginationType();
paging.EntriesPerPage = 20;
paging.PageNumber = 1;
getOrders.Pagination = paging;
getOrders.Execute();
var response = getOrders.ApiResponse;
OrderTypeCollection orders = new OrderTypeCollection();
while (response != null && response.OrderArray.Count > 0)
{
eBayConverter.ConvertOrders(response.OrderArray, 1);
if (response.HasMoreOrders)
{
getOrders.Pagination.PageNumber++;
getOrders.Execute();
response = getOrders.ApiResponse;
orders.AddRange(response.OrderArray);
}
}
This is a summary of my code above... The getOrders.Execute() is when the API fires.
After the first "getOrders.Execute()" there is a Pagination result which tells me how many pages of data there are. My thinking is that I should be able to start an asynchronous call for each page and populate the OrderTypeCollection. When all the calls are made and the collection is fully loaded, I will commit to the database.
I have never done asynchronous calls in C# before. I can kind of follow async/await, but I think my scenario falls outside the reading I have done so far?
Questions:
I think I can set it up to fire off the multiple calls asynchronously but I'm not sure how to check when all tasks have been completed i.e. ready to commit to db.
I've read somewhere that I want to avoid combining the API call and the db write to avoid locking in SQL server - Is this correct?
If someone can point me in the right direction - It would be greatly appreciated.

I think I can set it up to fire off the multiple calls asynchronously
but I'm not sure how to check when all tasks have been completed i.e.
ready to commit to db.
Yes you can break this up
The problem is that eBay doesn't provide an async, Task-returning Execute method, so you are left with blocking threaded calls and no IO-optimised async/await pattern. If it did, you could take advantage of a TPL Dataflow pipeline, which is async aware (and fun for the whole family to play with). You could use one anyway, though I propose a vanilla TPL solution...
However, all is not lost, just fall back to Parallel.For and a ConcurrentBag<OrderType>
Example
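var concurrentBag = new ConcurrentBag<OrderType>();
// make first call (page 1)
// add results to concurrentBag
// pass the pageCount to the for
int pageCount = ...;
// page 1 is already done; the upper bound of Parallel.For is exclusive
Parallel.For(2, pageCount + 1,
page =>
{
// Set up a GetOrdersCall local to this iteration (getOrders below)
// set its Pagination.PageNumber to page
// make Call
foreach (OrderType order in getOrders.ApiResponse.OrderArray)
concurrentBag.Add(order);
});
// all orders have been downloaded
// save to db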
Note: ParallelOptions has a MaxDegreeOfParallelism setting that you can configure, maybe to 50, though it won't really matter how much you give it; the task scheduler is not going to aggressively give you threads, maybe 10 or so initially, growing slowly.
The other way you can do this is to create your own task scheduler, or just spin up your own threads with the old-fashioned Thread class.
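Here is a runnable sketch of the Parallel.For and ConcurrentBag approach. FetchPage is a hypothetical stand-in for one blocking GetOrdersCall.Execute(), and the page count of 5, the 20 orders per page, and the MaxDegreeOfParallelism of 4 are made-up values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one blocking API call that returns the orders on a page.
static string[] FetchPage(int page)
{
    Thread.Sleep(50);   // simulated slow API
    return Enumerable.Range(1, 20).Select(n => $"order-{page}-{n}").ToArray();
}

var orders = new ConcurrentBag<string>();

// First call: in real code its pagination result supplies the page count.
foreach (string order in FetchPage(1))
    orders.Add(order);
int pageCount = 5;      // assumed value for this sketch

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// Pages 2..pageCount; the upper bound is exclusive, hence pageCount + 1.
Parallel.For(2, pageCount + 1, options, page =>
{
    foreach (string order in FetchPage(page))
        orders.Add(order);
});

// Parallel.For returns only after every iteration has finished, so it is safe to save now.
Console.WriteLine($"Downloaded {orders.Count} orders");
```

This also answers the "how do I know when all tasks have completed" part: the line after Parallel.For does not run until all pages are in the bag.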
I've read somewhere that I want to avoid combining the API call and
the db write to avoid locking in SQL server - Is this correct?
If you mean locking as in slow DB inserts, use SQL bulk insert and update tools.
If you mean locking as in the DB deadlock error message, then this is an entirely different thing, and worthy of its own question.
Additional Resources
Parallel.For(Int32, Int32, ParallelOptions, Action<Int32>): Executes a for (For in Visual Basic) loop in which iterations may run in parallel and loop options can be configured.
ParallelOptions Class: Stores options that configure the operation of methods on the Parallel class.
MaxDegreeOfParallelism: Gets or sets the maximum number of concurrent tasks enabled by this ParallelOptions instance.
ConcurrentBag<T> Class: Represents a thread-safe, unordered collection of objects.

Yes, the ConcurrentBag<T> class can help with one of your questions, which was: "I think I can set it up to fire off the multiple calls asynchronously but I'm not sure how to check when all tasks have been completed i.e. ready to commit to db."
This generic class can collect the results produced by every task. Once all your tasks have completed (for example, after Parallel.For returns or Task.WaitAll unblocks), you can do further processing on its contents. It is thread-safe and useful for parallel processing.

Parallelization of long running processes and performance optimization

I would like to parallelize the application that processes multiple video clips frame by frame. Sequence of each frame per clip is important (obviously).
I decided to go with TPL Dataflow since I believe this is a good example of dataflow (movie frames being data).
So I have one process that loads frames from database (lets say in a batch of 500, all bunched up)
Example sequence:
|mid:1 fr:1|mid:1 fr:2|mid:2 fr:1|mid:3 fr:1|mid:1 fr:3|mid:2 fr:2|mid:2 fr:3|mid:1 fr:4|
and posts them to BufferBlock. To this BufferBlock I have linked ActionBlocks with the filter to have one ActionBlock per MovieID so that I get some kind of data partitioning. Each ActionBlock is sequential, but ideally multiple ActionBlocks for multiple movies can run in parallel.
I do have the above described network working and it does run in parallel, but from my calculations only eight to ten ActionBlocks are executing simultaneously. I timed each ActionBlock's running time and its around 100-200ms.
What steps can I take to at least double concurrency?
I did try converting action delegates to async methods and make database access asynchronous within ActionBlock action delegate but it did not help.
EDIT: I implemented extra level of data partitioning: frames for Movies with Odd IDs are processed on ServerA, frames for Even movies are processed on ServerB. Both instances of the application hit the same database. If my problem was DB IO, then I would not see any improvement in total frames processed count (or very little, under 20%). But I do see it doubling. So this leads me to conclude that Threadpool is not spawning more threads to do more frames in parallel (both servers are quad-cores and profiler shows about 25-30 threads per application).

Some assumptions:
From your example data, you are receiving movie frames (and possibly the frames in the movies) out of order
Your ActionBlock<T> instances are generic; they all call the same method for processing, you just create a list of them based on each movie id (you have a list of movie ids beforehand) like so:
// The movie IDs
IEnumerable<int> movieIds = ...;
// The actions.
var actions = movieIds.Select(
i => new { Id = i, Action = new ActionBlock<Frame>(MethodToProcessFrame) });
// The buffer block.
BufferBlock<Frame> bufferBlock = ...;
// Link everything up.
foreach (var action in actions)
{
// Not necessary in C# 5.0, but still, good practice.
// The copy of the action.
var actionCopy = action;
// Link.
bufferBlock.LinkTo(actionCopy.Action, f => f.MovieId == actionCopy.Id);
}
If this is the case, you're creating too many ActionBlock<T> instances which aren't being given work; because your frames (and possibly movies) are out-of-order, you aren't guaranteed that all of the ActionBlock<T> instances will have work to do.
Additionally, when you create an ActionBlock<T> instance it's going to be created with a MaxDegreeOfParallelism of 1, meaning that it's thread safe because only one thread can access the block at the same time.
Additionally, the TPL DataFlow library ultimately relies on the Task<TResult> class, which schedules by default on the thread pool. The thread pool is going to do a few things here:
Make sure that all processor cores are saturated. This is very different from making sure that your ActionBlock<T> instances are saturated, and this is the metric you should be concerned with.
While the processor cores are saturated, make sure that the work is distributed evenly and that not too many concurrent tasks are executing (context switches are expensive).
It also looks like your method that processes your movies is generic, and it doesn't matter what frame from what movie is passed in (if it does matter, then you need to update your question with that, as it changes a lot of things). This would also mean that it's thread-safe.
Also, if it can be assumed that the processing of one frame doesn't rely on the processing of any previous frames (or, it looks like the frames of the movie come in order) you can use a single ActionBlock<T> but tweak up the MaxDegreeOfParallelism value, like so:
// The buffer block.
BufferBlock<Frame> bufferBlock = ...;
// Have *one* ActionBlock<T>
var action = new ActionBlock<Frame>(MethodToProcessFrame,
// This is where you tweak the concurrency:
new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 4,
}
);
// Link. No filter needed.
bufferBlock.LinkTo(action);
Now, your ActionBlock<T> will always be saturated. Granted, any responsible task scheduler (the thread pool by default) is still going to limit the maximum amount of concurrency, but it's going to do as much as it can reasonably do at the same time.
To that end, if your action is truly thread safe, you can set the MaxDegreeOfParallelism to DataflowBlockOptions.Unbounded, like so:
// Have *one* ActionBlock<T>
var action = new ActionBlock<Frame>(MethodToProcessFrame,
// This is where you tweak the concurrency:
new ExecutionDataflowBlockOptions {
// We're thread-safe, let the scheduler determine
// how nuts we can go.
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
}
);
Of course, all of this assumes that everything else is optimal (I/O reads/writes, etc.)
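If the per-movie frame order does matter (the single-block approach above assumes it does not), one alternative, which is not from the answer itself, is a small fixed number of sequential ActionBlocks with frames routed by movie id. The sketch below assumes 4 partitions and uses an (int MovieId, int Frame) tuple in place of the real Frame type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

int partitions = 4;   // assumed worker count; tune by measurement

// movieId -> frame numbers in the order they were actually processed
var processed = new ConcurrentDictionary<int, ConcurrentQueue<int>>();

// Each block keeps the default MaxDegreeOfParallelism of 1, so it handles frames in arrival order.
ActionBlock<(int MovieId, int Frame)>[] workers = Enumerable.Range(0, partitions)
    .Select(index => new ActionBlock<(int MovieId, int Frame)>(f =>
        processed.GetOrAdd(f.MovieId, id => new ConcurrentQueue<int>()).Enqueue(f.Frame)))
    .ToArray();

// Interleaved input: 6 movies, 50 frames each, every movie's frames in ascending order.
foreach (int frame in Enumerable.Range(1, 50))
    foreach (int movieId in Enumerable.Range(1, 6))
        workers[movieId % partitions].Post((movieId, frame));   // same movie, same worker

foreach (var worker in workers)
    worker.Complete();
await Task.WhenAll(workers.Select(w => w.Completion));

bool inOrder = processed.Values.All(q => q.SequenceEqual(Enumerable.Range(1, 50)));
Console.WriteLine($"Movies: {processed.Count}, per-movie order preserved: {inOrder}");
```

Every frame of a given movie lands on the same sequential block, so order within a movie is kept while different movies still run in parallel, and no block sits idle just because its one movie has no frames in the current batch.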

Odds are that's the optimal degree of parallelization. The thread pool is honestly pretty darn good at determining the optimal number of actual threads to have active. My guess is that your hardware can support about that many parallel processes actually working in parallel. If you added more you wouldn't actually be increasing throughput, you'd just be spending more time doing context switches between threads and less time actually working on them.
If you notice that, over an extended period of time, your CPU load, memory bus, network connection, disk access, etc. are all working below capacity then you might have a problem, and you'd want to check to see what is actually bottlenecking. Chances are though some resource somewhere is at its capacity, and the TPL has recognized that and ensured that it doesn't oversaturate that resource.

I suspect you are IO bound. The question is where: on the read or on the write? Are you writing more data than you are reading? CPU may be under 50% because it cannot write out faster.
I am not saying the ActionBlock is wrong, but I would consider a producer/consumer with BlockingCollection. Optimize how you read and write data.
This is different, but I have an app where I read blocks of text, parse the text, and then write the words back to SQL. I read on a single thread, then parse in parallel, and then write on a single thread. I write on a single thread so as not to fracture indexes. If you are IO bound you need to figure out which IO is the slowest and then optimize that process.
Tell me more about that IO.
In the question you mention reading from database also.
I would give BlockingCollections a try.
BlockingCollection Class
And have a size limit for each so you don't blow memory.
Make it just big enough that it (almost) never goes empty.
The Blocking Collection after the slowest step will go empty.
If you can parallel process then do so.
What I have found is parallel inserts in a table are not faster.
Let one process take lock and hold it and keep that hose open.
Look close at how you insert.
One row at a time is slow.
I use TVP and insert 10,000 at a time, but a lot of people like Dapper or BulkInsert.
If you drop indexes and triggers and insert sorted by the clustered index, it will be fastest.
Take a tablock and hold it.
I am getting inserts in the 10 ms range.
Right now the update is the slowest.
Look at that - are you doing just one row at a time?
Look at taking tablock and doing by video clip.
Unless it is an ugly update it should not take longer than an insert.
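A minimal sketch of the bounded BlockingCollection pattern described above: parsing in parallel, writing on a single thread. The capacity of 1000, the made-up text blocks, and the counter that stands in for bulk inserts are all assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Bounded so that fast producers block instead of filling memory.
var parsed = new BlockingCollection<string>(boundedCapacity: 1000);
int written = 0;

// Single writer: real code would buffer rows and insert them in bulk.
Task writer = Task.Run(() =>
{
    foreach (string word in parsed.GetConsumingEnumerable())
        written++;
});

// Parallel parse stage feeding the single writer.
Parallel.ForEach(Enumerable.Range(0, 100), block =>
{
    foreach (string word in $"block {block} has five words".Split(' '))
        parsed.Add(word);          // blocks while the collection is full
});

parsed.CompleteAdding();           // lets GetConsumingEnumerable finish
await writer;

Console.WriteLine($"Wrote {written} words");
```

If the collection is mostly full, the writer is the bottleneck; if it is mostly empty, the stage before it is.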

multithread read and process large text files

I have 10 lists of over 100 MB each with emails and I want to process them using multiple threads as fast as possible and without loading them into memory (something like reading line by line or reading small blocks)
I have created a function which is removing invalid ones based on a regex and another one which is organizing them based on each domain to other lists.
I managed to do it using one thread with:
while (reader.Peek() != -1)
but it takes too damn long.
How can I use multithreads (around 100 - 200) and maybe a backgroundworker or something to be able to use the form while processing the lists in parallel?
I'm new to csharp :P

Unless the data is on multiple physical discs, chances are that any more than a few threads will slow down, rather than speed up, the process.
What'll happen is that rather than reading consecutive data (pretty fast), you'll end up seeking to one place to read data for one thread, then seeking to somewhere else to read data for another thread, and so on. Seeking is relatively slow, so it ends up slower -- often quite a lot slower.
About the best you can do is dedicate one thread to reading data from each physical disc, then another to process the data -- but unless your processing is quite complex, or you have a lot of fast hard drives, one thread for processing may be entirely adequate.

There are multiple approaches to it:
1.) You can create threads explicitly like Thread t = new Thread(), but creating and managing threads this way is expensive.
2.) You can use the .NET ThreadPool and pass your function to the static QueueUserWorkItem method of the ThreadPool class. This approach needs some manual code management and synchronization primitives.
3.) You can create an array of System.Threading.Tasks.Task, each processing one list. The tasks are executed in parallel using all the available processors on the machine, and you pass that array to Task.WaitAll(Task[]) to wait for their completion. This approach is known as task parallelism, and you can find detailed information on MSDN.
Task[] tasks = new Task[10];
for (int i = 0; i < 10; i++)
{
//create a task and start it on a ThreadPool thread
tasks[i] = Task.Factory.StartNew([address of function/lambda expression]);
}
try
{
//Wait for all task to complete
Task.WaitAll(tasks);
}
catch (AggregateException ae)
{
//handle aggregate exception here
//it is raised if one or more tasks throw; the exceptions from all faulted tasks are accumulated in this exception object
}
//continue your processing further
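A complete, runnable version of option 3 might look like the sketch below. ProcessList is a hypothetical stand-in for the per-list work, and list 3 is made to fail on purpose so the AggregateException path is exercised.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical per-list work; list 3 fails on purpose.
static int ProcessList(int listIndex)
{
    if (listIndex == 3)
        throw new InvalidOperationException($"list {listIndex} failed");
    return listIndex * 10;
}

// One task per list, each started on a thread pool thread.
Task<int>[] tasks = Enumerable.Range(0, 10)
    .Select(i => Task.Run(() => ProcessList(i)))
    .ToArray();

int failures = 0;
try
{
    Task.WaitAll(tasks);           // waits for every task, even if some fail
}
catch (AggregateException ae)
{
    // One inner exception per faulted task.
    foreach (Exception inner in ae.InnerExceptions)
    {
        failures++;
        Console.WriteLine($"Task failed: {inner.Message}");
    }
}

int completed = tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
Console.WriteLine($"{completed} lists processed, {failures} failed");
```

Task.WaitAll does not stop at the first failure, so the successful results are still available on the tasks that ran to completion.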

You will want to take a look at the Task Parallel Library (TPL).
This library is made for parallel work, in fact. It will perform your action on the Threadpool in whatever is the most efficient fashion (typically). The only thing that I would caution is that if you run 100-200 threads at one time, then you possibly run into having to deal with context switching. That is, unless you have 100-200 processors. A good rule of thumb is to only run as many tasks in parallel as you have processors.
Some other good resources to review how to use the TPL:
Why and how to use the TPL
How to start a task.

I would be inclined to use parallel linq (plinq).
Something along the lines of:
Lists.AsParallel()
.SelectMany(list => list)
.Where(MyItemFilteringFunction)
.GroupBy(DomainExtractionFunction)
AsParallel tells LINQ it can do this in parallel (which means the ordering of everything that follows will not be maintained)
SelectMany takes your individual lists and unrolls them so that all items from all lists are effectively in a single Enumerable
Where filters the items using your predicate function
GroupBy collects them by key, where DomainExtractionFunction is a function which gets a key (the domain name in your case) from the items (i.e., the email)
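Filled in with concrete pieces, the query could look like the sketch below. The regex is a deliberately loose illustration rather than a full email validator, the in-memory arrays are made-up sample data, and the lambdas stand in for MyItemFilteringFunction and DomainExtractionFunction. For files too large to load, File.ReadLines enumerates lines lazily and could supply each list instead.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Loose illustrative pattern: something@something.something, no spaces or extra '@'.
var emailPattern = new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.Compiled);

// Made-up sample data standing in for the ten large lists.
string[][] lists =
{
    new[] { "a@example.com", "not-an-email", "b@example.org" },
    new[] { "c@example.com", "d@@example.com" },
};

var byDomain = lists
    .AsParallel()
    .SelectMany(list => list)                                   // flatten all lists
    .Where(email => emailPattern.IsMatch(email))                // drop invalid entries
    .GroupBy(email => email.Substring(email.IndexOf('@') + 1))  // key = domain
    .ToDictionary(group => group.Key, group => group.Count());

foreach (var pair in byDomain.OrderBy(p => p.Key))
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

PLINQ picks the degree of parallelism itself (by default up to the number of cores), which fits the advice above better than spinning up 100 to 200 threads by hand.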
