I have a .NET web application which, for each incoming request, sends a request to one of a set of APIs (which API to hit depends on the request type), gets the response, processes it, and returns the result.
Suppose that at a given point in time we have a maximum of 100 threads, and we get 100 requests (assume each thread handles one request) that all need to go to API-1, whose response time suddenly increases. Subsequent requests that need to go to API-2, 3, ..., n, all of which are working perfectly, won't be served until one of the threads gets free from processing API-1, which results in an overall performance impact on the .NET web application.
What I want to achieve is to limit the number of threads for each API (say we have a class per API with some methods) so that each class does not exceed the maximum number of threads allocated to it.
(If I have n classes and 100 threads, I should be able to divide them into thread pools of 100/n threads each.)
I tried a few links about thread pools but couldn't achieve what I wanted.
Your application is probably a good target for the asynchronous programming model which, if used correctly, eliminates the blocked-threads issue.
Asynchronous programming in C# is a very broad topic that has been discussed a lot. Check the following resources:
What is the difference between asynchronous programming and multithreading? discussion on StackOverflow
Asynchronous programming page on Microsoft Docs
Async and Await article by Stephen Cleary and his blog in general
What do the terms “CPU bound” and “I/O bound” mean? discussion on StackOverflow
If you really need to, you can still limit the number of ThreadPool threads (workers) as usual. See the Using TPL how do I set a max threadpool size discussion on StackOverflow.
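As a minimal sketch of combining those two ideas, here is a hypothetical per-API gate built on SemaphoreSlim; ApiGate, CallAsync, and the cap value are illustrative names, not an established pattern:

    using System.Collections.Concurrent;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    // Caps concurrent calls per API without dedicating threads to each one.
    // While a call is awaited no thread is blocked, so a slow API-1 cannot
    // starve requests headed for API-2..n.
    public class ApiGate
    {
        private static readonly HttpClient Http = new HttpClient();
        private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates =
            new ConcurrentDictionary<string, SemaphoreSlim>();
        private readonly int _maxPerApi;

        public ApiGate(int maxPerApi) { _maxPerApi = maxPerApi; }

        public async Task<string> CallAsync(string apiName, string url)
        {
            var gate = _gates.GetOrAdd(apiName, _ => new SemaphoreSlim(_maxPerApi));
            await gate.WaitAsync();                      // queue up if this API is at its cap
            try
            {
                return await Http.GetStringAsync(url);  // no thread is held during the await
            }
            finally
            {
                gate.Release();
            }
        }
    }

With 100 threads and n API classes you would pass roughly 100/n as the cap, although with async the cap limits in-flight requests rather than dedicated threads, which is usually what you actually want.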
Related
In this scenario I have to poll AWS SQS messages from a queue; each async request can fetch up to 10 SQS items/messages. Once I poll the items, I have to process them on a Kubernetes pod. Item processing includes getting responses from a few API calls, which may take some time, and then saving the item to the DB and S3.
I did some R&D and reached the following conclusions:
Use a producer-consumer model: one thread polls items and another thread processes them, or use multi-threading for item processing.
Maintain a data structure that contains the SQS items polled and ready for processing; the DS could be a BlockingCollection or a ConcurrentQueue.
Use the Task Parallel Library for thread pooling and item processing.
Channels could also be used.
My Queries
What would be the best approach to achieve the best performance or increase TPS?
Can/should I use TPL Dataflow?
Multi-threaded, or single-threaded with async tasks?
This is very dependent on the specifics of your use case and how much effort you want to put in.
I will, however, explain the thought process I would use when making such a decision.
The naive solution to handle SQS messages would be to do it one at a time sequentially (i.e. without concurrency). It doesn't mean that you're limited to a single message at a time since you can add more pods to the cluster.
So even in that naive solution you have one concurrency point you can utilize but it has a lot of overhead. The way to reduce overhead is usually to utilize the same overhead but process more messages with it. That's why, for example, SQS allows you to get 1-10 messages in a single call and not just one. It spreads the call overhead over 10 messages. In the naive solution the overhead is the cost of starting a whole process. Using the process for more messages means concurrent processing.
I've found that for stable and flexible concurrency you want many points of concurrency, but have each of them capped at some configurable degree of parallelism (whether hardcoded or actual configuration). That way you can tweak each of them to achieve optimal output (increase when you have free CPU and memory and decrease otherwise).
So, where can the additional concurrency be introduced? This is a progression where each step utilizes resources better but requires more effort.
Fetch 10 messages instead of one on every SQS API call and process them concurrently. That way you have 2 points of concurrency you can control: the number of pods and the number of messages (up to 10) processed concurrently.
Have a few tasks, each fetching 1-10 messages and processing them concurrently. That's 3 concurrency points: pods, tasks, and messages per task. Both of these solutions suffer when messages have varying processing times, meaning that a single long-running message will "hold up" all the other 1-9 "slots" of work, effectively reducing the concurrency to lower than configured.
Set up a TPL Dataflow block to process the messages concurrently and a task (or few) continuously fetching messages and pumping into the block. Keep in mind that SQS messages need to be explicitly deleted so the block needs to receive the message handle too so the message can be deleted after processing.
TPL Dataflow "pipe" consisting of a few blocks where each has it's own concurrency degree. That's useful when you have different steps of processing of the message where each step has different limitations (e.g. different APIs with different throttling configurations).
I personally am very fond of, and comfortable with, the Dataflow library so I would go straight to it. But simpler solutions are also valid when performance is less of an issue.
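For concreteness, here is a rough sketch of option 3 (one processing block fed by a fetch loop), assuming the AWSSDK.SQS client and the System.Threading.Tasks.Dataflow package; ProcessAsync, the queue URL, and the concurrency numbers are placeholders, and shutdown/error handling is omitted:

    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;
    using Amazon.SQS;
    using Amazon.SQS.Model;

    public static class SqsPump
    {
        public static async Task RunAsync(IAmazonSQS sqs, string queueUrl)
        {
            // Process up to 16 messages concurrently; BoundedCapacity applies
            // backpressure so the fetch loop cannot race ahead of processing.
            var worker = new ActionBlock<Message>(async msg =>
            {
                await ProcessAsync(msg);  // API calls + DB/S3 writes go here
                // Delete only after successful processing; SQS requires an
                // explicit delete, hence the block receives the whole message.
                await sqs.DeleteMessageAsync(queueUrl, msg.ReceiptHandle);
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 16,
                BoundedCapacity = 64
            });

            // Fetch loop: long-poll up to 10 messages per call and pump them in.
            while (true)
            {
                var batch = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
                {
                    QueueUrl = queueUrl,
                    MaxNumberOfMessages = 10,
                    WaitTimeSeconds = 20   // long polling reduces empty receives
                });
                foreach (var msg in batch.Messages)
                    await worker.SendAsync(msg);  // waits while the block is full
            }
        }

        private static Task ProcessAsync(Message msg) => Task.CompletedTask; // stub
    }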
I'm not familiar with Kubernetes but there are many things to consider when maximising throughput.
All the things you have mentioned are I/O bound, not CPU bound, so using the TPL overcomplicates the design for marginal benefit. See: https://learn.microsoft.com/en-us/dotnet/csharp/async#recognize-cpu-bound-and-io-bound-work
Your Kubernetes pods are likely to have network limitations. For example, an Azure Function App on a Consumption Plan is limited to 1,200 outbound connections. Other services will have some defined limits, too: https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections?tabs=csharp#connection-limit. Due to the nature of your work, it is likely that you will reach these limits before you need to process I/O work on multiple threads.
You may also need to consider limits of the services which you are dependent on and ensure they are able to handle the throughput.
You may want to consider using semaphores to limit the number of active connections, to satisfy both your infrastructure and external-dependency limits: https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
That being said, 500 messages per second is a realistic amount. To improve it further, you can look at having multiple processes with independent resource limitations processing the queue.
Not familiar with your use case, or specifically with the tech you are using, but this sounds like a very common message handling scenario.
A few guidelines:
First, these are guidelines; your use case might be very different from what the ones commenting here are used to.
Whenever you want to increase your throughput you need to identify your bottlenecks and strive towards a CPU bottleneck, making sure you fully utilize it. CPU load is usually the most expensive, and it generally makes for a more reliable metric for autoscaling. Obviously, depending on your remote API calls and your DB, you might reach other bottlenecks. SQS queue size also makes for a good autoscaling metric, but keep in mind that autoscaling isn't guaranteed to increase your throughput if your bottleneck is DB or API related.
I would not go for a fancy solution with complex data structures. Again, I'm not familiar with your use case, so I might be wrong, but keep it simple: there should be one thread that is responsible for polling the queue, and when it finds new messages it should create a Task that processes a batch. There should generally be one Task per processing batch; let the ThreadPool handle the number of threads.
I'm not familiar with the .NET SQS library. However, I am familiar with other libraries for very similar solutions. Most queue libraries out there already do it all for you, and you don't really have to worry about it. You should probably just have a callback function that is called when the highly optimized library finds new messages. Those libraries probably already create a new task for each of those batches; you just need to register for their callback and make sure you await any I/O-bound code.
Edit: The solution I am proposing does have a limitation in that a single message can block an entire batch. This is not necessarily a bad thing, but if your solution requires different processing for different messages and you don't want to create this inner-batch dependency, TPL Dataflow could definitely be a good solution for your use case.
Yeah, this sounds very much like a task for TPL Dataflow; it is a very versatile yet powerful instrument. Your first chain link would acquire messages from the queue (not necessarily single-threadedly; you just pass some delegates in). You will also be in control of how many items are "queued" locally this way.
Then you "subscribe" your workers in any way you desire; you can even customize it so that "faulted" items are put back into your queue. And it wouldn't even matter whether your processing is I/O bound or not. If it is, nice: TPL Dataflow is asynchronous. If not, not a problem: TPL Dataflow can also be synchronous. Or you can fire up some thread pool threads, no biggie.
Async has become a buzzword in .net and MS have introduced it in Web API 2 so that more requests can be handled whilst others are waiting on IO to finish.
Whilst I can see the benefit of this, is it really a concern? A x64 architecture has 30000+ threads in the Thread Pool so unless you have that many concurrent users on your website is async really required? Even if you have that many concurrent users without caching I'm pretty sure SQL Server will fall over with that many requests?
Apart from it being shiny when is there a real need to have async routing on a web framework?
Many of the other answers here are coming from a UI (desktop/mobile app) perspective, not a web server perspective.
Async has become a buzzword in .net and MS have introduced it in Web API 2 so that more requests can be handled whilst others are waiting on IO to finish.
async and await were introduced in .NET 4.5 / VS 2012. However, ASP.NET has had asynchronous request capability since .NET 2.0 - a very long time ago. And there have been people using it.
What async and await bring to the table is asynchronous code that is easy to maintain.
Whilst I can see the benefit of this, is it really a concern?
The key benefit of async on the server is scalability. Simply put, async tasks scale far better than threads.
@Joshua's comment is key regarding the memory; a thread takes a significant amount of memory (and don't forget the kernel-mode stack, which cannot be paged out), while an async request literally only takes a few hundred bytes.
There's also bursting to consider. The .NET threadpool has a limited injection rate, so unless you set your minWorkerThreads count to a value much higher than you normally need, then when you get a burst of traffic some requests will 503 before .NET can spin up enough threads to handle them. async keeps your threads free (as much as possible) so it handles bursting traffic better.
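As an illustration of that mitigation (the number is arbitrary, and in ASP.NET the same knob is usually set via the processModel minWorkerThreads attribute in machine.config):

    using System.Threading;

    public static class PoolWarmup
    {
        public static void Raise()
        {
            // The pool injects new threads slowly once the minimum is
            // reached, so a burst can queue or 503 before threads exist.
            ThreadPool.GetMinThreads(out int worker, out int iocp);

            // Let the pool jump straight to 200 workers on demand;
            // 200 is purely illustrative, measure before picking a value.
            ThreadPool.SetMinThreads(200, iocp);
        }
    }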
An x64 architecture has 30000+ threads in the Thread Pool so unless you have that many concurrent users on your website is async really required?
@Joshua is again correct when he points out that you're probably thinking of a request queue limit (which defaults to 1000 for the IIS queue and 5000 for the ASP.NET request limit). It's important to note that once this queue is filled (during bursty traffic), new requests are rejected with 503.
Even if you have that many concurrent users without caching I'm pretty sure SQL Server will fall over with that many requests?
Ah, now that's another question entirely.
I'm giving a talk at ThatConference 2013 specifically on async servers. One part of that talk is situations where async doesn't help (my Twitter update).
There's an excellent blog post here that takes the position that asynchronous db calls are just not worth the effort. It's important to note the assumptions in this post:
At the time that post was written, asynchronous web servers were difficult. These days we have async and more and more libraries are offering asynchronous APIs (e.g., Entity Framework).
The architecture assumes a single web server with a single SQL Server backend. This was a very common setup traditionally, but is quickly changing today.
Where async servers really shine is when your backend can also scale. E.g., a web service, Azure SQL, NoSQL cluster, etc. Example: I'm writing an MVC/WebAPI server that uses Azure SQL and Storage for its backend (for all practical purposes, I can act like they have infinite scalability); in that case, I'm going to make my server async. In situations like this, you can scale your server 10x or more by using async.
But if you just have a single SQL Server backend (and have no plans to change to Azure SQL), then there's no point in making your web server async because you're limited by your backend anyway.
When long operations can be efficiently executed in parallel. For instance, you have to execute two SQL queries and load three pictures: do all five operations as async and await them all (see the sketch below). In this case the overall time will be the longest duration of the five operations, not the sum of the durations.
Pre-fetch. If you can predict (with good probability) what the user will do (e.g., almost certainly they will want to see the details...), you may start preparing the next page (frame, window) while the user is reading the previous one.
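A minimal sketch of the first scenario, with QueryAsync and LoadImageAsync as stand-ins for real database and image I/O:

    using System.Threading.Tasks;

    public static class ParallelLoad
    {
        // Start all five operations, then await them together: the total
        // time is the longest single operation, not the sum of all five.
        public static async Task LoadPageAsync()
        {
            Task<string> sql1 = QueryAsync("query-1");
            Task<string> sql2 = QueryAsync("query-2");
            Task<byte[]> img1 = LoadImageAsync("pic1.png");
            Task<byte[]> img2 = LoadImageAsync("pic2.png");
            Task<byte[]> img3 = LoadImageAsync("pic3.png");

            await Task.WhenAll(sql1, sql2, img1, img2, img3);
        }

        // Stubs standing in for real database and image I/O.
        private static Task<string> QueryAsync(string query) => Task.FromResult("");
        private static Task<byte[]> LoadImageAsync(string path) => Task.FromResult(new byte[0]);
    }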
Where did you get 30000 from? I don't remember exactly, but I think ASP.NET uses 12 × (number of cores) threads.
I have to use async when an operation takes too long (upload, export, processing) and the user has to know about its progress.
You need async in the following scenarios:
1) When you are performing a very long operation and you don't want to freeze your UI.
2) When you have designed some task that needs to be completed in the background.
For example, you are rendering images from a database but you don't want your page to freeze during that time; async is really helpful there.
Imagine we have an aspx web page that calls a stored procedure and 15 minutes later renders a table of data in a GridView. In my hypothetical, I'm not running, say, 4 async operations which could happen in parallel, just one long database proc.
At at least 3 places on the call stack, Microsoft lets me attempt to do things the async way, with async pages and web methods, async ADO.NET calls, and things like the async keyword and asynchronous delegates.
eg:
    [WebMethod]
    public IAsyncResult BeginLengthyProcedure(AsyncCallback cb, object s) {...}

    [WebMethod]
    public string EndLengthyProcedure(IAsyncResult call) {...}
(ref: http://msdn.microsoft.com/en-us/library/aa480516.aspx)
My mental model was that IIS can only have so many "things (threads?)" handling requests at once, and that if you use the async techniques the page will not exhaust the pool of threads available to take requests. I thought that while the async method is running it consumes OS threads and maybe could still crush the server with activity, but IIS will assume it doesn't need to be counted against the maximum requests that it will deal with, and thus the application remains responsive.
In the comments on this question here, I got confused about whether and which resources are saved with async. I'm getting the impression from that discussion that no technique with the keyword "async" in it will save any resource with the keywords "thread" or "resource" in its name. Could it be that there is some resource that is freed up for other requests and maybe I just don't have the name of it?
Part B.
What is that limit? Is it so high that only intense traffic, hundreds of requests per millisecond, would hit it, or is it low enough that a few dozen users each running synchronous 15-minute pages could hit it?
Lucian Wischik, one of the spec leads involved with .NET Async, described asynchronous programming using the analogy of waiters (at a restaurant).
"A waiter’s job is to wait on a table until the patrons have finished their meal. If you want to serve two tables concurrently, you must hire two waiters."
That’s not true, is it? Why? Because you don’t need two waiters! You can just use the same waiter, and share him between tables.
Talk: Async Part 1 - the message-loop, and the Task type
So, rather than spinning up new threads (which is expensive) to simply sit around and wait, you enable your primary thread to put a sort of bookmark on the request which is taking a long time. This is analogous to allowing your primary waiter to check on other tables while the first table they served is busy choosing what to order or eating.
Now, if the long-running process is something your code is doing, like processing a list of items, async isn't going to save you anything. (I suppose the analogy there would be mowing lawns, in which case you'd need two mowers to mow two lawns concurrently.) Async is only useful when your code is waiting on a resource to become available or a request to be completed, such as an internet connection to be established or a query to return results. It saves you the expense and complexity associated with multi-threading.
Lucian provides an excellent talk on the subject for people who know little or nothing about asynchronous programming at the link above. While his talk focuses on async as applied by the .NET language, the theory extends to asynchronous programming in general.
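As a tiny illustration of the bookmark idea (HttpClient is just a stand-in for any I/O-bound call):

    using System.Net.Http;
    using System.Threading.Tasks;

    public static class WaiterDemo
    {
        private static readonly HttpClient Http = new HttpClient();

        public static async Task<int> FetchLengthAsync(string url)
        {
            // The "bookmark": the calling thread is released at the await and
            // can serve other tables; the method resumes when the response lands.
            string body = await Http.GetStringAsync(url);
            return body.Length;  // CPU-bound work like this gains nothing from async
        }
    }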
We are developing an application wherein the UI calls a service (WCF) layer, and a component that resides inside the service layer calls external web services asynchronously. For instance, the UI calls the WCF layer by uploading a file; if the file has 1000 entries, we currently call the external services asynchronously in a loop (BeginxxxMethod) 1000 times for the responses. Is this the right way of doing so? Second question: what is the maximum number of asynchronous connections that can be made? Our technology is ASP.NET and C# 4.0.
Two suggestions:
If you control the web service API, add another method that lets you pass all 1000 arguments and returns all the results. This is chunky vs. chatty, so you only go through the cross-process pain once.
If you do not control the web service API, come up with a wrapper that makes n remote calls synchronously, then call this wrapper asynchronously n times. Play around until you find the best balance between the number of async calls and the number of sequential remote calls per async call.
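A rough sketch of that wrapper idea; CallInBatchesAsync and batchSize are illustrative names, and the right batch size is something to measure rather than guess:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    public static class BatchedCaller
    {
        // Runs the calls in batches so only batchSize are in flight at once;
        // tune batchSize until throughput stops improving.
        public static async Task<List<TResult>> CallInBatchesAsync<TItem, TResult>(
            IReadOnlyList<TItem> items,
            Func<TItem, Task<TResult>> callService,
            int batchSize)
        {
            var results = new List<TResult>(items.Count);
            for (int i = 0; i < items.Count; i += batchSize)
            {
                var batch = items.Skip(i).Take(batchSize)
                                 .Select(callService);        // start one batch
                results.AddRange(await Task.WhenAll(batch));  // wait for it all
            }
            return results;
        }
    }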
Is this the right way of doing so?
Async calls are often the best way to get large batch jobs done. I think you're looking good with this approach. As developers, it's often our job to see where cutting new threads no longer optimizes response times. Myles mentioned using a wrapper around a service. I have often done this when making thousands of calls at a time... making a few thousand async calls at once actually hurt performance (in my application), so I created functionality to group calls a few hundred (asynchronously) at a time until all x-thousand calls were finished. This gave me the best of both worlds... I was able to find the number of simultaneous async calls that gave me the best performance and went from there.
As far as max threads, it depends on resources, but here is a link that explains the defaults; I'll post them below for ease:
1023 in Framework 4.0 (32-bit environment)
32768 in Framework 4.0 (64-bit environment)
250 per core in Framework 3.5
25 per core in Framework 2.0
We resolved this by implementing the Task Parallel Library.
Below is the pseudo code.
Read the file contents into a generic list.
Use Parallel.ForEach to process each request, setting the MaxDOP (maximum degree of parallelism) according to the server's processor count.
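That pseudo code might look roughly like this; ProcessEntry is a stand-in for the real per-request work:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public static class BatchProcessor
    {
        public static void Run(List<string> entries)
        {
            // Cap the degree of parallelism at the server's core count,
            // matching step 2 of the pseudo code above.
            var options = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };
            Parallel.ForEach(entries, options, entry => ProcessEntry(entry));
        }

        private static void ProcessEntry(string entry)
        {
            // stand-in for the real per-request processing
        }
    }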
Thanks
I added log4net to my application and can now see the thread IDs of user activities as they navigate through my website. Is there any specific algorithm to how thread assignment happens with IIS 7, or is it just a random number assignment (I suspect it's not completely random, because my low-traffic site shows threads mostly in the range 10-30)? Is there any maximum to the number of threads available? And I notice that my scheduler shows up with a weird thread ID; any reason for this? The scheduler is Quartz.net and the ID shows as "Scheduler_Worker-10", not just a number.
This explains all you need to know.
An excerpt:
When ASP.NET is hosted on IIS 7.0 in integrated mode, the use of threads is a bit different. First of all, the application-level queues are no more. Their performance was always really bad, there was no hope in fixing this, and so we got rid of them. But perhaps the biggest difference is that in IIS 6.0, or ISAPI mode, ASP.NET restricts the number of threads concurrently executing requests, but in IIS 7.0 integrated mode, ASP.NET restricts the number of concurrently executing requests. The difference only matters when the requests are asynchronous (the request either has an asynchronous handler or a module in the pipeline completes asynchronously). Obviously if the requests are synchronous, then the number of concurrently executing requests is the same as the number of threads concurrently executing requests, but if the requests are asynchronous then these two numbers can be quite different as you could have far more requests than threads.
So basically, if requests are synchronous, there is one thread per concurrently executing request. See here for the various parameters.
I've explained this in a blog post on my blog:
ASP.NET Performance-Instantiating Business Layers
The title doesn't coincide with your question, but I explain the way IIS handles requests and I believe you'll have your answer.
A quote from the article:
When IIS fields a request for your application it hands it over to the worker process. The worker process in turn creates an instance of your Global class (which is of type HttpApplication). From that point on the typical flow of an ASP.NET application takes place (the ASP.NET pipeline). However, what you need to know and understand is that the worker process (think of it as IIS, really) keeps the instance of your HttpApplication (an instance of your Global class) alive in order to field other requests. In fact, by default it will create and cache up to 10 instances of your Global class, if required (lazy instantiation), depending on load, the number of requests your website receives, and other factors. In Figure 1 above the instances of your ASP.NET application are shown as the red boxes. There could be up to 10 of these cached by the worker process. These are really threads that the worker process has created and cached, and each thread has its own instance of your Global class. Note that each of these threads is in the same App Domain. So any static classes you may have in your application are shared across each of these threads or application instances.
I suggest you read that article, and I'll be happy to answer any questions you may have. Please note that I've intentionally kept the article simple, in that I don't talk about what happens in the kernel or go into details of the various components that participate. Keeping it simple helps people understand the concepts a lot better (I feel).
I'll answer some of your other questions here:
Is there any specific algorithm to how thread assignment happens with IIS 7?
No, for all intents and purposes it's random. This is explained in the article I pointed to. The short answer is that if a cached thread is available then IIS will use it. If not, it will create a new thread, create an instance of your HttpApplication (Global), and assign all of the context to it. So on a site that's not busy, you may see the same threads handle requests. But there are no guarantees. If there is more than one free thread, IIS will pick a thread at random to service that request. You should note here that even on a not-so-busy site, if your requests take a long time, IIS will be forced to create new threads to service other incoming requests.
Any maximum to the number of threads available?
Yes (as explained in the article): typically 10 threads per worker process. This can be adjusted, but I've worked on a number of extremely busy websites and I've never had to. The key is to make your applications respond as fast as possible. Mind you, an application can have multiple worker processes assigned to it (configured in your app pool), so for busy sites you actually want multiple worker processes for your application; however, the implication is that you have the required hardware (CPU cores and memory).
The scheduler is Quartz.net and the id shows as "Scheduler_Worker-10", and not just a number
Threads can have names instead of IDs. If the thread has been assigned a name then you'll see that instead of an ID. Of course, for the threads IIS creates you have no such control. Mind you, I've not used (nor do I know about) Quartz, so I can't say for sure, but I'm guessing that's the case.