i'm a newbie to asp.net core
i'm write a web api service, which store passed data to database. in theory there is about 300-400 request per second to server in future and response time must be less than 10 seconds
but first of all i try to run some load test with locust.
i write simple app with one controller and only one post method which simple return Ok() without any processing.
i try to create load to this service for 1000 users. my service run under ubuntu 16.04 with .net core 2.1 (2 Xeon 8175M with 8 GB of RAM). Locust run from dedicated computer
but i see only ~400 RPS and response time about 1400 ms. For empty action it is very big value.
i'm turn off all loging, run in production mode but no luck - still ~400 rps.
in system monitor (i use nmon) i see that both cpu loads only for 12-15% (total 24-30%). I have about 3 GB free ram, no network usage (about 200-300 KB/s), no disk usage, so system have hardware resource for handling request.
so i think, that there is problem with some configuration or may be with system resource like sockets, handles etc
i also try to use libuv instead of managed socket, but result is same
in kestrel configuration i setup explicitly Limit.MaxConnection and MaxUpgradedConnection to null (but it is default value)
so, i have two question:
- in theory, can kestrel provide high rps?
- if first is true, can you give me some advise for start point (links, articles and so on)
Related
I'm using Google Cloud Storage to store and retrieve some files, and my problem is that the response times I'm getting are inconsistent, and sometimes very slow.
My application is an ASP.NET Core app running in the Google Container Engine. The Container Engine cluster is in europe-west1-c. The Cloud Storage bucket is Multi-Regional, in the location EU, and it's a secure bucket (not publicly accessible). I'm using the latest version of the official Google.Cloud.Storage.V1 SDK package to access the Cloud Storage. (I tried both 1.0.0 and the new 2.0.0-beta01.) I'm using a singleton instance of the StorageClient object, which should do connection pooling under the hood.
I'm measuring and logging the time it takes to download a file from the Cloud Storage, this is the measurement I do.
var sw = Stopwatch.CreateNew();
await client.DownloadObjectAsync(googleCloudOptions.StorageBucketName, filepath, ms);
sw.Stop();
So I'm directly measuring the SDK call without any of my own application logic.
The numbers I'm getting for this measurement look like this in an average period.
44ms
56ms
501ms
274ms
90ms
237ms
145ms
979ms
446ms
148ms
You can see that the variance is already pretty large to begin with (and the response time is often really sluggish).
But occasionally I even get response times like this (the slowest I've seen was over 10 seconds).
172ms
4,348ms
72ms
51ms
179ms
2,508ms
2,592ms
100ms
Which is really bad considering that the file I'm downloading is ~2 KB in size, and my application is doing less than 1 requests per second, and I'm running my application inside the Google Cloud. I don't think that the bucket not being warmed up can be a problem, since I'm mainly downloading the same handful of files, and I'm doing at least a couple of requests every minute.
Does anyone know what can be the reason for this slowness, or how I could investigate what's going wrong?
Update: Following #jterrace's suggestion, I've run gsutil perfdiag on the production environment, and uploaded both the terminal output and the generated json report here.
I also collected some more measurements, here you can see the statistics for the last 7 days.
So you can see that slow requests don't happen super-often, but over half a second response time is not rare, and we even have a handful of requests over 5 seconds every day.
What I'd like to figure out is whether we're doing something wrong, or this is expected with Cloud Storage and we have to be prepared to be able to handle these slow responses on our side.
We have the same issue with GCS. The only answer we got (from GCS support) is to use exponential backoff.
First request should be with 200ms timeout, next try 400ms and so on.
A common problem I've seen in GCE is that due to gcloud clients having a heavy DNS dependency, that bursts of traffic are being throttled by DNS queries, not the actual clients (storage or otherwise). I highly recommend you adding etcd or some other DNS cache to your container. Any real amount of traffic in GCE will choke otherwise.
We have an ASP.NET MVC application deployed to an Azure Website that connects to MongoDB and does both read and write operations. The application does this iteratively. A few thousand times per minute.
We initialize the C# driver using Autofac and we set the MaxConnectionIdleTime to 45 seconds as suggested in https://groups.google.com/forum/#!topic/mongodb-user/_Z8YepNHnbI and a few other places.
We are still getting a large number of the below error:
Unable to read data from the transport connection: A connection
attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because
connected host has failed to respond. Method
Message:":{"ClassName":"System.IO.IOException","Message":"Unable to
read data from the transport connection: A connection attempt failed
because the connected party did not properly respond after a period of
time, or established connection failed because connected host has
failed to respond.
We get this error while connecting to both a MongoDB instance deployed on a VM in the same datacenter/region on Azure and also while connecting to an external PaaS MongoDB provider.
I run the same code in my local computer and connect to the same DB and I don't receive these errors. It's only when I deploy the code to an Azure Website.
Any suggestions?
A few thousand requests per minute is a big load, and the only way to do it right, is by controlling and limiting the maximum number of threads which could be running at any one time.
As there's not much information posted as to how you've implemented this. I'm going to cover a few possible circumstances.
Time to experiment...
The constants:
Items to process:
50 per second, or in other words...
3,000 per minute, and one more way to look at it...
180,000 per hour
The variables:
Data transfer rates:
How much data you can transfer per second is going to play a role no matter what we do, and this will vary through out the day depending on the time of day.
The only thing we can do is fire off more requests from different cpu's to distribute the weight of traffic we're sending back n forth.
Processing power:
I'm assuming you have this in a WebJob as opposed to having this coded inside the MVC site it's self. It's highly inefficient and not fit for the purpose that you're trying to achieve. By using a WebJob we can queue work items to be processed by other WebJobs. The queue in question is the Azure Queue Storage.
Azure Queue storage is a service for storing large numbers of messages
that can be accessed from anywhere in the world via authenticated
calls using HTTP or HTTPS. A single queue message can be up to 64 KB
in size, and a queue can contain millions of messages, up to the total
capacity limit of a storage account. A storage account can contain up
to 200 TB of blob, queue, and table data. See Azure Storage
Scalability and Performance Targets for details about storage account
capacity.
Common uses of Queue storage include:
Creating a backlog of work to process asynchronously
Passing messages from an Azure Web role to an Azure Worker role
The issues:
We're attempting to complete 50 transactions per second, so each transaction should be done in under 1 second if we were utilising 50 threads. Our 45 second time out serves no purpose at this point.
We're expecting 50 threads to run concurrently, and all complete in under a second, every second, on a single cpu. (I'm exaggerating a point here, just to make a point... but imagine downloading 50 text files every single second. Processing it, then trying to shoot it back over to a colleague in the hopes they'll even be ready to catch it)
We need to have a retry logic in place, if after 3 attempts the item isn't processed, they need to be placed back in to the queue. Ideally we should be providing more time to the server to respond than just one second with each failure, lets say that we gave it a 2 second break on first failure, then 4 seconds, then 10, this will greatly increase the odds of us persisting / retrieving the data that we needed.
We're assuming that our MongoDb can handle this number of requests per second. If you haven't already, start looking at ways to scale it out, the issue isn't in the fact that it's a MongoDb, the data layer could have been anything, it's the fact that we're making this number of requests from a single source that is going to be the most likely cause of your issues.
The solution:
Set up a WebJob and name it EnqueueJob. This WebJob will have one sole purpose, to queue items of work to be process in the Queue Storage.
Create a Queue Storage Container named WorkItemQueue, this queue will act as a trigger to the next step and kick off our scaling out operations.
Create another WebJob named DequeueJob. This WebJob will also have one sole purpose, to dequeue the work items from the WorkItemQueue and fire out the requests to your data store.
Configure the DequeueJob to spin up once an item has been placed inside the WorkItemQueue, start 5 separate threads on each and while the queue is not empty, dequeue work items for each thread and attempt to execute the dequeued job.
Attempt 1, if fail, wait & retry.
Attempt 2, if fail, wait & retry.
Attempt 3, if fail, enqueue item back to WorkItemQueue
Configure your website to autoscale out to x amount of cpu's (note that your website and web jobs share the same resources)
Here's a short 10 minute video that gives an overview on how to utilise queue storages and web jobs.
Edit:
Another reason you may be getting those errors could be because of two other factors as well, again caused by it being in an MVC app...
If you're compiling the application with the DEBUG attribute applied but pushing the RELEASE version instead, you could be running into issues due to the settings in your web.config, without the DEBUG attribute, an ASP.NET web application will run a request for a maximum of 90 seconds, if the request takes longer than this, it will dispose of the request.
To increase the timeout to longer than 90 seconds you will need to change the [httpRuntime][3] property in your web.config...
<!-- Increase timeout to five minutes -->
<httpRuntime executionTimeout="300" />
The other thing that you need to be aware of is the request timeout settings of your browser > web app, I'd say that if you insist on keeping the code in MVC as opposed to extracting it and putting it into a WebJob, then you can use the following code to fire a request off to your web app and offset the timeout of the request.
string html = string.Empty;
string uri = "http://google.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Timeout = TimeSpan.FromMinutes(5);
using (HttpWebResponse response = (HttpWebResonse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
html = reader.ReadToEnd();
}
Are you using mongoDB in a VM? It seems to be a network problem. This kind of transient faults should occur, so the best you can do is implement a retry pattern or use a lib such as Polly to do that:
Policy
.Handle<IOException>()
.Retry(3, (exception, retryCount) =>
{
// do something
});
https://github.com/michael-wolfenden/Polly
We are running a .NET 4.5 console application that performs USNChanged polling on a remote LDAP server and then synchronizes the records into a local AD LDS on Windows Server 2008R2. The DirSync control was not an option on the remote server but getting the records isn't the problem.
The directory is quite large, containing millions of user records. The console app successfully pulls down the records and builds a local cache. It then streams through the cache and does lookup/update/insert as required for each record on the local directory. The various network constraints in the environment had performance running between 8 and 80 records per second. As a result, we used the Task Parallel Library to improve performance:
var totalThreads = Environment.ProcessorCount *2;
var options = new ParallelOptions { MaxDegreeOfParallelism = totalThreads };
Parallel.ForEach(Data.ActiveUsersForSync.Batch(250), options, (batch, loopstate) =>
{
if (!loopstate.IsExceptional
&& !loopstate.IsStopped
&& !loopstate.ShouldExitCurrentIteration)
{
ProcessBatchSync(batch);
}
});
After introducing this block, performance increased to between 1000 and 1500 records per second. Some important notes:
This is running on an eight core machine so it allows up to 16 operations simultaneously Environment.ProcessorCount * 2;
The MoreLinq library batching mechanism is used so each task in the parallel set is processing 250 records on a given connection (from pool) before returning
Each batch is processed synchronously (no additional parallelism)
The implementation relies on System.DirectoryServices.Protocols (Win32), NOT System.DirectoryServices (ADSI)
Whenever a periodic full synchronization is executed, the system will get through about 1.1 million records and then AD LDS returns "The Server Is Busy" and the system throws a DirectoryOperationException. The number it completes before erroring is not constant but it is always near 1.1 million.
According to Microsoft (http://support.microsoft.com/kb/315071) the MaxActiveQueries value in AD LDS is no longer enforced in Windows Server 2008+. I can't change the value anyway, it doesn't show. They also show the "Server is Busy" error coming back only from a violation of that value or from having too many open notification requests per connection. This code only sends simple lookup/update/insert LDAP commands and requests no notifications from the server when something is changed.
As I understand it, I've got at most 16 threads working in tandem to query the LDS. While they are doing it very quickly, that's the max number of queries coming in in a given tick since each of these are processed single-threaded.
Is the Microsoft document incorrect? Am I misunderstanding another component here? Any assistance is appreciated.
I'm trying to understand how MS Enterprise Library's data access block manages its connections to SQL. The issue I have is that under a steady load (from a load test), at 10 minute intervals the number of connections to SQL increases quickly - which causes noticeable jump in page response times from the website.
This is the scenario I'm running:
Visual Studio load test tools, running against 3 web servers behind a load balancer
The tools give full visibility over the performance counters to all web + DB boxes
The tests take ~10 seconds each, and perform 4 inserts (form data), and some trivial selects
There are 60 tests running concurrently. There is no increase or decrease in load during the entire test.
The test is run for between 20 minutes and 3 hours, with consistent results.
And this is the issue we see:
Exactly every 10 minutes, the performance counter from SQL for SQL General: User Connections increases - by ~20 connections total
The pages performing the HTTP post / DB insert are the ones most significantly affected. The other pages show moderate, but noticeable rises.
The CPU/memory load on the web servers is unaffected
This increase corresponds with a notable bump in page response times - E.g. from .3 seconds average to up to 5 seconds
After ~5 minutes it releases many of the connections, with no affect on web performance
The following 5 minutes of testing gives the same (normal) web performance
Ultimately, the graph looks like a square wave
Happens all over again, 10 minutes after the first rise
What I've looked at:
Database calls:
All calls in the database start with:
SqlDatabase database = new SqlDatabase([...]);
And execute either proc with no required output:
return database.ExecuteScalar([...], [...]);
Or read wrapped in a using statement:
using (SqlDataReader reader = (SqlDataReader)database.ExecuteReader([...], [...]))
{
[...]
}
There are no direct uses of SqlConnection, no .Open() or .Close() methods, and no exceptions being thrown
Database verification:
We've run SQL profiler over the login / logout events, and taken snapshots with the sp_who2 command, showing who owns the connections. The latter shows that indeed the web site (seen by machine + credential) are holding the connections.
There are no scheduled jobs (DB or web server), and the user connection load is stable when there is no load from the web servers.
Connection pool config
I know the min & max size of the connection pool can be altered with the connection string.
E.g.:
"Data Source=[server];Initial Catalog=[x];Integrated Security=SSPI;Max
Pool Size=75;Min Pool Size=5;"
A fall back measure may be to set the minimum size to ~10
I understand the default max is 100, and the default min is 0 (from here)
I'm a little bit lithe to think of connection pooling (specific to this setting) and the User Connections performance counter from SQL. This article introduces these connection pools as being used to manage connection string, which seems different to what I assume it does (hold a pool of connections generally available, to avoid the cost of re-opening them on SQL)
I still haven't seen any configuration parameters that are handily defaulting to 5 or 10 minutes, to zero in on...
So, any help is appreciated.
I know that 10 minute spikes sounds like a change in load, or new activity is happening - but we've worked quite hard to isolate those & any other factors - and for this question, I am hoping to understand EL scaling its connections up & down.
Thanks.
So, it turns out that SQL user connections are created & added to the pool whenever all other connections are busy. So when long-running queries occur, or the DB is otherwise unresponsive, it will choose to expand to manage the load.
The cause of this in our case happened to be a SQL replication job (unfortunate, but found...) - And the changes in the # of User Connections was just a symptom, not a possible cause.
Although the cause turned out to be elsewhere, I now feel I understand the connection pool management, from this (and assumably other) SQL libraries.
I have an ASP.NET 3.5 server application written in C#. It makes outbound requests to a REST API using HttpWebRequest and HttpWebResponse.
I have setup a test application to send these requests on separate threads (to vaguely mimic concurrency against the server).
Please note this is more of a Mono/Environment question than a code question; so please keep in mind that the code below is not verbatim; just a cut/paste of the functional bits.
Here is some pseudo-code:
// threaded client piece
int numThreads = 1;
ManualResetEvent doneEvent;
using (doneEvent = new ManualResetEvent(false))
{
for (int i = 0; i < numThreads; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(Test), random_url_to_same_host);
}
doneEvent.WaitOne();
}
void Test(object some_url)
{
// setup service point here just to show what config settings Im using
ServicePoint lgsp = ServicePointManager.FindServicePoint(new Uri(some_url.ToString()));
// set these to optimal for MONO and .NET
lgsp.Expect100Continue = false;
lgsp.ConnectionLimit = 100;
lgsp.UseNagleAlgorithm = true;
lgsp.MaxIdleTime = 100000;
_request = (HttpWebRequest)WebRequest.Create(some_url);
using (HttpWebResponse _response = (HttpWebResponse)_request.GetResponse())
{
// do stuff
} // releases the response object
// close out threading stuff
if (Interlocked.Decrement(ref numThreads) == 0)
{
doneEvent.Set();
}
}
If I run the application on my local development machine (Windows 7) in the Visual Studio web server, I can up the numThreads and receive the same avg response time with minimal variation whether it's 1 "user" or 100.
Publishing and deploying the application to Apache2 on a Mono 2.10.2 environment, the response times scale almost linearly. (i.e, 1 thread = 300ms, 5 thread = 1500ms, 10 threads = 3000ms). This happens regardless of server endpoint (different hostname, different network, etc).
Using IPTRAF (and other network tools), it appears as though the application only opens 1 or 2 ports to route all connections through and the remaining responses have to wait.
We have built a similar PHP application and deployed in Mono with the same requests and the responses scale appropriately.
I have run through every single configuration setting I can think of for Mono and Apache and the ONLY setting that is different between the two environments (at least in code) is that sometimes the ServicePoint SupportsPipelining=false in Mono, while it is true from my machine.
It seems as though the ConnectionLimit (default of 2) is not being changed in Mono for some reason but I am setting it to a higher value both in code and the web.config for the specified host(s).
Either me and my team are overlooking something significant or this is some sort of bug in Mono.
I believe that you're hitting a bottleneck in the HttpWebRequest. The web requests each use a common service point infrastructure within the .NET framework. This appears to be intended to allow requests to the same host to be reused, but in my experience results in two bottlenecks.
First, the service points allow only two concurrent connections to a given host by default in order to be compliant to the HTTP specification. This can be overridden by setting the static property ServicePointManager.DefaultConnectionLimit to a higher value. See this MSDN page for more details. It looks as if you're already addressing this for the individual service point itself, but due to the concurrency locking scheme at the service point level, doing so may be contributing to the bottleneck.
Second, there appears to be an issue with lock granularity in the ServicePoint class itself. If you decompile and look at the source for the lock keyword, you'll find that it uses the instance itself to synchronize and does so in many places. With the service point instance being shared among web requests for a given host, in my experience this tends to bottleneck as more HttpWebRequests are opened and causes it to scale poorly. This second point is mostly personal observation and poking around the source, so take it with a grain of salt; I wouldn't consider it an authoritative source.
Unfortunately, I did not find a reasonable substitute at the time that I was working with it. Now that the ASP.NET Web API has been released, you may wish to give the HttpClient a look. Hope that helps.
I know this is pretty old but I'm putting this here in case it might help somebody else who runs into this issue. We ran into the same problem with parallel outbound HTTPS requests. There are a few issues at play.
The first issue is that ServicePointManager.DefaultConnectionLimit did not change the connection limit as far as I can tell. Setting this to 50, creating a new connection, and then checking the connection limit on the service point for the new connection says 2. Setting it on that service point to 50 once appears to work and persist for all connections that will end up going through that service point.
The second issue we ran into was with threading. The current implementation of the mono thread pool appears to create at most 2 new threads per second. This is an eternity if you are doing many parallel requests that start at exactly the same time. To counteract this, we tried setting ThreadPool.SetMinThreads to a higher number. It appears that Mono only creates up to 1 new thread when you make this call, regardless of the delta between the current number of threads and the desired number. We were able to work around this by calling SetMinThreads in a loop until the thread pool had the desired number of idle threads.
I opened a bug about the latter issue because that's the one I'm most confident is not working as intended: https://bugzilla.xamarin.com/show_bug.cgi?id=7055
If #jake-moshenko is right about ServicePointManager.DefaultConnectionLimit not having any effect if changed in Mono, please file this as a bug in http://bugzilla.xamarin.com/.
However I would try some things before discarding this completely as a Mono issue:
Try using the SGen garbage collector instead of the old boehm one, by passing --gc=sgen as a flag to mono.
If the above doesn't help, upgrade to Mono 3.2 (which BTW defaults to SGEN GC too), because there has been a lot of fixes since you asked the question.
If the above doesn't help, build your own Mono (master branch), as this important pull request about threading has been merged recently.
If the above doesn't help, build your own Mono with this pull request added. If it fixes your problem, please add a "+1" to the pull request. It might be a fix for bug 7055.