How to make a short-lived HttpClient (socket problem) - C#

I have a WPF (.NET 6.0) application with multiple web services in its model. Some services have their own HttpClient instances.
But sometimes I need to make multiple requests within a short period of time. The problem is that HttpClient should be long-lived/reusable, yet sockets get exhausted if we create many HttpClients for short-lived work. Some of my web services are used only once a day or even less often.
Example 1:
I have 150 proxies. Once a day I need to check all of them. Recreating HttpClient 150 times is a code smell and performance could suffer. Keeping these clients long-lived is also a bad idea.
Example 2:
I need to make multiple requests with a preset of default headers/proxy/cookies, or another combination of data that is fixed by the HttpClientHandler (unchangeable after the first request), once a day/week/month.
Question:
Is there a single solution that can solve both of these problems? Some kind of magical HttpClient or handler analog that doesn't have the socket problem and allows making a short queue of requests without loss of performance or speed.
I've seen something similar to a solution for this problem somewhere, maybe even on MSDN, but I can't find that article anywhere.
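One pattern I'm considering (a rough sketch only, not necessarily the article I remember): keep a single long-lived SocketsHttpHandler that owns the connection pool, and create cheap throw-away HttpClient instances over it with disposeHandler: false. The header name and URL below are placeholders.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class SharedHttp
{
    // Long-lived handler: owns the connection pool, so sockets are reused.
    // PooledConnectionLifetime lets the pool recycle connections on its own,
    // so keeping the handler for the app's lifetime does not pin stale connections.
    public static readonly SocketsHttpHandler Handler = new()
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(2),
        PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1)
    };

    // Short-lived client: a lightweight wrapper over the shared handler.
    // disposeHandler: false is what avoids burning sockets per client.
    public static HttpClient CreateClient() => new(Handler, disposeHandler: false);
}

class Example
{
    // An occasional burst of requests with per-burst default headers.
    static async Task<string> FetchOnceAsync(string url)
    {
        using var client = SharedHttp.CreateClient();
        client.DefaultRequestHeaders.Add("X-Example", "value"); // placeholder header
        return await client.GetStringAsync(url);
    }
}
```

If I understand it correctly, this would cover Example 2; for Example 1 the proxy lives on the handler, so a handler per proxy (or named clients via IHttpClientFactory) seems to be the usual suggestion there.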

Related

Http connections slow down or deadlock with .NET HttpClient

We have an asp.net webapi application that needs to issue a lot of calls to other web applications (it's basically a reverse proxy). To do this we use the async methods of the HttpClient.
Yes, we have seen the hints about using only one HttpClient instance and not to dispose of it.
Yes, we have seen the hints about setting configuration values, especially the problem with the lease timeout. Currently we set ConnectionLimit = CPU*12, ConnectionLeaseTimeout = 5min and MaxIdleTime = 30s.
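For reference, this is roughly how we apply those values (a sketch; the backend URI below is just a placeholder):

```csharp
// Requires: using System; using System.Net;
// e.g. in Global.asax, Application_Start
protected void Application_Start()
{
    ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 12;

    var backend = ServicePointManager.FindServicePoint(new Uri("https://backend.example.com"));
    backend.ConnectionLeaseTimeout = (int)TimeSpan.FromMinutes(5).TotalMilliseconds;  // recycle connections periodically
    backend.MaxIdleTime = (int)TimeSpan.FromSeconds(30).TotalMilliseconds;            // close idle connections sooner
}
```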
We can see that the connections behave as desired. The throughput in a load test was also very good. However, we are facing issues where the connections occasionally stop working. It seems to happen when a lot of requests are coming in (and, this being a reverse proxy, cause new requests to be issued), and it happens mostly (but not only) with the slowest of all the backend applications. The behaviour is then that requests to this endpoint take forever to finish or simply end in a timeout.
An IISReset of the server hosting our reverse proxy application resolves the problem (for a while).
We have investigated in several areas already:
Performance issues of the remote web application: Although it behaves exactly as if this were the case, performance is good when the same requests are issued locally on the remote server. Also, the values for CPU / network etc. are low.
Network issues (bandwidth, router, firewall, load balancers): Possible, but rather unlikely since everything else runs stably and our hosting provider is involved in the analysis too.
Thread pool starvation: Not impossible but rather theoretical - sure, we have a lot of async calls, but shouldn't async help with exactly this issue?
HttpCompletionOption.ResponseHeadersRead: Not a problem by itself but maybe one piece of the puzzle?
The best explanation so far focuses on the ConnectionLimit: we started setting the values mentioned above only recently, and this seems to have triggered the problems. But why would it? Shouldn't it be an improvement to reuse connections instead of opening a new one for every request? And the values we set seem rather conservative.
We have started to experiment with these values lately to see their impact in production, yet it is still unclear to us whether this is the only cause. We'd appreciate a more straightforward approach to the analysis. Unfortunately, a memory dump and netstat printouts did not help any further.
Some suggestions about how to analyze or hints about possible causes would be highly appreciated.
***** EDIT *****
Setting the connection limit to 1000 solves the issue! So the question remains: why is that the case? From what we know, the default connection limit is 2 in a non-web application and 1000 in a web application. MS suggests a default value of CPU*12 (but they didn't implement it like that?!), so our change was basically to go from 1000 down to 48. Still, we can see that only a handful of connections are open. Can anyone shed some light on this? What is the exact behaviour around opening new connections, reusing existing ones, pipelining etc.? Is there any source of information on this?
ConnectionLimit means ServicePointManager.DefaultConnectionLimit? Yes, it matters. When the value is X and there are already X requests waiting for a response, a new request will not be sent until one of the previous requests finishes.
I posted a follow-up question here: How to disable pipelining for the .NET HttpClient
Unfortunately, there were no real answers to any of my questions. We ended up leaving the ConnectionLimit at 1000 (which is only a workaround, but the only solution we were able to find).

WCF timeout handling

We are currently developing a software solution which has a client and a number of WCF services that it consumes. The issue we are having is that the WCF services time out after a period of inactivity. As far as I understand, there are 4 ways to resolve this:
1. Increase the timeouts (as far as I understood, this is generally not recommended; e.g. setting the timeout to infinite/weeks is considered bad practice)
2. Periodically ping the WCF services from the client (I'm not sure that I'm a huge fan of this, as it will add redundant, periodic calls)
3. Handle timeout issues and attempt to reconnect (this is slow and requires a lot of manual code)
4. Reliable Sessions - some sources mention that this is the built-in WCF pinging and message-reliability mechanism, but other sources mention that this will still time out.
What is the recommended/best way of resolving this issue? Is there any official reading material on this? I could not find much information myself.
Thanks!
As far as I can see, you have to use a combination of your stated points.
You are right, increasing the timeouts is bad practice and can give you a lot of problems.
If you don't want to use Reliable Sessions, then pinging is the only applicable way to keep the connection alive.
You need to handle these things in any case, whether a timeout occurs, the connection is lost, or an exception is thrown. There are plenty of ways your connection can fault.
Reliable Sessions are a good way to avoid implementing a ping yourself, but technically they do nearly the same thing: WCF automatically sends an "I am still here" request.
The conclusion is that you need point 3, plus point 2 or point 4. To reduce the manual code for point 3, you can use proxies or a wrapper around your ServiceClient that establishes a new connection if the old one faults during a request. Point 4 is easy to implement, because you only need some small additions to the binding in your config, and the traffic overhead is not that big. Point 2 is the most expensive way: you need a Thread/Task that does nothing but ping the server, and the service needs to be extended. But as you stated, Reliable Sessions can fail, and pings should put you on the safe side.
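To illustrate point 3, a rough sketch of such a wrapper (the client and contract names below are placeholders for whatever svcutil generated for you):

```csharp
using System;
using System.ServiceModel;

public class ReconnectingServiceClient
{
    // "CalculatorServiceClient" / "ICalculatorService" are placeholder names.
    private CalculatorServiceClient _client = new CalculatorServiceClient();

    public TResult Call<TResult>(Func<ICalculatorService, TResult> operation)
    {
        // If the previous channel timed out or faulted, throw it away and open a new one.
        if (_client.State == CommunicationState.Faulted || _client.State == CommunicationState.Closed)
        {
            _client.Abort();
            _client = new CalculatorServiceClient();
        }

        try
        {
            return operation(_client);
        }
        catch (CommunicationException)
        {
            _client.Abort(); // a faulted channel cannot be reused or closed normally
            throw;           // let the caller decide whether to retry
        }
    }
}
```

And for point 4, enabling Reliable Sessions is a small binding change; a programmatic equivalent (assuming NetTcpBinding) looks roughly like this:

```csharp
var binding = new NetTcpBinding(SecurityMode.Transport, reliableSessionEnabled: true);
binding.ReliableSession.InactivityTimeout = TimeSpan.FromMinutes(10); // window for the built-in keep-alive
```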
You should ask yourself what your WCF endpoint is actually doing. Is the way you have your command set up the most optimal?
Perhaps it would be better to have the endpoint that takes a long time be based on a polling system, which allows a quick query instead of waiting on the results of the endpoint's actions.
You should also consider data transfer as a possible issue. Are you transferring a lot of data back?
To get a more pointed answer, we'd need to know more about the specific endpoint as well as any other responsibilities of the service.

C# HttpClient Post choosing the right Timeout

It is valid behavior that an HTTP (TCP) request can get lost without the listeners being informed; see here for a discussion on that:
C# httpClient (block for async call) deadlock
Problem
We are using HttpClient.PostAsJsonAsync to upload a JSON file to a server. However, in worst-case scenarios this upload can take several hours.
That's why just using HttpClient.Timeout does not work for us: it is a hard timeout, and we would need to make it huge.
So what do we do when the TCP connection is gone and the client does not detect it? With our huge timeout we are stuck for a long time. Is there any other timeout we can use in such cases? Any other ideas or best practices?
I was also looking into TCP socket keep-alive, but that doesn't seem to be an option.
After some research, I finally found an article that describes the issue and provides a workaround:
http://www.thomaslevesque.com/2014/01/14/tackling-timeout-issues-when-uploading-large-files-with-httpwebrequest/
According to this article, there is a design flaw in HttpWebRequest which I was able to reproduce. It seems ridiculous that the timeout also affects the upload.
However, I can live with the provided workaround (WebRequestExtensions), since our code is synchronous anyway.
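A different idea for the HttpClient path (this is not the workaround from the article, just a rough sketch): instead of one huge wall-clock timeout, cancel the request when the upload makes no progress for a configurable idle period. All type names and timeouts below are illustrative, and HttpClient.Timeout still has to be large (or infinite) for this to matter.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Wraps the upload stream and re-arms a cancellation watchdog every time bytes are read.
// If the TCP connection dies, reads stop, the watchdog is never reset, and the request
// is cancelled after 'idleTimeout' instead of waiting for a huge wall-clock timeout.
class IdleTimeoutStream : Stream
{
    private readonly Stream _inner;
    private readonly CancellationTokenSource _cts;
    private readonly TimeSpan _idleTimeout;

    public IdleTimeoutStream(Stream inner, CancellationTokenSource cts, TimeSpan idleTimeout)
    {
        _inner = inner; _cts = cts; _idleTimeout = idleTimeout;
        _cts.CancelAfter(idleTimeout);                    // arm the watchdog
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);
        _cts.CancelAfter(_idleTimeout);                   // progress was made, reset the watchdog
        return n;
    }

    public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken ct)
    {
        int n = await _inner.ReadAsync(buffer, offset, count, ct);
        _cts.CancelAfter(_idleTimeout);
        return n;
    }

    // Boilerplate pass-throughs required by Stream:
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position { get => _inner.Position; set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

class UploadExample
{
    // Usage: keep HttpClient.Timeout huge (or infinite) and let the watchdog do the work.
    static async Task UploadAsync(HttpClient client, string url, Stream jsonStream)
    {
        using var cts = new CancellationTokenSource();
        var content = new StreamContent(new IdleTimeoutStream(jsonStream, cts, TimeSpan.FromMinutes(1)));
        content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/json");
        var response = await client.PostAsync(url, content, cts.Token);
        response.EnsureSuccessStatusCode();
    }
}
```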

Scaling up Multiple HttpWebRequests?

I'm building a server application that needs to perform a lot of HTTP requests to a couple of other servers on an ongoing basis. Currently, I'm basically setting up about 30 threads and continuously running HttpWebRequests synchronously on each thread, achieving a throughput of about 30 requests per second.
I am indeed setting the ServicePoint ConnectionLimit in the app.config, so that's not the limiting factor.
I need to scale this up drastically. At the very least I'll need more CPU horsepower, but I'm wondering if I would gain any advantage by using the async methods of the HttpWebRequest object (e.g. .BeginGetResponse()) as opposed to creating threads myself and using the synchronous methods (e.g. .GetResponse()) on those threads.
If I go with the async methods, I obviously have to significantly redesign my app, so I'm wondering if anyone has some insight before I go and recode everything, in case I'm out to lunch.
Thanks!
If you are on Windows NT, the System.Net.Sockets.Socket class always uses I/O completion ports for async operations, and HttpWebRequest in async mode uses async sockets, hence it will be using IOCP.
Without doing detailed benchmarking, it is difficult to say whether the bottleneck is inside HttpWebRequest, further up the stack in your application, or on the remote side, in the server. But offhand, async will certainly give you better performance, because it will end up using IOCP under the covers. And reimplementing the app for async is not that difficult.
So I would suggest that you first change your app architecture to async. Then see what maximum throughput you are getting. Then you can start benchmarking, find out where the bottleneck is, and remove it.
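For reference, a sketch of the APM-style async pattern being suggested (the URL is a placeholder; real code would also handle throttling, ServicePoint limits, timeouts, etc.):

```csharp
using System;
using System.IO;
using System.Net;

static class AsyncFetcher
{
    public static void StartRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Returns immediately; the completion callback runs on an IOCP thread,
        // so no dedicated thread is blocked while waiting for the response.
        request.BeginGetResponse(OnResponse, request);
    }

    private static void OnResponse(IAsyncResult ar)
    {
        var request = (HttpWebRequest)ar.AsyncState;
        try
        {
            using (var response = (HttpWebResponse)request.EndGetResponse(ar))
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string body = reader.ReadToEnd();   // note: still a blocking read on the callback thread
                // ... process body ...
            }
        }
        catch (WebException ex)
        {
            // Per-request failure handling so one bad endpoint doesn't take down the loop.
            Console.Error.WriteLine(ex.Message);
        }
    }
}
```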
The fastest result so far for me is using 75 threads running synchronous HttpWebRequest:
about 140 requests per second on a Windows 2003 server, 4 cores at 3 GHz, 100 Mb connection.
Async HttpWebRequest / WinSock got stuck at about 30-50 req/sec. I did not test sync WinSock, but I guess it would give about the same result as HttpWebRequest.
Tests were run against 1,200,000 blog feeds.
I've been struggling with this for the last month, so it would be interesting to know if someone has managed to squeeze more out of .NET.
EDIT
New test: got 350 req/sec with the xfserver IOCP component. Used a bunch of threads with one instance each before any greater result. The "client part" of the lib had a couple of really annoying bugs that made implementation harder than the "server part". Not what you're asking for and not recommended, but it is some kind of step.
Next: the former WinSock test did not use the .NET 3.5 SocketAsyncEventArgs; that will be next.
ANSWER
The answer to your question: no, it will not be worth the effort.
The async HttpWebRequest methods offload the main thread while keeping the download in the background; they do not improve the number or scalability of requests (at least not in 3.5; it might be different in 4.0?).
However, what might be worth looking at is building your own wrapper around async sockets/SocketAsyncEventArgs, where IOCP works, and perhaps implementing a begin/end pattern similar to HttpWebRequest (for the easiest possible integration into current code). The improvement is really enormous.

What is the best way scale out work to multiple machines?

We're developing a .NET app that must make up to tens of thousands of small web service calls to a 3rd-party web service. We would prefer a more 'chunky' call, but the 3rd party does not support it. We've designed the client to use a configurable number of worker threads, and through testing we have code that is fairly well optimized for one multicore machine. However, we still want to improve the speed and are looking at spreading the work across multiple machines. We're well versed in typical client/server/database apps, but new to designing for multiple machines. So, a few questions related to that:
Is there any other client-side optimization, besides multithreading, that we should look at that could improve the speed of an HTTP request/response? (I should note this is a non-standard web service, so it is implemented using WebClient, not a WCF or SOAP client.)
Our current thinking is to use WCF to publish chunks of work to MSMQ, and run clients on one or more machines to pull work off of the queue. We have experience with WCF + MSMQ, but want to be sure we're not missing better options. Are there other, better ways to do this today?
I've seen some 3rd party tools like DigiPede and Microsoft's HPC offerings, but these seem like overkill. Any experience with those products or reasons we should consider them over roll-our-own?
It sounds like your goal is to execute all these web service calls as quickly as you can and then tabulate the results. Given that, your biggest lever is going to be scaling the number of concurrent requests you can make.
Be sure to look at your client-side connection limits. By default, I think the system default is 2 connections. I haven't tried this myself, but by upping the number of connections with this property you should theoretically see a multiplier effect, generating more requests by generating more connections from a single machine. There's more info on the MS forums.
The MSMQ option works well. I'm running that configuration myself. ActiveMQ is also a fine solution, but MSMQ is already on the server.
You have a good starting point. Get that in operation, then move on to performance and throughput.
At CodeMash this year, Wesley Faler did an interesting presentation on this sort of problem. His solution was to store "jobs" in a DB, then use clients to pull down work and mark status when complete.
He then pushed the whole infrastructure up to Amazon's EC2.
Here are his slides from the presentation - they should give you the basic idea:
I've done something similar with multiple PCs locally - the basics of managing the workload were similar to Faler's approach.
If you have optimized the code, you could look into optimizing the network side to minimize the number of packets sent:
reuse HTTP sessions (i.e.: multiple transactions into one session by keeping the connection open, reduces TCP overhead)
reduce the number of HTTP headers to the minimum in the request to save bandwidth
if supported by the server, use gzip to compress the body of the request (you need to balance the CPU cost of the compression against the bandwidth you save); see the sketch after this list
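A rough sketch of the last two points with HttpWebRequest (whether the server accepts a gzip-compressed request body is an assumption you would need to verify; the URL and content type are placeholders):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

static class CompressedPoster
{
    static byte[] Gzip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static string PostCompressed(string url, string payload)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/xml";       // placeholder content type
        request.Headers["Content-Encoding"] = "gzip";  // only if the server supports compressed request bodies
        request.KeepAlive = true;                      // reuse the TCP connection (point 1 above)

        byte[] body = Gzip(Encoding.UTF8.GetBytes(payload));
        request.ContentLength = body.Length;
        using (var stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```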
You might want to consider Rhino Service Bus instead of MSMQ. The source is available here.
