Async TcpClient disables the Internet after 30 seconds of sending data - c#

Out of interest, I wrote a small library for sending simple HTTP requests, built on an asynchronous TcpClient. When I started testing it I ran into a problem: when I launch about 200 tasks that send requests, the program sends and receives data for about 30 seconds, after which Internet access is cut off and the requests stop.
var tasks = new List<Task>();
for (int i = 0; i < 200; i++)
{
    tasks.Add(Task.Run(async () =>
    {
        while (true)
        {
            var message = new GetHttpMessage("www.stackoverflow.com")
            {
                Headers = { { "User-agent", "Test-agent-1" } }
            };
            var req = new Request(message);
            var resp = await req.SendAsync();
        }
    }));
}
await Task.WhenAll(tasks);
The Internet connection drops completely; the browser stops working too. When I looked at the network activity, I saw that at the start connections actively appear and disappear, but at the moment everything stops they simply hang in a certain state.
Screenshot of network activity
I'm behind a router; could the problem be in it?
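One way to check whether the cause is connection churn (running out of sockets or ephemeral ports) rather than the router is to cap the number of requests in flight, roughly like this (a sketch only; the cap of 20 is arbitrary, and GetHttpMessage/Request are the library's own types):
// Bound the number of simultaneous requests with a semaphore.
var gate = new SemaphoreSlim(20); // arbitrary cap on concurrent requests
var tasks = new List<Task>();
for (int i = 0; i < 200; i++)
{
    tasks.Add(Task.Run(async () =>
    {
        while (true)
        {
            await gate.WaitAsync();
            try
            {
                var message = new GetHttpMessage("www.stackoverflow.com")
                {
                    Headers = { { "User-agent", "Test-agent-1" } }
                };
                var resp = await new Request(message).SendAsync();
            }
            finally
            {
                gate.Release();
            }
        }
    }));
}
await Task.WhenAll(tasks);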

Related

ASP.NET Core Web API parallel calls, batch processing with some data lost

I created a web API in ASP.NET Core 3.x that saves data into an Azure Service Bus queue; that data is later processed for reporting.
The API load is very high, so we decided to buffer the data in memory for each request. Once the buffer grows past a certain limit (>50 items), the next request (the 51st) takes all the data from memory, saves it to the Service Bus in one go, and clears the memory cache.
For sequential requests all of this works fine, but when the load comes in parallel some data is lost. I think that's because one batch request takes some time, and while it is running the other requests step on the shared state.
I did some research, found the article below, and used SemaphoreSlim. It works fine, but is it a good approach? As you can see in the code below I am blocking every request, while I actually only want to lock while I am processing the batch. I tried to move the lock inside the if condition, but that didn't work.
https://medium.com/swlh/async-lock-mechanism-on-asynchronous-programing-d43f15ad0b3
using (await lockThread.LockAsync())
{
    var topVisitedTiles = _service.GetFromCache(CacheKey);
    if (topVisitedTiles?.Count >= 50)
    {
        topVisitedTiles?.Add(link);
        await _service.AddNewQuickLinkAsync(topVisitedTiles);
        _service.SetToCache(CacheKey, new List<TopVisitedTilesItem>());
        return Ok(link.Title);
    }
    topVisitedTiles?.Add(link);
    _service.SetToCache(CacheKey, topVisitedTiles);
}
return Ok(link.Title);
From my research I gathered that ConcurrentBag and BlockingCollection might help, but I'm not sure how to use them in my case. Any pointers would be appreciated.
You can use the TPL Dataflow library (System.Threading.Tasks.Dataflow) if you don't want to deep-dive into parallel implementations of bags or queues.
In your case something like this can be used:
// Define a BatchBlock that groups messages into batches of 10.
var batchBlock = new BatchBlock<string>(10);
// Define an ActionBlock that processes the batches received from the BatchBlock.
var processingBlock = new ActionBlock<string[]>(messages =>
{
    Console.WriteLine("-------------");
    Console.WriteLine($"Number of messages: {messages.Length}");
    Console.WriteLine($"Messages: {string.Join(", ", messages)}");
});
// Link the processing block to the batch block and propagate completion.
batchBlock.LinkTo(processingBlock);
batchBlock.Completion.ContinueWith(t =>
{
    processingBlock.Complete();
});

var task1 = Task.Run(async () =>
{
    for (int i = 0; i < 50; i++)
    {
        await batchBlock.SendAsync($"Message {i}");
    }
});
var task2 = Task.Run(async () =>
{
    for (int i = 50; i < 100; i++)
    {
        await batchBlock.SendAsync($"Message {i}");
    }
});
await Task.WhenAll(task1, task2);

// Complete the pipeline. You can leave it active if you want.
batchBlock.Complete();
await processingBlock.Completion;
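Applied to your case, a rough sketch could look like the following (the IQuickLinkService interface and the batch size of 50 are assumptions based on your question; the batcher must be a singleton, e.g. registered once in DI, so that all requests share one pipeline):
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class LinkBatcher
{
    private readonly BatchBlock<TopVisitedTilesItem> _batchBlock;
    private readonly ActionBlock<TopVisitedTilesItem[]> _sendBlock;

    public LinkBatcher(IQuickLinkService service) // hypothetical wrapper around your _service
    {
        // Group incoming links into batches of 50...
        _batchBlock = new BatchBlock<TopVisitedTilesItem>(50);
        // ...and send each full batch to Service Bus in one call.
        _sendBlock = new ActionBlock<TopVisitedTilesItem[]>(
            batch => service.AddNewQuickLinkAsync(batch.ToList()));
        _batchBlock.LinkTo(_sendBlock);
    }

    // Each request just posts its link; no lock and no shared cache needed.
    public Task AddAsync(TopVisitedTilesItem link) => _batchBlock.SendAsync(link);
}
Your action method then reduces to `await _batcher.AddAsync(link); return Ok(link.Title);`, and the dataflow blocks take care of the thread safety.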

Cache HTTP request name resolution for different URLs to same host. Possible?

The problem summary: I need to make a call to HTTP resource A while reusing the name resolution from a previous HTTP request to resource B on the same host.
CASE 1. Consecutive calls to the same resource produce a faster result after the 1st call.
The profiler tells me that the difference between the 1st and 2nd call is DNS name resolution (GetHostAddresses).
// 1st call
var request = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/b.txt");
using (var response = (HttpWebResponse)request.GetResponse()) {}
// 2nd call to the same resource
request = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/b.txt");
using (var response = (HttpWebResponse)request.GetResponse()) {}
CASE 2. Consecutive calls to different resources on the same host incur the same delay.
The profiler tells me that both calls incur DNS name resolution.
// call to resource a.txt
var request = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/a.txt");
using (var response = (HttpWebResponse)request.GetResponse()) {}
// call to resource b.txt on the same host
request = (HttpWebRequest)WebRequest.Create("https://www.somehost.com/resources/b.txt");
using (var response = (HttpWebResponse)request.GetResponse()) {}
I wonder why, in case 2, the second call can't use the DNS cache from the first call; it's the same host.
And the main question: how can I change that?
EDIT: the behaviour above also applies to the HttpClient class. It appears this is specific to the few web servers I use, and the issue does not happen on other servers. I can't figure out what exactly happens, but I suspect the web servers in question (Amazon CloudFront and Akamai) force-close the connection after it has been served, ignoring my keep-alive request headers. I am going to close this for now, as it is not possible to formulate a coherent question.
Your problem doesn't exist with System.Net.Http.HttpClient; try it instead. It can reuse existing connections (no DNS lookup is needed for such calls), which looks like exactly what you want to achieve. As a bonus it supports HTTP/2 (which can be enabled with a property assignment when the HttpClient instance is created).
WebRequest is ancient and not recommended by Microsoft for new development. In .NET 5, HttpClient is also considerably faster (roughly twice as fast?).
Create the HttpClient instance once per application (link).
private static readonly HttpClient client = new HttpClient();
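If you want the HTTP/2 bonus mentioned above, the property assignment could look like this (a sketch; DefaultRequestVersion is available on .NET Core 3.0 and later, and the actual protocol version is still negotiated with the server):
private static readonly HttpClient client = new HttpClient
{
    // Prefer HTTP/2 when the server supports it (requires using System.Net;).
    DefaultRequestVersion = HttpVersion.Version20
};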
The analogue of your request. Note that await is available only in methods marked async.
string text = await client.GetStringAsync("https://www.somehost.com/resources/b.txt");
You may also issue multiple requests at once without spawning threads yourself.
string[] urls = new string[]
{
    "https://www.somehost.com/resources/a.txt",
    "https://www.somehost.com/resources/b.txt"
};
List<Task<string>> tasks = new List<Task<string>>();
foreach (string url in urls)
{
    tasks.Add(client.GetStringAsync(url));
}
string[] results = await Task.WhenAll(tasks);
If you're not familiar with asynchronous programming (async/await), start with this article.
You can also limit how many requests are processed at once. Let's do the same request 1000 times with a limit of 10 concurrent requests.
static async Task Main(string[] args)
{
    Stopwatch sw = new Stopwatch(); // requires using System.Diagnostics;
    string url = "https://www.somehost.com/resources/a.txt";
    using SemaphoreSlim semaphore = new SemaphoreSlim(10);
    List<Task<string>> tasks = new List<Task<string>>();
    sw.Start();
    for (int i = 0; i < 1000; i++)
    {
        await semaphore.WaitAsync();
        tasks.Add(GetPageAsync(url, semaphore));
    }
    string[] results = await Task.WhenAll(tasks);
    sw.Stop();
    Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds}ms");
}

private static async Task<string> GetPageAsync(string url, SemaphoreSlim semaphore)
{
    try
    {
        return await client.GetStringAsync(url);
    }
    finally
    {
        semaphore.Release();
    }
}
You may measure the time.

RestSharp - Asynchronous Request Reply Pattern

The following situation is given:
1. A new job is sent to the API via a POST request. The API returns a JobID and the HTTP response code 202.
2. The JobID is used to poll a status endpoint. Once the response body has a "Finished" property set, you can continue with step 3.
3. The results are queried from a result endpoint using the JobID and can be processed.
My question is how I can solve this elegantly and cleanly. Are there ready-to-use libraries that implement exactly this functionality? I could not find anything like it for RestSharp or another HTTP client.
The current solution looks like this:
async Task<string> PostNewJob()
{
    var restClient = new RestClient("https://baseUrl/");
    var restRequest = new RestRequest("jobs");
    // add headers
    var response = await restClient.ExecutePostTaskAsync(restRequest);
    string jobId = JsonConvert.DeserializeObject<string>(response.Content);
    return jobId;
}

async Task WaitTillJobIsReady(string jobId)
{
    string jobStatus = string.Empty;
    var request = new RestRequest(jobId) { Method = Method.GET };
    do
    {
        if (!String.IsNullOrEmpty(jobStatus))
            Thread.Sleep(5000); // wait for the next status update
        // restClient here is a shared field
        var response = await restClient.ExecuteGetTaskAsync(request, CancellationToken.None);
        jobStatus = JsonConvert.DeserializeObject<string>(response.Content);
    } while (jobStatus != "finished");
}

async Task<List<dynamic>> GetJobResponse(string jobId)
{
    var restClient = new RestClient(@"Url/bulk/" + jobId);
    var restRequest = new RestRequest() { Method = Method.GET };
    var response = await restClient.ExecuteGetTaskAsync(restRequest, CancellationToken.None);
    dynamic downloadResponse = JsonConvert.DeserializeObject(response.Content);
    var responseResult = new List<dynamic>() { downloadResponse?.ToList() };
    return responseResult;
}

async Task Main()
{
    var jobId = await PostNewJob();
    await WaitTillJobIsReady(jobId);
    var responseResult = await GetJobResponse(jobId);
    // handle result
}
As @Paulo Morgado said, I should not use Thread.Sleep / Task.Delay in production code. But in my opinion I have to use it in the WaitTillJobIsReady() method, otherwise I would overwhelm the API with GET requests in the loop?
What is the best practice for this type of problem?
Long Polling
There are multiple ways you can handle this type of problem, but as others have already pointed out no library such as RestSharp currently has this built in. In my opinion, the preferred way of overcoming this would be to modify the API to support some type of long-polling like Nikita suggested. This is where:
The server holds the request open until new data is available. Once available, the server responds and sends the new information. When the client receives the new information, it immediately sends another request, and the operation is repeated. This effectively emulates a server push feature.
Using a scheduler
Unfortunately this isn't always possible. Another, more elegant solution would be to create a service that checks the status, and then use a scheduler such as Quartz.NET or Hangfire to run that service at recurring intervals (say 500 ms to 3 s) until it succeeds. Once it gets back the "Finished" property you can mark the task as complete to stop the polling. This would arguably be better than your current solution and offer much more control and feedback over what's going on.
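A rough sketch of that idea with Quartz.NET 3.x (the CheckStatusAsync helper and the five-second interval are assumptions, not part of your code):
using Quartz;      // Install-Package Quartz
using Quartz.Impl;

// Sketch only: a job that polls the status endpoint until "finished",
// then unschedules its own trigger.
public class JobStatusCheckJob : IJob
{
    public async Task Execute(IJobExecutionContext context)
    {
        string jobId = context.MergedJobDataMap.GetString("jobId");
        string status = await CheckStatusAsync(jobId);
        if (status == "finished")
        {
            // Stop the recurring polling and hand off to result processing.
            await context.Scheduler.UnscheduleJob(context.Trigger.Key);
        }
    }

    // Hypothetical helper: wrap your status-endpoint call (e.g. the RestSharp code above).
    private static Task<string> CheckStatusAsync(string jobId)
        => Task.FromResult("finished"); // placeholder
}

// Scheduling the check every 5 seconds:
var scheduler = await StdSchedulerFactory.GetDefaultScheduler();
await scheduler.Start();
await scheduler.ScheduleJob(
    JobBuilder.Create<JobStatusCheckJob>().UsingJobData("jobId", jobId).Build(),
    TriggerBuilder.Create()
        .StartNow()
        .WithSimpleSchedule(s => s.WithInterval(TimeSpan.FromSeconds(5)).RepeatForever())
        .Build());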
Using Timers
Aside from using Thread.Sleep, a better choice would be to use a Timer. This allows you to invoke a delegate at specified intervals, which seems to be what you want to do here.
Below is an example usage of a timer that runs every 2 seconds until it has completed 10 runs. (Adapted from the Microsoft documentation.)
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    private static Timer timer;

    static void Main(string[] args)
    {
        var timerState = new TimerState { Counter = 0 };
        timer = new Timer(
            callback: new TimerCallback(TimerTask),
            state: timerState,
            dueTime: 1000,
            period: 2000);
        while (timerState.Counter <= 10)
        {
            Task.Delay(1000).Wait();
        }
        timer.Dispose();
        Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff}: done.");
    }

    private static void TimerTask(object timerState)
    {
        Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff}: starting a new callback.");
        var state = timerState as TimerState;
        Interlocked.Increment(ref state.Counter);
    }

    class TimerState
    {
        public int Counter;
    }
}
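Adapted to your polling problem, the callback would query the status endpoint instead of printing, and a TaskCompletionSource lets the caller await the result. A sketch that assumes the shared restClient from your question (the 5-second period is arbitrary, and each poll is assumed to finish well within it):
// Sketch: poll the status endpoint every 5 seconds with a Timer
// and complete a TaskCompletionSource once the job is finished.
Task WaitTillJobIsReadyAsync(string jobId)
{
    var tcs = new TaskCompletionSource<bool>();
    Timer timer = null;
    timer = new Timer(async _ =>
    {
        var response = await restClient.ExecuteGetTaskAsync(
            new RestRequest(jobId) { Method = Method.GET }, CancellationToken.None);
        if (JsonConvert.DeserializeObject<string>(response.Content) == "finished")
        {
            timer.Dispose();          // stop polling
            tcs.TrySetResult(true);   // release the awaiting caller
        }
    }, state: null, dueTime: 0, period: 5000);
    return tcs.Task;
}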
Why you don't want to use Thread.Sleep
The reason you don't want to use Thread.Sleep for recurring work is that Thread.Sleep relinquishes control, and when the thread regains it is ultimately not up to the thread. The call simply says the thread wants to give up the rest of its time slice for at least x milliseconds, but in reality it could take much longer to regain control.
Per the Microsoft documentation:
The system clock ticks at a specific rate called the clock resolution. The actual timeout might not be exactly the specified timeout, because the specified timeout will be adjusted to coincide with clock ticks. For more information on clock resolution and the waiting time, see the Sleep function from the Windows system APIs.
Peter Ritchie actually wrote an entire blog post on why you shouldn't use Thread.Sleep.
EndNote
Overall I would say your current approach has the right idea about how this should be handled; however, you may want to 'future proof' it by refactoring it to utilize one of the methods mentioned above.

How can I restart communication with an FTP server that sends back reset packets without restarting our process?

We have a (long-running) Windows service that, among other things, periodically communicates with an FTP server embedded on a third-party device using FtpWebRequest. This works great most of the time, but sometimes our service stops communicating with the device; as soon as you restart our service, everything starts working again.
I've spent some time debugging this with an MCVE (included below) and discovered via Wireshark that once communication starts failing there is no network traffic going to the external FTP server (no packets at all show up going to this IP in Wireshark). If I try to connect to the same FTP from another application on the same machine like Windows explorer everything works fine.
Looking at the packets just before everything stops working, I see packets with the reset (RST) flag set coming from the device, so I suspect this may be the issue. Once some part of the network stack on the computer our service is running on receives the reset packet, it does what's described in the TCP resets section of this article and blocks all further communication from our process to the device.
As far as I can tell there's nothing wrong with the way we're communicating with the device, and most of the time the exact same code works just fine. The easiest way to reproduce the issue (see MCVE below) seems to be to make a lot of separate connections to the FTP at the same time, so I suspect the issue may occur when there are a lot of connections being made to the FTP (not all by us) at the same time.
The thing is that if we do restart our process everything works fine, and we do need to re-establish communication with the device. Is there a way to re-establish communication (after a suitable amount of time has passed) without having to restart the entire process?
Unfortunately the FTP server is running embedded on a fairly old third-party device that's not likely to be updated to address this issue, and even if it were we'd still want/need to communicate with all the ones already out in the field without requiring our customers to update them if possible.
Options we are aware of:
- Using a command-line FTP client such as the one built into Windows.
  - One downside is that we need to list all the files in a directory and then download only some of them, so we'd have to write logic to parse the listing response.
  - We'd also have to download the files to a temp file instead of to a stream like we do now.
- Creating another application that handles the FTP communication part, which we tear down after each request completes.
  - The main downside here is that inter-process communication is a bit of a pain.
MCVE
This runs in LINQPad and reproduces the issue fairly reliably. Typically the first several tasks succeed and then the issue occurs, and after that all tasks start timing out. In Wireshark I can see that no communication between my computer and the device is happening.
If I run the script again then all tasks fail until I restart LINQPad or do "Cancel All Threads and Reset" which restarts the process LINQPad uses to run the query. If I do either of those things then we're back to the first several tasks succeeding.
async Task Main() {
    var tasks = new List<Task>();
    var numberOfBatches = 3;
    var numberOfTasksPerBatch = 10;
    foreach (var batchNumber in Enumerable.Range(1, numberOfBatches)) {
        $"Starting tasks in batch {batchNumber}".Dump();
        tasks.AddRange(Enumerable.Range(1, numberOfTasksPerBatch)
            .Select(taskNumber => Connect(batchNumber, taskNumber)));
        await Task.Delay(TimeSpan.FromSeconds(5));
    }
    await Task.WhenAll(tasks);
}

async Task Connect(int batchNumber, int taskNumber) {
    try {
        var client = new FtpClient();
        var result = await client.GetFileAsync(
            new Uri("ftp://192.168.0.191/logging/20140620.csv"),
            TimeSpan.FromSeconds(10));
        result.Count.Dump($"Task {taskNumber} in batch {batchNumber} succeeded");
    } catch (Exception e) {
        e.Dump($"Task {taskNumber} in batch {batchNumber} failed");
    }
}

public class FtpClient {
    public virtual async Task<ImmutableList<Byte>> GetFileAsync(Uri fileUri, TimeSpan timeout) {
        if (fileUri == null) {
            throw new ArgumentNullException(nameof(fileUri));
        }
        FtpWebRequest ftpWebRequest = (FtpWebRequest)WebRequest.Create(fileUri);
        ftpWebRequest.Method = WebRequestMethods.Ftp.DownloadFile;
        ftpWebRequest.UseBinary = true;
        ftpWebRequest.KeepAlive = false;
        using (var source = new CancellationTokenSource(timeout)) {
            try {
                using (var response = (FtpWebResponse)await ftpWebRequest.GetResponseAsync()
                    .WithWaitCancellation(source.Token))
                using (Stream ftpStream = response.GetResponseStream()) {
                    if (ftpStream == null) {
                        throw new InvalidOperationException("No response stream");
                    }
                    using (var dataStream = new MemoryStream()) {
                        await ftpStream.CopyToAsync(dataStream, 4096, source.Token)
                            .WithWaitCancellation(source.Token);
                        return dataStream.ToArray().ToImmutableList();
                    }
                }
            } catch (OperationCanceledException) {
                throw new WebException(
                    String.Format("Operation timed out after {0} seconds.", timeout.TotalSeconds),
                    WebExceptionStatus.Timeout);
            } finally {
                ftpWebRequest.Abort();
            }
        }
    }
}

public static class TaskCancellationExtensions {
    /// http://stackoverflow.com/a/14524565/1512
    public static async Task<T> WithWaitCancellation<T>(
        this Task<T> task,
        CancellationToken cancellationToken) {
        // The task completion source.
        var tcs = new TaskCompletionSource<Boolean>();
        // Register with the cancellation token.
        using (cancellationToken.Register(
            s => ((TaskCompletionSource<Boolean>)s).TrySetResult(true),
            tcs)) {
            // If the task waited on is the cancellation token...
            if (task != await Task.WhenAny(task, tcs.Task)) {
                throw new OperationCanceledException(cancellationToken);
            }
        }
        // Wait for one or the other to complete.
        return await task;
    }

    /// http://stackoverflow.com/a/14524565/1512
    public static async Task WithWaitCancellation(
        this Task task,
        CancellationToken cancellationToken) {
        // The task completion source.
        var tcs = new TaskCompletionSource<Boolean>();
        // Register with the cancellation token.
        using (cancellationToken.Register(
            s => ((TaskCompletionSource<Boolean>)s).TrySetResult(true),
            tcs)) {
            // If the task waited on is the cancellation token...
            if (task != await Task.WhenAny(task, tcs.Task)) {
                throw new OperationCanceledException(cancellationToken);
            }
        }
        // Wait for one or the other to complete.
        await task;
    }
}
This reminds me of the old(?) IE behaviour of not reloading pages, even after the network came back, following N unsuccessful tries.
You should try setting the FtpWebRequest's cache policy to BypassCache.
// Requires using System.Net.Cache;
HttpRequestCachePolicy bypassPolicy = new HttpRequestCachePolicy(
    HttpRequestCacheLevel.BypassCache);
ftpWebRequest.CachePolicy = bypassPolicy;
after setting KeepAlive.
I had the same issue when trying to connect to an FTPS server without EnableSsl = true. The connection would fail twice, Wireshark showed the RST command, and then no more requests would leave the network, resulting in the timeout exception even after setting EnableSsl = true.
I found setting the ConnectionGroupName allows the connection to reset and use a new port.
e.g.:
request.ConnectionGroupName = Guid.NewGuid().ToString();
Beware of port exhaustion when using this method, however; see https://learn.microsoft.com/en-us/troubleshoot/dotnet/framework/ports-run-out-use-connectiongroupname

async HttpClient requests slowing down

I have a list of 10,000,000 URLs in a text file. I open every one of them in my await/async method. At the beginning the speed is very good (near 10,000 URLs/min), but while the program is running it decreases, reaching 500 URLs/min after ~10 hours. When I restart the program and run it from the beginning, the situation is the same: fast at the start, then slower and slower. I'm working on Windows Server 2008 R2. I tested my code on various PCs with the same results. Can you tell me where the problem is?
int finishedUrls = 0;
IEnumerable<string> urls = File.ReadLines("urlslist.txt");
await urls.ForEachAsync(500, async url =>
{
    // build the Uri locally; a shared field here would race across concurrent lambdas
    Uri newUri;
    if (!Uri.TryCreate(url, UriKind.Absolute, out newUri)) return;
    var timeout = new CancellationTokenSource(TimeSpan.FromSeconds(30));
    string html = "";
    using (var _httpClient = new HttpClient { Timeout = TimeSpan.FromSeconds(30), MaxResponseContentBufferSize = 300000 })
    {
        using (var _req = new HttpRequestMessage(HttpMethod.Get, newUri))
        {
            using (var _response = await _httpClient.SendAsync(_req, HttpCompletionOption.ResponseContentRead, timeout.Token).ConfigureAwait(false))
            {
                if (_response != null &&
                    (_response.StatusCode == HttpStatusCode.OK || _response.StatusCode == HttpStatusCode.NotFound))
                {
                    using (var cancel = timeout.Token.Register(_response.Dispose))
                    {
                        var rawResponse = await _response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
                        html = Encoding.UTF8.GetString(rawResponse);
                    }
                }
            }
        }
    }
    Interlocked.Increment(ref finishedUrls);
});
http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx
I believe you are exhausting your I/O completion ports. You need to throttle your requests. If you need higher concurrency than a single box can handle, distribute your concurrent requests across more machines. I'd suggest using the TPL for managing the concurrency. I ran into this exact same behavior doing similar things. Also, you should absolutely not be creating and disposing an HttpClient per request; pull that code out and use a single client.
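A minimal sketch of that advice: one shared HttpClient plus a SemaphoreSlim to cap in-flight requests (the cap of 100 is an arbitrary placeholder to tune for your box):
private static readonly HttpClient client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
private static readonly SemaphoreSlim gate = new SemaphoreSlim(100);

static async Task ProcessUrlsAsync(IEnumerable<string> urls)
{
    var tasks = new List<Task>();
    foreach (string url in urls)
    {
        await gate.WaitAsync(); // at most 100 requests in flight
        tasks.Add(FetchAsync(url));
        // for 10M URLs you would also want to drain completed tasks periodically
    }
    await Task.WhenAll(tasks);
}

static async Task FetchAsync(string url)
{
    try
    {
        using (var response = await client.GetAsync(url).ConfigureAwait(false))
        {
            // read/process the body here
        }
    }
    catch (Exception) { /* log and move on */ }
    finally
    {
        gate.Release();
    }
}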
