Download file with timeout - C#

I am attempting to download just the HTML from a web page, but I want to give up after 10 seconds. The code below downloads the text just fine, but it can take longer than 10 seconds. I have the timeout set, but it is the stream reading that takes a long time. What is the best way to stop any further processing after 10 seconds while still closing connections?
I get a WebException if req.GetResponse() takes longer than 10 seconds, but it is reading from wr.GetResponseStream() that is taking the time. I also want to ensure that all connections are properly closed.
Code:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Timeout = 10000;
req.ReadWriteTimeout = 10000;
using (WebResponse wr = req.GetResponse())
{
Console.WriteLine("A: " + DateTime.Now.ToString(" HH:mm:ss:fff"));
using (StreamReader sr = new StreamReader(wr.GetResponseStream(), true))
{
Console.WriteLine("B: " + DateTime.Now.ToString(" HH:mm:ss:fff"));
var b = sr.ReadToEnd();
Console.WriteLine("C: " + DateTime.Now.ToString(" HH:mm:ss:fff"));
}
}
Sample Output:
A: 20:04:36:522
B: 20:04:36:522
C: 20:04:54:337
Elapsed Time: ~18 Seconds

The time is consumed in ReadToEnd.
Instead, use:
public virtual int Read(char[] buffer, int index, int count)
Set count to something like 4000; even the slowest connections should have enough bandwidth to deliver 4,000 characters (8 kB) almost instantaneously.
Be sure to increment the index into your buffer between reads, or use an 8 kB buffer and append its contents to a dynamic buffer on each iteration.
Call Read inside a loop that checks the elapsed time and exits once it exceeds your timeout, or when Read signals the end of the stream, as in the sketch below.
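A minimal sketch of that loop (the method name is mine; the url and the 10-second budget come from the question):
static string DownloadWithBudget(string url)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Timeout = 10000;          // covers GetResponse
    req.ReadWriteTimeout = 10000; // covers each individual stream read
    DateTime deadline = DateTime.UtcNow.AddSeconds(10); // overall budget

    var sb = new System.Text.StringBuilder();
    char[] buffer = new char[4000]; // the ~8 kB chunk suggested above

    using (WebResponse wr = req.GetResponse())
    using (var sr = new StreamReader(wr.GetResponseStream(), true))
    {
        int read;
        while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
        {
            sb.Append(buffer, 0, read); // dynamic buffer, no index bookkeeping
            if (DateTime.UtcNow > deadline)
                // the using blocks still close the stream and connection
                throw new TimeoutException("Gave up after 10 seconds.");
        }
    }
    return sb.ToString();
}
Note that a single blocked Read can still take up to ReadWriteTimeout, so the total time can overshoot the 10-second budget by that much.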
Also, you might want to look into async transfers, which are the right way to get data from the net: HttpWebRequest - Asynchronous Programming

Related

Too much time difference between receiving response from server and writing it into log while reading file from Google Cloud Storage using HttpClient

I need to download multiple files from GCS. For this I have used the following code:
public class GCSStorage
{
    static HttpClient httpClient;
    static GoogleCredential credential = GoogleCredential.FromFile(ConfigurationManager.AppSettings["GCPCredentials"]);
    // Wrapped in a static constructor so the class-level statements compile.
    static GCSStorage()
    {
        if (credential.IsCreateScopedRequired)
        {
            credential = credential.CreateScoped(new[]
            {
                "https://www.googleapis.com/auth/devstorage.read_only"
            });
            httpClient = new Google.Apis.Http.HttpClientFactory()
                .CreateHttpClient(
                    new Google.Apis.Http.CreateHttpClientArgs()
                    {
                        ApplicationName = "",
                        GZipEnabled = true,
                        Initializers = { credential },
                    });
            httpClient.Timeout = new TimeSpan(0, 0, 5);
        }
    }
    public string ReadObjectData(string bucketName, string location)
    {
        string responseBody = "";
        bool isFetched = false;
        try
        {
            Stopwatch sw = new Stopwatch();
            string pathcode = System.Web.HttpUtility.UrlEncode(location);
            UriBuilder uri = new UriBuilder(string.Format(googleStorageApi, bucketName, pathcode));
            sw.Start();
            var httpResponseMessage = httpClient.GetAsync(uri.Uri).Result;
            var t = sw.ElapsedMilliseconds;
            if (httpResponseMessage.StatusCode == HttpStatusCode.OK)
            {
                responseBody = httpResponseMessage.Content.ReadAsStringAsync().Result;
                log.Info($"Read file from location : {location} in Get() time : {t} ms , ReadAsString time : {sw.ElapsedMilliseconds - t} ms, Total time : {sw.ElapsedMilliseconds} ms");
            }
            isFetched = true;
        }
        catch (Exception)
        {
            throw; // rethrow without resetting the stack trace (was: throw ex;)
        }
        return responseBody;
    }
}
And I call it for multiple files using:
GCSStorage gcs = new GCSStorage();
ParallelOptions option = new ParallelOptions { MaxDegreeOfParallelism = options };
Parallel.ForEach(myFiles, option, ri =>
{
    text = gcs.ReadObjectData(bucket, ri);
});
I am recording the time taken to download each individual file in ReadObjectData(). When I download the files with MaxDegreeOfParallelism set to 1, each file downloads in about 100-150 ms. But when I change MaxDegreeOfParallelism to 50, the time varies between 1-3 s. I am downloading a batch of 50 files.
I have no idea why this is happening. Can anyone help me understand this behavior?
Also, I have tried doing the same with Amazon S3. S3 gives a constant download time of 50-100 ms in both scenarios.
I profiled the GCS responses using Fiddler. For the requests that take long (roughly more than 200 ms), Overall Elapsed is around 100-200 ms, but the log line is written much later. For the others, the two timestamps match exactly.
Why would there be so much time difference between some of the requests?
Fiddler Statistics
Request Count: 1
Bytes Sent: 439 (headers:439; body:0)
Bytes Received: 7,759 (headers:609; body:7,150)
ACTUAL PERFORMANCE
--------------
ClientConnected: 18:03:35.137
ClientBeginRequest: 18:04:13.606
GotRequestHeaders: 18:04:13.606
ClientDoneRequest: 18:04:13.606
Determine Gateway: 0ms
DNS Lookup: 0ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 18:03:35.152
FiddlerBeginRequest: 18:04:13.606
ServerGotRequest: 18:04:13.606
ServerBeginResponse: 18:04:13.700
GotResponseHeaders: 18:04:13.700
ServerDoneResponse: 18:04:13.700
ClientBeginResponse: 18:04:13.700
ClientDoneResponse: 18:04:13.700
Overall Elapsed: 0:00:00.093
Log file
INFO 2018-08-25 18:04:13,606 41781ms GCSStorage ReadObjectData - Get() time : 114 ms
INFO 2018-08-25 18:04:14,512 42688ms GCSStorage ReadObjectData - Get() time : 902 ms
I could see that LogTime - ClientDoneResponse + Overall Elapsed is approximately equal to the total time:
18:04:14.512 - 18:04:13.700 + 0:00:00.093 = 905 ms
Why is there so much time difference between receiving the response from the server and writing it to the log?
When you are doing parallel programming with multiple threads, you need to keep a few things in mind. First of all, it is true that parallelism improves performance, but it is not the case that unbounded parallelism is better than sequential execution. There are many reasons for this. One is that you are limited by the number of physical cores and by hyper-threading in your OS. For example, if you have 8 cores, the best performance you will get is with 8 threads; if hyper-threading is also active, then 16 threads might perform well.
In your example, jumping from 1 thread to 50 is too much. Try it in steps (2, 4, 6, 8, 10) and see where you get the best performance, recording the time as you have done so far; a rough sketch of the experiment follows.
That number is then most likely the best degree of parallelism for your workload.
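A hypothetical way to run that experiment, reusing the gcs, bucket and myFiles names from the question:
foreach (int degree in new[] { 1, 2, 4, 6, 8, 10, 16, 32, 50 })
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    var option = new ParallelOptions { MaxDegreeOfParallelism = degree };
    Parallel.ForEach(myFiles, option, ri => gcs.ReadObjectData(bucket, ri));
    Console.WriteLine($"Degree {degree}: {sw.ElapsedMilliseconds} ms total");
}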

Report progress without slowing procedure

I'm trying to decrypt a file while reporting the progress, to show it in a progress bar. Here is my decryption function:
private static void Decrypt(String inName, String outName, byte[] rijnKey, byte[] rijnIV)
{
    FileStream fin = new FileStream(inName, FileMode.Open, FileAccess.Read);
    FileStream fout = new FileStream(outName, FileMode.OpenOrCreate, FileAccess.Write);
    fout.SetLength(0);
    byte[] bin = new byte[1048576];
    long rdlen = 0;
    long totlen = fin.Length;
    int len;
    SymmetricAlgorithm rijn = SymmetricAlgorithm.Create();
    CryptoStream encStream = new CryptoStream(fout, rijn.CreateDecryptor(rijnKey, rijnIV), CryptoStreamMode.Write);
    while (rdlen < totlen)
    {
        len = fin.Read(bin, 0, bin.Length);
        encStream.Write(bin, 0, len);
        rdlen = rdlen + len;
        // Call here a method to report progress
    }
    encStream.Close();
    fout.Close();
    fin.Close();
}
I want to call a method to report the progress inside the loop, but depending on the response time of that method, this may slow down the decrypter. How can I report progress without this problem?
Thanks!
A couple of suggestions for you:
In your reporting method, have it check how long it has been since it last reported. If it is less than, say, 0.3 s, have it return without doing anything; no progress bar needs to be updated more than about three times per second.
and/or
Offload the work of the reporting method onto another thread, so the method called in your loop returns immediately and your loop can continue right away. In the method on the other thread, include a check not to start another thread (i.e. just return without doing anything) if the previous reporting thread has not completed yet.
Or, simpler still, which may well work in your situation: keep a counter in your loop and do your progress report every n-th time through the loop, resetting the counter to zero each time you report. Select a value of n by experiment, so that the bar updates often enough (a couple of times per second) but you are not doing more progress updates than you have to. For example, if your loop iterates 3,000 times per second, updating every 1,000th time will be fine. A sketch combining the first and last ideas follows.
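A minimal sketch combining the time check and the counter (the names and threshold values are mine and should be tuned by experiment):
private static DateTime lastReport = DateTime.MinValue;
private static int counter = 0;
private const int n = 10; // report every n-th iteration; pick n by experiment

private static void ReportProgress(long done, long total)
{
    if (++counter < n) return; // counter throttle
    counter = 0;

    if ((DateTime.UtcNow - lastReport).TotalSeconds < 0.3) return; // time throttle
    lastReport = DateTime.UtcNow;

    Console.WriteLine($"Progress: {100.0 * done / total:F1}%");
    // In a UI app, marshal to the UI thread or raise an event here instead.
}
Inside the loop of Decrypt, the call would then simply be ReportProgress(rdlen, totlen);.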

Multithread HttpWebRequest hangs randomly on responseStream

I'm coding a multithreaded web crawler that performs many concurrent HttpWebRequests every second, using hundreds of threads. The application works great, but sometimes (randomly) one of the web requests hangs on GetResponseStream(), completely ignoring the timeout (this happens when I perform hundreds of requests concurrently), so the crawling process never ends. The strange thing is that with Fiddler this never happens and the application never hangs. It is really hard to debug because it happens randomly.
I've tried to set
Keep-Alive = false
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
but I still get the strange behavior, any ideas?
Thanks
HttpWebRequest code:
public static string RequestHttp(string url, string referer, ref CookieContainer cookieContainer_0, IWebProxy proxy)
{
    string str = string.Empty;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.UserAgent = randomuseragent();
    request.ContentType = "application/x-www-form-urlencoded";
    request.Accept = "*/*";
    request.CookieContainer = cookieContainer_0;
    request.Proxy = proxy;
    request.Timeout = 15000;
    request.Referer = referer;
    //request.ServicePoint.MaxIdleTime = 15000;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream responseStream = response.GetResponseStream())
        {
            List<byte> list = new List<byte>();
            byte[] buffer = new byte[0x400];
            int count = responseStream.Read(buffer, 0, buffer.Length);
            while (count != 0)
            {
                list.AddRange(buffer.ToList<byte>().GetRange(0, count));
                if (list.Count >= 0x100000)
                {
                    break;
                }
                count = 0;
                try
                {
                    // HERE IT HANGS SOMETIMES --->
                    count = responseStream.Read(buffer, 0, buffer.Length);
                    continue;
                }
                catch
                {
                    continue;
                }
            }
            //responseStream.Close();
            int num2 = 0x200 * 0x400;
            if (list.Count >= num2)
            {
                list.RemoveRange((num2 * 3) / 10, list.Count - num2);
            }
            byte[] bytes = list.ToArray();
            str = Encoding.Default.GetString(bytes);
            Encoding encoding = Encoding.Default;
            if (str.ToLower().IndexOf("charset=") > 0)
            {
                encoding = GetEncoding(str);
            }
            else
            {
                try
                {
                    encoding = Encoding.GetEncoding(response.CharacterSet);
                }
                catch
                {
                }
            }
            str = encoding.GetString(bytes);
            // response.Close();
        }
    }
    return str.Trim();
}
The Timeout property "Gets or sets the time-out value in milliseconds for the GetResponse and GetRequestStream methods." The default value is 100,000 milliseconds (100 seconds).
The ReadWriteTimeout property "Gets or sets a time-out in milliseconds when writing to or reading from a stream." The default is 300,000 milliseconds (5 minutes).
You're setting Timeout but leaving ReadWriteTimeout at the default, so your reads can take up to five minutes before timing out. You probably want to set ReadWriteTimeout to a lower value. You might also consider limiting the size of the data that you download; with my crawler, I'd sometimes stumble upon an unending stream that would eventually result in an out-of-memory exception.
Something else I noticed when crawling is that sometimes closing the response stream will hang. I found that I had to call request.Abort to reliably terminate a request if I wanted to quit before reading the entire stream.
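A rough sketch of those points together (the method name and the limits are mine, illustrative rather than tuned):
public static string ReadCapped(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Timeout = 15000;          // GetResponse / GetRequestStream only
    request.ReadWriteTimeout = 15000; // applies to each Read on the stream

    try
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var ms = new System.IO.MemoryStream())
        {
            byte[] buffer = new byte[0x400];
            int count;
            while ((count = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, count);
                if (ms.Length >= 0x100000) break; // cap at 1 MB to dodge unending streams
            }
            return System.Text.Encoding.UTF8.GetString(ms.ToArray());
        }
    }
    catch (WebException)
    {
        request.Abort(); // reliably terminates a request stuck mid-stream
        throw;
    }
}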
There is nothing apparent in the code you provided.
Why did you comment response.Close() out?
The documentation hints that connections may run out if not explicitly closed. Disposing the response may close the connection, but just releasing all the resources is not optimal, I think. Closing the response will also close the stream, so that is covered.
The system hanging without a timeout may simply be a network issue making the response object a dead duck, or the problem may be due to the high number of threads resulting in memory fragmentation.
Looking at anything that may produce a pattern may help find the source:
How many threads are typically running (can you bundle request sets into fewer threads)?
How is the network performing at the time a thread stops?
Is there a specific count or range at which it happens?
What data was processed last when it happened (are there any specific control characters or sequences of data that could upset the stream)?
I'd like to ask more questions, but I don't have enough reputation, so I can only answer.
Good luck!
Below is some code that does something similar; it's also used to access multiple web sites, with each call in a different task. The difference is that I only read the stream once and then parse the results. That might be a way to get around the stream reader locking up randomly, or at least make it easier to debug.
try
{
    _webResponse = (HttpWebResponse)_request.GetResponse();
    if (_request.HaveResponse)
    {
        if (_webResponse.StatusCode == HttpStatusCode.OK)
        {
            var _stream = _webResponse.GetResponseStream();
            using (var _streamReader = new StreamReader(_stream))
            {
                string str = _streamReader.ReadToEnd();
                // ... parse str here ...
            }
        }
    }
}
catch (WebException)
{
    // The original snippet was truncated; this minimal catch completes the try block.
}

How do I increase the performance of HttpWebResponse on HTTPS Requests?

I am building an application that relies heavily on the loading speed of a web page.
I am not getting good results with HttpWebResponse in C#; I am getting better results with internet browsers like Chrome and IE.
Here are the stats that I collected:
HttpWebResponse (C#) = 17 Seconds / 20 Requests
Javascript/iFrame on Chrome = 9 seconds / 20 requests
Javascript/iFrame on IE = 11 seconds / 20 requests
Question #1
Is there anything I can do to optimize my code for better performance?
Question #2
I can click the start button twice and open two connections, so that I get on par with browser performance. This works great; however, the website I send requests to has a limit. If I send a new request before the previous one has completed, it blocks my connection for 10 minutes. Is there a way I can prevent this?
My Thread:
void DomainThreadNamecheapStart()
{
    while (stop == false)
    {
        foreach (string FromDomainList in DomainList.Lines)
        {
            if (FromDomainList.Length > 1)
            {
                // I removed my api parameters from the string
                string namecheapapi = "https://api.namecheap.com/foo" + FromDomainList + "bar";
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(namecheapapi);
                request.Proxy = null;
                request.ServicePoint.Expect100Continue = false;
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                StreamReader sr = new StreamReader(response.GetResponseStream());
                status.Text = FromDomainList + "\n" + sr.ReadToEnd();
                sr.Close();
            }
        }
    }
}
My Button:
private void button2_Click(object sender, EventArgs e)
{
    stop = false;
    Thread DomainThread = new Thread(new ThreadStart(DomainThreadNamecheapStart));
    DomainThread.Start();
}
My Old Question:
How do I increase the performance of HttpWebResponse?
You're creating a thread every time the button is pressed. Creating a thread is expensive and takes time by itself. Try using a thread from an existing thread pool (try QueueUserWorkItem) and see if that helps.
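For example, a minimal change along those lines, reusing the names from the code above:
private void button2_Click(object sender, EventArgs e)
{
    stop = false;
    // Borrow a pool thread instead of paying thread-creation cost on every click.
    System.Threading.ThreadPool.QueueUserWorkItem(_ => DomainThreadNamecheapStart());
}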

Consuming a HTTP stream without reading one byte at a time

I have been trying to read data from the Twitter stream API using C#. Since the API sometimes returns no data, and I am looking for a near-realtime response, I have been hesitant to use a buffer length of more than 1 byte on the reader, in case the stream doesn't return any more data for the next day or two.
I have been using the following line:
input.BeginRead(buffer, 0, buffer.Length, InputReadComplete, null); // buffer = new byte[1]
Now that I plan to scale the application up, I think a buffer size of 1 will result in a lot of CPU usage, so I want to increase that number, but I still don't want the stream to just block. Is it possible to get the stream to return if no more bytes are read in the next 5 seconds or something similar?
Async Option
You can use a timer in the async callback method to complete the operation if no bytes have been received for, e.g., 5 seconds. Reset the timer every time bytes are received, and start it before BeginRead.
Sync Option
Alternatively, you can use the ReceiveTimeout property of the underlying socket to establish a maximum time to wait before completing the read. You can use a larger buffer and set the timeout to, e.g., 5 seconds.
According to the MSDN documentation, that property only applies to a synchronous read, but you could perform the synchronous read on a separate thread, as sketched below.
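A sketch of the sync option, assuming the connection is a TcpClient/NetworkStream you manage yourself (TLS and the actual endpoint are omitted):
static int ReadWithTimeout(System.Net.Sockets.TcpClient client, byte[] buffer)
{
    var stream = client.GetStream();
    stream.ReadTimeout = 5000; // maps to the socket's ReceiveTimeout for sync reads

    try
    {
        // Blocks for at most ~5 seconds; a larger buffer is fine here.
        return stream.Read(buffer, 0, buffer.Length);
    }
    catch (System.IO.IOException)
    {
        return 0; // nothing arrived within the timeout; the caller can retry
    }
}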
UPDATE
Here's rough, untested code pieced together from a similar problem. It will probably not run (or be bug-free) as-is, but should give you the idea:
private EventWaitHandle asyncWait = new ManualResetEvent(false);
private Timer abortTimer = null;
private bool success = false;

public void ReadFromTwitter()
{
    // Watchdog: complete the operation if the read has not finished in time.
    abortTimer = new Timer(AbortTwitter, null, 50000, System.Threading.Timeout.Infinite);
    asyncWait.Reset();
    input.BeginRead(buffer, 0, buffer.Length, InputReadComplete, null);
    asyncWait.WaitOne();
}

void AbortTwitter(object state)
{
    success = false; // Redundant but explicit for clarity
    asyncWait.Set();
}

void InputReadComplete(IAsyncResult ar) // BeginRead requires an AsyncCallback(IAsyncResult)
{
    int bytesRead = input.EndRead(ar); // finish the async read

    // Disable the timer:
    abortTimer.Change(System.Threading.Timeout.Infinite, System.Threading.Timeout.Infinite);
    success = true;
    asyncWait.Set();
}
