I'm writing a multithreaded web crawler that performs a lot of concurrent HttpWebRequests every second using hundreds of threads. The application works great, but sometimes (randomly) one of the requests hangs on GetResponseStream(), completely ignoring the timeout (this happens when I perform hundreds of requests concurrently), and the crawl never finishes. The strange thing is that with Fiddler this never happens and the application never hangs. It is really hard to debug because it happens randomly.
I've tried to set
Keep-Alive = false
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
but I still get the strange behavior. Any ideas?
Thanks
HttpWebRequest code:
public static string RequestHttp(string url, string referer, ref CookieContainer cookieContainer_0, IWebProxy proxy)
{
    string str = string.Empty;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    request.UserAgent = randomuseragent();
    request.ContentType = "application/x-www-form-urlencoded";
    request.Accept = "*/*";
    request.CookieContainer = cookieContainer_0;
    request.Proxy = proxy;
    request.Timeout = 15000;
    request.Referer = referer;
    //request.ServicePoint.MaxIdleTime = 15000;
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream responseStream = response.GetResponseStream())
        {
            List<byte> list = new List<byte>();
            byte[] buffer = new byte[0x400];
            int count = responseStream.Read(buffer, 0, buffer.Length);
            while (count != 0)
            {
                list.AddRange(buffer.ToList<byte>().GetRange(0, count));
                if (list.Count >= 0x100000)
                {
                    break;
                }
                count = 0;
                try
                {
                    count = responseStream.Read(buffer, 0, buffer.Length); // <-- HERE IT HANGS SOMETIMES
                    continue;
                }
                catch
                {
                    continue;
                }
            }
            //responseStream.Close();
            int num2 = 0x200 * 0x400;
            if (list.Count >= num2)
            {
                list.RemoveRange((num2 * 3) / 10, list.Count - num2);
            }
            byte[] bytes = list.ToArray();
            str = Encoding.Default.GetString(bytes);
            Encoding encoding = Encoding.Default;
            if (str.ToLower().IndexOf("charset=") > 0)
            {
                encoding = GetEncoding(str);
            }
            else
            {
                try
                {
                    encoding = Encoding.GetEncoding(response.CharacterSet);
                }
                catch
                {
                }
            }
            str = encoding.GetString(bytes);
            // response.Close();
        }
    }
    return str.Trim();
}
The Timeout property "Gets or sets the time-out value in milliseconds for the GetResponse and GetRequestStream methods." The default value is 100,000 milliseconds (100 seconds).
The ReadWriteTimeout property "Gets or sets a time-out in milliseconds when writing to or reading from a stream." The default is 300,000 milliseconds (5 minutes).
You're setting Timeout, but leaving ReadWriteTimeout at the default, so your reads can take up to five minutes before timing out. You probably want to set ReadWriteTimeout to a lower value. You might also consider limiting the size of data that you download. With my crawler, I'd sometimes stumble upon an unending stream that would eventually result in an out of memory exception.
Something else I noticed when crawling is that sometimes closing the response stream will hang. I found that I had to call request.Abort to reliably terminate a request if I wanted to quit before reading the entire stream.
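Putting those two suggestions together, a minimal sketch might look like this (the 5000 ms value and the ~1 MB cap are illustrative choices, not requirements):

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 15000;          // applies to GetResponse/GetRequestStream
request.ReadWriteTimeout = 5000;  // applies to each read/write on the stream
HttpWebResponse response = null;
try
{
    response = (HttpWebResponse)request.GetResponse();
    using (Stream stream = response.GetResponseStream())
    using (MemoryStream body = new MemoryStream())
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            body.Write(buffer, 0, read);
            if (body.Length >= 0x100000) break; // cap the download at ~1 MB
        }
    }
}
finally
{
    // Abort reliably tears the request down even if we quit mid-stream.
    request.Abort();
    if (response != null) response.Close();
}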
There is nothing apparent in the code you provided.
Why did you comment response.Close() out?
Documentation hints that connections may run out if not explicitly closed. Disposing the response may close the connection, but just releasing all the resources is not optimal, I think. Closing the response also closes the stream, so that is covered.
The system hanging without a timeout can simply be a network issue making the response object a dead duck, or the problem may be due to the high number of threads, resulting in memory fragmentation.
Looking at anything that may produce a pattern may help find the source (see the logging sketch after this list):
How many threads are typically running (can you bundle request sets into fewer threads)?
How was the network performing at the time a thread stopped?
Is there a specific count or range where it happens?
What data was processed last when it happened (are there any specific control characters or sequences of data that can upset the stream)?
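One way to gather that pattern data is a thin, hypothetical logging wrapper around the question's RequestHttp; the log format here is purely illustrative:

// Hypothetical wrapper around the question's RequestHttp that records
// thread id, URL, and elapsed time so hangs can be correlated later.
public static string LoggedRequestHttp(string url, string referer,
    ref CookieContainer cookies, IWebProxy proxy)
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    int tid = Thread.CurrentThread.ManagedThreadId;
    Console.WriteLine("[{0:HH:mm:ss.fff}] T{1} start {2}", DateTime.Now, tid, url);
    try
    {
        return RequestHttp(url, referer, ref cookies, proxy);
    }
    finally
    {
        Console.WriteLine("[{0:HH:mm:ss.fff}] T{1} end {2} after {3} ms",
            DateTime.Now, tid, url, sw.ElapsedMilliseconds);
    }
}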
I'd like to ask more questions, but I don't have enough reputation, so I can only reply.
Good luck!
Below is some code that does something similar; it's also used to access multiple web sites, with each call in a different task. The difference is that I only read the stream once and then parse the results. That might be a way to get around the stream reader locking up randomly, or at least make it easier to debug.
try
{
    _webResponse = (HttpWebResponse)_request.GetResponse();
    if (_request.HaveResponse)
    {
        if (_webResponse.StatusCode == HttpStatusCode.OK)
        {
            var _stream = _webResponse.GetResponseStream();
            using (var _streamReader = new StreamReader(_stream))
            {
                string str = _streamReader.ReadToEnd();
                // parse str here, after the single read
            }
        }
    }
}
catch (WebException)
{
    // handle or log the failure
}
Assume I have the following code:
private string PostData(string functionName, string parsedContent)
{
    string url = "..."; // some url
    var http = (HttpWebRequest)WebRequest.Create(new Uri(url));
    http.Accept = "application/json";
    http.ContentType = "application/json";
    http.Method = "POST";
    http.Timeout = 15000; // 15 seconds
    Byte[] bytes = Encoding.UTF8.GetBytes(parsedContent);
    using (Stream newStream = http.GetRequestStream())
    {
        newStream.Write(bytes, 0, bytes.Length);
    }
    using (WebResponse response = http.GetResponse())
    {
        using (var stream = response.GetResponseStream())
        {
            var sr = new StreamReader(stream);
            var content = sr.ReadToEnd();
            return content;
        }
    }
}
I set up a breakpoint over this line of code:
using (Stream newStream = http.GetRequestStream())
before http.GetRequestStream() gets executed. Here is a screenshot of my active threads:
This whole method runs on a background thread (ThreadId = 3), as you can see.
After pressing F10, the http.GetRequestStream() method executes. Here is an updated screenshot of the active threads:
As you can see, we now have one extra active thread in a waiting state; presumably http.GetRequestStream() spawned it. Everything is fine, but... this thread keeps hanging like that for the whole app lifecycle, which does not seem to be the intended behaviour.
Am I misusing GetRequestStream somehow?
If I use ILSpy, it looks like the request is sent asynchronously. That would explain the extra thread.
Looking a little deeper, HttpWebRequest creates a static TimerQueue with one thread running a never-ending loop that has a Monitor.WaitAny in it. Every web request in the AppDomain registers a timer callback for timeout handling, and all those callbacks are handled by that one thread. Because the queue is static, the instance will never be garbage collected, so it keeps hold of the thread.
It does register for the AppDomain.Unload event, so if that fires it will clean up its resources, including any threads.
Do note that these are all internal classes, and the implementation details might change at any time.
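As a rough illustration of that pattern (this is NOT the BCL code, just a sketch of a static queue owning one never-ending worker thread):

// Sketch only: a static queue whose single worker thread services
// timeout callbacks for every request in the AppDomain. Because the
// queue is rooted by static state, the thread lives for the process.
static class TimeoutQueueSketch
{
    private static readonly Queue<Action> callbacks = new Queue<Action>();
    private static readonly Thread worker = StartWorker();

    private static Thread StartWorker()
    {
        var t = new Thread(() =>
        {
            while (true) // never-ending loop, as in the decompiled code
            {
                Action cb = null;
                lock (callbacks)
                {
                    if (callbacks.Count > 0) cb = callbacks.Dequeue();
                    else Monitor.Wait(callbacks); // stand-in for Monitor.WaitAny
                }
                if (cb != null) cb();
            }
        });
        t.IsBackground = true;
        t.Start();
        return t;
    }

    public static void Register(Action timeoutCallback)
    {
        lock (callbacks)
        {
            callbacks.Enqueue(timeoutCallback);
            Monitor.Pulse(callbacks);
        }
    }
}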
I am building an application that relies heavily on the loading speed of a web page.
I am not getting good results with HttpWebResponse in C#; I am getting better results with browsers like Chrome and IE.
Here are the stats that i collected:
HttpWebResponse (C#) = 17 Seconds / 20 Requests
Javascript/iFrame on Chrome = 9 seconds / 20 requests
Javascript/iFrame on IE = 11 seconds / 20 requests
Question #1
Is there anything i can do, to optimize my code for better performance?
Question #2
I can click the start button twice and open two connections, which gets me on par with browser performance. This works great; however, the website I send requests to has a limit. If I send a new request before the previous one completes, it blocks my connection for 10 minutes. Is there a way I can prevent this?
My Thread:
void DomainThreadNamecheapStart()
{
    while (stop == false)
    {
        foreach (string FromDomainList in DomainList.Lines)
        {
            if (FromDomainList.Length > 1)
            {
                // I removed my api parameters from the string
                string namecheapapi = "https://api.namecheap.com/foo" + FromDomainList + "bar";
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(namecheapapi);
                request.Proxy = null;
                request.ServicePoint.Expect100Continue = false;
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                StreamReader sr = new StreamReader(response.GetResponseStream());
                // Note: setting status.Text from a background thread is a
                // cross-thread UI access; WinForms requires Invoke for this.
                status.Text = FromDomainList + "\n" + sr.ReadToEnd();
                sr.Close();
            }
        }
    }
}
My Button:
private void button2_Click(object sender, EventArgs e)
{
    stop = false;
    Thread DomainThread = new Thread(new ThreadStart(DomainThreadNamecheapStart));
    DomainThread.Start();
}
My Old Question:
How do I increase the performance of HttpWebResponse?
You're creating a thread every time the button is pressed. Creating a thread is expensive and takes time by itself. Try using a thread from an existing thread pool (try QueueUserWorkItem) and see if that helps.
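For example, a sketch against the question's own names:

private void button2_Click(object sender, EventArgs e)
{
    stop = false;
    // Borrow a pool thread instead of paying thread-creation cost per click.
    ThreadPool.QueueUserWorkItem(state => DomainThreadNamecheapStart());
}

Note that pool threads are background threads, so the crawl loop will not keep the process alive when the application exits.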
I am attempting to download just the HTML from a web page, but I want to give up after 10 seconds. The code below downloads the text just fine, but it can take longer than 10 seconds. I have the timeout set, but it is the stream reading that takes a long time. What is the best way to stop any further processing after 10 seconds while still closing connections?
I get a WebException if req.GetResponse() takes longer than 10 seconds, but it's reading from wr.GetResponseStream() that takes the time. I also want to ensure that all connections are properly closed.
Code:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Timeout = 10000;
req.ReadWriteTimeout = 10000;
using (WebResponse wr = req.GetResponse())
{
    Console.WriteLine("A: " + DateTime.Now.ToString(" HH:mm:ss:fff"));
    using (StreamReader sr = new StreamReader(wr.GetResponseStream(), true))
    {
        Console.WriteLine("B: " + DateTime.Now.ToString(" HH:mm:ss:fff"));
        var b = sr.ReadToEnd();
        Console.WriteLine("C: " + DateTime.Now.ToString(" HH:mm:ss:fff"));
    }
}
Sample Output:
A: 20:04:36:522
B: 20:04:36:522
C: 20:04:54:337
Elapsed Time: ~18 Seconds
The time is consumed in ReadToEnd. Use
public virtual int Read(char[] buffer, int index, int count)
Set count to something like 4000; even the slowest connections should have enough bandwidth to deliver 4000 characters (about 8 kB) almost instantaneously.
Be sure to increment the index into your buffer between reads, or use a fixed 8 kB buffer and append its contents to a dynamic buffer on each iteration.
Use Read inside a loop that checks the elapsed time and exits if it exceeds your timeout, or when Read returns 0 (end of stream).
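A sketch of that loop (the 10-second budget and 4096-char chunk size are illustrative; it exits on Read returning 0, since a short read does not by itself mean the stream is finished):

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Timeout = 10000;
req.ReadWriteTimeout = 10000;
var sb = new StringBuilder();
var clock = Stopwatch.StartNew();
using (WebResponse wr = req.GetResponse())
using (var sr = new StreamReader(wr.GetResponseStream(), true))
{
    char[] chunk = new char[4096];
    int read;
    while ((read = sr.Read(chunk, 0, chunk.Length)) > 0)
    {
        sb.Append(chunk, 0, read);
        if (clock.ElapsedMilliseconds > 10000)
            break; // give up after ~10 seconds, keep what was read
    }
}
string html = sb.ToString();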
Also, you might want to look into async transfers, the right way to get data from the net: HttpWebRequest - Asynchronous Programming
I am using VSTS 2008 + C# + .NET 3.5 to develop a console application that sends requests to another server (IIS 7.0 on Windows Server 2008). I find that when the number of request threads is large (e.g. 2000 threads), the client receives the error "Unable to connect to remote server" when invoking response = (HttpWebResponse)request.GetResponse(). My confusion is this: I have set the timeout to a large value, yet I get the failure within a minute. Even if there really are more connections than IIS can serve, the client should not fail so quickly; it should fail only after the timeout period. Any comments? Any ideas what is wrong? Any ideas how to get IIS 7.0 to serve more concurrent connections?
Here is my code:
class Program
{
    private static int ClientCount = 2000;
    private static string TargetURL = "http://labtest/abc.wmv";
    private static int Timeout = 3600;

    static void PerformanceWorker()
    {
        Stream dataStream = null;
        HttpWebRequest request = null;
        HttpWebResponse response = null;
        StreamReader reader = null;
        try
        {
            request = (HttpWebRequest)WebRequest.Create(TargetURL);
            request.Timeout = Timeout * 1000;
            request.Proxy = null;
            response = (HttpWebResponse)request.GetResponse();
            dataStream = response.GetResponseStream();
            reader = new StreamReader(dataStream);
            // 10 K characters at a time
            char[] c = new char[1000 * 10];
            while (reader.Read(c, 0, c.Length) > 0)
            {
                Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message + "\n" + ex.StackTrace);
        }
        finally
        {
            if (null != reader)
            {
                reader.Close();
            }
            if (null != dataStream)
            {
                dataStream.Close();
            }
            if (null != response)
            {
                response.Close();
            }
        }
    }

    static void Main(string[] args)
    {
        Thread[] workers = new Thread[ClientCount];
        for (int i = 0; i < ClientCount; i++)
        {
            workers[i] = new Thread(new ThreadStart(PerformanceWorker));
        }
        for (int i = 0; i < ClientCount; i++)
        {
            workers[i].Start();
        }
        for (int i = 0; i < ClientCount; i++)
        {
            workers[i].Join();
        }
    }
}
Kev answered your question already; I just want to add that creating so many threads is not really a good design (the context-switching overhead alone is a big minus), and it won't scale well.
The quick answer: use asynchronous operations to read the data instead of creating a bunch of threads, or at least use the thread pool (with a lower worker thread count). Remember that more connections to one source only speed things up to a degree. Try benchmarking it and you will probably see that 3-5 connections work faster than the 2000 you are using now.
You can read more about asynchronous client/server architecture (IOCP - input/output completion ports) and its advantages here. You can start from here:
MSDN - Using an Asynchronous Server Socket
MSDN - Asynchronous Server Socket Example
CodeProject - Multi-threaded .NET TCP Server Examples
All of these examples use the lower-level TCP objects, but the same ideas apply to WebRequest/WebResponse as well.
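Applied to the question's code, the asynchronous pattern looks roughly like this (a sketch; error handling and the chunked read are omitted for brevity):

var request = (HttpWebRequest)WebRequest.Create(TargetURL);
request.BeginGetResponse(ar =>
{
    var req = (HttpWebRequest)ar.AsyncState;
    using (var response = (HttpWebResponse)req.EndGetResponse(ar))
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string body = reader.ReadToEnd();
        // Process body here; no dedicated thread was blocked while waiting.
    }
}, request);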
UPDATE
To try the thread pool version, you can do something like this:
var done = new ManualResetEvent(false);
int pending = ClientCount;
for (int cnt = 0; cnt < ClientCount; cnt++)
{
    ThreadPool.QueueUserWorkItem(obj =>
    {
        PerformanceWorker();
        // Signal once the last worker finishes. (A single event is used
        // because WaitHandle.WaitAll is capped at 64 handles.)
        if (Interlocked.Decrement(ref pending) == 0)
            done.Set();
    });
}
done.WaitOne();
Not tested, may need some adjustment.
I reckon you've maxed out the web site's application pool queue. The default is 1000 requests, and you're flooding the server with 2000 requests more or less all at once. Increasing the timeout isn't going to solve this.
Try increasing the queue length for the application pool the site resides in.
You should also try to capture the underlying HTTP status; that'll give you a clue as to what is really going on.
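For example, wrapping the question's GetResponse call like this surfaces both the WebException status and the HTTP status code when one exists:

try
{
    response = (HttpWebResponse)request.GetResponse();
}
catch (WebException ex)
{
    Console.WriteLine("WebException status: " + ex.Status); // e.g. ConnectFailure
    var httpResponse = ex.Response as HttpWebResponse;
    if (httpResponse != null)
    {
        Console.WriteLine("HTTP status: " + (int)httpResponse.StatusCode); // e.g. 503
    }
}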
Update:
When I run your code and try to download a sizeable file (200 MB), I get (503) Server Unavailable. Increasing the size of the application pool's request queue solves this (I set mine to 10000).
Only once did I see Unable to connect to remote server, and sadly I have been unable to replicate it. That error sounds like something is broken at the TCP/IP layer. Can you post the full exception?
Go to Smart Thread Pool and download the code. It is an instance thread pool that constrains the number of threads. The .NET thread pool can be problematic in applications that connect to web servers and SQL servers.
Change the loop to this:
static void Main(string[] args)
{
    var stp = new SmartThreadPool((int)TimeSpan.FromMinutes(5).TotalMilliseconds,
        Environment.ProcessorCount - 1, Environment.ProcessorCount - 1);
    stp.Start();
    for (var i = 0; i < ClientCount; i++)
    {
        stp.QueueWorkItem(PerformanceWorker);
    }
    stp.WaitForIdle();
    stp.Shutdown();
}
This constrains the pool to one thread per processor. Adjust it upward until performance starts to degrade; too many threads are worse than too few, and you may find that this is optimal.
Also add this to your config. The value of 100 is a default I use. (There is a way to do this in code as well; see the snippet after the config below.)
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="100" />
  </connectionManagement>
</system.net>
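For reference, the in-code equivalent is:

// Programmatic equivalent of the maxconnection setting above;
// set it once at startup, before any requests are created.
ServicePointManager.DefaultConnectionLimit = 100;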
I am using Visual Studio 2005. How do I send an SMS? Here is my code:
IPHostEntry host;
host = Dns.GetHostEntry(Dns.GetHostName());
UriBuilder urlbuilder = new UriBuilder();
urlbuilder.Host = host.HostName;
urlbuilder.Port = 4719;
string PhoneNumber = "9655336272";
string message = "Just a simple text";
string subject = "MMS subject";
string YourChoiceofName = "victoria";
urlbuilder.Query = string.Format("PhoneNumber=%2B" + PhoneNumber + "&MMSFrom=" + YourChoiceofName + "&MMSSubject=" + subject + "&MMSText=" + message);//+ "&MMSFile=http://127.0.0.1/" + fileName
HttpWebRequest httpReq = (HttpWebRequest)WebRequest.Create(new Uri(urlbuilder.ToString(), false));
HttpWebResponse httpResponse = (HttpWebResponse)(httpReq.GetResponse());
Here's my method:
private static void UpdatePref(List<EmailPref> prefList)
{
    if (prefList.Count > 0)
    {
        foreach (EmailPref pref in prefList)
        {
            UpdateEmailRequest updateRequest = new UpdateEmailRequest(pref.ID.ToString(), pref.Email, pref.ListID.ToString());
            UpdateEmailResponse updateResponse = (UpdateEmailResponse)updateRequest.SendRequest();
            if (updateResponse.Success)
            {
                Console.WriteLine(String.Format("Update Successful. ListID:{0} Email:{2} ID:{1}", pref.ListID, pref.Email, pref.ID));
                continue;
            }
            Console.WriteLine(String.Format("Update Unsuccessful. ListID:{0} Email:{2} ID:{1}\n", pref.ListID, pref.Email, pref.ID));
            Console.WriteLine(String.Format("Error:{0}", updateResponse.ErrorMessage));
        }
        Console.WriteLine("Updates Complete.");
        return;
    }
    Console.WriteLine("Process ended. No records found to update.");
}
The list has around 84 valid records that it loops through, sending an API request for each. But it stops on the 3rd API call, processing only 2 of the 84 records. When I debug to see what's happening, I see that it stops in my SendRequest method without producing any error. It stops at GetRequestStream, and when I step to that line and try to keep stepping, the application just stops running without any error!
HttpWebRequest request = CreateWebRequest(requestURI, data.Length);
request.ContentLength = data.Length;
request.KeepAlive = false;
request.Timeout = 30000;
// Send the Request
requestStream = request.GetRequestStream();
wtf? Eventually if I let it keep running I do get the error "The Operation Has Timed Out". But then why did the first 2 calls go through and this one timed out? I don't get it.
Also, a second question: is it inefficient to create a new object inside my foreach for sending and receiving? That's how I stubbed out those classes, requiring an email, ListID, and so forth to send that type of API call. I just don't know whether creating a new instance on each iteration of the foreach is fine or not. It might be common, but it just feels weird and inefficient to me.
EDIT: It seems you answered your own question already in the comments.
I don't have personal experience with this, but it seems you need to close the HTTP web response after you've fetched it. There's a default limit of 2 open connections per host, and a connection isn't freed until you call Close(). See http://blogs.msdn.com/feroze_daud/archive/2004/01/21/61400.aspx, which gives the following code to demonstrate the symptoms you're seeing.
for (int i = 0; i < 3; i++)
{
    HttpWebRequest r = WebRequest.Create("http://www.microsoft.com") as HttpWebRequest;
    HttpWebResponse w = r.GetResponse() as HttpWebResponse;
    // The responses are never closed, so the third request hangs
    // waiting for a free connection.
}
One possibility for it timing out is that the server you're talking to is throttling you. You might try inserting a delay (a second, maybe?) after each update.
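For example (the 1000 ms pause is a guess, not a documented limit of their API):

foreach (EmailPref pref in prefList)
{
    // ... build and send the update request as before ...
    Thread.Sleep(1000); // crude throttle between API calls
}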
Assuming that UpdateEmailRequest and UpdateEmailResponse somehow derive from WebRequest and WebResponse respectively, it's not particularly inefficient to create the requests the way you're doing it; that's pretty standard. However, note that WebResponse is IDisposable, which means it probably allocates unmanaged resources, and you should dispose of it, either with a using block or by calling Dispose. Something like this:
UpdateEmailResponse updateResponse = (UpdateEmailResponse)updateRequest.SendRequest();
try
{
    if (updateResponse.Success)
    {
        Console.WriteLine(String.Format("Update Successful. ListID:{0} Email:{2} ID:{1}", pref.ListID, pref.Email, pref.ID));
        continue;
    }
    Console.WriteLine(String.Format("Update Unsuccessful. ListID:{0} Email:{2} ID:{1}\n", pref.ListID, pref.Email, pref.ID));
    Console.WriteLine(String.Format("Error:{0}", updateResponse.ErrorMessage));
}
finally
{
    updateResponse.Dispose();
}
I guess it's possible that not disposing of the response objects keeps an open connection to the server, and the server is timing out because you have too many open connections.