I requested 100 pages that all return 404. I wrote:
{
    var s = DateTime.Now;
    for (int i = 0; i < 100; i++)
        DL.CheckExist("http://google.com/lol" + i.ToString() + ".jpg");
    var e = DateTime.Now;
    var d = e - s;
    Console.WriteLine(d);
}
static public bool CheckExist(string url)
{
    HttpWebRequest wreq = null;
    HttpWebResponse wresp = null;
    bool ret = false;
    try
    {
        wreq = (HttpWebRequest)WebRequest.Create(url);
        wreq.KeepAlive = true;
        wreq.Method = "HEAD";
        wresp = (HttpWebResponse)wreq.GetResponse();
        ret = true;
    }
    catch (System.Net.WebException)
    {
    }
    finally
    {
        if (wresp != null)
            wresp.Close();
    }
    return ret;
}
Two runs show it takes 00:00:30.7968750 and 00:00:26.8750000. Then I tried Firefox with the following page:
<html>
<body>
<script type="text/javascript">
for(var i=0; i<100; i++)
document.write("<img src=http://google.com/lol" + i + ".jpg><br>");
</script>
</body>
</html>
Timing it by hand, it took roughly 4 seconds. That is 6.5-7.5x faster than my app. I plan to scan through thousands of files, so taking 3.75 hours instead of 30 minutes would be a big problem. How can I make this code faster? I know someone will say Firefox caches the images, but I want to say 1) it still needs to check the headers from the remote server to see if the image has been updated (which is what I want my app to do), and 2) I am not receiving the body; my code should only be requesting the headers. So, how do I solve this?
I noticed that an HttpWebRequest hangs on the first request. I did some research and what seems to be happening is that the request is configuring or auto-detecting proxies. If you set
request.Proxy = null;
on the web request object, you might be able to avoid an initial delay.
With proxy auto-detect:
using (var response = (HttpWebResponse)request.GetResponse()) //6,956 ms
{
}
Without proxy auto-detect:
request.Proxy = null;
using (var response = (HttpWebResponse)request.GetResponse()) //154 ms
{
}
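If you want to reproduce the comparison yourself, here is a minimal, self-contained sketch that times a HEAD request with and without proxy auto-detection. The URL is just an example, and note the auto-detect cost is typically only paid on the first request of the process:
using System;
using System.Diagnostics;
using System.Net;
class ProxyTimingDemo
{
    static void Main()
    {
        // Example URL; substitute the resource you are actually checking.
        const string url = "http://google.com/lol0.jpg";
        Console.WriteLine("With proxy auto-detect: {0} ms", TimeHeadRequest(url, true));
        Console.WriteLine("Proxy = null:           {0} ms", TimeHeadRequest(url, false));
    }
    static long TimeHeadRequest(string url, bool useProxy)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        if (!useProxy)
            request.Proxy = null;           // skip proxy auto-detection
        var sw = Stopwatch.StartNew();
        try
        {
            using (request.GetResponse()) { }
        }
        catch (WebException)
        {
            // A 404 still completes the round trip, which is all we are timing.
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}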
Change your code to use the asynchronous GetResponse:
public override WebResponse GetResponse() {
•••
IAsyncResult asyncResult = BeginGetResponse(null, null);
•••
return EndGetResponse(asyncResult);
}
Async Get
Probably Firefox issues multiple requests at once whereas your code does them one by one. Perhaps adding threads will speed up your program.
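Along those lines, here is a rough sketch of fanning the 100 HEAD checks out over several threads. It assumes .NET 4's Parallel.For and reuses the idea of the CheckExist method from the question (with Proxy = null added); treat the thread and connection counts as starting points, not tuned values:
using System;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;
class ParallelCheckDemo
{
    static void Main()
    {
        // Raise the default 2-connections-per-host limit so the parallel requests are not serialized.
        ServicePointManager.DefaultConnectionLimit = 20;
        var sw = Stopwatch.StartNew();
        // Fan the 100 HEAD checks out over the thread pool instead of running them one by one.
        Parallel.For(0, 100, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
        {
            CheckExist("http://google.com/lol" + i + ".jpg");
        });
        sw.Stop();
        Console.WriteLine(sw.Elapsed);
    }
    // Same idea as the CheckExist in the question, with Proxy = null added.
    static bool CheckExist(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Proxy = null;
        request.Method = "HEAD";
        try
        {
            using (request.GetResponse()) { }
            return true;
        }
        catch (WebException)
        {
            return false;   // 404 (or any other protocol error) ends up here
        }
    }
}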
The answer is changing HttpWebRequest/HttpWebResponse to WebRequest/WebResponse only. That fixed the problem.
Have you tried opening the same URL in IE on the machine your code is deployed to? If it is a Windows Server machine, sometimes it's because the URL you're requesting is not in IE's list of trusted sites (which HttpWebRequest works off). You'll just need to add it.
Do you have more info you could post? I've been doing something similar and have run into tons of problems with HttpWebRequest before, all unique. So more info would help.
BTW, calling it using the async methods won't really help in this case. It doesn't shorten the download time; it just doesn't block your calling thread, that's all.
Close the response stream when you are done: in your CheckExist(), add wresp.Close() after wresp = (HttpWebResponse)wreq.GetResponse();
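Equivalently, a using block guarantees the response is released even when an exception is thrown. A minimal sketch of CheckExist written that way:
static public bool CheckExist(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "HEAD";
    request.KeepAlive = true;
    try
    {
        // Disposing the response releases the connection back to the pool.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return true;
        }
    }
    catch (WebException)
    {
        return false; // 404 and other protocol errors land here
    }
}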
OK, if you are getting status code 404 for all web pages, then it is due to not specifying credentials. So you need to add
wreq.Credentials = CredentialCache.DefaultCredentials;
Then you may also come across status code 500; for that you need to specify a user agent, which looks something like the line below:
wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
"A WebClient instance does not send optional HTTP headers by default. If your request requires an optional header, you must add the header to the Headers collection. For example, to retain queries in the response, you must add a user-agent header. Also, servers may return 500 (Internal Server Error) if the user agent header is missing."
reference: https://msdn.microsoft.com/en-us/library/system.net.webclient(v=vs.110).aspx
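To illustrate that quote, a minimal WebClient sketch that adds a user-agent header before downloading (example.com is just a placeholder):
using System;
using System.Net;
class WebClientHeaderDemo
{
    static void Main()
    {
        using (var wc = new WebClient())
        {
            // WebClient sends no optional headers by default; add the ones the server expects.
            wc.Headers[HttpRequestHeader.UserAgent] =
                "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
            string html = wc.DownloadString("http://example.com/");
            Console.WriteLine(html.Length);
        }
    }
}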
To improve the performance of the HttpWebRequest you need to add
wreq.Proxy = null;
Now the code will look like this:
static public bool CheckExist(string url)
{
    HttpWebRequest wreq = null;
    HttpWebResponse wresp = null;
    bool ret = false;
    try
    {
        wreq = (HttpWebRequest)WebRequest.Create(url);
        wreq.Credentials = CredentialCache.DefaultCredentials;
        wreq.Proxy = null;
        wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
        wreq.KeepAlive = true;
        wreq.Method = "HEAD";
        wresp = (HttpWebResponse)wreq.GetResponse();
        ret = true;
    }
    catch (System.Net.WebException)
    {
    }
    finally
    {
        if (wresp != null)
            wresp.Close();
    }
    return ret;
}
Setting the cookie matters, and you must add AspxAutoDetectCookieSupport=1, like in this code:
req.CookieContainer = new CookieContainer();
req.CookieContainer.Add(new Cookie("AspxAutoDetectCookieSupport", "1") { Domain = target.Host });
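In context, that might look like the following sketch; the URL and page are placeholders for whatever you are actually requesting:
using System;
using System.Net;
class CookiePresetDemo
{
    static void Main()
    {
        // Placeholder URL; substitute the page you are actually requesting.
        var target = new Uri("http://example.com/SomePage.aspx");
        var req = (HttpWebRequest)WebRequest.Create(target);
        req.CookieContainer = new CookieContainer();
        // Pre-set the cookie the server would otherwise set via its cookie-detection redirect.
        req.CookieContainer.Add(new Cookie("AspxAutoDetectCookieSupport", "1") { Domain = target.Host });
        using (var resp = (HttpWebResponse)req.GetResponse())
        {
            Console.WriteLine(resp.StatusCode);
        }
    }
}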
I have tried just about all the related solutions found on the Web, but they all refused to work for some reason. This does not work either: C# - HttpWebRequest POST (Login to Facebook), since we are using different methods.
I am not using the POST method but the GET method, which is used in the request. The site I am using does not need any login credentials to get the image. (Most of the site's other root domains do not require a cookie.)
The below code is a part of what I figured out to make the program get the image like the web-based versions do, but with a few problems.
Before, I was trying to use a normal WebClient to download the image since it refused to show up in any way that the PictureBox control would accept. But then I switched to HttpWebRequest.
The particular root domain of the site where I am trying to get the image from requires a cookie, though.
Below is a code snippet which basically tries to get an image from a site. The only trouble is, it is almost impossible to get the image from the site unless you pass a few things in the HttpWebRequest, along with a cookie.
For now, I am using a static cookie as a temporary workaround.
HttpWebRequest _request = (HttpWebRequest)HttpWebRequest.Create(_URL);
_request.Method = WebRequestMethods.Http.Get;
_request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
_request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
_request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
_request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
_request.Host = "www.habbo" + _Country;
_request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
using (WebResponse _response = _request.GetResponse())
using (Stream _stream = _response.GetResponseStream())
{
    Image _image = Image.FromStream(_stream);
    _bitmap = new Bitmap(_image);
    string contentType = _response.ContentType;
    _PictureBox.Image = _bitmap;
}
Let the variables be:
_URL = "http://www.habbo.com/habbo-imaging/avatarimage?hb=img&user=aa&direction=2&head_direction=2&size=m&img_format=gif";
_Country = ".com";
Most of the things I am passing into the HttpWebRequest are obtained from looking at the Network tab of Google Chrome's Developer Tools.
The web-based versions of the Habbo Imager seem to just direct people to the page where they can find the image, and their browsers seem to somehow add the cookie. What I am doing is different: all they do is display the site where the image is located, but I want to locate the image's true location and then read from it into an Image.
Apparently the site needs the user to "visit" it, according to what I read in this thread: Click here
What I would like to know is, is there a better way to get a valid cookie that the server will happily accept every time?
Or do I need to somehow trick the site into thinking the user has visited the page and seen it, thereby making them maybe return the cookie we might need, even though the user doesn't ever see the page?
Not too sure if this would mean that I need to somehow dynamically generate the cookies though.
I also do not understand how to truly create or get the cookies (and set stored cookies) using C#, so if it is possible, please use some examples.
I would prefer not to use any third-party libraries, or to change the code I am using too much. Nor should the program send two GET requests just to get what it could get with one. Thus, this wouldn't work: Passing cookie with HttpWebRequest in winforms?
I am using .NET 4.0.
It is a little more complicated than it appears at first sight. The browser actually makes two calls. The first one returns an HTML page with a small piece of JavaScript that, when executed, sets a cookie and reloads the page. In your C# code you have to mimic that.
In your form class, add an instance variable to hold all the cookies across multiple HttpWebRequest calls:
readonly CookieContainer cookiecontainer = new CookieContainer();
I have created a Builder method that creates the HttpWebRequest and returns an HttpWebResponse. It takes a NameValueCollection to add any cookies to the CookieContainer.
private HttpWebResponse Builder(string url, string host, NameValueCollection cookies)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = WebRequestMethods.Http.Get;
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    // request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
    request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
    request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
    request.Host = host;
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
    request.CookieContainer = cookiecontainer;
    if (cookies != null)
    {
        foreach (var cookiekey in cookies.AllKeys)
        {
            request.CookieContainer.Add(
                new Cookie(
                    cookiekey,
                    cookies[cookiekey],
                    "/",
                    host));
        }
    }
    return (HttpWebResponse)request.GetResponse();
}
If the incoming stream turns out to have a text/html content type, we need to parse its content and return the cookie name and value. The Parse method does just that:
// find in the html and return the three parameters in a string array
// setCookie('YPF8827340282Jdskjhfiw_928937459182JAX666', '127.0.0.1', 10);
private static string[] Parse(Stream _stream, string encoding)
{
    const string setCookieCall = "setCookie('";
    // copy html as string
    var ms = new MemoryStream();
    _stream.CopyTo(ms);
    var html = Encoding.GetEncoding(encoding).GetString(ms.ToArray());
    // find setCookie call
    var findFirst = html.IndexOf(
        setCookieCall,
        StringComparison.InvariantCultureIgnoreCase) + setCookieCall.Length;
    var last = html.IndexOf(");", findFirst, StringComparison.InvariantCulture);
    var setCookieStatementCall = html.Substring(findFirst, last - findFirst);
    // take the parameters
    var parameters = setCookieStatementCall.Split(new[] { ',' });
    for (int x = 0; x < parameters.Length; x++)
    {
        // cleanup
        parameters[x] = parameters[x].Replace("'", "").Trim();
    }
    return parameters;
}
Now that our building blocks are complete, we can call our methods from the Click handler. We use a loop to call Builder twice to obtain a result from the given URL. Based on the received content type, we either Parse the HTML or create the Image from the stream.
private void button1_Click(object sender, EventArgs e)
{
    var cookies = new NameValueCollection();
    for (int tries = 0; tries < 2; tries++)
    {
        using (var response = Builder(_URL, "www.habbo" + _Country, cookies))
        {
            using (var stream = response.GetResponseStream())
            {
                string contentType = response.ContentType.ToLowerInvariant();
                if (contentType.StartsWith("text/html"))
                {
                    var parameters = Parse(stream, response.CharacterSet);
                    cookies.Add(parameters[0], parameters[1]);
                }
                if (contentType.StartsWith("image"))
                {
                    pictureBox1.Image = Image.FromStream(stream);
                    break; // we're done, get out
                }
            }
        }
    }
}
Words of caution
This code works for the URL in your question. I didn't take any measures to handle other patterns and/or exceptions; it is up to you to add that. Also, when doing this kind of scraping, make sure the owner of the website allows it.
Is there a way to spoof a web request from C# code so it doesn't look like a bot or spam hitting the site? I am trying to web scrape my website, but keep getting blocked after a certain number of calls. I want to act like a real browser. I am using this code, from HTML Agility Pack.
var web = new HtmlWeb();
web.UserAgent =
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
I do way too much web scraping, but here are the options.
I have a default list of headers I add, as all of these are expected from a browser:
wc.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11";
wc.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
wc.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
wc.Headers[HttpRequestHeader.AcceptEncoding] = "gzip,deflate,sdch";
wc.Headers[HttpRequestHeader.AcceptLanguage] = "en-GB,en-US;q=0.8,en;q=0.6";
wc.Headers[HttpRequestHeader.AcceptCharset] = "ISO-8859-1,utf-8;q=0.7,*;q=0.3";
(WC is my WebClient).
As further help, here is my WebClient class that keeps cookies stored, which is also a massive help:
public class CookieWebClient : WebClient
{
    public CookieContainer m_container = new CookieContainer();
    public WebProxy proxy = null;

    protected override WebRequest GetWebRequest(Uri address)
    {
        try
        {
            ServicePointManager.DefaultConnectionLimit = 1000000;
            WebRequest request = base.GetWebRequest(address);
            request.Proxy = proxy;
            HttpWebRequest webRequest = request as HttpWebRequest;
            if (webRequest != null)
            {
                // Check for null before touching HTTP-specific properties.
                webRequest.Pipelined = true;
                webRequest.KeepAlive = true;
                webRequest.CookieContainer = m_container;
            }
            return request;
        }
        catch
        {
            return null;
        }
    }
}
Here is how I usually use it. Add a static copy to your base site class, alongside the parsing functions you likely have:
protected static CookieWebClient wc = new CookieWebClient();
And call it as such:
public HtmlDocument Download(string url)
{
    HtmlDocument hdoc = new HtmlDocument();
    HtmlNode.ElementsFlags.Remove("option");
    HtmlNode.ElementsFlags.Remove("select");
    Stream read = null;
    try
    {
        read = wc.OpenRead(url);
    }
    catch (ArgumentException)
    {
        read = wc.OpenRead(HttpHelper.HTTPEncode(url));
    }
    hdoc.Load(read, true);
    return hdoc;
}
The other main reason you may be crashing out is that the server closes the connection because you have had it open for too long. You can prove this by adding a try/catch around the download part as above and, if it fails, resetting the WebClient and trying the download again:
HtmlDocument d = new HtmlDocument();
try
{
    d = this.Download(prp.PropertyUrl);
}
catch (WebException e)
{
    this.Msg(Site.ErrorSeverity.Severe, "Error connecting to " + this.URL + " : Resubmitting..");
    wc = new CookieWebClient();
    d = this.Download(prp.PropertyUrl);
}
This saves my ass all the time: even if it was the server rejecting you, this can re-jig the lot. Cookies are cleared and you're free to roam again. If worse truly comes to worst, add proxy support and apply a new proxy every 50 or so requests.
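A rough sketch of that rotation, reusing the CookieWebClient above; GetNextProxyAddress is a placeholder for however you source your proxy list:
static CookieWebClient wc = new CookieWebClient();
static int requestCount = 0;

static string DownloadWithRotation(string url)
{
    // Swap in a fresh client (and therefore a fresh cookie jar) plus a new proxy every ~50 requests.
    if (++requestCount % 50 == 0)
    {
        wc = new CookieWebClient();
        // Placeholder: returns something like "http://host:port" from wherever you keep your proxies.
        wc.proxy = new WebProxy(GetNextProxyAddress());
    }
    return wc.DownloadString(url);
}

static string GetNextProxyAddress()
{
    throw new NotImplementedException();  // placeholder
}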
That should be more than enough for you to kick your own and any other sites arse.
Use a regular browser and Fiddler (if the developer tools are not up to scratch) and take a look at the request and response headers.
Build up your requests and request headers to match what the browser sends (you can use a couple of different browsers to assess whether this makes a difference).
In regards to "getting blocked after a certain amount of calls" - throttle your calls. Only make one call every x seconds. Behave nicely to the site and it will behave nicely to you.
Chances are good that they simply look at the number of calls from your IP address per second and if it passes a threshold, the IP address gets blocked.
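A minimal sketch of that kind of throttling, assuming a plain WebClient and a fixed two-second gap between calls:
using System;
using System.Net;
using System.Threading;

class ThrottledScraper
{
    static readonly TimeSpan MinInterval = TimeSpan.FromSeconds(2);  // tune to what the site tolerates
    static DateTime _lastRequest = DateTime.MinValue;

    static string DownloadThrottled(WebClient client, string url)
    {
        // Wait until at least MinInterval has passed since the previous request.
        var wait = MinInterval - (DateTime.UtcNow - _lastRequest);
        if (wait > TimeSpan.Zero)
            Thread.Sleep(wait);
        _lastRequest = DateTime.UtcNow;
        return client.DownloadString(url);
    }
}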
I am trying to write code that will authenticate to the website wallbase.cc. I've looked at what it does using Firebug/Chrome Developer Tools and it seems fairly easy:
Post "usrname=$USER&pass=$PASS&nopass_email=Type+in+your+e-mail+and+press+enter&nopass=0" to the webpage "http://wallbase.cc/user/login", store the returned cookies and use them on all future requests.
Here is my code:
private CookieContainer _cookies = new CookieContainer();
//......
HttpPost("http://wallbase.cc/user/login", string.Format("usrname={0}&pass={1}&nopass_email=Type+in+your+e-mail+and+press+enter&nopass=0", Username, assword));
//......
private string HttpPost(string url, string parameters)
{
    try
    {
        System.Net.WebRequest req = System.Net.WebRequest.Create(url);
        //Add these, as we're doing a POST
        req.ContentType = "application/x-www-form-urlencoded";
        req.Method = "POST";
        ((HttpWebRequest)req).Referer = "http://wallbase.cc/home/";
        ((HttpWebRequest)req).CookieContainer = _cookies;
        //We need to count how many bytes we're sending. Post'ed Faked Forms should be name=value&
        byte[] bytes = System.Text.Encoding.ASCII.GetBytes(parameters);
        req.ContentLength = bytes.Length;
        System.IO.Stream os = req.GetRequestStream();
        os.Write(bytes, 0, bytes.Length); //Push it out there
        os.Close();
        //get response
        using (System.Net.WebResponse resp = req.GetResponse())
        {
            if (resp == null) return null;
            using (Stream st = resp.GetResponseStream())
            {
                System.IO.StreamReader sr = new System.IO.StreamReader(st);
                return sr.ReadToEnd().Trim();
            }
        }
    }
    catch (Exception)
    {
        return null;
    }
}
After calling HttpPost with my login parameters I would expect all future calls using this same method to be authenticated (assuming a valid username/password). I do get a session cookie in my cookie collection but for some reason I'm not authenticated. I get a session cookie in my cookie collection regardless of which page I visit so I tried loading the home page first to get the initial session cookie and then logging in but there was no change.
To my knowledge this Python version works: https://github.com/sevensins/Wallbase-Downloader/blob/master/wallbase.sh (line 336)
Any ideas on how to get authentication working?
Update #1
When a correct user/password pair is used, the response automatically redirects to the referrer, but when an incorrect pair is sent it does not redirect and returns a bad user/pass message. Based on this it seems as though authentication is happening, but maybe not all the key pieces of information are being saved?
Update #2
I am using .NET 3.5. When I tried the above code in .NET 4, with the added line of System.Net.ServicePointManager.Expect100Continue = false (which was in my code, just not shown here), it works with no changes necessary. The problem seems to stem directly from some pre-.NET 4 issue.
This is based on code from one of my projects, as well as code found from various answers here on stackoverflow.
First we need to set up a cookie-aware WebClient that is going to use HTTP 1.0.
public class CookieAwareWebClient : WebClient
{
    private CookieContainer cookie = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        request.ProtocolVersion = HttpVersion.Version10;
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = cookie;
        }
        return request;
    }
}
Next we set up the code that handles the Authentication and then finally loads the response.
var client = new CookieAwareWebClient();
client.UseDefaultCredentials = true;
client.BaseAddress = @"http://wallbase.cc";
var loginData = new NameValueCollection();
loginData.Add("usrname", "test");
loginData.Add("pass", "123");
loginData.Add("nopass_email", "Type in your e-mail and press enter");
loginData.Add("nopass", "0");
var result = client.UploadValues(@"http://wallbase.cc/user/login", "POST", loginData);
string response = System.Text.Encoding.UTF8.GetString(result);
We can try this out using the HTML Visualizer built into Visual Studio while staying in debug mode, and use it to confirm that we were able to authenticate and load the home page while staying authenticated.
The key here is to set up a CookieContainer and use HTTP 1.0, instead of 1.1. I am not entirely sure why forcing it to use 1.0 allows you to authenticate and load the page successfully, but part of the solution is based on this answer.
https://stackoverflow.com/a/10916014/408182
I used Fiddler to make sure that the requests sent by the C# client were the same as those sent by my web browser, Chrome. It also allows me to confirm whether the C# client is being redirected correctly. In this case we can see that with HTTP 1.0 we get HTTP/1.0 302 Found, which then redirects us to the home page as intended. If we switch back to HTTP 1.1, we get an HTTP/1.1 417 Expectation Failed message instead.
There is some information on this error message available in this stackoverflow thread.
HTTP POST Returns Error: 417 "Expectation Failed."
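As the question's Update #2 notes, the 417 usually goes away when the client stops sending the Expect: 100-continue header; set this once before making any requests:
// Set once at startup: stops HttpWebRequest/WebClient from sending "Expect: 100-continue",
// which some servers answer with 417 Expectation Failed.
System.Net.ServicePointManager.Expect100Continue = false;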
Edit: Hack/Fix for .NET 3.5
I have spent a lot of time trying to figure out the difference between 3.5 and 4.0, but I seriously have no clue. It looks like 3.5 is creating a new cookie after the authentication and the only way I found around this was to authenticate the user twice.
I also had to make some changes on the WebClient based on information from this post.
http://dot-net-expertise.blogspot.fr/2009/10/cookiecontainer-domain-handling-bug-fix.html
public class CookieAwareWebClient : WebClient
{
    public CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.ProtocolVersion = HttpVersion.Version10;
            httpRequest.CookieContainer = cookies;

            // Work around the CookieContainer domain-handling bug: copy each entry of the
            // internal domain table to a key without the leading dot.
            var table = (Hashtable)cookies.GetType().InvokeMember(
                "m_domainTable",
                System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.GetField | System.Reflection.BindingFlags.Instance,
                null,
                cookies,
                new object[] { });
            var keys = new ArrayList(table.Keys);
            foreach (var key in keys)
            {
                var newKey = (key as string).Substring(1);
                table[newKey] = table[key];
            }
        }
        return request;
    }
}
var client = new CookieAwareWebClient();
var loginData = new NameValueCollection();
loginData.Add("usrname", "test");
loginData.Add("pass", "123");
loginData.Add("nopass_email", "Type in your e-mail and press enter");
loginData.Add("nopass", "0");
// Hack: Authenticate the user twice!
client.UploadValues(@"http://wallbase.cc/user/login", "POST", loginData);
var result = client.UploadValues(@"http://wallbase.cc/user/login", "POST", loginData);
string response = System.Text.Encoding.UTF8.GetString(result);
You may need to add the following:
//get response
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
{
    // WebResponse has no Cookies property, so cast to HttpWebResponse first.
    foreach (Cookie c in resp.Cookies)
        _cookies.Add(c);
    // Do other stuff with response....
}
Another thing you might have to do: if the server responds with a 302 (redirect), the .NET web request will automatically follow it, and in the process you might lose the cookie you're after. You can turn off this behavior with the following code:
req.AllowAutoRedirect = false;
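With auto-redirect turned off you then have to capture the cookies from the 302 and follow the Location header yourself. A rough sketch of that flow (error handling omitted):
static string PostAndFollowRedirect(string url, byte[] body, CookieContainer cookies)
{
    var req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "POST";
    req.ContentType = "application/x-www-form-urlencoded";
    req.CookieContainer = cookies;
    req.AllowAutoRedirect = false;               // keep the cookies set on the 302 itself
    req.ContentLength = body.Length;
    using (var os = req.GetRequestStream())
        os.Write(body, 0, body.Length);

    string location;
    using (var resp = (HttpWebResponse)req.GetResponse())
    {
        foreach (Cookie c in resp.Cookies)       // capture cookies from the redirect response
            cookies.Add(c);
        location = resp.Headers[HttpResponseHeader.Location];
    }

    if (string.IsNullOrEmpty(location))
        return null;                             // no redirect; nothing further to fetch

    // Follow the redirect with the same cookie container.
    var follow = (HttpWebRequest)WebRequest.Create(new Uri(new Uri(url), location));
    follow.CookieContainer = cookies;
    using (var resp2 = (HttpWebResponse)follow.GetResponse())
    using (var reader = new System.IO.StreamReader(resp2.GetResponseStream()))
        return reader.ReadToEnd();
}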
The script you reference uses a different referrer (http://wallbase.cc/start/). It is also followed by another POST, to http://wallbase.cc/user/adult_confirm/1. Try the other referrer and follow up with this POST.
I think you are authenticating correctly, but that the site needs more info/assertions from you before proceeding.
I have a URL like:
http://www.matweb.com/search/DataSheet.aspx?MatGUID=849e2916ab1541be9ff6a17b78f95c82
I want to download the source of that page using this code:
private static string urlTemplate = @"http://www.matweb.com/search/DataSheet.aspx?MatGUID=";

static string GetSource(string guid)
{
    try
    {
        Uri url = new Uri(urlTemplate + guid);
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.Method = "GET";
        HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
        Stream responseStream = webResponse.GetResponseStream();
        StreamReader responseStreamReader = new StreamReader(responseStream);
        String result = responseStreamReader.ReadToEnd();
        return result;
    }
    catch (Exception ex)
    {
        return null;
    }
}
When I do so I get:
You do not seem to have cookies enabled. MatWeb Requires cookies to be enabled.
OK, that I understand, so I added these lines:
CookieContainer cc = new CookieContainer();
webRequest.CookieContainer = cc;
I got:
Your IP Address has been restricted due to excessive use. The problem may be compounded when an IP address may be shared by many people in a company or through an internet service provider. We apologize for any inconvenience.
I can understand this, but I'm not getting this message when I visit the page in a web browser. What can I do to get the source code? Some cookies or HTTP headers?
It probably doesn't like your UserAgent. Try this:
webRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)"; //maybe substitute your own in here
It looks like you're doing something that the company doesn't like, if you got an "excessive use" response.
You are downloading pages too fast.
When you use a browser you might get up to one page per second. Using an application you can get several pages per second, and that's probably what their web server is detecting. Hence the excessive usage.
I need to write a simple C# app that should receive entire contents of a web page currently opened in Firefox. Is there any way to do it directly from C#? If not, is it possible to develop some kind of plug-in that would transfer page contents? As I am a total newbie in Firefox plug-ins programming, I'd really appreciate any info on getting me started quickly. Maybe there are some sources I can use as a reference? Doc links? Recommendations?
UPD: I actually need to communicate with a Firefox instance, not get contents of a web page from a given URL
It would help if you elaborated on what you are trying to achieve. Maybe plugins already out there, such as Firebug, can help.
Anyway, if you really want to develop both the plugin and the C# application:
Check out this tutorial on firefox extension:
http://robertnyman.com/2009/01/24/how-to-develop-a-firefox-extension/
Otherwise, you can use the WebRequest or HttpWebRequest class in .NET to get the HTML source of any URL.
I think you'd almost certainly need to write a Firefox plugin for that. However, there are certainly ways to request a web page and receive its HTML response within C#. It depends on what your requirements are.
If your requirements are simply to receive the source from any website, leave a comment and I'll point you towards the code.
// _UserAgent, _RequestTimeout, _CookieContainer and doc are fields defined elsewhere in the original class.
Uri uri = new Uri(url);
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(uri.AbsoluteUri);
req.AllowAutoRedirect = true;
req.MaximumAutomaticRedirections = 3;
//req.UserAgent = _UserAgent; //"Mozilla/6.0 (MSIE 6.0; Windows NT 5.1; Searcharoo.NET)";
req.KeepAlive = true;
req.Timeout = _RequestTimeout * 1000; //prefRequestTimeout
// SIMONJONES http://codeproject.com/aspnet/spideroo.asp?msg=1421158#xx1421158xx
req.CookieContainer = new System.Net.CookieContainer();
req.CookieContainer.Add(_CookieContainer.GetCookies(uri));

System.Net.HttpWebResponse webresponse = null;
try
{
    webresponse = (System.Net.HttpWebResponse)req.GetResponse();
}
catch (Exception ex)
{
    webresponse = null;
    Console.Write("request for url failed: {0} {1}", url, ex.Message);
}

if (webresponse != null)
{
    webresponse.Cookies = req.CookieContainer.GetCookies(req.RequestUri);
    // handle cookies (need to do this in case we have any session cookies)
    foreach (System.Net.Cookie retCookie in webresponse.Cookies)
    {
        bool cookieFound = false;
        foreach (System.Net.Cookie oldCookie in _CookieContainer.GetCookies(uri))
        {
            if (retCookie.Name.Equals(oldCookie.Name))
            {
                oldCookie.Value = retCookie.Value;
                cookieFound = true;
            }
        }
        if (!cookieFound)
        {
            _CookieContainer.Add(retCookie);
        }
    }

    string enc = "utf-8"; // default
    if (webresponse.ContentEncoding != String.Empty)
    {
        // Use the HttpHeader Content-Type in preference to the one set in META
        doc.Encoding = webresponse.ContentEncoding;
    }
    else if (doc.Encoding == String.Empty)
    {
        doc.Encoding = enc; // default
    }

    //http://www.c-sharpcorner.com/Code/2003/Dec/ReadingWebPageSources.asp
    System.IO.StreamReader stream = new System.IO.StreamReader
        (webresponse.GetResponseStream(), System.Text.Encoding.GetEncoding(doc.Encoding));
    // read from 'stream' here, before the response is closed
    webresponse.Close();
}
This does what you want.
using System.Net;
var cli = new WebClient();
string data = cli.DownloadString("http://www.heise.de");
Console.WriteLine(data);
Native messaging enables an extension to exchange messages with a native application installed on the user's computer, so it is one way for a C# program to communicate with a running Firefox instance.
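For completeness, here is a rough sketch of what the native side could look like in C#, assuming the WebExtensions native-messaging framing (a 4-byte native-endian length prefix followed by UTF-8 JSON on stdin/stdout); the message handling itself is just an echo placeholder:
using System;
using System.IO;
using System.Text;

// Skeleton of a native-messaging host the extension could talk to.
class NativeMessagingHost
{
    static void Main()
    {
        using (var stdin = Console.OpenStandardInput())
        using (var stdout = Console.OpenStandardOutput())
        {
            string message;
            while ((message = ReadMessage(stdin)) != null)
            {
                // Placeholder: acknowledge whatever the extension sent (e.g. the page content).
                WriteMessage(stdout, "{\"received\":" + message.Length + "}");
            }
        }
    }

    static string ReadMessage(Stream stdin)
    {
        var lengthBytes = new byte[4];
        if (stdin.Read(lengthBytes, 0, 4) != 4)
            return null;                              // browser closed the pipe
        int length = BitConverter.ToInt32(lengthBytes, 0);

        var buffer = new byte[length];
        int read = 0;
        while (read < length)
            read += stdin.Read(buffer, read, length - read);
        return Encoding.UTF8.GetString(buffer);
    }

    static void WriteMessage(Stream stdout, string json)
    {
        byte[] payload = Encoding.UTF8.GetBytes(json);
        stdout.Write(BitConverter.GetBytes(payload.Length), 0, 4);
        stdout.Write(payload, 0, payload.Length);
        stdout.Flush();
    }
}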