Get web page contents from Firefox in a C# program

I need to write a simple C# app that should receive the entire contents of a web page currently opened in Firefox. Is there any way to do it directly from C#? If not, is it possible to develop some kind of plug-in that would transfer the page contents? As I am a total newbie in Firefox plug-in programming, I'd really appreciate any info that gets me started quickly. Maybe there are some sources I can use as a reference? Doc links? Recommendations?
UPD: I actually need to communicate with a Firefox instance, not get the contents of a web page from a given URL.

It would help if you elaborated on what you are trying to achieve. Maybe plugins already out there, such as Firebug, can help.
Anyway, if you really want to develop both the plugin and the C# application, check out this tutorial on Firefox extensions:
http://robertnyman.com/2009/01/24/how-to-develop-a-firefox-extension/
Otherwise, you can use the WebRequest or HttpWebRequest class in .NET to get the HTML source of any URL.
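For instance, a minimal sketch of that approach (example.com is a placeholder, not a URL from the question):
using System;
using System.IO;
using System.Net;

class Fetch
{
    static void Main()
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://example.com/");
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd()); // the page's HTML source
        }
    }
}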

I think you'd almost certainly need to write a Firefox plugin for that. However, there are certainly ways to request a web page and receive its HTML response within C#; it depends on what your requirements are.
If your requirement is simply to receive the source of any website, leave a comment and I'll point you towards the code.

// Note: _UserAgent, _RequestTimeout, _CookieContainer and doc are fields
// defined elsewhere in the original sample this snippet was taken from.
Uri uri = new Uri(url);
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(uri.AbsoluteUri);
req.AllowAutoRedirect = true;
req.MaximumAutomaticRedirections = 3;
//req.UserAgent = _UserAgent; //"Mozilla/6.0 (MSIE 6.0; Windows NT 5.1; Searcharoo.NET)";
req.KeepAlive = true;
req.Timeout = _RequestTimeout * 1000; //prefRequestTimeout
// SIMONJONES http://codeproject.com/aspnet/spideroo.asp?msg=1421158#xx1421158xx
req.CookieContainer = new System.Net.CookieContainer();
req.CookieContainer.Add(_CookieContainer.GetCookies(uri));
System.Net.HttpWebResponse webresponse = null;
try
{
    webresponse = (System.Net.HttpWebResponse)req.GetResponse();
}
catch (Exception ex)
{
    webresponse = null;
    Console.Write("request for url failed: {0} {1}", url, ex.Message);
}
if (webresponse != null)
{
    webresponse.Cookies = req.CookieContainer.GetCookies(req.RequestUri);
    // Handle cookies (needed in case we have any session cookies).
    foreach (System.Net.Cookie retCookie in webresponse.Cookies)
    {
        bool cookieFound = false;
        foreach (System.Net.Cookie oldCookie in _CookieContainer.GetCookies(uri))
        {
            if (retCookie.Name.Equals(oldCookie.Name))
            {
                oldCookie.Value = retCookie.Value;
                cookieFound = true;
            }
        }
        if (!cookieFound)
        {
            _CookieContainer.Add(retCookie);
        }
    }
    string enc = "utf-8"; // default
    if (webresponse.ContentEncoding != String.Empty)
    {
        // Use the HTTP header Content-Type in preference to the one set in META
        doc.Encoding = webresponse.ContentEncoding;
    }
    else if (doc.Encoding == String.Empty)
    {
        doc.Encoding = enc; // default
    }
    //http://www.c-sharpcorner.com/Code/2003/Dec/ReadingWebPageSources.asp
    using (System.IO.StreamReader stream = new System.IO.StreamReader(
        webresponse.GetResponseStream(), System.Text.Encoding.GetEncoding(doc.Encoding)))
    {
        string html = stream.ReadToEnd(); // read the body before closing the response
        // ... use html ...
    }
    webresponse.Close();
}

This does what you want.
using System.Net;
var cli = new WebClient();
string data = cli.DownloadString("http://www.heise.de");
Console.WriteLine(data);

Native messaging enables an extension to exchange messages with a native application installed on the user's computer.
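For the plug-in route, here is a minimal sketch of what the native application side could look like in C#. This is a hedged example, not Firefox's official sample: it assumes the extension and the host manifest are registered separately, and it only implements the WebExtensions framing, where every message is UTF-8 JSON preceded by a 4-byte native-endian length on stdin/stdout.
using System;
using System.IO;
using System.Text;

class NativeMessagingHost
{
    // Reads one length-prefixed JSON message from the extension, or null at end of stream.
    static string ReadMessage(Stream stdin)
    {
        byte[] lenBuf = new byte[4];
        if (stdin.Read(lenBuf, 0, 4) != 4) return null; // extension closed the pipe
        int len = BitConverter.ToInt32(lenBuf, 0);
        byte[] body = new byte[len];
        int read = 0;
        while (read < len) read += stdin.Read(body, read, len - read);
        return Encoding.UTF8.GetString(body);
    }

    // Writes one length-prefixed JSON message back to the extension.
    static void WriteMessage(Stream stdout, string json)
    {
        byte[] body = Encoding.UTF8.GetBytes(json);
        stdout.Write(BitConverter.GetBytes(body.Length), 0, 4);
        stdout.Write(body, 0, body.Length);
        stdout.Flush();
    }

    static void Main()
    {
        using (Stream stdin = Console.OpenStandardInput())
        using (Stream stdout = Console.OpenStandardOutput())
        {
            string msg;
            while ((msg = ReadMessage(stdin)) != null)
            {
                // msg would carry the page contents sent by the extension;
                // here we just acknowledge receipt.
                WriteMessage(stdout, "{\"status\":\"received\"}");
            }
        }
    }
}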

Related

Recover from a NameResolutionFailure error during an HttpWebRequest

In our Outlook COM add-in, we're making an API call to our server using the .NET HttpWebRequest class. One of our customers is running into a System.Net.WebException with the message "The remote name could not be resolved: 'mydomain.com'" and WebExceptionStatus.NameResolutionFailure as the status. All the users from this particular customer's company are using Outlook/the add-in from behind a VPN, so we are piggy-backing on the proxy configuration from IE in order to make the request.
Our API calls work for a period of time but then it randomly disconnects and then does not allow future requests to go through either. Once the users close and restart Outlook though, it seems to work just fine again without changing any network configuration or reconnecting to wifi etc.
Other posts like this suggested retrying with a sleep in between. We have added a retry mechanism with 3 attempts, each with a sleep in between, but that has not resolved the intermittent issue.
Our domain is hooked up to an AWS Classic Load Balancer so mydomain.com actually resolves a CNAME record to an AWS static domain, pointing to the ELB. I'm not sure if that would have any impact on the request or routing it.
The strange part is we also have a web browser component that loads a web page in a sidebar from the exact same domain as the API calls. It works perfectly and loads a URL from the same domain. The users can also load the URL in their browsers without any issues. It just appears that the HTTPWebRequest is running into the domain resolution issue. We've checked that it's not just a matter of a weak wifi signal. Since they are able to use IE which has the same proxy config to access the site just fine, I don't think it's that.
We're at a loss for how to gracefully recover and have the request try again. I've looked into suggestions from this answer and this other answer; we'll be trying those next. Unfortunately, we are not able to make the requests use direct IP addresses as some of the other answers suggest. That also eliminates the ability to edit the hosts file to point straight to it. The reason is we can't assign a static IP on a classic ELB.
We're considering trying to set the host to use the CNAME record from AWS directly but this is going to cause SSL errors as it doesn't have a valid cert for the CNAME entry. Is there a way to get around that by masking it via a header, similar to the IP approach?
Feel free to ask for more information, I will do my best to provide it.
Any suggestions on what to try / troubleshoot are welcome!
Update: We’re targeting .NET v4.5
Here's the code
// uriParam, acceptParam, cookieParam, dataParam, CertificateCheck and Log
// are supplied by the surrounding (omitted) method and class.
var result = string.Empty;
bool retrying = false;
int retries = 0;
HttpWebRequest webRequest = null;
try
{
    ServicePointManager.ServerCertificateValidationCallback = CertificateCheck;
    ServicePointManager.MaxServicePoints = 4;
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
retry:
    webRequest = (HttpWebRequest)WebRequest.Create(uriParam);
    webRequest.Timeout = 120000;
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.Accept = acceptParam;
    webRequest.Headers.Add("Cookie", cookieParam);
    webRequest.UseDefaultCredentials = true;
    webRequest.Proxy = null;
    webRequest.KeepAlive = true; //default
    webRequest.ServicePoint.ConnectionLeaseTimeout = webRequest.Timeout;
    webRequest.ServicePoint.MaxIdleTime = webRequest.Timeout;
    webRequest.ContentLength = dataParam.Length;
    using (var reqStream = webRequest.GetRequestStream())
    {
        reqStream.Write(dataParam, 0, dataParam.Length);
        reqStream.Flush();
        reqStream.Close();
    }
    try
    {
        using (WebResponse webResponse = webRequest.GetResponse())
        {
            using (var responseStream = webResponse.GetResponseStream())
            {
                if (responseStream != null)
                {
                    using (var reader = new StreamReader(responseStream))
                    {
                        result = reader.ReadToEnd();
                    }
                }
            }
            webResponse.Close();
        }
    }
    catch (WebException)
    {
        if (retrying && retries == 3)
        {
            //don't retry any more
            return string.Empty;
        }
        retrying = true;
        retries++;
        webRequest.Abort();
        System.Threading.Thread.Sleep(2000);
        goto retry;
    }
}
catch (Exception ex)
{
    Log.Error(ex);
    result = string.Empty;
}
finally
{
    webRequest?.Abort();
}
return result;
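One diagnostic worth adding around the catch (a hedged sketch, not a verified fix): probe DNS directly with Dns.GetHostEntry, which does a plain OS-level lookup independent of the IE proxy path the request piggy-backs on. If the lookup succeeds while the request still fails with NameResolutionFailure, the proxy configuration is the likelier culprit. "mydomain.com" below stands in for the real host.
try
{
    var entry = System.Net.Dns.GetHostEntry("mydomain.com");
    Console.WriteLine("resolved to {0}", entry.AddressList[0]); // DNS itself is fine
}
catch (System.Net.Sockets.SocketException se)
{
    Console.WriteLine("OS-level DNS lookup failed too: {0}", se.Message);
}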

C#: HttpWebRequest POST data not working

I am developing a C# WPF application that has the functionality of logging into my website and downloading a file. The website has an Authorize attribute on its action. I need two cookies to be able to download the file: the first cookie is for logging in, and the second cookie (which is provided after a successful log-in) is for downloading the file. So I came up with the flow of keeping my cookies across my HttpWebRequest/HttpWebResponse calls. I am looking at my posting flow, as maybe that is the problem. Here is my code.
void externalloginanddownload()
{
    string pageSource = string.Empty;
    CookieContainer cookies = new CookieContainer();
    HttpWebRequest getrequest = (HttpWebRequest)WebRequest.Create("login uri");
    getrequest.CookieContainer = cookies;
    getrequest.Method = "GET";
    getrequest.AllowAutoRedirect = false;
    HttpWebResponse getresponse = (HttpWebResponse)getrequest.GetResponse();
    using (StreamReader sr = new StreamReader(getresponse.GetResponseStream()))
    {
        pageSource = sr.ReadToEnd();
    }
    var values = new NameValueCollection
    {
        {"Username", "username"},
        {"Password", "password"},
        {"Remember me?", "False"},
    };
    var parameters = new StringBuilder();
    foreach (string key in values.Keys)
    {
        parameters.AppendFormat("{0}={1}&",
            HttpUtility.UrlEncode(key),
            HttpUtility.UrlEncode(values[key]));
    }
    parameters.Length -= 1;
    HttpWebRequest postrequest = (HttpWebRequest)WebRequest.Create("login uri");
    postrequest.CookieContainer = cookies;
    postrequest.Method = "POST";
    using (var writer = new StreamWriter(postrequest.GetRequestStream()))
    {
        writer.Write(parameters.ToString());
    }
    using (WebResponse response = postrequest.GetResponse()) // the error 500 occurs here
    {
        using (var streamReader = new StreamReader(response.GetResponseStream()))
        {
            string html = streamReader.ReadToEnd();
        }
    }
}
When you get the WebResponse, the cookies returned will be in the response, not in the request (oddly enough, even though you set the CookieContainer on the request).
You will need to add the cookies from the response object to your CookieContainer, so they get sent on the next request.
One simple way:
foreach (Cookie cookie in getresponse.Cookies)
    cookies.Add(cookie);
Since response.Cookies is already a CookieCollection, you can also do this (it might help to check for null in case all cookies were already there):
if (response.Cookies != null) cookies.Add(response.Cookies);
You may also have trouble with your POST as you need to set ContentType and length:
myWebRequest.ContentLength = parameters.Length;
myWebRequest.AllowWriteStreamBuffering = true;
If you have any multibyte characters to think about, you may have to address that as well by setting the encoding to UTF-8 on the request and the stringbuilder, and converting string to bytes and using that length.
Another tip: some web server code chokes if there is no user agent. Try:
myWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
And just in case you have any multibyte characters, it is better to do this:
var databytes = System.Text.Encoding.UTF8.GetBytes(parameters.ToString());
myWebRequest.ContentLength = databytes.Length;
myWebRequest.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
using (var stream = myWebRequest.GetRequestStream())
{
stream.Write(databytes, 0, databytes.Length);
}
On the server side (a C# Web API application), enable C++ exceptions and Common Language Runtime exceptions in the debugger's Exception Settings (Ctrl+Alt+E) to see which server-side exception is actually thrown.
First check that the posted data binds properly; after that you can see the exact exception. An Internal Server Error is mostly thrown when the data is not in the correct format or an exception is not properly handled.

How to fake an HTTP POST?

I am using ASP.NET MVC and I want to fake an HTTP POST to see what would happen. Is there any software that I can use?
I believe that Fiddler allows you to do this, plus a whole lot more.
I only use it for reviewing what's going to/from the server when dealing with AJAX-induced headaches, but I'm fairly sure you can use it to re-issue HTTP requests, both as they were originally and modified, which should fit the bill for you.
string var1 = "Foo";
string var2 = "Bar";
ASCIIEncoding encoding = new ASCIIEncoding();
string post = "var1=" + var1 + "&var2=" + var2;
byte[] bites = encoding.GetBytes(post);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://Url/PageToPostTo.aspx");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = bites.Length;
Stream s = request.GetRequestStream();
s.Write(bites, 0, bites.Length);
s.Close();
I like TamperData, a Firefox addon.
Here is some JavaScript for you:
function makeRequest(message, url, responseFunction) {
    var http_request;
    if (window.XMLHttpRequest) { // Mozilla, Safari, ...
        http_request = new XMLHttpRequest();
        if (http_request.overrideMimeType) {
            // set type according to the anticipated content type
            //http_request.overrideMimeType('text/xml');
            http_request.overrideMimeType('text/html');
        }
    }
    else if (window.ActiveXObject) { // IE
        try {
            http_request = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
            try {
                http_request = new ActiveXObject("Microsoft.XMLHTTP");
            } catch (e) {}
        }
    }
    http_request.onreadystatechange = responseFunction;
    http_request.open("POST", url);
    http_request.setRequestHeader("Content-Type", "text/plain;charset=UTF-8");
    http_request.send(message);
}
Charles has the ability to capture any HTTP requests and responses, and allows you to save sessions and edit/repeat them with ease. Worth a try to see if it's to your liking.
The open source project below allows you to fake external web services in your acceptance tests.
It supports the common HTTP verbs GET, POST, DELETE and PUT:
http://www.nuget.org/packages/boomerang/
https://github.com/garfieldmoore/Boomerang

Why is this WebRequest code slow?

I requested 100 pages that all return 404. I wrote:
{
    var s = DateTime.Now;
    for (int i = 0; i < 100; i++)
        DL.CheckExist("http://google.com/lol" + i.ToString() + ".jpg");
    var e = DateTime.Now;
    var d = e - s;
    d = d; // no-op
    Console.WriteLine(d);
}
static public bool CheckExist(string url)
{
    HttpWebRequest wreq = null;
    HttpWebResponse wresp = null;
    bool ret = false;
    try
    {
        wreq = (HttpWebRequest)WebRequest.Create(url);
        wreq.KeepAlive = true;
        wreq.Method = "HEAD";
        wresp = (HttpWebResponse)wreq.GetResponse();
        ret = true;
    }
    catch (System.Net.WebException)
    {
    }
    finally
    {
        if (wresp != null)
            wresp.Close();
    }
    return ret;
}
Two runs show it takes 00:00:30.7968750 and 00:00:26.8750000. Then I tried Firefox, using the following code:
<html>
<body>
<script type="text/javascript">
for(var i=0; i<100; i++)
document.write("<img src=http://google.com/lol" + i + ".jpg><br>");
</script>
</body>
</html>
Timing it by hand, it took roughly 4 seconds, which is 6.5-7.5× faster than my app. I plan to scan through thousands of files, so taking 3.75 hours instead of 30 minutes would be a big problem. How can I make this code faster? I know someone will say Firefox caches the images, but I want to point out that 1) it still needs to check the headers from the remote server to see if they have been updated (which is what I want my app to do), and 2) I am not receiving the body; my code should only be requesting the header. So, how do I solve this?
I noticed that an HttpWebRequest hangs on the first request. I did some research and what seems to be happening is that the request is configuring or auto-detecting proxies. If you set
request.Proxy = null;
on the web request object, you might be able to avoid an initial delay.
With proxy auto-detect:
using (var response = (HttpWebResponse)request.GetResponse()) //6,956 ms
{
}
Without proxy auto-detect:
request.Proxy = null;
using (var response = (HttpWebResponse)request.GetResponse()) //154 ms
{
}
Change your code to the asynchronous GetResponse; the synchronous version is essentially a blocking wrapper over the async pair:
public override WebResponse GetResponse()
{
    // ...
    IAsyncResult asyncResult = BeginGetResponse(null, null);
    // ...
    return EndGetResponse(asyncResult);
}
Async Get
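A hedged sketch of the callback form this alludes to, using BeginGetResponse/EndGetResponse so several HEAD checks can be in flight at once (the URL is just one of the question's test URLs):
var req = (HttpWebRequest)WebRequest.Create("http://google.com/lol0.jpg");
req.Method = "HEAD";
req.BeginGetResponse(ar =>
{
    var r = (HttpWebRequest)ar.AsyncState;
    try
    {
        using (var resp = r.EndGetResponse(ar)) { /* got a 2xx: the file exists */ }
    }
    catch (WebException) { /* e.g. 404: it doesn't */ }
}, req);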
Probably Firefox issues multiple requests at once, whereas your code does them one by one. Perhaps adding threads will speed up your program.
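A minimal sketch of that idea, assuming .NET 4's Parallel.For and the CheckExist method from the question. Note that HttpWebRequest allows only two concurrent connections per host by default, so the connection limit has to be raised as well:
using System.Net;
using System.Threading.Tasks;

ServicePointManager.DefaultConnectionLimit = 8; // default is 2 per host

Parallel.For(0, 100, new ParallelOptions { MaxDegreeOfParallelism = 8 }, i =>
{
    DL.CheckExist("http://google.com/lol" + i + ".jpg");
});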
The answer was changing HttpWebRequest/HttpWebResponse to just WebRequest/WebResponse. That fixed the problem.
Have you tried opening the same URL in IE on the machine your code is deployed to? If it is a Windows Server machine, then sometimes it's because the URL you're requesting is not in IE's list of trusted sites (whose settings HttpWebRequest picks up). You'll just need to add it.
Do you have more info you could post? I've been doing something similar and have run into tons of problems with HttpWebRequest before, all unique, so more info would help.
BTW, calling it using the async methods won't really help in this case. It doesn't shorten the download time; it just doesn't block your calling thread, that's all.
Close the response stream when you are done: in your CheckExist(), add wresp.Close() after wresp = (HttpWebResponse)wreq.GetResponse();
OK, if you are getting status code 404 for all web pages, then it is due to not specifying credentials, so you need to add:
wreq.Credentials = CredentialCache.DefaultCredentials;
Then you may also come across status code 500; for that you need to specify a user agent, which looks something like the line below:
wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
"A WebClient instance does not send optional HTTP headers by default. If your request requires an optional header, you must add the header to the Headers collection. For example, to retain queries in the response, you must add a user-agent header. Also, servers may return 500 (Internal Server Error) if the user agent header is missing."
reference: https://msdn.microsoft.com/en-us/library/system.net.webclient(v=vs.110).aspx
To improve the performance of the HttpWebRequest you need to add
wreq.Proxy = null;
Now the code will look like:
static public bool CheckExist(string url)
{
    HttpWebRequest wreq = null;
    HttpWebResponse wresp = null;
    bool ret = false;
    try
    {
        wreq = (HttpWebRequest)WebRequest.Create(url);
        wreq.Credentials = CredentialCache.DefaultCredentials;
        wreq.Proxy = null;
        wreq.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
        wreq.KeepAlive = true;
        wreq.Method = "HEAD";
        wresp = (HttpWebResponse)wreq.GetResponse();
        ret = true;
    }
    catch (System.Net.WebException)
    {
    }
    finally
    {
        if (wresp != null)
            wresp.Close();
    }
    return ret;
}
Setting the cookie matters: you must add AspxAutoDetectCookieSupport=1, as in this code (target is the request's Uri):
req.CookieContainer = new CookieContainer();
req.CookieContainer.Add(new Cookie("AspxAutoDetectCookieSupport", "1") { Domain = target.Host });

Communicating with the web through a C# app?

Although I can grasp the concepts of the .NET Framework and Windows apps, I want to create an app that will involve simulating website clicks and getting data/response times from that page. I have not had any experience with the web yet, as I'm only a junior. Could someone explain to me (in English!!) the basic concepts, or show with examples the different ways and classes that could help me communicate with a website?
What do you want to do?
To send a request and grab the response in a String so you can process it, HttpWebRequest and HttpWebResponse will work.
If you need to connect through TCP/IP, FTP or something other than HTTP, then you need the more generic WebRequest and WebResponse (sketched below).
All four classes above are in the System.Net namespace.
If you want to build a service on the web side that you can consume, then today in .NET choose and work with WCF (RESTful style).
Hope it helps you find your way :)
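As a hedged sketch of that generic WebRequest path with an FTP URI (the host, path and credentials are placeholders):
using System;
using System.IO;
using System.Net;

class FtpExample
{
    static void Main()
    {
        // WebRequest.Create returns an FtpWebRequest for ftp:// URIs
        FtpWebRequest ftpReq = (FtpWebRequest)WebRequest.Create("ftp://example.com/file.txt");
        ftpReq.Method = WebRequestMethods.Ftp.DownloadFile;
        ftpReq.Credentials = new NetworkCredential("user", "password");
        using (FtpWebResponse ftpResp = (FtpWebResponse)ftpReq.GetResponse())
        using (StreamReader reader = new StreamReader(ftpResp.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd()); // the file's contents
        }
    }
}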
As an example using HttpWebRequest and HttpWebResponse, maybe some code will help you understand better.
Case: send a request to a URL and get the response; it's like clicking on the URL and grabbing all the HTML code that will be there after the click:
private void btnSendRequest_Click(object sender, EventArgs e)
{
    textBox1.Text = "";
    try
    {
        String queryString = "user=myUser&pwd=myPassword&tel=+123456798&msg=My message";
        byte[] requestByte = Encoding.Default.GetBytes(queryString);
        // build our request
        WebRequest webRequest = WebRequest.Create("http://www.sendFreeSMS.com/");
        webRequest.Method = "POST";
        webRequest.ContentType = "application/xml";
        webRequest.ContentLength = requestByte.Length;
        // create our stream to send
        Stream webDataStream = webRequest.GetRequestStream();
        webDataStream.Write(requestByte, 0, requestByte.Length);
        // get the response from our stream
        WebResponse webResponse = webRequest.GetResponse();
        webDataStream = webResponse.GetResponseStream();
        // convert the result into a String
        StreamReader webResponseSReader = new StreamReader(webDataStream);
        String responseFromServer = webResponseSReader.ReadToEnd().Replace("\n", "").Replace("\t", "");
        // close everything
        webResponseSReader.Close();
        webResponse.Close();
        webDataStream.Close();
        // You now have the HTML in the responseFromServer variable, use it :)
        textBox1.Text = responseFromServer;
    }
    catch (Exception ex)
    {
        textBox1.Text = ex.Message;
    }
}
The code does not work because the URL is fictitious, but you get the idea. :)
You could use the System.Net.WebClient class of the .NET Framework. See the MSDN documentation here.
Simple example:
using System;
using System.Net;
using System.IO;

public class Test
{
    public static void Main(string[] args)
    {
        if (args == null || args.Length == 0)
        {
            throw new ApplicationException("Specify the URI of the resource to retrieve.");
        }
        WebClient client = new WebClient();
        // Add a user agent header in case the
        // requested URI contains a query.
        client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        Stream data = client.OpenRead(args[0]);
        StreamReader reader = new StreamReader(data);
        string s = reader.ReadToEnd();
        Console.WriteLine(s);
        data.Close();
        reader.Close();
    }
}
There are other useful methods of the WebClient, which allow developers to download and save resources from a specified URI.
The DownloadFile() method for example will download and save a resource to a local file. The UploadFile() method uploads and saves a resource to a specified URI.
UPDATE:
WebClient is simpler to use than WebRequest. Normally you could stick to using just WebClient unless you need to manipulate requests/responses in an advanced way. See this article where both are used: http://odetocode.com/Articles/162.aspx
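For instance, a minimal sketch of DownloadFile (the URL and local path are placeholders):
using System.Net;

var client = new WebClient();
// Fetches the resource at the URL and writes it straight to the local file.
client.DownloadFile("http://example.com/report.pdf", @"C:\temp\report.pdf");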
