Get Html source of current page in C# Windows Forms App - c#

I am creating an Internet Explorer add-on using BandObjects and a C# Windows Forms application, and am testing out parsing HTML source code. So far I have been parsing information based on the URL of the site.
I would like to get the HTML source of the current page of an example site that requires a login. If I use the URL of the page I am on, my app always grabs the source of the login page rather than the actual page, because it doesn't know that I have logged in. Would I need to store my login credentials for the site using some kind of API? Or is there a way to grab the HTML of the current page regardless? I would prefer the latter, as it seems like less trouble. Thanks!

I use this method in one of my apps:
private static string RetrieveData(string url)
{
    // used to build entire input
    var sb = new StringBuilder();
    // used on each read operation
    var buf = new byte[8192];
    try
    {
        // prepare the web page we will be asking for
        var request = (HttpWebRequest)WebRequest.Create(url);
        /* Using the proxy class to access the site
         * Uri proxyURI = new Uri("http://proxy.com:80");
         * request.Proxy = new WebProxy(proxyURI);
         * request.Proxy.Credentials = new NetworkCredential("proxyuser", "proxypassword"); */
        // execute the request; the using blocks release the connection when done
        using (var response = (HttpWebResponse)request.GetResponse())
        // we will read data via the response stream
        using (Stream resStream = response.GetResponseStream())
        {
            int count;
            do
            {
                // fill the buffer with data
                count = resStream.Read(buf, 0, buf.Length);
                // make sure we read some data
                if (count != 0)
                {
                    // translate from bytes to ASCII text (non-ASCII bytes become '?')
                    sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                }
            } while (count > 0); // any more data to read?
        }
    }
    catch (Exception exception)
    {
        MessageBox.Show(@"Failed to retrieve data from the network. Please check your internet connection: " +
            exception);
    }
    return sb.ToString();
}
You just have to pass the URL (including the scheme, e.g. http://) of the web page whose source you want to retrieve.
For example:
string htmlSourceGoogle = RetrieveData("http://www.google.com");
Note: You can uncomment the proxy configuration if you use a proxy to access the internet. Replace the proxy address, username and password with the ones you use.
For logging in via code, check this: Login to website, via C#
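Since the logged-in state normally lives in cookies, a request made from your own code only sees the logged-in page if it carries the same session cookie. Here is a minimal sketch of that idea, assuming the site uses a plain form login. The class name SessionSketch, the example.com URLs and the form field names in the usage comment are hypothetical placeholders, not taken from the question.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class SessionSketch
{
    // Creates a request that shares the given cookie container, so cookies
    // stored by an earlier response (e.g. the login) are sent automatically.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // POSTs an already form-encoded body and returns the response text.
    public static string Post(string url, string formBody, CookieContainer cookies)
    {
        HttpWebRequest request = CreateRequest(url, cookies);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] data = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = data.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(data, 0, data.Length);
        }
        return ReadResponse(request);
    }

    // GETs a page, sending whatever cookies the container already holds.
    public static string Get(string url, CookieContainer cookies)
    {
        return ReadResponse(CreateRequest(url, cookies));
    }

    private static string ReadResponse(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

// Hypothetical usage (URLs and field names are placeholders):
// var cookies = new CookieContainer();
// SessionSketch.Post("http://example.com/login",
//     "username=" + WebUtility.UrlEncode("me") + "&password=" + WebUtility.UrlEncode("secret"),
//     cookies);
// string html = SessionSketch.Get("http://example.com/members", cookies);
```

The one CookieContainer instance is what ties the two requests together. Whether this is enough depends on how the site's login actually works (hidden form fields, redirects and so on).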

Related

Using POST to communicate between controller methods on two separate servers

I'm using C# 6.0, ASP.NET 4.5 MVC 4.
I'm developing an API that is essentially a wrapper for another API that generates PDFs. A separate server will be implementing it directly, and all other applications will send their data to this server for conversion. The underlying PDF conversion software has specific system requirements, so this will free us from restrictions on which machines our applications can run on. It's also somewhat brittle, so isolating it is desirable.
To accomplish this I've set up two separate MVC applications, one with the conversion implementation, the other as a simple application that generates data to be converted, which implements the API I'm developing. They're set up to exchange data using POST.
The problem I've run into is that the PDF server isn't receiving the data to be converted. It runs, but its parameter only contains null. I set it up so that it will return a PDF containing the error if this happens. It comes through successfully, containing the resulting error message it generated so that part of it is functioning properly.
Here's the code running on the PDF server:
[HttpPost]
public FileResult MakePdf(string html)
{
byte[] pdf = null;
var converter = new HtmlToPdfConverter();
try
{
pdf = converter.GeneratePdf(html);
}
catch (Exception e)
{
Debug.WriteLine(e.Message);
var errorHtml = errorTop + new Regex("\\s").Replace(e.Message, " ") + errorBottom;
pdf = converter.GeneratePdf(errorHtml);
}
return File(pdf, "application/pdf");
}
Here's the code that's sending the HTML there to be converted:
public byte[] Fetch() {
var webRequest = (HttpWebRequest)WebRequest.Create("http://localhost:60272/PdfServer/MakePdf");
webRequest.Method = "POST";
var encoder = new UTF8Encoding();
byte[] data = encoder.GetBytes(Resource); // Resource contains valid HTML output by ASP.NET
webRequest.ContentLength = data.Length;
webRequest.ContentType = "text/html";
using (var stream = webRequest.GetRequestStream())
{
stream.Write(data, 0, data.Length);
stream.Flush();
}
using (var webResponse = webRequest.GetResponse())
{
using (Stream responseStream = webResponse.GetResponseStream())
{
using (var memoryStream = new MemoryStream())
{
int bufferLength = 1024;
data = new byte[bufferLength];
int responseLength = 0;
do
{
responseLength = responseStream.Read(data, 0, bufferLength);
memoryStream.Write(data, 0, responseLength);
} while (responseLength != 0);
data = memoryStream.ToArray();
}
}
}
return data;
}
I haven't tried sending data to an ASP.NET MVC controller method from a separate application before. The code I wrote here is based on examples I've found of how it's done.
Any ideas about what I'm doing wrong?
Try form-encoding it: use "application/x-www-form-urlencoded" as the content type and name the string data html. So it would look something like:
var s = "html=" + WebUtility.UrlEncode(Resource);
Then send s instead of Resource, and of course set the content type to "application/x-www-form-urlencoded". This should help MVC map the data to the html parameter. The value needs to be URL-encoded because raw HTML contains characters such as & and = that would otherwise be read as form delimiters.
That's the only thing I can think of.
On a side note, I think you should also Close() your stream when you're done, rather than just flushing it. (The using block in your code already disposes the stream, which closes it.)
===
A final idea would be to try changing your content type from text/html to text/plain. I know you're thinking it's HTML, but your method takes a string. So MVC is expecting a string, not HTML; the fact that it's actually HTML is incidental to the MVC deserializer.
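The form-encoding suggestion can be sketched as a small helper. This is an illustration rather than the asker's actual code: FormPostSketch and BuildFormBody are hypothetical names, and it assumes .NET 4.5 or later for WebUtility.UrlEncode. Note also that ASP.NET request validation may reject form values that look like markup, so the receiving action may need [ValidateInput(false)]; treat that as something to check rather than a given.

```csharp
using System;
using System.Net;
using System.Text;

public static class FormPostSketch
{
    // Builds a single "name=value" pair for an
    // application/x-www-form-urlencoded request body.
    // Both parts are URL-encoded so '&', '=', '+' and '<' in the value
    // cannot be mistaken for form delimiters.
    public static string BuildFormBody(string name, string value)
    {
        return WebUtility.UrlEncode(name) + "=" + WebUtility.UrlEncode(value);
    }

    // Converts the encoded body to the bytes written to the request stream.
    public static byte[] ToRequestBytes(string formBody)
    {
        return Encoding.UTF8.GetBytes(formBody);
    }
}

// How this could plug into the Fetch() method from the question:
// byte[] data = FormPostSketch.ToRequestBytes(
//     FormPostSketch.BuildFormBody("html", Resource));
// webRequest.ContentType = "application/x-www-form-urlencoded";
// webRequest.ContentLength = data.Length;
```

The field name "html" matches the parameter name of MakePdf(string html), which is what lets the default model binder fill it in.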

Retrieving website content and returning it in ASP.NET MVC 4

I have two servers. One is a private server that I don't want users to have direct access to, and the other is the server that the public does have access to.
I can access my private server by URL like: http://xxx.xx.xxx.xxx/
What I want to do is create some kind of "proxy" that only works with my private server. My idea is to go to: http://www.domain.com/server/path/here/something
This page should show me the content of http://xxx.xx.xxx.xxx/path/here/something
I have this working, but the only way I could make it work was to return the content as a string, and then the browser would interpret the HTML.
This works fine for pages that return HTML content, but it doesn't work (of course) if I want to access a .gif or any kind of file directly.
Here's the code I currently have:
public string Index(string url)
{
string uri = "http://xxx.xx.xxx.xxx/" + url;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "GET";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader responseStream = new StreamReader(response.GetResponseStream());
string resultado = responseStream.ReadToEnd();
return resultado;
}
How can I change my code so that it works for any file ?
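You can check the response content type and do what you need based on that.
You'll need to change your action to return ActionResult instead of string, and read the response body only after checking the type (ReadToEnd() consumes the stream, so nothing would be left to copy for an image). The content type can also carry a charset suffix, so compare with StartsWith rather than Equals.
if (response.ContentType.StartsWith("text/html"))
{
    // show html stuff
    string resultado = responseStream.ReadToEnd();
    return Content(resultado);
}
else if (response.ContentType.StartsWith("image/"))
{
    var ms = new MemoryStream();
    responseStream.BaseStream.CopyTo(ms);
    var imageBytes = ms.ToArray();
    return File(imageBytes, response.ContentType);
}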
You have to write code that reads the HTML or image data from the response and handles each accordingly. You also need to keep control over which URLs are allowed through.
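One way to handle any file type without branching per type is to pass the upstream bytes and content type straight through. Below is a rough sketch under that assumption. ProxySketch and its method names are hypothetical, and the controller action is shown only as a comment because it depends on ASP.NET MVC.

```csharp
using System;
using System.IO;
using System.Net;

public static class ProxySketch
{
    // Reads a stream to the end and returns its raw bytes, unchanged.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Fetches the upstream resource and reports its bytes and content type.
    public static byte[] Fetch(string uri, out string contentType)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            contentType = response.ContentType;
            return ReadAllBytes(body);
        }
    }
}

// Inside the MVC controller, the action could then look like this:
// public ActionResult Index(string url)
// {
//     string contentType;
//     byte[] bytes = ProxySketch.Fetch("http://xxx.xx.xxx.xxx/" + url, out contentType);
//     return File(bytes, contentType);
// }
```

Because the body is never decoded into a string, binary files such as .gif images arrive intact. The whole response is buffered in memory, which is fine for small files but may not suit large downloads.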

data transmission between two sites

I'm using ASP.NET MVC 4.
I want to send data from my site www.hostname.com to report.hostname2.com as a string from the code-behind, and then go directly to that address.
I can't use a query string, because the string I'm sending is very long.
In other words, I want to send string data to rapor.coskunoglu.net/Pdf and then go directly to that address, so that the PDF appears on screen.
How can I do this?
Thank you, take it easy.
I'm sorry, my English is not good.
EDIT0:
I want to use POST.
sb -> my StringBuilder.
byte[] bytt = Encoding.UTF8.GetBytes(sb.ToString());
WebRequest wr = WebRequest.Create("http://report.hostname2.com/Pdf");
wr.ContentType = "application/x-www-form-urlencoded";
wr.ContentLength = bytt.Length;
wr.Method = "POST";
Stream st = wr.GetRequestStream();
st.Write(bytt, 0, bytt.Length);
st.Close();
After sending the POST, I want to go to report.hostname2.com.
Do you see what I am trying to do?
One way to achieve that is to store the data you want to transmit into some commonly shared database between the two sites and then simply send an id to the other site as a query string so that it could retrieve the data. If you cannot use a shared database then all that's left is standard HTTP protocol means:
GET - query string - impractical in your case if the data is large
POST - generate a form and then submit this form to the remote site - could be a good solution for your case because you are not limited in size
Alternatively you could use a WebClient to POST some data:
StringBuilder sb = ... the data to send
using (var client = new WebClient())
{
var values = new NameValueCollection
{
{ "data", sb.ToString() }
};
byte[] result = client.UploadValues("http://report.hostname2.com/Pdf", values);
}
And then on the remote site you could read the data POST parameter from the request.

Get web page contents from Firefox in a C# program

I need to write a simple C# app that should receive entire contents of a web page currently opened in Firefox. Is there any way to do it directly from C#? If not, is it possible to develop some kind of plug-in that would transfer page contents? As I am a total newbie in Firefox plug-ins programming, I'd really appreciate any info on getting me started quickly. Maybe there are some sources I can use as a reference? Doc links? Recommendations?
UPD: I actually need to communicate with a Firefox instance, not get contents of a web page from a given URL
It would help if you elaborated on what you are trying to achieve. Maybe existing plugins such as Firebug can help.
Anyway, if you really want to develop both a plugin and a C# application:
Check out this tutorial on Firefox extensions:
http://robertnyman.com/2009/01/24/how-to-develop-a-firefox-extension/
Otherwise, you can use the WebRequest or HttpWebRequest class in .NET to get the HTML source of any URL.
I think you'd almost certainly need to write a Firefox plugin for that. However, there are certainly ways to request a web page and receive its HTML response within C#. It depends on what your requirements are.
If your requirements are simply to receive the source from any website, leave a comment and I'll point you towards the code.
Uri uri = new Uri(url);
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(uri.AbsoluteUri);
req.AllowAutoRedirect = true;
req.MaximumAutomaticRedirections = 3;
//req.UserAgent = _UserAgent; //"Mozilla/6.0 (MSIE 6.0; Windows NT 5.1; Searcharoo.NET)";
req.KeepAlive = true;
req.Timeout = _RequestTimeout * 1000; //prefRequestTimeout
// SIMONJONES http://codeproject.com/aspnet/spideroo.asp?msg=1421158#xx1421158xx
req.CookieContainer = new System.Net.CookieContainer();
req.CookieContainer.Add(_CookieContainer.GetCookies(uri));
System.Net.HttpWebResponse webresponse = null;
try
{
webresponse = (System.Net.HttpWebResponse)req.GetResponse();
}
catch (Exception ex)
{
webresponse = null;
Console.Write("request for url failed: {0} {1}", url, ex.Message);
}
if (webresponse != null)
{
webresponse.Cookies = req.CookieContainer.GetCookies(req.RequestUri);
// handle cookies (need to do this incase we have any session cookies)
foreach (System.Net.Cookie retCookie in webresponse.Cookies)
{
bool cookieFound = false;
foreach (System.Net.Cookie oldCookie in _CookieContainer.GetCookies(uri))
{
if (retCookie.Name.Equals(oldCookie.Name))
{
oldCookie.Value = retCookie.Value;
cookieFound = true;
}
}
if (!cookieFound)
{
_CookieContainer.Add(retCookie);
}
}
string enc = "utf-8"; // default
if (webresponse.ContentEncoding != String.Empty)
{
// Use the HttpHeader Content-Type in preference to the one set in META
doc.Encoding = webresponse.ContentEncoding;
}
else if (doc.Encoding == String.Empty)
{
doc.Encoding = enc; // default
}
//http://www.c-sharpcorner.com/Code/2003/Dec/ReadingWebPageSources.asp
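System.IO.StreamReader stream = new System.IO.StreamReader
(webresponse.GetResponseStream(), System.Text.Encoding.GetEncoding(doc.Encoding));
string html = stream.ReadToEnd(); // read the page source before closing the response
stream.Close();
webresponse.Close();
}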
This does what you want.
using System.Net;
var cli = new WebClient();
string data = cli.DownloadString("http://www.heise.de");
Console.WriteLine(data);
Native messaging enables an extension to exchange messages with a native application installed on the user's computer.
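On the C# side, a native messaging host talks to the extension over standard input and output: each message is UTF-8 JSON preceded by a 32-bit length in native byte order. Here is a minimal sketch of that framing. The class and method names are hypothetical, and the extension itself (its manifest and the script that sends the page contents) is not shown.

```csharp
using System;
using System.IO;
using System.Text;

public static class NativeMessagingSketch
{
    // Reads one message: a 4-byte length (native byte order) followed by
    // that many bytes of UTF-8 encoded JSON. Returns null at end of stream.
    public static string ReadMessage(Stream input)
    {
        byte[] header = ReadExactly(input, 4);
        if (header == null) return null;
        int length = BitConverter.ToInt32(header, 0);
        if (length < 0) throw new InvalidDataException("Negative message length.");
        byte[] payload = ReadExactly(input, length);
        if (payload == null) throw new EndOfStreamException("Truncated message.");
        return Encoding.UTF8.GetString(payload);
    }

    // Writes one message using the same length-prefixed framing.
    public static void WriteMessage(Stream output, string json)
    {
        byte[] payload = Encoding.UTF8.GetBytes(json);
        byte[] header = BitConverter.GetBytes(payload.Length);
        output.Write(header, 0, header.Length);
        output.Write(payload, 0, payload.Length);
        output.Flush();
    }

    // Reads exactly count bytes, or returns null if the stream ends first.
    private static byte[] ReadExactly(Stream input, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = input.Read(buffer, offset, count - offset);
            if (read == 0) return null;
            offset += read;
        }
        return buffer;
    }
}

// In the host's Main method the streams would be the process's own:
// using (Stream stdin = Console.OpenStandardInput())
// using (Stream stdout = Console.OpenStandardOutput())
// {
//     string json = NativeMessagingSketch.ReadMessage(stdin);
//     NativeMessagingSketch.WriteMessage(stdout, "{\"received\":true}");
// }
```

The extension would wrap the page contents in a JSON message of its own choosing. Browsers also impose message size limits, so a very large page may need to be sent in pieces.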

Not generating a complete response from a HttpWebResponse object in C#

I am creating a HttpWebRequest object from another aspx page to save the response stream to my data store. The Url I am using to create the HttpWebRequest object has querystring to render the correct output. When I browse to the page using any old browser it renders correctly. When I try to retrieve the output stream using the HttpWebResponse.GetResponseStream() it renders my built in error check.
Why would it render correctly in the browser, but not using the HttpWebRequest and HttpWebResponse objects?
Here is the source code:
Code behind of target page:
protected void Page_Load(object sender, EventArgs e)
{
string output = string.Empty;
if(Request.QueryString["a"] != null)
{
//generate output
output = "The query string value is " + Request.QueryString["a"].ToString();
}
else
{
//generate message indicating the query string variable is missing
output = "The query string value was not found";
}
Response.Write(output);
}
Code behind of page creating HttpWebRequest object
string url = "http://www.mysite.com/mypage.aspx?a=1";
HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
//this if statement was missing from original example
if(User.Length > 0)
{
request.Credentials = new NetworkCredential("myaccount", "mypassword", "mydomain");
request.PreAuthenticate = true;
}
request.UserAgent = Request.UserAgent;
HttpWebResponse response = (HttpWebResponse) request.GetResponse();
Stream resStream = response.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader(resStream, encode, true, 2000);
char[] read = new char[256]; // buffer for each read operation
int count = readStream.Read(read, 0, read.Length);
string str = Server.HtmlEncode(" ");
while (count > 0)
{
// Dumps the 256 characters on a string and displays the string to the console.
string strRead = new string(read, 0, count);
str = str.Replace(str, str + Server.HtmlEncode(strRead.ToString()));
count = readStream.Read(read, 0, 256);
}
// return what was found
result = str.ToString();
resStream.Close();
readStream.Close();
Update
@David McEwing - I am creating the HttpWebRequest with the full page name. The page is still generating the error output. I updated the code sample of the target page to demonstrate exactly what I am doing.
@Chris Lively - I am not redirecting to an error page, I generate a message indicating the query string value was not found. I updated the source code example.
Update 1:
I tried using Fiddler to trace the HttpWebRequest and it did not show up in the Web Sessions history window. Am I missing something in my source code to get a complete web request and response?
Update 2:
I did not include the following section of code in my example, and it was the culprit. I was setting the Credentials property of the HttpWebRequest with a service account instead of my AD account, which was causing the issue.
I updated my source code example
What webserver are you using? I can remember at one point in my past when doing something with IIS there was an issue where the redirect between http://example.com/ and http://example.com/default.asp dropped the query string.
Perhaps run Fiddler (or a protocol sniffer) and see if there is something happening that you aren't expecting.
Also check if passing in the full page name works. If it does the above is almost certainly the problem.
Optionally, you can try using the AllowAutoRedirect property of the HttpWebRequest object.
I need to replace the following line of code:
request.Credentials = new NetworkCredential("myaccount", "mypassword", "mydomain");
with:
request.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;
