Does anyone know how to do this? I thought there would be an easy way to achieve it, but I can't find anything about saving the HTML contents of a WebBrowser control.
You might try something like this:
(Assuming C# 4 and WPF 4)
// Document is exposed as a COM object; dynamic lets us reach into the DOM directly.
dynamic doc = webBrowser.Document;
var htmlText = doc.documentElement.InnerHtml;
Works for me...
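If the goal is actually to save that markup to disk, here is a minimal sketch of that idea (assuming the WebBrowser has finished loading; the output path is just an example):
// requires: using System.IO;
dynamic doc = webBrowser.Document;
string htmlText = doc.documentElement.InnerHtml;
// Write the captured markup out to a file.
File.WriteAllText("page.html", htmlText);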
You should use the HttpWebRequest and HttpWebResponse classes. A simple sample (found on the web, tested, working):
HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create("http://www.[pagename].com");
myWebRequest.Method = "GET";
// Read the entire response body into a string.
HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse();
StreamReader myWebSource = new StreamReader(myWebResponse.GetResponseStream());
string myPageSource = myWebSource.ReadToEnd();
myWebSource.Close();
myWebResponse.Close();
I've been trying to see if I could get timetable data off a school website and make a little application out of it. At the moment this is what I have:
string userInput = "/*My username will be here*/";
string passInput = "/*My password will be here */";
string formUrl = "https://portal.gc.ac.nz/student/index.php/process-login";
string formParams = string.Format("username={0}&password={1}", userInput, passInput);
string cookieHeader;
// POST the login form to establish an authenticated session.
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
os.Write(bytes, 0, bytes.Length);
}
// Capture the session cookie issued by the login response.
WebResponse resp = req.GetResponse();
cookieHeader = resp.Headers["Set-Cookie"];
string pageSource;
string getUrl = "https://portal.gc.ac.nz/student/index.php/timetable";
// Request the timetable page, sending the cookie so the server treats us as logged in.
WebRequest getRequest = WebRequest.Create(getUrl);
getRequest.Headers.Add("Cookie", cookieHeader);
WebResponse getResponse = getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
pageSource = sr.ReadToEnd();
}
I couldn't find a way to check whether the code above works; however, my question is:
How can you access the data (text) you want from the page? I want to get the subject names. Part of the HTML looks like this:
There are a few ways to do this: one would be regex matching to pull the contents out of the tags, and another would be to use the HtmlAgilityPack library, as sketched below.
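A minimal HtmlAgilityPack sketch, assuming the pageSource string downloaded in the code above and a hypothetical class name (subject) on the timetable cells, since the real markup isn't shown here:
// requires the HtmlAgilityPack NuGet package; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(pageSource);
// The XPath below is only a guess; adjust the element name and class to match the real page.
var subjectNodes = doc.DocumentNode.SelectNodes("//td[@class='subject']");
if (subjectNodes != null)
{
    foreach (var node in subjectNodes)
    {
        Console.WriteLine(node.InnerText.Trim());
    }
}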
If you don't need to do it in C#, I would strongly recommend a different language like Python or Perl. It seems to me that you are trying to scrape the data, and in that case I strongly recommend the Scrapy framework for Python if possible. It's the best tool I've encountered for scraping, and you can use XPath to get your data easily. Here is the link to Scrapy's website.
I'm currently trying to read from an HTML file hosted online. My code should read to the end and then change xylosNotice1.Text to the source read from the HTML file.
Here is what I've tried:
WebRequest req = HttpWebRequest.Create("http://www.example.com/UpdateMe/updates.html");
req.Method = "GET";
string source;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
source = reader.ReadToEnd();
}
Console.WriteLine(source);
xylosNotice1.Text = (source);
It doesn't update the textbox with the source.
Sorry guys, stupid mistake. The code needed to be placed in the Main_Load handler; this is resolved.
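For reference, a minimal sketch of that fix, assuming a WinForms form whose Load event is wired to Main_Load and the xylosNotice1 control from the question:
// requires: using System; using System.IO; using System.Net;
private void Main_Load(object sender, EventArgs e)
{
    WebRequest req = WebRequest.Create("http://www.example.com/UpdateMe/updates.html");
    req.Method = "GET";
    string source;
    using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
    {
        source = reader.ReadToEnd();
    }
    // The control exists by the time Load fires, so the text shows up when the form appears.
    xylosNotice1.Text = source;
}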
I want to get plain text using the WebRequest class, just like what we get when we use webbrowser1.Document.Body.InnerText. I have tried the following code:
public string request_Resource()
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
Stream stream = request.GetResponse().GetResponseStream();
StreamReader sr = new StreamReader(stream);
WebBrowser wb = new WebBrowser();
wb.DocumentText = sr.ReadToEnd();
return wb.Document.Body.InnerText;
}
When I execute this I get a NullReferenceException.
Is there a better way to get plain text?
Note: I cannot use the WebBrowser control directly to load the web page, because I don't want to deal with all those events that fire multiple times whenever a page is loaded.
UPDATE: I have changed my code to use the WebClient class instead of WebRequest, as suggested.
My code now looks something like this:
public string request_Resource()
{
WebClient wc = new WebClient();
wc.Proxy = null;
//The user agent header is added to avoid any possible errors
wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C)");
return wc.DownloadString(myurl);
}
I am considering using the HTML Agility Pack; can anyone suggest a better alternative?
You're looking for the HTML Agility Pack, which can parse the HTML without IE.
It has an InnerText property.
To answer your question directly: you get the NullReferenceException because you need to wait for the browser to finish parsing the text before Document.Body is populated.
By the way, you should use the WebClient class instead of WebRequest.
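A minimal sketch of the HtmlAgilityPack approach, assuming the package is installed and myurl is the same variable as in the question:
// requires the HtmlAgilityPack NuGet package; using HtmlAgilityPack;
public string request_Resource()
{
    // Download and parse the page without involving the WebBrowser control.
    var web = new HtmlWeb();
    HtmlDocument doc = web.Load(myurl);
    // InnerText strips the tags, similar to Document.Body.InnerText,
    // though script/style contents are not removed automatically.
    return doc.DocumentNode.InnerText;
}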
Use WebClient:
public string request_Resource()
{
WebClient wc = new WebClient();
byte[] data = wc.DownloadData(myuri);
return Encoding.UTF8.GetString(data);
}
This will give you the content of the website. Then you can use HtmlAgilityPack to parse the result.
If you just need the plain HTML text, then you have already written that code:
public string request_Resource()
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
    // Dispose the response and reader when done so the connection is released.
    using (WebResponse response = request.GetResponse())
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        return sr.ReadToEnd();
    }
}
I want to generate HTML content based on the result returned by an HTTP URL:
http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1c239bjatxn_5taq0&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
This page will give you some XML results. I want to use that XML to generate HTML, but I have no idea where to start. Would someone offer some guidelines or sample code for ASP.NET?
For details: http://www.zillow.com/howto/api/GetDeepSearchResults.htm
To fetch the data you can use the HttpWebRequest class. This is an example I have to hand, but it may be slightly overdone for your needs (and you need to make sure you're doing the right thing; I suspect the call above should be a GET rather than a POST).
Uri baseUri = new Uri(this.RemoteServer);
HttpWebRequest rq = (HttpWebRequest)HttpWebRequest.Create(new Uri(baseUri, action));
rq.Method = "POST";
rq.ContentType = "application/x-www-form-urlencoded";
rq.Accept = "text/xml";
rq.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
Encoding encoding = Encoding.GetEncoding("UTF-8");
byte[] chars = encoding.GetBytes(body);
rq.ContentLength = chars.Length;
using (Stream stream = rq.GetRequestStream())
{
stream.Write(chars, 0, chars.Length);
stream.Close();
}
XDocument doc;
XElement responseXml = null; // declare the element that is returned below
WebResponse rs = rq.GetResponse();
using (Stream stream = rs.GetResponseStream())
{
using (XmlTextReader tr = new XmlTextReader(stream))
{
doc = XDocument.Load(tr);
responseXml = doc.Root;
}
if (responseXml == null)
{
throw new Exception("No response");
}
}
return responseXml;
Once you've got the data back you need to render HTML, and there are lots and lots of choices. If you just want to convert what you've got into HTML with minimal further processing, you can use XSLT (which is a question all on its own); a sketch follows below. If you need to do more with the data, the question is too vague and you'll need to be more specific.
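A minimal sketch of the XSLT route, assuming a hypothetical stylesheet file transform.xslt that you write to map the API's elements to HTML, and the responseXml element returned by the code above:
// requires: using System.Xml; using System.Xml.Xsl;
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load("transform.xslt"); // hypothetical stylesheet you provide
using (XmlReader reader = responseXml.CreateReader())
using (XmlWriter writer = XmlWriter.Create("result.html", transform.OutputSettings))
{
    // Apply the stylesheet to the API response and write the HTML to disk.
    transform.Transform(reader, writer);
}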
Create an XSL stylesheet and inject a stylesheet reference into the resulting XML from the page.
I want the fastest method to download the HTML source for a given URL. Is there any solution, beyond the normal C# options (WebClient download, or HttpWebRequest/HttpWebResponse), that speeds up fetching the HTML source?
I normally just use this function when downloading and viewing HTML:
string getHtml(string url)
{
    HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myWebRequest.Method = "GET";
    // Make the request and read the whole response body.
    using (HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse())
    using (StreamReader myWebSource = new StreamReader(myWebResponse.GetResponseStream()))
    {
        return myWebSource.ReadToEnd();
    }
}
http://www.devasp.net/net/articles/display/994.html
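One tweak that often helps, as a sketch: enable automatic decompression so the server can send a gzip-compressed body, which is usually the easiest win for fetch time. The rest is the same as the function above.
string getHtmlCompressed(string url)
{
    HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myWebRequest.Method = "GET";
    // Ask for gzip/deflate and let the framework decompress transparently.
    myWebRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    using (HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse())
    using (StreamReader myWebSource = new StreamReader(myWebResponse.GetResponseStream()))
    {
        return myWebSource.ReadToEnd();
    }
}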