I'm looking for a method that replicates a web browser's Save Page As function (Save as Type = Text Files) in C#.
Dilemma: I've attempted to use WebClient and HttpWebRequest to download all of the text from a web page. Both methods only return the HTML of the page, which does not include the dynamic content.
Sample code:
string url = @"https://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=" + package.Item2 + "&LOCALE=en";
try
{
    System.Net.ServicePointManager.SecurityProtocol = System.Net.SecurityProtocolType.Tls11 | System.Net.SecurityProtocolType.Tls12;
    using (WebClient client = new WebClient())
    {
        string content = client.DownloadString(url);
    }
}
catch (WebException)
{
    // handle download failures here
}
The above example returns the HTML without the tracking events from the page.
When I display the page in Firefox, right-click on the page, select Save Page As, and save as a Text File, all of the raw text is saved in the file. I would like to mimic this feature.
If you are scraping a web page that shows dynamic content then you basically have 2 options:
Use something to render the page first. The simplest approach in C# is a WebBrowser control: navigate to the page and listen for the DocumentCompleted event. Note that there is some nuance here, since the event fires once per document when a page contains multiple frames (see the first sketch below).
Figure out what service the page is calling to get the extra data, and see whether you can access that directly. It may well be the case that the Canada Post website is calling an API that you can also call directly (see the second sketch below).
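Here is a minimal sketch of the first option, assuming a WinForms project; the class name and the output file page.txt are placeholders:

using System;
using System.IO;
using System.Windows.Forms;

class PageTextSaver
{
    [STAThread]
    static void Main()
    {
        string url = "https://www.canadapost.ca/cpotools/apps/track/..."; // the tracking URL from the question
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted fires once per frame; only act on the top-level document
            if (e.Url != browser.Url) return;
            // The rendered text, roughly what Save Page As (Text Files) captures
            File.WriteAllText("page.txt", browser.Document.Body.InnerText);
            Application.ExitThread();
        };
        browser.Navigate(url);
        Application.Run(); // message loop so the control can load and render the page
    }
}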
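And a sketch of the second option. The endpoint below is purely hypothetical; watch the browser's network tab (F12) while the tracking page loads to discover the real request:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class TrackingApiProbe
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Hypothetical endpoint and parameter names; substitute whatever the page actually calls
            string json = await client.GetStringAsync(
                "https://www.canadapost.ca/api/track?trackingNumber=1234567890");
            Console.WriteLine(json); // the tracking events usually arrive as JSON rather than HTML
        }
    }
}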
Recently, I've been developing a website with .NET/C#, and I've been trying to display images via the image path in the DB. It currently works with this line of code:
return File(imageBytes, "image/jpeg");
The problem is that with this line of code the layout page is completely ignored: only the image on a white background is shown. I need to display the image along with the returned view (the image inside the layout of the website). Thanks.
The reason you're not seeing the layout is that you can only have one type of response per request: either a response containing HTML or a response containing the image.
What you need to do in order to serve images via your application is set up a route that you can put in your <img> tags as the src attribute.
<img src="/static/images/123">
Your route would listen for requests on the /static/images/ path and then try to parse the ID number at the end. It could then take that ID number (123) and look up the relevant image in your database.
So, to be clear, you'd have at least two requests: first you serve the request for the page, then you serve subsequent requests for the image(s). These two request handlers do not share the same code; a sketch follows below.
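A minimal sketch of such a handler, assuming ASP.NET MVC; StaticController, Images, and LoadImageFromDb are placeholder names (with the default {controller}/{action}/{id} route this answers /Static/Images/123):

using System.Web.Mvc;

public class StaticController : Controller
{
    // Serves requests like /static/images/123
    public ActionResult Images(int id)
    {
        byte[] imageBytes = LoadImageFromDb(id); // hypothetical lookup via the image path in your DB
        if (imageBytes == null)
            return HttpNotFound();
        return File(imageBytes, "image/jpeg"); // the same call as before, now in its own request
    }

    private byte[] LoadImageFromDb(int id)
    {
        // ... query your database for the image with this ID
        return null;
    }
}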
Finally, if you really want to "inline" an image as part of the page response, the only way is to base64-encode it and set the result as the src of an <img> tag. This process is slow and bloats your HTML, making the page take longer to load.
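For completeness, a sketch of that approach; imageBytes is assumed to be the byte array from the question, and the view-model property is made up:

// Build a data URI from the image bytes
string dataUri = "data:image/jpeg;base64," + Convert.ToBase64String(imageBytes);
// In the view, something like: <img src="@Model.ImageDataUri" />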
I know that I can access all iframes using the following properties of WebBrowser:
string html = webBrowser1.Document.Window.Frames[0].WindowFrameElement.InnerText;
But I'm struggling with the cross-domain restriction.
My document URL is like www.subdomain1.sport.com/...
And the iframe URL is like www.subdomain2.sport.com/...
How can I access the iframe's content and put some text into an input tag there?
I think you should refer to the following URL to get the content of an IFrame that exists on a cross domain:
http://codecentrix.blogspot.com/2008/02/when-ihtmlwindow2document-throws.html
The HTML from the GeckoWebBrowser.Document.DocumentElement.InnerHtml property differs from the HTML downloaded from the server, because it has been converted to a DOM and any JavaScript on the page may have changed the document structure.
How do I get the real page source?
There's a method for that:
GeckoWebBrowser.ViewSource();
or
GeckoWebBrowser.ViewSource(string url);
Opens a new window which contains the source code for the current (or specified) page. If you just want the text, try:
GeckoWebBrowser.Navigate("view-source:" + url);
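For example, a rough sketch of grabbing that text programmatically (the event and property names vary between GeckoFX versions, so treat them as assumptions to verify):

geckoWebBrowser1.DocumentCompleted += (s, e) =>
{
    // With a view-source: URL, the page body is the raw HTML itself
    string source = geckoWebBrowser1.Document.Body.TextContent;
};
geckoWebBrowser1.Navigate("view-source:" + url);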
I have another way to work around the problem:
GeckoWebBrowser1.Navigate("about:blank")
Me.Delay(1) ' make your function wait 1 second here (Delay is your own helper, not a built-in)
GeckoWebBrowser1.Navigate("view-source:" + "YourUrl")
Hi, I tried to read a page using HttpWebRequest like this:
string lcUrl = "http://www.greatandhra.com";
HttpWebRequest loHttp = (HttpWebRequest)WebRequest.Create(lcUrl);
loHttp.Timeout = 10000; // 10 secs
loHttp.UserAgent = "Code Sample Web Client";
HttpWebResponse loWebResponse = (HttpWebResponse)loHttp.GetResponse();
Encoding enc = Encoding.GetEncoding(1252); // Windows default Code Page
StreamReader loResponseStream = new StreamReader(loWebResponse.GetResponseStream(), enc);
string lcHtml = loResponseStream.ReadToEnd();
mydiv.InnerHtml = lcHtml;
// Response.Write(lcHtml);
loWebResponse.Close();
loResponseStream.Close();
I am able to read that page and bind it to mydiv. But when I click on any of the links in that div, nothing is displayed, because my application doesn't contain the entire site. What should I do now?
Can somebody copy my code and test it, please?
Nagu
I'm fairly sure you can't insert a full page into a DIV without breaking something. In fact, the whole head tag may be getting skipped altogether (and any JavaScript code there may not be run). Considering what you seem to want to do, I suggest you use an IFRAME with a dynamic src, which will also hopefully take some pressure off your server (which would no longer be in charge of fetching the HTML to be mirrored).
If you really want a whole page of HTML embedded in another, then the IFRAME tag is probably the one to use, rather than the DIV.
Rather than having to create a web request and all that code to retrieve the remote page, you can just set the src attribute of the IFRAME to point to the page you want it to display.
For example, something like this in markup:
<iframe src="<%=LcUrl %>" frameborder="0"></iframe>
where LcUrl is a property on your code-behind page that exposes the string lcUrl from your sample.
Alternatively, you could make the IFRAME runat="server" and set its src attribute programmatically (or even inject the innerHTML in a way similar to your code sample if you really wanted to).
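For example, a minimal sketch from the code-behind, assuming the markup contains <iframe id="remoteFrame" runat="server"></iframe> (the id is a placeholder):

remoteFrame.Attributes["src"] = lcUrl; // point the iframe at the remote page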
The code you are putting inside the .InnerHtml of the div contains the entire page (including <html>, <body>, </html> and </body>), which can cause a myriad of problems with any number of browsers.
I would either move to an IFRAME, or consider parsing the HTML of the remote site and displaying a transformed version (i.e. strip the HTML, BODY, and META tags, replace some link URLs, etc.).
But when i click on any one of links in that div it is not displaying any result
Probably because the links in the downloaded page are relative. If you just copy the HTML into a DIV in your page, the browser considers the links relative to the current URL: it doesn't know about the origin of this content. I think the solution is to parse the downloaded HTML and convert the relative URLs in href attributes to absolute URLs.
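For the conversion itself, the Uri class can resolve a relative href against a base address (the relative path below is a made-up example):

Uri baseUri = new Uri("http://www.greatandhra.com");
Uri absolute = new Uri(baseUri, "news/article.html");
// absolute.ToString() -> "http://www.greatandhra.com/news/article.html"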
If you want to embed it, you need to strip everything but the body part. That means you have to parse your string lcHtml for <body...> and remove everything before and including the body tag; you must also strip away everything from </body> onward. Then you need to parse the string for all occurrences of <a href="..."> that do not start with http:// and prepend http://www.greatandhra.com, or set <base href="http://www.greatandhra.com/"> in your head section.
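A minimal sketch of the stripping step, assuming lcHtml holds the downloaded page from the question and the page is well formed:

// Strip everything outside <body>...</body> using plain string searches
int bodyOpen = lcHtml.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
bodyOpen = lcHtml.IndexOf('>', bodyOpen) + 1; // skip past the rest of the opening tag
int bodyClose = lcHtml.IndexOf("</body>", StringComparison.OrdinalIgnoreCase);
string bodyOnly = lcHtml.Substring(bodyOpen, bodyClose - bodyOpen);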
If you don't want to embed, simply clear the response buffer and stream the lcHtml string back to the browser.
It sounds like what you are trying to do is display a different site embedded in your own. For this to work by dropping it into a div, you would have to extract the code between the body tags, as it wouldn't be valid HTML with html and head elements in the middle of another page.
The links won't work because you've now taken that page out of context in your site. You'd also have to rewrite any links on the page that are relative (i.e. don't start with http): either make them point to a page on your site which then fetches the other site's page and displays it back within your site, or prepend the URL of the site you're grabbing so that they link back to that site.