I am trying to create an HTTP request to get the full HTML source of a page, including its CSS styles. I already get the HTML source, but without the CSS styles! Is there any other way to get the full HTML with CSS styles?
// create an instance
WebClient webClient = new WebClient();
// call the HTML page you want to download, and get it as a string
string htmlCode = webClient.DownloadString(uri);
return htmlCode;
Actually, the DownloadString method only gets you the page's markup. If the CSS is written in an external file, you can't get it this way.
Try these links: With CSS - https://www.google.com/
Without CSS - https://www.facebook.com/
Google includes some CSS inline inside its head tags, so you get that; Facebook doesn't.
You will have to request the full CSS path on the website you are trying to get the CSS content from, for example: www.data.com/css/stylex.css
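As a rough sketch of that idea (the site URL is the hypothetical www.data.com from above, and the regex is only illustrative; a real HTML parser would be more robust): download the page first, pull the href out of each stylesheet link, then download those files with separate requests.

using System;
using System.Net;
using System.Text.RegularExpressions;

class CssDownloader
{
    static void Main()
    {
        string uri = "http://www.data.com/"; // hypothetical site, as in the example above
        using (WebClient webClient = new WebClient())
        {
            string htmlCode = webClient.DownloadString(uri);

            // Naive scan for <link ... href="...css"> tags.
            foreach (Match m in Regex.Matches(htmlCode,
                @"<link[^>]+href\s*=\s*[""']([^""']+\.css[^""']*)[""']",
                RegexOptions.IgnoreCase))
            {
                // Resolve relative paths like /css/stylex.css against the page URL.
                Uri cssUri = new Uri(new Uri(uri), m.Groups[1].Value);
                string cssCode = webClient.DownloadString(cssUri);
                Console.WriteLine("Downloaded {0} ({1} chars)", cssUri, cssCode.Length);
            }
        }
    }
}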
I'm looking for a method that replicates a web browser's Save Page As function (Save as Type = Text Files) in C#.
Dilemma: I've attempted to use WebClient and HttpWebRequest to download all the text from a web page. Both methods only return the HTML of the web page, which does not include dynamic content.
Sample code:
string url = @"https://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=" + package.Item2 + "&LOCALE=en";
try
{
System.Net.ServicePointManager.SecurityProtocol = System.Net.SecurityProtocolType.Tls11 | System.Net.SecurityProtocolType.Tls12;
using (WebClient client = new WebClient())
{
string content = client.DownloadString(url);
}
}
catch (WebException ex)
{
// A try block needs a catch or finally to compile; the original omitted it.
Console.WriteLine(ex.Message);
}
The above example returns the HTML without the tracking events from the page.
When I display the page in Firefox, right-click on the page, select Save Page As, and save as a text file, all of the raw text is saved in the file. I would like to mimic this feature.
If you are scraping a web page that shows dynamic content, then you basically have two options:
Use something to render the page first. The simplest option in C# would be a WebBrowser control: listen for the DocumentCompleted event (see the sketch after this list). Note that there is some nuance here, as the event can fire multiple times for a single page when it contains multiple documents (e.g. frames).
Figure out what service the page is calling to get the extra data, and see if you can access that directly. It may well be the case that the Canada Post website is accessing an API that you can also call directly.
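A minimal sketch of the first option. The tracking number 123 is a stand-in for package.Item2 from the question; the stock WebBrowser control needs a Windows Forms message loop, hence the STA thread and Application.Run:

using System;
using System.Windows.Forms;

class RenderedPageGrabber
{
    [STAThread]
    static void Main()
    {
        // 123 is a placeholder tracking number.
        string url = "https://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?trackingNumber=123&LOCALE=en";
        WebBrowser wb = new WebBrowser();
        wb.ScriptErrorsSuppressed = true;
        wb.DocumentCompleted += (s, e) =>
        {
            // DocumentCompleted fires once per document (frames included),
            // so only treat the page as finished when the main URL completes.
            if (e.Url == wb.Url)
            {
                // InnerText approximates "Save Page As > Text Files".
                Console.WriteLine(wb.Document.Body.InnerText);
                Application.ExitThread();
            }
        };
        wb.Navigate(url);
        Application.Run(); // pump messages until ExitThread is called
    }
}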
If I use this
WebClient client = new WebClient();
String htmlCode = client.DownloadString("http://test.net");
I am able to use the Agility Pack to scan the HTML and get most of the tags that I need, but it's missing the HTML that is rendered by the JavaScript.
My question is: how do I get the final rendered page source using C#? Is there something more in WebClient to get the final rendered source after the JavaScript has run?
The HTML Agility Pack alone is not enough to do what you want; you need a JavaScript engine as well. For that, you may want to check out something like GeckoFx, which will allow you to embed a fully functional web browser into your application, and then allow you to programmatically access the contents of the DOM after the page has rendered.
http://code.google.com/p/geckofx/
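A rough sketch only: GeckoFx's API has shifted between releases, so the member names below (Xpcom.Initialize, GeckoWebBrowser, DocumentCompleted, Document.Body.OuterHtml) are assumptions to check against the version you actually install.

using System;
using System.Windows.Forms;
using Gecko; // GeckoFx; namespaces and members vary by version

class GeckoScraper
{
    [STAThread]
    static void Main()
    {
        Xpcom.Initialize("xulrunner"); // path to the Gecko runtime GeckoFx expects (assumption)
        var browser = new GeckoWebBrowser { Dock = DockStyle.Fill };
        var form = new Form();
        form.Controls.Add(browser);
        browser.DocumentCompleted += (s, e) =>
        {
            // The DOM is available here, after the page's scripts have run.
            Console.WriteLine(browser.Document.Body.OuterHtml);
        };
        form.Load += (s, e) => browser.Navigate("http://test.net");
        Application.Run(form);
    }
}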
You need to wrap a browser in your application.
You are in luck! There is a .NET wrapper for WebKit. http://webkitdotnet.sourceforge.net/
You can use the WebBrowser Class from System.Windows.Forms.
using (WebBrowser wb = new WebBrowser())
{
    wb.ScriptErrorsSuppressed = true;
    wb.Navigate("http://test.net");
    while (wb.ReadyState != WebBrowserReadyState.Complete)
        Application.DoEvents(); // pump the message loop until the page loads
    string renderedHtml = wb.DocumentText; // source after scripts have run
}
https://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(v=vs.110).aspx
I know that I can access all iframes using the following properties of the WebBrowser control:
string html = webBrowser1.Document.Window.Frames[0].WindowFrameElement.InnerText;
But I'm struggling with the cross-domain restriction.
My document URL is like www.subdomain1.sport.com/...
And the iframe's URL is like www.subdomain2.sport.com/...
How can I access the iframe's content and put some text into an input tag there?
I think you should refer to the following URL, which covers getting the content of an iframe that lives on a cross domain:
http://codecentrix.blogspot.com/2008/02/when-ihtmlwindow2document-throws.html
We are supplied with HTML 'wrapper' files from the client, which we need to insert our content into, and then render the HTML.
Before we render the HTML with our content inserted, I need to add a few tags to the <head> section of the client's wrapper, such as references to our script files, CSS and some meta tags.
So what I'm doing is
string html = File.ReadAllText(wrapperLocation, Encoding.GetEncoding("iso-8859-1"));
and now I have the complete HTML. I then search for a pre-defined content well in that string and insert our content into that, and render it.
How can I create an instance of an HTML document and modify the <head> section as required?
edit: I don't want to reference System.Windows.Forms so WebBrowser is not an option.
I haven't tried this library myself, but this would probably fit the bill: http://htmlagilitypack.codeplex.com/
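A small sketch of what that could look like with the HTML Agility Pack; the script, css and meta strings are placeholders for the poster's real references:

using System.IO;
using System.Text;
using HtmlAgilityPack;

class WrapperEditor
{
    static string AddHeadTags(string wrapperLocation)
    {
        string html = File.ReadAllText(wrapperLocation, Encoding.GetEncoding("iso-8859-1"));

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find the client's <head> and append our tags to it.
        HtmlNode head = doc.DocumentNode.SelectSingleNode("//head");
        head.AppendChild(HtmlNode.CreateNode("<link rel=\"stylesheet\" href=\"/css/site.css\" />"));
        head.AppendChild(HtmlNode.CreateNode("<script src=\"/js/site.js\"></script>"));
        head.AppendChild(HtmlNode.CreateNode("<meta name=\"robots\" content=\"noindex\" />"));

        return doc.DocumentNode.OuterHtml; // the full document with the modified head
    }
}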
You can use https://github.com/jamietre/CsQuery to edit an HTML DOM.
var dom = CQ.Create(html); // create from an HTML string...
var dom = CQ.CreateFromUrl("http://www.jquery.com"); // ...or, alternatively, from a URL
dom.Select("div > span")
.Eq(1)
.Text("Change the text content of the 2nd span child of each div");
Just select the head and add to it.
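Using the dom object from the snippet above, appending to the head could look like this (the script path is a placeholder):

dom["head"].Append("<script src=\"/js/site.js\"></script>"); // jQuery-style append into <head>
string result = dom.Render(); // serialize the modified document back to an HTML string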
I use the WebBrowser control as host, and navigate/alter the document through its Document property.
Nice documentation and samples at the link above.
Are you using MasterPages?
This seems like the most obvious use of them.
The MasterPage has <asp:ContentPlaceHolder> controls at all the points where you want the content to go.
In our app we have a base controller that overrides the View() overloads so that it reads the name of the MasterPage from the web.config (see the sketch below). That way, customising the app is as simple as a new MasterPage, and from a controller's point of view there is no code change, since our base class handles the MasterPage/web.config plumbing.
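A sketch of that base-controller idea; the appSettings key and class name are made up for illustration:

using System.Configuration;
using System.Web.Mvc;

public abstract class BaseController : Controller
{
    // web.config: <appSettings><add key="MasterPageName" value="ClientWrapper" /></appSettings>
    private static string MasterName
    {
        get { return ConfigurationManager.AppSettings["MasterPageName"]; }
    }

    // The other View() overloads funnel through this one, so overriding
    // it is enough to swap in the configured MasterPage everywhere.
    protected override ViewResult View(string viewName, string masterName, object model)
    {
        return base.View(viewName, MasterName, model);
    }
}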
I couldn't get an automated solution to this, so it came down to a hack:
public virtual void PopulateCssTag(string tags)
{
// tags is a pre-composed string containing all the tags I need.
this.Wrapper = this.Wrapper.Replace("</head>", tags + "</head>");
}
Hi, I tried to read a page using HttpWebRequest, like this:
string lcUrl = "http://www.greatandhra.com";
HttpWebRequest loHttp = (HttpWebRequest)WebRequest.Create(lcUrl);
loHttp.Timeout = 10000; // 10 secs
loHttp.UserAgent = "Code Sample Web Client";
HttpWebResponse loWebResponse = (HttpWebResponse)loHttp.GetResponse();
Encoding enc = Encoding.GetEncoding(1252); // Windows default Code Page
StreamReader loResponseStream =
new StreamReader(loWebResponse.GetResponseStream(), enc);
string lcHtml = loResponseStream.ReadToEnd();
mydiv.InnerHtml = lcHtml;
// Response.Write(lcHtml);
loWebResponse.Close();
loResponseStream.Close();
I am able to read that page and bind it to mydiv. But when I click on any of the links in that div, nothing is displayed, because my application doesn't contain the entire site. So what do we do now?
Can somebody copy my code and test it, please?
Nagu
I'm fairly sure you can't insert a full page in a DIV without breaking something. In fact, the whole head tag may be getting skipped altogether (and any JavaScript code there may not run). Considering what you seem to want to do, I suggest you use an IFRAME with a dynamic src, which will also hopefully take some pressure off your server (it would no longer be in charge of fetching the HTML to be mirrored).
If you really want a whole page of HTML embedded in another, then the IFRAME tag is probably the one to use, rather than the DIV.
Rather than having to create a web request and all that code to retrieve the remote page, you can just set the src attribute of the IFRAME to point to the page you want it to display.
For example, something like this in markup:
<iframe src="<%=LcUrl %>" frameborder="0"></iframe>
where LcUrl is a property on your code-behind page, that exposes your string lcUrl from your sample.
Alternatively, you could make the IFRAME runat="server" and set its src property programmatically (or even inject the innerHTML in a way similar to your code sample if you really wanted to).
The code you are putting inside the .InnerHtml of the div contains the entire page (including <html>, <body>, </html> and </body>), which can cause a myriad of problems in any number of browsers.
I would either move to an iframe, or consider parsing the HTML of the remote site and displaying a transformed version (i.e. strip the HTML, BODY and META tags, replace some link URLs, etc.).
But when i click on any one of links in that div it is not displaying any result
Probably because the links in the downloaded page are relative... If you just copy the HTML into a DIV in your page, the browser considers the links relative to the current URL: it doesn't know about the origin of this content. I think the solution is to parse the downloaded HTML and convert the relative URLs in href attributes to absolute URLs (see the sketch below).
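A sketch of that rewrite using the HTML Agility Pack (mentioned earlier in this thread); the base address is the site from the question, and MakeLinksAbsolute is a name invented here:

using System;
using HtmlAgilityPack;

class LinkRewriter
{
    static string MakeLinksAbsolute(string lcHtml, string baseUrl)
    {
        var baseUri = new Uri(baseUrl); // e.g. "http://www.greatandhra.com"
        var doc = new HtmlDocument();
        doc.LoadHtml(lcHtml);

        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode link in links)
            {
                // new Uri(base, relative) leaves already-absolute URLs untouched.
                string href = link.GetAttributeValue("href", "");
                link.SetAttributeValue("href", new Uri(baseUri, href).AbsoluteUri);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}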
If you want to embed it, you need to strip everything but the body part. That means you have to parse your string lcHtml for <body...> and remove everything before and including the body tag. You must also strip away everything from </body> onwards. Then you need to parse the string for all occurrences of <a href="..."> that do not start with http:// and prepend http://www.greatandhra.com, or set <base href="http://www.greatandhra.com/"> in your head section.
If you don't want to embed it, simply clear the response buffer and stream the lcHtml string back to the browser.
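In a Web Forms code-behind that is roughly:

Response.Clear();       // drop anything already buffered for this page
Response.Write(lcHtml); // send the downloaded page through unchanged
Response.End();         // stop normal page rendering from appending to it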
Sounds like what you are trying to do is display a different site embedded in your own. For this to work by dropping it into a div, you would have to extract the code between the body tags, as the markup wouldn't be valid with html and head tags in the middle of another page.
The links won't work because you've now taken that page out of context in your site. So you'd also have to rewrite any relative links on the page (i.e. ones that don't start with http), either to point to a page on your site that fetches the other site's page and displays it back within yours, or by prepending the URL of the site you're grabbing to each relative link so they link back to that site.