How to "render" HTML without WebBrowser control - c#

First of all, I don't know if "render" is the right word. I want to get information from a website. At the moment I use the WebBrowser control for this.
Now I would like to use HttpWebRequest instead, because I think it is much faster and easier to use from multiple threads. But I can't, because the HTML string I receive from HttpWebRequest contains a lot of JavaScript, and the information I need is generated by that JavaScript.
The web browser "renders" the JavaScript into readable HTML. How can I do this step "by hand", so that I can use HttpWebRequest?
I hope you understand what I want.

If you need a JavaScript rendering engine, I suggest you have a look at the Selenium project.

I solved my problem with a project called Awesomium. In this thread you will find all you need to get html with executed javascript. The "special part" is this one:
var sourceVal = webView.ExecuteJavascriptWithResult( "document.getElementsByTagName('html')[0].outerHTML;" );
if ( sourceVal != null )
{
using ( sourceVal )
{
html = sourceVal.ToString();
}
}

Related

How can I combine two urls the same way a browser does?

I'm writing a page scraper of sorts, and one of the things I want to do is combine the current URL with a URL fragment extracted from the current page.
Like this:
if (WebPath.IsAbsolute(urlFragment))
links.Add(new Uri(urlFragment));
else
links.Add(new Uri(currentUrl, urlFragment));
Easy peasy - this approach works most of the time, for both relative and absolute Uris.
However, some pages look like http://example.com/couple/of/folders/, with the url fragment couple/of/otherfolders/. And every single browser out there interprets that as http://example.com/couple/of/otherfolders.
Of course, my code yields http://example.com/couple/of/folders/couple/of/otherfolders. Which totally looks correct from the Uri's point of view - but I don't get how a browser can interpret this otherwise.
Now, I've searched for a solution to this problem, but I only found people who didn't know how to combine two urls, so that didn't get me very far. Closest thing I found was this question: How do you combine URL fragments in Java the same way browsers do? , but the answer doesn't tackle my particular problem.
Does anybody know what I'm missing?
Edit - this is the IsAbsolute method (I know I should replace it with new Uri(link).IsAbsoluteUri):
public static bool IsAbsolute(string path)
{
var uppercasePath = path.ToUpper();
return uppercasePath.StartsWith("HTTP://") || uppercasePath.StartsWith("HTTPS://");
}
Normally, browsers wouldn’t do that. But when there’s a <base> element, its href replaces the current page’s URL for the page’s URL-resolving purposes.
Check for a <base> and use it in place of currentUrl if it exists.
Also, thanks for reminding me to fix all my scrapers!
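The check described above can be sketched roughly as follows. This is a minimal sketch, not a full HTML parser: the class name LinkResolver and its methods are hypothetical, and the regular expression assumes the href value of the <base> element is quoted. Because new Uri(Uri, string) already returns the second argument unchanged when it is absolute, no separate IsAbsolute test should be needed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LinkResolver
{
    // Returns the URL that links on the page resolve against:
    // the <base href> if one is present, otherwise the page URL itself.
    public static Uri GetBaseUri(Uri pageUri, string html)
    {
        // Simplification: a regex instead of a real HTML parser; expects a quoted href.
        Match m = Regex.Match(
            html,
            @"<base\b[^>]*\bhref\s*=\s*[""']([^""']*)[""']",
            RegexOptions.IgnoreCase);

        if (m.Success)
        {
            Uri baseUri;
            // The href may itself be relative, so resolve it against the page URL.
            if (Uri.TryCreate(pageUri, m.Groups[1].Value, out baseUri))
                return baseUri;
        }
        return pageUri;
    }

    // Resolves a link the way a browser would, honoring <base> when present.
    public static Uri Resolve(Uri pageUri, string html, string link)
    {
        return new Uri(GetBaseUri(pageUri, html), link);
    }

    public static void Main()
    {
        var page = new Uri("http://example.com/couple/of/folders/");
        string html = "<html><head><base href=\"http://example.com/\"></head><body></body></html>";
        Console.WriteLine(Resolve(page, html, "couple/of/otherfolders/"));
    }
}
```

With the sample page above, the fragment resolves against http://example.com/ rather than against the folder the page lives in, which matches the browser behavior described in the question.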

Detect Razor/C# code?

Is there a way to detect if an HTML page contains any Razor/C# code? Essentially I want users to be able to provide custom layouts, with tags that I will replace with RenderSection. Before making this replacement, I want to validate that none of the HTML contains anything like, for example, <a href="@(some C# code)".
All discussions about alternative ways to do this, should/could/would aside, just simply:
Is there a way to programmatically detect if a file contains C#/Razor code?
I don't know a lot about Razor markup, but I am thinking that when you grab the layout string they are passing in, you will want to parse the text, grab everything that starts with an @, and toss those words into an array. Then, when you republish it to your website, use Razor code to access the data in the array...
Alternatively, and easier, you could go through all the passed-in code and replace every @ sign with a different symbol, say &, so that it won't get interpreted by the Razor processor:
layoutString = layoutString.Replace('@', '&');
In the browser? No, because unless the programmer made a mistake, there is no Razor/C# code in the rendered HTML, only the HTML that resulted from it.
What you ask is like asking what type of oven was used to bake a pizza, given only the pizza. Bad news: you will never know.
If you provide sensible tags for those, you could parse them in JavaScript, but you have to output that metadata yourself as part of the generated HTML.
After reading your comment to TomTom; the answer is:
No. Razor does not come with any public syntax parser.

javascript alert not working

I have this code and I am trying to run it on a .NET platform but it is not working. Does anyone have any idea what is wrong with my code? Thanks. I am using visual studio 2010, and c# programming language.
private void AlertWithConfirmation()
{
Response.Write("<script language='javascript'>");
Response.Write("var x=window.confirm(\"Are you sure you are ok?\")");
Response.Write("if (x)");
Response.Write("window.alert(\"Good!\")");
Response.Write("else");
Response.Write("window.alert(\"Too bad\")");
Response.Write("</script>");
}
Your code produces this:
<script language='javascript'>var x=window.confirm("Are you sure you are ok?")if (x)window.alert("Good!")elsewindow.alert("Too bad")</script>
Note that there is no separator between the statements: the confirm(...) call runs straight into if, and else fuses with window into the identifier elsewindow, which of course does not exist. The browser cannot parse this, so the script fails with an error and none of it runs.
Some improvements:
Use the type attribute instead of the deprecated language attribute.
Use semicolons at the end of statements.
Use brackets around code blocks (e.g. following if).
Use the return value from confirm directly instead of polluting the global namespace with a variable.
Write it as a single string instead of a bunch of strings.
Putting those together:
private void AlertWithConfirmation() {
Response.Write(
"<script type=\"text/javascript\">" +
"if (window.confirm('Are you sure you are ok?')) {" +
"window.alert('Good!');" +
"} else {" +
"window.alert('Too bad');" +
"}" +
"</script>"
);
}
Note that if you use this within a regular page, it will write the script tag before the doctype tag, which will cause the browser to go into quirks mode, which will most likely mess up your layout. If you want to add scripts to a regular page you should put a PlaceHolder on the page where you can add it, or use the ClientScriptManager.RegisterStartupScript method.
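As a rough sketch of the RegisterStartupScript route: the script body can be built as one string and handed to the page's ClientScriptManager, which emits it inside the form instead of before the doctype. The class name ConfirmScript and the key "confirmOk" are hypothetical. The registration call is shown only as a comment, since it needs to run inside a Web Forms page.

```csharp
using System;

// Hypothetical helper that builds the script body without <script> tags.
public static class ConfirmScript
{
    public static string Build()
    {
        // One string, with semicolons and braces so the statements cannot run together.
        return
            "if (window.confirm('Are you sure you are ok?')) {" +
            " window.alert('Good!');" +
            " } else {" +
            " window.alert('Too bad');" +
            " }";
    }

    public static void Main()
    {
        Console.WriteLine(Build());

        // Inside a class deriving from System.Web.UI.Page (assumption), e.g. in Page_Load:
        //   ClientScript.RegisterStartupScript(GetType(), "confirmOk", ConfirmScript.Build(), true);
        // Passing true for the last argument asks ASP.NET to add the <script> tags itself.
    }
}
```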
Make sure the result of the Response.Write looks something like this:
<script type="text/javascript">
var x=window.confirm('Are you sure you are ok?');
if (x) {
window.alert('Good!');
} else {
window.alert('Too bad');
}
</script>
The HTML generated by an aspx page is rendered in the Render phase which is at the end of the page lifecycle.
Therefore if you call Response.Write earlier in the page lifecycle, it will output at the start of the page before the first tag - almost certainly not what you want.
If you inspect the generated HTML (View Source in the browser) you'll see this.
In general, if you want to render some javascript you should use some other technique, such as setting the Text property of a Literal control at the appropriate place in the page.
You have already asked two similar questions in a period of 24h. You got to have some patience.
how to use javascript alert so that user can choose
Javascript alert problem

Convert HTML to XML with WP7

Simple situation: I want to search through an HTML string and extract a few pieces of information.
It gets annoying to write masses of lines of .Substring and .IndexOf for each element I want to find and cut out of the HTML file.
As far as I know, I'm unable to load DLLs such as HTML Tidy or HTML Agility Pack into my WP7 project, so is there a more efficient and reliable way to search through my HTML string than building substrings with IndexOf?
void client_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
string document = string.Empty;
using (var reader = new StreamReader(e.Result))
document = reader.ReadToEnd();
string temp = document.Substring(document.IndexOf("Games Played"), (document.IndexOf("League Games") - document.IndexOf("Games Played")));
temp = (temp.Substring(temp.IndexOf("<span>"), (temp.IndexOf("</span>") - temp.IndexOf("<span>")))).Remove(0, 6);
Int32.TryParse(temp, out leaugeGamesPlayed);
}
Thanks for your help
Gpx
You can use the HTML Agility Pack but you need the converted version of HTML Agility Pack for the Phone. It's only available from svn repository but it works great, I use it in my app.
http://htmlagilitypack.codeplex.com/SourceControl/changeset/view/77494#
You can find two projects under trunk named HAPPhone and HAPPhoneTest. You can use the download button to the right to get the code. It uses Linq instead of XPath to work.
You could use LINQ to parse the HTML and locate the elements that you're interested in. For example:
XDocument parsed = XDocument.Parse(document);
var spans = parsed.Descendants("span");
Beth Massi has a great blog post: Querying HTML with LINQ to XML
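A small sketch of how the LINQ to XML approach might replace the Substring/IndexOf chain from the question. One caveat worth stating: XDocument.Parse only accepts well-formed XML, so this assumes the markup (or the fragment you cut out) is well-formed and has no default XHTML namespace; real-world HTML will often throw an XmlException. The class name SpanReader and the sample markup are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class SpanReader
{
    // Returns the integer inside the first <span>, or null if there is
    // no <span> or its text is not a number.
    public static int? FirstSpanValue(string wellFormedMarkup)
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument parsed = XDocument.Parse(wellFormedMarkup);
        XElement span = parsed.Descendants("span").FirstOrDefault();

        int value;
        if (span != null && int.TryParse(span.Value.Trim(), out value))
            return value;
        return null;
    }

    public static void Main()
    {
        // Made-up sample standing in for the downloaded page fragment.
        string fragment = "<div><p>Games Played</p><span>42</span></div>";
        Console.WriteLine(FirstSpanValue(fragment));
    }
}
```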
Assuming you're doing this because you're getting the HTML from a web site/page/server.
Don't convert it on the device.
Create a wrapper/proxy site/server/page to do the conversion for you. While this has the downside of having to create the extra service, it has the following advantages:
Code on the server will be easier to update than code within a distributed app. (Experience with parsing HTML you don't directly control shows that you will need to change your parsing, as the original HTML is almost certain to throw something unexpected at you when it changes in the future.)
If you do the conversion once on the server, you can cache the result rather than having every instance of the app repeat the conversion.
By virtue of the above 2 points, the app will run faster!
If you have the HTML file at design/build time then convert it to something easier to work with and avoid unnecessary computation at run time.
As a workaround, you could consider loading the HTML into a WebBrowser control and then query the DOM via injected javascript (which calls back to .NET)

Is there a jQuery-like CSS/HTML selector that can be used in C#?

I'm wondering if there's a jQuery-like css selector that can be used in C#.
Currently, I'm parsing some html strings using regex and thought it would be much nicer to have something like the css selector in jQuery to match my desired elements.
Update 10/18/2012
CsQuery is now in release 1.3. The latest release incorporates a C# port of the validator.nu HTML5 parser. As a result CsQuery will now produce a DOM that uses the HTML5 spec for invalid markup handling and is completely standards compliant.
Original Answer
Old question but new answer. I've recently released version 1.1 of CsQuery, a jQuery port for .NET 4 written in C# that I've been working on for about a year. It is also on NuGet as "CsQuery".
The current release implements all CSS2 & CSS3 selectors, all jQuery extensions, and all jQuery DOM manipulation methods. It's got extensive test coverage, including all the tests from jQuery and Sizzle (the jQuery CSS selection engine). I've also included some performance tests for direct comparisons with Fizzler; for the most part CsQuery dramatically outperforms it. The exception is actually loading the HTML in the first place, where Fizzler is faster; I assume this is because Fizzler doesn't build an index. You get that time back after your first selection, though.
There's documentation on the github site, but at a basic level it works like this:
Create from a string of HTML
CQ dom = CQ.Create(htmlString);
Load synchronously from the web
CQ dom = CQ.CreateFromUrl("http://www.jquery.com");
Load asynchronously (non-blocking)
CQ.CreateFromUrlAsync("http://www.jquery.com", responseSuccess => {
Dom = responseSuccess.Dom;
}, responseFail => {
// handle the failed request here
});
Run selectors & do jQuery stuff
var childSpans = dom["div > span"];
childSpans.AddClass("myclass");
The CQ object is like the jQuery object. The property indexer used above is the default method (like $(...)).
Output:
string html = dom.Render();
You should definitely see @jamietre's CsQuery. Check out his answer to this question!
Fizzler and Sharp-Query provide similar functionality, but the projects seem to be abandoned.
Not quite jQuery like, but this may help:
http://www.codeplex.com/htmlagilitypack
For XML you might use XPath...
I'm not entirely clear as to what you're trying to achieve, but if you have a HTML document that you're trying to extract data from, I'd recommend loading it with a parser, and then it becomes fairly trivial to query the object to pull desired elements.
The parser I linked above allows for use of XPath queries, which sounds like what you are looking for.
Let me know if I've misunderstood.
