Parsing HTML, source code, JavaScript problem - C#

http://booking.travel24.com/index.php?KID=610000&&id=lmpergebnis&showresult=1&detail=zielgebiet&region=-1&ziel=-1&termin=20.02.2011&ruecktermin=17.03.2011&dauer=-1&abflughafen=46&personen=25;25&kategorie=-1&verpflegung=-1&zimmer=-1
I am trying to parse some parts of the HTML of this page, but when I view the source code I cannot find the text "Tunesien, Marokko".
If I inspect the page with xdeveloper, I can see this HTML:
<a class="reglreg" href="javascript:s_hliste(20009);">Tunesien, Marokko</a>
but if I look at the page's source code, I can't find it. Why?

If you view the source and search for "Marokko" you will see there are several places where it occurs (loaded as data in several JavaScript arrays).
It appears as if some of the content is produced dynamically through the JavaScript loaded onto the page. That JavaScript builds HTML and changes the page to include the content you are looking for.
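Because the text lives in script data rather than in markup, you can confirm this (and pull the values out) from the raw source alone. A minimal sketch, assuming the values appear as ordinary quoted string literals inside inline <script> blocks; the class and method names, and the sample array used to exercise it, are made up for illustration, and regular expressions are only a rough stand-in for a real HTML/JavaScript parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptData
{
    // Inline <script ...>...</script> blocks in the raw page source.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script\b[^>]*>(.*?)</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Double- or single-quoted JavaScript string literals.
    // JavaScript escape sequences inside them are returned as-is, not decoded.
    private static readonly Regex StringLiteral = new Regex(
        @"""((?:[^""\\]|\\.)*)""|'((?:[^'\\]|\\.)*)'",
        RegexOptions.Singleline);

    // Returns every string literal inside a script block that contains the needle.
    public static List<string> FindInScripts(string html, string needle)
    {
        var hits = new List<string>();
        foreach (Match block in ScriptBlock.Matches(html))
        {
            foreach (Match literal in StringLiteral.Matches(block.Groups[1].Value))
            {
                string value = literal.Groups[1].Success
                    ? literal.Groups[1].Value
                    : literal.Groups[2].Value;
                if (value.IndexOf(needle, StringComparison.Ordinal) >= 0)
                    hits.Add(value);
            }
        }
        return hits;
    }

    // True if the needle still appears once all inline script blocks are removed,
    // i.e. if it is part of the static markup rather than script data.
    public static bool InMarkupOutsideScripts(string html, string needle) =>
        ScriptBlock.Replace(html, "").IndexOf(needle, StringComparison.Ordinal) >= 0;
}
```

If the links are only generated by JavaScript, InMarkupOutsideScripts(source, "Marokko") returns false while FindInScripts lists the array entries. Scripts referenced with src="..." are not part of the page source at all and would have to be downloaded separately.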

To answer your first real question, "Why?": when you view the source inside a browser, you get the original HTML as the server sent it. JavaScript then runs and modifies the DOM, and you can follow those changes in any modern browser's developer tools.
Can I somehow get the whole source code, then? If I cannot see it in the browser, how can I see it?
Put simply, it depends on how you're trying to parse it. Which language are you using?

Maybe the data is coming in via an AJAX call, so it's not in the HTML at the start but is added to it dynamically.
If you need to parse it, you can try to "emulate" the AJAX call yourself.
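Emulating the request means downloading exactly what the browser downloads, without running any JavaScript. A minimal sketch using HttpClient; which URL to call is an assumption you have to verify yourself (the browser's developer tools list every AJAX request in their network view, and if there is none, the data arrives with the page itself, i.e. the URL in the question). RequestEmulator, BuildUrl and FetchAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class RequestEmulator
{
    // One shared client; HttpClient is designed to be reused across requests.
    private static readonly HttpClient Client = new HttpClient();

    // Builds "baseUrl?key=value&key=value" with every key and value percent-encoded.
    public static string BuildUrl(string baseUrl, IEnumerable<KeyValuePair<string, string>> query) =>
        baseUrl + "?" + string.Join("&", query.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

    // Downloads the raw response body exactly as the server sends it (no JavaScript runs).
    public static Task<string> FetchAsync(string url) => Client.GetStringAsync(url);
}
```

With the question's parameters (termin=20.02.2011, ruecktermin=17.03.2011, personen=25;25, and so on), BuildUrl rebuilds the request; the semicolon in personen is sent as %3B, which the server should decode back to 25;25. The string returned by FetchAsync can then be searched or parsed like any other string.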

Related

Having trouble using Razor C# with JavaScript

I am trying to get Razor (C#) and JavaScript to play nicely together, but I can't seem to do it. I have searched the other articles on StackOverflow, but none of them seem to work for me.
Some notable differences between other posts and mine:
I am using an external JavaScript file (not mandatory, but it is there).
I am using a .cshtml file as the header layout for all pages (which puts the head tag in a different file from the one that actually calls the function).
I also use jQuery, if that would make it easier.
What I am trying to accomplish:
All I need to do is get the contents of a tag (innerHTML, or .html() in jQuery) by id, class, or whatever, and assign that value to AppState["gEntryID"] for use on the next page.
Some things I have tried:
function entryClickHandler()
{
@AppState["gEntryID"] = document.getElementById("tester").innerHTML;
}
AND
function entryClickHandler()
{
<text>
@AppState["gEntryID"] = document.getElementById("tester").innerHTML;
</text>
}
I have tried these (and a few other variations on these) in both the external file and the head section within the HeaderLayout File.
I understand that the C# runs on the server before the page is sent, and the JavaScript mostly runs in the browser afterwards (at least for events such as this one).
Any help would be greatly appreciated.
It doesn't work that way. You cannot set server-side C#/Razor variables from JavaScript except by sending the value back to the server with a form post or an AJAX request.
JavaScript doesn't get access to the page until after Razor has done its part, rendered the page, and sent it to the browser.
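To make the round trip concrete, here is a framework-agnostic sketch of what the server side of such a request does: the browser sends the value as an application/x-www-form-urlencoded body (jQuery's $.post with a plain object does this), and only then can server code store it. EntryEndpoint, HandleEntryPost and the AppState dictionary below are hypothetical stand-ins; in ASP.NET Web Pages you would read Request.Form["gEntryID"] rather than parsing the body yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class EntryEndpoint
{
    // Hypothetical stand-in for the server-side AppState collection.
    public static readonly Dictionary<string, object> AppState = new Dictionary<string, object>();

    // Parses a form-encoded body such as "gEntryID=Entry+42&other=x".
    public static Dictionary<string, string> ParseForm(string body)
    {
        var fields = new Dictionary<string, string>();
        foreach (string pair in body.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            // WebUtility.UrlDecode also turns '+' back into a space.
            fields[WebUtility.UrlDecode(key)] = WebUtility.UrlDecode(value);
        }
        return fields;
    }

    // Runs only when the browser sends a request: this is the only point at which
    // a value produced by JavaScript can reach server-side state.
    public static void HandleEntryPost(string body)
    {
        if (ParseForm(body).TryGetValue("gEntryID", out string id))
            AppState["gEntryID"] = id;
    }
}
```

The ordering is the whole point: Razor has already rendered and sent the page before this handler ever runs, so the value is available to the next request (such as loading the next page), not to the page that is already in the browser.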

$(selector).text() equivalent in C# (Revised)

I am trying to check whether the inner HTML of an element is empty, but I want to do the validation on the server side, treating the HTML as a string. Here is my code:
public string HasContent(string htmlString){
// this is the expected value of the htmlString
// <span class="spanArea">
// <STYLE>.ExternalClass234B6D3CB6ED46EEB13945B1427AA47{;}</STYLE>
// </span>
// From this jquery code-------------->
// if($('.spanArea').text().length>0){
//
// }
// <------------------
// I wanted to convert the jquery statement above into c# code.
// C# code goes here
return htmlString;
}
Using this line:
$('.spanArea').text() // what is the equivalent of this line in C#?
I will know whether .spanArea really has something to display in the UI. I want to do the check on the server side. There is no need to worry about how I access the DOM; I have already taken care of that. Consider htmlString to be the HTML string.
My question is if there is any equivalent for this jquery line in C#?
Thanks in advance! :)
If you really need to get that data from the HTML on the server side, I would recommend using an HTML parser for the job.
If you check other SO posts, you will find that the Html Agility Pack has been recommended many times.
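Without adding a library, a rough approximation is possible with the BCL alone. This sketch is an assumption-laden stand-in for a real parser such as the Html Agility Pack: it drops <style> and <script> blocks, strips the remaining tags with a regular expression, decodes entities, and trims whitespace. One deliberate difference: jQuery's .text() returns textContent, which does include the text inside a <STYLE> element, so $('.spanArea').text().length would be greater than 0 for the fragment in the question; since the goal is to know whether anything is displayed, the sketch ignores style and script content. HtmlText, VisibleText and HasContent are hypothetical names.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // <style> and <script> blocks: their contents are never displayed as text.
    private static readonly Regex NonVisibleBlocks = new Regex(
        @"<(style|script)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag, e.g. <span class="spanArea"> or </span>.
    // Naive: a '>' inside a quoted attribute value will cut the tag short.
    private static readonly Regex Tags = new Regex(@"<[^>]*>", RegexOptions.Singleline);

    // Approximate visible text: blocks removed, tags removed, entities decoded, trimmed.
    public static string VisibleText(string html)
    {
        string withoutBlocks = NonVisibleBlocks.Replace(html, " ");
        string withoutTags = Tags.Replace(withoutBlocks, " ");
        // Trim() also removes the non-breaking space that &nbsp; decodes to.
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }

    public static bool HasContent(string html) => VisibleText(html).Length > 0;
}
```

Note that this only checks for text: an element that displays an image but contains no text would still report no content.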
Mark the span with runat="server" (and give it an id) and you can then access it in the code-behind:
<span id="mySpan" class="spanArea" runat="server" />
You can then:
string spanContent = mySpan.InnerText;
The code-behind for the page that includes this AJAX call will already have executed (in presenting the page to the browser) before the AJAX call is ever made, so the premise of your question doesn't seem right.
The code-behind that is delivering the HTML fragment you indicated is probably constructing that using a StringBuilder or similar so you should be able to verify in that code whether there is any data.
The fragment you provided only includes a DIV, a SPAN and a STYLE tag. This is likely to collapse to a zero-width element and display nothing.
Have a look at this article which will help you understand the ASP.NET page life cycle.

How to copy all data from a HTML doc and save it to a string using C#

I need to create a data index of HTML pages provided to a service by essentially grabbing all text on them and putting them in a string to go into a storage system.
If this were GUI based, I would simply Ctrl+A on the HTML page, copy it, then go to Notepad and Ctrl+V. Simples. If I can do it via good old point n' click, then surely there must be a way to do it programmatically, but I'm struggling to find anything useful.
The HTML docs in question are currently loaded for rendering using the System.Windows.Controls.WebBrowser class, so I wonder if it's somehow possible to grab the data from there?
I'm going to keep hunting, but any pointers would be very appreciated.
Note: We don't want the HTML source code, and would also really rather not have to parse all the source code to get the text unless we absolutely have to.
If I understand your problem correctly, you will have to do a bit of work to get the data.
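// System.Windows.Forms.WebBrowser; read Document only after DocumentCompleted has fired
WebBrowser browser = new WebBrowser(); // This is what you have (already navigated)
HtmlDocument doc = browser.Document;   // The rendered DOM, after any JavaScript has run
// Requires a COM reference to Microsoft.mshtml; use the interface, not HTMLDocumentClass
string content =
    ((mshtml.IHTMLDocument3)doc.DomDocument).documentElement.innerText;
// WPF System.Windows.Controls.WebBrowser: Document is already the COM object:
// string content = ((mshtml.IHTMLDocument3)wpfBrowser.Document).documentElement.innerText;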
That last line is the browser's view of the rendered content.
This looks like it might be quite helpful.

Methods for dynamically building JavaScript within an ASPX page?

I have a page that is referenced via a <script> tag from a page on another site. In the script src, I pass in the form I want my script to build (from a db table), and the div where the dynamically built form should go. The calling page looks something like this:
<div id="FormContainer"></div>
<script type="text/JavaScript" src="http://www.example.com/GenerateForm.aspx?FormId=1&div=FormContainer"></script>
GenerateForm.aspx contains the code that reads the QueryString parameters for the FormId, and the Div Id, and outputs JavaScript that will build the form.
My question is this. What are the different methods for "outputting" the JavaScript? Some of the JavaScript is static, and can be packaged into an external .js file and I have jQuery too. But should I add that on the GenerateForm.aspx markup page? Or should I use a ScriptManager?
And what about the dynamically built JavaScript? Currently I'm just using Response.Write() for a proof of concept, but instead, should I be doing something else? Use a Literal control on the page and set its value? Use a ScriptManager? Something else?
I know this is a verbose question, so thanks in advance!
If you want a separate, referenced JavaScript file, what you probably want is an .ashx file. This is basically a generic handler that writes directly to the output stream without having to deal with the ASP.NET page life cycle. If you add a Generic Handler (.ashx) to your site from the Add New Item dialog, the template should be enough of a starting point; use context.Response.Write() to output your JavaScript dynamically.
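The core of such a handler is building the script string safely. A minimal sketch, assuming a hypothetical Field record standing in for the rows of the form-definition table: every value is embedded through System.Text.Json's JsonSerializer.Serialize, whose output for a string is a quoted, escaped literal that JavaScript also accepts, so a stray quote in a label cannot break the script. In the handler's ProcessRequest you would then set context.Response.ContentType = "application/javascript" and pass the result to context.Response.Write(). FormScriptBuilder and Build are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class FormScriptBuilder
{
    // Hypothetical shape of one row from the form-definition table.
    public sealed record Field(string Name, string Label);

    // Serializes a C# string as an escaped, quoted JavaScript string literal.
    private static string Js(string value) => JsonSerializer.Serialize(value);

    // Emits a self-invoking script that builds the form inside the given div.
    public static string Build(string divId, IEnumerable<Field> fields)
    {
        var js = new StringBuilder();
        js.AppendLine("(function () {");
        js.AppendLine($"  var container = document.getElementById({Js(divId)});");
        js.AppendLine("  var form = document.createElement('form');");
        js.AppendLine("  function addField(name, text) {");
        js.AppendLine("    var label = document.createElement('label');");
        js.AppendLine("    label.appendChild(document.createTextNode(text));");
        js.AppendLine("    var input = document.createElement('input');");
        js.AppendLine("    input.name = name;");
        js.AppendLine("    label.appendChild(input);");
        js.AppendLine("    form.appendChild(label);");
        js.AppendLine("  }");
        foreach (Field f in fields)
            js.AppendLine($"  addField({Js(f.Name)}, {Js(f.Label)});");
        js.AppendLine("  container.appendChild(form);");
        js.AppendLine("})();");
        return js.ToString();
    }
}
```

The static part (the addField helper) could equally live in a cached .js file, leaving only the addField(...) calls to be generated per request.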
The ScriptManager is more useful if you want to output individual lines of JavaScript to be run at certain times, such as after an event has fired. You can then call ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "CodeBlock", "alert('Button clicked');", true); to show a client alert box after a button has been clicked, for example.
Static files should be handled just that way: statically. If you reference the static script file directly from the script tag, the server can handle the caching and there is no unnecessary processing. If you do need to load a static script dynamically, you could, for example, create a Literal that contains the <script> tag. That way the browser still uses its cached copy of the static file.

Difference between Response.Write() and ClientScript.RegisterStartupScript()?

What is the difference between Response.Write() and ClientScript.RegisterStartupScript()
Thank you.
The Response.Write method can be used to output code during the rendering phase of the page. The <%= %> server tag is a shortcut for <%Response.Write( )%>.
If you use Response.Write from the code-behind, you write to the page before it has started rendering, so the code ends up outside the HTML document. Even though the browser will execute the code, it doesn't work properly: having something before the doctype makes the browser ignore the doctype and render the page in quirks mode, which usually breaks the layout. Also, as the script runs before any of the page exists, the code can't access any elements in the page.
The ClientScript.RegisterStartupScript method is the preferred way of adding script to the page dynamically. It renders the script at the end of the form, so it doesn't break the HTML document, and the script can access the elements in the form.
Also, you give each script an identity, which means that duplicates are removed. If a user control registers a script and you use several instances of that user control, the script will only be rendered once in the page.
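To make the ordering and de-duplication concrete, here is a toy model. It is not the real ClientScriptManager (which you call as ClientScript.RegisterStartupScript(GetType(), key, script, true)); ToyStartupScripts, Register and Render are hypothetical names. Scripts are keyed by type and key, a repeated registration is ignored, and rendering places them just before the closing </form> tag, after the elements they need.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy model of registered startup scripts; not the real ASP.NET implementation.
public sealed class ToyStartupScripts
{
    private readonly Dictionary<(Type, string), string> _scripts =
        new Dictionary<(Type, string), string>();
    private readonly List<(Type, string)> _order = new List<(Type, string)>();

    // A repeated (type, key) pair is silently ignored, so the script renders once.
    public void Register(Type type, string key, string script)
    {
        var id = (type, key);
        if (_scripts.ContainsKey(id)) return;
        _scripts[id] = script;
        _order.Add(id);
    }

    // Emits the scripts just before </form>, i.e. after the form's elements exist.
    public string Render(string pageHtml)
    {
        var block = new StringBuilder();
        foreach (var id in _order)
            block.Append("<script type=\"text/javascript\">")
                 .Append(_scripts[id])
                 .Append("</script>");
        int end = pageHtml.LastIndexOf("</form>", StringComparison.OrdinalIgnoreCase);
        return end < 0 ? pageHtml + block.ToString() : pageHtml.Insert(end, block.ToString());
    }
}
```

By contrast, a Response.Write call made from the code-behind before rendering would put the same text ahead of the whole document, where no form elements exist yet.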
There is a huge difference.
Basically, Response.Write writes to your response stream immediately. Normally this puts whatever you write at the very top of your page output, even before the tag (unless you call it after the page's render event).
When you use RegisterStartupScript, it waits and writes your JavaScript to the response stream after the page's controls have rendered (i.e., after the controls have written their HTML to the response stream). This means the browser executes the JavaScript you register after the HTML that precedes it has been loaded into the DOM. This is very similar to the event. It also "registers" the script, so if more than one control on the page needs that JavaScript, each can check whether it has already been registered; it is rendered only once and both controls use it on the client side.
Hopefully that makes sense. There are more details than that, but I tried to keep it simple.
Response.Write
The Write method writes a specified string to the current HTTP output.
ClientScriptManager.RegisterStartupScript
Registers the startup script with the Page object.
As I see it, these two methods are unrelated. Response.Write() can be used to write something to the page that is rendered, while ClientScript.RegisterStartupScript() can be used to register a JavaScript block that runs at page startup.
