Download JS-generated HTML with C#

There is a reports website whose content I want to parse in C#. I tried downloading the HTML with WebClient, but I don't get the complete source because most of it is generated by JavaScript when I visit the website.
I tried using WebBrowser but couldn't get it to work in a console app, even after using Application.Run() and SetApartmentState(ApartmentState.STA).
Is there another way to access this generated html? I also took a look into mshtml but couldn't figure it out.
Thanks

The JavaScript is executed by the browser. If your console app receives the JS source, then the download is working as expected; what you really need is for your console app to execute the JS code that was downloaded.

You can use a headless browser; XBrowser may serve.
If not, try HtmlUnit as described in this blog post.

Just a comment here. There shouldn't be any difference between performing an HTTP request from C# code and the request generated by a browser. If the target web page is getting confused and not generating the correct markup because it can't make heads or tails of the type of browser it thinks it's serving, then maybe all you have to do is set the user agent like so:
((HttpWebRequest)myWebClientRequest).UserAgent = "<a valid user agent>";
For example, my current user agent is:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Maybe once you do that the page will work correctly. There may be other factors at work here, such as the referrer and so on, but I would try this first and see if it works.
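Since the question uses WebClient, the user agent can also be set through its Headers collection before downloading. This is a minimal sketch: the class name BrowserLikeClient is made up for illustration, and the approach only changes what the server sees. It does not execute any JavaScript on the page.

```csharp
using System;
using System.Net;

static class BrowserLikeClient
{
    // Returns a WebClient that identifies itself with the given User-Agent string.
    public static WebClient Create(string userAgent)
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = userAgent;
        return client;
    }

    // Downloads the raw (unrendered) HTML of a page using a fresh client per call.
    public static string Download(string url, string userAgent)
    {
        using (var client = Create(userAgent))
        {
            return client.DownloadString(url);
        }
    }
}
```

A fresh client is created per download so the header is always present on the request that is actually sent.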

Your best bet is to abandon the console app route and build a Windows Forms application. In that case the WebBrowser control will work without any extra setup.
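If a console app is still required, one commonly used pattern is to host the WebBrowser control on its own STA thread with its own message loop. The following is only a sketch under assumptions: it targets Windows with a reference to System.Windows.Forms, the names RenderedHtml and Fetch are hypothetical, and scripts that keep modifying the page after DocumentCompleted fires will not be reflected in the result.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class RenderedHtml
{
    // Loads the page in a hidden WebBrowser and returns the DOM as HTML
    // once the top-level document has finished loading.
    public static string Fetch(string url)
    {
        string html = null;
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // DocumentCompleted also fires for frames; wait for the main document.
                    if (e.Url != browser.Url) return;
                    html = browser.Document.GetElementsByTagName("html")[0].OuterHtml;
                    Application.ExitThread(); // ends the message loop started below
                };
                browser.Navigate(url);
                Application.Run(); // message loop required by the control
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // must be set before Start()
        thread.Start();
        thread.Join();
        return html;
    }
}
```

The control renders with the installed Internet Explorer engine, so pages that depend on newer browser features may still come back incomplete.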

Related

Redirect to another html page

Is there a way to send my application to a different page after checking the browser version first?
I'm using C# to serve my Angular app and index.html is loaded by default, but is there a way to control that?
E.g.: if the browser is IE, load wrongBrowser.html; otherwise load index.html (the default one).
Note that I don't want to redirect, because I want to keep the original URL, e.g. localhost/api/search=text. If I do a redirect, it will override my URL. So I just want to load the HTML content.
I'm using C# with Visual Studio on the server side.
The first page of your app will have to load, as it needs to be able to determine the browser specs. Only then can you redirect to another page based on that knowledge.
I have never used AngularJS, nor Angular with C#, but I know you can "redirect" without changing the URL by using an XMLHttpRequest in vanilla JavaScript (maybe you can place this somewhere):
// Fetch a.html and replace the current document with it, leaving the URL unchanged.
var request = new XMLHttpRequest();
request.addEventListener("load", function (evt) {
    document.write(evt.target.response);
}, false);
request.open('GET', 'a.html', true);
request.send();
What this does is simple: request is an XMLHttpRequest object, and we attach a listener for its load event. Once the response has loaded, the listener replaces the page's markup with the response body. Finally we set the method and URL with open() and send the request.
I have only used this for testing, so there might be issues that I don't know of, but it does import the html code in.
In C# you can do the following using Request.Browser:
// IndexOf returns -1 when the substring is not found, so compare against 0.
if (Request.Browser.Browser.IndexOf("InternetExplorer") >= 0) {
    return View("wrongBrowser");
}
You have to use IndexOf due to the fact that most, if not all browsers return their version in their name too.
Here's a list of possible browser strings:
IE <= 11: InternetExplorer <numeric version>
Edge: Edge <version>
Safari: Safari <version>
Chrome: Chrome <version>
Opera: Chrome <version>
There are more, most of them will come under Chrome though. I will not go into much detail as you specified IE11 which is listed above. The above method is not really reliable for other, less popular browsers so keep that in mind.
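The check above can be pulled into a small helper so it is easy to test outside a controller. This is a sketch that assumes the browser strings listed above; the class and method names are hypothetical.

```csharp
using System;

static class BrowserCheck
{
    // True when the reported browser name contains "InternetExplorer",
    // with or without a trailing version number.
    public static bool IsInternetExplorer(string browserName)
    {
        return !string.IsNullOrEmpty(browserName)
            && browserName.IndexOf("InternetExplorer", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In a controller action it would be called as BrowserCheck.IsInternetExplorer(Request.Browser.Browser) before choosing which view to return.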

HttpWebRequest POST data

Is it possible to make a POST with HttpWebRequest in C# that is exactly identical to the one a browser would make, so that the page cannot detect that it is not actually a browser?
If so, where could I read up more on that?
Download and become familiar with a tool like Fiddler. It allows you to inspect web requests made from applications, like a normal browser, and see exactly what is being sent. You can then emulate the data being sent with a request created in C#, providing values for headers, cookies, etc.
I think this is doable.
Browser detection is typically done based on the User-Agent header in the request. All you need to do is set that header. With HttpWebRequest you don't set it through the Headers collection but through the .UserAgent property.
Eg:
.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)";
There is quite a lot to user agents. Check this link for the complete list of User-Agents
Useful Links:
How to create a simple proxy in C#?
Is WebRequest The Right C# Tool For Interacting With Websites?
http://codehelp.smartdev.eu/2009/05/08/improve-webclient-by-adding-useragent-and-cookies-to-your-requests/
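Putting the advice above together, a browser-like form POST might be sketched as follows. The class name BrowserLikePost, the Accept value, and the choice of headers are illustrative assumptions; the real values should be copied from what Fiddler shows for the browser request being emulated.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

static class BrowserLikePost
{
    // Encodes fields as application/x-www-form-urlencoded (spaces become '+').
    public static string EncodeForm(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    // Builds the request without sending it, so headers can be inspected or adjusted.
    public static HttpWebRequest Build(string url, string userAgent, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.UserAgent = userAgent;
        request.ContentType = "application/x-www-form-urlencoded";
        request.Accept = "text/html,application/xhtml+xml,*/*;q=0.8"; // assumed value
        request.CookieContainer = cookies; // keeps session cookies across requests
        return request;
    }

    // Writes the body, sends the request, and returns the response text.
    public static string Send(HttpWebRequest request, string body)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(body);
        request.ContentLength = bytes.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Reusing one CookieContainer across calls lets a login request and the requests that follow it share the same session, as a browser would.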

Get HTTP headers on WebKit.NET

I've been trying to figure out how to handle 401 responses in WebKit.NET and show an authentication box so that the user can enter their credentials and send them back to the server.
This guy figured out a way to add the proper headers to a new request and send them to the server, but it seems the code sends them to every page the browser navigates to, which is not what I want. I dug a bit into the code and found an interface called IWebResourceLoadDelegate which, among others, contains two event handlers called didReceiveResponse and didReceiveContentLength that are called for every response, but I can't figure out how to read the headers from the parameters being passed. I think the header is just not being passed at all.
Also, it seems the guys at webkit-sharp haven't solved this issue either, but somehow Chrome handles it properly. I'm not sure which build of WebKit Chrome uses. I just hope it is not a custom build, such that I would have no choice other than spending the rest of my life trying to build WebKit (and the other rest trying to add the missing functionality).
Does anyone have any idea how I could begin to figure out how to handle this?
I haven't worked on this project in some time, but it looks to me like you should be able to get the response headers from the WebURLResponse object, perhaps from the allHeaderFields or statusCode methods...
It would be really great if you could finish my work to get full HTTP Auth support in WebKit.NET. I just haven't had the time... Chrome and Safari have their own proprietary implementations that do the trick.

In a controller in ASP.NET MVC, how can I get information about the user's browser?

I am logging errors on my ASP.NET MVC site and I wanted to see if there is any way to detect the user's browser info (name, version, etc.), as it seems people are running into issues only because they are using a very old browser. This info would save me debugging time if I knew they were using a "not supported" browser.
You can get the supplied User Agent which gives browser information:
Request.UserAgent
There is a site which lists browser user agent strings: http://www.useragentstring.com
Other values you may be interested in:
Request.Browser.Platform
Request.Browser.Version
Request.Browser.EcmaScriptVersion
You may try the Request.Browser property. It will contain pretty much everything you might need about the client browser (assuming it is sending the UserAgent header properly of course).
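For error logging, the values above can be combined into a single line. This is a minimal sketch; BrowserLog and Describe are hypothetical names, and the output format is arbitrary.

```csharp
using System;

static class BrowserLog
{
    // Formats browser details into one log-friendly line, tolerating missing values.
    public static string Describe(string userAgent, string browser, string version, string platform)
    {
        return string.Format("Browser={0} {1}; Platform={2}; UserAgent={3}",
            browser ?? "unknown", version ?? "?", platform ?? "unknown", userAgent ?? "(none)");
    }
}
```

Inside a controller it could be called as BrowserLog.Describe(Request.UserAgent, Request.Browser.Browser, Request.Browser.Version, Request.Browser.Platform) and the result appended to the error record.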

How to launch a browser and later direct it to a page?

I need to launch a browser, do some work and then make the browser navigate to a URL (in that order).
The first part is of course simple and I have a Process object. I am at a loss as to how to later direct it to the target page?
How do I treat the Process as a browser and make it navigate to the desired page?
Any help, pointers, code snippets appreciated.
Instead of launching the browser & then navigating to the page, just tell the OS that you want to run the URL. Windows will pick the correct browser, and navigate the user to the given URL.
System.Diagnostics.Process.Start("http://www.StackOverflow.com");
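On .NET Core and later, Process.Start does not go through the operating system shell by default, so passing a bare URL can throw instead of opening the browser. Here is a sketch that requests shell execution explicitly, which also works on .NET Framework. DefaultBrowser is a hypothetical name.

```csharp
using System.Diagnostics;

static class DefaultBrowser
{
    // Describes a shell launch of the URL; the OS picks the default browser.
    public static ProcessStartInfo BuildStartInfo(string url)
    {
        return new ProcessStartInfo(url) { UseShellExecute = true };
    }

    // Opens the URL in the user's default browser.
    public static void Open(string url)
    {
        Process.Start(BuildStartInfo(url));
    }
}
```

Calling DefaultBrowser.Open after the preparatory work is done achieves the "do some work, then navigate" order without needing a handle to the browser process.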
If you don't need to do this in production, you could use a testing library such as WatiN to do this:
using WatiN.Core;
//Placeholder page to launch initial browser
IE ie = new IE("http://www.google.com");
DoSomeWork();
//Now navigate to the page you want
ie.GoTo("http://stackoverflow.com");
My first instinct for this question was DDE, but it appears that has been decommissioned in Windows Vista so that is no good. Shame, as it was the only consistent mechanism in Windows for Interprocess Communication (IPC)...oh how I miss Arexx on the Amiga.
Anyhow, I believe the following will work but unfortunately, due to the way it works, it launches Internet Explorer irrespective of the configured browser.
If your application has a Form, then create a WebBrowser control on it. Set this to non-visible, as we are only making use of it as a launching device rather than to display the web page.
In code, at the point where you want to show a web page, use the following code:
webBrowser1.DocumentText = "<script>window.open('http://stackoverflow.com', 'BananasAreOhSoYummy');</script>";
What this does is to tell the WebBrowser control, which is just the IE in disguise, to open a new window called 'BananasAreOhSoYummy'. Because we have given the window a name, we can use that line repeatedly, with different URLs, to change the page in that particular browser window. (A new window will be opened if the user has happened to close it.)
I will have a think about an approach that honours the user's default browser choice.
If you don't need the actual instance of IE, you can use the System.Windows.Forms.WebBrowser control.
I think instead of sending the browser a url you could send it javascript that would run and direct the browser to a site.
Not sure if this would work but I see no reason why it wouldn't
