Read ASPX files from filesystem and render to HTML - c#

Is it possible to read an aspx file and render as an html file, and write the resulting html file to disk?
The .aspx file is on the filesystem without the codebehind file. If it is possible, please provide some example code.

From a remote URL:
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(path);
webRequest.KeepAlive = false;
string content = string.Empty;
using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
{
    if (webResponse.StatusCode != HttpStatusCode.OK)
    {
        if (_log.IsErrorEnabled) _log.Error(string.Format("Url {0} not found", path));
    }
    else
    {
        // StreamReader decodes across buffer boundaries, so a multi-byte
        // character is never split between two reads.
        using (StreamReader reader = new StreamReader(webResponse.GetResponseStream(), encoding))
        {
            content = reader.ReadToEnd();
        }
    }
}
From a network or virtual path:
string content = string.Empty;
path = HttpContext.Current.Server.MapPath(path);
if (!File.Exists(path))
{
    if (_log.IsErrorEnabled) _log.Error(string.Format("file {0} not found", path));
}
else
{
    // File.ReadAllText opens, reads, and closes the file in one call.
    content = File.ReadAllText(path, encoding);
}

You need to use the wwAspRuntimeHost class.
Rick Strahl had a post on this, and I used the same approach he recommends to host the ASP.NET runtime engine in a non-IIS environment. Here's the link:
http://www.west-wind.com/presentations/aspnetruntime/aspnetruntime.asp
(update to the original post)
http://www.west-wind.com/Weblog/posts/1197.aspx
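For readers who want to see the shape of the runtime-hosting approach without the wrapper class, here is a minimal sketch using the .NET Framework's own System.Web.Hosting types. The class name AspxRenderHost and the paths are hypothetical placeholders, and this is not the wwAspRuntimeHost implementation. It assumes the compiled assembly containing AspxRenderHost is also copied into the site's bin folder (or installed in the GAC), because the ASP.NET runtime loads the host type inside a new AppDomain rooted at the site directory.

```csharp
using System;
using System.IO;
using System.Web;
using System.Web.Hosting;

// Lives inside the ASP.NET AppDomain; MarshalByRefObject lets the
// console application call it across the AppDomain boundary.
public class AspxRenderHost : MarshalByRefObject
{
    public void RenderPage(string page, string query, string outputFile)
    {
        using (var writer = new StreamWriter(outputFile))
        {
            // SimpleWorkerRequest sends the rendered output to the TextWriter.
            var request = new SimpleWorkerRequest(page, query, writer);
            HttpRuntime.ProcessRequest(request);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical paths: replace with the folder that holds the .aspx files.
        string siteRoot = @"C:\sites\demo";
        string output = @"C:\temp\myPage.html";

        var host = (AspxRenderHost)ApplicationHost.CreateApplicationHost(
            typeof(AspxRenderHost), "/", siteRoot);

        host.RenderPage("myPage.aspx", null, output);
    }
}
```

The page is compiled and executed exactly as it would be under IIS, so a page that depends on session state, a database, or a missing codebehind class will still fail at this point.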

This is what ASP.NET does all the time. It looks for an ASPX page on the file system, compiles it, if required, and then processes the request.
Codebehind is optional. You can have a website with only ASPX in it, without any precompiled code.
Here's an ASPX page without codebehind:
<%@ Page language="c#" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
<HEAD>
<title>ClearCache</title>
</HEAD>
<body>
<form id="ClearCache" method="post" runat="server">
<%
IList keys = new ArrayList(Cache.Count);
foreach (DictionaryEntry de in Cache)
keys.Add(de.Key);
foreach (string key in keys)
{
this.Response.Write(key + "<br>");
Cache.Remove(key);
}
%>
</form>
</body>
</HTML>
Downloading the file as html:
var wc = new WebClient();
wc.DownloadFile(myUrl, filename);
If you don't have an ASP.NET web server, you have to start one. Cassini is great for this. Then your code should look like this:
var server = new Server(80,"/", pathToWebSite);
server.Start();
var wc = new WebClient();
wc.DownloadFile(server.RootUrl + "myPage.aspx", filename);
server.Stop();
If you run this more than once, the server instance should be cached and reused rather than restarted each time.
Note that you could also use a RuntimeHost as mentioned by code4life. Cassini does something similar. I'd give both a try and see which better fits your purpose.

ASPX files are dynamic => generated HTML depends on state of the application.
If you are missing the codebehind file, you cannot properly translate the code.
Mono Project has a code evaluator. That said, it won't help you without application state.
The only thing you can do is parse the aspx file as xml (if it is valid) and filter out the dynamic content.
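As a rough illustration of the "filter out the dynamic content" idea, the sketch below uses regular expressions rather than an XML parser, since directives such as <%@ Page %> are not valid XML. The class and method names are hypothetical. It only removes <% ... %> blocks and runat="server" attributes; server controls such as <asp:Label> are left in place and would need separate handling.

```csharp
using System.Text.RegularExpressions;

public static class AspxStripper
{
    // Returns the static markup of an .aspx source string.
    public static string StripServerBlocks(string aspx)
    {
        // Remove directives, code blocks, and expressions: <%@ %>, <% %>, <%= %>, <%# %>.
        string result = Regex.Replace(aspx, @"<%.*?%>", string.Empty, RegexOptions.Singleline);

        // Remove runat="server" so the remaining tags are plain HTML.
        result = Regex.Replace(result, @"\s+runat\s*=\s*""server""", string.Empty, RegexOptions.IgnoreCase);

        return result;
    }
}
```

The result is only the static skeleton of the page, not what a browser would receive from a running application.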

I don't think you can do what you need to without the ASP.NET runtime. If you have the ASP.NET runtime and still want to generate an HTML file from the content of an ASPX file, you could write an IHttpModule which writes the response text to a file.
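One possible shape for such a module is sketched below: it installs a response filter that copies every byte sent to the client into a file. TeeStream, SaveHtmlModule, and the ~/App_Data target folder are hypothetical names chosen for illustration. The sketch also assumes ASP.NET closes the installed filter once the response has been fully written, which is worth verifying in your environment.

```csharp
using System;
using System.IO;
using System.Web;

// Write-only pass-through stream: forwards output to the real response
// and writes the same bytes to a second stream (the file copy).
public class TeeStream : Stream
{
    private readonly Stream _inner;
    private readonly Stream _copy;

    public TeeStream(Stream inner, Stream copy) { _inner = inner; _copy = copy; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        _copy.Write(buffer, offset, count);
    }

    public override void Flush() { _inner.Flush(); _copy.Flush(); }

    protected override void Dispose(bool disposing)
    {
        // Assumption: ASP.NET disposes the filter after the final flush.
        if (disposing) _copy.Dispose();
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}

public class SaveHtmlModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext ctx = ((HttpApplication)sender).Context;
            if (!ctx.Request.Path.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase)) return;

            // Hypothetical target: ~/App_Data/<page name>.html
            string name = Path.GetFileNameWithoutExtension(ctx.Request.Path) + ".html";
            string target = ctx.Server.MapPath("~/App_Data/" + name);
            ctx.Response.Filter = new TeeStream(ctx.Response.Filter, File.Create(target));
        };
    }

    public void Dispose() { }
}
```

The module still has to be registered in web.config before it runs, and concurrent requests for the same page would compete for the same output file.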

If I'm understanding your question correctly, you want an instance of the Page class created (i.e. the aspx page is compiled) and ultimately the resulting html? But you want that to happen outside the context of a web server request?
If you're looking for the html after an aspx page is actually processed, why not just grab the html returned after a page is actually rendered via IIS or whatever?
Perhaps if you shared your motivation(s) for attempting this you'll get some solid suggestions...

Related

Get Document OuterHTML of MVC Application in C#

We need to export an entire page of our MVC application to PDF, so we need all of the page's HTML, including dynamic content.
To get the contents of the page we used the following code:
string contents = File.ReadAllText(path);
but it returns only the static content of the page (the page source), not the nodes added to the DOM afterwards.
We then tried the following code, but it also returns only static content:
// WebClient object
WebClient client = new WebClient();
// Retrieve resource as a stream
Stream data = client.OpenRead(new Uri("xxxx.html"));
// Retrieve the text
StreamReader reader = new StreamReader(data);
string htmlContent = reader.ReadToEnd();
So I want to get the entire outerHTML of the document in C# without using any third-party DLL. I googled many links, and everyone suggests using the WebBrowser control to get the content.
I don't know how that would be useful for our application. Our application is MVC 4. We need to export the entire page to PDF, so we need the entire HTML content (dynamic content too).
How can I use the code below in our MVC application to get the document's outerHTML?
mshtml.HTMLDocument doc = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
string html = doc.documentElement.outerHTML;
or
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser.Document.DomDocument;
StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML);
htmlDoc.Load(sr);
Any help on this.
You haven't mentioned what the PDF is intended for. Most likely it is for the visitor of the page to download. If that is true, maybe you could use jsPDF. That way you get around the problem with not having access to the entire page serverside.

What's the most efficient way to visit a .html page?

I have a .html page that just has 5 characters on it (4 numbers and a period).
The only way I know of is to make a webbrowser that navigates to a URL, then use
browser.GetElementByID();
However that uses IE so I'm sure it's slow. Is there any better way (without using an API, something built into C#) to simply visit a webpage in a fashion that you can read off of it?
Try these 2 lines:
var wc = new System.Net.WebClient();
string html = wc.DownloadString("http://google.com"); // Your page will be in that html variable
It appears that you want to download a url, parse it as html then to find an element and read its inner text, right? Use nuget to grab a reference to HtmlAgilityPack, then:
using (var wc = new System.Net.WebClient())
{
    string html = wc.DownloadString("http://foo.com");
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var el = doc.GetElementbyId("foo");
    if (el != null)
    {
        var text = el.InnerText;
        Console.WriteLine(text);
    }
}
Without using any APIs? You're in the .NET framework, so you're already using an abstraction layer to some extent. But if you want pure C# without any addons, you could just open a TCP socket to the site and download the contents (it's just a formatted string, after all) and read the data.
Here's a similar question: How to get page via TcpClient?
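To make the raw-socket suggestion concrete, here is a minimal sketch of a plain HTTP GET over TcpClient. The RawHttp class and its method names are hypothetical. It assumes an unencrypted http:// site on port 80, follows no redirects, and uses HTTP/1.0 so the server does not answer with chunked encoding.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class RawHttp
{
    // Builds the request text; an HTTP request is just formatted ASCII.
    public static string BuildRequest(string host, string path)
    {
        return "GET " + path + " HTTP/1.0\r\n" +
               "Host: " + host + "\r\n" +
               "Connection: close\r\n\r\n";
    }

    // Sends the request and returns the response body as a string.
    public static string Get(string host, string path)
    {
        using (var client = new TcpClient(host, 80))
        using (NetworkStream stream = client.GetStream())
        {
            byte[] request = Encoding.ASCII.GetBytes(BuildRequest(host, path));
            stream.Write(request, 0, request.Length);

            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                // The server closes the connection when done, which ends the read.
                string response = reader.ReadToEnd();

                // Headers and body are separated by an empty line.
                int bodyStart = response.IndexOf("\r\n\r\n", StringComparison.Ordinal);
                return bodyStart >= 0 ? response.Substring(bodyStart + 4) : response;
            }
        }
    }
}
```

In practice WebClient.DownloadString does all of this and also handles HTTPS, redirects, and encodings, so the socket version is mainly of interest for seeing what happens on the wire.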

How do I recover the full html of a page, including what is generated by javascript

How do I recover the full HTML of a page, including what is generated by JavaScript? The problem is that I want to access the contents of the select tag, but it comes back empty, probably because it is generated dynamically. Please help, I'm about to give up!
I have posted only a piece of the code because it is very large; I can post the whole code if necessary.
res = (HttpWebResponse)req.GetResponse();
res.Cookies = req.CookieContainer.GetCookies(req.RequestUri);
cookieContainer.Add(res.Cookies);
sr = new StreamReader(res.GetResponseStream());
getHtml = sr.ReadToEnd();
viewstate = rxViewstate.Match(getHtml).Groups[1].Value;
EventValdidation = rxEventValidation.Match(getHtml).Groups[1].Value;
viewstate = HttpUtility.UrlEncode(viewstate);
EventValdidation = HttpUtility.UrlEncode(EventValdidation);
//Here I should take the contents of the select tag.
getHtml = rxDropDownMenu.Match(getHtml).Groups[2].Value;
You can't do this with HttpWebRequest alone; all it does is download the raw HTML and none of the linked JavaScript files.
It also wouldn't run the JavaScript or give you any kind of DOM to inspect.
You'd really need to use WebBrowser or perhaps something like Awesomium.

Download .aspx file from WebClient object

I'm trying to download a ".aspx" file from the web server using a WebClient object and save it to the file system, but it raises an "HTTP 500 Internal Error" exception. I think this is because the server tries to render the html and send that content rather than the file itself.
var objWebClient = new WebClient();
var remoteUrl = "someserverURL" + "default.aspx";
objWebClient.DownloadFile(remoteUrl, localPathToSave);
I tried adding HTTP headers, but I think they might not be of use since the request comes from a desktop application and not a browser. I have also set the server to serve all content as "application/octet-stream".
You can't do this.
If the web server is set up correctly, it will not allow you to directly download an aspx file.
The reason it downloads all your other files like jpegs and text files is because the web server will happily serve these file types and allow them to be downloaded.
If what you are attempting to do was possible then anyone would be able to download the aspx source files for any .net site, which would be hugely insecure.
What you could do is to get the rendered html content from the .aspx page and save that.
var webClient = new WebClient();
var remoteUrl = "someserverURL" + "default.aspx";
byte[] data = webClient.DownloadData(remoteUrl);
var utf8Encoding = new UTF8Encoding();
var html = utf8Encoding.GetString(data);
//now you could save the html to a file

Creating XML in C# for jQuery

I'm trying to generate some XML for a jQuery.get (AJAX) call, and I'm getting the following error from my C# page: "Using themed css files requires a header control on the page. (e.g. <head runat="server" />)."
The file generating the XML is a simple .aspx file, consisting entirely of:
<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="ChangePeopleService.aspx.cs" Inherits="ChangeRegister.Person.ChangePeopleService" EnableTheming="false" %>
with codebehind using Linq-to-XML, which is working ok:
XElement xml = new XElement("People",
from p in People
select new XElement("Person", new XAttribute("Id", p.Id),
new XElement("FirstName", p.FirstName)));
HttpContext.Current.Response.ContentType = "text/xml";
HttpContext.Current.Response.Write(xml.ToString());
I know that the error relates to the Web.Config's <pages styleSheetTheme="default" theme="default"> tag, because when I remove the 'styleSheetTheme' and 'theme' attributes, the XML gets generated ok. The problem then obviously is that every other page loses its styling. All this leads me to think that I'm approaching this wrong.
My question is: what's an accepted way to generate XML in C#, for consumption by a jQuery AJAX call, say?
If I am returning simple data (not a page), I probably wouldn't use aspx; that is really web-forms, but what you are returning isn't a web-form. Two options leap to mind:
use ASP.NET MVC; sounds corny, but it really is geared up to return different types of response much more elegantly
use a handler (ashx) - which omits all the web-form noise, just leaving you with a HttpContext with which to construct your response
You could also try (within aspx) clearing the response (Clear()?) and calling Close() afterwards. But IMO a lot more roundabout than just using a handler.
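A minimal sketch of the handler option might look like the following. PeopleHandler and the sample data are hypothetical stand-ins for the question's People collection. The class would be referenced from an .ashx file containing a WebHandler directive.

```csharp
using System.Linq;
using System.Web;
using System.Xml.Linq;

// Referenced from PeopleHandler.ashx with:
// <%@ WebHandler Language="C#" Class="PeopleHandler" %>
public class PeopleHandler : IHttpHandler
{
    // The handler keeps no per-request state, so one instance can be reused.
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Placeholder data; the real code would query the People source.
        var people = new[]
        {
            new { Id = 1, FirstName = "Ada" },
            new { Id = 2, FirstName = "Alan" }
        };

        XElement xml = new XElement("People",
            from p in people
            select new XElement("Person", new XAttribute("Id", p.Id),
                new XElement("FirstName", p.FirstName)));

        context.Response.ContentType = "text/xml";
        context.Response.Write(xml.ToString());
    }
}
```

Because a handler is not a Page, the theme settings in Web.config do not apply to it, which sidesteps the header-control error from the question.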
You need to use theme=""
example:
<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="ChangePeopleService.aspx.cs" Inherits="ChangeRegister.Person.ChangePeopleService" Theme="" %>
Try writing to the Response.OutputStream instead:
HttpContext.Current.Response.ContentType = "text/xml";
HttpContext.Current.Response.ContentEncoding = Encoding.UTF8;
using (TextWriter textWriter
    = new StreamWriter(HttpContext.Current.Response.OutputStream, Encoding.UTF8))
using (XmlTextWriter writer = new XmlTextWriter(textWriter))
{
    // WriteTo emits the element as markup; WriteString would escape
    // the angle brackets and send the XML as plain text.
    xml.WriteTo(writer);
}
