I am developing a local server using self-hosted ServiceStack. I hardcoded a demo webpage and allow it to be accessed at localhost:8080/page:
public class PageService : IService<Page>
{
public object Execute(Page request)
{
var html = System.IO.File.ReadAllText(@"demo_chat2.html");
return html;
}
}
// set route
public override void Configure(Container container)
{
Routes
.Add<Page>("/page")
.Add<Hello>("/hello")
.Add<Hello>("/hello/{Name}");
}
It works fine in Chrome, Firefox, and Opera; however, IE treats the URL as a download request and prompts "Do you want to open or save page from localhost?"
What shall I do to let IE treat the url as a web page? (I already added doctype headers to the demo page; but that cannot prevent IE from treating it as a download request.)
EDIT.
Ok. I used Fiddler to check the response when accessing localhost. The responses that IE and Firefox get are exactly the same. And in the header the content type is written as:
Content-Type: text/html,text/html
Firefox treats this content type as text/html, however IE does not recognize this content type (it only recognizes a single text/html)!
So this leads me to believe that this is due to a bug in SS.
Solution
One solution is to explicitly set the content type:
return new HttpResult(
new MemoryStream(Encoding.UTF8.GetBytes(html)), "text/html");
I'm not sure what your exact problem is. If you want to serve an HTML page, there is a different way to do it.
ServiceStack supports the Razor engine as a plugin, which is useful if you want to serve HTML pages and bind data to them. This approach is explained at razor.servicestack.net. Let me know if you need any additional details.
Related
Is there a way to get the fully rendered html of a web page using WebClient instead of the page source? I'm trying to scrape some data from the page's html. My current code is like this:
WebClient client = new WebClient();
var result = client.DownloadString("https://somepageoutthere.com/");
//using CsQuery
CQ dom = result;
var someElementHtml = dom["body > main"];
WebClient will only return the content at the URL you requested. It will not run any JavaScript on the page (which runs on the client), so if JavaScript changes the page DOM in any way, you will not get that through WebClient.
You are better off using some other tools. Look for those that will render the HTML and javascript in the page.
I don't know what you mean by "fully rendered", but if you mean "with all data loaded by ajax calls", the answer is: no, you can't.
The data which is not present in the initial html page is loaded through javascript in the browser, and WebClient has no idea what javascript is, and cannot interpret it, only browsers do.
To get this kind of data, you need to identify these calls (if you don't know the url of the data webservice, you can use tools like Fiddler), simulate/replay them from your application, and then, if successful, get response data, and extract data from it (will be easy if data comes as json, and more tricky if it comes as html)
Better to use http://html-agility-pack.net
It has all the functionality needed to scrape web data, and the site has good help.
So I'm trying to read the source of an url, let's say domain.xyz. No problem, I can simply get it work using HttpWebRequest.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
My problem is that it will return the page source, but without the source of the iframe inside this page. I only get something like this:
<iframe src="http://anotherdomain.xyz/frame_that_only_works_on_domain_xyz"></iframe>
I figured out that I can easily get the src of the iframe with WebBrowser, or with basic string functions (the results are the same), and create another HttpWebRequest using that address. The problem is that if I view the full page (where the frame was inserted) in a browser (Chrome), I get the expected results. But if I copy the src to another tab, the contents are not the same. It says that the content I want to view is blocked because it's only allowed through domain.xyz.
So my final question is:
How can I simulate the request through a specified domain, or get the full, rendered page source?
That's likely the referer property of the web request: typically a browser tells the web server where it found the link to the page it is requesting.
That means, when you create the web request for the iframe, you set the referer property of that request to the page containing the link.
If that doesn't work, cookies may be another option. I.e. you have to collect the cookies sent for the first request, and send them with the second request.
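The two suggestions above (set the referer, and carry cookies over from the first request) can be combined in one sketch. This is only an illustration under assumptions: FrameFetcher and FetchFrame are hypothetical names, and whether the remote server actually checks the referer, the cookies, or something else entirely is not known from the question.

```csharp
using System.IO;
using System.Net;

static class FrameFetcher
{
    // Requests the containing page first, then requests the iframe URL
    // with the Referer header and the cookies a browser would have sent.
    public static string FetchFrame(string pageUrl, string frameUrl)
    {
        // Shared container: cookies set by the first response are stored
        // here and sent automatically with the second request.
        var cookies = new CookieContainer();

        var pageRequest = (HttpWebRequest)WebRequest.Create(pageUrl);
        pageRequest.CookieContainer = cookies;
        using (pageRequest.GetResponse())
        {
            // The body is not needed here; only the cookies matter.
        }

        var frameRequest = (HttpWebRequest)WebRequest.Create(frameUrl);
        frameRequest.Referer = pageUrl;          // pretend the link was followed from the page
        frameRequest.CookieContainer = cookies;  // replay cookies from the first request

        using (var frameResponse = (HttpWebResponse)frameRequest.GetResponse())
        using (var reader = new StreamReader(frameResponse.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call would look like FrameFetcher.FetchFrame("http://domain.xyz/", iframeSrc), where iframeSrc is the src value extracted from the first page.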
I am having some trouble adding a value to the Page.Request & Page.Response headers and have the key & value stay/persist through a redirect.
I have an enum tracking code that I want to place in the headers to trace how a user goes through my site prior to their checkout.
I am using this code to add the headers to response and request context.
var RequestSessionVariable = context.Request.Headers["SessionTrackingCode"];
if (RequestSessionVariable == null)
{
context.Response.AddHeader("SessionTrackingCode", ((int)tracker).ToString());
context.Request.Headers.Add("SessionTrackingCode", ((int)tracker).ToString());
}
else
{
if(!RequestSessionVariable.Contains(((int)tracker).ToString()))
{
RequestSessionVariable += ("," + ((int)tracker).ToString());
context.Request.Headers["SessionTrackingCode"] = RequestSessionVariable;
context.Response.Headers["SessionTrackingCode"] = RequestSessionVariable;
}
}
The method call that occurs in Page_Load of the necessary controls within the website:
trackingcodes.AddPageTrackingCode(TrackingCode.TrackingCodes.ShoppingCart, this.Context);
The header SessionTrackingCode is there, but after a Response.Redirect("~/value.aspx") the RequestSessionVariable is always null. Is there something that happens on the redirect that will wipe out the headers that I add? Or what am I doing wrong when adding the header key and value?
this equals:
public partial class Cart : System.Web.UI.UserControl
Headers are sent by the client on every request, so after any redirect the client has to send the headers again.
Unless you are using some special client (not a browser), any custom headers will essentially be ignored or lost between requests. A browser only sends known headers (cookies, authentication, referrer) in requests and acts on another set of known headers in responses (Set-Cookie). You are using a custom header that is not known to the browser, so the browser will neither read it from the response nor send it in the next request.
Your options:
switch to use cookies for your tracking (same as everyone else)
use AJAX requests to send/receive custom headers (probably not what you are looking for as urls look like regular GET/POST ones)
build a custom client that will pay attention to your headers (purely theoretical; unless you are building some sort of sales terminal, no one will install your client to visit your site)
Note: adding headers to the request in page code does not make much sense, as the request will not be sent anywhere (it is what came from the browser).
This looks like a job for cookies, rather than http headers. The browser will not return your custom headers to you, but it will return your cookies.
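A minimal sketch of the cookie approach, assuming classic ASP.NET (System.Web). TrackingCookie and AppendCode are hypothetical names, and the '|' separator is an assumption chosen because commas are not valid characters in a cookie value under RFC 6265. The codes are compared as whole entries rather than with a substring check, so code 1 is not mistaken for part of code 12.

```csharp
using System.Linq;
using System.Web;

public static class TrackingCookie
{
    private const string CookieName = "SessionTrackingCode";
    private const char Separator = '|';

    // Appends a code to the separator-delimited list unless it is already present.
    public static string AppendCode(string existing, int code)
    {
        string text = code.ToString();
        if (string.IsNullOrEmpty(existing))
            return text;

        return existing.Split(Separator).Contains(text)
            ? existing
            : existing + Separator + text;
    }

    // Reads the cookie the browser sent, adds the code, and sends the
    // updated cookie back so it survives the next redirect.
    public static void AddPageTrackingCode(int code, HttpContext context)
    {
        HttpCookie incoming = context.Request.Cookies[CookieName];
        string updated = AppendCode(incoming == null ? null : incoming.Value, code);
        context.Response.Cookies.Add(new HttpCookie(CookieName, updated));
    }
}
```

From a control's Page_Load this could be called as TrackingCookie.AddPageTrackingCode((int)TrackingCode.TrackingCodes.ShoppingCart, this.Context).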
I have a simple form that uploads an image to a database. Using a controller action, the image can then be served back (I've hard coded to use jpegs for this code):
public class ImagesController : Controller
{
[HttpPost]
public ActionResult Create(HttpPostedFileBase image)
{
var message = new MessageItem();
message.ImageData = new byte[image.ContentLength];
image.InputStream.Read(message.ImageData, 0, image.ContentLength);
this.session.Save(message);
return this.RedirectToAction("index");
}
[HttpGet]
public FileResult View(int id)
{
var message = this.session.Get<MessageItem>(id);
return this.File(message.ImageData, "image/jpeg");
}
}
This works great and directly browsing to the image (e.g. /images/view/1) displays the image correctly. However, I noticed that when FireBug is turned on, I'm greeted with a lovely error:
Image corrupt or truncated: ... (followed by the base64 representation of the image).
Additionally in Chrome developer tools:
Resource interpreted as Document but transferred with MIME type image/jpeg.
I checked the headers that are being returned. The following is an example of the headers sent back to the browser. Nothing looks out of the ordinary (perhaps the Cache-Control?):
Cache-Control private, s-maxage=0
Content-Type image/jpeg
Server Microsoft-IIS/7.5
X-AspNetMvc-Version 3.0
X-AspNet-Version 4.0.30319
X-SourceFiles =?UTF-8?B?(Trimmed...)
X-Powered-By ASP.NET
Date Wed, 25 May 2011 23:48:22 GMT
Content-Length 21362
Additionally, I thought I'd mention that I'm running this on IIS Express (even tested on Cassini with the same results).
The odd part is that the image displays correctly but the consoles are telling me otherwise. Ideally I'd like to not ignore these errors. Finally, to further add to the confusion, when referenced as an image (e.g. <img src="/images/view/1" />), no error occurs.
EDIT: It is possible to fully reproduce this without any of the above actions:
public class ImageController : Controller
{
public FileResult Test()
{
// I know this is directly reading from a file, but the whole purpose is
// to return a *buffer* of a file and not the *path* to the file.
// This will throw the error in FireBug.
var buffer = System.IO.File.ReadAllBytes("PATH_TO_JPEG");
return this.File(buffer, "image/jpeg");
}
}
You're assuming the MIME type is always image/jpeg, and you're not using the MIME type of the uploaded image. I've seen these MIME types posted by different browsers for uploaded images:
image/gif
image/jpeg
image/pjpeg
image/png
image/x-png
image/bmp
image/tiff
Maybe image/jpeg is not the correct MIME type for the file and the dev tools are giving you a warning.
Could it be that the session.Save/Get is truncating the jpeg?
Use Fiddler and save this file on the server. Attempt a GET request directly to the image. Then attempt the GET to the action method. Compare fiddler's headers and content (can save out and compare with a trial of BeyondCompare). If they match for both get requests - well.. that wouldn't make sense - something would be different in that case and hopefully point to the issue. Something has to be different - but without seeing the fiddler output its hard to say : )
Could it possibly be that the image itself is corrupt? If you save it as a file on your website and access it directly does the error come up? How does that request look compared to your action request in Fiddler? It could be the browsers are trying to get the content type by extension, you could try a route like this to see if there are any changes:
routes.MapRoute(
"JpegImages",
"Images/View/{id}.jpg",
new { controller = "Images", action = "View" }
);
One more thing to check. image.InputStream.Read() returns an integer which is the actual number of bytes read. It may be that all the bytes aren't able to be read at once, can you record that and throw an error if the numbers don't match?
int bytesRead = image.InputStream.Read(message.ImageData, 0, image.ContentLength);
if (bytesRead != image.ContentLength)
throw new Exception("Invalid length");
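Rather than throwing on a short read, the upload can be read in a loop until the expected number of bytes has arrived, since Stream.Read is allowed to return fewer bytes than requested. The following is a sketch; StreamHelper and ReadExactly are hypothetical names, not part of the framework used in the question.

```csharp
using System.IO;

static class StreamHelper
{
    // Reads exactly 'length' bytes from the stream, calling Read repeatedly
    // because a single call may return only part of the data.
    public static byte[] ReadExactly(Stream input, int length)
    {
        var buffer = new byte[length];
        int offset = 0;

        while (offset < length)
        {
            int read = input.Read(buffer, offset, length - offset);
            if (read == 0)
            {
                // Read returns 0 only at end of stream, so the data is short.
                throw new EndOfStreamException(
                    "Stream ended after " + offset + " of " + length + " bytes.");
            }
            offset += read;
        }

        return buffer;
    }
}
```

In the Create action this would replace the single Read call: message.ImageData = StreamHelper.ReadExactly(image.InputStream, image.ContentLength);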
I wonder if it is something to do with X-SourceFiles. I'm doing the exact same thing as you with MVC but I am persisting my byte array in the database. The only difference I don't understand in our headers is the X-SourceFiles.
Here is something about what X-SourceFiles does, in the question "What does the X-SourceFiles header do?", and it talks about encoding. So maybe? The answerer claims this only happens on localhost, by the way.
As far as I understand it if you are returning a proper byte array that is a jpeg then your code should work fine. That is exactly what I'm doing successfully (without an X-SourceFiles header).
Thanks everyone for all the help. I know this is going to be a very anticlimactic ending for this problem, but I was able to "resolve" the issue. I tried building my code from another machine using the same browser/Firebug versions. Oddly enough, no errors appeared. When I went back to the other machine (cleared all cache and even re-installed browser/Firebug) it was still getting the error. What's even weirder is that both Chrome and Firefox are now showing the error when I visit other websites.
Again, thanks everyone for all their suggestions!
I'm downloading a web site using WebClient
public void download()
{
client = new WebClient();
client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
client.Encoding = Encoding.UTF8;
client.DownloadStringAsync(new Uri(eUrl.Text));
}
void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
SaveFileDialog sd = new SaveFileDialog();
if (sd.ShowDialog() == DialogResult.OK)
{
StreamWriter writer = new StreamWriter(sd.FileName,false,Encoding.Unicode);
writer.Write(e.Result);
writer.Close();
}
}
This works fine. But I am unable to read content that is loaded using ajax. Like this:
<div class="center-box-body" id="boxnews" style="width:768px;height:1167px; ">
loading .... </div>
<script language="javascript">
ajax_function('boxnews',"ajax/category/personal_notes/",'');
</script>
This "ajax_function" downloads data from server on the client side.
How can I download the full web html data?
To do so, you would need to host a Javascript runtime inside of a full-blown web browser. Unfortunately, WebClient isn't capable of doing this.
Your only option would be automation of a WebBrowser control. You would need to send it to the URL, wait until both the main page and any AJAX content has been loaded (including triggering that load if user action is required to do so), then scrape the entire DOM.
If you are only scraping a particular site, you are probably better off just pulling the AJAX URL yourself (simulating all required parameters), rather than pulling the web page that calls for it.
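Pulling the AJAX URL directly might look like the sketch below. It assumes that the second argument of ajax_function in the question is a path relative to the page URL, which is a guess from the markup; AjaxFetcher and FetchFragment are hypothetical names. The X-Requested-With header is included only because some endpoints check for it, and it is not known whether this one does.

```csharp
using System;
using System.Net;
using System.Text;

static class AjaxFetcher
{
    // Downloads the fragment that the page's script would have requested,
    // resolving the relative AJAX path against the page address.
    public static string FetchFragment(Uri pageUri, string relativeAjaxPath)
    {
        var ajaxUri = new Uri(pageUri, relativeAjaxPath);

        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            // Assumption: mimic the header many AJAX libraries send.
            client.Headers["X-Requested-With"] = "XMLHttpRequest";
            return client.DownloadString(ajaxUri);
        }
    }
}
```

For the page in the question, the call would be along the lines of AjaxFetcher.FetchFragment(new Uri(eUrl.Text), "ajax/category/personal_notes/"); the returned string is the content that would otherwise replace the "loading ...." placeholder.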
I think you'd need to use a WebBrowser control to do this since you actually need the javascript on the page to run to complete the page load. Depending on your application this may or may not be possible for you -- note it's a Windows.Forms control.
When you visit a page in a browser, it
1. downloads a document from the requested url
2. downloads anything referenced by an img, link, script, etc. tag (anything that references an external file)
3. executes javascript where applicable.
The WebClient class only performs step 1. It encapsulates a single http request and response. It does not contain a script engine, and does not, as far as I know, find image tags, etc that reference other files and initiate further requests to obtain those files.
If you want to get a page once it's been modified by an AJAX call and handler, you'll need to use a class that has the full capabilities of a web browser, which pretty much means using a web browser that you can somehow automate server-side. The WebBrowser control does this, but it's for WinForms only, I think. I shudder to think of the security issues here, or the demand that would be placed on the server if multiple users are taking advantage of this facility simultaneously.
A better question to ask yourself is: why are you doing this? If the data you're really interested in is being obtained via AJAX (probably through a web service), why not skip the webClient step and just go straight to the source?