I have these lines in my global.asax (basically because of the question "Can I add my caching lines to global.asax?").
What I want to understand now is whether this code purely adds the HTTP headers to the page, or whether it also makes .NET cache this page on the server for 300 seconds.
Response.Cache.SetExpires(DateTime.Now.AddSeconds(300));
Response.Cache.SetCacheability(HttpCacheability.Public);
Your page will be stored in output cache, too. Are you sure you want to do this for every page on the site?
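If you only want to emit the client-side HTTP headers without storing the page in the server's output cache, one option (a hedged sketch, not part of the original answer) is to also call SetNoServerCaching:

Response.Cache.SetExpires(DateTime.Now.AddSeconds(300));
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetNoServerCaching(); // send the caching headers but skip the server-side output cache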
I understand that Server.Transfer doesn't make a round trip back to the requesting client.
What I haven't been able to learn is whether control is simply passed directly to the new request handler you're transferring to, or whether the entire request life-cycle is executed again.
I assume the entire life-cycle is executed again using the transfer URL but wanted to verify this was the case.
Here is what I found through experimentation.
When using Server.Transfer the entire request life cycle is not run again.
If you write your own module, hook it into the request life cycle, and call Server.Transfer from that module, the rest of the request life cycle will be skipped and the page life cycle will begin immediately (a sketch of such a module follows these findings).
After the transferred page's life cycle completes, the request life cycle picks back up with its tear-down events. Note that the HttpContext for the tear-down events will be the original one you transferred from. That is, the URL and QueryString values will be those of the original request, not those of the page you transferred to.
Server.Transfer does modify the HttpContext.Request object to contain the new URL and QueryString information during the page life cycle for the page you transferred to.
If you transfer to a resource that is not a page but is text based (e.g. something.xml), the content of that resource will be returned exactly as is, with its content type set to text/html.
If you transfer to a resource that is not a page and is not text based (e.g. something.pdf), an HttpException will be thrown. This happens even if you have defined a custom handler for this resource.
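A minimal sketch of the module experiment described above (the type name and paths are illustrative, not from the original post):

using System;
using System.Web;

public class TransferDemoModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            var context = ((HttpApplication)sender).Context;
            if (context.Request.Path.EndsWith("old.aspx", StringComparison.OrdinalIgnoreCase))
            {
                // The remaining request life-cycle events are skipped and the
                // page life cycle for new.aspx begins immediately.
                context.Server.Transfer("~/new.aspx");
            }
        };
        app.EndRequest += (sender, e) =>
        {
            // Tear-down still sees the ORIGINAL request's URL/QueryString,
            // not those of the page transferred to.
            var context = ((HttpApplication)sender).Context;
            System.Diagnostics.Debug.WriteLine(context.Request.RawUrl);
        };
    }

    public void Dispose() { }
}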
It's just passed along, with its state intact. The request lifecycle does not get run again, although the page lifecycle will run for the page you're transferring to.
http://msdn.microsoft.com/en-us/library/ms525800(v=vs.90).aspx
Server.Transfer acts as an efficient replacement for the Response.Redirect method. Response.Redirect specifies to the browser to request a different page. Because a redirect forces a new page request, the browser makes two requests to the Web server, so the Web server handles an extra request. IIS 5.0 introduced a new function, Server.Transfer, which transfers execution to a different ASP page on the server. This avoids the extra request, resulting in better overall system performance, as well as a better user experience.
This link is also helpful -
http://www.developer.com/net/asp/article.php/3299641/ServerTransfer-Vs-ResponseRedirect.htm
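To make the difference concrete, a minimal sketch (the page name is illustrative):

Response.Redirect("Page2.aspx"); // sends a 302; the browser issues a second request and its address bar changes
Server.Transfer("Page2.aspx");   // executes Page2.aspx server-side within the same request; the browser's address bar does not change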
I need to know where the files are saved when a page is cached using the following:
<%@ OutputCache Duration="60" VaryByParam="None" %>
Because sometimes I need to delete the files to 'reset' the page so I can get the latest data.
EDIT: A second question: does the above line use the memory of the server to save the cached pages?
Thanks
You could use the RemoveOutputCacheItem method to remove a cached page.
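For example (the path is illustrative, assuming the cached page lives at the application root):

HttpResponse.RemoveOutputCacheItem("/Default.aspx"); // evicts the cached copy so the next request regenerates the page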
does the above line use the memory of the server to save the cached pages?
This will depend on the value of the Location attribute. If you set it to Server then it will be stored in memory. If you set it to Client, then the page will be cached on the client browser.
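For example (a sketch; the Duration and Location values are illustrative):

<%@ OutputCache Duration="60" VaryByParam="None" Location="Server" %>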
I'm trying to get the list of requests that occur within the confines of a single HttpWebRequest from my aspx page.
When using Fiddler, you request a page from IE. While serving that request, the page requests some number of other files: Fiddler shows you that you are getting a .css file, a .js file, and maybe it's also requesting a couple of other pages before it renders.
I want to be able to make the HttpWebRequest from my aspx page and then monitor (or list out) the URLs that are being called within that request.
That said, I am open to alternative ways to do the request, e.g. an IFRAME, etc.
Maybe this just can't be done from an aspx page. Ideas?
If you are using an HttpWebRequest on the server, it is not going to download all of the other embedded resources. If you want to get the list of resources used on the page, you'll have to parse the HTML yourself.
Here's a related question that might be useful: How can I use HTML Agility Pack to retrieve all the images from a website?
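A rough sketch using HTML Agility Pack (the URL is illustrative, and the XPath only covers a few common resource-bearing tags):

using System;
using HtmlAgilityPack;

class ResourceLister
{
    static void Main()
    {
        // Download and parse the page, then list the resources it references.
        var doc = new HtmlWeb().Load("http://example.com/page.aspx");

        var nodes = doc.DocumentNode.SelectNodes(
            "//img[@src] | //script[@src] | //link[@href]");
        if (nodes == null) return; // no matching tags on the page

        foreach (var node in nodes)
        {
            string url = node.GetAttributeValue("src", null)
                      ?? node.GetAttributeValue("href", null);
            Console.WriteLine(url);
        }
    }
}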
This cannot be done from an ASPX page. I think you should hook into one of the Global.asax events (by writing a custom HttpModule) and intercept the requests there.
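A compact sketch of that interception idea (the type name is illustrative; note that a module only observes requests that actually pass through your own server's pipeline):

using System;
using System.Web;

public class RequestLoggingModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            // Log the URL of every request handled by this application.
            var context = ((HttpApplication)sender).Context;
            System.Diagnostics.Debug.WriteLine(context.Request.RawUrl);
        };
    }

    public void Dispose() { }
}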
Let's say I have two pages on the same ASP.NET C# WebSite.
Page1.aspx does things in the Page_Load event
I navigate to Page2.aspx using the menu
Page2.aspx does some things then Response.Redirect back to Page1.aspx
Page1.aspx cannot do things in Page_Load event this time because it never fires.
I tried to turn off cache declaratively, tried using true for endResponse in my redirect... nothing seems to make a difference.
Never mind everybody! I am a moron!
Using Visual Studio's dev localhost, the Redirect was actually redirecting to the live page! :)
The reason the page executes doesn't affect the page life cycle; the Load event always fires when the page is executed.
So, if the Page_Load doesn't run sometimes, it's because the page is cached and doesn't execute on the server. The page can be cached in the browser, in a router somewhere along the way, or on the server using server-side page caching.
If you haven't enabled server-side page caching for the page, it's cached in the browser or in the network. You can use cache settings to try to eliminate this:
Response.Cache.SetCacheability(HttpCacheability.NoCache);
This will keep the page from being cached in normal circumstances. (Check also that your browser isn't in offline mode; if it is, it will use anything in the cache regardless of its cacheability settings.)
When you navigate to a page using the Back button, the page is reloaded from memory, and no request is sent to the server.
You can confirm this using Fiddler.
I'm not sure if this is true in all browsers.
If you are redirecting, it's possible the client is caching the response. In order to get past this you might add an extra query parameter that simply holds the time.
This is usually enough to get past most pages' caching mechanisms.
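A minimal sketch (the parameter name ts is illustrative):

Response.Redirect("Page1.aspx?ts=" + DateTime.Now.Ticks); // a unique query string defeats cached copies of earlier responses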
Try using Server.Transfer instead of Response.Redirect.
The client will not see the URL change, but this may not matter, depending on your requirements.
I had the same problem and found that this works for me (add this in the Page_Load section):
if (this.Master.Page.Header != null && Session["RELOAD"] == null)
{
    // Inject a meta refresh tag into the page head so the browser reloads
    // the page after 3 seconds, forcing Page_Load to run again.
    System.Web.UI.HtmlControls.HtmlHead hh = this.Master.Page.Header;
    System.Web.UI.HtmlControls.HtmlMeta hm = new System.Web.UI.HtmlControls.HtmlMeta();
    hm.Attributes.Add("http-equiv", "Refresh");
    hm.Attributes.Add("content", "3"); // refresh interval in seconds
    hh.Controls.Add(hm);
}
Then I set Session["RELOAD"] = "1" right after the code I want to run executes, which prevents it from refreshing over and over again. Works like a charm.
Changing VS from Debug to Release mode worked for me.
Please run the following code to disable the page cache in Firefox.
Response.AppendHeader("Cache-Control", "no-store");
Apply this in the Page_Load of the master page.
I'm using a C# WebClient to post login details to a page and read all the results.
The page I am trying to load includes Flash (which, in the browser, translates into HTML). I'm guessing it's Flash to avoid being picked up by search engines?
The Flash content I am interested in is just text (not an image/video etc.), and when I "View Selection Source" in Firefox I do actually see the text, within HTML, that I want to see.
(Interestingly, when I view the source for the whole page I do not see the text, within HTML, that I want to see. Could this be related?)
Currently, after I have posted my login details and loaded the HTML back, I see the page which does NOT show the Flash HTML (as if I had viewed the source for the whole page).
Thanks in advance,
Jim
PS: I should point out that the POST is actually working, my log in is successful.
Fiddler (or a similar tool) is invaluable for tracking down screen-scraping problems like this. Using a normal browser with Fiddler active, look at all the requests being made as you go through the login and navigation process to get to the data you want. Along the way, you will likely see one or more things your code is doing differently, which the server is responding to by showing you different HTML than it shows a real client.
The list below (think of it as "scraping 101") is what you want to look for. Most of it is probably stuff you're already doing, but I included everything for completeness.
In order to scrape effectively, you may need to deal with one or more of the following:
cookies and/or hidden fields - when you show up at any page on a site, you'll typically get a session cookie and/or hidden form field which (in a normal browser) would be propagated back to the server on all subsequent requests. You will likely also get a persistent cookie. On many sites, if a request shows up without a proper cookie (or form field for sites using "cookieless sessions"), the site will redirect the user to a "no cookies" UI, a login page, or another undesirable location (from the scraper app's perspective). Always make sure you capture the cookies set on the initial request and faithfully send them back to the server on subsequent requests, except if one of those subsequent requests changes a cookie (in which case propagate that new cookie instead).
authentication tokens - a special case of the above is forms-authentication cookies or hidden fields. Make sure you're capturing the login token (usually a cookie) and sending it back.
POST vs. GET - this is obvious, but make sure you're using the same HTTP method that a real browser does.
form fields (esp. hidden ones!) - I'm sure you're doing this already, but make sure to send all form fields that a real browser does, not just the visible fields. Make sure fields are HTML-encoded properly.
HTTP headers - you already checked this, but it may make sense to check again just to make sure the (non-cookie) headers are identical. I always start with the exact same headers and then pull them out one by one, keeping only those whose absence causes the request to fail or return bogus data. This approach simplifies your scraping code.
redirects - these can either come from the server or from client script (e.g. "if the user doesn't have the flash plug-in loaded, redirect to a non-flash page"). See WebRequest: How to find a postal code using a WebRequest against this ContentType="application/xhtml+xml, text/xml, text/html; charset=utf-8"? for a crazy example of how redirection can trip up a screen-scraper. Note that if you're using .NET for scraping, you'll need to use HttpWebRequest (not WebClient) for redirect-dependent scraping, because by default WebClient doesn't provide a way for your code to attach cookies and headers to the second (post-redirect) request. See the thread above for more details. (A sketch covering cookies, headers, and redirects follows this list.)
sub-requests (frames, ajax, flash, etc.) - often, page elements (not the main HTTP request) will end up fetching the data you want to scrape. You'll be able to figure this out by looking at which HTTP response contains the text you want, and then working backwards until you find what on the page is actually making the request for that content. A few sites do really crazy things in sub-requests, like requesting compressed or encrypted text via ajax and then using client-side script to decrypt it. If this is the case, you'll need to do a bit more work, like reverse-engineering what the client script is doing.
ordering - this one is obvious: make HTTP requests in the same order that a browser client does. That doesn't mean you need to make every request (e.g. images); typically you only need to make the requests which return a text/html content type, unless the data you want is in an ajax/flash/etc. request rather than in the HTML.
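As noted in the redirects item, here is a minimal sketch using HttpWebRequest with a shared CookieContainer (the URL, form fields, and header values are all illustrative):

using System;
using System.IO;
using System.Net;

class ScrapingSketch
{
    static void Main()
    {
        // One CookieContainer shared across requests, so cookies set by the
        // login response (including during redirects) are sent back later.
        var cookies = new CookieContainer();

        var login = (HttpWebRequest)WebRequest.Create("https://example.com/login");
        login.Method = "POST";
        login.CookieContainer = cookies;
        login.ContentType = "application/x-www-form-urlencoded";
        login.UserAgent = "Mozilla/5.0"; // mimic a real browser's headers
        using (var writer = new StreamWriter(login.GetRequestStream()))
            writer.Write("user=jim&pass=secret"); // send ALL the form fields a real browser sends, hidden ones too
        using (login.GetResponse()) { } // redirects are followed automatically

        var page = (HttpWebRequest)WebRequest.Create("https://example.com/data");
        page.CookieContainer = cookies; // session/auth cookies go back to the server
        page.UserAgent = "Mozilla/5.0";
        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());
    }
}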
(Interestingly when I view the source for the whole page I do not see the text, within HTML, that I want to see. Could this be related?)
This usually means that the discrepancy is caused by DOM manipulation via JavaScript after the page has loaded. Try turning off JavaScript and see what the page looks like.