Telling search engine bots to wait - C#

Short story:
My site pre-generates pages based on user-submitted data. Sometimes this cache has to be cleared, and when that happens the load would kill a supercomputer unless I controlled the number of pages being generated at once.
The problem:
Now come the search engine bots that hit the site constantly (due to the sheer number of pages, there is pretty much always a bot crawling). The problem here is that they will use up all my "generate" slots, and real users will be left with a page saying "bla bla, please wait".
Possible solution:
Can I basically return a 503 to the bots without having them give me a negative ranking for running an unstable site?
Or did someone come up with some other solution?

How critical is it that the cache is cleared immediately? If your cache supports it, you could instead mark all the cached pages as 'dirty' and regenerate them when a real user next visits; if a bot visits in the meantime, serve them the stale page.
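For illustration, here is a minimal sketch of that idea, assuming an ASP.NET site. `CachedPage`, `RegeneratePage`, and the user-agent check are all hypothetical stand-ins, not an existing API; real bot detection could use `Request.Browser.Crawler` or a maintained user-agent list instead.

```csharp
using System.Web;

public class CachedPage
{
    public string Key;
    public string Html;
    public bool IsDirty;
}

public static class BotAwareCache
{
    // Deliberately crude user-agent check, for illustration only.
    public static bool IsCrawler(HttpRequest request)
    {
        string ua = (request.UserAgent ?? string.Empty).ToLowerInvariant();
        return ua.Contains("bot") || ua.Contains("crawler") || ua.Contains("spider");
    }

    public static string GetPage(HttpContext context, CachedPage entry)
    {
        if (!entry.IsDirty)
            return entry.Html;              // Fresh copy: serve to everyone.

        if (IsCrawler(context.Request))
            return entry.Html;              // Bot: serve the stale copy as-is.

        // Real user: spend one of the limited "generate" slots.
        entry.Html = RegeneratePage(entry.Key);
        entry.IsDirty = false;
        return entry.Html;
    }

    private static string RegeneratePage(string key)
    {
        // Placeholder for the expensive page-generation step.
        return "<html><!-- regenerated: " + key + " --></html>";
    }
}
```

Bots get a page that is at worst slightly out of date, real users always get a fresh one, and no 503s are involved.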

Related

Submit process is taking long time for Web page

I have a .NET web application. There is one page where we enter data and submit the form. We upload an attachment before submitting the form. The submit action takes a long time, almost a minute, for an attachment of around 650 KB. The code-behind is C#. We use a third-party API (Ektron); it's a CMS tool.
Please let me know all the ways I can analyse the bottleneck for this issue. Please suggest open-source tools and browser add-ons other than PageSpeed and YSlow.
First check whether the time is taken initiating the request or waiting for the response to come back to your browser.
Only then can you look for a solution.
To answer the second half of your question: at the very least, most modern browsers (Firefox, Chrome, and Safari) have a developer console that will give you a breakdown of the time spent in each request state on a per-request basis. My personal preference is Firefox with Firebug, as I find the Network pane easy to interpret.
Redgate ANTS Performance Profiler is pretty much the bee's knees for troubleshooting performance problems in ASP.NET.

Google Experiments on Page that Redirects

I have a site that has a several-page offer form. The offer information is stored in a session, and I keep track of which step the customer has completed. If the customer has not completed all previous steps relative to the page they are on, they are redirected back to the start of the process. In this way I prohibit users from accessing step 3 by simply typing in its URL. This is done because the information on every step after 1 depends on valid information from the previous steps.
The problem is that when I set up my content experiment through Google Analytics, it cannot validate my original or variation pages: when it hits those pages (which are step 4), the server recognizes that it is not allowed on that page and returns step 1 instead.
I attempted to proceed anyway, but it seems that when I arrive at step 4 I am not being pushed to my variation page (I have it set so that everyone who arrives at step 4 should go to the variation). I'm assuming the problem is the redirect.
Any ideas?
The perceived problem was that GA could not ping my page because of the redirect I have on it.
The actual problem was that my GA Experiments code was not the very first thing after my opening HEAD tag. GA Experiments only scans the first 256 characters of a page, so if the beginning of the experiments code is not within that, it won't work.
Also, I had my GA code in a .js file and was linking to it on the page for cleanliness... this also does not work. GA Experiments scans the first 256 characters of the page and does not follow links; I needed to have the exact code, with comments, that GA gave me inline on the page for this to work.

IE shows a previously cached version of my page

My scenario is this: the user selects the list of reports they wish to print; once they select and click a button, I open another page with the selected reports ready for printing. I am using a session variable to pass the reports from one page to the other.
The first time you try it, it works fine; the second time, it opens the report window with the previously selected reports. I have to refresh the page to make sure it loads the latest selections.
Is there a way to get the latest value from the session every time you use it? Or is there a better way to solve this problem? Open to suggestions...
Thanks
C#, ASP.NET, IE 7 / IE 8
After doing some more checking, it might help if you check out COMET.
The idea is that you can have code in your second page which will keep checking the server for updated values every few seconds and if it finds updated values it will refresh itself.
There are two very good links explaining the implementation.
Scalable COMET Combined with ASP.NET
Scalable COMET Combined with ASP.NET - Part 2
The first link explains what COMET is and how it ties in with ASP.NET, the second link has an example using a chat room. However, I'm sure the code querying for updates will be pretty generic and can be applied to your scenario.
I have never implemented COMET yet so I'm not sure how complex it is or if it is easy to implement into your solution.
Maybe someone developing the SO application is able to resolve this issue for you. SO uses a similar real-time feature for the notifications on a page, e.g. you are in the middle of writing an answer and a message pops up letting you know someone else has added an answer, with a "click here to refresh" link.
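If full COMET turns out to be more than you need, plain polling can be sketched very simply. Below is a hypothetical ASHX-style handler (not taken from the linked articles) that the report page could poll with JavaScript every few seconds; `SelectedReports` is an assumed session key, so wire it to whatever key your pages actually use.

```csharp
using System.Web;
using System.Web.SessionState;

// IRequiresSessionState is needed so the handler can read Session.
public class ReportStatusHandler : IHttpHandler, IRequiresSessionState
{
    public void ProcessRequest(HttpContext context)
    {
        // Never let the polling response itself be cached.
        context.Response.Cache.SetCacheability(HttpCacheability.NoCache);
        context.Response.ContentType = "text/plain";

        // Return the current session value; client-side script compares it
        // with what it last rendered and refreshes the page on change.
        object reports = context.Session["SelectedReports"];
        context.Response.Write(reports == null ? string.Empty : reports.ToString());
    }

    public bool IsReusable { get { return false; } }
}
```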
The proper fix is to set the caching directives on the HTTP response correctly, so that the cached response is not reused without validation from the server.
When you fail to specify the cache lifetime, the client has to "guess" how long the response is good for, and the browser's guess probably isn't what you want. See http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx
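For example, in the code-behind of the report page you can set the directives explicitly. This is a sketch using the standard `System.Web` caching API; `ReportPage` is a placeholder class name:

```csharp
using System;
using System.Web;
using System.Web.UI;

public partial class ReportPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Forbid reuse of this response without revalidation, so IE
        // cannot silently serve the previous report selection.
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        Response.Cache.SetNoStore();
        Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1));
    }
}
```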
It's better to use URL parameters; that way you can see the values being passed on every request instead of hiding them in the session.

How can I stop applications from using Sessions when a request is made by a bot?

The solution is for a project in which changing all instances of Session[string] is not an option. My thoughts have been to implement SessionStateStoreProviderBase. I understand that creating a Session class with properties like Session.UserName would be a good idea.
Edit: The goal here is to turn off Sessions per user request, not application wide, without changing code in each aspx page.
First you need a way to tell a bot apart from a human.
Once you have that, consider what you want to achieve.
If you wish to disable Session for bots, be sure it won't break your site. If a search engine bot gets a crashed page, it will index and rank it as such.
Set up your robots.txt file to direct (most) bots to a page of your choice, where you control the session and other information. If you want them to have free access to all pages, you have to put in code that distinguishes bots by HTTP header information - that's a research project in itself.
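If the project is on ASP.NET 4.0 or later, one way to turn Session off per request without touching individual pages is `HttpContext.SetSessionStateBehavior` in Global.asax. A sketch; the user-agent check is a deliberately crude placeholder, and `Request.Browser.Crawler` or a maintained list would be more robust:

```csharp
using System;
using System.Web;
using System.Web.SessionState;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string ua = (Request.UserAgent ?? string.Empty).ToLowerInvariant();
        bool looksLikeBot = ua.Contains("bot") || ua.Contains("crawler") || ua.Contains("spider");

        if (looksLikeBot)
        {
            // Must run before AcquireRequestState; the session module then
            // skips this request, so no Session object is created or locked.
            Context.SetSessionStateBehavior(SessionStateBehavior.Disabled);
        }
    }
}
```

As warned above, any page a bot can reach must then cope with Session being null, or those URLs will be crawled and indexed as broken.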

Handling Long Running Reports

I am working on an ASP.NET application written in C# with a SQL Server 2000 database. We have several PDF reports which clients use for their business needs. The problem is that these reports take a while to generate (> 3 minutes). What usually ends up happening is that when the user requests the report, the request timeout kills the request before the web server has time to finish generating it, so the user never gets a chance to download the file. Then the user refreshes the page and tries again, which starts the entire report-generation process over and still ends up timing out. (No, we aren't caching reports right now; that is something I am pushing hard for...)
How do you handle these scenarios? I have an idea in my head which involves making an asynchronous request to start the report generating and then having some JavaScript periodically check the status. Once the status indicates the report is finished, make a separate request for the actual file.
Is there a simpler way that I am not seeing?
Using the filesystem here is probably a good bet. Have a request that immediately returns the URL of the report PDF's eventual location. Your server can then either kick off an external process or send a request to itself to perform the reporting. The client can poll the server (using HTTP HEAD) for the PDF at the supplied URL. If you make the filename of the PDF derive from the report parameters, either by hashing them or by putting them directly into the name, you get instant server-side caching too.
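A sketch of the filename derivation, assuming the report is identified by a name and a date range; adjust the parameters to whatever your reports actually take:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ReportNaming
{
    public static string FileNameFor(string reportId, DateTime from, DateTime to)
    {
        // Canonical string of the parameters: same inputs => same filename.
        string key = string.Format("{0}|{1:yyyyMMdd}|{2:yyyyMMdd}", reportId, from, to);

        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(key));
            return BitConverter.ToString(hash).Replace("-", "") + ".pdf";
        }
    }
}
```

The client then polls the returned URL with HEAD requests until the file exists (200 instead of 404), and a repeat request for the same parameters finds the already-generated file.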
I would consider taking this report a little more offline from the processing point of view.
For example, create a queue to put report requests into, process the reports from there, and when one finishes, send a message to the user.
Maybe I would even create a separate Windows Service for the queue handling.
Update: the notification to the user can be an email, or they can have a 'reports' page where they can check their reports' status and download them once they are ready.
What about emailing the report to the user? All the ASP page should do is send the request to generate the report and return a message that the report will be emailed after it has finished running.
Your users may not accept this approach, but:
When they request a report (by clicking a button or a link or whatever), you could start the report generation process on a separate thread, and re-direct the user to a page that says "thank you, your report will be emailed to you in a few minutes".
When the thread is done generating the report, you could email the PDF directly (probably won't work because of size), or save the report on the server and email a link to the user.
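A rough sketch of that approach; `GenerateReport` and `EmailLink` are placeholders for your own code, and note that a background thread like this can be lost if the app pool recycles mid-run:

```csharp
using System;
using System.Threading;

public static class ReportRunner
{
    public static void StartInBackground(string reportId, string userEmail)
    {
        ThreadPool.QueueUserWorkItem(delegate
        {
            try
            {
                string path = GenerateReport(reportId);   // the > 3 minute step
                EmailLink(userEmail, path);               // mail a link, not the PDF
            }
            catch (Exception)
            {
                // Log the failure; an unhandled exception on a worker
                // thread will take down the whole process.
            }
        });
        // The request thread returns immediately and can redirect to the
        // "your report will be emailed to you" page.
    }

    private static string GenerateReport(string reportId)
    {
        /* expensive report generation elided */
        return "/reports/" + reportId + ".pdf";
    }

    private static void EmailLink(string to, string path)
    {
        /* email sending elided */
    }
}
```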
Alternatively, you could go into IIS and raise the timeout to > 3 minutes.
Here are some of the things I would do if I were presented with this problem:
1- Stop those timeouts! They are a total waste of resources (raise the timeout value on the ASP pages).
2- Centralize all the DB access in one single point, then gather stats about which reports ran when, by whom, and how long they took. Investigate why it takes so long: is it report complexity? The date range? Server load? (You could write all of that to a .csv file on the server and periodically import the file into SQL Server to analyze later; a sketch of this appears after this list.)
Eventually it's going to be easier for you to "cache" reports if you go through this single access point (for example, the same query for the same date will return the same previously generated PDF).
3- I know this really wasn't the question, but have you tried diving into those queries to see why they take so long to run? Query tuning, maybe?
4- An email/SMS/on-screen message when the report is ready seems great... If your users generally send a batch of reports to be generated, maybe a little dashboard indicating the progress of "their" queue could be built into the app. A little AJAX control would periodically refresh the status.
Hint: if you use that central DB access point and have sufficient information about what runs when, why, and for how long, you will eventually be able to roughly estimate the time it will take for a report to run.
If the response time is mission critical, should certain users be limited in the data range (date range for example) during some hours of the day?
Good luck and please post more details about your scenario if you want to get more accurate hints...
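To make point 2 concrete, here is a minimal sketch of a single access point that times each run and appends a CSV row; the log path, column layout, and `ReportGateway` name are arbitrary examples:

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReportGateway
{
    private static readonly object LogLock = new object();

    // Every report call in the app goes through here, so the CSV ends up
    // with one row per run: timestamp, report, user, elapsed milliseconds.
    public static byte[] Run(string reportName, string user, Func<byte[]> runReport)
    {
        var sw = Stopwatch.StartNew();
        byte[] pdf = runReport();
        sw.Stop();

        lock (LogLock)   // crude, but safe enough for a single web server
        {
            File.AppendAllText(@"C:\logs\report-stats.csv",
                string.Format("{0:u},{1},{2},{3}\r\n",
                    DateTime.UtcNow, reportName, user, sw.ElapsedMilliseconds));
        }
        return pdf;
    }
}
```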
Query tuning is probably your best place to start. Though I don't know how you are generating the report, that step shouldn't really take all that long. A poorly performing query, on the other hand, could absolutely kill your performance.
Depending on what you find in looking at the query, you may need to add some indexes, or possibly even set up a table to store the information for your report in a denormalized way, to make it available faster. This denormalized table could then be refreshed (through a SQL Server Job) every hour, or with whatever frequency your requirements dictate (within reason).
If it's a relatively static report, without varying user input parameters, then caching a report run earlier in the day would be a good idea as well, but it's hard to say more without knowing your situation.
For a problem like this you really need to start at the database unless you have reason to suspect your report generating code of being the culprit. There are various band-aids you could use that might help for a while, but if your db is the root cause then those solutions will not scale well, and you'll likely run into similar problems (or worse) some time in the future.
