I have a feed reader running every minute (it's picking up a feed that gets updated often), but I seem to be getting blocked by Akamai when accessing a few websites. Perhaps they think I'm up to something, but I'm not - I just want to get the feed.
Any thoughts on how to either play nice with Akamai or code this differently? As far as I know, I can't tell when the feed is updated other than by polling it - but is there a preferred way, like checking a cache? This is coded in C#, though I doubt that makes a difference.
Without more context it is hard to ascertain why you are being blocked. Is it because of rate limits or other access control measures?
Assuming it is rate limits, there is not much you can do. I would recommend first verifying that robots.txt allows you to crawl the URL and, if it does, using some sort of exponential back-off. It also helps to play nice by providing a meaningful User-Agent, so that when the operators do update their rules they can consider whitelisting legitimate requests such as yours.
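For what it's worth, a sketch of that in C# with HttpClient - the User-Agent string, the status codes I check, and the intervals are all placeholder choices of mine, not anything Akamai documents:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class FeedPoller
{
    // Illustrative values; tune them for the feed you are polling.
    static readonly TimeSpan BaseInterval = TimeSpan.FromMinutes(1);
    static readonly TimeSpan MaxInterval = TimeSpan.FromMinutes(30);

    static async Task PollAsync(string feedUrl)
    {
        using (var client = new HttpClient())
        {
            // Identify yourself so the site operator can whitelist (or contact) you.
            client.DefaultRequestHeaders.UserAgent.ParseAdd(
                "MyFeedReader/1.0 (+https://example.com/contact)");

            var delay = BaseInterval;
            while (true)
            {
                var response = await client.GetAsync(feedUrl);
                if ((int)response.StatusCode == 429 || (int)response.StatusCode == 403)
                {
                    // Throttled or blocked: double the wait, up to a ceiling.
                    delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, MaxInterval.Ticks));
                }
                else
                {
                    var feed = await response.Content.ReadAsStringAsync();
                    // ... parse and process the feed here ...
                    delay = BaseInterval; // back to the normal interval after a success
                }
                await Task.Delay(delay);
            }
        }
    }
}
```

It also helps to honour the feed's own validators (ETag / Last-Modified) so that unchanged polls stay cheap for both sides.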
I'm accessing a MySQL database using the standard MySql.Data package from Oracle. Every few releases of the application, we need to tweak the database schema (e.g. client wanted DECIMAL(10,2) changed to DECIMAL(10,3)) which the application handles by sending the necessary SQL statement. This works except that on a large database, the schema update can be a rather lengthy operation and times out.
The obvious solution is to crank up the timeout, but that results in a relatively poor user experience - I can put up a dialog that says "updating, please wait", but it then just sits there with no kind of progress indicator.
Is there a way to get some kind of feedback from the MySQL server that it's 10% complete, 20% complete, etc., that I could pass on to the user?
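For reference, by "crank up the timeout" I mean something along these lines (connection string, table and column names are abbreviated/invented here, and the 10-minute value is arbitrary):

```csharp
using MySql.Data.MySqlClient;

using (var connection = new MySqlConnection("server=...;database=...;uid=...;pwd=..."))
{
    connection.Open();
    using (var command = new MySqlCommand(
        "ALTER TABLE orders MODIFY price DECIMAL(10,3)", connection))
    {
        // The default command timeout (30 seconds) is far too short for a big table.
        command.CommandTimeout = 600; // seconds
        command.ExecuteNonQuery();
    }
}
```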
There are two ways to approach this problem.
The first is the easiest, as you've suggested: just use a progress bar that bounces back and forth. It's not great and it's not the best user experience, but it's better than locking up the application, and at least it gives feedback. I also assume this is not something that occurs regularly, just a one-off annoyance every now and again - not something I'd really worry about.
However, if you really are worried about the user experience and want to give better feedback, then you're going to need to gather some metrics. Taking your DECIMAL example, time the change on different row counts: 100,000 rows, a million rows, and so on. This gives you a back-of-the-napkin guess at how long it might take. Note that with different hardware and other things running on the machine you're never going to get it exact, but you will have an estimate.
Once you have an estimate, and you know the row-count, you can create a real progress bar based on those estimates. And if it gets to 100% and the real operation hasn't completed, or if it finishes before you get to 100% (and you can insta-jump the bar!), it's... something.
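A rough sketch of what that estimate-driven progress might look like - the seconds-per-million-rows figure is made up and would come from your own measurements:

```csharp
using System;
using System.Diagnostics;

class AlterProgressEstimator
{
    // Measured beforehand on representative hardware; this number is purely illustrative.
    const double EstimatedSecondsPerMillionRows = 40.0;

    readonly double _estimatedTotalSeconds;
    readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    public AlterProgressEstimator(long rowCount)
    {
        _estimatedTotalSeconds = Math.Max(1.0,
            rowCount / 1_000_000.0 * EstimatedSecondsPerMillionRows);
    }

    // Returns 0-100, capped at 99 until the ALTER actually reports completion.
    public int EstimatedPercent()
    {
        var percent = (int)(_stopwatch.Elapsed.TotalSeconds / _estimatedTotalSeconds * 100);
        return Math.Min(percent, 99);
    }
}
```

You would run the ALTER on a background thread and have a UI timer call EstimatedPercent() to update the bar, snapping to 100% when the command actually completes.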
Personally I'd go with option one, and perhaps add the kind of helpful message Windows commonly shows: "This may take a few minutes". Maybe add "Now's a great time for coffee!". And a nice little animated gif :)
I've run into an issue which I'm struggling to decide the best way to solve. Perhaps my software architecture needs to change?
I have a cron job which hits a method on my website every 10 seconds, and that method then makes a call to an external API each time. However, the API is rate limited to x requests per minute and y requests per day.
Currently I'm exceeding the API limits and need to control this in the website method somehow. I've thought about storing state in a file, but that seems hacky, and similarly a database, as I don't currently use one for this project.
I've tried this package: https://github.com/David-Desmaisons/RateLimiter, but alas it doesn't work in my scenario; I think it would work if I made one request in a loop, as in his examples. I noticed he has a persistent timer (PersistentCountByIntervalAwaitableConstraint), but there is no documentation or examples for it (I emailed him in case). I've done a lot of googling and can't find any examples of this, only server-side rate limiting, which is the other way around: the server limiting the client, not the client limiting its own requests to the server.
How can I solve my issue without changing the cronjobs? What does everyone think the best solution to this is?
Assuming that you don't want to change the clients generating the load, there is no choice but to implement rate limiting on the server.
Since an ASP.NET application can be restarted at any time, the state used for that rate-limiting must be persisted somewhere. You can choose any data store you like for that.
In this case you have two limits: one per minute and one per day. If you simply apply two separate rate limiters, you will end up using up the daily limit fairly quickly. After that, there will be no further access for the rest of the day. That is likely undesirable.
It seems better to only apply the daily limit because it is more restrictive. A simple solution would be to calculate how far apart requests must be to meet the daily limit. Then, you store the date of the last request. Any new incoming request is immediately failed if not enough time has passed.
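A minimal sketch of that idea, assuming a small file is an acceptable place to persist the timestamp - the daily limit of 1000 and the file name are invented, and any database or cache would do just as well:

```csharp
using System;
using System.Globalization;
using System.IO;

static class DailyRateGate
{
    // Hypothetical limit: 1000 calls per day, i.e. one call roughly every 86.4 seconds.
    const int DailyLimit = 1000;
    static readonly TimeSpan MinInterval = TimeSpan.FromSeconds(86400.0 / DailyLimit);
    const string StatePath = "last-request.txt"; // survives application restarts

    public static bool TryAcquire()
    {
        var now = DateTime.UtcNow;
        if (File.Exists(StatePath))
        {
            var last = DateTime.Parse(File.ReadAllText(StatePath),
                CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
            if (now - last < MinInterval)
                return false; // too soon: reject this request outright
        }
        File.WriteAllText(StatePath, now.ToString("o"));
        return true;
    }
}
```

Note this sketch is not safe against concurrent requests; a real version would need a lock or an atomic store, and you may also want to keep some headroom below the published limit.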
Let me know if this helps you.
For a call tracking application I'm developing, I want to maintain a local database.
As it stands, the application searches for new records in Twilio and inserts them into my database every time it loads. This is very time consuming.
In order to avoid that runtime expense, is there a way I can use usage triggers in Twilio to automatically populate my database in real time? Or even just daily?
If not, how can I achieve something like this?
Since Twilio is already calling your servers (unless there's some way to use it without doing that, but I don't think there is), can't you implement logging there? For instance, before you feed back your greeting, pop in a logging routine to note that you've received a call?
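Something like this, purely as a sketch - the controller, the CallLog helper and the TwiML below are stand-ins for whatever you already serve (Twilio posts CallSid, From and To as form parameters):

```csharp
using System;
using System.Web.Mvc;

public class VoiceController : Controller
{
    [HttpPost]
    public ActionResult Incoming(string CallSid, string From, string To)
    {
        // Record the call before responding; CallLog is a hypothetical repository of yours.
        CallLog.Insert(CallSid, From, To, DateTime.UtcNow);

        // Return the same TwiML greeting you already return today.
        var twiml = "<Response><Say>Thanks for calling.</Say></Response>";
        return Content(twiml, "text/xml");
    }
}
```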
I'm not sure if they offer any other sorts of APIs or callbacks, but I really don't see why anything like that would be necessary. It'd just tie up your servers with more requests at no additional gain. I was just going through their documentation and I don't see anything like this. I could be just totally glossing over it, but again it just seems redundant. The entire Twilio system is based effectively on event hooks, so having separate ones wouldn't serve much additional use.
On the other hand, if for some reason you have absolutely no access whatsoever to the code or people behind the code that serves TwiML back, unless someone else is seeing an event hook API, you might want to just set up a scheduled job on your server (or in Azure, or whatever you're using) to query Twilio daily, since I know you mentioned that that would be sufficient. You could also, of course, set it more frequently. But that really seems like a waste of resources and effort when they're already telling you everything about every call through the massive list of query parameters they pass with every request.
Basically I'm writing an ASP.NET MVC application that, through JavaScript, sends a GET request every 30 seconds, checking whether a certain row in a table in the database has changed.
I've been looking at the OutputCache attribute but it doesn't seem like it would work since it would just cache the content and not really check if an update was made.
What would be the "cheapest" way to do this? I mean the way that burdens the server the least?
A HEAD request may be faster - it is not guaranteed to be - but it is worth investigating.
If you can't use something to stream the change to you, the cheapest way is to use an API that takes a date and returns a boolean flag or an integer stating whether a change occurred. Essentially it's polling, but if SignalR or some other message-push process isn't possible, this keeps the cost minimal because it's the smallest possible response back and forth.
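A sketch of such an endpoint in ASP.NET MVC - GetLastModified() is a placeholder for whatever cheap query you run against that row:

```csharp
using System;
using System.Web.Mvc;

public class StatusController : Controller
{
    [HttpGet]
    public ActionResult Changed(DateTime since)
    {
        // Placeholder for a cheap query such as: SELECT updated_at FROM my_table WHERE id = @id
        DateTime lastModified = GetLastModified();

        // The body is just "true" or "false" - about as small as a response gets.
        return Json(lastModified > since, JsonRequestBehavior.AllowGet);
    }

    private DateTime GetLastModified()
    {
        return DateTime.MinValue; // placeholder
    }
}
```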
It depends what you want it to do, but have you considered long-polling? E.g., make the GET/POST request using JavaScript and allow the server to withhold the reply until your 'event' happens.
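A rough long-polling sketch, assuming MVC's async actions - RowHasChangedSince() is a placeholder, and real code would also want cancellation and error handling:

```csharp
using System;
using System.Threading.Tasks;
using System.Web.Mvc;

public class LongPollController : Controller
{
    public async Task<ActionResult> WaitForChange(DateTime since)
    {
        // Hold the request open for up to ~25 seconds, checking periodically.
        var deadline = DateTime.UtcNow.AddSeconds(25);
        while (DateTime.UtcNow < deadline)
        {
            if (RowHasChangedSince(since))
                return Json(true, JsonRequestBehavior.AllowGet);
            await Task.Delay(TimeSpan.FromSeconds(1));
        }
        return Json(false, JsonRequestBehavior.AllowGet); // timed out; the client simply re-issues the request
    }

    private bool RowHasChangedSince(DateTime since)
    {
        return false; // placeholder for the real check
    }
}
```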
OutputCache works perfectly, but its expiration time should be a divisor of your polling interval - in this case, say, 10 seconds on the server for the 30-second interval on the client side.
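For example (names are illustrative), a 10-second server-side cache means at most one database hit every 10 seconds no matter how many clients are polling every 30:

```csharp
using System.Web.Mvc;
using System.Web.UI;

public class PollController : Controller
{
    // Cached on the server for 10 seconds, which divides evenly into the 30-second client interval.
    [OutputCache(Duration = 10, VaryByParam = "none", Location = OutputCacheLocation.Server)]
    public ActionResult HasChanged()
    {
        return Json(CheckRow(), JsonRequestBehavior.AllowGet);
    }

    private bool CheckRow()
    {
        return false; // placeholder for the real database check
    }
}
```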
I'm not an expert on EF, but if your database supports triggers, that would be an option: you can cache the result for a longer period (say 1 hour) and invalidate the cache when a trigger fires.
But if your record is being updated very frequently, a trigger would be costly.
In that case I would go with caching plus a time-stamp mechanism (like versions in a NoSQL db, or a time-stamp in Oracle).
And remember that you are fetching the record every 30 seconds, not on every change on the record. That's a good thing, because it makes your solution much simpler.
Probably SignalR with push notification when there's a change in the database (and that could be either tracked manually or by SqlDependency, depending on the database)...
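A sketch of that combination for SQL Server with SignalR 2 - the hub name, table and query are illustrative, and SqlDependency also needs Service Broker enabled on the database:

```csharp
using System.Data.SqlClient;
using Microsoft.AspNet.SignalR;

public class ChangesHub : Hub { }

public static class RowWatcher
{
    const string ConnectionString = "..."; // your own connection string

    public static void Start()
    {
        SqlDependency.Start(ConnectionString);
        Register();
    }

    static void Register()
    {
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(
            "SELECT UpdatedAt FROM dbo.MyTable WHERE Id = 1", connection))
        {
            var dependency = new SqlDependency(command);
            dependency.OnChange += (s, e) =>
            {
                // Push to every connected client, then re-register (a dependency fires only once).
                GlobalHost.ConnectionManager.GetHubContext<ChangesHub>()
                          .Clients.All.rowChanged();
                Register();
            };
            connection.Open();
            command.ExecuteReader().Dispose(); // executing the command activates the subscription
        }
    }
}
```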
I used Wyatt Barnett's suggestion and it works great.
Thanks for the answers - appreciate it.
Btw I'm answering it since I can't mark his comment as answer.
I am building an app; I tried YSlow and got Grade F on most of my practices. I have loads of JavaScript that I am working on reducing. I want to be able to cache some of these files because the pages get called many times.
I have one master page and I want to cache the scripts and CSS files.
How do I achieve this?
Are there any recommended best practices?
Are there any other performance improvements that I can make?
Have you re-read RFC 2616 yet this year? If not, do. Trying to build websites without a strong familiarity with HTTP is like trying to seduce someone when you're extremely drunk; just because lots of other people do it doesn't mean you'll have good performance.
If a resource can be safely reused within a given time period (e.g. safe for the next hour/day/month), say so. Use the max-age component of the Cache-Control header as well as Expires (max-age is better than Expires, but doing both costs nothing).
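In ASP.NET that is only a few lines per response; the one-hour lifetime here is just an example:

```csharp
using System;
using System.Web;

public static class CacheHeaders
{
    // Call this from a handler or code-behind for responses that are safe to reuse.
    public static void MarkPublicForAnHour(HttpResponse response)
    {
        response.Cache.SetCacheability(HttpCacheability.Public);
        response.Cache.SetMaxAge(TimeSpan.FromHours(1));        // Cache-Control: public, max-age=3600
        response.Cache.SetExpires(DateTime.UtcNow.AddHours(1)); // Expires, for older caches
    }
}
```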
If you know the time something last changed, say so in a Last-Modified header (see note below).
If you don't know when something last changed, but can add the ability to know, do so (e.g. timestamp database rows on UPDATE).
If you can keep a record of every time something changed, do so, and build an ETag from it. While ETags should not be based on times, an exception is if you know changes can't happen at a finer resolution (time to the nearest 0.5 second is fine if you can't have more than one change every 0.5 second, etc.).
If you receive a request with an If-Modified-Since date matching the last change time, or an If-None-Match matching the ETag, send a 304 instead of the whole page.
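A simplified sketch of that check in ASP.NET - the method and its callers are mine, and a production version should be more forgiving about malformed headers:

```csharp
using System;
using System.Globalization;
using System.Web;

public static class ConditionalGet
{
    // Returns true if a 304 was sent and the caller should skip writing the body.
    public static bool TrySend304(HttpRequest request, HttpResponse response,
                                  DateTime lastModifiedUtc, string etag)
    {
        bool notModified;
        string ifNoneMatch = request.Headers["If-None-Match"];
        if (ifNoneMatch != null)
        {
            // If-None-Match takes precedence over If-Modified-Since.
            notModified = ifNoneMatch == etag;
        }
        else
        {
            string ifModifiedSince = request.Headers["If-Modified-Since"];
            DateTime since;
            notModified = ifModifiedSince != null
                && DateTime.TryParse(ifModifiedSince, CultureInfo.InvariantCulture,
                       DateTimeStyles.AdjustToUniversal | DateTimeStyles.AssumeUniversal, out since)
                // HTTP dates have one-second resolution, so compare at that resolution.
                && lastModifiedUtc.AddTicks(-(lastModifiedUtc.Ticks % TimeSpan.TicksPerSecond)) <= since;
        }

        if (notModified)
        {
            response.StatusCode = 304;
            response.SuppressContent = true;
            return true;
        }

        // Otherwise emit the validators so the client can revalidate next time.
        response.Cache.SetLastModified(lastModifiedUtc);
        response.Cache.SetETag(etag);
        return false;
    }
}
```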
Use gzip or deflate compression (deflate is slightly better when the client says it can handle both), but do note that you must change the ETag accordingly. Sending the correct Vary header for this breaks IE caching, so Vary on User-Agent instead (an imperfect solution for an imperfect world). If you roll your own compression in .NET, note that flushing the compression stream causes bugs; write a wrapper that only flushes the underlying output on Flush(), prior to the final flush on Close().
Don't defeat the caching done for you. Turning off ETags on static files gives you a better YSlow rating and worse performance (except on web farms, where the more complicated solution recommended by YSlow should be used). Ignore what YSlow says about turning off ETags (maybe they've fixed that bug now and don't say it any more) unless you are on a web farm where different server types can deal with the same request (e.g. IIS and Apache dealing with the same URI; Yahoo are set up that way, which is why this worked for them - most people aren't).
Favour public over private unless inappropriate.
Avoid doing anything that depends on sessions. If you can turn off sessions, so much the better.
Avoid sending large amounts of viewstate. If you can do something without viewstate, so much the better.
Go into IIS and look at the HTTP Headers section. Set appropriate values for static files. Note that this can be done on a per-site, per-directory and per-file basis.
If you have a truly massive file (.js, .css) then give it a version number and put that version in the URI used to access it (blah.js?version=1.1.2). Then you can set a really long expiry date (1 year) and/or a hard-coded ETag and not worry about cache staleness, as you will change the version number next time, and to the rest of the web it's a new resource rather than an updated one.
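For example, a tiny helper of the kind I mean (the version constant is whatever you bump per release):

```csharp
public static class StaticUrl
{
    // Bump this whenever the script/stylesheet contents change.
    public const string Version = "1.1.2";

    public static string WithVersion(string path)
    {
        return path + "?version=" + Version; // e.g. "/scripts/blah.js?version=1.1.2"
    }
}
```

Reference it from the master page with something like <script src="<%= StaticUrl.WithVersion("/scripts/blah.js") %>"></script>.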
Edit:
I said "see note below" and didn't add the note.
The last modified time of any resource is the most recent of:
1. Anything (script, code-behind) used to create the entity sent.
2. Anything used as part of it.
3. Anything that was used as part of it, that has now been deleted.
Of these, number 3 can be the trickiest to work out, since it has, after all, been deleted. One solution is to keep track of changes to the resource itself and update that on deletion of anything used to create it; the other is to have a "soft delete" where you keep the item, but marked as deleted and not used in any other way. Just what the best way to track this is depends on the application.
You should just create separate .js and .css files and let the browser do the caching for you. It is also a good idea to use a JS minifier that removes all the whitespace from the .js files.
If you have a huge ViewState, say > 100 KB, try to reduce it as well. If the ViewState is still huge, you can store it on the server as a file...
http://aspalliance.com/472
You might also use the caching on the page if the page is not too dynamic...
http://msdn.microsoft.com/en-us/library/06bh14hk.aspx
You can also reference common JS and CSS libraries from trusted online hosts. For example, if you add jQuery as <script src="http://code.jquery.com/jquery-latest.js"></script>, the jQuery file has probably already been cached by the client's browser because some other web site referenced this address before yours did, not just because of your own site.
This approach has its pros and cons, but it is an option.
Also, I don't know whether YSlow's score changes with this approach.