We have an existing large application that contains a lot of data. We'd like to use it as a data source for various internally written C# web applications, so we don't end up with more redundant data.
The data we are looking at doesn't change much, so caching would work fine most of the time. We are therefore writing a C# web service over the data so it can be reused by those internally written applications.
However, roughly once per month, the Oracle database source is unavailable.
What is the best way to handle this in the web service so that the other applications that rely on that data aren't disrupted as well?
Set up replication or failover partners? Honestly, this doesn't seem like a job for more code; it sounds like a job for more infrastructure. I know Oracle licenses are expensive, but so is paying developers to work around unavailability.
If you simply had to solve it with code, then the web service should retain and return its cached data whenever a regularly scheduled DB query fails with a timeout or connection-failure error. The cached data should be kept as long as necessary in this circumstance, until a call to refresh that data succeeds. If there is no cached data, you can either swallow the error and return nothing, or return an error stating the data is unavailable from both sources.
The solution was to use a secondary cache that never expires.
The secondary cache is updated with the latest values whenever the first (shorter-lived) cache is successfully refreshed from the database. If the database query fails and the first cache has expired, the first cache is repopulated from the second cache. So there is always a secondary cache to fall back on.
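A minimal sketch of this two-tier arrangement using System.Runtime.Caching (the key scheme, expirations, and the loadFromDatabase delegate are illustrative assumptions):

using System;
using System.Runtime.Caching;

public class TwoTierCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // loadFromDatabase is whatever actually queries the source database.
    public object Get(string key, Func<object> loadFromDatabase)
    {
        // First-level cache: short-lived, normally refreshed from the database.
        var fresh = Cache.Get("short:" + key);
        if (fresh != null)
            return fresh;

        try
        {
            var data = loadFromDatabase();
            Cache.Set("short:" + key, data, DateTimeOffset.Now.AddMinutes(5));
            // Second-level cache: never expires, only overwritten on success.
            Cache.Set("long:" + key, data, ObjectCache.InfiniteAbsoluteExpiration);
            return data;
        }
        catch (Exception)
        {
            // Database unavailable: fall back to the non-expiring copy.
            // May still be null if no load has ever succeeded.
            return Cache.Get("long:" + key);
        }
    }
}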
Related
I'm developing a C# application that stores data in an Azure SQL Database.
As you probably know, Azure SQL Database lives somewhere on the Internet, not on a LAN (though this question is also relevant for reliable networks like a LAN).
I've noticed that from time to time I get errors like "Connection is closed" (or other network errors). It's really easy to simulate this with Clumsy. The cause of these errors is bad network conditions.
So my first idea for solving this was "try again". When I get this error, I simply try again and it works. Like magic.
This may solve the problem, but it opens up another kind of problem: not every situation is safe to retry. I'll explain:
I'll separate the scenarios into two types:
1. Retry can't do any damage: operations like SELECT or DELETE. Retrying has the same expected result, so for this type my solution works fine! (A minimal retry sketch follows this list.)
2. INSERT or UPDATE: a retry can corrupt the data.
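For type 1, a minimal retry sketch (the attempt count and backoff are arbitrary assumptions):

using System;
using System.Data.SqlClient;
using System.Threading;

public static class Retry
{
    // Only safe for idempotent operations (type 1 above).
    public static T Execute<T>(Func<T> operation, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (SqlException)
            {
                if (attempt >= maxAttempts)
                    throw;
                // Simple exponential backoff before trying again.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}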
I'll focus on point number 2. For example, let's say I have:
A Users table, with columns: ID, UserName, Credits.
A stored procedure that makes a user (by user ID) pay some of his credits.
The "Pay" Stored Procedure is:
UPDATE tblUsers SET [Credits] -= @requestedCredits WHERE ID=@ID
Calling the SP is the tricky part:
If it works without a problem, we're fine.
If it fails, we don't know whether the operation was applied on the DB or not. Retrying here could make the user pay twice!
So a blind "retry" strategy is not an option here.
Solutions I've thought of:
I thought to solve this by adding a "VersionID" column to each row. My SP is now:
UPDATE tblUsers SET [Credits] -= @requestedCredits, VersionId=NEWID() WHERE ID=@ID AND VersionID=@OldVersionId
Before making the user Pay(), I read the VersionID (a random GUID). If a network failure occurs during payment and that GUID is unchanged afterwards, I can safely try again (proof that the data wasn't changed in the DB). If the VersionID has changed, the user has already paid for the service.
The problem: when multiple machines work at the same time, this solution breaks down. Another instance may have executed a Pay() against that same VersionID, and I'll think my own change is the one that went through (which is wrong).
What to do?
It sounds like you are making SQL queries from a local/on-premises/remote machine (i.e. not hosted in Azure) to a SQL Azure database.
Some of the possible mechanisms for dealing with this are:
Azure hosted data access layer with API
Consider creating a thin data access layer API hosted on Azure WebApp or VM to be called from the remote machine. This API service can interact with SQL Azure reliably.
SQL is more sensitive to timeout and network issues than, say, an HTTP endpoint, especially if your queries involve transferring large amounts of data.
Configure an increased timeout
The question doesn't specify the database access mechanism the C# application uses. Many data access libraries and functions allow you to specify an increased timeout for the connection.
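For example, with plain ADO.NET both timeouts can be raised (the server name and values here are placeholders):

using System.Data.SqlClient;

public static class TimeoutExample
{
    public static void Run()
    {
        // "Connect Timeout" (seconds) applies while opening the connection.
        var connectionString =
            "Server=tcp:myserver.database.windows.net;Database=mydb;" +
            "User ID=user;Password=pass;Encrypt=True;Connect Timeout=60";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT 1", connection))
        {
            // CommandTimeout (seconds) applies to each individual query.
            command.CommandTimeout = 120;
            connection.Open();
            command.ExecuteScalar();
        }
    }
}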
Virtual Private Network
Azure allows you to create a site-to-site or point-to-site VPN with better network connectivity. However, this is the least preferred mechanism.
You never blindly retry. In case of error you read the current state, re-apply the logic, and then write the new state. What 'apply the logic' means will differ from case to case: present the user with the form again, refresh a web page, run a method in your business logic, anything really.
The gist of it is that you can never simply retry the operation without first reloading the persisted state. The only truth is what's in the DB, and the error is a big warning that your cached state is stale.
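A minimal sketch of that read-then-re-apply pattern, reusing the VersionID column from the question above (the method shape and names are assumptions):

using System;
using System.Data.SqlClient;

public static class Payments
{
    public static void Pay(string connectionString, int userId, int requestedCredits)
    {
        while (true)
        {
            // 1. Reload the persisted state; never trust the cached copy after an error.
            int credits;
            Guid version;
            using (var conn = new SqlConnection(connectionString))
            using (var read = new SqlCommand(
                "SELECT Credits, VersionId FROM tblUsers WHERE ID=@ID", conn))
            {
                read.Parameters.AddWithValue("@ID", userId);
                conn.Open();
                using (var reader = read.ExecuteReader())
                {
                    if (!reader.Read())
                        throw new InvalidOperationException("User not found.");
                    credits = reader.GetInt32(0);
                    version = reader.GetGuid(1);
                }
            }

            // 2. Re-apply the business logic against the fresh state.
            if (credits < requestedCredits)
                throw new InvalidOperationException("Insufficient credits.");

            // 3. Write the new state, guarded by the version check.
            using (var conn = new SqlConnection(connectionString))
            using (var write = new SqlCommand(
                "UPDATE tblUsers SET Credits -= @requestedCredits, VersionId = NEWID() " +
                "WHERE ID = @ID AND VersionId = @OldVersionId", conn))
            {
                write.Parameters.AddWithValue("@requestedCredits", requestedCredits);
                write.Parameters.AddWithValue("@ID", userId);
                write.Parameters.AddWithValue("@OldVersionId", version);
                conn.Open();
                if (write.ExecuteNonQuery() == 1)
                    return; // success
                // 0 rows affected: someone changed the row first; loop and re-read.
            }
        }
    }
}

Note this alone still can't distinguish "my write landed but the response was lost" from "another machine paid"; if that distinction matters, the write needs a per-request token of some kind rather than a bare retry.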
The scenario is that our client owns and manages a system (we wrote it) hosted at their client's premises. Their client is contractually restricted from changing any data in the database behind the system, but they could change the data if they chose to, because they have full admin rights (the server is procured by them and hosted on their premises).
The requirement is to get notified if they change any data. For now, please ignore deleting data; this discussion is about amendments to data in tables.
We are using LINQ to SQL and have overridden the data context so that, for each read of the data, we compare a hash of the row's data against a stored hash, previously computed during insert/update and held on each row in the table.
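A simplified sketch of that kind of per-row hash (the delimiter and column handling are illustrative assumptions):

using System;
using System.Security.Cryptography;
using System.Text;

public static class RowHash
{
    // The value order and delimiter must match exactly what was used at insert/update time.
    public static string Compute(params object[] columnValues)
    {
        var joined = string.Join("\u001F", columnValues); // unit separator as delimiter
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(joined));
            return Convert.ToBase64String(hash);
        }
    }
}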
We are concerned about scalability, so I would like to know if anyone has any other ideas. We are trying to get notified of data changes made via SSMS, queries run directly against the db, etc. Also, if someone were to stop our service (a Windows service), upon startup we would need to know whether a row had been changed. Any thoughts?
EDIT: Let me just clarify, as I could have been clearer. We are not necessarily trying to stop changes being made (this is impossible as they have full access), but rather to get notified if they change the data.
The answer is simple: to prevent the client directly manipulating the data, store it out of their reach in a Windows Azure or Amazon EC2 instance. The most they will be able to do is get the connection string, which will then connect them as a limited-rights user.
Also, if someone were to stop our service (Windows service), upon startup we would need to know a row had been changed.
You can create triggers that write whatever info you want to an audit table; you can then inspect the audit table to determine which changes were made by your application and which were made directly by the client. Auditing database changes is a well-known problem that has been solved many times before; there is plenty of information out there about it.
for each read of the data, we compare a hash of the row's data against a stored hash
As you can probably guess, this is painfully slow and not scalable.
I have two situations in this case:
I want to query a WCF service and hold the data somewhere, because one of the web pages renders based on the data retrieved from the service. I don't want the page itself querying the service; I'd rather have some sort of scheduled worker that runs every couple of minutes, retrieves the data, and holds it somewhere.
Where should I cache the service response, and what is the correct way to create a task that queries the service every couple of minutes?
I think I could achieve this by saving the response to a static variable, alongside the last query date, and then checking on page load whether enough time has passed: if so, I call the service and refresh the data; otherwise I use the static cache.
This would also cover the case where no users access the page for a long time, so the site isn't futilely querying the service.
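Something like this sketch (the data type, loader, and refresh interval are placeholders):

using System;

public static class ServiceCache
{
    private static readonly object Sync = new object();
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(5);
    private static object _data;            // whatever the WCF service returns
    private static DateTime _lastRefreshUtc;

    public static object Get(Func<object> queryService)
    {
        lock (Sync) // avoid several requests refreshing at once
        {
            if (_data == null || DateTime.UtcNow - _lastRefreshUtc > MaxAge)
            {
                _data = queryService(); // the WCF call
                _lastRefreshUtc = DateTime.UtcNow;
            }
            return _data;
        }
    }
}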
But it seems kind of rough; are there other, better ways to accomplish this kind of task?
You could indeed take another approach like having a scheduled program query the information and put it in an in-memory cache available to all the web servers in your farm. However, whether that would be better for your scenario depends on the size of your app and how much time/effort you want to spend on it.
An in-memory cache is harder to implement/support than a static variable, but it's sometimes better, since static variables can be cleared every time the application pool recycles (e.g. after X minutes of inactivity).
Depending on the size of your system I would start with the static variable, test drive the approach for a while and then decide if you need something more sophisticated.
Have you taken a look at Velocity (the caching project that became AppFabric Caching)?
Nico: Why don't you write a simple console daemon that gets the data and stores it in a database on your end, and then have your web app read from that local copy? You can make that console app run at a set interval. Inserting the data should not be a problem if you are using SQL Server 2008: you can pass a DataTable as a table-valued parameter to a stored proc and insert a whole table in one call. If you don't use SQL Server 2008, then serialize the whole collection returned by the web service and store it in a table in one big blob column, recording the timestamp when you got the data. You can then read the content of that column, deserialize your collection, and reconstruct it into native objects for display on your page.
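A minimal sketch of such a polling daemon (FetchFromService and SaveToLocalDatabase are hypothetical stand-ins for the service call and the insert logic described above):

using System;
using System.Threading;

class PollingDaemon
{
    static void Main()
    {
        // Fire immediately, then every five minutes.
        using (var timer = new Timer(_ => Poll(), null,
            TimeSpan.Zero, TimeSpan.FromMinutes(5)))
        {
            Console.WriteLine("Polling. Press Enter to quit.");
            Console.ReadLine();
        }
    }

    static void Poll()
    {
        try
        {
            var data = FetchFromService();   // hypothetical web service call
            SaveToLocalDatabase(data);       // hypothetical bulk insert / blob save
        }
        catch (Exception ex)
        {
            Console.WriteLine("Poll failed: " + ex.Message); // keep the daemon alive
        }
    }

    static object FetchFromService() { throw new NotImplementedException(); }
    static void SaveToLocalDatabase(object data) { throw new NotImplementedException(); }
}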
I've never seen a way (and I don't think it's possible) to have your web app query the web service at regular intervals on its own. Imagine the web site sits idle for hours with no interaction from anybody: no events will fire and nothing will be queried.
Alternatively, you could create a dummy page that executes a JavaScript function at set intervals and have that function make an Ajax request to the server to get the data from the web service and cache it. The problem is that the minute you leave that page, nothing will happen and you'll stop querying the web service. I think this is silly.
I have been tasked with scaling out session state for an application. From my research the most obvious choice is the State Server session provider, because I don't need user sessions to persist (which is what the SQL Server session provider is for).
About the app:
Currently using InProc session provider
All objects stored in session are serializable
All objects are small (mostly simple objects (int, string) and a few simple class instances)
Before I dive head-first into the IT side of things, and given the ability to provide a custom session provider in ASP.NET 4, should I even consider a custom session state provider? Why or why not? Are there any "good" ones out there?
Thanks!
User feedback:
Why are we using session: persistence of data between postbacks (e.g. user selections)
How: user makes a selection, selection is stored. User leaves a page and returns, selections are restored. etc. etc.
Will be creating a web farm
I've provided some links you can read up on for properly scaling session using the State Server.
Useful links from Maarten Balliauw's blog:
http://blog.maartenballiauw.be/post/2007/11/ASPNET-load-balancing-and-ASPNET-state-server-%28aspnet_state%29.aspx
http://blog.maartenballiauw.be/post/2008/01/ASPNET-Session-State-Partitioning.aspx
http://blog.maartenballiauw.be/post/2008/01/ASPNET-Session-State-Partitioning-using-State-Server-Load-Balancing.aspx
My State Server related projects:
http://www.codeproject.com/KB/aspnet/p2pstateserver.aspx (latest code at https://github.com/tenor/p2pStateServer)
http://www.codeproject.com/KB/aspnet/stateserverfailover.aspx
Hope that helps.
It depends on what you mean by "scaling" session storage. If you're simply talking about session state performance, you're not going to beat the in-process session state provider. Switching to the State Server provider will actually make things slower, due to the extra overhead of serializing and transferring objects across process boundaries.
Where the State Server can help you scale is that it allows multiple machines in a load-balanced web farm to share a single session state. It is limited by machine memory, however, and if you will have lots of concurrent sessions you may want to use the SQL Server session state provider instead.
For the most performance in a web farm, you can also try using AppFabric as was previously suggested. I haven't done it myself but it is explained here.
Also, here's a link for using Memcached as a Session State provider. Again, I haven't used it, so I can't offer an opinion on it...
EDIT: As #HOCA mentions in the comments, there are 3rd party solutions if cost isn't an issue. One I've heard of (but not used) is ScaleOut SessionServer.
I would highly recommend that before you look into scaling out session, you first evaluate whether session was even needed in the first place.
Session variables are serialized and deserialized for every single page load, whether the page uses that data or not. (EDIT: Craig pointed out that you have some level of control over this in .NET 4: http://msdn.microsoft.com/en-us/library/system.web.sessionstate.sessionstatebehavior.aspx However, this still has drawbacks; see the comments on this answer.)
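For example, in ASP.NET 4 a request can be marked read-only (or session-free) before session is acquired. A sketch with an IHttpModule (the /reports path check is an arbitrary assumption):

using System;
using System.Web;
using System.Web.SessionState;

public class SessionTuningModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        // Must run before AcquireRequestState for the override to take effect.
        app.BeginRequest += (sender, e) =>
        {
            var context = app.Context;
            if (context.Request.Path.StartsWith("/reports",
                    StringComparison.OrdinalIgnoreCase))
            {
                // Read-only: session is available but not locked or written back.
                context.SetSessionStateBehavior(SessionStateBehavior.ReadOnly);
            }
        };
    }

    public void Dispose() { }
}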
For single-server instances this is okay, as you are just pulling it from the local memory of your web server. The load on these apps tends to be pretty small, so caching user-specific information locally makes sense.
However, as soon as you move session storage to another server, you have increased the network requirements and page load times of your application. Namely, every page load will cause the session data to be moved from the remote server, across the wire, and into the memory of the web server.
At this point you have to ask yourself: is the load from pulling this information directly from the database server as needed really greater than pulling it from the session server on every single page?
There are few instances where pulling it from the database server as needed takes longer or results in more traffic than grabbing it from a remote session server.
Bear in mind that a lot of people set up their database server(s) to also be their session servers, and you start to see why use of session doesn't make any sense.
The only time I would consider using session for load-balanced web apps is if the time to acquire the data exceeded a "reasonable" amount, for example a really complex query that returns a single value and would have to be run on lots of pages. But even then there are better ways that reduce the complexity of dealing with remote session data.
I would advise against the use of session state, regardless of the provider.
Especially with your "very small objects", use ViewState. Why?
Scales best of all. NOTHING to remember on the server.
NO worries about session timeout.
No worries about webfarms.
No worries about wasted resources for sessions that will never come back.
Especially in ASP.NET 4, ViewState can be kept very manageable and small.
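A minimal sketch of the ViewState approach in a WebForms code-behind (the page and property names are illustrative):

using System;
using System.Web.UI;

public partial class ProductPage : Page
{
    // Round-trips inside the page itself; nothing is held on the server.
    private int SelectedProductId
    {
        get { return (int?)ViewState["SelectedProductId"] ?? 0; }
        set { ViewState["SelectedProductId"] = value; }
    }

    protected void OnProductChosen(object sender, EventArgs e)
    {
        SelectedProductId = 42; // e.g. taken from the posted-back control
    }
}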
I have a legacy client-server system where the server maintains a record of some data stored in a SQLite database. The data relates to monitoring access patterns of files stored on the server. The client application is basically a remote viewer of that data: when the client is launched, it connects to the server and fetches the data to display in a grid view. The data is updated in real time on the server, and the client's view automatically refreshes.
There are two problems with the current implementation:
1. When the database gets too big, it takes a long time to load the client. What are the best ways to deal with this? One option is to maintain a cache on the client side. How best to implement such a cache?
2. How can the server maintain a diff so that it only sends the diff during the refresh cycle? There can be multiple clients, and each client needs to display the latest data available on the server.
The server is a Windows service daemon. Both the client and the server are implemented in C#.
You could put a date/timestamp column (indexed) on the data and then load only the rows whose timestamp is greater than the last successful sync.
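A sketch of that incremental load (the table/column names and the Microsoft.Data.Sqlite provider are assumptions):

using System;
using Microsoft.Data.Sqlite;

public static class IncrementalLoader
{
    // Loads only rows modified after the client's last successful sync.
    public static void LoadSince(string connectionString, DateTime lastSyncUtc,
                                 Action<long, string, string> onRow)
    {
        using (var conn = new SqliteConnection(connectionString))
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText =
                "SELECT Id, FilePath, ModifiedUtc FROM AccessLog " +
                "WHERE ModifiedUtc > $since ORDER BY ModifiedUtc";
            // Store timestamps as ISO-8601 strings so they compare correctly in SQLite.
            cmd.Parameters.AddWithValue("$since", lastSyncUtc.ToString("o"));
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    onRow(reader.GetInt64(0), reader.GetString(1), reader.GetString(2));
            }
        }
    }
}

The client then records the largest ModifiedUtc it received and passes it back on the next refresh.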
Load data in pages so you get a quicker startup, then load the rest in the background.
If you go for the "work offline" (cached) solution then you should take a look at the MS ADO.NET Sync Framework. It supports providers and solves most of the hard problems with synchronizing data.
Another option is to retrieve only selected columns, like primary key and a single descriptive column. The remaining data could be lazy-loaded on demand, such as when it scrolls into view or is being accessed.