I have an application that runs on 100 servers. This application communicates with a NoSQL database, which I don't like: 100 servers creating sessions, locking, committing, etc.
I'd like to build a cache farm, or dirty read pools, that the IIS servers will go and read data and objects from. Caches will expire every once in a while, etc.
The whole point is to avoid database access from these 100 servers.
What would you use for this cache farm?
WCF? REST? The Patterns & Practices Caching Application Block? Or Windows AppFabric?
I already have a layer of distributed cache, so I'm not considering that.
What architecture would you go for? Any recommendations or case studies?
What sort of volumes of data are we talking about here? A few KB? A MB? A hundred GB?
If anything but the latter, rather than reinvent the wheel, use one of the existing cache services. Based on the C# tag and the mention of IIS, I assume you are running on Windows. As such, I would suggest that you have a look at Memcached.
From the Memcached page:
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
I have used Memcached myself in environments similar to the one you describe (using MS SQL rather than NoSQL for data storage). After an initial round of cache tuning, we have not encountered any problems whatsoever.
Without knowing the specifics of your requirements it is difficult to say whether Memcached is the right solution, but it is certainly worth looking into. For reference, their FAQ wiki can be found here:
http://code.google.com/p/memcached/wiki/FAQ
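If you do go this way from C#, one commonly used client is EnyimMemcached. Here is a minimal sketch of what the web servers would do, assuming that client; the server addresses, keys, and configuration calls are illustrative and may differ between client versions:

```csharp
using System;
using Enyim.Caching;
using Enyim.Caching.Configuration;
using Enyim.Caching.Memcached;

class MemcachedExample
{
    static void Main()
    {
        // Point the client at the cache farm nodes (addresses are illustrative).
        var config = new MemcachedClientConfiguration();
        config.AddServer("cache1.example.local", 11211);
        config.AddServer("cache2.example.local", 11211);

        using (var client = new MemcachedClient(config))
        {
            // Write-through: store the object after loading it from the database once.
            client.Store(StoreMode.Set, "product:42", "{ \"name\": \"Widget\" }", TimeSpan.FromMinutes(10));

            // Subsequent reads on any of the 100 web servers hit the cache farm, not the database.
            var cached = client.Get<string>("product:42");
            Console.WriteLine(cached ?? "cache miss - fall back to the database");
        }
    }
}
```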
I am currently using MemoryCache _cache = new MemoryCache(new MemoryCacheOptions()); for caching some data from the database that does not change often, but it does change.
On create/update/delete of that data I refresh the cache.
This works fine, but the problem is that in production we will have a few nodes, so when the method for creating a record is called, for instance, the cache will be refreshed only on that node, not on the other nodes, and they will have stale data.
My question is: can I somehow fix this using MemoryCache, or do I need to do something else, and if so, what are the possible solutions?
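For context, here is roughly the pattern described above, with illustrative names: each node holds its own MemoryCache and refreshes it on writes, which is exactly why the other nodes never see the change.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

public class CountryService
{
    private readonly MemoryCache _cache = new MemoryCache(new MemoryCacheOptions());

    public Country[] GetCountries()
    {
        // Served from this node's memory; every other node has its own copy.
        if (_cache.TryGetValue("countries", out Country[] cached))
            return cached;

        var fromDb = LoadCountriesFromDatabase();
        _cache.Set("countries", fromDb, TimeSpan.FromHours(1));
        return fromDb;
    }

    public void UpdateCountry(Country country)
    {
        SaveToDatabase(country);
        // Only THIS node's cache is refreshed; the other nodes keep stale data.
        _cache.Remove("countries");
    }

    private Country[] LoadCountriesFromDatabase() => Array.Empty<Country>(); // placeholder
    private void SaveToDatabase(Country country) { }                         // placeholder
}

public class Country { public int Id; public string Name; }
```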
I think what you are looking for is distributed caching.
Using the IDistributedCache interface you can use either Redis or SQL Server as the backing store, and it supplies basic Get/Set/Remove methods. Changes made on one node will be available to the other nodes.
Using Redis is a great way of sharing session-type data between servers in a load-balanced environment; SQL Server does not seem to be a great fit, given that you are caching to avoid database calls in the first place.
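Here is a minimal sketch of the IDistributedCache-over-Redis approach in ASP.NET Core; the connection string, key names and Product type are illustrative:

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

// Registration (e.g. in Program.cs / Startup.ConfigureServices):
// services.AddStackExchangeRedisCache(o => o.Configuration = "redis.example.local:6379");

public class CatalogCache
{
    private readonly IDistributedCache _cache;
    public CatalogCache(IDistributedCache cache) => _cache = cache;

    public async Task<Product?> GetProductAsync(int id)
    {
        var json = await _cache.GetStringAsync($"product:{id}");
        return json is null ? null : JsonSerializer.Deserialize<Product>(json);
    }

    public Task SetProductAsync(Product product) =>
        _cache.SetStringAsync(
            $"product:{product.Id}",
            JsonSerializer.Serialize(product),
            new DistributedCacheEntryOptions
            {
                // Every node sees the same entry; it expires centrally in Redis.
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
            });

    public Task InvalidateProductAsync(int id) => _cache.RemoveAsync($"product:{id}");
}

public class Product { public int Id { get; set; } public string Name { get; set; } = ""; }
```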
It might also be worth considering whether you are actually complicating things by caching in the first place. With a single application you see the benefit, as keeping records in application memory saves a request over the network, but in a load-balanced scenario you have to compare retrieving those records from a distributed cache against retrieving them from the database.
If the data is just an in-memory copy of a relatively small database table, then there is probably not a lot to choose, performance-wise, between the two. If the data is based on a complicated, expensive query, then the cache is the way to go.
If you are making hundreds of requests a minute for the data, then any network request may be too much, but consider the consequences of the data being a little stale. For example, if you update a record and the new record is not available immediately on every server, does your application break? Or does the change just show up in a more phased way? In that case you could keep your in-process memory cache and just use a shorter time to live.
If you really need every change to propagate to every node straight away, then you could consider using a library like Cache Manager in conjunction with Redis, which can combine an in-memory cache with synchronisation against a remote cache.
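If you prefer not to take a dependency on a full library, the same "in-memory cache plus synchronisation" idea can be hand-rolled with Redis pub/sub: every node keeps its own MemoryCache, and a write on one node publishes an invalidation message that the other nodes react to. A rough sketch, not CacheManager's actual mechanism; the channel name and key handling are illustrative:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;
using StackExchange.Redis;

public class SynchronizedCache
{
    private const string Channel = "cache-invalidation";   // illustrative channel name
    private readonly IMemoryCache _local = new MemoryCache(new MemoryCacheOptions());
    private readonly ISubscriber _bus;

    public SynchronizedCache(IConnectionMultiplexer redis)
    {
        _bus = redis.GetSubscriber();
        // Every node evicts its local copy when any node announces a change.
        _bus.Subscribe(Channel, (_, key) => _local.Remove((string)key));
    }

    public bool TryGet<T>(string key, out T value) => _local.TryGetValue(key, out value);

    public void Set<T>(string key, T value, TimeSpan ttl) => _local.Set(key, value, ttl);

    public void Invalidate(string key)
    {
        _local.Remove(key);
        _bus.Publish(Channel, key);   // tell the other nodes to drop their copies too
    }
}

// Usage: var cache = new SynchronizedCache(ConnectionMultiplexer.Connect("redis.example.local"));
```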
Somewhat dated question, but maybe still useful: I agree with what ste-fu said, well explained.
I'll only add that, on top of CacheManager, you may want to take a look at FusionCache ⚡🦥, which I recently released.
On top of supporting an optional distributed second layer that is transparently managed for you, it also has some other nice features, like an optimization that prevents multiple concurrent factory calls for the same cache key from being executed (less load on the source database), a fail-safe mechanism, and advanced timeouts with background factory completion.
If you give it a chance, please let me know what you think.
/shameless-plug
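For anyone wondering what the "multiple concurrent factory" optimisation mentioned above is about: when a popular key expires, many requests can hit the database at the same time to rebuild it (a cache stampede). Below is a hand-rolled sketch of the general idea of letting only one caller per key run the factory; it is illustrative only and is not FusionCache's implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class SingleFlightCache
{
    private readonly IMemoryCache _cache = new MemoryCache(new MemoryCacheOptions());
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();

    public async Task<T> GetOrSetAsync<T>(string key, Func<Task<T>> factory, TimeSpan ttl)
    {
        if (_cache.TryGetValue(key, out T value))
            return value;

        // One lock per key: only the first caller runs the factory, the rest wait for it.
        var gate = _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            if (_cache.TryGetValue(key, out value))   // re-check after acquiring the lock
                return value;

            value = await factory();                  // a single database hit for this key
            _cache.Set(key, value, ttl);
            return value;
        }
        finally
        {
            gate.Release();
        }
    }
}
```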
What are the benefits and downsides of using Redis for caching data such as userId-UserName pairs or NewsId-NewsDomainName? Why should I not cache this data in app memory by creating dictionaries for it? I think that must be much faster than using Redis?
Thank you!
Depending on what your workload looks like, you may want one or the other, or a combination of both caching strategies. Why?
in-process caching is faster (good for latency), and more importantly, it doesn't produce any network traffic to get a hit (good for scalability);
remote caching, Redis or the like, allows you to keep one copy of the cached data that is accessed by all servers*, so it uses less memory (unless you only have one app server, which seems unlikely), and is less prone to data inconsistency problems (which seems important if you are dealing with user data).
In a cache cluster, or any data cluster where requests for a particular piece of data go to a small set of servers, one of the biggest issues is hotspots. In this case, you may want to combine both: cache hot keys locally, but very briefly, to avoid overwhelming the backend servers, yet not so long that it results in serving stale data for an extended period.
* although, if there's more than one cache server in the cluster, and cluster management has server ejection/readmission logic but no data flush logic, you may have stale data on some of the servers.
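Assuming Redis as the remote cache, here is a rough sketch of that combined approach for something like userId to userName lookups; the key format and the 5-second local TTL are illustrative. Hot keys are served from a short-lived local copy, everything else from the single shared copy in Redis.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;
using StackExchange.Redis;

public class TwoLevelCache
{
    private readonly IMemoryCache _local = new MemoryCache(new MemoryCacheOptions());
    private readonly IDatabase _redis;

    public TwoLevelCache(IConnectionMultiplexer mux) => _redis = mux.GetDatabase();

    public async Task<string?> GetUserNameAsync(string userId)
    {
        var key = $"user-name:{userId}";

        // 1. Hot keys are served from local memory for a few seconds (no network hop).
        if (_local.TryGetValue(key, out string cached))
            return cached;

        // 2. Otherwise ask Redis, which holds the single shared copy for all servers.
        var fromRedis = await _redis.StringGetAsync(key);
        if (fromRedis.HasValue)
        {
            _local.Set(key, (string)fromRedis, TimeSpan.FromSeconds(5)); // short TTL limits staleness
            return (string)fromRedis;
        }

        return null; // caller falls back to the database and populates Redis
    }
}
```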
What if you have multiple servers? Would your second server know what is stored in the first server? Nope. This could be the main reason you need to use Redis.
And if you store, let's say, a great amount of data in your server, it could also affect your server's performance.
I have used ASP.NET mostly in intranet scenarios and am pretty familiar with it, but for something such as a shopping cart or similar session data there are various possibilities. To name a few:
1) State-Server session
2) SQL Server session
3) Custom database session
4) Cookie
What have you used, what are your successes or lessons learnt, and what would you recommend? This would obviously make a difference in a large-scale public website, so please comment on your experiences.
I have not mentioned in-proc since it has no place in a large-scale app.
Many thanks
Ali
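For reference, options 1-3 are normally selected through the sessionState element in web.config (inside system.web); a rough sketch of each, with illustrative connection strings, timeouts and provider type:

```xml
<!-- 1) Out-of-process State Server (the aspnet_state Windows service, default port 42424) -->
<sessionState mode="StateServer"
              stateConnectionString="tcpip=stateserver.example.local:42424"
              timeout="20" />

<!-- 2) SQL Server session state (schema installed with aspnet_regsql.exe) -->
<!--
<sessionState mode="SQLServer"
              sqlConnectionString="Data Source=sqlserver.example.local;Integrated Security=SSPI"
              timeout="20" />
-->

<!-- 3) Custom provider, e.g. a database- or cache-backed implementation you write yourself -->
<!--
<sessionState mode="Custom" customProvider="MySessionProvider">
  <providers>
    <add name="MySessionProvider" type="MyApp.MySessionStateStoreProvider, MyApp" />
  </providers>
</sessionState>
-->
```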
The biggest lesson I learned was one I already knew in theory, but got to see in practice.
Removing all use of sessions entirely from an application (does not necessarily mean all of the site) is something we all know should bring a big improvement to scalability.
What I learnt was just how much of an improvement it could be. By removing the use of sessions, and adding some code to handle what had previously been handled by them (which at each individual point was a performance loss, as each individual point was now doing more work than before), the overall gain was massive: actions one would measure in many seconds, or even a couple of minutes, became sub-second; CPU usage became a fraction of what it had been; and the number of machines and amount of RAM went from clearly not enough to cope to a rather over-indulgent amount of hardware.
If sessions cannot be removed entirely (people don't like the way browsers use HTTP authentication, alas), moving much of their use into a few well-defined spots, ideally in a separate application on the server, can have a bigger effect than which session-storage method is used.
In-proc certainly can have a place in a large-scale application; it just requires sticky sessions at the load-balancing level. In fact, the reduced maintenance cost and infrastructure overhead of using in-proc sessions can be considerable. Any enterprise-grade content switch you'd be using in front of your farm would certainly offer such functionality, and it's hard to argue for the cash and manpower of purchasing/configuring/integrating state servers versus just flipping a switch. I am using this in some quite large-scale ASP.NET systems with no issues to speak of. RAM is far too cheap to ignore this as an option.
In-proc session (at least when using IIS 6) can recycle at any time and is therefore not very reliable, because the sessions will end when the server decides, not when the session actually times out. The sessions will also expire when you deploy a new version of the web site, which is not true of server-based session providers. This can potentially give your users a bad experience if you deploy in the middle of their session.
Using SQL Server is the best option because it is possible to have sessions that never expire. However, the cost of the server, disk space, its maintenance, and performance all have to be considered. I was using one in my e-commerce app for several years until we changed providers to one with very little database space. It was a shame that it had to go.
We have been using the State Service for about 3 years now and haven't had any issues. That said, we now have the session timeout set at an hour, and in e-commerce that is probably costing us some business versus the never-expire model.
When I worked for a large company, we used a clustered SQL Server in another application that was more critical to keep online. We had multiple redundancy on every part of the system, including the network cards. Keep in mind that adding a state server or service adds a potential single point of failure for the application unless you go the clustered route, which is more expensive to maintain.
There was also an issue when we first switched to the SQL-based approach where binary objects couldn't be serialized into session state. I only had a few, and modified the code so it wouldn't need the binary serialization, in order to get the site online. However, when I went back to fix the serialization issue a few weeks later, it suddenly didn't exist anymore. I am guessing it was fixed in a Windows Update.
If you are concerned about security, State Server is a no-no. State Server performs absolutely no access checks; anybody who is granted access to the TCP port the state server uses can access or modify any session state.
In-proc is unreliable (and you mentioned that), so it's not worth considering.
Cookies aren't really a session-state replacement since you can't store much data there.
I vote for database-based storage of some kind (if needed at all); it has the best possibility to scale.
Following on from this question...
What to do when you’ve really screwed up the design of a distributed system?
... the client has reluctantly asked me to quote for option 3 (the expensive one), so they can compare prices to a company in India.
So, they want me to quote (hmm). In order to get this as accurate as possible, I need to decide how I'm actually going to do it. Here are 3 scenarios...
Scenarios
Split the database
My original idea (perhaps the most tricky) will yield the best speed on both the website and the desktop application. However, it may require some synchronising between the two databases, as the two "systems" are so heavily connected. I've learnt that synchronisation, if not done properly and not tested thoroughly, can be hell on earth.
Implement caching on the smallest system
To side-step the sync option (which I'm not fond of), I figured it may be more productive (and cheaper) to move the entire central database and web service to their office (i.e. in-house), and have the website (still on the hosted server) download data from the central office and store it in a small database (acting as a cache)...
Set up a new server in the customer's office (in-house).
Move the central database and web service to the new in-house server.
Keep the web site on the hosted server, but alter the web service URL so that it points to the office server.
Implement a simple cache system for images and most frequently accessed data (such as product information).
... the downside is that when the end-user in the office updates something, their customers will effectively be downloading the data over a 60 KB/s upload connection (albeit only once, as it will then be cached).
Also, not all data can be cached, for example when a customer updates their order. Also, connection redundancy becomes a huge factor here; what if the office connection is offline? Nothing to do but show an error message to the customers, which is nasty, but a necessary evil.
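To make the cache part concrete, here is a rough sketch of how the hosted web site could cache product data pulled from the office web service and fall back to a stale copy when the office connection is down; all type and member names are illustrative:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public class ProductCache
{
    private readonly IMemoryCache _fresh = new MemoryCache(new MemoryCacheOptions());
    private readonly IMemoryCache _stale = new MemoryCache(new MemoryCacheOptions());
    private readonly Func<int, Task<ProductDto>> _fetchFromOfficeService; // wraps the web service call

    public ProductCache(Func<int, Task<ProductDto>> fetchFromOfficeService) =>
        _fetchFromOfficeService = fetchFromOfficeService;

    public async Task<ProductDto?> GetProductAsync(int id)
    {
        if (_fresh.TryGetValue(id, out ProductDto cached))
            return cached;                                   // avoids the slow office uplink entirely

        try
        {
            var product = await _fetchFromOfficeService(id); // one trip over the office uplink
            _fresh.Set(id, product, TimeSpan.FromMinutes(15));
            _stale.Set(id, product, TimeSpan.FromHours(24)); // long-lived fallback copy
            return product;
        }
        catch (Exception)
        {
            // Office connection offline: serve the last known copy if we have one.
            return _stale.TryGetValue(id, out ProductDto old) ? old : null;
        }
    }
}

public class ProductDto { public int Id { get; set; } public string Name { get; set; } = ""; }
```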
Mystery option number 3
Suggestions welcome!
SQL replication
I had considered MSSQL replication, but I have no experience with it, so I'm worried about how conflicts are handled, etc. Is this even an option, considering there are physical files involved, and so on? Also, I believe we'd need to upgrade from SQL Express to a paid edition of SQL Server, and buy two licenses.
Technical
Components
ASP.Net website
ASP.net web service
.Net desktop application
MSSQL 2008 express database
Connections
Office connection: 8 mbit down and 1 mbit up contended line (50:1)
Hosted virtual server: Windows 2008 with 10 megabit line
Having just read your original question related to this for the first time, I'd say that you may have laid the foundation for resolving the problem simply because you are communicating with the database via a web service.
This web service may well be the saving grace as it allows you to split the communications without affecting the client.
A good while back I was involved in designing just such a system.
The first thing that we identified was the data which rarely changes, and we immediately locked all of this out of consideration for distribution. A manual administration process through the web server was the only way to change this data.
The second thing we identified was the data that should be owned locally. By this I mean data that only one person or location at a time would need to update, but that may need to be viewed at other locations. We fixed all of the keys on the related tables to ensure that duplication could never occur and that no auto-incrementing fields were used.
The third item was the tables that were truly shared, and although we worried a lot about these during stages 1 and 2, in our case this part was straightforward.
When I'm talking about a server here I mean a DB Server with a set of web services that communicate between themselves.
As designed, our architecture had one designated 'master' server. This was the definitive source for resolving conflicts.
The rest of the servers were, in the first instance, a large cache of anything covered by item 1. In fact it wasn't so much a large cache as a database duplicate, but you get the idea.
The second function of each non-master server was to coordinate changes with the master. This involved a very simplistic process of passing most of the work through transparently to the master server.
We spent a lot of time designing and optimising all of the above, only to discover that the single best performance improvement came from simply compressing the web service requests to reduce bandwidth (it was over a single-channel ISDN line, which probably made the most difference).
The fact is that if you do have a web service then this will give you greater flexibility about how you implement this.
I'd probably start by investigating the feasibility of implementing one of the SQL Server replication methods.
Usual disclaimers apply:
Splitting the database will not help much, but it will add a lot of headaches. IMO, you should first try to optimize the database: update some indexes or maybe add several more, optimize some queries, and so on. For database performance tuning I recommend reading some articles on simple-talk.com.
Also, in order to save bandwidth, you can add bulk processing to your Windows client and add zipping (compression) to your web service.
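A minimal sketch of the "zipping" idea: compress the serialized payload with GZip before it crosses the slow uplink and decompress it on the other side (the helper and its usage are illustrative):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PayloadCompression
{
    public static byte[] Compress(string payload)
    {
        var raw = Encoding.UTF8.GetBytes(payload);
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
            gzip.Write(raw, 0, raw.Length);
        return output.ToArray();   // send this over the web service instead of the plain text
    }

    public static string Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}

// A large, repetitive XML response typically shrinks to a small fraction of its size:
// var compressed = PayloadCompression.Compress(bigXmlResponse);
// var roundTripped = PayloadCompression.Decompress(compressed);
```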
And you should probably upgrade to MS SQL 2008 Express; it's also free.
It's hard to recommend a good solution for your problem with the information I have. It's not clear where the bottleneck is. I strongly recommend profiling your application to find the exact location of the bottleneck (e.g., is it in the database, or in a fully saturated connection, and so on) and adding a description of it to the question.
EDIT 01/03:
When the bottleneck is the upstream connection, then you can only do the following:
1. Add compression (zipping) of messages on the service and the client
2. Implement bulk operations and use them
3. Try to reduce the number of operations per use case for the most frequent cases
4. Add a local database for the Windows clients, perform all operations against it, and synchronize the local DB with the main one on a timer.
And SQL replication will not help you much in this case. The fastest and cheapest solution is to increase the upstream connection, because all the other options (except the first one) will take a lot of time.
If you choose to rewrite the service to support bulk operations, I recommend having a look at the Agatha project.
Actually, hearing how many users they have on that one connection, it may be time to increase the bandwidth at the office (not at all my normal response). If you factor out the CRM system, what else is a top user of the bandwidth? It may be that they have reached the point of needing more bandwidth, period.
But I am still curious to see how much of the information you are passing is actually getting used. Make sure you are transferring it efficiently; is there any chance you could add some quick, easy measurements to see how much data people actually consume when looking at it?
Say I need to design an in-memory service for a very high-load read/write system. I want to dump the objects to the database every 2 minutes. How would I access the in-memory objects/data from within a web application?
(I was thinking a Windows service would be running in the background handling the in-memory service etc.)
I want the fastest possible solution, and I would guess most people would say use a web service? What other options would I have? I just don't understand how I could hook into the Windows service's objects etc.
(Please don't ask why I would want to do this, maybe you're right and it's a bad idea but I am also curious if this type of architecture is possible.)
Update
I was looking at the site swoopo.com, which I would think gets a lot of hits near the end of auctions; but since the auction keeps resetting, the hits to the database would be just crazy, so I was thinking they might do it in memory and then dump to the DB every x minutes...
What you're describing is called a cache, with a facade front-end.
You write a facade to which you commit your changes and from which you acquire your datasets. The facade queues up reads and writes, and commits when the queue is full or after a certain amount of time has passed. Your web application has a single point of access to the data (the facade), and the facade is structured in such a way as to avoid writing to and reading from storage too often.
Most relational database management systems do this for you. They do this kind of optimization and queuing internally so writing another layer on top of it would only slow things down. So don't write a cache if you're using an RDBMS.
Regarding the specifics of accessing such a facade, you can treat it as just an object, and implement it however you want (its own thread, a thread pool, a Web service, a Windows service, whatever).
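A rough sketch of such a facade, with all names illustrative: reads are answered from in-memory state, writes are queued, and the queue is flushed to storage when it grows past a threshold or when a timer fires, which is roughly the "dump every 2 minutes" idea from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class AuctionFacade : IDisposable
{
    private readonly ConcurrentDictionary<int, decimal> _currentBids = new();          // in-memory golden copy
    private readonly ConcurrentQueue<(int auctionId, decimal bid)> _pendingWrites = new();
    private readonly Timer _flushTimer;
    private const int FlushThreshold = 1000;

    public AuctionFacade()
    {
        // Flush on a schedule (every 2 minutes) as well as when the queue fills up.
        _flushTimer = new Timer(_ => Flush(), null, TimeSpan.FromMinutes(2), TimeSpan.FromMinutes(2));
    }

    public decimal GetHighestBid(int auctionId) =>
        _currentBids.TryGetValue(auctionId, out var bid) ? bid : 0m;

    public void PlaceBid(int auctionId, decimal amount)
    {
        _currentBids.AddOrUpdate(auctionId, amount, (_, current) => Math.Max(current, amount));
        _pendingWrites.Enqueue((auctionId, amount));
        if (_pendingWrites.Count >= FlushThreshold)
            Flush();
    }

    private void Flush()
    {
        // Drain the queue and persist; a real implementation would batch these into one DB round trip.
        while (_pendingWrites.TryDequeue(out var write))
            PersistToDatabase(write.auctionId, write.bid);
    }

    private void PersistToDatabase(int auctionId, decimal bid) { /* database write goes here */ }

    public void Dispose() => _flushTimer.Dispose();
}
```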
Any remoting technology would work, such as sockets, pipes, and the like.
Check out: www.remobjects.com
You could use Windows Message Queues or a service bus, or even .NET Remoting.
See http://www.nservicebus.com/, or http://code.google.com/p/masstransit/.
You could hook into the Windows service's objects by using Remoting or WCF; both offer very fast interprocess communication. Sockets are fast too, but are more cumbersome to program than WCF. There is a ton of WCF documentation and support online.
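A hedged sketch of the WCF route: the Windows service hosts the in-memory objects behind a net.pipe endpoint, and the web application calls it through a channel (the contract, address and types are illustrative):

```csharp
using System;
using System.ServiceModel;

[ServiceContract]
public interface IAuctionCache
{
    [OperationContract]
    decimal GetHighestBid(int auctionId);

    [OperationContract]
    void PlaceBid(int auctionId, decimal amount);
}

// Inside the Windows service: host the in-memory implementation over a named pipe.
// var host = new ServiceHost(typeof(AuctionCacheService), new Uri("net.pipe://localhost"));
// host.AddServiceEndpoint(typeof(IAuctionCache), new NetNamedPipeBinding(), "auction-cache");
// host.Open();

// Inside the web application: call into the service's objects.
public static class AuctionCacheClient
{
    public static decimal ReadHighestBid(int auctionId)
    {
        var factory = new ChannelFactory<IAuctionCache>(
            new NetNamedPipeBinding(),
            new EndpointAddress("net.pipe://localhost/auction-cache"));
        var proxy = factory.CreateChannel();
        try
        {
            return proxy.GetHighestBid(auctionId);
        }
        finally
        {
            ((IClientChannel)proxy).Close();
            factory.Close();
        }
    }
}
```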
Databases provide a level of caching for you. The advantage of an in-memory golden copy such as the one you propose is that it never has to read from disk when a request comes in, and if you host it on the same machine as your IIS (provided you have enough RAM for both) there is no extra network hop, making it much faster than querying a DB. However, the downside to this approach is that it does not scale as well if you need to add machines to load balance.
Third party messaging providers such as TIBCO are also worth looking at.