What are the benefits and downsides of using Redis for caching data such as userId-UserName pairs or NewsId-NewsDomainName pairs? Why shouldn't I cache this data in app memory by creating Dictionaries for it? I would think that must be much faster than using Redis.
Thank you!
Depending on what your workload looks like, you may want one or the other, or a combination of both caching strategies. Why?
In-process caching is faster (good for latency) and, more importantly, produces no network traffic on a hit (good for scalability).
Remote caching, with Redis or the like, lets you keep one copy of the cached data that is accessed by all servers*, so it uses less memory (unless you only have one app server, which seems unlikely) and is less prone to data inconsistency problems (which seems important if you are dealing with user data).
In a cache cluster, or any data cluster where requests for a particular piece of data go to a small set of servers, one of the biggest issues is hotspots. In this case you may want to combine both: cache hot keys locally, long enough to keep from overwhelming the backend servers, but not so long that you serve stale data for an extended time.
* although, if there's more than one cache server in the cluster, and cluster management has server ejection/readmission logic but no data flush logic, you may have stale data on some of the servers.
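The combined strategy can be sketched as a small two-tier cache. This is only an illustration in Python, not production code: `TwoTierCache`, the `remote` object (any dict-like stand-in for a Redis client with a `get` method), and the 2-second local TTL are all assumptions made for the example.

```python
import time


class TwoTierCache:
    """Short-lived local cache in front of a shared remote cache."""

    def __init__(self, remote, local_ttl_seconds=2.0, clock=time.monotonic):
        self.remote = remote              # hypothetical stand-in for a Redis-like client
        self.local_ttl = local_ttl_seconds
        self.clock = clock                # injectable so expiry can be tested
        self.local = {}                   # key -> (value, expires_at)

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]               # local hit: no network round trip
        value = self.remote.get(key)      # local miss or expired: ask the shared cache
        if value is not None:
            self.local[key] = (value, self.clock() + self.local_ttl)
        else:
            self.local.pop(key, None)     # drop any expired local copy
        return value
```

The local TTL bounds how stale a hot key can get on any one app server, while the remote cache only sees one request per key per server per TTL window.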
What if you have multiple servers? Would your second server know what is stored in the first server? Nope. This could be the main reason you need to use Redis.
Also, if you store a large amount of data in your app server's memory, it could affect that server's performance.
I am currently using MemoryCache _cache = new MemoryCache(new MemoryCacheOptions()); for caching some data from the database that does not change often, but does change.
On create/update/delete of that data I refresh the cache.
This works fine, but the problem is that in production we will have a few nodes. When the method for creating a record is called, for instance, the cache will be refreshed only on that node and not on the others, so they will have stale data.
My question is: can I somehow fix this using MemoryCache, or do I need to do something else, and if so, what are the possible solutions?
I think what you are looking for is distributed caching.
Using the IDistributedCache interface you can use either Redis or SQL Server, and it supplies basic Get/Set/Remove methods. Changes made on one node will be available to other nodes.
Redis is a great way of sharing session-type data between servers in a load-balanced environment. SQL Server does not seem to be a great fit, given that you seem to be caching to avoid DB calls.
It might also be worth considering whether you are actually complicating things by caching in the first place. With a single application you see the benefit, as keeping the records in application memory saves a request over the network. In a load-balanced scenario, though, you have to compare retrieving those records from a distributed cache with retrieving them from the database.
If the data is just an in memory copy of a relatively small database table, then there is probably not a lot to choose performance wise between the two. If the data is based on a complicated expensive query then the cache is the way to go.
If you are making hundreds of requests a minute for the data, then any network request may be too much, but consider what the consequences of slightly stale data would be. For example, if you update a record and the new record is not available immediately on every server, does your application break? Or does the change just roll out in a more phased way? In that case you could keep your in-process memory cache and just use a shorter Time To Live.
If you really need every change to propagate to every node straight away then you could consider using a library like Cache Manager in conjunction with Redis which can combine an in memory cache and synchronisation with a remote cache.
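To make the idea of synchronising in-memory caches concrete, here is a minimal Python sketch of one general approach: each node keeps its own cache and drops an entry when any node announces a change. It does not describe Cache Manager's actual API. `InvalidationBus` is a hypothetical in-process stand-in for a shared pub/sub channel (such as one provided by Redis), and `Node` and the `database` dict are likewise invented for the example.

```python
class InvalidationBus:
    """In-process stand-in for a pub/sub channel shared by all nodes."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        for callback in self.subscribers:
            callback(key)


class Node:
    """One app server with its own in-memory cache."""

    def __init__(self, bus, database):
        self.bus = bus
        self.database = database      # shared source of truth (hypothetical dict)
        self.cache = {}
        bus.subscribe(self.evict)

    def evict(self, key):
        self.cache.pop(key, None)

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.database[key]
        return self.cache[key]

    def write(self, key, value):
        self.database[key] = value
        self.bus.publish(key)         # every node, including this one, drops its copy
```

With a real network channel the eviction message arrives with some delay, so reads on other nodes can still be briefly stale; the sketch only shows the shape of the mechanism.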
Somewhat dated question, but maybe still useful: I agree with what ste-fu said, well explained.
I'll only add that, on top of CacheManager, you may want to take a look at FusionCache ⚡🦥, which I recently released.
On top of supporting an optional distributed 2nd layer that is transparently managed for you, it also has some other nice features, such as an optimization that prevents multiple concurrent factory calls for the same cache key from being executed (less load on the source database), a fail-safe mechanism, and advanced timeouts with background factory completion.
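The "one factory call per key" optimization is a general technique, and a minimal version can be sketched in Python as follows. This is not FusionCache's implementation or API; `SingleFlightCache` and `get_or_set` are hypothetical names, and the sketch leaves out expiry, fail-safe and timeouts entirely.

```python
import threading


class SingleFlightCache:
    """Runs the factory at most once per key, even under concurrent misses."""

    def __init__(self):
        self.values = {}
        self.key_locks = {}
        self.guard = threading.Lock()     # protects key_locks

    def get_or_set(self, key, factory):
        if key in self.values:
            return self.values[key]       # fast path: already cached
        with self.guard:
            lock = self.key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the key while we waited.
            if key not in self.values:
                self.values[key] = factory()
            return self.values[key]
```

Callers that miss at the same moment queue on the per-key lock, and only the first one runs the factory (for example, the database query); the rest pick up the cached value.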
If you give it a chance, please let me know what you think.
/shameless-plug
This is more of a programming strategy and direction question, than the actual code itself.
I am programming in C#.
I have an application that remotely starts processes on many different clients on the network, could be up to 1000 clients in theory.
It then monitors the status of the remote processes by reading a log file on each client.
I currently do this by running one thread that loops through all of the clients in a list, and reading the log file. It works fine for 10 or 20 machines, but 1000 would probably be untenable.
There are several problems with this approach:
First, if the thread doesn’t finish reading all of the client statuses before it’s called again, the client statuses at the end of the list might not be read and updated.
Secondly, if any client in the list goes offline during this period, the updating hangs, until that client is back online again.
So I require a different approach, and have thought up a few possible ways to resolve this.
1. Spawn a separate thread for each client, to read its log file and update its progress.
   a. However, I’m not sure if having 1000 threads running on my machine is something that would be acceptable.
2. Test the connection for each machine first, before trying to read the file, and if it cannot connect, then just ignore it for that iteration and move on to the next client in the list.
   a. This still has the same problem of not getting through the list before the next call, and it adds more delay, as it tests the connection via a port first. With 1000 clients, this would be noticeable.
3. Have each client send the data to the machine running the application whenever there is an update.
   a. This could create a lot of chatter with 1000 machines trying to send data repeatedly.
So I’m trying to figure out whether there is another, more efficient and reliable method that I haven’t considered, or which one of these would be the best.
Right now I’m leaning towards having the clients send updates to the application, instead of having the application pulling the data.
Looking for thoughts, concerns, ideas and recommendations.
In my opinion, you are doing this (monitoring) the wrong way. Instead of keeping all logs in text files, you would do better to keep them in a central data repository, which can be of any kind. Since you are monitoring those systems, your design and the mechanism behind it must not negatively affect the performance of the target systems. With the current design, disk and CPU can be involved so heavily in certain cases that the monitoring itself becomes a performance issue.
I recommend creating a log repository server using a fast in-memory database like Redis and sending logged data directly to that server. Keep in mind that this database must run on a different virtual machine. You can then tune Redis to persist received data to disk based on the number of writes and the elapsed interval. The in-memory feature is an advantage here, as a monitoring application like this may need to query the information a lot. Redis is also fast enough to handle millions of entries efficiently.
The blueprint for you is that:
1- Centralize all log data in a single repository.
2- Configure clients to send monitored information to the centralized repository.
3- Read the data from the centralized repository by the main server (monitoring system) when required.
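The three steps above can be sketched in Python as follows. `StatusRepository` is a hypothetical in-memory stand-in for the central store (Redis, Elasticsearch or similar). The method names and the idea of flagging a client as "stale" when it has not reported for a while are assumptions added for illustration, not part of the recommendation itself.

```python
import time


class StatusRepository:
    """In-memory stand-in for the central repository (step 1)."""

    def __init__(self, clock=time.time):
        self.clock = clock                # injectable so staleness can be tested
        self.latest = {}                  # client_id -> (status, reported_at)

    def report(self, client_id, status):
        # Step 2: called by each client whenever its status changes.
        self.latest[client_id] = (status, self.clock())

    def snapshot(self, stale_after_seconds=60.0):
        # Step 3: the monitor reads everything in one pass, never touching clients.
        now = self.clock()
        result = {}
        for client_id, (status, reported_at) in self.latest.items():
            is_stale = now - reported_at > stale_after_seconds
            result[client_id] = "stale" if is_stale else status
        return result
```

Because the monitor never connects to a client, an offline machine cannot make the update loop hang; it simply shows up as not having reported recently.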
I'm not trying to advertise a particular tool here; I'm only sharing my own experience. There are many more tools that you can use for this purpose, such as ElasticSearch.
SETUP:
We have a .NET application distributed over 6 local servers, each with a local database (Oracle), plus 1 main server and 1 load balancer. Requests come to the load balancer, which redirects them to one of the 6 local servers. At certain time intervals, data is gathered on the main server and redistributed to the 6 local servers so that they can make decisions with the complete data.
Each local server has a cache component that caches the incoming requests based on different parameters (location, incoming parameters, etc.). With each request, a local server decides whether to go to the database (Oracle) or get the response from the cache. However, in both cases the local server has to go to the database to do 1 insert and 1 update per request.
PROBLEM:
On a peak day each local server receives 2000 requests per second and the system starts slowing down (CPU: 90%). I am trying to increase the capacity before adding another local server to the mix. After running some benchmarks, the bottleneck seems to be, as it always is, the inevitable 1 insert and 1 update per request to the database.
TRIED METHODS
To decrease the frequency, I created a Windows service that sits between the DB and the .NET application. It contains a pipe server, receives each insert and update from the main .NET application, and saves them in a Hashtable. At certain time intervals the new service then goes to the database once to do batch inserts and updates. The point was to go to the database less frequently. Although this had a positive effect, it didn't reduce the system load as much as I expected. Most of the CPU load comes from oracle.exe as requests per second increase.
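The batching service described above can be sketched roughly like this. It is an illustration in Python, not the actual Windows service: `WriteBuffer` and `flush_batch` are hypothetical names, and the behaviour of keeping only the latest values per row key is an assumption suggested by the use of a Hashtable. The timer that would call `flush()` periodically is left out.

```python
class WriteBuffer:
    """Collects pending writes and hands them to the database in batches."""

    def __init__(self, flush_batch):
        self.flush_batch = flush_batch    # callable doing one batched DB round trip
        self.pending = {}                 # row key -> latest values for that row

    def record(self, key, values):
        # Assumption: a later write to the same row replaces the earlier one,
        # so only the final state of each row reaches the database.
        self.pending[key] = values

    def flush(self):
        if not self.pending:
            return 0
        batch, self.pending = self.pending, {}
        self.flush_batch(batch)
        return len(batch)
```

Batching reduces round trips, but the database still has to write the same rows eventually, which is consistent with the limited gain reported here.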
I am trying to avoid going to the database as much as I can, and apart from the solution above, the only way to avoid the DB seems to be increasing the cache hit ratio. My cache hit ratio is currently around 81%. Because each local machine has its own cache, I am actually missing lots of cacheable requests: when two similar requests are redirected to different servers, the second request cannot benefit from the cached result of the first one.
I don't have a lot of experience in system architecture, so I would appreciate any help with this problem. Any suggestions on different caching architectures or setups, or any tools, are welcome.
Thank you in advance, hopefully I made my question clear.
To me this looks like an application for a TimesTen solution. In that case you can eliminate the local databases and return to just one. Where you now have the local Oracle databases, you can implement a cache grid. Most likely this is going to be an AWT (Asynchronous WriteThrough) cache. See Oracle In-Memory Database Cache Concepts.
It's not a cheap option, but it could be worth investigating.
You can keep concentrating on the business logic and have no worries about speed. This of course only works well if the application code is already tuned and the SQL is performant and scalable. The SQL has to be prepared (using bind variables) to get the best performance.
Your application connects to the cache and no longer to the database. You create cache tables in the cache group for the tables you want cached. All tables referenced in a SQL statement should be cached; otherwise the complete statement is passed through to the Oracle database. In the grid, a cache fusion mechanism is in place, so you don't have to worry about where in the grid the data is located.
In the current release, support for .NET is included.
The data is consistent and asynchronously updated to the Oracle database. If the data that is needed is in the cache and you take the Oracle database down, the app can keep running. As soon as the database is back, the synchronization picks up again. Very powerful.
2000 requests per second per server means about 24,000 operations per second against the database (6 servers × 2,000 requests × 2 statements each). That is a HUGE load for a DB.
Try to optimize, scale up, or cluster the database.
Maybe a NoSQL DB (Redis/Raven/Mongo) as middleware would suit you. The local servers would read from and write to a sharded NoSQL DB, and the aggregated data would be synchronized with Oracle at off-peak times.
I know the question is old now, but I wanted to let everyone know how we solved our issue.
After trying many optimizations, it turned out that all we needed was solid state drives for the 6 local machines. CPU usage dropped to 30% immediately after we installed them. This is the first time I've seen any kind of hardware upgrade contribute this much to performance.
If you have a high-load setup, try upgrading to an SSD before making any software or architecture changes.
Thanks everyone for your answers.
I have an application that runs on 100 servers. This application communicates with a NoSQL database, which I don't like: 100 servers creating sessions, locking, committing, etc.
I'd like to build a cache farm or dirty-read pools that the IIS servers will read data and objects from. Caches will expire every once in a while, etc.
The whole point is to avoid database access from these 100 servers.
What would you use for this cache farm?
WCF? REST? Best Practices and Patterns Caching Block? Or Windows AppFabric?
I already have a layer of distributed cache, so I'm not thinking about that.
What architecture would you go for? Any recommendations or case studies?
What sort of volumes of data are we talking about here? A few KB? A MB? A hundred GB?
If anything but the latter, rather than reinvent the wheel, use one of the existing cache services. Based on the C# tag and the mention of IIS, I assume you are running on Windows. As such I would suggest that you have a look at Memcached.
From the Memcached page:
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
I have used Memcached myself in environments similar to the one you describe (using MS SQL rather than nosql for data storage). After an initial round of cache tuning we have not encountered any problems whatsoever.
Without knowing specifics of your requirements it is difficult to say if Memcached is the solution that is right but it is certainly something worth looking into. For reference their FAQ Wiki can be found here:
http://code.google.com/p/memcached/wiki/FAQ
I have used ASP.NET mostly in intranet scenarios and am pretty familiar with it, but for something such as a shopping cart or similar session data there are various possibilities. To name a few:
1) State-Server session
2) SQL Server session
3) Custom database session
4) Cookie
What have you used, what are your success stories or lessons learnt, and what would you recommend? This would obviously make a difference in a large-scale public website, so please comment on your experiences.
I have not mentioned in-proc since in a large-scale app this has no place.
Many thanks
Ali
The biggest lesson I learned was one I already knew in theory, but got to see in practice.
Removing all use of sessions entirely from an application (does not necessarily mean all of the site) is something we all know should bring a big improvement to scalability.
What I learnt was just how much of an improvement it could be. We removed the use of sessions and added code to handle what they had handled before (at each individual point this was a performance loss, since each point was now doing more work than before). Even so, the overall gain was massive: actions that had taken many seconds or even a couple of minutes became sub-second, CPU usage dropped to a fraction of what it had been, and the number of machines and amount of RAM went from clearly not enough to cope to a rather over-indulgent amount of hardware.
If sessions cannot be removed entirely (people don't like the way browsers use HTTP authentication, alas), moving much of their use into a few well-defined spots, ideally in a separate application on the server, can have a bigger effect than which session-storage method is used.
In-proc certainly can have a place in a large-scale application; it just requires sticky sessions at the load balancing level. In fact, the reduction in maintenance cost and infrastructure overhead from using in-proc sessions can be considerable. Any enterprise-grade content switch you'd be using in front of your farm would certainly offer such functionality, and it's hard to argue for the cash and manpower of purchasing/configuring/integrating state servers versus just flipping a switch. I am using this in quite large-scale ASP.NET systems with no issues to speak of. RAM is far too cheap to ignore this as an option.
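For readers unfamiliar with sticky sessions, the core idea is that every request carrying the same session ID is routed to the same server, so that server's in-process state stays valid. The Python sketch below shows one simple way to get that property, by hashing the session ID. The `pick_server` function and the server names are hypothetical, and real load balancers often use cookie-based affinity rather than hashing.

```python
import hashlib


def pick_server(session_id, servers):
    """Map a session ID to the same server on every request."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Note that with plain modulo hashing, adding or removing a server remaps many sessions to a different machine, which in-proc session state would not survive.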
In-proc session (at least when using IIS6) can recycle at any time and is therefore not very reliable because the sessions will end when the server decides, not when the session actually times out. The sessions will also expire when you deploy a new version of the web site, which is not true of server-based session providers. This can potentially give your users a bad experience if you update in the middle of their session.
Using SQL Server is the best option because it is possible to have sessions that never expire. However, the cost of the server, disk space, its maintenance, and performance all have to be considered. I was using one on my e-commerce app for several years until we changed providers to one with very little database space. It was a shame that it had to go.
We have been using the state service for about 3 years now and haven't had any issues. That said, we now have the session timeout set at an hour, and in e-commerce that is probably costing us some business compared with the never-expire model.
When I worked for a large company, we used a clustered SQL Server in another application for which it was more critical to remain online. We had multiple redundancy on every part of the system, including the network cards. Keep in mind that adding a state server or service adds a potential single point of failure for the application unless you go the clustered route, which is more expensive to maintain.
There was also an issue when we first switched to the SQL based approach where binary objects couldn't be serialized into session state. I only had a few and modified the code so it wouldn't need the binary serialization so I could get the site online. However, when I went back to fix the serialization issue a few weeks later, it suddenly didn't exist anymore. I am guessing it was fixed in a Windows Update.
If you are concerned about security, state server is a no-no. State server performs absolutely no access checks: anybody who is granted access to the TCP port that state server uses can access or modify any session state.
In-proc is unreliable (and you mentioned that), so it's not worth considering.
Cookies aren't really a session state replacement, since you can't store much data there.
I vote for database-based storage of some kind (if needed at all); it has the best potential to scale.