Data Caching in Web API - C#

I am working on a Web API project. My Web API calls a repository, and the repository calls a third-party data source to perform CRUD operations. Calling the data source is very costly, and its data gets updated only weekly.
So I thought of implementing caching. I have seen a few output-caching packages, but they do not fulfill my requirement, because:
If I output-cache the Get method, I am not able to reuse the same cached output in the GetById method, or the same cached data for some other operation like Find. I also have to manually update the cache whenever any update/post happens.
One more thing I am confused about in this scenario: should I remove the cache entry or update it
whenever a PUT or POST operation happens?
I am totally confused about how to complete this requirement. Please suggest how I can fulfill it. I searched the web but have not found anything like that.
I am a novice both on SO and Web API, so pardon me if the question does not meet the standards.

If I output-cache the Get method, I am not able to reuse the same cached output
in the GetById method, or the same cached data for some other operation
like Find. I also have to manually update the cache whenever
any update/post happens.
To use the cached data for different operations like GetById and Find, you need to store the data in suitable data structures. Caches like Redis support hashes for objects, which can be used by GetById. What kind of data structure you need depends entirely on your use case.
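As a hedged sketch of that idea, assuming the StackExchange.Redis client and a hypothetical Product type (neither is in the original question): keep every object in a single Redis hash keyed by id, so GetById is one HGET and a full Get could be one HGETALL on the same cached data.

```csharp
// Sketch only: assumes the StackExchange.Redis NuGet package and a
// hypothetical Product type; adapt the names to your own model.
using System.Text.Json;
using StackExchange.Redis;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ProductCache
{
    private readonly IDatabase _db;

    public ProductCache(IConnectionMultiplexer redis) => _db = redis.GetDatabase();

    // One hash holds every product, keyed by id, so GetById and a
    // full listing are both served from the same cached structure.
    public void Set(Product p) =>
        _db.HashSet("products", p.Id, JsonSerializer.Serialize(p));

    public Product GetById(int id)
    {
        RedisValue v = _db.HashGet("products", id);
        return v.IsNullOrEmpty ? null : JsonSerializer.Deserialize<Product>((string)v);
    }
}
```

The key point is that both read paths share one cache entry, so a write only has to touch one place.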
One more thing I am confused about in this scenario: should I remove
the cache entry or update it whenever a PUT or POST operation happens?
To answer the second part of your first question and this one: you need to choose between a write-back and a write-through cache. You can read more about write-back and write-through caches in this article. Basically, there are three approaches:
1. Each cache entry has a TTL, and after it expires you fetch the data from the data source again. The pro is that your POST and PUT operations will be faster, since they do not need to update the cache; the con is that the data may be stale for some time.
2. Invalidate the appropriate entry in the cache whenever a POST or PUT operation happens.
3. Update the cache entry at the time of the POST or PUT.
In terms of write/update latency, option 1 is the fastest but carries the risk of serving stale data. Option 2 slows down both GET and PUT/POST operations, and option 3 slows down the write operations.
Your choice should depend on the ratio of read to write operations in the system. If the system you are designing is read-heavy, then option 3 is better than option 2.
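A minimal sketch of options 2 and 3, assuming Microsoft.Extensions.Caching.Memory and a hypothetical Item/IItemRepository pair (not from the question); the only difference between the two options is what the write path does with the cache entry:

```csharp
// Sketch only: IItemRepository and Item are hypothetical placeholders
// for your repository and model.
using System;
using Microsoft.Extensions.Caching.Memory;

public class ItemService
{
    private readonly IMemoryCache _cache;
    private readonly IItemRepository _repo;

    public ItemService(IMemoryCache cache, IItemRepository repo)
    {
        _cache = cache;
        _repo = repo;
    }

    // Option 1 flavor: entries carry a TTL, so stale data ages out on its own.
    public Item Get(int id) =>
        _cache.GetOrCreate($"item:{id}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(1);
            return _repo.GetById(id);
        });

    public void Update(Item item)
    {
        _repo.Update(item);
        // Option 2 (invalidate): the next Get re-fetches from the source.
        _cache.Remove($"item:{item.Id}");
        // Option 3 (write-through) would instead refresh the entry:
        // _cache.Set($"item:{item.Id}", item, TimeSpan.FromHours(1));
    }
}
```

For a read-heavy system the commented-out write-through line keeps hot entries warm at the cost of slightly slower writes.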


ASP.NET Core distributed caching

I am currently using MemoryCache _cache = new MemoryCache(new MemoryCacheOptions()); for caching some data from the database that does not change very often, but it does change.
On create/update/delete of that data I refresh the cache.
This works fine, but the problem is that in production we will have a few nodes, so when the method for creating a record is called, for instance, the cache will be refreshed only on that node, not on the others, and they will have stale data.
My question is: can I somehow fix this using MemoryCache, or do I need to do something else? And if so, what are the possible solutions?
I think what you are looking for is distributed caching.
Using the IDistributedCache interface you can use either Redis or SQL Server as the backing store; it supplies basic Get/Set/Remove methods, and changes made on one node will be visible to the other nodes.
Redis is a great way of sharing session-type data between servers in a load-balanced environment. SQL Server does not seem to be a great fit, given that you are caching precisely to avoid database calls.
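For illustration, a hedged sketch of the IDistributedCache usage described above, assuming the Redis provider is registered at startup (e.g. services.AddStackExchangeRedisCache) and using a hypothetical cache key and Item type:

```csharp
// Sketch only: Item is a hypothetical model; the "catalog:items" key
// and the 10-minute TTL are illustrative choices, not requirements.
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

public class CatalogCache
{
    private readonly IDistributedCache _cache;

    public CatalogCache(IDistributedCache cache) => _cache = cache;

    public async Task<List<Item>> GetItemsAsync(Func<Task<List<Item>>> loadFromDb)
    {
        // All nodes share this entry, so a refresh on one node is seen by all.
        byte[] cached = await _cache.GetAsync("catalog:items");
        if (cached != null)
            return JsonSerializer.Deserialize<List<Item>>(cached);

        List<Item> items = await loadFromDb();
        await _cache.SetAsync("catalog:items",
            JsonSerializer.SerializeToUtf8Bytes(items),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
            });
        return items;
    }
}
```

Because IDistributedCache works in bytes, the serialization step shown here is part of the cost to weigh against a plain database call.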
It might also be worth considering whether you are actually complicating things by caching in the first place. With a single application you see the benefit, as keeping the records in application memory saves a request over the network; but in a load-balanced scenario you have to compare retrieving those records from a distributed cache against retrieving them from the database.
If the data is just an in memory copy of a relatively small database table, then there is probably not a lot to choose performance wise between the two. If the data is based on a complicated expensive query then the cache is the way to go.
If you are making hundreds of requests a minute for the data, then any network request may be too much, but consider the consequences of the data being a little stale. For example, if you update a record and the new record is not immediately available on every server, does your application break, or does the change just propagate in a more phased way? In the latter case you could keep your in-process memory cache and simply use a shorter time to live.
If you really need every change to propagate to every node straight away then you could consider using a library like Cache Manager in conjunction with Redis which can combine an in memory cache and synchronisation with a remote cache.
Somewhat dated question, but maybe still useful: I agree with what ste-fu said, well explained.
I'll only add that, on top of CacheManager, you may want to take a look at FusionCache ⚡🦥, which I recently released.
On top of transparently managing an optional distributed second layer for you, it also has some other nice features, like an optimization that prevents multiple concurrent factory executions for the same cache key (less load on the source database), a fail-safe mechanism, and advanced timeouts with background factory completion.
If you give it a chance, please let me know what you think.
/shameless-plug

How to manage frequent data access in .net application?

I have three tables in my SQL database, say Specials, Businesses, and Comments. On my master page I have a prompt area where I need to display alternating data from these three tables, based on certain conditions, on each page refresh (the tables have more than 1000 records). In that case, what is the best way to retrieve data from these tables?
Accessing the database on every request is not a good idea, I know. Is there any other good way to do this, like caching or some other technique, to manage it effectively? Right now the page takes too long to load after each refresh.
Please give your suggestions.
At present my plan is to create a stored procedure for data retrieval and to keep the returned value in a Session,
so that we can access the data from the session rather than going to the DB on each page refresh. But I do not know whether there is a more effective way to accomplish the same.
Accessing data each time from the database is not a good idea
That is not always true; it depends on how frequently the data changes. If you choose to cache the data, you will have to revalidate it every time the data changes. I am assuming you do not want to display a static count or something that, once displayed, will not change. If that is not the case, you can simply store the value in cookies and display it from there.
Now it takes too much time to load the page after each page refresh.
Do you know what takes too much time? Is it client-side or server-side code (use Glimpse to find out)? If server-side, is it the code that hits the DB and the query execution time, or is it server-side in-memory manipulation?
Generally, the first step in improving performance is to measure it precisely; to solve issues like this, you need to know where the problem is.
Based on your first statement, if I were you I would display each count in a separate div that is refreshed asynchronously. You could choose to update the data periodically using a timer, or even better, push it from the server (use SignalR). The update happens transparently, so no page reload is required.
Hope this helps.
I agree that 1000 records doesn't seem like a lot, but if you really aren't concerned about a slight delay, you may try the HttpContext.Cache object. It's very much like a dictionary with string keys and object values, with the addition that you can set expirations, etc.
Excuse typos, on mobile so no compile check:
var tableA = (List<MyRecord>)HttpContext.Current.Cache.Get("TableA");
if (tableA == null)
{
    // Cache miss: there was no copy in the cache, so build the object
    // with your database call (an array, a list, however you store your data)
    tableA = LoadTableAFromDatabase(); // hypothetical loader method
    // Add the item to the cache with a sliding expiration of 1 minute
    HttpContext.Current.Cache.Insert("TableA", tableA, null,
        Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(1));
}
Now, no matter how many requests come through, you only hit the database once a minute, or once per whatever interval you think is reasonable given your needs. You can also trigger removal of the item from the cache if some particular condition occurs.
One suggestion is to think of your database as a mere repository for persisting state. Your application tier can cache collections of your business objects, persist them when they change, and immediately return state to your presentation tier (the web page).
This assumes all updates to the data come from your pages. If the database is populated from different places, you will need to either tie everything into a common application tier or poll the database to update your cache.

WCF discard duplicate data

I'm writing a gateway between two services. One of them is a very slow web service that gets overloaded quickly; the other is super quick and frequently sends the same data.
I'd like my service to discard, at the earliest point possible, data I have received that is equal to previously received objects.
What is the best way to do this?
The best way I know of (which I doubt is the best) is to compare received objects, after deserialization, with the set of objects I've already received (a cache, in other words).
I care more about discarding as much as is computationally cheap to discard than about making sure I discard all duplicate data.
FYI, the data contains, among other things, geolocation information that is frequently the same.
Clarification:
Situation:
Service 1 is fast and frequently sends updates that have no new data.
Service 2 is slow
I want to send data from Service 1 to Service 2 (with some slight modifications), but only if I haven't already sent the same data.
Dale
It's hard to say what the best way is without a little more info, but it sounds like you could benefit from a relatively simple cache. I'm not sure if you're in a write-heavy or read-heavy scenario, but you should be able to make it work either way.
I.e., the quick service is called and checks for results in the cache before calling the slow service.
I think Kenneth provided a good idea.
Just to frame the problem, it sounds to me like you have something like this situation (clarify your question if this isn't correct)...
[Service 1] -> (Calls) -> [Service 2]
Service 1 - Faster, and overloads Service 2.
Service 1 - Sends repeated data to Service 2, so much of it can be ignored.
If this is the case, as Kenneth suggested, you may want to implement a caching mechanism for requests that have already been sent from Service 1, and store the answers as received from Service 2. In pseudocode, something like this:
Service 1 checks in some common storage to see if a request it is about to send has already been sent.
If it has, it checks to see the answer that was sent back from Service 2.
If it hasn't, it sends the request and adds that request to the list of requests that are sent.
When the answer is provided, that is stored in the cache along with the original request.
This will also have the benefit of making your lookups faster via Service 1.
The earliest possible point to detect the duplicate data is in the sender of the data, not in the receiver.
Add a memory cache there. If the object is a duplicate, there is no need to serialize it and waste time sending it.
By the way, if you want a good .NET memory cache, I've had good luck with Couchbase's Memcached.
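As a hedged sketch of that sender-side check (the DuplicateFilter name and JSON hashing are illustrative choices, not from the question): hash each outgoing payload and skip the send when the hash has already been seen.

```csharp
// Sketch only: hashes serialized payloads to detect repeats before sending.
// In production you would expire or trim the _seen dictionary over time.
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text.Json;

public class DuplicateFilter
{
    // Remembers hashes of payloads already sent, with the time they were seen.
    private readonly ConcurrentDictionary<string, DateTime> _seen = new();

    public bool ShouldSend<T>(T payload)
    {
        byte[] bytes = JsonSerializer.SerializeToUtf8Bytes(payload);
        string hash = Convert.ToHexString(SHA256.HashData(bytes));
        // TryAdd returns false if an identical payload was already sent,
        // so the caller can drop it before serialization for the wire.
        return _seen.TryAdd(hash, DateTime.UtcNow);
    }
}
```

Hashing is cheap relative to a call to the slow service, which fits the asker's goal of discarding whatever is computationally easy to discard.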

When is it appropriate to use CacheItemRemovedCallback?

I have a large data set that is updated once a day. I am caching the results of an expensive query on that data but I want to update that cache each day. I am considering using CacheItemRemovedCallback to reload my cache on a daily interval, but I had the following concerns:
Isn't it possible that CacheItemRemovedCallback could be called before my expiration (e.g., if the server runs out of memory)? That means reloading immediately doesn't seem like a good idea.
Does CacheItemRemovedCallback get called before or after the item is actually removed? If after, doesn't this theoretically leave a period of time when the cache would be unavailable?
Are these concerns relevant and if using CacheItemRemovedCallback to reload your cache is a bad idea, then when is it useful?
If you're going to reload, be sure to check the CacheItemRemovedReason. I recently had to debug an issue where a developer decided to immediately re-populate the cache in this method, and under low-memory conditions it basically sat chewing up CPU, stuck in a loop of building the cache objects, adding them to the cache, expiring, and repeating.
The callback is fired after the item is removed.
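A hedged sketch of that CacheItemRemovedReason check, assuming the classic System.Web.Caching API and a hypothetical BuildCacheItem helper: re-populate only when the removal was an expiration, not a memory-pressure eviction.

```csharp
// Sketch only: BuildCacheItem stands in for your expensive query; adapt
// key names and expirations to your daily-refresh scenario.
using System;
using System.Web;
using System.Web.Caching;

public static class DailyCache
{
    public static void Add(string key, object value)
    {
        HttpRuntime.Cache.Insert(key, value, null,
            DateTime.UtcNow.AddDays(1), Cache.NoSlidingExpiration,
            CacheItemPriority.Normal, OnRemoved);
    }

    private static void OnRemoved(string key, object value,
                                  CacheItemRemovedReason reason)
    {
        // Only rebuild on expiration. Under memory pressure the reason is
        // Underused, and reloading immediately would fight the eviction
        // (the CPU-churn loop described in the answer above).
        if (reason == CacheItemRemovedReason.Expired)
            Add(key, BuildCacheItem(key));
    }

    // Hypothetical placeholder for the expensive query.
    private static object BuildCacheItem(string key) => null;
}
```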
From everyone's responses and from further reading I have come to the following conclusion:
My concerns are valid. Using CacheItemRemovedCallback to refresh cached items is not a good idea. The only practical use for this callback seems to be logging information about when your cache is removed.
It seems that CacheItemUpdateCallback is the more appropriate way of refreshing your cache on a regular interval.
Ultimately, I have decided not to use either of these calls. Instead I will write a service action so the database import job can notify my application when it needs to refresh its data. This avoids using a timed refresh altogether.
Yes, there is a chance that the method could be fired for a variety of reasons. Whether you reload the cache immediately or wait depends on what is best for the typical use case in your application.
CacheItemRemovedCallback does indeed fire after the item is removed from the cache. Right before the item is removed, you can use the CacheItemUpdateCallback method to decide whether you want to flush the cache at that time. There may be good reasons to wait before flushing the cache, such as when you currently have users in your application and rebuilding the cache takes a long time.
Generally speaking, the best practice is to test that your cached item actually exists in the cache before using its data. If the data doesn't exist, you can rebuild the cache at that time (causing a slightly longer response for the user) or choose to do something else.
This really isn't so much a cache of individual values as it is a snapshot of an entire dataset. As such, you don't benefit from using the Cache class here.
I'd recommend loading a static collection on startup and replacing it every 24 hours by setting a timer. The idea would be to create a new collection and atomically assign it, as the old one may still be in use and we want it to remain self-consistent.
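A hedged sketch of that atomic-replacement idea, with a hypothetical Record type and LoadSnapshot query: readers always see a complete, self-consistent collection, because in .NET a reference assignment is atomic and the old snapshot stays valid for anyone still holding it.

```csharp
// Sketch only: Record and LoadSnapshot are placeholders for your model
// and the expensive daily query.
using System;
using System.Collections.Generic;
using System.Threading;

public class Record { }

public static class SnapshotHolder
{
    // Readers take the current reference; the collection is never mutated
    // in place, only replaced wholesale.
    private static volatile IReadOnlyList<Record> _snapshot = LoadSnapshot();

    // Rebuild and atomically swap in a fresh snapshot every 24 hours.
    private static readonly Timer _timer = new Timer(
        _ => _snapshot = LoadSnapshot(), null,
        TimeSpan.FromHours(24), TimeSpan.FromHours(24));

    public static IReadOnlyList<Record> Current => _snapshot;

    private static IReadOnlyList<Record> LoadSnapshot()
    {
        // Hypothetical: run the expensive query, return a new collection.
        return new List<Record>();
    }
}
```

The volatile field ensures threads see the new reference promptly; no locking is needed on the read path.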

Optimizing Smart Client Performance

I have a smart client (WPF) that makes calls to the server via services (WCF). The screen I am working on holds a list of objects that it loads when the constructor is called. I am able to add, edit, and delete records in the list.
Typically, after every add or delete I reload the entire model from the service. There are a number of reasons for this, including the fact that the data may have changed on the server between calls.
This approach has proved to be a big hit on performance, because I am loading everything and sending the list up and down the wire on every add and edit.
What other options are open to me? Should I only send the required information to the server, and how would I go about not reloading all the data every time an add or delete is performed?
The optimal way of doing what you're describing (I'm going to assume that you know that client/server I/O is the bottleneck already) is to send only changes in both directions once the client is populated.
This can be straightforward if you've adopted a journaling model for updates to the data. In order for any process to make a change to the shared data, it has to create a time-stamped transaction that gets added to a journal. The update to the data is made by a method that applies the transaction to the data.
Once your data model supports transaction journals, you have a straightforward way of keeping the client and server in sync with a minimum of network traffic: to update the client, the server sends all of the journal entries that have been created since the last time the client was updated.
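A hedged sketch of that journaling idea, with hypothetical JournalEntry and Journal types (not from the answer): the server records time-stamped entries for every change, and a client asks only for entries newer than its last sync.

```csharp
// Sketch only: a single-process illustration of a transaction journal.
// In production this would need locking or a concurrent collection,
// plus persistence of the journal itself.
using System;
using System.Collections.Generic;
using System.Linq;

public record JournalEntry(DateTime Timestamp, string Operation,
                           int RecordId, string Payload);

public class Journal
{
    private readonly List<JournalEntry> _entries = new();

    // Every change to the shared data goes through here, so the journal
    // is a complete history of updates.
    public void Append(string op, int id, string payload) =>
        _entries.Add(new JournalEntry(DateTime.UtcNow, op, id, payload));

    // The client sends its last-sync time and receives only the delta,
    // instead of the whole model going up and down the wire.
    public IReadOnlyList<JournalEntry> Since(DateTime lastSync) =>
        _entries.Where(e => e.Timestamp > lastSync).ToList();
}
```

Applying the returned entries in timestamp order brings the client's copy up to date with minimal traffic.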
This can be a considerable amount of work to retrofit into an existing design. Before you go down this road, you want to be sure that the problem you're trying to fix is in fact the problem that you have.
Make sure this functionality is well-encapsulated so you can play with it without having to touch other components.
Have your source under version control and check in often.
I highly recommend having a suite of automated unit tests to verify that everything works as expected before refactoring and continues to work as you perform each change.
If the performance hit is in the server-to-client transfer of data, more so than in the querying, processing, and disk I/O on the server, you could consider computing a hash of a given collection or graph of objects and passing the hash to a service method on the server, which would query, calculate the hash from the DB, compare the hashes, and return true or false. Only if it returns false would you then reload the data.
This works when changes are unlikely or infrequent, because it requires two calls to get the data when it has changed. If changes in the DB are a concern, you might not want to check for changes only when the user modifies or adds something; that check might be a completely separate action driven by a timer, for example.
Your concurrency strategy really depends on your data, the number of users, the likelihood of more than one user wanting to change the same data at the same time, and so on.
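A hedged sketch of that hash comparison, assuming both sides serialize the object graph the same way (the DataVersion name and JSON-over-SHA-256 choice are illustrative): the client sends its hash, and the server replies with a single boolean instead of the full data set.

```csharp
// Sketch only: both client and server must serialize deterministically
// for the hashes to be comparable.
using System;
using System.Security.Cryptography;
using System.Text.Json;

public static class DataVersion
{
    // Compute a stable fingerprint of a collection or object graph.
    public static string ComputeHash<T>(T data)
    {
        byte[] bytes = JsonSerializer.SerializeToUtf8Bytes(data);
        return Convert.ToHexString(SHA256.HashData(bytes));
    }

    // Server-side check: true means the client's copy is still current,
    // so only a boolean crosses the wire instead of the whole list.
    public static bool IsCurrent<T>(T serverData, string clientHash) =>
        ComputeHash(serverData) == clientHash;
}
```

When IsCurrent returns false, the client makes its second call to fetch the full data, which is the two-call cost the answer warns about.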
