I'm building a WCF translation service. The service uses Google Translate's web API.
To avoid re-fetching commonly searched queries, the service keeps an in-memory cache of recent searches.
The search function first checks the cache, and only then sends a request to Google.
One last important detail: every time a request arrives, I construct a string to be its key. It is made up of the search term and the two language codes, so it is unique per specific search.
The question is this:
Suppose two identical requests arrive at the same time. I would like to lock the second one out of the whole search function, so that when it enters it will find the search result that was already put into the cache by the first one. If a different request arrives, I want to let it in.
My idea was to lock on the string I constructed, as it is unique per exact search (same search term, same languages). Will this give the result I described?
Thanks for reading this far (:
See the code example below - and again thanks!
public async Task<TranslationResult> TranslateAsync(Langueges From, Langueges To, string Content)
{
    string key = Content + From.ToString() + To.ToString();
    lock (key)
    {
        // look for the search result in the cache
        // if not found, request from google and store the result in the memory cache
    }
}
Something like this?
// this cache would probably be the application-level cache in ASP.NET,
// but a Dictionary serves as the example here
Dictionary<string, TranslationResult> cache = new ...
private static object syncRoot = new Object();

if (!cache.ContainsKey(key))
{
    lock (syncRoot)
    {
        // this double check is not a mistake:
        // a second request re-checks for the key
        // once it acquires the lock
        if (!cache.ContainsKey(key))
        {
            // send the request to google, then store the result
        }
        else
        {
            return cache[key];
        }
    }
}
else
{
    return cache[key];
}
I don't think you'll get the behavior that you're looking for: the variable "key" will always be a newly created object and the current thread will always be granted the lock right away. I don't believe string interning will be in play here because the "key" variable is not a literal. If "key" were an interned string, then it might work like you want, but that would be a strange side-effect.
What to do? Consider using the ConcurrentDictionary.GetOrAdd method. Your "key" variable will become the key argument. If the key is not in the cache, the valueFactory delegate is called; your valueFactory implementation should send the call to Google. ConcurrentDictionary is thread-safe, so no explicit locking is needed on your part.
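As a rough sketch of this approach (TranslationResult and FetchFromGoogleAsync are stand-in names, not real API members): wrapping the value in Lazy&lt;Task&lt;T&gt;&gt; is a common refinement, because GetOrAdd may invoke its factory more than once under a race, but only one Lazy wins and only that one's task actually calls the translation API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class TranslationResult
{
    public string Text { get; set; }
}

public class TranslationCache
{
    // Even if GetOrAdd runs the factory twice under a race, only one
    // Lazy is published, and only its task performs the expensive call.
    private readonly ConcurrentDictionary<string, Lazy<Task<TranslationResult>>> _cache
        = new ConcurrentDictionary<string, Lazy<Task<TranslationResult>>>();

    public Task<TranslationResult> TranslateAsync(string from, string to, string content)
    {
        string key = content + from + to;
        var lazy = _cache.GetOrAdd(key,
            _ => new Lazy<Task<TranslationResult>>(() => FetchFromGoogleAsync(from, to, content)));
        return lazy.Value;
    }

    // Hypothetical placeholder for the real call to the translation API.
    private Task<TranslationResult> FetchFromGoogleAsync(string from, string to, string content)
    {
        return Task.FromResult(new TranslationResult { Text = "..." });
    }
}
```

Two identical requests arriving together will both await the same task, which is exactly the "let the second one find the first one's result" behavior asked for.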
If two identical requests arrive at the same time, just return one of them with the original (untranslated) text :) You can then detect which one was not translated and send it again. I've done it this way before.
Related
I have a web api in C#.
There is a function which can be called a few times from the client (asynchronously, of course) with the same params.
What I need is a way to check if the function was just called with the same params and, if so, skip some code (a heavy action that there is no need to do twice).
I tried adding a list to HttpContext.Current.Application, and at the start of the function checking whether the list contains the param:
If it contains it - skip it.
If not - add the param to the list and perform the action.
At the end of the function, remove the param from the list.
However, this didn't work, as the code is being called from a few different places in the client (asynchronously).
So sometimes one call reaches the "if" line while the param is not in the list yet; then, before it adds the param to the list, the second call reaches the "if" line. The param is still not in the list, so both checks are true and both calls get into the if.
public void DoAction(string param)
{
    try
    {
        if (HttpContext.Current.Application["CurrentDoActionParams"] == null)
        {
            HttpContext.Current.Application.Add("CurrentDoActionParams", new List<string>());
        }
        if (!((List<string>)HttpContext.Current.Application["CurrentDoActionParams"]).Contains(param))
        {
            ((List<string>)HttpContext.Current.Application["CurrentDoActionParams"]).Add(param);
            //.....
            // do heavy action here
            //.....
        }
    }
    catch (Exception)
    {
        throw; // rethrow without resetting the stack trace
    }
    finally
    {
        if (HttpContext.Current.Application["CurrentDoActionParams"] != null)
        {
            if (((List<string>)HttpContext.Current.Application["CurrentDoActionParams"]).Contains(param))
            {
                ((List<string>)HttpContext.Current.Application["CurrentDoActionParams"]).Remove(param);
            }
        }
    }
}
Is there a way to achieve this? What is the correct way?
Can you implement a cache, and set a suitable expiration option?
Your action can then check the cache to see if the item already exists and return it; if not present it can perform the necessary actions to generate a fresh item for the user, then add it to the cache.
An application-level cache is a good way to ensure this.
An example is here: https://learn.microsoft.com/en-us/dotnet/framework/performance/caching-in-net-framework-applications
The key could be the name of the method combined with the specific parameter values.
If your method for example looks something like this :
public int MyMethod(int firstParam, int secondParam)
then your cache key, when you call the method with values 3,4 could be:
MyMethod_3_4
and the data is whatever the result of that method is. This way, when the method is called, you build the key and go check the cache:
Is it there? Get the data from the cache using the unique key.
It isn't there? Run the method, get the data, and cache it.
Make your cache expire after whatever value makes sense to you.
Now, only do this if you do not have many methods and huge amounts of data.
If you want to store considerably more data then use a dedicated caching system, something like Redis, NCache or whatever else you deem appropriate.
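A minimal sketch of the composed-key idea using MemoryCache from System.Runtime.Caching (the method body and the 10-minute expiration are illustrative assumptions, not part of the original answer):

```csharp
using System;
using System.Runtime.Caching;

public class CachedCalculator
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public int MyMethod(int firstParam, int secondParam)
    {
        // Compose a key that is unique per method + parameter values,
        // e.g. "MyMethod_3_4" for MyMethod(3, 4).
        string key = $"MyMethod_{firstParam}_{secondParam}";

        if (Cache.Get(key) is int cached)
        {
            return cached; // cache hit: skip the heavy work
        }

        int result = firstParam + secondParam; // stand-in for the real, expensive work

        // Absolute expiration: pick whatever lifetime makes sense for your data.
        Cache.Set(key, result, DateTimeOffset.Now.AddMinutes(10));
        return result;
    }
}
```

Note that this alone does not stop two concurrent callers from both doing the heavy work on a cold cache; combine it with a lock (as suggested below) if that matters.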
Firstly, if you're doing it that way, I'd recommend a HashSet<string>, as the lookup will be faster and it is designed for this kind of thing. Make sure you put a lock around checking and adding stuff to the hash set.
@marhls suggested a cache, and this is pretty easy to do and comes out of the box. Just make sure that, if something is not in the cache, you use a lock when adding it, so that you don't get the race conditions you're currently experiencing.
I'm having some trouble understanding the basic concepts of locking in a multi-user / web application.
When a user gets authorized by our federation, he'll return with a username claim, which we'll then use to retrieve some extra info about him like so:
var claimsIdentity = (ClaimsIdentity)HttpContext.Current.User.Identity;
if (!claimsIdentity.HasClaim(CustomClaims.UserId)) // if not set, retrieve it from the database
{
    // this can take some time
    var userId = retrieveUserId(claimsIdentity.FindFirst(ClaimTypes.NameIdentifier));

    // because the previous call could take some time, it's possible we add
    // the claim multiple times during concurrent requests
    claimsIdentity.AddClaim(new Claim(CustomClaims.UserId, userId));
}
As indicated in the code, having duplicate claims isn't really what I'm looking for, so I thought I'd lock everything around the check whether the claim exists or not:
private static readonly object _authorizeLock = new object();
...
lock (_authorizeLock)
{
    if (!claimsIdentity.HasClaim(CustomClaims.UserId)) // if not set, retrieve it from the database
    {
        ...
    }
}
However, this doesn't feel right. Wouldn't this lock be for all incoming requests? This would mean that even authorized users would still have to "wait", even though their info has already been retrieved.
Does anybody have an idea how I could best deal with this?
Answer 1:
Get over it and live with duplicate entries.
Answer 2:
If you have sessions turned on, you get implicit locking between requests from the same user (session) by accessing the session storage. Simply add a dummy:
Session["TRIGGER_SESSION_LOCKING_DUMMY"] = true;
Answer 3:
Implement some custom locking on an object indexed by your Identity. Something like this
lock(threadSafeStaticDictionary[User.Identity.Name]) { ... }
Answer 4:
Lock on the Identity object directly (which should be shared since you get duplicates) (though it is not recommended)
lock(User.Identity)
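For Answer 3's per-identity locking, a ConcurrentDictionary makes the thread-safe static dictionary straightforward. A sketch (note the dictionary gains one entry per user name and is never trimmed, which is usually acceptable but worth knowing):

```csharp
using System.Collections.Concurrent;

public static class UserLocks
{
    // One lock object per user name. GetOrAdd is thread-safe, so two
    // concurrent requests for the same user always receive the same object.
    private static readonly ConcurrentDictionary<string, object> Locks
        = new ConcurrentDictionary<string, object>();

    public static object For(string userName)
    {
        return Locks.GetOrAdd(userName, _ => new object());
    }
}

// Usage: only requests for the same identity serialize; all others proceed freely.
// lock (UserLocks.For(User.Identity.Name))
// {
//     if (!claimsIdentity.HasClaim(CustomClaims.UserId)) { ... }
// }
```

This addresses the concern in the question: already-authorized users never contend for a global lock, only for their own.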
Are the following assumptions valid for this code? I put some background info under the code, but I don't think it's relevant.
Assumption 1: Since this is a single application, I'm making the assumption it will be handled by a single process. Thus, static variables are shared between threads, and declaring my collection of lock objects statically is valid.
Assumption 2: If I know the value is already in the dictionary, I don't need to lock on read. I could use a ConcurrentDictionary, but I believe this one will be safe since I'm not enumerating (or deleting), and the value will exist and not change when I call UnlockOnValue().
Assumption 3: I can lock on the Keys collection, since that reference won't change, even if the underlying data structure does.
private static Dictionary<String, Object> LockList =
    new Dictionary<string, object>();

private void LockOnValue(String queryStringValue)
{
    lock (LockList.Keys)
    {
        if (!LockList.Keys.Contains(queryStringValue))
        {
            LockList.Add(queryStringValue, new Object());
        }
        System.Threading.Monitor.Enter(LockList[queryStringValue]);
    }
}

private void UnlockOnValue(String queryStringValue)
{
    System.Threading.Monitor.Exit(LockList[queryStringValue]);
}
Then I would use this code like:
LockOnValue(Request.QueryString["foo"])
//Check cache expiry
//if expired
//Load new values and cache them.
//else
//Load cached values
UnlockOnValue(Request.QueryString["foo"])
Background: I'm creating an app in ASP.NET that downloads data based on a single user-defined variable in the query string. The number of values will be quite limited. I need to cache the results for each value for a specified period of time.
Approach: I decided to use local files to cache the data, which is not the best option, but I wanted to try it since this is non-critical and performance is not a big issue. I used 2 files per option, one with the cache expiry date, and one with the data.
Issue: I'm not sure what the best way to do locking is, and I'm not overly familiar with threading issues in .NET (one of the reasons I chose this approach). Based on what's available, and what I read, I thought the above should work, but I'm not sure and wanted a second opinion.
Your current solution looks pretty good. The two things I would change:
1: UnlockOnValue needs to go in a finally block. If an exception is thrown, it will never release its lock.
2: LockOnValue is somewhat inefficient, since it does a dictionary lookup twice. This isn't a big deal for a small dictionary, but for a larger one you will want to switch to TryGetValue.
Also, your assumption 3 holds - at least for now. But the Dictionary contract makes no guarantee that the Keys property always returns the same object. And since it's so easy to not rely on this, I'd recommend against it. Whenever I need an object to lock on, I just create an object for that sole purpose. Something like:
private static Object _lock = new Object();
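Putting those two fixes together, a revised LockOnValue might look like this sketch (reusing the question's names; the dedicated lock object replaces locking on LockList.Keys):

```csharp
using System.Collections.Generic;

public class PerValueLocker
{
    private static readonly Dictionary<string, object> LockList =
        new Dictionary<string, object>();

    // dedicated lock object guarding the dictionary itself
    private static readonly object _lock = new object();

    public void LockOnValue(string queryStringValue)
    {
        object valueLock;
        lock (_lock)
        {
            // TryGetValue: one dictionary lookup instead of two
            if (!LockList.TryGetValue(queryStringValue, out valueLock))
            {
                valueLock = new object();
                LockList.Add(queryStringValue, valueLock);
            }
        }
        System.Threading.Monitor.Enter(valueLock);
    }

    public void UnlockOnValue(string queryStringValue)
    {
        object valueLock;
        lock (_lock)
        {
            LockList.TryGetValue(queryStringValue, out valueLock);
        }
        System.Threading.Monitor.Exit(valueLock);
    }

    // Call site: UnlockOnValue belongs in a finally block, e.g.
    //   LockOnValue(value);
    //   try { /* check cache expiry, load values */ }
    //   finally { UnlockOnValue(value); }
}
```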
lock only has a scope of a single process. If you want to span processes you'll have to use primitives like Mutex (named).
lock is equivalent to a Monitor.Enter/Monitor.Exit pair, so also calling Monitor.Enter and Monitor.Exit yourself inside a lock is redundant.
You don't need to lock on read, but you do have to lock the "transaction" of checking if the value doesn't exist and adding it. If you don't lock on that series of instructions, something else could come in between when you check for the key and when you add it and add it--thus resulting in an exception. The lock you're doing is sufficient to do that (you don't need the additional calls to Enter and Exit--lock will do that for you).
Now that Stack Overflow uses redis, do they handle cache invalidation the same way? i.e. a list of identities hashed to a query string + name (I guess the name is some kind of purpose or object type name).
Perhaps they then retrieve individual items that are missing from the cache directly by id (which bypasses a bunch of database indexes and uses the more efficient clustered index instead perhaps). That'd be smart (the rehydration that Jeff mentions?).
Right now, I'm struggling to find a way to pivot all of this in a succinct way. Are there any examples of this kind of thing that I could use to help clarify my thinking prior to doing a first cut myself?
Also, I'm wondering where the cutoff is between using a .net cache (System.Runtime.Caching or System.Web.Caching) and going out and using redis. Or is Redis just hands down faster?
Here's the original SO question from 2009:
https://meta.stackexchange.com/questions/6435/how-does-stackoverflow-handle-cache-invalidation
A couple of other links:
https://meta.stackexchange.com/questions/69164/does-stackoverflow-use-caching-and-if-so-how/69172#69172
https://meta.stackexchange.com/questions/110320/stack-overflow-db-performance-and-redis-cache
I honestly can't decide if this is a SO question or a MSO question, but:
Going off to another system is never faster than querying local memory (as long as it is keyed); simple answer: we use both! So we use:
local memory
else check redis, and update local memory
else fetch from source, and update redis and local memory
This then, as you say, causes an issue of cache invalidation - although actually that isn't critical in most places. But for this - redis events (pub/sub) allow an easy way to broadcast keys that are changing to all nodes, so they can drop their local copy - meaning: next time it is needed we'll pick up the new copy from redis. Hence we broadcast the key-names that are changing against a single event channel name.
Tools: redis on ubuntu server; BookSleeve as a redis wrapper; protobuf-net and GZipStream (enabled / disabled automatically depending on size) for packaging data.
So: the redis pub/sub events are used to invalidate the cache for a given key from one node (the one that knows the state has changed) immediately (pretty much) to all nodes.
Regarding distinct processes (from comments, "do you use any kind of shared memory model for multiple distinct processes feeding off the same data?"): no, we don't do that. Each web-tier box is only really hosting one process (of any given tier), with multi-tenancy within that, so inside the same process we might have 70 sites. For legacy reasons (i.e. "it works and doesn't need fixing") we primarily use the http cache with the site-identity as part of the key.
For the few massively data-intensive parts of the system, we have mechanisms to persist to disk so that the in-memory model can be passed between successive app-domains as the web naturally recycles (or is re-deployed), but that is unrelated to redis.
Here's a related example that shows the broad flavour only of how this might work - spin up a number of instances of the following, and then type some key names in:
static class Program
{
    static void Main()
    {
        const string channelInvalidate = "cache/invalidate";
        using (var pub = new RedisConnection("127.0.0.1"))
        using (var sub = new RedisSubscriberConnection("127.0.0.1"))
        {
            pub.Open();
            sub.Open();

            sub.Subscribe(channelInvalidate, (channel, data) =>
            {
                string key = Encoding.UTF8.GetString(data);
                Console.WriteLine("Invalidated {0}", key);
            });

            Console.WriteLine(
                "Enter a key to invalidate, or an empty line to exit");
            string line;
            do
            {
                line = Console.ReadLine();
                if (!string.IsNullOrEmpty(line))
                {
                    pub.Publish(channelInvalidate, line);
                }
            } while (!string.IsNullOrEmpty(line));
        }
    }
}
What you should see is that when you type a key-name, it is shown immediately in all the running instances, which would then dump their local copy of that key. Obviously in real use the two connections would need to be put somewhere and kept open, so would not be in using statements. We use an almost-a-singleton for this.
I am running a server, and I would like to have a users dictionary, and give each user a specific number.
Dictionary<int,ServerSideUser> users = new Dictionary<int,ServerSideUser>();
The key represents the user on the server, so when people send messages to that user, they send them to this number. I might as well have used the user's IP address, but that's not that good an idea.
I need to allocate such a number for each user, and I'm really not sure how to do so. Someone suggested something like
Enumerable.Range(int.MinValue, int.MaxValue)
.Except(users.Select(x => x.Key)).First();
but I really don't think it's the optimal way.
Also, I have the same problem with a List (or LinkedList) somewhere else.
Any ideas?
If the size of the "number" doesn't matter, take a Guid, it will always be unique and non-guessable.
If you want a dictionary that uses an arbitrary, ordered integer key, you may also be able to use a List<ServerSideUser>, in which the list index serves as the key.
Is there a specific reason you need to use a Dictionary?
Using a List<> or similar data structure definitely has limitations. Because of concurrency issues, you wouldn't want to remove users from the list at all, except when cycling the server. Otherwise, you might have a scenario in which user 255 sends a message to user 1024, who disconnects and is replaced by a new user 1024. New user 1024 then receives the message intended for old user 1024.
If you want to be able to manage the memory footprint of the user list, many of the other approaches here work; Will's answer is particularly good if you want to use ints rather than Guids.
Why don't you keep track of the current maximum number and increment that number by one every time a new user is added?
Another option: Use a factory to generate ServerSideUser instances, which assigns a new, unique ID to each user.
In this example, the factory is the class itself. You cannot instantiate the class, you must get a new instance by calling the static Create method on the type. It increments the ID generator and creates a new instance with this new id. There are many ways to do this in a thread safe way, I'm doing it here in a rudimentary 1.1-compatible way (c# pseudocode that may actually compile):
public class ServerSideUser
{
    // user id
    public int Id { get; private set; }

    // private constructors
    private ServerSideUser() { }
    private ServerSideUser(int id) { Id = id; }

    // lock object for generating an id
    private static object _idgenLock = new Object();
    private static int _currentId = 0; // or whatever

    // retrieves the next id; thread safe
    private static int CurrentId
    {
        get { lock (_idgenLock) { _currentId += 1; return _currentId; } }
    }

    public static ServerSideUser Create()
    {
        return new ServerSideUser(CurrentId);
    }
}
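On .NET 2.0 and later, an alternative to the lock-based CurrentId property is Interlocked.Increment, which makes the id generator lock-free. A sketch of the same factory:

```csharp
using System.Threading;

public class ServerSideUser
{
    private static int _currentId; // shared counter, starts at 0

    public int Id { get; private set; }

    private ServerSideUser(int id) { Id = id; }

    public static ServerSideUser Create()
    {
        // Interlocked.Increment atomically bumps the counter and returns
        // the new value, so no explicit lock is needed just to hand out ids.
        return new ServerSideUser(Interlocked.Increment(ref _currentId));
    }
}
```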
I suggest a combination of your approach and an incremental one.
Since your data is in memory, an identifier of type int is enough.
Keep a variable for the next user id and a linked list of free identifiers.
When a new user is added, take an id from the list. If the list is empty, use the variable and increment it.
When a user is removed, add its identifier to the free list.
P.S. Consider using a database.
First of all, I'd also start by seconding the GUID suggestion. Secondly, I'd assume that you're persisting the user information on the server somehow, and that somehow is likely a database. If this is the case, why not let the database pick a unique ID for each user via a primary key? Maybe it's not the best choice for what you're trying to do here, but this is the kind of problem that databases have been handling for years, so, why re-invent?
I think it depends on how you define the "uniqueness" of the clients.
For example, if you have two different clients on the same machine, do you consider them two clients or one?
I recommend using a long value representing the time of connection establishment, like "hhmmss", or you can even include milliseconds.
Why not just start from 1 and count upwards?
lock (dict)
{
    int newId = dict.Count + 1;
    dict[newId] = new User();
}
If you're really concerned about half the world's population turning up at your one server, try using longs instead.. :-D
Maybe a bit brutal, but could DateTime.Now.Ticks be something for you? As an added bonus, you know when the user was added to your dict.
From the MSDN docs on Ticks...
A single tick represents one hundred nanoseconds or one ten-millionth of a second. There are 10,000 ticks in a millisecond.
The value of this property represents the number of 100-nanosecond intervals that have elapsed since 12:00:00 midnight, January 1, 0001, which represents DateTime.MinValue.