I just read this SO question: "Which design is most preferable: test-create, try-create, create-catch?"
Judging by the answers, it seems devs prefer the "Try-Create" pattern, and some of them mentioned that TryCreate(user, out resultCode) can be thread-safe, while the other patterns cannot.
Try-Create
enum CreateUserResultCode
{
    Success,
    UserAlreadyExists,
    UsernameAlreadyExists
}

if (!TryCreate(user, out CreateUserResultCode resultCode))
{
    switch (resultCode)
    {
        case CreateUserResultCode.UserAlreadyExists:
            // act on "user exists" error
            break;
        case CreateUserResultCode.UsernameAlreadyExists:
            // act on "username exists" error
            break;
    }
}
I am wondering: if the TryCreate method involves multiple DB calls, what is the proper way to make it thread-safe in real practice?
Say TryCreate will do 2 things:
Check whether the username already exists in the DB
If the name does not exist, create a new one
It is very possible that one thread finishes step 1 but not step 2 while another thread invokes TryCreate and also finishes step 1, which is a race condition.
Surely, in TryCreate I can add a lock or something, but that would make TryCreate a hot spot. On a high-traffic website, every new registration would have to wait for the lock in TryCreate.
What I see on other websites is that when you type in your username, it triggers an AJAX call to check whether it already exists in the current DB, and then you go to the next step to create it. (I'm not sure whether a lock is taken at that moment.)
Any thoughts on how to implement a properly safe TryCreate involving multiple DB calls in real life?
Update 1:
The logic of TryCreate can be very complicated and involve more than just 2 DB calls.
Thread safety doesn't extend to the database. Whatever you do on the client side has no effect on the database side, purely because, by design, a database supports multiple concurrent connections from many clients.
Instead, use transactions to perform several actions as an atomic operation on the database end.
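For illustration, here is a minimal sketch of that idea, assuming SQL Server via ADO.NET, a hypothetical Users table with a unique index on Username, and a User type exposing Username and Email. The existence check and the insert run as one atomic statement, so the database arbitrates the race instead of the application:

using System.Data;
using System.Data.SqlClient;

static bool TryCreate(SqlConnection conn, User user, out CreateUserResultCode resultCode)
{
    const string sql = @"
        INSERT INTO Users (Username, Email)
        SELECT @username, @email
        WHERE NOT EXISTS (SELECT 1 FROM Users WHERE Username = @username);";

    using (var tx = conn.BeginTransaction(IsolationLevel.Serializable))
    using (var cmd = new SqlCommand(sql, conn, tx))
    {
        cmd.Parameters.AddWithValue("@username", user.Username);
        cmd.Parameters.AddWithValue("@email", user.Email);
        try
        {
            int rows = cmd.ExecuteNonQuery();
            tx.Commit();
            resultCode = rows == 1
                ? CreateUserResultCode.Success
                : CreateUserResultCode.UsernameAlreadyExists;
            return rows == 1;
        }
        catch (SqlException ex) when (ex.Number == 2627 || ex.Number == 2601)
        {
            // unique constraint/index violation: another caller won the race
            tx.Rollback();
            resultCode = CreateUserResultCode.UsernameAlreadyExists;
            return false;
        }
    }
}

No client-side lock is needed and TryCreate never becomes a hot spot: concurrent callers either insert zero rows or hit the unique index, and both outcomes map onto the existing result codes.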
Related
I am using Cosmos DB and wanted to implement a user-friendly ID to display in the UI for the end user. To do this, I take the max ID existing in the DB and add 1 to it.
The problem I am facing is that when multiple users hit this function, the max ID returned is the same and the newly generated ID gets duplicated.
How can I make sure that a certain block of code is only executed one at a time?
I tried SemaphoreSlim, but it didn't help.
I am expecting to generate an auto-incremented ID without any duplication.
Cosmos DB does not provide an auto-increment id feature because it is impossible to do so in a database like this while maintaining scalability. The first comment on your question is the best advice: just don't use them. Anything you implement to do this will always be a performance bottleneck, and your database will never scale. Also, if you try to scale out and use multiple SDK instances, any mutex you implement would have to be re-implemented as a distributed mutex across multiple compute instances, making performance even worse.
If the container is designed to be a key-value store, find one or more properties that guarantee uniqueness, then use that as your id (and partition key as well). This can still provide the user some amount of human readability while providing the uniqueness you are looking for.
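For example, here is a minimal sketch using the .NET SDK (Microsoft.Azure.Cosmos), assuming a hypothetical UserItem whose id is the naturally unique username and also serves as the partition key; the database's own uniqueness guarantee replaces any client-side locking:

using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class UserItem
{
    public string id { get; set; }          // e.g. the username; unique per partition
    public string DisplayName { get; set; }
}

public static async Task<bool> TryCreateUserAsync(Container container, UserItem user)
{
    try
    {
        await container.CreateItemAsync(user, new PartitionKey(user.id));
        return true; // id was free; item created atomically
    }
    catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.Conflict)
    {
        return false; // an item with this id already exists
    }
}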
What if you use lock?
Declare:
private readonly object _threadLock = new object();
private int counter;
Use it:
lock (_threadLock)
{
    // put your thread-safe code here.
    counter++;
}
What I did is create a function that gets the max MeaningfulID from the DB, and the next time I perform the insert operation, I add 1 to it.
My mistake was that I instantiated the semaphore at local scope. Now that it is made static and at class level, even if multiple users access it, each waits until the previous thread has completed.
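For reference, a minimal sketch of that fix; GetMaxMeaningfulIdFromDbAsync is a hypothetical stand-in for the DB query, and note that a static SemaphoreSlim only serializes requests within a single process, so this still won't survive scaling out to multiple instances:

// Static and class-level: one gate shared by all requests in this process.
// A semaphore created per request would give each caller its own instance
// and provide no mutual exclusion at all.
private static readonly SemaphoreSlim _idLock = new SemaphoreSlim(1, 1);

public async Task<int> GetNextMeaningfulIdAsync()
{
    await _idLock.WaitAsync();
    try
    {
        int maxId = await GetMaxMeaningfulIdFromDbAsync(); // hypothetical helper
        return maxId + 1;
    }
    finally
    {
        _idLock.Release();
    }
}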
I have an issue where my API receives a ColumnId from outside (its purpose is to do some updates in the database). However, if two requests with the same Id try to access it, I receive an error because two transactions can't be executed over the same row.
Since I still need to execute both nonetheless, is there a way to make a Singleton or a static class that will handle these HTTP requests so that if two requests with the same ColumnId get sent to the API, the API executes the first one and then executes the second one?
public MyClass DoStuff(MyClass2 obj, HttpRequestMessage request)
{
    MyClass test = new MyClass();

    // .Create() creates a session with the database
    using (var sc = _sessionManager.Create())
    {
        try
        {
            var anotherObj = _repository.FindForUpdate(obj.Id);
            // modify anotherObj, save it to the database and set some values
            // for `test` based on anotherObj
        }
        catch
        {
            sc.Rollback();
        }
    }

    return test;
}
FindForUpdate executes a query similar to this:
SELECT * FROM table WHERE id = #Id FOR UPDATE
The best I can think of is to have a singleton (as stated above) that will queue and lock the using statement in DoStuff if the Id is the same, but I don't know how to do it.
It should be quite straightforward to implement a global lock either in a static class or in a class registered with a singleton lifetime in your IoC container. You could use the lock keyword for this, or one of the many other synchronization primitives offered by .NET, such as the SemaphoreSlim class.
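For instance, here is a sketch of such a singleton that serializes requests per ColumnId rather than globally (all names are illustrative, and this only works within a single process):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class ColumnLockRegistry
{
    private readonly ConcurrentDictionary<int, SemaphoreSlim> _locks =
        new ConcurrentDictionary<int, SemaphoreSlim>();

    // Runs the given action while holding the gate for this ColumnId, so two
    // requests with the same Id execute one after the other.
    public async Task RunExclusiveAsync(int columnId, Func<Task> action)
    {
        var gate = _locks.GetOrAdd(columnId, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            await action();
        }
        finally
        {
            gate.Release();
        }
    }
}

Register it with a singleton lifetime in your IoC container and wrap the body of DoStuff in RunExclusiveAsync(obj.Id, ...).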
However, as pointed out by John, this scales poorly to multiple web servers, and doesn't leverage the concurrency mechanisms offered by the database. It's hard to give concrete recommendations without knowing the specifics of your database platform and data access framework, but you should probably look into either using FOR UPDATE WAIT if your database supports it, or just an optimistic concurrency mechanism with some retry logic in your application for reapplying the update after waiting a short while.
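As a rough sketch of the optimistic-concurrency variant, where ConcurrencyException is a hypothetical stand-in for whatever your data access framework throws on a conflicting update:

public async Task UpdateWithRetryAsync(MyClass2 obj)
{
    for (int attempt = 0; attempt < 3; attempt++)
    {
        try
        {
            using (var sc = _sessionManager.Create())
            {
                var anotherObj = _repository.FindForUpdate(obj.Id);
                // modify anotherObj and save it here
                return; // success
            }
        }
        catch (ConcurrencyException) // hypothetical conflict exception
        {
            await Task.Delay(100); // wait a short while, then reapply the update
        }
    }
    throw new InvalidOperationException("Update failed after 3 attempts.");
}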
Ideally, you will also want to change any long-running blocking operations in your application to use async/await, so that the web server thread is released back to the threadpool for serving other requests.
I have a simple WebApi controller method whose purpose is to trigger some background processing and return a 202 Accepted response immediately (without necessarily having completed the background processing, as is consistent with a 202 response).
public async Task<IHttpActionResult> DoSomething(string id)
{
    HostingEnvironment.QueueBackgroundWorkItem(async ct =>
    {
        // Do work
    });
    return ResponseMessage(new HttpResponseMessage(HttpStatusCode.Accepted));
}
However, I want to be able to prevent multiple requests to the same endpoint with the same id from triggering multiple instances of the same background processing simultaneously.
Effectively, if two requests with the same id were made at the same time (or near enough), the first one would do the processing for that id, and the second one would be able to identify this, act accordingly, and not duplicate the work that's already being done.
I'd like to avoid databases/persistent storage if at all possible and I'm aware of the risks of running async tasks within an IIS worker process - for the sake of the argument, the background processing is not critical.
How can I go about doing this? Any help would be much appreciated.
You'll need some kind of storage shared between all your possible workers. This could be, for example:
A static variable. Probably the easiest to implement, but it has limitations when the application does not run in only one AppDomain (especially important if you want to scale), so probably not the best way.
An SQL Database: Probably the most common one. If your application already uses one, I'd say go for this route.
A No-SQL database, for example a key-value store. Might be an alternative if your application does not use a SQL database yet.
Some external component, such as a workflow management tool
EDIT:
Example using ConcurrentDictionary (per request of the thread starter; I still think using a database, possibly NoSQL, would be the most elegant way). Actually, you could just put Tasks into the dictionary:
private readonly ConcurrentDictionary<string, Task<SomeType>> _cache =
    new ConcurrentDictionary<string, Task<SomeType>>();

var task = _cache.GetOrAdd("<Key>", key => Task.Run(() => { /* do some work */ return new SomeType(); }));
if (task.IsCompleted)
    { /* result ready */ }
else
    { /* have to wait */ }
An API pattern we are considering for separating the work of calculating some results from the committing of those results is:
interface IResults { }
class Results : IResults { }
Task<IResults> CalculateResultsAsync(CancellationToken ct)
{
    return Task.Run<IResults>(() => new Results(), ct);
}

void CommitResults(IResults iresults)
{
    Results results = (Results)iresults;
    // Commit the results
}
This would allow a client to have a UI that kicked off the calculation of some results and know when the calculation was ready, and then at that time decide whether or not to commit the results. This is mainly to help us deal with the case where during the calculation, the UI will allow the user to cancel the operation. We want to ensure that:
The cancel UI is only shown while the action is still cancellable (i.e. once we're in CommitResults, there is no going back), so once the CalculateResultsAsync task completes, we take down the cancel UI and, as long as the user hasn't cancelled, go ahead and call the commit method.
We don't want to have a case (i.e. a race condition) where the user hits cancel and the results are committed anyways.
The client will never make use of IResults other than to pass it back to CommitResults.
Question:
The general question is: is this a good approach? Specifically:
It doesn't feel right having this split into two methods since the client is never inspecting IResults, they are just handing it back to the Commit method.
Is there a standard approach to this problem?
This is a very standard pattern (if not the ideal pattern), especially when your Results object is immutable. We do this regularly in TPL-using code inside the Visual Studio codebase. Much happiness always exists when your asynchronous/parallel logic is processing data, and the mutating crap lives apart from that.
If you're familiar with or have heard of the "Roslyn" project, this is a pattern we're actually encouraging people to use. The idea is that refactorings can be processed asynchronously in the background and produce an object, just like your result one, that represents the result of the refactoring being applied. Then, on the UI thread, anybody can take one of those result objects and apply it, which goes and updates all your files to contain the new text.
I do find the entire IResults/Results thing a bit strange -- it's not clear whether you're using this to hide implementations from yourself or not. If the empty interface and the cast bug you, you could consider adding a Commit method to IResults, which the result object implements. Up to you.
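A quick sketch of that suggestion; the empty interface and the cast both disappear:

interface IResults
{
    void Commit();
}

class Results : IResults
{
    public void Commit()
    {
        // commit the results; the concrete type stays hidden from the client
    }
}

The client then simply calls results.Commit() on the UI thread once it decides to go ahead.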
I'm not sure why exactly you would need this pattern. To me, it seems that if you check the CancellationToken just before starting the commit, you're going to get exactly the same result with a simpler interface.
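In code, the simpler alternative looks roughly like this sketch, reusing the methods from the question:

var results = await CalculateResultsAsync(ct);
ct.ThrowIfCancellationRequested(); // if the user cancelled, we never commit
CommitResults(results);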
Imagine the simplest DB access code with some in-memory caching:
if exists in cache
    return object
else
    get from DB
    add to cache
    return object
Now, if the DB access takes a second and I have, say, 5 ASP.Net requests/threads hitting that same code within that second, how can I ensure only the first one does the DB call? I have a simple thread lock around it, but that simply queues them up in an orderly fashion, allowing each to call the DB in turn. My data repositories basically read in entire tables in one go, so we're not talking about Get by Id data requests.
Any ideas on how I can do this? Thread wait handles sound almost like what I'm after, but I can't figure out how to code it.
Surely this must be a common scenario?
Existing pseudocode:
lock (threadLock)
{
    get collection of entities using Fluent NHib
    add collection to cache
}
Thanks,
Col
You've basically answered your own question. The lock() is fine; it prevents other threads from proceeding into that code while any other thread is in there. Then, inside the lock, perform your first piece of pseudocode: check whether the data is cached already and, if not, retrieve the value and cache it. The next thread will then come in, check the cache, find it's available, and use that.
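Mapped onto your pseudocode, that looks roughly like this sketch (the cache API and loader method are illustrative):

lock (threadLock)
{
    var entities = cache.Get(cacheKey);          // hypothetical cache lookup
    if (entities == null)
    {
        entities = LoadEntitiesWithFluentNHib(); // the expensive DB read, done once
        cache.Put(cacheKey, entities);           // later threads get this copy
    }
    return entities;
}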
Surely this must be a common scenario?
Not necessarily as common as you may think.
In many similar caching scenarios:
the race condition you describe doesn't happen frequently (it requires multiple requests to arrive when the cache is cold)
the data returned from the database is readonly, and data returned by multiple requests is essentially interchangeable.
the cost of the database call is not so prohibitive that it matters.
But if your scenario absolutely requires preventing this race condition, then use a lock as suggested by Roger Perkins.
I'd use Monitor/Mutex over lock. Using lock you need to specify a resource (you may also use the this pointer, which is not recommended).
Try the following instead:
Mutex myMutex = new Mutex();
// if you want it system-wide, use a named mutex:
// Mutex myMutex = new Mutex(false, "SomeUniqueName");

myMutex.WaitOne();
// or:
// if (myMutex.WaitOne(<ms>))
// {
//     // thread has access
// }
// else
// {
//     // thread has no access
// }

<INSERT CODE HERE>

myMutex.ReleaseMutex();
I don't know whether a general solution or established algorithm exists.
I personally use the code pattern below to solve problems like this.
1) Define an integer variable that can be accessed by all threads.
int accessTicket = 0;
2) Modify the code block:
int myTicket = accessTicket;
lock (threadLock)
{
    if (myTicket == accessTicket)
    {
        ++accessTicket;
        // get collection of entities using Fluent NHib
        // add collection to cache
    }
}
UPDATE
The purpose of this code is not to prevent multiple DB accesses or duplicate caching; we can do that with a normal thread lock. By using the access ticket like this, we can prevent other threads from redoing work that has already finished.
UPDATE #2
Note that there IS a lock (threadLock) in the code above. Please look carefully before commenting or voting down.