We are struggling with duplicate documents being created because of a race condition. We process events, and each event either creates or updates a document. We noticed that duplicates are created when two events arrive within a few milliseconds of each other. The first event should result in a new document and the second one should be an update.
Here is the logic that we have in the stored procedure:
Look for an existing document with the specific id and status.
Create a new document, or update the existing document if it exists.
If we created a document, we run the select one more time to check that there is only one document with that combination of id and status. If there is more than one, we roll back. In case of update, we rely on the ETag.
We are good with the update, but create is giving us a hard time. Let me know if there is a way we can fix it.
The deduplication key is the combination of external id and status. We have an existing database and we want to avoid any change that requires creating a new database.
Thanks,
Rohit
Define a unique key. CosmosDB will prevent the insertion of duplicate keys that are designated unique. You can then catch the exception and perform your update logic.
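A minimal sketch of the "insert, catch the duplicate, then update" flow described above. It uses SQLite from the Python standard library as a stand-in for the document store, so the table name, column names and the `create_or_update` helper are all assumptions made for illustration and not Cosmos DB API calls.

```python
import sqlite3

def create_or_update(conn, external_id, status, payload):
    """Insert the document; if the unique key already exists, update instead."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO documents (external_id, status, payload) VALUES (?, ?, ?)",
                (external_id, status, payload),
            )
        return "created"
    except sqlite3.IntegrityError:
        # The unique key rejected the insert, so another event got there first.
        with conn:
            conn.execute(
                "UPDATE documents SET payload = ? WHERE external_id = ? AND status = ?",
                (payload, external_id, status),
            )
        return "updated"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " external_id TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " payload TEXT,"
    " UNIQUE (external_id, status))"  # hypothetical deduplication key
)

first = create_or_update(conn, "ext-1", "open", "event 1")
second = create_or_update(conn, "ext-1", "open", "event 2")
rows = conn.execute("SELECT external_id, status, payload FROM documents").fetchall()
```

The point of the design is that the store, not the application, decides which of two near-simultaneous events is the create; the loser of the race falls through to the update path.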
Edit based on feedback
I'm assuming you're in an environment where more than one thread or process is executing this logic. You're going to need a critical section (a lock) when you try to process each document. When it comes time to interact with CosmosDB, you'll need to acquire a lock on the id of the document you're going to insert/update. You can then check to see if the document exists, and do your insert or update based on the result. Then you'll exit the critical section by releasing the lock.
What is available to you depends on the technologies you're using. If it's a single instance of an Azure Function, you can use something like a static thread-safe dictionary (for example, a ConcurrentDictionary) for locking. If it's multiple Azure Functions or Web Apps, you'll need a distributed lock. There are several ways to do this, such as Azure Blob leases.
I am unaware of any type of synchronization functionality available OOTB in CosmosDB.
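For the single-instance case, the critical section per document id might look roughly like the sketch below. It is written in Python for brevity; `store`, `lock_for` and `process_event` are hypothetical names, and the in-memory dictionary only stands in for the real document store.

```python
import threading
from collections import defaultdict

_locks_guard = threading.Lock()
_locks = defaultdict(threading.Lock)  # one lock per document id

def lock_for(doc_id):
    """Return the lock dedicated to doc_id, creating it on first use."""
    with _locks_guard:
        return _locks[doc_id]

store = {}  # hypothetical stand-in for the document store

def process_event(doc_id, payload):
    with lock_for(doc_id):  # critical section for this id only
        if doc_id in store:
            store[doc_id]["updates"] += 1
        else:
            store[doc_id] = {"payload": payload, "updates": 0}

threads = [
    threading.Thread(target=process_event, args=("doc-1", i)) for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This only protects threads inside one process, and the lock dictionary grows with the number of ids seen. With more than one instance, the same shape applies but the lock has to live somewhere shared.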
Related
I have a Web API async controller method that calls another async method, which first does a null check to see if a record exists and, if it doesn't, adds it to the database. The problem is that if, say, 3 requests come in at the same time, all the null checks happen at once in various threads (I'm assuming) and I get 2 duplicate entries. For example:
public async void DoSomething()
{
    var record = {query that returns record or null}
    if (record == null)
    {
        AddNewRecordToDatabase();
    }
}
... This seems like a very common thing and maybe I'm missing something, but how do I prevent this from happening? I have to purposely try to get it to create duplicates of course, but it is a requirement to not allow it to do so.
Thanks in advance,
Lee
I would solve this by putting unique constraints in the data layer. Assuming your data source is SQL, you can put a unique constraint across the columns you are querying by in "query that returns record or null", and it will prevent these duplicates. The problem with using a lock or a mutex is that it doesn't scale across multiple instances of the service. You should be able to deploy many instances of your service (to different machines), have any of those instances handle requests, and still have consistent behavior. A mutex or lock isn't going to protect you from this concurrency issue in that situation.
I prevent this from happening with async calls by calling a stored procedure instead.
The stored procedure then makes the check, via a "On duplicate key detection" or a similar query for MSSQL db.
That way, it's merely the order of the async calls that gets to determine which is a create, and which is not.
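As a rough illustration of letting the database make the create-or-update decision in a single statement, here is a sketch using SQLite's `INSERT ... ON CONFLICT DO UPDATE` (available in SQLite 3.24 and later). MySQL spells the same idea `INSERT ... ON DUPLICATE KEY UPDATE`. The `records` table and its columns are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (natural_key TEXT PRIMARY KEY, value TEXT, writes INTEGER)"
)

def upsert(conn, key, value):
    # One statement: the database picks insert or update atomically.
    with conn:
        conn.execute(
            "INSERT INTO records (natural_key, value, writes) VALUES (?, ?, 1) "
            "ON CONFLICT(natural_key) DO UPDATE SET "
            "value = excluded.value, writes = writes + 1",
            (key, value),
        )

for value in ("a", "b", "c"):
    upsert(conn, "order-42", value)

result = conn.execute("SELECT natural_key, value, writes FROM records").fetchall()
```

Whichever call arrives first becomes the insert; every later call becomes an update, with no window between a separate check and write.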
There are several answers to this, depending on the details and what your team is comfortable with.
The best and most performant answer is to modify your C# code so that instead of calling a CRUD database operation it calls a stored procedure that you write. The stored procedure would check for existence and insert or update only as needed. The specifics are completely under your control, since you write the code.
If you want to stick with ordinary CRUD operations, you can force the database to serialize the requests one after the other by wrapping them in a transaction and using a strict transaction isolation level. On SQL Server you'd want to use serializable. This will prevent any transaction from altering the state of the table in the short time between the part where you check for existence and when you insert the record. See this article for a list of transaction isolation levels and how to apply them in C# code. If you do this there is a risk of deadlock, so you'll need to catch and swallow those specific errors.
If your only need is to ensure uniqueness, and the new record has a natural (not surrogate) key, you can add a uniqueness constraint on the key, which will prevent the second insert from succeeding. This solution doesn't work so well with surrogate keys; it doesn't really solve the problem, it just relocates it to the surrogate key generation process. But if you have a decent natural key, this is very easy to implement.
I'm using ASP.NET with a MySQL database.
Application flow:
Order created in WooCommerce and sent to my app
My app translates the Woo order object to an object to add to an external ERP system
Order created in external ERP system and we update a local database with that order info to know that the creation was successful
I have a critical section of code that creates an order on an external ERP resource. Multiple requests for the same order can be running at the same time because they are created from an external application (woocommerce) that I can't control. So the critical section of code must only allow one of the requests to enter at a time otherwise duplicate orders can be created.
Important note: the application is hosted on Elastic Beanstalk which has a load balancer so the application can scale across multiple servers, which makes a standard C# lock object not work.
I would like to create a lock that can be shared across multiple servers/application instances so that only one server can acquire the lock and enter the critical section of code at a time. I can't find how to do this using MySql and C# so if anyone has an example that would be great.
Below is how I'm doing my single instance thread safe locking. How can I convert this to be safe across multiple instances:
SalesOrder newOrder = new SalesOrder(); //the external order object
var databaseOrder = new SalesOrderEntity(); //local MySql database object
/*
 * Make this section thread safe so multiple threads can't try to create
 * orders at the same time
 */
lock (orderLock)
{
    //check if the order is already locked or created.
    //wooOrder comes from external order creation application (WooCommerce)
    databaseOrder = GetSalesOrderMySqlDatabase(wooOrder.id.ToString(), originStore);
    if (databaseOrder.OrderNbr != null)
    {
        //the order is already created externally because it has an order number
        return 1;
    }
    if (databaseOrder.Locked)
    {
        //the order is currently locked and being created
        return 2;
    }
    //the order is not locked so lock it before we attempt to create externally
    databaseOrder.Locked = true;
    UpdateSalesOrderDatabase(databaseOrder);
    //Create a sales order in external system with the specified values
    newOrder = (SalesOrder) client.Put(orderToBeCreated);
    //Update the order in our own database so we know it's created in external ERP system
    UpdateExternalSalesOrderToDatabase(newOrder);
}
Let me know if further detail is required.
You can use MySQL's named advisory lock function GET_LOCK(name, timeout) for this.
This works outside of transaction scope, so you can commit or roll back database changes before you release your lock. Read more about it here: https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock
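A hedged sketch of how the advisory lock could be wrapped so it is always released. It is in Python, not C#, and assumes a DB-API cursor from a MySQL driver that uses `%s` placeholders; `mysql_named_lock` and `_FakeCursor` are hypothetical names, and the fake cursor exists only so the sketch runs without a server. `GET_LOCK` returns 1 when the lock is obtained, 0 on timeout and NULL on error, and the lock belongs to the session, so the same connection must be used to release it.

```python
from contextlib import contextmanager

@contextmanager
def mysql_named_lock(cursor, name, timeout_seconds=10):
    """Hold the MySQL advisory lock `name` for the duration of the with-block."""
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_seconds))
    (acquired,) = cursor.fetchone()
    if acquired != 1:  # 0 = timed out, NULL (None) = error
        raise TimeoutError(f"could not acquire lock {name!r}")
    try:
        yield
    finally:
        cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cursor.fetchone()  # consume the result row

class _FakeCursor:
    """Stand-in so the sketch runs without a server; always grants the lock."""
    def __init__(self):
        self.statements = []
    def execute(self, sql, params=()):
        self.statements.append(sql)
    def fetchone(self):
        return (1,)

cursor = _FakeCursor()
with mysql_named_lock(cursor, "woo-order-1001"):
    pass  # critical section: check the local order row, create in the ERP, update
```

Using the order id in the lock name means only requests for the same order wait on each other; unrelated orders proceed in parallel on any server.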
You could also use some other dedicated kind of lock service. You can do this with a shared message queue service, for example. See https://softwareengineering.stackexchange.com/questions/127065/looking-for-a-distributed-locking-pattern
You need to use a MySQL DBMS transaction lock for this.
You don't show your DBMS queries directly, so I can't guess them. Still you need this sort of series of queries.
START TRANSACTION;
SELECT col, col, col FROM wooTable WHERE id = <<<wooOrderId>>> FOR UPDATE;
/* do whatever you need to do */
COMMIT;
If the same <<<wooOrderId>>> row gets hit with the same sequence of queries from another instance of your web server running on another ELB server, that one's SELECT ... FOR UPDATE query will wait until the first one does the commit.
Notice that intra-server multithreading and critical-section locking are neither necessary nor sufficient to solve your problem. Why?
It's unnecessary because database connections are not thread safe in the first place.
It's insufficient because you want a database-level transaction for this, not a process-level lock.
You should use a transaction, which is a unit of work in the database. It makes your code not only atomic but also thread-safe. Here is a sample adapted from the official MySQL website.
The code you need:
START TRANSACTION;
-- your queries go here
COMMIT;   -- if your transaction worked
ROLLBACK; -- in case of failure
Also I highly recommend you to read about Transaction isolation levels:
MySQL Transaction Isolation Levels
If you use a transaction as I wrote above, it holds locks that can make other queries wait until the transaction ends. With InnoDB these are normally row locks rather than a lock on the whole table, and whether plain SELECT queries are blocked depends on the isolation level. This is called blocking; to keep it to a minimum, read the link carefully.
I don't think there's any nice solution for this using a database, unless everything can be done neatly in a stored procedure like another answer suggested. For anything else I would look at a message queueing solution with multiple writers and a single reader.
This might be a very dumb question, but as the saying goes, "The only dumb question is the one you don't ask"...
I've got a SQL Server 2008 database and I want to lock a record for editing. However, another user might want to see information in that record at the same time. So, I want the first person in to be able to lock the record in the sense that they are the only ones who can edit it. However, I still want other users to see the data if they want to.
This is all done from a C# front end as it's gonna be on our Intranet.
Don't do your own locking - let SQL Server handle it on its own.
As long as you only SELECT, you'll put what's called a shared lock on a row - other users who want to also read that row can do so.
Only when your code goes to update the row, it will place an exclusive lock on the row in order to be able to update it. During that period of time, no other users can read that one single row you're updating - until you commit your transaction.
To expand on Marc_s's answer, the reader can also use the
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
statement as described here to force reads to ignore any locks (with the notable exception of any Sch-M, schema modification, locks) that may exist. This is also a useful setting for reports that do not require absolute reproducibility, as it can significantly enhance the performance of those reports.
In addition to the existing answers: You can enable snapshot isolation. That gives your transaction a point-in-time snapshot of the database for reads. This transaction will not take locks on data at all. It will not block.
I have an ordering problem between 2 handlers, one deleting and one reordering pictures, and would like some advice on the best solution.
On the UI some pictures are deleted: the user clicks on the delete button. The whole flow, from the delete command up to an event handler which actually deletes the physical files, is started.
Then immediately the user sorts the remaining pictures. A new flow, from the reorder command up to the reordering event handler for the file system, fires.
Already there is a concurrency problem. The reordering cannot be correctly applied without the deletion being done. At the moment this problem is handled with some sort of lock. A temp file is created and then deleted at the end of the deletion flow. While that file exists, the other thread (reordering or deletion, depending on the user actions) waits.
This is not an ideal solution and we would like to change it.
The potential solution must also be pretty fast (of course the current one is not) as the UI is updated through a JSON call at the end of ordering.
In a later implementation we are thinking of using a queue of events, but for the moment we are pretty stuck.
Any idea would be appreciated!
Thank you, mosu'!
Edit:
Other eventual consistency problems that we had were solved by using a JavaScript data manager on the client side. Basically being optimistic and tricking the user! :)
I'm starting to believe this is the way to go here as well. But then how would I know when the data has changed in the file system?
Max's suggestions are very welcome and normally they apply.
It is hard sometimes to explain all the details of an implementation but there is a detail that should be mentioned:
The way we store the pictures means that when reordered all pictures paths (and thus all links) change.
A colleague had the very good idea of simply removing this part. That means that even if the order changes, the path of the picture will remain the same. On the UI side there will be a mapping between the picture index in the display order and its path, and this means there is no need to change the file system anymore, except when deleting.
As we want to be as permissive as possible with our users this is the best solution for us.
I think, in general, it is also a good approach when there appears to be a concurrency issue. Can the concurrency be removed?
Here is one thought on this.
What exactly are you reordering? Pictures? Based on, say, date?
Why is there a command for this? Is the result of this command going to be seen by everyone or just by this particular user?
I can only guess, but it looks like you've got a presentation question here. There is no need to store pictures in some order on the write side; it's just a list of names and links to the file storage. What you should do is store just a little field somewhere in the user settings or collection settings: Date ascending or Name descending. So your Reorder command should change only this little field. Then when you are loading the gallery this field should be read first, and based on it you should load one view or another. Since storage is cheap nowadays, you can store differently sorted collections on the read side for every sort parameter you need.
To sum up, the Delete command changes the collection on the write side, but the Reorder command only changes a user or collection setting. Hence, there is no concurrency here.
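To make the idea concrete, here is a small sketch in which Delete changes the collection while Reorder only changes a stored sort setting. The `Picture` and `Gallery` classes and their fields are hypothetical and are not taken from the question.

```python
from dataclasses import dataclass, field

@dataclass
class Picture:
    name: str
    taken: str  # ISO date, so string order matches date order

@dataclass
class Gallery:
    pictures: dict = field(default_factory=dict)  # name -> Picture (write side)
    sort_by: str = "taken"   # the "little field" that Reorder changes
    descending: bool = False

    def delete(self, name):
        self.pictures.pop(name, None)  # changes the collection

    def reorder(self, sort_by, descending=False):
        self.sort_by, self.descending = sort_by, descending  # setting only

    def view(self):
        ordered = sorted(
            self.pictures.values(),
            key=lambda p: getattr(p, self.sort_by),
            reverse=self.descending,
        )
        return [p.name for p in ordered]

def make_gallery():
    g = Gallery()
    for name, taken in (("a", "2020-03-01"), ("b", "2020-01-01"), ("c", "2020-02-01")):
        g.pictures[name] = Picture(name, taken)
    return g

g1 = make_gallery()
g1.delete("b")
g1.reorder("name", descending=True)

g2 = make_gallery()
g2.reorder("name", descending=True)  # same commands, opposite order
g2.delete("b")
```

Both galleries end up with the same view, which is the sense in which the two commands no longer compete: neither depends on the other having finished.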
Update
Based on your comments and clarifications.
Of course, you can, and probably should, restrict the user to one action at a time, if the time taken by deletion and reordering is reasonably short. It's always a question of the type of user experience you are asked to achieve. Take the usual example of an ordering system. After an order is placed, the user can almost immediately see it in the UI and the status will be something like InProcess. Most likely you won't let the user change the order in any way, which means you are not going to show any user controls like a Cancel button (of course this is just an example). Hence, you can use this approach here.
If 2 users can modify the same physical collection, you have no choice here: you are working with shared data and there should be some kind of synchronization. For instance, if you are using sagas, there can be a couple of sagas, a Collection reordering saga and a Deletion saga, and they can cooperate. Say the deletion process started first: the collection aggregate was marked as deletion in progress, and right after this the reordering saga started. It will attempt to start the reordering process, but since the deletion saga is in progress, it should wait for DeletedEvent and continue the process afterwards. The same applies if the Reordering operation started first: the Deletion saga should wait until some event arrives and continue after that.
Update
OK, so we agreed not to touch the file system itself, only the aggregate which represents the picture collection. The most important concurrency issues can be solved with an optimistic concurrency approach: in the data storage, a unique constraint based on aggregate id and aggregate version is usually used.
This is the common sequence of steps a command handler follows:
1. Validate the command on its own merits.
2. Load the aggregate.
3. Validate the command on the current state of the aggregate.
4. Create a new event, apply the event to the aggregate in memory.
5. Attempt to persist the aggregate. If there's a concurrency conflict during this step, either give up, or retry things from step 2.
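The steps above can be sketched with an event table whose primary key is (aggregate id, version), so that two writers who loaded the same version cannot both persist. This is only an illustration under assumptions: SQLite stands in for the event store, the schema and function names are invented, and `before_persist` is a hypothetical hook used here solely to simulate a competing writer. Applying the event in memory (part of step 4) is skipped for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " aggregate_id TEXT NOT NULL,"
    " version INTEGER NOT NULL,"
    " kind TEXT NOT NULL,"
    " data TEXT NOT NULL,"
    " PRIMARY KEY (aggregate_id, version))"  # the optimistic concurrency check
)

def load(conn, aggregate_id):
    """Rebuild the picture list and current version by replaying events."""
    pictures, version = [], 0
    rows = conn.execute(
        "SELECT version, kind, data FROM events WHERE aggregate_id = ? ORDER BY version",
        (aggregate_id,),
    )
    for version, kind, data in rows:
        if kind == "added":
            pictures.append(data)
        elif kind == "deleted":
            pictures.remove(data)
    return pictures, version

def handle_delete(conn, aggregate_id, picture, max_attempts=3, before_persist=None):
    if not picture:                                   # 1. validate the command alone
        raise ValueError("picture name required")
    for _ in range(max_attempts):
        pictures, version = load(conn, aggregate_id)  # 2. load the aggregate
        if picture not in pictures:                   # 3. validate against state
            return "already deleted"
        if before_persist:
            before_persist()
        try:
            with conn:                                # 4-5. persist at version + 1
                conn.execute(
                    "INSERT INTO events VALUES (?, ?, 'deleted', ?)",
                    (aggregate_id, version + 1, picture),
                )
            return "deleted"
        except sqlite3.IntegrityError:
            continue                                  # conflict: retry from step 2
    raise RuntimeError("gave up after repeated conflicts")

with conn:
    conn.execute("INSERT INTO events VALUES ('gallery', 1, 'added', 'a')")
    conn.execute("INSERT INTO events VALUES ('gallery', 2, 'added', 'b')")

competing = [("gallery", 3, "added", "c")]

def simulate_other_writer():
    if competing:  # runs once, between our load and our persist
        with conn:
            conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)", competing.pop())

outcome = handle_delete(conn, "gallery", "a", before_persist=simulate_other_writer)
state = load(conn, "gallery")
```

In this run the first attempt collides with the competing event at version 3, the handler reloads, and the delete is stored at version 4.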
Here is the link which helped me a lot some time ago: http://www.cqrs.nu/
I'm sure that this question has already been asked, but I don't really see it.
Using ASP.NET and C#, how does one track the pages that are open/closed?
I have tried all sorts of things, including:
modifying the global.asax file application/session start/end operations
setting a page's destructor to report back to the application
static variables (which persist globally rather than on a session by session basis)
Javascript window.onload and window.onbeforeunload event handlers
It's been educational, but so far no real solution has emerged.
The reason I want to do this is to prevent multiple users from modifying the same table at the same time. That is, I have a list of links to tables, and when a user clicks to modify a table, I would like to set that link to be locked so that NO USER can then modify that table. If the user closes the table modification page, I have no way to unlock the link to that table.
You should not worry about tracking pages open or closed. Once a webpage is rendered by IIS it's as good as "closed".
What you need to do is protect against two users updating your table at the same time by using locks. For example:
using (Mutex m = new Mutex(false, "Global\\TheNameOfTheMutex"))
{
    // If you want to wait for 5 seconds for the other page to finish,
    // you can do m.WaitOne(TimeSpan.FromSeconds(5), false)
    if (!m.WaitOne(TimeSpan.Zero, false))
        Response.Write("Another page is updating the database.");
    else
    {
        try { UpdateDatabase(); }
        // Release explicitly: disposing the Mutex does not release ownership.
        finally { m.ReleaseMutex(); }
    }
}
What the above snippet does is prevent any other web page from calling the UpdateDatabase method while another page is already running the UpdateDatabase call. So no two pages can call UpdateDatabase at the exact same time.
But this does not protect the second user from running UpdateDatabase AFTER the first call has finished, so you need to make sure your UpdateDatabase method has proper checks in place, i.e. it does not allow stale data to be updated.
I think you're going the wrong way about this...
You really should be handling your concurrency via your business layer / db and not relying on the interface because people can and will find a way around whatever you implement.
I would recommend storing a 'key' in your served-up page every time you serve up a page that can modify the table. The key is like a versioning stamp of the last time the table was updated. Send this key along with your update and validate that they match before doing the update. If they don't, then you know someone else came along and modified that table, and you should inform the user that there was a concurrency conflict, that the data has changed, and ask whether they want to see the new data.
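One way to picture the version-stamp check is the sketch below, where the update only succeeds if the version the page was served with is still current. It uses SQLite from the Python standard library; the `items` table, its columns and the helper names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO items VALUES (1, 'original', 1)")
conn.commit()

def read_item(conn, item_id):
    """Return (body, version); the version is the key sent along with the page."""
    return conn.execute(
        "SELECT body, version FROM items WHERE id = ?", (item_id,)
    ).fetchone()

def save_item(conn, item_id, new_body, version_seen):
    """Update only if nobody has changed the row since version_seen was read."""
    with conn:
        cur = conn.execute(
            "UPDATE items SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, item_id, version_seen),
        )
    return cur.rowcount == 1  # False means a concurrency conflict

# Two users load the same page, then both try to save.
_, alice_version = read_item(conn, 1)
_, bob_version = read_item(conn, 1)
alice_saved = save_item(conn, 1, "alice's edit", alice_version)
bob_saved = save_item(conn, 1, "bob's edit", bob_version)
final = read_item(conn, 1)
```

The second save matches no row because the version has moved on, which is the signal to tell that user the data changed underneath them.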
You should not use page requests to lock database tables. That won't work well for many reasons. Each request creates a new page object, and there are multiple application contexts, which may be on multiple threads/processes, etc. Any of which may drop off the face of the earth at any point in time.
The highest level of tackling this issue is to find out why you need to lock the tables in the first place. One common way to avoid this is to accept all user table modifications and allow the users to resolve their conflicts.
If locking is absolutely necessary, you may do well with a lock table that is modified before and after changes. This table should have a way of expiring locks when users walk away without doing so.
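A lock table with expiry could look something like this sketch. The `table_locks` schema, the five-minute default and the explicit `now` argument (passed in to keep the example deterministic) are all assumptions; a real application would more likely use the database clock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_locks ("
    " resource TEXT PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " expires_at REAL NOT NULL)"
)

def acquire(conn, resource, owner, now, ttl=300):
    """Try to take the lock; an expired lock on the resource is cleared first."""
    try:
        with conn:
            conn.execute(
                "DELETE FROM table_locks WHERE resource = ? AND expires_at <= ?",
                (resource, now),
            )
            conn.execute(
                "INSERT INTO table_locks VALUES (?, ?, ?)",
                (resource, owner, now + ttl),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # someone else holds an unexpired lock

def release(conn, resource, owner):
    with conn:
        conn.execute(
            "DELETE FROM table_locks WHERE resource = ? AND owner = ?",
            (resource, owner),
        )

alice_first = acquire(conn, "prices", "alice", now=1000)   # lock until 1300
bob_blocked = acquire(conn, "prices", "bob", now=1100)     # still held by alice
bob_after_expiry = acquire(conn, "prices", "bob", now=1301)  # alice walked away
release(conn, "prices", "alice")                           # no effect: bob owns it
alice_blocked = acquire(conn, "prices", "alice", now=1302)
```

The primary key on `resource` is what makes acquisition atomic, and the expiry column covers the user who closes the page without unlocking.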
E.g., see http://www.webcheatsheet.com/php/record_locking_in_web_applications.php. It's for PHP, but the concept is the same.