NHibernate concurrent threaded insertions with unique constraint - c#

I have a multi-threaded application that may concurrently insert objects of the same type, which has a property marked as unique.
public class Foo
{
...
string PropertyThatshouldBeUnique {get;set;}
...
}
Each thread has its own session and does:
Foo myFooInstance = new Foo();
myFooInstance.PropertyThatshouldBeUnique = "Bar";
myThreadSession.SaveOrUpdate(myFooInstance);
I have a unique constraint on my database table that prevents the multiple insertions, so I get an exception on the second insert, which triggers the rollback of the whole transaction (which is not good).
The concurrent insertions can be really close together (a few milliseconds).
I haven't configured any specific NHibernate concurrency strategy (I'm not sure whether one could solve my issue, or which one to use).
My problem is:
How and where in the code should I check for previously inserted Foo objects with the same property value?

Can you do something like this in NHibernate without breaking your current architecture?
if(!Update(connection))
{
using (var command = new SqlCommand(@"INSERT INTO foo VALUES ('Bar', ...)", connection))
{
try
{
command.ExecuteNonQuery();
}
catch (BlahUniqueConstraintException) // dummy exception, please replace with relevant
{
// very rarely will get in here
Update(connection);
}
}
}
private bool Update(SqlConnection connection)
{
// use update to do two things at once: find out if the record exists and also ... update
using (var command = new SqlCommand(@"UPDATE foo SET ... WHERE PropertyThatshouldBeUnique = 'Bar'", connection))
{
// if the record exists and is updated then returns 1, otherwise 0
return command.ExecuteNonQuery() > 0;
}
}

This was posted long ago, but since I found myself in the same kind of situation, let me share the solution I found.
For DB operations, instead of letting each thread do the work via NHibernate, the threads send their requests to another thread that acts as a kind of service. It exposes methods such as Insert.
Thread 1 wants to insert Object1.
Thread 2 wants to insert Object1.
They both call something like Thread3.Insert(Object1) concurrently.
On the first call it receives, Thread3 opens a session to the DB (which is cheap), checks with a SELECT that Object1 is not in the DB, and then adds it.
On the second call, even if it arrives a nanosecond later, Thread3 does the same but stops after the SELECT, and either does nothing or sends back an exception.
The cost of waiting for the first call to be processed is quite low; after all, your multi-threading speed is limited by DB constraints that need to be checked (and thus by DB access).
It's like saying: hmm, I have 2 very fast PCs and 1 slow server.
I think it's more of a design question than a matter of NHibernate functionality.
Synchronization among threads is already complicated, but synchronizing DB access among threads with the UnitOfWork pattern in use is a pain.
(This is one of your issues here, as you don't commit your transaction every time you insert.)
Maybe some guru here at SO has better ideas, though.
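The single-writer idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: FooInsertService, TryInsert and FooInsertDemo are hypothetical names, and an in-memory HashSet stands in for the SELECT and INSERT that real code would run in its own short session and transaction.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-writer service: every thread calls TryInsert, and the
// lock makes the existence check and the insert behave as one step.
public sealed class FooInsertService
{
    private readonly object _gate = new object();

    // Assumption: stands in for the unique column of the table. Real code
    // would issue a SELECT followed by an INSERT and commit immediately.
    private readonly HashSet<string> _uniqueValues = new HashSet<string>();

    // Returns true if the value was inserted, false if it already existed.
    public bool TryInsert(string propertyThatShouldBeUnique)
    {
        lock (_gate)
        {
            if (_uniqueValues.Contains(propertyThatShouldBeUnique))
            {
                return false; // the later caller stops after the "SELECT"
            }
            _uniqueValues.Add(propertyThatShouldBeUnique);
            return true;
        }
    }
}

public static class FooInsertDemo
{
    // Runs two concurrent inserts of the same value and counts the successes.
    public static int CountSuccessfulInserts()
    {
        var service = new FooInsertService();
        int succeeded = 0;
        Parallel.For(0, 2, _ =>
        {
            if (service.TryInsert("Bar"))
            {
                Interlocked.Increment(ref succeeded);
            }
        });
        return succeeded;
    }
}
```

Note that a lock like this only serializes callers inside one process, so the unique constraint in the database should stay in place as the final safety net.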

Related

How to avoid block of code to be accessed until it is finished processing

I am using Cosmos DB and want to implement a user-friendly ID to display in the UI for end users. To do this, I take the max ID existing in the DB and add 1 to it.
The problem I am facing is that when multiple users hit this function, the max ID returned is the same, so the new ID generated is duplicated.
How can I make sure that a certain block of code is only executed by one caller at a time?
I tried SemaphoreSlim, but it didn't help.
I am expecting to generate an auto-incremented ID without any duplication.
Cosmos DB does not provide an auto-increment id feature because it is impossible to do so with a database like this one while maintaining scalability. The first comment to your question is the best one, just don't use them. Anything you implement to do this will always be a performance bottleneck and your database will never scale. Also, if you try to scale out and use multiple SDK instances, any mutex you implement would now force you to re-implement as a distributed mutex across multiple compute instances making performance even worse.
If the container is designed to be a key-value store, find one or more properties that guarantee uniqueness, then use that as your id (and partition key as well). This can still provide the user some amount of human readability while providing the uniqueness you are looking for.
What about using lock?
Declare:
private readonly object _threadLock = new object();
private int counter;
Use it:
lock (_threadLock)
{
// put your thread safe code here.
counter++;
}
What I did was create a function that gets the max MeaningfulID from the DB, and when I perform the next insert operation, I add 1 to it.
My mistake was that I instantiated the semaphore at local scope. Now that it is made static and declared at class level, even if multiple users access it, each one waits until the previous thread has completed.
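A minimal sketch of the fix described above, assuming the ID is computed as max + 1. MeaningfulIdGenerator, InsertWithNextIdAsync and the in-memory StoredIds list are hypothetical stand-ins for the real Cosmos DB query and insert. The point is only that the semaphore is static, so all callers share it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MeaningfulIdGenerator
{
    // Static, so every caller in the process waits on the same semaphore.
    // A semaphore created inside the method would be a new object on each
    // call and would never make anyone wait.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    // Assumption: stands in for the container that holds the stored ids.
    private static readonly List<int> StoredIds = new List<int>();

    public static async Task<int> InsertWithNextIdAsync()
    {
        await Gate.WaitAsync();
        try
        {
            int maxId = StoredIds.Count == 0 ? 0 : StoredIds.Max();
            await Task.Yield(); // simulates the round trip to the database
            int nextId = maxId + 1;
            StoredIds.Add(nextId);
            return nextId;
        }
        finally
        {
            Gate.Release(); // always release, even if the insert throws
        }
    }
}
```

As the answer above points out, this only protects a single process. With several application instances the same duplication would come back unless a distributed lock is used.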

Global lock and wait if something true

I have an issue where my API receives a ColumnId from outside (its purpose is to do some updates in the database). However, if two requests with the same Id try to access it, I receive an error because two transactions can't be executed over the same row.
Since I still need to execute both nonetheless, is there a way to make a Singleton or a static class that will handle these HTTP requests so that if two requests with the same ColumnId get sent to the API, the API executes the first one and then executes the second one?
public MyClass DoStuff(MyClass2 obj, HttpRequestMessage request)
{
MyClass test = new MyClass();
// .Create() creates a session with the database
using (var sc = _sessionManager.Create())
{
try
{
var anotherObj = _repository.FindForUpdate(obj.Id);
//modify anotherObj, save it to the database and set some values for `test` based on anotherObj
}
catch
{
sc.Rollback();
}
}
return test;
}
FindForUpdate executes a query similar to this:
SELECT * FROM table WHERE id = @Id FOR UPDATE
The best I can think of is to have a singleton (as stated above) that will queue and lock the using statement in DoStuff if the Id is the same, but I don't know how to do it.
It should be quite straightforward to implement a global lock either in a static class or in a class registered with a singleton lifetime in your IoC container. You could use the lock keyword for this, or one of the many other synchronization primitives offered by .NET, such as the SemaphoreSlim class.
However, as pointed out by John, this scales poorly to multiple web servers, and doesn't leverage the concurrency mechanisms offered by the database. It's hard to give concrete recommendations without knowing the specifics of your database platform and data access framework, but you should probably look into either using FOR UPDATE WAIT if your database supports it, or just an optimistic concurrency mechanism with some retry logic in your application for reapplying the update after waiting a short while.
Ideally, you will also want to change any long-running blocking operations in your application to use async/await, so that the web server thread is released back to the threadpool for serving other requests.
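One possible shape for such an in-process lock, keyed by id so that requests for different ids do not wait on each other, is sketched below. KeyedLock and RunAsync are hypothetical names, and the sketch has the limitation already mentioned: it does nothing across multiple web servers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: serializes work per id within a single process.
public sealed class KeyedLock
{
    // One semaphore per id. Entries are never removed in this sketch, so a
    // long-running service with many ids would need some cleanup strategy.
    private readonly ConcurrentDictionary<int, SemaphoreSlim> _locks =
        new ConcurrentDictionary<int, SemaphoreSlim>();

    public async Task<T> RunAsync<T>(int id, Func<Task<T>> work)
    {
        // GetOrAdd always returns the instance stored in the dictionary,
        // so all callers with the same id wait on the same semaphore.
        SemaphoreSlim gate = _locks.GetOrAdd(id, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            return await work();
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Registered as a singleton, it could wrap the body of DoStuff, for example keyedLock.RunAsync(obj.Id, ...), so that a second request with the same id starts only after the first has finished.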

What's the difference between Entity Framework (6) transactions with single and multiple SaveChanges() calls

I want to know what are the practical differences of executing a transaction in the same database context between these 3 ways:
1) Multiple operations with one single SaveChanges(), without explicitly using a sql transaction
using (TestDbContext db = new TestDbContext())
{
// first operation
// second operation
db.SaveChanges();
}
2) Multiple operations with one single SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
// operation 1
// operation 2
db.SaveChanges();
trans.Commit();
}
3) Multiple operations with multiple SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
// operation 1
db.SaveChanges();
// operation 2
db.SaveChanges();
trans.Commit();
}
In (2) and (3), if Commit() is what actually executes the requested SQL queries against the database, does it really make a difference whether I save changes after each operation or save changes for all operations at once?
And if (1) also allows multiple operations to be safely executed in the same database context, what is the main use of manually starting a transaction? I'd say we can manually provide a try/catch block to roll back the transaction if something bad happens, but AFAIK SaveChanges() also covers that automatically, at least with SQL Server.
** UPDATED: Another thing: should I make the db context and transaction variables class-level, or should they be local to the containing methods only?
If you do not start a transaction, it is implicit. Meaning, all SaveChanges() you perform will be available in the database immediately after the call.
If you start a transaction, SaveChanges() still performs the updates, but the data is not available to other connections until a commit is called.
You can test this yourself by setting break points, creating new objects, adding them to the context, and performing a SaveChanges(). You will see the ID property will have a value after that call, but there will be no corresponding row in the database until you perform a commit on the transaction.
As far as your second question goes, it really depends on concurrency needs, what your class is doing and how much data you're working with. It's not so much a scoping issue as it is a code execution issue.
Contexts are not thread safe, so as long as you only have one thread in your application access the context, you can make it at a broader scope. But then, if other instances of the application are accessing the data, you're going to have to make sure you refresh the data to the latest model. You also should consider that the more of the model you have loaded into memory, the slower saves are going to be over time.
I tend to create my contexts as close to the operations that are to be performed as possible, and dispose them soon after.
Your question doesn't really seem to be about entity framework at all, and is more regarding sql transactions. A sql transaction is a single 'atomic' change. That is to say that either all the changes are committed, or none are committed.
You don't really have an example which covers the scenario, but if you added another example like:
using (TestDbContext db = new TestDbContext())
{
// operation 1
db.SaveChanges();
// operation 2
db.SaveChanges();
}
...in this example, if your first operation saved successfully, but the second operation failed, you could have a situation where data committed at the first step is potentially invalid.
That's why you would use a sql transaction, to wrap both SaveChanges into a single operation that means either all data is committed, or none is committed.

How can I queue MySQL queries to make them sequential rather than concurrent and prevent excessive I/O usage?

I have multiple independent processes each submitting bulk insert queries (millions of rows) into a MySQL database, but it is much slower to have them run concurrently than in sequence.
How can I throttle the execution of these queries so that only one at a time can be executed, in sequence?
I have thought of checking if PROCESSLIST contains any running queries but it may not be the best way to properly queue queries on a real first-come, first-queued, first-served basis.
I am using C# and the MySQL Connector for .NET.
I'm guessing that you're using InnoDB (which allows concurrent writes). MyISAM only has table-level locking, so it would queue up the writes.
I'd recommend an approach similar to ruakh's, but using a table in the database to manage the locking. The table would be called something like lock_control.
Just before you try to do a bulk insert to the table you request a LOCK TABLES lock_control WRITE on this lock_control table. If you are granted the lock then continue with your bulk write and afterwards release the lock. If the table is write locked by another thread then the LOCK TABLES command will block until the lock is released.
You could do this locking with the table you're inserting into directly but I believe that no other thread would be able to read from the table either whilst you hold the lock.
The advantage of doing this locking in the db rather than on the filesystem is that you could have inserts coming in from multiple client machines, and it somehow feels a little simpler to handle the locking/inserting all within MySQL.
I am not sure if this will help, but here goes. I had a similar problem where my program was throwing exceptions because MySQL queries were running out of order. So I decided to run my queries in sequence so that succeeding queries don't fail. I found a solution here:
https://michaelscodingspot.com/c-job-queues/
public class job_queue
{
private ConcurrentQueue<Action> _jobs = new ConcurrentQueue<Action>();
private bool _delegateQueuedOrRunning = false;
public void Enqueue(Action job)
{
lock (_jobs)
{
_jobs.Enqueue(job);
if (!_delegateQueuedOrRunning)
{
_delegateQueuedOrRunning = true;
ThreadPool.UnsafeQueueUserWorkItem(ProcessQueuedItems, null);
}
}
}
private void ProcessQueuedItems(object ignored)
{
while (true)
{
Action item;
lock (_jobs)
{
if (_jobs.Count == 0)
{
_delegateQueuedOrRunning = false;
break;
}
_jobs.TryDequeue(out item);
}
try
{
//do job
item();
}
catch
{
ThreadPool.UnsafeQueueUserWorkItem(ProcessQueuedItems, null);
throw;
}
}
}
}
This is a class that runs methods one after another in a queue. You add the methods that contain MySQL queries to job_queue like this:
var mysql_tasks= new job_queue();
mysql_tasks.Enqueue(() => { Your_MYSQL_METHOD_HERE(); });
Create a Bulk insert service / class
Get the client to throw the Data at it.
It does them one at a time. Message back done if you need it.
You don't want to be choking your DB to one thread, that will kill everything else.
Not being much of a C#-er, I can't say if this is the best way to do this; but if no one gives a better answer, one common, non-language-specific approach to this sort of thing is to use a temporary file on the file-system. Before performing one of these INSERTs, grab a write-lock on the file, and after the INSERT is done, release the write-lock. (You'll want to use a using or finally block for this.) This answer gives sample code for obtaining a write-lock in C# in a blocking way.
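The lock-file idea could look roughly like this in C#. It is a sketch, not the sample the answer refers to: FileLock and Acquire are hypothetical names, and the helper polls rather than blocking in the operating system, so it gives mutual exclusion but no first-come, first-served ordering. It also assumes the directory of the lock file exists, since any other IOException would be retried forever.

```csharp
using System.IO;
using System.Threading;

public static class FileLock
{
    // Keeps retrying until an exclusive handle on lockPath is obtained.
    // Disposing the returned stream releases the lock.
    public static FileStream Acquire(string lockPath, int retryDelayMs = 100)
    {
        while (true)
        {
            try
            {
                // FileShare.None makes any other attempt to open the file fail.
                return new FileStream(lockPath, FileMode.OpenOrCreate,
                    FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException)
            {
                // Another process holds the lock: wait a little and retry.
                Thread.Sleep(retryDelayMs);
            }
        }
    }
}
```

Wrapping the bulk INSERT in a using block around FileLock.Acquire(...) releases the lock even if the insert throws. The exclusive-open behavior is well established on Windows. On other platforms it is worth verifying for your environment.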

Threading: allow one thread to access data while blocking others, and then stop blocked threads from executing the same code

Imagine the simplest DB access code with some in-memory caching:
if exists in cache
return object
else
get from DB
add to cache
return object
Now, if the DB access takes a second and I have, say, 5 ASP.Net requests/threads hitting that same code within that second, how can I ensure only the first one does the DB call? I have a simple thread lock around it, but that simply queues them up in an orderly fashion, allowing each to call the DB in turn. My data repositories basically read in entire tables in one go, so we're not talking about Get by Id data requests.
Any ideas on how I can do this? Thread wait handles sound almost what I'm after but I can't figure out how to code it.
Surely this must be a common scenario?
Existing pseudocode:
lock (threadLock)
{
get collection of entities using Fluent NHib
add collection to cache
}
Thanks,
Col
You've basically answered your own question. The "lock()" is fine, it prevents the other threads proceeding into that code while any other thread is in there. Then, inside the lock perform your first pseudo-code. Check if it's cached already, if not, retrieve the value and cache it. The next thread will then come in, check the cache, find it's available and use that.
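Put into code, the check-inside-the-lock approach might look like the sketch below. EntityCache, GetAll and the loadFromDb delegate are hypothetical names. The delegate stands in for the Fluent NHibernate query that reads the whole table.

```csharp
using System;
using System.Collections.Generic;

public sealed class EntityCache
{
    private readonly object _threadLock = new object();
    private readonly Func<IReadOnlyList<string>> _loadFromDb;

    // volatile so the lock-free read below sees a fully published list.
    private volatile IReadOnlyList<string> _cached;

    public EntityCache(Func<IReadOnlyList<string>> loadFromDb)
    {
        _loadFromDb = loadFromDb;
    }

    public IReadOnlyList<string> GetAll()
    {
        // Fast path: once the cache is warm, nobody takes the lock.
        IReadOnlyList<string> snapshot = _cached;
        if (snapshot != null)
        {
            return snapshot;
        }

        lock (_threadLock)
        {
            // Re-check inside the lock: threads that were queued behind the
            // first one find the cache filled and skip the DB call.
            if (_cached == null)
            {
                _cached = _loadFromDb();
            }
            return _cached;
        }
    }
}
```

With five requests arriving while the cache is cold, the first one runs the loader and the other four wait on the lock, then return the cached list without touching the database.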
Surely this must be a common scenario?
Not necessarily as common as you may think.
In many similar caching scenarios:
the race condition you describe doesn't happen frequently (it requires multiple requests to arrive when the cache is cold)
the data returned from the database is readonly, and data returned by multiple requests is essentially interchangeable.
the cost of retrieving the database is not so prohibitive that it matters.
But if in your scenario you absolutely need to prevent this race condition, then use a lock as suggested by Roger Perkins.
I'd use a Monitor/Mutex rather than lock. With lock you need to specify a resource to lock on (you can use the this pointer, but that is not recommended).
Try the following instead:
Mutex myMutex = new Mutex();
// if you want it system-wide, use a named mutex
// Mutex myMutex = new Mutex(false, "SomeUniqueName");
myMutex.WaitOne();
// or
//if (myMutex.WaitOne(<ms>))
//{
// //thread has access
//}
//else
//{
// //thread has no access
//}
<INSERT CODE HERE>
myMutex.ReleaseMutex();
I don't know whether a general solution or an established algorithm exists.
Personally, I use the code pattern below to solve problems like this.
1) Define an integer variable that can be accessed by all threads.
int accessTicket = 0;
2) Modify code block
int myTicket = accessTicket;
lock (threadLock)
{
if (myTicket == accessTicket)
{
++accessTicket;
//get collection of entities using Fluent NHib
//add collection to cache
}
}
UPDATE
The purpose of this code is not to prevent concurrent DB access or duplicate caching; we can do that with a normal thread lock.
By using the access ticket like this, we can prevent other threads from redoing work that has already finished.
UPDATE#2
Look, there is a lock (threadLock).
Look before commenting.
Look carefully before voting down.
