We have a table with a key field, and another table that holds the current value of that key sequence, i.e. to insert a new record you need to:
UPDATE seq SET key = key + 1
SELECT key FROM seq
INSERT INTO table (id...) VALUES (@key...)
Today I have been investigating collisions, and have found that without transactions the above code, run in parallel, induces collisions. However, swapping the UPDATE and SELECT lines does not induce collisions, i.e.:
SELECT key + 1 FROM seq
UPDATE seq SET key = key + 1
INSERT INTO table (id...) VALUES (@key...)
Can anyone explain why? (I am not interested in better ways to do this; I am going to use transactions, and I cannot change the database design. I am just interested in why we observed what we did.)
I am running the two lines of SQL as a single string using C#'s SqlConnection, SqlCommand and SqlDataAdapter.
First off, your queries do not entirely make sense. Here's what I presume you are actually doing:
UPDATE seq SET key = key + 1
SELECT @key = key FROM seq
INSERT INTO table (id...) VALUES (@key...)
and
SELECT @key = key + 1 FROM seq
UPDATE seq SET key = @key
INSERT INTO table (id...) VALUES (@key...)
You're experiencing concurrency issues tied to the Transaction Isolation Level.
Transaction Isolation Levels represent a compromise between the need for concurrency (i.e. performance) and the need for data quality (i.e. accuracy).
By default, SQL Server uses the Read Committed isolation level, which means you can't get "dirty" reads (reads of data that has been modified by another transaction but not yet committed to the table). It does not, however, mean that you are immune from other types of concurrency issues.
In your case, the issue you are having is called a non-repeatable read.
In your first example, the first line is reading the key value, then updating it. (In order for the UPDATE to set the column to key+1 it must first read the value of key). Then the second line's SELECT is reading the key value again. In a Read Committed or Read Uncommitted isolation level, it is possible that another transaction meanwhile completes an update to the key field, meaning that line 2 will read it as key+2 instead of the expected key+1.
Now, with your second example, once the key value has been read and modified and placed in the @key variable, it is not being read again. This prevents the non-repeatable read issue, but you're still not totally immune from concurrency problems. What can happen in this scenario is a lost update, in which two or more transactions end up trying to update key to the same value, and subsequently inserting duplicate keys into the table.
To be absolutely certain of having no concurrency problems with this structure as designed, you will need to use locking hints to ensure that all reads and updates to key are serializable (i.e. not concurrent). This will have horrendous performance, but "WITH (UPDLOCK, HOLDLOCK)" will get you there.
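For illustration, a minimal sketch of the hinted version of your second example (other columns elided; key and table need brackets because they are reserved words):
BEGIN TRANSACTION
DECLARE @key int
-- UPDLOCK + HOLDLOCK: take an update lock on the read and hold it until the
-- transaction ends, so no other session can read-then-update in between
SELECT @key = [key] + 1 FROM seq WITH (UPDLOCK, HOLDLOCK)
UPDATE seq SET [key] = @key
INSERT INTO [table] (id) VALUES (@key)
COMMIT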
Your best solution, if you cannot change the database design, is to find someone who can. As Brian Hoover indicated, an auto-incrementing IDENTITY column is the way to do this with superb performance. The way you're doing it now reduces SQL's V-8 engine to one that is only allowed to fire on one cylinder.
Related
I have a table where the primary key is an incrementing int 'ID' that I have to set manually. I know an auto-incrementing int (IDENTITY) would have been the best option, but I can't change the existing table design.
So I need to make the read-then-write operation atomic, something like:
Lock table
Read the MAX value of the existing IDs
Add new record with Primary Key = ID+1
Release table
What is the correct way to lock the table in a multiuser environment? I suppose it's a mix of transactions and the use of TABLOCKX. I need to ensure:
No deadlocks
If something fails, the table should not stay locked (for example, if the program fails and exits while trying to write, and no COMMIT/ROLLBACK is issued). I don't even know if this is possible.
NOTE: The database is also used by other applications that, I assume, take care of this problem themselves.
EDITED: Could this be considered atomic enough to be a solution?
INSERT INTO MYTABLE (ID, OtherFields...) VALUES ((Select Max(ID)+1 from MYTABLE), 'values'...)
Attempting to roll your own auto-increment mechanism using table locks is almost bound to fail. However, since you wrote that you can't change the existing table, I would suggest using a sequence to get the next number instead of locking the table.
CREATE SEQUENCE dbo.MySequence -- Don't use this name, please!
AS int -- note: the default is bigint
START WITH 1
INCREMENT BY 1
NO CYCLE;
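The insert from your edit could then use the sequence directly, something like this (keeping your hypothetical MYTABLE):
INSERT INTO MYTABLE (ID, OtherFields)
VALUES (NEXT VALUE FOR dbo.MySequence, 'values');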
This has some1 of the benefits of an identity column, without having to add an identity column to your table.
You can also use the sequence to generate a default value for a column (assuming adding a default constraint doesn't count as "changing the existing table structure", of course). See example D in the official documentation:
ALTER TABLE dbo.YourTableName
ADD CONSTRAINT YourTableName_id_default
DEFAULT NEXT VALUE FOR MySequence
FOR Id;
1 The benefits are that you don't need to take locks or calculate the next number yourself.
However, you should know that unlike an identity column, this doesn't protect you from updates to the id column, nor does it protect you from insert statements that explicitly insert a value to this column (without using next value for).
The first problem can be quite easily solved with an instead-of-update trigger on the table that will only update columns that aren't the id column, but I'm not sure how to solve the other problem.
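For what it's worth, a rough sketch of such a trigger; this variant simply rejects any statement that touches the id column (table and column names are made up):
CREATE TRIGGER dbo.MyTable_ProtectId
ON dbo.MyTable
INSTEAD OF UPDATE
AS
BEGIN
    -- UPDATE(Id) is true whenever Id appears in the SET list
    IF UPDATE(Id)
    BEGIN
        RAISERROR('The Id column is read-only.', 16, 1);
        RETURN;
    END;
    -- Id untouched, so it is safe to join on it and update the rest
    UPDATE t
    SET t.OtherColumn = i.OtherColumn
    FROM dbo.MyTable AS t
    JOIN inserted AS i ON t.Id = i.Id;
END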
So if the other processes are correctly handling the locking, you could do exactly what you mentioned (lock, get last ID, insert, and release) by executing something similar to the following:
DECLARE @MaxID INT
BEGIN TRY
    BEGIN TRANSACTION
    SELECT
        @MaxID = MAX(I.ID)
    FROM
        MyTable AS I WITH (TABLOCKX, HOLDLOCK) -- TABLOCKX: exclusive table lock; HOLDLOCK: held until the end of the transaction
    INSERT INTO MyTable (
        ID,
        OtherColumn)
    SELECT
        ID = ISNULL(@MaxID + 1, 1),
        OtherColumn = 'Other values'
    COMMIT
END TRY
BEGIN CATCH
    -- Handle your error logging and roll back the transaction so the table locks are released; a basic example:
    DECLARE @ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
    IF @@TRANCOUNT > 0
        ROLLBACK
    RAISERROR(@ErrorMessage, 16, 1)
END CATCH
However, you will still have to do additional work for batch inserts, or if you need the inserted ID to load other related tables.
Also, TABLOCKX is pretty restrictive; there are less restrictive locks, but I believe they might leave you open to concurrency issues. You can check the other locking hints in the docs.
I have created two threads in C# and am calling two separate functions in parallel. Both functions read the last ID from the XYZ table and insert a new record with the value ID+1. Here the ID column is the primary key. When I execute both functions I get a primary key violation error. Both functions run the query below:
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
It seems both functions are reading the value at the same time and trying to insert with the same value.
How can I solve this problem?
Let the database handle selecting the ID for you. It's obvious from your code above that what you really want is an auto-incrementing integer ID column, which the database can definitely handle for you. So set up your table properly and, instead of your current insert statement, do this:
insert into XYZ values('Name')
If your database table is already set up I believe you can issue a statement similar to:
alter table your_table modify column your_table_id int(size) auto_increment
Finally, if none of these solutions are adequate for whatever reason (including, as you indicated in the comments section, the inability to edit the table schema), then you can do as one of the other users suggested in the comments and create a synchronized method to find the next ID. You would basically just create a static method that returns an int, issue your select-id statement in that static method, and use the returned result to insert your next record into the table. Since this method would not guarantee a successful insert (due to external applications' ability to also insert into the same table), you would also have to catch exceptions and retry on failure.
Set the ID column to be an "Identity" column. Then, you can execute your queries as:
insert into XYZ values('Name')
I think you can't use ALTER TABLE to make an existing column an identity column. Use Management Studio to set the column to be an identity. If your table has many rows, this can be a long-running process, because it will actually copy your data to a new table (it performs a table re-creation).
Most likely that option is disabled in your Management Studio. To enable it, open Tools->Options->Designers and uncheck the option "Prevent saving changes that require table re-creation". Depending on your table size, you will probably have to increase the timeout, too. Your table will be locked during that time.
A solution for such problems is to generate the ID using some kind of sequence.
For example, in SQL Server you can create a sequence using the command below:
CREATE SEQUENCE Test.CountBy1
START WITH 1
INCREMENT BY 1 ;
GO
Then in C#, you can retrieve the next value out of Test.CountBy1 and assign it to the ID before inserting.
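For example, getting the next value is just a scalar query, which you could run from C# via SqlCommand.ExecuteScalar:
SELECT NEXT VALUE FOR Test.CountBy1;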
It sounds like you want a higher transaction isolation level or more restrictive locking.
I don't use these features too often, so hopefully somebody will suggest an edit if I'm wrong, but you want one of these:
-- specify the strictest isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
or
-- make locks exclusive so other transactions cannot access the same rows
insert into XYZ values((SELECT max(ID)+1 from XYZ WITH (XLOCK)),'Name')
I am making an invoicing system with support for multiple subsidiaries, each of which has its own set of invoice numbers; therefore I have a table with a primary key of (Subsidiary, InvoiceNo).
I cannot use a MySQL auto-increment field, as that would keep incrementing a single count across all subsidiaries.
I don't want to make separate tables for each subsidiary, as new subsidiaries will be added as needed...
I am currently using "SELECT MAX(ID) ... WHERE Subsidiary = X" on my table and adding the invoice according to this.
I am using NHibernate, and the Invoice insert comes before the InvoiceItem insert, so if the Invoice insert fails, the InvoiceItem insert will not be carried out. Instead I will catch the exception, re-retrieve the MAX(ID) and try again.
What is the problem with this approach? And if there is one, what is an alternative?
The reason for asking is that I read one of the answers on this question: Nhibernate Criteria: 'select max(id)'
This is a very bad idea to use when generating primary keys. My advice is as follows:
Do not give primary keys a business meaning (synthetic keys);
Use a secondary mechanism for generating the invoice numbers.
This will make your life a lot easier. The mechanism for generating invoice numbers can then e.g. be a table that looks something like:
Subsidiary;
NextInvoiceNumber.
This will separate the internal numbering from how the database works.
With such a mechanism, you will be able to use auto increment fields again, or even better, GUIDs.
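For example, in MySQL such a table can be consumed atomically with the LAST_INSERT_ID(expr) trick (table and column names here are mine, not from the question):
-- Reserve the next number and bump the counter in one atomic statement
UPDATE InvoiceNumbers
SET NextInvoiceNumber = LAST_INSERT_ID(NextInvoiceNumber + 1)
WHERE Subsidiary = 42;
-- LAST_INSERT_ID() now returns the value just reserved
SELECT LAST_INSERT_ID();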
Some links with reading material:
http://fabiomaulo.blogspot.com/2008/12/identity-never-ending-story.html
http://nhforge.org/blogs/nhibernate/archive/2009/02/09/nh2-1-0-new-generators.aspx
As you say, the problem with this approach is that multiple sessions might try to insert the same invoice ID. You get a unique constraint violation, have to try again, that might fail as well, and so on.
I solve such problems by locking the subsidiary during the creation of new invoices. However, don't lock the table: (a) if you are using InnoDB, a LOCK TABLES command by default commits the current transaction; (b) there is no reason why invoices for two different subsidiaries shouldn't be added at the same time, as they have independent invoice numbers.
What I would do in your situation is:
Open an transaction and make sure your tables are InnoDB.
Lock the subsidiary with a SELECT ... FOR UPDATE command. This can be done using LockMode.UPGRADE in NHibernate.
Find the max ID using the MAX(..) function and do the insert
Commit the transaction
This serializes all invoice inserts for one subsidiary (i.e. only one session can do such an insert at once; any second attempt will wait until the first is complete or has rolled back), but that's what you want. You don't want holes in your invoice numbers (e.g. if you insert invoice id 3485 and then it fails, then there are invoices 3484 and 3486 but no 3485).
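A sketch of those four steps in plain SQL, with made-up table and column names:
START TRANSACTION;
-- Steps 1-2: lock the subsidiary row; concurrent inserts for the same
-- subsidiary block here until this transaction commits or rolls back
SELECT SubsidiaryId FROM Subsidiary WHERE SubsidiaryId = 42 FOR UPDATE;
-- Step 3: compute MAX + 1 and insert in one statement
INSERT INTO Invoice (SubsidiaryId, InvoiceNo, Total)
SELECT 42, IFNULL(MAX(InvoiceNo), 0) + 1, 99.95
FROM Invoice
WHERE SubsidiaryId = 42;
-- Step 4: commit, releasing the row lock
COMMIT;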
I have a SQL Server table of licence keys/serial numbers.
The table structure is something like:
[
RecordId int,
LicenceKey string,
Status int (available, locked, used, expired etc.)
AssignedTo int (customerId)
....
]
Through my ASP.NET application, when the user decides to buy a licence by clicking the accept button, I need to reserve a licence key for the user.
My approach is:
Select top 1 licenceKey from KeysTable Where Status = available
Update KeysTable Set status = locked
then return the key back to the application.
My concern is that two ASP.NET threads might access the same record and return the same licence key.
What do you think is the best practice for doing such assignments? Is there a well-known approach or pattern for this kind of problem?
Where should I use lock() statements, if I need any?
I'm using SQL Server 2005, stored procedures for data access, a DataLayer, a BusinessLayer and an ASP.NET GUI.
Thanks
There's probably no need to use explicit locks or transactions in this case.
In your stored procedure you can update the table and retrieve the license key in a single, atomic operation by using an OUTPUT clause in your UPDATE statement.
Something like this:
UPDATE TOP (1) KeysTable
SET Status = 'locked'
OUTPUT INSERTED.LicenseKey
-- if you want more than one column...
-- OUTPUT INSERTED.RecordID, INSERTED.LicenseKey
-- if you want all columns...
-- OUTPUT INSERTED.*
WHERE Status = 'available'
To achieve what you're talking about, you'll want to use a serializable transaction. To do this, follow this pattern:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
--Execute select
--Execute update
COMMIT TRANSACTION
However, why do you have a table with every possible license key? Why not have a key generation algorithm, then create a new key when a user purchases it?
You could also try using locks (in SQL) in addition to transactions, to ensure that only one thread has access at a time.
I believe that an application lock may be of help here.
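Something like this inside the stored procedure (the resource name 'ReserveLicenceKey' is arbitrary; check sp_getapplock's return value in real code):
BEGIN TRANSACTION;
-- Serialize all key reservations behind one named application lock
EXEC sp_getapplock
    @Resource = 'ReserveLicenceKey',
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction', -- released automatically at commit/rollback
    @LockTimeout = 5000;        -- ms to wait before giving up
-- ... select an available key and mark it locked here ...
COMMIT; -- releases the application lock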
I think you should actually mark the key as unavailable in the same stored proc in which you query for it, because otherwise there will always be some sort of race condition. Manually locking tables is not good practice IMHO.
If you have a two staged process (e.g. like booking airline tickets), you could introduce a concept of reserving a key for a specified period of time (e.g. 30 mins), so that when you query for a new key, you reserve it at the same time.
EDIT: Locking in business logic would probably work if you can guarantee that only one process is going to change the database, but it is much better to do it at the database level, preferably in a single stored proc. To do it correctly you have to set the transaction level and use transactions in the database, just as @Adam Robinson suggested in his answer.
I have a legacy data table in SQL Server 2005 that has a PK with no identity/autoincrement and no power to implement one.
As a result, I am forced to create new records in ASP.NET manually via the ol' "SELECT MAX(id) + 1 FROM table"-before-insert technique.
Obviously this creates a race condition on the ID in the event of simultaneous inserts.
What's the best way to gracefully resolve the event of a race collision? I'm looking for VB.NET or C# code ideas along the lines of detecting a collision and then re-attempting the failed insert by getting yet another max(id) + 1. Can this be done?
Thoughts? Comments? Wisdom?
Thank you!
NOTE: What if I cannot change the database in any way?
Create an auxiliary table with an identity column. Within a transaction, insert into the aux table, retrieve the generated value, and use it to insert into your legacy table. At that point you can even delete the row inserted in the aux table; the point is just to use it as a source of incremented values.
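A minimal sketch of the idea, with made-up names:
-- One-time setup: all we need is the identity column
CREATE TABLE dbo.IdSource (Id INT IDENTITY(1,1) PRIMARY KEY);
-- Per insert:
BEGIN TRANSACTION;
INSERT INTO dbo.IdSource DEFAULT VALUES;
DECLARE @NewId INT = SCOPE_IDENTITY();
INSERT INTO dbo.LegacyTable (Id, SomeColumn) VALUES (@NewId, 'value');
DELETE FROM dbo.IdSource WHERE Id = @NewId; -- optional: keep the aux table empty
COMMIT;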
Not being able to change the database schema is harsh.
If you insert an existing PK into the table, you will get a SqlException with a message indicating a PK constraint violation. Catch this exception and retry the insert a few times until you succeed. If you find that the collision rate is too high, you may try max(id) + <small-random-int> instead of max(id) + 1. Note that with this approach your ids will have gaps and the id space will be exhausted sooner.
Another possible approach is to emulate the autoincrementing id outside of the database. For instance, create a static integer, Interlocked.Increment it every time you need the next id, and use the returned value. The tricky part is to initialize this static counter to a good value. I would do it with Interlocked.CompareExchange:
class Autoincrement {
    static int id = -1;

    public static int NextId() {
        if (id == -1) {
            // Not initialized yet - seed the counter from the database
            int lastId = <select max(id) from db>;
            // Set id to lastId only if it is still -1
            // (another thread may have initialized it already)
            Interlocked.CompareExchange(ref id, lastId, -1);
        }
        // Get the next id atomically
        return Interlocked.Increment(ref id);
    }
}
Obviously the latter works only if all inserted ids are obtained via Autoincrement.NextId within a single process.
The key is to do it in one statement or one transaction.
Can you do this?
INSERT INTO table (PKcol, col2, col3, ...)
SELECT (SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...
Without testing, this will probably work too:
INSERT INTO table (PKcol, col2, col3, ...)
VALUES ((SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...)
If you can't, another way is to do it in a trigger.
The trigger is part of the INSERT transaction
Use HOLDLOCK, UPDLOCK for the MAX. This holds the row lock until commit
The row being updated is locked for the duration
A second insert will wait until the first completes.
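A rough sketch of such a trigger (made-up table and column names; note it silently replaces whatever PK value the client supplied):
CREATE TRIGGER dbo.LegacyTable_AssignId
ON dbo.LegacyTable
INSTEAD OF INSERT
AS
BEGIN
    -- HOLDLOCK + UPDLOCK keep the MAX stable until the INSERT's
    -- transaction commits, so a second insert waits for the first
    INSERT INTO dbo.LegacyTable (Id, SomeColumn)
    SELECT (SELECT ISNULL(MAX(Id), 0)
            FROM dbo.LegacyTable WITH (HOLDLOCK, UPDLOCK))
           + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), -- multi-row safe
           i.SomeColumn
    FROM inserted AS i;
END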
The downside is that you are changing the primary key.
An auxiliary table needs to be part of a transaction.
Or change the schema as suggested...
Note: All you need is a source of ever-increasing integers. It doesn't have to come from the same database, or even from a database at all.
Personally, I would use SQL Express because it is free and easy.
If you have a single web server:
Create a SQL Express database on the web server with a single table [ids] with a single autoincrementing field [new_id]. Insert a record into this [ids] table, get the [new_id], and pass that on to your database layer as the PK of the table in question.
If you have multiple web servers:
It's a pain to set up, but you can use the same trick by setting an appropriate seed/increment on each server (e.g. increment = 3, and seeds of 1/2/3 for three web servers).
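A sketch of the per-server setup (each server gets its own small database):
-- Server 1:
CREATE TABLE ids (new_id INT IDENTITY(1, 3) PRIMARY KEY);
-- Server 2: CREATE TABLE ids (new_id INT IDENTITY(2, 3) PRIMARY KEY);
-- Server 3: CREATE TABLE ids (new_id INT IDENTITY(3, 3) PRIMARY KEY);
-- On any server, to get a new PK value:
INSERT INTO ids DEFAULT VALUES;
SELECT SCOPE_IDENTITY() AS new_id;
Server 1 then hands out 1, 4, 7, ..., server 2 hands out 2, 5, 8, ..., and so on, so the servers can never collide.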
What about running the whole batch (the select for the id and the insert) in a serializable transaction?
That should get you around the need to make changes to the database.
Is the main concern concurrent access? I mean, will multiple instances of your app (or, God forbid, other apps outside your control) be performing inserts concurrently?
If not, you can probably manage the inserts through a central, synchronized module in your app, and avoid race conditions entirely.
If so, well... like Joel said, change the database. I know you can't, but the problem is as old as the hills, and it's been solved well -- at the database level. If you want to fix it yourself, you're just going to have to loop (insert, check for collisions, delete) over and over and over again. The fundamental problem is that you can't perform a transaction (I don't mean that in the SQL "TRANSACTION" sense, but in the larger data-theory sense) if you don't have support from the database.
The only further thought I have is that if you at least have control over who has access to the database (e.g., only "authorized" apps, either written or approved by you), you could implement a side-band mutex of sorts, where a "talking stick" is shared by all the apps and ownership of the mutex is required to do an insert. That would be its own hairy ball of wax, though, as you'd have to figure out policy for dead clients, where it's hosted, configuration issues, etc. And of course a "rogue" client could do inserts without the talking stick and hose the whole setup.
The best solution is to change the database. You may not be able to change the column to be an identity column, but you should be able to make sure there's a unique constraint on the column and add a new identity column seeded with your existing PKs. Then either use the new column instead, or use a trigger to make the old column mirror the new, or both.