What is the best way to test if rows are being locked while they are being updated?
I have a query which selects the top x records of a table and updates them, but multiple worker threads will be calling the same query, so I want to ensure these rows are locked, and that an error of some sort is thrown if they are locked so that I can handle it accordingly. But I don't seem to be able to throw an exception that I can catch in .NET (or in SQL, for that matter).
The query looks like:
BEGIN TRANSACTION
UPDATE MyTable WITH (ROWLOCK)
SET x = @X,
    y = @Y
WHERE ID IN (SELECT TOP 50 ID
             FROM MyTable
             WHERE Z IS NULL)
SELECT TOP 50 x
FROM MyTable
WHERE x = @W
COMMIT TRANSACTION
I've tried stepping through in the SQL debugger, executing just the BEGIN TRANSACTION and the UPDATE, and then calling the same query from my .NET application. I expected an error, but it just ran fine in .NET.
So my question is: how can I generate an error so that I know the records are currently being updated? I want to take a specific action when this occurs, e.g. retry in x milliseconds.
Thanks.
Based on your recent comments (please add this information to your question body), you just want to make sure that each thread only “gets” rows that are not made available to the other threads. Like picking tasks from a table of pending tasks, and making sure no task is picked up by two threads.
You are overthinking the locking. Your problem is not something that requires fiddling with the SQL locking mechanism. You might fiddle with locking if you need to tune for performance, but you're far from being able to establish whether that is needed. I wouldn't even bother with lock hints at this stage.
What you want to do is have a field in the row that indicates whether the row has been taken, and by whom. Your made-up sample T-SQL doesn’t use anything consistently, but the closest thing is the Z column.
You need to select rows that have a value of NULL (not taken yet) in Z. You are clearly on that track already. This field could be a BIT Yes/No and it can be made to work (look up the OUTPUT clause of UPDATE to see how you can pick up which rows were selected), but I suspect you will find it far more useful, for tracing/debugging purposes, to be able to identify rows taken together by looking at the database alone. I would use a column that can hold a unique value that cannot be used by any other thread at the same time.
There are a number of approaches. You could use a (client process) Thread ID or similar, but that will repeat over time which might be undesirable. You could create a SQL SEQUENCE and use those values, which has the nice feature of being incremental, but that makes the SQL a little harder to understand. I’ll go with a GUID for illustration purposes.
DECLARE @BatchId AS UNIQUEIDENTIFIER
SET @BatchId = NEWID()

UPDATE MyTable
SET x = @X,
    y = @Y,
    Z = @BatchId
WHERE ID IN (SELECT TOP 50 ID
             FROM MyTable
             WHERE Z IS NULL)
Now only your one thread can use those rows (assuming of course that no thread cheats and violates the code pattern). Notice that I didn’t even open an explicit transaction. In SQL Server, UPDATEs are transactionally atomic by default (they run inside an implicit transaction). No thread can pick those rows again because no thread (by default) is allowed to update or even see those rows until the UPDATE has been committed in its entirety (for all rows). Any thread that tries to run the UPDATE at the same time either gets different unassigned rows (depending on your locking hints and how cleverly you select the rows to pick - that’s part of the “advanced” performance tuning), or is paused for a few milliseconds waiting for the first UPDATE to complete.
That’s all you need. Really. The rows are yours and yours only:
SELECT @BatchId AS BatchId, x
FROM MyTable
WHERE Z = @BatchId
Now your code gets the data from its dedicated rows, plus the unique ID which it can use later to reference the same rows, if you need to (take out the BatchId from the return set if you truly don’t need it).
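Incidentally, the OUTPUT clause mentioned earlier can collapse the claim and the read into a single statement; a minimal sketch using the same made-up columns:

DECLARE @BatchId AS UNIQUEIDENTIFIER
SET @BatchId = NEWID()

-- Claim the rows and return them in one atomic statement
UPDATE MyTable
SET x = @X,
    y = @Y,
    Z = @BatchId
OUTPUT INSERTED.ID, INSERTED.x, INSERTED.Z AS BatchId
WHERE ID IN (SELECT TOP 50 ID
             FROM MyTable
             WHERE Z IS NULL)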
If my reading of what you are trying to do is correct, you will probably want to flag the rows when your code is done by setting another field to a flag value or a timestamp or something. That way you will be able to tell if rows were “orphaned” because they were taken for processing but the process died for any reason (in which case they probably need to be made available again by setting Z to NULL again).
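A sketch of that cleanup, assuming hypothetical Done and TakenAt columns (the claiming UPDATE would have to stamp TakenAt as well):

-- Mark the batch as finished once processing succeeds
UPDATE MyTable
SET Done = 1
WHERE Z = @BatchId

-- Periodically release rows that were taken but never finished
UPDATE MyTable
SET Z = NULL
WHERE Done = 0
  AND Z IS NOT NULL
  AND TakenAt < DATEADD(HOUR, -1, GETDATE())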
Related
We have a table with a key field, and another table which contains the current value of that key sequence, i.e. to insert a new record you need to:
UPDATE seq SET key = key + 1
SELECT key FROM seq
INSERT INTO table (id...) VALUES (@key...)
Today I have been investigating collisions, and have found that, without transactions, the above code run in parallel induces collisions. However, swapping the UPDATE and SELECT lines does not, i.e.:
SELECT key + 1 FROM seq
UPDATE seq SET key = key + 1
INSERT INTO table (id...) VALUES (@key...)
Can anyone explain why? (I am not interested in better ways to do this, I am going to use transactions, and I cannot change the database design, I am just interested in why we observed what we did.)
I am running the two lines of SQL as a single string using C#'s SqlConnection, SqlCommand and SqlDataAdapter.
First off, your queries do not entirely make sense. Here's what I presume you are actually doing:
UPDATE seq SET key = key + 1
SELECT @key = key FROM seq
INSERT INTO table (id...) VALUES (@key...)
and
SELECT @key = key + 1 FROM seq
UPDATE seq SET key = @key
INSERT INTO table (id...) VALUES (@key...)
You're experiencing concurrency issues tied to the Transaction Isolation Level.
Transaction Isolation Levels represent a compromise between the need for concurrency (i.e. performance) and the need for data quality (i.e. accuracy).
By default, SQL Server uses the Read Committed isolation level, which means you can't get "dirty" reads (reads of data that has been modified by another transaction but not yet committed to the table). It does not, however, mean that you are immune from other types of concurrency issues.
In your case, the issue you are having is called a non-repeatable read.
In your first example, the first line is reading the key value, then updating it. (In order for the UPDATE to set the column to key+1 it must first read the value of key). Then the second line's SELECT is reading the key value again. In a Read Committed or Read Uncommitted isolation level, it is possible that another transaction meanwhile completes an update to the key field, meaning that line 2 will read it as key+2 instead of the expected key+1.
Now, with your second example, once the key value has been read and modified and placed in the #key variable, it is not being read again. This prevents the non-repeatable read issue, but you're still not totally immune from concurrency problems. What can happen in this scenario is a lost update, in which two or more transactions end up trying to update key to the same value, and subsequently inserting duplicate keys to the table.
To be absolutely certain of having no concurrency problems with this structure as designed, you will need to use locking hints to ensure that all reads and updates to key are serialized (i.e. not concurrent). This will have horrendous performance, but WITH (UPDLOCK, HOLDLOCK) will get you there.
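A minimal sketch of that pattern, assuming the single-row seq table from the question:

BEGIN TRANSACTION

-- UPDLOCK/HOLDLOCK hold the lock on the row until COMMIT,
-- so concurrent callers of this block are serialized
DECLARE @key INT
SELECT @key = [key] + 1 FROM seq WITH (UPDLOCK, HOLDLOCK)

UPDATE seq SET [key] = @key
INSERT INTO [table] (id) VALUES (@key)

COMMIT TRANSACTION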
Your best solution, if you cannot change the database design, is to find someone who can. As Brian Hoover indicated, an auto-incrementing IDENTITY column is the way to do this with superb performance. The way you're doing it now reduces SQL's V-8 engine to one that is only allowed to fire on one cylinder.
I want to lock one record and then no one may make changes to that record. When I release the lock, then people may change the record.
In the meantime that a record is locked, I want to show the user a warning that the record is locked and that changes are not allowed.
How can I do this?
I've tried all the IsolationLevel levels, but none of them has the behavior I want. Some of the Isolation levels wait until the lock is released and then make a change. I don't want this, because updating is not allowed at the moment a record is locked.
What can I do to lock a record and deny all changes?
I use SQL Server 2008
With the assumption that this is MS SQL Server, you probably want UPDLOCK, possibly combined with ROWLOCK (see Table hints). I'm having trouble finding a decent article which describes the theory, but here is a quick example:
SELECT id From mytable WITH (ROWLOCK, UPDLOCK) WHERE id = 1
This statement will place an update lock on the row for the duration of the transaction (so it is important to be aware of when the transaction will end). As update locks are incompatible with exclusive locks (required to update records), this will prevent anyone from updating this record until the transaction has ended.
Note that other processes attempting to modify this record will be blocked until the transaction completes, but they will continue with whatever write operation they requested once the transaction has ended (unless they time out or are killed off as a deadlock victim). If you wish to prevent this, your other processes need to use additional hints in order to either abort if an incompatible lock is detected, or skip the record if it has changed.
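For example, a sketch of both variants against the mytable example above (READPAST silently skips rows locked by other transactions; NOWAIT raises error 1222 immediately instead of blocking):

-- Skip rows that are currently locked by other transactions
SELECT id FROM mytable WITH (READPAST, UPDLOCK) WHERE id = 1

-- Or fail immediately with error 1222 if the row is locked
SELECT id FROM mytable WITH (UPDLOCK, NOWAIT) WHERE id = 1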
Also, you should not use this method to lock records while waiting for user input. If that is your intention, you should add some sort of "being modified" column to your table instead.
The SQL Server locking mechanisms are really only suited to preserving data integrity and preventing deadlocks; transactions should generally be kept as short as possible and certainly should not be held open while waiting for user input.
Sql Server has locking hints, but these are limited to the scope of a query.
If the decision to lock the record is taken in an application, you can use the same mechanisms as optimistic locking and deny any changes to the record from the application.
Use a timestamp or GUID as a lock key on the record, and deny access or changes to the record if the wrong locking key is given. Be careful to unlock records again, or you will get orphans.
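A sketch of that idea, with hypothetical records/LockKey/name columns:

-- Claim the record; @@ROWCOUNT = 0 means someone else holds the lock
UPDATE records SET LockKey = @myKey WHERE id = @id AND LockKey IS NULL

-- Only the holder of the key may change the record
UPDATE records SET name = @name WHERE id = @id AND LockKey = @myKey

-- Release the lock when done, or orphaned locks will accumulate
UPDATE records SET LockKey = NULL WHERE id = @id AND LockKey = @myKey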
See this duplicate question on SO.
Basically it's:
begin tran
select * from [table] with(holdlock,rowlock) where id = @id
--Here goes your stuff
commit tran
Something like this maybe?
update t
set t.IsLocked = 1
from [table] t
where t.id = @id
Somewhere in the update trigger:
if exists (
select top 1 1
from deleted d
join inserted i on i.id = d.id
where d.IsLocked = 1 and i.RowVersion <> d.RowVersion)
begin
print 'Row is locked'
rollback tran
end
If you don't want to wait for the lock to be released, and want to show the message as soon as you encounter a lock, did you try NOWAIT? See Table Hints (Transact-SQL) and SQL Server 2008 Table Hints for more details. To get the benefit of NOWAIT you need to lock records on edits; google for more details.
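A sketch of what handling that could look like (error 1222, "lock request time out period exceeded", is raised as soon as a conflicting lock is encountered):

BEGIN TRY
    SELECT * FROM mytable WITH (UPDLOCK, NOWAIT) WHERE id = @id
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 1222
        PRINT 'Record is locked - changes are not allowed right now'
END CATCH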
I have a sql server table of licence keys/serial numbers.
Table structure is something like;
[
RecordId int,
LicenceKey string,
Status int (available, locked, used, expired etc.)
AssignedTo int (customerId)
....
]
Through my ASP.NET application, when the user decides to buy a licence by clicking the accept button, I need to reserve a licence key for the user.
My approach is like,
Select top 1 licenceKey from KeysTable Where Status = available
Update KeysTable Set status = locked
then return the key back to the application.
My concern is that two ASP.NET threads might access the same record and return the same licence key.
What do you think is the best practice for doing such assignments? Is there a well-known approach or pattern for this kind of problem?
Where should I use lock() statements, if I need any?
I'm using Sql Server 2005, stored procedures for data access, a DataLayer a BusinessLayer and Asp.Net GUI.
Thanks
There's probably no need to use explicit locks or transactions in this case.
In your stored procedure you can update the table and retrieve the license key in a single, atomic operation by using an OUTPUT clause in your UPDATE statement.
Something like this:
UPDATE TOP (1) KeysTable
SET Status = 'locked'
OUTPUT INSERTED.LicenseKey
-- if you want more than one column...
-- OUTPUT INSERTED.RecordID, INSERTED.LicenseKey
-- if you want all columns...
-- OUTPUT INSERTED.*
WHERE Status = 'available'
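If you also want to record which customer took the key, the same statement can stamp it in one go; a sketch assuming a @CustomerId parameter:

UPDATE TOP (1) KeysTable
SET Status = 'locked',
    AssignedTo = @CustomerId
OUTPUT INSERTED.RecordID, INSERTED.LicenseKey
WHERE Status = 'available'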
To achieve what you're talking about, you'll want to use a serializable transaction. To do this, follow this pattern:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
--Execute select
--Execute update
COMMIT TRANSACTION
However, why do you have a table with every possible license key? Why not have a key generation algorithm, then create a new key when a user purchases it?
You could also try using locks (in SQL) in addition to transactions, to verify that only one thread has access at a time.
I believe that an application lock may be of help here.
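A sketch of how that could look with sp_getapplock (the resource name is arbitrary; with the default Transaction lock owner, the lock is released at COMMIT):

BEGIN TRANSACTION

DECLARE @rc INT
EXEC @rc = sp_getapplock @Resource = 'LicenceKeyAssignment',
                         @LockMode = 'Exclusive',
                         @LockTimeout = 5000

IF @rc >= 0
BEGIN
    -- Safe to select and update the key here; no other caller
    -- requesting the same app lock can run this section concurrently
    SELECT TOP 1 LicenceKey FROM KeysTable WHERE Status = 'available'
    -- ...update Status here...
END

COMMIT TRANSACTION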
I think that you should actually mark the key as unavailable in the same stored proc that queries for it, because otherwise there will always be some sort of race condition. Manually locking tables is not good practice, IMHO.
If you have a two staged process (e.g. like booking airline tickets), you could introduce a concept of reserving a key for a specified period of time (e.g. 30 mins), so that when you query for a new key, you reserve it at the same time.
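A sketch of that reservation idea, assuming a hypothetical ReservedAt column on the keys table:

UPDATE TOP (1) KeysTable
SET Status = 'reserved',
    ReservedAt = GETDATE()
OUTPUT INSERTED.LicenceKey
WHERE Status = 'available'
   OR (Status = 'reserved' AND ReservedAt < DATEADD(MINUTE, -30, GETDATE()))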
EDIT: Locking in business logic probably would work if you can guarantee that only one process is going to change the database, but it is much better to do it on the database level, preferably in a single stored proc. To do it correctly you have to set the transaction level and use transactions in the database, just as @Adam Robinson suggested in his answer.
I have a legacy data table in SQL Server 2005 that has a PK with no identity/autoincrement and no power to implement one.
As a result, I am forced to create new records in ASP.NET manually via the ole "SELECT MAX(id) + 1 FROM table"-before-insert technique.
Obviously this creates a race condition on the ID in the event of simultaneous inserts.
What's the best way to gracefully resolve the event of a race collision? I'm looking for VB.NET or C# code ideas along the lines of detecting a collision and then re-attempting the failed insert by getting yet another max(id) + 1. Can this be done?
Thoughts? Comments? Wisdom?
Thank you!
NOTE: What if I cannot change the database in any way?
Create an auxiliary table with an identity column. In a transaction, insert into the aux table, retrieve the value, and use it to insert into your legacy table. At this point you can even delete the row inserted into the aux table; the point is just to use it as a source of incremented values.
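A sketch of that approach (the IdSource/LegacyTable names and the name column are made up):

-- One-time setup:
-- CREATE TABLE IdSource (id INT IDENTITY(1,1) PRIMARY KEY, filler BIT)

BEGIN TRANSACTION

DECLARE @newId INT
INSERT INTO IdSource (filler) VALUES (0)
SET @newId = SCOPE_IDENTITY()

INSERT INTO LegacyTable (id, name) VALUES (@newId, @name)

-- Optional: the aux row has served its purpose
DELETE FROM IdSource WHERE id = @newId

COMMIT TRANSACTION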
Not being able to change database schema is harsh.
If you insert an existing PK into the table you will get a SqlException with a message indicating a PK constraint violation. Catch this exception and retry the insert a few times until you succeed. If you find that the collision rate is too high, you may try max(id) + <small-random-int> instead of max(id) + 1. Note that with this approach your ids will have gaps and the id space will be exhausted sooner.
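The same retry idea can also be expressed server-side; a T-SQL sketch, assuming a hypothetical name column (2627 is the PK-violation error number, and TRY/CATCH is available from SQL Server 2005):

DECLARE @retries INT
SET @retries = 5

WHILE @retries > 0
BEGIN
    BEGIN TRY
        INSERT INTO [table] (id, name)
        SELECT MAX(id) + 1, @name FROM [table]
        BREAK  -- success
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 2627
            BREAK  -- not a PK collision: give up instead of retrying
        SET @retries = @retries - 1
    END CATCH
END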
Another possible approach is to emulate autoincrementing id outside of database. For instance, create a static integer, Interlocked.Increment it every time you need next id and use returned value. The tricky part is to initialize this static counter to good value. I would do it with Interlocked.CompareExchange:
class Autoincrement {
    static int id = -1; // -1 means "not initialized yet"

    public static int NextId() {
        if (id == -1) {
            // Not initialized - seed the counter from the database
            int lastId = <select max(id) from db>;
            // Set the seed only if no other thread beat us to it
            Interlocked.CompareExchange(ref id, lastId, -1);
        }
        // Get next id atomically
        return Interlocked.Increment(ref id);
    }
}
Obviously the latter works only if all inserted ids are obtained via Autoincrement.NextId within a single process.
The key is to do it in one statement or one transaction.
Can you do this?
INSERT INTO table (PKcol, col2, col3, ...)
SELECT (SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...
Without testing, this will probably work too:
INSERT INTO table (PKcol, col2, col3, ...)
VALUES ((SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...)
If you can't, another way is to do it in a trigger.
The trigger is part of the INSERT transaction
Use HOLDLOCK, UPDLOCK for the MAX. This holds the row lock until commit
The row being updated is locked for the duration
A second insert will wait until the first completes.
The downside is that you are changing the primary key.
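A sketch of such a trigger (hypothetical id/col2 columns; ROW_NUMBER keeps multi-row inserts unique, and an INSTEAD OF trigger is not fired recursively by its own INSERT):

CREATE TRIGGER trg_AssignId ON [table]
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO [table] (id, col2)
    SELECT m.maxid + ROW_NUMBER() OVER (ORDER BY (SELECT 1)), i.col2
    FROM inserted i
    CROSS JOIN (SELECT ISNULL(MAX(id), 0) AS maxid
                FROM [table] WITH (HOLDLOCK, UPDLOCK)) m
END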
An auxiliary table needs to be part of a transaction.
Or change the schema as suggested...
Note: All you need is a source of ever-increasing integers. It doesn't have to come from the same database, or even from a database at all.
Personally, I would use SQL Express because it is free and easy.
If you have a single web server:
Create a SQL Express database on the web server with a single table [ids] with a single autoincrementing field [new_id]. Insert a record into this [ids] table, get the [new_id], and pass that onto your database layer as the PK of the table in question.
If you have multiple web servers:
It's a pain to set up, but you can use the same trick by choosing an appropriate seed/increment (i.e. increment = 3, and seed = 1/2/3 for three web servers).
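A sketch of the seed/increment trick for three servers:

-- Web server 1:
CREATE TABLE ids (new_id INT IDENTITY(1, 3) PRIMARY KEY)
-- Web server 2: IDENTITY(2, 3); web server 3: IDENTITY(3, 3)

-- Each server then draws ids locally:
INSERT INTO ids DEFAULT VALUES
SELECT SCOPE_IDENTITY()  -- 1, 4, 7, ... on server 1; 2, 5, 8, ... on server 2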
What about running the whole batch (the select for the id, and the insert) in a serializable transaction?
That should get you around needing to make changes in the database.
Is the main concern concurrent access? I mean, will multiple instances of your app (or, God forbid, other apps outside your control) be performing inserts concurrently?
If not, you can probably manage the inserts through a central, synchronized module in your app, and avoid race conditions entirely.
If so, well... like Joel said, change the database. I know you can't, but the problem is as old as the hills, and it's been solved well -- at the database level. If you want to fix it yourself, you're just going to have to loop (insert, check for collisions, delete) over and over and over again. The fundamental problem is that you can't perform a transaction (I don't mean that in the SQL "TRANSACTION" sense, but in the larger data-theory sense) if you don't have support from the database.
The only further thought I have is that if you at least have control over who has access to the database (e.g., only "authorized" apps, either written or approved by you), you could implement a side-band mutex of sorts, where a "talking stick" is shared by all the apps and ownership of the mutex is required to do an insert. That would be its own hairy ball of wax, though, as you'd have to figure out policy for dead clients, where it's hosted, configuration issues, etc. And of course a "rogue" client could do inserts without the talking stick and hose the whole setup.
The best solution is to change the database. You may not be able to change the column to be an identity column, but you should be able to make sure there's a unique constraint on the column and add a new identity column seeded with your existing PK's. Then either use the new column instead or use a trigger to make the old column mirror the new, or both.
I have some complex stored procedures that may return many thousands of rows, and take a long time to complete.
Is there any way to find out how many rows are going to be returned before the query executes and fetches the data?
This is with Visual Studio 2005, a Winforms application and SQL Server 2005.
You mentioned your stored procedures take a long time to complete. Is the majority of the time taken up during the process of selecting the rows from the database or returning the rows to the caller?
If it is the latter, maybe you can create a mirror version of your SP that just gets the count instead of the actual rows. If it is the former, well, there isn't really that much you can do since it is the act of finding the eligible rows which is slow.
A solution to your problem might be to re-write the stored procedure so that it limits the result set to some number, like:
SELECT TOP 1000 * FROM tblWHATEVER
in SQL Server, or
SELECT * FROM tblWHATEVER WHERE ROWNUM <= 1000
in Oracle. Or implement a paging solution so that the result set of each call is acceptably small.
make a stored proc to count the rows first.
SELECT COUNT(*) FROM table
Unless there's some aspect of the business logic of your app that allows calculating this, no. The database is going to have to do all the WHERE and JOIN logic to figure out how many rows, and that's the vast majority of the time spent in the SP.
You can't get the rowcount of a procedure without executing the procedure.
You could make a different procedure that accepts the same parameters, the purpose of which is to tell you how many rows the other procedure should return. However, the steps required by this procedure would normally be so similar to those of the main procedure that it should take just about as long as just executing the main procedure.
You would have to write a different version of the stored procedure to get a row count. This one would probably be much faster, because you could eliminate joins to tables you aren't filtering against, remove the ordering, etc. For example, if your stored proc executed SQL such as:
select firstname, lastname, email, orderdate
from customer
inner join productorder on customer.customerid = productorder.customerid
where orderdate > @orderdate
order by lastname, firstname;
your counting version would be something like:
select count(*) from productorder where orderdate > @orderdate;
Not in general.
Through knowledge of how the stored procedure operates, you may be able to get either an estimate or an accurate count (for instance, if the "core" or "base" table of the query can be counted quickly, but it is the complex joins and/or summaries which drive the time upwards).
But you would have to call the counting SP first and then the data SP, or you could look at using a multiple-result-set SP.
It could take as long to get a row count as to get the actual data, so I wouldn't advocate performing a count in most cases.
Some possibilities:
1) Does SQL Server expose its query optimiser findings in some way? i.e. can you parse the query and then obtain an estimate of the rowcount? (I don't know SQL Server).
2) Perhaps based on the criteria the user gives you can perform some estimations of your own. For example, if the user enters 'S%' in the customer surname field to query orders you could determine that that matches 7% (say) of the customer records, and extrapolate that the query may return about 7% of the order records.
Going on what Tony Andrews said in his answer, you can get an estimated query plan of the call to your query with:
SET SHOWPLAN_TEXT OFF
GO
SET SHOWPLAN_ALL ON
GO
-- Replace with a call to your stored procedure
select * from MyTable
GO
SET SHOWPLAN_ALL OFF
GO
This should return one or more result sets; the EstimateRows column will give you the estimated row count of your query.
You need to analyze the returned data set to determine a logical (meaningful) primary key for the result set being returned. In general this WILL be much faster than the complete procedure, because the server is not constructing a result set from data in all the columns of each row of each table; it is simply counting the rows. It may not even need to read the actual table rows off disk to do this; it may simply need to count index nodes.
Then write another SQL statement that only includes the tables necessary to generate those key columns (Hopefully this is a subset of the tables in the main sql query), and the same where clause with the same filtering predicate values...
Then add another optional parameter to the stored proc called, say, @CountsOnly, with a default of false (0), like so...
Alter Procedure <storedProcName>
    @param1 Type,
    -- other current params
    @CountsOnly TinyInt = 0
As
Set NoCount On

If @CountsOnly = 1
    Select Count(*)
    From TableA A
    Join TableB B On etc. etc...
    Where <here put all filtering predicates>
Else
    <Here put old SQL that returns complete resultset with all data>

Return 0
You can then call the same stored proc with @CountsOnly set to 1 to get just the count of records. Old code that calls the proc would still function as it used to, since the parameter defaults to false (0) if it is not included.
It's at least technically possible to run a procedure that puts the result set in a temporary table. Then you can find the number of rows before you move the data from server to application and would save having to create the result set twice.
But I doubt it's worth the trouble unless creating the result set takes a very long time, and in that case it may be big enough that the temp table would be a problem. Almost certainly the time to move the big table over the network will be many times what is needed to create it.