deadlock on a single SQL Server table - c#

I am using SQL Server 2008 Enterprise. And using ADO.Net + C# + .Net 3.5 + ASP.Net as client to access database. When I access SQL Server 2008 tables, I always invoke stored procedure from my C# + ADO.Net code.
I have 3 operations on table FooTable, and multiple connections will execute them at the same time, each in sequence, i.e. execute the delete, then the insert, and then the select. Each statement (delete/insert/select) runs as a separate individual transaction within the single stored procedure.
My question is whether a deadlock can occur on the delete statement. My guess is that a deadlock is possible if multiple connections are operating on the same Param1 value; is that right?
BTW: For the statements below, Param1 is a column of table FooTable and is a foreign key to another table (it refers to that table's clustered primary key column). There is no index on Param1 itself in FooTable. FooTable uses a different column as its clustered primary key, not Param1.
create PROCEDURE [dbo].[FooProc]
(
    @Param1 int
    ,@Param2 int
    ,@Param3 int
)
AS
DELETE FooTable WHERE Param1 = @Param1
INSERT INTO FooTable
(
    Param1
    ,Param2
    ,Param3
)
VALUES
(
    @Param1
    ,@Param2
    ,@Param3
)
DECLARE @ID bigint
SET @ID = ISNULL(@@Identity,-1)
IF @ID > 0
BEGIN
    SELECT IdentityStr FROM FooTable WHERE ID = @ID
END
Here is what the activity monitor table looks like,
ProcessID System Process Login Database Status Opened transaction Command Application Wait Time Wait Type CPU
52 No Foo suspended 0 DELETE .Net SqlClient Data Provider 4882 LCK_M_U 0
53 No George Foo suspended 2 DELETE .Net SqlClient Data Provider 12332 LCK_M_U 0
54 No George Foo suspended 2 DELETE .Net SqlClient Data Provider 6505 LCK_M_U 0
(a lot of rows like the row for process ID 54)

I would add an index on Param1 to FooTable; without it, the DELETE does a full table scan, and that will create problems with deadlocks.
EDIT
Based on your activity details, it doesn't look like you have deadlocks; you have blocking: many deletes are queueing up while one delete takes place. Again, an index on Param1 would alleviate this. Without it, each delete does a full table scan to find the records to delete, and while that is happening the other deletes have to wait. With an index on Param1, each delete will finish much more quickly and you should no longer see the blocking you are seeing now.
If you had deadlocks, the system would kill one of the involved processes, otherwise nothing would ever process; with blocking, things will process, but very slowly if the table is large.
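A minimal sketch of the index suggested above, assuming FooTable lives in the dbo schema. The index name IX_FooTable_Param1 is made up for illustration.

```sql
-- Hypothetical index name; dbo schema assumed.
-- Lets DELETE ... WHERE Param1 = @Param1 seek straight to the matching rows
-- instead of scanning the whole table and taking update locks along the way.
CREATE NONCLUSTERED INDEX IX_FooTable_Param1
    ON dbo.FooTable (Param1);
```

Building the index on a large, busy table will itself take locks for a while, so it is probably best created during a quiet period.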

I do not think you would get a deadlock (this is not my field of expertise), but an explicit transaction would probably be a better choice here. A scenario that comes to mind with this code is the following:
Two concurrent calls to the procedure execute with a Param1 value of 5; both delete and then both insert, so now you have two records with a Param1 value of 5. Depending on your data consistency requirements this might or might not be a concern to you.
An alternative might be to perform an Update and, if no rows are affected (check @@ROWCOUNT), then do an Insert, all in a transaction of course. Or better yet, take a look at Merge to perform the Insert/Update operation in a single statement.
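The update-then-insert idea could look roughly like the sketch below. It assumes that at most one row per Param1 value is wanted and that FooTable is in the dbo schema. The variable values stand in for the procedure's parameters, and the locking hints are my addition rather than part of the suggestion above.

```sql
-- Stand-ins for the stored procedure parameters.
DECLARE @Param1 int = 5, @Param2 int = 10, @Param3 int = 20;

SET XACT_ABORT ON;  -- roll back automatically on most run-time errors
BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK keep the searched key range locked until commit, so a
-- concurrent call with the same @Param1 waits here instead of also inserting.
UPDATE dbo.FooTable WITH (UPDLOCK, HOLDLOCK)
SET Param2 = @Param2,
    Param3 = @Param3
WHERE Param1 = @Param1;

-- No existing row was updated, so create it.
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO dbo.FooTable (Param1, Param2, Param3)
    VALUES (@Param1, @Param2, @Param3);
END

COMMIT TRANSACTION;
```

Without an index on Param1, the range these hints lock is effectively the whole table, so this pairs naturally with the index suggested in the other answer.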

Related

Lock SQL server table for a read-write operation

I have a table where the primary key is an incrementing int 'ID' that I have to set manually. I know an autoincrement int (IDENTITY) would have been the best option, but I can't change the existing table design.
So I need to make the read-write operation atomic, along these lines:
Lock table
Read the MAX value of the existing IDs
Add new record with Primary Key = ID+1
Release table
What is the correct way to lock the table in a multiuser environment? I suppose it's a mix of transactions and the use of TABLOCKX. I need to ensure:
No deadlocks
If something fails, the table should not stay locked (for example, the program fails and exits when trying to write, and no COMMIT/ROLLBACK is called). I don't even know if this is possible.
NOTE: The database is also used by other applications that I suppose care themselves of this problem.
EDITED: Could this be considered atomic enough to be a solution?
INSERT INTO MYTABLE (ID, OtherFields...) VALUES ((Select Max(ID)+1 from MYTABLE), 'values'...)
Attempting to roll your own auto-increment mechanism using table locks is almost bound to fail - however, since you wrote you can't change the existing table, I would suggest using a sequence to get the next number instead of locking the table.
CREATE SEQUENCE dbo.MySequence -- Don't use this name, please!
AS int -- note: default is bigInt
START WITH 1
INCREMENT BY 1
NO CYCLE;
This has some [1] of the benefits of an identity column, without having to add an identity column to your table.
You can also use the sequence to generate a default value to a column (assuming adding a default constraint doesn't count as "changing the existing table structure", of course). See example D in official documentation
ALTER TABLE dbo.YourTableName
ADD CONSTRAINT YourTableName_id_default
DEFAULT NEXT VALUE FOR MySequence
FOR Id;
[1] The benefits are that you don't need to add locks or calculate the next number yourself.
However, you should know that unlike an identity column, this doesn't protect you from updates to the id column, nor does it protect you from insert statements that explicitly insert a value to this column (without using next value for).
The first problem can be quite easily solved with an instead-of-update trigger on the table that will only update columns that aren't the id column, but I'm not sure how to solve the other problem.
So if the other process is correctly handling the locking, you could do exactly what you mentioned (lock, get last ID, insert and release) by executing something similar to the following:
DECLARE @MaxID INT
BEGIN TRY
    BEGIN TRANSACTION
    SELECT
        @MaxID = MAX(I.ID)
    FROM
        MyTable AS I WITH (TABLOCKX, HOLDLOCK) -- TABLOCKX: no operations can be done, HOLDLOCK: until the end of the transaction
    INSERT INTO MyTable (
        ID,
        OtherColumn)
    SELECT
        ID = ISNULL(@MaxID + 1, 1),
        OtherColumn = 'Other values'
    COMMIT
END TRY
BEGIN CATCH
    -- Handle your error logging and rollback the transaction so the table locks are released, a basic example:
    DECLARE @ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
    IF @@TRANCOUNT > 0
        ROLLBACK
    RAISERROR(@ErrorMessage, 16, 1)
END CATCH
However, you will still have to do additional work for batch inserts, or if you need the inserted ID to load other related tables.
Also, TABLOCKX is pretty restrictive. There are other, less restrictive locks, but I believe they might leave you open to concurrency issues. You can check the other locking hints in the docs.

Efficiently execute 100K update statements - C# & Sql Server

My C# application retrieves over a million records from Sql Server, processes them and then updates the database back. This results in close to 100,000 update statements and they all have the following form -
update Table1 set Col1 = <some number> where Id in (n1, n2, n3....upto n200)
"Id" is an int, primary key with clustered index. No two update statements update the same Ids, so in theory, they can all run in parallel without any locks. Therefore, ideally, I suppose I should run as many as possible in parallel. The expectation is that all finish in no more than 5 minutes.
Now, my question is what is the most efficient way of doing it? I'm trying the below -
Running them sequentially one by one - This is the least efficient solution. Takes over an hour.
Running them in parallel by launching each update in its own thread - Again very inefficient because we're creating thousands of threads, but I tried anyway; it took over an hour and quite a few of them failed because of this or that connection issue.
Bulk inserting into a new table and then doing a join for the update. But then we run into concurrency issues because more than one user is expected to be doing it.
Merge batches instead of updates - Google says that merge is actually slower than individual update statements so I haven't tried it.
I suppose this must be a very common problem with many applications out there that handle sizeable amounts of data. Are there any standard solutions? Any ideas or suggestions will be appreciated.
I created an integer table type so that I can pass all my ids to a stored procedure as a list, and then a single query updates the whole table.
This is still slow, but I see it is much quicker than the conventional "where id in (1,2,3)".
definition for TYPE
CREATE TYPE [dbo].[integer_list_tbltype] AS TABLE(
[n] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[n] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)
GO
Here is the usage.
declare @intval integer_list_tbltype
declare @colval int=10
update c
set c.Col1=@colval
from @intval i
join Table1 c on c.ID = i.n
Let me know if you have any questions.

Primary key violation error in sql server 2008

I have created two threads in C# and I am calling two separate functions in parallel. Both functions read the last ID from XYZ table and insert new record with value ID+1. Here ID column is the primary key. When I execute the both functions I am getting primary key violation error. Both function having the below query:
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
It seems that both functions are reading the value at the same time and trying to insert with the same value.
How can I solve this problem?
Let the database handle selecting the ID for you. It's obvious from your code above that what you really want is an auto-incrementing integer ID column, which the database can definitely handle doing for you. So set up your table properly and instead of your current insert statement, do this:
insert into XYZ values('Name')
If your database table is already set up I believe you can issue a statement similar to the following (note that this is MySQL syntax; SQL Server has no AUTO_INCREMENT and cannot add the IDENTITY property to an existing column through ALTER TABLE):
alter table your_table modify column your_table_id int(size) auto_increment
Finally, if none of these solutions are adequate for whatever reason (including, as you indicated in the comments section, inability to edit the table schema) then you can do as one of the other users suggested in the comments and create a synchronized method to find the next ID. You would basically just create a static method that returns an int, issue your select id statement in that static method, and use the returned result to insert your next record into the table. Since this method would not guarantee a successful insert (due to external applications' ability to also insert into the same table), you would also have to catch exceptions and retry on failure.
Set ID column to be "Identity" column. Then, you can execute your queries as:
insert into XYZ values('Name')
I think that you can't use ALTER TABLE to change a column to be Identity after the column is created. Use Management Studio to set this column to be Identity. If your table has many rows, this can be a long-running process, because it will actually copy your data to a new table (it performs a table re-creation).
Most likely that option is disabled in your Management Studio. In order to enable it, open Tools->Options->Designers and uncheck the option "Prevent saving changes that require table re-creation". Depending on your table size, you will probably have to set the timeout, too. Your table will be locked during that time.
A solution for such problems is to generate the ID using some kind of sequence.
For example, in SQL Server you can create a sequence using the command below:
CREATE SEQUENCE Test.CountBy1
START WITH 1
INCREMENT BY 1 ;
GO
Then in C#, you can retrieve the next value from the Test.CountBy1 sequence and assign it to the ID before inserting the row.
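The same retrieval can be sketched in T-SQL. Sequences only exist in SQL Server 2012 and later, so this does not apply to a SQL Server 2008 instance. The sketch assumes XYZ has exactly the two columns used in the question and that its ID column can hold the value.

```sql
-- Requires SQL Server 2012 or later.
-- A sequence created without an AS clause produces bigint values.
DECLARE @NextId bigint = NEXT VALUE FOR Test.CountBy1;

-- Each caller receives a distinct @NextId, so concurrent inserts no longer
-- compete for the same MAX(ID)+1.
INSERT INTO XYZ VALUES (@NextId, 'Name');
```

The sequence would need to start above the current maximum ID, and every writer to the table has to use it for the values to stay unique.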
It sounds like you want a higher transaction isolation level or more restrictive locking.
I don't use these features too often, so hopefully somebody will suggest an edit if I'm wrong, but you want one of these:
-- specify the strictest isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
or
-- make locks exclusive so other transactions cannot access the same rows
insert into XYZ values((SELECT max(ID)+1 from XYZ WITH (XLOCK)),'Name')

how to improve SQL query performance in my case

I have a table with a very simple schema: an ID column as unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, to work out which ones are already contained in the table and which are not. The inputs are strings and I have a C# function which converts a string into a uniqueidentifier (GUID). My logic is, if there is an existing ID, then I treat the string as already contained in the table.
My question is, if I need to find out which of the 5000 input strings are already contained in the DB and which are not, what is the most efficient way?
BTW: My current implementation is to convert the string to a GUID using C# code, then invoke a stored procedure which queries whether the ID exists in the database and returns the result to the C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable ON ExistingTable.thecol = X.thecol
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable ON ExistingTable.thecol = X.thecol
WHERE ExistingTable.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issue for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!-).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to insert one at a time in a lot of contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with those entries that do or don't exist in your table??
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- new rows does not exist in base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR =' |\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap the insert in a TRY CATCH block. Send em all up to the server in one shot like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
If you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time to see whether the record exists just duplicates work that SQL is going to do anyway.
If performance is really important, then consider downloading the 5000 GUIDs to your local station and doing all the analysis locally. Reading 5000 GUIDs should take much less than 1 second. This is simpler than bulk importing to a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
Since you are using Sql server 2008, you could use Table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
[Column] NVARCHAR(255) NOT NULL
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
@Data MyType READONLY
AS
xxxx
Call using C#
SqlCommand insertCommand = new SqlCommand(
    "spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
    insertCommand.Parameters.AddWithValue(
        "@Data", dataReader);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that takes XML in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001"/>
....
</ROOT>
Then in the procedure, with the argument of type NVARCHAR(MAX), you convert it to XML, after which you use it as a table with a single column (let's call it @FilterTable). The stored procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(@FilterXML NVARCHAR(MAX))
AS BEGIN
    SET NOCOUNT ON
    DECLARE @x XML
    SELECT @x = CONVERT(XML, @FilterXML)
    -- temporary table (must have it, because cannot join on XML statement)
    DECLARE @FilterTable TABLE (
        "ID" UNIQUEIDENTIFIER
    )
    -- insert into temporary table
    -- @important: XML iS CaSe-SenSiTiv
    INSERT @FilterTable
    SELECT x.value('@ID', 'UNIQUEIDENTIFIER')
    FROM @x.nodes('/ROOT/MyObject') AS R(x)
    SELECT o.ID,
        SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
    FROM @FilterTable o
    LEFT JOIN dbo.MyTable t
        ON o.ID = t.ID
    GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0

How to avoid a database race condition when manually incrementing PK of new row

I have a legacy data table in SQL Server 2005 that has a PK with no identity/autoincrement and no power to implement one.
As a result, I am forced to create new records in ASP.NET manually via the ole "SELECT MAX(id) + 1 FROM table"-before-insert technique.
Obviously this creates a race condition on the ID in the event of simultaneous inserts.
What's the best way to gracefully resolve the event of a race collision? I'm looking for VB.NET or C# code ideas along the lines of detecting a collision and then re-attempting the failed insert by getting yet another max(id) + 1. Can this be done?
Thoughts? Comments? Wisdom?
Thank you!
NOTE: What if I cannot change the database in any way?
Create an auxiliary table with an identity column. In a transaction insert into the aux table, retrieve the value and use it to insert in your legacy table. At this point you can even delete the row inserted in the aux table, the point is just to use it as a source of incremented values.
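A rough sketch of the auxiliary-table approach follows. The names dbo.IdSource and dbo.LegacyTable, the columns id and col2, and the seed value 1000 are all placeholders. The seed has to be above the legacy table's current MAX(id).

```sql
-- One-time setup (hypothetical names; pick a seed above the current MAX(id)).
CREATE TABLE dbo.IdSource (Id int IDENTITY(1000, 1) NOT NULL PRIMARY KEY);
GO

-- Per insert.
DECLARE @NewId int;

BEGIN TRANSACTION;

-- The identity column hands out the next value; no MAX(id) lookup is needed.
INSERT INTO dbo.IdSource DEFAULT VALUES;
SET @NewId = SCOPE_IDENTITY();

INSERT INTO dbo.LegacyTable (id, col2)
VALUES (@NewId, 'some value');

-- Optional: the row has served its purpose once its value has been taken.
DELETE FROM dbo.IdSource WHERE Id = @NewId;

COMMIT TRANSACTION;
```

Identity values are not reused after a rollback, so failed inserts leave gaps in the id sequence. The scheme also only holds if every writer to the legacy table takes its ids this way.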
Not being able to change database schema is harsh.
If you insert existing PK into table you will get SqlException with a message indicating PK constraint violation. Catch this exception and retry insert a few times until you succeed. If you find that collision rate is too high, you may try max(id) + <small-random-int> instead of max(id) + 1. Note that with this approach your ids will have gaps and the id space will be exhausted sooner.
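The question asks for client-side code, but the same catch-and-retry idea can be sketched as a T-SQL batch, since TRY/CATCH is available from SQL Server 2005. The table and column names and the limit of 5 attempts are placeholders. Error 2627 is the constraint-violation error; a key enforced only by a unique index raises 2601 instead.

```sql
DECLARE @Attempts int, @Inserted bit, @Msg nvarchar(2048);
SET @Attempts = 0;
SET @Inserted = 0;

WHILE @Inserted = 0 AND @Attempts < 5   -- retry limit is arbitrary
BEGIN
    BEGIN TRY
        -- Read MAX(id) and insert in a single statement.
        INSERT INTO dbo.LegacyTable (id, col2)
        SELECT ISNULL(MAX(id), 0) + 1, 'some value'
        FROM dbo.LegacyTable;

        SET @Inserted = 1;
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 2627
        BEGIN
            -- Not a key collision: report the error and stop retrying.
            SET @Msg = ERROR_MESSAGE();
            RAISERROR('%s', 16, 1, @Msg);
            BREAK;
        END
        -- Key collision: another session took this id, so try again.
        SET @Attempts = @Attempts + 1;
    END CATCH
END

IF @Inserted = 0 AND @Attempts >= 5
    RAISERROR('Could not obtain a free id after 5 attempts.', 16, 1);
```

A client-side version in C# or VB.NET would follow the same shape: catch the SqlException, check its Number, and loop.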
Another possible approach is to emulate autoincrementing id outside of database. For instance, create a static integer, Interlocked.Increment it every time you need next id and use returned value. The tricky part is to initialize this static counter to good value. I would do it with Interlocked.CompareExchange:
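using System.Threading;

class Autoincrement {
    static int id = -1;
    public static int NextId() {
        if (id == -1) {
            // not initialized - initialize
            // SelectMaxIdFromDb is a hypothetical helper (not defined here) that runs: select max(id) from db
            int lastId = SelectMaxIdFromDb();
            // store lastId only if id is still -1 (another thread may have initialized it already)
            Interlocked.CompareExchange(ref id, lastId, -1);
        }
        // get next id atomically
        return Interlocked.Increment(ref id);
    }
}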
Obviously the latter works only if all inserted ids are obtained via Autoincrement.NextId of single process.
The key is to do it in one statement or one transaction.
Can you do this?
INSERT INTO table (PKcol, col2, col3, ...)
SELECT (SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...
Without testing, this will probably work too:
INSERT INTO table (PKcol, col2, col3, ...)
VALUES ((SELECT MAX(id) + 1 FROM table WITH (HOLDLOCK, UPDLOCK)), @val2, @val3, ...)
If you can't, another way is to do it in a trigger.
The trigger is part of the INSERT transaction
Use HOLDLOCK, UPDLOCK for the MAX. This holds the row lock until commit
The row being updated is locked for the duration
A second insert will wait until the first completes.
The downside is that you are changing the primary key.
An auxiliary table needs to be part of a transaction.
Or change the schema as suggested...
Note: All you need is a source of ever-increasing integers. It doesn't have to come from the same database, or even from a database at all.
Personally, I would use SQL Express because it is free and easy.
If you have a single web server:
Create a SQL Express database on the web server with a single table [ids] with a single autoincrementing field [new_id]. Insert a record into this [ids] table, get the [new_id], and pass that onto your database layer as the PK of the table in question.
If you have multiple web servers:
It's a pain to set up, but you can use the same trick by setting an appropriate seed/increment (i.e. increment = 3, and seed = 1/2/3 for three web servers).
What about running the whole batch (select for id and insert) in serializable transaction?
That should get you around needing to make changes in the database.
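A sketch of what that batch might look like, with placeholder table and column names. The UPDLOCK hint is my addition. Without it, two serializable transactions can both read MAX(id) under shared range locks and then deadlock when each tries to insert, leaving one of them as the deadlock victim.

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DECLARE @NextId int;

BEGIN TRANSACTION;

-- UPDLOCK makes a second caller wait here until the first one commits.
SELECT @NextId = ISNULL(MAX(id), 0) + 1
FROM dbo.LegacyTable WITH (UPDLOCK);

INSERT INTO dbo.LegacyTable (id, col2)
VALUES (@NextId, 'some value');

COMMIT TRANSACTION;

-- Restore the default isolation level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

This only protects against other callers that go through the same batch. An application that inserts with its own id logic can still collide.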
Is the main concern concurrent access? I mean, will multiple instances of your app (or, God forbid, other apps outside your control) be performing inserts concurrently?
If not, you can probably manage the inserts through a central, synchronized module in your app, and avoid race conditions entirely.
If so, well... like Joel said, change the database. I know you can't, but the problem is as old as the hills, and it's been solved well -- at the database level. If you want to fix it yourself, you're just going to have to loop (insert, check for collisions, delete) over and over and over again. The fundamental problem is that you can't perform a transaction (I don't mean that in the SQL "TRANSACTION" sense, but in the larger data-theory sense) if you don't have support from the database.
The only further thought I have is that if you at least have control over who has access to the database (e.g., only "authorized" apps, either written or approved by you), you could implement a side-band mutex of sorts, where a "talking stick" is shared by all the apps and ownership of the mutex is required to do an insert. That would be its own hairy ball of wax, though, as you'd have to figure out policy for dead clients, where it's hosted, configuration issues, etc. And of course a "rogue" client could do inserts without the talking stick and hose the whole setup.
The best solution is to change the database. You may not be able to change the column to be an identity column, but you should be able to make sure there's a unique constraint on the column and add a new identity column seeded with your existing PK's. Then either use the new column instead or use a trigger to make the old column mirror the new, or both.
