I need to update 20 tables in a SQL Server database from a C# application. For better performance, I am planning to use multiple threads to update the tables. Could anybody point me to an example that shows how to do this kind of operation?
Also, as I understand it, if I use multithreading I have to use a different connection object for each thread. In that case, how can I put multiple threads, each using its own connection object, into a single transaction?
Use the TPL (Task Parallel Library). Here is an example: http://safeery2k.wordpress.com/2013/09/17/ado-net-using-tpl/
20 tables to update at once is pretty crazy.
If 20 stored procedures is the ONLY way to do what needs to get done then create one more stored procedure that will call the 20 stored procedures for you and simply use this stored procedure.
This way if an error/exception occurs you will be able to rollback the mods AND it is now contained in one location - nice and easy (all things considered).
I'm very glad I don't have to deal with the situation you're in with this one! Good luck with this!
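The "one place, one rollback" idea in the answer above can be sketched without a server. This is only an illustration, written in Python with the standard-library sqlite3 module standing in for SQL Server; the `orders` and `stock` tables and the `update_all` helper are hypothetical names, not anything from the question. Every statement runs on one connection inside one transaction, so a failure in any of them undoes the rest.

```python
import sqlite3


def update_all(conn, statements):
    """Run every (sql, params) pair in one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


def demo():
    # Hypothetical two-table schema; a real job would touch all 20 tables.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE stock (id INTEGER PRIMARY KEY, "
        "qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute("INSERT INTO orders VALUES (1, 'new')")
    conn.execute("INSERT INTO stock VALUES (1, 5)")
    conn.commit()

    ok = update_all(conn, [
        ("UPDATE orders SET status = ? WHERE id = ?", ("shipped", 1)),
        # This one violates the CHECK constraint, so the whole batch fails.
        ("UPDATE stock SET qty = qty - ? WHERE id = ?", (10, 1)),
    ])
    status = conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
    qty = conn.execute("SELECT qty FROM stock WHERE id = 1").fetchone()[0]
    conn.close()
    return ok, status, qty
```

Here the first update succeeds on its own, but because the second one fails, `demo()` finds the order still marked as new. In SQL Server the same shape would be a wrapper stored procedure or a single connection with an explicit transaction.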
I joined a project that uses .NET/C# for the front end and SQL Server 2008 for the back end. I found that transactions are ALWAYS handled in the C# code. It seems to be an unwritten rule on this project that we shouldn't use transactions within stored procedures.
I personally feel that transactions should be handled within the stored procedure, as that would give more control over the code. We might have a lot of validation happening within the script, and during all that time we don't need an open transaction. We need to open a transaction only just before we do an INSERT/UPDATE/DELETE, and we can close it as soon as possible.
I am looking for answers that would help me understand the best practice for handling transactions, and when exactly we should opt for transactions within a stored procedure versus in C#.
There isn't a hard and fast rule, but I see several reasons to control transactions from the business tier:
Communication across data store boundaries. Transactions don't have to be against a RDBMS; they can be against a variety of entities.
The ability to rollback/commit transactions based on business logic that may not be available to the particular stored procedure you are calling.
The ability to invoke an arbitrary set of queries within a single transaction. This also eliminates the need to worry about transaction count.
Personal preference: C# has a more elegant structure for declaring transactions: a using block. By comparison, I've always found transactions inside stored procedures to be cumbersome when jumping to rollback/commit.
"We might have lot of validations happening within the script all this while we don't need a open transaction. We need to open a transaction just before we do a Insert/Update/Delete and can close it asap."
This may or may not be a problem depending on how many transactions are being opened (it's not clear if this is a single job, or a procedure which is run with high concurrency). I would suggest looking at what locks are being placed on objects, and how long those locks are being held.
Keep in mind that validation possibly should lock; what if the data changes between the time you validated it and the time the action occurs?
If it is a problem, you could break the offending procedure into two procedures, and call one from outside of a TransactionScope.
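The two points above (keep validation out of the transaction, but don't let the data change between the check and the write) can be combined. The following is a minimal sketch in Python with sqlite3 as a stand-in for SQL Server; the `stock` table and the `reserve` and `make_db` helpers are hypothetical. Cheap input validation runs with no transaction open, and the check that depends on current data is folded into the UPDATE itself.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one stock row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
    conn.execute("INSERT INTO stock VALUES (1, 5)")
    conn.commit()
    return conn


def reserve(conn, item_id, qty):
    """Take qty units from stock; return True if the reservation succeeded."""
    # Validation that needs no locks happens before any transaction starts.
    if qty <= 0:
        raise ValueError("qty must be positive")

    # Short transaction: the availability check is part of the UPDATE,
    # so nothing can change between checking and writing.
    with conn:
        cur = conn.execute(
            "UPDATE stock SET qty = qty - ? WHERE id = ? AND qty >= ?",
            (qty, item_id, qty),
        )
        return cur.rowcount == 1
```

With a starting quantity of 5, a first `reserve(conn, 1, 3)` succeeds and a second one is refused because only 2 units remain. The transaction stays open only for the single statement that actually needs it.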
I have a Windows service with five threads. Each thread picks up a different Excel file, reads its rows, and inserts them into the database. Is it possible to INSERT in parallel? Currently I am using a single class with a lock for inserting.
If you are inserting and the key is created for you by the DBMS, then there should be no problem, and no need to lock.
This depends on your database: it works as long as your database is capable of handling multiple connections (which it should be nowadays).
It has nothing to do with the class where you do the insert, though. Any locking there is not really necessary (unless, of course, your database does not support multiple connections, which I seriously doubt).
Make sure it's in a transaction, and get rid of the lock! You should be fine...assuming whatever database you're using supports transactions.
Most modern databases support concurrent writes; it's safer to use a transaction in case one of them goes wrong.
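As a rough illustration of the advice above, here is a sketch where each worker thread opens its own connection and inserts inside its own transaction, with no application-level lock. It is written in Python with sqlite3 as a stand-in, so it shows only the shape; sqlite queues writers behind a busy timeout, whereas SQL Server handles concurrent inserts with finer-grained locking. The `imported` table and both function names are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading


def insert_rows(db_path, worker_id, values):
    """One worker: own connection, own transaction, no shared lock object."""
    conn = sqlite3.connect(db_path, timeout=30)  # wait if another writer is busy
    try:
        with conn:  # all of this worker's rows commit or roll back together
            conn.executemany(
                "INSERT INTO imported (worker, value) VALUES (?, ?)",
                [(worker_id, v) for v in values],
            )
    finally:
        conn.close()


def run(workers=5, rows_per_worker=100):
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(db_path)
        # The key is generated by the database, as the first answer assumes.
        setup.execute(
            "CREATE TABLE imported ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, worker INTEGER, value INTEGER)"
        )
        setup.commit()
        setup.close()

        threads = [
            threading.Thread(
                target=insert_rows, args=(db_path, w, range(rows_per_worker))
            )
            for w in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = sqlite3.connect(db_path)
        total, distinct_ids = check.execute(
            "SELECT COUNT(*), COUNT(DISTINCT id) FROM imported"
        ).fetchone()
        check.close()
        return total, distinct_ids
    finally:
        os.remove(db_path)
```

`run()` returns the number of rows and the number of distinct generated keys; the two should match, since key generation is left to the database rather than to the application.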
What is the best way to read from a single table in SQL Server using multiple threads in C#, while making sure the same record is not read twice by different threads?
Thank you for your help in advance
Are you trying to read records from the table in parallel to speed up retrieving the data, or are you just worried about data corruption with threads accessing the same data?
Database management systems like MS SQL Server handle concurrency extremely well, so thread safety in that respect is not something you have to be concerned with in your code if you have multiple threads reading the same table.
If you want to read data in parallel without any overlapping you could run a SQL command with paging, and just have each thread fetch a different page. You could have say 20 threads all read 20 different pages at once and it would be guaranteed that they are not reading the same rows. Then you can concatenate the data. The greater the page size the more performance boost you would get from creating the thread.
See also: "efficient way to implement paging"
Assuming a dependency on SQL Server, you could look at the SQL Server Service Broker features to provide queuing for you. One thing to keep in mind is that SQL Server Service Broker currently isn't available on SQL Azure, so if you have plans to move to the Azure cloud, that could be a problem.
Anyway - with SQL Server Service Broker the concurrent access is managed at the database engine layer. Another way of doing it is having one thread that reads the database and then dispatches threads with the message as the input. That is slightly easier than trying to use transactions in the database to ensure that messages aren't read twice.
Like I said though, SQL Server Service Broker is probably the way to go. Or a proper external queuing mechanism.
Solution 1:
I am assuming that you are attempting to process or extract data from a large table. If I were assigned this task I would first look at paging, that is, if you are trying to split work among threads. So Thread 1 handles pages 0 to 10, Thread 2 handles pages 11 to 20, etc., or you could batch rows using the actual row number. So in your stored proc you would do this:
WITH result_set AS (
SELECT
ROW_NUMBER() OVER (ORDER BY <ordering>) AS [row_number],
x, y, z
FROM
table
WHERE
<search-clauses>
) SELECT
*
FROM
result_set
WHERE
[row_number] BETWEEN @IN_Thread_Row_Start AND @IN_Thread_Row_End;
Another choice, which would be more efficient if you have a natural key or a darn good surrogate, is to page using that key and have each thread pass in key parameters rather than the row numbers (or page numbers) it is interested in.
Immediate concerns with this solution would be:
ROW_NUMBER performance
CTE Performance (I believe they are stored in memory)
So if this was my problem to resolve I would look at paging via a key.
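Paging via a key could look roughly like the sketch below: split the key space into contiguous, non-overlapping ranges and hand one range to each thread. It is written in Python with sqlite3 as a stand-in for SQL Server, and the `items` table, its integer `id` key, and all function names are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def key_ranges(min_key, max_key, workers):
    """Split [min_key, max_key] into inclusive ranges that never overlap."""
    size, extra = divmod(max_key - min_key + 1, workers)
    ranges, start = [], min_key
    for i in range(workers):
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue  # more workers than keys
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def read_range(db_path, bounds):
    """One thread: own connection, reads only the keys in its range."""
    lo, hi = bounds
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id FROM items WHERE id BETWEEN ? AND ? ORDER BY id", (lo, hi)
        )
        return [r[0] for r in rows]
    finally:
        conn.close()


def read_all(db_path, workers):
    conn = sqlite3.connect(db_path)
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM items").fetchone()
    conn.close()
    if lo is None:
        return []  # empty table
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(
            pool.map(lambda b: read_range(db_path, b), key_ranges(lo, hi, workers))
        )
    return [row for page in pages for row in page]


def demo(row_count=10, workers=3):
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
        conn.executemany(
            "INSERT INTO items (id) VALUES (?)",
            [(i,) for i in range(1, row_count + 1)],
        )
        conn.commit()
        conn.close()
        return read_all(db_path, workers)
    finally:
        os.remove(db_path)
```

Because each range is a simple predicate on the key, no ROW_NUMBER or CTE is involved. One caveat: if the keys have large gaps, the ranges will hold uneven numbers of rows, so some threads will finish earlier than others.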
Solution 2:
The second solution would be to mark the rows as they are processing, virtually locking them, that is if you have data writer permission. So your table would have a field called Processed or Locked, as the rows are selected by your thread, they are updated as Locked = 1;
Then your select from other threads selects only rows that aren't locked. When your process is done and all rows are processed you could reset the lock.
Hard to say what will perform best without some trials. Good luck.
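Solution 2 might be sketched as follows, with one adjustment that is my assumption rather than part of the answer: instead of a plain Locked = 1 flag, each caller stamps the rows with its own token in a single UPDATE, so it can read back exactly the rows it claimed. The code is Python with sqlite3 as a stand-in, and the `work_queue` table, `claimed_by` column, and helper names are hypothetical. sqlite serializes writers, so this demonstrates the shape of the technique, not SQL Server's locking behavior.

```python
import sqlite3
import uuid


def make_queue(row_count):
    """Create an in-memory queue table with row_count unclaimed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE work_queue (id INTEGER PRIMARY KEY, claimed_by TEXT)"
    )
    conn.executemany(
        "INSERT INTO work_queue (id) VALUES (?)",
        [(i,) for i in range(1, row_count + 1)],
    )
    conn.commit()
    return conn


def claim_batch(conn, batch_size):
    """Mark up to batch_size unclaimed rows as ours and return their ids."""
    token = uuid.uuid4().hex  # identifies this caller's claim
    with conn:
        # Selecting and marking happen in one statement, so there is no
        # window between "find free rows" and "mark them as taken".
        conn.execute(
            "UPDATE work_queue SET claimed_by = ? WHERE id IN ("
            " SELECT id FROM work_queue WHERE claimed_by IS NULL"
            " ORDER BY id LIMIT ?)",
            (token, batch_size),
        )
    rows = conn.execute(
        "SELECT id FROM work_queue WHERE claimed_by = ? ORDER BY id", (token,)
    )
    return [r[0] for r in rows]
```

Successive calls to `claim_batch` return disjoint batches, and an empty list once every row has been claimed. Resetting the lock after processing, as the answer suggests, would be a matter of setting `claimed_by` back to NULL.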
This question is super old but still very relevant, and I spent a lot of time finding this solution, so I thought I'd post it for anyone else who happens along. This is very common when using a SQL table as a queue rather than MSMQ.
The solution (after a lot of investigation) is simple and can be tested by opening 2 tabs in ssms with each tab running its own transaction to simulate multiple processes/threads hitting the same table.
The quick answer is this: the key to this is using updlock and readpast hints on your selects.
To illustrate the reads working without duplication check out this simple example.
--on tab 1 in ssms
begin tran
SELECT TOP 1 ordno FROM table_queue WITH (updlock, readpast)
--on tab 2 in ssms
begin tran
SELECT TOP 1 ordno FROM table_queue WITH (updlock, readpast)
You will notice that the first selected record is locked and does not get duplicated by the select statement firing on the second tab/process.
Now, in the real world you wouldn't just execute a select on your table like the simple example above. You would update your records as "isprocessing=1" or something similar if you are using your table as a queue. The above code just demonstrates that this allows concurrent reads without duplication.
So in the real world (if you are using your table as a queue and processing this queue with multiple services for instance) you would execute your select in a subquery to an update statement most likely.
Something like this.
begin tran
update table_queue set processing= 1 where myId in
(
SELECT TOP 50 myId FROM table_queue WITH (updlock, readpast)
)
commit tran
You may also combine your update statement with the OUTPUT keyword so that you get a list of all ids that are now locked (processing=1) and can work with them.
If you are processing data using a table as a queue, this will ensure you do not duplicate records in your select statements, without any need for paging or anything else.
This solution is being tested in an enterprise level application where we experienced a lot of duplication in our select statements when being monitored by many services running on many different boxes.
I'm working on an online sales web site. I'm using C# 4.0 and SQL Server 2008, and I want to control and prevent users from simultaneously inserting into a table like dbo.orders. How can I do that?
Inserts will not be a problem, but updates can be. The term that you need to research is database concurrency. There are four basic models you can implement each with its own pros and cons. Some are better suited for certain situations and there are hundreds of articles on the web for this subject.
Are you trying to solve this in C# code or in SQL? In SQL the usual tool is a transaction: add BEGIN TRAN at the beginning of the stored procedure and COMMIT at the end. Note that a transaction by itself makes the statements atomic but does not serialize callers; two requests run one after another only where they contend for the same locks (for example, if the procedure takes an exclusive table lock with a TABLOCKX hint). Where that happens it is a blocking operation, i.e. the second insert won't start until the first one has finished (whether successfully or not).
In your Add method you can use the lock keyword, which allows only one thread at a time to run the insert.
I am writing an application that logs status updates (GPS locations) from devices to a database. The updates occur at a set interval for each device, which is currently every 3 seconds. I'm using a simple table in SQL Server 08 for storing each update.
I've noticed that running the inserts is an area of slowdown in my application. It's not a severe slowdown, but it is noticeable. Naturally, I'd like to write to the database as efficiently as possible. I have an idea to improve the performance and am looking for input and advice to see if it will help:
The status updates come in from an asynchronous Socket thread. In my current implementation, the database insert call is executed from this thread. I'm thinking I can create a queue for holding update data that the Socket thread can quickly add its update to and then go on its merry way. There would then be a separate thread whose sole responsibility would be checking the update queue and inserting the updates into the database.
Basically this whole process rests on the assumption that writing to the database from one location with a bunch of data all at once is more efficient than writing one row of data at a random time. Is my assumption correct, or way off base? Also, on the SQL side, is there a command to tell it to write a bunch of rows at once that would improve write performance?
This is how the database is being written to:
I'm using LinqToSQL in C#, so for each insert, I first create a DataContext instance. From the DataContext object I then call a stored procedure which inserts the location update.
The table is indexed by datetime, for the time of the update.
Have a look at the SqlBulkCopy class; it lets you bulk-load chunks of rows very quickly, much as the bcp utility does.
Also, make sure your indexes are efficient. If the clustered index is on a column that does not increase sequentially (unlike an identity integer or an insert date), inserts land in the middle of the index and cause page splits as the pages fill up, which slows writes down.
Have you looked at MSMQ (Microsoft Message Queuing)? It seems to me an option worth a look.
Yes, inserting in batches will typically be faster than separate inserts given your description. Each insert will require a connection to be set up and packets to be transferred. If you have a single small insert that takes one packet and you issue three of those, but you alternatively have three inserts that are small enough that they can all fit in one packet then it will help.
Quantifying it is difficult just based on your description - you'll need to do testing for that. For example, if you are keeping a dedicated connection open at all times anyway, as hova suggests, then you might see less of an impact.
Another area you might want to take a look at is whether you are setting up and tearing down a connection for each insert. That alone might make a performance improvement, negating the need for batching.
You'll also want to have as few indexes on the table as possible.
It sounds like a good idea. Why not give it a shot and see how it performs?
On the SQL side you'd want to have a look at making sure you are using parameterized queries.
Also batching your INSERT statements will certainly increase the performance.
Connection management is also key; of course, that depends on how the application is built and whether it depends on a connection being there.
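The design from the question (the socket thread drops updates on a queue, and a single writer thread inserts them in batches through parameterized statements) might look roughly like this. It is a sketch in Python with sqlite3 as a stand-in for SQL Server; the `location_update` table, the `STOP` sentinel, and the function names are all hypothetical, and the batch size is an arbitrary choice that would need measuring.

```python
import os
import queue
import sqlite3
import tempfile
import threading

STOP = object()  # sentinel telling the writer to finish


def writer(db_path, updates, batch_size=100):
    """Drain the queue and insert each batch in a single transaction."""
    conn = sqlite3.connect(db_path)  # one long-lived connection for all batches
    conn.execute(
        "CREATE TABLE IF NOT EXISTS location_update ("
        "device_id INTEGER, lat REAL, lon REAL, recorded_at TEXT)"
    )
    done = False
    while not done:
        batch = []
        item = updates.get()  # block until at least one update arrives
        while True:
            if item is STOP:
                done = True
                break
            batch.append(item)
            if len(batch) >= batch_size:
                break
            try:
                item = updates.get_nowait()  # take whatever else is waiting
            except queue.Empty:
                break
        if batch:
            with conn:  # one commit per batch instead of one per row
                conn.executemany(
                    "INSERT INTO location_update VALUES (?, ?, ?, ?)", batch
                )
    conn.close()


def demo(update_count=250):
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        updates = queue.Queue()
        t = threading.Thread(target=writer, args=(db_path, updates, 100))
        t.start()
        # Stand-in for the socket thread: enqueue and move on immediately.
        for i in range(update_count):
            updates.put((i % 10, 51.5, -0.1, "2024-01-01T00:00:%02d" % (i % 60)))
        updates.put(STOP)
        t.join()
        conn = sqlite3.connect(db_path)
        count = conn.execute("SELECT COUNT(*) FROM location_update").fetchone()[0]
        conn.close()
        return count
    finally:
        os.remove(db_path)
```

The writer inserts whatever has accumulated rather than waiting for a full batch, so a quiet period does not delay rows. Updates that are still in the in-memory queue are lost if the process dies, which is the trade-off raised in the next reply.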
Are you not afraid of losing data while you are collecting it for the batch copy?
I'm writing an application that does the same thing. At the start I will have to write data from 3.5k GPS devices. Each device should send data once a minute, but it can send faster. The target number of devices is 10.5k.
I'm wondering about insert performance too. For now I'm saving received data to the db on every packet using pure ADO.NET ICommand and a stored procedure. On my test server (Xeon 3.4 GHz and one 1 TB hard disk, a normal desktop ;) it currently takes 1 ms or less.
@GRIMUS - should I be worried if there are going to be more devices?