I am writing an application where users can create items with a start date and an end date and save them to a SQL database hosted on Microsoft SQL Server. The rule in the application is that only a single item can be active at any given time (no overlapping items). The application also needs to be load balanced, which means (as far as I know) traditional in-process semaphores / locks won't work.
A few additional items:
The records are persisted into two tables (based on a business requirement).
Users are allowed to "insert" records in the middle of an existing record. Inserted records adjust the start & end dates of any pre-existing records to prevent overlapping items (if necessary).
Ideally we want to accomplish this using our ORM and .NET. We don't have much leeway to make database schema changes, but we can create transactions and do other kinds of SQL operations through our ORM.
Our goal is to prevent the following from happening:
Saves from multiple users resulting in overlapping items in either table (ex. users 1 & 2 query the database, see that there aren't overlapping records, and save at the same time)
Saves from multiple users resulting in a different state in each of the destination tables (ex. Two users "insert" records, and the action is interleaved between the two tables. Table A looks as though User 1 went first, and table B looks as though User 2 went first.)
My question is: how can I lock or otherwise prevent multiple users from saving / inserting at the same time across load-balanced servers?
Note: We are currently looking into using sp_getapplock as it seems like it would do what we want. If you have experience with this, or feel it would be a bad decision and want to elaborate, that would be appreciated as well!
Edit: added additional info
There are at least a couple of options:
You can create a stored procedure which wraps the INSERT operation in a transaction:
begin tran
  select to see if there is an existing / overlapping record
  if there is, you can either:
  - invalidate or adjust the previous record and insert the new record, or
  - raise an error
commit tran
on error: rollback tran
You can employ a last-in-wins strategy where you don't employ a write-level lock, but rather a read-level pseudo-lock, essentially ignoring all records except the latest one.
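Since sp_getapplock was mentioned in the question, here is a minimal sketch of how the first option might look with an application lock serializing the overlap check across load-balanced servers; the table and column names (dbo.Items, StartDate, EndDate) are placeholders, not your actual schema:

CREATE PROCEDURE dbo.SaveItem
    @StartDate datetime,
    @EndDate   datetime
AS
BEGIN
    SET XACT_ABORT ON;
    BEGIN TRAN;

    -- Serialize all writers on a single named lock; it is released automatically
    -- when the transaction commits or rolls back.
    EXEC sp_getapplock @Resource = 'ItemSave', @LockMode = 'Exclusive', @LockOwner = 'Transaction';

    IF EXISTS (SELECT 1 FROM dbo.Items
               WHERE StartDate < @EndDate AND EndDate > @StartDate)
    BEGIN
        -- Either adjust / invalidate the overlapping rows here, or fail the save.
        ROLLBACK TRAN;
        RAISERROR('An overlapping item already exists.', 16, 1);
        RETURN;
    END

    INSERT INTO dbo.Items (StartDate, EndDate) VALUES (@StartDate, @EndDate);
    -- The insert into the second table goes here, inside the same transaction,
    -- so both tables always see the saves in the same order.

    COMMIT TRAN;
END

Because the lock is held for the duration of the transaction, two users saving at the same time from different servers are processed one after the other, which also keeps the two destination tables consistent with each other.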
I have one database server, acting as the main SQL Server, containing a table that holds all the data. Other database servers (different instances of SQL Server) come in and out. When they come online, they need to download data from the main table (for a given time period); they then generate their own additional data into the same table in their local SQL Server database, and then want to update the main server with only the new data, using a C# program run by a scheduled service every so often. Multiple additional servers could be generating data at the same time, although there won't be that many of them.
The main table will always be online. The additional non-main database table is not always online and is not an identical copy of the main table: first it contains a subset of the main data, then it generates its own additional data into the local table and updates the main table with those changes every so often. A large number of rows could be generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work because I can't have duplicate entries on the main server, and it would fail the constraint checks since some entries already exist.
You could do it in the database or in C#. In either case you will need to do something like Using FULL JOINs to Compare Datasets; you know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into batches of 1,000 rows per transaction, or experiment to find which batch size works best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a table-valued parameter (TVP) to pass it to a stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data.
And lastly, in C# use Dictionary<TKey, TValue> or another structure with O(1) access time.
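A minimal sketch of the TVP + MERGE suggestion, assuming a main table dbo.MainTable(Id, Payload); the type, procedure and column names are illustrative only, and the TVP is passed from C# as a SqlParameter with SqlDbType.Structured:

-- Table type with the same shape as the main table (illustrative columns).
CREATE TYPE dbo.MainTableType AS TABLE (Id int PRIMARY KEY, Payload nvarchar(200));
GO
CREATE PROCEDURE dbo.UpsertMainTable @Rows dbo.MainTableType READONLY
AS
BEGIN
    SET XACT_ABORT ON;
    BEGIN TRAN;

    -- Insert new rows and update existing ones in a single statement.
    MERGE dbo.MainTable AS target
    USING @Rows AS source ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET Payload = source.Payload
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Payload) VALUES (source.Id, source.Payload);

    COMMIT TRAN;
END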
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
I would use an IsProcessed flag in the main server's table and keep track of the main table's primary keys when you download data to the local DB server. Then you should be able to do a delete and update against the main server again.
Here's how I would do it:
Create a stored procedure on the main database which receives a user-defined table type parameter with the same structure as the main table.
It should do something like:
INSERT INTO YourTable SELECT * FROM @TableVar
Or you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part) of the data from the secondary table and send it to the stored procedure as a table-valued parameter.
You could do it in batches of 1,000 or so, and each time a batch has been applied you should mark it in the source table / source updater code.
Can you use linked servers for this? If so, it will make copying data to and from the main server much easier.
When copying data back to the main server I'd use an IF NOT EXISTS check before each INSERT statement to make sure there are no duplicates, and wrap all the insert statements in a transaction so that if an error occurs the transaction is rolled back.
I also agree with others on doing this in batches of 1,000 or so records, so that if something goes wrong you can limit the damage.
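A rough sketch of that duplicate-safe, transactional copy, assuming the new rows sit in a staging table dbo.StagingTable and the destination is dbo.MainTable(Id, Payload); all names here are placeholders:

BEGIN TRY
    BEGIN TRAN;

    -- Insert only rows whose keys are not already present in the main table.
    INSERT INTO dbo.MainTable (Id, Payload)
    SELECT s.Id, s.Payload
    FROM dbo.StagingTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MainTable AS m WHERE m.Id = s.Id);

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;   -- undo the whole batch on any error
END CATCH

The same pattern works over a linked server by qualifying one side with its four-part name, and it can be run once per 1,000-row batch as suggested.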
I'm a C++ programmer and I'm not familiar with the .NET database model. I usually use IDataReader (OdbcDataReader, OleDbDataReader or SqlDataReader) to read data from a database. Sometimes, when I need a lot of data, I use a DataAdapter. But what should I do to get the functionality of scrollable cursors that exists in native libraries like ODBC?
Thanks to all of you for your answers, but I am in a situation where I can't accept them; of course this is my fault for not explaining my problem completely. I explained it in a comment on one of the answers, which has now been removed.
I have to write a program that will act as a proxy between the client-side program and MSSQL. For this library I have the following requirements:
My program should be compatible with MSSQL2000
I don't know all the tables and queries that will be sent by the user; I should simply add some information to them, make a log, ... and then execute them against MSSQL. So it is really hard to use techniques based on ordered field(s) of the query or on the primary key of the table. (All my work is in one database, but that database is huge and may change over time.)
Only a part of the data is needed by the client. Most DBMSs support LIMIT/OFFSET; unfortunately MSSQL does not, and ROW_NUMBER does not exist in MSSQL2000. Even if it were supported, I would again need to understand the program logic, and that requires parsing the SQL command. (Actually I wrote a parsing library with boost::spirit, but that is native code and besides that I'm not yet 100% sure about its functionality.)
I may have multiple clients, but most of the queries they send are one of a few predefined queries (users still send custom queries, but those are about 30% of all queries). So I think I can open some scrollable cursors and respond to clients using those cursors and a custom cache.
The server machine and its MSSQL instance will be dedicated to my program, so I really want to use all of the power of the server and the DBMS to achieve this functionality.
So now:
What is the problem with using scrollable cursors and why should I avoid them?
How can I use scrollable cursors in .NET?
In SQL Server you can page queries like this. You handle the page number easily from the application; you do not need to create cursors for this task.
For SQL Server 2005 or higher
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum FROM TableA ) AS Paged
WHERE RowNum > 40
  AND RowNum <= 50
For SQL Server 2000
SELECT TOP 10 T.* FROM TableA AS T WHERE T.ID NOT IN
( SELECT TOP 40 ID FROM TableA ORDER BY ID DESC )
ORDER BY T.ID DESC
PS: edited to include support for SQL Server 2000
I usually use DataReader.Read() to skip all the rows that I do not want when doing paging on a DB which does not support paging.
If you don't want to build the SQL paged query yourself you are free to use my paging class: https://github.com/jgauffin/Griffin.Data/blob/master/src/Griffin.Data/BasicLayer/Paging/SqlServerPager.cs
When Microsoft designed the ADO.NET API, they made the decision to expose only firehose cursors (IDataReader etc). This may or may not actually pose a problem for you. You say that you want "functionality of scrollable cursors", but that can mean all sorts of things, not just paging, and each particular use case can be tackled in a variety of ways. For example:
Requirement: The user should be able to arbitrarily page up and down the resultset.
Retrieve only one page of data at a time, e.g. using the ROW_NUMBER() function. This is more efficient than scrolling through a cursor.
Requirement: I have an extremely large data set and I only want to process one row at a time to avoid running out of memory.
Use the firehose cursor provided by ADO.NET. Note that this is only practical if (a) you don't need to hit the database at all during the loop, or (b) you have MARS configured in your connection string.
Simulate a keyset cursor by retrieving the set of unique identifiers into an array, then loop through the array and read one row of data at a time.
Requirement: I am doing a complicated calculation that involves moving forwards and backwards through the resultset.
You should be able to re-write your algorithm to eliminate this requirement. For example, read one set of rows, process them, read another set of rows, process them, etc.
UPDATE (more information provided in the question)
Your business requirements are asking too much. You have to handle arbitrary queries that assume the presence of scrollable cursors, but you can't provide scrollable cursors, and you can't re-write the client code to not use scrollable cursors. That's an impossible position to be in. I recommend you stick with what you currently have (C++ and ODBC) and don't bother trying to re-write it in .NET.
I don't think cursors will work for your particular case. The main reason is that you have 3 tiers. But let's take two steps back.
Most 3-tier applications have a stateless middle tier (your C++ code). Caching is fine, since it is really just an optimization and does not create any real state in the middle tier. The middle tier normally has a small number of open sessions to the database, because opening a DB session is expensive for the processor, and once a session is open a set amount of RAM is reserved at the database server. When a request is received by the middle tier, it is processed and handed on to the SQL database. An algorithm may be used to pick any of the open sessions, or it can even be done at random. In this model it is not possible to know which session will receive the next request. Cursors belong to the session that received the original query request, so you can't really expect that the receiving session will be the one that has your open cursor.
The 3-tier model I described is used mainly for web applications so they can scale to hundreds or thousands of clients, where SQL Server would never be able to open that many sessions. Microsoft ADO.NET already has many features to support the kind of architecture I described, so it is not very hard to implement, and the same approach is used even in non-web applications depending on the circumstances. You could potentially keep track of your sessions so you could open a single session per client, but I would first make sure that the use case justifies it. Know that open cursors can take up a lot of resources as well.
Cursors still have a place within a single transaction, it's just hard to keep them open so that the client application can fetch/update values within the result set.
What I would suggest is that you do the following within the query transaction: store in a separate table the primary key values of the main table returned by your query, along with other values like a session ID and a row number. Return the first few rows by linking to the new table in the original query, and in subsequent calls just query the corresponding rows again by linking to your new table. You will need the equivalent of a caching mechanism to purge old data and to refresh the result set according to your needs.
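A sketch of that keyset-style cache, with hypothetical names (dbo.ResultKeys for the key table, dbo.BigTable for the queried table); SET ROWCOUNT is used for the page size so the pattern also works on SQL Server 2000:

-- Key table: the IDENTITY column records the order in which keys were captured.
CREATE TABLE dbo.ResultKeys (
    SessionId uniqueidentifier NOT NULL,
    RowNumber int IDENTITY(1,1) NOT NULL,
    Id        int NOT NULL
);

-- 1. Run the original query once and capture just its keys, in result order.
INSERT INTO dbo.ResultKeys (SessionId, Id)
SELECT @SessionId, b.Id
FROM dbo.BigTable AS b
WHERE b.SomeFilter = @Filter
ORDER BY b.SomeSortColumn;

-- 2. On each later call, return the next page by joining back on the cached keys.
SET ROWCOUNT @PageSize;
SELECT b.*, k.RowNumber
FROM dbo.ResultKeys AS k
JOIN dbo.BigTable AS b ON b.Id = k.Id
WHERE k.SessionId = @SessionId
  AND k.RowNumber > @LastRowNumberSent   -- remember the last RowNumber already returned
ORDER BY k.RowNumber;
SET ROWCOUNT 0;

Rows for old sessions would be purged periodically, which is the caching/refresh mechanism mentioned above.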
I have tables A, B and C in the database. I have to put the result obtained from A and B into table C.
Currently, I have an SP that returns the result of A and B to the C# application. This result is then copied into table C using System.Data.SqlClient.SqlBulkCopy. The advantage is that during the insert using bulk copy, log files are not created.
I want to avoid this extra traffic by handling the insert in the SP itself. However, it should not use any log files. Is there any way to achieve this?
Please share your thoughts.
Volume Of Data: 150,000
Database : SQL Server 2005
The database is in the full recovery model; that cannot be changed. Is SELECT INTO useful in such a scenario?
EDIT: When I use System.Data.SqlClient.SqlBulkCopy, the operation completes in 3 minutes; a normal insert takes 30 minutes... This particular operation need not be recoverable; however, other operations in the database have to be recoverable, hence I cannot change the recovery model of the whole database.
Thanks
Lijo
You can use SELECT INTO with the BULK_LOGGED recovery model in order to minimise the number of records written to the transaction log, as described in Example B of the INTO Clause documentation (MSDN):
ALTER DATABASE AdventureWorks2008R2 SET RECOVERY BULK_LOGGED;
GO
-- Put your SELECT INTO statement here
GO
ALTER DATABASE AdventureWorks2008R2 SET RECOVERY FULL;
This is also required for bulk inserts if you wish to have minimal impact on the transaction log as described in Optimizing Bulk Import Performance (MSDN):
For a database under the full recovery model, all row-insert operations that are performed during bulk import are fully logged in the transaction log. For large data imports, this can cause the transaction log to fill rapidly. For bulk-import operations, minimal logging is more efficient than full logging and reduces the possibility that a bulk-import operation will fill the log space. To minimally log a bulk-import operation on a database that normally uses the full recovery model, you can first switch the database to the bulk-logged recovery model. After bulk importing the data, switch the recovery model back to the full recovery model.
(emphasis mine)
I.e. if you don't already set the database recovery model to BULK_LOGGED before performing a bulk insert, then you won't currently be getting the benefit of minimal transaction logging with bulk inserts either, and so the transaction log won't be the source of your slowdown. (The SqlBulkCopy class doesn't do this for you automatically or anything.)
Maybe you can use select into.
Try to take a look at http://msdn.microsoft.com/en-us/library/ms191244.aspx
Can you give an example of the processing your procedure does?
Typically, I would think a set-based insert of 150,000 rows (no linked servers or anything) would take almost no time on most installations.
How long does just selecting the 150,000 rows with a query take?
Are you using a cursor and loop instead of a single INSERT INTO C SELECT * FROM (some combination of A and B)?
Is there any blocking which is causing the operation to wait for other operations to complete?
If your database is in full recovery model, it is going to log the operation - that's the point of using the database that way. The database has been told to use that model and it's going to do that to ensure it can comply.
Imagine if you told the database that a column needed to be unique but it didn't actually enforce it for you! It would be worth less than a comment on a post-it note which fell off a specification document!
In SQL Server 2008 you do not need to return the data to the client/application before proceeding with a minimally logged operation. You can do it within the stored procedure immediately following your query that produces the result to be inserted to Table C.
See Insert: Specifically "Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging"
[Edit]: Since you have expanded your question to say that you are using the FULL recovery model, you cannot benefit from minimally logged operations.
Instead you should concentrate your efforts on optimising your data insert process, rather than concern yourself with logging overhead.
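For reference, a sketch of what the minimally logged INSERT…SELECT pattern referred to above looks like; note that it is only minimally logged on SQL Server 2008+, with the database in SIMPLE or BULK_LOGGED recovery and the other documented prerequisites (e.g. an empty heap target), so under FULL recovery it is still fully logged. Table and column names are placeholders:

INSERT INTO dbo.C WITH (TABLOCK) (Key1, Col1, Col2)   -- the table lock is one prerequisite for minimal logging
SELECT a.Key1, a.Col1, b.Col2
FROM dbo.A AS a
JOIN dbo.B AS b ON b.Key1 = a.Key1;                   -- whatever combination of A and B the SP computes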
Insert the data into table C in parts, using INSERT INTO C SELECT * FROM AandB WHERE ID < SOMETHING. Or you can send the output of A and B as XML to a stored procedure and insert the bulk data there.
Hope this will help you.
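A rough sketch of that batched insert, assuming an integer ID on the combined A-and-B result; the batch size and all names are illustrative:

DECLARE @BatchSize int, @From int, @MaxId int;
SET @BatchSize = 10000;
SELECT @From = MIN(ID), @MaxId = MAX(ID) FROM dbo.AandB;   -- AandB = view/query combining A and B

WHILE @From <= @MaxId
BEGIN
    -- Each batch commits separately, which keeps individual transactions small.
    INSERT INTO dbo.C (ID, Col1, Col2)
    SELECT ID, Col1, Col2
    FROM dbo.AandB
    WHERE ID >= @From AND ID < @From + @BatchSize;

    SET @From = @From + @BatchSize;
END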
I have a table where I want to log activities of some part of my application. A record will be inserted (and may be updated in the future) in this table when a record is inserted or updated in some other table.
E.g.
- If a record is inserted in the Orders table, an entry will be inserted in the Log table.
- If a record is inserted in the Booking table, an entry will be inserted in the Log table.
- If a record is updated in the Customers table, an entry will be inserted in the Log table if the Log table does not already have an entry for this customer.
- etc.
Should I use triggers on these tables to add records to the Log table, or should I have a common method in my code and call that method whenever insert/update activity occurs?
I have to do this in several parts of my application, so there are more than 20 tables on which I would be adding a trigger, or a couple of different locations from which I would call the method.
I am using SQL Server 2005 and C#
What is better, Trigger or A method?
A method is a better option than a trigger.
Triggers are generally:
- performance heavy
- less visible in the code, i.e. hidden away
- more difficult to debug and maintain
- limited in what values can be passed to the log table
A method would give you lots of advantages in terms of optimizing the code and extending the logic, and it is easier to maintain.
As this seems to be an important task, I would use triggers inside the RDBMS to ensure that the logs are created even when something other than your application causes the change.
If someone has the ability to update the database without your app, e.g. by using TOAD, SSMS, Query Analyzer etc., a trigger would be better.
It is never too late for such questions.
In general, triggers reduce the round trips between your code and the DB.
In your case, to do this in C# you would need two trips for each action: one for the action itself (the insert) and one for the log action. You would also need to do a lot of exception handling in your code, so that if the record is not inserted you handle it and log that failure as a different action.
With a trigger, you send the data to the server once and all the actions and handling are done there, with no extra connections.
This is especially useful now that everything is shared and connection pools are limited.
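A minimal sketch of the trigger approach for one of the tables (Orders), assuming a Log table with hypothetical columns; each of the other tables would get a similar trigger:

CREATE TRIGGER trg_Orders_Log
ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- One log row per affected Orders row; 'inserted' holds the new values.
    INSERT INTO dbo.Log (TableName, RecordId, Action, LoggedAt)
    SELECT 'Orders',
           i.OrderId,
           CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'UPDATE' ELSE 'INSERT' END,
           GETDATE()
    FROM inserted AS i;
END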
Maybe my title is not clear. I am looking for some kind of version control on database tables, like Subversion does for files, or like a wiki does for pages.
I want to trace the change log.
I want to extract a diff and apply it in reverse (undo, like "svn merge -r 101:100").
I may need an indexed search on the history.
I've read the "Design Pattern for Undo Engine", but it is related to "Patterns". Are there anything I could reuse without reinvent the wheel?
EDIT:
For example, bank account transactions. I have a "balance" column (and others) updated in a table. A user may find his mistake 10 days later, and he will want to cancel/roll back that specific transaction without changing the others.
How can I do this gracefully at the application level?
Martin Fowler covers the topic in Patterns for things that change with time. Still patterns and not an actual framework but he shows example data and how to use it.
You could use a revision approach for each record that you want to trace. This would involve retaining a row in your table for every revision of a record. The records would be tied together by a shared 'ID' and could be queried on the 'Revision Status' (e.g. Get the latest "Approved" record).
In your application tier, you can handle these records individually and roll back to an earlier state if needed, as long as you record all the necessary information.
[ID] [Revision Date] [Revision Status] [Modified By] [Balance]
 1    1-1-2008        Expired           User1         $100
 1    1-2-2008        Expired           User2         $200
 2    1-2-2008        Approved          User3         $300
 1    1-3-2008        Approved          User1         $250
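For example, fetching the current state of a record under this scheme might look like the following; the table name (dbo.AccountRevisions) is hypothetical and the columns follow the illustrative layout above, with spaces dropped:

-- Latest approved revision for a given record ID.
SELECT TOP 1 *
FROM dbo.AccountRevisions
WHERE ID = @ID
  AND RevisionStatus = 'Approved'
ORDER BY RevisionDate DESC;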
Pedantic point. Your bank account example would not get past an auditor/regulator.
Any erroneous entries in an account should be left there for the record. An equal and opposite correction transaction would be applied to the account. In effect this rolls back the original transaction but leaves a very obvious trace of the original error and its correction.
I'd go with a bi-temporal database design, which would give you all the data required to perform a rollback, whether that means inserting more rows or simply deleting the later modifications.
There's a fair amount of subtlety to such a database design, but there is a very good book on the subject:
Developing Time-oriented Database Applications in SQL by Richard T. Snodgrass
available for download here:
http://www.cs.arizona.edu/people/rts/tdbbook.pdf
Using a database transaction for this would be a bad idea because of the locks it would create in the database; basically, database transactions should be as short as possible.
Anything in the application layer, unless it has some persistence mechanism itself, won't survive application restarts (although that might not be a requirement).
Based on your comment to James Anderson, I would have the user interface write a new insert when cancelling a transaction. It would insert a new record into the table with the same values as the cancelled transaction, except that the value would be a negative number instead of a positive number. If you have a column that defines the purpose of the transaction, I would make it say "cancelled" and reference the record number of the transaction being cancelled.
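A sketch of that correction insert, assuming a hypothetical dbo.Transactions table with an Amount, a Purpose and a reference back to the row being cancelled:

-- Insert an equal-and-opposite row instead of modifying or deleting the original.
INSERT INTO dbo.Transactions (AccountId, Amount, Purpose, CancelsTransactionId, CreatedAt)
SELECT t.AccountId,
       -t.Amount,             -- negate the original amount
       'Cancelled',
       t.TransactionId,       -- record which transaction this reverses
       GETDATE()
FROM dbo.Transactions AS t
WHERE t.TransactionId = @TransactionId;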
Based on the various comments, a possible solution to your problem would be to create a "date effective" table.
Basically you add valid-from-date and valid-to-date columns to every table.
The "current" record should always have a valid_to_date of "2999-12-31" or some arbiteraly high value.
When a value changes, you set the "valid-to-date" of the current row to the current date and insert a new row with a valid-from-date of today and a valid-to-date of "2999-12-31", copying all the columns from the old row that have not been changed.
You can create views with
"select all-columns-except-valid-xx-date from table where valid-to-date = '2999-12-31'"
Which will allow all your current queries to work unchanged.
This is a very common technique in data warehouse environments and for things like exchange rates, where the effective date is important.
The undo logic should be obvious.
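A sketch of one such date-effective update, with illustrative names (dbo.Accounts, Balance):

BEGIN TRAN;

-- Close off the current row.
UPDATE dbo.Accounts
SET ValidToDate = GETDATE()
WHERE AccountId = @AccountId
  AND ValidToDate = '2999-12-31';

-- Insert the new current row, copying unchanged columns from the old one.
INSERT INTO dbo.Accounts (AccountId, Balance, ValidFromDate, ValidToDate)
VALUES (@AccountId, @NewBalance, GETDATE(), '2999-12-31');

COMMIT TRAN;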
I'm not aware of a specific pattern, although I have set up full undo/audit histories before using triggers and rowversions.
There are a couple of apps for MS SQL that let you trawl through the logs and see the actual changes.
I've used one called Log Navigator back with MS SQL 2000 that used to let me undo a specific historical transaction; I can't find it now, though.
http://www.lumigent.com and http://www.apexsql.com do tools for viewing the logs, but I don't think either lets you roll them back.
I think the best way to do this is to write your application with this in mind; you already have a couple of good suggestions here on how to do that.