There is something that worries me about my application. I have a SQL query that does a bunch of inserts into the database across various tables. I timed how long it takes to complete the process, and it takes about 1.5 seconds. At this point I'm not even done developing the query; I still have more inserts to program into it, so I fully expect the process to take even longer, perhaps up to 3 seconds.
Now, it is important that all of this data be consistent and finish either completely or not at all. So what I'm wondering is: is it OK for a transaction to take that long? Doesn't it lock up the tables, so that selects, inserts, updates, etc. cannot be run until the transaction is finished? My concern is that if this query is run frequently, it could lock up the entire application so that certain parts of it become either incredibly slow or unusable. With a low user base I doubt this would be an issue, but if my application gains some traction, this query could potentially be run a lot.
Should I be concerned about this, or am I missing something about how the database will actually behave? I'm using a SQL Server 2014 database.
To note, I timed this by starting a C# Stopwatch immediately before the transaction starts and stopping it right after the changes are committed, so it's about as accurate as can be.
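For reference, the timing pattern looks roughly like this (RunInserts stands in for the method that issues the actual INSERT statements, and the connection string is a placeholder):

using System;
using System.Data.SqlClient;
using System.Diagnostics;

var stopwatch = Stopwatch.StartNew();

using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        RunInserts(connection, transaction); // all inserts enlist in the one transaction
        transaction.Commit();                // nothing is visible to other sessions until here
    }
}

stopwatch.Stop();
Console.WriteLine($"Transaction took {stopwatch.ElapsedMilliseconds} ms");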
You're right to be concerned about this, as a transaction will lock the rows it has written until the transaction commits, which can certainly cause problems such as deadlocks and temporary blocking that slows system response. But there are various factors that determine the potential impact.
For example, you probably don't need to worry much if your users only update and query their own data, and your tables have indexing that supports both the read and the write query criteria. That way each user's row locking will largely not affect the other users (depending on how you write your code, of course).
If your users share data, and you want to support efficient searching across multiple users' data even with multiple concurrent updates, for example, then you may need to do more.
Some general concepts:
- Ensure your transactions write to tables in the same order.
- Keep your transactions as short as possible by preparing the data to be written as much as possible before starting the transaction.
- If this is a new system (and even if not new), definitely consider enabling Snapshot Isolation and/or Read Committed Snapshot Isolation on the database. SI will (when explicitly set on the session) allow your read queries not to be blocked by concurrent writes. RCSI will allow all your read queries by default not to be blocked by concurrent writes. But read this to understand both the benefits and gotchas of both isolation levels: https://www.brentozar.com/archive/2013/01/implementing-snapshot-or-read-committed-snapshot-isolation-in-sql-server-a-guide/
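For illustration only, enabling the two options is a one-time piece of T-SQL; the database name and connection string below are placeholders, and this would normally be run by a DBA in a maintenance window rather than from application code. Switching on READ_COMMITTED_SNAPSHOT needs the database to be otherwise idle, hence the ROLLBACK IMMEDIATE clause.

using System.Data.SqlClient;

// Hypothetical one-time setup; "MyDb" and the connection string are placeholders.
using (var connection = new SqlConnection("Server=.;Database=master;Integrated Security=true"))
{
    connection.Open();
    foreach (var statement in new[]
    {
        "ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON",
        // Kicks out other sessions while the option is switched on.
        "ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE"
    })
    {
        using (var command = new SqlCommand(statement, connection))
        {
            command.ExecuteNonQuery();
        }
    }
}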
I think it depends on your code: how effectively you use loops, your select queries, and the other statements.
Related
I want to use the DeleteManyAsync method to delete multiple documents. Sometimes I will be deleting from big collections, and in the meantime I would like my new documents to still be inserted. I would like to know if my database collection will be locked while DeleteManyAsync runs.
This is the code I want to use :
List<MyDocument> list = new List<MyDocument>(); // the documents selected for deletion
var filter = Builders<MyDocument>.Filter.In("_id", list.Select(i => i.InternalId));
await _context.MyDocuments.DeleteManyAsync(filter);
MongoDB locks are a low-level concern and are handled at the database server level. You, as a programmer writing a client application using the driver, do not need to concern yourself with database locks too much.
What I'm trying to say is that when using the C# driver you won't notice any kind of issue related to concurrent write operations executed on the same collection. Locks are handled by the storage engine, not by the driver used at the client application level.
If you check this documentation you can read that, in case of conflicting write operations on the same collection, the storage engine will retry the operation at the server level:
When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation
So, again, the concurrency issues are handled at the server level.
That said, if you need your application to be highly scalable, you should design your system to avoid concurrent write operations on the same collection as much as possible. As I said above, locks are handled by the storage engine to preserve the correctness of your data, but locks can reduce the overall scalability of your system. So, if scalability is critical in your scenario, you should carefully design your system and avoid contention at the database level as much as possible.
At the client application level you just need to decide whether or not to retry a failed write operation.
Sometimes you can safely retry a failed operation, and other times you can't (e.g. in some cases you will end up with duplicate data at the database level; a good guard against this is using unique indexes).
As a rule of thumb, idempotent write operations can safely be retried in case of a failure (because applying them multiple times does not have any side effect). Put another way, strive to have idempotent write operations as much as possible: this way you are always safe retrying a failed write operation.
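To make that concrete, here is a rough sketch of retrying an idempotent write with the C# driver; the upsert keyed on _id is what makes the retry safe, and the retry count and delay are arbitrary. Recent driver versions expose ReplaceOptions; older ones use UpdateOptions instead.

using System.Threading.Tasks;
using MongoDB.Driver;

// Hypothetical retry helper: an upsert keyed on _id produces the same end
// state no matter how many times it runs, so retrying it is safe.
static async Task UpsertWithRetryAsync(IMongoCollection<MyDocument> collection, MyDocument doc, int maxAttempts = 3)
{
    var filter = Builders<MyDocument>.Filter.Eq("_id", doc.InternalId);

    for (var attempt = 1; ; attempt++)
    {
        try
        {
            await collection.ReplaceOneAsync(filter, doc, new ReplaceOptions { IsUpsert = true });
            return;
        }
        catch (MongoException) when (attempt < maxAttempts)
        {
            await Task.Delay(200 * attempt); // simple linear backoff before the next attempt
        }
    }
}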
If you need some guidance on error handling with the MongoDB C# driver, you can take a look at this documentation.
Update 25th July 2020
Based on the author's comment, it seems that the main concern is not the actual database locking strategy, but rather the delete performance.
In that case I would proceed in the following manner:
- always prefer a command that performs a single database roundtrip (such as deleteMany) over issuing multiple single commands (such as deleteOne). By doing a single roundtrip you minimize the latency cost and perform a single database command; it's simply more efficient
- when you use a deleteMany command, be sure to always filter documents using a proper index, so that a collection scan is avoided when finding the documents to be deleted
- if you measure and you are sure that your bottleneck is the deleteMany speed, consider comparing the performance of the deleteMany command with that of an equivalent bulk write operation. I never tried that, so I have no idea about the actual speed comparison. My feeling is that there is probably no difference at all, because I suspect that under the hood deleteMany performs a bulk write, but I have no proof of that; it's just a feeling
- consider changing your design to exploit the TTL index feature for automatic deletion of documents once some sort of expiration criterion is satisfied (see the sketch after this list). This is not always possible, but it can be handy when applicable
- if you perform the delete operation as part of some sort of cleanup task on the data, consider scheduling a job that performs the data cleanup on a regular basis, but outside of your users' business hours
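As a rough illustration of the TTL idea (the CreatedAt field name and the 30-day window are invented; the indexed field must contain a BSON date):

using System;
using MongoDB.Driver;

// Hypothetical TTL index: MongoDB's background TTL task removes documents
// once their CreatedAt value is older than 30 days.
var keys = Builders<MyDocument>.IndexKeys.Ascending("CreatedAt");
var options = new CreateIndexOptions { ExpireAfter = TimeSpan.FromDays(30) };
await _context.MyDocuments.Indexes.CreateOneAsync(new CreateIndexModel<MyDocument>(keys, options));

Keep in mind that the TTL monitor runs periodically (roughly once a minute), so expired documents are not removed at the exact moment they expire.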
I have two servers that run the same query, checking for specific values in a single shared DB. If the query finds the values, it will alter them. At the same time the other server might run the same query, and there will be some kind of conflict when both try to alter the information.
Question: How can I best ensure that the servers won't run their query at the same time, and guarantee that they won't get conflicts?
Databases take care of this for you automatically. They use locks to make sure only one query accesses specific data at a time. These locks don't have to apply to whole tables; depending on the query and transaction type, per-row locks are also possible. When you have two queries that should be grouped together, such as your select and update, a transaction makes sure the locks from the first query are not released until both queries have finished.
Generally, databases are meant to serve queries (and release their locks) quickly, so that two queries arriving at about the same time are processed in sequence with little to no observable delay for the end user. Locks can still cause problems for queries that need to lock a lot of data or run for a long time, or when two transactions each lock some data and then both need data locked by the other; that last situation is called a deadlock.
Problems with locks can be controlled by adjusting transaction isolation levels. However, it's usually a mistake to go messing with isolation levels. Most of the time the defaults will do what you need, and changing isolation levels without fully understanding what you're doing can make the situation worse, as well as allow queries to return stale or wrong data.
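For example, here is a rough sketch of grouping the select and update into one transaction with ADO.NET (table and column names are invented, connectionString is a placeholder; the UPDLOCK hint keeps the selected row locked until the commit):

using System.Data.SqlClient;

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        // Claim one pending row; UPDLOCK holds the lock so another server
        // running the same code cannot claim the same row concurrently.
        object id;
        using (var select = new SqlCommand(
            "SELECT TOP (1) Id FROM Jobs WITH (UPDLOCK, ROWLOCK) WHERE Status = 'Pending'",
            connection, transaction))
        {
            id = select.ExecuteScalar();
        }

        if (id != null)
        {
            using (var update = new SqlCommand(
                "UPDATE Jobs SET Status = 'Processing' WHERE Id = @id",
                connection, transaction))
            {
                update.Parameters.AddWithValue("@id", id);
                update.ExecuteNonQuery();
            }
        }

        transaction.Commit();
    }
}

Adding READPAST to the hint list would let the second server skip rows the first one has locked rather than waiting for them, which is a common pattern for queue-like tables.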
Transactions and isolation levels are your friends here. You need to set the isolation level so that the two servers' queries won't interfere with each other.
Refer to https://msdn.microsoft.com/en-gb/library/ms173763.aspx for guidance on the level you need to set.
You could add an extra column to the table, e.g. server_id, and write the queries as:
select * from your_table where server_id = 1 -- for the first server
select * from your_table where server_id = 2 -- for the second server
I'm using Dapper, but this applies the same to ADO.NET code.
I have an operation on a web app that changes a lot of state in the database. To ensure an all-or-nothing result, I use a transaction to manage this. To do this, all my Repository classes share a connection (which is instantiated per request). On my connection I can call Connection.BeginTransaction().
However, this operation can sometimes take a while (say 10 seconds), and it locks some frequently-read-from tables while it does its thing. I want to allow other repos on other threads to continue without locking while this is happening.
It looks like I need to do 2 things to make this happen:
1) Set the IsolationLevel to something like ReadUncommitted:
_transaction = Connection.BeginTransaction(IsolationLevel.ReadUncommitted);
2) For all other connections that don't need a transaction, I still need to enroll those connections in a transaction, so that I can again set ReadUncommitted. If I don't do this, then they'll still block while they wait for the long-running operation to complete.
So does this mean I need ALL my connections to start a transaction? This sounds expensive and sub-performant. Are there other solutions I'm missing here?
Thanks
Be aware that there is a trade-off between using locks or not: it's performance versus concurrency control. Therefore, I don't think you should use ReadUncommitted all the time.
If you use ReadUncommitted on all the other transactions that shouldn't be blocked by this long-running transaction, they will also, as a side effect, not be blocked by any other transactions, and they may read uncommitted (dirty) data.
Generally, this isolation level is used when performance is the first priority and data accuracy is not a requirement.
I want to allow other repos on other threads to continue without locking while this is happening.
I think you can try IsolationLevel.Snapshot on only the transaction that does the long locking work: https://msdn.microsoft.com/en-us/library/tcbchxcb(v=vs.110).aspx
Extracted from the link:
The term "snapshot" reflects the fact that all queries in the
transaction see the same version, or snapshot, of the database, based
on the state of the database at the moment in time when the
transaction begins. No locks are acquired on the underlying data rows
or data pages in a snapshot transaction, which permits other
transactions to execute without being blocked by a prior uncompleted
transaction. Transactions that modify data do not block transactions
that read data, and transactions that read data do not block
transactions that write data, as they normally would under the default
READ COMMITTED isolation level in SQL Server. This non-blocking
behavior also significantly reduces the likelihood of deadlocks for
complex transactions.
Be aware that an enormous amount of data could be generated in tempdb for the version store if there are a lot of modifications.
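A rough sketch of what that looks like with ADO.NET/Dapper (assuming ALLOW_SNAPSHOT_ISOLATION is already ON for the database; connectionString, longRunningSql and the repository wiring are placeholders):

using System.Data;
using System.Data.SqlClient;
using Dapper;

// Hypothetical sketch: only the long-running operation opts into snapshot isolation.
// Readers get the full benefit when READ_COMMITTED_SNAPSHOT is also enabled for the
// database, since they then read row versions instead of waiting on this transaction's locks.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction(IsolationLevel.Snapshot))
    {
        connection.Execute(longRunningSql, transaction: transaction);
        transaction.Commit();
    }
}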
I am having a challenge maintaining an incredibly large transaction using NHibernate. Let us say I am saving a large number of entities. If I do not flush every N entities, say 10000, then performance gets killed by the overcrowded NHibernate session. If I do flush, I take locks at the DB level which, in combination with the read committed isolation level, affect the working application. Also note that in reality I am importing an entity whose business logic is one of the hearts of the system, and its import touches around 10 tables. That makes a stateless session a bad idea, because I would have to maintain the cascades manually.
Moving the business logic to a stored procedure is a big challenge for two reasons:
- there is already complicated OO business logic in the domain classes of the application,
- duplicated business logic would be introduced.
Ideally, I would like to flush the session to some kind of file and, only once the preparation of the data is complete, execute its contents. Is that possible?
Any other suggestions/best practices are more than welcome.
Your scenario is a typical ORM batch problem. In general we can say that no ORM is meant to be used for stuff like that. If you want high batch-processing performance (and not everlasting locks and maybe deadlocks), you should not use the ORM to insert thousands of records.
Instead, use native batch inserts, which will always be a lot faster (like SqlBulkCopy for MSSQL).
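For example, a minimal SqlBulkCopy sketch (the table name, columns and the in-memory DataTable are invented; in practice the rows would come from your prepared entities):

using System;
using System.Data;
using System.Data.SqlClient;

// Rough sketch: stage the rows in a DataTable, then push them to one target
// table in a single bulk operation instead of row-by-row INSERTs.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("CreatedAt", typeof(DateTime));
table.Rows.Add("example", DateTime.UtcNow);

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "dbo.MyEntity" })
    {
        bulkCopy.ColumnMappings.Add("Name", "Name");
        bulkCopy.ColumnMappings.Add("CreatedAt", "CreatedAt");
        bulkCopy.WriteToServer(table);
    }
}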
Anyway, if you want to use NHibernate for this, try to make use of the batch size setting (adonet.batch_size).
Call Save or Update on all your objects and only call session.Flush once at the end. This keeps all your objects in memory first...
Depending on the batch size, NHibernate should create insert/update batches of that size, meaning you will have a lot fewer roundtrips to the database and therefore fewer locks, or at least they shouldn't be held for as long...
In general, if you use normal transactions, your operation should only start taking locks on the database the moment your first insert statement is executed on the server. It might work differently if you use TransactionScope.
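A rough sketch of that setup (the batch size of 100 and the entities variable are placeholders; adonet.batch_size can equally be set in the XML configuration):

using NHibernate.Cfg;

// Hypothetical sketch: enable ADO.NET batching, save everything, flush once.
var configuration = new Configuration().Configure();
configuration.SetProperty(NHibernate.Cfg.Environment.BatchSize, "100");
var sessionFactory = configuration.BuildSessionFactory();

using (var session = sessionFactory.OpenSession())
using (var transaction = session.BeginTransaction())
{
    foreach (var entity in entities)   // entities: the objects prepared up front
    {
        session.Save(entity);
    }

    session.Flush();      // one flush at the end, sent to the server in batches
    transaction.Commit();
}

Note that, as far as I remember, NHibernate silently disables insert batching for entities that use identity-style id generators, so you typically need guid or hilo generators to see the benefit.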
Here is some additional reading on how to improve batch processing.
http://fabiomaulo.blogspot.de/2011/03/nhibernate-32-batching-improvement.html
NHibernate performance insert
http://zvolkov.com/clog/2010/07/16?s=Insert+or+Update+records+in+bulk+with+NHibernate+batching
So I am troubleshooting some performance problems on a legacy application, and I have uncovered a pretty specific problem (there may be others).
Essentially, the application is using an object relational mapper to fetch data, but it is doing so in a very inefficient/incorrect way. In effect, it is performing a series of entity graph fetches to fill a datagrid in the UI, and on databinding the grid (it is ASP.Net Webforms) it is doing additional fetches, which lead to other fetches, etc.
The net effect is that many, many tiny queries are being performed. SQL Profiler shows that a certain page performs over 10,000 queries to fill a single grid. No query takes over 10 ms to complete, and most of them register as 0 ms in Profiler. Each query uses and releases one connection, and the series of queries is single-threaded (per HTTP request).
I am very familiar with the ORM, and know exactly how to fix the problem.
My question is: what is the exact effect of having many, many small queries being executed in an application? In what ways does it/can it stress the different components of the system?
For example, what is the effect on the webserver's CPU and memory? Would it flood the connection pool and cause blocking? What would be the impact on the database server's memory, CPU and I/O?
I am looking for relatively general answers, mainly because I want to start monitoring the areas that are likely to be the most affected (I need to measure => fix => re-measure). Concurrent use of the system at peak would likely be around 100-200 users.
It will depend on the database, but generally there is a parse phase for each query. If the query uses bind variables, the plan will probably be cached; if not, you wear the cost of a parse, and that often means short locks on shared resources, i.e. bad. In Oracle, CPU usage and blocking are much more prevalent at the parse than at the execute; SQL Server less so, but it's worse at the execute. Obviously doing 10K of anything over a network is going to be a terrible solution, especially multiplied by 200 users. The data volume is surely fine, but that frequency will really highlight all the overhead in communications latency and the like. Connection pools are generally sized in the hundreds, not tens of thousands, and now you have tens of thousands of command and connection objects all being created, queued, managed, destroyed, garbage collected, etc.
But I'm sure you already know all this deep down. Ditch the ORM for this part and write a stored procedure to execute the single query to return your result set. Then put it on the grid.
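Something along these lines (the procedure name, parameter, connectionString and grid id are just examples):

using System.Data;
using System.Data.SqlClient;

// Hypothetical sketch: one stored procedure returns the whole result set,
// which is bound to the grid in a single roundtrip.
protected void BindGrid()
{
    var table = new DataTable();

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.GetGridData", connection) { CommandType = CommandType.StoredProcedure })
    using (var adapter = new SqlDataAdapter(command))
    {
        command.Parameters.AddWithValue("@CustomerId", customerId); // example filter
        adapter.Fill(table); // opens and closes the connection itself
    }

    MyGrid.DataSource = table; // MyGrid: the GridView on the page
    MyGrid.DataBind();
}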