How to troubleshoot SqlException deadlocked on lock | communication buffer resources - c#

There are varying versions of this question on stackoverflow already, but none of them helped me to get to the bottom of my issue. So, here I go again with more specific details of my problem.
We've been randomly getting Transaction (Process ID xx) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction. Let me be clear: this is not row or table level locking. I've tried enough guessed/random things; I need an exact, step-by-step guide on how to troubleshoot a deadlock on communication buffer resources.
If you are interested in the specific details, read on.
Specific Details of Scenario: We have a very simple Dapper ORM based C# .NET Core Web API that takes in requests and performs CRUD operations against a database hosted on this Microsoft SQL Server. A connection manager (registered as a scoped service) opens a new IDbConnection per request scope; this connection is used to execute deletes, inserts, updates and gets. For insert/update/delete the C# line looks like this: await connection.ExecuteAsync("<Create update or delete statement>", entity); For GET requests we simply run await connection.QueryFirstOrDefaultAsync<TEntity>("<select statement>", entity);. There are 5 entity types (all representing simple, non-relational tables), and all CRUD operations are by ID.
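For context, here is a minimal sketch of that pattern; the class, table and column names are illustrative placeholders, not the actual code.
// using System; using System.Data; using System.Data.SqlClient; using Dapper;
public class ConnectionManager : IDisposable   // registered as a scoped service
{
    private readonly IDbConnection _connection;

    public ConnectionManager(string connectionString)
    {
        _connection = new SqlConnection(connectionString);
        _connection.Open();
    }

    public IDbConnection Connection => _connection;

    public void Dispose() => _connection.Dispose();
}

// Inside a request, the same scoped connection handles all CRUD for the entity:
await connection.ExecuteAsync(
    "UPDATE dbo.SomeEntity SET Name = @Name WHERE Id = @Id", entity);
var found = await connection.QueryFirstOrDefaultAsync<SomeEntity>(
    "SELECT Id, Name FROM dbo.SomeEntity WHERE Id = @Id", new { entity.Id });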
What has been tried so far
MAXDOP=1 query hint on SQL statements
Ensuring only 1 entity CRUD at given point in time for the one kind of entity.
Restarting SQL server/application instance
Ensuring ports/RAM/CPU/network bandwidth are not exhausted
Alter DATABASE XXXXXX SET READ_COMMITTED_SNAPSHOT ON/OFF
Keeping transactions as small as possible
Persistent retry policy as a workaround, to handle the random, transient nature of the issue (see the sketch after this list)
Single thread per entity type
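Since a retry policy was one of the workarounds tried, here is a rough sketch of the kind of thing I mean; the helper name, retry count and delay are arbitrary choices for illustration, not the exact policy we used.
// using System; using System.Data.SqlClient; using System.Threading.Tasks;
private static async Task<T> RetryOnDeadlockAsync<T>(Func<Task<T>> operation, int maxAttempts = 3)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await operation();
        }
        catch (SqlException ex) when (ex.Number == 1205 && attempt < maxAttempts)
        {
            // 1205 = chosen as the deadlock victim; back off briefly, then rerun the transaction
            await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
        }
    }
}

// Usage with the Dapper call shown above:
var affected = await RetryOnDeadlockAsync(() => connection.ExecuteAsync("<Create update or delete statement>", entity));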
Server Specifications:
We have Microsoft SQL Server 2016 on Azure, hosted in a virtual machine with 64 cores and 400 GB RAM. The usual workload on this server is 10% CPU and 30% RAM; occasionally it goes up to 80% CPU and 350 GB RAM. Every time this issue occurred, CPU usage was under 20% (mostly around 10%, 20% on only one occasion) and RAM was under 30%.
Deadlock XML Event as per @Dan Guzman's request
The file size was too large for this post, so I created this Google Drive file. Please click the following link, then click download in the top right corner. It is a zip file.
https://drive.google.com/file/d/1oZ4dT8Yrd2uW2oBqBy9XK_laq7ftGzFJ/view?usp=sharing

@DanGuzman helped, so I upvoted and accepted his answer. But I'd like to summarize what went on here, what I learned, and a step-by-step approach on how to troubleshoot a deadlock on communication buffer resources (or any deadlock, for that matter).
Step - 1
Pull the deadlock report. I used the following query, but you could also use the query @DanGuzman suggested (in the comments on this question).
SELECT
    xed.value('@timestamp', 'datetime2(3)') AS CreationDate,
    xed.query('.') AS XEvent
FROM
(
    SELECT CAST([target_data] AS XML) AS TargetData
    FROM sys.dm_xe_session_targets AS st
    INNER JOIN sys.dm_xe_sessions AS s
        ON s.address = st.event_session_address
    WHERE s.name = N'system_health'
      AND st.target_name = N'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData (xed)
ORDER BY CreationDate DESC
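As a convenience, the same report can also be pulled from C# with Dapper and written out to files for inspection; this is only a rough sketch (connectionString is a placeholder, and the XEvent XML is converted to nvarchar(max) so Dapper maps it to a string).
// using System.Data.SqlClient; using System.IO; using Dapper;
const string deadlockQuery = @"
SELECT CONVERT(nvarchar(max), xed.query('.')) AS DeadlockXml
FROM
(
    SELECT CAST([target_data] AS XML) AS TargetData
    FROM sys.dm_xe_session_targets AS st
    INNER JOIN sys.dm_xe_sessions AS s
        ON s.address = st.event_session_address
    WHERE s.name = N'system_health'
      AND st.target_name = N'ring_buffer'
) AS Data
CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name=""xml_deadlock_report""]') AS XEventData (xed);";

using (var connection = new SqlConnection(connectionString))
{
    var reports = await connection.QueryAsync<string>(deadlockQuery);
    var i = 0;
    foreach (var xml in reports)
    {
        File.WriteAllText($"deadlock_{i++}.xml", xml);   // inspect in SSMS or any XML viewer
    }
}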
Step - 2
Locate the deadlock event corresponding to your SQL exception's timing/data, then read the report in conjunction with the Detecting and Ending Deadlocks guide to understand the root cause of your deadlock. In my case I was getting a deadlock on communication buffer resources, so as per the Memory section of that guide, memory must have been causing the problem. As Dan pointed out, the following query appeared in my deadlock report and was using far too much buffer (as a result of being an inefficient query). So what is a deadlock on communication buffer? If a query requires a large amount of buffer to finish its execution, then two such queries can start executing at the same time and begin claiming the buffer they need; at some point the available buffer may not be enough, and each has to wait for buffer to be freed by the completion of other queries. Both queries then wait for each other to complete in the hope of getting more buffer freed up, and this can lead to a deadlock on the buffer (as per the Memory section of the guide).
<inputbuf>
@SomeStatus1 nvarchar(4000),@ProductName nvarchar(4000),@ProductNameSide nvarchar(4000),@BayNo nvarchar(4000),@CreatedDateTime datetime,@EffectiveDate datetime,@ForSaleFrom datetime,@ForSaleTo datetime,@SetupInfoNode nvarchar(4000),@LocationNumber nvarchar(4000),@AverageProductPrice decimal(3,2),@NetAverageCost decimal(3,1),@FocustProductType nvarchar(4000),@IsProduceCode nvarchar(4000),@ActivationIndicator nvarchar(4000),@ResourceType nvarchar(4000),@ProductIdentifierNumber nvarchar(4000),@SellingStatus nvarchar(4000),@SectionId nvarchar(4000),@SectionName nvarchar(4000),@SellPriceGroup nvarchar(4000),@ShelfCapacity decimal(1,0),@SellingPriceTaxExclu decimal(2,0),@SellingPriceTaxInclu decimal(2,0),@UnitToSell nvarchar(4000),@VendorNumber nvarchar(4000),@PastDate datetime,@PastPrice decimal(29,0))
UPDATE dbo.ProductPricingTable
SET SellingPriceTaxExclu = @SellingPriceTaxExclu, SellingPriceTaxInclu = @SellingPriceTaxInclu,
SellPriceGroup = @SellPriceGroup,
ActivationIndicator = @ActivationIndicator,
IsProduceCode = @IsProduceCode,
EffectiveDate = @EffectiveDate,
NetCos
</inputbuf>
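As an aside, when memory is the suspected deadlock resource it can also help to see which sessions are waiting on memory grants while the workload is running. This is general SQL Server diagnostics rather than something taken from the deadlock report; a hedged sketch using the same Dapper connection (connectionString is a placeholder):
// using System; using System.Data.SqlClient; using Dapper;
const string memoryGrantSql = @"
SELECT mg.session_id,
       mg.requested_memory_kb,
       mg.granted_memory_kb,
       mg.wait_time_ms,
       t.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
ORDER BY mg.requested_memory_kb DESC;";

using (var connection = new SqlConnection(connectionString))
{
    foreach (var row in await connection.QueryAsync(memoryGrantSql))
    {
        // each row shows a session waiting for (or holding) a query memory grant
        Console.WriteLine($"{row.session_id}: requested {row.requested_memory_kb} KB, granted {row.granted_memory_kb} KB");
    }
}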
Step 3 (The Fix)
Wait!!! But I used Dapper, so how could it turn my query into such a deadly one? Well, Dapper is great for most situations with its out-of-the-box defaults; clearly, in my situation, the default nvarchar(4000) parameter type killed it (please read Dan's answer to understand how such a query could cause the problem). As Dan suggested, I had automatic parameter building from the input entity, like this: await connection.ExecuteAsync("<Create update or delete statement>", entity);, where entity is an instance of a C# model class. I changed it to custom parameters as shown below (for the sake of simplicity I only added one parameter, but you can add all the required ones).
var parameters = new DynamicParameters();
parameters.Add("Reference", entity.Reference, DbType.AnsiString, size: 18 );
await connection.ExecuteAsync("<Create update or delete statement>", parameters );
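For entities with more columns, the same pattern simply repeats; a sketch along these lines (the types and sizes below are guesses for illustration, so match them to your actual column definitions):
var parameters = new DynamicParameters();
parameters.Add("Reference", entity.Reference, DbType.AnsiString, size: 18);
parameters.Add("SellPriceGroup", entity.SellPriceGroup, DbType.AnsiString, size: 10);   // e.g. varchar(10)
parameters.Add("EffectiveDate", entity.EffectiveDate, DbType.DateTime);
parameters.Add("ShelfCapacity", entity.ShelfCapacity, DbType.Decimal);
await connection.ExecuteAsync("<Create update or delete statement>", parameters);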
I can see in Profiler that requests now use parameter types that exactly match the column types. That's it, this fix made the problem go away. Thanks Dan.
Conclusion
I can conclude that, in my case, the deadlock on communication buffer resources occurred because of a bad query that took too much buffer to execute. That happened because I blindly used Dapper's default parameter builder; using Dapper's custom parameters solved the problem.

Deadlocks are often a symptom that query and index tuning is needed. Below is an example query from the deadlock trace that suggests the root cause of the deadlocks:
<inputbuf>
@SomeStatus1 nvarchar(4000),@ProductName nvarchar(4000),@ProductNameSide nvarchar(4000),@BayNo nvarchar(4000),@CreatedDateTime datetime,@EffectiveDate datetime,@ForSaleFrom datetime,@ForSaleTo datetime,@SetupInfoNode nvarchar(4000),@LocationNumber nvarchar(4000),@AverageProductPrice decimal(3,2),@NetAverageCost decimal(3,1),@FocustProductType nvarchar(4000),@IsProduceCode nvarchar(4000),@ActivationIndicator nvarchar(4000),@ResourceType nvarchar(4000),@ProductIdentifierNumber nvarchar(4000),@SellingStatus nvarchar(4000),@SectionId nvarchar(4000),@SectionName nvarchar(4000),@SellPriceGroup nvarchar(4000),@ShelfCapacity decimal(1,0),@SellingPriceTaxExclu decimal(2,0),@SellingPriceTaxInclu decimal(2,0),@UnitToSell nvarchar(4000),@VendorNumber nvarchar(4000),@PastDate datetime,@PastPrice decimal(29,0))
UPDATE dbo.ProductPricingTable
SET SellingPriceTaxExclu = @SellingPriceTaxExclu, SellingPriceTaxInclu = @SellingPriceTaxInclu,
SellPriceGroup = @SellPriceGroup,
ActivationIndicator = @ActivationIndicator,
IsProduceCode = @IsProduceCode,
EffectiveDate = @EffectiveDate,
NetCos
</inputbuf>
Although the SQL statement text is truncated, it does show that all parameter declarations are nvarchar(4000) (a common problem with ORMs). This may prevent indexes from being used efficiently when column types referenced in join/where clauses are different, resulting in full scans that lead to deadlocks during concurrent queries.
Change the parameter types to match that of the referenced columns and check the execution plan for efficiency.
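To make that suggestion concrete, here is one way it can look with Dapper; the column names are taken loosely from the truncated statement above, and the types and sizes are assumptions to be replaced with the real column definitions.
// Passing the entity directly: Dapper maps every C# string to nvarchar(4000) by default,
// which can force an implicit conversion against varchar columns and turn index seeks into scans.
await connection.ExecuteAsync(
    "UPDATE dbo.ProductPricingTable SET SellPriceGroup = @SellPriceGroup WHERE ProductIdentifierNumber = @ProductIdentifierNumber",
    entity);

// Explicitly typed parameters that match the column definitions (sizes here are assumptions):
var parameters = new DynamicParameters();
parameters.Add("SellPriceGroup", entity.SellPriceGroup, DbType.AnsiString, size: 10);
parameters.Add("ProductIdentifierNumber", entity.ProductIdentifierNumber, DbType.AnsiString, size: 20);
await connection.ExecuteAsync(
    "UPDATE dbo.ProductPricingTable SET SellPriceGroup = @SellPriceGroup WHERE ProductIdentifierNumber = @ProductIdentifierNumber",
    parameters);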

Related

Is there a batch size limit to CreateBatchWrite()

When attempting to upload ~30,000 users into a DynamoDB table using the Amazon.DynamoDBv2 wrapper for .NET, not all records made it; however, there was no exception either.
var userBatch = _context.CreateBatchWrite<Authentication_User>();
userBatch.AddPutItems(users);
userBatch.ExecuteAsync();
Approximately 2,500-ish records were written to the table. Has anyone found a limit to the number or size of batch inserts?
From the documentation (emphasis mine):
When using the object persistence model, you can specify any number of operations in a batch.
I exited the process right after the return from ExecuteAsync(). That was the problem. When I let it run, I can see the data slowly build up. In the bulk insert I used a Task.Wait() because there was nothing to do until the records had been loaded. Iridium's answer above also assisted me with a second issue that revolved around ProvisionedThroughput exceptions.
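In other words, the fix was simply to make sure the batch write completes before the process exits; a minimal sketch, assuming the same _context and users from the question:
var userBatch = _context.CreateBatchWrite<Authentication_User>();
userBatch.AddPutItems(users);
await userBatch.ExecuteAsync();   // or userBatch.ExecuteAsync().Wait() outside an async method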

Figuring out what process is running on my SQL that is being called by my c# code

I am working on a .NET nopCommerce application where I have around 5 million+ rows in the database, and I need to query all of that data for extraction. But the data from SQL is never returned to my code, while my GC heap keeps growing (it goes beyond 1 GB); yet when I run the same stored procedure in SQL, after providing the respective parameters, it takes less than 2 minutes. I need to somehow figure out why the call from my code is taking so much time.
nopCommerce uses Entity Framework libraries to call the database's stored procedures, but that is not async, so I am trying to call the stored procedure in an async way using this function:
await dbcontext.Database.SqlQuery<TEntity>(commandText, parameters).ToListAsync();
As per my research from another SO post, ToListAsync() turns this call into an async one, so the task is sent back to the task library.
Now I need to figure out 3 things that I'm currently unable to do:
1) I need to figure out whether that thread is running in the background. I assume it is, as the GC heap keeps growing, but I'm just not sure. I tried to check this with the Diagnostics tool in Visual Studio.
2) I need to make sure SQL is giving enough processing time to the database calls from my code. I tried the following queries, but they don't show me any value for the process running the particular data export initiated by my code.
I tried this query:
select top 50
sum(qs.total_worker_time) as total_cpu_time,
sum(qs.execution_count) as total_execution_count,
count(*) as number_of_statements,
qs.plan_handle
from
sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc
also tried this one:
SELECT
r.session_id
,st.TEXT AS batch_text
,SUBSTRING(st.TEXT, statement_start_offset / 2 + 1, (
(
CASE
WHEN r.statement_end_offset = - 1
THEN (LEN(CONVERT(NVARCHAR(max), st.TEXT)) * 2)
ELSE r.statement_end_offset
END
) - r.statement_start_offset
) / 2 + 1) AS statement_text
,qp.query_plan AS 'XML Plan'
,r.*
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
ORDER BY cpu_time DESC
Also, when I use sp_who or sp_who2, the statuses of the processes for my database stay 'runnable', along with the CPU and DiskIO values they show.
3) I need to know: what if my DB call has completed successfully, but mapping the results to the relevant list is taking a long time?
I would very much appreciate someone pointing me in the right direction; maybe help me with a query that can show the right results, or with viewing the running background threads and their status, or with learning more about observing the GC, threads, and CPU utilization in a better way.
Any help will be highly appreciated. Thanks
A couple of diagnostic things to try:
Try adding a TOP 100 clause to the select statement to see whether the problem is in the communication layer or in the data mapper.
How much data is being returned by the stored procedure? If it returns more than a million rows, you may not be querying the data you mean to.
Have you tried running it both synchronously and asynchronously?
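One way to act on the first suggestion is to time the raw streaming of the result set separately from the entity mapping; if the reader loop below is fast but ToListAsync() is slow, the cost is in materialization rather than in SQL Server. This is only a rough sketch (the connection string, procedure name, and parameters are placeholders):
// using System; using System.Data; using System.Data.SqlClient; using System.Diagnostics;
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.MyExportProcedure", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.CommandTimeout = 0;               // no client-side timeout for this test
    // command.Parameters.AddWithValue(...) as required by the procedure

    connection.Open();
    var stopwatch = Stopwatch.StartNew();
    var rows = 0;
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            rows++;                           // consume rows without mapping them to entities
        }
    }
    Console.WriteLine($"Streamed {rows} rows in {stopwatch.Elapsed}");
}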

Need to run migration console app: Too many records killing process - no idea how to solve

I've written a bit of code to manipulate data for a comprehensive transaction. However, I am suffering endless problems and dead ends. If I run my code on a small data set, it works as expected, but now that I've had the production environment restored to the testing DB to get a full scope of testing, I can tell I've basically wasted my time.
private static void AddProvisionsForEachSupplement2(ISupplementCoordinator supplmentCord)
{
var time = DateTime.Now;
using (var scope = new UnitOfWorkScope())
{
var supplements = supplmentCord.GetContracts(x => x.EffectiveDate <= new DateTime(2014, 2, 27)).AsEnumerable();
foreach (var supplement in supplements){
var specialProvisionTable = supplement.TrackedTables.FirstOrDefault(x => x.Name == "SpecialProvisions");
SetDefaultSpecialProvisions(specialProvisionTable, supplement);
Console.Out.WriteLine(supplement.Id.ToString() + ": " + (DateTime.Now - time).TotalSeconds);
}
}
}
You can see I decided to time it: it takes roughly 300+ seconds to complete the loop, and then the 'commit' that follows is obscenely long, probably longer than the loop itself.
I get this error:
The transaction associated with the current connection has completed but has not been disposed. The transaction must be disposed before the connection can be used to execute SQL statements.
I added [Transaction(Timeout = 6000)] just to get that far; before that, I was getting a transaction timeout.
There is so much more info that we need, so giving you a definitive answer is going to be tough. However, dealing with large datasets in one massive transaction is always going to hurt performance.
The first thing I would do, to be honest, is hook up a SQL profiler (like NHProf, SQL Profiler, or even Log4Net) to see how many queries you are issuing. Then, as you test, you can see whether you can reduce the number of queries, which will in turn reduce the time.
After working that out, you have a few options:
Use a stateless session to grab data, but this removes change tracking so you will need to persist it later manually
Reduce the single big transaction into smaller transactions (this is what I would start with; see the sketch after this answer)
See if eager loading a whole lot of data in one hit might be more efficient
See if using batching is going to help you
Don't give up; fight and understand why this is giving you issues. That is the key to becoming a master of NHibernate (rather than a slave to it).
Good luck :)
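For the second option (smaller transactions), here is a very rough sketch of the shape I mean, based on the code in the question. It assumes UnitOfWorkScope commits on dispose as in your snippet, that supplement.Id can be used in a Contains() filter, and that a batch size of 100 is tuned to taste; adjust freely to your actual API.
private static void AddProvisionsInBatches(ISupplementCoordinator supplmentCord, int batchSize = 100)
{
    var cutoff = new DateTime(2014, 2, 27);

    List<int> ids;                                            // adjust the type to match supplement.Id
    using (var scope = new UnitOfWorkScope())
    {
        ids = supplmentCord.GetContracts(x => x.EffectiveDate <= cutoff)
                           .Select(x => x.Id)
                           .ToList();
    }

    for (var i = 0; i < ids.Count; i += batchSize)
    {
        var batchIds = ids.Skip(i).Take(batchSize).ToList();
        using (var scope = new UnitOfWorkScope())             // one small transaction per batch
        {
            var supplements = supplmentCord.GetContracts(x => batchIds.Contains(x.Id));
            foreach (var supplement in supplements)
            {
                var specialProvisionTable = supplement.TrackedTables
                    .FirstOrDefault(x => x.Name == "SpecialProvisions");
                SetDefaultSpecialProvisions(specialProvisionTable, supplement);
            }
        }
    }
}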

"cursor like" reading inside a CLR procedure/function

I have to implement an algorithm on data which is (for good reasons) stored inside SQL server. The algorithm does not fit SQL very well, so I would like to implement it as a CLR function or procedure. Here's what I want to do:
Execute several queries (usually 20-50, but up to 100-200) which all have the form select a,b,... from some_table order by xyz. There's an index which fits that query, so the result should be available more or less without any calculation.
Consume the results step by step. The exact stepping depends on the results, so it's not exactly predictable.
Aggregate some result by stepping over the results. I will only consume the first parts of the results, but cannot predict how much I will need. The stop criteria depends on some threshold inside the algorithm.
My idea was to open several SqlDataReader, but I have two problems with that solution:
You can have only one SqlDataReader per connection, and inside a CLR method I have only one connection, as far as I understand.
I don't know how to tell SqlDataReader to read data in chunks. I could not find documentation on how SqlDataReader is supposed to behave. As far as I understand, it prepares the whole result set and loads the whole result into memory, even if I only consume a small part of it.
Any hint how to solve that as a CLR method? Or is there a more low level interface to SQL server which is more suitable for my problem?
Update: I should have made two points more explicit:
I'm talking about big data sets, so a query might return a million records, but my algorithm would consume only the first 100-200 of them. As I said before, I don't know the exact number beforehand.
I'm aware that SQL might not be the best choice for that kind of algorithm. But due to other constraints it has to be a SQL server. So I'm looking for the best possible solution.
SqlDataReader does not read the whole dataset; you are confusing it with the DataSet class. It reads row by row as the .Read() method is called. If a client does not consume the result set, the server will suspend the query execution because it has no room to write the output (the selected rows) into. Execution resumes as the client consumes more rows (i.e. as SqlDataReader.Read is called). There is even a special command behavior flag, SequentialAccess, that instructs ADO.NET not to pre-load the entire row in memory, which is useful for accessing large BLOB columns in a streaming fashion (see Download and Upload images from SQL Server via ASP.Net MVC for a practical example).
You can have multiple active result sets (multiple SqlDataReaders) on a single connection when MARS is active. However, MARS is incompatible with SQLCLR context connections.
So you can create a CLR streaming TVF to do some of what you need in CLR, but only if you have a single SQL query source. Multiple queries would require you to abandon the context connection and instead use a fully fledged connection, i.e. connect back to the same instance in a loopback; this allows MARS and thus consuming multiple result sets. But loopback has its own issues, as it breaks the transaction boundaries you have from the context connection. Specifically, with a loopback connection your TVF won't be able to read the changes made by the same transaction that called the TVF, because it is a different transaction on a different connection.
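For completeness, here is a rough sketch of what the loopback-with-MARS approach could look like; the server and database names and the query texts are placeholders, and the transaction-boundary caveat above still applies. Note that a non-context connection from SQLCLR typically also requires the assembly to be deployed with at least EXTERNAL_ACCESS permission.
// using System.Data.SqlClient;
var builder = new SqlConnectionStringBuilder
{
    DataSource = ".",                          // loopback to the same instance (placeholder)
    InitialCatalog = "MyDatabase",             // placeholder
    IntegratedSecurity = true,
    MultipleActiveResultSets = true            // MARS: several open readers on one connection
};

using (var connection = new SqlConnection(builder.ConnectionString))
{
    connection.Open();
    using (var reader1 = new SqlCommand("SELECT a, b FROM table1 ORDER BY xyz", connection).ExecuteReader())
    using (var reader2 = new SqlCommand("SELECT a, b FROM table2 ORDER BY xyz", connection).ExecuteReader())
    {
        // step through both result sets as the algorithm dictates,
        // calling Read() only as far as the stop threshold requires
        while (reader1.Read() && reader2.Read())
        {
            // consume reader1[...] / reader2[...] here
        }
    }
}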
SQL is designed to work against huge data sets, and is extremely powerful. With set based logic it's often unnecessary to iterate over the data to perform operations, and there are a number of built-in ways to do this within SQL itself.
1) write set based logic to update the data without cursors
2) use deterministic User Defined Functions with set based logic (you can do this with the SqlFunction attribute in CLR code). A non-deterministic function has the effect of turning the query into a cursor internally; it means the output value is not always the same for the same input.
[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static int algorithm(int value1, int value2)
{
int value3 = ... ;
return value3;
}
3) use cursors as a last resort. This is a powerful way to execute logic per row on the database, but it has a performance impact. It appears from this article that CLR can outperform SQL cursors (thanks Martin).
I saw your comment that the complexity of using set based logic was too much. Can you provide an example? There are many SQL ways to solve complex problems - CTE, Views, partitioning etc.
Of course you may well be right in your approach, and I don't know what you are trying to do, but my gut says leverage the tools of SQL. Spawning multiple readers isn't the right way to approach the database implementation. It may well be that you need multiple threads calling into a SP to run concurrent processing, but don't do this inside the CLR.
To answer your question: with CLR implementations (and IDataReader) you don't really need to page results in chunks, because you are not loading data into memory or transporting it over the network. IDataReader gives you access to the data stream row by row. By the sounds of it, your algorithm determines the number of records that need updating, so when that point is reached simply stop calling Read() and end there.
SqlMetaData[] columns = new SqlMetaData[3];
columns[0] = new SqlMetaData("Value1", SqlDbType.Int);
columns[1] = new SqlMetaData("Value2", SqlDbType.Int);
columns[2] = new SqlMetaData("Value3", SqlDbType.Int);
SqlDataRecord record = new SqlDataRecord(columns);
SqlContext.Pipe.SendResultsStart(record);
// comm is the SqlCommand built over the connection
SqlDataReader reader = comm.ExecuteReader();
bool flag = true;
while (reader.Read() && flag)
{
    int value1 = Convert.ToInt32(reader[0]);
    int value2 = Convert.ToInt32(reader[1]);
    // some algorithm
    int newValue = ...;
    // populate the output record and stream it back to the caller
    record.SetInt32(0, value1);
    record.SetInt32(1, value2);
    record.SetInt32(2, newValue);
    SqlContext.Pipe.SendResultsRow(record);
    // keep going?
    flag = newValue < 100;
}
SqlContext.Pipe.SendResultsEnd();
Cursors are a SQL-only construct. If you wanted to read chunks of data at a time, some sort of paging would be required so that only a certain number of records is returned at once. If using LINQ:
.Skip(Skip)
.Take(PageSize)
Skip and Take can be used to limit the results returned.
You can simply iterate over the DataReader by doing something like this:
using (IDataReader reader = Command.ExecuteReader())
{
while (reader.Read())
{
//Do something with this record
}
}
This iterates over the results one at a time, similar to a cursor in SQL Server.
For multiple recordsets at once, try MARS
(if SQL Server)
http://msdn.microsoft.com/en-us/library/ms131686.aspx

why does entity framework+mysql provider enumeration returns partial results with no exceptions

I'm trying to make sense of a situation I have using entity framework on .net 3.5 sp1 + MySQL 6.1.2.0 as the provider. It involves the following code:
Response.Write("Products: " + plist.Count() + "<br />");
var total = 0;
foreach (var p in plist)
{
//... some actions
total++;
//... other actions
}
Response.Write("Total Products Checked: " + total + "<br />");
Basically, the total number of products varies on each run, and it doesn't match the full total in plist. It varies widely, from roughly one fifth to half.
There isn't any control flow code inside the foreach, i.e. no break, continue, try/catch, or conditions around total++; nothing that could affect the count. As confirmation, there are other totals captured inside the loop related to the actions, and those match the lower and higher total runs.
I can't find any reason for the above, other than something in Entity Framework or the MySQL provider that causes it to end the foreach early while retrieving an item.
The body of the foreach can vary quite a bit in execution time, as the actions involve file and network access. My best guess at this time is that when the .NET code takes beyond a certain threshold, some type of timeout occurs in the underlying framework/provider and, instead of causing an exception, it silently reports that there are no more items to enumerate.
Can anyone give some light in the above scenario and/or confirm if the entity framework/mysql provider has the above behavior?
Update 1: I can't reproduce the behavior by using Thread.Sleep in a simple foreach in a test project; I'm not sure where else to look for this weird behavior :(.
Update 2: in the example above, .Count() always returns the same, correct number of items. Using ToList or ToArray as suggested works around the issue as expected (there are no flow control statements in the foreach body), and both counts match and don't vary between runs.
What I'm interested in is what causes this behavior in Entity Framework + MySQL. I would really prefer not to change the code in all the projects that use Entity Framework + MySQL to call .ToArray before enumerating the results, because I don't know when it will swallow some results; or, if I do change it, I'd at least like to know what happened and why.
If the problem is related to the provider or whatever, then you can solve/identify that by realising the enumerable before you iterate over it:
var realisedList = plist.ToArray();
foreach(var p in realisedList)
{
//as per your example
}
If, after doing this, the problem still persists then
a) One of the actions in the enumerator is causing an exception that is getting swallowed somewhere
b) The underlying data really is different every time.
UPDATE: (as per your comment)
[deleted - multiple enumerations stuff as per your comment]
At the end of the day - I'd be putting the ToArray() call in to have the problem fixed in this case (if the Count() method is required to get a total, then just change it to .Length on the array that's constructed).
Perhaps MySql is killing the connection while you're enumerating, and doesn't throw an error to EF when the next MoveNext() is called. EF then just dutifully responds by saying that the enumerable is simply finished. If so, until such a bug in the provider is fixed, the ToArray() is the way forward.
I actually think you hit on the answer in your question, but it may be the data that is causing the problem, not a timeout. Here is the theory:
One (or several) row(s) in the result set has some data that causes an exception/problem; when the enumeration hits that row, the system thinks it has reached the last row.
To test this you could try:
Ordering the data and see if the number returned in the for each statement is the same each time.
Select only the id column and see if the problem goes away
Remove all rows from the table, add them back a few at a time to see if a specific row is causing the problem
If it is a timeout problem, have you tried changing the timeout in the connection string?
I believe it has to do with the way EF handles lazy loading. You might have to use either Load() or Include(), and also check the IsLoaded property within your processing loop. Check out these two links for more information:
http://www.singingeels.com/Articles/Entity_Framework_and_Lazy_Loading.aspx
http://blogs.msdn.com/jkowalski/archive/2008/05/12/transparent-lazy-loading-for-entity-framework-part-1.aspx
I apologize I don't know more about EF to be more specific. Hopefully the links will provide enough info to get you started and others can chime in with any questions you might have.
The issue, cause, and workaround are described exactly in this MySQL bug.
As suspected, it is a timeout-related error in the provider, but it's not the regular timeout, i.e. net_write_timeout. That's why the simple reproduction in a test project didn't work: the timeout relates to all the cycles of the foreach, not just a particularly long body between the reads of two rows.
As of now, the issue is present in the latest version of the MySQL provider and, under normal conditions, would only affect scenarios where rows are read over a connection kept open for a long time (which may or may not involve a slow query). This is great, because it means it doesn't affect all of the previous projects where I have used MySQL, and applying the workaround to the sources also means it doesn't fail silently.
P.S. A couple of what seem to be related MySQL bugs: 1, 2
