Maximum number of retries (6) exceeded while executing database operations with 'CosmosExecutionStrategy' - c#

I am working on an API development project using ASP.NET Core 2.2, GraphQL.NET, Cosmos DB, and Entity Framework Core (Microsoft.EntityFrameworkCore.Cosmos v2.2.4).
While testing an API method which pulls data from Azure Cosmos DB, I sometimes get this error:
Microsoft.EntityFrameworkCore.Storage.RetryLimitExceededException: 'Maximum number of retries (6) exceeded while executing database operations with 'CosmosExecutionStrategy'. See inner exception for the most recent failure.'
I am not sure why this error pops up intermittently.
Can anyone help me here by providing some guidance to fix this issue?

I would like to know more about your context class, since the error says 'Maximum number of retries (6) exceeded'. This can happen if you are effectively redeploying the database on every request. Assuming you have already deployed the database in Cosmos DB, it is recommended to remove the Database.EnsureCreated() call, as it will cause performance issues.
Refer to this documentation for more information: https://learn.microsoft.com/en-us/ef/core/providers/cosmos/?tabs=dotnet-core-cli
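For illustration, a minimal sketch of the alternative (MyDbContext is a placeholder name): if you do need EnsureCreated(), call it once at application startup, e.g. in Startup.Configure, rather than on every request:
using (var scope = app.ApplicationServices.CreateScope())
{
    var db = scope.ServiceProvider.GetRequiredService<MyDbContext>();
    db.Database.EnsureCreated(); // runs once at startup, not per request
}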

First of all, have you checked the inner exception as stated in the error?
Microsoft.EntityFrameworkCore.Storage.RetryLimitExceededException: 'Maximum number of retries (6) exceeded while executing database operations with 'CosmosExecutionStrategy'. See inner exception for the most recent failure.'
It might give a clue as to why it is failing.
Now, this error is caused by the Cosmos retry strategy: if an operation fails, it will be retried up to six times.
You can modify this strategy, but the default can be found here.
The fact that the operation is retried indicates it is the kind of error that might be gone when retried. A good example is a glitch in the network connection (like when the Wi-Fi signal is bad). Another one could be that the requests are exceeding the provisioned Request Unit limits.
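If those transient failures are unavoidable, one option is to run the operation under a strategy with a higher retry limit. A rough sketch (note: CosmosExecutionStrategy lives in an internal EF Core namespace in the 2.2 provider, so treat the exact constructor overloads as version-dependent; context.Orders is a placeholder DbSet):
// Execute the query under a strategy that allows more than 6 retries.
var strategy = new CosmosExecutionStrategy(context, maxRetryCount: 10);
var orders = await strategy.ExecuteAsync(() => context.Orders.ToListAsync());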

Related

Microsoft Graph API - Exchange Online Messages call returns ServiceUnavailable

I am fetching messages from Exchange in Office365 using Microsoft Graph API.
However, for some folders I seem to get intermittent exceptions.
What we are using:
Microsoft.Graph Version 3.9.0 - Microsoft Graph Client Library for .Net
Microsoft.Graph.Core Version 1.21.0 - Microsoft Graph Core Client Library for .Net
This is the call being used:
'GET /v1.0/users/{id}/mailFolders/{id}/messages'
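(In SDK terms that is roughly the following, where userId/folderId are placeholders and graphClient is a Microsoft.Graph 3.x GraphServiceClient:)
var page = await graphClient.Users[userId]
    .MailFolders[folderId]
    .Messages
    .Request()
    .Top(50) // page size
    .GetAsync();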
And this is the error (ServiceUnavailable with UnknownError as inner exception):
Status Code: ServiceUnavailable Microsoft.Graph.ServiceException:
Code: UnknownError Message: Error while processing response. Inner
error:
AdditionalData:
date: 2020-08-04T13:55:33
request-id: ** ClientRequestId: **
Code: UnknownError Message: Error while processing response. Inner error:
AdditionalData:
date: 2020-08-04T13:55:33
request-id: ** ClientRequestId: **
What I've tried:
Throttling:
These are usually the errors we would see with throttling. However, in this case there seems to be no indication of throttling being applied: there isn't any 'back-off' time returned in the result, and other requests to different folders return just fine. Applying our own 'back-off' time (ranging between 5 and 20 minutes) does not seem to make a difference either.
Beta endpoint:
The call posted above shows /v1.0 used. We've also switched to the /beta endpoint, with no difference.
Amount of mails retrieved:
Graph allows us to retrieve up to 999 mails at a time. We've reduced that all the way down to one or two mails at a time, but it still returns the same error.
Delta token:
We've also tried switching over to using the delta token to retrieve the mails. This also returns the same error.
Graph downgrade:
Hoping that there was some difference between the last few versions, we downgraded Graph. There was no difference.
Check local sync issues:
I've noticed in the past (quite a while back) that when making this call for a folder that has potential local sync issues, this same type of error response is returned. In this case, there is no reason to believe that these are local sync issues.
Additional:
When setting up the httpProvider, I've removed the default retry handlers as well. I've seen that with the default retry handler in place, the 'ServiceException' would be caught and retried internally (without adhering to back-offs, not that there are any), resulting in a tooManyRetries or a timeout that hides the actual issue. By removing the default retry handler, we can see the actual 'ServiceException' returned by the server.
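Roughly what that setup looks like (Microsoft.Graph / Microsoft.Graph.Core factory APIs from the versions listed above; verify the exact overloads against your versions, and authProvider stands for our IAuthenticationProvider):
var handlers = GraphClientFactory.CreateDefaultHandlers(authProvider);
var retryHandler = handlers.OfType<RetryHandler>().FirstOrDefault();
if (retryHandler != null) handlers.Remove(retryHandler); // drop the built-in retry behaviour
var httpClient = GraphClientFactory.Create(handlers);
var graphClient = new GraphServiceClient(httpClient);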
When:
Based on our telemetry, this seems to have started happening a lot more frequently since around the 11th-13th of June. Before that we did not experience any issues.
There are days when the requests work, but they are few and far between.
This is quite a big issue for us, so any suggestions would be greatly appreciated. Any specific Microsoft Support channel that I can log this with would also help.
Thanks in advance.

Docusign eSign: CreateEnvelope requests timing out

We've been having issues sending certain Docusign envelopes lately, specifically those with large file sizes.
The errors we've been getting are:
Error calling CreateEnvelope: The operation has timed out
And
The request was aborted: The request was canceled.
No inner exception with any additional information in either case.
These errors only occur on our production server; on my local development machine everything works fine, so I can only assume that this is a connectivity issue, i.e. there simply isn't enough time to send the supplied data over the available connection before something times out. What I would like to know is: what is the something that's timing out? Are these errors coming from my end, or DocuSign's? If the former, is there any way to increase the timeout? I've got my HTTP execution timeout set to 300 seconds:
<httpRuntime maxRequestLength="30000" requestValidationMode="4.0" executionTimeout="300" targetFramework="4.5" />
... but that doesn't seem to affect anything; it always seems to time out at the default of 1 minute 50 seconds.
Is there anything more I can do to prevent these requests from timing out?
Thanks,
Adam
Our issue has been resolved. The timeouts were indeed being caused by something on our end: there is a "Timeout" property which can be set on the EnvelopesApi object before sending; it can also be passed into the constructor. So our fix was as simple as:
EnvelopesApi envelopesApi = new EnvelopesApi();
envelopesApi.Configuration.Timeout = DocusignTimeout;
The crux of our issue was that the Timeout property was not exposed in older versions of eSign. We had upgraded to 2.1.0 (the current version) earlier this week, but something must not have taken, as the metadata still showed our DocuSign.eSign.Client.Configuration class at version 15.4.0.0. Uninstalling and then reinstalling the eSign and RestSharp packages from NuGet gave us the correct version of this class and enabled us to set our own timeout.
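For completeness, the constructor route mentioned above looks roughly like this (hypothetical values; check the ApiClient/Configuration signatures against your eSign SDK version — Timeout is in milliseconds):
var apiClient = new ApiClient(basePath); // basePath: your DocuSign REST base path
var config = new Configuration(apiClient);
config.Timeout = 300000; // 5 minutes, in milliseconds
var envelopesApi = new EnvelopesApi(config);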
Hope this is helpful!

TCP Provider, error: 0 - The specified network name is no longer available

Recently we have been facing the above issue in one of our web service logs. It happens only on and off, and we are puzzled as to what may be causing it. The SQL is just a normal query. At first we thought it might be caused by long-running queries, but from googling we found that .Net 2008 would throw a 'timeout expired' exception if the default 30-second timeout were exceeded, so we suspect it is not the SQL itself. From the event viewer we can see that an NLB convergence event happened during that time. Could the NLB convergence have caused a network glitch and thus triggered the error?

context.Database.ExecuteSqlCommand - Error code 701 - Microsoft Azure

I have the following code in the migration configuration seeding method:
string sqlQuery = // 22 mb file contents
context.Database.ExecuteSqlCommand( sqlQuery );
The ExecuteSqlCommand function is making Azure throw this error:
The service has encountered an error processing your request. Please
try again. Error code 701. A severe error occurred on the current
command. The results, if any, should be discarded.
Why might this be? I really can't find much information on it.
SQL error 701, according to my googling, means the following:
SQL Server has failed to allocate sufficient memory to run the query.
This can be caused by a variety of reasons including operating system
settings, physical memory availability, or memory limits on the
current workload. In most cases, the transaction that failed is not
the cause of this error.
It may be time for an upgrade; alternatively, change what your query is pulling back, or split it out into separate queries.
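A sketch of the splitting suggestion, reusing the sqlQuery variable from the question (this assumes the script uses "GO" batch separators, a common convention; adjust the delimiter to however your script is laid out):
// Break the 22 MB script into batches and run each one separately,
// so no single command has to be processed in one go.
var batches = sqlQuery.Split(new[] { "\r\nGO\r\n", "\nGO\n" }, StringSplitOptions.RemoveEmptyEntries);
foreach (var batch in batches)
{
    context.Database.ExecuteSqlCommand(batch);
}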

IBM WebSphere XMS.Net CWSMQ0082E error

On several occasions I have received the following error from a .NET (C#, 4.0) application, out of the blue, when sending a message through a producer:
CWSMQ0082E: Failed to send to CompCode: 2, Reason: 2009. A problem was encountered whilst sending a message. See the linked exception for more information.
Of course, the LinkedException (why not use the InnerException, IBM???) is null, i.e. no more information is available.
Code I'm using (pretty straightforward):
var m = _session.CreateBytesMessage();
m.WriteBytes(mybytearray);
m.JMSReplyTo = myreplytoqueue;
m.SetIntProperty(XMSC.JMS_IBM_MSGTYPE, MQC.MQMT_DATAGRAM);
m.SetIntProperty(XMSC.JMS_IBM_REPORT_COA, MQC.MQRO_COD);
m.SetIntProperty(XMSC.JMS_IBM_REPORT_COD, MQC.MQRO_COA);
myproducer.Send(m, DeliveryMode.Persistent, mypriority, myttl);
(Offtopic: I hate the SetIntProperty way of setting properties. Which <expletive deleted> came up with that idea? It takes ages to look up all sorts of constants all over the place, along with their allowed values.)
The exception is thrown by the .Send method. I'm using XMS.Net (IA9H / 2.0.0.7). The only Google result that turns up has a different reason code (and even if it were the same, it should be fixed in my version, if I understand correctly). This occurs randomly (though it seems to happen more often when it's been a while since a message was sent/received), and I have no way to reproduce it.
I have ab-so-lute-ly no idea how to troubleshoot this or even where to start looking. Is this caused by the server side? Is it caused by XMS.Net or some underlying IBM WebSphere MQ infrastructure?
Some results that I found that seem similar suggest setting SHARECNV to any value higher than 0, or to "true"/"yes", but the documentation explicitly tells me the default is 10. Also, I have no idea whether this is the cause, so changing it to another value feels like a shotgun approach.
Does anybody have any idea how to go about solving this? I could of course just catch the exception, tear everything (channels, sessions, whatever) down and restart, but that's just plain ugly IMHO.
The 2009 return code means "connection broken". Basically, the underlying TCP socket is gone and the client finds out about it at the time of the API call. It is possible to tune the channels using heartbeat and keepalive settings so that WMQ tries harder to keep the socket alive. However, if the socket is timed out by the underlying infrastructure, nothing WMQ can do will help. Examples we've seen include firewalls and load balancers that are set to detect idle connections and sever them.
Modern versions of the WMQ client will attempt to reconnect transparently. The application just blocks a bit longer when this occurs.
Short of using the automatic reconnect, the only solution is in fact to rebuild the connection. Since it will get a new connection handle, all the object handles must be rebuilt as well.
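A sketch of that teardown-and-rebuild approach, reusing the object names from the question (myfactory and mydestination stand in for your IBM.XMS IConnectionFactory and IDestination; depending on your setup you may need to rebuild more than this):
try
{
    myproducer.Send(m, DeliveryMode.Persistent, mypriority, myttl);
}
catch (XMSException)
{
    // Reason 2009: the old handles are dead, so close and recreate
    // the connection, session and producer, then resend.
    try { myproducer.Close(); _session.Close(); myconnection.Close(); } catch { }
    myconnection = myfactory.CreateConnection();
    _session = myconnection.CreateSession(false, AcknowledgeMode.AutoAcknowledge);
    myproducer = _session.CreateProducer(mydestination);
    myconnection.Start();
    var retry = _session.CreateBytesMessage(); // messages belong to a session, so recreate it
    retry.WriteBytes(mybytearray);
    retry.JMSReplyTo = myreplytoqueue;
    myproducer.Send(retry, DeliveryMode.Persistent, mypriority, myttl);
}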
Many of the tuning functions described here can be set through the client configuration file, available in v7.0 and later clients. In particular, the TCP stanza of that file enables keepalive. (The TCP spec says that if keepalive is provided, it must be disabled by default.) The QMgr has a similar ini file with configuration stanzas, including one for keepalive. The latest WMQ client is available as SupportPac MQC71 if you need it.
In cases where the main exception is sufficient to indicate the error, the inner exception will be null. In your case it's MQ reason code 2009, which means the connection to the queue manager has been broken: the socket through which your application and the queue manager were communicating was closed for some reason, possibly a network blip.
Along with the suggestions T.Rob noted above, you could also run an XMS trace and a queue manager trace to understand the problem further. Please see the Troubleshooting chapter in the XMS InfoCenter.
HTH
