Azure Blob Storage .NET client request timeout - c#

I'm trying to understand how the Azure Storage .NET client handles network errors. In short, my issue is this:
If I pull my network cable while I'm downloading a blob from Blob Storage, my application hangs for at least 30 minutes (that's how long my patience lasted; it probably hangs longer).
For example, this happens if I use the following code (I have not configured any settings on the blob client itself).
...
var blockBlob = container.GetBlockBlobReference("myblob.data");
var blobRequestOptions = new BlobRequestOptions()
{
    RetryPolicy = new NoRetry(),
};

using (var stream = new MemoryStream())
{
    blockBlob.DownloadToStream(stream, null, blobRequestOptions);
}
I know that I can configure the MaximumExecutionTime property in BlobRequestOptions, but it seems a bit strange to me that the default behavior is to hang indefinitely if there's a drop in network connectivity. This makes me suspect that I'm missing something basic about how the client is supposed to be used. (The default value for MaximumExecutionTime appears to be infinite.)
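For clarity, this is roughly how I'd expect to cap the call via BlobRequestOptions (the 30-second value is just an example):

var blobRequestOptions = new BlobRequestOptions()
{
    RetryPolicy = new NoRetry(),
    // Client-side cap on the whole operation, including any retries.
    MaximumExecutionTime = TimeSpan.FromSeconds(30),
};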
I also know I can pass in a ServerTimeout, but my understanding is that this is used internally by the Azure Storage service and wouldn't be applicable if there's a network drop.
What I think I'm looking for specifically is a per-request timeout for the HTTP calls made to Blob Storage, something like the Timeout property on an HttpWebRequest.
(I've reproduced my issue in the Azure Storage Client version 9.3.2)

From my understanding of the SDK, the timeout is handled by default on the server side. I did not find anything regarding this on MSDN, but the Azure Java SDK (which uses the same HTTP endpoints) says:
The default maximum execution is set in the client and is by default null, indicating no maximum time.
You can check it here: https://azure.github.io/azure-storage-java/index.html?com/microsoft/azure/storage/RequestOptions.html
Look for the setMaximumExecutionTimeInMs method.
Since the timeouts seem to be handled by the server, and the client has no default timeout value, it makes sense that your request never ends when you unplug the router: you won't be able to catch the server-side timeout.

I found that the Storage SDK team had indeed acknowledged and addressed this bug in v8.1.3, as seen in the changelog:
https://github.com/Azure/azure-storage-net/blob/dfc88329b56ef022e38f2d39d709ddc2b41fe6a0/Common/changelog.txt
Changes in 8.1.3:
- Blobs (Desktop) : Fixed a bug where the MaximumExecutionTime was not honored, leading to infinite wait, if due to a failure, e.g., a network failure after receiving the response headers, server stopped sending partial response.
commit: https://github.com/Azure/azure-storage-net/pull/459/commits/ad8fd6ad3cdfad77cfe23afe16f1f96c04ad90ee
However, you claim you can reproduce this in 9.3.2. I, too, am seeing this issue with 11.1.1. I'm thinking the bug was not fully addressed.

Related

How to fix inconsistent and slow Google Cloud Storage response times?

I'm using Google Cloud Storage to store and retrieve some files, and my problem is that the response times I'm getting are inconsistent, and sometimes very slow.
My application is an ASP.NET Core app running in the Google Container Engine. The Container Engine cluster is in europe-west1-c. The Cloud Storage bucket is Multi-Regional, in the location EU, and it's a secure bucket (not publicly accessible). I'm using the latest version of the official Google.Cloud.Storage.V1 SDK package to access the Cloud Storage. (I tried both 1.0.0 and the new 2.0.0-beta01.) I'm using a singleton instance of the StorageClient object, which should do connection pooling under the hood.
I'm measuring and logging the time it takes to download a file from the Cloud Storage, this is the measurement I do.
var sw = Stopwatch.StartNew();
await client.DownloadObjectAsync(googleCloudOptions.StorageBucketName, filepath, ms);
sw.Stop();
So I'm directly measuring the SDK call without any of my own application logic.
The numbers I'm getting for this measurement look like this in an average period.
44ms
56ms
501ms
274ms
90ms
237ms
145ms
979ms
446ms
148ms
You can see that the variance is already pretty large to begin with (and the response time is often really sluggish).
But occasionally I even get response times like this (the slowest I've seen was over 10 seconds).
172ms
4,348ms
72ms
51ms
179ms
2,508ms
2,592ms
100ms
Which is really bad considering that the file I'm downloading is ~2 KB in size, my application is doing less than one request per second, and I'm running my application inside the Google Cloud. I don't think the bucket being cold can be the problem, since I'm mainly downloading the same handful of files and I'm doing at least a couple of requests every minute.
Does anyone know what can be the reason for this slowness, or how I could investigate what's going wrong?
Update: Following #jterrace's suggestion, I've run gsutil perfdiag on the production environment, and uploaded both the terminal output and the generated json report here.
I also collected some more measurements, here you can see the statistics for the last 7 days.
You can see that slow requests don't happen very often, but response times over half a second are not rare, and we even have a handful of requests over 5 seconds every day.
What I'd like to figure out is whether we're doing something wrong, or this is expected with Cloud Storage and we have to be prepared to be able to handle these slow responses on our side.
We have the same issue with GCS. The only answer we got (from GCS support) is to use exponential backoff: the first request should use a 200ms timeout, the next try 400ms, and so on.
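A minimal sketch of that pattern, reusing the DownloadObjectAsync call from the question (the attempt cap and the stream reset are just illustrative):

var timeout = TimeSpan.FromMilliseconds(200);
const int maxAttempts = 5;
for (int attempt = 1; attempt <= maxAttempts; attempt++)
{
    using (var cts = new CancellationTokenSource(timeout))
    {
        try
        {
            await client.DownloadObjectAsync(googleCloudOptions.StorageBucketName, filepath, ms, cancellationToken: cts.Token);
            break; // success
        }
        catch (OperationCanceledException) when (attempt < maxAttempts)
        {
            ms.SetLength(0); // discard any partial download before retrying
            timeout = TimeSpan.FromMilliseconds(timeout.TotalMilliseconds * 2); // 200ms, 400ms, 800ms, ...
        }
    }
}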
A common problem I've seen in GCE is that, because the gcloud clients have a heavy DNS dependency, bursts of traffic end up throttled by DNS queries rather than by the actual clients (storage or otherwise). I highly recommend adding etcd or some other DNS cache to your container. Any real amount of traffic in GCE will choke otherwise.

How to disable Nagle's algorithm in ServiceStack?

We're using ServiceStack 3.9.71.0 and we're currently experiencing unexplained latency issues with clients over a WAN connection.
A reply with a very small payload (<100 bytes) is received after 200ms+.
The round-trip-time (RTT) on the link is about 40ms due to the geographical distance. This has been verified by pinging the other host and using a simple echo service to test the latency of a TCP connection.
Both ping and echo test show latencies which are in line with expectations. Getting a reply from our ServiceStack host takes much longer than expected.
We've verified that:
WAN link is only running at 25% of capacity (no congestion)
No QoS is employed on the WAN link
same host gives fast reply to same request from a different host on local network
delay is not caused by our code processing the request
We've now stumbled across Nagle's algorithm and the fact that it can mean delays for small requests on WAN networks (http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx).
In .NET it can be disabled by setting TcpClient.NoDelay = true (https://msdn.microsoft.com/en-us/en-US/library/system.net.sockets.tcpclient.nodelay(v=vs.110).aspx).
How can this be disabled for ServiceStack's TCP handling?
EDIT: I don't think that this is a duplicate of HttpWebRequest is slow with chunked data. The mentioned question covers HttpWebRequest which isn't used by ServiceStack. ServiceStack uses HttpListener which also happens to be controlled / managed by the mentioned ServicePointManager. We're going to conduct a test to see whether setting ServicePointManager.UseNagleAlgorithm = false solves the issue.
I think you provided the answer in your update: UseNagleAlgorithm = false should solve this issue. But be careful, because ServicePointManager.UseNagleAlgorithm = false; is a global setting, which means it will turn off the algorithm for all of your endpoints and all of your requests in the entire AppDomain. When you call more than one service endpoint (which is usually the case) with mixed request sizes, it will bite back. So you should consider setting this only for one specific ServicePoint, which you can acquire via:
ServicePoint sp = ServicePointManager.FindServicePoint(<uri>);
sp.UseNagleAlgorithm = false;
and not set it globally
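For example, with a hypothetical endpoint URI:

// Disable Nagle only for the endpoint that serves the small replies.
ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://myservicestack.example.com"));
sp.UseNagleAlgorithm = false;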
Here is an article about it: https://technet2.github.io/Wiki/blogs/windowsazurestorage/nagles-algorithm-is-not-friendly-towards-small-requests.html

HttpModule ATL Server Service InputStream Failure

I'm trying to add some SOAP message logging capabilities to an old, old ATL Server web service running in integrated mode in IIS 7.5 on a Windows Server 2008 box, but I'm running into a strange problem. For background: I've added the assembly that contains the HttpModule to the modules element of the web.config for the ATL Server web service.
I've been following the answer provided here and the logging itself works great.
However, whenever I use this logging capability, the service responds with "SOAP Invalid Request", while the log file has the SOAP message as expected. I've done lots of fiddling around with it and figured out that this only happens when I access the request object's InputStream property in my handler for the BeginRequest event. It will fail even if I do nothing more than read the InputStream's Position into a variable, like this:
private void OnBegin(object sender, EventArgs e)
{
    var request = _application.Request;
    // this will blow up
    var foo = request.InputStream.Position;
}
If I don't touch the InputStream in my handler (which doesn't do much good when I'm only doing this to log the contents of the request, obviously), the request goes through perfectly.
I can access header values in the Request object and various other properties of the HttpApplication involved, but accessing the InputStream causes the service to choke.
Is there something intrinsic to ATL Server that will prevent me from doing this logging? Do I need to add some sort of locking or other safeguard in my BeginRequest handler to make sure this behaves? Is my handler hosing up the InputStream somehow, making it unusable for the service?
Another way of approaching this is to ask: is there a way to see the request as it arrives at my service (i.e. after this HttpModule executes)?
It may also be worth noting that I am using SoapUI to test the service.
EDIT:
I've now done some failed request tracing in IIS and I get this error message:
ModuleName IsapiModule
Notification 128
HttpStatus 500
HttpReason Internal Server Error
HttpSubStatus 0
ErrorCode 0
ConfigExceptionInfo
Notification EXECUTE_REQUEST_HANDLER
ErrorCode The operation completed successfully. (0x0)
This comes in the handler for the ATL Server web service (i.e. the DLL for the service). Directly before that are the "GENERAL_READ_ENTITY_START" and "GENERAL_READ_ENTITY_END" messages, and the "END" entry carries this message:
BytesReceived 0
ErrorCode 2147942438
ErrorCode Reached the end of the file. (0x80070026)
Does that mean what I think it means? That the handler isn't getting any data? Is this more evidence pointing towards my HttpModule messing with the Request's InputStream?
Are you sure your request object is valid? You're doing things slightly differently here from the sample you reference. They are extracting the stream from the sender argument whereas you obviously rely on a member variable.
So I finally determined that this wasn't a workable approach: I couldn't get the HttpModule to fire at all in IIS 6 (which I would need it to do for this to be an acceptable solution). I tried setting the Filter property on the Request object and all sorts of other crazy ideas, but none of them let me both record the request body in the HttpModule and have the service still work.
So I did more digging and came upon this article on CodeProject that talks about the inner workings of ATL Server, specifically the HandleRequest method in atlsoap.h. I mucked around in there for a while and figured out a way to get at the request body, and it was pretty simple to write it to a file manually from there.
For those curious, this is the final code I added to HandleRequest():
//****************************************REQUEST LOGGING**********************************************************
BYTE* bytes = pRequestInfo->pServerContext->GetAvailableData();
FILE* pFile = fopen("C:\\IISLog\\ATL.txt", "a");
if (pFile != NULL)
{
    fwrite(bytes, 1, pRequestInfo->pServerContext->GetAvailableBytes(), pFile);
    fclose(pFile);
}
//****************************************REQUEST LOGGING**********************************************************
I am going to still play around with it a bit more, but I have what appears to be a workable solution.

IBM WebSphere XMS.Net CWSMQ0082E error

On several occasions I have received the following error from a .NET (C#, 4.0) application, out of the blue, on sending a message through a producer:
CWSMQ0082E: Failed to send to CompCode: 2, Reason: 2009. A problem was encountered whilst sending a message. See the linked exception for more information.
Of course, the LinkedException (why not use the InnerException, IBM???) is null, i.e. no more information is available.
Code I'm using (pretty straightforward):
var m = _session.CreateBytesMessage();
m.WriteBytes(mybytearray);
m.JMSReplyTo = myreplytoqueue;
m.SetIntProperty(XMSC.JMS_IBM_MSGTYPE, MQC.MQMT_DATAGRAM);
m.SetIntProperty(XMSC.JMS_IBM_REPORT_COA, MQC.MQRO_COA);
m.SetIntProperty(XMSC.JMS_IBM_REPORT_COD, MQC.MQRO_COD);
myproducer.Send(m, DeliveryMode.Persistent, mypriority, myttl);
(Offtopic: I hate the SetIntProperty way of setting properties. Which <expletive deleted> came up with that idea? It takes ages to look up all sorts of constants scattered all over the place, along with their allowed values.)
The exception is thrown on the .Send method. I'm using XMS.Net (IA9H / 2.0.0.7). The only Google result that turns up turns out to have a different reason code (and even if it were the same, it should be fixed in my version if I understand correctly). This occurs randomly (though it seems to happen more often when it's been a while since a message has been sent/received) and I have no way to reproduce this.
I have ab-so-lute-ly no idea how to troubleshoot this or even where to start looking. Is this something caused by the server-side? Is it caused by XMS.net or some underlying IBM WebSphere MQ infrastructure?
Some results that I found that seem similar suggest setting SHARECNV to any value higher than 0, or to "true"/"yes", but the documentation explicitly tells me the default is 10. Also, I have no idea if this is the cause, so changing it to another value feels like a shotgun approach.
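For reference, SHARECNV is an attribute of the server-connection channel, so as far as I can tell it would be changed with an MQSC command along these lines (the channel name is hypothetical):

ALTER CHANNEL(MY.SVRCONN) CHLTYPE(SVRCONN) SHARECNV(10)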
Does anybody have any idea how to go about solving this? I could of course just catch the exception, tear everything (channels, sessions, whatever) down and restart, but that's just plain ugly IMHO.
The 2009 return code means "Connection Broken." Basically, the underlying TCP socket is gone and the client finds out about it at the time of the API call. It is possible to tune the channels using heartbeat and keepalive settings so that WMQ tries harder to keep the socket alive. However, if the socket is timed out by the underlying infrastructure, nothing WMQ can do will help. Examples we've seen are firewalls and load balancers that are set to detect idle connections and sever them.
Modern versions of WMQ client will attempt to reconnect transparently. The application just blocks a bit longer when this occurs.
Short of using the automatic reconnect, the only solution is in fact to rebuild the connection. Since it will get a new connection handle, all the object handles must be rebuilt as well.
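A minimal sketch of that rebuild pattern, using the producer from the question (the teardown/rebuild helpers are hypothetical placeholders for your own connection setup code):

try
{
    myproducer.Send(m, DeliveryMode.Persistent, mypriority, myttl);
}
catch (XMSException)
{
    // Reason 2009: the connection is broken, so every handle hanging off it
    // (connection, session, producer) is invalid and must be recreated.
    TearDownConnectionSessionAndProducer();  // hypothetical helper
    RebuildConnectionSessionAndProducer();   // hypothetical helper
    myproducer.Send(m, DeliveryMode.Persistent, mypriority, myttl);
}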
Many of the tuning functions described here are available through the client configuration file, available in v7.0 and later clients. In particular, the TCP stanza of that file enables keepalive. (The TCP spec says that if keepalive is provided, it must be disabled by default.) The QMgr has a similar ini file with configuration stanzas, including one for keepalive. The latest WMQ client is available as SupportPac MQC71 if you need it.
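A sketch of the keepalive stanza in the client configuration file (attribute support varies by client version, so treat this as an assumption to verify against the documentation):

TCP:
   KeepAlive = Yes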
In cases where the main exception is sufficient to indicate the error, the linked exception will be null. In your case it's MQ reason code 2009, which means the connection to the queue manager has been broken. The socket through which your application and the queue manager were communicating was closed for some reason. The reason for the socket close could be a network blip.
Along with the suggestions T.Rob noted above, you could also run an XMS trace and a queue manager trace to understand the problem further. Please see the Troubleshooting chapter in the XMS InfoCenter.
HTH

Using .NET's HttpWebRequest to download a multitude of files in a row

I have an application that needs to download several files in quick succession (sometimes a few thousand). However, when several files need to be downloaded, I get an exception with an inner exception of type SocketException and the error code 10048 (WSAEADDRINUSE). I did some digging, and basically it's because the server has run out of sockets (and they all wait 240s or so before becoming available again); not coincidentally, it starts happening around the 1024-file range. I would expect HttpWebRequest/ServicePointManager to be reusing my connection, but apparently it is not (and the files are served over https, so that may be part of it). I never saw this problem in the C++ code that this was ported from (but that doesn't mean it never happened; I'd be surprised if it did, though).
I am properly closing the WebResponse object, and the HttpWebRequest object has KeepAlive set to true by default. Next, my intent is to fiddle around with ServicePointManager.SetTcpKeepAlive(). However, I can't see how more people haven't run into this problem.
Has anyone else run into the problem, and if so, what did you do to get around it? Currently I have a retry scheme that detects this error and waits it out, but that doesn't seem like the right thing to do.
Here's some basic code to verify what I'm doing (just in case I'm missing closing something):
WebRequest webRequest = WebRequest.Create(uri);
webRequest.Method = "GET";
webRequest.Credentials = new NetworkCredential(username, password);
WebResponse webResponse = webRequest.GetResponse();
try
{
    using (Stream stream = webResponse.GetResponseStream())
    {
        // read the stream
    }
}
finally
{
    webResponse.Close();
}
What kind of application is this? You mentioned that the server is running out of ports, but then you mentioned HttpWebRequest. Are you running this code in a webservice or ASP.NET page, which is trying to then download multiple files for the same incoming request from the client?
What kind of authentication is the page using? If it is using NTLM authentication, then the connections cannot be shared if the credentials being used are different for each request.
What I would suggest is to group your requests per credential. So, for example, all requests using username "John" would be grouped. You can specify the ConnectionGroupName property on the request, so the system will try to reuse connections for the same credential and server.
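A minimal sketch of that grouping, reusing the request setup from the question (the group-name scheme is only an illustration):

var webRequest = (HttpWebRequest)WebRequest.Create(uri);
webRequest.Method = "GET";
webRequest.Credentials = new NetworkCredential(username, password);
// Requests with the same group name and credentials can share a pooled connection.
webRequest.ConnectionGroupName = "download-" + username;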
If that also doesn't work, you will need to do one or more of the following:
1) Throttle your requests.
2) Increase the wildcard port range.
3) Use the BindIPConnectionCallback on ServicePoint to make it bind to a non-wildcard port (i.e. a port in the range 1024-16384); see the sketch below.
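A sketch of option 3 via ServicePoint.BindIPEndPointDelegate, which appears to be the callback in question (the round-robin port scheme is only an illustration):

ServicePoint sp = ServicePointManager.FindServicePoint(uri);
int counter = 0;
sp.BindIPEndPointDelegate = (servicePoint, remoteEndPoint, retryCount) =>
{
    // Rotate through a fixed non-wildcard range instead of letting the OS
    // pick from the exhausted ephemeral range.
    int port = 1024 + (Interlocked.Increment(ref counter) % (16384 - 1024));
    return new IPEndPoint(IPAddress.Any, port);
};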
More digging seems to point to it possibly being due to authentication, and the UnsafeAuthenticatedConnectionSharing property might alleviate this. However, I'm not sure that's the best thing either.
