I'm attempting to write a curl-like tool that demonstrates the effect of various HTTP caching headers on .NET's HttpClient class.
In my initial attempt I'm pointing the tool at one of my internal web services, which does not specify any caching information in the response, and examining the response headers.
I expect to see that the request is re-sent each time and executed on the server, returning a new but identical set of content each time (for the purpose of this test, the content is static on the server). But, instead, each request after the first returns much more quickly than the first and includes a new header Age that was not present in the very first response. This indicates to me that the HttpClient in my command-line tool is returning the response from cache, not placing a new request.
Here is the first request with the response headers:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.3235905):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Content-Length = 150867
Content-Type = application/json; charset=utf-8
and here is the request from the same session of my curl tool, a little while later:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.0188433):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Age = 312
Content-Length = 150867
Content-Type = application/json; charset=utf-8
and finally, after I stop and start my program, here's another request from the new instance:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.0517271):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Age = 528
Content-Length = 150867
Content-Type = application/json; charset=utf-8
The last one I find even more difficult to understand as I was under the impression (from reading this: https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/) that caching is maintained per instance of HttpClient.
This seems to continue forever with Age increasing each request. The only way to get back to the original response is to use Internet Explorer and delete temporary internet files.
[Additional Info] After leaving my command line application open for a couple of hours I repeated the request and received a response identical to the original, without the Age header. So it appears that, if HttpClient was caching the response, that cache expired after a couple of hours.
Can anyone tell me if I'm correct that HttpClient is performing internal caching in this case, and if so, why it's doing so in the absence of any caching-related response headers and what policy it's using?
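One way to probe whether a stored copy is being served is to repeat the request with a Cache-Control: no-cache request header and see whether the Age header disappears. The following is a minimal sketch under that assumption; the class name NoCacheProbe is hypothetical, and nothing here establishes which cache (HttpClient's handler, the system cache, or an intermediary) is actually answering.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class NoCacheProbe
{
    // Builds a GET request asking caches on the path not to answer from a
    // stored copy without first revalidating with the origin server.
    public static HttpRequestMessage Build(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.CacheControl = new CacheControlHeaderValue { NoCache = true };
        return request;
    }
}
```

Sending it with client.SendAsync(NoCacheProbe.Build("http://myserver:8058/path1/path2")) and then checking response.Headers.Age (a nullable TimeSpan) would show whether the Age header is still present.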
Request:
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
String responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
Console.WriteLine(responseString);
Response:
{"code":"SUCCESS","details":
{"created_time":"","id":"xxxx"},
"message":"uploaded",
"status":"success"}
HTTP/1.1 200 OK
Date: Wed, 18 Dec 2019 11:42:26 IST
Last-Modified: Wed, 18 Dec 2019 11:42:25 IST
Content-Type: application/json
Connection: Keep-Alive
Server: AWServer
Pragma: no-cache
Cache-Control: no-cache
Expires: 1
Whenever the C# request above is executed, the response body occasionally contains the headers (HTTP/1.1 200 OK...), even though I'm only trying to read the body ({"code"...}) via response.GetResponseStream(). Is this the intended behavior?
Take a look at the basic article on HTTP headers:
HTTP headers let the client and the server pass additional information with an HTTP request or response. An HTTP header consists of its case-insensitive name followed by a colon (:), then by its value. Whitespace before the value is ignored.
Headers are additional information. Since you left out the URL and the code that creates the request, I can only guess that some responses include these headers and some do not. That depends on the additional non-body information the API or web server wants to respond with.
It's in the control of the responder and not the receiver.
Don't ignore them: sometimes interesting metadata comes in headers. They should not carry the data itself but information about it, such as encoding or CORS info.
See also the documentation for the Last-Modified header and the Date header.
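To illustrate the separation between headers and body, here is a small sketch that builds a response in memory, so it needs no server. It uses HttpResponseMessage rather than the HttpWebResponse from the question; the class name HeadersVsBody and the sample JSON are assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;

public static class HeadersVsBody
{
    // Builds a sample response in memory; no network access is involved.
    public static HttpResponseMessage Sample()
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("{\"status\":\"success\"}", Encoding.UTF8, "application/json")
        };
        // Metadata goes into the header collection, not into the body.
        response.Headers.TryAddWithoutValidation("Server", "AWServer");
        return response;
    }

    // Reads only the body; headers are never part of this string.
    public static string Body(HttpResponseMessage response)
    {
        return response.Content.ReadAsStringAsync().Result;
    }
}
```

With HttpWebResponse the analogous members are response.Headers for the metadata and response.GetResponseStream() for the body. If header text shows up in the stream, the server has most likely written it into the body itself.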
I am responding to a GET request from a field device with the following:
var reply = new HttpResponseMessage(System.Net.HttpStatusCode.OK)
{
Content = new StringContent("SUCCESS")
};
The word appears in the body of the message, but the field device reports that the request was not successful.
From this, I can gather that I shouldn't be using HttpResponseMessage but some other means.
The suggested reply needs to look like this:
HTTP/1.1 200 OK<CR><LF>
Date: Mon, 15 Feb 2016 11:34:50 GMT<CR><LF>
Server: Apache/2.2.31 (Win32) mod_ssl/2.2.31 OpenSSL/1.0.2f PHP/5.4.45<CR><LF>
X-Powered-By: PHP/5.4.45<CR><LF>
Content-Length: 7<CR><LF>
Keep-Alive: timeout=5, max=100<CR><LF>
Connection: Keep-Alive<CR><LF>
Content-Type: text/plain<CR><LF>
<CR><LF>
SUCCESS<CR><LF>
BTW, I am using a Microsoft stack for this, not PHP; the sample above is from the manual.
Should I be using HttpRequestMessage?
The part I am concerned with is <CR><LF>SUCCESS<CR><LF>
So the issue was that the device required a time sync with my server before it could send data later on. So there never was an issue with the response.
I'm trying to gather a list of recent posts that contain a certain hashtag. The API Documentation states that I should be using the following GET call:
https://api.instagram.com/v1/tags/{tag-name}/media/recent?access_token=ACCESS-TOKEN
When I load the page where I want this information displayed, I perform the following:
using(HttpClient Client = new HttpClient())
{
var uri = "https://api.instagram.com/v1/tags/" + tagToLookFor + "/media/recent?access_token=" + Session["instagramaccesstoken"].ToString();
var results = Client.GetAsync(uri).Result;
// Result handling below here.
}
For reference, tagToLookFor is a constant string defined at the top of the class (e.g. foo), and I store the access token returned from the OAuth process in the Session object with a key of 'instagramaccesstoken'.
While debugging this, I checked that the URI was being formed correctly, and it does contain both the tag name and the just-created access_token. Using Apigee with the same URI (save for a different access_token) returns the valid results I would expect. However, attempting to GET using the URI on my website returns:
{
StatusCode: 400,
ReasonPhrase: 'BAD REQUEST',
Version: 1.1,
Content: System.Net.Http.StreamContent,
Headers:{
X-Ratelimit-Remaining: 499
Vary: Cookie
Vary: Accept-Language
X-Ratelimit-Limit: 500
Pragma: no-cache
Connection: keep-alive
Cache-Control: no-store, must-revalidate, no-cache, private
Date: Fri, 27 Nov 2015 21:39:56 GMT
Set-Cookie: csrftoken=97cc443e4aaf11dbc44b6c1fb9113378; expires=Fri, 25-Nov-2016 21:39:56 GMT; Max-Age=31449600; Path=/
Content-Length: 283
Content-Language: en
Content-Type: application/json; charset=utf-8
Expires: Sat, 01 Jan 2000 00:00:00 GMT
}
}
I'm trying to determine what the difference between the two could be; the only thing that I can think of is that access_token is somehow being invalidated when I switch between pages. The last thing I do on the Login/Auth page is store the access_token using Session.Add, then call Server.Transfer to move to the page that I'm calling this on.
Any ideas on what the issue could be? Thanks.
Attach the token to the header when making the request.
Client.DefaultRequestHeaders.Add("access_token", "Bearer " + token);
The problem ended up being one regarding Sandbox Mode. I had registered an app after the switch, and I was the only user in my sandbox. As a result, it had no problem finding my posts/info, but Sandbox Mode acts as if the Sandbox users are the only users on Instagram, so naturally it would not find anything else.
It turns out there was an existing registered application in my organization (made before the switch date) that does not have any such limitations, so I have been testing using that AppID/secret.
tl;dr: If you're the only user in your app's sandbox, work on getting users into your sandbox. See their article about it for more info.
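For diagnosing a 400 like the one in the question, the response body (shown there only as System.Net.Http.StreamContent, 283 bytes of JSON) can be read and logged. The sketch below is one way to do that; the class name ErrorBody is hypothetical, and what the body contains depends entirely on the API.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ErrorBody
{
    // Returns "OK" for successful responses; otherwise the status code,
    // reason phrase, and whatever body the server sent with the error.
    public static string Describe(HttpResponseMessage response)
    {
        if (response.IsSuccessStatusCode)
            return "OK";
        string body = response.Content == null
            ? ""
            : response.Content.ReadAsStringAsync().Result;
        return (int)response.StatusCode + " " + response.ReasonPhrase + ": " + body;
    }
}
```

Calling ErrorBody.Describe(results) right after the GetAsync call in the question would surface the server's own explanation instead of just the status line.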
I'm using HttpClient 0.6.0 from NuGet.
I have the following C# code:
var client = new HttpClient(new WebRequestHandler() {
CachePolicy =
new HttpRequestCachePolicy(HttpRequestCacheLevel.CacheIfAvailable)
});
client.GetAsync("http://myservice/asdf");
The service (this time CouchDB) returns an ETag value and status code 200 OK. It also returns a Cache-Control header with the value must-revalidate.
Update: here are the response headers from CouchDB (taken from the Visual Studio debugger):
Server: CouchDB/1.1.1 (Erlang OTP/R14B04)
Etag: "1-27964df653cea4316d0acbab10fd9c04"
Date: Fri, 09 Dec 2011 11:56:07 GMT
Cache-Control: must-revalidate
The next time I make the exact same request, HttpClient does a conditional request and gets back 304 Not Modified, which is right.
However, if I use the low-level HttpWebRequest class with the same CachePolicy, the request isn't even made the second time. This is how I would like HttpClient to behave as well.
Is the must-revalidate header value the reason HttpClient behaves differently, or is it something else? I would like to make only one request and then serve the rest from cache without the conditional request.
(Also, as a side note, when debugging, the response status code is shown as 200 OK, even though the service returns 304 Not Modified.)
Both clients behave correctly.
must-revalidate only applies to stale responses.
When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. (I.e., the cache MUST do an end-to-end revalidation every time, if, based solely on the origin server's Expires or max-age value, the cached response is stale.)
Since you do not provide explicit expiration, caches are allowed to use heuristics to determine freshness.
Since you do not provide Last-Modified, caches do not need to warn the client that heuristics were used.
If none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage (see section 14.9.3) appears in the response, and the response does not include other restrictions on caching, the cache MAY compute a freshness lifetime using a heuristic. The cache MUST attach Warning 113 to any response whose age is more than 24 hours if such warning has not already been added.
The response age is calculated based on Date header since Age is not present.
If the response is still fresh according to heuristic expiration, caches may use the stored response.
One explanation is that HttpWebRequest uses heuristics and that there was a stored response with status code 200 that was still fresh.
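The freshness check described in this answer can be sketched as follows. It only illustrates the rule and is not how HttpWebRequest is implemented; the class name Freshness and the heuristicLifetime parameter are assumptions, since the heuristic a particular cache applies is not stated here.

```csharp
using System;

public static class Freshness
{
    // Age derived from the Date header alone, for responses that carry no
    // Age header. Clock skew can make the difference negative, so clamp at zero.
    public static TimeSpan ApparentAge(DateTimeOffset dateHeader, DateTimeOffset now)
    {
        TimeSpan age = now - dateHeader;
        return age < TimeSpan.Zero ? TimeSpan.Zero : age;
    }

    // A stored response is fresh while its age is below the freshness lifetime.
    // heuristicLifetime stands in for whatever value the cache's heuristic picks.
    public static bool IsFresh(DateTimeOffset dateHeader, DateTimeOffset now, TimeSpan heuristicLifetime)
    {
        return ApparentAge(dateHeader, now) < heuristicLifetime;
    }
}
```

With the Date value from the CouchDB response above and an assumed ten-minute lifetime, a request five minutes later would be served from the store and one fifteen minutes later would be revalidated.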
Answering my own question.
According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4, I would say that a "Cache-Control: must-revalidate" without an expiration states that the resource should be validated on every request.
In this case it means a conditional GET should be done every time the resource is requested. So System.Net.Http.HttpClient is behaving correctly and the legacy (Http)WebRequest is behaving incorrectly.
I'm trying to get the same type of results that Fiddler gets when I launch a webpage from my app.
Below is the code I'm using and the results I'm getting. I've used google.com only as an example.
What do I need to modify in my code to get the results I want or do I need an entirely different approach?
Thanks for your help.
My code:
// Create the HttpWebRequest object.
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");
// Get the response object, which carries the header info; dispose it when done.
using (var objResults = objRequest.GetResponse())
{
    // Get the header count.
    int intCount = objResults.Headers.Count;
    // Loop through the headers, writing each name and value on its own line.
    for (int i = 0; i < intCount; i++)
    {
        string strKey = objResults.Headers.GetKey(i);
        string strValue = objResults.Headers.Get(i);
        // "</br />" is not valid markup; use "<br />" throughout.
        lblResults.Text += strKey + "<br />" + strValue + "<br /><br />";
    }
}
My results:
Cache-Control
private, max-age=0
Content-Type
text/html; charset=ISO-8859-1
Date
Tue, 05 Jun 2012 17:40:38 GMT
Expires
-1
Set-Cookie
PREF=ID=526197b0260fd361:FF=0:TM=1338918038:LM=1338918038:S=gefqgwkuzuPJlO3G; expires=Thu, 05-Jun-2014 17:40:38 GMT; path=/; domain=.google.com,NID=60=CJbpzMe6uTKf58ty7rysqUFTW6GnsQHZ-Uat_cFf1AuayffFtJoFQSIwT5oSQKqQp5PSIYoYtBf_8oSGh_Xsk1YtE7Z834Qwn0A4Sw3ruVCA9v3f_UDYH4b4fAloFJbW; expires=Wed, 05-Dec-2012 17:40:38 GMT; path=/; domain=.google.com; HttpOnly
P3P
CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server
gws
X-XSS-Protection
1; mode=block
X-Frame-Options
SAMEORIGIN
Transfer-Encoding
chunked
=========================
Fiddler results:
#  Result  Protocol  Host                  URL                           Body    Caching                                             Content-Type              Process       Comments  Custom
1  304     HTTP      www.rolandgarros.com  /images/misc/weather/P8.gif   0       max-age=700 Expires: Tue, 05 Jun 2012 17:53:40 GMT  image/gif                 firefox:5456
2  200     HTTP      www.google.com        /                             23,697  private, max-age=0 Expires: -1                      text/html; charset=UTF-8  chrome:2324
3  304     HTTP      www.rolandgarros.com  /images/misc/weather/P9.gif   0       max-age=700 Expires: Tue, 05 Jun 2012 17:53:57 GMT  image/gif                 firefox:5456
4  200     HTTP      Tunnel to             translate.googleapis.com:443  0                                                                                     chrome:2324
5  200     HTTP      www.google.com
The difference is Fiddler is actually recording an entire session, not just a single HTTP request.
If a user loads Google.com, the response is typically an HTML document which contains images, script files, CSS files, etc. Your browser will then initiate a new HTTP request for each one of those resources. With Fiddler running, it tracks each of those HTTP requests and spits out the result code and other information about the session.
With your C# code above, you're only initiating a single HTTP request, thus you only have information about a single result.
You'd probably be better off writing a browser plugin. Otherwise, you'd have to parse the HTML response and load other resources from that document as well.
If you do need to do this with C# code, you could probably parse the document with the HTML Agility Pack and then look for other resources within the HTML to simulate a browser. There are also embedded browsers, such as Awesomium, that might be helpful.
You are not asking for the same information that Fiddler is displaying. Fiddler shows the HTTP Status code, the host and URI and (it appears, from your example) the Content Length, Content Type and Cache status.
For many of these you will have to peek into the response headers.
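As a rough sketch of that idea, the helper below pulls the Fiddler-style fields for a single response out of a WebHeaderCollection. The class name ResponseSummary and the " | " separator are assumptions chosen for illustration, and it covers only one request, not the sub-resources a browser would fetch.

```csharp
using System;
using System.Net;

public static class ResponseSummary
{
    // Formats status, host, URL, body length, caching info, and content type
    // for one response, roughly mirroring Fiddler's session-list columns.
    public static string Format(int statusCode, Uri uri, WebHeaderCollection headers)
    {
        string caching = headers["Cache-Control"] ?? "";
        string expires = headers["Expires"];
        if (expires != null)
            caching += " Expires: " + expires;

        string[] fields =
        {
            statusCode.ToString(),
            uri.Host,
            uri.PathAndQuery,
            // Chunked responses carry no Content-Length header.
            headers["Content-Length"] ?? "?",
            caching.Trim(),
            headers["Content-Type"] ?? ""
        };
        return string.Join(" | ", fields);
    }
}
```

With the code from the question, the call would be along the lines of ResponseSummary.Format((int)response.StatusCode, response.ResponseUri, response.Headers), after casting the result of GetResponse() to HttpWebResponse.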