Occasional response headers in C# HTTP Request - c#

Request:
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
String responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
Console.WriteLine(responseString);
Response:
{"code":"SUCCESS","details":
{"created_time":"","id":"xxxx"},
"message":"uploaded",
"status":"success"}
HTTP/1.1 200 OK
Date: Wed, 18 Dec 2019 11:42:26 IST
Last-Modified: Wed, 18 Dec 2019 11:42:25 IST
Content-Type: application/json
Connection: Keep-Alive
Server: AWServer
Pragma: no-cache
Cache-Control: no-cache
Expires: 1
Whenever the above-mentioned C# request is executed, the response occasionally contains headers(HTTP/1.1 200 OK...), When I'm only trying to get the body part({"code"....} alone(response.GetResponseStream()). Is this the intended behavior?

Take a look at the basic article on http headers
HTTP headers let the client and the server pass additional information with an HTTP request or response. An HTTP header consists of its case-insensitive name followed by a colon (:), then by its value. Whitespace before the value is ignored.
Headers are additional information. I guess that since you left out the url and the whole creation of the Request and the url, this means that some responses have Headers and some not. That depends on the additional non-body information the api or web server wants to respond with.
It's in the control of the responder and not the receiver.
Don't ignore them: Some times interesting metadata come from Headers. It should not be data but information about it, like encoding, CORS info etc.
last modified header link
date header link

Related

Is Dot Net HttpClient Unexpectedly Caching Responses?

I'm attempting to write a curl-like tool that demonstrates the effect of various HTTP caching headers on dot net's HttpClient class.
In my initial attempt I'm pointing the tool at one of my internal web services that does not specify any caching information in the response and examining the header of the response.
I expect to see that the request is re-sent each time and executed on the server, returning a new but identical set of content each time (for the purpose of this test, the content is static on the server). But, instead, each request after the first returns much more quickly than the first and includes a new header Age that was not present in the very first response. This indicates to me that the HttpClient in my command-line tool is returning the response from cache, not placing a new request.
Here is the first request with the response headers:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.3235905):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Content-Length = 150867
Content-Type = application/json; charset=utf-8
and here is the request from the same session of my curl tool, a little while later:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.0188433):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Age = 312
Content-Length = 150867
Content-Type = application/json; charset=utf-8
and finally, after I stop and start my program, here's another request from the new instance:
HTTP:>GET http://myserver:8058/path1/path2
Status 200 OK (OK in 00:00:00.0517271):
Date = Sat, 08 Jul 2017 15:55:22 GMT
Server = Microsoft-HTTPAPI/2.0
Age = 528
Content-Length = 150867
Content-Type = application/json; charset=utf-8
The last one I find even more difficult to understand as I was under the impression (from reading this: https://aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong/) that caching is maintained per instance of HttpClient.
This seems to continue forever with Age increasing each request. The only way to get back to the original response is to use Internet Explorer and delete temporary internet files.
[Additional Info] After leaving my command line application open for a couple of hours I repeated the request and received a response identical to the original, without the Age header. So it appears that, if HttpClient was caching the response, that cache expired after a couple of hours.
Can anyone tell me if I'm correct that HttpClient is performing internal caching in this case, and if so, why it's doing so in the absence of any caching-related response headers and what policy it's using?

(Instagram API) Trouble Reaching Tag Endpoints

I'm trying to gather a list of recent posts that contain a certain hashtag. The API Documentation states that I should be using the following GET call:
https://api.instagram.com/v1/tags/{tag-name}/media/recent?access_token=ACCESS-TOKEN
When I load the page where I want this information displayed, I perform the following:
using(HttpClient Client = new HttpClient())
{
var uri = "https://api.instagram.com/v1/tags/" + tagToLookFor + "/media/recent?access_token=" + Session["instagramaccesstoken"].ToString();
var results = Client.GetAsync(uri).Result;
// Result handling below here.
}
For reference, tagToLookFor is a constant string defined at the top of the class (eg. foo), and I store the Access Token returned from the OAuth process in the Session object with a key of 'instagramaccesstoken'.
While debugging this, I checked to make sure the URI was being formed correctly, and it does contain both the tag name and the just-created access_token. Using Apigee with the same URI (Save for a different access_token) returns the valid results I would expect. However, attempting to GET using the URI on my webstie returns:
{
StatusCode: 400,
ReasonPhrase: 'BAD REQUEST',
Version: 1.1,
Content: System.Net.Http.StreamContent,
Headers:{
X-Ratelimit-Remaining: 499
Vary: Cookie
Vary: Accept-Language
X-Ratelimit-Limit: 500
Pragma: no-cache
Connection: keep-alive
Cache-Control: no-store, must-revalidate, no-cache, private
Date: Fri, 27 Nov 2015 21:39:56 GMT
Set-Cookie: csrftoken=97cc443e4aaf11dbc44b6c1fb9113378; expires=Fri, 25-Nov-2016 21:39:56 GMT; Max-Age=31449600; Path=/
Content-Length: 283
Content-Language: en
Content-Type: application/json; charset=utf-8
Expires: Sat, 01 Jan 2000 00:00:00 GMT
}
}
I'm trying to determine what the difference between the two could be; the only thing that I can think of is that access_token is somehow being invalidated when I switch between pages. The last thing I do on the Login/Auth page is store the access_token using Session.Add, then call Server.Transfer to move to the page that I'm calling this on.
Any Ideas on what the issue could be? Thanks.
Attach the token to the header when making the request.
Client.DefaultRequestHeaders.Add("access_token", "Bearer " + token);
The problem ended up being one regarding Sandbox Mode. I had registered an app after the switch, and I was the only user in my sandbox. As a result, it had no problem finding my posts/info, but Sandbox Mode acts as if the Sandbox users are the only users on Instagram, so naturally it would not find anything else.
It turns out there was an existing registered application in my organization (made before the switch date) that does not have any such limitations, so I have been testing using that AppID/secret.
tl;dr: If you're the only user in your app's sandbox, work on getting users into your sandbox. See their article about it for more info.

Would like to get http response results like Fiddler

I'm trying to get the same type of results that Fiddler gets when I launch a webpage from my app.
Below is the code I'm using and the results I'm getting. I've used google.com only as an example.
What do I need to modify in my code to get the results I want or do I need an entirely different approach?
Thanks for your help.
My code:
// create the HttpWebRequest object
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");
// get the response object which has the header info, using the GetResponse method
var objResults = objRequest.GetResponse();
// get the header count
int intCount = objResults.Headers.Count;
// loop through the results object
for (int i = 0; i < intCount; i++)
{
string strKey = objResults.Headers.GetKey(i);
string strValue = objResults.Headers.Get(i);
lblResults.Text += strKey + "<br />" + strValue + "</br /><br />";
}
My results:
Cache-Control
private, max-age=0
Content-Type
text/html; charset=ISO-8859-1
Date
Tue, 05 Jun 2012 17:40:38 GMT
Expires
-1
Set-Cookie
PREF=ID=526197b0260fd361:FF=0:TM=1338918038:LM=1338918038:S=gefqgwkuzuPJlO3G; expires=Thu, 05-Jun-2014 17:40:38 GMT; path=/; domain=.google.com,NID=60=CJbpzMe6uTKf58ty7rysqUFTW6GnsQHZ-Uat_cFf1AuayffFtJoFQSIwT5oSQKqQp5PSIYoYtBf_8oSGh_Xsk1YtE7Z834Qwn0A4Sw3ruVCA9v3f_UDYH4b4fAloFJbW; expires=Wed, 05-Dec-2012 17:40:38 GMT; path=/; domain=.google.com; HttpOnly
P3P
CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server
gws
X-XSS-Protection
1; mode=block
X-Frame-Options
SAMEORIGIN
Transfer-Encoding
chunked
=========================
Fiddler results:
Result Protocol Host URL Body Caching Content-Type Process Comments Custom
1 304 HTTP www.rolandgarros.com /images/misc/weather/P8.gif 0 max-age=700 Expires: Tue, 05 Jun 2012 17:53:40 GMT image/gif firefox:5456
2 200 HTTP www.google.com / 23,697 private, max-age=0 Expires: -1 text/html; charset=UTF-8 chrome:2324
3 304 HTTP www.rolandgarros.com /images/misc/weather/P9.gif 0 max-age=700 Expires: Tue, 05 Jun 2012 17:53:57 GMT image/gif firefox:5456
4 200 HTTP Tunnel to translate.googleapis.com:443 0 chrome:2324
5 200 HTTP www.google.com
The difference is Fiddler is actually recording an entire session, not just a single HTTP request.
If a user loads Google.com, the response is typically an HTML document which contains images, script files, CSS files, etc. Your browser will then initiate a new HTTP request for each one of those resources. With Fiddler running, it tracks each of those HTTP requests and spits out the result code and other information about the session.
With your C# code above, you're only initiating a single HTTP request, thus you only have information about a single result.
You'd probably be better off writing a browser plugin. Otherwise, you'd have to parse the HTML response and load other resources from that document as well.
If you do need to do this with C# code, you could probably parse the document with the HTML Agility Pack and then look for other resources within the HTML to simulate a browser. There's also embedded browsers, such as Awesomium, that might be helpful.
You are not asking for the same information that Fiddler is displaying. Fiddler shows the HTTP Status code, the host and URI and (it appears, from your example) the Content Length, Content Type and Cache status.
For many of these you will have to peek in to the response headers.

HttpWebRequest with caching enabled throws exceptions

I'm working on a small C#/WPF application that interfaces with a web service implemented in Ruby on Rails, using handcrafted HttpWebRequest calls and JSON serialization. Without caching, everything works as it's supposed to, and I've got HTTP authentication and compression working as well.
Once I enable caching, by setting request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.CacheIfAvailable);, things go awry - in the production environment. When connecting to a simple WEBrick instance, things work fine, I get HTTP/1.1 304 Not Modified as expected and HttpWebRequest delivers the cached content.
When I try the same against the production server, running nginx/0.8.53 + Phusion Passenger 3.0.0, the application breaks. First request (uncached) is served properly, but on the second request which results in the 304 response, I get a WebException stating that "The request was aborted: The request was canceled." as soon as I invoke request.GetResponse().
I've run the connections through fiddler, which hasn't helped a whole lot; both WEBrick and nginx return an empty entity body, albeit different response headers. Intercepting the request and changing the response headers for nginx to match those of WEBrick didn't change anything, leading me to think that it could be a keep-alive issue; setting request.KeepAlive = false; changes nothing, though - it doesn't break stuff when connecting to WEBrick, and it doesn't fix stuff when connecting to nginx.
For what it's worth, the WebException.InnerException is a NullReferenceException with the following StackTrace:
at System.Net.HttpWebRequest.CheckCacheUpdateOnResponse()
at System.Net.HttpWebRequest.CheckResubmitForCache(Exception& e)
at System.Net.HttpWebRequest.DoSubmitRequestProcessing(Exception& exception)
at System.Net.HttpWebRequest.ProcessResponse()
at System.Net.HttpWebRequest.SetResponse(CoreResponseData coreResponseData)
Headers for the (working) WEBrick connection:
########## request
GET /users/current.json HTTP/1.1
Authorization: Basic *REDACTED*
Content-Type: application/json
Accept: application/json
Accept-Charset: utf-8
Host: testbox.local:3030
If-None-Match: "84a49062768e4ca619b1c081736da20f"
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
########## response
HTTP/1.1 304 Not Modified
X-Ua-Compatible: IE=Edge
Etag: "84a49062768e4ca619b1c081736da20f"
Date: Wed, 01 Dec 2010 18:18:59 GMT
Server: WEBrick/1.3.1 (Ruby/1.8.7/2010-08-16)
X-Runtime: 0.177545
Cache-Control: max-age=0, private, must-revalidate
Set-Cookie: *REDACTED*
Headers for the (exception-throwing) nginx connection:
########## request
GET /users/current.json HTTP/1.1
Authorization: Basic *REDACTED*
Content-Type: application/json
Accept: application/json
Accept-Charset: utf-8
Host: testsystem.local:8080
If-None-Match: "a64560553465e0270cc0a23cc4c33f9f"
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
########## response
HTTP/1.1 304 Not Modified
Connection: keep-alive
Status: 304
X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 3.0.0
ETag: "a64560553465e0270cc0a23cc4c33f9f"
X-UA-Compatible: IE=Edge,chrome=1
X-Runtime: 0.240160
Set-Cookie: *REDACTED*
Cache-Control: max-age=0, private, must-revalidate
Server: nginx/0.8.53 + Phusion Passenger 3.0.0 (mod_rails/mod_rack)
UPDATE:
I tried doing a quick-and-dirty manual ETag cache, but turns out that's a no-go: I get a WebException when invoking request.GetResponce(), telling me that "The remote server returned an error: (304) Not Modified." - yeah, .NET, I kinda knew that, and I'd like to (attempt to) handle it myself, grr.
UPDATE 2:
Getting closer to the root of the problem. The showstopper seems to be a difference in the response headers for the initial request. WEBrick includes a Date: Wed, 01 Dec 2010 21:30:01 GMT header, which isn't present in the nginx reply. There's other differences as well, but intercepting the initial nginx reply with fiddler and adding a Date header, the subsequent HttpWebRequests are able to process the (unmodified) nginx 304 replies.
Going to try to look for a workaround, as well as getting nginx to add the Date header.
UPDATE 3:
It seems that the serverside issue is with Phusion Passenger, they have an open issue about lack of the Date header. I'd still say that HttpWebRequest's behavior is... suboptimal.
UPDATE 4:
Added a Microsoft Connect ticket for the bug.
I think the designers find it reasonable to throw an exception when the "expected behavior"---i.e., getting a response body---cannot be completed. You can handle this somewhat intelligently as follows:
catch (WebException ex)
{
if (ex.Status == WebExceptionStatus.ProtocolError)
{
var statusCode = ((HttpWebResponse)ex.Response).StatusCode;
// Test against HttpStatusCode enumeration.
}
else
{
// Do something else, e.g. throw;
}
}
So, it turns out to be Phusion Passenger (or nginx, depending on how you look at it - and Thin as well) that doesn't add a Date HTTP response header, combined with what I see as a bug in .NET HttpWebRequest (in my situation there's no If-Modified-Since, thus Date shouldn't be necessary) leading to the problem.
The workaround for this particular case was to edit our Rails ApplicationController:
class ApplicationController < ActionController::Base
# ...other stuff here
before_filter :add_date_header
# bugfix for .NET HttpWebRequst 304-handling bug and various
# webservers' lazyness in not adding the Date: response header.
def add_date_header
response.headers['Date'] = Time.now.to_s
end
end
UPDATE:
Turns out it's a bit more complex than "just" setting HttpRequestCachePolicy - to repro, I also need to have manually constructed HTTP Basic Auth. So the involved components are the following:
HTTP server that doesn't include a HTTP "Date:" response header.
manual construction of HTTP Authorization request header.
use of HttpRequestCachePolicy.
Smallest repro I've been able to come up with:
namespace Repro
{
using System;
using System.IO;
using System.Net;
using System.Net.Cache;
using System.Text;
class ReproProg
{
const string requestUrl = "http://drivelog.miracle.local:3030/users/current.json";
// Manual construction of HTTP basic auth so we don't get an unnecessary server
// roundtrip telling us to auth, which is what we get if we simply use
// HttpWebRequest.Credentials.
private static void SetAuthorization(HttpWebRequest request, string _username, string _password)
{
string userAndPass = string.Format("{0}:{1}", _username, _password);
byte[] authBytes = Encoding.UTF8.GetBytes(userAndPass.ToCharArray());
request.Headers["Authorization"] = "Basic " + Convert.ToBase64String(authBytes);
}
static public void DoRequest()
{
var request = (HttpWebRequest) WebRequest.Create(requestUrl);
request.Method = "GET";
request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.CacheIfAvailable);
SetAuthorization(request, "user#domain.com", "12345678");
using(var response = request.GetResponse())
using(var stream = response.GetResponseStream())
using(var reader = new StreamReader(stream))
{
string reply = reader.ReadToEnd();
Console.WriteLine("########## Server reply: {0}", reply);
}
}
static public void Main(string[] args)
{
DoRequest(); // works
DoRequest(); // explodes
}
}
}

Remove HTTP headers from a raw response

Let's say we make a request to a URL and get back the raw response, like this:
HTTP/1.1 200 OK
Date: Wed, 28 Apr 2010 14:39:13 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=e2bca72563dfffcc:TM=1272465553:LM=1272465553:S=ZN2zv8oxlFPT1BJG; expires=Fri, 27-Apr-2012 14:39:13 GMT; path=/; domain=.google.co.uk
Server: gws
X-XSS-Protection: 1; mode=block
Connection: close
<!doctype html><html><head>...</head><body>...</body></html>
What would be the best way to remove the HTTP headers from the response in C#? With regexes? Parsing it into some kind of HTTPResponse object and using only the body?
EDIT:
I'm using SOCKS to make the request; that's why I get the raw response.
Headers and body are separated by empty line. it is really easier to do it without RE. Just search for first empty line.
If you use HttpWebrequest class you get an HttpWebResponse object returned which in turn contains a collection of Headers. You can then remove them, parse them or do whatever you wish with them.
Note that using the substring method will leave you with a leading carriage return. I used this:
string HTTPHeaderDelimiter = "\r\n\r\n";
if (RawHTTPResponse.IndexOf("HTTP/1.1 200 OK") > -1)
{
HTTPPayload = RawHTTPResponse.Substring(RawHTTPResponse.IndexOf(HTTPHeaderDelimiter)+HTTPHeaderDelimiter.Length);
}
else
{
return;
}

Categories

Resources