Setting HttpResponse.ContentEncoding to GZIP - c#

I have a small IHttpModule that's reading a POST request from another server and relaying it on. The response from the remote server has the header
Content-Encoding: gzip
How do I specify this in the HttpResponse I'm returning to the caller? HttpResponse.ContentEncoding is typed as a text encoding (System.Text.Encoding), so it expects something like UTF8.
context.Response.ContentEncoding = ???;
Should I ignore this and set the header manually?

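HttpResponse.ContentEncoding sets the character set of the output, not the Content-Encoding header, so yes, set the header manually. If you are modifying the response, you should decode and read the content, gzip the resulting value, and add the header to the response.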
// Compress everything written to the response and advertise it via the Content-Encoding header
context.Response.Filter = new System.IO.Compression.GZipStream(
context.Response.Filter,
System.IO.Compression.CompressionMode.Compress);
context.Response.AppendHeader("Content-Encoding", "gzip");
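If you are relaying the already-gzipped response unchanged, there is no need to decompress or recompress it; just pass the Content-Encoding: gzip header through to the caller (for example with AppendHeader, without installing the GZipStream filter).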
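To see what the filter actually produces, here is a minimal sketch outside ASP.NET (the class name GzipDemo is hypothetical). It compresses text with GZipStream the same way a stream wrapped by Response.Filter would, so you can check for the gzip magic bytes 0x1F 0x8B that a client expects once it sees Content-Encoding: gzip. If the bytes you relay are already gzipped, wrapping them in the filter again would compress them twice.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Compresses text the way a GZipStream installed as Response.Filter
    // compresses everything written to the response.
    public static byte[] Compress(string text)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        } // disposing the GZipStream flushes the last block and the gzip footer
        return output.ToArray();
    }

    // What the caller's HTTP client does after seeing Content-Encoding: gzip.
    public static string Decompress(byte[] body)
    {
        using (var gzip = new GZipStream(new MemoryStream(body), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```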

Related

gRPC Client Never Receives Response Headers

I want to send information about the current client version from the server in all responses to the callers.
I want to have this information in the HTTP header. But I am not able to read the headers in the client.
I call WriteResponseHeaderAsync in the server method or in an interceptor (I tried both). In Fiddler I can see that the header is present in the response headers.
But, I cannot read this header on the client or in the interceptor on the client. I tried everything.
My client code:
var result = AuthorizationClient.LoginAsync(loginRequest);
var responseHeaders = await result.ResponseHeadersAsync;
ResponseHeaders is always empty (responseHeaders.Count is 0). I am able to use trailers, but the right place for this information is an HTTP header.
Is it possible to read the response headers? Is it possible to read them in a client interceptor?
If yes, how?
I am using: C#, Grpc.AspNetCore.Web 2.51.0 (on server), Grpc.Net.Client.Web 2.51.0 (Client. Blazor WebAssembly)

Why does the Content-Type header get removed?

I am issuing a request like so:
And when inspecting what has been sent to my controller, it looks like the Content-Type header does not even make it there:
What am I doing wrong? Why is Content-Type header being ignored completely?
It is a bit confusing, but Content-Type is a content header, so it is not accessible from the general Request.Headers collection.
Pull it instead from the ContentType property of the Headers object on the request's Content:
var contentType = Request.Content.Headers.ContentType;
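A minimal sketch of the split between request headers and content headers, assuming System.Net.Http (in ASP.NET Web API, ApiController.Request is an HttpRequestMessage). The URL, JSON body and the ContentTypeDemo name are hypothetical; the point is that the Content-Type set on the StringContent appears under Content.Headers and not under Headers.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text;

public static class ContentTypeDemo
{
    public static HttpRequestMessage BuildRequest()
    {
        var request = new HttpRequestMessage(HttpMethod.Post, "http://example.com/api/values");
        // Content-Type describes the body, so it is attached to the HttpContent,
        // which stores it in request.Content.Headers.
        request.Content = new StringContent("{\"name\":\"test\"}", Encoding.UTF8, "application/json");
        return request;
    }

    public static bool ContentTypeInRequestHeaders(HttpRequestMessage request)
    {
        // Enumerating avoids Contains(), which older frameworks reject for content headers.
        return request.Headers.Any(h =>
            string.Equals(h.Key, "Content-Type", StringComparison.OrdinalIgnoreCase));
    }
}
```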

One specific site whose HTTP response (Hebrew) characters do not come properly encoded

The following has been puzzling me for a while now.
First of all, I have been scraping sites for a couple of months, Hebrew sites among them, and have had no problem whatsoever receiving Hebrew characters from the HTTP server.
For some reason, which I am very curious to sort out, the following site is an exception: I can't get the characters properly decoded. I tried emulating the working requests I make via Fiddler, but to no avail. My C# request headers look exactly the same, but the characters are still unreadable.
What I do not understand is why I have always been able to retrieve Hebrew characters from other sites, but not from this one. What setting is causing this?
Try the following sample out.
HttpClient httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0");
//httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html;q=0.9");
//httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.5");
//httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
var getTask = httpClient.GetStringAsync("http://winedepot.co.il/Default.asp?Page=Sale");
//doing it like this for the sake of the example
var contents = getTask.Result;
//add a breakpoint at the following line to check the contents of "contents"
Console.WriteLine();
As mentioned, such code works for every other Israeli site I try, say, the Ynet news site.
Update: while "debugging" with Fiddler, I found that the response from the Ynet site (the one that works) includes the header
Content-Type: text/html; charset=UTF-8
while this header is absent in the response from winedepot.co.il
I tried adding it, but it still made no difference.
var getTask = httpClient.GetAsync("http://www.winedepot.co.il");
var response = getTask.Result;
var contentObj = response.Content;
contentObj.Headers.Remove("Content-Type");
contentObj.Headers.Add("Content-Type", "text/html; charset=UTF-8");
var readTask = response.Content.ReadAsStringAsync();
var contents = readTask.Result;
Console.WriteLine();
The problem you're encountering is that the web server is lying about its content type, or rather, not being specific enough.
The first site responds with this header:
Content-Type: text/html; charset=UTF-8
The second one with this header:
Content-Type: text/html
This means that in the second case, your client will have to make assumptions about what encoding the text is actually in. To learn more about text encodings, please read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
And the built-in HTTP clients for .NET don't really do a great job at this, which is understandable, because it is a Hard Problem. Read the linked article for the trouble a web browser will have to go through in order to guess the encoding, and then try to understand why you don't want this logic in a programmable web client.
Now the sites do provide you with a <meta http-equiv="Content-Type" content="actual encoding here" /> tag, which is a nasty workaround for not having to properly configure a web server. When a browser encounters such a tag, it has to restart parsing the document with the specified encoding and hope it is correct.
The steps are roughly as follows, assuming an HTML payload:
1. Perform the web request and keep the response document in a binary buffer.
2. Inspect the Content-Type header, if present; if it isn't present or doesn't provide a charset, make an assumption about the encoding.
3. Read the response by decoding the buffer, and parse the resulting HTML.
4. When encountering a <meta http-equiv="Content-Type" /> tag, discard all decoded text and start again by interpreting the binary buffer as text encoded in the specified encoding.
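The steps above can be sketched for an already-buffered body as follows. This is a simplified illustration, not a full browser sniffing algorithm: the HtmlDecoding name and the regex are assumptions, ISO-8859-1 is used only as an ASCII-compatible first guess (any ASCII meta tag survives it), and no request is re-issued. On .NET Core / .NET 5+, a Windows-1255 meta charset is only recognised after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlDecoding
{
    // Matches <meta charset="..."> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // ASCII-compatible first guess used for step 3.
    private static readonly Encoding FirstGuess = Encoding.GetEncoding("iso-8859-1");

    private static Encoding TryGetEncoding(string name)
    {
        if (string.IsNullOrWhiteSpace(name)) return null;
        try { return Encoding.GetEncoding(name.Trim().Trim('"')); }
        catch (ArgumentException) { return null; } // unknown or unregistered charset
    }

    // body: the binary buffer from step 1; headerCharset: the charset from the
    // Content-Type response header, or null when the server sent none.
    public static string Decode(byte[] body, string headerCharset)
    {
        // Step 2: a charset in the Content-Type header wins.
        Encoding fromHeader = TryGetEncoding(headerCharset);
        if (fromHeader != null) return fromHeader.GetString(body);

        // Step 3: decode with the first guess so the markup can be inspected.
        string guessed = FirstGuess.GetString(body);

        // Step 4: if a meta tag declares a charset, decode the same buffer again.
        Match match = MetaCharset.Match(guessed);
        Encoding declared = match.Success ? TryGetEncoding(match.Groups[1].Value) : null;
        return declared != null ? declared.GetString(body) : guessed;
    }
}
```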
The C# HTTP clients stop at step 2, and rightfully so. They are HTTP clients, not HTML-displaying browsers. They don't care that your payload is HTML, JSON, XML, or any other textual format.
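When no charset is given in the Content-Type response header, the .NET HTTP clients fall back to a default: HttpWebResponse.CharacterSet reports ISO-8859-1 for text types, and HttpClient's ReadAsStringAsync decodes as UTF-8. Neither can display the characters from the Windows-1255 (Hebrew) character set the page is actually encoded in (ISO-8859-1 has different characters at the same code points, and Windows-1255 bytes are not valid UTF-8), which is also why forcing charset=UTF-8 made no difference.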
Some C# implementations that try to do encoding detection from the meta HTML element are provided in Encoding trouble with HttpWebResponse. I cannot vouch for their correctness, so you'll have to try it at your own risk. I do know that the currently highest-voted answer actually re-issues the request when it encounters the meta tag, which is quite silly, because there is no guarantee that the second response will be the same as the first, and it's just a waste of bandwidth.
You can also assume that you know the encoding used by a certain site or page, and force the reader to use it:
using (Stream resStream = response.GetResponseStream())
{
StreamReader reader = new StreamReader(resStream, YourFixedEncoding);
string content = reader.ReadToEnd();
}
Or, for HttpClient:
using (var client = new HttpClient())
{
var response = await client.GetAsync(url);
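// HttpClient has no ReadAsStreamAsync; the body is read from response.Content
var responseStream = await response.Content.ReadAsStreamAsync();
// On .NET Core / .NET 5+, code page 1255 needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first
using (var fixedEncodingReader = new StreamReader(responseStream, Encoding.GetEncoding(1255)))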
{
string responseString = fixedEncodingReader.ReadToEnd();
}
}
But assuming an encoding for a particular response, URL, or site is inherently unsafe: there is no guarantee that the assumption will be correct every time.

HttpWebRequest invalid contentlength

I need to get the content length in order to tell my app where the buffer ends. The problem is that HttpWebResponse.ContentLength returns -1 even though a Content-Length header is present in the response.
Then I thought I would read the actual header to find out the length. The Content-Length returned by the page I'm testing on is 1646. An HTTP sniffer claims that I received 1900 bytes, so I assume the difference is the header length. Then I copied the whole body from the response, pasted it into an online strlen site, and the body size is actually 1850!
How is this possible? Why does the response return an invalid Content-Length, and why does HttpWebRequest.ContentLength return -1? How can I calculate the actual response length before receiving the response itself?
EDIT:
This is the code I'm using to get the response:
using (System.IO.Stream responseStream = hwresponse.GetResponseStream())
{
using (MemoryStream memoryStream = new MemoryStream())
{
int count = 0;
do
{
count = responseStream.Read(buffer, 0, buffer.Length);
TCP_R.SendBytes(buffer);
} while (count != 0);
}
}
byte[] PACKET_END_IDENTIFIER = { 0x8, 0x01, 0x8, 0x1, 0x8 };
TCP_R.SendBytes(PACKET_END_IDENTIFIER);
TCP_R.Close();
I have a proxy server application that takes a request and sends it to another application (my client); the client executes the request and returns the result using the TCP_R class. When the server gets the response from the client, it returns it to the browser.
Every time I make a request, I get all the data plus some extra garbage. Here's an example:
<tag1><tag2><tag3> ag3>
ag3> is the garbage data; it's as if the end of the buffer is cut off and added again. It appears that the client sends a valid response and the garbage data is added in the onDataRecieve event. Any tips? Thanks!
-1 isn't an invalid value for the ContentLength property. I assume you mean the ContentLength property of the response is -1; asking the request what the length is would be nonsensical. Even so, it's perfectly valid:
The ContentLength property contains the value of the Content-Length header returned with the response. If the Content-Length header is not set in the response, ContentLength is set to the value -1.
If the body length is 1850, that suggests it's using chunked transfer encoding. But that should be transparent to you - just keep reading from the response stream until the end. If you're using .NET 4, it's dead easy - just create a MemoryStream and use Stream.CopyTo to copy the data to that MemoryStream.
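A minimal sketch of "keep reading until the end" when you cannot use CopyTo because the destination is not a Stream (for example the TCP_R.SendBytes call in the question). The StreamRelay name and the send delegate are hypothetical stand-ins for that sender. After each Read, only the first count bytes of the buffer are valid; sending the whole buffer, and sending once more after the final zero-byte read as a do/while loop does, repeats stale bytes from the previous chunk, which would produce a repeated tail like the "ag3>" shown above. When the destination is a Stream, source.CopyTo(destination) does the same job.

```csharp
using System;
using System.IO;

public static class StreamRelay
{
    // Forwards source in chunks; send is a hypothetical stand-in for TCP_R.SendBytes.
    public static void Relay(Stream source, Action<byte[]> send, int bufferSize = 4096)
    {
        byte[] buffer = new byte[bufferSize];
        int count;
        // Stop as soon as Read returns 0 (end of stream); nothing is sent for that read.
        while ((count = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy only the bytes actually read in this iteration.
            byte[] chunk = new byte[count];
            Array.Copy(buffer, chunk, count);
            send(chunk);
        }
    }
}
```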

If a browser shows an Accept-Encoding of deflate, can it handle .NET gzipped responses?

I'm looking at this method in this HTTPCombiner:
private bool CanGZip(HttpRequest request)
{
string acceptEncoding = request.Headers["Accept-Encoding"];
if (!string.IsNullOrEmpty(acceptEncoding) &&
(acceptEncoding.Contains("gzip") || acceptEncoding.Contains("deflate")))
return true;
return false;
}
If this returns true then the response is compressed using a GZipStream. Is this right?
Those are two different algorithms:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5
Some code here:
http://www.singular.co.nz/blog/archive/2008/07/06/finding-preferred-accept-encoding-header-in-csharp.aspx
So, according to the protocol, it is not right: if the browser says "give me the content using deflate", you shouldn't send it back gzipped.
GZip (which is based on Deflate) and Deflate are two different algorithms, so a request for "deflate" should definitely not return gzipped content.
However, this should be easy to fix, by simply using a GZipStream if the accept header contains "gzip" and a DeflateStream for "deflate".
Both are included in System.IO.Compression, so it's not like you'd have to code your own deflate algorithm or use a third party implementation.
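A minimal sketch of that fix, choosing the stream from the Accept-Encoding header instead of using GZipStream whenever either token appears. The CompressionNegotiation name is hypothetical, and the parsing is deliberately simple: gzip is preferred when both are offered, a coding with q=0 is treated as refused, other q-values are not ranked, and "*" is ignored. One caveat worth checking for your clients: the HTTP "deflate" coding is specified as zlib-wrapped data, while DeflateStream writes raw deflate; on .NET 6+ ZLibStream produces the zlib wrapper, though many browsers also accept the raw form.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.IO.Compression;

public static class CompressionNegotiation
{
    // Returns "gzip", "deflate", or null when neither is acceptable.
    public static string Choose(string acceptEncoding)
    {
        if (string.IsNullOrEmpty(acceptEncoding)) return null;
        bool gzip = false, deflate = false;
        foreach (string part in acceptEncoding.Split(','))
        {
            string[] pieces = part.Split(';');
            string coding = pieces[0].Trim().ToLowerInvariant();
            bool refused = false;
            for (int i = 1; i < pieces.Length; i++)
            {
                string param = pieces[i].Trim();
                // "q=0" means the client explicitly does not accept this coding.
                if (param.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                    double.TryParse(param.Substring(2), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out double q) &&
                    q <= 0)
                {
                    refused = true;
                }
            }
            if (refused) continue;
            if (coding == "gzip") gzip = true;
            else if (coding == "deflate") deflate = true;
        }
        return gzip ? "gzip" : deflate ? "deflate" : null;
    }

    // Wraps the output stream in the compressor that matches the chosen coding.
    public static Stream Wrap(Stream output, string coding)
    {
        switch (coding)
        {
            case "gzip": return new GZipStream(output, CompressionMode.Compress);
            case "deflate": return new DeflateStream(output, CompressionMode.Compress);
            default: return output;
        }
    }
}
```

The chosen token is also the value to send back in the Content-Encoding response header.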
Most browsers understand both GZip and Deflate, and tell the server so by sending the request header Accept-Encoding: gzip, deflate. When both are present, HTTPCombiner gives preference to GZip; it sends Deflate-compressed content only if the browser requests Deflate alone.
