How to get the file size from http headers - c#

I want to get the size of an http:/.../file before I download it. The file can be a webpage, image, or a media file. Can this be done with HTTP headers? How do I download just the file HTTP header?

Yes, assuming the HTTP server you're talking to supports/allows this:
public long GetFileSize(string url)
{
long result = -1;
System.Net.WebRequest req = System.Net.WebRequest.Create(url);
req.Method = "HEAD";
using (System.Net.WebResponse resp = req.GetResponse())
{
if (long.TryParse(resp.Headers.Get("Content-Length"), out long ContentLength))
{
result = ContentLength;
}
}
return result;
}
If the HEAD method is not allowed, or the Content-Length header is missing from the server's reply, the only way to determine the size of the content on the server is to download it. Since downloading a whole file just to learn its size is hardly practical, most servers include this information.

Can this be done with HTTP headers?
Yes, this is the way to go. If the information is provided, it's in the header as the Content-Length. Note, however, that this is not necessarily the case.
Downloading only the header can be done using a HEAD request instead of GET. Maybe the following code helps:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://example.com/");
req.Method = "HEAD";
long len;
using(HttpWebResponse resp = (HttpWebResponse)(req.GetResponse()))
{
len = resp.ContentLength;
}
Notice the property for the content length on the HttpWebResponse object – no need to parse the Content-Length header manually.
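The same HEAD request can also be issued with HttpClient. The following is only a sketch under that assumption; the class and method names (HeadSizeExample, ReadContentLength, GetSizeViaHeadAsync) are made up for illustration and are not part of the answers above.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class HeadSizeExample
{
    // Reads the length declared on a response; null means no length is available.
    public static long? ReadContentLength(HttpResponseMessage response)
    {
        return response.Content?.Headers.ContentLength;
    }

    // Sends a HEAD request (headers only, no body) and returns the declared size.
    public static async Task<long?> GetSizeViaHeadAsync(HttpClient client, string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Head, url))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return ReadContentLength(response);
        }
    }
}
```

Unlike HttpWebResponse.ContentLength, which reports -1 when the header is missing, the HttpClient property is a nullable long, so a missing header shows up as null.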

Note that not every server accepts HTTP HEAD requests. One alternative approach to get the file size is to make an HTTP GET call to the server requesting only a portion of the file to keep the response small and retrieve the file size from the metadata that is returned as part of the response content header.
The standard System.Net.Http.HttpClient can be used to accomplish this. The partial content is requested by setting a byte range on the request message header as:
request.Headers.Range = new RangeHeaderValue(startByte, endByte)
The server responds with a message containing the requested range as well as the entire file size. This information is returned in the response content headers (response.Content.Headers) under the key "Content-Range".
Here's an example of the content range in the response message content header:
{
"Key": "Content-Range",
"Value": [
"bytes 0-15/2328372"
]
}
In this example the header value implies the response contains bytes 0 to 15 (i.e., 16 bytes total) and the file is 2,328,372 bytes in its entirety.
Here's a sample implementation of this method:
public static class HttpClientExtensions
{
public static async Task<long> GetContentSizeAsync(this System.Net.Http.HttpClient client, string url)
{
using (var request = new System.Net.Http.HttpRequestMessage(System.Net.Http.HttpMethod.Get, url))
{
// In order to keep the response as small as possible, set the requested byte range to [0,0] (i.e., only the first byte)
request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(from: 0, to: 0);
using (var response = await client.SendAsync(request))
{
response.EnsureSuccessStatusCode();
if (response.StatusCode != System.Net.HttpStatusCode.PartialContent)
throw new System.Net.WebException($"expected partial content response ({System.Net.HttpStatusCode.PartialContent}), instead received: {response.StatusCode}");
var contentRange = response.Content.Headers.GetValues(@"Content-Range").Single();
var lengthString = System.Text.RegularExpressions.Regex.Match(contentRange, @"(?<=^bytes\s[0-9]+\-[0-9]+/)[0-9]+$").Value;
return long.Parse(lengthString);
}
}
}
}
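As a possible alternative to the regular expression, System.Net.Http.Headers.ContentRangeHeaderValue can parse the header value directly; on a live response the same object is available as response.Content.Headers.ContentRange. The helper below is a sketch with illustrative names (ContentRangeExample, TotalLength), not part of the original answer.

```csharp
using System.Net.Http.Headers;

public static class ContentRangeExample
{
    // Returns the total size from a Content-Range value such as
    // "bytes 0-15/2328372", or null when the value cannot be parsed
    // or the server reports the total as unknown ("bytes 0-15/*").
    public static long? TotalLength(string contentRange)
    {
        return ContentRangeHeaderValue.TryParse(contentRange, out var parsed)
            ? parsed.Length
            : null;
    }
}
```

For the example header shown earlier, TotalLength("bytes 0-15/2328372") yields 2328372.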

using (WebClient webClient = new WebClient())
using (webClient.OpenRead("http://stackoverflow.com/robots.txt")) // ResponseHeaders is only populated once the stream has been opened
{
long totalSizeBytes = Convert.ToInt64(webClient.ResponseHeaders["Content-Length"]);
Console.WriteLine(totalSizeBytes);
}

HttpClient client = new HttpClient(
new HttpClientHandler() {
Proxy = null, UseProxy = false
} // avoids the delay caused by automatic proxy detection, if you do not use a proxy
);
public async Task<long?> GetContentSizeAsync(string url) {
// ResponseHeadersRead returns as soon as the headers arrive instead of buffering the whole body
using (HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
return response.Content.Headers.ContentLength;
}

Related

c# JSON REST response via 3 different approaches (WebRequest, RESTSharp, HttpClient) is empty, but Postman and browser works

When calling a specific endpoint in C# which works without issues in Postman (or via Firefox), I'm getting an empty response.
The url I'm calling is returning a collection of data. In the url parameters I can specify how much of said data I want.
I've inspected the response size in Postman, and when I limit the amount of data requested in my C# call such that the response is around 700kb, then I get a JSON response back.
However, if I exceed this size in the C# call, then the response is empty '{ }' and the ContentLength returned = -1 (the statusCode returned is 200, so this seems fine at least). This same request which fails in C# works fine within Postman and Firefox however...
I somehow suspect this occurs because either the deserializer's buffer is not big enough OR because the response is still in transit and the code somehow continues executing before it has read the whole response body...
See below for the 3 implementations which I've tested:
1:
var httpClient = new HttpClient();
var responseMessage = await httpClient.GetAsync(requestUrl, HttpCompletionOption.ResponseHeadersRead);
if (responseMessage.StatusCode == System.Net.HttpStatusCode.OK)
{
using (var httpStream = await responseMessage.Content.ReadAsStreamAsync())
{
using (var sr = new StreamReader(httpStream))
{
Info(await sr.ReadToEndAsync()); //Info logs the string to a file
}
}
}
2 (RESTSharp):
var client = new RestClient();
var request = new RestRequest(requestUrl, Method.GET, DataFormat.Json);
var response = client.Execute(request); // the request has to be executed to obtain a response
Info(response.Content); //Info logs the string to a file
3:
var httpWebRequest = (HttpWebRequest)WebRequest.Create(requestUrl);
httpWebRequest.ContentType = "application/json; charset=utf-8";
var httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse;
var responseStream = httpWebResponse.GetResponseStream();
var binReader = new BinaryReader(responseStream);
const int bufferSize = 4096;
byte[] responseBytes;
using (MemoryStream ms = new MemoryStream())
{
byte[] buffer = new byte[bufferSize];
int count;
while ((count = binReader.Read(buffer, 0, buffer.Length)) != 0)
{
ms.Write(buffer, 0, count);
}
responseBytes = ms.ToArray();
}
Info(Encoding.UTF8.GetString(responseBytes, 0, responseBytes.Length)); //Info logs the string to a file
I'm not modifying the HttpClient.MaxResponseContentBufferSize property, but for good measure I've also tried changing this value, to no avail.
How can I resolve this?
I've found the issue: it was caused by the backend service I was connecting to. Thank you again, @Panagiotis Kanavos.

How to set content-md5 header in GET method using HttpClient?

I have the following code to set content-md5 in my GET method request using HttpClient
httpClient.DefaultRequestHeaders.TryAddWithoutValidation("content-md5", "value");
I cannot use HttpRequestMessage content to set it because it's not a POST method. When using Postman it works like a charm but fails when using HttpClient.GetAsync.
The client requests an HMAC from the server as follows:
{
"content_to_hash": "my content"
}
The server responds like this:
{
"content_md5": "88af7ceab9fdafb76xxxxx",
"date": "Sat, 02 May 2020 00:13:16 +0700",
"hmac_value": "WfHgFyT792IENmK8Mqz9LysmP8ftOP00qA="
}
Now I have to make a GET request using that HMAC, and this is where the problem lies, because I cannot set the header on an HttpClient GET request.
From reading the HttpClient and related source code, there's no way you can get around this and add the header to the actual request object headers. There is an internal list of invalid headers, which includes any Content-* headers. It has to be on a content object.
Therefore, my suggested solution is to create your own content object:
public class NoContentMd5 : HttpContent
{
protected override Task SerializeToStreamAsync(Stream stream, TransportContext context)
{
return Task.CompletedTask;
}
protected override bool TryComputeLength(out long length)
{
length = 0;
return false;
}
public NoContentMd5(byte[] contentMd5)
{
this.Headers.ContentMD5 = contentMd5;
}
public NoContentMd5(string contentMd5)
{
this.Headers.TryAddWithoutValidation("Content-MD5", contentMd5);
}
}
This will add the Content-MD5 header with a value of your choosing, but the request won't contain a body.
The next problem you'll encounter is that you're trying to make a GET request with content, which isn't supported by the helper client.GetAsync(...) method. You'll have to make your own request object and use client.SendAsync(...) instead:
HttpClient client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost/my/test/uri");
request.Content = new NoContentMd5("d41d8cd98f00b204e9800998ecf8427e");
var result = await client.SendAsync(request);
Note that if you have your Content-MD5 hash as bytes, I've also added a constructor to NoContentMd5 for byte[] too.
The only potential issue with this is that it includes a Content-Length: 0 header. Hopefully that's OK with the API you're working with.
There's an alternative solution described in this answer to a question with a similar issue. I'd argue against using it, since it is vulnerable to changes in the implementation details of HttpRequestHeaders (it uses reflection, so if MS changes the code, it might break).
Aside from the fact that it's not considered good practice to send a body with GET request (see HTTP GET with request body), you can try this:
using (var content = new StringContent(string.Empty))
using (var request = new HttpRequestMessage
{
Method = HttpMethod.Get,
RequestUri = new Uri("http://localhost"),
Content = content
})
{
// Content-* headers are rejected on request.Headers, so the header is set on the content object
content.Headers.TryAddWithoutValidation("Content-MD5", "value");
using (var response = await httpClient.SendAsync(request))
{
response.EnsureSuccessStatusCode();
}
}
UPDATE:
The proper way would be to set the ContentMD5 property on the HttpContentHeaders, for example:
content.Headers.ContentMD5 = Convert.FromBase64String(hashString);
But as you pointed out in the comments, trying to send content in a GET request causes an error.
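For context, HttpContentHeaders.ContentMD5 takes the raw 16-byte digest and writes it to the wire as base64, which is why the line above decodes a base64 string. A minimal sketch of producing both forms, assuming the hash is computed locally over a UTF-8 string (the class and method names are illustrative):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentMd5Example
{
    // Computes the raw 16-byte MD5 digest of a UTF-8 string; this is the
    // byte[] form that HttpContentHeaders.ContentMD5 expects.
    public static byte[] ComputeMd5(string text)
    {
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(text));
        }
    }

    // Base64 form of the digest, as it appears in a Content-MD5 header.
    public static string ToHeaderValue(byte[] digest)
    {
        return Convert.ToBase64String(digest);
    }
}
```

If the server hands out the hash as a hex string instead, it would first have to be converted to bytes before being assigned to ContentMD5.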

C# httpclient request size/bandwidth

Basically, I'm sending an HTTP POST request using HttpClient. The total response is roughly 60 KB, but I only need to read the response URL to determine the outcome. Is there any way I can read just the response URL rather than the entire data?
Example of the code I'm currently using
string URI = "http://example.com";
var client = new HttpClient();
var response = await client.PostAsync(URI, null); // PostAsync requires a content argument; null sends an empty body
var content = await response.Content.ReadAsStringAsync();
return content;
What this does is return the body content of "example.com", but I later realised I don't need to read the body to determine the outcome; I only need the response URL.
I assume this would drastically reduce the amount of data transferred if I could get the response URL of the POST request without receiving the body or other content.
Try using HttpCompletionOption.ResponseHeadersRead with the appropriate overload of the SendAsync method, and rewrite your code like this:
var request = new HttpRequestMessage(HttpMethod.Post, url);
var response = await _client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
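The snippet above stops before the URL is actually read. Assuming "response URL" means the URI the response was finally served from (after any automatic redirects), it is usually available as response.RequestMessage.RequestUri. A hedged sketch, with illustrative names (ResponseUrlExample, GetResponseUri, PostAndGetResponseUriAsync):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ResponseUrlExample
{
    // The URI recorded on the response by the handler; null if none was recorded.
    public static Uri GetResponseUri(HttpResponseMessage response)
    {
        return response.RequestMessage?.RequestUri;
    }

    public static async Task<Uri> PostAndGetResponseUriAsync(HttpClient client, string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, url))
        // ResponseHeadersRead completes once the headers arrive; the body is not buffered.
        using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
        {
            return GetResponseUri(response);
        }
    }
}
```

Disposing the response without reading the content means the remainder of the body is simply not consumed by your code.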

.NETCore HttpWebRequest - Old Way isn't Working

Before I upgraded to the newest .NET Core, I was able to create an HttpWebRequest, add the headers and content type, and pull the stream of the JSON file from Twitch. Since the upgrade this no longer works: I receive a WebException each time I try to get the response stream. Nothing has changed on the Twitch side, because it still works with the old bot. The old code is below:
private const string Url = "https://api.twitch.tv/kraken/streams/channelname";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
request.Method = "GET";
request.Timeout = 12000;
request.ContentType = "application/vnd.twitchtv.v5+json";
request.Headers.Add("Client-ID", "ID");
try
{
using (var s = request.GetResponse().GetResponseStream())
{
if (s != null)
using (var sr = new StreamReader(s))
{
}
}
}
catch (WebException)
{
// the WebException described above is raised by GetResponse()
}
I have done some research and found that I may need to start using either HttpClient or HttpRequestMessage. I have tried this, but when I add the content type header the program halts and exits after the first line here (when using HttpRequestMessage):
request.Content.Headers.ContentType.MediaType = "application/vnd.twitchtv.v5+json";
request.Content.Headers.Add("Client-ID", "rbp1au0xk85ej6wac9b8s1a1amlsi5");
You are trying to add a ContentType header, but what you really want is to add an Accept header (your request is a GET and ContentType is used only on requests which contain a body, e.g. POST or PUT).
In .NET Core, HttpClient is the recommended way to do this, but remember that to use it correctly you need async and await.
Here is an example:
using System.Net.Http;
using System.Net.Http.Headers;
private const string Url = "https://api.twitch.tv/kraken/streams/channelname";
public static async Task<string> GetResponseFromTwitch()
{
using(var client = new HttpClient())
{
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/vnd.twitchtv.v5+json"));
client.DefaultRequestHeaders.Add("Client-ID", "MyId");
using(var response = await client.GetAsync(Url))
{
response.EnsureSuccessStatusCode();
return await response.Content.ReadAsStringAsync(); // here we return the json response, you may parse it
}
}
}

Getting the Redirected URLs from the Original URL [duplicate]

Using the WebClient class I can get the title of a website easily enough:
WebClient x = new WebClient();
string source = x.DownloadString(s);
string title = Regex.Match(source,
@"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
RegexOptions.IgnoreCase).Groups["Title"].Value;
I want to store the URL and the page title. However when following a link such as:
http://tinyurl.com/dbysxp
I'm clearly going to want to get the Url I'm redirected to.
QUESTIONS
Is there a way to do this using the WebClient class?
How would I do it using HttpResponse and HttpRequest?
If I understand the question, it's much easier than people are saying - if you want to let WebClient do all the nuts and bolts of the request (including the redirection), but then get the actual response URI at the end, you can subclass WebClient like this:
class MyWebClient : WebClient
{
Uri _responseUri;
public Uri ResponseUri
{
get { return _responseUri; }
}
protected override WebResponse GetWebResponse(WebRequest request)
{
WebResponse response = base.GetWebResponse(request);
_responseUri = response.ResponseUri;
return response;
}
}
Just use MyWebClient everywhere you would have used WebClient. After you've made whatever WebClient call you needed to do, then you can just use ResponseUri to get the actual redirected URI. You'd need to add a similar override for GetWebResponse(WebRequest request, IAsyncResult result) too, if you were using the async stuff.
I know this question has already been answered, but this works pretty well for me:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://tinyurl.com/dbysxp");
request.AllowAutoRedirect = false;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string redirUrl = response.Headers["Location"];
response.Close();
//Show the redirected url
MessageBox.Show("You're being redirected to: "+redirUrl);
Cheers.! ;)
With an HttpWebRequest, you would set the AllowAutoRedirect property to false. When it is false, a response with a status code in the 300-399 range is returned to you instead of being followed automatically.
You can then get the new url from the response headers and then create a new HttpWebRequest instance to the new url.
With the WebClient class, I doubt you can change it out-of-the-box so that it does not allow redirects. What you could do is derive a class from the WebClient class and then override the GetWebRequest and the GetWebResponse methods to alter the WebRequest/WebResponse instances that the base implementation returns; if it is an HttpWebRequest, then set the AllowAutoRedirect property to false. On the response, if the status code is in the range of 300-399, then issue a new request.
However, I don't know that you can issue a new request from within the GetWebRequest/GetWebResponse methods, so it might be better to just have a loop that executes with HttpWebRequest/HttpWebResponse until all the redirects are followed.
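The loop described above can be sketched as follows. Note that this swaps in HttpRequestMessage/HttpResponseMessage rather than HttpWebRequest, and that the send parameter is a hypothetical hook (for example r => client.SendAsync(r, HttpCompletionOption.ResponseHeadersRead) on an HttpClient whose handler has AllowAutoRedirect = false). All names are illustrative.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RedirectLoopExample
{
    // Follows 3xx responses by hand and returns the last URI reached.
    public static async Task<Uri> ResolveFinalUriAsync(
        Func<HttpRequestMessage, Task<HttpResponseMessage>> send,
        Uri start,
        int maxRedirects = 10)
    {
        var current = start;
        for (int hop = 0; hop <= maxRedirects; hop++)
        {
            using (var request = new HttpRequestMessage(HttpMethod.Get, current))
            using (var response = await send(request))
            {
                int status = (int)response.StatusCode;
                Uri location = response.Headers.Location;
                if (status < 300 || status > 399 || location == null)
                    return current; // not a redirect: this is the final URI

                // Location may be relative; resolve it against the current URI.
                current = location.IsAbsoluteUri ? location : new Uri(current, location);
            }
        }
        throw new InvalidOperationException("Too many redirects.");
    }
}
```

The maxRedirects cap is an assumption added here to guard against redirect cycles; the value 10 is arbitrary.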
I got the Uri for the redirected page and the page contents.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
request.AllowAutoRedirect = true;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
strLastRedirect = response.ResponseUri.ToString();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();
response.Close();
In case you are only interested in the redirect URI you can use this code:
public static string GetRedirectUrl(string url)
{
HttpWebRequest request = (HttpWebRequest) HttpWebRequest.Create(url);
request.AllowAutoRedirect = false;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
return response.Headers["Location"];
}
}
The method will return
null - if there is no redirect
the value of the Location header, which may be a relative URL - if there is a redirect
Please note: The using statement (or a final response.Close()) is essential. See MSDN Library for details. Otherwise you may run out of connections or get a timeout when executing this code multiple times.
HttpWebRequest.AllowAutoRedirect can be set to false. Then you'd have to manually handle HTTP status codes in the 300 range.
// Create a new HttpWebRequest Object to the mentioned URL.
HttpWebRequest myHttpWebRequest=(HttpWebRequest)WebRequest.Create("http://www.contoso.com");
myHttpWebRequest.MaximumAutomaticRedirections=1;
myHttpWebRequest.AllowAutoRedirect=true;
HttpWebResponse myHttpWebResponse=(HttpWebResponse)myHttpWebRequest.GetResponse();
The WebClient class has an option to follow redirects. Set that option and you should be fine.
Ok this is really hackish, but the key is to use the HttpWebRequest and then set the AllowAutoRedirect property to true.
Here's a VERY hacked together example
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://tinyurl.com/dbysxp");
req.Method = "GET";
req.AllowAutoRedirect = true;
WebResponse response = req.GetResponse();
Stream responseStream = response.GetResponseStream();
// Content-Length header is not trustable, but makes a good hint.
// Responses longer than int size will throw an exception here!
int length = (int)response.ContentLength;
const int bufSizeMax = 65536; // max read buffer size conserves memory
const int bufSizeMin = 8192; // min size prevents numerous small reads
// Use Content-Length if between bufSizeMax and bufSizeMin
int bufSize = bufSizeMin;
if (length > bufSize)
bufSize = length > bufSizeMax ? bufSizeMax : length;
StringBuilder sb;
// Allocate buffer and StringBuilder for reading response
byte[] buf = new byte[bufSize];
sb = new StringBuilder(bufSize);
// Read response stream until end
while ((length = responseStream.Read(buf, 0, buf.Length)) != 0)
sb.Append(Encoding.UTF8.GetString(buf, 0, length));
string source = sb.ToString();
string title = Regex.Match(source,
@"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>", RegexOptions.IgnoreCase).Groups["Title"].Value;
