Calculate percent of bytes read from compressed HTTP response stream

Calculate percent of bytes read from compressed HTTP response stream - c#

In my desktop application I process HTTP response from the server and I want to provide a progress of processed bytes relatively to the total response length.
I know the content length from the HTTP header but the problem is that a compression (gzip) was applied for the response body. I can calculate the number of processed bytes but they are after decompression and that number is different from the content-length. The HTTP response's Stream is not seekable and to determine the progress I cannot use its Position property as I will have a NotSupportedException, similar to what MSDN declares for NetworkStream.Position property.
Below is the code that reads from the gzipped response
HttpClient httpClient = new HttpClient();
HttpResponseMessage response = httpClient.GetAsync(gzipGetUri, HttpCompletionOption.ResponseHeadersRead).Result;
long? contentLength = response.Content.Headers.ContentLength;
long totalCount = 0;
int percentDone = 0;
Stream responseStream = response.Content.ReadAsStreamAsync().Result;
Stream gzipStream = new GZipStream(responseStream, CompressionMode.Decompress);
byte[] buffer = new byte[1024];
while (true)
{
int nBytes = gzipStream.Read(buffer, 0, buffer.Length);
if (nBytes <= 0)
break;
// process decompressed bytes here...
totalCount += nBytes;
// This code throws NotSupportedException: This stream does not support seek operations
percentDone = (int)(((double)responseStream.Position / contentLength) * 100);
}
Console.WriteLine($"content-length: {contentLength}; bytes read from gzipStream: {totalCount}");
The output to Console (when the line with calculation of percentDone is commented out) is
content-length: 1,316,578; bytes read from gzipStream: 9,410,402
My question is how I can determine the number of bytes that were consumed from a non-seekable response stream before they are transformed by decompression. Also I cannot use the count after decompression for percentDone calculation because I do not know the final number of decompressed bytes.
I guess that I can derive a class from Stream that counts passing through bytes, use it as a wrapper around responseStream and pass it as an inner stream to gzipStream but that solution seems too heavy.

Related

Read the response from a web service [duplicate]

I'm trying to get an image from an url using a byte stream. But i get this error message:
This stream does not support seek operations.
This is my code:
byte[] b;
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
WebResponse myResp = myReq.GetResponse();
Stream stream = myResp.GetResponseStream();
int i;
using (BinaryReader br = new BinaryReader(stream))
{
i = (int)(stream.Length);
b = br.ReadBytes(i); // (500000);
}
myResp.Close();
return b;
What am i doing wrong guys?

You probably want something like this. Either checking the length fails, or the BinaryReader is doing seeks behind the scenes.
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
WebResponse myResp = myReq.GetResponse();
byte[] b = null;
using( Stream stream = myResp.GetResponseStream() )
using( MemoryStream ms = new MemoryStream() )
{
int count = 0;
do
{
byte[] buf = new byte[1024];
count = stream.Read(buf, 0, 1024);
ms.Write(buf, 0, count);
} while(stream.CanRead && count > 0);
b = ms.ToArray();
}
edit:
I checked using reflector, and it is the call to stream.Length that fails. GetResponseStream returns a ConnectStream, and the Length property on that class throws the exception that you saw. As other posters mentioned, you cannot reliably get the length of a HTTP response, so that makes sense.

Use a StreamReader instead:
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
WebResponse myResp = myReq.GetResponse();
StreamReader reader = new StreamReader(myResp.GetResponseStream());
return reader.ReadToEnd();
(Note - the above returns a String instead of a byte array)

You can't reliably ask an HTTP connection for its length. It's possible to get the server to send you the length in advance, but (a) that header is often missing and (b) it's not guaranteed to be correct.
Instead you should:
Create a fixed-length byte[] that you pass to the Stream.Read method
Create a List<byte>
After each read, call List.AddRange to append the contents of your fixed-length buffer onto your byte list
Note that the last call to Read will return fewer than the full number of bytes you asked for. Make sure you only append that number of bytes onto your List<byte> and not the whole byte[], or you'll get garbage at the end of your list.

If the server doesn't send a length specification in the HTTP header, the stream size is unknown, so you get the error when trying to use the Length property.
Read the stream in smaller chunks, until you reach the end of the stream.

With images, you don't need to read the number of bytes at all. Just do this:
Image img = null;
string path = "http://www.example.com/image.jpg";
WebRequest request = WebRequest.Create(path);
req.Credentials = CredentialCache.DefaultCredentials; // in case your URL has Windows auth
WebResponse resp = req.GetResponse();
using( Stream stream = resp.GetResponseStream() )
{
img = Image.FromStream(stream);
// then use the image
}

Perhaps you should use the System.Net.WebClient API. If already using client.OpenRead(url) use client.DownloadData(url)
var client = new System.Net.WebClient();
byte[] buffer = client.DownloadData(url);
using (var stream = new MemoryStream(buffer))
{
... your code using the stream ...
}
Obviously this downloads everything before the Stream is created, so it may defeat the purpose of using a Stream. webClient.DownloadData("https://your.url") gets a byte array which you can then turn into a MemoryStream.

The length of a stream can not be read from the stream since the receiver does not know how many bytes the sender will send. Try to put a protocol on top of http and send i.e. the length as first item in the stream.

C# MemoryStream & GZipInputStream: Can't .Read more than 256 bytes

I'm having a problem with writing an uncompressed GZIP stream using SharpZipLib's GZipInputStream. I only seem to be able to get 256 bytes worth of data with the rest not being written to and left zeroed. The compressed stream (compressedSection) has been checked and all data is there (1500+ bytes). The snippet of the decompression process is below:
int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
msi.Write(compressedSection, 0, compressedSection.Length);
msi.Position = 0;
int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little endian value of uncompressed size into an integer
// SharpZipLib GZip method called
using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
{
using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
{
byte[] buffer = new byte[uncompressedIntSize];
decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read
outputStream.Write(buffer, 0, uncompressedIntSize);
using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
{
fs.Write(buffer, 0, buffer.Length);
fs.Close();
}
outputStream.Close();
}
decompressStream.Close();
So in this snippet:
1) The compressed section is passed in, ready to be decompressed.
2) The expected size of the uncompressed output (which is stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to integer. The header is removed earlier as it is not part of the compressed GZIP file.
3) SharpLibZip's GZIP stream is declared with the compressed file stream (msi) and a buffer equal to int uncompressedIntSize (have tested with a static value of 4096 as well).
4) I set up a MemoryStream to handle writing the output to a file as GZipInputStream doesn't have Read/Write; it takes the expected decompressed file size as the argument (capacity).
5) The Read/Write of the stream needs byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).
6) int GzipInputStream decompressStream uses .Read with the buffer as first argument, from offset 0, using the uncompressedIntSize as the count. Checking the arguments in here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data. The rest are zeroes.
It looks like the output of .Read is being throttled to 256 bytes but I'm not sure where. Is there something I've missed with the Streams, or is this a limitation with .Read?

You need to loop when reading from a stream; the lazy way is probably:
decompressStream.CopyTo(outputStream);
(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)
A more manual version (that respects an imposed length limit) would be:
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
int remaining = uncompressedIntSize, bytesRead;
while (remaining > 0 && // more to do, and making progress
(bytesRead = decompressStream.Read(
buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
{
outputStream.Write(buffer, 0, bytesRead);
remaining -= bytesRead;
}
if (remaining != 0) throw new EndOfStreamException();
}
finally
{
ArrayPool<byte>.Shared.Return(buffer);
}

The issue turned out to be an oversight I'd made earlier in the posted code:
The file I'm working with has 27 sections which are GZipped, but they each have a header which will break the Gzip decompression if the GZipInput stream hits any of them. When opening the base file, it was starting from the beginning (adjusted by 6 to avoid the first header) each time instead of going to the next post-head offset:
brg.BaseStream.Seek(6, SeekOrigin.Begin);
Instead of:
brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
This meant that the extracted compressed data was an amalgam of the first headerless section + part of the 2nd section along with its header. As the first section is 256 bytes long without its header, this part was being decompressed correctly by the GZipInput stream. But after that is 6-bytes of header which breaks it, resulting in the rest of the output being 00s.
There was no explicit error being thrown by the GZipInput stream when this happened, so I'd incorrectly assumed that the cause was the .Read or something in the stream retaining data from the previous pass. Sorry for the hassle.

Does GetResponseStream()).ReadToEnd() read only 182955 characters

I am doing an HTTP POST and getting a huge XML back in the response . I am seeing that the xml gets truncated at 182956 th charcater and hence I am not able to Deserialize the response . Is there a way I can read the entire content ? Thanks in advance for your help .
string myresponse = string.Empty();
HttpWebResponse httpmyResponse = (HttpWebResponse)myrequest.GetResponse();
response = new StreamReader(httpmyResponse.GetResponseStream()).ReadToEnd();
Content-Length: 444313
Content-Type: application/xml

SO post I referred to in comments might actually solve your problem. In particular if you set DefaultMaximumErrorResponseLength to bigger value it might help. Internally, here how ResponseStream is being created
private Stream MakeMemoryStream(Stream stream) {
// some code emitted here
SyncMemoryStream memoryStream = new SyncMemoryStream(0); // buffered Stream to save off data
try {
//
// Now drain the Stream
//
if (stream.CanRead) {
byte [] buffer = new byte[1024];
int bytesTransferred = 0;
int maxBytesToBuffer = (HttpWebRequest.DefaultMaximumErrorResponseLength == -1)?buffer.Length:HttpWebRequest.DefaultMaximumErrorResponseLength*1024;
while ((bytesTransferred = stream.Read(buffer, 0, Math.Min(buffer.Length, maxBytesToBuffer))) > 0)
{
memoryStream.Write(buffer, 0, bytesTransferred);
if(HttpWebRequest.DefaultMaximumErrorResponseLength != -1)
maxBytesToBuffer -= bytesTransferred;
}
}
memoryStream.Position = 0;
}
catch {
}
// some other code
return memoryStream;
}
Important members here are stream that is response stream, and memoryStream - that is response stream that you're going to get back as a result to a method call GetResposneStream(). As you can see, before reading stream, method sets maxBytesToBuffer equal to DefaultMaximumErrorResponseLength*1024, if DefaultMaximumErrorResponseLength is not equal to -1, otherwise to the length of buffer which is 1024. Then, in the while loop, it reads stream, and on each iteration decreases maxBytesToBuffer by amount of bytes read (maxBytesToBuffer -= bytesTransferred).
Now lets consider both cases
DefaultMaximumErrorResponseLength is -1, stream length is 444313. In this case maxBytesToBuffer will be equal to buffer.Length, which is 1024. So it will read 1024 bytes, as a result bytesTransferred will be 1024, after first iteration, maxBytesToBuffer will become 0 (because of maxBytesToBuffer -= bytesTransferred), so next time it will read 0 bytes, and exit while loop, so you will have only 1024 bytes read from your entire stream.
DefaultMaximumErrorResponseLength is 1024, stream length is 444313. In this case maxBytesToBuffer will be equal to DefaultMaximumErrorResponseLength*1024 = 1048576. Again entering while loop first time, it will read 1024 (because of Math.Min(buffer.Length, maxBytesToBuffer)). On each iteration it will decrease maxBytesToBuffer by 1024, so while loop can iterate at least 1024 times, each time reading 1024 bytes. After roughly 433 iterations (that is your content length 444313/1024 = 433.8) it should read all of your content in the stream.
Having said this, I would first check what's the value of DefaultMaximumErrorResponseLength and do the math (as I've done previously), and see if that is root cause of your problem or not.
Code was taken from MS Reference Source web site

BinaryReader reading different length of data depending on BufferSize

The issue is as follows, I am using an HttpWebRequest to request some online data from dmo.gov.uk. The response I am reading using a BinaryReader and writing to a MemoryStream. I have packaged the code being used into a simple test method:
public static byte[] Test(int bufferSize)
{
var request = (HttpWebRequest)WebRequest.Create("http://www.dmo.gov.uk/xmlData.aspx?rptCode=D3B.2");
request.Method = "GET";
request.Credentials = CredentialCache.DefaultCredentials;
var buffer = new byte[bufferSize];
using (var httpResponse = (HttpWebResponse)request.GetResponse())
{
using (var ms = new MemoryStream())
{
using (var reader = new BinaryReader(httpResponse.GetResponseStream()))
{
int bytesRead;
while ((bytesRead = reader.Read(buffer, 0, bufferSize)) > 0)
{
ms.Write(buffer, 0, bytesRead);
}
}
return ms.GetBuffer();
}
}
}
My real-life code uses a buffer size of 2048 bytes usually, however I noticed today that this file has a huge amount of empty bytes (\0) at the end which bloats the file size. As a test I tried increasing the buffer size to near-on the file size I expected (I was expecting ~80Kb so made the buffer size 79000) and now I get the right file size. But I'm confused, I expected to get the same file size regardless of the buffer size used to read the data.
The following test:
Console.WriteLine(Test(2048).Length);
Console.WriteLine(Test(79000).Length);
Console.ReadLine();
Yields the follwoing output:
131072
81341
The second figure, using the high buffer size is the exact file size I was expecting (This file changes daily, so expect that size to differ after today's date). The first figure contains \0 for everything after the file size expected.
What's going on here?

You should change ms.GetBuffer(); to ms.ToArray();.
GetBuffer will return the entire MemoryStream buffer while ToArray will return all the values inside the MemoryStream.

Error "This stream does not support seek operations" in C#

I'm trying to get an image from an url using a byte stream. But i get this error message:
This stream does not support seek operations.
This is my code:
byte[] b;
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
WebResponse myResp = myReq.GetResponse();
Stream stream = myResp.GetResponseStream();
int i;
using (BinaryReader br = new BinaryReader(stream))
{
i = (int)(stream.Length);
b = br.ReadBytes(i); // (500000);
}
myResp.Close();
return b;
What am i doing wrong guys?

You probably want something like this. Either checking the length fails, or the BinaryReader is doing seeks behind the scenes.
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
WebResponse myResp = myReq.GetResponse();
byte[] b = null;
using( Stream stream = myResp.GetResponseStream() )
using( MemoryStream ms = new MemoryStream() )
{
int count = 0;
do
{
byte[] buf = new byte[1024];
count = stream.Read(buf, 0, 1024);
ms.Write(buf, 0, count);
} while(stream.CanRead && count > 0);
b = ms.ToArray();
}
edit:
I checked using reflector, and it is the call to stream.Length that fails. GetResponseStream returns a ConnectStream, and the Length property on that class throws the exception that you saw. As other posters mentioned, you cannot reliably get the length of a HTTP response, so that makes sense.

Use a StreamReader instead:
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create(url);
WebResponse myResp = myReq.GetResponse();
StreamReader reader = new StreamReader(myResp.GetResponseStream());
return reader.ReadToEnd();
(Note - the above returns a String instead of a byte array)

You can't reliably ask an HTTP connection for its length. It's possible to get the server to send you the length in advance, but (a) that header is often missing and (b) it's not guaranteed to be correct.
Instead you should:
Create a fixed-length byte[] that you pass to the Stream.Read method
Create a List<byte>
After each read, call List.AddRange to append the contents of your fixed-length buffer onto your byte list
Note that the last call to Read will return fewer than the full number of bytes you asked for. Make sure you only append that number of bytes onto your List<byte> and not the whole byte[], or you'll get garbage at the end of your list.

If the server doesn't send a length specification in the HTTP header, the stream size is unknown, so you get the error when trying to use the Length property.
Read the stream in smaller chunks, until you reach the end of the stream.

With images, you don't need to read the number of bytes at all. Just do this:
Image img = null;
string path = "http://www.example.com/image.jpg";
WebRequest request = WebRequest.Create(path);
req.Credentials = CredentialCache.DefaultCredentials; // in case your URL has Windows auth
WebResponse resp = req.GetResponse();
using( Stream stream = resp.GetResponseStream() )
{
img = Image.FromStream(stream);
// then use the image
}

Perhaps you should use the System.Net.WebClient API. If already using client.OpenRead(url) use client.DownloadData(url)
var client = new System.Net.WebClient();
byte[] buffer = client.DownloadData(url);
using (var stream = new MemoryStream(buffer))
{
... your code using the stream ...
}
Obviously this downloads everything before the Stream is created, so it may defeat the purpose of using a Stream. webClient.DownloadData("https://your.url") gets a byte array which you can then turn into a MemoryStream.

The length of a stream can not be read from the stream since the receiver does not know how many bytes the sender will send. Try to put a protocol on top of http and send i.e. the length as first item in the stream.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Calculate percent of bytes read from compressed HTTP response stream - c#

Related

Read the response from a web service [duplicate]

C# MemoryStream & GZipInputStream: Can't .Read more than 256 bytes

Does GetResponseStream()).ReadToEnd() read only 182955 characters

BinaryReader reading different length of data depending on BufferSize

Error "This stream does not support seek operations" in C#

Categories

Resources