C# MemoryStream & GZipInputStream: Can't .Read more than 256 bytes

I'm having a problem writing out the uncompressed contents of a GZIP stream using SharpZipLib's GZipInputStream. I only seem to get 256 bytes worth of data, with the rest left zeroed. The compressed stream (compressedSection) has been checked and all the data is there (1500+ bytes). The snippet of the decompression process is below:
int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
    msi.Write(compressedSection, 0, compressedSection.Length);
    msi.Position = 0;
    int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little-endian value of uncompressed size into an integer

    // SharpZipLib GZip method called
    using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
    {
        using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
        {
            byte[] buffer = new byte[uncompressedIntSize];
            decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read
            outputStream.Write(buffer, 0, uncompressedIntSize);
            using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
            {
                fs.Write(buffer, 0, buffer.Length);
                fs.Close();
            }
            outputStream.Close();
        }
        decompressStream.Close();
    }
}
So in this snippet:
1) The compressed section is passed in, ready to be decompressed.
2) The expected size of the uncompressed output (which is stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to an integer. The header is removed earlier as it is not part of the compressed GZIP file.
3) SharpZipLib's GZipInputStream is declared with the compressed file stream (msi) and a buffer size equal to uncompressedIntSize (I have tested with a static value of 4096 as well).
4) I set up a MemoryStream to handle writing the output to a file, as GZipInputStream doesn't have Read/Write; it takes the expected decompressed file size as the argument (capacity).
5) The Read/Write of the stream needs a byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).
6) GZipInputStream's .Read is called with the buffer as the first argument, from offset 0, using uncompressedIntSize as the count. Checking the arguments in here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data. The rest are zeroes.
It looks like the output of .Read is being throttled to 256 bytes, but I'm not sure where. Is there something I've missed with the streams, or is this a limitation of .Read?
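For reference, decoding a 2-byte little-endian size field is just a matter of combining the two bytes; a minimal sketch of what a helper like AllMethods.GetLittleEndianInt presumably does (the real helper isn't shown in the question):

// Sketch: decode a 2-byte little-endian value starting at `offset`.
// `header` stands in for the raw section header bytes.
static int GetLittleEndianUInt16(byte[] header, int offset)
{
    return header[offset] | (header[offset + 1] << 8);
}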

You need to loop when reading from a stream; the lazy way is probably:
decompressStream.CopyTo(outputStream);
(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)
A more manual version (that respects an imposed length limit) would be:
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE); // requires: using System.Buffers;
try
{
    int remaining = uncompressedIntSize, bytesRead;
    while (remaining > 0 && // more to do, and making progress
           (bytesRead = decompressStream.Read(
               buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
    if (remaining != 0) throw new EndOfStreamException();
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}

The issue turned out to be an oversight I'd made earlier in the posted code:
The file I'm working with has 27 sections which are GZipped, but they each have a header which will break the GZip decompression if the GZipInputStream hits any of them. When opening the base file, it was seeking from the beginning (adjusted by 6 to skip the first header) each time instead of seeking to the next post-header offset:
brg.BaseStream.Seek(6, SeekOrigin.Begin);
Instead of:
brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
This meant that the extracted compressed data was an amalgam of the first headerless section plus part of the 2nd section along with its header. As the first section is 256 bytes long without its header, that part was being decompressed correctly by the GZipInputStream. But after that come 6 bytes of header, which breaks it, resulting in the rest of the output being 00s.
There was no explicit error thrown by the GZipInputStream when this happened, so I'd incorrectly assumed that the cause was the .Read or something in the stream retaining data from the previous pass. Sorry for the hassle.
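Put another way, the fix is to seek to each section's own absolute offset before extracting it. A minimal sketch, assuming hypothetical sectionOffsets and sectionLengths arrays holding the absolute offset and compressed length of each of the 27 sections:

// Sketch: extract each GZipped section from its own absolute offset.
// sectionOffsets / sectionLengths are hypothetical bookkeeping arrays.
for (int i = 0; i < sectionOffsets.Length; i++)
{
    long absoluteSectionOffset = sectionOffsets[i] + 6; // skip the 6-byte section header
    brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
    byte[] compressedSection = brg.ReadBytes(sectionLengths[i]);
    // ...decompress compressedSection as in the answer above...
}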

Related

Crypting part of a stream to another stream

My application needs to crypt parts of a stream to other streams, as some files have some parts encrypted with one key and others with other keys. To support this, I tried to write a method that crypts a part of a stream using an ICryptoTransform (yes, crypt, since an ICryptoTransform can be a decryptor or an encryptor) with a given offset and size, and it should be buffered.
This was my idea:
1) Open a buffer stream for the data chunks.
2) Open a CryptoStream, and pass the buffer stream into it.
3) Read and hopefully crypt it like this:
- Read a chunk (the size of bufferSize) from the input stream.
- Write that chunk into the crypto stream (which should write it to the buffer stream).
- Call .Flush() on that, to make sure the data has been crypted.
- Write the contents of the buffer stream to the output stream.
- Seek the buffer stream back to the beginning, so that a new chunk can be written and crypted in it.
- Repeat until offset + size has been reached on the input stream.
This is my current code:
public static void CryptStreamPartBuffered(Stream input, Stream output, ICryptoTransform transform, long offset, long size, int bufferSize = 4000000)
{
    using (MemoryStream ms = new MemoryStream()) // opening a memory stream, which will be the "buffer" for the crypted chunks
    {
        using (CryptoStream cs = new CryptoStream(ms, transform, CryptoStreamMode.Write)) // opening a CryptoStream on the "buffer" stream, to crypt its contents
        {
            input.Seek(offset, SeekOrigin.Begin); // seeking the input stream to the given start offset
            byte[] buffer = new byte[bufferSize];
            while (input.Position < offset + size)
            {
                int remaining = bufferSize, bytesRead;
                while (remaining > 0 && (bytesRead = input.Read(buffer, 0, Math.Min(remaining, bufferSize))) > 0)
                {
                    remaining -= bytesRead;
                    cs.Write(buffer); // writing the current chunk of data to the crypto stream
                    cs.Flush(); // making sure that the crypto stream has done its work on the current chunk (although I'm not really sure whether this is the right thing to do)
                    output.Write(ms.ToArray()); // writing the (hopefully) crypted data to the output stream
                    ms.Seek(0, SeekOrigin.Begin); // re-seeking the chunk stream to its beginning, so that when the next chunk gets crypted, it can be written there
                }
            }
        }
    }
}
This straight-up doesn't work for files smaller than the default buffer size. I know that can be worked around by manually passing a smaller buffer size when calling the method, but having it handled properly would be great.
More importantly, it doesn't seem to do its job very well: I get garbled data, and I feel like I'm doing something wrong here, probably some very obvious mistake I just can't figure out.
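No answer is recorded here, but two likely culprits stand out in the code above: the inner loop writes the entire buffer (cs.Write(buffer)) rather than only the bytesRead bytes actually read, and ms is only rewound with Seek, never truncated, so stale bytes from earlier chunks get written out again. A corrected sketch of the same idea (my reading of the bugs, not a verified fix; note that CryptoStream.Flush is effectively a no-op and the final partial block needs FlushFinalBlock):

// Sketch: crypt `size` bytes of `input`, starting at `offset`, into `output`.
// Same structure as the question's code, with the two suspected bugs addressed.
public static void CryptStreamPartBuffered(Stream input, Stream output,
    ICryptoTransform transform, long offset, long size, int bufferSize = 81920)
{
    input.Seek(offset, SeekOrigin.Begin);
    byte[] buffer = new byte[bufferSize];
    using (MemoryStream ms = new MemoryStream())
    using (CryptoStream cs = new CryptoStream(ms, transform, CryptoStreamMode.Write))
    {
        long remaining = size;
        int bytesRead;
        while (remaining > 0 &&
               (bytesRead = input.Read(buffer, 0, (int)Math.Min(remaining, buffer.Length))) > 0)
        {
            cs.Write(buffer, 0, bytesRead);                  // write only the bytes actually read
            output.Write(ms.GetBuffer(), 0, (int)ms.Length); // drain the completed blocks
            ms.SetLength(0);                                 // truncate, don't just rewind
            remaining -= bytesRead;
        }
        cs.FlushFinalBlock();                                // flush padding for the last partial block
        output.Write(ms.GetBuffer(), 0, (int)ms.Length);
    }
}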

Does GetResponseStream().ReadToEnd() read only 182955 characters

I am doing an HTTP POST and getting a huge XML document back in the response. I am seeing that the XML gets truncated at the 182,956th character, and hence I am not able to deserialize the response. Is there a way I can read the entire content? Thanks in advance for your help.
string myresponse = string.Empty;
HttpWebResponse httpmyResponse = (HttpWebResponse)myrequest.GetResponse();
myresponse = new StreamReader(httpmyResponse.GetResponseStream()).ReadToEnd();
Content-Length: 444313
Content-Type: application/xml
The SO post I referred to in the comments might actually solve your problem. In particular, setting DefaultMaximumErrorResponseLength to a bigger value might help. Internally, here is how the response stream is created:
private Stream MakeMemoryStream(Stream stream) {
    // some code omitted here
    SyncMemoryStream memoryStream = new SyncMemoryStream(0);  // buffered Stream to save off data
    try {
        //
        // Now drain the Stream
        //
        if (stream.CanRead) {
            byte[] buffer = new byte[1024];
            int bytesTransferred = 0;
            int maxBytesToBuffer = (HttpWebRequest.DefaultMaximumErrorResponseLength == -1)
                ? buffer.Length
                : HttpWebRequest.DefaultMaximumErrorResponseLength * 1024;
            while ((bytesTransferred = stream.Read(buffer, 0, Math.Min(buffer.Length, maxBytesToBuffer))) > 0)
            {
                memoryStream.Write(buffer, 0, bytesTransferred);
                if (HttpWebRequest.DefaultMaximumErrorResponseLength != -1)
                    maxBytesToBuffer -= bytesTransferred;
            }
        }
        memoryStream.Position = 0;
    }
    catch {
    }
    // some other code
    return memoryStream;
}
The important members here are stream, which is the raw response stream, and memoryStream, the stream you get back from the call to GetResponseStream(). As you can see, before reading the stream the method sets maxBytesToBuffer to DefaultMaximumErrorResponseLength * 1024 if DefaultMaximumErrorResponseLength is not -1, otherwise to the length of buffer, which is 1024. Then, in the while loop, it reads the stream and, whenever DefaultMaximumErrorResponseLength is not -1, decreases maxBytesToBuffer by the number of bytes read on each iteration (maxBytesToBuffer -= bytesTransferred).
Now let's consider both cases:
DefaultMaximumErrorResponseLength is -1, stream length is 444313. In this case maxBytesToBuffer will be equal to buffer.Length, which is 1024, and since the decrement is guarded by the != -1 check it is never reduced. Math.Min(buffer.Length, maxBytesToBuffer) is always 1024, so the loop simply reads 1024 bytes at a time until the stream is drained, and the whole response gets buffered.
DefaultMaximumErrorResponseLength is 1024, stream length is 444313. In this case maxBytesToBuffer will be equal to DefaultMaximumErrorResponseLength * 1024 = 1048576. Entering the while loop, it will read 1024 bytes at a time (because of Math.Min(buffer.Length, maxBytesToBuffer)), decreasing maxBytesToBuffer by 1024 on each iteration, so the loop can iterate up to 1024 times. After roughly 434 iterations (your content length: 444313 / 1024 = 433.9) it will have read all of the content in the stream. Truncation can only occur when DefaultMaximumErrorResponseLength * 1024 is smaller than the content length.
Having said this, I would first check the value of DefaultMaximumErrorResponseLength, do the math as above, and see whether that is the root cause of your problem.
Code was taken from MS Reference Source web site
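If the math shows you are hitting that cap, raising it before making the request is a one-liner. Going by the snippet above, the value is multiplied by 1024 (so it is measured in KB) and -1 removes the cap; a sketch:

// Assumption based on the reference-source snippet above:
// the value is in KB, and -1 removes the cap entirely.
HttpWebRequest.DefaultMaximumErrorResponseLength = 1024; // allow up to 1 MB
// or:
HttpWebRequest.DefaultMaximumErrorResponseLength = -1;   // no limit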

stream.read gives corrupted bytes

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream stream = response.GetResponseStream();
int sizeToRead = (int)response.ContentLength;
int sizeRead = 0;
int buffer = 1;
byte[] bytes = new byte[sizeToRead];
while (sizeToRead > 0)
{
    int rs = sizeToRead > buffer ? buffer : sizeToRead;
    stream.Read(bytes, sizeRead, rs);
    sizeToRead -= rs;
    sizeRead += rs;
}
stream.Close();
System.IO.File.WriteAllBytes("c:\\tmp\\b.mp3", bytes);
I have the above piece of code. Its purpose is to download an mp3 file from somewhere and save it to c:\tmp\filename, and it works perfectly.
However, if I change the buffer size to something other than 1, say 512, the downloaded mp3 file comes out scratchy. I have compared the file downloaded by my program with the one downloaded via the browser, and I found that some bytes of the mp3 file downloaded by my program are set to 0 (their file sizes are the same, though).
Besides, I have also used Fiddler to monitor the traffic while downloading the mp3 file with the above piece of code, and the bytes on the wire are the same as what the browser receives.
So I guess the problem is inside the stream reader or the reading process. Does anyone know why this happens, and how to solve it without setting the buffer size to 1?
Stream.Read returns an int that tells you how many bytes were actually read. If you're dealing with a stream you had better actually take in that information and act on it.
To put it another way, just because you asked for 2 bytes to be read, doesn't mean that your buffer contains 2 valid bytes.
If you need to retrieve a particular number of bytes (that you know of), then you should loop until you've obtained that number of bytes.
Is stream.Read() returning the same value as rs? Try this:
byte[] bytes = new byte[sizeToRead];
while (sizeToRead > 0)
{
    int rs = sizeToRead > buffer ? buffer : sizeToRead;
    rs = stream.Read(bytes, sizeRead, rs);
    sizeToRead -= rs;
    sizeRead += rs;
}
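One caveat worth adding (my note, not part of the original answer): if the connection ends early, Read returns 0 from then on and the loop above spins forever, because sizeToRead never reaches 0. A guarded variant of the same loop:

byte[] bytes = new byte[sizeToRead];
while (sizeToRead > 0)
{
    int rs = stream.Read(bytes, sizeRead, Math.Min(buffer, sizeToRead));
    if (rs == 0) // stream ended before ContentLength bytes arrived
        throw new EndOfStreamException("Response ended prematurely.");
    sizeToRead -= rs;
    sizeRead += rs;
}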

BinaryReader reading different length of data depending on BufferSize

The issue is as follows: I am using an HttpWebRequest to request some online data from dmo.gov.uk. I read the response using a BinaryReader and write it to a MemoryStream. I have packaged the code being used into a simple test method:
public static byte[] Test(int bufferSize)
{
    var request = (HttpWebRequest)WebRequest.Create("http://www.dmo.gov.uk/xmlData.aspx?rptCode=D3B.2");
    request.Method = "GET";
    request.Credentials = CredentialCache.DefaultCredentials;
    var buffer = new byte[bufferSize];
    using (var httpResponse = (HttpWebResponse)request.GetResponse())
    {
        using (var ms = new MemoryStream())
        {
            using (var reader = new BinaryReader(httpResponse.GetResponseStream()))
            {
                int bytesRead;
                while ((bytesRead = reader.Read(buffer, 0, bufferSize)) > 0)
                {
                    ms.Write(buffer, 0, bytesRead);
                }
            }
            return ms.GetBuffer();
        }
    }
}
My real-life code usually uses a buffer size of 2048 bytes; however, I noticed today that this file has a huge number of empty bytes (\0) at the end, which bloats the file size. As a test I tried increasing the buffer size to near the file size I expected (I was expecting ~80 KB, so made the buffer size 79000), and now I get the right file size. But I'm confused: I expected to get the same file size regardless of the buffer size used to read the data.
The following test:
Console.WriteLine(Test(2048).Length);
Console.WriteLine(Test(79000).Length);
Console.ReadLine();
Yields the following output:
131072
81341
The second figure, using the large buffer size, is the exact file size I was expecting (this file changes daily, so expect that size to differ after today's date). The first figure contains \0 for everything after the expected file size.
What's going on here?
You should change ms.GetBuffer(); to ms.ToArray();.
GetBuffer returns the MemoryStream's internal buffer, whose allocated capacity is usually larger than the number of bytes actually written, while ToArray returns a copy containing only the Length bytes that were written.
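A quick standalone illustration of the difference (my sketch, not from the original answer):

using System;
using System.IO;

class Demo
{
    static void Main()
    {
        var ms = new MemoryStream();
        ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

        Console.WriteLine(ms.ToArray().Length);   // 3: only the bytes written
        Console.WriteLine(ms.GetBuffer().Length); // internal capacity (e.g. 256): written bytes plus zero padding
    }
}

This also explains the 131072 figure above: with 2048-byte reads the MemoryStream grows by doubling its capacity, and 131072 is the first power of two above 81341, so GetBuffer returned 81341 real bytes followed by \0 padding.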

C# TCP file transfer - Images semi-transferred

I am developing a TCP file transfer client-server program. At the moment I am able to send text files and other file formats perfectly fine, such as .zip, with all contents intact on the server end. However, when I transfer a .gif, the end result is a gif with the same size as the original but with only part of the image showing, as if most of the bytes were lost or not written correctly on the server end.
The client sends a 1KB header packet with the name and size of the file to the server. The server then responds with OK if ready, and then creates a fileBuffer as large as the file to be sent.
Here is some code to demonstrate my problem:
// Server-side method snippet dealing with the data being received
while (true)
{
    // Spin the data in
    if (streams[0].DataAvailable)
    {
        streams[0].Read(fileBuffer, 0, fileBuffer.Length);
        break;
    }
}

// Finished receiving file, write from buffer to created file
FileStream fs = File.Open(LOCAL_FOLDER + fileName, FileMode.CreateNew, FileAccess.Write);
fs.Write(fileBuffer, 0, fileBuffer.Length);
fs.Close();
Print("File successfully received.");
// Client-side method snippet dealing with a file send
while (true)
{
    con.Read(ackBuffer, 0, ackBuffer.Length);
    // Wait for OK response to start sending
    if (Encoding.ASCII.GetString(ackBuffer) == "OK")
    {
        // Convert file to bytes
        FileStream fs = new FileStream(inPath, FileMode.Open, FileAccess.Read);
        fileBuffer = new byte[fs.Length];
        fs.Read(fileBuffer, 0, (int)fs.Length);
        fs.Close();
        con.Write(fileBuffer, 0, fileBuffer.Length);
        con.Flush();
        break;
    }
}
I've tried a BinaryWriter instead of just using the FileStream, with the same result.
Am I incorrect in believing successful file transfer to be as simple as conversion to bytes, transportation and then conversion back to filename/type?
All help/advice much appreciated.
It's not about your image; it's about your code.
If your image's bytes were lost or written incorrectly, that means your file transfer code is wrong, and a .zip or any other file received the same way could end up corrupted as well.
It's a huge mistake to set the byte buffer length to the file size. Imagine sending a large file of about 1 GB: it would take 1 GB of RAM. For an ideal transfer you should loop over the file while sending.
Here's a way to send/receive files nicely with no size limitation.
Send File
using (FileStream fs = new FileStream(srcPath, FileMode.Open, FileAccess.Read))
{
    long fileSize = fs.Length;
    long sum = 0; // sum here is the total of sent bytes
    int count = 0;
    byte[] data = new byte[1024 * 8]; // 8 KB buffer .. you might use a smaller size also
    while (sum < fileSize)
    {
        count = fs.Read(data, 0, data.Length);
        network.Write(data, 0, count);
        sum += count;
    }
    network.Flush();
}
Receive File
long fileSize = ...; // the file size you are going to receive (e.g. taken from the header packet)
using (FileStream fs = new FileStream(destPath, FileMode.Create, FileAccess.Write))
{
    int count = 0;
    long sum = 0; // sum here is the total of received bytes
    byte[] data = new byte[1024 * 8]; // 8 KB buffer .. you might use a smaller size also
    while (sum < fileSize)
    {
        count = network.Read(data, 0, data.Length); // Read blocks until data arrives
        fs.Write(data, 0, count);
        sum += count;
    }
}
happy coding :)
When you write over TCP, the data can arrive in a number of packets. I think your early tests happened to fit into one packet, but this gif file is arriving in two or more. So when you call Read, you'll only get what has arrived so far; you'll need to keep reading until you've got as many bytes as the header told you to expect.
I found Beej's guide to network programming a big help when doing some work with TCP.
As others have pointed out, the data doesn't necessarily all arrive at once, and your code is overwriting the beginning of the buffer each time through the loop. The more robust way to write your reading loop is to read as many bytes as are available and increment a counter to keep track of how many bytes have been read so far so that you know where to put them in the buffer. Something like this works well:
int totalBytesRead = 0;
int bytesRead;
do
{
    bytesRead = streams[0].Read(fileBuffer, totalBytesRead, fileBuffer.Length - totalBytesRead);
    totalBytesRead += bytesRead;
} while (bytesRead != 0);
Stream.Read will return 0 when there's no data left to read.
Doing things this way will perform better than reading a byte at a time. It also gives you a way to ensure that you read the proper number of bytes. If totalBytesRead is not equal to the number of bytes you expected when the loop is finished, then something bad happened.
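Since the 1KB header already tells the server how big the file is, a variant of that loop (my sketch, assuming expectedSize was parsed from the header) can read until exactly that many bytes have arrived instead of waiting for the sender to close the connection:

int totalBytesRead = 0;
while (totalBytesRead < expectedSize)
{
    int bytesRead = streams[0].Read(fileBuffer, totalBytesRead, expectedSize - totalBytesRead);
    if (bytesRead == 0) // connection closed before the full file arrived
        throw new EndOfStreamException("File transfer ended prematurely.");
    totalBytesRead += bytesRead;
}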
Thanks for your input, tvanfosson. I tinkered around with my code and managed to get it working; the synchronicity between my client and server was off. I took your advice, though, and replaced the one-shot read with a read loop.
