I'm about to lose my freaking mind. I've been trying to get GZipStream to compress a string for the past hour, but for whatever reason, it refuses to write the entire byte array to the memory stream. At first I thought it had something to do with the using statements, but even after removing them it didn't seem to make a difference.
Initial config:
var str = "Here is a relatively simple string to compress";
byte[] compressedBytes;
string returnedData;
var bytes = Encoding.UTF8.GetBytes(str);
Works correctly (writes 64 length byte array):
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
msi.CopyTo(gs);
}
compressedBytes = mso.ToArray();
}
Fails (writes 10 length byte array):
using(var mso = new MemoryStream())
using(var msi = new MemoryStream(bytes))
using(var zip = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(zip);
compressedBytes = mso.ToArray();
}
Also fails (writes 10 length byte array):
var mso = new MemoryStream();
var msi = new MemoryStream(bytes);
var zip = new GZipStream(mso, CompressionMode.Compress);
msi.CopyTo(zip);
compressedBytes = mso.ToArray();
Can somebody explain why the first one works but in the other two I'm getting these incomplete arrays? Is something getting disposed out from under me? For that matter, is there a way for me to avoid using two memorystreams?
Thanks,
Zoombini
System.IO.Compression.GZipStream has to be closed (disposed) before you can use the underlying stream, because:
It is block-oriented, so some of the data may still be sitting in its buffer.
It has to write the footer, including the checksum (see the file format description on Wikipedia).
You're trying to get the compressed data before the GZipStream is closed. This doesn't return the full data, as you've seen. The first one works because you're calling compressedBytes = mso.ToArray(); after the GZipStream has been disposed. So, untested but in theory, you should be able to modify your second code slightly like this to get it to work.
using(var mso = new MemoryStream())
{
using(var msi = new MemoryStream(bytes))
using(var zip = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(zip);
}
compressedBytes = mso.ToArray();
}
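To see the effect for yourself, here is a minimal sketch that records the MemoryStream length while the GZipStream is still open and compares it with the length after disposal. It assumes .NET 5 or later (top-level statements); the variable names lengthBeforeDispose and roundTripped are just illustrative, and the exact byte counts can differ between runtimes, so none are claimed here.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

byte[] input = Encoding.UTF8.GetBytes("Here is a relatively simple string to compress");

long lengthBeforeDispose;
byte[] compressed;

using (var mso = new MemoryStream())
{
    using (var zip = new GZipStream(mso, CompressionMode.Compress))
    {
        zip.Write(input, 0, input.Length);
        // Still inside the using block: at minimum the gzip footer
        // (checksum and length) has not been written to mso yet.
        lengthBeforeDispose = mso.Length;
    }
    // zip has been disposed here, so mso now holds the complete gzip data.
    compressed = mso.ToArray();
}

Console.WriteLine($"before dispose: {lengthBeforeDispose} bytes, after: {compressed.Length} bytes");

// Round trip to confirm the complete array decompresses to the original text.
string roundTripped;
using (var msi = new MemoryStream(compressed))
using (var unzip = new GZipStream(msi, CompressionMode.Decompress))
using (var result = new MemoryStream())
{
    unzip.CopyTo(result);
    roundTripped = Encoding.UTF8.GetString(result.ToArray());
}

Console.WriteLine(roundTripped);
```

The only thing that differs from the failing versions is where the length is read relative to the end of the GZipStream's using block.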
As others have said, you need to close the GZipStream before you can get the full data. A using statement will cause the Dispose method to be called on the stream at the end of the block, which will close the stream if it is not already closed. All of your examples above will work as expected if you place zip.Close(); after msi.CopyTo(zip);.
You can eliminate one of the MemoryStreams if you write it this way:
using (MemoryStream mso = new MemoryStream())
{
using (GZipStream zip = new GZipStream(mso, CompressionMode.Compress))
{
zip.Write(bytes, 0, bytes.Length);
}
compressedBytes = mso.ToArray();
}
I have a very simple gzip method:
public byte[] Compress(string input)
{
var bytes = Encoding.UTF8.GetBytes(input);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
using (var gz = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(gz);
return mso.ToArray();
}
}
However, unit tests fail. Even passing in a simple short string doesn't get gzipped properly, e.g. "this is a test" becomes a byte array with 10 elements: [31,139,8,0,0,0,0,0,4,0], which of course doesn't ungzip properly. What's going wrong here? This has come straight from MSDN!
You need to close the stream for it to finish compressing. At the point you call mso.ToArray(), the GZipStream hasn't compressed anything yet and is waiting for more data.
A simple solution:
public byte[] Compress(string input)
{
var bytes = Encoding.UTF8.GetBytes(input);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
{
using (var gz = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(gz);
}
return mso.ToArray();
}
}
Following various samples I've been able to convert a memory stream to a compressed stream and then to a byte array to save in a database but I'm having trouble going the other way. Here's what I've got so far...
...
using (MemoryStream compressedStream = new MemoryStream()) {
...some code that builds the compressedStream for an undetermined
number of byteArrays from a database
using (MemoryStream uncompressedStream = new MemoryStream()) {
// method 1
using (GZipStream unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress)) {
unzippedStream.CopyTo(uncompressedStream);
}
// method 2
using (GZipStream unzippedStream = new GZipStream(uncompressedStream, CompressionMode.Decompress)) {
compressedStream.CopyTo(unzippedStream);
}
... do something with uncompressedStream
}
}
Method 1 seems to follow the examples I see on here but causes an error "stream does not support writing"
Method 2 seems to make more sense but the uncompressed stream is always empty
P.S. Really what I would like to have is something simple like
MemoryStream compressed = GZipStream(uncompressed, Compress)
MemoryStream upcompressed = GZipStream(compressed, Decompress)
This code example works. The first part just produces a compressed byte array. The second part shows how the compressed stream can be built in code; you can write to it multiple times, but you must set its Position back to 0 before decompressing.
byte[] compressed;
string output;
using (var outStream = new MemoryStream()) {
using (var tinyStream = new GZipStream(outStream, CompressionMode.Compress))
using (var mStream = new MemoryStream(Encoding.UTF8.GetBytes("This is a test"))) {
mStream.CopyTo(tinyStream);
}
compressed = outStream.ToArray();
}
using (var compressedStream = new MemoryStream()) {
// can do multiple writes here to create the compressed stream
compressedStream.Write(compressed, 0, compressed.Length);
compressedStream.Flush();
compressedStream.Position = 0;
using (var unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress))
using (var uncompressedStream = new MemoryStream()) {
unzippedStream.CopyTo(uncompressedStream);
output = Encoding.UTF8.GetString(uncompressedStream.ToArray());
}
}
Console.WriteLine(output);
If I compress some JSON text and write it to a file using a FileStream, I get the expected results. However, I do not want to write to disk. I simply want a MemoryStream of the compressed data.
Compression to FileStream:
string json = Resource1.json;
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (FileStream output = File.Create(@"C:\Users\roarker\Desktop\output.json.gz"))
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
}
}
Above works. Below, the output memory stream is length 10 and results in an empty .gz file.
string json = Resource1.json;
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (MemoryStream output = new MemoryStream())
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
byte[] bytes = output.ToArray();
}
}
EDIT:
Moving output.ToArray() outside the inner using clause seems to work. However, this closes the output stream for most usage, i.e.:
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (MemoryStream output = new MemoryStream())
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
}
WriteToFile(output);
}
where :
public static void WriteToFile(Stream stream)
{
using (FileStream output = File.Create(@"C:\Users\roarker\Desktop\output.json.gz"))
{
stream.CopyTo(output);
}
}
This will fail on stream.CopyTo because the stream has been closed. I know I could make a new Stream from the bytes of output.ToArray(), but why is this necessary? Why does ToArray() work when the stream is closed?
Final Edit:
Just needed to use the constructor of the GZipStream with the leaveOpen parameter.
You're calling ToArray() before you've closed the GZipStream... that means it hasn't had a chance to flush the final bits of its buffer. This is a common issue for compression and encryption streams, where closing the stream needs to write some final pieces of data. (Even calling Flush() explicitly won't help, for example.)
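For anyone landing here, a rough sketch of what that looks like, assuming .NET 5 or later (top-level statements). The json literal is a hypothetical stand-in for Resource1.json, and a second MemoryStream named destination stands in for the FileStream from WriteToFile so the sketch doesn't touch the disk.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

string json = "{\"name\":\"example\"}"; // hypothetical stand-in for Resource1.json

byte[] written;
bool readableAfterGzipDisposed;

using (var input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (var output = new MemoryStream())
{
    // leaveOpen: true tells GZipStream not to dispose 'output' along with itself.
    using (var compression = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
    {
        input.CopyTo(compression);
    }

    // The GZipStream is closed (footer written), but 'output' is still usable.
    readableAfterGzipDisposed = output.CanRead;

    // Rewind before reading, otherwise CopyTo starts at the end and copies nothing.
    output.Position = 0;

    using (var destination = new MemoryStream()) // stand-in for File.Create(...)
    {
        output.CopyTo(destination);
        written = destination.ToArray();
    }
}

Console.WriteLine($"output still readable: {readableAfterGzipDisposed}, copied {written.Length} bytes");
```

Note the rewind: even with leaveOpen, the position of output is at the end of the compressed data once the GZipStream is done writing.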
Just move the ToArray call:
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (MemoryStream output = new MemoryStream())
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
}
byte[] bytes = output.ToArray();
// Use bytes
}
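(Note that output will already have been disposed by the time you call ToArray, because disposing the GZipStream also disposes the stream it wraps, but that's okay: ToArray still works on a closed MemoryStream.)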
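As a small illustration of that last point, this sketch disposes a MemoryStream and then tries both ToArray and CopyTo on it. It assumes .NET 5 or later (top-level statements); the names snapshot and copyThrew are only for illustration.

```csharp
using System;
using System.IO;

var stream = new MemoryStream();
stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
stream.Dispose();

// ToArray is documented to work on a closed MemoryStream:
// it copies out of the internal buffer, which is still there.
byte[] snapshot = stream.ToArray();

// Stream operations such as CopyTo check whether the stream is open first.
bool copyThrew = false;
try
{
    stream.CopyTo(Stream.Null);
}
catch (ObjectDisposedException)
{
    copyThrew = true;
}

Console.WriteLine($"{snapshot.Length} bytes from ToArray, CopyTo threw: {copyThrew}");
```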
I am reading an unzipped binary file from disk like this:
string fn = @"c:\MyBinaryFile.DAT";
byte[] ba = File.ReadAllBytes(fn);
MemoryStream msReader = new MemoryStream(ba);
I now want to increase speed of I/O by using a zipped binary file. But how do I fit it into the above schema?
string fn = @"c:\MyZippedBinaryFile.GZ";
//Put something here
byte[] ba = File.ReadAllBytes(fn);
//Or here
MemoryStream msReader = new MemoryStream(ba);
What is the best way to achieve this, please?
I need to end up with a MemoryStream as my next step is to deserialize it.
You'd have to use a GZipStream on the content of your file.
So basically it should be like this:
string fn = @"c:\MyZippedBinaryFile.GZ";
byte[] ba = File.ReadAllBytes(fn);
using (MemoryStream msReader = new MemoryStream(ba))
using (GZipStream zipStream = new GZipStream(msReader, CompressionMode.Decompress))
{
// Read from zipStream instead of msReader
}
To account for the valid comment by flindenberg, you can also open the file directly without having to read the entire file into memory first:
string fn = @"c:\MyZippedBinaryFile.GZ";
using (FileStream stream = File.OpenRead(fn))
using (GZipStream zipStream = new GZipStream(stream, CompressionMode.Decompress))
{
// Read from zipStream instead of stream
}
You need to end up with a memory stream? No problem:
string fn = @"c:\MyZippedBinaryFile.GZ";
using (FileStream stream = File.OpenRead(fn))
using (GZipStream zipStream = new GZipStream(stream, CompressionMode.Decompress))
using (MemoryStream ms = new MemoryStream())
{
zipStream.CopyTo(ms);
ms.Seek(0, SeekOrigin.Begin); // don't forget to rewind the stream!
// Read from ms
}
I'm using GZipStream to compress a string, and I've modified two different examples to see what works. The first code snippet, which is a heavily modified version of the example in the documentation, simply returns an empty string.
public static String CompressStringGzip(String uncompressed)
{
String compressedString;
// Convert the uncompressed source string to a stream stored in memory
// and create the MemoryStream that will hold the compressed string
using (MemoryStream inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)),
outStream = new MemoryStream())
{
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
inStream.CopyTo(compress);
StreamReader reader = new StreamReader(outStream);
compressedString = reader.ReadToEnd();
}
}
return compressedString;
}
and when I debug it, all I can tell is that nothing is read from reader, which is why compressedString is empty. However, the second method I wrote, modified from a CodeProject snippet, is successful.
public static String CompressStringGzip3(String uncompressed)
{
//Transform string to byte array
String compressedString;
byte[] uncompressedByteArray = Encoding.Unicode.GetBytes(uncompressed);
using (MemoryStream outStream = new MemoryStream())
{
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
compress.Write(uncompressedByteArray, 0, uncompressedByteArray.Length);
compress.Close();
}
byte[] compressedByteArray = outStream.ToArray();
StringBuilder compressedStringBuilder = new StringBuilder(compressedByteArray.Length);
foreach (byte b in compressedByteArray)
compressedStringBuilder.Append((char)b);
compressedString = compressedStringBuilder.ToString();
}
return compressedString;
}
Why is the first code snippet not successful while the other one is? Even though they're slightly different, I don't know why the minor changes in the second snippet allow it to work. The sample string I'm using is SELECT * FROM foods f WHERE f.name = 'chicken';
I ended up using the following code for compression and decompression:
public static String Compress(String decompressed)
{
byte[] data = Encoding.UTF8.GetBytes(decompressed);
using (var input = new MemoryStream(data))
using (var output = new MemoryStream())
{
using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
{
input.CopyTo(gzip);
}
return Convert.ToBase64String(output.ToArray());
}
}
public static String Decompress(String compressed)
{
byte[] data = Convert.FromBase64String(compressed);
using (MemoryStream input = new MemoryStream(data))
using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
using (MemoryStream output = new MemoryStream())
{
gzip.CopyTo(output);
return Encoding.UTF8.GetString(output.ToArray());
}
}
The explanation for a part of the problem comes from this question. Although I fixed the problem by changing the code to what I included in this answer, these lines (in my original code):
foreach (byte b in compressedByteArray)
compressedStringBuilder.Append((char)b);
are problematic, because as dlev aptly phrased it:
You are interpreting each byte as its own character, when in fact that is not the case. Instead, you need the line:
string decoded = Encoding.Unicode.GetString(compressedByteArray);
The basic problem is that you are converting to a byte array based on an encoding, but then ignoring that encoding when you retrieve the bytes.
Therefore, the problem is solved, and the new code I'm using is much more succinct than my original code.
You need to move the code below outside the second using statement:
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
inStream.CopyTo(compress);
outStream.Position = 0;
StreamReader reader = new StreamReader(outStream);
compressedString = reader.ReadToEnd();
}
CopyTo() is not flushing the results to the underlying MemoryStream.
Update
Seems that GZipStream closes and disposes its underlying stream when it is disposed (not the way I would have designed the class). I've updated the sample above and tested it.