I have a very simple gzip method:
public byte[] Compress(string input)
{
    var bytes = Encoding.UTF8.GetBytes(input);
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream())
    using (var gz = new GZipStream(mso, CompressionMode.Compress))
    {
        msi.CopyTo(gz);
        return mso.ToArray();
    }
}
However, unit tests fail. Even a simple short string doesn't get gzipped properly, e.g. "this is a test" becomes a byte array with 10 elements: [31,139,8,0,0,0,0,0,4,0], which of course doesn't ungzip properly. What's going wrong here? This has come straight from MSDN!
You need to close the GZipStream so that it flushes its compressed output. At the point you call mso.ToArray(), the GZipStream hasn't compressed anything yet and is waiting for more data; the 10 bytes you see are just the gzip header.
A simple solution:
public byte[] Compress(string input)
{
    var bytes = Encoding.UTF8.GetBytes(input);
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream())
    {
        using (var gz = new GZipStream(mso, CompressionMode.Compress))
        {
            msi.CopyTo(gz);
        }
        return mso.ToArray();
    }
}
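A round trip makes a handy unit test for this fix. The sketch below is only an illustration: the `GZipHelper` class and its `Decompress` counterpart are hypothetical names that do not appear in the original post.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipHelper
{
    public static byte[] Compress(string input)
    {
        var bytes = Encoding.UTF8.GetBytes(input);
        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream())
        {
            // Disposing the GZipStream writes the remaining compressed data
            // and the gzip trailer into mso.
            using (var gz = new GZipStream(mso, CompressionMode.Compress))
            {
                msi.CopyTo(gz);
            }
            return mso.ToArray();
        }
    }

    // Hypothetical counterpart, added only to verify the round trip.
    public static string Decompress(byte[] compressed)
    {
        using (var msi = new MemoryStream(compressed))
        using (var gz = new GZipStream(msi, CompressionMode.Decompress))
        using (var mso = new MemoryStream())
        {
            gz.CopyTo(mso);
            return Encoding.UTF8.GetString(mso.ToArray());
        }
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] compressed = GZipHelper.Compress("this is a test");
        // More than the bare 10-byte header should come back now.
        Console.WriteLine(compressed.Length > 10);
        Console.WriteLine(GZipHelper.Decompress(compressed));
    }
}
```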
Related
I am working on an existing system where data is stored in a compressed byte array in a database.
The existing data has all been compressed using GZipDotNet.dll.
I am trying to switch to using the gzip functions in System.IO.Compression.
When I use:
public static byte[] DeCompressByteArray(byte[] inArray)
{
    byte[] outStream = null;
    outStream = GZipDotNet.GZip.Uncompress(inArray);
    return outStream;
}
It works fine but:
public static byte[] DeCompressByteArray(byte[] inArray)
{
    byte[] outStream = null;
    using (var compressedStream = new MemoryStream(inArray))
    using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (var resultStream = new MemoryStream())
    {
        zipStream.CopyTo(resultStream);
        outStream = resultStream.ToArray();
    }
    return outStream;
}
Gives a response of:
The magic number in GZip header is not correct. Make sure you are passing in a GZip stream
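That message means the data does not begin with the two gzip magic bytes, 0x1F 0x8B (31, 139). A quick diagnostic is to look at the first bytes of a stored value. The sketch below uses a hypothetical `GZipProbe.HasGZipHeader` helper; it only tells you whether the data starts with a gzip header, not which format GZipDotNet actually wrote.

```csharp
using System;

public static class GZipProbe
{
    // True when the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool HasGZipHeader(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] gzipLike = { 31, 139, 8, 0, 0, 0, 0, 0, 4, 0 };
        byte[] other = { 0x78, 0x9C, 0x01 }; // illustrative non-gzip bytes (a zlib-style header)
        Console.WriteLine(GZipProbe.HasGZipHeader(gzipLike)); // True
        Console.WriteLine(GZipProbe.HasGZipHeader(other));    // False
    }
}
```

If the stored arrays fail this check, the existing data is probably in a different container format and would need converting or a different decoder.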
I have the following functions to gzip and unzip text content. However, the uncompressed output is incomplete: roughly the last tenth of the file (a long HTML file) is missing. What am I doing wrong?
private byte[] GZipString(string input)
{
    var bytes = Encoding.UTF8.GetBytes(input);
    using (var mso = new MemoryStream())
    using (var gz = new GZipStream(mso, CompressionMode.Compress))
    {
        gz.Write(bytes, 0, bytes.Length);
        return mso.ToArray();
    }
}
private string UnzipFile(string filename)
{
    var bytes = File.ReadAllBytes(filename);
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream())
    using (var gz = new GZipStream(msi, CompressionMode.Decompress))
    {
        gz.CopyTo(mso);
        return Encoding.UTF8.GetString(mso.ToArray());
    }
}
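The symptom is consistent with reading the MemoryStream too early: `mso.ToArray()` runs while `gz` is still open, so the last of the compressed data may not have been written yet. Here is a minimal sketch with the read moved after the dispose. The `GZipText` class and the byte-array `Unzip` overload are hypothetical, used here so the example needs no file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class GZipText
{
    public static byte[] GZipString(string input)
    {
        var bytes = Encoding.UTF8.GetBytes(input);
        using (var mso = new MemoryStream())
        {
            using (var gz = new GZipStream(mso, CompressionMode.Compress))
            {
                gz.Write(bytes, 0, bytes.Length);
            } // gz is disposed here, so the remaining data and trailer reach mso
            return mso.ToArray();
        }
    }

    // Hypothetical variant of UnzipFile that takes the bytes directly.
    public static string Unzip(byte[] bytes)
    {
        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream())
        using (var gz = new GZipStream(msi, CompressionMode.Decompress))
        {
            gz.CopyTo(mso);
            return Encoding.UTF8.GetString(mso.ToArray());
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for a long HTML file.
        string html = string.Concat(Enumerable.Repeat("<p>long html file</p>", 5000));
        string roundTripped = GZipText.Unzip(GZipText.GZipString(html));
        Console.WriteLine(roundTripped == html);
    }
}
```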
Following various samples I've been able to convert a memory stream to a compressed stream and then to a byte array to save in a database but I'm having trouble going the other way. Here's what I've got so far...
...
using (MemoryStream compressedStream = new MemoryStream()) {
    ...some code that builds the compressedStream for an undetermined
    number of byteArrays from a database
    using (MemoryStream uncompressedStream = new MemoryStream()) {
        // method 1
        using (GZipStream unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress)) {
            unzippedStream.CopyTo(uncompressedStream);
        }
        // method 2
        using (GZipStream unzippedStream = new GZipStream(uncompressedStream, CompressionMode.Decompress)) {
            compressedStream.CopyTo(unzippedStream);
        }
        ... do something with uncompressedStream
    }
}
Method 1 seems to follow the examples I see on here but causes an error "stream does not support writing"
Method 2 seems to make more sense but the uncompressed stream is always empty
P.S. Really what I would like to have is something simple like
MemoryStream compressed = GZipStream(uncompressed, Compress)
MemoryStream upcompressed = GZipStream(compressed, Decompress)
This code example works. The first part just produces a compressed byte array. The second part shows how the compressed stream can be built up in code, with as many writes as needed. Before decompressing, however, its Position must be set back to 0.
byte[] compressed;
string output;
using (var outStream = new MemoryStream()) {
    using (var tinyStream = new GZipStream(outStream, CompressionMode.Compress))
    using (var mStream = new MemoryStream(Encoding.UTF8.GetBytes("This is a test"))) {
        mStream.CopyTo(tinyStream);
    }
    compressed = outStream.ToArray();
}
using (var compressedStream = new MemoryStream()) {
    // can do multiple writes here to create the compressed stream
    compressedStream.Write(compressed, 0, compressed.Length);
    compressedStream.Flush();
    compressedStream.Position = 0;
    using (var unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (var uncompressedStream = new MemoryStream()) {
        unzippedStream.CopyTo(uncompressedStream);
        output = Encoding.UTF8.GetString(uncompressedStream.ToArray());
    }
}
Console.WriteLine(output);
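Something close to the two-liner wished for in the question can be had by wrapping the same steps in helpers. This is a sketch under assumptions: `StreamGZip`, `Compress` and `Decompress` are hypothetical names. Each helper returns a MemoryStream rewound to position 0, and `leaveOpen` is set so the streams stay usable after the GZipStream is disposed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamGZip
{
    public static MemoryStream Compress(Stream uncompressed)
    {
        var result = new MemoryStream();
        // leaveOpen: true keeps 'result' open after the GZipStream is disposed.
        using (var gz = new GZipStream(result, CompressionMode.Compress, true))
        {
            uncompressed.CopyTo(gz);
        }
        result.Position = 0; // rewind so the caller can read from the start
        return result;
    }

    public static MemoryStream Decompress(Stream compressed)
    {
        var result = new MemoryStream();
        using (var gz = new GZipStream(compressed, CompressionMode.Decompress, true))
        {
            gz.CopyTo(result);
        }
        result.Position = 0;
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        using (var source = new MemoryStream(Encoding.UTF8.GetBytes("This is a test")))
        using (var compressed = StreamGZip.Compress(source))
        using (var uncompressed = StreamGZip.Decompress(compressed))
        {
            Console.WriteLine(Encoding.UTF8.GetString(uncompressed.ToArray()));
        }
    }
}
```

The compressing GZipStream always wraps the destination and the decompressing one always wraps the source, which is the arrangement method 1 in the question uses.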
I'm using GZipStream to compress a string, and I've modified two different examples to see what works. The first code snippet, which is a heavily modified version of the example in the documentation, simply returns an empty string.
public static String CompressStringGzip(String uncompressed)
{
    String compressedString;
    // Convert the uncompressed source string to a stream stored in memory
    // and create the MemoryStream that will hold the compressed string
    using (MemoryStream inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)),
                        outStream = new MemoryStream())
    {
        using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
        {
            inStream.CopyTo(compress);
            StreamReader reader = new StreamReader(outStream);
            compressedString = reader.ReadToEnd();
        }
    }
    return compressedString;
}
When I debug it, all I can tell is that nothing is read from reader, which is why compressedString is empty. However, the second method I wrote, modified from a CodeProject snippet, is successful.
public static String CompressStringGzip3(String uncompressed)
{
    //Transform string to byte array
    String compressedString;
    byte[] uncompressedByteArray = Encoding.Unicode.GetBytes(uncompressed);
    using (MemoryStream outStream = new MemoryStream())
    {
        using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
        {
            compress.Write(uncompressedByteArray, 0, uncompressedByteArray.Length);
            compress.Close();
        }
        byte[] compressedByteArray = outStream.ToArray();
        StringBuilder compressedStringBuilder = new StringBuilder(compressedByteArray.Length);
        foreach (byte b in compressedByteArray)
            compressedStringBuilder.Append((char)b);
        compressedString = compressedStringBuilder.ToString();
    }
    return compressedString;
}
Why is the first code snippet not successful while the other one is? Even though they're slightly different, I don't know why the minor changes in the second snippet allow it to work. The sample string I'm using is SELECT * FROM foods f WHERE f.name = 'chicken';
I ended up using the following code for compression and decompression:
public static String Compress(String decompressed)
{
    byte[] data = Encoding.UTF8.GetBytes(decompressed);
    using (var input = new MemoryStream(data))
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            input.CopyTo(gzip);
        }
        return Convert.ToBase64String(output.ToArray());
    }
}
public static String Decompress(String compressed)
{
    byte[] data = Convert.FromBase64String(compressed);
    using (MemoryStream input = new MemoryStream(data))
    using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
    using (MemoryStream output = new MemoryStream())
    {
        gzip.CopyTo(output);
        return Encoding.UTF8.GetString(output.ToArray());
    }
}
The explanation for a part of the problem comes from this question. Although I fixed the problem by changing the code to what I included in this answer, these lines (in my original code):
foreach (byte b in compressedByteArray)
    compressedStringBuilder.Append((char)b);
are problematic, because as dlev aptly phrased it:
You are interpreting each byte as its own character, when in fact that is not the case. Instead, you need the line:
string decoded = Encoding.Unicode.GetString(compressedByteArray);
The basic problem is that you are converting to a byte array based on an encoding, but then ignoring that encoding when you retrieve the bytes.
Therefore, the problem is solved, and the new code I'm using is much more succinct than my original code.
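Compressed output is arbitrary binary data, so carrying it in a string calls for an encoding designed for bytes, which is why the code above uses Base64. The sketch below is illustrative only (the `BinaryToStringDemo` class is hypothetical, and the sample bytes are the start of a gzip header). It contrasts a Base64 round trip with decoding the same bytes as UTF-8 text.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryToStringDemo
{
    // Base64 maps every byte sequence to text and back without loss.
    public static bool Base64RoundTrips(byte[] data)
    {
        string s = Convert.ToBase64String(data);
        return Convert.FromBase64String(s).SequenceEqual(data);
    }

    // Decoding arbitrary bytes as UTF-8 replaces invalid sequences with U+FFFD,
    // so the original bytes cannot always be recovered.
    public static bool Utf8RoundTrips(byte[] data)
    {
        string s = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(s).SequenceEqual(data);
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] header = { 31, 139, 8, 0 }; // 139 (0x8B) is not a valid UTF-8 lead byte
        Console.WriteLine(BinaryToStringDemo.Base64RoundTrips(header)); // True
        Console.WriteLine(BinaryToStringDemo.Utf8RoundTrips(header));   // False
    }
}
```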
You need to move the code below outside the second using statement:
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
    inStream.CopyTo(compress);
    outStream.Position = 0;
    StreamReader reader = new StreamReader(outStream);
    compressedString = reader.ReadToEnd();
}
CopyTo() is not flushing the results to the underlying MemoryStream.
Update
It seems that GZipStream closes and disposes its underlying stream when it is disposed (not the way I would have designed the class). I've updated the sample above and tested it.
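That closing behaviour can be switched off with the constructor overload that takes a `leaveOpen` flag, which the Base64 Compress method earlier in this thread passes as `true`. A small sketch (the `LeaveOpenDemo` class is a hypothetical name) that reports whether the underlying MemoryStream is still readable after the GZipStream has been disposed:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LeaveOpenDemo
{
    public static bool UnderlyingReadableAfterDispose(bool leaveOpen)
    {
        var underlying = new MemoryStream();
        using (var gz = new GZipStream(underlying, CompressionMode.Compress, leaveOpen))
        {
            gz.WriteByte(42);
        }
        // CanRead is false once a MemoryStream has been closed.
        return underlying.CanRead;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(LeaveOpenDemo.UnderlyingReadableAfterDispose(false)); // False
        Console.WriteLine(LeaveOpenDemo.UnderlyingReadableAfterDispose(true));  // True
    }
}
```

MemoryStream.ToArray() still works on a closed stream, which is why the byte-array versions in this thread get away without the flag. Wrapping the stream in a StreamReader after the dispose does need it.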
So here's a strange one. I have this method to take a Base64-encoded deflated string and return the original data:
public static string Base64Decompress(string base64data)
{
    byte[] b = Convert.FromBase64String(base64data);
    using (var orig = new MemoryStream(b))
    {
        using (var inflate = new MemoryStream())
        {
            using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
            {
                ds.CopyTo(inflate);
                return Encoding.ASCII.GetString(inflate.ToArray());
            }
        }
    }
}
This returns an empty string unless I add a second call to ds.CopyTo(inflate). (WTF?)
...
using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
{
    ds.CopyTo(inflate);
    ds.CopyTo(inflate);
    return Encoding.ASCII.GetString(inflate.ToArray());
}
...
(Flush/Close/Dispose on ds have no effect.)
Why does the DeflateStream copy 0 bytes on the first call? I've also tried looping with Read(), but it also returns zero on the first call, then works on the second.
Update: here's the method I'm using to compress data.
public static string Base64Compress(string data, Encoding enc)
{
    using (var ms = new MemoryStream())
    {
        using (var ds = new DeflateStream(ms, CompressionMode.Compress))
        {
            byte[] b = enc.GetBytes(data);
            ds.Write(b, 0, b.Length);
            ds.Flush();
            return Convert.ToBase64String(ms.ToArray());
        }
    }
}
This happens when the compressed bytes are incomplete (i.e., not all blocks are written out).
If I use your Base64Compress with the following Decompress method I will get an InvalidDataException with the message 'Unknown block type. Stream might be corrupted.'
Decompress
public static string Decompress(Byte[] bytes)
{
    using (var uncompressed = new MemoryStream())
    using (var compressed = new MemoryStream(bytes))
    using (var ds = new DeflateStream(compressed, CompressionMode.Decompress))
    {
        ds.CopyTo(uncompressed);
        return Encoding.ASCII.GetString(uncompressed.ToArray());
    }
}
Note that everything works as expected when using the following Compress method
public Byte[] Compress(Byte[] bytes)
{
    using (var memoryStream = new MemoryStream())
    {
        using (var deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress))
            deflateStream.Write(bytes, 0, bytes.Length);
        return memoryStream.ToArray();
    }
}
Update
Oops, foolish me... you cannot call ToArray on the memory stream until you dispose the DeflateStream. Flush is actually not implemented, and Deflate/GZip compress data in blocks, so the final block is only written on close/dispose.
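One way to see this for yourself is to compare how many bytes have reached the MemoryStream before and after the DeflateStream is disposed. This is only a probe, with a hypothetical `DeflateProbe` class. The exact numbers depend on the runtime, but the count after the dispose should be the larger one.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DeflateProbe
{
    // Returns { bytes in ms before dispose, bytes in ms after dispose }.
    public static long[] LengthsBeforeAndAfterDispose(byte[] data)
    {
        var ms = new MemoryStream();
        long before;
        // leaveOpen: true so ms.Length can still be read after the dispose.
        using (var ds = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            ds.Write(data, 0, data.Length);
            before = ms.Length;
        }
        long after = ms.Length;
        return new[] { before, after };
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("this is a test");
        long[] lengths = DeflateProbe.LengthsBeforeAndAfterDispose(data);
        Console.WriteLine("before dispose: " + lengths[0]);
        Console.WriteLine("after dispose:  " + lengths[1]);
    }
}
```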
Re-write compress as:
public static string Base64Compress(string data, Encoding enc)
{
    using (var ms = new MemoryStream())
    {
        using (var ds = new DeflateStream(ms, CompressionMode.Compress))
        {
            byte[] b = enc.GetBytes(data);
            ds.Write(b, 0, b.Length);
        }
        return Convert.ToBase64String(ms.ToArray());
    }
}
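A quick round trip confirms that a single CopyTo is then enough on the decompressing side. In this sketch the `Base64Deflate` class is a hypothetical wrapper, and `Base64Decompress` takes an `Encoding` parameter (an assumption; the question's version hard-codes ASCII) so that both directions use the same encoding.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Base64Deflate
{
    public static string Base64Compress(string data, Encoding enc)
    {
        using (var ms = new MemoryStream())
        {
            using (var ds = new DeflateStream(ms, CompressionMode.Compress))
            {
                byte[] b = enc.GetBytes(data);
                ds.Write(b, 0, b.Length);
            } // final block is written when ds is disposed
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    public static string Base64Decompress(string base64data, Encoding enc)
    {
        byte[] b = Convert.FromBase64String(base64data);
        using (var orig = new MemoryStream(b))
        using (var inflate = new MemoryStream())
        using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
        {
            ds.CopyTo(inflate); // a single call suffices once the input is complete
            return enc.GetString(inflate.ToArray());
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string packed = Base64Deflate.Base64Compress("this is a test", Encoding.UTF8);
        Console.WriteLine(Base64Deflate.Base64Decompress(packed, Encoding.UTF8));
    }
}
```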