Programmatic compression/decompression to MemoryStream with GZipStream - c#

I built (based on a CodeProject article) a wrapper class (C#) to use a GZipStream to compress a MemoryStream. It compresses fine but doesn't decompress. I've looked at many other examples that have the same problem, and I feel like I'm following what's said, but I still get nothing when I decompress. Here are the compression and decompression methods:
public static byte[] Compress(byte[] bSource)
{
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress, true))
{
gzip.Write(bSource, 0, bSource.Length);
gzip.Close();
}
return ms.ToArray();
}
}
public static byte[] Decompress(byte[] bSource)
{
try
{
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Decompress, true))
{
gzip.Read(bSource, 0, bSource.Length);
gzip.Close();
}
return ms.ToArray();
}
}
catch (Exception ex)
{
throw new Exception("Error decompressing byte array", ex);
}
}
Here's an example of how I use it:
string sCompressed = Convert.ToBase64String(CompressionHelper.Compress(Encoding.UTF8.GetBytes("Some Text")));
// Other Processes
byte[] bReturned = CompressionHelper.Decompress(Convert.FromBase64String(sCompressed));
// bReturned has no elements after this line is executed

There is a bug in the Decompress method.
The code never reads the content of bSource. On the contrary, it overwrites that content while reading from an empty GZipStream created over an empty MemoryStream.
Basically, this is what your version of the code is doing:
//create empty memory stream
using (MemoryStream ms = new MemoryStream())
//create gzip stream over empty memory stream
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Decompress, true))
// read from the empty stream into bSource
gzip.Read(bSource, 0, bSource.Length);
The fix could look like this:
public static byte[] Decompress(byte[] bSource)
{
using (var inStream = new MemoryStream(bSource))
using (var gzip = new GZipStream(inStream, CompressionMode.Decompress))
using (var outStream = new MemoryStream())
{
gzip.CopyTo(outStream);
return outStream.ToArray();
}
}
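To check the fix end to end, here is a minimal round-trip sketch. The class name GzipRoundTrip and the RoundTrip helper are made up for illustration; the Compress and Decompress bodies follow the code above, and UTF-8 is an assumption about how the text should be encoded.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipRoundTrip
{
    public static byte[] Compress(byte[] source)
    {
        using (var outStream = new MemoryStream())
        {
            using (var gzip = new GZipStream(outStream, CompressionMode.Compress, true))
            {
                gzip.Write(source, 0, source.Length);
            } // disposing the GZipStream completes the compressed data
            return outStream.ToArray();
        }
    }

    public static byte[] Decompress(byte[] source)
    {
        // The compressed bytes are the INPUT stream; the result goes to a second stream.
        using (var inStream = new MemoryStream(source))
        using (var gzip = new GZipStream(inStream, CompressionMode.Decompress))
        using (var outStream = new MemoryStream())
        {
            gzip.CopyTo(outStream);
            return outStream.ToArray();
        }
    }

    // Hypothetical helper: text -> gzip -> Base64 -> gzip -> text.
    public static string RoundTrip(string text)
    {
        string base64 = Convert.ToBase64String(Compress(Encoding.UTF8.GetBytes(text)));
        byte[] restored = Decompress(Convert.FromBase64String(base64));
        return Encoding.UTF8.GetString(restored);
    }
}
```

Calling GzipRoundTrip.RoundTrip("Some Text") should give back the original string.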

The OP said in an edit, now rolled back:
Thanks to Alex's explanation of what was going wrong, I was able to fix the Decompress method. Unfortunately, I'm using .Net 3.5, so I wasn't able to implement the Stream.CopyTo method he suggested. With his explanation, though, I was able to figure out a solution. I made the appropriate changes to the Decompress method below.
public static byte[] Decompress(byte[] bSource)
{
try
{
using (var instream = new MemoryStream(bSource))
{
using (var gzip = new GZipStream(instream, CompressionMode.Decompress))
{
using (var outstream = new MemoryStream())
{
byte[] buffer = new byte[4096];
while (true)
{
int delta = gzip.Read(buffer, 0, buffer.Length);
if (delta == 0)
break; // Read returns 0 only at the end of the stream; a short read is not the end
outstream.Write(buffer, 0, delta);
}
return outstream.ToArray();
}
}
}
}
catch (Exception ex)
{
throw new Exception("Error decompressing byte array", ex);
}
}

Related

GZIPStream Compression Always Returns 10 Bytes

I'm trying to compress some text in my UWP application. I created this method to make it easier later on:
public static byte[] Compress(this string s)
{
var b = Encoding.UTF8.GetBytes(s);
using (MemoryStream ms = new MemoryStream())
using (GZipStream zipStream = new GZipStream(ms, CompressionMode.Compress))
{
zipStream.Write(b, 0, b.Length);
zipStream.Flush(); //Doesn't seem like Close() is available in UWP, so I changed it to Flush(). Is this the problem?
return ms.ToArray();
}
}
But unfortunately this always returns 10 bytes, no matter what the input text is. Is it because I don't use .Close() on the GZipStream?
You are returning the byte data too early.
Close() is replaced by Dispose() here, and the GZipStream only writes out its remaining data when it is disposed, that is, after you leave the using (GZipStream) { } block.
public static byte[] Compress(string s)
{
var b = Encoding.UTF8.GetBytes(s);
var ms = new MemoryStream();
using (GZipStream zipStream = new GZipStream(ms, CompressionMode.Compress))
{
zipStream.Write(b, 0, b.Length);
zipStream.Flush(); // optional; disposing the stream at the end of the using block is what completes the data
}
// we create the data array here once the GZIP stream has been disposed
var data = ms.ToArray();
ms.Dispose();
return data;
}
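A small sketch to see the effect for yourself: it records the MemoryStream length after Flush() and again after the GZipStream has been disposed. The class and method names are hypothetical, and the exact numbers depend on the runtime. The fixed part of a gzip header is 10 bytes, which is probably what the question was seeing.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipFlushDemo
{
    // Returns { length after Flush(), length after Dispose() }.
    public static long[] Measure(string s)
    {
        byte[] b = Encoding.UTF8.GetBytes(s);
        var ms = new MemoryStream();
        long beforeDispose;
        // leaveOpen: true keeps ms usable after the GZipStream is disposed
        using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
        {
            zip.Write(b, 0, b.Length);
            zip.Flush();
            beforeDispose = ms.Length;
        }
        long afterDispose = ms.Length; // now includes the final block and the trailer
        ms.Dispose();
        return new[] { beforeDispose, afterDispose };
    }
}
```

The second value should always be larger than the first, because the end of the gzip data is only written on dispose.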

GZipStream does gzipping but ungzipping file end up with "Unexpected end of data"

Does anyone know why I'm getting the "Unexpected end of data" error message when un-gzipping the gzip file?
To verify that the byte data is not corrupted, I wrote it to FooTest4.csv and was able to open that file successfully.
Both FooTest3.csv.gz and FooTest2.csv.gz ran into "Unexpected end of data" when un-gzipping.
public static List<byte> CompressFile(List<byte> parmRawBytes)
{
//Initialize variables...
List<byte> returnModifiedBytes = null;
File.WriteAllBytes(@"X:\FooTest4.csv", parmRawBytes.ToArray());
using (var memoryStream = new MemoryStream())
{
using (var gzipStream = new GZipStream(memoryStream, CompressionMode.Compress, false))
{
gzipStream.Write(parmRawBytes.ToArray(), 0, parmRawBytes.ToArray().Length);
gzipStream.Flush();
File.WriteAllBytes(@"X:\FooTest3.csv.gz", memoryStream.ToArray());
returnModifiedBytes = memoryStream.ToArray().ToList();
}
}
File.WriteAllBytes(@"X:\FooTest2.csv.gz", returnModifiedBytes.ToArray());
return returnModifiedBytes;
}
GZipStream needs to be closed so it can write some terminating data to the end of the buffer to complete the gzip encoding.
byte[] inputBytes = ...;
using (var compressedStream = new MemoryStream())
{
using (var compressor = new GZipStream(compressedStream, CompressionMode.Compress))
{
compressor.Write(inputBytes, 0, inputBytes.Length);
}
// get bytes after the gzip stream is closed
File.WriteAllBytes(pathToFile, compressedStream.ToArray());
}
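One way to check that the terminating data is really there: per the gzip format (RFC 1952), the last four bytes of a complete gzip member hold the uncompressed length modulo 2^32, in little-endian order. The sketch below reads that field back; the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipTrailerDemo
{
    public static byte[] Compress(byte[] inputBytes)
    {
        using (var compressedStream = new MemoryStream())
        {
            using (var compressor = new GZipStream(compressedStream, CompressionMode.Compress))
            {
                compressor.Write(inputBytes, 0, inputBytes.Length);
            }
            // get bytes after the gzip stream is closed
            return compressedStream.ToArray();
        }
    }

    // Reads the ISIZE field: the last four bytes, little-endian.
    public static uint ReadStoredLength(byte[] gz)
    {
        int n = gz.Length;
        return (uint)gz[n - 4]
             | ((uint)gz[n - 3] << 8)
             | ((uint)gz[n - 2] << 16)
             | ((uint)gz[n - 1] << 24);
    }
}
```

If the stored length does not match the size of the original data, the buffer was most likely captured before the GZipStream was closed.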
Instead of loading the bytes, compressing them and then saving them, you could compress and write in one step. Also, I don't know why you're using List<byte> instead of byte[]; maybe that is the cause.
void CompressFile(string inputPath, string outputPath)
{
using (Stream readStream = new FileStream(inputPath, FileMode.Open))
using (Stream writeStream = new FileStream(outputPath, FileMode.Create))
using (Stream compressionStream = new GZipStream(writeStream, CompressionMode.Compress))
{
// copy in chunks; disposing compressionStream completes the gzip data
readStream.CopyTo(compressionStream);
}
}
byte[] CompressFile(string inputPath)
{
byte[] data = File.ReadAllBytes(inputPath);
using (var memStream = new MemoryStream())
{
using (var gzipStream = new GZipStream(memStream, CompressionMode.Compress))
{
gzipStream.Write(data, 0, data.Length);
}
// the GZipStream must be closed before the bytes are complete
return memStream.ToArray();
}
}
PS: I wrote the code in a text editor, so there might be errors. Also, you say the error occurs on unzipping, so why not show us the unzip code?

How to convert a compressed stream to an uncompressed stream in c# using GZipStream

Following various samples I've been able to convert a memory stream to a compressed stream and then to a byte array to save in a database but I'm having trouble going the other way. Here's what I've got so far...
...
using (MemoryStream compressedStream = new MemoryStream()) {
...some code that builds the compressedStream for an undetermined
number of byteArrays from a database
using (MemoryStream uncompressedStream = new MemoryStream()) {
// method 1
using (GZipStream unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress)) {
unzippedStream.CopyTo(uncompressedStream);
}
// method 2
using (GZipStream unzippedStream = new GZipStream(uncompressedStream, CompressionMode.Decompress)) {
compressedStream.CopyTo(unzippedStream);
}
... do something with uncompressedStream
}
}
Method 1 seems to follow the examples I see on here but causes an error "stream does not support writing"
Method 2 seems to make more sense but the uncompressed stream is always empty
P.S. Really what I would like to have is something simple like
MemoryStream compressed = GZipStream(uncompressed, Compress)
MemoryStream uncompressed = GZipStream(compressed, Decompress)
This code example works. The first part just produces a compressed byte array. The second part shows how the compressed stream can be built in code, where Write can be called multiple times. The position must be set back to 0 before decompressing, though.
byte[] compressed;
string output;
using (var outStream = new MemoryStream()) {
using (var tinyStream = new GZipStream(outStream, CompressionMode.Compress))
using (var mStream = new MemoryStream(Encoding.UTF8.GetBytes("This is a test"))) {
mStream.CopyTo(tinyStream);
}
compressed = outStream.ToArray();
}
using (var compressedStream = new MemoryStream()) {
// can do multiple writes here to create the compressed stream
compressedStream.Write(compressed, 0, compressed.Length);
compressedStream.Flush();
compressedStream.Position = 0;
using (var unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress))
using (var uncompressedStream = new MemoryStream()) {
unzippedStream.CopyTo(uncompressedStream);
output = Encoding.UTF8.GetString(uncompressedStream.ToArray());
}
}
Console.WriteLine(output);
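If you want something close to the one-liners from the question, the same steps can be wrapped in two helpers. This is only a sketch: the GzipStreams class and its method names are hypothetical, and it assumes the caller has already positioned the input stream at the start of the data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipStreams
{
    // Reads 'uncompressed' from its current position and returns a rewound compressed stream.
    public static MemoryStream Compress(Stream uncompressed)
    {
        var result = new MemoryStream();
        using (var gzip = new GZipStream(result, CompressionMode.Compress, true))
        {
            uncompressed.CopyTo(gzip);
        } // dispose before using 'result' so the gzip data is complete
        result.Position = 0;
        return result;
    }

    // Reads 'compressed' from its current position and returns a rewound uncompressed stream.
    public static MemoryStream Decompress(Stream compressed)
    {
        var result = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, true))
        {
            gzip.CopyTo(result);
        }
        result.Position = 0;
        return result;
    }
}
```

Both helpers leave the input stream open (leaveOpen is true), so the caller stays responsible for disposing it.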

StreamReader ReadToEnd() returns empty string on first attempt

I know this question has been asked before on Stack Overflow, but I could not find an explanation.
When I try to read a string from a compressed byte array I get an empty string on the first attempt; on the second I succeed and get the string.
Code example:
public static string Decompress(byte[] gzBuffer)
{
if (gzBuffer == null)
return null;
using (var ms = new MemoryStream(gzBuffer))
{
using (var decompress = new GZipStream(ms, CompressionMode.Decompress))
{
using (var sr = new StreamReader(decompress, Encoding.UTF8))
{
string ret = sr.ReadToEnd();
// this is the extra check that is needed !?
if (ret == "")
ret = sr.ReadToEnd();
return ret;
}
}
}
}
All suggestions are appreciated.
- Victor Cassel
I found the bug. As Michael suggested, it was in the compression routine: I failed to call Close() on the GZipStream.
public static byte[] Compress(string text)
{
if (string.IsNullOrEmpty(text))
return null;
byte[] raw = Encoding.UTF8.GetBytes(text);
using (var ms = new MemoryStream())
{
using (var compress = new GZipStream (ms, CompressionMode.Compress))
{
compress.Write(raw, 0, raw.Length);
compress.Close();
return ms.ToArray();
}
}
}
What happened was that the data seemed to get saved in a bad state that required two calls to ReadToEnd() in the decompression routine later on to extract the same data. Very odd!
Try adding ms.Position = 0 before string ret = sr.ReadToEnd();
Where is gzBuffer coming from? Did you also write the code that is producing the compressed data?
Perhaps the buffer data you have is invalid or somehow incomplete, or perhaps it consists of multiple deflate streams concatenated together.
I hope this helps.
For ByteArray:
static byte[] CompressToByte(string data)
{
MemoryStream outstream = new MemoryStream();
GZipStream compressionStream =
new GZipStream(outstream, CompressionMode.Compress, true);
StreamWriter writer = new StreamWriter(compressionStream);
writer.Write(data);
writer.Close();
return StreamToByte(outstream);
}
static string Decompress(byte[] data)
{
MemoryStream instream = new MemoryStream(data);
GZipStream compressionStream =
new GZipStream(instream, CompressionMode.Decompress);
StreamReader reader = new StreamReader(compressionStream);
string outtext = reader.ReadToEnd();
reader.Close();
return outtext;
}
public static byte[] StreamToByte(Stream stream)
{
stream.Position = 0;
byte[] buffer = new byte[128];
using (MemoryStream ms = new MemoryStream())
{
while (true)
{
int read = stream.Read(buffer, 0, buffer.Length);
if (!(read > 0))
return ms.ToArray();
ms.Write(buffer, 0, read);
}
}
}
You can replace if(!(read > 0)) with if(read <= 0).
For some reason if(read <= 0) isn't displayed correctly above.
For Stream:
static Stream CompressToStream(string data)
{
MemoryStream outstream = new MemoryStream();
GZipStream compressionStream =
new GZipStream(outstream, CompressionMode.Compress, true);
StreamWriter writer = new StreamWriter(compressionStream);
writer.Write(data);
writer.Close();
return outstream;
}
static string Decompress(Stream data)
{
data.Position = 0;
GZipStream compressionStream =
new GZipStream(data, CompressionMode.Decompress);
StreamReader reader = new StreamReader(compressionStream);
string outtext = reader.ReadToEnd();
reader.Close();
return outtext;
}
The MSDN Page on the function mentions the following:
If the current method throws an OutOfMemoryException, the reader's position in the underlying Stream object is advanced by the number of characters the method was able to read, but the characters already read into the internal ReadLine buffer are discarded. If you manipulate the position of the underlying stream after reading data into the buffer, the position of the underlying stream might not match the position of the internal buffer. To reset the internal buffer, call the DiscardBufferedData method; however, this method slows performance and should be called only when absolutely necessary.
Perhaps try calling DiscardBufferedData() before your ReadToEnd() and see what it does (I know you aren't getting the exception, but it's all I can think of...)?

DeflateStream not decompressing data (the first time)

So here's a strange one. I have this method to take a Base64-encoded deflated string and return the original data:
public static string Base64Decompress(string base64data)
{
byte[] b = Convert.FromBase64String(base64data);
using (var orig = new MemoryStream(b))
{
using (var inflate = new MemoryStream())
{
using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
{
ds.CopyTo(inflate);
return Encoding.ASCII.GetString(inflate.ToArray());
}
}
}
}
This returns an empty string unless I add a second call to ds.CopyTo(inflate). (WTF?)
...
using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
{
ds.CopyTo(inflate);
ds.CopyTo(inflate);
return Encoding.ASCII.GetString(inflate.ToArray());
}
...
(Flush/Close/Dispose on ds have no effect.)
Why does the DeflateStream copy 0 bytes on the first call? I've also tried looping with Read(), but it also returns zero on the first call, then works on the second.
Update: here's the method I'm using to compress data.
public static string Base64Compress(string data, Encoding enc)
{
using (var ms = new MemoryStream())
{
using (var ds = new DeflateStream(ms, CompressionMode.Compress))
{
byte[] b = enc.GetBytes(data);
ds.Write(b, 0, b.Length);
ds.Flush();
return Convert.ToBase64String(ms.ToArray());
}
}
}
This happens when the compressed bytes are incomplete (i.e., not all blocks are written out).
If I use your Base64Compress with the following Decompress method I will get an InvalidDataException with the message 'Unknown block type. Stream might be corrupted.'
Decompress
public static string Decompress(Byte[] bytes)
{
using (var uncompressed = new MemoryStream())
using (var compressed = new MemoryStream(bytes))
using (var ds = new DeflateStream(compressed, CompressionMode.Decompress))
{
ds.CopyTo(uncompressed);
return Encoding.ASCII.GetString(uncompressed.ToArray());
}
}
Note that everything works as expected when using the following Compress method
public Byte[] Compress(Byte[] bytes)
{
using (var memoryStream = new MemoryStream())
{
using (var deflateStream = new DeflateStream(memoryStream, CompressionMode.Compress))
deflateStream.Write(bytes, 0, bytes.Length);
return memoryStream.ToArray();
}
}
Update
Oops, foolish me... you cannot call ToArray on the memory stream until you dispose the DeflateStream. Flush is actually not implemented, and Deflate/GZip compress blocks of data, so the final block is only written on close/dispose.
Re-write compress as:
public static string Base64Compress(string data, Encoding enc)
{
using (var ms = new MemoryStream())
{
using (var ds = new DeflateStream(ms, CompressionMode.Compress))
{
byte[] b = enc.GetBytes(data);
ds.Write(b, 0, b.Length);
}
return Convert.ToBase64String(ms.ToArray());
}
}
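For completeness, here is a sketch that pairs the rewritten compress method with a matching decompress method, so the round trip can be checked in one place. The class name Base64Deflate is hypothetical, and passing the same Encoding to both sides is an assumption (the original Base64Decompress hard-codes ASCII).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Base64Deflate
{
    public static string Compress(string data, Encoding enc)
    {
        using (var ms = new MemoryStream())
        {
            using (var ds = new DeflateStream(ms, CompressionMode.Compress))
            {
                byte[] b = enc.GetBytes(data);
                ds.Write(b, 0, b.Length);
            } // the final block is written here, on dispose
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    public static string Decompress(string base64data, Encoding enc)
    {
        byte[] b = Convert.FromBase64String(base64data);
        using (var orig = new MemoryStream(b))
        using (var inflate = new MemoryStream())
        using (var ds = new DeflateStream(orig, CompressionMode.Decompress))
        {
            ds.CopyTo(inflate); // a single CopyTo is enough once the input is complete
            return enc.GetString(inflate.ToArray());
        }
    }
}
```

With complete input, one call to CopyTo returns all of the data and the second call from the question is no longer needed.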
