I am working on an existing system where data is stored in a compressed byte array in a database.
The existing data has all been compressed using GZipDotNet.dll.
I am trying to switch to using the gzip functions in System.IO.Compression.
When I use:
public static byte[] DeCompressByteArray(byte[] inArray)
{
byte[] outStream = null;
outStream = GZipDotNet.GZip.Uncompress(inArray);
return outStream;
}
It works fine but:
public static byte[] DeCompressByteArray(byte[] inArray)
{
byte[] outStream = null;
using (var compressedStream = new MemoryStream(inArray))
using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
using (var resultStream = new MemoryStream())
{
zipStream.CopyTo(resultStream);
outStream = resultStream.ToArray();
}
return outStream;
}
Gives a response of:
The magic number in GZip header is not correct. Make sure you are passing in a GZip stream
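That exception means the array does not begin with the two gzip magic bytes 0x1F 0x8B, so a useful first step is to look at what the stored arrays actually start with. Below is a minimal sketch for doing that. GZipProbe, LooksLikeGZip and DescribeHeader are hypothetical names, and nothing here assumes which format GZipDotNet.dll really writes.

```csharp
using System;

public static class GZipProbe
{
    // True when the buffer starts with the gzip magic number 0x1F 0x8B (RFC 1952).
    public static bool LooksLikeGZip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Hex dump of the first few bytes, handy for comparing against known headers.
    public static string DescribeHeader(byte[] data, int count = 8)
    {
        if (data == null || data.Length == 0) return "(empty)";
        return BitConverter.ToString(data, 0, Math.Min(count, data.Length));
    }
}
```

If LooksLikeGZip returns false for the rows in the database, GZipStream cannot read them as they are, and the dump from DescribeHeader shows what the data starts with instead (a length prefix or a different container format, for instance).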
I have a very simple gzip method:
public byte[] Compress(string input)
{
var bytes = Encoding.UTF8.GetBytes(input);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
using (var gz = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(gz);
return mso.ToArray();
}
}
However, unit tests fail. Even a simple short string doesn't get gzipped properly, e.g. "this is a test" becomes a byte array with 10 elements: [31,139,8,0,0,0,0,0,4,0], which of course doesn't ungzip properly. What's going wrong here? This has come straight from MSDN!
You need to close the GZipStream for it to finish compressing. At the point you call mso.ToArray(), the GZipStream has only written its 10-byte header and is still waiting for more data.
A simple solution:
public byte[] Compress(string input)
{
var bytes = Encoding.UTF8.GetBytes(input);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
{
using (var gz = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(gz);
}
return mso.ToArray();
}
}
I am new to .NET. I am compressing and decompressing a string in C#. I have an XML document that I convert to a string, and then I compress and decompress it. The code compiles without errors, but when I decompress and return the string, I get back only half of the XML.
Below is my code; please correct me where I am wrong.
Code:
class Program
{
public static string Zip(string value)
{
//Transform string into byte[]
byte[] byteArray = new byte[value.Length];
int indexBA = 0;
foreach (char item in value.ToCharArray())
{
byteArray[indexBA++] = (byte)item;
}
//Prepare for compress
System.IO.MemoryStream ms = new System.IO.MemoryStream();
System.IO.Compression.GZipStream sw = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress);
//Compress
sw.Write(byteArray, 0, byteArray.Length);
//Close, DO NOT FLUSH cause bytes will go missing...
sw.Close();
//Transform byte[] zip data to string
byteArray = ms.ToArray();
System.Text.StringBuilder sB = new System.Text.StringBuilder(byteArray.Length);
foreach (byte item in byteArray)
{
sB.Append((char)item);
}
ms.Close();
sw.Dispose();
ms.Dispose();
return sB.ToString();
}
public static string UnZip(string value)
{
//Transform string into byte[]
byte[] byteArray = new byte[value.Length];
int indexBA = 0;
foreach (char item in value.ToCharArray())
{
byteArray[indexBA++] = (byte)item;
}
//Prepare for decompress
System.IO.MemoryStream ms = new System.IO.MemoryStream(byteArray);
System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(ms,
System.IO.Compression.CompressionMode.Decompress);
//Reset variable to collect uncompressed result
byteArray = new byte[byteArray.Length];
//Decompress
int rByte = sr.Read(byteArray, 0, byteArray.Length);
//Transform byte[] unzip data to string
System.Text.StringBuilder sB = new System.Text.StringBuilder(rByte);
//Read the number of bytes GZipStream red and do not a for each bytes in
//resultByteArray;
for (int i = 0; i < rByte; i++)
{
sB.Append((char)byteArray[i]);
}
sr.Close();
ms.Close();
sr.Dispose();
ms.Dispose();
return sB.ToString();
}
static void Main(string[] args)
{
XDocument doc = XDocument.Load(@"D:\RSP.xml");
string val = doc.ToString(SaveOptions.DisableFormatting);
val = Zip(val);
val = UnZip(val);
}
}
My XML size is 63KB.
The code to compress/decompress a string
public static void CopyTo(Stream src, Stream dest) {
byte[] bytes = new byte[4096];
int cnt;
while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
dest.Write(bytes, 0, cnt);
}
}
public static byte[] Zip(string str) {
var bytes = Encoding.UTF8.GetBytes(str);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
//msi.CopyTo(gs);
CopyTo(msi, gs);
}
return mso.ToArray();
}
}
public static string Unzip(byte[] bytes) {
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
//gs.CopyTo(mso);
CopyTo(gs, mso);
}
return Encoding.UTF8.GetString(mso.ToArray());
}
}
static void Main(string[] args) {
byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
string r2 = Unzip(r1);
}
Remember that Zip returns a byte[], while Unzip returns a string. If you want a string from Zip you can Base64 encode it (for example by using Convert.ToBase64String(r1)) (the result of Zip is VERY binary! It isn't something you can print to the screen or write directly in an XML)
The version above targets .NET 2.0; on .NET 4.0 or later, use Stream.CopyTo (the commented-out lines) instead of the hand-written CopyTo helper.
IMPORTANT: The compressed contents cannot be written to the output stream until the GZipStream knows that it has all of the input (i.e., to effectively compress it needs all of the data). You need to make sure that you Dispose() of the GZipStream before inspecting the output stream (e.g., mso.ToArray()). This is done with the using() { } block above. Note that the GZipStream is the innermost block and the contents are accessed outside of it. The same goes for decompressing: Dispose() of the GZipStream before attempting to access the data.
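To illustrate the Base64 step and the dispose-before-reading rule together, here is a minimal sketch assuming .NET 4.0 or later (for Stream.CopyTo). The class and method names (Base64Zip, ZipToBase64, UnzipFromBase64) are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Base64Zip
{
    // Compresses the text with gzip and returns the bytes as a printable Base64 string.
    public static string ZipToBase64(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var mso = new MemoryStream())
        {
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
            {
                gs.Write(raw, 0, raw.Length);
            } // disposing gs here writes the final block and the gzip trailer
            return Convert.ToBase64String(mso.ToArray());
        }
    }

    // Reverses ZipToBase64: Base64 -> gzip bytes -> UTF-8 text.
    public static string UnzipFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var msi = new MemoryStream(compressed))
        using (var gs = new GZipStream(msi, CompressionMode.Decompress))
        using (var mso = new MemoryStream())
        {
            gs.CopyTo(mso);
            return Encoding.UTF8.GetString(mso.ToArray());
        }
    }
}
```

The Base64 string is roughly a third larger than the compressed bytes, so this form is only worth it when the result has to travel through a text-only channel such as XML or JSON.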
According to this snippet, I use this code and it's working fine:
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
namespace CompressString
{
internal static class StringCompressor
{
/// <summary>
/// Compresses the string.
/// </summary>
/// <param name="text">The text.</param>
/// <returns></returns>
public static string CompressString(string text)
{
byte[] buffer = Encoding.UTF8.GetBytes(text);
var memoryStream = new MemoryStream();
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
{
gZipStream.Write(buffer, 0, buffer.Length);
}
memoryStream.Position = 0;
var compressedData = new byte[memoryStream.Length];
memoryStream.Read(compressedData, 0, compressedData.Length);
var gZipBuffer = new byte[compressedData.Length + 4];
Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
return Convert.ToBase64String(gZipBuffer);
}
/// <summary>
/// Decompresses the string.
/// </summary>
/// <param name="compressedText">The compressed text.</param>
/// <returns></returns>
public static string DecompressString(string compressedText)
{
byte[] gZipBuffer = Convert.FromBase64String(compressedText);
using (var memoryStream = new MemoryStream())
{
int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);
var buffer = new byte[dataLength];
memoryStream.Position = 0;
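using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
{
// Read may return fewer bytes than requested, so loop until the buffer is full.
int offset = 0;
int read;
while (offset < buffer.Length && (read = gZipStream.Read(buffer, offset, buffer.Length - offset)) > 0)
{
offset += read;
}
}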
return Encoding.UTF8.GetString(buffer);
}
}
}
}
With the advent of .NET 4.0 (and higher) with the Stream.CopyTo() methods, I thought I would post an updated approach.
I also think the below version is useful as a clear example of a self-contained class for compressing regular strings to Base64 encoded strings, and vice versa:
public static class StringCompression
{
/// <summary>
/// Compresses a string and returns a deflate compressed, Base64 encoded string.
/// </summary>
/// <param name="uncompressedString">String to compress</param>
public static string Compress(string uncompressedString)
{
byte[] compressedBytes;
using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
{
using (var compressedStream = new MemoryStream())
{
// setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
// this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
// although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
{
uncompressedStream.CopyTo(compressorStream);
}
// call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
compressedBytes = compressedStream.ToArray();
}
}
return Convert.ToBase64String(compressedBytes);
}
/// <summary>
/// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
/// </summary>
/// <param name="compressedString">String to decompress.</param>
public static string Decompress(string compressedString)
{
byte[] decompressedBytes;
var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));
using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
{
using (var decompressedStream = new MemoryStream())
{
decompressorStream.CopyTo(decompressedStream);
decompressedBytes = decompressedStream.ToArray();
}
}
return Encoding.UTF8.GetString(decompressedBytes);
}
}
Here’s another approach using the extension methods technique to extend the String class to add string compression and decompression. You can drop the class below into an existing project and then use thusly:
var uncompressedString = "Hello World!";
var compressedString = uncompressedString.Compress();
and
var decompressedString = compressedString.Decompress();
To wit:
public static class Extensions
{
/// <summary>
/// Compresses a string and returns a deflate compressed, Base64 encoded string.
/// </summary>
/// <param name="uncompressedString">String to compress</param>
public static string Compress(this string uncompressedString)
{
byte[] compressedBytes;
using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
{
using (var compressedStream = new MemoryStream())
{
// setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
// this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
// although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
{
uncompressedStream.CopyTo(compressorStream);
}
// call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
compressedBytes = compressedStream.ToArray();
}
}
return Convert.ToBase64String(compressedBytes);
}
/// <summary>
/// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
/// </summary>
/// <param name="compressedString">String to decompress.</param>
public static string Decompress(this string compressedString)
{
byte[] decompressedBytes;
var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));
using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
{
using (var decompressedStream = new MemoryStream())
{
decompressorStream.CopyTo(decompressedStream);
decompressedBytes = decompressedStream.ToArray();
}
}
return Encoding.UTF8.GetString(decompressedBytes);
}
}
I like @fubo's answer the best but I think this is much more elegant.
This method is more compatible because it doesn't manually store the length up front.
Also I've exposed extensions to support compression for string to string, byte[] to byte[], and Stream to Stream.
public static class ZipExtensions
{
public static string CompressToBase64(this string data)
{
return Convert.ToBase64String(Encoding.UTF8.GetBytes(data).Compress());
}
public static string DecompressFromBase64(this string data)
{
return Encoding.UTF8.GetString(Convert.FromBase64String(data).Decompress());
}
public static byte[] Compress(this byte[] data)
{
using (var sourceStream = new MemoryStream(data))
using (var destinationStream = new MemoryStream())
{
sourceStream.CompressTo(destinationStream);
return destinationStream.ToArray();
}
}
public static byte[] Decompress(this byte[] data)
{
using (var sourceStream = new MemoryStream(data))
using (var destinationStream = new MemoryStream())
{
sourceStream.DecompressTo(destinationStream);
return destinationStream.ToArray();
}
}
public static void CompressTo(this Stream stream, Stream outputStream)
{
using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
{
stream.CopyTo(gZipStream);
gZipStream.Flush();
}
}
public static void DecompressTo(this Stream stream, Stream outputStream)
{
using (var gZipStream = new GZipStream(stream, CompressionMode.Decompress))
{
gZipStream.CopyTo(outputStream);
}
}
}
This is an updated version for .NET 4.5 and newer using async/await and IEnumerables:
public static class CompressionExtensions
{
public static async Task<IEnumerable<byte>> Zip(this object obj)
{
byte[] bytes = obj.Serialize();
using (MemoryStream msi = new MemoryStream(bytes))
using (MemoryStream mso = new MemoryStream())
{
using (var gs = new GZipStream(mso, CompressionMode.Compress))
await msi.CopyToAsync(gs);
return mso.ToArray().AsEnumerable();
}
}
public static async Task<object> Unzip(this byte[] bytes)
{
using (MemoryStream msi = new MemoryStream(bytes))
using (MemoryStream mso = new MemoryStream())
{
using (var gs = new GZipStream(msi, CompressionMode.Decompress))
{
// Sync example:
//gs.CopyTo(mso);
// Async way (take care of using async keyword on the method definition)
await gs.CopyToAsync(mso);
}
return await mso.ToArray().Deserialize();
}
}
}
public static class SerializerExtensions
{
public static byte[] Serialize<T>(this T objectToWrite)
{
using (MemoryStream stream = new MemoryStream())
{
BinaryFormatter binaryFormatter = new BinaryFormatter();
binaryFormatter.Serialize(stream, objectToWrite);
// ToArray returns only the bytes written; GetBuffer would include unused capacity.
return stream.ToArray();
}
}
public static async Task<T> _Deserialize<T>(this byte[] arr)
{
using (MemoryStream stream = new MemoryStream())
{
BinaryFormatter binaryFormatter = new BinaryFormatter();
await stream.WriteAsync(arr, 0, arr.Length);
stream.Position = 0;
return (T)binaryFormatter.Deserialize(stream);
}
}
public static async Task<object> Deserialize(this byte[] arr)
{
object obj = await arr._Deserialize<object>();
return obj;
}
}
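With this you can serialize everything BinaryFormatter supports, instead of only strings. Be aware that BinaryFormatter is marked obsolete from .NET 5 onward and should not be used to deserialize untrusted data.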
Edit:
In case you need to take care of encoding, you could just use Convert.ToBase64String(byte[])...
Take a look at this answer if you need an example!
For those still getting the error "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.":
if your string was compressed using PHP, you'll need to do something like:
public static string decodeDecompress(string originalReceivedSrc) {
byte[] bytes = Convert.FromBase64String(originalReceivedSrc);
using (var mem = new MemoryStream()) {
//the trick is here
mem.Write(new byte[] { 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00 }, 0, 8);
mem.Write(bytes, 0, bytes.Length);
mem.Position = 0;
using (var gzip = new GZipStream(mem, CompressionMode.Decompress))
using (var reader = new StreamReader(gzip)) {
return reader.ReadToEnd();
}
}
}
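The trick above works by putting gzip header bytes in front of data that is not gzip. If the PHP side used gzcompress(), which emits the zlib format (RFC 1950) rather than gzip, then on .NET 6 or later the data can probably be read directly with System.IO.Compression.ZLibStream instead of patching the header. A minimal sketch under that assumption; ZLibText and its method names are hypothetical, and Compress is included only so the example can be round-tripped.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZLibText
{
    // Decompresses zlib-wrapped data (assumed here to be what the sender produced).
    public static string Decompress(byte[] zlibBytes)
    {
        using (var input = new MemoryStream(zlibBytes))
        using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(zlib, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Counterpart that produces zlib-wrapped data from a string.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var zlib = new ZLibStream(output, CompressionMode.Compress))
            {
                zlib.Write(raw, 0, raw.Length);
            } // dispose before reading the output
            return output.ToArray();
        }
    }
}
```

For a Base64 string received from PHP, the call would be ZLibText.Decompress(Convert.FromBase64String(originalReceivedSrc)). If the sender used gzencode() instead, the data is real gzip and a plain GZipStream should be enough.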
We can reduce code complexity by using StreamReader and StreamWriter rather than manually converting strings to byte arrays. Three streams is all you need:
public static byte[] Zip(string uncompressed)
{
byte[] ret;
using (var outputMemory = new MemoryStream())
{
using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
{
using (var sw = new StreamWriter(gz, Encoding.UTF8))
{
sw.Write(uncompressed);
}
}
ret = outputMemory.ToArray();
}
return ret;
}
public static string Unzip(byte[] compressed)
{
string ret = null;
using (var inputMemory = new MemoryStream(compressed))
{
using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
{
using (var sr = new StreamReader(gz, Encoding.UTF8))
{
ret = sr.ReadToEnd();
}
}
}
return ret;
}
For .NET 6, cross-platform compression/decompression of a string in C# using the SharpZipLib library. Tested on Ubuntu (18.0.x) and Windows.
#region helper
private byte[] Zip(string text)
{
if (text == null)
return null;
byte[] ret;
using (var outputMemory = new MemoryStream())
{
using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
{
using (var sw = new StreamWriter(gz, Encoding.UTF8))
{
sw.Write(text);
}
}
ret = outputMemory.ToArray();
}
return ret;
}
private string Unzip(byte[] bytes)
{
string ret = null;
using (var inputMemory = new MemoryStream(bytes))
{
using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
{
using (var sr = new StreamReader(gz, Encoding.UTF8))
{
ret = sr.ReadToEnd();
}
}
}
return ret;
}
#endregion
I'm using GZipStream to compress a string, and I've modified two different examples to see what works. The first code snippet, which is a heavily modified version of the example in the documentation, simply returns an empty string.
public static String CompressStringGzip(String uncompressed)
{
String compressedString;
// Convert the uncompressed source string to a stream stored in memory
// and create the MemoryStream that will hold the compressed string
using (MemoryStream inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)),
outStream = new MemoryStream())
{
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
inStream.CopyTo(compress);
StreamReader reader = new StreamReader(outStream);
compressedString = reader.ReadToEnd();
}
}
return compressedString;
}
When I debug it, all I can tell is that nothing is read from reader, so compressedString is empty. However, the second method I wrote, modified from a CodeProject snippet, is successful.
public static String CompressStringGzip3(String uncompressed)
{
//Transform string to byte array
String compressedString;
byte[] uncompressedByteArray = Encoding.Unicode.GetBytes(uncompressed);
using (MemoryStream outStream = new MemoryStream())
{
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
compress.Write(uncompressedByteArray, 0, uncompressedByteArray.Length);
compress.Close();
}
byte[] compressedByteArray = outStream.ToArray();
StringBuilder compressedStringBuilder = new StringBuilder(compressedByteArray.Length);
foreach (byte b in compressedByteArray)
compressedStringBuilder.Append((char)b);
compressedString = compressedStringBuilder.ToString();
}
return compressedString;
}
Why is the first code snippet not successful while the other one is? Even though they're slightly different, I don't know why the minor changes in the second snippet allow it to work. The sample string I'm using is SELECT * FROM foods f WHERE f.name = 'chicken';
I ended up using the following code for compression and decompression:
public static String Compress(String decompressed)
{
byte[] data = Encoding.UTF8.GetBytes(decompressed);
using (var input = new MemoryStream(data))
using (var output = new MemoryStream())
{
using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
{
input.CopyTo(gzip);
}
return Convert.ToBase64String(output.ToArray());
}
}
public static String Decompress(String compressed)
{
byte[] data = Convert.FromBase64String(compressed);
using (MemoryStream input = new MemoryStream(data))
using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
using (MemoryStream output = new MemoryStream())
{
gzip.CopyTo(output);
return Encoding.UTF8.GetString(output.ToArray());
}
}
The explanation for a part of the problem comes from this question. Although I fixed the problem by changing the code to what I included in this answer, these lines (in my original code):
foreach (byte b in compressedByteArray)
compressedStringBuilder.Append((char)b);
are problematic, because as dlev aptly phrased it:
You are interpreting each byte as its own character, when in fact that is not the case. Instead, you need the line:
string decoded = Encoding.Unicode.GetString(compressedByteArray);
The basic problem is that you are converting to a byte array based on an encoding, but then ignoring that encoding when you retrieve the bytes.
Therefore, the problem is solved, and the new code I'm using is much more succinct than my original code.
You need to move the code below outside the second using statement:
using (GZipStream compress = new GZipStream(outStream, CompressionMode.Compress))
{
inStream.CopyTo(compress);
outStream.Position = 0;
StreamReader reader = new StreamReader(outStream);
compressedString = reader.ReadToEnd();
}
CopyTo() is not flushing the results to the underlying MemoryStream.
Update
It seems that GZipStream closes and disposes its underlying stream when it is disposed (not the way I would have designed the class). I've updated the sample above and tested it.
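The default behaviour described in the update can be switched off with the three-argument GZipStream constructor, whose last argument (leaveOpen) keeps the underlying stream open. The following is a sketch of the first snippet restructured that way, not the answerer's tested code. GzipLeaveOpen and CompressUnicode are made-up names, and it returns the bytes rather than pushing them through a StreamReader, since compressed data is not text.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipLeaveOpen
{
    public static byte[] CompressUnicode(string uncompressed)
    {
        using (var inStream = new MemoryStream(Encoding.Unicode.GetBytes(uncompressed)))
        using (var outStream = new MemoryStream())
        {
            // leaveOpen = true: disposing the GZipStream flushes it but leaves outStream open.
            using (var compress = new GZipStream(outStream, CompressionMode.Compress, true))
            {
                inStream.CopyTo(compress);
            }

            // outStream now holds the complete gzip data and can still be rewound and read.
            outStream.Position = 0;
            var result = new byte[outStream.Length];
            int offset = 0;
            int read;
            while (offset < result.Length &&
                   (read = outStream.Read(result, offset, result.Length - offset)) > 0)
            {
                offset += read;
            }
            return result;
        }
    }
}
```

Calling outStream.ToArray() after the inner using block would give the same bytes with less code; the explicit rewind is shown only to make the point that the stream stays usable.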
I built (based on a CodeProject article) a wrapper class (C#) to use a GZipStream to compress a MemoryStream. It compresses fine but doesn't decompress. I've looked at many other examples that have the same problem, and I feel like I'm following what's said but still am getting nothing when I decompress. Here's the compression and decompression methods:
public static byte[] Compress(byte[] bSource)
{
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Compress, true))
{
gzip.Write(bSource, 0, bSource.Length);
gzip.Close();
}
return ms.ToArray();
}
}
public static byte[] Decompress(byte[] bSource)
{
try
{
using (MemoryStream ms = new MemoryStream())
{
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Decompress, true))
{
gzip.Read(bSource, 0, bSource.Length);
gzip.Close();
}
return ms.ToArray();
}
}
catch (Exception ex)
{
throw new Exception("Error decompressing byte array", ex);
}
}
Here's an example of how I use it:
string sCompressed = Convert.ToBase64String(CompressionHelper.Compress(Encoding.UTF8.GetBytes("Some Text")));
// Other Processes
byte[] bReturned = CompressionHelper.Decompress(Convert.FromBase64String(sCompressed));
// bReturned has no elements after this line is executed
There is a bug in the Decompress method.
The code never reads the content of bSource. On the contrary, it overwrites that content while reading from an empty GZipStream built on an empty MemoryStream.
Basically, this is what your version of the code is doing:
//create empty memory
using (MemoryStream ms = new MemoryStream())
//create gzip stream over empty memory stream
using (GZipStream gzip = new GZipStream(ms, CompressionMode.Decompress, true))
// read from the empty stream into bSource
gzip.Read(bSource, 0, bSource.Length);
The fix could look like this:
public static byte[] Decompress(byte[] bSource)
{
using (var inStream = new MemoryStream(bSource))
using (var gzip = new GZipStream(inStream, CompressionMode.Decompress))
using (var outStream = new MemoryStream())
{
gzip.CopyTo(outStream);
return outStream.ToArray();
}
}
The OP said in an edit, now rolled back:
Thanks to Alex's explanation of what was going wrong, I was able to fix the Decompress method. Unfortunately, I'm using .Net 3.5, so I wasn't able to implement the Stream.CopyTo method he suggested. With his explanation, though, I was able to figure out a solution. I made the appropriate changes to the Decompress method below.
public static byte[] Decompress(byte[] bSource)
{
try
{
using (var instream = new MemoryStream(bSource))
{
using (var gzip = new GZipStream(instream, CompressionMode.Decompress))
{
using (var outstream = new MemoryStream())
{
byte[] buffer = new byte[4096];
int delta;
// Read returns 0 only at end of stream; a short read does not mean the data is finished.
while ((delta = gzip.Read(buffer, 0, buffer.Length)) > 0)
{
outstream.Write(buffer, 0, delta);
}
return outstream.ToArray();
}
}
}
}
catch (Exception ex)
{
throw new Exception("Error decompressing byte array", ex);
}
}