C# equivalent to zlib.decompress

What is the equivalent of the Python function zlib.decompress() in C#? I need to decompress some zlib files using C# and I don't know how to do it.
Python example:
import zlib
file = open("myfile", mode = "rb")
data = zlib.decompress(file.read())
uncompressed_output = open("output_file", mode = "wb")
uncompressed_output.write(data)
I tried using the System.IO.Compression.DeflateStream class, but for every file I try it gives me an exception that the file contains invalid data while decoding.
byte[] binary = new byte[1000000];
using (DeflateStream compressed_file = new DeflateStream(new FileStream(@"myfile", FileMode.Open, FileAccess.Read), CompressionMode.Decompress))
    compressed_file.Read(binary, 0, 1000000); //exception here
using (BinaryWriter outputFile = new BinaryWriter(new FileStream(@"output_file", FileMode.Create, FileAccess.Write)))
    outputFile.Write(binary);
//Reading the file like normal with a BinaryReader and then turning it into a MemoryStream also didn't work
I should probably mention that the files are ZLIB compressed files. They start with the 78 9C header.

So, luckily, I found this post: https://stackoverflow.com/a/33855097/10505778
Basically the file must be stripped of its 2 header bytes (78 9C). The second byte (9C) does matter for decompression, since it indicates, among other things, whether a preset dictionary was used, but I don't need that here, and I am pretty sure it is not difficult to modify this to accommodate it:
byte[] binary, decompressed;
using (BinaryReader file = new BinaryReader(new FileStream(@"myfile", FileMode.Open, FileAccess.Read, FileShare.Read)))
    binary = file.ReadBytes((int)file.BaseStream.Length); //read the entire file
decompressed = new byte[binary.Length * 100]; //output buffer; size it generously for the expected data
int outputSize;
using (MemoryStream memory_stream = new MemoryStream(binary, false))
{
    memory_stream.Seek(2, SeekOrigin.Begin); //skip the 2 zlib header bytes (78 9C)
    using (DeflateStream compressed_file = new DeflateStream(memory_stream, CompressionMode.Decompress))
        outputSize = compressed_file.Read(decompressed, 0, decompressed.Length); //note: a single Read may return less than the full output
}
binary = new byte[outputSize];
Array.Copy(decompressed, 0, binary, 0, outputSize);
using (BinaryWriter outputFile = new BinaryWriter(new FileStream(@"output_file", FileMode.Create, FileAccess.Write)))
    outputFile.Write(binary);
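As a side note, on .NET 6 and later there is also System.IO.Compression.ZLibStream, which understands the zlib header and Adler-32 trailer directly, so no byte stripping is needed. A minimal sketch, assuming that newer runtime is available:
// ZLibStream is .NET 6+ only (System.IO.Compression);
// unlike DeflateStream, it consumes the 78 9C header itself.
using (var input = new FileStream(@"myfile", FileMode.Open, FileAccess.Read))
using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
using (var output = new FileStream(@"output_file", FileMode.Create, FileAccess.Write))
{
    zlib.CopyTo(output);
}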

Related

How to get position and length of an unzipped GZipStream in C#?

I'm trying to read .gz files with a BinaryReader by first unzipping with a GZipStream, and then creating a new BinaryReader over the GZipStream. However, when I try to use the BaseStream.Position and BaseStream.Length of the BinaryReader (to know when I'm at the end of my file), I get a NotSupportedException. Checking the docs for these properties of the GZipStream class shows:
Length
This property is not supported and always throws a NotSupportedException. (Overrides Stream.Length.)
Position
This property is not supported and always throws a NotSupportedException. (Overrides Stream.Position.)
So my question is: how can I know when I'm at the end of my file when reading a decompressed GZipStream using a BinaryReader? Thanks.
Here is my code:
Stream stream = new MemoryStream(textAsset.bytes);
GZipStream zippedStream = new GZipStream(stream, CompressionMode.Decompress);
using (BinaryReader reader = new BinaryReader(zippedStream))
while(reader.BaseStream.Position != reader.BaseStream.Length)
{
//do stuff with BinaryReader
}
The above throws:
NotSupportedException: Operation is not supported. System.IO.Compression.DeflateStream.get_Position()
due to the BaseStream.Position call in the while().
You can copy your zippedStream into a MemoryStream instance, which can then be read fully using its ToArray function. That is the easiest solution I can think of.
Stream stream = new MemoryStream(textAsset.bytes);
byte[] result;
using (GZipStream zippedStream = new GZipStream(stream, CompressionMode.Decompress))
{
using (MemoryStream reader = new MemoryStream())
{
zippedStream.CopyTo(reader);
result = reader.ToArray();
}
}
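Once the decompressed bytes are in result, you can wrap them in a fresh MemoryStream, which does support Position and Length, so the original end-of-file check works. A minimal sketch building on the snippet above:
// result holds the fully decompressed bytes from the CopyTo above
using (BinaryReader reader = new BinaryReader(new MemoryStream(result)))
{
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        // do stuff with BinaryReader
    }
}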
Alternatively, if you want to read the stream in chunks:
using (GZipStream zippedStream = new GZipStream(stream, CompressionMode.Decompress))
{
byte[] buffer = new byte[16 * 1024];
int read;
while ((read = zippedStream.Read(buffer, 0, buffer.Length)) > 0)
{
// do work
}
}
Depending on what you are decoding, you can read the first value into a byte array using the BinaryReader and then use BitConverter to convert those bytes into the type you want. You can then use the BinaryReader as normal until the start of the next record.
byte[] headermarker = new byte[4];
int count;
// while bytes are available in the underlying stream
while ((count = br.Read(headermarker, 0, 4)) > 0)
{
    Int32 marker = BitConverter.ToInt32(headermarker, 0);
    //
    // now use the BinaryReader for the rest of the record until we loop
    //
}

Compressing C# objects using SharpZipLib?

Is it possible to compress a List<T> using SharpZipLib?
Turning the List into a byte array gives me around 60000 bytes (uncompressed).
Compressing this with System.IO.Compression.DeflateStream gives me around a 1/3 compression ratio, but this is far from enough.
The purpose is to store the collections in the (MS SQL) database as a byte[], because saving them as individual rows uses too much space (1 million rows/day).
Thanks
Edit:
List<ItemLog> itemLogs = new List<ItemLog>();
//populate with 1000 ItemLogs
byte[] array = null; //original byte array
BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, itemLogs);
array = ms.ToArray();
The array size is now 60000 bytes.
Zip the collection using a ZipOutputStream
MemoryStream outputMemoryStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemoryStream);
zipStream.SetLevel(3);
ZipEntry entry = new ZipEntry("logs");
entry.DateTime = DateTime.Now;
zipStream.PutNextEntry(entry);
StreamUtils.Copy(ms, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false;
zipStream.Close();
outputMemoryStream.Position = 0;
byte[] compressed = outputMemoryStream.ToArray();
The compressed array is now 164 bytes in size. <- is that length even valid/possible?
Decompressing gives me an empty array, but as the compression is not right I will skip the decompression code for now.
I do not see any real problem in your code. The only place a problem could be is the copying of the data: is the input stream positioned at the start of the data that should be stored? After BinaryFormatter.Serialize, the MemoryStream is positioned at the end of the written data, so copying from it without rewinding copies nothing. Try adding the following line:
ms.Seek(0, SeekOrigin.Begin); // added line
StreamUtils.Copy(ms, zipStream, new byte[4096]);
Based on your code I wrote a simple compress function and it works as expected.
private static byte[] Compress(byte[] source)
{
byte[] compressed;
using (var memory = new MemoryStream())
using (var zipped = new ZipOutputStream(memory))
{
zipped.IsStreamOwner = false;
zipped.SetLevel(9);
var entry = new ZipEntry("data")
{
DateTime = DateTime.Now
};
zipped.PutNextEntry(entry);
#if true
zipped.Write(source, 0, source.Length);
#else
using (var src = new MemoryStream(source))
{
StreamUtils.Copy(src, zipped, new byte[4096]);
}
#endif
zipped.Close();
compressed = memory.ToArray();
}
#if false
using (var file = new FileStream("test.zip", FileMode.Create, FileAccess.Write, FileShare.Read))
{
file.Write(compressed, 0, compressed.Length);
}
#endif
return compressed;
}
There are alternative ways to produce the output (writing the source array directly or copying from a stream), and there is disabled code to save the compressed data to a file (so you can inspect the real content of the compressed data in an external application).
My test data was 256 bytes long (data with a low compression ratio), and the result was a 407-byte file.
Try using the array, or check the stream content that is saved.
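For completeness, here is a sketch of the matching decompression side using SharpZipLib's ZipInputStream; the names are illustrative and not from the original post:
// assumes ICSharpCode.SharpZipLib.Zip; reads back the single "data" entry written by Compress
private static byte[] Decompress(byte[] compressed)
{
    using (var input = new MemoryStream(compressed))
    using (var zipped = new ZipInputStream(input))
    using (var output = new MemoryStream())
    {
        ZipEntry entry = zipped.GetNextEntry(); // position the stream on the first entry
        if (entry == null)
            return new byte[0]; // no entry found
        zipped.CopyTo(output); // copies the uncompressed entry contents
        return output.ToArray();
    }
}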

How do I convert this to read a zip file? [duplicate]

This question already has answers here:
Unzipping a .gz file using C#
(3 answers)
Closed 8 years ago.
I am reading an unzipped binary file from disk like this:
string fn = #"c:\\MyBinaryFile.DAT";
byte[] ba = File.ReadAllBytes(fn);
MemoryStream msReader = new MemoryStream(ba);
I now want to increase I/O speed by using a zipped binary file. But how do I fit it into the above scheme?
string fn = @"c:\MyZippedBinaryFile.GZ";
//Put something here
byte[] ba = File.ReadAllBytes(fn);
//Or here
MemoryStream msReader = new MemoryStream(ba);
What is the best way to achieve this, please?
I need to end up with a MemoryStream as my next step is to deserialize it.
You'd have to use a GZipStream on the content of your file.
So basically it should be like this:
string fn = #"c:\\MyZippedBinaryFile.GZ";
byte[] ba = File.ReadAllBytes(fn);
using (MemoryStream msReader = new MemoryStream(ba))
using (GZipStream zipStream = new GZipStream(msReader, CompressionMode.Decompress))
{
// Read from zipStream instead of msReader
}
To account for the valid comment by flindenberg, you can also open the file directly without having to read the entire file into memory first:
string fn = #"c:\\MyZippedBinaryFile.GZ";
using (FileStream stream = File.OpenRead(fn))
using (GZipStream zipStream = new GZipStream(stream, CompressionMode.Decompress))
{
// Read from zipStream instead of stream
}
You need to end up with a memory stream? No problem:
string fn = #"c:\\MyZippedBinaryFile.GZ";
using (FileStream stream = File.OpenRead(fn))
using (GZipStream zipStream = new GZipStream(stream, CompressionMode.Decompress))
using (MemoryStream ms = new MemoryStream()
{
zipStream.CopyTo(ms);
ms.Seek(0, SeekOrigin.Begin); // don't forget to rewind the stream!
// Read from ms
}

Original file bytes from StreamReader, magic number detection

I'm trying to differentiate between "text files" and "binary" files, as I would effectively like to ignore files with "unreadable" contents.
I have a file that I believe is a GZIP archive. I'm trying to ignore this kind of file by detecting the magic numbers / file signature. If I open the file with the Hex Editor plugin in Notepad++, I can see the first three hex codes are 1f 8b 08.
However, if I read the file using a StreamReader, I'm not sure how to get to the original bytes.
using (var streamReader = new StreamReader(#"C:\file"))
{
char[] buffer = new char[10];
streamReader.Read(buffer, 0, 10);
var s = new String(buffer);
byte[] bytes = new byte[6];
System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
var hex = BitConverter.ToString(bytes);
var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
}
At the end of the using statement I have the following variable values:
hex: "1F-00-FD-FF-08-00"
otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"
Neither of which start with the hex values shown in Notepad++.
Is it possible to get the original bytes from the result of reading a file via StreamReader?
Your code tries to turn a binary buffer into a string. Strings are Unicode in .NET, so two bytes are required per character, and the result is a bit unpredictable, as you can see.
Just use a BinaryReader and its ReadBytes method:
using (FileStream fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
{
    using (var reader = new BinaryReader(fs, new ASCIIEncoding()))
    {
        byte[] buffer = reader.ReadBytes(10);
        if (buffer[0] == 31 && buffer[1] == 139 && buffer[2] == 8)
        {
            // you have a signature match....
        }
    }
}
Usage (for a pdf file):
Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4));
Method GetMagicNumbers:
private static string GetMagicNumbers(string filepath, int bytesCount)
{
// https://en.wikipedia.org/wiki/List_of_file_signatures
byte[] buffer;
using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(fs))
buffer = reader.ReadBytes(bytesCount);
var hex = BitConverter.ToString(buffer);
return hex.Replace("-", String.Empty).ToLower();
}
You can't. StreamReader is made to read text, not binary data. Use the Stream directly to read bytes; in your case a FileStream.
To guess whether a file is text or binary, you could read the first 4K into a byte[] and interpret that.
By the way, you tried to force chars into bytes, which is invalid in principle. I suggest you familiarize yourself with what an Encoding is: it is the only way to convert between chars and bytes in a semantically correct way.
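For example, one common heuristic (an illustration, not something from the original answer) is to treat a file as binary if its first few kilobytes contain a NUL byte:
// Rough sketch only: a NUL byte in the first 4K almost always means binary data;
// real text detection (encodings, BOMs, control characters) is more involved.
static bool LooksBinary(string path)
{
    byte[] head = new byte[4096];
    int read;
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        read = fs.Read(head, 0, head.Length);
    for (int i = 0; i < read; i++)
        if (head[i] == 0)
            return true;
    return false;
}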

GZipStream not reading the whole file

I have some code that downloads gzipped files and decompresses them. The problem is that I can't get it to decompress the whole file; it only reads the first 4096 bytes and then about 500 more.
Byte[] buffer = new Byte[4096];
int count = 0;
FileStream fileInput = new FileStream("input.gzip", FileMode.Open, FileAccess.Read, FileShare.Read);
FileStream fileOutput = new FileStream("output.dat", FileMode.Create, FileAccess.Write, FileShare.None);
GZipStream gzipStream = new GZipStream(fileInput, CompressionMode.Decompress, true);
// Read from gzip stream
while ((count = gzipStream.Read(buffer, 0, buffer.Length)) > 0)
{
// Write to output file
fileOutput.Write(buffer, 0, count);
}
// Close the streams
...
I've checked the downloaded file; it's 13MB when compressed, and contains one XML file. I've manually decompressed the XML file, and the content is all there. But when I do it with this code, it only outputs the very beginning of the XML file.
Anyone have any ideas why this might be happening?
EDIT
Try not leaving the GZipStream open:
GZipStream gzipStream = new GZipStream(fileInput, CompressionMode.Decompress,
false);
or
GZipStream gzipStream = new GZipStream(fileInput, CompressionMode.Decompress);
I ended up using a gzip executable to do the decompression instead of a GZipStream. It can't handle the file for some reason, but the executable can.
The same thing happened to me. In my case it only read up to 6 lines and then reached the end of the file. I realized that although the extension was .gz, the file was compressed with another algorithm not supported by GZipStream, so I used the SevenZipSharp library instead and it worked. This is my code:
using (var input = File.OpenRead(lstFiles[0]))
{
using (var ds = new SevenZipExtractor(input))
{
//ds.ExtractionFinished += DsOnExtractionFinished;
var mem = new MemoryStream();
ds.ExtractFile(0, mem);
using (var sr = new StreamReader(mem))
{
var iCount = 0;
String line;
mem.Position = 0;
while ((line = sr.ReadLine()) != null && iCount < 100)
{
iCount++;
LstOutput.Items.Add(line);
}
}
}
}
Are you calling Close or Flush on fileOutput? (Or just wrap it in a using block, which is the recommended practice.) If you don't, the file might not be flushed to disk when your program ends.
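For reference, a sketch of the same loop with all three streams wrapped in using blocks, so they are flushed and closed even if an exception occurs:
byte[] buffer = new byte[4096];
int count;
using (var fileInput = new FileStream("input.gzip", FileMode.Open, FileAccess.Read, FileShare.Read))
using (var gzipStream = new GZipStream(fileInput, CompressionMode.Decompress))
using (var fileOutput = new FileStream("output.dat", FileMode.Create, FileAccess.Write, FileShare.None))
{
    while ((count = gzipStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        fileOutput.Write(buffer, 0, count); // write each decompressed chunk
    }
}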
