How do I convert this to read a zip file? [duplicate] - c#

This question already has answers here:
Unzipping a .gz file using C#
(3 answers)
Closed 8 years ago.
I am reading an unzipped binary file from disk like this:
string fn = @"c:\\MyBinaryFile.DAT";
byte[] ba = File.ReadAllBytes(fn);
MemoryStream msReader = new MemoryStream(ba);
I now want to increase speed of I/O by using a zipped binary file. But how do I fit it into the above schema?
string fn = @"c:\\MyZippedBinaryFile.GZ";
//Put something here
byte[] ba = File.ReadAllBytes(fn);
//Or here
MemoryStream msReader = new MemoryStream(ba);
What is the best way to achieve this?
I need to end up with a MemoryStream as my next step is to deserialize it.

You'd have to use a GZipStream on the content of your file.
So basically it should be like this:
string fn = @"c:\\MyZippedBinaryFile.GZ";
byte[] ba = File.ReadAllBytes(fn);
using (MemoryStream msReader = new MemoryStream(ba))
using (GZipStream zipStream = new GZipStream(msReader, CompressionMode.Decompress))
{
// Read from zipStream instead of msReader
}
To account for the valid comment by flindenberg, you can also open the file directly without having to read the entire file into memory first:
string fn = @"c:\\MyZippedBinaryFile.GZ";
using (FileStream stream = File.OpenRead(fn))
using (GZipStream zipStream = new GZipStream(stream, CompressionMode.Decompress))
{
// Read from zipStream instead of stream
}
You need to end up with a memory stream? No problem:
string fn = @"c:\\MyZippedBinaryFile.GZ";
using (FileStream stream = File.OpenRead(fn))
using (GZipStream zipStream = new GZipStream(stream, CompressionMode.Decompress))
using (MemoryStream ms = new MemoryStream())
{
zipStream.CopyTo(ms);
ms.Seek(0, SeekOrigin.Begin); // don't forget to rewind the stream!
// Read from ms
}
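Since the question asks for a MemoryStream to hand to a deserializer, the last snippet can be wrapped in a small helper. This is only a sketch: the class and method names (GzipFileReader, ReadGzipToMemory, WriteGzip) are made up for illustration, and WriteGzip exists only so the reader can be tried against a known file.

```csharp
using System.IO;
using System.IO.Compression;

static class GzipFileReader
{
    // Decompresses the .gz file at 'path' into a rewound MemoryStream.
    // The caller owns the returned stream and should dispose it.
    public static MemoryStream ReadGzipToMemory(string path)
    {
        var ms = new MemoryStream();
        using (FileStream file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        {
            gzip.CopyTo(ms);
        }
        ms.Position = 0; // rewind so the deserializer starts at the beginning
        return ms;
    }

    // Hypothetical counterpart, used only to produce a test file.
    public static void WriteGzip(string path, byte[] data)
    {
        using (FileStream file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            gzip.Write(data, 0, data.Length);
        }
    }
}
```

The returned stream is deliberately created outside the using statements so that it is still open when the method returns.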

Related

C# equivalent to zlib.decompress

What is the equivalent of the Python function zlib.decompress() in C#? I need to decompress some zlib files using C# and I don't know how to do it.
Python example:
import zlib
file = open("myfile", mode = "rb")
data = zlib.decompress(file.read())
uncompressed_output = open("output_file", mode = "wb")
uncompressed_output.write(data)
I tried using the System.IO.Compression.DeflateStream class, but for every file I try it gives me an exception that the file contains invalid data while decoding.
byte[] binary = new byte[1000000];
using (DeflateStream compressed_file = new DeflateStream(new FileStream(@"myfile", FileMode.Open, FileAccess.Read), CompressionMode.Decompress))
compressed_file.Read(binary, 0, 1000000); //exception here
using (BinaryWriter outputFile = new BinaryWriter(new FileStream(@"output_file", FileMode.Create, FileAccess.Write)))
outputFile.Write(binary);
//Reading the file like normal with a BinaryReader and then turning it into a MemoryStream also didn't work
I should probably mention that the files are ZLIB compressed files. They start with the 78 9C header.
So, luckily, I found this post: https://stackoverflow.com/a/33855097/10505778
Basically, the file must be stripped of its 2 header bytes (78 9C). The second byte (9C) does matter for decompression, since it specifies whether a preset dictionary has been used, but I don't need it here, and I am pretty sure it would not be difficult to modify this to accommodate it:
// Requires: using System.IO; using System.IO.Compression;
byte[] binary = File.ReadAllBytes(@"myfile"); // read the entire file
byte[] decompressed;
// Skip the 2 zlib header bytes (78 9C); DeflateStream expects raw deflate data
using (MemoryStream memory_stream = new MemoryStream(binary, 2, binary.Length - 2, false))
using (DeflateStream compressed_file = new DeflateStream(memory_stream, CompressionMode.Decompress))
using (MemoryStream output = new MemoryStream())
{
    // CopyTo keeps reading until the deflate data ends, so no output size has to be guessed
    compressed_file.CopyTo(output);
    decompressed = output.ToArray();
}
File.WriteAllBytes(@"output_file", decompressed);
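The header bytes can also be checked rather than blindly discarded. The sketch below follows the zlib header layout from RFC 1950 (compression method in the low nibble of the first byte, the FDICT flag in bit 5 of the second byte, and the two bytes together forming a multiple of 31). The ZlibHelper class is a made-up name, and the code assumes no preset dictionary is in use; it throws instead of handling one.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZlibHelper
{
    // Decompresses a zlib-wrapped buffer by validating and skipping the
    // 2-byte header, then inflating the raw deflate data that follows.
    public static byte[] Decompress(byte[] zlibData)
    {
        if (zlibData == null || zlibData.Length < 2)
            throw new InvalidDataException("Too short to be zlib data.");

        int cmf = zlibData[0];
        int flg = zlibData[1];

        // The compression method must be 8 (deflate), and the 16-bit
        // header value must be a multiple of 31.
        if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("Not a valid zlib header.");

        // Bit 5 of the second byte (FDICT) signals a preset dictionary,
        // which this sketch does not support.
        if ((flg & 0x20) != 0)
            throw new NotSupportedException("Preset dictionary not supported.");

        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2, false))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output); // the trailing Adler-32 checksum is not verified
            return output.ToArray();
        }
    }
}
```

For the 78 9C header, 0x789C is 30876, which is 31 × 996, and bit 5 of 9C is clear, so the check passes. On .NET 6 and later, System.IO.Compression.ZLibStream should make the manual header handling unnecessary.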

How to Compress Large Files C#

I am using this method to compress files and it works great until I get to a file that is 2.4 GB then it gives me an overflow error:
void CompressThis (string inFile, string compressedFileName)
{
FileStream sourceFile = File.OpenRead(inFile);
FileStream destinationFile = File.Create(compressedFileName);
byte[] buffer = new byte[sourceFile.Length];
sourceFile.Read(buffer, 0, buffer.Length);
using (GZipStream output = new GZipStream(destinationFile,
CompressionMode.Compress))
{
output.Write(buffer, 0, buffer.Length);
}
// Close the files.
sourceFile.Close();
destinationFile.Close();
}
What can I do to compress huge files?
You should not read the whole file into memory. Use Stream.CopyTo instead. This method reads the bytes from the current stream and writes them to another stream using a specified buffer size (81920 bytes by default).
Also, you don't need to close Stream objects explicitly if you wrap them in using statements.
void CompressThis (string inFile, string compressedFileName)
{
using (FileStream sourceFile = File.OpenRead(inFile))
using (FileStream destinationFile = File.Create(compressedFileName))
using (GZipStream output = new GZipStream(destinationFile, CompressionMode.Compress))
{
sourceFile.CopyTo(output);
}
}
You can find a more complete example on Microsoft Docs (formerly MSDN).
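To make the buffering explicit, the copy can also be written as a manual read/write loop. This is a rough sketch of what a chunked copy looks like, not the framework's actual implementation; ChunkedCompressor and its bufferSize parameter are names chosen for illustration.

```csharp
using System.IO;
using System.IO.Compression;

static class ChunkedCompressor
{
    // Compresses inFile into compressedFileName one buffer at a time,
    // so memory use stays near bufferSize regardless of the file size.
    public static void Compress(string inFile, string compressedFileName, int bufferSize = 81920)
    {
        using (FileStream source = File.OpenRead(inFile))
        using (FileStream destination = File.Create(compressedFileName))
        using (var gzip = new GZipStream(destination, CompressionMode.Compress))
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            // Read returns 0 only at the end of the stream; it may return
            // fewer bytes than requested, so only 'read' bytes are written.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, read);
            }
        }
    }
}
```

Because only one buffer is alive at a time, the 2.4 GB file from the question never has to fit into a single byte array.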
You're trying to load the entire file into memory. That isn't necessary: you can feed the input stream directly into the output stream.
An alternative solution for the zip format that avoids buffering the whole file in memory:
using (var sourceFileStream = new FileStream(this.GetFilePath(sourceFileName), FileMode.Open))
{
using (var destinationStream =
new FileStream(this.GetFilePath(zipFileName), FileMode.Create, FileAccess.ReadWrite))
{
using (var archive = new ZipArchive(destinationStream, ZipArchiveMode.Create, true))
{
var file = archive.CreateEntry(sourceFileName, CompressionLevel.Optimal);
using (var entryStream = file.Open())
{
var fileStream = sourceFileStream;
// CopyTo returns void and cannot be awaited; CopyToAsync needs an enclosing async method
await fileStream.CopyToAsync(entryStream);
}
}
}
}
This solution writes directly from the input stream to the output stream.

How to convert a compressed stream to an uncompressed stream in c# using GZipStream

Following various samples I've been able to convert a memory stream to a compressed stream and then to a byte array to save in a database but I'm having trouble going the other way. Here's what I've got so far...
...
using (MemoryStream compressedStream = new MemoryStream()) {
...some code that builds the compressedStream for an undetermined
number of byteArrays from a database
using (MemoryStream uncompressedStream = new MemoryStream()) {
// method 1
using (GZipStream unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress)) {
unzippedStream.CopyTo(uncompressedStream);
}
// method 2
using (GZipStream unzippedStream = new GZipStream(uncompressedStream, CompressionMode.Decompress)) {
compressedStream.CopyTo(unzippedStream);
}
... do something with uncompressedStream
}
}
Method 1 seems to follow the examples I see on here but causes the error "stream does not support writing".
Method 2 seems to make more sense, but the uncompressed stream is always empty.
P.S. Really what I would like to have is something simple like
MemoryStream compressed = GZipStream(uncompressed, Compress)
MemoryStream uncompressed = GZipStream(compressed, Decompress)
This code example works. The first part just produces a compressed byte array. The second part shows how the compressed stream can be built up in code, with as many writes as needed, as long as the position is set back to 0 before decompressing.
byte[] compressed;
string output;
using (var outStream = new MemoryStream()) {
using (var tinyStream = new GZipStream(outStream, CompressionMode.Compress))
using (var mStream = new MemoryStream(Encoding.UTF8.GetBytes("This is a test"))) {
mStream.CopyTo(tinyStream);
}
compressed = outStream.ToArray();
}
using (var compressedStream = new MemoryStream()) {
// can do multiple writes here to create the compressed stream
compressedStream.Write(compressed, 0, compressed.Length);
compressedStream.Flush();
compressedStream.Position = 0;
using (var unzippedStream = new GZipStream(compressedStream, CompressionMode.Decompress))
using (var uncompressedStream = new MemoryStream()) {
unzippedStream.CopyTo(uncompressedStream);
output = Encoding.UTF8.GetString(uncompressedStream.ToArray());
}
}
Console.WriteLine(output);
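The question's P.S. asks for something as simple as a compress call and a decompress call. One way to get close is a pair of byte-array helpers; this is a minimal sketch, and the GzipBytes class name is invented for illustration.

```csharp
using System.IO;
using System.IO.Compression;

static class GzipBytes
{
    // Returns the gzip-compressed form of 'data'.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final block and the footer
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Returns the original bytes for a gzip-compressed buffer.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Working with byte arrays sidesteps the stream-position problem entirely, which fits the case where the compressed data is stored in a database column anyway.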

GZipStream works when writing to FileStream, but not MemoryStream

If I compress some JSON text and write it to a file using a FileStream, I get the expected results. However, I do not want to write to disk. I simply want a MemoryStream of the compressed data.
Compression to FileStream:
string json = Resource1.json;
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (FileStream output = File.Create(@"C:\Users\roarker\Desktop\output.json.gz"))
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
}
}
Above works. Below, the output memory stream is length 10 and results in an empty .gz file.
string json = Resource1.json;
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (MemoryStream output = new MemoryStream())
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
byte[] bytes = output.ToArray();
}
}
EDIT:
Moving output.ToArray() outside the inner using clause seems to work. However, this closes the output stream for most usage, i.e.:
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (MemoryStream output = new MemoryStream())
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
}
WriteToFile(output);
}
where :
public static void WriteToFile(Stream stream)
{
using (FileStream output = File.Create(@"C:\Users\roarker\Desktop\output.json.gz"))
{
stream.CopyTo(output);
}
}
This will fail on stream.CopyTo because the stream has been closed. I know I could make a new Stream from the bytes of output.ToArray(), but why is this necessary? Why does ToArray() work when the stream is closed?
Final Edit:
I just needed to use the constructor of the GZipStream with the leaveOpen parameter.
You're calling ToArray() before you've closed the GZipStream... that means it hasn't had a chance to flush the final bits of its buffer. This is a common issue for compression and encryption streams, where closing the stream needs to write some final pieces of data. (Even calling Flush() explicitly won't help, for example.)
Just move the ToArray call:
using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(json)))
using (MemoryStream output = new MemoryStream())
{
using (GZipStream compression = new GZipStream(output, CompressionMode.Compress))
{
input.CopyTo(compression);
}
byte[] bytes = output.ToArray();
// Use bytes
}
(Note that the MemoryStream will already have been disposed by the time you call ToArray, but that's okay: ToArray still works on a closed MemoryStream.)
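The leaveOpen constructor mentioned in the question's final edit can be sketched as follows. The LeaveOpenExample class and CompressToStream method are made-up names; the three-argument GZipStream constructor is the real one.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class LeaveOpenExample
{
    // Compresses 'text' and returns a MemoryStream that is still open and rewound.
    public static MemoryStream CompressToStream(string text)
    {
        var output = new MemoryStream();
        using (var input = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            input.CopyTo(gzip);
        } // gzip is disposed here, which writes the footer but leaves 'output' open
        output.Position = 0; // rewind so a later CopyTo starts from the first byte
        return output;
    }
}
```

With the stream left open and rewound, passing it to a method like the question's WriteToFile should work without creating a second stream from the byte array.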

MemoryStream CopyTo only partially writing

I'm about to lose my freaking mind. I've been trying to get GzipStream to compress a string for the past hour, but for whatever reason, it refuses to write the entire byte array to the memory stream. At first I thought it had something to do with the using statements, but even after removing them it didn't seem to make a difference.
Initial config:
var str = "Here is a relatively simple string to compress";
byte[] compressedBytes;
string returnedData;
var bytes = Encoding.UTF8.GetBytes(str);
Works correctly (writes 64 length byte array):
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
msi.CopyTo(gs);
}
compressedBytes = mso.ToArray();
}
Fails (writes 10 length byte array):
using(var mso = new MemoryStream())
using(var msi = new MemoryStream(bytes))
using(var zip = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(zip);
compressedBytes = mso.ToArray();
}
Also fails (writes 10 length byte array):
var mso = new MemoryStream();
var msi = new MemoryStream(bytes);
var zip = new GZipStream(mso, CompressionMode.Compress);
msi.CopyTo(zip);
compressedBytes = mso.ToArray();
Can somebody explain why the first one works but in the other two I'm getting these incomplete arrays? Is something getting disposed out from under me? For that matter, is there a way for me to avoid using two memorystreams?
Thanks,
Zoombini
System.IO.Compression.GZipStream has to be closed (disposed) before you can use the underlying stream, because:
It is block oriented, so compressed data may still be sitting in its internal buffer.
It has to write the footer, including the checksum (see the file format description on Wikipedia).
You're trying to get the compressed data before the GZipStream is closed. This doesn't return the full data, as you've seen. The first one works because you're calling compressedBytes = mso.ToArray(); after the GZipStream has been disposed. So, untested but in theory, you should be able to modify your second snippet slightly, like this, to get it to work.
using(var mso = new MemoryStream())
{
using(var msi = new MemoryStream(bytes))
using(var zip = new GZipStream(mso, CompressionMode.Compress))
{
msi.CopyTo(zip);
}
compressedBytes = mso.ToArray();
}
As others have said, you need to close the GZipStream before you can get the full data. A using statement will cause the Dispose method to be called on the stream at the end of the block, which will close the stream if it is not already closed. All of your examples above will work as expected if you place zip.Close(); after msi.CopyTo(zip);.
You can eliminate one of the MemoryStreams if you write it this way:
using (MemoryStream mso = new MemoryStream())
{
using (GZipStream zip = new GZipStream(mso, CompressionMode.Compress))
{
zip.Write(bytes, 0, bytes.Length);
}
compressedBytes = mso.ToArray();
}
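The footer mentioned above can be inspected directly. According to the gzip format (RFC 1952), a member ends with a CRC-32 followed by ISIZE, the uncompressed length modulo 2^32, stored little-endian. The sketch below reads ISIZE from a finished array; the GzipFooter class is a made-up name, and it assumes the array holds a single complete gzip member.

```csharp
using System.IO;
using System.IO.Compression;

static class GzipFooter
{
    // Compresses 'data', disposing the GZipStream before reading the result
    // so that the footer has been written.
    public static byte[] Compress(byte[] data)
    {
        using (var mso = new MemoryStream())
        {
            using (var zip = new GZipStream(mso, CompressionMode.Compress))
            {
                zip.Write(data, 0, data.Length);
            }
            return mso.ToArray();
        }
    }

    // Reads ISIZE from the last four bytes of a complete gzip member.
    public static uint ReadUncompressedSize(byte[] gzipData)
    {
        // 10 header bytes plus 8 footer bytes is the bare minimum.
        if (gzipData == null || gzipData.Length < 18)
            throw new InvalidDataException("Too short to be a complete gzip member.");

        int n = gzipData.Length;
        return (uint)(gzipData[n - 4]
                    | gzipData[n - 3] << 8
                    | gzipData[n - 2] << 16
                    | gzipData[n - 1] << 24);
    }
}
```

An array captured before the GZipStream is disposed would lack these trailing bytes, which is one way to see why the 10-byte results in the question are incomplete.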
