C# Decompress .GZip to file

C# Decompress .GZip to file - c#

I have this Code
using System.IO;
using System.IO.Compression;
...
UnGzip2File("input.gz","output.xls");
Which run this procedure, it runs without error but after it, the input.gz is empty and created output.xls is also empty. At the start input.gz had 12MB. What am i doing wrong ? Or have you better/functional solution ?
public static void UnGzip2File(string inputPath, string outputPath)
{
FileStream inputFileStream = new FileStream(inputPath, FileMode.Create);
FileStream outputFileStream = new FileStream(outputPath, FileMode.Create);
using (GZipStream gzipStream = new GZipStream(inputFileStream, CompressionMode.Decompress))
{
byte[] bytes = new byte[4096];
int n;
// To be sure the whole file is correctly read,
// you should call FileStream.Read method in a loop,
// even if in the most cases the whole file is read in a single call of FileStream.Read method.
while ((n = gzipStream.Read(bytes, 0, bytes.Length)) != 0)
{
outputFileStream.Write(bytes, 0, n);
}
}
outputFileStream.Dispose();
inputFileStream.Dispose();
}

Opening the FileStream with the FileMode.Create will overwrite the existing file as documented here. This will cause the file to be empty when you try to decompress it, which in turn leads to an empty output-file.
Below is a working code sample, note that it is async, this can be changed by leaving out async/await and changing the call to the regular CopyTo-method and changing the return type to void.
public static async Task DecompressGZip(string inputPath, string outputPath)
{
using (var input = File.OpenRead(inputPath))
using (var output = File.OpenWrite(outputPath))
using (var gz = new GZipStream(input, CompressionMode.Decompress))
{
await gz.CopyToAsync(output);
}
}

Related

Best way to read a short array from disk in C#?

I have to write 4GB short[] arrays to and from disk, so I have found a function to write the arrays, and I am struggling to write the code to read the array from the disk. I normally code in other languages so please forgive me if my attempt is a bit pathetic so far:
using UnityEngine;
using System.Collections;
using System.IO;
public class RWShort : MonoBehaviour {
public static void WriteShortArray(short[] values, string path)
{
using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
{
using (BinaryWriter bw = new BinaryWriter(fs))
{
foreach (short value in values)
{
bw.Write(value);
}
}
}
} //Above is fine, here is where I am confused:
public static short[] ReadShortArray(string path)
{
byte[] thisByteArray= File.ReadAllBytes(fileName);
short[] thisShortArray= new short[thisByteArray.length/2];
for (int i = 0; i < 10; i+=2)
{
thisShortArray[i]= ? convert from byte array;
}
return thisShortArray;
}
}

Shorts are two bytes, so you have to read in two bytes each time. I'd also recommend using a yield return like this so that you aren't trying to pull everything into memory in one go. Though if you need all of the shorts together that won't help you.. depends on what you're doing with it I guess.
void Main()
{
short[] values = new short[] {
1, 999, 200, short.MinValue, short.MaxValue
};
WriteShortArray(values, #"C:\temp\shorts.txt");
foreach (var shortInfile in ReadShortArray(#"C:\temp\shorts.txt"))
{
Console.WriteLine(shortInfile);
}
}
public static void WriteShortArray(short[] values, string path)
{
using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
{
using (BinaryWriter bw = new BinaryWriter(fs))
{
foreach (short value in values)
{
bw.Write(value);
}
}
}
}
public static IEnumerable<short> ReadShortArray(string path)
{
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
using (BinaryReader br = new BinaryReader(fs))
{
byte[] buffer = new byte[2];
while (br.Read(buffer, 0, 2) > 0)
yield return (short)(buffer[0]|(buffer[1]<<8));
}
}
You could also define it this way, taking advantage of the BinaryReader:
public static IEnumerable<short> ReadShortArray(string path)
{
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
using (BinaryReader br = new BinaryReader(fs))
{
while (br.BaseStream.Position < br.BaseStream.Length)
yield return br.ReadInt16();
}
}

Memory-mapping the file is your friend, there's a MemoryMappedViewAccessor.ReadInt16 function that will allow you to directly read the data, with type short, out of the OS disk cache. Also a Write() overload that accepts an Int16. Also ReadArray and WriteArray functions if you are calling functions that need a traditional .NET array.
Overview of using Memory-mapped files in .NET on MSDN
If you want to do it with ordinary file I/O, use a block size of 1 or 2 megabytes and the Buffer.BlockCopy function to move data en masse between byte[] and short[], and use the FileStream functions that accept a byte[]. Forget about BinaryWriter or BinaryReader, forget about doing 2 bytes at a time.
It's also possible to do the I/O directly into a .NET array with the help of p/invoke, see my answer using ReadFile and passing the FileStream object's SafeFileHandle property here But even though this has no extra copies, it still shouldn't keep up with the memory-mapped ReadArray and WriteArray calls.

C# - How can I download a zip file from url, unzip it, and read the extracted files, all in memory? [duplicate]

I have files (from 3rd parties) that are being FTP'd to a directory on our server. I download them and process them even 'x' minutes. Works great.
Now, some of the files are .zip files. Which means I can't process them. I need to unzip them first.
FTP has no concept of zip/unzipping - so I'll need to grab the zip file, unzip it, then process it.
Looking at the MSDN zip api, there seems to be no way i can unzip to a memory stream?
So is the only way to do this...
Unzip to a file (what directory? need some -very- temp location ...)
Read the file contents
Delete file.
NOTE: The contents of the file are small - say 4k <-> 1000k.

Zip compression support is built in:
using System.IO;
using System.IO.Compression;
// ^^^ requires a reference to System.IO.Compression.dll
static class Program
{
const string path = ...
static void Main()
{
using(var file = File.OpenRead(path))
using(var zip = new ZipArchive(file, ZipArchiveMode.Read))
{
foreach(var entry in zip.Entries)
{
using(var stream = entry.Open())
{
// do whatever we want with stream
// ...
}
}
}
}
}
Normally you should avoid copying it into another stream - just use it "as is", however, if you absolutely need it in a MemoryStream, you could do:
using(var ms = new MemoryStream())
{
stream.CopyTo(ms);
ms.Position = 0; // rewind
// do something with ms
}

You can use ZipArchiveEntry.Open to get a stream.
This code assumes the zip archive has one text file.
using (FileStream fs = new FileStream(path, FileMode.Open))
using (ZipArchive zip = new ZipArchive(fs) )
{
var entry = zip.Entries.First();
using (StreamReader sr = new StreamReader(entry.Open()))
{
Console.WriteLine(sr.ReadToEnd());
}
}

using (ZipArchive archive = new ZipArchive(webResponse.GetResponseStream()))
{
foreach (ZipArchiveEntry entry in archive.Entries)
{
Stream s = entry.Open();
var sr = new StreamReader(s);
var myStr = sr.ReadToEnd();
}
}

Looks like here is what you need:
using (var za = ZipFile.OpenRead(path))
{
foreach (var entry in za.Entries)
{
using (var r = new StreamReader(entry.Open()))
{
//your code here
}
}
}

You can use SharpZipLib among a variety of other libraries to achieve this.
You can use the following code example to unzip to a MemoryStream, as shown on their wiki:
using ICSharpCode.SharpZipLib.Zip;
// Compresses the supplied memory stream, naming it as zipEntryName, into a zip,
// which is returned as a memory stream or a byte array.
//
public MemoryStream CreateToMemoryStream(MemoryStream memStreamIn, string zipEntryName) {
MemoryStream outputMemStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemStream);
zipStream.SetLevel(3); //0-9, 9 being the highest level of compression
ZipEntry newEntry = new ZipEntry(zipEntryName);
newEntry.DateTime = DateTime.Now;
zipStream.PutNextEntry(newEntry);
StreamUtils.Copy(memStreamIn, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false; // False stops the Close also Closing the underlying stream.
zipStream.Close(); // Must finish the ZipOutputStream before using outputMemStream.
outputMemStream.Position = 0;
return outputMemStream;
// Alternative outputs:
// ToArray is the cleaner and easiest to use correctly with the penalty of duplicating allocated memory.
byte[] byteArrayOut = outputMemStream.ToArray();
// GetBuffer returns a raw buffer raw and so you need to account for the true length yourself.
byte[] byteArrayOut = outputMemStream.GetBuffer();
long len = outputMemStream.Length;
}

Ok so combining all of the above, suppose you want to in a very simple way take a zip file called
"file.zip" and extract it to "C:\temp" folder. (Note: This example was only tested for compress text files) You may need to do some modifications for binary files.
using System.IO;
using System.IO.Compression;
static void Main(string[] args)
{
//Call it like this:
Unzip("file.zip",#"C:\temp");
}
static void Unzip(string sourceZip, string targetPath)
{
using (var z = ZipFile.OpenRead(sourceZip))
{
foreach (var entry in z.Entries)
{
using (var r = new StreamReader(entry.Open()))
{
string uncompressedFile = Path.Combine(targetPath, entry.Name);
File.WriteAllText(uncompressedFile,r.ReadToEnd());
}
}
}
}

How to Compress Large Files C#

I am using this method to compress files and it works great until I get to a file that is 2.4 GB then it gives me an overflow error:
void CompressThis (string inFile, string compressedFileName)
{
FileStream sourceFile = File.OpenRead(inFile);
FileStream destinationFile = File.Create(compressedFileName);
byte[] buffer = new byte[sourceFile.Length];
sourceFile.Read(buffer, 0, buffer.Length);
using (GZipStream output = new GZipStream(destinationFile,
CompressionMode.Compress))
{
output.Write(buffer, 0, buffer.Length);
}
// Close the files.
sourceFile.Close();
destinationFile.Close();
}
What can I do to compress huge files?

You should not to write the whole file to into the memory. Use Stream.CopyTo instead. This method reads the bytes from the current stream and writes them to another stream using a specified buffer size (81920 bytes by default).
Also you don't need to close Stream objects if use using keyword.
void CompressThis (string inFile, string compressedFileName)
{
using (FileStream sourceFile = File.OpenRead(inFile))
using (FileStream destinationFile = File.Create(compressedFileName))
using (GZipStream output = new GZipStream(destinationFile, CompressionMode.Compress))
{
sourceFile.CopyTo(output);
}
}
You can find a more complete example on Microsoft Docs (formerly MSDN).

You're trying to allocate all of this into memory. That just isn't necessary, you can feed the input stream directly into the output stream.

Alternative solution for zip format without allocating memory -
using (var sourceFileStream = new FileStream(this.GetFilePath(sourceFileName), FileMode.Open))
{
using (var destinationStream =
new FileStream(this.GetFilePath(zipFileName), FileMode.Create, FileAccess.ReadWrite))
{
using (var archive = new ZipArchive(destinationStream, ZipArchiveMode.Create, true))
{
var file = archive.CreateEntry(sourceFileName, CompressionLevel.Optimal);
using (var entryStream = file.Open())
{
var fileStream = sourceFileStream;
await fileStream.CopyTo(entryStream);
}
}
}
}
The solution will write directly from input stream to output stream

How can I used ReadAllLines with gzipped file

Is there a way to use the one-liner ReadAllLines on a gzipped file?
var pnDates = File.ReadAllLines("C:\myfile.gz");
Can I put GZipStream wrapper around the file some how?

No, File.ReadAllLines() treats the file specified as text file. A zipfile isn't that. It's trivial to do it yourself:
public IEnumerable<string> ReadAllZippedLines(string filename)
{
using (var fileStream = File.OpenRead(filename))
{
using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
{
using (var reader = new StreamReader(gzipStream))
{
yield return reader.ReadLine();
}
}
}
}

There is no such thing built-in. You'll have to write yourself a small utility function.

You'd have to inflate the file first as the algorithm for gzip deals with byte data not text and incorporates a CRC. This should work for you:
EDIT - I cant comment for some reason, so this if for the bytestocompress question
byte[] decompressedBytes = new byte[4096];
using (FileStream fileToDecompress = File.Open("C:\myfile.gz", FileMode.Open))
{
using (GZipStream decompressionStream = new GZipStream(fileToDecompress, CompressionMode.Decompress))
{
decompressionStream.Read(decompressedBytes, 0, bytesToCompress.Length);
}
}
var pnDates = System.Text.Encoding.UTF8.GetString(decompressedBytes);

How can I unzip a file to a .NET memory stream?

I have files (from 3rd parties) that are being FTP'd to a directory on our server. I download them and process them even 'x' minutes. Works great.
Now, some of the files are .zip files. Which means I can't process them. I need to unzip them first.
FTP has no concept of zip/unzipping - so I'll need to grab the zip file, unzip it, then process it.
Looking at the MSDN zip api, there seems to be no way i can unzip to a memory stream?
So is the only way to do this...
Unzip to a file (what directory? need some -very- temp location ...)
Read the file contents
Delete file.
NOTE: The contents of the file are small - say 4k <-> 1000k.

Zip compression support is built in:
using System.IO;
using System.IO.Compression;
// ^^^ requires a reference to System.IO.Compression.dll
static class Program
{
const string path = ...
static void Main()
{
using(var file = File.OpenRead(path))
using(var zip = new ZipArchive(file, ZipArchiveMode.Read))
{
foreach(var entry in zip.Entries)
{
using(var stream = entry.Open())
{
// do whatever we want with stream
// ...
}
}
}
}
}
Normally you should avoid copying it into another stream - just use it "as is", however, if you absolutely need it in a MemoryStream, you could do:
using(var ms = new MemoryStream())
{
stream.CopyTo(ms);
ms.Position = 0; // rewind
// do something with ms
}

You can use ZipArchiveEntry.Open to get a stream.
This code assumes the zip archive has one text file.
using (FileStream fs = new FileStream(path, FileMode.Open))
using (ZipArchive zip = new ZipArchive(fs) )
{
var entry = zip.Entries.First();
using (StreamReader sr = new StreamReader(entry.Open()))
{
Console.WriteLine(sr.ReadToEnd());
}
}

using (ZipArchive archive = new ZipArchive(webResponse.GetResponseStream()))
{
foreach (ZipArchiveEntry entry in archive.Entries)
{
Stream s = entry.Open();
var sr = new StreamReader(s);
var myStr = sr.ReadToEnd();
}
}

Looks like here is what you need:
using (var za = ZipFile.OpenRead(path))
{
foreach (var entry in za.Entries)
{
using (var r = new StreamReader(entry.Open()))
{
//your code here
}
}
}

You can use SharpZipLib among a variety of other libraries to achieve this.
You can use the following code example to unzip to a MemoryStream, as shown on their wiki:
using ICSharpCode.SharpZipLib.Zip;
// Compresses the supplied memory stream, naming it as zipEntryName, into a zip,
// which is returned as a memory stream or a byte array.
//
public MemoryStream CreateToMemoryStream(MemoryStream memStreamIn, string zipEntryName) {
MemoryStream outputMemStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemStream);
zipStream.SetLevel(3); //0-9, 9 being the highest level of compression
ZipEntry newEntry = new ZipEntry(zipEntryName);
newEntry.DateTime = DateTime.Now;
zipStream.PutNextEntry(newEntry);
StreamUtils.Copy(memStreamIn, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false; // False stops the Close also Closing the underlying stream.
zipStream.Close(); // Must finish the ZipOutputStream before using outputMemStream.
outputMemStream.Position = 0;
return outputMemStream;
// Alternative outputs:
// ToArray is the cleaner and easiest to use correctly with the penalty of duplicating allocated memory.
byte[] byteArrayOut = outputMemStream.ToArray();
// GetBuffer returns a raw buffer raw and so you need to account for the true length yourself.
byte[] byteArrayOut = outputMemStream.GetBuffer();
long len = outputMemStream.Length;
}

Ok so combining all of the above, suppose you want to in a very simple way take a zip file called
"file.zip" and extract it to "C:\temp" folder. (Note: This example was only tested for compress text files) You may need to do some modifications for binary files.
using System.IO;
using System.IO.Compression;
static void Main(string[] args)
{
//Call it like this:
Unzip("file.zip",#"C:\temp");
}
static void Unzip(string sourceZip, string targetPath)
{
using (var z = ZipFile.OpenRead(sourceZip))
{
foreach (var entry in z.Entries)
{
using (var r = new StreamReader(entry.Open()))
{
string uncompressedFile = Path.Combine(targetPath, entry.Name);
File.WriteAllText(uncompressedFile,r.ReadToEnd());
}
}
}
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# Decompress .GZip to file - c#

Related

Best way to read a short array from disk in C#?

C# - How can I download a zip file from url, unzip it, and read the extracted files, all in memory? [duplicate]

How to Compress Large Files C#

How can I used ReadAllLines with gzipped file

How can I unzip a file to a .NET memory stream?

Categories

Resources