How to Compress Large Files in C#

I am using this method to compress files, and it works great until I get to a file that is 2.4 GB; then it gives me an overflow error:
void CompressThis (string inFile, string compressedFileName)
{
FileStream sourceFile = File.OpenRead(inFile);
FileStream destinationFile = File.Create(compressedFileName);
byte[] buffer = new byte[sourceFile.Length];
sourceFile.Read(buffer, 0, buffer.Length);
using (GZipStream output = new GZipStream(destinationFile,
CompressionMode.Compress))
{
output.Write(buffer, 0, buffer.Length);
}
// Close the files.
sourceFile.Close();
destinationFile.Close();
}
What can I do to compress huge files?

You should not read the whole file into memory. Use Stream.CopyTo instead. This method reads the bytes from the current stream and writes them to another stream using a specified buffer size (81920 bytes by default).
Also, you don't need to close Stream objects explicitly if you use the using keyword.
void CompressThis (string inFile, string compressedFileName)
{
using (FileStream sourceFile = File.OpenRead(inFile))
using (FileStream destinationFile = File.Create(compressedFileName))
using (GZipStream output = new GZipStream(destinationFile, CompressionMode.Compress))
{
sourceFile.CopyTo(output);
}
}
You can find a more complete example on Microsoft Docs (formerly MSDN).
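As a rough sketch of the same idea with an explicit buffer size, plus the matching decompression step: the class name GzipFiles, the method names, and the 1 MB buffer value are illustrative assumptions, not part of the answer above.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipFiles
{
    // Hypothetical buffer size chosen for illustration; CopyTo defaults to 81920 bytes.
    private const int BufferSize = 1024 * 1024;

    public static void Compress(string inFile, string compressedFileName)
    {
        using (FileStream source = File.OpenRead(inFile))
        using (FileStream destination = File.Create(compressedFileName))
        using (GZipStream gzip = new GZipStream(destination, CompressionMode.Compress))
        {
            // Streams the file chunk by chunk instead of loading it all at once.
            source.CopyTo(gzip, BufferSize);
        }
    }

    public static void Decompress(string compressedFileName, string outFile)
    {
        using (FileStream source = File.OpenRead(compressedFileName))
        using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
        using (FileStream destination = File.Create(outFile))
        {
            // Reading from the GZipStream yields the decompressed bytes.
            gzip.CopyTo(destination, BufferSize);
        }
    }
}
```

Because neither method ever holds more than one buffer of data, the file size is limited by disk space rather than by the maximum array length.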

You're trying to load the entire file into memory. That isn't necessary: you can feed the input stream directly into the output stream.

An alternative solution for the zip format, without loading the whole file into memory:
using (var sourceFileStream = new FileStream(this.GetFilePath(sourceFileName), FileMode.Open))
{
using (var destinationStream =
new FileStream(this.GetFilePath(zipFileName), FileMode.Create, FileAccess.ReadWrite))
{
using (var archive = new ZipArchive(destinationStream, ZipArchiveMode.Create, true))
{
var file = archive.CreateEntry(sourceFileName, CompressionLevel.Optimal);
using (var entryStream = file.Open())
{
// CopyTo returns void and cannot be awaited; CopyToAsync returns a Task.
// The enclosing method must be declared async.
await sourceFileStream.CopyToAsync(entryStream);
}
}
}
}
This solution writes directly from the input stream to the output stream.

Related

gzipstream memory stream to file

I am trying to compress JSON files using Gzip compression to be sent to another location. It needs to process 5,000 - 10,000 files daily, and I don't need the compressed version of the file on the local machine (they are actually being transferred to AWS S3 for long-term archiving).
Since I don't need them, I am trying to compress to a memory stream and then use that to write to AWS, rather than compress each one to disk. Whenever I try to do this, the files are broken (as in, when I open them in 7-Zip and try to open the JSON file inside, I get "Data error File is Broken").
The same thing happens when I try to write the memory stream to a local file, so I'm trying to solve that for now. Here's the code:
string[] files = Directory.GetFiles(@"C:\JSON_Logs");
foreach(string file in files)
{
FileInfo fileToCompress = new FileInfo(file);
using (FileStream originalFileStream = fileToCompress.OpenRead())
{
using (MemoryStream compressedMemStream = new MemoryStream())
{
using (GZipStream compressionStream = new GZipStream(compressedMemStream, CompressionMode.Compress))
{
originalFileStream.CopyTo(compressionStream);
compressedMemStream.Seek(0, SeekOrigin.Begin);
FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz");
//Eventually this will be the AWS transfer, but that's not important here
compressedMemStream.WriteTo(compressedFileStream);
}
}
}
}
Rearrange your using statements so the GZipStream is definitely done by the time you read the memory stream contents:
foreach(string file in files)
{
FileInfo fileToCompress = new FileInfo(file);
using (MemoryStream compressedMemStream = new MemoryStream())
{
using (FileStream originalFileStream = fileToCompress.OpenRead())
using (GZipStream compressionStream = new GZipStream(
compressedMemStream,
CompressionMode.Compress,
leaveOpen: true))
{
originalFileStream.CopyTo(compressionStream);
}
compressedMemStream.Seek(0, SeekOrigin.Begin);
//Eventually this will be the AWS transfer, but that's not important here
using (FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz"))
{
compressedMemStream.WriteTo(compressedFileStream);
}
}
}
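Disposing a stream takes care of flushing and closing it. For a GZipStream, that is when the final compressed block and the gzip footer are written, and leaveOpen: true keeps the underlying MemoryStream usable afterwards.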
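If only the compressed bytes are needed (for example, to hand to an upload call), the same ordering can be wrapped in a small helper. This is a minimal sketch; the class name InMemoryGzip and the method name CompressToBytes are made up for illustration.

```csharp
using System.IO;
using System.IO.Compression;

public static class InMemoryGzip
{
    // Returns the gzip-compressed contents of a file as a byte array.
    public static byte[] CompressToBytes(string path)
    {
        using (MemoryStream compressed = new MemoryStream())
        {
            using (FileStream original = File.OpenRead(path))
            using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
            {
                original.CopyTo(gzip);
            } // Disposing gzip here writes the last block and the gzip footer.

            // The compressed data is complete only after the GZipStream is disposed.
            return compressed.ToArray();
        }
    }
}
```

ToArray copies the whole buffer regardless of the stream position, so no Seek is needed before calling it.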

How can I use ReadAllLines with a gzipped file

Is there a way to use the one-liner ReadAllLines on a gzipped file?
var pnDates = File.ReadAllLines(@"C:\myfile.gz");
Can I put a GZipStream wrapper around the file somehow?
No, File.ReadAllLines() treats the specified file as a text file, and a gzipped file isn't one. It's trivial to do it yourself:
public IEnumerable<string> ReadAllZippedLines(string filename)
{
using (var fileStream = File.OpenRead(filename))
{
using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
{
using (var reader = new StreamReader(gzipStream))
{
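// Yield every line, not just the first; ReadLine returns null at end of stream.
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}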
}
}
}
}
There is no such thing built-in. You'll have to write yourself a small utility function.
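Such a utility function might look like the following sketch, which returns an array the way File.ReadAllLines does. The class name GzipText is hypothetical, and the code assumes the decompressed content is text in an encoding StreamReader detects by default (UTF-8 unless a byte order mark says otherwise).

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class GzipText
{
    // Reads every line of a gzipped text file into an array, like File.ReadAllLines.
    public static string[] ReadAllLines(string path)
    {
        var lines = new List<string>();
        using (var fileStream = File.OpenRead(path))
        using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzipStream))
        {
            string line;
            // ReadLine returns null once the decompressed stream is exhausted.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }
}
```

With this in place, the call from the question becomes var pnDates = GzipText.ReadAllLines(@"C:\myfile.gz");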
You'd have to inflate the file first, as the gzip algorithm deals with byte data rather than text and incorporates a CRC. This should work for you:
EDIT: I can't comment for some reason, so this is for the bytesToCompress question.
byte[] decompressedBytes;
using (FileStream fileToDecompress = File.Open(@"C:\myfile.gz", FileMode.Open))
{
using (GZipStream decompressionStream = new GZipStream(fileToDecompress, CompressionMode.Decompress))
using (MemoryStream decompressed = new MemoryStream())
{
// Copy until end of stream; a single Read into a fixed 4096-byte buffer would truncate larger files.
decompressionStream.CopyTo(decompressed);
decompressedBytes = decompressed.ToArray();
}
}
var pnDates = System.Text.Encoding.UTF8.GetString(decompressedBytes);

Saving a MemoryStream to a FileStream as ASCII

I have a memory stream that writes to a file stream. I need to change the code below to save the memory stream as ASCII.
using (var ms = new MemoryStream())
{
//...DownloadFile(file, ms);
using (var fs = File.Create(file))
{
ms.WriteTo(fs);
}
}
Use WriteAllBytes:
File.WriteAllBytes(file, ms.ToArray());
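Note that WriteAllBytes writes the bytes unchanged, so the file is ASCII only if the stream already contains ASCII. If the bytes need to be re-encoded, one possible sketch is below. It assumes the stream holds UTF-8 text, which the question does not state, and the names AsciiSaver and SaveAsAscii are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class AsciiSaver
{
    public static void SaveAsAscii(MemoryStream ms, string file)
    {
        // Assumption: the stream holds UTF-8 encoded text.
        string text = Encoding.UTF8.GetString(ms.ToArray());

        // Encoding.ASCII replaces characters outside the ASCII range with '?'.
        File.WriteAllText(file, text, Encoding.ASCII);
    }
}
```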

System.OutOfMemoryException thrown while doing GZipStream compression

I am working in WinForms and getting errors while doing the following operation.
It throws a System.OutOfMemoryException when I try to run the operation 2-3 times in a row. It seems .NET is not able to free the resources used in the operation. The file I am using is quite big, more than 500 MB.
My sample code is below. Please help me resolve the error.
try
{
using (FileStream target = new FileStream(strCompressedFileName, FileMode.Create, FileAccess.Write))
using (GZipStream alg = new GZipStream(target, CompressionMode.Compress))
{
byte[] data = File.ReadAllBytes(strFileToBeCompressed);
alg.Write(data, 0, data.Length);
alg.Flush();
data = null;
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
Replace ReadAllBytes with Stream.CopyTo:
using (FileStream target = new FileStream(strCompressedFileName, FileMode.Create, FileAccess.Write))
using (GZipStream alg = new GZipStream(target, CompressionMode.Compress))
{
using (var fileToRead = File.Open(.....))
{
fileToRead.CopyTo(alg);
}
}
A very rough example could be:
// destFile - FileStream for the destination file
// sourceFile - FileStream for the source file
using (GZipStream gz = new GZipStream(destFile, CompressionMode.Compress))
{
byte[] src = new byte[1024];
int count = sourceFile.Read(src, 0, 1024);
while (count != 0)
{
gz.Write(src, 0, count );
count = sourceFile.Read(src, 0, 1024);
}
}
// flush, close, dispose ..
So basically I changed your ReadAllBytes to read only chunks of 1024 bytes.
You can try this method, taken from the MSDN example, to compress a file:
public static void Compress(FileInfo fileToCompress)
{
using (FileStream originalFileStream = fileToCompress.OpenRead())
{
using (FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz"))
{
using (GZipStream compressionStream = new GZipStream(compressedFileStream, CompressionMode.Compress))
{
originalFileStream.CopyTo(compressionStream);
}
}
}
}
Usage:
string directoryPath = @"c:\users\public\reports";
DirectoryInfo directorySelected = new DirectoryInfo(directoryPath);
foreach (FileInfo fileToCompress in directorySelected.GetFiles())
{
Compress(fileToCompress);
}

Create new FileStream out of a byte array

I am attempting to create a new FileStream object from a byte array. I'm sure that made no sense at all so I will try to explain in further detail below.
Tasks I am completing:
1) Reading the source file which was previously compressed
2) Decompressing the data using GZipStream
3) copying the decompressed data into a byte array.
What I would like to change:
1) I would like to be able to use File.ReadAllBytes to read the decompressed data.
2) I would then like to create a new FileStream object using this byte array.
In short, I want to do this entire operation using byte arrays. One of the parameters for GZipStream is a stream of some sort, so I figured I was stuck using a FileStream. But if some method exists where I can create a new instance of a FileStream from a byte array, then I should be fine.
Here is what I have so far:
FolderBrowserDialog fbd = new FolderBrowserDialog(); // Shows a browser dialog
fbd.ShowDialog();
// Path to directory of files to compress and decompress.
string dirpath = fbd.SelectedPath;
DirectoryInfo di = new DirectoryInfo(dirpath);
foreach (FileInfo fi in di.GetFiles())
{
zip.Program.Decompress(fi);
}
// Get the stream of the source file.
using (FileStream inFile = fi.OpenRead())
{
//Create the decompressed file.
string outfile = @"C:\Decompressed.exe";
{
using (GZipStream Decompress = new GZipStream(inFile,
CompressionMode.Decompress))
{
byte[] b = new byte[blen.Length];
Decompress.Read(b,0,b.Length);
File.WriteAllBytes(outfile, b);
}
}
}
Thanks for any help!
Regards,
Evan
It sounds like you need to use a MemoryStream.
Since you don't know how many bytes you'll be reading from the GZipStream, you can't really allocate an array for the decompressed data up front. Instead, read the compressed file into a byte array, wrap it in a MemoryStream, and decompress from that in chunks.
const int BufferSize = 65536;
byte[] compressedBytes = File.ReadAllBytes("compressedFilename");
// create memory stream
using (var mstrm = new MemoryStream(compressedBytes))
{
using (var inStream = new GZipStream(mstrm, CompressionMode.Decompress))
{
using (var outStream = File.Create("outputfilename"))
{
var buffer = new byte[BufferSize];
int bytesRead;
while ((bytesRead = inStream.Read(buffer, 0, BufferSize)) != 0)
{
outStream.Write(buffer, 0, bytesRead);
}
}
}
}
Here is what I ended up doing. I realize that I did not give sufficient information in my question - and I apologize for that - but I do know the size of the file I need to decompress as I am using it earlier in my program. This buffer is referred to as "blen".
string fi = @"C:\Path To Compressed File";
// Get the stream of the source file.
// using (FileStream inFile = fi.OpenRead())
using (MemoryStream infile1 = new MemoryStream(File.ReadAllBytes(fi)))
{
//Create the decompressed file.
string outfile = @"C:\Decompressed.exe";
{
using (GZipStream Decompress = new GZipStream(infile1,
CompressionMode.Decompress))
{
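byte[] b = new byte[blen.Length];
// A single Read call may return fewer bytes than requested, so loop until the buffer is full.
int offset = 0;
int read;
while (offset < b.Length && (read = Decompress.Read(b, offset, b.Length - offset)) > 0)
{
offset += read;
}
File.WriteAllBytes(outfile, b);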
}
}
}
