I am trying to create a multi-part zip file from files totaling about 17 GB, using IonicZip. Each zip part is limited to roughly 500 MB:
zip.MaxOutputSegmentSize = 500000000;
The files that go into the zip vary in size, but none is larger than 350 MB; most are much smaller, just a couple of KB or MB.
The machine where I create the zip file has 4 GB of RAM.
When I start the zip creation in my program, I get an OutOfMemoryException at some point as RAM usage climbs.
(The same code works fine when the total size of all files is about 2 GB instead of 17 GB.)
The code:
ZipFile zip = new ZipFile(zipFileName, Encoding.UTF8);
zip.CompressionLevel = CompressionLevel.BestCompression;
zip.ParallelDeflateThreshold = -1; // https://stackoverflow.com/questions/11981143/compression-fails-when-using-ionic-zip
zip.MaxOutputSegmentSize = 500000000; // around 500MB
[...]
while (...)
{
    // File.ReadAllBytes loads each file completely into memory, and the
    // resulting byte[] is referenced by the ZipFile until Save() runs.
    zip.AddEntry(adjustedFilePath, File.ReadAllBytes(filepath));
}
zip.Save();
I am wondering how IonicZip handles zip.Save() in combination with multi-part creation. It should not be necessary to hold all the parts in memory, only the current one, right?
And since zip.MaxOutputSegmentSize is only around 500 MB and no single input file is larger than 350 MB, I don't see why it should eat up so much memory.
On the other hand, when the OutOfMemoryException occurs, not even a single part of the multi-part archive has been written to disk yet. With a smaller set of files, where the zip creation succeeds, the parts usually appear on the filesystem with creation timestamps roughly 5 seconds apart. So I am really not sure what IonicZip is doing internally before it writes out the first zip part.
Sorry, I'm new to C# and .NET. Is IonicZip the best library for this? Could I use System.IO.Compression or System.IO.Packaging instead (I did not see that they support multi-part zips), or the commercial Xceed?
Posts that I already checked but did not help:
Compression fails when using ionic zip
Ionic zip throws out of memory exception
OutOfMemoryException when creating large ZIP file using System.IO.Packaging (not IonicZip related)
I suggest using the built-in .NET System.IO.Compression classes to zip/unzip files; they are stream-based, whereas your example here does not use streams, so it must be holding the compressed data in RAM. With the native .NET libraries you can write the compressed data out to the output stream in chunks.
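For example, a minimal sketch of the stream-based approach (zipFileName and filePaths stand in for your own variables; note that System.IO.Compression has no built-in multi-part/segment support):

using System.IO;
using System.IO.Compression;

// Stream each source file into the archive; only one buffer's worth of
// data is in flight at a time, never the whole 17 GB.
using (FileStream zipStream = new FileStream(zipFileName, FileMode.Create))
using (ZipArchive archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
{
    foreach (string filePath in filePaths)
    {
        ZipArchiveEntry entry = archive.CreateEntry(
            Path.GetFileName(filePath), CompressionLevel.Optimal);
        using (Stream entryStream = entry.Open())
        using (FileStream source = File.OpenRead(filePath))
        {
            source.CopyTo(entryStream); // copies in small chunks
        }
    }
}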
The solution I came up with, which seems best in terms of effort (I don't want to have to reinvent the wheel and write multi-part logic on top of the standard compression package; this seems such a standard thing that it should be part of any zip library), is to use Xceed with pretty much the same code logic as for IonicZip.
It seems that IonicZip simply does not handle this efficiently in terms of memory. Xceed works fine without any major increase in memory usage.
using Xceed.Zip;
using Xceed.FileSystem;
[...]
ZipArchive zip = new ZipArchive(new DiskFile(zipFileName));
zip.SplitNameFormat = SplitNameFormat.PkZip;
zip.SplitSize = 500000000; // around 500 MB per part
zip.DefaultCompressionLevel = Xceed.Compression.CompressionLevel.Highest;
zip.TempFolder = new DiskFolder(@"C:\temp");
new DiskFolder(localSourceFolder).CopyFilesTo(zip, true, true);
Related
Good day. I've created my own custom wizard installer for my website project. My goal is to minimize the work our client has to do during installation.
I'm trying to extract a 7z file that contains millions of tiny files (around 200 bytes each). I'm using sharpcompress for this extraction, but it looks like it will take hours to finish, which is very bad for the user.
I don't care about compression ratio; what I need is to cut the time it takes to extract these millions of tiny files.
My question is: what is the fastest way to extract millions of tiny files, or is there any way to pack and unpack the files with the highest possible unpacking speed?
I'm extracting the 7z file with this code:
using (SevenZipArchive zipArchive = SevenZipArchive.Open(source7z))
{
zipArchive.WriteToDirectory(destination7z,
new ExtractionOptions { Overwrite = true, ExtractFullPath = true });
}
But the extraction seems very slow for tiny files.
I am writing a 32-bit app, so I only have 4 GB of address space; the files I process can be huge, up to 3.5 GB. What file size should I consider safe before loading a file for processing?
I mean, the .NET framework itself presumably takes some of that limited RAM, so what should be the cutoff for a file (it depends on how much memory the application takes; I just want a ballpark figure)? (I don't have a file of that magnitude; I just want to handle this case before a memory error occurs.)
I suppose what I need is the actual file size rather than the size on disk? And is it possible to find that without opening the file?
You can use the FileInfo.Length property.
But instead, consider using the stream classes (see the *Reader classes in System.IO); they allow you to read part of a file, analyse it, and discard it. That way you don't need to care about the size of the file at all.
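A minimal sketch of both (path is a placeholder):

using System.IO;

// The logical file size in bytes, read from the directory entry --
// the file's contents are never opened.
long size = new FileInfo(path).Length;

// Process the file in fixed-size chunks so memory use stays flat
// regardless of how large the file is.
using (FileStream fs = File.OpenRead(path))
{
    byte[] buffer = new byte[81920];
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // analyse buffer[0..read), then let it be overwritten
    }
}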
I am creating a *.zip using Ionic.Zip. However, my *.zip contains the same files multiple times, sometimes even 20x, and the ZIP format does not take advantage of that at all.
What's worse, Ionic.Zip sometimes crashes with an OutOfMemoryException, since I am compressing the files into a MemoryStream.
Is there a .NET library for compression that takes advantage of redundancy between files?
Users decompress the files on their own, so it cannot be an exotic format.
I ended up creating a tar.gz using the SharpZipLib library. With this solution, the archive for one file is 3 kB; for 20 identical files it is only 6 kB, whereas as a .zip it was 64 kB.
Nuget:
Install-Package SharpZipLib
Usings:
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
Code:
var output = new MemoryStream();
using (var gzip = new GZipOutputStream(output))
using (var tar = TarArchive.CreateOutputTarArchive(gzip))
{
    for (int i = 0; i < files.Count; i++)
    {
        // add each file as its own tar entry
        var tarEntry = TarEntry.CreateEntryFromFile(files[i]);
        tar.WriteEntry(tarEntry, false);
    }

    // keep the underlying MemoryStream open when the wrappers are disposed
    tar.IsStreamOwner = false;
    gzip.IsStreamOwner = false;
}
No, there is no such API exposed by the well-known libraries (such as GZip, PPMd, Zip, LZMA). They all operate per file (or, more precisely, per stream of bytes).
You could concatenate all the files, e.g. using a tarball format, and then apply a compression algorithm.
Or, it's trivial to implement your own check: compute a hash for each file and store it in a hash-to-filename dictionary. If the hash matches for a later file, you can decide what to do, such as ignoring that file completely, or perhaps noting its name and saving it in another file to mark the duplicates.
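A rough sketch of that check (filePaths is a placeholder for your input list):

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Maps content hash -> first file seen with that content.
var seen = new Dictionary<string, string>();
using (var sha = SHA256.Create())
{
    foreach (string path in filePaths)
    {
        string hash;
        using (FileStream fs = File.OpenRead(path))
            hash = Convert.ToBase64String(sha.ComputeHash(fs));

        if (seen.TryGetValue(hash, out string original))
            Console.WriteLine($"{path} duplicates {original}; skipping");
        else
            seen[hash] = path; // first occurrence: include it in the archive
    }
}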
Yes, 7-zip. There is a SevenZipSharp library you could use, but in my experience, launching the compression process directly via the command line is much faster.
My personal experience:
We used SevenZipSharp at a company to decompress archives of up to 1 GB, and it was terribly slow until I reworked it to use 7-zip directly through its command-line interface. Then it was as fast as decompressing manually in Windows Explorer.
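Something along these lines (untested sketch; the exe path and archive/folder names are assumptions to adapt to your setup; 7z's "a" command creates an archive):

using System.Diagnostics;

var psi = new ProcessStartInfo
{
    FileName = @"C:\Program Files\7-Zip\7z.exe",  // adjust to your install
    Arguments = "a archive.7z \"C:\\data\\*\"",   // placeholder paths
    UseShellExecute = false,
    CreateNoWindow = true
};
using (var p = Process.Start(psi))
{
    p.WaitForExit();
}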
I haven't tested this, but according to one answer to How many times can a file be compressed?:
If you have a large number of duplicate files, the zip format will zip each one independently, and you can then zip the first zip file to remove the duplicated zip information.
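If you want to try that idea, a hypothetical illustration with the built-in classes (sourceFolder is a placeholder; the outer pass gets a chance to compress the redundancy the inner pass left behind):

using System.IO.Compression;

// Inner zip: each duplicate file is compressed independently.
ZipFile.CreateFromDirectory(sourceFolder, "inner.zip");

// Outer zip: compresses the inner zip as a single stream, so its
// repeated compressed blocks can now be deduplicated.
using (var outer = ZipFile.Open("outer.zip", ZipArchiveMode.Create))
{
    outer.CreateEntryFromFile("inner.zip", "inner.zip", CompressionLevel.Optimal);
}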
I'm trying to create a program that can build a zipped package containing files based on user input.
None of those files need to be written to the hard drive before they're zipped, as that would be unnecessary. So how do I create these files without actually writing them to the hard drive, and then have them zipped?
I'm using DotNetZip.
See the documentation here, specifically the example called "Create a zip using content obtained from a stream":
using (ZipFile zip = new ZipFile())
{
ZipEntry e = zip.AddEntry("Content-From-Stream.bin", "basedirectory", StreamToRead);
e.Comment = "The content for entry in the zip file was obtained from a stream";
zip.AddFile("Readme.txt");
zip.Save(zipFileToCreate);
}
If your files are not already in stream form, you'll need to convert them; a MemoryStream is probably what you want.
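A hypothetical sketch (userText and zipFileToCreate are placeholders), generating content in memory and handing it to DotNetZip without touching the disk until Save:

using System.IO;
using System.Text;
using Ionic.Zip;

byte[] generated = Encoding.UTF8.GetBytes(userText); // content built from user input
using (var ms = new MemoryStream(generated))
using (var zip = new ZipFile())
{
    zip.AddEntry("generated.txt", ms); // AddEntry(entryName, stream)
    zip.Save(zipFileToCreate);
}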
I use SharpZipLib, but if DotNetZip can do everything against a basic System.IO.Stream, then yes, just feed it a MemoryStream to write to.
Writing to the hard disk shouldn't be something to avoid just because it seems unnecessary; that's backwards. Unless it's a hard requirement that the entire zipping process happens in memory, write to the hard disk.
The hard disk is better suited than memory for storing large amounts of data. If your zip file ends up being around a gigabyte in size, your application could croak, or at least cause a system slowdown. If you write directly to the hard drive, the zip could be several gigabytes in size without causing an issue.
When using DotNetZip, is it possible to get the final zip file size before calling Save(stream)? I have a website where users will be downloading fairly large zip files (over 2 GB), and I would like to stream the file to the user rather than buffering the entire file in memory. Something like this...
response.BufferOutput = false;
response.AddHeader("Content-Length", ????);
Is this possible?
If the stream is homogeneous, you could spend some time compressing a 'small' portion up front, calculating the compression ratio, and extrapolating from that.
If you mean to set a Content-Length header or something like that, it can only be done if you (1) write a temporary file (advisable anyway if there is any risk of connection trouble, with clients requesting specific chunks) or (2) can keep the entire file in memory (presumably only on a 64-bit system with copious memory).
Of course, you could waste enormous resources and just compress the stream twice, but I hope you agree that would be silly.
The way to do what you want is to save the file to a temporary filesystem file, then stream the result to the user. This lets you compute the size and then transmit the file.
In this case DotNetZip will not hold the file in memory.
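A sketch of that approach, assuming classic ASP.NET (zip is the ZipFile from the question, response an HttpResponse):

string tempPath = Path.GetTempFileName();
try
{
    zip.Save(tempPath); // DotNetZip writes to disk, not into memory
    response.BufferOutput = false;
    response.AddHeader("Content-Length",
        new FileInfo(tempPath).Length.ToString());
    response.TransmitFile(tempPath); // streams the file without buffering it
}
finally
{
    File.Delete(tempPath);
}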