DotNetZip - Calculate final zip size before calling Save(stream) - c#

When using DotNetZip, is it possible to get what the final zip file size will be before calling Save(stream)? I have a website where users will be downloading fairly large zip files (over 2 gigs), and I would like to be able to stream the file to the user rather than buffering the entire file into memory. Something like this...
response.BufferOutput = false;
response.AddHeader("Content-Length", ????);
Is this possible?

If the stream is homogeneous, you could spend some time compressing a 'small' portion ahead of time, calculating the compression ratio, and extrapolating from that.
If you mean to set a Content-Length header or something like that, it can only be done when you (1) write a temporary file (advisable if there is any risk of connection trouble, and clients requesting specific chunks anyway) or (2) can keep the entire file in memory (presumably only on a 64-bit system with copious memory).
Of course, you could waste enormous resources and just compress the stream twice, but I hope you agree that would be silly.
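For illustration, here is a rough sketch of that extrapolation idea (the sample size is arbitrary, and the result is only an estimate; it is never good enough for an exact Content-Length header):
using System.IO;
using System.IO.Compression;

// Deflate a sample of the stream, measure the ratio, extrapolate.
static long EstimateCompressedSize(Stream source, long totalLength)
{
    var sample = new byte[1024 * 1024];               // 1 MB sample, arbitrary
    int read = source.Read(sample, 0, sample.Length);
    using (var ms = new MemoryStream())
    {
        using (var ds = new DeflateStream(ms, CompressionLevel.Optimal, leaveOpen: true))
            ds.Write(sample, 0, read);                // compress the sample only
        double ratio = (double)ms.Length / read;      // sample's compression ratio
        return (long)(ratio * totalLength);           // scale to the full stream
    }
}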

The way to do what you want is to save the archive to a temporary file on the filesystem, then stream the result to the user. This lets you compute the size and then transmit the file.
In this case DotNetZip will not hold the archive in memory.
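For example, a minimal sketch of that approach inside an ASP.NET page or handler, using DotNetZip's Ionic.Zip namespace (the folder and file names are placeholders):
// Save the archive to a temp file so its exact size is known,
// then stream it to the client with a correct Content-Length.
string tempPath = Path.GetTempFileName();
try
{
    using (var zip = new Ionic.Zip.ZipFile())
    {
        zip.AddDirectory(@"C:\files\to\send");     // placeholder content
        zip.Save(tempPath);                        // written to disk, not memory
    }

    Response.BufferOutput = false;
    Response.ContentType = "application/zip";
    Response.AddHeader("Content-Disposition", "attachment; filename=download.zip");
    Response.AddHeader("Content-Length", new FileInfo(tempPath).Length.ToString());

    using (var fs = File.OpenRead(tempPath))
        fs.CopyTo(Response.OutputStream);          // streamed from disk, not buffered
}
finally
{
    File.Delete(tempPath);
}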

Related

Zipping a large amount of data into an output stream without loading all the data into memory first in C#

I have a C# program that generates a bunch of short (10 seconds or so) video files. These are stored in an Azure storage blob. I want the user to be able to download these files at a later date as a zip. However, it would take a substantial amount of memory to load the entire collection of video files into memory to create the zip. I was wondering if it is possible to pull data from a stream into memory, zip-encode it, output it to another stream, and dispose of it before moving on to the next segment of data.
Let's say the user has generated 100 10 MB videos. If possible, this would allow me to send the zip to the user without first loading the entire 1 GB of footage into memory (or storing the entire zip in memory after the fact).
The individual videos are pretty small, so loading an entire file into memory at a time is fine, as long as I can remove it from memory after it has been encoded and transmitted, before moving on to the next file.
Yes, it is certainly possible to stream in files, without requiring even one of them to be entirely in memory at any one time, and to compress, stream out, and transmit a zip file containing them, without holding the entire zip file either in memory or in mass storage. The zip format is designed to be streamable. However, I am not aware of a library that will do that for you.
ZipFile would require saving the entire zip file before transmitting it. If you're ok with saving the zip file in mass storage (not memory) before transmitting, then use ZipFile.
To write your own zip streamer, you would need to generate the zip file format manually. The zip format is documented in PKWARE's APPNOTE.TXT. You can use DeflateStream to do the actual compression and a Crc32 implementation to compute the CRC-32s. You would transmit the local header before each file's compressed data, followed by a data descriptor after each. You would save the local header information in memory as you go along, and then transmit the central directory and end record after all of the local entries.
zip is a relatively straightforward format, so while it would take a little bit of work, it is definitely doable.
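That answer describes the approach only; as a rough, untested sketch (not code from any library), here is a minimal streamer along those lines. Since the question allows holding one file in memory at a time, each entry is compressed into a MemoryStream first so the real sizes and CRC can go straight into the local header, avoiding the data descriptor. Timestamps, directories, zip64, and error handling are all omitted:
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

static class MiniZip
{
    // Standard CRC-32 (polynomial 0xEDB88320), as the zip format requires.
    static readonly uint[] Table = BuildTable();

    static uint[] BuildTable()
    {
        var t = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320 ^ (c >> 1) : c >> 1;
            t[i] = c;
        }
        return t;
    }

    static uint Crc32(byte[] data)
    {
        uint c = 0xFFFFFFFF;
        foreach (byte b in data)
            c = Table[(c ^ b) & 0xFF] ^ (c >> 8);
        return c ^ 0xFFFFFFFF;
    }

    // Streams a zip file to `output` (e.g. the HTTP response stream),
    // holding only one entry's data in memory at a time.
    public static void Write(Stream output, IEnumerable<(string Name, byte[] Data)> files)
    {
        var w = new BinaryWriter(output);
        var central = new List<(string Name, uint Crc, int CompLen, int Len, long Offset)>();
        long offset = 0;

        foreach (var (name, data) in files)
        {
            byte[] comp;
            using (var ms = new MemoryStream())
            {
                using (var ds = new DeflateStream(ms, CompressionLevel.Optimal, leaveOpen: true))
                    ds.Write(data, 0, data.Length);    // raw deflate, method 8
                comp = ms.ToArray();
            }
            uint crc = Crc32(data);
            byte[] nameBytes = Encoding.ASCII.GetBytes(name);

            // Local file header ("PK\3\4"), then the compressed data.
            w.Write(0x04034b50); w.Write((ushort)20); w.Write((ushort)0);
            w.Write((ushort)8);                        // method 8 = deflate
            w.Write((ushort)0); w.Write((ushort)0);    // DOS time/date (zeroed)
            w.Write(crc); w.Write(comp.Length); w.Write(data.Length);
            w.Write((ushort)nameBytes.Length); w.Write((ushort)0);
            w.Write(nameBytes); w.Write(comp);

            central.Add((name, crc, comp.Length, data.Length, offset));
            offset += 30 + nameBytes.Length + comp.Length;
        }

        // Central directory ("PK\1\2" per entry), then the end record ("PK\5\6").
        long cdStart = offset;
        foreach (var e in central)
        {
            byte[] nameBytes = Encoding.ASCII.GetBytes(e.Name);
            w.Write(0x02014b50); w.Write((ushort)20); w.Write((ushort)20);
            w.Write((ushort)0); w.Write((ushort)8);
            w.Write((ushort)0); w.Write((ushort)0);
            w.Write(e.Crc); w.Write(e.CompLen); w.Write(e.Len);
            w.Write((ushort)nameBytes.Length); w.Write((ushort)0); w.Write((ushort)0);
            w.Write((ushort)0); w.Write((ushort)0); w.Write(0);
            w.Write((int)e.Offset); w.Write(nameBytes);
            offset += 46 + nameBytes.Length;
        }
        w.Write(0x06054b50); w.Write((ushort)0); w.Write((ushort)0);
        w.Write((ushort)central.Count); w.Write((ushort)central.Count);
        w.Write((int)(offset - cdStart)); w.Write((int)cdStart);
        w.Write((ushort)0);
        w.Flush();
    }
}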

How to efficiently write multiple data ranges from one file on the internet simultaneously into one file

I want to have multiple network stream threads writing/downloading into one file simultaneously.
So, e.g., you have one file and download the ranges:
0-1000
1001-2002
2003-3004...
And I want them all to write their received bytes into one file as efficiently as possible.
Right now I am downloading each range part into its own file and combining them into the final file once they are all finished.
I would like them all to write into one file, if that is possible, to reduce disk usage, and I feel like this could all be done better.
You could use persisted memory-mapped files; see https://learn.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files:
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
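A sketch of what that might look like; the total length would come from the server's Content-Length, and DownloadRange (which would issue an HTTP Range request and copy the response bytes into the view) is hypothetical:
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

// Each worker writes its own byte range into one persisted
// memory-mapped file; the ranges never overlap, so no locking is needed.
long totalLength = 3005;                  // hypothetical, from Content-Length
const long chunk = 1001;                  // range size, matching the example above
using (var mmf = MemoryMappedFile.CreateFromFile(
           "output.bin", FileMode.Create, null, totalLength))
{
    var tasks = new List<Task>();
    for (long start = 0; start < totalLength; start += chunk)
    {
        long offset = start;
        long size = Math.Min(chunk, totalLength - offset);
        tasks.Add(Task.Run(() =>
        {
            using (var view = mmf.CreateViewStream(offset, size))
                DownloadRange(offset, size, view);   // hypothetical HTTP Range GET
        }));
    }
    Task.WaitAll(tasks.ToArray());
}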

Compressing images, is there a point?

I'm working on a web site which will host thousands of user-uploaded images in the formats .png, .jpeg, and .gif.
Since there will be such a huge number of images, saving just a few KB of space per file will in the end mean quite a lot for the total storage requirements.
My first thought was to enable Windows folder compression on the folder that the files are stored in (using a Windows / IIS server). On a total of 1 GB of data, the total space saved by this was ~200 KB.
This to me seems like a poor result. I therefore went to check whether the Windows folder compression could be tweaked, but according to this post it can't be: NTFS compressed folders
My next thought was that I could use libraries such as SevenZipSharp to compress the files individually as I save them. But before I did this I went to test a few different compression programs on a few images.
The results on a 7 MB .gif were:
7z, compress to .7z = 1 KB space saved
7z, compress to .zip = 2 KB space INCREASE
Windows, native zip = 4 KB space saved.
So this leaves me with two thoughts: either the zipping programs I'm using aren't very good, or images are pretty much already compressed as far as they can be (...and I'm surprised that Windows' built-in compression is better than 7z).
So my question is, is there any way to decrease the filesize of an image archive consisting of the image formats listed above?
the zipping programs I'm using suck, or images are pretty much already compressed as far as they can be
Most common image formats are already compressed (PNG, JPEG, etc.). Compressing a file twice will almost never yield any positive result; most likely it will only increase the file size.
So my question is, is there any way to decrease the filesize of an image archive consisting of the image formats listed above?
No, not likely. Compressed files might have at most a little more to give, but you have to specialize on the images themselves, not on the compression algorithm. Some good options are available in Robert Levy's post. A tool I used to strip out metadata is PNGOUT.
Most users will likely be uploading files that have a basic level of compression already applied, which is why you aren't seeing a ton of benefit. Some users may be uploading uncompressed files, though, in which case your attempts would make a difference.
That said, image compression should be thought of as a unique field from normal file compression. Normal file compression techniques will be "lossless", ensuring that every bit of the file is restored when the file is uncompressed - images (and other media) can be compressed in "lossy" ways without degrading the file to an unacceptable level.
There are specialized tools which you can use to do things like strip out metadata, apply a slight blur, perform sampling, reduce quality, reduce dimensions, etc. Have a look at the answer here for a good example: Recommendation for compressing JPG files with ImageMagick. The top answer took the example file from 264 KB to 170 KB.
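As one concrete (hedged) example of the "reduce quality" route in C#, using the built-in System.Drawing JPEG encoder on Windows; the file names and the quality value are arbitrary:
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

// Re-encode a JPEG at quality 75: a small, often invisible loss
// in exchange for a noticeably smaller file.
using (var image = Image.FromFile("input.jpg"))
{
    ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
        .First(c => c.FormatID == ImageFormat.Jpeg.Guid);
    using (var ps = new EncoderParameters(1))
    {
        ps.Param[0] = new EncoderParameter(Encoder.Quality, 75L);
        image.Save("output.jpg", jpegCodec, ps);
    }
}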

Better option for small files, send or compress and send?

If I have a relatively small file (<1 MB), what is the better option for a program: first compress the file to disk and then send the .zip file, or just send it as-is? I am never sure of the network and disk speed, and the file size changes too, but not by much. I think that compressing is, of course, better for larger files, but when a file is only a few KB, do I gain anything by compressing it, or do I lose because of the time needed to write to and read from the HDD?
Thanks
The best option is to read the file from disk, compress it, and then send it without re-writing the compressed file to disk. The receiver can then decompress it in memory. This is essentially how a web server serves compressed web pages to compatible browsers.
C# is not a language I'm familiar with, but you are probably looking for something like System.IO.Compression.GZipStream.
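Something along these lines, for example (the file path and the destination stream are placeholders; GZipStream writes the compressed bytes straight to whatever stream it wraps):
using System.IO;
using System.IO.Compression;

// Compress straight from the source file onto the outbound stream;
// the compressed data is never written to disk.
using (var input = File.OpenRead("somefile.dat"))
using (var gzip = new GZipStream(destinationStream, CompressionMode.Compress, leaveOpen: true))
{
    input.CopyTo(gzip);
}
// The receiver wraps its end in a GZipStream with CompressionMode.Decompress.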

Create a ZIP file without entries touching the disk?

I'm trying to create a program that has the capability of creating a zipped package containing files based on user input.
I don't need any of those files to be written to the hard drive before they're zipped, as that would be unnecessary, so how do I create these files without actually writing them to the hard drive, and then have them zipped?
I'm using DotNetZip.
See the documentation here, specifically the example called "Create a zip using content obtained from a stream":
using (ZipFile zip = new ZipFile())
{
    // Entry content is read from the supplied stream at Save time
    ZipEntry e = zip.AddEntry("Content-From-Stream.bin", "basedirectory", StreamToRead);
    e.Comment = "The content for entry in the zip file was obtained from a stream";
    zip.AddFile("Readme.txt");
    zip.Save(zipFileToCreate);
}
If your files are not already in a stream format, you'll need to convert them to one. You'll probably want to use a MemoryStream for that.
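For instance (a sketch; the content, entry name, and output stream are made up, and ZipFile here is DotNetZip's):
// Wrap in-memory content in a MemoryStream and hand it to DotNetZip.
byte[] content = System.Text.Encoding.UTF8.GetBytes("generated file body");
using (var ms = new MemoryStream(content))
using (var zip = new ZipFile())
{
    zip.AddEntry("generated.txt", ms);   // entry content is read from the stream
    zip.Save(outputStream);              // any writable stream, e.g. the response
}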
I use SharpZipLib, but if DotNetZip can do everything against a basic System.IO.Stream, then yes, just feed it a MemoryStream to write to.
Writing to the hard disk shouldn't be something you avoid just because it seems unnecessary. That's backwards: if it's not a requirement that the entire zipping process happen in memory, then avoid doing it in memory by writing to the hard disk.
The hard disk is better suited to storing large amounts of data than memory is. If by some chance your zip file ends up being around a gigabyte in size, your application could croak, or at least cause a system slowdown. If you write directly to the hard drive, the zip could be several gigabytes in size without causing an issue.
