ZipArchive Read From Unseekable Stream Without Buffering to Memory - C#

Is there a way to read a zip file from a network stream without buffering it entirely in memory? I'd like to avoid downloading the entire file before starting to process its contents, to save on processing time.
I'm using .NET Core 3.1

The ZipArchive class as shown here will buffer the stream into memory if it's not seekable. So the only way to avoid buffering a large zip file into memory is to first download it to the local file system and then open it with a FileStream, which is seekable.
This is because the central directory of the zip, the part of the file that lists all the contents and their locations, is located at the end of the file. The class therefore needs to jump around between different parts of the zip to extract its contents.
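A minimal sketch of that approach, assuming networkStream is the incoming non-seekable stream (needs System.IO and System.IO.Compression):

string tempPath = Path.GetTempFileName();
using (var file = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite))
{
    networkStream.CopyTo(file);                 // buffer to disk, not memory
    file.Position = 0;                          // a FileStream is seekable
    using (var archive = new ZipArchive(file, ZipArchiveMode.Read))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            using (var entryStream = entry.Open())
            {
                // process entryStream; only one entry is decompressed at a time
            }
        }
    }
}
File.Delete(tempPath);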

Related

Zipping a large amount of data into an output stream without loading all the data into memory first in C#

I have a C# program that generates a bunch of short (10 seconds or so) video files. These are stored in an Azure file storage blob. I want the user to be able to download these files at a later date as a zip. However, it would take a substantial amount of memory to load the entire collection of video files into memory to create the zip. I was wondering if it is possible to pull data from a stream into memory, zip encode it, output it to another stream, and dispose of it before moving on to the next segment of data.
Let's say the user has generated 100 videos of 10 MB each. If possible, this would allow me to send the zip to the user without first loading the entire 1 GB of footage into memory (or storing the entire zip in memory after the fact).
The individual videos are pretty small, so loading one entire file into memory at a time is fine, as long as I can remove it from memory after it has been encoded and transmitted, before moving on to the next file.
Yes, it is certainly possible to stream files in, without requiring any of them to be entirely in memory at any one time, and to compress, stream out, and transmit a zip file containing them, without ever holding the entire zip file in memory or in mass storage. The zip format is designed to be streamable. However, I am not aware of a library that will do that for you.
ZipFile would require saving the entire zip file before transmitting it. If you're ok with saving the zip file in mass storage (not memory) before transmitting, then use ZipFile.
To write your own zip streamer, you would need to generate the zip file format manually. The zip format is documented in PKWARE's APPNOTE.TXT. You can use DeflateStream to do the actual compression and a Crc32 implementation to compute the CRC-32s. You would transmit the local header before each file's compressed data, followed by a data descriptor after each. You would save each entry's local header information in memory as you go along, and then transmit the central directory and end-of-central-directory record after all of the local entries.
zip is a relatively straightforward format, so while it would take a little bit of work, it is definitely doable.
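A rough sketch of those mechanics for a single entry, streaming straight to a non-seekable output. It assumes the System.IO.Hashing NuGet package for the CRC-32, zeroes the timestamps, and ignores Zip64 (entries over 4 GB); a real writer would loop, collecting one central directory record per entry:

using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.IO.Hashing;  // System.IO.Hashing NuGet package (assumption)
using System.Text;

// Pass-through stream that counts bytes written, so compressed sizes and
// header offsets can be recorded against a non-seekable output.
sealed class CountingStream : Stream
{
    private readonly Stream _inner;
    public long BytesWritten { get; private set; }
    public CountingStream(Stream inner) { _inner = inner; }
    public override void Write(byte[] buffer, int offset, int count)
    { _inner.Write(buffer, offset, count); BytesWritten += count; }
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override void Flush() => _inner.Flush();
    public override long Length => throw new NotSupportedException();
    public override long Position
    { get => BytesWritten; set => throw new NotSupportedException(); }
    public override int Read(byte[] b, int o, int c) => throw new NotSupportedException();
    public override long Seek(long o, SeekOrigin s) => throw new NotSupportedException();
    public override void SetLength(long v) => throw new NotSupportedException();
}

static class ZipStreamer
{
    public static void WriteZip(Stream output, string entryName, Stream content)
    {
        var counter = new CountingStream(output);
        var w = new BinaryWriter(counter);
        byte[] name = Encoding.ASCII.GetBytes(entryName);

        long localHeaderOffset = counter.BytesWritten;
        w.Write(0x04034b50u);    // local file header signature
        w.Write((ushort)20);     // version needed to extract
        w.Write((ushort)0x0008); // flags: bit 3 = data descriptor follows
        w.Write((ushort)8);      // compression method 8 = deflate
        w.Write(0u);             // mod time/date (zeroed for brevity)
        w.Write(0u); w.Write(0u); w.Write(0u); // CRC and sizes: deferred
        w.Write((ushort)name.Length);
        w.Write((ushort)0);      // no extra field
        w.Write(name);

        // Compress, tracking the CRC-32 and both sizes as data flows through.
        var crc = new Crc32();
        long uncompressed = 0;
        long compressedStart = counter.BytesWritten;
        using (var deflate = new DeflateStream(counter, CompressionLevel.Optimal, leaveOpen: true))
        {
            byte[] buf = new byte[81920];
            int n;
            while ((n = content.Read(buf, 0, buf.Length)) > 0)
            {
                crc.Append(buf.AsSpan(0, n));
                uncompressed += n;
                deflate.Write(buf, 0, n);
            }
        }
        long compressed = counter.BytesWritten - compressedStart;
        uint crc32 = BinaryPrimitives.ReadUInt32LittleEndian(crc.GetCurrentHash());

        w.Write(0x08074b50u);    // data descriptor: signature, CRC, sizes
        w.Write(crc32); w.Write((uint)compressed); w.Write((uint)uncompressed);

        long cdOffset = counter.BytesWritten;
        w.Write(0x02014b50u);    // central directory header signature
        w.Write((ushort)20); w.Write((ushort)20); // version made by / needed
        w.Write((ushort)0x0008); w.Write((ushort)8);
        w.Write(0u);             // mod time/date
        w.Write(crc32); w.Write((uint)compressed); w.Write((uint)uncompressed);
        w.Write((ushort)name.Length);
        w.Write((ushort)0); w.Write((ushort)0); // extra field, comment
        w.Write((ushort)0); w.Write((ushort)0); // disk start, internal attrs
        w.Write(0u);             // external attributes
        w.Write((uint)localHeaderOffset);
        w.Write(name);
        long cdSize = counter.BytesWritten - cdOffset;

        w.Write(0x06054b50u);    // end of central directory record
        w.Write((ushort)0); w.Write((ushort)0); // disk numbers
        w.Write((ushort)1); w.Write((ushort)1); // entry counts
        w.Write((uint)cdSize); w.Write((uint)cdOffset);
        w.Write((ushort)0);      // no zip comment
        w.Flush();
    }
}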

Write Zip file to AWS as a stream

Using C#, I would like to create a zip file in AWS S3, add file entries to it, then close the stream. System.IO.Compression.ZipArchive can be created from a System.IO.Stream. Is it possible to get a writable stream into an S3 bucket? I am using the .NET SDK for S3.
An object uploaded to S3 must have a known size when the request is made. Since the size of the zip file won't be known until the stream is closed, you can't do what you are asking about. You would have to create the zip file locally, then upload it to S3.
The closest you could get to what you are asking for is S3's multipart upload. I would use a MemoryStream as the underlying stream for the ZipArchive, and each time you add a file to the zip archive, check to see whether the MemoryStream is larger than 5 megabytes. If it is, take the byte buffer from the MemoryStream and upload a new part to S3. Then clear the MemoryStream and continue adding files to the zip archive.
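A rough sketch of that pattern, with one adjustment: resetting a seekable MemoryStream underneath a live ZipArchive confuses its offset bookkeeping, so this version hands ZipArchive the non-seekable write side of a Pipe and uploads 5 MB chunks read from the other side. It assumes the AWSSDK.S3 and System.IO.Pipelines packages; bucket, key, and the files collection are placeholders:

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.IO.Pipelines;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

static async Task UploadZipToS3Async(IAmazonS3 s3, string bucket, string key,
    IEnumerable<(string Name, Stream Content)> files)
{
    const int PartSize = 5 * 1024 * 1024; // S3's minimum part size
    var pipe = new Pipe();

    // Producer: write the zip into the pipe on a background task.
    var producer = Task.Run(async () =>
    {
        using (var zipStream = pipe.Writer.AsStream()) // non-seekable on purpose
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
        {
            foreach (var (name, content) in files)
            {
                var entry = archive.CreateEntry(name);
                using (var es = entry.Open())
                    await content.CopyToAsync(es); // one file in flight at a time
            }
        } // disposing the archive and stream completes the pipe
    });

    // Consumer: read 5 MB chunks and ship each one as a multipart part.
    var init = await s3.InitiateMultipartUploadAsync(
        new InitiateMultipartUploadRequest { BucketName = bucket, Key = key });
    var partETags = new List<PartETag>();
    int partNumber = 1;
    using (var source = pipe.Reader.AsStream())
    {
        byte[] buffer = new byte[PartSize];
        while (true)
        {
            int filled = 0, n;
            while (filled < PartSize &&
                   (n = await source.ReadAsync(buffer, filled, PartSize - filled)) > 0)
                filled += n;
            if (filled == 0) break; // the zip is finished

            var part = await s3.UploadPartAsync(new UploadPartRequest
            {
                BucketName = bucket, Key = key, UploadId = init.UploadId,
                PartNumber = partNumber, PartSize = filled,
                InputStream = new MemoryStream(buffer, 0, filled)
            });
            partETags.Add(new PartETag(partNumber++, part.ETag));
            if (filled < PartSize) break; // a short read means the last chunk
        }
    }
    await producer; // surface any zip-writing errors
    await s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
    {
        BucketName = bucket, Key = key, UploadId = init.UploadId, PartETags = partETags
    });
}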
You'll probably want to take a look at this answer for an existing discussion around this.
This doc page seems to suggest that there is an Upload method that can take a stream (with S3 taking care of re-assembling the multipart upload), although this is for version 1 of the SDK, so it might not be available in version 3.
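For what it's worth, version 3 of the SDK does expose a stream-based helper, Amazon.S3.Transfer.TransferUtility, which manages the multipart mechanics for you; a seekable source such as a FileStream is the safe choice, since it needs a known length. A small sketch, with the path and bucket names as placeholders:

var transfer = new Amazon.S3.Transfer.TransferUtility(s3Client);
using (var stream = File.OpenRead(localZipPath)) // placeholder path
{
    transfer.Upload(stream, "my-bucket", "archive.zip");
}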

Generate and stream Zip archive without storing it in the filesystem or reading into memory first

How can I asynchronously take multiple existing streams (from the db), add them to a zip archive stream, and return it in ASP.NET Web API 2?
The key difference from the other "duplicate" question is how to do this in a streaming fashion, without writing to a temp file or buffering it completely in memory first.
It looks like you can't do this directly:
Writing to ZipArchive using the HttpContext OutputStream
The HTTP response stream needs to support seeking for a zip to be written directly to it, which it doesn't. By the looks of it, you will need to write to a temporary file.

Create a ZIP file without entries touching the disk?

I'm trying to create a program that has the capability of creating a zipped package containing files based on user input.
I don't need any of those files to be written to the hard drive before they're zipped, as that would be unnecessary, so how do I create these files without actually writing them to the hard drive, and then have them zipped?
I'm using DotNetZip.
See the documentation here, specifically the example called "Create a zip using content obtained from a stream":
using (ZipFile zip = new ZipFile())
{
    ZipEntry e = zip.AddEntry("Content-From-Stream.bin", "basedirectory", StreamToRead);
    e.Comment = "The content for entry in the zip file was obtained from a stream";
    zip.AddFile("Readme.txt");
    zip.Save(zipFileToCreate);
}
If your files are not already in a stream format, you'll need to convert them to one. You'll probably want to use a MemoryStream for that.
I use SharpZipLib, but if DotNetZip can do everything against a basic System.IO.Stream, then yes, just feed it a MemoryStream to write to.
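For example, a small sketch along those lines, where BuildFileBytes stands in for however the program generates its content and outputStream is any writable Stream:

using (ZipFile zip = new ZipFile())
{
    byte[] generated = BuildFileBytes();          // hypothetical generator
    zip.AddEntry("generated.txt", new MemoryStream(generated));
    zip.Save(outputStream);                       // nothing touches the disk
}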
Writing to the hard disk shouldn't be something you avoid just because it seems unnecessary. That's backwards. If it's not a requirement that the entire zipping process be done in memory, then write to the hard disk.
The hard disk is better suited for storing large amounts of data than memory is. If by some chance your zip file ends up being around a gigabyte in size, your application could croak or at least cause a system slowdown. If you write directly to the hard drive, the zip could be several gigabytes in size without causing an issue.

DotNetZip - Calculate final zip size before calling Save(stream)

When using DotNetZip, is it possible to get what the final zip file size will be before calling Save(stream)? I have a website where users will be downloading fairly large zip files (over 2 GB), and I would like to be able to stream the file to the user rather than buffering the entire file into memory. Something like this...
response.BufferOutput = false;
response.AddHeader("Content-Length", ????);
Is this possible?
If the stream is homogeneous, you could spend some time compressing a 'small' portion ahead, calculating the compression ratio, and extrapolating from that.
If you mean to set a Content-Length header or something like that, it can only be done when you (1) write a temporary file first (advisable anyway if there is any risk of connection trouble and clients requesting specific chunks), or (2) can keep the entire file in memory (presumably only on a 64-bit system with copious memory).
Of course, you could waste enormous resources and just compress the stream twice, but I hope you agree that would be silly.
The way to do what you want is to save the zip to a temporary file in the filesystem, then stream the result to the user. This lets you compute the size and then transmit the file.
In this case DotNetZip will not buffer the file in memory.
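A short sketch of that flow, assuming classic ASP.NET, where response is the HttpResponse and zip is the DotNetZip ZipFile to send:

string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
zip.Save(tempPath);                               // DotNetZip writes to disk
var info = new FileInfo(tempPath);
response.BufferOutput = false;
response.ContentType = "application/zip";
response.AddHeader("Content-Length", info.Length.ToString());
response.TransmitFile(tempPath);                  // streamed from disk, not memory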
