Create a ZIP file without entries touching the disk?

I'm trying to create a program that has the capability of creating a zipped package containing files based on user input.
I don't want any of those files written to the hard drive before they're zipped, since that would be unnecessary. How do I create these files without actually writing them to the hard drive, and then have them zipped?
I'm using DotNetZip.

See the documentation here, specifically the example called "Create a zip using content obtained from a stream":
    using (ZipFile zip = new ZipFile())
    {
        ZipEntry e = zip.AddEntry("Content-From-Stream.bin", "basedirectory", StreamToRead);
        e.Comment = "The content for entry in the zip file was obtained from a stream";
        zip.AddFile("Readme.txt");
        zip.Save(zipFileToCreate);
    }
If your files are not already in a stream format, you'll need to convert them to one. You'll probably want to use a MemoryStream for that.

I use SharpZipLib, but if DotNetZip can do everything against a basic System.IO.Stream, then yes, just feed it a MemoryStream to write to.
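A minimal sketch of that advice, assuming the DotNetZip NuGet package (namespace Ionic.Zip) is referenced. The names InMemoryZip and Build are made up for illustration; AddEntry(string, byte[]) and Save(Stream) are existing DotNetZip overloads.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Ionic.Zip; // DotNetZip NuGet package (assumed to be installed)

// Hypothetical helper, not part of DotNetZip.
public static class InMemoryZip
{
    // Builds a zip archive from (entry name -> text content) pairs.
    // Neither the entries nor the archive are written to disk.
    public static byte[] Build(IDictionary<string, string> files)
    {
        using (var zip = new ZipFile())
        using (var output = new MemoryStream())
        {
            foreach (var file in files)
            {
                // The byte[] overload stores exactly these bytes as the entry content.
                zip.AddEntry(file.Key, Encoding.UTF8.GetBytes(file.Value));
            }
            zip.Save(output); // writes the whole archive into the MemoryStream
            return output.ToArray();
        }
    }
}
```

The returned byte array can be sent over the network or handed to another API. For archives that may grow large, passing a FileStream to Save instead keeps memory use bounded.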

Writing to the hard disk isn't something to avoid just because it seems unnecessary. That's backwards. Unless it's a requirement that the entire zipping process happens in memory, avoid doing it in memory and write to the hard disk instead.
The hard disk is better suited for storing large amounts of data than memory is. If by some chance your zip file ends up being around a gigabyte in size your application could croak or at least cause a system slowdown. If you write directly to the hard drive the zip could be several gigabytes in size without causing an issue.

Related

Storing PDF in RamDisk vs MemoryMappedFile

I am using a library to create a PDF file from HTML, then I print that file and delete it from disk. The issue is, we don't want anything stored on disk, not even temporarily. I thought about using some kind of RamDisk, but I see MemoryMappedFile thrown around everywhere. I looked into it, but I don't think that provides the functionality I want since the file should already exist on disk before using it in a MemoryMappedFile.
My question is: Is my assumption true that the file needs to be on disk FIRST before it can be used in a MemoryMappedFile? Is there a way to create a MemoryMappedFile and access it using a virtual path without it existing on disk?
In case MemoryMappedFile doesn't work, what library can I use to create a virtual RamDisk with C#? It will only be storing 1 file at a time.

Zipping a large amount of data into an output stream without loading all the data into memory first in C#

I have a C# program that generates a bunch of short (10 seconds or so) video files. These are stored in an Azure file storage blob. I want the user to be able to download these files at a later date as a zip. However, it would take a substantial amount of memory to load the entire collection of video files into memory to create the zip. I was wondering if it is possible to pull data from a stream into memory, zip-encode it, output it to another stream, and dispose of it before moving on to the next segment of data.
Let's say the user has generated 100 videos of 10 MB each. If possible, this would allow me to send the zip to the user without first loading the entire 1 GB of footage into memory (or storing the entire zip in memory after the fact).
The individual videos are pretty small, so if I need to load an entire file into memory at a time, that is fine, as long as I can remove it from memory after it has been encoded and transmitted, before moving on to the next file.

Yes, it is certainly possible to stream in files, not requiring even any of those to be entirely in memory at any one time, and to compress, stream out, and transmit a zip file containing those, without holding the entire zip file either in memory or mass storage. The zip format is designed to be streamable. However I am not aware of a library that will do that for you.
ZipFile would require saving the entire zip file before transmitting it. If you're ok with saving the zip file in mass storage (not memory) before transmitting, then use ZipFile.
To write your own zip streamer, you would need to generate the zip file format manually. The zip format is documented here. You can use DeflateStream to do the actual compression and Crc32 to compute the CRC-32s. You would transmit the local header before each file's compressed data, followed by a data descriptor after each. You would save the local header information in memory as you go along, and then transmit the central directory and end record after all of the local entries.
Zip is a relatively straightforward format, so while it would take a little bit of work, it is definitely doable.
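On recent .NET versions, the built-in System.IO.Compression.ZipArchive may cover this case without hand-writing the format: in ZipArchiveMode.Create it writes entries one after another and, per its documentation, the output stream must support writing but does not have to support seeking. The sketch below is that alternative, not the manual approach described above. StreamingZip, WriteTo and the shape of `sources` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not a framework type.
public static class StreamingZip
{
    // sources: entry name paired with a factory that opens the source stream on demand,
    // so only one source is open at any time.
    public static void WriteTo(Stream output, IEnumerable<KeyValuePair<string, Func<Stream>>> sources)
    {
        // leaveOpen: true, because the caller owns the output stream.
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var source in sources)
            {
                // Video is already compressed, so a fast level is assumed to be enough.
                ZipArchiveEntry entry = archive.CreateEntry(source.Key, CompressionLevel.Fastest);
                using (Stream entryStream = entry.Open())
                using (Stream input = source.Value())
                {
                    input.CopyTo(entryStream); // copies through a small buffer
                }
            }
        } // disposing the archive writes the central directory
    }
}
```

In a web application, `output` would typically be the response body stream and each factory would open a download stream for one stored video.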

Create a "directory" in memory?

I'm working in c#, and looking for a way to create a path to a directory that will map to an IO.Stream instead of to the actual file system.
I want to be able to "save" files to that path, manipulate the content or file names, and then save them from that path to a regular file in the file system.
I know I can use a temporary file, but I would rather use the memory for both security and performance.
This kind of thing exists, according to this answer, in Java, using the FileSystemProvider class. I'm looking for a way to do it in c#.
I've tried every search I could think of and came up only with the Java answer and suggestions to use temporary files.
Is it even possible using .NET?
Basically, I'm looking for a way to enable saving files directly to memory as if they were saved into the file system.
So, for instance, if I had a 3rd party class that exposes a save method (save(string fullPath)), or something like the SmtpServer.Send(MyMsg) in this question, I could choose that path and save it into the memory stream instead of onto the drive. (The main thing here is that I want to provide a path that will lead directly to a memory stream.)

.NET doesn't have an abstraction layer over the host OS's file system. You can build your own abstraction for use in your own code, but if 3rd party libraries need to be covered as well, there are just two workable options:
Use streams and avoid any APIs working with file names.
Build a virtual file system plugged into your host OS's storage architecture; however, the effort needed versus benefits is highly questionable.
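The first option can be sketched as follows, assuming you control the code that does the saving. ReportWriter and StreamFirstExample are hypothetical names; the point is only that an API taking a Stream lets the caller choose between memory and disk.

```csharp
using System.IO;
using System.Text;

// Hypothetical writer: it takes a Stream instead of a path,
// so the caller decides where the bytes go.
public static class ReportWriter
{
    public static void Save(Stream destination, string text)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var writer = new StreamWriter(destination, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
        }
    }
}

public static class StreamFirstExample
{
    // "Saves" into memory, changes the content there, and only then writes a real file.
    public static void SaveEditedCopy(string text, string finalPath)
    {
        using (var memory = new MemoryStream())
        {
            ReportWriter.Save(memory, text);
            string edited = Encoding.UTF8.GetString(memory.ToArray()).ToUpperInvariant();
            File.WriteAllText(finalPath, edited);
        }
    }
}
```

This does not help with a 3rd party method that only accepts a path, which is the limitation the answer describes.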

I went through a similar situation lately. There is no out-of-the-box solution in .NET for this, but I used a workaround that was efficient and safe for me.
Using the Ionic.Zip NuGet package, you can create a whole directory with a complex structure as a stream in memory. Although it is created as a zip file, you can extract it as a stream or even send the zip file as a stream.
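    var zipStream = new MemoryStream(); // receives the finished archive (name chosen for illustration)
    using (var zip = new Ionic.Zip.ZipFile())
    {
        zip.AddEntry("file1.json", new MemoryStream(Encoding.UTF8.GetBytes(someJsonContent)));
        for (int i = 0; i < 4; i++)
        {
            // A "/" in the entry name places the entry in a directory inside the archive.
            zip.AddEntry($"{myDir}/{i}.json", new MemoryStream(Encoding.UTF8.GetBytes(anotherJsonContent)));
        }
        // The entry streams are read during Save, so they must still be open here.
        zip.Save(zipStream);
    }
    zipStream.Position = 0; // rewind before reading or sending the archive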
And here is how to extract a zip file as a stream using Ionic.Zip
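The extraction code is not included above. As a hedged sketch of what it might look like, assuming the DotNetZip NuGet package is referenced: ZipFile.Read(Stream), the string indexer and ZipEntry.Extract(Stream) are existing DotNetZip members, while InMemoryUnzip and ReadEntry are hypothetical names.

```csharp
using System.IO;
using Ionic.Zip; // DotNetZip NuGet package (assumed to be installed)

// Hypothetical helper, not part of DotNetZip.
public static class InMemoryUnzip
{
    // Returns the uncompressed bytes of one entry, or null if the entry is missing.
    // zipStream must be seekable (for example a MemoryStream).
    public static byte[] ReadEntry(Stream zipStream, string entryName)
    {
        using (ZipFile zip = ZipFile.Read(zipStream))
        {
            ZipEntry entry = zip[entryName]; // null when no entry has that name
            if (entry == null)
            {
                return null;
            }
            using (var content = new MemoryStream())
            {
                entry.Extract(content); // decompresses into memory, not onto disk
                return content.ToArray();
            }
        }
    }
}
```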

Efficient compression of folder with same file copied multiple times

I am creating a *.zip using Ionic.Zip. However, my *.zip contains the same files multiple times, sometimes even 20x, and the ZIP format does not take advantage of that at all.
What's worse, Ionic.Zip sometimes crashes with an OutOfMemoryException, since I am compressing the files into a MemoryStream.
Is there a .NET library for compressing that takes advantage of redundancy between files?
Users decompress the files on their own, so it cannot be an exotic format.

I ended up creating a tar.gz using the SharpZipLib library. Using this solution on 1 file, the archive is 3kB. Using it on 20 identical files, the archive is only 6kB, whereas in .zip it was 64kB.
NuGet:
    Install-Package SharpZipLib
Usings:
    using System.IO;
    using ICSharpCode.SharpZipLib.GZip;
    using ICSharpCode.SharpZipLib.Tar;
Code:
    var output = new MemoryStream();
    using (var gzip = new GZipOutputStream(output))
    using (var tar = TarArchive.CreateOutputTarArchive(gzip))
    {
        // Keep `output` open after the archive and the gzip stream are disposed.
        tar.IsStreamOwner = false;
        gzip.IsStreamOwner = false;
        foreach (string file in files) // `files` holds the paths of the files to add
        {
            var tarEntry = TarEntry.CreateEntryFromFile(file);
            tar.WriteEntry(tarEntry, false);
        }
    }

No, there is no such API exposed by well-known ones (such as GZip, PPMd, Zip, LZMA). They all operate per file (or stream of bytes to be more specific).
You could concatenate all the files, i.e. use a tarball format, and then apply a compression algorithm.
Or, it's trivial to implement your own check: compute a hash for each file and store it in a hash-to-filename dictionary. If the hash matches for a later file, you can decide what you want to do, such as ignoring that file completely, or noting its name and saving it in another file to mark duplicates.
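A minimal sketch of the hash check described above, using SHA-256 from the base class library. DuplicateFinder and FindDuplicates are hypothetical names, and what to do with the duplicates (skip them, write a manifest) is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration.
public static class DuplicateFinder
{
    // Maps each duplicate path to the first path seen with the same content hash.
    public static Dictionary<string, string> FindDuplicates(IEnumerable<string> paths)
    {
        var firstSeen = new Dictionary<string, string>();  // hash -> first path with that hash
        var duplicates = new Dictionary<string, string>(); // duplicate path -> original path
        using (var sha = SHA256.Create())
        {
            foreach (string path in paths)
            {
                string hash;
                using (FileStream stream = File.OpenRead(path))
                {
                    // The file is hashed as a stream, so it is never fully loaded into memory.
                    hash = Convert.ToBase64String(sha.ComputeHash(stream));
                }

                if (firstSeen.TryGetValue(hash, out string original))
                {
                    duplicates[path] = original;
                }
                else
                {
                    firstSeen[hash] = path;
                }
            }
        }
        return duplicates;
    }
}
```

Only the files that are not keys of the returned dictionary need to go into the archive; the dictionary itself can be saved as a small manifest so the duplicates can be restored after extraction.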

Yes, 7-zip. There is a SevenZipSharp library you could use, but in my experience, launching a compression process directly from the command line is much faster.
My personal experience:
At one company we used SevenZipSharp to decompress archives of up to 1 GB, and it was terribly slow until I reworked the code to run 7-zip's command-line interface directly. Then it was as fast as decompressing manually in Windows Explorer.
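A sketch of launching the 7-Zip command line from C#, assuming the `7z` executable is installed and reachable on PATH (otherwise pass its full path). SevenZipCli and its methods are hypothetical names; `a` is 7-Zip's "add to archive" command.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around the 7-Zip command-line tool.
public static class SevenZipCli
{
    // Builds the command "7z a <archive> <input>". exePath defaults to "7z",
    // which assumes the executable can be found on PATH.
    public static ProcessStartInfo BuildAddCommand(string archivePath, string inputPath, string exePath = "7z")
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = $"a \"{archivePath}\" \"{inputPath}\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    public static void Compress(string archivePath, string inputPath, string exePath = "7z")
    {
        using (Process process = Process.Start(BuildAddCommand(archivePath, inputPath, exePath)))
        {
            process.WaitForExit();
            // 7-Zip returns 0 on success and 1 for warnings; this sketch is strict
            // and treats every non-zero exit code as a failure.
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException($"7-Zip exited with code {process.ExitCode}.");
            }
        }
    }
}
```

Note that the result is a .7z archive when the archive name ends in .7z, so users need a tool that can open that format.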

I haven't tested this, but according to one answer to "How many times can a file be compressed?":
If you have a large number of duplicate files, the zip format will zip each independently, and you can then zip the first zip file to remove duplicate zip information.
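One way to try this idea is sketched below with the built-in System.IO.Compression.ZipArchive (a swapped-in library, used only to keep the example dependency-free). It assumes the inner archive is written without compression so that the outer pass sees the raw, repeated bytes. Deflate only finds repeats within a 32 KB window, so this can only help when identical content lies close together, as with many small files. NestedZip, Pack and PackNested are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration.
public static class NestedZip
{
    // Packs entries into a zip using the given per-entry compression level.
    public static byte[] Pack(IDictionary<string, byte[]> files, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (var file in files)
                {
                    using (Stream entry = archive.CreateEntry(file.Key, level).Open())
                    {
                        entry.Write(file.Value, 0, file.Value.Length);
                    }
                }
            } // disposing the archive writes the central directory
            return output.ToArray();
        }
    }

    // The inner archive keeps entries uncompressed; the outer archive then compresses
    // the inner one as a single stream, so nearby repeated content can be matched.
    public static byte[] PackNested(IDictionary<string, byte[]> files)
    {
        byte[] inner = Pack(files, CompressionLevel.NoCompression);
        var outer = new Dictionary<string, byte[]> { { "inner.zip", inner } };
        return Pack(outer, CompressionLevel.Optimal);
    }
}
```

The recipient has to unzip twice, which may or may not be acceptable when users extract the archive themselves.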

Creating File On Full Disk

Is it possible to create a file on a disk that is full?
Does creating the file take any space?
Basically, I am seeing a case where C# has created a file but failed to write anything to it, which I think points to a full disk.
Does anyone know whether creating a file on a full disk will fail or not?
This was done using C# on Windows Server. The log file was also written to the same drive.

Creating (empty) files should still be possible in most cases. The MFT is a separate part of the volume which won't get used for file data.
It should even be possible to store small amounts of data without needing more than the file entry in the MFT. NTFS can store streams as "resident data" in the stream descriptor which doesn't need any additional space, but only works for very small files.
I think your issue is another problem, though. It may be that you have permissions to create a file but not to write anything to it. You might want to check the ACLs of the location where you're trying to write.
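To tell the two causes apart, a small diagnostic along these lines may help. It is a sketch that assumes the path is on a local drive (DriveInfo does not accept UNC paths on Windows); DiskCheck and its methods are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper.
public static class DiskCheck
{
    // Free bytes available to the current user on the drive that holds the given path.
    public static long FreeBytesFor(string path)
    {
        string root = Path.GetPathRoot(Path.GetFullPath(path));
        return new DriveInfo(root).AvailableFreeSpace;
    }

    // Tries to write the payload and reports which kind of failure occurred.
    public static string TryWrite(string path, byte[] payload)
    {
        try
        {
            File.WriteAllBytes(path, payload);
            return "ok";
        }
        catch (UnauthorizedAccessException)
        {
            return "access denied (check the ACLs of the target location)";
        }
        catch (IOException ex)
        {
            // A full disk surfaces as an IOException; the message carries the details.
            return "I/O error: " + ex.Message;
        }
    }
}
```

Logging FreeBytesFor alongside the result of TryWrite at the point of failure shows whether the drive was actually full or whether the write was rejected for permission reasons.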
