Programmatically merging zip segments made by DotNetZip - c#

I have a problem merging zip segments that I created with the DotNetZip library.
I'm zipping a big file, which produces files like abc.zip, abc.z01 and abc.z02.
using (ZipFile zip = new ZipFile())
{
    zip.AddDirectory(fullDir);
    zip.MaxOutputSegmentSize = 65536;
    zip.Save(packageFullName);
    return zip.NumberOfSegmentsForMostRecentSave;
}
In another service I want to download these files and merge them into a single zip file. I did this by simply concatenating their byte arrays. Sadly, I get an error saying that the resulting archive isn't valid.
I'm not sure why my approach is wrong. I found this Super User question: https://superuser.com/questions/15935/how-do-i-reassemble-a-zip-file-that-has-been-emailed-in-multiple-parts - the accepted answer also produces an invalid archive.
Does anybody know how I can merge several DotNetZip segment files? I don't really want to extract them in memory and pack them again, but maybe that's the only way.

DotNetZip can read segmented zip files without a problem. You can look at its source code to see how it handles the segment files as one zip file. The relevant class is internal, so you cannot use it directly, but it may give you a clue about how to do it:
http://dotnetzip.codeplex.com/SourceControl/latest#Zip/ZipSegmentedStream.cs

Related

split zip files to volumes [duplicate]

I need to create spanned (multi-volume) zip files using .Net, but I have been unable to find a library that enables me to do it.
Spanned zip is a zip compressed file that is split among a number of files, which usually have extensions like .z00, .z01, and so on.
The library would have to be open source or free, because I'm going to use it for an open source project.
(It's a duplicate of this question, but there are no answers there and I'm not looking for anything ASP-specific anyway.)
DotNetZip example:
int segmentsCreated;
using (ZipFile zip = new ZipFile())
{
    zip.UseUnicode = true; // utf-8
    zip.AddDirectory(@"MyDocuments\ProjectX");
    zip.Comment = "This zip was created at " + System.DateTime.Now.ToString("G");
    zip.MaxOutputSegmentSize = 100 * 1024; // 100k segments
    zip.Save("MyFiles.zip");
    segmentsCreated = zip.NumberOfSegmentsForMostRecentSave;
}
If segmentsCreated comes back as 5, then you have the following files, each no more than 100 KB in size:
MyFiles.zip
MyFiles.z01
MyFiles.z02
MyFiles.z03
MyFiles.z04
Edited to note: DotNetZip used to live at Codeplex. Codeplex has been shut down, but the old archive is still available there. It looks like the code has migrated to GitHub:
https://github.com/DinoChiesa/DotNetZip. Looks to be the original author's repo.
https://github.com/haf/DotNetZip.Semverd. This looks to be the currently maintained version. It's also packaged up and available via NuGet at https://www.nuget.org/packages/DotNetZip/
DotNetZip allows you to do this. From their documentation:
The library supports zip passwords, Unicode, ZIP64, stream input and output,
AES encryption, multiple compression levels, self-extracting archives,
spanned archives, and more.
Take a look at the SevenZipSharp library. It supports multivolume archives.

See when the zip file has been zipped?

I have googled without any luck.
My question is if there is a way to see when the file has been zipped? I am not asking for when the file has been created or modified rather to see when it has been zipped.
Is there a byte I can read from the zipped file somehow to find out this in code (C#)?
Best regards
As far as I can tell, that information is not stored in the zip file. The only information you can get is the last time each file inside the zip was modified. That timestamp may or may not match the time the file was first added to the zip.
If you want the last time an entry in the zip archive was changed, you can use the ZipArchiveEntry.LastWriteTime property in the System.IO.Compression namespace.
If you have any doubt about what information is available in a zip file, you can check this Wikipedia article: Zip File Format
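As a rough illustration of the LastWriteTime approach, the sketch below scans an archive and returns the newest entry timestamp, which is only a lower bound on when the archive was built, not the actual zip time. The class and method names (ZipTimestamps, NewestEntryTime) are made up for this example; only ZipArchive and ZipArchiveEntry.LastWriteTime come from System.IO.Compression.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of any library.
public static class ZipTimestamps
{
    // Returns the most recent LastWriteTime among all entries,
    // or null when the archive contains no entries.
    public static DateTimeOffset? NewestEntryTime(Stream zipStream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            DateTimeOffset? newest = null;
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (newest == null || entry.LastWriteTime > newest.Value)
                {
                    newest = entry.LastWriteTime;
                }
            }
            return newest;
        }
    }
}
```

It could be called as `ZipTimestamps.NewestEntryTime(File.OpenRead(path))`. Entry timestamps in the zip format have a two-second resolution, so treat the result as approximate.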

Efficient compression of folder with same file copied multiple times

I am creating a *.zip using Ionic.Zip. However, my *.zip contains the same files multiple times, sometimes even 20x, and the ZIP format does not take advantage of that at all.
What's worse, Ionic.Zip sometimes crashes with an OutOfMemoryException, since I am compressing the files into a MemoryStream.
Is there a .NET library for compressing that takes advantage of redundancy between files?
Users decompress the files on their own, so it cannot be an exotic format.
I ended up creating a tar.gz using the SharpZipLib library. Using this solution on 1 file, the archive is 3kB. Using it on 20 identical files, the archive is only 6kB, whereas in .zip it was 64kB.
Nuget:
Install-Package SharpZipLib
Usings:
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
Code:
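var output = new MemoryStream();
using (var gzip = new GZipOutputStream(output))
using (var tar = TarArchive.CreateOutputTarArchive(gzip))
{
    // files is assumed to be a list of file paths
    for (int i = 0; i < files.Count; i++)
    {
        var tarEntry = TarEntry.CreateEntryFromFile(files[i]);
        tar.WriteEntry(tarEntry, false); // false: do not recurse into directories
    }
    // Keep the underlying MemoryStream open after the archive is disposed
    tar.IsStreamOwner = false;
    gzip.IsStreamOwner = false;
}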
No, no such API is exposed by the well-known formats (such as GZip, PPMd, Zip, LZMA). They all operate per file (or, to be more specific, per stream of bytes).
You could concatenate all the files, e.g. using a tarball format, and then apply a compression algorithm.
Or, it's trivial to implement your own check: compute a hash for each file and store it in a hash-to-filename dictionary. If the hash of the next file matches an existing one, you can decide what to do, such as ignoring the file completely, or noting its name and saving it in another file to mark duplicates.
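The hash-based check could look roughly like the sketch below. It groups file paths by a SHA-256 hash of their content, so any group with more than one path is a set of duplicates. The names DuplicateFinder and GroupByContent are hypothetical, and SHA-256 is just one reasonable choice of hash; what to do with the duplicates is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of any library.
public static class DuplicateFinder
{
    // Maps a hex-encoded SHA-256 content hash to every path with that content.
    public static Dictionary<string, List<string>> GroupByContent(IEnumerable<string> paths)
    {
        var groups = new Dictionary<string, List<string>>();
        using (var sha = SHA256.Create())
        {
            foreach (string path in paths)
            {
                string key;
                using (var stream = File.OpenRead(path))
                {
                    // Hash the file as a stream so it is never fully loaded into memory.
                    key = BitConverter.ToString(sha.ComputeHash(stream));
                }

                List<string> list;
                if (!groups.TryGetValue(key, out list))
                {
                    list = new List<string>();
                    groups[key] = list;
                }
                list.Add(path);
            }
        }
        return groups;
    }
}
```

With the groups in hand, one option is to add only the first path of each group to the archive and record the remaining names in a small manifest entry.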
Yes, 7-Zip. There is a SevenZipSharp library you could use, but in my experience, launching a compression process directly from the command line is much faster.
My personal experience:
We used SevenZipSharp at a company to decompress archives of up to 1 GB, and it was terribly slow until I reworked the code to run 7-Zip's command-line interface directly. Then it was as fast as decompressing manually in Windows Explorer.
I haven't tested this, but according to one answerer in "How many times can a file be compressed?":
If you have a large number of duplicate files, the zip format will zip each independently, and you can then zip the first zip file to remove duplicate zip information.

Create a ZIP file without entries touching the disk?

I'm trying to create a program that has the capability of creating a zipped package containing files based on user input.
I don't need any of those files to be written to the hard drive before they're zipped, as that would be unnecessary, so how do I create these files without actually writing them to the hard drive, and then have them zipped?
I'm using DotNetZip.
See the documentation here, specifically the example called "Create a zip using content obtained from a stream":
using (ZipFile zip = new ZipFile())
{
    ZipEntry e = zip.AddEntry("Content-From-Stream.bin", "basedirectory", StreamToRead);
    e.Comment = "The content for entry in the zip file was obtained from a stream";
    zip.AddFile("Readme.txt");
    zip.Save(zipFileToCreate);
}
If your files are not already in a stream format, you'll need to convert them to one. You'll probably want to use a MemoryStream for that.
I use SharpZipLib, but if DotNetZip can do everything against a basic System.IO.Stream, then yes, just feed it a MemoryStream to write to.
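For comparison, here is a minimal sketch of the same idea using the built-in System.IO.Compression classes rather than DotNetZip or SharpZipLib: every entry is written from an in-memory string and the finished archive is returned as a byte array, so nothing touches the disk. The class name InMemoryZip and its Build method are invented for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of any library.
public static class InMemoryZip
{
    // Builds a zip archive in memory from (entry name -> text content) pairs.
    public static byte[] Build(IDictionary<string, string> entries)
    {
        using (var buffer = new MemoryStream())
        {
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                foreach (KeyValuePair<string, string> pair in entries)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(pair.Key);
                    // UTF-8 without a byte order mark, so the content is exactly the text.
                    using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
                    {
                        writer.Write(pair.Value);
                    }
                }
            } // Disposing the archive writes the central directory into the buffer.
            return buffer.ToArray();
        }
    }
}
```

The returned bytes can be sent over the network or written out with File.WriteAllBytes if a file is needed after all.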
Writing to the hard disk shouldn't be something you avoid just because it seems unnecessary. That's backwards. Unless it's a requirement that the entire zipping process be done in memory, avoid doing so and write to the hard disk instead.
The hard disk is better suited for storing large amounts of data than memory is. If by some chance your zip file ends up being around a gigabyte in size your application could croak or at least cause a system slowdown. If you write directly to the hard drive the zip could be several gigabytes in size without causing an issue.

How to Decompress nested GZip (TGZ) files in C#

I am receiving a TGZ file that will contain one plain text file along with possibly one or more nested TGZ files. I have figured out how to decompress the main TGZ file and read the plain text file contained in it, but I have not been able to figure out how to recognize and decompress the nested TGZ files. Has anyone come across this problem before?
Also, I do not have control over the file I am receiving, so I cannot change the format of a TGZ file containing nested TGZ files. One other caveat (even though I don't think it matters) is that these files are being compressed and tarred in a Unix or Linux environment.
Thanks in advance for any help.
Try the SharpZipLib (http://www.icsharpcode.net/OpenSource/SharpZipLib/Download.aspx) free library.
It lets you work with TGZ and has methods to test files before trying to inflate them; so you can either rely on the file extensions being correct, or test them individually to see if you can read them as compressed files - then inflate them once the main file has been decompressed.
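One library-independent way to test an extracted file, sketched below, is to check for the two-byte gzip magic number (0x1F 0x8B) that every gzip stream starts with. This does not use SharpZipLib's own test methods; the GZipSniffer class and LooksLikeGZip method are hypothetical names. A match only suggests gzip data, so a nested .tgz should still be opened inside a try/catch.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library.
public static class GZipSniffer
{
    // Returns true when the stream starts with the gzip magic bytes 0x1F 0x8B.
    // The stream position is restored afterwards, so the stream must be seekable.
    public static bool LooksLikeGZip(Stream stream)
    {
        if (!stream.CanSeek)
        {
            throw new ArgumentException("A seekable stream is required.", "stream");
        }

        long start = stream.Position;
        int first = stream.ReadByte();   // -1 at end of stream
        int second = stream.ReadByte();
        stream.Position = start;

        return first == 0x1F && second == 0x8B;
    }
}
```

After unpacking the outer archive, each extracted file can be passed through this check, and any file that matches can be fed back into the same decompress-and-untar routine.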
To read and write .tar and .tgz (or .tar.gz ) files from .NET, you can use this one-file tar class:
http://cheesoexamples.codeplex.com/SourceControl/changeset/view/97756#1868643
Very simple usage. To create an archive:
string[] filenames = { ... };
Ionic.Tar.CreateArchive("archive.tar", filenames);
Create a compressed (gzip'd) tar archive:
string[] filenames = { ... };
Ionic.Tar.CreateArchive("archive.tgz", filenames, TarOptions.Compress);
Read a tar archive:
var entries = Ionic.Tar.List("archive.tar"); // also handles .tgz files
Extract all entries in a tar archive:
var entries = Ionic.Tar.Extract("archive.tar"); // also handles .tgz files
Take a look at DotNetZip on CodePlex.
"If all you want is a better DeflateStream or GZipStream class to replace the one that is built-into the .NET BCL, that is here, too. DotNetZip's DeflateStream and GZipStream are available in a standalone assembly, based on a .NET port of Zlib. These streams support compression levels and deliver much better performance than the built-in classes. There is also a ZlibStream to complete the set (RFC 1950, 1951, 1952)."
It appears that you can iterate through the compressed file and pull the individual files out of the archive. You can then test the files you uncompressed and see if any of them are themselves GZip files.
Here is a snippet from their Examples page:
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
    foreach (ZipEntry e in zip)
    {
        e.Extract(OutputStream);
    }
}
Keith
