Is there a way to read a zip file from a network stream without buffering it entirely in memory? I'd like to avoid downloading the entire file before I start processing its contents, to save processing time.
I'm using .NET Core 3.1.
As shown here, the ZipArchive class (in Read mode) buffers the whole stream into memory if the stream isn't seekable. So the only way to avoid buffering large zip files in memory is to first download the file to the local file system and then open it with a FileStream, which is seekable.
This is because the central directory of the zip (the part of the file that lists all the entries and their locations) is located at the end of the file, so the class needs to jump between different parts of the archive to extract its contents.
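A minimal sketch of the download-to-disk approach, assuming the response body is available as a plain Stream (for example from HttpClient's ReadAsStreamAsync). SeekableZip and OpenViaTempFile are hypothetical names. The network stream is read once, front to back, into a temporary file that is deleted automatically when the archive is disposed:

```csharp
using System.IO;
using System.IO.Compression;

public static class SeekableZip
{
    // Copies a forward-only stream (e.g. an HTTP response body) into a temporary file,
    // then opens that file as a ZipArchive. Only a small copy buffer is held in memory.
    public static ZipArchive OpenViaTempFile(Stream source)
    {
        // DeleteOnClose removes the temp file when the archive (and thus this FileStream) is disposed.
        var file = new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
                                  FileShare.None, 81920, FileOptions.DeleteOnClose);
        try
        {
            source.CopyTo(file);   // read the source once, front to back
            file.Position = 0;     // rewind; ZipArchive seeks to the central directory itself
            return new ZipArchive(file, ZipArchiveMode.Read, leaveOpen: false);
        }
        catch
        {
            file.Dispose();        // also deletes the temp file
            throw;
        }
    }
}
```

Usage, assuming an HttpClient named client and a url of your own: using var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead); using var zip = SeekableZip.OpenViaTempFile(await response.Content.ReadAsStreamAsync()); The download still has to finish before the first entry can be read; this only moves the buffering from memory to disk.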
Using C#, I would like to create a zip file in AWS S3, add file entries to it, then close the stream. A System.IO.Compression.ZipArchive can be created from a System.IO.Stream, so is it possible to get a writable stream into an S3 bucket? I am using the AWS SDK for .NET.
An object uploaded to S3 must have a known size when the request is made. Since the size of the zip file won't be known until the stream is closed, you can't do what you are asking. You would have to create the zip file locally, then upload it to S3.
The closest you could get to what you are asking for is S3's multipart upload. I would use a MemoryStream as the underlying stream for the ZipArchive, and each time you add a file to the zip archive, check whether the MemoryStream is larger than 5 megabytes (the minimum size S3 accepts for every part except the last). If it is, take the byte buffer from the MemoryStream and upload it to S3 as a new part. Then clear the MemoryStream and continue adding files to the zip archive.
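A sketch of a variation on this multipart idea. It assumes you wire up the actual S3 calls (InitiateMultipartUpload, UploadPart and CompleteMultipartUpload in the AWS SDK) yourself inside the callback; PartBufferingStream, onPart and Complete are hypothetical names. Instead of a plain MemoryStream, it hands ZipArchive a non-seekable, write-only stream. As far as I can tell, when the target stream is seekable, ZipArchive in Create mode reads its Position to record entry offsets, so clearing a MemoryStream between parts would likely corrupt the central directory; with a non-seekable target, ZipArchive tracks the offsets itself and writes data descriptors instead of seeking back, so the buffer can be emptied safely.

```csharp
using System;
using System.IO;

// Write-only, non-seekable stream that buffers bytes and hands off a "part" each time
// the buffer reaches partSize. Intended as the target of a ZipArchive in Create mode.
public sealed class PartBufferingStream : Stream
{
    private readonly int _partSize;
    private readonly Action<int, byte[]> _onPart;    // (partNumber, bytes); hypothetical hook for UploadPart
    private readonly MemoryStream _buffer = new MemoryStream();
    private int _partNumber = 1;                      // S3 part numbers start at 1

    public PartBufferingStream(int partSize, Action<int, byte[]> onPart)
    {
        _partSize = partSize;
        _onPart = onPart;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;   // makes ZipArchive track offsets itself
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _buffer.Write(buffer, offset, count);
        if (_buffer.Length >= _partSize)
            EmitPart();
    }

    // Sends whatever is still buffered as the final (possibly smaller) part.
    public void Complete()
    {
        if (_buffer.Length > 0)
            EmitPart();
    }

    private void EmitPart()
    {
        _onPart(_partNumber++, _buffer.ToArray());
        _buffer.SetLength(0);   // safe here: nothing reads Position from this stream
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

Usage: create it with partSize = 5 * 1024 * 1024, wrap it in new ZipArchive(target, ZipArchiveMode.Create, leaveOpen: true), add entries, dispose the archive (which writes the central directory), then call Complete() to send the last part before completing the multipart upload.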
You'll probably want to take a look at this answer here for an existing discussion around this.
This doc page seems to suggest that there is an Upload method that can take a stream (with S3 taking care of reassembling the multipart upload), although it is for version 1 of the SDK, so it might not be available in version 3.
I am downloading a zip file using a C# program and I get the error:
at System.IO.Compression.ZipArchive.ReadEndOfCentralDirectory()
at System.IO.Compression.ZipArchive.Init(Stream stream, ZipArchiveMode mode, Boolean leaveOpen)
at System.IO.Compression.ZipArchive..ctor(Stream stream, ZipArchiveMode mode, Boolean leaveOpen, Encoding entryNameEncoding)
at System.IO.Compression.ZipFile.Open(String archiveFileName, ZipArchiveMode mode, Encoding entryNameEncoding)
at System.IO.Compression.ZipFile.ExtractToDirectory(String sourceArchiveFileName, String destinationDirectoryName, Encoding entryNameEncoding)
at System.IO.Compression.ZipFile.ExtractToDirectory(String sourceArchiveFileName, String destinationDirectoryName)
Here's the program:
// request, zipFilePath and destnDirectoryName are set up earlier
using (var response = (HttpWebResponse)request.GetResponse())
using (Stream receiveStream = response.GetResponseStream())
using (var outFile = new FileStream(zipFilePath, FileMode.Create))
{
    // Copy the response body to the local zip file
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = receiveStream.Read(buffer, 0, buffer.Length)) != 0)
        outFile.Write(buffer, 0, bytesRead);
}
try
{
    ZipFile.ExtractToDirectory(zipFilePath, destnDirectoryName);
}
catch (Exception e)
{
    Console.WriteLine(e.ToString());
    Console.ReadLine();
}
I do not understand the error. Can anybody explain it?
Thanks
MR
The problem is that ZipFile can't find the End of Central Directory record, the structure at the end of the archive that marks where it ends. So either:
1. It is not a .zip archive.
   It may be a .rar or another compressed format. Or, as I suspect here, you are downloading an HTML file that auto-redirects to the zip file.
   Solution: find a correct archive to use with this code.
2. The archive is corrupt.
   Solution: the archive will need repairing.
3. There is more than one part to the archive (a multi-part zip file).
   Solution: read in all the parts before decompression.
4. As @ElliotSchmelliot notes in the comments, the file may be hidden or have extended characters in its name.
   Solution: check your file attributes/permissions and verify the file name.
Opening the file with your favorite zip/unzip utility (7-Zip, WinZip, etc.) will tell you which of these it is.
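The first check (is it really a zip?) can also be done in code by looking at the file's leading "magic number" bytes. A minimal sketch, assuming the file is already on disk; ArchiveSniffer and its methods are hypothetical names, and only a handful of common signatures are checked:

```csharp
using System.IO;
using System.Text;

public static class ArchiveSniffer
{
    // Reads the first few bytes of a file and guesses its format.
    public static string GuessFile(string path)
    {
        var head = new byte[8];
        int count;
        using (FileStream fs = File.OpenRead(path))
            count = fs.Read(head, 0, head.Length);
        return Guess(head, count);
    }

    public static string Guess(byte[] head, int count)
    {
        if (HasPrefix(head, count, 0x50, 0x4B, 0x03, 0x04)) return "zip";          // "PK\x03\x04": first local file header
        if (HasPrefix(head, count, 0x50, 0x4B, 0x05, 0x06)) return "zip (empty)";  // "PK\x05\x06": end of central directory only
        if (HasPrefix(head, count, 0x1F, 0x8B)) return "gzip";
        if (HasPrefix(head, count, 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07)) return "rar"; // "Rar!\x1A\x07"
        if (HasPrefix(head, count, 0x25, 0x50, 0x44, 0x46)) return "pdf";          // "%PDF"
        // A failed download is often an HTML error page instead of the archive.
        string text = Encoding.ASCII.GetString(head, 0, count).TrimStart();
        return text.Length > 0 && text[0] == '<' ? "html/xml (probably an error page)" : "unknown";
    }

    private static bool HasPrefix(byte[] head, int count, params byte[] signature)
    {
        if (count < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
            if (head[i] != signature[i]) return false;
        return true;
    }
}
```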
From your old question, which you deleted:
I get System.IO.InvalidDataException: End of Central Directory record could not be found.
This most likely means that whatever file you are passing in is malformed, and the unzip fails. Since you already have the file (outFile) on the hard drive, I would recommend trying to open it with the zip extractor built into Windows and seeing whether that works. If it fails, the problem is not with your unzipping code but with the data the server is sending you.
I had the same problem, but in my case the problem was in the compression step, not the decompression.
During compression I needed to use a Using statement for both the Stream and the ZipArchive objects. Disposing the ZipArchive (which the Using block does) closes the archive properly and writes its central directory, and after that I can decompress it without any problem.
The working code in my case, in VB.NET:
Using zipStreamToCreate As New MemoryStream()
    Using archive As New ZipArchive(zipStreamToCreate, ZipArchiveMode.Create)
        ' Add entries...
    End Using ' disposing the archive writes the central directory
    ' Return the zip byte array, for example:
    Return zipStreamToCreate.ToArray()
End Using
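The same fix in C#, as a minimal sketch: the bytes are taken from the MemoryStream only after the ZipArchive has been disposed, because disposing is what writes the central directory (including the End of Central Directory record). Calling ToArray() inside the inner using block would typically give you an archive that fails with exactly this error. ZipBytes.CreateZip is a hypothetical helper name.

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipBytes
{
    // Builds an in-memory zip containing one text entry and returns its bytes.
    public static byte[] CreateZip(string entryName, string content)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps 'buffer' usable after the archive is disposed
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = archive.CreateEntry(entryName);
                using (var writer = new StreamWriter(entry.Open()))
                    writer.Write(content);
            } // central directory and End of Central Directory record are written here

            return buffer.ToArray();   // only now is the archive complete
        }
    }
}
```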
I encountered this same problem. There are many compression formats, .zip being only one of them. Make sure that you aren't trying to 'unzip' a .rar or similar file.
In my case I absolutely KNEW that my zip was not corrupted, and I figured out through trial and error that I was extracting the files to a directory whose folder name included the zip's file name and extension.
So unzipping /tmp/data.zip to:
/tmp/staging/data.zip/files_go_here
failed with the error [End of Central Directory record could not be found]
but extracting data.zip to this worked just fine:
/tmp/staging/data/files_go_here
While it might seem unusual to some folks to name a folder a filename with extension, I can't think of a single reason why you should not be able to do this, and more importantly -- the error returned is not obviously related to the cause.
I was getting the same error with both the System.IO.Compression library and 3rd party packages such as SharpZipLib -- which is what eventually clued me in that it was a more general issue.
I hope this helps someone and saves them some time/frustration.
I used the SharpCompress .NET library, available via the NuGet package manager, and it solved my unzipping problem.
I just came across this thread when I had the same error from a PowerShell script calling the Net.WebClient DownloadFile method.
In my case, the problem was that the web server was unable to provide the requested zip file, and instead provided an HTML page with an error message in it, which obviously could not be unzipped.
So instead, I created an exception handler to extract and present the "real" error message.
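The same idea in C#: a minimal sketch, assuming the archive is fetched with HttpClient, that refuses to treat an error page as a zip and surfaces the server's own message instead. ZipDownload.ReadZipBytesAsync is a hypothetical helper; the checks (status code, a text/* content type, and the leading "PK" bytes that a plain, non-self-extracting zip normally starts with) are heuristics, not a full validation.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ZipDownload
{
    // Returns the body of a response that looks like a zip; otherwise throws with the
    // server's own message instead of letting ZipArchive fail later with
    // "End of Central Directory record could not be found".
    public static async Task<byte[]> ReadZipBytesAsync(HttpResponseMessage response)
    {
        string mediaType = response.Content.Headers.ContentType?.MediaType ?? "";
        if (!response.IsSuccessStatusCode || mediaType.StartsWith("text/", StringComparison.OrdinalIgnoreCase))
        {
            string body = await response.Content.ReadAsStringAsync();
            throw new InvalidDataException(
                $"Expected a zip but got {(int)response.StatusCode} {mediaType}: {body}");
        }

        byte[] bytes = await response.Content.ReadAsByteArrayAsync();
        // Plain zip files start with "PK" (local file header or, if empty, the EOCD record).
        if (bytes.Length < 4 || bytes[0] != 0x50 || bytes[1] != 0x4B)
            throw new InvalidDataException("Response body does not start with a zip signature.");
        return bytes;
    }
}
```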
Might be useful to someone else. I dealt with this by adding an exception handler to my code, which then:
1. Creates a temporary directory
2. Extracts the zip archive (normally works)
3. Renames the original zip archive to *.bak
4. Zips the extracted files and replaces the original archive file with one that works
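A sketch of those four steps. The answer doesn't say which extractor still succeeds (ZipFile.ExtractToDirectory would fail on the same archive), so the extraction step is a hypothetical `extract` delegate you would back with whatever tool works for you; ZipRepair.RebuildArchive is likewise a hypothetical name, and the re-zip uses System.IO.Compression.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipRepair
{
    // 'extract' is a hypothetical hook: it must extract zipPath into the given directory.
    public static void RebuildArchive(string zipPath, Action<string, string> extract)
    {
        // 1. Create a temporary directory
        string tempDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(tempDir);
        try
        {
            // 2. Extract the archive with the tool that still works
            extract(zipPath, tempDir);

            // 3. Rename the original to *.bak (replacing an older backup, if any)
            string backup = Path.ChangeExtension(zipPath, ".bak");
            if (File.Exists(backup)) File.Delete(backup);
            File.Move(zipPath, backup);

            // 4. Re-zip the extracted files under the original name
            ZipFile.CreateFromDirectory(tempDir, zipPath);
        }
        finally
        {
            Directory.Delete(tempDir, recursive: true);
        }
    }
}
```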
For me, the problem had to do with git settings.
To solve it, I added:
*.zip binary
to my .gitattributes file.
Then I downloaded an uncorrupted version of the file (without using git) and added a new commit updating the .zip file to the uncorrupted version and also updating the .gitattributes file.
I wish I could avoid adding that extra commit to update the .zip file, but the only way I can think of avoiding that would be to insert a commit updating the .gitattributes file into or before the commit that added the .zip file (using a rebase) and using git push -f to update the remote repo, but I can't do that.
I also had this error because I was trying to open a .json file as a .zip archive:
using (ZipArchive archive = ZipFile.Open(fileToSend.FilePath, ZipArchiveMode.Read))
{
    ZipArchiveEntry entry = archive.GetEntry(fileToSend.FileName);
    using (StreamReader reader = new StreamReader(entry.Open(), Encoding.UTF8))
    {
        fileContent = reader.ReadToEnd();
    }
}
I was expecting fileToSend.FilePath to be "C:\MyProject\mydata.zip", but it was actually "C:\MyProject\mydata.json", and that was causing the error.
Write the stream to a file, then inspect it with a (hex) editor.
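To check the specific thing the exception complains about, a sketch that looks for the End of Central Directory signature (bytes 50 4B 05 06, "PK\x05\x06") in the tail of the file. The ZIP format puts that record last: 22 fixed bytes plus an optional comment of up to 65,535 bytes, so only that much of the tail needs scanning. ZipInspector and FindEndOfCentralDirectory are hypothetical names, and the stream must be seekable (a FileStream or MemoryStream). A result of -1 means the file is not a complete zip, whatever its extension.

```csharp
using System;
using System.IO;

public static class ZipInspector
{
    private const int EocdMinSize = 22;          // fixed part of the EOCD record
    private const int MaxCommentLength = 0xFFFF; // zip comment can be up to 65535 bytes

    // Returns the offset of the EOCD record, or -1 if the signature is missing.
    public static long FindEndOfCentralDirectory(Stream stream)
    {
        long length = stream.Length;
        if (length < EocdMinSize) return -1;

        int tailLength = (int)Math.Min(length, EocdMinSize + MaxCommentLength);
        var tail = new byte[tailLength];
        stream.Seek(length - tailLength, SeekOrigin.Begin);
        int read = 0;
        while (read < tailLength)
        {
            int n = stream.Read(tail, read, tailLength - read);
            if (n == 0) break;
            read += n;
        }

        // Search backwards: the EOCD record is the last structure in the file.
        for (int i = read - EocdMinSize; i >= 0; i--)
        {
            if (tail[i] == 0x50 && tail[i + 1] == 0x4B && tail[i + 2] == 0x05 && tail[i + 3] == 0x06)
                return length - tailLength + i;
        }
        return -1;
    }
}
```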
I got the same message in Visual Studio when downloading a .nupkg from nuget.org. It was because nuget.org was blocked by the firewall, so instead of the package I got an HTML error page, which (of course) cannot be unzipped.
In my case, I was mistakenly saving an input stream as *.zip.
While Archive Utility had no issues opening the file, everything else (the unzip command, Java libraries) failed with the same "End of Central Directory" error.
The plot twist: the file I was downloading was in gzip format (*.gz), not zip.
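If the payload really is gzip, ZipArchive is the wrong tool: a .gz file is a single compressed stream with no central directory. A minimal sketch using System.IO.Compression.GZipStream instead; GzipHelper.Decompress is a hypothetical name. If the .gz wraps a tar archive, the decompressed bytes are still a tar file, not a zip.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipHelper
{
    // Decompresses a complete .gz payload held in memory.
    public static byte[] Decompress(byte[] gz)
    {
        using (var input = new GZipStream(new MemoryStream(gz), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```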
Make sure it is a zip file you are trying to decompress.
The web service I was querying zips its results when there are two files, but in this instance it was returning just one. My code was saving the embedded base64 content as a stream and always giving it the .zip extension,
whereas it was actually just a plain PDF...
In my case, I was receiving this error in a combination with FileSystemWatcher, which triggered a processing method upon the zip archive before the archive was fully copied/created in its target folder.
I solved it by checking, in a try/catch block inside a while loop, whether the zip archive could actually be read yet.
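A sketch of that retry loop, assuming simple polling with a timeout is acceptable; ZipWait.OpenWhenReady and its parameters are hypothetical. It treats IOException (the file is still locked by the process copying it, or doesn't exist yet) and InvalidDataException (the End of Central Directory record hasn't been written yet) as "not ready".

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading;

public static class ZipWait
{
    // Polls until the archive can be opened exclusively and parsed, or the timeout elapses.
    public static ZipArchive OpenWhenReady(string path, TimeSpan timeout, TimeSpan pollInterval)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            FileStream stream = null;
            try
            {
                // FileShare.None fails while another process still has the file open
                stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None);
                return new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: false);
            }
            catch (Exception ex) when (ex is IOException || ex is InvalidDataException)
            {
                stream?.Dispose();
                if (DateTime.UtcNow >= deadline) throw;   // give up and surface the last error
                Thread.Sleep(pollInterval);
            }
        }
    }
}
```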
My solution: compress with PowerShell instead:
Compress-Archive * -DestinationPath a.zip
I found a resolution.
Go to Tools > NuGet Package Manager > Package Manager Settings, and on the General tab of the "NuGet Package Manager" section click the Clear All NuGet Caches button, then OK. You can then install the package from online.
I'm trying to create a program that has the capability of creating a zipped package containing files based on user input.
I don't need any of those files to be written to the hard drive before they're zipped, as that would be unnecessary, so how do I create these files without actually writing them to the hard drive, and then have them zipped?
I'm using DotNetZip.
See the documentation here, specifically the example called "Create a zip using content obtained from a stream":
using (ZipFile zip = new ZipFile())
{
    // Entry content comes from a stream, not a file on disk; the entry name can include a folder
    ZipEntry e = zip.AddEntry("basedirectory/Content-From-Stream.bin", StreamToRead);
    e.Comment = "The content for entry in the zip file was obtained from a stream";
    zip.AddFile("Readme.txt");
    zip.Save(zipFileToCreate);
}
If your files are not already in a stream format, you'll need to convert them to one. You'll probably want to use a MemoryStream for that.
I use SharpZipLib, but if DotNetZip can do everything against a basic System.IO.Stream, then yes, just feed it a MemoryStream to write to.
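For comparison, the same thing with the built-in System.IO.Compression (swapped in here instead of DotNetZip or SharpZipLib): a minimal sketch that zips content held in memory straight into any writable Stream, so nothing touches the disk unless you pass a FileStream. InMemoryZip.WriteZip is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class InMemoryZip
{
    // Writes every (name, bytes) pair as an entry into 'output'.
    // 'output' can be a MemoryStream, a FileStream, or e.g. an HTTP response body.
    public static void WriteZip(IEnumerable<KeyValuePair<string, byte[]>> files, Stream output)
    {
        // leaveOpen: true so the caller still owns 'output' after the archive is finalized
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var file in files)
            {
                ZipArchiveEntry entry = archive.CreateEntry(file.Key, CompressionLevel.Optimal);
                using (Stream entryStream = entry.Open())
                    entryStream.Write(file.Value, 0, file.Value.Length);
            }
        }
    }
}
```

User input can be turned into entry content with, for example, Encoding.UTF8.GetBytes(text); the archive is complete once WriteZip returns.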
Writing to the hard disk isn't something to avoid just because it seems unnecessary. That's backwards. Unless it's a requirement that the entire zipping process happen in memory, avoid holding it in memory by writing to the hard disk.
The hard disk is better suited to storing large amounts of data than memory is. If your zip file ends up being around a gigabyte in size, your application could crash, or at least slow the system down. If you write directly to the hard drive, the zip could be several gigabytes in size without causing an issue.
I'm trying to update (add/modify files in) an existing JAR file, and this code (using the DotNetZip library) results in a "corrupted" archive; I cannot open it with WinRAR as either a ZIP or a JAR:
using (FileStream fs = new FileStream("/path/to/jar", FileMode.Open))
{
    ZipFile zip = ZipFile.Read(fs);
    fs.Seek(0, SeekOrigin.Begin);
    zip.Save(fs);
}
Can anyone tell me what the difference between the ZIP and JAR format is, exactly? I was under the impression it was simply the ZIP format with the manifest as the first entry in the file, which is apparently not the case. Is there an existing (C#) library I can use to do this?
A JAR is binary compatible with a "standard" ZIP archive. Only the optional manifest file is prescribed, and it will not cause a "corrupt archive".
I believe one (or both) of these is happening:
1. The file is not being truncated, so "garbage" is left at the end.
2. The actual Read is lazy, so the later Save might overwrite data before it has been correctly read.
(Zipping it to a new file would allow this to be verified.)
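A sketch of an in-place update that avoids reading and saving through the same stream by hand, using System.IO.Compression (swapped in here instead of DotNetZip). My understanding is that ZipArchiveMode.Update loads the archive contents it needs before rewriting the file on Dispose, and sets the file length at the end, which would sidestep both suspected causes; please verify on your own JARs. JarUpdater.AddOrReplace is a hypothetical helper. Note that deleting and re-creating an entry moves it to the end of the archive, and some readers (Java's JarInputStream, for one) look for the manifest only near the start, so avoid replacing META-INF/MANIFEST.MF this way.

```csharp
using System.IO;
using System.IO.Compression;

public static class JarUpdater
{
    // Adds or replaces one entry in an existing ZIP/JAR file in place.
    public static void AddOrReplace(string archivePath, string entryName, byte[] content)
    {
        using (ZipArchive archive = ZipFile.Open(archivePath, ZipArchiveMode.Update))
        {
            archive.GetEntry(entryName)?.Delete();   // remove the old version, if any
            ZipArchiveEntry entry = archive.CreateEntry(entryName);
            using (Stream s = entry.Open())
                s.Write(content, 0, content.Length);
        } // the archive is rewritten to disk here
    }
}
```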