Unzipping A Gzip File That Contains Folders In C# - c#

I've got a Windows program written in C# that works with log files. Some of these log files arrive gzipped (for instance, test.log.gz). I've got code using SharpZipLib to unzip those log files, and it works really well.
public static void unZip(string gzipFilePath, string targetDir)
{
    byte[] dataBuffer = new byte[4096];
    using (System.IO.Stream fs = new FileStream(gzipFilePath, FileMode.Open, FileAccess.Read))
    {
        using (GZipInputStream gzipStream = new GZipInputStream(fs))
        {
            string fnOut = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFilePath));
            using (FileStream fsOut = File.Create(fnOut))
            {
                StreamUtils.Copy(gzipStream, fsOut, dataBuffer);
            }
        }
    }
}
From my research, a gzip file typically contains a single file, so the archive is always named something like test.htm.gz. I therefore create a file named test.htm and write the uncompressed data into it, which happens in this part of the code:
using (GZipInputStream gzipStream = new GZipInputStream(fs))
{
    string fnOut = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFilePath));
    using (FileStream fsOut = File.Create(fnOut))
    {
        StreamUtils.Copy(gzipStream, fsOut, dataBuffer);
    }
}
This is all well and good, but the problem is that I've been given a log file (again, for example, test.log.gz) that has directories zipped into it.
When I use the 7-Zip GUI to unzip the file, the log file I need is five directories deep. After unzipping with 7-Zip, the output is:
folder1 -> folder2 -> folder3 -> folder4 -> folder5 -> test.log
Using the SharpZipLib method above gives me only a small subset of the file's data in test.log.
I haven't been able to find any code or issues dealing with gzipped files containing folders, and from what I can tell, you're not supposed to do that: the folders should go into a .tar, which is then gzipped.
Does anyone have any idea what I could do with this .gz file?

First, maybe try another library. Here are a few:
http://dotnetzip.codeplex.com/
http://www.icsharpcode.net/OpenSource/SharpZipLib/
There is also a GZip implementation built into .NET; see:
Unzipping a .gz file using C#

There is still just one file in there, so there isn't any violation of the gzip format. gzip permits an entire path name to be stored with the file, so that path may simply be ghostcache/ic_split_files/CBN/00-christmas/test.log and 7-Zip is faithfully recreating that path. You should be able to see this in the gzip header, starting about ten bytes in.
The fact that you are getting back only a subset of the log may or may not be related to the pathname in the gzip file.
Please provide a hex dump of the first 64 bytes of the .gz file that worked and of the .gz file that didn't.
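As an alternative to reading a hex dump by eye, the stored path can be pulled out of the header in code. The following is a minimal sketch based on the gzip header layout in RFC 1952 (10 fixed bytes, an optional extra field, then an optional zero-terminated file name when the FNAME flag bit is set). The class and method names are hypothetical, not part of SharpZipLib or .NET.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class GzipHeaderInspector
{
    // Returns the file name stored in the gzip header (FNAME field),
    // or null if the header does not carry one.
    public static string ReadStoredName(Stream gz)
    {
        var header = new byte[10];
        if (ReadFully(gz, header, 10) != 10 || header[0] != 0x1F || header[1] != 0x8B)
            throw new InvalidDataException("Not a gzip stream.");

        int flags = header[3];
        const int FEXTRA = 0x04, FNAME = 0x08;

        if ((flags & FEXTRA) != 0)
        {
            // Two-byte little-endian length, followed by that many bytes to skip.
            int lo = gz.ReadByte(), hi = gz.ReadByte();
            if (lo < 0 || hi < 0) throw new EndOfStreamException();
            int extraLength = lo | (hi << 8);
            if (ReadFully(gz, new byte[extraLength], extraLength) != extraLength)
                throw new EndOfStreamException();
        }

        if ((flags & FNAME) == 0) return null;

        // Zero-terminated name; each byte is treated as an ISO-8859-1 character.
        var name = new StringBuilder();
        int b;
        while ((b = gz.ReadByte()) > 0) name.Append((char)b);
        if (b < 0) throw new EndOfStreamException();
        return name.ToString();
    }

    // Reads until 'count' bytes have arrived or the stream ends.
    private static int ReadFully(Stream s, byte[] buffer, int count)
    {
        int total = 0, read;
        while (total < count && (read = s.Read(buffer, total, count - total)) > 0)
            total += read;
        return total;
    }
}
```

Opening the problem file with File.OpenRead and passing the stream to ReadStoredName should show whether the folder chain that 7-Zip recreates is simply the name stored in the header.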

Related

C# - Extracting contents of compressed file without saving

I have a compressed file (.osz) stored on an S3 Bucket which contains a .os file which I need to read the contents of.
I need to be able to extract/decompress and read the contents of the compressed file without downloading the file directly to the PC due to security reasons.
I am able to retrieve the compressed file (.osz) using the URL of its S3 Bucket path. Then, using ZipArchive, I am able to access the files contained within the compressed file. I can extract the file I require with the 'ExtractToFile' function, as seen below. However, this function extracts the file (.os) and saves it locally in the path specified.
WebClient client = new WebClient();
byte[] bytes = client.DownloadData(OSZFilepath); // Read the .osz file contents into a byte array
using (MemoryStream zipStream = new MemoryStream(bytes))
{
    // Create the zip containing the file from the stream
    ZipArchive zip = new ZipArchive(zipStream);
    // Extract the compressed file and save the .os file contained within to disk
    var fileName = Guid.NewGuid().ToString() + ".OS";
    var baseDirectory = Environment.ExpandEnvironmentVariables(System.Web.Configuration.WebConfigurationManager.AppSettings["DataStorage"].ToString());
    zip.Entries[0].ExtractToFile(Path.Combine(baseDirectory, "ProjectFiles", fileName));
}
Although this extracted file can be successfully read and imported by my program, I cannot use this method as I am not able to save the file onto the user's computer due to security restrictions and program requirements.
Therefore I need to programmatically extract/decompress the file I need in order to read the contents and programmatically import it into my program.
I have tried to use the following code to do this:
ZipArchiveEntry entry = zip.GetEntry(zip.Entries[0].Name);
Stream stream = entry.Open();
StreamReader reader = new StreamReader(stream);
string contents = reader.ReadToEnd();
However, importing the resulting contents throws an error, indicating that they differ from the contents of the file saved by 'ExtractToFile'.
This is confirmed when I save the contents as a separate file and compare it with the file saved by 'ExtractToFile': the 'ExtractToFile' file is bigger.
So my question is: is there another way to successfully decompress/extract a compressed file and obtain the contents without using the 'ExtractToFile' method and having to save the extracted file somewhere?
Thanks for your help.
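One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that StreamReader decodes the entry as text, which can alter data that is not valid text in the assumed encoding. The sketch below avoids any text decoding by copying the entry's raw bytes into a MemoryStream, so nothing is written to disk. The class and method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration.
public static class ZipEntryReader
{
    // Returns the raw, decompressed bytes of the first entry of a zip archive
    // that is already held in memory. No file is created on disk.
    public static byte[] ReadFirstEntry(byte[] zipBytes)
    {
        using (var zipStream = new MemoryStream(zipBytes))
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read))
        using (var entryStream = zip.Entries[0].Open())
        using (var buffer = new MemoryStream())
        {
            entryStream.CopyTo(buffer);   // byte-for-byte copy, no encoding involved
            return buffer.ToArray();
        }
    }
}
```

If the importer needs a stream rather than a byte array, wrapping the result in a new MemoryStream keeps everything in memory.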

c# ZipFile.CreateFromDirectory - the process cannot access the file "path_to_the_zip_file_created.zip" because it is being used by another process

Basic Code:
string startPath = @"C:\intel\logs";
string zipPath = @"C:\intel\logs-" + DateTime.Now.ToString("yyyy_dd_M-HH_mm_ss") + ".zip";
ZipFile.CreateFromDirectory(startPath, zipPath);
Error: the process cannot access the file "path_to_the_zip_file_created.zip" because it is being used by another process.
The above setup works fine on windows 7 where I have Visual Studio installed but I get the above error message when running on Windows Server 2008R2.
I have checked the antivirus logs and it does not block the application, nor does it lock the zip file that is created.
//WRONG
ZipFile.CreateFromDirectory(@"C:\somefolder", @"C:\somefolder\somefile.zip");
//RIGHT
ZipFile.CreateFromDirectory(@"C:\somefolder", @"C:\someotherfolder\somefile.zip");
I used to make the same mistake: creating the zip file inside the very folder I was zipping.
This causes an error, of course.
I came across this because I was trying to zip the folder where my log files were being actively written by a running application. Kyle Johnson's answer could work, but it adds the overhead of copying the folder and the need to clean up the copy afterwards. Here's some code that will create the zip even while log files are being written to:
void SafelyCreateZipFromDirectory(string sourceDirectoryName, string zipFilePath)
{
    using (FileStream zipToOpen = new FileStream(zipFilePath, FileMode.Create))
    using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Create))
    {
        foreach (var file in Directory.GetFiles(sourceDirectoryName))
        {
            var entryName = Path.GetFileName(file);
            var entry = archive.CreateEntry(entryName);
            entry.LastWriteTime = File.GetLastWriteTime(file);
            using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (var stream = entry.Open())
            {
                fs.CopyTo(stream);
            }
        }
    }
}
I had the exact same problem. The workaround is to copy the folder you are zipping to another folder and point CreateFromDirectory there. Don't ask me why this works but it does.
Directory.CreateDirectory(<new directory path>);
File.Copy(<copy contents into new folder>);
ZipFile.CreateFromDirectory(<new folder path>, <zipPath>);
The other answers give the correct reason, but I had a little trouble understanding them at first sight.
If the zip file is being created inside the directory that is passed to ZipFile.CreateFromDirectory, the method creates the zip file and starts adding the directory's files to it. Eventually it tries to add the zip file itself to the archive, since it sits in the same directory. That is neither possible nor desired, because the zip file is still in use by the CreateFromDirectory method.
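A guard against that situation could look like the sketch below, which checks whether the destination path falls inside the source directory before zipping. The class and method names are hypothetical, and the case-insensitive comparison assumes a Windows-style file system.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration.
public static class ZipPathCheck
{
    // True when filePath lies inside directory (at any depth).
    public static bool IsInsideDirectory(string filePath, string directory)
    {
        // Normalize both paths and make sure the directory ends with a separator,
        // so that "logs-1.zip" is not mistaken for something inside "logs".
        string dir = Path.GetFullPath(directory)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            + Path.DirectorySeparatorChar;
        string file = Path.GetFullPath(filePath);

        // Assumption: case-insensitive file system, as on Windows.
        return file.StartsWith(dir, StringComparison.OrdinalIgnoreCase);
    }
}
```

A caller might throw or pick a different destination when IsInsideDirectory(zipPath, startPath) returns true. Note that a zip created next to the folder, rather than inside it, is not flagged by this check.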
If you're getting this error because NLog is locking your log files, you can use the following workaround. Add 'keepFileOpen' attribute to your nlog tag inside NLog.config and set it to false:
<nlog xmlns=.......
keepFileOpen="false"
....>
More details here.
Note that this setting will have a negative impact on NLog logging performance, as indicated here.

Generating a zipOutputstream from a folder having other .zip files in it

I am trying to convert an entire Azure blob storage folder and its contents to a zip file. Inside this folder I have different types of files, e.g. .txt, .mp3, and .zip files. Once the folder was converted to a zip file, I noticed that all the .zip files had become corrupted. How can I prevent my zip files from being corrupted? I am using the Ionic.Zip library to generate the zip files.
Here is the code I am using. With it I am able to generate and download the zip file successfully, with every file type intact except the inner zip files.
var allFiles = directory.ListBlobs(new BlobRequestOptions { UseFlatBlobListing = true }).Where(x => x.GetType() == typeof(CloudBlockBlob)).Cast<CloudBlob>();
string xyzblob = directory.Uri.ToString().TrimEnd('/');
var dBlob = blobClient.GetBlobReference(xyzblob);
byte[] fileBytes = null;
fileBytes = dBlob.DownloadByteArray();
foreach (var file in allFiles)
{
    using (var fileStream = new MemoryStream(fileBytes))
    {
        var entryName = file.Uri.ToString().Replace(directory.Uri.ToString(), "");
        zipOutputStream.PutNextEntry(entryName);
        fileStream.Seek(0, SeekOrigin.Begin);
        int count = fileStream.Read(fileBytes, 0, fileBytes.Length);
        while (count > 0)
        {
            zipOutputStream.Write(fileBytes, 0, count);
            count = fileStream.Read(fileBytes, 0, fileBytes.Length);
            if (!Response.IsClientConnected)
            {
                break;
            }
            Response.Flush();
        }
        fileStream.Close();
    }
}
zipOutputStream.Close();
More details
I am downloading a folder, "myFolder", and its contents from Azure blob storage as a zip file, e.g. myfolders.zip.
Here is the file structure inside "myFolder" in the Azure blob container:
MyFolder/mymusic/ test.mp3
MyFolder/mytext/ newtext.txt
MyFolder/MyZipfiles/ myzip.zip
The code I posted above zips all the contents of the folder into "MyFolder.zip", which then downloads automatically. If you unzip "MyFolder.zip", for some reason myzip.zip is corrupted. If I try to open myzip.zip, it shows the message "Windows cannot open the folder, the compressed zipped folder "myzip.zip" is invalid".
Please help me find a solution so that the .zip files won't get corrupted.
I tried downloading to a stream, but got the same result: the inner zip files are corrupted, while all other file types are in good shape.
zipOutputStream.PutNextEntry(entryName);
destBlob.DownloadToStream(zipOutputStream);
I am assuming you already tried downloading one of those zip files and opening it, right?
If that is the case, one thing I would suggest is to eliminate the intermediate fileBytes array completely. Using fileBytes as the buffer to fileStream and then reading from fileStream to fileBytes might be the culprit. On the other hand, you start from offset 0 and write to the beginning of fileBytes anyway, so it might be working just fine.
In any case, a more efficient solution is to call PutNextEntry and then call the blob object's DownloadToStream method, passing in the zip stream itself. That simply copies the entire blob directly into the zip stream without having to manage an intermediate buffer.
When the loop reaches a .zip file, I point the blob reference at that .zip file, and this resolved the issue:
dBlob = blobClient.GetBlobReference(entryName.EndsWith(".zip") ? file.Uri.ToString() : xyzblob);
zipOutputStream.PutNextEntry(entryName);
dBlob.DownloadToStream(zipOutputStream);

The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.(.exe file)

I want to extract an exe file. The exe file contains some files and folders. When I extract the file using WinRAR it works, but when I try to extract it using some examples I get this error:
The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
I have used some samples and googled a lot for my problem but didn't get my answer, and I have used some libraries also.
I used this code but same error:
public static void Decompress(FileInfo fi)
{
    // Get the stream of the source file.
    using (FileStream inFile = fi.OpenRead())
    {
        // Get original file extension, for example
        // "doc" from report.doc.gz.
        string curFile = fi.FullName;
        string origName = curFile.Remove(curFile.Length -
            fi.Extension.Length);
        //Create the decompressed file.
        using (FileStream outFile = File.Create(origName))
        {
            using (GZipStream Decompress = new GZipStream(inFile,
                CompressionMode.Decompress))
            {
                // Copy the decompression stream
                // into the output file.
                Decompress.CopyTo(outFile);
                Console.WriteLine("Decompressed: {0}", fi.Name);
            }
        }
    }
}
That's because the .exe file is a self-extracting archive...
You should give DotNetZip a try. From the project's FAQ:
Does this library read self-extracting zip files?
Yes. DotNetZip can read self-extracting archives (SFX) generated by WinZip, and WinZip
can read SFX files generated by DotNetZip.
You can install it easily from NuGet.
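The error message itself refers to the first two bytes of the file, which are 0x1F 0x8B for gzip data, whereas a Windows executable starts with "MZ" (0x4D 0x5A). A minimal sketch of a check that could run before handing a file to GZipStream; the class and method names are hypothetical.

```csharp
using System.IO;

// Hypothetical helper for illustration.
public static class GzipSniffer
{
    // True when the file begins with the gzip magic number 0x1F 0x8B.
    public static bool LooksLikeGzip(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            int first = fs.ReadByte();   // -1 if the file is empty
            int second = fs.ReadByte();
            return first == 0x1F && second == 0x8B;
        }
    }
}
```

A false result for the .exe would confirm that it needs an archive library that understands self-extracting files rather than a gzip decompressor.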

Gzip a directory that has subdirectories using GZipStream class with C#?

This MSDN site has an example of gzipping a file. How can I gzip a whole directory that has subdirectories in it?
Since gzip only works on files, I suggest you tar your directory and then gzip the generated tar file.
You can use tar-cs or SharpZipLib to generate your tar file.
You can't!
GZip was created for files, not directories :)
gzip operates on a single stream. To create a multi-stream (multi-file) archive using GZipStream, you need to include your own index. Basically, at its simplest, you would write the file offsets to the beginning of the output stream, so that when you read it back in you know where the boundaries are. This method would not be PKZIP compatible. To be compatible you would have to read and implement the ZIP format... or use something like SharpZip, or Zip.NET
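To make the "own index" idea concrete, here is a minimal sketch of a home-grown container written through a single GZipStream. It swaps the offset table described above for a simpler scheme: a file count, then a name and a length prefix before each file's bytes. The class name and the layout are assumptions for illustration only; the result is not readable by gzip, tar, or zip tools, and each file is read fully into memory.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical container format for illustration; not compatible with any standard tool.
public static class SimpleGzipBundle
{
    // Writes every file under sourceDir (recursively) into one gzip stream.
    public static void Pack(string sourceDir, Stream output)
    {
        string root = Path.GetFullPath(sourceDir);
        string[] files = Directory.GetFiles(root, "*", SearchOption.AllDirectories);

        using (var gz = new GZipStream(output, CompressionMode.Compress, true))
        using (var writer = new BinaryWriter(gz, Encoding.UTF8))
        {
            writer.Write(files.Length);                 // number of entries
            foreach (string file in files)
            {
                string relative = file.Substring(root.Length)
                    .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
                    .Replace('\\', '/');
                byte[] data = File.ReadAllBytes(file);
                writer.Write(relative);                 // length-prefixed name
                writer.Write(data.Length);              // boundary marker
                writer.Write(data);
            }
        }
    }

    // Reads the bundle back into a map of relative path -> contents.
    public static Dictionary<string, byte[]> Unpack(Stream input)
    {
        var result = new Dictionary<string, byte[]>();
        using (var gz = new GZipStream(input, CompressionMode.Decompress, true))
        using (var reader = new BinaryReader(gz, Encoding.UTF8))
        {
            int count = reader.ReadInt32();
            for (int i = 0; i < count; i++)
            {
                string name = reader.ReadString();
                int length = reader.ReadInt32();
                result[name] = reader.ReadBytes(length);
            }
        }
        return result;
    }
}
```

For anything that other tools must open, the tar-then-gzip route or a real zip library suggested in the other answers remains the better fit.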
You can zip the directory in pure .NET 3.0. Using SharpZipLib may not be desirable due to the modified GPL license.
First, you will need a reference to WindowsBase.dll.
This code will open or create a zip file, create a directory inside, and place the file in that directory. If you want to zip a folder, possibly containing sub-directories, you could loop through the files in the directory and call this method for each file. Then, you could depth-first search the sub-directories for files, call the method for each of those and pass in the path to create that hierarchy within the zip file.
public void AddFileToZip(string zipFilename, string fileToAdd, string destDir)
{
    using (Package zip = System.IO.Packaging.Package.Open(zipFilename, FileMode.OpenOrCreate))
    {
        string destFilename = "." + destDir + "\\" + Path.GetFileName(fileToAdd);
        Uri uri = PackUriHelper.CreatePartUri(new Uri(destFilename, UriKind.Relative));
        if (zip.PartExists(uri))
        {
            zip.DeletePart(uri);
        }
        PackagePart part = zip.CreatePart(uri, "", CompressionOption.Normal);
        using (FileStream fileStream = new FileStream(fileToAdd, FileMode.Open, FileAccess.Read))
        {
            using (Stream dest = part.GetStream())
            {
                CopyStream(fileStream, dest);
            }
        }
    }
}
destDir could be an empty string, which would place the file directly in the zip.
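The method above calls CopyStream, which is not shown in this answer. A minimal sketch of what such a helper might look like follows; the containing class name and the buffer size are assumptions. To be callable unqualified as in AddFileToZip, the method would need to live in the same class.

```csharp
using System.IO;

// Hypothetical stand-in for the CopyStream helper that the answer does not show.
public static class StreamHelpers
{
    // Copies everything remaining in source to destination using a fixed-size buffer.
    public static void CopyStream(Stream source, Stream destination)
    {
        var buffer = new byte[81920];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }
    }
}
```

On .NET 4.0 and later, the built-in Stream.CopyTo does the same job, so fileStream.CopyTo(dest) could replace the helper entirely.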
Sources:
https://weblogs.asp.net/jongalloway/creating-zip-archives-in-net-without-an-external-library-like-sharpziplib
https://weblogs.asp.net/albertpascual/creating-a-folder-inside-the-zip-file-with-system-io-packaging
