Zip within a zip opens to undocumented System.IO.Compression.SubReadStream - c#

I have a function I use for aggregating streams from a zip archive.
private void ExtractMiscellaneousFiles()
{
foreach (var miscellaneousFileName in _fileData.MiscellaneousFileNames)
{
var fileEntry = _archive.GetEntry(miscellaneousFileName);
if (fileEntry == null)
{
throw new ZipArchiveMissingFileException("Couldn't find " + miscellaneousFileName);
}
var stream = fileEntry.Open();
OtherFileStreams.Add(miscellaneousFileName, (DeflateStream) stream);
}
}
This works well in most cases. However, if I have a zip within a zip, I get an exception on casting the stream to a DeflateStream:
System.InvalidCastException: Unable to cast object of type 'System.IO.Compression.SubReadStream' to type 'System.IO.Compression.DeflateStream'.
I am unable to find Microsoft documentation for a SubReadStream. I would like my zip within a zip as a DeflateStream. Is this possible? If so how?
UPDATE
Still no success. I attempted @Sunshine's suggestion of copying the stream using the following code:
private void ExtractMiscellaneousFiles()
{
_logger.Log("Extracting misc files...");
foreach (var miscellaneousFileName in _fileData.MiscellaneousFileNames)
{
_logger.Log($"Opening misc file stream for {miscellaneousFileName}");
var fileEntry = _archive.GetEntry(miscellaneousFileName);
if (fileEntry == null)
{
throw new ZipArchiveMissingFileException("Couldn't find " + miscellaneousFileName);
}
var openStream = fileEntry.Open();
var deflateStream = openStream;
if (!(deflateStream is DeflateStream))
{
var memoryStream = new MemoryStream();
deflateStream.CopyTo(memoryStream);
memoryStream.Position = 0;
deflateStream = new DeflateStream(memoryStream, CompressionLevel.NoCompression, true);
}
OtherFileStreams.Add(miscellaneousFileName, (DeflateStream)deflateStream);
}
}
But I get a
System.NotSupportedException: Stream does not support reading.
I inspected deflateStream.CanRead and it is true.
I've discovered this happens not just on zips, but on files that are in the zip but are not compressed (because too small, for example). Surely there's a way to deal with this; surely someone has encountered this before. I'm opening a bounty on this question.
Here's the .NET source for SubReadStream, thanks to @Quantic.

The return type of ZipArchiveEntry.Open() is Stream. An abstract type, in practice it can be a DeflateStream (you'd be happy), a SubReadStream (boo) or a WrappedStream (boo). Woe be you if they decide to improve the class some day and use a ZopfliStream (boo). The workaround is not good, you are trying to deflate data that is not compressed (boo).
Too many boos.
Only good solution is to change the type of your OtherFileStreams member. We can't see it, smells like a List<DeflateStream>. It needs to be a List<Stream>.
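A minimal sketch of that suggestion. It assumes the collection is a dictionary keyed by entry name, since the question's Add call takes a name and a stream; the EntryStreams class and OpenAll method are made-up names for illustration, and FileNotFoundException stands in for the question's own exception type.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class EntryStreams
{
    // Opens each named entry and stores it as a plain Stream, whatever
    // concrete type ZipArchiveEntry.Open() happens to return.
    public static Dictionary<string, Stream> OpenAll(ZipArchive archive, IEnumerable<string> names)
    {
        var streams = new Dictionary<string, Stream>();
        foreach (var name in names)
        {
            var entry = archive.GetEntry(name);
            if (entry == null)
                throw new FileNotFoundException("Couldn't find " + name);
            streams.Add(name, entry.Open()); // no cast needed
        }
        return streams;
    }
}
```

Callers then read from each value through the Stream API only, so compressed and stored entries are handled the same way.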

So it looks like when a zip file is stored inside another zip, the inner zip is not deflated; its bytes are stored in the outer archive as-is. Which makes sense, because applying compression to something that is already compressed is a waste of time.
This zip file is marked as CompressionMethodValues.Stored in the archive, which causes .NET to just return the original stream it read instead of wrapping it in a DeflateStream.
Source here: https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs#L670
You could pass the stream into a ZipArchive, if it's not a DeflateStream (if you are interested in the file inside)
var stream = entry.Open();
if (!(stream is DeflateStream))
{
var subArchive = new ZipArchive(stream);
}
Or you can copy the stream to a FileStream (if you want to save it to disk)
var stream = entry.Open();
if (!(stream is DeflateStream))
{
var fs = File.Create(Path.GetTempFileName());
stream.CopyTo(fs);
fs.Close();
}
Or copy to any stream you are interested in using.
Note: This is also how .NET 4.6 behaves

Related

Zipping Two File with Same Content and Encoding them to base64 giving different response

I need to encode the zip file in base64 formats.
I followed the following approach
string text = File.ReadAllText("../../../SampleDat.dat");
// stringbyte holds the file's bytes; its declaration was not included in the post
byte[] compress0 = Compress(stringbyte);
string short_com0 = base64_encode(compress0);
public static byte[] Compress(byte[] data)
{
using (var compressedStream = new MemoryStream())
using (var zipStream = new GZipStream(compressedStream, CompressionMode.Compress))
{
zipStream.Write(data, 0, data.Length);
zipStream.Close();
return compressedStream.ToArray();
}
}
public string base64_encode(byte[] data)
{
if (data == null)
throw new ArgumentNullException("data");
return Convert.ToBase64String(data);
}
After using this I got this encoded string.
H4sIAAAAAAAEAJVQTU/CQBS8m/gfejHRgxQpoJJ4qGXBKlBsq6KXph8P2NjdrbuLleT9eBe/QvSgHt7hTWYmMzMmsdt3Yxe9lBe0SDVcisytqpLmqaaCkxctU5/PBQ5GZNabkjAxFwWThPhxQgYDNJd4bkyGQXifeEGfYKoUKMWA60nKYP+n5mwCTKyksjxJNUiaHmxpolzIf4tuZPk3iWcaLoRce6IAJPP5iHLwC5wC3ZSU7K30JwmjVcaoUgYynOGN38fI+OUQrZUGZrDtN6g5SAzhaUUV3dhMViwzyNey7//uzpiEQ/L74N/D46agaYZuwSinyvA0fQbLNQGVTrm2Di3CtVxbI3iGEjttXGpdqZ5t13XdyD9szLxVIxfMXlIJCkrItS2hElIrm/ICXuzH6V7rfL4oTx+CIMtY/+7aiaNZq7ZFnLfDinavZsFtBvfNpZ9HZIH4MyriUctpd7rHJ6dNvPDGDX88HaFz3MGO02w6r7wgTAN2AgAA
When I created zip manually and read file in the code and compress that file
//file zipped manually
string filePath1 = "../../../git_only/oraclehcm1/dbscripts/SampleDat.zip";
byte[] physicalfile1 = File.ReadAllBytes(filePath1);
string long_com1 = base64_encode(physicalfile1);
The response I get is
UEsDBBQAAAAIAECDYlK8IEwDbAEAAHYCAAANAAAAU2FtcGxlRGF0LmRhdJVQTU/CQBS8m/gfejHRgxQpqJB4qGXBKlBsq6KXph8P2NjdrbuLleT9eBc/4tdBPbzDvMxMZmZMYrfvxi56KS9okWo4F5lbVSXNU00FJ09apj6fCxyMyKw3JWFiLgomCfHjhAwGaC7x3JgMg/A28YI+wVQpUIoB15OUwe5PzckEmFhJZXmSapA03fukiXIh/y26kuXfJJ5puBBy7YkCkMznI8rBL3AKdFNSspfS7ySMVhmjSpmX4Qyv/D5Gxi+HaK00ML/4AoOag8QQHlZU0Y3NZMUykB/LvuLtrTEJh+T3wb+Hx01B0wzdglFOleFp+giWawIqnXJt7VuEa7m2RvAIJXbauNS6Uj3bruu6kb/ZmHmrRi6YvaQSFJSQa1tCJaRWNuUFPNn3053W6XxRdu+CIMtY/+bSiaNZq7ZFnLfDih5ezILrDG6bSz+PyALxZ1TEg5bT7hweHXebeOaNG/54OkLnqIMdp9l0ngFQSwECHwAUAAAACABAg2JSvCBMA2wBAAB2AgAADQAkAAAAAAAAACAAAAAAAAAAU2FtcGxlRGF0LmRhdAoAIAAAAAAAAQAYAEMpLaJSD9cBq6mosXsP1wFNJS5xSw7XAVBLBQYAAAAAAQABAF8AAACXAQAAAAA=
This is the actual response. I also noticed that the two zips are of different sizes, and that the files in the zip I created programmatically have no extensions.
Please help me create the second encoding programmatically. The .NET version I am using is 4.5,
and I cannot use the Zip.createDirectory() method due to project dependencies.
Any help is appreciated.
Thanks in advance!
The first one is a gzip file, the second one is a zip file. If you want to make a zip file, try the ZipFile class as opposed to the GZipStream class.
I wouldn't expect two different Zip algorithms/libraries to yield the same output. For one, in the programmatic way, the file metadata (name, modification date, attributes) are not set, while the command line version will include all that information for unzipping purposes.
Plus libraries update at different cadence than standalones, and you might not have the fixes synchronized to reliably match the outputs.
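For illustration, here is a sketch of producing a real zip container in memory with ZipArchive (available from .NET 4.5 in the System.IO.Compression assembly) rather than GZipStream. The ZipBase64 class and ZipToBase64 method are hypothetical names. The bytes will still not match a manually created zip exactly, because timestamps and other metadata differ.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipBase64
{
    // Wraps data in a one-entry zip archive held in memory and returns it as Base64.
    public static string ZipToBase64(string entryName, byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true so the MemoryStream survives disposing the archive;
            // disposing the archive is what writes the zip central directory.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                var entry = archive.CreateEntry(entryName, CompressionLevel.Optimal);
                using (var entryStream = entry.Open())
                {
                    entryStream.Write(data, 0, data.Length);
                }
            }
            return Convert.ToBase64String(buffer.ToArray());
        }
    }
}
```

A call such as ZipBase64.ZipToBase64("SampleDat.dat", File.ReadAllBytes(path)) yields a string beginning with "UEsD", the Base64 form of the zip signature that also opens the second string in the question.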

Extract tgz file in memory and access files in C#

I have a service that downloads a *.tgz file from a remote endpoint. I use SharpZipLib to extract and write the content of that compressed archive to disk. But now I want to prevent writing the files to disk (because that process doesn't have write permissions on that disk) and keep them in memory.
How can I access the decompressed files from memory? (Let's assume the archive holds simple text files)
Here is what I have so far:
public void Decompress(byte[] byteArray)
{
Stream inStream = new MemoryStream(byteArray);
Stream gzipStream = new GZipInputStream(inStream);
TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream);
tarArchive.ExtractContents(@".");
tarArchive.Close();
gzipStream.Close();
inStream.Close();
}
Check this and this out.
Turns out, ExtractContents() works by iterating over TarInputStream. When you create your TarArchive like this:
TarArchive.CreateInputTarArchive(gzipStream);
it actually wraps the stream you're passing into a TarInputStream. Thus, if you want more fine-grained control over how you extract files, you must use TarInputStream directly.
See if you can iterate over files, directories and actual file contents like this:
Stream inStream = new MemoryStream(byteArray);
Stream gzipStream = new GZipInputStream(inStream);
using (var tarInputStream = new TarInputStream(gzipStream))
{
TarEntry entry;
while ((entry = tarInputStream.GetNextEntry()) != null)
{
var fileName = entry.Name;
using (var fileContents = new MemoryStream())
{
tarInputStream.CopyEntryContents(fileContents);
// use entry, fileName or fileContents here
}
}
}

HttpPostedFileBase Stream Upload to AWS SDK Bucket has no Data

I'm testing how to upload to AWS using SDK with a sample .txt file from a web app. The file uploads to the Bucket, but the downloaded file from the bucket is just an empty Notepad document without the text from the original uploaded file. I'm new to working with streams, so I'm not sure what could be wrong here. Does anyone see why the data wouldn't be sent in the transfer request? Thanks in advance!
using (var client = new AmazonS3Client(Amazon.RegionEndpoint.USWest1))
{
//Save File to Bucket
using (FileStream txtFileStream = (FileStream)UploadedHttpFileBase.InputStream)
{
try
{
TransferUtility fileTransferUtility = new TransferUtility();
fileTransferUtility.Upload(txtFileStream, bucketLocation,
UploadedHttpFileBase.FileName);
}
catch (Exception e)
{
e.Message.ToString();
}
}
}
EDIT:
Both TransferUtility and PutObjectRequest/PutObjectResponse/AmazonS3Client.PutObject saved a blank text file. Then, after having some trouble instantiating a new FileStream, a MemoryStream used after resetting the starting position to zero still saved a blank text file. Any ideas?
New Code:
using (var client = new AmazonS3Client(Amazon.RegionEndpoint.USWest1))
{
Stream saveableStream = new MemoryStream();
using (Stream source = (Stream)UploadedHttpFileBase.InputStream)
{
source.Position = 0;
source.CopyTo(saveableStream);
}
//Save File to Bucket
try
{
PutObjectRequest request = new PutObjectRequest
{
BucketName = bucketLocation,
Key = UploadedHttpFileBase.FileName,
InputStream = saveableStream
};
PutObjectResponse response = client.PutObject(request);
}
catch (Exception e)
{
e.Message.ToString();
}
}
Most probably TransferUtility doesn't work well with temporary upload files. Try to copy your input stream somewhere (e.g. into another not-so-temporary file, or even a MemoryStream if you're sure it won't give you an OutOfMemoryException at some point). Another option is to get rid of TransferUtility and use the low-level AmazonS3Client.PutObject, with which you get finer control over the Stream lifetime (do not forget that you'll need to implement some retrying, as the S3 API is prone to returning random temporary errors).
The answer had something to do with nesting, which is still a little beyond my understanding, and not because the code posted here was inherently wrong. This code came after an initial StreamReader which checked the first line of the text file to determine whether or not to save the file. After moving the code out from the while loop doing the ReadLines, the upload worked. Everything works as it's supposed to now that the validation is reorganized so that there's no need for the nested Stream or MemoryStream.
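One plausible reading of that fix, offered as an assumption because the validation code is not shown: a StreamReader that reads the first line buffers ahead and leaves the underlying stream positioned past the data (for a small file, at the end), so a later copy or upload from the same stream sees nothing. The sketch below uses a MemoryStream in place of the uploaded file's InputStream; StreamRewindDemo and BytesCopiedAfterPeek are made-up names.

```csharp
using System.IO;
using System.Text;

public static class StreamRewindDemo
{
    // Reads the first line for validation, then returns how many bytes a
    // subsequent copy of the same stream would see.
    public static long BytesCopiedAfterPeek(Stream source, bool rewind)
    {
        // leaveOpen: true keeps the reader from closing the source stream
        using (var reader = new StreamReader(source, Encoding.UTF8, true, 1024, true))
        {
            reader.ReadLine(); // buffers ahead, advancing source.Position
        }
        if (rewind)
            source.Position = 0; // requires a seekable stream
        using (var copy = new MemoryStream())
        {
            source.CopyTo(copy);
            return copy.Length;
        }
    }
}
```

The same applies on the destination side: a MemoryStream that has just been filled by CopyTo has its Position at the end, so it also needs Position = 0 before being handed to anything that reads from the current position.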

Can I get a GZipStream for a file without writing to intermediate temporary storage?

Can I get a GZipStream for a file on disk without writing the entire compressed content to temporary storage? I'm currently using a temporary file on disk in order to avoid possible memory exhaustion using MemoryStream on very large files (this is working fine).
public void UploadFile(string filename)
{
using (var temporaryFileStream = File.Open("tempfile.tmp", FileMode.CreateNew, FileAccess.ReadWrite))
{
using (var fileStream = File.OpenRead(filename))
using (var compressedStream = new GZipStream(temporaryFileStream, CompressionMode.Compress, true))
{
fileStream.CopyTo(compressedStream);
}
temporaryFileStream.Position = 0;
Uploader.Upload(temporaryFileStream);
}
}
What I'd like to do is eliminate the temporary storage by creating GZipStream, and have it read from the original file only as the Uploader class requests bytes from it. Is such a thing possible? How might such an implementation be structured?
Note that Upload is a static method with signature static void Upload(Stream stream).
Edit: The full code is here if it's useful. I hope I've included all the relevant context in my sample above however.
Yes, this is possible, but not easily with any of the standard .NET stream classes. When I needed to do something like this, I created a new type of stream.
It's basically a circular buffer that allows one producer (writer) and one consumer (reader). It's pretty easy to use. Let me whip up an example. In the meantime, you can adapt the example in the article.
Later: Here's an example that should come close to what you're asking for.
using (var pcStream = new ProducerConsumerStream(BufferSize))
{
// start upload in a thread
// ParameterizedThreadStart: pass the method itself, and the stream via Start
var uploadThread = new Thread(UploadThreadProc);
uploadThread.Start(pcStream);
// Open the input file and attach the gzip stream to the pcStream
using (var inputFile = File.OpenRead("inputFilename"))
{
// create gzip stream
using (var gz = new GZipStream(pcStream, CompressionMode.Compress, true))
{
var bytesRead = 0;
var buff = new byte[65536]; // 64K buffer
while ((bytesRead = inputFile.Read(buff, 0, buff.Length)) != 0)
{
gz.Write(buff, 0, bytesRead);
}
}
}
// The entire file has been compressed and copied to the buffer.
// Mark the stream as "input complete".
pcStream.CompleteAdding();
// wait for the upload thread to complete.
uploadThread.Join();
// It's very important that you don't close the pcStream before
// the uploader is done!
}
The upload thread should be pretty simple:
void UploadThreadProc(object state)
{
var pcStream = (ProducerConsumerStream)state;
Uploader.Upload(pcStream);
}
You could, of course, put the producer on a background thread and have the upload be done on the main thread. Or have them both on background threads. I'm not familiar with the semantics of your uploader, so I'll leave that decision to you.
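The ProducerConsumerStream class itself is not shown in this post. As a rough idea of what such a type involves, here is a hypothetical minimal stand-in built on BlockingCollection; the name ChunkPipeStream and its design (queueing whole chunks rather than using a circular byte buffer) are assumptions for illustration, not the answer's actual implementation.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical stand-in: one writer thread, one reader thread, bounded so
// the writer blocks when the reader falls behind.
public sealed class ChunkPipeStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks;
    private byte[] _current; // chunk currently being drained by Read
    private int _offset;     // next unread index in _current

    public ChunkPipeStream(int maxBufferedChunks)
    {
        _chunks = new BlockingCollection<byte[]>(maxBufferedChunks);
    }

    // Called by the producer once everything has been written.
    public void CompleteAdding() { _chunks.CompleteAdding(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return;
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        _chunks.Add(copy); // blocks while the queue is full
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (_current == null || _offset == _current.Length)
        {
            // Blocks until a chunk arrives; false means completed and drained.
            if (!_chunks.TryTake(out _current, Timeout.Infinite)) return 0;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

The bound passed to the constructor caps memory use at roughly that many chunks of whatever size the writer emits, which is the point of avoiding a full MemoryStream.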

Decompress tar files using C#

I'm searching for a way to add embedded resources to my solution. These resources will be folders with a lot of files in them. They need to be decompressed on user demand.
I'm searching for a way to store such folders in the executable without involving third-party libraries (it looks rather stupid, but this is the task).
I have found that I can GZip and UnGZip them using standard libraries. But GZip handles a single file only. In such cases TAR should come onto the scene. But I haven't found a TAR implementation among the standard classes.
Is it possible to decompress TAR with bare C#?
While looking for a quick answer to the same question, I came across this thread, and was not entirely satisfied with the current answers, as they all point to using third-party dependencies to much larger libraries, all just to achieve simple extraction of a tar.gz file to disk.
While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it just takes a bunch of files, prepends a 500 byte header (but takes 512 bytes) to each describing the file, and writes them all to single archive on a 512 byte alignment. There is no compression, that is typically handled by compressing the created file to a gz archive, which .NET conveniently has built-in, which takes care of all the hard part.
Having looked at the spec for the tar format, there are only really 2 values (especially on Windows) we need to pick out from the header in order to extract the file from a stream. The first is the name, and the second is size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file.
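To make the header layout concrete, here is a small sketch that pulls those two values out of one 512-byte header block. The field positions (name in bytes 0-99, size as octal text in bytes 124-135) follow the tar header layout; the TarHeader class and its method names are hypothetical.

```csharp
using System;
using System.Text;

public static class TarHeader
{
    public const int BlockSize = 512;

    // Entry name: bytes 0-99, NUL-padded ASCII.
    public static string ReadName(byte[] header)
    {
        return Encoding.ASCII.GetString(header, 0, 100).Trim('\0');
    }

    // Entry size: bytes 124-135, octal digits padded with NULs or spaces.
    public static long ReadSize(byte[] header)
    {
        var text = Encoding.ASCII.GetString(header, 124, 12).Trim('\0', ' ');
        return text.Length == 0 ? 0 : Convert.ToInt64(text, 8);
    }

    // File data is padded up to the next multiple of 512 bytes.
    public static long PaddedSize(long size)
    {
        return (size + BlockSize - 1) / BlockSize * BlockSize;
    }
}
```

An all-zero block gives an empty name, which is how the end of the archive is recognised.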
I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening from a stream or filename, and decompressing the gz file first using built-in functions.
The primary method is this:
public static void ExtractTar(Stream stream, string outputDir)
{
var buffer = new byte[100];
while (true)
{
stream.Read(buffer, 0, 100);
var name = Encoding.ASCII.GetString(buffer).Trim('\0');
if (String.IsNullOrWhiteSpace(name))
break;
stream.Seek(24, SeekOrigin.Current);
stream.Read(buffer, 0, 12);
var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim('\0', ' '), 8);
stream.Seek(376L, SeekOrigin.Current);
var output = Path.Combine(outputDir, name);
if (!Directory.Exists(Path.GetDirectoryName(output)))
Directory.CreateDirectory(Path.GetDirectoryName(output));
using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write))
{
var buf = new byte[size];
stream.Read(buf, 0, buf.Length);
str.Write(buf, 0, buf.Length);
}
var pos = stream.Position;
var offset = 512 - (pos % 512);
if (offset == 512)
offset = 0;
stream.Seek(offset, SeekOrigin.Current);
}
}
And here are a few helper functions for opening from a file, and for decompressing a tar.gz file/stream before extracting.
public static void ExtractTarGz(string filename, string outputDir)
{
using (var stream = File.OpenRead(filename))
ExtractTarGz(stream, outputDir);
}
public static void ExtractTarGz(Stream stream, string outputDir)
{
// A GZipStream is not seekable, so copy it first to a MemoryStream
using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
using (var memStr = new MemoryStream())
{
// CopyTo reads until end of stream; a short read does not mean the data has ended
gzip.CopyTo(memStr);
memStr.Seek(0, SeekOrigin.Begin);
ExtractTar(memStr, outputDir);
}
}
public static void ExtractTar(string filename, string outputDir)
{
using (var stream = File.OpenRead(filename))
ExtractTar(stream, outputDir);
}
Here is a gist of the full file with some comments.
Tar-cs will do the job, but it is quite slow. I would recommend using SharpCompress which is significantly quicker. It also supports other compression types and it has been updated recently.
using System;
using System.IO;
using SharpCompress.Common;
using SharpCompress.Reader;
private static String directoryPath = @"C:\Temp";
public static void unTAR(String tarFilePath)
{
using (Stream stream = File.OpenRead(tarFilePath))
{
var reader = ReaderFactory.Open(stream);
while (reader.MoveToNextEntry())
{
if (!reader.Entry.IsDirectory)
{
ExtractionOptions opt = new ExtractionOptions {
ExtractFullPath = true,
Overwrite = true
};
reader.WriteEntryToDirectory(directoryPath, opt);
}
}
}
}
See tar-cs
using (FileStream unarchFile = File.OpenRead(tarfile))
{
TarReader reader = new TarReader(unarchFile);
reader.ReadToEnd("out_dir");
}
Since you are not allowed to use outside libraries, you are not restricted to a specific format of the tar file either. In fact, the data doesn't even all need to be in the same file.
You can write your own tar-like utility in C# that walks a directory tree and produces two files: a "header" file that consists of a serialized dictionary mapping relative path strings to offset/length pairs, and a big file containing the content of the individual files concatenated into one giant blob. This is not a trivial task, but it's not overly complicated either.
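A rough sketch of the core of that idea, working on in-memory data to stay short. Walking the directory tree and serializing the index to the "header" file are left out; BlobPack, Pack and Unpack are hypothetical names, and the blob stream is assumed to be seekable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlobPack
{
    // Appends each file's bytes to blob and records where it landed
    // as an (offset, length) pair keyed by relative path.
    public static Dictionary<string, Tuple<long, int>> Pack(
        IDictionary<string, byte[]> files, Stream blob)
    {
        var index = new Dictionary<string, Tuple<long, int>>();
        foreach (var file in files)
        {
            index[file.Key] = Tuple.Create(blob.Position, file.Value.Length);
            blob.Write(file.Value, 0, file.Value.Length);
        }
        return index;
    }

    // Reads one file back using its recorded offset and length.
    public static byte[] Unpack(Stream blob, Tuple<long, int> location)
    {
        var data = new byte[location.Item2];
        blob.Position = location.Item1;
        int read = 0;
        while (read < data.Length)
        {
            // Read may return fewer bytes than requested, so loop until full
            int n = blob.Read(data, read, data.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return data;
    }
}
```

Wrapping the blob in a GZipStream when writing it out would add the compression step, at the cost of needing to decompress it to a seekable stream before calling Unpack.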
There are two ways to compress and decompress with the built-in classes. The first is GZipStream and DeflateStream, both available since .NET 2.0. GZipStream writes the .gz format, so a file compressed with it can be opened with popular tools such as WinZip, WinRAR or 7-Zip; DeflateStream writes raw deflate data with no container header, so those tools cannot open its output.
The second is the Package class. It uses the same compression, the only difference being that it can hold multiple files, and the result can be opened with WinZip, WinRAR or 7-Zip. It is not a generic .zip file, though:
it is the container Microsoft uses for Office files with *x extensions. If you open any .docx file with the Package class you can see everything stored in it. So with these classes you cannot create or read a generic zip archive, and you have to consider a third-party library such as
http://www.icsharpcode.net/OpenSource/SharpZipLib/
or implement everything from the ground up.
