Why is compressed then uncompressed stream of different length - c#

I'm using the SevenZipSharp library to compress and then uncompress a MemoryStream which contains a simple serialized object. However, the compressed and decompressed streams are of different length.
From the code run below I get
Input length: 174
Output length: 338
(the SevenZipSharp dll is included as a reference and the 7z.dll is included in the project output)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
namespace DataTransmission {
class Program {
static void Main(string[] args)
{
SevenZip.SevenZipCompressor compressor = new SevenZip.SevenZipCompressor();
//compressor.CompressionMethod = SevenZip.CompressionMethod.Lzma2;
//compressor.CompressionLevel = SevenZip.CompressionLevel.Normal;
MemoryStream inputStream = new MemoryStream();
Person me = new Person("John", "Smith");
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(inputStream, me);
Int32 inputStreamLength = (Int32)inputStream.Length;
MemoryStream outputStream = new MemoryStream();
compressor.CompressStream(inputStream, outputStream);
SevenZip.SevenZipExtractor decompressor = new SevenZip.SevenZipExtractor(outputStream);
decompressor.ExtractFile(0, outputStream);
Int32 outputStreamLength = (Int32)outputStream.Length;
Console.WriteLine("Input length: {0}", inputStreamLength);
Console.WriteLine("Output length: {0}", outputStreamLength);
Console.ReadLine();
}
}
[Serializable]
public class Person {
public string firstName;
public string lastName;
public Person(string fname, string lname) {
firstName = fname;
lastName = lname;
}
}
}
Can anyone help me with why this may be?
Thanks,

You've decompressed into outputStream despite that already containing data. You should use a new MemoryStream for the output.
(In fact, it's very odd because the decompressor is reading from outputStream and also writing to outputStream. Bad idea. Use two different streams.)
You should also rewind each stream after you've written to it and before something else wants to read it, e.g. with
inputStream.Position = 0;
It's possible that SevenZipSharp is doing that for you in this case, but in general if you want something to act from the start of the stream, you should reset it appropriately.
I've just made the following change to your code, at which point I get the same length for input and output:
MemoryStream targetStream = new MemoryStream();
decompressor.ExtractFile(0, targetStream);
Int32 outputStreamLength = (Int32)targetStream.Length;
As I say, you should make the appropriate other changes too.
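To illustrate the two points above (a separate target stream, and rewinding before each read), here is a minimal sketch. It deliberately swaps in GZipStream from System.IO.Compression for SevenZipSharp so that it runs without any extra DLL; the class and method names are made up for the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class RoundTripSketch
{
    // Compresses 'input' into a fresh MemoryStream and returns it rewound.
    public static MemoryStream Compress(Stream input)
    {
        var compressed = new MemoryStream();
        // leaveOpen: true keeps 'compressed' usable after the GZipStream is disposed.
        using (var gzip = new GZipStream(compressed, CompressionMode.Compress, true))
        {
            input.CopyTo(gzip);
        }
        compressed.Position = 0; // rewind so the next reader starts at the beginning
        return compressed;
    }

    // Decompresses into a new stream; the source is never reused as the target.
    public static MemoryStream Decompress(Stream compressed)
    {
        var target = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, true))
        {
            gzip.CopyTo(target);
        }
        target.Position = 0;
        return target;
    }

    public static void Main()
    {
        var inputStream = new MemoryStream();
        byte[] payload = Encoding.UTF8.GetBytes("John Smith");
        inputStream.Write(payload, 0, payload.Length);
        inputStream.Position = 0; // rewind after writing, before compressing

        MemoryStream compressed = Compress(inputStream);
        MemoryStream output = Decompress(compressed);

        // Both lengths match because input, compressed and output are distinct streams.
        Console.WriteLine("Input length: {0}", inputStream.Length);
        Console.WriteLine("Output length: {0}", output.Length);
    }
}
```

The same shape should carry over to SevenZipSharp: one stream per role, each rewound before it is handed to the next reader.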

However, the compressed and decompressed streams are of different length
That is the whole purpose of compression ...
Look at this piece of the code:
SevenZip.SevenZipExtractor decompressor =
new SevenZip.SevenZipExtractor(outputStream);
decompressor.ExtractFile(0, outputStream);
You are decompressing from outputStream to outputStream. It will probably fail with larger data. Make changes so that it reads
SevenZip.SevenZipExtractor decompressor =
new SevenZip.SevenZipExtractor(compressedStream);
decompressor.ExtractFile(0, outputStream);

Related

Decompress Stream to String using SevenZipSharp

I'd like to compress a string using SevenZipSharp and have cobbled together a C# console application (I'm new to C#) using the following code, (bits and pieces of which came from similar questions here on SO).
The compress part seems to work (albeit I'm passing in a file instead of a string), output of the compressed string to the console looks like gibberish but I'm stuck on the decompress...
I'm trying to do the same thing as here (I think):
https://stackoverflow.com/a/4305399/3451115
https://stackoverflow.com/a/45861659/3451115
https://stackoverflow.com/a/36331690/3451115
Appreciate any help, ideally the console will display the compressed string followed by the decompressed string.
Thanks :)
using System;
using System.IO;
using SevenZip;
namespace _7ZipWrapper
{
public class Program
{
public static void Main()
{
SevenZipCompressor.SetLibraryPath(@"C:\Temp\7za64.dll");
SevenZipCompressor compressor = new SevenZipCompressor();
compressor.CompressionMethod = CompressionMethod.Ppmd;
compressor.CompressionLevel = SevenZip.CompressionLevel.Ultra;
compressor.ScanOnlyWritable = true;
var compStream = new MemoryStream();
var decompStream = new MemoryStream();
compressor.CompressFiles(compStream, @"C:\Temp\a.txt");
StreamReader readerC = new StreamReader(compStream);
Console.WriteLine(readerC.ReadToEnd());
Console.ReadKey();
// works up to here... below here output to console is: ""
SevenZipExtractor extractor = new SevenZip.SevenZipExtractor(compStream);
extractor.ExtractFile(0, decompStream);
StreamReader readerD = new StreamReader(decompStream);
Console.WriteLine(readerD.ReadToEnd());
Console.ReadKey();
}
}
}
The result of compression is binary data - it isn't a string. If you try to read it as a string, you'll just see garbage. That's to be expected - you shouldn't be treating it as a string.
The next problem is that you're trying to read from compStream twice, without "rewinding" it first. You're starting from the end of the stream, which means there's no data for it to decompress. If you just add:
compStream.Position = 0;
before you create the extractor, you may well find it works immediately. You may also need to rewind the decompStream before reading from it. So you'd have code like this:
// Rewind to the start of the stream before decompressing
compStream.Position = 0;
SevenZipExtractor extractor = new SevenZip.SevenZipExtractor(compStream);
extractor.ExtractFile(0, decompStream);
// Rewind to the start of the decompressed stream before reading
decompStream.Position = 0;
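The effect of a missing rewind is easy to reproduce without SevenZipSharp. A minimal sketch using only a MemoryStream (the class and method names are made up for the example):

```csharp
using System;
using System.IO;
using System.Text;

public static class RewindSketch
{
    // Writes 'text' to a MemoryStream, then reads it back twice:
    // first without rewinding, then after setting Position = 0.
    public static string[] ReadWithAndWithoutRewind(string text)
    {
        var stream = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);

        // Position is now at the end of the data, so nothing is left to read.
        string withoutRewind = new StreamReader(stream).ReadToEnd();

        stream.Position = 0; // rewind
        string withRewind = new StreamReader(stream).ReadToEnd();

        return new[] { withoutRewind, withRewind };
    }

    public static void Main()
    {
        string[] results = ReadWithAndWithoutRewind("hello");
        Console.WriteLine("Without rewind: \"{0}\"", results[0]); // prints an empty string
        Console.WriteLine("With rewind: \"{0}\"", results[1]);    // prints hello
    }
}
```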

Inconsistent file size change while writing bytes from stream to a file

I have a file of size 10124, and I am adding a byte array of length 4 at the beginning of the file.
After that the file size should become 10128, but when I write it to the file, the size drops to 22 bytes. I don't know where the problem is.
public void AppendAllBytes(string path, byte[] bytes)
{
var encryptedFile = new FileStream(path, FileMode.Open, FileAccess.Read);
////argument-checking here.
Stream header = new MemoryStream(bytes);
var result = new MemoryStream();
header.CopyTo(result);
encryptedFile.CopyTo(result);
using (var writer = new StreamWriter(@"C:\\Users\\life.monkey\\Desktop\\B\\New folder (2)\\aaaaaaaaaaaaaaaaaaaaaaaaaaa.docx.aef"))
{
writer.Write(result);
}
}
How can I write bytes to the file?
The issue seems to be caused by:
using a StreamWriter to write binary data. The name does not intuitively suggest this, but the StreamWriter class is meant for writing text. Its Write(object) overload writes the object's ToString() result, which here is the 22-character type name "System.IO.MemoryStream"; that is where the 22 bytes come from.
passing an entire stream instead of the actual binary data. To obtain the bytes stored in a MemoryStream, use its ToArray() method.
I suggest the following code:
public void AppendAllBytes(string path, byte[] bytes)
{
var fileName = @"C:\\Users\\life.monkey\\Desktop\\B\\New folder (2)\\aaaaaaaaaaaaaaaaaaaaaaaaaaa.docx.aef";
using (var encryptedFile = new FileStream(path, FileMode.Open, FileAccess.Read))
using (var writer = new BinaryWriter(File.Open(fileName, FileMode.Append)))
using (var result = new MemoryStream())
{
encryptedFile.CopyTo(result);
result.Flush(); // ensure the file contents are entirely copied into the memory stream.
// write header directly, no need to put it in a memory stream
writer.Write(bytes);
writer.Flush(); // ensure the header is written to the output file.
writer.Write(result.ToArray());
writer.Flush(); // ensure the encrypted file contents are written to the output file.
}
}
The code above uses the BinaryWriter class, which is better suited to binary data. It has a Write(byte[]) overload, used above to write an entire array to the file. Some may consider the explicit Flush() calls unnecessary, since disposing the writer flushes it anyway, but they ensure that everything written before each call has been pushed to the underlying stream.
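If the intermediate MemoryStream is not needed, the header and the file contents can be streamed straight into the target file. A minimal sketch, assuming the source and target are different files (the method name and parameters are made up for the example):

```csharp
using System;
using System.IO;

public static class PrependSketch
{
    // Writes 'header' followed by the contents of 'sourcePath' to 'targetPath'.
    // The two paths must differ: FileMode.Create truncates the target.
    public static void WriteWithHeader(string sourcePath, string targetPath, byte[] header)
    {
        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (var target = new FileStream(targetPath, FileMode.Create, FileAccess.Write))
        {
            target.Write(header, 0, header.Length);
            source.CopyTo(target); // copies in chunks, no full in-memory buffer
        }
    }

    public static void Main()
    {
        string source = Path.GetTempFileName();
        string target = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(source, new byte[10124]);
            WriteWithHeader(source, target, new byte[] { 1, 2, 3, 4 });
            Console.WriteLine(new FileInfo(target).Length); // 10124 + 4 = 10128
        }
        finally
        {
            File.Delete(source);
            File.Delete(target);
        }
    }
}
```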

How to compress a file before saving on the disk?

I want to compress a file before saving physically on the disk.
I tried using compress and decompress methods (MSDN sample code) but all methods require a file which is already physically stored on the disk.
The easiest way is to open the file as a Stream and wrap it with a compression API like GZipStream.
using (var fileStream = File.Open(theFilePath, FileMode.OpenOrCreate)) {
using (var stream = new GZipStream(fileStream, CompressionMode.Compress)) {
// Write to the `stream` here and the result will be compressed
}
}
Description
You can use the GZipStream class not only with a fileName. It is possible to compress a Stream.
GZipStream Class Provides methods and properties used to compress and decompress streams.
Sample
System.IO.MemoryStream ms = new System.IO.MemoryStream();
System.IO.Compression.GZipStream sw = new System.IO.Compression.GZipStream(ms,
System.IO.Compression.CompressionMode.Compress);
// now you can save the file to disc
More Information
MSDN - GZipStream Class
Can't you use the GZipStream class? It's stream based, so you shouldn't need an on-disk file to use this class.
Which kind of data are you trying to compress?
Use MemoryStream and GZipStream.
A file is just an array of bytes, so you can try the following code, based on http://www.dotnetperls.com/compress :
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
byte[] text = Encoding.ASCII.GetBytes(new string('X', 10000));
byte[] compress = Compress(text);
Console.WriteLine("Compressed");
foreach (var b in compress)
{
Console.WriteLine("{0} ", b);
}
Console.ReadKey();
}
public static byte[] Compress(byte[] raw)
{
using (var memory = new MemoryStream())
{
using (var gzip = new GZipStream(memory, CompressionMode.Compress, true))
{
gzip.Write(raw, 0, raw.Length);
}
return memory.ToArray();
}
}
}
}
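To check that nothing is lost, the compressed bytes can be saved to disk and read back through a GZipStream in Decompress mode. A minimal sketch of that round trip; the Decompress helper and the use of a temp file are additions for illustration, not part of the code above:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipRoundTrip
{
    public static byte[] Compress(byte[] raw)
    {
        using (var memory = new MemoryStream())
        {
            using (var gzip = new GZipStream(memory, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return memory.ToArray();
        }
    }

    // Hypothetical counterpart to Compress: inflates a gzip byte array.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] text = Encoding.ASCII.GetBytes(new string('X', 10000));
        string path = Path.GetTempFileName();
        try
        {
            // Only the compressed form ever touches the disk.
            File.WriteAllBytes(path, Compress(text));
            byte[] restored = Decompress(File.ReadAllBytes(path));
            Console.WriteLine("Restored length: {0}", restored.Length); // 10000
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```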

JpegBitmapEncoder.Save() throws exception when writing image with metadata to MemoryStream

I am trying to set metadata on a JPG image that does not have any. You can't use the in-place writer (InPlaceBitmapMetadataWriter) in this case, because there is no room for metadata in the image.
If I use FileStream as output - everything works fine. But if I try to use MemoryStream as output - JpegBitmapEncoder.Save() throws an exception (Exception from HRESULT: 0xC0000005).
After some investigation I also found out that the encoder can save the image to a memory stream if I supply null instead of metadata.
I've made a very simplified and short example that reproduces the problem:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Media.Imaging;
namespace JpegSaveTest
{
class Program
{
public static JpegBitmapEncoder SetUpMetadataOnStream(Stream src, string title)
{
uint padding = 2048;
BitmapDecoder original;
BitmapFrame framecopy, newframe;
BitmapMetadata metadata;
JpegBitmapEncoder output = new JpegBitmapEncoder();
src.Seek(0, SeekOrigin.Begin);
original = JpegBitmapDecoder.Create(src, BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad);
if (original.Frames[0] != null) {
framecopy = (BitmapFrame)original.Frames[0].Clone();
if (original.Frames[0].Metadata != null) metadata = original.Frames[0].Metadata.Clone() as BitmapMetadata;
else metadata = new BitmapMetadata("jpeg");
metadata.SetQuery("/app1/ifd/PaddingSchema:Padding", padding);
metadata.SetQuery("/app1/ifd/exif/PaddingSchema:Padding", padding);
metadata.SetQuery("/xmp/PaddingSchema:Padding", padding);
metadata.SetQuery("System.Title", title);
newframe = BitmapFrame.Create(framecopy, framecopy.Thumbnail, metadata, original.Frames[0].ColorContexts);
output.Frames.Add(newframe);
}
else {
Exception ex = new Exception("Image contains no frames.");
throw ex;
}
return output;
}
public static MemoryStream SetTagsInMemory(string sfname, string title)
{
Stream src, dst;
JpegBitmapEncoder output;
src = File.Open(sfname, FileMode.Open, FileAccess.Read, FileShare.Read);
output = SetUpMetadataOnStream(src, title);
dst = new MemoryStream();
output.Save(dst);
src.Close();
return (MemoryStream)dst;
}
static void Main(string[] args)
{
string filename = "Z:\\dotnet\\gnom4.jpg";
MemoryStream s;
s = SetTagsInMemory(filename, "test title");
}
}
}
It is a simple console application.
To run it, replace filename variable content with path to any .jpg file without metadata (or use mine).
Of course I can just save the image to a temporary file first, close it, then open it and copy it to a MemoryStream, but that's too dirty and slow a workaround.
Any ideas about getting this working are welcome :)
In case someone encounters the same issue, here is the solution:
If you try to .Save() jpeg from main application thread, add [STAThread] before Main().
If not, call .SetApartmentState(ApartmentState.STA) for the thread calling JpegBitmapEncoder.Save()
The WinXP and WinVista versions of windowscodecs.dll are not reentrant, so if you use the default MTA model (the default since .NET Framework 2.0) for threads calling JpegBitmapEncoder.Save(), it can behave strangely and throw the described exception.
Win7 version of windowscodecs.dll does not have this issue.
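For the second case (a worker thread rather than Main), a minimal sketch of running a piece of work on a thread that requests the STA apartment. The helper name is made up, and it uses TrySetApartmentState, the non-throwing sibling of SetApartmentState; either way the apartment state has to be set before the thread is started.

```csharp
using System;
using System.Threading;

public static class StaSketch
{
    // Runs 'work' on a dedicated thread that requests STA, and waits for it to finish.
    public static void RunOnStaThread(Action work)
    {
        Exception failure = null;
        var thread = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { failure = ex; } // capture so the caller sees the error
        });

        // Must happen before Start(); returns false if the state could not be applied.
        thread.TrySetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (failure != null)
            throw new InvalidOperationException("Work on the STA thread failed.", failure);
    }

    public static void Main()
    {
        // In the question's code this would wrap the encoder call, e.g.
        // RunOnStaThread(() => output.Save(dst));
        RunOnStaThread(() => Console.WriteLine("Running on a separate thread"));
    }
}
```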
I ran your code without modifications and it didn't throw an error.
I even tried saving the modified data to disk and the image itself was uncorrupted.
string filename = "e:\\a.jpg";
MemoryStream s;
s = SetTagsInMemory(filename, "test title");
FileStream fs = new FileStream("e:\\b.jpg", FileMode.CreateNew, FileAccess.ReadWrite);
BinaryWriter sw = new BinaryWriter(fs);
s.Seek(0, SeekOrigin.Begin);
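byte[] data = new byte[4096];
int bytesRead;
while ((bytesRead = s.Read(data, 0, data.Length)) > 0)
{
// write only the bytes actually read, so the last chunk is not padded with zeros
sw.Write(data, 0, bytesRead);
}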
sw.Flush();
sw.Close();
fs.Close();
Other than what I added below s = SetTagsInMemory(...) to write to disk, the rest of your code is unmodified.
Edit: oh and the metadata definitely ended up in the new file; the previous one didn't have any metadata from what I could see.

Create Zip archive from multiple in memory files in C#

Is there a way to create a Zip archive that contains multiple files, when the files are currently in memory? The files I want to save are really just text only and are stored in a string class in my application. But I would like to save multiple files in a single self-contained archive. They can all be in the root of the archive.
It would be nice to be able to do this using SharpZipLib.
Use ZipEntry and PutNextEntry() for this. The following shows how to do it for a file, but for an in-memory object just use a MemoryStream
FileStream fZip = File.Create(compressedOutputFile);
ZipOutputStream zipOStream = new ZipOutputStream(fZip);
foreach (FileInfo fi in allfiles)
{
ZipEntry entry = new ZipEntry((fi.Name));
zipOStream.PutNextEntry(entry);
FileStream fs = File.OpenRead(fi.FullName);
try
{
byte[] transferBuffer = new byte[1024];
int bytesRead;
do
{
bytesRead = fs.Read(transferBuffer, 0, transferBuffer.Length);
zipOStream.Write(transferBuffer, 0, bytesRead);
}
while (bytesRead > 0);
}
finally
{
fs.Close();
}
}
zipOStream.Finish();
zipOStream.Close();
Using SharpZipLib for this seems pretty complicated. This is so much easier in DotNetZip. In v1.9, the code looks like this:
using (ZipFile zip = new ZipFile())
{
zip.AddEntry("Readme.txt", stringContent1);
zip.AddEntry("readings/Data.csv", stringContent2);
zip.AddEntry("readings/Index.xml", stringContent3);
zip.Save("Archive1.zip");
}
The code above assumes stringContent{1,2,3} contains the data to be stored in the files (or entries) in the zip archive. The first entry is "Readme.txt" and it is stored in the top level "Directory" in the zip archive. The next two entries are stored in the "readings" directory in the zip archive.
The strings are encoded in the default encoding. There is an overload of AddEntry(), not shown here, that allows you to explicitly specify the encoding to use.
If you have the content in a stream or byte array, not a string, there are overloads for AddEntry() that accept those types. There are also overloads that accept a Write delegate, a method of yours that is invoked to write data into the zip. This works for easily saving a DataSet into a zip file, for example.
DotNetZip is free and open source.
This function builds the zip archive in a MemoryStream from in-memory byte arrays and then writes it to a file. For simplicity I've created a small interface to describe each file:
public interface IHasDocumentProperties
{
byte[] Content { get; set; }
string Name { get; set; }
}
public void CreateZipFileContent(string filePath, IEnumerable<IHasDocumentProperties> fileInfos)
{
using (var memoryStream = new MemoryStream())
{
using (var zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
{
foreach(var fileInfo in fileInfos)
{
var entry = zipArchive.CreateEntry(fileInfo.Name);
using (var entryStream = entry.Open())
{
entryStream.Write(fileInfo.Content, 0, fileInfo.Content.Length);
}
}
}
using (var fileStream = new FileStream(filePath, FileMode.OpenOrCreate, System.IO.FileAccess.Write))
{
memoryStream.Position = 0;
memoryStream.CopyTo(fileStream);
}
}
}
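Since the question is about strings held in memory, the same ZipArchive approach can take the strings directly and return the archive as a byte array. A minimal sketch, assuming UTF-8 text without a byte order mark; the class and method names are made up, and on .NET Framework it needs a reference to the System.IO.Compression assembly:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class InMemoryZipSketch
{
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Builds a zip archive in memory; keys are entry names, values are file contents.
    public static byte[] ZipStrings(IDictionary<string, string> files)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, true))
            {
                foreach (KeyValuePair<string, string> file in files)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                    // Disposing the writer closes the entry before the next one is created.
                    using (var writer = new StreamWriter(entry.Open(), Utf8NoBom))
                    {
                        writer.Write(file.Value);
                    }
                }
            } // disposing the archive finishes writing the zip structure
            return memoryStream.ToArray();
        }
    }

    // Reads one entry back as text, to verify the archive.
    public static string ReadEntry(byte[] zipBytes, string name)
    {
        using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        using (var reader = new StreamReader(archive.GetEntry(name).Open(), Utf8NoBom))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        var files = new Dictionary<string, string>
        {
            { "Readme.txt", "first file" },
            { "Data.csv", "a,b,c" }
        };
        byte[] zip = ZipStrings(files);
        Console.WriteLine(ReadEntry(zip, "Data.csv")); // a,b,c
    }
}
```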
Yes, you can use SharpZipLib to do this - when you need to supply a stream to write to, use a MemoryStream.
I came across this problem too; using the MSDN example, I created this class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO.Packaging;
using System.IO;
public class ZipSticle
{
Package package;
public ZipSticle(Stream s)
{
package = ZipPackage.Open(s, FileMode.Create);
}
public void Add(Stream stream, string Name)
{
Uri partUriDocument = PackUriHelper.CreatePartUri(new Uri(Name, UriKind.Relative));
PackagePart packagePartDocument = package.CreatePart(partUriDocument, "");
CopyStream(stream, packagePartDocument.GetStream());
stream.Close();
}
private static void CopyStream(Stream source, Stream target)
{
const int bufSize = 0x1000;
byte[] buf = new byte[bufSize];
int bytesRead = 0;
while ((bytesRead = source.Read(buf, 0, bufSize)) > 0)
target.Write(buf, 0, bytesRead);
}
public void Close()
{
package.Close();
}
}
You can then use it like this:
FileStream str = File.Open("MyAwesomeZip.zip", FileMode.Create);
ZipSticle zip = new ZipSticle(str);
zip.Add(File.OpenRead("C:/Users/C0BRA/SimpleFile.txt"), "Some directory/SimpleFile.txt");
zip.Add(File.OpenRead("C:/Users/C0BRA/Hurp.derp"), "hurp.Derp");
zip.Close();
str.Close();
You can pass a MemoryStream (or any Stream) to ZipSticle.Add such as:
FileStream str = File.Open("MyAwesomeZip.zip", FileMode.Create);
ZipSticle zip = new ZipSticle(str);
byte[] fileinmem = new byte[1000];
// Do stuff to fileinmem
MemoryStream memstr = new MemoryStream(fileinmem);
zip.Add(memstr, "Some directory/SimpleFile.txt");
memstr.Close();
zip.Close();
str.Close();
Note this answer is outdated; since .NET 4.5, the ZipArchive class allows zipping files in memory. See johnny 5's answer for how to use it.
You could also do it a bit differently, using a Serializable object to store all strings
[Serializable]
public class MyStrings {
public string Foo { get; set; }
public string Bar { get; set; }
}
Then, you could serialize it into a stream to save it.
To save on space you could use GZipStream (From System.IO.Compression) to compress it. (note: GZip is stream compression, not an archive of multiple files).
That is, of course if what you need is actually to save data, and not zip a few files in a specific format for other software.
Also, this would allow you to save many more types of data except strings.
I was utilizing Cheeso's answer by adding MemoryStreams as the source of the different Excel files. When I downloaded the zip, the files had nothing in them. This could be the way we were getting around trying to create and download a file over AJAX.
To get the contents of the different Excel files to be included in the Zip, I had to add each of the files as a byte[].
using (var memoryStream = new MemoryStream())
using (var zip = new ZipFile())
{
zip.AddEntry("Excel File 1.xlsx", excelFileStream1.ToArray());
zip.AddEntry("Excel File 2.xlsx", excelFileStream2.ToArray());
// Keep the file off of disk, and in memory.
zip.Save(memoryStream);
}
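Use a StringReader to read from your string objects, or wrap their encoded bytes in a MemoryStream when the zip-building code needs a Stream (StringReader is a TextReader, not a Stream).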
That should make it easy to feed them to your zip-building code.
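For zip APIs that accept only a Stream, one option is to encode the string and wrap the bytes in a MemoryStream. A minimal sketch, assuming UTF-8 is the encoding you want (the helper name is made up):

```csharp
using System;
using System.IO;
using System.Text;

public static class StringStreamSketch
{
    // Exposes a string as a readable Stream positioned at the start.
    public static Stream ToStream(string text)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(text));
    }

    public static void Main()
    {
        using (Stream stream = ToStream("file contents"))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            Console.WriteLine(reader.ReadToEnd()); // file contents
        }
    }
}
```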
