I'm using DotNetZip to add multiple MemoryStreams to a single archive. So far my code works when I select 1 or 2 files, but not when I add more. The difference I found is that the CRC32 values are all 00000000 in the bad archives. Is it something about the archive size? Any help is appreciated!
My code in C#:
foreach (.....)
{
    var zipEntryName = .....; // Get the file name as a string
    var UDocument = .....; // Get an object
    var UStream = UDocument.GetStream();
    UStream.Seek(0, SeekOrigin.Begin);
    ZipEntry entry = zipFile.AddEntry(zipEntryName, UStream);
}
var outputStream = new MemoryStream();
outputStream.Seek(0, SeekOrigin.Begin);
zipFile.Save(outputStream);
outputStream.Flush();
return outputStream;
I think it's because of a memory leak.
You are creating a new object inside the foreach loop, and that becomes a problem when the loop iterates many times.
This is the problematic line in your code:
var UDocument = .....; // Get an object
A singleton is a class that can be instantiated once, and only once.
Use a singleton class as below:
public class SingletonSample
{
    private static readonly object lockingObject = new object();
    private static SingletonSample singletonObject;

    public static SingletonSample InstanceCreation()
    {
        if (singletonObject == null)
        {
            lock (lockingObject)
            {
                // Check again inside the lock so two threads can't both create an instance
                if (singletonObject == null)
                {
                    singletonObject = new SingletonSample();
                }
            }
        }
        return singletonObject;
    }
}
The websocket client is returning a ReadOnlyMemory<byte>.
The issue is that JsonDocument.Parse fails because the buffer has been compressed; I have to decompress it somehow before I parse it. How do I do that? I cannot really change the websocket library code.
What I want is something like public Func<ReadOnlyMemory<byte>> DataInterpreterBytes = () => that optionally decompresses these bytes outside of this class. How do I do that? Is it possible to decompress a ReadOnlyMemory<byte>, and to basically do nothing if the handler is unused?
private static string DecompressData(byte[] byteData)
{
using var decompressedStream = new MemoryStream();
using var compressedStream = new MemoryStream(byteData);
using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
deflateStream.CopyTo(decompressedStream);
decompressedStream.Position = 0;
using var streamReader = new StreamReader(decompressedStream);
return streamReader.ReadToEnd();
}
Snippet
private void OnMessageReceived(object? sender, MessageReceivedEventArgs e)
{
var timestamp = DateTime.UtcNow;
_logger.LogTrace("Message was received. {Message}", Encoding.UTF8.GetString(e.Message.Buffer.Span));
// We dispose that object later on
using var document = JsonDocument.Parse(e.Message.Buffer);
var tokenData = document.RootElement;
So, if you had a byte array, you'd do this:
private static JsonDocument DecompressData(byte[] byteData)
{
using var compressedStream = new MemoryStream(byteData);
using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
return JsonDocument.Parse(deflateStream);
}
This is similar to your snippet above, but with no need for the intermediate copy: just read straight from the GZipStream. JsonDocument.Parse also has an overload that takes a stream, so you can use that and avoid yet another useless copy.
Unfortunately, you don't have a byte array, you have a ReadOnlyMemory<byte>. There is no way out of the box to create a memory stream out of a ReadOnlyMemory<byte>. Honestly, it feels like an oversight, like they forgot to put that feature into .NET.
So here are your options instead.
The first option is to just convert the ReadOnlyMemory<byte> object to an array with ToArray():
// assuming e.Message.Buffer is a ReadOnlyMemory<byte>
using var document = DecompressData(e.Message.Buffer.ToArray());
This is really straightforward, but remember it actually copies the data, so for large documents it might not be a good idea if you want to avoid using too much memory.
The second is to try and extract the underlying array from the memory. This can be achieved with MemoryMarshal.TryGetArray, which gives you an ArraySegment (but might fail if the memory isn't actually a managed array).
private static JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
    if (MemoryMarshal.TryGetArray(byteData, out var segment))
    {
        // Wrap the underlying array without copying it
        using var compressedStream = new MemoryStream(segment.Array, segment.Offset, segment.Count);
        using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
        return JsonDocument.Parse(deflateStream);
    }
    else
    {
        // Welp, this memory isn't actually an array, so... tough luck?
        throw new NotSupportedException("The memory is not backed by a managed array.");
    }
}
The third way might feel dirty, but if you're okay with using unsafe code, you can just pin the memory's span and then use UnmanagedMemoryStream:
private static unsafe JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
fixed (byte* ptr = byteData.Span)
{
using var compressedStream = new UnmanagedMemoryStream(ptr, byteData.Length);
using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
return JsonDocument.Parse(deflateStream);
}
}
The other solution is to write your own Stream class that supports this. The Windows Community Toolkit has an extension method that returns a Stream wrapper around the memory object. If you're not okay with using an entire third party library just for that, you can probably just roll your own, it's not that much code.
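For reference, a minimal sketch of what that could look like with the toolkit's AsStream() extension; the package and namespace here (Microsoft.Toolkit.HighPerformance, now CommunityToolkit.HighPerformance) are from memory, so verify them against whichever version you install:
using CommunityToolkit.HighPerformance; // assumed to provide AsStream() for ReadOnlyMemory<byte>

private static JsonDocument DecompressData(ReadOnlyMemory<byte> byteData)
{
    // Wraps the memory in a read-only Stream without copying the data
    using var compressedStream = byteData.AsStream();
    using var deflateStream = new GZipStream(compressedStream, CompressionMode.Decompress);
    return JsonDocument.Parse(deflateStream);
}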
I have a function I use for aggregating streams from a zip archive.
private void ExtractMiscellaneousFiles()
{
foreach (var miscellaneousFileName in _fileData.MiscellaneousFileNames)
{
var fileEntry = _archive.GetEntry(miscellaneousFileName);
if (fileEntry == null)
{
throw new ZipArchiveMissingFileException("Couldn't find " + miscellaneousFileName);
}
var stream = fileEntry.Open();
OtherFileStreams.Add(miscellaneousFileName, (DeflateStream) stream);
}
}
This works well in most cases. However, if I have a zip within a zip, I get an exception on casting the stream to a DeflateStream:
System.InvalidCastException: Unable to cast object of type 'System.IO.Compression.SubReadStream' to type 'System.IO.Compression.DeflateStream'.
I am unable to find Microsoft documentation for SubReadStream. I would like my zip within a zip as a DeflateStream. Is this possible? If so, how?
UPDATE
Still no success. I attempted @Sunshine's suggestion of copying the stream using the following code:
private void ExtractMiscellaneousFiles()
{
_logger.Log("Extracting misc files...");
foreach (var miscellaneousFileName in _fileData.MiscellaneousFileNames)
{
_logger.Log($"Opening misc file stream for {miscellaneousFileName}");
var fileEntry = _archive.GetEntry(miscellaneousFileName);
if (fileEntry == null)
{
throw new ZipArchiveMissingFileException("Couldn't find " + miscellaneousFileName);
}
var openStream = fileEntry.Open();
var deflateStream = openStream;
if (!(deflateStream is DeflateStream))
{
var memoryStream = new MemoryStream();
deflateStream.CopyTo(memoryStream);
memoryStream.Position = 0;
deflateStream = new DeflateStream(memoryStream, CompressionLevel.NoCompression, true);
}
OtherFileStreams.Add(miscellaneousFileName, (DeflateStream)deflateStream);
}
}
But I get a
System.NotSupportedException: Stream does not support reading.
I inspected deflateStream.CanRead and it is true.
I've discovered this happens not just on zips, but on files that are in the zip but are not compressed (because too small, for example). Surely there's a way to deal with this; surely someone has encountered this before. I'm opening a bounty on this question.
Here's the .NET source for SubReadStream, thanks to @Quantic.
The return type of ZipArchiveEntry.Open() is Stream. That is an abstract type; in practice it can be a DeflateStream (you'd be happy), a SubReadStream (boo) or a WrappedStream (boo). Woe be you if they decide to improve the class some day and use a ZopfliStream (boo). The workaround is not good either; you are trying to deflate data that is not compressed (boo).
Too many boos.
The only good solution is to change the type of your OtherFileStreams member. We can't see its declaration, but judging from the two-argument Add call it smells like a Dictionary<string, DeflateStream>. It needs to hold plain Stream values instead.
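A minimal sketch of that change, assuming OtherFileStreams is a dictionary keyed by file name (its declaration isn't shown in the question):
public Dictionary<string, Stream> OtherFileStreams { get; } = new Dictionary<string, Stream>();

// ...and the cast in ExtractMiscellaneousFiles() goes away:
var stream = fileEntry.Open();
OtherFileStreams.Add(miscellaneousFileName, stream);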
So it looks like when storing a zip file inside another zip, it doesn't deflate the zip but rather just inlines the content of the zip alongside the rest of the files, with some information that these entries are part of a sub zip file. Which makes sense, because applying compression to something that is already compressed is a waste of time.
This zip file is marked as CompressionMethodValues.Stored in the archive, which causes .NET to just return the original stream it read instead of wrapping it in a DeflateStream.
Source here: https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs#L670
You could pass the stream into a ZipArchive if it's not a DeflateStream (if you are interested in the files inside):
var stream = entry.Open();
if (!(stream is DeflateStream))
{
var subArchive = new ZipArchive(stream);
}
Or you can copy the stream to a FileStream (if you want to save it to disk):
var stream = entry.Open();
if (!(stream is DeflateStream))
{
var fs = File.Create(Path.GetTempFileName());
stream.CopyTo(fs);
fs.Close();
}
Or copy to any stream you are interested in using.
Note: This is also how .NET 4.6 behaves
Yesterday I had a strange problem: when I wanted to pass a zip file as a byte[] and read it, I got an Ionic.Zip.ZipException:
Cannot read that as a ZipFile
public string Import(byte[] file)
{
try
{
var stream = new MemoryStream(file);
if (ZipFile.IsZipFile(stream))
{
ImportArchive(stream);
} else {
...
}
...
}
private void ImportArchive(MemoryStream stream)
{
var zip = ZipFile.Read(stream); //--> ZipException thrown
...
}
Now if I pass the byte[] as the parameter instead of the MemoryStream, everything works fine:
public string Import(byte[] file)
{
try
{
if (ZipFile.IsZipFile(new MemoryStream(file), true))
{
ImportArchive(file);
} else {
...
}
...
}
private void ImportArchive(byte[] file)
{
var fileStream = new MemoryStream(file);
var zip = ZipFile.Read(fileStream); //--> no exception!
...
}
What is the difference between those two versions? Why can't the MemoryStream passed in the first version be read?
ZipFile.IsZipFile changes the stream position - it needs to read more than one byte of data. You need to "rewind" the stream before calling ImportArchive:
stream.Position = 0;
This is not something that can be done automatically - when you pass some method a stream, it's usually assumed that you're pointing to the beginning of the relevant data. This allows you to have different data "packets" in one stream, and it means that you can use streams that aren't seekable.
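Applied to the first version from the question, the fix looks like this (a sketch that keeps the elided parts as they were):
public string Import(byte[] file)
{
    try
    {
        var stream = new MemoryStream(file);
        if (ZipFile.IsZipFile(stream))
        {
            stream.Position = 0; // IsZipFile advanced the position; rewind before reading the archive
            ImportArchive(stream);
        } else {
            ...
        }
        ...
    }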
I need to generate an XML file and I need to stick as much data into it as possible, BUT there is a file size limit. So I need to keep inserting data until something says no more. How do I figure out the XML file size without repeatedly writing it to a file?
I agree with John Saunders. Here's some code that basically does what he's talking about, except that it serializes with an XmlSerializer into a MemoryStream as intermediate storage and then splits the result across files. It may be more effective to extend Stream, though.
public class PartitionedXmlSerializer<TObj>
{
private readonly int _fileSizeLimit;
public PartitionedXmlSerializer(int fileSizeLimit)
{
_fileSizeLimit = fileSizeLimit;
}
public void Serialize(string filenameBase, TObj obj)
{
using (var memoryStream = new MemoryStream())
{
// serialize the object in the memory stream
using (var xmlWriter = XmlWriter.Create(memoryStream))
new XmlSerializer(typeof(TObj))
.Serialize(xmlWriter, obj);
memoryStream.Seek(0, SeekOrigin.Begin);
var extensionFormat = GetExtensionFormat(memoryStream.Length);
var buffer = new char[_fileSizeLimit];
var i = 0;
// split the stream into files
using (var streamReader = new StreamReader(memoryStream))
{
int readLength;
while ((readLength = streamReader.Read(buffer, 0, _fileSizeLimit)) > 0)
{
var filename
= Path.ChangeExtension(filenameBase,
string.Format(extensionFormat, i++));
using (var fileStream = new StreamWriter(filename))
fileStream.Write(buffer, 0, readLength);
}
}
}
}
/// <summary>
/// Gets a file-extension format string based on the length of the file
/// and the max file size limit.
/// </summary>
/// <param name="fileLength">Length of the file.</param>
private string GetExtensionFormat(long fileLength)
{
var numFiles = fileLength / _fileSizeLimit;
var extensionLength = Math.Ceiling(Math.Log10(numFiles));
var zeros = string.Empty;
for (var j = 0; j < extensionLength; j++)
{
zeros += "0";
}
return string.Format("xml.part{{0:{0}}}", zeros);
}
}
To use it, you initialize it with the max file length, then call Serialize with the base file path and the object:
public class MyType
{
public int MyInt;
public string MyString;
}
public void Test()
{
var myObj = new MyType { MyInt = 42,
MyString = "hello there this is my string" };
new PartitionedXmlSerializer<MyType>(2)
.Serialize("myFilename", myObj);
}
This particular example will generate an XML file partitioned into:
myFilename.xml.part001
myFilename.xml.part002
myFilename.xml.part003
...
myFilename.xml.part110
In general, you cannot break XML documents at arbitrary locations, even if you close all open tags.
However, if what you need is to split an XML document over multiple files, each of no more than a certain size, then you should create your own subtype of the Stream class. This "PartitionedFileStream" class could write to a particular file, up to the size limit, then create a new file, and write to that file, up to the size limit, etc.
This would leave you with multiple files which, when concatenated, make up a valid XML document.
In the general case, closing tags will not work. Consider an XML format that must contain one element A followed by one element B. If you closed the tags after writing element A, then you do not have a valid document - you need to have written element B.
However, in the specific case of a simple site map file, it may be possible to just close the tags.
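A minimal sketch of that "PartitionedFileStream" idea follows; the class name, part-numbering scheme, and rollover policy here are my assumptions, not an existing API:
public class PartitionedFileStream : Stream
{
    private readonly string _baseName;
    private readonly long _sizeLimit;
    private FileStream _current;
    private int _partIndex;

    public PartitionedFileStream(string baseName, long sizeLimit)
    {
        _baseName = baseName;
        _sizeLimit = sizeLimit;
        OpenNextPart();
    }

    private void OpenNextPart()
    {
        // Close the current part (if any) and start the next numbered file
        if (_current != null) _current.Dispose();
        _current = File.Create(string.Format("{0}.part{1:000}", _baseName, ++_partIndex));
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            // Fill the current part up to the limit, then roll over to a new file
            var chunk = (int)Math.Min(count, _sizeLimit - _current.Length);
            if (chunk == 0) { OpenNextPart(); continue; }
            _current.Write(buffer, offset, chunk);
            offset += chunk;
            count -= chunk;
        }
    }

    public override void Flush() { _current.Flush(); }
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing && _current != null) _current.Dispose();
        base.Dispose(disposing);
    }
}
Concatenating the resulting .part files in order yields the original XML document.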
You can ask the XmlTextWriter for its BaseStream and check its Position.
As the others pointed out, you may need to reserve some headroom to properly close the XML.
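For example (a sketch; flushing first matters because the writer buffers output before it reaches the stream):
var stream = new MemoryStream();
var writer = new XmlTextWriter(stream, Encoding.UTF8);
// ... write some elements ...
writer.Flush(); // push buffered output into the underlying stream
long bytesSoFar = writer.BaseStream.Position; // serialized size so far, in bytes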
I have a little sample application I was working on, trying to get some of the new .NET 4.0 Parallel Extensions going (they are very nice). I'm running into a (probably really stupid) problem with an OutOfMemoryException. My main app that I'm looking to plug this sample into reads some data and lots of files, does some processing on them, and then writes them out somewhere. I was running into some issues with the files getting bigger (possibly GBs) and was concerned about memory, so I wanted to parallelize things, which led me down this path.
Now the below code gets an OOME on smaller files and I think I'm just missing something. It will read in 10-15 files and write them out in parallel nicely, but then it chokes on the next one. It looks like it's read and written about 650MB. A second set of eyes would be appreciated.
I'm reading into a MemoryStream from the FileStream because that is what is needed for the main application, and I'm just trying to replicate that to some degree. It reads data and files from all types of places and works on them as MemoryStreams.
This is using .NET 4.0 Beta 2, VS 2010.
namespace ParellelJob
{
class Program
{
BlockingCollection<FileHolder> serviceToSolutionShare;
static void Main(string[] args)
{
Program p = new Program();
p.serviceToSolutionShare = new BlockingCollection<FileHolder>();
ServiceStage svc = new ServiceStage(ref p.serviceToSolutionShare);
SolutionStage sol = new SolutionStage(ref p.serviceToSolutionShare);
var svcTask = Task.Factory.StartNew(() => svc.Execute());
var solTask = Task.Factory.StartNew(() => sol.Execute());
while (!solTask.IsCompleted)
{
}
}
}
class ServiceStage
{
BlockingCollection<FileHolder> outputCollection;
public ServiceStage(ref BlockingCollection<FileHolder> output)
{
outputCollection = output;
}
public void Execute()
{
var di = new DirectoryInfo(@"C:\temp\testfiles");
var files = di.GetFiles();
foreach (FileInfo fi in files)
{
using (var fs = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read))
{
int b;
var ms = new MemoryStream();
while ((b = fs.ReadByte()) != -1)
{
ms.WriteByte((byte)b); //OutOfMemoryException Occurs Here
}
var f = new FileHolder();
f.filename = fi.Name;
f.contents = ms;
outputCollection.TryAdd(f);
}
}
outputCollection.CompleteAdding();
}
}
class SolutionStage
{
BlockingCollection<FileHolder> inputCollection;
public SolutionStage(ref BlockingCollection<FileHolder> input)
{
inputCollection = input;
}
public void Execute()
{
FileHolder current;
while (!inputCollection.IsCompleted)
{
if (inputCollection.TryTake(out current))
{
using (var fs = new FileStream(String.Format(@"c:\temp\parellel\{0}", current.filename), FileMode.OpenOrCreate, FileAccess.Write))
{
using (MemoryStream ms = (MemoryStream)current.contents)
{
ms.WriteTo(fs);
current.contents.Close();
}
}
}
}
}
}
class FileHolder
{
public string filename { get; set; }
public Stream contents { get; set; }
}
}
The main logic seems OK, but if that empty while-loop in Main is literal then you are burning unnecessary CPU cycles. Better to use solTask.Wait() instead.
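That is, replace the spin loop with a blocking wait:
// while (!solTask.IsCompleted) { }   // burns a CPU core doing nothing
solTask.Wait(); // blocks until SolutionStage finishes, without spinning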
But if individual files can run into gigabytes, you still have the problem of holding at least one completely in memory, and usually two (one being read, one being processed/written).
PS1: I just realized you don't pre-allocate the MemoryStream. That's bad; it will have to re-size very often for a big file, and that costs a lot of memory. Better to use something like:
var ms = new MemoryStream((int)fs.Length); // the capacity constructor takes an int, so the long length needs a cast
And then, for big files, you have to consider the Large Object Heap (LOH). Are you sure you can't break a file up into segments and process them?
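For instance, here is a sketch of segmented processing, assuming the work can be done one chunk at a time (outputPath is a placeholder, not from the original code):
using (var fs = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read))
using (var outFs = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
{
    var buffer = new byte[81920]; // stays below the ~85KB Large Object Heap threshold
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process the chunk here, then write it out; the whole file is never held in memory
        outFs.Write(buffer, 0, read);
    }
}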
PS2: And you don't need the refs on the constructor parameters, but that's not the problem.
Just looking through quickly: inside your ServiceStage.Execute method you have
var ms = new MemoryStream();
I don't see where you are closing ms or wrapping it in a using. You do have the using in the other class. That's one thing to check.