C++ zlib inflate failing - translation of c# fixup?

I'm trying to inflate a deflate-compressed string using zlib, but it's failing, apparently because the data doesn't have the right header. I read elsewhere that the C# solution to this problem is:
public static byte[] FlateDecode(byte[] inp, bool strict) {
    MemoryStream stream = new MemoryStream(inp);
    InflaterInputStream zip = new InflaterInputStream(stream);
    MemoryStream outp = new MemoryStream();
    byte[] b = new byte[strict ? 4092 : 1];
    try {
        int n;
        while ((n = zip.Read(b, 0, b.Length)) > 0) {
            outp.Write(b, 0, n);
        }
        zip.Close();
        outp.Close();
        return outp.ToArray();
    }
    catch {
        if (strict)
            return null;
        return outp.ToArray();
    }
}
But I know nothing about C#. I can surmise that all it's doing is adding a prefix to the string, but what that prefix is, I have no idea. Would someone be able to phrase this function (or even just the header creation and string concatenation) in C++?
The data which I'm trying to inflate is taken from a PDF using zlib deflation.
Thanks a million,
Wyatt

I've had better luck using SharpZipLib for zlib interop than with the native .Net Framework classes. This correctly handles streams from C++ (zlib native) and from Java's compression classes without any funny business being needed.

I can't see any prefixes, sorry. Here's what the logic appears to be; sorry this isn't in C++:
MemoryStream stream = new MemoryStream(inp);
InflaterInputStream zip = new InflaterInputStream(stream);
Create an inflate stream from the data passed
MemoryStream outp = new MemoryStream();
Create a memory buffer stream for output
byte[] b = new byte[strict ? 4092 : 1];
try {
int n;
while ((n = zip.Read(b, 0, b.Length)) > 0) {
If you're in strict mode, read up to 4092 bytes - or 1 in non-strict mode - into a byte buffer
outp.Write(b, 0, n);
Write all the bytes decoded (may be less than the 4092) to the output memory buffer stream
zip.Close();
outp.Close();
return outp.ToArray();
Clean up, and return the output memory buffer stream as an array.
I'm a bit confused, though: why not just cut array b off at n elements and return that rather than go via a MemoryStream? The code also ought really to take care to clean up the memory streams and zip on exception (e.g. using using) since they're all IDisposable but I guess that's not really important since they don't correspond to I/O file handles, only memory structures.
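As a hedged sketch of the cleanup suggested above, the same logic can be written with using declarations. This version assumes .NET 6 or later and substitutes System.IO.Compression.ZLibStream for SharpZipLib's InflaterInputStream; that swap is made here only for illustration and is not what the original code uses. The class name FlateSketch is hypothetical. It shows that nothing is prefixed to the input: non-strict mode merely reads one byte at a time and keeps whatever was decoded before an error.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FlateSketch
{
    // Inflates zlib-wrapped data. In strict mode any error yields null;
    // in non-strict mode the bytes decoded before the error are returned.
    public static byte[] FlateDecode(byte[] input, bool strict)
    {
        // Declared outside the try block so the catch block can still read it.
        using var output = new MemoryStream();
        try
        {
            using var source = new MemoryStream(input);
            // Assumption: ZLibStream (.NET 6+) stands in for InflaterInputStream.
            using var zip = new ZLibStream(source, CompressionMode.Decompress);
            var buffer = new byte[strict ? 4092 : 1];
            int n;
            while ((n = zip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, n);
            }
            return output.ToArray();
        }
        catch (Exception)
        {
            return strict ? null : output.ToArray();
        }
    }
}
```

The one-byte buffer in non-strict mode is what makes partial recovery work: every byte decoded before the failure has already been written to the output.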

Related

How to copy a large stream to another without OutOfMemoryException in C# [duplicate]

What is the best way to copy the contents of one stream to another? Is there a standard utility method for this?

From .NET 4.5 on, there is the Stream.CopyToAsync method
input.CopyToAsync(output);
This will return a Task that can be continued on when completed, like so:
await input.CopyToAsync(output);
// Code from here on will be run in a continuation.
Note that depending on where the call to CopyToAsync is made, the code that follows may or may not continue on the same thread that called it.
The SynchronizationContext that was captured when calling await will determine what thread the continuation will be executed on.
Additionally, this call (and this is an implementation detail subject to change) still sequences reads and writes (it just doesn't waste a thread blocking on I/O completion).
From .NET 4.0 on, there is the Stream.CopyTo method
input.CopyTo(output);
For .NET 3.5 and before
There isn't anything baked into the framework to assist with this; you have to copy the content manually, like so:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write (buffer, 0, read);
}
}
Note 1: This method will allow you to report on progress (x bytes read so far ...)
Note 2: Why use a fixed buffer size and not input.Length? Because that Length may not be available! From the docs:
If a class derived from Stream does not support seeking, calls to Length, SetLength, Position, and Seek throw a NotSupportedException.

MemoryStream has .WriteTo(outstream);
and .NET 4.0 has .CopyTo on normal stream object.
.NET 4.0:
instream.CopyTo(outstream);

I use the following extension methods. They have optimized overloads for when one stream is a MemoryStream.
public static void CopyTo(this Stream src, Stream dest)
{
int size = (src.CanSeek) ? Math.Min((int)(src.Length - src.Position), 0x2000) : 0x2000;
byte[] buffer = new byte[size];
int n;
do
{
n = src.Read(buffer, 0, buffer.Length);
dest.Write(buffer, 0, n);
} while (n != 0);
}
public static void CopyTo(this MemoryStream src, Stream dest)
{
dest.Write(src.GetBuffer(), (int)src.Position, (int)(src.Length - src.Position));
}
public static void CopyTo(this Stream src, MemoryStream dest)
{
if (src.CanSeek)
{
int pos = (int)dest.Position;
int length = (int)(src.Length - src.Position) + pos;
dest.SetLength(length);
while(pos < length)
pos += src.Read(dest.GetBuffer(), pos, length - pos);
}
else
src.CopyTo((Stream)dest);
}

.NET Framework 4 introduced a new CopyTo method on the Stream class in the System.IO namespace. Using this method, we can copy one stream to another stream of a different stream class.
Here is an example:
FileStream objFileStream = File.Open(Server.MapPath("TextFile.txt"), FileMode.Open);
Response.Write(string.Format("FileStream Content length: {0}", objFileStream.Length.ToString()));
MemoryStream objMemoryStream = new MemoryStream();
// Copy File Stream to Memory Stream using CopyTo method
objFileStream.CopyTo(objMemoryStream);
Response.Write("<br/><br/>");
Response.Write(string.Format("MemoryStream Content length: {0}", objMemoryStream.Length.ToString()));
Response.Write("<br/><br/>");

There is actually a less heavy-handed way of doing a stream copy. Note, however, that it assumes you can hold the entire file in memory. Don't use it without caution on files that run into the hundreds of megabytes or more.
public static void CopySmallTextStream(Stream input, Stream output)
{
using (StreamReader reader = new StreamReader(input))
using (StreamWriter writer = new StreamWriter(output))
{
writer.Write(reader.ReadToEnd());
}
}
NOTE: There may also be some issues concerning binary data and character encodings.

The basic questions that differentiate implementations of "CopyStream" are:
size of the reading buffer
size of the writes
Can we use more than one thread (writing while we are reading).
The answers to these questions result in vastly different implementations of CopyStream and are dependent on what kind of streams you have and what you are trying to optimize. The "best" implementation would even need to know what specific hardware the streams were reading and writing to.

Unfortunately, there is no really simple solution. You can try something like this:
Stream s1, s2;
byte[] buffer = new byte[4096];
int bytesRead = 0;
while ((bytesRead = s1.Read(buffer, 0, buffer.Length)) > 0) s2.Write(buffer, 0, bytesRead);
s1.Close(); s2.Close();
But the problem with that is that different implementations of the Stream class might behave differently if there is nothing to read. A stream reading a file from a local hard drive will probably block until the read operation has read enough data from the disk to fill the buffer, and only return less data if it reaches the end of the file. On the other hand, a stream reading from the network might return less data even though there is more data left to be received.
Always check the documentation of the specific stream class you are using before using a generic solution.
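To make the partial-read point concrete, here is a minimal sketch of a helper that keeps calling Read until the requested number of bytes has arrived or the stream ends. The names StreamSketch and ReadFully are hypothetical and not part of the framework.

```csharp
using System;
using System.IO;

public static class StreamSketch
{
    // Reads up to 'count' bytes into 'buffer' starting at 'offset', looping
    // because a single Read call may legitimately return fewer bytes.
    // Returns the number of bytes actually read (less than count only at end of stream).
    public static int ReadFully(Stream source, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = source.Read(buffer, offset + total, count - total);
            if (n == 0)
            {
                break; // end of stream reached before 'count' bytes arrived
            }
            total += n;
        }
        return total;
    }
}
```

A caller can compare the return value with the requested count to detect a truncated source.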

There may be a way to do this more efficiently, depending on what kind of stream you're working with. If you can convert one or both of your streams to a MemoryStream, you can use the GetBuffer method to work directly with a byte array representing your data. This lets you use methods like Array.CopyTo, which abstract away all the issues raised by fryguybob. You can just trust .NET to know the optimal way to copy the data.

If you want a procedure to copy one stream to another, the one that nick posted is fine, but it is missing the position reset. It should be:
public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    long TempPos = input.Position;
    while (true)
    {
        int read = input.Read(buffer, 0, buffer.Length);
        if (read <= 0)
            break; // leave the loop so the position reset below still runs
        output.Write(buffer, 0, read);
    }
    input.Position = TempPos; // or set Position = 0 to rewind to the start
}
But if you are doing it inline rather than in a procedure, you should use a MemoryStream:
Stream output = new MemoryStream();
byte[] buffer = new byte[32768]; // or specify the buffer size you want
long TempPos = input.Position;
while (true)
{
    int read = input.Read(buffer, 0, buffer.Length);
    if (read <= 0)
        break; // leave the loop so the position reset below still runs
    output.Write(buffer, 0, read);
}
input.Position = TempPos; // or set Position = 0 to rewind to the start

Since none of the answers have covered an asynchronous way of copying from one stream to another, here is a pattern that I've successfully used in a port forwarding application to copy data from one network stream to another. It lacks exception handling to emphasize the pattern.
const int BUFFER_SIZE = 4096;
static byte[] bufferForRead = new byte[BUFFER_SIZE];
static byte[] bufferForWrite = new byte[BUFFER_SIZE];
static Stream sourceStream = new MemoryStream();
static Stream destinationStream = new MemoryStream();
static void Main(string[] args)
{
// Initial read from source stream
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginReadCallback(IAsyncResult asyncRes)
{
// Finish reading from source stream
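int bytesRead = sourceStream.EndRead(asyncRes);
// Stop at end of stream; otherwise the callback would keep re-issuing reads forever
if (bytesRead == 0)
return;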
// Make a copy of the buffer as we'll start another read immediately
Array.Copy(bufferForRead, 0, bufferForWrite, 0, bytesRead);
// Write copied buffer to destination stream
destinationStream.BeginWrite(bufferForWrite, 0, bytesRead, BeginWriteCallback, null);
// Start the next read (looks like async recursion I guess)
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginWriteCallback(IAsyncResult asyncRes)
{
// Finish writing to destination stream
destinationStream.EndWrite(asyncRes);
}

For .NET 3.5 and before try :
MemoryStream1.WriteTo(MemoryStream2);

Easy and safe - make new stream from original source:
MemoryStream source = new MemoryStream(byteArray);
MemoryStream copy = new MemoryStream(byteArray);

The following code solves the issue by saving into a Stream that is backed by a MemoryStream and then reading its bytes with ToArray:
Stream stream = new MemoryStream();
//any function require input the stream. In mycase to save the PDF file as stream
document.Save(stream);
MemoryStream newMs = (MemoryStream)stream;
byte[] getByte = newMs.ToArray();
//Note - please dispose the stream in the finally block instead of inside using block as it will throw an error 'Access denied as the stream is closed'

.net: efficient way to read a binary file into memory then access

I'm a novice programmer. I'm creating a library to process binary files of a certain type -- like a codec (though without a need to process a progressive stream coming over a wire). I'm looking for an efficient way to read the file into memory and then parse portions of the data as needed. In particular, I'd like to avoid large memory copies, hopefully without a lot of added complexity to avoid that.
In some situations, I want to do sequential reading of values in the data. For this, a MemoryStream works well.
FileStream fs = new FileStream(_fileName, FileMode.Open, FileAccess.Read);
byte[] bytes = new byte[fs.Length];
fs.Read(bytes, 0, bytes.Length);
_ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
fs.Close();
(That involved a copy from the bytes array into the memory stream; that's one time, and I don't know of a way to avoid it.)
With the memory stream, it's easy to seek to arbitrary positions and then start reading structure members. E.g.,
_ms.Seek(_tableRecord.Offset, SeekOrigin.Begin);
byte[] ab32 = new byte[4];
_version = ConvertToUint(_ms.Read(ab32));
_numRecords = ConvertToUint(_ms.Read(ab32));
// etc.
But there may also be times when I want to take a slice out of the memory corresponding to some large structure and then pass into a method for certain processing. MemoryStream doesn't support that. I could always pass the MemoryStream plus offset and length, though that might not always be the most convenient.
Instead of MemoryStream, I could store the data in memory using Memory<T>. That supports slicing, but not sequential reading.
If for some situation I want to get a slice (rather than pass stream & offset/length), I could construct an ArraySegment from MemoryStream.GetBuffer.
ArraySegment<byte> segment = new ArraySegment<byte>(ms.GetBuffer(), offset, length);
It's not clear to me, though, if that will result in a (potentially large) copy, or if that uses a reference into the same memory held by the MemoryStream. I gather that GetBuffer exposes the underlying memory rather than providing a copy; and that ArraySegment will point into the same memory?
There will be times when I need to get a slice that is a copy as I'll need to modify some elements and then process that, but without changing the original. If ArraySegment gets a reference rather than a copy, I gather I could use ArraySegment<byte>.ToArray()?
So, my questions are:
Is MemoryStream the best approach? Is there any other type that allows sequential reading like MemoryStream but also allows slicing like Memory?
If I want a slice without copying memory, will ArraySegment<byte>(ms.GetBuffer(), offset, length) do that?
Then if I need a copy that can be modified without affecting the original, use ArraySegment<byte>.ToArray()?
Is there a way to read the data from a file directly into a new MemoryStream without creating a temporary byte array that gets copied?
Am I approaching all this the best way?

To get the initial MemoryStream from reading the file, the following works:
byte[] bytes;
try
{
// File.ReadAllBytes opens a filestream and then ensures it is closed
bytes = File.ReadAllBytes(_fi.FullName);
_ms = new MemoryStream(bytes, 0, bytes.Length, false, true);
}
catch (IOException e)
{
throw; // rethrow without resetting the stack trace
}
File.ReadAllBytes() copies the file content into memory. It uses using, which means that it ensures the file gets closed. So no Finally statement is needed.
I can read individual values from the MemoryStream using MemoryStream.Read. These calls involve copies of those values, which is fine.
In one situation, I needed to read a table out of the file, change a value, and then calculate a checksum of the entire file with that change in place. Instead of copying the entire file so that I could edit one part, I was able to calculate the checksum in progressive steps: first on the initial, unchanged segment of the file, then continue with the middle segment that was changed, then continue with the remainder.
For this I could process the first and final segments using the MemoryStream. This involved lots of reads, with each read copying; but those copies were transient variables, so no significant working set increase.
For the middle segment, that needed to be copied since it had to be changed (but the original version needed to be kept intact). The following worked:
// get ref (not copy!) to the byte array underlying the MemoryStream
byte[] fileData = _ms.GetBuffer();
// determine the required length
int length = _tableRecord.Length;
// create array to hold the copy
byte[] segmentCopy = new byte[length];
// get the copy
Array.ConstrainedCopy(fileData, _tableRecord.Offset, segmentCopy, 0, length);
After modifying values in segmentCopy, I then needed to pass this to my static method for calculating checksums, which expected a MemoryStream (for sequential reading). This worked:
// new MemoryStream will hold a ref to the segmentCopy array (no new copy!)
MemoryStream ms = new MemoryStream(segmentCopy, 0, segmentCopy.Length);
What I haven't needed to do yet, but will want to do, is to get a slice of the MemoryStream that doesn't involve copying. This works:
MemoryStream sliceFromMS = new MemoryStream(fileData, offset, length);
From above, fileData was a ref to the array underlying the original MemoryStream. Now sliceFromMS will have a ref to a segment within that same array.
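The difference between a shared slice and an independent copy can be sketched as follows. The names SliceSketch, SliceWithoutCopy and SliceWithCopy are hypothetical, and the sketch assumes the source MemoryStream was created with publiclyVisible set to true and starts at index 0 of its array, as in the code above; otherwise GetBuffer throws or the offsets would need adjusting.

```csharp
using System;
using System.IO;

public static class SliceSketch
{
    // Returns a read-only stream over part of the array that backs 'whole'.
    // No bytes are copied: both streams refer to the same underlying array.
    public static MemoryStream SliceWithoutCopy(MemoryStream whole, int offset, int length)
    {
        byte[] shared = whole.GetBuffer(); // a reference, not a copy
        return new MemoryStream(shared, offset, length, writable: false);
    }

    // Returns an independent array that can be modified without
    // touching the data held by 'whole'.
    public static byte[] SliceWithCopy(MemoryStream whole, int offset, int length)
    {
        var copy = new byte[length];
        Buffer.BlockCopy(whole.GetBuffer(), offset, copy, 0, length);
        return copy;
    }
}
```

Because the first method shares memory, later changes to the original array are visible through the slice; the second method is the one to use when the slice must be edited.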

As I understand it, you can use FileStream.Seek directly, so there is no need to load the data into memory first and then seek on a MemoryStream.
In the following example, str1 and str2 are equal:
using (var fs = new FileStream(@"C:\Users\bar_v\OneDrive\Desktop\js_balancer.txt", FileMode.Open))
{
var buffer = new byte[20];
fs.Read(buffer, 0, 20);
var str1= Encoding.ASCII.GetString(buffer);
fs.Seek(0, SeekOrigin.Begin);
fs.Read(buffer, 0, 20);
var str2 = Encoding.ASCII.GetString(buffer);
}
By the way, when you create a new MemoryStream object, you don’t copy the byte array, you just keep a reference to it:
public MemoryStream(byte[] buffer, bool writable)
{
if (buffer == null)
throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);
_buffer = buffer;
_length = _capacity = buffer.Length;
_writable = writable;
_exposable = false;
_origin = 0;
_isOpen = true;
}
But when reading, as we can see, copying occurs:
public override int Read(byte[] buffer, int offset, int count)
{
if (buffer == null)
throw new ArgumentNullException(nameof(buffer), SR.ArgumentNull_Buffer);
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset), SR.ArgumentOutOfRange_NeedNonNegNum);
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count), SR.ArgumentOutOfRange_NeedNonNegNum);
if (buffer.Length - offset < count)
throw new ArgumentException(SR.Argument_InvalidOffLen);
EnsureNotClosed();
int n = _length - _position;
if (n > count)
n = count;
if (n <= 0)
return 0;
Debug.Assert(_position + n >= 0, "_position + n >= 0"); // len is less than 2^31 -1.
if (n <= 8)
{
int byteCount = n;
while (--byteCount >= 0)
buffer[offset + byteCount] = _buffer[_position + byteCount];
}
else
Buffer.BlockCopy(_buffer, _position, buffer, offset, n);
_position += n;
return n;
}

Whats the most efficient way in C# to cut off the first 4 bytes of a file?

I have a compressed (LZMA) .txt file and need to decompress it, but I have to exclude the first 4 bytes, as they are not part of the file content.
I load my file like this:
byte[] curFile = File.ReadAllBytes(files[i]);
Performance is critical, as I have to loop through 14k+ files; the average file size is around 4 KB.

for (int i = 0; i < files.Length; i++)
{
    var positionToSkipTo = 4;
    using (var fileStream = File.OpenRead(files[i]))
    {
        fileStream.Seek(positionToSkipTo, SeekOrigin.Begin);
        var curFile = new byte[fileStream.Length - positionToSkipTo];
        // Note: Read may return fewer bytes than requested; check its return value if that matters
        fileStream.Read(curFile, 0, curFile.Length);
        // Do your thing
    }
}
Everything is self-explanatory. Important functions are listed at MSDN FileStream class documentation.

If you're just using a byte array, you can utilize the ConstrainedCopy method in the Array class.
Array.ConstrainedCopy(unclippedArray, 4, clippedArray, 0, unclippedArray.Length - 4);
If you're not going to just be dealing with the raw bytes, utilize a memory stream and a binary reader or a filestream like other people suggested.
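If the whole file is already in a byte array, another option is to skip the header without copying at all by describing the remainder as an ArraySegment. This is only a sketch; HeaderSketch and SkipHeader are hypothetical names, and whether it helps depends on whether the decompressor accepts an offset and count.

```csharp
using System;

public static class HeaderSketch
{
    // Describes everything after the first 'headerLength' bytes of 'file'
    // without allocating or copying a second array.
    public static ArraySegment<byte> SkipHeader(byte[] file, int headerLength)
    {
        if (file == null) throw new ArgumentNullException(nameof(file));
        if (headerLength < 0 || headerLength > file.Length)
            throw new ArgumentOutOfRangeException(nameof(headerLength));
        return new ArraySegment<byte>(file, headerLength, file.Length - headerLength);
    }
}
```

The segment's Array, Offset and Count properties can then be passed to any API that takes a buffer plus offset and length, for example the MemoryStream(byte[], int, int) constructor.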

Base64 - CryptoStream with StreamWriter vs Convert.ToBase64String()

Following feedback from Alexei, a simplification of the question:
How do I use a buffered Stream approach to convert the contents of a CryptoStream (using ToBase64Transform) into a StreamWriter (Unicode encoding) without using Convert.ToBase64String()?
Note: Calling Convert.ToBase64String() throws OutOfMemoryException, hence the need for a buffered/Stream approach to the conversion.

You probably should implement custom Stream, not a TextWriter. It is much easier to compose streams than writers (like pass your stream to compressed stream).
To create a custom stream, derive from Stream and implement at least Write and Flush (and Read if you need a read/write stream). The rest is more or less optional and depends on your additional needs (the remaining abstract members can simply throw NotSupportedException); a regular copy to another stream does not need anything else.
In the constructor, take the inner stream you will write to. Base64 always produces ASCII characters, so it should be easy to write the output as UTF-8, with or without a BOM, directly to a stream; if you want to specify the encoding, you can wrap the inner stream in a StreamWriter internally.
In your Write implementation, buffer data until you have enough bytes for a block whose size is a multiple of 3 (e.g. 300) and call Convert.ToBase64String on that portion. Make sure not to lose the not-yet-converted portion. Since Base64 converts 3 bytes to 4 characters, converting blocks whose size is a multiple of 3 will never produce =/== padding at the end, so each block can be concatenated with the next. Write that converted portion into the inner stream/writer. Note that you want to limit the block size to something relatively small, like 3*10000, to avoid allocating your blocks on the large object heap.
In Flush make sure to convert the last unwritten bytes (this will be the only one with = padding at the end) and write it to the stream too.
For reading you may need to be more careful as in Base64 white spaces are allowed, so you can't read fixed number of characters and convert to bytes. The easiest approach would be to read by character from StreamReader and convert each 4 non-space ones to bytes.
Note: you can consider writing/reading Base64 by hand directly from bytes. It will give you some performance benefits, but may be hard if you are not good with bit shifting.
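The block-wise idea described above can be sketched without a full custom Stream. The helper below is a simplified illustration rather than the Stream subclass the answer proposes: it reads the input in blocks whose size is a multiple of 3, encodes each full block, and encodes the remainder last so that padding can only appear at the very end. Base64Sketch and EncodeToWriter are hypothetical names.

```csharp
using System;
using System.IO;

public static class Base64Sketch
{
    // Encodes 'input' as Base64 into 'output' without holding the whole
    // payload in memory. 'chunkSize' must be a positive multiple of 3.
    public static void EncodeToWriter(Stream input, TextWriter output, int chunkSize = 3 * 10000)
    {
        if (chunkSize <= 0 || chunkSize % 3 != 0)
            throw new ArgumentException("chunkSize must be a positive multiple of 3", nameof(chunkSize));

        var buffer = new byte[chunkSize];
        int filled = 0;
        int n;
        // Keep filling the buffer; Read may return fewer bytes than requested.
        while ((n = input.Read(buffer, filled, buffer.Length - filled)) > 0)
        {
            filled += n;
            if (filled == buffer.Length)
            {
                // A full block is a multiple of 3 bytes, so it encodes with no padding.
                output.Write(Convert.ToBase64String(buffer, 0, filled));
                filled = 0;
            }
        }
        if (filled > 0)
        {
            // Only the final, partial block may end in '=' padding.
            output.Write(Convert.ToBase64String(buffer, 0, filled));
        }
    }
}
```

Passing a StreamWriter created with the desired encoding as the output covers the Unicode requirement from the question.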

Please try using following to encrypt. I am using fileName/filePath as input. You can adjust it as per your requirement. Using this I have encrypted over 1 gb file successfully without any out of memory exception.
public bool EncryptUsingStream(string inputFileName, string outputFileName)
{
bool success = false;
// here assuming that you already have key
byte[] key = new byte[128];
SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
algorithm.Key = key;
using (ICryptoTransform transform = algorithm.CreateEncryptor())
{
CryptoStream cs = null;
FileStream fsEncrypted = null;
try
{
using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
{
//First write IV
fsEncrypted = new FileStream(outputFileName, FileMode.Create, FileAccess.Write);
fsEncrypted.Write(algorithm.IV, 0, algorithm.IV.Length);
//then write using stream
cs = new CryptoStream(fsEncrypted, transform, CryptoStreamMode.Write);
int bytesRead;
int _bufferSize = 1048576; // buffer size = 1 MB
byte[] buffer = new byte[_bufferSize];
do
{
bytesRead = fsInput.Read(buffer, 0, _bufferSize);
cs.Write(buffer, 0, bytesRead);
} while (bytesRead > 0);
success = true;
}
}
catch (Exception ex)
{
//handle exception or throw.
}
finally
{
if (cs != null)
{
// Closing the CryptoStream writes the final block and disposes it
cs.Close();
}
// Close the file even if the CryptoStream was never created
if (fsEncrypted != null)
{
fsEncrypted.Close();
}
}
}
return success;
}

How to use ICSharpCode.ZipLib with stream?

I'm very sorry for the conservative title and my question itself, but I'm lost.
The samples provided with ICSharpCode.ZipLib don't include what I'm searching for.
I want to decompress a byte[] by putting it in an InflaterInputStream (ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream).
I found a decompress function, but it doesn't work.
public static byte[] Decompress(byte[] Bytes)
{
ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream stream =
new ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream(new MemoryStream(Bytes));
MemoryStream memory = new MemoryStream();
byte[] writeData = new byte[4096];
int size;
while (true)
{
size = stream.Read(writeData, 0, writeData.Length);
if (size > 0)
{
memory.Write(writeData, 0, size);
}
else break;
}
stream.Close();
return memory.ToArray();
}
It throws an exception at the line (size = stream.Read(writeData, 0, writeData.Length);) saying it has an invalid header.
My question is not how to fix the function; this function is not provided with the library, I just found it by googling. My question is how to decompress the same way the function does with InflaterInputStream, but without exceptions.
Thanks, and again, sorry for the conservative question.

The code in Lucene is very nice.
public static byte[] Compress(byte[] input) {
// Create the compressor with highest level of compression
Deflater compressor = new Deflater();
compressor.SetLevel(Deflater.BEST_COMPRESSION);
// Give the compressor the data to compress
compressor.SetInput(input);
compressor.Finish();
/*
* Create an expandable byte array to hold the compressed data.
* You cannot use an array that's the same size as the original because
* there is no guarantee that the compressed data will be smaller than
* the uncompressed data.
*/
MemoryStream bos = new MemoryStream(input.Length);
// Compress the data
byte[] buf = new byte[1024];
while (!compressor.IsFinished) {
int count = compressor.Deflate(buf);
bos.Write(buf, 0, count);
}
// Get the compressed data
return bos.ToArray();
}
public static byte[] Uncompress(byte[] input) {
Inflater decompressor = new Inflater();
decompressor.SetInput(input);
// Create an expandable byte array to hold the decompressed data
MemoryStream bos = new MemoryStream(input.Length);
// Decompress the data
byte[] buf = new byte[1024];
while (!decompressor.IsFinished) {
int count = decompressor.Inflate(buf);
bos.Write(buf, 0, count);
}
// Get the decompressed data
return bos.ToArray();
}

Well it sounds like the data is just inappropriate, and that otherwise the code would work okay. (Admittedly I'd use a "using" statement for the streams instead of calling Close explicitly.)
Where did you get your data from?

Why don't you use the System.IO.Compression.DeflateStream class (available since .Net 2.0)? This uses the same compression/decompression method but doesn't require an extra library dependency.
Since .Net 2.0 you only need the ICSharpCode.ZipLib if you need the file container support.
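A minimal round trip with DeflateStream might look like the sketch below; DeflateSketch is a hypothetical name. One caveat worth stating as an assumption about your data: DeflateStream reads and writes raw deflate data, so input that carries a zlib header (as zlib's own deflate output normally does) would need that wrapper handled separately.

```csharp
using System.IO;
using System.IO.Compression;

public static class DeflateSketch
{
    // Compresses 'input' to raw deflate data (no zlib or gzip wrapper).
    public static byte[] Compress(byte[] input)
    {
        using var output = new MemoryStream();
        // leaveOpen keeps 'output' usable after the DeflateStream is disposed,
        // and disposing the DeflateStream flushes the final compressed block.
        using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            deflate.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }

    // Decompresses raw deflate data back into a byte array.
    public static byte[] Decompress(byte[] input)
    {
        using var source = new MemoryStream(input);
        using var inflate = new DeflateStream(source, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflate.CopyTo(output);
        return output.ToArray();
    }
}
```

The using declarations assume C# 8 or later; on older compilers the same code works with using blocks.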
