Why do I need two calls to stream CopyTo? - c#

I have the following method and for some reason the first call to Copy to seems to do nothing? Anyone know why?
In input to the method is compressed and base64 can supply that method to if need.
private byte[] GetFileChunk(string base64)
{
using (
MemoryStream compressedData = new MemoryStream(Convert.FromBase64String(base64), false),
uncompressedData = new MemoryStream())
{
using (GZipStream compressionStream = new GZipStream(compressedData, CompressionMode.Decompress))
{
// first copy does nothing ?? second works
compressionStream.CopyTo(uncompressedData);
compressionStream.CopyTo(uncompressedData);
}
return uncompressedData.ToArray();
}
}

If the first call to Read() returns 0 then Stream.CopyTo() isn't going to work either. While this points to a problem with GZipStream, it is very unlikely that it has a bug like this. Far more likely is that something went wrong when you created the compressed data. Like compressing 0 bytes first, followed by compressing the real data.

Just a guess, but is it because the new GZipStream constructor leaves the index at the end of the array, and the first CopyTo resets it to the start, so that when you call the second CopyTo its now at the start and copies the data properly?

How sure are you that first copy does nothing and the second works
, that would be a bug in the GZipStream class. Your code should work fine without calling CopyTo twice.

Hi thank you for everybody’s input. It turns out the error was caused by a mistake in encoding method. The method was
/// <summary>
/// Compress file data and then base64s the compressed data for safe transportation in XML.
/// </summary>
/// <returns>Base64 string of file chunk</returns>
private string GetFileChunk()
{
// MemoryStream for compression output
using (MemoryStream compressed = new MemoryStream())
{
using (GZipStream zip = new GZipStream(compressed, CompressionMode.Compress))
{
// read chunk from file
byte[] plaintext = new byte[this.readSize];
int read = this.file.Read(plaintext, 0, plaintext.Length);
// write chunk to compreesion
zip.Write(plaintext, 0, read);
plaintext = null;
// Base64 compressed data
return Convert.ToBase64String(compressed.ToArray());
}
}
}
The return line should be below the using allowing the compression stream to close and flush, this caused the inconsistent behaviour when decompressing the stream.
/// <summary>
/// Compress file data and then base64s the compressed data for safe transportation in XML.
/// </summary>
/// <returns>Base64 string of file chunk</returns>
private string GetFileChunk()
{
// MemoryStream for compression output
using (MemoryStream compressed = new MemoryStream())
{
using (GZipStream zip = new GZipStream(compressed, CompressionMode.Compress))
{
// read chunk from file
byte[] plaintext = new byte[this.readSize];
int read = this.file.Read(plaintext, 0, plaintext.Length);
// write chunk to compreesion
zip.Write(plaintext, 0, read);
plaintext = null;
}
// Base64 compressed data
return Convert.ToBase64String(compressed.ToArray());
}
}
Thank you for everybody's help.

Related

Wrong file extension added in image byte array

I have a method that takes a web-based image, converts it into a byte array and then passes that to a CDN as an image.
I have now used this successfully hundreds of times, however on migrating a particular set of images, I've noticed that these Jpeg files are identifying as .png files which when they arrive at the CDN are blank.
Using some code that I copied from the web, I am able to identify the file extension from the image byte array after it is built.
So, it's the conversion from the original image to byte array that is mysteriously updating the file type.
This is my method:-
public byte[] Get(string fullPath)
{
byte[] imageBytes = { };
var imageRequest = (HttpWebRequest)WebRequest.Create(fullPath);
var imageResponse = imageRequest.GetResponse();
var responseStream = imageResponse.GetResponseStream();
if (responseStream != null)
{
using (var br = new BinaryReader(responseStream))
{
imageBytes = br.ReadBytes(500000);
br.Close();
}
responseStream.Close();
}
imageResponse.Close();
return imageBytes;
}
I have also tried converting this to use MemoryStream instead.
I'm not sure what else I can do to ensure that this identifies the correct file type.
Edit
I have now updated the number of allowed bytes which has resulted in viable images.
However the issue with the JPEG files being altered to PNG is still ongoing.
It's only this selection of images that are affected.
These images were saved in an old CMS system so I do wonder if the way that they were saved is the cause?
Up to now, the code only reads 500,000 bytes for each file. If the file is larger than that, the end is truncated and the content is not valid anymore. In order to read all bytes, you can use the following code:
public byte[] Get(string fullPath)
{
List<byte> imageBytes = new List<byte>(500000);
var imageRequest = (HttpWebRequest)WebRequest.Create(fullPath);
using (var imageResponse = imageRequest.GetResponse())
{
using (var responseStream = imageResponse.GetResponseStream())
{
using (var br = new BinaryReader(responseStream))
{
var buffer = new byte[500000];
int bytesRead;
while ((bytesRead = br.Read(buffer, 0, buffer.length)) > 0)
{
imageBytes.AddRange(buffer);
}
}
}
}
return imageBytes.ToArray();
}
Above sample reads the data in chunks of 500,000 bytes - for most of your files, this should be sufficient. If a file is larger, the code reads more chunks until there are no more bytes to read. All the chunks are assembled in a list.
This asserts that all the bytes are read, even if the content is larger than 500,000 bytes.

After re-compressing data, the result differs from the uncompressed source. How do I reproduce the same data again?

I am given an byte array of compressed data, which takes up 27878 bytes. It starts with
78-9C-ED-BD-09
so it should be readable with DeflateStream, which it is. The uncompressed data is a XML file. I want to modify some values in this XML and then compress it again, to save it back to its source.
But even without any modifications, just decompressing and compressing again the result differs from the source, which causes the target-applications, which reads this byte array to crash.
For compression and uncompression I used these methods found in Stackoverflow
private static MemoryStream Decompress(byte[] input)
{
var output = new MemoryStream();
using (var compressStream = new MemoryStream(input))
using (var decompressor = new DeflateStream(compressStream, CompressionMode.Decompress))
decompressor.CopyTo(output);
output.Position = 0;
return output;
}
public byte[] Compress(byte[] input)
{
MemoryStream stream = new MemoryStream();
DeflateStream compressionStream =
new DeflateStream(stream, CompressionMode.Compress);
compressionStream.Write(input, 0, input.Length);
compressionStream.Close();
return stream.ToArray();
}
Edit 04/23/19
As it was pointed out be a comment, that has been deleted, the deflate methods are not suited to compress data. Instead, with the DotNetZip library and some work in this library it was possible to create the same data again!

Appending to MemoryStream

I'm trying to append some data to a stream. This works well with FileStream, but not for MemoryStream due to the fixed buffer size.
The method which writes data to the stream is separated from the method which creates the stream (I've simplified it greatly in the below example). The method which creates the stream is unaware of the length of data to be written to the stream.
public void Foo(){
byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
Stream s1 = new FileStream("someFile.txt", FileMode.Append, FileAccess.Write, FileShare.Read);
s1.Write(existingData, 0, existingData.Length);
Stream s2 = new MemoryStream(existingData, 0, existingData.Length, true);
s2.Seek(0, SeekOrigin.End); //move to end of the stream for appending
WriteUnknownDataToStream(s1);
WriteUnknownDataToStream(s2); // NotSupportedException is thrown as the MemoryStream is not expandable
}
public static void WriteUnknownDataToStream(Stream s)
{
// this is some example data for this SO query - the real data is generated elsewhere and is of a variable, and often large, size.
byte[] newBytesToWrite = System.Text.Encoding.UTF8.GetBytes("bar"); // the length of this is not known before the stream is created.
s.Write(newBytesToWrite, 0, newBytesToWrite.Length);
}
An idea I had was to send an expandable MemoryStream to the function, then append the returned data to the existing data.
public void ModifiedFoo()
{
byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
Stream s2 = new MemoryStream(); // expandable capacity memory stream
WriteUnknownDataToStream(s2);
// append the data which has been written into s2 to the existingData
byte[] buffer = new byte[existingData.Length + s2.Length];
Buffer.BlockCopy(existingData, 0, buffer, 0, existingData.Length);
Stream merger = new MemoryStream(buffer, true);
merger.Seek(existingData.Length, SeekOrigin.Begin);
s2.CopyTo(merger);
}
Any better (more efficient) solutions?
A possible solution is not to limit the capacity of the MemoryStream in the first place.
If you do not know in advance the total number of bytes you will need to write, create a MemoryStream with unspecified capacity and use it for both writes.
byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
MemoryStream ms = new MemoryStream();
ms.Write(existingData, 0, existingData.Length);
WriteUnknownData(ms);
This will no doubt be less performant than initializing a MemoryStream from a byte[], but if you need to continue writing to the stream I believe it is your only option.

How to use ICSharpCode.ZipLib with stream?

I'm very sorry for the conservative title and my question itself,but I'm lost.
The samples provided with ICsharpCode.ZipLib doesn't include what I'm searching for.
I want to decompress a byte[] by putting it in InflaterInputStream(ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream)
I found a decompress function ,but it doesn't work.
public static byte[] Decompress(byte[] Bytes)
{
ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream stream =
new ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream(new MemoryStream(Bytes));
MemoryStream memory = new MemoryStream();
byte[] writeData = new byte[4096];
int size;
while (true)
{
size = stream.Read(writeData, 0, writeData.Length);
if (size > 0)
{
memory.Write(writeData, 0, size);
}
else break;
}
stream.Close();
return memory.ToArray();
}
It throws an exception at line(size = stream.Read(writeData, 0, writeData.Length);) saying it has a invalid header.
My question is not how to fix the function,this function is not provided with the library,I just found it googling.My question is,how to decompress the same way the function does with InflaterStream,but without exceptions.
Thanks and again - sorry for the conservative question.
the code in lucene is very nice.
public static byte[] Compress(byte[] input) {
// Create the compressor with highest level of compression
Deflater compressor = new Deflater();
compressor.SetLevel(Deflater.BEST_COMPRESSION);
// Give the compressor the data to compress
compressor.SetInput(input);
compressor.Finish();
/*
* Create an expandable byte array to hold the compressed data.
* You cannot use an array that's the same size as the orginal because
* there is no guarantee that the compressed data will be smaller than
* the uncompressed data.
*/
MemoryStream bos = new MemoryStream(input.Length);
// Compress the data
byte[] buf = new byte[1024];
while (!compressor.IsFinished) {
int count = compressor.Deflate(buf);
bos.Write(buf, 0, count);
}
// Get the compressed data
return bos.ToArray();
}
public static byte[] Uncompress(byte[] input) {
Inflater decompressor = new Inflater();
decompressor.SetInput(input);
// Create an expandable byte array to hold the decompressed data
MemoryStream bos = new MemoryStream(input.Length);
// Decompress the data
byte[] buf = new byte[1024];
while (!decompressor.IsFinished) {
int count = decompressor.Inflate(buf);
bos.Write(buf, 0, count);
}
// Get the decompressed data
return bos.ToArray();
}
Well it sounds like the data is just inappropriate, and that otherwise the code would work okay. (Admittedly I'd use a "using" statement for the streams instead of calling Close explicitly.)
Where did you get your data from?
Why don't you use the System.IO.Compression.DeflateStream class (available since .Net 2.0)? This uses the same compression/decompression method but doesn't require an extra library dependency.
Since .Net 2.0 you only need the ICSharpCode.ZipLib if you need the file container support.

Base64 Encode a PDF in C#?

Can someone provide some light on how to do this? I can do this for regular text or byte array, but not sure how to approach for a pdf. do i stuff the pdf into a byte array first?
Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).
Byte[] fileBytes = File.ReadAllBytes(#"TestData\example.pdf");
var content = Convert.ToBase64String(fileBytes);
There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.
.Net includes an encoder that can do the chunking, but it's in kind of a weird place. They put it in the System.Security.Cryptography namespace.
I have tested the example code below, and I get identical output using either my method or Andrew's method above.
Here's how it works: You fire up a class called a CryptoStream. This is kind of an adapter that plugs into another stream. You plug a class called CryptoTransform into the CryptoStream (which in turn is attached to your file/memory/network stream) and it performs data transformations on the data while it's being read from or written to the stream.
Normally, the transformation is encryption/decryption, but .net includes ToBase64 and FromBase64 transformations as well, so we won't be encrypting, just encoding.
Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.
class Base64Encoder
{
public void Encode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.ToBase64Transform();
using(System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
{
// I'm going to use a 4k buffer, tune this as needed
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
cryptStream.Write(buffer, 0, bytesRead);
cryptStream.FlushFinalBlock();
}
}
public void Decode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(inFile, transform, System.Security.Cryptography.CryptoStreamMode.Read))
{
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
outFile.Write(buffer, 0, bytesRead);
outFile.Flush();
}
}
// this version of Encode pulls everything into memory at once
// you can compare the output of my Encode method above to the output of this one
// the output should be identical, but the crytostream version
// will use way less memory on a large file than this version.
public void MemoryEncode(string inFileName, string outFileName)
{
byte[] bytes = System.IO.File.ReadAllBytes(inFileName);
System.IO.File.WriteAllText(outFileName, System.Convert.ToBase64String(bytes));
}
}
I am also playing around with where I attach the CryptoStream. In the Encode method,I am attaching it to the output (writing) stream, so when I instance the CryptoStream, I use its Write() method.
When I read, I'm attaching it to the input (reading) stream, so I use the read method on the CryptoStream. It doesn't really matter which stream I attach it to. I just have to pass the appropriate Read or Write enumeration member to the CryptoStream's constructor.

Categories

Resources