Following feedback from Alexei, a simplification of the question:
How do I use a buffered Stream approach to convert the contents of a CryptoStream (using ToBase64Transform) into a StreamWriter (Unicode encoding) without using Convert.ToBase64String()?
Note: Calling Convert.ToBase64String() throws OutOfMemoryException, hence the need for a buffered/Stream approach to the conversion.
You probably should implement custom Stream, not a TextWriter. It is much easier to compose streams than writers (like pass your stream to compressed stream).
To create custom stream - derive from Stream and implement at least Write and Flush (and Read if you need R/W stream). The rest is more or less optional and depends on you additional needs, regular copy to other stream does not need anything else.
In constructor get inner stream passed to you for writing to. Base64 is always producing ASCII characters, so it should be easy to write output as UTF-8 with or without BOM directly to a stream, but if you want to specify encoding you can wrap inner stream with StreamWriter internally.
In your Write implementation buffer data till you get enough bytes to have block of multiple of 3 bytes (i.e. 300) and call Convert.ToBase64String on that portion. Make sure not to loose not-yet-converted portion. Since Base64 converts 3 bytes to 4 characters converting in blocks of multiple of 3 size will never have =/== padding at the end and can be concatenated with next block. So write that converted portion into inner stream/writer. Note that you want to limit block size to something relatively small like 3*10000 to avoid allocation of your blocks on large objects heap.
In Flush make sure to convert the last unwritten bytes (this will be the only one with = padding at the end) and write it to the stream too.
For reading you may need to be more careful as in Base64 white spaces are allowed, so you can't read fixed number of characters and convert to bytes. The easiest approach would be to read by character from StreamReader and convert each 4 non-space ones to bytes.
Note: you can consider writing/reading Base64 by hand directly from bytes. It will give you some performance benefits, but may be hard if you are not good with bit shifting.
Please try using following to encrypt. I am using fileName/filePath as input. You can adjust it as per your requirement. Using this I have encrypted over 1 gb file successfully without any out of memory exception.
public bool EncryptUsingStream(string inputFileName, string outputFileName)
{
bool success = false;
// here assuming that you already have key
byte[] key = new byte[128];
SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
algorithm.Key = key;
using (ICryptoTransform transform = algorithm.CreateEncryptor())
{
CryptoStream cs = null;
FileStream fsEncrypted = null;
try
{
using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
{
//First write IV
fsEncrypted = new FileStream(outputFileName, FileMode.Create, FileAccess.Write);
fsEncrypted.Write(algorithm.IV, 0, algorithm.IV.Length);
//then write using stream
cs = new CryptoStream(fsEncrypted, transform, CryptoStreamMode.Write);
int bytesRead;
int _bufferSize = 1048576; //buggersize = 1mb;
byte[] buffer = new byte[_bufferSize];
do
{
bytesRead = fsInput.Read(buffer, 0, _bufferSize);
cs.Write(buffer, 0, bytesRead);
} while (bytesRead > 0);
success = true;
}
}
catch (Exception ex)
{
//handle exception or throw.
}
finally
{
if (cs != null)
{
cs.Close();
((IDisposable)cs).Dispose();
if (fsEncrypted != null)
{
fsEncrypted.Close();
}
}
}
}
return success;
}
Related
I'm having a problem with writing an uncompressed GZIP stream using SharpZipLib's GZipInputStream. I only seem to be able to get 256 bytes worth of data with the rest not being written to and left zeroed. The compressed stream (compressedSection) has been checked and all data is there (1500+ bytes). The snippet of the decompression process is below:
int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
msi.Write(compressedSection, 0, compressedSection.Length);
msi.Position = 0;
int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little endian value of uncompressed size into an integer
// SharpZipLib GZip method called
using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
{
using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
{
byte[] buffer = new byte[uncompressedIntSize];
decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read
outputStream.Write(buffer, 0, uncompressedIntSize);
using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
{
fs.Write(buffer, 0, buffer.Length);
fs.Close();
}
outputStream.Close();
}
decompressStream.Close();
So in this snippet:
1) The compressed section is passed in, ready to be decompressed.
2) The expected size of the uncompressed output (which is stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to integer. The header is removed earlier as it is not part of the compressed GZIP file.
3) SharpLibZip's GZIP stream is declared with the compressed file stream (msi) and a buffer equal to int uncompressedIntSize (have tested with a static value of 4096 as well).
4) I set up a MemoryStream to handle writing the output to a file as GZipInputStream doesn't have Read/Write; it takes the expected decompressed file size as the argument (capacity).
5) The Read/Write of the stream needs byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).
6) int GzipInputStream decompressStream uses .Read with the buffer as first argument, from offset 0, using the uncompressedIntSize as the count. Checking the arguments in here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data. The rest are zeroes.
It looks like the output of .Read is being throttled to 256 bytes but I'm not sure where. Is there something I've missed with the Streams, or is this a limitation with .Read?
You need to loop when reading from a stream; the lazy way is probably:
decompressStream.CopyTo(outputStream);
(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)
A more manual version (that respects an imposed length limit) would be:
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
int remaining = uncompressedIntSize, bytesRead;
while (remaining > 0 && // more to do, and making progress
(bytesRead = decompressStream.Read(
buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
{
outputStream.Write(buffer, 0, bytesRead);
remaining -= bytesRead;
}
if (remaining != 0) throw new EndOfStreamException();
}
finally
{
ArrayPool<byte>.Shared.Return(buffer);
}
The issue turned out to be an oversight I'd made earlier in the posted code:
The file I'm working with has 27 sections which are GZipped, but they each have a header which will break the Gzip decompression if the GZipInput stream hits any of them. When opening the base file, it was starting from the beginning (adjusted by 6 to avoid the first header) each time instead of going to the next post-head offset:
brg.BaseStream.Seek(6, SeekOrigin.Begin);
Instead of:
brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
This meant that the extracted compressed data was an amalgam of the first headerless section + part of the 2nd section along with its header. As the first section is 256 bytes long without its header, this part was being decompressed correctly by the GZipInput stream. But after that is 6-bytes of header which breaks it, resulting in the rest of the output being 00s.
There was no explicit error being thrown by the GZipInput stream when this happened, so I'd incorrectly assumed that the cause was the .Read or something in the stream retaining data from the previous pass. Sorry for the hassle.
I'm trying to get a byte[] array filled with request response, without any extra garbage data.
This is how I fetch the data:
using (Stream MyResponseStream = hwresponse.GetResponseStream())
{
byte[] MyBuffer = new byte[4096];
int BytesRead;
while (0 < (BytesRead = MyResponseStream.Read(MyBuffer, 0, MyBuffer.Length)))
{
ByteArrayToFile("request.txt", MyBuffer);
}
}
I use the function 'ByteArrayToFile' to see what data has been recieved.
public void ByteArrayToFile(string _FileName, byte[] _ByteArray)
{
System.IO.FileStream _FileStream = new System.IO.FileStream(_FileName, System.IO.FileMode.Append, System.IO.FileAccess.Write);
_FileStream.Write(_ByteArray, 0, _ByteArray.Length);
_FileStream.Close();
}
I get request written to the file, but a lot of 'null' characters are added at the end. How do I trim them? Since I'm going to need this to handle binary files, how can I safely trim out the endings and have just pure array of response? Thanks!
You need to utilise the value BytesRead, this will indicate exactly how many bytes were received:
public void ByteArrayToFile(string _FileName, byte[] _ByteArray, int _BytesRead)
{
using (var _FileStream = new FileStream(
_FileName, FileMode.Append, FileAccess.Write))
{
_FileStream.Write(_ByteArray, 0, _BytesRead);
}
}
Otherwise you're writing out an array of length X which has only been populated with Y number of elements, causing a number of 'unused' elements in the array to also be written out. There is also the possibility of stale data remaining in the buffer with a pass, meaning misinformation could also end up being written out with the next write.
You should also dispose of FileStream instances when done (although Close does this for a Stream, I'd recommend the consistency of calling Dispose in one of two ways: explicitly or as illustrated in the code above, implicitly using the using construct).
I'm trying to inflate a string using zlib's deflate, but it's failing, apparently because it doesn't have the right header. I read elsewhere that the C# solution to this problem is:
public static byte[] FlateDecode(byte[] inp, bool strict) {
MemoryStream stream = new MemoryStream(inp);
InflaterInputStream zip = new InflaterInputStream(stream);
MemoryStream outp = new MemoryStream();
byte[] b = new byte[strict ? 4092 : 1];
try {
int n;
while ((n = zip.Read(b, 0, b.Length)) > 0) {
outp.Write(b, 0, n);
}
zip.Close();
outp.Close();
return outp.ToArray();
}
catch {
if (strict)
return null;
return outp.ToArray();
}
}
But I know nothing about C#. I can surmise that all it's doing is adding a prefix to the string, but what that prefix is, I have no idea. Would someone be able to phrase this function (or even just the header creation and string concatenation) in C++?
The data which I'm trying to inflate is taken from a PDF using zlib deflation.
Thanks a million,
Wyatt
I've had better luck using SharpZipLib for zlib interop than with the native .Net Framework classes. This correctly handles streams from C++ (zlib native) and from Java's compression classes without any funny business being needed.
I can't see any prefixes, sorry. Here's what the logic appears to be; sorry this isn't in C++:
MemoryStream stream = new MemoryStream(inp);
InflaterInputStream zip = new InflaterInputStream(stream);
Create an inflate stream from the data passed
MemoryStream outp = new MemoryStream();
Create a memory buffer stream for output
byte[] b = new byte[strict ? 4092 : 1];
try {
int n;
while ((n = zip.Read(b, 0, b.Length)) > 0) {
If you're in strict mode, read up to 4092 bytes - or 1 in non-strict mode - into a byte buffer
outp.Write(b, 0, n);
Write all the bytes decoded (may be less than the 4092) to the output memory buffer stream
zip.Close();
outp.Close();
return outp.ToArray();
Clean up, and return the output memory buffer stream as an array.
I'm a bit confused, though: why not just cut array b off at n elements and return that rather than go via a MemoryStream? The code also ought really to take care to clean up the memory streams and zip on exception (e.g. using using) since they're all IDisposable but I guess that's not really important since they don't correspond to I/O file handles, only memory structures.
What is the best method to convert a Stream to a FileStream using C#.
The function I am working on has a Stream passed to it containing uploaded data, and I need to be able to perform stream.Read(), stream.Seek() methods which are methods of the FileStream type.
A simple cast does not work, so I'm asking here for help.
Read and Seek are methods on the Stream type, not just FileStream. It's just that not every stream supports them. (Personally I prefer using the Position property over calling Seek, but they boil down to the same thing.)
If you would prefer having the data in memory over dumping it to a file, why not just read it all into a MemoryStream? That supports seeking. For example:
public static MemoryStream CopyToMemory(Stream input)
{
// It won't matter if we throw an exception during this method;
// we don't *really* need to dispose of the MemoryStream, and the
// caller should dispose of the input stream
MemoryStream ret = new MemoryStream();
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
ret.Write(buffer, 0, bytesRead);
}
// Rewind ready for reading (typical scenario)
ret.Position = 0;
return ret;
}
Use:
using (Stream input = ...)
{
using (Stream memory = CopyToMemory(input))
{
// Seek around in memory to your heart's content
}
}
This is similar to using the Stream.CopyTo method introduced in .NET 4.
If you actually want to write to the file system, you could do something similar that first writes to the file then rewinds the stream... but then you'll need to take care of deleting it afterwards, to avoid littering your disk with files.
Can someone provide some light on how to do this? I can do this for regular text or byte array, but not sure how to approach for a pdf. do i stuff the pdf into a byte array first?
Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).
Byte[] fileBytes = File.ReadAllBytes(#"TestData\example.pdf");
var content = Convert.ToBase64String(fileBytes);
There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.
.Net includes an encoder that can do the chunking, but it's in kind of a weird place. They put it in the System.Security.Cryptography namespace.
I have tested the example code below, and I get identical output using either my method or Andrew's method above.
Here's how it works: You fire up a class called a CryptoStream. This is kind of an adapter that plugs into another stream. You plug a class called CryptoTransform into the CryptoStream (which in turn is attached to your file/memory/network stream) and it performs data transformations on the data while it's being read from or written to the stream.
Normally, the transformation is encryption/decryption, but .net includes ToBase64 and FromBase64 transformations as well, so we won't be encrypting, just encoding.
Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.
class Base64Encoder
{
public void Encode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.ToBase64Transform();
using(System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
{
// I'm going to use a 4k buffer, tune this as needed
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
cryptStream.Write(buffer, 0, bytesRead);
cryptStream.FlushFinalBlock();
}
}
public void Decode(string inFileName, string outFileName)
{
System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
outFile = System.IO.File.Create(outFileName))
using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(inFile, transform, System.Security.Cryptography.CryptoStreamMode.Read))
{
byte[] buffer = new byte[4096];
int bytesRead;
while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
outFile.Write(buffer, 0, bytesRead);
outFile.Flush();
}
}
// this version of Encode pulls everything into memory at once
// you can compare the output of my Encode method above to the output of this one
// the output should be identical, but the crytostream version
// will use way less memory on a large file than this version.
public void MemoryEncode(string inFileName, string outFileName)
{
byte[] bytes = System.IO.File.ReadAllBytes(inFileName);
System.IO.File.WriteAllText(outFileName, System.Convert.ToBase64String(bytes));
}
}
I am also playing around with where I attach the CryptoStream. In the Encode method,I am attaching it to the output (writing) stream, so when I instance the CryptoStream, I use its Write() method.
When I read, I'm attaching it to the input (reading) stream, so I use the read method on the CryptoStream. It doesn't really matter which stream I attach it to. I just have to pass the appropriate Read or Write enumeration member to the CryptoStream's constructor.