My application needs to crypt parts of a stream into other streams, since some files have parts encrypted with one key and other parts with other keys. To support this, I tried to write a method that crypts a portion of a stream using an ICryptoTransform (yes, "crypt", since an ICryptoTransform can be either an encryptor or a decryptor) at a given offset and size, and it should be buffered.
This was my idea:
- Open a buffer stream for the data chunks.
- Open a CryptoStream, and pass the buffer stream into it.
- Read and hopefully crypt it like this:
  - Read a chunk (the size of bufferSize) from the input stream.
  - Write that chunk into the crypto stream (which should write it to the buffer stream).
  - Call .Flush() on that, to make sure the data has been crypted.
  - Write the contents of the buffer stream to the output stream.
  - Seek the buffer stream back to the beginning, so that a new chunk can be written and crypted in it.
- Repeat until offset + size has been reached on the input stream.
This is my current code:
public static void CryptStreamPartBuffered(Stream input, Stream output, ICryptoTransform transform, long offset, long size, int bufferSize = 4000000)
{
    using (MemoryStream ms = new MemoryStream()) // opening a memory stream, which will be the "buffer" for the crypted chunks
    {
        using (CryptoStream cs = new CryptoStream(ms, transform, CryptoStreamMode.Write)) // opening a cryptostream on the "buffer" stream, to crypt its contents
        {
            input.Seek(offset, SeekOrigin.Begin); // seeking input stream to given start offset
            byte[] buffer = new byte[bufferSize];
            while (input.Position < offset + size)
            {
                int remaining = bufferSize, bytesRead;
                while (remaining > 0 && (bytesRead = input.Read(buffer, 0, Math.Min(remaining, bufferSize))) > 0)
                {
                    remaining -= bytesRead;
                    cs.Write(buffer); // writing current chunk of data to crypto stream
                    cs.Flush(); // making sure that the crypto stream has done its work on the current chunk (although I'm not really sure whether this is the right thing to do)
                    output.Write(ms.ToArray()); // writing the (hopefully) crypted data to the output stream
                    ms.Seek(0, SeekOrigin.Begin); // re-seeking the chunk stream to its beginning, so that when the next chunk gets crypted, it can be written there
                }
            }
        }
    }
}
This straight-up doesn't work for files smaller than the default buffer size. I know this can be worked around by passing a smaller buffer size when calling the method, but it would be great if the method handled that itself.
However, this doesn't seem to do its job very well. I seem to get garbled data and I feel like I'm doing something wrong here, probably some very obvious mistake I just can't figure out.
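For reference, here is a minimal sketch of how the buffered idea above could be written without the intermediate MemoryStream, by wrapping the CryptoStream around the output stream directly. This is only a sketch, not the poster's code, and it assumes the CryptoStream constructor overload with a leaveOpen parameter (available in .NET Framework 4.7.2+ / .NET Core):
public static void CryptStreamPart(Stream input, Stream output, ICryptoTransform transform,
                                   long offset, long size, int bufferSize = 81920)
{
    input.Seek(offset, SeekOrigin.Begin);
    byte[] buffer = new byte[bufferSize];
    // leaveOpen: true keeps ownership of `output` with the caller when the CryptoStream is disposed
    using (var cs = new CryptoStream(output, transform, CryptoStreamMode.Write, leaveOpen: true))
    {
        long remaining = size;
        int bytesRead;
        while (remaining > 0 &&
               (bytesRead = input.Read(buffer, 0, (int)Math.Min(remaining, buffer.Length))) > 0)
        {
            cs.Write(buffer, 0, bytesRead); // only write the bytes actually read
            remaining -= bytesRead;
        }
        cs.FlushFinalBlock(); // emit the final (possibly padded) block
    }
}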
Related
I'm having a problem with writing an uncompressed GZIP stream using SharpZipLib's GZipInputStream. I only seem to be able to get 256 bytes worth of data with the rest not being written to and left zeroed. The compressed stream (compressedSection) has been checked and all data is there (1500+ bytes). The snippet of the decompression process is below:
int msiBuffer = 4096;
using (Stream msi = new MemoryStream(msiBuffer))
{
    msi.Write(compressedSection, 0, compressedSection.Length);
    msi.Position = 0;
    int uncompressedIntSize = AllMethods.GetLittleEndianInt(uncompressedSize, 0); // Gets little-endian value of uncompressed size into an integer

    // SharpZipLib GZip method called
    using (GZipInputStream decompressStream = new GZipInputStream(msi, uncompressedIntSize))
    {
        using (MemoryStream outputStream = new MemoryStream(uncompressedIntSize))
        {
            byte[] buffer = new byte[uncompressedIntSize];
            decompressStream.Read(buffer, 0, uncompressedIntSize); // Stream is decompressed and read
            outputStream.Write(buffer, 0, uncompressedIntSize);
            using (var fs = new FileStream(kernelSectionUncompressed, FileMode.Create, FileAccess.Write))
            {
                fs.Write(buffer, 0, buffer.Length);
                fs.Close();
            }
            outputStream.Close();
        }
        decompressStream.Close();
    }
}
So in this snippet:
1) The compressed section is passed in, ready to be decompressed.
2) The expected size of the uncompressed output (which is stored in a header with the file as a 2-byte little-endian value) is passed through a method to convert it to an integer (a sketch of such a helper appears at the end of this question). The header is removed earlier as it is not part of the compressed GZIP data.
3) SharpZipLib's GZIP stream is declared with the compressed file stream (msi) and a buffer equal to int uncompressedIntSize (I have tested with a static value of 4096 as well).
4) I set up a MemoryStream to handle writing the output to a file as GZipInputStream doesn't have Read/Write; it takes the expected decompressed file size as the argument (capacity).
5) The Read/Write of the stream needs byte[] array as the first argument, so I set up a byte[] array with enough space to take all the bytes of the decompressed output (3584 bytes in this case, derived from uncompressedIntSize).
6) The GZipInputStream decompressStream uses .Read with the buffer as the first argument, from offset 0, with uncompressedIntSize as the count. Checking the arguments here, the buffer array still has a capacity of 3584 bytes but has only been given 256 bytes of data; the rest are zeroes.
It looks like the output of .Read is being throttled to 256 bytes but I'm not sure where. Is there something I've missed with the Streams, or is this a limitation with .Read?
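(For illustration only: the question's AllMethods.GetLittleEndianInt isn't shown, but a 2-byte little-endian read like the one described in step 2 might look roughly like this hypothetical helper.)
static int GetLittleEndianInt16(byte[] data, int offset)
{
    // low byte first, then high byte
    return data[offset] | (data[offset + 1] << 8);
}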
You need to loop when reading from a stream; the lazy way is probably:
decompressStream.CopyTo(outputStream);
(but this doesn't guarantee to stop after uncompressedIntSize bytes - it'll try to read to the end of decompressStream)
A more manual version (that respects an imposed length limit) would be:
const int BUFFER_SIZE = 1024; // whatever
var buffer = ArrayPool<byte>.Shared.Rent(BUFFER_SIZE);
try
{
    int remaining = uncompressedIntSize, bytesRead;
    while (remaining > 0 && // more to do, and making progress
           (bytesRead = decompressStream.Read(
               buffer, 0, Math.Min(remaining, buffer.Length))) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
    if (remaining != 0) throw new EndOfStreamException();
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer);
}
The issue turned out to be an oversight I'd made earlier in the posted code:
The file I'm working with has 27 sections which are GZipped, but each of them has a header which will break the GZip decompression if the GZipInputStream hits any of them. When opening the base file, it was starting from the beginning (adjusted by 6 to avoid the first header) each time instead of going to the next post-header offset:
brg.BaseStream.Seek(6, SeekOrigin.Begin);
Instead of:
brg.BaseStream.Seek(absoluteSectionOffset, SeekOrigin.Begin);
This meant that the extracted compressed data was an amalgam of the first headerless section plus part of the 2nd section along with its header. As the first section is 256 bytes long without its header, that part was being decompressed correctly by the GZipInputStream. But after that come 6 bytes of header, which break decompression, resulting in the rest of the output being 0x00s.
There was no explicit error thrown by the GZipInputStream when this happened, so I'd incorrectly assumed that the cause was the .Read, or something in the stream retaining data from the previous pass. Sorry for the hassle.
Following feedback from Alexei, a simplification of the question:
How do I use a buffered Stream approach to convert the contents of a CryptoStream (using ToBase64Transform) into a StreamWriter (Unicode encoding) without using Convert.ToBase64String()?
Note: Calling Convert.ToBase64String() throws OutOfMemoryException, hence the need for a buffered/Stream approach to the conversion.
You should probably implement a custom Stream, not a TextWriter. It is much easier to compose streams than writers (e.g. passing your stream into a compressing stream).
To create a custom stream, derive from Stream and implement at least Write and Flush (and Read if you need a read/write stream). The rest is more or less optional and depends on your additional needs; a regular copy to another stream does not need anything else.
In the constructor, take the inner stream you will be writing to. Base64 always produces ASCII characters, so it is easy to write the output as UTF-8 (with or without a BOM) directly to a stream, but if you want to specify the encoding you can wrap the inner stream with a StreamWriter internally.
In your Write implementation, buffer data until you have a block whose length is a multiple of 3 bytes (e.g. 300) and call Convert.ToBase64String on that portion. Make sure not to lose the not-yet-converted remainder. Since Base64 converts 3 bytes to 4 characters, converting blocks whose size is a multiple of 3 never produces =/== padding at the end, so the blocks can simply be concatenated. Write each converted portion to the inner stream/writer. Note that you want to limit the block size to something relatively small, like 3*10000, to avoid allocating your blocks on the large object heap.
In Flush, make sure to convert the last unwritten bytes (this will be the only block with = padding at the end) and write them to the stream too.
For reading you need to be more careful, since whitespace is allowed in Base64, so you cannot simply read a fixed number of characters and convert them to bytes. The easiest approach would be to read character by character from a StreamReader and convert every 4 non-space characters to bytes.
Note: you can consider writing/reading Base64 by hand directly from bytes. It will give you some performance benefits, but may be hard if you are not good with bit shifting.
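To make the above concrete, here is a minimal sketch of such a write-only stream. The class name and block size are illustrative, and it follows the description above, so Flush is meant to be called once at the very end, since it emits the final padded block:
using System;
using System.IO;
using System.Text;

class Base64WriteStream : Stream
{
    private readonly StreamWriter _writer;                   // wraps the inner stream with the desired encoding
    private readonly byte[] _pending = new byte[3 * 1024];   // block size: a multiple of 3, small enough to stay off the LOH
    private int _pendingCount;

    public Base64WriteStream(Stream inner, Encoding encoding)
    {
        _writer = new StreamWriter(inner, encoding);
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int toCopy = Math.Min(count, _pending.Length - _pendingCount);
            Buffer.BlockCopy(buffer, offset, _pending, _pendingCount, toCopy);
            _pendingCount += toCopy;
            offset += toCopy;
            count -= toCopy;

            if (_pendingCount == _pending.Length)
            {
                // Full multiple-of-3 block: no '=' padding, so blocks can be concatenated safely.
                _writer.Write(Convert.ToBase64String(_pending, 0, _pendingCount));
                _pendingCount = 0;
            }
        }
    }

    public override void Flush()
    {
        if (_pendingCount > 0)
        {
            // The final (possibly padded) block.
            _writer.Write(Convert.ToBase64String(_pending, 0, _pendingCount));
            _pendingCount = 0;
        }
        _writer.Flush();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            Flush();
            _writer.Dispose(); // disposing the writer also closes the inner stream
        }
        base.Dispose(disposing);
    }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
Usage would then be something like: using (var b64 = new Base64WriteStream(File.Create("out.txt"), Encoding.Unicode)) { source.CopyTo(b64); }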
Please try using the following to encrypt. I am using fileName/filePath as input; you can adjust it as per your requirements. Using this I have encrypted a file of over 1 GB successfully, without any OutOfMemoryException.
public bool EncryptUsingStream(string inputFileName, string outputFileName)
{
    bool success = false;
    // here assuming that you already have a key (16 bytes = 128 bits for AES/Rijndael)
    byte[] key = new byte[16];
    SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create();
    algorithm.Key = key;
    using (ICryptoTransform transform = algorithm.CreateEncryptor())
    {
        CryptoStream cs = null;
        FileStream fsEncrypted = null;
        try
        {
            using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
            {
                // First write the IV
                fsEncrypted = new FileStream(outputFileName, FileMode.Create, FileAccess.Write);
                fsEncrypted.Write(algorithm.IV, 0, algorithm.IV.Length);

                // then write using the stream
                cs = new CryptoStream(fsEncrypted, transform, CryptoStreamMode.Write);
                int bytesRead;
                int _bufferSize = 1048576; // buffer size = 1 MB
                byte[] buffer = new byte[_bufferSize];
                do
                {
                    bytesRead = fsInput.Read(buffer, 0, _bufferSize);
                    cs.Write(buffer, 0, bytesRead);
                } while (bytesRead > 0);
                success = true;
            }
        }
        catch (Exception ex)
        {
            // handle exception or throw.
        }
        finally
        {
            if (cs != null)
            {
                cs.Close();
                ((IDisposable)cs).Dispose();
                if (fsEncrypted != null)
                {
                    fsEncrypted.Close();
                }
            }
        }
    }
    return success;
}
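For completeness, a sketch of the matching decryption under the same assumptions (the IV is read back from the start of the file that EncryptUsingStream produced; the method name and the key parameter are mine, not part of the original answer):
public bool DecryptUsingStream(string inputFileName, string outputFileName, byte[] key)
{
    using (SymmetricAlgorithm algorithm = SymmetricAlgorithm.Create())
    {
        algorithm.Key = key;
        using (FileStream fsInput = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
        {
            // Read back the IV that EncryptUsingStream wrote at the start of the file.
            byte[] iv = new byte[algorithm.BlockSize / 8];
            int read = 0;
            while (read < iv.Length)
            {
                int n = fsInput.Read(iv, read, iv.Length - read);
                if (n == 0) return false; // file too short to contain an IV
                read += n;
            }
            algorithm.IV = iv;

            using (ICryptoTransform transform = algorithm.CreateDecryptor())
            using (FileStream fsOutput = new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
            using (CryptoStream cs = new CryptoStream(fsInput, transform, CryptoStreamMode.Read))
            {
                cs.CopyTo(fsOutput); // streams the plaintext out in chunks, no large allocations
            }
        }
    }
    return true;
}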
The issue is as follows: I am using an HttpWebRequest to request some online data from dmo.gov.uk. I read the response using a BinaryReader and write it to a MemoryStream. I have packaged the code into a simple test method:
public static byte[] Test(int bufferSize)
{
    var request = (HttpWebRequest)WebRequest.Create("http://www.dmo.gov.uk/xmlData.aspx?rptCode=D3B.2");
    request.Method = "GET";
    request.Credentials = CredentialCache.DefaultCredentials;
    var buffer = new byte[bufferSize];
    using (var httpResponse = (HttpWebResponse)request.GetResponse())
    {
        using (var ms = new MemoryStream())
        {
            using (var reader = new BinaryReader(httpResponse.GetResponseStream()))
            {
                int bytesRead;
                while ((bytesRead = reader.Read(buffer, 0, bufferSize)) > 0)
                {
                    ms.Write(buffer, 0, bytesRead);
                }
            }
            return ms.GetBuffer();
        }
    }
}
My real-life code usually uses a buffer size of 2048 bytes; however, I noticed today that this file has a huge number of empty bytes (\0) at the end, which bloats the file size. As a test I tried increasing the buffer size to nearly the file size I expected (I was expecting ~80 KB, so I made the buffer size 79000), and now I get the right file size. But I'm confused: I expected to get the same file size regardless of the buffer size used to read the data.
The following test:
Console.WriteLine(Test(2048).Length);
Console.WriteLine(Test(79000).Length);
Console.ReadLine();
Yields the following output:
131072
81341
The second figure, using the large buffer size, is the exact file size I was expecting (this file changes daily, so expect the size to differ after today's date). The first figure contains \0 for everything after the expected file size.
What's going on here?
You should change ms.GetBuffer(); to ms.ToArray();.
GetBuffer returns the MemoryStream's entire internal buffer, including the unused (zeroed) capacity at the end, while ToArray returns only the bytes that have actually been written to the MemoryStream.
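A quick illustration of the difference (the exact internal capacity depends on the MemoryStream's growth policy, so treat the numbers as indicative):
using (var ms = new MemoryStream())
{
    ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

    byte[] raw = ms.GetBuffer();  // the internal buffer: typically larger than 3 bytes, padded with zeroes
    byte[] data = ms.ToArray();   // exactly 3 bytes: only what was actually written

    Console.WriteLine(raw.Length);  // e.g. 256
    Console.WriteLine(data.Length); // 3
}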
I'm trying to append some data to a stream. This works well with FileStream, but not for MemoryStream due to the fixed buffer size.
The method which writes data to the stream is separated from the method which creates the stream (I've simplified it greatly in the below example). The method which creates the stream is unaware of the length of data to be written to the stream.
public void Foo()
{
    byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");

    Stream s1 = new FileStream("someFile.txt", FileMode.Append, FileAccess.Write, FileShare.Read);
    s1.Write(existingData, 0, existingData.Length);

    Stream s2 = new MemoryStream(existingData, 0, existingData.Length, true);
    s2.Seek(0, SeekOrigin.End); // move to end of the stream for appending

    WriteUnknownDataToStream(s1);
    WriteUnknownDataToStream(s2); // NotSupportedException is thrown as the MemoryStream is not expandable
}

public static void WriteUnknownDataToStream(Stream s)
{
    // this is some example data for this SO query - the real data is generated elsewhere and is of a variable, and often large, size.
    byte[] newBytesToWrite = System.Text.Encoding.UTF8.GetBytes("bar"); // the length of this is not known before the stream is created.
    s.Write(newBytesToWrite, 0, newBytesToWrite.Length);
}
An idea I had was to send an expandable MemoryStream to the function, then append the returned data to the existing data.
public void ModifiedFoo()
{
    byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");

    Stream s2 = new MemoryStream(); // expandable-capacity memory stream
    WriteUnknownDataToStream(s2);

    // append the data which has been written into s2 to the existingData
    byte[] buffer = new byte[existingData.Length + s2.Length];
    Buffer.BlockCopy(existingData, 0, buffer, 0, existingData.Length);

    Stream merger = new MemoryStream(buffer, true);
    merger.Seek(existingData.Length, SeekOrigin.Begin);
    s2.Position = 0; // rewind s2 before copying, otherwise CopyTo copies nothing
    s2.CopyTo(merger);
}
Any better (more efficient) solutions?
A possible solution is not to limit the capacity of the MemoryStream in the first place.
If you do not know in advance the total number of bytes you will need to write, create a MemoryStream with unspecified capacity and use it for both writes.
byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
MemoryStream ms = new MemoryStream();
ms.Write(existingData, 0, existingData.Length);
WriteUnknownData(ms);
This will no doubt be less performant than initializing a MemoryStream from a byte[], but if you need to continue writing to the stream I believe it is your only option.
Can someone shed some light on how to do this (Base64-encode a PDF)? I can do it for regular text or a byte array, but I'm not sure how to approach it for a PDF. Do I stuff the PDF into a byte array first?
Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).
Byte[] fileBytes = File.ReadAllBytes(@"TestData\example.pdf");
var content = Convert.ToBase64String(fileBytes);
There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.
.Net includes an encoder that can do the chunking, but it's in kind of a weird place. They put it in the System.Security.Cryptography namespace.
I have tested the example code below, and I get identical output using either my method or Andrew's method above.
Here's how it works: You fire up a class called a CryptoStream. This is kind of an adapter that plugs into another stream. You plug an ICryptoTransform into the CryptoStream (which in turn is attached to your file/memory/network stream), and it performs transformations on the data while it's being read from or written to the stream.
Normally, the transformation is encryption/decryption, but .net includes ToBase64 and FromBase64 transformations as well, so we won't be encrypting, just encoding.
Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.
class Base64Encoder
{
    public void Encode(string inFileName, string outFileName)
    {
        System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.ToBase64Transform();
        using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
                                    outFile = System.IO.File.Create(outFileName))
        using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
        {
            // I'm going to use a 4k buffer, tune this as needed
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
                cryptStream.Write(buffer, 0, bytesRead);
            cryptStream.FlushFinalBlock();
        }
    }

    public void Decode(string inFileName, string outFileName)
    {
        System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
        using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
                                    outFile = System.IO.File.Create(outFileName))
        using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(inFile, transform, System.Security.Cryptography.CryptoStreamMode.Read))
        {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
                outFile.Write(buffer, 0, bytesRead);
            outFile.Flush();
        }
    }

    // this version of Encode pulls everything into memory at once
    // you can compare the output of my Encode method above to the output of this one
    // the output should be identical, but the cryptostream version
    // will use way less memory on a large file than this version.
    public void MemoryEncode(string inFileName, string outFileName)
    {
        byte[] bytes = System.IO.File.ReadAllBytes(inFileName);
        System.IO.File.WriteAllText(outFileName, System.Convert.ToBase64String(bytes));
    }
}
I am also playing around with where I attach the CryptoStream. In the Encode method, I am attaching it to the output (writing) stream, so when I instantiate the CryptoStream, I use its Write() method.
When I decode, I'm attaching it to the input (reading) stream, so I use its Read() method. It doesn't really matter which stream I attach it to; I just have to pass the appropriate Read or Write enumeration member to the CryptoStream's constructor.
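For example, a Decode variant with the CryptoStream attached to the output (write) side instead would look roughly like this (my own variation for comparison, not part of the original class):
public void DecodeViaWriteSide(string inFileName, string outFileName)
{
    System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
    using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
                                outFile = System.IO.File.Create(outFileName))
    using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
    {
        byte[] buffer = new byte[4096];
        int bytesRead;
        while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
            cryptStream.Write(buffer, 0, bytesRead); // Base64 text goes in, decoded bytes come out into outFile
        cryptStream.FlushFinalBlock();
    }
}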