I was trying to use CryptoStream with the AWS .NET SDK, but it failed because Seek is not supported on CryptoStream. I read somewhere that when the content length is known, it should be possible to add this capability to CryptoStream. I would like to know how to do this; any sample code would be useful too.
I have a method like this, which is passed a FileStream and returns a CryptoStream. I assign the returned Stream object to the InputStream property of the AWS SDK PutObjectRequest object.
public static Stream GetEncryptStream(Stream existingStream,
SymmetricAlgorithm cryptoServiceProvider,
string encryptionKey, string encryptionIV)
{
cryptoServiceProvider.Key = ASCIIEncoding.ASCII.GetBytes(encryptionKey);
cryptoServiceProvider.IV = ASCIIEncoding.ASCII.GetBytes(encryptionIV);
CryptoStream cryptoStream = new CryptoStream(existingStream,
cryptoServiceProvider.CreateEncryptor(), CryptoStreamMode.Read);
return cryptoStream ;
}
Generally with encryption there isn't a 1:1 mapping between input bytes and output bytes, so in order to seek backwards (in particular) it would have to do a lot of work - perhaps even going right back to the start and moving forwards processing the data to consume [n] bytes from the decrypted stream. Even if it knew where each byte mapped to, the state of the encryption is dependent on the data that came before it (it isn't a decoder ring ;p), so again - it would either have to read from the start (and reset back to the initialisation-vector), or it would have to track snapshots of positions and crypto-states, and go back to the nearest snapshot, then walk forwards. Lots of work and storage.
This would apply to seeking relative to either end, too.
Moving forwards from the current position wouldn't be too bad, but again you'd have to process the data - not just jump the base-stream's position.
There isn't a good way to implement this that most consumers could use - normally if you get a true from CanSeek that means "random access", but that is not efficient in this case.
As a workaround - consider copying the decrypted data into a MemoryStream or a file; then you can access the fully decrypted data in a random-access fashion.
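For example, a minimal sketch of that workaround, where source is the encrypted input and decryptor is an ICryptoTransform matching its key/IV (both names are placeholders for whatever your code already has):
// Pay the full decryption cost once, up front, into a seekable MemoryStream.
public static MemoryStream DecryptToSeekable(Stream source, ICryptoTransform decryptor)
{
    var result = new MemoryStream();
    using (var crypto = new CryptoStream(source, decryptor, CryptoStreamMode.Read))
    {
        crypto.CopyTo(result);
    }
    result.Position = 0; // random access is now cheap
    return result;
}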
It is simpler than it sounds: generate a keystream the same size as the data, driven by the stream position (stream.Position), by encrypting a block counter with ECB (or any other block cipher you like), and then XOR it with the data; this is essentially CTR mode. It is seekable, very fast, and a 1-to-1 encryption, so the output length is exactly the same as the input length. It is memory efficient and you can use it on huge files. I think this method is used in modern WinZip AES encryption too. The only thing that you MUST be careful about is the salt:
Use a unique salt for each stream otherwise there is no encryption.
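For example, a fresh random salt per stream can be generated like this (a small sketch; remember to persist the salt next to the ciphertext, since you need it again to decrypt):
var salt = new byte[16];
using (var rng = System.Security.Cryptography.RandomNumberGenerator.Create())
    rng.GetBytes(salt); // unique per stream; never reuse a salt with the same password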
public class SeekableAesStream : Stream
{
private Stream baseStream;
private AesManaged aes;
private ICryptoTransform encryptor;
public bool autoDisposeBaseStream { get; set; } = true;
/// <param name="salt">//** WARNING **: MUST be unique for each stream otherwise there is NO security</param>
public SeekableAesStream(Stream baseStream, string password, byte[] salt)
{
this.baseStream = baseStream;
using (var key = new PasswordDeriveBytes(password, salt)) // legacy key-derivation API; kept as-is here, but Rfc2898DeriveBytes (PBKDF2) is the modern choice
{
aes = new AesManaged();
aes.KeySize = 128;
aes.Mode = CipherMode.ECB;
aes.Padding = PaddingMode.None;
aes.Key = key.GetBytes(aes.KeySize / 8);
aes.IV = new byte[16]; //useless for ECB
encryptor = aes.CreateEncryptor(aes.Key, aes.IV);
}
}
private void cipher(byte[] buffer, int offset, int count, long streamPos)
{
//find block number
var blockSizeInByte = aes.BlockSize / 8;
var blockNumber = (streamPos / blockSizeInByte) + 1;
var keyPos = streamPos % blockSizeInByte;
//buffer
var outBuffer = new byte[blockSizeInByte];
var nonce = new byte[blockSizeInByte];
var init = false;
for (int i = offset; i < offset + count; i++) // upper bound is offset + count, not count
{
//encrypt the nonce to form next xor buffer (unique key)
if (!init || (keyPos % blockSizeInByte) == 0)
{
BitConverter.GetBytes(blockNumber).CopyTo(nonce, 0);
encryptor.TransformBlock(nonce, 0, nonce.Length, outBuffer, 0);
if (init) keyPos = 0;
init = true;
blockNumber++;
}
buffer[i] ^= outBuffer[keyPos]; //simple XOR with generated unique key
keyPos++;
}
}
public override bool CanRead { get { return baseStream.CanRead; } }
public override bool CanSeek { get { return baseStream.CanSeek; } }
public override bool CanWrite { get { return baseStream.CanWrite; } }
public override long Length { get { return baseStream.Length; } }
public override long Position { get { return baseStream.Position; } set { baseStream.Position = value; } }
public override void Flush() { baseStream.Flush(); }
public override void SetLength(long value) { baseStream.SetLength(value); }
public override long Seek(long offset, SeekOrigin origin) { return baseStream.Seek(offset, origin); }
public override int Read(byte[] buffer, int offset, int count)
{
var streamPos = Position;
var ret = baseStream.Read(buffer, offset, count);
cipher(buffer, offset, count, streamPos);
return ret;
}
public override void Write(byte[] buffer, int offset, int count)
{
cipher(buffer, offset, count, Position);
baseStream.Write(buffer, offset, count);
}
protected override void Dispose(bool disposing)
{
if (disposing)
{
encryptor?.Dispose();
aes?.Dispose();
if (autoDisposeBaseStream)
baseStream?.Dispose();
}
base.Dispose(disposing);
}
}
Usage:
static void test()
{
var buf = new byte[255];
for (byte i = 0; i < buf.Length; i++)
buf[i] = i;
//encrypting
var uniqueSalt = new byte[16]; //** WARNING **: MUST be unique for each stream otherwise there is NO security
var baseStream = new MemoryStream();
var cryptor = new SeekableAesStream(baseStream, "password", uniqueSalt);
cryptor.Write(buf, 0, buf.Length);
//decrypting at position 200
cryptor.Position = 200;
var decryptedBuffer = new byte[50];
cryptor.Read(decryptedBuffer, 0, 50);
}
As an extension to Marc Gravell's answer, the seekability of a cipher depends on the mode of operation you're using. Most modes of operation aren't seekable, because each block of ciphertext depends in some way on the previous one. ECB is seekable, but it's almost universally a bad idea to use it. CTR mode can also be accessed randomly, and CBC supports random-access decryption (each block needs only the previous ciphertext block), although CBC encryption itself cannot seek.
All of these modes have their own vulnerabilities, however, so you should read carefully and think long and hard (and preferably consult an expert) before choosing one.
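To make the random-access property of CTR concrete, here is a minimal sketch of how the keystream byte for an arbitrary plaintext offset can be recomputed on demand; the ECB encryptor is used purely as the raw block primitive, the same trick the SeekableAesStream above uses:
// Returns the CTR keystream byte covering plaintext offset pos (16-byte blocks).
static byte KeystreamByteAt(ICryptoTransform encryptor, long pos)
{
    var counterBlock = new byte[16];
    BitConverter.GetBytes(pos / 16).CopyTo(counterBlock, 0); // counter = block index
    var keystream = new byte[16];
    encryptor.TransformBlock(counterBlock, 0, 16, keystream, 0);
    return keystream[(int)(pos % 16)]; // XOR with the data byte at pos to encrypt or decrypt
}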
Related
I am trying to write a massive object to AWS S3 (e.g. 25 GB).
Currently I can get it working in two ways:
Write the content to a file on local disk, then send the file to S3 using multi-part upload
Write the content to a MemoryStream, then send that stream to S3 using multi-part upload
However, I don't like either approach, because I need to reserve a large amount of disk space or memory for the operation. I am generating this content in code, so I was hoping to open a stream to an S3 object, and generate the content directly to that object. But I can't see how to make that work.
Is it possible to build a massive object in S3 without representing the entire object in a local file or memory first?
(Note: My question is very similar to this question, but that question doesn't have a useful answer.)
I was able to get it working by breaking the overall payload into chunks, and sending each individual chunk as a separate MemoryStream.
Technically this solution still uses a MemoryStream, but that's OK, since I can control how much memory is used by adjusting the chunk size. For my test, I created a 25GB file while keeping memory usage well below that (~2 GB IIRC).
Here is my solution:
private const string BucketName = "YOUR-BUCKET-NAME-HERE";
private static readonly RegionEndpoint BucketRegion = RegionEndpoint.USEast1;
private const string Key = "massive-file-test";
// We're going to send 100 chunks of 256 MB each, for a total of 25 GB.
// The content will be the asterisk ("*") repeated for the desired size.
private const int ChunkSizeMb = 256;
private const int TotalSizeGb = 25;
public static void Main(string[] args)
{
Console.WriteLine($"Writing object to {BucketName}, {Key}");
int totalChunks = TotalSizeGb * 1024 / ChunkSizeMb;
int chunkSizeBytes = ChunkSizeMb * 1024 * 1024;
string payload = new String('*', chunkSizeBytes);
// Initiate the request.
InitiateMultipartUploadRequest initiateRequest = new InitiateMultipartUploadRequest
{
BucketName = BucketName,
Key = Key
};
List<UploadPartResponse> uploadResponses = new List<UploadPartResponse>();
IAmazonS3 s3Client = new AmazonS3Client(BucketRegion);
InitiateMultipartUploadResponse initResponse = s3Client.InitiateMultipartUpload(initiateRequest);
// Open a stream to build the input.
for (int i = 0; i < totalChunks; i++)
{
// Write the next chunk to the input stream.
Console.WriteLine($"Writing chunk {i} of {totalChunks}");
using (var stream = ToStream(payload))
{
// Write the next chunk to s3.
UploadPartRequest uploadRequest = new UploadPartRequest
{
BucketName = BucketName,
Key = Key,
UploadId = initResponse.UploadId,
PartNumber = i + 1,
PartSize = chunkSizeBytes,
InputStream = stream,
};
uploadResponses.Add(s3Client.UploadPart(uploadRequest));
}
}
// Complete the request.
CompleteMultipartUploadRequest completeRequest = new CompleteMultipartUploadRequest
{
BucketName = BucketName,
Key = Key,
UploadId = initResponse.UploadId
};
completeRequest.AddPartETags(uploadResponses);
s3Client.CompleteMultipartUpload(completeRequest);
Console.WriteLine("Script is complete. Press any key to exit...");
Console.ReadKey();
}
private static Stream ToStream(string s)
{
var stream = new MemoryStream();
var writer = new StreamWriter(stream);
writer.Write(s);
writer.Flush();
stream.Position = 0;
return stream;
}
Here is what AnonCoward started, finished off by adding seeking. Seeking is a trivial op for a stream that does nothing except write asterisks to its buffer: no matter where you seek to, the behavior of creating asterisks is always the same, so all you need to do is set the position and say "yep, done that". If you were generating more complex data, seeking would be harder work.
class AsteriskGeneratingStream : Stream
{
long _pos = 0;
long _length = 0;
public AsteriskGeneratingStream(long length)
{
_length = length;
}
public override long Length => _length;
public override int Read(byte[] buffer, int offset, int count)
{
// Create the data as needed
if (count + _pos > _length)
count = (int)(_length - _pos);
for (int i = offset; i < offset + count; i++) // upper bound is offset + count, not count
buffer[i] = (byte)'*';
_pos += count;
return count;
}
public override bool CanRead => true;
public override long Seek(long offset, SeekOrigin origin)
{
if(origin == SeekOrigin.Begin) //lets just trust that the caller will be sensible and not set e.g. negative offset
_pos = offset;
else if(origin == SeekOrigin.Current)
_pos += offset;
else if(origin == SeekOrigin.End)
_pos = _length + offset;
return _pos;
}
public override bool CanSeek => true;
public override bool CanWrite => false;
public override long Position { get => _pos; set => _pos = value; }
public override void Flush() { }
public override void SetLength(long value) { _length = value; }
public override void Write(byte[] buffer, int offset, int count) { throw new NotImplementedException(); }
}
class Program
{
static void Main(string[] args)
{
long objectSize = 25L * 1024 * 1024;
var s3 = new AmazonS3Client(Amazon.RegionEndpoint.USWest1);
var xfer = new TransferUtility(s3,new TransferUtilityConfig
{
MinSizeBeforePartUpload = 5L * 1024 * 1024
});
var helper = new AsteriskGeneratingStream(objectSize);
xfer.Upload(helper, "bucket-name", "object-key");
}
}
Note, I can't guarantee it'll work right off the bat because I'm on a cellphone and can't test this via c# fiddle but let's see how it blows up! 😀
If you can create the object on the fly, or at least cache fairly small segments, you can create a stream that serves the data up to S3. Note that unless you can also create any part of the object out of order, you need to prevent the AWS SDK from using a multi-part upload, which will slow down the transfer.
class DataStream : Stream
{
long _pos = 0;
long _length = 0;
public DataStream(long length)
{
_length = length;
}
public override long Length => _length;
public override int Read(byte[] buffer, int offset, int count)
{
// Create the data as needed, on demand
// For this example, just cycle through 0 to 256 in the data over and over again
if (count + _pos > _length)
{
count = (int)(_length - _pos);
}
for (int i = 0; i < count; i++)
{
buffer[i + offset] = (byte)((_pos + i) % 256);
}
_pos += count;
return count;
}
public override bool CanRead => true;
// Stub out all other methods. For a seekable stream
// Seek() and Postion need to be implemented, along with CanSeek changed
public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); }
public override bool CanSeek => false;
public override bool CanWrite => false;
public override long Position { get => _pos; set => throw new NotImplementedException(); }
public override void Flush() { throw new NotImplementedException(); }
public override void SetLength(long value) { throw new NotImplementedException(); }
public override void Write(byte[] buffer, int offset, int count) { throw new NotImplementedException(); }
}
class Program
{
static void Main(string[] args)
{
long objectSize = 25L * 1024 * 1024;
var s3 = new AmazonS3Client(Amazon.RegionEndpoint.USWest1);
// Prevent a multi-part upload, which requires a seekable stream
var xfer = new TransferUtility(s3, new TransferUtilityConfig
{
MinSizeBeforePartUpload = objectSize + 1
});
var helper = new DataStream(objectSize);
xfer.Upload(helper, "bucket-name", "object-key");
}
}
I have a requirement to archive all the data used to build a report every day. I compress most of the data using gzip, as some of the datasets can be very large (10 MB+). I write each individual protobuf graph to a file. I also whitelist a fixed set of known small object types and added some code to detect whether a file is gzipped when I read it. This is because a small file, when compressed, can actually be bigger than uncompressed.
Unfortunately, just due to the nature of the data, I may only have a few elements of a larger object type, and the whitelist approach can be problematic.
Is there any way to write an object to a stream and compress it only if it reaches a threshold (like 8 KB)? I don't know the size of the object beforehand, and sometimes I have an object graph with an IEnumerable<T> that might be considerable in size.
Edit:
The code is fairly basic. I did skim over the fact that I store this in a SQL Server FILESTREAM table; that shouldn't really matter for the implementation. I removed some of the extraneous code.
public async Task SerializeModel<T>(TransactionalDbContext dbConn, T item, DateTime archiveDate, string name)
{
var continuation = (await dbConn
.QueryAsync<PathAndContext>(_getPathAndContext, new {archiveDate, model=name})
.ConfigureAwait(false))
.First();
var useGzip = !_whitelist.Contains(typeof(T));
using (var fs = new SqlFileStream(continuation.Path, continuation.Context, FileAccess.Write,
FileOptions.SequentialScan | FileOptions.Asynchronous, 64*1024))
using (var stream = useGzip ? new GZipStream(fs, CompressionLevel.Optimal) : default(Stream))
{
_serializerModel.Serialize(stream ?? fs, item);
}
dbConn.Commit();
}
During the serialization, you can use an intermediate stream to accomplish what you are asking for. Something like this will do the job
class SerializationOutputStream : Stream
{
Stream outputStream, writeStream;
byte[] buffer;
int bufferedCount;
long position;
public SerializationOutputStream(Stream outputStream, int compressTreshold = 8 * 1024)
{
writeStream = this.outputStream = outputStream;
buffer = new byte[compressTreshold];
}
public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
public override void SetLength(long value) { throw new NotSupportedException(); }
public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
public override bool CanRead { get { return false; } }
public override bool CanSeek { get { return false; } }
public override bool CanWrite { get { return writeStream != null && writeStream.CanWrite; } }
public override long Length { get { throw new NotSupportedException(); } }
public override long Position { get { return position; } set { throw new NotSupportedException(); } }
public override void Write(byte[] buffer, int offset, int count)
{
if (count <= 0) return;
var newPosition = position + count;
if (this.buffer == null)
writeStream.Write(buffer, offset, count);
else
{
int bufferCount = Math.Min(count, this.buffer.Length - bufferedCount);
if (bufferCount > 0)
{
Array.Copy(buffer, offset, this.buffer, bufferedCount, bufferCount);
bufferedCount += bufferCount;
}
int remainingCount = count - bufferCount;
if (remainingCount > 0)
{
writeStream = new GZipStream(outputStream, CompressionLevel.Optimal);
try
{
writeStream.Write(this.buffer, 0, this.buffer.Length);
writeStream.Write(buffer, offset + bufferCount, remainingCount);
}
finally { this.buffer = null; }
}
}
position = newPosition;
}
public override void Flush()
{
if (buffer == null)
writeStream.Flush();
else if (bufferedCount > 0)
{
try { outputStream.Write(buffer, 0, bufferedCount); }
finally { buffer = null; }
}
}
protected override void Dispose(bool disposing)
{
try
{
if (!disposing || writeStream == null) return;
try { Flush(); }
finally { writeStream.Close(); }
}
finally
{
writeStream = outputStream = null;
buffer = null;
base.Dispose(disposing);
}
}
}
and use it like this
using (var stream = new SerializationOutputStream(new SqlFileStream(continuation.Path, continuation.Context, FileAccess.Write,
FileOptions.SequentialScan | FileOptions.Asynchronous, 64*1024)))
_serializerModel.Serialize(stream, item);
datasets can be very large (10 MB+)
On most devices, that is not very large. Is there a reason you can't read in the entire object before deciding whether to compress? Note also the suggestion from @Niklas to read in one buffer's worth of data (e.g. 8K) before deciding whether to compress.
This is because a small file, when compressed can actually be bigger then uncompressed.
The thing that makes a small file potentially larger is the ZIP header, in particular the dictionary. Some ZIP libraries allow you to use a custom dictionary that is known to both the compressing and the decompressing side. I used SharpZipLib for this many years back.
It is more effort, in terms of coding and testing, to use this approach. If you feel that the benefit is worthwhile, it may provide the best approach.
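As a rough sketch of that preset-dictionary idea, SharpZipLib's low-level Deflater exposes SetDictionary; the dictionary bytes below are a made-up example you would replace with patterns common to your data, and the decompressing side must set the same dictionary on its Inflater:
// using ICSharpCode.SharpZipLib.Zip.Compression;
byte[] dictionary = System.Text.Encoding.UTF8.GetBytes("\"timestamp\":\"\",\"value\":");

byte[] CompressWithDictionary(byte[] input)
{
    var deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.SetDictionary(dictionary); // shared up-front knowledge instead of a per-file dictionary
    deflater.SetInput(input);
    deflater.Finish();
    var output = new byte[input.Length * 2 + 64]; // generous bound for small inputs
    int n = deflater.Deflate(output);
    var result = new byte[n];
    Array.Copy(output, result, n);
    return result;
}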
Note no matter what path you take, you will physically store data using multiples of the block size of your storage device.
if the object is 1 byte or 100 MB I have no idea
Note that protocol buffers is not really designed for large data sets
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data.
If your largest object can comfortably serialize into memory, first serialize it into a MemoryStream, then either write that MemoryStream to your final destination, or run it through a GZipStream and then to your final destination. If the largest object cannot comfortably serialize into memory, I'm not sure what further advice to give.
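A minimal sketch of that advice, combined with the 8 KB threshold from the question (protobuf-net's Serializer is assumed since the question deals in protobuf graphs, and destination stands for whatever stream you ultimately write to):
using (var ms = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize(ms, item); // serialize fully into memory first
    ms.Position = 0;
    if (ms.Length >= 8 * 1024)
    {
        // large enough to be worth compressing
        using (var gz = new GZipStream(destination, CompressionLevel.Optimal, leaveOpen: true))
            ms.CopyTo(gz);
    }
    else
    {
        ms.CopyTo(destination); // small objects stay uncompressed
    }
}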
I want to be able to compute the hashes of arbitrarily sized file chunks of a file in C#.
eg: Compute the hash of the 3rd gigabyte in 4gb file.
The main problem is that I don't want to load the entire file at memory, as there could be several files and the offsets could be quite arbitrary.
AFAIK, HashAlgorithm.ComputeHash lets me use either a byte buffer or a stream. The stream would let me compute the hash efficiently, but for the entire file, not just a specific chunk.
I was thinking of creating an alternate FileStream object and passing it to ComputeHash, where I would override the stream methods and have it read only a certain chunk of the file.
Is there a better solution than this, preferably using the built-in C# libraries?
Thanks.
You should pass in either:
A byte array containing the chunk of data to compute the hash from
A stream that restricts access to the chunk you want to compute the hash from
The second option isn't all that hard; here's a quick LINQPad program I threw together. Note that it lacks quite a bit of error handling, such as checking that the chunk is actually available (i.e. that you're passing in a position and length that actually exist and don't fall off the end of the underlying stream).
Needless to say, if this should end up as production code I would add a lot of error handling, and write a bunch of unit-tests to ensure all edge-cases are handled correctly.
You would construct the PartialStream instance for your file like this:
const long gb = 1024 * 1024 * 1024;
using (var fileStream = new FileStream(@"d:\temp\too_long_file.bin", FileMode.Open))
using (var chunk = new PartialStream(fileStream, 2 * gb, 1 * gb))
{
var hash = hashAlgorithm.ComputeHash(chunk);
}
Here's the LINQPad test program:
void Main()
{
var buffer = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
using (var underlying = new MemoryStream(buffer))
using (var partialStream = new PartialStream(underlying, 64, 32))
{
var temp = new byte[1024]; // too much, ensure we don't read past window end
partialStream.Read(temp, 0, temp.Length);
temp.Dump();
// should output 64-95 and then 0's for the rest (64-95 = 32 bytes)
}
}
public class PartialStream : Stream
{
private readonly Stream _UnderlyingStream;
private readonly long _Position;
private readonly long _Length;
public PartialStream(Stream underlyingStream, long position, long length)
{
if (!underlyingStream.CanRead || !underlyingStream.CanSeek)
throw new ArgumentException("underlyingStream");
_UnderlyingStream = underlyingStream;
_Position = position;
_Length = length;
_UnderlyingStream.Position = position;
}
public override bool CanRead
{
get
{
return _UnderlyingStream.CanRead;
}
}
public override bool CanWrite
{
get
{
return false;
}
}
public override bool CanSeek
{
get
{
return true;
}
}
public override long Length
{
get
{
return _Length;
}
}
public override long Position
{
get
{
return _UnderlyingStream.Position - _Position;
}
set
{
_UnderlyingStream.Position = value + _Position;
}
}
public override void Flush()
{
throw new NotSupportedException();
}
public override long Seek(long offset, SeekOrigin origin)
{
switch (origin)
{
case SeekOrigin.Begin:
return _UnderlyingStream.Seek(_Position + offset, SeekOrigin.Begin) - _Position;
case SeekOrigin.End:
return _UnderlyingStream.Seek(_Position + _Length + offset, SeekOrigin.Begin) - _Position; // the window ends at _Position + _Length in the underlying stream
case SeekOrigin.Current:
return _UnderlyingStream.Seek(offset, SeekOrigin.Current) - _Position;
default:
throw new ArgumentException("origin");
}
}
public override void SetLength(long length)
{
throw new NotSupportedException();
}
public override int Read(byte[] buffer, int offset, int count)
{
long left = _Length - Position;
if (left < count)
count = (int)left;
return _UnderlyingStream.Read(buffer, offset, count);
}
public override void Write(byte[] buffer, int offset, int count)
{
throw new NotSupportedException();
}
}
You can use TransformBlock and TransformFinalBlock directly. That's pretty similar to what HashAlgorithm.ComputeHash does internally.
Something like:
using(var hashAlgorithm = new SHA256Managed())
using(var fileStream = File.OpenRead(...))
{
fileStream.Position = ...;
long bytesToHash = ...;
var buf = new byte[4 * 1024];
while(bytesToHash > 0)
{
var bytesRead = fileStream.Read(buf, 0, (int)Math.Min(bytesToHash, buf.Length));
hashAlgorithm.TransformBlock(buf, 0, bytesRead, null, 0);
bytesToHash -= bytesRead;
if(bytesRead == 0)
throw new InvalidOperationException("Unexpected end of stream");
}
hashAlgorithm.TransformFinalBlock(buf, 0, 0);
var hash = hashAlgorithm.Hash;
return hash;
}
Your suggestion - passing in a restricted access wrapper for your FileStream - is the cleanest solution. Your wrapper should defer everything to the wrapped Stream except the Length and Position properties.
How? Simply create a class that inherits from Stream, as sketched after this list. Make the constructor take:
Your source Stream (in your case, a FileStream)
The chunk start position
The chunk end position
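A bare skeleton of that idea (the names are illustrative; the PartialStream shown earlier fleshes out the same shape):
class ChunkStream : Stream
{
    private readonly Stream _source; // e.g. a FileStream
    private readonly long _start, _end;

    public ChunkStream(Stream source, long chunkStart, long chunkEnd)
    {
        _source = source;
        _start = chunkStart;
        _end = chunkEnd;
        _source.Position = _start;
    }

    // Only Length and Position translate coordinates; the rest defers to _source.
    public override long Length => _end - _start;
    public override long Position
    {
        get => _source.Position - _start;
        set => _source.Position = value + _start;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _end - _source.Position;
        if (remaining < count) count = (int)Math.Max(0, remaining);
        return _source.Read(buffer, offset, count);
    }

    public override bool CanRead => _source.CanRead;
    public override bool CanSeek => _source.CanSeek;
    public override bool CanWrite => false;
    public override void Flush() { } // read-only wrapper; nothing to flush
    public override long Seek(long offset, SeekOrigin origin)
    {
        if (origin == SeekOrigin.Begin) Position = offset;
        else if (origin == SeekOrigin.Current) Position += offset;
        else Position = Length + offset;
        return Position;
    }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}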
As an extension - this is a list of all the Streams that are available http://msdn.microsoft.com/en-us/library/system.io.stream%28v=vs.100%29.aspx#inheritanceContinued
To easily compute the hash of a chunk of a larger stream, use these two methods:
HashAlgorithm.TransformBlock
HashAlgorithm.TransformFinalBlock
Here's a LINQPad program that demonstrates:
void Main()
{
const long gb = 1024 * 1024 * 1024;
using (var stream = new FileStream(@"d:\temp\largefile.bin", FileMode.Open))
{
stream.Position = 2 * gb; // 3rd gb-chunk
byte[] buffer = new byte[32768];
long amount = 1 * gb;
using (var hashAlgorithm = SHA1.Create())
{
while (amount > 0)
{
int bytesRead = stream.Read(buffer, 0,
(int)Math.Min(buffer.Length, amount));
if (bytesRead > 0)
{
amount -= bytesRead;
if (amount > 0)
hashAlgorithm.TransformBlock(buffer, 0, bytesRead,
buffer, 0);
else
hashAlgorithm.TransformFinalBlock(buffer, 0, bytesRead);
}
else
throw new InvalidOperationException();
}
hashAlgorithm.Hash.Dump();
}
}
}
To answer your original question ("Is there a better solution..."):
Not that I know of.
This seems to be a very special, non-trivial task, so a little extra work might be involved anyway. I think your approach of using a custom Stream-class goes in the right direction, I'd probably do exactly the same.
And Gusdor and xander have already provided very helpful information on how to implement that — good job guys!
I want to use Rfc2898 in c# to derive a key. I also need to use SHA256 as Digest for Rfc2898. I found the class Rfc2898DeriveBytes, but it uses SHA-1 and I don't see a way to make it use a different digest.
Is there a way to use Rfc2898 in c# with SHA256 as digest (short of implementing it from scratch)?
.NET Core has a new implementation of Rfc2898DeriveBytes.
The CoreFX version no longer has the hashing algorithm hard-coded.
The code is available on GitHub. It was merged to master in March 2017 and has been shipped with .NET Core 2.0.
For those who need it, .NET Framework 4.7.2 includes an overload of Rfc2898DeriveBytes that allows the hashing algorithm to be specified:
byte[] bytes;
using (var deriveBytes = new Rfc2898DeriveBytes(password, salt, iterations, HashAlgorithmName.SHA256))
{
bytes = deriveBytes.GetBytes(PBKDF2SubkeyLength);
}
The HashAlgorithmName options at the moment are:
MD5
SHA1
SHA256
SHA384
SHA512
See Bruno Garcia's answer.
At the time I started this answer, Rfc2898DeriveBytes was not configurable to use a different hash function. In the meantime, though, it has been improved; see Bruno Garcia's answer. The following function can be used to generate a hashed version of a user-provided password to store in a database for authentication purposes.
For users of older .NET frameworks, this is still useful:
// NOTE: The iteration count should
// be as high as possible without causing
// unreasonable delay. Note also that the password
// and salt are byte arrays, not strings. After use,
// the password and salt should be cleared (with Array.Clear)
public static byte[] PBKDF2Sha256GetBytes(int dklen, byte[] password, byte[] salt, int iterationCount)
{
    using (var hmac = new System.Security.Cryptography.HMACSHA256(password))
    {
        int hashLength = hmac.HashSize / 8;
        if ((hmac.HashSize & 7) != 0)
            hashLength++;
        int keyLength = dklen / hashLength;
        if ((long)dklen > (0xFFFFFFFFL * hashLength) || dklen < 0)
            throw new ArgumentOutOfRangeException("dklen");
        if (dklen % hashLength != 0)
            keyLength++;
        byte[] extendedkey = new byte[salt.Length + 4];
        Buffer.BlockCopy(salt, 0, extendedkey, 0, salt.Length);
        using (var ms = new System.IO.MemoryStream())
        {
            for (int i = 0; i < keyLength; i++)
            {
                // big-endian block index appended to the salt
                extendedkey[salt.Length] = (byte)(((i + 1) >> 24) & 0xFF);
                extendedkey[salt.Length + 1] = (byte)(((i + 1) >> 16) & 0xFF);
                extendedkey[salt.Length + 2] = (byte)(((i + 1) >> 8) & 0xFF);
                extendedkey[salt.Length + 3] = (byte)((i + 1) & 0xFF);
                byte[] u = hmac.ComputeHash(extendedkey);
                Array.Clear(extendedkey, salt.Length, 4);
                byte[] f = u;
                for (int j = 1; j < iterationCount; j++)
                {
                    u = hmac.ComputeHash(u);
                    for (int k = 0; k < f.Length; k++)
                    {
                        f[k] ^= u[k];
                    }
                }
                ms.Write(f, 0, f.Length);
                Array.Clear(u, 0, u.Length);
                Array.Clear(f, 0, f.Length);
            }
            byte[] dk = new byte[dklen];
            ms.Position = 0;
            ms.Read(dk, 0, dklen);
            // zero the intermediate buffer before disposal
            ms.Position = 0;
            for (long i = 0; i < ms.Length; i++)
            {
                ms.WriteByte(0);
            }
            Array.Clear(extendedkey, 0, extendedkey.Length);
            return dk;
        }
    }
}
The BCL Rfc2898DeriveBytes is hardcoded to use SHA-1.
KeyDerivation.Pbkdf2 allows for exactly the same output, but it also allows HMAC SHA-256 and HMAC SHA-512. It's faster too; on my machine by around 5 times - and that's good for security, because it allows for more rounds, which makes life for crackers harder (incidentally sha-512 is a lot less gpu-friendly than sha-256 or sha1). And the api is simpler, to boot:
byte[] salt = ...
string password = ...
var rounds = 50000; // pick something bearable
var num_bytes_requested = 16; // 128 bits is fine
var prf = KeyDerivationPrf.HMACSHA512; // or sha256, or sha1
byte[] hashed = KeyDerivation.Pbkdf2(password, salt, prf, rounds, num_bytes_requested);
It's from the nuget package Microsoft.AspNetCore.Cryptography.KeyDerivation which does not depend on asp.net core; it runs on .net 4.5.1 or .net standard 1.3 or higher.
You could use Bouncy Castle. The C# specification lists the algorithm "PBEwithHmacSHA-256", which can only be PBKDF2 with SHA-256.
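For example, a hedged sketch using Bouncy Castle's PBKDF2 generator (Pkcs5S2ParametersGenerator accepts a digest in recent releases; verify the exact API against the version you use):
// using Org.BouncyCastle.Crypto.Digests;
// using Org.BouncyCastle.Crypto.Generators;
// using Org.BouncyCastle.Crypto.Parameters;
static byte[] Pbkdf2Sha256(byte[] password, byte[] salt, int iterations, int keyBytes)
{
    var generator = new Pkcs5S2ParametersGenerator(new Sha256Digest());
    generator.Init(password, salt, iterations);
    var key = (KeyParameter)generator.GenerateDerivedMacParameters(keyBytes * 8);
    return key.GetKey();
}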
I know this is an old question, but for anyone that comes across it, you can now use KeyDerivation.Pbkdf2 from the Microsoft.AspNetCore.Cryptography.KeyDerivation nuget package. It is what is used in asp.net core.
Unfortunately it will add a ton of references that aren't really needed. You could just copy the code and paste it into your own project (although you will now have to maintain crypto code, which is a PITA).
For what it's worth, here's a copy of Microsoft's implementation but with SHA-1 replaced with SHA512:
namespace System.Security.Cryptography
{
using System.Globalization;
using System.IO;
using System.Text;
[System.Runtime.InteropServices.ComVisible(true)]
public class Rfc2898DeriveBytes_HMACSHA512 : DeriveBytes
{
private byte[] m_buffer;
private byte[] m_salt;
private HMACSHA512 m_HMACSHA512; // The pseudo-random generator function used in PBKDF2
private uint m_iterations;
private uint m_block;
private int m_startIndex;
private int m_endIndex;
private static RNGCryptoServiceProvider _rng;
private static RNGCryptoServiceProvider StaticRandomNumberGenerator
{
get
{
if (_rng == null)
{
_rng = new RNGCryptoServiceProvider();
}
return _rng;
}
}
private const int BlockSize = 64; // hLen of HMACSHA512 is 64 bytes (the SHA-1 original used 20)
//
// public constructors
//
public Rfc2898DeriveBytes_HMACSHA512(string password, int saltSize) : this(password, saltSize, 1000) { }
public Rfc2898DeriveBytes_HMACSHA512(string password, int saltSize, int iterations)
{
if (saltSize < 0)
throw new ArgumentOutOfRangeException("saltSize", "Non-negative number required."); // literal message: Environment.GetResourceString is internal to mscorlib
byte[] salt = new byte[saltSize];
StaticRandomNumberGenerator.GetBytes(salt);
Salt = salt;
IterationCount = iterations;
m_HMACSHA512 = new HMACSHA512(new UTF8Encoding(false).GetBytes(password));
Initialize();
}
public Rfc2898DeriveBytes_HMACSHA512(string password, byte[] salt) : this(password, salt, 1000) { }
public Rfc2898DeriveBytes_HMACSHA512(string password, byte[] salt, int iterations) : this(new UTF8Encoding(false).GetBytes(password), salt, iterations) { }
public Rfc2898DeriveBytes_HMACSHA512(byte[] password, byte[] salt, int iterations)
{
Salt = salt;
IterationCount = iterations;
m_HMACSHA512 = new HMACSHA512(password);
Initialize();
}
//
// public properties
//
public int IterationCount
{
get { return (int)m_iterations; }
set
{
if (value <= 0)
throw new ArgumentOutOfRangeException("value", "Positive number required.");
m_iterations = (uint)value;
Initialize();
}
}
public byte[] Salt
{
get { return (byte[])m_salt.Clone(); }
set
{
if (value == null)
throw new ArgumentNullException("value");
if (value.Length < 8)
throw new ArgumentException("Salt is not at least eight bytes.");
m_salt = (byte[])value.Clone();
Initialize();
}
}
//
// public methods
//
public override byte[] GetBytes(int cb)
{
if (cb <= 0)
throw new ArgumentOutOfRangeException("cb", "Positive number required.");
byte[] password = new byte[cb];
int offset = 0;
int size = m_endIndex - m_startIndex;
if (size > 0)
{
if (cb >= size)
{
Buffer.BlockCopy(m_buffer, m_startIndex, password, 0, size); // Buffer.InternalBlockCopy is internal; BlockCopy behaves the same here
m_startIndex = m_endIndex = 0;
offset += size;
}
else
{
Buffer.BlockCopy(m_buffer, m_startIndex, password, 0, cb);
m_startIndex += cb;
return password;
}
}
//BCLDebug.Assert(m_startIndex == 0 && m_endIndex == 0, "Invalid start or end index in the internal buffer.");
while (offset < cb)
{
byte[] T_block = Func();
int remainder = cb - offset;
if (remainder > BlockSize)
{
Buffer.BlockCopy(T_block, 0, password, offset, BlockSize);
offset += BlockSize;
}
else
{
Buffer.BlockCopy(T_block, 0, password, offset, remainder);
offset += remainder;
Buffer.BlockCopy(T_block, remainder, m_buffer, m_startIndex, BlockSize - remainder);
m_endIndex += (BlockSize - remainder);
return password;
}
}
return password;
}
public override void Reset()
{
Initialize();
}
private void Initialize()
{
if (m_buffer != null)
Array.Clear(m_buffer, 0, m_buffer.Length);
m_buffer = new byte[BlockSize];
m_block = 1;
m_startIndex = m_endIndex = 0;
}
internal static byte[] Int(uint i)
{
byte[] b = BitConverter.GetBytes(i);
byte[] littleEndianBytes = { b[3], b[2], b[1], b[0] };
return BitConverter.IsLittleEndian ? littleEndianBytes : b;
}
// This function is defined as follow :
// Func (S, i) = HMAC(S || i) | HMAC2(S || i) | ... | HMAC(iterations) (S || i)
// where i is the block number.
private byte[] Func()
{
byte[] INT_block = Int(m_block);
m_HMACSHA512.TransformBlock(m_salt, 0, m_salt.Length, m_salt, 0);
m_HMACSHA512.TransformFinalBlock(INT_block, 0, INT_block.Length);
byte[] temp = m_HMACSHA512.Hash;
m_HMACSHA512.Initialize();
byte[] ret = temp;
for (int i = 2; i <= m_iterations; i++)
{
temp = m_HMACSHA512.ComputeHash(temp);
for (int j = 0; j < BlockSize; j++)
{
ret[j] ^= temp[j];
}
}
// increment the block count.
m_block++;
return ret;
}
}
}
In addition to replacing HMACSHA1 with HMACSHA512, you need to add a StaticRandomNumberGenerator property (because Utils.StaticRandomNumberGenerator is internal to the Microsoft assembly) and the static byte[] Int(uint i) method (because Microsoft's Utils.Int is also internal). Environment.GetResourceString and Buffer.InternalBlockCopy are internal too, so plain message strings and Buffer.BlockCopy are used above, and BlockSize changes from 20 (SHA-1's output size) to 64 to match SHA-512. With those changes, the code works.
Although this is an old question, I am adding this answer because I referenced this question in my own question, Configurable Rfc2898DeriveBytes, where I asked whether a generic implementation of the Rfc2898DeriveBytes algorithm was correct.
I have now tested and validated that it generates exactly the same hash values as the .NET implementation of Rfc2898DeriveBytes when HMACSHA1 is provided for TAlgorithm.
In order to use the class, one must provide the constructor for the HMAC algorithm requiring a byte array as the first argument.
e.g.:
var rfcGenSha1 = new Rfc2898DeriveBytes<HMACSHA1>(b => new HMACSHA1(b), key, ...)
var rfcGenSha256 = new Rfc2898DeriveBytes<HMACSHA256>(b => new HMACSHA256(b), key, ...)
This requires the algorithm to inherit from HMAC at this point. I believe one might be able to reduce the restriction to require inheritance from KeyedHashAlgorithm instead of HMAC, as long as the constructor of the algorithm accepts an array of bytes.
I'm trying to encrypt and decrypt a file stream over a socket using RijndaelManaged, but I keep bumping into the exception
CryptographicException: Length of the data to decrypt is invalid.
at System.Security.Cryptography.RijndaelManagedTransform.TransformFinalBlock(Byte[] inputBuffer, Int32 inputOffset, Int32 inputCount)
at System.Security.Cryptography.CryptoStream.FlushFinalBlock()
at System.Security.Cryptography.CryptoStream.Dispose(Boolean disposing)
The exception is thrown at the end of the using statement in receiveFile, when the whole file has been transferred.
I tried searching the web but only found answers to problems that arise when using Encoding when encrypting and decrypting a single string. I use a FileStream, so I don't specify any Encoding to be used, so that should not be the problem. These are my methods:
private void transferFile(FileInfo file, long position, long readBytes)
{
// transfer on socket stream
Stream stream = new FileStream(file.FullName, FileMode.Open);
if (position > 0)
{
stream.Seek(position, SeekOrigin.Begin);
}
// if this should be encrypted, wrap the encryptor stream
if (UseCipher)
{
stream = new CryptoStream(stream, streamEncryptor, CryptoStreamMode.Read);
}
using (stream)
{
int read;
byte[] array = new byte[8096];
while ((read = stream.Read(array, 0, array.Length)) > 0)
{
streamSocket.Send(array, 0, read, SocketFlags.None);
position += read;
}
}
}
private void receiveFile(FileInfo transferFile)
{
byte[] array = new byte[8096];
// receive file
Stream stream = new FileStream(transferFile.FullName, FileMode.Append);
if (UseCipher)
{
stream = new CryptoStream(stream, streamDecryptor, CryptoStreamMode.Write);
}
using (stream)
{
long position = new FileInfo(transferFile.FullName).Length;
while (position < transferFile.Length)
{
int maxRead = Math.Min(array.Length, (int)(transferFile.Length - position));
int read = position < array.Length
? streamSocket.Receive(array, maxRead, SocketFlags.None)
: streamSocket.Receive(array, SocketFlags.None);
stream.Write(array, 0, read);
position += read;
}
}
}
This is the method I use to set up the ciphers. byte[] init is a generated byte array.
private void setupStreamCipher(byte[] init)
{
RijndaelManaged cipher = new RijndaelManaged();
cipher.KeySize = cipher.BlockSize = 256; // bit size
cipher.Mode = CipherMode.ECB;
cipher.Padding = PaddingMode.ISO10126;
byte[] keyBytes = new byte[32];
byte[] ivBytes = new byte[32];
Array.Copy(init, keyBytes, 32);
Array.Copy(init, 32, ivBytes, 0, 32);
streamEncryptor = cipher.CreateEncryptor(keyBytes, ivBytes);
streamDecryptor = cipher.CreateDecryptor(keyBytes, ivBytes);
}
Anyone have an idea in what I might be doing wrong?
It looks to me like you're not properly sending the final block. You need to at least FlushFinalBlock() the sending CryptoStream in order to ensure that the final block (which the receiving stream is looking for) is sent.
By the way, CipherMode.ECB is more than likely an epic fail in terms of security for what you're doing. At least use CipherMode.CBC (cipher-block chaining) which actually uses the IV and makes each block dependent on the previous one.
EDIT: Whoops, the enciphering stream is in read mode. In that case you need to make sure you read to EOF so that the CryptoStream can deal with the final block, rather than stopping after readBytes. It's probably easier to control if you run the enciphering stream in write mode.
One more note: You cannot assume that bytes in equals bytes out. Block ciphers have a fixed block size they process, and unless you are using a cipher mode that converts the block cipher to a stream cipher, there will be padding that makes the ciphertext longer than the plaintext.
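A sketch of the write-mode suggestion, reusing the question's streamSocket and streamEncryptor (wrapping the socket in a NetworkStream is an assumption about how the rest of the code is structured):
// Encrypt while sending; disposing the CryptoStream emits the final padded block.
using (var fileStream = File.OpenRead(file.FullName))
using (var netStream = new NetworkStream(streamSocket, ownsSocket: false))
using (var crypto = new CryptoStream(netStream, streamEncryptor, CryptoStreamMode.Write))
{
    fileStream.CopyTo(crypto);
    // Dispose calls FlushFinalBlock(), so the receiver gets the padded final block.
}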
After the comment made by Jeffrey Hantin, I changed some lines in receiveFile to
using (stream) {
FileInfo finfo = new FileInfo(transferFile.FullName);
long position = finfo.Length;
while (position < transferFile.Length) {
int maxRead = Math.Min(array.Length, (int)(transferFile.Length - position));
int read = position < array.Length
? streamSocket.Receive(array, maxRead, SocketFlags.None)
: streamSocket.Receive(array, SocketFlags.None);
stream.Write(array, 0, read);
position += read;
}
}
->
using (stream) {
int read = array.Length;
while ((read = streamSocket.Receive(array, read, SocketFlags.None)) > 0) {
stream.Write(array, 0, read);
if ((read = streamSocket.Available) == 0) {
break;
}
}
}
And voila, she works (because of the ever so kind padding that I didn't care to bother about earlier). I'm not sure what happens if Available returns 0 even though all data hasn't been transferred, but I'll tend to that later in that case. Thanks for your help Jeffrey!
Regards.
cipher.Mode = CipherMode.ECB;
Argh! Rolling your own security code is almost always a bad idea.
In my case, I just removed the padding and it works.
I commented this out: cipher.Padding = PaddingMode.ISO10126;