Hash from different files is always the same - C#

I'm building an API which has a method that accepts a file via a POST request.
Based on that file, I need to create a hash of the file contents (not the name), check whether the hash already exists, and do some other actions.
My problem is that whatever file I send through Postman, the hash is always the same for every file, which means that every time I end up with only one file, which gets overwritten.
Here is my method:
private string GetHashFromImage(IFormFile file)
{
/* Creates a hash with the image as a parameter
* with the SHA1 algorithm and returns the hash
* as a string since the ComputeHash() method
* creates a byte array.
*/
System.IO.MemoryStream image = new System.IO.MemoryStream();
file.CopyTo(image);
var hashedValue = System.Security.Cryptography.SHA1.Create().ComputeHash(image);
var hashAsString = Convert.ToBase64String(hashedValue).Replace(@"/", @"");
image.Seek(0, System.IO.SeekOrigin.Begin);
return hashAsString;
}
I need a hashing method that is OS-agnostic and always returns the same hash for the same file.

I'm not entirely sure why your solution is not working, but I think I have an idea of how to achieve what you want, and it uses MD5 instead of SHA1.
Let's create a function that receives an IFormFile, computes the MD5 hash of its contents, and returns the hash value as a string.
using System;
using System.IO;
using System.Security.Cryptography;
private string GetMD5Hash(IFormFile file)
{
    // Copy the uploaded file's contents into a MemoryStream.
    using (MemoryStream stream = new MemoryStream())
    using (MD5 md5 = MD5.Create())
    {
        file.OpenReadStream().CopyTo(stream);
        // Compute the MD5 hash of the file's bytes; ToArray() returns the whole
        // buffer regardless of the stream's current position.
        byte[] bytes = md5.ComputeHash(stream.ToArray());
        return BitConverter.ToString(bytes).Replace("-", string.Empty).ToLower();
    }
}
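For illustration, it could be called from a controller action roughly like this (the action name and what is done with the hash are made up here, not taken from the question):
[HttpPost]
public IActionResult Upload(IFormFile file)
{
    // Hypothetical usage: compute the hash, then decide what to do with the upload.
    string hash = GetMD5Hash(file);
    // ... check whether a file with this hash already exists, then store it ...
    return Ok(hash);
}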
Hope it works for you!

The real reason for this behaviour is that the stream's position is at the end (the same as after image.Seek(0, System.IO.SeekOrigin.End)) when the hash is calculated.
Stream operations like CopyTo, ComputeHash, etc. change the position of streams because they have to iterate through them. The hash of any stream whose position is at the end is always the same - it is the hash of an empty stream or empty array.
Converting the stream to an array works, of course, because ToArray() works with the whole stream (from position = 0), but it is generally not a very elegant solution because you have to copy the whole stream into memory (the same applies to a MemoryStream - the data is also held in memory).
When you work directly with a stream, a function like ComputeHash(stream) reads the stream in small chunks (e.g. 4096 bytes) and computes the hash iteratively (see the .NET source code). This means the original solution should work if the seek back to the start is performed before the hash calculation.
Actually, you should be able to compute the hash directly from the input stream of the IFormFile, without copying the whole stream into memory (array or MemoryStream), which gives better performance and avoids risks such as an OutOfMemoryException.
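To illustrate that last point, here is a minimal sketch (an assumption of how the original method could be reworked, not the poster's actual code) that hashes the IFormFile's stream directly; ComputeHash reads the stream in chunks, so the whole file is never copied into memory:
private string GetHashFromImage(IFormFile file)
{
    using (var sha1 = System.Security.Cryptography.SHA1.Create())
    using (var stream = file.OpenReadStream())
    {
        // Rewind in case the stream has already been read elsewhere.
        if (stream.CanSeek)
            stream.Seek(0, System.IO.SeekOrigin.Begin);

        byte[] hashedValue = sha1.ComputeHash(stream);
        return Convert.ToBase64String(hashedValue).Replace(@"/", @"");
    }
}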

Related

Read large base64 strings in chunks to avoid allocation to the LOH in C#

I'm sent large files from an API as base64-encoded strings, which I convert to a byte[] array (in one go) and then return to the client via a controller action, for example:
byte[] fileBytes = Convert.FromBase64String(base64File);
return this.File(fileBytes);
FileContentResult - MSDN
In some cases these files end up being very large, which I believe causes the fileBytes object to be allocated on the LOH, where it won't be immediately freed once out of scope. This is happening often enough that it causes out-of-memory exceptions and the application to restart.
My question is: how can I read these large base64 strings without allocating a byte[] on the LOH? I thought about reading it into a stream and then returning a FileStreamResult instead, e.g.:
using(var ms = new MemoryStream(fileBytes))
{
// return stream from action
}
But I'd still need to convert the base64 to a byte[] first. Is it possible to read the base64 itself in smaller chunks, therefore creating smaller byte[] buffers that wouldn't end up on the LOH?
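One possibility is to decode the string in slices whose length is a multiple of 4 characters, so each slice decodes independently and the resulting buffers stay well below the 85,000-byte LOH threshold. A rough, untested sketch (it assumes the base64 string contains no line breaks, and tempPath is just an illustrative output target):
// 32,000 chars is a multiple of 4 and decodes to 24,000 bytes, well under the LOH limit.
const int charChunk = 32000;
using (var output = new FileStream(tempPath, FileMode.Create))
{
    for (int offset = 0; offset < base64File.Length; offset += charChunk)
    {
        int length = Math.Min(charChunk, base64File.Length - offset);
        byte[] chunk = Convert.FromBase64String(base64File.Substring(offset, length));
        output.Write(chunk, 0, chunk.Length);
    }
}
The file could then be returned with a FileStreamResult instead of a FileContentResult; on newer runtimes, Convert.TryFromBase64Chars over a ReadOnlySpan<char> would also avoid the intermediate Substring allocations.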

Encrypt/decrypt a part of a big file using Salsa20

I have homework from my university teacher. I have to write code that will encrypt/decrypt a small part of a big file (about 10 GB). I am using the Salsa20 algorithm.
The main constraint is not to load everything into RAM. As he said, I should read, for example, 100 lines, then encrypt/decrypt them, write them to a file, and repeat.
I create a List:
List<string> dict = new List<string>();
Then I read lines (because reading all the bytes at once uses a lot of RAM):
using (StreamReader sReader = new StreamReader(filePath))
{
while (dict.Count < 100)
{
dict.Add(sReader.ReadLine());
}
}
Then I try to create one line from the list:
string words = string.Join("", dict.ToArray());
And encrypt this line:
string encrypted;
using (var salsa = new Salsa20.Salsa20())
using (var mstream_out = new MemoryStream())
{
salsa.Key = key;
salsa.IV = iv;
using (var cstream = new CryptoStream(mstream_out,
salsa.CreateEncryptor(), CryptoStreamMode.Write))
{
var bytes = Encoding.UTF8.GetBytes(words);
cstream.Write(bytes, 0, bytes.Length);
}
encrypted = Encoding.UTF8.GetString(mstream_out.ToArray());
}
Then I need to write the 100 encrypted lines back to a file, but I don't know how to do it! Is there any solution?
OK, so here's what you could do.
Accept a filename, a starting line number and an ending line number.
Read the lines, simply writing them to another file if they are lower than the starting line number or larger than the ending line number.
Once you read a line that is in the range, you can encrypt it with the key and an IV. You will possibly need to encode it to a byte array e.g. using UTF-8 first, as modern ciphers such as Salsa operate on bytes, not text.
You can use the line number possibly as nonce/IV for your stream cipher, if you don't expect the number of lines to change. Otherwise you can prefix the ciphertext with a large, fixed size, random nonce.
The ciphertext - possibly including the nonce - can be encoded as base64 without line endings. Then you write the base 64 line to the other file.
Keep encrypting the lines until you reach the end index. It is up to you whether your ending line is inclusive or exclusive.
Now read the remaining lines, and write them to the other file.
Don't forget to finalize the encryption and close the file. You may possibly want to destroy the source input file.
Encrypting bytes may be easier as you could write to the original file. However, writing encrypted strings will likely always expand the ciphertext compared with the plaintext. So you need to copy the file, as it needs to grow from the middle out.
I haven't got a clue why you would keep a list or dictionary in memory. If that's part of the requirements then I don't see it in the rest of the question. If you read in all the lines of a file that way then clearly you're using up memory.
Of course, if your 4 GiB file is just a single line then you're still using way too much memory. In that case you need to stream everything, parsing text from files, putting it in a character buffer, character-decoding it, encrypting it, encoding it again to base 64 and writing it to file. Certainly doable, but tricky if you've never done such things.
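Putting those steps together, here is a rough sketch (untested, assuming the same Salsa20.Salsa20 SymmetricAlgorithm class the question uses; the method name and parameters are made up for illustration). Lines outside the range are copied through unchanged; lines inside it are encrypted with the line number as the nonce and written out as base64:
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static void EncryptLineRange(string inputPath, string outputPath,
                             int startLine, int endLine, byte[] key)
{
    using (var reader = new StreamReader(inputPath))
    using (var writer = new StreamWriter(outputPath))
    using (var salsa = new Salsa20.Salsa20())
    {
        salsa.Key = key;

        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            if (lineNumber < startLine || lineNumber >= endLine)
            {
                // Outside the range: copy the line through unchanged.
                writer.WriteLine(line);
            }
            else
            {
                // Use the line number as the 8-byte nonce/IV, as suggested above,
                // so every encrypted line gets its own keystream.
                salsa.IV = BitConverter.GetBytes((long)lineNumber);
                using (var encryptor = salsa.CreateEncryptor())
                {
                    byte[] plaintext = Encoding.UTF8.GetBytes(line);
                    byte[] ciphertext = encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length);
                    // Base64 without line breaks keeps each ciphertext on a single line.
                    writer.WriteLine(Convert.ToBase64String(ciphertext));
                }
            }
            lineNumber++;
        }
    }
}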

Modified MemoryStream byte length less than when it's saved

I'm loading a binary file into a memory stream, modifying the bytes, and then storing the file to disk. To save time, I retain the modified byte array to calculate a checksum. When I load the saved file from disk and calculate its checksum, the file length is about 150 bytes different from the original byte length when it was saved, and obviously the checksum doesn't match the one from before it was saved. Any ideas as to why this happens? I've searched and searched for clues, but it looks like I'd have to reload the file after it was saved to calculate an accurate checksum.
Also note that the shorter byte array does render its contents correctly and so does the longer byte array, in fact the two arrays render identically!
Here's the code that collects the modified bytes from the memory stream:
writerStream.Flush();
storedFile = new Byte[writerStream.Length];
writerStream.Position = 0;
writerStream.Read(storedFile, 0, Convert.ToInt32(writerStream.Length));
And here's how I read the file:
using (BinaryReader readFile = new BinaryReader(Delimon.Win32.IO.File.Open(filePath, Delimon.Win32.IO.FileMode.Open)))
{
byte[] cgmBytes = readFile.ReadBytes(Convert.ToInt32(readFile.BaseStream.Length));
hash = fileCheck.ComputeHash(cgmBytes);
}
And here's how the file is saved:
using (BinaryWriter aWriter = new BinaryWriter(Delimon.Win32.IO.File.Create(filePath)))
{
aWriter.Write(storedFile);
}
Any suggestions would be much appreciated.
Thx
The problem seems to have resolved itself by simply changing the point where the stream position is set:
writerStream.Flush();
writerStream.Position = 0;
storedFile = new Byte[writerStream.Length];
writerStream.Read(storedFile, 0, Convert.ToInt32(writerStream.Length));
In the previous code the Position was set after reading the stream length; now the position is set before reading the stream length. In either case the byte length DOES NOT change, but the saved file, when retrieved, now returns the identical byte length. Why? I'm not sure - setting the stream position does not affect the stream length, nor should it affect how a newly instantiated writer decides to save the byte array. Gremlins?...
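For what it's worth, a simpler way to snapshot the whole buffer (a sketch, assuming writerStream is the MemoryStream from the question) avoids the Position/Read interaction entirely: MemoryStream.ToArray() copies the complete contents regardless of the current Position, and it also sidesteps the fact that Stream.Read is not guaranteed to fill the buffer in a single call.
writerStream.Flush();
// ToArray() returns the full contents no matter where Position currently is.
storedFile = writerStream.ToArray();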

C# BaseStream returning same MD5 hash regardless of what is in the stream

The question says it all. This code
string hash = "";
using (var md5 = System.Security.Cryptography.MD5.Create())
{
hash = Convert.ToBase64String(md5.ComputeHash(streamReader.BaseStream));
}
will always return the same hash.
If I pass all of the data from the BaseStream into a MemoryStream, it gives a unique hash every time. The same goes for running
string hash = "";
using (var md5 = System.Security.Cryptography.MD5.Create())
{
hash = Convert.ToBase64String(md5.ComputeHash(
Encoding.ASCII.GetBytes(streamReader.ReadToEnd())));
}
The second one is actually faster, but I've heard it's bad practice.
My question is: what is the proper way to use ComputeHash(stream)? For me it always (and I mean always, even if I restart the program, meaning it's not just hashing the reference) returns the same hash, regardless of the data in the stream.
The Stream instance is likely positioned at the end of the stream. ComputeHash returns the hash for the bytes from the current position to the end of the stream. So if the current position is the end of the stream, it will return the hash of the empty input. Make sure that the Stream instance is positioned at the beginning of the stream.
I solved this issue by setting stream.Position = 0 before calling ComputeHash.
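A minimal sketch of that fix (streamReader is the reader from the question; rewinding first means ComputeHash sees the whole stream rather than the empty tail):
var stream = streamReader.BaseStream;
if (stream.CanSeek)
    stream.Position = 0;   // rewind before hashing

string hash;
using (var md5 = System.Security.Cryptography.MD5.Create())
{
    hash = Convert.ToBase64String(md5.ComputeHash(stream));
}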

Reading a Stream is changing the hash value of it

I have the following code, which reads a Stream to store its content as a string. Unfortunately, after the StreamReader has been used, the hash value of the Stream has changed. How is this possible? The Stream is read-only and thus can't be modified.
string content;
string hash = Cryptography.CalculateSHA1Hash(stream); // 5B006E35CF1838871FDC1E3DF52B0CB5A8A97274
using (StreamReader reader = new StreamReader(stream))
{
content = reader.ReadToEnd();
}
hash = Cryptography.CalculateSHA1Hash(stream); // DA39A3EE5E6B4B0D3255BFEF95601890AFD80709
The SHA1 value DA39A3EE5E6B4B0D3255BFEF95601890AFD80709 is the result of hashing an empty string. The call to Cryptography.CalculateSHA1Hash reads everything (from the current position to the end) from the stream and hashes it. There is no more data to read after the first call to Cryptography.CalculateSHA1Hash.
I would also guess that your StreamReader.ReadToEnd() returns an empty string for the same reason.
Are the 2 values always the same? It's possible that your method is creating a hash from the contents of the stream, and in the second instance the stream is at the end position so has no more data to be read.
If you seek the stream back to the beginning do you get consistent numbers?
Maybe the stream Position has changed (e.g. by ReadToEnd) and the digest is computed from the current Position?
It's only a guess since we can't help you much without seeing the code for Cryptography.CalculateSHA1Hash.
You have placed your StreamReader into a using block - a good thing. However, TextReader.Dispose by default calls Dispose on the underlying stream, which is likely what changes things here.
Try checking the hash from within the using block.
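A sketch combining both suggestions (Cryptography.CalculateSHA1Hash is the poster's own helper, assumed to hash from the current position; leaveOpen stops the reader from closing the stream when it is disposed):
string hash = Cryptography.CalculateSHA1Hash(stream);
stream.Seek(0, SeekOrigin.Begin);   // rewind before the reader consumes the stream

string content;
using (var reader = new StreamReader(stream, Encoding.UTF8,
                                     detectEncodingFromByteOrderMarks: true,
                                     bufferSize: 1024, leaveOpen: true))
{
    content = reader.ReadToEnd();
}

stream.Seek(0, SeekOrigin.Begin);   // rewind again before re-hashing
hash = Cryptography.CalculateSHA1Hash(stream);   // should now match the first value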
