I'm writing an application that needs to verify HMAC-SHA256 checksums. The code I currently have looks something like this:
static bool VerifyIntegrity(string secret, string checksum, string data)
{
// Verify HMAC-SHA256 Checksum
byte[] key = System.Text.Encoding.UTF8.GetBytes(secret);
byte[] value = System.Text.Encoding.UTF8.GetBytes(data);
byte[] checksum_bytes = System.Text.Encoding.UTF8.GetBytes(checksum);
using (var hmac = new HMACSHA256(key))
{
byte[] expected_bytes = hmac.ComputeHash(value);
return checksum_bytes.SequenceEqual(expected_bytes);
}
}
I know that this is susceptible to timing attacks.
Is there a message digest comparison function in the standard library? I realize I could write my own time hardened comparison method, but I have to believe that this is already implemented elsewhere.
EDIT: Original answer is below - still worth reading IMO, but regarding the timing attack...
The page you referenced gives some interesting points about compiler optimizations. Given that you know the two byte arrays will be the same length (assuming the size of the checksum isn't particularly secret, you can immediately return if the lengths are different) you might try something like this:
public static bool CompareArraysExhaustively(byte[] first, byte[] second)
{
if (first.Length != second.Length)
{
return false;
}
bool ret = true;
for (int i = 0; i < first.Length; i++)
{
ret = ret & (first[i] == second[i]);
}
return ret;
}
Now that still won't take the same amount of time for all inputs - if the two arrays are both in L1 cache for example, it's likely to be faster than if it has to be fetched from main memory. However, I suspect that is unlikely to cause a significant issue from a security standpoint.
Is this okay? Who knows. Different processors and different versions of the CLR may take different amounts of time for an & operation depending on the two operands. Basically this is the same as the conclusion of the page you referenced - that it's probably as good as we'll get in a portable way, but that it would require validation on every platform you try to run on.
At least the above code only uses relatively simple operations. I would personally avoid using LINQ operations here as there could be sneaky optimizations going on in some cases. I don't think there would be in this case - or they'd be easy to defeat - but you'd at least have to think about them. With the above code, there's at least a reasonably close relationship between the source code and IL - leaving "only" the JIT compiler and processor optimizations to worry about :)
Original answer
There's one significant problem with this: in order to provide the checksum, you have to have a string whose UTF-8 encoded form is the same as the checksum. There are plenty of byte sequences which simply don't represent UTF-8-encoded text. Basically, trying to encode arbitrary binary data as text using UTF-8 is a bad idea.
Base64, on the other hand, is basically designed for this:
static bool VerifyIntegrity(string secret, string checksum, string data)
{
// Verify HMAC-SHA256 Checksum
byte[] key = Encoding.UTF8.GetBytes(secret);
byte[] value = Encoding.UTF8.GetBytes(data);
byte[] checksumBytes = Convert.FromBase64String(checksum);
using (var hmac = new HMACSHA256(key))
{
byte[] expectedBytes = hmac.ComputeHash(value);
return checksumBytes.SequenceEqual(expectedBytes);
}
}
On the other hand, instead of using SequenceEqual on the byte array, you could Base64 encode the actual hash, and see whether that matches:
static bool VerifyIntegrity(string secret, string checksum, string data)
{
// Verify HMAC-SHA256 Checksum
byte[] key = Encoding.UTF8.GetBytes(secret);
byte[] value = Encoding.UTF8.GetBytes(data);
using (var hmac = new HMACSHA256(key))
{
return checksum == Convert.ToBase64String(hmac.ComputeHash(value));
}
}
I don't know of anything better within the framework. It wouldn't be too hard to write a specialized SequenceEqual operator for arrays (or general ICollection<T> implementations) which checked for equal lengths first... but given that the hashes are short, I wouldn't worry about that.
If you're worried about the timing of the SequenceEqual, you could always replace it with something like this:
checksum_bytes.Zip( expected_bytes, (a,b) => a == b ).Aggregate( true, (a,r) => a && r );
This returns the same result as SequenceEquals but always check every element before given an answer this less chance of revealing anything through a timing attack.
How it is susceptible to timing attacks? Your code works the same amount of time in the case of valid or invalid digest. And calculate digest/check digest looks like the easiest way to check this.
Related
I want to know how return values for strings works for strings in C#. In one of my functions, I generate html code and the string is really huge, I then return it from the function, and then insert it into the page. But I want to know should I pass a huge string as a return value, or just insert it into the page from the same function?
When C# returns a string, does it create a new string from the old one, and return that?
Thanks.
Strings (or any other reference type) are not copied when returning from a function, only value types are.
System.String is a reference type (class) and so passing as parameter and returning only involve the copying of a reference (32 or 64 bits).
The size of the string is not relevant.
Returning a string is a cheap operation - as mentioned it's purely a matter of returning 32 or 64 bits (4 or 8 bytes).
However, as Sten Petrov points out string + operations involve the creation of a new string, and can be a little expensive. If you wanted to save performance & memory I'd suggest doing something like this:
static int i = 0;
static void Main(string[] args)
{
while (Console.ReadLine() == "")
{
var pageSB = new StringBuilder();
foreach (var section in new[] { AddHeader(), AddContent(), AddFooter() })
for (int i = 0; i < section.Length; i++)
pageSB.Append(section[i]);
Console.Write(pageSB.ToString());
}
}
static StringBuilder AddHeader()
{
return new StringBuilder().Append("Hi ").AppendLine("World");
}
static StringBuilder AddContent()
{
return new StringBuilder()
.AppendFormat("This page has been viewed: {0} times\n", ++i);
}
static StringBuilder AddFooter()
{
return new StringBuilder().Append("Bye ").AppendLine("World");
}
Here we use the StringBuilders to hold a reference to all the strings we want to concat, and wait until the very end before joining them together. This'll save many unnecessary additions (which are memory and CPU heavy in comparison).
Of course, I doubt you'll actually see any need for this in practise - and if you do I'd spend some time learning about pooling etc. to help reduce the garbage created by all the string builders - and maybe consider creating a custom 'string holder' that suits your purposes better.
My goal is to create a method with the following requirements:
Output should be consistent across different app domains (but which are running the same version of the .Net framework)
Objects of different types should not generate the same hash
Collisions are extremely unlikely
The method will be called fairly frequently, so should not be too slow
The implementations that I'm considering look something like:
private static long GenerateHash<TKey>(TKey key)
{
long typeHash = typeof(TKey).GetHashCode();
long keyHash = key.GetHashCode();
return (typeHash << 32) + keyHash;
}
and
private static long GenerateHash<TKey>(TKey key)
{
using (var stream = new MemoryStream())
{
var formatter = new BinaryFormatter(); // Or other serialiser
formatter.Serialize(stream, key);
stream.Seek(0, SeekOrigin.Begin);
var hashAlgorithm = new SuitableHashAlgorithm(); // Not real class, need to find/write a hash algorithm that can compute 64 bit hashes...
var hash = hashAlgorithm.ComputeHash(stream);
return BitConverter.ToInt64(hash, 0);
}
}
Note, the possible nullness of key is not a consideration.
Any comments and potential pitfalls of these implementations welcomed along with any other possible ones.
Thanks
It appears that the requirements cannot be met with the stated method signature, thanks all for your comments, particularly #Marc Gravell.
I'll introduce a suitable interface with a UniqueId property which all keys will implement.
I was hoping to avoid this in order to maintain backwards compatibility, but hey ho, you can't always get what you want!
I have the following code to encode password field but it gets error when password field is longer than ten characters.
private string base64Encode(string sData)
{
try
{
byte[] encData_byte = new byte[sData.Length];
//encData_byte = System.Text.Encoding.UTF8.GetBytes(sData);
encData_byte = System.Text.Encoding.UTF8.GetBytes(sData);
string encodedData = Convert.ToBase64String(encData_byte);
return encodedData;
}
catch (Exception ex)
{
throw new Exception("Error in base64Encode" + ex.Message);
}
}
This is the code to decode the encoded value
public string base64Decode(string sData)
{
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
System.Text.Decoder utf8Decode = encoder.GetDecoder();
byte[] todecode_byte = Convert.FromBase64String(sData);
int charCount = utf8Decode.GetCharCount(todecode_byte, 0, todecode_byte.Length);
char[] decoded_char = new char[charCount];
utf8Decode.GetChars(todecode_byte, 0, todecode_byte.Length, decoded_char, 0);
string result = new String(decoded_char);
return result;
}
That code itself shouldn't be failing - but it's not actually providing any protection for the password. I'm not sure what kind of "encoding" you're really trying to do, but this is not the way to do it. Issues:
Even if this worked, it's terrible from a security point of view - this isn't encryption, hashing, or anything like it
You're allocating a new byte array for no good reason - why?
You're catching Exception, which is almost always a bad idea
Your method ignores .NET naming conventions
If you can explain to us what the bigger picture is, we may be able to suggest a much better approach.
My guess is that the exception you're seeing is actually coming when you call Convert.FromBase64String, i.e. in the equivalent decoding method, which you haven't shown us.
I think you will need to modify your code.
These are 2 links which gives more details -
Encrypt and decrypt a string
http://msdn.microsoft.com/en-us/library/system.security.cryptography.rsacryptoserviceprovider.aspx
They are correct about this not being secure. But the question you asked was why is the code failing. Base64 strings usually take up more space than the string they encode. You are trying to store the same amount of data in fewer characters (64 instead of 255), so it expands the string. Since you are dimensioning the array based on the size of the string, any time the base 64 string exceeds the size of the base 255 string, you get an error. Instead of writing the code yourself, use the built in converters.
System.Convert.ToBase64String()
System.Convert.FromBase64String()
But as I mentioned before, this is not secure, so only use this if you are trying to do something with a legacy system, and you need to preserve functionality for some reason.
I have faced aweird problem with the following code, the code below suppose to stop after one iteration, but it just keep going. However, if I remove the last "result_bytes = md5.ComputeHash(orig_bytes);" then it will work. Does anyone face similar problem before?
MD5 md5;
byte[] orig_bytes;
byte[] result_bytes;
Dictionary<byte[], string> hashes = new Dictionary<byte[], string>();
string input = "NEW YORK";
result_bytes = UnicodeEncoding.Default.GetBytes("HELLO");
while (!hashes.ContainsKey(result_bytes))
{
md5 = new MD5CryptoServiceProvider();
orig_bytes = UnicodeEncoding.Default.GetBytes(input);
result_bytes = md5.ComputeHash(orig_bytes);
hashes.Add(result_bytes, input);
Console.WriteLine(BitConverter.ToString(result_bytes));
Console.WriteLine(hashes.ContainsKey(result_bytes));
result_bytes = md5.ComputeHash(orig_bytes);
}
When you reassign result_bytes to a new value in the last line, you have a new reference to a byte array, which is not equal to the one in the collection, therefore hashes.ContainsKey returns false.
You're assuming that byte arrays override Equals and GetHashCode to compare for equality: they don't. They just use the default identity test - so without the extra assignment at the end, you're just checking whether the exact key object you've just added is still in the dictionary - which of course it is.
One way round this would be to store a reversible string representation of the hash (e.g. using base64), instead of the hash itself. Or write your own implementation of IEqualityComparer<byte[]> and pass that to the Dictionary constructor, so that it uses that implementation to find the hash code of byte arrays and compare them with each other.
In short: this has nothing to do with MD5, and everything to do with the fact that
Console.WriteLine(new byte[0].Equals(new byte[0]));
will print False :)
I'm trying to do some parsing that will be easier using regular expressions.
The input is an array (or enumeration) of bytes.
I don't want to convert the bytes to chars for the following reasons:
Computation efficiency
Memory consumption efficiency
Some non-printable bytes might be complex to convert to chars. Not all the bytes are printable.
So I can't use Regex.
The only solution I know, is using Boost.Regex (which works on bytes - C chars), but this is a C++ library that wrapping using C++/CLI will take considerable work.
How can I use regular expressions on bytes in .NET directly, without working with .NET strings and chars?
Thank you.
There is a bit of impedance mismatch going on here. You want to work with Regular expressions in .Net which use strings (multi-byte characters), but you want to work with single byte characters. You can't have both at the same time using .Net as per usual.
However, to break this mismatch down, you could deal with a string in a byte oriented fashion and mutate it. The mutated string can then act as a re-usable buffer. In this way you will not have to convert bytes to chars, or convert your input buffer to a string (as per your question).
An example:
//BLING
byte[] inputBuffer = { 66, 76, 73, 78, 71 };
string stringBuffer = new string('\0', 1000);
Regex regex = new Regex("ING", RegexOptions.Compiled);
unsafe
{
fixed (char* charArray = stringBuffer)
{
byte* buffer = (byte*)(charArray);
//Hard-coded example of string mutation, in practice you would
//loop over your input buffers and regex\match so that the string
//buffer is re-used.
buffer[0] = inputBuffer[0];
buffer[2] = inputBuffer[1];
buffer[4] = inputBuffer[2];
buffer[6] = inputBuffer[3];
buffer[8] = inputBuffer[4];
Console.WriteLine("Mutated string:'{0}'.",
stringBuffer.Substring(0, inputBuffer.Length));
Match match = regex.Match(stringBuffer, 0, inputBuffer.Length);
Console.WriteLine("Position:{0} Length:{1}.", match.Index, match.Length);
}
}
Using this technique you can allocate a string "buffer" which can be re-used as the input to Regex, but you can mutate it with your bytes each time. This avoids the overhead of converting\encoding your byte array into a new .Net string each time you want to do a match. This could prove to be very significant as I have seen many an algorithm in .Net try to go at a million miles an hour only to be brought to its knees by string generation and the subsequent heap spamming and time spent in GC.
Obviously this is unsafe code, but it is .Net.
The results of the Regex will generate strings though, so you have an issue here. I'm not sure if there is a way of using Regex that will not generate new strings. You can certainly get at the match index and length information but the string generation violates your requirements for memory efficiency.
Update
Actually after disassembling Regex\Match\Group\Capture, it looks like it only generates the captured string when you access the Value property, so you may at least not be generating strings if you only access index and length properties. However, you will be generating all the supporting Regex objects.
Well, if I faced this problem, I would DO the C++/CLI wrapper, except I'd create specialized code for what I want to achieve. Eventually develop the wrapper with time to do general things, but this just an option.
The first step is to wrap the Boost::Regex input and output only. Create specialized functions in C++ that do all the stuff you want and use CLI just to pass the input data to the C++ code and then fetch the result back with the CLI. This doesn't look to me like too much work to do.
Update:
Let me try to clarify my point. Even though I may be wrong, I believe you wont be able to find any .NET Binary Regex implementation that you could use. That is why - whether you like it or not - you will be forced to choose between CLI wrapper and bytes-to-chars conversion to use .NET's Regex. In my opinion the wrapper is better choice, because it will be working faster. I did not do any benchmarking, this is just an assumption based on:
Using wrapper you just have to cast
the pointer type (bytes <-> chars).
Using .NET's Regex you have to
convert each byte of the input.
As an alternative to using unsafe, just consider writing a simple, recursive comparer like:
static bool Evaluate(byte[] data, byte[] sequence, int dataIndex=0, int sequenceIndex=0)
{
if (sequence[sequenceIndex] == data[dataIndex])
{
if (sequenceIndex == sequence.Length - 1)
return true;
else if (dataIndex == data.Length - 1)
return false;
else
return Evaluate(data, sequence, dataIndex + 1, sequenceIndex + 1);
}
else
{
if (dataIndex < data.Length - 1)
return Evaluate(data, sequence, dataIndex+1, 0);
else
return false;
}
}
You could improve efficiency in a number of ways (i.e. seeking the first byte match instead of iterating, etc.) but this could get you started... hope it helps.
I personally went a different approach and wrote a small state machine that can be extended. I believe if parsing protocol data this is much more readable than regex.
bool ParseUDSResponse(PassThruMsg rxMsg, UDScmd.Mode txMode, byte txSubFunction, out UDScmd.Response functionResponse, out byte[] payload)
{
payload = new byte[0];
functionResponse = UDScmd.Response.UNKNOWN;
bool positiveReponse = false;
var rxMsgBytes = rxMsg.GetBytes();
//Iterate the reply bytes to find the echod ECU index, response code, function response and payload data if there is any
//If we could use some kind of HEX regex this would be a bit neater
//Iterate until we get past any and all null padding
int stateMachine = 0;
for (int i = 0; i < rxMsgBytes.Length; i++)
{
switch (stateMachine)
{
case 0:
if (rxMsgBytes[i] == 0x07) stateMachine = 1;
break;
case 1:
if (rxMsgBytes[i] == 0xE8) stateMachine = 2;
else return false;
case 2:
if (rxMsgBytes[i] == (byte)txMode + (byte)OBDcmd.Reponse.SUCCESS)
{
//Positive response to the requested mode
positiveReponse = true;
}
else if(rxMsgBytes[i] != (byte)OBDcmd.Reponse.NEGATIVE_RESPONSE)
{
//This is an invalid response, give up now
return false;
}
stateMachine = 3;
break;
case 3:
functionResponse = (UDScmd.Response)rxMsgBytes[i];
if (positiveReponse && rxMsgBytes[i] == txSubFunction)
{
//We have a positive response and a positive subfunction code (subfunction is reflected)
int payloadLength = rxMsgBytes.Length - i;
if(payloadLength > 0)
{
payload = new byte[payloadLength];
Array.Copy(rxMsgBytes, i, payload, 0, payloadLength);
}
return true;
} else
{
//We had a positive response but a negative subfunction error
//we return the function error code so it can be relayed
return false;
}
default:
return false;
}
}
return false;
}