Secure Equals: Why does inequality still consistently take less time? - c#

I have a method that compares two byte arrays for equality, with the major caveat that it does not fail and exit early when it detects inequality. The code is used to compare Cross-Site Request Forgery tokens, and to avoid (as much as possible) the ability to use timing to attack the key. I wish I could find the link to the paper that discusses the attack in detail, but the important thing is that the code I have still shows a statistically measurable bias toward returning sooner when the two byte arrays are not equal, although it is an order of magnitude better. So without further ado, here is the code:
public static bool SecureEquals(byte[] original, byte[] potential)
{
    // They should be the same size, but we don't want to throw an
    // exception if we are wrong.
    bool isEqual = original.Length == potential.Length;
    int maxLength = Math.Max(original.Length, potential.Length);
    for (int i = 0; i < maxLength; i++)
    {
        byte originalByte = (i < original.Length) ? original[i] : (byte)0;
        byte potentialByte = (i < potential.Length) ? potential[i] : (byte)0;
        isEqual = isEqual && (originalByte == potentialByte);
    }
    return isEqual;
}
The average timing for unequal tokens is consistently 10-25ms shorter than for equal tokens (depending on garbage collection cycles). This is precisely what I want to avoid. If the relative timing were equal, or the average timing swapped from run to run, I would be happy. The problem is that we consistently run shorter for unequal tokens. In contrast, if we stopped the loop on the first unequal byte, we could see up to an 80x difference in timing.
While this equality check is a major improvement over the typical eager return, it is still not good enough. Essentially, I don't want either equality or inequality to consistently return faster. If I could get the results into the range where garbage collection cycles mask any consistent bias, I would be happy.
Does anyone have a clue what is causing the timing bias toward inequality being faster? At first I thought it was the ternary operator returning either an array access or a constant when the arrays were of unequal size, but I still get this bias when the two arrays are the same size.
NOTE: As requested, here are the links to the articles on timing attacks:
http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf (official paper, linked to from the blog post below)
http://codahale.com/a-lesson-in-timing-attacks/ (talks about the failure in Java's library)
http://en.wikipedia.org/wiki/Timing_attack (more general, but not as complete)

This line could be causing a problem:
isEqual = isEqual && (originalByte == potentialByte);
That won't bother evaluating the originalByte == potentialByte subexpression if isEqual is already false. You don't want the short-circuiting here, so change it to:
isEqual = isEqual & (originalByte == potentialByte);
EDIT: Note that you're already effectively leaking information about the size of the original data, because the method runs in constant time until the potential array exceeds the original array in size, at which point the time increases. It's probably quite tricky to avoid this... so I would go for the "throw an exception if they're not the right size" approach, which acknowledges the leak explicitly.
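For what it's worth, a minimal sketch of that variant (my illustration, not code from this answer): reject mismatched lengths up front and compare the rest without short-circuiting.
public static bool SecureEquals(byte[] original, byte[] potential)
{
    // Acknowledge the length leak explicitly instead of trying to hide it.
    if (original.Length != potential.Length)
        throw new ArgumentException("Token lengths must match.");

    bool isEqual = true;
    for (int i = 0; i < original.Length; i++)
    {
        // Non-short-circuiting & so every byte is examined on every call.
        isEqual = isEqual & (original[i] == potential[i]);
    }
    return isEqual;
}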
EDIT: Just to go over the idea I included in comments:
// Let's assume you'll never really need more than this
private static readonly byte[] LargeJunkArray = new byte[1024 * 32];

public static bool SecureEquals(byte[] original, byte[] potential)
{
    // Reveal that the original array will never be more than 32K.
    // This is unlikely to be particularly useful.
    if (potential.Length > LargeJunkArray.Length)
    {
        return false;
    }
    byte[] copy = new byte[potential.Length];
    int bytesFromOriginal = Math.Min(original.Length, copy.Length);
    // Always copy the same amount of data
    Array.Copy(original, 0, copy, 0, bytesFromOriginal);
    Array.Copy(LargeJunkArray, 0, copy, bytesFromOriginal,
               copy.Length - bytesFromOriginal);
    bool isEqual = original.Length == potential.Length;
    for (int i = 0; i < copy.Length; i++)
    {
        isEqual = isEqual & (copy[i] == potential[i]);
    }
    return isEqual;
}
Note that this assumes that Array.Copy will take the same amount of time to copy the same amount of data from any source - that may well not be true, based on CPU caches...

In case .NET is smart enough to optimize the boolean flag away, have you tried introducing a counter variable that counts the number of unequal bytes, returning true if and only if that number is zero?
public static bool SecureEquals(byte[] original, byte[] potential)
{
    // They should be the same size, but we don't want to throw an
    // exception if we are wrong.
    int maxLength = Math.Max(original.Length, potential.Length);
    int equals = maxLength - Math.Min(original.Length, potential.Length);
    for (int i = 0; i < maxLength; i++)
    {
        byte originalByte = (i < original.Length) ? original[i] : (byte)0;
        byte potentialByte = (i < potential.Length) ? potential[i] : (byte)0;
        equals += originalByte != potentialByte ? 1 : 0;
    }
    return equals == 0;
}

I have no idea what is causing the difference -- a profiler seems like it would be a good tool to have here. But I'd consider going with a different approach altogether.
What I'd do in this situation is build a timer into the method so that it can measure its own timing when given two equal keys. (Use the Stopwatch class.) It should compute the mean and standard deviation of the success timing and stash that away in some global state. When you get an unequal key, you can then measure how much time it took you to detect the unequal key, and then make up the difference by spinning a tight loop until the appropriate amount of time has elapsed. You can choose a random time to spin consistent with a normal distribution based on the mean and standard deviation you've already computed.
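A rough sketch of that mechanism (my own illustration, not code from this answer; the class name, the Box-Muller draw, and the constant-time inner loop are all assumptions):
using System;
using System.Diagnostics;

public static class MaskedCompare
{
    static readonly object Gate = new object();
    static readonly Random Rng = new Random();
    static long _count;
    static double _sum, _sumSq;   // success timings, in Stopwatch ticks

    public static bool SecureEquals(byte[] original, byte[] potential)
    {
        var sw = Stopwatch.StartNew();
        bool isEqual = ConstantTimeEquals(original, potential);
        if (isEqual)
        {
            // Remember how long successful comparisons take.
            lock (Gate)
            {
                _count++;
                _sum += sw.ElapsedTicks;
                _sumSq += (double)sw.ElapsedTicks * sw.ElapsedTicks;
            }
        }
        else
        {
            double target = 0;
            lock (Gate)
            {
                if (_count >= 2)
                {
                    double mean = _sum / _count;
                    double stdDev = Math.Sqrt(Math.Max(0, _sumSq / _count - mean * mean));
                    // Box-Muller: draw a "success-like" duration from N(mean, stdDev).
                    double u1 = 1.0 - Rng.NextDouble(), u2 = Rng.NextDouble();
                    target = mean + stdDev * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
                }
            }
            // Spin until the failed comparison has taken about as long as a success.
            while (sw.ElapsedTicks < target) { }
        }
        return isEqual;
    }

    static bool ConstantTimeEquals(byte[] a, byte[] b)
    {
        bool isEqual = a.Length == b.Length;
        int maxLength = Math.Max(a.Length, b.Length);
        for (int i = 0; i < maxLength; i++)
        {
            byte x = (i < a.Length) ? a[i] : (byte)0;
            byte y = (i < b.Length) ? b[i] : (byte)0;
            isEqual = isEqual & (x == y);
        }
        return isEqual;
    }
}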
The nice thing about this approach is that now you have a general mechanism that you can re-use when defeating other timing attacks, and the meaning of the code is obvious; no one is going to come along and try to optimize it away.
And like any other "Security" code, get it reviewed by a security professional before you ship it.

Related

Why is HashSet<Point> so much slower than HashSet<string>?

I wanted to store some pixel locations without allowing duplicates, so the first thing that comes to mind is HashSet<Point> or a similar class. However, this seems to be very slow compared to something like HashSet<string>.
For example, this code:
HashSet<Point> points = new HashSet<Point>();
using (Bitmap img = new Bitmap(1000, 1000))
{
    for (int x = 0; x < img.Width; x++)
    {
        for (int y = 0; y < img.Height; y++)
        {
            points.Add(new Point(x, y));
        }
    }
}
takes about 22.5 seconds.
While the following code (which is not a good choice for obvious reasons) takes only 1.6 seconds:
HashSet<string> points = new HashSet<string>();
using (Bitmap img = new Bitmap(1000, 1000))
{
    for (int x = 0; x < img.Width; x++)
    {
        for (int y = 0; y < img.Height; y++)
        {
            points.Add(x + "," + y);
        }
    }
}
So, my questions are:
Is there a reason for that? I checked this answer, but 22.5 sec is way more than the numbers shown in that answer.
Is there a better way to store points without duplicates?
There are two perf problems induced by the Point struct, something you can see when you add Console.WriteLine(GC.CollectionCount(0)); to the test code. You'll see that the Point test requires ~3720 collections but the string test only needs ~18. That is not free. When you see a value type induce so many collections, you need to conclude "uh-oh, too much boxing".
At issue is that HashSet<T> needs an IEqualityComparer<T> to get its job done. Since you did not provide one, it falls back to the one returned by EqualityComparer<T>.Default. That comparer does a good job for string, which implements IEquatable<string>. But not for Point; it is a type that harks back to .NET 1.0 and never got the generics love. All it can do is use the Object methods.
The other issue is that Point.GetHashCode() does not do a stellar job in this test, too many collisions, so it hammers Object.Equals() pretty heavily. String has an excellent GetHashCode implementation.
You can solve both problems by providing the HashSet with a good comparer. Like this one:
class PointComparer : IEqualityComparer<Point> {
    public bool Equals(Point x, Point y) {
        return x.X == y.X && x.Y == y.Y;
    }
    public int GetHashCode(Point obj) {
        // Perfect hash for practical bitmaps, their width/height is never >= 65536
        return (obj.Y << 16) ^ obj.X;
    }
}
And use it:
HashSet<Point> list = new HashSet<Point>(new PointComparer());
And it is now about 150 times faster, easily beating the string test.
The main reason for the performance drop is all the boxing going on (as already explained in Hans Passant's answer).
Apart from that, the hash code algorithm worsens the problem, because it causes more calls to Equals(object obj) thus increasing the amount of boxing conversions.
Also note that the hash code of Point is computed by x ^ y. This produces very little dispersion in your data range, and therefore the buckets of the HashSet are overpopulated — something that doesn't happen with string, where the dispersion of the hashes is much larger.
You can solve that problem by implementing your own Point struct (trivial) and using a better hash algorithm for your expected data range, e.g. by shifting the coordinates:
(x << 16) ^ y
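A minimal sketch of such a struct (my own illustration; the name and layout are arbitrary):
public struct PixelPoint : IEquatable<PixelPoint>
{
    public readonly int X;
    public readonly int Y;

    public PixelPoint(int x, int y) { X = x; Y = y; }

    public bool Equals(PixelPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is PixelPoint && Equals((PixelPoint)obj);
    }

    public override int GetHashCode()
    {
        // Unique hash for any coordinate pair below 65536, so no collisions
        // for a 1000x1000 bitmap.
        return (X << 16) ^ Y;
    }
}
Because it implements IEquatable<PixelPoint>, EqualityComparer<PixelPoint>.Default can call it without boxing, so HashSet<PixelPoint> works well even without a custom comparer.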
For some good advice when it comes to hash codes, read Eric Lippert's blog post on the subject.

Fastest way to check C# BitArray for non-zero value

I'm trying to rapidly detect collisions between BitArrays in C# (using the AND boolean operation), which results in a single BitArray representing the overlapping region.
Obviously, if the resulting array only consists of zeroes, there is no collision. What is the fastest way to check this? Simple iteration is too slow. I don't care where collisions are, or how many there are--only that there's a nonzero value somewhere in the array.
It seems like there should be some sort of fast case along the lines of "cast the entire bit array to an int value" (which specifically won't work because BitArrays are variable size), but I can't figure one out.
Do you need the resulting BitArray of the And() method? If not, you could just loop through the input arrays and return true on the first collision.
bool collision(BitArray a1, BitArray a2) {
    if (a1 == null || a2 == null) throw new ArgumentException("arguments cannot be null");
    if (a1.Count != a2.Count) throw new ArgumentException("arrays don't have same length");

    for (int i = 0; i < a1.Count; i++)
        if (a1[i] && a2[i]) return true;

    return false;
}
This way you can avoid looping over the array twice, i.e. once for the And() and once for the check. On average you will traverse only half of the array once, which speeds things up by roughly a factor of four.
Another way is, as @itsme86 suggested, to use ints instead of BitArrays:
int a1, a2;
bool collision = (a1 & a2) > 0;
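If you have to keep the BitArray type, a related trick (my own sketch, not from either answer) is to copy both arrays into int[] buffers with BitArray.CopyTo and AND them word by word, so each loop iteration tests 32 bits at once and can still bail out early:
static bool Collision(BitArray a1, BitArray a2)
{
    if (a1 == null || a2 == null) throw new ArgumentNullException();
    if (a1.Count != a2.Count) throw new ArgumentException("arrays don't have same length");

    int words = (a1.Count + 31) / 32;
    int[] w1 = new int[words];
    int[] w2 = new int[words];
    a1.CopyTo(w1, 0);
    a2.CopyTo(w2, 0);

    // Mask off the unused bits of the last word, in case they are not zero.
    int rem = a1.Count % 32;
    if (rem != 0)
    {
        int mask = (1 << rem) - 1;
        w1[words - 1] &= mask;
        w2[words - 1] &= mask;
    }

    for (int i = 0; i < words; i++)
        if ((w1[i] & w2[i]) != 0)   // any shared set bit in this 32-bit word
            return true;

    return false;
}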
Just in case someone is still searching for a nice solution, since it's missing here:
BitArrays are initialized with zeros, so you can simply compare a fresh BitArray (of the same length) with the AND-result of your two BitArrays (note the "!", inverting the bool):
if (!new BitArray(bitCountOfResult).Equals(result)) {
    // We hit!
}
Fast, and it works for me. Make sure to avoid the LINQ approach; it's very slow.

Deleting from array, mirrored (strange) behavior

The title may seem a little odd, because I have no idea how to describe this in one sentence.
For the course Algorithms we have to micro-optimize some things; one of them is finding out how deleting from an array works. The assignment is to delete something from an array and re-align the contents so that there are no gaps. I think it is quite similar to how std::vector::erase works in C++.
Because I like the idea of understanding everything low-level, I went a little further and tried to bench my solutions. This presented some weird results.
At first, here is a little code that I used:
class Test {
    Stopwatch sw;
    Obj[] objs;

    public Test() {
        this.sw = new Stopwatch();
        this.objs = new Obj[1000000];
        // Fill objs
        for (int i = 0; i < objs.Length; i++) {
            objs[i] = new Obj(i);
        }
    }

    public void test() {
        // Time deletion
        sw.Restart();
        deleteValue(400000, objs);
        sw.Stop();
        // Show timings
        Console.WriteLine(sw.Elapsed);
    }

    // Delete function
    // value is the to-search-for item in the list of objects
    private static void deleteValue(int value, Obj[] list) {
        for (int i = 0; i < list.Length; i++) {
            if (list[i].Value == value) {
                for (int j = i; j < list.Length - 1; j++) {
                    list[j] = list[j + 1];
                    //if (list[j + 1] == null) {
                    //    break;
                    //}
                }
                list[list.Length - 1] = null;
                break;
            }
        }
    }
}
I would just create this class and call the test() method. I did this in a loop for 25 times.
My findings:
The first round takes a lot longer than the other 24; I think this is because of caching, but I am not sure.
When I use a value that is in the start of the list, it has to move more items in memory than when I use a value at the end, though it still seems to take less time.
Benchtimes differ quite a bit.
When I enable the commented if, performance goes up (10-20%) even if the value I search for is almost at the end of the list (which means the if goes off a lot of times without actually being useful).
I have no idea why these things happen. Can someone explain (some of) them? And if someone who is a pro at this sees it, where can I find more info on doing this in the most efficient way?
Edit after testing:
I did some testing and found some interesting results. I run the test on an array with a size of a million items, filled with a million objects. I run that 25 times and report the cumulative time in milliseconds. I do that 10 times and take the average of that as a final value.
When I run the test with my function described just above here I get a score of:
362,1
When I run it with the answer of dbc I get a score of:
846,4
So mine was faster, but then I started to experiment with a half-empty array and things started to get weird. To get rid of the inevitable NullReferenceExceptions I added an extra check to the if (thinking it would hurt performance a bit more), like so:
if (fromItem != null && fromItem.Value != value)
list[to++] = fromItem;
This seemed to not only work, but improve performance dramatically! Now I get a score of:
247,9
The weird thing is that the scores seem too low to be true, but they sometimes spike; this is the set I took the average from:
94, 26, 966, 36, 632, 95, 47, 35, 109, 439
So the extra evaluation seems to improve my performance, despite doing an extra check. How is this possible?
You are using Stopwatch to time your method. This calculates the total clock time taken during your method call, which could include the time required for .Net to initially JIT your method, interruptions for garbage collection, or slowdowns caused by system loads from other processes. Noise from these sources will likely dominate noise due to cache misses.
This answer gives some suggestions as to how you can minimize some of the noise from garbage collection or other processes. To eliminate JIT noise, you should call your method once without timing it -- or show the time taken by the first call in a separate column in your results table since it will be so different. You might also consider using a proper profiler which will report exactly how much time your code used exclusive of "noise" from other threads or processes.
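As a rough sketch of those suggestions (my own illustration, reusing the Obj class and deleteValue method from the question): pay the JIT cost with one untimed call and collect garbage before the timed run.
static TimeSpan TimeDeletion(int value)
{
    // Untimed warm-up call so the JIT cost does not land in the measurement.
    deleteValue(value, CreateFilledArray(1000000));

    Obj[] objs = CreateFilledArray(1000000);

    // Clean up the allocations above so a GC they triggered doesn't interrupt the timed run.
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    var sw = Stopwatch.StartNew();
    deleteValue(value, objs);
    sw.Stop();
    return sw.Elapsed;
}

static Obj[] CreateFilledArray(int size)
{
    var objs = new Obj[size];
    for (int i = 0; i < size; i++)
        objs[i] = new Obj(i);
    return objs;
}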
Finally, I'll note that your algorithm to remove matching items from an array and shift everything else down uses a nested loop, which is not necessary and will access items in the array after the matching index twice. The standard algorithm looks like this:
public static void RemoveFromArray(this Obj[] array, int value)
{
    int to = 0;
    for (int from = 0; from < array.Length; from++)
    {
        var fromItem = array[from];
        if (fromItem.Value != value)
            array[to++] = fromItem;
    }
    for (; to < array.Length; to++)
    {
        array[to] = default(Obj);
    }
}
However, instead of hand-shifting the remaining elements, you might experiment with using Array.Copy() in your version, since (I believe) it does the copying internally in unmanaged code.
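For example, a hypothetical variant of the original deleteValue (my sketch): find the match, then let Array.Copy shift the tail down in one call instead of a hand-written inner loop.
private static void DeleteValueWithCopy(int value, Obj[] list)
{
    for (int i = 0; i < list.Length; i++)
    {
        if (list[i].Value == value)
        {
            // Shift everything after index i one slot to the left in a single call.
            Array.Copy(list, i + 1, list, i, list.Length - i - 1);
            list[list.Length - 1] = null;
            break;
        }
    }
}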

Dynamic compilation for performance

I have an idea of how I can improve the performance with dynamic code generation, but I'm not sure which is the best way to approach this problem.
Suppose I have a class
class Calculator
{
    int Value1;
    int Value2;
    //..........
    int ValueN;

    void DoCalc()
    {
        if (Value1 > 0)
        {
            DoValue1RelatedStuff();
        }
        if (Value2 > 0)
        {
            DoValue2RelatedStuff();
        }
        //....
        //....
        //....
        if (ValueN > 0)
        {
            DoValueNRelatedStuff();
        }
    }
}
The DoCalc method is at the lowest level and it is called many times during calculation. Another important aspect is that ValueN are only set at the beginning and do not change during calculation. So many of the ifs in the DoCalc method are unnecessary, as many of ValueN are 0. So I was hoping that dynamic code generation could help to improve performance.
For instance if I create a method
void DoCalc_Specific()
{
    const int Value1 = 0;
    const int Value2 = 0;
    const int ValueN = 1;

    if (Value1 > 0)
    {
        DoValue1RelatedStuff();
    }
    if (Value2 > 0)
    {
        DoValue2RelatedStuff();
    }
    ....
    ....
    ....
    if (ValueN > 0)
    {
        DoValueNRelatedStuff();
    }
}
and compile it with optimizations switched on, the C# compiler is smart enough to keep only the necessary stuff. So I would like to create such a method at run time, based on the values of ValueN, and use the generated method during the calculations.
I guess that I could use expression trees for that, but expression trees work only with simple lambda functions, so I cannot use things like if, while, etc. inside the function body. So in this case I would need to restructure the method appropriately.
Another possibility is to create the necessary code as a string and compile it dynamically. But it would be much better for me if I could take the existing method and modify it accordingly.
There's also Reflection.Emit, but I don't want to stick with it as it would be very difficult to maintain.
BTW. I'm not restricted to C#. So I'm open to suggestions of programming languages that are best suited for this kind of problem. Except for LISP for a couple of reasons.
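A minimal sketch of the expression-tree route (an illustration only; the arithmetic is a made-up stand-in for the real per-value work): the if decisions are made once while building the tree, so the compiled delegate contains no branches at all.
using System;
using System.Linq.Expressions;

static class SpecializedCalc
{
    // Build a delegate containing only the terms whose value is non-zero.
    // The "value * 0.01" term stands in for the real Value1-related work;
    // the point is that the branch decisions happen here, once, and not
    // inside the compiled delegate.
    public static Func<double, double> Compile(int[] values)
    {
        ParameterExpression input = Expression.Parameter(typeof(double), "input");
        Expression body = input;

        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] > 0)
            {
                body = Expression.Add(body, Expression.Constant(values[i] * 0.01));
            }
        }

        return Expression.Lambda<Func<double, double>>(body, input).Compile();
    }
}

// Usage: compile once after the values are fixed, then call in the hot loop.
// var calc = SpecializedCalc.Compile(new[] { 0, 2, 0, 5 });
// double result = calc(0.0);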
One important clarification. DoValue1RelatedStuff() is not a method call in my algorithm. It's just some formula-based calculation and it's pretty fast. I should have written it like this
if (Value1 > 0)
{
// Do Value1 Related Stuff
}
I have run some performance tests, and I can see that with two ifs, one of which is disabled, the optimized method is about 2 times faster than the one with the redundant if.
Here's the code I used for testing:
public class Program
{
    static void Main(string[] args)
    {
        int x = 0, y = 2;

        var if_st = DateTime.Now.Ticks;
        for (var i = 0; i < 10000000; i++)
        {
            WithIf(x, y);
        }
        var if_et = DateTime.Now.Ticks - if_st;
        Console.WriteLine(if_et.ToString());

        var noif_st = DateTime.Now.Ticks;
        for (var i = 0; i < 10000000; i++)
        {
            Without(x, y);
        }
        var noif_et = DateTime.Now.Ticks - noif_st;
        Console.WriteLine(noif_et.ToString());

        Console.ReadLine();
    }

    static double WithIf(int x, int y)
    {
        var result = 0.0;
        for (var i = 0; i < 100; i++)
        {
            if (x > 0)
            {
                result += x * 0.01;
            }
            if (y > 0)
            {
                result += y * 0.01;
            }
        }
        return result;
    }

    static double Without(int x, int y)
    {
        var result = 0.0;
        for (var i = 0; i < 100; i++)
        {
            result += y * 0.01;
        }
        return result;
    }
}
I would usually not even think about such an optimization. How much work does DoValueXRelatedStuff() do? More than 10 to 50 processor cycles? Yes? That means you are going to build quite a complex system to save less than 10% execution time (and this seems quite optimistic to me). This can easily go down to less than 1%.
Is there no room for other optimizations? Better algorithms? And do you really need to eliminate single branches taking only a single processor cycle (if the branch prediction is correct)? Yes? Shouldn't you then think about writing your code in assembler or something else more machine-specific, instead of using .NET?
Could you give the order of N, the complexity of a typical method, and the ratio of expressions usually evaluating to true?
It would surprise me to find a scenario where the overhead of evaluating the if statements is worth the effort to dynamically emit code.
Modern CPUs support branch prediction and branch predication, which makes the overhead of branches in small segments of code approach zero.
Have you tried to benchmark two hand-coded versions of the code, one that has all the if-statements in place but provides zero values for most, and one that removes all of those same if branches?
If you are really into code optimisation - before you do anything - run the profiler! It will show you where the bottleneck is and which areas are worth optimising.
Also - if the language choice is not limited (except for LISP) then nothing will beat assembler in terms of performance ;)
I remember achieving some performance magic by rewriting some inner functions (like the one you have) using assembler.
Before you do anything, do you actually have a problem?
i.e. does it run long enough to bother you?
If so, find out what is actually taking time, not what you guess. This is the quick, dirty, and highly effective method I use to see where time goes.
Now, you are talking about interpreting versus compiling. Interpreted code is typically 1-2 orders of magnitude slower than compiled code. The reason is that interpreters are continually figuring out what to do next, and then forgetting, while compiled code just knows.
If you are in this situation, then it may make sense to pay the price of translating so as to get the speed of compiled code.

string.Format() parameters

How many parameters can you pass to a string.Format() method?
There must be some sort of theoretical or enforced limit on it. Is it based on the limits of the params[] type or the memory usage of the app that is using it or something else entirely?
OK, I emerge from hiding... I used the following program to verify what was going on, and while Marc pointed out that a string like "{0}{1}{2}...{2147483647}" would exceed the 2 GiB memory limit before the argument list does, my findings didn't match yours. Thus the hard limit on the number of parameters you can put in a string.Format method call has to be 107,713,904.
int i = 0;
long sum = 0;
while (sum < int.MaxValue)
{
    var s = sizeof(char) * ("{" + i + "}").Length;
    sum += s; // pseudo append
    ++i;
}
Console.WriteLine(i);
Console.ReadLine();
Love the discussion people!
Not as far as I know...
well, the theoretical limit would be the int32 limit for the array, but you'd hit the string length limit long before that, I guess...
Just don't go mad with it ;-p It may be better to write lots of small fragments to (for example) a file or response, than one huge hit.
edit - it looked like there was a limit in the IL (0xf4240), but apparently this isn't quite as it appears; I can make it get quite large (2^24) before I simply run out of system memory...
Update; it seems to me that the bounding point is the format string... those {1000001}{1000002} add up... a quick bit of math (below) shows that the maximum useful number of arguments we can use is 206,449,129:
long remaining = 2147483647; // max theoretical format arg length
long count = 10; // i.e. {0}-{9}
long len = 1;
int total = 0;
while (remaining >= 0) {
    for (int i = 0; i < count && remaining >= 0; i++) {
        total++;
        remaining -= len + 2; // allow for {}
    }
    count *= 10;
    len++;
}
Console.WriteLine(total - 1);
Expanding on Marc's detailed answer.
The only other limitation that is important is for the debugger. Once you pass a certain number of parameters directly to a function, the debugger becomes less functional in that method. I believe the limit is 64 parameters.
Note: This does not mean an array with 64 members, but 64 parameters passed directly to the function.
You might laugh and say "who would do this?", which is certainly a valid question. Yet LINQ makes this a lot easier to hit than you think, because under the hood the compiler generates a lot of code. It's possible that for a large generated SQL query where more than 64 fields are selected you would hit this issue, because the compiler would need to pass all of the fields to the constructor of an anonymous type.
Still a corner case.
Considering that the limits of both the Array class and the String class are the upper limit of Int32 (documented at 2,147,483,647 here: Int32 Structure), it is reasonable to believe that this value is the limit on the number of string format parameters.
Update: Upon checking Reflector, John is right. String.Format, viewed in Red Gate's Reflector, shows the following:
public static string Format(IFormatProvider provider, string format, params object[] args)
{
    if ((format == null) || (args == null))
    {
        throw new ArgumentNullException((format == null) ? "format" : "args");
    }
    StringBuilder builder = new StringBuilder(format.Length + (args.Length * 8));
    builder.AppendFormat(provider, format, args);
    return builder.ToString();
}
The format.Length + (args.Length * 8) part of the code is enough to kill most of that number. Ergo, '2,147,483,647 = x + 8x' leaves us with x = 238,609,294 (theoretical).
It's far less than that of course; as the guys in the comments mentioned, the string is quite likely to hit the string length limit first.
Maybe someone should just code this into a machine problem! :P
