Fastest way to check C# BitArray for non-zero value

I'm trying to rapidly detect collisions between BitArrays in C# (using the AND boolean operation), which results in a single BitArray representing the overlapping region.
Obviously, if the resulting array only consists of zeroes, there is no collision. What is the fastest way to check this? Simple iteration is too slow. I don't care where collisions are, or how many there are--only that there's a nonzero value somewhere in the array.
It seems like there should be some sort of fast case along the lines of "cast the entire bit array to an int value" (which specifically won't work because BitArrays are variable size), but I can't figure one out.

Do you need the resulting BitArray of the And() method? If not, you could just loop through the input arrays and return true on the first collision.
bool collision(BitArray a1, BitArray a2)
{
    if (a1 == null || a2 == null) throw new ArgumentNullException(a1 == null ? nameof(a1) : nameof(a2));
    if (a1.Count != a2.Count) throw new ArgumentException("arrays don't have the same length");
    // Stop at the first index where both arrays have a set bit
    for (int i = 0; i < a1.Count; i++)
        if (a1[i] && a2[i]) return true;
    return false;
}
This way you can avoid looping over the array twice, i.e. once for And() and once for the check. On average you will traverse only half of the array, and only once, which can speed things up by up to about 4 times.
Another way, as @itsme86 suggested, is to use ints instead of BitArrays:
int a1, a2; // the two bit masks, assigned elsewhere
bool collision = (a1 & a2) != 0; // != 0 also catches the sign bit, which > 0 would miss

Just in case someone is still searching for a nice solution, since it's missing here:
A new BitArray is initialized with zeros, so you could simply compare a fresh BitArray (of the same length) with the AND result of your two BitArrays (note the "!", which inverts the bool):
if (!new BitArray(bitCountOfResult).Equals(result)) {
// We hit!
}
Fast and works for me. Make sure to avoid the LINQ-approach, it's very slow.
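One caveat, as far as I can tell: BitArray does not override Equals, so `new BitArray(n).Equals(result)` falls back to reference equality and is false for any two distinct instances, which makes the `!` condition above always true. A value-based check can be sketched by copying the bits into 32-bit words with BitArray.CopyTo and testing whole words instead of single bits. The extension method name HasAnyBitSet is my own; masking the last word is a defensive assumption in case a runtime copies unused trailing bits verbatim.

```csharp
using System;
using System.Collections;

static class BitArrayExtensions
{
    // Returns true if any bit in 'bits' is set.
    // Copies the bits into 32-bit words and tests each word,
    // instead of reading the BitArray one bit at a time.
    public static bool HasAnyBitSet(this BitArray bits)
    {
        if (bits == null) throw new ArgumentNullException(nameof(bits));
        if (bits.Count == 0) return false;

        int[] words = new int[(bits.Count + 31) / 32];
        bits.CopyTo(words, 0);

        // Defensive: clear the unused high bits of the last word.
        int extra = bits.Count % 32;
        if (extra != 0)
            words[words.Length - 1] &= unchecked((1 << extra) - 1);

        foreach (int w in words)
            if (w != 0) return true; // stop at the first non-zero word
        return false;
    }
}
```

Usage, keeping in mind that And() modifies the array it is called on: `bool hit = new BitArray(a1).And(a2).HasAnyBitSet();`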

Related

How can I implement odd-even sorting in C# using threads?

I am practicing about threads and concurrency in C# and tried to implement the basic odd-even sort algorithm using a thread for even and another for odd sorting.
static bool Sort(int startPosition, List<int> list)
{
    bool result = true;
    do
    {
        for (int i = startPosition; i <= list.Count - 2; i = i + 2)
        {
            if (list[i] > list[i + 1])
            {
                int temp = list[i];
                list[i] = list[i + 1];
                list[i + 1] = temp;
                result = false;
            }
        }
    } while (!result);
    return result;
}
And the Main method looks like this:
static void Main(string[] args)
{
    bool isOddSorted = false;
    bool isEvenSorted = false;
    List<int> list = new List<int>();
    while (list.Count < 15)
    {
        list.Add(new Random().Next(0, 20));
    }
    var evenThread = new Thread(() =>
    {
        isEvenSorted = Sort(0, list);
    });
    evenThread.Start();
    var oddThread = new Thread(() =>
    {
        isOddSorted = Sort(1, list);
    });
    oddThread.Start();
    while (true)
    {
        if (isEvenSorted && isOddSorted)
        {
            foreach (int i in list)
            {
                Console.WriteLine(i);
            }
            break;
        }
    }
}
Understandably, the loop in the Sort method runs forever, because the result variable is never set to true. Even so, it does manage to sort the list; it just never exits.
However, as soon as I add "result = true" as the first line of the do block in Sort, the sorting breaks.
I couldn't figure out how to fix this.
You cannot do odd-even sort easily in a multi-threaded manner. Why?
Because odd-even sort is in essence the repetition of two sorting passes (the odd and the even pass), with each pass depending on the result of the preceding one. In practical terms you cannot run two passes in parallel/concurrently, as each pass has to follow the previous one.
There are of course ways to use multi-threading even with odd-even sort, although it probably wouldn't make much practical sense. For example, you could divide the list into several partitions and odd-even-sort each partition independently, each on its own thread. As a final step, you would need to merge the sorted partitions into one fully sorted list.
(By the way, the reason you eventually get a sorted list if you let the do-while loops in your Sort method run many, many times is that, given enough repetitions, even with "overlapping" concurrent passes the elements will eventually be compared with each other and moved to the right positions. However, the result may not contain all the same numbers as the original list: since list access is not synchronized, some numbers can be lost and replaced with duplicates of other numbers, depending on the timing of list accesses between the two threads.)
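The partition-then-merge idea can be sketched roughly as follows. This is only an illustration under my own assumptions: it works on an int[] rather than the question's List<int>, uses Parallel.ForEach to sort the partitions, and merges with a simple k-way scan; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class PartitionedOddEven
{
    // Single-threaded odd-even transposition sort of arr[lo, hi).
    static void OddEvenSort(int[] arr, int lo, int hi)
    {
        bool sorted = false;
        while (!sorted)
        {
            sorted = true;
            for (int start = lo; start <= lo + 1; start++)      // even pass, then odd pass
                for (int i = start; i + 1 < hi; i += 2)
                    if (arr[i] > arr[i + 1])
                    {
                        int tmp = arr[i]; arr[i] = arr[i + 1]; arr[i + 1] = tmp;
                        sorted = false;
                    }
        }
    }

    // Sorts each partition on its own task, then merges the sorted partitions.
    public static int[] Sort(int[] input, int partitions)
    {
        if (partitions < 1) throw new ArgumentOutOfRangeException(nameof(partitions));
        int[] arr = (int[])input.Clone();
        int n = arr.Length;

        var bounds = new List<(int lo, int hi)>();
        for (int p = 0; p < partitions; p++)
        {
            int lo = (int)((long)n * p / partitions);
            int hi = (int)((long)n * (p + 1) / partitions);
            if (hi > lo) bounds.Add((lo, hi));
        }

        // Partitions do not overlap, so no two tasks touch the same element.
        Parallel.ForEach(bounds, b => OddEvenSort(arr, b.lo, b.hi));

        // Merge: repeatedly take the smallest remaining head among the partitions.
        var result = new int[n];
        var heads = new int[bounds.Count];
        for (int k = 0; k < bounds.Count; k++) heads[k] = bounds[k].lo;
        for (int outIdx = 0; outIdx < n; outIdx++)
        {
            int pick = -1;
            for (int k = 0; k < bounds.Count; k++)
                if (heads[k] < bounds[k].hi && (pick < 0 || arr[heads[k]] < arr[heads[pick]]))
                    pick = k;
            result[outIdx] = arr[heads[pick]++];
        }
        return result;
    }
}
```

The merge is the sequential part; the threads only ever work on disjoint slices, which is what keeps this free of the races described above.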
You are trying to modify a non-thread-safe collection from multiple threads.
Even if the basic idea is sound (you are using a basic swap in the Sort method, although it is not implemented entirely correctly), you have to take into account that while one thread is doing a swap, the other thread could swap the very value that is sitting in the temp variable at that exact moment.
You would need to familiarize yourself with locks and/or thread-safe collections.
Look at your result variable and the logic you have implemented with regard to result.
The outer do ... while (!result) loop will only exit when result is true.
Now imagine your inner for loop finds two numbers that need swapping. So it swaps the numbers and sets result to false. And here is my question to you: after result has been set to false because two numbers were swapped, when and where is result ever set to true?
Also, while you sort the numbers at even list positions and the numbers at odd positions, your code does not do a final sort across the entire list. So if, after the even and odd sorting, a larger number at an even position n is followed by a smaller number at odd position n+1, your code leaves it at that, and the list is still (partially) unsorted...
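For comparison, here is a minimal single-threaded sketch (the class name OddEven is made up) in which the flag is reset at the start of every round, and each round runs the even pass followed by the odd pass, so the loop stops after the first full round that makes no swaps.

```csharp
using System.Collections.Generic;

static class OddEven
{
    // Odd-even transposition sort on a single thread.
    public static void Sort(List<int> list)
    {
        bool sorted = false;
        while (!sorted)
        {
            sorted = true; // reset each round; any swap below clears it again
            for (int start = 0; start <= 1; start++) // start 0 = even pass, start 1 = odd pass
            {
                for (int i = start; i + 1 < list.Count; i += 2)
                {
                    if (list[i] > list[i + 1])
                    {
                        int temp = list[i];
                        list[i] = list[i + 1];
                        list[i + 1] = temp;
                        sorted = false;
                    }
                }
            }
        }
    }
}
```

A round with no swaps means every adjacent pair is in order, which is exactly the condition for the whole list being sorted.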

Comparing 1 million integers in an array without sorting it first

I have a task to find the difference between every integer in an array of random numbers and return the lowest difference. A requirement is that the integers can be between 0 and int.MaxValue and that the array will contain 1 million integers.
I put some code together which works fine for a small amount of integers but it takes a long time (so long most of the time I give up waiting) to do a million. My code is below, but I'm looking for some insight on how I can improve performance.
for (int i = 0; i < _RandomIntegerArray.Count(); i++) {
    for (int ii = i + 1; ii < _RandomIntegerArray.Count(); ii++) {
        if (_RandomIntegerArray[i] == _RandomIntegerArray[ii]) continue;
        int currentDiff = Math.Abs(_RandomIntegerArray[i] - _RandomIntegerArray[ii]);
        if (currentDiff < lowestDiff) {
            Pairs.Clear();
        }
        if (currentDiff <= lowestDiff) {
            Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[ii]));
            lowestDiff = currentDiff;
        }
    }
}
Apologies to everyone who has pointed out that I don't sort; unfortunately, sorting is not allowed.
Imagine that you have already found a pair of integers a and b from your random array such that a > b and a-b is the lowest among all possible pairs of integers in the array.
Does an integer c exist in the array such that a > c > b, i.e. c goes between a and b? Clearly, the answer is "no", because otherwise you'd pick the pair {a, c} or {c, b}.
This gives an answer to your problem: a and b must be next to each other in a sorted array. Sorting can be done in O(N log N), and the search can be done in O(N), an improvement over the O(N²) algorithm that you have.
As @JonSkeet suggested, try sorting the array first and then only compare consecutive array items, which means that you only need to iterate over the array once:
Array.Sort(_RandomIntegerArray);
for (int i = 1; i < _RandomIntegerArray.Count(); i++)
{
    int currentDiff = _RandomIntegerArray[i] - _RandomIntegerArray[i - 1];
    if (currentDiff < lowestDiff)
    {
        Pairs.Clear();
    }
    if (currentDiff <= lowestDiff)
    {
        Pairs.Add(new NumberPair(_RandomIntegerArray[i], _RandomIntegerArray[i - 1]));
        lowestDiff = currentDiff;
    }
}
In my testing this results in < 200 ms elapsed for 1 million items.
You've got a million integers out of a possible 2.15 or 4.3 billion (signed or unsigned). That means the largest possible min distance is either about 2150 or 4300. Let's say that the max possible min distance is D.
Divide the legal integers into groups of length D. Create a hash h keyed on integers with arrays of ints as values. Process your array by taking each element x, and adding it to h[x/D].
The point of doing this is that any valid pair of points is either contained in h[k] for some k, or split between h[k] and h[k+1].
Find your pair of points by going through the keys of the hash and checking the points associated with adjacent keys. You can sort if you like, or use a bitvector, or any other method but now you're dealing with small arrays (on average 1 element per array).
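The bucketing idea can be sketched like this. Assumptions of mine, not the answer's: all values are in [0, int.MaxValue], there are at least two of them, D is derived from the pigeonhole bound int.MaxValue / (n - 1) rather than the rounded 2150, buckets are compared pairwise instead of sorted, and duplicates count as a difference of 0 (the question's loop skips them). Adversarial inputs can still pile many values into one bucket.

```csharp
using System;
using System.Collections.Generic;

static class ClosestPair
{
    // Smallest |a - b| over all pairs, without sorting the whole array.
    public static long MinDifference(int[] values)
    {
        int n = values.Length;
        if (n < 2) throw new ArgumentException("need at least two values");

        // n values spread over a range of width int.MaxValue have some adjacent
        // (in sorted order) pair at most int.MaxValue / (n - 1) apart, so the
        // closest pair is strictly less than d apart.
        long d = (long)int.MaxValue / (n - 1) + 1;

        var buckets = new Dictionary<long, List<int>>();
        foreach (int x in values)
        {
            long key = x / d;
            if (!buckets.TryGetValue(key, out var list))
                buckets[key] = list = new List<int>();
            list.Add(x);
        }

        long best = long.MaxValue;
        foreach (var kv in buckets)
        {
            List<int> here = kv.Value;
            // Pairs inside the same bucket.
            for (int i = 0; i < here.Count; i++)
                for (int j = i + 1; j < here.Count; j++)
                    best = Math.Min(best, Math.Abs((long)here[i] - here[j]));
            // Pairs split between this bucket and the next one.
            if (buckets.TryGetValue(kv.Key + 1, out var next))
                foreach (int a in here)
                    foreach (int b in next)
                        best = Math.Min(best, Math.Abs((long)a - b));
        }
        return best;
    }
}
```

Because any two values less than d apart have bucket keys differing by at most 1, the closest pair is always among the pairs checked.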
The elements of the array are between 0 and some maximum value; suppose that maximum is less than 1 million. In that case you just need to initialise an array of size maxvalue + 1 to 0 and then, as you read the 1 million values, increment the corresponding entry of that array.
Now scan this array and find the lowest difference between the values that are present, as described by others, as if all the values were sorted. If any value occurs more than once, its count will be > 1, so you can immediately say that the minimum difference is 0.
NOTE: This method is efficient only if you do not use sorting and, more importantly, the maximum value is much less than 10^6 (1 million).
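A sketch of the counting idea, assuming every value lies in [0, maxValue] and that maxValue (a parameter I introduce for illustration) is small enough to allocate an int array of that size. It returns int.MaxValue if fewer than two values are given.

```csharp
using System;

static class CountingMinDiff
{
    public static int MinDifference(int[] values, int maxValue)
    {
        var counts = new int[maxValue + 1];
        foreach (int x in values)
        {
            // A repeated value means the minimum difference is 0.
            if (++counts[x] > 1) return 0;
        }

        int best = int.MaxValue;
        int previous = -1; // last present value seen while scanning upward
        for (int v = 0; v <= maxValue; v++)
        {
            if (counts[v] == 0) continue;
            if (previous >= 0) best = Math.Min(best, v - previous);
            previous = v;
        }
        return best;
    }
}
```

Scanning the count array in index order visits the present values in ascending order, which is what lets it mimic "consecutive elements of a sorted array" without calling a sort.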
It helps a little if you do not call Count() on each iteration:
int countIntegers = _RandomIntegerArray.Count();
for (int i = 0; i < countIntegers; i++) {
    //...
    for (int ii = i + 1; ii < countIntegers; ii++) {
        //...
    }
}
This works because Count() only returns the number of ints in the array; nothing in the loop modifies the array, so the value does not change between iterations.
How about splitting the array into chunks of size arraysize / number of processors and running each chunk on a different thread? (Neil)
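Neil's suggestion can be sketched with the thread-local overload of Parallel.For: the outer loop's iterations are spread across worker threads, each worker keeps its own running minimum, and the minima are combined under a lock at the end. This is my illustration, not Neil's code; it still does O(n²) work in total and only divides the wall-clock time by the number of cores. Unlike the question's loop, equal values count as a difference of 0 here.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelMinDiff
{
    public static long MinDifference(int[] values)
    {
        long best = long.MaxValue;
        object gate = new object();

        Parallel.For(0, values.Length,
            () => long.MaxValue,                        // per-thread running minimum
            (i, _, localBest) =>
            {
                for (int j = i + 1; j < values.Length; j++)
                {
                    long diff = Math.Abs((long)values[i] - values[j]);
                    if (diff < localBest) localBest = diff;
                }
                return localBest;
            },
            localBest =>
            {
                lock (gate) { if (localBest < best) best = localBest; } // merge once per thread
            });

        return best;
    }
}
```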
Assume three parts A, B and C of size as close as possible.
For each part, find the minimum "in-part" difference, and the minimum difference over pairs whose first component is from the current part and whose second is from the next part (A being the part after C).
With a method taking O(n²) time, a part of size n/3 should take one ninth of the time; done 2·3 times, this amounts to two thirds, plus a little extra for combining the results.
This calls for applying it recursively; remember Карацу́ба/Karatsuba multiplication?
Wait: maybe use two parts after all, for three quarters of the effort, which is very close to Karatsuba. (Not seeing how to use an even number of parts, I was thinking of multiprocessing with every processor doing "the same".)

Compare two arrays and return their remainder

I am doing an image comparer for learning purposes.
I have done almost everything already and am now improving it. To check for similarity, I have two jagged multidimensional arrays (byte[][,]); I access each element of each array using a triple for loop and store the absolute difference between them, like this:
for (int dimension = 0; dimension < 8; dimension++)
{
    Parallel.For(0, 16, mycolumn =>
    {
        Parallel.For(0, 16, myrow =>
        {
            Diffs[dimension][mycolumn, myrow] =
                (byte)Math.Abs(Image1Bytes[dimension][mycolumn, myrow]
                    - Image2Bytes[dimension][mycolumn, myrow]);
        });
    });
}
Now, I would like to check how much each dimension is equal to another in the other collection.
How could I compare the entire arrays in each array (like array1[i][,] == array2[j][,])?
I think there are better ways to do these operations, but I have managed to do them pretty quickly.
Here is an older thread on comparing two images that would be simple for you to adapt to your needs.
Compare Bitmaps
Since Array supports the IStructuralEquatable interface, you can use structural comparison:
using System.Collections;
. . .
var areEqual = StructuralComparisons.StructuralEqualityComparer.Equals(array1[i], array2[j]);
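One thing to check before relying on this for byte[,] slices: as far as I can tell, Array's IStructuralEquatable implementation reads elements through Array.GetValue(int), which only supports one-dimensional arrays, so the call above may throw ArgumentException for two-dimensional arrays. A plain element-wise comparison avoids that and can also answer "how much" two slices match. The Grid class and FractionEqual name are hypothetical.

```csharp
using System;

static class Grid
{
    // Fraction (0.0 to 1.0) of cells that hold the same value in both arrays.
    // 1.0 means the arrays are identical. Throws if the shapes differ.
    public static double FractionEqual(byte[,] a, byte[,] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1))
            throw new ArgumentException("arrays must have the same shape");
        if (a.Length == 0) return 1.0;

        int same = 0;
        for (int i = 0; i < a.GetLength(0); i++)
            for (int j = 0; j < a.GetLength(1); j++)
                if (a[i, j] == b[i, j]) same++;
        return (double)same / a.Length;
    }
}
```

For the exact-match case from the question: `bool areEqual = Grid.FractionEqual(array1[i], array2[j]) == 1.0;` (the result is exactly 1.0 only when every cell matches).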

Find intersection of two multi-dimensional Arrays in C# 4.0

Trying to find a solution to my ranking problem.
Basically I have two multi-dimensional double[,] arrays. Both containing rankings for certain scenarios, so [rank number, scenario number]. More than one scenario can have the same rank.
I want to generate a third multi-dimensional array, taking the intersections of the previous two multi-dimensional arrays to provide a joint ranking.
Does anyone have an idea how I can do this in C#?
Many thanks for any advice or help you can provide!
Edit:
Thank you for all the responses, sorry I should have included an example.
Here it is:
Array One:
[{0,4},{1,0},{1,2},{2,1},{3,5},{4,3}]
Array Two:
[{0,1},{0,4},{1,0},{1,2},{3,5},{4,3}]
Required Result:
[{0,4},{1,0},{1,2},{1,1},{2,5},{3,3}]
Here's some sample code that makes a bunch of assumptions but might be something like what you are looking for. I've added a few comments as well:
static double[,] Intersect(double[,] a1, double[,] a2)
{
    // Assumptions:
    // a1 and a2 are two-dimensional arrays of the same size
    // An element in the array matches if and only if its value is found in the same location in both arrays
    // result will contain not-a-number (NaN) for non-matches
    double[,] result = new double[a1.GetLength(0), a1.GetLength(1)];
    for (int i = 0; i < a1.GetLength(0); i++)
    {
        for (int j = 0; j < a1.GetLength(1); j++)
        {
            if (a1[i, j] == a2[i, j])
            {
                result[i, j] = a1[i, j];
            }
            else
            {
                result[i, j] = double.NaN;
            }
        }
    }
    return result;
}
For the most part, finding the intersection of multidimensional arrays will involve iterating over the elements in each of the dimensions of the arrays. If the indices of the array are not part of the match criteria (i.e. my second assumption in the code is removed), you would have to walk each dimension in each array, which increases the run-time of the algorithm (in this case, from O(n^2) to O(n^4)).
If you care enough about run-time, I believe array matching is one of the typical examples of dynamic programming (DP) optimization; which you can read up on at your leisure.
I'm not sure how you wanted your results...you could probably return a flat collection of results that can be indexed by a pair, which would potentially save a lot of space if the expected result set is typically small. I went with a third fixed-sized array because it was the easiest thing to do.
Lastly, I'll mention that I don't see a keen C# way of doing this using IEnumerable, LINQ, or something like that. Someone more C# knowledgeable than I can chime in anytime now....
Given the additional information, I'd argue that you aren't actually working with multidimensional arrays, but instead are working with a collection of pairs. The pair is a pair of doubles. I think the following should work nicely:
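public class Pair : IEquatable<Pair>
{
    public double Rank;
    public double Scenario;
    public bool Equals(Pair p)
    {
        // A null Pair never equals an instance
        return p != null && Rank == p.Rank && Scenario == p.Scenario;
    }
    // Keep object.Equals consistent with IEquatable<Pair>.Equals
    public override bool Equals(object obj)
    {
        return Equals(obj as Pair);
    }
    public override int GetHashCode()
    {
        int hashRank = Rank.GetHashCode();
        int hashScenario = Scenario.GetHashCode();
        return hashRank ^ hashScenario;
    }
}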
You can then use the Intersect operator on IEnumerable:
List<Pair> one = new List<Pair>();
List<Pair> two = new List<Pair>();
// ... populate the lists
List<Pair> result = one.Intersect(two).ToList();
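To see what Intersect actually produces on the example data from the question, here is a small sketch that uses value tuples instead of the Pair class (my substitution: a ValueTuple already has value-based Equals and GetHashCode). Note that it returns the pairs present in both lists, in the order of the first list, which is not the same as the "Required Result" joint ranking shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RankIntersect
{
    public static List<(double Rank, double Scenario)> Example()
    {
        var one = new List<(double Rank, double Scenario)>
        {
            (0, 4), (1, 0), (1, 2), (2, 1), (3, 5), (4, 3)
        };
        var two = new List<(double Rank, double Scenario)>
        {
            (0, 1), (0, 4), (1, 0), (1, 2), (3, 5), (4, 3)
        };
        // Keeps pairs found in both lists: (0,4), (1,0), (1,2), (3,5), (4,3)
        return one.Intersect(two).ToList();
    }
}
```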
Check out the following msdn article on Enumerable.Intersect() for more information:
http://msdn.microsoft.com/en-us/library/bb910215%28v=vs.90%29.aspx

Secure Equals: Why is inequality still consistently less time?

I have a method that compares two byte arrays for equality, with the major caveat that it does not fail and exit early when it detects inequality. Basically, the code is used to compare Cross-Site Request Forgery tokens and to avoid (as much as possible) the ability to use timing to hack the key. I wish I could find the link to the paper that discusses the attack in detail, but the important thing is that my code still has a statistically measurable bias toward returning sooner when the two byte arrays are unequal, although it is an order of magnitude better. So without further ado, here is the code:
public static bool SecureEquals(byte[] original, byte[] potential)
{
    // They should be the same size, but we don't want to throw an
    // exception if we are wrong.
    bool isEqual = original.Length == potential.Length;
    int maxLength = Math.Max(original.Length, potential.Length);
    for (int i = 0; i < maxLength; i++)
    {
        byte originalByte = (i < original.Length) ? original[i] : (byte)0;
        byte potentialByte = (i < potential.Length) ? potential[i] : (byte)0;
        isEqual = isEqual && (originalByte == potentialByte);
    }
    return isEqual;
}
The difference in average timing between equal and unequal tokens is consistently 10-25ms (depending on garbage collection cycles) shorter for unequal tokens. This is precisely what I want to avoid. If the relative timing were equal, or the average timing swapped based on the run I would be happy. The problem is that we are consistently running shorter for unequal tokens. In contrast, if we stopped the loop on the first unequal token, we could have up to an 80x difference in timing.
While this equality check is a major improvement over the typical eager return, it is still not good enough. Essentially, I don't want any consistent result for equality or inequality returning faster. If I could get the results into the range where garbage collection cycles will mask any consistent bias, I will be happy.
Anyone have a clue what is causing the timing bias toward inequality being faster? At first I thought it was the ternary operator returning an access to the array or a constant if the arrays were of unequal size. The problem is that I still get this bias if the two arrays are the same size.
NOTE: As requested, the links to the articles on timing attacks:
http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf (official paper, linked to from the blog post below)
http://codahale.com/a-lesson-in-timing-attacks/ (talks about the failure in Java's library)
http://en.wikipedia.org/wiki/Timing_attack (more general, but not as complete)
This line could be causing a problem:
isEqual = isEqual && (originalByte == potentialByte);
That won't bother evaluating the originalByte == potentialByte subexpression if isEqual is already false. You don't want the short-circuiting here, so change it to:
isEqual = isEqual & (originalByte == potentialByte);
EDIT: Note that you're already effectively leaking the information about the size of the original data - because it will always run in a constant time until the potential array exceeds the original array in size, at which point the time will increase. It's probably quite tricky to avoid this... so I would go for the "throw an exception if they're not the right size" approach which explicitly acknowledges it, effectively.
EDIT: Just to go over the idea I included in comments:
// Let's assume you'll never really need more than this
private static readonly byte[] LargeJunkArray = new byte[1024 * 32];

public static bool SecureEquals(byte[] original, byte[] potential)
{
    // Reveal that the original array will never be more than 32K.
    // This is unlikely to be particularly useful.
    if (potential.Length > LargeJunkArray.Length)
    {
        return false;
    }
    byte[] copy = new byte[potential.Length];
    int bytesFromOriginal = Math.Min(original.Length, copy.Length);
    // Always copy the same amount of data
    Array.Copy(original, 0, copy, 0, bytesFromOriginal);
    Array.Copy(LargeJunkArray, 0, copy, bytesFromOriginal,
               copy.Length - bytesFromOriginal);
    bool isEqual = original.Length == potential.Length;
    for (int i = 0; i < copy.Length; i++)
    {
        isEqual = isEqual & (copy[i] == potential[i]);
    }
    return isEqual;
}
Note that this assumes that Array.Copy will take the same amount of time to copy the same amount of data from any source - that may well not be true, based on CPU caches...
In the event that .NET should be smart enough to optimize this out – have you tried introducing a counter variable that counts the number of unequal characters, and returns true if and only if that number is zero?
public static bool SecureEquals(byte[] original, byte[] potential)
{
    // They should be the same size, but we don't want to throw an
    // exception if we are wrong.
    int maxLength = Math.Max(original.Length, potential.Length);
    int equals = maxLength - Math.Min(original.Length, potential.Length);
    for (int i = 0; i < maxLength; i++)
    {
        byte originalByte = (i < original.Length) ? original[i] : (byte)0;
        byte potentialByte = (i < potential.Length) ? potential[i] : (byte)0;
        equals += originalByte != potentialByte ? 1 : 0;
    }
    return equals == 0;
}
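A closely related, branch-free variant of the counter idea is to OR together the XOR of every byte pair: the loop then does the same work whether or not bytes match, with no comparison result inside it. This is a sketch under my own assumptions (it treats the lengths as public, like the answers above, and the class and method names are made up); the MethodImpl flags only ask the JIT not to inline or optimize the method. On .NET Core 2.1 and later, System.Security.Cryptography.CryptographicOperations.FixedTimeEquals provides a library version of this comparison.

```csharp
using System;
using System.Runtime.CompilerServices;

static class ConstantTime
{
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    public static bool FixedTimeEqualsSketch(byte[] a, byte[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length) return false; // length is treated as non-secret

        int diff = 0;
        for (int i = 0; i < a.Length; i++)
            diff |= a[i] ^ b[i]; // stays 0 only if every byte pair matches
        return diff == 0;
    }
}
```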
I have no idea what is causing the difference -- a profiler seems like it would be a good tool to have here. But I'd consider going with a different approach altogether.
What I'd do in this situation is build a timer into the method so that it can measure its own timing when given two equal keys. (Use the Stopwatch class.) It should compute the mean and standard deviation of the success timing and stash that away in some global state. When you get an unequal key, you can then measure how much time it took you to detect the unequal key, and then make up the difference by spinning a tight loop until the appropriate amount of time has elapsed. You can choose a random time to spin consistent with a normal distribution based on the mean and standard deviation you've already computed.
The nice thing about this approach is that now you have a general mechanism that you can re-use when defeating other timing attacks, and the meaning of the code is obvious; no one is going to come along and try to optimize it away.
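A simplified sketch of the padding idea, under my own assumptions: instead of drawing a random delay from a normal distribution, it keeps a running mean of the Stopwatch time taken by comparisons that returned true, and busy-waits unequal calls until at least that mean has elapsed. The class name and the AreEqual method are hypothetical. Stopwatch resolution and thread scheduling still add noise, so treat this as an illustration of the mechanism, not a vetted defense.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper: pads "unequal" results up to the mean time of "equal" ones.
class TimingPaddedComparer
{
    private readonly Func<byte[], byte[], bool> compare;
    private readonly object gate = new object();
    private long equalTicksTotal; // Stopwatch ticks spent on comparisons that returned true
    private long equalCount;      // number of such comparisons measured

    public TimingPaddedComparer(Func<byte[], byte[], bool> compare)
    {
        this.compare = compare ?? throw new ArgumentNullException(nameof(compare));
    }

    public bool AreEqual(byte[] a, byte[] b)
    {
        Stopwatch sw = Stopwatch.StartNew();
        bool result = compare(a, b);
        long elapsed = sw.ElapsedTicks;

        long meanEqualTicks;
        lock (gate)
        {
            if (result)
            {
                equalTicksTotal += elapsed;
                equalCount++;
            }
            meanEqualTicks = equalCount == 0 ? 0 : equalTicksTotal / equalCount;
        }

        // Unequal inputs: spin until at least the average "equal" time has passed.
        if (!result)
        {
            while (sw.ElapsedTicks < meanEqualTicks)
            {
                // busy-wait; ElapsedTicks is re-read on every iteration
            }
        }
        return result;
    }
}
```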
And like any other "Security" code, get it reviewed by a security professional before you ship it.
