Search data from one array in another - C#

What I'm trying to do is simple, but it's just slow. Basically I'm looping through data (a byte array), converting parts of it to an int and then comparing that against RamCache, which is also a byte array. The reason I convert to an int is that it covers 4 bytes, so if those 4 bytes match some part of the RamCache array I already know the match is at least 4 bytes long.
From there I can then check how many bytes are equal in total.
In short, what this code must do:
Loop through the data array, take 4 bytes at a time, and check whether they occur somewhere in the RamCache array. Currently the code below is slow when the data array and the RamCache array each contain 65535 bytes.
private unsafe SmartCacheInfo[] FindInCache(byte[] data, Action<SmartCacheInfo> callback)
{
    List<SmartCacheInfo> ret = new List<SmartCacheInfo>();
    fixed (byte* x = &(data[0]), XcachePtr = &(RamCache[0]))
    {
        Int32 Loops = data.Length >> 2;
        int* cachePtr = (int*)XcachePtr;
        int* dataPtr = (int*)x;

        if (IndexWritten == 0)
            return new SmartCacheInfo[0];

        // this part is just horribly slow
        for (int i = 0; i < data.Length; i++)
        {
            if (((data.Length - i) >> 2) == 0)
                break;

            int index = -1;
            dataPtr = (int*)(x + i);

            // get the index, a lot faster than List.IndexOf
            for (int j = 0; ; j++)
            {
                if (((IndexWritten - j) >> 2) == 0)
                    break;
                if (dataPtr[0] == ((int*)(XcachePtr + j))[0])
                {
                    index = j;
                    break;
                }
            }

            if (index == -1)
            {
                // int not found in the cache
                SmartCacheInfo inf = new SmartCacheInfo(-1, i, 4, false);
                inf.instruction = Instruction.NEWDATA;
                i += inf.Length - 1; // -1... loop does +1
                callback(inf);
            }
            else
            {
                SmartCacheInfo inf = new SmartCacheInfo(index, i, 0, true); // 0 for now, just to see what the length of the MemCmp is
                inf.Length = MemCmp(data, i, RamCache, index);
                ret.Add(inf);
                i += inf.Length - 1; // -1... loop does +1
            }
        }
    }
    return ret.ToArray();
}
The double loop is what's making it so slow. The data array contains 65535 bytes and so does the RamCache array. This code is, by the way, part of the cache system I'm working on for my SSP project.

Sort the RamCache array, or a copy of it, and use Array.BinarySearch. If you cannot sort it, create a HashSet from RamCache.
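As a minimal sketch of that idea (my adaptation, reusing the RamCache, IndexWritten and FindInCache names from the question, not tested against the real project), a dictionary built once over RamCache maps every 4-byte value, read at each byte offset, to the first offset where it occurs, so the inner scan becomes a single lookup:
// Build once and reuse while RamCache is unchanged.
// Maps a 4-byte value to the first byte offset in RamCache where it starts.
private Dictionary<int, int> BuildCacheIndex()
{
    var lookup = new Dictionary<int, int>();
    for (int j = 0; j + 4 <= IndexWritten; j++)
    {
        int key = BitConverter.ToInt32(RamCache, j);
        if (!lookup.ContainsKey(key))
            lookup.Add(key, j); // keep the first occurrence, like the linear scan does
    }
    return lookup;
}

// Inside the outer loop of FindInCache, the inner for-loop then becomes something like:
// int key = BitConverter.ToInt32(data, i);
// int index = cacheLookup.TryGetValue(key, out int hit) ? hit : -1;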

Related

How to find the starting and ending index of matching 8 byte blocks from two byte arrays

Take two byte arrays of different lengths. What's known about both arrays is that there is at least one eight-byte (or longer) match between the two. There may be more than one match, as long as each match is at least eight bytes long.
Is there a really fast way to find the start and end position of all matches between two byte arrays?
Thank you.
If you have a short list of byte sequences, you can use Boyer-Moore-Horspool to find all the occurrences of a needle in a haystack quickly.
You will need to scan each of the two byte arrays for each needle.
Depending on the distribution of your particular data, this may work faster than hashing and intersecting every possible 8 byte sequence.
public static IEnumerable<int> BoyerMooreHorspoolIndexesOf(byte[] haystack, byte[] needle)
{
    if (haystack == null)
        throw new ArgumentNullException("haystack");
    if (needle == null)
        throw new ArgumentNullException("needle");

    int haystackLength = haystack.Length;
    int needleLength = needle.Length;
    if ((haystackLength == 0) || (needleLength == 0) || (needleLength > haystackLength))
        yield break;

    var misses = new int[256];
    for (var i = 0; i < 256; ++i)
        misses[i] = needleLength;

    var lastneedleByte = needleLength - 1;
    for (var i = 0; i < lastneedleByte; ++i)
        misses[needle[i]] = lastneedleByte - i;

    var index = 0;
    while (index <= (haystackLength - needleLength))
    {
        for (int i = lastneedleByte; haystack[index + i] == needle[i]; --i)
        {
            if (i == 0)
            {
                yield return index;
                break;
            }
        }
        index += misses[haystack[index + lastneedleByte]];
    }
}
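A hypothetical call site (the sample arrays below are made up for illustration) might look like this:
byte[] haystack = { 0x00, 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0, 0xFF };
byte[] needle   = { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 };

foreach (int start in BoyerMooreHorspoolIndexesOf(haystack, needle))
    Console.WriteLine("Match at {0}..{1}", start, start + needle.Length - 1);
// with these sample arrays this prints: Match at 1..8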

Converting UInt64 to a binary array

I am having a problem with this method I wrote to convert a UInt64 to a binary array. For some numbers I am getting an incorrect binary representation.
Results
Correct
999 = 1111100111
Correct
18446744073709551615 = 1111111111111111111111111111111111111111111111111111111111111111
Incorrect?
18446744073709551614 =
0111111111111111111111111111111111111111111111111111111111111110
According to an online converter the binary value of 18446744073709551614 should be
1111111111111111111111111111111111111111111111111111111111111110
public static int[] GetBinaryArray(UInt64 n)
{
    if (n == 0)
    {
        return new int[2] { 0, 0 };
    }
    var val = (int)(Math.Log(n) / Math.Log(2));
    if (val == 0)
        val++;
    var arr = new int[val + 1];
    for (int i = val, j = 0; i >= 0 && j <= val; i--, j++)
    {
        if ((n & ((UInt64)1 << i)) != 0)
            arr[j] = 1;
        else
            arr[j] = 0;
    }
    return arr;
}
FYI: This is not a homework assignment. I need to convert an integer to a binary array for encryption purposes, hence the need for an array of bits. Many solutions I have found on this site convert an integer to a string representation of the binary number, which was useless, so I came up with this mashup of various other methods.
An explanation as to why the method works for some numbers and not others would be helpful. Yes, I used Math.Log and it is slow, but performance can be fixed later.
EDIT: And yes, I do need the line where I use Math.Log, because my array will not always be 64 bits long; for example, if my number was 4, then in binary it is 100, which is an array of length 3. It is a requirement of my application to do it this way.
It's not the array returned for the input UInt64.MaxValue - 1 that is wrong; it's the one for UInt64.MaxValue that seems wrong.
The array is 65 elements long. This is intuitively wrong because UInt64.MaxValue must fit in 64 bits.
Firstly, instead of taking the natural log and dividing by the log of 2, you can just take the log to base 2 directly.
Secondly, you also need to do a Math.Ceiling on the returned value because you need the value to fit fully inside the number of bits. Discarding the remainder with a cast to int means that you need to arbitrarily do a val + 1 when declaring the result array. This is only correct for certain scenarios - one of which it is not correct for is... UInt64.MaxValue. Adding one to the number of bits necessary gives a 65-element array.
Thirdly, and finally, you cannot left-shift 64 bits, hence i = val - 1 in the for loop initialization.
Haven't tested this exhaustively...
public static int[] GetBinaryArray(UInt64 n)
{
    if (n == 0)
    {
        return new int[2] { 0, 0 };
    }
    var val = (int)Math.Ceiling(Math.Log(n, 2));
    if (val == 0)
        val++;
    var arr = new int[val];
    for (int i = val - 1, j = 0; i >= 0 && j <= val; i--, j++)
    {
        if ((n & ((UInt64)1 << i)) != 0)
            arr[j] = 1;
        else
            arr[j] = 0;
    }
    return arr;
}
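A small driver like the one below (my addition, not part of the original answer) makes it easy to spot-check the revised method; the expected strings in the comments are the ones listed in the question:
static void Main()
{
    Console.WriteLine(string.Join("", GetBinaryArray(999)));                 // expected: 1111100111
    Console.WriteLine(string.Join("", GetBinaryArray(UInt64.MaxValue)));     // expected: 64 ones
    Console.WriteLine(string.Join("", GetBinaryArray(UInt64.MaxValue - 1))); // expected: 63 ones followed by a 0
}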

What is wrong with my code (it deals with array size)?

I am initializing the array size to 1, but I update it in the subsequent lines. It is not even storing the first element in the array; since the array size is 1 initially, I expected it would at least do that. Could someone provide me with an explanation? Here is the code:
class Program
{
    static void Main(string[] args)
    {
        int num = int.Parse(Console.ReadLine());
        Console.Write("The binary number for " + num + " is ");
        int size = 1;
        int[] binary = new int[size];
        size = 0;
        while (num >= 1)
        {
            if (num % 2 == 0)
                binary[size++] = 0;
            else
                binary[size++] = 1;
            //size += 1;
            num = num / 2;
        }
        for (int i = size - 1; i >= 0; i--)
        {
            Console.Write(binary[i]);
        }
        Console.WriteLine();
        Console.Write("The Compliment of this number is ");
        for (int i = size - 1; i >= 0; i--)
        {
            if (binary[i] == 0)
                binary[i] = 1;
            else
                binary[i] = 0;
        }
        for (int i = size - 1; i >= 0; i--)
        {
            Console.Write(binary[i]);
        }
        Console.WriteLine();
        Console.ReadKey();
    }
}
You cannot resize an array; it always has the length you gave it during initialization (1 in your case).
I think the problem is specifically in your expectation that you can update an array size "in the subsequent lines."
When you make the array here:
int[] binary = new int[size];
Then the size is set in stone.
When you call something like:
binary[size++] = 0;
This will not actually increase the number of slots in your array. In fact, that code is only changing the index at which you are reading or writing values. I can see that your code is going to go out of bounds of the array quickly (if you ask for anything but binary[0]).
It turns out that an array is a tricky data type to use here; arrays have a fixed size on creation. You want something that can grow!
So you can either:
- Use an array, but declare its size as Math.Ceiling(Math.Log(yourNumber, 2)) to make sure you will have enough space
- Use a data structure that can grow, like a string or a list (see the sketch after this answer)
- Create a new, larger array every time you need more room and copy the old values into it (Array.Resize can do that for you); by hand it looks like:
binary = new int[++size]; // note: without copying, this loses the previously stored bits
binary[size - 1] = whatever;
Good luck, hope this helps!
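For the growable option, a minimal sketch using a List<int> (my rework of the question's code, not a tested drop-in) could look like this:
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        int num = int.Parse(Console.ReadLine());
        var binary = new List<int>();      // grows as bits are added
        while (num >= 1)
        {
            binary.Add(num % 2);
            num /= 2;
        }
        // bits were collected least-significant first, so print them in reverse
        for (int i = binary.Count - 1; i >= 0; i--)
            Console.Write(binary[i]);
        Console.WriteLine();
    }
}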

Is there a good radixsort-implementation for floats in C#

I have a data structure with a field of type float. A collection of these structures needs to be sorted by the value of the float. Is there a radix sort implementation for this?
If there isn't, is there a fast way to access the exponent, the sign and the mantissa?
Because if you sort the floats first on the mantissa and then on the exponent, you can sort floats in O(n).
Update:
I was quite interested in this topic, so I sat down and implemented it (using this very fast and memory-conserving implementation). I also read this one (thanks celion) and found out that you don't even have to split the floats into mantissa and exponent to sort them. You just have to take the bits one-to-one and perform an int sort. You just have to take care of the negative values, which have to be placed in inverse order in front of the positive ones at the end of the algorithm (I did that in one step with the last iteration of the algorithm to save some CPU time).
So here's my float radix sort:
public static float[] RadixSort(this float[] array)
{
    // temporary array and the array of floats converted to ints
    int[] t = new int[array.Length];
    int[] a = new int[array.Length];
    for (int i = 0; i < array.Length; i++)
        a[i] = BitConverter.ToInt32(BitConverter.GetBytes(array[i]), 0);

    // set the group length to 1, 2, 4, 8 or 16
    // and see which one is quicker
    int groupLength = 4;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of an r-bit number)
    int[] count = new int[1 << groupLength];
    int[] pref = new int[1 << groupLength];
    int groups = bitLength / groupLength;
    int mask = (1 << groupLength) - 1;
    int negatives = 0, positives = 0;

    for (int c = 0, shift = 0; c < groups; c++, shift += groupLength)
    {
        // reset count array
        for (int j = 0; j < count.Length; j++)
            count[j] = 0;

        // counting elements of the c-th group
        for (int i = 0; i < a.Length; i++)
        {
            count[(a[i] >> shift) & mask]++;

            // additionally count all negative
            // values in the first round
            if (c == 0 && a[i] < 0)
                negatives++;
        }
        if (c == 0) positives = a.Length - negatives;

        // calculating prefixes
        pref[0] = 0;
        for (int i = 1; i < count.Length; i++)
            pref[i] = pref[i - 1] + count[i - 1];

        // from a[] to t[] elements ordered by c-th group
        for (int i = 0; i < a.Length; i++)
        {
            // Get the right index to sort the number in
            int index = pref[(a[i] >> shift) & mask]++;
            if (c == groups - 1)
            {
                // We're in the last (most significant) group; if the
                // number is negative, order them inversely in front
                // of the array, pushing positive ones back.
                if (a[i] < 0)
                    index = positives - (index - negatives) - 1;
                else
                    index += negatives;
            }
            t[index] = a[i];
        }
        // a[] = t[] and start again until the last group
        t.CopyTo(a, 0);
    }

    // Convert the ints back to the float array
    float[] ret = new float[a.Length];
    for (int i = 0; i < a.Length; i++)
        ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);

    return ret;
}
It is slightly slower than an int radix sort because of the array copying at the beginning and end of the function, where the floats are bitwise-copied to ints and back. The whole function is nevertheless still O(n), and in any case much faster than sorting three times in a row as you proposed. I don't see much room for optimization anymore, but if anyone does: feel free to tell me.
To sort descending change this line at the very end:
ret[i] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
to this:
ret[a.Length - i - 1] = BitConverter.ToSingle(BitConverter.GetBytes(a[i]), 0);
Measuring:
I set up a short test containing all the special cases of floats (NaN, +/-Inf, Min/Max value, 0) as well as random numbers. It sorts in exactly the same order as Linq or Array.Sort sorts floats:
NaN -> -Inf -> Min -> Negative Nums -> 0 -> Positive Nums -> Max -> +Inf
So I ran a test with a huge array of 10M numbers:
float[] test = new float[10000000];
Random rnd = new Random();
for (int i = 0; i < test.Length; i++)
{
    byte[] buffer = new byte[4];
    rnd.NextBytes(buffer);
    float rndfloat = BitConverter.ToSingle(buffer, 0);
    switch (i)
    {
        case 0: { test[i] = float.MaxValue; break; }
        case 1: { test[i] = float.MinValue; break; }
        case 2: { test[i] = float.NaN; break; }
        case 3: { test[i] = float.NegativeInfinity; break; }
        case 4: { test[i] = float.PositiveInfinity; break; }
        case 5: { test[i] = 0f; break; }
        default: { test[i] = rndfloat; break; }
    }
}
And timed the different sorting algorithms:
Stopwatch sw = new Stopwatch();
sw.Start();
float[] sorted1 = test.RadixSort();
sw.Stop();
Console.WriteLine(string.Format("RadixSort: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
float[] sorted2 = test.OrderBy(x => x).ToArray();
sw.Stop();
Console.WriteLine(string.Format("Linq OrderBy: {0}", sw.Elapsed));
sw.Reset();
sw.Start();
Array.Sort(test);
float[] sorted3 = test;
sw.Stop();
Console.WriteLine(string.Format("Array.Sort: {0}", sw.Elapsed));
And the output was (update: now ran with release build, not debug):
RadixSort: 00:00:03.9902332
Linq OrderBy: 00:00:17.4983272
Array.Sort: 00:00:03.1536785
More than four times as fast as Linq - that is not bad. It is still not as fast as Array.Sort, but also not that much worse. I was really surprised by this one, though: I expected it to be slightly slower than Linq on very small arrays, so I ran a test with just 20 elements:
RadixSort: 00:00:00.0012944
Linq OrderBy: 00:00:00.0072271
Array.Sort: 00:00:00.0002979
and even at this size my radix sort is quicker than Linq, but way slower than Array.Sort. :)
Update 2:
I made some more measurements and found out some interesting things: a longer group length constant means fewer iterations but more memory usage. If you use a group length of 16 bits (only 2 iterations), you have a huge memory overhead when sorting small arrays, but you can beat Array.Sort for arrays larger than about 100k elements, even if not by much. Both chart axes are logarithmic:
[Chart: sorting time versus array size] (source: daubmeier.de)
There's a nice explanation of how to perform radix sort on floats here:
http://www.codercorner.com/RadixSortRevisited.htm
If all your values are positive, you can get away with using the binary representation; the link explains how to handle negative values.
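The core trick from that article, sketched here from memory (so treat these helpers as an assumption rather than quoted code), is to map each float's raw bits to an unsigned key whose plain integer ordering matches the float ordering: set the sign bit for non-negative values and invert all bits for negative ones.
// Map a float to a uint key whose unsigned ordering equals the float ordering.
static uint ToSortableKey(float f)
{
    uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
    // Negative floats: invert everything (their raw bit patterns sort in reverse).
    // Non-negative floats: just set the sign bit so they sort after the negatives.
    return (bits & 0x80000000u) != 0 ? ~bits : bits | 0x80000000u;
}

// Reverse the transformation after sorting the keys.
static float FromSortableKey(uint key)
{
    uint bits = (key & 0x80000000u) != 0 ? key & 0x7FFFFFFFu : ~key;
    return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
}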
By doing some fancy casting, and by swapping arrays instead of copying, this version is 2x faster for 10M numbers than Philip Daubmeier's original with the group length set to 8. It is 3x faster than Array.Sort for that array size.
static public void RadixSortFloat(this float[] array, int arrayLen = -1)
{
    // Some use cases have an array that is longer than the filled part that we want to sort
    if (arrayLen < 0) arrayLen = array.Length;

    // Reinterpret our original float array as ints
    Span<float> asFloat = array;
    Span<int> a = MemoryMarshal.Cast<float, int>(asFloat);

    // Create a temp array
    Span<int> t = new Span<int>(new int[arrayLen]);

    // set the group length to 1, 2, 4, 8 or 16 and see which one is quicker
    int groupLength = 8;
    int bitLength = 32;

    // counting and prefix arrays
    // (dimension is 2^r, the number of possible values of an r-bit number)
    var dim = 1 << groupLength;
    int groups = bitLength / groupLength;
    if (groups % 2 != 0) throw new Exception("groups must be even so data is in original array at end");
    var count = new int[dim];
    var pref = new int[dim];
    int mask = dim - 1;
    int negatives = 0, positives = 0;

    // counting elements of the 1st group, including negative/positive
    for (int i = 0; i < arrayLen; i++)
    {
        if (a[i] < 0) negatives++;
        count[(a[i] >> 0) & mask]++;
    }
    positives = arrayLen - negatives;

    int c;
    int shift;
    for (c = 0, shift = 0; c < groups - 1; c++, shift += groupLength)
    {
        CalcPrefixes();
        var nextShift = shift + groupLength;

        for (var i = 0; i < arrayLen; i++)
        {
            var ai = a[i];
            // Get the right index to sort the number in
            int index = pref[(ai >> shift) & mask]++;
            count[(ai >> nextShift) & mask]++;
            t[index] = ai;
        }

        // swap the arrays and start again until the last group
        var temp = a;
        a = t;
        t = temp;
    }

    // Last round
    CalcPrefixes();
    for (var i = 0; i < arrayLen; i++)
    {
        var ai = a[i];
        // Get the right index to sort the number in
        int index = pref[(ai >> shift) & mask]++;
        // We're in the last (most significant) group; if the
        // number is negative, order them inversely in front
        // of the array, pushing positive ones back.
        if (ai < 0) index = positives - (index - negatives) - 1; else index += negatives;

        t[index] = ai;
    }

    void CalcPrefixes()
    {
        pref[0] = 0;
        for (int i = 1; i < dim; i++)
        {
            pref[i] = pref[i - 1] + count[i - 1];
            count[i - 1] = 0;
        }
    }
}
You can use an unsafe block to memcpy or alias a float * to a uint * to extract the bits.
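A minimal sketch of that aliasing idea (my illustration, requires compiling with unsafe code enabled; the field widths are the standard IEEE 754 single layout):
static unsafe (uint sign, uint exponent, uint mantissa) SplitFloat(float f)
{
    uint bits = *(uint*)&f;         // alias the float's storage as a uint
    return (bits >> 31,             // 1 sign bit
            (bits >> 23) & 0xFF,    // 8 exponent bits
            bits & 0x7FFFFF);       // 23 mantissa bits
}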
I think your best bet, if the values aren't too close together and there's a reasonable precision requirement, is to just use the actual float digits before and after the decimal point to do the sorting.
For example, you can just use the first 4 decimals (be they 0 or not) to do the sorting.

Why do I get the following output when inverting bits in a byte?

Assumption:
Converting a byte[] from Little Endian to Big Endian means inverting the order of the bits in each byte of the byte[].
Assuming this is correct, I tried the following to understand this:
byte[] data = new byte[] { 1, 2, 3, 4, 5, 15, 24 };
byte[] inverted = ToBig(data);

var little = new BitArray(data);
var big = new BitArray(inverted);

int i = 1;
foreach (bool b in little)
{
    Console.Write(b ? "1" : "0");
    if (i == 8)
    {
        i = 0;
        Console.Write(" ");
    }
    i++;
}
Console.WriteLine();

i = 1;
foreach (bool b in big)
{
    Console.Write(b ? "1" : "0");
    if (i == 8)
    {
        i = 0;
        Console.Write(" ");
    }
    i++;
}
Console.WriteLine();

Console.WriteLine(BitConverter.ToString(data));
Console.WriteLine(BitConverter.ToString(ToBig(data)));

foreach (byte b in data)
{
    Console.Write("{0} ", b);
}
Console.WriteLine();

foreach (byte b in inverted)
{
    Console.Write("{0} ", b);
}
The convert method:
private static byte[] ToBig(byte[] data)
{
    byte[] inverted = new byte[data.Length];
    for (int i = 0; i < data.Length; i++)
    {
        var bits = new BitArray(new byte[] { data[i] });
        var invertedBits = new BitArray(bits.Count);
        int x = 0;
        for (int p = bits.Count - 1; p >= 0; p--)
        {
            invertedBits[x] = bits[p];
            x++;
        }
        invertedBits.CopyTo(inverted, i);
    }
    return inverted;
}
The output of this little application is different from what I expected:
00000001 00000010 00000011 00000100 00000101 00001111 00011000
00000001 00000010 00000011 00000100 00000101 00001111 00011000
80-40-C0-20-A0-F0-18
01-02-03-04-05-0F-18
1 2 3 4 5 15 24
1 2 3 4 5 15 24
For some reason the data remains the same, unless printed using BitConverter.
What am I not understanding?
Update
New code produces the following output:
10000000 01000000 11000000 00100000 10100000 11110000 00011000
00000001 00000010 00000011 00000100 00000101 00001111 00011000
01-02-03-04-05-0F-18
80-40-C0-20-A0-F0-18
1 2 3 4 5 15 24
128 64 192 32 160 240 24
But as I have been told now, my method is incorrect anyway, because I should invert the bytes and not the bits?
The hardware developer I'm working with told me to invert the bits because he cannot read the data.
Context where I'm using this
The application that will use this does not really work with numbers.
I'm supposed to save a stream of bits to a file, where 1 = white and 0 = black. They represent the pixels of a 256x64 bitmap:
bytes 0 to 31 represent the first row of pixels,
bytes 32 to 63 the second row of pixels.
I have code that outputs these bits... but the developer is telling me they are in the wrong order... He says the bytes are fine but the bits are not.
So I'm left confused :p
No. Endianness refers to the order of bytes, not bits. Big endian systems store the most-significant byte first and little-endian systems store the least-significant first. The bits within a byte remain in the same order.
Your ToBig() function is returning the original data rather than the bit-swapped data, it seems.
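For example, on a little-endian machine the following one-liner prints the bytes of 0x01020304 least significant first, while the bits inside each byte stay put:
// On a little-endian PC this prints 04-03-02-01; the bits within each byte are unchanged.
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(0x01020304)));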
Your method may be correct at this point. There are different meanings of endianness, and it depends on the hardware.
Typically, it's used for converting between computing platforms. Most CPU vendors (now) use the same bit ordering but different byte ordering for different chipsets. This means that, if you are passing a 2-byte int from one system to another, you leave the bits alone but swap bytes 1 and 2, i.e.:
int somenumber -> byte[2]: somenumber[high],somenumber[low] ->
byte[2]: somenumber[low],somenumber[high] -> int newNumber
However, this isn't always true. Some hardware still uses inverted BIT ordering, so what you have may be correct. You'll need to either trust your hardware dev. or look into it further.
I recommend reading up on this on Wikipedia - always a great source of info:
http://en.wikipedia.org/wiki/Endianness
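In C#, the two-byte swap described above might look like this (a sketch for a 16-bit value, not code from the answer):
static ushort SwapBytes(ushort value)
{
    // Exchange the high and low byte; the bits inside each byte stay in place.
    return (ushort)((value >> 8) | (value << 8));
}

// SwapBytes(0x1234) == 0x3412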
Your ToBig method has a bug.
At the end:
invertedBits.CopyTo(data, i);
}
return data;
You need to change that to:
byte[] newData = new byte[data.Length];
invertedBits.CopyTo(newData, i);
}
return newData;
You're overwriting your input data, so you end up with both arrays inverted. The problem is that arrays are reference types, so the method can modify the original array.
As greyfade already said, endianness is not about bit ordering.
The reason your code doesn't do what you expect is that the ToBig method changes the array you pass to it. That means that after calling the method, the array is inverted, and data and inverted are just two references pointing to the same array.
Here's a corrected version of the method.
private static byte[] ToBig(byte[] data) {
    byte[] result = new byte[data.Length];
    for (int i = 0; i < data.Length; i++) {
        var bits = new BitArray(new byte[] { data[i] });
        var invertedBits = new BitArray(bits.Count);
        int x = 0;
        for (int p = bits.Count - 1; p >= 0; p--) {
            invertedBits[x] = bits[p];
            x++;
        }
        invertedBits.CopyTo(result, i);
    }
    return result;
}
Edit:
Here's a method that changes endianness for a byte array:
static byte[] ConvertEndianness(byte[] data, int wordSize) {
    if (data.Length % wordSize != 0) throw new ArgumentException("The data length does not divide into an even number of words.");
    byte[] result = new byte[data.Length];
    int offset = wordSize - 1;
    for (int i = 0; i < data.Length; i++) {
        result[i + offset] = data[i];
        offset -= 2;
        if (offset < -wordSize) {
            offset += wordSize * 2;
        }
    }
    return result;
}
Example:
byte[] data = { 1,2,3,4,5,6 };
byte[] inverted = ConvertEndianness(data, 2);
Console.WriteLine(BitConverter.ToString(inverted));
Output:
02-01-04-03-06-05
The second parameter is the word size. As endianness is the ordering of bytes in a word, you have to specify how large the words are.
Edit 2:
Here is a more efficient method for reversing the bits:
static byte[] ReverseBits(byte[] data) {
    byte[] result = new byte[data.Length];
    for (int i = 0; i < data.Length; i++) {
        int b = data[i];
        int r = 0;
        for (int j = 0; j < 8; j++) {
            r <<= 1;
            r |= b & 1;
            b >>= 1;
        }
        result[i] = (byte)r;
    }
    return result;
}
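For instance, reversing the bits of 0x01 (00000001) gives 0x80 (10000000), and 0x0F (00001111) gives 0xF0 (11110000):
byte[] reversed = ReverseBits(new byte[] { 0x01, 0x0F });
Console.WriteLine(BitConverter.ToString(reversed)); // 80-F0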
One big problem I see is that ToBig changes the contents of the data[] array that is passed to it.
You're calling ToBig on an array named data and assigning the result to inverted, but since you didn't create a new array inside ToBig, you modified both. You then proceed to treat data and inverted as different arrays when in reality they are not.
