I am trying to improve on the speed of BitConverter, or rather, find a faster alternative. So here is the code I thought was supposed to be faster:
int bsize = (int)ms.Length;   // ms is the stream/buffer whose length is being written
int index = 0;
byte[] target = new byte[intsize];   // intsize must be at least 4 (an int is 4 bytes)
target[index++] = (byte)bsize;
target[index++] = (byte)(bsize >> 8);
target[index++] = (byte)(bsize >> 16);
target[index] = (byte)(bsize >> 24);
And the BitConverter code:
BitConverter.GetBytes(bsize)
Well, it wasn't faster; in my tests it was a lot slower, more than twice as slow.
So why is it slower?
And is there a way to improve the speed?
EDIT:
BitConverter: 5068 ticks
Other method above: 12847 ticks
EDIT 2: My Benchmark code:
private unsafe void ExecuteBenchmark(int samplingSize = 100000)
{
// run the garbage collector first so it doesn't skew the timing
GC.Collect();
GC.WaitForPendingFinalizers();
// log start
Console.WriteLine("Benchmark started");
// start timer
var t = Stopwatch.StartNew();
for (int i = 0; i < samplingSize; i++)
{
// code under test goes here
}
// stop timer
t.Stop();
// log ending
Console.WriteLine("Execute1 time = " + t.ElapsedTicks + " ticks");
}
Your implementation is slower because BitConverter uses unsafe code which operates on pointers:
public unsafe static byte[] GetBytes(int value)
{
byte[] array = new byte[4];
fixed (byte* ptr = array)
{
*(int*)ptr = value;
}
return array;
}
And back to int:
public unsafe static int ToInt32(byte[] value, int startIndex)
{
if (value == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
}
if ((ulong)startIndex >= (ulong)((long)value.Length))
{
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index);
}
if (startIndex > value.Length - 4)
{
ThrowHelper.ThrowArgumentException(ExceptionResource.Arg_ArrayPlusOffTooSmall);
}
int result;
fixed (byte* pbyte = &value[startIndex])
{
if (startIndex % 4 == 0)
{
// data is aligned: read all four bytes in one go
result = *(int*)pbyte;
}
else if (BitConverter.IsLittleEndian)
{
result = *pbyte | (pbyte[1] << 8) | (pbyte[2] << 16) | (pbyte[3] << 24);
}
else
{
result = (*pbyte << 24) | (pbyte[1] << 16) | (pbyte[2] << 8) | pbyte[3];
}
}
return result;
}
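For completeness, the same pointer trick also works for writing into an existing buffer instead of allocating a new 4-byte array on every call. The following is only a sketch in the spirit of the decompiled GetBytes above; the method name is made up, and it assumes the caller guarantees at least 4 free bytes at index and a little-endian CPU:
static unsafe void WriteInt32(byte[] target, int index, int value)
{
    fixed (byte* ptr = &target[index])
    {
        // one 4-byte store instead of four shifted single-byte stores
        *(int*)ptr = value;
    }
}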
Well, first, measuring the speed of such a tiny amount of code is going to be error-prone. Posting your benchmark might give more answers.
But my guess is that on platforms supporting it (like x86), BitConverter probably does a single bounds check and an unaligned write into target rather than 3 shifts, 4 bounds checks, and 4 writes. It may even end up completely inlined, eliminating all call overhead.
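If you want to measure the two variants yourself, here is a rough comparison sketch rather than a rigorous benchmark; the iteration count is arbitrary, and the sink variable exists only so the JIT cannot optimize the work away:
using System;
using System.Diagnostics;

static class GetBytesBenchmark
{
    public static void Run(int iterations = 1000000)
    {
        long sink = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            byte[] b = BitConverter.GetBytes(i);
            sink += b[0];                       // consume the result
        }
        sw.Stop();
        Console.WriteLine("BitConverter.GetBytes: " + sw.ElapsedTicks + " ticks");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            byte[] b = new byte[4];
            b[0] = (byte)i;
            b[1] = (byte)(i >> 8);
            b[2] = (byte)(i >> 16);
            b[3] = (byte)(i >> 24);
            sink += b[0];                       // consume the result
        }
        sw.Stop();
        Console.WriteLine("Manual shifts: " + sw.ElapsedTicks + " ticks (sink=" + sink + ")");
    }
}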
I have C# code that returns a UInt16 array, but I want to do the same in C++. I looked at other posts, but they use a pointer to a uint array, whereas my array is not a pointer. Does anyone know how to return a uint16_t array properly?
This is the C# code, which works fine:
public static UInt16[] GetIntArrayFromByteArray(byte[] byteArray)
{
if ((byteArray.Length % 2) == 1)
Array.Resize(ref byteArray, byteArray.Length + 1);
UInt16[] intArray = new UInt16[byteArray.Length / 2];
for (int i = 0; i < byteArray.Length; i += 2)
intArray[i / 2] = (UInt16)((byteArray[i] << 8) | byteArray[i + 1]);
return intArray;
}
This is the C++ code, which produces a syntax error:
uint16_t[] GetIntArrayFromByteArray(byte[] byteArray)
{
//if ((byteArray.Length % 2) == 1)
//Array.Resize(ref byteArray, byteArray.Length + 1);
uint16_t[] intArray = new uint16_t[10];
for (int i = 0; i < 10; i += 2)
intArray[i / 2] = (uint16_t)((byteArray[i] << 8) | byteArray[i + 1]);
return intArray;
}
Do not use Type[] ever. Use std::vector:
std::vector<uint16_t> GetIntArrayFromByteArray(std::vector<byte> byteArray)
{
// If the number of bytes is not even, put a zero at the end
if ((byteArray.size() % 2) == 1)
byteArray.push_back(0);
std::vector<uint16_t> intArray;
for (int i = 0; i < byteArray.size(); i += 2)
intArray.push_back((uint16_t)((byteArray[i] << 8) | byteArray[i + 1]));
return intArray;
}
You can also use std::array<Type, Size> if the array would be fixed size.
A more optimal version (thanks to @Aconcagua) (demo)
Here is the full code of a more optimal version that doesn't copy or alter the input. This is better if you have long input arrays. It could be written more compactly, but I wanted to keep it verbose and beginner-friendly.
#include <iostream>
#include <vector>
using byte = unsigned char;
std::vector<uint16_t> GetIntArrayFromByteArray(const std::vector<byte>& byteArray)
{
const int inputSize = byteArray.size();
const bool inputIsOddCount = inputSize % 2 != 0;
const int finalSize = (int)(inputSize/2.0 + 0.5);
// Ignore the last odd item in loop and handle it later
const int loopLength = inputIsOddCount ? inputSize - 1 : inputSize;
std::vector<uint16_t> intArray;
// Reserve space for all items
intArray.reserve(finalSize);
for (int i = 0; i < loopLength; i += 2)
{
intArray.push_back((uint16_t)((byteArray[i] << 8) | byteArray[i + 1]));
}
// If the input was odd-count, we still have one byte to add, along with a zero
if(inputIsOddCount)
{
// The zero in this expression is redundant but illustrative
intArray.push_back((uint16_t)((byteArray[inputSize-1] << 8) | 0));
}
return intArray;
}
int main() {
const std::vector<byte> numbers{2,0,0,0,1,0,0,1};
const std::vector<uint16_t> result(GetIntArrayFromByteArray(numbers));
for(uint16_t num: result) {
std::cout << num << "\n";
}
return 0;
}
Could anyone help me optimize this piece of code? It's currently a large bottleneck, as it gets called very often. Even a 25% speed improvement would be significant.
public int ReadInt(int length)
{
if (Position + length > Length)
throw new BitBufferException("Not enough bits remaining.");
int result = 0;
while (length > 0)
{
int off = Position & 7;
int count = 8 - off;
if (count > length)
count = length;
int mask = (1 << count) - 1;
int bits = (Data[Position >> 3] >> off);
result |= (bits & mask) << (length - count);
length -= count;
Position += count;
}
return result;
}
The best answer will go to the fastest solution. Benchmarks are done with dotTrace. Currently this block of code takes up about 15% of the total CPU time; the lowest number wins.
EDIT: Sample usage:
public class Auth : Packet
{
int Field0;
int ProtocolHash;
int Field1;
public override void Parse(BitBuffer buffer)   // parameter type assumed from the BitBufferException naming
{
Field0 = buffer.ReadInt(9);
ProtocolHash = buffer.ReadInt(32);
Field1 = buffer.ReadInt(8);
}
}
The size of Data is variable, but in most cases it is 512 bytes.
How about using pointers and an unsafe context? You didn't say anything about your input data, the method's context, etc., so I tried to deduce all of these myself.
public class BitTest
{
private int[] _data;
public BitTest(int[] data)
{
Length = data.Length * 4 * 8;
// +2, because we use byte* and long* later
// and don't want to read outside the array memory
_data = new int[data.Length + 2];
Array.Copy(data, _data, data.Length);
}
public int Position { get; private set; }
public int Length { get; private set; }
and the ReadInt method. I hope the comments shed some light on the solution:
public unsafe int ReadInt(int length)
{
if (Position + length > Length)
throw new ArgumentException("Not enough bits remaining.");
// method returns int, so requesting more than 32 bits is pointless
if (length > 4 * 8)
throw new ArgumentException();
//
int bytePosition = Position / 8;
int bitPosition = Position % 8;
Position += length;
// get int* on array to start with
fixed (int* array = _data)
{
// change pointer to byte*
byte* bt = (byte*)array;
// skip already read bytes and change pointer type to long*
long* ptr = (long*)(bt + bytePosition);
// read value from current pointer position
long value = *ptr;
// take only necessary bits
value &= (1L << (length + bitPosition)) - 1;
value >>= bitPosition;
// cast value to int before returning
return (int)value;
}
}
}
I didn't test the method, but would bet it's much faster than your approach.
My simple test code:
var data = new[] { 1 | (1 << 8 + 1) | (1 << 16 + 2) | (1 << 24 + 3) };
var test = new BitTest(data);
var bytes = Enumerable.Range(0, 4)
.Select(x => test.ReadInt(8))
.ToArray();
bytes contains { 1, 2, 4, 8}, as expected.
I don't know if this gives you a significant improvement, but it should give you some numbers.
Instead of creating new int variables inside the loop (which takes time), declare those variables once before entering the loop.
public int ReadInt(int length)
{
if (Position + length > Length)
throw new BitBufferException("Not enough bits remaining.");
int result = 0;
int off = 0;
int count = 0;
int mask = 0;
int bits = 0;
while (length > 0)
{
off = Position & 7;
count = 8 - off;
if (count > length)
count = length;
mask = (1 << count) - 1;
bits = (Data[Position >> 3] >> off);
result |= (bits & mask) << (length - count);
length -= count;
Position += count;
}
return result;
}
Hope this increases your performance, even if only a bit.
I'm not quite sure how to ask this question, but I have two ways (so far) of building a lookup array.
Option 1 is:
bool[][][] myJaggegArray;
myJaggegArray = new bool[120][][];
for (int i = 0; i < 120; ++i)
{
if ((i & 0x88) == 0)
{
//only 64 will be set
myJaggegArray[i] = new bool[120][];
for (int j = 0; j < 120; ++j)
{
if ((j & 0x88) == 0)
{
//only 64 will be set
myJaggegArray[i][j] = new bool[60];
}
}
}
}
Option 2 is:
bool[] myArray;
// [998520]
myArray = new bool[(120 | (120 << 7) | (60 << 14))];
Both ways work nicely, but is there another (better) way of doing a fast lookup, and which one would you take if speed/performance is what matters?
This would be used in a chessboard implementation (0x88), and the access pattern is mostly
[from][to][dataX] for option 1
[(from | (to << 7) | (dataX << 14))] for option 2
I would suggest using one large array, because of the advantages of having one large memory block, but I would also encourage writing a special accessor to that array.
class MyCustomDataStore
{
bool[] array;
int sizex, sizey, sizez;
MyCustomDataStore(int x, int y, int z) {
array=new bool[x*y*z];
this.sizex = x;
this.sizey = y;
this.sizez = z;
}
bool get(int px, int py, int pz) {
// change the order in whatever way you iterate
// row-major index: the x-stride is sizey*sizez and the y-stride is sizez
return array [ px*sizey*sizez + py*sizez + pz ];
}
}
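A similar accessor can wrap the bit-packed layout from option 2 of the question, so the packing formula lives in one place. This is just a sketch with illustrative names, not a drop-in class:
class MoveLookup
{
    // flat array indexed as from | (to << 7) | (dataX << 14), exactly as in option 2
    private readonly bool[] _data = new bool[120 | (120 << 7) | (60 << 14)];

    private static int Index(int from, int to, int dataX)
    {
        return from | (to << 7) | (dataX << 14);
    }

    public bool Get(int from, int to, int dataX)
    {
        return _data[Index(from, to, dataX)];
    }

    public void Set(int from, int to, int dataX, bool value)
    {
        _data[Index(from, to, dataX)] = value;
    }
}
Whichever layout you choose, it is worth benchmarking with the real move-generation access pattern, since cache behaviour is likely to dominate.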
I just updated Dariusz's solution with an array of longs, for z-sizes <= 64.
Edit 2: updated to the '<<' version, size fixed to 128x128x64.
class MyCustomDataStore
{
long[] array;
MyCustomDataStore()
{
array = new long[128 | 128 << 7];
}
bool get(int px, int py, int pz)
{
// shift a long (1L) so pz can go up to 63; the bit is set when the AND is non-zero
return (array[px | (py << 7)] & (1L << pz)) != 0;
}
void set(int px, int py, int pz, bool val)
{
long mask = 1L << pz;
int index = px | (py << 7);
if (val)
{
array[index] |= mask;
}
else
{
array[index] &= ~mask;
}
}
}
Edit: performance test.
100 iterations of a full 128x128x64 fill and read:
long: 9885ms, 132096B
bool: 9740ms, 1065088B
There's got to be a faster and better way to swap the bytes of 16-bit words than this:
public static void Swap(byte[] data)
{
for (int i = 0; i < data.Length; i += 2)
{
byte b = data[i];
data[i] = data[i + 1];
data[i + 1] = b;
}
}
Does anyone have an idea?
In my attempt to apply for the Uberhacker award, I submit the following. For my testing, I used a Source array of 8,192 bytes and called SwapX2 100,000 times:
public static unsafe void SwapX2(Byte[] source)
{
fixed (Byte* pSource = &source[0])
{
Byte* bp = pSource;
Byte* bp_stop = bp + source.Length;
while (bp < bp_stop)
{
*(UInt16*)bp = (UInt16)(*bp << 8 | *(bp + 1));
bp += 2;
}
}
}
My benchmarking indicates that this version is over 1.8 times faster than the code submitted in the original question.
This way appears to be slightly faster than the method in the original question:
private static byte[] _temp = new byte[0];
public static void Swap(byte[] data)
{
if (data.Length > _temp.Length)
{
_temp = new byte[data.Length];
}
Buffer.BlockCopy(data, 1, _temp, 0, data.Length - 1);
for (int i = 0; i < data.Length; i += 2)
{
_temp[i + 1] = data[i];
}
Buffer.BlockCopy(_temp, 0, data, 0, data.Length);
}
My benchmarking assumed that the method is called repeatedly, so that the resizing of the _temp array isn't a factor. This method relies on the fact that half of the byte-swapping can be done with the initial Buffer.BlockCopy(...) call (with the source position offset by 1).
Please benchmark this yourselves, in case I've completely lost my mind. In my tests, this method takes approximately 70% as long as the original method (which I modified to declare the byte b outside of the loop).
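To see why the offset-by-one BlockCopy does half of the swapping, here is a tiny worked illustration (the letters are just labels for byte values):
// data  = { A, B, C, D }
// Buffer.BlockCopy(data, 1, _temp, 0, 3) gives:
//   _temp = { B, C, D, ? }      // every even slot already holds its swapped byte
// The loop then copies the even-indexed source bytes into the odd slots:
//   _temp = { B, A, D, C }
// Finally the whole of _temp is copied back over data.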
I always liked this:
public static Int64 SwapByteOrder(Int64 value)
{
var uvalue = (UInt64)value;
UInt64 swapped =
( (0x00000000000000FF) & (uvalue >> 56)
| (0x000000000000FF00) & (uvalue >> 40)
| (0x0000000000FF0000) & (uvalue >> 24)
| (0x00000000FF000000) & (uvalue >> 8)
| (0x000000FF00000000) & (uvalue << 8)
| (0x0000FF0000000000) & (uvalue << 24)
| (0x00FF000000000000) & (uvalue << 40)
| (0xFF00000000000000) & (uvalue << 56));
return (Int64)swapped;
}
I believe you'll find this is the fastest method, as well as being fairly readable and safe. Obviously this applies to 64-bit values, but the same technique could be used for 32- or 16-bit values.
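As an illustration of that last remark, here is the same mask-and-shift pattern applied to 32-bit values; this variant is not from the answer above, it just follows the same technique:
public static Int32 SwapByteOrder(Int32 value)
{
    var uvalue = (UInt32)value;
    UInt32 swapped =
        ( (0x000000FFU) & (uvalue >> 24)
        | (0x0000FF00U) & (uvalue >> 8)
        | (0x00FF0000U) & (uvalue << 8)
        | (0xFF000000U) & (uvalue << 24));
    return (Int32)swapped;
}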
The next method was, in my tests, almost 3 times faster than the accepted answer. (It is always faster on more than 3 characters or 6 bytes, and a bit slower on 3 characters or 6 bytes or fewer.) (Note that the accepted answer can read/write outside the bounds of the array.)
(Update: while we already have a pointer, there's no need to call the Length property to get the length. Reading the length through the pointer is a bit faster, but requires either a runtime check or, as in the next example, a project configuration per platform. Define X86 and X64 under the respective configurations.)
static unsafe void SwapV2(byte[] source)
{
fixed (byte* psource = source)
{
#if X86
var length = *((uint*)(psource - 4)) & 0xFFFFFFFEU;
#elif X64
var length = *((uint*)(psource - 8)) & 0xFFFFFFFEU;
#else
var length = (source.Length & 0xFFFFFFFE);
#endif
while (length > 7)
{
length -= 8;
ulong* pulong = (ulong*)(psource + length);
*pulong = ( ((*pulong >> 8) & 0x00FF00FF00FF00FFUL)
| ((*pulong << 8) & 0xFF00FF00FF00FF00UL));
}
if(length > 3)
{
length -= 4;
uint* puint = (uint*)(psource + length);
*puint = ( ((*puint >> 8) & 0x00FF00FFU)
| ((*puint << 8) & 0xFF00FF00U));
}
if(length > 1)
{
ushort* pushort = (ushort*)psource;
*pushort = (ushort) ( (*pushort >> 8)
| (*pushort << 8));
}
}
}
Five tests with 300,000 iterations of 8192 bytes
SwapV2: 1055, 1051, 1043, 1041, 1044
SwapX2: 2802, 2803, 2803, 2805, 2805
Five tests with 50,000,000 iterations of 6 bytes
SwapV2: 1092, 1085, 1086, 1087, 1086
SwapX2: 1018, 1019, 1015, 1017, 1018
But if the data is large and performance really matters, you could use SSE or AVX. (13 times faster.) https://pastebin.com/WaFk275U
Tested 5 times, 100,000 loops with 8192 bytes (4096 chars):
SwapX2 : 226, 223, 225, 226, 227 Min: 223
SwapV2 : 113, 111, 112, 114, 112 Min: 111
SwapA2 : 17, 17, 17, 17, 16 Min: 16
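The pastebin code is not reproduced here, but a minimal sketch of the SIMD idea, assuming .NET Core 3.0+ and the System.Runtime.Intrinsics API, could look like the following. It shuffles 16 bytes at a time with SSSE3 and falls back to the scalar swap for the tail and for CPUs without SSSE3:
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static unsafe class SimdSwap
{
    public static void Swap(byte[] data)
    {
        int i = 0;
        fixed (byte* p = data)
        {
            if (Ssse3.IsSupported)
            {
                // shuffle mask that exchanges every pair of bytes: 1,0,3,2,...,15,14
                Vector128<byte> mask = Vector128.Create(
                    (byte)1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
                for (; i + 16 <= data.Length; i += 16)
                {
                    Vector128<byte> v = Sse2.LoadVector128(p + i);
                    Sse2.Store(p + i, Ssse3.Shuffle(v, mask));
                }
            }
            // scalar tail, also the fallback when SSSE3 is unavailable
            for (; i + 1 < data.Length; i += 2)
            {
                byte b = p[i];
                p[i] = p[i + 1];
                p[i + 1] = b;
            }
        }
    }
}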
Well, you could use the XOR swapping trick, to avoid an intermediate byte. It won't be any faster, though, and I wouldn't be surprised if the IL is exactly the same.
for (int i = 0; i < data.Length; i += 2)
{
data[i] ^= data[i + 1];
data[i + 1] ^= data[i];
data[i] ^= data[i + 1];
}
I have a byte[] testKey = new byte[8];
This obviously starts with all bytes as 0. I want to go through all the bytes and increment by 1 on each iteration of the loop, so that eventually I go through all possibilities of the byte array. I also want to do this as FAST as possible. Yes, I am trying to write a brute forcer.
Update: I got the unsafe method working, and it is the quickest. However, by my calculations it is going to take 76,000,000 years to loop through while doing DES encryption on each key with the .NET DESCryptoServiceProvider; 10,000 encryptions take 1.3 seconds. Thanks for all the awesome answers to the most useless question ever!
By the way, it takes a lot of processing to check 2^64 options...
Well, the fastest way may be to just use an Int64 (aka long) or UInt64 (ulong), and use ++? Do you really need the byte[]?
As a hacky alternative, how about:
Array.Clear(data, 0, data.Length);
while (true)
{
// use data here
if (++data[7] == 0) if (++data[6] == 0)
if (++data[5] == 0) if (++data[4] == 0)
if (++data[3] == 0) if (++data[2] == 0)
if (++data[1] == 0) if (++data[0] == 0) break;
}
The only other approach I can think of would be to use unsafe code to talk to an array as though it is an int64... messy.
unsafe static void Test() {
byte[] data = new byte[8];
fixed (byte* first = data) {
ulong* value = (ulong*)first;
do {
// use data here
*value = *value + 1;
} while (*value != 0);
}
}
This is how you increase the value in the array:
int index = testKey.Length - 1;
while (index >= 0) {
if (testKey[index] < 255) {
testKey[index]++;
break;
} else {
testKey[index--] = 0;
}
}
When index is -1 after this code, you have iterated all combinations.
This will be slightly faster than using BitConverter, as it doesn't create a new array for each iteration.
Edit:
A small performance test showed that this is about 1400 times faster than using BitConverter...
What a great question! Here's a way to do it without unsafe code:
// requires explicit layout for FieldOffset to work; needs using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
public struct LongAndBytes
{
[FieldOffset(0)]
public ulong UlongValue;
[FieldOffset(0)]
public byte Byte0;
[FieldOffset(1)]
public byte Byte1;
[FieldOffset(2)]
public byte Byte2;
[FieldOffset(3)]
public byte Byte3;
[FieldOffset(4)]
public byte Byte4;
[FieldOffset(5)]
public byte Byte5;
[FieldOffset(6)]
public byte Byte6;
[FieldOffset(7)]
public byte Byte7;
public byte[] ToArray()
{
return new byte[8] {Byte0, Byte1, Byte2, Byte3, Byte4, Byte5, Byte6, Byte7};
}
}
// ...
LongAndBytes lab = new LongAndBytes();
lab.UlongValue = 0;
do {
// stuff
lab.UlongValue++;
} while (lab.UlongValue != 0);
Each of the members Byte0...Byte7 overlaps the ulong and shares its storage. It's not an array - I tried fiddling around with that and had unsatisfactory results. I bet someone knows the magic declaration to make that happen. I can do that for a P/Invoke, but not for use in .NET, as an array is an object.
A byte[8] is essentially a ulong, but if you really need it to be a byte[8] you can use:
byte[] bytes = new byte[8];
ulong i = 0;
bytes = BitConverter.GetBytes(i);
You can extract the bytes using bit operators:
byte[] bytes = new byte[8];
for (ulong u = 0; u < ulong.MaxValue; u++)
{
bytes[0] = (byte)(u & 0xff);
bytes[1] = (byte)((u >> 8) & 0xff);
bytes[2] = (byte)((u >> 16) & 0xff);
bytes[3] = (byte)((u >> 24) & 0xff);
bytes[4] = (byte)((u >> 32) & 0xff);
bytes[5] = (byte)((u >> 40) & 0xff);
bytes[6] = (byte)((u >> 48) & 0xff);
bytes[7] = (byte)((u >> 56) & 0xff);
// do your stuff...
}
This is less 'hackish', since it operates on an unsigned 64-bit integer first and then extracts the bytes. However, beware of CPU endianness.
for (UInt64 i = 0; i < UInt64.MaxValue; i++)
{
byte[] data = BitConverter.GetBytes(i);
}
byte[] array = new byte[8];
int[] shifts = new int[] { 0, 8, 16, 24, 32, 40, 48, 56 };
for (long index = long.MinValue; index < long.MaxValue; index++) // '<=' would never terminate; long.MaxValue itself needs one extra pass
{
for (int i = 0; i < 8; i++)
{
array[i] = (byte)((index >> shifts[i]) & 0xff);
}
// test array
}
for (int i = 0; i < bytes.Length && 0 == ++bytes[i]; i++) ; // && short-circuits so bytes[i] is never read once i reaches bytes.Length
Should be as fast as the unsafe method and allows arrays of any size.
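For context, here is a small usage sketch of that one-liner as the step function of a brute-force loop; the termination check is one assumption about how you would detect the full wrap-around:
byte[] bytes = new byte[8];
while (true)
{
    // use bytes here

    int i = 0;
    for (; i < bytes.Length && 0 == ++bytes[i]; i++) ;
    if (i == bytes.Length)
        break;   // every byte rolled over to zero: all combinations have been visited
}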
Simple iteration:
static IEnumerable<byte[]> Iterate(int arrayLength) {
var arr = new byte[arrayLength];
var i = 0;
yield return arr;
while (i < arrayLength)
{
if (++arr[i] != 0)
{
i = 0;
yield return arr;
}
else i++;
}
}
static void Main(string[] args)
{
foreach (var arr in Iterate(2))
{
Console.Write(String.Join(",", arr.Select(x => $"{x:D3}")));
Console.WriteLine();
}
}
Sorry for the late post, but I needed the described feature too and implemented it in a pretty easy way in my opinion. Maybe it's useful for somebody else too:
private byte[] IncrementBytes(byte[] bytes)
{
for (var i = bytes.Length - 1; i >= 0; i--)
{
if (bytes[i] < byte.MaxValue)
{
bytes[i]++;
break;
}
bytes[i] = 0;
}
return bytes;
}
BitConverter.ToInt64 / BitConverter.GetBytes: convert the 8 bytes to a long and increment that instead.
Convert back to bytes only when you actually need them.
It is the fastest approach available in the framework.
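A minimal sketch of that idea, using the unsigned variant so the wrap-around check stays simple; when exactly you convert back to bytes is up to the caller:
byte[] testKey = new byte[8];
ulong counter = BitConverter.ToUInt64(testKey, 0);   // starts at 0

do
{
    // materialize the byte[] only when a key is actually needed
    testKey = BitConverter.GetBytes(counter);
    // use testKey here

    counter++;
} while (counter != 0);                              // wrapped around: all 2^64 values tried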