Store float as bytes into large array instead of new byte[] - c#

I need to serialize a bunch of floats and convert to little endian if necessary. I'm aware of BitConverter.GetBytes(float), but I'd rather avoid allocating a ton of little 4-byte arrays on the GC heap. How can I do the conversion into an existing large byte[] array with an offset index? I want something like:
float[] theFloats; // filled up somewhere
byte[] theBytes = new byte[theFloats.Length * 4];
int offset = 0;
for (int i = 0; i < theFloats.Length; ++i)
{
    MagicClass.CopyFloatToBytes(theFloats[i], theBytes, offset);
    offset += 4;
}

You can create a MemoryStream around the array, then create a BinaryWriter and write floats to it.
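A sketch of that approach, reusing the variable names from the question (GetFloats() is a hypothetical stand-in for wherever the data comes from). Note that BinaryWriter writes in the platform's byte order, which is little-endian on common .NET platforms:

```csharp
using System.IO;

float[] theFloats = GetFloats();                  // hypothetical source of data
byte[] theBytes = new byte[theFloats.Length * 4];

// MemoryStream wraps the existing array; no per-float byte[] allocations.
using (var stream = new MemoryStream(theBytes))
using (var writer = new BinaryWriter(stream))
{
    foreach (float f in theFloats)
        writer.Write(f);   // writes 4 bytes per float directly into theBytes
}
```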

Why not just use BitConverter.GetBytes?
You can also do this with [StructLayout(LayoutKind.Explicit)]:
[StructLayout(LayoutKind.Explicit)]
public struct Convert32BitType
{
    [FieldOffset(0)]
    public int Int32Value;
    [FieldOffset(0)]
    public float FloatValue;
}

// Example:
var tmp = new Convert32BitType();
tmp.FloatValue = 1.1f; // note the f suffix; 1.1 alone is a double and won't compile
int ival = tmp.Int32Value;
byte b1 = (byte)(ival >> 24);
byte b2 = (byte)(ival >> 16);
byte b3 = (byte)(ival >> 8);
byte b4 = (byte)(ival >> 0);
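Putting the union to work, a MagicClass.CopyFloatToBytes along the lines the question asks for might look like this (a sketch; the shifts below store little-endian, reverse them if you want big-endian output):

```csharp
public static class MagicClass
{
    // Reinterpret the float's bits through the explicit-layout struct,
    // then store them little-endian at the given offset. No allocations.
    public static void CopyFloatToBytes(float value, byte[] dest, int offset)
    {
        var tmp = new Convert32BitType { FloatValue = value };
        int ival = tmp.Int32Value;
        dest[offset]     = (byte)ival;
        dest[offset + 1] = (byte)(ival >> 8);
        dest[offset + 2] = (byte)(ival >> 16);
        dest[offset + 3] = (byte)(ival >> 24);
    }
}
```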
Another possibility is to use the fixed keyword and cast the pointer, but that requires unsafe code.
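A sketch of that unsafe variant (it writes in native byte order, so swap the bytes afterwards if the host is big-endian):

```csharp
public static unsafe void CopyFloatToBytes(float value, byte[] dest, int offset)
{
    // Pin the destination array and reinterpret the target bytes as a float.
    fixed (byte* p = &dest[offset])
    {
        *(float*)p = value;
    }
}
```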

Related

C# - Fastest way of Interpolating a large byte array (RGB to RGBA)

I am uploading frames from a camera to a texture on the GPU for processing (using SharpDX). My issue at the moment is that the frames come in as 24-bit RGB, but DX11 no longer has a 24-bit RGB texture format, only 32-bit RGBA. After every 3 bytes I need to add another byte with the value 255 (no transparency). I've tried this method of iterating through the byte array to add it, but it's too expensive. Using GDI bitmaps to convert is also very expensive.
int count = 0;
for (int i = 0; i < frameDataBGRA.Length - 3; i += 4)
{
    frameDataBGRA[i] = frameData[i - count];
    frameDataBGRA[i + 1] = frameData[(i + 1) - count];
    frameDataBGRA[i + 2] = frameData[(i + 2) - count];
    frameDataBGRA[i + 3] = 255;
    count++;
}
Assuming you can compile with unsafe, using pointers in that case will give you a significant boost.
First create two structs to hold data in a packed way:
[StructLayout(LayoutKind.Sequential)]
public struct RGBA
{
    public byte r;
    public byte g;
    public byte b;
    public byte a;
}

[StructLayout(LayoutKind.Sequential)]
public struct RGB
{
    public byte r;
    public byte g;
    public byte b;
}
First version :
static unsafe void Process_Pointer_PerChannel(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    fixed (byte* rgbaPtr = &rgbaData[0])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < pixelCount; i++)
        {
            rgba->r = rgb->r;
            rgba->g = rgb->g;
            rgba->b = rgb->b;
            rgba->a = 255;
            rgb++;
            rgba++;
        }
    }
}
This avoids a lot of indexing, and passes data directly.
Another version, which is slightly faster, casts and copies the three color bytes in a single struct assignment:
static unsafe void Process_Pointer_Cast(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    fixed (byte* rgbaPtr = &rgbaData[0])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < pixelCount; i++)
        {
            RGB* cp = (RGB*)rgba;
            *cp = *rgb;       // copy r, g, b in one struct assignment
            rgba->a = 255;
            rgb++;
            rgba++;
        }
    }
}
One small extra (marginal) optimization: if you keep the same array the whole time and reuse it, you can initialize the alpha channel once to 255, e.g.:
static void InitRGBA_Alpha(int pixelCount, byte[] rgbaData)
{
    for (int i = 0; i < pixelCount; i++)
    {
        rgbaData[i * 4 + 3] = 255;
    }
}
Then, since you never change this channel again, the other functions no longer need to write to it:
static unsafe void Process_Pointer_Cast_NoAlpha(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    fixed (byte* rgbaPtr = &rgbaData[0])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < pixelCount; i++)
        {
            RGB* cp = (RGB*)rgba;
            *cp = *rgb;
            rgb++;
            rgba++;
        }
    }
}
In my test (running a 1920*1080 image, 100 iterations; i7, x64 release build, average running time) I get:
Your version: 6.81ms
Process_Pointer_PerChannel: 4.3ms
Process_Pointer_Cast: 3.8ms
Process_Pointer_Cast_NoAlpha: 3.5ms
Please note that all of these functions can also easily be chunked, with the parts run in multi-threaded versions.
If you need higher performance, you have two options (a bit out of scope for the question):
Upload your image into a byte address buffer (as rgb) and perform the conversion to texture in a compute shader. That involves some bit shifting and a bit of fiddling with formats, but is reasonably straightforward to achieve.
Camera images generally come in a Yuv format (with u and v downsampled), so it's much faster to upload the image in that color space and perform the conversion to rgba in either a pixel shader or a compute shader. If your camera SDK allows you to get pixel data in that native format, that's the way to go.
#catflier: good work, but it can go a little faster. ;-)
Reproduced times on my hardware:
Base version: 5.48ms
Process_Pointer_PerChannel: 2.84ms
Process_Pointer_Cast: 2.16ms
Process_Pointer_Cast_NoAlpha: 1.60ms
My experiments:
FastConvert: 1.45ms
FastConvert4: 1.13ms (the pixel count must be divisible by 4 here, but that is usually no problem)
Things that improved speed:
The RGB struct has to read 3 single bytes per pixel, but it is faster to read a whole uint (4 bytes) and simply ignore the last byte.
The alpha value can then be OR-ed in directly as part of the uint bit operation.
Modern processors can often address a fixed pointer plus an offset faster than a pointer that is itself incremented.
In x64 mode the offset variables should use a 64-bit type (long instead of int), which reduces the overhead of the accesses.
Partially unrolling the inner loop gains a bit more performance.
The Code:
static unsafe void FastConvert(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbP = &rgbData[0], rgbaP = &rgbaData[0])
    {
        for (long i = 0, offsetRgb = 0; i < pixelCount; i++, offsetRgb += 3)
        {
            // Reads a whole uint per pixel; note that the last pixel reads one
            // byte past the rgb data, so the source buffer needs 1 byte of slack.
            ((uint*)rgbaP)[i] = *(uint*)(rgbP + offsetRgb) | 0xff000000;
        }
    }
}

static unsafe void FastConvert4Loop(long pixelCount, byte* rgbP, byte* rgbaP)
{
    for (long i = 0, offsetRgb = 0; i < pixelCount; i += 4, offsetRgb += 12)
    {
        uint c1 = *(uint*)(rgbP + offsetRgb);
        uint c2 = *(uint*)(rgbP + offsetRgb + 3);
        uint c3 = *(uint*)(rgbP + offsetRgb + 6);
        uint c4 = *(uint*)(rgbP + offsetRgb + 9);
        ((uint*)rgbaP)[i] = c1 | 0xff000000;
        ((uint*)rgbaP)[i + 1] = c2 | 0xff000000;
        ((uint*)rgbaP)[i + 2] = c3 | 0xff000000;
        ((uint*)rgbaP)[i + 3] = c4 | 0xff000000;
    }
}

static unsafe void FastConvert4(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    if ((pixelCount & 3) != 0) throw new ArgumentException("pixelCount must be divisible by 4");
    fixed (byte* rgbP = &rgbData[0], rgbaP = &rgbaData[0])
    {
        FastConvert4Loop(pixelCount, rgbP, rgbaP);
    }
}

Convert ushort[] into byte[] and back

I have a ushort array that needs converting into a byte array to be transferred over a network.
Once it gets to its destination, I then need to convert it back into the same ushort array it was to begin with.
Ushort Array
It's an array of Length 217,088 (a 512 by 424 image flattened to 1D), stored as 16-bit unsigned integers; each element is 2 bytes.
Byte Array
It needs to be converted into a byte array for network purposes. As each ushort element is worth 2 bytes, I assume the byte array Length needs to be 217,088 * 2?
In terms of converting, and then 'unconverting' correctly, I am unsure on how to do that.
This is for a Unity3D project that is in C#. Could someone point me in the right direction?
Thanks.
You're looking for BlockCopy:
https://msdn.microsoft.com/en-us/library/system.buffer.blockcopy(v=vs.110).aspx
and yes, short as well as ushort is 2 bytes long; that's why the corresponding byte array must be twice as long as the initial short one.
Direct (byte to short):
byte[] source = new byte[] { 5, 6 };
short[] target = new short[source.Length / 2];
Buffer.BlockCopy(source, 0, target, 0, source.Length);
Reverse:
short[] source = new short[] {7, 8};
byte[] target = new byte[source.Length * 2];
Buffer.BlockCopy(source, 0, target, 0, source.Length * 2);
Using offsets (the second and fourth parameters of Buffer.BlockCopy) you can handle the 1D array being broken down (as you've put it):
// it's unclear to me what the "broken down 1d array" is, so
// let it be an array of arrays (say 512 lines, each of 424 items)
ushort[][] image = ...;
// data - sum up all the lengths (512 * 424) and * 2 (bytes)
byte[] data = new byte[image.Sum(line => line.Length) * 2];
int offset = 0;
for (int i = 0; i < image.Length; ++i)
{
    int count = image[i].Length * 2;
    // source offset is 0: each line is copied from its own start
    Buffer.BlockCopy(image[i], 0, data, offset, count);
    offset += count;
}
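For the flat 1D array in the question, the round trip is just two BlockCopy calls:

```csharp
ushort[] pixels = new ushort[512 * 424];       // 217,088 elements
byte[] wire = new byte[pixels.Length * 2];     // 2 bytes per ushort
Buffer.BlockCopy(pixels, 0, wire, 0, wire.Length);

// ...send over the network, then on the receiving side:
ushort[] restored = new ushort[wire.Length / 2];
Buffer.BlockCopy(wire, 0, restored, 0, wire.Length);
```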

How to convert an array of signed bytes to float?

I just got confused about how to convert an array of 4 signed bytes to a float number.
I just know that for an array of unsigned bytes bts, I can probably use this function:
BitConverter.ToSingle(bts, 0);
However, it looks like BitConverter.ToSingle only accepts byte array instead of sbyte array.
Could somebody give me some ideas please?
Thanks!
Maybe this (the shifting loop originally posted here was broken; cast each sbyte to byte, shift it into place, then reinterpret the assembled bits):
int bits = 0;
for (int i = 0; i < sbytesArr.Length; i++)
{
    bits |= (byte)sbytesArr[i] << (8 * i); // little-endian assembly
}
float num = BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
float value = 5000.1234f;
//
// Invoke BitConverter.GetBytes to convert the float to bytes.
//
byte[] array = BitConverter.GetBytes(value);
foreach (byte element in array)
{
    Console.WriteLine(element);
}
//
// You can convert the bytes back to a float.
//
float result = BitConverter.ToSingle(array, 0);
Console.WriteLine(result);
Assuming that your signed bytes are in an array named sbts you can first of all convert to an unsigned byte array, and then use BitConverter.ToSingle().
byte[] bts = new byte[sbts.Length];
Buffer.BlockCopy(sbts, 0, bts, 0, sbts.Length);
float f = BitConverter.ToSingle(bts, 0);
It is a little known fact that byte and sbyte are interchangeable at the CLR level:
sbyte[] a = new sbyte[1];
byte[] b = (byte[])(object)a;
This code actually works at runtime, so you can pass in the array that you have:
BitConverter.ToSingle((byte[])(object)bts, 0);
Call the GetFloatValue method, passing an array of four sbytes as the parameter:
public float GetFloatValue(sbyte[] data)
{
    return bytesToFloat(data[0], data[1], data[2], data[3]);
}

private static float bytesToFloat(sbyte b0, sbyte b1, sbyte b2, sbyte b3)
{
    // Note: this is not IEEE 754; it decodes a mantissa * 10^exponent format.
    int mantissa = (byte)b0 + ((byte)b1 << 8) + ((byte)b2 << 16);
    return (float)(mantissa * Math.Pow(10, b3));
}

Fastest way to convert int to 4 bytes in C#

What is a fastest way to convert int to 4 bytes in C# ?
Fastest as in execution time not development time.
My own solution is this code:
byte[] bytes = new byte[4];
unchecked
{
    bytes[0] = (byte)(data >> 24);
    bytes[1] = (byte)(data >> 16);
    bytes[2] = (byte)(data >> 8);
    bytes[3] = (byte)(data);
}
Right now I see that my solution outperforms both struct and BitConverter by a couple of ticks.
I think unsafe is probably the fastest option and I accept that as the answer, but I would prefer to use a managed option.
A byte* cast using unsafe code is by far the fastest:
unsafe static void Main(string[] args)
{
    int i = 0x12345678;
    byte* pi = (byte*)&i;
    byte lsb = pi[0];
    // etc..
}
That's what BitConverter does as well; this code just avoids the cost of creating the array.
What is a fastest way to convert int to 4 bytes in C# ?
Using BitConverter and its GetBytes overload that takes a 32-bit integer:
int i = 123;
byte[] buffer = BitConverter.GetBytes(i);
The fastest way is with a struct containing 4 bytes in a defined layout (at byte positions 0, 1, 2, 3) and an int32 that starts at position 0. Set the int, read out the 4 bytes. Finished. Significantly faster than BitConverter.
http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute.aspx
has the necessary attribute.
[StructLayout(LayoutKind.Explicit)]
struct FooUnion
{
    [FieldOffset(0)]
    public byte byte0;
    [FieldOffset(1)]
    public byte byte1;
    [FieldOffset(2)]
    public byte byte2;
    [FieldOffset(3)]
    public byte byte3;
    [FieldOffset(0)]
    public int integer;
}
I have done some research on the time needed to serialize a basic type to a byte array. I did it for the case when you already have an array and an offset where you want to put your data. I think that is the really important case, compared with the theoretical "get an array of 4 bytes", because when you are serializing something, that is exactly what you need. I found that which method is fastest depends on the type you want to serialize. I tried a few methods:
1. Unsafe reference with an additional buffer-overrun check
2. GetBytes + subsequent Buffer.BlockCopy (essentially the same as 1, plus overhead)
3. Direct assignment with shift: m_Bytes[offset] = (byte)(value >> 8)
4. Direct assignment with shift and bitwise &: m_Bytes[offset] = (byte)((i >> 8) & 0xFF)
I ran all of the tests 10 million times. Below are the results in milliseconds:

Method   Long   Int   Short   Byte   Float   Double
1          29    32      31     30      29       34
2         209   233     220    212     208      228
3          63    24      13      8      24       44
4          72    29      14
As you can see, the unsafe way is much faster for long and double (the unsigned versions perform about the same as their signed counterparts, so they are not in the table). For short/int/float the fastest way is the 2/4/4 shifted assignments. For byte the fastest is obviously the simple assignment. So, regarding the original question, the assignment approach is best. This is an example of such a function written the fastest way:
public static void WriteInt(byte[] buffer, int offset, int value)
{
    buffer[offset] = (byte)(value >> 24);
    buffer[offset + 1] = (byte)(value >> 16);
    buffer[offset + 2] = (byte)(value >> 8);
    buffer[offset + 3] = (byte)value;
}
P.S. The tests were run in an x64 environment, with code compiled for Any CPU (which ran as x64) in release mode.
Note that BitConverter may not be the fastest, as the test below shows.
Use the BitConverter class, specifically the GetBytes method that takes an Int32 parameter:
var myInt = 123;
var bytes = BitConverter.GetBytes(myInt);
You can use BitConverter.IsLittleEndian to determine the byte order based on the CPU architecture.
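For example, a small helper (hypothetical name) that always returns little-endian bytes regardless of the host:

```csharp
static byte[] GetLittleEndianBytes(int value)
{
    byte[] bytes = BitConverter.GetBytes(value);   // native byte order
    if (!BitConverter.IsLittleEndian)
        Array.Reverse(bytes);                      // big-endian host: swap
    return bytes;
}
```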
EDIT: The test below isn't conclusive due to compiler optimisations.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Runtime.InteropServices;

namespace ConsoleApplication1
{
    [StructLayout(LayoutKind.Explicit)]
    struct FooUnion
    {
        [FieldOffset(0)]
        public byte byte0;
        [FieldOffset(1)]
        public byte byte1;
        [FieldOffset(2)]
        public byte byte2;
        [FieldOffset(3)]
        public byte byte3;
        [FieldOffset(0)]
        public int integer;
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Warm-up calls so JIT cost is not measured.
            testUnion();
            testBitConverter();

            Stopwatch Timer = new Stopwatch();
            Timer.Start();
            testUnion();
            Timer.Stop();
            Console.WriteLine(Timer.ElapsedTicks);

            Timer = new Stopwatch();
            Timer.Start();
            testBitConverter();
            Timer.Stop();
            Console.WriteLine(Timer.ElapsedTicks);
            Console.ReadKey();
        }

        static void testBitConverter()
        {
            byte[] UnionBytes;
            for (int i = 0; i < 10000; i++)
            {
                UnionBytes = BitConverter.GetBytes(i);
            }
        }

        static void testUnion()
        {
            byte[] UnionBytes;
            for (int i = 0; i < 10000; i++)
            {
                FooUnion union = new FooUnion() { integer = i };
                UnionBytes = new byte[] { union.byte0, union.byte1, union.byte2, union.byte3 };
            }
        }
    }
}
Since many in here seem to argue about whether BitConverter is better than a dedicated struct: based on the BCL source code, BitConverter.GetBytes() looks like this:
public static unsafe byte[] GetBytes(int value)
{
    byte[] buffer = new byte[4];
    fixed (byte* bufferRef = buffer)
    {
        *((int*)bufferRef) = value;
    }
    return buffer;
}
Which, from my point of view, is cleaner and seems faster than making 1 integer + 4 byte assignments to an explicit struct like this one:
[StructLayout(LayoutKind.Explicit)]
struct IntByte
{
    [FieldOffset(0)]
    public int IntVal;
    [FieldOffset(0)]
    public byte Byte0;
    [FieldOffset(1)]
    public byte Byte1;
    [FieldOffset(2)]
    public byte Byte2;
    [FieldOffset(3)]
    public byte Byte3;
}
new IntByte { IntVal = 10 } -> Byte0, Byte1, Byte2, Byte3.
class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        unsafe
        {
            byte[] byteArray = new byte[4];
            for (int i = 0; i != int.MaxValue; ++i)
            {
                fixed (byte* asByte = byteArray)
                    *((int*)asByte) = 43;
            }
        }
        Console.WriteLine(sw.ElapsedMilliseconds);
        Console.Read();
    }
}
Averages around 2770ms on my machine while
[StructLayout(LayoutKind.Explicit)]
struct Switcher
{
    [FieldOffset(0)]
    public int intVal;
    [FieldOffset(0)]
    public byte b0;
    [FieldOffset(1)]
    public byte b1;
    [FieldOffset(2)]
    public byte b2;
    [FieldOffset(3)]
    public byte b3;
}

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        byte[] byteArray = new byte[4];
        Switcher swi = new Switcher();
        for (int i = 0; i != int.MaxValue; ++i)
        {
            swi.intVal = 43;
            byteArray[0] = swi.b0;
            byteArray[1] = swi.b1;
            byteArray[2] = swi.b2;
            byteArray[3] = swi.b3;
        }
        Console.WriteLine(sw.ElapsedMilliseconds);
        Console.Read();
    }
}
Averages around 4510ms.
I think this might be the fastest way in C# (with the byte array initialized to 4x the length of the int array):
private MemoryStream Convert(int[] Num, byte[] Bytes)
{
    Buffer.BlockCopy(Num, 0, Bytes, 0, Bytes.Length);
    MemoryStream stream = new MemoryStream(Bytes);
    return stream;
}
Union is the fastest way of splitting an integer into bytes. Below is a complete program in which the C# optimizer can't optimize the byte splitting operation out, because each byte is summed and the sum is printed out.
The timings on my laptop are 419 milliseconds for the Union and 461 milliseconds for the BitConverter. The speed gain, however, is much greater.
This method is used in HPCsharp, an open-source high-performance algorithms library, where the Union method provides a nice performance boost for Radix Sort.
Union is faster because it performs no bitwise masking and no bit shifts; it simply reads the proper byte out of the 4-byte integer.
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

namespace SplitIntIntoBytes
{
    [StructLayout(LayoutKind.Explicit)]
    struct FooUnion
    {
        [FieldOffset(0)]
        public byte byte0;
        [FieldOffset(1)]
        public byte byte1;
        [FieldOffset(2)]
        public byte byte2;
        [FieldOffset(3)]
        public byte byte3;
        [FieldOffset(0)]
        public int integer;
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Warm-up calls so JIT cost is not measured.
            testUnion();
            testBitConverter();

            Stopwatch Timer = new Stopwatch();
            Timer.Start();
            int sumTestUnion = testUnion();
            Timer.Stop();
            // ElapsedMilliseconds to match the "milliseconds" label.
            Console.WriteLine("time of Union: " + Timer.ElapsedMilliseconds + " milliseconds, sum: " + sumTestUnion);

            Timer.Restart();
            int sumBitConverter = testBitConverter();
            Timer.Stop();
            Console.WriteLine("time of BitConverter: " + Timer.ElapsedMilliseconds + " milliseconds, sum: " + sumBitConverter);
            Console.ReadKey();
        }

        static int testBitConverter()
        {
            byte[] UnionBytes = new byte[4];
            byte[] SumOfBytes = new byte[4];
            SumOfBytes[0] = SumOfBytes[1] = SumOfBytes[2] = SumOfBytes[3] = 0;
            for (int i = 0; i < 10000; i++)
            {
                UnionBytes = BitConverter.GetBytes(i);
                SumOfBytes[0] += UnionBytes[0];
                SumOfBytes[1] += UnionBytes[1];
                SumOfBytes[2] += UnionBytes[2];
                SumOfBytes[3] += UnionBytes[3];
            }
            return SumOfBytes[0] + SumOfBytes[1] + SumOfBytes[2] + SumOfBytes[3];
        }

        static int testUnion()
        {
            byte[] UnionBytes;
            byte[] SumOfBytes = new byte[4];
            SumOfBytes[0] = SumOfBytes[1] = SumOfBytes[2] = SumOfBytes[3] = 0;
            FooUnion union = new FooUnion();
            for (int i = 0; i < 10000; i++)
            {
                union.integer = i;
                UnionBytes = new byte[] { union.byte0, union.byte1, union.byte2, union.byte3 };
                SumOfBytes[0] += UnionBytes[0];
                SumOfBytes[1] += UnionBytes[1];
                SumOfBytes[2] += UnionBytes[2];
                SumOfBytes[3] += UnionBytes[3];
            }
            return SumOfBytes[0] + SumOfBytes[1] + SumOfBytes[2] + SumOfBytes[3];
        }
    }
}

Fast way to swap bytes in array from big endian to little endian in C#

I'm reading from a binary stream which is big-endian. The BitConverter class does this automatically. Unfortunately, the floating point conversion I need is not the same as BitConverter.ToSingle(byte[]) so I have my own routine from a co-worker. But the input byte[] needs to be in little-endian. Does anyone have a fast way to convert endianness of a byte[] array. Sure, I could swap each byte but there has got to be a trick. Thanks.
Here is a fast method for changing the endianness of singles in a byte array:
public static unsafe void SwapSingles(byte[] data)
{
    int cnt = data.Length / 4;
    fixed (byte* d = data)
    {
        byte* p = d;
        while (cnt-- > 0)
        {
            byte a = *p;
            p++;
            byte b = *p;
            *p = *(p + 1);
            p++;
            *p = b;
            p++;
            *(p - 3) = *p;
            *p = a;
            p++;
        }
    }
}
I use LINQ:
var bytes = new byte[] {0, 0, 0, 1};
var littleEndianBytes = bytes.Reverse().ToArray();
Single x = BitConverter.ToSingle(littleEndianBytes, 0);
You can also .Skip() and .Take() to your heart's content, or else use an index in the BitConverter methods.
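Note that Reverse() over the whole array only works when it contains a single value; for a buffer holding many singles you would reverse each 4-byte group instead, e.g. this managed alternative to the pointer-based version above:

```csharp
static void SwapSinglesManaged(byte[] data)
{
    // Reverse each 4-byte single in place.
    for (int i = 0; i + 4 <= data.Length; i += 4)
        Array.Reverse(data, i, 4);
}
```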
What does the routine from your co-worker look like? If it accesses the bytes explicitly, you could change the code (or rather, create a separate method for big-endian data) instead of reversing the bytes.
