I'm trying to create a 16-bit PCM version of NAudio's MixingWaveProvider32 that operates on 16-bit PCM samples instead of 32-bit floats.
Each 16-bit stereo sample is packed in a byte array like so...
Byte 0: Channel 1 (Left) Lo
Byte 1: Channel 1 (Left) Hi
Byte 2: Channel 2 (Right) Lo
Byte 3: Channel 2 (Right) Hi
The two bytes per channel are interpreted as a signed 16-bit integer, so the minimum value is short.MinValue and the maximum is short.MaxValue. I don't think you can simply add the byte values to each other.
I've written some very long-winded code (see below), but I am convinced there is a more performant way of doing this.
I'd be really grateful for any help :-)
static void Main(string[] args)
{
    // setup some input data
    byte[] b1 = { 0x1, 0x0, 0x2, 0x0, 0x3, 0x0, 0x4, 0x0 };
    byte[] b2 = new byte[b1.Length];
    Array.Copy(b1, b2, b1.Length);
    byte[] result = new byte[b1.Length];

    Console.WriteLine("b1");
    b1.DumpPcm(); // DumpPcm is my debug-print extension method (not shown)
    Console.WriteLine();
    Console.WriteLine("b2");
    b2.DumpPcm();

    for (int i = 0; i < b1.Length; i += 4)
    {
        short l1 = BitConverter.ToInt16(b1, i);
        short r1 = BitConverter.ToInt16(b1, i + 2);
        short l2 = BitConverter.ToInt16(b2, i);
        short r2 = BitConverter.ToInt16(b2, i + 2);

        // l1 + l2 is an int, so GetBytes returns 4 bytes;
        // only the low two are copied, so an overflowing sum wraps silently
        byte[] resl = BitConverter.GetBytes(l1 + l2);
        byte[] resr = BitConverter.GetBytes(r1 + r2);
        result[i] = resl[0];
        result[i + 1] = resl[1];
        result[i + 2] = resr[0];
        result[i + 3] = resr[1];
    }

    Console.WriteLine();
    Console.WriteLine("Result...");
    result.DumpPcm();
    Console.ReadLine();
}
You could always use unsafe code; this should be significantly faster since it saves a bunch of method calls and object allocations:
// setup some input data (requires an unsafe context / compiling with /unsafe)
byte[] b1 = { 0x1, 0x0, 0x2, 0x0, 0x3, 0x0, 0x4, 0x0 };
byte[] b2 = new byte[b1.Length];
Array.Copy(b1, b2, b1.Length);
byte[] result = new byte[b1.Length];

fixed (byte* b1Ptr = b1)
fixed (byte* b2Ptr = b2)
fixed (byte* rPtr = result)
{
    // reinterpret the byte arrays as arrays of 16-bit samples
    var s1Ptr = (short*)b1Ptr;
    var s2Ptr = (short*)b2Ptr;
    var srPtr = (short*)rPtr;
    var length = b1.Length / 2;
    for (int i = 0; i < length; i++)
    {
        var v = s1Ptr[i] + s2Ptr[i];
        srPtr[i] = (short)v;
        Console.WriteLine($"{s1Ptr[i]} + {s2Ptr[i]} -> {srPtr[i]}");
    }
}
Note that summing values might cause overflow. You should probably either average the two samples, or clamp the result to avoid this.
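For example, a minimal clamping helper (my sketch, not from the original answer); inside the loop above you would write srPtr[i] = MixClamped(s1Ptr[i], s2Ptr[i]) instead of the plain cast:

static short MixClamped(short a, short b)
{
    int v = a + b; // sum in 32 bits so it cannot overflow
    if (v > short.MaxValue) return short.MaxValue;
    if (v < short.MinValue) return short.MinValue;
    return (short)v;
}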
I am uploading frames from a camera to a texture on the GPU for processing (using SharpDX). My issue at the moment is that the frames come in as 24-bit RGB, but DX11 no longer has a 24-bit RGB texture format, only 32-bit RGBA. After every 3 bytes I need to insert another byte with the value 255 (no transparency). I've tried iterating through the byte array to add it (below), but it's too expensive. Using GDI bitmaps to convert is also very expensive.
int count = 0;
for (int i = 0; i < frameDataBGRA.Length - 3; i += 4)
{
    frameDataBGRA[i] = frameData[i - count];
    frameDataBGRA[i + 1] = frameData[(i + 1) - count];
    frameDataBGRA[i + 2] = frameData[(i + 2) - count];
    frameDataBGRA[i + 3] = 255;
    count++;
}
Assuming you can compile with unsafe, using pointers in this case will give you a significant boost.
First, create two structs to hold the data in a packed way:
[StructLayout(LayoutKind.Sequential)]
public struct RGBA
{
    public byte r;
    public byte g;
    public byte b;
    public byte a;
}

[StructLayout(LayoutKind.Sequential)]
public struct RGB
{
    public byte r;
    public byte g;
    public byte b;
}
First version:
static void Process_Pointer_PerChannel(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    fixed (byte* rgbaPtr = &rgbaData[0])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < pixelCount; i++)
        {
            rgba->r = rgb->r;
            rgba->g = rgb->g;
            rgba->b = rgb->b;
            rgba->a = 255;
            rgb++;
            rgba++;
        }
    }
}
This avoids a lot of indexing, and passes data directly.
Another version, slightly faster, which copies the whole pixel at once through a struct cast:
static void Process_Pointer_Cast(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    fixed (byte* rgbaPtr = &rgbaData[0])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < pixelCount; i++)
        {
            RGB* cp = (RGB*)rgba;
            *cp = *rgb; // copy r, g, b in one struct assignment
            rgba->a = 255;
            rgb++;
            rgba++;
        }
    }
}
One small extra optimization (which is marginal): if you keep the same array all the time and reuse it, you can initialize it once with alpha set to 255, e.g.:
static void InitRGBA_Alpha(int pixelCount, byte[] rgbaData)
{
    for (int i = 0; i < pixelCount; i++)
    {
        rgbaData[i * 4 + 3] = 255;
    }
}
Then, since you never change this channel, the other functions no longer need to write into it:
static void Process_Pointer_Cast_NoAlpha(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbPtr = &rgbData[0])
    fixed (byte* rgbaPtr = &rgbaData[0])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < pixelCount; i++)
        {
            RGB* cp = (RGB*)rgba;
            *cp = *rgb;
            rgb++;
            rgba++;
        }
    }
}
In my test (a 1920×1080 image, 100 iterations; i7, x64 release build, average running time):
Your version: 6.81ms
Process_Pointer_PerChannel: 4.3ms
Process_Pointer_Cast: 3.8ms
Process_Pointer_Cast_NoAlpha: 3.5ms
Please note that, of course, all of these functions can easily be chunked and the parts run in multi-threaded versions.
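For instance, a rough sketch of the chunked multi-threaded variant (my addition, not from the original answer; assumes using System.Threading.Tasks and the RGB/RGBA structs above):

static void Process_Parallel(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    int workers = Environment.ProcessorCount; // assumption: one chunk per core
    int chunk = (pixelCount + workers - 1) / workers;
    Parallel.For(0, workers, w =>
    {
        int start = w * chunk;
        int count = Math.Min(chunk, pixelCount - start);
        if (count > 0) ProcessChunk(rgbData, rgbaData, start, count);
    });
}

static unsafe void ProcessChunk(byte[] rgbData, byte[] rgbaData, int start, int count)
{
    // each worker converts its own contiguous pixel range
    fixed (byte* rgbPtr = &rgbData[start * 3])
    fixed (byte* rgbaPtr = &rgbaData[start * 4])
    {
        RGB* rgb = (RGB*)rgbPtr;
        RGBA* rgba = (RGBA*)rgbaPtr;
        for (int i = 0; i < count; i++)
        {
            RGB* cp = (RGB*)rgba;
            *cp = *rgb;
            rgba->a = 255;
            rgb++;
            rgba++;
        }
    }
}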
If you need even higher performance, you have two options (a bit out of scope for the question):
upload your image into a byte address buffer (as rgb) and perform the conversion to texture in a compute shader. That involves some bit shifting and a bit of fiddling with formats, but is reasonably straightforward to achieve.
Generally camera images come in a Yuv format (with u and v downsampled), so it's much faster to upload the image in that color space and perform the conversion to rgba in either a pixel shader or a compute shader. If your camera SDK allows you to get pixel data in that native format, that's the way to go.
#catflier: good work, but it can go a little faster. ;-)
Reproduced times on my hardware:
Base version: 5.48ms
Process_Pointer_PerChannel: 2.84ms
Process_Pointer_Cast: 2.16ms
Process_Pointer_Cast_NoAlpha: 1.60ms
My experiments:
FastConvert: 1.45ms
FastConvert4: 1.13ms (requires the pixel count to be divisible by 4, which is usually not a problem)
Things that have improved speed:
the RGB struct forces three single-byte reads per pixel; it is faster to read a whole uint (4 bytes) and simply ignore the last byte
the alpha value can then be ORed directly into the uint
modern processors can often address a fixed pointer plus an offset faster than a pointer that is itself incremented
in x64 mode the offset variables should use a 64-bit type directly (long instead of int), which reduces the overhead of the accesses
partially unrolling the inner loop gains some performance again
The Code:
static void FastConvert(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    fixed (byte* rgbP = &rgbData[0], rgbaP = &rgbaData[0])
    {
        for (long i = 0, offsetRgb = 0; i < pixelCount; i++, offsetRgb += 3)
        {
            // caution: the uint read at the last pixel touches one byte past
            // the end of rgbData; pad the source buffer by one byte to be safe
            ((uint*)rgbaP)[i] = *(uint*)(rgbP + offsetRgb) | 0xff000000;
        }
    }
}
static void FastConvert4Loop(long pixelCount, byte* rgbP, byte* rgbaP)
{
    for (long i = 0, offsetRgb = 0; i < pixelCount; i += 4, offsetRgb += 12)
    {
        // four unrolled pixel reads (same one-byte over-read caveat on the last pixel)
        uint c1 = *(uint*)(rgbP + offsetRgb);
        uint c2 = *(uint*)(rgbP + offsetRgb + 3);
        uint c3 = *(uint*)(rgbP + offsetRgb + 6);
        uint c4 = *(uint*)(rgbP + offsetRgb + 9);
        ((uint*)rgbaP)[i] = c1 | 0xff000000;
        ((uint*)rgbaP)[i + 1] = c2 | 0xff000000;
        ((uint*)rgbaP)[i + 2] = c3 | 0xff000000;
        ((uint*)rgbaP)[i + 3] = c4 | 0xff000000;
    }
}
static void FastConvert4(int pixelCount, byte[] rgbData, byte[] rgbaData)
{
    if ((pixelCount & 3) != 0) throw new ArgumentException("pixelCount must be divisible by 4");
    fixed (byte* rgbP = &rgbData[0], rgbaP = &rgbaData[0])
    {
        FastConvert4Loop(pixelCount, rgbP, rgbaP);
    }
}
I have an array of audio data: a lot of Int32 numbers represented as an array of bytes (each 4-byte group represents one Int32), and I want to do some manipulation on the data (for example, add 10 to each Int32).
I convert the bytes to Int32, do the manipulation, and convert back to bytes, as in this example:
//byte[] buffer;
for (int i = 0; i < buffer.Length; i += 4)
{
    Int32 temp0 = BitConverter.ToInt32(buffer, i);
    temp0 += 10;
    byte[] temp1 = BitConverter.GetBytes(temp0);
    for (int j = 0; j < 4; j++)
    {
        buffer[i + j] = temp1[j];
    }
}
But I would like to know if there is a better way to do such manipulation.
You can check the .NET Reference Source for pointers (grin) on how to convert from/to big endian.
class intFromBigEndianByteArray
{
    public byte[] b;
    public int this[int i]
    {
        get
        {
            i <<= 2; // i *= 4; // optional
            return (int)b[i] << 24 | (int)b[i + 1] << 16 | (int)b[i + 2] << 8 | b[i + 3];
        }
        set
        {
            i <<= 2; // i *= 4; // optional
            b[i] = (byte)(value >> 24);
            b[i + 1] = (byte)(value >> 16);
            b[i + 2] = (byte)(value >> 8);
            b[i + 3] = (byte)value;
        }
    }
}
and sample use:
byte[] buffer = { 127, 255, 255, 255, 255, 255, 255, 255 }; // big endian { int.MaxValue, -1 }
//bool check = BitConverter.IsLittleEndian; // true
//int test = BitConverter.ToInt32(buffer, 0); // -129 (incorrect because little endian)
var fakeIntBuffer = new intFromBigEndianByteArray() { b = buffer };
fakeIntBuffer[0] += 2; // { 128, 0, 0, 1 } = big endian int.MinValue + 1
fakeIntBuffer[1] += 2; // { 0, 0, 0, 1 } = big endian 1
Debug.Print(string.Join(", ", buffer)); // "128, 0, 0, 1, 0, 0, 0, 1"
For better performance you can look into parallel processing and SIMD instructions - Using SSE in C#
For even better performance, you can look into Utilizing the GPU with c#
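As a rough illustration of the SIMD route (my sketch, using System.Numerics.Vector&lt;T&gt; rather than raw SSE intrinsics; assumes using System.Numerics, that the buffer holds little-endian Int32 values, and that its length is a multiple of 4):

static void AddToAll(byte[] buffer, int valueToAdd)
{
    int[] ints = new int[buffer.Length / 4];
    Buffer.BlockCopy(buffer, 0, ints, 0, ints.Length * 4); // bytes -> ints
    var add = new Vector<int>(valueToAdd);
    int step = Vector<int>.Count;
    int i = 0;
    for (; i <= ints.Length - step; i += step)
    {
        var v = new Vector<int>(ints, i); // load a lane-width block
        (v + add).CopyTo(ints, i);        // add to all lanes at once
    }
    for (; i < ints.Length; i++) ints[i] += valueToAdd; // scalar tail
    Buffer.BlockCopy(ints, 0, buffer, 0, ints.Length * 4); // ints -> bytes
}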
How about the following approach:
struct My
{
    public int Int;
}

var bytes = Enumerable.Range(0, 20).Select(n => (byte)(n + 240)).ToArray();
foreach (var b in bytes) Console.Write("{0,-4}", b);

// Pin the managed memory
GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
for (int i = 0; i < bytes.Length; i += 4)
{
    // Copy the data out (the generic overload already returns My, no cast needed)
    My my = Marshal.PtrToStructure<My>(handle.AddrOfPinnedObject() + i);
    my.Int += 10;
    // Copy back
    Marshal.StructureToPtr(my, handle.AddrOfPinnedObject() + i, true);
}
// Unpin
handle.Free();

foreach (var b in bytes) Console.Write("{0,-4}", b);
I made it just for fun; I'm not sure it's less ugly.
Will it be faster? I don't know. Test it.
I am currently using BitConverter to package two unsigned shorts inside a signed int. This code executes millions of times for different values and I am thinking the code could be optimized further. Here is what I am currently doing -- you can assume the code is C#/NET.
// convert one signed int into two unsigned shorts:
int xy = 343423;
byte[] bytes = BitConverter.GetBytes(xy);
ushort m_X = BitConverter.ToUInt16(bytes, 0);
ushort m_Y = BitConverter.ToUInt16(bytes, 2);

// convert two unsigned shorts into one signed int:
byte[] xBytes = BitConverter.GetBytes(m_X);
byte[] yBytes = BitConverter.GetBytes(m_Y);
byte[] bytes = new byte[] {
    xBytes[0],
    xBytes[1],
    yBytes[0],
    yBytes[1],
};
return BitConverter.ToInt32(bytes, 0);
So it occurs to me that I can avoid the overhead of constructing arrays if I bitshift. But for the life of me I can't figure out what the correct shift operation is. My first pathetic attempt involved the following code:
int xy = 343423;
const int mask = 0x00000000;
byte b1, b2, b3, b4;
b1 = (byte)((xy >> 24));
b2 = (byte)((xy >> 16));
b3 = (byte)((xy >> 8) & mask);
b4 = (byte)(xy & mask);
ushort m_X = (ushort)((xy << b4) | (xy << b3));
ushort m_Y = (ushort)((xy << b2) | (xy << b1));
Could someone help me? I am thinking I need to mask the upper and lower bytes before shifting. Some of the examples I see include subtraction with type.MaxValue or an arbitrary number, like negative twelve, which is pretty confusing.
Update
Thank you for the great answers. Here are the results of a benchmark test:
// 34ms for bit shift with 10M operations
// 959ms for BitConverter with 10M operations
static void Main(string[] args)
{
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    for (int i = 0; i < 10000000; i++)
    {
        ushort x = (ushort)i;
        ushort y = (ushort)(i >> 16);
        int result = (y << 16) | x;
    }
    stopWatch.Stop();
    Console.WriteLine((int)stopWatch.Elapsed.TotalMilliseconds + "ms");

    // note: Start() resumes the running total here; Restart() would time
    // the BitConverter loop on its own
    stopWatch.Start();
    for (int i = 0; i < 10000000; i++)
    {
        byte[] bytes = BitConverter.GetBytes(i);
        ushort x = BitConverter.ToUInt16(bytes, 0);
        ushort y = BitConverter.ToUInt16(bytes, 2);
        byte[] xBytes = BitConverter.GetBytes(x);
        byte[] yBytes = BitConverter.GetBytes(y);
        bytes = new byte[] {
            xBytes[0],
            xBytes[1],
            yBytes[0],
            yBytes[1],
        };
        int result = BitConverter.ToInt32(bytes, 0);
    }
    stopWatch.Stop();
    Console.WriteLine((int)stopWatch.Elapsed.TotalMilliseconds + "ms");
    Console.ReadKey();
}
The simplest way is to do it with a cast and a shift:
int xy = -123456;
// Split...
ushort m_X = (ushort) xy;
ushort m_Y = (ushort)(xy>>16);
// Convert back...
int back = (m_Y << 16) | m_X;
Demo on ideone: link.
int xy = 343423;
ushort low = (ushort)(xy & 0x0000ffff);
ushort high = (ushort)((xy & 0xffff0000) >> 16);
int xxyy = low + (((int)high) << 16);
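Either way, a quick round-trip check (my sketch) shows the packing is lossless, including for negative values:

int xy = -123456;
ushort low = (ushort)xy;
ushort high = (ushort)(xy >> 16);
int roundTrip = (high << 16) | low;
System.Diagnostics.Debug.Assert(roundTrip == xy);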
How do I format a number so that it can be converted to a Byte[] (array)?
I know how to convert the result value to byte[], but the problem is how to build the intermediate number.
This is my data:
int packet = 10;
int value = 20;
long data = 02; // this will take 3 bytes [the last 3 bytes]
I need the long value; with it I can shift and fill the byte array like this:
Byte[0] = 10;
Byte[1] = 20;
Byte[2] = 00;
Byte[3] = 00;
Byte[4] = 02;
Bytes 2, 3 and 4 are the data.
But forming this intermediate value is the problem. Here is a sample:
long data = 683990319104; // I referred to this as the intermediate value.
This is the number I receive from the built-in application.
Byte[] by = new byte[5];
for (int i = 0; i < 5; i++)
{
    by[i] = (byte)(data & 0xFF);
    data >>= 8;
}
Here
Byte[0] = 00;
Byte[1] = 00; // bytes 0, 1 and 2 are data (i.e. data = 0)
Byte[2] = 00;
Byte[3] = 65; // byte 3 is value (i.e. value = 65)
Byte[4] = 159; // byte 4 is packet (i.e. packet = 159)
This is one sample.
Currently BitConverter.GetBytes(..) is used on the receiving side; while sending, the parameter is a long.
So I want to generate the number 683990319104 from
packet = 159
value = 65
data = 0
I think it's in an understandable format now :)
Not entirely sure what you are asking exactly, but I think you are looking for BitConverter.GetBytes(data).
The use of 3 bytes to define the long seems unusual; if it is just the 3 bytes... why a long? why not an int?
For example (note I've had to make assumptions about your byte-trimming based on your example - you won't have the full int/long range...):
static void Main() {
    int packet = 10;
    int value = 20;
    long data = 02;
    byte[] buffer = new byte[5];
    WritePacket(buffer, 0, packet, value, data);
    for (int i = 0; i < buffer.Length; i++)
    {
        Console.Write(buffer[i].ToString("x2"));
    }
    Console.WriteLine(); // prints 0a14000002
    ReadPacket(buffer, 0, out packet, out value, out data);
    Console.WriteLine(packet);
    Console.WriteLine(value);
    Console.WriteLine(data);
}
static void WritePacket(byte[] buffer, int offset, int packet,
    int value, long data)
{
    // note I'm trimming as per the original question
    buffer[offset++] = (byte)packet;
    buffer[offset++] = (byte)value;
    int tmp = (int)(data); // more efficient to work with int, and
                           // we're going to lose the MSB anyway...
    buffer[offset++] = (byte)(tmp >> 16);
    buffer[offset++] = (byte)(tmp >> 8);
    buffer[offset] = (byte)(tmp);
}
static void ReadPacket(byte[] buffer, int offset, out int packet,
    out int value, out long data)
{
    // note I'm trimming as per the original question
    packet = buffer[offset++];
    value = buffer[offset++];
    data = ((int)buffer[offset++] << 16)
         | ((int)buffer[offset++] << 8)
         | (int)buffer[offset];
}
Oooh, it's simple:
int packet = 159;
int value = 65;
int data = 0;
long address = packet;
address = address << 8;
address = address | value;
address = address << 24;
address = address | data;
Now the value of address is 683990319104, and this is what I termed the intermediate value. Let me know the exact term for it.
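The same packing can be written in one expression (my sketch; the layout puts packet in bits 32-39, value in bits 24-31 and data in bits 0-23):

long address = ((long)packet << 32) | ((long)value << 24) | (uint)data;
// 159 << 32 = 682899800064; 65 << 24 = 1090519040; sum = 683990319104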
Assumption: converting a byte[] from little endian to big endian means inverting the order of the bits in each byte of the byte[].
Assuming this is correct, I tried the following to understand this:
byte[] data = new byte[] { 1, 2, 3, 4, 5, 15, 24 };
byte[] inverted = ToBig(data);

var little = new BitArray(data);
var big = new BitArray(inverted);

int i = 1;
foreach (bool b in little)
{
    Console.Write(b ? "1" : "0");
    if (i == 8)
    {
        i = 0;
        Console.Write(" ");
    }
    i++;
}
Console.WriteLine();

i = 1;
foreach (bool b in big)
{
    Console.Write(b ? "1" : "0");
    if (i == 8)
    {
        i = 0;
        Console.Write(" ");
    }
    i++;
}
Console.WriteLine();

Console.WriteLine(BitConverter.ToString(data));
Console.WriteLine(BitConverter.ToString(ToBig(data)));

foreach (byte b in data)
{
    Console.Write("{0} ", b);
}
Console.WriteLine();
foreach (byte b in inverted)
{
    Console.Write("{0} ", b);
}
The convert method:
private static byte[] ToBig(byte[] data)
{
    for (int i = 0; i < data.Length; i++)
    {
        var bits = new BitArray(new byte[] { data[i] });
        var invertedBits = new BitArray(bits.Count);
        int x = 0;
        for (int p = bits.Count - 1; p >= 0; p--)
        {
            invertedBits[x] = bits[p];
            x++;
        }
        // this writes back into data and returns it, so the caller's
        // array is modified in place (the bug the answers below point out)
        invertedBits.CopyTo(data, i);
    }
    return data;
}
The output of this little application is different from what I expected:
00000001 00000010 00000011 00000100 00000101 00001111 00011000
00000001 00000010 00000011 00000100 00000101 00001111 00011000
80-40-C0-20-A0-F0-18
01-02-03-04-05-0F-18
1 2 3 4 5 15 24
1 2 3 4 5 15 24
For some reason the data remains the same, unless printed using BitConverter.
What am I not understanding?
Update
New code produces the following output:
10000000 01000000 11000000 00100000 10100000 11110000 00011000
00000001 00000010 00000011 00000100 00000101 00001111 00011000
01-02-03-04-05-0F-18
80-40-C0-20-A0-F0-18
1 2 3 4 5 15 24
128 64 192 32 160 240 24
But as I have been told now, my method is incorrect anyway, because I should invert the bytes and not the bits?
The hardware developer I'm working with told me to invert the bits because he cannot read the data.
Context where I'm using this
The application that will use this does not really work with numbers.
I'm supposed to save a stream of bits to file where
1 = white and 0 = black.
They represent pixels of a bitmap 256x64.
byte 0 to byte 31 represents the first row of pixels
byte 32 to byte 63 the second row of pixels.
I have code that outputs these bits... but the developer is telling
me they are in the wrong order... He says the bytes are fine but the bits are not.
So I'm left confused :p
No. Endianness refers to the order of bytes, not bits. Big endian systems store the most-significant byte first and little-endian systems store the least-significant first. The bits within a byte remain in the same order.
Your ToBig() function is returning the original data rather than the bit-swapped data, it seems.
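A quick illustration (my addition): on a little-endian machine BitConverter emits the least-significant byte first, while each byte's own bit pattern is untouched:

byte[] le = BitConverter.GetBytes(0x01020304);
Console.WriteLine(BitConverter.ToString(le)); // 04-03-02-01 on little-endian systems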
Your method may be correct at this point. There are different meanings of endianness, and it depends on the hardware.
Typically, it's used for converting between computing platforms. Most CPU vendors (now) use the same bit ordering, but different byte ordering, for different chipsets. This means that, if you are passing a 2-byte int from one system to another, you leave the bits alone but swap bytes 1 and 2, i.e.:
int somenumber -> byte[2]: somenumber[high], somenumber[low]
-> byte[2]: somenumber[low], somenumber[high] -> int newNumber
However, this isn't always true. Some hardware still uses inverted BIT ordering, so what you have may be correct. You'll need to either trust your hardware dev. or look into it further.
I recommend reading up on this on Wikipedia - always a great source of info:
http://en.wikipedia.org/wiki/Endianness
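For the common two-byte word, the swap described above boils down to (my sketch):

ushort value = 0x1234;
ushort swapped = (ushort)((value >> 8) | (value << 8)); // 0x3412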
Your ToBig method has a bug.
At the end:
invertedBits.CopyTo(data, i);
}
return data;
You need to change that to:
byte[] newData = new byte[data.Length];
invertedBits.CopyTo(newData, i);
}
return newData;
You're overwriting your input data, so both references end up pointing at the same inverted array. The problem is that arrays are reference types, so the method can modify the original data.
As greyfade already said, endianness is not about bit ordering.
The reason that your code doesn't do what you expect, is that the ToBig method changes the array that you send to it. That means that after calling the method the array is inverted, and data and inverted are just two references pointing to the same array.
Here's a corrected version of the method.
private static byte[] ToBig(byte[] data) {
    byte[] result = new byte[data.Length];
    for (int i = 0; i < data.Length; i++) {
        var bits = new BitArray(new byte[] { data[i] });
        var invertedBits = new BitArray(bits.Count);
        int x = 0;
        for (int p = bits.Count - 1; p >= 0; p--) {
            invertedBits[x] = bits[p];
            x++;
        }
        invertedBits.CopyTo(result, i);
    }
    return result;
}
Edit:
Here's a method that changes endianness for a byte array:
static byte[] ConvertEndianness(byte[] data, int wordSize) {
    if (data.Length % wordSize != 0) throw new ArgumentException("The data length does not divide into an even number of words.");
    byte[] result = new byte[data.Length];
    int offset = wordSize - 1;
    for (int i = 0; i < data.Length; i++) {
        result[i + offset] = data[i];
        offset -= 2;
        if (offset < -wordSize) {
            offset += wordSize * 2;
        }
    }
    return result;
}
Example:
byte[] data = { 1,2,3,4,5,6 };
byte[] inverted = ConvertEndianness(data, 2);
Console.WriteLine(BitConverter.ToString(inverted));
Output:
02-01-04-03-06-05
The second parameter is the word size. As endianness is the ordering of bytes in a word, you have to specify how large the words are.
Edit 2:
Here is a more efficient method for reversing the bits:
static byte[] ReverseBits(byte[] data) {
    byte[] result = new byte[data.Length];
    for (int i = 0; i < data.Length; i++) {
        int b = data[i];
        int r = 0;
        for (int j = 0; j < 8; j++) {
            // shift the result left and pull the next lowest bit of b into it
            r <<= 1;
            r |= b & 1;
            b >>= 1;
        }
        result[i] = (byte)r;
    }
    return result;
}
One big problem I see is that ToBig changes the contents of the data[] array passed to it.
You're calling ToBig on an array named data, then assigning the result to inverted, but since you didn't create a new array inside ToBig, you modified both arrays; you then treat data and inverted as different arrays when in reality they are not.