The idea: Being able to take the bytes of any struct, send those bytes across a TcpClient (or through my Client wrapper), then have the receiving client load those bytes and use pointers to "paint" them onto a new struct.
The problem: It reads the bytes into the buffer perfectly; it reads the array of bytes on the other end perfectly. The "paint" operation, however, fails miserably. I write a new Vector3(1F, 2F, 3F); I read a Vector3(0F, 0F, 0F)...Obviously, not ideal.
Unfortunately, I don't see the bug: if it works one way, it should work in reverse, and the values are being filled in.
The write/read functions are as follows:
public static unsafe void Write<T>(Client client, T value) where T : struct
{
int n = System.Runtime.InteropServices.Marshal.SizeOf(value);
byte[] buffer = new byte[n];
{
var handle = System.Runtime.InteropServices.GCHandle.Alloc(value, System.Runtime.InteropServices.GCHandleType.Pinned);
void* ptr = handle.AddrOfPinnedObject().ToPointer();
byte* bptr = (byte*)ptr;
for (int t = 0; t < n; ++t)
{
buffer[t] = *(bptr + t);
}
handle.Free();
}
client.Writer.Write(buffer);
}
public static unsafe T Read<T>(Client client) where T : struct
{
T r = new T();
int n = System.Runtime.InteropServices.Marshal.SizeOf(r);
{
byte[] buffer = client.Reader.ReadBytes(n);
var handle = System.Runtime.InteropServices.GCHandle.Alloc(r, System.Runtime.InteropServices.GCHandleType.Pinned);
void* ptr = handle.AddrOfPinnedObject().ToPointer();
byte* bptr = (byte*)ptr;
for (int t = 0; t < n; ++t)
{
*(bptr + t) = buffer[t];
}
handle.Free();
}
return r;
}
Help, please, thanks.
Edit:
Well, one major problem is that I'm getting a handle to a temporary copy created when I passed in the struct value.
Edit2:
Changing "T r = new T();" to "object r = new T();" and "return r" to "return (T)r" boxes and unboxes the struct and, in the meantime, makes it a reference, so the pointer actually points to it.
However, it is slow. I'm getting 13,500 - 14,500 write/reads per second.
Edit3:
OTOH, serializing/deserializing the Vector3 through a BinaryFormatter gets about 750 writes/reads per second. So the pointer approach is a lot faster than what I was using. :)
Edit4:
Sending the floats individually got 8,400 RW/second. Suddenly, I feel much better about this. :)
Edit5:
Tested GCHandle allocation pinning and freeing; 28,000,000 ops per second (Compared to about 1,000,000,000 Int32/int add+assign/second. So compared to integers, it's 35 times slower. However, that's still comparatively fast enough). Note that you don't appear to be able to pin classes, even if GCHandle does auto-boxed structs fine (GCHandle accepts values of type "object").
Now, if the C# guys update constraints to the point where the pointer allocation recognizes that "T" is a struct, I could just assign directly to a pointer, which is...Yep, incredibly fast.
Next up: Probably testing write/read using separate threads. :) See how much the GCHandle actually affects the send/receive delay.
As it turns out:
Edit6:
double start = Timer.Elapsed.TotalSeconds;
for (t = 0; t < count; ++t)
{
Vector3 from = new Vector3(1F, 2F, 3F);
// Vector3* ptr = &test;
// Vector3* ptr2 = &from;
int n = sizeof(Vector3);
if (n / 4 * 4 != n)
{
// This gets 9,000,000 ops/second;
byte* bptr1 = (byte*)&test;
byte* bptr2 = (byte*)&from;
// int n = 12;
for (int t2 = 0; t2 < n; ++t2)
{
*(bptr1 + t2) = *(bptr2 + t2);
}
}
else
{
// This speedup gets 24,000,000 ops/second.
int n2 = n / 4;
int* iptr1 = (int*)&test;
int* iptr2 = (int*)&from;
// int n = 12;
for (int t2 = 0; t2 < n2; ++t2)
{
*(iptr1 + t2) = *(iptr2 + t2);
}
}
}
So, overall, I don't think the GCHandle is really slowing things down. (Those who are thinking this is a slow way of assigning one Vector3 to another, remember that the purpose is to serialize structs into a byte[] buffer to send over a network. And, while that's not what we're doing here, it would be rather easy to do with the first method).
Edit7:
The following got 6,900,000 ops/second:
for (t = 0; t < count; ++t)
{
Vector3 from = new Vector3(1F, 2F, 3F);
int n = sizeof(Vector3);
byte* bptr2 = (byte*)&from;
byte[] buffer = new byte[n];
for (int t2 = 0; t2 < n; ++t2)
{
buffer[t2] = *(bptr2 + t2);
}
}
...Help! I've got IntruigingPuzzlitus! :D
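For later readers: the constraint wished for in Edit5 exists as `where T : unmanaged` since C# 7.3. Below is a minimal sketch of the struct-to-bytes round trip built on it, assuming a runtime that provides `MemoryMarshal` and `Unsafe` (.NET Core 2.1 or later, or the matching NuGet packages). `Vec3` and `StructBytes` are hypothetical names used only for illustration; they stand in for the Vector3 and the Write/Read pair above.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the Vector3 used in the question.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class StructBytes
{
    // Copies the raw bytes of an unmanaged struct into a new byte[].
    // No boxing and no GCHandle: the array is viewed as a span of T and assigned to.
    public static byte[] ToBytes<T>(T value) where T : unmanaged
    {
        byte[] buffer = new byte[Unsafe.SizeOf<T>()];
        MemoryMarshal.Cast<byte, T>(buffer.AsSpan())[0] = value;
        return buffer;
    }

    // Rebuilds the struct from the first sizeof(T) bytes of the buffer.
    // MemoryMarshal.Read throws if the buffer is too short.
    public static T FromBytes<T>(byte[] buffer) where T : unmanaged
    {
        return MemoryMarshal.Read<T>(new ReadOnlySpan<byte>(buffer));
    }
}
```

The resulting array can be handed to client.Writer.Write(buffer) and read back with client.Reader.ReadBytes(n) as in the original functions. This only works when both ends agree on the struct layout and byte order.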
I have float array inside C++ function.
C++ function
void bleplugin_GetGolfResult(float* result)
{
float *array = new float[20];
for(int i=0; i < 20; i++)
array[i]= 25;
result = array;
//DEBUG PRINTING1
for(int i=0; i < 20; i++)
cout << result[i] << endl;//Here is correct
return;
}
Inside C#
[DllImport ("__Internal")]
private static unsafe extern void bleplugin_GetGolfResult (float* result);
public static unsafe float[] result = new float[20];
public unsafe static void GetGolfREsult(){
fixed (float* ptr_result = result) //or equivalently "... = &f2[0]" address of f2[0]
{
bleplugin_GetGolfResult( ptr_result );
//DEBUG PRINTING2
for(int i = 0; i < 20; i++)
Debug.Log("result data " + ptr_result[i]);
}
return;
}
I called GetGolfREsult() from another function to get result.
//DEBUG PRINTING1 has correct output.
But //DEBUG PRINTING2 produced 0 only.
What could be wrong?
As UnholySheep and nvoigt already stated,
result = array;
overwrites the local copy of the passed pointer, so the function loses its only reference to the caller's array.
Directly writing to your parameter should solve this.
result[i] = 25;
Furthermore, you don't actually have to use pointers in C#.
You can do the following instead.
Declare your import like this:
[DllImport ("__Internal")]
private static extern void bleplugin_GetGolfResult (float[] arr);
Then you can call it like this:
float[] arr = new float[20];
bleplugin_GetGolfResult(arr);
This line in your C++ code:
float *array = new float[20];
creates a new array, which you operate on in C++. Then control returns to C#, which has its own array, and that one is still unchanged. Why don't you write to the array you got?
The problem is that you use the assignment operator on the parameter result which prevents the data from being transferred to the C# array on return.
Using the following C++ example:
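#include <iostream>
using namespace std;
void z(int * x)
{
    // Reassigns only the local copy of the pointer (and leaks the new int).
    x = new int(4);
}
int main()
{
    int * x = new int(-2);
    z(x);
    cout<<*x<<endl;
    delete x;
}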
The output for this is -2 not 4 because you use the assignment operator on the parameter.
I've been trying to get rid of the unmanaged code that I'm currently using on my source code after a friend suggested why I shouldn't be using unmanaged code. However I seem to keep facing a couple of issues here and there. For most case scenarios I used Buffer.BlockCopy() as it seemed to be the most adequate method but there are still a few in which I'm not sure what to use.
Basically this is used to handle packets that are sent between a server and a client. WriteInt16, for example, is a function used to write ushort values into the byte array. (Example 1 does the same thing at offsets 20 and 22.)
I'll leave a couple examples below:
1.
fixed (byte* p = Data)
{
*((ushort*)(p + 20)) = X;//x is an ushort
*((ushort*)(p + 22)) = Y;//y is an ushort
}
2.
private byte* Ptr
{
get
{
fixed (byte* p = PData)
return p;
}
}
3.
public unsafe void WriteInt16(ushort val)
{
try
{
*((ushort*)(Ptr + Count)) = val;
Count += 2;
}
catch{}
}
Assuming a little-endian platform and byte arrays (not tested):
*((ushort*)(p + 20)) = X;
is
Data[20] = (byte)X;
Data[21] = (byte)(X >> 8);
and
*((ushort*)(Ptr + Count)) = val;
is
PData[Count] = (byte)val;
PData[Count + 1] = (byte)(val >> 8);
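If the project can target .NET Core 2.1 or later (or reference the System.Memory package), the same two-byte write can be sketched without `unsafe` using `BinaryPrimitives`, which also makes the byte order explicit instead of depending on the platform. `PacketWriter` and its members are hypothetical names for illustration, loosely modelled on the PData/Count fields in the question.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical wrapper around a packet buffer.
public sealed class PacketWriter
{
    private readonly byte[] _data;

    public PacketWriter(int capacity) { _data = new byte[capacity]; }

    public byte[] Data { get { return _data; } }
    public int Count { get; private set; }

    // Appends a ushort in little-endian order and advances the write position.
    // AsSpan throws ArgumentOutOfRangeException if the two bytes do not fit,
    // so there is no silent overrun to hide behind an empty catch block.
    public void WriteUInt16(ushort val)
    {
        BinaryPrimitives.WriteUInt16LittleEndian(_data.AsSpan(Count, 2), val);
        Count += 2;
    }

    // Writes a ushort at a fixed offset without moving the write position.
    public void WriteUInt16At(int offset, ushort val)
    {
        BinaryPrimitives.WriteUInt16LittleEndian(_data.AsSpan(offset, 2), val);
    }
}
```

With this, example 1 becomes two WriteUInt16At calls at offsets 20 and 22, and example 3 becomes WriteUInt16(val).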
I am using C# and CUDAfy.net (yes, this problem is easier in straight C with pointers, but I have my reasons for using this approach given the larger system).
I have a video frame grabber card that is collecting byte[1024 x 1024] image data at 30 FPS. Every 33.3 ms it fills a slot in a circular buffer and returns a System.IntPtr that points to that un-managed 1D vector of *byte; The Circular buffer has 15 slots.
On the GPU device (Tesla K40) I want to have a global 2D array that is organized as a dense 2D array. That is, I want something like the Circular Queue but on the GPU organized as a dense 2D array.
byte[15, 1024*1024] rawdata;
// if CUDAfy.NET supported jagged arrays I could use byte[15][1024*1024] but it does not
How can I fill in a different row each 33ms? Do I use something like:
gpu.CopyToDevice<byte>(inputPtr, 0, rawdata, offset, length) // length = 1024*1024
//offset is computed by rowID*(1024*1024) where rowID wraps to 0 via modulo 15.
// inputPtr is the System.IntPtr that points to the buffer in the circular queue (un-managed)?
// rawdata is a device buffer allocated gpu.Allocate<byte>(1024*1024);
And in my kernel header is:
[Cudafy]
public static void filter(GThread thread, byte[,] rawdata, int frameSize, byte[] result)
I did try something along these lines. But there is no API pattern in CudaFy for:
GPGPU.CopyToDevice(T) Method (IntPtr, Int32, T[,], Int32, Int32, Int32)
So I used the gpu.Cast Function to change the 2D device array to 1D.
I tried the code below, but I am getting CUDA.net exception: ErrorLaunchFailed
FYI: When I try the CUDA emulator, it aborts on the CopyToDevice, claiming that Data is not host allocated.
public static byte[] process(System.IntPtr data, int slot)
{
Stopwatch watch = new Stopwatch();
watch.Start();
byte[] output = new byte[FrameSize];
int offset = slot*FrameSize;
gpu.Lock();
byte[] rawdata = gpu.Cast<byte>(grawdata, FrameSize); // What is the size supposed to be? Documentation lacking
gpu.CopyToDevice<byte>(data, 0, rawdata, offset, FrameSize * frameCount);
byte[] goutput = gpu.Allocate<byte>(output);
gpu.Launch(height, width).filter(rawdata, FrameSize, goutput);
runTime = watch.Elapsed.ToString();
gpu.CopyFromDevice(goutput, output);
gpu.Free(goutput);
gpu.Synchronize();
gpu.Unlock();
watch.Stop();
totalRunTime = watch.Elapsed.ToString();
return output;
}
I propose this "solution", for now, either:
1. Run the program only in native mode (not in emulation mode).
or
2. Do not handle the pinned-memory allocation yourself.
There seems to be an open issue with that now. But this happens only in emulation mode.
see: https://cudafy.codeplex.com/workitem/636
If I understand your question properly, I think you are looking to convert the byte* you get from the cyclic buffer into a multi-dimensional byte array to be sent to the graphics card API.
int slots = 15;
int rows = 1024;
int columns = 1024;
//Try this
for (int currentSlot = 0; currentSlot < slots; currentSlot++)
{
IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);
// use Marshal.Copy ?
byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory);
int offset =0;
for (int m = 0; m < rows; m++)
for (int n = 0; n < columns; n++)
{
//then send this to your GPU method
rawForGpu[m, n] = ReadByteValue(IntPtr: intPtrToUnManagedMemory,
offset++);
}
}
//or try this
for (int currentSlot = 0; currentSlot < slots; currentSlot++)
{
IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);
// use Marshal.Copy ?
byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory);
byte[,] rawForGpu = ConvertTo2DArray(byteData, rows, columns);
}
}
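private static byte[,] ConvertTo2DArray(byte[] byteArr, int rows, int columns)
{
byte[,] data = new byte[rows, columns];
int totalElements = rows * columns;
// Convert 1D to 2D (rows, columns). Both arrays are row-major,
// so a raw block copy preserves the layout.
Buffer.BlockCopy(byteArr, 0, data, 0, totalElements);
return data;
}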
private static IntPtr CopyContextFrom(int slotNumber)
{
//code that return byte* from circular buffer.
return IntPtr.Zero;
}
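The "use Marshal.Copy ?" step above can be sketched on the host side as follows. This is only an illustration of staging one frame from unmanaged memory into one row of a dense 2D array; it does not touch the CUDAfy API, and `FrameStaging` and `CopyFrameToRow` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FrameStaging
{
    // Copies one frame from unmanaged memory into row 'slot' of a dense
    // [slots, frameSize] host array. The frame size is taken from the
    // second dimension of the array.
    public static void CopyFrameToRow(IntPtr frame, byte[,] ring, int slot)
    {
        int frameSize = ring.GetLength(1);

        // Marshal.Copy only targets 1D arrays, so go through a scratch buffer.
        byte[] scratch = new byte[frameSize];
        Marshal.Copy(frame, scratch, 0, frameSize);

        // byte[,] is stored row-major, so row 'slot' starts at slot * frameSize.
        Buffer.BlockCopy(scratch, 0, ring, slot * frameSize, frameSize);
    }
}
```

In a real capture loop the scratch buffer would be allocated once and reused, since allocating 1 MB per frame at 30 FPS puts needless pressure on the GC.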
You should consider using the GPGPU Async functionality that's built in for a really efficient way to move data from/to host/device and use the gpuKern.LaunchAsync(...)
Check out http://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU for an efficient way to use this. Another great example can be found in CudafyExamples project, look for PinnedAsyncIO.cs. Everything you need to do what you're describing.
This is in CudaGPU.cs in Cudafy.Host project, which matches the method you're looking for (only it's async):
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, DevicePtrEx devArray,
int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[, ,] devArray,
int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[,] devArray,
int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[] devArray,
int devOffset, int count, int streamId = 0) where T : struct;
Is there a method (in C#/.NET) that would left-shift (bitwise) each short in a short[] and be faster than doing it in a loop?
I am talking about data coming from a digital camera (16bit gray), the camera only uses the lower 12 bits. So to see something when rendering the data it needs to be shifted left by 4.
This is what I am doing so far:
byte[] RawData; // from camera along with the other info
if (pf == PixelFormats.Gray16)
{
fixed (byte* ptr = RawData)
{
short* wptr = (short*)ptr;
short temp;
for (int line = 0; line < ImageHeight; line++)
{
for (int pix = 0; pix < ImageWidth; pix++)
{
temp = *(wptr + (pix + line * ImageWidth));
*(wptr + (pix + line * ImageWidth)) = (short)(temp << 4);
}
}
}
}
Any ideas?
I don't know of a library method that will do it, but I have some suggestions that might help. This will only work if you know that the upper four bits of the pixel are definitely zero (rather than garbage). (If they are garbage, you'd have to add bitmasks to the below). Basically I would propose:
Using a shift operator on a larger data type (int or long) so that you are shifting more data at once
Getting rid of the multiply operations inside your loop
Doing a little loop unrolling
Here is my code:
using System.Diagnostics;
namespace ConsoleApplication9 {
class Program {
public static void Main() {
Crazy();
}
private static unsafe void Crazy() {
short[] RawData={
0x000, 0x111, 0x222, 0x333, 0x444, 0x555, 0x666, 0x777, 0x888,
0x999, 0xaaa, 0xbbb, 0xccc, 0xddd, 0xeee, 0xfff, 0x123, 0x456,
//extra sentinel value which is just here to demonstrate that the algorithm
//doesn't go too far
0xbad
};
const int ImageHeight=2;
const int ImageWidth=9;
var numShorts=ImageHeight*ImageWidth;
fixed(short* rawDataAsShortPtr=RawData) {
var nextLong=(long*)rawDataAsShortPtr;
//1 chunk of 4 longs
// ==8 ints
// ==16 shorts
while(numShorts>=16) {
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
*nextLong=*nextLong<<4;
nextLong++;
numShorts-=16;
}
var nextShort=(short*)nextLong;
while(numShorts>0) {
*nextShort=(short)(*nextShort<<4);
nextShort++;
numShorts--;
}
}
foreach(var item in RawData) {
Debug.Print("{0:X4}", item);
}
}
}
}
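For the case where the upper four bits may contain garbage, the bitmask mentioned above can be sketched like this. It is a variant of the same idea (four shorts per 64-bit operation) written without `unsafe`, assuming `MemoryMarshal` is available (.NET Core 2.1 or later, or the System.Memory package). `PixelShift.ShiftLeft4` is a hypothetical name, and I have not benchmarked it against the unrolled pointer loop.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PixelShift
{
    // Shifts every 16-bit sample left by 4 bits, four samples per 64-bit operation.
    public static void ShiftLeft4(short[] samples)
    {
        // View the short[] as ulong chunks without copying.
        Span<ulong> wide = MemoryMarshal.Cast<short, ulong>(samples.AsSpan());

        // After the shift, the low 4 bits of each 16-bit lane hold whatever was
        // in the top 4 bits of the neighbouring lane; the mask clears them.
        const ulong mask = 0xFFF0FFF0FFF0FFF0UL;
        for (int i = 0; i < wide.Length; i++)
        {
            wide[i] = (wide[i] << 4) & mask;
        }

        // Handle the 0 to 3 trailing samples that did not fill a whole ulong.
        for (int i = wide.Length * 4; i < samples.Length; i++)
        {
            samples[i] = (short)(samples[i] << 4);
        }
    }
}
```

Because the mask is applied per lane, the result does not depend on the upper four bits being zero or on the byte order of the machine.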
Is there a way of mapping data collected on a stream or array to a data structure or vice-versa?
In C++ this would simply be a matter of casting a pointer to the stream as a data type I want to use (or vice-versa for the reverse)
eg: in C++
Mystruct * pMyStrct = (Mystruct*)&SomeDataStream;
pMyStrct->Item1 = 25;
int iReadData = pMyStrct->Item2;
obviously the C++ way is pretty unsafe unless you are sure of the quality of the stream data when reading incoming data, but for outgoing data is super quick and easy.
Most people use .NET serialization (there is a faster binary formatter and a slower XML formatter; both depend on reflection and are version-tolerant to a certain degree).
However, if you want the fastest (unsafe) way - why not:
Writing:
YourStruct o = new YourStruct();
byte[] buffer = new byte[Marshal.SizeOf(typeof(YourStruct))];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
Marshal.StructureToPtr(o, handle.AddrOfPinnedObject(), false);
handle.Free();
Reading:
handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
o = (YourStruct)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(YourStruct));
handle.Free();
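The two snippets above can be wrapped into a pair of generic helpers. This is a minimal sketch, assuming .NET Framework 4.5.1 or later for the generic `Marshal.SizeOf<T>()` and `Marshal.PtrToStructure<T>()` overloads. `SampleMessage` and `StructMarshal` are hypothetical names for illustration. The `try`/`finally` ensures the pinned handle is released even if marshaling throws.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical message type; any struct with a fixed marshaled layout works.
[StructLayout(LayoutKind.Sequential)]
public struct SampleMessage
{
    public int Item1;
    public int Item2;
}

public static class StructMarshal
{
    // Marshals the struct into a freshly allocated byte[].
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        byte[] buffer = new byte[Marshal.SizeOf<T>()];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false);
        }
        finally
        {
            handle.Free();
        }
        return buffer;
    }

    // Marshals a struct back out of the start of the buffer.
    public static T FromBytes<T>(byte[] buffer) where T : struct
    {
        if (buffer.Length < Marshal.SizeOf<T>())
            throw new ArgumentException("Buffer is too small for the requested struct.");

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return Marshal.PtrToStructure<T>(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }
    }
}
```

Unlike the C++ cast, this copies the data rather than aliasing the stream, so changes to the struct are not reflected in the buffer until it is written back.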
In case lubos hasko's answer was not unsafe enough, there is also the really unsafe way, using pointers in C#. Here are some tips and pitfalls I've run into:
using System;
using System.Runtime.InteropServices;
using System.IO;
using System.Diagnostics;
// Use LayoutKind.Sequential to prevent the CLR from reordering your fields.
[StructLayout(LayoutKind.Sequential)]
unsafe struct MeshDesc
{
public byte NameLen;
// Here fixed means store the array by value, like in C,
// though C# exposes access to Name as a char*.
// fixed also requires 'unsafe' on the struct definition.
public fixed char Name[16];
// You can include other structs like in C as well.
public Matrix Transform;
public uint VertexCount;
// But not both, you can't store an array of structs.
//public fixed Vector Vertices[512];
}
[StructLayout(LayoutKind.Sequential)]
unsafe struct Matrix
{
public fixed float M[16];
}
// This is how you do unions
[StructLayout(LayoutKind.Explicit)]
unsafe struct Vector
{
[FieldOffset(0)]
public fixed float Items[16];
[FieldOffset(0)]
public float X;
[FieldOffset(4)]
public float Y;
[FieldOffset(8)]
public float Z;
}
class Program
{
unsafe static void Main(string[] args)
{
var mesh = new MeshDesc();
var buffer = new byte[Marshal.SizeOf(mesh)];
// Set where NameLen will be read from.
buffer[0] = 12;
// Use Buffer.BlockCopy to raw copy data across arrays of primitives.
// Note we copy to offset 2 here: char's have alignment of 2, so there is
// a padding byte after NameLen: just like in C.
Buffer.BlockCopy("Hello!".ToCharArray(), 0, buffer, 2, 12);
// Copy data to struct
Read(buffer, out mesh);
// Print the Name we wrote above:
var name = new char[mesh.NameLen];
// Use Marshal.Copy to copy between arrays and pointers to arrays.
unsafe { Marshal.Copy((IntPtr)mesh.Name, name, 0, mesh.NameLen); }
// Note you can also use the String.String(char*) overloads
Console.WriteLine("Name: " + new string(name));
// If Erik Myers likes it...
mesh.VertexCount = 4711;
// Copy data from struct:
// MeshDesc is a struct, and is on the stack, so its
// memory is effectively pinned by the stack pointer.
// This means '&' is sufficient to get a pointer.
Write(&mesh, buffer);
// Watch for alignment again, and note you have endianness to worry about...
int vc = buffer[100] | (buffer[101] << 8) | (buffer[102] << 16) | (buffer[103] << 24);
Console.WriteLine("VertexCount = " + vc);
}
unsafe static void Write(MeshDesc* pMesh, byte[] buffer)
{
// But byte[] is on the heap, and therefore needs
// to be flagged as pinned so the GC won't try to move it
// from under you - this can be done most efficiently with
// 'fixed', but can also be done with GCHandleType.Pinned.
fixed (byte* pBuffer = buffer)
*(MeshDesc*)pBuffer = *pMesh;
}
unsafe static void Read(byte[] buffer, out MeshDesc mesh)
{
fixed (byte* pBuffer = buffer)
mesh = *(MeshDesc*)pBuffer;
}
}
If it's .NET on both sides:
I think you should use binary serialization and send the byte[] result.
Trusting your struct to be fully blittable can cause trouble.
You will pay some overhead (both CPU and network), but it will be safe.
If you need to populate each member variable by hand you can generalize it a bit as far as the primitives are concerned by using FormatterServices to retrieve in order the list of variable types associated with an object. I've had to do this in a project where I had a lot of different message types coming off the stream and I definitely didn't want to write the serializer/deserializer for each message.
Here's the code I used to generalize the deserialization from a byte[].
public virtual bool SetMessageBytes(byte[] message)
{
MemberInfo[] members = FormatterServices.GetSerializableMembers(this.GetType());
object[] values = FormatterServices.GetObjectData(this, members);
int j = 0;
for (int i = 0; i < members.Length; i++)
{
string[] var = members[i].ToString().Split(new char[] { ' ' });
switch (var[0])
{
case "UInt32":
values[i] = (UInt32)((message[j] << 24) + (message[j + 1] << 16) + (message[j + 2] << 8) + message[j + 3]);
j += 4;
break;
case "UInt16":
values[i] = (UInt16)((message[j] << 8) + message[j + 1]);
j += 2;
break;
case "Byte":
values[i] = (byte)message[j++];
break;
case "UInt32[]":
if (values[i] != null)
{
int len = ((UInt32[])values[i]).Length;
byte[] b = new byte[len * 4];
Array.Copy(message, j, b, 0, len * 4);
Array.Copy(Utilities.ByteArrayToUInt32Array(b), (UInt32[])values[i], len);
j += len * 4;
}
break;
case "Byte[]":
if (values[i] != null)
{
int len = ((byte[])values[i]).Length;
Array.Copy(message, j, (byte[])(values[i]), 0, len);
j += len;
}
break;
default:
throw new Exception("ByteExtractable::SetMessageBytes Unsupported Type: " + var[1] + " is of type " + var[0]);
}
}
FormatterServices.PopulateObjectMembers(this, members, values);
return true;
}