Converting C++ Pointer Math to C# - c#

I'm currently working on a project that requires converting some C++ code to a C# environment. For the most part, it's actually pretty straightforward, but I'm currently converting some lower-level memory manipulation functions and running into some uncertainty as to how to proceed.
In the C++ code, I've got a lot of instances of things like this (obviously quite simplified):
void SomeFunc(unsigned char* myMemoryBlock)
{
AnotherFunc(myMemoryBlock);
AnotherFunc(myMemoryBlock + memoryOffset);
}
void AnotherFunc(unsigned char* data)
{
// Also simplified - basically, modifying the
// byte pointed to by data and then increasing to the next item.
*data = 2;
data++;
*data = 5;
data++;
// And so on...
}
I'm thinking that in C#, I've basically got to treat the "unsigned char*" as a byte array (byte[]). But to perform a similar operation to the pointer arithmetic, is that essentially just increasing a "currentIndex" value for accessing the byte array? For something like AnotherFunc, I guess that means I'd also need to pass in a starting index, if the starting index isn't 0?
Just want to confirm this is how it should be done in C#, or if there's a better way to make that sort of conversion. Also, I can't use the "unsafe" keyword in my current environment, so actually using pointers is not possible!

The two functions treat myMemoryBlock as if it were an array. You could replace the single myMemoryBlock parameter with a pair, myArray and an offset, like this:
void SomeFunc(byte[] myArray)
{
    AnotherFunc(myArray, 0);
    AnotherFunc(myArray, memoryOffset);
}
void AnotherFunc(byte[] data, int offset)
{
    // Also simplified - basically, modifying the
    // byte at data[offset] and then advancing to the next item.
    data[offset++] = 2;
    data[offset++] = 5;
    // And so on...
}
Note: C++ type unsigned char is often used as a stand-in for "untyped block of memory" (as opposed to "a block of memory representing character data"). That is why the code above uses an array of byte: a C# char is a 16-bit UTF-16 code unit, and an integer literal cannot be assigned to it without a cast. If the pointer really does point to character data, char[] (with casts such as (char)2) may be the more appropriate choice.

Just like @dasblinkenlight said, the C# (and Java) way to deal with arbitrary pointers to memory data blocks (which are usually byte or char arrays) is to add an additional offset parameter to the methods that access the data blocks.
It is also common to add a third length parameter. Thus the general form for a method Foo() that is passed a block of memory is:
// Operate on 'block', starting at index 'offset',
// for 'length' elements
//
int Foo(byte[] block, int offset, int length)
{ ... }
You see this all over the place in the C# library. Another form that is common for methods that operate on two memory blocks (e.g., copying one block to another, or comparing one block to another, etc.) is this:
// Operate on blocks 'src' starting at index 'srcOff',
// and on block 'dst' starting at 'dstOff',
// for a total of 'length' elements
//
int Bar(byte[] src, int srcOff, byte[] dst, int dstOff, int length)
{ ... }
For methods that expect to operate on an entire memory block (array), these generally look like this:
// Overloaded version of Foo() that
// operates on the entire array 'block'
//
int Foo(byte[] block)
{
return Foo(block, 0, block.Length);
}
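As a minimal sketch of the (block, offset, length) convention described above, here is one way such a method and its whole-array overload could look. The class name BlockOps and the method Fill are hypothetical names chosen for illustration, not library APIs; the argument checks stand in for the safety that raw pointer arithmetic lacks.

```csharp
using System;

public static class BlockOps
{
    // Writes 'value' into 'length' elements of 'block', starting at index 'offset'.
    // Returns the index just past the last element written, which plays the role
    // of the advanced pointer in the C++ version.
    public static int Fill(byte[] block, int offset, int length, byte value)
    {
        if (block == null) throw new ArgumentNullException(nameof(block));
        if (offset < 0 || length < 0 || offset > block.Length - length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        for (int i = 0; i < length; i++)
        {
            block[offset + i] = value;
        }
        return offset + length;
    }

    // Overload that operates on the entire array.
    public static int Fill(byte[] block, byte value)
    {
        if (block == null) throw new ArgumentNullException(nameof(block));
        return Fill(block, 0, block.Length, value);
    }
}
```

Returning the next index lets callers chain calls the same way the C++ code keeps incrementing its pointer.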

C# keeps pointers out of ordinary (safe) code precisely to prevent pointer arithmetic, or rather the errors that pointer arithmetic is vulnerable to.
Generally, any C++ memory block referred to by a pointer and a memory offset is indeed best translated as an array in C# (which is also why C# arrays start at [0]). However, you should keep the array the same type as the underlying data. Note that a C++ unsigned char is one byte, whereas a C# char is a two-byte UTF-16 code unit, so byte[] is the like-for-like choice for raw memory. If the block actually holds character data, use char[] instead, look at what the overall use of the function is, and consider switching to a string.

Related

Difference between Marshal.SizeOf and sizeof, I just don't get it

Until now I have just taken for granted that Marshal.SizeOf is the right way to compute the memory size of a blittable struct on the unmanaged heap (which seems to be the consensus here on SO and almost everywhere else on the web).
But after having read some cautions against Marshal.SizeOf (this article after "But there's a problem...") I tried it out and now I am completely confused:
public struct TestStruct
{
public char x;
public char y;
}
class Program
{
public static unsafe void Main(string[] args)
{
TestStruct s;
s.x = (char)0xABCD;
s.y = (char)0x1234;
// this results in size 4 (two Unicode characters)
Console.WriteLine(sizeof(TestStruct));
TestStruct* ps = &s;
// shows how the struct is seen from the managed side... okay!
Console.WriteLine((int)s.x);
Console.WriteLine((int)s.y);
// shows the same as before (meaning that -> is based on
// the same memory layout as in the managed case?)... okay!
Console.WriteLine((int)ps->x);
Console.WriteLine((int)ps->y);
// let's try the same on the unmanaged heap
int marshalSize = Marshal.SizeOf(typeof(TestStruct));
// this results in size 2 (two single byte characters)
Console.WriteLine(marshalSize);
TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize);
// hmmm, put two 16-bit numbers into only 2 allocated
// bytes, this must surely fail...
ps2->x = (char)0xABCD;
ps2->y = (char)0x1234;
// huh??? same result as before, storing two 16bit values in
// only two bytes??? next will be a perpetuum mobile...
// at least I'd expect an access violation
Console.WriteLine((int)ps2->x);
Console.WriteLine((int)ps2->y);
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
What's going wrong here? What memory layout does the field dereferencing operator '->' assume? Is '->' even the right operator for addressing unmanaged structs? Or is Marshal.SizeOf the wrong size operator for unmanaged structs?
I have found nothing that explains this in a language I understand. Except for "...struct layout is undiscoverable..." and "...in most cases..." wishy-washy kind of stuff.
The difference is: the sizeof operator takes a type name and tells you how many bytes of managed memory need to be allocated for an instance of that struct. This is not necessarily stack memory; structs are allocated off the heap when they are array elements, fields of a class, and so on. By contrast, Marshal.SizeOf takes either a type object or an instance of the type, and tells you how many bytes of unmanaged memory need to be allocated. These can be different for a variety of reasons. The name of the type gives you a clue: Marshal.SizeOf is intended to be used when marshaling a structure to unmanaged memory.
Another difference between the two is that the sizeof operator can only take the name of an unmanaged type; that is, a struct type whose fields are only integral types, Booleans, pointers and so on. (See the specification for an exact definition.) Marshal.SizeOf by contrast can take any class or struct type.
I think the one question you still don't have answered is what's going on in your particular situation:
&ps2->x
0x02ca4370 <------
*&ps2->x: 0xabcd 'ꯍ'
&ps2->y
0x02ca4372 <-------
*&ps2->y: 0x1234 'ሴ'
You are writing to and reading from (possibly) unallocated memory. Because of the memory area you're in, it's not detected.
This will reproduce the expected behavior (at least on my system, YMMV):
TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize*10000);
// hmmm, put two 16-bit numbers into only 2 allocated
// bytes, this must surely fail...
for (int i = 0; i < 10000; i++)
{
ps2->x = (char)0xABCD;
ps2->y = (char)0x1234;
ps2++;
}
What memory layout does the field dereferencing operator '->' assume?
Whatever the CLI decides
Is '->' even the right operator for addressing unmanaged structs?
That is an ambiguous concept. There are structs in unmanaged memory accessed via the CLI: these follow CLI rules. And there are structs that are merely notional monikers for unmanaged code (perhaps C/C++) accessing the same memory. This follows the rules of that framework. Marshalling usually refers to P/Invoke, but that isn't necessarily applicable here.
Or is Marshal.SizeOf the wrong size operator for unmanaged structs?
I'd default to Unsafe.SizeOf<T>, which is essentially sizeof(T): perfectly well-defined for the CLI/IL (including padding rules etc.), but not something C# lets you write for an arbitrary generic T.
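To make the size mismatch concrete, here is a small sketch that compares the two measurements for a struct shaped like the question's TestStruct and then allocates the in-memory size rather than the marshaled size. TwoChars and SizeDemo are hypothetical names for illustration. The sketch assumes a runtime where System.Runtime.CompilerServices.Unsafe is available in-box (for example .NET 5 or later) and relies on the default ANSI marshaling of char fields.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Same shape as the question's TestStruct: two 16-bit chars.
public struct TwoChars
{
    public char x;
    public char y;
}

public static class SizeDemo
{
    // Size of the struct as the runtime lays it out in memory.
    public static int InMemorySize() => Unsafe.SizeOf<TwoChars>();

    // Size of the struct after marshaling; with default settings each
    // char is marshaled as a single ANSI byte.
    public static int MarshaledSize() => Marshal.SizeOf<TwoChars>();

    // Allocates enough unmanaged memory for the in-memory layout and
    // writes/reads the second 16-bit field at byte offset 2.
    public static short RoundTripSecondField(short value)
    {
        IntPtr block = Marshal.AllocHGlobal(InMemorySize());
        try
        {
            Marshal.WriteInt16(block, 2, value);
            return Marshal.ReadInt16(block, 2);
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

If the pointer is going to be dereferenced as a TestStruct* from C#, the allocation has to cover the in-memory size; the marshaled size only matters once the struct is actually converted by the marshaler.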
A char marshals, by default, to an ANSI byte. This allows interoperability with most C libraries and is fundamental to the operation of the .NET runtime.
I believe the correct solution is to change TestStruct to:
public struct TestStruct
{
[System.Runtime.InteropServices.MarshalAs(UnmanagedType.U2)]
public char x;
[System.Runtime.InteropServices.MarshalAs(UnmanagedType.U2)]
public char y;
}
UnmanagedType.U2 means an unsigned 'integer' 2 bytes long, which makes it equivalent to the wchar_t type in a C header on platforms where wchar_t is 16 bits wide (such as Windows).
Seamless porting of C structures to .NET is possible with attention to detail and opens many doors for interop with native libraries.
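The effect of the attribute can be checked with a short sketch: with U2 on both fields, the marshaled size becomes 4 and a round trip through unmanaged memory preserves both 16-bit values. WideChars and MarshalDemo are hypothetical names used only for this illustration.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct WideChars
{
    [MarshalAs(UnmanagedType.U2)]
    public char x;
    [MarshalAs(UnmanagedType.U2)]
    public char y;
}

public static class MarshalDemo
{
    // Marshals 'value' into unmanaged memory sized by Marshal.SizeOf,
    // then marshals it back into a new managed struct.
    public static WideChars RoundTrip(WideChars value)
    {
        IntPtr block = Marshal.AllocHGlobal(Marshal.SizeOf<WideChars>());
        try
        {
            Marshal.StructureToPtr(value, block, false);
            return Marshal.PtrToStructure<WideChars>(block);
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

Here the size reported by Marshal.SizeOf and the data written by the marshaler agree, which is the situation Marshal.SizeOf is designed for.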

Why does stackalloc have to be used as a variable initializer?

I'm writing some unsafe code in C# (follow-up to this question) and I'm wondering, why exactly does the stackalloc keyword have to be used as a variable initializer? e.g. This will produce a syntax error:
public unsafe class UnsafeStream
{
byte* buffer;
public UnsafeStream(int capacity)
{
this.buffer = stackalloc byte[capacity]; // "Invalid expression term 'stackalloc' / ; expected / } expected"
}
}
But re-assigning the results from a local temporary will not:
public UnsafeStream(int capacity)
{
byte* buffer = stackalloc byte[capacity];
this.buffer = buffer;
}
Why isn't the first version allowed, and what evil things will happen if I attempt the second version?
Your stack is looking something very roughly like this:
[stuff from earlier calls][stuff about where this came from][this][capacity]
^You are here
Then you do stackalloc and this adds two things to the stack, the pointer and the array pointed to:
[stuff from earlier calls][stuff about where this came from][this][capacity][buffer][array pointed to by buffer]
^You are here
And then when you return, the stuff most recently put on the stack (the locals of the current function, its return address, and the stackalloc'ed buffer) is all simply ignored (which is one of the advantages of stackalloc: ignoring stuff is fast and easy):
[stuff from earlier calls][stuff about where this came from][this][capacity][buffer][array pointed to by buffer]
^You are here
It can be overwritten by the next method call:
[stuff from earlier calls][stuff about where this came from][this][new local1][new local2]o by buffer]
^You are here
What you are proposing is that a private field, which is to say a part of an object on the heap (a different piece of memory, managed differently), hold a pointer to the buffer that has been half-overwritten by completely different data, of different types.
Immediate consequences would be:
Attempts to use buffer are now fraught because half of it is overwritten by items, most of which aren't even bytes.
Attempts to use any locals are now fraught, because future changes to buffer can overwrite them with random bytes in random places.
And that's just considering the single thread involved here, never mind other threads with separate stacks perhaps being able to access that field.
It's also just not very useful. You can coerce a field to hold an address to somewhere on a stack with enough effort, but there isn't that much good one can do with it.
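A minimal sketch of the two safe patterns implied by the explanation above: keep a stackalloc buffer strictly local to the call that creates it, and use an ordinary heap array when the buffer must live in a field. It assumes C# 7.2 or later, where stackalloc can initialize a Span<T> without unsafe code. StackBufferDemo and HeapBackedStream are hypothetical names, and the size cap of 256 is an arbitrary illustrative limit.

```csharp
using System;

public static class StackBufferDemo
{
    // The stack buffer is created, used, and abandoned inside a single call,
    // so nothing can refer to it after the stack frame is gone.
    public static int SumOfSquares(int count)
    {
        // Keep stack allocations small; 256 is an arbitrary illustrative cap.
        if (count < 0 || count > 256)
            throw new ArgumentOutOfRangeException(nameof(count));

        Span<int> scratch = stackalloc int[count];
        for (int i = 0; i < scratch.Length; i++)
        {
            scratch[i] = i * i;
        }

        int sum = 0;
        foreach (int value in scratch)
        {
            sum += value;
        }
        return sum;
    }
}

public sealed class HeapBackedStream
{
    // A buffer that must outlive the constructor call belongs on the heap.
    private readonly byte[] buffer;

    public HeapBackedStream(int capacity)
    {
        buffer = new byte[capacity];
    }

    public int Capacity => buffer.Length;
}
```

Because Span<T> is a ref struct, the compiler refuses to store scratch in a class field, which rules out the dangling-pointer scenario at compile time.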

Fast array copy in C#

I have a C# class that contains an int[] array (and a couple of other fields, but the array is the main thing). The code often creates copies of this class and profiling shows that the Array.Copy() call to copy this array takes a lot of time. What can I do to make it faster?
The array size is very small and constant: 12 elements. So ideally I'd like something like a C-style array: a single block of memory that's inside the class itself (not a pointer). Is this possible in C#? (I can use unsafe code if needed.)
I've already tried:
1) Using a UInt64 and bit-shifting instead of the array. (The values of each element are also very small.) This does make the copy fast, but slows down the program overall.
2) Using separate fields for each array element: int element0, int element1, int element2, etc. Again, this is slower overall when I have to access the element at a given index.
I would check out System.Buffer.BlockCopy if you are really concerned about speed.
http://msdn.microsoft.com/en-us/library/system.buffer.blockcopy.aspx
Simple Example:
int[] a = new int[] {1,2,3,4,5,6,7,8};
int[] b = new int[a.Length];
int size = sizeof(int);
int length = a.Length * size;
System.Buffer.BlockCopy(a, 0, b, 0, length);
Great discussion on it over here: Array.Copy vs Buffer.BlockCopy
This post is old, but anyone in a similar situation as the OP should have a look at fixed size buffers in structs. They are exactly what OP was asking for: an array of primitive types with a constant size stored directly in the class.
You can create a struct to represent your collection, which will contain the fixed size buffer. The data will be stored directly within the struct, which will be stored directly within your class. You can copy through simple assignment.
They come with a few caveats:
They can only be used with primitive types.
They require the "unsafe" keyword on your struct.
Size must be known at compile time.
It used to be that you had to use the fixed keyword and pointers to access them, but recent changes to C# catering to performance programming have made that unnecessary. You can now work with them just like arrays.
public unsafe struct MyIntContainer
{
private fixed int myIntegers[12];
public int this[int index]
{
get => this.myIntegers[index];
set => this.myIntegers[index] = value;
}
}
There is no built-in bound checking, so it would be best for you to include that yourself on such a property, encapsulating any functionality which skips bound checks inside of a method. I am on mobile, or I would have worked that into my example.
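The bounds check mentioned above could be added along these lines. This is only a sketch: CheckedIntContainer is a hypothetical name, the code must be compiled with unsafe code enabled (AllowUnsafeBlocks), and it assumes C# 7.3 or later so the fixed buffer can be indexed without a fixed statement.

```csharp
using System;

public unsafe struct CheckedIntContainer
{
    public const int Length = 12;

    // The 12 ints are stored inline in the struct, not in a separate array object.
    private fixed int items[Length];

    public int this[int index]
    {
        get
        {
            // The cast to uint rejects negative and too-large indexes in one comparison.
            if ((uint)index >= Length) throw new IndexOutOfRangeException();
            return items[index];
        }
        set
        {
            if ((uint)index >= Length) throw new IndexOutOfRangeException();
            items[index] = value;
        }
    }
}
```

Copying is then plain assignment, for example `CheckedIntContainer copy = original;`, which duplicates all twelve values without calling any copy routine explicitly.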
You asked about managed arrays. If you are content to use fixed / unsafe, this can be very fast.
A struct is assignable, like any primitive. Assignment is almost certainly faster than Buffer.BlockCopy() or any other method, due to the lack of method call overhead:
public unsafe struct MyStruct //the actual struct used, contains all
{
public int a;
public unsafe fixed byte buffer[16];
public ulong b;
//etc.
}
public unsafe struct FixedSizeBufferWrapper //contains _only_ the buffer
{
public unsafe fixed byte buffer[16];
}
unsafe
{
fixed (byte* bufferA = myStructA.buffer, bufferB = myStructB.buffer)
{
*((FixedSizeBufferWrapper*)bufferA) =
*((FixedSizeBufferWrapper*)bufferB);
}
}
We cast fixed-size byte buffers from each of your original structs to the wrapper pointer type and dereference each pointer SO THAT we can assign one to the other by value; assigning fixed buffers directly is not possible, hence the wrapper, which is basically zero overhead (it just affects values used in pointer arithmetic that is done anyway). That wrapper is only ever used for casting.
We have to cast because (at least in my version of C#) we cannot assign anything other than a primitive type (usually byte[]) as the buffer, and we aren't allowed to cast inside fixed(...).
EDIT: This appears to get translated into a call to Buffer.Memcpy() (specifically Buffer.memcpy4() in my case, in Unity / Mono) under the hood to do the copy.

Sending a 2D int array between C# and C++

I'm trying to create a solution where I can run a 2D int array within a C# program through CUDA, so the approach I'm currently taking to try and do this is by creating a C++ dll which can handle the CUDA code then return the 2D array. The code I'm using to send my array to the dll and back again is below.
#include "CudaDLL.h"
#include <stdexcept>
int** cudaArrayData;
void CudaDLL::InitialiseArray(int arrayRows, int arrayCols, int** arrayData)
{
cudaArrayData = new int*[arrayCols];
for(int i = 0; i < arrayCols; i++)
{
cudaArrayData[i] = new int[arrayRows];
}
cudaArrayData = arrayData;
}
int** CudaDLL::ReturnArray()
{
return cudaArrayData;
}
The problem however is I get an error in C# on the return, "Cannot marshal 'return value': Invalid managed/unmanaged type combination." My hope was if I returned the array back as a pointer C# might have hopefully understood and accepted it however no such luck.
Any ideas?
As you’re using int[,] in C#, int** is not the right corresponding type in C++. int[][] is an array of arrays of ints, similar to int** in C++; whereas int[,] is one array of ints with 2D indexing: index = x + y * width. Using int** in C#/C++ interop is difficult, as you have many pointers to either managed or unmanaged memory, which is not directly accessible from one to another (see further down).
Already in InitialiseArray(..., int** arrayData) you are reading from arbitrary memory rather than from the array values, because you are not passing an array of pointers to arrays of ints; you are passing one single array of ints.
When you return int** in ReturnArray(), your problem is that .NET has no clue how to interpret that pointer to a pointer.
To fix this, use only int* on the C++ side and don’t return the array as a function return value; that would only give you a pointer to the unmanaged array, not the data itself in managed memory. It is possible to use such a pointer from C#, but probably not in the way you intend. Instead, pass an array allocated in C# as a function argument, as in void ReturnArray(int* retValues), and copy the data into it.
The other problem is then data copying and memory allocations. You can avoid all of these steps if you handle memory in C# right, i.e. forbid the garbage collector to move your data around (it does that when cleaning up unused objects). Either use a fixed{} statement or do it manually via GCHandle.Alloc(array, GCHandleType.Pinned). Doing so, you can directly use the C# allocated arrays from within C++.
Finally, if all you need is to let a CUDA kernel run on your C# data, have a look at the CUDA wrappers that handle all the P/Invoke hazards for you, such as managedCuda (I maintain this project), cudafy, and some others.
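The layout and pinning points from this answer can be sketched in managed code alone. In the sketch below, Flatten shows that an int[,] is one contiguous row-major block (index = column + row * width), and ReadThroughPinnedPointer pins an array and reads it through its raw address, which stands in for a native function receiving an int*. InteropLayoutDemo and both method names are hypothetical; no real DLL is called.

```csharp
using System;
using System.Runtime.InteropServices;

public static class InteropLayoutDemo
{
    // Copies a rectangular array into a flat one. Because int[,] is already a
    // single contiguous block, one byte-wise copy is enough.
    public static int[] Flatten(int[,] grid)
    {
        int[] flat = new int[grid.Length];
        Buffer.BlockCopy(grid, 0, flat, 0, flat.Length * sizeof(int));
        return flat;
    }

    // Pins 'data' so the garbage collector cannot move it, then reads element
    // 'index' through the raw address, as native code holding an int* would.
    public static int ReadThroughPinnedPointer(int[] data, int index)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadInt32(address, index * sizeof(int));
        }
        finally
        {
            // Always release the pin, or the array stays immovable.
            handle.Free();
        }
    }
}
```

In a real interop call, the IntPtr from AddrOfPinnedObject (or a pointer from a fixed statement) would be what is handed to the C++ function while the pin is held.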

How do I get a byte* for the bits in a BitArray?

I am working on a C++/CLI wrapper of a C API. One function in the C API expected data in the form:
void setData(byte* dataPtr, int offset, int length);
void getData(byte* buffer, int offset, int length);
For the C++/CLI wrapper, it was suggested that we use a System.Collections.BitArray (yes, the individual bits have meaning). A BitArray can be constructed from an array of bytes and copied to one:
array<System::Byte>^ bytes = gcnew array<System::Byte>(40);
System::Collections::BitArray^ ba = gcnew System::Collections::BitArray(bytes);
int length = ((ba->Length - 1)/8) +1;
array<System::Byte>^ newBytes = gcnew array<System::Byte>(length);
ba->CopyTo(newBytes, 0);
pin_ptr<unsigned char> rawDataPtr = &buffer[0];
My concern is the last line. Is it valid to get a pointer from the array in this way? Is there a better alternative in C# for working with arbitrary bits? Remember the individual bits have meaning.
Is it valid to get a pointer from the array in this way?
Yes, that's valid. The pin_ptr<> helper class calls GCHandle.Alloc() under the hood, asking for GCHandleType.Pinned. So the pointer is stable and can be passed to unmanaged code without fear that the garbage collector is going to move the array and make the pointer invalid.
A very important detail is missing from the question, however. The reason pin_ptr<> exists, instead of just letting you use GCHandle directly, is to control exactly when GCHandle.Free() is called. You don't call it explicitly; pin_ptr<> does it for you using the standard C++ RAII pattern. In other words, Free() is called automatically when the variable goes out of scope: the C++ compiler emits the destructor call, which in turn calls Free().
This will go very, very wrong when the C function stores the passed dataPtr and uses it later. "Later" is the problem: the array won't be pinned anymore and can now exist at an arbitrary address. The result is major data corruption that is very hard to diagnose. The getData() function strongly suggests that this is in fact the case. This is not good.
You will need to fix this. Using GCHandle::Alloc() yourself to pin the array permanently is very painful for the garbage collector: it is a rock in the road that won't budge and has a long-lasting effect on the efficiency of the program. Instead, you should copy the managed array to stable memory that you allocate with, say, malloc() or Marshal::AllocHGlobal(). That's unmanaged memory; it will never move. Marshal::Copy() is a simple way to copy it.
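The copy-to-stable-memory approach can be sketched with the same Marshal APIs from C#. StableBufferDemo and its methods are hypothetical names for illustration; the returned IntPtr is what would be passed to the C API's setData, and the caller is responsible for releasing it with Marshal.FreeHGlobal once the native side no longer uses it.

```csharp
using System;
using System.Collections;
using System.Runtime.InteropServices;

public static class StableBufferDemo
{
    // Packs the bits into bytes, rounding the byte count up.
    public static byte[] ToBytes(BitArray bits)
    {
        byte[] bytes = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(bytes, 0);
        return bytes;
    }

    // Copies the managed bytes into unmanaged memory, which the garbage
    // collector never moves. The caller must free the returned block.
    public static IntPtr CopyToUnmanaged(byte[] source)
    {
        IntPtr block = Marshal.AllocHGlobal(source.Length);
        Marshal.Copy(source, 0, block, source.Length);
        return block;
    }

    // Copies 'length' bytes back out of unmanaged memory into a new managed array.
    public static byte[] CopyFromUnmanaged(IntPtr block, int length)
    {
        byte[] result = new byte[length];
        Marshal.Copy(block, result, 0, length);
        return result;
    }
}
```

The cost is one extra copy in each direction, in exchange for a pointer that stays valid for as long as the native code keeps it.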
