What I am trying to do is write a C# application to generate pictures of fractals (Mandelbrot and Julia sets). I am using unmanaged C++ with CUDA to do the heavy lifting and C# for the user interface. When I try to run this code, I am not able to call the method I wrote in the DLL - I get an unhandled exception for an invalid parameter.
The C++ DLL is designed to return a pointer to the pixel data for a bitmap, which is used by the .NET Bitmap to create a bitmap and display it in a PictureBox control.
Here is the relevant code:
C++ (CUDA methods omitted for conciseness):
extern "C" __declspec(dllexport) int* generateBitmap(int width, int height)
{
    int *bmpData = (int*)malloc(3*width*height*sizeof(int));
    int *dev_bmp;
    gpuErrchk(cudaMalloc((void**)&dev_bmp, (3*width*height*sizeof(int))));
    kernel<<<BLOCKS_PER_GRID, THREADS_PER_BLOCK>>>(dev_bmp, width, height);
    gpuErrchk(cudaPeekAtLastError());
    gpuErrchk(cudaDeviceSynchronize());
    cudaFree(dev_bmp);
    return bmpData;
}
C#:
public class NativeMethods
{
    [DllImport(@"C:\...\FractalMaxUnmanaged.dll")]
    public static unsafe extern int* generateBitmap(int width, int height);
}
//...
private unsafe void mandlebrotButton_Click(object sender, EventArgs e)
{
    int* ptr = NativeMethods.generateBitmap(FractalBox1.Width, FractalBox1.Height);
    IntPtr iptr = new IntPtr(ptr);
    fractalBitmap = new Bitmap(
        FractalBox1.Width,
        FractalBox1.Height,
        3,
        System.Drawing.Imaging.PixelFormat.Format24bppRgb,
        iptr);
    FractalBox1.Image = fractalBitmap;
}
Error:
************** Exception Text **************
Managed Debugging Assistant 'PInvokeStackImbalance' has detected a problem in 'C:\...WindowsFormsApplication1.vshost.exe'.
I believe the problem I am having is with the IntPtr - is this the correct way to pass a pointer from unmanaged C++ to a C# application? Is passing a pointer the best way to accomplish what I am trying to do, or is there a better way to get the pixel data from unmanaged C++ with CUDA to C#?
EDIT:
From what I gather from the error I get when I debug the application, PInvokeStackImbalance implies that the signatures for the unmanaged and managed code don't match. However, they sure look like they match to me.
I feel like I'm missing something obvious here, any help or recommended reading would be appreciated.
You need to define the same calling convention in C and C#:
In C:
extern "C" __declspec(dllexport) int* __cdecl generateBitmap(int width, int height)
In C#:
[DllImport(@"C:\...\FractalMaxUnmanaged.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern IntPtr generateBitmap(int width, int height);
Instead of cdecl you can also use stdcall; it only needs to be the same on both sides.
Having worked with a lot of managed/unmanaged code myself, I also advise you to pass the image array as an argument and do the memory allocation in C#. That way you don't need to take care of manually freeing memory from the managed world.
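For illustration, a minimal sketch of that caller-allocated approach, assuming a hypothetical native export fillBitmap(unsigned char* data, int width, int height) that writes into a caller-supplied buffer instead of returning a pointer (the names and pixel layout are assumptions, not the original code):
public static class NativeMethods
{
    [DllImport(@"C:\...\FractalMaxUnmanaged.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void fillBitmap(byte[] data, int width, int height);
}

public static byte[] GetPixels(int width, int height)
{
    // The buffer is owned by the GC; the marshaller pins it for the duration
    // of the call, so nothing has to be freed manually on either side.
    byte[] pixels = new byte[3 * width * height]; // 24bpp RGB
    NativeMethods.fillBitmap(pixels, width, height);
    return pixels;
}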
Related
I have an application in C# that simulates multiple mathematical models written in C++ in parallel (each model in a thread). Each model has 4 methods: Init, Run, Free and Save, which are loaded dynamically: using LoadLibrary from kernel32 we load the DLL, and then using GetProcAddress we get the 4 main methods. The application has Play/Pause/Stop controls.
When we Play the simulation for the first time, there's no problem, but if we Stop and Play again, after a couple of seconds the program throws an AccessViolationException in the Run method of a RANDOM model. There's no pattern, which makes me believe that the problem lies in the P/Invoke operation. Look at the following example:
//The C++ Dll.
extern "C" __stdcall void Init(const char* name, int size, void* buffer, void** memory);
extern "C" __stdcall void Run(int in_size, int out_size, void* in_buffer, void* out_buffer, void** memory);
extern "C" __stdcall void Free(void** memory);
extern "C" __stdcall void Save(int size, void* buffer, void** memory);
// C# delegate signatures, obtained after loading the DLL and calling Marshal.GetDelegateForFunctionPointer
ModelInit(string name, int size, IntPtr buffer, ref IntPtr memory);
ModelRun(int in_size, int out_size, IntPtr in_buffer, IntPtr out_buffer, ref IntPtr memory); // <---- THE EXCEPTION IS THROWN HERE
ModelFree(ref IntPtr memory);
ModelSave(int size, IntPtr buffer, ref IntPtr memory);
I have no idea how to debug this. I've already tried to make a C++/CLI wrapper to avoid P/Invoke, but some DLLs use third-party libraries that were built with the /MTd option, which is incompatible with C++/CLI.
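For reference, a minimal sketch of how such a binding is typically set up with Marshal.GetDelegateForFunctionPointer (the delegate type, module path, and undecorated export name are assumptions for illustration, not the actual project code):
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
delegate void RunDelegate(int in_size, int out_size, IntPtr in_buffer, IntPtr out_buffer, ref IntPtr memory);

[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr LoadLibrary(string path);

[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr GetProcAddress(IntPtr module, string name);

// Load the model DLL and bind its Run export to a managed delegate.
IntPtr module = LoadLibrary("Model.dll");
IntPtr proc = GetProcAddress(module, "Run");
var ModelRun = (RunDelegate)Marshal.GetDelegateForFunctionPointer(proc, typeof(RunDelegate));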
I have a piece of C# code that loads a DLL function:
[DllImport("/Users/frk/Workspaces/MySharedLibrary.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Auto, EntryPoint = "MyCFunction")]
public static extern int MyFunction( [In][MarshalAs(UnmanagedType.I4)]MyFormat format, [In][MarshalAs(UnmanagedType.LPArray)] byte[] myString, [In][MarshalAs(UnmanagedType.I4)] int myStringLength, [MarshalAs(UnmanagedType.LPArray)] byte[] output, ref UIntPtr outputLength);
and calling it
int result = MyFunction(format, inPut, inputLength, outPut, ref outputLength);
On the C++ side, I have MyCPPFunction, which works perfectly when called from a C test executable. MyCPPFunction contains, somewhere deep in its dependencies, a global const variable declared and initialized in an anonymous namespace:
namespace
{
    constexpr unsigned RandTileSize = 256;

    std::array<unsigned, RandTileSize * RandTileSize> GenerateSamples()
    {
        std::array<unsigned, RandTileSize * RandTileSize> samples;
        std::mt19937 rd(0);
        std::uniform_int_distribution<unsigned> distribution(0, 255);
        for (unsigned i = 0; i < RandTileSize * RandTileSize; ++i)
        {
            samples[i] = distribution(rd);
        }
        return samples;
    }

    const auto samples = GenerateSamples(); // <-- Option #1: this causes a stack overflow when loading the DLL in the C# environment

    unsigned Sample(unsigned index)
    {
        static const auto samples = GenerateSamples(); // <-- Option #2: this works and the DLL loads correctly
        return samples[index];
    }
}
I am confused here since, as far as I know, option #1 should allocate its memory as part of the DLL image, which the C# environment should be able to deal with, right?
How can we keep option #1 from causing memory problems while loading the DLL?
The lifetime of a static variable in a function within a DLL is from the first time the statement is encountered, to the time the DLL unloads.
The lifetime of a class or file scoped variable is from the time the DLL loads until the time the DLL unloads.
The consequence of this is that in the failing case, your initialisation code is running while the DLL is in the process of loading.
It is not generally a good idea to run nontrivial code in the constructors or initialisers of global objects, as there are limits to what can safely be done inside the loader lock.
In particular if you perform any action which requires dynamically loading another DLL (such as LoadLibrary or calling a delay-load linked function) this is likely to cause difficult-to-diagnose issues.
Without diagnosing exactly what has gone wrong in your case, the answer is simple: use option #2 or option #3.
Option 3:
void MyDLLInitialize(){
    // Initialize the DLL here
}

void MyDLLUninitialize(){
    // Uninitialize the DLL here
}
Then call these functions from C# before you use any other DLL function, and after you have finished with it, respectively.
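For example, a minimal sketch of the C# side (the DLL name and calling convention are assumptions, not part of the original answer):
[DllImport("MyLibrary.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void MyDLLInitialize();

[DllImport("MyLibrary.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void MyDLLUninitialize();

// Initialize once, outside the loader lock, before any other export is used.
MyDLLInitialize();
try
{
    // ... call the rest of the DLL's functions here ...
}
finally
{
    // Release whatever the initializer allocated once you are done with the DLL.
    MyDLLUninitialize();
}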
I am trying to pass a double array (it's actually a std::vector, but I convert it when passing it across) from a C++ DLL into a C# script (Unity).
Using the approach outlined here https://stackoverflow.com/a/31418775.
I can successfully get the size of the array printing to my console in Unity; however, I am not able to use CoTaskMemAlloc to allocate memory for the array, since I am using Xcode and it doesn't seem to have COM.
For a little more background, this array is part of a control for a GUI: C++ creates it and the user edits it with the C# GUI, so the plan is to be able to pass the array back to C++ once it has been edited.
C++ code
extern "C" ABA_API void getArray(long* len, double **data){
    *len = delArray.size();
    auto size = (*len)*sizeof(double);
    *data = static_cast<double*>(CoTaskMemAlloc(size));
    memcpy(*data, delArray.data(), size);
}
C# code
[DllImport("AudioPluginSpecDelay")]
private static extern void getArray (out int length,[MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] out double[] array);
int theSize;
double[] theArray;
getArray(out theSize, out theArray);
If I leave out the code concerning the array, the int passes just fine, so I believe the method is the right one; it's just a matter of getting around the lack of CoTaskMemAlloc.
You should be able to allocate the memory with malloc on the Xcode side and free it in C# using Marshal.FreeCoTaskMem. To be able to free it, however, you need to have the IntPtr for it:
C++ code
extern "C" ABA_API void getArray(long* len, double **data)
{
    *len = delArray.size();
    auto size = (*len)*sizeof(double);
    *data = static_cast<double*>(malloc(size));
    memcpy(*data, delArray.data(), size);
}
C# code
[DllImport("AudioPluginSpecDelay")]
private static extern void getArray(out int length, out IntPtr array);
int theSize;
IntPtr theArrayPtr;
getArray(out theSize, out theArrayPtr);
double[] theArray = new double[theSize];
Marshal.Copy(theArrayPtr, theArray, 0, theSize);
Marshal.FreeCoTaskMem(theArrayPtr);
// theArray is a valid managed object while the native array is already freed
Edit
From Memory Management I gathered that Marshal.FreeCoTaskMem would most likely be implemented using free(), so the fitting allocator would be malloc().
There are two ways to be really sure:
Allocate the memory in CLI using Marshal.AllocCoTaskMem, pass it to native to have it filled, and then free it in the CLI again using Marshal.FreeCoTaskMem.
Leave it as it is (native allocates the memory with malloc()), but do not free the memory on the CLI side. Instead, have another native function like freeArray(double **data) and have it free() the array for you once the CLI side is done using it (sketched below).
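A minimal sketch of that second option, assuming a native export such as freeArray(double **data) next to getArray (the extra import is illustrative, not part of the question's plugin):
[DllImport("AudioPluginSpecDelay")]
private static extern void getArray(out int length, out IntPtr array);

// Hypothetical native counterpart of: extern "C" ABA_API void freeArray(double **data);
[DllImport("AudioPluginSpecDelay")]
private static extern void freeArray(ref IntPtr array);

int theSize;
IntPtr theArrayPtr;
getArray(out theSize, out theArrayPtr);

double[] theArray = new double[theSize];
Marshal.Copy(theArrayPtr, theArray, 0, theSize);

// Hand the pointer back so native code can free() what it malloc()'d.
freeArray(ref theArrayPtr);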
I am not an expert on Unity, but it seems that Unity relies on Mono for its C# scripting support. Take a look at this documentation page:
Memory Management in Mono
We can assume from there that you will need platform-dependent code on your C++ side: CoTaskMemAlloc/CoTaskMemFree on Windows, and the GLib memory functions g_malloc() and g_free() on Unix-like platforms (iOS, Android, etc.).
If you have control over all your code, C++ and C#, the easiest way to implement this would be to do all the memory allocation/deallocation in the C# script.
Sample code (untested):
//C++ code
extern "C" ABA_API long getArrayLength(){
return delArray.size();
}
extern "C" ABA_API void getArray(long len, double *data){
if (delArray.size() <= len)
memcpy(data, delArray.data(), delArray.size());
}
// C# code
[DllImport("AudioPluginSpecDelay")]
private static extern int getArrayLength();
[DllImport("AudioPluginSpecDelay")]
private static extern void getArray(int length,[MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)] double[] array);
int theSize = getArrayLength();
double[] theArray = new double[theSize];
getArray(theSize, theArray);
I'm writing a simple .NET wrapper for the C/C++ Pylon library for Basler cameras under Linux (Ubuntu) in MonoDevelop. I'm building a .so (.dll) file in Code::Blocks and P/Invoking it from MonoDevelop.
I have two simple tasks: get a single image and get a sequence of images.
I managed the first part in this way:
In C++ I have a function:
void GetImage(void* ptr)
{
    CGrabResultPtr ptrGrabResult;
    // here the image is grabbed into ptrGrabResult
    camera.GrabOne(5000,ptrGrabResult,TimeoutHandling_ThrowException);
    // I'm copying the byte array with the image to my pointer ptr
    memcpy(ptr,ptrGrabResult->GetBuffer(),_width*_height);
    if (ptrGrabResult->GrabSucceeded())
        return;
    else
        cout<<endl<<"Grab Failed;"<<endl;
}
and in C#:
[DllImport("libPylonInterface.so")]
private static extern void GetImage(IntPtr ptr);
public static void GetImage(out byte[] arr)
{
    // allocating unmanaged memory for the byte array
    IntPtr ptr = Marshal.AllocHGlobal (_width * _height);
    // and "copying" the image data to this pointer
    GetImage (ptr);
    arr = new byte[_width * _height];
    // copying from unmanaged to managed memory
    Marshal.Copy (ptr, arr, 0, _width * _height);
    Marshal.FreeHGlobal(ptr);
}
and after that I can build an image from this byte[] arr.
I have to copy a huge amount of bytes twice (1. in C++ with memcpy(); 2. in C# with Marshal.Copy()). I tried to use a direct pointer to the image buffer, ptr = ptrGrabResult->GetBuffer(), but I get a Mono environment error when marshalling the bytes.
So here is the question: is this a reasonable solution? Am I going in the right direction?
P.S. Please also give me some advice on how I should handle an image sequence.
You can get the marshaller to do the work for you. Like this:
[DllImport("libPylonInterface.so")]
private static extern void GetImage([Out] byte[] arr);
....
arr = new byte[_width * _height];
GetImage(arr);
This avoids the second memory copy because the marshaller will pin the managed array and pass its address to the unmanaged code. The unmanaged code can then populate the managed memory directly.
The first copy looks harder to avoid. That is probably forced upon you by the camera library that you are using. I would comment that you should probably only perform that copy if the grab succeeded.
I'm using the OpenTK wrapper for C#. My shaders weren't running correctly (I want to generate a vertex displacement shader using textures), so I intended to use a GPU debugger to see what was happening.
The application is quite simple. It just creates a game window, loads shaders and textures, and renders textured cubes (that part works fine; I discovered the problem while trying to use vertex displacement).
I used gDEBugger and AMD CodeXL with the same results. The debugger detects shaders, VBOs, etc., but never sees the allocated textures. This makes no sense, because when I run the application I see a textured cube spinning around the screen, and the debugger renders the object in the back/front buffer.
For reference, here is the texture-loading function:
int loadImage(Bitmap image, int tex)
{
    int texID = GL.GenTexture();
    GL.BindTexture(TextureTarget.Texture2D, texID);
    System.Drawing.Imaging.BitmapData data = image.LockBits(
        new System.Drawing.Rectangle(0, 0, image.Width, image.Height),
        System.Drawing.Imaging.ImageLockMode.ReadOnly,
        System.Drawing.Imaging.PixelFormat.Format32bppArgb);
    GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, data.Width, data.Height, 0,
        OpenTK.Graphics.OpenGL.PixelFormat.Bgra, PixelType.UnsignedByte, data.Scan0);
    image.UnlockBits(data);
    return texID;
}
I searched for more information, but I couldn't find anything about this issue. I'm not sure whether the problem is in the wrapper or in the function, or whether there is something else I should consider.
EDIT:
It seems that the problem is in the wrapper: OpenTK's BindTexture is different from the native glBindTexture, so the profiler can't catch the call, and that's why the textures are not shown. So the next step is to look for a way to make native GL calls while using OpenTK.
PROPOSED SOLUTION:
As I said, some functions of the OpenTK wrapper are different from the native GL calls. When you use functions like GL.BindTexture or GL.GenTexture (I suppose there are more, but I don't know yet), OpenTK uses overloaded calls that don't match the original calls, so profilers can't catch them.
It's easy to check: just use OpenTK's GenTexture or BindTexture, add breakpoints on those functions in the profiler, and they will never break.
Now, the solution: I thought about it, and the conclusion was to replace some OpenTK calls with native opengl32.dll calls using the GetProcAddress function.
This gives me some ideas:
http://blogs.msdn.com/b/jonathanswift/archive/2006/10/03/dynamically-calling-an-unmanaged-dll-from-.net-_2800_c_23002900_.aspx
Using the opengl32.dll included with the profiler, I use the same structure as in the previous link:
static class NativeMethods
{
[DllImport("kernel32.dll")]
public static extern IntPtr LoadLibrary(string dllToLoad);
[DllImport("kernel32.dll")]
public static extern IntPtr GetProcAddress(IntPtr hModule, string procedureName);
[DllImport("kernel32.dll")]
public static extern bool FreeLibrary(IntPtr hModule);
}
This is added in the GameWindow class:
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate void BindTexture(OpenTK.Graphics.OpenGL.TextureTarget target, int texID);
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate void GenTexture(int n, int[] arr_text);
And here the new bindTexture function:
IntPtr pDll = NativeMethods.LoadLibrary(@"....\opengl32.dll");
IntPtr pAddressOfFunctionToCall = NativeMethods.GetProcAddress(pDll, "glBindTexture");
BindTexture bindTexture = (BindTexture)Marshal.GetDelegateForFunctionPointer(pAddressOfFunctionToCall, typeof(BindTexture));
bindTexture(OpenTK.Graphics.OpenGL.TextureTarget.Texture2D, arr_texture[0]);
Now if you try again with breakpoints in the profiler, it will break with glGenTextures and glBindTextures, recognising allocated textures.
I hope it helps.