Looping over data in a CUDA kernel causes the app to abort - C#

issue:
As I increase the amount of data processed inside the loop that runs inside the CUDA kernel, the app aborts!
exception:
ManagedCuda.CudaException: 'ErrorLaunchFailed: An exception occurred
on the device while executing a kernel. Common causes include
dereferencing an invalid device pointer and accessing out of bounds
shared memory.
question:
I would appreciate it if somebody could shed some light on the limitations I am hitting with my current implementation and on what exactly causes the app to crash.
Alternatively, I am attaching the full kernel code, in case somebody can suggest how it could be remodelled so that no exceptions are thrown. The idea is that the kernel accepts combinations and then performs calculations over the same set of data (in a loop). Therefore, the loop calculations inside it must be sequential. The order in which the kernel itself is executed is irrelevant. It's a combinatorics problem.
Any bit of advice is welcome.
code (Short version, which is enough to abort the app):
extern "C"
{
__device__ __constant__ int arraySize;
__global__ void myKernel(
unsigned char* __restrict__ output,
const int* __restrict__ in1,
const int* __restrict__ in2,
const double* __restrict__ in3,
const unsigned char* __restrict__ in4)
{
for (int row = 0; row < arraySize; row++)
{
// looping over sequential data.
}
}
}
In the example above, if arraySize is somewhere close to 50_000, the app starts to abort. With the same kind of input parameters, if we override or hardcode arraySize to 10_000, the code finishes successfully.
code - kernel (full version)
#include <cuda.h>
#include "cuda_runtime.h"
#include <device_launch_parameters.h>
#include <texture_fetch_functions.h>
#include <builtin_types.h>
#define _SIZE_T_DEFINED
#ifndef __CUDACC__
#define __CUDACC__
#endif
#ifndef __cplusplus
#define __cplusplus
#endif
texture<float2, 2> texref;

extern "C"
{
    __device__ __constant__ int width;
    __device__ __constant__ int limit;
    __device__ __constant__ int arraySize;

    __global__ void myKernel(
        unsigned char* __restrict__ output,
        const int* __restrict__ in1,
        const int* __restrict__ in2,
        const double* __restrict__ in3,
        const unsigned char* __restrict__ in4)
    {
        int index = blockIdx.x * blockDim.x + threadIdx.x;
        if (index >= limit)
            return;

        bool isTrue = false;
        int varA = in1[index];
        int varB = in2[index];
        double calculatable = 0;

        for (int row = 0; row < arraySize; row++)
        {
            if (isTrue)
            {
                int idx = width * row + varA;
                if (!in4[idx])
                    continue;
                calculatable = calculatable + in3[row];
                isTrue = false;
            }
            else
            {
                int idx = width * row + varB;
                if (!in4[idx])
                    continue;
                calculatable = calculatable - in3[row];
                isTrue = true;
            }
        }

        if (calculatable >= 0) {
            output[index] = 1;
        }
    }
}
code - host (full version)
public static void test()
{
    int N = 10_245_456; // size of the output
    CudaContext cntxt = new CudaContext();
    CUmodule cumodule = cntxt.LoadModule(@"kernel.ptx");
    CudaKernel myKernel = new CudaKernel("myKernel", cumodule, cntxt);
    myKernel.GridDimensions = (N + 255) / 256;
    myKernel.BlockDimensions = Math.Min(N, 256);

    // output
    byte[] out_host = new byte[N]; // i.e. bool
    var out_dev = new CudaDeviceVariable<byte>(out_host.Length);

    // input
    int[] in1_host = new int[N];
    int[] in2_host = new int[N];
    double[] in3_host = new double[50_000]; // change it to 10k and it's OK
    byte[] in4_host = new byte[10_000_000]; // i.e. bool
    var in1_dev = new CudaDeviceVariable<int>(in1_host.Length);
    var in2_dev = new CudaDeviceVariable<int>(in2_host.Length);
    var in3_dev = new CudaDeviceVariable<double>(in3_host.Length);
    var in4_dev = new CudaDeviceVariable<byte>(in4_host.Length);

    // copy input parameters
    in1_dev.CopyToDevice(in1_host);
    in2_dev.CopyToDevice(in2_host);
    in3_dev.CopyToDevice(in3_host);
    in4_dev.CopyToDevice(in4_host);
    myKernel.SetConstantVariable("width", 2);
    myKernel.SetConstantVariable("limit", N);
    myKernel.SetConstantVariable("arraySize", in3_host.Length);

    // exception is thrown here
    myKernel.Run(out_dev.DevicePointer, in1_dev.DevicePointer, in2_dev.DevicePointer, in3_dev.DevicePointer, in4_dev.DevicePointer);

    out_dev.CopyToHost(out_host);
}
analysis
My initial assumption was that I was having memory issues; however, according to the VS debugger I am hitting a little above 500 MB of data on the host. So no matter how much data I copy to the GPU, it shouldn't exceed 1 GB, or at most the 11 GB available. Later on I noticed that the crash only happens when the loop inside the kernel has many records of data to process. It makes me believe that I am hitting some kind of thread time-out limitation or something of that sort. Without solid proof.
system
My system specs are 16 GB of RAM and a GeForce 1080 Ti 11 GB.
Using CUDA 9.1 and managedCuda version 8.0.22 (also tried the 9.x version from the master branch).
edit 1: 26.04.2018 I just tested the same logic on OpenCL. The code not only finished successfully, but also performs 1.5-5x better than the CUDA version, depending on the input parameter sizes:
kernel void Test (global bool* output, global const int* in1, global const int* in2, global const double* in3, global const bool* in4, const int width, const int arraySize)
{
    int index = get_global_id(0);
    bool isTrue = false;
    int varA = in1[index];
    int varB = in2[index];
    double calculatable = 0;

    for (int row = 0; row < arraySize; row++)
    {
        if (isTrue)
        {
            int idx = width * row + varA;
            if (!in4[idx]) {
                continue;
            }
            calculatable = calculatable + in3[row];
            isTrue = false;
        }
        else
        {
            int idx = width * row + varB;
            if (!in4[idx]) {
                continue;
            }
            calculatable = calculatable - in3[row];
            isTrue = true;
        }
    }

    if (calculatable >= 0)
    {
        output[index] = true;
    }
}
I don't really want to start an OpenCL/CUDA war here. If there is anything I should be concerned about in my original CUDA implementation - please let me know.
edit 2: 26.04.2018. After following suggestions from the comment section I was able to increase the amount of data processed, before an exception is thrown, by 3x. I achieved that by switching to a .ptx generated in Release mode rather than Debug mode. The improvement could be related to the fact that the Debug settings also have Generate GPU Debug Information set to Yes, along with other unnecessary settings that can affect performance. I will now try to find out how the time limits for a kernel can be increased. I am still not reaching the results of OpenCL, but I'm getting close.
For CUDA file generation I am using VS2017 Community, a CUDA 9.1 project, the v140 toolset, building for the x64 platform, post-build events disabled, configuration type: utility. Code generation is set to compute_30,sm_30. I am not sure why it's not sm_70, for example; I don't have other options.

I have managed to improve the CUDA performance over OpenCL. And, more importantly, the code can now finish executing without exceptions. The credit goes to Robert Crovella. Thank you!
Before showing the results here are some specs:
CPU Intel i7 8700k 12 cores (6+6)
GPU GeForce 1080 Ti 11Gb
Here are my results (library/technology):
CPU parallel for loop: 607907 ms (default)
GPU (Alea, CUDA): 9905 ms (x61)
GPU (managedCuda, CUDA): 6272 ms (x97)
GPU (Cloo, OpenCL): 8277 ms (x73)
Solution 1:
The solution was to increase the WDDM TDR Delay from the default 2 seconds to 10 seconds. As easy as that.
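For reference, the TDR delay lives in the registry under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. Here is a minimal sketch of setting it from C# - doing it in code is my own assumption, since you can just as well edit the registry by hand or through the Nsight options; it requires administrator rights and a reboot to take effect:
using Microsoft.Win32;

public static class TdrConfig
{
    // Sketch: sets the WDDM TDR delay (in seconds). A value of 10 matches
    // the fix described above; the change applies only after a reboot.
    public static void SetTdrDelay(int seconds)
    {
        Registry.SetValue(
            @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers",
            "TdrDelay",
            seconds,
            RegistryValueKind.DWord);
    }
}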
Solution 2:
I was able to squeeze out a bit more performance by:
updating the compute_30,sm_30 settings to compute_61,sm_61 in the CUDA project properties
using the Release settings instead of Debug
using a .cubin file instead of .ptx (see the sketch below)
If anyone still wants to suggest ideas on how to improve the performance any further - please share them! I am open to ideas. This question has been resolved, though!
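As a sketch of the last point (the file names are assumptions; a .cubin for this card can be produced with something like nvcc -cubin -arch=sm_61 kernel.cu -o kernel.cubin), managedCuda loads a .cubin the same way it loads a .ptx:
// Sketch: LoadModule accepts a pre-built .cubin just like a .ptx,
// skipping the JIT step that a .ptx needs at load time.
CUmodule cumodule = cntxt.LoadModule(@"kernel.cubin");
CudaKernel myKernel = new CudaKernel("myKernel", cumodule, cntxt);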
p.s. if your display blinks in the same fashion as described here, then try increasing the delay as well.

Related

PInvoke of _wtof_l in C# faster than natively invoking it from C++

I was writing some high-performance code in C# and wanted to compare my implementation to the native C++ one, as I was P/Invoking an msvcrt function a lot. To my surprise, it appears that the C# version of the code is faster than the native version (!). Can someone explain this behavior?
C# version:
using System;
using System.Diagnostics;
using System.Security;
using System.Runtime.InteropServices;

class Program
{
    [DllImport("msvcrt.dll", EntryPoint = "_wtof_l", CallingConvention = CallingConvention.Cdecl)]
    [SuppressUnmanagedCodeSecurity]
    private extern unsafe static double _wtof_l(char* str, IntPtr locale);

    [DllImport("msvcrt.dll", EntryPoint = "_create_locale", CallingConvention = CallingConvention.Cdecl)]
    private extern static IntPtr CreateLocale(int category, string locale);

    private const int LC_NUMERIC = 4;

    static unsafe void Main(string[] args)
    {
        var locale = CreateLocale(LC_NUMERIC, "C");
        fixed (char* test = "1.2")
        {
            int x = 10;
            while (x-- > 0)
            {
                var sw = Stopwatch.StartNew();
                double sum = 0;
                for (int i = 0; i < 10_000_000; i++)
                {
                    sum += _wtof_l(test, locale);
                }
                Console.WriteLine(sum + " " + sw.ElapsedMilliseconds);
            }
        }
        Console.ReadLine();
    }
}
C++ version:
#include <locale.h>
#include <stdlib.h>
#include <stdio.h>
#include <chrono>
#include <string>
#include <iostream>

int main()
{
    auto test = L"1.2";
    _locale_t locale = _create_locale(LC_NUMERIC, "C");
    int x = 10;
    while (x--)
    {
        auto start = std::chrono::high_resolution_clock::now();
        double sum = 0;
        for (int i = 0; i < 10000000; i++)
        {
            sum += _wtof_l(test, locale);
        }
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> diff = end - start;
        std::cout << sum << " " << diff.count() << std::endl;
    }
    std::string dummy;
    std::getline(std::cin, dummy);
    return 0;
}
Both applications were compiled for x86 Release with VS2017, and both were run multiple times with Visual Studio closed. Below are the results on my machine. As you can see, the C# version is faster by about 30%.
Can someone explain this confusing behavior? My guess would be one of the following:
Some optimizations are not turned on in the default Win32 C++ console application project, or the C++ runtime does some initialization that slows down invocations of _wtof_l.
I knocked a 0 off so I wouldn't have to wait so long. Median values:
C# x86: 520 msec
C# x64: 395 msec
C++ x86: 408 msec
C++ x64: 273 msec
That I'll buy. Note how the 64-bit version of the C# program beats the 32-bit C++ program. So that's one explanation.
But the bigger difference you see might well be because you are not comparing the same _wtof_l() implementations. Your C++ program uses the one included with the VS install (typically msvcrXXX.dll), not msvcrt.dll. There was a very big rewrite in VS2015. I measured the CRT for VS2017 RTM against msvcrt.dll from the Win10 Anniversary update.
Bigger-picture conclusions that I often see confirmed: the P/Invoke marshaller does not suck, 64-bit code does not suck, and managed code tends to run at about 80% of native speed. And YMMV.

DLL won't work in C#

Whenever I try to reference my own DLL in C# through Visual Studio, it tells me it was unable to add a reference to the DLL as it's not a COM library.
I've searched around the internet for a solution to this with no clear answer or help anywhere, really. It's a rather "simple" DLL which captures the raw picture data from a fingerprint scanner. I have tested that the C++ code worked just fine before I tried to make it into a DLL, just so you know.
I followed Microsoft's guide on how to make a DLL and here is what I ended up with:
JTBioCaptureFuncsDll.h
JTBioCaptureFuncsDll.cpp
JTBioCapture.cpp
JTBioCaptureFuncsDll.h
#ifdef JTBIOCAPTUREFUNCSDLL_EXPORTS
#define JTBIOCAPTUREFUNCSDLL_API __declspec(dllexport)
#else
#define JTBIOCAPTUREFUNCSDLL_API __declspec(dllimport)
#endif

using byte = unsigned char*;

struct BioCaptureSample {
    INT32 Width;
    INT32 Height;
    INT32 PixelDepth;
    byte Buffer;
};
JTBioCaptureFuncsDll.cpp
// JTBioCapture.cpp : Defines the exported functions for the DLL application.
#include "stdafx.h"

namespace JTBioCapture
{
    using byte = unsigned char*;

    class JTBioCapture
    {
    public:
        // Returns a struct with information regarding the fingerprint sample
        static JTBIOCAPTUREFUNCSDLL_API BioCaptureSample CaptureSample();
    };
}
JTBioCapture.cpp
/*
 * Courtesy of WinBio god Satish Agrawal on Stack Overflow
 */
BioCaptureSample CaptureSample()
{
    HRESULT hr = S_OK;
    WINBIO_SESSION_HANDLE sessionHandle = NULL;
    WINBIO_UNIT_ID unitId = 0;
    WINBIO_REJECT_DETAIL rejectDetail = 0;
    PWINBIO_BIR sample = NULL;
    SIZE_T sampleSize = 0;

    // Connect to the system pool.
    hr = WinBioOpenSession(
        WINBIO_TYPE_FINGERPRINT,    // Service provider
        WINBIO_POOL_SYSTEM,         // Pool type
        WINBIO_FLAG_RAW,            // Access: capture raw data
        NULL,                       // Array of biometric unit IDs
        0,                          // Count of biometric unit IDs
        WINBIO_DB_DEFAULT,          // Default database
        &sessionHandle              // [out] Session handle
    );
    if (FAILED(hr))
    {
        wprintf_s(L"\n WinBioOpenSession failed. hr = 0x%x\n", hr);
        goto e_Exit;
    }

    // Capture a biometric sample.
    wprintf_s(L"\n Calling WinBioCaptureSample - Swipe sensor...\n");
    hr = WinBioCaptureSample(
        sessionHandle,
        WINBIO_NO_PURPOSE_AVAILABLE,
        WINBIO_DATA_FLAG_RAW,
        &unitId,
        &sample,
        &sampleSize,
        &rejectDetail
    );
    if (FAILED(hr))
    {
        if (hr == WINBIO_E_BAD_CAPTURE)
        {
            wprintf_s(L"\n Bad capture; reason: %d\n", rejectDetail);
        }
        else
        {
            wprintf_s(L"\n WinBioCaptureSample failed. hr = 0x%x\n", hr);
        }
        goto e_Exit;
    }

    wprintf_s(L"\n Swipe processed - Unit ID: %d\n", unitId);
    wprintf_s(L"\n Captured %d bytes.\n", sampleSize);

    // Courtesy of Art "Messiah" Baker at Microsoft
    PWINBIO_BIR_HEADER BirHeader = (PWINBIO_BIR_HEADER)(((PBYTE)sample) + sample->HeaderBlock.Offset);
    PWINBIO_BDB_ANSI_381_HEADER AnsiBdbHeader = (PWINBIO_BDB_ANSI_381_HEADER)(((PBYTE)sample) + sample->StandardDataBlock.Offset);
    PWINBIO_BDB_ANSI_381_RECORD AnsiBdbRecord = (PWINBIO_BDB_ANSI_381_RECORD)(((PBYTE)AnsiBdbHeader) + sizeof(WINBIO_BDB_ANSI_381_HEADER));
    PBYTE firstPixel = (PBYTE)((PBYTE)AnsiBdbRecord) + sizeof(WINBIO_BDB_ANSI_381_RECORD);

    int width = AnsiBdbRecord->HorizontalLineLength;
    int height = AnsiBdbRecord->VerticalLineLength;

    wprintf_s(L"\n ID: %d\n", AnsiBdbHeader->ProductId.Owner);
    wprintf_s(L"\n Width: %d\n", AnsiBdbRecord->HorizontalLineLength);
    wprintf_s(L"\n Height: %d\n", AnsiBdbRecord->VerticalLineLength);

    BioCaptureSample returnSample;

    byte byteBuffer;
    for (int i = 0; i < AnsiBdbRecord->BlockLength; i++) {
        byteBuffer[i] = firstPixel[i];
    }

    returnSample.Buffer = byteBuffer;
    returnSample.Height = height;
    returnSample.Width = width;
    returnSample.PixelDepth = AnsiBdbHeader->PixelDepth;

    /*
     * NOTE: (width / 3) is necessary because we ask for a 24-bit BMP but are only provided
     * a greyscale image which is 8-bit. So we have to cut the bytes by a factor of 3.
     */
    // Commented out as we only need the byte buffer. Comment it back in should you need to save a BMP of the fingerprint.
    // bool b = SaveBMP(firstPixel, (width / 3), height, AnsiBdbRecord->BlockLength, L"C:\\Users\\smf\\Desktop\\fingerprint.bmp");
    // wprintf_s(L"\n Success: %d\n", b);

e_Exit:
    if (sample != NULL)
    {
        WinBioFree(sample);
        sample = NULL;
    }
    if (sessionHandle != NULL)
    {
        WinBioCloseSession(sessionHandle);
        sessionHandle = NULL;
    }
    wprintf_s(L"\n Press any key to exit...");
    _getch();
    return returnSample;
}
The idea is that in C# you call CaptureSample(), and the code then attempts to capture a fingerprint scan. When it gets a scan, a struct should be returned to C# that it can work with, holding:
Byte Buffer
Image Height
Image Width
Image Pixeldepth
But when I try to reference the DLL in my C# project I get the following error:
I have also tried to use the TlbImp.exe tool to import the DLL, but to no avail. It tells me that the DLL is not a valid type library.
So I'm a bit lost here. I'm new to C++, so making an interop/COM component is not something I've done before, nor is making a DLL for use in C#.
You cannot reference a library of unmanaged C++ code in a .NET project.
So to call code from such a library you have to either use DllImport or write a wrapper class.
I referred to this answer: https://stackoverflow.com/a/574810/4546874.
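As a minimal sketch of the DllImport route - note the export is an assumption on my part: the class method in the question would first have to be re-exported as a plain extern "C" function, because C++ class members cannot be P/Invoked directly:
// Hypothetical: assumes the native DLL exposes
//   extern "C" __declspec(dllexport) BioCaptureSample __cdecl CaptureSample();
[StructLayout(LayoutKind.Sequential)]
public struct BioCaptureSample
{
    public int Width;
    public int Height;
    public int PixelDepth;
    public IntPtr Buffer; // unsigned char* on the native side
}

public static class JTBioCaptureNative
{
    [DllImport("JTBioCaptureFuncsDll.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern BioCaptureSample CaptureSample();
}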

StackOverflowException while calling native (DllImport) function [duplicate]

This question already has answers here:
Why are Cdecl calls often mismatched in the "standard" P/Invoke Convention?
(2 answers)
Closed 7 years ago.
I'm currently doing micro-benchmarks for a better understanding of CLR-to-native-code performance. In the following example I'm getting a StackOverflowException when the program is compiled as Release and executed without a debugger attached. I don't get the exception when compiling as a Debug build or when running the program with a debugger attached. Furthermore, I also get this error only with the SuppressUnmanagedCodeSecurityAttribute attribute.
I built a DLL using C and VS2013 (platform toolset = v120) with one function in it:
__declspec(dllexport) int __cdecl NativeTestFunction(int a, int b, int c, int d)
{
    return a + c + b + d;
}
In my C# program I use DllImport to call this function and do some timing measurements:
[DllImport("Native.dll", EntryPoint = "NativeTestFunction")]
static extern int NativeTestFunction(int a, int b, int c, int d);
[DllImport("Native.dll", EntryPoint = "NativeTestFunction"), SuppressUnmanagedCodeSecurityAttribute]
static extern int NativeTestFunctionSuppressed(int a, int b, int c, int d);
static void Main(string[] args)
{
byte[] data = new byte[64];
int c = 0;
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++)
c += NativeTestFunction(2, -1, -2, 1);
Console.WriteLine("Unsuppressed: " + sw.Elapsed.ToString());
sw = Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++)
c += NativeTestFunctionSuppressed(2, -1, -2, 1);
Console.WriteLine("Suppressed..: " + sw.Elapsed.ToString());
}
If I compile this code as Release and start it without a debugger attached, the output is:
Unsuppressed: 00:00:00.2666255
Process is terminated due to StackOverflowException.
However, when executed with a debugger attached, or compiled as Debug and launched with or without a debugger, the program succeeds:
Unsuppressed: 00:00:00.2952272
Suppressed..: 00:00:00.1278980
Is this a known bug in .NET/CLR? What is my mistake? I think the behavior should be the same with and without a debugger attached.
This error happens with .NET 2.0 and .NET 4.0. My software is compiled as x86 (and therefore tested only on x86) for compatibility with Native.dll. If you don't want to set up this scenario yourself you can download my test projects: Sourcecode.
__declspec(dllexport) int __cdecl NativeTestFunction(int a, char* b, int c, int d)
Note the type of b. It is char*. Then in the C# code you write:
[DllImport("Native.dll", EntryPoint = "NativeTestFunction"),
SuppressUnmanagedCodeSecurityAttribute]
static extern int NativeTestFunctionSuppressed(int a, int b, int c, int d);
Here you declare b to be int. That does not match. It gets worse when you call the function.
NativeTestFunctionSuppressed(2, -1, -2, 1);
Passing -1 will, in a 32-bit process, equate to passing the address 0xffffffff. Nothing good will come of attempting to dereference that address.
The other problem is that the calling conventions do not match. The native code uses __cdecl, but the managed code uses the default of __stdcall. Change the managed code to:
[DllImport("Native.dll", EntryPoint = "NativeTestFunction",
CallingConvention = CallingConvention.Cdecl),
SuppressUnmanagedCodeSecurityAttribute]
And likewise for the other import.
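Put together, a sketch of what the corrected imports could look like (using the int-parameter signature from the question):
// Both imports now declare the __cdecl calling convention to match the native side.
[DllImport("Native.dll", EntryPoint = "NativeTestFunction",
    CallingConvention = CallingConvention.Cdecl)]
static extern int NativeTestFunction(int a, int b, int c, int d);

[DllImport("Native.dll", EntryPoint = "NativeTestFunction",
    CallingConvention = CallingConvention.Cdecl), SuppressUnmanagedCodeSecurity]
static extern int NativeTestFunctionSuppressed(int a, int b, int c, int d);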

C# unmanaged function calls, possible data loss, ASCII vs. Unicode, Chinese folder names

So here is the deal: I'm working on a C# application that calls into a legacy C++ DLL, which in turn loops through a directory pulling back the names of certain directories, i.e. directories that contain .lib. I have a directory with the following three folders: Default.Lib, 中文文本帧的文件.lib, 我们的.lib.
As you can see, we have some Chinese folder names. A string is built in memory by the C++ code, as you can see below; it uses strcat to build it. However, when control is returned to the C# code, it appears part of that data is lost, and the only folders left are the first two, Default.Lib and 中文文本帧的文件.lib; something with 我们的.lib gets lost in translation. I would greatly appreciate any insights anyone may have. Thanks.
C# code snippet
lock (padLock)
{
    ConnectSign(service);
    int size = MaxFileListSize * 100;
    byte[] mem = new byte[size];
    string finalList;
    int used = size;
    int fileCount = 0;
    string library = "*";
    string extension = "*";
    V7_FILE_LIST_TYPE type = V7_FILE_LIST_TYPE.LibraryList;

    fixed (byte* listbytes = mem)
    {
        int error = NativeMethods.GetFileDirInfo(sign, type, fileServer, library, extension, &fileCount, listbytes, &used);
        if (error != 0)
            throw new V7ResponseException(error, sign, service, "GetFileDirInfo");
    }

    finalList = Encoding.Default.GetString(mem, 0, (int)used);
    string[] libraryArray = finalList.Split(new char[] { '\n', '\0' }, StringSplitOptions.RemoveEmptyEntries);

    for (int i = 0; i < libraryArray.Length; i++)
    {
        int index = libraryArray[i].LastIndexOf(".lib", StringComparison.OrdinalIgnoreCase);
        if (index > 0)
            libraryArray[i] = libraryArray[i].Substring(0, index);
        //libraryArray[i] = libraryArray[i].Trim().ToLower(CultureInfo.CurrentCulture).Replace(".lib", string.Empty);
    }
    return libraryArray;
}
[DllImport("V7SSRpc.dll", CharSet = CharSet.Ansi, EntryPoint = "V7ssGetFileDirInfo", BestFitMapping = false, ThrowOnUnmappableChar = true)]
public static extern int GetFileDirInfo(string sign, V7_FILE_LIST_TYPE type, string fileServer, string library, string extension, int* fileCount, byte* files, int* bytesUsed);
C++ DLL code
//--------------------------------------------------------------------
// RETURN :
//
// PARAMS : eListType
//          szServer
//          szLib
//          szExt
//          *pdwFileCnt
//          *pbyFileBuf
//          *pdwFileBufSize
//
// REMARKS:
//
BOOL CVSign::apiGetFileDirInfo(V7_FILE_LIST_TYPE eListType, LPCSTR szServer, LPCSTR szLib, LPCSTR szExt,
                               DWORD *pdwFileCnt, char *pbyFileBuf, DWORD *pdwFileBufSize) const
{
    BOOL bReturn = TRUE;
    CString sServer(szServer);
    CString sLib(szLib);
    CString sExt(szExt);
    CString sFileInfo, sTemp;
    CStringArray asFiles;
    CFileStatus status;
    CV7Files V7Files;
    DWORD dwBufUsed = 0;

    // SOME OTHER LOGIC (not posted)

    USES_CONVERSION;
    //CoInitialize(NULL);
    //AVIFileInit();
    CString sFilePath;
    CV7SequenceFile V7Seq;
    CV7FileInfo fileInfo;

    // go through the list of files and build the buffer with file names and other info
    for (nFile = 0; nFile < nFiles; nFile++)
    {
        // MORE OBSCURED LOGIC

        sFileInfo += _T("\n");

        // add file info to buffer
        int nLen = sFileInfo.GetLength();
        if (dwBufUsed + nLen < *pdwFileBufSize)
        {
            strcat(pbyFileBuf, T2CA(sFileInfo)); // <--- THIS IS THE IMPORTANT PART
            int nTemp = sFileInfo.GetLength();
            dwBufUsed += nTemp;
        }
        else
        {
            *pdwFileBufSize = 0;
            AVIFileExit();
            CoUninitialize();
            return FALSE;
        }
    } // end for files

    //AVIFileExit();
    //CoUninitialize();
    *pdwFileBufSize = dwBufUsed;
    return bReturn;
} // end apiGetFileDirInfo()
I suspect the problem is that you're specifying CharSet = CharSet.Ansi, so the default marshaling behavior converts the returned string to ANSI. That's going to cause a problem with characters that have no representation in the current ANSI code page.
You probably want to specify CharSet = CharSet.Unicode, and possibly specify custom marshaling for some strings. See http://msdn.microsoft.com/en-us/library/s9ts558h.aspx#cpcondefaultmarshalingforstringsanchor5 for information on how to change the string marshaling behavior for individual parameters.
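As a sketch of what that might look like (an assumption on my part: it only helps if the native side is also changed to fill the buffer with UTF-16 text; the declaration below reuses the names from the question):
[DllImport("V7SSRpc.dll", CharSet = CharSet.Unicode, EntryPoint = "V7ssGetFileDirInfo",
    BestFitMapping = false, ThrowOnUnmappableChar = true)]
public static extern int GetFileDirInfo(string sign, V7_FILE_LIST_TYPE type, string fileServer,
    string library, string extension, int* fileCount, byte* files, int* bytesUsed);

// ...and decode the raw buffer with the matching encoding instead of Encoding.Default:
finalList = Encoding.Unicode.GetString(mem, 0, used);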

Convert array of structs to IntPtr

I am trying to convert an array of the RECT structure (given below) into an IntPtr, so I can send the pointer using PostMessage to another application.
[StructLayout(LayoutKind.Sequential)]
public struct RECT
{
    public int Left;
    public int Top;
    public int Right;
    public int Bottom;
    // lots of functions snipped here
}
// so we have something to send, in reality I have real data here
// also, the length of the array is not constant
RECT[] foo = new RECT[4];
IntPtr ptr = Marshal.AllocHGlobal(Marshal.SizeOf(foo[0]) * 4);
Marshal.StructureToPtr(foo, ptr, true); // -- FAILS
This gives an ArgumentException on the last line ("The specified structure must be blittable or have layout information."). I need to somehow get this array of RECTs over to another application using PostMessage, so I really need a pointer to this data.
What are my options here?
UPDATE: This seems to work:
IntPtr result = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Win32.RECT)) * foo.Length);
IntPtr c = new IntPtr(result.ToInt32());
for (i = 0; i < foo.Length; i++)
{
    Marshal.StructureToPtr(foo[i], c, true);
    c = new IntPtr(c.ToInt32() + Marshal.SizeOf(typeof(Win32.RECT)));
}
UPDATED AGAIN to fix what arbiter commented on.
StructureToPtr expects a struct object, and foo is not a structure, it is an array; that is why the exception occurs.
I suggest writing the structures in a loop (sadly, StructureToPtr does not have an overload with an index):
long LongPtr = ptr.ToInt64(); // must work both on x86 and x64
for (int I = 0; I < foo.Length; I++)
{
    IntPtr RectPtr = new IntPtr(LongPtr);
    Marshal.StructureToPtr(foo[I], RectPtr, false); // you do not need to erase the struct in this case
    LongPtr += Marshal.SizeOf(typeof(Rect));
}
Another option is to write the structure as four integers, using Marshal.WriteInt32:
for (int I = 0; I < foo.Length; I++)
{
    int Base = I * sizeof(int) * 4;
    Marshal.WriteInt32(ptr, Base + 0, foo[I].Left);
    Marshal.WriteInt32(ptr, Base + sizeof(int), foo[I].Top);
    Marshal.WriteInt32(ptr, Base + sizeof(int) * 2, foo[I].Right);
    Marshal.WriteInt32(ptr, Base + sizeof(int) * 3, foo[I].Bottom);
}
And lastly, you can use the unsafe keyword and work with pointers directly.
Arbiter has given you one good answer for how to marshal arrays of structs. For blittable structs like these I, personally, would use unsafe code rather than manually marshaling each element to unmanaged memory. Something like this:
RECT[] foo = new RECT[4];
unsafe
{
    fixed (RECT* pBuffer = foo)
    {
        // Do work with pointer
    }
}
or you could pin the array using a GCHandle.
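For example, a minimal sketch of the GCHandle approach (the pointer is only valid until Free is called):
// Pin the managed array so the GC cannot move it, then take its address.
GCHandle handle = GCHandle.Alloc(foo, GCHandleType.Pinned);
try
{
    IntPtr pBuffer = handle.AddrOfPinnedObject();
    // do work with the pointer
}
finally
{
    handle.Free(); // always unpin, or the array stays immovable for the GC
}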
Unfortunately, you say you need to send this information to another process. If the message you are posting is not one of the ones for which Windows provides automatic marshaling, then you have another problem. Since the pointer is relative to the local process, it means nothing in the remote process, and posting a message with this pointer will cause unexpected behavior, including a likely program crash. So what you need to do is write the RECT array to the other process' memory, not your own. To do this you need to use OpenProcess to get a handle to the process, VirtualAllocEx to allocate the memory in the other process, and then WriteProcessMemory to write the array into the other process' virtual memory.
Unfortunately again, if you are going from a 32-bit process to a 32-bit process, or from a 64-bit process to a 64-bit process, things are quite straightforward, but from a 32-bit process to a 64-bit process things can get a little hairy. VirtualAllocEx and WriteProcessMemory are not really supported from 32 to 64. You may have success by trying to force VirtualAllocEx to allocate its memory in the bottom 4 GB of the 64-bit memory space, so that the resultant pointer is valid for the 32-bit process API calls, and then write with that pointer. In addition, you may have struct size and packing differences between the two process types. With RECT there is no problem, but some other structs with packing or alignment issues might need to be written field by field to the 64-bit process in order to match the 64-bit struct layout.
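A rough sketch of that cross-process write (kernel32 P/Invoke declarations; error handling omitted, and targetPid plus the filled local buffer ptr are assumed from the surrounding code):
[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr OpenProcess(uint dwDesiredAccess, bool bInheritHandle, int dwProcessId);

[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr VirtualAllocEx(IntPtr hProcess, IntPtr lpAddress, UIntPtr dwSize,
    uint flAllocationType, uint flProtect);

[DllImport("kernel32.dll", SetLastError = true)]
static extern bool WriteProcessMemory(IntPtr hProcess, IntPtr lpBaseAddress, IntPtr lpBuffer,
    UIntPtr nSize, out UIntPtr lpNumberOfBytesWritten);

// PROCESS_VM_OPERATION | PROCESS_VM_WRITE
IntPtr hProcess = OpenProcess(0x0008 | 0x0020, false, targetPid);
UIntPtr size = (UIntPtr)(Marshal.SizeOf(typeof(RECT)) * foo.Length);
// MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE
IntPtr remote = VirtualAllocEx(hProcess, IntPtr.Zero, size, 0x1000 | 0x2000, 0x04);
WriteProcessMemory(hProcess, remote, ptr, size, out UIntPtr written);
// "remote" is the pointer to pass in the posted message.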
You could try the following:
RECT[] rects = new RECT[4];
IntPtr[] pointers = new IntPtr[4];
IntPtr result = Marshal.AllocHGlobal(IntPtr.Size * rects.Length);
for (int i = 0; i < rects.Length; i++)
{
    // allocate room for a whole RECT, not just for a pointer
    pointers[i] = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(RECT)));
    Marshal.StructureToPtr(rects[i], pointers[i], false);
    Marshal.WriteIntPtr(result, i * IntPtr.Size, pointers[i]);
}
// the array that you need is stored in result
And don't forget to free everything after you are finished.
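For instance, a short sketch of the cleanup, mirroring the allocations above:
// free each per-struct allocation, then the pointer table itself
foreach (IntPtr p in pointers)
    Marshal.FreeHGlobal(p);
Marshal.FreeHGlobal(result);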
I was unable to get this solution to work. So I did some searching, and the solution given here worked for me:
http://social.msdn.microsoft.com/Forums/en-US/clr/thread/dcfd6310-b03b-4552-b4c7-6c11c115eb45
