Marshalling UTF-8 encoded Chinese characters from C# to C++

I'm marshalling some Chinese characters whose UTF-8 bytes (in decimal) are
228,184,145,230,161,148
However, when I receive them in C++ I end up with the chars
-77,-13,-67,-37
I can solve this by using an sbyte[] instead of a string in C#, but now I'm trying to marshal a string[], so I can't use this method. Does anyone have an idea why this is happening?
EDIT: more detailed code:
C#
[DllImport("mydll.dll",CallingConvention=CallingConvention.Cdecl)]
static extern IntPtr inputFiles(IntPtr pAlzObj, string[] filePaths, int fileNum);
string[] allfiles = Directory.GetFiles("myfolder", "*.jpg", SearchOption.AllDirectories);
string[] allFilesutf8 = allfiles.Select(i => Encoding.UTF8.GetString(Encoding.Default.GetBytes(i))).ToArray();
IntPtr pRet = inputFiles(pObj, allfiles, allfiles.Length);
C++
extern "C" __declspec(dllexport) char* inputFiles(Alz* pObj, char** filePaths, int fileNum);
char* inputFiles(Alz* pObj, char** filePaths, int fileNum)
{
    if (pObj != NULL) {
        try {
            std::vector<const char*> imgPaths;
            for (int i = 0; i < fileNum; i++)
            {
                char* s = filePaths[i];
                // Here I would print out the string; its bytes (in decimal) are already different.
                imgPaths.push_back(s);
            }
            std::string ret = pObj->myfunc(imgPaths);
            // Copy the result so it outlives ret; the caller must release it with free().
            return _strdup(ret.c_str());
        }
        catch (const std::runtime_error& e) {
            std::cout << "some runtime error " << e.what() << std::endl;
        }
    }
    return NULL; // null object or exception: no result
}
I also found that if I change the Windows system locale setting (in the language settings) to use Unicode UTF-8, it works fine. I'm not sure why, though.
When marshalling to unsigned char* (or unsigned char**, since it's an array), I get a different output, which is simply 256 plus the numbers shown for char: 179,243,189,219. This leads me to believe that something is happening during marshalling rather than a conversion mistake on the C++ side.
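A quick way to check the numbers in the question: decoding the six bytes as UTF-8 gives two code points, U+4E11 and U+6854, while the four received bytes are not valid UTF-8 at all (0xB3 cannot start a UTF-8 sequence). Two characters in four bytes is consistent with a double-byte ANSI code page such as GBK, which is what one of the answers below concludes. A minimal C++ sketch; `decode_utf8` is a hypothetical helper, and it does not reject overlong or surrogate encodings:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode a UTF-8 byte sequence into Unicode code points.
// Sketch only: handles 1- to 4-byte sequences, throws on malformed input,
// but does not reject overlong encodings or surrogate code points.
std::vector<std::uint32_t> decode_utf8(const std::vector<unsigned char>& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = bytes[i];
        std::size_t extra;   // number of continuation bytes that follow
        std::uint32_t cp;    // code point being assembled
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= bytes.size() && extra > 0)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = bytes[i + k];
            if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Running it on 228,184,145,230,161,148 yields {0x4E11, 0x6854}; running it on 179,243,189,219 throws on the first byte.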

That is because C++ strings use plain char for storage. The signedness of plain char is implementation-defined, and on Windows (MSVC) it is signed by default, which makes bytes above 127 be interpreted as negative values.
As far as I know, on Windows the character traits are handled in the <xstring> header, specifically in:
_STD_BEGIN
template <class _Elem, class _Int_type>
struct _Char_traits { // properties of a string or stream element
using char_type = _Elem;
using int_type = _Int_type;
using pos_type = streampos;
using off_type = streamoff;
using state_type = _Mbstatet;
#if _HAS_CXX20
using comparison_category = strong_ordering;
#endif // _HAS_CXX20
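The signedness point can be shown in isolation: -77 and 179 are the same byte (0xB3) read as a signed and as an unsigned char. Note that this accounts only for the minus signs, not for why four bytes arrive instead of six. A minimal sketch; `byte_values` is a hypothetical helper:

```cpp
#include <string>
#include <vector>

// Return the bytes of a std::string as values 0..255.
// Printing static_cast<int>(c) for a signed char shows bytes >= 0x80 as
// negative numbers; converting through unsigned char gives 0..255 instead.
std::vector<int> byte_values(const std::string& s) {
    std::vector<int> out;
    out.reserve(s.size());
    for (char c : s) {
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}
```

For the string "\xB3\xF3\xBD\xDB" this returns 179, 243, 189, 219, the same values the question saw when marshalling to unsigned char*.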

I have an idea: you solved the problem by using an sbyte[] instead of a string in C#, so now that you are trying to marshal a string[], just use a List<sbyte[]> for the string array.
I am not experienced with C++, but I guess there are other string libraries you could use. This link shows which string types can be marshalled to C#: https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.unmanagedtype?view=net-7.0
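If each string travels as its own UTF-8 byte array, no code-page conversion happens and the native side only has to read NUL-terminated byte strings. P/Invoke does not marshal generic collections such as List<T> directly, so one way to get there is to copy each byte array into unmanaged memory and pass the pointers as an IntPtr[]. A minimal sketch of the receiving side under that assumption; `collect_utf8_paths` is a hypothetical name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical receiver: `paths` is assumed to hold `count` pointers, each to a
// NUL-terminated UTF-8 byte buffer (e.g. Encoding.UTF8.GetBytes(s + "\0") on the
// C# side). The bytes are copied as-is, so no ANSI code page is involved.
std::vector<std::string> collect_utf8_paths(const char* const* paths, int count) {
    std::vector<std::string> result;
    if (paths == nullptr || count <= 0) return result;
    result.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        // Treat a null entry as an empty string rather than dereferencing it.
        result.emplace_back(paths[i] != nullptr ? paths[i] : "");
    }
    return result;
}
```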

The issue was in the marshalling. I think it was because, as the data was transferred, the locale setting for the C++ DLL was GBK (or at least not UTF-8). The trick was to convert the incoming strings from GBK into UTF-8, which I was able to do with the following function:
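std::string gb_to_utf8(const char* src)
{
    // ANSI code page (GBK on this machine) -> UTF-16; a first call with no
    // output buffer returns the required length, including the terminator.
    int i = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (i == 0) {
        throw std::runtime_error("error converting");
    }
    std::vector<wchar_t> strA(i);
    MultiByteToWideChar(CP_ACP, 0, src, -1, strA.data(), i);
    // UTF-16 -> multibyte in the current C locale (UTF-8 once setlocale is set,
    // see below). A NULL destination makes wcstombs return the required size.
    size_t n = wcstombs(NULL, strA.data(), 0);
    if (n == (size_t)-1) {
        throw std::runtime_error("error converting");
    }
    std::string resStr(n, '\0');
    wcstombs(&resStr[0], strA.data(), n); // writes exactly n bytes
    return resStr;
}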
I also needed to call setlocale(LC_ALL, "en_US.UTF-8"); for the function above to work, since wcstombs converts according to the current C locale.
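The second step of that function (UTF-16 to UTF-8 via wcstombs) depends on the process-wide C locale. On Windows the usual locale-independent alternative is WideCharToMultiByte with CP_UTF8. For illustration, here is a portable sketch of that encoding step written by hand; `encode_utf8` is a hypothetical helper that takes code points, so UTF-16 surrogate pairs would have to be combined into code points first:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encode Unicode code points as UTF-8 without relying on setlocale/wcstombs.
std::string encode_utf8(const std::vector<std::uint32_t>& code_points) {
    std::string out;
    for (std::uint32_t cp : code_points) {
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            if (cp >= 0xD800 && cp <= 0xDFFF)
                throw std::invalid_argument("surrogate code point");
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {           // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::invalid_argument("code point out of range");
        }
    }
    return out;
}
```

Encoding U+4E11 and U+6854 reproduces exactly the six bytes 228,184,145,230,161,148 from the question.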

Related

(C#) AccessViolationException when getting char ** from C++ DLL

I've written a basic C++ library that gets data from an OPC UA server and formats it into an array of strings (char **). I've confirmed that it works standalone, but now I'm trying to call it from a C# program using DLLs/pInvoke and running into serious memory errors.
My C# main:
List<String> resultList = new List<string>();
IntPtr inArr = new IntPtr();
inArr = Marshal.AllocHGlobal(inArr);
resultList = Utilities.ReturnStringArray(/*data*/,inArr);
C# Helper functions:
public class Utilities{
[DllImport(//DllArgs- confirmed to be correct)]
private static extern void getTopLevelNodes(/*data*/, IntPtr inArr);
public static List<String> ReturnStringArray(/*data*/,IntPtr inArr)
{
getTopLevelNodes(/*data*/,inArr); // <- this is where the AccessViolationException is thrown
//functions that convert char ** to List<String>
//return list
}
And finally, my C++ DLL implementation:
extern "C" EXPORT void getTopLevelNodes(/*data*/,char **ret){
std::vector<std::string> results = std::vector<std::string>();
//code that fills vector with strings from server
ret = (char **)realloc(ret, sizeof(char *));
ret[0] = (char *)malloc(sizeof(char));
strcpy(ret[0], "");
int count = 0;
int capacity = 1;
for (auto string : results){
ret[count] = (char*)malloc(sizeof(char) * 2048);
strcpy(ret[count++], string.c_str());
if (count == capacity){
capacity *= 2;
ret = (char **)realloc(ret, sizeof(char *)*capacity + 1);
}
}
What this should do is initialize a List to hold the final result and an IntPtr to be populated as a char ** by the C++ DLL, which is then processed back in C# and formatted into a List. However, an AccessViolationException is thrown every time I call getTopLevelNodes from C#. What can I do to fix this memory issue? Is this the best way to pass an array of strings via interop?
Thank you in advance
Edit:
I'm still looking for more answers; if there's a simpler way to implement string array interop between C# and a DLL, please let me know!
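Two things in the C++ code above are likely causes of the crash: `ret` is passed by value, so the pointer returned by realloc never reaches the caller, and memory from Marshal.AllocHGlobal was not obtained from malloc, so passing it to realloc is undefined behavior. One common pattern is to let the DLL allocate everything, return it through a `char***` out-parameter, and export a matching free function. A minimal sketch; `get_strings` and `free_strings` are hypothetical names, and the string data is a stand-in:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical export: returns a malloc'd array of malloc'd, NUL-terminated
// strings through an out-parameter (char***), so the caller sees the pointer.
extern "C" int get_strings(char*** out_array, int* out_count) {
    // Stand-in data; the real code would fill this from the server.
    std::vector<std::string> results = {"node1", "node2", "node3"};
    *out_array = nullptr;
    *out_count = 0;
    char** arr = static_cast<char**>(std::calloc(results.size(), sizeof(char*)));
    if (arr == nullptr) return -1;
    for (std::size_t i = 0; i < results.size(); ++i) {
        // Allocate exactly what each string needs instead of a fixed 2048 bytes.
        arr[i] = static_cast<char*>(std::malloc(results[i].size() + 1));
        if (arr[i] == nullptr) {
            for (std::size_t k = 0; k < i; ++k) std::free(arr[k]);
            std::free(arr);
            return -1;
        }
        std::memcpy(arr[i], results[i].c_str(), results[i].size() + 1);
    }
    *out_array = arr;
    *out_count = static_cast<int>(results.size());
    return 0;
}

// Hypothetical companion export: frees with the same allocator that allocated.
extern "C" void free_strings(char** array, int count) {
    if (array == nullptr) return;
    for (int i = 0; i < count; ++i) std::free(array[i]);
    std::free(array);
}
```

On the C# side this would correspond to something like `static extern int get_strings(out IntPtr array, out int count);`, reading each entry with Marshal.ReadIntPtr and Marshal.PtrToStringAnsi, then calling free_strings once the strings have been copied.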
METHOD 1 - Advanced Struct Marshalling.
As opposed to marshalling a list, try creating a C# struct like this:
[StructLayout(LayoutKind.Sequential, Pack = 2)]
public struct StringData
{
public string [] mylist; /* maybe better yet byte[][] (never tried)*/
};
Now, in C#, marshal it like this:
IntPtr pnt = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(StringData))); // Allocate unmanaged memory for the structure
StringData theStringData = /*get the data*/;
Marshal.StructureToPtr(theStringData, pnt, false); // Place the structure into unmanaged memory
getTopLevelNodes(/* data */, pnt); // Call the DLL
theStringData = (StringData)Marshal.PtrToStructure(pnt, typeof(StringData)); // Get the structure back from unmanaged memory
Marshal.FreeHGlobal(pnt); // Free the unmanaged memory
Now in CPP:
#pragma pack(2)
/************CPP STRUCT**************/
struct StringDataCpp
{
char * strings[];
};
And the function:
extern "C" EXPORT void getTopLevelNodes(/*data*/,char *ret){ //just a byte pointer.
struct StringDataCpp *m = reinterpret_cast<struct StringDataCpp*>(ret);
// ... do your thing ...
}
I have used this pattern with much more complicated structs as well. The key is that you're just copying byte by byte from C# and interpreting byte by byte in C++.
The 'pack' is key here: it ensures the structs are aligned the same way in memory on both sides.
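To see what packing changes, compare the same two members with and without `#pragma pack(2)` (the C++ counterpart of `StructLayout(..., Pack = 2)`). A minimal sketch; the struct names are illustrative and the comments assume a 4-byte, 4-aligned int as on common x86/x64 compilers:

```cpp
#include <cstddef>

// With #pragma pack(2), no member is aligned to more than 2 bytes.
#pragma pack(push, 2)
struct Packed2 {
    char tag;   // offset 0
    int value;  // offset 2 (one byte of padding instead of three)
};
#pragma pack(pop)

// Default alignment: int is placed at its natural alignment.
struct Natural {
    char tag;   // offset 0
    int value;  // offset alignof(int), typically 4
};
```

If the C# and C++ sides disagree on packing, `value` is read from the wrong offset, which is why both declarations must use the same Pack value.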
METHOD 2 - Simple byte array with fixed
// Convert the string array to one byte buffer: the UTF-8 bytes of each string,
// separated by ';' (a delimiter for the C++ side to split on) and NUL-terminated.
// Assumes getTopLevelNodes is declared with a byte* parameter.
string[] stringlist = /* get your strings */;
byte[] theStringData = Encoding.UTF8.GetBytes(string.Join(";", stringlist) + "\0");
unsafe
{
    // fixed pins the array so the GC cannot move it during the native call
    fixed (byte* cp = theStringData)
    {
        getTopLevelNodes(/* data */, cp);
        /////...../////
    }
}
Now the C++ side just receives a char*, so you'll need a delimiter to separate the strings.
Note that your strings probably already have the delimiter '\0'; use a replace algorithm to change it to ';' or something similar, and then tokenize easily in a loop in C++ using strtok with ';' as the delimiter, or use Boost.
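On the C++ side, the split can also be done without strtok, which modifies the buffer in place and keeps hidden internal state. A minimal sketch, assuming a ';'-delimited, NUL-terminated buffer as produced by the C# snippet; `split_delimited` is a hypothetical helper, and it keeps empty fields between consecutive delimiters:

```cpp
#include <string>
#include <vector>

// Split a delimited, NUL-terminated buffer received from C# into strings.
// The input buffer is only read, never modified.
std::vector<std::string> split_delimited(const char* buffer, char delimiter = ';') {
    std::vector<std::string> parts;
    if (buffer == nullptr) return parts;
    const std::string all(buffer);
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type end = all.find(delimiter, start);
        if (end == std::string::npos) {
            parts.push_back(all.substr(start));  // last field
            break;
        }
        parts.push_back(all.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}
```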
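Or, if possible, try making an array of byte pointers:
byte*[] theStringStartPointers = new byte*[stringList.Length]; // in a for loop: theStringStartPointers[i] = pointer to the start of string i (each string's bytes must stay pinned too)
fixed (byte** cp = theStringStartPointers) /// Continue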
This way is much simpler. The unsafe block allows the fixed statement, and fixed ensures that the C# memory management (the garbage collector) does not move that data while the native code is using it.

Passing an array of strings from C# to C++ DLL function and fill it in the DLL and get back

I am writing a C# application that passes an empty string array of size 30 to a C++ DLL. This string array needs to be filled in the DLL and given back to the C# application.
In my code I observe memory corruption at the end of the function call into the DLL.
My C++ DLL code is as follows:
SAMPLEDLL_API BOOL InitExecution (wchar_t **paszStrings, int count)
{
for (int i = 0 ; i < count; i++)
{
mbstowcs(*(paszStrings + i), "Good",4);
//*(paszStrings + i) = "Good";
}
return TRUE;
}
My C# code is
string[] names = new[] { "Britto", "Regis" };
if (Wrapper1.InitExecution(ref names, names.Length) == 1)
MessageBox.Show("Passed");
[DllImport("MFCLibrary1.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern UInt32 InitExecution(ref string[] Names, int count);
To make your current approach work, you'd need to pass StringBuilder instances rather than strings. That's because the data is flowing from callee to caller: the strings are out parameters. And that means that the caller has to allocate the buffers for each string, and know how large the buffers need to be.
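The question's `mbstowcs(*(paszStrings + i), "Good", 4)` writes four wide characters and no terminator into buffers the DLL did not allocate and whose size it does not know. In the caller-allocates pattern, the caller also has to tell the callee how big each buffer is, and the callee must stay within that size. A minimal sketch of the native side; `fill_buffers` and its parameters are hypothetical:

```cpp
#include <cstddef>
#include <cwchar>

// Caller-allocates pattern: the caller owns `count` buffers of `capacity` wide
// characters each. The callee copies at most capacity - 1 characters and always
// writes the terminating L'\0'. Returns false if any copy had to be truncated.
bool fill_buffers(wchar_t** buffers, int count, std::size_t capacity, const wchar_t* value) {
    if (buffers == nullptr || capacity == 0) return false;
    bool complete = true;
    const std::size_t len = std::wcslen(value);
    for (int i = 0; i < count; ++i) {
        std::size_t n = len < capacity ? len : capacity - 1;  // room for terminator
        if (n < len) complete = false;
        std::wmemcpy(buffers[i], value, n);
        buffers[i][n] = L'\0';
    }
    return complete;
}
```

The BSTR approach that follows avoids this size negotiation entirely, because the callee allocates each string itself.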
It's much easier to use BSTR here. This allows you to allocate the strings in the native code, and have them deallocated in the managed code. That's because BSTR is allocated on the shared COM heap, and the p/invoke marshaller understands them. Making this slight change means that the caller does not need to know how large the strings are up front.
The code would look like this:
SAMPLEDLL_API BOOL InitExecution(BSTR* names, int count)
{
for (int i = 0 ; i < count; i++)
names[i] = SysAllocString(...);
return TRUE;
}
And on the C# side you write it like this:
[DllImport(@"mydll.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern bool InitExecution(
[Out] IntPtr[] names,
int count
);
And then you've got a bit of work to marshal from BSTR to C# string.
IntPtr[] namePtrs = new IntPtr[count];
InitExecution(namePtrs, namePtrs.Length);
string[] names = new string[namePtrs.Length];
for (int i = 0; i < namePtrs.Length; i++)
{
names[i] = Marshal.PtrToStringBSTR(namePtrs[i]);
Marshal.FreeBSTR(namePtrs[i]);
}

create fixed size string in a struct in C#?

I'm a newbie in C#. I want to create a struct in C# that contains string variables of a fixed size, for example DistributorId of size [20]. What is the exact way of giving a string a fixed size?
public struct DistributorEmail
{
public String DistributorId;
public String EmailId;
}
If you need fixed, preallocated buffers, String is not the correct datatype.
This type of usage would only make sense in an interop context, though; otherwise you should stick to strings.
You will also need to compile your assembly with unsafe code allowed.
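// requires: using System; using System.Runtime.InteropServices;
unsafe public struct DistributorEmail
{
    public fixed char DistributorId[20];
    public fixed char EmailID[20];
    public DistributorEmail(string dId)
    {
        this = default(DistributorEmail); // assign all fields before using them
        fixed (char* distId = DistributorId)
        {
            char[] chars = dId.ToCharArray();
            // Copy at most 20 chars so a longer dId cannot overrun the buffer
            Marshal.Copy(chars, 0, new IntPtr(distId), Math.Min(chars.Length, 20));
        }
    }
}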
If for some reason you are in need of fixed size buffers, but not in an interop context, you can use the same struct but without unsafe and fixed. You will then need to allocate the buffers yourself.
Another important point to keep in mind is that in .NET, sizeof(char) != sizeof(byte). A char is always 2 bytes (a UTF-16 code unit), even if the text is later marshalled as ANSI.
If you really need a fixed length, you can always use a char[] instead of a string. It's easy to convert to/from, if you also need string manipulation.
string s = "Hello, world";
char[] ca = s.ToCharArray();
string s1 = new string(ca);
Note that, aside from some special COM interop scenarios, you can always just use strings, and let the framework worry about sizes and storage.
You can create a new fixed length string by specifying the length when you create it.
string(char c, int count)
This code will create a new string of 40 characters in length, filled with the space character.
string newString = new string(' ', 40);
As a string extension method, this covers source strings that are both longer and shorter than the fixed length:
public static string ToFixedLength(this string inStr, int length)
{
    // Extension methods must be declared in a static class.
    if (inStr.Length >= length)
        return inStr.Substring(0, length);   // truncate (unchanged when equal)
    return inStr.PadRight(length);          // pad with spaces up to length
}

Sending UTF-8 chars C# to C does not work properly

I work with SQLite from C. I try to send UTF-8 characters to a .dll from a C# app, but every time it behaves differently. For example, sometimes it adds "değirmenci" and other times, with the same code, it adds "değirmencil", although I don't change the word. And sometimes it adds the same word again to the UNIQUE column (I think there is a character that is not visible, like 0x01 in ASCII).
Sorry about my English.
This is my C# code:
[DllImport("dllfile.dll", CharSet = CharSet.Unicode)]
static void Main()
{
byte[] bytes = System.Text.Encoding.UTF8.GetBytes("değirmenci");
int r;
//
IntPtr unmanagedPointer = Marshal.AllocHGlobal(bytes.Length);
Marshal.Copy(bytes, 0, unmanagedPointer, bytes.Length);
IntPtr ch = Tahmin_Baslat();
r = Sozcuk_Ekle(unmanagedPointer);
Console.WriteLine(r);
Console.Read();
//
}
and this is my C code
int Sozcuk_Ekle(const char* kok,int tip_1=1,int tip_2=0,int tip_3=0)
{
sqlite3 *ch;
int rc;
char *HataMsj = 0;
rc = sqlite3_open(veritabani, &ch); // Open the database
if( rc )
{
return HATA_DEGERI;
}
char buff[strlen(kok) + 64];
sprintf(buff,"INSERT INTO kokler (kok,tip_1,tip_2,tip_3) VALUES('%s',%d,%d,%d)",kok,tip_1,tip_2,tip_3); // Build the SQL command
sqlite3_exec(ch,buff,GeriBildirim,0,&HataMsj); // Execute the command
sqlite3_close(ch); // Release the database resources
return DOGRU_DEGERI; // Return success
}
(header files etc. included)
And this is what happens:
http://i.stack.imgur.com/BJ4fE.png
Solution
Add a NULL terminator to the end of the bytes, like this:
byte[] bytes = System.Text.Encoding.UTF8.GetBytes("değirmenci\0");
Check the calling convention in the DllImport attribute (it should be Cdecl), and add a NULL terminator to your UTF-8 string:
byte[] bytes = System.Text.Encoding.UTF8.GetBytes("değirmenci" + '\0');
This adds a NULL terminator to the resulting UTF-8 bytes (.NET strings themselves don't need one).
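Without the terminator, strlen and sprintf in the C code keep reading past the 11 bytes that were allocated, which is how stray characters like the trailing "l" end up in the database. An alternative to relying on a terminator is to pass the byte count along with the pointer; SQLite's sqlite3_bind_text likewise accepts an explicit byte length, which would also avoid building the SQL with sprintf. A minimal sketch of the length-based idea; `utf8_from_buffer` is a hypothetical helper, not part of SQLite:

```cpp
#include <cstddef>
#include <string>

// Build a std::string from a byte buffer of known length. Exactly `len` bytes
// are copied, and std::string supplies its own '\0' for c_str(), so the
// buffer coming from C# does not need a terminator.
std::string utf8_from_buffer(const unsigned char* bytes, std::size_t len) {
    if (bytes == nullptr) return std::string();
    return std::string(reinterpret_cast<const char*>(bytes), len);
}
```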

passing char pointer array of c++ dLL to c# string

Consider the following C++ code:
#include "stdafx.h"
#include<iostream>
using namespace std;
This part I want in C#:
void ping(int,char* d[]);
void ping(int a,char *b[])
{
int size;
size=sizeof(b)/sizeof(int); // total size of array/size of array data type
//cout<<size;
for(int i=0;i<=size;i++)
cout<<"ping "<<a<<b[i]<<endl;
}
and the part below stays in C++:
int _tmain(int argc, _TCHAR* argv[])
{
void (*funcptr)(int,char* d[]);
char* c[]={"a","b"};
funcptr= ping;
funcptr(10,c);
return 0;
}
How can I implement the same in C#?
I'm new to C#. How can I have a char pointer array in C#?
You usually avoid char* or char[] in favor of the string class. Rather than having a char* d[], you would have a string[] d instead, if you want an array of strings, or a simple string d if you want a single list of characters.
Interop between C++ and C# is always tricky. Some good references include Pass C# string to C++ and pass C++ result (string, char*.. whatever) to C# and Using arrays and pointers in C# with C DLL.
A string is a list of characters. Since you mention character manipulation and use loops, I'm assuming your concern is with targeting particular characters in one list/array; in that sense you can write almost identical code when accessing particular characters of a string (as if it were a char array).
For example:
string testString = "hello";
char testChar = testString[2];
testChar, in this case, will be equal to 'l'.
Firstly, your "C++" code is actually C, and bad C at that: it won't execute correctly at all. sizeof(b) will not give you the size of the array or anything like it; it will give you the size of a pointer. Consider writing some correct C++ before attempting to convert to C#.
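#include <array>
#include <iostream>
#include <string>

// N must be std::size_t to match std::array's parameter, or deduction fails.
template<std::size_t N> void ping(int a, const std::array<std::string, N>& arr) {
    for (const std::string& s : arr) std::cout << a << s << "\n";
}
int main() {
    std::array<std::string, 2> c = { "a", "b" };
    ping(10, c);
    return 0;
}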
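The sizeof problem in the question can be checked directly: a parameter declared as `char* b[]` is adjusted to `char** b`, so sizeof(b) inside the function is the size of a pointer. The count has to travel with the array, either as a separate argument or deduced from a real array type. A minimal sketch; `param_size` and `array_count` are hypothetical helpers:

```cpp
#include <cstddef>

// `char* b[]` as a parameter is really `char** b`, so this returns the size
// of a pointer, not the size of the caller's array (compilers may warn here).
std::size_t param_size(char* b[]) {
    return sizeof(b);
}

// Taking the array by reference keeps its type, so N can be deduced.
template <std::size_t N>
constexpr std::size_t array_count(char* (&)[N]) {
    return N;
}
```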
C# doesn't have statically sized arrays, but the rest is easily done:
public static void ping(int a, string[] arr) {
    foreach (string s in arr) {
        System.Console.WriteLine($"{a}{s}");
    }
}
public static void Main(string[] args) {
    string[] arr = { "a", "b" };
    ping(10, arr);
}
This should help you, although note that the size of the output buffer is fixed so this won't work for dynamic sized strings, you need to know the size beforehand.
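// requires: using System; using System.Runtime.InteropServices; using System.Text;
// C++ char* points to 1-byte chars, so byte* is the matching C# type (C# char is 2 bytes).
public unsafe static void Foo(byte*[] input)
{
    for (int i = 0; i < input.Length; i++)
    {
        IntPtr ptr = new IntPtr(input[i]);
        byte[] output = new byte[5]; //NOTE: Size is fixed
        Marshal.Copy(ptr, output, 0, output.Length);
        Console.WriteLine(Encoding.ASCII.GetString(output));
    }
}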
Remember to allow unsafe code, since that's the only way you can use fixed pointers in C# (Right-click project, properties, build, allow unsafe code).
Next time be more specific and clear, and try to act more respectfully towards people here, we're not getting paid to help you you know :-)
We can do it as follows.
In the DLL we have:
extern "C" __declspec(dllexport) void __stdcall Caller()
{
static char* myArray[3];
myArray[0]="aasdasdasdasdas8888";
myArray[1]="sssss";
FunctionPtr1(2,myArray);
}
and in C# I just added the following lines:
// requires: using System; using System.Collections.Generic; using System.Runtime.InteropServices;
public static void ping(int a, IntPtr x)
{
    Console.WriteLine("Entered Ping Command");
    // Read the char* array from the DLL until the first null pointer
    var s = new List<string>(); // grows as needed instead of a fixed 100 slots
    Console.WriteLine("content of array");
    Console.WriteLine("");
    for (int j = 0; ; j++)
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one
        IntPtr p = Marshal.ReadIntPtr(x, IntPtr.Size * j);
        if (p == IntPtr.Zero) break;
        string str = Marshal.PtrToStringAnsi(p);
        s.Add(str);
        Console.WriteLine(str);
    }
    Console.WriteLine("Integer value received from DLL is " + a);
}
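Reading until a null pointer only works because the array ends with one: in the DLL snippet, `myArray` is static, so the unassigned `myArray[2]` is zero-initialized and acts as the terminator. The same sentinel convention on the C++ side looks like this; `count_until_null` is a hypothetical helper:

```cpp
#include <cstddef>

// Count the entries of a NULL-terminated array of C strings
// (the sentinel convention the C# loop relies on).
std::size_t count_until_null(const char* const* arr) {
    if (arr == nullptr) return 0;
    std::size_t n = 0;
    while (arr[n] != nullptr) ++n;
    return n;
}
```

If the array were not static (or had no spare slot), the last element would have to be set to nullptr explicitly, or the count passed alongside the pointer.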
