Marshalling a C++ Encoded String to C#

I am trying to read in a structure that was written to a file by a C++ program (I don't have the source code). I have been trying to read this structure into C# and marshal it so far without success.
The structure is basically a set of fixed-length strings, two bytes per character. In C++, they can be declared as TCHAR[8].
The data on disk looks as follows:
I have tried the following C# code that I know can successfully read in the values as a string:
public void ReadTwoStringsOfFixedLength()
{
    string field1 = string.Empty;
    string field2 = string.Empty;

    FileReadString(handle, out field1, 16);
    FileReadString(handle, out field2, 16);
}

public static void FileReadString(BinaryReader reader, out string outVal, int length)
{
    var mem = new MemoryStream();
    outVal = string.Empty;
    byte b = 0;

    for (int i = 0; i < length; i++)
    {
        b = reader.ReadByte();
        if (b != 0) mem.WriteByte(b);
    }

    outVal = Encoding.GetEncoding(1252).GetString(mem.ToArray());
}
However, what I really would like to do is use C# structs, since this data is represented as a struct in C++ (and contains other fields which I have not depicted here).
I have tried various methods of marshalling this data based on answers I have read on StackOverflow, but none have yielded the result I wanted. In most cases, either the string encoding was incorrect, I ended up with a memory exception, or I ended up with only the first character in each field (probably due to null termination?).
Here is my code:
void Main()
{
    byte[] abBuffer = handle.ReadBytes(Marshal.SizeOf(typeof(MyStruct)));

    //Access data
    GCHandle pinnedPacket = GCHandle.Alloc(abBuffer, GCHandleType.Pinned);
    var atTestStruct = (MyStruct)Marshal.PtrToStructure(pinnedPacket.AddrOfPinnedObject(), typeof(MyStruct));
}
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi), Serializable]
struct MyStruct
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 16)]
    string Field1; // Resulting value = "F"

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 16)]
    string Field2; // Resulting value = "F"
}
Note that I have also attempted to use CharSet.Unicode; however, the resulting strings are garbled.
Any help to fix the code would be much appreciated.

I think you need to set CharSet = CharSet.Unicode on your StructLayout.
46 00 69 00 in ASCII/ANSI is considered a single character and a null terminator. The documentation shows that CharSet.Unicode is needed for two-byte characters, such as those you're showing.
The SizeConst value must also be the number of characters, not bytes.
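Putting both points together, the declaration would look something like this (a sketch; SizeConst = 8 because the on-disk fields are TCHAR[8], i.e. 8 characters / 16 bytes each):

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode), Serializable]
struct MyStruct
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)] // 8 characters = 16 bytes on disk
    public string Field1;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
    public string Field2;
}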

Related

Marshalling utf8 encoded chinese characters from C# to C++

I'm marshaling some Chinese characters which have the decimal representation (utf8) as
228,184,145,230,161,148
however when I receive this in C++ I end up with the chars
-77,-13,-67,-37
I can solve this by using sbyte[] instead of string in C#, but now I'm trying to marshal a string[], so I can't use that method. Does anyone have an idea as to why this is happening?
EDIT: more detailed code:
C#
[DllImport("mydll.dll",CallingConvention=CallingConvention.Cdecl)]
static extern IntPtr inputFiles(IntPtr pAlzObj, string[] filePaths, int fileNum);
string[] allfiles = Directory.GetFiles("myfolder", "*.jpg", SearchOption.AllDirectories);
string[] allFilesutf8 = allfiles.Select(i => Encoding.UTF8.GetString(Encoding.Default.GetBytes(i))).ToArray();
IntPtr pRet = inputFiles(pObj, allfiles, allfiles.Length);
C++
extern __declspec(dllexport) char* inputFiles(Alz* pObj, char** filePaths, int fileNum);

char* massAdd(Alz* pObj, char** filePaths, int fileNum)
{
    if (pObj != NULL) {
        try {
            std::vector<const char*> imgPaths;
            for (int i = 0; i < fileNum; i++)
            {
                char* s = *(filePaths + i);
                //Here I would print out the string and the result in bytes (decimal representation) is already different.
                imgPaths.push_back(s);
            }
            string ret = pAlzObj->myfunc(imgPaths);
            const char* retTemp = ret.c_str();
            char* retChar = _strdup(retTemp);
            return retChar;
        }
        catch (const std::runtime_error& e) {
            cout << "some runtime error " << e.what() << endl;
        }
    }
}
Also, something I found is that if I change the Windows system encoding (in language settings) to use Unicode UTF-8, it works fine. Not sure why, though.
When marshaling to unsigned char* (or unsigned char**, since it's an array) I end up with different output, which is literally just 256 plus the numbers shown for char: 179, 243, 189, 219. This leads me to believe something is happening during marshaling rather than a conversion mistake on the C++ side of things.
That is because C++ strings use plain char for storage. The char type is signed here, and that makes those values be interpreted as negative ones.
As far as I know, the character traits are handled inside the <xstring> header on Windows, specifically in:
_STD_BEGIN
template <class _Elem, class _Int_type>
struct _Char_traits { // properties of a string or stream element
    using char_type  = _Elem;
    using int_type   = _Int_type;
    using pos_type   = streampos;
    using off_type   = streamoff;
    using state_type = _Mbstatet;
#if _HAS_CXX20
    using comparison_category = strong_ordering;
#endif // _HAS_CXX20
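A quick C# illustration of that reinterpretation, using the GBK bytes from the question viewed as signed values:

byte[] gbk = { 179, 243, 189, 219 };         // what unsigned char* shows
foreach (byte b in gbk)
    Console.WriteLine($"{b} -> {(sbyte)b}"); // 179 -> -77, 243 -> -13, 189 -> -67, 219 -> -37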
I have an idea: you solved the problem by using sbyte[] instead of string in C#, and now you are trying to marshal a string[]; just use List<sbyte[]> for the string array.
I am not experienced with C++, but I guess there are other string types you could use. This link shows which string types can be marshalled to C#: https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.unmanagedtype?view=net-7.0
The issue was in the marshaling. I think it was because, as the data is transferred, the locale setting in the C++ DLL was set to GBK (at least not UTF-8). The trick was to convert the incoming strings from GBK to UTF-8, which I was able to do with the following function:
std::string gb_to_utf8(char* src)
{
    int i = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    wchar_t* strA = (wchar_t*)malloc(i * sizeof(wchar_t));
    MultiByteToWideChar(CP_ACP, 0, src, -1, strA, i);
    if (!wcslen(strA)) {
        free(strA);
        throw std::runtime_error("error converting");
    }
    char utf8[1024]; // Unsure how long the converted string could be; set a large size
    size_t n = wcstombs(utf8, strA, sizeof(utf8));
    std::string resStr = utf8;
    free(strA);
    return resStr;
}
Also needed to set setlocale(LC_ALL, "en_US.UTF-8"); in order for the function above to work.
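For completeness, an alternative (untested sketch, not from the original answer) is to avoid the default ANSI string marshalling entirely by building null-terminated UTF-8 buffers on the C# side and passing them as raw pointers:

// Declaration changed to take IntPtr[] instead of string[] (hypothetical variant).
[DllImport("mydll.dll", CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr inputFiles(IntPtr pAlzObj, IntPtr[] filePaths, int fileNum);

static IntPtr[] ToUtf8Pointers(string[] paths)
{
    var ptrs = new IntPtr[paths.Length];
    for (int i = 0; i < paths.Length; i++)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(paths[i] + "\0"); // explicit terminator
        ptrs[i] = Marshal.AllocHGlobal(utf8.Length);
        Marshal.Copy(utf8, 0, ptrs[i], utf8.Length);
    }
    return ptrs;
}

Each pointer must be released with Marshal.FreeHGlobal after the native call returns.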

(C#) AccessViolationException when getting char ** from C++ DLL

I've written a basic C++ library that gets data from an OPC UA server and formats it into an array of strings (char **). I've confirmed that it works standalone, but now I'm trying to call it from a C# program using DLLs/pInvoke and running into serious memory errors.
My C# main:
List<String> resultList = new List<string>();
IntPtr inArr = new IntPtr();
inArr = Marshal.AllocHGlobal(inArr);
resultList = Utilities.ReturnStringArray(/*data*/,inArr);
C# Helper functions:
public class Utilities
{
    [DllImport(//DllArgs- confirmed to be correct)]
    private static extern void getTopLevelNodes(/*data*/, IntPtr inArr);

    public static List<String> ReturnStringArray(/*data*/, IntPtr inArr)
    {
        getTopLevelNodes(/*data*/, inArr); // <- this is where the AccessViolationException is thrown
        //functions that convert char ** to List<String>
        //return list
    }
And finally, my C++ DLL implementation:
extern "C" EXPORT void getTopLevelNodes(*/data*/,char **ret){
std::vector<std::string> results = std::vector<std::string>();
//code that fills vector with strings from server
ret = (char **)realloc(ret, sizeof(char *));
ret[0] = (char *)malloc(sizeof(char));
strcpy(ret[0], "");
int count = 0;
int capacity = 1;
for (auto string : results){
ret[count] = (char*)malloc(sizeof(char) * 2048);
strcpy(ret[count++], string.c_str());
if (count == capacity){
capacity *= 2;
ret = (char **)realloc(ret, sizeof(char *)*capacity + 1);
}
}
What this should do is initialize a List to hold the final result and an IntPtr to be populated as a char ** by the C++ DLL, which is then processed back in C# and formatted into a List. However, an AccessViolationException is thrown every time I call getTopLevelNodes from C#. What can I do to fix this memory issue? Is this the best way to pass an array of strings via interop?
Thank you in advance
Edit:
I'm still looking for more answers, if there's a simpler way to implement string array interop between C# and a DLL, please, let me know!
METHOD 1 - Advanced Struct Marshalling.
As opposed to marshalling a list, try creating a C# struct like this:
[StructLayout(LayoutKind.Sequential, Pack = 2)]
public struct StringData
{
    public string[] mylist; /* maybe better yet byte[][] (never tried) */
};
Now marshal it like this in C#:
IntPtr pnt = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(StringData))); // into unmanaged space

// Get a pointer to the structure.
StringData theStringData = /*get the data*/;
Marshal.StructureToPtr(theStringData, pnt, false); // place structure into unmanaged space

getTopLevelNodes(/* data */, pnt); // call dll

theStringData = (StringData)Marshal.PtrToStructure(pnt, typeof(StringData)); // get structure back from unmanaged space
Marshal.FreeHGlobal(pnt); // free shared mem
Now in CPP:
#pragma pack(2)
/************CPP STRUCT**************/
struct StringDataCpp
{
char* strings[];
};
And the function:
extern "C" EXPORT void getTopLevelNodes(/*data*/,char *ret){ //just a byte pointer.
struct StringDataCpp *m = reinterpret_cast<struct StringDataCpp*>(ret);
//..do ur thing ..//
}
I have used this pattern with much more complicated structs as well. The key is that you're just copying byte by byte from C# and interpreting byte by byte in C++.
The 'pack' is key here, to ensure the structs align the same way in memory.
METHOD 2 - Simple byte array with fixed
//USE YOUR LIST, EXCEPT List<byte>.
//SNIPPET TO CONVERT THE STRING ARRAY TO A BYTE ARRAY, WITH A DELIMITER FOR CPP TO SPLIT ON
string[] stringlist = (/* get your strings */);
byte[] theStringData = Encoding.ASCII.GetBytes(string.Join(";", stringlist));

unsafe
{
    fixed (byte* cp = theStringData)
    {
        getTopLevelNodes(/* data */, cp);
        /////...../////
    }
}
Now CPP just receives a char*. You'll need a delimiter to separate the strings.
Note that your strings probably already contain the delimiter '\0'; use a replace algorithm to swap that for ';' or something similar, then tokenize easily in a loop in CPP using strtok with ';' as the delimiter, or use Boost!
Or, try making an array of byte pointers if possible:
byte*[] theStringStartPointers = /* filled with &stringList[i] in a for loop */;
fixed (byte** cp = theStringStartPointers) { /* continue */ }
This way is much simpler. The unsafe block allows the fixed block, and fixed ensures that the C# memory management mechanism does not move that data.
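If the native side can instead return a char** plus a count, the array can also be walked manually on the C# side without any struct tricks. A hedged sketch (the export name and signature here are hypothetical, not the ones from the question):

// Hypothetical export: native code allocates the char** array and reports its length.
[DllImport("mydll.dll", CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr getTopLevelNodesAlloc(out int count);

static List<string> ReadStringArray()
{
    IntPtr arrayPtr = getTopLevelNodesAlloc(out int count);
    var result = new List<string>(count);
    for (int i = 0; i < count; i++)
    {
        // Each entry is a pointer to a null-terminated ANSI string.
        IntPtr strPtr = Marshal.ReadIntPtr(arrayPtr, i * IntPtr.Size);
        result.Add(Marshal.PtrToStringAnsi(strPtr));
    }
    return result;
}

// The native side would also need to export a matching free function so the
// caller can release arrayPtr and the individual strings afterwards.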

create fixed size string in a struct in C#?

I'm a newbie in C#. I want to create a struct in C# which consists of string variables of fixed size, for example DistributorId of size 20. What is the exact way of giving the string a fixed size?
public struct DistributorEmail
{
public String DistributorId;
public String EmailId;
}
If you need fixed, preallocated buffers, String is not the correct datatype.
This type of usage would only make sense in an interop context though, otherwise you should stick to Strings.
You will also need to compile your assembly with 'Allow unsafe code' enabled.
unsafe public struct DistributorEmail
{
    public fixed char DistributorId[20];
    public fixed char EmailID[20];

    public DistributorEmail(string dId)
    {
        fixed (char* distId = DistributorId)
        {
            char[] chars = dId.ToCharArray();
            Marshal.Copy(chars, 0, new IntPtr(distId), chars.Length);
        }
    }
}
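For reading the fixed buffer back into a managed string, a possible sketch (not part of the original answer; assumes the buffer is NUL-padded):

unsafe static string GetDistributorId(DistributorEmail e)
{
    var sb = new StringBuilder();
    for (int i = 0; i < 20 && e.DistributorId[i] != '\0'; i++)
        sb.Append(e.DistributorId[i]); // stop at the first NUL
    return sb.ToString();
}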
If for some reason you are in need of fixed size buffers, but not in an interop context, you can use the same struct but without unsafe and fixed. You will then need to allocate the buffers yourself.
Another important point to keep in mind is that in .NET, sizeof(char) != sizeof(byte). A char is 2 bytes, even if the text it came from was encoded in ANSI.
If you really need a fixed length, you can always use a char[] instead of a string. It's easy to convert to/from, if you also need string manipulation.
string s = "Hello, world";
char[] ca = s.ToCharArray();
string s1 = new string(ca);
Note that, aside from some special COM interop scenarios, you can always just use strings, and let the framework worry about sizes and storage.
You can create a new fixed length string by specifying the length when you create it.
string(char c, int count)
This code will create a new string of 40 characters in length, filled with the space character.
string newString = new string(' ', 40);
As a string extension method; this covers source strings both longer and shorter than the fixed length:
public static string ToFixedLength(this string inStr, int length)
{
    if (inStr.Length == length)
        return inStr;

    if (inStr.Length > length)
        return inStr.Substring(0, length);

    var blanks = Enumerable.Range(1, length - inStr.Length).Select(v => " ").Aggregate((a, b) => $"{a}{b}");
    return $"{inStr}{blanks}";
}
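For example, with a fixed length of 20:

"ShortId".ToFixedLength(20);                        // "ShortId" padded with spaces to 20 chars
"AVeryLongDistributorIdentifier".ToFixedLength(20); // truncated to "AVeryLongDistributor"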

read string and put contents to struct

I receive a string on the PC that is basically several shorts sent byte by byte. I need to put that string into a struct. For example, I need to put Hello! into this struct:
public struct serialPacket
{
    public ushort first;
    public ushort second;
    public ushort third;
}
to get it like this:
temp.first=0x6548;
temp.second=0x6c6c;
temp.third=0x216f;
I'm not very sure about endianness, but that doesn't matter right now.
I am really frustrated, because in C/C++ it could easily be done with a little help from pointers, but I don't know how to do it in C#.
I'm using Marshal to handle this, but I get some junk as a result:
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct serialPacket
{
    [MarshalAs(UnmanagedType.U2)]
    public ushort first;
    [MarshalAs(UnmanagedType.U2)]
    public ushort second;
    [MarshalAs(UnmanagedType.U2)]
    public ushort third;
}
...
IntPtr pBuf = Marshal.StringToBSTR(indata);
serialPacket ms = (serialPacket)Marshal.PtrToStructure(pBuf, typeof(serialPacket));
Marshal.FreeBSTR(pBuf);
result:
1st 101
2nd 72
3rd 108
It looks like marshalling tears the ushorts apart into single bytes. The string itself is received complete, in one single shot, and it arrives with a 0x0D at the end as a newline.
This worked for me:
temp.first = 0x6548;
temp.second = 0x6c6c;
temp.third = 0x216f;

Func<ushort, string> conv = u =>
{
    var bs = BitConverter.GetBytes(u);
    return System.Text.ASCIIEncoding.ASCII.GetString(bs);
};

var query =
    from u in new[] { temp.first, temp.second, temp.third, }
    select conv(u);

var result = String.Join("", query);
// result == "Hello!"
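For the direction the question actually asks about (string into struct), a minimal sketch, assuming the incoming data is plain ASCII and the trailing 0x0D has been stripped first:

string indata = "Hello!";
byte[] bytes = Encoding.ASCII.GetBytes(indata.TrimEnd('\r'));
var temp = new serialPacket
{
    first  = BitConverter.ToUInt16(bytes, 0),
    second = BitConverter.ToUInt16(bytes, 2),
    third  = BitConverter.ToUInt16(bytes, 4)
};
// On a little-endian machine: temp.first == 0x6548, temp.second == 0x6c6c, temp.third == 0x216f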

What is the fastest (possibly unsafe) way to read a byte[]?

I'm working on a server project in C#, and after a TCP message is received, it is parsed, and stored in a byte[] of exact size. (Not a buffer of fixed length, but a byte[] of an absolute length in which all data is stored.)
Now for reading this byte[] I'll be creating some wrapper functions (also for compatibility), these are the signatures of all functions I need:
public byte ReadByte();
public sbyte ReadSByte();
public short ReadShort();
public ushort ReadUShort();
public int ReadInt();
public uint ReadUInt();
public float ReadFloat();
public double ReadDouble();
public string ReadChars(int length);
public string ReadString();
The string is a \0 terminated string, and is probably encoded in ASCII or UTF-8, but I cannot tell that for sure, since I'm not writing the client.
The data consists of:
byte[] _data;
int _offset;
Now I can write all those functions manually, like this:
public byte ReadByte()
{
    return _data[_offset++];
}

public sbyte ReadSByte()
{
    byte r = _data[_offset++];
    if (r >= 128) return (sbyte)(r - 256);
    else return (sbyte)r;
}

public short ReadShort()
{
    byte b1 = _data[_offset++];
    byte b2 = _data[_offset++];
    if (b1 >= 128) return (short)(b1 * 256 + b2 - 65536);
    else return (short)(b1 * 256 + b2);
}

public ushort ReadUShort()
{
    byte b1 = _data[_offset++];
    return (ushort)(b1 * 256 + _data[_offset++]);
}
But I wonder if there's a faster way, not excluding the use of unsafe code, since this seems to cost too much time for simple processing.
Check out the BitConverter class, in the System namespace. It contains methods for turning parts of byte[]s into other primitive types. I've used it in similar situations and have found it suitably quick.
As for decoding strings from byte[]s, use the classes that derive from the Encoding class, in the System.Text namespace, specifically the GetString(byte[]) method (and its overloads).
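For instance, the wrappers could look roughly like this (a sketch; note that BitConverter uses the machine's byte order, little-endian on most platforms, while the hand-rolled ReadShort in the question treats the first byte as the high byte, so pick whichever matches the wire format):

public short ReadShort()
{
    short v = BitConverter.ToInt16(_data, _offset);
    _offset += sizeof(short);
    return v;
}

public string ReadString()
{
    int end = Array.IndexOf(_data, (byte)0, _offset);           // find the \0 terminator
    string s = Encoding.UTF8.GetString(_data, _offset, end - _offset);
    _offset = end + 1;                                          // skip past the terminator
    return s;
}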
One way is to map the contents of the array to a struct (providing your structure is indeed static):
http://geekswithblogs.net/taylorrich/archive/2006/08/21/88665.aspx
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack=1)]
struct Message
{
    public int id;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=50)]
    public string text;
}

void OnPacket(byte[] packet)
{
    GCHandle pinnedPacket = GCHandle.Alloc(packet, GCHandleType.Pinned);
    Message msg = (Message)Marshal.PtrToStructure(
        pinnedPacket.AddrOfPinnedObject(),
        typeof(Message));
    pinnedPacket.Free();
}
You could use a BinaryReader.
A BinaryReader is a stream decorator so you would have to wrap the byte[] in a MemoryStream or attach the Reader directly to the network stream.
And then you have
int ReadInt32()
char[] ReadChars(int count)
etc.
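A minimal sketch of that approach, wrapping the received array in a MemoryStream:

using (var reader = new BinaryReader(new MemoryStream(_data)))
{
    byte b   = reader.ReadByte();
    int  i   = reader.ReadInt32();
    string s = new string(reader.ReadChars(8)); // fixed-length character field
}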
Edit: Apparently you want 'faster execution'.
That means you are looking for an optimization in the conversion(s) from byte[], after those bytes have been received over (network) I/O.
In other words, you are trying to optimize the part that only takes up (an estimated) 0.1% of the time. Totally futile.
