Char pointer equivalent in C#

I am converting C++ code into C#. A char in C++ takes 8 bits, while in C# it takes 16 bits, and I don't know what to do about char*. So:
What is the equivalent of char* in C#? Do I use byte[] or [MarshalAs(UnmanagedType.LPStr)] StringBuilder? And also, is the equivalent of a C++ char in C# a byte or a string?

For an input parameter it could be a string or a byte[], depending on the meaning of the parameter. If it represents a sequence of characters then use string. If the parameter is a buffer to some arbitrary data then it's most likely a byte[].
However in C/C++ a char * can also be an output parameter, such as in the sprintf function. In that case a StringBuilder or a byte[] would be the equivalent types, depending again on the meaning of the parameter.
With regard to the char datatype in C#, please keep in mind that a char in C# means a character, whereas the meaning in C/C++ is closer to that of a byte in C#.
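As a rough illustration, here is a minimal P/Invoke sketch against the C runtime; the choice of msvcrt functions is purely for demonstration:

using System;
using System.Runtime.InteropServices;
using System.Text;

static class NativeMethods
{
    // char* as an input string parameter: int puts(const char* s);
    [DllImport("msvcrt.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
    public static extern int puts(string s);

    // char* as an output buffer: char* strcpy(char* dest, const char* src);
    // A StringBuilder receives the characters the native code writes.
    [DllImport("msvcrt.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr strcpy(StringBuilder dest, string src);
}

// Usage:
// var buffer = new StringBuilder(64);    // capacity must cover the expected output
// NativeMethods.strcpy(buffer, "hello");
// Console.WriteLine(buffer);             // hello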

I can't quite understand the question, but C# has a Char type too: http://www.dotnetperls.com/char

It depends:
char *ptr should be replaced with string if it acts as a string container (as it is commonly used in C).
If what matters is the numeric value in it, bounded to the signed char range [-128, 127], use char[] or byte[], e.g. char[] array1 = { 'b', 'c', 'a' };
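A short sketch of both mappings (the sbyte variant is my addition, for when the C++ code relies on signed char values):

// char* used as text:
string name = "abc";

// char* used as a small numeric buffer:
byte[] raw = { (byte)'b', (byte)'c', (byte)'a' };  // unsigned bytes, 0..255
sbyte[] signedRaw = { -128, 0, 127 };              // matches signed char's [-128, 127] range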

Related

C# LPUTF8Str string marshaling does not appear to read string correctly from memory

LPUTF8Str string marshaling in C# simply does not work for me. I feel like I must be misunderstanding its use case, but after poring over the documentation and doing various other tests, I'm not sure what I'm doing wrong.
Context
First of all, to state my base (possibly incorrect) understanding of character encodings and why C# needs to convert them, in case something is wrong here:
Standard C/C++ strings (const char* and std::string respectively) use single-byte characters by default, on Windows and elsewhere. You can have strings with two-byte characters, but these are only used if you choose to use std::wstring (which I am not doing).
The default Windows single-byte character encoding is ANSI (7-bit ASCII + an extra set of characters that uses the 8th bit).
Unicode is the mapping of printable characters to code points (i.e. to unique numbers). Strings of Unicode code points are commonly encoded using conventions such as:
UTF-8: mostly one byte per character for English, with special bytes specifying where a chain of more than one byte should form a single character (for the more funky ones). 7-bit ASCII is a subset of the UTF-8 encoding.
UTF-16: two bytes per character, with similar (but rarer) continuation patterns for the really funky characters.
UTF-32: four bytes per character, which is basically never used for English and adjacent languages because it's not a very memory-efficient encoding.
To write non-ASCII characters in C/C++ strings, you can encode the literal UTF-8 bytes using \xhh, where hh is the hex encoding of the bytes. E.g. "\xF0\x9F\xA4\xA0" equates to "🤠" (see the quick C# check after this list).
C# encodes all managed strings using two-byte characters - I'm unsure if this is explicitly UTF-16, or some other Microsoft encoding. When a C/C++ string is passed to C#, it needs to be converted from single-byte (narrow) characters to two-byte (wide) characters.
Microsoft abuses the term "Unicode". They refer to two-byte character strings as "Unicode strings" in the C# documentation, thereby implying (incorrectly) that any strings that aren't two bytes per character are not Unicode. As we know from the UTF-8 encoding, this is not necessarily true - just because a string is represented as a const char* does not mean that it is not formed of Unicode characters. Colour me "\xF0\x9F\x98\x92" => "😒"
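As a sanity check on the encoding points above, decoding those same four bytes in C# reproduces the character (a minimal snippet, assuming a console that can display the emoji):

byte[] utf8 = { 0xF0, 0x9F, 0xA4, 0xA0 };             // raw bytes of "\xF0\x9F\xA4\xA0"
string s = System.Text.Encoding.UTF8.GetString(utf8); // decode UTF-8 -> UTF-16 string
Console.WriteLine(s);                                 // 🤠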
The actual issue
So a C++ program must expose strings to C# using const char* pointers, and a C# application must marshal these strings by converting them to wide characters. Let's say I have the following C++ function, which, for the sake of demonstrating C# marshaling, passes data out via a struct:
// Header:
extern "C"
{
    struct Library_Output
    {
        const char* str;
    };

    API_FUNC void Library_GetString(Library_Output* out);
}

// Source:
extern "C"
{
    void Library_GetString(Library_Output* out)
    {
        if ( out )
        {
            // Static string literal:
            out->str = "This is a UTF-8 string. \xF0\x9F\xA4\xA0";
        }
    }
}
In C#, I call the function like so:
public class Program
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct Library_Output
    {
        // This is where the marshaling type is defined.
        // C# will convert the const char* pointer to
        // a string automatically.
        [MarshalAs(UnmanagedType.LPUTF8Str)]
        public string str;
    }

    [DllImport("Library.dll")]
    static extern void Library_GetString(IntPtr output);

    private static void Main()
    {
        int structSize = Marshal.SizeOf(typeof(Library_Output));
        IntPtr structPtr = Marshal.AllocHGlobal(structSize);

        Library_GetString(structPtr);

        // Tell C# to convert the data in the unmanaged memory
        // buffer to a managed object.
        Library_Output outputStruct =
            (Library_Output)Marshal.PtrToStructure(structPtr, typeof(Library_Output));

        Console.WriteLine(outputStruct.str);
        Marshal.FreeHGlobal(structPtr);
    }
}
Instead of printing the string to the console, what the application actually prints out is:
���n�
However, if I change the marshaling type to be UnmanagedType.LPStr rather than UnmanagedType.LPUTF8Str, I get:
This is a UTF-8 string. 🤠
This is confusing to me, because the documentation for string marshaling of structure members states:
UnmanagedType.LPStr: A pointer to a null-terminated array of ANSI characters.
UnmanagedType.LPUTF8Str: A pointer to a null-terminated array of UTF-8 encoded characters.
So ANSI string marshaling prints a UTF-8 (non-ANSI) string, but UTF-8 string marshaling prints garbage? To work out where the garbage was coming from, I had a look at what the data being printed actually was, and it appeared to be the value of the pointer itself.
Either the UTF-8 marshaling routine is treating the memory where the string pointer value resides as the string itself, or I'm misunderstanding something crucial about this process. My question, fundamentally, is twofold: firstly, why does the UTF-8 marshaling process not follow the string pointer properly, and secondly, what is actually the proper way to marshal UTF-8 strings from C++ to C#? Is it to use LPUTF8Str, or something else?

How are strings vs chars handled in C# vs JavaScript?

In JavaScript, single and double quotes are somewhat interchangeable and largely a matter of style (there is a good discussion of why this isn't actually the case in one of the answers here: When to use double or single quotes in JavaScript?). How are chars and strings handled in C#?
For example:
string test = "hello world";
string test2 = 'hello world'; // Too many characters in character literal
char test3 = 'a';
char test4 = "a"; // Cannot implicitly convert type string to char
It looks like strings and chars are being handled as separate, non-interchangeable types, and the use of single or double quotes demarcates this?
What is the relationship between chars and strings in typed languages? Specifically, would it be correct to say that a string is an array of chars?
would it be correct to say that a string is an array of chars
In .NET, a string is an object containing a contiguous block of memory holding UTF-16 code units. A char is another (primitive) data type that holds just one UTF-16 code unit, with no object overhead.
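A quick illustration of the code-unit distinction (characters outside the Basic Multilingual Plane take two UTF-16 code units, a surrogate pair):

string cowboy = "🤠";              // one Unicode code point, U+1F920
Console.WriteLine(cowboy.Length);  // 2: the string holds two UTF-16 code units
char high = cowboy[0];             // 0xD83E, a high surrogate, not a full character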
From this interesting blog post from Jon Skeet, where he compares the .NET vs. Java implementation:
A long string consists of a single large object in memory. Compare this with Java, where a String is a “normal” type in terms of memory consumption, containing an offset and length into a char array – so a long string consists of a small object referring to a large char array.
C# uses the quotes to indicate the type - ' is always a char, " is always a string.
In .NET a string behaves like a read-only array of char, so:
// Treat a string like an array
char c = "This is a string"[3];
int len = "This is a string".Length;
// Now we have the char at position 3:
// c == 's'
What you can't do is edit them:
// This fails
"This is a string"[3] = 'x';
// This is fine, because we get a new editable char[]
char[] c = "This is a string".ToCharArray();
c[3] = 'x';
This is true in any .NET language as they all use the same string implementation. Other strongly typed frameworks have different ways of handling strings.
In .NET a char converts implicitly to an int, and an int can be explicitly cast back to a char, so:
char c = (char) 115; // 's'
int i = c; // 115
Finally, char is a value type, while string (being a sequence of UTF-16 code units under the covers) is actually an immutable reference type. These behave very similarly in code but are stored in different ways in memory - I think this is why C# makes the distinction in how they are delimited - "s" and 's' are different things and stored in different ways (entirely unlike the JavaScript string).
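A small illustration of the value-type vs reference-type behaviour described above:

string a = "abc";
string b = a;          // b references the same immutable string object
a += "d";              // builds a brand new string; b still sees "abc"

char x = 's';
char y = x;            // chars are copied by value
x = 't';               // y is unaffected

Console.WriteLine(b);  // abc
Console.WriteLine(y);  // s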

Encoding of a string in c#

I was translating some C++ code to C# and I saw the function below:
myMultiByteToWideChar( encryptedBufUnicode, (char*)encryptedBuf, sizeof(encryptedBufUnicode) );
This basically converts the char array to Unicode.
In C#, aren't strings and char arrays already Unicode? Or do we need to make them Unicode using a System.Text function?
C# strings and characters are UTF-16.
If you have an array of bytes, you can use the Encoding class to read it as a string using a correct encoding.
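For example, a minimal sketch assuming the bytes are UTF-8 (pick the Encoding that actually matches your data):

using System.Text;

byte[] buffer = { 0x48, 0x65, 0x6C, 0x6C, 0x6F };  // "Hello" as UTF-8/ASCII bytes
string text = Encoding.UTF8.GetString(buffer);     // bytes -> UTF-16 string
byte[] roundTrip = Encoding.UTF8.GetBytes(text);   // UTF-16 string -> bytes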

Specified Cast is not Valid - trying to cast a char

I am trying to cast a char as follows:
while (Reader.Read())
{
    VM VMResult = new VM();
    VMResult.status = (char)Reader["status"];
    VMList.Add(VMResult);
}
Then comes the fun part: Specified Cast is not Valid.
VMResult.status is a char
The returned data is a char(1) in sql
I assume there must be a difference in the C# / SQL char terminology.
What do you think?
I assume there must be a difference in the C# / SQL char terminology.
That's correct. A char in SQL Server is a fixed-length character string. It can be nullable.
A char in .NET is a structure that represents a single character as a UTF-16 code unit. It cannot be null, since it's a structure.
There is no fixed-length character string in .NET, unless you consider a char array or a byte array a fixed-length string.
Since most of the .NET ecosystem has better support for strings than for chars, char arrays or byte arrays, you're much better off just using the string that gets returned for the char(x) fields.
If you know for a fact that Reader["status"] will always be a char (or you only want the first char), and the current type of Reader["status"] is a string, you can always:
string status = Reader["status"] as string;
VMResult.status = !string.IsNullOrEmpty(status) ? status[0] : '\0'; // '' is not a valid char literal, so '\0' serves as the default
EDIT: null checking ftw.
OK, so you basically want to cast a string to a char. This is going to assume that your "status" value is a single-character string:
VMResult.status = Reader["status"].ToString()[0];
This also assumes Reader[] does not already return a string (if it did, the ToString is not required) and that the value is not null.

Using a marshalled BSTR type to pass an ASCII string

I have an interface that is called by unmanaged code. It passes a BSTR type, but the data is an ASCII string. When it's written to the file, I'm seeing unexpected characters. My understanding of how the data would travel is Unmanaged[BSTR[ASCII]] --> Managed[String[ASCII]] --> File[Unicode[ASCII]], so the characters at the input should be the same as those at the output. Is this correct? The interface function being called by the unmanaged code is below.
// C# interface called by unmanaged code
public void WriteOutFile([In] [MarshalAs(UnmanagedType.BStr)] String asciiData)
{
    File.WriteAllText(fileName, asciiData);
}
First, .NET strings are always Unicode strings. You can get any byte representation of a concrete string using the corresponding encoding, but all of the chars in the string are Unicode chars.
Second, if you are using UnmanagedType.BStr, then the unmanaged code must pass a real BSTR and solve the character-encoding problems itself (ASCII is single-byte, BSTR is double-byte). If that is impossible, you should consider another type for marshaling, e.g. UnmanagedType.LPStr.
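If the native caller actually hands over a pointer to single-byte ASCII data rather than a true BSTR, the question's method could be declared with LPStr instead (a sketch reusing the original signature and fileName):

// Marshals a null-terminated single-byte (ANSI/ASCII) string instead of a BSTR.
public void WriteOutFile([In] [MarshalAs(UnmanagedType.LPStr)] String asciiData)
{
    File.WriteAllText(fileName, asciiData);
}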
