char (C++) manipulation in C#

I am trying to port old C++ code to C#. The code does binary manipulation with chars, but I receive different results (probably I am doing something wrong because of Unicode in C#).
I need to rewrite this C++ code to C#:
myChar = 'K' ^ 128;
The result of this code in C++ is -53 ('Ë') in C++'s char data type.
The same operation in C# results in 203 (again 'Ë') in C#'s char data type.
So the character is OK, but I need the same byte value as in C++ (because I do math operations with it). Can you recommend a way to safely convert a C# char to the equivalent C++ byte value?
Thanks

In a single-byte two's complement representation, 203 is the unsigned interpretation of -53.
If you would like to use an equivalent representation of C++ signed char, the type should be sbyte:
sbyte myChar = (sbyte)('K' ^ 128);
Note that the C++ standard leaves it up to the implementation to decide whether char is signed or unsigned, which means that some standard-compliant C++ implementations will print 203 for myChar, not -53, without any change to your code.
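To see both interpretations side by side, here is a minimal C++ sketch. The helper names xor_as_signed and xor_as_unsigned are made up for illustration and are not part of the original code. It uses the fixed-width types std::int8_t and std::uint8_t so that the result does not depend on whether plain char happens to be signed on a given compiler.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of the original code.
// Both compute c ^ 128 and keep only the low 8 bits; they differ
// only in how those 8 bits are interpreted afterwards.

// Interprets the 8 bits as a signed byte (the counterpart of C# sbyte).
// Before C++20 the narrowing of an out-of-range value is
// implementation-defined; on two's complement platforms it wraps.
inline int xor_as_signed(char c) {
    return static_cast<std::int8_t>((c ^ 128) & 0xFF);
}

// Interprets the same 8 bits as an unsigned byte (the counterpart of C# byte).
inline int xor_as_unsigned(char c) {
    return static_cast<std::uint8_t>((c ^ 128) & 0xFF);
}
```

With 'K' (75) as input, the bit pattern is 0xCB in both cases: the signed reading gives -53 and the unsigned reading gives 203, which matches the two values in the question.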

Related

unsigned char* causes Access Violation Exception

I am developing a wrapper library that lets my project use an x86 C++ DLL in an AnyCPU environment. I have no control over the DLL, so I am using DllImport in C#.
The DLL provides a function declared in C++ as: int __stdcall Func(int V, unsigned char *A)
and a sample declaration in VB: Private Declare Function Func Lib "lib.dll" Alias "_Func#8" (ByVal V As Long, A As Any) As Long
This function asks a device to add or deduct a value to or from a card. I pass Convert.ToInt64(decimalValue) as V and some custom information in A.
Here is the description of A:
It is a byte pointer containing 7 bytes.
The first 5 bytes store info that will be passed to the card log (the last 4 digits of the receipt number should go in the first 2 bytes; the other 3 could be A3A4A5).
The last 2 bytes store info that will be passed to the device (the last 4 digits of the receipt number).
On return, A contains 32 bytes of data.
After hours and hours of research and attempts, I cannot get any result other than an 'Access Violation Exception'. Please see the following draft code:
[DllImport("lib.dll", EntryPoint="_Func#8")]
public static extern Int64 Func(Int64 V, StringBuilder sb);
string ReceiptNum = "ABC1234";
decimal Amount = 10m;
byte[] A = new byte[32];
A[0] = Convert.ToByte(ReceiptNum.Substring(3, 2));
A[1] = Convert.ToByte(ReceiptNum.Substring(5));
A[2] = Convert.ToByte("A3");
A[3] = Convert.ToByte("A4");
A[4] = Convert.ToByte("A5");
A[5] = Convert.ToByte(ReceiptNum.Substring(3, 2));
A[6] = Convert.ToByte(ReceiptNum.Substring(5));
StringBuilder sb = new StringBuilder(
new ASCIIEncoding().GetString(A), A.Length
);
Int64 Result = Func(Convert.ToInt64(Amount), sb);
At this point it throws the exception. I have tried passing IntPtr, byte*, byte (via A[0]), ByVal and ByRef, and none of them work. (I also tried building for x86.)
Would appreciate any help! Thanks for your time!
PS - The reason for using StringBuilder is that the library contains another function that accepts a "char *Data" parameter and caused the same exception; the solution there was to pass a StringBuilder as the pointer. That function's VB declaration is: Private Declare Function Func1 Lib "lib.dll" Alias "_Func1#12" (ByVal c As Byte, ByVal o As Byte, ByVal Data As String) As Long

Your extern declaration is wrong.
StringBuilder is a complex structure containing an array of C# chars.
C# chars are UTF-16 code units (two bytes each, with surrogate-pair rules for characters outside the Basic Multilingual Plane). That is probably not what you are looking for.
If your data is a raw byte buffer, you should use byte[].
Also, Int64 is the same as C# long, a 64-bit type, whereas the native int parameter and return value are 32 bits wide.

Well, your native method signature takes int, and you're trying to pass a long long. That's not going to work, rather obviously. The same is true of the return value. Don't assume that VB maps cleanly to VB.NET, much less C#: Long means a 32-bit integer in VB, but not in .NET. Native code is a very complex environment, and you had better know what you're doing when trying to interface with it.
StringBuilder should only be used for character data. That's not your case, and you should use byte[] instead. No matter the fun things you're doing, you're trying to pass invalid unicode data instead of raw bytes. The confusion is probably from the fact that C doesn't distinguish between byte[] and string - both are usually represented as char*.
Additionally, I don't see how you'd expect this wrapper to work in an AnyCPU environment. If the native DLL is 32-bit, you can only use it from a 32-bit process. AnyCPU isn't magic, it just defers the decision of bit-ness to runtime, rather than compile-time.
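Seen from the native side, the contract described in the question looks roughly like the C++ sketch below. FakeFunc and CallFakeFunc are hypothetical stand-ins: the real Func's behaviour is unknown, the __stdcall calling convention is omitted for portability, and the "echo the input" logic is invented purely to mimic the 7-bytes-in, 32-bytes-out buffer described above. The point is only that the callee expects a 32-bit integer and a pointer to a writable block of raw bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in with the same parameter types as the real Func:
// a 32-bit int and a pointer to raw bytes. It reads 7 bytes of input and
// writes 32 bytes of output into the same buffer. This is NOT the real
// device logic, only an imitation of the buffer contract.
inline std::int32_t FakeFunc(std::int32_t v, unsigned char* a) {
    unsigned char input[7];
    std::memcpy(input, a, sizeof input);   // caller supplied 7 bytes
    std::memset(a, 0, 32);                 // callee writes back 32 bytes
    std::memcpy(a, input, sizeof input);   // echo the input for the demo
    return v;
}

// The caller must own a writable buffer of at least 32 bytes, because
// the callee overwrites that many bytes through the pointer.
inline std::int32_t CallFakeFunc(std::int32_t amount,
                                 const unsigned char (&info)[7],
                                 unsigned char (&out)[32]) {
    std::memset(out, 0, sizeof out);
    std::memcpy(out, info, sizeof info);
    return FakeFunc(amount, out);
}
```

A managed declaration would need to match this shape: a 32-bit integer for V and for the return value, and a byte array of at least 32 elements for A.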

How to read/write unsigned byte array between C# and Java on a file?

This question is a bit similar to my previous one, where I asked for a "cross-language" way to write and read integers between a Java and a C# program. The problem was endianness.
Anyway, I'm facing a secondary problem. My goal is now to store and retrieve an array of unsigned bytes (values from 0 to 255) in a way that it can be processed by both Java and C#.
In C# it's easy, since unsigned byte[] exists:
BinaryWriterBigEndian writer = new BinaryWriterBigEndian(fs);
// ...
writer.Write(byteData, 0, byteData.Length);
BinaryWriterBigEndian is... well... a big-endian binary writer ;)
This way, the file will contain a sequence composed of, for example, the following values (in a big-endian representation):
[0][120][50][250][221]...
Now it's time to do the same thing under Java. Since unsigned byte[] does not exist here, the array is stored in memory as a (signed) int[] in order to have the possibility to represent values higher than 127.
How to write it as a sequence of unsigned byte values like C# does?
I tried with this:
ByteBuffer byteBuffer = ByteBuffer.allocate(4 * dataLength);
IntBuffer intBuffer = byteBuffer.asIntBuffer();
intBuffer.put(intData);
outputStream.write(byteBuffer.array());
Writing goes well, but C# is not able to read it in the proper way.

Since unsigned byte[] does not exist [...]
You don't care. Signed or unsigned, a byte is ultimately 8 bits. Just use a regular ByteBuffer and write your individual bytes in it.
In C# as well as in Java, 1000 0000 (for instance) is exactly the same binary representation of a byte; the fact that in C# it can be treated as an unsigned value, and not in Java, is irrelevant as long as you don't do any arithmetic on the value.
When you need a readable representation of it and you'd like it to be unsigned, you can use (int) (theByte & 0xff) (you need the mask, otherwise casting will "carry" the sign bit).
Or, if you use Guava, you can use UnsignedBytes.toString().
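The same bit-level reasoning can be sketched in C++, where std::int8_t plays the role of Java's signed byte. The helper names to_bytes and to_unsigned are invented for this illustration. The first narrows each int to a single byte (instead of writing four bytes per int, as the IntBuffer attempt in the question does); the second applies the 0xff mask described above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: narrows values in the range 0..255, held in ints,
// to one byte each. The cast keeps only the low 8 bits of every value.
inline std::vector<std::uint8_t> to_bytes(const std::vector<int>& values) {
    std::vector<std::uint8_t> out;
    out.reserve(values.size());
    for (int v : values) {
        out.push_back(static_cast<std::uint8_t>(v));
    }
    return out;
}

// Reads a signed byte (the counterpart of Java's byte) as a value 0..255.
// Without the mask, widening to int sign-extends, so -6 would stay -6.
inline int to_unsigned(std::int8_t b) {
    return static_cast<int>(b) & 0xFF;
}
```

For example, the signed byte -6 and the unsigned byte 250 share the bit pattern 1111 1010, so masking -6 with 0xFF yields 250.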

Pass a UTF character from C++/CLI to C#

How do I pass a UTF-16 char from a C++/CLI function to a .NET function? What types do I use on the C++/CLI side and how do I convert it?
I've currently defined the C++/CLI function as follows:
wchar_t GetCurrentTrackID(); // 'wchar_t' is the C++ unicode char equivalent to .NET's 'char'?
The .NET wrapper is defined as:
System::Char GetCurrentTrackID(); // here, 'char' means UTF-16 char
I'm currently using this to convert it, but when testing it I only get a null character. How do I properly convert a unicode char code to its char equivalent for .NET?
#pragma managed
return (System::Char)player->GetCurrentTrackID();

They are directly compatible. You can assign a Char to a wchar_t and the other way around without a cast, the compiler will not emit any kind of conversion function call. This is true for many simple value types in C++/CLI, like Boolean vs bool, SByte vs char, Byte vs unsigned char, Int16 vs short, Int32 vs int or long, Int64 vs long long, Single vs float, Double vs double. Plus their unsigned varieties. The compiler will treat them as aliases since they have the exact same binary representation.
But not strings or arrays, they are classes with a non-trivial implementation that doesn't match their native versions at all.

is java byte the same as C# byte?

A native method from a DLL works in Java when the input parameter is an array of bytes (byte[]).
If we call the same method from C#, it throws EntryPointNotFoundException.
Is that because byte[] in Java and C# are different things? If so, how should I call the native function from C#?

Java lacks the unsigned types. In particular, Java lacks a primitive type for an unsigned byte. The Java byte type is signed, while the C# byte is unsigned and sbyte is signed.

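Is that because byte[] in Java and C# are different things?
Yes.
Endianness: Java's I/O classes (DataOutputStream, ByteBuffer) use big-endian byte order by default, while .NET's BinaryWriter writes little-endian and BitConverter follows the machine's byte order, which is little-endian on typical hardware.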
Signedness: C# bytes are unsigned. Java bytes are signed.
See different results when converting int to byte array - .NET vs Java.
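The difference in byte order is easiest to see by serializing one 32-bit value both ways. This is a small C++ sketch with made-up helper names (to_big_endian, to_little_endian); it builds the bytes with shifts, so the output does not depend on the machine it runs on.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: most significant byte first, the order Java's
// DataOutputStream and ByteBuffer use by default.
inline std::array<std::uint8_t, 4> to_big_endian(std::uint32_t v) {
    return {{ static_cast<std::uint8_t>(v >> 24),
              static_cast<std::uint8_t>(v >> 16),
              static_cast<std::uint8_t>(v >> 8),
              static_cast<std::uint8_t>(v) }};
}

// Hypothetical helper: least significant byte first, the order .NET
// produces on little-endian hardware.
inline std::array<std::uint8_t, 4> to_little_endian(std::uint32_t v) {
    return {{ static_cast<std::uint8_t>(v),
              static_cast<std::uint8_t>(v >> 8),
              static_cast<std::uint8_t>(v >> 16),
              static_cast<std::uint8_t>(v >> 24) }};
}
```

For 0x01020304 the first helper yields 01 02 03 04 and the second yields 04 03 02 01. Note that byte order only matters for multi-byte values; a single byte is the same 8 bits on both platforms.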

What's the signature of the native function? How do you declare it in Java and in C#?
The most common reason for EntryPointNotFoundException is that function name is mangled (esp. true if function is written in C++) or misspelled.
Another source of problems is the 'W' and 'A' suffixes that WinAPI functions use to distinguish the ANSI and Unicode versions. The .NET interop mechanism can try to guess the function suffix, so that may be the source of confusion.
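On the C++ side, name mangling can be avoided by giving the function C linkage. The sketch below uses a hypothetical function, sum_bytes, purely to show the extern "C" declaration; it is not the function from the question, and platform-specific export attributes are left out.

```cpp
#include <cstddef>

// Hypothetical function. extern "C" requests C linkage, so the symbol
// name is not decorated with C++ parameter-type information and a caller
// can look it up by its plain name.
// Calling-convention decoration on 32-bit Windows is a separate matter
// and is not removed by extern "C".
extern "C" int sum_bytes(const unsigned char* data, std::size_t length) {
    int total = 0;
    for (std::size_t i = 0; i < length; ++i) {
        total += data[i];  // each element is read as a value 0..255
    }
    return total;
}
```

If you cannot change the DLL, listing its exported symbols with a tool for your platform shows the exact name to pass as the entry point.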

Java byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matter. It can also be used in place of int where its limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.
C# Byte: Represents an 8-bit unsigned integer. Byte is an immutable value type that represents unsigned integers with values that range from 0 (which is represented by the Byte.MinValue constant) to 255 (which is represented by the Byte.MaxValue constant). The .NET Framework also includes a signed 8-bit integer value type, SByte, which represents values that range from -128 to 127.

Casting a char to an unsigned short: what happens behind the scenes?

Given this field:
char lookup_ext[8192] = {0}; // Gets filled later
And this statement:
unsigned short *slt = (unsigned short*) lookup_ext;
What happens behind the scenes?
lookup_ext[1669] returns 67 = 0100 0011 (C), lookup_ext[1670] returns 78 = 0100 1110 (N) and lookup_ext[1671] returns 68 = 0100 0100 (D); yet slt[1670] returns 18273 = 0100 0111 0110 0001.
I'm trying to port this to C#, so besides an easy way out of this, I'm also wondering what really happens here. Been a while since I used C++ regularly.
Thanks!

The statement that you show doesn't cast a char to an unsigned short, it casts a pointer to a char to a pointer to an unsigned short. This means that the usual arithmetic conversions of the pointed-to-data are not going to happen and that the underlying char data will just be interpreted as unsigned shorts when accessed through the slt variable.
Note that sizeof(unsigned short) is unlikely to be one, so that slt[1670] won't necessarily correspond to lookup_ext[1670]. It is more likely - if, say, sizeof(unsigned short) is two - to correspond to lookup_ext[3340] and lookup_ext[3341].
Do you know why the original code is using this aliasing? If it's not necessary, it might be worth trying to make the C++ code cleaner and verifying that the behaviour is unchanged before porting it.

If I understand correctly, the type conversion will be converting a char array of size 8192 to a short int array of size half of that, which is 4096.
So I don't understand what you are comparing in your question. slt[1670] should correspond to lookup_ext[1670*2] and lookup_ext[1670*2+1].
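The index mapping can be written out explicitly, which is also a convenient form to port. This is a minimal C++ sketch with a made-up helper name, read_u16_le; it assumes a 2-byte unsigned short and little-endian byte order (as on x86), neither of which is stated in the question.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the value slt[index] would yield, ASSUMING a
// 2-byte unsigned short on a little-endian machine. Element `index`
// covers bytes 2*index (low byte) and 2*index + 1 (high byte).
inline std::uint16_t read_u16_le(const char* bytes, std::size_t index) {
    const auto lo = static_cast<unsigned char>(bytes[2 * index]);
    const auto hi = static_cast<unsigned char>(bytes[2 * index + 1]);
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```

Under those assumptions, the 18273 (0x4761) seen at slt[1670] means lookup_ext[3340] holds 0x61 and lookup_ext[3341] holds 0x47, which is unrelated to the values at indices 1669 to 1671.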

Well, this statement
char lookup_ext[8192] = {0}; // Gets filled later
Creates an array either locally or non-locally, depending on where the definition occurs. Initializing it like that, with an aggregate initializer, will initialize all its elements to zero (the first explicitly, the remaining ones implicitly). So I wonder why your program outputs non-zero values, unless the fill happens before the read, in which case it makes sense.
unsigned short *slt = (unsigned short*) lookup_ext;
That will interpret the bytes making up the array as unsigned short objects when you read from that pointer's target. Strictly speaking, the above is undefined behavior, because you can't be sure the array is suitably aligned, and you would read through a pointer whose type does not match the type of the original objects (char <-> unsigned short). In C++, the only portable way to read the value out of some other POD (plain old data: broadly speaking, all the structs and simple types, such as short, that are possible in C too) is by using library functions such as memcpy or memmove.
So if you read *slt above, you would interpret the first sizeof(*slt) bytes of the array and try to read them as an unsigned short (this is called type punning).
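A memcpy-based read might look like the following C++ sketch. The helper name read_ushort is invented for illustration. It copies the bytes into a real unsigned short object instead of dereferencing a cast pointer, which sidesteps the alignment and aliasing concerns described above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies sizeof(unsigned short) bytes out of the
// char array into a genuine unsigned short object. The result uses the
// machine's native byte order, just as reading through slt would.
inline unsigned short read_ushort(const char* bytes, std::size_t index) {
    unsigned short value;
    std::memcpy(&value, bytes + index * sizeof value, sizeof value);
    return value;
}
```

Calling read_ushort(lookup_ext, 1670) is then the well-defined counterpart of slt[1670]. Compilers typically optimize the fixed-size memcpy into a plain load.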

When you do "unsigned short *slt = (unsigned short*) lookup_ext;", no data is copied: slt simply points at the same memory as lookup_ext. When you then read *slt, a number of bytes equal to sizeof(unsigned short) is read from that location. Since unsigned short is typically 2 bytes, *slt is built from the first two bytes of lookup_ext.
