About the "GetBytes" implementation in BitConverter

About the "GetBytes" implementation in BitConverter - c#

I've found that the implementation of the GetBytes function in .net framework is something like:
public unsafe static byte[] GetBytes(int value)
{
byte[] bytes = new byte[4];
fixed(byte* b = bytes)
*((int*)b) = value;
return bytes;
}
I'm not so sure I understand the full details of these two lines:
fixed(byte* b = bytes)
*((int*)b) = value;
Could someone provide a more detailed explanation here? And how should I implement this function in standard C++?

Could someone provide a more detailed explanation here?
The MSDN documentation for fixed comes with numerous examples and explanation -- if that's not sufficient, then you'll need to clarify which specific part you don't understand.
And how should I implement this function in standard C++?
#include <cstring>
#include <vector>
std::vector<unsigned char> GetBytes(int value)
{
std::vector<unsigned char> bytes(sizeof(int));
std::memcpy(&bytes[0], &value, sizeof(int));
return bytes;
}

Fixed tells the garbage collector not to move a managed type so that you can access that type with standard pointers.
In C++, if you're not using C++/CLI (i.e. not using .NET) then you can just use a byte-sized pointer (char) and loop through the bytes in whatever you're trying to convert.
Just be aware of endianness...

First fixed has to be used because we want to assign a pointer to a managed variable:
The fixed statement prevents the garbage collector from relocating a
movable variable. The fixed statement is only permitted in an unsafe
context. Fixed can also be used to create fixed size buffers.
The fixed statement sets a pointer to a managed variable and "pins"
that variable during the execution of the statement. Without fixed,
pointers to movable managed variables would be of little use since
garbage collection could relocate the variables unpredictably. The
C# compiler only lets you assign a pointer to a managed variable in a
fixed statement. Ref.
Then we declare a pointer to byte and assign to the start of the byte array.
Then, we cast the pointer to byte to a pointer to int, dereference it and assign it to the int passed in.

The function creates a byte array that contains the same binary data as your platform's representation of the integer value. In C++, this can be achieved (for any type really) like so:
int value; // or any type!
unsigned char b[sizeof(int)];
unsigned char const * const p = reinterpret_cast<unsigned char const *>(&value);
std::copy(p, p + sizeof(int), b);
Now b is an array of as many bytes as the size of the type int (or whichever type you used).
In C# you need to say fixed to obtain a raw pointer, since usually you do not have raw pointers in C# on account of objects not having a fixed location in memory -- the garbage collector can move them around at any time. fixed prevents this and fixes the object in place so a raw pointer can make sense.

You can implement GetBytes() for any POD type with a simple function template.
#include <vector>
template <typename T>
std::vector<unsigned char> GetBytes(T value)
{
return std::vector<unsigned char>(reinterpret_cast<unsigned char*>(&value),
reinterpret_cast<unsigned char*>(&value) + sizeof(value));
}

Here is a C++ header-only library that may be of help.
BitConverter
The idea of implementing the GetBytes function in C++ is straight-forward: compute each byte of the value according to specified layout. For example, let's say we need to get the bytes of an unsigned 16-bit integer in big endian. We can divide the value by 256 to get the first byte, and take the remainder as the second byte.
For floating-point numbers, the algorithm is a little bit more complicated. We need to get the sign, exponent, and mantissa of the number, and encode them as bytes. See https://en.wikipedia.org/wiki/Double-precision_floating-point_format

Related

In the C# String constructor, String(Char*), why doesn't the constructor expect a pointer to an array of characters?

I've used C# for several years but have never worked with Pointers in C#. I'm playing around with the String API and I'm lost on why a char* is actually an array and not a single char.
The API defintion:
[System.CLSCompliant(false)]
public String (char* value);
A snippet of the API documentation:
Parameters
value Char*
A pointer to a null-terminated array of Unicode characters.
Is the declaration char* singleChar a pointer to a single char? The pointer docs example seems to say so.
So, why/how can it also be a pointer to an array? Why isn't the parameter type char[]*?

A char[]* in C# is technically not possible because managed types (in this case char[]) cannot be pointers, the more correct would be char**. This is a little confusing having a pointer to a pointer, but think of it as nothing more than an array of char arrays.
A pointer in unsafe C# code is simply a reference to a region of memory. In the C language, char arrays are generally null-terminated (byte value of 0) using continuous blocks of memory for the strings. The same applies in C#, the char* is simply a pointer to a continuous block of memory of char type.
Example:
Memory address 0x10000000 : h
Memory address 0x10000002 : e
Memory address 0x10000004 : l
Memory address 0x10000006 : l
Memory address 0x10000008 : o
Memory address 0x1000000A : \0
In C#, you could have char* ptr = (char*)0x10000000 - and the memory would point to the chars h, e, l, l, o, \0.
You can convert a single C# array to a char* using the fixed keyword, which pins the char array down, preventing garbage collection or memory movement.
char[] chars = new char[64];
// fill chars with whatever
fixed (char* charPtr = chars)
{
}
chars in the above example could also be a string.
Hope some of this info is useful.

find length of unsafe byte* pointing to native byte array c#

How can I find the length of byte* in c#?
It's pointing to a native byte array in an unmanaged c++ library. I need to convert it to a c# byte[], but in order to do so, I need the length. .Length doesn't work.
byte* ETC = //Stuff from unmanaged c++ DLL;
int ETCLength = ????

You cannot know the length of something just from a pointer; the pointer is just the start. Usually, a pointer and a length are passed together. In the future, this may be improved by Span<T> - or maybe it won't! Time will tell.
You need to already know the length. This could be via an API, or it could be via documentation. There may be a pattern to the data that implies the end (nul terminators, for example, or the length being encoded in the first few bytes), but: that approach is how most buffer attacks start. You should always know the length if you're talking about pointers.

what is equal to the c++ size_t in c#

I have a struct in c++:
struct some_struct{
uchar* data;
size_t size;
}
I want to pass it between manged(c#) and native(c++). What is the equivalent of size_t in C# ?
P.S. I need an exact match in the size because any byte difference will results in huge problem while wrapping
EDIT:
Both native and manged code are under my full control ( I can edit whatever I want)

There is no C# equivalent to size_t.
The C# sizeof() operator always returns an int value regardless of platform, so technically the C# equivalent of size_t is int, but that's no help to you.
(Note that Marshal.SizeOf() also returns an int.)
Also note that no C# object can be larger than 2GB in size as far as sizeof() and Marshal.Sizeof() is concerned. Arrays can be larger than 2GB, but you cannot use sizeof() or Marshal.SizeOf() with arrays.
For your purposes, you will need to know what the version of code in the DLL uses for size_t and use the appropriate size integral type in C#.
One important thing to realise is that in C/C++ size_t will generally have the same number of bits as intptr_t but this is NOT guaranteed, especially for segmented architectures.
I know lots of people say "use UIntPtr", and that will normally work, but it's not GUARANTEED to be correct.
From the C/C++ definition of size_t, size_t
is the unsigned integer type of the result of the sizeof operator;

The best equivalent for size_t in C# is the UIntPtr type. It's 32-bit on 32-bit platforms, 64-bit on 64-bit platforms, and unsigned.

You better to use nint/nuint which is wrapper around IntPtr/UIntPtr

Converting int[] to byte: How to look at int[] as it was byte[]?

To explain: I have array of ints as input. I need to convert it to array of bytes, where 1 int = 4 bytes (big endian). In C++, I can easily just cast it and then access to the field as if it was byte array, without copying or counting the data - just direct access. Is this possible in C#? And in C# 2.0?

Yes, using unsafe code:
int[] arr =...
fixed(int* ptr = arr) {
byte* ptr2 = (byte*)ptr;
// now access ptr2[n]
}
If the compiler complains, add a (void*):
byte* ptr2 = (byte*)(void*)ptr;

You can create a byte[] 4 times the size of your int[] lenght.
Then, you iterate trough your integer array & get the byte array from:
BitConverter.GetBytes(int32);
Next you copy the 4 bytes from this function to the correct offset (i * 4) using Buffer.BlockCopy.
BitConverter
Buffer.BlockCopy

Have a look at the BitConverter class. You could iterate through the array of int, and call BitConverter.GetBytes(Int32) to get a byte[4] for each one.

If you write unsafe code, you can fix the array in memory, get a pointer to its beginning, and cast that pointer.
unsafe
{
fixed(int* pi=arr)
{
byte* pb=(byte*)pi;
...
}
}
An array in .net is prefixed with the number of elements, so you can't safely convert between int[] and byte[] that points to the same data. You can cast between uint[] and int[] (at least as far as .net is concerned, the support for this feature in C# itself is a bit inconsistent).
There is also a union based trick to reinterpret cast references, but I strongly recommend not using it.
The usual way to get individual integers from a byte array in native-endian order is BitConverter, but its relatively slow. Manual code is often faster. And of course it doesn't support the reverse conversion.
One way to manually convert assuming little-endian (managed about 400 million reads per second on my 2.6GHz i3):
byte GetByte(int[] arr, int index)
{
uint elem=(uint)arr[index>>2];
return (byte)(elem>>( (index&3)* 8));
}
I recommend manually writing code that uses bitshifting to access individual bytes if you want to go with managed code, and pointers if you want the last bit of performance.
You also need to be careful about endianness issues. Some of these methods only support native endianness.

The simplest way in type-safe managed code is to use:
byte[] result = new byte[intArray.Length * sizeof(int)];
Buffer.BlockCopy(intArray, 0, result, 0, result.Length);
That doesn't quite do what I think your question asked, since on little endian architectures (like x86 or ARM), the result array will end up being little endian, but I'm pretty sure the same is true for C++ as well.
If you can use unsafe{}, you have other options:
unsafe{
fixed(byte* result = (byte*)(void*)intArray){
// Do stuff with result.
}
}

Casting a char to an unsigned short: what happens behind the scenes?

Given this field:
char lookup_ext[8192] = {0}; // Gets filled later
And this statement:
unsigned short *slt = (unsigned short*) lookup_ext;
What happens behind the scenes?
lookup_ext[1669] returns 67 = 0100 0011 (C), lookup_ext[1670] returns 78 = 0100 1110 (N) and lookup_ext[1671] returns 68 = 0100 0100 (D); yet slt[1670] returns 18273 = 0100 0111 0110 0001.
I'm trying to port this to C#, so besides an easy way out of this, I'm also wondering what really happens here. Been a while since I used C++ regularly.
Thanks!

The statement that you show doesn't cast a char to an unsigned short, it casts a pointer to a char to a pointer to an unsigned short. This means that the usual arithmetic conversions of the pointed-to-data are not going to happen and that the underlying char data will just be interpreted as unsigned shorts when accessed through the slt variable.
Note that sizeof(unsigned short) is unlikely to be one, so that slt[1670] won't necessarily correspond to lookup_ext[1670]. It is more likely - if, say, sizeof(unsigned short) is two - to correspond to lookup_ext[3340] and lookup_ext[3341].
Do you know why the original code is using this aliasing? If it's not necessary, it might be worth trying to make the C++ code cleaner and verifying that the behaviour is unchanged before porting it.

If I understand correctly, the type conversion will be converting a char array of size 8192 to a short int array of size half of that, which is 4096.
So I don't understand what you are comparing in your question. slt[1670] should correspond to lookup_ext[1670*2] and lookup_ext[1670*2+1].

Well, this statement
char lookup_ext[8192] = {0}; // Gets filled later
Creates an array either locally or non-locally, depending on where the definition occurs. Initializing it like that, with an aggregate initializer will initialize all its elements to zero (the first explicitly, the remaining ones implicitly). Therefore i wonder why your program outputs non-zero values. Unless the fill happens before the read, then that makes sense.
unsigned short *slt = (unsigned short*) lookup_ext;
That will interpret the bytes making up the array as unsigned short objects when you read from that pointer's target. Strictly speaking, the above is undefined behavior, because you can't be sure the array is suitable aligned, and you would read from a pointer that's not pointing at the type of the original pointed type (unsigned char <-> unsigned short). In C++, the only portable way to read the value out of some other pod (plain old data. that's all the structs and simple types that are possible in C too (such as short), broadly speaking) is by using such library functions as memcpy or memmove.
So if you read *slt above, you would interpret the first sizeof(*slt) bytes of the array, and try to read it as unsigned short (that's called type pun).

When you do "unsigned short slt = (unsigned short) lookup_ext;", the no. of bytes equivalent to the size of (unsigned short) are picked up from the location given by lookup_ext, and stored at the location pointed to by slt. Since unsigned short would be 2 bytes, first two bytes from lookup_ext would be stored in the location given by slt.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

About the "GetBytes" implementation in BitConverter - c#

You can implement GetBytes() for any POD type with a simple function template. #include <vector> template <typename T> std::vector<unsigned char> GetBytes(T value) { return std::vector<unsigned char>(reinterpret_cast<unsigned char>(&value), reinterpret_cast<unsigned char>(&value) + sizeof(value)); }

Related

In the C# String constructor, String(Char*), why doesn't the constructor expect a pointer to an array of characters?

find length of unsafe byte* pointing to native byte array c#

what is equal to the c++ size_t in c#

Converting int[] to byte: How to look at int[] as it was byte[]?

Casting a char to an unsigned short: what happens behind the scenes?

Categories

Resources