In C#, I need to write a T[] to a stream, ideally without any additional buffers. I have dynamically generated code that converts a T[] (where T is a struct containing no object references) to a void* and fixes it in memory, and that works great. When the stream was a file, I could use the native Windows API to pass the void* directly, but now I need to write to a generic Stream object that takes a byte[].
Question: Can anyone suggest a hack way to create a dummy array object which does not actually have any heap allocations, but rather points to an already existing (and fixed) heap location?
This is the pseudo-code that I need:
void Write(Stream stream, T[] buffer)
{
fixed( void* ptr = &buffer ) // done with dynamic code generation
{
int typeSize = sizeof(T); // done as well
byte[] dummy = (byte[]) ptr; // <-- how do I create this fake array?
stream.Write( dummy, 0, buffer.Length*typeSize );
}
}
Update:
I described how to do fixed(void* ptr=&buffer) in depth in this article. I could always create a byte[], fix it in memory, do an unsafe byte copy from one pointer to the other, and then send that array to the stream, but I was hoping to avoid the extra allocation and copying.
Impossible?
Upon further thinking, the byte[] has some meta data in heap with the array dimensions and the element type. Simply passing a reference (pointer) to T[] as byte[] might not work because the meta data of the block would still be that of T[]. And even if the structure of the meta data is identical, the length of the T[] will be much less than the byte[], hence any subsequent access to byte[] by managed code will generate incorrect results.
Feature requested @ Microsoft Connect
Please vote for this request, hopefully MS will listen.
This kind of code can never work in a generic way. It relies on a hard assumption that the memory layout for T is predictable and consistent. That is only true if T is a simple value type (ignoring endianness for a moment). You are dead in the water if T is a reference type: you'll be copying tracking handles that can never be deserialized, so you'll have to give T the struct constraint.
But that's not enough; structure types are not copyable either, not even if they have no reference type fields, which is something you can't constrain. The internal layout is determined by the JIT compiler. It swaps fields at its leisure, selecting a layout in which the fields are properly aligned and the structure value takes the minimum storage size. The value you serialize can only be read properly by a program that runs with the exact same CPU architecture and JIT compiler version.
There are already plenty of classes in the framework that do what you are doing. The closest match is the .NET 4.0 MemoryMappedViewAccessor class. It needs to do the same job, making raw bytes available in the memory mapped file. The workhorse there is the System.Runtime.InteropServices.SafeBuffer class, have a look-see with Reflector. Unfortunately, you can't just copy the class, it relies on the CLR to make the transformation. Then again, it is only another week before it's available.
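For readers on newer runtimes, here is a minimal sketch, assuming .NET Core 2.1 or later (an assumption, since this answer predates it). There, MemoryMarshal.AsBytes reinterprets a T[] as a span of bytes and Stream.Write has an overload that accepts a ReadOnlySpan<byte>. The class name SpanWriter is hypothetical. The layout caveats above still apply, and a stream type that does not override the span overload may still copy through an internal buffer.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper; not part of the original answer.
static class SpanWriter
{
    public static void Write<T>(Stream stream, T[] buffer) where T : struct
    {
        // View the same memory as bytes. AsBytes throws if T contains references.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(buffer.AsSpan());
        stream.Write(bytes);
    }
}
```

Usage would look like SpanWriter.Write(stream, new int[] { 1, 2, 3 }), which writes 12 bytes in the machine's native byte order.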
Because stream.Write cannot take a pointer, you cannot avoid copying memory, so you will have some slowdown. You might want to consider using a BinaryReader and BinaryWriter to serialize your objects, but here is code that will let you do what you want. Keep in mind that all members of T must also be structs.
unsafe static void Write<T>(Stream stream, T[] buffer) where T : struct
{
    // Pin the array so the garbage collector cannot move it while we read from it.
    System.Runtime.InteropServices.GCHandle handle = System.Runtime.InteropServices.GCHandle.Alloc(buffer, System.Runtime.InteropServices.GCHandleType.Pinned);
    try
    {
        IntPtr address = handle.AddrOfPinnedObject();
        int byteCount = System.Runtime.InteropServices.Marshal.SizeOf(typeof(T)) * buffer.Length;
        byte* ptr = (byte*)address.ToPointer();
        byte* endPtr = ptr + byteCount;
        while (ptr != endPtr)
        {
            stream.WriteByte(*ptr++);
        }
    }
    finally
    {
        // Always release the pin, even if the stream throws.
        handle.Free();
    }
}
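Writing one byte at a time can be slow on many streams. A possible variation, offered only as a sketch, copies the pinned memory through a small reusable byte[] chunk with Marshal.Copy and writes each chunk with a single Stream.Write call. The class name ChunkedWriter and the chunkSize parameter are hypothetical, chunkSize is assumed to be positive, and IntPtr.Add assumes .NET 4.0 or later.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper; a variation on the answer's code, not a replacement for it.
static class ChunkedWriter
{
    public static void Write<T>(Stream stream, T[] buffer, int chunkSize = 64 * 1024) where T : struct
    {
        int byteCount = Marshal.SizeOf(typeof(T)) * buffer.Length;
        // One small scratch buffer, reused for every chunk.
        byte[] chunk = new byte[Math.Min(chunkSize, Math.Max(byteCount, 1))];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            for (int offset = 0; offset < byteCount; offset += chunk.Length)
            {
                int count = Math.Min(chunk.Length, byteCount - offset);
                Marshal.Copy(IntPtr.Add(address, offset), chunk, 0, count);
                stream.Write(chunk, 0, count);
            }
        }
        finally
        {
            handle.Free();
        }
    }
}
```

This still copies every byte once, but the extra allocation is bounded by the chunk size rather than by the size of the whole array.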
Check out my answer to a related question:
What is the fastest way to convert a float[] to a byte[]?
In it I temporarily transform an array of floats to an array of bytes without memory allocation and copying.
To do this I changed the CLR's metadata using memory manipulation.
Unfortunately, this solution does not lend itself well to generics. However, you can combine this hack with code generation techniques to solve your problem.
Look at this article, "Inline MSIL in C#/VB.NET and Generic Pointers"; it is the best way to get your dream code :)
Related
I am working on a C++ CLI wrapper of a C API. One function in the C API expected data in the form:
void setData(byte* dataPtr, int offset, int length);
void getData(byte* buffer, int offset, int length);
For the C++ CLI it was suggested that we use a System.Collections.BitArray (Yes the individual Bits have meaning). A BitArray can be constructed from an array of bytes and copied to one:
array<System::Byte>^ bytes = gcnew array<System::Byte>(40);
System::Collections::BitArray^ ba = gcnew System::Collections::BitArray(bytes);
int length = ((ba->Length - 1)/8) +1;
array<System::Byte>^ newBytes = gcnew array<System::Byte>(length);
ba->CopyTo(newBytes, 0);
pin_ptr<unsigned char> rawDataPtr = &newBytes[0];
My concern is the last line. Is it valid to get a pointer from the array in this way? Is there a better alternative in C# for working with arbitrary bits? Remember the individual bits have meaning.
Is it valid to get a pointer from the array in this way?
Yes, that's valid. The pin_ptr<> helper class calls GCHandle.Alloc() under the hood, asking for GCHandleType.Pinned. So the pointer is stable and can be passed to unmanaged code without fear that the garbage collector is going to move the array and make the pointer invalid.
A very important detail is missing from the question, however. The reason that pin_ptr<> exists, instead of just letting you use GCHandle directly, is to control exactly when the GCHandle.Free() method is called. You don't do this explicitly; pin_ptr<> does it for you, using the standard C++ RAII pattern. In other words, the Free() method is called automatically when the variable goes out of scope: the C++ compiler emits the destructor call, which in turn calls Free().
This will go very, very wrong when the C function stores the passed dataPtr and uses it later. Later is the problem: the array won't be pinned anymore and can now exist at an arbitrary address. The result is major data corruption that is very hard to diagnose. The getData() function strongly suggests that this is in fact the case. This is not good.
You will need to fix this. Using GCHandle::Alloc() yourself to pin the array permanently is very painful for the garbage collector: it is a rock in the road that won't budge and has a long-lasting effect on the efficiency of the program. Instead, you should copy the managed array to stable memory that you allocate with, say, malloc() or Marshal::AllocHGlobal(). That's unmanaged memory; it will never move. Marshal::Copy() is a simple way to copy it.
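The copy-to-unmanaged-memory idea can be sketched as follows. The sketch is in C# rather than C++/CLI, but it uses the same Marshal.AllocHGlobal, Marshal.Copy and Marshal.FreeHGlobal calls. The class name UnmanagedCopy is hypothetical, and when to dispose it depends on how long the C library keeps the pointer, which the question does not say.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper that owns a stable, unmanaged copy of a managed byte array.
sealed class UnmanagedCopy : IDisposable
{
    public IntPtr Pointer { get; private set; }
    public int Length { get; private set; }

    public UnmanagedCopy(byte[] source)
    {
        Length = source.Length;
        Pointer = Marshal.AllocHGlobal(Length);   // never moved by the GC
        Marshal.Copy(source, 0, Pointer, Length); // managed -> unmanaged
    }

    // Copy the unmanaged bytes back into a fresh managed array.
    public byte[] ToArray()
    {
        byte[] result = new byte[Length];
        Marshal.Copy(Pointer, result, 0, Length);
        return result;
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
        }
    }
}
```

The Pointer value can be handed to the native setData() call and stays valid until Dispose() runs.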
I've looked around a lot and can't seem to find a solution to anything similar to what I'm doing. I have two applications, a native C++ app and a managed C# app. The C++ app allocates a pool of bytes that are used in a memory manager. Each object allocated in this memory manager has a header, and each header has a char* that points to a name given to the object. The C# app acts as a viewer for this memory. I use memory mapped files to allow the C# app to read the memory while the C++ app is running. My issue is that I am trying to read the name of the object from the header structure and display it in C# (or just store it in a String, whatever). Using unsafe code I am able to convert the four bytes that make up the char* into an IntPtr, convert that to a void*, and call Marshal.PtrToStringAnsi. This is the code:
IntPtr namePtr = new IntPtr(BitConverter.ToInt32(bytes, index));
unsafe
{
void* ptr = namePtr.ToPointer();
char* cptr = (char*)ptr;
output = Marshal.PtrToStringAnsi((IntPtr)ptr);
}
In this case, bytes is the array read from the memory mapped file that represents all of the pool of bytes created by the native app, and index is the index of the first byte of the name pointer.
I have verified that, on the managed side of things, the address returned by the call to namePtr.ToPointer() is exactly the address of the name pointer in the native app. If this were native code, I would simply cast ptr to a char* and it would be fine, but in managed code I've read I must use the Marshaller to do this.
This code yields varying results. Sometimes cptr is null, sometimes it points to \0, and other times it points to a few Asian characters (which when run through the PtrToStringAnsi method produce seemingly irrelevant characters). I thought it might be a fixed thing, but ToPointer produces a fixed pointer. And sometimes after the cast to a char* the debugger says Unable to evaluate the expression. The pointer is not valid or something like that (it's not easy to repro every varying thing that comes back). And other times I get an access violation when reading the memory, which leads me to the C++ side of things.
On the C++ side, I figured there might be some issues with actually reading the memory because, although the memory that stores the pointer is part of the memory mapped file, the actual bytes that make up the text are not. So I looked at how to change read/write access to memory (on Windows, mind you) and found the VirtualProtect function in the Windows libraries, which I use to change access to the memory to PAGE_EXECUTE_WRITECOPY. I figured this would let any application that has a pointer to that address at least read what's there. But that didn't solve the issue either.
To put it shortly:
I have a pointer (in C#) to the first char in a char array (that was allocated in a C++ app) and am trying to read that array of char's into a string in C#.
EDIT:
The source header looks like this:
struct AllocatorHeader
{
// These bytes are reserved, and their purposes may change.
char _reserved[4];
// A pointer to a destructor mapping that is associated with this object.
DestructorMappingBase* _destructor;
// The size of the object this header is for.
unsigned int _size;
char* _name;
};
The _name field is the one I'm trying to dereference in C#.
EDIT:
As of now, even using the solutions provided below, I am unable to dereference this char* in managed code. As such, I have simply made a copy of the char* in the pool referenced by the memory mapped file and use a pointer to that. This works, which makes me believe this is a protection-related issue. If I find a way to circumvent this at some point, I will answer my own question. Until then, this will be my workaround. Thanks to all who helped!
This seems to work for me in my simple test:
private static unsafe String MarshalUnsafeCStringToString(IntPtr ptr, Encoding encoding) {
void *rawPointer = ptr.ToPointer();
if (rawPointer == null) return "";
char* unsafeCString = (char*)rawPointer;
int lengthOfCString = 0;
while (unsafeCString[lengthOfCString] != '\0') {
lengthOfCString++;
}
// now that we have the length of the string, let's get its size in bytes
int lengthInBytes = encoding.GetByteCount (unsafeCString, lengthOfCString);
byte[] asByteArray = new byte[lengthInBytes];
fixed (byte *ptrByteArray = asByteArray) {
encoding.GetBytes(unsafeCString, lengthOfCString, ptrByteArray, lengthInBytes);
}
// now get the string
return encoding.GetString(asByteArray);
}
Perhaps a bit convoluted, but assuming your string is NUL-terminated it should work.
Your code fails because pointers are only valid and meaningful in the process which owns them. You map the memory into a different process and there is a completely different virtual address space. Nothing says that the memory will be mapped into the same virtual address.
If you knew the base address in both processes you could adjust the pointers. But frankly your entire approach is deeply flawed. You really should use some real IPC here. Named pipes, sockets, WCF, even good old Windows messages would suffice.
In a comment you say:
If the char array is allocated on the heap, I can dereference this pointer fine. But when the char* is initialized to something like char* a = "Hello world"; it doesn't work.
Of course not. The string literal is statically allocated by the compiler and resides in the executable module's address space. You'd need to allocate a string in your shared heap and use strcpy to copy the literal over to the shared string.
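The pointer adjustment mentioned above can be sketched like this. It only works if the string itself lives inside the shared pool, and it assumes the viewer somehow knows the pool's base address in the native process (the parameter nativeBase is hypothetical, as are the class and method names). Pointers are assumed to be 32-bit, as in the question.

```csharp
using System;
using System.Text;

// Hypothetical helper: translate a native address into an offset in the mapped bytes.
static class SharedPoolReader
{
    public static string ReadCString(byte[] pool, uint nativePointer, uint nativeBase)
    {
        long offset = (long)nativePointer - nativeBase;
        if (offset < 0 || offset >= pool.Length)
            return null; // the pointer refers to memory outside the shared pool

        int start = (int)offset;
        int end = Array.IndexOf(pool, (byte)0, start); // find the NUL terminator
        if (end < 0) end = pool.Length;
        return Encoding.ASCII.GetString(pool, start, end - start);
    }
}
```

A null result here corresponds to the string-literal case described above: the address is valid in the native process but is simply not part of what was mapped.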
I want to get data from an IntPtr pointer into a byte array. I can use the following code to do it:
IntPtr intPtr = GetBuff();
byte[] b = new byte[length];
Marshal.Copy(intPtr, b, 0, length);
But the above code forces a copy operation from IntPtr into the byte array. It is not a good solution when the data in question is large.
Is there any way to cast an IntPtr to a byte array? For example, would the following work:
byte[] b = (byte[])intPtr
This would eliminate the need for the copy operation.
Also: how can we determine the length of data pointed to by IntPtr?
As others have mentioned, there is no way you can store the data in a managed byte[] without copying (with the current structure you've provided*). However, if you don't actually need it to be in a managed buffer, you can use unsafe operations to work directly with the unmanaged memory. It really depends what you need to do with it.
All byte[] and other reference types are managed by the CLR garbage collector, which is responsible for allocating memory and deallocating it when it is no longer used. The memory pointed to by the return value of GetBuff is a block of unmanaged memory allocated by the C++ code and (memory layout and implementation details aside) is essentially completely separate from your GC-managed memory. Therefore, if you want to use a GC-managed CLR type (byte[]) to contain all the data currently held within the unmanaged memory pointed to by your IntPtr, it needs to be moved (copied) into memory that the GC knows about. This can be done by Marshal.Copy or by a custom method using unsafe code or P/Invoke or what have you.
However, it depends what you want to do with it. You've mentioned it's video data. If you want to apply some transform or filter to the data, you can probably do it directly on the unmanaged buffer. If you want to save the buffer to disk, you can probably do it directly on the unmanaged buffer.
On the topic of length, there is no way to know the length of an unmanaged memory buffer unless the function that allocated the buffer also tells you what the length is. This can be done in lots of ways, as commenters have mentioned (first field of the structure, out parameter on the method).
*Finally, if you have control of the C++ code it might be possible to modify it so that it is not responsible for allocating the buffer it writes the data to, and instead is provided with a pointer to a preallocated buffer. You could then create a managed byte[] in C#, preallocated to the size required by your C++ code, and use the GCHandle type to pin it and provide the pointer to your C++ code.
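A minimal sketch of that last idea: allocate the byte[] on the managed side, pin it with GCHandle, and pass the pinned address to the native code to fill. The nativeFill delegate below is a hypothetical stand-in for the real C++ entry point, which the question does not show.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; nativeFill stands in for a P/Invoke call into the C++ code.
static class PinnedBuffer
{
    public static byte[] Fill(int length, Action<IntPtr, int> nativeFill)
    {
        byte[] managed = new byte[length];
        GCHandle handle = GCHandle.Alloc(managed, GCHandleType.Pinned);
        try
        {
            // The native side writes straight into the managed array's memory.
            nativeFill(handle.AddrOfPinnedObject(), length);
        }
        finally
        {
            handle.Free(); // unpin as soon as the native call has returned
        }
        return managed;
    }
}
```

This only holds up if the native code does not keep the pointer after the call returns, since the array is unpinned at that point.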
Try this:
byte* b = (byte*)intPtr;
Requires unsafe (in the function signature, block, or compiler flag /unsafe).
You can't have a managed array occupy unmanaged memory. You can either copy the unmanaged data one chunk at a time, and process each chunk, or create an UnmanagedArray class that takes an IntPtr and provides an indexer which will still use Marshal.Copy for accessing the data.
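One possible shape for such a wrapper, as a sketch only: the class name UnmanagedByteArray is hypothetical, and the indexer uses Marshal.ReadByte and Marshal.WriteByte for single elements while keeping Marshal.Copy for bulk reads. The wrapper does not own the memory and trusts the caller to supply the correct length.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical array-like view over unmanaged memory; it does not free the memory.
sealed class UnmanagedByteArray
{
    private readonly IntPtr _pointer;
    public int Length { get; private set; }

    public UnmanagedByteArray(IntPtr pointer, int length)
    {
        _pointer = pointer;
        Length = length;
    }

    public byte this[int index]
    {
        get { CheckIndex(index); return Marshal.ReadByte(_pointer, index); }
        set { CheckIndex(index); Marshal.WriteByte(_pointer, index, value); }
    }

    // Bulk copy of a range into a managed array, for code that needs a byte[].
    public byte[] CopyRange(int start, int count)
    {
        if (start < 0 || count < 0 || start + count > Length)
            throw new ArgumentOutOfRangeException();
        byte[] chunk = new byte[count];
        Marshal.Copy(IntPtr.Add(_pointer, start), chunk, 0, count);
        return chunk;
    }

    private void CheckIndex(int index)
    {
        if (index < 0 || index >= Length) throw new IndexOutOfRangeException();
    }
}
```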
As @Vinod has pointed out, you can do this with unsafe code. This will allow you to access the memory directly, using C-like pointers. However, you will need to marshal the data into managed memory before you call any unsafe .NET method, so you're pretty much limited to your own C-like code. I don't think you should bother with this at all, just write the code in C++.
Check out this Code Project page for a solution to working with unmanaged arrays.
I have defined a struct in C# to mirror a native data structure and used the StructLayout of Sequential. To transform the struct to the 12 bytes (3x 4 bytes) required by the Socket IOControl method, I am using Marshal.Copy to copy the bytes to an array.
As the struct only contains value types, do I need to pin the structure before I perform the copy? I know the GC compacts the heap and therefore the memory address of reference types can change during a GC. Is the same true for stack-allocated value types?
The current version which includes the pin operation looks like:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct TcpKeepAliveConfiguration
{
public uint DoUseTcpKeepAlives;
public uint IdleTimeMilliseconds;
public uint KeepAlivePacketInterval;
public byte[] ToByteArray()
{
byte[] bytes = new byte[Marshal.SizeOf(typeof(TcpKeepAliveConfiguration))];
GCHandle pinStructure = GCHandle.Alloc(this, GCHandleType.Pinned);
try
{
Marshal.Copy(pinStructure.AddrOfPinnedObject(), bytes, 0, bytes.Length);
return bytes;
}
finally
{
pinStructure.Free();
}
}
}
Any thoughts?
If your structure is captured by, say, a lambda expression, it won't be stored on the stack.
Thus, I'd recommend you always pin the structure before copying.
Eric Lippert wrote an article about value type storage that might interest you.
Frédéric and Aliostad are correct; you do not know where the "this" actually lives, and therefore you don't know whether the garbage collector is allowed to move it or not.
I just want to point out that there is an equivalent solution to your problem that you might find useful. You can also solve your problem with:
public byte[] ToByteArray()
{
    byte[] bytes = new byte[Marshal.SizeOf(typeof(TcpKeepAliveConfiguration))];
    unsafe
    {
        fixed (TcpKeepAliveConfiguration* ptr = &this)
        {
            // now you have pinned "this" and obtained a pointer to it in one step
            Marshal.Copy((IntPtr)ptr, bytes, 0, bytes.Length);
        }
    }
    return bytes;
}
The "fixed" statement ensures that during the body of its block, the unmanaged pointer to "this" is valid because the memory cannot be moved by the garbage collector. Basically it is another way of writing your code; some people find this way a bit easier to read.
(Note that you have to check the "allow unsafe" checkbox in Visual Studio or use the "/unsafe" flag on the command line when you are building code that contains an unsafe context.)
Change the definition to class instead of struct.
Yes you have to do it - and your code looks fine to me.
You cannot know where your structure is going to live. It could be part of another object, and hence located on the heap, or it could be a local variable, in which case it will most likely be on the stack. If it is on the heap, then you need to pin it.
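If pinning feels like too much ceremony for three uint fields, a different technique (not the one in the answers above) is to build the bytes field by field with BitConverter, which needs neither pinning nor unsafe code. This is only a sketch; the struct name TcpKeepAliveValues is hypothetical so that it does not clash with the type in the question. BitConverter uses the machine's native byte order, which matches what a raw memory copy would produce.

```csharp
using System;

// Hypothetical variant of the question's struct that serializes itself without pinning.
struct TcpKeepAliveValues
{
    public uint DoUseTcpKeepAlives;
    public uint IdleTimeMilliseconds;
    public uint KeepAlivePacketInterval;

    public byte[] ToByteArray()
    {
        byte[] bytes = new byte[3 * sizeof(uint)]; // 12 bytes
        BitConverter.GetBytes(DoUseTcpKeepAlives).CopyTo(bytes, 0);
        BitConverter.GetBytes(IdleTimeMilliseconds).CopyTo(bytes, 4);
        BitConverter.GetBytes(KeepAlivePacketInterval).CopyTo(bytes, 8);
        return bytes;
    }
}
```

It allocates three tiny temporary arrays per call, which is unlikely to matter for a socket option that is set once per connection.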
I have a structure that represents a wire format packet. In this structure is an array of other structures. I have generic code that handles this very nicely for most cases but this array of structures case is throwing the marshaller for a loop.
Unsafe code is a no go since I can't get a pointer to a struct with an array (argh!).
I can see from this codeproject article that there is a very nice, generic approach involving C++/CLI that goes something like...
public ref class Reader abstract sealed
{
public:
generic <typename T> where T : value class
static T Read(array<System::Byte>^ data)
{
T value;
pin_ptr<System::Byte> src = &data[0];
pin_ptr<T> dst = &value;
memcpy((void*)dst, (void*)src,
/*System::Runtime::InteropServices::Marshal::SizeOf(T::typeid)*/
sizeof(T));
return value;
}
};
Now if I just had the structure -> byte array / writer version, I'd be set! Thanks in advance!
Using memcpy to copy an array of bytes to a structure is extremely dangerous if you are not controlling the byte packing of the structure. It is safer to marshal and unmarshal a structure one field at a time. Of course, you will lose the generic feature of the sample code you have given.
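Field-at-a-time marshalling could look like the following sketch in C#, using BinaryWriter and BinaryReader (which always use little-endian byte order). The PacketHeader struct and its fields are hypothetical; the question does not show the real packet layout.

```csharp
using System;
using System.IO;

// Hypothetical wire-format struct, for illustration only.
struct PacketHeader
{
    public ushort Id;
    public uint Length;
}

static class PacketHeaderCodec
{
    public static byte[] Write(PacketHeader header)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(header.Id);     // 2 bytes
            writer.Write(header.Length); // 4 bytes
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static PacketHeader Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            PacketHeader header;
            header.Id = reader.ReadUInt16();
            header.Length = reader.ReadUInt32();
            return header;
        }
    }
}
```

The cost is one codec per struct type, but the byte layout no longer depends on how the runtime chooses to pack the fields.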
To answer your real question though (and consider this pseudo code):
public ref class Writer abstract sealed
{
public:
    generic <typename T> where T : value class
    static array<System::Byte>^ Write(T value)
    {
        // Allocate a managed byte array large enough to hold one T.
        array<System::Byte>^ buffer = gcnew array<System::Byte>(sizeof(T));
        pin_ptr<System::Byte> dst = &buffer[0];
        pin_ptr<T> src = &value;
        memcpy((void*)dst, (void*)src,
            /*System::Runtime::InteropServices::Marshal::SizeOf(T::typeid)*/
            sizeof(T));
        return buffer;
    }
};
This is probably not the right way to go. CLR is allowed to add padding, reorder the items and alter the way it's stored in memory.
If you want to do this, be sure to add the [System.Runtime.InteropServices.StructLayout] attribute to force a specific memory layout for the structure. In general, I suggest you not mess with the memory layout of .NET types.
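As a small illustration of what the attribute controls, here is a sketch with a hypothetical WireRecord struct. With LayoutKind.Sequential and Pack = 1, the marshaled layout has no padding between the byte and the int, which Marshal.SizeOf and Marshal.OffsetOf can confirm.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct: one byte followed by one int, packed with no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct WireRecord
{
    public byte Kind;
    public int Value;
}

static class LayoutInfo
{
    // Marshaled size in bytes: 1 + 4 = 5 with Pack = 1.
    public static int Size()
    {
        return Marshal.SizeOf(typeof(WireRecord));
    }

    // Byte offset of the Value field: 1 with Pack = 1.
    public static int ValueOffset()
    {
        return Marshal.OffsetOf(typeof(WireRecord), "Value").ToInt32();
    }
}
```

Without Pack = 1, default packing would align Value on a 4-byte boundary and the size would grow to 8.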
Unsafe code can be made to do this, actually. See my post on reading structs from disk: Reading arrays from files in C# without extra copy.
Not altering the structure is certainly sound advice. I use liberal amounts of StructLayout attributes to specify the packing, layout and character encoding. Everything flows just fine.
My issue is just that I need a performant and preferably generic solution. Performance because this is a server application and generic for elegance. If you look at the codeproject link you'll see that the StructureToPtr and PtrToStructure methods perform on the order of 20 times slower than a simple unsafe pointer cast. This is one of those areas where unsafe code is full of win. C# will only let you have pointers to primitives (and it's not generic - can't get a pointer to a generic), so that's why CLI.
Thanks for the pseudocode, grieve. I'll see if it gets the job done and report back.
Am I missing something? Why not create a new array of the same size and initialise each element separately in a loop?
Using an array of byte data is quite dangerous unless you are targeting one platform only... for example, your method doesn't consider differing endianness between the source and destination arrays.
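One way to take endianness out of the picture, sketched here with hypothetical names, is to write each integer with explicit shifts so the byte order on the wire is fixed (big-endian in this sketch) regardless of the CPU that runs the code.

```csharp
using System;

// Hypothetical helpers that fix the wire byte order to big-endian.
static class BigEndian
{
    public static void WriteUInt32(byte[] buffer, int offset, uint value)
    {
        buffer[offset]     = (byte)(value >> 24); // most significant byte first
        buffer[offset + 1] = (byte)(value >> 16);
        buffer[offset + 2] = (byte)(value >> 8);
        buffer[offset + 3] = (byte)value;
    }

    public static uint ReadUInt32(byte[] buffer, int offset)
    {
        return ((uint)buffer[offset] << 24)
             | ((uint)buffer[offset + 1] << 16)
             | ((uint)buffer[offset + 2] << 8)
             | buffer[offset + 3];
    }
}
```

For example, writing 0x01020304 at offset 0 produces the bytes 01 02 03 04 on any platform, whereas a raw memory copy on a little-endian machine would produce 04 03 02 01.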
Something I don't really understand about your question as well is why having an array as a member in your class is causing a problem. If the class comes from a .NET language you should have no issues, otherwise, you should be able to take the pointer in unsafe code and initialise a new array by going through the elements pointed at one by one (with unsafe code) and adding them to it.