Unsafe method to get pointer to byte array - c#

Is this behaviour valid in C#?
public class MyClass
{
    private byte[] data;
    public MyClass()
    {
        this.data = new byte[1024];
    }
    public unsafe byte* getData()
    {
        byte* result = null;
        fixed (byte* dataPtr = data)
        {
            result = dataPtr;
        }
        return result;
    }
}

If you are going to turn off the safety system then you are responsible for ensuring the memory safety of the program. As soon as you do, you are required to do everything safely without the safety system helping you. That's what "unsafe" means.
As the C# specification clearly says:
the address of a moveable variable can only be obtained using a fixed statement, and that address remains valid only for the duration of that fixed statement.
You are obtaining the address of a moveable variable and then using it after the duration of the fixed statement, so the address is no longer valid. You are therefore specifically required to not do precisely what you are doing.
You should not write any unsafe code until you have a thorough and deep understanding of what the rules you must follow are. Start by reading all of chapter 18 of the specification.
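To make the rule concrete, here is a minimal sketch (not the asker's code) of a MyClass variant that only dereferences the pointer while the array is pinned. Fill and the indexer are hypothetical members added for illustration, and the sketch assumes the project is compiled with unsafe code allowed.

```csharp
public class MyClass
{
    private readonly byte[] data = new byte[1024];

    // Hypothetical replacement for getData(): instead of returning the pointer,
    // all pointer work happens while the fixed statement keeps data pinned.
    public unsafe void Fill(byte value)
    {
        fixed (byte* dataPtr = data)
        {
            for (int i = 0; i < data.Length; i++)
            {
                dataPtr[i] = value; // valid: data cannot move inside this block
            }
        } // data is unpinned here; dataPtr must not be used past this point
    }

    // Hypothetical safe accessor so callers never need a raw pointer.
    public byte this[int index] => data[index];
}
```

Callers then write new MyClass().Fill(42) and read back through the indexer, so no pointer ever outlives the fixed block.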

This code will compile just fine; however, it will lead to runtime issues. The code is essentially smuggling out a pointer to an unpinned object on the heap. The next GC that compacts the heap is free to move the byte[] that data refers to, and any value previously returned from getData will then point to the wrong location.
var obj = new MyClass();
unsafe
{
    byte* pValue = obj.getData();
    // Assuming no GC has happened (bad assumption), this works fine
    *pValue = 42;
    // Assume a GC has now happened and the array behind obj moved around in the heap.
    // The following code is now overwriting memory it simply doesn't own
    *pValue = 42;
}
Did that last line cause the app to crash, overwrite a string value in another type, or simply poke a value into an uninitialized array and just screw up a math problem elsewhere? You have no idea. The best outcome is that the code just crashes quickly, but in all likelihood it will do something far more subtle and evil.

You could use the Marshal.StructureToPtr() method instead of unsafe magic :)
StructureToPtr copies the contents of structure to the pre-allocated block of memory that the ptr parameter points to.
Marshal.StructureToPtr Method (Object, IntPtr, Boolean)
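A minimal sketch of StructureToPtr in use, assuming a hypothetical SamplePoint struct. Note that StructureToPtr marshals structures (or classes with layout); for a plain byte[] like the one in the question, Marshal.Copy is the closer fit.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example struct: with sequential layout, Y sits 4 bytes after X.
[StructLayout(LayoutKind.Sequential)]
public struct SamplePoint
{
    public int X;
    public int Y;
}

public static class StructureToPtrDemo
{
    // Copies p into a freshly allocated unmanaged block, reads Y back, then frees the block.
    public static int CopyAndReadY(SamplePoint p)
    {
        IntPtr ptr = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(SamplePoint)));
        try
        {
            // fDeleteOld is false: the block is brand new, so there is nothing old to free.
            Marshal.StructureToPtr(p, ptr, false);
            return Marshal.ReadInt32(ptr, 4); // offset 4 = field Y
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

The unmanaged block is never touched by the GC, so the pointer stays valid until FreeHGlobal is called.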

This code will not work (it will compile but at runtime it will cause problems). Once the fixed region ends, the data is no longer pinned.

No, once you leave the fixed block, the value of result is no longer valid (it may coincidentally be valid if the GC hasn't run).
The proper way to do this kind of operation is either to allocate the buffer in unmanaged memory in the first place and access it from C# through a pointer, or to copy the managed array into unmanaged memory.
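A sketch of the second option (copying the managed array into unmanaged memory), assuming a hypothetical UnmanagedCopy wrapper. Memory from Marshal.AllocHGlobal is never moved by the GC, so the pointer stays valid until it is freed.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: owns an unmanaged copy of a managed byte[].
public sealed class UnmanagedCopy : IDisposable
{
    public IntPtr Pointer { get; private set; }
    public int Length { get; }

    public UnmanagedCopy(byte[] source)
    {
        Length = source.Length;
        Pointer = Marshal.AllocHGlobal(Length);   // unmanaged: the GC never moves it
        Marshal.Copy(source, 0, Pointer, Length); // managed -> unmanaged copy
    }

    public void Dispose()
    {
        // Guard against a double free if Dispose is called twice.
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
        }
    }
}
```

Inside an unsafe block, (byte*)copy.Pointer can be kept and used for as long as the wrapper is alive. This sketch has no finalizer, so forgetting Dispose leaks the block, and the copy does not see later writes to the original array.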

Related

Stackalloc vs. Fixed sized buffer in C#. What is the difference

As far as I understand, the following code will create an array on the stack:
unsafe struct Foo
{
    public fixed int bar[10];
}
var foo = new Foo();
And a stackalloc expression will do the same thing:
public unsafe void foo(int length)
{
    Span<int> bar = stackalloc int[length];
}
So I'm wondering what the difference between those approaches is. And what is the purpose of fixed-size buffers at all? Everyone talks about a performance boost, but I can't understand why I need them when I can already create an array on the stack with stackalloc. MSDN says that fixed-size buffers are used to "interoperate" with other platforms. So what does this "interoperation" look like?
The fixed keyword here only says that the array is inlined (fixed inside) the struct, meaning the data stored in the array is stored directly in your struct. In your example, the Foo struct would have the size needed to store 10 integer values. A struct used as a local variable is typically allocated on the stack; however, a struct can also live on the heap, for example when it is stored as a field of a reference type.
class Test1
{
    private Foo Foo = new();
}
unsafe struct Foo
{
    public fixed int bar[10];
}
The code above will compile and the private Foo instance will live on the managed heap.
Without the fixed keyword (just a "normal" int[]), the data of the array would not be stored in the struct itself but on the heap; the struct would only hold a reference to that array.
When using stackalloc the data is allocated on the stack and cannot be automatically moved to the heap by the CLR. This means that stackalloc'ed data remains on the stack and the compiler will enforce this by not allowing code like this:
unsafe class Test1
{
    // CS8345: Field or auto-implemented property cannot be of type 'Span<int>' unless it is an instance member of a ref struct.
    private Span<int> mySpan;
    public Test1()
    {
        // CS8353: A result of a stackalloc expression of type 'Span<int>' cannot be used in this context because it may be exposed outside of the containing method
        mySpan = stackalloc int[10];
    }
}
Therefore you use stackalloc when you absolutely want to make sure that your allocated data cannot escape to the heap, which would increase pressure on the garbage collector (it's a performance thing). Fixed-size buffers, on the other hand, are mainly used for interop scenarios with native C/C++ libraries, which may use inlined buffers for one reason or another. When calling native methods that take structs with inlined buffers as parameters, you must be able to recreate that layout in .NET, or else you couldn't easily work with that native code (hence fixed-size buffers exist). Another reason to use fixed is to inline the data in your struct, which can improve caching: the CPU can read all the data in Foo in one go without dereferencing a reference and jumping around memory to reach an array stored elsewhere.
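A small sketch contrasting the two, reusing the Foo struct from the question; Holder and BufferDemo are hypothetical names. It assumes C# 7.2 or later (for stackalloc into Span<T>) and unsafe code enabled.

```csharp
using System;

public unsafe struct Foo
{
    public fixed int bar[10]; // 10 ints stored inline in the struct
}

// Hypothetical reference type: the Foo inside it (and its inline buffer) lives on the heap.
public class Holder
{
    public Foo Inline;
}

public static class BufferDemo
{
    // The whole buffer is part of the struct: 10 * sizeof(int) bytes.
    public static unsafe int InlineSize() => sizeof(Foo);

    // Holder is a movable heap object, so the buffer is pinned while a pointer is in use.
    public static unsafe int WriteAndRead(Holder h)
    {
        fixed (int* p = h.Inline.bar)
        {
            p[3] = 7;
            return p[3];
        }
    }

    // stackalloc memory belongs to this call's stack frame and is discarded on return;
    // using Span<int> lets the compiler stop it from escaping.
    public static int SumFirstN(int n)
    {
        Span<int> tmp = stackalloc int[n];
        int sum = 0;
        for (int i = 0; i < n; i++)
        {
            tmp[i] = i;
            sum += tmp[i];
        }
        return sum;
    }
}
```

With these definitions, InlineSize() is 40 and SumFirstN(10) is 45: the fixed buffer travels with whatever contains the struct, while the stackalloc buffer never leaves SumFirstN.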

Why does stackalloc have to be used as a variable initializer?

I'm writing some unsafe code in C# (follow-up to this question) and I'm wondering: why exactly does the stackalloc keyword have to be used as a variable initializer? For example, this will produce a syntax error:
public unsafe class UnsafeStream
{
    byte* buffer;
    public UnsafeStream(int capacity)
    {
        this.buffer = stackalloc byte[capacity]; // "Invalid expression term 'stackalloc' / ; expected / } expected"
    }
}
But assigning the result from a temporary local will not:
public UnsafeStream(int capacity)
{
    byte* buffer = stackalloc byte[capacity];
    this.buffer = buffer;
}
Why isn't the first version allowed, and what evil things will happen if I attempt the second version?
Your stack is looking something very roughly like this:
[stuff from earlier calls][stuff about where this came from][this][capacity]
                                                                            ^You are here
Then you do stackalloc and this adds two things to the stack, the pointer and the array pointed to:
[stuff from earlier calls][stuff about where this came from][this][capacity][buffer][array pointed to by buffer]
                                                                                                                ^You are here
And then, when you return, the stuff most recently put on the stack (the locals of the current function, its return address, and the stackalloc'd buffer) is all simply ignored (which is one of the advantages of stackalloc: ignoring stuff is fast and easy):
[stuff from earlier calls][stuff about where this came from][this][capacity][buffer][array pointed to by buffer]
                          ^You are here
It can be overwritten by the next method call:
[stuff from earlier calls][stuff about where this came from][this][new local1][new local2]o by buffer]
                                                                                          ^You are here
What you are proposing is that a private field, which is to say part of an object on the heap (a different piece of memory, managed differently), holds a pointer to a buffer that has been half-overwritten by completely different data of different types.
The immediate consequences would be:
Attempts to use buffer are now fraught, because half of it has been overwritten by items, most of which aren't even bytes.
Attempts to use any local are now fraught, because future writes through buffer can overwrite them with random bytes in random places.
And that's just considering the single thread involved here, never mind other threads with separate stacks perhaps being able to access that field.
It's also just not very useful. You can coerce a field to hold an address to somewhere on a stack with enough effort, but there isn't that much good one can do with it.
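If the goal is a stream over stack memory, one safe alternative (not from the answer above, and assuming C# 7.2 or later) is a ref struct, which may hold a Span<byte> field; the compiler then prevents both the span and the struct from escaping to the heap, which is the restriction behind error CS8345. SpanStream and its members are hypothetical names.

```csharp
using System;

// Hypothetical stack-only stream: a ref struct can hold a Span<byte>, and
// instances of it can never be stored in a heap object.
public ref struct SpanStream
{
    private Span<byte> buffer;
    private int position;

    public SpanStream(Span<byte> buffer)
    {
        this.buffer = buffer;
        position = 0;
    }

    public int Position => position;

    // Throws IndexOutOfRangeException once the buffer is full.
    public void WriteByte(byte value)
    {
        buffer[position++] = value;
    }

    public byte ReadAt(int index) => buffer[index];
}
```

The caller does the stackalloc (Span<byte> memory = stackalloc byte[64]; var stream = new SpanStream(memory);), so the buffer lives exactly as long as the caller's stack frame, and the compiler rejects any attempt to keep the stream around longer.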

Cannot declare pointer to non unmanaged type int* C#

unsafe static void Main(string[] args)
{
    int i = 6;
    int* j = &i; //Allowed
    int* k = j; //Allowed
    fixed (int* q = &i) //Allowed
    {
    }
    fixed (int* q = j) //Cannot declare pointer to non unmanaged type int*
    {
    }
}
What I am doing with the 2nd fixed block is pretty much the same as what I am doing with the first fixed block.
I am assigning the address of the variable i to a pointer q. Direct address assignment is permitted, while taking the address into another pointer and using that pointer in the assignment fails. The same steps, however, worked outside of the fixed context. What's going on?!
The fixed statement is used to pin a managed variable, so that the garbage collector won't move it.
In normal operation, the garbage collector is free to move objects around. This would be a problem for pointers because if the garbage collector moved an object you had a pointer to, the pointer would no longer be valid. fixed provides a solution to this by letting you pin the variable, telling the garbage collector that pointers to the object may exist and that it must not move them for the duration of the code block.
The C# compiler only lets you assign a pointer to a managed variable in a fixed statement.
Local variables of value types live on the stack; they are not subject to the garbage collector and will not move in memory, so fixed is both unnecessary and incorrect for such variables. That's why your second (and indeed your first) fixed statement produces an error.
What version of C# are you using? The first fixed statement doesn't compile for me either (You cannot use the fixed statement to take the address of an already fixed expression).
Fixing an unmanaged pointer doesn't make sense. It's already fixed, it can't ever be touched by the GC.
This changes when you make i a member field of a class, for example. Suddenly, it's no longer scoped to the method, and it can be moved by the GC (along with its containing object). In that case, you have to use the fixed statement.
The compiler will not allow you to take a pointer to an unfixed managed variable, and it will not allow you to fix an unmanaged or fixed variable.
In the same way, if you take a pointer to the beginning of an array, e.g., declare int[] i and take &i[0], it again needs to be fixed, because it's no longer guaranteed to be locally scoped. If you do need a locally scoped unmanaged array, you can use the stackalloc keyword, but again, that basically means you're cutting yourself off from the relative safety of managed .NET.
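A sketch of the rules described above; Counter and FixedRules are hypothetical names, and unsafe code is assumed to be enabled. A local needs no pinning, while a field of a heap object or an array element must be fixed before its address is taken.

```csharp
// Hypothetical class: its field lives inside a movable heap object.
public class Counter
{
    public int Value;
}

public static class FixedRules
{
    public static unsafe int Demo()
    {
        int i = 6;
        int* j = &i;                    // local on the stack: never moved, no fixed needed

        var counter = new Counter();
        fixed (int* q = &counter.Value) // field of a heap object: must be pinned
        {
            *q = *j + 1;
        }

        int[] array = { 1, 2, 3 };
        int sum = 0;
        fixed (int* p = array)          // same as &array[0]; arrays are heap objects too
        {
            for (int k = 0; k < array.Length; k++) sum += p[k];
        }

        return counter.Value + sum;     // 7 + 6
    }
}
```

Writing fixed (int* q = &i) or fixed (int* q = j) inside Demo would be rejected by the compiler, since neither expression refers to movable memory.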

Perversion with unsafe C#, memory stack allocation

Here I'm trying to work with unsafe features of C#: http://ideone.com/L9uwZ5
I know that this is the worst way to do things in C#, and I admit as much in the topic title: look at the word "perversion".
I would like to implement quicksort in C# the pure-C way (not even C++). It may be crazy, but I just want to look deeply at the possibilities of unsafe C#.
I was trying to use the stackalloc operator throughout. I know that it allocates from the stack, not from the heap, and that's why my program fails when it runs.
But I was confused that I didn't see any exception or error from this program.
Why didn't I get any explicit exceptions/errors?
Also, consider the commented-out part of the code:
struct Header
{
    internal int* data;
};
Header* object_header = stackalloc Header[sizeof(Header)];
object_header->data = stackalloc int[length];
It doesn't compile because of the last line: the C# compiler says that stackalloc can't be used in this expression. Why? data is of type int*, so why does the error occur here?
I just want to use the stack frame, not the heap.
I know there is another way, but it allocates from the heap:
IntPtr result = Marshal.AllocHGlobal(length * sizeof(int)); // unmanaged heap, not the stack
int* data = (int*)result;
for (int i = 0; i < length; i++) data[i] = 0;
// ... use data, then release it with Marshal.FreeHGlobal(result)
That works, but it's not stack allocation.
How can I solve my perverse task using explicit stack allocation and pure-C-style syntax in the C# language?
I know that C# wasn't created for such aims and that such features are silly here, but the question is not about usefulness; it's about the features themselves.
Marc showed the workaround; I'll try to explain why it is required. You are writing, in effect, unmanaged code, but the method is still very much a managed method. It gets compiled from IL into machine code, and its stack frame and CPU registers will be searched by the garbage collector for object references.
The jitter performs two important duties when it compiles a method. One is obvious and highly visible: translating the IL to machine code. But there's another very important task that is completely invisible: it generates metadata for the method. This is a table that shows which parts of the stack frame contain object references and which parts store pointers and value-type values, at which points in the code a CPU register will store an object reference, and at what point in the method code an object reference goes out of scope. That last part is the reason for GC.KeepAlive(), a pretty unique method that generates no code at all.
The garbage collector needs that table to reliably find object references. This table however has only one level of indirection. It can describe the stack space allocated for object_header and mark the pointer and the pointed-to stack area as "do not scan for object references". It cannot describe the chunk of stack space when you directly assign object_header->data. It doesn't have the extra indirection to sub-divide the stack into smaller sections and describe Header. Using the dummy local variable solves the problem.
stackalloc wants to be assigned to a local variable. The following works, but you would have to be really careful to unassign it before leaving the method: if you leave object_header->data pointing to a location on the stack, bad things happen:
int* ptr = stackalloc int[length];
object_header->data = ptr;
The fact that it must be assigned to a local variable is explicit in the specification:
local-variable-initializer:
…
stackalloc-initializer
stackalloc-initializer:
stackalloc unmanaged-type [ expression ]
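Putting the workaround together, here is a sketch of the "pure-C style" quicksort the question asks for: the stackalloc initializes a local, the pointer is then stored in a Header struct that is itself a plain local, and nothing refers to the buffer after the method returns. StackSort, SortedCopy and the extra length field are hypothetical, and the partition scheme (Lomuto) is just one common choice. Large inputs can overflow the stack, since stackalloc memory is limited by the thread's stack size.

```csharp
public static unsafe class StackSort
{
    // Mirrors the question's Header: a struct holding a raw pointer.
    private struct Header
    {
        internal int* data;
        internal int length;
    }

    // Sorts a copy of values in stack memory and returns it as a new managed array.
    public static int[] SortedCopy(int[] values)
    {
        Header header;                           // a plain local, so it lives on the stack
        int* ptr = stackalloc int[values.Length]; // stackalloc must initialize a local...
        header.data = ptr;                       // ...which can then be stored in the struct
        header.length = values.Length;

        for (int i = 0; i < header.length; i++) header.data[i] = values[i];
        QuickSort(header.data, 0, header.length - 1);

        var result = new int[header.length];
        for (int i = 0; i < header.length; i++) result[i] = header.data[i];
        return result; // header.data dies with this frame and is never exposed
    }

    // C-style quicksort over a raw pointer (Lomuto partition, last element as pivot).
    private static void QuickSort(int* a, int lo, int hi)
    {
        if (lo >= hi) return;
        int pivot = a[hi];
        int store = lo;
        for (int j = lo; j < hi; j++)
        {
            if (a[j] < pivot)
            {
                int tmp = a[store]; a[store] = a[j]; a[j] = tmp;
                store++;
            }
        }
        int t = a[store]; a[store] = a[hi]; a[hi] = t;
        QuickSort(a, lo, store - 1);
        QuickSort(a, store + 1, hi);
    }
}
```

The copy into a managed array at the end is what makes this safe: only heap data leaves SortedCopy, never a pointer into its frame.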

Confusion on whether to use fixed with unsafe code and stackalloc

I have a block of code below with a single line commented out. What happens in the CreateArray method is the same thing that the commented-out line does. My question is: why does it work when the line b->ArrayItems = d is uncommented, but return garbage when it is commented out? I don't think I need a fixed statement anywhere, because all of the information is unmanaged. Is this assumption incorrect?
class Program
{
    unsafe static void Main(string[] args)
    {
        someInstance* b = stackalloc someInstance[1];
        someInstance* d = stackalloc someInstance[8];
        b->CreateArray();
        // b->ArrayItems = d;
        *(b->ArrayItems)++ = new someInstance() { IntConstant = 5 };
        *(b->ArrayItems)++ = new someInstance() { IntConstant = 6 };
        Console.WriteLine((b)->ArrayItems->IntConstant);
        Console.WriteLine(((b)->ArrayItems - 1)->IntConstant);
        Console.WriteLine(((b)->ArrayItems - 2)->IntConstant);
        Console.Read();
    }
}
public unsafe struct someInstance
{
    public someInstance* ArrayItems;
    public int IntConstant;
    public void CreateArray()
    {
        someInstance* d = stackalloc someInstance[8];
        ArrayItems = d;
    }
}
My question is why does it work when the line is uncommented, but return garbage when commented out.
The commented line is what is masking the bug caused by CreateArray. Commenting it out exposes the bug. But the bug is there regardless.
As the specification clearly states:
All stack allocated memory blocks created during the execution of a function member are automatically discarded when that function member returns.
The CreateArray function allocates a block, you store a pointer to the block, the block is discarded, and now you have a pointer to a garbage block. You are required to never store a pointer to a stackalloc'd block such that the storage can be accessed after the block becomes invalid. Heap allocate the block if you need to store a reference to it, and remember to deallocate it when you're done.
Remember, in unsafe code you are required to fully understand everything about the managed memory model. Everything. If you don't understand everything about managed memory, don't write unsafe code.
That said, let's address what seems to be your larger confusion, which is "when do you have to fix memory to obtain a pointer?" The answer is simple. You have to fix memory if and only if it is movable memory. Fixing transforms movable memory into immovable memory; that's what fixing is for.
You can only take the address of something that is immovable; if you take the address of something that is movable and it moves then obviously the address is wrong. You are required to ensure that memory will not move before you take its address, and you are required to ensure that you do not use the address after it becomes movable again.
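Following the advice to heap allocate the block, here is a sketch of the question's struct (renamed SomeInstance) whose CreateArray takes its memory from the unmanaged heap via Marshal.AllocHGlobal instead of stackalloc. The count parameter and FreeArray are hypothetical additions, and the caller must free the block exactly once.

```csharp
using System;
using System.Runtime.InteropServices;

public unsafe struct SomeInstance
{
    public SomeInstance* ArrayItems;
    public int IntConstant;

    // Unmanaged heap memory survives after CreateArray returns (unlike stackalloc),
    // and the GC never moves it, so no fixed statement is needed.
    public void CreateArray(int count)
    {
        ArrayItems = (SomeInstance*)Marshal.AllocHGlobal(sizeof(SomeInstance) * count);
    }

    // Hypothetical cleanup: must receive the original pointer, so never free after incrementing it.
    public void FreeArray()
    {
        Marshal.FreeHGlobal((IntPtr)ArrayItems);
        ArrayItems = null;
    }
}
```

Because FreeArray needs the original pointer, index into the block (items[0], items[1]) rather than incrementing ArrayItems as the question's Main does. The block is not zeroed, so write each item before reading it.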
Your assumption is partially correct, but understood incorrectly. Here's a quote from this MSDN page:
In unsafe mode, you can allocate memory on the stack, where it is not subject to garbage collection and therefore does not need to be pinned. See stackalloc for more information.
Some statements allocate variables on the stack automatically (i.e., value-type locals inside a method); others need to request stack memory explicitly using stackalloc.
Stack-allocated memory is discarded after the method ends, hence your issue (see Eric Lippert's answer; he wrote it before me).
stackalloc allocates some space on the call stack, and that space is lost when you move up out of the current level of context (for example, when leaving a method). Your problem is that when the stackalloc is inside a method, that area of the stack is no longer yours to play with once you leave that method.
So, if you do this:
unsafe void foo()
{
    byte* stuff = stackalloc byte[1];
    // Do something with stuff
}
"stuff" is only valid inside foo; once you leave foo, the stack is unwound, which means that if you do this:
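unsafe void foo()
{
    byte* allocate()
    {
        byte* p = stackalloc byte[1];
        return p; // compiles, but p points into allocate's discarded stack frame
    }
    byte* stuff = allocate();
    // do something with stuff: undefined behaviour
}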
then the return value of allocate becomes rubbish when you leave the allocate method, which means that "stuff" never makes any sense.
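The usual fix is to let the caller own the stackalloc and hand the memory down to the callee. A minimal sketch, assuming C# 7.2 or later for Span<byte>; CallerAllocates and FillWithIndex are hypothetical names.

```csharp
using System;

public static class CallerAllocates
{
    // The callee only fills memory it is handed; it never allocates with stackalloc itself.
    private static void FillWithIndex(Span<byte> stuff)
    {
        for (int i = 0; i < stuff.Length; i++) stuff[i] = (byte)i;
    }

    public static int Foo()
    {
        Span<byte> stuff = stackalloc byte[4]; // owned by Foo's frame, valid until Foo returns
        FillWithIndex(stuff);
        int total = 0;
        foreach (byte b in stuff) total += b;
        return total; // 0 + 1 + 2 + 3
    }
}
```

Using Span<byte> instead of byte* also means the compiler rejects an attempt to return the stackalloc'd span from a method like allocate, turning the bug above into a compile-time error.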
