.NET Unsafe string manipulation

I use the following unsafe code to modify a string:
public static unsafe void RemoveLastOne(ref string Str1)
{
if (Str1.Length < 1)
return;
int len = Str1.Length - 1;
fixed (char* pCh1 = Str1)
{
int* pChi1 = (int*)pCh1;
pCh1[len] = '\0';
pChi1[-1] = len;
}
}
But some time later my C# program crashes with this exception:
FatalExecutionEngineError:
"The runtime has encountered a fatal error.
The address of the error was at 0x6e9a80d9, on thread 0xcfc. The error
code is 0xc0000005. This error may be a bug in the CLR or in the
unsafe or non-verifiable portions of user code. Common sources of this
bug include user marshaling errors for COM-interop or PInvoke, which
may corrupt the stack."
If I replace the body of RemoveLastOne with "Str1 = Str1.Remove(Str1.Length - 1);", the program works fine.
Why does the exception happen? And how can I correctly change a string with unsafe code in C#?

String values in .Net are intended to be immutable. In this function you are taking an immutable value, mutating it in several visible ways (content and length) not to mention writing data before the original. I'm not surprised at all that this would result in a later CLR crash as it special cases String values in several places and writing before the pointer is simply dangerous.
I can't really see a reason why you'd want to do the unsafe manipulation here. The safe code is straightforward and won't cause these kinds of hard-to-track-down bugs.

Unsafe string manipulation is inherently incorrect. .NET strings aren't supposed to be edited, and it's most likely that there is code in the framework that is built around the assumption that a string will never change. Anything that relies on String.GetHashCode() comes immediately to mind, but there might be behind-the-scenes optimizations or sanity checks. Presumably it's something like that which is causing the CLR error.
If you're finding, after profiling, that .NET's immutable string implementation does not fit your needs, the easiest mutable alternative that would let you modify its length is a List<char>.
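For the specific task in the question, the safe version is short. The following is a minimal sketch, not the asker's code: the class name SafeStringEdits is made up for illustration, and it uses StringBuilder (whose Length property is settable) rather than the List<char> suggested above.

```csharp
using System;
using System.Text;

public static class SafeStringEdits
{
    // Returns a new string without the last character; the original is untouched.
    public static string RemoveLastOne(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;
        return value.Substring(0, value.Length - 1);
    }

    // StringBuilder is mutable: shrinking Length truncates the buffer in place.
    public static void RemoveLastOne(StringBuilder builder)
    {
        if (builder.Length > 0)
            builder.Length--;
    }
}
```

The first overload suits occasional edits. The second avoids allocating a new string per edit when many characters are removed in a row.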

Related

Is it legal to modify strings?

Using a fixed statement one can have a pointer to a string. Using that pointer they can modify the string. But is it legally allowed in C# documentation?
using System;
class Program
{
static void Main()
{
string s = "hello";
unsafe
{
fixed (char* p = s)
{
p[1] = 'u';
}
}
Console.WriteLine("hello");
Console.Write("hello" + "\n");
Console.ReadKey();
}
}
// hullo
// hello
The above program modifies a string literal.

Per the language specification:
Modifying objects of managed type through fixed pointers can results [sic] in undefined behavior. For example, because strings are immutable, it is the programmer's responsibility to ensure that the characters referenced by a pointer to a fixed string are not modified.
(My emphasis)
So, it's explicitly contemplated within the language and you're not meant to do it, but it's your responsibility, not the compiler's.

"Legal" may be the wrong word to use here. "Incorrect" is what I'd say. It's possible, but strings are defined as immutable in C#. By mutating one anyway you're violating class invariants. The runtime may react to it any way it pleases, including "apparently working", "falling over" or "stealing your credit card info to buy tacos"*. The whole point of the unsafe keyword is that you introduce a section of your code where you say "OK I know you can't show this is safe, but trust me I know what I'm doing and it totally is".
*: The more likely risk in this particular case is that somewhere between the compiler and runtime multiple stages would be perfectly justified in inlining and constant folding accesses to string literals, but not others depending on slight variations in the code, meaning you could get inconsistent results at runtime. The bottom line is, don't do this.
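If the goal is simply a string with one character changed, editing a private copy avoids touching the literal at all. A minimal sketch (StringCopyEdit and ReplaceCharAt are illustrative names, not framework APIs):

```csharp
using System;

public static class StringCopyEdit
{
    // Returns a copy of 'source' with the character at 'index' replaced.
    public static string ReplaceCharAt(string source, int index, char replacement)
    {
        char[] chars = source.ToCharArray(); // private, mutable copy of the characters
        chars[index] = replacement;          // throws IndexOutOfRangeException for a bad index
        return new string(chars);            // new immutable string; 'source' is unchanged
    }
}
```

Because the original string is never written to, every other use of the "hello" literal in the program keeps its value.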

Legal is such a strong word, but yes, you can. I'll add one more thing: don't use it unless it's absolutely necessary.

Accommodating nested unsafe structs in C#

What is the best way to accommodate the following:
Real time, performance critical application that interfaces with a native C dll for communicating with a proprietary back end.
The native api has hundreds upon hundreds of structs, nested structs and methods that pass data back and forth via these structs.
I want to use C# for the logic, so I decided on unsafe C# in favor of CLI and marshaling. I know how to do it that way and have already implemented it via the latter, so please don't reply "use CLI". Marshaling hundreds of structs a hundred times a second introduces a significant enough delay that it warranted investigating unsafe C#.
Most of the C structs contain dozens of fields, so I am looking for a method that needs minimal typing for each. At this point I have it down to running a VS macro that converts each line element to its C# equivalent, setting arrays to fixed size when necessary. This works pretty well until I hit a nested struct array. So for example, I have these 2 structs:
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct User{
    int id;
    fixed char name[12];
}
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
    fixed char name[12];
    fixed User users[512]; // does not compile: fixed buffers only allow primitive element types
    int somethingElse;
    fixed char anotherThing[16];
}
What is the best way to accommodate fixed User users[512] so that to not have to do much during run time?
I have seen examples where the suggestion is to do
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
    fixed char name[12];
    User users_1;
    User users_2;
    ...
    User users_511;
    int somethingElse;
    fixed char anotherThing[16];
}
Another idea has been, to compute the size of User in bytes and just do this
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
    fixed char name[12];
    fixed byte Users[28*512];
    int somethingElse;
    fixed char anotherThing[16];
}
But that would mean I would have to treat this struct specially every time I need to use it, or wrap it with some other code. There are enough of those in the API that I would like to avoid this approach, but if someone can demonstrate an elegant way, that could work as well.
A third approach, which eludes me enough that I can't produce an example (I think I saw it somewhere but can't find it anymore), is to specify a size for User or somehow make it strictly sized so that the "fixed" keyword could be used on it.
Can anyone recommend a reasonable approach that they have utilized and scales well under load?

The best way I could find to handle nested structs in unsafe structs is to define them as fixed byte arrays and then provide a runtime conversion property for the field. For example:
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
    fixed char name[12];
    fixed User users[512]; // the declaration we would like to write
    int somethingElse;
    fixed char anotherThing[16];
}
Turns into:
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
    fixed char name[12];
    fixed byte users[512 * Constants.SizeOfUser];
    int somethingElse;
    fixed char anotherThing[16];
    public User[] Users
    {
        get
        {
            // Managed array that receives a copy of the raw bytes.
            var retArr = new User[512];
            fixed(User* retArrRef = retArr){
                fixed(byte* usersFixed = users){
                    Memory.Copy(usersFixed, retArrRef, 512 * Constants.SizeOfUser);
                }
            }
            return retArr;
        }
    }
}
Please note that this code uses the Memory.Copy function provided here: http://msdn.microsoft.com/en-us/library/aa664786(v=vs.71).aspx
The getter works as follows:
1. allocate a managed array for the return value
2. get and fix an unsafe pointer to it
3. get and fix an unsafe pointer to the byte array in the struct
4. copy the memory from one to the other
The managed array is not stored back into the struct itself because that would change the struct's layout, and it would no longer translate correctly, whereas the property causes no issue when the struct comes from unmanaged code. Alternatively, this could be wrapped in another managed object that does the storing.
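When only a few elements are needed per call, copying all 512 entries may be more work than necessary. The sketch below is a variation on the approach above, not the answerer's code: it reinterprets the byte buffer as User* and copies a single element. GetUser, SetUser, SizeOfUser and UserCount are names introduced here for illustration, and the value 28 assumes the Pack=1 layout shown in the question (4 bytes for the int plus 12 two-byte chars).

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct User
{
    public int id;
    public fixed char name[12];
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct UserGroup
{
    // Assumed size of User in bytes. A fixed buffer length must be a
    // compile-time constant, so sizeof(User) cannot be used in the declaration.
    public const int SizeOfUser = 28;
    public const int UserCount = 512;

    public fixed char name[12];
    public fixed byte users[UserCount * SizeOfUser];
    public int somethingElse;
    public fixed char anotherThing[16];

    // Copies out one User instead of the whole array.
    public User GetUser(int index)
    {
        if ((uint)index >= UserCount)
            throw new ArgumentOutOfRangeException(nameof(index));
        fixed (byte* p = users)
        {
            return ((User*)p)[index];
        }
    }

    // Writes one User back into the raw buffer.
    public void SetUser(int index, User value)
    {
        if ((uint)index >= UserCount)
            throw new ArgumentOutOfRangeException(nameof(index));
        fixed (byte* p = users)
        {
            ((User*)p)[index] = value;
        }
    }
}
```

It is worth checking once at startup that sizeof(User) really equals SizeOfUser, since the constant is maintained by hand.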

GCHandle.FromIntPointer does not work as expected

Here's a very simple (complete) program for exercising the use of GCHandle.FromIntPtr:
using System;
using System.Runtime.InteropServices;
namespace GCHandleBugTest
{
class Program
{
static void Main(string[] args)
{
int[] arr = new int[10];
GCHandle handle = GCHandle.Alloc(arr, GCHandleType.Pinned);
IntPtr pointer = handle.AddrOfPinnedObject();
GCHandle handle2 = GCHandle.FromIntPtr(pointer);
}
}
}
Note that this program is essentially a transliteration of the procedure described in English in CLR via C# (4e) on page 547. Running it, however, results in an unmanaged exception like:
Additional Information: The runtime has encountered a fatal error. The address of the error was at 0x210bc39b, on thread 0x21bc. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
Thinking that this might be a bug in .NET 4.5, and since I don't see anything obviously wrong, I tried exactly the same program in Mono on Linux (v2.10.8.1). I got the slightly more informative but still puzzling exception GCHandle value belongs to a different domain.
As far as I am aware, the handle really does belong to the same AppDomain as the code where I call GCHandle.FromIntPtr. But the fact that I see an exception in both implementations makes me suspect that I am missing some important detail. What's going on here?

You've got the wrong mental model. FromIntPtr() can only convert back the value you got from ToIntPtr(). They are convenience methods, handy in particular to store a reference to a managed object (and keep it alive) in unmanaged code. The gcroot<> template class relies on it, used in C++ projects. It is convenient because the unmanaged code only has to store the pointer.
The underlying value, the actual pointer, is called a "handle" but it is really a pointer into a table that the garbage collector maintains. The table creates extra references to objects, in addition to the ones that the garbage collector finds. In essence, this allows a managed object to survive even though the program no longer has a valid reference to it.
GCHandle.AddrOfPinnedObject() returns a completely different pointer, it points to the actual managed object, not the "handle". The "belongs to a different domain" exception message is understandable since the table I mentioned is associated with an AppDomain.
The crash in .NET 4.5 strongly looks like a bug. It does perform a test with an internal CLR function called MarshalNative::GCHandleInternalCheckDomain(). The v2 version of the CLR raises an ArgumentException with the message text "Cannot pass a GCHandle across AppDomains.". But the v4 version crashes inside this method which in turn generates the ExecutionEngineException. This does not look intentional.
Feedback report filed at connect.microsoft.com

AddrOfPinnedObject is not the opposite of FromIntPtr. You want ToIntPtr instead:
IntPtr pointer = GCHandle.ToIntPtr (handle);
GCHandle handle2 = GCHandle.FromIntPtr (pointer);
FromIntPtr does not take the address of the object. It takes an opaque value (which happens to be typed as IntPtr) that you obtain from ToIntPtr, a static method, and that identifies the handle.
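The difference between the two IntPtr values can be seen side by side in a small sketch (the class and method names are illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

public static class GCHandleRoundTrip
{
    // Returns true when the handle rebuilt from the opaque value refers to the same array.
    public static bool RoundTrip(int[] arr)
    {
        GCHandle handle = GCHandle.Alloc(arr, GCHandleType.Pinned);
        try
        {
            // Opaque handle value: the only kind of IntPtr that FromIntPtr accepts.
            IntPtr opaque = GCHandle.ToIntPtr(handle);

            // Address of the array data: meant for native code, not for FromIntPtr.
            IntPtr dataAddress = handle.AddrOfPinnedObject();

            GCHandle handle2 = GCHandle.FromIntPtr(opaque);
            return ReferenceEquals(handle2.Target, arr) && dataAddress != IntPtr.Zero;
        }
        finally
        {
            // handle2 is a copy of the same handle, so it is freed exactly once.
            handle.Free();
        }
    }
}
```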

Are there memory security levels in .NET interop?

I have quite a strange problem:
I am testing several function calls to an unmanaged C DLL with NUnit. The odd thing is that the test fails when it runs normally, but when I run it with the debugger (even with no breakpoint) it passes fine.
So, does the debugger have wider memory access than the plain NUnit application?
I have isolated the call that fails. It passes back a char pointer to a string, which the marshaller should convert to a C# string. The C side looks like this:
#define get_symbol(a) ((a).a_w.w_symbol->s_name)
EXTERN char *atom_get_symbol(t_atom *a);
...
char *atom_get_symbol(t_atom *a) {
return get_symbol(*a);
}
and the C# code:
[DllImport("csharp.dll", EntryPoint="atom_get_symbol")]
[return:MarshalAs(UnmanagedType.LPStr)]
private static extern string atom_get_symbol(IntPtr a);
The pointer that is returned from C is quite deep inside the code and part of a list. So am I just missing some security setting?
EDIT: here is the exception i get:
System.AccessViolationException : (translated to english:) there was an attempt to read or write protected memory. this might be an indication that other memory is corrupt.
at Microsoft.Win32.Win32Native.CoTaskMemFree(IntPtr ptr)
at ....atom_get_symbol(IntPtr a)
SOLUTION:
The problem was that the marshaller wanted to free the memory, which was part of a C struct. It should just make a copy of the string and leave the memory as is:
[DllImport("csharp.dll", EntryPoint="atom_get_symbol")]
private static extern IntPtr atom_get_symbol(IntPtr a);
and then in the code get a copy of the string with:
var symbol = Marshal.PtrToStringAnsi(atom_get_symbol(ptrToStruct));
great!

This will always cause a crash on Vista and up, how you avoided it at all isn't very clear. The stack trace tells the tale, the pinvoke marshaller is trying to release the string buffer that was allocated for the string. It always uses CoTaskMemFree() to do so, the only reasonable guess at an allocator that might have been used to allocate the memory for the string. But that rarely works out well, C or C++ code almost always uses the CRT's private heap. This doesn't crash on XP, it has a much more forgiving memory manager. Which produces undiagnosable memory leaks.
Notable is that the C declaration doesn't give much promise that you can pinvoke the function, it doesn't return a const char*. The only hope you have is to declare the return type as IntPtr instead of string so the pinvoke marshaller doesn't try to release the pointed-to memory. You'll need to use Marshal.PtrToStringAnsi() to convert the returned IntPtr to a string.
You'll need to test the heck out of it, call the function a billion times to ensure that you don't leak memory. If that test crashes with an OutOfMemoryException then you have a big problem. The only alternative then is to write a wrapper in the C++/CLI language and make sure that it uses the exact same version of the CRT as the native code so that they both use the same heap. Which is tricky and impossible if you don't have the source code. This function is just plain difficult to call from any language, including C. It should have been declared as int atom_get_symbol(t_atom* a, char* buf, size_t buflen) so it can be called with a buffer that's allocated by the client code.
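The copy-without-free behaviour of Marshal.PtrToStringAnsi can be tried without the native DLL. In this sketch, memory from Marshal.StringToHGlobalAnsi stands in for the buffer owned by the C library, and NativeStringCopy is an illustrative name:

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeStringCopy
{
    // Copies a NUL-terminated ANSI string out of native memory without freeing it.
    public static string CopyAnsi(IntPtr nativeString)
    {
        return nativeString == IntPtr.Zero ? null : Marshal.PtrToStringAnsi(nativeString);
    }

    public static string Demo()
    {
        // Stand-in for memory that belongs to the native library.
        IntPtr owned = Marshal.StringToHGlobalAnsi("symbol");
        try
        {
            // The managed string is a copy; 'owned' is still valid afterwards.
            return CopyAnsi(owned);
        }
        finally
        {
            // The owner of the buffer releases it, not the marshaller.
            Marshal.FreeHGlobal(owned);
        }
    }
}
```

With the real atom_get_symbol there is no free call on the managed side at all, because the string stays part of the C struct.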

Unsafe method to get pointer to byte array

Is this behaviour valid in C#?
public class MyClass
{
private byte[] data;
public MyClass()
{
this.data = new byte[1024];
}
public unsafe byte* getData()
{
byte* result = null;
fixed (byte* dataPtr = data)
{
result = dataPtr;
}
return result;
}
}

If you are going to turn off the safety system then you are responsible for ensuring the memory safety of the program. As soon as you do, you are required to do everything safely without the safety system helping you. That's what "unsafe" means.
As the C# specification clearly says:
the address of a moveable variable can only be obtained using a fixed statement, and that address remains valid only for the duration of that fixed statement.
You are obtaining the address of a moveable variable and then using it after the duration of the fixed statement, so the address is no longer valid. You are therefore specifically required to not do precisely what you are doing.
You should not write any unsafe code until you have a thorough and deep understanding of what the rules you must follow are. Start by reading all of chapter 18 of the specification.

This code will compile just fine; however, it will lead to runtime issues. The code is essentially smuggling out a pointer to an unpinned object in the heap. The next GC that moves the data array will leave every pointer previously returned from getData pointing at the wrong location.
var obj = new MyClass();
unsafe
{
    byte* pValue = obj.getData();
    // Assuming no GC has happened (bad assumption) then this works fine
    *pValue = 42;
    // Assume a GC has now happened and the array moved around in the heap. The
    // following code is now overwriting memory it simply doesn't own
    *pValue = 42;
}
Did that last line cause the app to crash, overwrite a string value in another type, or simply poke a value into an uninitialized array and screw up a math problem elsewhere? You have no idea. The best outcome is that the code just crashes quickly, but in all likelihood it will do something far more subtle and evil.

You could use the Marshal.StructureToPtr() method instead of unsafe magic :)
StructureToPtr copies the contents of structure to the pre-allocated
block of memory that the ptr parameter points to.
Marshal.StructureToPtr Method (Object, IntPtr, Boolean)

This code will not work (it will compile but at runtime it will cause problems). Once the fixed region ends, the data is no longer pinned.

No, once you leave the fixed block, the value of result is no longer valid (it may coincidentally be valid if the GC hasn't run).
The proper way to do this kind of operation is either to keep the byte[] in unmanaged memory and access it through C# code, or to copy the managed array into unmanaged memory.
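A third option, not mentioned in the answers above, is to pin the array for as long as the pointer is needed by holding a pinned GCHandle. The following is only a sketch: PinnedBuffer is a hypothetical class, and long-lived pins can fragment the GC heap, so it suits a few long-lived buffers rather than many short-lived ones.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class PinnedBuffer : IDisposable
{
    private readonly byte[] data;
    private GCHandle handle;

    public PinnedBuffer(int size)
    {
        data = new byte[size];
        // The array cannot be moved by the GC until the handle is freed.
        handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    }

    // Address of the first byte; valid only until Dispose is called.
    public IntPtr Address
    {
        get
        {
            if (!handle.IsAllocated)
                throw new ObjectDisposedException(nameof(PinnedBuffer));
            return handle.AddrOfPinnedObject();
        }
    }

    // Managed view of the same memory.
    public byte this[int index] => data[index];

    // Unpins the array. There is no finalizer in this sketch, so a buffer
    // that is never disposed stays pinned.
    public void Dispose()
    {
        if (handle.IsAllocated)
            handle.Free();
    }
}
```

Typical use is inside a using block, so the pointer's lifetime is visibly bounded by the scope that owns the buffer.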
