Copy string to memory buffer in C#

What is the best way to copy a string into a raw memory buffer in C#?
Note that I already know how and when to use String and StringBuilder, so don't suggest that ;) - I'm going to need some text processing & rendering code and currently it looks like a memory buffer is both easier to code and more performant, as long as I can get the data into it. (I'm thinking of B-tree editor buffers and memory mapped files, something which doesn't map well into managed C# objects but is easily coded with pointers.)
Things I already considered:
C++/CLI can do this: there is PtrToStringChars in vcclr.h, whose result can then be passed to memcpy. But I usually prefer having only one assembly, and merging the IL from multiple languages is something I like to avoid. Is there any way to rewrite that function in C#?
System.Runtime.InteropServices.Marshal has functions which copy the string, but only to a newly allocated buffer. Couldn't find any function to copy into an existing buffer.
I could use String.CopyTo and use an array instead of a memory buffer, but then I need to pin that buffer a lot (or keep it pinned all the time) which is going to be bad for GC. (By using a memory buffer in the first place I can allocate it outside the managed heap so it doesn't mess with the GC.)
If there's a way to pin or copy a StringBuilder then that would probably work too. My text usually comes from either a file or a StringBuilder, so if I can already move it into the memory buffer at that point it never needs to go through a String instance. (Note that going from StringBuilder to String doesn't matter for performance because this is optimized to not make a copy if you stop using the StringBuilder afterwards.)
Can I generate IL which pins a String or StringBuilder? Then instead of writing the copy function in C#, I could generate a DynamicMethod by emitting the required IL. I just thought of this while writing the question, so I might try to disassemble the C++/CLI approach and reproduce the IL.

Enable unsafe code (somewhere in the project options), then use:
unsafe
{
    fixed (char* pc = myString)
    {
        // pc now points at the string's first character
    }
}
and then just use low-level memory copies.
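A minimal sketch of such a copy, combining fixed with Buffer.MemoryCopy to move a string's characters into unmanaged memory allocated via Marshal.AllocHGlobal (the string and buffer size here are illustrative, not part of the question):

```csharp
using System;
using System.Runtime.InteropServices;

class StringToBuffer
{
    static unsafe void Main()
    {
        string text = "hello, buffer";
        int byteCount = text.Length * sizeof(char); // UTF-16: 2 bytes per char

        // Allocate a raw buffer outside the managed heap.
        IntPtr buffer = Marshal.AllocHGlobal(byteCount);
        try
        {
            // Pin the string only for the duration of the copy.
            fixed (char* src = text)
            {
                Buffer.MemoryCopy(src, (void*)buffer, byteCount, byteCount);
            }

            // Read it back to show the characters arrived intact.
            char* dst = (char*)buffer;
            Console.WriteLine(new string(dst, 0, text.Length)); // prints "hello, buffer"
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

Compile with /unsafe. Because the pin only lasts for the copy itself, the GC impact is minimal.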


Mutable String in unmanaged memory useable in managed space

NOTE: My case is in the ecosystem of an old API that only works with strings, no modern .NET additions.
So I have a strong need for a mutable string that incurs no allocations. The string is updated every X ms, so you can figure out how much garbage it can produce in just a few minutes (StringBuilder is not even close to being relevant here). My current approach is to pre-allocate a string of fixed size and mutate it via pinning, writing characters directly and either failing silently or throwing when capacity is reached.
This works fine. The allocated string is long-lived, so eventually the GC will promote it to Gen2 and pinning won't bother it that much, minimizing overhead. There are two major issues, though:
Because the string is fixed in size, I have to pad it with \0. While this has worked fine so far with all default .NET/Mono functionality and 3rd-party stuff, there is no telling how something else will react to a string that is 1024 characters long but whose last 100 characters are \0.
I can't resize it, because that incurs an allocation. I could take one allocation once in a blue moon, but since the string is fairly dynamic I can't be sure when it will try to expand or shrink further. I COULD use an "expand only" approach, so that I allocate only when expansion is needed; however, this has the disadvantage of padding overhead (if the string expanded to 5k characters but the next string is just 3k, 2k characters are padded for extra cycles) and also extra memory usage. I'm not sure how the GC will feel about a huge, often-pinned string in Gen2 rather than the LOH. Another way would be to pool reusable string objects; however, this has higher memory and GC overhead, plus lookup overhead.
Since the target string has to live for quite some time, I was thinking about moving it into unmanaged memory, via a byte buffer. This would remove the burden from the GC (the pinning penalty), and I could resize/reallocate at lower cost than on the managed heap.
What I'm having a hard time understanding is this: how can I slice a specific part of the allocated unmanaged buffer and wrap it as a normal .NET string for use in managed space/code? For example, pass it to Console.WriteLine, or to some 3rd-party library that draws a UI label on screen and accepts a string. Is this even doable?
P.S. As far as I know, the plan for .NET 5 (to be finalized in .NET 6, I think) is that you will no longer be able to mutate things like string (either blocked at runtime, or an undefined failure). Their solution seems to be the POH (Pinned Object Heap), which is essentially what I describe, with the same limitations.
how can I possibly slice specific part of allocated unmanaged buffer and wrap it as a normal net string to use in managed space/code
As far as I know this is not possible. .NET has its own way of defining objects (object headers etc.); you cannot treat an arbitrary memory region as a .NET object. Pinning and mutating a string seems dangerous, since strings are intended to be immutable and some things might not work correctly (using the string as a dictionary key, for example).
The correct way would be (as Canton7 mentions) to use a char[] buffer and Span<char> / Memory<char> for slicing the string. When passing to other methods you can convert a slice of the string to an actual string object. When calling methods like Console.WriteLine or UI methods, the overhead of allocating the string object will be irrelevant compared to everything else that is going on.
If you have old code that only accepts string you would either need to accept the limitations this entails, or rewrite the code to accept memory/span representations.
I would highly recommend profiling to see if the frequent allocations are an actual problem. As long as the string fits in the small object heap (SOH, i.e. less than 85,000 bytes) and is not promoted to gen 2, the overhead might not be huge. Allocations on the SOH are fast, and the time to run a gen 0 GC does not scale directly with the amount allocated. So updating every few milliseconds might not be terrible; I would be more worried if you were talking about microseconds.
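A small sketch of the char[] + Span<char> approach described above (the buffer size and contents are illustrative): the buffer is mutated in place with no per-update allocation, and a string is only materialized at the boundary where a string-only API demands one.

```csharp
using System;

class SpanSlicing
{
    static void Main()
    {
        // Reusable backing buffer; mutated in place, no garbage per update.
        char[] buffer = new char[1024];

        // Write new content into the buffer (e.g. on every update tick).
        string update = "score: 42";
        update.AsSpan().CopyTo(buffer);
        int length = update.Length;

        // Slice without copying.
        ReadOnlySpan<char> slice = buffer.AsSpan(0, length);

        // Allocate a string only when an old string-only API requires it.
        Console.WriteLine(slice.ToString()); // prints "score: 42"
    }
}
```

The ToString() at the end does allocate, but only at the API boundary; everything before it is allocation-free.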

C strtol vs C# long.Parse

I wonder why C# does not have a version of long.Parse accepting an offset in the string and length. In effect, I am forced to call string.Substring first.
This is unlike C strtol where one does not need to extract the substring first.
If I need to parse millions of rows I have a feeling there will be overhead creating those small strings that immediately become garbage.
Is there a way to parse a string into numbers efficiently without creating temporary short lived garbage strings on the heap? (Essentially doing it the C way)
Unless I'm reading this wrong, strtol doesn't take an offset into the string. It takes a memory address, which the caller can set to any position within a character buffer (or outside the buffer, if they aren't paying attention).
This presents a couple of issues:
Computation of the offset requires an understanding of how the string is encoded. I believe C# uses UTF-16 for in-memory strings (currently, anyway). If that were ever to change, your offsets would be off, possibly with disastrous results.
Computation of the address could easily go stale for managed objects, since they are not pinned in memory -- they could be moved around by memory management at any time. You'd have to pin the string using something like GCHandle.Alloc. When you're done, you'd better unpin it, or you could have serious problems!
If you get the address wrong, e.g. outside your buffer, your program is likely going to blow up.
I think C programmers are more accustomed to managing memory-mapped objects themselves and have no issue computing offsets and addresses and monkeying around with them as you would in assembly. With a managed language like C#, those sorts of things require more work and aren't typically done -- the only time we pin things in memory is when we have to pass objects off to unmanaged code, and when we do, it incurs overhead. I wouldn't advise it if your overall goal is to improve performance.
But if you are hell-bent on getting down to the bare metal on this, you could try this solution, where one clever C# programmer reads the string as an array of ASCII-encoded bytes and computes the numbers from that. With his solution you can specify start and length to your heart's content. You'd have to write something different if your strings contain non-ASCII text. I would go this route rather than trying to hack the string object's memory mapping.
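On newer runtimes (.NET Core 2.1 and later) there is a cleaner option than byte-level hacking: the numeric Parse/TryParse methods have overloads that accept ReadOnlySpan<char>, so you can parse a slice of the row without allocating a substring. A small sketch (the sample row and offsets are illustrative):

```csharp
using System;

class SpanParse
{
    static void Main()
    {
        string row = "id=12345;ts=987654321";

        // Slice out the digit runs without allocating substrings.
        ReadOnlySpan<char> span = row.AsSpan();
        long id = long.Parse(span.Slice(3, 5));   // "12345"
        long ts = long.Parse(span.Slice(12, 9));  // "987654321"

        Console.WriteLine(id + ts); // prints 987666666
    }
}
```

Over millions of rows this avoids exactly the short-lived garbage strings the question worries about.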

Serialize into native memory buffer from C# using flatbuffers

Is it possible with flatbuffers in C# to serialize objects to native (unmanaged) memory buffer?
So I want to do these steps:
Allocate a native memory buffer from native memory
Create objects in C# and serialize them into the allocated buffer
Send this memory buffer to C++ for deserialization
I'm thinking either of some custom memory buffer allocator in C#, or of some way of transferring ownership of a memory buffer from C# to C++.
In general I want to avoid copying memory when sending data from C# to C++ and vice versa. I want this memory buffer to be shared between C# and C++.
How do I do that?
No, the current FlatBuffers implementation is hard-coded to write to a regular byte array. You could copy this array to native memory afterwards, or like #pm100 says, pin it.
All serialization in FlatBuffers goes through an abstraction called the ByteBuffer, so if you made an implementation of that for native memory, it could be used directly relatively easily.
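A sketch of the copy-afterwards approach mentioned above: serialize with FlatBufferBuilder as usual, then copy the finished bytes into memory allocated with Marshal.AllocHGlobal and hand that pointer to C++. The FlatBufferBuilder calls are elided here (a plain byte array stands in for the builder's output, e.g. from builder.SizedByteArray() after Finish); the copy itself only uses Marshal.Copy.

```csharp
using System;
using System.Runtime.InteropServices;

class NativeCopy
{
    static void Main()
    {
        // Stand-in for the finished FlatBuffers payload.
        byte[] payload = { 1, 2, 3, 4 };

        // Allocate native memory and copy the managed bytes into it.
        IntPtr native = Marshal.AllocHGlobal(payload.Length);
        try
        {
            Marshal.Copy(payload, 0, native, payload.Length);

            // 'native' can now be passed to C++ for deserialization.
            Console.WriteLine(Marshal.ReadByte(native, 2)); // prints 3
        }
        finally
        {
            Marshal.FreeHGlobal(native);
        }
    }
}
```

This does incur one copy; avoiding it entirely would require the native-memory ByteBuffer implementation described above.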
Yes, if you use C++/CLI. Basic data types such as bool, 32-bit int, short, etc. are the same. For other types, check out msclr::interop::marshal_as<>.
Similar post: C++/CLI Converting from System::String^ to std::string

C# array to implement support for page-file architecture

Let me explain what I need to accomplish. I need to load a file into RAM and analyze its structure. What I was doing is this:
//Stream streamFile;
byte[] bytesFileBuff = new byte[streamFile.Length];
if (streamFile.Read(bytesFileBuff, 0, bytesFileBuff.Length) == bytesFileBuff.Length)
{
    //Loaded OK, can now analyze 'bytesFileBuff'
    //Go through bytes in 'bytesFileBuff' from 0 to bytesFileBuff.Length
}
But in my previous experience with Windows and 32-bit processes, even moderately sized allocations can fail, since a single contiguous block of that size may not be available in a fragmented 32-bit address space. (In that particular example I failed to allocate 512 MB on a Windows 7 machine with 16 GB of installed RAM.)
So I was curious, is there a special class that would allow me to work with the contents on a file of hypothetically any length (by implementing an internal analog of a page-file architecture)?
If linear stream access (even with multiple passes) is not a viable option, the solution in Win32 would be to use Memory Mapped Files with relatively small Views.
I didn't think you could do that in C# easily, but I was wrong. It turns out that .NET 4.0 and above provide classes wrapping the Memory Mapped Files API.
See http://msdn.microsoft.com/en-us/library/dd997372.aspx
If you have used memory mapped files in C/C++, you will know what to do.
The basic idea is to use MemoryMappedFile.CreateFromFile to obtain a MemoryMappedFile object. With that object you can call the CreateViewAccessor method to get MemoryMappedViewAccessor objects that represent chunks of the file; you can use these objects to read from the file in chunks of your choice. Make sure you dispose of the MemoryMappedViewAccessor objects diligently to release the memory buffer.
You have to work out the right strategy for using memory mapped files. You don't want to create too many small views or you will suffer a lot of overhead. Too few larger views and you will consume a lot of memory.
(As I said, I didn't know about these class wrappers in .NET. Do read the MSDN docs carefully: I might have easily missed something important in the few minutes I spent reviewing them)
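A minimal sketch of the chunked-view pattern described above (the file path, contents, and view size are illustrative):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfChunks
{
    static void Main()
    {
        // Create a small sample file to map.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 10, 20, 30, 40, 50, 60, 70, 80 });

        // Map the file, then read it through a small view (one "chunk").
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(4, 4)) // offset 4, length 4
        {
            // Positions are relative to the view's start.
            Console.WriteLine(view.ReadByte(0)); // prints 50
        }
        File.Delete(path);
    }
}
```

In a real analyzer you would loop, creating and disposing a view per chunk, balancing view size against the overhead warned about above.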

When would I need to use the stackalloc keyword in C#?

What functionality does the stackalloc keyword provide? When and why would I want to use it?
From MSDN:
Used in an unsafe code context to allocate a block of memory on the
stack.
One of the main features of C# is that you do not normally need to access memory directly, as you would do in C/C++ using malloc or new. However, if you really want to explicitly allocate some memory you can, but C# considers this "unsafe", so you can only do it if you compile with the unsafe setting. stackalloc allows you to allocate such memory.
You almost certainly don't need to use it for writing managed code. It is feasible that in some cases you could write faster code if you access memory directly - it basically allows you to use pointer manipulation which suits some problems. Unless you have a specific problem and unsafe code is the only solution then you will probably never need this.
Stackalloc will allocate data on the stack, which can be used to avoid the garbage that would be generated by repeatedly creating and destroying arrays of value types within a method.
public unsafe void DoSomeStuff()
{
    byte* unmanaged = stackalloc byte[100];
    byte[] managed = new byte[100];

    // Do stuff with the arrays.

    // When this method exits, the unmanaged array gets immediately destroyed.
    // The managed array no longer has any handles to it, so it will get
    // cleaned up the next time the garbage collector runs. In the meantime,
    // it is still consuming memory and adding to the list of crap the
    // garbage collector needs to keep track of. If you're doing XNA dev
    // on the Xbox 360, this can be especially bad.
}
Paul,
As everyone here has said, that keyword directs the runtime to allocate on the stack rather than the heap. If you're interested in exactly what this means, check out this article.
http://msdn.microsoft.com/en-us/library/cx9s2sy4.aspx
This keyword is used for unsafe memory manipulation. By using it, you have the ability to use pointers (a powerful and painful feature from C/C++).
stackalloc directs the .NET runtime to allocate memory on the stack.
Most other answers are focused on the "what functionality" part of OP's question.
I believe this will answers the when and why:
When do you need this?
For the best worst-case performance with cache locality of multiple small arrays.
Now in an average app you won't need this, but for realtime sensitive scenarios it gives more deterministic performance: No GC is involved and you are all but guaranteed a cache hit.
(Because worst-case performance is more important than average performance.)
Keep in mind that the default stack size in .NET is small, though!
(I think it's 1 MB for normal apps and 256 KB for ASP.NET?)
Practical use could for example include realtime sound processing.
As Steve pointed out, it is only used in an unsafe code context (e.g., when you want to use pointers).
If you don't use unsafe code in your C# application, then you will never need this.
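One caveat to "you will never need this": since C# 7.2, stackalloc can target a Span<T>, which does not require an unsafe context at all, so stack allocation is now usable in ordinary safe code. A small sketch (the buffer size is illustrative):

```csharp
using System;

class StackallocSpan
{
    static void Main()
    {
        // No 'unsafe' keyword needed when stackalloc targets a Span<T>.
        Span<byte> scratch = stackalloc byte[16];
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = (byte)(i * 2);

        Console.WriteLine(scratch[5]); // prints 10
    }
}
```

The span form keeps the stack-allocation benefits (no GC pressure, deterministic lifetime) while the compiler enforces bounds checking.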
