How to Iterate Through Array in C# Across Multiple Calls - c#

We have an application where we need to de-serialize some data from one stream into multiple objects.
The Data array represents a number of messages of variable length packed together. There are no message delimiting codes in the stream.
We want to do something like:
void Decode(byte[] Data)
{
    Object0.ExtractMessage(Data);
    Object1.ExtractMessage(Data);
    Object2.ExtractMessage(Data);
    ...
}
where each ExtractMessage call knows where to start in the array. Ideally we'd do this without passing a DataIx reference in.
To do this in C++ we'd just hand around a pointer into the array, and each ExtractMessage function would increment it as required.
Each object class knows how its own messages are serialized and can be relied upon (in C++) to return the pointer at the beginning of the next message in the stream.
Is there some inbuilt mechanism we can use to do this (without going unsafe)? The operation is high frequency (~10kps) and very lightweight. We also don't want to go copying or trimming the array.
Thanks for your help.

Could you not just pass in and return the array index? That is basically all that a pointer is anyway, an offset from a fixed memory location.

This sounds like you want a simple stream, e.g. a MemoryStream used as a wrapper around your byte array: stream = new MemoryStream(data). Each object reads as much from the stream as it needs and then hands the stream over to the next one. It even has the benefit that you aren't forced to load the entire byte array at once.
Other than that, you can use pointers in C# exactly the way you did in C++ (though pointers require the unsafe keyword and are discouraged).
Alternatively you could just pass data and an index variable and then increment the index (which is, in effect, the same as using a pointer but doesn't need unsafe).
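The index-passing idea could look roughly like the sketch below. It assumes a made-up wire format in which every message starts with a one-byte payload length; the question does not give the real format, and the names LengthPrefixedMessage and Decoder are hypothetical. The payload is kept as an ArraySegment<byte> view, so nothing is copied.

```csharp
using System;

// Hypothetical message type: one length byte followed by that many payload bytes.
public sealed class LengthPrefixedMessage
{
    // View into the original array; no bytes are copied.
    public ArraySegment<byte> Payload { get; private set; }

    // Reads one message starting at data[index] and advances index past it.
    public void ExtractMessage(byte[] data, ref int index)
    {
        int length = data[index++];
        // The ArraySegment constructor throws if the range runs past the array.
        Payload = new ArraySegment<byte>(data, index, length);
        index += length;
    }
}

public static class Decoder
{
    // Each call picks up where the previous one stopped.
    public static int Decode(byte[] data, params LengthPrefixedMessage[] targets)
    {
        int index = 0;
        foreach (var target in targets)
            target.ExtractMessage(data, ref index);
        return index; // number of bytes consumed
    }
}
```

The ref parameter plays the role of the C++ pointer: the callee moves it forward and the caller sees the new position.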

How about wrapping the data in a MemoryStream and then passing a reader into the ExtractMessage method? For binary data that would be a BinaryReader; StreamReader is meant for decoding text.

I guess several things come to mind.
You could simulate the action of the pointer by wrapping the byte[] in a class which also maintains the array offset. Whenever you access the array you would access it through the class, probably via an accessor method, which returns the next byte and also increments the offset variable. The class instance could be passed between the different ExtractMessage function calls.
How about using C++/CLI? This would allow you to use familiar C/C++ techniques, and yet be directly callable from C# without the dreaded interop.
Then of course there is the dreaded unsafe option, whereby you obtain a C# pointer to the byte[] and perform the required pointer arithmetic.
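The first suggestion in this answer (a class that holds the byte[] together with the current offset) might be sketched like this. ByteCursor and its members are hypothetical names, and ReadOnlySpan<byte> assumes .NET Core 2.1 or later (or the System.Memory package).

```csharp
using System;

// Hypothetical cursor: wraps a byte[] plus the current read offset.
public sealed class ByteCursor
{
    private readonly byte[] _data;

    public ByteCursor(byte[] data)
    {
        _data = data ?? throw new ArgumentNullException(nameof(data));
    }

    public int Position { get; private set; }
    public int Remaining => _data.Length - Position;

    // Returns the next byte and advances the offset, like *p++ in C++.
    public byte ReadByte()
    {
        if (Remaining < 1) throw new InvalidOperationException("End of buffer.");
        return _data[Position++];
    }

    // Returns a view over the next 'count' bytes without copying, then advances.
    public ReadOnlySpan<byte> ReadBytes(int count)
    {
        if (count < 0 || count > Remaining) throw new ArgumentOutOfRangeException(nameof(count));
        var slice = new ReadOnlySpan<byte>(_data, Position, count);
        Position += count;
        return slice;
    }
}
```

Each ExtractMessage(ByteCursor cursor) would then read what it needs and leave the cursor at the start of the next message.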

You could create a stream from the byte array.
Stream stream = new MemoryStream(data);
Then your processors could work on streams instead.
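A rough sketch of the stream approach, again assuming a hypothetical one-byte length prefix per message (the real format is not given). StreamDecoding is an illustrative name. Note that BinaryReader.ReadBytes returns a new array, so this variant does copy each payload.

```csharp
using System;
using System.IO;

public static class StreamDecoding
{
    // Hypothetical format: each message is a one-byte length followed by the payload.
    public static byte[] ExtractMessage(BinaryReader reader)
    {
        int length = reader.ReadByte();
        byte[] payload = reader.ReadBytes(length); // returns fewer bytes at end of stream
        if (payload.Length != length)
            throw new EndOfStreamException("Message truncated.");
        return payload;
    }

    public static byte[][] Decode(byte[] data, int messageCount)
    {
        // MemoryStream(byte[], bool) wraps the array; it does not copy it.
        using (var stream = new MemoryStream(data, writable: false))
        using (var reader = new BinaryReader(stream))
        {
            var messages = new byte[messageCount][];
            for (int i = 0; i < messageCount; i++)
                messages[i] = ExtractMessage(reader); // stream position carries over between calls
            return messages;
        }
    }
}
```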

Related

How to get MemoryStream (or another Stream) from Memory<byte> without reallocating?

Is there an existing way to get a MemoryStream (or another Stream) from Memory<byte> without reallocating?
When you construct a MemoryStream from a byte[] the array is assigned as the stream's buffer. In this case there is no additional allocation or copy.
Unfortunately there's no constructor that takes Memory<byte>. You'll have to call Memory<byte>.ToArray() to get there, and that does allocate. The documentation comment for Memory<T>.ToArray() states:
Copies the contents from the memory into a new array. This heap allocates, so should generally be avoided, however it is sometimes necessary to bridge the gap with APIs written in terms of arrays.
(Internally this is implemented via Span<T>.ToArray().)
The Microsoft.Toolkit.HighPerformance package adds an AsStream() extension that creates a thin stream wrapper around Memory<byte>, allowing you to read/write to your Memory<byte> without allocations.
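If the Memory<byte> happens to be backed by a managed array, MemoryMarshal.TryGetArray can recover that array and the existing MemoryStream(byte[], int, int) constructor can wrap it without copying. A minimal sketch, with a hypothetical helper name, that falls back to ToArray() (and therefore allocates) for any other backing store:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class MemoryStreamHelper
{
    // Hypothetical helper: wraps array-backed memory in a MemoryStream without copying.
    public static MemoryStream AsMemoryStream(Memory<byte> memory)
    {
        // Succeeds only when the memory is backed by a managed array.
        if (MemoryMarshal.TryGetArray<byte>(memory, out ArraySegment<byte> segment) && segment.Array != null)
            return new MemoryStream(segment.Array, segment.Offset, segment.Count);

        // Fallback for other backing stores (e.g. a custom MemoryManager<T>): this copies.
        return new MemoryStream(memory.ToArray());
    }
}
```

In the array-backed case, writes to the stream go straight through to the original array.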

Why Read and ReadAsync are producing totally different results

I have been using this code to capture the webcam and I have been trying to learn from it and make it better. Rider IDE suggested I should use an async variant of MemoryMappedViewStream.Read but it doesn't work at all. It produces all-black images suggesting the async and sync methods are totally different. I am wondering why that's the case?
// Working:
sourceStream.Read(MemoryMarshal.AsBytes(image.GetPixelMemoryGroup().Single().Span));
// NOT Working:
var bytes = MemoryMarshal.AsBytes(image.GetPixelMemoryGroup().Single().Span).ToArray();
await sourceStream.ReadAsync(bytes, 0, bytes.Length, token);
Repository and line of code
Those two versions are not the same. In the "sync" version you obtain a reference to the memory location of the image via image.GetPixelMemoryGroup(). Then you read data from sourceStream directly into that location.
In the "async" version you again obtain a reference to the memory location via image.GetPixelMemoryGroup(), but then you do something different: you call ToArray. This method copies the bytes from the image's memory into a new array, the one you hold in the bytes variable. You then read data from sourceStream into that bytes array, NOT directly into the image's memory location. Then you discard the bytes array, so you are basically reading into nowhere.
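The difference between reading into a ToArray() copy and reading into the span itself can be reproduced without any image library. In this sketch a plain byte[] stands in for the image's pixel memory and a MemoryStream stands in for sourceStream; both substitutions are assumptions made only to keep the example self-contained.

```csharp
using System;
using System.IO;

public static class CopyVersusView
{
    // Returns the first byte of 'target' after each of the two read styles.
    public static (byte viaCopy, byte viaSpan) Demo()
    {
        byte[] source = { 7, 8, 9 };
        byte[] target = new byte[3];       // stands in for the image's pixel memory
        Span<byte> pixels = target;

        // Reading into a copy: the data lands in 'copy', 'target' stays all zeros.
        byte[] copy = pixels.ToArray();
        new MemoryStream(source).Read(copy, 0, copy.Length);
        byte viaCopy = target[0];

        // Reading into the span itself: the data lands in 'target'.
        new MemoryStream(source).Read(pixels);
        byte viaSpan = target[0];

        return (viaCopy, viaSpan);
    }
}
```

Demo() returns (0, 7): the first read never touches the target, which matches the all-black images described in the question.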
Now, MemoryMappedViewStream inherits from UnmanagedMemoryStream, and all read/write operations are implemented in UnmanagedMemoryStream. This kind of stream represents data in memory and there is nothing async it can do. The only reason it even has ReadAsync is that the base class (Stream) has those methods. Even if you manage to make ReadAsync work, in this case it will not be asynchronous anyway. As far as I know, MemoryMappedViewStream does not allow real asynchronous access, even though it could make sense, since it has an underlying file.
In short, I'd just continue with the sync version, because there is no benefit in using the "async" one here. The static analyzer of course doesn't know that; it only sees that there is an Async-named counterpart of the method you use.
await sourceStream.ReadAsync(bytes, 0, bytes.Length, token).ConfigureAwait(false);
Check like this

When to use ArraySegment<T> over Memory<T>?

I was researching the best way to return 'views' into a very large array and found ArraySegment which perfectly suited my needs. However, I then found Memory<T> which seems to behave the same, with the exception of requiring a span to view the memory.
For the use-case of creating and writing to views into a massive (2GB+) array, does it matter which one is used?
The reasons for the large arrays are they hold bytes of an image.
Resurrecting this in case someone bumps into this question.
When to use ArraySegment over Memory?
Never, unless you need to call something old that expects an ArraySegment<T>, which I doubt will be the case as it was never that popular.
ArraySegment<T> is just an array, an offset, and a length, which are all exposed directly where you can choose to ignore the offset and length and access the entirety of the array if you want to. There’s also no read-only version of ArraySegment<T>.
Span<T> and Memory<T> can be backed by arrays, similar to ArraySegment<T>, but also by strings and unmanaged memory (in the form of a pointer in Span<T>’s case, and by using a custom MemoryManager<T> in Memory<T>’s case). They provide better encapsulation by not exposing their underlying data source and have read-only versions for immutable access.
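The encapsulation difference described above can be seen in a few lines. This is only an illustrative sketch; ViewComparison and the 16-byte "image" are made up for the example.

```csharp
using System;

public static class ViewComparison
{
    public static int Demo()
    {
        byte[] image = new byte[16];

        // ArraySegment exposes the whole backing array, so callers can step outside the view.
        var segment = new ArraySegment<byte>(image, 4, 8);
        segment.Array[0] = 1;                 // legal, although index 0 is outside the segment

        // Memory<T> only hands out the slice; index 0 of its span is image[4].
        Memory<byte> memory = image.AsMemory(4, 8);
        memory.Span[0] = 2;

        // A read-only view is an implicit conversion away.
        ReadOnlyMemory<byte> readOnly = memory;

        return image[0] + image[4] + readOnly.Length; // 1 + 2 + 8
    }
}
```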
Back then, we had to pass the array/offset/count trio to a lot of APIs (APIs that needed a direct reference to an array), but now that Span<T> and Memory<T> exist and are widely supported by most, if not all, .NET APIs that need to interact with contiguous blocks of memory, you should have no reason to use an ArraySegment<T>.
See also: Memory- and span-related types - MS Docs
Memory is sort of a wrapper around Span - one that doesn't have to be on the stack. And as the link provided by CoolBots pointed out it's an addition to arrays and array segments not really a replacement for them.
The main reason you would want to consider using Span/Memory is for performance and flexibility. Span gives you access to the memory directly instead of copying it back and forth to the array, and it allows you to treat the memory in a flexible way. Below I'll go from using the array as bytes to using it as an array of uint.
I'll skip right to Span but you could use AsMemory instead so you could pass that around easier. But it'd still boil down to getting the Span from the Memory.
Here's an example:
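using System;
using System.Runtime.InteropServices;

const int dataSize = 512;
const int segSize = 256;
byte[] rawdata = new byte[dataSize];
// View over the second half of the array (offset 256, length 256).
var segment = new ArraySegment<byte>(rawdata, segSize, segSize);
var seg1 = segment.AsSpan();
// Reinterpret the same bytes as uint values; nothing is copied.
var seg1Uint = MemoryMarshal.Cast<byte, uint>(seg1);
for (int i = 0; i < segSize / sizeof(uint); ++i)
{
    ref var data = ref seg1Uint[i];
    data = 0x000066;
}
// The writes above are visible through the original byte array.
foreach (var b in rawdata)
    Console.WriteLine(b);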

arrays of structs need advice

I made an array of structs to represent map data that gets drawn; however, I didn't double-check it until it was too late. When I load a new map I get either an "out of memory exception" (if I try to create a new array of structs first) or a screwed-up map that would require a lot of recoding, maybe too much, to get working (if I just initialize a big map first).
So now I'm wondering if there's a safe way to reallocate the array of structs, since the data is thrown away anyway when I do it (i.e. I don't need to copy the data, just resize the array and fill in new data from the file).
Is this possible safely?
Or should I just look to use something else, like an ArrayList or List?
What I need here is basically indexing speed and reading speed more than anything.
A large and contiguous block of memory is sometimes difficult to allocate. Consider allocating more jagged data. Access time will be slightly degraded, but you will be able to allocate more memory.
Read more about jagged arrays
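A jagged layout for the map could be sketched as follows. The Tile struct is hypothetical, since the question does not show the real map data; the point is only that each row is a separate, smaller allocation.

```csharp
using System;

// Hypothetical tile struct; the real map data layout is not given in the question.
public struct Tile
{
    public ushort TextureId;
    public byte Flags;
}

public static class MapLoader
{
    // Allocates one modest array per row instead of a single huge block.
    public static Tile[][] Allocate(int width, int height)
    {
        var rows = new Tile[height][];
        for (int y = 0; y < height; y++)
            rows[y] = new Tile[width];
        return rows;
    }
}
```

Elements are then addressed as map[y][x], and loading a new map means calling Allocate again and letting the old rows be collected.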

Is c# compiler deciding to use stackalloc by itself?

I found a blog entry which suggests that sometimes c# compiler may decide to put array on the stack instead of the heap:
Improving Performance Through Stack Allocation (.NET Memory Management: Part 2)
This guy claims that:
The compiler will also sometimes decide to put things on the stack on its own. I did an experiment with TestStruct2 in which I allocated it both an unsafe and normal context. In the unsafe context the array was put on the heap, but in the normal context when I looked into memory the array had actually been allocated on the stack.
Can someone confirm that?
I tried to repeat his example, but every time I tried, the array was allocated on the heap.
If the C# compiler can do such a trick without the 'unsafe' keyword, I'm especially interested in it. I have code that works on many small byte arrays (8-10 bytes long), so using the heap for each new byte[...] is a waste of time and memory (especially since each object on the heap has 8 bytes of overhead needed by the garbage collector).
EDIT: I just want to describe why it's important to me:
I'm writing a library that communicates with a Gemalto.NET smart card, which can have .NET code running on it. When I call a method that returns something, the smart card returns 8 bytes that describe the exact Type of the return value. These 8 bytes are calculated using an MD5 hash and some byte array concatenations.
The problem is that when I receive an array whose type is not known to me, I must scan all types in all assemblies loaded in the application and calculate those 8 bytes for each one until I find a match.
I don't know any other way to find the type, so I'm trying to speed it up as much as possible.
Author of the linked-to article here.
It seems impossible to force stack allocation outside of an unsafe context. This is likely the case to prevent some classes of stack overflow condition.
Instead, I recommend using a memory recycler class which would allocate byte arrays as needed but also allow you to "turn them in" afterward for reuse. It's as simple as keeping a stack of unused byte arrays and, when the stack is empty, allocating new ones.
using System.Collections.Generic;

Stack<byte[]> _byteStack = new Stack<byte[]>();

// Hands out a recycled array if one is available, otherwise allocates a new one.
byte[] AllocateArray()
{
    byte[] outArray;
    if (_byteStack.Count > 0)
        outArray = _byteStack.Pop();
    else
        outArray = new byte[8];
    return outArray;
}

// Returns an array to the recycler for later reuse.
void RecycleArray(byte[] inArray)
{
    _byteStack.Push(inArray);
}
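On current .NET versions, the built-in ArrayPool<T> from System.Buffers fills the same role as the hand-written recycler above. This is a swapped-in alternative rather than the answer author's code, and PooledHashing.SumFirstEight is a made-up example of work done in a rented buffer.

```csharp
using System;
using System.Buffers;

public static class PooledHashing
{
    // Rents a scratch array, uses the first 8 bytes, and returns it to the shared pool.
    public static int SumFirstEight(ReadOnlySpan<byte> input)
    {
        byte[] scratch = ArrayPool<byte>.Shared.Rent(8); // may be longer than 8
        try
        {
            input.Slice(0, 8).CopyTo(scratch);
            int sum = 0;
            for (int i = 0; i < 8; i++)
                sum += scratch[i];
            return sum;
        }
        finally
        {
            // Always hand the array back, even if the work above throws.
            ArrayPool<byte>.Shared.Return(scratch);
        }
    }
}
```

Rent may return an array larger than requested, so code should rely on its own length (8 here) rather than scratch.Length.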
If you are trying to match a hash with a type it seems the best idea would be to use a Dictionary for fast lookups. In this case you could load all relevant types at startup, if this causes program startup to become too slow you might want to consider caching them the first time each type is used.
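The Dictionary idea might look like the sketch below. TypeHashIndex is a hypothetical name, and computeHash is a placeholder for the card-specific 8-byte hash, which is not shown in the question. The 8 bytes are packed into a ulong so they can be used directly as a key.

```csharp
using System;
using System.Collections.Generic;

public sealed class TypeHashIndex
{
    private readonly Dictionary<ulong, Type> _byHash = new Dictionary<ulong, Type>();

    // computeHash stands in for the real 8-byte hash calculation (an assumption).
    public TypeHashIndex(IEnumerable<Type> types, Func<Type, byte[]> computeHash)
    {
        foreach (var type in types)
            _byHash[ToKey(computeHash(type))] = type;
    }

    // Looks up the type for an 8-byte hash in constant time on average.
    public bool TryFind(byte[] hash, out Type type)
    {
        return _byHash.TryGetValue(ToKey(hash), out type);
    }

    // Packs the 8 hash bytes into one ulong so it can serve as a dictionary key.
    private static ulong ToKey(byte[] hash)
    {
        return BitConverter.ToUInt64(hash, 0);
    }
}
```

The hashes are computed once per type when the index is built, instead of once per type for every lookup.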
From your line:
I have a code that is working on many small byte arrays (8-10 bytes long)
Personally, I'd be more interested in allocating a spare buffer somewhere that different parts of your code can re-use (while processing the same block). Then you don't have any creation/GC to worry about. In most cases (where the buffer is used for very discrete operations) with a scratch buffer, you can even assume that it is "all yours", i.e. every method that needs it can assume that it can start writing at zero.
I use this single-buffer approach in some binary serialization code (while encoding data); it is a big boost to performance. In my case, I pass a "context" object between the layers of serialization (that encapsulates the scratch-buffer, the output-stream (with some additional local buffering), and a few other oddities).
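A minimal sketch of such a context object, assuming nothing about the answerer's actual code: SerializationContext and Int32Writer are hypothetical names, and the little-endian int encoding is chosen only for illustration.

```csharp
using System;
using System.IO;

// Hypothetical context object passed between serialization layers.
public sealed class SerializationContext
{
    // One buffer, allocated once; every method may overwrite it from index 0.
    public byte[] Scratch { get; } = new byte[16];
    public Stream Output { get; }

    public SerializationContext(Stream output)
    {
        Output = output;
    }
}

public static class Int32Writer
{
    // Encodes a value little-endian into the shared scratch buffer, then writes it out.
    public static void Write(SerializationContext context, int value)
    {
        byte[] buffer = context.Scratch;
        buffer[0] = (byte)value;
        buffer[1] = (byte)(value >> 8);
        buffer[2] = (byte)(value >> 16);
        buffer[3] = (byte)(value >> 24);
        context.Output.Write(buffer, 0, 4);
    }
}
```

Because the buffer is shared, this only works when a single thread uses a given context at a time.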
System.Array (the class representing an array) is a reference type and lives on the heap. You can only have an array on the stack if you use unsafe code.
I can't see where it says otherwise in the article that you refer to. If you want to have a stack allocated array, you can do something like this:
decimal* stackAllocatedDecimals = stackalloc decimal[4];
Personally I wouldn't bother. How much performance do you think you will gain by this approach?
This CodeProject article might be useful to you though.
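For readers on newer compilers: since C# 7.2, the result of stackalloc can be assigned to a Span<T> in ordinary safe code, which the answers above predate. The buffer is still not a System.Array, but it serves the small-scratch-buffer case from the question. StackBuffers.PackKey is a hypothetical example.

```csharp
using System;

public static class StackBuffers
{
    // Builds an 8-byte key in a stack-allocated buffer; no heap array is created.
    public static ulong PackKey(byte seed)
    {
        Span<byte> buffer = stackalloc byte[8];   // safe context, C# 7.2 or later
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = (byte)(seed + i);

        ulong key = 0;
        for (int i = buffer.Length - 1; i >= 0; i--)
            key = (key << 8) | buffer[i];         // buffer[0] ends up in the lowest byte
        return key;
    }
}
```

The span cannot escape the method or be stored in a field, which is how the compiler keeps this safe.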
