I would like to use the new Span<T> to send unmanaged data straight to the socket using SocketAsyncEventArgs, but it seems that SocketAsyncEventArgs only accepts Memory<byte>, which cannot be initialized from a byte* or an IntPtr.
So, is there a way to use Span with SocketAsyncEventArgs?
Thank you for your help.
As already mentioned in the comments, Span is the wrong tool here - have you looked at using Memory<byte> instead? As you stated, the SetBuffer method accepts it as a parameter - is there a reason you can't use it?
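If the data really does live in unmanaged memory, a Memory<byte> can still wrap it: implement a custom MemoryManager<byte>. Below is a minimal sketch (the class name UnmanagedMemoryManager is mine, and it assumes you control the unmanaged buffer's lifetime yourself):

```csharp
using System;
using System.Buffers;

// Sketch: wraps an unmanaged pointer so it can be exposed as Memory<byte>
// (e.g. for SocketAsyncEventArgs.SetBuffer). The caller remains responsible
// for keeping the unmanaged memory alive for as long as the Memory is in use.
public sealed unsafe class UnmanagedMemoryManager : MemoryManager<byte>
{
    private readonly byte* _ptr;
    private readonly int _length;

    public UnmanagedMemoryManager(byte* ptr, int length)
    {
        _ptr = ptr;
        _length = length;
    }

    public override Span<byte> GetSpan() => new Span<byte>(_ptr, _length);

    // Unmanaged memory never moves, so pinning is a no-op.
    public override MemoryHandle Pin(int elementIndex = 0)
        => new MemoryHandle(_ptr + elementIndex);

    public override void Unpin() { }

    protected override void Dispose(bool disposing) { }
}
```

With this, something like `args.SetBuffer(new UnmanagedMemoryManager(ptr, len).Memory)` becomes possible - but only if you can guarantee the unmanaged buffer outlives the send.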
See also this article for a good explanation of how stack vs. heap allocation applies to Span and Memory. It includes this example, using a readonly Memory<Foo> buffer:
public struct Enumerable : IEnumerable<Foo>
{
    readonly Stream stream;

    public Enumerable(Stream stream)
    {
        this.stream = stream;
    }

    public IEnumerator<Foo> GetEnumerator() => new Enumerator(this);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct Enumerator : IEnumerator<Foo>
    {
        static readonly int ItemSize = Unsafe.SizeOf<Foo>();
        readonly Stream stream;
        readonly Memory<Foo> buffer;
        bool lastBuffer;
        long loadedItems;
        int currentItem;

        public Enumerator(Enumerable enumerable)
        {
            stream = enumerable.stream;
            buffer = new Foo[100]; // alloc items buffer
            lastBuffer = false;
            loadedItems = 0;
            currentItem = -1;
        }

        public Foo Current => buffer.Span[currentItem];
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (++currentItem != loadedItems) // increment current position and check if reached end of buffer
                return true;
            if (lastBuffer) // check if it was the last buffer
                return false;

            // get next buffer (MemoryMarshal.Cast works on spans, hence buffer.Span)
            var rawBuffer = MemoryMarshal.Cast<Foo, byte>(buffer.Span);
            var bytesRead = stream.Read(rawBuffer);
            lastBuffer = bytesRead < rawBuffer.Length;
            currentItem = 0;
            loadedItems = bytesRead / ItemSize;
            return loadedItems != 0;
        }

        public void Reset() => throw new NotImplementedException();

        public void Dispose()
        {
            // nothing to do
        }
    }
}
You should copy the data to managed memory first, using the Marshal or Buffer class.
If you don't, think about what happens to the data being sent when the C code frees the returned pointer.
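A minimal sketch of that copy (assuming `ptr` is the IntPtr handed back by the C code and `length` its size in bytes):

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeCopy
{
    // Copy the unmanaged data into a managed array before handing it to the
    // socket, so it no longer matters when the C side frees the pointer.
    public static byte[] CopyFromNative(IntPtr ptr, int length)
    {
        var managed = new byte[length];
        Marshal.Copy(ptr, managed, 0, length);
        return managed; // safe to pass to SocketAsyncEventArgs.SetBuffer
    }
}
```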
There's a complete example (and implementation of the class) on the MSDN page for SocketAsyncEventArgs (just follow the link). It shows the proper use of the class and may give you the guidance you're looking for.
Also, as Shingo said, it should all be in managed code, not in pointers.
I need the stream of the file in two different places. In the code, the IFormFile is already passed as a parameter to the two methods. I thought of either modifying the methods to call OpenReadStream at the beginning and pass the stream around as a parameter, or calling OpenReadStream separately in each place.
I inspected the disassembled code, and OpenReadStream does this:
return new ReferenceReadStream(_baseStream, _baseStreamOffset, Length);
and the ReferenceReadStream class does this in the constructor:
public ReferenceReadStream(Stream inner, long offset, long length)
{
    if (inner == null)
    {
        throw new ArgumentNullException("inner");
    }
    _inner = inner;
    _innerOffset = offset;
    _length = length;
    _inner.Position = offset;
}
In my understanding the base stream is the same, so it shouldn't matter that OpenReadStream is called multiple times.
What worries me is whether I'll run into problems when I start using the Seek method.
Does anyone know the correct usage of OpenReadStream in this scenario?
Apparently it's not safe to call OpenReadStream multiple times.
When the Read method is called, it calls this method:
private void VerifyPosition()
{
    if (_inner.Position == _innerOffset + _position)
    {
        return;
    }
    throw new InvalidOperationException("The inner stream position has changed unexpectedly.");
}
I was able to trigger this exception with the following code:
var s = file.OpenReadStream();
s.Seek(10, SeekOrigin.Begin);
var b = new byte[2];
var c = s.Read(b);
var s2 = file.OpenReadStream();
c = s.Read(b);
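If you need two independent readers, a simple workaround is to buffer the upload once and give each consumer its own stream. A sketch (here `source` stands in for the stream returned by file.OpenReadStream()):

```csharp
using System.IO;

static class StreamSplitter
{
    // Buffer the upload once, then hand each consumer its own read-only
    // stream, so seeking or reading in one cannot invalidate the other.
    public static (MemoryStream First, MemoryStream Second) Split(Stream source)
    {
        byte[] data;
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            data = buffer.ToArray();
        }
        return (new MemoryStream(data, writable: false),
                new MemoryStream(data, writable: false));
    }
}
```

Note this reads the whole file into memory, so it is only appropriate for uploads of bounded size; for large files, copying to a temporary FileStream follows the same pattern.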
I am trying to improve the usability of an open source C# API that wraps a C library. The underlying library pulls multiplexed 2D data from a server over a network connection. In C, the samples come out as a pointer to the data (many types are supported), e.g. float*. The pull function returns the number of data points (frames * channels, but channels is known and never changes) so that the client knows how much new data is being passed. It is up to the client to allocate enough memory behind these pointers. For example, if one wants to pull floats the function signature is something like:
long pull_floats(float *floatbuf);
and floatbuf better have sizeof(float)*nChannels*nMoreFramesThanIWillEverGet bytes behind it.
In order to accommodate this, the C# wrapper currently uses 2D arrays, e.g. float[,]. The way it is meant to be used is a literal mirror to the C method---to allocate more memory than one ever expects to these arrays and return the number of data points so that the client knows how many frames of data have just come in. The underlying dll handler has a signature like:
[DllImport(libname, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi, ExactSpelling = true)]
public static extern uint pull_floats(IntPtr obj, float[,] data_buffer);
And the C# wrapper itself has a definition like:
int PullFloats(float[,] floatbuf)
{
    // DllHandler has the DllImport code
    // Obj is the class with the handle to the C library
    uint res = DllHandler.pull_floats(Obj, floatbuf);
    return (int)res / floatbuf.GetLength(1);
}
The C++ wrapper for this library is idiomatic. There, the client supplies a vector<vector<T>>& to the call and in a loop, each frame gets pushed into the multiplexed data container. Something like:
void pull_floats_cpp(std::vector<std::vector<float>>& floatbuf)
{
    std::vector<float> frame;
    floatbuf.clear();
    while(pull_float_cpp(frame)) // C++ function to pull only one frame at a time
    {
        floatbuf.push_back(frame); // (memory may be allocated here)
    }
}
This works because in C++ you can treat a std::vector's contiguous storage as a primitive pointer like float*. That is, the vector frame from above goes into a wrapper like:
void pull_float_cpp(std::vector<float>& frame)
{
    frame.resize(channel_count); // memory may be allocated here as well...
    pull_float_c(&frame[0]);
}
where pull_float_c has a signature like:
void pull_float_c(float* frame);
I would like to do something similar in the C# API. Ideally the wrapper method would have a signature like:
void PullFloats(List<List<float>> floatbuf);
instead of
int PullFloats(float[,] floatbuf);
so that clients don't have to work with 2D arrays and (more importantly) don't have to keep track of the number of frames they get. That should be inherent in the dimensions of the containing object, so that clients can use enumeration patterns and foreach. But, unlike C++ std::vector, you can't pun a List to an array. As far as I know, ToArray allocates memory and does a copy, so that not only is memory being allocated, but the new data doesn't go into the List of Lists that the array was built from.
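One partial bridge worth noting: on .NET 5+, CollectionsMarshal.AsSpan exposes a List<T>'s backing array as a Span<T> without copying, and on .NET 8+ CollectionsMarshal.SetCount resizes a list in place - together they come close to the C++ `&frame[0]` trick. A sketch (the commented-out pull_float_c call and the channel count are stand-ins):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

static class ListPunning
{
    public static unsafe void FillFrame(List<float> frame, int channelCount)
    {
        // .NET 8+: set the list's length without copying or clearing
        CollectionsMarshal.SetCount(frame, channelCount);

        // .NET 5+: a span that aliases the List's own backing array
        Span<float> span = CollectionsMarshal.AsSpan(frame);

        fixed (float* p = span)
        {
            // pull_float_c(p); // the native call would fill the List's storage
            for (int i = 0; i < channelCount; i++) p[i] = i; // stand-in fill
        }
    }
}
```

The caveat is the same as with `&frame[0]`: the span is invalidated if the list is resized afterwards, so the native call must happen before any further Add/Remove.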
I hope that the pseudocode + explanation of this problem is clear. Any suggestions for how to tackle it in an elegant C# way are much appreciated. Or, if someone can assure me that this is simply a rift between C and C# that cannot be bridged without imitating C-style memory management, at least I would know not to think about this any more.
Could a MemoryStream or a Span help here?
I came up with a pretty satisfactory way to wrap pre-allocated arrays in Lists. Please let me know if there is a better way to do this, but according to this I think it is about as good as it gets---if the answer is to make a List out of an array, anyway. According to my debugger, 100,000 iterations of 5000 or so floats at a time takes less than 12 seconds (which is far better than the underlying library demands in practice, but worse than I would like to see), memory use stays flat at around 12 MB (no copies), and the GC isn't called until the program exits:
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

namespace ListArrayTest
{
    [StructLayout(LayoutKind.Explicit, Pack = 2)]
    public class GenericDataBuffer
    {
        [FieldOffset(0)]
        public int _numberOfBytes;
        [FieldOffset(8)]
        private readonly byte[] _byteBuffer;
        [FieldOffset(8)]
        private readonly float[] _floatBuffer;
        [FieldOffset(8)]
        private readonly int[] _intBuffer;

        public byte[] ByteBuffer => _byteBuffer;
        public float[] FloatBuffer => _floatBuffer;
        public int[] IntBuffer => _intBuffer;

        public GenericDataBuffer(int sizeToAllocateInBytes)
        {
            int aligned4Bytes = sizeToAllocateInBytes % 4;
            sizeToAllocateInBytes = (aligned4Bytes == 0) ? sizeToAllocateInBytes : sizeToAllocateInBytes + 4 - aligned4Bytes;
            // Allocating the byteBuffer is co-allocating the floatBuffer and the intBuffer
            _byteBuffer = new byte[sizeToAllocateInBytes];
            _numberOfBytes = _byteBuffer.Length;
        }

        public static implicit operator byte[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._byteBuffer;
        }

        public static implicit operator float[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._floatBuffer;
        }

        public static implicit operator int[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._intBuffer;
        }
    }

    public class ListArrayTest<T>
    {
        private readonly Random _random = new();
        const int _channels = 10;
        const int _maxFrames = 500;
        private readonly T[,] _array = new T[_maxFrames, _channels];
        private readonly GenericDataBuffer _genericDataBuffer;
        int _currentFrameCount;
        public int CurrentFrameCount => _currentFrameCount;

        // generate 'data' to pull
        public void PushValues()
        {
            int frames = _random.Next(_maxFrames);
            if (frames == 0) frames++;
            for (int ch = 0; ch < _array.GetLength(1); ch++)
            {
                for (int i = 0; i < frames; i++)
                {
                    switch (_array[0, 0]) // in real life this is done with type enumerators
                    {
                        case float: // only implementing float to be concise
                            _array[i, ch] = (T)(object)(float)i;
                            break;
                    }
                }
            }
            _currentFrameCount = frames;
        }

        private void CopyFrame(int frameIndex)
        {
            for (int ch = 0; ch < _channels; ch++)
                switch (_array[0, 0]) // in real life this is done with type enumerators
                {
                    case float: // only implementing float to be concise
                        _genericDataBuffer.FloatBuffer[ch] = (float)(object)_array[frameIndex, ch];
                        break;
                }
        }

        private void PullFrame(List<T> frame, int frameIndex)
        {
            frame.Clear();
            CopyFrame(frameIndex);
            for (int ch = 0; ch < _channels; ch++)
            {
                switch (frame)
                {
                    case List<float>: // only implementing float to be concise
                        frame.Add((T)(object)BitConverter.ToSingle(_genericDataBuffer, ch * 4));
                        break;
                }
            }
        }

        public void PullChunk(List<List<T>> list)
        {
            list.Clear();
            int frameIndex = 0;
            while (frameIndex != _currentFrameCount)
            {
                // a fresh list per frame; reusing one List here would make
                // every entry of `list` alias the same (last) frame
                List<T> frame = new();
                PullFrame(frame, frameIndex);
                list.Add(frame);
                frameIndex++;
            }
        }

        public ListArrayTest()
        {
            switch (_array[0, 0])
            {
                case float:
                    _genericDataBuffer = new(_channels * 4);
                    break;
            }
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            ListArrayTest<float> listArrayTest = new();
            List<List<float>> chunk = new();
            for (int i = 0; i < 100; i++)
            {
                listArrayTest.PushValues();
                listArrayTest.PullChunk(chunk);
                Console.WriteLine($"{i}: first value: {chunk[0][0]}");
            }
        }
    }
}
Update
...and, using a nifty trick I found from Mark Heath (https://github.com/markheath), I can effectively type-pun List<List<T>> back to a T* the same way the C++ API does with std::vector<std::vector<T>> (see class GenericDataBuffer). It is a lot more complicated under the hood, since one must be so verbose with type casting in C#, but it compiles without complaint and works like a charm. Here is the blog post I stole the idea from: https://www.markheath.net/post/wavebuffer-casting-byte-arrays-to-float.
This also lets me ditch the need for clients to pre-allocate, at the cost (as in the C++ wrapper) of a bit of dynamic allocation internally. According to the debugger the GC doesn't get called and memory stays flat, so I guess the List allocations are not digging into the heap.
Suppose I have some IEnumerator<T> which does a fair amount of processing inside the MoveNext() method.
The code consuming from that enumerator does not just consume as fast as data is available, but occasionally waits (the specifics of which are irrelevant to my question) in order to synchronize the time when it needs to resume consumption. But when it does the next call to MoveNext(), it needs the data as fast as possible.
One way would be to pre-consume the whole stream into some list or array structure for instant enumeration. That would be a waste of memory however, as at any single point in time, only one item is in use, and it would be prohibitive in cases where the whole data does not fit into memory.
So is there something generic in .net that wraps an enumerator / enumerable in a way that it asynchronously pre-iterates the underlying enumerator a couple of items in advance and buffers the results so that it always has a number of items available in its buffer and the calling MoveNext will never have to wait? Obviously items consumed, i.e. iterated over by a subsequent MoveNext from the caller, would be removed from the buffer.
N.B. Part of what I'm trying to do is also called Backpressure, and, in the Rx world, has already been implemented in RxJava and is under discussion in Rx.NET. Rx (observables that push data) can be considered the opposite approach of enumerators (enumerators allow pulling of data). Backpressure is relatively easy in the pulling approach, as my answer shows: Just pause consumption. It's harder when pushing, requiring an additional feedback mechanism.
A more concise alternative to your custom enumerable class is to do this:
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> source, int bufferSize)
{
    var queue = new BlockingCollection<T>(bufferSize);
    Task.Run(() =>
    {
        foreach (var i in source) queue.Add(i);
        queue.CompleteAdding();
    });
    return queue.GetConsumingEnumerable();
}
This can be used as:
var slowEnumerable = GetMySlowEnumerable();
var buffered = slowEnumerable.Buffer(10); // Populates up to 10 items on a background thread
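On more recent .NET, System.Threading.Channels gives the same bounded producer/consumer shape with async support. A sketch of that variant (not from the original answer; the method name BufferAsync is mine):

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class BufferExtensions
{
    // Same idea as the BlockingCollection version: a background task fills
    // a bounded channel while the caller consumes it. WriteAsync suspends
    // when the channel is full, giving natural backpressure.
    public static async IAsyncEnumerable<T> BufferAsync<T>(
        this IEnumerable<T> source, int bufferSize)
    {
        var channel = Channel.CreateBounded<T>(bufferSize);
        _ = Task.Run(async () =>
        {
            foreach (var item in source)
                await channel.Writer.WriteAsync(item);
            channel.Writer.Complete();
        });
        await foreach (var item in channel.Reader.ReadAllAsync())
            yield return item;
    }
}
```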
There are different ways to implement this yourself, and I decided to use:
- a single dedicated thread per enumerator that does the asynchronous pre-buffering
- a fixed number of elements to pre-buffer
This is perfect for my case at hand (only a few, very long-running enumerators), but e.g. creating a thread might be too heavy if you use lots and lots of enumerators, and the fixed number of elements may be too inflexible if you need something more dynamic, based perhaps on the actual content of the items.
I have so far only tested its main feature, and some rough edges may remain. It can be used like this:
int bufferSize = 5;
IEnumerable<int> en = ...;
foreach (var item in new PreBufferingEnumerable<int>(en, bufferSize))
{
    ...
}
Here's the gist of the Enumerator:
class PreBufferingEnumerator<TItem> : IEnumerator<TItem>
{
    private readonly IEnumerator<TItem> _underlying;
    private readonly int _bufferSize;
    private readonly Queue<TItem> _buffer;
    private bool _done;
    private bool _disposed;

    public PreBufferingEnumerator(IEnumerator<TItem> underlying, int bufferSize)
    {
        _underlying = underlying;
        _bufferSize = bufferSize;
        _buffer = new Queue<TItem>();
        Thread preBufferingThread = new Thread(PreBufferer) { Name = "PreBufferingEnumerator.PreBufferer", IsBackground = true };
        preBufferingThread.Start();
    }

    private void PreBufferer()
    {
        while (true)
        {
            lock (_buffer)
            {
                while (_buffer.Count == _bufferSize && !_disposed)
                    Monitor.Wait(_buffer);
                if (_disposed)
                    return;
            }
            if (!_underlying.MoveNext())
            {
                lock (_buffer)
                {
                    _done = true;
                    Monitor.Pulse(_buffer); // wake a consumer blocked on an empty buffer
                }
                return;
            }
            var current = _underlying.Current; // do outside lock, in case underlying enumerator does something inside get_Current()
            lock (_buffer)
            {
                _buffer.Enqueue(current);
                Monitor.Pulse(_buffer);
            }
        }
    }

    public bool MoveNext()
    {
        lock (_buffer)
        {
            while (_buffer.Count == 0 && !_done && !_disposed)
                Monitor.Wait(_buffer);
            if (_buffer.Count > 0)
            {
                Current = _buffer.Dequeue();
                Monitor.Pulse(_buffer); // so PreBufferer thread can fetch more
                return true;
            }
            return false; // _done || _disposed
        }
    }

    public TItem Current { get; private set; }
    object IEnumerator.Current => Current; // IEnumerator<TItem> also requires the non-generic Current
    public void Reset() => throw new NotSupportedException();

    public void Dispose()
    {
        lock (_buffer)
        {
            if (_disposed)
                return;
            _disposed = true;
            _buffer.Clear();
            Current = default(TItem);
            Monitor.PulseAll(_buffer);
        }
    }
}
I've been trying to mock a network stream for some unit tests.
So far, using Moq the best I've come up with is to use a wrapper for the stream and then mock my interface.
public interface INetworkstreamWrapper
{
    int Read(byte[] buffer, int offset, int size);
    void Flush();
    bool DataAvailable { get; }
    bool CanRead { get; }
    void close();
}
Question is, whilst that gives me a start, I actually want to test some byte array values as read into my read buffer. How can I return some test data into the buffer when calling Read() on the mock object?
You can use a callback to gain access to the passed parameters and alter them:
public void TestRead()
{
    var streamMock = new Mock<INetworkstreamWrapper>();
    streamMock
        .Setup(m => m.Read(It.IsAny<byte[]>(),
                           It.IsAny<int>(),
                           It.IsAny<int>()))
        .Callback((byte[] buffer, int offset, int size) => buffer[0] = 128);
    var myBuffer = new byte[10];
    streamMock.Object.Read(myBuffer, 0, 10);
    Assert.AreEqual(128, myBuffer[0]);
}
But I would suggest you rethink your strategy about that kind of mocking, see:
http://davesquared.net/2011/04/dont-mock-types-you-dont-own.html
Maybe you could write an integration test instead, or make your code depend on the abstract Stream class.
In your test you could then use a MemoryStream to check your class correct behaviour when fetching data from the Stream.
You can use Setup to do this:
[Test]
public void MockStreamTest()
{
    var mock = new Mock<INetworkstreamWrapper>();
    int returnValue = 1;
    mock.Setup(x => x.Read(It.IsAny<byte[]>(), It.IsAny<int>(), It.IsAny<int>()))
        .Returns((byte[] r, int o, int s) =>
        {
            r[0] = 1;
            return returnValue;
        });
    var bytes = new byte[1024];
    var read = mock.Object.Read(bytes, 1, 1);
    // Verify that the method was called with expected arguments:
    mock.Verify(x => x.Read(bytes, 1, 1), Times.Once());
    Assert.AreEqual(returnValue, read);
    Assert.AreEqual(1, bytes[0]);
}
In Rhino Mocks, this is very easy, as the important methods on NetworkStream are virtual, so you can simply create a stub using the MockRepository. Even better, it understands that the byte array passed to the Read method is an output array, so you can completely stub out a call to Read() using this code:
NetworkStream stream = MockRepository.GenerateStub<NetworkStream>();
stream.Stub(x => x.Read(Arg<byte[]>.Out(bytes).Dummy, Arg<int>.Is.Anything, Arg<int>.Is.Anything))
.Return(bytes.Length);
No wrapper required.
I've had very little experience with Moq but I'd be surprised if it didn't support something similar.
I was experimenting with CCR iterators as a solution to a task that requires parallel processing of tons of data feeds, where the data from each feed needs to be processed in order. None of the feeds are dependent on each other, so the in-order processing can be parallelized per feed.
Below is a quick and dirty mockup with one integer feed, which simply shoves integers into a Port at a rate of about 1.5K/second, and then pulls them out using a CCR iterator to keep the in-order processing guarantee.
class Program
{
    static Dispatcher dispatcher = new Dispatcher();
    static DispatcherQueue dispatcherQueue =
        new DispatcherQueue("DefaultDispatcherQueue", dispatcher);
    static Port<int> intPort = new Port<int>();

    static void Main(string[] args)
    {
        Arbiter.Activate(
            dispatcherQueue,
            Arbiter.FromIteratorHandler(new IteratorHandler(ProcessInts)));

        int counter = 0;
        Timer t = new Timer((x) =>
            { for (int i = 0; i < 1500; ++i) intPort.Post(counter++); }
            , null, 0, 1000);

        Console.ReadKey();
    }

    public static IEnumerator<ITask> ProcessInts()
    {
        while (true)
        {
            yield return intPort.Receive();
            int currentValue;
            if ((currentValue = intPort) % 1000 == 0)
            {
                Console.WriteLine("{0}, Current Items In Queue:{1}",
                    currentValue, intPort.ItemCount);
            }
        }
    }
}
What surprised me greatly was that CCR could not keep up on a Core i7 box, with the queue size growing without bound. In another test to measure the latency from Post() to Receive() under a load of ~100 Posts/sec, the latency between the first Post() and Receive() in each batch was around 1 ms.
Is there something wrong with my mockup? If so, what is a better way of doing this using CCR?
Yes, I agree, this does indeed seem weird. Your code seems initially to perform smoothly, but after a few thousand items, processor usage rises to the point where performance is really lacklustre. This disturbs me and suggests a problem in the framework. After a play with your code, I can't really identify why this is the case. I'd suggest taking this problem to the Microsoft Robotics Forums and seeing if you can get George Chrysanthakopoulos (or one of the other CCR brains) to tell you what the problem is. I can however surmise that your code as it stands is terribly inefficient.
The way that you are dealing with "popping" items from the Port is very inefficient. Essentially the iterator is woken each time there is a message in the Port and it deals with only one message (despite the fact that there might be several hundred more in the Port), then hangs on the yield while control is passed back to the framework. At the point that the yielded receiver causes another "awakening" of the iterator, many many messages have filled the Port. Pulling a thread from the Dispatcher to deal with only a single item (when many have piled up in the meantime) is almost certainly not the best way to get good throughput.
I've modded your code such that after the yield, we check the Port to see if there are any further messages queued and deal with them too, thereby completely emptying the Port before we yield back to the framework. I've also refactored your code somewhat to use CcrServiceBase which simplifies the syntax of some of the tasks you are doing:
internal class Test : CcrServiceBase
{
    private readonly Port<int> intPort = new Port<int>();
    private Timer timer;

    public Test() : base(new DispatcherQueue("DefaultDispatcherQueue",
                                             new Dispatcher(0, "dispatcher")))
    {
    }

    public void StartTest()
    {
        SpawnIterator(ProcessInts);
        var counter = 0;
        timer = new Timer(x =>
            {
                for (var i = 0; i < 1500; ++i)
                    intPort.Post(counter++);
            },
            null, 0, 1000);
    }

    public IEnumerator<ITask> ProcessInts()
    {
        while (true)
        {
            yield return intPort.Receive();
            int currentValue = intPort;
            ReportCurrent(currentValue);
            while (intPort.Test(out currentValue))
            {
                ReportCurrent(currentValue);
            }
        }
    }

    private void ReportCurrent(int currentValue)
    {
        if (currentValue % 1000 == 0)
        {
            Console.WriteLine("{0}, Current Items In Queue:{1}",
                currentValue, intPort.ItemCount);
        }
    }
}
Alternatively, you could do away with the iterator completely, as it's not really well used in your example (although I'm not entirely sure what effect this has on the order of processing):
internal class Test : CcrServiceBase
{
    private readonly Port<int> intPort = new Port<int>();
    private Timer timer;

    public Test() : base(new DispatcherQueue("DefaultDispatcherQueue",
                                             new Dispatcher(0, "dispatcher")))
    {
    }

    public void StartTest()
    {
        Activate(
            Arbiter.Receive(true,
                intPort,
                i =>
                {
                    ReportCurrent(i);
                    int currentValue;
                    while (intPort.Test(out currentValue))
                    {
                        ReportCurrent(currentValue);
                    }
                }));
        var counter = 0;
        timer = new Timer(x =>
            {
                for (var i = 0; i < 500000; ++i)
                {
                    intPort.Post(counter++);
                }
            },
            null, 0, 1000);
    }

    private void ReportCurrent(int currentValue)
    {
        if (currentValue % 1000000 == 0)
        {
            Console.WriteLine("{0}, Current Items In Queue:{1}",
                currentValue, intPort.ItemCount);
        }
    }
}
Both of these examples increase throughput by orders of magnitude. Hope this helps.