I'm writing a class that exposes a subsection of a stream for reading. Since data may be read from several different subsections at the same time, but all of them share the same underlying stream, only one operation may be active on that stream at any one time.
My idea is to lock the underlying stream before every operation. Is locking the stream around the BeginRead call sufficient to ensure that concurrent asynchronous reads from different positions in the underlying stream happen correctly?
public sealed class SubStream : Stream
{
    // ...

    public override IAsyncResult BeginRead(byte[] buffer, int offset, int count,
        AsyncCallback callback, object state)
    {
        lock (this.baseStream)
        {
            this.baseStream.Seek(this.offset + this.position, SeekOrigin.Begin);
            return this.baseStream.BeginRead(buffer, offset, count,
                callback, state);
        }
    }

    public override int EndRead(IAsyncResult asyncResult)
    {
        int read;
        lock (this.baseStream)
        {
            read = this.baseStream.EndRead(asyncResult);
            this.position += read;
        }
        return read;
    }

    // Read() and ReadByte() also lock on this.baseStream (not shown).
    // ...
}
For example, suppose thread A calls BeginRead and acquires the lock on the base stream. Now thread B calls BeginRead and has to wait for the lock to be released. Thread A sets the position of the base stream, starts an asynchronous read operation, and then releases the lock. Thread B then acquires the lock, changes the position of the base stream, and starts another asynchronous read operation. Some time later, the asynchronous read from thread A completes. Can I be sure that it reads from the original position in the base stream? If not, how do I fix it?
Here you might end up with multiple threads calling BeginRead on the same instance of the resource (baseStream). As per MSDN, "EndRead must be called exactly once for every call to BeginRead. Failing to end a read process before beginning another read can cause undesirable behavior such as deadlock." In your case, I reckon there is trouble if thread B calls Seek on baseStream while thread A is in the middle of executing its EndRead callback.
Given the nature of the requirement, you are better off wrapping the multi-threaded access around synchronous I/O; that is, the current implementation can be amended to use synchronous I/O instead of asynchronous I/O. You may also want to consider informing queued threads about the completion of earlier threads using Monitor.Wait(baseStream) and Monitor.Pulse(baseStream) or Monitor.PulseAll(baseStream).
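A minimal sketch of the synchronous variant, reusing the fields from the question; the lock itself already serializes the queued threads:

public override int Read(byte[] buffer, int offset, int count)
{
    lock (this.baseStream)
    {
        // Seek and read as one atomic unit; no other operation
        // on the shared base stream can interleave here.
        this.baseStream.Seek(this.offset + this.position, SeekOrigin.Begin);
        int read = this.baseStream.Read(buffer, offset, count);
        this.position += read;
        return read;
    }
}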
Alternatively, I would like to throw in another idea: a memory-mapped file, which suits this segmented access style naturally.
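A minimal sketch of that idea (the file path and segment bounds are placeholders); each view stream carries its own position, so threads reading different segments never contend over a shared Seek:

using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateFromFile(@"c:\data.bin", FileMode.Open))
using (var segment = mmf.CreateViewStream(offset: 4096, size: 1024))
{
    var buffer = new byte[1024];
    int read = segment.Read(buffer, 0, buffer.Length);
}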
In the given code snippet you'll read multiple times from the same position: this.position is only advanced in EndRead, so a second BeginRead issued before the first completes will seek to the same spot. Move the position update into the BeginRead function. Apart from that, you are honoring the contract of the FileStream class by never calling its methods concurrently.
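A minimal sketch of that fix, assuming each read is expected to fill its buffer (if short reads are possible, EndRead would need to correct the position afterwards):

public override IAsyncResult BeginRead(byte[] buffer, int offset, int count,
    AsyncCallback callback, object state)
{
    lock (this.baseStream)
    {
        this.baseStream.Seek(this.offset + this.position, SeekOrigin.Begin);
        // Reserve the range now, so a concurrent BeginRead sees the
        // advanced position instead of re-reading the same bytes.
        this.position += count;
        return this.baseStream.BeginRead(buffer, offset, count, callback, state);
    }
}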
Before the introduction of async-await programming into C#, how was one able to put a network request onto another thread and yield execution back to the CPU until a response was received, so that the thread would not waste CPU time?
Because if the CPU allocates time to a thread and the thread sits idle waiting for a response, that would be a waste of CPU time, right?
In several ways; however, the Asynchronous Programming Model (APM) was the go-to pattern for this type of asynchrony:
An asynchronous operation that uses the IAsyncResult design pattern is implemented as two methods named BeginOperationName and EndOperationName that begin and end the asynchronous operation OperationName respectively. For example, the FileStream class provides the BeginRead and EndRead methods to asynchronously read bytes from a file. These methods implement the asynchronous version of the Read method.
To answer your question:
Because if the CPU allocates time to a thread and the thread sits idle waiting for a response, that would be a waste of CPU time, right?
No. Blocking a thread that is waiting for a completion port to call back doesn't cause CPU cycles to run away; polling on a thread, however, will.
There is a lot to how this works, but an example of its use can be seen below.
Example of usage
private static void TestWrite()
{
    // Must specify FileOptions.Asynchronous, otherwise the BeginXxx/EndXxx
    // methods are handled synchronously.
    FileStream fs = new FileStream(Program.FilePath, FileMode.OpenOrCreate,
        FileAccess.Write, FileShare.None, 8, FileOptions.Asynchronous);

    string content = "A quick brown fox jumps over the lazy dog";
    byte[] data = Encoding.Unicode.GetBytes(content);

    // Begin writing the content to the file stream.
    Console.WriteLine("Begin to write");
    fs.BeginWrite(data, 0, data.Length, Program.OnWriteCompleted, fs);
    Console.WriteLine("Write queued");
}
private static void OnWriteCompleted(IAsyncResult asyncResult)
{
    // End the async operation.
    FileStream fs = (FileStream)asyncResult.AsyncState;
    fs.EndWrite(asyncResult);

    // Close the file stream.
    fs.Close();
    Console.WriteLine("Write completed");

    // Test async reading bytes from the file stream.
    Program.TestRead();
}
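The TestRead method referenced above is not shown in the original; a minimal sketch of what it might look like, mirroring the write side (Program.FilePath is the same assumed constant):

private static void TestRead()
{
    // FileOptions.Asynchronous again, so BeginRead is truly asynchronous.
    FileStream fs = new FileStream(Program.FilePath, FileMode.Open,
        FileAccess.Read, FileShare.Read, 8, FileOptions.Asynchronous);
    byte[] data = new byte[1024];
    Console.WriteLine("Begin to read");
    fs.BeginRead(data, 0, data.Length, ar =>
    {
        FileStream stream = (FileStream)ar.AsyncState;
        int bytesRead = stream.EndRead(ar);
        stream.Close();
        Console.WriteLine("Read completed: {0} bytes", bytesRead);
    }, fs);
    Console.WriteLine("Read queued");
}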
The situation I am uncertain about concerns the usage of a "threadsafe" PipeStream where multiple threads can add messages to be written. If there is no queue of messages to be written, the current thread will begin writing to the reading party. If there is a queue, and the queue grows while the pipe is writing, I want the thread that began writing to deplete the queue.
I "hope" that this design (demonstrated below) discourages the continuous entering/releasing of the SemaphoreSlim and decrease the number of tasks scheduled. I say "hope" because I should test whether this complication has any positive performance implications. However, before even testing this I should first understand if the code does what I think it will, so please consider the following class, and below it a sequence of events;
Note: I understand that execution of tasks is not tied to any particular thread, but I find this is the easiest way to explain.
class SemaphoreExample
{
    // Wrapper around a NamedPipeClientStream
    private readonly MessagePipeClient m_pipe =
        new MessagePipeClient("somePipe");

    private readonly SemaphoreSlim m_semaphore =
        new SemaphoreSlim(1, 1);

    private readonly BlockingCollection<Message> m_messages =
        new BlockingCollection<Message>(new ConcurrentQueue<Message>());

    public Task Send<T>(T content)
        where T : class
    {
        if (!this.m_messages.TryAdd(new Message<T>(content)))
            throw new InvalidOperationException("No more requests!");

        Task dequeue = TryDequeue();
        return Task.FromResult(true);

        // In reality this class (and method) is more complex.
        // There is a similar pipe (and wrkr) in the other direction.
        // The "sent jobs" are kept in a dictionary and this method
        // returns a task belonging to a completion source tied
        // to the "sent job". The wrkr responsible for the other
        // pipe reads a response and sets the corresponding
        // completion source.
    }

    private async Task TryDequeue()
    {
        if (!this.m_semaphore.Wait(0))
            return; // someone else is already here

        try
        {
            Message message;
            while (this.m_messages.TryTake(out message))
            {
                await this.m_pipe.WriteAsync(message);
            }
        }
        finally { this.m_semaphore.Release(); }
    }
}
1. Wrkr1 finishes writing to the pipe. (in TryDequeue)
2. Wrkr1 determines the queue is empty. (in TryDequeue)
3. Wrkr2 adds an item to the queue. (in Send)
4. Wrkr2 determines that Wrkr1 occupies the semaphore, and returns. (in Send)
5. Wrkr1 releases the semaphore. (in TryDequeue)
6. The queue is left with one item that won't be acted upon for x amount of time.
Is this sequence of events possible? Should I forget this idea altogether and have every call to Send await TryDequeue and the semaphore within it? Perhaps the potential performance cost of scheduling another task per method call is negligible, even at a "high" frequency.
UPDATE:
Following the advice of Alex, I am doing the following:
Let the caller of Send specify a maxWorkload integer that says how many items the caller is prepared to process (for other callers, in the worst case) before delegating any remaining work to another thread. However, before creating the new thread, other callers of Send are given an opportunity to enter the semaphore, thereby possibly avoiding the additional thread.
So that no work is left lingering in the queue, any worker who successfully entered the semaphore and did some work must check whether new work was added after it exited the semaphore. If so, the same worker tries to re-enter (if maxWorkload has not been reached) or delegates work as described above.
Example below: Send now sets up TryPool as a continuation of TryDequeue. TryPool only begins if TryDequeue returns true (i.e. it did some work while holding the semaphore).
// maxWorkload cannot be -1 for this method
private async Task<bool> TryDequeue(int maxWorkload)
{
    int currWorkload = 0;
    while (this.m_messages.Count != 0 && this.m_semaphore.Wait(0))
    {
        try
        {
            currWorkload = await Dequeue(currWorkload, maxWorkload);
            if (currWorkload >= maxWorkload)
                return true;
        }
        finally
        {
            this.m_semaphore.Release();
        }
    }
    return false;
}

private Task TryPool()
{
    if (this.m_messages.Count == 0 || !this.m_semaphore.Wait(0))
        return Task.FromResult(false);

    return Task.Run(async () =>
    {
        do
        {
            try
            {
                await Dequeue(0, -1);
            }
            finally
            {
                this.m_semaphore.Release();
            }
        }
        while (this.m_messages.Count != 0 && this.m_semaphore.Wait(0));
    });
}

private async Task<int> Dequeue(int currWorkload, int maxWorkload)
{
    while (currWorkload < maxWorkload || maxWorkload == -1)
    {
        Message message;
        if (!this.m_messages.TryTake(out message))
            return currWorkload;

        await this.m_pipe.WriteAsync(message);
        currWorkload++;
    }
    return maxWorkload;
}
I tend to call this pattern the "GatedBatchWriter": the first thread through the gate handles a batch of tasks, its own and a number of others on behalf of other writers, until it has done enough work.
This pattern is primarily useful when it is more efficient to batch work because of overheads associated with that work, e.g. writing larger blocks to disk in one go instead of multiple small ones.
And yes, this particular pattern has a specific race condition to be aware of: the "responsible writer", i.e. the one that got through the gate, determines that no more messages are in the queue and stops before releasing the semaphore (i.e. its write responsibility). A second writer arrives between those two decision points and fails to acquire the write responsibility. Now there is a message in the queue that will not be delivered (or will be delivered late, when the next writer arrives).
Additionally, what you are doing now is not fair in terms of scheduling. If there are many messages, the queue might never be empty, and the writer that got through the gate will be busy writing messages on behalf of the others for all eternity. You need to limit the batch size for the responsible writer.
Some other things you may want to change are:
- Have your Message contain a task completion token.
- Have writers that could not acquire the write responsibility enqueue their message and wait for either of two task completions: the completion associated with their message, or the release of the write responsibility.
- Have the responsible writer set the completion for each message it processes.
- Have the responsible writer release its write responsibility when it has done enough work.
- When a waiting writer is woken up by one of the two task completions:
  - if it was woken by the completion token on its own message, it can go on its merry way;
  - otherwise, it tries to acquire the write responsibility; rinse, repeat...
One more note: if there are a lot of messages, i.e. a high message load on average, a dedicated thread / long-running task handling the queue will generally perform better.
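A minimal sketch of the completion-token part of that list, reusing the question's fields; DrainAsync, maxBatch, and the Completion member are illustrative names, and the WhenAny re-acquire loop for waiting writers is omitted for brevity:

sealed class Message
{
    // Completed by whichever writer actually flushes this message.
    public readonly TaskCompletionSource<bool> Completion =
        new TaskCompletionSource<bool>();
}

public Task Send(Message message)
{
    this.m_messages.Add(message);
    var drain = DrainAsync(maxBatch: 16); // bounded batch keeps scheduling fair
    return message.Completion.Task;       // completes once some writer has sent it
}

private async Task DrainAsync(int maxBatch)
{
    if (!this.m_semaphore.Wait(0))
        return; // another writer currently holds the write responsibility

    try
    {
        Message message;
        int written = 0;
        while (written++ < maxBatch && this.m_messages.TryTake(out message))
        {
            await this.m_pipe.WriteAsync(message);
            message.Completion.TrySetResult(true); // wake the waiting sender
        }
    }
    finally
    {
        this.m_semaphore.Release();
    }
}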
I know something about IOCP, but I'm a little confused about APM.
static FileStream fs;

static void Main(string[] args)
{
    fs = new FileStream(@"c:\bigfile.txt", FileMode.Open);
    var buffer = new byte[10000000];
    IAsyncResult asyncResult = fs.BeginRead(buffer, 0, 10000000, OnCompletedRead, null);
    Console.WriteLine("async...");
    int bytesRead = fs.EndRead(asyncResult);
    Console.WriteLine("async... over");
}

static void OnCompletedRead(IAsyncResult ar)
{
    Console.WriteLine("finished");
}
I wonder: is the read action executed asynchronously by an IO thread, or by a worker thread in a thread pool?
And the callback function OnCompletedRead, is it also executed by an IO thread in the CLR thread pool?
Are these two threads the same one? If not, two threads are involved: one executes the read action and another runs the callback.
If you don't use an AsyncCallback argument with BeginRead, then only one thread runs code in your program. IO completion ports are used to signal when the IO is complete, by running a small amount of code on an IO thread pool thread that marks the operation's status as complete. When you call EndRead, it blocks the current thread until the IO operation is complete. It is asynchronous in that, once you start the read operation, the current thread does not need to do anything other than wait for the IO hardware to perform the read, so you can do other things in the meantime and then decide when you want to stop and wait for the IO to finish.
If you do pass in an AsyncCallback, then when the IO operation completes, a small amount of code executes on an IO thread pool thread, which triggers your callback method to run on a thread from the .NET thread pool.
Generally, mclaassen is right about the nature of IO-bound work, IOCP, and the APM. When BeginRead executes, it does so asynchronously all the way down to kernel mode. But there is one caveat specific to your example that he didn't mention in his answer.
In your example, you use the FileStream class. One important thing to note is that if you don't use a FileStream overload that accepts a useAsync boolean, then when you invoke a BeginWrite / EndWrite operation, it will queue work on a ThreadPool thread.
This is the proper overload:
public FileStream(
    string path,
    FileMode mode,
    FileAccess access,
    FileShare share,
    int bufferSize,
    bool useAsync
)
From MSDN:
useAsync:
Type: System.Boolean
Specifies whether to use asynchronous I/O or synchronous I/O. However, note that the underlying operating system might not support asynchronous I/O, so when specifying true, the handle might be opened synchronously depending on the platform. When opened asynchronously, the BeginRead and BeginWrite methods perform better on large reads or writes, but they might be much slower for small reads or writes. If the application is designed to take advantage of asynchronous I/O, set the useAsync parameter to true. Using asynchronous I/O correctly can speed up applications by as much as a factor of 10, but using it without redesigning the application for asynchronous I/O can decrease performance by as much as a factor of 10.
You have to make sure that each specific method implementing the APM pattern truly performs asynchronous work all the way down.
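Applied to the code in the question, a minimal sketch of opening the file so that BeginRead is genuinely asynchronous (the buffer size is arbitrary):

fs = new FileStream(@"c:\bigfile.txt", FileMode.Open,
    FileAccess.Read, FileShare.Read, 4096, useAsync: true);
// Alternatively, pass FileOptions.Asynchronous to an overload that accepts FileOptions.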
Is there any way to cancel an asynchronous read or write task on an SslStream? I have tried providing ReadAsync with a CancellationToken, but it doesn't appear to work. When the following code reaches its timeout (the Task.Delay), it calls Cancel on the CancellationTokenSource, which should cancel the read task, returns an error to the calling method, and the calling method eventually tries to read again, which raises a "The BeginRead method cannot be called when another write operation is pending" exception.
In my specific application I could work around this by closing the socket and reconnecting, but there is a high overhead associated with re-establishing the connection, so that is less than ideal.
private async Task<int> ReadAsync(byte[] buffer, int offset, int count, DateTime timeout)
{
    CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();

    if (socket.Poll(Convert.ToInt32(timeout.RemainingTimeout().TotalMilliseconds) * 1000, SelectMode.SelectRead) == true)
    {
        Task<int> readTask = stream.ReadAsync(buffer, offset, count, cancellationTokenSource.Token);
        if (await Task.WhenAny(readTask, Task.Delay(timeout.RemainingTimeout())) == readTask)
            return readTask.Result;
        else
            cancellationTokenSource.Cancel();
    }

    return -1;
}
Looking at the docs for SslStream, it does not support cancellation in ReadAsync (it simply uses the fallback implementation inherited from Stream). Since SslStream is a decorator stream, it isn't obvious how to safely recover from a timeout on the underlying stream, and the only obvious way would be to re-initialize the entire stream pipeline. However, given that the underlying stream might not be seekable, again this might not be ideal.
For support of cancellation, the stream would have to override Stream.ReadAsync(Byte[], Int32, Int32, CancellationToken). According to the documentation, neither NetworkStream nor SslStream overrides the overload of ReadAsync required to consume cancellation (and the abstract Stream couldn't possibly implement generic cancellation). For an example where cancellation IS supported, see FileStream and contrast how the documentation differs.
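For instance, a minimal sketch of a cancellable read with FileStream (the path and timeout are placeholders); the same ReadAsync call on SslStream would simply ignore the token:

private static async Task ReadWithTimeoutAsync(string path)
{
    var buffer = new byte[4096];
    using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(1)))
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
        FileShare.Read, 4096, useAsync: true))
    {
        try
        {
            int read = await fs.ReadAsync(buffer, 0, buffer.Length, cts.Token);
            Console.WriteLine("Read {0} bytes", read);
        }
        catch (OperationCanceledException)
        {
            // FileStream honors the token, so the read stops promptly.
            Console.WriteLine("Read was cancelled by the timeout token.");
        }
    }
}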
So for a concrete case: if we were decorating an HttpStream with SslStream, then after a timeout we would want to recover by reopening the HttpStream at the position where we timed out (using the Range header). But there is no way to express that generically through the IO.Stream class.
Finally, you should consider what your failure case should be. Why would ReadAsync time out? In the majority of cases I can think of, it would be due to an unrecoverable network issue, which would necessitate the stream being reinitialized.
Bonus point: have you considered refactoring your timeout behaviour into a decorator stream? You could then wrap the timeout decorator around your underlying stream:
stream = new SslStream(
    new TimeoutStream(new FooStream(), TimeSpan.FromMilliseconds(1000)));
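TimeoutStream and FooStream are placeholder names; a minimal sketch of what such a decorator might look like (only ReadAsync shown; note that it abandons, rather than truly cancels, the inner read, which is exactly the limitation discussed above):

sealed class TimeoutStream : Stream
{
    private readonly Stream inner;
    private readonly TimeSpan timeout;

    public TimeoutStream(Stream inner, TimeSpan timeout)
    {
        this.inner = inner;
        this.timeout = timeout;
    }

    public override async Task<int> ReadAsync(byte[] buffer, int offset, int count,
        CancellationToken cancellationToken)
    {
        Task<int> read = this.inner.ReadAsync(buffer, offset, count, cancellationToken);
        // Fail with a TimeoutException if the inner read takes too long.
        if (await Task.WhenAny(read, Task.Delay(this.timeout)) != read)
            throw new TimeoutException("Read did not complete within " + this.timeout);
        return await read;
    }

    // The remaining members simply delegate to the inner stream.
    public override bool CanRead { get { return this.inner.CanRead; } }
    public override bool CanSeek { get { return this.inner.CanSeek; } }
    public override bool CanWrite { get { return this.inner.CanWrite; } }
    public override long Length { get { return this.inner.Length; } }
    public override long Position
    {
        get { return this.inner.Position; }
        set { this.inner.Position = value; }
    }
    public override void Flush() { this.inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count)
    { return this.inner.Read(buffer, offset, count); }
    public override long Seek(long offset, SeekOrigin origin)
    { return this.inner.Seek(offset, origin); }
    public override void SetLength(long value) { this.inner.SetLength(value); }
    public override void Write(byte[] buffer, int offset, int count)
    { this.inner.Write(buffer, offset, count); }
}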
I need to read from a NetworkStream which sends data randomly, and the size of the data packets also keeps varying. I am implementing a multi-threaded application where each thread has its own stream to read from. If there is no data on the stream, the application should keep waiting for data to arrive. However, if the server is done sending data and has terminated the session, it should exit.
Initially I used the Read method to obtain data from the stream, but it blocked the thread and kept waiting until data appeared on the stream.
The documentation on MSDN suggests:
If no data is available for reading, the Read method returns 0. If the remote host shuts down the connection, and all available data has been received, the Read method completes immediately and returns zero bytes.
But in my case, I have never got the Read method to return 0 and exit gracefully. It just waits indefinitely.
In my further investigation, I came across BeginRead, which watches the stream and invokes a callback method asynchronously as soon as it receives data. I have tried to look for various implementations using this approach as well; however, I was unable to identify when using BeginRead would be beneficial as opposed to Read.
As I see it, BeginRead merely offers an asynchronous call that does not block the current thread. But in my application, I already have a separate thread to read and process data from the stream, so that wouldn't make much difference for me.
Can anyone please help me understand the wait-and-exit mechanism for BeginRead, and how it differs from Read?
What would be the best way to implement the desired functionality?
I use BeginRead, but continue blocking the thread using a WaitHandle:
byte[] readBuffer = new byte[32];
var asyncReader = stream.BeginRead(readBuffer, 0, readBuffer.Length,
    null, null);
WaitHandle handle = asyncReader.AsyncWaitHandle;

// Give the reader 2 seconds to respond with a value
bool completed = handle.WaitOne(2000, false);
if (completed)
{
    int bytesRead = stream.EndRead(asyncReader);
    StringBuilder message = new StringBuilder();
    message.Append(Encoding.ASCII.GetString(readBuffer, 0, bytesRead));
}
Basically it allows a timeout on the async read using the WaitHandle, and gives you a boolean (completed) indicating whether the read completed in the set time (2000 ms in this case).
Here's my full stream reading code copied and pasted from one of my Windows Mobile projects:
private static bool GetResponse(NetworkStream stream, out string response)
{
    byte[] readBuffer = new byte[32];
    var asyncReader = stream.BeginRead(readBuffer, 0, readBuffer.Length, null, null);
    WaitHandle handle = asyncReader.AsyncWaitHandle;

    // Give the reader 2 seconds to respond with a value
    bool completed = handle.WaitOne(2000, false);
    if (completed)
    {
        int bytesRead = stream.EndRead(asyncReader);
        StringBuilder message = new StringBuilder();
        message.Append(Encoding.ASCII.GetString(readBuffer, 0, bytesRead));
        if (bytesRead == readBuffer.Length)
        {
            // There's possibly more than 32 bytes to read, so get the next
            // section of the response
            string continuedResponse;
            if (GetResponse(stream, out continuedResponse))
            {
                message.Append(continuedResponse);
            }
        }
        response = message.ToString();
        return true;
    }
    else
    {
        int bytesRead = stream.EndRead(asyncReader);
        if (bytesRead == 0)
        {
            // 0 bytes were returned, so the read has finished
            response = string.Empty;
            return true;
        }
        else
        {
            throw new TimeoutException(
                "The device failed to read in an appropriate amount of time.");
        }
    }
}
Async I/O can be used to achieve the same amount of I/O with fewer threads.
As you note, right now your app has one thread per stream. This is OK with small numbers of connections, but what if you need to support 10000 at once? With async I/O, this is no longer necessary, because the read-completion callback can carry context identifying the relevant stream. Your reads no longer block, so you don't need one thread per stream.
Whether you use sync or async I/O, there is a way to detect and handle stream closedown via the relevant API return codes. BeginRead should fail with an IOException if the socket has already been closed. A closedown while your async read is pending will trigger the callback, and EndRead will then tell you the state of play.
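A minimal sketch of that detection inside the completion callback (the state object and buffer handling are illustrative):

private void OnReadCompleted(IAsyncResult ar)
{
    var stream = (NetworkStream)ar.AsyncState;
    int bytesRead;
    try
    {
        bytesRead = stream.EndRead(ar);
    }
    catch (IOException)
    {
        // The socket was closed or failed while the read was pending.
        return;
    }

    if (bytesRead == 0)
    {
        // The remote host shut down the connection gracefully.
        return;
    }

    // Process readBuffer[0 .. bytesRead), then issue the next BeginRead here.
}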
When your application calls BeginRead, the system will wait until data is received or an error occurs, and then the system will use a separate thread to execute the specified callback method, and blocks on EndRead until the provided NetworkStream reads data or throws an exception.
Did you try server.ReceiveTimeout? You can set the time the Read() function will wait for incoming data before giving up. In your case, this property is probably set to infinite somewhere.
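For example (assuming server is a Socket or TcpClient; note that when the timeout elapses, NetworkStream.Read surfaces it as an IOException rather than blocking forever):

server.ReceiveTimeout = 5000; // milliseconds; 0 means wait indefinitely
// A subsequent blocking Read() now fails once 5 seconds pass with no data,
// instead of waiting forever.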
BeginRead is asynchronous, which means your main thread starts the read on another thread. So now we have two parallel operations. If you want the result, you have to call EndRead, which gives you the result.
Some pseudocode:
BeginRead();
// ...do something in the main thread while the result is fetched on another thread
var result = EndRead();
But if your main thread doesn't have anything else to do and you need the result, you should just call Read.