I am redirecting the output of a process into a StreamReader which I read later. My problem is that I am using multiple threads which SHOULD have separate instances of this stream. When I go to read this stream in, the threading breaks down and starts executing oddly. Is there such a thing as a thread-safe stream? EDIT: I put locks on the ReadToEnd on the StreamReader, and on the line where I did: reader = proc.StandardOutput;
There's a SynchronizedStream built into the framework. The class itself isn't exposed for you to inspect or subclass, but you can turn any stream into a synchronized one using:
var syncStream = Stream.Synchronized(inStream);
You should pass the syncStream object around to each thread that needs it, and make sure you never try to access inStream elsewhere in code.
The SynchronizedStream just implements a monitor around every read/write operation to ensure that only one thread at a time has access to the stream.
Edit:
It appears they also implement a SynchronizedReader/SynchronizedWriter in the framework:
var reader = TextReader.Synchronized(process.StandardOutput);
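To make the effect of the wrapper concrete, here is a minimal sketch in which several threads write to one shared stream through Stream.Synchronized. The class and method names (SynchronizedStreamDemo, WriteConcurrently) are made up for illustration, and a MemoryStream stands in for whatever stream you actually share. Note that the wrapper only makes each individual call atomic; a sequence such as Seek followed by Read still needs your own lock.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the framework.
public static class SynchronizedStreamDemo
{
    // Writes writers * perWriter bytes to one shared stream from several
    // threads and returns the final length of the stream.
    public static long WriteConcurrently(int writers, int perWriter)
    {
        var inner = new MemoryStream();

        // The wrapper takes a lock around every call it forwards to `inner`.
        Stream shared = Stream.Synchronized(inner);

        Parallel.For(0, writers, i =>
        {
            for (int n = 0; n < perWriter; n++)
            {
                // Never touch `inner` directly from here on, only `shared`.
                shared.WriteByte((byte)i);
            }
        });

        return shared.Length;
    }
}
```

Calling SynchronizedStreamDemo.WriteConcurrently(4, 1000) should report a length of 4000, because no write is lost to a race on the stream's position.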
A 'thread-safe' stream doesn't really mean anything on its own. If the stream is somehow shared, you must define at what level synchronization/sharing can take place, that is, in terms of the data packets (messages or records) and their allowed or required ordering.
Related
I need to check a NamedPipeClientStream to see if there are bytes for it to read before I attempt to read it. The reason is that the thread stops on any read operation if there's nothing to read, and I simply cannot have that. I must be able to continue even if there are no bytes to read.
I've also tried wrapping it in a StreamReader, which I've seen suggested, but that has the same result.
StreamReader sr = new StreamReader(myPipe);
string temp;
while((temp = sr.ReadLine()) != null) //Thread stops in ReadLine
{
Console.WriteLine("Received from server: {0}", temp);
}
I either need for the read operations to not wait until there are bytes to read, or a way to check if there are bytes to read before attempting the read operations.
PipeStream does not support the Length, Position or ReadTimeout properties or Seek...
This is a very bad pattern. Structure your code so that there's a reading thread that always tries to read until the stream has ended. Then, make your threads communicate to achieve the logic and control flow you want.
It is generally not possible to check whether an arbitrary Stream has data available. I think it's possible with named pipes. But even if you do that you need to ensure that incoming bytes will be read in a timely manner. There is no event for that. Even if you manage all of this the code will be quite nasty. It will not be easy to mentally verify.
For that reason, simply keep a reading loop alive. You could make that reading loop enqueue the data into a queue (maybe BlockingCollection). Then other threads can check that queue for data or wait for data to arrive. The stream will always be drained correctly. You can signal the stream end by enqueueing null.
When I say "thread" I mean any primitive that gives you the appearance of a thread. These days you would never use Thread. Rather, use async/await or Task.
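A rough sketch of that reading loop, under a few assumptions: the names ReaderLoopDemo, StartReading and ReadAll are invented for the example, a StringReader stands in for the StreamReader over the pipe, and BlockingCollection.CompleteAdding is used to signal the end of the stream in place of enqueueing null. The consumer polls with a timeout so it never blocks for long.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the framework.
public static class ReaderLoopDemo
{
    // Drains `reader` on a background task and hands each line to
    // consumers through a queue.
    public static BlockingCollection<string> StartReading(TextReader reader)
    {
        var lines = new BlockingCollection<string>();
        Task.Run(() =>
        {
            try
            {
                string line;
                while ((line = reader.ReadLine()) != null) // blocks here, not in the consumer
                {
                    lines.Add(line);
                }
            }
            finally
            {
                lines.CompleteAdding(); // marks the end of the stream
            }
        });
        return lines;
    }

    // Consumer side: collects everything, but never waits more than
    // 50 ms at a time, so other work could be done between polls.
    public static List<string> ReadAll(string text)
    {
        BlockingCollection<string> queue = StartReading(new StringReader(text));
        var result = new List<string>();
        while (!queue.IsCompleted)
        {
            if (queue.TryTake(out string line, TimeSpan.FromMilliseconds(50)))
            {
                result.Add(line);
            }
            // else: nothing arrived yet; do something else and poll again
        }
        return result;
    }
}
```

With a real pipe you would pass new StreamReader(myPipe) to StartReading; the blocking ReadLine then only ever stalls the background task.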
What would be the best way to design a packing / converting stream proxy in C#?
Suppose I have some input stream and I wish to build something similar to what boost::iostreams does.
So, for example, I can apply a zlib packing proxy to the stream, so that when I access the contents of the ZlibWrappedStream(initialStream), I receive the data from initialStream, but packed using zlib?
How can this be designed considering the fact that different proxies can be applied one after another and also considering the possibility of multithreaded packing?
.NET streams already allow the chaining that you're talking about. For example, if I want to gzip data and store to a file, I can write the following:
using (var fs = File.OpenWrite(outputFilename))
{
using (var gz = new GZipStream(fs, CompressionMode.Compress))
{
// now, write to the GZip stream . . .
gz.Write(buffer, 0, buffer.Length);
}
}
You can make that chain arbitrarily long. For example, I often put a BufferedStream in front of the GZipStream to give gzip a larger compression window.
Multithreaded packing shouldn't be a problem, as long as you confine all of your multithreaded operations to the class's internals. If you want multiple threads to be able to write to the stream concurrently, you'll have to create some kind of synchronization mechanism to prevent interleaving data in the input buffer. Again, if you limit that synchronization to the class internals, then there shouldn't be a problem. The same kind of thing applies to multithreaded unpacking and reading.
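As an illustration of the chaining, here is a small round trip that stacks a text writer on a GZipStream on a MemoryStream, and the matching reader chain for unpacking. ChainDemo, Compress and Decompress are names chosen for this sketch; the MemoryStream is only a stand-in for a file or network stream.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical demo class, not part of the framework.
public static class ChainDemo
{
    // StreamWriter -> GZipStream -> MemoryStream
    public static byte[] Compress(string text)
    {
        var sink = new MemoryStream();
        using (var gz = new GZipStream(sink, CompressionMode.Compress))
        using (var writer = new StreamWriter(gz))
        {
            writer.Write(text);
        } // disposing the chain flushes the writer and finishes the gzip data

        // ToArray still works after the MemoryStream has been closed.
        return sink.ToArray();
    }

    // MemoryStream -> GZipStream -> StreamReader
    public static string Decompress(byte[] data)
    {
        using (var source = new MemoryStream(data))
        using (var gz = new GZipStream(source, CompressionMode.Decompress))
        using (var reader = new StreamReader(gz))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Each layer only knows about the stream directly beneath it, which is what lets you insert or remove proxies without touching the rest of the chain.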
I have a file that will be read from multiple threads, do I need to put each seek and read into a critical section?
stream.Seek(seekStart, SeekOrigin.Begin);
stream.Read();
stream.Seek(seekNext, SeekOrigin.Current);
stream.Read();
or
lock(fileLock)
{
stream.Seek(seekStart, SeekOrigin.Begin);
stream.Read();
stream.Seek(seekNext, SeekOrigin.Current);
stream.Read();
}
Obviously what I'm trying to avoid is the following situation:
.
.
Thread A: Seek
<- Preempted ->
Thread B: Seek
Thread B: Read
<- Preempted ->
Thread A: Read (Will this be reading from the wrong location?)
.
.
Because your streams will be separate objects in separate threads, you should be OK without making it critical. Each stream should hold its own seek location, and so should not interfere with the others.
This assumes that you are declaring all of your variables within the class object, and not static.
Whenever you're modifying an object which is shared across threads, you need a critical section.
If the stream variable is being shared (referring to the same object), then yes.
If each thread has its own stream variable (not referring to the same object), then no.
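For the "own stream per thread" case, a sketch could look like the following. PerThreadStreamDemo and ReadAt are hypothetical names, and the code assumes every offset lies inside the file. Each worker opens its own FileStream with FileShare.Read, so each one has its own position and no lock is needed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the framework.
public static class PerThreadStreamDemo
{
    // Reads one byte at each offset in parallel. Assumes all offsets
    // are within the file.
    public static byte[] ReadAt(string path, long[] offsets)
    {
        var result = new byte[offsets.Length];
        Parallel.For(0, offsets.Length, i =>
        {
            // A separate FileStream per worker: separate position, no sharing.
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                stream.Seek(offsets[i], SeekOrigin.Begin);
                result[i] = (byte)stream.ReadByte();
            }
        });
        return result;
    }
}
```

If the threads instead share a single stream object, the Seek and Read pair has to sit inside one lock, as in the second snippet of the question.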
If it is the same stream, you would need each seek and read to be in a critical section...
Alternatively you can use MemoryMappedFile (new in .NET 4)... this allows multiple threads to access it without a critical section because it maps the file into RAM, and then you can randomly access its content...
MSDN says this about FileStream:
When a FileStream object does not have an exclusive hold on its
handle, another thread could access the file handle concurrently and
change the position of the operating system's file pointer that is
associated with the file handle. In this case, the cached position in
the FileStream object and the cached data in the buffer could be
compromised. The FileStream object routinely performs checks on
methods that access the cached buffer to assure that the operating
system's handle position is the same as the cached position used by
the FileStream object.
If an unexpected change in the handle position is detected in a call
to the Read method, the .NET Framework discards the contents of the
buffer and reads the stream from the file again. This can affect
performance, depending on the size of the file and any other processes
that could affect the position of the file stream.
So at the very least, you can have performance issues. For performance, the best solution is to use asynchronous I/O (BeginRead, EndRead).
I am trying to figure out how to pass a class/struct/etc. between threads using a named pipe (I am trying to measure some performance using the Stopwatch and compare it to other methods).
Anyway, all the documentation I've found talks about using StreamReader and ReadLine to get the data from the NamedPipeServerStream. However, ReadLine returns a string; how do I actually use the data from the named pipe if I am passing something that is not a string?
Thanks,
Eyal
NamedPipeServerStream is a stream - so it's fine for binary data out of the box. Just treat it like a normal stream, rather than wrapping it in a StreamReader.
As for passing objects - why use named pipes if you're strictly within a single process? Just create an in-memory producer/consumer queue.
Using BinaryFormatter is the easiest way to serialize an object in and out of a pipe stream. You'll need to decorate the class or struct with the [Serializable] attribute.
Using a pipe is however a very inefficient way to "pass" data between threads. Every thread in a process has access to the same garbage collected heap, no serialization is required as long as the threads run in the same AppDomain. You do need to synchronize access to the object(s), use the lock statement. The ConcurrentQueue class makes it very simple.
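A minimal sketch of handing objects from one thread to another through a ConcurrentQueue, with no serialization involved. Sample, QueueDemo and SumViaQueue are names invented for the example. The consumer here simply yields when the queue is empty; a BlockingCollection would let it block instead.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical payload type for the example.
public sealed class Sample
{
    public int Id;
    public double Value;
}

// Hypothetical demo class, not part of the framework.
public static class QueueDemo
{
    // A producer task enqueues `count` objects; the calling thread
    // dequeues them and sums their values.
    public static double SumViaQueue(int count)
    {
        var queue = new ConcurrentQueue<Sample>();

        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < count; i++)
            {
                // The object reference itself is passed; nothing is copied or serialized.
                queue.Enqueue(new Sample { Id = i, Value = i * 0.5 });
            }
        });

        double sum = 0;
        int received = 0;
        while (received < count)
        {
            if (queue.TryDequeue(out Sample s))
            {
                sum += s.Value;
                received++;
            }
            else
            {
                Thread.Yield(); // queue is empty for now; let the producer run
            }
        }

        producer.Wait();
        return sum;
    }
}
```

For a Stopwatch comparison against the pipe, time the SumViaQueue call the same way you time the pipe round trip.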
Does anyone know where I can find a Stream splitter implementation?
I'm looking to take a Stream, and obtain two separate streams that can be independently read and closed without impacting each other. These streams should each return the same binary data that the original stream would. No need to implement Position or Seek and such... Forward only.
I'd prefer if it didn't just copy the whole stream into memory and serve it up multiple times, which would be fairly simple enough to implement myself.
Is there anything out there that could do this?
I have made a SplitStream available on github and NuGet.
It goes like this.
using (var inputSplitStream = new ReadableSplitStream(inputSourceStream))
using (var inputFileStream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputFileStream = File.OpenWrite("MyFileOnAnyFilestore.bin"))
using (var inputSha1Stream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputSha1Stream = SHA1.Create())
{
inputSplitStream.StartReadAhead();
Parallel.Invoke(
() => {
var bytes = outputSha1Stream.ComputeHash(inputSha1Stream);
var checksumSha1 = string.Join("", bytes.Select(x => x.ToString("x2")));
},
() => {
inputFileStream.CopyTo(outputFileStream);
}
);
}
I have not tested it on very large streams, but give it a try.
github: https://github.com/microknights/SplitStream
Not out of the box.
You'll need to buffer the data from the original stream in a FIFO manner, discarding only data which has been read by all "reader" streams.
I'd use:
A "management" object holding some sort of queue of byte[] holding the chunks to be buffered and reading additional data from the source stream if required
Some "reader" instances which know where and on what buffer they are reading, and which request the next chunk from the "management" object and notify it when they no longer use a chunk, so that it may be removed from the queue
This could be tricky without risking keeping everything buffered in memory (if the streams are at BOF and EOF respectively).
I wonder whether it isn't easier to write the stream to disk, copy it, and have two streams reading from disk, with self-deletion built into the Close() (i.e. write your own Stream wrapper around FileStream).
The following implementation, called EchoStream, seems to be a valid option:
http://www.codeproject.com/Articles/3922/EchoStream-An-Echo-Tee-Stream-for-NET
It's a very old implementation (2003) but should provide some context.
Found via the question "Redirect writes to a file to a stream C#".
You can't really do this without duplicating at least part of the source stream, mostly because it doesn't sound like you can control the rate at which the two streams are consumed (multiple threads?). You could do something clever with one reading ahead of the other (and thereby making the copy at that point only), but the complexity of this sounds like it's not worth the trouble.
I do not think you will be able to find a generic implementation to do just that. A Stream is rather abstract, you don't know where the bytes are coming from. For instance you don't know if it will support seeking; and you don't know the relative cost of operations. (The Stream might be an abstraction of reading data from a remote server, or even off a backup tape !).
If you are able to have a MemoryStream and store the contents once, you can create two separate streams using the same buffer; and they will behave as independent Streams but only use the memory once.
Otherwise, I think you are best off by creating a wrapper class that stores the bytes read from one stream, until they are also read by the second stream. That would give you the desired forward-only behaviour - but in worst case, you might risk storing all of the bytes in memory, if the second Stream is not read until the first Stream has completed reading all content.
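The MemoryStream option mentioned above could be sketched like this. SharedBufferDemo and Split are hypothetical names, and the sketch assumes the whole source fits in memory. The content is copied once, and every reader gets its own read-only MemoryStream over the same byte array, each with an independent position.

```csharp
using System.IO;

// Hypothetical demo class, not part of the framework.
public static class SharedBufferDemo
{
    // Buffers `source` once and returns `readers` independent,
    // read-only streams over the same underlying array.
    public static MemoryStream[] Split(Stream source, int readers)
    {
        var copy = new MemoryStream();
        source.CopyTo(copy);

        // GetBuffer exposes the internal array without copying it; the
        // array may be longer than the data, so the length is passed too.
        byte[] buffer = copy.GetBuffer();
        int length = (int)copy.Length;

        var views = new MemoryStream[readers];
        for (int i = 0; i < readers; i++)
        {
            views[i] = new MemoryStream(buffer, 0, length, writable: false);
        }
        return views;
    }
}
```

Closing one of the returned streams does not affect the others, since they only share the array and not any stream state.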
With the introduction of async / await, so long as all but one of your reading tasks are async, you should be able to process the same data twice using only a single OS thread.
What I think you want, is a linked list of the data blocks you have seen so far. Then you can have multiple custom Stream instances that hold a pointer into this list. As blocks fall off the end of the list, they will be garbage collected. Reusing the memory immediately would require some other kind of circular list and reference counting. Doable, but more complicated.
When your custom Stream can answer a ReadAsync call from the cache, copy the data, advance the pointer down the list and return.
When your Stream has caught up to the end of the cache list, you want to issue a single ReadAsync to the underlying stream, without awaiting it, and cache the returned Task with the data block. So if any other Stream reader also catches up and tries to read more before this read completes, you can return the same Task object.
This way, both readers will hook their await continuation to the result of the same ReadAsync call. When the single read returns, both reading tasks will sequentially execute the next step of their process.