How to deal with the position in a C# Stream

The (entire) documentation for the position property on a stream says:
When overridden in a derived class, gets or sets the position within the current stream.
The Position property does not keep track of the number of bytes from the stream that have been consumed, skipped, or both.
That's it.
OK, so we're fairly clear on what it doesn't tell us, but I'd really like to know what it actually represents. What is 'the position' for? Why would we want to read or alter it? If we change it - what happens?
As a practical example, I have a stream that periodically gets written to, and I have a thread that attempts to read from it (ideally ASAP).
From reading many SO issues, I reset the position field to zero to start my reading. Once this is done:
Does this affect where the writer to this stream is going to attempt to put the data? Do I need to keep track of the last write position myself? (ie if I set the position to zero to read, does the writer begin to overwrite everything from the first byte?)
If so, do I need a semaphore/lock around this 'position' field (subclassing, perhaps?) due to my two threads accessing it?
If I don't handle this property, does the writer just overflow the buffer?
Perhaps I don't understand the Stream itself - I'm regarding it as a FIFO pipe: shove data in at one end, and suck it out at the other.
If it's not like this, then do I have to keep copying the data past my last read (ie from position 0x84 on) back to the start of my buffer?
I've seriously tried to research all of this for quite some time - but I'm new to .NET. Perhaps the Streams have a long, proud (undocumented) history that everyone else implicitly understands. But for a newcomer, it's like reading the manual to your car, and finding out:
The accelerator pedal affects the volume of fuel and air sent to the fuel injectors. It does not affect the volume of the entertainment system, or the air pressure in any of the tires, if fitted.
Technically true, but seriously, what we want to know is that if you mash it to the floor, you go faster.
EDIT - Bigger Picture
I have data coming in either from a serial port, a socket, or a file, and have a thread that sits there waiting for new data, and writing it to one or more streams - all identical.
One of these streams I can access from a telnet session from another pc, and that all works fine.
The problem I'm having now is parsing the data in code in the same program (on another of the duplicated streams). I'm duplicating the data to a MemoryStream, and have a thread to sit and decipher the data, and pass it back up to the UI.
This thread does a dataStream.BeginRead() into its own buffer, which returns some(?) amount of data up to but not more than the count argument. After I've dealt with whatever I got back from the BeginRead, I copy the remaining data (from the end of my read point to the end of the stream) to the start of my buffer so it won't overflow.
At this point, since both the writing and reading are asynchronous, I don't know if I can change the position (since it's a 'cursor' - thanks Jon). Even if I send a message to the other thread to say that I've just read 28 bytes, or whatever - it won't know which 28 bytes they were, and won't know how to reset its cursor/position.
I haven't subclassed any streams - I've just created a MemoryStream, and passed that to the thread that duplicates the data out to whatever streams are needed.
This all feels too complex to be the right way of doing it - I'm just unable to find a simple example I can modify as needed.
How else do people deal with a long-term sporadic data stream that needs to be sent to some other task that isn't instantaneous to perform?
EDIT: Probable Solution
While trying to write a Stream wrapper around a queue due to information in the answers, I stumbled upon this post by Stephen Toub.
He has written a BlockingStream, and explains:
Most streams in the .NET Framework are not thread safe, meaning that multiple threads can't safely access an instance of the stream concurrently and most streams maintain a single position at which the next read or write will occur. BlockingStream, on the other hand, is thread safe, and, in a sense, it implicitly maintains two positions, though neither is exposed as a numerical value to the user of the type.
BlockingStream works by maintaining an internal queue of data buffers written to it. When data is written to the stream, the buffer written is enqueued. When data is read from the stream, a buffer is dequeued in a first-in-first-out (FIFO) order, and the data in it is handed back to the caller. In that sense, there is a position in the stream at which the next write will occur and a position at which the next read will occur.
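A minimal sketch of that queue-backed idea (my own illustration of the concept, not Toub's actual code - the class and member names are made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Sketch: writes enqueue byte[] buffers, reads dequeue them FIFO.
// The "two positions" are implicit in the head and tail of the queue.
public class QueueStream : Stream
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private byte[] _current;   // partially consumed buffer from the last dequeue
    private int _offset;

    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;

    public override void Write(byte[] buffer, int offset, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        _queue.Add(copy);      // the "write position" advances implicitly
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_current == null)
        {
            // Blocks until data is written, or returns 0 after CompleteWriting().
            if (!_queue.TryTake(out _current, Timeout.Infinite)) return 0;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        if (_offset == _current.Length) _current = null;
        return n;
    }

    public void CompleteWriting() => _queue.CompleteAdding();

    public override void Flush() { }
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

Note that Position and Seek throw NotSupportedException - exactly as the documentation quoted earlier says a non-seekable stream should.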
This seems exactly what I was looking for - so thanks for the answers, guys; I only found this because of your answers.

I think that you are expecting a little too much from the documentation. It does tell you exactly what everything does, but it doesn't tell you much about how to use it. If you are not familiar with streams, reading only the documentation will not give you enough information to actually understand how to use them.
Let's look at what the documentation says:
"When overridden in a derived class, gets or sets the position within the current stream."
This is "standard documentation speak" for saying that the property is intended for keeping track of the position in the stream, but that the Stream class itself doesn't provide the actual implementation of that. The implementation lies in classes that derive from the Stream class, like a FileStream or a MemoryStream. Each have their own system of maintaining the position, because they work against completely different back ends.
There can even be implementation of streams where the Position property doesn't make sense. You can use the CanSeek property to find out if a stream implementation supports a position.
"The Position property does not keep track of the number of bytes from the stream that have been consumed, skipped, or both."
This means that the Position property represents an absolute position in the back-end implementation; it's not just a counter of what's been read or written. The methods for reading and writing the stream use the position to keep track of where to read or write, not the other way around.
For a stream implementation that doesn't support a position, it could still have returned how many bytes have been read or written, but it doesn't. The Position property should reflect an actual place in the data, and if it can't do that it should throw a NotSupportedException.
Now, let's look at your case:
Using a StreamReader and a StreamWriter against the same stream is tricky, and mostly pointless. The stream only has one position, and that will be used for both reading and writing, so you would have to keep track of two separate positions. Also, you would have to flush the buffer after each read and write operation, so that there is nothing left in the buffers and the Position of the stream is up to date when you retrieve it. This means that the StreamReader and StreamWriter can't be used as intended, and only act as a wrapper around the stream.
If you are using the StreamReader and StreamWriter from different threads, you have to synchronise every operation. Two threads can never use the stream at the same time, so a read/write operation would have to do:
lock
set position of the stream from local copy
read/write
flush buffer
get position of the stream to local copy
end lock
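The sequence above might look something like this in code - a sketch only, with illustrative names, assuming the underlying stream is seekable:

```csharp
using System;
using System.IO;

// Sketch of sharing one seekable stream between a reader and a writer,
// keeping a separate logical position for each. All names are illustrative.
public class SharedStreamAccess
{
    private readonly Stream _stream;
    private readonly object _lock = new object();
    private long _readPosition;
    private long _writePosition;

    public SharedStreamAccess(Stream stream) => _stream = stream;

    public void WriteChunk(byte[] data)
    {
        lock (_lock)
        {
            _stream.Position = _writePosition;   // restore the writer's cursor
            _stream.Write(data, 0, data.Length);
            _stream.Flush();                     // leave nothing in a buffer
            _writePosition = _stream.Position;   // remember where writing stopped
        }
    }

    public int ReadChunk(byte[] buffer)
    {
        lock (_lock)
        {
            _stream.Position = _readPosition;    // restore the reader's cursor
            int n = _stream.Read(buffer, 0, buffer.Length);
            _readPosition = _stream.Position;    // remember where reading stopped
            return n;
        }
    }
}
```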
A stream can be used as a FIFO buffer that way, but there are other ways that may be better suited for your needs. A Queue<T> for example works as an in-memory FIFO buffer.

The position is the "cursor" for both writing and reading. So yes, after resetting the Position property to 0, it will start overwriting existing data.
You should be careful when dealing with a stream from multiple threads in the first place, to be honest. It's not clear whether you've written a new Stream subclass, or whether you're just the client of an existing stream, but either way you need to be careful.
It's not clear what you mean by "If I don't handle this property" - what do you mean by "handle" here? Again, it would help if you were clearer on what you were doing.
A Stream may act like a pipe... it really depends on what you're doing with it. It's unclear what you mean by "do I have to keep copying the data past my last read" - and unclear what you mean by your buffer, too.
If you could give an idea of the bigger picture of what you're trying to achieve, that would really help.

Related

Checking if NamedPipeClientStream has something to read

I need to check a NamedPipeClientStream to see if there are bytes for it to read before I attempt to read it. The reason for this is because the thread stops on any read operation if there's nothing to read and I simply cannot have that. I must be able to continue even if there's no bytes to read.
I've also tried wrapping it in a StreamReader, which I've seen suggested, but that has the same result.
StreamReader sr = new StreamReader(myPipe);
string temp;
while((temp = sr.ReadLine()) != null) //Thread stops in ReadLine
{
Console.WriteLine("Received from server: {0}", temp);
}
I either need for the read operations to not wait until there are bytes to read, or a way to check if there are bytes to read before attempting the read operations.
PipeStream does not support the Length, Position or ReadTimeout properties, or Seek...
This is a very bad pattern. Structure your code so that there's a reading thread that always tries to read until the stream has ended. Then, make your threads communicate to achieve the logic and control flow you want.
It is generally not possible to check whether an arbitrary Stream has data available. I think it's possible with named pipes. But even if you do that you need to ensure that incoming bytes will be read in a timely manner. There is no event for that. Even if you manage all of this the code will be quite nasty. It will not be easy to mentally verify.
For that reason, simply keep a reading loop alive. You could make that reading loop enqueue the data into a queue (maybe BlockingCollection). Then other threads can check that queue for data or wait for data to arrive. The stream will always be drained correctly. You can signal the stream end by enqueueing null.
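That drain loop might be sketched like this (assuming a BlockingCollection<byte[]> as the queue, with null as the end-of-stream marker; names are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public static class StreamDrainer
{
    // One reader keeps the stream drained and hands chunks to consumers
    // through a queue. A null chunk signals end of stream.
    public static void Drain(Stream source, BlockingCollection<byte[]> queue)
    {
        var buffer = new byte[4096];
        int n;
        // Read blocks while no data is available, so the loop only exits
        // when the stream actually ends (Read returns 0).
        while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            var chunk = new byte[n];
            Buffer.BlockCopy(buffer, 0, chunk, 0, n);
            queue.Add(chunk);
        }
        queue.Add(null);        // end-of-stream marker for consumers
        queue.CompleteAdding();
    }
}
```

A consumer can then loop over queue.GetConsumingEnumerable() (or call Take()) and stop when it sees null - it never touches the stream itself.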
When I say "thread" I mean any primitive that gives you the appearance of a thread. These days you would never use Thread. Rather, use async/await or Task.

Multithreaded file reading, does seek & read need critical section?

I have a file that will be read from multiple threads, do I need to put each seek and read into a critical section?
stream.Seek(seekStart, SeekOrigin.Begin);
stream.Read();
stream.Seek(seekNext, SeekOrigin.Current);
stream.Read();
or
lock(fileLock)
{
stream.Seek(seekStart, SeekOrigin.Begin);
stream.Read();
stream.Seek(seekNext, SeekOrigin.Current);
stream.Read();
}
Obviously what I'm trying to avoid is the following situation:
.
.
Thread A: Seek
<- Preempted ->
Thread B: Seek
Thread B: Read
<- Preempted ->
Thread A: Read (Will this be reading from the wrong location?)
.
.
Because your streams will be separate objects in separate threads, you should be OK without making it critical. Each stream should hold its own seek location, and so should not interfere with the others.
This assumes that you are declaring all of your variables within the class object, and not static.
Whenever you're modifying an object which is shared across threads, you need a critical section.
If the stream variable is being shared (referring to the same object), then yes.
If each thread has its own stream variable (not referring to the same object), then no.
If it is the same stream, you would need each seek and read to be in a critical section...
Alternatively, you can use MemoryMappedFile (new in .NET 4)... this allows multiple threads to access it without a critical section, because it maps the file into RAM and you can then randomly access its content...
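For example (a sketch; the file name and offsets are made up) - each thread gets its own view accessor, so there is no shared file pointer to protect:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

// Create a small file to demonstrate with.
File.WriteAllBytes("data.bin", new byte[] { 10, 20, 30, 40 });

using (var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.Open))
{
    // Four threads read concurrently, each through its own accessor,
    // so no Seek and no critical section is needed.
    Parallel.For(0, 4, i =>
    {
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte b = accessor.ReadByte(i);   // random access by offset
            Console.WriteLine($"offset {i}: {b}");
        }
    });
}
```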
MSDN says for FileStream:
When a FileStream object does not have an exclusive hold on its handle, another thread could access the file handle concurrently and change the position of the operating system's file pointer that is associated with the file handle. In this case, the cached position in the FileStream object and the cached data in the buffer could be compromised. The FileStream object routinely performs checks on methods that access the cached buffer to assure that the operating system's handle position is the same as the cached position used by the FileStream object.
If an unexpected change in the handle position is detected in a call to the Read method, the .NET Framework discards the contents of the buffer and reads the stream from the file again. This can affect performance, depending on the size of the file and any other processes that could affect the position of the file stream.
So at the least, you can have performance issues. For performance, the best solution is to use asynchronous I/O (BeginRead, EndRead).

C#, is there such a thing as a "thread-safe" stream?

I am redirecting the output of a process into a streamreader which I read later. My problem is I am using multiple threads which SHOULD have separate instances of this stream. When I go to read this stream in, the threading fudges and starts executing oddly. Is there such a thing as making a thread-safe stream? EDIT: I put locks on the ReadToEnd on the streamreader, and the line where I did: reader = proc.StandardOutput;
There's a SynchronizedStream built into the framework; they just don't expose the class for you to look at/subclass etc., but you can turn any stream into a synchronized stream using
var syncStream = Stream.Synchronized(inStream);
You should pass the syncStream object around to each thread that needs it, and make sure you never try to access inStream elsewhere in code.
The SynchronizedStream just implements a monitor on all read/write operation to ensure that a thread has mutually exclusive access to the stream.
Edit:
It appears they also implement a SynchronizedReader/SynchronizedWriter in the framework too.
var reader = TextReader.Synchronized(process.StandardOutput);
A 'thread-safe' stream doesn't really mean anything. If the stream is somehow shared you must define on what level synchronization/sharing can take place. This in terms of the data packets (messages or records) and their allowed/required ordering.

Lazy stream for C# / .NET

Does anyone know of a lazy stream implementation in .NET? IOW, I want to create a method like this:
public Stream MyMethod() {
    return new LazyStream(...whatever parameters..., delegate() {
        ... some callback code.
    });
}
and when my other code calls MyMethod() to retrieve the stream, it will not actually perform any work until someone tries to read from the stream. The usual way would be to make MyMethod take the stream as a parameter, but that won't work in my case (I want to give the returned stream to an MVC FileStreamResult).
To further explain, what I'm looking for is to create a layered series of transformations, so
Database result set =(transformed to)=> byte stream =(chained to)=> GZipStream =(passed to)=> FileStreamResult constructor.
The result set can be huge (GB), so I don't want to cache the result in a MemoryStream, which I can pass to the GZipStream constructor. Rather, I want to fetch from the result set as the GZipStream requests data.
Most stream implementations are, by nature, lazy streams. Typically, any stream will not read information from its source until it is requested by the user of the stream (other than some extra "over-reading" to allow for buffering to occur, which makes stream usage much faster).
It would be fairly easy to make a Stream implementation that did no reading until necessary by overriding Read to open the underlying resource and then read from it when used, if you need a fully lazy stream implementation. Just override Read, CanRead, CanWrite, and CanSeek.
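A sketch of such a wrapper, assuming the callback is a Func<Stream> that produces the real stream (names and design are illustrative, not a library type):

```csharp
using System;
using System.IO;

// Read-only stream that defers opening its source until the first Read.
public class LazyStream : Stream
{
    private readonly Func<Stream> _factory;
    private Stream _inner;

    public LazyStream(Func<Stream> factory) => _factory = factory;

    // The factory runs at most once, on first access.
    private Stream Inner => _inner ?? (_inner = _factory());

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;

    public override int Read(byte[] buffer, int offset, int count)
        => Inner.Read(buffer, offset, count);   // source opened here, on demand

    public override void Flush() { }
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner?.Dispose();
        base.Dispose(disposing);
    }
}
```

The callback could just as easily pull rows from a database result set and encode them as bytes, which is the chained-transformation scenario described above.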
In your Stream class you have to implement several methods of System.IO.Stream including the Read method.
What you do in this method is up to you. If you choose to call a delegate - this is up to you as well, and of course you can pass this delegate as one of the parameters of your constructor. At least this is how I would do it.
Unfortunately it will take more than implementing the Read method, and your delegate will not cover the other required methods.
This answer (https://stackoverflow.com/a/22048857/1037948) links to this article about how to write your own stream class.
To quote the answer:
The producer writes data to the stream and the consumer reads. There's a buffer in the middle so that the producer can "write ahead" a little bit. You can define the size of the buffer.
To quote the original source:
You can think of the ProducerConsumerStream as a queue that has a Stream interface. Internally, it's implemented as a circular buffer. Two indexes keep track of the insertion and removal points within the buffer. Bytes are written at the Head index, and removed from the Tail index.
If Head wraps around to Tail, then the buffer is full and the producer has to wait for some bytes to be read before it can continue writing. Similarly, if Tail catches up with Head, the consumer has to wait for bytes to be written before it can proceed.
The article goes on to describe some weird cases when the pointers wrap around, with full code samples.

How can I split (copy) a Stream in .NET?

Does anyone know where I can find a Stream splitter implementation?
I'm looking to take a Stream, and obtain two separate streams that can be independently read and closed without impacting each other. These streams should each return the same binary data that the original stream would. No need to implement Position or Seek and such... Forward only.
I'd prefer if it didn't just copy the whole stream into memory and serve it up multiple times, which would be fairly simple enough to implement myself.
Is there anything out there that could do this?
I have made a SplitStream available on github and NuGet.
It goes like this.
using (var inputSplitStream = new ReadableSplitStream(inputSourceStream))
using (var inputFileStream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputFileStream = File.OpenWrite("MyFileOnAnyFilestore.bin"))
using (var inputSha1Stream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputSha1Stream = SHA1.Create())
{
    inputSplitStream.StartReadAhead();

    Parallel.Invoke(
        () => {
            var bytes = outputSha1Stream.ComputeHash(inputSha1Stream);
            var checksumSha1 = string.Join("", bytes.Select(x => x.ToString("x")));
        },
        () => {
            inputFileStream.CopyTo(outputFileStream);
        }
    );
}
I have not tested it on very large streams, but give it a try.
github: https://github.com/microknights/SplitStream
Not out of the box.
You'll need to buffer the data from the original stream in a FIFO manner, discarding only data which has been read by all "reader" streams.
I'd use:
A "management" object holding some sort of queue of byte[] holding the chunks to be buffered and reading additional data from the source stream if required
Some "reader" instances which known where and on what buffer they are reading, and which request the next chunk from the "management" and notify it when they don't use a chunk anymore, so that it may be removed from the queue
This could be tricky without risking keeping everything buffered in memory (if the streams are at BOF and EOF respectively).
I wonder whether it isn't easier to write the stream to disk, copy it, and have two streams reading from disk, with self-deletion built into the Close() (i.e. write your own Stream wrapper around FileStream).
The below seems to be a valid implementation, called EchoStream:
http://www.codeproject.com/Articles/3922/EchoStream-An-Echo-Tee-Stream-for-NET
It's a very old implementation (2003) but should provide some context.
found via Redirect writes to a file to a stream C#
You can't really do this without duplicating at least part of the source stream - mostly because it doesn't sound like you can control the rate at which they are consumed (multiple threads?). You could do something clever regarding one reading ahead of the other (and thereby making the copy at that point only), but the complexity of this sounds like it's not worth the trouble.
I do not think you will be able to find a generic implementation to do just that. A Stream is rather abstract, you don't know where the bytes are coming from. For instance you don't know if it will support seeking; and you don't know the relative cost of operations. (The Stream might be an abstraction of reading data from a remote server, or even off a backup tape !).
If you are able to have a MemoryStream and store the contents once, you can create two separate streams using the same buffer; and they will behave as independent Streams but only use the memory once.
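For example, the two readers can share one byte[] while each keeps its own independent Position:

```csharp
using System;
using System.IO;

// The content is stored only once...
byte[] data = { 1, 2, 3, 4 };

// ...but each reader stream maintains its own Position over it.
var readerA = new MemoryStream(data, writable: false);
var readerB = new MemoryStream(data, writable: false);

Console.WriteLine(readerA.ReadByte());  // 1 - advances only readerA
Console.WriteLine(readerB.ReadByte());  // 1 - readerB still started at 0
Console.WriteLine(readerA.ReadByte());  // 2
```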
Otherwise, I think you are best off by creating a wrapper class that stores the bytes read from one stream, until they are also read by the second stream. That would give you the desired forward-only behaviour - but in worst case, you might risk storing all of the bytes in memory, if the second Stream is not read until the first Stream has completed reading all content.
With the introduction of async / await, so long as all but one of your reading tasks are async, you should be able to process the same data twice using only a single OS thread.
What I think you want, is a linked list of the data blocks you have seen so far. Then you can have multiple custom Stream instances that hold a pointer into this list. As blocks fall off the end of the list, they will be garbage collected. Reusing the memory immediately would require some other kind of circular list and reference counting. Doable, but more complicated.
When your custom Stream can answer a ReadAsync call from the cache, copy the data, advance the pointer down the list and return.
When your Stream has caught up to the end of the cache list, you want to issue a single ReadAsync to the underlying stream, without awaiting it, and cache the returned Task with the data block. So if any other Stream reader also catches up and tries to read more before this read completes, you can return the same Task object.
This way, both readers will hook their await continuation to the result of the same ReadAsync call. When the single read returns, both reading tasks will sequentially execute the next step of their process.
