I've been working on something that makes use of streams, and I found that I'm not clear on some stream concepts (you can also see another question I posted, About redirected stdout in System.Diagnostics.Process).
1. How do you indicate that you have finished writing to a stream? Do you write something like an EOF?
2. Following on from the previous question: if I have written an EOF (or something like it) to a stream but didn't close the stream, and I then want to write something else to the same stream, can I just start writing to it with no further setup required?
3. If a procedure tries to read from a stream (like stdin) that no one has written anything to, the reading procedure will block. Eventually some data arrives and the procedure reads until the writing is done, which is indicated by a read returning a count of 0 bytes instead of blocking. Now if the procedure issues another read on the same stream, it still gets a 0 count and returns immediately, whereas I expected it to block, since no one is writing to the stream anymore. So does a stream hold different states between when it has been opened but not yet written to, and when someone has finished a writing session?
I'm using Windows and the .NET Framework, in case anything is platform-specific.
Thanks a lot!
This depends on the concrete stream. For example, reading from a MemoryStream would not block as you describe. This is because a MemoryStream has an explicit size, and as you read from the stream the position advances through it until you reach the end, at which point Read returns 0. If there were no data in the MemoryStream, the first Read would immediately return 0.
What you describe fits a NetworkStream, in which case reading from the stream will block until data becomes available; when the "server" side closes the underlying Socket that the NetworkStream wraps, Read will return 0.
So the actual details depend on the stream, but at a high level they are all treated the same, i.e. you can read from a stream until Read returns 0.
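In code, that high-level pattern is just the standard read loop. A minimal sketch (ReadToEnd is an illustrative name; what you do with each block is up to you):

using System.IO;

static void ReadToEnd(Stream stream)
{
    var buffer = new byte[4096];
    int bytesRead;

    // Read blocks until data is available (on a blocking stream such as
    // NetworkStream) and returns 0 only when the stream has ended.
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Consume buffer[0..bytesRead) here.
    }

    // Reaching this point means end-of-stream, not merely "no data yet".
}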
There is no "EOF" with streams. You write to a stream until you close it; closing it prevents it from being written to any further.
Streams just read and write bytes. That's all.
Related
Is it possible to somehow (without a huge performance loss) determine whether a Stream (just a normal System.IO.Stream) "contains" a string or not? I have tried to google this, but I haven't found a good solution that doesn't involve try/catch.
Any stream can contain a string, since a string is just a series of bytes. If you're asking whether a stream contains a specific sequence of bytes -- i.e. you want to confirm that a stream contains a $MY_TOKEN$ somewhere -- then you'll have to read up to that point, or to the end of the stream.
Depending on the nature of your stream, there might be an efficient way to do this and then reset the cursor back to the beginning of the stream.
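As a rough illustration, a streaming scan for a byte sequence could look like the sketch below (ContainsToken is a made-up name; it compares a sliding window against the token, which is simple but O(n*m), so a proper string-search algorithm such as Knuth-Morris-Pratt would be faster for long tokens):

using System;
using System.IO;

static bool ContainsToken(Stream stream, byte[] token)
{
    if (token.Length == 0)
        return true;

    // Sliding window holding the last token.Length bytes read.
    var window = new byte[token.Length];
    int filled = 0;
    int b;

    while ((b = stream.ReadByte()) != -1)
    {
        if (filled < window.Length)
        {
            window[filled++] = (byte)b;
        }
        else
        {
            // Shift the window left by one byte and append the new one.
            Array.Copy(window, 1, window, 0, window.Length - 1);
            window[window.Length - 1] = (byte)b;
        }

        if (filled == token.Length)
        {
            bool match = true;
            for (int i = 0; i < token.Length && match; i++)
                match = window[i] == token[i];
            if (match)
                return true;
        }
    }

    return false;
}

To look for a string rather than raw bytes, encode it the same way the stream's text is encoded, e.g. ContainsToken(stream, Encoding.UTF8.GetBytes("$MY_TOKEN$")). Note that this consumes the stream; as mentioned above, you can only rewind afterwards if the stream supports seeking.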
Currently I have this code:
SessionStream(Request.Content.ReadAsStreamAsync(), new { });
I need to somehow "mirror" the incoming stream so that I have two instances of it.
Something like the following pseudocode:
Task<Stream> stream = Request.Content.ReadAsStreamAsync();
SessionStream(stream, new { });
Stream theOtherStream;
stream.Result.CopyToAsync(theOtherStream);
TheOtherStream(theOtherStream, new { });
A technique that always works is to copy the stream to a MemoryStream and then use that.
Often, it is more efficient to just seek the original stream back to the beginning using the Seek method. That only works if this stream supports seeking.
If you do not want to buffer and cannot seek, you need to push the stream contents blockwise to the two consumers: read a block, then write it twice (sketched below).
If in addition you need a pull model (i.e. handing a readable stream to some component), it gets really hard and threading becomes involved. You'd need to write a push-to-pull adapter, which is always hard.
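For illustration, the no-buffer, no-seek push model described above might be sketched like this (CopyToBothAsync is a made-up name; 81920 bytes matches the default buffer size used by Stream.CopyTo):

using System.IO;
using System.Threading.Tasks;

// Push the source to two consumers block by block, without buffering the
// whole stream. Neither destination can fall behind, because each block
// is written to both before the next one is read.
static async Task CopyToBothAsync(Stream source, Stream destA, Stream destB)
{
    var buffer = new byte[81920];
    int bytesRead;
    while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        await destA.WriteAsync(buffer, 0, bytesRead);
        await destB.WriteAsync(buffer, 0, bytesRead);
    }
}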
usr's answer is still correct in 2020, but for those wondering why it is not trivial, here is a short explanation.
The idea behind streams is that writing to a stream and reading from it are independent. Usually the reading process is much faster than the writing process (think of receiving data over the network: you can read the data as soon as it arrives), so usually the reader waits for a new portion of data, processes it as soon as it arrives, and then drops it to free the memory before waiting for the next portion.
This allows processing a potentially infinite data stream (for example, an application log stream) without using much RAM.
Suppose now we have two readers (as required by the question). A data portion arrives, and then we have to wait for both readers to read it before we can drop it, which means it must be stored in memory until both readers are done with it. The problem is that the readers can process the data at very different speeds. For example, one might write it to a file while the other just counts symbols in memory. In that case, either the fast one has to wait for the slow one before reading further, or we need to save the data to a buffer in memory and let both readers read from it. In the worst case we end up with a full copy of the input stream in memory, essentially creating an instance of a memory stream.
To implement the first option, you would have to implement a stream reader that knows which of your stream consumers is faster and distributes and drops the data accordingly.
If you are sure you have enough memory and processing speed is not critical, just use a memory stream:
using var memStream = new MemoryStream();
await incomingStream.CopyToAsync(memStream);

// CopyToAsync leaves the position at the end of the stream,
// so rewind before each consumer reads it.
memStream.Seek(0, SeekOrigin.Begin);
UseTheStreamForTheFirstTime(memStream);

memStream.Seek(0, SeekOrigin.Begin);
UseTheStreamAnotherTime(memStream);
I have an event that gets fired when I receive data. The data received is a segment of a large compressed stream.
Currently, I maintain a MemoryStream and am calling Write on it whenever I receive data. However, to decompress this data I need to wrap a DeflateStream around it. The problem is that when you call Read on the DeflateStream, it reads zero bytes!
This is because the DeflateStream.BaseStream is the MemoryStream I just wrote to, and the MemoryStream.Position, which is shared between reading and writing, has already been advanced past the new data.
Is there an alternative to calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after every single MemoryStream.Write?
I need to be able to read the decompressed bytes in real time, without waiting for the stream to be closed.
I thought I could just use two MemoryStream objects, one to buffer my input and one to copy into and then read from, but I quickly realized that it would present the same problem: you cannot write to a stream and then read from it without seeking first.
I solved this by using one MemoryStream and calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after each time MemoryStream.Write was invoked.
This resets the position of the stream so that the data just written can be read and decompressed, and later overwritten, while preserving the state of the DeflateStream.
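For anyone hitting the same problem, here is a rough sketch of that pattern (the class and member names are made up for illustration):

using System.IO;
using System.IO.Compression;

class IncrementalDecompressor
{
    // The same MemoryStream is shared by the writer and the DeflateStream
    // reader; its single Position is what makes the seek-back necessary.
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly DeflateStream _inflater;

    public IncrementalDecompressor()
    {
        _inflater = new DeflateStream(_buffer, CompressionMode.Decompress);
    }

    // Call this from the data-received event with each compressed segment.
    public void OnDataReceived(byte[] segment, int count, Stream output)
    {
        // Append the new compressed bytes, then rewind so the DeflateStream
        // reads exactly the data we just wrote.
        _buffer.Write(segment, 0, count);
        _buffer.Seek(-count, SeekOrigin.Current);

        var decompressed = new byte[4096];
        int n;
        // Read may return 0 if the segment ends mid-block; the remaining
        // bytes are picked up on the next call, since the DeflateStream's
        // internal state is preserved.
        while ((n = _inflater.Read(decompressed, 0, decompressed.Length)) > 0)
        {
            output.Write(decompressed, 0, n);
        }
    }
}

One caveat: the MemoryStream in this sketch grows without bound, since consumed compressed data is never discarded; for a long-lived connection you would want to compact it periodically.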
I have a socket-based application that exposes received data with a BinaryReader object on the client side. I've been trying to debug an issue where the data contained in the reader is not clean... i.e. the buffer that I'm reading contains old data past the size of the new data.
In the code below:
System.Diagnostics.Debug.WriteLine("Stream length: {0}", _binaryReader.BaseStream.Length);
byte[] buffer = _binaryReader.ReadBytes((int)_binaryReader.BaseStream.Length);
When I comment out the first line, the data doesn't end up being dirty (or at least not as regularly) as when that print statement is there. As far as I can tell, the data is coming in cleanly from the server side, so it's possible that my socket implementation has some issues. But does anyone have any idea why adding that print line would cause the data to be dirty more often?
Your binary reader looks like it is a private member variable (if the leading underscore is a telltale sign).
Is your application multithreaded? You could be experiencing a race condition if another thread is also attempting to use your BinaryReader while you are reading from it. The fact that you experience issues even without that line seems quite suspect to me.
Are you sure that your reading logic is correct? Stream.Length indicates the length of the entire stream, not of the remaining data to be read.
Suppose that, initially, 100 bytes were available. Length is 100, and the BinaryReader correctly reads 100 bytes and advances the stream position by 100. Then another 20 bytes arrive. Length is now 120; however, your BinaryReader should only be reading 20 bytes, not 120. The ‘extra’ 100 bytes requested in the second read would either cause it to block or (if the stream is not implemented correctly) break.
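If that is the issue, a sketch of the corrected read, using the same field name as in the question:

// Read only the data that has arrived since the previous read:
// Length is the total stream size, Position is how far we have read.
var baseStream = _binaryReader.BaseStream;
long newBytes = baseStream.Length - baseStream.Position;
byte[] buffer = _binaryReader.ReadBytes((int)newBytes);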
The problem was silly and unrelated; I believe my reading logic above is correct. The issue was that the _binaryReader I was using was a reference not owned by my class, and hence the underlying stream was being rewritten with bad data.
I want to make a class (let's call the class HugeStream) that takes an IEnumerable<Stream> in its constructor. This HugeStream should implement the Stream abstract class.
Basically, I have one to many pieces of UTF-8 streams coming from a DB that, when put together, make a gigantic XML document. The HugeStream needs to be file-backed so that I can seek back to position 0 of the whole stitched-together stream at any time.
Anyone know how to make a speedy implementation of this?
I saw something similar created at this page, but it does not seem optimal for handling large numbers of large streams. Efficiency is the key.
On a side note, I'm having trouble visualizing streams, and I'm a little confused now that I need to implement my own Stream. If anyone knows of a good tutorial on implementing the Stream class, please let me know; I haven't found any good articles browsing around, just lots of articles on using existing FileStreams and MemoryStreams. I'm a very visual learner and for some reason can't find anything useful to study the concept.
Thanks,
Matt
If you only read data sequentially from the HugeStream, then it simply needs to read each child stream (appending it to a local file as well as returning the read data to the caller) until that child stream is exhausted, then move on to the next child stream. If a Seek operation is used to jump "backwards" in the data, you must start reading from the local cache file; when you reach the end of the cache file, you must resume reading the current child stream where you left off.
So far this is all pretty straightforward to implement: you just need to redirect the Read calls to the appropriate stream, and switch streams as each one runs out of data.
The inefficiency of the quoted article is that it runs through all the streams on every read to work out where to continue reading from. To improve on this, you need to open the child streams only as you need them and keep track of the currently open stream, so you can just keep reading more data from that current stream until it is exhausted. Then open the next stream as your "current" stream and carry on. This is pretty straightforward, as you have a linear sequence of streams, so you just step through them one by one, i.e. something like:
int currentStreamIndex = 0;
Stream currentStream = childStreams[currentStreamIndex++];
...
public override int Read(byte[] buffer, int offset, int count)
{
    int totalBytesRead = 0;
    while (count > 0)
    {
        // Read what we can from the current stream
        int numBytesRead = currentStream.Read(buffer, offset, count);
        totalBytesRead += numBytesRead;
        count -= numBytesRead;
        offset += numBytesRead;

        // If we haven't satisfied the read request, we have exhausted the child stream.
        // Move on to the next stream and loop around to read more data.
        if (count > 0)
        {
            // If we run out of child streams to read from, we're at the end of the
            // HugeStream, and there is no more data to read
            if (currentStreamIndex >= numberOfChildStreams)
                break;

            // Otherwise, close the current child-stream and open the next one
            currentStream.Close();
            currentStream = childStreams[currentStreamIndex++];
        }
    }

    // Here, you'd write the data you've just read (into buffer) to your local cache stream

    return totalBytesRead;
}
To allow seeking backwards, you just have to introduce a new local file stream that you copy all the data into as you read (see the comment in my pseudocode above). You need to introduce some state so you know that you are reading from the cache file rather than the current child stream, and then just access the cache directly. Seeking is easy, because the cache represents the entire history of the data read from the HugeStream, so the seek offsets are identical between the HugeStream and the cache; you simply redirect any Read calls to get the data out of the cache stream.
If you read or seek back to the end of the cache stream, you need to resume reading data from the current child stream. Just go back to the logic above and continue appending data to your cache stream.
If you wish to support full random access within the HugeStream, you will need to support seeking "forwards" (beyond the current end of the cache stream). If you don't know the lengths of the child streams beforehand, you have no choice but to simply keep reading data into your cache until you reach the seek offset. If you know the sizes of all the streams, you could seek directly and more efficiently to the right place, but you would then have to devise an efficient means of storing the data you read in the cache file and recording which parts of the cache file contain valid data and which have not actually been read from the DB yet; this is a bit more advanced.
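By way of illustration, here is a rough sketch of what such a Seek override might look like (the fields _cacheStream, _position and _readingFromCache, and the helper SkipForwardTo, are hypothetical names, not part of any real implementation):

public override long Seek(long offset, SeekOrigin origin)
{
    long target;
    if (origin == SeekOrigin.Begin)
        target = offset;
    else if (origin == SeekOrigin.Current)
        target = _position + offset;
    else
        throw new NotSupportedException(); // the total length may be unknown

    if (target <= _cacheStream.Length)
    {
        // The target lies within data we have already read: serve
        // subsequent Read calls from the cache file.
        _cacheStream.Seek(target, SeekOrigin.Begin);
        _readingFromCache = true;
    }
    else
    {
        // Seeking past the cached data: keep pulling child-stream data
        // into the cache until we reach the requested offset (hypothetical
        // helper built on the Read logic above).
        SkipForwardTo(target);
        _readingFromCache = false;
    }

    return _position = target;
}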
I hope that makes sense to you and gives you a better idea of how to proceed...
(You shouldn't need to implement much more than the Read and Seek overrides to get this working.)