Lazy stream for C# / .NET

Does anyone know of a lazy stream implementation in .NET? IOW, I want to create a method like this:
public Stream MyMethod() {
    return new LazyStream(...whatever parameters..., delegate() {
        ... some callback code.
    });
}
and when my other code calls MyMethod() to retrieve the stream, it will not actually perform any work until someone tries to read from the stream. The usual way would be to make MyMethod take the stream as a parameter, but that won't work in my case (I want to give the returned stream to an MVC FileStreamResult).
To further explain, what I'm looking for is to create a layered series of transformations, so
Database result set =(transformed to)=> byte stream =(chained to)=> GZipStream =(passed to)=> FileStreamResult constructor.
The result set can be huge (GB), so I don't want to cache the result in a MemoryStream, which I can pass to the GZipStream constructor. Rather, I want to fetch from the result set as the GZipStream requests data.

Most stream implementations are, by nature, lazy streams. Typically, any stream will not read information from its source until it is requested by the user of the stream (other than some extra "over-reading" to allow for buffering to occur, which makes stream usage much faster).
If you need a fully lazy stream implementation, it would be fairly easy to write a Stream that does no reading until necessary: override Read so that it opens the underlying resource and reads from it only when used, and also override CanRead, CanWrite, and CanSeek.
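For illustration, here is a minimal sketch of such a lazy, read-only stream. The LazyStream name and the chunk-producing delegate are assumptions for this example, not an existing class; the producer is not invoked until the first Read call.

using System;
using System.Collections.Generic;
using System.IO;

public class LazyStream : Stream
{
    private readonly Func<IEnumerable<byte[]>> _producer;
    private IEnumerator<byte[]> _chunks;
    private byte[] _current = new byte[0];
    private int _offset;

    public LazyStream(Func<IEnumerable<byte[]>> producer)
    {
        _producer = producer;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_chunks == null) _chunks = _producer().GetEnumerator(); // work starts here, not in the constructor
        while (_offset >= _current.Length)
        {
            if (!_chunks.MoveNext()) return 0;   // no more data
            _current = _chunks.Current;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}

A hypothetical iterator method that yields byte[] chunks from the result set could then be passed in, e.g. new FileStreamResult(new LazyStream(FetchRowsAsChunks), "application/octet-stream"), so rows are only fetched as the response is written.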

In your Stream class you have to implement several methods of System.IO.Stream including the Read method.
What you do in this method is up to you. If you choose to call a delegate - this is up to you as well, and of course you can pass this delegate as one of the parameters of your constructor. At least this is how I would do it.
Unfortunately it will take more than implementing the Read method, and your delegate will not cover the other required members.

This answer (https://stackoverflow.com/a/22048857/1037948) links to this article about how to write your own stream class.
To quote the answer:
The producer writes data to the stream and the consumer reads. There's a buffer in the middle so that the producer can "write ahead" a little bit. You can define the size of the buffer.
To quote the original source:
You can think of the ProducerConsumerStream as a queue that has a Stream interface. Internally, it's implemented as a circular buffer. Two indexes keep track of the insertion and removal points within the buffer. Bytes are written at the Head index, and removed from the Tail index.
If Head wraps around to Tail, then the buffer is full and the producer has to wait for some bytes to be read before it can continue writing. Similarly, if Tail catches up with Head, the consumer has to wait for bytes to be written before it can proceed.
The article goes on to describe some weird cases when the pointers wrap around, with full code samples.
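The code below is not the article's implementation, just a minimal sketch of the circular-buffer idea it describes (the class and field names are invented here). The producer blocks when the buffer is full and the consumer blocks when it is empty; a Stream wrapper would call these from its Write and Read overrides.

using System;
using System.Threading;

public class CircularByteBuffer
{
    private readonly byte[] _buffer;
    private int _head;   // producer writes here
    private int _tail;   // consumer reads here
    private int _count;
    private readonly object _gate = new object();

    public CircularByteBuffer(int capacity) { _buffer = new byte[capacity]; }

    // Producer side: blocks while the buffer is full (byte-at-a-time for clarity, not speed).
    public void Write(byte[] data, int offset, int length)
    {
        lock (_gate)
        {
            for (int i = 0; i < length; i++)
            {
                while (_count == _buffer.Length) Monitor.Wait(_gate); // wait for the consumer
                _buffer[_head] = data[offset + i];
                _head = (_head + 1) % _buffer.Length;                 // wrap around
                _count++;
                Monitor.PulseAll(_gate);
            }
        }
    }

    // Consumer side: blocks while the buffer is empty, then returns what is available.
    public int Read(byte[] data, int offset, int length)
    {
        lock (_gate)
        {
            while (_count == 0) Monitor.Wait(_gate); // wait for the producer
            int read = 0;
            while (read < length && _count > 0)
            {
                data[offset + read++] = _buffer[_tail];
                _tail = (_tail + 1) % _buffer.Length;
                _count--;
            }
            Monitor.PulseAll(_gate);
            return read;
        }
    }
}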

Handle asynchronous input synchronously

I am working on parsing a fairly complicated data stream generated by a usb device that emulates a keyboard. The easiest way for me to conceptualize and deal with the data would be if I had a method called something like GetNextInputCharacter, and I could do all the parsing in one go without having to do it one piece at a time as I receive input. Unfortunately I won't know how many bytes to expect in the stream until I have parsed into it significantly, and I would rather not wait until the end to parse it anyway.
Is there any mechanism or design pattern that can take asynchronous input from the key event and pass it to my parse method on demand? All I can think of is an IEnumerable that does a busy wait on a FIFO that the key event populates and yields them out one at a time. That seems like a bit of a hack, but maybe it would work. I just want a way for the parse routine to pretend like the input is already there and take it without knowing that it has to wait for the events.
How about parsing a Stream, and making the parser block until it has enough characters to make a sensible result? Then the async data from the USB device can just write to the stream. You'd probably have to write your own Stream implementation, but that isn't too hard.
This is a common enough pattern, by the way: when you use the built-in .NET serialization, the deserializers block on reading an input stream which may be coming over a network socket.
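As a sketch of that approach (the names here are invented for illustration), the event handler can push characters into a BlockingCollection and the parser can pull from it, blocking instead of busy-waiting:

using System.Collections.Concurrent;
using System.Collections.Generic;

public class KeyInputSource
{
    private readonly BlockingCollection<char> _queue = new BlockingCollection<char>();

    // Called from the USB/key event handler as characters arrive.
    public void Push(char c) { _queue.Add(c); }

    // Called by the parser; blocks until the next character is available.
    public char GetNextInputCharacter() { return _queue.Take(); }

    // Or let the parser treat the input as an ordinary (blocking) sequence.
    public IEnumerable<char> Characters() { return _queue.GetConsumingEnumerable(); }
}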
How about something like this:
...
var stream = //Set up stream
var data = from dataStream in StreamStuff(stream) select dataStream;
...
private IEnumerable<string> StreamStuff(Stream stream)
{
    var buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Do some stuff to your read value; here we decode it (Encoding is in System.Text)
        yield return Encoding.UTF8.GetString(buffer, 0, bytesRead);
    }
    stream.Close();
}

Stream design in C#

What would be the best way to design a packing / converting stream proxy in C#?
Suppose I have some input stream and I wish to make something similar to what boost::iostreams does.
So, for example, can I add a zlib packing proxy to the stream, so that when I access the contents of ZlibWrappedStream(initialStream), I receive the data from initialStream, but packed using zlib?
How can this be designed considering the fact that different proxies can be applied one after another and also considering the possibility of multithreaded packing?
.NET streams already allow the chaining that you're talking about. For example, if I want to gzip data and store to a file, I can write the following:
using (var fs = File.OpenWrite(outputFilename))
{
    using (var gz = new GZipStream(fs, CompressionMode.Compress))
    {
        // now, write to the GZip stream . . .
        gz.Write(buffer, 0, buffer.Length);
    }
}
You can make that chain arbitrarily long. For example, I often put a BufferedStream in front of the GZipStream to give gzip a larger compression window.
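A sketch of one way to read that longer chain, with writes buffered before they reach the GZipStream (the 64 KB size here is an arbitrary choice, not from the answer):

using (var fs = File.OpenWrite(outputFilename))
using (var gz = new GZipStream(fs, CompressionMode.Compress))
using (var bs = new BufferedStream(gz, 64 * 1024))
{
    // the writer sees bs; it hands larger blocks to gz, which compresses into fs
    bs.Write(buffer, 0, buffer.Length);
}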
Multithreaded packing shouldn't be a problem, as long as you confine all of your multithreaded operations to the class's internals. If you want multiple threads to be able to write to the stream concurrently, you'll have to create some kind of synchronization mechanism to prevent interleaving data in the input buffer. Again, if you limit that synchronization to the class internals, then there shouldn't be a problem. The same kind of thing applies to multithreaded unpacking and reading.

C#, is there such a thing as a "thread-safe" stream?

I am redirecting the output of a process into a streamreader which I read later. My problem is I am using multiple threads which SHOULD have separate instances of this stream. When I go to read this stream in, the threading fudges and starts executing oddly. Is there such a thing as making a thread-safe stream? EDIT: I put locks on the ReadToEnd on the streamreader, and the line where I did: reader = proc.StandardOutput;
There's a SynchronizedStream built into the framework; they just don't expose the class for you to look at or subclass, but you can turn any stream into a synchronized stream using
var syncStream = Stream.Synchronized(inStream);
You should pass the syncStream object around to each thread that needs it, and make sure you never try to access inStream elsewhere in code.
The SynchronizedStream just implements a monitor on all read/write operation to ensure that a thread has mutually exclusive access to the stream.
Edit:
It appears they also implement a SynchronizedReader/SynchronizedWriter in the framework:
var reader = TextReader.Synchronized(process.StandardOutput);
A 'thread-safe' stream doesn't really mean anything on its own. If the stream is somehow shared, you must define at what level synchronization/sharing can take place, in terms of the data packets (messages or records) and their allowed/required ordering.

how to deal with the position in a C# stream

The (entire) documentation for the position property on a stream says:
When overridden in a derived class, gets or sets the position within the current stream.
The Position property does not keep track of the number of bytes from the stream that have been consumed, skipped, or both.
That's it.
OK, so we're fairly clear on what it doesn't tell us, but I'd really like to know what it in fact does stand for. What is 'the position' for? Why would we want to alter or read it? If we change it - what happens?
In a practical example, I have a stream that periodically gets written to, and I have a thread that attempts to read from it (ideally ASAP).
From reading many SO issues, I reset the position field to zero to start my reading. Once this is done:
Does this affect where the writer to this stream is going to attempt to put the data? Do I need to keep track of the last write position myself? (ie if I set the position to zero to read, does the writer begin to overwrite everything from the first byte?)
If so, do I need a semaphore/lock around this 'position' field (subclassing, perhaps?) due to my two threads accessing it?
If I don't handle this property, does the writer just overflow the buffer?
Perhaps I don't understand the Stream itself - I'm regarding it as a FIFO pipe: shove data in at one end, and suck it out at the other.
If it's not like this, then do I have to keep copying the data past my last read (ie from position 0x84 on) back to the start of my buffer?
I've seriously tried to research all of this for quite some time - but I'm new to .NET. Perhaps the Streams have a long, proud (undocumented) history that everyone else implicitly understands. But for a newcomer, it's like reading the manual to your car, and finding out:
The accelerator pedal affects the volume of fuel and air sent to the fuel injectors. It does not affect the volume of the entertainment system, or the air pressure in any of the tires, if fitted.
Technically true, but seriously, what we want to know is that if you mash it to the floor, you go faster.
EDIT - Bigger Picture
I have data coming in either from a serial port, a socket, or a file, and have a thread that sits there waiting for new data, and writing it to one or more streams - all identical.
One of these streams I can access from a telnet session from another pc, and that all works fine.
The problem I'm having now is parsing the data in code in the same program (on another of the duplicated streams). I'm duplicating the data to a MemoryStream, and have a thread to sit and decipher the data, and pass it back up to the UI.
This thread does a dataStream.BeginRead() into its own buffer, which returns some(?) amount of data up to but not more than the count argument. After I've dealt with whatever I got back from the BeginRead, I copy the remaining data (from the end of my read point to the end of the stream) to the start of my buffer so it won't overflow.
At this point, since both the writing and reading are asynchronous, I don't know if I can change the position (since it's a 'cursor' - thanks Jon). Even if I send a message to the other thread to say that I've just read 28 bytes, or whatever - it won't know which 28 bytes they were, and won't know how to reset its cursor/position.
I haven't subclassed any streams - I've just created a MemoryStream, and passed that to the thread that duplicates the data out to whatever streams are needed.
This all feels too complex to be the right way of doing it - I'm just unable to find a simple example I can modify as needed..
How else do people deal with a long-term sporadic data stream that needs to be sent to some other task that isn't instantaneous to perform?
EDIT: Probable Solution
While trying to write a Stream wrapper around a queue due to information in the answers, I stumbled upon this post by Stephen Toub.
He has written a BlockingStream, and explains:
Most streams in the .NET Framework are not thread safe, meaning that multiple threads can't safely access an instance of the stream concurrently and most streams maintain a single position at which the next read or write will occur. BlockingStream, on the other hand, is thread safe, and, in a sense, it implicitly maintains two positions, though neither is exposed as a numerical value to the user of the type.
BlockingStream works by maintaining an internal queue of data buffers written to it. When data is written to the stream, the buffer written is enqueued. When data is read from the stream, a buffer is dequeued in a first-in-first-out (FIFO) order, and the data in it is handed back to the caller. In that sense, there is a position in the stream at which the next write will occur and a position at which the next read will occur.
This seems exactly what I was looking for - so thanks for the answers, guys; I only found this from your answers.
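That article's BlockingStream isn't reproduced here, but a minimal sketch of the same idea (invented names, backed by BlockingCollection rather than a hand-rolled queue) looks roughly like this:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public class QueueBackedStream : Stream
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private byte[] _current = new byte[0];
    private int _offset; // the reader's implicit position within the current buffer

    // Writer side: each Write enqueues its own copy of the data (the implicit write position).
    public override void Write(byte[] buffer, int offset, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        _queue.Add(copy);
    }

    // Call when the writer is finished so that readers see end-of-stream.
    public void CompleteWriting() { _queue.CompleteAdding(); }

    // Reader side: dequeues buffers in FIFO order, blocking until data arrives.
    public override int Read(byte[] buffer, int offset, int count)
    {
        while (_offset >= _current.Length)
        {
            byte[] next;
            if (!_queue.TryTake(out next, Timeout.Infinite)) return 0; // writer has completed
            _current = next;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}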
I think that you are expecting a little too much from the documentation. It does tell you exactly what everything does, but it doesn't tell you much about how to use it. If you are not familiar with streams, reading only the documentation will not give you enough information to actually understand how to use them.
Let's look at what the documentation says:
"When overridden in a derived class,
gets or sets the position within the
current stream."
This is "standard documentation speak" for saying that the property is intended for keeping track of the position in the stream, but that the Stream class itself doesn't provide the actual implementation of that. The implementation lies in classes that derive from the Stream class, like a FileStream or a MemoryStream. Each have their own system of maintaining the position, because they work against completely different back ends.
There can even be implementation of streams where the Position property doesn't make sense. You can use the CanSeek property to find out if a stream implementation supports a position.
"The Position property does not keep
track of the number of bytes from the
stream that have been consumed,
skipped, or both."
This means that the Position property represents an absolute position in the back end implementation; it's not just a counter of what's been read or written. The methods for reading and writing the stream use the position to keep track of where to read or write, not the other way around.
For a stream implementation that doesn't support a position, it could still have returned how many bytes have been read or written, but it doesn't. The Position property should reflect an actual place in the data, and if it can't do that it should throw a NotSupportedException.
Now, let's look at your case:
Using a StreamReader and a StreamWriter against the same stream is tricky, and mostly pointless. The stream only has one position, and that will be used for both reading and writing, so you would have to keep track of two separate positions. Also, you would have to flush the buffer after each read and write operation, so that there is nothing left in the buffers and the Position of the stream is up to date when you retrieve it. This means that the StreamReader and StreamWriter can't be used as intended, and only act as a wrapper around the stream.
If you are using the StreamReader and StreamWriter from different threads, you have to synchronise every operation. Two threads can never use the stream at the same time, so a read/write operation would have to do:
lock
set position of the stream from local copy
read/write
flush buffer
get position of the stream to local copy
end lock
A stream can be used as a FIFO buffer that way, but there are other ways that may be better suited for your needs. A Queue<T> for example works as an in-memory FIFO buffer.
The position is the "cursor" for both writing and reading. So yes, after resetting the Position property to 0, it will start overwriting existing data.
You should be careful when dealing with a stream from multiple threads in the first place, to be honest. It's not clear whether you've written a new Stream subclass, or whether you're just the client of an existing stream, but either way you need to be careful.
It's not clear what you mean by "If I don't handle this property" - what do you mean by "handle" here? Again, it would help if you were clearer on what you were doing.
A Stream may act like a pipe... it really depends on what you're doing with it. It's unclear what you mean by "do I have to keep copying the data past my last read" - and unclear what you mean by your buffer, too.
If you could give an idea of the bigger picture of what you're trying to achieve, that would really help.

How can I split (copy) a Stream in .NET?

Does anyone know where I can find a Stream splitter implementation?
I'm looking to take a Stream, and obtain two separate streams that can be independently read and closed without impacting each other. These streams should each return the same binary data that the original stream would. No need to implement Position or Seek and such... Forward only.
I'd prefer if it didn't just copy the whole stream into memory and serve it up multiple times, which would be fairly simple enough to implement myself.
Is there anything out there that could do this?
I have made a SplitStream available on github and NuGet.
It goes like this.
using (var inputSplitStream = new ReadableSplitStream(inputSourceStream))
using (var inputFileStream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputFileStream = File.OpenWrite("MyFileOnAnyFilestore.bin"))
using (var inputSha1Stream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputSha1Stream = SHA1.Create())
{
    inputSplitStream.StartReadAhead();

    Parallel.Invoke(
        () => {
            var bytes = outputSha1Stream.ComputeHash(inputSha1Stream);
            var checksumSha1 = string.Join("", bytes.Select(x => x.ToString("x2")));
        },
        () => {
            inputFileStream.CopyTo(outputFileStream);
        }
    );
}
I have not tested it on very large streams, but give it a try.
github: https://github.com/microknights/SplitStream
Not out of the box.
You'll need to buffer the data from the original stream in a FIFO manner, discarding only data which has been read by all "reader" streams.
I'd use:
A "management" object holding some sort of queue of byte[] holding the chunks to be buffered and reading additional data from the source stream if required
Some "reader" instances which known where and on what buffer they are reading, and which request the next chunk from the "management" and notify it when they don't use a chunk anymore, so that it may be removed from the queue
This could be tricky without risking keeping everything buffered in memory (if the streams are at BOF and EOF respectively).
I wonder whether it isn't easier to write the stream to disk, copy it, and have two streams reading from disk, with self-deletion built into the Close() (i.e. write your own Stream wrapper around FileStream).
The link below seems to be a valid implementation, called EchoStream:
http://www.codeproject.com/Articles/3922/EchoStream-An-Echo-Tee-Stream-for-NET
It's a very old implementation (2003), but it should provide some context.
found via Redirect writes to a file to a stream C#
You can't really do this without duplicating at least part of the source stream - mostly because it doesn't sound like you can control the rate at which they are consumed (multiple threads?). You could do something clever where one reads ahead of the other (and thereby making the copy only at that point), but the complexity of this sounds like it's not worth the trouble.
I do not think you will be able to find a generic implementation to do just that. A Stream is rather abstract: you don't know where the bytes are coming from. For instance, you don't know if it will support seeking, and you don't know the relative cost of operations. (The Stream might be an abstraction of reading data from a remote server, or even off a backup tape!)
If you are able to have a MemoryStream and store the contents once, you can create two separate streams using the same buffer; and they will behave as independent Streams but only use the memory once.
Otherwise, I think you are best off by creating a wrapper class that stores the bytes read from one stream, until they are also read by the second stream. That would give you the desired forward-only behaviour - but in worst case, you might risk storing all of the bytes in memory, if the second Stream is not read until the first Stream has completed reading all content.
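For the shared-buffer MemoryStream idea above, a rough sketch (sourceStream is whatever stream you have already obtained; it is an assumption for this example):

byte[] data;
using (var temp = new MemoryStream())
{
    sourceStream.CopyTo(temp);   // read the source once
    data = temp.ToArray();
}
var first  = new MemoryStream(data, false);   // writable: false
var second = new MemoryStream(data, false);
// first and second can now be read and closed independently; the bytes are stored only once.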
With the introduction of async / await, so long as all but one of your reading tasks are async, you should be able to process the same data twice using only a single OS thread.
What I think you want is a linked list of the data blocks you have seen so far. Then you can have multiple custom Stream instances that hold a pointer into this list. As blocks fall off the end of the list, they will be garbage collected. Reusing the memory immediately would require some other kind of circular list and reference counting. Doable, but more complicated.
When your custom Stream can answer a ReadAsync call from the cache, copy the data, advance the pointer down the list and return.
When your Stream has caught up to the end of the cache list, you want to issue a single ReadAsync to the underlying stream, without awaiting it, and cache the returned Task with the data block. So if any other Stream reader also catches up and tries to read more before this read completes, you can return the same Task object.
This way, both readers will hook their await continuation to the result of the same ReadAsync call. When the single read returns, both reading tasks will sequentially execute the next step of their process.
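A minimal sketch of that caching idea follows (the BlockCache name and 8 KB block size are invented for this example); readers that catch up share the same pending ReadAsync task instead of each issuing their own:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public class BlockCache
{
    private readonly Stream _source;
    private readonly List<byte[]> _blocks = new List<byte[]>(); // blocks read from the source so far
    private readonly object _gate = new object();
    private Task<byte[]> _pendingRead; // shared by any readers that have caught up with the cache

    public BlockCache(Stream source) { _source = source; }

    // Returns the block at 'index', or null once the source is exhausted.
    public async Task<byte[]> GetBlockAsync(int index)
    {
        while (true)
        {
            Task<byte[]> pending;
            lock (_gate)
            {
                if (index < _blocks.Count) return _blocks[index];     // already cached
                if (_pendingRead == null) _pendingRead = ReadNextBlockAsync();
                pending = _pendingRead;                               // share the in-flight read
            }
            byte[] block = await pending.ConfigureAwait(false);
            lock (_gate)
            {
                if (ReferenceEquals(pending, _pendingRead))
                {
                    _pendingRead = null;
                    if (block != null) _blocks.Add(block);
                }
            }
            if (block == null) return null; // end of the underlying stream
        }
    }

    private async Task<byte[]> ReadNextBlockAsync()
    {
        var buffer = new byte[8192];
        int n = await _source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
        if (n == 0) return null;
        Array.Resize(ref buffer, n);
        return buffer;
    }
}

Each per-reader Stream would then keep its own block index and offset and copy out of the cached blocks in its ReadAsync override, as described above.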
