Choice of direction in C# streams

Some streams in C# appear to have a "direction" in that they are meant to be used one way only. For some of them, such as FileStream and NetworkStream, that makes sense, but for others it does not.
For example, with a GZipStream you can either compress or decompress data by writing it to the stream, depending on constructor parameters. CryptoStream, on the other hand, forces the encrypted data to the far side: decryption is forced to be a read operation and encryption a write operation.
Especially when working with cryptographic implementations, it has been annoying to be forced to push the data in a specific direction.
Is there any specific design motivation for implementing some streams in one direction only?
Update: To clarify, what I'm looking for is an understanding of why some designs only use a single direction, not what the choice of direction is. Has anyone thought about this before and found an explanation, or maybe there is none?
Example: you are receiving a running stream of data that needs to be processed as soon as possible, so you want to write the bytes to the decoding stream as they are received.
With CryptoStream there is no natural relation between how many bytes you put into the MemoryStream and how many decrypted bytes you can read. Here you must take implementation-specific details such as block size into consideration.
GZipStream can handle this by changing the direction of the compression.

decryption is forced to a read operation and encryption is a write operation
Is there any specific design motivation for implementing some streams in one direction only?
Well, suppose it was the other way round. It sounds sensible to decrypt-while-writing to a file, but the data still has to come from somewhere.
Meaning you would need a Stream.CopyTo() and the occasional MemoryStream. And those are the tools you can use now as well.
The choice may have been slightly arbitrary but you need to pick a direction, and encrypt-while-writing seems (to me) the most natural.
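To make the Stream.CopyTo() plus MemoryStream combination concrete, here is a minimal sketch: encrypt while writing into a MemoryStream, then decrypt while reading and copy the result out. The class name CryptoRoundTrip and the use of AES with a caller-supplied key and IV are assumptions for illustration only; they are not from the question.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
public static class CryptoRoundTrip
{
    // Encrypt-while-writing: plaintext goes in through Write(),
    // ciphertext lands in the underlying MemoryStream.
    public static byte[] Encrypt(byte[] plain, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (var output = new MemoryStream())
        {
            using (ICryptoTransform encryptor = aes.CreateEncryptor(key, iv))
            using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
            {
                crypto.Write(plain, 0, plain.Length);
            } // disposing the CryptoStream flushes the final padded block

            // ToArray() still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decrypt-while-reading: CopyTo() pulls plaintext out of the CryptoStream
    // and pushes it into whatever destination stream you have.
    public static byte[] Decrypt(byte[] cipher, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (var input = new MemoryStream(cipher))
        using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
        using (var crypto = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
        using (var output = new MemoryStream())
        {
            crypto.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The destination passed to CopyTo() could just as well be a FileStream, which gives the effect of decrypting while writing to a file.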

If you think of the CryptoStream as a container for encrypted content, it becomes obvious that something you want to Read() out of it should be decrypted and something you Write() into it should be encrypted.

The stream direction is arbitrary to the implementer, and sometimes it doesn't match up with the direction we would like to use.
For example, an FTP file downloader only accepts a write stream, but you want to read the stream instead. Or, in reverse, the FTP uploader only accepts a read stream, but you want to write to the stream instead.
A stream is about moving data in chunks, so it can be adapted to move in either direction.
Here's an implementation:
https://github.com/djpai/StreamConduit

Related

Equivalent to MemoryStream that's full duplex/overlapped?

I'm doing a testbed client/server system (dotnet 4.0) that will eventually have two components communicating via streams across some transport medium, but at the moment has the two communicating via a single MemoryStream. Never used them before, and I made the assumption I could be writing and reading at the same time. However, because there's only one 'cursor' I can't be reading from the stream until it's finished writing and I can seek() back to zero.
The named pipe stuff supports full duplex operation, but only if I set one object up as the server and have the other connect to it, which is not something I want to do at this point.
I can get the result I want by creating a byte buffer and having two MemoryStream instances pointing at that buffer, but that falls over when I reach the end of the buffer and get an exception because the memory stream can't be expanded.
I could probably do this by creating a file instead of the array and having two FileStream instances, but that seems a somewhat messy way of doing it. And if left running would result in a full disk since nothing would be pruning the data that's been read.
What I'm after is a stream that doesn't support seek() or position, maintains separate read and write pointers, buffers data that's written to it and discards it sometime after it's been read. Feels like reinventing the wheel to roll my own. Surely such a thing is already around somewhere?
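For illustration, the kind of stream described above could be sketched roughly as follows: no Seek or Position, written bytes are buffered, and each byte is discarded as soon as it has been read. The class name QueueStream and the CompleteWriting() method are made up for this sketch; this is not an existing framework class, and it is one-directional, so a client/server pair would need one instance per direction.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical sketch: a non-seekable in-memory pipe with independent
// read and write ends. Not tuned for performance (one queue entry per byte).
public sealed class QueueStream : Stream
{
    private readonly Queue<byte> _buffer = new Queue<byte>();
    private readonly object _gate = new object();
    private bool _completed;

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        lock (_gate)
        {
            for (int i = 0; i < count; i++) _buffer.Enqueue(buffer[offset + i]);
            Monitor.PulseAll(_gate); // wake any reader waiting for data
        }
    }

    // Blocks until at least one byte is available or writing has completed.
    // Returns 0 only when the writer is done and the buffer is drained.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        lock (_gate)
        {
            while (_buffer.Count == 0 && !_completed) Monitor.Wait(_gate);
            int n = Math.Min(count, _buffer.Count);
            for (int i = 0; i < n; i++) buffer[offset + i] = _buffer.Dequeue();
            return n;
        }
    }

    // Signals end-of-stream to the reader.
    public void CompleteWriting()
    {
        lock (_gate)
        {
            _completed = true;
            Monitor.PulseAll(_gate);
        }
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

Because Read() blocks while the buffer is empty, the reader and writer are expected to run on different threads unless the data is written before it is read.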

File I/O best practice - byte[] or FileStream?

I'm currently working with a lot of different file types (txt, binary, office, etc). I typically use a byte[] or string to hold the file data in memory (while it is being written/parsed) and in order to read/write it into files I write the entire data using a FileStream after the data has been completely processed.
Should I be using a TextStream instead of a string while generating data for a text file?
Should I be using a FileStream instead of a byte[] while generating data for a binary file?
Would using streams give me better performance instead of calculating the entire data and outputting it in one go at the end?
Is it a general rule that File I/O should always use streams or is my approach fine in some cases?
The advantage of a byte[]/string vs a stream may be that the byte[]/string is in memory, and accessing it may be faster. If the file is very large, however, you may end up paging thus reducing performance. Another advantage of the byte[]/string approach is that the parsing may be a little easier (simply use File.ReadAllText, say).
If your parsing allows (particularly if you don't need to seek randomly), using a FileStream can be more efficient, especially if the file is rather large. Also, you can make use of the async/await features of C# 5 (.NET 4.5) to very easily read/write the file asynchronously and process chunks as you read them in.
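As a rough sketch of chunked asynchronous reading, the method below reads a file in fixed-size chunks with ReadAsync and only counts the bytes; real code would parse each chunk instead. The class name ChunkedReader and the 80 KB buffer size are arbitrary choices for this example.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class ChunkedReader
{
    public static async Task<long> CountBytesAsync(string path)
    {
        var buffer = new byte[81920];
        long total = 0;

        // useAsync: true asks the OS for asynchronous file I/O.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, buffer.Length, useAsync: true))
        {
            int read;
            // ReadAsync returns 0 at end of file; it may return fewer bytes
            // than requested, so always use the returned count.
            while ((read = await fs.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                total += read; // process buffer[0..read) here
            }
        }
        return total;
    }
}
```

Only one chunk is held in memory at a time, which is the main advantage over reading the whole file into a byte[] first.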
Personally, I'd probably just read the file into memory if I'm not too worried about performance, or the file is very small. Otherwise I'd consider using streams.
Ultimately I would say write some simple test programs and time the performance of each if you're worried about the performance differences; that should give you your best answer.
Apart from talking about the size of the data, another important question is the purpose of the data. Manipulation is easier to perform when working with strings and arrays. If both strings and arrays are equally convenient then an array of bytes would be preferred. Strings have to be interpreted which brings in complexity (Encoding, BOM etc) and therefore increases the likelihood of a bug. Use strings only for text. Binary data should always be handled by byte arrays or streams.
Streams should be considered each time you either don't have to perform any manipulation or the subjected data is very large or the subjected data is coming in very slowly. Streams are a natural way of processing data part by part whereas strings and arrays in general expect the data to be there in its entirety before processing it.
Working with streams will generally yield better performance, since it opens up the possibility of having different channels both reading and writing asynchronously.
while generating data for a text file
If the data is flushed to the file immediately, choose a StreamWriter over the FileStream. If not, use a StringBuilder.
while generating data for a binary file?
MemoryStream is a choice. Additionally, a BinaryWriter over the MemoryStream is preferred.
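A small sketch of the BinaryWriter-over-MemoryStream approach, assuming a made-up record layout of an int followed by a length-prefixed string. The class name BinaryRecord and the layout are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class BinaryRecord
{
    // Builds the binary data in memory; the resulting array can then be
    // written to a file in one go, e.g. with File.WriteAllBytes.
    public static byte[] Build(int id, string name)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms, Encoding.UTF8))
            {
                writer.Write(id);   // 4 bytes, little-endian
                writer.Write(name); // length prefix followed by UTF-8 bytes
            }
            // ToArray() still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    // Reads the fields back in the same order they were written.
    public static void Parse(byte[] data, out int id, out string name)
    {
        using (var ms = new MemoryStream(data))
        using (var reader = new BinaryReader(ms, Encoding.UTF8))
        {
            id = reader.ReadInt32();
            name = reader.ReadString();
        }
    }
}
```

Swapping the MemoryStream for a FileStream gives the write-as-you-go variant without changing the BinaryWriter calls.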

Write compressed data to NetworkStream

I am writing a simple client-server application, and looking at the MSDN docs, I came across some interesting stream types...
http://msdn.microsoft.com/en-us/library/system.io.compression.deflatestream.aspx
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx
Apparently, there is such a thing as a compressed stream! Naturally, this is very attractive, considering we are dealing with networking. However, most unfortunately, TcpClient.GetStream() only returns a NetworkStream -- not any form of compressed stream.
I was wondering if it is possible to wire a compressed stream to redirect to the NetworkStream, meaning I could write to the compressed stream and that stream would write its compressed output to my NetworkStream. I'd also need to know how to do the reverse: get a compressed stream to read from a NetworkStream.
On the side, which do you recommend I do -- Which offers the fastest compression, GZip or Deflate? And what is the difference in compression between the two?
These are wrapper streams.
You can create a GZipStream around any existing stream to read or write compressed data to the inner stream.
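A minimal sketch of the wrapping, written against the Stream base class so that the inner stream could be the NetworkStream from TcpClient.GetStream() or, for a quick local test, a MemoryStream. The class name GzipWrap is made up. One caveat to keep in mind: on a live connection CopyTo() may keep waiting for more data, so real code would likely need some message framing, which this sketch does not attempt.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only.
public static class GzipWrap
{
    // Writing to the GZipStream pushes compressed bytes into the inner stream.
    public static void WriteCompressed(Stream inner, byte[] payload)
    {
        // leaveOpen: true so that disposing the wrapper does not close the inner stream.
        using (var gzip = new GZipStream(inner, CompressionMode.Compress, leaveOpen: true))
        {
            gzip.Write(payload, 0, payload.Length);
        } // disposing flushes the remaining compressed data and the gzip trailer
    }

    // Reading from the GZipStream pulls compressed bytes from the inner stream
    // and hands back the decompressed data.
    public static byte[] ReadCompressed(Stream inner)
    {
        using (var gzip = new GZipStream(inner, CompressionMode.Decompress, leaveOpen: true))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }
}
```

Note the direction: in Compress mode the wrapper is written to, and in Decompress mode it is read from.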
Check out networkComms.net, an open source network communication library which includes the option of using many different types of compression when sending data. All you have to do is change the NetworkComms.DefaultCompressor property. You can choose from:
None
LZMA (Slow Speed, Best Compression)
GZip (Good Speed, Good Compression)
QuickLZ (Best Speed, Basic Compression)
In order to support these, plus any number of further compression methods, networkComms.net uses a basic NetworkStream at the base and just makes sure everything has been compressed before it reaches that point.

Queue<byte> vs. Stream

Is there a difference between a Queue and a Stream in C#?
The question should be: do they even have anything in common besides both offering some sort of interface to retrieve bytes from?
A Queue<byte> is just that: a FIFO queue of bytes. Its main functionality is to enqueue or dequeue a single byte value at a time - there is no random access. You usually use a queue as part of a data structure or algorithm (e.g. breadth-first search in a tree comes to mind). All data in a queue is stored in memory.
A stream, on the other hand, is an abstract representation of a byte stream usually obtained from a file, memory, network or other source - there is always an underlying source or target. This source doesn't have to be in memory, e.g. a network or file stream will allow you to read from or write to a file or network - so a stream is the main way to get bytes from A to B.
A queue has to store bytes, a stream doesn't. Big difference.
I'm not a C# (or even .NET) guy at all, and hopefully someone will provide a more detailed answer, but..
I think it's pretty clear that Queue and Stream are quite different. I understand why you'd ask, but even a quick peek at the API shows a lot of differences.
http://msdn.microsoft.com/en-us/library/system.io.stream.aspx
http://msdn.microsoft.com/en-us/library/system.collections.queue.aspx
Foremost among these differences is that a Queue is part of the Collections package and Stream is part of IO.
EDIT - typed Queue is probably more applicable, as shown by the other poster
http://msdn.microsoft.com/en-us/library/7977ey2c.aspx

What does stream mean? What are its characteristics?

C++ and C# both use the word stream to name many classes.
C++: iostream, istream, ostream, stringstream, ostream_iterator, istream_iterator...
C#: Stream, FileStream, MemoryStream, BufferedStream...
So it made me curious to know, what does stream mean?
What are the characteristics of a stream?
When can I use this term to name my classes?
Is this limited to file I/O classes only?
Interestingly, C doesn’t use this word anywhere, as far as I know.
Many data-structures (lists, collections, etc) act as containers - they hold a set of objects. But not a stream; if a list is a bucket, then a stream is a hose. You can pull data from a stream, or push data into a stream - but normally only once and only in one direction (there are exceptions of course). For example, TCP data over a network is a stream; you can send (or receive) chunks of data, but only in connection with the other computer, and usually only once - you can't rewind the Internet.
Streams can also manipulate data passing through them; compression streams, encryption streams, etc. But again - the underlying metaphor here is a hose of data. A file is also generally accessed (at some level) as a stream; you can access blocks of sequential data. Of course, most file systems also provide random access, so streams do offer things like Seek, Position, Length etc - but not all implementations support such. It has no meaning to seek some streams, or get the length of an open socket.
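As a small illustration of the point that not every stream supports Seek, Position or Length, generic code can check CanSeek before touching them. The helper below is only a sketch and the name StreamCapabilities is made up.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class StreamCapabilities
{
    // Returns true and the length for seekable streams (files, MemoryStream);
    // returns false for hose-like streams such as sockets or compression
    // streams, where asking for Length would throw NotSupportedException.
    public static bool TryGetLength(Stream stream, out long length)
    {
        if (stream.CanSeek)
        {
            length = stream.Length;
            return true;
        }

        length = -1;
        return false;
    }
}
```

A MemoryStream reports CanSeek as true, while a GZipStream wrapped around that same MemoryStream reports false.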
There's a couple different meanings. #1 is what you probably mean, but you might want to look at #2 too.
In the libraries like those you mentioned, a "stream" is just an abstraction for "binary data", that may or may not be random-access (as opposed to data that is continuously generated, such as if you were writing a stream that generated random data), or that may be stored anywhere (in RAM, on the hard disk, over a network, in the user's brain, etc.). They're useful because they let you avoid the details, and write generic code that doesn't care about the particular source of the stream.
As a more general computer science concept, a "stream" is sometimes thought of (loosely) as a "finite or infinite amount of data". The concept is a bit difficult to explain without an example, but in functional programming (like in Scheme), you can turn an object with state into a stateless object, by treating the object's history as a "stream" of changes. (The idea is that an object's state may change over time, but if you treat the object's entire life as a "stream" of changes, the stream as a whole never changes, and you can do functional programming with it.)
From I/O Streams (though in Java, the meaning is the same in C++ / C#)
An I/O Stream represents an input source or an output destination. A stream can represent many different kinds of sources and destinations, including disk files, devices, other programs, and memory arrays.
Streams support many different kinds of data, including simple bytes, primitive data types, localized characters, and objects. Some streams simply pass on data; others manipulate and transform the data in useful ways.
No matter how they work internally, all streams present the same simple model to programs that use them: A stream is a sequence of data. A program uses an input stream to read data from a source, one item at a time.
In C#, the streams you have mentioned derive from the abstract base class Stream. Each implementation of this base class has a specific purpose.
For example, FileStream supports read / write operations on a file, while MemoryStream works on an in-memory stream object. Unlike the FileStream and MemoryStream classes, the BufferedStream class allows the user to buffer the I/O.
In addition to the above classes, there are several other classes that derive from the Stream class. For a complete list, refer to the MSDN documentation on the Stream class.
Official terms and explanations aside, the word stream itself was taken from the "real life" stream - instead of water, data is transferred from one place to another.
Regarding the question you asked that still wasn't answered: you can give your own classes names that contain "stream", but the name will only have the correct meaning if you actually implement some sort of new stream.
In C, functions defined in <stdio.h> operate on streams.
Section 7.19.2 Streams in C99 discusses how they behave, though not what they are, apart from "an ordered sequence of characters".
The rationale gives more context in the corresponding section, starting with:
C inherited its notion of text streams from the UNIX environment in which it was born.
So that's where the concept comes from.
