I want to stream bytes directly from a TCP socket to a file on disk. In Java, it's possible to use NIO Channels, specifically SocketChannel and FileChannel. Quoting FileChannel#transferFrom(...):
This method is potentially much more efficient than a simple loop that
reads from the source channel and writes to this channel. Many
operating systems can transfer bytes directly from the source channel
into the filesystem cache without actually copying them.
Obviously I can just write the standard "copy loop" to read and write the bytes, and even take advantage of asynchronous I/O to minimize waiting. Will that be comparable to the platform native functionality that Java is taking advantage of, or is there another approach?
You can read the data with a NetworkStream, then use the Stream.CopyTo method (available since .NET 4) to write the data into a FileStream.
For a manual approach on versions before .NET 4, see: How do I save a stream to a file in C#?
For sending, there is a Socket.SendFile method that directly sends a file, utilizing the Win32 TransmitFile method. Unfortunately there is no corresponding Socket.ReceiveFile method or ReceiveFile Win32 API.
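The manual copy loop mentioned above is short in any language. Here is a minimal sketch in Python (chosen only for illustration, this is not .NET code); the function name, the 64 KB buffer size, and the use of a local socket pair in place of a real TCP connection are all assumptions made for the example. In .NET the same shape would be a loop over NetworkStream.Read and FileStream.Write.

```python
import os
import socket
import tempfile
import threading

CHUNK_SIZE = 64 * 1024  # assumed buffer size; tune for your workload


def copy_socket_to_file(sock, path, chunk_size=CHUNK_SIZE):
    """Read from sock until the peer closes, writing every chunk to path."""
    total = 0
    buf = bytearray(chunk_size)
    view = memoryview(buf)  # lets us write slices without copying
    with open(path, "wb") as out:
        while True:
            n = sock.recv_into(view)
            if n == 0:  # zero bytes means the peer closed the connection
                break
            out.write(view[:n])
            total += n
    return total


def demo():
    """Send random bytes through a local socket pair and copy them to a temp file."""
    sender, receiver = socket.socketpair()
    payload = os.urandom(200_000)

    def send_all():
        with sender:  # closing the sender signals end-of-stream to the reader
            sender.sendall(payload)

    worker = threading.Thread(target=send_all)
    worker.start()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with receiver:
            copied = copy_socket_to_file(receiver, path)
        worker.join()
        with open(path, "rb") as f:
            return copied, f.read() == payload
    finally:
        os.remove(path)
```

Reusing one buffer keeps allocations out of the loop; whether this matches a kernel-assisted transfer depends on the platform and workload, so it is worth measuring.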
I'm new to .NET and still confused about BinaryFormatter and FileStream. From everything I've read, they seem to do the same or a similar thing. For example, BinaryFormatter.Serialize takes both a FileStream and the object as parameters, while I thought it was FileStream's job to transfer the object to the file. Can someone explain how they work together and what the difference between the two is?
Streams represent raw data that can be accessed sequentially. They are used (directly or indirectly) whenever you input or output. There are different kinds of streams. For example:
NetworkStream and FileStream read and write data from a network port and disk without performing any transformations.
GZipStream or CryptoStream decorate an underlying stream by adding compression and encryption.
BinaryFormatter requires a Stream to write to or read from. But its responsibility is very different: it's used to convert .NET objects to a sequence of bytes that can be saved or transmitted through a network. The concrete medium and any additional transformations are determined by the type of stream you use.
All streams inherit from the Stream class and share the same interface which is very convenient. Classes like BinaryFormatter can rely on this shared interface without knowing the specifics of particular implementation.
Once again, BinaryFormatter is for converting an object to and from a sequence of bytes.
Streams are for reading and writing these bytes to a particular medium.
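The same split between "serializer" and "stream" can be sketched in Python (used here only as an illustration, not as .NET code). In this sketch pickle stands in for BinaryFormatter, io.BytesIO for MemoryStream, and gzip.GzipFile for a decorating stream such as GZipStream; the helper names save and load are made up for the example.

```python
import gzip
import io
import pickle


def save(obj, stream):
    """Serialize obj into any writable binary stream; the stream decides where bytes go."""
    pickle.dump(obj, stream)


def load(stream):
    """Rebuild an object from any readable binary stream."""
    return pickle.load(stream)


record = {"id": 7, "tags": ["a", "b"]}

# Plain in-memory stream: bytes are stored untransformed.
memory = io.BytesIO()
save(record, memory)
memory.seek(0)
from_memory = load(memory)

# Same serializer, but the stream is decorated with compression.
compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode="wb") as gz:
    save(record, gz)
compressed.seek(0)
with gzip.GzipFile(fileobj=compressed, mode="rb") as gz:
    from_gzip = load(gz)
```

The serializer code is identical in both cases; only the stream passed in changes.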
C++ and C# both use the word stream to name many classes.
C++: iostream, istream, ostream, stringstream, ostream_iterator, istream_iterator...
C#: Stream, FileStream, MemoryStream, BufferedStream...
So it made me curious to know, what does stream mean?
What are the characteristics of a stream?
When can I use this term to name my classes?
Is this limited to file I/O classes only?
Interestingly, C doesn’t use this word anywhere, as far as I know.
Many data-structures (lists, collections, etc) act as containers - they hold a set of objects. But not a stream; if a list is a bucket, then a stream is a hose. You can pull data from a stream, or push data into a stream - but normally only once and only in one direction (there are exceptions of course). For example, TCP data over a network is a stream; you can send (or receive) chunks of data, but only in connection with the other computer, and usually only once - you can't rewind the Internet.
Streams can also manipulate data passing through them; compression streams, encryption streams, etc. But again - the underlying metaphor here is a hose of data. A file is also generally accessed (at some level) as a stream; you can access blocks of sequential data. Of course, most file systems also provide random access, so streams do offer things like Seek, Position, Length etc - but not all implementations support them. It makes no sense to seek on some streams, or to get the length of an open socket.
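The difference between a seekable and a non-seekable stream is easy to observe. A small Python sketch (for illustration only; the describe helper is made up, and a local socket pair stands in for a TCP connection):

```python
import io
import socket


def describe(stream):
    """Report which capabilities a binary stream claims to support."""
    return {"seekable": stream.seekable(), "readable": stream.readable()}


# An in-memory stream behaves like a file: you can jump around in it.
memory_stream = io.BytesIO(b"hello world")
memory_info = describe(memory_stream)
memory_stream.seek(6)
tail = memory_stream.read()

# A socket-backed stream is a hose: data comes out once, in order.
left, right = socket.socketpair()
with left, right:
    left.sendall(b"hello world")
    net_stream = right.makefile("rb")
    net_info = describe(net_stream)
    first = net_stream.read(5)
    try:
        net_stream.seek(0)  # attempt to rewind
        rewound = True
    except OSError:  # io.UnsupportedOperation is a subclass of OSError
        rewound = False
    net_stream.close()
```

Generic code that accepts "a stream" should check for seek support before relying on it.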
There are a couple of different meanings. #1 is what you probably mean, but you might want to look at #2 too.
In the libraries like those you mentioned, a "stream" is just an abstraction for "binary data", that may or may not be random-access (as opposed to data that is continuously generated, such as if you were writing a stream that generated random data), or that may be stored anywhere (in RAM, on the hard disk, over a network, in the user's brain, etc.). They're useful because they let you avoid the details, and write generic code that doesn't care about the particular source of the stream.
As a more general computer science concept, a "stream" is sometimes thought of (loosely) as a "finite or infinite amount of data". The concept is a bit difficult to explain without an example, but in functional programming (like in Scheme), you can turn an object with state into a stateless object by treating the object's history as a "stream" of changes. (The idea is that an object's state may change over time, but if you treat the object's entire life as a "stream" of changes, the stream as a whole never changes, and you can do functional programming with it.)
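To make meaning #2 a little more concrete, here is a rough Python sketch (the answer mentions Scheme; Python generators are used here only as an approximation). The function names are invented for the example. The "object" is a running total, and instead of mutating it, each state is derived from an unbounded stream of changes.

```python
import itertools


def counter_changes():
    """An unbounded stream of changes: +1, +2, +3, ..."""
    return itertools.count(1)


def states(initial, changes):
    """Lazily derive every successive state from an initial value and a stream of changes."""
    return itertools.accumulate(changes, initial=initial)


# Nothing is computed until values are pulled, so the infinite stream is safe to hold.
first_five = list(itertools.islice(states(0, counter_changes()), 5))
```

No variable is ever reassigned here; the history of the total is itself the value being worked with.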
From I/O Streams (this is about Java, but the meaning is the same in C++ and C#):
An I/O Stream represents an input source or an output destination. A stream can represent many different kinds of sources and destinations, including disk files, devices, other programs, and memory arrays.
Streams support many different kinds of data, including simple bytes, primitive data types, localized characters, and objects. Some streams simply pass on data; others manipulate and transform the data in useful ways.
No matter how they work internally, all streams present the same simple model to programs that use them: A stream is a sequence of data. A program uses an input stream to read data from a source, one item at a time.
In C#, the streams you have mentioned derive from the abstract base class Stream. Each implementation of this base class has a specific purpose.
For example, FileStream supports read / write operations on a file, while MemoryStream works on an in-memory buffer. Unlike FileStream and MemoryStream, the BufferedStream class wraps another stream and adds a buffering layer to its I/O.
In addition to the above classes, there are several other classes that derive from the Stream class. For a complete list, refer to the MSDN documentation on the Stream class.
Official terms and explanations aside, the word stream itself was taken from the "real life" stream - instead of water, data is transferred from one place to another.
Regarding the question you asked that still wasn't answered: you can give your own classes names that contain "stream", but the name will only have the correct meaning if you implement some sort of new stream.
In C, functions defined in <stdio.h> operate on streams.
Section 7.19.2 Streams in C99 discusses how they behave, though not what they are, apart from "an ordered sequence of characters".
The rationale gives more context in the corresponding section, starting with:
C inherited its notion of text streams from the UNIX environment in which it was born.
So that's where the concept comes from.
Some streams in C# appear to have a "direction" in that they are meant to be used one way. For some of them, such as FileStream and NetworkStream, that makes sense, but for others it does not.
For example, with a GZipStream you can either compress or decompress data by writing it to the stream, depending on constructor parameters. CryptoStream, on the other hand, forces the encrypted data to the far side: decryption is forced to be a read operation and encryption a write operation.
Especially when working with cryptographic implementations, it has been annoying to be forced to push the data in a specific direction.
Is there any specific design motivation for implementing some streams in one direction only?
Update: To clarify, what I'm looking for is an understanding of why some designs only use a single direction, not what the choice of direction is. Has anyone thought about this before and found an explanation, or maybe there is none?
Consider receiving a running stream of data that needs to be processed as soon as possible: you want to write the bytes to the decoding stream as they are received.
With CryptoStream there is no natural relation between how many bytes you put into the MemoryStream and how many decrypted bytes you can read. Here you must take implementation-specific details such as block size into consideration.
GZipStream can handle this by changing the direction of the compression.
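The two directions can be contrasted in a short Python sketch (illustration only, not the .NET API; the function name and the 7-byte chunk size are arbitrary choices for the example). The "push" version feeds compressed bytes into a decoder as they arrive and gets decoded output back immediately, which is the usage the question is asking for. The "pull" version wraps the source and decodes when the consumer reads.

```python
import gzip
import io
import zlib


def decode_as_received(chunks):
    """Push model: feed gzip bytes in as they arrive, yield decoded bytes as soon as possible."""
    decoder = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # | 16 selects the gzip wrapper
    for chunk in chunks:
        out = decoder.decompress(chunk)
        if out:  # small chunks may not produce output yet
            yield out
    tail = decoder.flush()
    if tail:
        yield tail


original = b"streaming payload " * 1000
wire = gzip.compress(original)

# Simulate network arrival in small, arbitrary pieces.
arrivals = [wire[i:i + 7] for i in range(0, len(wire), 7)]
pushed = b"".join(decode_as_received(arrivals))

# Pull model: the consumer reads decoded bytes out of a wrapping stream.
pulled = gzip.GzipFile(fileobj=io.BytesIO(wire), mode="rb").read()
```

Both produce the same bytes; the difference is only in which side drives the transfer.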
decryption is forced to a read operation and encryption is a write operation
Is there any specific design motivation for implementing some streams in one direction only?
Well, suppose it was the other way round. It sounds sensible to decrypt-while-writing to a file, but the data still has to come from somewhere.
Meaning you would need a Stream.CopyTo() and the occasional MemoryStream. And those are the tools you can use now as well.
The choice may have been slightly arbitrary but you need to pick a direction, and encrypt-while-writing seems (to me) the most natural.
If you think of the CryptoStream as a container for encrypted content, it becomes obvious that anything you Read() out of it should be decrypted and anything you Write() into it should be encrypted.
The stream direction is arbitrary to the implementer and sometimes it doesn't match up with the direction we would like to use.
For example, an FTP file downloader only accepts a write stream, but you want to read from the stream instead. Or, in reverse, the FTP uploader only accepts a read stream, but you want to write to the stream instead.
A stream is about moving data in chunks, so it can be adapted to move in either direction.
Here's an implementation:
https://github.com/djpai/StreamConduit
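Independently of the linked project (whose API is not reproduced here), the general idea of turning a "give me something to write into" API into something you can read from can be sketched with a pipe and a background thread. This Python sketch is only an illustration; producer_that_needs_a_writer is a hypothetical stand-in for something like the FTP downloader described above, and as_readable is a made-up helper name.

```python
import os
import threading


def producer_that_needs_a_writer(out):
    """Hypothetical stand-in for an API that only accepts a writable stream."""
    for i in range(3):
        out.write(f"line {i}\n".encode())


def as_readable(write_into):
    """Run write_into(writer) on a background thread and return the read end of a pipe."""
    read_fd, write_fd = os.pipe()

    def run():
        # Closing the writer on exit is what delivers end-of-stream to the reader.
        with os.fdopen(write_fd, "wb") as writer:
            write_into(writer)

    threading.Thread(target=run, daemon=True).start()
    return os.fdopen(read_fd, "rb")


with as_readable(producer_that_needs_a_writer) as reader:
    received = reader.read()
```

The pipe provides the chunked hand-off between the two sides, so neither has to buffer the whole payload.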
I'm implementing Deflate and GZip compression for web content. The .NET Framework DeflateStream performs very well (it doesn't compress as well as SharpZipLib, but it is much faster). Unfortunately it (and all other libs I know of) lacks a function to write precompressed data, like stream.WritePrecompressed(byte[] buffer).
With this function it would be possible to insert precompressed blocks into the stream. This could reduce the CPU load for compressing this part and increase the total throughput of the web server.
Is there any managed lib capable of doing this? Or is there any good starting point beyond ZLIB.NET from ComponentAce to do this?
Another approach is to flush the deflater stream (and possibly also close it), to guarantee that all buffered compressed data is written to the output stream, and then simply write your precompressed data to the underlying output stream, then re-open the deflater stream on top of your output stream again.
IIRC, #ZipLib allows you to set the compression level. Have you tried flushing the stream, dropping the level to 0, sending the already compressed data, and then raising the compression level again?
If you are only looking at doing this for performance reasons then this might be an acceptable solution.
Yes, you can insert precompressed blocks into a zlib stream. Start with the zpipe.c example in the zlib source. Only at the point where you want to insert your precompressed block, replace Z_NO_FLUSH with Z_FULL_FLUSH (don't use Z_FULL_FLUSH elsewhere, because the compression ratio will suffer).
Now the compressed output is byte aligned and the last deflate block is closed. Full flush means that the next block past the precompressed block cannot contain any back references.
Append your precompressed block to the output stream (e.g. memcpy). Advance strm.next_out to the next empty byte. Continue with deflate where you left off.
flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
ret = deflate(&strm, flush);
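The same sequence (full flush, append the precompressed bytes, continue deflating) can be tried out quickly with Python's zlib binding, which exposes the same flush modes. This is a sketch under two assumptions that go beyond the answer: it uses raw deflate (negative wbits), because with a zlib or gzip wrapper the trailing checksum is computed over the uncompressed data and would also have to account for the inserted bytes; and the precompressed fragment is itself produced with Z_FULL_FLUSH so that it ends byte-aligned without a final block. The function names and the ("live", ...) / ("pre", ...) tagging are invented for the example.

```python
import zlib

RAW_DEFLATE = -15  # negative wbits: raw deflate, no header and no checksum trailer


def precompress(data, level=9):
    """Compress data once into a byte-aligned fragment that does not terminate the stream."""
    c = zlib.compressobj(level=level, wbits=RAW_DEFLATE)
    return c.compress(data) + c.flush(zlib.Z_FULL_FLUSH)


def build_stream(parts):
    """Assemble one deflate stream from ("live", bytes) and ("pre", fragment) parts."""
    c = zlib.compressobj(wbits=RAW_DEFLATE)
    out = []
    for kind, payload in parts:
        if kind == "live":
            out.append(c.compress(payload))  # compressed on the fly
        else:
            # Close the current block and byte-align, then splice the fragment in as-is.
            out.append(c.flush(zlib.Z_FULL_FLUSH))
            out.append(payload)
    out.append(c.flush(zlib.Z_FINISH))  # emits the final block
    return b"".join(out)


header = b"<html><body>"
banner = b"<div>static banner</div>" * 50
footer = b"</body></html>"

fragment = precompress(banner)  # done once, reusable across responses
stream = build_stream([("live", header), ("pre", fragment), ("live", footer)])
restored = zlib.decompress(stream, wbits=RAW_DEFLATE)
```

Because the fragment was compressed from a fresh state, its back references never reach before its own start, so it can be spliced after any preceding data.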