Using named pipes asynchronously with StreamWriter - c#

I am trying to send a string over a named pipe using StreamWriter, but the StreamWriter class only offers synchronous operations. I can use the BeginWrite method of the NamedPipeServerStream class, but I wonder why there are no writer classes that allow asynchronous operations. Am I missing something obvious?

It would be significantly more complicated than for the raw streams. For the raw streams, any amount of data might come in asynchronously and the system just passes the buffer to you. A reader has to deal with character encoding, which may turn several bytes of raw data into a single Unicode character. Not that this would be impossible; the framework libraries just don't take it that far, so you'll need to do this work yourself.
(Depending on your needs, creating another thread and performing the operations synchronously on it might make it easier to write your program. Note that scalability will generally be higher when you use Begin/End async.)
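A minimal sketch of "doing the work yourself" for the writing side: encode the string to bytes first, then hand the bytes to the stream's asynchronous write. The class and method names here are hypothetical, and the example assumes UTF-8 with a newline as the message delimiter. It uses Stream.WriteAsync, which needs .NET Framework 4.5 or later (on those versions StreamWriter also has its own WriteAsync); on older versions the same bytes could be passed to BeginWrite/EndWrite.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class PipeTextWriter
{
    // Hypothetical helper: does the character encoding up front,
    // then writes the resulting bytes without blocking the caller.
    public static Task WriteLineAsync(Stream stream, string text)
    {
        // Assumption: UTF-8 text, one message per line.
        byte[] payload = Encoding.UTF8.GetBytes(text + "\n");
        return stream.WriteAsync(payload, 0, payload.Length);
    }
}
```

With a NamedPipeServerStream this would be called as PipeTextWriter.WriteLineAsync(pipe, "hello"). For the pipe I/O to be asynchronous at the OS level, the pipe probably has to be created with PipeOptions.Asynchronous.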

Related

Backwards compatible replacement for C# 8.0's async streams

I hope the question title does not become imprecise, but it may happen that a direct replacement isn't available and a code restructuring becomes inevitable.
My task is to stream audio frames from HTTP, pipe them through ffmpeg and then shove them into some audio buffer.
Now a classical approach would probably involve multiple threads and lots of garbage collection, which I want to avoid.
My modern attempt was using async IAsyncEnumerable<Memory<byte>>, where I'd basically read content from HTTP into a fixed-size byte array, that is allocated once.
Then I'd yield return the Memory struct ("pointer"), which would cause the caller to immediately consume and transform that content via ffmpeg.
With that, I'd stay in lock-step defined by the chunk reads from the HttpClient. That way, the whole processing would happen on one thread and I would only have one big byte array as data store with a long lifetime.
The problem with this is that Unity's Mono version doesn't support the C# 8.0 feature of async streams (i.e. awaiting the async enumerable). So I need to come up with a replacement.
I thought about using System.Threading.Channels, but those already have a few caveats in the way the control flow is handled: With Channels, I cannot guarantee that the written Memory<T> is immediately read. As such it can happen that http is overwriting the backing buffer before the other end has read content. This would mean copying lots of data, causing garbage.
An old-school alternative would be to maintain some kind of ring buffer, where I have a write pointer and a read pointer that are moved whenever each end writes or reads. Hand-rolling that felt dumb though; maybe there is a roughly equivalent and elegant API?
Also would you rather just have two Threads and have them busy wait? Or can I maybe just accept the garbage collector pressure and use some regular queue structure, that potentially even uses notify/wait to wake up waiting threads, so they don't have to SpinWait/busy wait?
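For reference, the ring buffer idea mentioned above could look roughly like this. It is only a sketch under assumptions: the class name is made up, the backing array is allocated once, and it is NOT thread-safe, so a producer and consumer on different threads would still need a lock or some signalling around it.

```csharp
using System;

// Hypothetical fixed-size byte ring buffer with separate read and write positions.
// Not thread-safe: callers must synchronise access themselves.
public sealed class ByteRingBuffer
{
    private readonly byte[] _buffer;
    private int _readPos;
    private int _writePos;
    private int _count;

    public ByteRingBuffer(int capacity)
    {
        _buffer = new byte[capacity]; // allocated once, reused for the whole lifetime
    }

    public int Count { get { return _count; } }

    // Copies as many bytes as currently fit; returns how many were accepted.
    public int Write(byte[] source, int offset, int length)
    {
        int toWrite = Math.Min(length, _buffer.Length - _count);
        for (int i = 0; i < toWrite; i++)
        {
            _buffer[_writePos] = source[offset + i];
            _writePos = (_writePos + 1) % _buffer.Length; // wrap around at the end
        }
        _count += toWrite;
        return toWrite;
    }

    // Copies up to length unread bytes out; returns how many were delivered.
    public int Read(byte[] destination, int offset, int length)
    {
        int toRead = Math.Min(length, _count);
        for (int i = 0; i < toRead; i++)
        {
            destination[offset + i] = _buffer[_readPos];
            _readPos = (_readPos + 1) % _buffer.Length;
        }
        _count -= toRead;
        return toRead;
    }
}
```

Because Write refuses bytes that do not fit, the writer can never overwrite data the reader has not consumed yet, which is the guarantee that was missing with Channels.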

Is it thread safe to use more than one Stream.BeginWrite in parallel?

I have a file that I'm going to fill up, so I thought it might be better to do it simultaneously.
Notes:
I get the file from multiple computers simultaneously.
I set the position every time before calling BeginWrite. -> Must I lock the stream each time before using it?
Is this a good solution? Do you have a better one?
By the way, what does Stream.Flush() do?
Thanks.
No, that would be conceptually wrong. Stream (I assume you mean the System.IO.Stream class) is an abstract class. When you instantiate an object, you are using one of its many child classes.
Assuming anything about the child classes is the wrong approach because:
a) Somebody might come after you to make modifications to your code and not see what the actual child class implementation does.
b) Less likely, but the implementation can change. For example, what if someone runs your code on the Mono framework?
If you are using FileStream class, consider creating two (or more) FileStream objects over the same underlying file with FileShare parameter set to Write. This way you specify that there might be simultaneous writing, but each stream has its own location pointer.
Update: Only now I saw your comment "each computer send me a part with start index, end index and byte[]". Actually, multiple FileStreams should work OK for this scenario.
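void DataReceived(int start, byte[] data)
{
    // Each call opens its own FileStream, so every writer has its own position pointer.
    // FileShare.ReadWrite allows other streams to have the same file open at the same time.
    using (var f = new System.IO.FileStream("file.dat", System.IO.FileMode.Open, System.IO.FileAccess.Write, System.IO.FileShare.ReadWrite))
    {
        f.Seek(start, System.IO.SeekOrigin.Begin);
        // The second argument is the offset into data, not the position in the file.
        f.Write(data, 0, data.Length);
    }
}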
This is unsafe in principle: even if your stream were thread-safe, you would still have to set the position and write as two separate, non-atomic steps.
The native Windows file APIs support this, .NET doesn't. Windows is perfectly capable of concurrent IO to the same file (how would SQL Server work if Windows didn't support this?).
I suggest you just use one writing FileStream per thread.
It's pointless to try to do several write operations to the same stream at the same time.
The underlying system can only write to one position in the file at a time, so even if the asynchronous write method supported multithreading, the writes would still be blocked.
Just do regular writes to the file, and use locking so that only one thread at a time writes to the file.
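A rough sketch of that locking approach, assuming a single shared stream and a hypothetical wrapper class. The point is that Seek and Write happen under the same lock, so no other thread can move the position in between:

```csharp
using System.IO;

// Hypothetical wrapper: serialises "set position + write" on one shared stream.
public sealed class LockedFileWriter
{
    private readonly object _gate = new object();
    private readonly Stream _stream;

    public LockedFileWriter(Stream stream)
    {
        _stream = stream;
    }

    public void WriteAt(long position, byte[] data)
    {
        lock (_gate)
        {
            // Both steps are inside the lock, so together they behave atomically
            // with respect to other callers of WriteAt.
            _stream.Seek(position, SeekOrigin.Begin);
            _stream.Write(data, 0, data.Length);
        }
    }
}
```

Each receiving thread would then call WriteAt(start, data) on the same LockedFileWriter instance instead of touching the stream directly.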

.NET equivalent of Java FileChannel?

I want to stream bytes directly from a TCP socket to a file on disk. In Java, it's possible to use NIO Channels, specifically SocketChannel and FileChannel. Quoting FileChannel#transferFrom(...):
This method is potentially much more efficient than a simple loop that
reads from the source channel and writes to this channel. Many
operating systems can transfer bytes directly from the source channel
into the filesystem cache without actually copying them.
Obviously I can just write the standard "copy loop" to read and write the bytes, and even take advantage of asynchronous I/O to minimize waiting. Will that be comparable to the platform native functionality that Java is taking advantage of, or is there another approach?
You can read the data with a NetworkStream, then use the Stream.CopyTo method (available since .NET 4) to write the data into a FileStream.
For a manual approach before .NET 4, see: How do I save a stream to a file in C#?
For sending, there is a Socket.SendFile method that directly sends a file, utilizing the Win32 TransmitFile method. Unfortunately there is no corresponding Socket.ReceiveFile method or ReceiveFile Win32 API.
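As a sketch of the CopyTo suggestion (the helper name and the path parameter are made up for illustration): the source would be the NetworkStream of the connected socket, but any readable Stream works. As far as I know CopyTo is itself a buffered read/write loop, so this is about convenience rather than a zero-copy transfer like FileChannel#transferFrom.

```csharp
using System.IO;

public static class StreamToFile
{
    // Hypothetical helper: drains the source stream into a new file at path.
    // In the scenario above, source would be a NetworkStream.
    public static void Save(Stream source, string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            // Reads from source until it reports end of stream, writing each chunk to the file.
            source.CopyTo(file);
        }
    }
}
```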

What does stream mean? What are its characteristics?

C++ and C# both use the word stream to name many classes.
C++: iostream, istream, ostream, stringstream, ostream_iterator, istream_iterator...
C#: Stream, FileStream, MemoryStream, BufferedStream...
So it made me curious to know, what does stream mean?
What are the characteristics of a stream?
When can I use this term to name my classes?
Is this limited to file I/O classes only?
Interestingly, C doesn’t use this word anywhere, as far as I know.
Many data-structures (lists, collections, etc) act as containers - they hold a set of objects. But not a stream; if a list is a bucket, then a stream is a hose. You can pull data from a stream, or push data into a stream - but normally only once and only in one direction (there are exceptions of course). For example, TCP data over a network is a stream; you can send (or receive) chunks of data, but only in connection with the other computer, and usually only once - you can't rewind the Internet.
Streams can also manipulate data passing through them; compression streams, encryption streams, etc. But again - the underlying metaphor here is a hose of data. A file is also generally accessed (at some level) as a stream; you can access blocks of sequential data. Of course, most file systems also provide random access, so streams do offer things like Seek, Position, Length etc - but not all implementations support such. It has no meaning to seek some streams, or get the length of an open socket.
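To make the hose idea a bit more concrete in C#, here is a small sketch (class and method names are hypothetical). CountBytes works on any Stream by pulling data through once, while Describe only asks for Length when the stream says it can seek:

```csharp
using System.IO;

public static class StreamProbe
{
    // Length is only meaningful for seekable streams (files, memory),
    // so check CanSeek before asking for it.
    public static string Describe(Stream stream)
    {
        if (stream.CanSeek)
            return "seekable, length " + stream.Length;
        return "forward-only, length unknown";
    }

    // Works for any readable stream: data is pulled through once, in order.
    public static long CountBytes(Stream stream)
    {
        var buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            total += read;
        return total;
    }
}
```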
There's a couple different meanings. #1 is what you probably mean, but you might want to look at #2 too.
In the libraries like those you mentioned, a "stream" is just an abstraction for "binary data", that may or may not be random-access (as opposed to data that is continuously generated, such as if you were writing a stream that generated random data), or that may be stored anywhere (in RAM, on the hard disk, over a network, in the user's brain, etc.). They're useful because they let you avoid the details, and write generic code that doesn't care about the particular source of the stream.
As a more general computer science concept, a "stream" is sometimes thought of (loosely) as a "finite or infinite amount of data". The concept is a bit difficult to explain without an example, but in functional programming (like in Scheme), you can turn an object with state into a stateless object by treating the object's history as a "stream" of changes. (The idea is that an object's state may change over time, but if you treat the object's entire life as a "stream" of changes, the stream as a whole never changes, and you can do functional programming with it.)
From I/O Streams (this is about Java, but the meaning is the same in C++ / C#):
An I/O Stream represents an input
source or an output destination. A
stream can represent many different
kinds of sources and destinations,
including disk files, devices, other
programs, and memory arrays.
Streams support many different kinds
of data, including simple bytes,
primitive data types, localized
characters, and objects. Some streams
simply pass on data; others manipulate
and transform the data in useful ways.
No matter how they work internally,
all streams present the same simple
model to programs that use them: A
stream is a sequence of data. A
program uses an input stream to read
data from a source, one item at a
time.
In C#, the streams you have mentioned derive from the abstract base class Stream. Each implementation of this base class has a specific purpose.
For example, FileStream supports read / write operations on a file, while MemoryStream works on an in-memory buffer. BufferedStream, by contrast, wraps another stream and adds a buffering layer to its I/O.
In addition to the above classes, there are several other classes that derive from Stream. For a complete list, refer to the MSDN documentation on the Stream class.
Official terms and explanations aside, the word stream itself was taken from the "real life" stream - instead of water, data is transferred from one place to another.
Regarding the part of your question that hasn't been answered yet: you can give your own classes names that contain "stream", but the name will only be meaningful if the class actually implements some new kind of stream.
In C, the functions defined in <stdio.h> operate on streams.
Section 7.19.2 Streams in C99 discusses how they behave, though not what they are, apart from "an ordered sequence of characters".
The rationale gives more context in the corresponding section, starting with:
C inherited its notion of text streams from the UNIX environment in which it was born.
So that's where the concept comes from.

Multithreaded compression in C#

Is there a library in .NET that does multithreaded compression of a stream? I'm thinking of something like the built-in System.IO.Compression.GZipStream, but using multiple threads to perform the work (and thereby utilizing all the CPU cores).
I know that, for example 7-zip compresses using multiple threads, but the C# SDK that they've released doesn't seem to do that.
I think your best bet, if you are using non-parallelized algorithms, is to split the data stream at equal intervals yourself and launch threads to compress each part separately in parallel. Afterwards a single thread concatenates the parts into a single stream (you can make a stream class that continues reading from the next stream when the current one ends).
You may wish to take a look at SharpZipLib which is somewhat better than the intrinsic compression streams in .NET.
EDIT: You will need a header to tell where each new stream begins, of course. :)
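A minimal sketch of this split-and-concatenate idea, using the built-in GZipStream rather than SharpZipLib. The container layout (chunk count, then a length prefix before every compressed chunk) is an assumption made up for this example, not a standard format, and the whole input is held in memory for simplicity:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class ChunkedGzip
{
    // Compresses one slice of the input as an independent gzip stream.
    private static byte[] CompressChunk(byte[] data, int offset, int count)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(data, offset, count);
        }
        return output.ToArray(); // ToArray still works after the stream is closed
    }

    public static byte[] Compress(byte[] data, int chunkSize)
    {
        int chunkCount = (data.Length + chunkSize - 1) / chunkSize;
        var parts = new byte[chunkCount][];

        // Each chunk is compressed on whatever core the scheduler picks.
        Parallel.For(0, chunkCount, i =>
        {
            int offset = i * chunkSize;
            parts[i] = CompressChunk(data, offset, Math.Min(chunkSize, data.Length - offset));
        });

        // Single-threaded splice: the "header" is a count plus a length per chunk.
        var result = new MemoryStream();
        using (var writer = new BinaryWriter(result))
        {
            writer.Write(chunkCount);
            foreach (byte[] part in parts)
            {
                writer.Write(part.Length);
                writer.Write(part);
            }
        }
        return result.ToArray();
    }

    public static byte[] Decompress(byte[] packed)
    {
        var result = new MemoryStream();
        using (var reader = new BinaryReader(new MemoryStream(packed)))
        {
            int chunkCount = reader.ReadInt32();
            for (int i = 0; i < chunkCount; i++)
            {
                byte[] part = reader.ReadBytes(reader.ReadInt32());
                using (var gzip = new GZipStream(new MemoryStream(part), CompressionMode.Decompress))
                {
                    gzip.CopyTo(result); // every chunk starts with fresh compression state
                }
            }
        }
        return result.ToArray();
    }
}
```

Smaller chunks give more parallelism but usually a worse ratio, since each chunk starts without any knowledge of the data before it.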
Found this library: http://www.codeplex.com/sevenzipsharp
Looks like it wraps the unmanaged 7z.dll which does support multithreading. Obviously not ideal having to wrap unmanaged code, but it looks like this is currently the only option that's out there.
I recently found a compression library that supports multithreaded bzip2 compression: DotNetZip. The nice thing about this library is that the ParallelBZip2OutputStream class is derived from System.IO.Stream and takes a System.IO.Stream as output. This means that you can create a chain of classes derived from System.IO.Stream like:
ICSharpCode.SharpZipLib.Tar.TarOutputStream
Ionic.BZip2.ParallelBZip2OutputStream (from the DotNetZip library)
System.Security.Cryptography.CryptoStream (for encryption)
System.IO.FileStream
In this case we create a .tar.bz2 file, encrypt it (maybe with AES) and write it directly to a file.
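The chaining itself can be tried with framework classes only. The sketch below swaps in System.IO.Compression.GZipStream for the tar and bzip2 streams listed above (so it is single-threaded and produces gzip, not .tar.bz2); the class name is hypothetical, and key and IV handling is reduced to plain parameters for illustration:

```csharp
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class CompressEncrypt
{
    // Chain on write: GZipStream -> CryptoStream -> FileStream.
    public static void Save(byte[] data, string path, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var crypto = new CryptoStream(file, aes.CreateEncryptor(key, iv), CryptoStreamMode.Write))
        using (var gzip = new GZipStream(crypto, CompressionMode.Compress))
        {
            // Data is compressed first, then encrypted, then written to disk.
            gzip.Write(data, 0, data.Length);
        }
    }

    // Same chain in reverse on read: FileStream -> CryptoStream -> GZipStream.
    public static byte[] Load(string path, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var crypto = new CryptoStream(file, aes.CreateDecryptor(key, iv), CryptoStreamMode.Read))
        using (var gzip = new GZipStream(crypto, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }
}
```

Disposing the outermost stream first matters here: it flushes the compressor, which then lets the CryptoStream write its final block before the file is closed.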
A compression format (but not necessarily the algorithm) needs to be aware of the fact that you can use multiple threads. Or rather, not necessarily that you use multiple threads, but that you're compressing the original data in multiple steps, parallel or otherwise.
Let me explain.
Most compression algorithms compress data in a sequential manner. Any data can be compressed by using information learned from already compressed data. So for instance, if you're compressing a book by a bad author, which uses a lot of the same words, clichés and sentences multiple times, by the time the compression algorithm comes to the second+ occurrence of those things, it will usually be able to compress the current occurrence better than the first occurrence.
However, a side-effect of this is that you can't really splice together two compressed files without decompressing both and recompressing them as one stream. The knowledge from one file would not match the other file.
The solution of course is to tell the decompression routine that "Hey, I just switched to an altogether new data stream, please start fresh building up knowledge about the data".
If the compression format has support for such a code, you can easily compress multiple parts at the same time.
For instance, a 1GB file could be split into four 256MB parts, each part compressed on a separate core, and the results spliced together at the end.
If you're building your own compression format, you can of course build support for this yourself.
Whether .ZIP or .RAR or any of the known compression formats can support this is unknown to me, but I know the .7Z format can.
Normally I would say try Intel Parallel Studio, which lets you develop code specifically targeted at multi-core systems, but for now it does C/C++ only. Maybe just create a lib in C/C++ and call that from your C# code?
