.NET equivalent of Java FileChannel?

I want to stream bytes directly from a TCP socket to a file on disk. In Java, it's possible to use NIO Channels, specifically SocketChannel and FileChannel. Quoting FileChannel#transferFrom(...):
This method is potentially much more efficient than a simple loop that
reads from the source channel and writes to this channel. Many
operating systems can transfer bytes directly from the source channel
into the filesystem cache without actually copying them.
Obviously I can just write the standard "copy loop" to read and write the bytes, and even take advantage of asynchronous I/O to minimize waiting. Will that be comparable to the platform native functionality that Java is taking advantage of, or is there another approach?

You can read the data with a NetworkStream, then use the CopyTo method (added to Stream in .NET 4) to write the data into a FileStream.
A manual approach, pre-.NET 4: How do I save a stream to a file in C#?
For sending, there is a Socket.SendFile method that sends a file directly, using the Win32 TransmitFile function. Unfortunately, there is no corresponding Socket.ReceiveFile method or ReceiveFile Win32 API.
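For the receive side, a minimal sketch of the NetworkStream + CopyTo approach, assuming an already-connected Socket (the names here are illustrative):

```csharp
// A sketch: stream bytes from a connected TCP socket straight to disk.
using System.IO;
using System.Net.Sockets;

static class SocketToFileSketch
{
    public static void SaveToDisk(Socket socket, string path)
    {
        // Wrap the socket without taking ownership of it.
        using (var network = new NetworkStream(socket, ownsSocket: false))
        using (var file = File.Create(path))
        {
            // CopyTo runs the read/write loop for you; it returns when the
            // remote side closes the connection.
            network.CopyTo(file);
        }
    }
}
```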

Related

Efficient continuous data writes on HDD

In my application I need to continuously write data chunks (around 2 MB) about every 50 ms to a large file (around 2-7 GB). This is done in a sequential, circular way: I write chunk after chunk into the file, and when I reach the end of the file I start again at the beginning.
Currently I'm doing it as follows:
In C# I call File.OpenWrite once to open the file with read access and set the size of the file with SetLength. When I need to write a chunk, I pass the safe file handle to the unmanaged WriteFile (kernel32.dll), along with an OVERLAPPED structure to specify the position within the file where the chunk has to be written. The chunk I need to write is stored in unmanaged memory, so I have an IntPtr which I can pass to WriteFile.
Now I'd like to know if and how I can make this process more efficient. Any ideas?
Some questions in detail:
Will changing from file I/O to memory-mapped file help?
Can I include some optimizations for NTFS?
Are there some useful parameters when creating the file that I'm missing? (maybe an unmanaged call with special parameters)
Using better hardware will probably be the most cost-effective way to speed up file writing.
There is a paper from Microsoft research that will answer most of your questions: Sequential File Programming Patterns and Performance with .NET and the downloadable source code (C#) if you want to run the tests from the paper on your machine.
In short:
The default behavior provides excellent performance on a single disk.
Unbuffered I/O should be tested if you have a disk array; it could improve write speed by a factor of eight.
This thread on social.msdn might also be of interest.
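As a concrete illustration of the paper's unbuffered tests, here is a hedged sketch of opening a file for unbuffered, write-through I/O from C#. FILE_FLAG_NO_BUFFERING has no named FileOptions member, so its raw value is cast in (a widely used trick, not an official API):

```csharp
// A sketch of unbuffered, write-through file access from C#.
using System.IO;

static class UnbufferedWriteSketch
{
    // FILE_FLAG_NO_BUFFERING; a known trick, not an official enum member.
    const FileOptions NoBuffering = (FileOptions)0x20000000;

    public static FileStream OpenForCircularWrites(string path, long fileSize)
    {
        var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite,
                                FileShare.Read, 4096,
                                NoBuffering | FileOptions.WriteThrough);
        // Pre-size the file once, as the question already does with SetLength.
        fs.SetLength(fileSize);
        return fs;
    }

    // Caveat: with FILE_FLAG_NO_BUFFERING, offsets and lengths must be
    // multiples of the volume sector size; 2 MB chunks satisfy this.
}
```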

Using named pipes asynchronously with StreamWriter

I am trying to send a string over a named pipe using StreamWriter, but StreamWriter class only offers synchronous operations. I can use BeginWrite method of the NamedPipeServerStream class, but I wonder why there are no writer classes that would allow asynchronous operations. Am I missing something obvious?
It would be significantly more complicated than for the raw streams. For the raw streams, any amount of data might come in asynchronously and the system just passes the buffer to you. A reader or writer, though, has to deal with character encoding, which may turn several bytes of raw data into a single Unicode character (or one character into several bytes). Not that this would be impossible; the framework libraries just don't take it that far, so you'll need to do this work yourself.
(Depending on your needs, creating another thread and performing the operations synchronously on it might make your program easier to write. Note that scalability will generally be higher when you use Begin/End async.)
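Doing that work yourself mostly means encoding the string by hand and handing the bytes to the pipe's Begin/End methods. A minimal sketch, with a made-up pipe name:

```csharp
// A sketch: do the encoding manually, then use the pipe's Begin/End methods.
using System;
using System.IO.Pipes;
using System.Text;

static class AsyncPipeWriteSketch
{
    public static void Serve()
    {
        using (var pipe = new NamedPipeServerStream(
            "demo-pipe", PipeDirection.Out, 1,
            PipeTransmissionMode.Byte, PipeOptions.Asynchronous))
        {
            pipe.WaitForConnection();

            // The StreamWriter's job, done by hand: string -> bytes.
            byte[] payload = Encoding.UTF8.GetBytes("hello over the pipe\n");

            // Asynchronous write on the raw stream.
            IAsyncResult ar = pipe.BeginWrite(payload, 0, payload.Length, null, null);
            // ... other work can run here while the write is in flight ...
            pipe.EndWrite(ar);
        }
    }
}
```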

Multithreaded compression in C#

Is there a library in .net that does multithreaded compression of a stream? I'm thinking of something like the built in System.IO.GZipStream, but using multiple threads to perform the work (and thereby utilizing all the cpu cores).
I know that, for example, 7-Zip compresses using multiple threads, but the C# SDK they've released doesn't seem to do that.
I think your best bet, if you are using non-parallelized algorithms, is to split the data stream into equal-sized chunks yourself and launch a thread to compress each part in parallel. Afterwards, a single thread concatenates the results into one stream (you can write a stream class that continues reading from the next stream when the current one ends).
You may wish to take a look at SharpZipLib which is somewhat better than the intrinsic compression streams in .NET.
EDIT: You will need a header to tell where each new stream begins, of course. :)
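For what it's worth, a sketch of that split-and-compress idea with the built-in GZipStream. Each gzip member happens to carry its own header, so plain concatenation decompresses with most tools, though an index of member offsets still helps if you also want parallel decompression. The chunk size here is arbitrary:

```csharp
// A sketch of split-and-compress with the built-in GZipStream. Each chunk
// becomes an independent gzip member; the members are concatenated in order.
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

static class ParallelGzipSketch
{
    const int ChunkSize = 4 * 1024 * 1024; // arbitrary chunk size

    public static void Compress(string inputPath, string outputPath)
    {
        byte[] data = File.ReadAllBytes(inputPath);
        int chunkCount = (data.Length + ChunkSize - 1) / ChunkSize;
        var members = new byte[chunkCount][];

        // Compress each chunk independently, in parallel.
        Parallel.For(0, chunkCount, i =>
        {
            int offset = i * ChunkSize;
            int length = Math.Min(ChunkSize, data.Length - offset);
            using (var ms = new MemoryStream())
            {
                using (var gz = new GZipStream(ms, CompressionMode.Compress, true))
                    gz.Write(data, offset, length);
                members[i] = ms.ToArray();
            }
        });

        // A single thread concatenates the members into the output file.
        using (var output = File.Create(outputPath))
            foreach (byte[] member in members)
                output.Write(member, 0, member.Length);
    }
}
```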
Found this library: http://www.codeplex.com/sevenzipsharp
Looks like it wraps the unmanaged 7z.dll, which does support multithreading. Obviously it's not ideal having to wrap unmanaged code, but it looks like this is currently the only option out there.
I recently found a compression library that supports multithreaded bzip2 compression: DotNetZip. The nice thing about this library is that the ParallelBZip2OutputStream class derives from System.IO.Stream and takes a System.IO.Stream as its output. This means you can create a chain of classes derived from System.IO.Stream, like:
ICSharpCode.SharpZipLib.Tar.TarOutputStream
Ionic.BZip2.ParallelBZip2OutputStream (from the DotNetZip library)
System.Security.Cryptography.CryptoStream (for encryption)
System.IO.FileStream
In this case we create a .tar.bz2 file, encrypt it (for example with AES), and write it directly to a file.
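A sketch of that chain, assuming the DotNetZip and SharpZipLib types named in the list above (exact constructor signatures vary between library versions, and key handling is elided):

```csharp
// A sketch of the chain above: tar -> parallel bzip2 -> AES -> file.
using System.IO;
using System.Security.Cryptography;
using ICSharpCode.SharpZipLib.Tar;   // SharpZipLib
using Ionic.BZip2;                   // DotNetZip

static class TarBz2EncryptSketch
{
    public static void Archive(string inputFile, string outputFile,
                               byte[] key, byte[] iv)
    {
        using (var file = File.Create(outputFile))
        using (var aes = Aes.Create())
        using (var crypto = new CryptoStream(file, aes.CreateEncryptor(key, iv),
                                             CryptoStreamMode.Write))
        using (var bzip2 = new ParallelBZip2OutputStream(crypto))
        using (var tar = new TarOutputStream(bzip2))
        {
            // One entry for demonstration; real archives would loop over files.
            TarEntry entry = TarEntry.CreateEntryFromFile(inputFile);
            tar.PutNextEntry(entry);
            using (var src = File.OpenRead(inputFile))
                src.CopyTo(tar);
            tar.CloseEntry();
        }
    }
}
```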
A compression format (though not necessarily the algorithm) needs to be aware that you may use multiple threads. Or rather, not that you use multiple threads as such, but that you're compressing the original data in multiple steps, parallel or otherwise.
Let me explain.
Most compression algorithms compress data in a sequential manner: any piece of data can be compressed using information learned from the already-compressed data. So, for instance, if you're compressing a book by a bad author who uses a lot of the same words, clichés and sentences over and over, by the time the compression algorithm reaches the second and later occurrences of those things, it will usually be able to compress them better than the first occurrence.
However, a side-effect of this is that you can't really splice together two compressed files without decompressing both and recompressing them as one stream. The knowledge from one file would not match the other file.
The solution of course is to tell the decompression routine that "Hey, I just switched to an altogether new data stream, please start fresh building up knowledge about the data".
If the compression format has support for such a code, you can easily compress multiple parts at the same time.
For instance, a 1 GB file could be split into four 256 MB parts, each part compressed on a separate core, and the results spliced together at the end.
If you're building your own compression format, you can of course build support for this yourself.
Whether .ZIP or .RAR or any of the other well-known compression formats can support this, I don't know, but the .7z format can.
Normally I would say try Intel Parallel Studio, which lets you develop code specifically targeted at multi-core systems, but for now it does C/C++ only. Maybe create just a lib in C/C++ and call that from your C# code?

In-file data copy using DMA

I need to move some data from one area of a file to another. Currently, I am reading the bytes and writing them back out. But I'm wondering if doing a DMA transfer would be faster, if it is possible. I'm in C#, but unsafe and p/invoke functions are acceptable.
As far as I can tell, you are not doing DMA 'by accident' by using the usual file streams to copy from one file to another. DMA is used behind the scenes in some places (for instance, to transfer from disk to RAM when using FileStreams), but it cannot be driven directly for a file-to-file copy from C#.
DMA itself is pretty complex and native to low-level languages. I'm referring to this document; all its code examples are in C and assembly, so it's not directly applicable to C#.
The DMA is another chip on your motherboard (usually is an Intel 8237
chip) that allows you (the programmer) to offload data transfers
between I/O boards. DMA actually stands for 'Direct Memory Access'. An
example of DMA usage would be the Sound Blaster's ability to play
samples in the background. The CPU sets up the sound card and the DMA.
When the DMA is told to 'go', it simply shovels the data from RAM to
the card. Since this is done off-CPU, the CPU can do other things
while the data is being transferred.
An alternative could be to let the OS handle the transfer: Simply use File.Copy.
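File.Copy doesn't cover the in-file case the question actually describes, though; for that, the read-and-write loop remains the baseline, with the OS cache doing the heavy lifting. A minimal sketch, assuming the two regions do not overlap:

```csharp
// A sketch of the plain copy loop: move `count` bytes from srcOffset to
// dstOffset within the same file. Assumes the regions do not overlap
// (overlapping moves would need to copy in the right direction).
using System;
using System.IO;

static class InFileMoveSketch
{
    public static void MoveRegion(string path, long srcOffset,
                                  long dstOffset, long count)
    {
        var buffer = new byte[64 * 1024]; // arbitrary buffer size
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            long moved = 0;
            while (moved < count)
            {
                int chunk = (int)Math.Min(buffer.Length, count - moved);
                fs.Position = srcOffset + moved;
                int read = fs.Read(buffer, 0, chunk);
                if (read == 0) break; // reached end of file early
                fs.Position = dstOffset + moved;
                fs.Write(buffer, 0, read);
                moved += read;
            }
        }
    }
}
```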

Send large byte arrays between AppDomains in the same process

I'm building a network server and starting a lot of AppDomains on the server to which requests are routed. What will be the fastest way to send off a request payload to one of the AppDomains for processing?
1. Read the payload from the socket into a byte array and marshal it.
2. Marshal the network stream (inherits from MarshalByRef) to the AppDomain.
3. Read the payload. Decode it into objects. Marshal the decoded objects.
4. Use named pipes to transfer the byte array.
5. Use loopback sockets.
6. Maybe there is a way to marshal the actual socket connection?
The decoding mostly creates immutable objects that are used to determine how to fulfill the client's request; the AppDomain then creates a response and marshals it back to the host AppDomain, which sends it back through the socket.
The method should prefer lower memory use over lower CPU use.
WCF is not an option.
TCP binary remoting is certainly fast. I do not know how much faster it is than raw sockets, which are probably the fastest but a royal pain to work with.
I have run 1500-2000 requests per second in production using HTTP binary remoting between two boxes. On the same box you should get much higher performance using TCP or a named pipe channel, depending on the CPU cycles it takes to process the data.
If I were you, I would take a look at how Cassini is implemented. It does pretty much exactly what you are talking about doing.
Actually, Cassini has been more or less superseded by Webhost, the built-in web server that ships with Visual Studio now. Take a look at this post on Phil Haack's blog for more.
Very good question. If I were coming at this problem, I would probably use a BufferedStream / MemoryStream and marshal the stream into the AppDomain that consumes the object, to reduce the marshaling or serializing of the many object graphs created in a different AppDomain.
But then again, it sounds like you are almost completely duplicating the functionality of IIS, so I would look (with Reflector) into the System.Web.Hosting namespace and see how they handle it, their worker thread pool, etc.
"6. Maybe there is a way to marshal the actual socket connection?"
Option 6 is IMO the best.
From the process's perspective, a socket is just a handle, and all the AppDomains reside in a single process. That means the AppDomains can exchange socket handles.
If marshalling the socket does not work, you can try recreating the socket in the other AppDomain; you can use DuplicateAndClose to do this.
If that does not work either, you should do some performance testing to choose the best data transfer method (I would choose named pipes or memory-mapped files).
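A sketch of the DuplicateAndClose route. SocketInformation is a serializable type, so it can cross the AppDomain boundary; how you actually pass it (e.g. through a MarshalByRefObject proxy) is up to your hosting code:

```csharp
// A sketch of option 6 via Socket.DuplicateAndClose.
using System.Diagnostics;
using System.Net.Sockets;

static class SocketHandoffSketch
{
    // Host AppDomain: package the connected socket for handoff. The socket is
    // closed in this domain as a side effect.
    public static SocketInformation Package(Socket connected)
    {
        // AppDomains share one process, so the "target process" is our own.
        return connected.DuplicateAndClose(Process.GetCurrentProcess().Id);
    }

    // Worker AppDomain: rebuild a usable socket from the marshaled info.
    public static Socket Unpack(SocketInformation info)
    {
        return new Socket(info);
    }
}
```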
