Memory-Mapped Files vs. RAM Disk - C#

For the game Minecraft, the general approach when running the server application is to run it from a RAM disk, since it uses hundreds of tiny files for world generation and I/O speed is the major bottleneck.
In a recent attempt, I tried to use Dokan/ImDisk to create a RAM disk programmatically for the server application. Dokan was considerably slower than the average hard drive, and I was unable to get ImDisk to function properly. Since these are the only two filesystem drivers I know of that have a .NET API, I'm looking into alternatives now.
It was previously suggested to me to try memory-mapped files. My current approach is: create a RAM disk, create a symbolic link between the game server's data folder and the RAM disk, then launch the game server process.
Can memory-mapped files function the same way, i.e. by creating a virtual drive (such as G:\Data_Files\) that I can create a symbolic link to?
Are there any other alternatives to Dokan/ImDisk with a .NET API/Bindings floating around?

After looking at a bunch of solutions and doing a few benchmarks, we couldn't pass up RAMDisk from DataRam. We kicked around a bunch of the Windows driver stuff and some other freebie solutions, and ultimately couldn't justify the effort compared to the tiny price tag of a commercial solution.

There are several approaches that depend on specifics of your task.
If you need to work with the file system (i.e. via filesystem API functions and classes) and you want it fast, then (as I suggested in reply to your previous question) you'd need to create a RAM disk driver. The Windows Driver Kit includes a sample driver which (coincidence?) is named "RamDisk". Driver development, though, is tricky, and if something goes wrong with the sample or you need to extend it, you would have to dig deep into kernel-mode development (or hire someone to do the job). Why kernel mode? Because, as you saw with Dokan, switching back to user mode to store the data causes a major slowdown.
If all you need is handy management of a bunch of files in memory using the Stream class (with the possibility of flushing all of this to disk), then you can make use of a virtual file system. Our SolFS (Application Edition) is one such product (I can also recall CodeBase File System, but they don't seem to provide an evaluation version). SolFS seems to fit your task nicely, so if you think so too, you can contact me privately (see my profile) for assistance.
To answer your questions:
No. Memory-mapped files (MMFs) are literally files on the disk (including on a virtual disk, if you have one) which can be accessed not via the filesystem API but directly, using in-memory operations. MMFs tend to be faster for most file operations, which is why they are frequently mentioned.
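For illustration, a minimal sketch of working with an MMF from C# (requires .NET 4.0 or later; the file path is just a placeholder):

    using System;
    using System.IO.MemoryMappedFiles;

    class MmfDemo
    {
        static void Main()
        {
            // Map an existing, non-empty file on disk into memory; reads and
            // writes then go through memory operations, not the filesystem API.
            using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\Data\world.dat"))
            using (var view = mmf.CreateViewAccessor())
            {
                int header = view.ReadInt32(0); // read 4 bytes at offset 0
                view.Write(0, header + 1);      // write them back in place
            }
        }
    }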
Our Callback File System or CallbackDisk products (see the virtual storage line) are an alternative; however, as I mentioned in the first paragraph, they won't solve your problem due to the user-mode context switch.
Update:
I see no obstacle to the driver keeping a copy in memory and performing writes to disk asynchronously when needed, but this would require modifying the sample RAMDisk driver (and that involves quite a lot of kernel-mode programming).
With SolFS or another virtual file system, you can keep a copy of the storage on disk as well. In the case of a virtual file system, it may turn out that working with the container file on disk already gives satisfactory results (since a virtual file system usually has a memory cache), and you won't need to keep an in-memory copy at all.

Related

Logging in SQL vs. files vs. AWS: which is faster in C# applications?

Please help me understand which one is better for logging, performance-wise.
If I understand correctly, you want to log useful information from your (C#) application somewhere, so you can refer to it later (presumably when something goes wrong, or to extract information for analytics).
As a rule of thumb, in inter-process communication the bulk of the time is spent sending data over the network. If you apply this knowledge, you can order your choices (and other options) from a performance point of view.
As an indication, the order in terms of performance for a few cases would be:
1) Log file on the same drive as your program, written from within the same process
2) Log file on a mounted drive on the same machine that runs your program, written from within the same process
3) Log written to a database that resides on the same machine (localhost) as the program
4) Log written to a database that resides on a different machine in the local network
5) Log written to AWS, which obviously will not be within your local network.
...
That said, there are other considerations as well. For example, a DB on a powerful machine in a high-bandwidth local network may write faster than a low-spec machine (e.g. an ordinary laptop) hosting both the DB and the program. Similarly, use of Direct Connect or a fibre line between AWS and the local network boosts performance many-fold.
Thus the answer is not straightforward; many factors can change the order. The safest bet is to use log files on the same machine. You can always run a separate process to read asynchronously from the file and write wherever you wish.
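To illustrate the "log file on the same machine" option, here is a minimal sketch (all names illustrative) of a logger whose callers never block on disk I/O; a single background task drains the queue to a local file:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    class AsyncFileLogger : IDisposable
    {
        private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
        private readonly Task _writer;

        public AsyncFileLogger(string path)
        {
            // One background task owns the file and drains the queue.
            _writer = Task.Run(() =>
            {
                using (var writer = new StreamWriter(path, append: true))
                    foreach (var line in _queue.GetConsumingEnumerable())
                        writer.WriteLine(line);
            });
        }

        // Callers enqueue and return immediately; no disk I/O on their thread.
        public void Log(string message) => _queue.Add($"{DateTime.UtcNow:O} {message}");

        public void Dispose()
        {
            _queue.CompleteAdding(); // stop accepting, flush the remainder
            _writer.Wait();
        }
    }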

Efficiently streaming data across process boundaries in .NET

I've been working on an internal developer tool on and off for a few weeks now, but I'm running into an ugly stumbling block I haven't managed to find a good solution for. I'm hoping someone can offer some ideas or guidance on the best ways to use the existing frameworks in .NET.
Background: the purpose of this tool is to load multiple different types of log files (Windows Event Log, IIS, SQL trace, etc.) to the same database table so they can be sorted and examined together. My personal goal is to make the entire thing streamlined so that we only make a single pass and do not cache the entire log either in memory or to disk. This is important when log files reach hundreds of MB or into the GB range. Fast performance is good, but slow and unobtrusive (allowing you to work on something else in the meantime) is better than running faster but monopolizing the system in the process, so I've focused on minimizing RAM and disk usage.
I've iterated through a few different designs so far trying to boil it down to something simple. I want the core of the log parser--the part that has to interact with any outside library or file to actually read the data--to be as simple as possible and conform to a standard interface, so that adding support for a new format is as easy as possible. Currently, the parse method returns an IEnumerable<Item> where Item is a custom struct, and I use yield return to minimize the amount of buffering.
However, we quickly run into some ugly constraints imposed by the libraries (generally from Microsoft) provided to process these file formats. The biggest and ugliest problem: one of these libraries only works in 64-bit. Another (Microsoft.SqlServer.Management.Trace TraceFile, for SSMS logs) only works in 32-bit. As we all know, you can't mix and match 32- and 64-bit code. Since the entire point of this exercise is to have one utility that can handle any format, we need a separate child process (which in this case handles the 32-bit-only portion).
The end result is that I need the 64-bit main process to start up a 32-bit child, provide it with the information needed to parse the log file, and stream the data back in some way that doesn't require buffering the entire contents to memory or disk. At first I tried using stdout, but that fell apart with any significant amount of data. I've tried using WCF, but it's really not designed to handle the "service" being a child of the "client", and it's difficult to get them synchronized backwards from how they want to work, plus I don't know if I can actually make them stream data correctly. I don't want to use a mechanism that opens up unsecured network ports or that could accidentally crosstalk if someone runs more than one instance (I want that scenario to work normally--each 64-bit main process would spawn and run its own child). Ideally, I want the core of the parser running in the 32-bit child to look the same as the core of a parser running in the 64-bit parent, but I don't know if it's even possible to continue using yield return, even with some wrapper in place to help manage the IPC. Is there any existing framework in .NET that makes this relatively easy?
WCF does have a P2P mode; however, if all your processes are on the local machine, you are better off with an IPC mechanism such as named pipes, since the latter runs in kernel mode and does not have the messaging overhead of the former.
Failing that, you could try COM, which should not have a problem talking between 32- and 64-bit processes.
In case anyone stumbles across this, I'll post the solution that we eventually settled on. The key was to redefine the inter-process WCF service interface to be different from the intra-process IEnumerable interface. Instead of attempting to yield return across process boundaries, we stuck a proxy layer in between that uses an enumerator, so we can call a "give me an item" method over and over again. It's likely this has more performance overhead than a true streaming solution, since there's a method call for every item, but it does seem to get the job done, and it doesn't leak or consume memory.
We did follow Micky's suggestion of using named pipes, but still within WCF. We're also using named semaphores to coordinate the two processes, so we don't attempt to make service calls until the "child service" has finished starting up.
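For reference, a rough sketch of that pattern (all type and member names here are hypothetical; the real service would be hosted over WCF's NetNamedPipeBinding as described):

    using System.Collections.Generic;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    public class LogItem   // a class rather than the original struct, so null can mark the end
    {
        [DataMember] public string Message { get; set; }
    }

    [ServiceContract]
    public interface ILogParserService
    {
        // The 64-bit parent calls this repeatedly; the 32-bit child advances
        // its internal IEnumerator<LogItem> one step per call.
        [OperationContract]
        LogItem GetNextItem();
    }

    // Client-side proxy that turns the pull-based service back into the
    // IEnumerable<LogItem> the rest of the parent process expects.
    public static class LogParserProxy
    {
        public static IEnumerable<LogItem> Items(ILogParserService service)
        {
            LogItem item;
            while ((item = service.GetNextItem()) != null)
                yield return item;
        }
    }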

virtual temp file, omitting I/O operations

Let's say I received a .csv file over the network, so I have a byte[].
I also have a parser that reads .csv files and does business things with it, using File.ReadAllLines().
So far I did:
File.WriteAllBytes(tempPath, incomingBuffer);
parser.Open(tempPath);
I won't ever need the actual file on this device, though.
Is there a way to "store" this file in some virtual place and "open" it again from there, but all in memory?
That would save me ages of waiting for the I/O operations to complete (there's a good article on that on Coding Horror), plus reduce wear on the drive (relevant if this occurred a few dozen times a minute, 24/7), and in general eliminate a point of failure.
This is a bit in the UNIX direction, where everything is a file stream, but we're talking Windows here.
"I won't ever need the actual file on this device, though." - Well, you kind of do, if all your APIs expect a file on the disk.
You can:
1) Get decent APIs. I am sure there are CSV parsers that take a Stream as a constructor parameter; you could then use a MemoryStream, for example (see the sketch at the end of this answer).
2) If performance is a serious issue and there is no way around the APIs, there's one simple solution: write your own RAM disk implementation, which caches everything that is needed and pages to the HDD if necessary.
http://code.msdn.microsoft.com/windowshardware/RAMDisk-Storage-Driver-9ce5f699 (Oh did I mention that you absolutely need to have mad experience with drivers :p?)
There are also ready-made RAM disk solutions (Google!), which means you can just run (in your application initializer) 'CreateRamDisk.exe -Hdd "__MEMDISK__"' (for example), and use File.WriteAllBytes("__MEMDISK__:\yourFile.csv");
Alternatively, you can read about memory-mapped files (.NET 4.0 and later has nice support). However, by the sounds of it, that probably does not help you too much.
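As a sketch of option 1, assuming the parser can be changed (or wrapped) to accept lines rather than a path, the temp file disappears entirely:

    using System.Collections.Generic;
    using System.IO;

    static class InMemoryCsv
    {
        // Equivalent of File.ReadAllLines(tempPath), but reading straight
        // from the received buffer: no temp file ever touches the disk.
        public static List<string> ReadLines(byte[] incomingBuffer)
        {
            var lines = new List<string>();
            using (var stream = new MemoryStream(incomingBuffer))
            using (var reader = new StreamReader(stream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    lines.Add(line);
            }
            return lines;
        }
    }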

Create Virtual Disk with .NET?

I found a lot of good topics on Stack Overflow concerning this, but my question is a bit more specific. A lot of companies are using this software to host the same services we do...
http://memory.dataram.com/products-and-services/software/ramdisk
Apparently the read/write speed to a virtual disk is insanely faster than to a physical disk, and as we run very I/O-intensive software, I would like to write something to do the same thing. My only needs are that it runs the application on a virtual drive (for the increased I/O speed) and copies the data over to the physical location on the hard drive every X minutes.
Would this be pretty easy to accomplish? What should I look into using to accomplish this?
EDIT
It looks like I could use the Dokan library below, but would the "subst" command in Windows yield any I/O performance increase, or would this library be the better bet?
http://dokan-dev.net/en/about/
This really isn't a C#/.NET question, unless you want to write your own RAM disk driver. Drivers like the one at your link have been around for a long time, and they do have insane read/write speeds, at the cost of RAM availability to your application and the OS. That may not be a problem in your case, if the machine in question has lots of RAM.
The programming part of it is the periodic writing of RAM disk contents to disk. As a RAM disk usually shows up as just another drive, this is a simple matter of copying files from it to a physical disk. You could do that in C#, but it would work just as well in a number of scripting languages.
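A minimal sketch of that periodic copy (the drive letters and the interval are assumptions):

    using System;
    using System.IO;
    using System.Threading;

    class RamDiskFlusher
    {
        static void Main()
        {
            const string source = @"R:\Data";        // the mounted RAM disk
            const string target = @"D:\Backup\Data"; // the physical location

            // Copy the RAM disk contents to the physical disk every 5 minutes.
            var timer = new Timer(_ => CopyAll(source, target), null,
                                  TimeSpan.Zero, TimeSpan.FromMinutes(5));
            Console.ReadLine(); // keep the process alive
            GC.KeepAlive(timer);
        }

        static void CopyAll(string sourceDir, string targetDir)
        {
            foreach (var file in Directory.GetFiles(sourceDir, "*", SearchOption.AllDirectories))
            {
                // Rebuild the relative path under the target directory.
                var relative = file.Substring(sourceDir.Length).TrimStart('\\');
                var dest = Path.Combine(targetDir, relative);
                Directory.CreateDirectory(Path.GetDirectoryName(dest));
                File.Copy(file, dest, overwrite: true);
            }
        }
    }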
If this is a high-end application, look into solid-state SATA drives. They have read/write speeds considerably faster than hard drives, and the data is persistent across crashes, power failures, etc.
If you do need a RAM drive, then what you really need is a block device driver which will do the job in kernel mode. The problem with Dokan is that (a) this is a filesystem driver, and this requires lots of additional work for you, (b) it calls your user-mode code back, and this causes a slowdown, (c) it's free stuff which is not stable enough for production use.

What is the fastest way to write hundreds of files to disk using C#?

My program has to write hundreds of files to disk, received from external sources (the network).
Each file is a simple document that I currently store under a GUID name in a specific folder, but creating, writing, and closing hundreds of files is a lengthy process.
Is there a better way to store this many files on disk?
I've come to a solution, but I don't know if it is the best.
First, I create two files: one of them acts like an allocation table, and the second is a huge file storing all the content of my documents. But reading from this file would be a nightmare; maybe a memory-mapped file technique could help. Could working with 30 GB or more create a problem?
Edit: What is the fastest way to store 1000 text files on disk? (Write operations are performed frequently.)
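For what it's worth, a bare-bones sketch of that two-file idea (write path only, with an in-memory index; a real implementation would also persist the allocation table and guard against concurrent access):

    using System.Collections.Generic;
    using System.IO;

    class SolidFileStore
    {
        struct Entry { public long Offset; public int Length; }

        private readonly FileStream _data;
        private readonly Dictionary<string, Entry> _index = new Dictionary<string, Entry>();

        public SolidFileStore(string path)
        {
            _data = new FileStream(path, FileMode.Append, FileAccess.Write);
        }

        // Append the document to the huge data file and remember where it went.
        public void Append(string id, byte[] document)
        {
            _index[id] = new Entry { Offset = _data.Position, Length = document.Length };
            _data.Write(document, 0, document.Length);
        }

        // Reading back would open the same file with a second stream (or a
        // memory-mapped view) and seek to _index[id].Offset.
    }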
This is similar to how Subversion stores its repositories on disk. Each revision in the repository is stored as a file, and the repository uses a folder for each 1000 revisions. This seems to perform rather well, except that there is a good chance the files will become fragmented or end up located far apart from each other. Subversion allows you to pack each 1000-revision folder into a single file (which works nicely, since revisions are not modified once created).
If you plan on modifying these documents often, you could consider using an embedded database to manage the solid file for you (Firebird is a good one that doesn't have any size limitations). This way you don't have to manage the growth and organization of the files yourself (which can get complicated once you start modifying files inside the solid file). It will also help with concurrent access (reading/writing) if you use a separate service/process to manage the database and communicate with it. The new version of Firebird (2.5) supports access to a database from multiple processes even when using an embedded server, so you can have multiple readers and writers of your file storage without having to run a database server.
The first thing you should do is profile your app. In particular, you want to watch the Disk Queue Length counters: your queue length shouldn't be more than 1.5 to 2 times the number of disk spindles you have.
For example, in a single-disk system the queue length shouldn't go above 2; in a RAID array with 3 disks, it shouldn't go above 6.
Verify that you are indeed write-bound. If so, the best way to speed up massive writes is to buy disks with very fast write performance. Note that most RAID setups will result in decreased write performance.
If write performance is critical, spreading the storage across multiple drives could help. Of course, you would have to take this into account for any app that needs to read that information, and you'll still have to buy fast drives.
Note that not all drives are created equal and some are better suited for high performance than others.
What about using the ThreadPool for that?
I.e., for each received "file", enqueue a write function on a thread pool thread that actually persists the data to a file on disk.
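A minimal sketch of that suggestion (the folder name is an assumption; the GUID naming follows the question):

    using System;
    using System.IO;
    using System.Threading;

    static class DocumentWriter
    {
        // For each document received from the network, hand the blocking
        // disk write off to a thread-pool thread and return immediately.
        public static void QueueWrite(byte[] document)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                var path = Path.Combine(@"C:\Documents", Guid.NewGuid().ToString());
                File.WriteAllBytes(path, document);
            });
        }
    }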
