I'm very interested in using the managed memory-mapped files available since .NET 4.0.
Consider the following statement from MSDN:
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
My question is: what happens if the computer hangs while persisting a memory-mapped file?
I mean, since memory-mapped files live in virtual memory (which, as I understand it, is backed by the page file), could the file be recovered from virtual memory and written back to the source file after Windows restarts?
The data pages that underlie a memory-mapped file reside in the OS file cache. Whenever you shut down Windows, it writes all modified cache pages to the file system.
The pages in the cache are either ordinary file data (from processes doing reads/writes from/to files) or memory-mapped pages, which are read/written by the paging system.
If Windows is unable to flush the cache contents to disk (e.g. because it crashes or freezes), that data is lost.
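To shrink that window, a persisted mapping can be flushed explicitly so that dirty pages are written back while the machine is still healthy. A minimal C# sketch, assuming a made-up file name and capacity:

    using System;
    using System.IO.MemoryMappedFiles;

    class FlushExample
    {
        static void Main()
        {
            // Hypothetical file name and capacity, for illustration only.
            using (var mmf = MemoryMappedFile.CreateFromFile(
                @"C:\temp\data.bin", System.IO.FileMode.OpenOrCreate, null, 1024))
            using (var accessor = mmf.CreateViewAccessor())
            {
                accessor.Write(0, 42L);   // dirty a mapped page

                // Ask the OS to write the dirty pages back to the file now,
                // instead of waiting for the cache manager or process exit.
                accessor.Flush();
            }
        }
    }

Note that Flush only narrows the window; if the machine hangs before (or during) the call, the unwritten data is still lost.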
If persistence is enabled, the memory-mapped file is not removed after a reboot.
You can use an atomic write protocol with a flag that marks whether the data is valid: if the flag is set, you can restore the data; otherwise it is lost.
If your OS supports kernel or filesystem lifetime for shared objects (as Unix does), you can use shared memory with synchronization, which is faster than a mapped file.
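A rough C# sketch of that validity-flag idea over a persisted mapping; the layout and offsets are my own invention, not a standard scheme:

    using System.IO.MemoryMappedFiles;

    // Mark the data invalid before writing, and valid again only after the
    // payload has been flushed. On restart, a clear flag means the last
    // save did not complete and the data must be considered lost.
    class ValidityFlagSketch
    {
        const long FlagOffset = 0;   // 1 byte: 0 = invalid, 1 = valid
        const long DataOffset = 1;   // payload starts after the flag

        public static void Save(MemoryMappedViewAccessor acc, long value)
        {
            acc.Write(FlagOffset, (byte)0);  // invalidate first
            acc.Flush();
            acc.Write(DataOffset, value);    // write the payload
            acc.Flush();
            acc.Write(FlagOffset, (byte)1);  // commit: mark valid last
            acc.Flush();
        }

        public static bool TryLoad(MemoryMappedViewAccessor acc, out long value)
        {
            value = 0;
            if (acc.ReadByte(FlagOffset) != 1)
                return false;                // crashed mid-save
            value = acc.ReadInt64(DataOffset);
            return true;
        }
    }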
From Modern Operating Systems, 3rd edition (2007), on memory-mapped files:
Shared libraries are really a special case of a more general facility called memory-mapped files. The idea here is that a process can issue a system call to map a file onto a portion of its virtual address space. In most implementations, no pages are brought in at the time of the mapping, but as pages are touched, they are demand paged in one at a time, using the disk file as the backing store. When the process exits, or explicitly unmaps the file, all the modified pages are written back to the file.
Mapped files provide an alternative model for I/O. Instead of doing reads and writes, the file can be accessed as a big character array in memory. In some situations, programmers find this model more convenient.
If two or more processes map onto the same file at the same time, they can communicate over shared memory. Writes done by one process to the shared memory are immediately visible when the other one reads from the part of its virtual address space mapped onto the file. This mechanism thus provides a high-bandwidth channel between processes and is often used as such (even to the extent of mapping a scratch file). Now it should be clear that if memory-mapped files are available, shared libraries can use this mechanism.
From http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html:
Shared Memory
POSIX defines a shared memory object as "An object that represents memory that can be mapped concurrently into the address space of more than one process."
Shared memory is similar to file mapping, and the user can map several regions of a shared memory object, just like with memory-mapped files. In some operating systems, like Windows, shared memory is a special case of file mapping, where the file mapping object accesses memory backed by the system paging file. However, in Windows, the lifetime of this memory ends when the last process connected to the shared memory object closes its connection or the application exits, so there is no data persistence. If an application creates shared memory, fills it with data and exits, the data is lost. This lifetime is known as process lifetime.
In POSIX operating systems the shared memory lifetime is different since for semaphores, shared memory, and message queues it's mandatory that the object and its state (including data, if any) is preserved after the object is no longer referenced by any process. Persistence of an object does not imply that the state of the object is preserved after a system crash or reboot, but this can be achieved since shared memory objects can actually be implemented as mapped files of a permanent file system. The shared memory destruction happens with an explicit call to unlink(), which is similar to the file destruction mechanism. POSIX shared memory is required to have kernel lifetime (the object is explicitly destroyed or it's destroyed when the operating system reboots) or filesystem persistence (the shared memory object has the same lifetime as a file).
This lifetime difference is important for achieving portability. Many portable runtimes have tried to achieve perfect portability between Windows and POSIX shared memory, but the author of this paper has not seen any satisfactory effort. Adding a reference count to POSIX shared memory is effective only as long as no process crashes, and crashes are quite common. Emulating POSIX behaviour in Windows using native shared memory is not possible: we could try to dump shared memory to a file to obtain persistence, but a process crash would prevent persistence. The only viable alternative is to use memory-mapped files in Windows to simulate shared memory, while avoiding file-memory synchronization as much as possible.
Many other named synchronization primitives (like named mutexes or semaphores) suffer the same lifetime portability problem. Automatic shared memory cleanup is useful in many contexts, like shared libraries or DLL-s communicating with other DLL-s or processes. Even when there is a crash, resources are automatically cleaned up by the operating systems. POSIX persistence is also useful when a launcher program can create and fill shared memory that another process can read or modify. Persistence also allows data recovery if a server process crashes. All the data is still in the shared memory, and the server can recover its state.
This paper proposes POSIX lifetime (kernel or filesystem lifetime) as a more portable solution, but has no strong opinion about this. The C++ committee should take into account the use cases of both approaches to decide which behaviour is better or if both options should be available, forcing the modification of both POSIX and Windows systems.
Related
I would like to know whether it is possible to explicitly declare which memory type (physical or virtual) a C# application should use for different operations. Let me explain with an example:
Let's say I have a file of about 100 or 200 MB. I need to parse this file, access and analyze its contents, and perform operations on them. Would it be possible to specifically store the entire file and its contents in virtual memory instead of physical memory?
If it is possible, are there any side effects/precautions that one should keep in mind?
The reason behind my question is that I often have to deal with such huge files or datasets (retrieved from databases) and perform operations on them, parts of which need not occur sequentially or be synchronized. I want to improve the execution time and performance of the application by parallelizing the non-sequential parts, if possible.
Generally you don't (and shouldn't need to) have any insight into how physical memory is managed. In Windows and hence in the CLR everything is virtual memory.
Unless you have a specific problem you should just pretend everything is physical memory.
You can depend on the operating system to intelligently determine what should be kept in physical memory and what can be swapped out. Swapping only occurs if there's memory pressure anyway, i.e. if you allocate more memory than is physically available.
Also, 100-200 MB isn't all that much nowadays.
physical or virtual memory
You can't actually read from swapped-out virtual memory. When you try, a page fault occurs and the OS swaps that page back into physical memory. So you are only ever reading from physical memory. While swapping a page in, the OS will swap out a page from RAM that has not been used recently.
From the app's perspective it appears everything is in physical memory. You do not need to concern yourself with what pages are in or out of physical memory, the OS will handle that.
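For the parallel, non-sequential processing mentioned in the question, a memory-mapped file lets each task read its own slice of a large file while the OS pages data in on demand. A sketch, with a hypothetical input file and a byte sum standing in for real parsing:

    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Threading.Tasks;

    class ParallelParseSketch
    {
        static void Main()
        {
            const string path = @"C:\data\big.bin";   // hypothetical input
            long length = new FileInfo(path).Length;
            const int chunks = 4;
            long chunkSize = length / chunks;

            using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
            {
                // Each task maps and scans its own slice; pages are brought
                // into physical memory only as they are touched.
                Parallel.For(0, chunks, i =>
                {
                    long offset = i * chunkSize;
                    long size = (i == chunks - 1) ? length - offset : chunkSize;
                    using (var view = mmf.CreateViewAccessor(offset, size))
                    {
                        long sum = 0;
                        for (long pos = 0; pos < size; pos++)
                            sum += view.ReadByte(pos);   // stand-in for parsing
                    }
                });
            }
        }
    }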
Just curious, maybe someone knows a way:
Is it possible, given a running process (app domain), to dump its entire memory space to a file, send it over the wire to a LAN workstation, and recreate the process as it was on the first computer?
Assumptions:
the application exists on both computers;
the process is not creating any local settings/temporary files;
the OS is the same on both computers;
If you want to do so, you have to ensure the "dumped" process gets the same environment to run in. Among other things:
You have to provide the same handles in the same state (process, threads, files, etc.)
The new environment must have the same memory addresses allocated (including runtime allocations) as the previous one had
All the libraries must be initialized and put in the same state
If you have a GUI, even the GPU must be in the same state (you have to preload all graphics resources, etc.)
And there is much more to take care of.
This is what's involved on Linux:
http://www.cs.iit.edu/~scs/psfiles/dsn08_dccs.pdf
Not exactly easy.
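The dump half by itself is the feasible part: dbghelp's MiniDumpWriteDump can capture a process's memory to a file, and a .NET process can P/Invoke it. What Windows does not offer is a way to turn such a dump back into a running process. A hedged sketch (the output path is illustrative):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Runtime.InteropServices;

    class DumpSketch
    {
        [DllImport("dbghelp.dll", SetLastError = true)]
        static extern bool MiniDumpWriteDump(IntPtr hProcess, uint processId,
            IntPtr hFile, uint dumpType, IntPtr exceptionParam,
            IntPtr userStreamParam, IntPtr callbackParam);

        const uint MiniDumpWithFullMemory = 0x00000002;

        static void Main()
        {
            // Dump the current process, for demonstration purposes.
            var target = Process.GetCurrentProcess();
            using (var fs = new FileStream(@"C:\temp\proc.dmp", FileMode.Create))
            {
                MiniDumpWriteDump(target.Handle, (uint)target.Id,
                    fs.SafeFileHandle.DangerousGetHandle(),
                    MiniDumpWithFullMemory,
                    IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
            }
        }
    }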
Performance-wise, is it wrong to embed a file in the resource section of a DLL?
This might seem silly, but I'm trying to embed some info inside the DLL that can later be fetched by some methods, in case the whole solution and documentation are lost and we have only the DLL.
What are the downsides of doing such a thing?
Is it recommended or discouraged?
Embedded resources are handled very efficiently. Under the hood, this uses the demand-paged virtual memory capabilities of the operating system, the exact equivalent of a memory-mapped file. In other words, the resource is directly accessible in memory, and you don't pay for the resource until you start using it. The first access forces it to be read from the file and copied into RAM. It is also very cheap to unmap again: the operating system can simply discard the page. There is no way to make it more efficient.
The other side of the coin is that it is permanently mapped into virtual memory. In other words, your process loses the address space occupied by the resource. You'll run out of available address space more quickly, and an OutOfMemoryException becomes more likely.
This is not something you normally worry about until you gobble up, say, half a gigabyte in a 32-bit process. And don't fret about it at all in a 64-bit process.
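For the managed case, fetching the embedded data back at runtime is only a few lines. A sketch, assuming a hypothetical resource name (the real name is the project's default namespace plus the file's path):

    using System.IO;
    using System.Reflection;

    class ResourceReadSketch
    {
        static string ReadEmbeddedText()
        {
            var asm = Assembly.GetExecutingAssembly();
            // "MyApp.Docs.Readme.txt" is a placeholder; check
            // asm.GetManifestResourceNames() for the actual names.
            using (Stream s = asm.GetManifestResourceStream("MyApp.Docs.Readme.txt"))
            using (var reader = new StreamReader(s))
            {
                return reader.ReadToEnd();
            }
        }
    }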
I was wondering what file extension those memory-mapped files have. Are they .dlls or something like that? Also, can I use such a file if I don't know its contents?
What file extension do memory mapped files have?
Memory mapped files can have any file extension. You can create a file mapping for any file.
Can I use such a file, if I don't know its contents?
Yes, you can create a file mapping for any file without knowing its contents.
These answers are so trivial that I suspect that you don't fully understand what a memory mapped file is and why they are useful. I suspect that the question you should have asked is: What is a memory mapped file?
Quote from MSDN
A memory-mapped file contains the contents of a file in virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to the memory. Starting with the .NET Framework 4, you can use managed code to access memory-mapped files in the same way that native Windows functions access memory-mapped files, as described in Managing Memory-Mapped Files in Win32 in the MSDN Library.
There are two types of memory-mapped files:
Persisted memory-mapped files
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
Non-persisted memory-mapped files
Non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).
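Both types map directly onto the .NET 4 API. A small sketch; the path, map name, and capacity are placeholders:

    using System.IO.MemoryMappedFiles;

    class TwoKindsSketch
    {
        static void Main()
        {
            // Persisted: backed by an existing file on disk.
            using (var persisted = MemoryMappedFile.CreateFromFile(@"C:\temp\big.dat"))
            {
                // Changes made through views end up in the file.
            }

            // Non-persisted: backed by the system paging file. The name lets
            // other processes open it; the data vanishes with the last handle.
            using (var shared = MemoryMappedFile.CreateNew("Demo_SharedMap", 4096))
            {
                // Shared memory for IPC.
            }
        }
    }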
A memory-mapped file is a technique provided by the operating system that allows you to access any given file as if it were a piece of memory. The OS just maps it to a portion of the operating memory available to your process.
Nothing more, nothing less. Hence, concerns regarding the file's extension and knowledge of its contents are irrelevant. That said, one would expect you to know what's in a file you are trying to work with.
The origin of the "memory mapped file" is OS/2 (the predecessor to Windows NT), where it was called a "global shared memory segment", which is in fact a more accurate term for it.
These are used to share DATA IN MEMORY across applications. Such data can be saved to disk once ALL apps that have hooks into it have exited (persistence)...sometimes needed, sometimes not.
For those that talk about reading a file into memory: yes, you could do that, but WHY would you? Do you ever need to re-read the same file? If so, get what you need and load up some variables (e.g. a configuration file).
This feature is really used for sharing DATA that is continually modified (by one or more apps) and read by one or more apps. It is much easier and quicker than using a database or reading/writing disk files.
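As a concrete illustration of that pattern, here is a rough C# sketch of two processes sharing a named mapping; the map name and layout are made up:

    using System;
    using System.IO.MemoryMappedFiles;

    class IpcSketch
    {
        // Run with the argument "writer" in one console, none in another.
        static void Main(string[] args)
        {
            if (args.Length > 0 && args[0] == "writer")
            {
                using (var mmf = MemoryMappedFile.CreateNew("Demo_Channel", 1024))
                using (var acc = mmf.CreateViewAccessor())
                {
                    acc.Write(0, 123);      // publish a value
                    Console.ReadLine();     // keep the mapping alive
                }
            }
            else
            {
                using (var mmf = MemoryMappedFile.OpenExisting("Demo_Channel"))
                using (var acc = mmf.CreateViewAccessor())
                {
                    Console.WriteLine(acc.ReadInt32(0));  // sees the writer's data
                }
            }
        }
    }

In a real exchange the two sides would also need synchronization, e.g. a named EventWaitHandle or Mutex, so the reader doesn't race the writer.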
For the game Minecraft, the general approach when running the server application is to run it in a RAMDisk, as it uses hundreds of tiny files for world generation, and I/O speed is the major bottleneck.
In a recent attempt, I tried to use Dokan/ImDisk to create a RAMDisk programmatically for the server application. Dokan was considerably slower than the average hard drive, and I was unable to get ImDisk to function properly. Since these are the only two filesystem drivers I know of that have a .NET API, I'm looking into alternatives now.
It was suggested to me previously to try memory-mapped files. My current approach is: create a RAMDisk, create a symbolic link between the game server's data folder and the RAMDisk, then launch the game server process.
Can memory-mapped files function the same way, i.e., can I create a virtual drive to which I can create a symbolic link, such as G:\Data_Files\?
Are there any other alternatives to Dokan/ImDisk with a .NET API/bindings floating around?
After looking at a bunch of solutions and doing a few benchmarks, we couldn't pass up RAMDisk from DataRam. We kicked around a bunch of the Windows driver stuff and some other freebie solutions and ultimately couldn't justify the expense compared to the tiny price tag of a commercial solution.
There are several approaches, depending on the specifics of your task.
If you need to work with the file system (i.e. via filesystem API functions and classes), and you want it to be fast, then (as I suggested in reply to your previous question) you'd need to create a RAMDisk driver. The Windows Driver Kit includes a sample driver, which (coincidence?) is named "RamDisk". Driver development, though, is tricky, and if something goes wrong with the sample, or you need to extend it, you will have to dig deep into kernel-mode development (or hire someone to do the job). Why kernel mode? Because, as you could see with Dokan, switching back to user mode to store the data causes a major slowdown.
If all you need is convenient management of a bunch of in-memory files via the Stream class (with the possibility of flushing all of this to disk), then you can use a virtual file system. Our SolFS (Application Edition) is one such product (I can also recall CodeBase File System, but they don't seem to provide an evaluation version). SolFS seems to fit your task nicely, so if you think so too, you can contact me privately (see my profile) for assistance.
To answer your questions:
No. Memory-mapped files (MMFs) are literally files on the disk (including on a virtual disk, if you have one) that are accessed not via the filesystem API but directly, using in-memory operations. MMFs tend to be faster for most file operations, which is why they are frequently mentioned.
Our Callback File System and CallbackDisk products (see the virtual storage line) are an alternative; however, as I mentioned in the first paragraph, they won't solve your problem because of the user-mode context switch.
Update:
I see no obstacle to the driver keeping a copy in memory and performing writes to disk asynchronously when needed, but this will require modifying the sample RAMDisk driver (which involves quite a lot of kernel-mode programming).
With SolFS or another virtual file system, you can keep a copy of the storage on disk as well. With a virtual file system it may turn out that working with the container file on disk gives satisfactory results (since a virtual file system usually has a memory cache), and you won't need to keep an in-memory copy at all.