C# using memory mapped files

I was wondering what file extension memory mapped files have. Are they .dlls or something like that? Another thing: can I use such a file if I don't know its contents?

What file extension do memory mapped files have?
Memory mapped files can have any file extension. You can create a file mapping for any file.
Can I use such a file, if I don't know its contents?
Yes, you can create a file mapping for any file without knowing its contents.
These answers are so trivial that I suspect you don't fully understand what a memory mapped file is and why it is useful. I suspect the question you should have asked is: what is a memory mapped file?

Quote from MSDN:

A memory-mapped file contains the contents of a file in virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to the memory. Starting with the .NET Framework 4, you can use managed code to access memory-mapped files in the same way that native Windows functions access memory-mapped files, as described in Managing Memory-Mapped Files in Win32 in the MSDN Library.

There are two types of memory-mapped files:

Persisted memory-mapped files: persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.

Non-persisted memory-mapped files: non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).
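The two kinds map directly onto the .NET 4 API: CreateFromFile produces a persisted mapping and CreateNew a non-persisted one. Here is a minimal sketch; the file name "data.bin" and the map names are placeholder values.

    using System.IO;
    using System.IO.MemoryMappedFiles;

    class MmfKinds
    {
        static void Main()
        {
            // Persisted: backed by a source file on disk.
            using (var persisted = MemoryMappedFile.CreateFromFile(
                "data.bin", FileMode.OpenOrCreate, "persistedMap", 1024))
            using (var view = persisted.CreateViewAccessor())
            {
                view.Write(0, (byte)42);  // ends up in data.bin once unmapped
            }

            // Non-persisted: backed by the system paging file; the contents
            // are lost once the last handle closes. Suitable for IPC.
            using (var transient = MemoryMappedFile.CreateNew("sharedMap", 1024))
            using (var view = transient.CreateViewAccessor())
            {
                view.Write(0, 12345);
            }
        }
    }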

A memory-mapped file is a technique provided by the operating system that allows you to access any given file as if it were a piece of memory. The OS just maps it to a portion of the operating memory available to your process.
Nothing more, nothing less. Hence concerns regarding the file's extension and knowledge of its contents are irrelevant. However, one would expect you to know what's in a file you are trying to work with.

The origin of the "memory mapped file" is OS/2 (the predecessor to Windows NT), where it was called a "global shared memory segment", which is in fact a more accurate term for it.
These are used to share DATA IN MEMORY across applications. Such data can be saved to disk once ALL apps that have hooks into it have exited (persistence); sometimes that's needed, sometimes not.
For those who talk about reading a file into memory: yes, you could do that, but WHY would you? Do you ever need to re-read the same file? If so, get what you need and load up some variables (e.g. a configuration file).
This feature is really used for sharing DATA that is continually modified (by one or more apps) and read by one or more apps. That is much easier and quicker than using a database, reading/writing to disk files, etc.
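A hedged sketch of that IPC pattern in C#, assuming a placeholder map name "AppSharedData": one process creates a named non-persisted map and keeps a value up to date, while any other process opens the same name and reads it.

    using System;
    using System.IO.MemoryMappedFiles;

    class SharedWriter
    {
        static void Main()
        {
            // CreateOrOpen lets whichever process starts first create the map.
            using (var mmf = MemoryMappedFile.CreateOrOpen("AppSharedData", 4096))
            using (var accessor = mmf.CreateViewAccessor())
            {
                accessor.Write(0, DateTime.UtcNow.Ticks);  // value other apps can read
                Console.ReadLine();  // keep the mapping alive while readers run
            }
        }
    }

    // Meanwhile, in the reading process:
    // using (var mmf = MemoryMappedFile.OpenExisting("AppSharedData"))
    // using (var accessor = mmf.CreateViewAccessor())
    // {
    //     long ticks = accessor.ReadInt64(0);
    // }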

Related

When using FileStream FileMode.Append, does it overwrite what is lying next to the file?

Let's assume that exactly 1 byte after file 1's EOF another file (file 2) starts.
If I open up file 1 and use FileStream FileMode.Append, does it overwrite file 2, or does it make another copy at a place where there is enough space?
Thanks in advance!
Edit:
For everyone after me: I forgot that you have a file system, which is split into chunks, making this question moot!
You appear to be laboring under the misapprehension that files are stored sequentially on disk, and that extending one file might overwrite parts of another file. This doesn't happen when you go via a FileStream append in C#. The operating system will write the bytes you add however it likes, wherever it likes (and it likes to not overwrite other files), which is how files end up broken into smaller chunks (and why defragging is a thing) scattered all over the disk. None of this is of any concern to you, because the OS presents those scattered file fragments as a single contiguous stream of bytes to any program that wants to read them.
Of course, if you wrote a program that bypassed the OS and performed low-level disk access, located the end of the file, and then blindly wrote more bytes into the locations after it, then you would end up damaging other files, and even the OS's carefully curated filesystem... but a .NET FileStream won't make that possible.
TLDR: add your bytes and don't worry about it. Keeping the filesystem in order is not your job.
If I open up file 1 and use FileStream FileMode.Append, does it overwrite file 2, or does it make another copy at a place where there is enough space?
Thankfully no.
Here's a brief overview of why:
Your .NET C# code does not have direct OS-level interaction.
Your code is compiled into bytecode (IL) and is executed at runtime by the .NET runtime.
During runtime your bytecode is executed by the .NET runtime, which is built mostly in a combination of C#/C/C++.
The runtime secures what it calls SafeHandles, which are wrappers around the file handles provided by what I assume is windows.h (for Win32 applications at least), or whatever OS-level provider of file handles your architecture is running on.
The runtime uses these handles to read and write data using the OS-level API.
It is the OS's job to ensure that changes to yourfile.txt, using the handle it has provided to the runtime, only affect that file.
Files are not generally stored in memory, and as such are not subject to buffer overflows.
The runtime may use a buffer in memory to... buffer your reads and writes, but that is implemented by the runtime and has no effect on the file or the operating system.
Any attempt to overflow this buffer is safeguarded by the runtime itself, and the execution of your code will stop. Even if a buffer overflow did somehow succeed on this buffer, no extra bytes would be written to the underlying handle; rather, the runtime would likely stop executing with a memory access violation or other unspecified behavior.
The handle you're given is little more than a token that the OS uses to keep track of which file you want to read or write bytes to.
If you attempt to write more bytes to a file than the architecture allows, most operating systems will have safeguards in place to end your process, close the file, or outright send an interrupt to crash the system.
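For reference, the append from the question is just this; a minimal sketch with placeholder file names. The filesystem allocates room for the new bytes wherever it likes, and file 2 is never touched.

    using System.IO;
    using System.Text;

    class AppendDemo
    {
        static void Main()
        {
            byte[] extra = Encoding.UTF8.GetBytes("appended line\n");
            // FileMode.Append positions the stream at the current end of file 1;
            // the filesystem allocates space for the new bytes wherever it likes.
            using (var fs = new FileStream("file1.txt", FileMode.Append, FileAccess.Write))
            {
                fs.Write(extra, 0, extra.Length);  // file 2 is never touched
            }
        }
    }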

What happens if computer hangs while persisting a memory-mapped file?

I'm very interested in using managed memory-mapped files available since .NET 4.0.
Check the following statement extracted from MSDN:
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
My question is: what happens if computer hangs while persisting a memory-mapped file?
I mean, since memory-mapped files are stored in virtual memory (I understand that this means the page file), maybe the file could be recovered from virtual memory and written to the source file again after restarting Windows.
The data pages that underlie a memory-mapped file reside in the OS file cache. Whenever you shut down Windows, it writes all modified cache pages to the file system.
The pages in the cache are either ordinary file data (from processes doing reads/writes from/to files) or memory-mapped pages, which are read/written by the paging system.
If Windows is unable to flush the cache contents to disk (e.g. it crashes or freezes), then that data is lost.
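One way to narrow that window is not to wait for unmap or shutdown at all: MemoryMappedViewAccessor.Flush() asks the OS to write the modified pages to the backing file immediately. A minimal sketch, with a placeholder file name:

    using System.IO;
    using System.IO.MemoryMappedFiles;

    class FlushDemo
    {
        static void Main()
        {
            using (var mmf = MemoryMappedFile.CreateFromFile(
                "data.bin", FileMode.OpenOrCreate, null, 4096))
            using (var accessor = mmf.CreateViewAccessor())
            {
                accessor.Write(0, 123456789L);
                // Ask the OS to write the dirty pages to data.bin now, instead
                // of waiting until the last process unmaps the file.
                accessor.Flush();
            }
        }
    }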
If persistence is enabled, the memory-mapped file is not removed after a reboot.
You can use an atomic update with a flag that shows whether the data is valid: if it is valid you can restore it, otherwise the data is lost.
If your OS supports it (kernel or filesystem lifetime), as Unix does, you can use shared memory with synchronisation, which is faster than a mapped file.
From Modern Operating Systems, 3rd edition (2007), on memory-mapped files:

Shared libraries are really a special case of a more general facility called memory-mapped files. The idea here is that a process can issue a system call to map a file onto a portion of its virtual address space. In most implementations, no pages are brought in at the time of the mapping, but as pages are touched, they are demand paged in one at a time, using the disk file as the backing store. When the process exits, or explicitly unmaps the file, all the modified pages are written back to the file.

Mapped files provide an alternative model for I/O. Instead of doing reads and writes, the file can be accessed as a big character array in memory. In some situations, programmers find this model more convenient.

If two or more processes map onto the same file at the same time, they can communicate over shared memory. Writes done by one process to the shared memory are immediately visible when the other one reads from the part of its virtual address space mapped onto the file. This mechanism thus provides a high-bandwidth channel between processes and is often used as such (even to the extent of mapping a scratch file). Now it should be clear that if memory-mapped files are available, shared libraries can use this mechanism.
From http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html:
Shared Memory
POSIX defines a shared memory object as "An object that represents memory that can be mapped concurrently into the address space of more than one process."
Shared memory is similar to file mapping, and the user can map several regions of a shared memory object, just like with memory mapped files. In some operating systems, like Windows, shared memory is a special case of file mapping, where the file mapping object accesses memory backed by the system paging file. However, in Windows, the lifetime of this memory ends when the last process connected to the shared memory object closes its connection or the application exits, so there is no data persistence. If an application creates shared memory, fills it with data and exits, the data is lost. This lifetime is known as process lifetime.
In POSIX operating systems the shared memory lifetime is different since for semaphores, shared memory, and message queues it's mandatory that the object and its state (including data, if any) is preserved after the object is no longer referenced by any process. Persistence of an object does not imply that the state of the object is preserved after a system crash or reboot, but this can be achieved since shared memory objects can actually be implemented as mapped files of a permanent file system. The shared memory destruction happens with an explicit call to unlink(), which is similar to the file destruction mechanism. POSIX shared memory is required to have kernel lifetime (the object is explicitly destroyed or it's destroyed when the operating system reboots) or filesystem persistence (the shared memory object has the same lifetime as a file).
This lifetime difference is important to achieve portability. Many portable runtimes have tried to achieve perfect portability between Windows and POSIX shared memory, but the author of this paper has not seen any satisfactory effort. Adding a reference count to POSIX shared memory is effective only as long as a process does not crash, something that is very common. Emulating POSIX behaviour in Windows using native shared memory is not possible, since we could try to dump shared memory to a file to obtain persistence, but a process crash would prevent persistence. The only viable alternative is to use memory mapped files in Windows simulating shared memory, but avoiding file-memory synchronization as much as possible.
Many other named synchronization primitives (like named mutexes or semaphores) suffer the same lifetime portability problem. Automatic shared memory cleanup is useful in many contexts, like shared libraries or DLL-s communicating with other DLL-s or processes. Even when there is a crash, resources are automatically cleaned up by the operating systems. POSIX persistence is also useful when a launcher program can create and fill shared memory that another process can read or modify. Persistence also allows data recovery if a server process crashes. All the data is still in the shared memory, and the server can recover its state.
This paper proposes POSIX lifetime (kernel or filesystem lifetime) as a more portable solution, but has no strong opinion about this. The C++ committee should take into account the use cases of both approaches to decide which behaviour is better or if both options should be available, forcing the modification of both POSIX and Windows systems.

How does the WriteFile function interact with other data on disk?

I am using the WriteFile function to write sectors on a disk. How does WriteFile interact with other data on the drive or disk? How can I write a file without accidentally overwriting another file? And is it possible that the operating system could accidentally overwrite my file?
When you write directly to the disk you are bypassing the file system completely. Unless you re-implement the functionality required to read and respect the file system then you can expect to destroy the disk. You will likely not only write over other files, but it is likely that you will overwrite the meta data – that is the information that describes the structure and location of the directories and files, their attributes and so on.
If the disk already contains a functioning file system and you don't want to disturb that then there is practically no scenario that I can imagine where it makes sense to write to the disk directly. If you want to write files to the disk, do just that. I suspect that you have made a mistake in your reasoning somewhere that has led you to believe that you should be writing directly to the disk.
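For illustration, writing through the filesystem looks like this; a minimal sketch in C# (the path is a placeholder), shown only to contrast with raw sector access:

    using System.IO;

    class SafeWrite
    {
        static void Main()
        {
            byte[] payload = new byte[512];  // one sector's worth of data
            // The OS and filesystem decide where on the disk these bytes live;
            // no other file can be overwritten this way.
            File.WriteAllBytes(@"C:\temp\mydata.bin", payload);
        }
    }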
Do you really write sectors on the disk and not to a file on the disk? Some background information would have been great, because if you are really writing to the raw disk surface instead of writing to a file on the disk, through the operating system, using higher-level functions such as fopen(), fwrite(), or even higher than that, then you should be doing it for a good reason. Might we inquire as to what that reason is?
If you are writing disk sectors with no regards as to what filesystem the disk has, then all bets are off. Supposing that the operating system allows that, there's nothing to protect you from overwriting critical disk data or from the OS overwriting your files.
I've done that to access numbered sectors on an embedded system whose only contact to the "outside world" (the PC) was a custom hacked USB mass storage interface. And even then I broke cold sweat every time I had to do it - If my code would have accidentally written to the hard disk of the PC, it would have probably been good-bye to the OS installation on the hard disk and all the files written to it.

Multiple FileStreams for reading/writing different sections of same file concurrently in independent threads

I have a large disk file (around 8 GB) containing several million records that I need to read, process in memory, and write back to another file. All the records are of a fixed length (say, 100 bytes).
I was thinking of parallelizing my process to run on multiple threads (typically 4–8), each of which would be (exclusively) assigned a particular section of the file to process (for example, 1 GB chunks). Since each thread would restrict its reads and writes to the section of the file it has been assigned, there is no risk of race hazards from my code.
Am I allowed to initialize multiple threads, each with its own FileStream, for reading from / writing to the same file without locking, without risking corruption? Assume that the target file has been expanded to its full size in advance (using FileStream.SetLength), and that the appropriate FileShare flag is specified when opening each FileStream.
Also, would I risk incurring slow-downs due to loss of buffering if multiple threads access the same file simultaneously? I am concerned about the “Detection of stream position changes” section in the MSDN documentation on the FileStream class, which states:
When a FileStream object does not have an exclusive hold on its handle, another thread could access the file handle concurrently and change the position of the operating system's file pointer that is associated with the file handle. […]
If an unexpected change in the handle position is detected in a call to the Read method, the .NET Framework discards the contents of the buffer and reads the stream from the file again. This can affect performance, depending on the size of the file and any other processes that could affect the position of the file stream.
Would this apply in my case, or are the file handles created by FileStream instances distinct and independent, even if accessing the same file?
This is perfectly safe.
There is no risk of the problem mentioned in the MSDN article as it only applies if you make changes to underlying handle yourself. You are not accessing the handle at all.
You will notice random disk IO, though, which can destroy performance. You probably want to mitigate this by reading big chunks from the file (16 MB or so) and using a lock to prevent concurrent Read and Write calls. Note that you need to prevent concurrent calls even on different FileStream instances, because IOs are not treated atomically by the OS. Internally they get split into small sizes to allow for fairness and predictable latency. This leads to random IO.
Why don't you just create one reader thread pushing buffers into a BlockingCollection? You can process that collection using Parallel.ForEach on multiple threads.
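A hedged sketch of that single-reader pipeline, assuming a placeholder input file and the 16 MB chunk size suggested above:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    class PipelineDemo
    {
        const int ChunkSize = 16 * 1024 * 1024;  // 16 MB chunks, as suggested above

        static void Main()
        {
            // Bounded so the reader cannot run arbitrarily far ahead of the workers.
            var chunks = new BlockingCollection<byte[]>(boundedCapacity: 8);

            // Single reader thread: purely sequential IO, no seek contention.
            var reader = Task.Run(() =>
            {
                using (var fs = File.OpenRead("input.dat"))  // placeholder path
                {
                    var buffer = new byte[ChunkSize];
                    int read;
                    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        var chunk = new byte[read];
                        Array.Copy(buffer, chunk, read);
                        chunks.Add(chunk);
                    }
                }
                chunks.CompleteAdding();
            });

            // Several worker threads consume and process the chunks.
            Parallel.ForEach(chunks.GetConsumingEnumerable(), chunk =>
            {
                // process the fixed-length 100-byte records inside 'chunk' here
            });

            reader.Wait();
        }
    }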
"A memory-mapped file maps the contents of a file to an application’s logical address space. Memory-mapped files enable programmers to work with extremely large files because memory can be managed concurrently, and they allow complete, random access to a file without the need for seeking. Memory-mapped files can also be shared across multiple processes.
The CreateFromFile methods create a memory-mapped file from a specified path or a FileStream of an existing file on disk. Changes are automatically propagated to disk when the file is unmapped.
The CreateNew methods create a memory-mapped file that is not mapped to an existing file on disk; and are suitable for creating shared memory for interprocess communication (IPC).
A memory-mapped file is associated with a name.
You can create multiple views of the memory-mapped file, including views of parts of the file. You can map the same part of a file to more than one address to create concurrent memory. For two views to remain concurrent, they have to be created from the same memory-mapped file. Creating two file mappings of the same file with two views does not provide concurrency."
http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx
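A minimal sketch of the multiple-views point, assuming a placeholder file name: each worker thread from the question could own one view accessor covering only its assigned section.

    using System.IO;
    using System.IO.MemoryMappedFiles;

    class ViewsDemo
    {
        static void Main()
        {
            // Assumes big.dat already exists and is at least 2 MB; capacity 0
            // means "use the file's current size".
            using (var mmf = MemoryMappedFile.CreateFromFile(
                "big.dat", FileMode.Open, null, 0))
            // Each thread could own one view covering only its assigned section.
            using (var first = mmf.CreateViewAccessor(0, 1024 * 1024))
            using (var second = mmf.CreateViewAccessor(1024 * 1024, 1024 * 1024))
            {
                first.Write(0, (byte)1);   // byte 0 of the file
                second.Write(0, (byte)2);  // byte 1,048,576 of the file
            }
        }
    }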

C# memory mapped file read

I am trying to read a few text files (around 300 KB each). Until now I've been using a FileStream to open and read the files (tab-delimited). However, I heard about memory-mapped files in .NET 4.0. Would they make my reads any faster?
Is there any sample code that reads a simple file and compares performance?
If the files are on disk and just need to be read into memory, then using a memory mapped file will not help at all, as you still need to read them from disk.
If all you are doing is reading the files, there is no point in memory mapping them.
Memory mapped files are for use when you are doing intensive work with the file (reading, writing, changing) and want to avoid the disk IO.
If you're just reading once then memory-mapped files don't make sense; it still takes the same amount of time to load the data from disk. Memory-mapped files excel when many random reads and/or writes must be performed on a file since there's no need to interrupt the read or write operations with seek operations.
With your amount of data, MMFs don't give any advantage. In general, however, if one bothers to carry out the tests, one will find that copying large (huge) files using MMFs is faster than calling ReadFile/WriteFile sequentially. This is caused by the different mechanisms used internally in Windows for MMF management and for file IO.
Processing data in memory is always faster than doing something similar via disk IO. If your processing is sequential and the data easily fits into memory, you can use File.ReadLines() to get the data line by line and process it quickly without a heavy memory overhead. Here's an example: How to open a large text file in C#
Check this answer too: When to use memory-mapped files?
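A minimal sketch of that File.ReadLines() approach for the tab-delimited files in the question (the path is a placeholder):

    using System.IO;

    class ReadLinesDemo
    {
        static void Main()
        {
            // File.ReadLines streams the file lazily, one line at a time.
            foreach (string line in File.ReadLines("data.tsv"))
            {
                string[] fields = line.Split('\t');  // tab-delimited, per the question
                // process 'fields' here
            }
        }
    }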
A memory-mapped file is not recommended for reading text files. For reading text files you are doing it right with FileStream. MMFs are best for reading binary data.
