I am using the WriteFile function to write sectors on a disk. How does WriteFile interact with other data on the drive or disk? How can I write a file without accidentally destroying another file? And is it possible that the operating system could accidentally destroy my file?
When you write directly to the disk you are bypassing the file system completely. Unless you re-implement the functionality required to read and respect the file system, you can expect to destroy the contents of the disk. You will likely not only write over other files, but also overwrite the metadata – that is, the information that describes the structure and location of the directories and files, their attributes, and so on.
If the disk already contains a functioning file system and you don't want to disturb that then there is practically no scenario that I can imagine where it makes sense to write to the disk directly. If you want to write files to the disk, do just that. I suspect that you have made a mistake in your reasoning somewhere that has led you to believe that you should be writing directly to the disk.
Do you really write sectors on the disk, and not to a file on the disk? Some background information would have been great, because if you are really writing to the raw disk surface instead of writing to a file on the disk – through the operating system, using higher-level functions such as fopen(), fwrite(), or even higher-level than that – then you should be doing it for a good reason. Might we inquire as to what that reason is?
If you are writing disk sectors with no regard for what file system the disk has, then all bets are off. Supposing that the operating system allows it, there is nothing to protect you from overwriting critical disk data, or to protect your files from being overwritten by the OS.
I've done that to access numbered sectors on an embedded system whose only contact with the "outside world" (the PC) was a custom hacked USB mass storage interface. And even then I broke out in a cold sweat every time I had to do it – if my code had accidentally written to the hard disk of the PC, it would probably have been good-bye to the OS installation and all the files on it.
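For completeness, here is a minimal sketch of how raw sector access is typically done from C# via P/Invoke. The drive path and 512-byte sector size are assumptions; this opens the drive read-only, so it cannot damage anything, but it still requires administrator rights:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class RawRead
{
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(string fileName, uint access, uint share,
        IntPtr security, uint creation, uint flags, IntPtr template);

    const uint GENERIC_READ = 0x80000000;
    const uint FILE_SHARE_READ = 0x1, FILE_SHARE_WRITE = 0x2;
    const uint OPEN_EXISTING = 3;

    static void Main()
    {
        // \\.\PhysicalDrive0 is the first physical disk; open it read-only
        using var handle = CreateFile(@"\\.\PhysicalDrive0", GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero);
        if (handle.IsInvalid)
            throw new IOException("Open failed; run as administrator.");

        using var stream = new FileStream(handle, FileAccess.Read);
        var sector = new byte[512];               // raw reads must be sector-aligned
        stream.Read(sector, 0, sector.Length);    // sector 0: the MBR / GPT protective MBR
    }
}
```

Opening the drive for writing (GENERIC_WRITE) is where the warnings above apply: nothing in this code knows or cares about the file system on that disk.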
Let's assume that exactly 1 byte after file 1's EOF another file (file 2) starts.
If I open up file 1 and use a FileStream with FileMode.Append, does it overwrite file 2, or does it make another copy at a place where there is enough space?
Thanks in advance!
Edit:
For everyone who comes after me: I forgot that you have a file system, which allocates files in chunks – making this question moot!
You appear to be laboring under the misapprehension that files are stored sequentially on disk, and that extending one file might overwrite parts of another file. This doesn't happen when you go via a FileStream append in C#. The operating system will write the bytes you add however it likes, wherever it likes (and it likes to not overwrite other files), which is how files end up broken into smaller chunks scattered all over the disk (and why defragging is a thing). None of this is of any concern to you, because the OS presents those scattered file fragments as a single contiguous stream of bytes to any program that wants to read them.
Of course, if you wrote a program that bypassed the OS and performed low-level disk access, located the end of the file and then blindly wrote more bytes into the locations after it, then you would end up damaging other files, and even the OS's carefully curated file system... but a .NET FileStream won't make that possible.
TL;DR: add your bytes and don't worry about it. Keeping the file system in order is not your job.
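To make that concrete, a normal append looks like this (the file name and payload are just examples); the OS allocates new clusters wherever it likes, so neighbouring files cannot be overwritten:

```csharp
using System.IO;
using System.Text;

byte[] extra = Encoding.UTF8.GetBytes("more data");
using (var fs = new FileStream("file1.txt", FileMode.Append, FileAccess.Write))
{
    fs.Write(extra, 0, extra.Length);  // the file system finds space for these bytes
}
```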
If I open up file 1 and use a FileStream with FileMode.Append, does it overwrite file 2, or does it make another copy at a place where there is enough space?
Thankfully no.
Here's a brief overview why:
Your .NET C# code does not interact with the OS directly.
Your code is compiled into byte-code (IL), which the .NET runtime compiles to machine code at execution time.
During execution your code is hosted by the .NET runtime, which is built mostly in a combination of C#/C/C++.
The runtime obtains what it calls SafeHandles, which are wrappers around the file handles provided by the Win32 API (windows.h, for Windows at least), or by whatever OS-level provider for file handles your platform uses.
The runtime uses these handles to read and write data through the OS-level API.
It is the OS's job to ensure that changes to yourfile.txt, made through the handle it has provided to the runtime, only affect that file.
Files are not generally stored wholesale in memory, and as such are not subject to buffer overflows.
The runtime may use a buffer in memory to... buffer your reads and writes, but that is implemented by the runtime and has no effect on the file or the operating system.
Any attempt to overflow this buffer is guarded against by the runtime itself, and the execution of your code will stop. Even if a buffer overflow somehow succeeded, no extra bytes would be written to the underlying handle; rather, the runtime would most likely stop with a memory access violation or other undefined behavior.
The handle you're given is little more than a token that the OS uses to keep track of which file you want to read or write bytes to.
If you attempt to write more bytes to a file than the architecture allows, most operating systems have safeguards in place to end your process or close the file.
Is there a way, in C# on Windows, to write a file and flag it as temporary so that the operating system won't bother physically writing it to disk? The file in question is small, and will be read by another program in the very near future, and deleted and rewritten very soon after that, so that keeping it purely in RAM would make sense.
I'm currently writing the file with File.WriteAllText but could use a different API if necessary.
That's simple and could be done without any P/Invoke:
// create the file and release the handle when done
using (var file = File.Create(path)) { /* write your data here */ }
// mark it temporary so the cache manager avoids flushing it to disk
File.SetAttributes(path, File.GetAttributes(path) | FileAttributes.Temporary);
The FileAttributes.Temporary enum value is the same as FILE_ATTRIBUTE_TEMPORARY constant (0x100):
Specifying the FILE_ATTRIBUTE_TEMPORARY attribute causes file systems
to avoid writing data back to mass storage if sufficient cache memory
is available, because an application deletes a temporary file after a
handle is closed. In that case, the system can entirely avoid writing
the data. Although it does not directly control data caching in the
same way as the previously mentioned flags, the
FILE_ATTRIBUTE_TEMPORARY attribute does tell the system to hold as
much as possible in the system cache without writing and therefore may
be of concern for certain applications.
source
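If you would rather keep the File.WriteAllText call from the question, the attribute can simply be applied after the write (a sketch; path is whatever location you already use):

```csharp
using System.IO;

File.WriteAllText(path, "small payload");
File.SetAttributes(path, File.GetAttributes(path) | FileAttributes.Temporary);
```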
I was wondering what file extension those memory-mapped files have. Are they .dlls or something like that? Another thing: can I use such a file if I don't know its contents?
What file extension do memory mapped files have?
Memory mapped files can have any file extension. You can create a file mapping for any file.
Can I use such a file, if I don't know its contents?
Yes, you can create a file mapping for any file without knowing its contents.
These answers are so trivial that I suspect that you don't fully understand what a memory mapped file is and why they are useful. I suspect that the question you should have asked is: What is a memory mapped file?
Quote from MSDN
A memory-mapped file contains the contents of a file in virtual
memory. This mapping between a file and memory space enables an
application, including multiple processes, to modify the file by
reading and writing directly to the memory. Starting with the .NET
Framework 4, you can use managed code to access memory-mapped files in
the same way that native Windows functions access memory-mapped files,
as described in Managing Memory-Mapped Files in Win32 in the MSDN
Library.
There are two types of memory-mapped files:
Persisted memory-mapped files
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
Non-persisted memory-mapped files
Non-persisted files are memory-mapped files that are not associated
with a file on a disk. When the last process has finished working with
the file, the data is lost and the file is reclaimed by garbage
collection. These files are suitable for creating shared memory for
inter-process communications (IPC).
A memory-mapped file is a technique provided by the operating system that allows you to access any given file as if it were a piece of memory. The OS just maps it to a portion of the operating memory available to your process.
Nothing more, nothing less. Hence concerns regarding the file's extension and knowledge of its contents are irrelevant. However, one would expect you to know what's in a file you are trying to work with.
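A minimal persisted mapping in C# looks like this (the file name is an example; it works for any file regardless of extension):

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

using var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.Open);
using var accessor = mmf.CreateViewAccessor();
byte first = accessor.ReadByte(0);   // reads the file's first byte through memory
```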
The origin of a "memory mapped file" is from OS/2 (the predecessor to Windows NT) where it was called, "global shared memory segment" which in fact is a more accurate term for it.
These are used to share DATA IN MEMORY across applications. Such data can be saved to disk once ALL apps that have hooks into it have exited (persistence)... sometimes needed, sometimes not.
For those that talk about reading a file into memory: yes, you could do that, but WHY would you? Do you ever need to re-read the same file? If so, get what you need once and load up some variables (e.g. from a configuration file).
This feature is really used for sharing DATA that is continually modified (by one or more apps) and read by one or more apps. Much easier and quicker to do than using a database, reading/writing to disk files, etc.
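For that IPC scenario, a non-persisted mapping is created by name rather than from a file; the map name here is an assumption, and any process that knows it can open the same region:

```csharp
using System.IO.MemoryMappedFiles;

// writer process: create 4 KB of shared memory backed by the page file
using var shared = MemoryMappedFile.CreateNew("MyAppSharedRegion", 4096);
using var view = shared.CreateViewAccessor();
view.Write(0, 42);

// a reader process would use:
// using var same = MemoryMappedFile.OpenExisting("MyAppSharedRegion");
```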
My program should write hundreds of files to disk, received from external resources (the network).
Each file is a simple document that I currently store with a GUID as its name in a specific folder, but creating, writing and closing hundreds of files is a lengthy process.
Is there a better way to store this number of files to disk?
I've come to a solution, but I don't know if it is the best.
First, I create two files: one of them acts like an allocation table, and the second one is a huge file storing all the content of my documents. But reading from this file would be a nightmare; maybe a memory-mapped file technique could help. Could working with 30 GB or more create a problem?
Edit: What is the fastest way to store 1000 text files on disk? (The write operation is performed frequently.)
This is similar to how Subversion stores its repositories on disk. Each revision in the repository is stored as a file, and the repository uses a folder for each 1000 revisions. This seems to perform rather well, except there is a good chance for the files to either become fragmented or be located further apart from each other. Subversion allows you to pack each 1000-revision folder into a single file (but this works nicely since the revisions are not modified once created).
If you plan on modifying these documents often, you could consider using an embedded database to manage the solid file for you (Firebird is a good one that doesn't have any size limitations). This way you don't have to manage the growth and organization of the files yourself (which can get complicated when you start modifying files inside the solid file). This will also help with the issues of concurrent access (reading / writing) if you use a separate service / process to manage the database and communicate with it. The new version of Firebird (2.5) supports multiple process access to a database even when using an embedded server. This way you can have multiple accesses to your file storage without having to run a database server.
The first thing you should do is profile your app. In particular you want to get the counters around Disk Queue Length. Your queue length shouldn't be any more than 1.5 to 2 times the number of disk spindles you have.
For example, if you have a single-disk system, then the queue length shouldn't go above 2. If you have a RAID array with 3 disks, it shouldn't go above 6.
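The same counter the profiler shows can be sampled from code on Windows (the counter and instance names below are the English ones and may be localized):

```csharp
using System.Diagnostics;
using System.Threading;

var counter = new PerformanceCounter("PhysicalDisk", "Avg. Disk Queue Length", "_Total");
counter.NextValue();                 // the first sample always returns 0
Thread.Sleep(1000);                  // let the counter accumulate over an interval
float queueLength = counter.NextValue();
```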
Verify that you are indeed write-bound. If so, then the best way to speed up massive writes is to buy disks with very fast write performance. Note that some RAID setups (such as RAID 5, which must compute parity) will result in decreased write performance.
If write performance is critical, then spreading the storage across multiple drives could work. Of course, you would have to take this into consideration for any app that needs to read that information. And you'll still have to buy fast drives.
Note that not all drives are created equal and some are better suited for high performance than others.
What about using the ThreadPool for that?
I.e. for each received "file", enqueue a write function in a thread pool thread that actually persists the data to a file on disk.
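A sketch of that approach; the folder name, file extension, and the way the data arrives are assumptions:

```csharp
using System;
using System.IO;
using System.Threading;

void OnFileReceived(byte[] data)
{
    // hand the blocking disk write off to a thread pool thread
    ThreadPool.QueueUserWorkItem(_ =>
    {
        string path = Path.Combine("incoming", Guid.NewGuid() + ".doc");
        File.WriteAllBytes(path, data);
    });
}
```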
I am trying to read a few text files (around 300 KB each). Until now I've been using a FileStream to open and read them (tab-delimited). However, I heard about memory-mapped files in .NET 4.0. Would they make my reads any faster?
Is there any sample code that does the read of a simple file and compare performance ?
If the files are on disk and just need to be read into memory, then using a memory mapped file will not help at all, as you still need to read them from disk.
If all you are doing is reading the files, there is no point in memory mapping them.
Memory mapped files are for use when you are doing intensive work with the file (reading, writing, changing) and want to avoid the disk IO.
If you're just reading once then memory-mapped files don't make sense; it still takes the same amount of time to load the data from disk. Memory-mapped files excel when many random reads and/or writes must be performed on a file since there's no need to interrupt the read or write operations with seek operations.
With your amount of data, MMFs don't give any advantage. In general, however, if one bothers to run the tests, one will find that copying large (huge) files using MMFs is faster than calling ReadFile/WriteFile sequentially. This is caused by the different mechanisms Windows uses internally for MMF management and for file IO.
Processing data in memory is always faster than doing something similar via disk IO. If your processing is sequential and the data easily fits into memory, you can use File.ReadLines() to get the data line by line and process it quickly without heavy memory overhead. Here's an example: How to open a large text file in C#
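For the tab-delimited files from the question, that looks roughly like this (file name assumed):

```csharp
using System.IO;

foreach (string line in File.ReadLines("big.txt"))   // streams one line at a time
{
    string[] fields = line.Split('\t');              // tab-delimited
    // process fields here...
}
```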
Check this answer too: When to use memory-mapped files?
Memory-mapped files are not recommended for reading text files; you are doing it right with FileStream. MMFs are best for reading binary data.