Performance-wise, is it wrong to embed a file in the resource section of a DLL?
This might seem silly, but I am trying to embed some info inside the DLL which can later be fetched by some methods, in case the whole solution and documentation is lost and we have only the DLL.
What are the downsides of doing such a thing?
Is it suggested or prohibited?
Embedded resources are done very efficiently. Under the hood, it uses the demand paged virtual memory capabilities of the operating system. The exact equivalent of a memory-mapped file. In other words, the resource is directly accessible in memory. And you don't pay for the resource until you start using it. The first access to the resource forces it to be read from the file and copied into RAM. And is very cheap to unmap again, the operating system can simply discard the page. There is no way to make it more efficient.
The other side of the coin is that it is permanently mapped into virtual memory. In other words, your process loses the address space occupied by the resource. You'll run out of available address space more quickly, so an OutOfMemoryException becomes more likely.
This is not something you normally worry about until you gobble up, say, half a gigabyte in a 32-bit process. And don't fret about it at all in a 64-bit process.
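For completeness, here is a minimal sketch of reading such an embedded resource back at runtime, assuming a managed resource added with the Embedded Resource build action; the resource name "MyLib.Documentation.txt" is a placeholder:

    using System;
    using System.IO;
    using System.Reflection;

    class ResourceReader
    {
        static void Main()
        {
            // "MyLib.Documentation.txt" is a hypothetical resource name; call
            // Assembly.GetManifestResourceNames() to discover the real one.
            // For a resource embedded in a DLL, use the assembly that contains it
            // (e.g. typeof(SomeTypeInThatDll).Assembly) instead.
            Assembly assembly = Assembly.GetExecutingAssembly();
            using (Stream stream = assembly.GetManifestResourceStream("MyLib.Documentation.txt"))
            {
                if (stream == null)
                {
                    Console.WriteLine("Resource not found.");
                    return;
                }
                using (var reader = new StreamReader(stream))
                {
                    // The pages backing the resource are only faulted in as they are read.
                    Console.WriteLine(reader.ReadToEnd());
                }
            }
        }
    }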
Let's assume that exactly 1 byte after the File-1-EOF another file (file2) starts.
If I open up file 1 and use a FileStream with FileMode.Append, does it overwrite file2, or does it make another copy at a place where there is enough space?
Thanks and regards!
Edit:
For everyone after me: I forgot that there is a file system, which allocates storage in chunks, making this question moot!
You appear to be laboring under the misapprehension that files are stored sequentially on disk, and that extending one file might overwrite parts of another file. This doesn't happen when you go via a FileStream append in C#. The operating system will write the bytes you add however it likes, wherever it likes (and it likes to not overwrite other files), which is how files end up broken into smaller chunks (and why defragging is a thing) scattered all over the disk. None of this is of any concern to you, because the OS presents those scattered file fragments as a single contiguous stream of bytes to any program that wants to read them.
Of course, if you wrote a program that bypassed the OS and performed low-level disk access, located the end of the file and then blindly wrote more bytes into the locations after it, then you would end up damaging other files, and even the OS's carefully curated filesystem... but a .NET file stream won't make that possible.
TLDR; add your bytes and don't worry about it. Keeping the filesystem in order is not your job.
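As a tiny illustration (the file name is a placeholder), appending via a FileStream just grows file1 logically and lets the file system place the new bytes wherever it likes:

    using System.IO;
    using System.Text;

    class AppendExample
    {
        static void Main()
        {
            // "file1.dat" is a placeholder name. FileMode.Append opens the file
            // (creating it if needed) and positions the stream at its end.
            using (var stream = new FileStream("file1.dat", FileMode.Append, FileAccess.Write))
            {
                byte[] data = Encoding.UTF8.GetBytes("appended bytes");
                stream.Write(data, 0, data.Length);
                // The file system allocates new clusters wherever it likes;
                // no neighbouring file is ever overwritten.
            }
        }
    }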
If I open up file 1 and use a FileStream with FileMode.Append, does it overwrite file2, or does it make another copy at a place where there is enough space?
Thankfully no.
Here's a brief overview why:
Your .NET C# code does not have direct OS level interaction.
Your code is compiled into byte-code (IL), which is JIT-compiled at runtime by the .NET runtime.
During runtime your byte-code is executed by the .NET runtime, which is built mostly in a combination of C#, C and C++.
The runtime secures what it calls SafeHandles, which are wrappers around the file handles provided by what I can assume is Windows.h (for Win32 applications at least), or whatever OS-level provider of file handles your architecture is running on.
The runtime uses these handles to read and write data using the OS level API.
It is the OS's job to ensure that changes to yourfile.txt, made through the handle it has provided to the runtime, only affect that file.
Files are not generally stored in memory, and as such are not subject to buffer overflows.
The runtime may use a buffer in memory to, well, buffer your reads and writes, but that is implemented by the runtime and has no effect on the file or the operating system.
Any attempt to overflow this buffer is safeguarded against by the runtime itself and the execution of your code will stop. Even if a buffer overflow did somehow succeed on this buffer, no extra bytes would be written to the underlying handle; rather, the runtime would most likely stop executing with a memory access violation or other unspecified behavior.
The handle you're given is little more than a token that the OS uses to keep track of which file you want to read or write bytes to.
If you attempt to write more bytes to a file than the architecture allows, most operating systems will have safeguards in place to end your process, close the file, or outright send an interrupt to crash the system.
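To make the SafeHandle point concrete, a FileStream exposes the wrapper it uses; this sketch only shows that your code ever touches a managed handle, never a raw disk location (the file name is a placeholder):

    using System;
    using System.IO;
    using Microsoft.Win32.SafeHandles;

    class HandleExample
    {
        static void Main()
        {
            using (var stream = new FileStream("yourfile.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
            {
                // The SafeFileHandle wraps the OS-level file handle; the OS, not your
                // code, maps reads and writes made through it onto physical disk blocks.
                SafeFileHandle handle = stream.SafeFileHandle;
                Console.WriteLine("Handle valid: " + !handle.IsInvalid);
            }
        }
    }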
I want to use 2 C# apps to communicate with each other through memory. For the host side I used:
int* ptr = &level;
And for the client side I want to use:
ReadProcessMemory((int)_handle, <returned result of int* ptr>, _buffer, 2, ref bytesRead);
But ReadProcessMemory doesn't work. For example: level is set to 3, but ReadProcessMemory returns 0. What is going wrong here? (NOTE: the "level" field is not cleared from memory.)
I have tried int* ptr lots of times, because lots of websites tell me to do that, but it doesn't work well with ReadProcessMemory.
I set level = 3, but ReadProcessMemory reads level as 0.
What you ask, in the way you ask it, is pretty dangerous, as the process is entirely managed by the CLR. Depending on what you want to share and how, you could consider sockets or pipes instead.
Alternatively, you could use interop, but it requires a certain expertise and tinkering in my opinion.
The cleanest way for two C# applications to communicate via memory is to use memory-mapped files. Messing with the memory of a managed process can lead to subtle issues. Memory-mapped files are a well-supported way to share information.
Keep in mind that a memory-mapped file may be mapped at a different address in each process, therefore you need to structure its contents without the use of absolute pointers.
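A minimal sketch of that approach, assuming a named map called "SharedLevel" and a single Int32 payload (both are illustrative choices):

    using System;
    using System.IO.MemoryMappedFiles;

    class SharedMemoryDemo
    {
        // Writer process: creates a named map and stores the value.
        static void WriteLevel(int level)
        {
            // "SharedLevel" is a hypothetical map name; both processes must agree on it.
            // A non-persisted named map lives only while at least one handle is open.
            using (var mmf = MemoryMappedFile.CreateNew("SharedLevel", sizeof(int)))
            using (var accessor = mmf.CreateViewAccessor())
            {
                accessor.Write(0, level);   // write a single Int32 at offset 0
                Console.ReadLine();         // keep the mapping alive while the reader runs
            }
        }

        // Reader process: opens the same named map and reads the value back.
        static int ReadLevel()
        {
            using (var mmf = MemoryMappedFile.OpenExisting("SharedLevel"))
            using (var accessor = mmf.CreateViewAccessor())
            {
                return accessor.ReadInt32(0);   // offsets are relative, never absolute pointers
            }
        }
    }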
Edit:
Direct raw memory access requires knowing the exact address to read in the target process's address space, as the virtual addresses allocated in the target process are unrelated to (and possibly overlapping with) those of the source process. C# applications are hosted by the Common Language Runtime, which is in control of everything, including memory allocation. In particular, a standard C# application does not manage a single bit of its own memory: as the runtime moves objects around as part of the normal application lifetime, their addresses change over time.
If you are in control of the target application, you can pin the object (via a pinned GCHandle) to forbid movement; then you have to take the address of the object and pass it to the other process. The other process must then open the target process for reading and read the memory at the address it was given.
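In the target application, pinning could look roughly like this; wrapping level in an int[] and the rest of the wiring are assumptions for illustration:

    using System;
    using System.Runtime.InteropServices;

    class PinningExample
    {
        static int[] level = { 3 };   // wrapped in an array so it is a pinnable, GC-tracked object

        static void Main()
        {
            // Pin the object so the GC cannot move it while another process reads it.
            GCHandle handle = GCHandle.Alloc(level, GCHandleType.Pinned);
            try
            {
                IntPtr address = handle.AddrOfPinnedObject();
                // This address (in this process's address space) is what you would
                // have to hand to the reading process for its ReadProcessMemory call.
                Console.WriteLine("Pinned address: 0x" + address.ToInt64().ToString("X"));
                Console.ReadLine();   // keep the object pinned while the reader works
            }
            finally
            {
                handle.Free();
            }
        }
    }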
What you ask requires cooperating processes and a lot of low-level knowledge, and in the end you also might never be able to read updated memory changes, as the CLR might not write the values back to memory (have a look at volatile for this).
It is certainly exciting to write such software, but when you are in control of both applications, there are cleaner and much more reliable ways to achieve your goal.
As a side note, this technique is used by trainers, hacker tools and viruses, therefore antivirus software will raise red flags when it sees such behavior.
I would like to know if it is possible to explicitly declare which memory type (physical or virtual memory) should be used by a C# application for performing different actions. Let me explain it with an example:
Let's say I have a file of about 100 or 200 MB in size. I need to parse this file, access and analyze its contents and perform operations on those contents. Would it be possible for me to specifically store the entire file and its contents in virtual memory instead of physical memory?
If it is possible, then are there any side-effects/precautions that one should keep in mind?
The reason behind my question is that I often have to deal with such huge files or datasets (retrieved from databases) and perform operations on them, parts of which need not occur sequentially or be synchronized. I want to improve the execution time and performance of the application by parallelizing the non-sequential parts, if possible.
Generally you don't (and shouldn't need to) have any insight into how physical memory is managed. In Windows and hence in the CLR everything is virtual memory.
Unless you have a specific problem you should just pretend everything is physical memory.
You can depend on the operating system to intelligently determine what should be kept in physical memory and what can be swapped out. Swapping only occurs if there's memory pressure anyway, i.e. if you allocate more memory than is physically available.
Also, 100-200 MB isn't all that much nowadays.
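If the goal is the parallel processing mentioned in the question, simply reading the file into an array and letting the OS manage the backing memory is usually enough; a sketch with a placeholder file name and an arbitrary chunk size:

    using System;
    using System.IO;
    using System.Threading.Tasks;

    class LargeFileDemo
    {
        static void Main()
        {
            // "data.bin" is a placeholder. 200 MB fits comfortably in memory on
            // modern machines; the OS backs this allocation with virtual memory
            // and pages it in and out as it sees fit.
            byte[] contents = File.ReadAllBytes("data.bin");

            // Non-sequential work can then be parallelized over chunks of the array.
            int chunkSize = 1 << 20;  // 1 MB chunks, an arbitrary choice
            int chunkCount = (contents.Length + chunkSize - 1) / chunkSize;

            Parallel.For(0, chunkCount, i =>
            {
                int start = i * chunkSize;
                int length = Math.Min(chunkSize, contents.Length - start);
                // ... analyze contents[start .. start + length) here ...
            });
        }
    }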
physical or virtual memory
You can't actually read data from a page that has been swapped out. When you attempt to, this causes a page fault and the OS does a page swap to bring that page back into physical memory, so you are only ever reading from physical memory. While swapping a page in, it will swap out a page of RAM that has not been used recently.
From the app's perspective it appears everything is in physical memory. You do not need to concern yourself with what pages are in or out of physical memory, the OS will handle that.
I'm very interested in using managed memory-mapped files available since .NET 4.0.
Check the following statement extracted from MSDN:
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
My question is: what happens if computer hangs while persisting a memory-mapped file?
I mean, since memory-mapped files are stored in virtual memory (I understand this means in the page file), maybe the file could be recovered from virtual memory and written to the source file again after restarting Windows.
The data pages that underlie a memory-mapped file reside in the OS cache (file cache). Whenever you shut down Windows, it writes all modified cache pages to the file system.
The pages in the cache are either ordinary file data (from processes doing reads/writes from/to files) or memory mapped pages which are read/written by the paging system.
If Windows is unable (e.g. crashes or freezes) to flush cache contents to disk then that data is lost.
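If you want to shrink that window, you can ask for the dirty pages of a view to be written back explicitly; a minimal sketch of a persisted map with an explicit flush (the file name and capacity are placeholders):

    using System.IO;
    using System.IO.MemoryMappedFiles;

    class PersistedMapDemo
    {
        static void Main()
        {
            // "data.bin" is a placeholder; the map is backed by this file on disk.
            using (var mmf = MemoryMappedFile.CreateFromFile("data.bin", FileMode.OpenOrCreate,
                                                             null, 1024 * 1024))
            using (var accessor = mmf.CreateViewAccessor())
            {
                accessor.Write(0, 42L);

                // Flush() asks the OS to write the dirty pages of this view back to
                // the source file now, instead of waiting for the cache to be flushed.
                accessor.Flush();
            }
        }
    }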
If you enable persistence, the memory-mapped file is not removed after a reboot.
You can use an atomic write process with a flag that shows whether the data is valid or not; if it is valid you can restore it, otherwise the data is lost.
If your OS supports it (kernel or filesystem lifetime), as Unix does, you can use shared memory with synchronization, which is faster than a mapped file.
From the book Modern Operating Systems, 3rd ed. (2007), on memory-mapped files:

Shared libraries are really a special case of a more general facility called memory-mapped files. The idea here is that a process can issue a system call to map a file onto a portion of its virtual address space. In most implementations, no pages are brought in at the time of the mapping, but as pages are touched, they are demand paged in one at a time, using the disk file as the backing store. When the process exits, or explicitly unmaps the file, all the modified pages are written back to the file.

Mapped files provide an alternative model for I/O. Instead of doing reads and writes, the file can be accessed as a big character array in memory. In some situations, programmers find this model more convenient.

If two or more processes map onto the same file at the same time, they can communicate over shared memory. Writes done by one process to the shared memory are immediately visible when the other one reads from the part of its virtual address space mapped onto the file. This mechanism thus provides a high-bandwidth channel between processes and is often used as such (even to the extent of mapping a scratch file). Now it should be clear that if memory-mapped files are available, shared libraries can use this mechanism.
From http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html:
Shared Memory
POSIX defines a shared memory object as "An object that represents memory that can be mapped concurrently into the address space of more than one process."
Shared memory is similar to file mapping, and the user can map several regions of a shared memory object, just as with memory-mapped files. In some operating systems, like Windows, shared memory is a special case of file mapping, where the file-mapping object accesses memory backed by the system paging file. However, in Windows, the lifetime of this memory ends when the last process connected to the shared memory object closes its connection or the application exits, so there is no data persistence. If an application creates shared memory, fills it with data and exits, the data is lost. This lifetime is known as process lifetime.
In POSIX operating systems the shared memory lifetime is different since for semaphores, shared memory, and message queues it's mandatory that the object and its state (including data, if any) is preserved after the object is no longer referenced by any process. Persistence of an object does not imply that the state of the object is preserved after a system crash or reboot, but this can be achieved since shared memory objects can actually be implemented as mapped files of a permanent file system. The shared memory destruction happens with an explicit call to unlink(), which is similar to the file destruction mechanism. POSIX shared memory is required to have kernel lifetime (the object is explicitly destroyed or it's destroyed when the operating system reboots) or filesystem persistence (the shared memory object has the same lifetime as a file).
This lifetime difference is important to achieve portability. Many portable runtimes have tried to achieve perfect portability between Windows and POSIX shared memory but the author of this paper has not seen any satisfactory effort. Adding a reference count to POSIX shared memory is effective only as long as a process does not crash, something that it's very usual. Emulating POSIX behaviour in Windows using native shared memory is not possible since we could try to dump shared memory to a file to obtain persistence, but a process crash would avoid persistence. The only viable alternative is to use memory mapped files in Windows simulating shared memory, but avoiding file-memory synchronization as much as possible.
Many other named synchronization primitives (like named mutexes or semaphores) suffer the same lifetime portability problem. Automatic shared memory cleanup is useful in many contexts, like shared libraries or DLL-s communicating with other DLL-s or processes. Even when there is a crash, resources are automatically cleaned up by the operating systems. POSIX persistence is also useful when a launcher program can create and fill shared memory that another process can read or modify. Persistence also allows data recovery if a server process crashes. All the data is still in the shared memory, and the server can recover its state.
This paper proposes POSIX lifetime (kernel or filesystem lifetime) as a more portable solution, but has no strong opinion about this. The C++ committee should take into account the use cases of both approaches to decide which behaviour is better or if both options should be available, forcing the modification of both POSIX and Windows systems.
I'm going to create some command-line tools that make use of some large library DLLs. For security reasons I plan to embed the DLLs in the command-line tool's EXE.
Example:
Suppose the CL's (command line tool) functionality is to just copy a file from A to B. The procedure to do this is included in a 100MB library DLL. If I would just take out the lines of code from the DLL and paste them in the CL's code then the CL would only be 10Kb.
But I don't want to do that, so I embed the full library in the CL's EXE, which will make it 101MB in size.
Please be aware that the above is just an example. I once read somewhere (cannot remember where) that Windows would only use the part of the EXE that's actually used. So if that's true then it shouldn't matter if the EXE size is 10Kb, 100MB or 1GB. I don't know if that is true, so that's why I'm asking this question.
I own the code of the DLL, so if the best solution is to not include the whole DLL but just only link to or include those code files, of the DLL project, that are used by the CL then I will go that way.
So the question is: will the 10Kb CL run faster and consume less memory than the 101MB CL?
First of all, if you're embedding the extra DLL into the executable for security reasons, then don't.
If the program can unpack it, anyone else can, so you're only fooling yourself if you think this will improve security, unless it is job security you're talking about.
Secondly, I suspect the underlying question here is quite a bit harder to answer than others might think.
If you had used a regular non-managed executable and a non-managed dll, then portions of those files would be reserved in memory space when you start the program, but only the actual bits of it you use will be mapped into physical memory.
In other words, the actual amount of physical memory the program would consume would be somewhat proportional to the amount of code you use from them and how that code is spread out. I say "somewhat" because paging into physical memory is done on a page basis, and pages have a fixed size. So a 100-byte function might end up mapping a whole 4KB or 8KB page (I don't recall the exact page sizes any more) into memory. If you call a lot of such functions, and they're spread out over the address space of the code, you might end up mapping in a lot of physical memory for little code.
When it comes to managed assemblies, the picture changes somewhat. Managed code isn't mapped directly into physical memory the same way (note: I'm fuzzy on the exact details here) because the code is not ready to run. Before it can be run it has to be JITted, and the JITter only JITs code on a need-to-JIT basis.
In other words, if you include a humongous class in the assembly, but never use it, it might never end up being JITted and thus not use any memory.
So is the answer "no", as in it won't use more memory?
I have no idea. I suspect that there will be some effect of a larger assembly, more metadata to put into reflection tables or whatnot.
However, if you intend to place it into the executable, you either need to unpack it to disk before loading it (which would "circumvent" your "security features"), or unpack it into memory (which would require all of those 100 MB of physical memory).
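For reference, the unpack-into-memory variant typically hooks AssemblyResolve and feeds the embedded bytes to Assembly.Load; a hedged sketch (the resource name is hypothetical), which also makes the memory cost visible:

    using System;
    using System.IO;
    using System.Reflection;

    static class EmbeddedAssemblyLoader
    {
        // Call once at startup, before any type from the embedded DLL is touched.
        public static void Install()
        {
            AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
            {
                // "MyTool.BigLibrary.dll" is a hypothetical embedded resource name.
                // A real implementation would inspect args.Name and pick the matching resource.
                using (Stream stream = Assembly.GetExecutingAssembly()
                           .GetManifestResourceStream("MyTool.BigLibrary.dll"))
                {
                    if (stream == null)
                        return null;

                    // The whole assembly is copied into a byte array, so the full
                    // size of the DLL is paid for in memory up front.
                    using (var ms = new MemoryStream())
                    {
                        stream.CopyTo(ms);
                        return Assembly.Load(ms.ToArray());
                    }
                }
            };
        }
    }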
So if you're concerned about using a lot of memory, here's my advice:
Try it, see how bad it is
Decide if it is worth it
And don't embed the extra assembly into the executable
Will the smaller one run faster and consume less memory? Yes.
Will it be enough to make a difference? Who knows? If done wrong, the big one might take up about 100MB more memory (three guesses where I got that amount from)
But it sure seems awfully silly to include 100MB of 'stuff' that isn't needed...
EDIT: My "Yes" at the top here should be qualified with "infinitesimally so", and incidentally so. See comments, below.