Let me explain what I need to accomplish. I need to load a file into RAM and analyze its structure. What I was doing is this:
//Stream streamFile;
byte[] bytesFileBuff = new byte[streamFile.Length];
if(streamFile.Read(bytesFileBuff, 0, (int)streamFile.Length) == streamFile.Length)
{
//Loaded OK, can now analyze 'bytesFileBuff'
//Go through bytes in 'bytesFileBuff' array from 0 to `streamFile.Length`
}
But in my previous experience with Windows and 32-bit processes, even relatively small allocations can be hard to satisfy. (In that particular case I failed to allocate 512 MB on a Windows 7 machine with 16 GB of installed RAM.)
So I was curious: is there a special class that would allow me to work with the contents of a file of hypothetically any length (by implementing an internal analog of a page-file architecture)?
If linear stream access (even with multiple passes) is not a viable option, the solution in Win32 would be to use Memory Mapped Files with relatively small Views.
I didn't think you could do that in C# easily, but I was wrong. It turns out that .NET 4.0 and above provide classes wrapping the Memory Mapped Files API.
See http://msdn.microsoft.com/en-us/library/dd997372.aspx
If you have used memory mapped files in C/C++, you will know what to do.
The basic idea would be to use MemoryMappedFile.CreateFromFile to obtain a MemoryMappedFile object. With that object, you can call the CreateViewAccessor method to get MemoryMappedViewAccessor objects that represent chunks of the file; you can use these objects to read the file in chunks of your choice. Make sure you dispose of the MemoryMappedViewAccessor objects diligently to release the memory buffer.
You have to work out the right strategy for using memory-mapped files: too many small views and you will suffer a lot of overhead; too few, larger views and you will consume a lot of memory.
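As a rough sketch of that idea (not the exact code from the docs; the path and view size here are placeholders), scanning a large file through fixed-size views could look like this:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class LargeFileScanner
{
    static void Main()
    {
        string path = @"C:\temp\huge.bin";      // placeholder path
        long viewSize = 64L * 1024 * 1024;      // 64 MB per view; tune to your workload
        long fileLength = new FileInfo(path).Length;

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        {
            for (long offset = 0; offset < fileLength; offset += viewSize)
            {
                long size = Math.Min(viewSize, fileLength - offset);

                // Each accessor maps only this window of the file into the address space.
                using (var view = mmf.CreateViewAccessor(offset, size, MemoryMappedFileAccess.Read))
                {
                    for (long i = 0; i < size; i++)
                    {
                        byte b = view.ReadByte(i);
                        // analyze byte 'b' here
                    }
                }
            }
        }
    }
}

Only one view is held at a time here, so the memory footprint stays at roughly the view size regardless of how large the file is.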
(As I said, I didn't know about these class wrappers in .NET. Do read the MSDN docs carefully: I might easily have missed something important in the few minutes I spent reviewing them.)
Related
I have a C# application that will continuously allocate memory for data stored in byte arrays. I have another process, written in Python, that will read from these arrays once they are instantiated. Both processes will be running on an Ubuntu machine.
The obvious solution seems to be to share memory between the processes by passing a pointer from the C# process to the python process. However, this has turned out to be difficult.
I've mainly looked at solutions proposed online. Two notable ones are named pipes and memory-mapped files. I read the following posts:
Sharing memory between C and Python. Suggested to be done via named pipes:
Share memory between C/C++ and Python
The C# application will neither read from nor write to the array once it is created, and the Python script will only read from it. Pushing the data through a pipe would mean copying it, so this doesn't satisfy my efficiency requirements and seems superfluous when the data is literally already sitting in memory.
When I looked at memory-mapped files, it seemed as though we would have to allocate memory for the mapped file and then write the data into it. However, the data will already have been allocated before the mapped file is used, so this seems inefficient as well.
The second post:
https://learn.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files?redirectedfrom=MSDN
The article says: "Starting with the .NET Framework 4, you can use managed code to access memory-mapped files in the same way that native Windows functions access memory-mapped files". Would an Ubuntu machine run into potential problems when reading these files in the same way that Windows would? And if not, could someone give a simple example of using these mapped files between the programming languages mentioned above, and of passing a reference to these mapped files between the processes, or point to somewhere this has already been done?
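(For what it's worth, a file-backed memory-mapped file created from C# can be opened from Python with the standard mmap module. A minimal, untested C# sketch of the writing side, with a placeholder path and size, might look like the following; the Python process would then open the same file and map it with mmap.)

using System.IO;
using System.IO.MemoryMappedFiles;

class SharedBufferWriter
{
    static void Main()
    {
        // Back the map with a real file so another process can open it by path.
        // On Linux, pass null for the map name and rely on the file path for sharing.
        string path = "/tmp/shared_buffer.bin";      // placeholder path
        long capacity = 100L * 1024 * 1024;          // placeholder size: 100 MB

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, capacity))
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte[] data = { 1, 2, 3, 4 };            // stand-in for the real byte arrays
            accessor.WriteArray(0, data, 0, data.Length);
        }
    }
}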
Or if someone knows how to directly pass a pointer to a byte array from C# to Python, that would be even better, if possible.
Any help is greatly appreciated!
So after coming back to this post four months later, I did not find a solution that satisfied my efficiency needs.
I had tried to find a way around having to write a large amount of data, already allocated in memory, to another process, since that would mean reallocating the same data, doubling the memory usage and adding overhead even though the data is read-safe. However, it seems the proper way to solve this, for two processes running on the same machine, is to use named pipes, as they are faster than, for example, sockets. As Holger stated, this ended up being a question of interprocess communication.
I ended up writing the whole application in Python, which happened to be the better alternative in the end anyway, probably saving me a lot of headache.
I want two C# apps to communicate with each other through memory. For the host side I used:
int* ptr = &level;
And for the client side I want to use:
ReadProcessMemory((int)_handle, <returned result of int* ptr>, _buffer, 2, ref bytesRead);
But ReadProcessMemory doesn't work. For example, level is set to 3 but ReadProcessMemory returns 0. What is going wrong here? (NOTE: the "level" field is not cleared from memory.)
I have tried int* ptr many times, because lots of websites tell me to do that, but it doesn't work well with ReadProcessMemory.
I set level = 3, but the value ReadProcessMemory reads back for level is 0.
What you ask, in the way you ask it, is pretty much dangerous, as the process is entirely managed by the CLR. Depending on what you want to share and how, you could consider sockets or pipes.
Alternatively, you could use interop, but it requires a certain expertise and tinkering in my opinion.
The cleanest way for two C# applications to communicate via memory is to use memory-mapped files. Messing with the memory of a managed process can lead to subtle issues; memory-mapped files are a proper way to share information.
Keep in mind that the same memory-mapped file may be mapped at different addresses in each process, so you need to structure its contents without the use of absolute pointers.
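A minimal sketch of that approach, as two separate console programs that agree on an arbitrary map name ("LevelSharedMemory" here) and on fixed offsets instead of pointers:

using System;
using System.IO.MemoryMappedFiles;

// Host application: create a named map and write the value into it.
class Host
{
    static void Main()
    {
        using (var mmf = MemoryMappedFile.CreateNew("LevelSharedMemory", 1024))
        using (var accessor = mmf.CreateViewAccessor())
        {
            accessor.Write(0, 3);              // 'level' stored at offset 0
            Console.WriteLine("Value written; press Enter to quit.");
            Console.ReadLine();                // keep the map alive while the client reads
        }
    }
}

// Client application: open the same named map and read the value back.
class Client
{
    static void Main()
    {
        using (var mmf = MemoryMappedFile.OpenExisting("LevelSharedMemory"))
        using (var accessor = mmf.CreateViewAccessor())
        {
            int level = accessor.ReadInt32(0); // reads 3 while the host is running
            Console.WriteLine(level);
        }
    }
}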
Edit:
Direct raw memory access requires knowing the exact address to read in the target process, as the virtual addresses allocated in the target process are different from, and possibly overlap with, those of the source process. C# applications are hosted by the Common Language Runtime, which is in control of everything, including memory allocation. In particular, a standard C# application does not manage a single bit of its own memory: the runtime moves objects around as part of the normal application lifetime, so their addresses change over time.
If you are in control of the target application, you can pin the object (for example with GCHandle) to forbid such movements, then take the address of the object and pass it to the other process. The other process must then open the target process for reading and calculate the location of the memory to read from the virtual address it was given.
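A rough sketch of the pinning side, assuming the value lives in a small array (GCHandle is the usual way to pin a managed object):

using System;
using System.Runtime.InteropServices;

class Target
{
    static void Main()
    {
        int[] level = { 3 };

        // Pin the array so the GC can no longer move it, then take its address.
        GCHandle handle = GCHandle.Alloc(level, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine("Address to hand to the reading process: 0x{0:X}", address.ToInt64());
            Console.ReadLine();  // keep the process (and the pin) alive while the other side reads
        }
        finally
        {
            handle.Free();       // unpin when done
        }
    }
}

The reading process would then open this process (OpenProcess) and call ReadProcessMemory at the address it was given.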
What you ask requires cooperating processes and a lot of low-level knowledge, and in the end you might never be able to see updated memory changes, as the CLR might not write the values back to memory (have a look at volatile for this).
It is certainly exciting to write such software, but when you are in control of both applications there are cleaner and much more reliable ways to achieve your goal.
As a side note, this technique is used by trainers, hacker tools and viruses, therefore antivirus software will raise red flags when it sees such behavior.
I want to write a map editor for a game. I intend doing it using C++ and OpenGL. However, the game was written in Unity, so map loading/saving code was written in C#.
Since I worked on a similar project in C# WinForms, I have already written a C# dll that can manage some game generated files, including map files. I now plan to use it to load/save map files in the main C++ program.
What does the C# dll do? (tl;dr below the second line)
It has a method for loading a Region into memory, consisting of an array of 1024 MemoryStreams that each contain a compressed Chunk (about 2 kB to 20 kB per chunk, mostly around 5 kB). It also has a method for requesting a Chunk from the Region: it decompresses the stream and reads it into a Chunk object (a complex object with arrays, lists, dictionaries and other custom classes with complexities of their own).
I also have the methods that do the reverse - pack the Chunk object into a MemoryStream, compress it and add it to the Region object, which has a method that saves it to a file on disk.
The uncompressed chunk data ranges from 15kB to over 120kB in size, and that's just raw data, not including any object creation related overhead.
In the main program, I'd probably have several thousand of those Chunks loaded into memory at once, some perhaps only briefly to cache some data and then be unloaded (to generate distant terrain, say), others fully loaded to be modified to the user's wishes.
tl;dr I'd be loading anywhere from a few hundred megabytes up to over a gigabyte of data within a managed C# dll. The data wouldn't be heavily accessed, only changed when the user edits the terrain, which is not that often on a CPU timescale. But as the user moves around the map, a lot of chunks might be requested to be loaded or unloaded at a time.
Given that all this is within a managed C# dll, my question is, what happens to memory management and how does that impact performance of the native C++ program? To what extent can I control the memory allocation for the Region/Chunk objects? How does that impact the speed of execution?
Is it something that can be overlooked/ignored and/or dealt with, or will it pose enough of a problem to justify rewriting the dll in native C++ with a more elaborate memory management scheme?
Like GuyFawkes, I would like to use MemoryStream to store a large amount of data, but keep encountering the 'out of memory' exceptions.
TomTom's answer is what I would like to do - use an implementation that does not require a contiguous block - but I'm wondering if there is already a free implementation available, to save me writing one myself?
Does anyone know of a good, free re-implementation of MemoryStream that can work with large streams?
EDIT:
The MemoryMappedFile solution is very interesting and I will be remembering it for other projects, however as Henk says, it strays too far from the abstraction that MemoryStream is aiming for. Specifically, the requirement of a known capacity.
The data that the replacement shall handle will in some cases be very large, but in others relatively small (and no, we don't know which it will be until it's too late ;)); further, many instances of the class will exist at the same time.
Ultimately the work required to use MemoryMappedFiles (to determine an appropriate size for each one) would be equivalent to that of implementing TomTom's solution.
Here is my implementation in case anyone needs it; I will leave this question open for a bit in case someone still responds with anything better.
http://www.codeproject.com/Articles/348590/A-replacement-for-MemoryStream
You could create a MemoryMappedFile without a file, i.e. one that lives in system memory.
The DelayAllocatePages option delays allocation until the memory is actually needed, although you do need to specify a maximum capacity in advance. Use the CreateViewStream method to create a stream over it.
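A minimal sketch (map name and capacity are arbitrary; the overload shown is the .NET Framework one, which also takes a MemoryMappedFileSecurity argument):

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class ScratchBuffer
{
    static void Main()
    {
        // Backed by the system paging file, not by a file on disk.
        // DelayAllocatePages means the capacity is only reserved, not committed up front.
        using (var mmf = MemoryMappedFile.CreateNew(
            "ScratchBuffer",                      // map name; may be null if no sharing is needed
            1024L * 1024 * 1024,                  // maximum capacity: 1 GB
            MemoryMappedFileAccess.ReadWrite,
            MemoryMappedFileOptions.DelayAllocatePages,
            null,                                 // default security
            HandleInheritability.None))
        using (MemoryMappedViewStream stream = mmf.CreateViewStream())
        {
            var writer = new BinaryWriter(stream);
            writer.Write(new byte[] { 1, 2, 3 }); // use the stream much as you would a MemoryStream
        }
    }
}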
Not exactly a re-implementation of MemoryStream, but consider whether you can use a Memory Mapped File for your requirement.
Memory Mapped Files can solve many of the classes of problems that large memory buffers can solve, are very efficient, and are supported directly by .NET.
Is it possible to create a large (physically) contiguous file on a disk using C#? Preferably using only managed code, but if it's not possible then an unmanaged solution would be appreciated.
Any file you create will be logically contiguous.
If you want physically contiguous you are in OS and FS territory, really (far) beyond the control of normal I/O APIs.
But what will probably come close is to claim the space up-front: create an empty stream and set its Length (or Position) property to what you need.
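For example (path and size are placeholders), a minimal way to claim the space up-front:

using System.IO;

class ReserveSpace
{
    static void Main()
    {
        // Claiming the full size immediately gives the file system the best chance
        // to allocate the file in as few fragments as possible.
        using (var fs = new FileStream(@"C:\MyLargeFile.dat", FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(1024L * 1024 * 1024);   // reserve 1 GB
        }
    }
}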
Writing a defragger?
It sounds like you're after the defragmentation API anyway:-
http://msdn.microsoft.com/en-us/library/aa363911%28v=vs.85%29.aspx
The link from the bottom, because it seems you've missed the C# wrapper that someone has kindly produced.
http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx
With modern file systems it is hard to ensure a contiguous file on the hard disk. Logically the file is always contiguous, but the physical blocks that keep the data vary from file system to file system.
The best bet for this would be to use an old file system (ext2, FAT32, etc.) and just ask for a large file by seeking to the file size you want and then flushing the file. More up-to-date file systems will probably record the large file size but won't actually write anything to the hard disk, instead returning zeros on a future read without actually reading.
int fileSize = 1024 * 1024 * 512;   // 512 MB
FileStream file = new FileStream("C:\\MyFile", FileMode.Create, FileAccess.Write);
file.Seek(fileSize - 1, SeekOrigin.Begin);
file.WriteByte(0);                  // the file is only extended once something is written at the new position
file.Close();
To build a database, you will need to use the scatter-gather I/O functions provided by the Windows API. This is a special type of file I/O that allows you to either "scatter" data from a file into memory or "gather" data from memory and write it to a contiguous region of a file. While the buffers into which the data is scattered or from which it is gathered need not be contiguous, the source or destination file region is always contiguous.
This functionality consists of two primary functions, both of which work asynchronously. The ReadFileScatter function reads contiguous data from a file on disk and writes it into an array of non-contiguous memory buffers. The WriteFileGather function reads non-contiguous data from memory buffers and writes it to a contiguous file on disk. Of course, you'll also need the OVERLAPPED structure that is used by both of these functions.
This is exactly what SQL Server uses when it reads and writes to the database and/or its log files, and in fact this functionality was added to an early service pack for NT 4.0 specifically for SQL Server's use.
Of course, this is pretty advanced level stuff, and hardly for the faint of heart. Surprisingly, you can actually find the P/Invoke definitions on pinvoke.net, but I have an intensely skeptical mistrust of the site. Since you'll need to spend a lot of quality time with the documentation just to understand how these functions work, you might as well write the declarations yourself. And doing it from C# will create a whole host of additional problems for you, such that I don't even recommend it. If this kind of I/O performance is important to you, I think you're using the wrong tool for the job.
The poor man's solution is contig.exe, a single-file defragmenter available for free download here.
In short: no.
The OS will do this in the background. What I would do is make the file as big as you expect it to be; that way the OS will place it contiguously. If you need to grow the file, grow it by, say, 10% each time.
This is similar to how SQL Server keeps its database files.
When opening the FileStream, open it with append.
Example:
FileStream fwriter = new FileStream("C:\\test.txt", FileMode.Append, FileAccess.Write, FileShare.Read);