How to use MemoryMappedViewAccessor properly? - c#

Edit: If you're going to downvote, explain why please. Doesn't help anyone.
I'm really trying to understand how to use memory-mapped files properly, but I'm having some issues. Here's the scenario:
I have some large files, anywhere from 1-10 GB. These are composed of structs, all 128 bytes except the very first one of the file.
I need fast random access to any one of these structs, which from what I read is exactly what a memory-mapped file is good for. I also need other processes to be able to open these memory-mapped files; there won't be any writing, just reading of the file.
I am able to determine the byte offset of the exact struct I need to read.
First I tried making the mem-mapped file like so:
MemoryMappedFile hFileMapping = MemoryMappedFile.CreateFromFile(rampPath, FMode, Path.GetFileNameWithoutExtension(rampPath), GetFileLength(rampPath), access);
However, other processes couldn't open it. So I found that there was another constructor that took in a FileStream. So I changed it to this:
FileStream fs = new FileStream(rampPath, FileMode.OpenOrCreate, FileAccess.Read, FileShare.Read);
MemoryMappedFile hFileMapping = MemoryMappedFile.CreateFromFile(fs, Path.GetFileNameWithoutExtension(rampPath), GetFileLength(rampPath), access, null, HandleInheritability.Inheritable, true);
Other processes could open the mem-mapped file now. Then I try making a view of the mapped file:
MemoryMappedViewAccessor hFileAccessor = hFileMapping.CreateViewAccessor(0, GetFileLength(rampPath), MemoryMappedFileAccess.Read);
This is fine if the files are small, but for large files I get out-of-storage and out-of-memory exceptions.
So I need to map only a portion of the file at a time. But what is the best way to determine whether the struct I'm looking for is inside the currently mapped view? Should I keep track of the current view's offset and length and check whether the struct's byte offset falls within them? Is there a better way to go about all this?

Thanks to @Ross Bush in the comments on the question for the suggestions.
What I did was give the ViewAccessor a fixed capacity. Since I already have the byte offset I need, I just run a quick calculation to determine whether that offset falls inside the current ViewAccessor. If it doesn't, I recreate the ViewAccessor at a different offset/capacity so that the byte offset falls within it.
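Roughly like this (a minimal sketch; the 64 MB capacity and all the names are illustrative, not my actual code):
// using System; using System.IO.MemoryMappedFiles;
class RecordReader
{
    const long ViewCapacity = 64 * 1024 * 1024; // fixed window size; tune to taste
    const int RecordSize = 128;

    readonly MemoryMappedFile mmf;
    readonly long fileLength;
    long viewOffset;
    long viewSize;
    MemoryMappedViewAccessor view; // null until the first read

    public RecordReader(MemoryMappedFile mmf, long fileLength)
    {
        this.mmf = mmf;
        this.fileLength = fileLength;
    }

    public T ReadRecord<T>(long byteOffset) where T : struct
    {
        // Remap only when the requested struct isn't fully inside the current window.
        if (view == null || byteOffset < viewOffset ||
            byteOffset + RecordSize > viewOffset + viewSize)
        {
            if (view != null) view.Dispose();
            viewOffset = byteOffset;
            viewSize = Math.Min(ViewCapacity, fileLength - viewOffset);
            view = mmf.CreateViewAccessor(viewOffset, viewSize, MemoryMappedFileAccess.Read);
        }
        T record;
        view.Read(byteOffset - viewOffset, out record); // position is relative to the view
        return record;
    }
}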

Related

"Where are my bytes?" or Investigation of file length traits

This is a continuation of my question about downloading files in chunks. The explanation will be quite long, so I'll try to divide it into several parts.
1) What I tried to do
I was creating a download manager for a Windows Phone application. First, I tried to solve the problem of downloading
large files (the explanation is in the previous question). Now I want to add a "resumable download" feature.
2) What I've already done.
At the moment I have a working download manager that gets around the Windows Phone RAM limit.
The gist of it is that it downloads small chunks of the file sequentially, using the HTTP Range header.
A fast explanation of how it works:
The file is downloaded in chunks of constant size. Let's call this size "delta". After a file chunk is downloaded,
it is saved to local storage (the hard disk; on WP it's called Isolated Storage) in append mode (so the downloaded byte array is
always added to the end of the file). After downloading a single chunk, the statement
if (mediaFileLength >= delta) // mediaFileLength is the length of the downloaded chunk
is checked. If it's true, that
means there's something left to download, and the method is invoked recursively. Otherwise it means that this chunk
was the last one, and there's nothing left to download.
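For reference, a single ranged request looks roughly like this (a sketch using the desktop HttpWebRequest; the phone's API may differ, and url, start, and delta are placeholders):
// Download bytes [start, start + delta - 1] of the remote file.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AddRange(start, start + delta - 1);
using (WebResponse response = request.GetResponse())
using (Stream body = response.GetResponseStream())
{
    // read the chunk and append it to the local file...
}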
3) What's the problem?
As long as I used this logic for one-time downloads (by "one-time" I mean you start downloading a file and wait until the download finishes),
it worked well. However, I decided that I needed a "resume download" feature. So, the facts:
3.1) I know that the file chunk size is a constant.
3.2) I know whether the file is completely downloaded or not. (That's an indirect result of my app logic;
I won't weary you with the explanation, just take it as a fact.)
From these two statements it follows that the number of downloaded chunks is equal to
(CurrentFileLength)/delta, where CurrentFileLength is the size in bytes of the already-downloaded file.
To resume downloading the file, I should simply set the required headers and invoke the download method. That seems logical, doesn't it? And I tried to implement it:
// Check file size
using (IsolatedStorageFileStream fileStream = isolatedStorageFile.OpenFile("SomewhereInTheIsolatedStorage", FileMode.Open, FileAccess.Read))
{
    int currentFileSize = Convert.ToInt32(fileStream.Length);
    int currentFileChunkIterator = currentFileSize / delta;
}
And what do I see as a result? The downloaded file length is 2432000 bytes (delta is 304160; the total file size is about 4.5 MB, and we've downloaded only half of it). So the result of the division is
approximately 7.995 (it actually has an integer type, so it's 7, when it should be 8!). Why is this happening?
Simple math tells us that the file length should be 2433280, so the given value is very close, but not equal.
Further investigation showed that all the values returned by fileStream.Length are inaccurate, but all of them are close.
Why is this happening? I don't know precisely. Perhaps the .Length value is taken from the file's metadata somewhere.
Perhaps such rounding is normal for this method. Perhaps, when the download was interrupted, the file wasn't saved completely... (no, that's pure fantasy, it can't be).
So the problem is set: "How to determine the number of chunks downloaded". The question is how to solve it.
4) My thoughts about solving the problem.
My first thought was to use math here: set some epsilon-neighborhood and use it in the currentFileChunkIterator = currentFileSize / delta; statement.
But that would force us to worry about type I and type II errors (or false alarms and misses, if you don't like the statistics terms); perhaps there's actually nothing left to download.
Also, I haven't checked whether the difference between the reported value and the true value grows steadily
or fluctuates cyclically. With small sizes (about 4-5 MB) I've seen only growth, but that doesn't prove anything.
So I'm asking for help here, as I don't like my solution.
5) What I would like to hear as an answer:
What causes the difference between real value and received value?
Is there a way to receive a true value?
If not, is my solution good for this problem?
Are there other better solutions?
P.S. I won't add a Windows Phone tag, because I'm not sure this problem is OS-related. I used the Isolated Storage Tool
to check the size of the downloaded file, and it showed me the same value as the received one. (The original post included a screenshot here, in Russian.)
I'm answering your update:
This is my understanding so far: the length actually written to the file is larger (rounded up to the next 1 KiB) than the amount you actually wrote to it. This makes your assumption of "file.Length == amount downloaded" wrong.
One solution would be to track this information separately. Create a metadata structure (which can be persisted using the same storage mechanism) to accurately track which blocks have been downloaded, as well as the total size of the file:
[DataContract] //< I forgot how serialization on the phone works, please forgive me if the tags differ
struct Metadata
{
    [DataMember]
    public int Length;

    [DataMember]
    public int NumBlocksDownloaded;
}
This would be enough to reconstruct which blocks have been downloaded and which have not, assuming you keep downloading them consecutively.
edit
Of course, you would have to change your code from a simple append to moving the stream position to the correct block before writing the data:
file.Position = currentBlock * delta; // seek to this block's slot
file.Write(block, 0, block.Length);
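Persisting that struct could then look something like this (a sketch; per the caveat above, I'm assuming DataContractSerializer is available, and the path name is made up):
static void SaveMetadata(IsolatedStorageFile store, string path, Metadata meta)
{
    DataContractSerializer serializer = new DataContractSerializer(typeof(Metadata));
    using (IsolatedStorageFileStream stream = store.OpenFile(path, FileMode.Create))
        serializer.WriteObject(stream, meta);
}

static Metadata LoadMetadata(IsolatedStorageFile store, string path)
{
    DataContractSerializer serializer = new DataContractSerializer(typeof(Metadata));
    using (IsolatedStorageFileStream stream = store.OpenFile(path, FileMode.Open))
        return (Metadata)serializer.ReadObject(stream);
}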
Just a possible bug: don't forget to verify whether the file was modified between requests, especially when a long time passes between them, as can happen on pause/resume.
The error could be big: if the file is modified to a smaller size, your count becomes wrong; and if the file stays the same size but its contents change, you will end up with a corrupted file.
Have you heard the anecdote about a noob programmer and ten guru programmers? The gurus were trying to find an error in his solution, while the noob had already found it but didn't say anything, because it was something so stupid he was afraid of being laughed at.
Why did I remember this? Because the situation is similar.
The explanation in my question was already very long, so I decided not to mention some small aspects that I was sure worked correctly. (And they really did work correctly.)
One of these small aspects was the fact that the downloaded file was encrypted with AES, using PKCS7 padding. The decryption worked correctly, I knew it, so why should I mention it? And I didn't.
So then I tried to find out what exactly causes the error with the last chunk. The most plausible theory was a buffering problem, and I tried to find where I was losing the missing bytes. I tested again and again, but I couldn't find them; every chunk was saved without any losses. And one day I understood:
There is no spoon
There is no error.
What's the effect of PKCS7 padding in AES? It makes the decrypted data smaller than the encrypted data. Not by much, only by 16 bytes per part. This was accounted for in my decryption method and my download method, so there should be no problem, right?
But what happens when the download process is interrupted? The last chunk saves correctly; there are no buffering errors or anything of the kind. Then we want to continue the download, and the number of downloaded chunks is computed as currentFileChunkIterator = currentFileSize / delta;
And here I should ask myself: "Why are you trying to do something THAT stupid?
Your downloaded chunk size is not delta. It's actually less than delta." (The decryption makes each chunk smaller by 16 bytes per part, remember?)
Each delta consists of 10 equal parts that are decrypted separately. So we should divide not by delta, but by (delta - 16 * 10), which is (304160 - 160) = 304000.
I smell a rat here. Let's try to find out the number of downloaded chunks:
2432000 / 304000 = 8. Wait... OH SHI~
So, that's the end of the story.
The whole solution logic was right.
The only reason it failed was my assumption that, for some reason, the size of the downloaded decrypted file should equal the sum of the downloaded encrypted chunks.
And of course, since I didn't mention the decryption (it's mentioned only in the previous question, which is only linked), none of you could give me a correct answer. I'm terribly sorry about that.
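In code, the whole fix boils down to dividing by the decrypted chunk size instead of delta (numbers from the question):
const int delta = 304160;              // encrypted chunk size requested over HTTP
const int partsPerChunk = 10;          // each chunk is decrypted in 10 parts
const int paddingPerPart = 16;         // PKCS7 removes 16 bytes per part
const int decryptedChunkSize = delta - paddingPerPart * partsPerChunk; // 304000

long currentFileSize = 2432000;        // what fileStream.Length reported
long chunksDownloaded = currentFileSize / decryptedChunkSize; // = 8, as expected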
Continuing from my comment...
The original file size, as I understand from your description, is 2432000 bytes.
The chunk size is set to 304160 bytes (one "delta").
So the sending machine was able to fill 7 chunks and send them.
The receiving machine now has 7 x 304160 bytes = 2129120 bytes.
The last chunk will not be filled to the end, as there are not enough bytes left to fill it, so it will contain 2432000 - 2129120 = 302880 bytes, which is less than 304160.
If you add up the numbers you get 7 x 304160 + 1 x 302880 = 2432000 bytes.
So according to that, the original file was transferred in full to the destination.
The problem is that you are calculating 8 x 304160 = 2433280, insisting that even the last chunk must be filled completely. But with what? And why?
With respect: are you locked in some kind of math confusion, or did I misunderstand your problem?
Please answer: what is the original file size, and what size is received at the other end? (Totals!)

Sorting Binary File Index Based

I have a binary file which can be seen as a concatenation of different sub-files:
INPUT FILE:
Hex Offset ID SortIndex
0000000 SubFile#1 3
0000AAA SubFile#2 1
0000BBB SubFile#3 2
...
FFFFFFF SubFile#N N
This is the information I have about each SubFile:
Starting Offset
Length in bytes
Final sequence Order
What's the fastest way to produce a sorted output file, in your opinion?
For instance, the OUTPUT FILE would contain the SubFiles in the following order:
SubFile#2
SubFile#3
SubFile#1
...
I have thought about:
Splitting the input file, extracting each SubFile to disk, then
concatenating them in the correct order
Using FileSeek to move around the file, adding each SubFile to a BinaryWriter stream.
Consider the following as well:
The input file can be really huge (200 MB~1 GB)
For those who know: I am speaking about IBM AFP files.
Both of my solutions are easy to implement, but they look far from performant in my opinion.
Thanks in advance
Even if the file is big, the number of IDs is not so huge.
You can just load all the IDs, sort indexes, offsets, and lengths into RAM, sort them with a simple quicksort, and when you finish, rewrite the entire file in the order given by your sorted array.
I expect this to be faster than the other methods.
So... let's write some pseudocode.
public struct FileItem : IComparable<FileItem>
{
    public string Id;
    public int SortIndex;
    public uint Offset;
    public uint Length;

    public int CompareTo(FileItem other) { return this.SortIndex.CompareTo(other.SortIndex); }
}

public static FileItem[] LoadAndSortFileItems(Stream inputFile)
{
    FileItem[] result = null; // fill the array from the input file's index
    Array.Sort(result);
    return result;
}

public static void WriteFileItems(FileItem[] items, Stream inputFile, Stream outputFile)
{
    byte[] buffer = new byte[1024 * 1024]; // block-copy buffer (see the note on sizing below)
    foreach (FileItem item in items)
    {
        // Seek to the subfile and copy item.Length bytes to the output.
        inputFile.Seek(item.Offset, SeekOrigin.Begin);
        long remaining = item.Length;
        while (remaining > 0)
        {
            int read = inputFile.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            outputFile.Write(buffer, 0, read);
            remaining -= read;
        }
    }
}
The number of read operations is linear, O(n), but seeking is required.
The only performance problem with seeking is hard drive cache misses.
Modern hard drives have a big cache, from 8 to 32 megabytes; seeking around a big file in random order means cache misses, but I would not worry too much, because the time spent copying the files, I guess, is greater than the time the seeks require.
If you are using a solid-state disk instead, seek time is 0 :)
Writing the output file, however, is O(n) and sequential, which is a very good thing since it is totally cache-friendly.
You can improve the time if you preallocate the size of the file before starting to write it:
FileStream myFileStream = ...
myFileStream.SetLength(predictedTotalSizeOfFile);
Sorting the FileItem structures in RAM is O(n log n), but even with 100,000 items it will be fast and use only a small amount of memory.
The copy is the slowest part; use a block size between 256 KB and 2 MB to ensure that copying big chunks of file A to file B is fast. You can tune the block-copy size by running some tests, always keeping in mind that every machine is different.
It is not useful to try a multithreaded approach; it will just slow down the copy.
It is obvious, but copying from drive C: to drive D:, for example, will be faster (not two partitions, of course, but two different Serial ATA drives).
Consider also that you need seeking, either when reading or when writing; at some point, you will have to seek. Also, if you split the original file into several smaller files, you will make the OS seek across the smaller files; this doesn't make sense, and it will be messy, slower, and probably harder to code.
Consider also that if the files are fragmented, the OS will seek on its own, and that is out of your control.
The first solution I thought of was to read the input file sequentially and build a SubFile object for every subfile. These objects are put into a B+ tree as soon as they are created, with the tree ordering the subfiles by their SortIndex. A good B-tree implementation will have linked leaf nodes, which lets you iterate over the subfiles in the correct order and write them into the output file.
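A compact sketch of that idea, using SortedDictionary as a stand-in for a real B+ tree (the SubFile struct, parameter names, and buffer size are illustrative):
using System;
using System.Collections.Generic;
using System.IO;

class SortedSubFileCopy
{
    public struct SubFile { public long Offset; public long Length; }

    // entries: one (SortIndex, SubFile) pair per subfile, gathered in one sequential scan
    public static void WriteSorted(Stream input, Stream output,
                                   IEnumerable<KeyValuePair<int, SubFile>> entries)
    {
        SortedDictionary<int, SubFile> tree = new SortedDictionary<int, SubFile>();
        foreach (KeyValuePair<int, SubFile> entry in entries)
            tree.Add(entry.Key, entry.Value);

        byte[] buffer = new byte[1024 * 1024];
        foreach (KeyValuePair<int, SubFile> pair in tree) // enumerates in SortIndex order
        {
            input.Seek(pair.Value.Offset, SeekOrigin.Begin);
            long remaining = pair.Value.Length;
            while (remaining > 0)
            {
                int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                output.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}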
Another way could be to use random-access files. You can load all the SortIndexes and offsets, then sort them and write the output file in sorted order. In that case everything depends on how the random-access file reader is implemented; if it just reads the file up to the specified position, it would not be very performant... honestly, I have no idea how they work... :(

How can I make a fixed hex editor?

So. Let's say I were to make a hex editor to edit... oh... let's say a .DLL file. How can I edit a .DLL file's bytes using C# or C++? As for the "fixed" part: I want to be able to browse from the program to a specific .DLL, have some pre-coded buttons in the program, and when a button is pressed, have it automatically execute the requested action, meaning the button has been pre-coded to know what to look for in the .DLL and what to change it to. Can anyone help me get started on this?
Also, preferably C#. Thank you!
The basics are very simple.
A DLL, like any file, is a stream of bytes.
Basic file operations allow you to read and write arbitrary portions of a file. The term of art is "random access file operations".
In C, the fundamental operations are read(), write(), and lseek().
read() lets you read a stream of bytes into a buffer, write() lets you write a buffer of bytes to a file, and lseek() lets you position yourself anywhere in the file.
Example:
/* needs: fcntl.h, stdio.h, stdlib.h, unistd.h */
int fd = open("test.dat", O_RDWR);
off_t offset = lseek(fd, 200, SEEK_SET); /* jump to byte 200 */
if (offset == -1) {
    printf("Boom!\n");
    exit(1);
}
char buf[1024];
ssize_t bytes_read = read(fd, buf, 1024);           /* read up to 1024 bytes */
offset = lseek(fd, 100, SEEK_SET);                  /* jump back to byte 100 */
ssize_t bytes_written = write(fd, buf, bytes_read); /* write them back */
fsync(fd);
close(fd);
This reads 1024 bytes from the file, starting at byte 200, then writes them back to the file starting at byte 100.
Once you can change arbitrary bytes in a file, it's a matter of choosing which bytes to change, what to change them to, and doing the appropriate reads/lseeks/writes to make the changes.
Note that these are the most primitive I/O operations; there are likely much better ones you can use depending on your language etc., but they're all built on these primitives.
Interpreting the bytes of a file, displaying them, and so on is an exercise for the reader. But these basic I/O capabilities give you the fundamentals of changing files.
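Since you'd prefer C#, the same round trip with FileStream looks like this (a sketch; "test.dat" and the offsets mirror the C example):
// using System.IO;
using (FileStream fs = new FileStream("test.dat", FileMode.Open, FileAccess.ReadWrite))
{
    byte[] buf = new byte[1024];
    fs.Seek(200, SeekOrigin.Begin);        // position at byte 200
    int bytesRead = fs.Read(buf, 0, buf.Length);
    fs.Seek(100, SeekOrigin.Begin);        // reposition at byte 100
    fs.Write(buf, 0, bytesRead);           // write the same bytes back
    fs.Flush();
}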
If the idea is to load a hex-edit control, you can use Be.HexEditor.
Editing a file's "hex" is nothing more than changing bytes in it. The pre-programmed-changes part will be the more involved piece. But for actually viewing, finding, and then having the option of changing anything you want, Be.HexEditor is a good option. I used it over a year ago, and I would hope it has gained some new features that will make your life easier.

.NET BinaryWriter.Write() Method -- Writing Multiple Datatypes Simultaneously

I'm using BinaryWriter to write records to a file. The records are composed of a class with the following property datatypes:
Int32,
Int16,
Byte[],
Null Character
To write each record, I call BinaryWriter.Write four times, once for each datatype. This works fine, but I'd like to know whether there's any way to call the BinaryWriter.Write() method a single time with all of these datatypes. The reason is that another program reads my binary file and will occasionally read only part of a record, because it starts reading between my write calls. Unfortunately, I don't have control over the other program's code, or I would modify the way it reads.
Add a ToBinary() method to your class that returns byte[]:
public byte[] ToBinary()
{
    byte[] result = new byte[number_of_bytes_you_need];
    // fill result
    return result;
}
In your calling code (approximate, as I haven't compiled this), write the result through your BinaryWriter:
writer.Write(myObj.ToBinary());
You're still writing each value independently, but it cleans up the code a little.
Also, as sindre suggested, consider using serialization, as it makes it incredibly easy to recreate your objects from the file in question, and requires less effort than writing the file the way you're attempting to.
Sync: you can't depend on any of these solutions to fix your file-sync issue. Even if you reduce your write to a single Write() call, whether you use serialization or the ToBinary() method I've outlined, the bytes are still written to disk sequentially by the framework; that's a consequence of how the disk physically works. If you have control over the file format, add a record-length field written before any of the record data, and in the app that's reading the file, make sure you have record_length bytes available before attempting to process the next record. While you're at it, consider putting this in a database. If you don't have control over the file format, you're kind of out of luck.
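If you can change the format, the record-length idea is only a few lines (a sketch; recordBytes stands for your serialized record, e.g. the result of ToBinary() above):
public void WriteRecord(BinaryWriter writer, byte[] recordBytes)
{
    writer.Write(recordBytes.Length); // 4-byte length prefix
    writer.Write(recordBytes);        // then the record itself
    writer.Flush();
}
On the reading side, check that the stream holds at least the prefixed number of bytes before parsing the record.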
In keeping with your writing to a BinaryWriter, I would have the object create a binary record using a second BinaryWriter, which is then written to the main BinaryWriter. So on your class you could have a method like this:
public void WriteTo(BinaryWriter writer)
{
    using (MemoryStream ms = new MemoryStream())
    using (BinaryWriter bw = new BinaryWriter(ms))
    {
        bw.Write(value1);
        bw.Write(value2);
        bw.Write(value3);
        bw.Write(value4);
        bw.Flush(); // make sure everything has reached the MemoryStream
        writer.Write(ms.ToArray());
    }
}
This would create a single record with the same format you're already writing to the main BinaryWriter; it just builds the whole record at once and then writes it as one byte array.
Create a class for the record and use the binary formatter:
FileStream fs = new FileStream("file.dat", FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(fs, <insert instance of a class here>);
fs.Close();
I haven't done this myself, so I'm not absolutely sure it would work; the class cannot contain any other data, that's for sure. If you have no luck with a class, you could try a struct.
Edit:
Just came up with another possible solution: create a struct for your data and write its raw bytes out in one go. (Note that Buffer.BlockCopy only accepts arrays of primitive types, so for a custom struct you have to pin it and use Marshal.Copy instead:)
byte[] writeBuffer = new byte[Marshal.SizeOf(typeof(Structure))];
Structure value = new Structure();
value.item1 = 213; // initialize all the members
// Pin the struct and copy its raw bytes into the byte array.
GCHandle handle = GCHandle.Alloc(value, GCHandleType.Pinned);
try
{
    Marshal.Copy(handle.AddrOfPinnedObject(), writeBuffer, 0, writeBuffer.Length);
}
finally
{
    handle.Free();
}
Now you can write writeBuffer to the file in one go.
Second edit:
I don't agree that the sync problems are impossible to solve. First of all, data is written to the file in whole sectors, not one byte at a time, and the file is really not updated until you flush it, thus writing the data and updating the file length. The best and safest approach is to open the file exclusively, write a record (or several), and close the file. That requires the reading application to use a similar pattern (open exclusively, read, close), as well as to handle "access denied" errors gracefully.
Anyhow, I'm quite sure this will perform better in any case when you're writing an entire record at a time.

Preallocating file space in C#?

I am creating a downloading application, and I wish to preallocate room on the hard drive for files before they are actually downloaded, as they could potentially be rather large, and no one likes to see "This drive is full, please delete some files and try again." So, in that light, I wrote this:
// Quick, and very dirty
System.IO.File.WriteAllBytes(filename, new byte[f.Length]);
It works, at least until you download a file that is several hundred MBs, or potentially even GBs, at which point you throw Windows into a thrashing frenzy, if not totally wipe out the pagefile and kill your system's memory altogether. Oops.
So, with a little more enlightenment, I set out with the following algorithm.
using (FileStream outFile = System.IO.File.Create(filename))
{
    // 4194304 = 4 MB; the loop writes all the full blocks, leaving at most
    // one partial block for the final Write
    byte[] buff = new byte[4194304];
    // (long counter, so files over 2 GB don't overflow an int)
    for (long i = buff.Length; i <= f.Length; i += buff.Length)
    {
        outFile.Write(buff, 0, buff.Length);
    }
    outFile.Write(buff, 0, (int)(f.Length % buff.Length));
}
This works, well even, and doesn't suffer from the crippling memory problem of the previous solution. It's still slow, though, especially on older hardware, since it writes out (potentially GBs worth of) data to the disk.
The question is this: is there a better way of accomplishing the same thing? Is there a way of telling Windows to create a file of size x and simply allocate the space in the filesystem rather than actually writing out a tonne of data? I don't care about initialising the data in the file at all (the protocol I'm using, BitTorrent, provides hashes for the files it sends, so the worst case for random uninitialised data is that I get a lucky coincidence and part of the file is correct).
FileStream.SetLength is the one you want. The syntax:
public override void SetLength(long value)
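Used on a fresh file, it's just a couple of lines (filename and f.Length as in the question's own code):
using (FileStream outFile = System.IO.File.Create(filename))
{
    outFile.SetLength(f.Length); // reserve the full size up front
}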
If you have to create the file, I think you can probably do something like this:
using (FileStream outFile = System.IO.File.Create(filename))
{
    outFile.Seek(<length_to_write> - 1, SeekOrigin.Begin);
    outFile.WriteByte(0);
}
where length_to_write is the size in bytes of the file to write. I'm not sure I have the C# syntax exactly right (I'm not on a computer to test), but I've done similar things in C++ in the past and it has worked.
Unfortunately, you can't really do this just by seeking to the end. That will set the file length to something huge, but may not actually allocate disk blocks for storage. So when you go to write the file, it will still fail.
