Reading a file with FileStream and FILE_FLAG_NO_BUFFERING - C#

A little background: I've been experimenting with using the FILE_FLAG_NO_BUFFERING flag when doing IO with large files. We're trying to reduce the load on the cache manager in the hope that with background IO, we'll reduce the impact of our app on user machines. Performance is not an issue. Being behind the scenes as much as possible is a big issue. I have a close-to-working wrapper for doing unbuffered IO but I ran into a strange issue. I get this error when I call Read with an offset that is not a multiple of 4.
Handle does not support synchronous operations. The parameters to the FileStream constructor may need to be changed to indicate that the handle was opened asynchronously (that is, it was opened explicitly for overlapped I/O).
Why does this happen? And doesn't this message contradict itself? If I add the Asynchronous file option, I get an IOException ("The parameter is incorrect.").
I guess the real question is: what do these requirements, http://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx, have to do with multiples of 4?
Here is the code that demonstrates the issue:
FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;
int MinSectorSize = 512;
byte[] buffer = new byte[MinSectorSize * 2];
int i = 0;
while (i < MinSectorSize)
{
    try
    {
        using (FileStream fs = new FileStream(@"<some file>", FileMode.Open, FileAccess.Read, FileShare.None, 8, FileFlagNoBuffering | FileOptions.Asynchronous))
        {
            fs.Read(buffer, i, MinSectorSize);
            Console.WriteLine(i);
        }
    }
    catch { }
    i++;
}
Console.ReadLine();

When using FILE_FLAG_NO_BUFFERING, the documented requirement is that the memory address for a read or write must be a multiple of the physical sector size. In your code, you've allowed the address of the byte array to be randomly chosen (hence unlikely to be a multiple of the physical sector size) and then you're adding an offset.
The behaviour you're observing is that the call works if the offset is a multiple of 4. It is likely that the byte array is aligned to a 4-byte boundary, so the call is working if the memory address is a multiple of 4.
Therefore, your question can be rewritten like this: why is the read working when the memory address is a multiple of 4, when the documentation says it has to be a multiple of 512?
The answer is that the documentation doesn't make any specific guarantees about what happens if you break the rules. It may happen that the call works anyway. It may happen that the call works anyway, but only in September on even-numbered years. It may happen that the call works anyway, but only if the memory address is a multiple of 4. (It is likely that this depends on the specific hardware and device drivers involved in the read operation. Just because it works on your machine doesn't mean it will work on anybody else's.)
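You can see this for yourself by pinning the array and inspecting its address (a diagnostic sketch, not production code; requires compiling with /unsafe):
byte[] buffer = new byte[512 * 2];
unsafe
{
    fixed (byte* p = buffer)
    {
        // Managed arrays are typically aligned to 4 or 8 bytes,
        // not to the 512-byte boundary the documentation asks for.
        Console.WriteLine((long)p % 4);    // almost always 0
        Console.WriteLine((long)p % 512);  // almost never 0
    }
}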
It probably isn't a good idea to use FILE_FLAG_NO_BUFFERING with FileStream in the first place, because I doubt that FileStream actually guarantees that it will pass the address you give it unmodified to the underlying ReadFile call. Instead, use P/Invoke to call the underlying API functions directly. You may also need to allocate your memory this way, because I don't know whether .NET provides any way to allocate memory with a particular alignment or not.
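A minimal sketch of that approach follows, assuming a physical sector size of 4096 bytes or smaller, so that the page-aligned memory returned by VirtualAlloc satisfies the alignment rule (error handling omitted):
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class UnbufferedRead
{
    const uint GENERIC_READ = 0x80000000;
    const uint OPEN_EXISTING = 3;
    const uint FILE_FLAG_NO_BUFFERING = 0x20000000;
    const uint MEM_COMMIT = 0x1000, MEM_RELEASE = 0x8000, PAGE_READWRITE = 4;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFile(string lpFileName, uint dwDesiredAccess,
        uint dwShareMode, IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ReadFile(SafeFileHandle hFile, IntPtr lpBuffer,
        uint nNumberOfBytesToRead, out uint lpNumberOfBytesRead, IntPtr lpOverlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
        uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, uint dwFreeType);

    static void Main()
    {
        // VirtualAlloc returns page-aligned (4096-byte) memory, which satisfies
        // the sector-alignment requirement on typical hardware.
        IntPtr buffer = VirtualAlloc(IntPtr.Zero, (UIntPtr)4096u, MEM_COMMIT, PAGE_READWRITE);
        using (SafeFileHandle handle = CreateFile(@"<some file>", GENERIC_READ, 0,
            IntPtr.Zero, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, IntPtr.Zero))
        {
            // Both the buffer address and the read length are sector multiples here.
            ReadFile(handle, buffer, 4096, out uint readBytes, IntPtr.Zero);
            Console.WriteLine(readBytes);
        }
        VirtualFree(buffer, UIntPtr.Zero, MEM_RELEASE);
    }
}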

Just call CreateFile directly with FILE_FLAG_NO_BUFFERING and then close it before opening with FileStream to achieve the same effect.

Related

Does the C# compiler reorder File-IO instructions?

I have the following C# algorithm for config file writeback:
string strPathConfigFile = @"C:\File.txt";
string strPathBackupFile = @"C:\File.backup";
string strContent = "File Content Text";
bool oldFilePresent = File.Exists(strPathConfigFile);
// Step 1
if (oldFilePresent)
{
    File.Move(strPathConfigFile, strPathBackupFile);
}
// Step 2
using (FileStream f = new FileStream(strPathConfigFile, FileMode.Create, FileAccess.ReadWrite, FileShare.None))
{
    using (StreamWriter s = new StreamWriter(f))
    {
        s.Write(strContent);
        s.Close();
    }
    f.Close();
}
// Step 3
if (oldFilePresent)
{
    File.Delete(strPathBackupFile);
}
It works like this:
The original File.txt is renamed to File.backup.
The new File.txt is written.
File.backup is deleted.
This way, if there is a power blackout during the write operation, there is still an intact backup file present. The backup file is only deleted after the write operation is completed. The reading process can check if the backup file is present. If it is, the normal file is considered broken.
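The reader side of that check might look like this sketch (using the same paths as in the code above):
string pathToRead = File.Exists(strPathBackupFile)
    ? strPathBackupFile   // backup present: the last write was interrupted
    : strPathConfigFile;  // no backup: the config file is intact
string strContent = File.ReadAllText(pathToRead);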
For this approach to work, it is crucial that the order of the 3 steps is strictly followed.
Now to my question: is it possible that the C# compiler swaps Steps 2 and 3?
There might be a slight performance benefit, as Steps 1 and 3 are wrapped in identical if-conditions, which could tempt the compiler to put them together.
I suspect the compiler might do it, as Steps 2 and 3 operate on completely different files. To a compiler that doesn't know the semantics of my exceptionally clever writeback procedure, Steps 2 and 3 might seem unrelated.
According to the language specification, the C# compiler must preserve side effects when reordering statements. Writing to files is such a side effect.
In general, the compiler/jitter/CPU is free to reorder instructions as long as the result would be identical for a single thread. However, IO, system calls and most things involved with multithreading involve memory barriers or other synchronization that prevents such reordering.
In the given example there is only a single thread involved. So as long as the File APIs are implemented correctly (a fairly safe assumption), there should be no risk of unintended behavior.
Reordering issues mostly pop up when writing multithreaded code without being aware of all the potential hazards and requirements for synchronization. As long as you only use a single thread, you should not need to worry about reordering.

Why Read and ReadAsync are producing totally different results

I have been using this code to capture the webcam, and I have been trying to learn from it and make it better. The Rider IDE suggested I should use an async variant of MemoryMappedViewStream.Read, but it doesn't work at all: it produces all-black images, suggesting the async and sync methods are totally different. I am wondering why that's the case.
// Working:
sourceStream.Read(MemoryMarshal.AsBytes(image.GetPixelMemoryGroup().Single().Span));
// NOT Working:
var bytes = MemoryMarshal.AsBytes(image.GetPixelMemoryGroup().Single().Span).ToArray();
await sourceStream.ReadAsync(bytes, 0, bytes.Length, token);
Repository and line of code
Those two versions are not the same. In the "sync" version you obtain a reference to the memory location of the image via image.GetPixelMemoryGroup(). Then you read data from sourceStream directly into that location.
In the "async" version you again obtain a reference to the memory location via image.GetPixelMemoryGroup, but then you do something different - you call ToArray. This extension method copies the bytes from the image memory location into a new array, the one you hold in the bytes variable. You then read data from sourceStream into that bytes array, NOT directly into the image memory location. Then you discard the bytes array, so you are essentially reading the data into nowhere.
Now, MemoryMappedViewStream inherits from UnmanagedMemoryStream, and all read/write operations are implemented in UnmanagedMemoryStream. This kind of stream represents data in memory, and there is nothing async it can do. The only reason it even has ReadAsync is that the base class (Stream) has those methods. Even if you manage to make ReadAsync work, in this case it will not be asynchronous anyway. As far as I know, MemoryMappedViewStream does not allow real asynchronous access, even though it could make sense, since it has an underlying file.
In short, I'd just continue with the sync version, because there is no benefit in this case to the "async" one. The static analyzer of course doesn't know that; it only sees that there is an Async-named analog of the method you use.
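For completeness, an async variant that actually populates the image would have to read into a temporary buffer and copy it back after the await (a sketch; the pixel span cannot be held across an await, so it is re-acquired):
// Read into a temporary array, then copy into the image's pixel memory.
int length = MemoryMarshal.AsBytes(image.GetPixelMemoryGroup().Single().Span).Length;
byte[] temp = new byte[length];
int read = await sourceStream.ReadAsync(temp, 0, length, token);
temp.AsSpan(0, read).CopyTo(MemoryMarshal.AsBytes(image.GetPixelMemoryGroup().Single().Span));
That still ends with a synchronous memory copy, which is another reason the plain sync Read is the better fit here.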
Check like this:
await sourceStream.ReadAsync(bytes, 0, bytes.Length, token).ConfigureAwait(false);

MonoTorrent Event when piece written

I've been trying to use the C# MonoTorrent library, but the lack of documentation isn't helping. I'm trying to stream a file, but to do that, I somehow need an event for whenever a Piece is written to file, or something similar.
I know there is an event which gets triggered whenever a Piece has been hashed, but it's not so useful when I need the actual content.
So I want to ask how I can know when a piece has been written to a file, so I can parse that and then stream that movie.
I already looked at the TorrentManager, the ClientEngine and the DiskManager, and I haven't found anything useful in any of these classes, nor in any other Manager class. Is this feature just hidden somewhere, or do I have to do something different to get the pieces that were downloaded?
The PieceHashed event is what you need. When that event is raised you can be guaranteed that the data associated with that piece has been received, validated and written to the DiskManager.
If a MemoryWriter is being used, that data may not have been written to the underlying hard drive/SSD when the event is raised. To guarantee that, you'll need to call the FlushAsync method, passing in the TorrentManager and piece index. If a piece is 256 kB in size and there are three files of length 200 kB, 50 kB and 6 kB contained within piece 6, all three of those files will be flushed if you pass in '6' as the piece index. If you call the overload which doesn't take a piece index, it will instead flush every file.
If you're writing something like a SlidingWindowPicker, then you should probably only call FlushAsync when a piece in the high-priority set has been downloaded. Any call to flush will flush all pending data for a given file (or all files). If the only data you have is from the very end of the torrent, then flushing it immediately won't impact your ability to stream, but it may increase the overheads.
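For example (a sketch; the event-argument property names are assumptions, so check them against the actual API):
// Flush a piece to disk as soon as it has been hashed.
manager.PieceHashed += async (sender, e) =>
{
    if (e.HashPassed)   // property name assumed for illustration
        await engine.DiskManager.FlushAsync(manager, e.PieceIndex);
};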
There is an alternative, which is to implement IPieceWriter and create a wrapper which flushes whenever an appropriate piece is written.
The existing MemoryWriter shows how to create a wrapper. Its Write method is implemented as follows: https://github.com/mono/monotorrent/blob/2209922c4159e394242c6c337401571312642b6e/src/MonoTorrent/MonoTorrent.Client.PieceWriters/MemoryWriter.cs#L118-L123
If you wanted to write something to automatically flush pieces you'd do something like:
public void Write(TorrentFile file, long offset, byte[] buffer, int bufferOffset, int count, bool forceWrite)
{
    Writer.Write(file, offset, buffer, bufferOffset, count);
    Writer.Flush(file);
}

What risks in manipulating a provided stream multiple times

Given a stream provided by a user, where we expect them to manage its disposal through a typical using block:
using (var stream = new MemoryStream())
{
    MyMethod(stream);
}
Is there any risk in copying back to the stream after working on it? Specifically, we have a method that populates the data, but we have a conditional need to sort the data. So MyMethod is something like this:
void MyMethod(Stream stream, bool sort = false)
{
    // Stream is populated
    stream.Position = 0;
    if (sort)
    {
        Sort(stream);
    }
}

void Sort(Stream stream)
{
    using (var sortedStream = new MemoryStream())
    {
        // Sort per requirements into the new sorted local stream
        sortedStream.Position = 0;
        // Is this safe? Any risk of losing data or memory leak?
        sortedStream.CopyTo(stream);
    }
}
The thing to notice is we are populating the stream provided by the user and then sorting it into a local stream. Since the local stream is owned by the local method it is cleaned up but in converse we can NOT clean up the provided stream, but want to populate it with the local results.
To reiterate my question, is there anything wrong with this? Is there a risk of garbage data being in the stream or some other issue I am not thinking of?
Stream is an abstract class, and has a lot of different implementations. Not all streams can be written to, so in some cases the code may not work as expected, or could crash.
sortedStream.Position = 0;
sortedStream.CopyTo(stream);
You would need to check the CanSeek and CanWrite properties beforehand:
if (sortedStream.CanSeek && stream.CanWrite)
{
    sortedStream.Position = 0;
    sortedStream.CopyTo(stream);
}
else
{
    // not supported
}
Whether a given stream supports moving the position around and re-writing data over itself is going to depend on the specific stream. Not all streams are allowed to change their position, not all are able to write, not all are able to overwrite existing data, and some are able to do all of those things.
A well-behaved stream shouldn't leak resources if you do any of those unsupported things; it ought to just throw an exception. Of course, technically a custom stream could do whatever it wants, so you most certainly could write your own stream that leaks resources when changing the position. But at that point the bug of leaking a resource is in that stream's implementation, not in your code that sorts the data in the stream. The code you've shown here only needs to worry about a stream throwing an exception if an unsupported operation is performed.
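If you want to be defensive about it, you can combine the capability checks with a rewind and a truncation, so no stale bytes survive the copy-back (a sketch; CopyBack is a hypothetical helper):
void CopyBack(Stream sortedStream, Stream target)
{
    if (!sortedStream.CanSeek || !target.CanSeek || !target.CanWrite)
        throw new NotSupportedException("The stream does not support in-place rewriting.");

    sortedStream.Position = 0;
    target.Position = 0;
    sortedStream.CopyTo(target);
    // Truncate leftovers in case the new content is shorter than the old.
    target.SetLength(target.Position);
}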
I have no idea why you don't sort the data before you insert it into the stream, or why you use a stream at all when your access seems to be random-access, but technically it's fine. You can do it. It will work.

int[] to byte[], am I forgetting something?

This is untested as I need to write more code, but is this correct? I feel like I am missing something, like this could be better written. Do I need the .Close() at the end? Should I flush anything (I'll assume no if I do call Close())?
byte[] buffer;
using (var m = new MemoryStream())
{
    using (var binWriter = new BinaryWriter(m))
    {
        foreach (var v in wordIDs)
            binWriter.Write(v);
        binWriter.Close();
    }
    buffer = m.GetBuffer();
    m.Close();
}
You don't need the .Close() calls (the automatic .Dispose() the using block generates takes care of those).
Also, you'll want to use .ToArray() on the MemoryStream, not .GetBuffer(). GetBuffer() returns the underlying buffer, no matter how much of it is used. ToArray() returns a copy that is the perfect length.
If you're using this to communicate with another program, make sure you and it agree on the order of the bytes (aka endianness). If you're using network byte-order, you'll need to flip the order of the bytes (using something like IPAddress.HostToNetworkOrder()), as network byte-order is big-endian, and BinaryWriter uses little-endian.
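For example, the write loop from the question could convert each value as it writes (IPAddress.HostToNetworkOrder lives in System.Net; this assumes the other side expects big-endian):
foreach (var v in wordIDs)
    binWriter.Write(IPAddress.HostToNetworkOrder(v)); // byte-swaps on little-endian hosts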
What is wordIDs - is it an enumeration, or is it an Int32[]? You can use the following if it is just Int32[]:
byte[] bytes = new byte[wordIDs.Length * 4];
Buffer.BlockCopy(wordIDs, 0, bytes, 0, bytes.Length);
Otherwise, if wordIDs is an enumeration that you must step through, all you need to change is remove the m.Close (as mentioned) and use MemoryStream.ToArray (as mentioned).
Close is not needed here. The using statements will ensure the Dispose method on these types are called on exit and this will have the same effect as calling Close. In fact if you look at the code in reflector, you'll find that Close in both cases just proxies off to the Dispose method on both types.
Thus sayeth Skeet:
There's no real need to close either a MemoryStream or a BinaryWriter, but I think it's good form to use a using statement to dispose of both - that way if you change at a later date to use something that really does need disposing, it will fit into the same code.
So you don't need the Close or the using statement, but using is idiomatic C#.
JaredPar's and Jonathan's answers are correct. If you want an alternative, you can use BitConverter.GetBytes(int). So your code turns into this:
byte[] buffer = wordIDs.SelectMany(i => BitConverter.GetBytes(i)).ToArray();
I disagree with the Skeet here.
Whilst you may not need Close when you use using, you are relying on the implementation of BinaryWriter and MemoryStream to do it for you in the Dispose method. That is true for the framework types, but what if someone writes a Writer or Stream that doesn't do it?
Adding Close does no harm and protects you against badly written classes.
