I'm coming across something trivial, but it appears that data is flushed to disk (out of the FileStream's buffer) when the data I'm buffering hits the size of the FileStream's buffer.
//use the FileStream buffer to actually buffer the data to be written, so segments are written as desired.
FileStream writeStream = new FileStream(filename, FileMode.Append, FileAccess.Write, FileShare.None, CommandOperationBufferSize);
BinaryWriter binWriter = new BinaryWriter(writeStream);
byte[] FullSize = new byte[CommandOperationTotalSize];
//the BinaryWriter will flush when the FileStream buffer is hit
binWriter.Write(FullSize); //DATA FLUSHES TO DISK HERE!
//if wait, wait five seconds
if (CommandOperation == "writewait" || CommandOperation == "appendwait")
{
Thread.Sleep(5000);
writeStream.Flush();
Thread.Sleep(5000);
}
binWriter.Close(); //closing the BinaryWriter also closes and disposes the underlying FileStream
Can anyone confirm that this is the case, that the FileStream's buffer is actually flushed (as if by .Flush()) when it fills up?
I ask because it appears that if I set CommandOperationTotalSize to 1MB, and set the CommandOperationBufferSize to 64KB, data is flushed to disk when the buffer is filled.
Sounds like I answered my own question, but it seems odd that the FileStream buffer wouldn't just overflow? But maybe the API developers are trying to be nice?
Thanks,
Matt
You can readily assume that overflowing the buffer is not possible. The class would be rather hard to use if that was the case, given that FileStream has no properties at all to tell you how much is currently being buffered.
The buffer is only there to reduce the number of calls to the native Windows WriteFile() function, which matters when you write small amounts of data, say one byte at a time. If you don't explicitly specify the buffer size, FileStream uses a buffer of 4096 bytes, which is fine; it is very rare to need something else. Any writes are further buffered by the file system cache. You should only consider a non-standard size when you use FileOptions.WriteThrough.
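To see this for yourself, here is a minimal sketch (not from the original answer) that checks how many bytes the operating system has received at two points: after a one-byte write, and after a write larger than the buffer. The names BufferDemo, Observe and LengthSeenByOs are made up for illustration. The length is read through a second handle because FileInfo.Length can be stale for a file that is still open.

```csharp
using System;
using System.IO;

static class BufferDemo
{
    // Length as the operating system sees it, read through a second handle.
    // Bytes still sitting in the writer's FileStream buffer are not counted.
    static long LengthSeenByOs(string path)
    {
        using (var probe = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            return probe.Length;
        }
    }

    // Writes one byte, then one large block, and reports the OS-visible
    // file length after each step.
    public static (long afterSmallWrite, long afterLargeWrite) Observe(
        string path, int bufferSize, int largeWriteSize)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                                           FileShare.Read, bufferSize))
        {
            stream.WriteByte(1); // small write: stays in the FileStream buffer
            long afterSmall = LengthSeenByOs(path);

            // Larger than the buffer: FileStream has to pass the data on to the OS.
            stream.Write(new byte[largeWriteSize], 0, largeWriteSize);
            long afterLarge = LengthSeenByOs(path);

            return (afterSmall, afterLarge);
        }
    }
}
```

Calling BufferDemo.Observe(path, 64 * 1024, 1024 * 1024) mirrors the sizes from the question. Note that "received by the OS" still means the file system cache, not necessarily the physical disk.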
I receive a fair amount of binary data from an external device, about 30-40 MB/s. I need to save it to a file. On the external source device side, I have a very small buffer that I can't enlarge and as soon as the transmission stutters on the C# application side, it quickly gets clogged and I lose data.
In the application, I tried writing using FileStream, but unfortunately it is not fast enough.
_FileStream = new FileStream(FileName, FileMode.Create, FileAccess.Write);
...
void Handler_OnFtdiBytesReceived(object sender, FtdiBytesReceivedEventArgs e)
{
...
Array.Copy(e.Bytes, 0, _ReceivedDataBuffer, _ReceivedDataBufferPosition, e.NumBytesAvailable);
_ReceivedDataBufferPosition += (int)e.NumBytesAvailable;
if (_ReceivedDataBufferPosition > 0)
{
_FileStream.Write(_ReceivedDataBuffer, 0, (int)e.NumBytesAvailable);
_ReceivedDataBufferPosition = 0;
}
if (_IsOperationFinished == true)
{
_FileStream.Flush();
_FileStream.Close();
}
...
}
I also tried adding a BinaryWriter:
_FileStream = new FileStream(FileName, FileMode.Create, FileAccess.Write);
_BinaryWriter = new BinaryWriter(_FileStream);
and then:
_BinaryWriter.Write(_ReceivedDataBuffer, 0, (int)e.NumBytesAvailable);
instead of the previous _FileStream.Write(...), but the buffer on the transmit side still gets clogged.
Is there any way to deal with this?
I wonder if it might help to somehow buffer the data in the computer's RAM when receiving it and, for example, when it reaches some sizable amount (say, 512 MB), start writing to the file in a separate Task, so that in the meantime, new data can be collected into the buffer continuously. Perhaps I would need to use two buffers and use them alternately, one to receive data continuously and the other from which to write to the file, and swap??
This seems like quite complex code for someone with little experience, and I don't know whether it will help, so I'd like to ask for a hint first.
I'll just add at the end that I have the ability to watch the buffer fill up in this external device and by commenting out this one line regarding writing:
_FileStream.Write(_ReceivedDataBuffer, 0, (int)e.NumBytesAvailable);
I see that the problem with its clogging disappears. Earlier I also analyzed whether other code fragments might be inefficient, such as Array.Copy(...) or passing parameters via Event, but it had no effect.
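The idea floated above (collect data in RAM and write it from a separate task) can be sketched with a bounded queue instead of two hand-swapped buffers. This is only a sketch under assumptions, not a tested fix for the FTDI case: QueuedFileWriter, Enqueue and maxQueuedChunks are hypothetical names, and the 1 MB FileStream buffer is an arbitrary choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: the receive handler only copies bytes into a queue,
// while a dedicated background task does the (possibly slow) file writes.
sealed class QueuedFileWriter : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue;
    private readonly Task _writerTask;

    public QueuedFileWriter(string path, int maxQueuedChunks)
    {
        // Bounded, so a stalled disk cannot consume unlimited memory.
        _queue = new BlockingCollection<byte[]>(maxQueuedChunks);
        _writerTask = Task.Factory.StartNew(() => WriteLoop(path),
                                            TaskCreationOptions.LongRunning);
    }

    // Call from the receive handler. Copies the data so the caller can
    // reuse its own buffer immediately. Blocks only when the queue is full.
    public void Enqueue(byte[] source, int count)
    {
        var chunk = new byte[count];
        Buffer.BlockCopy(source, 0, chunk, 0, count);
        _queue.Add(chunk);
    }

    private void WriteLoop(string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                         FileShare.None, 1 << 20))
        {
            // Ends once CompleteAdding has been called and the queue is drained.
            foreach (byte[] chunk in _queue.GetConsumingEnumerable())
            {
                file.Write(chunk, 0, chunk.Length);
            }
        }
    }

    // Signals the end of the data and waits until everything is on file.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _writerTask.Wait();
        _queue.Dispose();
    }
}
```

In the handler, the _FileStream.Write(...) call would become something like writer.Enqueue(e.Bytes, (int)e.NumBytesAvailable), with Dispose called once the operation is finished. If the write loop throws, Wait rethrows the error as an AggregateException; a real implementation would also need to stop producers in that case.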
I am reading all files (around 3000 files, 50 GB in total) from a specified path, 4 KB at a time. The code is below. When I look at the application's CPU and memory in Task Manager, I can see that the I/O Reads counter gradually climbs to a high level. I understand that this might be because of the 4 KB reads, but does it affect anything else, or is it OK for I/O Reads to increase? Also, is FileStream the optimal way to read a file, given that it does not load the entire file into memory?
fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
do
{
BytesRead = fileStream.Read(Buffer, 0, MAX_BUFFER);
}
while (BytesRead != 0);
fileStream.Close();
Check Hans Passant's answer about this issue; I find it very clear.
Files are already buffered by the file system cache. You just need to
pick a buffer size that doesn't force FileStream to make the native
Windows ReadFile() API call to fill the buffer too often. Don't go
below a kilobyte, more than 16 KB is a waste of memory.
Take a look at this post too, it provides some benchmarking code.
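As a rough illustration of that advice, here is a sketch of the read loop with the chunk size as a parameter, so sizes between 1 KB and 16 KB are easy to compare. ChunkedReader and CountBytes are hypothetical names, and FileOptions.SequentialScan is my own addition (a hint to the OS that the file is read front to back), not something the quoted answer mentions.

```csharp
using System.IO;

static class ChunkedReader
{
    // Reads the whole file in chunkSize pieces and returns the number of bytes seen.
    // Only one chunk is held in memory at a time, regardless of file size.
    public static long CountBytes(string filePath, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        long total = 0;

        using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, chunkSize,
                                           FileOptions.SequentialScan))
        {
            int bytesRead;
            // Read returns 0 only at the end of the file.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += bytesRead; // process buffer[0..bytesRead) here
            }
        }

        return total;
    }
}
```

The using block also closes the stream if an exception is thrown halfway through, which the plain Close() call in the question would not do.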
I have a file stream opened in writeshare and append mode from multiple processes.
Does anybody know if a single unbuffered write operation can be considered atomic?
Or do I have to develop a way to synchronize the different writes to ensure my data is safe?
I found my way.
You can open a filestream using this constructor.
new FileStream(FileName,
FileMode.Append,
System.Security.AccessControl.FileSystemRights.AppendData,
FileShare.ReadWrite, 4096, FileOptions.None);
When the stream is opened with the System.Security.AccessControl.FileSystemRights.AppendData right together with FileMode.Append, the OS will try to write the buffer atomically.
If your write is bigger than the buffer size, the operation will not be atomic, so you have to check your buffer size.
Occasionally we need to copy huge files from one bucket to another in AWS S3. Whenever possible we use the CopyRequest to handle this operation all on AWS (since no round trip required back to the client). But sometimes we do not have the option to do this because we need to copy between 2 completely separate accounts which requires a GET and then a PUT.
Problems:
The response stream returned from the GET is not seekable so it cannot be passed to the PUT request and have it stream seamlessly from one to the other
Copying the response stream to an intermediary stream (MemoryStream) using CopyTo() and then passing that to the PUT operation works well but doesn't scale (large files will throw OutOfMemory exceptions)
So basically I need an intermediary stream that I can read/write to at the same time, basically I would read a chunk from the response stream and write it to my intermediary stream, meanwhile the PUT request is reading out the content and its just a seamless pass-thru sort of scenario.
I found this post on stackoverflow and it seemed promising at first but it still throws an OutOfMemory exception with large files.
.NET Asynchronous stream read/write
Has anyone ever had to do something similar to this? How would you tackle it? Thanks in advance.
It's not clear why you would want to use MemoryStream. The Stream.CopyTo method in .NET 4 doesn't need to use an intermediate stream - it will just read into a local buffer of a fixed size, then write that buffer to the output stream, then read more data (overwriting the buffer) etc.
If you're not using .NET 4, it's easy to implement something similar, e.g.
public static void CopyTo(this Stream input, Stream output)
{
byte[] buffer = new byte[64 * 1024]; // 64K buffer
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
I found this, but it uses a Queue internally, which the author notes is an order of magnitude slower than a MemoryStream.
http://www.codeproject.com/Articles/16011/PipeStream-a-Memory-Efficient-and-Thread-Safe-Stre
I keep hoping I'll find an official MS library solution, but it seems that this wheel hasn't been properly invented yet.
I've been playing around with what I thought was a simple idea. I want to be able to read in a file from somewhere (website, filesystem, ftp), perform some operations on it (compress, encrypt, etc.) and then save it somewhere (somewhere may be a filesystem, ftp, or whatever). It's a basic pipeline design. What I would like to do is to read in the file and put it onto a MemoryStream, then perform the operations on the data in the MemoryStream, and then save that data in the MemoryStream somewhere. I was thinking I could use the same Stream to do this but run into a couple of problems:
Every time I use a StreamWriter or StreamReader I need to close it, and that closes the stream so that I cannot use it anymore. It seems like there must be some way to get around that.
Some of these files may be big and so I may run out of memory if I try to read the whole thing in at once.
I was hoping to be able to spin up each of the steps as separate threads and have the compression step begin as soon as there is data on the stream, and then as soon as the compression has some compressed data available on the stream I could start saving it (for example). Is anything like this easily possible with the C# Streams? Does anyone have thoughts on how best to accomplish this?
Thanks,
Mike
Using a helper method to drive the streaming:
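public static void StreamCopy(Stream source, Stream target)
{
byte[] buffer = new byte[8 * 1024];
int size;
// Read returns 0 at the end of the stream, which ends the loop
// without issuing a final zero-length write.
while ((size = source.Read(buffer, 0, buffer.Length)) > 0)
{
target.Write(buffer, 0, size);
}
}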
You can easily combine whatever you need:
using (FileStream iFile = new FileStream(...))
using (FileStream oFile = new FileStream(...))
using (DeflateStream oZip = new DeflateStream(oFile, CompressionMode.Compress))
StreamCopy(iFile, oZip);
Depending on what you are actually trying to do, you'd chain the streams differently. This also uses relatively little memory, because only the data being operated upon is in memory.
StreamReader/StreamWriter shouldn't have been designed to close their underlying stream -- that's a horrible misfeature in the BCL. But they do, they won't be changed (because of backward compatibility), so we're stuck with this disaster of an API.
But there are some well-established workarounds, if you want to use StreamReader/Writer but keep the Stream open afterward.
For a StreamReader: don't Dispose the StreamReader. It's that simple. It's harmless to just let a StreamReader go without ever calling Dispose. The only effect is that your Stream won't get prematurely closed, which is actually a plus.
For a StreamWriter: there may be buffered data, so you can't get away with just letting it go. You have to call Flush, to make sure that buffered data gets written out to the Stream. Then you can just let the StreamWriter go. (Basically, you put a Flush where you normally would have put a Dispose.)
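For what it's worth, .NET Framework 4.5 and later also offer StreamReader and StreamWriter constructor overloads with a leaveOpen parameter, which allow a normal Dispose without closing the underlying stream. A minimal sketch, assuming one of those versions is available (LeaveOpenDemo and RoundTrip are made-up names):

```csharp
using System.IO;
using System.Text;

static class LeaveOpenDemo
{
    // Writes text to a MemoryStream, then reads it back from the same stream.
    // Both the writer and the reader are disposed, yet the stream stays usable.
    public static string RoundTrip(string text)
    {
        using (var memory = new MemoryStream())
        {
            // Last argument: leaveOpen = true. UTF8Encoding(false) writes no BOM.
            using (var writer = new StreamWriter(memory, new UTF8Encoding(false), 1024, true))
            {
                writer.Write(text);
            } // Dispose flushes the writer but leaves 'memory' open.

            memory.Position = 0;

            // Arguments: encoding, detect BOM, buffer size, leaveOpen = true.
            using (var reader = new StreamReader(memory, Encoding.UTF8, true, 1024, true))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

On older framework versions these overloads do not exist, so the Flush-and-let-go approach described above still applies there.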
Unless you're reading in streams bigger than your hard drive, I don't think you'll run out of memory:
http://blogs.msdn.com/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx