My question is based on inheriting a great deal of legacy code that I can't do much about. Basically, I have a device that produces a block of data, and a library that calls the device to create that block of data and then, for reasons I don't entirely understand and cannot change even if I wanted to, writes that block of data to disk.
This write is not instantaneous, but can take up to 90 seconds. In that time, the user wants to get a partial view of the data that's being produced, so I want to have a consumer thread which reads the data that the other library is writing to disk.
Before I even touch this legacy code, I want to mimic the problem using code I entirely control. I'm using C#, ostensibly because it provides a lot of the functionality I want.
In the producer class, I have this code creating a random block of data:
FileStream theFS = new FileStream(this.ScannerRawFileName,
    FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read);
//note that I need to be able to read this elsewhere...
BinaryWriter theBinaryWriter = new BinaryWriter(theFS);
int y, x;
for (y = 0; y < imheight; y++){
    ushort[] theData = new ushort[imwidth];
    for (x = 0; x < imwidth; x++){
        theData[x] = (ushort)(2*y + 4*x);
    }
    byte[] theNewArray = new byte[imwidth * 2];
    Buffer.BlockCopy(theData, 0, theNewArray, 0, imwidth * 2);
    theBinaryWriter.Write(theNewArray);
    Thread.Sleep(mScanThreadWait); //sleep for 50 milliseconds
    Progress = (float)(y-1 >= 0 ? y-1 : 0) / (float)imheight;
}
theFS.Close();
So far, so good. This code works. The current version (using FileStream and BinaryWriter) appears to be equivalent (though slower, because of the copy) to using File.Open with the same options and a BinaryFormatter on the ushort[] being written to disk.
But then I add a consumer thread:
FileStream theFS;
if (!File.Exists(theFileName)) {
    //do error handling
    return;
}
else {
    theFS = new FileStream(theFileName, FileMode.Open,
        FileAccess.Read, FileShare.Read);
    //very relaxed file opening
}
BinaryReader theReader = new BinaryReader(theFS);
//gotta do this copying in order to handle byte array swaps
//frustrating, but true.
byte[] theNewArray = theReader.ReadBytes(
    (int)(imheight * imwidth * inBase.Progress) * 2);
ushort[] theData = new ushort[(int)(theNewArray.Length / 2)];
Buffer.BlockCopy(theNewArray, 0, theData, 0, theNewArray.Length);
Now, it's possible that the declaration of theNewArray is broken and will cause some kind of read overflow. However, this code never gets that far, because it always breaks on opening the new FileStream, with a System.IO.IOException stating that another process has opened the file.
I'm setting the FileAccess and FileShare enumerations as stated in the FileStream documentation on MSDN, but it appears that I just can't do what I want to do (i.e., write in one thread, read in another). I realize that this application is a bit unorthodox, but when I get the actual device involved, I'm going to have to do the same thing, but using MFC.
In any event, what am I forgetting? Is what I want to do possible, since the documentation says it is?
Thanks!
mmr
Your consumer must specify FileShare.ReadWrite.
By trying to open the file with FileShare.Read in the consumer, you are saying "I want to open the file and let others read it at the same time." Since there is already a writer, that call fails; you have to allow concurrent writes alongside your reader.
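As a minimal sketch, the consumer's open call from the question would become something like this (same variable names as in the question):
FileStream theFS = new FileStream(theFileName, FileMode.Open,
    FileAccess.Read, FileShare.ReadWrite); // tolerate the producer still having the file open for writing
BinaryReader theReader = new BinaryReader(theFS);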
I haven't had time to test this, but I think you may need to call the Flush method of the BinaryWriter:
FileStream theFS = new FileStream(this.ScannerRawFileName,
    FileMode.OpenOrCreate, FileAccess.Write, FileShare.Read);
//note that I need to be able to read this elsewhere...
BinaryWriter theBinaryWriter = new BinaryWriter(theFS);
int y, x;
for (y = 0; y < imheight; y++){
    ushort[] theData = new ushort[imwidth];
    for (x = 0; x < imwidth; x++){
        theData[x] = (ushort)(2*y + 4*x);
    }
    byte[] theNewArray = new byte[imwidth * 2];
    Buffer.BlockCopy(theData, 0, theNewArray, 0, imwidth * 2);
    theBinaryWriter.Write(theNewArray);
    Thread.Sleep(mScanThreadWait); //sleep for 50 milliseconds
    Progress = (float)(y-1 >= 0 ? y-1 : 0) / (float)imheight;
    theBinaryWriter.Flush();
}
theFS.Close();
Sorry I haven't had time to test this. I ran into an issue with a file I was creating that was similar to this (although not exact) and a missing "Flush" was the culprit.
I believe Chuck is right, but keep in mind that the only reason this works at all is because the filesystem is smart enough to serialize your reads/writes; you have no locking on the file resource, and that's not a good thing :)
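If you do want some explicit coordination rather than relying on the filesystem, one option (a rough sketch, not part of the original code) is to have the producer publish how many bytes it has flushed, and have the consumer never read past that point:

using System.Threading;

// Hypothetical helper: the producer publishes the number of flushed bytes,
// the consumer reads only up to that count.
public static class ScanProgress
{
    private static long flushedBytes;

    public static void Report(long bytes) { Interlocked.Exchange(ref flushedBytes, bytes); }
    public static long ReadSafeLength()   { return Interlocked.Read(ref flushedBytes); }
}

// Producer, right after theBinaryWriter.Flush():
//     ScanProgress.Report(theFS.Position);
// Consumer, instead of computing the length from Progress:
//     byte[] theNewArray = theReader.ReadBytes((int)ScanProgress.ReadSafeLength());

That way the consumer can never attempt to read data the writer has not yet pushed to disk.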
Related
I would like to anticipate the exact size of my file before writing it to my device, so I can handle the error or prevent a crash in case there is no space on the corresponding drive. So I have this simple console script that generates the file:
using System;
using System.IO;

namespace myNamespace
{
    class Program
    {
        static void Main(string[] args) {
            byte[] myByteArray = new byte[100];
            MemoryStream stream = new MemoryStream();
            string fileName = "E:\\myFile.mine";
            FileStream myFs = new FileStream(fileName, FileMode.CreateNew);
            BinaryWriter toStreamWriter = new BinaryWriter(stream);
            BinaryWriter toFileWriter = new BinaryWriter(myFs, System.Text.Encoding.ASCII);

            myFs.Write(myByteArray, 0, myByteArray.Length);
            for (int i = 0; i < 30000; i++) {
                toStreamWriter.Write(i);
            }

            Console.WriteLine($"allocated memory: {stream.Capacity}");
            Console.WriteLine($"stream length: {stream.Length}");
            Console.WriteLine($"file size: {(stream.Length / 4) * 4.096}");

            toFileWriter.Write(stream.ToArray());
            Console.ReadLine();
        }
    }
}
I got to the point where I can anticipate the size of the file.
It will be (stream.Length / 4) * 4.096, but only as long as the remainder of stream.Length / 4 is 0.
For example for the case of adding 13589 integers to the stream
for (int i = 0; i < 13589; i++) {
    toStreamWriter.Write(i);
}
I get a file size of 55660,544 bytes from the script, but then it's 57344 bytes in Explorer.
That is the same result as if 14000 integers had been added instead of 13589.
How can I anticipate the exact size of my created file when the remainder of stream.Length / 4 is not 0?
Edit: For the potential helper running the script: you need to delete the created file every time the script is run! Of course, use a path and fileName of your choice :)
Regarding the relation (stream.Length / 4) * 4.096: the 4 comes from the byte size of an integer, and I guess the 4.096 comes from the array and file generation; any further explanation would be much appreciated.
Edit2: Note that if the remaining results are logged with:
for (int i = 13589; i <= 14000; i++) {
    Console.WriteLine($"result for {i} : {(i*4 / 4) * 4.096} ");
}
You obtain:
....
result for 13991 : 57307,136
result for 13992 : 57311,232
result for 13993 : 57315,328
result for 13994 : 57319,424
result for 13995 : 57323,52
result for 13996 : 57327,616
result for 13997 : 57331,712
result for 13998 : 57335,808
result for 13999 : 57339,904
result for 14000 : 57344
So I assume the file size is rounded up to the next cluster boundary above the byte stream size, with no decimal remainder. Would this logic make sense for anticipating the file size, even if the stream is very big?
From what I understand from the comments, the question is about how to get the actual file size of the file, not the file size on disk. And your code is actually almost correct in doing so.
The math is pretty basic. In your example, you create a file stream and write a 100-byte-long array to it. Then you create a memory stream and write 30000 integers into it. Then you write the memory stream into the file stream. Considering that each integer is 4 bytes long, as specified by C#, the resulting file has a size of (30000 * 4) + 100 = 120100 bytes. At least for me, that's exactly what the file properties say in Windows Explorer.
You could get the same result a bit easier with the following code:
FileStream myFs = new FileStream("test.file", FileMode.CreateNew);
byte[] myByteArray = new byte[100];
myFs.Write(myByteArray, 0, myByteArray.Length);
BinaryWriter toFileWriter = new BinaryWriter(myFs, System.Text.Encoding.ASCII);
for (int i = 0; i < 30000; i++)
{
toFileWriter.Write(i);
}
Console.WriteLine($"stream lenght {myFs.Length}");
myFs.Close();
This will return a stream length of 120100 bytes.
In case I misunderstood your question and comments and you were actually trying to get the file size on disk:
Don't go there. You cannot reliably predict the file size on disk, because it depends on variable circumstances: file compression, encryption, RAID types, file systems, disk types, and operating systems.
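If the goal is simply to refuse to write when the drive is too small (as the question mentions), a rough sketch of a pre-write check is below. It is not part of the original code; it uses the predicted logical size (100 header bytes plus count * sizeof(int), per the example above) as an approximation, and DriveInfo.AvailableFreeSpace for the free space on the target drive:

using System;
using System.IO;

// Predicted logical file size for the example: 100 header bytes + 30000 ints.
long predictedSize = 100 + (30000L * sizeof(int)); // 120100 bytes

// Free space on the target drive; DriveInfo accepts a root such as "E:\".
DriveInfo drive = new DriveInfo(Path.GetPathRoot("E:\\myFile.mine"));
if (drive.AvailableFreeSpace < predictedSize)
{
    throw new IOException("Not enough space on drive " + drive.Name);
}

Because of the on-disk factors listed above, treat this as a sanity check rather than an exact guarantee.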
I am writing an encryption program for fun. I stumbled across the problem that a deleted file is not necessarily gone, even if it is overwritten by opening it with a FileStream and writing a bunch of random bytes into it.
But my current implementation creates a new temporary file and writes encrypted/decrypted data into it to save RAM. I was now wondering whether the same problem applies even if I don't close the FileStream object, which has existed since the file's creation.
So if I just set my stream position back to zero and overwrite every single byte, does it really write to the same positions as in the beginning, or can parts of the temp file survive? If so, is there any workaround I could use?
My current approach:
var fileStream = new FileStream(path, FileMode.Create);
fileStream.Write(/* possibly decrypted data */);
fileStream.Position = 0;

byte[] bytes = RandomBytes();
long amount = (fileStream.Length / bytes.Length + 1);
for (long i = 0; i < amount; i++)
{
    fileStream.Write(bytes, 0, bytes.Length);
}

string name = fileStream.Name;
fileStream.Close();
File.Delete(name);
I have written an application that implements a file copy as shown below. I was wondering why, when copying from one network drive to another, the copy times are huge (20-30 minutes for a 300 MB file) with the following code:
public static void CopyFileToDestination(string source, string dest)
{
    _log.Debug(string.Format("Copying file {0} to {1}", source, dest));
    DateTime start = DateTime.Now;

    string destinationFolderPath = Path.GetDirectoryName(dest);
    if (!Directory.Exists(destinationFolderPath))
    {
        Directory.CreateDirectory(destinationFolderPath);
    }

    if (File.Exists(dest))
    {
        File.Delete(dest);
    }

    FileInfo sourceFile = new FileInfo(source);
    if (!sourceFile.Exists)
    {
        throw new FileNotFoundException("source = " + source);
    }

    long totalBytesToTransfer = sourceFile.Length;
    if (!CheckForFreeDiskSpace(dest, totalBytesToTransfer))
    {
        throw new ApplicationException(string.Format("Unable to copy file {0}: Not enough disk space on drive {1}.",
            source, dest.Substring(0, 1).ToUpper()));
    }

    long bytesTransferred = 0;
    using (FileStream reader = sourceFile.OpenRead())
    {
        using (FileStream writer = new FileStream(dest, FileMode.OpenOrCreate, FileAccess.Write))
        {
            byte[] buf = new byte[64 * 1024];
            int bytesRead = reader.Read(buf, 0, buf.Length);
            double lastPercentage = 0;

            while (bytesRead > 0)
            {
                double percentage = ((float)bytesTransferred / (float)totalBytesToTransfer) * 100.0;
                writer.Write(buf, 0, bytesRead);
                bytesTransferred += bytesRead;

                if (Math.Abs(lastPercentage - percentage) > 0.25)
                {
                    System.Diagnostics.Debug.WriteLine(string.Format("{0} : Copied {1:#,##0} of {2:#,##0} MB ({3:0.0}%)",
                        sourceFile.Name,
                        bytesTransferred / (1024 * 1024),
                        totalBytesToTransfer / (1024 * 1024),
                        percentage));
                    lastPercentage = percentage;
                }

                bytesRead = reader.Read(buf, 0, buf.Length);
            }
        }
    }

    System.Diagnostics.Debug.WriteLine(string.Format("{0} : Done copying", sourceFile.Name));
    _log.Debug(string.Format("{0} copied in {1:#,##0} seconds", sourceFile.Name, (DateTime.Now - start).TotalSeconds));
}
However, with a simple File.Copy, the time is as expected.
Does anyone have any insight? Could it be because we are making the copy in small chunks?
Changing the size of your buf variable doesn't change the size of the buffer that FileStream.Read or FileStream.Write uses when communicating with the file system. To see any change with buffer size, you have to specify the buffer size when you open the file.
As I recall, the default buffer size is 4K. Performance testing I did some time ago showed that the sweet spot is somewhere between 64K and 256K, with 64K being more consistently the best choice.
You should change your File.OpenRead() to:
new FileStream(sourceFile.FullName, FileMode.Open, FileAccess.Read, FileShare.None, BufferSize)
Change the FileShare value if you don't want exclusive access, and declare BufferSize as a constant equal to whatever buffer size you want. I use 64*1024.
Also, change the way you open your output file to:
new FileStream(dest, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize)
Note that I used FileMode.Create rather than FileMode.OpenOrCreate. If you use OpenOrCreate and the source file is smaller than the existing destination file, I don't think the file is truncated when you're done writing. So the destination file would contain extraneous data.
That said, I wouldn't expect this to change your copy time from 20-30 minutes down to the few seconds that it should take. I suppose it could if every low-level read requires a network call. With the default 4K buffer, you're making 16 read calls to the file system in order to fill your 64K buffer. So by increasing your buffer size you greatly reduce the number of OS calls (and potentially the number of network transactions) your code makes.
Finally, there's no need to check to see if a file exists before you delete it. File.Delete silently ignores an attempt to delete a file that doesn't exist.
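If it helps to see the pieces in context, here is a minimal sketch of the copy loop with both streams opened using the larger buffer (BufferSize and the loop shape are illustrative, not the poster's exact code):

const int BufferSize = 64 * 1024; // assumed sweet spot; tune for your network

using (FileStream reader = new FileStream(sourceFile.FullName, FileMode.Open,
        FileAccess.Read, FileShare.None, BufferSize))
using (FileStream writer = new FileStream(dest, FileMode.Create,
        FileAccess.Write, FileShare.None, BufferSize))
{
    byte[] buf = new byte[BufferSize];
    int bytesRead;
    while ((bytesRead = reader.Read(buf, 0, buf.Length)) > 0)
    {
        writer.Write(buf, 0, bytesRead);
    }
}

The progress reporting from the original method can be layered back in around the inner loop without affecting the buffer sizes.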
Call the SetLength method on your writer stream before the actual copying; this should reduce operations by the target disk.
Like so
writer.SetLength(totalBytesToTransfer);
You may need to set the stream's position back to the start after calling this method, by using Seek. Check the position of the stream after calling SetLength; it should still be zero.
writer.Seek(0, SeekOrigin.Begin); // Not sure on that one
If that is still too slow, use SetFileValidData.
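SetFileValidData is a Win32 call, so it needs P/Invoke, a handle opened for writing, and the SE_MANAGE_VOLUME_NAME privilege (typically an elevated process). A rough sketch of how it could be wired up (declaration goes inside your class; treat this as an illustration, not tested code):

using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Requires the SE_MANAGE_VOLUME_NAME privilege; fails otherwise.
[DllImport("kernel32.dll", SetLastError = true)]
static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

// Usage, after pre-sizing the destination stream:
//     writer.SetLength(totalBytesToTransfer);
//     SetFileValidData(writer.SafeFileHandle, totalBytesToTransfer);

This skips the zero-fill Windows normally performs when extending a file, which is where the extra disk operations come from.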
Is there a way to get the "Path" to a memorystream?
For example, if I want to use CMD and point it to a file path like "C:\...", but the file is instead in a MemoryStream, is it possible to point it there?
I have tried searching on this but I can't find any clear information.
EDIT:
If it helps, the thing I want to access is an image file, a screen capture, like this:
using (Bitmap b = new Bitmap(Screen.PrimaryScreen.Bounds.Width, Screen.PrimaryScreen.Bounds.Height))
{
    using (Graphics g = Graphics.FromImage(b))
    {
        g.CopyFromScreen(0, 0, 0, 0, Screen.PrimaryScreen.Bounds.Size, CopyPixelOperation.SourceCopy);
    }
    using (MemoryStream ms = new MemoryStream())
    {
        b.Save(ms, ImageFormat.Bmp);
        StreamReader read = new StreamReader(ms);
        ms.Position = 0;
        var cwebp = new Process
        {
            StartInfo =
            {
                WindowStyle = ProcessWindowStyle.Normal,
                FileName = "cwebp.exe",
                Arguments = string.Format(
                    "-q 100 -lossless -m 6 -alpha_q 100 \"{0}\" -o \"{1}\"", ms, "C:\\test.webp")
            },
        };
        cwebp.Start();
    }
}
and then some random testing to try to get it to work....
The thing I want to pass the data to is cwebp, a WebP encoder.
That is why I must use CMD; I can't work with it at the C# level, otherwise I wouldn't have this problem.
Yeah that is usually protected. If you know where it is, you might be able to grab it with an unsafe pointer. It might be easier to write it to a text file that cmd could read, or push it to Console to read.
If using .NET 4.0 or greater, you can use a MemoryMappedFile. I haven't toyed with this class since the 4.0 beta. However, my understanding is it's useful for writing memory to disk in cases where you are dealing with large amounts of data or want some level of application memory sharing.
Usage per MSDN:
static void Main(string[] args)
{
    long offset = 0x10000000; // 256 megabytes
    long length = 0x20000000; // 512 megabytes

    // Create the memory-mapped file.
    using (var mmf = MemoryMappedFile.CreateFromFile(@"c:\ExtremelyLargeImage.data", FileMode.Open, "ImgA"))
    {
        // Create a random access view, from the 256th megabyte (the offset)
        // to the 768th megabyte (the offset plus length).
        using (var accessor = mmf.CreateViewAccessor(offset, length))
        {
            int colorSize = Marshal.SizeOf(typeof(MyColor));
            MyColor color;

            // Make changes to the view.
            for (long i = 0; i < length; i += colorSize)
            {
                accessor.Read(i, out color);
                color.Brighten(10);
                accessor.Write(i, ref color);
            }
        }
    }
}
If cwebp.exe is expecting a filename, there is nothing you can put on the command line that satisfies your criteria. Anything enough like a file that the external program can open it won't be able to get its data from your program's memory. There are a few possibilities, but they probably all require changes to cwebp.exe:
You can write to the new process's standard in
You can create a named pipe from which the process can read your data
You can create a named shared memory object from which the other process can read
You haven't said why you're avoiding writing to a file, so it's hard to say which is best.
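For the first option (standard input), the shape of the code would be roughly the following. This is a sketch only: it assumes the encoder can be told to read its input from stdin (the "-- -" argument here is an assumption, so check the encoder's documentation), and it reuses the MemoryStream ms from the question:

using System.Diagnostics;

var psi = new ProcessStartInfo
{
    FileName = "cwebp.exe",
    // Reading from stdin via "-- -" is an assumption; verify against the encoder's docs.
    Arguments = "-q 100 -lossless -m 6 -alpha_q 100 -o \"C:\\test.webp\" -- -",
    UseShellExecute = false,        // required for stream redirection
    RedirectStandardInput = true
};

using (var cwebp = Process.Start(psi))
{
    ms.Position = 0;
    ms.CopyTo(cwebp.StandardInput.BaseStream); // push the in-memory image to the child's stdin
    cwebp.StandardInput.Close();
    cwebp.WaitForExit();
}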
I have to split a huge file into many smaller files. Each of the destination files is defined by an offset and length as the number of bytes. I'm using the following code:
private void copy(string srcFile, string dstFile, int offset, int length)
{
    BinaryReader reader = new BinaryReader(File.OpenRead(srcFile));
    reader.BaseStream.Seek(offset, SeekOrigin.Begin);
    byte[] buffer = reader.ReadBytes(length);

    BinaryWriter writer = new BinaryWriter(File.OpenWrite(dstFile));
    writer.Write(buffer);
}
Considering that I have to call this function about 100,000 times, it is remarkably slow.
Is there a way to make the Writer connected directly to the Reader? (That is, without actually loading the contents into the Buffer in memory.)
I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:
public static void CopySection(Stream input, string targetFile, int length)
{
    byte[] buffer = new byte[8192];
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception.
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:
public static void CopySection(Stream input, string targetFile,
                               int length, byte[] buffer)
{
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception.
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
Note that this also closes the output stream (due to the using statement) which your original code didn't.
The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.
I think it'll be significantly faster, but obviously you'll need to try it to see...
This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
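As a usage sketch of the approach above (the sections collection and its TargetFile/Length members are illustrative; the split definitions come from wherever yours live):

byte[] buffer = new byte[8192];
using (Stream input = new BufferedStream(File.OpenRead(srcFile)))
{
    // Sections are assumed to be contiguous and in file order, so no Seek is needed.
    foreach (var section in sections)
    {
        CopySection(input, section.TargetFile, section.Length, buffer);
    }
}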
The fastest way to do file I/O from C# is to use the Windows ReadFile and WriteFile functions. I have written a C# class that encapsulates this capability, as well as a benchmarking program that looks at different I/O methods, including BinaryReader and BinaryWriter. See my blog post at:
http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
How large is length? You may do better to re-use a fixed sized (moderately large, but not obscene) buffer, and forget BinaryReader... just use Stream.Read and Stream.Write.
(edit) something like:
private static void copy(string srcFile, string dstFile, int offset,
    int length, byte[] buffer)
{
    using (Stream inStream = File.OpenRead(srcFile))
    using (Stream outStream = File.OpenWrite(dstFile))
    {
        inStream.Seek(offset, SeekOrigin.Begin);
        int bufferLength = buffer.Length, bytesRead;

        while (length > bufferLength &&
               (bytesRead = inStream.Read(buffer, 0, bufferLength)) > 0)
        {
            outStream.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }

        while (length > 0 &&
               (bytesRead = inStream.Read(buffer, 0, length)) > 0)
        {
            outStream.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}
You shouldn't re-open the source file each time you do a copy; better to open it once and pass the resulting BinaryReader to the copy function. Also, it might help if you order your seeks, so you don't make big jumps inside the file.
If the lengths aren't too big, you can also try to group several copy calls by grouping offsets that are near to each other and reading the whole block you need for them, for example:
offset = 1234, length = 34
offset = 1300, length = 40
offset = 1350, length = 1000
can be grouped to one read:
offset = 1234, length = 1074
Then you only have to "seek" in your buffer and can write the three new files from there without having to read again.
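A rough sketch of that grouping idea, reusing the reader from the question (groupOffset, groupLength, and the parts list with Offset/Length/FileName members are illustrative names, matching the 1234/1074 example above):

// One read covering the whole grouped range.
reader.BaseStream.Seek(groupOffset, SeekOrigin.Begin);
byte[] block = reader.ReadBytes(groupLength);

// Then slice each destination file out of the in-memory block.
foreach (var part in parts)
{
    byte[] slice = new byte[part.Length];
    Buffer.BlockCopy(block, (int)(part.Offset - groupOffset), slice, 0, part.Length);
    File.WriteAllBytes(part.FileName, slice);
}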
Have you considered using the CCR? Since you are writing to separate files, you can do everything in parallel (read and write), and the CCR makes it very easy to do this.
static void Main(string[] args)
{
    Dispatcher dp = new Dispatcher();
    DispatcherQueue dq = new DispatcherQueue("DQ", dp);

    Port<long> offsetPort = new Port<long>();
    Arbiter.Activate(dq, Arbiter.Receive<long>(true, offsetPort,
        new Handler<long>(Split)));

    FileStream fs = File.Open(file_path, FileMode.Open);
    long size = fs.Length;
    fs.Dispose();

    for (long i = 0; i < size; i += split_size)
    {
        offsetPort.Post(i);
    }
}

private static void Split(long offset)
{
    FileStream reader = new FileStream(file_path, FileMode.Open,
        FileAccess.Read);
    reader.Seek(offset, SeekOrigin.Begin);

    long toRead = 0;
    if (offset + split_size <= reader.Length)
        toRead = split_size;
    else
        toRead = reader.Length - offset;

    byte[] buff = new byte[toRead];
    reader.Read(buff, 0, (int)toRead);
    reader.Dispose();

    File.WriteAllBytes("c:\\out" + offset + ".txt", buff);
}
This code posts offsets to a CCR port, which causes a thread to be created to execute the code in the Split method. This means the file is opened multiple times, but it gets rid of the need for synchronization. You can make it more memory efficient, but you'll have to sacrifice speed.
The first thing I would recommend is to take measurements. Where are you losing your time? Is it in the read, or the write?
Over 100,000 accesses (sum the times; a rough Stopwatch sketch follows below):
How much time is spent allocating the buffer array?
How much time is spent opening the file for read (is it the same file every time?)
How much time is spent in read and write operations?
If you aren't doing any type of transformation on the file, do you need a BinaryWriter, or can you use a filestream for writes? (try it, do you get identical output? does it save time?)
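One crude way to sum those times over the 100,000 calls is a few Stopwatch instances around each phase of the original copy method (a sketch, not the poster's code):

var openTime  = new System.Diagnostics.Stopwatch();
var readTime  = new System.Diagnostics.Stopwatch();
var writeTime = new System.Diagnostics.Stopwatch();

// Inside copy(srcFile, dstFile, offset, length):
openTime.Start();
var reader = new BinaryReader(File.OpenRead(srcFile));
openTime.Stop();

readTime.Start();
reader.BaseStream.Seek(offset, SeekOrigin.Begin);
byte[] buffer = reader.ReadBytes(length);
readTime.Stop();

writeTime.Start();
using (var writer = new BinaryWriter(File.OpenWrite(dstFile)))
{
    writer.Write(buffer);
}
writeTime.Stop();

// After all calls have completed:
Console.WriteLine("open: {0}  read: {1}  write: {2}",
    openTime.Elapsed, readTime.Elapsed, writeTime.Elapsed);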
Using FileStream + StreamWriter I know it's possible to create massive files in little time (less than 1 min 30 seconds). I generate three files totaling 700+ megabytes from one file using that technique.
Your primary problem with the code you're using is that you are opening a file every time. That is creating file I/O overhead.
If you knew the names of the files you would be generating ahead of time, you could extract the File.OpenWrite into a separate method; it will increase the speed. Without seeing the code that determines how you are splitting the files, I don't think you can get much faster.
No one suggests threading? Writing the smaller files looks like a textbook example of where threads are useful. Set up a bunch of threads to create the smaller files; that way, you can create them all in parallel and you don't need to wait for each one to finish. My assumption is that creating the files (disk operations) will take WAY longer than splitting up the data. And of course you should verify first that a sequential approach is not adequate.
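On a current framework, a minimal sketch of that idea with Tasks (it borrows size, split_size, and the Split method from the CCR example above, which are assumed to exist):

using System.Collections.Generic;
using System.Threading.Tasks;

var tasks = new List<Task>();
for (long i = 0; i < size; i += split_size)
{
    long offset = i;                              // capture the loop variable for the lambda
    tasks.Add(Task.Run(() => Split(offset)));     // each chunk is written independently
}
Task.WaitAll(tasks.ToArray());

Whether this actually helps depends on the disk: spinning disks often get slower with concurrent writers, so measure before committing to it.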
(For future reference.)
Quite possibly the fastest way to do this would be to use memory mapped files (so primarily copying memory, and the OS handling the file reads/writes via its paging/memory management).
Memory Mapped files are supported in managed code in .NET 4.0.
But as noted, you need to profile, and expect to switch to native code for maximum performance.
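A sketch of what the memory-mapped approach could look like for the splitting problem (the parts list and its Offset/Length/FileName members are illustrative; requires .NET 4.0+):

using System.IO;
using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateFromFile(srcFile, FileMode.Open))
{
    foreach (var part in parts)
    {
        using (var view = mmf.CreateViewStream(part.Offset, part.Length, MemoryMappedFileAccess.Read))
        using (var output = File.Create(part.FileName))
        {
            view.CopyTo(output); // the OS pages the source in; no explicit read buffer to manage
        }
    }
}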