File.WriteAllBytes does not block - c#

I have a simple piece of code like so:
File.WriteAllBytes(Path.Combine(temp, node.Name), stuffFile.Read(0, node.FileHeader.FileSize));
One would think that WriteAllBytes would be a blocking call, as it has Async counterparts in C# 5.0 and nothing in the MSDN documentation says it is non-blocking. HOWEVER, when a file is of a reasonable size (not massive, but somewhere in the realm of 20 MB), the call afterwards which opens the file seems to run before the writing is finished: the file gets opened (the program complains it's corrupted, rightly so) and WriteAllBytes then complains the file is open in another process. What is going on here?! For curiosity's sake, this is the code used to open the file:
System.Diagnostics.Process.Start(Path.Combine(temp, node.Name));
Has anyone experienced this sort of weirdness before? Or am I just doing something silly?
If it is indeed blocking, what could possibly be causing this issue?
EDIT: I'll put the full method up.
var node = item.Tag as FileNode;
stuffFile.Position = node.FileOffset;
string temp = Path.GetTempPath();
File.WriteAllBytes(Path.Combine(temp, node.Name), stuffFile.Read(0, node.FileHeader.FileSize));
System.Diagnostics.Process.Start(Path.Combine(temp, node.Name));
What seems to be happening is that Process.Start is being called BEFORE WriteAllBytes has finished; it attempts to open the file, and then WriteAllBytes complains about another process holding a lock on the file.

No, WriteAllBytes is a blocking, synchronous method. As you stated, if it were not, the documentation would say so.
Possibly the virus scanner is still busy scanning the file that you just wrote, and is responsible for locking the file. Try temporarily disabling the scanner to test my hypothesis.

I think your problem may be with the way you are reading from the file. Note that Stream.Read (and FileStream.Read) is not required to read everything you request.
In other words, your call stuffFile.Read(0, node.FileHeader.FileSize) might (and sometimes definitely will) return an array of length node.FileHeader.FileSize that contains some bytes of the file at the beginning, followed by zeros.
The bug is in your UsableFileStream.Read method. You could fix it by having it read the entire file into memory:
public byte[] Read(int offset, int count)
{
    // There are still bugs in this method, like assuming that 'count' bytes
    // can actually be read from the file
    byte[] temp = new byte[count];
    int bytesRead;
    while (count > 0 && (bytesRead = _stream.Read(temp, offset, count)) > 0)
    {
        offset += bytesRead;
        count -= bytesRead;
    }
    return temp;
}
But since you are only using this to copy file contents, you could avoid having these potentially massive allocations and use Stream.CopyTo in your tree_MouseDoubleClick:
var node = item.Tag as FileNode;
stuffFile.Position = node.FileOffset;
string temp = Path.GetTempPath();
using (var output = File.Create(Path.Combine(temp, node.Name)))
    stuffFile._stream.CopyTo(output);
System.Diagnostics.Process.Start(Path.Combine(temp, node.Name));

A little late, but adding for the benefit of anyone else that might come along.
The underlying C# implementation of File.WriteAllBytes may well be synchronous, but the authors of C# cannot control at the OS level how the writing to disk is handled.
Something called write caching means that when C# asks the OS to save the file to disk, the OS may report success before the file has actually been fully written to the disk, causing the issue the OP highlighted.
In that case, after writing, it may be better to sleep in a loop and keep checking whether the file is still locked before calling Process.Start.
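A minimal sketch of that wait-and-retry idea (the helper name, attempt count and delay are illustrative assumptions, not from the original post):
using System.IO;
using System.Threading;

// Hypothetical helper: returns true once the file can be opened exclusively,
// i.e. nothing (a write-cache flush, a virus scanner, ...) is still holding it.
static bool WaitForFileUnlock(string path, int maxAttempts = 50, int delayMs = 100)
{
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
                return true;
        }
        catch (IOException)
        {
            Thread.Sleep(delayMs); // still locked, wait and try again
        }
    }
    return false;
}

// Usage:
// if (WaitForFileUnlock(Path.Combine(temp, node.Name)))
//     System.Diagnostics.Process.Start(Path.Combine(temp, node.Name));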
You can see that I run into problems caused by this here: C#, Entity Framework Core & PostgreSql : inserting a single row takes 20+ seconds
Also, in the final sentence of the OP's post, "and then WriteAllBytes complains about another process holding the lock on the file", I think they actually meant to write "and then Process.Start complains", which seems to have caused some confusion in the comments.

Related

Out of Memory Exception when using File Stream Write Byte to Output Progress Through the Console

I have the following code that throws an out of memory exception when writing large files. Is there something I'm missing?
I am not sure why it throws an out of memory error, as I thought the FileStream would only use a maximum of 4096 bytes for its buffer. I am not entirely sure what is meant by the buffer, to be honest, and any advice would be appreciated.
public static async Task CreateRandomFile(string pathway, int size, IProgress<int> prog)
{
    byte[] fileSize = new byte[size];
    new Random().NextBytes(fileSize);
    await Task.Run(() =>
    {
        using (FileStream fs = File.Create(pathway, 4096))
        {
            for (int i = 0; i < size; i++)
            {
                fs.WriteByte(fileSize[i]);
                prog.Report(i);
            }
        }
    });
}

public static void p_ProgressChanged(object sender, int e)
{
    int pos = Console.CursorTop;
    Console.WriteLine("Progress Copied: " + e);
    Console.SetCursorPosition(0, pos);
}

public static void Main()
{
    Console.WriteLine("Testing CopyLearning");
    //CopyFile()
    Progress<int> p = new Progress<int>();
    p.ProgressChanged += p_ProgressChanged;
    Task ta = CreateRandomFile(@"D:\Programming\Testing\RandomFile.asd", 99999999, p);
    ta.Wait();
}
Edit: the 99,999,999 was just chosen to create a roughly 99 MB file.
Note: if I comment out prog.Report(i), it works fine.
For some reason, the error seems to occur at the line
Console.WriteLine("Progress Copied: " + e);
I am not entirely sure why this causes an error. Could the error have been caused by the progress event?
Edit 2: I have followed advice to change the code so that it reports progress only every 4000 bytes, using the following:
if (i % 4000 == 0)
    prog.Report(i);
For some reason, I am now able to write files up to 900 MB just fine.
I guess the question is: why does the "Edit 2" code allow it to write up to 900 MB just fine? Is it because it reports progress and writes to the console up to 4000x less often than before? I didn't realize the console would take up so much memory, especially since I assumed all it was doing was outputting "Progress Copied".
Edit 3:
For some reason, when I change the loop as follows:
for (int i = 0; i < size; i++)
{
    fs.WriteByte(fileSize[i]);
    Console.WriteLine(i);
    prog.Report(i);
}
i.e. with a Console.WriteLine() before the prog.Report(i), it works fine and copies the file, albeit taking a very long time to do so. This leads me to believe that this is somehow a console-related issue, but I am not sure why.
fs.WriteByte(fileSize[i]);
prog.Report(i);
You created a fire-hose problem. After deadlocks and threading races, probably the 3rd most likely problem caused by threads. And just as hard to diagnose.
Easiest to see by using the debugger's Debug + Windows + Threads window and looking at the thread that is executing CreateRandomFile(). With some luck, you'll see it has completed and has written all 99 MB. But the progress reported on the console is far behind this, having only reported 125 KB written, give or take.
Core issue is the way Progress<>.Report() works. It uses SynchronizationContext.Post() to invoke the ProgressChanged event handler. In a console mode app that will call ThreadPool.QueueUserWorkItem(). That's quite fast, your CreateRandomFile() method won't be bogged down much by it.
But the event handler itself is quite a lot slower, console output is not very fast. So in effect, you are adding threadpool work requests at an enormous rate, 99 million of them in a handful of seconds. No way for the threadpool scheduler to keep up, you'll have roughly 4 of them executing at the same time. All competing to write to the console as well, only one of them can acquire the underlying lock.
So it is the threadpool scheduler that causes OOM, forced to store so many work requests.
And sure, when you call Report() less frequently, the fire-hose problem is much less severe. It is not actually that simple to ensure it never causes a problem, although directly calling Console.Write() is an obvious fix. Ultimately it is simple: create a usable UI that is useful to a human. Nobody likes a crazily scrolling window or a blur of text. Reporting progress no more than 20 times per second is plenty good enough for the user's eyes, and the console has no trouble keeping up with that.
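For illustration, a minimal sketch of that kind of throttling applied to the question's loop, reporting roughly 20 times per second with a Stopwatch. It reuses the question's pathway, size, fileSize and prog variables; the 50 ms threshold and the final Report call are my own choices, not part of the original answer:
// requires: using System.Diagnostics; for Stopwatch
var throttle = Stopwatch.StartNew();
using (FileStream fs = File.Create(pathway, 4096))
{
    for (int i = 0; i < size; i++)
    {
        fs.WriteByte(fileSize[i]);
        if (throttle.ElapsedMilliseconds >= 50)   // at most ~20 reports per second
        {
            prog.Report(i);
            throttle.Restart();
        }
    }
    prog.Report(size);   // one final report so the last progress value is shown
}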

Deleting file after uploading ASP.NET

I made a system where a user can upload a file (image) to a server and the server saves it. All is good, but when I want to delete the files uploaded by the user, I get an exception saying:
the process cannot access the file because it is being used by another process
This is the code for saving the file:
HttpFileCollection files = httpRequest.Files;
for (int i = 0; i < files.Count; i++) {
    var postedFile = files[i];
    // I tried this one before, but I read that I should .Dispose() files, therefore
    // I settled to the other, uncommented solution (however, both of them do the same thing)
    //postedFile.SaveAs(filePath);
    using (FileStream fs = File.Create(filePath)) {
        postedFile.InputStream.CopyTo(fs);
        postedFile.InputStream.Close();
        postedFile.InputStream.Dispose();
        fs.Dispose();
        fs.Close();
    }
}
The deleting of files is quite simple. In a method called Delete, I call this method:
...
File.Delete(HttpContext.Current.Server.MapPath(CORRECT_PATH_TO_FILE));
...
Any suggestions on how to solve this?
Thanks
Just as Michael Perrenoud suggested in the comments on my question, I was also opening the file in another class and not disposing of it when I was done working with it. The problem is therefore solved. Thanks!
Where are you trying the file delete method? As part of the loop? If so, it is natural to have it locked. If outside of the loop, then it is a different problem (perhaps not garbage collected yet?).
To avoid the loop problem, gather a list of the locations you are going to delete (declare it outside of the loop; it can be populated within), and then delete in another "clean up" loop afterwards (a separate method is even better for reusability), as sketched below.
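A rough sketch of that pattern, reusing the question's files and filePath variables (the list name and the clean-up placement are illustrative):
// requires: using System.Collections.Generic;
var pathsToDelete = new List<string>();      // declared outside the loop

for (int i = 0; i < files.Count; i++)
{
    var postedFile = files[i];
    using (FileStream fs = File.Create(filePath))
    {
        postedFile.InputStream.CopyTo(fs);
    }                                        // stream disposed here
    pathsToDelete.Add(filePath);             // remember it, don't delete inside the loop
}

// Separate clean-up loop (or better, a separate method), run once nothing holds the files open.
foreach (string path in pathsToDelete)
{
    File.Delete(path);
}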
NOTE: Close() before Dispose(), not the other way around. You actually do not have to do both, as Dispose() should always make sure everything is cleaned up (especially in the .NET Framework's uses of IDisposable), but I don't see any harm in Close() followed by Dispose().

Read Changes on a text file dynamically c#

I have a program that continuously writes its log to a text file.
I don't have the source code of it, so I can not modify it in any way and it is also protected with Themida.
I need to read the log file and execute some scripts depending on the content of the file.
I can not delete the file because the program that is continuously writing to it has locked the file.
So what would be the best way to read the file, and to read only the new lines?
Should I save the position of the last line read? Or is there something else in C# that would be useful for solving this?
Perhaps use the FileSystemWatcher along with opening the file with FileShare (as it is being used by another process). Hans Passant has provided a nice answer for this part here:
var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (var sr = new StreamReader(fs)) {
    // etc...
}
Have a look at this question and the accepted answer which may also help.
using (var fs = new FileStream("test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite | FileShare.Delete))
using (var reader = new StreamReader(fs))
{
    while (true)
    {
        var line = reader.ReadLine();
        if (!String.IsNullOrWhiteSpace(line))
            Console.WriteLine("Line read: " + line);
    }
}
I tested the above code and it works if you are trying to read one line at a time. The only issue is that if a line is flushed to the file before it has finished being written, you will read that line in multiple parts. As long as the logging system writes each line all at once, it should be okay.
If not, then you may want to read into a buffer instead of using ReadLine, so you can parse the buffer yourself by detecting each Environment.NewLine substring, as in the sketch below.
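A rough sketch of that buffer-based variant (the buffer size and the StringBuilder carry-over are illustrative choices, not a definitive implementation):
using System;
using System.IO;
using System.Text;

var carry = new StringBuilder();             // holds any trailing partial line
var buffer = new char[4096];

using (var fs = new FileStream("test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(fs))
{
    while (true)
    {
        int read = reader.Read(buffer, 0, buffer.Length);
        if (read == 0)
            continue;                        // nothing new yet, keep polling

        carry.Append(buffer, 0, read);
        string text = carry.ToString();
        int cut = text.LastIndexOf(Environment.NewLine);
        if (cut < 0)
            continue;                        // no complete line yet

        foreach (var line in text.Substring(0, cut).Split(new[] { Environment.NewLine }, StringSplitOptions.None))
            Console.WriteLine("Line read: " + line);

        carry.Clear();
        carry.Append(text.Substring(cut + Environment.NewLine.Length));
    }
}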
You can just keep calling ReadToEnd() in a tight loop. Even after it reaches the end of the file it'll just return an empty string "". If some more data is written to the file it will pick it up on a subsequent call.
while (true)
{
    string moreData = streamReader.ReadToEnd();
    Thread.Sleep(100);
}
Bear in mind you might read partial lines this way. Also if you are dealing with very large files you will probably need another approach.
Use FileSystemWatcher to detect changes, then get the new lines by keeping track of the last read position and seeking the file to it.
http://msdn.microsoft.com/en-us/library/system.io.filestream.seek.aspx
The log file is being "continuously" updated so you really shouldn't use FileSystemWatcher to raise an event each time the file changes. This would be triggering continuously, and you already know it will be very frequently changing.
I'd suggest using a timer event to periodically process the file. Read this SO answer for a good pattern for using System.Threading.Timer¹. Keep a file stream open for reading, or reopen it each time and Seek to the end position of your last successful read. By "last successful read" I mean that you should encapsulate the reading and validating of a complete log line. Once you've successfully read and validated a log line, you have a new position for the next Seek (see the sketch below).
¹ Note that System.Threading.Timer executes on a system-supplied thread that is kept in business by the ThreadPool. For short tasks this is more desirable than a dedicated thread.
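A minimal sketch of that approach; the one-second interval, the class and field names, and the UTF-8 decoding are my own assumptions for illustration, not taken from the answer:
using System;
using System.IO;
using System.Text;
using System.Threading;

class LogTailer
{
    private readonly string _path;
    private readonly Timer _timer;
    private readonly StringBuilder _carry = new StringBuilder(); // trailing partial line
    private long _lastPosition;                                  // end of the last successful read

    public LogTailer(string path)
    {
        _path = path;
        _timer = new Timer(_ => Poll(), null, TimeSpan.Zero, TimeSpan.FromSeconds(1));
    }

    private void Poll()
    {
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length <= _lastPosition)
                return;                                          // nothing new since the last poll

            fs.Seek(_lastPosition, SeekOrigin.Begin);
            var buffer = new byte[fs.Length - _lastPosition];
            int read = fs.Read(buffer, 0, buffer.Length);
            _lastPosition += read;
            // Note: multi-byte characters split across reads are not handled in this sketch.
            _carry.Append(Encoding.UTF8.GetString(buffer, 0, read));
        }

        // Only act on complete lines; keep any trailing partial line for the next poll.
        string text = _carry.ToString();
        int cut = text.LastIndexOf('\n');
        if (cut < 0)
            return;
        foreach (var line in text.Substring(0, cut).Split('\n'))
            Console.WriteLine("New log line: " + line.TrimEnd('\r'));
        _carry.Clear();
        _carry.Append(text.Substring(cut + 1));
    }
}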
Use this answer on another post c# continuously read file.
This one is quite efficient, and it checks once per second if the file size has changed. So the file is usually not read-locked as a result.
The other answers are quite valid and simple. A couple of them will read-lock the file continuously, but that's probably not a problem for most.

Some trouble with locking files for a longer time

I have a function that reads a file, adds some of its strings to a list, and returns that list. Because I wanted to make sure that nobody and nothing could change or delete the file while I was reading it, I locked it. Everything was fine; I did it roughly like this:
public static List<string> Read(string myfile)
{
    using (FileStream fs = File.Open(myfile, FileMode.Open, FileAccess.Read, FileShare.None))
    {
        //read lines, add string to a list
        //return list
    }
}
That's fine. Now I have another function in another class that does stuff with the list, calls other functions, and so on. Sometimes I then want to move the file I was reading. And here is the problem: because I'm now in a different function and Read(string myfile) has already returned, the file is no longer locked.
//in another class
public static void DoStuff(/*somefile*/)
{
    List<string> list = Read(/*somefile*/);
    //the file (somefile) is no longer locked!
    //do stuff
    if (something)
        Move(/*somefile*/); //could get an error; the file may no longer be there, or may have changed...
}
So another function/user could change the file, rename it, delete it, or whatever, and then I'm not able to move it. Or I end up moving a changed file, and I don't want that. If I used threading, another thread with the same function could lock the file again and I could not move it.
That's why I somehow need to lock this file for a longer time. Is there an easy way? Or do I have to replace my using (FileStream fs = File.Open(myfile, FileMode.Open, FileAccess.Read, FileShare.None)) code? Any suggestions? Thank you.
If you want to keep the file locked for longer, then you need to refactor your code so that the Stream object is kept around for longer. I would change the Read method to accept a FileStream, a little bit like this:
using (FileStream fs = File.Open(myfile, FileMode.Open, FileAccess.Read, FileShare.None))
{
    List<string> list = Read(fs);
    if (something)
    {
        File.Move(/* somefile */);
    }
}
The problem you are going to have is that File.Move is going to fail, as this file is already locked (by you, but File.Move doesn't know that).
Depending on what exactly you want to do, it might be possible to work out a way of keeping the file locked while also "moving" it (for example, if you know the destination in advance you could open the file specifying FileOptions.DeleteOnClose and write a new file with the same contents in the desired destination). However, this isn't really the same as moving the file, so it all depends on what exactly you are trying to do.
In general such things are almost always more trouble than they are worth; you are better off just unlocking the file right before you move it and catching/handling any exception that is thrown as a result of the move, as in the sketch below.
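For example (a trivial sketch; sourcePath and destinationPath are placeholder names, not from the question):
try
{
    File.Move(sourcePath, destinationPath);   // the read lock must already be released here
}
catch (IOException ex)
{
    // The file was renamed, deleted or locked by someone else in the meantime.
    Console.WriteLine("Move failed: " + ex.Message);
}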
The only way you could keep it locked is to keep it exclusively open, like you have done in your code.
Maybe you need to do the //do stuff work within your using statement, and then call Move straight afterwards.
No amount of locking will prevent this. A lock only protects the data in the file. The user (or any other program) can still move or rename the file. The file's attributes, including name, time stamps and file attributes are stored separately and can be changed at will.
This is just something you'll have to deal with in any Windows program. It is rare enough that simply catching the exception is good enough to let you know that something happened to the file. The user will rarely be surprised. If you really need to know up front then you can use FileSystemWatcher to get a notification when it happens.
You are locking the file only while the Read method is running.
If you want to keep it locked and release it only when you decide, create OpenFile(string filename) and CloseFile(string filename) methods, and remove the using statement from the Read method.
Open the file when you start working (lock it). Read it whenever you need to. When you have to move it, simply create a new file with the same name at the destination and copy the content over; then close the original file (unlock it) and delete it.
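A rough sketch of what that could look like (the class shape, member names and UTF-8 assumption are illustrative, not from the answer):
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class LockedFile : IDisposable
{
    private FileStream _stream;

    public void OpenFile(string filename) =>
        _stream = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.None);

    public List<string> Read()
    {
        _stream.Position = 0;
        var lines = new List<string>();
        // leaveOpen: true so the lock survives after reading
        using (var reader = new StreamReader(_stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    // "Move" while still holding the lock: copy the content to the destination,
    // then release the lock and delete the original.
    public void MoveTo(string destination)
    {
        string original = _stream.Name;
        _stream.Position = 0;
        using (var output = File.Create(destination))
            _stream.CopyTo(output);
        CloseFile();
        File.Delete(original);
    }

    public void CloseFile()
    {
        _stream?.Dispose();
        _stream = null;
    }

    public void Dispose() => CloseFile();
}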

c# - reading from binary log file that is updated every 6 seconds with 12k of data

I have a binary log file with streaming data from a sensor (Int16).
Every 6 seconds, 6000 samples of type Int16 are added, until the sensor is disconnected.
I need to poll this file on regular intervals, continuing from last position read.
Is it better to
a) keep a filestream and binary reader open and instantiated between readings
b) instantiate filestream and binary reader each time I need to read (and keep an external variable to track the last position read)
c) something better?
EDIT: Some great suggestions so far; I need to add that the "server" app is supplied by an outside vendor and cannot be modified.
If it's always adding the same amount of data, it may make sense to reopen it. You might want to find out the length before you open it, and then round down to the whole number of "sample sets" available, just in case you catch it while it's still writing the data. That may mean you read less than you could read (if the write finishes between you checking the length and starting the read) but you'll catch up next time.
You'll need to make sure you use appropriate sharing options so that the writer can still write while you're reading though. (The writer will probably have to have been written with this in mind too.)
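As a sketch of option (b): the 12,000-byte sample-set size (6000 Int16 samples) comes from the question, but the helper name, the sharing flags and the position handling are illustrative assumptions:
using System;
using System.IO;

const int SampleSetBytes = 6000 * sizeof(short);   // one 6-second burst of Int16 samples

static short[] ReadNewSamples(string path, ref long lastPosition)
{
    long length = new FileInfo(path).Length;
    // Round down to a whole number of sample sets, in case the writer is mid-write.
    long readable = ((length - lastPosition) / SampleSetBytes) * SampleSetBytes;
    if (readable <= 0)
        return new short[0];

    var bytes = new byte[readable];
    int total = 0;
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        fs.Seek(lastPosition, SeekOrigin.Begin);
        while (total < bytes.Length)
        {
            int read = fs.Read(bytes, total, bytes.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        lastPosition += total;                      // resume here on the next poll
    }

    var samples = new short[total / sizeof(short)];
    Buffer.BlockCopy(bytes, 0, samples, 0, samples.Length * sizeof(short));
    return samples;
}

// Usage (the path is a placeholder):
// long pos = 0;
// short[] newSamples = ReadNewSamples(@"C:\logs\sensor.bin", ref pos);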
Can you use MemoryMappedFiles?
If you can, mapping the file into memory and sharing it between processes will let you read the data by simply incrementing the offset of your pointer each time.
If you combine it with an event, you can signal your reader when it can go in and read the information. There is no need to block anything, as the reader will always be reading "old" data which has already been written.
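A very rough, read-side-only sketch of that idea; the path, the offset handling and the single fixed-size view are illustrative assumptions, and a growing file complicates things because the mapping's capacity is fixed when it is created:
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

long sampleOffset = 0;                       // advanced after every read

using (var mmf = MemoryMappedFile.CreateFromFile(
           @"D:\sensor.log", FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
{
    // Capacity may be rounded up to a page boundary, so treat the tail with care.
    long availableBytes = view.Capacity - sampleOffset * sizeof(short);
    var samples = new short[availableBytes / sizeof(short)];
    int read = view.ReadArray(sampleOffset * sizeof(short), samples, 0, samples.Length);
    sampleOffset += read;
    Console.WriteLine("Read {0} samples", read);
}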
I would recommend using pipes; they act just like files, except they stream data directly between applications, even if the apps run on different PCs (though this is really only an option if you are able to change both applications). Check them out under the System.IO.Pipes namespace.
P.S. You would use a "named" pipe for this (pipes are supported in C as well, so basically any half-decent programming language should be able to implement them).
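An illustrative sketch with System.IO.Pipes; the pipe name and the data format are made up for the example, and each side runs in its own process:
using System;
using System.IO;
using System.IO.Pipes;

// Writer process (the sensor side, if you were able to change it):
using (var server = new NamedPipeServerStream("sensorData", PipeDirection.Out))
{
    server.WaitForConnection();
    using (var writer = new BinaryWriter(server))
    {
        writer.Write((short)42);             // stream samples straight to the reader
    }
}

// Reader process:
using (var client = new NamedPipeClientStream(".", "sensorData", PipeDirection.In))
{
    client.Connect();
    using (var reader = new BinaryReader(client))
    {
        short sample = reader.ReadInt16();
        Console.WriteLine(sample);
    }
}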
I think that (a) is the best, because:
Current Position will be incremented as you read, and you don't need to worry about storing it somewhere;
You don't need to open the file and seek to the required position each time you poll it (it shouldn't be much slower to reopen, but keeping it open gives the OS some hints for optimization, I believe);
Other solutions I can think of require P/Invokes to system interprocess synchronisation primitives, and they won't be faster than the file operations already in the framework.
You just need to set the proper FileShare flags:
Just for example:
Server:
using (var writer = new BinaryWriter(new FileStream(@"D:\testlog.log", FileMode.Append, FileAccess.Write, FileShare.Read)))
{
    int n;
    while (Int32.TryParse(Console.ReadLine(), out n))
    {
        writer.Write(n);
        writer.Flush(); // write cached bytes to file
    }
}
Client:
using (var reader = new BinaryReader(new FileStream(@"D:\testlog.log", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
{
    string s;
    while (Console.ReadLine() != "exit")
    {
        // allocate buffer for new ints
        Int32[] buffer = new Int32[(reader.BaseStream.Length - reader.BaseStream.Position) / sizeof(Int32)];
        Console.WriteLine("Stream length: {0}", reader.BaseStream.Length);
        Console.Write("Ints read: ");
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = reader.ReadInt32();
            Console.Write((i == 0 ? "" : ", ") + buffer[i].ToString());
        }
        Console.WriteLine();
    }
}
You could also stream the data into a database rather than a file, as another alternative; then you wouldn't have to worry about file locking.
But if you're stuck with the file method, you may want to close the file each time you read data from it. It depends a lot on how complicated the process writing to the file is going to be, and whether it can detect a file-locking operation and respond appropriately without crashing horribly.
