Check if a file exists async? - C#

I wish there was a File.ExistsAsync()
I have:
bool exists = await Task.Run(() => File.Exists(fileName));
Using a thread for this feels like an antipattern.
Is there a cleaner way?

There is no cleaner way than your solution.
The problems of race conditions aside, I believe your solution can be used in some situations.
e.g.
I have static file content in many different folders (in my case cshtml views, script files, and css files, for MVC).
These files (which do not change much during application execution) are checked for on every request to the webserver. Due to my application architecture, files are checked for in a lot more places than in the default MVC application, so much so that File.Exists takes up quite a portion of each request.
So race conditions will generally not happen; the only interesting question for me is performance.
Starting a task with Task.Factory.StartNew() takes 0.002 ms (source: Why so much difference in performance between Thread and Task?)
Calling File.Exists takes "0.006255ms when the file exists and 0.010925ms when the file does not exist" [Richard Harrison]
So by simple math, calling the async File.Exists takes 0.008 ms up to 0.012 ms.
In the best case, async File.Exists takes about 1.1 times as long as File.Exists (when the file does not exist), and in the worst case about 1.3 times as long (when it exists). (In my case most paths that are searched do not exist, so most of the time File.Exists is close to 0.01 ms.)
So it is not that much overhead, and you can utilize multiple cores/hard disk controllers etc. more efficiently. With these numbers, by asynchronously checking for the existence of just 2 files you already get a speedup of about 1.7 in the worst case (0.02 / 0.012).
Well, I'm just saying that async File.Exists is worth it in specific situations.
Caveats of my post:
I might not have calculated everything correctly
I rounded a lot
I did not measure performance on a single PC
I took the performance numbers from other posts
I simply added the times of File.Exists and Task.Factory.StartNew() (this may be wrong)
I disregard a lot of side effects of multithreading
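The wrapper discussed above can be sketched like this. FileAsync, ExistsAsync, and ExistAsync are hypothetical names of my own; no such methods exist in System.IO.File:

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper wrapping the blocking File.Exists in a thread-pool
// task, as the answer above suggests.
public static class FileAsync
{
    public static Task<bool> ExistsAsync(string path) =>
        Task.Run(() => File.Exists(path));

    // Checking several paths concurrently is where the wrapper can pay off:
    // the per-call task overhead is amortized across the parallel checks.
    public static async Task<bool[]> ExistAsync(params string[] paths) =>
        await Task.WhenAll(paths.Select(ExistsAsync));
}
```

With two or more paths per request, the Task.WhenAll variant is what produces the rough speedup estimated above.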

Long time since this thread, but I found it today...
ExistsAsync should definitely be a thing. In fact, in UWP you have to use async methods to find out whether a file exists, as it could take longer than 50 ms (anything that 'could' take longer than 50 ms should be async, in UWP terms).
However, this is not UWP. The reason I need it is to check for folder existence, which on a network share, remote disk, or idle disk would block the UI. I can show messages like "checking...", but the UI wouldn't update without async (or a ViewModel, or timers, etc.).
bool exists = await Task.Run(() => File.Exists(fileName)); works perfectly. In my code I have both (Exists and ExistsAsync), so that I can call Exists() when running on a non-UI thread and not have to worry about the overhead.
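A minimal sketch of that dual-API approach; the FileChecker class name is my own invention:

```csharp
using System.IO;
using System.Threading.Tasks;

// Sketch of the "both Exists and ExistsAsync" approach described above:
// a plain check for callers already off the UI thread, and an async
// wrapper for UI code.
public static class FileChecker
{
    public static bool Exists(string path) => File.Exists(path);

    // Off-loads the potentially slow check (network share, spun-down
    // disk) to the thread pool so the UI thread can keep pumping.
    public static Task<bool> ExistsAsync(string path) =>
        Task.Run(() => File.Exists(path));
}
```

Callers on background threads use Exists() directly and skip the Task overhead entirely.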

There isn't a File.ExistsAsync, probably for good reason: it makes little sense to have one, because File.Exists is not going to take very long. I measured it at 0.006255 ms when the file exists and 0.010925 ms when the file does not exist.
There are a few times when it is sensible to call File.Exists; however, I usually think the correct solution is to open the file (thus preventing deletion) and catch any exceptions, as there is no guarantee that the file will continue to exist after the call to File.Exists.
When you want to create a new file and not overwrite an old one:
File.Open("fn", FileMode.CreateNew)
For most of the use cases I can think of, File.Open() (whether for an existing file or to create a new one) is going to be better, because once the call succeeds you have a handle to the file and can do something with it. Even when using the file's existence as a flag, I think I'd still open and close it. The only time I've really used File.Exists is to check whether a local HTML file is there before calling the browser, so I can show a nice error message when it isn't.
There is no guarantee that something else won't delete the file after File.Exists, so even if you open it right after checking, the open call could still fail.
In my tests on a network drive, File.Exists takes longer than File.Open: File.Exists takes 1.5967 ms, whereas File.OpenRead takes 0.3927 ms.
Maybe if you could expand on why you're doing this, we'd be better able to answer; until then I'd say that you shouldn't.
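The open-instead-of-check idea can be sketched as follows. TryCreateNew is a hypothetical helper; the point is that FileMode.CreateNew performs the existence test atomically as part of the open:

```csharp
using System.IO;

// Sketch of the "open it and handle failure" approach recommended above:
// FileMode.CreateNew fails if the file already exists, so no separate
// File.Exists call (and no race window) is needed.
public static class OpenDontCheck
{
    public static bool TryCreateNew(string path)
    {
        try
        {
            // CreateNew throws IOException when the file already exists,
            // which atomically answers the "does it exist?" question.
            using var fs = new FileStream(path, FileMode.CreateNew);
            return true;
        }
        catch (IOException)
        {
            return false; // file already existed (or another IO failure)
        }
    }
}
```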

Related

Async/Await vs Parallel.For, which is better in this instance?

So I have 1000s of items to check whether they are up to date. Each one of those items requires reading thousands of files (some of which might be the same file across different items).
Currently this is implemented using the TPL (async/await): one task for each file it has to read and one for each item it has to check. This works fine, except that when I profile it, about the third most expensive function is TrySteal in the thread pool.
Using the Visual Studio concurrency visualizer, I see that 99% of a thread's time is spent on concurrency-related work and only 1% on execution. It is this that leads me to think I am perhaps just creating too many tasks (note: I don't use Task.Run anywhere, just await).
Would Parallel.For have any less overhead than reading a bunch of files using async/await? How much overhead is expected when using the Task Parallel Library?
If you are checking files on a hard drive, this task does not parallelize very well. If you try to read thousands of files at the same time, you just make the process much slower, because the drive cannot read that many of them at once, and even worse, it cannot cache that many in memory.
The fastest option, without optimizing the checking process itself, should be to just run it sequentially.
If you really want to optimize it, I suggest looping through the files and checking each item against the current file, instead of looping through the items and checking each file. In that case it might even be effective to do it in multiple threads (though not all at once).
Update:
For the case where you have enough memory to cache all your files, multithreading is not restricted as much. Still, I would suggest limiting the number of parallel threads to something comparable to the number of processor cores you will be working with. It is better to do this with Parallel.ForEach(), which also makes it explicit that the loop runs in parallel, so the code is easier to understand.

Safely saving a file in Windows 10 IOT

My team requires a bulletproof way to save a file (less than 100kb) on Windows 10 IOT.
The file cannot be corrupted, but it's OK to lose the most recent version if the save failed because of a power cut etc.
Since the file IO has changed significantly (no more File.Replace), we are not sure how to achieve it.
We can see that:
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
await Windows.Storage.FileIO.WriteTextAsync(file, data);
is reliably unreliable (it repeatedly broke when stopping debugging or resetting the device), and we end up with a corrupted file (full of zeroes) and a .tmp file next to it. We can recover this .tmp file, but I'm not confident that we should base our solution on undocumented behaviour.
One way we want to try is:
var tmpfile = await folder.CreateFileAsync(fileName+".tmp",
CreationCollisionOption.ReplaceExisting);
await Windows.Storage.FileIO.WriteTextAsync(tmpfile, data);
var file = await folder.CreateFileAsync(fileName, CreationCollisionOption.OpenIfExists);
// can this end up with a corrupt or missing file?
await tmpfile.MoveAndReplaceAsync(file);
In summary, is there a safe way to save some text to a file that will never corrupt the file?
I'm not sure if there's a best practice for this, but if I needed to come up with something myself:
I would calculate a checksum and save it along with the file.
When saving the next time, don't overwrite the previous file (which should be "known good") but save next to it, and delete the previous one only after verifying that the new save completed successfully (checksum included).
I would also assume that a rename operation should not corrupt the file, but I haven't researched that.
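A rough sketch of that checksum-and-swap idea, written with desktop System.IO for illustration; the asker's UWP environment would need the StorageFile/FileIO equivalents, and SafeSave is a hypothetical name:

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class SafeSave
{
    public static void Save(string path, string text)
    {
        string candidate = path + ".new";
        byte[] payload = Encoding.UTF8.GetBytes(text);
        byte[] hash = SHA256.HashData(payload); // 32-byte checksum header

        // Write checksum + payload to a sibling file first; the known-good
        // copy at 'path' is untouched if power fails during this write.
        using (var fs = new FileStream(candidate, FileMode.Create))
        {
            fs.Write(hash, 0, hash.Length);
            fs.Write(payload, 0, payload.Length);
            fs.Flush(true); // flush through the OS buffers to the device
        }

        // Only now swap the files; a same-volume move is a rename, which
        // is far less failure-prone than rewriting the file in place.
        File.Move(candidate, path, overwrite: true);
    }

    // Returns null if the stored checksum does not match the payload.
    public static string Load(string path)
    {
        byte[] all = File.ReadAllBytes(path);
        byte[] hash = all[..32];
        byte[] payload = all[32..];
        return hash.SequenceEqual(SHA256.HashData(payload))
            ? Encoding.UTF8.GetString(payload)
            : null;
    }
}
```

On a crash mid-save the worst case is a stale-but-valid file plus a corrupt .new candidate, which matches the asker's "losing the latest version is OK" requirement.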
This article has a good explanation of the underlying processes involved in writing to files in UWP: Best practices for writing to files.
The following common issues are highlighted:
A file is partially written.
The app receives an exception when calling one of the methods.
The operations leave behind .TMP files with a file name similar to the target file name.
What is not easily deduced from the discussion of the convenience-vs-control trade-off is that while create or edit operations are more prone to failure, because they do a lot of things, rename operations are a lot more fault tolerant, since they are not physically writing bits around the filesystem.
Your suggestion of creating a temp file first is on the right track and may serve you well, but using MoveAndReplaceAsync means that you are still susceptible to these known issues if the destination file already exists.
UWP will use a transactional pattern with the file system and may create various backup copies of the source and the destination files.
You can take control of the final step by deleting the original file before calling MoveAndReplaceAsync, or you could simply use RenameAsync if your temp file is in the same folder; these have fewer moving parts, which should reduce the surface area for failure.
@hansmbakker has an answer along these lines. How you verify that the file write was successful is up to you, but isolating the heavy write operation and verifying it before overwriting your original is a good idea if you need it to be bulletproof.
About failure:
I have observed the .TMP files a lot when using the Append variants of FileIO writing. The .TMP files have the content of the original file before the append, but the actual file does not always have all of the original content; sometimes it is a mix of old and new content.
In my experience, UWP file writes are very reliable when your entire call structure down to the write operation is asynchronous and correctly awaits the pipeline, AND you take steps to ensure that only one process is trying to access the same file at any point in time.
When you try to manipulate files from a synchronous context, we start to see the "unreliable" behaviour you have identified; this happens a lot in code that is being transitioned from the old synchronous operations to the newer async variants of the FileIO operations.
Make sure the code calling your write method is non-blocking and correctly awaits; this will also allow you to catch any exceptions that might be raised.
It is common for traditionally synchronous-minded developers to reach for a lock(){} pattern to ensure single access to the file, but you cannot easily await inside a lock, and attempts to do so often become the source of UWP file-write issues.
If your code has a locking mechanism to ensure singleton access to the file, have a read over these articles for a different approach; they're old, but a good resource covering the transition from traditional synchronous C# into async and parallel development.
What’s New For Parallelism in .NET 4.5
Building Async Coordination Primitives, Part 6: AsyncLock
Building Async Coordination Primitives, Part 7: AsyncReaderWriterLock
Other times we encounter a synchronous constraint are when an Event, Timer, or Dispose context is the trigger for writing the file in the first place. There are different techniques to apply there; please post another question covering that scenario specifically if you think it might be contributing to your issues. :)
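The await-friendly alternative to lock(){} mentioned above can be approximated with SemaphoreSlim, which can be awaited; this is a sketch of the idea, not the articles' exact AsyncLock implementation, and SerializedWriter is a hypothetical name:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SerializedWriter
{
    // A count-of-one semaphore acts as an async mutex.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    public static async Task WriteAsync(Func<Task> writeOperation)
    {
        await Gate.WaitAsync(); // awaitable, unlike Monitor.Enter / lock
        try
        {
            // Only one caller at a time reaches the file write here.
            await writeOperation();
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

Unlike lock, the awaiting callers do not block thread-pool threads while they wait their turn.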

Is File.Exists considered harmful?

I often use library functions like File.Exists to check for a file's existence before opening it or doing some other action. While I have had good luck with this in practice over the years, I wonder if it is a poorly thought-out pattern.
Any IO call like a file system read can fail for multiple reasons. The path string could be wrong or the file actually not exist, you could lack permissions, someone else might have a lock that blocks you. You could even have another process or another user on the network move a file in the millisecond between your File.Exists and your Open.
Even if you get a successful result from File.Exists, you still really should enclose your actual open statements in a try block to handle one of the other possible failure modes. If I am thinking about this correctly, File.Exists just lulls you into a false sense of safety if you use it instead of Try (as I am sure that I have on occasion in the past).
All of this makes it sound like I should abandon File.Exists and change whatever existing code I find to use the Try...Catch pattern only. Is this a sensible conclusion? I realize that the framework authors put it there for us to use, but that does not automatically make it a good tool in practice.
I think that the answer completely depends on your specific reasons for using File.Exists.
For example, if you are checking a certain file path for arrival of a file, File.Exists could easily be the appropriate approach because you don't care what the reason for non-existence is.
However, if you are processing a file that an end user has requested (i.e. please import this excel file), you will want to know exactly why the file has failed. In this specific instance, File.Exists wouldn't be quite the right approach because the file existence could change between the time you check it and the time you open the file. In this case, we attempt to open the file and obtain a lock on it prior to processing it. The open method will throw the errors appropriate to the specific scenario that you are handling so you can provide the user more accurate information about the problem (i.e. another process has the file locked, the network file is not available, etc.).
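That open-and-lock approach for the user-import case might look like the following sketch; DescribeOpen and its messages are hypothetical:

```csharp
using System;
using System.IO;

public static class UserImport
{
    // Open the file (taking an exclusive lock via FileShare.None) and map
    // each failure mode to a user-facing description, instead of
    // pre-checking with File.Exists and racing against other processes.
    public static string DescribeOpen(string path)
    {
        try
        {
            using var fs = new FileStream(
                path, FileMode.Open, FileAccess.Read, FileShare.None);
            return "ok";
        }
        catch (FileNotFoundException) { return "file not found"; }
        catch (DirectoryNotFoundException) { return "folder not found"; }
        catch (UnauthorizedAccessException) { return "access denied"; }
        catch (IOException) { return "in use or unavailable"; }
    }
}
```

The ordering matters: FileNotFoundException derives from IOException, so the specific catches must come first.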
You should absolutely implement an exception handler for any code that could reasonably throw an exception and any I/O operation falls into that category.
That doesn't mean that using File.Exists is wrong though. If there is a reasonable possibility that the file may not exist then prevention is more efficient than cure. If the file absolutely should exist though, it might be more performant overall to suffer the occasional exception rather than take the hit of checking first every time.
I use File.Exists in cases where the file may not exist under normal operating conditions (without something being broken in my system). If I know the file should exist (unless my system is broken), then I don't use File.Exist.
I wouldn't call this a "pattern" though. The best pattern is to consider what you're doing on a case by case basis.
It is up to you how you want to handle the file not being found. You can use File.Exists to check whether the file is there, or you can put a try/catch block around your code and handle FileNotFoundException; either will tell you whether a file exists. It's purely up to you, but I would prefer to check with File.Exists. It is like checking for null before accessing an object rather than wrapping the code in try/catch and discovering in the catch that your object is null. It is always good to handle such validations yourself rather than leaving them to a try/catch block.

virtual temp file, omit IO operations

Let's say I received a .csv-File over network,
so I have a byte[].
I also have a parser that reads .csv-files and does business things with it,
using File.ReadAllLines().
So far I did:
File.WriteAllBytes(tempPath, incomingBuffer);
parser.Open(tempPath);
I won't ever need the actual file on this device, though.
Is there a way to "store" this file in some virtual place and "open" it again from there, but all in memory?
That would save me ages of waiting for the IO operations to complete (there's a good article on that on Coding Horror),
plus reduce wear on the drive (relevant if this occurred a few dozen times a minute, 24/7),
and in general eliminate a point of failure.
This is a bit in the UNIX-direction, where everything is a file-stream, but we're talking windows here.
"I won't ever need the actual file on this device, though." - Well, you kind of do, if all your APIs expect a file on disk.
You can:
1) Get decent APIs (I am sure there are CSV parsers that take a Stream as a constructor parameter; you could then use a MemoryStream, for example).
2) If performance is a serious issue and there is no way around the APIs, there's one "simple" solution: write your own RAM-disk implementation, which will cache everything that is needed and page to the HDD if necessary.
http://code.msdn.microsoft.com/windowshardware/RAMDisk-Storage-Driver-9ce5f699 (Oh, did I mention that you absolutely need serious experience with drivers?)
There are also ready-made RAM-disk solutions (Google!), which means you can just run (in your application initializer) 'CreateRamDisk.exe -Hdd "__MEMDISK__"' (for example) and then use File.WriteAllBytes("__MEMDISK__:\yourFile.csv", incomingBuffer);
Alternatively, you can read about memory-mapped files (.NET 4.0 and later has nice support). However, by the sound of it, that probably does not help you too much.
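Option 1 above, parsing straight from the received buffer, might look like this sketch; InMemoryCsv.ReadLines is a hypothetical stand-in for a parser that accepts a Stream instead of a file path:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class InMemoryCsv
{
    // Reads the received byte[] through a MemoryStream, so no temp file
    // ever touches the disk.
    public static List<string> ReadLines(byte[] incomingBuffer)
    {
        var lines = new List<string>();
        using var reader = new StreamReader(
            new MemoryStream(incomingBuffer), Encoding.UTF8);
        string line;
        while ((line = reader.ReadLine()) != null)
            lines.Add(line);
        return lines;
    }
}
```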

Does FileStream.Dispose close the file immediately?

I have some code that writes a file by saving a MemoryStream to a FileStream using MemoryStream.WriteTo(). After the file is closed, it is opened up again to read some metadata...
This works about 80-90% of the time. The rest of the time I get an exception saying the file is "in use by another process".
Does FileStream.Dispose() not release resources synchronously? Is there something going on lower down in Win32 land that I'm not aware of? I'm not seeing anything obvious in the .NET documentation.
As "immediately" as possible. There can easily be some lag due to outstanding writes, delay in updating the directory info etc. It could also be anti-virus software checking your changed file.
This may be a rare case where a Thread.Sleep(1) is called for. But to be totally safe you will have to catch the (any) exception and try again a set number of times.
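A catch-and-retry sketch along those lines; the attempt count and delay are arbitrary choices, and RetryOpen is a hypothetical name:

```csharp
using System.IO;
using System.Threading;

public static class RetryOpen
{
    // Re-attempt the open a bounded number of times, in case anti-virus
    // software or outstanding writes still hold the file briefly.
    public static FileStream OpenWithRetry(string path, int attempts = 5)
    {
        for (int i = 0; ; i++)
        {
            try
            {
                return new FileStream(path, FileMode.Open, FileAccess.Read);
            }
            catch (IOException) when (i < attempts - 1)
            {
                Thread.Sleep(50); // brief back-off before the next try
            }
        }
    }
}
```

The exception filter lets the final failure propagate to the caller unchanged once the attempts are exhausted.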
