Faster way to get files of a folder than StorageFolder.GetFilesAsync()? - c#

StorageFolder.GetFilesAsync is incredibly slow:
~7 seconds for a folder with ~3500 files
Back in Windows Phone 8.0 Silverlight, I was able to get the content of the CameraRoll much faster (via the MediaLibrary):
<1 second for the same number of files
Is there any way to speed GetFilesAsync up, or is there an alternative for getting the files of a folder?
I need the photo files immediately so I can extract information such as the geotag or DateTaken. You can see how fast they loaded with Silverlight in my app GeoPhoto, which I am now trying to port to UWP. I've already implemented caching (mapping geotag and DateTaken to the picture path), so I would only need the picture paths on subsequent app starts. Photos not yet cached could then be displayed later (after the long GetFilesAsync call), but it is important to give the user something to interact with immediately after launching the app.
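For reference, the extraction itself is roughly this (a minimal sketch assuming the standard UWP ImageProperties API; the helper name and tuple shape are illustrative):

using System;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.FileProperties;

// ImageProperties exposes DateTaken and the geotag without decoding the image.
static async Task<(DateTimeOffset taken, double? latitude, double? longitude)> ReadPhotoInfoAsync(StorageFile file)
{
    ImageProperties props = await file.Properties.GetImagePropertiesAsync();
    return (props.DateTaken, props.Latitude, props.Longitude);
}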

I wonder if you've read this: https://www.suchan.cz/2014/07/file-io-best-practices-in-windows-and-phone-apps-part-1-available-apis-and-file-exists-checking/
Windows 8.1 - finally on Windows 8.1 the fastest method is the new StorageFolder.TryGetItemAsync method, but only by a slim margin compared to other methods. The main benefit here is definitely the simple code required, without any exception catching if the file does not exist. The results for sync methods are similar to the Windows 8 platform; if the original synchronization context is not required, the sync methods are a better choice.
In short, for checking if a target file exists, on WP8 and WP8.1 Silverlight the fastest method is File.Exists, on Windows 8 and WP8.1 XAML you should use StorageFolder.GetFileAsync, and on Windows 8.1 use the new method StorageFolder.TryGetItemAsync. Do NOT iterate the StorageFiles returned from StorageFolder.GetFilesAsync on any platform; it is terribly slow. Also, if you don't need to continue on the original thread, you can use the synchronous alternatives on the WP8.1 XAML, Windows 8 and Windows 8.1 platforms.
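For illustration, the TryGetItemAsync pattern the quote recommends looks like this (a minimal sketch; only TryGetItemAsync is the real WinRT API, the wrapper method is mine):

using System.Threading.Tasks;
using Windows.Storage;

static async Task<bool> FileExistsAsync(StorageFolder folder, string fileName)
{
    // TryGetItemAsync returns null instead of throwing when the item is missing.
    IStorageItem item = await folder.TryGetItemAsync(fileName);
    return item != null;
}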
Or something like this:
StorageFolder.GetItemsAsync(UInt32, UInt32)
Use it to fetch the first X files, giving the user the immediate feedback you desire. After that, load the rest.
https://msdn.microsoft.com/en-us/library/windows/apps/br227287.aspx
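A sketch of that two-phase load (batch size and the UI hand-off are illustrative):

using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Storage;

static async Task LoadInBatchesAsync(StorageFolder folder)
{
    const uint firstBatch = 50; // illustrative size, tune for your UI

    // Grab just enough items to fill the first screen...
    IReadOnlyList<IStorageItem> first = await folder.GetItemsAsync(0, firstBatch);
    // ...hand "first" to the UI here so the user sees something immediately...

    // ...then fetch the remainder in the background.
    IReadOnlyList<IStorageItem> rest = await folder.GetItemsAsync(firstBatch, uint.MaxValue);
}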

EDIT: Since my original answer didn't seem to be helpful, I hope this one matches your problem.
I created a folder with ~4000 files just for testing, and used a Stopwatch to take the time.
Just reading each item in the folder took:
System.IO.Directory.GetFiles(): 0.2 secs
Windows.Storage.StorageFolder.GetFilesAsync(): ~5.5 secs
(executing both multiple times and in different orders)
I understand that this just gives you the file names as strings, but depending on the library you use for reading the pictures, this might still help you.
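Roughly how such a timing can be taken (a sketch; output goes to the debugger):

using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Storage;

static async Task CompareEnumerationAsync(StorageFolder folder)
{
    var sw = Stopwatch.StartNew();
    string[] paths = System.IO.Directory.GetFiles(folder.Path);
    sw.Stop();
    Debug.WriteLine($"Directory.GetFiles: {sw.Elapsed} for {paths.Length} files");

    sw.Restart();
    var files = await folder.GetFilesAsync();
    sw.Stop();
    Debug.WriteLine($"GetFilesAsync: {sw.Elapsed} for {files.Count} files");
}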
Original Answer:
When you have the path as a string (e.g. from ApplicationData.Current.LocalFolder.Path) you can use System.IO.Directory.GetFiles(string path). It does not return StorageFile objects, but the paths as strings, which you can use with the static class System.IO.File.
It also allows you to pass a searchPattern with placeholders like * and ?. It works synchronously, but retrieving files via this method is really quick.
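For example (a sketch; the .jpg filter is illustrative):

using System;
using System.IO;
using Windows.Storage;

string folderPath = ApplicationData.Current.LocalFolder.Path;

// Placeholders: * matches any sequence of characters, ? a single character.
string[] jpgPaths = Directory.GetFiles(folderPath, "*.jpg");

foreach (string path in jpgPaths)
{
    // The static System.IO.File class works directly on the raw path.
    DateTime written = File.GetLastWriteTime(path);
}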

Related

UWP Enumerate Folders with 10,000 files while creating a subfolder

I'm a fairly experienced developer, but this has me stumped in UWP - I'll keep it simple.
Let's say I want to go through all photos in the Pictures folder, watermark them, and save the watermarked version in a subfolder of Pictures (e.g. pictures\watermarked).
Sound easy?
Try 1: Using GetFilesAsync (incl. GetItemsAsync, GetFoldersAsync) - This method goes through every file, giving me the StorageFile objects I need.
There are 2 problems with this approach:
I can't show a progress bar until I've scanned every file, and that's painfully slow in UWP.
The Runtime Broker will consume all memory if I keep any reference to the StorageFile objects (so enumerating once for a count and again for the work is seriously slow; think 1,000 times slower than Win32).
Try 2: Using Queries - This method involves using Windows.Storage.Search queries to return a list of pointers (ish) to all the files. I can then use the StorageFileQueryResult to get each StorageFile on the fly and release it immediately so that the Runtime Broker behaves. This is very fast as it uses the Windows index system; really, really fast.
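A sketch of that approach (QueryOptions and StorageFileQueryResult are the real WinRT APIs; batch size and file types are illustrative):

using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Search;

static async Task ProcessPicturesAsync()
{
    var options = new QueryOptions(CommonFileQuery.OrderByName, new[] { ".jpg", ".png" })
    {
        IndexerOption = IndexerOption.UseIndexerWhenAvailable // this is what makes it fast
    };
    StorageFileQueryResult query = KnownFolders.PicturesLibrary.CreateFileQueryWithOptions(options);

    uint total = await query.GetItemCountAsync(); // cheap, good for a progress bar
    for (uint i = 0; i < total; i += 100)
    {
        // Fetch a small batch, process it, and keep no StorageFile references around.
        var batch = await query.GetFilesAsync(i, 100);
    }
}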
The problem is that the query system is fairly stupid: as soon as I create the subfolder "Watermarked Photos", the StorageFiles returned by the query (which did not exist when it was created) start to include files from the Watermarked folder. It appears that the query is really just a live view over the folder contents, not a static list of the actual files, so the results change arbitrarily as files are added or removed within its scope after the query was invoked.
Anyone with thoughts on how to do this?
RESOLVED - It's not possible using the index system. I created my own query class instead. It uses the GetItemsAsync method of folders (the number of objects here won't kill the RuntimeBroker) and stores the paths of all files and sub folders in a string list. I can then use GetFileFromPathAsync to instantiate and destroy StorageItems as needed. The RuntimeBroker is okay with that; it's not the best performance, but it does give me custom file/folder filtering. Happy to elaborate if anyone needs more info.
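A sketch of that resolution (method names are illustrative; only the WinRT calls are real):

using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Storage;

static async Task CollectPathsAsync(StorageFolder folder, List<string> paths)
{
    foreach (IStorageItem item in await folder.GetItemsAsync())
    {
        if (item is StorageFolder sub)
            await CollectPathsAsync(sub, paths); // recurse into sub folders
        else
            paths.Add(item.Path);                // plain strings are cheap to hold
    }
}

// Later, re-create each StorageFile on demand and let it go out of scope:
// StorageFile file = await StorageFile.GetFileFromPathAsync(path);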

How to write a Windows service that tracks the number of times a specific folder was opened

I hope this is the correct way of asking this question. My problem is that I want to know how many times a specific folder was opened from the time my Windows service starts. I don't want to write a desktop application for this purpose, because I want it to happen in the background, and later I may want to add some more functionality. That is why it needs to be a Windows service.
Is there some kind of OS event that I can handle in my code, i.e. an event that fires when a user opens a folder?
If this is not the correct method, then please let me know some other method that can help.
That's not possible in C#. You can be notified of changes within a directory and infer from that that the directory was opened, but there are many times when a directory is opened and nothing in it changes. What you're describing is most like a File System Filter Driver.
From What is a File System Filter Driver:
A file system filter driver can filter I/O operations for one or more file systems or file system volumes. Depending on the nature of the driver, filter can mean log, observe, modify, or even prevent.
Writing a filter is relatively easy, considering there are templates you can base your work on. But they consist of kernel-mode code, meaning they're not written in C# (they are typically written in C), and they are drivers.
For more details: http://msdn.microsoft.com/en-us/library/windows/hardware/ff540382(v=vs.85).aspx
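For completeness, the change-notification approach mentioned above looks like this (a sketch; the path is hypothetical, and remember it only infers opens from changes):

using System.Diagnostics;
using System.IO;

var watcher = new FileSystemWatcher(@"C:\WatchedFolder") // hypothetical path
{
    NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName
};
watcher.Changed += (s, e) => Debug.WriteLine($"Changed: {e.FullPath}");
watcher.Created += (s, e) => Debug.WriteLine($"Created: {e.FullPath}");
watcher.EnableRaisingEvents = true;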

Patch an application

I need to create a patching routine for my application; it's really small, but I need to update it daily or weekly.
How do xdelta and the others work?
I've read around about those but I didn't understand much of it.
The user shouldn't be prompted at all.
Ok this post got flagged on meta for the answers given, so I'm going to weigh in on this.
xdelta is a binary difference program that, rather than providing you with a full image, only gives you what has changed and where. An example of a text diff will have + and - signs before lines of text showing you that these have been added or removed in the new version.
There are two ways to update a binary image: replace it using your own program or replace it using some form of package management. For example, Linux Systems use rpm etc to push out updates to packages. In a windows environment your options are limited by what is installed if you're not on a corporate network. If you are, try WSUS and MSI packaging. That'll give you an easier life, or ClickOnce as someone has mentioned.
If you're not however, you will need to bear in mind the following:
You need to be an administrator to update anything in certain folders as others have said. I would strongly encourage you to accept this behaviour.
If the user is an administrator, you can offer to check for updates. Then, you can do one of two things. You can download a whole new version of your application and write it over the image on the hard disk (i.e. the file; note that Windows locks a running exe against in-place writes, but you can rename it aside and put the new image in its place). You then need to tell the user the update has succeeded and reload the program, as the new image will be different.
Or, you can apply a diff if bandwidth is a concern. Probably not in your case but you will need to know from the client program the two versions to diff between so that the update server gives you the correct patch. Otherwise, the diff might not succeed.
I don't think for your purposes xdelta is going to give you much gain anyway. Just replace the entire image.
Edit: if the user must not be prompted at all, just reload the app. However, I would strongly encourage informing the user that you are talking on their network and asking permission to do so / enabling a manual update mode; otherwise people like me will block it.
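A minimal sketch of the whole-image swap described above (assuming the new exe has already been downloaded; the rename trick is what gets around the file lock):

using System;
using System.Diagnostics;
using System.IO;

static void ReplaceSelfAndRestart(string downloadedExePath)
{
    string current = Process.GetCurrentProcess().MainModule.FileName;
    string backup = current + ".old";

    if (File.Exists(backup))
        File.Delete(backup);               // clean up after a previous update
    File.Move(current, backup);            // renaming a running image is allowed
    File.Move(downloadedExePath, current); // drop the new image into place

    Process.Start(current);                // launch the new version
    Environment.Exit(0);                   // and let this one go
}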
What kind of application is this? Perhaps you could use ClickOnce to deploy your application. ClickOnce makes it very easy to push updates to your users.
The short story is, ClickOnce creates an installation that allows your users to install the application from a web server or a file share. You enable automatic updates, and whenever you place a new version of the app on the server, the app will automatically update itself (or ask the user whether to). The ClickOnce framework takes care of the rest: fetching the update, figuring out which files have changed and need to be downloaded again, and performing the update. You can also check for and perform the update programmatically.
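The programmatic check is roughly this (ApplicationDeployment is the real API in System.Deployment; error handling omitted, and the Restart call assumes a WinForms host):

using System.Deployment.Application;

static void UpdateIfAvailable()
{
    if (!ApplicationDeployment.IsNetworkDeployed)
        return; // not launched via ClickOnce

    ApplicationDeployment deployment = ApplicationDeployment.CurrentDeployment;
    if (deployment.CheckForUpdate())
    {
        deployment.Update();                        // download and apply the delta
        System.Windows.Forms.Application.Restart(); // run the new version
    }
}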
That said, ClickOnce leaves you with little control over the actual installation procedure, and you have nowhere close to the freedom of building your own .msi.
I wouldn't go with a patching solution, since it really complicates things when you have a lot of revisions. How will the patching solution handle different versions asking to be updated? What if user A is 10 revisions behind the current revision? Or 100 revisions, etc? It would probably be best to just download the latest exe(s) and dll(s) and replace them.
That said, I think this SO question on silent updates might help you.
There is a solution for efficient patching - it works on all platforms and can run in completely silent mode, without the user noticing anything. On .NET, it provides seamless integration of the update process using a custom UserControl declaratively bound to events from your own UI.
It's called wyUpdate.
While the updating client (wyUpdate) is open source, a paid-for wybuild tool is used to build and publish the patches.
Depending on the size of your application, you'd probably have it split up into several DLLs, an exe, and other files.
What you could do is have the main program check for updates. If updates are available, the main program would close and the update program would take over: updating old files, creating new ones, and deleting current files as specified by the instructions sent along with a patch file (probably a compressed format such as .zip) downloaded by the updater.
If your application is small (say, a single exe) it would suffice to simply have the updater replace that one exe.
Edit:
Another way to do this would be, upon compilation of the new exe, to compare the new one to the old one and just send the differences over to the updater. It would then make the appropriate adjustments.
You can make your function reside in a separate DLL. So you can just replace the DLL instead of patching the whole program. (Assuming Windows as the target platform for a C# program.)

.NET FileInfo.LastWriteTime & FileInfo.LastAccessTime are wrong

When I call FileInfo(path).LastAccessTime or FileInfo(path).LastWriteTime on a file that is in the process of being written, it returns the time the file was created, not the last time it was written to (i.e. now).
Is there a way to get this information?
Edit: To all the responses so far. I hadn't tried Refresh(), but that does not do it either. I am returned the time the file started being written to. The same goes for the static method, and for creating a new instance of FileInfo.
Codymanix might have the answer, but I'm not running Windows Server (using Windows 7), and I don't know where the setting is to test.
Edit 2: Nobody finds it interesting that this function doesn't seem to work?
The FileInfo values are only loaded once and then cached. To get the current value, call Refresh() before getting a property:
f.Refresh();
t = f.LastAccessTime;
Another way to get the current value is by using the static methods on the File class:
t = File.GetLastAccessTime(path);
Starting in Windows Vista, last access time is not updated by default. This is to improve file system performance. You can find details here:
http://blogs.technet.com/b/filecab/archive/2006/11/07/disabling-last-access-time-in-windows-vista-to-improve-ntfs-performance.aspx
To reenable last access time on the computer, you can run the following command:
fsutil behavior set disablelastaccess 0
As James has pointed out, LastAccessTime is not updated.
LastWriteTime has also undergone a twist since Vista. When a process still has the file open and another process checks the LastWriteTime, it will not see the new write time for a long time: not until the writing process has closed the file.
As a workaround, you can open and close the file from your external process. After you have done that, you can read the LastWriteTime again, which will then be the up-to-date value.
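A sketch of that touch-from-outside workaround (FileShare.ReadWrite is needed because the writer still holds the file open):

using System;
using System.IO;

static DateTime GetFreshLastWriteTime(string path)
{
    using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        // Opening and closing the handle is the whole trick.
    }
    return File.GetLastWriteTime(path);
}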
File System Tunneling:
If an application implements something like a rolling logger, which closes the file and then renames it to a different file name, you will also run into issues, since the creation time and file size of the "old" file are remembered by the OS even though you created a new file. This includes wrong reports of the file size even if you recreated log.txt from scratch and it is still 0 bytes in size. This feature is called OS file system tunneling, and it is still present on Windows 8.1. For an example of how to work around this issue, check out the RollingFlatFileTraceListener from Enterprise Library.
You can see the effects of file system tunneling on your own machine from the cmd shell.
echo test > file1.txt
ren file1.txt file2.txt
Wait one minute
echo test > file1.txt
dir /tc file*.txt
...
05.07.2015 19:26 7 file1.txt
05.07.2015 19:26 7 file2.txt
The file system is a state machine. Keeping states correctly synchronized is hard if you care about performance and correctness.
This strange tunneling syndrome is obviously still relied on by applications which, e.g., autosave a file, move it to a safe location and then recreate the file at the same location. For these applications it makes no sense to give the file a new creation date, because it was only copied around. Some installers also do such tricks, moving files temporarily to a different location and writing the contents back later, to get past file-exists checks in some install hooks.
Have you tried calling Refresh() just before accessing the property (to avoid getting a cached value)? If that doesn't work, have you looked at what Explorer shows at the same time? If Explorer is showing the wrong information, then it's probably something you can't really address - it might be that the information is only updated when the file handle is closed, for example.
There is a setting in Windows, sometimes enabled especially on server systems, so that modified and accessed times for files are not updated, for better performance.
From MSDN:
When first called, FileSystemInfo calls Refresh and returns the cached information on APIs to get attributes and so on. On subsequent calls, you must call Refresh to get the latest copy of the information.
FileSystemInfo.Refresh()
If your application is the one doing the writing, I think you are going to have to "touch" the file by setting the LastWriteTime property yourself between each buffer of data you write. Some pseudocode:
while (bytesWritten < totalBytes)
{
    bytesWritten += br.Write(buffer);
    myFileInfo.LastWriteTime = DateTime.Now;
}
I'm not sure how severely this will affect write performance.
Tommy Carlier's answer got me thinking....
A good way to visualise the difference is to separately run the two snippets below (I just used LINQPad) while also running Sysinternals Process Monitor.
while (true)
    File.GetLastAccessTime([file path here]);
and
FileInfo bob = new FileInfo(path);
while (true)
{
    string accessed = bob.LastAccessTime.ToString();
}
If you look at Process Monitor while running the first snippet, you will see repeated and constant access attempts to the file from the LINQPad process. The second snippet will do an initial access of the file, for which you will see activity in Process Monitor, and then very little afterwards.
However, if you go and modify the file (I just opened the text file I was monitoring, added a character and saved), you will see a series of access attempts by the LINQPad process to the file in Process Monitor.
This illustrates the non-cached and cached behaviour of the two approaches respectively.
Will the non-cached approach wear a hole in the hard drive?!
EDIT
I went away feeling all clever over my testing and then used the caching behaviour of FileInfo in my Windows service (basically to sit in a loop asking 'has-the-file-changed, has-the-file-changed...' before doing processing).
While this approach worked on my dev box, it did not work in the production environment, i.e. the process just kept running regardless of whether the file had changed or not. I ended up changing my checking approach and just used GetLastAccessTime as part of it. I don't know why it would behave differently on the production server... but I am not too concerned at this point.

Searching directories for tons of files?

I'm using MSVE, and I have my own tiles I'm displaying in layers on top. Problem is, there's a ton of them, and they're on a network server. In certain directories there are on the order of 30,000+ files. Initially I called Directory.GetFiles, but once I started testing in a pseudo-real environment, it timed out.
What's the best way to programmatically list, and iterate through, this many files?
Edit: My coworker suggested using the MS indexing service. Has anyone tried this approach, and (how) has it worked?
I've worked on a SAN system in the past with telephony audio recordings which had issues with the number of files in a single folder. That system became unusable somewhere near 5,000 files (on Windows 2000 Advanced Server with an application in C# .NET 1.1); the only sensible solution we came up with was to change the folder structure so that there were a more reasonable number of files per folder. Interestingly, Explorer would also time out!
The convention we came up with was a structure that broke the files up by year, month and day, but that will depend upon your system and whether you can control the directory structure...
Definitely split them up. That said, stay as far away from the Indexing Service as you can.
None. .NET relies on underlying Windows API calls that really, really hate that number of files themselves.
As Ronnie says: split them up.
You could use DOS?
DIR /s/b > Files.txt
You could also look at either indexing the files yourself, or getting a third-party app like Google Desktop or Copernic to do it and then interfacing with their index. I know Copernic has an API that you can use to search for any file in their index, and it also supports mapping network drives.
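If you want to drive the DIR approach from C#, a sketch (the UNC path is hypothetical):

using System.Diagnostics;

var psi = new ProcessStartInfo("cmd.exe", @"/c dir /s /b \\server\tiles > Files.txt")
{
    UseShellExecute = false,
    CreateNoWindow = true
};
using (var p = Process.Start(psi))
{
    p.WaitForExit();
}
// Files.txt now holds one full path per line; System.IO.File.ReadLines
// streams it without holding all 30,000 entries in memory at once.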
