We’ve got a process that obtains a list of files from a remote transatlantic Samba share. This is naturally on the slow side, but it’s made worse by the fact that we don’t just need the names of the files, we also need the last write times to spot updates. There are quite a lot of files in the directory, and as far as I can tell, the .NET file API insists on me asking for each one individually. Is there a faster way of obtaining the information we need?
I would love to find a way myself. I have exactly the same problem - huge number of files on a slow network location, and I need to scan for changes.
As far as I know, you do need to ask for file properties one by one.
The amount of information transferred per file should not be high, though; the round-trip request-response time is probably the main problem. You can help the situation by running multiple requests in parallel (e.g. using Parallel.ForEach).
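For illustration, here is a minimal sketch of that idea, assuming you only need the full path and last write time of each file; the share path and degree of parallelism are placeholders:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class LastWriteScanner
{
    static void Main()
    {
        // Hypothetical UNC path - replace with your actual share.
        const string share = @"\\remote-server\data";

        var results = new ConcurrentDictionary<string, DateTime>();

        // One directory listing to get the names, then overlap the
        // per-file round trips by querying the write times in parallel.
        Parallel.ForEach(
            Directory.EnumerateFiles(share),
            new ParallelOptions { MaxDegreeOfParallelism = 16 },
            path => results[path] = File.GetLastWriteTimeUtc(path));

        foreach (var entry in results)
            Console.WriteLine(entry.Key + "  " + entry.Value.ToString("o"));
    }
}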
The answer to your question is most likely no, at least not in a meaningful way.
Exactly how you enumerate the files in your code is almost irrelevant since they all boil down to the same file system API in Windows. Unfortunately, there is no function that returns a list of file details in one call*.
So, no matter what your code looks like, somewhere below, it's still enumerating the directory contents and calling a particular file function individually for each file.
If this is really a problem, I would look into moving the detection logic closer to the files and send your app the results periodically.
*Disclaimer: It's been a long time since I've been down this far in the stack and I'm just browsing the API docs now, there may be a new function somewhere that does exactly this.
I'm currently using Rackspace cloud files for backing up files, some that can be rather large, and I would like to avoid having to start from the beginning every time there is a failure in the network. For example, some time ago my log showed a 503 error happening with the server being unavailable which caused the upload to stop.
Is there any way the .Net SDK can handle this? If not, is there another possible solution working around the SDK? I've been searching for a solution but have not yet come across anything.
Thank you.
EDIT:
I've tried solving this in the meantime by creating my own method of segmenting files as big as 2 GB, even though the SDK does that for you. Dealing with smaller pieces of the file helps, but it takes up a lot of room in the container (1000 object limit), so I'd still like to see if there is a better way to prevent this problem.
I can't really speak for the .Net SDK, but I can give you some tips as far as Cloud Files goes.
is there another possible solution working around the SDK?
We usually recommend segmenting large objects yourself. This will allow you to upload multiple segments in parallel. Then if a segment fails while uploading, you can just re-upload that single segment. As a general rule we usually recommend ~100MB segments.
If you need to be able to access your file as a single object, you can use the segments to create a Static Large Object aka SLO.
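As a rough illustration of that segmentation approach (not the SDK's own mechanism - the segment size, naming scheme and paths below are just assumptions), splitting a large file into ~100MB pieces might look like this:

using System;
using System.IO;

class Segmenter
{
    // Split a large file into ~100 MB pieces that can be uploaded
    // (and retried) individually; segment names sort in upload order,
    // which is what an SLO manifest expects.
    static void SplitIntoSegments(string sourcePath, string segmentDir, long segmentSize = 100L * 1024 * 1024)
    {
        Directory.CreateDirectory(segmentDir);
        var buffer = new byte[4 * 1024 * 1024];

        using (var source = File.OpenRead(sourcePath))
        {
            int index = 0;
            while (source.Position < source.Length)
            {
                string segmentPath = Path.Combine(segmentDir, Path.GetFileName(sourcePath) + "." + index++.ToString("D8"));
                using (var segment = File.Create(segmentPath))
                {
                    long written = 0;
                    while (written < segmentSize)
                    {
                        int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, segmentSize - written));
                        if (read == 0) break;
                        segment.Write(buffer, 0, read);
                        written += read;
                    }
                }
            }
        }
    }

    static void Main()
    {
        // Hypothetical paths for illustration.
        SplitIntoSegments(@"C:\backups\big-archive.bin", @"C:\backups\segments");
    }
}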
will result in taking up a lot of room in the container (1000 object limit),
Containers don't have a hard limit on the number of objects they can contain; however, if you expect to have a million objects, you may consider spreading them across multiple containers. If you are talking about an SLO's 1000-segment limit, you could always create nested SLOs.
Let's say I received a .csv file over the network,
so I have a byte[].
I also have a parser that reads .csv files and does business things with them,
using File.ReadAllLines().
So far I did:
File.WriteAllBytes(tempPath, incomingBuffer);
parser.Open(tempPath);
I won't ever need the actual file on this device, though.
Is there a way to "store" this file in some virtual place and "open" it again from there, but all in memory?
That would save me ages of waiting on the IO operations to complete (there's a good article on that on Coding Horror),
plus reduce wear on the drive (relevant if this occurred a few dozen times a minute, 24/7),
and in general eliminate a point of failure.
This is a bit in the UNIX direction, where everything is a file stream, but we're talking Windows here.
I won't ever need the actual file on this device, though. - Well, you kind of do if all your APIs expect a file on disk.
You can:
1) Get decent APIs (I am sure there are CSV parsers that take a Stream as a constructor parameter - you could then use a MemoryStream, for example; see the sketch below).
2) If performance is a serious issue and there is no way around the APIs, there's one simple solution: write your own RAM disk implementation, which will cache everything that is needed and page to the HDD if necessary.
http://code.msdn.microsoft.com/windowshardware/RAMDisk-Storage-Driver-9ce5f699 (Oh did I mention that you absolutely need to have mad experience with drivers :p?)
There's also "ready" solutions for ramdisk(Google!), which means you can just run(in your application initializer) 'CreateRamDisk.exe -Hdd "__MEMDISK__"'(for example), and use File.WriteAllBytes("__MEMDISK__:\yourFile.csv");
Alternatively, you can read about memory-mapped files (.NET 4.0 and later has nice support). However, by the sounds of it, that probably does not help you too much.
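Here is a minimal sketch of option 1, assuming the parser can be fed a Stream or the lines directly instead of a path (the helper below just mimics File.ReadAllLines over the received buffer):

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class InMemoryCsv
{
    // Read the received bytes entirely in memory - no temp file on disk.
    static string[] ReadLinesFromBuffer(byte[] incomingBuffer)
    {
        var lines = new List<string>();
        using (var stream = new MemoryStream(incomingBuffer, false))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines.ToArray();   // same shape as File.ReadAllLines()
    }

    static void Main()
    {
        byte[] incomingBuffer = Encoding.UTF8.GetBytes("a,b,c\n1,2,3\n");
        foreach (var line in ReadLinesFromBuffer(incomingBuffer))
            Console.WriteLine(line);
    }
}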
I am writing a program that searches and copies mp3-files to a specified directory.
Currently I am using a List that is filled with all the mp3s in a directory (which takes - not surprisingly - a very long time). Then I use taglib-sharp to compare the ID3 tags with the artist and title entered. If they match, I copy the file.
Since this is my first program and I am very new to programming I figure there must be a better/more efficient way to do this. Does anybody have a suggestion on what I could try?
Edit: I forgot to add an important detail: I want to be able to specify which directories should be searched every time I start a search (the directory to be searched will be specified in the program itself). So storing all the files in a database or something similar isn't really an option (unless there is a way to do that on every run which is still efficient). I am basically looking for the best way to search through all the files in a directory where the files are indexed every time. (I am aware that this is probably not a good idea, but I'd like to do it that way; if there is no real way to do this, I'll have to reconsider.)
You are mostly saddled with the bottleneck that is IO, a consequence of the hardware you are working with. It will be the copying of files that dominates here (finding the files is dwarfed by the copying).
There are other ways to go about file management, each exposing a better interface for a particular purpose - NTFS change journals and low-level sector handling (not recommended), for example - but if this is your first program in C#, you probably don't want to venture into P/Invoking native calls.
Beyond swapping out the mechanism itself, you might consider ways to minimise disk access - i.e. not redoing anything you have already done, or don't need to do.
Use a database (a simple binary serialized file or an embedded database like RavenDB) to cache all the files, and query that cache instead.
Also store the modified time for each folder in the database. Compare the time in the database with the time on the folder each time you start your application (and sync the changed folders).
That ought to give you much better performance. Threading will not really help with searching folders, since it's the disk IO that takes time, not your application.
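A rough sketch of that caching idea, using a plain serialized dictionary as the "database" (the cache file, root path and mp3 filter are placeholders, not any specific library's API):

using System;
using System.Collections.Generic;
using System.IO;

class FolderCache
{
    const string CachePath = "folder-cache.txt";   // hypothetical cache file

    static void Main()
    {
        string root = @"C:\Music";                 // placeholder search root

        // folder path -> last write time seen on the previous run
        var cache = new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);
        if (File.Exists(CachePath))
        {
            foreach (var line in File.ReadAllLines(CachePath))
            {
                var parts = line.Split('|');
                cache[parts[0]] = new DateTime(long.Parse(parts[1]), DateTimeKind.Utc);
            }
        }

        foreach (var dir in Directory.EnumerateDirectories(root, "*", SearchOption.AllDirectories))
        {
            DateTime lastWrite = Directory.GetLastWriteTimeUtc(dir);
            DateTime cached;
            if (!cache.TryGetValue(dir, out cached) || cached != lastWrite)
            {
                // Folder is new or changed since the last run: rescan only its mp3s.
                foreach (var mp3 in Directory.EnumerateFiles(dir, "*.mp3"))
                    Console.WriteLine("changed folder contains: " + mp3);

                cache[dir] = lastWrite;
            }
        }

        // Persist the cache for the next run.
        using (var writer = new StreamWriter(CachePath))
            foreach (var entry in cache)
                writer.WriteLine(entry.Key + "|" + entry.Value.Ticks);
    }
}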
I have been trying to compare files from two different environments. Both environments are accessed via a network drive.
First, I check whether the file is present in both environments (to save time); then I get a FileInfo for both files to compare their sizes (FileInfo.Length).
My goal is to populate a ListView with every file that does not have the same size, so I can investigate those later.
I find it hard to understand how Windows Explorer gets the sizes of a lot of files so quickly while FileInfo is taking so long...
Thanks.
Ben
If you use DirectoryInfo.GetFiles(some-search-pattern) to get the files and then use the FileInfo instances returned from that call, the FileInfo.Length property will already be populated from the search.
Whether this will be of any help obviously depends on how the search performs in comparison to the check you are currently doing (if the search is slower, you may not gain anything). Still, it may be worth looking into.
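As an illustration, a comparison relying on those pre-populated Length values might look like this (the two roots and the match-by-file-name logic are assumptions made for the sketch):

using System;
using System.Collections.Generic;
using System.IO;

class SizeCompare
{
    static void Main()
    {
        // Hypothetical network roots - replace with your two environments.
        FileInfo[] left = new DirectoryInfo(@"\\env-a\share").GetFiles("*");
        FileInfo[] right = new DirectoryInfo(@"\\env-b\share").GetFiles("*");

        var rightByName = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (var file in right)
            rightByName[file.Name] = file;

        foreach (var file in left)
        {
            // Length was filled in by the directory search itself,
            // so no extra per-file round trip happens here.
            FileInfo other;
            if (rightByName.TryGetValue(file.Name, out other) && other.Length != file.Length)
                Console.WriteLine(file.Name + ": " + file.Length + " vs " + other.Length);
        }
    }
}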
You can try doing it the operating-system way, by calling the Win32 API directly. You can start with this page:
http://www.pinvoke.net/default.aspx/kernel32.getfilesize
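For reference, a hedged sketch of that P/Invoke route (the declarations follow the usual pinvoke.net pattern; error handling is kept minimal):

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class NativeFileSize
{
    const uint GENERIC_READ = 0x80000000;
    const uint FILE_SHARE_READ = 0x00000001;
    const uint FILE_SHARE_WRITE = 0x00000002;
    const uint OPEN_EXISTING = 3;
    const uint INVALID_FILE_SIZE = 0xFFFFFFFF;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFile(string lpFileName, uint dwDesiredAccess,
        uint dwShareMode, IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint GetFileSize(SafeFileHandle hFile, out uint lpFileSizeHigh);

    // Returns the file size in bytes, or null if the file could not be opened.
    static long? TryGetSize(string path)
    {
        using (var handle = CreateFile(path, GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero))
        {
            if (handle.IsInvalid) return null;

            uint high;
            uint low = GetFileSize(handle, out high);
            if (low == INVALID_FILE_SIZE && Marshal.GetLastWin32Error() != 0) return null;
            return ((long)high << 32) | low;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryGetSize(@"C:\Windows\notepad.exe"));
    }
}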
I need a FileSystemWatcher that can observe some specific paths and specific extensions.
But there could be dozens, hundreds or maybe thousands of paths (hope not :P), and the same goes for extensions. The paths and extensions are added by the user.
Creating hundreds of FileSystemWatchers is not a good idea, is it?
So - how to do it?
Is it possible to watch/observe every device (HDDs, SD cards, pen drives, etc.)?
Will it be efficient? I don't think so... Every change to the Windows log files, every file scanned by an antivirus program - it could really slow down my program with FileSystemWatcher :(
Well, try it first and then you'll see if you run into trouble.
Trying to optimize something where you don't even know if there is a problem is usually not very effective.
You're probably right that creating 10,000+ FileSystemWatchers may cause a problem. If it does (as Foxfire says - test it), start with the easy consolidations -- ignore the extensions when setting up your FileSystemWatchers, and filter the events after you get them.
If that still results in too much resource usage, try intelligently combining paths in the same manner, perhaps even going so far as to only create one FileSystemWatcher per drive letter, and perform the rest of your filtering after the event is received by your code.
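For example, a consolidated setup might look like the sketch below (the watched paths and extensions are just placeholders for whatever the user adds):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ConsolidatedWatcher
{
    // User-supplied paths and extensions (hypothetical values for illustration).
    static readonly List<string> WatchedPaths = new List<string> { @"C:\Projects", @"D:\Incoming" };
    static readonly HashSet<string> WatchedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".csv", ".log" };

    static void Main()
    {
        var watchers = new List<FileSystemWatcher>();

        // One watcher per drive root instead of one per path/extension combination.
        foreach (var root in WatchedPaths.Select(Path.GetPathRoot).Distinct(StringComparer.OrdinalIgnoreCase))
        {
            var watcher = new FileSystemWatcher(root)
            {
                IncludeSubdirectories = true,
                NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
            };
            watcher.Changed += OnChanged;
            watcher.Created += OnChanged;
            watcher.EnableRaisingEvents = true;
            watchers.Add(watcher);
        }

        Console.WriteLine("Watching... press Enter to quit.");
        Console.ReadLine();
    }

    static void OnChanged(object sender, FileSystemEventArgs e)
    {
        // Do the path/extension filtering here, after the event arrives,
        // rather than by creating many narrowly-filtered watchers.
        bool pathMatches = WatchedPaths.Any(p => e.FullPath.StartsWith(p, StringComparison.OrdinalIgnoreCase));
        bool extMatches = WatchedExtensions.Contains(Path.GetExtension(e.FullPath));
        if (pathMatches && extMatches)
            Console.WriteLine(e.ChangeType + ": " + e.FullPath);
    }
}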