Iterating through a file structure - C#

I have a directory structure like the one below on a web service that sends me a list of the files and folders in the directory I request. How would you go about iterating through the entire structure? There is no recursive search, so each directory has to be pulled one at a time. I can get the contents of the directory I request, but it doesn't list any sub-directory's contents. I was trying to think of a way to do this with a for loop or foreach loop, but I haven't been able to come up with anything. I didn't write the web service, so I can't add recursive directory searching to it.
Pictures
    photo1.png
    photo2.png
    TestFolder1
        April.png
        MyPictures
            ProfilePic.png
    TestFolder2
        2012
            August
                Photos
                    photo3.png
                    photo4.png

You're probably looking for a Tree Traversal Algorithm:
Traversing a tree of objects in c#
Tree traversal algorithm for directory structures with a lot of files
How to: Iterate Through a Directory Tree
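
For example, a breadth-first traversal with an explicit queue avoids recursion entirely, and each directory is requested from the service exactly once. This is only a sketch: IFileService and FileSystemEntry are stand-ins for whatever your web service actually exposes, so adjust the names and types to match the real API.

using System.Collections.Generic;

// Hypothetical wrapper around the web service call that lists one directory.
public interface IFileService
{
    // Returns the immediate children of `path` only (no recursion).
    IEnumerable<FileSystemEntry> ListDirectory(string path);
}

public class FileSystemEntry
{
    public string Path { get; set; }
    public bool IsDirectory { get; set; }
}

public static class RemoteTraversal
{
    public static IEnumerable<string> AllFiles(IFileService service, string root)
    {
        // Explicit work queue instead of recursion: directories found in
        // one listing are queued and pulled on later iterations.
        var pending = new Queue<string>();
        pending.Enqueue(root);

        while (pending.Count > 0)
        {
            string current = pending.Dequeue();
            foreach (var entry in service.ListDirectory(current))
            {
                if (entry.IsDirectory)
                    pending.Enqueue(entry.Path);  // visit this folder later
                else
                    yield return entry.Path;      // a file: report it now
            }
        }
    }
}

Swapping the Queue for a Stack turns this into a depth-first traversal with no other changes.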

Related

Searching for a specific file inside Azure File Storage

I have some performance issues while trying to get specific files from Azure Storage.
We have the following folder tree inside Azure:
In the Week folder there is a file. I would like to get that precise file in one call, based on Location, Year and Week. There is currently a working solution, but it isn't efficient: it looks recursively through the folder tree.
Does anyone have a better solution?
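
Since the Location/Year/Week hierarchy is known up front, the folder can be addressed directly instead of recursing. A minimal sketch using the Azure.Storage.Files.Shares client; the share name "reports", the path layout, and the assumption of exactly one file per week folder are all hypothetical:

using System.Threading.Tasks;
using Azure.Storage.Files.Shares;

public static class WeekFileFetcher
{
    // Builds the directory path from the known hierarchy in one step,
    // avoiding any recursive walk of the tree.
    public static async Task<byte[]> GetWeekFileAsync(
        string connectionString, string location, int year, int week)
    {
        var share = new ShareClient(connectionString, "reports"); // hypothetical share name
        ShareDirectoryClient weekDir =
            share.GetDirectoryClient($"{location}/{year}/Week{week}");

        // Assumption: the week folder contains a single file.
        await foreach (var item in weekDir.GetFilesAndDirectoriesAsync())
        {
            if (!item.IsDirectory)
            {
                var file = weekDir.GetFileClient(item.Name);
                var download = await file.DownloadAsync();
                using var ms = new System.IO.MemoryStream();
                await download.Value.Content.CopyToAsync(ms);
                return ms.ToArray();
            }
        }
        return null; // no file found in the week folder
    }
}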

UWP Enumerate Folders with 10,000 files while creating a subfolder

I'm a fairly experienced developer, but this has me stumped in UWP - I'll keep it simple.
Let's say I want to go through all photos in the pictures folder, watermark them, and save the watermarked version in a sub folder of pictures (eg. pictures\watermarked)
Sound easy?
Try 1: Using GetFilesAsync (incl. GetItemsAsync, GetFoldersAsync) - This method goes through every file, giving me the StorageFile objects I need.
There are two problems with this approach:
I can't show a progress bar until I've scanned every file, and that's painfully slow in UWP.
The Runtime Broker will consume all memory if I keep any reference to the StorageFile objects (so enumerating once and then enumerating again to show progress is seriously slow; think 1,000 times slower than Win32).
Try 2: Using Queries - This method involves using Windows.System.Search & Queries to return a list of pointers (ish) to all the files. I can then use StorageFolderQueryResult to get each StorageFile on the fly and release immediately so that the Runtime Broker behaves. This is very fast as it uses the Windows Index system, really, really fast.
The problem is that the query system is fairly stupid: as soon as I create the subfolder "Watermarked Photos", the StorageFiles returned by the query start to include files from the Watermarked folder, even though those files did not exist when the query ran. It appears that the query is really just a running count of files, not a static list of the actual files, so the results are arbitrary, shifting with any files added or removed within its scope after the query was invoked.
Anyone with thoughts on how to do this?
RESOLVED - It's not possible using the index system. I created my own Query class. It uses the GetItemsAsync method of folders (the number of objects there won't kill the Runtime Broker) and stores the paths of all files and subfolders in a string list. I can then use GetFileFromPathAsync to instantiate and destroy StorageItems as needed. The Runtime Broker is okay with that, and although it's not the best performance, it does give me custom file/folder filtering. Happy to elaborate if anyone needs more info.
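
A sketch of that workaround using the standard Windows.Storage APIs: paths are collected as cheap strings, and StorageFile objects are created one at a time only when actually needed. The class and method names are illustrative.

using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Storage;

public static class PathIndexer
{
    // Walk the tree once, keeping only path strings (not StorageFiles),
    // so the Runtime Broker never accumulates thousands of live handles.
    public static async Task CollectPathsAsync(StorageFolder folder, List<string> paths)
    {
        foreach (IStorageItem item in await folder.GetItemsAsync())
        {
            paths.Add(item.Path);
            if (item.IsOfType(StorageItemTypes.Folder))
                await CollectPathsAsync((StorageFolder)item, paths);
        }
    }

    public static async Task ProcessAsync(IEnumerable<string> filePaths)
    {
        foreach (string path in filePaths)
        {
            // Each StorageFile is created, used, and released immediately.
            StorageFile file = await StorageFile.GetFileFromPathAsync(path);
            // ... watermark `file` here ...
        }
    }
}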

C# move files between directories in a SAN

I am developing a WinForms application that moves files between directories in a SAN. The app searches for a list of files in directories containing over 200,000 files each, and then moves the found files to another directory.
For example I have this path:
\\san\xxx\yyy\zzz
I perform the search in \zzz and then move the files to \yyy, but while I'm moving the files, another app is inserting more files into the xxx, yyy and zzz directories.
I don't want to impact the access or performance to other applications that use these directories.
The files' integrity is also a big concern, because when the application moves the files to the \yyy directory, another app uses those files.
All of these steps should run in parallel.
What methods could I use to achieve this?
How can I handle concurrency?
Thanks in advance.
Using the published .NET sources, one can see how the move file operation works here. If you follow the logic, you'll find there's a call to KERNEL32. The MSDN documentation for this states:
To perform this operation as a transacted operation, use the MoveFileTransacted function.
Based on that I'd say the move method is a safe/good way to move a file.
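
As a rough sketch, a move with a simple retry loop covers the case where another process briefly holds a file; the paths and retry policy here are purely illustrative:

using System.IO;
using System.Threading;

public static class SafeMover
{
    // Attempts the move a few times before giving up, since other apps
    // are writing into these directories concurrently.
    public static bool TryMove(string source, string destination, int attempts = 3)
    {
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                File.Move(source, destination);
                return true;
            }
            catch (IOException) // file locked, or destination already exists
            {
                Thread.Sleep(500); // brief back-off before retrying
            }
        }
        return false;
    }
}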
Searching through a large number of files is always going to take time. If you need to speed this operation up, I'd suggest thinking outside of the box. For instance, I've seen cache buckets produced for both letters and dates. Using some sort of cache system, you could search for files more quickly, say by asking the cache for all files starting with "a", or for all files between two dates, etc.
Presumably, these files are used by something. For the purposes of this conversation, let's assume they are indexed in an SQL Database.
TABLE: Files
File ID Filename
-------------- ----------------------------
1 \\a\b\aaaaa.bin
2 \\a\b\bbbbb.bin
To move them to \\a\c\ without impacting access, you can execute the following:
Copy the file from \\a\b\aaaa.bin to \\a\c\aaaa.bin
Update the database to point to \\a\c\aaaa.bin
Once all open sessions have been closed for \\a\b\aaaa.bin, delete the old file
If the files are on the same volume, you could also use a hardlink instead of a copy.
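
Put together, the sequence might look like the sketch below. The Files table and its columns follow the example above; the connection handling and deferred-delete policy are illustrative only.

using System.Data.SqlClient;
using System.IO;

public static class BlobRelocator
{
    // Copy first, repoint the database, delete the original last:
    // readers always see a valid path at every step.
    public static void Relocate(string connectionString, int fileId,
                                string oldPath, string newPath)
    {
        File.Copy(oldPath, newPath);

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "UPDATE Files SET Filename = @new WHERE [File ID] = @id", conn))
        {
            cmd.Parameters.AddWithValue("@new", newPath);
            cmd.Parameters.AddWithValue("@id", fileId);
            conn.Open();
            cmd.ExecuteNonQuery();
        }

        // Deletion should wait until open handles on the old copy close;
        // here we just attempt it and leave stragglers for a later sweep.
        try { File.Delete(oldPath); }
        catch (IOException) { /* still in use; retry later */ }
    }
}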

Listing files from a path different than root, C# Google Drive SDK

I was trying to list files from a specific folder with the Google Drive v2 SDK but I have troubles with it.
If I want to list files from my 'root' folder it is easy, because I can write 'root' in parents in the q parameter. But I can't find an easy way to list files from my path "/folder1/pictures". The only solution I found is to list the folders in 'root', get the 'folder1' ID, list the folders there, get the 'pictures' ID, and then search for files with the q parameter set to the pictures ID in parents. That solution requires many queries and isn't performant. Is there any way to make it easier?
In order to list files in /folder1/pictures you will need its ID. There are three ways you can do this (see the sketch after the list)...
Recurse the directories as you are doing.
Look directly for "pictures". If the user has more than one "pictures" folder, then check the parents for each one to see which one is owned by "folder1"
Whichever of 1 or 2 you do, you can store the ID of pictures in some local storage, so subsequent searches are a single GET.
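
A sketch of options 1 and 3 combined against the Drive v2 .NET client: resolve each path segment with one query, then list the children of the resolved ID (which you would then cache). The folder titles come from the question; error handling and the case of duplicate titles are left out.

using Google.Apis.Drive.v2;
using Google.Apis.Drive.v2.Data;

public static class DriveHelpers
{
    // `service` is an already-authorized DriveService.
    public static FileList ListPicturesFiles(DriveService service)
    {
        string folder1Id = FindChildFolderId(service, "root", "folder1");
        string picturesId = FindChildFolderId(service, folder1Id, "pictures");
        // Cache picturesId here so later listings are a single GET.

        var request = service.Files.List();
        request.Q = $"'{picturesId}' in parents and trashed = false";
        return request.Execute();
    }

    private static string FindChildFolderId(
        DriveService service, string parentId, string title)
    {
        var request = service.Files.List();
        request.Q = $"'{parentId}' in parents and title = '{title}' " +
                    "and mimeType = 'application/vnd.google-apps.folder'";
        return request.Execute().Items[0].Id; // assumes the folder exists
    }
}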

Storing files on the filesystem

I have an application that stores images in a database.
Now I have learned that this isn't the best way to do this because of performance.
I've started extracting all the "blobs" out of the database and adding them (through a virtual folder) to the data folder.
The problem is that I already have 8,000 blobs stored, and if I put them in the folder like "data/<blobid>/<blobname.extension>", that folder will contain too many subfolders to be manageable.
I was wondering what the best way to store the files is. Should I group them by creation date, like "data/<year>/<month>/<day>/<blobid>/<name>"?
I also have to add that our files are stored in a tree in the database.
I was wondering if I should map that tree structure to the filesystem. The only problem there is that branches can be moved, which would mean I'd have to move the branches on the filesystem as well.
Any help is welcome.
Grtz,
M
What version of SQL Server are you using? If you are using 2008, you can use the FILESTREAM datatype to store images. This is just as efficient as storing them on the filesystem, but without any of the associated hassle. See Getting Traction with SQL Server 2008 Filestream.
A simple strategy is grouping according to the first [few] digit(s). E.g.:
1/
    2/
        123.blob
        129.blob
    5/
        151.blob
2/
    0/
        208.blob
That way, you know you'll never have more than 10 subdirectories in a directory. You may of course use more or fewer levels (of directories) and/or more digits per level.
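
A small helper can derive that bucket path from the blob ID. This sketch uses one digit per level; the "data" root and ".blob" extension are just the illustrative names from the example above.

using System.IO;

public static class BlobPaths
{
    // E.g. PathFor(123) -> data/1/2/123.blob, PathFor(208) -> data/2/0/208.blob,
    // matching the layout sketched above.
    public static string PathFor(long blobId, string root = "data", int levels = 2)
    {
        string id = blobId.ToString();
        string path = root;
        for (int i = 0; i < levels && i < id.Length; i++)
            path = Path.Combine(path, id[i].ToString()); // one digit per level
        return Path.Combine(path, id + ".blob");
    }
}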
A more complex, dynamic system could create sublevels on demand: if the number of blobs in a certain directory exceeds a preset maximum, create another 10 subdirectories and move the files in.
Most filesystems for BLOB data will set up a number of subdirectories. For instance, if you have IDs 1-10000, you could have:
00/
    00/
    01/
    02/
        00020.blob
        00021.blob
        ...
    ...
01/
02/
03/
...
The other question I have back for you: why is it so bad for you to manage them as BLOBs?
Do you need to store the files in the relevant tree structure? If not you could name the file /YOURFOLDER/blobid_blobname.extension. This way the upload folder acts purely as a repository for the data rather than mimicking the data structure.
