Sync services like Dropbox, theory behind file indexing? - c#

I have realised that by using the Amazon S3 service directly, I can save myself a lot of money. Instead of buying a client like GoodSync or Jungle Disk, I thought it would be interesting to create my own Windows syncing application, which would sync my files to S3.
I have discovered that I can use FileSystemWatcher to monitor for changes to files and directories, but I am looking for the theory behind how services like Dropbox index their files: for example, comparing a file's size with the size recorded in an index somewhere on the client PC, then using that information to determine whether to sync or not.
I am using C#, and references to libraries or code samples I could use would be helpful, but I am mainly looking for the best way to index files and for someone to point me in the right direction.
Thanks

I've gone down this path myself. In fact, now that Mozy has dropped their unlimited plan and Carbonite chooses NOT to back up certain files, like 3GP files and *.dat files, unless you routinely go in and manually add them, I am very disgruntled with online backups.
But your question was about syncing. Dropbox does it best, but it's expensive, and I'm not sure S3 would be any cheaper.
Anyway, you will have a lot of hurdles. In my experience, the problems I ran into were:
1) Propagating deletes
2) FileSystemWatcher simply missing events, such as when files are rapidly added to a folder and then deleted
3) etc.
Now some ideas on how I would tackle this again:
1) Keep a small SQLite database of file names/paths locally
2) Copy files to a tmp directory before sending them to S3.
3) On file changes/updates/deletions/etc., store that metadata in SQLite
Anyway, just some ideas; a rough sketch of the indexing part is below.
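To make ideas 1) and 3) concrete, here is a minimal sketch of the size/timestamp comparison you asked about. It assumes the Microsoft.Data.Sqlite NuGet package, and the table layout is made up purely for illustration:

using System;
using System.IO;
using Microsoft.Data.Sqlite;

class SyncIndex
{
    private readonly SqliteConnection _db;

    public SyncIndex(string dbPath)
    {
        _db = new SqliteConnection("Data Source=" + dbPath);
        _db.Open();
        using (var cmd = _db.CreateCommand())
        {
            cmd.CommandText = "CREATE TABLE IF NOT EXISTS files " +
                              "(path TEXT PRIMARY KEY, size INTEGER, lastWriteUtc TEXT)";
            cmd.ExecuteNonQuery();
        }
    }

    // True when the file is new or its size/last-write time differ from the index.
    public bool NeedsSync(string path)
    {
        var info = new FileInfo(path);
        using (var cmd = _db.CreateCommand())
        {
            cmd.CommandText = "SELECT size, lastWriteUtc FROM files WHERE path = $p";
            cmd.Parameters.AddWithValue("$p", path);
            using (var reader = cmd.ExecuteReader())
            {
                if (!reader.Read()) return true; // never seen before
                return reader.GetInt64(0) != info.Length
                    || reader.GetString(1) != info.LastWriteTimeUtc.ToString("o");
            }
        }
    }

    // Record the file's current state after a successful upload.
    public void MarkSynced(string path)
    {
        var info = new FileInfo(path);
        using (var cmd = _db.CreateCommand())
        {
            cmd.CommandText = "INSERT OR REPLACE INTO files (path, size, lastWriteUtc) " +
                              "VALUES ($p, $s, $t)";
            cmd.Parameters.AddWithValue("$p", path);
            cmd.Parameters.AddWithValue("$s", info.Length);
            cmd.Parameters.AddWithValue("$t", info.LastWriteTimeUtc.ToString("o"));
            cmd.ExecuteNonQuery();
        }
    }
}

A FileSystemWatcher can call NeedsSync as events arrive, but since it can miss events (hurdle 2 above), a periodic full scan of the watched tree against the index is a useful safety net. It also helps with propagating deletes: any indexed path that no longer exists on disk needs a delete sent to S3.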

Related

How to store user-uploaded files using MVC.NET on an IIS web server?

We are working on an educational website which allows users (teachers and students) to upload files (.pdf, .docx, .png and ...). We don't have any experience in this area and want to make sure we are doing the right thing to store and index these files. We would like an architecture that scales well to high volumes of data.
Currently we store the path to our files in the database like below (nvarchar(MAX)):
~/Files/UserPhotos/2fd7199b-a491-433d-acf9-56ce54b6b14f_168467team-03.png
and we use the code below to save and retrieve files:
//save:
file.SaveAs(Server.MapPath("~/Files/UserPhotos/") + fileName);
//retrieve:
<img alt="" src="@Url.Content(Model.FilePath)">
Now our questions are:
Are we proceeding in a good direction?
Should we save files in a root directory or a virtual directory?
Imagine our server has 1 TB of storage; after storing 1 TB of data, if we add an extra hard drive, how should we manage the change?
We searched a lot but did not find any good tutorial or guidelines for the correct architecture.
Sorry for my bad English.
In an ideal world you would be using cloud storage, such as Azure Blob Storage. If that's not an option, then the way I would do it is to create a separate web service that specifically deals with uploaded files and file storage.
By creating a separate web service that manages file storage, you will have isolated your concerns. This service can monitor hard drive storage space and balance it out as documents are being uploaded, and if you add additional servers in the future, you will already have separated the service, so it won't be as big a mess as it would be otherwise.
You can index everything in a SQL data store as files are being uploaded (a rough sketch of that is below). Your issues are actually much more complicated than what I've just mentioned, though...
The other issue that needs attention is the game plan for if or when one of the hard drives goes kaput! Without a RAID 1 configuration of your hard drives, your availability plummets to nada.
Cue issue number 2: availability != backups. You need to consider your game plan for how you intend to back the system up, how often, during what time of day, etc. The more data you have, the more difficult this gets...
This is why everyone is moving over to Azure / AWS etc.: you just don't have to worry about these sorts of things anymore...
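As a rough sketch of the "index everything in a SQL data store as files are uploaded" idea (the controller, table, column names, and the "Storage" connection string are all invented for illustration, not an existing API), an upload action might store the file under a generated name and record only metadata in SQL:

using System;
using System.Configuration;
using System.Data.SqlClient;
using System.IO;
using System.Web;
using System.Web.Mvc;

public class FilesController : Controller
{
    [HttpPost]
    public ActionResult Upload(HttpPostedFileBase file)
    {
        if (file == null || file.ContentLength == 0)
            return new HttpStatusCodeResult(400);

        // Never trust the client's file name; store under a generated one.
        string storedName = Guid.NewGuid() + Path.GetExtension(file.FileName);
        string folder = Server.MapPath("~/App_Data/Uploads/");
        Directory.CreateDirectory(folder);
        file.SaveAs(Path.Combine(folder, storedName));

        // Record only metadata in SQL; the physical location can change later
        // (extra disk, separate file service) without rewriting stored URLs.
        string cs = ConfigurationManager.ConnectionStrings["Storage"].ConnectionString;
        using (var conn = new SqlConnection(cs))
        using (var cmd = new SqlCommand(
            "INSERT INTO UploadedFiles (StoredName, OriginalName, ContentType, SizeBytes, UploadedUtc) " +
            "VALUES (@stored, @original, @type, @size, @utc)", conn))
        {
            cmd.Parameters.AddWithValue("@stored", storedName);
            cmd.Parameters.AddWithValue("@original", file.FileName);
            cmd.Parameters.AddWithValue("@type", file.ContentType);
            cmd.Parameters.AddWithValue("@size", file.ContentLength);
            cmd.Parameters.AddWithValue("@utc", DateTime.UtcNow);
            conn.Open();
            cmd.ExecuteNonQuery();
        }

        return RedirectToAction("Index");
    }
}

Because only metadata and a generated name are stored, the physical files can later move to another disk, a dedicated file service, or blob storage without rewriting every path saved in the database.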
1. I usually save files this way:
file.SaveAs(Server.MapPath("/Files/UserPhotos/") + fileName);
2. It is better to save them in a virtual directory, so that you can move your files folder to a new or extra hard disk and change the virtual directory's path in IIS when you have too many files in the folder.

Transactional file management in C#

I have a scenario in my application where I need to upload some files (zip files) from the client to the server. On the server I want to extract the zip file and use the extracted files to replace files in some other folder.
The files that need to be replaced are mostly DLL files, so one thing I need to ensure is that either all of the files are replaced or none of them are.
Is there any way in C# to achieve this (like a transaction in SQL)? If anything bad occurs while replacing the files (for example, running out of disk space), every change made to the previous files should be rolled back.
Hope you understand the problem.
Any help?
NTFS allows file system transactions; see https://msdn.microsoft.com/en-us/magazine/cc163388.aspx
Having had a quick poke around, the only way I can see you doing this would be through https://msdn.microsoft.com/en-us/magazine/cc163388.aspx, which involves some native code. Otherwise you could use a third-party tool such as http://transactionalfilemgr.codeplex.com/
If you wanted to manage it yourself or go for a simpler approach, I would suggest backing up the existing files somewhere before trying to copy the new files over. This could be in another folder or zipped up. Then, if the copy fails, you handle this and revert all the files to their original state.
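A minimal sketch of that backup-and-restore idea, assuming the extracted files and the target folder are both flat directories of DLLs (the method and folder names are made up), might look like this. It is best-effort rather than a true transaction, but it keeps the all-or-nothing behaviour in the common failure cases:

using System;
using System.IO;

static class SafeReplace
{
    public static void ReplaceAll(string extractedDir, string targetDir)
    {
        string backupDir = targetDir.TrimEnd('\\') + "_backup_" + DateTime.Now.Ticks;
        Directory.CreateDirectory(backupDir);

        // 1. Back up everything we are about to overwrite.
        foreach (string file in Directory.GetFiles(targetDir))
            File.Copy(file, Path.Combine(backupDir, Path.GetFileName(file)));

        try
        {
            // 2. Copy the new files over the old ones.
            foreach (string file in Directory.GetFiles(extractedDir))
                File.Copy(file, Path.Combine(targetDir, Path.GetFileName(file)), overwrite: true);
        }
        catch
        {
            // 3. Something went wrong: restore every original file from the backup.
            foreach (string file in Directory.GetFiles(backupDir))
                File.Copy(file, Path.Combine(targetDir, Path.GetFileName(file)), overwrite: true);
            throw;
        }
        finally
        {
            // Delete (or keep, if you prefer) the backup once we are done.
            Directory.Delete(backupDir, recursive: true);
        }
    }
}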
Whatever you choose, make sure you have plenty of logging so you can see what's happening and if/when something goes wrong :)

Getting code from Windows Azure VM

OK, so my hard disk just crashed. Big deal. All my web dev code that was on it went along with it, and now I'm running ddrescue on Ubuntu trying to recover whatever data I can. The hard disk keeps disconnecting and sometimes stops responding for a long time, so it's really a pain in the ass.
Anyway, back to the main topic: I have my web dev code, which was packaged and uploaded to Azure; now what I'm wondering is whether it's possible to obtain all my .cs files from the VM. I noticed the approot and siteroot folders, but all I saw were the views, the .asax file, and some other misc. stuff, nothing with the .cs extension.
Is there any way I can get a copy of the code I packaged? Or (as a last resort) any way to get the .cspkg file and work from there?
The site you are seeing on the web role and inside the cspkg file is the output of the compile, so you can't get the original .cs files out of them. That said, you can use a tool like Reflector, JustDecompile, or a variety of other decompilers out there to reverse engineer your compiled bits into something that will be very close to the original C# code (note: I'm assuming this is your own code, or code that doesn't have a provision against reverse engineering). This at least will let you use the bits on the web role to get the majority of your code back, then review it to see how good a job it did.
Note that you can open the cspkg file. It's just a zip file: you can rename it with a .zip extension and open it up, but you won't find the .cs files in there. The only time you will find them is if you have multiple websites within a single web role. The default packager for Windows Azure doesn't compile the additional sites, it only packages up all the files in their root directory. Not at all helpful for actual deployments really, and it won't likely help you either.
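If you just want to peek inside the package without renaming it first, a small sketch using System.IO.Compression (the package path here is a placeholder) can list the entries so you can see exactly what was deployed:

using System;
using System.IO.Compression; // reference System.IO.Compression.FileSystem for ZipFile

class CspkgPeek
{
    static void Main()
    {
        // The path is a placeholder; a .cspkg is just a zip archive.
        string package = @"C:\packages\MyService.cspkg";
        using (ZipArchive archive = ZipFile.OpenRead(package))
        {
            // List every entry so you can see what the packager actually included.
            foreach (ZipArchiveEntry entry in archive.Entries)
                Console.WriteLine(entry.FullName);
        }

        // Or extract everything to a folder for inspection:
        // ZipFile.ExtractToDirectory(package, @"C:\packages\extracted");
    }
}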
You are likely well ahead of me on this, but I'd recommend using a personal source control system of some sort to avoid this issue in the future.

Create a folder backed by software rather than OS?

I recently got put on a project where they're having issues with too many files in a folder slowing down access. I believe Windows starts to slow down access at 10,000+ files in a single folder; we have something on the order of 50,000. All the files are small, and most of the time we only need to access the newest 0.1-0.2% of them via Windows file and print sharing. I'd look into dividing the files into subfolders, except that there is a bunch of legacy code that is only able to look at a single folder.
My idea - I don't know if it is possible or even plausible - is to create a small program that buffers the newest 0.1-0.2% of files in memory and retrieves the rest from disk as needed.
I had thought that years ago I'd read of a protocol that could simulate a folder on a hard drive. Is it possible?
Is there something out there that already does this? Is there a better option without major changes to the system?
What do other systems use for serving up a large number of files? Is there some other product that serves files that we could map as a network drive? Or some way to blend two folders so they look like one?
Putting aside the "correct way to solve this problem" for the moment, what you're looking for is called "Shell namespace extensions". There are several .NET resources for writing these explorer extensions.
http://namespaceextension.codeplex.com/
http://www.codeproject.com/Articles/1649/The-Complete-Idiot-s-Guide-to-Writing-Namespace-Ex
http://www.codeproject.com/Articles/13515/A-Namespace-Extension-Toolkit
And perhaps many more.
Of course - we must remember why it isn't a good idea to write explorer extensions in .NET.
Hope this helps.

How to handle temporary files in an ASP.NET application

Recently I was working on displaying workflow diagram images in our web application. I managed to use the rehosted WF designer and create images on the fly on the server, but given how large the workflow diagrams can very quickly become, I wanted to provide a better user experience by using an AJAX control for displaying images that supports zoom and pan functionality.
I happened to come across the website of Seadragon, which seems to be just an amazing piece of work that I could use. There is just one disadvantage: in order to use their library for generating deep zoom versions of images, I have to use the file structure on the server. Because of the temporary nature of the images I am using (workflow diagrams with progress indicators), it is important not only to be able to create such images but also to get rid of them after some time.
Now the question is how I can best ensure that the temporary image files and the folder hierarchy can be created on the server (ASP.NET web app) and later cleaned up. I was thinking of using the cache functionality and deleting the corresponding image folder hierarchy when the cache item expires, or simply deleting the contents of the whole temporary folder in Application_Start and Application_End of Global.asax, but I'm not really sure whether this is a good idea and whether there are some security restrictions or file-system-related troubles. What do you think?
We do something similar for creating PDF reports and found the easiest way is to use a timestamp check to determine how "old" files are, and then delete them after a period of time, in our case when they are more than 2 hours old. This is done before the next PDF document is created, as part of the creation process. We also created a specific folder and gave the ASP.NET user read/write access to it.
The only disadvantage is that if the PDF-creation process is not used regularly there will be a build-up of files; however, they will be cleaned up eventually. In 2 years and close on 4,000 PDFs we have yet to have an error doing it this way.
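As a rough illustration of that timestamp check (the folder name is a placeholder and the 2-hour cutoff is just the value mentioned above), the cleanup can be a few lines called before the next document is generated:

using System;
using System.IO;
using System.Web.Hosting;

static class TempFileCleanup
{
    // Delete generated files older than the cutoff; called as part of
    // creating the next document, as described above.
    public static void DeleteOldFiles(string virtualFolder, TimeSpan maxAge)
    {
        string folder = HostingEnvironment.MapPath(virtualFolder);
        if (folder == null || !Directory.Exists(folder))
            return;

        DateTime cutoffUtc = DateTime.UtcNow - maxAge;
        foreach (string file in Directory.GetFiles(folder))
        {
            try
            {
                if (File.GetLastWriteTimeUtc(file) < cutoffUtc)
                    File.Delete(file);
            }
            catch (IOException)
            {
                // The file may still be in use (e.g. being streamed to a client);
                // skip it and let a later run pick it up.
            }
        }
    }
}

// Usage, before generating the next report:
// TempFileCleanup.DeleteOldFiles("~/App_Data/Reports", TimeSpan.FromHours(2));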
Use the App_Data folder. This folder is inside your application and writable by your app without having to go outside the context of the app, but it's also secured from casual browsing. It's meant to hold data files for your application.
Application_Start and Application_End will only fire once each, so if you need better cleanup than that, I would consider using a cache structure or a simple Windows service to handle the cleanup.
First, you have to make sure your IIS worker process has rights to write/delete files in your cache directory (and NOT the rest of your site, just in case).
Second, I would stay away from using Application_Start and Application_End; Application_End is not 100% guaranteed to fire, so cleaning up files there could leave you with a growing pile of orphaned images.
I would instead make a scheduled process that runs maybe once per hour, or once a day, depending on what you want, and have it check how old each image in your cache is; if it's older than your arbitrary "expire time", delete it.
Other than that there's not much to it.
