I have an application that stores images in a database.
Now I have learned that this isn't the best way to do this because of performance.
I've started extracting all the "blobs" out of the database and adding them (through a virtual folder) to the data folder.
The problem is that I already have 8,000 blobs stored, and if I put them in a folder like "data/<blobid>/<blobname.extension>", that folder will contain too many subfolders to be manageable.
I was wondering: what is the best way to store these files?
Should I group them by creation date, like "data/<year>/<month>/<day>/<blobid>/<name>"?
I should also add that our files are stored in a tree structure in the database.
I was wondering whether I should map that tree structure to the filesystem; the only problem there is that branches can be moved, which would mean I'd have to move the branches on the filesystem as well.
Any help is welcome.
Grtz,
M
What version of SQL Server are you using? Because if you are using 2008, you can use the FILESTREAM datatype to store images. This is just as efficient as storing them on the filesystem, but without any of the associated hassle. See Getting Traction with SQL Server 2008 Filestream.
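As a rough, hedged sketch of how that can look from C# (the Images table, its Photo FILESTREAM column and the Id key are assumed names for illustration, not from the question):

using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;

// Reads one FILESTREAM blob. "Images", "Photo" and "Id" are hypothetical names.
static byte[] ReadImage(string connectionString, int id)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            var cmd = new SqlCommand(
                "SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                "FROM Images WHERE Id = @id", conn, tx);
            cmd.Parameters.AddWithValue("@id", id);

            string path;
            byte[] txContext;
            using (var reader = cmd.ExecuteReader())
            {
                reader.Read();
                path = reader.GetString(0);
                txContext = (byte[])reader[1];
            }

            // SqlFileStream streams the blob straight from the underlying NTFS file,
            // so the image data is not rebuilt row by row inside the database engine.
            using (var stream = new SqlFileStream(path, txContext, FileAccess.Read))
            using (var ms = new MemoryStream())
            {
                stream.CopyTo(ms);
                tx.Commit();
                return ms.ToArray();
            }
        }
    }
}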
A simple strategy is grouping according to the first [few] digit(s). E.g.:
1/
  2/
    123.blob
    129.blob
  5/
    151.blob
2/
  0/
    208.blob
That way, you know you'll never have more than 10 subdirectories in a directory. You may of course use more or fewer levels (of directories) and/or more digits per level.
A more complex, dynamic system could create sublevels on demand: if the number of blobs in a certain directory exceeds a preset maximum, create another 10 subdirectories and move the files in.
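A minimal sketch of that scheme in C# (two one-digit levels and the .blob extension are illustrative choices, not requirements):

using System.IO;

// Builds a sharded path such as 1/2/123.blob from a blob id.
static string GetShardedPath(string rootFolder, long blobId)
{
    string id = blobId.ToString();
    string level1 = id.Substring(0, 1);                        // first digit
    string level2 = id.Length > 1 ? id.Substring(1, 1) : "0";  // second digit (pad very short ids)
    return Path.Combine(rootFolder, level1, level2, id + ".blob");
}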
Most file-based stores for BLOB data set up a number of subdirectories. For instance, if you have IDs 1-10000, you could have:
00/
  00/
  01/
  02/
    00020.blob
    00021.blob
    ...
  ...
01/
02/
03/
...
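A minimal sketch of that layout (the five-digit padding and two-character levels are taken from the example above; adjust them to your own id range):

using System.IO;

// Maps id 20 to 00/02/00020.blob: zero-pad the id, then split it into two-character levels.
static string GetPaddedPath(string rootFolder, int id)
{
    string padded = id.ToString("D5");       // e.g. 20 -> "00020"
    string level1 = padded.Substring(0, 2);  // "00"
    string level2 = padded.Substring(2, 2);  // "02"
    return Path.Combine(rootFolder, level1, level2, padded + ".blob");
}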
The other question I have back for you: why is it so bad for you to manage them as BLOBs?
Do you need to store the files in the relevant tree structure? If not, you could name the file /YOURFOLDER/blobid_blobname.extension. This way the upload folder acts purely as a repository for the data rather than mimicking the data structure.
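For example, a trivial sketch of that flat naming (the folder name is a placeholder):

using System.IO;

// Flat layout: the blob id keeps names unique, so no per-blob subfolder is needed.
static string GetFlatPath(string uploadFolder, long blobId, string blobName)
{
    return Path.Combine(uploadFolder, blobId + "_" + blobName);
}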
We are working on an educational website which allows users (teachers and students) to upload files (.pdf, .docx, .png, and ...). We don't have any experience in this area and want to make sure we are doing the right thing to store and index these files. We would like to have an architecture that scales well to high volumes of data.
Currently we store the path to our files in the database (as nvarchar(MAX)), like below:
~/Files/UserPhotos/2fd7199b-a491-433d-acf9-56ce54b6b14f_168467team-03.png
We use the code below to save and retrieve files:
//save:
file.SaveAs(Server.MapPath("~/Files/UserPhotos/") + fileName);
//retrieve:
<img alt="" src="@Url.Content(Model.FilePath)">
Now our questions are:
Are we proceeding in a good direction?
Should we save files in a root directory or a virtual directory?
Imagine our server has 1 TB of storage; after storing 1 TB of data, if we add an extra hard drive, how should we manage the change?
We searched a lot but did not find any good tutorial or guidelines for the correct architecture.
Sorry for my bad English.
In an ideal world, you would be using cloud storage such as Azure Blob Storage. If that's not an option, then the way I would do it is to create a separate web service that specifically deals with uploaded files and file storage.
By creating a separate web service that manages file storage, you isolate your concerns. This service can monitor hard drive storage space and balance it out as documents are being uploaded, and if you add additional servers in the future, you will already have the service separated out, so it won't be as big of a mess as it would otherwise be.
You can index everything in a SQL data store as files are being uploaded. Your issues are actually much more complicated than what I've just mentioned, though...
The other issue that needs attention is the game plan for if or when one of the hard drives goes kaput. Without a RAID 1 configuration of your hard drives, your availability plummets to nada.
Cue issue number 2: availability != backups. You need to consider your game plan for how you intend to back the system up, how often, during what time of day, etc. The more data you have, the more difficult this gets.
This is why everyone is moving over to Azure / AWS, etc.: you just don't have to worry about these sorts of things anymore.
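As a rough sketch of the drive-balancing idea mentioned above (the drive letters and the Uploads folder are assumptions for illustration):

using System.IO;
using System.Linq;

// Picks the data drive with the most free space before saving an upload.
static string GetUploadTargetFolder()
{
    var candidates = new[] { @"D:\", @"E:\", @"F:\" };  // hypothetical data drives

    string best = candidates
        .Select(root => new DriveInfo(root))
        .Where(d => d.IsReady)
        .OrderByDescending(d => d.AvailableFreeSpace)
        .Select(d => d.RootDirectory.FullName)
        .First();

    return Path.Combine(best, "Uploads");
}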
1. I usually save files in this way:
file.SaveAs(Server.MapPath("/Files/UserPhotos/") + fileName);
2. It is better to save them in a virtual directory, so that you can move your files folder to a new extra hard disk and change the virtual directory's path in IIS when you have too many files in this folder.
I am developing a WinForms application that moves files between directories on a SAN. The app searches for a list of files in directories that contain over 200,000 files each on the SAN, and then moves the found files to another directory.
For example I have this path:
\\san\xxx\yyy\zzz
I perform the search in \zzz and then move the files to \yyy, but while I'm moving the files, another app is inserting more files into the xxx, yyy and zzz directories.
I don't want to impact the access or performance to other applications that use these directories.
File integrity is also a big concern, because when the application moves the files to the \yyy directory, another app uses those files.
All of these steps should run in parallel.
What methods could I use to achieve this?
How can I handle concurrency?
Thanks in advance.
Using the published .NET sources, one can see how the move-file operation works here. Now, if you follow the logic, you'll find there's a call to KERNEL32. The MSDN documentation for this states:
To perform this operation as a transacted operation, use the MoveFileTransacted function.
Based on that I'd say the move method is a safe/good way to move a file.
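A minimal sketch of using File.Move for this (the collision check and the paths are illustrative, not from the original answer):

using System.IO;

// Moves a single file; on the same volume File.Move is effectively a rename, so no data is copied.
static void MoveSafely(string sourcePath, string targetDirectory)
{
    string targetPath = Path.Combine(targetDirectory, Path.GetFileName(sourcePath));

    if (File.Exists(targetPath))
        return;  // something else already placed this file; apply your own policy here

    try
    {
        File.Move(sourcePath, targetPath);
    }
    catch (IOException)
    {
        // The file may still be locked by the producing application; retry or log as needed.
    }
}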
Searching through a large number of files is always going to take time. If you need to increase the speed of this operation, I'd suggest thinking outside the box to achieve this. For instance, I've seen cache buckets produced for both letters and dates. Using some sort of cache system, you could search for files more quickly, say by asking the cache bucket for all files starting with "a", or for all files between two dates, etc.
Presumably, these files are used by something. For the purposes of this conversation, let's assume they are indexed in an SQL Database.
TABLE: Files
File ID        Filename
-------------- ----------------------------
1              \\a\b\aaaaa.bin
2              \\a\b\bbbbb.bin
To move them to \\a\c\ without impacting access, you can execute the following:
1. Copy the file from \\a\b\aaaa.bin to \\a\c\aaaa.bin
2. Update the database to point to \\a\c\aaaa.bin
3. Once all open sessions have been closed for \\a\b\aaaa.bin, delete the old file
If the files are on the same volume, you could also use a hardlink instead of a copy.
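A rough sketch of the first two steps, assuming the Files table above and a plain connection string (both hypothetical):

using System.Data.SqlClient;
using System.IO;

// Copy first, then repoint the database row; the old file is deleted later,
// once no readers have it open (step 3 above).
static void RelocateFile(string connectionString, int fileId, string oldPath, string newPath)
{
    // Step 1: copy, so readers of the old path are never interrupted.
    File.Copy(oldPath, newPath, overwrite: false);

    // Step 2: point the index at the new location.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        var cmd = new SqlCommand(
            "UPDATE Files SET Filename = @newPath WHERE [File ID] = @id", conn);
        cmd.Parameters.AddWithValue("@newPath", newPath);
        cmd.Parameters.AddWithValue("@id", fileId);
        cmd.ExecuteNonQuery();
    }
}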
We need to create, or integrate with, some existing software which identifies the FTP folder from which to download files.
The problem here is that the folder structure will be configured by the user at run time, according to how the client stores the files on the FTP server, and stored in some XML file or database.
The folder structure needs to be generic so that we can easily configure it for any type of structure. The folder or file names can contain dates, or parts of a date, which change every day according to the current date.
For example, we can have a folder Files_DDMMYYYY, and inside that there will be specific files which have to be downloaded every day.
OR
A single folder in which the different file names can contain dates.
The first major improvement you can make to your solution is a very simple one.
That would be to restructure the date format in your folder structure from using
Files_DDMMYYYY
to using
Files_YYYYMMDD
This way, your folders will list in the directory in sequential order. Otherwise, they will be sorted by the day of the month first, then the month of the year, and then the year. With DDMMYYYY you'll see them listed something like this:
01102011
01102012
01102013
01112011
01112012
01112013
01122011
01122012
01122013
15102011
15102012
15102013
15112011
15112012
15112013
15122011
15122012
15122013
With YYYYMMDD, you'll see them listed like this:
20111001
20111015
20111101
20111115
20111201
20111215
20121001
20121015
20121101
20121115
20121201
20121215
20131001
20131015
20131101
20131115
20131201
20131215
As I said, it's a very simple change that will help keep your structure organized. The second list is automatically sequentially ordered by date.
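A minimal sketch of building such a folder name in C# (the Files_ prefix follows the example above; the base path is an assumption):

using System;
using System.IO;

// Builds a per-day folder name that sorts chronologically, e.g. Files_20131215.
static string GetDailyFolder(string basePath, DateTime date)
{
    return Path.Combine(basePath, "Files_" + date.ToString("yyyyMMdd"));
}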
I am developing a WinForms application using C# 3.5. I have a requirement to save a file on a temporary basis. Let's just say, for argument's sake, that it's for a short duration of time while the user is viewing a particular tab in the app. After the user navigates away from the tab, I am free to delete this file. Each time the user navigates to the tab (which is typically only done once), the file will be created (using a GUID name).
To get to my question - is it considered good practice to save a file to the temp directory? I'll be using the following logic:
Path.GetTempFileName();
My intention would be to create the file and leave it without deleting it. I'm going to assume here that the Windows OS cleans up the temp directory at some interval based on % of available space remaining.
Note: I had considered using the IsolatedStorage option to create the file and manually delete it when I was finished using it, i.e. when the user navigates away from the tab. However, it's not going so well, as I have a requirement to get the absolute or relative path to the file, and this does not appear to be a straightforward/safe chore when interacting with IsolatedStorage. My opinion is that it's just not designed to allow this.
I write temp files quite frequently. In my humble opinion, the key is to clean up after oneself by deleting unneeded temp files.
In my opinion, it's a better practice to actually delete the temporary files when you don't need them. Consider the following remarks from the Path.GetTempFileName() method documentation:
The GetTempFileName method will raise an IOException if it is used to create more than 65535 files without deleting previous temporary files.
The GetTempFileName method will raise an IOException if no unique temporary file name is available. To resolve this error, delete all unneeded temporary files.
Also, you should be aware of the following hotfix for Windows 7 and Windows Server 2008 R2.
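A minimal sketch of the create-use-delete pattern being recommended here (what you write into the file is up to you):

using System.IO;

// Creates a temp file, uses it, and always deletes it afterwards,
// so the 65535-file limit quoted above is never reached.
static void UseTempFile(byte[] data)
{
    string tempPath = Path.GetTempFileName();  // creates a zero-byte file and returns its path
    try
    {
        File.WriteAllBytes(tempPath, data);
        // ... let the tab/view read from tempPath here ...
    }
    finally
    {
        File.Delete(tempPath);
    }
}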
Creating temp files in the temp directory is fine. It is considered good practice to clean up any temporary file when you are done using it.
Remember that temp files shouldn't persist any data you need on a long-term basis (defined as across user sessions). Examples of data needed long term are user settings or a saved data file.
Go ahead and save there, but clean up when you're done (closing the program). Keeping them until the end also allows re-use.
I am using ASP.NET MVC and have a section where a user can upload images. I am wondering where I should store them.
I was following this tutorial, and he seems to store them in app_data. However, I read another person say it should only hold your database.
So I'm not sure what the advantages are of using app_data. I am on shared hosting, so I don't know if that makes a difference.
Edit
I am planning to store the paths to the images in the database. I will then be using them in an image tag and rendering them to the user when they come to my site. I have a file uploader that will only expect images (the check will be on the client and the server).
The tutorial is a simple example - and if you read the comments, the original code just saved to an uploads directory, no app_data in sight.
It was changed to app_data because that's a special folder - one that will not allow execution of code.
And you have understood correctly - app_data is really there for holding file-based databases. That's the meaning of the folder. As such, saving images into it doesn't feel like the right thing to do.
If you are certain only images will get uploaded (and you control that), I would suggest an /uploads directory - a reserved location for images that also will not allow code execution (something that you should be able to control via IIS).
I would say that depends on what you will do with those images later. If you use those images in an img tag, you could save them somewhere in the Content/ folder structure.
If you do not need them reachable from the outside, or need to stream them back with changes, you might store them outside the web root, if the hoster allows for that.
I wouldn't store them in app_data, as I - personally - think it's more a convention to store a database there. Most developers not familiar with that product wouldn't look there for the images.
But: you could store binaries in a DB (even though that is probably not the best thing to do), so a reference in a DB pointing to a file in the same directory makes sense again.
It's more an opinion thing than a technical question though, I think.
I prefer to store them in the database. When storing images on the file system I've found it can be a bit harder to manage them. With a database you can quickly rename files, delete them, copy them, etc. You can do the same when they're on the file system, but it takes some scripting knowledge.
Plus I prefer not to manage paths and file locations, which is another vote for the database. Those path values always make their way into the web.config and it can become more difficult to deploy and manage.