Related:
Storing Images in DB - Yea or Nay?
After reading the above question, it seems the preferred method for image storage with databases is to store only the filepath within the database. However, most of these answers seem to focus on web servers.
In my case, I'm developing a desktop application that will be used across multiple computers within an intranet. A dedicated server will host the database, containing information related to performing tests on various equipment.
Images need to be stored on the server in some way. Would storing the images in the database be the correct approach in this case, or even the only approach?
Pros:
Backup is limited to only the database.
No need to open up the server's file system to the network.
Single protocol for server information access.
Protected file access. (User can't go in and delete all the images)
Cons:
Performance issues down the road if there are too many images.
Edit: As stated in the tags, the application is being written in C#/.NET. If writing the images to the file system is an option in this case, I could use some help understanding how this is done.
Edit 2: As elaborated some in the comments below, for now I'm assuming a MySQL database, although the FileStream capabilities of SQL Server 2008 could potentially change that.
Also in my case, images will be added often and can be considered read-only from that point on, since they should never be changed and will just be read out when needed. Images will likely be small (~70k each). I'm also considering storing some other binary files on the server (~20k each), to which I can likely apply the same storage and retrieval approach.
I'd suggest keeping those files on disk in the file system, rather than in the database. File system for files, databases for relational data, etc.
Deliver by Web Service
Consider delivering those images to your desktop app by hosting a web service/app on that DB machine. That app's sole job is to serve images. Set up a web server on that machine with an ASP.NET application, and have an .ashx handler take requests and stream the binary image back. Something like this:
http://myserver/myapp/GetImage.ashx?CustomerID=123&ImageID=456
Security
If intranet security is an issue, this would be the point where you could ensure that the user is authenticated and authorized for read access to the image. Audit trails could be implemented here as well.
File System Security
Regarding security on those images, consider that NTFS gives you a lot of measures to ensure that only those who are authorized can read/delete/put files as required. The task then would be to define those roles and implement Windows security groups.
Future Needs
This approach allows you to securely consume those images from anywhere on the intranet. Perhaps this app would be migrated to a web application at some point? Perhaps a feature request comes from the customer where a web solution is appropriate?
This might sound like overkill rather than reading a blob from the database, but it's great from a security perspective. Consider your customers' and patients' expectations on privacy and security.
<%@ WebHandler Language="C#" Class="Handler" %>

using System.Web;

public class Handler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // go to the DB and get the path for this ID
        string filePath = GetImagePath(context.Request.QueryString["ImageID"]);
        // now you have the path on disk; read the file
        byte[] imgBytes = GetBytesFromDisk(filePath);
        // send back as byte[]
        context.Response.ContentType = "image/jpeg";
        context.Response.BinaryWrite(imgBytes);
    }

    // GetImagePath (a DB lookup) and GetBytesFromDisk (essentially File.ReadAllBytes) are left to you.
    public bool IsReusable { get { return false; } }
}
I think the answer is that there is no right answer. As with most things in programming (and life), it DEPENDS.
Here are some Pros and Cons of storing in DB:
PROS
Easy backup and management, and a one-stop shop for data in your application
Fewer dependencies in your app and fewer moving parts (KISS principle)
Works fine for small files (under 1 GB)
Hey, it's a DB, so saves can be done inside transactions and rolled back if there are network problems
SharePoint and TFS store everything in the DB and work just fine; even the big boys do it
Security can be easily controlled by the app and not involve file/folder permissions
CONS
Eats up DB space
Potentially affects performance if not done right
Not such a great idea if you're always storing large files (> 1 GB), unless you use FILESTREAM in SQL Server 2008
Requires you to implement a decent caching strategy (although you would probably want this anyway)
The file system feels more natural than a DB, and it's easier to manually replace or view files.
I guess when it comes to your situation, I would lean towards the simplicity of storing in the DB.
From an architecture perspective, you'll get the best performance by splitting the solution into two pieces: a database server, and an image server.
You would do this both to keep row sizes small and to separate your transactional environment from content. Relational databases in the vein of SQL Server and MySQL will support big BLOBs but aren't optimized for them.
Most people equate "image server" to "web server" because they work on web applications and therefore have a de facto image repository (a directory on a local disk). However, this does not have to be the case. Images can be served from any location over any protocol.
You mentioned a C#/.NET platform and an intranet. Can we assume a Windows environment, possibly Active Directory?
If so, a plain vanilla file server could be your image server. Set up a file share, set read/create (but not modify/delete) permissions on it for all users of this app, store the UNC path somewhere in the database (so you don't have to redeploy the app if you decide to relocate it), and have your client application generate a unique, relative path using something reliable like a Guid.
It's not as elegant as a web service (which is my preferred approach), nor quite as maintenance-free as the pure-database approach, but my impression of this topic is that you're on a tight budget with a short delivery deadline, and a Windows or NFS file server is cheaper, easier, and faster to set up and maintain (including backups) than a full-fledged web server, so it might be just what you're looking for here.
Most businesses already have a file server, so usually this won't require any new infrastructure whatsoever. But even if you don't, I've seen file servers run off old reconditioned workstations - it's not fancy, but in a low-traffic environment it gets the job done.
If you choose this approach, I would suggest some kind of directory structure on the file share to simplify backups, archiving, etc. For example:
\\ImageServer\MyAppRepository\yyyy-mm\{image-file-name-or-guid}.{ext}.
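As a rough sketch of the client-side save under that convention (the class, method name, and folder layout here are just illustrative, not anything prescribed):

// Sketch: write an image to the file share under a yyyy-MM folder with a GUID name,
// and return the relative path to record in the database. The UNC root comes from the DB.
using System;
using System.IO;

public static class ImageStore
{
    public static string SaveImage(string uncRoot, byte[] imageBytes, string extension)
    {
        // e.g. "2009-08\3f2504e0-4f89-11d3-9a0c-0305e82c3301.jpg"
        string relativePath = Path.Combine(
            DateTime.UtcNow.ToString("yyyy-MM"),
            Guid.NewGuid().ToString() + extension);

        string fullPath = Path.Combine(uncRoot, relativePath);
        Directory.CreateDirectory(Path.GetDirectoryName(fullPath));
        File.WriteAllBytes(fullPath, imageBytes);

        return relativePath;   // store this, not the full UNC path, in the image's DB row
    }
}

Storing only the relative path keeps the rows small and means relocating the share later only requires changing the single UNC value held in the database.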
Hope that helps.
How many images are we talking about? Are they unique or updated frequently? If not, can you package the images with the client that you are going to distribute to multiple computers?
Personally, I would avoid storing images in the database, and instead as you said store the file paths.
If you have read through all of the other similar questions (This, this, and this) but are still asking if this is a good idea, then maybe your problem is different enough that this would be a good idea.
My company developed a Windows Forms C# application that stores images in a database, and it worked out pretty well. We have been actively using it since 2003 and have about 150 gigs of data in the system.
First, let me say that this is NOT the optimal performance architecture. We have had some problems with keeping the database statistics up to date and keeping the indexes tuned correctly. We basically have to re-index the system monthly. You need to be aware that the built-in optimization system of most RDBMS servers is not set up for large collections of binary objects.
The reason we chose to put the images in the database is database-level replication. Our system is spread across seven offices in five states, and I needed to sync the data to each site. So, I set up a VPN between each site and our corporate office and set up SQL merge replication on the database. In this way, I can sync the data and images at the same time with only one channel open between offices.
So, I would say that images in the database is not the optimal solution in most cases but it worked out for our requirements.
I don't think it matters where the images are stored. Pick the simplest approach that will work. But you should have an architecture where you can change the approach if it proves to be the wrong one.
To accomplish this, I would put the data and the image storage both behind a web services interface. Pick a technology - doesn't matter. All access to the data (and images) would be the same way - through the web service.
By doing this, you have decoupled where the data is stored from the desktop application. The desktop app doesn't care. All it knows is that the server at a certain address can get it the data.
Then store the data and the images wherever you want. Choose the simplest thing for you. If you end up having issues, then (and only then) should you add additional complexity in order to solve the problem. The good news is that the additional complexity and work shouldn't affect the desktop applications at all. You can make the changes on the server without having to deploy a new version of the desktop applications.
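As a rough sketch of what the desktop side could look like under that model (the URL shape borrows the GetImage.ashx example from earlier; the class and parameters are placeholders):

// Sketch: the desktop app only knows a service URL; whether the server pulls the bytes
// from a DB, the file system, or cloud storage is invisible to it.
using System.Net;

public class ImageServiceClient
{
    private readonly string _baseUrl;

    public ImageServiceClient(string baseUrl)
    {
        _baseUrl = baseUrl;
    }

    public byte[] GetImage(int customerId, int imageId)
    {
        using (var client = new WebClient())
        {
            string url = string.Format("{0}/GetImage.ashx?CustomerID={1}&ImageID={2}",
                                       _baseUrl, customerId, imageId);
            return client.DownloadData(url);
        }
    }
}

If the storage later moves from the file system to DB BLOBs or cloud storage, this client code doesn't change at all; only the service behind the URL does.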
If you're looking for alternatives, one of my favorites is a ten-line HTTP POST file upload handler (PHP, .NET, Java, etc.) plus one web server. After the script validates the max file size (and possibly extracts the width and height), it inserts a row into the database. Retrieval need not go through the script; standard file hosting will work. This would require you to open port 80. You needn't complicate this with SOAP or anything; a regular upload handler would do the job.
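A minimal sketch of such a handler in C# (the form field name, size limit, save folder, and the InsertImageRow helper are all placeholders):

<%@ WebHandler Language="C#" Class="UploadHandler" %>

using System;
using System.IO;
using System.Web;

public class UploadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // "image" is a placeholder form field name; 1 MB is an arbitrary limit
        HttpPostedFile file = context.Request.Files["image"];
        if (file == null || file.ContentLength == 0 || file.ContentLength > 1024 * 1024)
        {
            context.Response.StatusCode = 400;   // reject missing or oversized uploads
            return;
        }

        string savedName = Guid.NewGuid() + Path.GetExtension(file.FileName);
        file.SaveAs(Path.Combine(@"D:\Images", savedName));

        // placeholder for your own DB insert (file name, size, dimensions, ...)
        InsertImageRow(savedName, file.ContentLength);
        context.Response.Write(savedName);
    }

    private void InsertImageRow(string fileName, int size) { /* INSERT INTO ... */ }

    public bool IsReusable { get { return true; } }
}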
Then there's WebDAV, along the same lines. Of course, with this method, you'd have to monitor the filesystem and adjust the database accordingly. You could use a polling service or hook into file system events. Actually, you could also inject an ISAPI filter or Apache handler to perform the database updates.
You could use FTP. Add an extension to ProFTPd that will update the database and keep everything in sync.
Lots of ways to avoid putting image data into tables.
If you opt for the database solution, just be sure to segment your BLOBs into separate tables. Separate table spaces / devices / partitions, if you can. Or, use Oracle and ignore everything I've said.
Use Amazon S3 storage for your images
Just store the GUID or other file name in the DB
Amazon is simple, fast, cheap, secure, etc.
It scales fine and optionally provides CDN-like edge services directly from S3.
Storing images in the DB always seems to turn into a nightmare over time.
It seems to me that what you want to do is something like what Infovark does.
They use Firebird for this, and I'll give you a link on Firebird and storing images.
You should try MS SQL Server 2008; it comes with a FILESTREAM type, which automatically stores BLOBs in the file system.
Related
I need to be able to support user image upload and download/viewing of images.
Here are my options:
1) Store images in a sql database.
I have seen this work for a small setup. DB cost would go higher as the size increases.
Backups would be easier. Can't take advantage of caching or a CDN.
2) Store images in a file system.
I have seen this option become cumbersome in anything larger than a small setup. It's difficult to manage directories with a huge number of files. You have to come up with some hashing scheme to make sure each directory holds only a few images and contains only a few subdirectories. I don't know if there is a limit in Windows on how deep a directory structure can go. Caching could be used.
3) Store images in nosql DB.
Just throwing this one out there. I am not too familiar with NoSQL.
4) Windows Azure storage/Amazon storage.
Couple of things.
1) money is an important factor.
2) windows is preferred environment but linux/apache solutions are ok.
And one more thing: what would Facebook do? Or what does it actually do?
Thanks again.
You should go with a hybrid solution.
Store your actual binary images on the filesystem, but use a database for the image metadata. This gives you an easier medium from which to serve the files, allowing for scalability and potentially faster serving, while still having the speed of a database for searching, filtering, etc.
I have seen various ways of implementing this, but generally they involve a primary key + MIME type + a directory tied to a file name/folder. For example, a photo in the /simon-whitehead/albums/stackoverflow/ directory with the filename 1013.jpg would have something like this as its entry in the database:
Id - 1013
Name - 1013.jpg
AlbumId - (Stackoverflow album id)
UserId - (my user id)
Lat - 37.81
Long - 144.96
Date - 7/10/2013
Mime type - image/jpeg
You may even have a junction table that joins tags to images (for searching). Then, you basically build the response like this:
var file = Path.Combine(GetUser(userId).Name, GetAlbum(albumId).Name, GetImage(imageId).Name);
EDIT: I see you've now added Azure. I will say that one company I worked for used Azure and had fantastic experiences with it. However, I didn't get much chance to have a look at it myself, so I can't give any advice on that.
Don't reinvent the wheel.
Image Resizing for .NET has pretty much everything you could think of: caching, cloud plugins, an API, and a huge community plus associated support.
There are a variety of methods to optimize performance, and it's easy to switch from one provider to another, say from S3 to Azure; take a look at that product (it has a NuGet package) if you have a chance.
If your project is not in Azure, it's better to save your files in the file system.
But if your project is already hosted in Azure, then you'd be better off using blob containers for storing your files.
Comparing where to store the files, in a database or in the file system, the answer is:
it's better to use the file system, because it will be faster and your database won't balloon because of tons of images.
Is it possible to have a Redis server running on two machines, with each server specifying the same snapshot dump file name and directory in its config file, the directory and file obviously being shared between both machines?
RavenDB seems to work fine with that: I can set up the whole server file directory in a Dropbox folder on my machine and do the same on the other machine, with the two Dropbox folders syncing while the RavenDB servers read and write data from/to the database stored within the Dropbox folder.
I understand both DBs' concepts are very different; I just use the RavenDB experience as an example to explain what I'm trying to accomplish. Please note this is just for development purposes, not to run in production.
I am running Redis version 2.4.5 as a Windows service and use BookSleeve as the client within C#/.NET 4.5.
Thanks
Most certainly not. This would be a sure way to ensure a corrupt file.
You might want to watch progress on Redis Cluster (http://redis.io/topics/cluster-spec), currently at the specification stage.
The only time you would use the dump file on a system that does not have persistence enabled is at boot time. However, if persistence is disabled, it doesn't read from the dump file.
Even without server-specific data in the dump file, the possibility of corruption arises at any and every point when both services write to the file. You could set the persistence settings to only save if there have been, say, 59 million changes in 60 seconds. This would allow you to read the file on load but effectively never save to it. You would then need to use
CONFIG SET save ""
to disable saving in both instances. To still be able to save when you want, you would do the above and then issue a BGSAVE command manually.
I also have to advise against doing this over a shared file system, which is what you'll need to do this with multiple machines accessing the same file. In your case you are talking about Dropbox as your shared file system, but this is likely to kill performance if you are persisting to disk.
But ultimately, I'd have to ask why you think you need this?
If you are using one for reads only, then use a slave or two and do reads on the slaves. This way you don't have to worry about multiple instances corrupting a persistence file. This avoids the need for shared storage, as you have two nodes each running with a copy of the data. It provides redundancy, and you can relatively easily set up master/slave failover.
Ultimately, if you are just using it to develop something against, I don't see the need for such a setup. Just store configuration where you can download it (Dropbox, github, etc) and develop away. It isn't difficult, and certainly less complicated, to simply copy your dump file to Dropbox or anywhere else you need it than to do what you describe.
Problem:
I have multiple instances of the same C# application running on different PCs (OS: Windows XP, Windows 7) in the same LAN. I have to share some configuration data among them. Each process must have read-write access to the data. My employer insists on storing these shared data in a file, which is in a shared directory on one of these PCs.
Possible solutions:
Exclusive file opening: The data is stored in a TXT file (serialization to and from a binary file is also an option). Each process uses File.Open with FileShare.None when trying to open the file. Getting an IOException means that the file is already in use, so the process has to wait and try again later (a rough sketch of this retry loop is below).
SQL Server CE embedded DB: The data is stored in an SDF file. The engine can handle at most 256 simultaneous connections (v3.5 SP2), which is more than enough.
SQLite embedded DB: The data is stored in an SQLite DB file. The documentation says SQLite works, but may be unreliable when used on a network share.
Other?
What is the preferred way to do this?
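For reference, a rough sketch of option 1's open-with-retry loop (the retry count and delay are arbitrary; the path would point at the file in the shared directory):

// Sketch of option 1: retry opening the shared file exclusively until it succeeds.
using System.IO;
using System.Threading;

public static class SharedConfigFile
{
    public static FileStream OpenExclusive(string path, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // FileShare.None: any other process that has the file open causes an IOException here
                return File.Open(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
            }
            catch (IOException)
            {
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(200);   // file is in use; wait a bit and try again
            }
        }
    }
}

The caller would wrap the returned stream in a using block and hold it only as long as the read/write takes.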
I don't know if this is the best way, but I did this in C ages ago and it worked well for me.
Each process will read and create a personal copy of the file and then work on that.
At a fixed moment (upon process termination, or triggered via some UI, or whatever you feel like), each process sends its copy of the file to a master process in charge of rebuilding the original file in the shared directory and signaling the other processes that they need to reload.
Each process then reloads the file (containing info coming from all the other processes).
Of course, this solution requires that the file-writing process knows how to rebuild the file and how to resolve conflicts (but this depends on the data format).
You don't really describe the type of data you're working with so I'd say the answer varies.
Using a proper DBMS for this would be best if the data you are working with can generally be considered record/field oriented (and, under rare circumstances, even if it isn't). In this case I would recommend MSSQL CE, since its runtime will mitigate multi-user issues for you.
SQLite was generally considered a single-user/application database (at least back when I used it in C), though things could have changed in the last 5 years. If you're using .NET 4, then from what I've found there are few free adapters available unless you're comfortable with a mixed-framework application.
I would only handle the file locking manually if you're in a situation where the data is pretty flat by design (like a log file), though if it were log-like data I would probably look into how some of the open-source logging libraries do it. You basically said you have control over the data structure, so I'd suggest redesigning the data to be more normalized/rigid and avoiding this solution.
Create a web service and make your programs pull the configuration from there. You can control file locking from inside the web service and not have to deal with that at the program level. This also affords you the abstraction that if you decide to change how the settings are stored (e.g. move them from a file to a database) you can do this without having to make any changes to your program.
A CMS we use called Kentico stores Media Library files on the file system, and also stores a record in the database for file metadata (title, description, etc.). When you use a Media Library control to list those items, it will read the files from the file system to display them. Is it faster to read from the file system than to query the database? Or would it be faster to run a simple query on the media file metadata table in the database?
Assumptions:
Kentico is an ASP.NET application, so the code is in C#. They use simple DataSets for passing their data around.
Only metadata, such as filename and size, would be read directly from the files.
At most 100 files per folder.
The database query would be indexed correctly.
The query would be something like:
SELECT *
FROM Media_File
WHERE FilePath LIKE 'Path/To/Current/Media/Folder/%'
The short answer is, it depends on a number of variable factors, but the file system will generally be faster than a DB.
The longer answer is: scanning the local filesystem at a known location is generally fast, because the resource is close to home and computers are designed to do these operations very efficiently.
HOWEVER, whether it's FASTER than a database depends on the database implementation, where it's located, and how much data we're talking about. On the whole, DBMSes are optimized to store and query large datasets very effectively, while a "flat" filesystem can only scan the drive as fast as the hardware goes. How fast they are depends on the implementation (SQLite isn't going to be as fast overall as MS SQL Server or Oracle), the communication scheme (transferring files over a network is the slowest thing your computer does regularly; by contrast, named pipes provide very fast inter-process communication), and how much hardware you're throwing at it (a quad-Xeon blade server with SATA RAID striping is going to be much faster than your Celeron laptop).
In addition to what others have said here, caching can come into play too depending on your cache settings. Don't forget to take those into account as Kentico, SQL, and IIS all have many different levels of caching and are used at different times depending on your setup, configuration, and which use case(s) you are optimizing.
When it comes to performance issues at this level, the answer is often: it depends. So benchmark your own solution to see which one helps most in your particular users' situational needs.
Kentico did release a couple of performance guides (for 5.0 and another for 5.5) that may help, but they still won't give you a definitive answer until you test it yourself.
I have a C# application that allows one user to enter information about customers and job sites. The information is very basic.
Customer: Name, number, address, email, associated job site.
Job Site: Name, location.
Here are my specs I need for this program.
No limit on amount of data entered.
Single user per application. No concurrent activity or multiple users.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
Allows for user queries to display customers based on different combinations of customer information/job site information.
The data will never be viewed or manipulated outside of the application.
The program will be running almost always, minimized to the task bar.
Startup time is not very important; however, I would like the queries to be considerably fast.
This all seems to point me towards a database, but a very lightweight one. However I also need it to have no limitations as far as data storage. If you agree I should use a database, please let me know what would be best suited for my needs. If you don't think I should use a database, please make some other suggestions on what you think would be best.
My suggestion would be to use SQLite. You can find it here: http://sqlite.org/. And you can find the C# wrapper version here: http://sqlite.phxsoftware.com/
SQLite is very lightweight and has some pretty powerful stuff for such a lightweight engine. Another option you can look into is Microsoft Access.
You're asking the wrong question again :)
The better question is "how do I build an application that lets me change the data storage implementation?"
If you apply the repository pattern and properly interface it, you can build interchangeable persistence layers. So you could start with one implementation and change it as needed without having to re-engineer the business or application layers.
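As a minimal sketch (all the names here are placeholders), the interface could be as small as this:

// Sketch: a storage-agnostic repository. The application layer codes against this
// interface, not against XML / SQLite / SQL Server directly.
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string JobSite { get; set; }
}

public interface ICustomerRepository
{
    Customer GetById(int id);
    IEnumerable<Customer> FindByJobSite(string jobSiteName);
    void Save(Customer customer);
    void Delete(int id);
}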
Once you have a repository interface, you could try implementations with a lot of different approaches:
Flat File - You could persist the data as XML, and provided that it's not a lot of data you could store the full contents in-memory (just read the file at startup, write the file at shutdown). With in-memory XML you can get very high throughput without concern for database indexes, etc.
Distributable DB - SQLite or SQL Compact work great; they offer many DB benefits, and require no installation
Local DB - SQL Express is a good middle-ground between a lightweight and full-featured DB. Access, when used carefully, can suffice. The main benefit is that it's included with MS Office (although not installed by default), and some IT groups are more comfortable having Access installed on machines than SQL Express.
Full DB - MySQL, SQL Server, PostgreSQL, et al.
Given your specific requirements I would advise you towards an XML-based flat file--with the only condition being that you are OK with the memory-usage of the application directly correlating to the size of the file (since your data is text, even with the weight of XML, this would take a lot of entries to become very large).
Here's the pros/cons--listed by your requirements:
Cons
No limit on amount of data entered.
using in-memory XML would mean your application would not scale indefinitely. It could easily handle a 10 MB data file, and 100 MB shouldn't be an issue (unless your system is low on RAM); above that, you have to seriously ask "can I afford this much memory?".
Pros
Single user per application. No concurrent activity or multiple users.
XML can be read into memory and held by the process (AppDomain, really). It's perfectly suited for single-user scenarios where concurrency is a very narrow concern.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
XML is perfect for exporting, and also easy to import to Excel, databases, etc...
Allows for user queries to display customers based on different combinations of customer information/job site information.
Linq-to-XML is your friend :D (a small query sketch follows this list)
The data will never be viewed or manipulated outside of the application.
...then holding it entirely in memory doesn't cause any issues
The program will be running almost always, minimized to the task bar.
so loading the XML at startup and writing it at shutdown will be acceptable (if the file is very large, it could take a while)
Startup time is not very important, however I would like the queries to be considerably fast
Reading the XML will be relatively slow at startup, but once it's loaded in memory it will be hard to beat. Any given DB will require that the DB engine be started, that interop/cross-process/cross-network calls be made, that the results be loaded from disk (if not cached by the engine), etc.
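To illustrate the Linq-to-XML point above (the element names and file name are made up), a query over the in-memory document could look something like this:

// Sketch: load the XML once at startup, query it in memory, save at shutdown.
using System;
using System.Linq;
using System.Xml.Linq;

public static class CustomerQueries
{
    public static void Example()
    {
        XDocument doc = XDocument.Load("customers.xml");   // read once at startup

        var matches = from c in doc.Descendants("Customer")
                      where (string)c.Element("JobSite") == "Riverside Plant"
                      select new
                      {
                          Name = (string)c.Element("Name"),
                          Email = (string)c.Element("Email")
                      };

        foreach (var m in matches)
            Console.WriteLine(m.Name + " - " + m.Email);

        doc.Save("customers.xml");   // write back at shutdown (after any in-memory edits)
    }
}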
It sounds to me like a database is 100% what you need. It offers both the data storage, data retrieval (including queries) and the ability to export data to a standard format (either direct from the database, or through your application.)
For a light database, I suggest SQLite (pronounced 'SQL Lite' ;) ). You can google for tutorials on how to set it up, and then how to interface with it via your C# code. I also found a reference to this C# wrapper for SQLite, which may be able to do much of the work for you!
How about SQLite? It sounds like it is a good fit for your application.
You can use System.Data.SQLite as the .NET wrapper.
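For what it's worth, basic usage with System.Data.SQLite looks roughly like this (the file name and schema are just examples):

// Sketch: open (or create) a local SQLite file, create a table, insert and query.
// Requires the System.Data.SQLite assembly.
using System;
using System.Data.SQLite;

class SQLiteDemo
{
    static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=customers.db"))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "CREATE TABLE IF NOT EXISTS Customer " +
                                  "(Id INTEGER PRIMARY KEY, Name TEXT, JobSite TEXT)";
                cmd.ExecuteNonQuery();

                cmd.CommandText = "INSERT INTO Customer (Name, JobSite) VALUES (@name, @site)";
                cmd.Parameters.AddWithValue("@name", "Acme Corp");
                cmd.Parameters.AddWithValue("@site", "Riverside Plant");
                cmd.ExecuteNonQuery();

                cmd.Parameters.Clear();
                cmd.CommandText = "SELECT Name FROM Customer WHERE JobSite = @site";
                cmd.Parameters.AddWithValue("@site", "Riverside Plant");
                Console.WriteLine(cmd.ExecuteScalar());
            }
        }
    }
}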
You can get SQL Server Express for free. I would say the question is not so much why should you use a database, more why shouldn't you? This type of problem is exactly what databases are for, and SQL Server is a very powerful and widely used database, so if you are going to go for some other solution you need to provide a good reason why you wouldn't go with a database.
A database would be a good fit. SQLite is good as others have mentioned.
You could also use a local instance of SQL Server Express to take advantage of improved integration with other pieces of the Microsoft development stack (since you mention C#).
A third option is a document database like Raven which may fit from the sounds of your data.
edit
A fourth option would be to try Lightswitch when the beta comes out in a few days. (8-23-2010)
/edit
There is always going to be a limit on data storage (the empty space of the hard disk). According to Wikipedia, SQL Server Express 2008 R2 is limited to 10 GB per database.