.NET Data Storage - Database vs single file

.NET Data Storage - Database vs single file - c#

I have a C# application that allows one user to enter information about customers and job sites. The information is very basic.
Customer: Name, number, address, email, associated job site.
Job Site: Name, location.
Here are my specs I need for this program.
No limit on amount of data entered.
Single user per application. No concurrent activity or multiple users.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
Allows for user queries to display customers based on different combinations of customer information/job site information.
The data will never be viewed or manipulated outside of the application.
The program will be running almost always, minimized to the task bar.
Startup time is not very important, however I would like the queries to be considerably fast.
This all seems to point me towards a database, but a very lightweight one. However I also need it to have no limitations as far as data storage. If you agree I should use a database, please let me know what would be best suited for my needs. If you don't think I should use a database, please make some other suggestions on what you think would be best.

My suggestion would be to use SQLite. You can find it here: http://sqlite.org/. And you can find the C# wrapper version here: http://sqlite.phxsoftware.com/
SQLite is very lightweight and has some pretty powerful stuff for such a lightweight engine. Another option you can look into is Microsoft Access.

You're asking the wrong question again :)
The better question is "how do I build an application that lets me change the data storage implementation?"
If you apply the repository pattern and properly interface it you can build interchangable persistence layers. So you could start with one implementation and change it as-needed wihtout needing to re-engineer the business or application layers.
Once you have a repository interface you could try implementations in a lot of differnt approaches:
Flat File - You could persist the data as XML, and provided that it's not a lot of data you could store the full contents in-memory (just read the file at startup, write the file at shutdown). With in-memory XML you can get very high throughput without concern for database indexes, etc.
Distributable DB - SQLite or SQL Compact work great; they offer many DB benefits, and require no installation
Local DB - SQL Express is a good middle-ground between a lightweight and full-featured DB. Access, when used carefully, can suffice. The main benefit is that it's included with MS Office (although not installed by default), and some IT groups are more comfortable having Access installed on machines than SQL Express.
Full DB - MySql, SQL Server, PostGreSQL, et al.
Given your specific requirements I would advise you towards an XML-based flat file--with the only condition being that you are OK with the memory-usage of the application directly correlating to the size of the file (since your data is text, even with the weight of XML, this would take a lot of entries to become very large).
Here's the pros/cons--listed by your requirements:
Cons
No limit on amount of data entered.
using in-memory XML would mean your application would not scale. It could easily handle a 10MB data-file, 100MB shouldn't be an issue (unless your system is low on RAM), above that you have to seriously question "can I afford this much memory?".
Pros
Single user per application. No concurrent activity or multiple users.
XML can be read into memory and held by the process (AppDomain, really). It's perfectly suited for single-user scenarios where concurrency is a very narrow concern.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
XML is perfect for exporting, and also easy to import to Excel, databases, etc...
Allows for user queries to display customers based on different combinations of customer information/job site information.
Linq-to-XML is your friend :D
The data will never be viewed or manipulated outside of the application.
....then holding it entirely in-memory doesn't cause any issues
The program will be running almost always, minimized to the task bar.
so loading the XML at startup, and writing at shutdown will be acceptible (if the file is very large it could take a while)
Startup time is not very important, however I would like the queries to be considerably fast
Reading the XML would be relatively slow at startup; but when it's loaded in-memory it will be hard to beat. Any given DB will require that the DB engine be started, that interop/cross-process/cross-network calls be made, that the results be loaded from disk (if not cached by the engine), etc...

It sounds to me like a database is 100% what you need. It offers both the data storage, data retrieval (including queries) and the ability to export data to a standard format (either direct from the database, or through your application.)
For a light database, I suggest SQLite (pronounced 'SQL Lite' ;) ). You can google for tutorials on how to set it up, and then how to interface with it via your C# code. I also found a reference to this C# wrapper for SQLite, which may be able to do much of the work for you!

How about SQLite? It sounds like it is a good fit for your application.
You can use System.Data.SQLite as the .NET wrapper.

You can get SQL Server Express for free. I would say the question is not so much why should you use a database, more why shouldn't you? This type of problem is exactly what databases are for, and SQL Server is a very powerful and widely used database, so if you are going to go for some other solution you need to provide a good reason why you wouldn't go with a database.

A database would be a good fit. SQLite is good as others have mentioned.
You could also use a local instance of SQL Server Express to take advantage of improved integration with other pieces of the Microsoft development stack (since you mention C#).
A third option is a document database like Raven which may fit from the sounds of your data.
edit
A fourth option would be to try Lightswitch when the beta comes out in a few days. (8-23-2010)
/edit
There is always going to be a limitation on data storage (the empty space of the hard disk). According to wikipedia, SQL Express is limited to 10 GB for SQL Server Express 2008 R2

Related

Correct solution for persistent table/grid in C# that does not require a full database solution?

My WinForms C#/.NET application requires a table/grid control to display records to the end user. The records will be simple, containing only two fields, a string and a date/time field. I need to persist the data and I am wondering what the most efficient control and storage back-end is to use. The data is non-critical (i.e. - not health or financial records, or anything sensitive requiring extensive safety or any encryption).
One solution I have found so far is the DataGrid control in conjunction with SQL Server Compact Edition. I learned about this solution from this tutorial:
http://www.dotnetperls.com/datagridview-tutorial
It seems though that this may be overkill for my application. In addition, I am worried about the complexities of installing SQL Server CE, especially when it comes to admin vs. user account privilege issues during installation:
http://msdn.microsoft.com/en-us/library/aa983326(v=vs.80).aspx
Is there a table or grid control with built-in file load/save capabilities that uses a simple disk file as the storage method, perhaps a comma delimited ASCII file? I'd like something that I can still use SQL (via LINQ) to interface with. also, I am hoping that this can be done transparently. That is, if I want to upgrade to an SQL database engine solution later, the code from my end that interfaces with the data would not change (except perhaps for the database open/create code of course).
Or am I better off simply biting the bullet and going with SQL Server CE or perhaps SQLite:
Good embedded database solution (like SQLite) for .Net
If you have any caveats or anecdotes regarding installation issues and ease of use, they would be appreciated.

In my projects, we use Object datasources. Grid's can be bound to collections of objects just as easily as they can dataTables. You can store/restore the data using a simple serialization engine (XmlSerializer is rather easy to implement). Make a basic object, use List or BindingList as the dataset, and serialize/de-serialize it in the backEnd when you need it.
List and BindingList both support Linq queries.
Adding database save later is as simple as writing the code that saves the object to the database, in place of the serialization code, no change to the front end at all.
As far as a "Correct" solution is concerned...there are so many different ways to do it that it boils down to personal preference, and possibly actual requirements and expected future development. I find it easier to code using objects because the data manipulation is easier, but if you are going for straight record entry, no data manipulation required, going direct to a database is easier. It just depends on the data and what you plan on doing with it.

I strongly recommend you to use an embedded database, because it will be easier to go to a full database in a near future. SQL Server CE is a good option, and if you want to go big you can simply go to a full SQL Server Database with minimal changes in your code, the only downside of SQL Server CE is that you need to install it and it requires the .NET Framework 4, aside from that I don't see a big problem with it.

What is a good way to store a large amount of data in an offline, single-client application?

All other applications I've written have been network/web apps where I've used an SQL database to store data and that has made sense to me.
If I'm creating an application that does not need to be networked ever, is there a standard way to store this data permanently? A text file is possible, but doesn't give me the benefits of querying an SQL server nor is it very secure. Is there something similar to an SQL server that I can initiate and save on starting up my program?
Perhaps there is some structure I've never come across?
From what I've read I might be able to do something as mentioned above like with SQLite. Does that make sense for large and distributed applications?
Thanks in advance for any clarifications on how to design these types of programs.
Edit: to clarify what #TomTom was saying, it is not a large amount of data like he is suggesting. I would be surprised if it ever got over several gigs of data. My point in saying large was that it seemed unreasonable to create some sort of a data structure that I could save into a text file, load up/search through to grab my data compared to using an SQL-like database.
Reading through the answers apparently SQLite or something similar is reasonable to use for desktop applications. I'll continue looking into it and probably use it to track data for my next project.

You can use an embedded database - this can be a SQL database, but does not have to be.
For windows, look at SQLite, SQL Server Compact and RavenDB (for a non SQL, document database).

You could still use SQL database, but locally. Try SQLite.
Other option to use Windows built-in database engine which name is Esent. Fast, but not really convenient to use its Api.

How to transfer data (Entities) from one database to another

I have a system (using Entity Framework) that is deployed in various production systems and also on a quality control system. My problem is that data entry is often done only on one of those occurrences of my system (different databases).
I want to find the best way to transfer my data from one database to another database. Ids can change, as long as the relations between my objects are maintained. 98% of my data in in DB, some of it is external files, I can manage those separately, manually.
Currently we use a xml structure as a transition file. The file is then imported in the destination system, and code manually imports the entities and re-creates the data.
I'm looking for a more generic way to do this, with less code. Since all my data in stored in Entities couldn't I simply create a big List and throw all my objects in there, then serialize that in some matter into an external file and finally generically import all the entities in there in my destination system? (I'll probably have to be careful in maintaining relation ids, but should be ok...)
Anyways I'm wondering if anyone would have smart approaches, I'm pretty sure I,m not the first with a similar problem.
Thanks!

You need to get some process around this. If all environments contain the same data (unlikely) you can replicate. It is the most automatic. But a QA environ should not update production, so you have to really think this through.
If semi-automated is okay, there are tools out there you can use from a variety of vendors. I use Red Gate tools, personally, but others are also fine.
Can you set up a more automated push with EF? Sure, but the amount of time you spend is really not worth it.

In my opinion you can check some of the following approaches:
1) Use Sql Compare or Sql Data Compare. Those tools are from Red Gate and can be found here
2) Regular backups and restores of the databases. You could, if it is an option regularly backup your most up-to-date database and restore it on the destination systems. I have no experience in automatizing this but here is a link to do that through .net.
3) You could always give it a go creating a version control system of your own. I would picture one such system selecting all records from a certain table (or all of them), deleting all records in the target database and inserting them. This seems pretty complex though, as you have to worry about relationships, data dependencies, etc.
Hope this helps in some way.
Regards

If you for some reason will not be satisfied with existing tools may be you'll want take a look at the Sync Framework and implement this functionality yourself for your very particular data bases.

Given what you described, pushing data from One SQL Server to another for demo purposes, you should consider SQL Server Integration Services.
If you're got a simple scenario where you just move the data and objects from DB to the next you can use their built-in Wizards. If you need to do custom stuff you can build complex workflows using C# and SQL (tools you already know). Note: most of what you're going to want comes with the standard edition so if you're using express this is less interesting.
The story for Red Gate products is more compelling when you don't have SQL Server (So you have to go out and buy something) and if you are interested in finding out what the changes are between DB's (like viewing code changes in a .cs file in a source control product)

File system based reads vs. a simple database query?

A CMS we use called Kentico stores Media Library Files on the file system, and also stores a record in the database for file meta data (title, description, etc.). When you use a Media Library control to list those items, it will read the files from the file system to display them. Is it faster to read from the file system then to query the database? Or would it be faster to run a simple query on the media file meta data database table?
Assumptions:
Kentico is an ASP.NET application, so the code is in C#. They use simple DataSets for passing their data around.
Only meta data would be read from the direct files like filename and size.
At most 100 files per folder.
The database query would be indexed correctly.
The query would be something like:
SELECT *
FROM Media_File
WHERE FilePath LIKE 'Path/To/Current/Media/Folder/%'

The short answer is, it depends on a number of variable factors, but the file system will generally be faster than a DB.
The longer answer is: scanning the local filesystem at a known location is generally fast, because the resource is close to home and computers are designed to do these operations very efficiently.
HOWEVER, whether it's FASTER than a database depends on the database implementation, where it's located, and how much data we're talking about. On the whole, DBMSes are optimized to very effectively store and query large datasets, while a "flat" filesystem can only scan the drive as fast as the hardware goes. How fast they are depends on the implementation (SqLite isn't going to be as fast overall as MS Sql Server or Oracle), the communication scheme (transferring files over a network is the slowest thing your computer does regularly; by contrast, named pipes provide very fast inter-process communication), and how much hardware you're throwing at it (a quad-Xeon blade server with SATA-RAID striping is going to be much faster than your Celeron laptop).

In addition to what others have said here, caching can come into play too depending on your cache settings. Don't forget to take those into account as Kentico, SQL, and IIS all have many different levels of caching and are used at different times depending on your setup, configuration, and which use case(s) you are optimizing.
When it comes to performance issues at this level, the answer is often: it depends. So benchmark your own solution to see which one helps most in your particular users' situational needs.
Kentico did release a couple of performance guides (for 5.0 and another for 5.5) that may help, but they still won't give you a definitive answer until you test it yourself.

Any considerations before jumping into SQLite?

I have a WCF application that at present is using XML based file storage to store data that gets used to generate reports. Besides this processing decisions are made based on information stored in these XML files.
I'm now hitting volumes of around 30 000 text files. This is incredibly taxing, and the application at times comes to a grinding halt.
I've always wanted to swop out the XML DAL in favor of an RDBMS, but project managers simply won't allow it. But they would be willing to look at a serverless solution for example SQLLite. I am really tempted to just dive right in and start using it as a replacement DAL (Data Access Layer).
I would need no more than around 20 tables in the whole solution, and I would expect to get no more than around 20 000 - 100 000 transactions a day, however this is extreme, the real volumes would be less than this in most cases.
Update
I am not expecting a great deal of simultaneous connections, when I say transactions, I essentially mean 1 or 2 clients that make calls and execute against the database in order. At times there might be a possibility of external clients making quick calls to the DB. But the bulk of DB connections will be done by my WCF service, which is a back end scheduled task, not serving 100's of people across an organization.
Another good point is that I only need to retain data for 90 days, so the DB shouldn't grow too big.
My main concerns are:
How reliable is SQLLite? What if the DB File gets corrupted, will I loose all processing Data. How easy is the DB to back up? Will it handle my volumes? And lastly how well does the .net provider work (located here: http://sourceforge.net/projects/sqlite-dotnet2/).
If you have any experience with SQLLite, please post your experiences so I can make aan informed decision to switch or not.
Thanks in advance...

SQLite is as reliable as your OS and hardware.
Its transactional rate is similar to SQL server, and often faster because it's all in process.
The .NET ADO provider works great.
To back up the DB, stop the service and copy the file. If the journal file is present copy it too.
EDIT: SQLite uses UTF-8 by default so with the ADO-NET provider you should be able to avoid losing accents (just so long as you follow the typical XML in string rules).

You could consider Microsoft's Sql Compact Edition.
It's like sqlite, in terms of being a single file embedded database, but has better integration with the .net framework :)
SQLite seems reliable, and even with Microsoft's one, don't expect to receive much support in case of a corrupted database.

Given your transaction volume I'd say the fact that the DB itself is a single monolithic file with only file system locking available could be a problem.
There is no row based locking as far as I know.

I used SQLite with the .Net provider without problems in a monouser enviroment, except for one concern: accents, wich don't showed correcly. The backup is quite simply: the SQLite database is an plain text file. Simply copy it.

I use Sqlite for storing XML config data and have had no problems with it. I use the System.Data.Sqlite provider: http://sqlite.phxsoftware.com/. It's solid and has a good support forum. It also includes a LINQ provider. It also integrates with VS 2008 so you can use Server Explorer to query tables. The examples and documentation also show how to use parameterized commands and transactions for increased performance.
The release candidate for LinqPad now supports Sqlite: http://www.linqpad.net/Beta.aspx.
Sqlite stores everything in a single file, which can be backed up like any other binary file.
Sqlite only supports file-level locking, but shouldn't present a performance problem since it doesn't sound like you'll have a large number of simultaneous transactions.
Unicode shouldn't be a problem. This link in the forum addresses an area where someone was trying to read unicode characters with an incompatible utility http://sqlite.phxsoftware.com/forums/t/954.aspx.
This site shows how to do case-insenitive UTF8 comparisons using System.Data.Sqlite via a custom collator, with Russian characters as an example: http://www.codeproject.com/KB/database/SQLiteUTF8CIComparison.aspx.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.