I am using an XML file as a database for an application currently in development.
The XML file is going to be modified by multiple users over the network. (Not on a server per se, but on my computer, which they can access over the network.)
I kind of know it is a bad idea to use XML for this, but the structure of XML is much better/cleaner/something I like.
I'm wondering what my options are. Would I be able to continue with the XML file through some custom background connection that verifies everything necessary to let me read from and write to it without issues?
Or am I stuck with using some SQL type of database? If so, is there some sort of database that is somewhat similar to XML?
EDIT: Reason for liking xml.
Grouped easily for the eyes.
<SomeDocument name="Something">
<URL>bbbb</URL>
<Something>2342</Something>
<Something_That_would_of_been_in_another_database>derp</...>
rather than linking 3-4 tables together...
There are some examples of XML-based databases that support multi-user environments. One is the OneNote Revision File Format used by Microsoft OneNote. Although it is documented in great detail, supporting multiple users editing a single file is tremendously complicated. Basically, one could argue that XML-based storage is not a viable option when you need multi-user support.
If you are stuck with the XML file you could look into the OneNote file format, but it isn't a traditional XML format, since it also uses a "binary wrapper": the actual content is defined as XML data within the binary file, but transactions/revisions/free chunks are represented in binary. This is necessary because you have to allocate specific portions of the file for users to write to while the file is open.
If you don't want to use a dedicated server software, you could use various file-based databases like SQL CE or SQLite.
You would need to deal with concurrency issues if you used a file that several users had access to. Guarantees need to be made for one user not overwriting another user's changes made around the same time.
My suggestion is to use a proper database (e.g. SQL Server) that will handle these issues for you.
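To make the "one user overwriting another user's changes" problem concrete, here is a minimal sketch of one common guard, optimistic concurrency with a version column. It is written in Python with the standard `sqlite3` module purely for illustration; the `documents` table, its columns, and the `save_document` helper are assumptions made up for this example, not something from the question.

```python
import sqlite3


def save_document(conn, doc_id, new_body, expected_version):
    """Write new_body only if the row still has the version we read earlier.

    Returns True if the write happened, False if someone else saved first.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE documents SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, doc_id, expected_version),
        )
    return cur.rowcount == 1


# Hypothetical schema: one row per document plus a version counter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, body TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO documents VALUES (1, 'original', 1)")
conn.commit()

# Two users both read version 1, then both try to save their edit.
first_ok = save_document(conn, 1, "edit by user A", expected_version=1)
second_ok = save_document(conn, 1, "edit by user B", expected_version=1)
```

The second save is rejected instead of silently replacing the first, so the application can reload the row and ask the user what to do. A full database server gives you more than this (row locks, isolation levels), but the idea is the same.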
I am not familiar with the C# solutions, but for our Java application we use eXist-db and query it with XQuery. I'm not too familiar with it, but some use MarkLogic. Still more use Berkeley DB.
Whether to use a native XML database, an XML-enabled database, a so-called NoSQL database, or any of the more traditional methods depends on multiple factors. Just to mention two:
Most importantly, do you have your data in XML, and do you want to keep it that way? If so, use an XML-enabled solution.
Do you need scalability or performance? If so, you will need a solution that can deal with that. There are lots of NoSQL and XML databases that are well capable of handling that.
As for concurrency: any database should deal with that natively.
A number of databases have been mentioned already. To single out a few: MarkLogic Server ( www.marklogic.com ) is built to scale and perform up to terabyte scale (and beyond), and has connectors for Java and .NET, among others. The solution from 28msec ( www.28msec.com based on Zorba) runs in the cloud, and should scale too.
But most interesting to mention here is that these databases are often used through HTTP / REST interfaces. That allows easy integration from any programming language, and makes interchanging easier too.
Related
I would like to handle XML data in an ActiveRecord way: one class for each XML structure (I will need an XSD, obviously) and the ability to do operations like Users.FindAll(), as Castle ActiveRecord does.
The problem is, obviously, that those are XML files, not relational databases.
Is there any library to achieve this? A Microsoft library would be better than a third-party one.
To explain why I would like to achieve this, I'll describe the program I'm building, so you can give me suggestions if a different approach is better:
The program's "output" will be something like a long MS Word (or PDF) document containing information about how a company handles the privacy of its customers, following the local legislation.
So I will have a "global" XML file that contains something like Jobs (as defined in law, but the law can change, so they should be editable by the user) that each employee can have in their company (there will be other data too; this is a generic example).
Then I will have an XML file for each company the user would like to use this program for. This XML file will have a list of employees, where each employee has a reference to a Job (chosen from the global XML file).
Obviously the program will have much more data, but this explains how it works.
I'm still not sure whether I must use a relational database. What really frightens me about using one is that I will have trouble letting the user export/import data if they install the program on a new computer. I would also like to avoid forcing the user to install a database on their computer (well, an SQLite-like database could be OK because it lives in a single file).
Any suggestion about this?
Thanks to everyone
Although Linq-to-XML is pretty easy to use, there are many more things to do when it comes to reading and storing related data in a way a RDBMS does. An RDBMS is all about referential integrity, ACID transactions, concurrent users, performance enhancements, to name a few elements that spring to my mind now. Thinking of this daunting task, I think doing this all by yourself is more scary than deploying a database file.
There are some XML-based databases, but I don't know how mature and user friendly they are. I even remember having read of database systems based on plain text files.
I would go for the paved roads and use a relational database, possibly a local database, as you already suggested. Lots of support and tooling available.
How about using LINQ to XML?
I have a C# application that allows one user to enter information about customers and job sites. The information is very basic.
Customer: Name, number, address, email, associated job site.
Job Site: Name, location.
Here are my specs I need for this program.
No limit on amount of data entered.
Single user per application. No concurrent activity or multiple users.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
Allows for user queries to display customers based on different combinations of customer information/job site information.
The data will never be viewed or manipulated outside of the application.
The program will be running almost always, minimized to the task bar.
Startup time is not very important, however I would like the queries to be considerably fast.
This all seems to point me towards a database, but a very lightweight one. However I also need it to have no limitations as far as data storage. If you agree I should use a database, please let me know what would be best suited for my needs. If you don't think I should use a database, please make some other suggestions on what you think would be best.
My suggestion would be to use SQLite. You can find it here: http://sqlite.org/. And you can find the C# wrapper version here: http://sqlite.phxsoftware.com/
SQLite is very lightweight and has some pretty powerful stuff for such a lightweight engine. Another option you can look into is Microsoft Access.
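As a rough illustration of how the customer/job site data from the question could sit in SQLite, here is a minimal sketch. It uses Python's standard `sqlite3` module rather than the C# wrapper so that it stays short; the table names, column names, sample rows, and the `find_customers` helper are all assumptions invented for the example.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_site (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT
    );
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        number TEXT,
        address TEXT,
        email TEXT,
        job_site_id INTEGER REFERENCES job_site(id)
    );
    CREATE INDEX idx_customer_job_site ON customer(job_site_id);
    """
)
conn.executemany(
    "INSERT INTO job_site (id, name, location) VALUES (?, ?, ?)",
    [(1, "Harbor Depot", "Springfield"), (2, "North Yard", "Shelbyville")],
)
conn.executemany(
    "INSERT INTO customer (name, email, job_site_id) VALUES (?, ?, ?)",
    [
        ("Alice Smith", "alice@example.com", 1),
        ("Bob Jones", "bob@example.com", 2),
        ("Carol Smith", "carol@example.com", 2),
    ],
)
conn.commit()


def find_customers(conn, name_contains=None, location=None):
    """Filter on any combination of customer and job site fields."""
    sql = (
        "SELECT c.name, c.email, j.name FROM customer c "
        "LEFT JOIN job_site j ON j.id = c.job_site_id WHERE 1 = 1"
    )
    params = []
    if name_contains is not None:
        sql += " AND c.name LIKE ?"
        params.append(f"%{name_contains}%")
    if location is not None:
        sql += " AND j.location = ?"
        params.append(location)
    sql += " ORDER BY c.name"
    return conn.execute(sql, params).fetchall()


smiths_in_shelbyville = find_customers(
    conn, name_contains="Smith", location="Shelbyville"
)
```

The optional filters are appended as parameterized conditions, which covers the "different combinations of customer information/job site information" requirement without building SQL from raw user input.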
You're asking the wrong question again :)
The better question is "how do I build an application that lets me change the data storage implementation?"
If you apply the repository pattern and properly interface it, you can build interchangeable persistence layers. So you could start with one implementation and change it as needed without having to re-engineer the business or application layers.
Once you have a repository interface you could try implementations using a lot of different approaches:
Flat File - You could persist the data as XML, and provided that it's not a lot of data you could store the full contents in-memory (just read the file at startup, write the file at shutdown). With in-memory XML you can get very high throughput without concern for database indexes, etc.
Distributable DB - SQLite or SQL Compact work great; they offer many DB benefits, and require no installation
Local DB - SQL Express is a good middle-ground between a lightweight and full-featured DB. Access, when used carefully, can suffice. The main benefit is that it's included with MS Office (although not installed by default), and some IT groups are more comfortable having Access installed on machines than SQL Express.
Full DB - MySQL, SQL Server, PostgreSQL, et al.
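A minimal sketch of what such a repository interface might look like, with two of the implementations from the list above behind it. It is in Python for brevity (a C# version would use an interface and two classes in the same shape); `Customer`, `CustomerRepository`, and both implementations are hypothetical names made up for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    email: str


class CustomerRepository(ABC):
    """The only thing the business/application layers are allowed to see."""

    @abstractmethod
    def add(self, customer: Customer) -> None: ...

    @abstractmethod
    def find_by_name(self, fragment: str) -> list[Customer]: ...


class XmlCustomerRepository(CustomerRepository):
    """Flat file option: the whole document is held in memory."""

    def __init__(self, xml_text="<customers/>"):
        self._root = ET.fromstring(xml_text)  # would be read at startup

    def add(self, customer):
        ET.SubElement(
            self._root, "customer", name=customer.name, email=customer.email
        )

    def find_by_name(self, fragment):
        return [
            Customer(e.get("name"), e.get("email"))
            for e in self._root.iter("customer")
            if fragment.lower() in e.get("name").lower()
        ]

    def to_xml(self):
        return ET.tostring(self._root, encoding="unicode")  # written at shutdown


class SqliteCustomerRepository(CustomerRepository):
    """Distributable DB option: same interface, different storage."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer (name TEXT, email TEXT)"
        )

    def add(self, customer):
        with self._conn:
            self._conn.execute(
                "INSERT INTO customer VALUES (?, ?)",
                (customer.name, customer.email),
            )

    def find_by_name(self, fragment):
        rows = self._conn.execute(
            "SELECT name, email FROM customer WHERE name LIKE ? ORDER BY rowid",
            (f"%{fragment}%",),
        )
        return [Customer(*row) for row in rows]


def register_and_search(repo: CustomerRepository):
    """Application code: unaware of which storage sits behind the interface."""
    repo.add(Customer("Alice Smith", "alice@example.com"))
    repo.add(Customer("Bob Jones", "bob@example.com"))
    return repo.find_by_name("smith")
```

`register_and_search` gives the same result with either repository, which is the point: the storage choice becomes a one-line change where the repository is constructed.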
Given your specific requirements I would advise you towards an XML-based flat file--with the only condition being that you are OK with the memory-usage of the application directly correlating to the size of the file (since your data is text, even with the weight of XML, this would take a lot of entries to become very large).
Here's the pros/cons--listed by your requirements:
Cons
No limit on amount of data entered.
using in-memory XML would mean your application would not scale. It could easily handle a 10MB data-file, 100MB shouldn't be an issue (unless your system is low on RAM), above that you have to seriously question "can I afford this much memory?".
Pros
Single user per application. No concurrent activity or multiple users.
XML can be read into memory and held by the process (AppDomain, really). It's perfectly suited for single-user scenarios where concurrency is a very narrow concern.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
XML is perfect for exporting, and also easy to import to Excel, databases, etc...
Allows for user queries to display customers based on different combinations of customer information/job site information.
Linq-to-XML is your friend :D
The data will never be viewed or manipulated outside of the application.
....then holding it entirely in-memory doesn't cause any issues
The program will be running almost always, minimized to the task bar.
so loading the XML at startup, and writing at shutdown will be acceptable (if the file is very large it could take a while)
Startup time is not very important, however I would like the queries to be considerably fast
Reading the XML would be relatively slow at startup; but when it's loaded in-memory it will be hard to beat. Any given DB will require that the DB engine be started, that interop/cross-process/cross-network calls be made, that the results be loaded from disk (if not cached by the engine), etc...
It sounds to me like a database is 100% what you need. It offers both the data storage, data retrieval (including queries) and the ability to export data to a standard format (either direct from the database, or through your application.)
For a light database, I suggest SQLite (pronounced 'SQL Lite' ;) ). You can google for tutorials on how to set it up, and then how to interface with it via your C# code. I also found a reference to this C# wrapper for SQLite, which may be able to do much of the work for you!
How about SQLite? It sounds like it is a good fit for your application.
You can use System.Data.SQLite as the .NET wrapper.
You can get SQL Server Express for free. I would say the question is not so much why should you use a database, more why shouldn't you? This type of problem is exactly what databases are for, and SQL Server is a very powerful and widely used database, so if you are going to go for some other solution you need to provide a good reason why you wouldn't go with a database.
A database would be a good fit. SQLite is good as others have mentioned.
You could also use a local instance of SQL Server Express to take advantage of improved integration with other pieces of the Microsoft development stack (since you mention C#).
A third option is a document database like Raven which may fit from the sounds of your data.
edit
A fourth option would be to try Lightswitch when the beta comes out in a few days. (8-23-2010)
/edit
There is always going to be a limitation on data storage (the empty space of the hard disk). According to wikipedia, SQL Express is limited to 10 GB for SQL Server Express 2008 R2
I am writing a small program for our local high school (pro bono). The program has an interface that allows the user to enter school holidays. This is a simple standalone Windows app.
What format should I use to store the data? A big relational database is obviously overkill.
My initial plan was to store the data in an XML file. Co-workers have been suggesting that I use JSON files, Access databases, SQLite, and SQL Server Express. There was even a suggestion of old-school INI files.
Projects like this have a habit of getting bigger, quickly, and if they do your XML file will become complex and a burden to manage.
I would not recommend storing the data in an xml file or json - they are just text files by a different name, all suffering from the same problem - you don't have any control over who edits them.
Use some kind of DB, starting with the small ones first (Access, SQLite).
Edit
Based on your latest comments, roll forward to a point where the users have been using the app for two years.
How much data do you expect to have stored by then?
Will the user(s) need to look back through historic data to see, for example, what they did this time last year?
And more so, right now
What is Teacher A doing on Thursday afternoon?
Will Teacher B be free to attend event on 15th May 2010?
Can Student C attend event D?
All of these questions/problems are a lot easier/more efficient to handle with SQL. Plus your resulting codebase will make a lot more sense. Traversing XML isn't the prettiest of things to do.
Plus, if your user base is already familiar with Excel, linking Excel to a SQL database (and producing custom results) is a lot easier than doing the same with XML.
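For a feel of how short those questions become in SQL, here is a small sketch using Python's standard `sqlite3` module. The `commitment` table, its columns, and the sample rows are assumptions invented for the example; dates are stored as ISO `YYYY-MM-DD` text so that plain string comparison sorts them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE commitment (
        teacher TEXT NOT NULL,
        title   TEXT NOT NULL,
        day     TEXT NOT NULL  -- ISO date, e.g. 2010-05-15
    );
    INSERT INTO commitment VALUES
        ('Teacher A', 'Sports day',    '2010-05-13'),
        ('Teacher A', 'Staff meeting', '2010-05-15'),
        ('Teacher B', 'Field trip',    '2010-05-14');
    """
)


def is_free(conn, teacher, day):
    """True if the teacher has nothing recorded on that day."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM commitment WHERE teacher = ? AND day = ?",
        (teacher, day),
    ).fetchone()
    return count == 0


def commitments_between(conn, start, end):
    """Everything scheduled in an inclusive date range, oldest first."""
    return conn.execute(
        "SELECT teacher, title, day FROM commitment "
        "WHERE day BETWEEN ? AND ? ORDER BY day, teacher",
        (start, end),
    ).fetchall()


teacher_b_free = is_free(conn, "Teacher B", "2010-05-15")
first_half_of_may = commitments_between(conn, "2010-05-01", "2010-05-14")
```

Looking back at "this time last year" is the same range query with different dates.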
Have you considered using SQLite? It'll result in a small .s3db file. SQLite is used by all kinds of desktop applications for local storage.
There's a SQLite .NET library that'll allow you to use ADO.NET to CRUD your data.
Check out Mike Duncan's article on how to get started with SQLite in .NET.
I would have to second the JSON answer and SQLite.
Another option would be to use ESENT, the database engine built into every version of Windows since Windows 2000. There is a CodePlex project that makes it easy to work with: http://managedesent.codeplex.com/
Hm, obviously SQL Express is a full-blown database. On the other hand, it may make sense. Why NOT use a database if they already have one? ;)
Otherwise I would possibly go with a XML file.
I would recommend an XML file and a typed dataset.
You will need to figure out where to put the XML file.
Note that if you ever want to allow multiple users to use it, you need to use a database, such as Access or SQL Server.
I'd go for
SQL Server Express (free and full blown relational database)
XML/JSON if you don't want a database (LINQ to XML might be a big help here).
XML files seem to be a good option. Using LINQ to XML, it should be quite easy to read/write the files from/to objects. Check out the XDocument and XElement classes.
Access databases, SQLite, and SQL Server Express seem like overkill. Do you really need a database to store simple calendar data? Is your application going to grow?
From "simple stand alone Windows app" it sounds to me like it's not critical. I'd go with whatever is easiest, or what you are most comfortable with. XML is usefully human readable, but a sensibly formatted flat file might be just as sensible.
I'd use serialized classes.
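A minimal sketch of the serialized-classes idea for the school holidays case, written in Python with the standard `json` and `dataclasses` modules (in .NET the equivalent would be a serializer over a plain class). The `Holiday` class, its fields, and the helper names are assumptions made up for this example.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Holiday:
    name: str
    start: date
    end: date


def dumps_holidays(holidays):
    """Serialize a list of Holiday objects to a JSON string."""
    return json.dumps(
        [
            {"name": h.name, "start": h.start.isoformat(), "end": h.end.isoformat()}
            for h in holidays
        ],
        indent=2,
    )


def loads_holidays(text):
    """Rebuild the Holiday objects from the JSON produced above."""
    return [
        Holiday(d["name"], date.fromisoformat(d["start"]), date.fromisoformat(d["end"]))
        for d in json.loads(text)
    ]


holidays = [
    Holiday("Spring break", date(2010, 3, 29), date(2010, 4, 2)),
    Holiday("Memorial Day", date(2010, 5, 31), date(2010, 5, 31)),
]
round_tripped = loads_holidays(dumps_holidays(holidays))
```

Writing the string to a file on save and reading it back on startup is all the storage code a list this small needs.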
One thing to consider if you use a database instead of a text file of some sort: you're no longer a "simple stand alone Windows app". Now you've (most likely) got an installation program to write.
I'm writing a utility program with C# in WPF that allows users to create role-playing scenarios, including monsters, items, characters, etc.
The user will create or import the elements (monsters, etc) and then use the imported elements to create scenarios. Everything used by the program is created within the program, so I don't have any pre-defined data I'll be accessing.
Here's my question - what's the best way to store and load the data?
Currently, I'm using XML serialization to serialize the objects to XML files and reload them later. This is kind of clunky, and I'm wondering if a database would be more effective - the data is definitely relational (monsters have items, maps have monsters, etc), and there could be dozens or hundreds of entries.
I don't need actual lines of code or methods to use, just an idea of what kind of file storage/retrieval would usually be used in this situation (in .NET).
Thanks!
As you said yourself: The data is relational so a relational database will probably help. Using Sql Server Compact you can have simple files, which are named whatever you want, that you load into Sql Server when opening. That way you won't have to administer a traditional database server and the user won't even know there is a database involved.
To access the data I'm personally very fond of Linq-to-Sql, which gives type-safe querying directly in C#.
A database is definitely the way to go. Use an object-relational mapper to talk to the database; this will probably cover 99% of your needs at the beginning.
I prefer to keep the XML-serialization for the scenarios requiring different process intercommunication.
It really depends on what you need to achieve. Databases have a place, but flat files are also perfectly fine for data (via serialization).
So; what problems is the xml giving you? If you can answer that, then you'll know what the pain points are that you want to address. You mention "game", and indeed flat files tend to be more suitable (assuming you want minimum overhead etc), but either would normally do fine. Binary serialization might be more efficient in terms of CPU and disk (but I don't recommend BinaryFormatter - it will bite you when you change the types).
I'm not anti-database (far from it) - I just wanted to present a balanced viewpoint ;-p
You could use an object database (such as db4o). The benefits include: type safety, no ORM, indexed information...
I have a WCF application that at present uses XML-based file storage to store data that gets used to generate reports. Besides this, processing decisions are made based on information stored in these XML files.
I'm now hitting volumes of around 30 000 text files. This is incredibly taxing, and the application at times comes to a grinding halt.
I've always wanted to swap out the XML DAL in favor of an RDBMS, but project managers simply won't allow it. They would, however, be willing to look at a serverless solution such as SQLite. I am really tempted to just dive right in and start using it as a replacement DAL (Data Access Layer).
I would need no more than around 20 tables in the whole solution, and I would expect to get no more than around 20 000 - 100 000 transactions a day, however this is extreme, the real volumes would be less than this in most cases.
Update
I am not expecting a great deal of simultaneous connections, when I say transactions, I essentially mean 1 or 2 clients that make calls and execute against the database in order. At times there might be a possibility of external clients making quick calls to the DB. But the bulk of DB connections will be done by my WCF service, which is a back end scheduled task, not serving 100's of people across an organization.
Another good point is that I only need to retain data for 90 days, so the DB shouldn't grow too big.
My main concerns are:
How reliable is SQLite? What if the DB file gets corrupted, will I lose all processing data? How easy is the DB to back up? Will it handle my volumes? And lastly, how well does the .NET provider work (located here: http://sourceforge.net/projects/sqlite-dotnet2/)?
If you have any experience with SQLite, please post your experiences so I can make an informed decision to switch or not.
Thanks in advance...
SQLite is as reliable as your OS and hardware.
Its transactional rate is similar to SQL server, and often faster because it's all in process.
The .NET ADO provider works great.
To back up the DB, stop the service and copy the file. If the journal file is present copy it too.
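The copy step could be scripted roughly like this. It is only a sketch in Python: it assumes SQLite's default rollback-journal mode, where the journal sits next to the database as `<name>-journal`, and it assumes nothing is writing to the database while the copy runs. The `backup_database` helper and the `reports.db` file name are made up for the example.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def backup_database(db_path, backup_dir):
    """Copy the database file, plus its journal file if one is present."""
    db_path = Path(db_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for candidate in (db_path, Path(str(db_path) + "-journal")):
        if candidate.exists():
            target = backup_dir / candidate.name
            shutil.copy2(candidate, target)  # copy2 also keeps timestamps
            copied.append(target)
    return copied


# Demo with a throwaway database in a temporary directory.
workdir = Path(tempfile.mkdtemp())
source = workdir / "reports.db"
conn = sqlite3.connect(source)
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO report (title) VALUES ('daily')")
conn.commit()
conn.close()  # stands in for "stop the service"

copied = backup_database(source, workdir / "backup")
```

After a clean shutdown there is normally no journal file left, so only the database itself gets copied.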
EDIT: SQLite uses UTF-8 by default, so with the ADO.NET provider you should be able to avoid losing accents (just so long as you follow the typical XML-in-string rules).
You could consider Microsoft's Sql Compact Edition.
It's like sqlite, in terms of being a single file embedded database, but has better integration with the .net framework :)
SQLite seems reliable, and even with Microsoft's one, don't expect to receive much support in case of a corrupted database.
Given your transaction volume I'd say the fact that the DB itself is a single monolithic file with only file system locking available could be a problem.
There is no row based locking as far as I know.
I used SQLite with the .NET provider without problems in a single-user environment, except for one concern: accented characters, which didn't display correctly. Backup is quite simple: the SQLite database is a single file. Simply copy it.
I use Sqlite for storing XML config data and have had no problems with it. I use the System.Data.Sqlite provider: http://sqlite.phxsoftware.com/. It's solid and has a good support forum. It also includes a LINQ provider. It also integrates with VS 2008 so you can use Server Explorer to query tables. The examples and documentation also show how to use parameterized commands and transactions for increased performance.
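To show the idea behind "parameterized commands and transactions for increased performance", here is a small sketch using Python's standard `sqlite3` module rather than System.Data.Sqlite. The `report_row` table, the helper names, and the fixed dates are assumptions for the example; the 90-day purge mirrors the retention period mentioned in the question.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_row (created TEXT NOT NULL, payload TEXT NOT NULL)"
)


def insert_batch(conn, rows):
    """Insert many rows with one parameterized statement in one transaction."""
    with conn:  # one commit for the whole batch instead of one per row
        conn.executemany(
            "INSERT INTO report_row (created, payload) VALUES (?, ?)", rows
        )


def purge_older_than(conn, cutoff_iso):
    """Delete rows created before the cutoff; returns how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM report_row WHERE created < ?", (cutoff_iso,)
        )
    return cur.rowcount


# Fixed "now" so the sketch behaves the same every time it is run.
now = datetime(2010, 6, 1)
insert_batch(
    conn,
    [
        ((now - timedelta(days=120)).isoformat(), "<report>old</report>"),
        ((now - timedelta(days=10)).isoformat(), "<report>recent 1</report>"),
        ((now - timedelta(days=10)).isoformat(), "<report>recent 2</report>"),
    ],
)
removed = purge_older_than(conn, (now - timedelta(days=90)).isoformat())
(remaining,) = conn.execute("SELECT COUNT(*) FROM report_row").fetchone()
```

Grouping the inserts into one transaction matters because each commit forces a sync to disk; committing once per batch usually makes the largest difference at the volumes described.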
The release candidate for LinqPad now supports Sqlite: http://www.linqpad.net/Beta.aspx.
Sqlite stores everything in a single file, which can be backed up like any other binary file.
Sqlite only supports file-level locking, but that shouldn't present a performance problem since it doesn't sound like you'll have a large number of simultaneous transactions.
Unicode shouldn't be a problem. This link in the forum addresses an area where someone was trying to read unicode characters with an incompatible utility http://sqlite.phxsoftware.com/forums/t/954.aspx.
This site shows how to do case-insensitive UTF-8 comparisons using System.Data.Sqlite via a custom collator, with Russian characters as an example: http://www.codeproject.com/KB/database/SQLiteUTF8CIComparison.aspx.