Is there a standard for representing a database as an XML file? - c#

I would like to use XML as the data store for a small project. Is there a standard for how a database should be represented in XML? The same database can be laid out as an XML file under many different schemas, and the examples I found online for reading and writing an XML database all differ from one another. Is there a convention I can rely on that many people use? I doubt that a library exists which can parse an XML database in general.
I searched the internet for this problem but did not find any article that helped.

Related

Data storage approach for different file types

Currently I'm working on an application that imports data from different sources (CSV and XML). The core data in the files is the same (ID, Name, Coordinate, etc.) but structured differently (XML: in nodes; tables: in rows), and the XML contains additional data which I need to keep together until the export. It is also important to note that I need only a small part of the data for visualization and modification, but all of it for the export.
Problem:
I'm looking for a good structure (a database or whatever) into which I can import the data at run-time. I need a reference to the data so that I can visualize and modify it. Afterward I need to export the information to a user-specified file type (consider the image).
Approaches:
I defined a class for the CSV schema and mapped the necessary information from the XML to it. The problem occurs when I try to export the data, because not all of the data is available in memory.
I defined a class for the XML schema and mapped the information from the CSV to it. The problem in this case is that the storage structure is based on the XML schema, so if that schema changes I need to change the whole storage structure.
I'm now planning to implement a SQL database with Entity Framework. This is not the easiest way, but it seems to be state-of-the-art and updateable. The thing is that I'm not very experienced with databases or Entity Framework. That's why I'd like to know whether this is a good way to solve this problem.
One last thing: I would like to store the imported data just once and work with references to this source. That way I can export the information from this source and be certain that I have the current data.
Question:
What is the common way to solve such storage problems? Did I miss a good approach? Thank you so much for your help!

Extracting a small subset of data from XMLs

I am writing a C# / VB program that is to be used for reporting data based upon information received in XMLs.
My situation is that I receive many XMLs per month (about 100-200), each ranging in size from 10 MB to 350 MB. For each of these XMLs, I only need a small subset of its data (less than 5% of any one file's entire data) to produce the necessary reports.
Also, that subset of data will always be held in the same key structure: it may exist within multiple keys and at differing depths, but it will always exist within the same key names, and the keys containing it will always have the same attributes, such as "name".
So, my current idea of how to go about doing this is to:
Create a "scraper" that will pull the necessary data from the XMLs using XPath.
Store that small subset of necessary data in a SQL Server table, with file characteristic data stored in a separate table so as to know which file the scraped data came from.
Query the data out into a program for reporting.
My main question here is really what is the best way to scrape that data out?
I am most familiar with XPath, but for multiple files of 200MB in size, I'm afraid of performance issues loading in the entire file.
Other things I have seen / researched are:
Creating an XSLT file to transform / pull from the XML only the data I want
Using Linq to XML
Somehow linking the XMLs to SQL server and then being able to query them directly
Using ADO to query the XMLs from within the program
Doing it using the XMLReader class (rather than loading in each XML entirely)
Maybe there is a native .Net component that does this very well already
Quite honestly, I just have no clue what the standard is given the high number of XMLs and the large variance in file sizes and I'm not familiar with any of the other ways of doing this - such as, for example, linking the XMLs to SQL Server directly / using ADO to query the XML - and, therefore, don't know of their possible benefits / drawbacks.
If any of you have been in a similar situation, I'd really appreciate any kind of pointers in the right direction / at least validation that my method isn't the worst one out there :)
Thanks!!!
As for the memory consumption and performance concerns, a nice feature of the .NET XML APIs is that you can combine XmlReader with XPathDocument, XmlDocument, or XElement to selectively read only part of a document into memory and then have the XPath or LINQ to XML features available on that part. LINQ to XML has XNode.ReadFrom (http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom%28v=vs.110%29.aspx) for doing that; DOM/XmlDocument has XmlDocument.ReadNode (http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode%28v=vs.110%29.aspx). So, depending on your XML structure, you might be able to use an XmlReader to read forward through the XML quickly without consuming much memory and then, when you reach an element you are interested in, read it into an XElement (LINQ to XML) or XmlNode (DOM) and apply LINQ to XML and/or XPath to read out the details.
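The combination described above can be sketched as follows. This is a minimal example, not a tuned implementation: the class and method names (XmlSubsetReader, StreamElements) and the sample element names ("report", "item", "value") are made up for illustration, and it assumes the elements of interest are not nested inside one another.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlSubsetReader
{
    // Reads forward through the XML and yields each element with the given
    // name as an XElement. Only one matching element is materialized at a time.
    // (Hypothetical helper; names are illustrative.)
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // XNode.ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Small in-memory sample; for real files pass a StreamReader over the file.
        string xml =
            "<report><header>x</header>" +
            "<item name=\"a\"><value>1</value></item>" +
            "<item name=\"b\"><value>2</value></item></report>";

        foreach (XElement item in XmlSubsetReader.StreamElements(new StringReader(xml), "item"))
        {
            // LINQ to XML is available on the small fragment only.
            Console.WriteLine((string)item.Attribute("name") + "=" + (string)item.Element("value"));
        }
    }
}
```

For a 200 MB file the same method could be given a StreamReader opened on the file; memory use then depends on the size of the largest matching element rather than on the file size.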

Guidance on data source(s) for my project

I'm a novice programmer. I am full of theoretical knowledge, but I'm behind with the practice. OK. I am trying to make a program for adding categories and descriptions to files. The language is C#, it should run on Windows 7...
1. The categories can contain sub-categories.
I don't want to call them "tags", because these are different. A category can be, for example, "favorites". But it can also be "favorites->music->2013". You can create sub-categories; I will use a TreeView on a WinForm for all the operations a user can do with them.
QUESTION: Should I use an XML file for the categories?
2. Every file CAN have a description and one or many categories. However:
Even if the file is deleted, I want to keep its description, so that it is available for later use.
Folders themselves will be omitted. Folders can have neither categories nor a description, but the files they contain can.
I made a very simple SQL Server database containing one table: http://img832.imageshack.us/img832/3931/finalprojectdb.png
QUESTION: Is this a good idea? Would it be better for the categories column to be of type XML?
Any advice on the best approach in this situation is welcome. Thanks in advance!
SQL is not great for getting nested data at once. You can store things in XML, which gives you a lot of flexibility, but you also have to write a parser or deserializer for it. Nowadays people also just write a small class and use something like Newtonsoft (Json.NET) to deserialize JSON into it automatically.
If you want a DB solution, you can use something like SQLite embedded in your application if you don't want to install a database separately.
XML is a great design for an app that needs to communicate cross-platform (say C# to Java), across the internet, or across a network. But as a way to store a subset of data inside a table, not really.
A normalized database is a terrific tool. It can be indexed (XML cannot), which allows rapid querying of the data. If you de-normalize your data by embedding XML in a column, querying it will be slow, and updating and maintaining it will be a pain.
I personally prefer foreign key tables.

XML database: is it good for the following?

I am currently using an XML file as a database in development.
The XML file is going to be modified by multiple users over the network (not on a server per se, but on my computer, which they can access over the network).
I know it is probably a bad idea to use XML for this, but the structure of XML is much better/cleaner; it is something I like.
What are my options? Would I be able to continue with the XML file using some custom background connection that verifies all the necessary details so that I can write to and read from the XML without issues?
Or am I stuck with using some SQL type of database? If so, is there some sort of database that is somewhat similar to XML?
EDIT: Reason for liking xml.
It is easily grouped for the eyes:
<SomeDocument name="Something">
  <URL>bbbb</URL>
  <Something>2342</Something>
  <Something_That_would_of_been_in_another_database>derp</...>
</SomeDocument>
rather than linking 3-4 tables together...
There are some examples of XML-based databases that support multi-user environments. One is the OneNote Revision File Format used by Microsoft OneNote. Although it is documented in great detail, supporting multiple users editing a single file is tremendously complicated. Basically, one could argue that XML-based storage is not a viable option when you need multi-user support.
If you are stuck with the XML file you could look into the OneNote file format, but it isn't a traditional XML format, since it also uses a "binary wrapper": the actual content is defined as XML data within the binary file, but transactions, revisions, and free chunks are represented in binary. This is necessary because you have to allocate specific portions of the file for users to write to while you have the file open.
If you don't want to use dedicated server software, you could use one of the various file-based databases, like SQL CE or SQLite.
You would need to deal with concurrency issues if you used a file that several users had access to. Guarantees need to be made for one user not overwriting another user's changes made around the same time.
My suggestion is to use a proper database (e.g. SQL Server) that will handle these issues for you.
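To illustrate what handling this by hand involves, here is a minimal sketch of the crudest possible guarantee: every read-modify-write cycle opens the file exclusively with FileShare.None, so a second writer fails to open it and retries. The names LockedXmlFile and Update, the retry count, and the delay are assumptions made for this example. It serializes whole-file updates only and gives none of the transaction or recovery guarantees a real database provides.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Xml.Linq;

public static class LockedXmlFile
{
    // Loads the document, applies the change, and writes it back while
    // holding an exclusive handle on the file. (Hypothetical helper.)
    public static void Update(string path, Action<XDocument> change, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // FileShare.None: no other handle can be opened until this one is closed.
                using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
                {
                    XDocument doc = XDocument.Load(stream);
                    change(doc);

                    // Truncate and rewind before writing the new content.
                    stream.SetLength(0);
                    stream.Position = 0;
                    doc.Save(stream);
                }
                return;
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                // Typically a sharing violation because another writer has the
                // file open; any other I/O error is also retried and rethrown
                // once the attempts are used up.
                Thread.Sleep(100 * attempt);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<items />");

        LockedXmlFile.Update(path, doc => doc.Root.Add(new XElement("item", "first")));

        Console.WriteLine(XDocument.Load(path).Root.Elements("item").Count());
        File.Delete(path);
    }
}
```

Even this small amount of code leaves open questions (crashes in the middle of a write, readers that see a truncated file, behaviour on network shares), which is the argument for letting a database handle them.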
I am not familiar with the C# solutions, but for our Java application we use eXist-db and query it with XQuery. I'm not too familiar with it, but some use MarkLogic. Still more use Berkeley DB.
Whether to use a native XML database, an XML-enabled database, a so-called NoSQL database, or one of the more traditional methods can depend on multiple factors. Just to mention two:
Most importantly, do you have your data in XML, and do you want to keep it that way? If so, use an XML-enabled solution.
Do you need scalability or performance? If so, you will need a solution that can deal with that. There are lots of NoSQL and XML databases that are well capable of handling that.
As for concurrency: any database should deal with that natively.
A number of databases have been mentioned already. To single out a few: MarkLogic Server ( www.marklogic.com ) is built to scale and perform up to terabyte scale (and beyond), and has connectors for Java and .NET, among others. The solution from 28msec ( www.28msec.com , based on Zorba) runs in the cloud, and should scale too.
But most interesting to mention here is that these databases are often used through HTTP / REST interfaces. That allows easy integration from any programming language, and makes interchanging easier too.

Handling XML data in an ActiveRecord-pattern way: is there any way?

I would like to handle XML data in an ActiveRecord way: one class for each XML structure (I will obviously need an XSD) and the possibility of operations like Users.FindAll(), as Castle ActiveRecord does.
The problem, obviously, is that these are XML files, not relational databases.
Is there any library that achieves this? A Microsoft library rather than a third-party one would be better.
To explain why I would like to achieve this, I'll describe the program I'm building, so you can give me suggestions if a different approach is better:
The program's "output" will be something like a long MS Word (or PDF) document containing information about how a company handles the privacy of its customers, following the local legislation.
So I will have a "global" XML file which contains something like the Jobs (as defined in law, but the law can change, so they should be editable by the user) that each employee can have in their company (there will be other data too; this is a generic example).
Then I will have an XML file for each company the user would like to use this program for. This XML file will have a list of employees, where each employee has a reference to a Job (chosen from the global XML file).
The program will of course have much more data, but this explains how it works.
I'm still not sure whether I must use a relational database. What really frightens me about using one is that I will have trouble allowing the user to export/import data if they install the program on a new computer. I would also like to avoid forcing the user to install a database on their computer (an SQLite-like database could be OK, because it lives in a file).
Any suggestion about this?
Thanks to everyone
Although Linq-to-XML is pretty easy to use, there are many more things to do when it comes to reading and storing related data in a way a RDBMS does. An RDBMS is all about referential integrity, ACID transactions, concurrent users, performance enhancements, to name a few elements that spring to my mind now. Thinking of this daunting task, I think doing this all by yourself is more scary than deploying a database file.
There are some XML-based databases, but I don't know how mature and user friendly they are. I even remember having read of database systems based on plain text files.
I would go for the paved roads and use a relational database, possibly a local database, as you already suggested. Lots of support and tooling available.
How about using LINQ to XML?
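As a rough idea of what that could look like, here is a minimal sketch of a FindAll-style wrapper over an XML document using LINQ to XML. The Employee class, the EmployeeStore helper, and the element and attribute names ("company", "employee", "name", "jobId") are all assumptions invented for this example; they are not taken from the question's actual files or from Castle ActiveRecord.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Plain class standing in for one XML structure (hypothetical shape).
public sealed class Employee
{
    public string Name { get; set; }
    public string JobId { get; set; }
}

public static class EmployeeStore
{
    // Maps every <employee> element in the document to an Employee object.
    public static List<Employee> FindAll(XDocument doc)
    {
        return doc.Descendants("employee")
            .Select(e => new Employee
            {
                Name = (string)e.Attribute("name"),
                JobId = (string)e.Attribute("jobId")
            })
            .ToList();
    }

    // Appends a new <employee> element under the document root.
    public static void Add(XDocument doc, Employee employee)
    {
        doc.Root.Add(new XElement("employee",
            new XAttribute("name", employee.Name),
            new XAttribute("jobId", employee.JobId)));
    }
}

public static class Program
{
    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<company><employee name=\"Ann\" jobId=\"dpo\" /></company>");

        EmployeeStore.Add(doc, new Employee { Name = "Bob", JobId = "clerk" });

        foreach (Employee e in EmployeeStore.FindAll(doc))
        {
            Console.WriteLine(e.Name + " -> " + e.JobId);
        }
    }
}
```

The jobId attribute here is only a string; nothing checks that it refers to a Job in the global file, which is the kind of referential integrity the answer above says an RDBMS provides for you.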
