Is it possible to query an XML file with SQL?


I'm currently working on a case where we don't want to change too much in a C#/WPF program, but would like to add a feature. We currently allow certain users to add SQL queries against a database to retrieve customer data; for each query, a custom connection string and provider name must be specified. With this information it's possible to create the connection and obtain the data in C#.
However, we would like to allow that user group to query XML files too, again with a certain connection string and provider name. I had a look for possibilities in .NET to do that, but can't seem to find a decent way... Is something like this possible? (The OleDb/ODBC way, maybe?)
edit: For clarity, I'd like to state that the solution must fit the pattern of connecting to the data source with the specified connection string and provider, and then executing the SQL query.
edit2: After reviewing the first three answers I decided to look beyond XML. This post seems to illustrate the above case best (the only difference is that an XLS file is used instead of XML): How to query excel file in C# using a detailed query. Possible solutions with XML are still welcome, however...
Thanks in advance.

Yes, use LINQ to XML:
http://www.hookedonlinq.com/LINQtoXML5MinuteOverview.ashx
http://www.liquidcognition.com/tech-tidbits/linq2xml-example.aspx
// Loading from a file; you can also load from a stream
XDocument loaded = XDocument.Load(@"C:\contacts.xml");
// Query the data and write out a subset of contacts
var q = from c in loaded.Descendants("contact")
        where (int)c.Attribute("contactId") < 4
        select (string)c.Element("firstName") + " " +
               (string)c.Element("lastName");
foreach (string name in q)
    Console.WriteLine("Customer name = {0}", name);

AFAIK, you cannot use standard SQL statements on XML. What you can use is XQuery.
It's a query language for XML documents.
http://en.wikipedia.org/wiki/XQuery
http://www.w3schools.com/xquery/default.asp
hth

Many XML libraries allow XPath queries to be issued against an XML document, but the syntax is very different from SQL and the semantics are very different. Additionally, XPath doesn't really produce result sets in the way that SQL does - it returns parts of an XML document or the contents of fields. I'd say that you will probably encounter a significant impedance mismatch if the rest of the application is geared towards SQL result sets.
XPath is also much dumber than SQL, although there is another language (XQuery) that is much cleverer. However, good XQuery support is much less common in XML parsing libraries. XQuery works quite differently to SQL, so your users may also have trouble understanding it.
Many DBMS platforms (including SQL Server) also have a native XML data type that supports embedding XPath expressions in SQL queries. Using CROSS APPLY you can do join operations to flatten hierarchical data structures into a SQL result set. However, this is quite fiddly and your users may have trouble getting it to work properly.
In short, I think that adding this sort of facility to query XML documents will probably not work very well.
One option might be to build a facility that shreds the XML documents and populates the contents into a database with the same structure as your application. This is reasonably straightforward to implement and would not require your users to learn a new paradigm.
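As a rough illustration of the shredding idea, here is a minimal sketch that loads XML into an in-memory DataSet and filters it with a SQL-like expression. It is not a real connection-string/provider solution. The sample XML, the `XmlShredder` helper, and the table name `contact` are assumptions made up for the example; `DataSet.ReadXml` and `DataTable.Select` are standard ADO.NET calls.

```csharp
using System;
using System.Data;
using System.IO;

static class XmlShredder
{
    // Hypothetical helper: shreds XML into inferred tables and filters one of them.
    public static DataRow[] Query(string xml, string table, string filter)
    {
        var dataSet = new DataSet();
        dataSet.ReadXml(new StringReader(xml)); // infers tables and columns from the XML shape
        return dataSet.Tables[table].Select(filter);
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical sample data; a real file could be passed to ReadXml by path instead.
        const string xml = @"<contacts>
  <contact contactId=""2""><firstName>Ann</firstName><lastName>Lee</lastName></contact>
  <contact contactId=""7""><firstName>Bob</firstName><lastName>Ray</lastName></contact>
</contacts>";

        // Inferred columns are strings, so convert before comparing numerically.
        DataRow[] rows = XmlShredder.Query(
            xml, "contact", "Convert(contactId, 'System.Int32') < 4");

        foreach (DataRow row in rows)
            Console.WriteLine("{0} {1}", row["firstName"], row["lastName"]);
    }
}
```

The filter syntax is the DataColumn expression language, which resembles a SQL WHERE clause but is not full SQL. To let users run real SQL, the same DataSet could be bulk-copied into database tables.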

LINQ is a SQL-like language feature for .NET that allows you to write SQL-style statements for querying lots of things, including XML documents.
This article has a pretty good overview of LINQ to XML.
Here is an example of how it looks and works:
// xmlSource is assumed to be an XDocument (or XElement) loaded beforehand
var q = from c in xmlSource.Descendants("contact")
        where (int)c.Attribute("contactId") < 4
        select (string)c.Element("firstName") + " " + (string)c.Element("lastName");

Related

Fastest efficient way to create XML document .NET/Oracle and return to web client

Does anyone know the fastest, most efficient way to create XML documents in an Oracle/.NET environment?
There are two philosophies:
Do the coding in an Oracle package and use Oracle's native XML abilities to return an XML document: after querying the data from the DB, create the document by looping through the query result and setting nodes like so (addXMLNode(doc, nContact, 'ROW_ID', rec_con.ROW_ID);)
Just query the data from Oracle and, in .NET, loop through the data reader and create the XML document using the .NET XML classes. Essentially, the DB serves the data and the DOM XML creation is done in .NET.
Assuming no knowledge difference in the two practices, does someone know if one is more efficient, faster, or better than the other one? Please don't give me your "favorite" way to handle it. An "our query was slow so we moved it into the code", or vice versa real world example would give me some direction for code refactoring and application performance improvement.
Thanks.

Extracting a small subset of data from XMLs

I am writing a C# / VB program that is to be used for reporting data based upon information received in XMLs.
My situation is that I receive many XMLs per month (about 100-200) - Each ranging in size from 10mb to 350mb. For each of these XMLs, I only need a small subset of its data (less than 5% of any one file's entire data) so as to produce the necessary reports.
Also, that subset of data will always be held in the same key structure (it may exist within multiple keys and at differing levels, but it will always exist within the same key names, and the keys containing it will always have the same attributes such as "name", etc.).
So, my current idea of how to go about this is to:
Create a "scraper" that will pull the necessary data from the XMLs using XPath.
Store that small subset of necessary data in a SQL Server table, along with file characteristic data stored in a separate table, so as to know which file the scraped data came from.
Query the data out into a program for reporting.
My main question here is really what is the best way to scrape that data out?
I am most familiar with XPath, but for multiple files of 200MB in size, I'm afraid of performance issues loading in the entire file.
Other things I have seen / researched are:
Creating an XSLT file to transform / pull from the XML only the data I want
Using Linq to XML
Somehow linking the XMLs to SQL server and then being able to query them directly
Using ADO to query the XMLs from within the program
Doing it using the XMLReader class (rather than loading in each XML entirely)
Maybe there is a native .Net component that does this very well already
Quite honestly, I just have no clue what the standard is given the high number of XMLs and the large variance in file sizes and I'm not familiar with any of the other ways of doing this - such as, for example, linking the XMLs to SQL Server directly / using ADO to query the XML - and, therefore, don't know of their possible benefits / drawbacks.
If any of you have been in a similar situation, I'd really appreciate any kind of pointers in the right direction / at least validation that my method isn't the worst one out there :)
Thanks!!!

As for the memory consumption and performance concerns, a nice feature of the .NET XML APIs is that you can combine XmlReader with XPathDocument, XmlDocument, or XElement to selectively read only part of a document into memory and then have the XPath or LINQ to XML features available on that part. LINQ to XML has XNode.ReadFrom (http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom%28v=vs.110%29.aspx) for doing that; DOM/XmlDocument has XmlDocument.ReadNode (http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode%28v=vs.110%29.aspx). So, depending on your XML structure, you might be able to use an XmlReader to read forward through the XML quickly without consuming much memory and then, when you reach the element you are interested in, read it into an XElement (LINQ to XML) or XmlNode (DOM) and apply LINQ to XML and/or XPath to read out the details.
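The XmlReader plus XNode.ReadFrom combination described above can be sketched as follows. This is a minimal example under assumptions: the element name `record`, the sample XML, and the `XmlStreaming` helper are hypothetical, and a real 200 MB file would be opened with `XmlReader.Create(path)` rather than from a string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    // Yields every element with the given local name as a small XElement,
    // without ever materialising the whole document in memory.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string name)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
            {
                // ReadFrom consumes the element and leaves the reader just after it,
                // so no extra Read() call is needed on this branch.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical sample; only <record> elements are loaded as XElements.
        const string xml =
            @"<report><header>ignored</header>" +
            @"<record id=""1""><value>10</value></record>" +
            @"<record id=""2""><value>20</value></record></report>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement record in XmlStreaming.StreamElements(reader, "record"))
                Console.WriteLine("{0}: {1}",
                    (string)record.Attribute("id"), (string)record.Element("value"));
        }
    }
}
```

Each yielded XElement can be queried with LINQ to XML or XPath and then discarded, so memory use stays proportional to the largest matching element rather than to the file size.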

Algorithm to Parse CAML xml

I have a particular requirement to parse Sharepoint CAML and produce something else.
The first scenario is to produce an SQL Query.
Are there any best practice dev tools/algorithms when parsing XML? I am thinking of using Linq To Xml as my tool. But not sure if there is a better approach for such type of parsing.
Another approach I like is the one used by the OpenXML SDK, where they have built a strongly typed engine around the Open XML format. Perhaps I could build something similar, but that could be a little far-fetched.
Any assistance (perhaps previous experience on xml parsing) would be greatly appreciated.

When reading any structure in any format, you have to find the optimal traversal mechanism. In this case, since CAML Logical Operations are binary (have 2 children only), a particular traversal mechanism can be adopted. Linq to XML is a powerful tool for traversing XML.
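Because And/Or nodes have exactly two children, a recursive traversal maps naturally onto a SQL WHERE clause. The following is only a sketch of that idea with LINQ to XML, not a complete CAML translator. The `CamlToSql` class is hypothetical, it handles only a handful of operators, and it ignores the `Type` attribute on `Value`. Values are collected as parameters rather than concatenated into the SQL text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CamlToSql
{
    // Recursively translates a CAML <Where> tree into a parameterised SQL condition.
    public static string Translate(XElement node, List<string> parameters)
    {
        string name = node.Name.LocalName;
        switch (name)
        {
            case "Where":
                return Translate(node.Elements().Single(), parameters);
            case "And":
            case "Or":
            {
                List<XElement> operands = node.Elements().ToList();
                if (operands.Count != 2)
                    throw new FormatException(name + " must have exactly two children.");
                string left = Translate(operands[0], parameters);
                string right = Translate(operands[1], parameters);
                return "(" + left + " " + name.ToUpperInvariant() + " " + right + ")";
            }
            case "Eq": return Comparison(node, "=", parameters);
            case "Neq": return Comparison(node, "<>", parameters);
            case "Gt": return Comparison(node, ">", parameters);
            case "Lt": return Comparison(node, "<", parameters);
            default:
                throw new NotSupportedException("Unsupported CAML element: " + name);
        }
    }

    // Leaf node: <Op><FieldRef Name="..."/><Value>...</Value></Op>
    static string Comparison(XElement node, string sqlOperator, List<string> parameters)
    {
        string field = (string)node.Element("FieldRef").Attribute("Name");
        parameters.Add((string)node.Element("Value"));
        return "[" + field + "] " + sqlOperator + " @p" + (parameters.Count - 1);
    }
}

static class Program
{
    static void Main()
    {
        XElement caml = XElement.Parse(
            @"<Where><And>
                <Eq><FieldRef Name=""Status""/><Value Type=""Text"">Open</Value></Eq>
                <Gt><FieldRef Name=""Priority""/><Value Type=""Number"">2</Value></Gt>
              </And></Where>");

        var parameters = new List<string>();
        Console.WriteLine(CamlToSql.Translate(caml, parameters));
        // prints: ([Status] = @p0 AND [Priority] > @p1)
    }
}
```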

I normally use Entity Framework for my SQL queries which is great as it allows me to dynamically construct queries in a strongly-typed and maintainable fashion.
However, I'm on a project at the moment which utilises spatial queries. Most of my queries will output a result set ordered by time or by distance from a given co-ordinate. However, I have found that ordering by STDistance slows the query down tenfold. (Actually, it slows down if I join on another table in addition to the "order by".)
I have managed to optimize the query myself and got the performance back to where it should be, however, this query cannot be produced by Entity Framework.
So I could end up having a set of "order by time" queries generated by EF and then another set of "order by distance" queries as stored procedures in SQL Server. The trouble is that, basically, this latter set of queries would have to be created via string concatenation (either in an SP or C#).
I have been trying to get away from SQL string concatenation for years now and these ORM frameworks are great for 99% of queries. However, I always find myself having to go back to string concatenation so that the most optimal queries are sent to the server. This is a maintenance nightmare though.
String concatenation was solved in ASP.NET with templating engines, which essentially allow you to build html strings. Does anyone know such a solution for SQL strings? Although it's a bit messy in some respects it would allow for the most optimal queries. In my view this would be better than
String concat in a stored proc
String concat in C#
Masses of duplicate code in stored procs covering all possible input parameters
LINQ queries which can create sub-optimal SQL
I'd love to know your thoughts about this general problem and what you think of my proposed solution.
thanks
Kris

Have you checked out the latest Entity Framework Beta? It is supposed to have support for spatial data types.
Also, if you want to build SQL queries dynamically, check out PetaPoco's SQL Builder. Some examples from the site:
Example 1:
var id = 123;
var a = db.Query<article>(PetaPoco.Sql.Builder
    .Append("SELECT * FROM articles")
    .Append("WHERE article_id=@0", id)
    .Append("AND date_created<@0", DateTime.UtcNow)
);
Example 2:
var id = 123;
var sql = PetaPoco.Sql.Builder
    .Append("SELECT * FROM articles")
    .Append("WHERE article_id=@0", id);
if (start_date.HasValue)
    sql.Append("AND date_created>=@0", start_date.Value);
if (end_date.HasValue)
    sql.Append("AND date_created<=@0", end_date.Value);
var a = db.Query<article>(sql);
Example 3:
var sql = PetaPoco.Sql.Builder
    .Select("*")
    .From("articles")
    .Where("date_created < @0", DateTime.UtcNow)
    .OrderBy("date_created DESC");
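To show the mechanism behind this style of builder without depending on any library, here is a small self-contained sketch. The `SqlText` class is hypothetical and is not PetaPoco's implementation. It only illustrates how fragments with locally numbered placeholders (`@0`, `@1`, ...) can be appended conditionally while the arguments are collected into one parameter list.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of PetaPoco or any other library.
sealed class SqlText
{
    private readonly StringBuilder _sql = new StringBuilder();
    private readonly List<object> _args = new List<object>();

    public string Sql { get { return _sql.ToString(); } }
    public IReadOnlyList<object> Args { get { return _args; } }

    // Appends a fragment and renumbers its local @0..@n placeholders
    // to globally unique @pN names.
    public SqlText Append(string fragment, params object[] args)
    {
        int offset = _args.Count;
        // Replace from the highest index down so "@1" never clobbers "@10".
        for (int i = args.Length - 1; i >= 0; i--)
            fragment = fragment.Replace("@" + i, "@p" + (offset + i));
        if (_sql.Length > 0) _sql.Append(' ');
        _sql.Append(fragment);
        _args.AddRange(args);
        return this;
    }
}

static class Program
{
    static void Main()
    {
        DateTime? startDate = new DateTime(2020, 1, 1);
        DateTime? endDate = null;

        var sql = new SqlText()
            .Append("SELECT * FROM articles")
            .Append("WHERE article_id = @0", 123);
        if (startDate.HasValue)
            sql.Append("AND date_created >= @0", startDate.Value);
        if (endDate.HasValue)
            sql.Append("AND date_created <= @0", endDate.Value);

        Console.WriteLine(sql.Sql);
        // prints: SELECT * FROM articles WHERE article_id = @p0 AND date_created >= @p1
    }
}
```

The resulting text and `Args` list could then be handed to an ordinary parameterised command, which keeps the optional clauses readable without concatenating user values into the SQL string.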

Maybe it would be an option to use a T4 template to generate the views (and corresponding queries). You could modify either the template or override the generated output selectively (although this may possibly also require you to modify the template).
Microsoft provides a T4 template for this purpose (assuming you're not using code-first, as I'm not sure the equivalent exists for that scenario).
Incidentally, pre-compiling the views also provides for faster startup as the views don't have to be generated at run-time.

The question is hard to understand.
Any kind of EF ORM will always be slower than handcrafted SQL, so we're left with manually creating and managing those SQL queries. This can be done in several ways:
Manually write the procs
Write "smart procs" with SQL string concatenation in them
Use a templating engine to generate those procs at compile time
Write your own LINQ to SQL provider for runtime generation of queries
All of those have upsides and downsides, but good unit test coverage should protect you from obvious errors, such as someone renaming a field in the database.

xml database, is it good for the following?

I am currently using an XML file as a database in development.
The XML file is going to be modified by multiple users over the network. (Not on a server per se, but on my computer, where they have access over the network.)
I kind of know it is a bad idea to use XML for this, but the structure of XML is much better/cleaner/something I like.
I'm wondering what my options are. Would I be able to continue with the XML file using some custom background connection (one that would verify all the necessary details to allow me to read from and write to the XML without issues)?
Or am I stuck with using some SQL type of database? If so, is there some sort of database that is somewhat similar to XML?
EDIT: Reason for liking xml.
Grouped easily for the eyes.
<SomeDocument name="Something">
    <URL>bbbb</URL>
    <Something>2342</Something>
    <Something_That_would_of_been_in_another_database>derp</...>
rather than linking 3-4 tables together...

There are some examples of XML-based databases that support multi-user environments. One is the OneNote Revision File Format used by Microsoft OneNote. Although it is documented in great detail, supporting multiple users editing a single file is tremendously complicated. Basically, one could argue that XML-based storage is not a viable option when you need multi-user support.
If you are stuck with the XML file you could look into the OneNote file format, but it isn't a traditional XML format, since it also uses a "binary wrapper": the actual content is defined as XML data within the binary file, but transactions, revisions, and free chunks are represented in binary. This is necessary because you have to allocate specific portions of the file for users to write to while the file is open.
If you don't want to use dedicated server software, you could use one of the various file-based databases, such as SQL CE or SQLite.

You would need to deal with concurrency issues if you used a file that several users had access to. Guarantees need to be made for one user not overwriting another user's changes made around the same time.
My suggestion is to use a proper database (e.g. SQL Server) that will handle these issues for you.
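If the XML file has to stay for now, the crudest form of that guarantee is to serialise writers with an exclusive file lock. The sketch below only illustrates the overwrite problem and is no substitute for a database. The `XmlFileStore` helper and the temp-file path are hypothetical, and whether `FileShare.None` is honoured reliably over a given network share is an assumption to verify in your environment.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

static class XmlFileStore
{
    // Performs a read-modify-write cycle while holding an exclusive lock on the file.
    // Other processes that try to open the file meanwhile get an IOException
    // and would need to retry.
    public static void Update(string path, Action<XDocument> change)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            XDocument doc = XDocument.Load(stream);
            change(doc);

            // Truncate and rewind before writing the modified document back.
            stream.SetLength(0);
            stream.Position = 0;
            doc.Save(stream);
        }
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical demo file in the temp directory.
        string path = Path.Combine(Path.GetTempPath(), "store-demo.xml");
        File.WriteAllText(path, "<items />");

        XmlFileStore.Update(path, doc => doc.Root.Add(new XElement("item", "first")));
        XmlFileStore.Update(path, doc => doc.Root.Add(new XElement("item", "second")));

        Console.WriteLine(File.ReadAllText(path));
    }
}
```

This rewrites the whole file on every change and blocks all other users while doing so, which is part of why the answers here recommend a real database.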

I am not familiar with the C# solutions, but for our Java application we use eXist-db and query it with XQuery. I'm not too familiar with it, but some use MarkLogic. Still more use Berkeley DB.

Whether to use a native XML database, an XML-enabled database, a so-called NoSQL database, or any of the more traditional methods depends on multiple factors. Just to mention two:
Most importantly, do you have your data in XML, and do you want to keep it that way? If so, use an XML-enabled solution.
Do you need scalability or performance? If so, you will need a solution that can deal with that. There are lots of NoSQL and XML databases that are well capable of handling that.
As for concurrency: any database should deal with that natively.
A number of databases have been mentioned already. To single out a few: MarkLogic Server ( www.marklogic.com ) is built to scale and perform up to terabyte scale (and beyond), and has connectors for Java and .NET, among others. The solution from 28msec ( www.28msec.com , based on Zorba) runs in the cloud and should scale too.
But most interesting to mention here is that these databases are often used through HTTP / REST interfaces. That allows easy integration from any programming language, and makes interchanging easier too.
