Algorithm to Parse CAML xml - c#

I have a particular requirement to parse SharePoint CAML and produce something else.
The first scenario is to produce an SQL query.
Are there any best-practice dev tools/algorithms for parsing XML? I am thinking of using LINQ to XML as my tool, but I am not sure whether there is a better approach for this type of parsing.
Another approach I like is the one used by the OpenXML SDK, where they have built a strongly typed engine around the Open XML format. Perhaps I could build something similar, but that could be a little far-fetched.
Any assistance (perhaps previous experience with XML parsing) would be greatly appreciated.

When reading any structure in any format, you have to find the optimal traversal mechanism. In this case, since CAML logical operations are binary (each has exactly two children), the query can be walked as a binary tree with a simple recursive traversal. LINQ to XML is a powerful tool for that kind of traversal.
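As a rough illustration of that recursive traversal, here is a minimal sketch that turns the <Where> part of a CAML query into a SQL predicate. The class name CamlToSql, the handful of operators it supports, and the bracket/quote escaping are all assumptions made for the example; a real translator would need more operators and should emit parameters rather than inline literals.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any SharePoint API.
public static class CamlToSql
{
    // Takes the <Where> element of a CAML query as a string.
    public static string Translate(string camlWhere)
    {
        XElement where = XElement.Parse(camlWhere);
        // <Where> is assumed to wrap exactly one root condition.
        return Visit(where.Elements().Single());
    }

    private static string Visit(XElement node)
    {
        string name = node.Name.LocalName;
        switch (name)
        {
            case "And":
            case "Or":
            {
                // Logical operators are binary, so recurse into both children.
                XElement[] kids = node.Elements().ToArray();
                if (kids.Length != 2)
                    throw new FormatException(name + " needs exactly two children.");
                return "(" + Visit(kids[0]) + " " + name.ToUpperInvariant() + " " + Visit(kids[1]) + ")";
            }
            case "Eq": return Compare(node, "=");
            case "Neq": return Compare(node, "<>");
            case "Gt": return Compare(node, ">");
            case "Geq": return Compare(node, ">=");
            case "Lt": return Compare(node, "<");
            case "Leq": return Compare(node, "<=");
            default:
                throw new NotSupportedException("Operator not handled in this sketch: " + name);
        }
    }

    // Leaf comparison: <FieldRef Name="..."/> plus <Value Type="...">...</Value>.
    private static string Compare(XElement node, string op)
    {
        string field = (string)node.Element("FieldRef").Attribute("Name");
        XElement value = node.Element("Value");
        string type = (string)value.Attribute("Type");
        bool numeric = type == "Integer" || type == "Number" || type == "Counter";
        // Numbers are parsed and re-emitted; everything else becomes a quoted string.
        string literal = numeric
            ? decimal.Parse(value.Value, CultureInfo.InvariantCulture).ToString(CultureInfo.InvariantCulture)
            : "'" + value.Value.Replace("'", "''") + "'";
        return "[" + field.Replace("]", "]]") + "] " + op + " " + literal;
    }
}
```

Calling CamlToSql.Translate on a <Where> containing an <And> of an <Eq> and a <Gt> yields a parenthesised predicate joined with AND, which can then be appended to a SELECT statement.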

Related

Fastest efficient way to create XML document .NET/Oracle and return to web client

Does anyone know the fastest, most efficient way to create XML documents in an Oracle/.NET environment?
There are two philosophies:
Do the coding in an Oracle package and use Oracle's native XML abilities to return an XML document: after querying the data from the DB, create the document by looping through the query result and setting nodes like so (addXMLNode(doc, nContact, 'ROW_ID', rec_con.ROW_ID);)
Just query the data from Oracle and build the document in .NET, looping through the data reader and creating the XML document with the .NET XML classes. Essentially, the DB serves the data and the DOM XML creation is done in .NET.
Assuming no knowledge difference in the two practices, does someone know if one is more efficient, faster, or better than the other one? Please don't give me your "favorite" way to handle it. An "our query was slow so we moved it into the code", or vice versa real world example would give me some direction for code refactoring and application performance improvement.
Thanks.

Extracting a small subset of data from XMLs

I am writing a C# / VB program that is to be used for reporting data based upon information received in XMLs.
My situation is that I receive many XMLs per month (about 100-200) - Each ranging in size from 10mb to 350mb. For each of these XMLs, I only need a small subset of its data (less than 5% of any one file's entire data) so as to produce the necessary reports.
Also, that subset of data will always be held in the same key structure (it will exist within multiple keys and perhaps at differing levels down, but it will always exist within the same key names, and the keys containing it will always have the same attributes, such as "name", etc.)
So, my current idea of how to go about doing this is to:
Create a "scraper" that will pull the necessary data from the XMLs using XPath.
Store that small subset of necessary data in a SQL Server table along with file characteristic data stored in a separate table so as to know which file this scraped data came from
Query out the data into a program for reporting it.
My main question here is really what is the best way to scrape that data out?
I am most familiar with XPath, but for multiple files of 200MB in size, I'm afraid of performance issues loading in the entire file.
Other things I have seen / researched are:
Creating an XSLT file to transform / pull from the XML only the data I want
Using Linq to XML
Somehow linking the XMLs to SQL server and then being able to query them directly
Using ADO to query the XMLs from within the program
Doing it using the XmlReader class (rather than loading in each XML entirely)
Maybe there is a native .NET component that does this very well already
Quite honestly, I just have no clue what the standard is given the high number of XMLs and the large variance in file sizes and I'm not familiar with any of the other ways of doing this - such as, for example, linking the XMLs to SQL Server directly / using ADO to query the XML - and, therefore, don't know of their possible benefits / drawbacks.
If any of you have been in a similar situation, I'd really appreciate any kind of pointers in the right direction / at least validation that my method isn't the worst one out there :)
Thanks!!!
As for the memory consumption and performance concerns, a nice feature of the .NET XML APIs is that you can combine XmlReader with XPathDocument, XmlDocument or XElement to selectively read only part of a document into memory and then have the XPath or LINQ to XML features available on that part. LINQ to XML has XNode.ReadFrom (http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom%28v=vs.110%29.aspx) for doing that; DOM/XmlDocument has XmlDocument.ReadNode (http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.readnode%28v=vs.110%29.aspx). So, depending on your XML structure, you might be able to use an XmlReader to read forward through the XML quickly without consuming much memory and then, when you reach an element you are interested in, read it into an XElement (LINQ to XML) or XmlNode (DOM) and apply LINQ to XML and/or XPath to read out the details.
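A minimal sketch of that XmlReader plus XNode.ReadFrom combination follows. The class and method names (XmlStreaming, StreamElements) and the idea of matching on a single element name are assumptions for illustration; the real element names depend on your files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: streams matching elements one at a time.
public static class XmlStreaming
{
    // Lazily yields every element called elementName as a standalone XElement,
    // so only one such subtree is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded XElement can then be queried with LINQ to XML or XPath, for example inside a foreach loop over XmlStreaming.StreamElements(File.OpenText(path), "item"), where path and "item" stand in for your own file and element name.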

import - xpath or convert to .csv?

I'm looking for some advice on how I should go about a solution. I have an import to write using C#. The data comes from an XML file containing ~30000 records, each with ~10 nodes for different data. My initial thought would be to create a node list of record IDs (one of the nodes is a unique ID), then loop through the node list and use XPath to get the rest of the data for each record. My other thought was to convert the XML file into .csv format and read it that way. Before I dive head first into one or the other, any advice, pros/cons or suggestions? Thanks in advance
Go with whichever you feel more comfortable with.
Personally, I would use XDocument and LINQ to XML to query the XML directly.
Transforming to CSV has its own pitfalls, if you don't adhere to the rules (quoting fields, line breaks within fields etc...).
I agree with the above poster that you want to use LINQ to XML if possible. However, if you are on an older version of the framework, you could use an XmlDocument and its SelectNodes/SelectSingleNode methods. If you do that, make sure you pass an XmlNamespaceManager; otherwise your queries will return nothing unless the XML has no namespaces.
That got me a bunch of times.
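To make the namespace pitfall concrete, here is a small sketch using XmlDocument with an XmlNamespaceManager. The namespace URI urn:example:records, the prefix r, and the records/record/id element names are made up for the example.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper showing namespace-aware XPath on XmlDocument.
public static class NamespacedXPath
{
    public static List<string> ReadIds(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Map a prefix of our choosing to the document's default namespace.
        // XPath has no notion of a default namespace, so "/records/record"
        // without a prefix would match nothing in such a document.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("r", "urn:example:records");

        var ids = new List<string>();
        foreach (XmlNode record in doc.SelectNodes("/r:records/r:record", ns))
        {
            XmlNode id = record.SelectSingleNode("r:id", ns);
            if (id != null)
                ids.Add(id.InnerText);
        }
        return ids;
    }
}
```

The prefix used in the XPath expression does not have to match anything in the file; only the namespace URI has to be the same.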

Lots of xml files - need search and filter

I have data in XML files (about 5000) and need to search and filter this data. It would be wonderful if I could use fuzzy search. I suppose I need an index, perhaps an index of the attributes? Should I use an XML database, or something like Lucene? I prefer .NET.
For a recordset that small, you may find that XSLT meets your needs.
See here for an XSLT Tutorial.
Have you looked at LINQ to XML? Sounds like it would be perfect for this situation, allowing you to query the XML with a SQL-like syntax, and the performance shouldn't be too bad over ~5000 records.
Edit: Sorry, I misread the question—it appears you meant 5000 XML files, not 5000 records. Still might be worth a look though.

Is it possible to query a XML file with SQL?

Currently I'm working on a case where we don't want to change too much in a C#/WPF program but would like to add a feature. At the moment we allow certain users to add SQL queries against a database to retrieve customer data, for which a custom connection string/provider name must be specified. With this information it's possible to create the connection and obtain the data with C#.
However, we would like to allow that user group to query XML files too, with a certain connection string/provider name. I had a look for possibilities in .NET to do that, but can't seem to find a decent way... Is something like this possible? (OleDb/ODBC way maybe?)
edit: For clarity, I'd like to state that the solution must fit into the pattern of connecting to the data source with the specified connection string and provider, and then executing the SQL query.
edit2: After reviewing the first three answers I decided to have a look beyond XML. This post seems to illustrate the above case best (the only difference is that an XLS is used instead of an XML): How to query excel file in C# using a detailed query. Possible solutions with XML are still welcome, however...
Thanks in advance.
Yes. Use LINQ to XML.
http://www.hookedonlinq.com/LINQtoXML5MinuteOverview.ashx
http://www.liquidcognition.com/tech-tidbits/linq2xml-example.aspx
using System;
using System.Linq;
using System.Xml.Linq;

// Loading from a file; you can also load from a stream
XDocument loaded = XDocument.Load(@"C:\contacts.xml");
// Query the data and write out a subset of contacts
var q = from c in loaded.Descendants("contact")
        where (int)c.Attribute("contactId") < 4
        select (string)c.Element("firstName") + " " +
               (string)c.Element("lastName");
foreach (string name in q)
    Console.WriteLine("Customer name = {0}", name);
AFAIK, you cannot use standard sql statements for XML. But what you can use is XQuery.
It's a query language for xml documents.
http://en.wikipedia.org/wiki/XQuery
http://www.w3schools.com/xquery/default.asp
hth
Many XML libraries allow XPath queries to be issued against an XML document, but the syntax is very different from SQL and the semantics are very different. Additionally, XPath doesn't really produce result sets in the way that SQL does - it returns parts of an XML document or the contents of fields. I'd say that you will probably encounter a significant impedance mismatch if the rest of the application is geared towards SQL result sets.
XPath is also much dumber than SQL, although there is another language (XQuery) that is much cleverer. However, good XQuery support is much less common in XML parsing libraries. XQuery works quite differently to SQL, so your users may also have trouble understanding it.
Many DBMS platforms (including SQL Server) also have a native XML data type that supports embedding XPath expressions in SQL queries. Using CROSS APPLY you can do join operations to flatten hierarchical data structures into a SQL result set. However, this is quite fiddly and your users may have trouble getting it to work properly.
In short, I think that adding this sort of facility to query XML documents will probably not work very well.
One option might be to build a facility that shreds the XML documents and populates the contents into a database with the same structure as your application. This is reasonably straightforward to implement and would not require your users to learn a new paradigm.
LINQ is a SQL-like query syntax for .NET that lets you write SQL-style statements against many kinds of data sources. LINQ to XML specifically lets you do this for XML documents.
This article has a pretty good overview of Linq to XML.
Here is an example of how it looks / works
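// Assumes xmlSource is an XDocument whose contact elements carry a
// contactId attribute and firstName/lastName child elements.
var q = from c in xmlSource.Descendants("contact")
        where (int)c.Attribute("contactId") < 4
        select (string)c.Element("firstName") + " " + (string)c.Element("lastName");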
