XML Parsing with C#? - c#

I'm working on a project for school that involves a heavy amount of XML parsing. I'm coding in C#, but I have yet to find a "suitable" method of parsing this XML. There are several different ways I've looked at, but I haven't gotten it right yet, so I have come to you. Ideally, I'm looking for something kind of similar to Beautiful Soup in Python (sort of).
I was wondering if there was any way to convert XML like this:
<config>
<bgimg>C:\\background.png</bgimg>
<nodelist>
<node>
<oid>012345</oid>
<image>C:\\image.png</image>
<label>EHRV</label>
<tooltip>
<header>EHR Viewer</header>
<body>Version 1.0</body>
<icon>C:\\ico\ehrv.png</icon>
</tooltip>
<msgSource>8181:iqLog</msgSource>
</node>
</nodelist>
</config>
Into an Array/Hashtable/Dictionary/Other like this:
Array
(
["config"] => array
(
["bgimg"] => "C:\\background.png"
["nodelist"] => array
(
["node"] => array
(
["oid"] => "012345"
["image"] => "C:\\image.png"
["label"] => "EHRV"
["tooltip"] => array
(
["header"] => "EHR Viewer"
["body"] => "Version 1.0"
["icon"] => "C:\\ico\ehrv.png"
)
["msgSource"] => "8181:iqLog"
)
)
)
)
Even just giving me a decent resource to look through would be really helpful. Thanks a ton.

I would look into Linq to Xml. This gives you an object structure similar to the Xml file that is fairly easy to traverse.

XmlDocument + XPath is pretty much all you ever need in .NET to parse XML.

There must be 1/2 dozen different ways to do this in C#. My favorite uses the System.Xml namespace, particularly System.Xml.Serialization.
You use a command line tool called xsd.exe to turn an xml sample into an xsd schema file (tip: make sure your nodelist has more than one node in the sample), and then use it again on the schema to turn that into a C# class file you can load into your project and easily use with the System.Xml.Serialization.XmlSerializer class.

There's no shame in using an old-fashioned XmlDocument:
var xml = "<config>hello world</config>";
var doc = new System.Xml.XmlDocument();
doc.LoadXml(xml);
var nodes = doc.SelectNodes("/config");

You should definitely use LINQ to XML, a.k.a. XLINQ. There is a nice tool called LINQPad that you should check out. It has nice features, from a comprehensive examples library to letting you query a SQL database directly via LINQ to SQL. Best of all, it lets you test your queries before putting them into code.

The best approach will be dictated by what you actually want to do with the data once you've parsed it out.
If you want to pass it around in a structured-but-not-tied-to-XML fashion, XML Serialization is probably your best bet. This will also get you closest to what you've described, though you'll be dealing with an object graph rather than nested maps.
If you are just looking for a convenient format to query for specific bits of data, your best option would be LINQ to Xml. Alternatively, you could use the more traditional classes in the System.Xml namespace (starting with XmlDocument) and query using XPath.
You could also use any of these techniques (or an XmlTextReader) as building blocks to create the datastructure you've described but, barring some special need, I don't think it'll give you any more versatility than what the other approaches will.
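For completeness, here is a minimal sketch of building the nested-map structure from the question on top of LINQ to XML. The class and method names (XmlToDictionary, ToObject) are made up for illustration, attributes are ignored, and repeated sibling names are collected into a list, which is an assumption about what you would want for several node elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlToDictionary
{
    // Leaf elements become strings; elements with children become nested
    // dictionaries keyed by child name. Repeated child names become a List<object>.
    public static object ToObject(XElement element)
    {
        if (!element.HasElements)
            return element.Value;

        var map = new Dictionary<string, object>();
        foreach (var group in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            List<object> values = group.Select(e => ToObject(e)).ToList();
            map[group.Key] = values.Count == 1 ? values[0] : (object)values;
        }
        return map;
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<config><bgimg>bg.png</bgimg><nodelist><node><oid>012345</oid>" +
            "<label>EHRV</label></node></nodelist></config>");

        // The returned dictionary holds the children of <config>.
        var config = (Dictionary<string, object>)ToObject(root);
        var nodelist = (Dictionary<string, object>)config["nodelist"];
        var node = (Dictionary<string, object>)nodelist["node"];

        Console.WriteLine(config["bgimg"]); // bg.png
        Console.WriteLine(node["label"]);   // EHRV
    }
}
```

The casts in Main show the cost of this approach: every lookup has to be cast back to a dictionary, list or string, which is part of why the typed alternatives tend to be more pleasant.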

You can also use serialization to convert the XML text back into a strongly typed class instance.
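A rough sketch of what that could look like for a cut-down version of the config sample from the question. The Config and Node classes and the ConfigLoader helper are hypothetical, written by hand here rather than generated by xsd.exe, and only a few of the elements are mapped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes mirroring part of the <config> sample.
[XmlRoot("config")]
public class Config
{
    [XmlElement("bgimg")]
    public string BackgroundImage { get; set; }

    [XmlArray("nodelist")]
    [XmlArrayItem("node")]
    public List<Node> Nodes { get; set; }
}

public class Node
{
    [XmlElement("oid")]
    public string Oid { get; set; }

    [XmlElement("label")]
    public string Label { get; set; }
}

public static class ConfigLoader
{
    // Deserializes XML text into a strongly typed Config instance.
    public static Config Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Config));
        using (var reader = new StringReader(xml))
        {
            return (Config)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml =
            "<config><bgimg>bg.png</bgimg><nodelist>" +
            "<node><oid>012345</oid><label>EHRV</label></node>" +
            "</nodelist></config>";

        Config config = Parse(xml);
        Console.WriteLine(config.BackgroundImage); // bg.png
        Console.WriteLine(config.Nodes[0].Label);  // EHRV
    }
}
```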

I personally like to map XML elements to classes and viceversa using System.Xml.Serialization.XmlSerializer class.
http://msdn.microsoft.com/es-es/library/system.xml.serialization.xmlserializer(VS.80).aspx

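I personally use XPathDocument, XPathNavigator and XPathNodeIterator, e.g.
// "source" is a placeholder: pass a file path/URI, a Stream, a TextReader or an XmlReader.
XPathDocument xDoc = new XPathDocument(source);
XPathNavigator xNav = xDoc.CreateNavigator();
XPathNodeIterator iterator = xNav.Select("nodes/node[@SomePredicate = 'SomeValue']");
while (iterator.MoveNext())
{
    // SelectSingleNode returns an XPathNavigator (or null), so read its Value.
    XPathNavigator valueNode = iterator.Current.SelectSingleNode("nodeWithValue");
    string val = valueNode != null ? valueNode.Value : null;
    // etc etc
}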

Yeah, I agree.
The LINQ way is very nice.
And I especially like the way you write XML using it.
It is much simpler using the "objects in objects" way.

Related

What is the easiest way of handling xml files with C#?

I'm developing a windows app using C#. I chose xml for data storage.
It is required to read xml file, make small changes, and then write it back to hard disk.
Now, what is the easiest way of doing this?
XLinq (LINQ to XML) is much more comfortable than the ordinary XML classes because it is more object oriented, supports LINQ, has lots of built-in conversion operators, and serializes to the standard ISO format.
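As a rough illustration of the read, change, write-back cycle with LINQ to XML, here is a small sketch. The SettingsFile class, the file name and the element names are invented for the example; it assumes the settings live directly under the root element.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class SettingsFile
{
    // Loads the file, sets the child element <name> of the root to the given
    // value (adding the element if it does not exist yet) and saves the file.
    public static void SetValue(string path, string name, string value)
    {
        XDocument doc = XDocument.Load(path);
        doc.Root.SetElementValue(name, value);
        doc.Save(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "settings-sketch.xml");
        File.WriteAllText(path, "<settings><theme>light</theme></settings>");

        SetValue(path, "theme", "dark");

        // The explicit cast reads the element's text content.
        Console.WriteLine((string)XDocument.Load(path).Root.Element("theme")); // dark
    }
}
```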
The best way is to use XML Serialization where it loads the XML into a class (with various classes representing all the elements/attributes). You can then change the values in code and then serialize back to XML.
To create the classes, the best thing to do is to use xsd.exe which will generate the c# classes for you from an existing XML document.
I think the easiest way is to use the XmlDocument class:
var doc = new XmlDocument();
doc.Load("filename or Stream or TextReader or XmlReader");
// do something
doc.Save("filename or Stream or TextWriter or XmlWriter");
I think I found the easiest way, check out this Project in Codeproject. It is easy to use as XML elements are accessed similarly to array elements using name strings as indexes.
Code sample to write bool property to XML:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue = checkBoxAddStamp.Checked;
xcfg.Save("config.xml");
Sample to read the property:
Xmlconfig xcfg = new Xmlconfig("config.xml", true);
checkBoxAddStamp.Checked = xcfg.Settings[this.Name]["AddDateStamp"]["bool"].boolValue;
To write string use .Value, for int .intValue.
You can use LINQ to read XML Files as described here...
LINQ to read XML
Check out linq to XML

what's the fastest way to write XML

I need to create XML files frequently and I chose XmlWriter to do the job. I found that it spends a lot of time on things like WriteAttributeString (I need to write lots of attributes in some cases). My question is: is there a better way to create XML files? Thanks in advance.
The fastest way that I know is to write the document structure as a plain string and parse it into an XDocument object:
string str =
@"<?xml version=""1.0""?>
<!-- comment at the root level -->
<Root>
<Child>Content</Child>
</Root>";
XDocument doc = XDocument.Parse(str);
Console.WriteLine(doc);
Now you have a structured, ready-to-use XDocument object that you can populate with your data. You can also parse a fully structured and populated XML string and start from there. Alternatively, you can always build the document from structured XElements like this:
XElement doc =
new XElement("Inventory",
new XElement("Car", new XAttribute("ID", "1000"),
new XElement("PetName", "Jimbo"),
new XElement("Color", "Red"),
new XElement("Make", "Ford")
)
);
doc.Save("InventoryWithLINQ.xml");
Which will generate:
<Inventory>
<Car ID="1000">
<PetName>Jimbo</PetName>
<Color>Red</Color>
<Make>Ford</Make>
</Car>
</Inventory>
XmlSerializer
You only have to define hierarchy of classes you want to serialize, that is all. Additionally you can control the schema through some attributes applied to your properties.
Write it directly to a file via, for example, a FileStream (through manually created code). This can be made very fast, but is also pretty hard to maintain. As always, optimization comes with a price tag.
Also, do not forget that "premature optimization is the root of all evil".
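To make the XmlSerializer option above a bit more concrete, here is a minimal sketch. The Car class and the CarWriter helper are hypothetical, loosely modelled on the Inventory example earlier in this thread, and the attributes show how the element and attribute names can be controlled.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class; the attributes control how it maps to XML.
[XmlRoot("Car")]
public class Car
{
    [XmlAttribute("ID")]
    public int Id { get; set; }

    [XmlElement("PetName")]
    public string PetName { get; set; }

    [XmlElement("Color")]
    public string Color { get; set; }
}

public static class CarWriter
{
    // Serializes a Car to an XML string.
    public static string ToXml(Car car)
    {
        var serializer = new XmlSerializer(typeof(Car));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, car);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var car = new Car { Id = 1000, PetName = "Jimbo", Color = "Red" };
        Console.WriteLine(ToXml(car));
    }
}
```

Whether this is faster than XmlWriter for a given workload is something you would have to measure; the point of the sketch is only that the writing code disappears.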
Using anonymous types and serializing to XML is an interesting approach as mentioned here
How much is much time...is it 10 ms, 10 sec or 10 min...and how much of the whole process that writes an Xml is it?
Not saying that you shouldn't optimize but imo it's a matter of how much time do you want to spend optimizing that slight bit of a process. In the end the faster you wanna go, the more complex it will be to maintain in this case (personal opinion).
I personally like to use the XmlDocument type. It's still a bit heavy when writing nodes, but attributes are one-liners, and all in all it is way simpler than using XmlWriter.

XDocument or XmlDocument

I am now learning XmlDocument, but I've just run into XDocument, and when I search for the differences or benefits I can't find anything useful. Could you please tell me why you would use one over the other?
If you're using .NET version 3.0 or lower, you have to use XmlDocument aka the classic DOM API. Likewise you'll find there are some other APIs which will expect this.
If you get the choice, however, I would thoroughly recommend using XDocument aka LINQ to XML. It's much simpler to create documents and process them. For example, it's the difference between:
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("root");
root.SetAttribute("name", "value");
XmlElement child = doc.CreateElement("child");
child.InnerText = "text node";
root.AppendChild(child);
doc.AppendChild(root);
and
XDocument doc = new XDocument(
new XElement("root",
new XAttribute("name", "value"),
new XElement("child", "text node")));
Namespaces are pretty easy to work with in LINQ to XML, unlike any other XML API I've ever seen:
XNamespace ns = "http://somewhere.com";
XElement element = new XElement(ns + "elementName");
// etc
LINQ to XML also works really well with LINQ - its construction model allows you to build elements with sequences of sub-elements really easily:
// customers is a List<Customer>
XElement customersElement = new XElement("customers",
    customers.Select(c => new XElement("customer",
        new XAttribute("name", c.Name),
        new XAttribute("lastSeen", c.LastOrder),
        new XElement("address",
            new XAttribute("town", c.Town),
            new XAttribute("firstline", c.Address1)
            // etc
        ))));
It's all a lot more declarative, which fits in with the general LINQ style.
Now as Brannon mentioned, these are in-memory APIs rather than streaming ones (although XStreamingElement supports lazy output). XmlReader and XmlWriter are the normal ways of streaming XML in .NET, but you can mix all the APIs to some extent. For example, you can stream a large document but use LINQ to XML by positioning an XmlReader at the start of an element, reading an XElement from it and processing it, then moving on to the next element etc. There are various blog posts about this technique, here's one I found with a quick search.
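A minimal sketch of that mixed approach, assuming the elements of interest are identified by a single local name. The XmlStreaming class and the StreamElements method are names made up for this example; XNode.ReadFrom is the LINQ to XML call that turns the reader's current element into an XElement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreaming
{
    // Lazily yields each element with the given local name. Only one such
    // element is materialized as an XElement at a time.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string name)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == name)
            {
                // ReadFrom consumes the whole element and leaves the reader
                // on the node after it, so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    public static void Main()
    {
        string xml = "<customers><customer name='a'/><customer name='b'/></customers>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement customer in StreamElements(reader, "customer"))
                Console.WriteLine((string)customer.Attribute("name"));
        }
    }
}
```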
I am surprised none of the answers so far mentions the fact that XmlDocument provides no line information, while XDocument does (through the IXmlLineInfo interface).
This can be a critical feature in some cases (for example if you want to report errors in an XML, or keep track of where elements are defined in general) and you better be aware of this before you happily start to implement using XmlDocument, to later discover you have to change it all.
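For reference, a small sketch of reading line information with LINQ to XML. Line info is only recorded when the document is loaded with LoadOptions.SetLineInfo; the LineInfoDemo class and the LineOf helper are names invented for this example.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class LineInfoDemo
{
    // Returns the 1-based line number of the element, or -1 when the
    // document was loaded without LoadOptions.SetLineInfo.
    public static int LineOf(XElement element)
    {
        IXmlLineInfo info = element;
        return info.HasLineInfo() ? info.LineNumber : -1;
    }

    public static void Main()
    {
        string xml = "<root>\n  <child />\n</root>";
        XDocument doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);

        Console.WriteLine(LineOf(doc.Root.Element("child"))); // 2
    }
}
```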
XmlDocument is great for developers who are familiar with the XML DOM object model. It's been around for a while, and more or less corresponds to a W3C standard. It supports manual navigation as well as XPath node selection.
XDocument powers the LINQ to XML feature in .NET 3.5. It makes heavy use of IEnumerable<> and can be easier to work with in straight C#.
Both document models require you to load the entire document into memory (unlike XmlReader for example).
As mentioned elsewhere, undoubtedly, Linq to Xml makes creation and alteration of xml documents a breeze in comparison to XmlDocument, and the XNamespace ns + "elementName" syntax makes for pleasurable reading when dealing with namespaces.
One thing worth mentioning for xsl and xpath die hards to note is that it IS possible to still execute arbitrary xpath 1.0 expressions on Linq 2 Xml XNodes by including:
using System.Xml.XPath;
and then we can navigate and project data using xpath via these extension methods:
XPathSelectElement - Single Element
XPathSelectElements - Node Set
XPathEvaluate - Scalars and others
For instance, given the Xml document:
<xml>
<foo>
<baz id="1">10</baz>
<bar id="2" special="1">baa baa</bar>
<baz id="3">20</baz>
<bar id="4" />
<bar id="5" />
</foo>
<foo id="123">Text 1<moo />Text 2
</foo>
</xml>
We can evaluate:
var node = xele.XPathSelectElement("/xml/foo[@id='123']");
var nodes = xele.XPathSelectElements(
    "//moo/ancestor::xml/descendant::baz[@id='1']/following-sibling::bar[not(@special='1')]");
var sum = xele.XPathEvaluate("sum(//foo[not(moo)]/baz)");
XDocument is from the LINQ to XML API, and XmlDocument is the standard DOM-style API for XML. If you know DOM well, and don't want to learn LINQ to XML, go with XmlDocument. If you're new to both, check out this page that compares the two, and pick which one you like the looks of better.
I've just started using LINQ to XML, and I love the way you create an XML document using functional construction. It's really nice. DOM is clunky in comparison.
Also, note that XDocument is supported in Xbox 360 and Windows Phone OS 7.0.
If you target them, develop for XDocument or migrate from XmlDocument.
I believe that XDocument makes a lot more object creation calls. I suspect that when you're handling a lot of XML documents, XmlDocument will be faster.
One place this happens is in managing scan data. Many scan tools output their data in XML (for obvious reasons). If you have to process a lot of these scan files, I think you'll have better performance with XmlDocument.

C#, XML Query Question

I have tons of XML files, all containing the same kind of XML document but with different values. The structure is the same for each file.
Inside this file I have a datetime field.
What is the best, most efficient way to query these XML files? So I can retrieve for example... All files where the datetime field = today's date?
I'm using C# and .net v2. Should I be using XML objects to achieve this or text in file search routines?
Some code examples would be great... or just the general theory, anything would help, thanks...
This depends on the size of those files, and how complex the data actually is. As far as I understand the question, for this kind of XML data, using an XPath query and going through all the files might be the best approach, possibly caching the files in order to lessen the parsing overhead.
Have a look at:
XPathDocument, XmlDocument classes and XPath queries
http://support.microsoft.com/kb/317069
Something like this should do (not tested though):
XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
// if required, add your namespace prefixes here to nsmgr
XPathExpression expression = XPathExpression.Compile("//element[@date='20090101']", nsmgr); // your query as XPath
foreach (string fileName in Directory.GetFiles("PathToXmlFiles", "*.xml")) {
XPathDocument doc;
using (XmlTextReader reader = new XmlTextReader(fileName, nsmgr.NameTable)) {
doc = new XPathDocument(reader);
}
if (doc.CreateNavigator().SelectSingleNode(expression) != null) {
// matching document found
}
}
Note: while you can also load an XPathDocument directly from a URI/path, using the reader makes sure that the same nametable is used as the one used to compile the XPath query. If a different nametable were used, you would not get results from the query.
You might look into running XSL queries. See also XSLT Tutorial, XML transformation using Xslt in C#, How to query XML with an XPath expression by using Visual C#.
This question also relates to another on Stack Overflow: Parse multiple XML files with ASP.NET (C#) and return those with particular element. The accepted answer there, though, suggests using Linq.
If it is at all possible to move to C# 3.0 / .NET 3.5, LINQ-to-XML would be by far the easiest option.
With .NET 2.0, you're stuck with either XML objects or XSL.
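If moving to .NET 3.5 is an option, the LINQ to XML version could look roughly like this. The element name "element" and the "date" attribute are taken from the XPath sample earlier in this thread and are placeholders for your real structure; XmlFileFilter and FilesWithDate are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class XmlFileFilter
{
    // Returns the paths of all *.xml files in the directory that contain an
    // <element> whose date attribute equals the given value.
    public static IEnumerable<string> FilesWithDate(string directory, string date)
    {
        return Directory.GetFiles(directory, "*.xml")
            .Where(file => XDocument.Load(file)
                .Descendants("element")
                .Any(e => (string)e.Attribute("date") == date));
    }

    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), "xml-filter-sketch");
        Directory.CreateDirectory(dir);
        File.WriteAllText(Path.Combine(dir, "a.xml"), "<root><element date='20090101'/></root>");
        File.WriteAllText(Path.Combine(dir, "b.xml"), "<root><element date='20081231'/></root>");

        foreach (string file in FilesWithDate(dir, "20090101"))
            Console.WriteLine(Path.GetFileName(file));
    }
}
```

Each file is still parsed in full, so for a very large number of files the caching advice from the XPath answer applies here as well.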

C# xml read/write/xpath without using XmlDocument

I am refactoring some code in an existing system. The goal is to remove all instances of the XmlDocument to reduce the memory footprint. However, we use XPath to manipulate the xml when certain rules apply. Is there a way to use XPath without using a class that loads the entire document into memory? We've replaced all other instances with XmlTextReader, but those only worked because there is no XPath and the reading is very simple.
Some of the XPath uses values of other nodes to base its decision on. For instance, the value of the message node may be based on the value of the amount node, so there is a need to access multiple nodes at one time.
If your XPath expression is based on accessing multiple nodes, you're just going to have to read the XML into a DOM. Two things, though. First, you don't have to read all of it into a DOM, just the part you're querying. Second, which DOM you use makes a difference; XPathDocument is read-only and tuned for XPath query speed, unlike the more general-purpose but more expensive XmlDocument.
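A hedged sketch of the first point: use a forward-only XmlReader to find the element of interest, then load only that subtree into an XPathDocument and query it. The order, amount and message element names are invented to mirror the question, and SubtreeQuery and QueryFirst are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class SubtreeQuery
{
    // Advances the reader to the next <elementName>, loads just that subtree
    // into an XPathDocument and returns the value of the first node matching
    // the XPath expression, or null if there is no such element or match.
    public static string QueryFirst(XmlReader reader, string elementName, string xpath)
    {
        if (!reader.ReadToFollowing(elementName))
            return null;

        using (XmlReader subtree = reader.ReadSubtree())
        {
            XPathNavigator navigator = new XPathDocument(subtree).CreateNavigator();
            XPathNavigator match = navigator.SelectSingleNode(xpath);
            return match == null ? null : match.Value;
        }
    }

    public static void Main()
    {
        string xml =
            "<orders>" +
            "<order><amount>5</amount><message>small</message></order>" +
            "<order><amount>500</amount><message>large</message></order>" +
            "</orders>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // The message is selected based on the value of a sibling node.
            Console.WriteLine(QueryFirst(reader, "order", "/order[amount < 10]/message")); // small
        }
    }
}
```

Only one order is held in memory at a time, so the footprint depends on the size of the largest subtree rather than on the whole document.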
I suppose that using System.Xml.Linq.XDocument is also prohibited? Otherwise, it would be a good choice, as it is faster than XmlDocument (as I remember).
Supporting XPath means supporting queries like:
//address[/states/state[@code=current()/@code]='California']
or
//item[@id != preceding-sibling::item/@id]
which require the XPath processor to be able to look everywhere in the document. You're not going to find a forward-only XPath processor.
The way to do this is to use XPathDocument, which can be constructed from a TextReader, so you can pass it a StringReader.
XPathDocument still reads the document into memory, but into a read-only store optimized for XPath, which avoids the overhead of building the full editable DOM that XmlDocument creates.
Here is an example which returns the value of the first node that satisfies the XPath query:
public string extract(string input_xml)
{
    XPathDocument document = new XPathDocument(new StringReader(input_xml));
    XPathNavigator navigator = document.CreateNavigator();
    // SEARCH_EXPRESSION is the XPath query string, defined elsewhere.
    XPathNodeIterator node_iterator = navigator.Select(SEARCH_EXPRESSION);
    // MoveNext returns false when nothing matched the query.
    return node_iterator.MoveNext() ? node_iterator.Current.Value : null;
}
