I would like to know if there is any simple/fast way to create a complex XML file. By complex, I mean more than 20 nested elements. Although this is VB.NET, unfortunately XML literals will not work in this case. Any advice would be most appreciated.
You definitely want to use the System.Xml.Linq tools from .NET 3.5. Even if you aren't using LINQ at all, and not using XML literals, it's still a fantastic library for dynamically building XML in code. But since you say you can't use XML literals, does that mean you're in a .NET 3.0 or earlier project, and you can't upgrade? That would be very unfortunate, to not be able to use the best tooling.
Assuming you can use System.Xml.Linq, and you just have a silly boss who is against the XML literals syntactic sugar (some sort of language snob perhaps?), then you need to get familiar with the library, centered around the XElement class.
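To give a feel for what "centered around the XElement class" means, here is a minimal sketch of building a nested document with plain constructor calls and no XML literals. It is written in C#, but the same constructors are callable from VB.NET. The element and attribute names (order, customer, items, item, sku, qty) and the OrderXml helper are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class OrderXml
{
    // Builds <order id="..."><customer>...</customer><items><item .../>...</items></order>.
    // The shape of the document is hypothetical; only the XElement/XAttribute calls matter.
    public static XElement Build(int orderId, string customer,
                                 IEnumerable<KeyValuePair<string, int>> items)
    {
        return new XElement("order",
            new XAttribute("id", orderId),
            new XElement("customer", customer),          // text content is escaped for you
            new XElement("items",
                items.Select(i => new XElement("item",   // one <item> per input pair
                    new XAttribute("sku", i.Key),
                    new XAttribute("qty", i.Value)))));
    }
}
```

The nesting of the constructor calls mirrors the nesting of the output, so going 20 levels deep is mostly a matter of indentation, or of splitting sub-trees into small helper methods that each return an XElement.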
I would strongly suggest watching the following video casts on dnrtv, Part1 and Part2.
XLinq: Create XML from object using LINQ
To work with XML data on .NET 3.5, it's better to use LINQ to XML.
http://www.codeproject.com/Articles/24376/LINQ-to-XML
or
Manipulate XML data with XPath and XmlDocument (C#)
In the thread What’s your favorite “programmer ignorance” pet peeve?, the following answer appears, with a large number of upvotes:
Programmers who build XML using string concatenation.
My question is, why is building XML via string concatenation (such as a StringBuilder in C#) bad?
I've done this several times in the past, as it's sometimes the quickest way for me to get from point A to point B when it comes to the data structures/objects I'm working with. So far, I have come up with a few reasons why this isn't the greatest approach, but is there something I'm overlooking? Why should this be avoided?
Probably the biggest reason I can think of is you need to escape your strings manually, and most new programmers (and even some experienced programmers) will forget this. It will work great for them when they test it, but then "randomly" their apps will fail when someone throws an & symbol in their input somewhere. Ok, I'll buy this, but it's really easy to prevent the problem (SecurityElement.Escape to name one).
When I do this, I usually omit the XML declaration (i.e. <?xml version="1.0"?>). Is this harmful?
Performance penalties? If you stick with proper string concatenation (i.e. StringBuilder), is this anything to be concerned about? Presumably, a class like XmlWriter will also need to do a bit of string manipulation...
There are more elegant ways of generating XML, such as using XmlSerializer to automatically serialize/deserialize your classes. Ok sure, I agree. C# has a ton of useful classes for this, but sometimes I don't want to make a class for something really quick, like writing out a log file or something. Is this just me being lazy? If I am doing something "real" this is my preferred approach for dealing w/ XML.
You can end up with invalid XML, but you will not find out until you parse it again - and then it is too late. I learned this the hard way.
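A small sketch of that failure mode, assuming a hypothetical customer element: naive concatenation produces text that only fails once something parses it, while SecurityElement.Escape or XElement handles the escaping up front. The EscapingDemo class and its method names are made up for this example.

```csharp
using System.Security;
using System.Xml;
using System.Xml.Linq;

public static class EscapingDemo
{
    // Naive concatenation: fine until the input contains &, < or similar.
    public static string Concatenated(string name)
    {
        return "<customer>" + name + "</customer>";
    }

    // Concatenation with manual escaping: correct, but only if nobody forgets the call.
    public static string ConcatenatedEscaped(string name)
    {
        return "<customer>" + SecurityElement.Escape(name) + "</customer>";
    }

    // XElement escapes text content on serialization.
    public static string WithXElement(string name)
    {
        return new XElement("customer", name).ToString();
    }

    // True if the string parses as a single well-formed element.
    public static bool IsWellFormed(string xml)
    {
        try { XElement.Parse(xml); return true; }
        catch (XmlException) { return false; }
    }
}
```

With the input "Smith & Sons", the first method yields text that IsWellFormed rejects, and the other two yield text it accepts. The point is less that escaping is hard and more that the library version cannot be forgotten.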
I think readability, flexibility and scalability are important factors. Consider the following piece of Linq-to-Xml:
// Assumes "collection" is a sequence of objects exposing ProductId, Title and Version.
XDocument doc = new XDocument(new XDeclaration("1.0", "UTF-8", "yes"),
    new XElement("products", from p in collection
        select new XElement("product",
            new XAttribute("guid", p.ProductId),
            new XAttribute("title", p.Title),
            new XAttribute("version", p.Version))));
Can you find a way to do it easier than this? I can output it to a browser, save it to a document, add attributes/elements in seconds and so on ... just by adding couple lines of code. I can do practically everything with it without much of effort.
Actually, I find the biggest problem with string concatenation is not getting it right the first time, but rather keeping it right during code maintenance. All too often, a perfectly-written piece of XML using string concat is updated to meet a new requirement, and string concat code is just too brittle.
As long as the alternatives were XML serialization and XmlDocument, I could see the simplicity argument in favor of string concat. However, ever since XDocument et al., there is just no reason to use string concat to build XML anymore. See Sander's answer for the best way to write XML.
Another benefit of XDocument is that XML is actually a rather complex standard, and most programmers simply do not understand it. I'm currently dealing with a person who sends me "XML", complete with unquoted attribute values, missing end tags, improper case sensitivity, and incorrect escaping. But because IE accepts it (as HTML), it must be right! Sigh... Anyway, the point is that string concatenation lets you write anything, but XDocument will force standards-compliant XML.
I wrote a blog entry back in 2006 moaning about XML generated by string concatenation; the simple point is that if an XML document fails to validate (encoding issues, namespace issues and so on) it is not XML and cannot be treated as such.
I have seen multiple problems with XML documents that can be directly attributed to generating XML documents by hand using string concatenation, and nearly always around the correct use of encoding.
Ask yourself this; what character set am I currently encoding my document with ('ascii7', 'ibm850', 'iso-8859-1' etc)? What will happen if I write a UTF-16 string value into an XML document that has been manually declared as 'ibm850'?
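One way to avoid that mismatch is to let the writer own both the bytes and the declaration. The sketch below is only an illustration (EncodingDemo and ToBytes are invented names): it saves an XDocument through an XmlWriter whose settings carry the encoding, so the encoding named in the XML declaration is the one actually used to produce the bytes.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class EncodingDemo
{
    // Serializes the document to bytes in the given encoding.
    // The XmlWriter emits the XML declaration itself, using the same encoding
    // it uses for the output, so the two cannot drift apart.
    public static byte[] ToBytes(XDocument doc, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }
}
```

For example, calling ToBytes with new UTF8Encoding(false) gives UTF-8 bytes without a byte order mark, beginning with a declaration that names utf-8. With hand-concatenated strings, keeping the declaration and the bytes in agreement is entirely the caller's job.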
Given the richness of the XML support in .NET with XmlDocument and now especially with XDocument, there would have to be a seriously compelling argument for not using these libraries over basic string concatenation IMHO.
I think the problem is that you aren't treating the XML file as a logical data store, but as a simple text file where you write strings.
It's obvious that those libraries do string manipulation for you, but reading/writing XML should be something similar to saving data into a database or something logically similar.
If you need trivial XML then it's fine. It's just that the maintainability of string concatenation breaks down when the XML becomes larger or more complex. You pay either at development or at maintenance time. The choice is always yours, but history suggests that maintenance is always more costly, and thus anything that makes it easier is generally worthwhile.
You need to escape your strings manually. That's right. But is that all? Sure, you can put the XML spec on your desk and double-check every time that you've considered every possible corner-case when you're building an XML string. Or you can use a library that encapsulates this knowledge...
Another point against using string concatenation is that the hierarchical structure of the data is not clear when reading the code. In Sander's example of LINQ to XML, for example, it's clear to what parent element the "product" element belongs, to what element the "title" attribute applies, etc.
As you said, it's just awkward to build XML correctly using string concatenation, especially now that you have LINQ to XML, which allows for simple construction of an XML graph and will get namespaces, etc. correct.
Obviously context and how it is being used matters, such as in the logging example string.Format can be perfectly acceptable.
But too often people ignore these alternatives when working with complex XML graphs and just use a StringBuilder.
The main reason is DRY: Don't Repeat Yourself.
If you use string concat to do XML, you will constantly be repeating the functions that keep your string as a valid XML document. All the validation would be repeated, or not present. Better to rely on a class that is written with XML validation included.
I've always found creating an XML to be more of a chore than reading in one. I've never gotten the hang of serialization - it never seems to work for my classes - and instead of spending a week trying to get it to work, I can create an XML file using strings in a mere fraction of the time and write it out.
And then I load it in using an XMLReader tree. And if the XML file doesn't read as valid, I go back and find the problem within my saving routines and correct it. But I refuse to perform mission-critical work until I have a working save/load system and know my tools are solid.
I guess it comes down to programmer preference. Sure, there are different ways of doing things, for sure, but for developing/testing/researching/debugging, this would be fine. However I would also clean up my code and comment it before handing it off to another programmer.
Because regardless of the fact you're using StringBuilder or XMLNodes to save/read your file, if it is all gibberish mess, nobody is going to understand how it works.
Maybe it won't ever happen, but what if your environment switches to XML 2.0 someday? Your string-concatenated XML may or may not be valid in the new environment, but XDocument will almost certainly do the right thing.
Okay, that's a reach, but especially if your not-quite-standards-compliant XML doesn't specify an XML version declaration... just saying.
In one of the applications we are developing we do lot of XML processing. Currently we use DOM and XPath for most of the processing and we are not much happy with the performance.
At the moment we are considering moving the XML processing logic to LINQ, and our initial investigations suggest LINQ performance is much better than DOM.
Before making these changes I would like to know how others feel about this. Is using LINQ a better option? Any disadvantages etc...
Thanks,
Shamika
Thank you very much for your answers. I did some performance tests and, as expected, XmlReader outperformed both XmlDocument and LINQ. Please note that this is only for XML reading.
Also if you need the ease of use of LINQ you can implement LINQ XML processing by using some features of the XmlReader and can get much better performance than XmlDocument. Please refer to "rwwilden" comments for more information.
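A common way to combine the two is to let XmlReader walk the document and materialize only one element at a time as an XElement via XNode.ReadFrom. The following is a sketch of that pattern, not necessarily the exact code from the comments referred to above; StreamingXml, StreamElements and the elementName parameter are names chosen for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingXml
{
    // Lazily yields each element with the given local name as a standalone XElement.
    // Only one such element is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

The result can be queried with ordinary LINQ operators, for example StreamElements(file, "product").Where(e => (string)e.Attribute("title") == "x"). Whether this beats XmlDocument in your case is something to measure rather than assume.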
Thanks.
Using DOM (ie. System.Xml.XmlDocument) is likely to be slower, because of the rich navigation support (all those references start to add up), and this overhead will become more significant as the number of nodes increases.
Simpler object models (System.Xml.Linq.XDocument and System.Xml.XPath.XPathDocument) don't have such complex structures, but allow navigation by other means. This might add to CPU overhead but should save memory.
In the end you need to profile (time and space) in your case, and also consider how much real (user perceived) difference it makes.
But, for ultimate performance don't load the whole document into memory at all: use System.Xml.XmlReader and System.Xml.XmlWriter and do everything in a stream. Of course this adds development cost.
.NET has a rich (maybe too rich) set of XML APIs. Which one is best (or at least, least bad) for you can only be determined by weighing the trade-offs that matter in your case.
Personally I would avoid XmlDocument and use either XPathDocument (especially to read, and query with XPath) or XDocument (especially to create) where XmlReader/XmlWriter does not give enough of a performance boost to justify the extra development cost.
I'm not sure you would notice a very large performance improvement using LINQ2XML instead of DOM/XPath. For both DOM and LINQ2XML the document that you iterate over, is represented as an in-memory tree.
If performance really is an issue and you have rather large XML documents, you could take a look at the rudimentary XML streaming support that is implemented in the framework (via XStreamingElement). Also check this Microsoft XML team blog entry.
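As a rough illustration of XStreamingElement, the sketch below wraps a query in a streaming element instead of an XElement. The content is not enumerated when the object is constructed, only when it is written out, so the full tree never has to exist in memory. StreamingWrite, Describe and the numbers/n element names are invented for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamingWrite
{
    // Describes the output without materializing it: the query over "values"
    // runs only when the XStreamingElement is saved or converted to a string.
    public static XStreamingElement Describe(IEnumerable<int> values)
    {
        return new XStreamingElement("numbers",
            from v in values select new XElement("n", v));
    }

    // Streams the described XML to the given writer.
    public static void Write(IEnumerable<int> values, TextWriter output)
    {
        Describe(values).Save(output);
    }
}
```

Because evaluation is deferred, the source sequence can itself be lazy (for instance, rows read from a file or a data reader), which is where the memory saving comes from.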
My take on it is that LINQ -> XML is leaps and bounds easier to use than DOM. It's more intuitive to me and much easier to read IMO.
What are the best functions, practices, and/or techniques to read/write XML with C#/.NET?
If you are working with .NET 3.5, LINQ to XML is certainly a very good way to interact with XML.
MSDN Link
There are classes to read XML:
XmlDocument is slow and memory-intensive: it parses the XML and loads it into an in-RAM DOM, which is good if you want to edit it.
XmlReader is less memory-intensive: it scans the XML from front to back, never needing to keep all of it in RAM at once.
Similarly, for writing you can construct an XmlDocument and then save it, or use an XmlWriter.
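For the XmlWriter route, a minimal sketch might look like the following. It writes a small log-style document forward-only, without building a tree first. The log and entry element names and the LogWriter class are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class LogWriter
{
    // Writes <log><entry>...</entry>...</log> to the output, one entry at a time.
    public static void WriteEntries(TextWriter output, IEnumerable<string> messages)
    {
        var settings = new XmlWriterSettings { Indent = false, OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("log");
            foreach (var message in messages)
            {
                writer.WriteStartElement("entry");
                writer.WriteString(message);   // markup characters in the text are escaped
                writer.WriteEndElement();      // closes <entry>
            }
            writer.WriteEndElement();          // closes <log>
        }
    }
}
```

This is more verbose than XElement, since every start tag needs a matching WriteEndElement call, but memory use stays flat no matter how many entries are written.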
Since I wrote the above, a new set of APIs has appeared that is easier to use: for example, the XDocument and XElement classes.
By far the simplest method I've found for dealing with XML in C# is to use the XML Serialization tools. For example: http://www.dotnetjohn.com/articles.aspx?articleid=173.
Essentially, you can define C# classes that match your XML file (in fact, you can have them created for you if you have an XML definition file) and then you simply initialize instances of those classes directly from the XML file. Once you have them as instances, you can manipulate them as you wish and rewrite them back into XML files just as easily.
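A minimal sketch of that round trip with XmlSerializer, assuming a hypothetical Product class (the class, its members, and the ProductSerializer helper are all invented for illustration):

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical class matching XML of the form <Product title="..."><version>...</version></Product>.
public class Product
{
    [XmlAttribute("title")]
    public string Title { get; set; }

    [XmlElement("version")]
    public string Version { get; set; }
}

public static class ProductSerializer
{
    // Serializes an instance to an XML string.
    public static string ToXml(Product product)
    {
        var serializer = new XmlSerializer(typeof(Product));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, product);
            return writer.ToString();
        }
    }

    // Rebuilds an instance from an XML string.
    public static Product FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Product));
        using (var reader = new StringReader(xml))
        {
            return (Product)serializer.Deserialize(reader);
        }
    }
}
```

XmlSerializer only handles public types with a parameterless constructor and public read/write members, which is a frequent reason it "never seems to work" for existing classes. For the "created for you" path mentioned above, the xsd.exe tool that ships with the .NET Framework SDK can generate such classes from a schema.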
In a performance-critical application, XmlReader/XmlWriter are a good choice (see here), at the cost of the simplicity offered by LINQ to XML and XmlDocument.
I've found the MvpXml project very useful in past scenarios where performance is a consideration. There's a wealth of knowledge about good practice within their project pages: http://www.codeplex.com/MVPXML
How do I make the XDocument object save an attribute value of an element with single quotes?
I'm not sure that any of the formatting options for LINQ to XML allow you to specify that. Why do you need to? It's a pretty poor kind of XML handler which is going to care about it...
As long as you use single- and double-quotes in matched pairs and with correct nesting, standards-compliant XML processors won't care which style you use. Your question suggests that you are intending to process your XML output with tools that are not standards-compliant (or perhaps even not XML-aware). This is a dicey proposition at best, though I recognize that work situations and customer demands may not always give you the options of working with the right tools. I have co-workers who use sed and grep to sift through and modify XML files, and they often can get away with that. But if you have any choice at all, I recommend that you handle XML files with XML-aware tools all along the pipeline up to the point where the data is no longer marked up in XML. Doing otherwise will result in systems that are much more fragile than if you used XML-aware tools for all XML processing.
If you can't do that, then JacobE's suggestion is probably your best bet.
If it is absolutely necessary to have single quotes, you could write your XML document to a string and then use a string replace to change the attribute delimiters from double to single quotes.
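If post-processing the string feels too fragile (a blind replace would also touch quotes inside text content), one alternative worth trying is the older XmlTextWriter class, which has a QuoteChar property. This is a sketch under the assumption that saving through XmlTextWriter is acceptable in your setup; it is not something the answers above propose, and SingleQuoteWriter is an invented name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SingleQuoteWriter
{
    // Saves the document to a string using ' instead of " around attribute values.
    public static string Save(XDocument doc)
    {
        using (var text = new StringWriter())
        {
            using (var writer = new XmlTextWriter(text))
            {
                writer.QuoteChar = '\'';   // must be a single or double quote
                doc.Save(writer);
            }
            return text.ToString();
        }
    }
}
```

Since the writer knows which quote character it is using, it also takes care of escaping apostrophes that occur inside attribute values, which a plain string replace would not.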