what's the fastest way to write XML - c#

I need create XML files frequently and I choose XmlWrite to do the job, I found it spent much time on things like WriteAttributeString ( I need write lots of attributes in some cases), my question is are there some better way to create xml files? Thanks in advance.

Fastest way that I know is two write the document structure as a plain string and parse it into an XDocument object:
string str =
#"<?xml version=""1.0""?>
<!-- comment at the root level -->
<Root>
<Child>Content</Child>
</Root>";
XDocument doc = XDocument.Parse(str);
Console.WriteLine(doc);
Now you will have a structured and ready to use XDocument object where you can populate with your data. Also, you can even parse a fully structured and populated XML as string and start from there. Also you can always use structured XElements like this:
XElement doc =
new XElement("Inventory",
new XElement("Car", new XAttribute("ID", "1000"),
new XElement("PetName", "Jimbo"),
new XElement("Color", "Red"),
new XElement("Make", "Ford")
)
);
doc.Save("InventoryWithLINQ.xml");
Which will generate:
<Inventory>
<Car ID="1000">
<PetName>Jimbo</PetName>
<Color>Red</Color>
<Make>Ford</Make>
</Car>
</Inventory>

XmlSerializer
You only have to define hierarchy of classes you want to serialize, that is all. Additionally you can control the schema through some attributes applied to your properties.

Write it directly to a file via for example a FileStream (through manually created code). This can be made very fast, but also pretty hard to maintain. As always, optimizations comes with a prize tag.
Also, do not forget that "premature optimization is the root of all evil".

Using anonymous types and serializing to XML is an interesting approach as mentioned here

How much is much time...is it 10 ms, 10 sec or 10 min...and how much of the whole process that writes an Xml is it?
Not saying that you shouldn't optimize but imo it's a matter of how much time do you want to spend optimizing that slight bit of a process. In the end the faster you wanna go, the more complex it will be to maintain in this case (personal opinion).

I personally like to use XmlDocument type. It's still a bit heavy when writing nodes but attributes are one-liner, and all in all way simpler that using Xmlwrite.

Related

Most Efficient Way to Parse Only From Specific Keys in a Large XML with XMLReader

Suppose I have a large XML (200 - 1000+ MB) and I'm just looking to get a very small subset of data in the most efficient way.
Given a great solution from one of my previous questions, I ended up coding a solution to use an XMLReader mixed with XMLDocument / XPath.
So, supposing I have the following XML:
<Doc>
<Big_Element1>
... LOTS of sub-elements ...
</Big_Element1>
.....
<Small_Element1>
<Sub_Element1_1 />
...
<Sub_Element1_N />
</Small_Element1>
.....
<Small_Element2>
<Sub_Element2_1 />
...
<Sub_Element2_N />
</Small_Element2>
.....
<Big_ElementN>
.......
</Big_ElementN>
</Doc>
And all I really need is the data from the Small_Elements and the Big_Elements are definitely very large (with many small sub-elements within them) and, so, I'd like to not even enter them if I don't have to.
I came up with this form of solution:
Dim doc As XmlDocument
Dim xNd As XmlNode
Using reader As XmlReader = XmlReader.Create(uri)
reader.MoveToContent()
While reader.Read
If reader.NodeType = XmlNodeType.Element Then
Select Case UCase(reader.Name)
Case "SMALL_ELEMENT1"
doc = New XmlDocument
xNd = doc.ReadNode(reader)
GetSmallElement1Data(xNd)
Case "SMALL_ELEMENT2"
doc = New XmlDocument
xNd = doc.ReadNode(reader)
GetSmallElement2Data(xNd)
End Select
End If
End While
End Using
And GetSmallElement1Data(xNd) & GetSmallElement2Data(xNd) are easy enough for me to deal with since they're small and so I use XPath within them to get the data I need.
But my question is that it seems this reader still goes through the entire XML rather than just skipping over the Big_Elements. Or is it not / this the correct way to have programmed this??
Also, I know this sample code was written in VB.net, but I'm equally comfortable with c# / VB.net solutions.
Any help / thoughts would be great!!!
Thanks!!!
Suppose I have a large XML (200 - 1000+ MB)
XmlReader is the only approach that does not parse the whole document to create an in memory object model.
But my question is that it seems this reader still goes through the entire XML rather than just skipping over the Big_Elements. Or is it not / this the correct way to have programmed this??
The parser still has to read that content: it has no knowledge of what elements you are interested in.
Your only option to skip content (thus not returning to your code from XmlReader.Read) is to call XmlReader.Skip: telling the parser there are no descendants of the current node you are interested in. The parser will still need to read and parse the text to find the matching end node, but without your code being running this will be quicker.

how to change xml string value on fly

I have an xml response string and I want to change a value inside and log it.
<xml>
<ns2:abcd>
<password>sample</password>
</ns2:abcd>
I want to change the password value into encrypted version.
I am have tried using XmlDocument.SelectSingleNode but was thinking is there any better way than this?
Btw you need ns2 namespace to be declared, otherwise your xml will not be valid. After adding namespace definition, you can parse and modify your xml with Linq to Xml:
XDocument xdoc = XDocument.Parse(xml);
var passwordElement = xdoc.XPathSelectElement("//password");
passwordElement.Value = Encrypt((string)passwordElement);
xdoc.Save(path_to_xml);
No - there is no better way than using proper XML classes.
XmlDocument or XDocument would be perfectly fine for this task. If you XML is very large you may want to look into streaming with XmlReader (unlikely necessary in your case).
You might also consider looking into xsd.exe. With xsd.exe, you can deserialize your xml into a type-safe object model. From there, it's easy to manipulate the data.

How to work with an Xml file without loading the whole document in memory?

How to add a new node, update an existing node and remove an existing node of an xml document without loading the whole document in memory?
I'm having an xml document and treating it as the memory of my application so would need to be able to do hundreds of reads and writes quickly without loading the whole document.
its structure is like this:
<spiderMemory>
<profileSite profileId="" siteId="">
<links>
<link>
<originalUrl></originalUrl>
<isCrawled></isCrawled>
<isBroken></isBroken>
<isHtmlPage></isHtmlPage>
<firstAppearedLevel></firstAppearedLevel>
</link>
</links>
</profileSite>
</spiderMemory>
How would that be possible with XDocument?
Thanks
If you want to do hundreds of reads and writes quickly... you might be using the wrong technology. Have you tried using a plain old RDBMS?
If you still need the XML representation, then you can create an export methods to produce it from the database.
XML isn't really a good substitute for this kind of problem. Just saying.
Also... what is wrong with having the whole thing in memory? How big can it possibly get? Say 1GB? Suck it up. Say 1TB? Oops. But then XML is wrong, wrong, wrong anyway in that case ;) way too verbose!
You can use XmlReader, something like this :
FileStream stream = new FileStream("test.xml", FileMode.Open);
XmlReader reader = new XmlTextReader(stream);
while(reader.Read())
{
Console.WriteLine(reader.Value);
}
here is an more elaborate example http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx
As Daren Thomas said, the proper solution is to use RDBMS instead of XML for your needs. I have a partial solution using XML and Java. Stax parser does not parse the whole document in memory and is a lot faster than DOM (still XML parsing will always be slow). A 'pull parser' (eg Stax) allows u to control what gets parsed. A less cleaner way is to throw an exception in SAX parser when you get the element(s) needed.
To modify, the simplest (but slow) way is to use XPath. Another (untested) option is to treat XML file as text and then 'Search and replace' stuff. Here you can use all kinds of text search optimization.

XDocument or XmlDocument

I am now learning XmlDocument but I've just ran into XDocument and when I try to search the difference or benefits of them I can't find something useful, could you please tell me why you would use one over another ?
If you're using .NET version 3.0 or lower, you have to use XmlDocument aka the classic DOM API. Likewise you'll find there are some other APIs which will expect this.
If you get the choice, however, I would thoroughly recommend using XDocument aka LINQ to XML. It's much simpler to create documents and process them. For example, it's the difference between:
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("root");
root.SetAttribute("name", "value");
XmlElement child = doc.CreateElement("child");
child.InnerText = "text node";
root.AppendChild(child);
doc.AppendChild(root);
and
XDocument doc = new XDocument(
new XElement("root",
new XAttribute("name", "value"),
new XElement("child", "text node")));
Namespaces are pretty easy to work with in LINQ to XML, unlike any other XML API I've ever seen:
XNamespace ns = "http://somewhere.com";
XElement element = new XElement(ns + "elementName");
// etc
LINQ to XML also works really well with LINQ - its construction model allows you to build elements with sequences of sub-elements really easily:
// Customers is a List<Customer>
XElement customersElement = new XElement("customers",
customers.Select(c => new XElement("customer",
new XAttribute("name", c.Name),
new XAttribute("lastSeen", c.LastOrder)
new XElement("address",
new XAttribute("town", c.Town),
new XAttribute("firstline", c.Address1),
// etc
));
It's all a lot more declarative, which fits in with the general LINQ style.
Now as Brannon mentioned, these are in-memory APIs rather than streaming ones (although XStreamingElement supports lazy output). XmlReader and XmlWriter are the normal ways of streaming XML in .NET, but you can mix all the APIs to some extent. For example, you can stream a large document but use LINQ to XML by positioning an XmlReader at the start of an element, reading an XElement from it and processing it, then moving on to the next element etc. There are various blog posts about this technique, here's one I found with a quick search.
I am surprised none of the answers so far mentions the fact that XmlDocument provides no line information, while XDocument does (through the IXmlLineInfo interface).
This can be a critical feature in some cases (for example if you want to report errors in an XML, or keep track of where elements are defined in general) and you better be aware of this before you happily start to implement using XmlDocument, to later discover you have to change it all.
XmlDocument is great for developers who are familiar with the XML DOM object model. It's been around for a while, and more or less corresponds to a W3C standard. It supports manual navigation as well as XPath node selection.
XDocument powers the LINQ to XML feature in .NET 3.5. It makes heavy use of IEnumerable<> and can be easier to work with in straight C#.
Both document models require you to load the entire document into memory (unlike XmlReader for example).
As mentioned elsewhere, undoubtedly, Linq to Xml makes creation and alteration of xml documents a breeze in comparison to XmlDocument, and the XNamespace ns + "elementName" syntax makes for pleasurable reading when dealing with namespaces.
One thing worth mentioning for xsl and xpath die hards to note is that it IS possible to still execute arbitrary xpath 1.0 expressions on Linq 2 Xml XNodes by including:
using System.Xml.XPath;
and then we can navigate and project data using xpath via these extension methods:
XPathSelectElement - Single Element
XPathSelectElements - Node Set
XPathEvaluate - Scalars and others
For instance, given the Xml document:
<xml>
<foo>
<baz id="1">10</baz>
<bar id="2" special="1">baa baa</bar>
<baz id="3">20</baz>
<bar id="4" />
<bar id="5" />
</foo>
<foo id="123">Text 1<moo />Text 2
</foo>
</xml>
We can evaluate:
var node = xele.XPathSelectElement("/xml/foo[#id='123']");
var nodes = xele.XPathSelectElements(
"//moo/ancestor::xml/descendant::baz[#id='1']/following-sibling::bar[not(#special='1')]");
var sum = xele.XPathEvaluate("sum(//foo[not(moo)]/baz)");
XDocument is from the LINQ to XML API, and XmlDocument is the standard DOM-style API for XML. If you know DOM well, and don't want to learn LINQ to XML, go with XmlDocument. If you're new to both, check out this page that compares the two, and pick which one you like the looks of better.
I've just started using LINQ to XML, and I love the way you create an XML document using functional construction. It's really nice. DOM is clunky in comparison.
Also, note that XDocument is supported in Xbox 360 and Windows Phone OS 7.0.
If you target them, develop for XDocument or migrate from XmlDocument.
I believe that XDocument makes a lot more object creation calls. I suspect that for when you're handling a lot of XML documents, XMLDocument will be faster.
One place this happens is in managing scan data. Many scan tools output their data in XML (for obvious reasons). If you have to process a lot of these scan files, I think you'll have better performance with XMLDocument.

XML Parsing with C#?

I'm working on a project for school that involves a heavy amount of XML Parsing. I'm coding in C#, but I have yet to find a "suitable" method of parsing this XML out. There's several different ways I've looked at, but haven't gotten it right yet; so I have come to you. Ideally, I'm looking for something kind of similar to Beautiful Soup in Python (sort of).
I was wondering if there was any way to convert XML like this:
<config>
<bgimg>C:\\background.png</bgimg>
<nodelist>
<node>
<oid>012345</oid>
<image>C:\\image.png</image>
<label>EHRV</label>
<tooltip>
<header>EHR Viewer</header>
<body>Version 1.0</body>
<icon>C:\\ico\ehrv.png</icon>
</tooltip>
<msgSource>8181:iqLog</msgSource>
</nodes>
</nodeList>
<config>
Into an Array/Hastable/Dictionary/Other like this:
Array
(
["config"] => array
(
["bgimg"] => "C:\\background.png"
["nodelist"] => array
(
["node"] => array
(
["oid"] => "012345"
["image"] => "C:\\image.png"
["label"] => "Version 1.0"
["tooltip"] => array
(
["header"] => "EHR Viewer"
["body"] => "Version 1.0"
["icon"] => "C:\\ico\ehrv.png"
)
["msgSource"] => "8181:iqLog"
)
)
)
)
Even just giving me a decent resource to look through would be really helpful. Thanks a ton.
I would look into Linq to Xml. This gives you an object structure similar to the Xml file that is fairly easy to traverse.
XmlDocument + XPath is pretty much all you ever need in .NET to parse XML.
There must be 1/2 dozen different ways to do this in C#. My favorite uses the System.Xml namespace, particularly System.Xml.Serialization.
You use a command line tool called xsd.exe to turn an xml sample into an xsd schema file (tip: make sure your nodelist has more than one node in the sample), and then use it again on the schema to turn that into a C# class file you can load into your project and easily use with the System.Xml.Serialization.XmlSerializer class.
There's no shame in using an old-fashioned XmlDocument:
var xml = "<config>hello world</config>";
var doc = new System.Xml.XmlDocument();
doc.LoadXml(xml);
var nodes = doc.SelectNodes("/config");
You should defiantly use LINQ to XML, A.K.A. XLINQ. There is a nice tool called LINQPad that you should check out. It has nice features, from a comprehensive examples library to allowing you to directly query an SQL database via Linq to SQL. Best of all, it lets you test your queries before putting them into code.
The best approach will be dictated by what you actually want to do with the data once you've parsed it out.
If you want to pass it around in a structured-but-not-tied-to-XML fashion, XML Serialization is probably your best bet. This will also get you closest to what you've described, though you'll be dealing with an object graph rather than nested maps.
If you are just looking for a convenient format to query for specific bits of data, your best option would be LINQ to Xml. Alternatively, you could use the more traditional classes in the System.Xml namespace (starting with XmlDocument) and query using XPath.
You could also use any of these techniques (or an XmlTextReader) as building blocks to create the datastructure you've described but, barring some special need, I don't think it'll give you any more versatility than what the other approaches will.
You can also use serialization to convert the XML text back into a strongly typed class instance.
I personally like to map XML elements to classes and viceversa using System.Xml.Serialization.XmlSerializer class.
http://msdn.microsoft.com/es-es/library/system.xml.serialization.xmlserializer(VS.80).aspx
I personally use XPathDocument, XPathNavigator and XPathNodeIterator e.g.
XPathDocument xDoc = new XPathDocument(CHOOSE SOURCE!);
XPathNavigator xNav = xDoc.CreateNavigator();
XPathNodeIterator iterator = xNav.Select("nodes/node[#SomePredicate = 'SomeValue']");
while (iterator.MoveNext())
{
string val = iterator.Current.SelectSingleNode("nodeWithValue");
// etc etc
}
Yeah, i agree..
The linq-way is very nice.
And i especially like the way you write XML using it.
It is much more simple using the "objects in objects"-way.

Categories

Resources