What's the best way to update xml in a file? - c#

I have been looking all over for the best way to update XML in a file. I have just switched over to using XmlReader (coming from the XDocument method) for speed, since it doesn't have to read the entire file into memory.
My XmlReader method works perfectly. When I need to read a value, it opens the XML, starts reading, reads ONLY up to the node needed, then closes everything. It's very fast and effective.
Now that I have that working, I want to make a method that UPDATES XML that is already in place. I would like to keep the same idea and ONLY read into memory what is needed. So the idea would be to read up to the node I'm changing, then use the writer to UPDATE that value.
Everything I have seen uses an XmlReader to read while an XmlWriter writes everything. If I did that, I assume I would have to let it run through the entire file, just like XDocument would. See this answer for an example.
Is it possible to just use the reader, read up to the node I'm trying to edit, and then change the InnerXml or something?
What's the fastest and most efficient method to update XML in a file?
I would like to read into memory only what I'm trying to edit, not the whole file.
I would also like to account for nodes that do not exist (and need to be added).

By design, XmlReader represents a "read-only, forward-only" view of the document and cannot be used to update the content. Using the Load method of XmlDocument, XDocument or XElement will still cause the entire file to be read into memory. (Under the hood, XDocument and XElement still use an XmlReader.) However, you can combine a raw XmlReader and XElement by using the overloads of the Load method that take an XmlReader.
You don't describe your XML structure, but you would want to do something similar to this:
using System.Xml;
using System.Xml.Linq;

using (var reader = XmlReader.Create(@"c:\test.xml"))
{
    var document = XElement.Load(reader);
    document.Add(new XElement("branch", "leaves"));
    document.Save("Tree.xml");
}
To find a specific node (for example, with a specific attribute value), you'd want to do something similar to this:
var node = document.Descendants("branch")
.SingleOrDefault(e => (string)e.Attribute("name") == "foo");
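The answer above still loads the whole file. For the streaming approach the question describes, here is a minimal sketch under stated assumptions. A file on disk cannot be patched in place, so the usual pattern is to stream from an XmlReader into an XmlWriter that writes a new file, materialising only the element being changed via `XNode.ReadFrom`. The whole file is still read once, but memory stays bounded by that one subtree. It assumes a hypothetical layout where the value is a child element (`childName`) of the first element named `parentName`; `XElement.SetElementValue` updates that child or adds it if it is missing. `StreamingXmlUpdater` and `UpdateOrAdd` are illustrative names, and DTDs, entity references and the original declaration's encoding are not preserved.

```csharp
using System.Xml;
using System.Xml.Linq;

// Illustrative helper (hypothetical name): streams inputPath to outputPath,
// loading only the first <parentName> element into memory.
public static class StreamingXmlUpdater
{
    public static void UpdateOrAdd(string inputPath, string outputPath,
                                   string parentName, string childName, string newValue)
    {
        using var reader = XmlReader.Create(inputPath);
        using var writer = XmlWriter.Create(outputPath);
        bool updated = false;

        while (!reader.EOF)
        {
            if (!updated && reader.NodeType == XmlNodeType.Element && reader.LocalName == parentName)
            {
                // ReadFrom loads just this subtree and leaves the reader on the following node.
                var parent = (XElement)XNode.ReadFrom(reader);
                // Updates the child's text, or appends the child if it does not exist yet.
                parent.SetElementValue(childName, newValue);
                parent.WriteTo(writer);
                updated = true;
            }
            else
            {
                CopyCurrentNode(reader, writer);
                reader.Read();
            }
        }
    }

    // Writes the reader's current node (not its children) to the writer.
    private static void CopyCurrentNode(XmlReader r, XmlWriter w)
    {
        switch (r.NodeType)
        {
            case XmlNodeType.Element:
                w.WriteStartElement(r.Prefix, r.LocalName, r.NamespaceURI);
                w.WriteAttributes(r, true);   // returns the reader to the element afterwards
                if (r.IsEmptyElement) w.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                w.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                w.WriteString(r.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                w.WriteWhitespace(r.Value);
                break;
            case XmlNodeType.CDATA:
                w.WriteCData(r.Value);
                break;
            case XmlNodeType.Comment:
                w.WriteComment(r.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                w.WriteProcessingInstruction(r.Name, r.Value);
                break;
            // XmlDeclaration is skipped on purpose: XmlWriter emits its own declaration.
        }
    }
}
```

After it runs, the caller would typically swap the output file in for the original, for example with `File.Replace`.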

Related

Copy or clone a specific Element from XML to the same XML

I want to copy or clone a specific node or element from XML. I have tried a lot of code, but none of it worked. I'm programming in C#.
Here is my XML; I hope my problem is clear!
This is my XML before:
This is the XML I want:
I can't do this manually, because I need more than 30 more tools.
It really depends on what you are using to parse the XML.
I will give you info for the two classes most commonly used for parsing XML in .NET.
XmlDocument: then you can use .CloneNode
XDocument: then you can do something like this:
XElement toCopy = ...;
XElement copy = XElement.Parse(toCopy.ToString());
If you are not familiar with XML processing in .NET, there is plenty of information on MSDN about XDocument and XmlDocument.
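Since the question's XML isn't shown, the sketch below assumes a hypothetical `<Tools>` root with `<Tool name="...">` children. With LINQ to XML, the `XElement(XElement)` copy constructor makes a deep copy directly, which avoids the `ToString()`/`Parse` round trip; `AddAfterSelf` then places the copy next to the original. (With XmlDocument, the equivalent would be `CloneNode(true)` followed by the parent's `InsertAfter`.) `ElementCloner` and `CloneTool` are illustrative names.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative helper (hypothetical names; the <Tool name="..."> layout is an assumption).
public static class ElementCloner
{
    // Deep-copies the <Tool> whose "name" attribute equals sourceName,
    // renames the copy and inserts it directly after the original.
    public static XElement CloneTool(XElement root, string sourceName, string newName)
    {
        XElement source = root.Elements("Tool")
                              .Single(e => (string)e.Attribute("name") == sourceName);
        var copy = new XElement(source);          // copy constructor: deep clone, no re-parsing
        copy.SetAttributeValue("name", newName);  // change whatever must differ per copy
        source.AddAfterSelf(copy);
        return copy;
    }
}
```

Calling it in a loop, e.g. `for (int i = 1; i <= 30; i++) ElementCloner.CloneTool(root, "Drill", "Drill" + i);`, covers the "more than 30 tools" case. Because each copy is inserted right after the source, later copies end up before earlier ones.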

structure of selfnodes changes when creating an xml file from another

While creating an XML file from another one by cloning nodes from the source file into the target file in C#, empty elements like <noeud></noeud> become <noeud/>.
I've tried this:
if (nodeSource.InnerText.Equals(""))
{
    XmlNode nodeDestination = nodeSource.CloneNode(false);
    // ... nodeDestination is then added to the target document
}
Is there any way to keep the same structure?
The format <element/> is frequently called a self-closing element. It's 100% valid, and the preferred storage form. If you really need (why?) to write the expanded format (<element></element>), you can look at writing your own XmlTextWriter. This article will be helpful:
http://blogs.msdn.com/b/nareshjoshi/archive/2009/01/15/how-to-force-non-self-closing-tags-for-empty-nodes-when-using-xslcompiledtransform-class.aspx
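For XmlDocument specifically, there is also a lighter option than a custom writer: the `XmlElement.IsEmpty` property controls which form is serialised (`true` gives `<x/>`, `false` gives `<x></x>`). A minimal sketch that forces the long form on every childless element, meant to be run on the target document just before saving; `EmptyTagFormatter` is an illustrative name.

```csharp
using System.Linq;
using System.Xml;

// Illustrative helper (hypothetical name).
public static class EmptyTagFormatter
{
    // Marks every childless element so it is serialised as <x></x> instead of <x/>.
    public static void ExpandEmptyElements(XmlDocument doc)
    {
        // Materialise the matches first so the DOM isn't changed while the XPath result is enumerated.
        var empties = doc.SelectNodes("//*[not(node())]").Cast<XmlElement>().ToList();
        foreach (XmlElement e in empties)
            e.IsEmpty = false;   // false = long format "<x></x>"
    }
}
```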

How to work with an Xml file without loading the whole document in memory?

How can I add a new node, update an existing node and remove an existing node of an XML document without loading the whole document into memory?
I have an XML document that I treat as the memory of my application, so I need to be able to do hundreds of reads and writes quickly without loading the whole document.
Its structure is like this:
<spiderMemory>
  <profileSite profileId="" siteId="">
    <links>
      <link>
        <originalUrl></originalUrl>
        <isCrawled></isCrawled>
        <isBroken></isBroken>
        <isHtmlPage></isHtmlPage>
        <firstAppearedLevel></firstAppearedLevel>
      </link>
    </links>
  </profileSite>
</spiderMemory>
How would that be possible with XDocument?
Thanks
If you want to do hundreds of reads and writes quickly... you might be using the wrong technology. Have you tried using a plain old RDBMS?
If you still need the XML representation, you can create an export method to produce it from the database.
XML isn't really a good substitute for this kind of problem. Just saying.
Also... what is wrong with having the whole thing in memory? How big can it possibly get? Say 1GB? Suck it up. Say 1TB? Oops. But then XML is wrong, wrong, wrong anyway in that case ;) way too verbose!
You can use XmlReader, something like this:
using (FileStream stream = new FileStream("test.xml", FileMode.Open))
using (XmlReader reader = XmlReader.Create(stream))
{
    while (reader.Read())
    {
        Console.WriteLine(reader.Value);
    }
}
Here is a more elaborate example: http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx
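For the read side of the question, a common .NET middle ground between a raw XmlReader loop and XDocument is to stream with XmlReader and materialise one small element at a time with `XNode.ReadFrom`. A minimal sketch assuming the `spiderMemory` layout shown in the question; it does not help with writes, which still mean rewriting the file. `XmlStreaming` and `StreamElements` are illustrative names.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Illustrative helper (hypothetical names).
public static class XmlStreaming
{
    // Yields each <elementName> element as a small XElement; the rest of the
    // document is only scanned, never held in memory as a tree.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string elementName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                yield return (XElement)XNode.ReadFrom(reader);   // advances past the element
            else
                reader.Read();
        }
    }
}
```

Usage would look like `foreach (var link in XmlStreaming.StreamElements(reader, "link")) { var url = (string)link.Element("originalUrl"); }`, with the caller owning and disposing the reader.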
As Daren Thomas said, the proper solution is to use an RDBMS instead of XML for your needs. I have a partial solution using XML and Java. A StAX parser does not parse the whole document into memory and is a lot faster than DOM (though XML parsing will always be slow). A 'pull parser' (e.g. StAX) allows you to control what gets parsed. A less clean way is to throw an exception in a SAX parser once you get the element(s) you need.
To modify, the simplest (but slow) way is to use XPath. Another (untested) option is to treat the XML file as text and then 'search and replace' stuff. Here you can use all kinds of text search optimizations.

Delay-load of XmlDocument

I'm writing an XML document based on a stream of data. This part has been accomplished using the XmlTextWriter and the XElement classes.
Now, when I come to read in the document, I want to be able to 'delay-load' it so that certain nodes (i.e. the ones that contain large binary chunks) are skipped and then loaded when required.
Is this possible using the XmlDocument class? Or will I have to do things more manually, using the XmlTextReader class?
Thanks.
Nick.
Not possible with XmlDocument, as the whole document needs to be loaded into memory before it is parsed as a tree.
XmlTextReader/SAX is the standard solution.
This is not possible with either XmlDocument or XDocument.
Note that if you want to use XmlTextReader, it is forward-only, i.e. once you have skipped a node, you can't come back to it.
See MSDN on this.
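Since the reader can't go back, one way to approximate delay-loading is two passes: a first pass that notes which large elements exist and steps over them with `XmlReader.Skip()`, and a second pass that reopens the file when one of them is actually needed. The sketch assumes a hypothetical layout where the binary chunks are stored as base64 text inside `<blob id="...">` elements (the `blob` and `id` names are illustrative, as is `LazyBlobReader`); `ReadElementContentAsBase64` decodes the content in chunks, so the encoded text is never materialised as one string.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Illustrative helper (hypothetical names; assumes <blob id="...">base64</blob> elements).
public static class LazyBlobReader
{
    // Pass 1: collect the ids of <blob> elements without reading their content.
    public static List<string> ListBlobIds(string path)
    {
        var ids = new List<string>();
        using var reader = XmlReader.Create(path);
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "blob")
            {
                ids.Add(reader.GetAttribute("id"));
                reader.Skip();   // moves past </blob>; the text is scanned but never returned
            }
            else
            {
                reader.Read();
            }
        }
        return ids;
    }

    // Pass 2 (on demand): reopen the file, find one blob and decode it in chunks.
    public static byte[] LoadBlob(string path, string id)
    {
        using var reader = XmlReader.Create(path);
        while (reader.ReadToFollowing("blob"))
        {
            if (reader.GetAttribute("id") != id) continue;
            using var bytes = new MemoryStream();
            var buffer = new byte[4096];
            int read;
            while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
                bytes.Write(buffer, 0, read);
            return bytes.ToArray();
        }
        return null;   // no blob with that id
    }
}
```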

Best way to read, modify, and write XML

My plan is to read in an XML document using my C# program, search for particular entries that I'd like to change, and then write out the modified document. However, I've come unstuck, because with XmlTextReader, which I'm using to read in the file, it's hard to tell elements apart and whether they are starting or ending. I could do with a bit of advice to put me on the right track.
The document is an HTML document, so as you can imagine, it's quite complicated.
I'd like to search for an element id within the HTML document, for example look for this one and change its src:
<img border="0" src="bigpicture.png" width="248" height="36" alt="" id="lookforthis" />
If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument, XElement etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.
You can use XPath where that's appropriate, or the built-in axis methods (Elements(), Descendants(), Attributes() etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.
If, on the other hand, this is HTML that isn't valid XML, you'll have a much harder time, because XML APIs generally expect to work with valid XML documents. You could run HTML Tidy over it first, of course, but that may have undesirable effects.
For your specific example:
XDocument doc = XDocument.Load("file.xml");
foreach (var img in doc.Descendants("img"))
{
    // src will be null if the attribute is missing
    string src = (string) img.Attribute("src");
    img.SetAttributeValue("src", src + "with-changes");
}
doc.Save("file.xml"); // write the changes back out
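The loop above rewrites every `<img>`. To target only the element with a given `id`, as the question asks, filter on the attribute. One caveat worth checking: if the file is XHTML with a default `xmlns`, `Descendants("img")` matches nothing because the element names are namespaced, so this sketch compares `Name.LocalName` instead. `ImageUpdater` and `SetImageSrc` are illustrative names.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative helper (hypothetical names).
public static class ImageUpdater
{
    // Sets src on the first <img> whose id matches. Comparing LocalName lets the same
    // code work for plain XML and for XHTML, where elements are in the XHTML namespace.
    public static bool SetImageSrc(XDocument doc, string id, string newSrc)
    {
        XElement img = doc.Descendants()
                          .FirstOrDefault(e => e.Name.LocalName == "img"
                                            && (string)e.Attribute("id") == id);
        if (img == null) return false;   // no matching element
        img.SetAttributeValue("src", newSrc);
        return true;
    }
}
```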
Are the documents you are processing relatively small? If so, you could load them into memory using an XmlDocument object, modify it, and write the changes back out.
XmlDocument doc = new XmlDocument();
doc.Load("path_to_input_file");
// Make changes to the document.
using (XmlTextWriter xtw = new XmlTextWriter("path_to_output_file", Encoding.UTF8))
{
    xtw.Formatting = Formatting.Indented; // optional, if you want it to look nice
    doc.WriteContentTo(xtw);
}
Depending on the structure of the input XML, this could make your parsing code a bit simpler.
Here's a tool I wrote to modify an IAR EWARM project (ewp) file, adding a linker define to the project. From the command line, you run it with 2 arguments, the input and output file names (*.ewp).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;

namespace ewp_tool
{
    class Program
    {
        static void Main(string[] args)
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            XmlNodeList list = doc.SelectNodes("/project/configuration[name='Debug']/settings[name='ILINK']/data/option[name='IlinkConfigDefines']/state");
            foreach (XmlElement x in list)
            {
                x.InnerText = "MAIN_APP=1";
            }
            using (XmlTextWriter xtw = new XmlTextWriter(args[1], Encoding.UTF8))
            {
                //xtw.Formatting = Formatting.Indented; // leave this out, it breaks EWP!
                doc.WriteContentTo(xtw);
            }
        }
    }
}
The structure of the XML looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<project>
  <fileVersion>2</fileVersion>
  <configuration>
    <name>Debug</name>
    <toolchain>
      <name>ARM</name>
    </toolchain>
    <debug>1</debug>
    ...
    <settings>
      <name>ILINK</name>
      <archiveVersion>0</archiveVersion>
      <data>
        ...
        <option>
          <name>IlinkConfigDefines</name>
          <state>MAIN_APP=0</state>
        </option>
If you have smaller documents that fit in the computer's memory, you can use XmlDocument.
Otherwise you can use XmlReader to iterate through the document.
Using XmlReader, you can find out the node type like this:
while (xml.Read())
{
    switch (xml.NodeType)
    {
        case XmlNodeType.Element:
            // Do something
            break;
        case XmlNodeType.Text:
            // Do something
            break;
        case XmlNodeType.EndElement:
            // Do something
            break;
    }
}
For the task at hand (read an existing document, then modify it and write it out in a formalised way), I'd go with an XPathDocument run through an XslCompiledTransform.
Where you can't formalise, don't have pre-existing documents, or generally need more adaptive logic, I'd go with LINQ and XDocument, as Skeet says.
Basically, if the task is transformation, use XSLT; if the task is manipulation, use LINQ.
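For the transformation route, a minimal sketch of the usual XSLT pattern: an identity template that copies the input unchanged, plus one more specific template that overrides just the node to change. It assumes well-formed input, hard-codes the `lookforthis` id from the question and a placeholder `new.png` value, and works on strings for brevity where a real run would read and write files. `XsltRewrite` is an illustrative name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Illustrative helper (hypothetical name; "new.png" is a placeholder value).
public static class XsltRewrite
{
    // Identity transform that copies everything unchanged except the src
    // attribute of the img element whose id is "lookforthis".
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match=""img[@id='lookforthis']/@src"">
    <xsl:attribute name='src'>new.png</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(styleReader);

        // XPathDocument is a read-only store optimised for XPath/XSLT processing.
        var input = new XPathDocument(new StringReader(inputXml));
        var output = new StringWriter();
        // Passing OutputSettings makes the writer honour the stylesheet's xsl:output.
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
            xslt.Transform(input, writer);
        return output.ToString();
    }
}
```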
My favorite tool for this kind of thing is HtmlAgilityPack. I use it to parse complex HTML documents into LINQ-queryable collections. It is an extremely useful tool for querying and parsing HTML (which is often not valid XML).
For your problem, the code would look like:
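var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(stringOfHtml);
var images = htmlDoc.DocumentNode.SelectNodes("//img[@id='lookforthis']");
if (images != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlAgilityPack.HtmlNode node in images)
    {
        // SetAttributeValue overwrites an existing alt instead of adding a duplicate
        node.SetAttributeValue("alt", "added an alt to lookforthis images.");
    }
}
htmlDoc.Save("output.html");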
One fairly easy approach would be to create a new XmlDocument, then use the Load() method to populate it. Once you've got the document, you can use CreateNavigator() to get an XPathNavigator object that you can use to find and alter elements in the document. Finally, you can use the Save() method on the XmlDocument to write the changed document back out.
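A minimal sketch of that approach. In practice `doc.Load(path)` would come before it and `doc.Save(path)` after it; the helper name `SetAttribute` and the XPath used in the usage line are illustrative, based on the `<img>` example in the question.

```csharp
using System.Xml;
using System.Xml.XPath;

// Illustrative helper (hypothetical names).
public static class NavigatorEdit
{
    // Uses an editable XPathNavigator over an XmlDocument to change one attribute value.
    public static bool SetAttribute(XmlDocument doc, string xpathToAttribute, string value)
    {
        XPathNavigator nav = doc.CreateNavigator();            // navigators over XmlDocument are editable
        XPathNavigator attr = nav.SelectSingleNode(xpathToAttribute);
        if (attr == null) return false;                        // nothing matched
        attr.SetValue(value);                                  // writes straight into the DOM
        return true;
    }
}
```

For example: `NavigatorEdit.SetAttribute(doc, "//img[@id='lookforthis']/@src", "new.png");` followed by `doc.Save(path);`.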
Just start by reading the documentation of the System.Xml namespace on MSDN. Then, if you have more specific questions, post them here...
