structure of selfnodes changes when creating an xml file from another - c#

When I create an XML file from another one in C# by cloning nodes from the source file into the target file, empty nodes such as <noeud></noeud> become <noeud/>.
I've tried this:
XmlNode nodeDestination = null;
if (nodeSource.InnerText.Equals(""))
    nodeDestination = nodeSource.CloneNode(false);
Is there any way to keep the same structure?

The <element/> form is usually called a self-closing (empty-element) tag. It is 100% valid XML, means the same thing as <element></element>, and is generally the preferred way to store an empty element. If you really care (why?) about rewriting to the expanded form (<element></element>), you can look at writing your own XmlTextWriter. This article should help:
http://blogs.msdn.com/b/nareshjoshi/archive/2009/01/15/how-to-force-non-self-closing-tags-for-empty-nodes-when-using-xslcompiledtransform-class.aspx
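If the target is an XmlDocument, a lighter alternative to a custom writer may be the XmlElement.IsEmpty property, which controls whether an element is serialized in the short or the long form. The following is a minimal sketch under that assumption; the class and method names (EmptyElementFormat, ExpandEmptyElements, Demo) are hypothetical and not taken from the question or the linked article.

```csharp
using System.Xml;

public static class EmptyElementFormat
{
    // Recursively marks every childless element so that it is
    // serialized as <a></a> instead of <a />.
    public static void ExpandEmptyElements(XmlNode node)
    {
        if (node is XmlElement element && !element.HasChildNodes)
        {
            element.IsEmpty = false;
        }
        foreach (XmlNode child in node.ChildNodes)
        {
            ExpandEmptyElements(child);
        }
    }

    // Small demonstration on an in-memory document.
    public static string Demo()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><noeud/></root>");
        ExpandEmptyElements(doc.DocumentElement);
        return doc.OuterXml;
    }
}
```

Calling ExpandEmptyElements on the target document's root just before saving should be enough in simple cases. Note that it expands every empty element, including ones that were self-closing in the source.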

Related

Is it possible to convert a node's InnerXml to a new collection of nodes? (C#)

I have an XML file that I am trying to parse, loading its content into a collection of custom classes.
I need to take an XmlNode's InnerXml and extract (or create) an additional collection of XmlNode objects from that string.
I've googled as well as I can to find a solution, but nothing quite fits what I'm after. Is it possible to do that?
Thanks

What's the best way to update xml in a file?

I have been looking all over for the best way to update xml in a file. I have just switched over to using XmlReader (coming from the XDocument method) for speed (not having to read the entire file in memory).
My XmlReader method works perfectly: when I need to read a value, it opens the XML, starts reading, reads ONLY up to the node needed, then closes everything. It's very fast and effective.
Now that I have that working I want to make a method that UPDATES xml that is already in place. I would like to keep to the same idea and ONLY read in memory what is needed. So the idea would be, read up until the node I'm changing then use the writer to UPDATE that value.
Everything I have seen has an XmlReader reading while an XmlWriter writes everything out. If I did that, I assume I would have to let it run through the entire file, just like XDocument would. See this answer, for example.
Is it possible to maybe just use the reader and read up to the node I'm trying to edit then change the innerxml or something?
What's the fastest and most efficient method to update XML in a file?
I would like to only read into memory what I'm trying to edit, not the whole file.
I would also like to account for nodes that do not exist (that need to be added).
By design, XmlReader represents a "read-only forward-only" view of the document and cannot be used to update the content. Using the Load method of either XmlDocument, XDocument or XElement, will still cause the entire file to be read in to memory. (Under the hood, XDocument and XElement still use an XmlReader.) However, you can combine using a raw XmlReader and XElement together using the overloads of the Load method which take an XmlReader.
You don't describe your XML structure, but you would want to do something similar to this:
// requires: using System.Xml; using System.Xml.Linq;
using (var reader = XmlReader.Create(@"file://c:\test.xml"))
{
    var document = XElement.Load(reader);
    document.Add(new XElement("branch", "leaves"));
    document.Save("Tree.xml");
}
To find a specific node (for example, with a specific attribute value), you'd want to do something similar to this:
var node = document.Descendants("branch")
.SingleOrDefault(e => (string)e.Attribute("name") == "foo");
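To cover the case from the question where the node does not exist yet, the lookup can be combined with an add. This is a minimal sketch, assuming the same "branch" elements with a "name" attribute used in the snippets here; BranchUpdater and SetBranch are hypothetical names. It works on an XElement that is already loaded, so it does not avoid reading the tree into memory.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class BranchUpdater
{
    // Sets the text of the <branch name="..."> element under root,
    // adding a new element when no match exists.
    public static void SetBranch(XElement root, string name, string value)
    {
        XElement node = root.Descendants("branch")
            .SingleOrDefault(e => (string)e.Attribute("name") == name);

        if (node == null)
        {
            // Node is missing: append it to the root.
            root.Add(new XElement("branch", new XAttribute("name", name), value));
        }
        else
        {
            // Node exists: update its value in place.
            node.Value = value;
        }
    }
}
```

SingleOrDefault throws if more than one element matches, which may or may not be what you want; FirstOrDefault is the more forgiving choice.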

How to work with an Xml file without loading the whole document in memory?

How can I add a new node, update an existing node, and remove an existing node in an XML document without loading the whole document into memory?
I have an XML document that I treat as the memory of my application, so I need to be able to do hundreds of reads and writes quickly without loading the whole document.
Its structure is like this:
<spiderMemory>
  <profileSite profileId="" siteId="">
    <links>
      <link>
        <originalUrl></originalUrl>
        <isCrawled></isCrawled>
        <isBroken></isBroken>
        <isHtmlPage></isHtmlPage>
        <firstAppearedLevel></firstAppearedLevel>
      </link>
    </links>
  </profileSite>
</spiderMemory>
How would that be possible with XDocument?
Thanks
If you want to do hundreds of reads and writes quickly... you might be using the wrong technology. Have you tried using a plain old RDBMS?
If you still need the XML representation, you can create an export method to produce it from the database.
XML isn't really a good substitute for this kind of problem. Just saying.
Also... what is wrong with having the whole thing in memory? How big can it possibly get? Say 1GB? Suck it up. Say 1TB? Oops. But then XML is wrong, wrong, wrong anyway in that case ;) way too verbose!
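You can use XmlReader, something like this:
// requires: using System; using System.IO; using System.Xml;
using (FileStream stream = new FileStream("test.xml", FileMode.Open))
using (XmlReader reader = XmlReader.Create(stream))
{
    // Reads forward one node at a time; only the current node is held in memory.
    while (reader.Read())
    {
        Console.WriteLine(reader.Value);
    }
}
Here is a more elaborate example: http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx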
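For the structure in the question, one way to read without holding the whole file is to stream through it and materialize only one <link> element at a time. The sketch below assumes the element name "link" from the question; LinkStream and ReadLinks are hypothetical names. It covers reading only, not updating.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LinkStream
{
    // Lazily yields each <link> element as a small XElement.
    public static IEnumerable<XElement> ReadLinks(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "link")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A caller could iterate with foreach over ReadLinks(File.OpenText("test.xml")) and stop as soon as the wanted link is found, so the rest of the file is never parsed.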
As Daren Thomas said, the proper solution is to use an RDBMS instead of XML for your needs. I have a partial solution using XML and Java. A StAX parser does not load the whole document into memory and is a lot faster than DOM (though XML parsing will always be slow). A "pull parser" (e.g. StAX) lets you control what gets parsed. A less clean way is to throw an exception in a SAX parser once you have the element(s) you need.
To modify, the simplest (but slow) way is to use XPath. Another (untested) option is to treat the XML file as text and do a search and replace. Here you can use all kinds of text-search optimizations.

How to get the text from XML with position in the XML file?

I want to parse HTML (you can treat it as XML, converted via Tidy) and get all the text nodes (that is, the visible nodes inside the body tag) together with their location in the XML file. Location means the text position in the flat XML file.
XmlTextReader implements IXmlLineInfo - if you look at the docs for IXmlLineInfo it gives an example of reading an XML file and reporting the location of each node.
EDIT: For those saying it's irrelevant, it may well be irrelevant to the XML - but quite possibly not to a human. If you're trying to tell people where to look in the XML for particular bits, it can be very helpful to report line numbers and positions.
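As a rough illustration of the IXmlLineInfo approach, the sketch below collects each text node with its line and column. It assumes the reader returned by XmlReader.Create also implements IXmlLineInfo, which is why the cast is checked; TextLocator and Locate are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TextLocator
{
    // Returns every text node together with the line and column reported by the reader.
    public static List<(string Text, int Line, int Column)> Locate(TextReader input)
    {
        var results = new List<(string Text, int Line, int Column)>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Not every XmlReader exposes line information, so check before using it.
            var lineInfo = reader as IXmlLineInfo;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text
                    && lineInfo != null && lineInfo.HasLineInfo())
                {
                    results.Add((reader.Value, lineInfo.LineNumber, lineInfo.LinePosition));
                }
            }
        }
        return results;
    }
}
```

Line numbers and positions are 1-based. Whitespace-only runs between elements are reported as Whitespace nodes rather than Text, so they are skipped here.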
The SAX specification for reading XML (which almost all XML tools implement) provides a ContentHandler with a Locator which allows you to get the line and character (column) number.
int getColumnNumber()
Return the column number where the current document event ends.
int getLineNumber()
Return the line number where the current document event ends.
(I missed the requirement for C#. The example above is for Java but I will try to find the corresponding C# interface).
The event could be a string of characters.
SAX for .NET is described in:
http://saxdotnet.sourceforge.net/
You should not rely on text position in an XML file (insignificant whitespace is typically discarded or normalized by parsers). What you can (and should) do is use XPath to identify the nodes you are interested in, and then take the text out of those nodes. If you're interested in just the text nodes, the query "//text()" will grab all of them.
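In C#, the "//text()" query can be run with XmlDocument.SelectNodes. This is a minimal sketch; TextNodes and AllText are hypothetical names, and the whole document is loaded into memory.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class TextNodes
{
    // Returns the value of every text node in the document, in document order.
    public static List<string> AllText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var texts = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//text()"))
        {
            texts.Add(node.Value);
        }
        return texts;
    }
}
```

To restrict the result to the body, as in the question, the query could be narrowed to something like "//body//text()", assuming the converted document has no namespace on its elements.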

what is the best way to update an XML node value in C#?

My function iterates through every node of an XmlDocument instance. It checks whether the current node's name is in a lookup list. If it is, it applies the appropriate validation to the value of the current node.
When the validation method indicates that the value has been changed, I want to replace the value in the original document with the updated value.
I think the easiest way to achieve this might be to write out to an XmlTextWriter as I process each node in the original XmlDocument, writing out either the original or the modified node and value as appropriate. This method would rely on determining whether the current node has any children or is a stand-alone node.
Is there a better way to update the values in the original document? I need to end up with the complete XmlDocument, but with updated node values where appropriate.
Thanks in advance.
Can you not modify the existing nodes (which are already in the correct structure and in an XmlDocument), then re-serialise the XmlDocument? If the nodes are simple text containers, then the .InnerText property is the one you want.
I know I always go back to this, but this sounds like a case where clever use of apply-templates and ExtensionObjects in XSLT would be efficient.
That said, XmlDocument is optimised for modification, so if you were going with a purely programmatic solution I would modify the object directly rather than create a new writer.
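Modifying the document directly could look roughly like the sketch below. It assumes the lookup list maps element names to validation functions that return the (possibly changed) value, and that the listed elements are simple text containers; NodeValidator and Apply are hypothetical names, not the asker's code.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NodeValidator
{
    // Walks the tree below node and rewrites the text of every element whose
    // name appears in the lookup, using the validator registered for it.
    public static void Apply(XmlNode node, IDictionary<string, Func<string, string>> validators)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (!(child is XmlElement element))
            {
                continue; // skip text, comments and other non-element nodes
            }

            if (validators.TryGetValue(element.Name, out Func<string, string> validate))
            {
                string updated = validate(element.InnerText);
                if (updated != element.InnerText)
                {
                    // Assumes a text-only element: setting InnerText
                    // replaces all of the element's children.
                    element.InnerText = updated;
                }
            }
            else
            {
                Apply(element, validators);
            }
        }
    }
}
```

After Apply(doc.DocumentElement, validators) the original XmlDocument holds the updated values and can be saved with doc.Save as usual.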
