How to access 'row' and 'column' of nodes when parsing XML - c#

In C#, is there a way to work out an XmlNode's position in the original XML 'text', when the document is loaded from a file or string? I want to be able to report problems with an XML document that I am processing.
e.g:
"Error in foo.xml - value of attribute 'pet' must be a species of fluffy mammal, at line 27, column 13 [snippet of original XML text here...]"
Edit:
The checks can't be done using schema validation. Here is another, less frivolous sample error message to illustrate: "specified addin type 'Addins.LogWindow' must be public"

Well, you're not supposed to write your own XML parser, but in the Compact Framework we have no choice: XmlDocument is as slow as the Dalai Lama on ketamine, so we use an XmlReader when parsing an XML file.
We throw an exception whenever we find something messed up or inconsistent, and we pass the XmlReader to the exception. We can then extract the position by casting the XmlReader to IXmlLineInfo, which has properties for the line number and line position.
I don't know if this will help. Generally I wouldn't write my own XML parser on the desktop, which is why I'm reluctant to suggest this as a solution.
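
For what it's worth, here is a minimal sketch of that idea, assuming a reader created with XmlReader.Create over a string. The XmlContentException and PetChecker names and the "rabbit" rule are made up for illustration; only XmlReader and IXmlLineInfo come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical exception type that remembers where the reader was.
public class XmlContentException : Exception
{
    public int Line { get; }
    public int Column { get; }

    public XmlContentException(string message, XmlReader reader) : base(message)
    {
        // Not every XmlReader tracks positions, so check before using the values.
        var info = reader as IXmlLineInfo;
        if (info != null && info.HasLineInfo())
        {
            Line = info.LineNumber;
            Column = info.LinePosition;
        }
    }
}

public static class PetChecker
{
    // Hypothetical rule: every 'pet' attribute on <animal> must be "rabbit".
    public static void Check(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "animal")
                    continue;

                string pet = reader.GetAttribute("pet");
                if (pet != null && pet != "rabbit")
                    throw new XmlContentException(
                        "value of attribute 'pet' must be a species of fluffy mammal", reader);
            }
        }
    }
}
```

The caller can catch XmlContentException and put Line and Column into the error message. Both stay 0 if the reader does not provide line information.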

Would an XML Schema work for you?
http://support.microsoft.com/kb/318504

Sorry, there are very few DOM implementations that will remember the original parsed location of a Node for you. Most only report any position information on a parsing error. For example in DOM Level 3 LS you only get a reference to a DOMLocator when there is a DOMError.
The only implementation I know of that keeps track after parsing is pxdom, and that's for Python, so not of much use to you.

Related

Is there a way to have an XmlReader preserve a character reference as text rather than converting it?

I'm using an xml reader to parse some xml and I'm wondering if I can have it read in a character entity reference as straight text rather than converting it to the actual character. So if I called ReadInnerXml() on the node:
<param name="id">don&apos;t convert this</param>
I would get "don&apos;t convert this" as opposed to what I'm currently getting, which is "don't convert this". This is necessary because any characters or character entity references should be handed back the way they came, since they are legacy content.
Any help appreciated!
No, I don't know of any XML parser that has this feature. The job of an XML parser is to parse the input, and that's what it will do.
If you can't fix the consumer of this process to handle XML properly, your best bet might be to preprocess the text by replacing & by (say) § so it doesn't mean anything special to the XML parser.
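
A rough sketch of that preprocessing trick, assuming the placeholder character never occurs in the real content. The EntityPreserver class and its method name are invented for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityPreserver
{
    // Assumption: this character never appears in the legacy content.
    private const string Placeholder = "§";

    // Returns the inner XML of the first element with the given name,
    // with every '&' from the input left exactly as it was written.
    public static string ReadInnerXmlVerbatim(string xml, string elementName)
    {
        // Hide every '&' so the parser sees no entity references at all.
        string masked = xml.Replace("&", Placeholder);

        using (XmlReader reader = XmlReader.Create(new StringReader(masked)))
        {
            if (!reader.ReadToFollowing(elementName))
                return null;

            // Put the ampersands back in the text handed to the caller.
            return reader.ReadInnerXml().Replace(Placeholder, "&");
        }
    }
}
```

Calling ReadInnerXmlVerbatim on the param element from the question should hand back the text with &apos; intact. If the placeholder can occur in the data, pick a different one or escape it first.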

structure of selfnodes changes when creating an xml file from another

While creating an XML file from another one by cloning nodes from the source file to the target file in C#, empty nodes like <noeud></noeud> become <noeud/>.
I've tried this:
if (nodeSource.InnerText.Equals(""))
{
    XmlNode nodeDestination = nodeSource.CloneNode(false);
}
Is there any method to keep the same structure?
The format <element/> is frequently called a self-closing element. It's 100% valid, and the preferred storage method. If you really care (why?) about rewriting to the expanded format (<element></element>), you can look at writing your own XmlTextWriter. This article will be helpful for you.
http://blogs.msdn.com/b/nareshjoshi/archive/2009/01/15/how-to-force-non-self-closing-tags-for-empty-nodes-when-using-xslcompiledtransform-class.aspx
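
If a custom writer feels like overkill, another option (not the one from the article) is the XmlElement.IsEmpty property: setting it to false asks for the long tag form when the element is serialized. A small sketch, with EmptyElementExpander being a made-up helper name:

```csharp
using System;
using System.Xml;

public static class EmptyElementExpander
{
    // Marks every element without child nodes as "not empty",
    // so it is written as <a></a> instead of <a />.
    public static void Expand(XmlDocument doc)
    {
        // XPath: all elements that have no child nodes of any kind.
        foreach (XmlNode node in doc.SelectNodes("//*[not(node())]"))
        {
            ((XmlElement)node).IsEmpty = false;
        }
    }
}
```

Call Expand on the target document after cloning and before Save. Elements that already have children are not touched.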

Xml with Namespace but without xmlns

I am working on a program in C# that edits open-document files on xml level. For example it adds rows to tables.
So I load the content.xml into an XmlDocument "doc" and traverse the xml structure.
Say I have the <table:table-row> node in an XmlNode "row" and now I want to add a <table:table-cell> node to it. So I call
XmlDocument doc = new XmlDocument();
doc.Load(filename);
...
XmlNode row = ...;
...
XmlNode cell = doc.CreateElement("table:table-cell");
row.AppendChild(cell);
...
doc.Save(filename);
The problem is that, in the file, the new node only contains
<table-cell>...</table-cell>
C# just decides to ignore what I told it to and does something else without even telling me (at first I overlooked the problem and was wondering why it didn't work although the generated xml looked okay).
From what I have gathered so far, the problem has to do with the fact that "table:" is a namespace. When I also supply a NamespaceURI to CreateElement, I get
<table:table-cell xmlns:table="THE_URI" >... - but the original document did not have this xmlns, so I don't want it either...
I tried to use an XmlTextWriter and setting writer.Settings.Namespaces = false, because I thought, this should suppress the output of the xmlns, but it only caused an exception - the document has some namespaces, which are forbidden if Namespaces is set to false... (wtf!? suppressing the output of xmlns seems a billion times more logical than throwing an exception if an xmlns is present...)
In some similar discussions I read that you should set the cell.Name manually, but this property is read-only...
Others suggest to change it on text-file level (that's tinkering and it would be slow)
Can anyone give me a hint?
Every namespace should have at least one xmlns definition with a URI. This is the ultimate differentiation between two tags.
You can however have the xmlns attribute declared only once in the file (in the beginning).
See
Creating a specific XML document using namespaces in C#
The table: parts are not namespaces. They are "namespace prefixes". They are an alias for the actual namespace. They must be declared somewhere. If they are not declared at all in your source XML, then it is not valid XML, and you shouldn't expect to be able to process it.
Are you sure that what you have loaded is the entire XML document? They haven't left off parts to make it simpler? Those parts may be the ones that contain the definition of table:.
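
To make that concrete, here is a sketch that asks the row for the URI its document already binds to the "table" prefix and passes it to the three-argument CreateElement. TableCellAdder is a hypothetical helper, and the sketch assumes the prefix really is declared on an ancestor of the row.

```csharp
using System;
using System.Xml;

public static class TableCellAdder
{
    // Appends a <table:table-cell> to the given row, reusing the
    // namespace URI that is already in scope for the "table" prefix.
    public static XmlElement AppendCell(XmlNode row)
    {
        XmlDocument doc = row.OwnerDocument;

        // Look up the URI from the document instead of hard-coding it.
        string tableNs = row.GetNamespaceOfPrefix("table");
        if (string.IsNullOrEmpty(tableNs))
            throw new InvalidOperationException("The 'table' prefix is not declared.");

        // Arguments are prefix, local name, namespace URI.
        XmlElement cell = doc.CreateElement("table", "table-cell", tableNs);
        row.AppendChild(cell);
        return cell;
    }
}
```

Because the URI matches the declaration that is already in scope, the writer has no reason to emit a second xmlns:table attribute on the new cell. An extra declaration usually means the URI passed in differs from the one in the file.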

Parsing .plist Files to plain XML C#

I'm trying to read my Apple Safari history with c#, which is stored in a plist file, however I always get an error and I'm not sure what the correct way is to do it.
The code I tried to execute is this:
XmlDocument xmd = new XmlDocument();
xmd.LoadXml(@"C:\Users\Oran\AppData\Roaming\AppleComputer\Safari\History.plist");
and I always get the following error:
"Data at the root level is invalid. Line 1, position 1."
Does anyone know what's wrong with this code, and can you recommend the best way to read plist files?
It looks like Safari's History.plist is a binary plist. I've found a great project:
https://github.com/animetrics/PlistCS
From the readme:
This is a C# Property List (plist) serialization library (MIT license). It supports both XML and binary versions of the plist format.
Try this and everything should be fine ;-)
xmd.Load(...)
The method you used, LoadXml, loads the XML data from a string, not from a file.
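
A small side-by-side of the two calls, in case it helps. LoadDemo and its method names are made up. Note that this only addresses the Load/LoadXml mix-up: a binary plist will still fail to parse as XML.

```csharp
using System;
using System.Xml;

public static class LoadDemo
{
    // Load treats its argument as a file name (or URL) and reads that file.
    public static string RootNameFromFile(string path)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        return doc.DocumentElement.Name;
    }

    // LoadXml treats its argument as the markup itself.
    public static string RootNameFromText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.Name;
    }
}
```

Passing a path to RootNameFromText reproduces the error from the question, because the parser tries to read the path string itself as XML and throws an XmlException at the first character.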
A plist doesn't have to be XML. There are four different serialization methods — old-style (for NeXT; no longer used), XML, binary and JSON (new in 10.7). Safari's History.plist is most likely binary, for efficiency reasons.
If I'm not mistaken, Safari for Windows does ship with plutil.exe in Common Files\Apple Application Support. You can use that like plutil -convert xml1 SOME_FILE.plist to convert your file.
The problem is with the second line, saying
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
Option 1. Remove it before parsing.
Option 2. Read the MSDN page on the "XmlDocument.XmlResolver Property" and figure out how to make the XmlDocument download, parse and use the DTD from the URI specified in the XML.
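
If the DOCTYPE line does turn out to be the obstacle, a variation on Option 1 is to leave the file alone and tell the reader to skip the declaration. This sketch assumes the file is an XML plist (it does nothing for binary ones); DtdSkippingLoader is an invented name.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DtdSkippingLoader
{
    // Loads XML while ignoring any <!DOCTYPE ...> declaration,
    // so nothing is downloaded from the URI named in it.
    public static XmlDocument Load(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE instead of failing on it
            XmlResolver = null                    // never fetch external resources
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }
}
```

For a file on disk, pass a StreamReader opened on the plist path.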

How to get the text from XML with position in the XML file?

I want to parse HTML (you can assume it is XML, converted via Tidy) and get all the text nodes (that is, the visible nodes inside the body tag) along with their locations in the XML file. Location means the text position in the flat XML file.
XmlTextReader implements IXmlLineInfo - if you look at the docs for IXmlLineInfo it gives an example of reading an XML file and reporting the location of each node.
EDIT: For those saying it's irrelevant, it may well be irrelevant to the XML - but quite possibly not to a human. If you're trying to tell people where to look in the XML for particular bits, it can be very helpful to report line numbers and positions.
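
Something along these lines, as a sketch rather than the example from the docs. It uses XmlReader.Create instead of constructing an XmlTextReader directly and assumes the returned reader exposes IXmlLineInfo; TextNodeLocator is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TextNodeLocator
{
    // Returns one "line:column text" entry per text node in the input.
    public static List<string> Locate(string xml)
    {
        var results = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // The reader may or may not be able to report positions.
            var info = reader as IXmlLineInfo;

            while (reader.Read())
            {
                // Whitespace-only runs are reported as a different node type and skipped here.
                if (reader.NodeType != XmlNodeType.Text)
                    continue;

                if (info != null && info.HasLineInfo())
                    results.Add(info.LineNumber + ":" + info.LinePosition + " " + reader.Value);
                else
                    results.Add("?:? " + reader.Value);
            }
        }

        return results;
    }
}
```

The positions are line and column within the text that was parsed, so they only match the original file if the same bytes are fed to the reader.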
The SAX specification for reading XML (which almost all XML tools implement) provides a ContentHandler with a Locator which allows you to get the line and character (column) number.
int getColumnNumber()
Return the column number where the current document event ends.
int getLineNumber()
Return the line number where the current document event ends.
(I missed the requirement for C#. The example above is for Java but I will try to find the corresponding C# interface).
The event could be a string of characters.
SAX for .NET is described in:
http://saxdotnet.sourceforge.net/
You should not rely on text position in an XML file (whitespace is completely ignored by any sane parser). What you can (and should) do is use XPath to identify the nodes you are interested in, and then take out the text from those nodes. If you're interested in just the text nodes, then the query "//text()" will grab all the text nodes.
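
A quick sketch of the "//text()" query with XmlDocument, in case it's useful. TextExtractor is an invented helper, and the sketch relies on XmlDocument's default of not preserving whitespace-only nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TextExtractor
{
    // Collects the value of every text node in document order.
    public static List<string> AllText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var texts = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//text()"))
        {
            texts.Add(node.Value);
        }
        return texts;
    }
}
```

To restrict it to the visible part of a tidied HTML page, a narrower query such as "//body//text()" would be the natural variation, assuming the elements are not in a namespace.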
