I am parsing and iterating over XML (specifically XSD) elements (XmlSchemaElement https://learn.microsoft.com/en-us/dotnet/api/system.xml.schema.xmlschemaelement?view=netcore-3.1) using C# and .NET. While iterating over the nodes, I want to get their XPaths, but I don't know how to do that. I see there is an XmlSchemaXPath class available, but I don't know whether it can be used to get the XPath of an XmlSchemaElement.
For now I am building the XPath by concatenating node names into one string like "/node1/node1child1/node987..." while iterating over them, but this obviously isn't a good way to do it.
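For what it's worth, here is a minimal sketch of one way to build those paths by walking a compiled schema, rather than a built-in API for it. The class name SchemaPaths, the method names and the maxDepth guard are hypothetical; the sketch also assumes namespaces can be ignored in the path and only follows sequence/choice/all groups.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaPaths
{
    // Returns one "/a/b/c" style path per element declaration reachable from the global elements.
    public static List<string> Collect(string xsd, int maxDepth = 20)
    {
        var set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd)));
        set.Compile(); // ElementSchemaType and ContentTypeParticle are only populated after compilation

        var paths = new List<string>();
        foreach (XmlSchemaElement root in set.GlobalElements.Values)
        {
            VisitElement(root, "", paths, maxDepth);
        }
        return paths;
    }

    private static void VisitElement(XmlSchemaElement element, string parentPath,
                                     List<string> paths, int depthLeft)
    {
        // Assumption: local names are enough; namespace prefixes are left out.
        string path = parentPath + "/" + element.QualifiedName.Name;
        paths.Add(path);

        // depthLeft guards against recursive type definitions.
        if (depthLeft > 0 && element.ElementSchemaType is XmlSchemaComplexType complex)
        {
            VisitParticle(complex.ContentTypeParticle, path, paths, depthLeft - 1);
        }
    }

    private static void VisitParticle(XmlSchemaParticle particle, string path,
                                      List<string> paths, int depthLeft)
    {
        if (particle is XmlSchemaElement child)
        {
            VisitElement(child, path, paths, depthLeft);
        }
        else if (particle is XmlSchemaGroupBase group)
        {
            // xs:sequence, xs:choice and xs:all all derive from XmlSchemaGroupBase.
            foreach (XmlSchemaObject item in group.Items)
            {
                if (item is XmlSchemaParticle nested)
                {
                    VisitParticle(nested, path, paths, depthLeft);
                }
            }
        }
    }
}
```

This is still string concatenation underneath, but the recursion carries the parent path for you, so each element's path is available at the point where you visit it.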
Related
I have an xml file that I am trying to parse and load content from the file into a collection of custom classes.
I need to take an XmlNode's InnerXml and extract (or create) an additional collection of XmlNode objects from that string.
I've googled as well as I can to find a solution, but nothing quite fits what I'm after. Is it possible to do that?
thanks
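One possible approach, sketched here under the assumption that the InnerXml string is well-formed as a fragment: assign it to an XmlDocumentFragment, which parses it into child nodes. The class and method names (FragmentParser, ParseFragment) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class FragmentParser
{
    // Parses an InnerXml string (which may hold several sibling elements) into XmlNode objects.
    public static List<XmlNode> ParseFragment(XmlDocument owner, string innerXml)
    {
        XmlDocumentFragment fragment = owner.CreateDocumentFragment();
        fragment.InnerXml = innerXml; // throws XmlException if the string is not well-formed

        var nodes = new List<XmlNode>();
        foreach (XmlNode child in fragment.ChildNodes)
        {
            nodes.Add(child);
        }
        return nodes;
    }
}
```

The returned nodes belong to the owner document, so they can be appended elsewhere in it without importing.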
I have a dynamically loaded string list of simple XPath expressions that acts as a 'whitelist' of all the valid nodes that my result document should contain. What is the best way in C# to filter the source document to a result document that contains only nodes that match my list of XPath expressions? My source document is currently loaded as an XDocument. It needs to have optimal performance, possibly processing many of these per second. It sounded simple to me at first, but I'm struggling to find the best (or any) way to do it given the array of choices for XML processing in .NET. This XmlPathReader from someone at Microsoft that combines an XmlReader with XPath matching seemed promising, but it hasn't been touched in eight years, so I'm not sure about it.
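As a starting point only, here is a rough sketch of whitelist filtering on an XDocument using the XPath extension methods in System.Xml.XPath. It assumes the expressions select elements, that ancestors of a match must be kept so the match stays attached, and that unmatched child elements are dropped. WhitelistFilter and Filter are hypothetical names, and nothing here has been measured for performance.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class WhitelistFilter
{
    // Returns a copy of the source that keeps only whitelisted elements and their ancestors.
    public static XDocument Filter(XDocument source, IEnumerable<string> whitelist)
    {
        var result = new XDocument(source); // deep copy, so the source stays untouched
        var keep = new HashSet<XElement>();

        foreach (string xpath in whitelist)
        {
            foreach (XElement match in result.XPathSelectElements(xpath))
            {
                // Keep the match and every ancestor up to the root.
                foreach (XElement element in match.AncestorsAndSelf())
                {
                    keep.Add(element);
                }
            }
        }

        // Snapshot first, because removing while enumerating Descendants() is not safe.
        List<XElement> unwanted = result.Descendants().Where(e => !keep.Contains(e)).ToList();
        foreach (XElement element in unwanted)
        {
            element.Remove();
        }
        return result;
    }
}
```

If the same whitelist is applied to many documents, compiling the expressions once with XPathExpression.Compile and evaluating them through an XPathNavigator may be worth trying, but that is a guess to verify with a profiler.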
I'm attempting to find complete XML objects in a string. They have been placed in the string by an XmlSerializer, but may or may not be complete. I've toyed with the idea of using a regular expression, because it seems like the kind of thing they were built for, except for the fact that I'm trying to parse XML.
I'm trying to find complete objects in the form:
<?xml version="1.0"?>
<type>
<field>value</field>
...
</type>
My thought was a regex to find <?xml version="1.0"?><type> and </type>, but if a field has the same name as type, it obviously won't work.
There's plenty of documentation on XML parsers, but they seem to all need a complete, fully-formed document to parse. My XML objects can be in a string surrounded by pretty much anything else (including other complete objects).
hw<e>reR#lot$0fr#ndm&nchrs%<?xml version="1.0"?><type><field>...</field>...</type>#ndH#r$omOre!!>nuT6erjc?y!<?xml version="1.0"?><type><field>...</field>...</type>ty!=]
A regex would be able to match a string while excluding the random characters, but not find a complete XML object. I'd like some way to extract an object, parse it with a serializer, then repeat until the string contains no more valid objects.
Can you use a regular expression to search for the "<?xml" piece and assume that's the beginning of an XML object, then use an XmlReader to read and check the remainder of the string until you have parsed one entire element at the root level (and stop reading from the stream with the XmlReader once the root node has been completely parsed)?
Edit: For more information about using XMLReader, I suggest one of the questions I asked: I can never predict xmlreader behavior, any tips on understanding?
My final solution was to stick with the "Read" method when parsing XML and avoid other methods that actually read from the stream advancing the current position.
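A rough sketch of that idea follows, with a plain IndexOf standing in for the regex. It assumes every object starts with "<?xml" and uses XmlReader.ReadSubtree so that reading stops at the root's end tag instead of running into the trailing junk. EmbeddedXmlScanner and ExtractObjects are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class EmbeddedXmlScanner
{
    // Finds every complete XML object that starts at a "<?xml" marker inside arbitrary text.
    public static List<XElement> ExtractObjects(string text)
    {
        var found = new List<XElement>();
        int start = text.IndexOf("<?xml", StringComparison.Ordinal);

        while (start >= 0)
        {
            try
            {
                using (XmlReader reader = XmlReader.Create(new StringReader(text.Substring(start))))
                {
                    reader.MoveToContent(); // skips the XML declaration and lands on the root element
                    using (XmlReader subtree = reader.ReadSubtree())
                    {
                        // The subtree reader reports end-of-file at the root's end tag,
                        // so whatever follows the object is never parsed.
                        found.Add(XElement.Load(subtree));
                    }
                }
            }
            catch (XmlException)
            {
                // Incomplete or malformed object: skip it and try the next marker.
            }
            catch (InvalidOperationException)
            {
                // The reader was not positioned on an element: skip this marker.
            }

            start = text.IndexOf("<?xml", start + 1, StringComparison.Ordinal);
        }
        return found;
    }
}
```

Each XElement can then be handed to an XmlSerializer through element.CreateReader(). Taking a Substring per marker copies the text, which is fine for a sketch but may matter for very large inputs.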
You could try using the Html Agility Pack, which can be used to parse "malformed XML" and make it accessible with a DOM.
It would be necessary to know which element you are looking for (like <type> in your example), because it will be parsing the accidental elements too (like <e> in your example).
I have an XML document that I load in and try to search with XPath. The root node in this file is <t:Transmission xmlns:t='urn:InboundShipment'> and the file end is properly closed with </t:Transmission>.
My problem is that I cannot walk the tree without using a descendant axis. In other words, I can do: SelectSingleNode("//TransactionHeader[SHIPPERSTATE='CA']") and get a node in return. But I cannot do what should be the equivalent: SelectSingleNode("/Transmission/TransmissionBody/Transaction/TransactionHeader[SHIPPERSTATE='CA']")
If I remove the t: I can do an XPath search on /Transmission and get the whole file. With the t: in there I just get null. Or if I try SelectSingleNode("t:Transmission") I get an error with my XPath statement.
I generally do not need to query the root element, so I should be able to make do with just using the descendant axis for my searches. But the XML looks valid to me and so I'd like to know how to address this. Plus I don't want to ask the client to remove "t:" just because I don't know how to deal with it.
The "t:" is a namespace prefix, which is bound to the namespace 'urn:InboundShipment'. In order to handle it properly, you have to tell C# what the prefix is bound to. The documentation for System.Xml.XmlNamespaceManager explains how to register the prefix and pass the manager to your XPath query.
Edit: See this answer, as well.
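For illustration, a minimal sketch of the XmlNamespaceManager approach. It assumes, as the question's working "//TransactionHeader" query suggests, that only the root element carries the t: prefix and the child elements are in no namespace. NamespaceQuery and FindCaliforniaHeader are made-up names.

```csharp
using System.Xml;

public static class NamespaceQuery
{
    // Walks the tree from the prefixed root without relying on the descendant axis.
    public static XmlNode FindCaliforniaHeader(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Bind the prefix used in the XPath expression to the namespace URI.
        // The prefix here only has to match the expression, not the document.
        var namespaces = new XmlNamespaceManager(doc.NameTable);
        namespaces.AddNamespace("t", "urn:InboundShipment");

        return doc.SelectSingleNode(
            "/t:Transmission/TransmissionBody/Transaction/TransactionHeader[SHIPPERSTATE='CA']",
            namespaces);
    }
}
```

Calling SelectSingleNode("t:Transmission") without passing the manager is what produces the XPath error, because the prefix cannot be resolved.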
I am refactoring some code in an existing system. The goal is to remove all instances of the XmlDocument to reduce the memory footprint. However, we use XPath to manipulate the xml when certain rules apply. Is there a way to use XPath without using a class that loads the entire document into memory? We've replaced all other instances with XmlTextReader, but those only worked because there is no XPath and the reading is very simple.
Some of the XPath uses values of other nodes to base its decision on. For instance, the value of the message node may be based on the value of the amount node, so there is a need to access multiple nodes at one time.
If your XPath expression is based on accessing multiple nodes, you're just going to have to read the XML into a DOM. Two things, though. First, you don't have to read all of it into a DOM, just the part you're querying. Second, which DOM you use makes a difference; XPathDocument is read-only and tuned for XPath query speed, unlike the more general-purpose but expensive XmlDocument.
I suppose that using System.Xml.Linq.XDocument is also prohibited? Otherwise, it would be a good choice, as it is faster than XmlDocument (as I remember).
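To make the "just the part you're querying" point concrete, here is a sketch that streams with an XmlReader and loads one element at a time into an XPathDocument. The element names record, amount and message, the threshold of 100, and the names PartialDom and MessagesForLargeAmounts are all assumptions for the example; it also assumes record elements are not nested inside each other.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class PartialDom
{
    // Streams through the input and only materializes one <record> subtree at a time.
    public static List<string> MessagesForLargeAmounts(TextReader input)
    {
        var messages = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing("record"))
            {
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    // Only this record is held in memory while the XPath runs.
                    XPathNavigator navigator = new XPathDocument(subtree).CreateNavigator();

                    // The message depends on a sibling value, so both must be visible at once.
                    XPathNavigator message =
                        navigator.SelectSingleNode("/record[amount > 100]/message");
                    if (message != null)
                    {
                        messages.Add(message.Value);
                    }
                }
            }
        }
        return messages;
    }
}
```

Memory use is then bounded by the largest single record rather than by the whole file, as long as no rule needs to look across records.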
Supporting XPath means supporting queries like:
//address[/states/state[@code=current()/@code]='California']
or
//item[@id != preceding-sibling/item/@id]
which require the XPath processor to be able to look everywhere in the document. You're not going to find a forward-only XPath processor.
The way to do this is to use XPathDocument, which can take a Stream or a TextReader, so you can pass it a StringReader.
This reads the input in a single forward pass and avoids the overhead of XmlDocument's editable DOM, although XPathDocument still keeps a read-only representation of the document in memory.
Here is an example which returns the value of the first node that satisfies the XPath query:
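using System.IO;
using System.Xml.XPath;

// SEARCH_EXPRESSION is assumed to be an XPath string defined elsewhere in the class.
public string extract(string input_xml)
{
    // XPathDocument builds a read-only, XPath-optimized view of the input.
    XPathDocument document = new XPathDocument(new StringReader(input_xml));
    XPathNavigator navigator = document.CreateNavigator();
    XPathNodeIterator node_iterator = navigator.Select(SEARCH_EXPRESSION);
    // MoveNext returns false when nothing matches; return null in that case.
    return node_iterator.MoveNext() ? node_iterator.Current.Value : null;
}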