Parsing XML: Colon in my element causes XPath to miss it - c#

I have an XML document that I load in and try to search with XPath. The root node in this file is <t:Transmission xmlns:t='urn:InboundShipment'> and the file end is properly closed with </t:Transmission>.
My problem is that I cannot walk the tree without using a descendant axis. In other words, I can do: SelectSingleNode("//TransactionHeader[SHIPPERSTATE='CA']") and get a node in return. But I cannot do what should be the equivalent: SelectSingleNode("/Transmission/TransmissionBody/Transaction/TransactionHeader[SHIPPERSTATE='CA']")
If I remove the t: prefix from the file, an XPath search on /Transmission returns the whole document. With the t: in place I just get null, and SelectSingleNode("t:Transmission") throws an error about my XPath statement.
I generally do not need to query the root element, so I should be able to make do with just using the descendant axis for my searches. But the XML looks valid to me and so I'd like to know how to address this. Plus I don't want to ask the client to remove "t:" just because I don't know how to deal with it.

The "t:" is a namespace prefix, which is bound to the namespace 'urn:InboundShipment'. To handle it properly, you have to tell the XPath engine which namespace the prefix is bound to. This page should explain how to use System.Xml.XmlNamespaceManager to do that.
Edit: See this answer, as well.
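A minimal sketch of that approach, assuming a hypothetical document shaped like the one in the question: only the root element carries the t: prefix, which is why the unprefixed //TransactionHeader query already matched. The prefix is bound with XmlNamespaceManager.AddNamespace, and the manager is passed as the second argument to SelectSingleNode. The class and method names are illustrative.

```csharp
using System.Xml;

public static class ShipmentQuery
{
    // Hypothetical sample shaped like the question's file: only the root element
    // uses the t: prefix, so the unprefixed descendants are in no namespace.
    public const string SampleXml =
        "<t:Transmission xmlns:t='urn:InboundShipment'>" +
        "<TransmissionBody><Transaction><TransactionHeader>" +
        "<SHIPPERSTATE>CA</SHIPPERSTATE>" +
        "</TransactionHeader></Transaction></TransmissionBody>" +
        "</t:Transmission>";

    public static XmlNode FindHeader(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Bind an XPath prefix to the namespace URI. Matching is done on the URI,
        // so the prefix used here need not be the one used in the document.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("t", "urn:InboundShipment");

        return doc.SelectSingleNode(
            "/t:Transmission/TransmissionBody/Transaction/TransactionHeader[SHIPPERSTATE='CA']",
            nsmgr);
    }

    // An unprefixed name test only matches elements in no namespace,
    // so this returns null for the sample above.
    public static XmlNode FindRootWithoutPrefix(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode("/Transmission");
    }
}
```

On the sample, FindHeader should return the TransactionHeader element while FindRootWithoutPrefix returns null. If the child elements in the real file are also in urn:InboundShipment (for example through a default xmlns declaration), every step of the path needs the prefix as well.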

Related

Getting XPath from XmlSchemaElement

I am parsing and iterating over XML Schema (XSD) elements (XmlSchemaElement https://learn.microsoft.com/en-us/dotnet/api/system.xml.schema.xmlschemaelement?view=netcore-3.1) using C# and .NET. While iterating over the nodes, I want to get their XPaths, but I don't know how to do that. I see there is an XmlSchemaXPath class, but I don't know whether it can be used to get the XPath of an XmlSchemaElement.
For now I build the XPath by concatenating node names into one string like "/node1/node1child1/node987..." while iterating over them, but this obviously isn't a good way to do it.
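XmlSchemaXPath models the XPath expressions inside xs:selector and xs:field of identity constraints (xs:key, xs:unique, xs:keyref), not the location of an element, so it will not produce these paths. A hedged alternative to ad hoc concatenation is to compile the schema and build each path recursively while descending through element content. The sketch below assumes plain local element declarations (substitution groups are not handled), uses local names only, and ignores attributes; SchemaPaths and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaPaths
{
    // Compiles the schema text and returns one path per element declaration.
    public static List<string> Collect(string xsd)
    {
        var set = new XmlSchemaSet();
        using (XmlReader reader = XmlReader.Create(new StringReader(xsd)))
        {
            set.Add(null, reader); // null: take targetNamespace from the schema itself
        }
        set.Compile(); // fills in ElementSchemaType and ContentTypeParticle

        var paths = new List<string>();
        foreach (XmlSchemaElement root in set.GlobalElements.Values)
        {
            Walk(root, string.Empty, paths, new HashSet<XmlSchemaElement>());
        }
        return paths;
    }

    private static void Walk(XmlSchemaElement element, string parentPath,
                             List<string> paths, HashSet<XmlSchemaElement> onPath)
    {
        string path = parentPath + "/" + element.QualifiedName.Name;
        paths.Add(path);

        // Stop if this declaration is already being expanded higher up (recursive schema).
        if (!onPath.Add(element)) return;

        if (element.ElementSchemaType is XmlSchemaComplexType complex)
        {
            WalkParticle(complex.ContentTypeParticle, path, paths, onPath);
        }
        onPath.Remove(element);
    }

    private static void WalkParticle(XmlSchemaParticle particle, string path,
                                     List<string> paths, HashSet<XmlSchemaElement> onPath)
    {
        if (particle is XmlSchemaElement child)
        {
            Walk(child, path, paths, onPath);
        }
        else if (particle is XmlSchemaGroupBase group) // xs:sequence, xs:choice, xs:all
        {
            foreach (XmlSchemaObject item in group.Items)
            {
                if (item is XmlSchemaParticle nested)
                {
                    WalkParticle(nested, path, paths, onPath);
                }
            }
        }
        // Other particles (xs:any, empty content) contribute no named elements.
    }
}
```

For a schema whose root node1 contains node1child1 and node1child2 (the latter with a leaf child), Collect should yield /node1, /node1/node1child1, /node1/node1child2 and /node1/node1child2/leaf. Elements reached through xs:choice or xs:all are included too, since all three compositors derive from XmlSchemaGroupBase.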

c# cannot extract element with xsd:schemaLocation attribute

Please have a look at the following XML. My goal is to extract the values in the interactor element:
<HPRD3r xmlns="org:hprd:dtd:hprd3r">
<interactions>
<entrySet xsi:schemaLocation="net:sf:psidev:mi http://psidev.sourceforge.net/mi/rel25/src/MIF25.xsd">
<interactionList>
<interactor>
For simplicity, let's assume interactions is a direct child of root.
Set the namespace as follows:
XNamespace ns = "org:hprd:dtd:hprd3r";
The following always returns null, although "entrySet" is present:
root.Element(ns+"interactions").Element(ns+"entrySet");
On the other hand,
root.Descendants(ns+"interactor");
does not return null, but its count is zero even though there is more than one interactor element in the file.
It seems the problem is the xsi:schemaLocation attribute on entrySet. Could someone please explain the reasons behind these problems and how to fix them?
Thanks
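As posted, the excerpt uses the xsi: prefix without declaring it, so it would not load at all; the real file must contain namespace declarations that the excerpt leaves out. The xsi:schemaLocation attribute itself only hints where a schema can be found and does not change any element's name. Its value pairs the namespace net:sf:psidev:mi with a schema location, which suggests, as an assumption, that entrySet re-declares the default namespace as net:sf:psidev:mi. If so, entrySet and its descendants are no longer in org:hprd:dtd:hprd3r, which would explain both the null from Element and the empty Descendants result. The sketch below reconstructs a hypothetical document on that assumption and shows how to list the namespaces actually in use; the class and member names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InteractorLookup
{
    // Hypothetical reconstruction: entrySet re-declares the default namespace as
    // net:sf:psidev:mi (the namespace named in its schemaLocation) and declares xsi.
    public const string SampleXml =
        "<HPRD3r xmlns='org:hprd:dtd:hprd3r'><interactions>" +
        "<entrySet xmlns='net:sf:psidev:mi' " +
        "xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' " +
        "xsi:schemaLocation='net:sf:psidev:mi http://psidev.sourceforge.net/mi/rel25/src/MIF25.xsd'>" +
        "<interactionList><interactor/><interactor/></interactionList>" +
        "</entrySet></interactions></HPRD3r>";

    private static readonly XNamespace Hprd = "org:hprd:dtd:hprd3r";
    private static readonly XNamespace Mi = "net:sf:psidev:mi";

    // Diagnostic: every distinct namespace actually in effect on an element in the tree.
    public static List<string> ElementNamespaces(XElement root) =>
        root.DescendantsAndSelf().Select(e => e.Name.NamespaceName).Distinct().ToList();

    // Under the assumption above, entrySet and everything inside it must be
    // queried with the second namespace, not with the root's.
    public static XElement EntrySet(XElement root) =>
        root.Element(Hprd + "interactions")?.Element(Mi + "entrySet");

    public static List<XElement> Interactors(XElement root) =>
        root.Descendants(Mi + "interactor").ToList();
}
```

On the sample, ElementNamespaces returns both URIs and Interactors finds both interactor elements. On the real file, printing ElementNamespaces first is a cheap way to confirm or rule out the assumption before changing any queries.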

XPath returns value in C# utility but not in Expression shape in Biztalk

In the XML message below I can get 'AccountNumber' using
//*[local-name()='AccountNumber']/text()
or
/*[local-name()='GetFullAxxAccountNoResponse']/*[local-name()='GetFullAxxAccountNoResult']/*[local-name()='FullAxxAccNo']/*[local-name()='FullAxxAccountNo']/*[local-name()='AccountNumber']
This works fine in a C# test app and in the Notepad++ XPath plugin, but it returns nothing when used in a BizTalk expression shape. Can anyone help flesh this out? I have also tried including the namespace in the top-level node, with no luck.
Expression shape code:
vAccount = xpath(mymessage.body, "either one of the xpath statements above")
Instance:
<GetFullAxxAccountNoResponse xmlns="http://temp.org/">
<GetFullAxxAccountNoResult>
<FullAxxAccNo>
<FullAxxAccountNo>
<AccountNumber>123456</AccountNumber>
</FullAxxAccountNo>
</FullAxxAccNo>
<SuccessFlag>success</SuccessFlag>
<Message />
</GetFullAxxAccountNoResult>
</GetFullAxxAccountNoResponse>
Those XPaths by themselves return a node, not its text. To get the text content, use a form such as:
xpath(myMessage, "string(//*[local-name()='SomeElement']/text())")
Instead of ending the path with /text(), which returns a text node, use the string() function to convert the node to a string.
So
xpath(mymessage.body, "string(//*[local-name()='AccountNumber'])")
should work.
In addition, if you have a schema defined for this message in your BizTalk application, promoting this field as a distinguished field in the schema will make your expression cleaner. You can then access the field like this:
vAccount = mymessage.AccountNumber
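The distinction both answers rely on is plain XPath 1.0: a location path evaluates to a node-set, while string(...) evaluates to the string value of the first matching node in document order. BizTalk's xpath() function cannot run outside an orchestration, so this sketch uses .NET's System.Xml.XPath instead (not BizTalk) against the question's instance, with the mismatched FullAxxAccNo end tag corrected. XPathResultKinds is a hypothetical name.

```csharp
using System.IO;
using System.Xml.XPath;

public static class XPathResultKinds
{
    // The question's instance document, with the FullAxxAccNo end tag corrected.
    public const string Instance =
        "<GetFullAxxAccountNoResponse xmlns='http://temp.org/'>" +
        "<GetFullAxxAccountNoResult><FullAxxAccNo><FullAxxAccountNo>" +
        "<AccountNumber>123456</AccountNumber>" +
        "</FullAxxAccountNo></FullAxxAccNo>" +
        "<SuccessFlag>success</SuccessFlag><Message />" +
        "</GetFullAxxAccountNoResult></GetFullAxxAccountNoResponse>";

    // Evaluate returns an XPathNodeIterator for a node-set expression
    // and a string for an expression wrapped in string().
    public static object Evaluate(string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Instance)).CreateNavigator();
        return nav.Evaluate(xpath);
    }
}
```

Evaluate("//*[local-name()='AccountNumber']") yields a node iterator, while Evaluate("string(//*[local-name()='AccountNumber'])") yields the string "123456", which is the kind of value a string variable such as vAccount can hold directly.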

Check if element in XML exists that ends with matching string

I can see a way of searching for an element within XML by just going:
if(doc.SelectSingleNode("//mynode")==null)
But what I'm more interested in is finding an element that matches part of its name. Something like:
doc.SelectSingleNode ...that contains "table" in it.
So if I had a node called "AlinasTable", I want it to find that. This matters because the part of the node name before "table" varies, as in "JohnsTable", and I'd want that node returned as well. So I need something more generic.
Cheers.
You can use the contains function, as in the following XPath expression:
doc.SelectSingleNode("//*[contains(name(), 'Table')]")
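Two details are worth noting about that expression: contains() is case-sensitive, so 'Table' matches "AlinasTable" but not "alinastable", and name() includes any namespace prefix, so local-name() is used below. The question's title asks for names that end with a string, but System.Xml implements XPath 1.0, which has contains() and starts-with() but no ends-with(); the usual workaround compares a trailing substring. A minimal sketch; TableNodeFinder is a hypothetical name.

```csharp
using System.Xml;

public static class TableNodeFinder
{
    // Case-sensitive: matches any element whose local name contains "Table".
    public const string ContainsXPath = "//*[contains(local-name(), 'Table')]";

    // XPath 1.0 has no ends-with(), so take the substring that starts
    // string-length('Table') characters from the end and compare it.
    public const string EndsWithXPath =
        "//*[substring(local-name(), string-length(local-name()) - string-length('Table') + 1) = 'Table']";

    public static bool HasTableElement(XmlDocument doc) =>
        doc.SelectSingleNode(EndsWithXPath) != null;
}
```

With <root><JohnsTable/><TableRow/><AlinasTable/></root>, ContainsXPath matches all three child elements, while EndsWithXPath skips TableRow.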

How do I match complete XML objects in a string?

I'm attempting to find complete XML objects in a string. They have been placed in the string by an XmlSerializer, but may or may not be complete. I've toyed with the idea of using a regular expression, because it seems like the kind of thing they were built for, except for the fact that I'm trying to parse XML.
I'm trying to find complete objects in the form:
<?xml version="1.0"?>
<type>
<field>value</field>
...
</type>
My thought was a regex to find <?xml version="1.0"?><type> and </type>, but if a field has the same name as type, it obviously won't work.
There's plenty of documentation on XML parsers, but they seem to all need a complete, fully-formed document to parse. My XML objects can be in a string surrounded by pretty much anything else (including other complete objects).
hw<e>reR#lot$0fr#ndm&nchrs%<?xml version="1.0"?><type><field>...</field>...</type>#ndH#r$omOre!!>nuT6erjc?y!<?xml version="1.0"?><type><field>...</field>...</type>ty!=]
A regex would be able to match a string while excluding the random characters, but not find a complete XML object. I'd like some way to extract an object, parse it with a serializer, then repeat until the string contains no more valid objects.
Could you use a regular expression to find the "<?xml" piece, assume that's the beginning of an XML object, and then use an XmlReader to read and check the remainder of the string until you have parsed one entire element at the root level? Stop reading from the stream as soon as the root node has been completely parsed.
Edit: For more information about using XmlReader, see a question I asked: I can never predict xmlreader behavior, any tips on understanding?
My final solution was to stick with the "Read" method when parsing XML and avoid other methods that actually read from the stream advancing the current position.
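A sketch of the XmlReader approach described in that answer, assuming objects never nest (an XML declaration cannot legally appear inside an element) and that each one starts with <?xml. ReadSubtree is used so that reading stops at the root element's end tag and the reader never tries to parse the junk that follows it; a truncated candidate raises XmlException and is skipped. EmbeddedXmlScanner is a hypothetical name, and a plain IndexOf stands in for the regular expression.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class EmbeddedXmlScanner
{
    private const string Declaration = "<?xml";

    // Returns every complete root element that follows an XML declaration in text.
    public static List<XElement> FindObjects(string text)
    {
        var results = new List<XElement>();
        int start = text.IndexOf(Declaration, StringComparison.Ordinal);
        while (start >= 0)
        {
            XElement element = TryReadOne(text.Substring(start));
            if (element != null) results.Add(element);
            // Objects cannot nest, so the next candidate begins at the next declaration.
            start = text.IndexOf(Declaration, start + Declaration.Length, StringComparison.Ordinal);
        }
        return results;
    }

    private static XElement TryReadOne(string candidate)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(candidate)))
            {
                // Skip the declaration and position the reader on the root element.
                if (reader.MoveToContent() != XmlNodeType.Element) return null;

                // The subtree reader ends at the root's end tag, so nothing after it is parsed.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    return XElement.Load(subtree);
                }
            }
        }
        catch (XmlException)
        {
            return null; // incomplete or malformed candidate
        }
    }
}
```

Because the reader tracks element nesting itself, a field named type inside a type object does not confuse it, unlike the regex idea in the question. Each returned XElement can be handed to an XmlSerializer through its CreateReader() method.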
You could try using the Html Agility Pack, which can be used to parse "malformed XML" and make it accessible with a DOM.
It would be necessary to know which element you are looking for (like <type> in your example), because it will be parsing the accidental elements too (like <e> in your example).
