LINQ / Xpath query for ungrouped and repeated XML elements

I am new to .NET and I am having some trouble implementing queries in LINQ to XML.
I have an XML file in a strange format:
<calendar>
<event>
<amount>1200</amount>
<age>40</age>
<country>FR</country>
<amount>255</amount>
<age>16</age>
<country>UK</country>
<amount>10524</amount>
<age>18</age>
<country>FR</country>
<amount>45</amount>
<age>12</age>
<country>CH</country>
</event>
<event>
<amount>1540</amount>
<age>25</age>
<country>UK</country>
<amount>255</amount>
<age>31</age>
<country>CH</country>
<amount>4310</amount>
<age>33</age>
<country>FR</country>
<amount>45</amount>
<age>17</age>
<country>FR</country>
</event>
</calendar>
From this file I want to compute the sum of every <amount> element value, where <age> is greater than '20' and <country> is either 'FR' or 'CH'.
This operation is independent of the tag <event> (all <amount> elements that check the above conditions should be summed, whether they're under the same or different <event> elements).
My problem is that I have no element tag that groups <amount>, <age> and <country> together... (I can't change the XML format, I'm consuming it from a Web Service I can't access).
If I had a hypothetical <transfer> tag grouping these triples together, I think the code would be simply:
XElement root = XElement.Load("calendar.xml");
// Sum the <amount> of every <transfer> that passes the age and country filter.
decimal sum =
    (from trf in root.Elements("event").Elements("transfer")
     where (int) trf.Element("age") > 20 &&
           ((string) trf.Element("country") == "FR" ||
            (string) trf.Element("country") == "CH")
     select (decimal) trf.Element("amount")).Sum();
Should I programmatically group these elements? Thanks in advance!

Try this:
var xe = XElement.Load(@"calendar.xml");
var langs = new List<string> { "FR", "CH" };
var sum = xe.Descendants("amount")
.Where(e =>
Convert.ToInt32(e.ElementsAfterSelf("age").First().Value) > 20 &&
langs.Any(l => l == e.ElementsAfterSelf("country").First().Value))
.Select(e => Convert.ToDouble(e.Value)).Sum();
I have tested the code. Note that it relies on the amount element being the first element of each amount/age/country group.

If I were you, I would just pre-process the XML (maybe reading it node by node with an XmlReader) and read it into a more hierarchical structure.
That would make it easier to search for elements and also to sort or filter them without losing their relationship (which is now based solely on their order).
EDIT (see discussion in the comments)
As far as I know, the XML specification does not say that the order of the elements is significant, so the parsers you use (or any pre-processing of the XML as a whole, or extraction of its elements) could change the order of the amount, age and country elements at the same level.
While I think most operations tend to preserve the document order, the possibility of subtle and hard-to-find bugs due to random reorderings would not let me sleep too well...
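The pre-processing suggested above could look roughly like the following sketch. It uses LINQ to XML rather than XmlReader for brevity, and it assumes that every group starts with <amount> and ends with <country>, as in the sample. The class name CalendarPreprocessor and the methods ReadTransfers and SumFor are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CalendarPreprocessor
{
    // Walks each <event> in document order and rebuilds the implicit
    // amount/age/country groups as tuples.
    // Assumption: a group starts at <amount> and is complete at <country>.
    public static List<(decimal Amount, int Age, string Country)> ReadTransfers(XElement calendar)
    {
        var transfers = new List<(decimal Amount, int Age, string Country)>();
        foreach (XElement evt in calendar.Elements("event"))
        {
            decimal amount = 0;
            int age = 0;
            bool groupOpen = false;
            foreach (XElement child in evt.Elements())
            {
                switch (child.Name.LocalName)
                {
                    case "amount":
                        amount = (decimal)child;
                        groupOpen = true;
                        break;
                    case "age":
                        age = (int)child;
                        break;
                    case "country":
                        if (groupOpen)
                        {
                            transfers.Add((amount, age, (string)child));
                        }
                        groupOpen = false;
                        break;
                }
            }
        }
        return transfers;
    }

    // Applies the filter from the question to the grouped data.
    public static decimal SumFor(IEnumerable<(decimal Amount, int Age, string Country)> transfers) =>
        transfers.Where(t => t.Age > 20 && (t.Country == "FR" || t.Country == "CH"))
                 .Sum(t => t.Amount);
}
```

Once the triples are grouped like this, sorting or filtering them no longer depends on the sibling order in the original document.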

Use:
sum(/*/*/amount
[following-sibling::age[1] > 20
and
contains('FRCH',
following-sibling::country[1])
])
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"sum(/*/*/amount
[following-sibling::age[1] > 20
and
contains('FRCH',
following-sibling::country[1])
])"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<calendar>
<event>
<amount>1200</amount>
<age>40</age>
<country>FR</country>
<amount>255</amount>
<age>16</age>
<country>UK</country>
<amount>10524</amount>
<age>18</age>
<country>FR</country>
<amount>45</amount>
<age>12</age>
<country>CH</country>
</event>
<event>
<amount>1540</amount>
<age>25</age>
<country>UK</country>
<amount>255</amount>
<age>31</age>
<country>CH</country>
<amount>4310</amount>
<age>33</age>
<country>FR</country>
<amount>45</amount>
<age>17</age>
<country>FR</country>
</event>
</calendar>
the XPath expression is evaluated and the wanted, correct result is output:
5765
Do note: The currently selected answer contains wrong XPath expressions and the sum they produce is wrong. See this illustrated in the XSLT transformation below (the first number is the correct result, the second number is produced using the XPath expressions from the accepted answer):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"sum(/*/*/amount
[following-sibling::age[1] > 20
and
contains('FRCH',
following-sibling::country[1])
])"/>
============
<xsl:value-of select="sum(//*[text()='FR' or text()='CH']/preceding::age[number(text())>20][1]/preceding::amount[1]/text())"/>
</xsl:template>
</xsl:stylesheet>
Result:
5765
============
12475
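Since the question is about C#, here is a minimal sketch of evaluating the same sum(...) expression from this answer without XSLT. It assumes the XPathEvaluate extension method from System.Xml.XPath is available; the class name XPathSum is made up for this illustration.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathSum
{
    // The XPath expression from the answer above, unchanged.
    public const string Expression =
        "sum(/*/*/amount[following-sibling::age[1] > 20 and " +
        "contains('FRCH', following-sibling::country[1])])";

    // XPath 1.0 sum() yields a number, which XPathEvaluate returns as a boxed double.
    public static double Evaluate(XDocument doc) => (double)doc.XPathEvaluate(Expression);
}
```

One caveat: contains('FRCH', ...) would also accept a country value such as 'RC', so the explicit test country = 'FR' or country = 'CH' is safer if other codes can occur.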

Well... I'm not sure how you would accomplish that in LINQ, but here's an XPath query that works for me on the data you provided:
Edit:
returns nodes:
//*[text()='FR' or text()='CH']/preceding::age[number(text())>20][1]/preceding::amount[1]
returns sum:
sum(//*[text()='FR' or text()='CH']/preceding::age[number(text())>20][1]/preceding::amount[1]/text())

Related

How to update all instances of an element in XDocument?

I have an XML structure as below. How can I replace the value of the <Resource> element that is present everywhere in this structure? Is there a way to do this using LINQ? Also, sometimes the structure could be different, but there will always be a Resource element, so I need to look at all instances of Resource and not care about where it is present.
Thanks for any suggestions.
<Users>
<User>
<Number>123456</Number>
<ID>1</ID>
<Events>
<Event>
<ID>12</ID>
</Event>
</Events>
<Items>
<Item>
<ID>12</ID>
<Resource>Replace this value</Resource>
</Item>
<Item>
<ID>13</ID>
<Resource>Replace this value</Resource>
</Item>
<Item>
<ID>14</ID>
<Resource>Replace this value</Resource>
</Item>
</Items>
</User>
<!-- More User elements where Resource needs to be updated -->
<User>
</User>
<User>
</User>
</Users>

LINQ is a query language, so you can't directly use it to modify the value, but you can easily select all the Resource elements in the document with it and then iterate over them and change them.
For example:
// or load from xml, however you have it
var xDoc = XDocument.Load(@"c:\temp\myxml.xml");
// iterate every Resource element
foreach (XElement element in xDoc.Descendants("Resource"))
element.Value = "Hello, world";
That will pick out every Resource element in the XML regardless of where it is in the hierarchy, which in your case is what you need. If you needed to target it more specifically, you could either use an XPath expression or further LINQ calls such as Element(), which work on a single level of the hierarchy.
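For the more specific case, a rough sketch of the XPath variant follows. It assumes the XPathSelectElements extension method from System.Xml.XPath; the class name ResourceUpdater and the method ReplaceByXPath are made up for this illustration.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ResourceUpdater
{
    // Sets the value of every element matched by the XPath expression
    // and returns how many elements were changed.
    public static int ReplaceByXPath(XDocument doc, string xpath, string newValue)
    {
        // ToList() materializes the matches before the tree is modified.
        var matches = doc.XPathSelectElements(xpath).ToList();
        foreach (XElement element in matches)
        {
            element.Value = newValue;
        }
        return matches.Count;
    }
}
```

Calling ReplaceByXPath(xDoc, "/Users/User/Items/Item/Resource", "Hello, world") would then touch only the Resource elements under Items, while "//Resource" behaves like Descendants("Resource").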

XPath problems in C# XSLT transformation

I am trying to parse an XML document to a website, through a XSLT transformation.
However, to make it work I have to use the following XPath:
/*[name()='standards']/*[name() = 'standard']
Why does the following XPath expression not work?
/standards/standard

Your problem is the most frequently asked question about XPath -- search for "XPath default namespace" and you'll find many good answers.
To summarize the problem: XPath interprets any unprefixed name as belonging to "no namespace".
Therefore an unprefixed name in an XPath expression never selects an element that belongs to a default namespace (which is not "no namespace").
One way to continue to use names in the location steps is to indicate to the XPath processor that a specific prefix, say "x" is associated to the default namespace. Then issue:
/x:standards/x:standard
In .NET such namespace binding (called "registering of namespace") is done using the XmlNamespaceManager class. See this complete example.
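A minimal sketch of that registration in C# is shown here. The question does not state the namespace URI of the document, so it is passed in as a parameter; the class name NamespaceQuery and the method SelectStandards are made up for this illustration.

```csharp
using System.Xml;

public static class NamespaceQuery
{
    // Binds the prefix "x" to the document's default namespace and uses it in the query.
    // namespaceUri must match the xmlns="..." value on the <standards> element.
    public static XmlNodeList SelectStandards(XmlDocument doc, string namespaceUri)
    {
        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("x", namespaceUri);
        return doc.SelectNodes("/x:standards/x:standard", manager);
    }
}
```

The prefix "x" only has to match between the AddNamespace call and the XPath expression; it does not need to appear in the XML document itself.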
In XSLT, simply define a namespace at a global level, then specify XPath expressions where each element name is prefixed by the prefix so defined.
Here is a small example:
<nums xmlns="some:nums">
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
To process the above XML document we have this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="some:nums">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:value-of select="/x:nums/x:num[. = 3]"/>
</xsl:template>
</xsl:stylesheet>
Applying this transformation to the above XML document correctly selects the wanted element and outputs its string value:
03

I don't know what your question is. Just taking a wild stab, perhaps this is what you want ...
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ikas="http://www.ikas.dk"
exclude-result-prefixes="msxsl ikas">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<div xmlns="http://www.ikas.dk">
<textarea>
<xsl:copy-of select="/ikas:standards/ikas:standard"/>
</textarea>
</div>
</xsl:template>
</xsl:stylesheet>

Flatten XML structure by element with linq to xml

I recently created a post about flattening an XML structure so every element and its values were turned into attributes on the root element. Got some great answers and got it working. However, sad thing is that by flattening, the client meant to flatten the elements and not make them into attributes :-/
What I have is this:
<members>
<member xmlns="mynamespace" id="1" status="1">
<sensitiveData>
<notes/>
<url>someurl</url>
<altUrl/>
<date1>somedate</date1>
<date2>someotherdate</date2>
<description>some description</description>
<tags/>
<category>some category</category>
</sensitiveData>
<contacts>
<contact contactId="1">
<contactPerson>some contact person</contactPerson>
<phone/>
<mobile>mobile number</mobile>
<email>some@email.com</email>
</contact>
</contacts>
</member>
</members>
And what I need is the following:
<members>
<member xmlns="mynamespace" id="1" status="1">
<sensitiveData/>
<notes/>
<url>someurl</url>
<altUrl/>
<date1>somedate</date1>
<date2>someotherdate</date2>
<description>some description</description>
<tags/>
<category>some category</category>
<contacts/>
<contact contactId="1"></contact>
<contactPerson>some contact person</contactPerson>
<phone/>
<mobile>mobile number</mobile>
<email>some@email.com</email>
</member>
</members>
So basically all elements, but flattened as child nodes of <member>. I do know that it's not pretty at all to begin parsing XML documents like this, but it's basically the only option left as the CMS we're importing data to requires this flat structure and the XML document comes from an external webservice.
I started to make a recursive method for this, but I've got an odd feeling that it could be made smoother (well, as smooth as possible at least) with some LINQ to XML (?) I'm not the best at linq to xml, so I hope there's someone out there who would be helpful to give a hint on how to solve this? :-)

This seems to work - there may be neater approaches, admittedly:
var doc = XDocument.Load("test.xml");
XNamespace ns = "mynamespace";
var member = doc.Root.Element(ns + "member");
// This will *sort* of flatten, but create copies...
var descendants = member.Descendants().ToList();
// So we need to strip child elements from everywhere...
// (but only elements, not text nodes). The ToList() call
// materializes the query, so we're not removing while we're iterating.
foreach (var nested in descendants.Elements().ToList())
{
nested.Remove();
}
member.ReplaceNodes(descendants);

xpath return string instead of nodelist

I am working on a biztalk project and I need to copy (filtered) content from 1 xml to another.
I have to do this with xpath, I can't use xsl transformation.
So my xpath to get the content from the source xml file is this:
//*[not(ancestor-or-self::IN1_Insurance)]|//IN1_Insurance[2]/descendant-or-self::*
Now this returns an xmlNodelist. Is it possible to return a string with all the nodes in it like:
"<root><node>text</node></root>"
If I put string() before my xpath it returns the values, but I want the whole xml in a string (with nodes..), so I could load that string in another xmldocument. I think this is the best method for my problem.
I know I can loop over the xmlnodelist and append the nodes to the new xmldocument, but it's a bit tricky to loop in a biztalk orchestration and I want to avoid this.
The code I can use is C#.
I've tried to just assign the nodelist to the xmldocument, but this throws a cast error (obvious..).
The way I see it, I have three options:
assign the nodelist to the xmldocument without a loop (not possible in C#, I think)
somehow convert the nodelist to a string and load this into the xmldocument
load the xpath result directly into the new xmldocument (don't know if this is possible since it returns a nodelist)
Thanks for your help
edit:
sample input:
<root>
<Patient>
<PatientId></PatientId>
<name></name>
</Patient>
<insurance>
<id>1</id>
<billing></billing>
</insurance>
<insurance>
<id>2</id>
<billing></billing>
</insurance>
<insurance>
<id>3</id>
<billing></billing>
</insurance>
</root>
Now I want to copy this sample to another xmldocument, but without insurance nodes 2 and 3 (this is dynamic, so the nodes to delete could be insurance nodes 1 and 2, or 1 and 3...)
So this has to be the output:
<root>
<Patient>
<PatientId></PatientId>
<name></name>
</Patient>
<insurance>
<id>1</id>
<billing></billing>
</insurance>
</root>
What I am doing now is use the xpath to get the nodes I want. Then I want to assign the result to the new xmldocument, but this is not possible since I get the castException
string strXpath = "//*[not(ancestor-or-self::IN1_Insurance)]|//IN1_Insurance[2]/descendant-or-self::*";
xmlDoc = new System.Xml.XmlDocument();
xmlDoc = xpath(sourceXml, strXpath); // cast error (cannot cast XmlNodeList to XmlDocument)
I know the syntax is a bit strange, but it is biztalk c# code..

The most straightforward solution would indeed be to "loop over the xmlnodelist and append (import) the nodes to the new xmldocument", but since you can't loop, what other basic things can/can't you do?
To serialize the nodelist, you could try using XmlNodeList.ToString(). If that worked, you'd get a strange beast, because it could be duplicating parts of the XML document several times over, especially since you're explicitly including ancestors and descendants directly in the nodelist. It would not be something that you could parse back in and have a result that resembled the nodelist you started with.
In other words, it would be best to loop over the XmlNodeList and import the nodes to the new XmlDocument.
But even so, I would be really surprised if you wanted to put all these ancestor and descendant nodes:
//*[not(ancestor-or-self::IN1_Insurance)]|//IN1_Insurance[2]/descendant-or-self::*
directly into the new XML document. If you post some sample input and the desired output, we can probably help determine if that's the case.
Update:
I see what you're trying to do: copy an XML document, omitting all <insurance> elements (and their descendants) except the one you want.
This can be done without a loop if the output is as simple as your sample output: only one <Patient> and one <insurance> element, with their descendants, under one top-level element.
Something like (I can't test this as I don't have a biztalk server):
string xpathPatient = "/*/Patient";
string xpathInsuran = "/*/insurance[id = " + insId + "]"; // insId is a parameter
xmlDoc = new System.Xml.XmlDocument();
xmlPatient = xpath(sourceXml, xpathPatient);
xmlInsuran = xpath(sourceXml, xpathInsuran);
XmlElement rootNode = xmlDoc.CreateElement("root");
xmlDoc.AppendChild(rootNode);
//**Update: use [0] to get an XmlNode from the returned XmlNodeList (presumably)
rootNode.AppendChild(xmlDoc.ImportNode(xmlPatient[0], true));
rootNode.AppendChild(xmlDoc.ImportNode(xmlInsuran[0], true));
I confess though, I'm curious why you can't use XSLT. You're approaching tasks that would be more easily done in XSLT than in XPath + C# XmlDocument.
Update: since the xpath() function probably returns an XmlNodeList rather than an XmlNode, I added [0] to the first argument to ImportNode() above. Thanks to @Martin Honnen for alerting me to that.

XPath is a query language (only) for XML documents.
It operates on an abstract model -- the XML INFOSET, and cannot either modify the structure of the XML document(s) it operates on or serialize the INFOSET information items back to XML.
Therefore, the only way to achieve such serialization is to use the language that is hosting XPath.
Apart from this, there are obvious problems with your question; for example, there is no element named IN1_Insurance in the provided XML document -- therefore the XPath expression provided:
//*[not(ancestor-or-self::IN1_Insurance)]|//IN1_Insurance[2]/descendant-or-self::*
selects all elements in the document.
Note:
The described task is elementary to fulfil using XSLT.
Finally: If you are allowed to use C# then you can use the XslCompiledTransform (or XslTransform) class. Use its Transform() method to carry out the following transformation against the XML document:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="insurance[not(id=1)]"/>
</xsl:stylesheet>
This produces exactly the wanted result:
<root>
<Patient>
<PatientId></PatientId>
<name></name>
</Patient>
<insurance>
<id>1</id>
<billing></billing>
</insurance>
</root>
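As a rough sketch of how the transformation above could be driven from C#, returning the result as a string that can then be loaded into another XmlDocument. The class name InsuranceFilter and the method Transform are made up for this illustration, and whether this can be called from a BizTalk orchestration is an assumption I have not verified.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class InsuranceFilter
{
    // Applies the stylesheet text to the source XML text and returns the serialized result.
    public static string Transform(string sourceXml, string stylesheetXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(sourceXml)))
        // OutputSettings carries the xsl:output options (e.g. omit-xml-declaration).
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

The returned string can be passed to XmlDocument.LoadXml(), which avoids looping over an XmlNodeList.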

Force XML character entities into XmlDocument

I have some XML that looks like this:
<abc x="{"></abc>
I want to force XmlDocument to use the XML character entities of the brackets, ie:
<abc x="&#123;"></abc>
MSDN says this:
In order to assign an attribute value
that contains entity references, the
user must create an XmlAttribute node
plus any XmlText and
XmlEntityReference nodes, build the
appropriate subtree and use
SetAttributeNode to assign it as the
value of an attribute.
CreateEntityReference sounded promising, so I tried this:
XmlDocument doc = new XmlDocument();
doc.LoadXml("<abc />");
XmlAttribute x = doc.CreateAttribute("x");
x.AppendChild(doc.CreateEntityReference("#123"));
doc.DocumentElement.Attributes.Append(x);
And I get the exception Cannot create an 'EntityReference' node with a name starting with '#'.
Any reason why CreateEntityReference doesn't like the '#' - and more importantly how can I get the character entity into XmlDocument's XML? Is it even possible? I'm hoping to avoid string manipulation of the OuterXml...

You're mostly out of luck.
First off, what you're dealing with are called Character References, which is why CreateEntityReference fails. The sole reason for a character reference to exist is to provide access to characters that would be illegal in a given context or otherwise difficult to create.
Definition: A character reference
refers to a specific character in the
ISO/IEC 10646 character set, for
example one not directly accessible
from available input devices.
(See section 4.1 of the XML spec)
When an XML processor encounters a character reference, if it is referenced in the value of an attribute (that is, if the &#xxx format is used inside an attribute), it is set to "Included" which means its value is looked up and the text is replaced.
The string "AT&amp;T;" expands to
"AT&T;" and the remaining ampersand is
not recognized as an entity-reference
delimiter
(See section 4.4 of the XML spec)
This is baked into the XML spec and the Microsoft XML stack is doing what it's required to do: process character references.
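This behaviour is easy to observe with a small round trip through XmlDocument. The sketch below only loads a string and returns OuterXml; the class name CharacterReferenceDemo is made up for this illustration.

```csharp
using System.Xml;

public static class CharacterReferenceDemo
{
    // Loads the XML text and serializes it again. Character references in the
    // input are resolved by the parser, so they do not survive the round trip.
    public static string RoundTrip(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.OuterXml;
    }
}
```

Passing in <abc x="&#123;"></abc> gives back markup in which the attribute value is the literal { character, because the DOM stores the character and not the reference.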
The best I can see you doing is to take a peek at these old XML.com articles, one of which uses XSL to disable output escaping so that &amp;#123; would be written as &#123; in the output.
http://www.xml.com/pub/a/2001/03/14/trxml10.html
<!DOCTYPE stylesheet [
<!ENTITY ntilde
"<xsl:text disable-output-escaping='yes'>&amp;ntilde;</xsl:text>">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output doctype-system="testOut.dtd"/>
<xsl:template match="test">
<testOut>
The Spanish word for "Spain" is "Espa&ntilde;a".
<xsl:apply-templates/>
</testOut>
</xsl:template>
</xsl:stylesheet>
And this one which uses XSL to convert specific character references into other text sequences (to accomplish the same goal as the previous link).
http://www.xml.com/lpt/a/1426
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output use-character-maps="cm1"/>
<xsl:character-map name="cm1">
<xsl:output-character character="&#160;" string="&amp;nbsp;"/>
<xsl:output-character character="&#233;" string="&amp;#233;"/> <!-- é -->
<xsl:output-character character="&#244;" string="&amp;#244;"/>
<xsl:output-character character="—" string="--"/>
</xsl:character-map>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

You should always write your strings as verbatim literals with a preceding @, like so: @"My /?.,<> STRING". I don't know if that will solve your issue though.
I would approach the problem using XmlNode class from the XmlDocument. You can use the Attributes property and it'll be way easier. Check it out here:
http://msdn.microsoft.com/en-us/library/system.xml.xmlnode.attributes.aspx
