XPath problems in C# XSLT transformation

I am trying to render an XML document on a website through an XSLT transformation.
However, to make it work I have to use the following XPath:
/*[name()='standards']/*[name() = 'standard']
Why does the following XPath expression not work?
/standards/standard

Your problem is the most frequently asked question about XPath; search for "XPath default namespace" and you'll find many good answers.
To summarize the problem: XPath interprets any unprefixed name as belonging to "no namespace".
Therefore, an unprefixed name in an XPath expression never selects an element that belongs to a default namespace (as opposed to "no namespace").
One way to keep using names in the location steps is to tell the XPath processor that a specific prefix, say "x", is associated with the default namespace. Then write:
/x:standards/x:standard
In .NET, such a namespace binding (called "registering a namespace") is done with the XmlNamespaceManager class. See this complete example.
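A minimal C# sketch of registering a prefix with XmlNamespaceManager, assuming the standards document declares a default namespace. The URI urn:example:standards is a hypothetical placeholder for whatever xmlns value the real document uses, and the class and method names are illustrative. The prefix "x" exists only for the XPath evaluation; it does not need to appear in the XML.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    // Hypothetical namespace URI; replace with the document's real xmlns value.
    const string StandardsNs = "urn:example:standards";

    public static int CountStandards(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Bind the arbitrary prefix "x" to the default namespace URI.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", StandardsNs);

        // Prefixed names now match elements in that namespace.
        XmlNodeList nodes = doc.SelectNodes("/x:standards/x:standard", ns);
        return nodes.Count;
    }

    public static int CountWithoutPrefix(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Unprefixed names mean "no namespace", so nothing is selected here.
        return doc.SelectNodes("/standards/standard").Count;
    }
}
```

With a document such as `<standards xmlns="urn:example:standards"><standard/><standard/></standards>`, the prefixed query finds both elements while the unprefixed one finds none.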
In XSLT, simply declare a prefix for that namespace at the global level (on the xsl:stylesheet element), then write XPath expressions in which every element name carries that prefix.
Here is a small example:
<nums xmlns="some:nums">
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
To process the above XML document we have this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="some:nums">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:value-of select="/x:nums/x:num[. = 3]"/>
</xsl:template>
</xsl:stylesheet>
Applying this transformation to the above XML document correctly selects the wanted element and outputs its string value:
03
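For completeness, here is one way to run the stylesheet above from C# with XslCompiledTransform. This is a sketch that assumes both the stylesheet and the input XML are available as strings; the class and method names are illustrative.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class NumsTransformDemo
{
    // The stylesheet from the answer, embedded as a verbatim string.
    const string Xslt = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform""
    xmlns:x=""some:nums"">
  <xsl:output method=""text""/>
  <xsl:strip-space elements=""*""/>
  <xsl:template match=""/"">
    <xsl:value-of select=""/x:nums/x:num[. = 3]""/>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Xslt)))
        {
            xslt.Load(xsltReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // No stylesheet parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Because the stylesheet binds x to some:nums itself, the C# side needs no namespace handling at all.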

I don't know what your question is. Just taking a wild stab, perhaps this is what you want ...
<xsl:stylesheet version="1.0"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ikas="http://www.ikas.dk"
exclude-result-prefixes="msxsl ikas">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<div xmlns="http://www.ikas.dk">
<textarea>
<xsl:copy-of select="/ikas:standards/ikas:standard"/>
</textarea>
</div>
</xsl:template>
</xsl:stylesheet>

Related

Remove Certain HTML tags in C#

I'm trying to remove certain HTML tags in C#, like this:
<div>
<blockquote style="font-size: 30px" width="300px">
For 50 years, WWF has been protecting the future of nature. The world's leading conservation organization, WWF works in 100 countries and is supported by 1.2 million members in the United States and close to 5 million globally.
</blockquote>
</div>
so that the result is:
<div>For 50 years, WWF has been protecting the future of nature. The world's leading conservation organization, WWF works in 100 countries and is supported by 1.2 million members in the United States and close to 5 million globally.</div>
So far, I've been trying a regex, (<.+?)\s+style\s*=\s*([""']).*?\2(.*?>), but it only removes the style attribute, and I'm not sure how to achieve the result I want.
Thanks!
As far as I can see, you want to remove the HTML elements that have a style attribute, together with their closing tags. Unfortunately, there is no good way to do that with regexes. Without the "also remove their closing tags" requirement, we could write an approximately good regex.
On the other hand, XSLT is the right tool for this, because it can handle the recursive nature of XML:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()[not(@style)]">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
What's happening here? The match="@*|node()[not(@style)]" pattern matches every attribute and every node that does not have a style attribute; <xsl:copy>...</xsl:copy> copies it and then processes its attributes and children the same way. Elements that do have a style attribute are not matched, so XSLT's built-in template handles them: it drops the element's own tags and attributes but still processes its children, which keeps their text.
For the record, this is a slight variant of the XSLT identity transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
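If you would rather stay in C#, a LINQ to XML alternative (a different technique from the XSLT above) is to replace each styled element with its own child nodes. This sketch assumes the markup is well-formed XML; arbitrary HTML often is not and would need an HTML parser first. The class and method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class UnwrapStyledElements
{
    // Replaces every descendant element that carries a style attribute with
    // its child nodes: the text survives, the tags and attributes do not.
    // The root element itself is never unwrapped.
    public static string Unwrap(string xml)
    {
        XElement root = XElement.Parse(xml);

        // Reverse() processes nested styled elements innermost first;
        // ToList() snapshots the matches before the tree is modified.
        var styled = root.Descendants()
                         .Where(e => e.Attribute("style") != null)
                         .Reverse()
                         .ToList();

        foreach (XElement element in styled)
        {
            element.ReplaceWith(element.Nodes());
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

For `<div><blockquote style="font-size: 30px" width="300px">For 50 years</blockquote></div>` this yields `<div>For 50 years</div>`.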

How to select an attribute value in an XML and concatenate it with a string and use it as an attribute value in a new XML using XSLT

I need to transform an existing XML into another XML using XSLT.
The problem I am facing is that I need to use the "typeName" attribute from the ECClass and concatenate it with http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1#
The XML I am working with is:
<ECSchema>
<ECClass typeName="ABC">
<BaseClass>PQR</BaseClass>
<BaseClass>XYZ</BaseClass>
</ECClass>
<ECClass typeName="IJK">
<BaseClass>MNO</BaseClass>
<BaseClass>DEF</BaseClass>
</ECClass>
</ECSchema>
For example, the concatenated result should be:
http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1#ABC for the first ECClass
I need to set this string as the attribute value of rdf:about in the owl:class tag in the new XML structure.
The new XML structure should be:
<owl:Ontology rdf:about="http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1">
<owl:class rdf:about="http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1#ABC">
</owl:class>
<owl:class rdf:about="http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1#IJK">
</owl:class>
</owl:Ontology>
Right now I have not yet tried to do anything about the BaseClass; I have only been trying to convert each ECClass to an owl:class.
My XSL for it is:
<xsl:template match="/">
<owl:Ontology rdf:about="http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1"/>
<xsl:for-each select="ECSchema/ECClass">
<owl:class rdf:about="<xsl:value-of select="concat('http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1#' , '@typeName') />" >
</owl:class>
</xsl:for-each>
</xsl:template>
I have been trying many combinations to do this from various sources but haven't been able to do it.
It always returns an error - "Additional information: '<', hexadecimal value 0x3C, is an invalid attribute character."
Can anybody please help me with this as I am very new to XSLT and all I have been getting is lots of errors.
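Tags cannot be nested inside attribute values. To achieve your purpose, you should learn about attribute value templates: an XPath expression in curly braces inside an attribute of a literal result element is evaluated, and its string value is inserted. In addition, your code is rather sloppy. Try it this way: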
<xsl:template match="/">
<owl:Ontology rdf:about="http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1">
<xsl:for-each select="ECSchema/ECClass">
<owl:class rdf:about="{concat('http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1#', @typeName)}" />
</xsl:for-each>
</owl:Ontology>
</xsl:template>
or perhaps a bit more elegant:
<xsl:variable name="myURL">http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1</xsl:variable>
<xsl:template match="/">
<owl:Ontology rdf:about="{$myURL}">
<xsl:for-each select="ECSchema/ECClass">
<owl:class rdf:about="{$myURL}#{@typeName}" />
</xsl:for-each>
</owl:Ontology>
</xsl:template>
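The same mapping can also be sketched without XSLT in LINQ to XML, where plain string concatenation takes the place of the attribute value template. The owl and rdf namespace URIs below are the standard OWL and RDF ones; the question's fragments never declare them, so treat that part as an assumption, along with the class and method names. The element name owl:class is kept exactly as the question asks for it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EcSchemaToOwl
{
    static readonly XNamespace Owl = "http://www.w3.org/2002/07/owl#";
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string BaseUrl =
        "http://www.semanticweb.org/aman.prasad/ontologies/2015/5/untitled-ontology-1";

    public static XElement ToOwl(XElement ecSchema)
    {
        return new XElement(Owl + "Ontology",
            // Declare the prefixes so the output is written with owl: and rdf:.
            new XAttribute(XNamespace.Xmlns + "owl", Owl.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "rdf", Rdf.NamespaceName),
            new XAttribute(Rdf + "about", BaseUrl),
            ecSchema.Elements("ECClass").Select(c =>
                new XElement(Owl + "class",
                    // Equivalent of the AVT {$myURL}#{@typeName}.
                    new XAttribute(Rdf + "about",
                        BaseUrl + "#" + (string)c.Attribute("typeName")))));
    }
}
```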

LINQ / Xpath query for ungrouped and repeated XML elements

I am new to .NET and I am having some trouble implementing queries in LINQ to XML.
I have an XML file in a strange format:
<calendar>
<event>
<amount>1200</amount>
<age>40</age>
<country>FR</country>
<amount>255</amount>
<age>16</age>
<country>UK</country>
<amount>10524</amount>
<age>18</age>
<country>FR</country>
<amount>45</amount>
<age>12</age>
<country>CH</country>
</event>
<event>
<amount>1540</amount>
<age>25</age>
<country>UK</country>
<amount>255</amount>
<age>31</age>
<country>CH</country>
<amount>4310</amount>
<age>33</age>
<country>FR</country>
<amount>45</amount>
<age>17</age>
<country>FR</country>
</event>
</calendar>
From this file I want to compute the sum of every <amount> element value, where <age> is greater than '20' and <country> is either 'FR' or 'CH'.
This operation is independent of the <event> tag (all <amount> elements that satisfy the above conditions should be summed, whether they're under the same or different <event> elements).
My problem is that I have no element tag that groups <amount>, <age> and <country> together... (I can't change the XML format, I'm consuming it from a Web Service I can't access).
If I had a hypothetical <transfer> tag grouping these triples together, I think the code would simply be:
XElement root = XElement.Load("calendar.xml");
decimal sum =
(from trf in root.Elements("event").Elements("transfer")
where (decimal) trf.Element("age") > 20 &&
((string) trf.Element("country") == "FR" ||
(string) trf.Element("country") == "CH")
select (decimal) trf.Element("amount")).Sum();
Should I programmatically group these elements? Thanks in advance!
Try this:
var xe = XElement.Load(@"calendar.xml");
var langs = new List<string> { "FR", "CH" };
var sum = xe.Descendants("amount")
.Where(e =>
Convert.ToInt32(e.ElementsAfterSelf("age").First().Value) > 20 &&
langs.Any(l => l == e.ElementsAfterSelf("country").First().Value))
.Select(e => Convert.ToDouble(e.Value)).Sum();
I have tested the code. You also have to make sure that the amount element is the first element in each group.
If I were you, I would just pre-process the XML (maybe reading it node by node with an XmlReader) into a more hierarchical structure.
That would make it easier to search for elements and also to sort or filter them without losing their relationship (which is now based solely on their order).
EDIT (see discussion in the comments)
As far as I know, the XML specification does not say that the order of the elements is significant, so the parsers you use (or any pre-processing of the XML as a whole, or extraction of its elements) could change the order of the amount, age and country elements at the same level.
While I think most operations tend to preserve the document order, the possibility of subtle and hard-to-find bugs due to random reorderings would not let me sleep too well...
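A sketch of that pre-processing step in LINQ to XML, assuming each <event> lists its children strictly as repeating amount, age, country triples. That is exactly the ordering assumption the edit above worries about, so the code checks it and throws rather than silently mis-grouping. The Transfer type is hypothetical, standing in for the missing wrapper element; the other names are illustrative too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CalendarGrouping
{
    // Hypothetical record standing in for the missing <transfer> element.
    public sealed class Transfer
    {
        public decimal Amount;
        public int Age;
        public string Country;
    }

    public static List<Transfer> Group(XElement calendar)
    {
        var result = new List<Transfer>();
        foreach (XElement evt in calendar.Elements("event"))
        {
            List<XElement> children = evt.Elements().ToList();
            if (children.Count % 3 != 0)
                throw new FormatException("An <event> does not contain whole triples.");

            for (int i = 0; i < children.Count; i += 3)
            {
                // Verify the assumed amount, age, country order before trusting it.
                if (children[i].Name != "amount" || children[i + 1].Name != "age"
                    || children[i + 2].Name != "country")
                    throw new FormatException("Unexpected element order at position " + i + ".");

                result.Add(new Transfer
                {
                    Amount = (decimal)children[i],
                    Age = (int)children[i + 1],
                    Country = (string)children[i + 2]
                });
            }
        }
        return result;
    }

    public static decimal SumQualifying(XElement calendar)
    {
        var countries = new HashSet<string> { "FR", "CH" };
        return Group(calendar)
            .Where(t => t.Age > 20 && countries.Contains(t.Country))
            .Sum(t => t.Amount);
    }
}
```

Once the triples are explicit objects, the query reads like the hypothetical <transfer> version from the question; for the sample calendar, SumQualifying returns 5765.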
Use:
sum(/*/*/amount
[following-sibling::age[1] > 20
and
contains('FRCH',
following-sibling::country[1])
])
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"sum(/*/*/amount
[following-sibling::age[1] > 20
and
contains('FRCH',
following-sibling::country[1])
])"/>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<calendar>
<event>
<amount>1200</amount>
<age>40</age>
<country>FR</country>
<amount>255</amount>
<age>16</age>
<country>UK</country>
<amount>10524</amount>
<age>18</age>
<country>FR</country>
<amount>45</amount>
<age>12</age>
<country>CH</country>
</event>
<event>
<amount>1540</amount>
<age>25</age>
<country>UK</country>
<amount>255</amount>
<age>31</age>
<country>CH</country>
<amount>4310</amount>
<age>33</age>
<country>FR</country>
<amount>45</amount>
<age>17</age>
<country>FR</country>
</event>
</calendar>
the XPath expression is evaluated and the wanted, correct result is output:
5765
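The same XPath expression can also be evaluated directly from .NET, without an XSLT wrapper, using XPathNavigator.Evaluate. This is a sketch with illustrative names. Note that contains('FRCH', ...) is a compact trick that would also accept a value such as "RC", so it relies on the country codes being known two-letter values.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class XPathSumDemo
{
    const string Expr =
        "sum(/*/*/amount[following-sibling::age[1] > 20 and " +
        "contains('FRCH', following-sibling::country[1])])";

    public static double Evaluate(string xml)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // sum() yields an XPath number, which .NET returns boxed as a double.
        return (double)nav.Evaluate(Expr);
    }
}
```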
Do note: the currently selected answer contains wrong XPath expressions, and the sum they produce is wrong. This is illustrated in the XSLT transformation below (the first number is the correct result; the second is produced using the XPath expressions from the accepted answer):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"sum(/*/*/amount
[following-sibling::age[1] > 20
and
contains('FRCH',
following-sibling::country[1])
])"/>
============
<xsl:value-of select="sum(//*[text()='FR' or text()='CH']/preceding::age[number(text())>20][1]/preceding::amount[1]/text())"/>
</xsl:template>
</xsl:stylesheet>
Result:
5765
============
12475
Well... I'm not sure how you would accomplish that in LINQ, but here's an XPath query that works for me on the data you provided:
Edit:
returns nodes:
//*[text()='FR' or text()='CH']/preceding::age[number(text())>20][1]/preceding::amount[1]
returns sum:
sum(//*[text()='FR' or text()='CH']/preceding::age[number(text())>20][1]/preceding::amount[1]/text())

Can XPathSelectElement ignore case?

Is there a way to ignore case when using XPathSelectElement, or any other operation such as retrieving attributes from an XDocument? The reason I ask is that I have some XML configuration files, and I am writing generic code that reads them to get the information XPathSelectElement needs; I also read attribute values. Even if someone writes the nodes/attributes in a different case, my program should still work.
I use C#/.Net 3.5.
XPath 1.0, which is what .NET implements, has no way to ignore case. You can accommodate it, though.
For example, for elements, assuming their names contain only ASCII letters:
//*[
translate(
name(),
'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'abcdefghijklmnopqrstuvwxyz'
) = 'myname'
]
Attributes work the same way (with @* in place of *).
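A C# sketch of both routes. It assumes ASCII-only names and a lowerName argument that is already lower case and contains no apostrophe, since it is spliced into the XPath string. The second method swaps XPath for a plain LINQ to XML name comparison. Class and method names are illustrative; the code avoids newer C# syntax because the question targets .NET 3.5.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class CaseInsensitiveLookup
{
    const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // XPath 1.0 route: lower-case name() with translate() before comparing.
    public static XElement FindElementXPath(XDocument doc, string lowerName)
    {
        string xpath = string.Format(
            "//*[translate(name(), '{0}', '{1}') = '{2}']", Upper, Lower, lowerName);
        return doc.XPathSelectElement(xpath);
    }

    // LINQ to XML route: do the case-insensitive comparison in C# instead.
    public static string GetAttributeIgnoreCase(XElement element, string name)
    {
        XAttribute attr = element.Attributes().FirstOrDefault(a =>
            string.Equals(a.Name.LocalName, name, StringComparison.OrdinalIgnoreCase));
        return attr == null ? null : attr.Value;
    }
}
```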
If you do not want to bloat your XPath expressions with this, you could lower-case all element- and attribute names beforehand, for example via XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:template match="text() | comment() | processing-instruction()">
<xsl:copy />
</xsl:template>
<xsl:template match="*">
<xsl:element name="{translate(name(), $upper, $lower)}">
<xsl:apply-templates select="node() | @*" />
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{translate(name(), $upper, $lower)}">
<xsl:value-of select="." />
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Convert the XML string to lower case before you load it. That solves the issue; I use this method myself.
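Lower-casing the whole XML string also changes element text and attribute values, which a configuration reader may not want. The following sketch instead renames only the element and attribute names of an already loaded XDocument. Names are illustrative, and it assumes no element has two attributes that differ only by case (ReplaceAttributes would then throw on the duplicate).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LowerCaseNames
{
    // Lower-cases element and attribute local names in place, leaving text,
    // attribute values and namespace URIs untouched.
    public static void Apply(XDocument doc)
    {
        foreach (XElement e in doc.Descendants().ToList())
        {
            e.Name = e.Name.Namespace + e.Name.LocalName.ToLowerInvariant();

            // XAttribute.Name is read-only, so rebuild the attribute list.
            XAttribute[] attrs = e.Attributes()
                .Select(a => a.IsNamespaceDeclaration
                    ? new XAttribute(a.Name, a.Value)
                    : new XAttribute(a.Name.Namespace + a.Name.LocalName.ToLowerInvariant(), a.Value))
                .ToArray();
            e.ReplaceAttributes(attrs);
        }
    }
}
```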

Force XML character entities into XmlDocument

I have some XML that looks like this:
<abc x="{"></abc>
I want to force XmlDocument to use the XML character reference for the brace, i.e.:
<abc x="&#123;"></abc>
MSDN says this:
In order to assign an attribute value that contains entity references, the user must create an XmlAttribute node plus any XmlText and XmlEntityReference nodes, build the appropriate subtree and use SetAttributeNode to assign it as the value of an attribute.
CreateEntityReference sounded promising, so I tried this:
XmlDocument doc = new XmlDocument();
doc.LoadXml("<abc />");
XmlAttribute x = doc.CreateAttribute("x");
x.AppendChild(doc.CreateEntityReference("#123"));
doc.DocumentElement.Attributes.Append(x);
And I get the exception Cannot create an 'EntityReference' node with a name starting with '#'.
Any reason why CreateEntityReference doesn't like the '#' - and more importantly how can I get the character entity into XmlDocument's XML? Is it even possible? I'm hoping to avoid string manipulation of the OuterXml...
You're mostly out of luck.
First off, what you're dealing with are called Character References, which is why CreateEntityReference fails. The sole reason for a character reference to exist is to provide access to characters that would be illegal in a given context or otherwise difficult to create.
Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.
(See section 4.1 of the XML spec)
When an XML processor encounters a character reference in the value of an attribute (that is, when the &#xxx; form is used inside an attribute), the reference is "included", which means the character it refers to is looked up and substituted for the reference text.
The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.
(See section 4.4 of the XML spec)
This is baked into the XML spec and the Microsoft XML stack is doing what it's required to do: process character references.
The best I can see you doing is to take a peek at these old XML.com articles. One of them uses XSLT's disable-output-escaping, so that text such as &amp;#123; would be written out as &#123; in the output.
http://www.xml.com/pub/a/2001/03/14/trxml10.html
<!DOCTYPE stylesheet [
<!ENTITY ntilde
"<xsl:text disable-output-escaping='yes'>&amp;ntilde;</xsl:text>">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output doctype-system="testOut.dtd"/>
<xsl:template match="test">
<testOut>
The Spanish word for "Spain" is "Espa&ntilde;a".
<xsl:apply-templates/>
</testOut>
</xsl:template>
</xsl:stylesheet>
And this one which uses XSL to convert specific character references into other text sequences (to accomplish the same goal as the previous link).
http://www.xml.com/lpt/a/1426
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output use-character-maps="cm1"/>
<xsl:character-map name="cm1">
<xsl:output-character character="&#160;" string="&amp;nbsp;"/>
<xsl:output-character character="é" string="&amp;#233;"/>
<xsl:output-character character="ô" string="&amp;#244;"/>
<xsl:output-character character="—" string="--"/>
</xsl:character-map>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
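One more option, assuming you only need the reference in the serialized output rather than inside the DOM: XmlWriter.WriteCharEntity forces a character reference to be written. In this sketch (names are illustrative), note that it emits the hexadecimal form &#x7B; rather than &#123;; the two are equivalent in XML, and any parser reading the result back will turn either into { again, as described above.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CharEntityDemo
{
    // Writes <abc x="..."/> with the brace forced out as a character reference.
    public static string Write()
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var sw = new StringWriter())
        {
            using (XmlWriter w = XmlWriter.Create(sw, settings))
            {
                w.WriteStartElement("abc");
                w.WriteStartAttribute("x");
                w.WriteCharEntity('{'); // serialized as a hexadecimal reference
                w.WriteEndAttribute();
                w.WriteEndElement();
            }
            return sw.ToString();
        }
    }
}
```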
You should always write such strings with a preceding @ (a verbatim string literal), like so: @"My /?.,<> STRING". I don't know whether that will solve your issue, though.
I would approach the problem using the XmlNode class from XmlDocument. You can use its Attributes property, which is much easier. Check it out here:
http://msdn.microsoft.com/en-us/library/system.xml.xmlnode.attributes.aspx
