Can I use WebUtility.HtmlDecode to decode XML? - c#

I have an XML-encoded attribute value. This is actually from a processing instruction. So the original data looks something like this:
<root><?pi key="value" data="&lt;foo attr=&quot;bar&quot;&gt;Hello world&lt;/foo&gt;" ?></root>
I can parse it like this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public class Program
{
    // The PI's data pseudo-attribute is XML-encoded (&lt; &gt; &quot;).
    private const string RawData = @"<root><?pi key=""value"" data=""&lt;foo attr=&quot;bar&quot;&gt;Hello world&lt;/foo&gt;"" ?></root>";

    public static void Main()
    {
        XDocument doc = GetXDocumentFromProcessingInstruction();
        IEnumerable<XElement> fooElements = doc.Descendants("foo");
        // ...
    }

    private static XProcessingInstruction LoadProcessingInstruction()
    {
        XDocument doc = XDocument.Parse(RawData);
        return doc
            .DescendantNodes()
            .OfType<XProcessingInstruction>()
            .First();
    }

    private static XDocument GetXDocumentFromProcessingInstruction()
    {
        XProcessingInstruction processingInstruction = LoadProcessingInstruction();

        // QUESTION:
        // Can there ever be a situation where HtmlDecode wouldn't decode the XML correctly?
        string decodedXml = WebUtility.HtmlDecode(processingInstruction.Data);

        // This works well, but it contains the attributes of the processing
        // instruction as text.
        string dummyXml = $"<dummy>{decodedXml}</dummy>";
        return XDocument.Parse(dummyXml);
    }
}
This works absolutely fine, as far as I can tell.
But I am wondering if there might be some edge cases where it may fail, because of differences in how data would be encoded in XML vs. HTML.
Does anybody have more insight?
Edit:
Sorry, I made some incorrect assumptions about XProcessingInstruction.Data, but the code above was still working fine, so my question stands.
I have nonetheless rewritten my code and wrapped everything in an XElement, which (of course) removed the issue altogether:
private static XDocument GetXDocumentFromProcessingInstruction2()
{
    XProcessingInstruction processingInstruction = LoadProcessingInstruction();
    // Turn the PI's pseudo-attributes into real attributes, so the XML parser decodes them.
    string encodedXml = string.Format("<dummy {0} />", processingInstruction.Data);
    XElement element = XElement.Parse(encodedXml);
    string parsedXml = element.Attribute("data").Value;
    return XDocument.Parse(parsedXml);
}
So this does exactly what I need. But since WebUtility.HtmlDecode worked well enough, I would still like to know whether there is any situation in which the first approach would have failed.
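One way to probe for edge cases is to decode the same attribute text both ways and compare the results. The following is a minimal sketch, not a definitive answer: `DecodeComparison`, `ViaHtml` and `ViaXml` are hypothetical names, and `ViaXml` assumes the text contains no raw double quote, because it reuses the trick from `GetXDocumentFromProcessingInstruction2` of placing the text inside a real attribute. Differences I would expect: HTML-only named entities such as `&copy;` are decoded by HtmlDecode but are undefined in XML, and a bare `&` is passed through by HtmlDecode but rejected by the XML parser. The XML spec also normalizes literal whitespace characters (tab, newline) in attribute values to spaces, which HtmlDecode does not do.

```csharp
using System.Net;
using System.Xml;
using System.Xml.Linq;

public static class DecodeComparison
{
    // Decode using the HTML entity rules of WebUtility.HtmlDecode.
    public static string ViaHtml(string encoded) => WebUtility.HtmlDecode(encoded);

    // Decode using the XML parser by placing the text inside a real attribute.
    // Returns null when the text is not a legal XML attribute value.
    // Assumption: the text contains no raw '"' character.
    public static string ViaXml(string encoded)
    {
        try
        {
            return XElement.Parse("<x a=\"" + encoded + "\" />").Attribute("a").Value;
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

For example, `DecodeComparison.ViaHtml("&copy;")` returns `"©"`, while `DecodeComparison.ViaXml("&copy;")` returns `null` because the XML parser throws for the undeclared entity. For input that was produced by a real XML serializer, both should agree, since such a serializer only emits XML's predefined entities and numeric character references.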

After removing the question marks and adding a forward slash at the end of your input (so the processing instruction becomes an ordinary element), I got this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // The data attribute is XML-encoded; XElement.Parse decodes it.
            string input = "<pi data=\"&lt;foo attr=&quot;bar&quot;&gt;Hello world&lt;/foo&gt;\" />";
            XElement pi = XElement.Parse(input);
            string data = (string)pi.Attribute("data");   // <foo attr="bar">Hello world</foo>
            XElement foo = XElement.Parse(data);
            string attr = (string)foo.Attribute("attr");  // bar
            string innertext = (string)foo;               // Hello world
        }
    }
}

Related

XPath query with multiple namespaces

What is the correct XPath to pull the Dublin Core identifier below?
I added a namespace manager with these entries:
// Add the namespace.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(m_xml.NameTable);
nsmgr.AddNamespace("mets", "http://www.loc.gov/METS/");
nsmgr.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");
nsmgr.AddNamespace("dcterms", "http://purl.org/dc/terms/");
I have tried 15 or so different XPath variations, including the ones below. When I do not get an error, the result is null.
//xml_uuid = m_xml.SelectSingleNode("/mets:mets/mets:dmdSec/mets:mdWrap/mets:xmlData/dcterms:dublincore/dc:identifier").Value;
xml_uuid = m_xml.SelectSingleNode("//dc:identifier",nsmgr).Value;
Here is the xml I am working with:
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd">
  <mets:metsHdr CREATEDATE="2017-03-08T20:13:27" />
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dcterms:dublincore xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd">
          <dc:identifier>F2015.5</dc:identifier>
        </dcterms:dublincore>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
etc...
I am trying to assign the dc:identifier - F2015.5 in this case - to a string.
With LINQ to XML, ignoring the namespaces:
using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication131
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            string results = (string)doc.Descendants().Where(x => x.Name.LocalName == "identifier").FirstOrDefault();
        }
    }
}
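If you would rather not ignore namespaces, a namespace-aware variant is a small change: declare an `XNamespace` with the URI from the `xmlns:dc` declaration and combine it with the local name. A minimal sketch; `DublinCoreReader` and `GetIdentifier` are hypothetical names:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DublinCoreReader
{
    // URI taken from the xmlns:dc declaration in the METS document.
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Returns the text of the first dc:identifier element, or null if there is none.
    public static string GetIdentifier(XDocument doc) =>
        (string)doc.Descendants(Dc + "identifier").FirstOrDefault();
}
```

With the answer's file, `DublinCoreReader.GetIdentifier(XDocument.Load(FILENAME))` should return `F2015.5`. Matching on the full name avoids picking up an unrelated `identifier` element from another vocabulary.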
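Here is one working answer. The main fix is to use .InnerText rather than .Value: for an element node, XmlNode.Value is always null, so even a query that matches the element returns null when you read .Value.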
xml_uuid = m_xml.SelectSingleNode("//mets:xmlData[1]/dcterms:dublincore[1]/dc:identifier[1]", nsmgr).InnerText;
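To see the difference between `.Value` and `.InnerText` in isolation, here is a self-contained sketch using `XmlDocument` with the same namespace manager. The METS sample in the question is truncated, so the test input is a trimmed, closed-off copy (an assumption for illustration); `MetsXPathDemo` and `ReadIdentifier` are hypothetical names. Note also that the commented-out attempt in the question does not pass `nsmgr`; without a namespace manager, a prefixed XPath like that throws an `XPathException` instead of returning a node.

```csharp
using System.Xml;

public static class MetsXPathDemo
{
    // Returns (Value, InnerText) of the node matched by //dc:identifier.
    public static (string Value, string InnerText) ReadIdentifier(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefixes only need to map to the same URIs as the document;
        // they do not have to match the prefixes used in the XML itself.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("mets", "http://www.loc.gov/METS/");
        nsmgr.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");
        nsmgr.AddNamespace("dcterms", "http://purl.org/dc/terms/");

        XmlNode node = doc.SelectSingleNode("//dc:identifier", nsmgr);

        // For an element, Value is null; the text lives in InnerText
        // (equivalently, in the child text node's Value).
        return (node?.Value, node?.InnerText);
    }
}
```

This suggests the simpler `//dc:identifier` query in the question was probably matching all along; only the property read afterwards was wrong.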

Loading from an Xml file

Okay, so this is very basic, but I literally started learning how to read an XML document today, and I usually find answers here more comprehensive than online guides. Essentially, I'm coding a Pokemon game that uses an XML file to load all the stats (it's one I swiped from someone else). The user will input a Pokemon, and I then want to read that Pokemon's base stats from the XML file. To give a template, this is one of the Pokemon:
<Pokemon>
  <Name>Bulbasaur</Name>
  <BaseStats>
    <Health>5</Health>
    <Attack>5</Attack>
    <Defense>5</Defense>
    <SpecialAttack>7</SpecialAttack>
    <SpecialDefense>7</SpecialDefense>
    <Speed>5</Speed>
  </BaseStats>
</Pokemon>
The code I've tried is:
XDocument pokemonDoc = XDocument.Load(@"File Path Here");
while (pokemonDoc.Descendants("Pokemon").Elements("Name").ToString() == cbSpecies.SelectedText)
{
    var Stats = pokemonDoc.Descendants("Pokemon").Elements("BaseStats");
}
But this just returns pokemonDoc as null. Any idea where I'm going wrong?
NOTE:
cbSpeciesSelect is where the user selects which species of Pokemon they want.
The file path definitely works, as I've already used it elsewhere in my program.
The while loop never actually starts.
Try LINQ to XML:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            var pokemon = doc.Descendants("Pokemon").Select(x => new {
                name = (string)x.Element("Name"),
                health = (int)x.Element("BaseStats").Element("Health"),
                attack = (int)x.Element("BaseStats").Element("Attack"),
                defense = (int)x.Element("BaseStats").Element("Defense"),
                specialAttack = (int)x.Element("BaseStats").Element("SpecialAttack"),
                specialDefense = (int)x.Element("BaseStats").Element("SpecialDefense"),
                speed = (int)x.Element("BaseStats").Element("Speed"),
            }).FirstOrDefault();
        }
    }
}
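As written, that snippet returns the first Pokemon in the file regardless of what the user picked. To select by name, filter on `<Name>` first. This also sidesteps the problem in the question: `Elements("Name")` returns a collection, so calling `ToString()` on it yields a type name rather than a Pokemon name, and the `while` condition is never true. A minimal sketch; `PokemonStats`, `FindBaseStats` and the `species` parameter are hypothetical names, with `species` standing in for whatever the combo box returns:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class PokemonStats
{
    // Returns the <BaseStats> element of the Pokemon whose <Name> equals
    // `species` (case-sensitive), or null if no such Pokemon exists.
    public static XElement FindBaseStats(XDocument doc, string species) =>
        doc.Descendants("Pokemon")
           .Where(p => (string)p.Element("Name") == species)
           .Select(p => p.Element("BaseStats"))
           .FirstOrDefault();
}
```

For example, `PokemonStats.FindBaseStats(pokemonDoc, "Bulbasaur")` would return the `<BaseStats>` element shown in the question, and `(int)stats.Element("SpecialAttack")` would then read `7`.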
Can you try the code below?
foreach (var e in pokemonDoc.Descendants("Pokemon").Elements("Name"))
{
    if (e.Value == cbSpecies.SelectedText)
    {
        // Take the BaseStats of the matching <Pokemon>, not of every Pokemon.
        var Stats = e.Parent.Element("BaseStats");
    }
}

Selecting attribute value using XPath and HtmlAgilityPack

I am trying to get the second attribute value of a meta tag using an XPath expression in Html Agility Pack:
The meta tag:
<meta name="pubdate" content="2012-08-30" />
The XPath expression I am using:
//meta[@name='pubdate']/@content
But it does not return anything. I searched around and tried this solution:
//meta[@name='pubdate']/string(@content)
Another way:
string(//meta[@name='pubdate']/@content)
But both of these throw an XML exception in Html Agility Pack.
Another solution did not work either:
//meta[@name='pubdate']/data(@content)
For various reasons I want to use XPath alone (not Html Agility Pack functions) to get the attribute value. The function I use is below:
date = TextfromOneNode(document.DocumentNode.SelectSingleNode(".//body"), "meta[@name='pubdate']/@content");
public static string TextfromOneNode(HtmlNode node, string xmlPath)
{
    string toReturn = "";
    if (node.SelectSingleNode(xmlPath) != null)
    {
        toReturn = node.SelectSingleNode(xmlPath).InnerText;
    }
    return toReturn;
}
So far it looks like there is no way to use an XPath expression to get an attribute value directly.
Any ideas?
There is a way using HtmlNodeNavigator:
public static string TextfromOneNode(HtmlNode node, string xmlPath)
{
    string toReturn = "";
    var navigator = (HtmlAgilityPack.HtmlNodeNavigator)node.CreateNavigator();
    var result = navigator.SelectSingleNode(xmlPath);
    if (result != null)
    {
        toReturn = result.Value;
    }
    return toReturn;
}
The following console app example demonstrates how HtmlNodeNavigator.SelectSingleNode() works both with an XPath that returns an element and with an XPath that returns an attribute:
var raw = @"<div>
<meta name='pubdate' content='2012-08-30' />
<span>foo</span>
</div>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(raw);
var navigator = (HtmlAgilityPack.HtmlNodeNavigator)doc.CreateNavigator();
var xpath1 = "//meta[@name='pubdate']/@content";
var xpath2 = "//span";
var result = navigator.SelectSingleNode(xpath1);
Console.WriteLine(result.Value);
result = navigator.SelectSingleNode(xpath2);
Console.WriteLine(result.Value);
Output:
2012-08-30
foo
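As for why the attempts in the question throw: .NET implements XPath 1.0, so `/string(@content)` and `/data(@content)` (function calls as a path step) are XPath 2.0 syntax and fail to compile. `string(//meta[@name='pubdate']/@content)` is valid XPath 1.0, but it evaluates to a string rather than a node, so `SelectSingleNode` rejects it; it has to go through `XPathNavigator.Evaluate` instead. The sketch below shows both working forms with the standard `XmlDocument` navigator as a stand-in for HtmlNodeNavigator (which derives from `XPathNavigator`); that `Evaluate` behaves the same on HtmlNodeNavigator is an assumption I have not verified. `MetaXPathDemo` and its methods are hypothetical names.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class MetaXPathDemo
{
    // Well-formed stand-in for the HTML page (assumption for illustration).
    private const string Html =
        "<html><head><meta name='pubdate' content='2012-08-30' /></head><body /></html>";

    private static XPathNavigator Navigator()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Html);
        return doc.CreateNavigator();
    }

    // XPath 1.0: select the attribute node itself, then read its Value.
    public static string SelectAttribute() =>
        Navigator().SelectSingleNode("//meta[@name='pubdate']/@content")?.Value;

    // XPath 1.0: string() around the whole path returns a string, not a node,
    // so it must be evaluated rather than selected.
    public static string EvaluateString() =>
        (string)Navigator().Evaluate("string(//meta[@name='pubdate']/@content)");
}
```

Both methods should return `2012-08-30`. The `SelectAttribute` form corresponds to the HtmlNodeNavigator answer above.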
Using LINQ to XML:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "<meta name=\"pubdate\" content=\"2012-08-30\" />";
            XElement meta = XElement.Parse(input);
            DateTime output = (DateTime)meta.Attribute("content");
        }
    }
}

Ebay API FixedPriceTransaction - ReturnedTransactionCountActual returns 0?

When an item is bought with Buy It Now and immediate payment, my code is being triggered correctly (the code sends me an email to test it).
What I want is the ebay item id of the item sold which I believe is here (found this sample in the documentation):
<GetItemTransactionsResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <Timestamp>2015-07-02T00:09:03.273Z</Timestamp>
  <Ack>Success</Ack>
  <Version>967</Version>
  <Build>E967_CORE_BUNDLED_11481347_R1</Build>
  <PaginationResult>
    <TotalNumberOfPages>1</TotalNumberOfPages>
    <TotalNumberOfEntries>1</TotalNumberOfEntries>
  </PaginationResult>
  <HasMoreTransactions>false</HasMoreTransactions>
  <TransactionsPerPage>100</TransactionsPerPage>
  <PageNumber>1</PageNumber>
  <ReturnedTransactionCountActual>1</ReturnedTransactionCountActual>
  <Item>
    <AutoPay>false</AutoPay>
    <Currency>USD</Currency>
    <ItemID>110048746230</ItemID>
I'm using the eBay API:
public void EbayCall(GetItemTransactionsResponseType response)
Is that the correct object to expect? I can't see why response.ReturnedTransactionCountActual is 0. Also response.Item.ItemID is null.
response.Ack returns Success.
What am I doing wrong?
It looks like you are getting an XML response. The ID is shown as 110048746230. The XML can easily be parsed in C#. I used LINQ to XML below with XDocument.Load(FILENAME), but you can use XDocument.Parse(xml) with a string instead.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication3
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            XDocument doc = XDocument.Load(FILENAME);
            string itemID = (string)doc.Descendants().Where(x => x.Name.LocalName == "ItemID").FirstOrDefault();
        }
    }
}
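The `LocalName` comparison above works because it ignores namespaces. Every element in the sample response is in the default namespace `urn:ebay:apis:eBLBaseComponents`, so a namespace-aware lookup needs that `XNamespace`; a plain `Element("ItemID")` would find nothing. A minimal sketch based only on the sample XML shown in the question; `EbayResponseReader` and its methods are hypothetical names, and this reads raw XML rather than the SDK's `GetItemTransactionsResponseType` object:

```csharp
using System.Xml.Linq;

public static class EbayResponseReader
{
    // Default namespace declared on <GetItemTransactionsResponse>.
    private static readonly XNamespace Ebay = "urn:ebay:apis:eBLBaseComponents";

    // ItemID of the first <Item>, or null if the response has none.
    public static string GetItemId(XDocument doc) =>
        (string)doc.Root?.Element(Ebay + "Item")?.Element(Ebay + "ItemID");

    // ReturnedTransactionCountActual, or 0 if the element is missing.
    public static int GetReturnedCount(XDocument doc) =>
        (int?)doc.Root?.Element(Ebay + "ReturnedTransactionCountActual") ?? 0;
}
```

With the sample response, `GetItemId` should return `"110048746230"` and `GetReturnedCount` should return `1`.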

accessing descendant elements from an xml returned null using linq to xml in c#

Please help me out. My application needs to consume a web service that returns XML. The code that downloads the XML works fine, but when I try to extract values from it I keep getting null: specifically, the GetLocationFromXml() method returns null, while the GetLocationAsXMLFromHost() method works fine.
This is the complete class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using AMIS.Core.DTOs;
using System.Net;
using System.Xml.Linq;
using System.Xml;

public class GeoLocationService
{
    private string _hostWebSite = "http://api.hostip.info";
    private readonly XNamespace _hostNameSpace = "http://www.hostip.info/api";
    private readonly XNamespace _hostGmlNameSpace = "http://www.opengis.net/gml";

    public LocationInfo GetLocationInfoFromIPAddress(string userHostIpAddress)
    {
        IPAddress ipAddress = null;
        IPAddress.TryParse(userHostIpAddress, out ipAddress);
        string xmlData = GetLocationAsXMLFromHost(ipAddress.ToString());
        LocationInfo locationInfo = GetLocationFromXml(xmlData);
        return locationInfo;
    }

    private string GetLocationAsXMLFromHost(string userHostIpAddress)
    {
        WebClient webClient = new WebClient();
        string formattedUrl = string.Format(_hostWebSite + "/?ip={0}", userHostIpAddress);
        var xmlData = webClient.DownloadString(formattedUrl);
        return xmlData;
    }

    private LocationInfo GetLocationFromXml(string xmlData)
    {
        LocationInfo locationInfo = new LocationInfo();
        var xmlResponse = XDocument.Parse(xmlData);
        var nameSpace = (XNamespace)_hostNameSpace;
        var gmlNameSpace = (XNamespace)_hostGmlNameSpace;
        try
        {
            locationInfo = (from x in xmlResponse.Descendants(nameSpace + "Hostip")
                            select new LocationInfo
                            {
                                CountryName = x.Element(nameSpace + "countryName").Value,
                                CountryAbbreviation = x.Element(nameSpace + "countryAbbrev").Value,
                                LocationInCountry = x.Element(gmlNameSpace + "name").Value
                            }).SingleOrDefault();
        }
        catch (Exception)
        {
            throw;
        }
        return locationInfo;
    }
}
And the XML file is below:
<?xml version="1.0" encoding="iso-8859-1"?>
<HostipLookupResultSet version="1.0.1" xmlns:gml="http://www.opengis.net/gml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.hostip.info/api/hostip-1.0.1.xsd">
  <gml:description>This is the Hostip Lookup Service</gml:description>
  <gml:name>hostip</gml:name>
  <gml:boundedBy>
    <gml:Null>inapplicable</gml:Null>
  </gml:boundedBy>
  <gml:featureMember>
    <Hostip>
      <ip>41.78.8.3</ip>
      <gml:name>(Unknown city)</gml:name>
      <countryName>NIGERIA</countryName>
      <countryAbbrev>NG</countryAbbrev>
      <!-- Co-ordinates are unavailable -->
    </Hostip>
  </gml:featureMember>
</HostipLookupResultSet>
Given the comments, I suspect the problem may be as simple as:
private string _hostNameSpace = "hostip.info/api";
should be:
private string _hostNameSpace = "http://hostip.info/api";
(Ditto for the others.) Personally I'd make them XNamespace values to start with:
private static readonly XNamespace HostNameSpace = "http://hostip.info/api";
EDIT: Okay, after messing around with your example (which could have been a lot shorter and a lot more complete) I've worked out what's wrong: you're looking for elements using the "host namespace" - but the elements in the XML aren't in any namespace. Just get rid of those namespace bits, and it works fine.
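Applied to the question's method, the fix might look like the sketch below: look up `Hostip`, `countryName` and `countryAbbrev` with no namespace, and keep the GML namespace only for `gml:name`. `LocationInfo` here is a hypothetical stand-in for the question's `AMIS.Core.DTOs.LocationInfo` (assumed to have settable string properties), and `HostipParser` is a hypothetical class name. The `(string)` casts return null for a missing element instead of throwing, as `.Value` would.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for AMIS.Core.DTOs.LocationInfo.
public class LocationInfo
{
    public string CountryName { get; set; }
    public string CountryAbbreviation { get; set; }
    public string LocationInCountry { get; set; }
}

public static class HostipParser
{
    private static readonly XNamespace Gml = "http://www.opengis.net/gml";

    public static LocationInfo GetLocationFromXml(string xmlData)
    {
        XDocument xmlResponse = XDocument.Parse(xmlData);

        // <Hostip>, <countryName> and <countryAbbrev> are in no namespace,
        // so they are matched by plain name; only <gml:name> needs Gml.
        return (from x in xmlResponse.Descendants("Hostip")
                select new LocationInfo
                {
                    CountryName = (string)x.Element("countryName"),
                    CountryAbbreviation = (string)x.Element("countryAbbrev"),
                    LocationInCountry = (string)x.Element(Gml + "name")
                }).SingleOrDefault();
    }
}
```

With the XML from the question, this should yield `NIGERIA`, `NG` and `(Unknown city)`.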
