Issues querying google sitemap.xml with Linq to XML - c#

I have a Linq-2-XML query that will not work if a google sitemap that I have created has its urlset element populated with attributes but will work fine if there are no attributes present.
Can't query:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.foo.com/index.htm</loc>
<lastmod>2010-05-11</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.foo.com/about.htm</loc>
<lastmod>2010-05-11</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
</urlset>
Can query:
<?xml version="1.0" encoding="utf-8"?>
<urlset>
<url>
<loc>http://www.foo.com/index.htm</loc>
<lastmod>2010-05-11</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.foo.com/about.htm</loc>
<lastmod>2010-05-11</lastmod>
<changefreq>monthly</changefreq>
<priority>1.0</priority>
</url>
</urlset>
The query:
XDocument xDoc = XDocument.Load(#"C:\Test\sitemap.xml");
var sitemapUrls = (from l in xDoc.Descendants("url")
select l.Element("loc").Value);
foreach (var item in sitemapUrls)
{
Console.WriteLine(item.ToString());
}
What would be the reason for this?

See the "xmlns=" tag in the XML? You need to specify the namespace. Test the following modification of your code:
XDocument xDoc = XDocument.Load(#"C:\Test\sitemap.xml");
XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
var sitemapUrls = (from l in xDoc.Descendants(ns + "url")
select l.Element(ns + "loc").Value);
foreach (var item in sitemapUrls)
{
Console.WriteLine(item.ToString());
}

Related

Unable to parse xml string using Xdocument and Linq

I would like to parse the below xml using XDocument in Linq.
<?xml version="1.0" encoding="UTF-8"?>
<string xmlns="http://tempuri.org/">
<Sources>
<Item>
<Id>1</Id>
<Name>John</Name>
</Item>
<Item>
<Id>2</Id>
<Name>Max</Name>
</Item>
<Item>
<Id>3</Id>
<Name>Ricky</Name>
</Item>
</Sources>
</string>
My parsing code is :
var xDoc = XDocument.Parse(xmlString);
var xElements = xDoc.Element("Sources")?.Elements("Item");
if (xElements != null)
foreach (var source in xElements)
{
Console.Write(source);
}
xElements is always null. I tried using namespace as well, it did not work. How can I resolve this issue?
Try below code:
string stringXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><string xmlns=\"http://tempuri.org/\"><Sources><Item><Id>1</Id><Name>John</Name></Item><Item><Id>2</Id><Name>Max</Name></Item><Item><Id>3</Id><Name>Ricky</Name></Item></Sources></string>";
XDocument xDoc = XDocument.Parse(stringXml);
var items = xDoc.Descendants("{http://tempuri.org/}Sources")?.Descendants("{http://tempuri.org/}Item").ToList();
I tested it and it correctly shows that items has 3 lements :) Maybe you used namespaces differently (it's enough to inspect xDoc objct in object browser and see its namespace).
You need to concatenate the namespace and can directly use Descendants method to fetch all Item nodes like:
XNamespace ns ="http://tempuri.org/";
var xDoc = XDocument.Parse(xmlString);
var xElements = xDoc.Descendants(ns + "Item");
foreach (var source in xElements)
{
Console.Write(source);
}
This prints on Console:
<Item xmlns="http://tempuri.org/">
<Id>1</Id>
<Name>John</Name>
</Item><Item xmlns="http://tempuri.org/">
<Id>2</Id>
<Name>Max</Name>
</Item><Item xmlns="http://tempuri.org/">
<Id>3</Id>
<Name>Ricky</Name>
</Item>
See the working DEMO Fiddle

delete a node from sitemap xml

I have sitemap format as below.
I want to delete a complete node that
I find loc.
For example:
Where a node has <loc>with a value of http://www.my.com/en/flight1.
I want to delete the <url> node and his child
I want to delete loc
than lastmod than priority and than changefreq
<url>
<loc>http://www.my.com/en/flight1
</loc>
<lastmod>2015-03-05</lastmod>
<priority>0.5</priority>
<changefreq>never</changefreq>
</url>
<url>
<loc>
http://www.my.com/en/flight2
</loc>
<lastmod>2015-03-05</lastmod>
<priority>0.5</priority>
<changefreq>never</changefreq>
</url>
<url>
<loc>
http://www.my.com/en/flight3
</loc>
<lastmod>2015-03-05</lastmod>
<priority>0.5</priority>
<changefreq>never</changefreq>
</url>
If you're using C# you should use System.xml.linq (XDocument)
You can remove a node like so:
XDocument.Load(/*URI*/);
var elements = document.Root.Elements().Where(e => e.Element("loc") != null && e.Element("loc").Value == "http://www.my.com/en/flight1");
foreach (var url in elements)
{
url.Remove();
}

XMLDocument Access To Last Child

i'm developing sitemap project and have a problem;
this is my xml file
<?xml version="1.0" encoding="UTF-8"?>
<urlset>
<url ID="1">
<loc>http://www.serkancamur.com/Site/Index/sayfa/Hakkimda</loc>
<changefreq>Daily</changefreq>
<priority>0,9</priority>
</url>
</urlset>
this is my c# code:
int ID = Convert.ToInt32(doc.SelectSingleNode("urlset").LastChild.Attributes["ID"].Value);
this works but look to urlset element attributes:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" schemaLocation="http://www.serkancamur.com/sitemap.xsd">
<url ID="1">
<loc>http://www.serkancamur.com/Site/Index/sayfa/Hakkimda</loc>
<changefreq>Daily</changefreq>
<priority>0,9</priority>
</url>
</urlset>
i only added attributes to urlset element,so why this doesn't work?
You need to use XmlNamespaceManager
Try this
int id = 0;
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("x", "http://www.sitemaps.org/schemas/sitemap/0.9");
var urlset = doc.SelectSingleNode("//x:urlset", nsmgr);
id = Convert.ToInt32(urlset.LastChild.Attributes["ID"].Value);
Hope this helps

Looping through nodes with XPathNavigator

Hi below is a sample of the xml i am using. I been through allsorts of options I can think of to be able to start at the personData node and iterate the results, and nothing i seem to try works unless I navigate manually through each child node from the root down. Can anyone advise on how I can do this without starting at the root
My code currently is
using (var r = File.OpenText(#"C:\S\sp.xml"))
{
XPathDocument document = new XPathDocument(XmlReader.Create(r));
XPathNavigator xPathNav = document.CreateNavigator();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xPathNav.NameTable);
nsmgr.AddNamespace("g2", "http://person.transferobject.com/xsd");
XPathNodeIterator xni = xPathNav.Select("/g2:companys/g2:company/g2:person/g2:personData", nsmgr);
foreach (XPathNavigator nav in xni)
Console.WriteLine(nav.Name);
}
Xml
<?xml version="1.0" encoding="UTF-8"?>
<Header xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<dataSource xmlns="http://person.transferobject.com/xsd">IG2</dataSource>
<dateTime xmlns="http://person.transferobject.com/xsd">Thu Mar 21 15:56:42 GMT 2013</dateTime>
<formatVersion xmlns="http://person.transferobject.com/xsd">2.0</formatVersion>
<companys xmlns="http://person.transferobject.com/xsd">
<company>
<errorMessages xsi:nil="true"/>
<person>
<personData>
<address>
<address1 xmlns="http://transferobject.com/xsd">37 Smith St</address1>
<county xmlns="http://transferobject.com/xsd">COUNTY-37</county>
<postcode xmlns="http://transferobject.com/xsd">Po12 123</postcode>
</address>
<basicDetails>
<currentFirstName xmlns="http://transferobject.com/xsd">Fred</currentFirstName>
<currentLastName xmlns="http://transferobject.com/xsd">Bloggs</currentLastName >
<currentStage xmlns="http://transferobject.com/xsd">H1</currentStage>
<currentGroup xmlns="http://transferobject.com/xsd">3</currentGroup>
<dob xmlns="http://transferobject.com/xsd">2000-04-25</dob>
<email xmlns="http://transferobject.com/xsd">AN#AN.AOM</email>
<entryDate xmlns="http://transferobject.com/xsd">2003-09-03</entryDate>
</basicDetails>
</personData>
<personData>
<address>
<address1 xmlns="http://transferobject.com/xsd">37 Smith St</address1>
<county xmlns="http://transferobject.com/xsd">COUNTY-37</county>
<postcode xmlns="http://transferobject.com/xsd">Po12 123</postcode>
</address>
<basicDetails>
<currentFirstName xmlns="http://transferobject.com/xsd">John</currentFirstName>
<currentLastName xmlns="http://transferobject.com/xsd">Bloggs</currentLastName >
<currentStage xmlns="http://transferobject.com/xsd">H1</currentStage>
<currentGroup xmlns="http://transferobject.com/xsd">3</currentGroup>
<dob xmlns="http://transferobject.com/xsd">1999-04-25</dob>
<email xmlns="http://transferobject.com/xsd">AN#AN.AOM</email>
<entryDate xmlns="http://transferobject.com/xsd">2003-09-03</entryDate>
</basicDetails>
</personData>
</person>
</company>
</companys>
</header>
I know you're using XPath, but as you have an answer with XPath I'll give one using Linq
using System;
using System.Linq;
using System.Xml.Linq;
namespace xmlTest
{
class Program
{
static void Main()
{
XDocument doc = XDocument.Load("C:\\Users\\me\\Desktop\\so.xml");
var personDataDetails = (from p in doc.Descendants().Elements()
where p.Name.LocalName == "personData"
select p);
foreach (var item in personDataDetails)
{
Console.WriteLine(item.ToString());
}
Console.ReadKey();
}
}
}
Are you just asking how you can iterate through the personData nodes without listing the full path? If that's what you want to do, you can just do this:
XPathNodeIterator xni = xPathNav.Select("//g2:personData", nsmgr);

Work with XML file

I have a sitemap file for search engines:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://site.com/</loc>
</url>
<url>
<loc>http://site.com/about</loc>
</url>
<url>
<loc>http://site.com/contacts</loc>
</url>
<url>
<loc>http://site.com/articles/article1.html</loc>
</url>
<url>
<loc>http://site.com/users/123</loc>
</url>
</urlset>
How to insert a new node?
When I use xDoc.Element("url") or xDoc.Element("urlset") or xDoc.Element("xml") or Doc.Elements(...) I get null always. It's very strange.
The code below shows how to navigate within the xml and how to insert a new node
XDocument xDoc = XDocument.Load("sitemap.xml");
XNamespace ns = xDoc.Root.Name.Namespace;
// Navigation within the xml
XElement urlset = xDoc.Element(ns + "urlset");
Console.WriteLine(urlset.Name.LocalName); // -> "urlset"
IEnumerable<XElement> urls = urlset.Elements(ns + "url");
foreach (var url in urls)
{
XElement loc = url.Element(ns + "loc");
Console.WriteLine(loc.Value); // -> "http://site.com/", ...
}
// Inserting a new node under "urlset" node
urlset.Add(
new XElement(ns + "url",
new XElement(ns + "loc",
"http://site.com//questions/4183526")));

Categories

Resources