Why is the XmlReader skipping elements? - c#

Please note this question is specific to XmlReader and not whether to use XDocument or XmlReader.
I have an XML fragment as:
private string GetXmlFragment()
{
return @"<bookstore>
<book genre='novel' ISBN='10-861003-324'>
<title>The Handmaid's Tale</title>
<price>19.95</price>
</book>
<book genre='novel' ISBN='1-861001-57-5'>
<title>Pride And Prejudice</title>
<price>24.95</price>
</book>
</bookstore>";
}
I also have an extension method as:
public static IEnumerable<XElement> GetElement(this XmlReader reader, string elementName)
{
reader.MoveToElement();
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element
&& reader.Name.Equals(elementName, StringComparison.InvariantCulture))
{
yield return XNode.ReadFrom(reader) as XElement;
}
}
}
I then try to get the two book elements by doing:
var xmlReaderSettings = new XmlReaderSettings
{
CheckCharacters = false,
ConformanceLevel = ConformanceLevel.Fragment,
IgnoreComments = true,
IgnoreWhitespace = true,
IgnoreProcessingInstructions = true
};
using (var stringReader = new StringReader(this.GetXmlFragment()))
using (var xmlReader = XmlReader.Create(stringReader, xmlReaderSettings))
{
xmlReader.GetElement("book").Count().ShouldBe(2);
}
However, I only get the first element. Debugging shows that as soon as I get the first element, the reader jumps to the title of the second book element.
The solution is inspired from HERE
Any help is much appreciated.

The problem is that, if there is no intervening whitespace, the call to XNode.ReadFrom() will leave the XML reader positioned right at the next element. The while condition then immediately consumes this element before we can check it. The fix is to not call XmlReader.Read() immediately afterwards, but to continue checking for nodes (as the read has been done implicitly):
while (reader.Read()) {
while (reader.NodeType == XmlNodeType.Element
&& reader.Name.Equals(elementName, StringComparison.InvariantCulture)) {
yield return XNode.ReadFrom(reader) as XElement;
}
}
(In case it's not clear, the if in the loop has been changed to a while.)
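To see the skip and the fix side by side, here is a minimal sketch. The names `ReaderSketch`, `WithIf`, `WithWhile`, and `Count` are made up for this illustration, and the XML is a shortened version of the bookstore fragment with no whitespace between the two book elements.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ReaderSketch
{
    // Same shape as the question's loop: Read() runs again after every ReadFrom().
    public static IEnumerable<XElement> WithIf(this XmlReader reader, string elementName)
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
            {
                yield return (XElement)XNode.ReadFrom(reader);
            }
        }
    }

    // Same shape as the fix: the inner while re-checks the node ReadFrom() stopped on.
    public static IEnumerable<XElement> WithWhile(this XmlReader reader, string elementName)
    {
        while (reader.Read())
        {
            while (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
            {
                yield return (XElement)XNode.ReadFrom(reader);
            }
        }
    }

    // Hypothetical helper: counts the "book" elements found by either variant.
    public static int Count(string xml, bool useFix)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return useFix ? reader.WithWhile("book").Count() : reader.WithIf("book").Count();
        }
    }
}

public static class Program
{
    public const string Adjacent =
        "<bookstore><book><title>A</title></book><book><title>B</title></book></bookstore>";

    public static void Main()
    {
        Console.WriteLine(ReaderSketch.Count(Adjacent, useFix: false)); // second book is skipped
        Console.WriteLine(ReaderSketch.Count(Adjacent, useFix: true));  // both books are found
    }
}
```

With the adjacent book elements above, the first variant should report one element and the second variant two.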

public static IEnumerable<XElement> GetElement(this XmlReader reader, string elementName)
{
while (!reader.EOF)
if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
yield return XNode.ReadFrom(reader) as XElement;
else
reader.Read();
}

The code skips every other book tag because the book tags immediately follow each other. XNode.ReadFrom leaves the reader positioned at the next book tag, and the following Read call then moves past it before it can be checked, so that element is skipped. Try the code below, which I developed and which has always worked for me.
public static IEnumerable<XElement> GetElement(XmlReader reader, string elementName)
{
List<XElement> books = new List<XElement>();
while (!reader.EOF)
{
if(reader.Name != elementName)
{
reader.ReadToFollowing(elementName);
}
if(!reader.EOF)
{
books.Add((XElement)XElement.ReadFrom(reader));
}
}
return books;
}

Like others have said, XNode.ReadFrom advances your reader to the next book open tag (if there is no whitespace between them), and then reader.Read advances into the content of that tag.
See here for more information:
https://stackoverflow.com/a/26788230/3850405
Fix for your extension method:
public static IEnumerable<XElement> GetElement(this XmlReader reader, string elementName)
{
reader.MoveToElement();
reader.Read();
while (!reader.EOF)
{
if (reader.NodeType == XmlNodeType.Element
&& reader.Name.Equals(elementName, StringComparison.InvariantCulture))
{
yield return XNode.ReadFrom(reader) as XElement;
}
else
{
reader.Read();
}
}
}

Related

XML element with multiple different Text elements

I have the following elements as part of an XML document:
<RegisterEntry>
<EntryNumber>3</EntryNumber>
<EntryDate>2009-01-30</EntryDate>
<EntryType>Registered Charges</EntryType>
<EntryText>REGISTERED CHARGE dated 30 December 2008.</EntryText>
</RegisterEntry>
<RegisterEntry>
<EntryNumber>4</EntryNumber>
<EntryType>Registered Charges</EntryType>
<EntryText>REGISTERED CHARGE dated 30 December 2008.</EntryText>
</RegisterEntry>
I am using XmlReader to iterate through the document. RegisterEntry is an XmlNodeType.Element, and the values enclosed in its child elements are XmlNodeType.Text. How can I assign each of these Text values to a different variable, given that XmlReader returns an empty string for Name on a node of type Text? Also, the repeated elements do not always have the same number of child elements. Code below:
XmlTextReader reader = new XmlTextReader(fName);
if(reader.NodeType == XmlNodeType.Element && reader.Name =="RegisterEntry")
{
propEntryNo = "";
propEntryDate = "";
propEntryType = "";
propEntryText = "";
while(reader.Read())
{
if(reader.NodeType == XmlNodeType.Text && reader.Name == "EntryNumber" && reader.HasValue)
{
propEntryNo = reader.Value;
}
if (reader.NodeType == XmlNodeType.Text && reader.Name == "EntryDate" && reader.HasValue)
{
propEntryDate = reader.Value;
}
if (reader.NodeType == XmlNodeType.Text && reader.Name == "EntryType" && reader.HasValue)
{
propEntryType = reader.Value;
}
if (reader.NodeType == XmlNodeType.Text && reader.Name == "EntryText" && reader.HasValue)
{
propEntryText += reader.Value + ",";
}
if(reader.NodeType == XmlNodeType.EndElement && reader.Name == "RegisterEntry")
{
// add variable values to list
break;
}
}
}
In each of the if statements above the NodeType returns as Text and the Name as an empty string.
The XML element and the text inside are different nodes!
You have to read the content of the XML element first. Simple example:
switch (reader.Name)
{
// found a node with name = "EntryNumber" (type = Element)
case "EntryNumber":
// make sure it's not the closing tag
if (reader.IsStartElement())
{
// read the text inside the element, which is a separate node (type = Text)
reader.Read();
// get the value of the text node
propEntryNo = reader.Value;
}
break;
// ...
}
Another option would be ReadElementContentAsString
switch (reader.Name)
{
case "EntryNumber":
propEntryNo = reader.ReadElementContentAsString();
break;
// ...
}
Of course, these simple examples assume that the XML is in the expected format. You should include appropriate checks in your code.
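As a fuller illustration of the same idea, here is a minimal sketch that collects the child elements of each RegisterEntry into a dictionary, so a missing EntryDate simply has no key. The class name `RegisterEntrySketch`, the `Parse` method, and the `Entries` wrapper element are assumptions made for this example. Because ReadElementContentAsString already moves the reader past the end tag, the loop only calls Read() in the other branches. It assumes RegisterEntry is never written as an empty element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class RegisterEntrySketch
{
    // Returns one dictionary (child element name -> text) per RegisterEntry.
    public static List<Dictionary<string, string>> Parse(string xml)
    {
        var entries = new List<Dictionary<string, string>>();
        Dictionary<string, string> current = null;

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "RegisterEntry")
                {
                    current = new Dictionary<string, string>();
                    reader.Read();
                }
                else if (reader.NodeType == XmlNodeType.Element && current != null)
                {
                    // Capture the name first: the call below moves past the end tag.
                    string name = reader.Name;
                    current[name] = reader.ReadElementContentAsString();
                }
                else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "RegisterEntry")
                {
                    entries.Add(current);
                    current = null;
                    reader.Read();
                }
                else
                {
                    reader.Read(); // whitespace, wrapper element, and so on
                }
            }
        }
        return entries;
    }

    public static void Main()
    {
        const string xml = @"<Entries>
  <RegisterEntry><EntryNumber>3</EntryNumber><EntryDate>2009-01-30</EntryDate></RegisterEntry>
  <RegisterEntry><EntryNumber>4</EntryNumber></RegisterEntry>
</Entries>";

        foreach (var entry in Parse(xml))
        {
            string date = entry.TryGetValue("EntryDate", out var d) ? d : "(none)";
            Console.WriteLine(entry["EntryNumber"] + " | " + date);
        }
    }
}
```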
As for the other suggested solutions:
You could use XmlDocument or XDocument instead. The handling is easier, but the memory overhead is bigger.
Deserializing the XML into objects is another option. But I feel handling errors caused by an unexpected format is trickier then.
You can use XDocument to list your RegisterEntry child node like
class Program
{
static void Main(string[] args)
{
XDocument doc = XDocument.Load(@"C:\Users\xxx\source\repos\ConsoleApp4\ConsoleApp4\Files\XMLFile14.xml");
var registerEntries = doc.Descendants("RegisterEntry");
var result = (from e in registerEntries
select new
{
EntryNumber = e.Element("EntryNumber") != null ? Convert.ToInt32(e.Element("EntryNumber").Value) : 0,
EntryDate = e.Element("EntryDate") != null ? Convert.ToDateTime(e.Element("EntryDate").Value) : (DateTime?)null,
EntryType = e.Element("EntryType") != null ? e.Element("EntryType").Value : "",
EntryText = e.Element("EntryText") != null ? e.Element("EntryText").Value : "",
}).ToList();
foreach (var entry in result)
{
Console.WriteLine($"EntryNumber: {entry.EntryNumber}");
Console.WriteLine($"EntryDate: {entry.EntryDate}");
Console.WriteLine($"EntryType: {entry.EntryType}");
Console.WriteLine($"EntryText: {entry.EntryText}");
Console.WriteLine();
}
Console.ReadLine();
}
}
You can also perform further operations on the resulting list, for example:
//If you want to get all `EntryText` in xml to be comma separated then you can do like
string propEntryText = string.Join(", ", result.Select(x => x.EntryText));
//Get first register entry from xml
var getFirstRegisterEntry = result.FirstOrDefault();
//Get last register entry from xml
var getLastRegisterEntry = result.LastOrDefault();
//Get register entry from xml with specific condition
var getSpecificRegisterEntry = result.Where(x => x.EntryNumber == 3).SingleOrDefault();

C# xml reader, same element name

I'm trying to read an element from my XML file.
I need to read a string from a "link" element inside "metadata",
but there are two elements called "link" and I only need the second one:
<metadata>
<name>visit-2015-02-18.gpx</name>
<desc>February 18, 2015. Corn</desc>
<author>
<name>text</name>
<link href="http://snow.traceup.com/me?id=397760"/>
</author>
<link href="http://snow.traceup.com/stats/u?uId=397760&vId=1196854"/>
<keywords>Trace, text</keywords>
I need to read this line:
<link href="http://snow.traceup.com/stats/u?uId=397760&vId=1196854"/>
This code works fine for the first "link" tag:
public string GetID(string path)
{
string id = "";
XmlReader reader = XmlReader.Create(path);
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "link"))
{
if (reader.HasAttributes)
{
id = reader.GetAttribute("href");
MessageBox.Show(id + "= first id");
return id;
//id = reader.ReadElementContentAsString();
}
}
}
return id;
}
Does anyone know how I can skip the first "link" element?
or check if reader.ReadElementContentAsString() contains "Vid" or something like that?
I hope you can help me.
xpath is the answer :)
XmlReader reader = XmlReader.Create(path);
XmlDocument doc = new XmlDocument();
doc.Load(reader);
XmlNodeList nodes = doc.SelectNodes("metadata/link");
foreach(XmlNode node in nodes)
Console.WriteLine(node.Attributes["href"].Value);
Use the String.Contains method to check if the string contains the desired substring, in this case vId:
public string GetID(string path)
{
XmlReader reader = XmlReader.Create(path);
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "link"))
{
if (reader.HasAttributes)
{
var id = reader.GetAttribute("href");
if (id.Contains(@"&vId"))
{
MessageBox.Show(id + "= correct id");
return id;
}
}
}
}
return String.Empty;
}
If acceptable you can also use LINQ2XML:
var reader = XDocument.Load(path); // or XDocument.Parse(path);
// take the outer link
Console.WriteLine(reader.Root.Element("link").Attribute("href").Value);
The output is always:
http://snow.traceup.com/stats/u?uId=397760&vId=1196854= first id
Another option is to use XPath, as @user5507337 suggested.
XDocument example:
var xml = XDocument.Load(path); //assuming path points to file
var nodes = xml.Root.Elements("link");
foreach(var node in nodes)
{
var href = node.Attribute("href").Value;
}
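The XPath idea can also be combined with XDocument. The sketch below assumes that metadata is the root element, that there are no XML namespaces, and that the ampersand in the href is escaped as `&amp;` in the file (a real GPX file normally has a namespaced gpx root, so the path would need adjusting). The class name `LinkSketch` and the method `OuterLinkHref` are hypothetical. The path `/metadata/link` matches only a link that is a direct child of metadata, so the one nested inside author is ignored.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class LinkSketch
{
    // Returns the href of the link directly under <metadata>, or null if there is none.
    public static string OuterLinkHref(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement link = doc.XPathSelectElement("/metadata/link");
        return link?.Attribute("href")?.Value;
    }

    public static void Main()
    {
        const string xml = @"<metadata>
  <name>visit-2015-02-18.gpx</name>
  <author>
    <name>text</name>
    <link href='http://snow.traceup.com/me?id=397760'/>
  </author>
  <link href='http://snow.traceup.com/stats/u?uId=397760&amp;vId=1196854'/>
</metadata>";

        Console.WriteLine(OuterLinkHref(xml));
    }
}
```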

how to parse XML using XmlReader along with their closing tags?

Consider the following XML which I have to parse.
<root>
<item>
<itemId>001</itemId>
<itemName>test 1</itemName>
<description/>
</item>
</root>
I have to parse each of its tag and store it into a table as follows:
TAG_NAME TAG_VALUE IsContainer
------------ -------------- -----------
root null true
item null true
itemId 001 false
itemName test 1 false
description null false
/item null true
/root null true
Now to get this done, I am using XmlReader as this allows us to parse each & every node.
I am doing it as follows:
I created the following class to contain each tag's data
public class XmlTag
{
public string XML_TAG { get; set; }
public string XML_VALUE { get; set; }
public bool IsContainer { get; set; }
}
I am trying to get the list of tags(including closing ones) as follows:
private static List<XmlTag> ParseXml(string path)
{
var tags = new List<XmlTag>();
using (var reader = XmlReader.Create(path))
{
while (reader.Read())
{
var tag = new XmlTag();
bool shouldAdd = false;
switch (reader.NodeType)
{
case XmlNodeType.Element:
shouldAdd = true;
tag.XML_TAG = reader.Name;
//How do I get the VALUE of current reader?
//How do I determine if the current node contains children nodes to set IsContainer property of XmlTag object?
break;
case XmlNodeType.EndElement:
shouldAdd = true;
tag.XML_TAG = string.Format("/{0}", reader.Name);
tag.XML_VALUE = null;
//How do I determine if the current closing node belongs to a node which had children.. like ROOT or ITEM in above example?
break;
}
if(shouldAdd)
tags.Add(tag);
}
}
return tags;
}
but I am having difficulty determining the following:
How to determine if current ELEMENT contains children XML nodes? To set IsContainer property.
How to get the value of current node value if it is of type XmlNodeType.Element
Edit:
I have tried to use LINQ to XML as follows:
var xdoc = XDocument.Load(@"SampleItem.xml");
var tags = (from t in xdoc.Descendants()
select new XmlTag
{
XML_TAG = t.Name.ToString(),
XML_VALUE = t.HasElements ? null : t.Value,
IsContainer = t.HasElements
}).ToList();
This gives me the XML tags and their values but this does not give me ALL the tags including the closing ones. That's why I decided to try XmlReader. But If I have missed anything in LINQ to XML example, please correct me.
First of all, as noted by Jon Skeet in the comments, you should probably consider using other tools, such as XDocument with LINQ to XML (EDIT: an example with XDocument follows).
Having said that, here is the simplest solution for what you have currently (note that it's not the cleanest possible code, and it doesn't have much validation):
private static List<XmlTag> ParseElement(XmlReader reader, XmlTag element)
{
var result = new List<XmlTag>() { element };
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
element.IsContainer = true;
var newTag = new XmlTag() { XML_TAG = reader.Name };
if (reader.IsEmptyElement)
{
result.Add(newTag);
}
else
{
result.AddRange(ParseElement(reader, newTag));
}
break;
case XmlNodeType.Text:
element.XML_VALUE = reader.Value;
break;
case XmlNodeType.EndElement:
if (reader.Name == element.XML_TAG)
{
result.Add(new XmlTag()
{
XML_TAG = string.Format("/{0}", reader.Name),
IsContainer = element.IsContainer
});
}
return result;
}
}
return result;
}
private static List<XmlTag> ParseXml(string path)
{
var result = new List<XmlTag>();
using (var reader = XmlReader.Create(path))
{
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element)
{
result.AddRange(ParseElement(
reader,
new XmlTag() { XML_TAG = reader.Name }));
}
else if (reader.NodeType == XmlNodeType.EndElement)
{
result.Add(new XmlTag()
{
XML_TAG = string.Format("/{0}", reader.Name)
});
}
}
}
return result;
}
An example using XDocument. This will give a slightly different result for self-closing tags (<description/> in your case). You can change this behaviour easily, depending on what you want.
private static IEnumerable<XmlTag> ProcessElement(XElement current)
{
if (current.HasElements)
{
yield return new XmlTag()
{
XML_TAG = current.Name.ToString(),
IsContainer = true
};
foreach (var tag in current
.Elements()
.SelectMany(e => ProcessElement(e)))
{
yield return tag;
}
yield return new XmlTag()
{
XML_TAG = string.Format("/{0}", current.Name.ToString()),
IsContainer = true
};
}
else
{
yield return new XmlTag()
{
XML_TAG = current.Name.ToString(),
XML_VALUE = current.Value
};
yield return new XmlTag()
{
XML_TAG = string.Format("/{0}",current.Name.ToString())
};
}
}
And using it:
var xdoc = XDocument.Load(@"test.xml");
var tags = ProcessElement(xdoc.Root).ToList();
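For comparison, a non-recursive XmlReader variant is sketched below. It is only one possible approach, and the names `TagRow`, `TagTableSketch`, and `Parse` are made up for this example. A stack tracks the open elements: a child element marks its parent as a container, a text node sets the value of the innermost open element, and a closing row is emitted only for containers, which matches the table in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class TagRow
{
    public string Tag;
    public string Value;
    public bool IsContainer;
}

public static class TagTableSketch
{
    public static List<TagRow> Parse(string xml)
    {
        var rows = new List<TagRow>();
        var open = new Stack<TagRow>(); // elements whose end tag has not been read yet

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Any child element makes the enclosing element a container.
                        if (open.Count > 0) open.Peek().IsContainer = true;
                        var row = new TagRow { Tag = reader.Name };
                        rows.Add(row);
                        // Self-closing elements have no end tag, so they are never pushed.
                        if (!reader.IsEmptyElement) open.Push(row);
                        break;
                    case XmlNodeType.Text:
                        if (open.Count > 0) open.Peek().Value = reader.Value;
                        break;
                    case XmlNodeType.EndElement:
                        var closed = open.Pop();
                        if (closed.IsContainer)
                            rows.Add(new TagRow { Tag = "/" + closed.Tag, IsContainer = true });
                        break;
                }
            }
        }
        return rows;
    }

    public static void Main()
    {
        const string xml = @"<root>
  <item>
    <itemId>001</itemId>
    <itemName>test 1</itemName>
    <description/>
  </item>
</root>";

        foreach (var r in Parse(xml))
            Console.WriteLine(r.Tag + "\t" + (r.Value ?? "null") + "\t" + r.IsContainer);
    }
}
```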

XmlReader How to read properly?

what is the best way to read xml like this one:
<Users>
<user name = "mail">
<supplier name = "supp1">
<project name = "proj1">
<subscribe name= "sub1"/>
<subscribe name = "sub2"/>
</project>
</supplier>
<supplier name = "supp2">
<project name = "proj2">
<subscribe name = "sub3"/>
</project>
<project name = "proj3">
<subscribe name= "sub4"/>
<subscribe name = "sub5"/>
</project>
<project name = "proj4"/>
</supplier>
<supplier name = "supp3"/>
<supplier name = "supp5">
<project name = "proj4"/>
<supplier name = "supp4"/>
</user>
</Users>
For now I am using
while (reader.Read())
{
if (((reader.NodeType == XmlNodeType.EndElement) && (reader.Name == "user")))
break;
if ((reader.NodeType == XmlNodeType.Element) && (reader.Name =="supplier"))
{
foreach (TreeNode tree in TreeView1.Nodes)
{
if (reader.GetAttribute(0) == tree.Text)
{
TreeView1.SelectedNode = tree;
TreeView1.SelectedNode.Checked = true;
Get_projects(reader, tree);
break;
}
}
}
}
That is the main loop. After that comes Get_projects(...):
private void Get_projects(XmlReader reader, TreeNode tree)
{
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.EndElement) && (reader.Name == "supplier")) break;
//(reader.IsEmptyElement && reader.Name == "supplier")
if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "project"))
{
foreach (TreeNode projTree in tree.Nodes)
{
if (reader.GetAttribute(0) == projTree.Text)
{
TreeView1.SelectedNode = projTree;
TreeView1.SelectedNode.Checked = true;
Get_subscribes(reader, projTree);
break;
}
}
}
}
}
And Get_subscribes(reader, projTree):
private void Get_subscribes(XmlReader reader, TreeNode tree)
{
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.EndElement) && (reader.Name == "project") ||
(reader.IsEmptyElement && reader.Name == "project")) break;
if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "subscribe"))
{
foreach (TreeNode subTree in tree.Nodes)
{
if (reader.GetAttribute(0) == subTree.Text)
{
TreeView1.SelectedNode = subTree;
TreeView1.SelectedNode.Checked = true;
break;
}
}
}
}
}
It doesn't work, so I am wondering: is there a better way, or what am I missing?
Here is a sample that shows how to read it properly:
<ApplicationPool>
<Accounts>
<Account>
<NameOfKin></NameOfKin>
<StatementsAvailable>
<Statement></Statement>
</StatementsAvailable>
</Account>
</Accounts>
</ApplicationPool>
static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)
{
using (XmlReader reader = XmlReader.Create(inputUrl))
{
reader.MoveToContent();
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element)
{
if (reader.Name == elementName)
{
XElement el = XNode.ReadFrom(reader) as XElement;
if (el != null)
{
yield return el;
}
}
}
}
}
}
using (XmlReader reader = XmlReader.Create(inputUrl))
{
reader.ReadStartElement("theRootElement");
while (reader.Name == "TheNodeIWant")
{
XElement el = (XElement) XNode.ReadFrom(reader);
}
reader.ReadEndElement();
}
Source: Reading Xml with XmlReader in C#
Hope this helps.
I would consider the reverse approach: instead of reading the XML and checking whether a node exists in the TreeView, I would use XPath to check whether a node exists in the XML document.
To do so, traverse the TreeView nodes and build an XPath query for each node,
e.g.: /Users/user/supplier[@name='supp1']/project[@name='proj1'].
With the XPath query in hand, you can create an instance of XPathDocument based on your XmlReader and run the query. If something is found, check the current node in the TreeView.
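A minimal sketch of that check, without the TreeView part, could look like this. The class `TreeCheckSketch` and its methods `Load` and `Exists` are hypothetical names, and the query is built by simple string interpolation, which assumes the names contain no apostrophes.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class TreeCheckSketch
{
    // Loads the whole document into an XPathDocument and returns a navigator over it.
    public static XPathNavigator Load(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return new XPathDocument(reader).CreateNavigator();
        }
    }

    // True if the given supplier contains the given project.
    public static bool Exists(XPathNavigator navigator, string supplier, string project)
    {
        string query = $"/Users/user/supplier[@name='{supplier}']/project[@name='{project}']";
        return navigator.SelectSingleNode(query) != null;
    }

    public static void Main()
    {
        const string xml = @"<Users>
  <user name='mail'>
    <supplier name='supp1'>
      <project name='proj1'><subscribe name='sub1'/></project>
    </supplier>
    <supplier name='supp2'>
      <project name='proj2'/>
    </supplier>
  </user>
</Users>";

        XPathNavigator navigator = Load(xml);
        Console.WriteLine(Exists(navigator, "supp1", "proj1")); // project is under supp1
        Console.WriteLine(Exists(navigator, "supp1", "proj2")); // proj2 belongs to supp2
    }
}
```

In the TreeView scenario, each tree node for which `Exists` returns true would be the one to check.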
You can try XPath to read the information you need:
XmlDocument doc = new XmlDocument();
doc.Load(your_xml_file_path);
XmlNodeList list = doc.SelectNodes(@"//project"); // get all project elements

Getting the value of XML tag with XMLReader and add to LinkedList C#

I am new to programming and have a serious problem I can't get out of.
I have 5 XML URLs, such as http://www.shopandmiles.com/xml/3_119_3.xml
From each XML URL I have to get values and write them to the database in the related columns.
My column names and XML tag names match.
With the code below, the reader misses empty XML values. Some tags have no value inside. I have to add those to the linked list as null, because after this code I walk through the linked list, and the order doesn't match if I can't add a value for the empty tags. The column names and the data then no longer line up. All my code is here; the comment in the code may also help. Thank you all.
public void WebServiceShopMilesCampaignsXMLRead(string URL)
{
XmlReader reader = XmlReader.Create(URL);
LinkedList<string> linkedList = new LinkedList<string>();
List<ShopAndMilesCampaigns> shopMileCampaigns = new List<ShopAndMilesCampaigns>();
try
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Text:
linkedList.AddLast(reader.Value);
break;
}
}
}
catch (XmlException exception)
{
Console.WriteLine("A problem occurred while reading the XML, error details --> " + exception.Message);
}
LinkedListNode<string> node = linkedList.First;
while (node != null)
{
ShopAndMilesCampaigns shopMilesCampaign = new ShopAndMilesCampaigns();
shopMilesCampaign.Name = node.Value; // Empty values mix up the order because I can't add them as null with reader.Read() above
node = node.Next;
shopMilesCampaign.Summary = node.Value;
node = node.Next;
shopMilesCampaign.AccountName = node.Value;
node = node.Next;
shopMilesCampaign.Category = node.Value;
node = node.Next;
shopMilesCampaign.Sector = node.Value;
node = node.Next;
shopMilesCampaign.Details = node.Value;
node = node.Next;
shopMilesCampaign.Image = node.Value;
node = node.Next;
shopMilesCampaign.Status = 1;
node = node.Next;
shopMileCampaigns.Add(shopMilesCampaign);
}
foreach (ShopAndMilesCampaigns shopMileCampaign in shopMileCampaigns)
{
shopMileCampaign.Insert();
}
}
I found the answer. Here it is to let you know.
If the XmlNodeType is Element, the loop continues to read from the XML data, skipping Whitespace and nested Element nodes until it reaches text or the end element of the tag. The code below gives me the exact value of each XML tag even if it is empty.
public LinkedList<string> AddToLinkedList(XmlReader reader)
{
LinkedList<string> linkedList = new LinkedList<string>();
try
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
reader.Read();
Start:
if (reader.NodeType == XmlNodeType.Whitespace || reader.NodeType == XmlNodeType.Element)
{
reader.Read();
goto Start;
}
else if (reader.NodeType == XmlNodeType.EndElement)
{
linkedList.AddLast("");
}
else
{
linkedList.AddLast(reader.Value);
}
break;
}
}
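}
catch (XmlException exception)
{
Console.WriteLine("A problem occurred while reading the XML, error details --> " + exception.Message);
}
return linkedList;
}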
