How can I search values of consecutive XML nodes with C#? - c#

I want to select nodes from an XML that have consecutive child nodes with values matching with the respective words from my search term.
Here is a sample XML:
<book name="Nature">
<page number="4">
<line ln="10">
<word wn="1">a</word>
<word wn="2">white</word>
<word wn="3">bobcat</word>
<word wn="4">said</word>
</line>
<line ln="11">
<word wn="1">Hi</word>
<word wn="2">there,</word>
<word wn="3">Bob.</word>
</line>
</page>
</book>
My search term is Hi Bob. I want to find all the nodes in the above XML that contain two consecutive words with values %Hi% and %Bob%. Please note that I want to perform a partial, case-insensitive match for each word in my search term.
It should return the following output for the above XML:
ln="10" wn="2" wn="3"
Please note that line (ln=10) is selected because it contains two consecutive words (in the correct order) that match with the search term. white=%Hi% bobcat=%Bob%
However, the next line (ln=11) is not selected because the matching nodes are not consecutive.
Please note that all the words from the search term should be matched in order for it to be considered a match.
Thank you!
[Edit]
I tried the following solution and it yields the expected results. Is there a better or more efficient way of achieving this? The program has to search 100,000 XML files per day and each of them would be 300 KB to 50 MB.
XDocument xDoc = XDocument.Load(@"C:\dummy.xml");
var xLines = xDoc
.Descendants("page")
.Descendants("line");
foreach (var xLine in xLines)
{
var xFirstWords = xLine
.Descendants("word")
.Where(item => item.Value.ToUpper().Contains("HI"));
foreach (var xFirstWord in xFirstWords)
{
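// FirstOrDefault avoids an exception when the matching word is the last one in the line.
var xNextWord = xFirstWord.ElementsAfterSelf().FirstOrDefault();
if (xNextWord != null && xNextWord.Value.ToUpper().Contains("BOB"))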
{
MessageBox.Show(xLine.FirstAttribute.Value + " " + xFirstWord.FirstAttribute.Value + " " + xNextWord.FirstAttribute.Value);
}
}
}

I was able to improve my code. Please let me know if you have a better solution.
XDocument xDoc = XDocument.Load(@"C:\dummy.xml");
var xLines = xDoc
.Descendants("page")
.Descendants("line");
foreach (var xLine in xLines)
{
var xFirstWords = xLine
.Descendants("word")
.Where(item => item.Value.ToUpper().Contains("HI"))
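// Take(1).Any(...) tests only the next word and does not throw when there is none.
.Where(item => item.ElementsAfterSelf("word").Take(1).Any(next => next.Value.ToUpper().Contains("BOB")));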
foreach (var xFirstWord in xFirstWords)
{
var xNextWord = xFirstWord.ElementsAfterSelf("word").First();
MessageBox.Show(xLine.FirstAttribute.Value + " " + xFirstWord.FirstAttribute.Value + " " + xNextWord.FirstAttribute.Value);
}
}

I have no idea whether the performance of this code will be better or worse, but I'm pretty sure it will be different, so it might be worth trying. Reconstitute the text of the line and then use a regular expression to match.
// \S* on each side allows partial matches; \s+ requires the two words to be adjacent.
Regex re = new Regex(@"\S*Hi\S*\s+\S*Bob\S*", RegexOptions.IgnoreCase);
XDocument xDoc = XDocument.Load(@"C:\Users\user\Documents\temp.xml");
foreach (XElement xLine in xDoc.Root.Descendants("line")) {
string text = string.Join(" ", xLine.Elements("word").Select(x => x.Value));
if (re.IsMatch(text)) {
Console.WriteLine(text);
}
}

Things that come to mind performance-wise:
.Elements will be faster than .Descendants, as it only gets the direct children.
Use IndexOf with OrdinalIgnoreCase instead of ToUpper.Contains.
In the foreach, instead of calling NodesAfterSelf, you can just hold on to the previous word.
var xLines = xDoc.Descendants("line");
foreach (var xLine in xLines)
{
    XElement prevWord = null;
    foreach (var word in xLine.Elements("word"))
    {
        // A match needs the previous word to contain "HI" and the current word to contain "BOB".
        if (prevWord != null
            && prevWord.Value.IndexOf("HI", StringComparison.OrdinalIgnoreCase) >= 0
            && word.Value.IndexOf("BOB", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            MessageBox.Show(xLine.FirstAttribute.Value + " " + prevWord.FirstAttribute.Value + " " + word.FirstAttribute.Value);
        }
        // Remember the current word for the next iteration.
        prevWord = word;
    }
}
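
The same idea extends to search terms with more than two words. The following is a minimal sketch, not a benchmarked solution: it slides a window of consecutive word elements over each line and checks every term with a case-insensitive partial match. The class name ConsecutiveWordSearch and the method Find are hypothetical names chosen for illustration, and the sample XML is a shortened copy of the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ConsecutiveWordSearch
{
    // Returns one string per match: the line's ln value followed by the wn of each matched word.
    public static List<string> Find(XDocument doc, string searchTerm)
    {
        string[] terms = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var results = new List<string>();
        if (terms.Length == 0) return results;

        foreach (XElement line in doc.Descendants("line"))
        {
            // Materialize the direct word children once so each window is an index lookup.
            List<XElement> words = line.Elements("word").ToList();

            for (int start = 0; start + terms.Length <= words.Count; start++)
            {
                bool allMatch = true;
                for (int i = 0; i < terms.Length && allMatch; i++)
                {
                    // Partial, case-insensitive match of term i against word start + i.
                    allMatch = words[start + i].Value
                        .IndexOf(terms[i], StringComparison.OrdinalIgnoreCase) >= 0;
                }

                if (allMatch)
                {
                    var wordNumbers = words.Skip(start).Take(terms.Length)
                        .Select(w => (string)w.Attribute("wn"));
                    results.Add((string)line.Attribute("ln") + " " + string.Join(" ", wordNumbers));
                }
            }
        }
        return results;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<book name=\"Nature\"><page number=\"4\">" +
            "<line ln=\"10\"><word wn=\"1\">a</word><word wn=\"2\">white</word>" +
            "<word wn=\"3\">bobcat</word><word wn=\"4\">said</word></line>" +
            "<line ln=\"11\"><word wn=\"1\">Hi</word><word wn=\"2\">there,</word>" +
            "<word wn=\"3\">Bob.</word></line>" +
            "</page></book>");

        foreach (string match in Find(doc, "Hi Bob"))
        {
            Console.WriteLine(match); // prints "10 2 3"
        }
    }
}
```

For the 50 MB files mentioned in the question, a streaming XmlReader might use less memory than loading a whole XDocument, but that is untested here and would need measuring.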

Related

Displaying List of Date-Sorted XML Elements as UI Text

I have an XML document with several different sections and I need to produce a 'feed' of the most recent entries which displays as UI text in the scene. I have a method for sorting entries by date within their sections which I know works, and I've been trying to apply it to the entire document.
This is an example of the structure of the document:
<Document>
<Data>3</Data>
<Section1>
<Type1>
<Entry ID="1">
<Date>09/08/2011</Date>
<Details1>text</Details1>
</Entry>
<Entry ID="3">
<Date>07/3/2012</Date>
<Details2>text</Details2>
</Entry>
</Type1>
<Type2 />
<Type3>
<Entry ID="2">
<Date>08/8/2011</Date>
<Details3>text</Details3>
<Details4>text</Details4>
</Entry>
</Type3>
</Section1>
<Section2>
<Type4 />
<Type5 />
</Section2>
...
</Document>
The problem is that, unlike in my previous method, I need to sort the date of every entry and display them as such - not in their individual sections. So far, the dates have been displaying all over the place, although I'm fairly sure that they're being sorted properly.
My code so far is as follows (the sorting method was taken from a very helpful blog post that I can't find the link to):
public void Feed () {
Debug.Log ("Feed initiated");
// FOR SORTING AND DISPLAYING THE DATA
XDocument xDocument = XDocument.Load (Application.persistentDataPath + "/UserData/document.xml");
var all =
from objActs in xDocument.Element("Document").Descendants("Entry")
let actDate = DateTime.ParseExact(objActs.Element("Date").Value,"d/M/yyyy",new CultureInfo("en-GB"))
orderby actDate
select objActs;
foreach (var objActs in all.ToList())
{
foreach (var aa in objActs.Ancestors("Section1").Elements("Type1").Elements("Entry")) {
Debug.Log(aa);
feedText.text += aa.Element("Date").Value+"\n";
feedText.text += aa.Element("Details1").Value+"\n\n";
}
foreach (var ab in objActs.Ancestors("Section1").Elements("Type2").Elements("Entry")) {
Debug.Log(ab);
feedText.text += ab.Element("Date").Value+"\n";
feedText.text += ab.Element("Details2").Value+"\n\n";
}
}
}
I was originally doing this with a check like if (objActs.Parent.Parent.Name == "Section1") {}, but that had exactly the same problem.
The problem appears to be with using feedText+=, especially since entries end up repeating themselves and I get the error "String too long for TextMeshGenerator. Cutting off characters". I need to format them in rects, eventually, since I'll be adding buttons in certain positions, but I imagine that using rects will also produce the same result.
Does anyone have any idea how I can get this to display properly to text?
I'm trying to display the whole document as a 'feed' in date order, so that, when displayed as text, they would appear:
Entry 2 date
Entry 2 details
Entry 1 date
Entry 1 details
Entry 3 date
Entry 3 details
At the moment, with my code, it displays as:
Entry 1 Date
Entry 1 Details
Entry 3 Date
Entry 1 details (partial - this is the point at which the string gets too long - the foreach loop isn't stopping after reading all elements)
Use XML Serialization for this purpose.
Try this. It is messy because of the null element.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string input =
"<Document>" +
"<Data>3</Data>" +
"<Section1>" +
"<Type1>" +
"<Entry ID=\"1\">" +
"<Date>09/8/2011</Date>" +
"<Details1>text</Details1>" +
"</Entry>" +
"<Entry ID=\"3\">" +
"<Date>07/3/2012</Date>" +
"<Details2>text</Details2>" +
"</Entry>" +
"</Type1>" +
"<Type2 />" +
"<Type3>" +
"<Entry ID=\"2\">" +
"<Date>08/08/2011</Date>" +
"<Details3>text</Details3>" +
"<Details4>text</Details4>" +
"</Entry>" +
"</Type3>" +
"</Section1>" +
"<Section2>" +
"<Type4 />" +
"<Type5 />" +
"</Section2>" +
"</Document>";
XDocument doc = XDocument.Parse(input);
var results = doc.Descendants().Where(s => s.Name.ToString().StartsWith("Section"))
.Where(t => t.Descendants("Date").Count() > 0).Select(u => u.Elements().Where(v => v.Descendants("Date").Count() > 0)
.Select(w => w.Elements().Select(x => new
{
element = x,
dates = (DateTime)x.Descendants("Date").FirstOrDefault()
}).ToList()).SelectMany(y => y).OrderBy(z => z.dates).ToList().Select(b => b.element).ToList()).FirstOrDefault();
}
}
}
Most of the issue with the code displaying as text was due to the function being called on Update instead of on Start.
I also changed the XML format slightly so that sections and types were attributes instead of child elements. The resultant code is as below:
public void Feed () {
Debug.Log ("Feed initiated");
// FOR SORTING AND DISPLAYING THE DATA
XDocument xDocument = XDocument.Load (Application.persistentDataPath + "/UserData/document.xml");
// Many thanks to jdweng for suggesting using the element SortedDocument
XElement element = new XElement("SortedDocument");
var all =
from objActs in xDocument.Element("Document").Descendants("Data")
let actDate = DateTime.ParseExact(objActs.Element("Date").Value,"d/M/yyyy",new CultureInfo("en-GB"))
orderby actDate
select objActs;
foreach (var objActs in all.ToList())
{
element.Add (objActs);
} // Foreach
Debug.Log(element);
foreach (XElement sortedActs in element.Descendants("Entry")) {
if (sortedActs.Attribute("Section").Value == "Section1") {
if (sortedActs.Attribute("Type").Value == "Type1") {
feedText.text+=sortedActs.Element("Date").Value+"\n";
feedText.text+=sortedActs.Element("Details").Value+"\n\n";
}
if (sortedActs.Attribute("Type").Value == "Type2") {
feedText.text+=sortedActs.Element("Date").Value+"\n";
feedText.text+=sortedActs.Element("Details").Value+"\n\n";
}
}
}
}
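
For the original document layout, the whole-document feed can also be built in one pass. This is a minimal sketch under a few assumptions: every Entry has a Date child in d/M/yyyy form, the detail elements all have names starting with "Details", and the names FeedBuilder and BuildFeed are hypothetical. It collects the text in a StringBuilder so the UI text is assigned once instead of being appended to repeatedly.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class FeedBuilder
{
    // Builds the feed text for every Entry in the document, oldest first.
    public static string BuildFeed(XDocument doc)
    {
        var culture = new CultureInfo("en-GB");

        // Descendants("Entry") ignores which Section/Type an entry sits in.
        var sorted = doc.Descendants("Entry")
            .OrderBy(e => DateTime.ParseExact((string)e.Element("Date"), "d/M/yyyy", culture));

        var sb = new StringBuilder();
        foreach (XElement entry in sorted)
        {
            sb.Append((string)entry.Element("Date")).Append('\n');

            // Assumption: detail elements are named Details1, Details2, and so on.
            foreach (XElement detail in entry.Elements()
                .Where(x => x.Name.LocalName.StartsWith("Details", StringComparison.Ordinal)))
            {
                sb.Append(detail.Value).Append('\n');
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Document><Section1><Type1>" +
            "<Entry ID=\"1\"><Date>09/08/2011</Date><Details1>entry 1</Details1></Entry>" +
            "<Entry ID=\"3\"><Date>07/3/2012</Date><Details2>entry 3</Details2></Entry>" +
            "</Type1><Type2 /><Type3>" +
            "<Entry ID=\"2\"><Date>08/8/2011</Date><Details3>entry 2</Details3></Entry>" +
            "</Type3></Section1></Document>");

        // Entries come out in date order: entry 2, entry 1, entry 3.
        Console.Write(BuildFeed(doc));
    }
}
```

In the Unity script from the question, the result would be assigned once, for example feedText.text = FeedBuilder.BuildFeed(xDocument); called from Start rather than Update.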

Concatenate XML Node data

I have XML that looks like this:
<BoxResult>
<DocumentType>BCN</DocumentType>
<DocumentID>BCN_20131113_1197005001#854#11XEZPADAHANDELC</DocumentID>
<DocumentVersion>1</DocumentVersion>
<ebXMLMessageId>CENTRAL_MATCHING</ebXMLMessageId>
<State>FAILED</State>
<Timestamp>2013-11-13T13:02:57</Timestamp>
<Reason>
<ReasonCode>efet:IDNotFound</ReasonCode>
<ReasonText>Unknown Sender</ReasonText>
</Reason>
<Reason>
<ReasonCode>efet:IDNotFound</ReasonCode>
<ReasonText>Unknown Receiver</ReasonText>
</Reason>
</BoxResult>
In my C# code, I need to parse the XML and concatenate the ReasonText data.
Basically, I need the output to be Unknown Sender ; Unknown Receiver
I tried the following code, but I am not getting the desired output:
XmlNodeList ReasonNodeList = xmlDoc.SelectNodes("/BoxResult/Reason");
foreach (XmlNode xmln in ReasonNodeList)
{
ReasonText = ReasonText + ";" + xmlDoc.SelectSingleNode("/BoxResult/Reason/ReasonText").InnerXml.ToString();
}
if (ReasonText != " ")
{
ReasonText = ReasonText.Substring(1);
}
The output I am getting from this code is Unknown Sender ; Unknown Sender
It does not display Unknown Receiver.
Please advise; any help would be appreciated.
You are always using the same node to retrieve the data. Inside the loop you query xmlDoc again, which always returns the ReasonText of the first <Reason> node, instead of reading the node for the current iteration.
XmlNodeList ReasonNodeList = xmlDoc.SelectNodes("/BoxResult/Reason/ReasonText"); //change here
foreach (XmlNode xmln in ReasonNodeList)
{
ReasonText = ReasonText + ";" + xmln.InnerXml.ToString(); //change here
}
if (ReasonText != " ")
{
ReasonText = ReasonText.Substring(1);
}
You're iterating through the <Reason> nodes and each time selecting the first /BoxResult/Reason/ReasonText node in the document (note that you're not using your xmln variable anywhere).
By the way, here's a shorter version (replaces your whole code block):
ReasonText += String.Join(";",
xmlDoc.SelectNodes("/BoxResult/Reason/ReasonText")
.Cast<XmlNode>()
.Select(n => n.InnerText));
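
If LINQ to XML is an option, the same join can be written against XDocument. This is a minimal sketch, assuming the document root is BoxResult as in the question; the names ReasonJoiner and JoinReasons are hypothetical. Because string.Join only places the separator between items, there is no leading separator to strip afterwards.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ReasonJoiner
{
    // Joins the ReasonText of every Reason child of the root element.
    public static string JoinReasons(XDocument doc, string separator)
    {
        return string.Join(separator,
            doc.Root.Elements("Reason").Select(r => (string)r.Element("ReasonText")));
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<BoxResult><State>FAILED</State>" +
            "<Reason><ReasonCode>efet:IDNotFound</ReasonCode><ReasonText>Unknown Sender</ReasonText></Reason>" +
            "<Reason><ReasonCode>efet:IDNotFound</ReasonCode><ReasonText>Unknown Receiver</ReasonText></Reason>" +
            "</BoxResult>");

        Console.WriteLine(JoinReasons(doc, " ; ")); // prints "Unknown Sender ; Unknown Receiver"
    }
}
```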

Reading multiple child nodes of xml file

I have created an Xml file with example contents as follows:
<?xml version="1.0" encoding="utf-8" ?>
<Periods>
<PeriodGroup name="HER">
<Period>
<PeriodName>Prehistoric</PeriodName>
<StartDate>-500000</StartDate>
<EndDate>43</EndDate>
</Period>
<Period>
<PeriodName>Iron Age</PeriodName>
<StartDate>-800</StartDate>
<EndDate>43</EndDate>
</Period>
<Period>
<PeriodName>Roman</PeriodName>
<StartDate>43</StartDate>
<EndDate>410</EndDate>
</Period>
</PeriodGroup>
<PeriodGroup name="CAFG">
<Period>
<PeriodName>Prehistoric</PeriodName>
<StartDate>-500000</StartDate>
<EndDate>43</EndDate>
</Period>
<Period>
<PeriodName>Roman</PeriodName>
<StartDate>43</StartDate>
<EndDate>410</EndDate>
</Period>
<Period>
<PeriodName>Anglo-Saxon</PeriodName>
<StartDate>410</StartDate>
<EndDate>800</EndDate>
</Period>
</PeriodGroup>
</Periods>
I need to be able to read the Period node children within a selected PeriodGroup. I guess the PeriodName could be an attribute of Period if that is more sensible.
I have looked at loads of examples, but none seem to be quite right, and there seem to be dozens of different methods: some use XmlReader, some use XmlTextReader, and some use neither. As this is my first time reading an XML file, I thought I'd ask if anyone could give me a pointer. I've got something working just to try things out, but it feels clunky. I'm using VS2010 and C#. Also, I see a lot of people are using LINQ to XML, so I'd appreciate hearing the pros and cons of that method.
string PG = "HER";
XmlDocument doc = new XmlDocument();
doc.Load(Server.MapPath("./Xml/XmlFile.xml"));
string text = string.Empty;
XmlNodeList xnl = doc.SelectNodes("/Periods/PeriodGroup");
foreach (XmlNode node in xnl)
{
text = node.Attributes["name"].InnerText;
if (text == PG)
{
XmlNodeList xnl2 = doc.SelectNodes("/Periods/PeriodGroup/Period");
foreach (XmlNode node2 in xnl2)
{
text = text + "<br>" + node2["PeriodName"].InnerText;
text = text + "<br>" + node2["StartDate"].InnerText;
text = text + "<br>" + node2["EndDate"].InnerText;
}
}
Response.Write(text);
}
You could use an XPath approach like so:
XmlNodeList xnl = doc.SelectNodes(string.Format("/Periods/PeriodGroup[@name='{0}']/Period", PG));
Though I prefer LINQ to XML for its readability.
This will return Period node children based on the PeriodGroup name attribute supplied, e.g. HER:
XDocument xml = XDocument.Load(HttpContext.Current.Server.MapPath(FileLoc));
var nodes = (from n in xml.Descendants("Periods")
where n.Element("PeriodGroup").Attribute("name").Value == "HER"
select n.Element("PeriodGroup").Descendants().Elements()).ToList();
Results:
<PeriodName>Prehistoric</PeriodName>
<StartDate>-500000</StartDate>
<EndDate>43</EndDate>
<PeriodName>Iron Age</PeriodName>
<StartDate>-800</StartDate>
<EndDate>43</EndDate>
<PeriodName>Roman</PeriodName>
<StartDate>43</StartDate>
<EndDate>410</EndDate>
The query is pretty straightforward:
from n in xml.Descendants("Periods")
This returns every element named Periods, which here is just the root element.
We then use where to filter on an attribute value:
where n.Element("PeriodGroup").Attribute("name").Value == "HER"
This keeps the Periods element only if its first PeriodGroup child has a name attribute with the value HER.
Finally, we select that PeriodGroup element and get the child elements of its descendants:
select n.Element("PeriodGroup").Descendants().Elements()
EDIT (See comments)
Since the result of this expression is just a query, we use .ToList() to enumerate the collection and return an object containing the values you need. You could also create anonymous types to store the element values for example:
var nodes = (from n in xml.Descendants("Period").
Where(r => r.Parent.Attribute("name").Value == "HER")
select new
{
PeriodName = (string)n.Element("PeriodName").Value,
StartDate = (string)n.Element("StartDate").Value,
EndDate = (string)n.Element("EndDate").Value
}).ToList();
//Crude demonstration of how you can reference each specific element in the result
//I would recommend using a stringbuilder here..
foreach (var n in nodes)
{
text += "<br>" + n.PeriodName;
text += "<br>" + n.StartDate;
text += "<br>" + n.EndDate;
}
Since the XmlDocument.SelectNodes method actually accepts an XPath expression, you're free to go like this:
XmlNodeList xnl = doc.SelectNodes("/Periods/PeriodGroup[@name='" + PG + "']/Period");
foreach (XmlNode node in xnl) {
// Every node here is a <Period> child of the relevant <PeriodGroup>.
}
You can learn more on XPath at w3schools.
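
For comparison, here is a minimal LINQ to XML sketch that filters on each PeriodGroup's own name attribute, so it works for any group in the file and not only the first one. The names PeriodReader and ReadPeriods are hypothetical, and the sketch assumes StartDate and EndDate always hold integers, as they do in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PeriodReader
{
    // Returns one formatted line per Period inside the PeriodGroup with the given name.
    public static List<string> ReadPeriods(XDocument doc, string groupName)
    {
        return doc.Root
            .Elements("PeriodGroup")
            .Where(g => (string)g.Attribute("name") == groupName)
            .Elements("Period")
            .Select(p => string.Format("{0}: {1} to {2}",
                (string)p.Element("PeriodName"),
                (int)p.Element("StartDate"),   // explicit XElement-to-int conversion
                (int)p.Element("EndDate")))
            .ToList();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Periods>" +
            "<PeriodGroup name=\"HER\">" +
            "<Period><PeriodName>Iron Age</PeriodName><StartDate>-800</StartDate><EndDate>43</EndDate></Period>" +
            "</PeriodGroup>" +
            "<PeriodGroup name=\"CAFG\">" +
            "<Period><PeriodName>Roman</PeriodName><StartDate>43</StartDate><EndDate>410</EndDate></Period>" +
            "<Period><PeriodName>Anglo-Saxon</PeriodName><StartDate>410</StartDate><EndDate>800</EndDate></Period>" +
            "</PeriodGroup>" +
            "</Periods>");

        foreach (string line in ReadPeriods(doc, "CAFG"))
        {
            Console.WriteLine(line);
        }
    }
}
```

As for the trade-off the question asks about: the XPath version is the most compact, while the LINQ to XML version gives typed values and compile-time checking of the query shape.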
Go through this:
public static void XMLNodeCheck(XmlNode xmlNode)
{
if (xmlNode.HasChildNodes)
{
foreach (XmlNode node in xmlNode)
{
if (node.HasChildNodes)
{
Console.WriteLine(node.Name);
if (node.Attributes.Count!=0)
{
foreach (XmlAttribute att in node.Attributes)
{
Console.WriteLine("----------" + att.Name + "----------" + att.Value);
}
}
XMLNodeCheck(node);//recursive function
}
else
{
// Compare the node's type; comparing the node itself to the enum value is never equal.
if (node.NodeType != XmlNodeType.Element)
{
Console.WriteLine(node.InnerText);
}
}
}
}
}

Regex refactoring - Search for match and if found replace it in one line

Hello, I am learning regex and I need your help to solve this problem.
I want to search a text for a word and, if it matches, put the whole article into a match collection. I then go through every item in the collection with foreach and replace the keyword with another string. This code works, but I would like to know whether it is possible to do this without the foreach, because it wastes memory.
MatchCollection mc;
List<string> listek = new List<string>();
Regex r = new Regex(@".*" + word + @".*");
mc = r.Matches(text);
foreach (var item in mc)
{
listek.Add(Regex.Replace(item.ToString(), word, @"<span class=""highlighted"">" + word + "</span>"));
}
I have the following XML:
<article>
<title>title 1</title>
<text>some long text</text>
</article>
<article>
<title>title 2</title>
<text>some long text</text>
</article>
I need to search for the keyword in every text node and, if I find a match, return the article with the keywords replaced. The code shown above does this, but in a clumsy way: the pattern (@".*" + word + @".*") adds the whole text to the collection, but only if it contains my keyword. I would like to replace the keywords at the same time, and I don't know how.
I solved it like this:
internal static string SearchWordInXml()
{
var all = from a in WordBase.Descendants("ITEM")
select new
{
title = a.Element("TITLE").Value,
text = a.Element("TEXT").Value
};
foreach (var d in all)
{
Regex r = new Regex(@".*" + service.word + @".*");
Match v = r.Match(d.text);
Template();
var xElemData = TempBase.XPathSelectElement("//DATA");
if (v.Success)
{
XElement elemSet = new XElement("DATASET");
XElement elemId = new XElement("DATAPIECE");
XAttribute attId = new XAttribute("ATT", "TITLE");
XAttribute valueId = new XAttribute("VALUE", d.title);
elemSet.Add(elemId);
elemId.Add(attId);
elemId.Add(valueId);
XElement elemName = new XElement("DATAPIECE");
XAttribute attName = new XAttribute("ATT", "TEXT");
XAttribute valueName = new XAttribute("VALUE", Regex.Replace(d.text, service.word, @"<span class=""highlighted"">" + service.word + "</span>"));
xElemData.Add(elemSet);
elemSet.Add(elemName);
elemName.Add(attName);
elemName.Add(valueName);
}
}
return convert(TempBase);
}
If you are only looking into text nodes, I would probably go with something like this:
string text = "<article><title>title 1</title><text>some long text</text></article><article><title>title 2</title><text>some long text</text></article>";
string word = "long";
Regex r = new Regex("(?<=<text>.*?)"+word+"(?=.*?</text>)");
text = r.Replace(text, "<span class=\"highlighted\">$&</span>");
text will now contain the highlighted values.
Note that $& is a backreference to the full match. If you used any kind of grouping (parentheses), you could refer to the groups with $1, $2, $3, etc.
To do it all in one line, you can use:
text = Regex.Replace(text, "(?<=<text>.*?)"+word+"(?=.*?</text>)","<span class=\"highlighted\">$&</span>");
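
The patterns above insert word into the regex as-is, so a keyword containing characters such as . or ( would be read as regex syntax. The following is a minimal sketch of the same highlighting done per article with LINQ to XML and Regex.Escape, with no explicit foreach. It assumes the articles sit under a single root element (called articles here, which is not in the original XML); the names Highlighter and HighlightMatches are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class Highlighter
{
    // Returns the highlighted text of every article whose <text> contains the word.
    public static List<string> HighlightMatches(XElement articles, string word)
    {
        // Regex.Escape makes the keyword match literally, even if it contains regex metacharacters.
        var pattern = new Regex(Regex.Escape(word));

        return articles.Elements("article")
            .Select(a => (string)a.Element("text"))
            .Where(t => t != null && pattern.IsMatch(t))
            // $& stands for the full match, so the original casing of the text is kept.
            .Select(t => pattern.Replace(t, "<span class=\"highlighted\">$&</span>"))
            .ToList();
    }

    public static void Main()
    {
        XElement articles = XElement.Parse(
            "<articles>" +
            "<article><title>title 1</title><text>some long text</text></article>" +
            "<article><title>title 2</title><text>no match here</text></article>" +
            "</articles>");

        foreach (string text in HighlightMatches(articles, "long"))
        {
            Console.WriteLine(text); // prints: some <span class="highlighted">long</span> text
        }
    }
}
```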

I want to split a XML Like string to tokens in c# or sql

I want to split a XML Like string to tokens in c# or sql.
for example
input string is like
<entry><AUTHOR>C. Qiao</AUTHOR> and <AUTHOR>R.Melhem</AUTHOR>, "<TITLE>Reducing Communication </TITLE>",<DATE>1995</DATE>. </entry>
and I want this output:
C AUTHOR
. AUTHOR
Qiao AUTHOR
and
R AUTHOR
. AUTHOR
Melhem AUTHOR
,
"
Reducing TITLE
Communication TITLE
"
,
1995 DATE
.
This is the first attempt on how to solve this problem, considering the following:
1. XML String will be valid (i.e. there's not going to be any invalid chars between tags)
Like this:
string xml = @"<ENTRY><AUTHOR>C. Qiao</AUTHOR>
<AUTHOR>R.Melhem</AUTHOR>
<TITLE>Reducing Communication </TITLE>
<DATE>1995</DATE>
</ENTRY>";
2. Splitting will be done by space ' '
string xml = @"<ENTRY><AUTHOR>C. Qiao</AUTHOR>
<AUTHOR>R.Melhem</AUTHOR>
<TITLE>Reducing Communication </TITLE>
<DATE>1995</DATE>
</ENTRY>";
XElement doc = XElement.Parse(xml);
foreach (XElement element in doc.Elements())
{
var values = element.Value.Split(' ');
foreach (string value in values)
{
Console.WriteLine(element.Name + " " + value);
}
}
Will print out
AUTHOR C.
AUTHOR Qiao
AUTHOR R.Melhem
TITLE Reducing
TITLE Communication
TITLE
DATE 1995
EDIT:
Now, to split based on "." and a space, the best idea is to use regex. Like this:
var values = Regex.Split(element.Value, @"(\.| )");
foreach (string value in values.Where(x=>!String.IsNullOrWhiteSpace(x)))
{
Console.WriteLine(element.Name + " " + value);
}
You can add more delimiters if you'd like. The example above prints the following:
AUTHOR C
AUTHOR .
AUTHOR Qiao
AUTHOR R
AUTHOR .
AUTHOR Melhem
TITLE Reducing
TITLE Communication
DATE 1995
Edit2:
And here's an example that works with your original string. It is most likely not the best approach, since it doesn't preserve the order of the tokens, but it should be pretty close:
string xml = @" <entry>
<AUTHOR>C. Qiao</AUTHOR>
and
<AUTHOR>R.Melhem</AUTHOR>,
""<TITLE>Reducing Communication </TITLE>""
,<DATE>1995</DATE>.
</entry>";
//Parse xml to XDocument
XDocument doc = XDocument.Parse(xml);
// Get first element (we only have one)
XElement element = doc.Descendants().FirstOrDefault();
//Create a copy of an element for use by child elements.
XElement copyElement = new XElement(element);
//Remove all child nodes from root leaving only text
element.Elements().Remove();
//Splitting based on the tokens specified
var values = Regex.Split(element.Value, @"(\.| |\,|\"")");
foreach (string value in values.Where(x => !String.IsNullOrWhiteSpace(x)))
{
Console.WriteLine(value);
}
//Getting children nodes and splitting the same way
foreach (XElement elem in copyElement.Elements())
{
var val = Regex.Split(elem.Value, @"(\.| |\,|\"")");
foreach (string value in val.Where(x => !String.IsNullOrWhiteSpace(x)))
{
Console.WriteLine(value + " " + elem.Name);
}
}
//You can try to play with DescendantsAndSelf
//to see if you can do it in single action and with order preserved.
//foreach (XElement elem in element.DescendantsAndSelf())
//{
// //....
//}
This will print out the following:
and
,
"
"
,
.
C AUTHOR
. AUTHOR
Qiao AUTHOR
R AUTHOR
. AUTHOR
Melhem AUTHOR
Reducing TITLE
Communication TITLE
1995 DATE
Edit: Just noticed I read the question wrong - having copied the formatted XML from the first answer instead of from the question I failed to notice the mixed content nodes within the string. This makes it easier. The solution could look like:
using System;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
class Program
{
static void Main(string[] args)
{
var xml = @"<entry><AUTHOR>C. Qiao</AUTHOR> and <AUTHOR>R.Melhem</AUTHOR>, ""<TITLE>Reducing Communication </TITLE>"",<DATE>1995</DATE>. </entry>";
var elem = XElement.Parse(xml);
var tokFunc = new Func<XNode, string>(node =>
{
var s = node.ToString().Replace(".", " . ").Replace(",", " , ");
var nodeName = node.Parent != null &&
node.Parent.NodeType == XmlNodeType.Element &&
node.Parent.Name.LocalName.ToUpper() != "ENTRY"
? node.Parent.Name.LocalName
: "";
var sb = new StringBuilder();
s.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries).ToList().ForEach(e => sb.AppendFormat("{0}\t{1}\n", e, nodeName));
return sb.ToString();
});
elem.DescendantNodes().Where(e => e.NodeType == XmlNodeType.Text).ToList()
.ForEach(c => Console.Write(tokFunc(c)));
}
}
Which produces the desired output:
C AUTHOR
. AUTHOR
Qiao AUTHOR
and
R AUTHOR
. AUTHOR
Melhem AUTHOR
,
"
Reducing TITLE
Communication TITLE
"
,
1995 DATE
.
