Removing invalid child nodes but keep its contents intact..? - c#

I have some xml files that look like sample file
I want to remove invalid xref nodes from it but keep the contents of those nodes as it is.
The way to know whether a xref node is valid is to check its attribute rid's value exactly matches any of the attributes id of any node present in the entire file, so the output file of the above sample should be something like sample output file
The code I've written thus far is below
XDocument doc=XDocument.Load(#"D:\sample\sample.xml",LoadOptions.None);
var ids = from a in doc.Descendants()
where a.Attribute("id") !=null
select a.Attribute("id").Value;
var xrefs=from x in doc.Descendants("xref")
where x.Attribute("rid")!=null
select x.Attribute("rid").Value;
if (ids.Any() && xrefs.Any())
{
foreach(var xref in xrefs)
{
if (!ids.Contains(xref))
{
string content= File.ReadAllText(#"D:\sample\sample.xml");
string result=Regex.Replace(content,"<xref ref-type=\"[^\"]+\" rid=\""+xref+"\">(.*?)</xref>","$1");
File.WriteAllText(#"D:\sample\sample.xml",result);
}
}
Console.WriteLine("complete");
}
else
{
Console.WriteLine("No value found");
}
Console.ReadLine();
The problem is when the values of xref contain characters like ., *, (etc. which on a regex replace needs to be escaped properly or the replace can mess up the file.
Does anyone have a better solution to the problem?

You don't need regex to do this. Instead use element.ReplaceWith(element.Nodes()) to replace node with its children. Sample code:
XDocument doc = XDocument.Load(#"D:\sample\sample.xml", LoadOptions.None);
// use HashSet, since you only use it for lookups
var ids = new HashSet<string>(from a in doc.Descendants()
where a.Attribute("id") != null
select a.Attribute("id").Value);
// select both element itself (for update), and value of "rid"
var xrefs = from x in doc.Descendants("xref")
where x.Attribute("rid") != null
select new { element = x, rid = x.Attribute("rid").Value };
if (ids.Any()) {
var toUpdate = new List<XElement>();
foreach (var xref in xrefs) {
if (!ids.Contains(xref.rid)) {
toUpdate.Add(xref.element);
}
}
if (toUpdate.Count > 0) {
foreach (var xref in toUpdate) {
// replace with contents
xref.ReplaceWith(xref.Nodes());
}
doc.Save(#"D:\sample\sample.xml");
}
}

Related

HTMLAgilityPack error: "Multiple node elements can't be created."

I'm attempting to use the HTMLAgilityPack to get retrieve and edit inner text of some HTML. The inner text of each node i retrieve needs to be checked for matching strings and those matching strings to be highlighted like so:
var HtmlDoc = new HtmlDocument();
HtmlDoc.LoadHtml(item.Content);
var nodes = HtmlDoc.DocumentNode.SelectNodes("//div[#class='guide_subtitle_cell']/p");
foreach (HtmlNode htmlNode in nodes)
{
htmlNode.ParentNode.ReplaceChild(HtmlTextNode.CreateNode(Methods.HighlightWords(htmlNode.InnerText, searchstring)), htmlNode);
}
This is the code for the HighlightWords method I use:
public static string HighlightWords(string input, string searchstring)
{
if (input == null || searchstring == null)
{
return input;
}
var lowerstring = searchstring.ToLower();
var words = lowerstring.Split(' ').ToList();
for (var i = 0; i < words.Count; i++)
{
Match m = Regex.Match(input, words[i], RegexOptions.IgnoreCase);
if (m.Success)
{
string ReplaceWord = string.Format("<span class='search_highlight'>{0}</span>", m.Value);
input = Regex.Replace(input, words[i], ReplaceWord, RegexOptions.IgnoreCase);
}
}
return input;
}
Can anyone suggest how to get this working or indicate what i'm doing wrong?
The problem is that HtmlTextNode.CreateNode can only create one node. When you add a <span> inside, that's another node, and CreateNode throws the exception you see.
Make sure that you are only doing a search and replace on the lowest leaf nodes (nodes with no children). Then rebuild that node by:
Create a new empty node to replace the old one
Search for the text in .InnerText
Use HtmlTextNode.Create to add the plain text before the text you want to highlight
Then add your new <span> with the highlighted text with HtmlNode.CreateNode
Then search for the next occurrence (start back at 1) until no more occurrences are found.
Your function HighlightWords must be returning multiple top-level HTML nodes. For example:
<p>foo</p>
<span>bar</span>
The HtmlAgilityPack only allows one top-level node to be returned. You can hardcode the return value for HighlightWords to test.
Also, this post has run across the same problem.

How to check XML nodes contained in different XML files for equality?

I have two XML files (file A and file B where file A is a subset of file B) which I read using the System.Xml.XmlDocument.LoadXml(fileName) method.
I am then selecting nodes within these files using the System.Xml.XmlNode.SelectNodes(nodeName) I need to compare that each selected xml node in file A is either equal or a subset of that same node in file B. Need to also check that the order of the subnodes contained within any node in file A is the same of the order of those same subnodes contained within that node in fileB.
For example,
fileA
<rootNodeA>
<elementA>
<subelementA>content</subElementA>
<subelementB>content</subElementB>
<subelementB>content</subElementC>
<subelementB>content</subElementD>
</elementA>
<elementB>
<subelementA>content</subElementA>
<subelementB>content</subElementB>
</elementB>
</rootNodeA>
fileB
<rootNodeB>
<elementA>
<subelementB>content</subElementB>
<subelementD>content</subElementD>
</elementA>
<elementB>
<subelementA>content</subElementA>
</elementB>
</rootNodeB>
As you see, fileB is a subset of fileA. I need to check that elementA node of file B is equal or a subset of that same elementA node in file A. This should be true for the subnodes (subElementA, etc.) as well and the content of the nodes/subnodes.
Also, if you see elementA in fileA, there are 4 subelements in the order A,B,C,D. For that same elementA in fileB, there are 2 subelements in the order A,D. This order i.e A comes before D is same as the order in file A, need to check this as well.
My idea is to compute Hashes of the nodes and then compare them but unsure of how or if this would satisfy the purpose.
EDIT: Code I have so far,
HashSet<XmlElement> hashA = new HashSet<XmlElement>();
HashSet<XmlElement> hashB = new HashSet<XmlElement>();
foreach (XmlElement node in nodeList)
{
hashA.Add(node);
}
foreach(XmlElement node in masterNodeList)
{
hashB.Add(node);
}
isSubset = new HashSet<XmlElement>(hashA).IsSubsetOf(hashB);
return isSubset;
this sounds like a simple recursive function.
didn't check if it actually work, but that should do it:
public static bool isSubset(XmlElement source, XmlElement target)
{
if (!target.HasChildNodes)
{
if (source.HasChildNodes) // surly not same.
return false;
return string.Equals(source.Value, target.Value); // equalize values.
}
var sourceChildren = source.ChildNodes.OfType<XmlElement>().ToArray(); // list all child tags in source (by order)
var currentSearchIndex = 0; // where are we searching from (where have we found our match)
foreach (var targetChild in target.ChildNodes.OfType<XmlElement>())
{
var findIndex = Array.FindIndex(sourceChildren, currentSearchIndex, el => el.Name == targetChild.Name);
if (findIndex == -1)
return false; // not found in source, therefore not a subset.
if (!isSubset(sourceChildren[findIndex], targetChild))
return false; // if the child is not a subset, then parent isn't too.
currentSearchIndex = findIndex; // increment our search index so we won't match nodes that already passed.
}
}

Cannot find item name C# XML

I'm having a problem with my XML document.
I want my program to find all values of the items in my XML file, but only if the handlingType is of a certain character bunch.
Code (C#) :
string path = "//files//handling.meta";
var doc = XDocument.Load(path);
var items = doc.Descendants("HandlingData").Elements("Item");
var query = from i in items
select new
{
HandlingName = (string)i.Element("handlingName"),
HandlingType = (string)i.Element("HandlingType"),
Mass = (decimal?)i.Element("fMass")
};
foreach (var HandlingType in items)
{
if (HandlingType.ToString() == "HANDLING_TYPE_FLYING")
{
MessageBox.Show(HandlingType.ToString());
}
}
The above code demonstraights a short version of what I want to happen, but fails to find this handlingType (does not show the messageBox)
Here's the XML :
<CHandlingDataMgr>
<HandlingData>
<Item type="CHandlingData">
<handlingName>Plane</handlingName>
<fMass value="380000.000000"/>
<handlingType>HANDLING_TYPE_FLYING</handlingType>
</Item>
<Item type="CHandlingData">
<handlingName>Car1</handlingName>
<fMass value="150000.000000"/>
<handlingType>HANDLING_TYPE_DRIVING</handlingType>
</Item>
</HandlingData>
</CHandlingDataMgr>
I would like the output to show the handlingName if it contains a certain HandlingType
For e.g.
if (handlingType == "HANDLING_TYPE_FLYING")
{
messageBox.Show(this.HandlingName);
}
My problem in short : Program does not find item's handling type, it does find the tag but when asked to display, returns empty/shows as nothing.
Edit: Also in the XML handling_type_flying contains extra elements such as thrust that cannot be found in each item (such as car), I would like the program to also find these elements. (this is a second problem I'm facing, maybe should ask 2nd ques?)
Several things that need fixing.
you are not using your query in your foreach loop. foreach (var item in query)
Your element has an upercase "H" but should be lowercase "handlingType". HandlingType = (string)i.Element("handlingType"),
You are not pulling the Attribute value of your fMass element.Mass = i.Element("fMass").Attribute("value").Value
Once you adjust your Query in your foreach loop you then need to adjust the loop to account for looping over your newly made object.
NOTE that I removed (decimal) from Mass = i.Element("fMass").Attribute("value").Value
here is the code with all the fixes.
class Program
{
static void Main()
{
const string path = "//files//handling.meta";
var doc = XDocument.Load(path);
var items = doc.Descendants("HandlingData").Elements("Item");
var query = from i in items
select new
{
HandlingName = (string)i.Element("handlingName"),
HandlingType = (string)i.Element("handlingType"),
Mass = i.Element("fMass").Attribute("value").Value
};
foreach (var item in query)
{
if (item.HandlingType == "HANDLING_TYPE_FLYING")
{
//Remove messagebox if consoleapp
MessageBox.Show(item.HandlingType);
MessageBox.Show(item.HandlingName);
Console.WriteLine(item.HandlingType);
Console.WriteLine(item.HandlingName);
}
}
}
}
I would recommend looking into serializing your xml to an object.
If you look at http://msdn.microsoft.com/en-us/library/system.xml.linq.xelement(v=vs.110).aspx the ToString() method doesn't return the name of the tag, but the indented XML.
You should instead be using the Value property. Also you should use .equals("...") instead of ==
if (handlingType.Value.equals("HANDLING_TYPE_FLYING")
{
messageBox.Show(this.handlingname);
}

Create XML based on text tree

I need to go from a list like this:
/home
/home/room1
/home/room1/subroom
/home/room2
/home/room2/miniroom
/home/room2/bigroom
/home/room2/hugeroom
/home/room3
to an xml file. I've tried using LINQ to XML to do this but I just end up getting confused and not sure what to do from there. Any help is much appreciated!
Edit:
I want the XML file to look something like this:
<home>
<room1>
<subroom>This is a subroom</subroom>
</room1>
<room2>
<miniroom>This is a miniroom</miniroom>
<bigroom>This is a bigroom</bigroom>
<hugeroom>This is a hugeroom</hugeroom>
</room2>
<room3></room3>
</home>
The text inside if the tags ("this is a subroom", etc) is optional, but would be really nice to have!
Ok buddy, here's a solution.
Couple of notes and explanation.
Your text structure can be split up into lines and then again by the slashes into the names of the XML nodes. If you think of the text in this way, you get a list of "lines" broken into a list of
names.
/home
First of all, the first line /home is the root of the XML; we can get rid of it and just create and XDocument object with that name as the root element;
var xDoc = new XDocument("home");
Of course we don't want to hard code things but this is just an example. Now, on to the real work:
/home/room1/
/home/room1/bigroom
etc...
as a List<T> then it will look like this
myList = new List<List<string>>();
... [ add the items ]
myList[0][0] = home
myList[0][1] = room1
myList[1][0] = home
myList[1][1] = room1
myList[1][2] = bigroom
So what we can do to get the above structure is use string.Split() multiple times to break your text first into lines, then into parts of each line, and end up with a multidimensional array-style List<T> that contains List<T> objects, in this case, List<List<string>>.
First let's create the container object:
var possibleNodes = new List<List<string>>();
Next, we should split the lines. Let's call the variable that holds the text, "text".
var splitLines = text
.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
This gives us a List but our lines are still not broken up. Let's split them again by the slash (/) character. This is where we build our node names. We can do this in a ForEach and just add to our list of possible nodes:
splitLines.ForEach(l =>
possibleNodes.Add(l
.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
.ToList()
)
);
Now, we need to know the DEPTH of the XML. Your text shows that there will be 3 nodes of depth. The node depth is the maximum depth of any one given line of nodes, now stored in the List<List<string>>; we can use the .Max() method to get this:
var nodeDepth = possibleNodes.Max(n => n.Count);
A final setup step: We don't need the first line, because it's just "home" and it will be our root node. We can just create an XDocument object and give it this first line to use as the name of Root:
// Create the root node
XDocument xDoc = new XDocument(new XElement(possibleNodes[0][0]));
// We don't need it anymore
possibleNodes.RemoveAt(0);
Ok, here is where the real work happens, let me explain the rules:
We need to loop through the outer list, and through each inner list.
We can use the list indexes to understand which node to add to or which names to ignore
We need to keep hierarchy proper and not duplicate nodes, and some XLinq helps here
The loops - see the comments for a detailed explanation:
// This gets us looping through the outer nodes
for (var i = 0; i < possibleNodes.Count; i++)
{
// Here we go "sideways" by going through each inner list (each broken down line of the text)
for (var ii = 1; ii < nodeDepth; ii++)
{
// Some lines have more depth than others, so we have to check this here since we are looping on the maximum
if (ii < possibleNodes[i].Count)
{
// Let's see if this node already exists
var existingNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii]));
// Let's also see if a parent node was created in the previous loop iteration.
// This will tell us whether to add the current node at the root level, or under another node
var parentNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii - 1]));
// If the current node has already been added, we do nothing (this if statement is not entered into)
// Otherwise, existingNode will be null and that means we need to add the current node
if (null == existingNode)
{
// Now, use parentNode to decide where to add the current node
if (null == parentNode)
{
// The parent node does not exist; therefore, the current node will be added to the root node.
xDoc.Root.Add(new XElement(possibleNodes[i][ii]));
}
else
{
// There IS a parent node for this node!
// Therefore, we must add the current node to the parent node
// (remember, parent node is the previous iteration of the inner for loop on nodeDepth )
var newNode = new XElement(possibleNodes[i][ii]);
parentNode.Add(newNode);
// Add "this is a" text (bonus!) -- only adding this text if the current node is the last one in the list.
if (possibleNodes[i].Count -1 == ii)
{
newNode.Add(new XText("This is a " + newNode.Name.LocalName));
}
}
}
}
}
}
The bonus here is this code will work with any number of nodes and build your XML.
To check it, XDocument has a nifty .ToString() overriden implementation that just spits out all of the XML it is holding, so all you do is this:
Console.Write(xDoc.ToString());
And, you'll get this result:
(Note I added a test node to make sure it works with more than 3 levels)
Below, you will find the entire program with your test text, etc, as a working solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
namespace XmlFromTextString
{
class Program
{
static void Main(string[] args)
{
// This simulates text from a file; note that it must be flush to the left of the screen or else the extra spaces
// add unneeded nodes to the lists that are generated; for simplicity of code, I chose not to implement clean-up of that and just
// ensure that the string literal is not indented from the left of the Visual Studio screen.
string text =
#"/home
/home/room1
/home/room1/subroom
/home/room2
/home/room2/miniroom
/home/room2/test/thetest
/home/room2/bigroom
/home/room2/hugeroom
/home/room3";
var possibleNodes = new List<List<string>>();
var splitLines = text
.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.ToList();
splitLines.ForEach(l =>
possibleNodes.Add(l
.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
.ToList()
)
);
var nodeDepth = possibleNodes.Max(n => n.Count);
// Create the root node
XDocument xDoc = new XDocument(new XElement(possibleNodes[0][0]));
// We don't need it anymore
possibleNodes.RemoveAt(0);
// This gets us looping through the outer nodes
for (var i = 0; i < possibleNodes.Count; i++)
{
// Here we go "sideways" by going through each inner list (each broken down line of the text)
for (var ii = 1; ii < nodeDepth; ii++)
{
// Some lines have more depth than others, so we have to check this here since we are looping on the maximum
if (ii < possibleNodes[i].Count)
{
// Let's see if this node already exists
var existingNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii]));
// Let's also see if a parent node was created in the previous loop iteration.
// This will tell us whether to add the current node at the root level, or under another node
var parentNode = xDoc.Root.Descendants().FirstOrDefault(d => d.Name.LocalName == (possibleNodes[i][ii - 1]));
// If the current node has already been added, we do nothing (this if statement is not entered into)
// Otherwise, existingNode will be null and that means we need to add the current node
if (null == existingNode)
{
// Now, use parentNode to decide where to add the current node
if (null == parentNode)
{
// The parent node does not exist; therefore, the current node will be added to the root node.
xDoc.Root.Add(new XElement(possibleNodes[i][ii]));
}
else
{
// There IS a parent node for this node!
// Therefore, we must add the current node to the parent node
// (remember, parent node is the previous iteration of the inner for loop on nodeDepth )
var newNode = new XElement(possibleNodes[i][ii]);
parentNode.Add(newNode);
// Add "this is a" text (bonus!) -- only adding this text if the current node is the last one in the list.
if (possibleNodes[i].Count -1 == ii)
{
newNode.Add(new XText("This is a " + newNode.Name.LocalName));
// For the same default text on all child-less nodes, us this:
// newNode.Add(new XText("This is default text"));
}
}
}
}
}
}
Console.Write(xDoc.ToString());
Console.ReadKey();
}
}
}
Time for LINQ magic?
// load file into string[]
var input = File.ReadAllLines("TextFile1.txt");
// in case you have more than one home in your file
var homes =
new XDocument(
new XElement("root",
from line in input
let items = line.Split(new[] { "/" }, StringSplitOptions.RemoveEmptyEntries)
group items by items[0] into g
select new XElement(g.Key,
from rooms in g.OrderBy(x => x.Length).Skip(1)
group rooms by rooms[1] into g2
select new XElement(g2.Key,
from name in g2.OrderBy(x => x.Length).Skip(1)
select new XElement(name[2], string.Format("This is a {0}", name[2]))))));
// get the right home
var home = new XDocument(homes.Root.Element("home"));

How to test whether a node contains particular string or character as its text value?

How to test whether a node contains particular string or character using C# code.
example:
<abc>
<foo>data testing</foo>
<foo>test data</foo>
<bar>data value</bar>
</abc>
Now I need to test the particular node value has the string "testing" ?
The output would be "foo[1]"
You can also that into an XPath document and then use a query:
var xPathDocument = new XPathDocument("myfile.xml");
var query = XPathExpression.Compile(#"/abc/foo[contains(text(),""testing"")]");
var navigator = xpathDocument.CreateNavigator();
var iterator = navigator.Select(query);
while(iterator.MoveNext())
{
Console.WriteLine(iterator.Current.Name);
Console.WriteLine(iterator.Current.Value);
}
This will determine if any elements (not just foo) contain the desired value and will print the element's name and it's entire value. You didn't specify what the exact result should be, but this should get you started. If loading from a file use XElement.Load(filename).
var xml = XElement.Parse(#"<abc>
<foo>data testing</foo>
<foo>test data</foo>
<bar>data value</bar>
</abc>");
// or to load from a file use this
// var xml = XElement.Load("sample.xml");
var query = xml.Elements().Where(e => e.Value.Contains("testing"));
if (query.Any())
{
foreach (var item in query)
{
Console.WriteLine("{0}: {1}", item.Name, item.Value);
}
}
else
{
Console.WriteLine("Value not found!");
}
You can use Linq to Xml
string someXml = #"<abc>
<foo>data testing</foo>
<foo>test data</foo>
</abc>";
XDocument doc = XDocument.Parse(someXml);
bool containTesting = doc
.Descendants("abc")
.Descendants("foo")
.Where(i => i.Value.Contains("testing"))
.Count() >= 1;

Categories

Resources