How to iterate over xml using linq2xml or Xquery

I have an incoming file with data as
<root><![CDATA[<defs><elements>
<element><item>aa</item><int>1</int></element>
<element><item>bb</item><int>2</int></element>
<element><item>cc</item><int>3</int></element>
</elements></defs>]]></root>
Writing multiple foreach (XElement x in root.Elements()) loops seems superfluous!
I'm looking for a less verbose method, preferably using C#.
UPDATE: yes, the input is in a CDATA section. Rest assured it's not my design, and I have ZERO control over it!

Assuming that nasty CDATA section is intentional, and you're only interested in the text content of your leaf elements, you can do something like:
XElement root = XElement.Load(yourFile);
var data = from element in XElement.Parse(root.Value).Descendants("element")
           select new {
               Item = element.Elements("item").First().Value,
               Value = element.Elements("int").First().Value
           };
That said, if the code that generates your input file is under your control, consider getting rid of the CDATA section. Storing XML within XML that way is not the way to go most of the time, as it defeats the purpose of the markup language (and requires multiple parser passes, as shown above).

Related

Custom XML-like Syntax Parsing

I'm attempting to replicate a dialogue system from a game that has control codes, which are HTML/XML-like tags that dictate behavior of a text bubble. For example, changing the color of a piece of text would be like <co FF0000FF>Hello World!</co>. These control codes are not required in the text, so Hello <co FF0000FF>World!</co> or simply Hello World should parse as well.
I've attempted to make it similar to XML to ease parsing, but XML requires a root-level tag to parse successfully, and the text may or may not have any control codes. For example, I'm able to parse the following fine with XElement.
string Text = "<co value=\"FF0000FF\">Hello World!</co>";
XElement.Parse(Text);
However, the following fails with an XmlException ("Data at the root level is invalid. Line 1, position 1."):
string Text = "Hello <co value=\"FF0000FF\">World!</co>";
XElement.Parse(Text);
What would be a good approach to handling this? Is there a way to handle parsing XML elements in a string without requiring a strict XML syntax, or is there another type of parser I can use to achieve what I want?

If the only difference between your XML-like fragments and real XML is the absence of a root element, then simply wrap the fragment in a dummy root element before parsing:
parse("<dummy>" + fragment + "</dummy>")
If there are other differences, for example attributes not being in quotes, or attribute names starting with a digit, then an XML parser isn't going to be much use to you; you will need to write your own parser. Alternatively, an HTML parser such as validator.nu might handle it, if you're lucky.
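For the C# case in the question, the dummy-root idea might look like the sketch below. The class name FragmentParser and the wrapper element name dummy are placeholders chosen for illustration, and the sketch assumes the fragment is otherwise well-formed XML.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FragmentParser
{
    // Wraps the fragment in a throwaway root so XElement.Parse accepts it,
    // then returns the root's child nodes (text and elements) in document order.
    public static List<XNode> ParseFragment(string fragment)
    {
        XElement wrapper = XElement.Parse("<dummy>" + fragment + "</dummy>",
                                          LoadOptions.PreserveWhitespace);
        return wrapper.Nodes().ToList();
    }
}
```

Calling FragmentParser.ParseFragment("Hello <co value=\"FF0000FF\">World!</co>") should give an XText node followed by an XElement named co, and plain text without any control codes should come back as a single XText node.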

You can try with HtmlAgilityPack
Install the NuGet package by running this command: Install-Package HtmlAgilityPack
The following sample returns all descendant nodes. I did not pass a filter to Descendants, but you can add more code as needed.
It will parse your custom format.
string Text = "Hello <co value=\"FF0000FF\">World!</co>";
Text = System.Net.WebUtility.HtmlDecode(Text);
HtmlDocument result = new HtmlDocument();
result.LoadHtml(Text);
List<HtmlNode> nodes = result.DocumentNode.Descendants().ToList();

If the XML elements within your text will always be well-formed, then you can use the XML libraries to do this.
You can either wrap your text inside a root element and use XElement.Parse and read the child nodes, or you can use some lower level bits to allow you to parse the nodes in an XML fragment:
public static IEnumerable<XNode> Parse(string text)
{
    var settings = new XmlReaderSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment
    };
    using (var sr = new StringReader(text))
    using (var xr = XmlReader.Create(sr, settings))
    {
        xr.MoveToContent();
        while (xr.EOF == false)
        {
            yield return XNode.ReadFrom(xr);
        }
    }
}
Using it like this:
foreach (var node in Parse("Hello <co value=\"FF0000FF\">World!</co>"))
{
    Console.WriteLine($"{node.GetType().Name}: {node}");
}
Would output this:
XText: Hello
XElement: <co value="FF0000FF">World!</co>
See this fiddle for a working demo.

When saving XML file with XElement, alignment in file changes as well, how to avoid?

I am using
XElement root = XElement.Load(filepath);
to load XML file, then finding things that I need.
IEnumerable<XElement> commands = from command in MyCommands
where (string) command.Attribute("Number") == Number
select command;
foreach (XElement command in commands)
{
command.SetAttributeValue("Group", GroupFound);
}
When I am done with my changes, I save the file with the following code.
root.Save(filepath);
When the file is saved, all the lines in my XML file are affected. Visual Studio aligns all the lines by default, but I need to keep the original file formatting.
I cannot alter any part of the document except the values of the Group attributes set with command.SetAttributeValue.

You would need to do:
XElement root = XElement.Load(filepath, LoadOptions.PreserveWhitespace);
then do:
root.Save(filepath, SaveOptions.DisableFormatting);
This will preserve your original whitespace through the use of LoadOptions and SaveOptions.
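A quick way to see the effect is a round trip on an in-memory string. This is a minimal sketch: the class and method names are made up for illustration, the Command element name is an assumption about the file, and XElement.Parse and ToString stand in for Load and Save so that no file is needed.

```csharp
using System.Xml.Linq;

public static class WhitespaceRoundTrip
{
    // Parses the text keeping every whitespace node, changes one attribute,
    // and serializes without re-indenting.
    public static string SetGroup(string xml, string group)
    {
        XElement root = XElement.Parse(xml, LoadOptions.PreserveWhitespace);
        foreach (XElement command in root.Elements("Command"))
        {
            command.SetAttributeValue("Group", group);
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Whitespace between elements survives this round trip. Spacing inside a tag, such as extra spaces between attributes, is not stored by LINQ to XML and is rewritten by the serializer.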

The information you're looking to preserve is lost to begin with in the XDocument.
XDocument doesn't care whether your elements had tabs or spaces in front of them, or whether there were multiple whitespace characters between attributes, and so on. If you want to rely on the Save() method, you have to give up the idea that you can preserve formatting.
To preserve formatting you'll need to add custom processing and figure out precisely where to make changes. Alternatively, you may be able to adjust your save options to match the formatting you have, if your XML comes from a machine and is not human-edited.

Merging two xml files in C# without appending and without deleting anything (example given)

So say I have one xml file such as this:
<shapes>
<shape>shape1</shape>
</shapes>
And another xml file like this:
<parentNode>
<shapes>
<shape>shape 2</shape>
</shapes>
</parentNode>
I would like the output to be:
<parentNode>
<shapes>
<shape>shape1</shape>
<shape>shape 2</shape>
</shapes>
</parentNode>
The context is that I am using the visio schema but I wish the config file for an application which writes visio xml files to be a stripped down version of a visio config file. It should allow users to change shape properties, e.g. "process" to have a yellow colour AND it should allow them to add new shapes for example "AccountsTable" which the application will search for before using a standard shape and use the custom shape instead in some circumstances.
In terms of the merge it basically needs to stick the right leaf nodes in the right places if that makes sense? Without overwriting anything unless the config file has been explicitly written to do so, e.g. a custom "shape 2".
What should I be looking at to achieve this? The dataset method is pretty useless.
Many thanks!!!

You can load both files into two XElement objects, locate the target nodes in both objects and add or remove as you wish.
Here is a sample:
var doc1 = XDocument.Parse(file1).Element("shapes");
var doc2 = XDocument.Parse(file2).Element("parentNode").Element("shapes");
doc2.Add(doc1.Nodes());
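Made self-contained, that sample might look like the following sketch. It assumes both documents are already held in strings and that the second file's closing tag is spelled parentNode to match its opening tag; ShapeMerger is a hypothetical name. AddFirst is used instead of Add so that the result lists shape1 before shape 2, as in the desired output.

```csharp
using System.Xml.Linq;

public static class ShapeMerger
{
    // Copies every <shape> from the stripped-down file into the <shapes>
    // element of the full file and returns the merged document.
    public static XDocument Merge(string shapesXml, string parentXml)
    {
        XElement source = XDocument.Parse(shapesXml).Element("shapes");
        XDocument target = XDocument.Parse(parentXml);
        XElement targetShapes = target.Element("parentNode").Element("shapes");

        // AddFirst puts the incoming shapes ahead of the existing ones.
        targetShapes.AddFirst(source.Elements("shape"));
        return target;
    }
}
```

The elements taken from the first document are copied when added, so the source document is left unchanged.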

I don't think there is an easy solution. Since you are not restricted to merging the contents of the shapes node, I think you will have to walk the nodes of one document recursively, checking through XPath whether each node is present in the other document. Once you find a node that is common to both documents, you can merge the contents of one into the other. It is hardly efficient, and there may be a better way, but that's the best I can think of.
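One way to sketch that recursive walk with LINQ to XML is shown below. It is a simplification built on several assumptions: elements are matched by name only (using Element rather than XPath), the first same-named child is treated as the match, and leaf elements are always appended, never overwritten. TreeMerger is a made-up name.

```csharp
using System.Xml.Linq;

public static class TreeMerger
{
    // Merges the children of source into target. Container elements that
    // exist on both sides are merged recursively; everything else is appended.
    public static void MergeInto(XElement target, XElement source)
    {
        foreach (XElement child in source.Elements())
        {
            XElement match = target.Element(child.Name);
            if (match != null && match.HasElements && child.HasElements)
            {
                MergeInto(match, child);
            }
            else
            {
                target.Add(new XElement(child)); // copy, leaving source intact
            }
        }
    }
}
```

For the two files in the question, whose roots differ, the call would start one level down, for example TreeMerger.MergeInto(parentRoot.Element("shapes"), shapesRoot). Explicit overrides such as a custom "shape 2" would need an extra rule for deciding when a leaf replaces an existing one.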

A sketch using XmlReader and XmlWriter; the source and output variables are placeholders.
...
// Requires: using System.Xml;
using (XmlWriter xmlOut = XmlWriter.Create(someStreamOrOther))
{
    xmlOut.WriteStartElement("parentNode");
    xmlOut.WriteStartElement("shapes");
    foreach (string source in new[] { XmlSourceVariableHere, XmlSourceVariableToMergeHere })
    {
        using (XmlReader reader = XmlReader.Create(source))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "shape")
                    xmlOut.WriteNode(reader, true); // copies the element and moves the reader past it
                else
                    reader.Read();
            }
        }
    }
    xmlOut.WriteEndElement(); // end shapes
    xmlOut.WriteEndElement(); // end parentNode
}
XmlWriter.WriteNode is the method that copies a node from a reader to a writer. It also advances the reader past the copied node, which is why the loop only calls Read() when nothing was copied.
What exactly do you mean by the following?
"In terms of the merge it basically needs to stick the right leaf nodes in the right places if that makes sense? Without overwriting anything unless the config file has been explicitly written to do so, e.g. a custom "shape 2"."
You'll have to explain your requirements a bit more if you want an answer to be more detailed than simply merging nodes.

How do I work with an XML tag within a string?

I'm working in Microsoft Visual C# 2008 Express.
Let's say I have a string and the contents of the string is: "This is my <myTag myTagAttrib="colorize">awesome</myTag> string."
I'm telling myself that I want to do something to the word "awesome" - possibly call a function that does something called "colorize".
What is the best way in C# to go about detecting that this tag exists and getting that attribute? I've worked a little with XElements and such in C#, but mostly to do with reading in and out XML files.
Thanks!
-Adeena

Another solution:
// Requires: using System.Xml; using System.Xml.Linq; using System.Xml.XPath;
var myString = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.";
try
{
    var document = XDocument.Parse("<root>" + myString + "</root>");
    // Evaluate relative to the dummy root; evaluated from the document node, "myTag" would match nothing.
    var matches = document.Root.XPathSelectElements("myTag|myTag2");
    foreach (var element in matches)
    {
        switch (element.Name.ToString())
        {
            case "myTag":
                // do something with myTag, like looking up attribute values and calling other methods
                break;
            case "myTag2":
                // do something else with myTag2
                break;
        }
    }
}
catch (XmlException)
{
    // the string was not well-formed XML
}
I also took into account your comment to Dabblernl, where you want to parse multiple attributes on multiple elements.

You can extract the XML with a regular expression, load the extracted xml string in a XElement and go from there:
string text = @"This is my<myTag myTagAttrib='colorize'>awesome</myTag> text.";
Match match = Regex.Match(text, @"(<myTag.*</myTag>)"); // Regex is case-sensitive, so the pattern must match the tag's casing
string xml = match.Captures[0].Value;
XElement element = XElement.Parse(xml);
XAttribute attribute = element.Attribute("myTagAttrib");
if (attribute.Value == "colorize") DoSomethingWith(element.Value); // Value = awesome
This code will throw an exception if no myTag element was found, but that can be remedied by wrapping it in:
if (match.Captures.Count != 0)
{...}
It gets even more interesting if the string could hold more than just the MyTag Tag...

I'm a little confused about your example, because you switch between the string (text content), tags, and attributes. But I think what you want is XPath.
So if your XML stream looks like this:
<adeena><parent><child x="this is my awesome string">This is another awesome string</child></parent></adeena>
You'd use an XPath expression that looks like this to find the attribute:
//child/@x
and one like this to find the text value under the child tag:
//child
I'm a Java developer, so I don't know what XML libraries you'd use to do this. But you'll need a DOM parser to create a W3C Document class instance for you by reading in the XML file and then using XPath to pluck out the values.
There's a good XPath tutorial at W3Schools if you need it.
UPDATE:
If you're saying that you already have an XML stream as String, then the answer is to not read it from a file but from the String itself. Java has abstractions called InputStream and Reader that handle streams of bytes and chars, respectively. The source can be a file, a string, etc. Check your C# DOM API to see if it has something similar. You'll pass the string to a parser that will give back a DOM object that you can manipulate.

Since the input is not well-formed XML you won't be able to parse it with any of the built in XML libraries. You'd need a regular expression to extract the well-formed piece. You could probably use one of the more forgiving HTML parsers like HtmlAgilityPack on CodePlex.

This is my solution to match any type of xml using Regex:
C# Better way to detect XML?

The XmlTextReader can parse XML fragments with a special constructor which may help in this situation, but I'm not positive about that.
There's an in-depth article here:
http://geekswithblogs.net/kobush/archive/2006/04/20/75717.aspx

Unit testing XML Generation [closed]

What unit testing strategies do people recommend for testing that XML is being generated correctly?
My current tests seem a bit primitive, something along the lines of:
[Test]
public void pseudo_test()
{
    XmlDocument myDoc = _task.MyMethodToMakeXMLDoc();
    Assert.AreEqual("big string of XML", myDoc.OuterXml);
}

First, as pretty much everyone is saying, validate the XML if there's a schema defined for it. (If there's not, define one.)
But you can build tests that are a lot more granular than that by executing XPath queries against the document, e.g.:
string xml = "Your xml string here";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
string path = "/doc/element1[@id='key1']/element2[. = 'value2']";
Assert.IsTrue(doc.SelectSingleNode(path) != null);
This lets you test not only whether or not your document is semantically valid, but whether or not the method producing it is populating it with the values that you expect.

Fluent Assertions is an excellent library for expressing test assertions in a fluent, easy to read style. It works with all the major Unit Testing frameworks.
It also has some useful XML functionality (all taken from the examples here), for example:
xElementA.Should().Be(xElementB);
xDocument.Should().HaveRoot("configuration");
xDocument.Should().HaveElement("settings");
xElement.Should().HaveAttribute("age", "36");
xElement.Should().HaveElement("address");
xAttribute.Should().HaveValue("Amsterdam");
Note that this works with LINQ-To-XML rather than the XmlDocument object specified in the original question but personally these days I find I'm using LINQ-To-XML as a first choice.
It is also quite easily extensible, should you want to add further XML assertions to fit your needs.

Another possibility might be to use XmlReader and check for an error count > 0. Something like this:
// Filled in by the validation callback, so it has to be a field rather than a local.
private readonly StringCollection _xmlErrors = new StringCollection();

void CheckXml()
{
    string xmlFile = "this.xml";
    string xsdFile = "schema.xsd";
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationEventHandler += ValidationEventHandler;
    settings.ValidationType = ValidationType.Schema;
    // The chk* names are presumably UI checkboxes; substitute plain booleans in a test.
    settings.IgnoreComments = chkIgnoreComments.Checked;
    settings.IgnoreProcessingInstructions = chkIgnoreProcessingInstructions.Checked;
    settings.IgnoreWhitespace = chkIgnoreWhiteSpace.Checked;
    settings.Schemas.Add(null, XmlReader.Create(xsdFile));
    using (XmlReader reader = XmlReader.Create(xmlFile, settings))
    {
        while (reader.Read())
        {
            // Reading to the end runs validation over every node.
        }
    }
    Assert.AreEqual(0, _xmlErrors.Count);
}

void ValidationEventHandler(object sender, ValidationEventArgs args)
{
    _xmlErrors.Add("<" + args.Severity + "> " + args.Message);
}

Validate against an XML schema or DTD, and also check that key nodes have the values you expect.

If you have a standard format that you expect the output to follow, why not create an XML schema or DTD and validate against that? This won't depend on the data, so it will be flexible. Defining how the XML can be formed can also be helpful when designing your system.

This blog post by marianor gives a lightweight way to compare XElement structures, so I'm going to try that before tackling XMLUnit.
The first thing to do is normalize the two XMLs...using Linq... After both elements were normalized, simply you can compare both strings.
The XML is normalized by sorting the element and attribute names.
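A rough sketch of that normalize-then-compare idea follows. This is not the code from the blog post; XmlNormalizer and its methods are names invented here, and the sketch assumes element order does not matter for your format, which is not true of XML in general.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlNormalizer
{
    // Rebuilds the element with attributes and child elements sorted by name.
    // Text mixed in between child elements is dropped by this simplification.
    public static XElement Normalize(XElement element)
    {
        var attributes = element.Attributes().OrderBy(a => a.Name.ToString());
        if (element.HasElements)
        {
            return new XElement(element.Name, attributes,
                element.Elements()
                       .OrderBy(e => e.Name.ToString())
                       .Select(e => Normalize(e)));
        }
        return new XElement(element.Name, attributes, element.Value);
    }

    // Two elements count as equivalent when their normalized text matches.
    public static bool AreEquivalent(XElement a, XElement b)
    {
        return Normalize(a).ToString() == Normalize(b).ToString();
    }
}
```

In a test this would be used as Assert.IsTrue(XmlNormalizer.AreEquivalent(expected, actual)). Siblings that share a name keep their relative order, so reordered repeated elements still compare as different.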

Validate it against an XSD schema using the XmlSchema class. It's found under System.Xml.Schema, I think.
Another option would be to write a serialization class (XmlSerializer) to deserialize your XML into an object. The gain is that it implicitly checks your structure, and afterwards the values can be easily accessed for testing through the resulting object.
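As a sketch of the deserialization route: the Person type, its members, and the element and attribute names below are invented for illustration. Note that XmlSerializer skips elements it does not recognise by default, so this approach checks values more than it enforces structure.

```csharp
using System.IO;
using System.Xml.Serialization;

[XmlRoot("person")]
public class Person
{
    [XmlAttribute("age")]
    public int Age { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

public static class PersonXml
{
    // Turns the generated XML back into an object so a test can assert on properties.
    public static Person Read(string xml)
    {
        var serializer = new XmlSerializer(typeof(Person));
        using (var reader = new StringReader(xml))
        {
            return (Person)serializer.Deserialize(reader);
        }
    }
}
```

A test can then call PersonXml.Read(generatedXml) and assert on Age and Name directly instead of comparing strings.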

Another reason to use a Schema to validate against is that while XML nodes are explicitly ordered, XML attributes are not.
So your string comparison of:
Assert.AreEqual("big string of XML", myDoc.OuterXml);
would fail if the attributes are in a different order, as could easily happen if one piece of XML was created manually and the other programmatically.

Verify the resulting document is well formed
Verify the resulting document is valid
Verify the resulting document is correct.
Presumably, you are crafting an XML document out of useful data, so you will want to ensure that you have the right coverage of inputs for your tests. The most common problems I see are
Incorrectly escaped elements
Incorrectly escaped attributes
Incorrectly escaped element names
Incorrectly escaped attribute names
So if you haven't already done so, you would need to review the XML spec to see what's allowed in each place.
How much "checking" should happen in each test isn't immediately clear. It will depend a lot on what a unit is in your problem space, I suppose. It seems reasonable that each unit test is checking that one piece of data is correctly expressed in the XML. In this case, I'm in agreement with Robert that a simple check that you find the right data at a single XPath location is best.
For larger automated tests, where you want to check the entire document, what I've found to be effective is to have an Expected results which is also a document, and walk through it node by node, using XPath expressions to find the corresponding node in the actual document, and then applying the correct comparison of the data encoded in the two nodes.
With this approach, you'll normally want to catch all failures at once, rather than aborting on first failure, so you may need to be tricksy about how you track where mismatches occurred.
With a bit more work, you can recognize certain element types as being excused from a test (like a time stamp), or to validate that they are pointers to equivalent nodes, or... whatever sort of custom verification you want.

I plan on using this new Approval Testing library to help with XML testing.
It looks to be perfect for the job, but read it first yourself as I don't have experience using it.

Why not assume that some commercial XML parser is correct and validate your XML against it? Something like:
Assert.IsTrue(myDoc.Xml.ParseOK)
Other than that, if you want to be thorough, I'd say you would have to build a parser yourself and validate each rule that the XML specification requires.

You can use a DTD to check for the validity of the generated xml.
To test for the correct content I would go for XMLUnit.
Asserting xml using XMLUnit:
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreDiffBetweenTextAndCDATA(true);
Diff diff = new Diff(expectedDocument, obtainedDocument);
XMLAssert.assertXMLIdentical("xml invalid", diff, true);
One thing you might come across is the fact that the generated xml might contain changing identifiers (id/uid attributes or alike). This can be solved by using a DifferenceListener when asserting the generated xml.
Example implementation of such DifferenceListener:
public class IgnoreVariableAttributesDifferenceListener implements DifferenceListener {
    private final List<String> IGNORE_ATTRS;
    private final boolean ignoreAttributeOrder;

    public IgnoreVariableAttributesDifferenceListener(List<String> attributesToIgnore, boolean ignoreAttributeOrder) {
        this.IGNORE_ATTRS = attributesToIgnore;
        this.ignoreAttributeOrder = ignoreAttributeOrder;
    }

    @Override
    public int differenceFound(Difference difference) {
        // for attribute value differences, check for ignored attributes
        if (difference.getId() == DifferenceConstants.ATTR_VALUE_ID) {
            if (IGNORE_ATTRS.contains(difference.getControlNodeDetail().getNode().getNodeName())) {
                return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
            }
        }
        // attribute order mismatch (optionally ignored)
        else if (difference.getId() == DifferenceConstants.ATTR_SEQUENCE_ID && ignoreAttributeOrder) {
            return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
        }
        // attribute missing / not expected
        else if (difference.getId() == DifferenceConstants.ATTR_NAME_NOT_FOUND_ID) {
            if (IGNORE_ATTRS.contains(difference.getTestNodeDetail().getValue())) {
                return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
            }
        }
        return RETURN_ACCEPT_DIFFERENCE;
    }

    @Override
    public void skippedComparison(Node control, Node test) {
        // nothing to do
    }
}
using DifferenceListener:
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreDiffBetweenTextAndCDATA(true);
Diff diff = new Diff(expectedDocument, obtainedDocument);
diff.overrideDifferenceListener(new IgnoreVariableAttributesDifferenceListener(Arrays.asList("id", "uid"), true));
XMLAssert.assertXMLIdentical("xml invalid", diff, true);
