XML DocumentWriter - Stop empty elements appearing over two lines in .NET - c#

I am using the XmlDocument object in VB.NET to manipulate XML.
My application creates a new XML fragment and updates the XML via the InnerXml property:
reasonFrag.InnerXml = String.Format("<ReasonForPayment>{0}</ReasonForPayment>", reason)
This produces the correct XML output on most occasions, e.g.
<ReasonForPayment>reason</ReasonForPayment>
If the reason string is empty, I get an element spanning two lines in the XML as follows:
<ReasonForPayment>
</ReasonForPayment>
I am looking for a way of keeping the element on a single line while maintaining the same format, e.g.
<ReasonForPayment></ReasonForPayment>
The alternative <ReasonForPayment /> is not acceptable (a third-party application won't accept it).
Thanks
Steven

I think the best way to handle this would be to do something like this:
if (reason == null || reason.Trim() == "") // || short-circuits, so Trim() is never called on null
{
reasonFrag.IsEmpty = true;
}
else
{
reasonFrag.InnerText = reason.Trim();
}
This changes the output to
<ReasonForPayment/>

if (string.IsNullOrEmpty(reason))
{
    reasonFrag.InnerXml = "<ReasonForPayment></ReasonForPayment>";
}
else
{
    reasonFrag.InnerXml = String.Format("<ReasonForPayment>{0}</ReasonForPayment>", reason);
}

Something like the following should work (edit: I threw it into LINQPad and it works just fine for what you need):
// extension method, place in a static class somewhere
public static string ToXmlFragment(this object input, string element)
{
    // treat null the same as an empty string instead of throwing
    string text = input == null ? string.Empty : input.ToString();
    return string.IsNullOrEmpty(text)
        ? string.Format("<{0}></{0}>", element)
        : string.Format("<{0}>{1}</{0}>", element, text);
}
reasonFrag.InnerXml = reason.ToXmlFragment("ReasonForPayment");

The solution to my problem was unusual. When reading or writing files in .NET using StreamReader/StreamWriter, TextReader/TextWriter and the XmlDocument object, the document format changes depending on the file extension. So, for example, when reading a file with an .xml extension, the file is treated and formatted as XML. This was causing my original problem: an empty element was output over two lines with a CRLF inserted. The solution was to output the stream to a file with a .txt extension and then rename the file to .xml; then my formatting was preserved.
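For readers who would rather not depend on the file extension, here is a minimal sketch of another route, assuming the element is built through the DOM instead of String.Format. Setting XmlElement.IsEmpty to false asks for the long form (start tag plus end tag). The class and method names (EmptyElementSketch, BuildReason, ToIndentedString) are made up for illustration. It is also an assumption worth verifying on your runtime that the writer returned by XmlWriter.Create keeps such an element on one line when indenting.

```csharp
using System.IO;
using System.Xml;

static class EmptyElementSketch
{
    // Builds <ReasonForPayment> through the DOM rather than String.Format and InnerXml.
    public static XmlElement BuildReason(XmlDocument doc, string reason)
    {
        XmlElement element = doc.CreateElement("ReasonForPayment");
        if (string.IsNullOrEmpty(reason))
            element.IsEmpty = false;    // long form: start tag plus end tag, not <ReasonForPayment />
        else
            element.InnerText = reason; // InnerText also escapes characters such as & and <
        return element;
    }

    // Serializes with indentation through XmlWriter.Create instead of XmlDocument.Save(path).
    public static string ToIndentedString(XmlNode node)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            node.WriteTo(writer);
        }
        return text.ToString();
    }
}
```

A side benefit of InnerText over formatting a string into InnerXml is that a reason containing "&" or "<" no longer produces malformed XML.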

Related

Convert all HTML entities not predefined for XML to unicode

I am trying to manipulate a string containing HTML code and then save the content to an .htm file. Afterwards the .htm file is imported into a Word file. The goal is to append a document formatted in HTML to a Word document. This process is part of a much larger program and I cannot modify the given parameters.
To easily modify the HTML code, I thought using XDocument would be a great idea.
So I tried this:
AppendContent(string content, Document doc)
{
string filePath = ...; //somewhere in /AppData/Local
var xDoc = XDocument.Parse(content);
// code left out because irrelevant
// Finding all "img" elements, in order to
// extract the embedded picture and save it as external file
FileHelper.SaveToFile(filePath, xDoc.ToString());
//... After this, the file is appended to the word file (the one in doc)
}
The first attempt actually worked with a small test HTML file. Using any of the big documents I'm trying to append to the Word document causes an exception to be thrown:
XDocument.Parse cannot parse entities like "nbsp" or "uuml" (German ü). I already found out that XML only supports a handful of predefined entities, so I would have to manually add the definitions to the HTML file. This is not an option, because this operation is supposed to work with ANY HTML file.
I found following fix:
var decodedContent = WebUtility.HtmlDecode(content);
var xDoc = XDocument.Parse(decodedContent);
This converts all entities to the characters they represent, so "uuml" is converted to "ü", etc. This worked until I hit a document that contained the "amp" entity, which is then converted to "&", and so XDocument.Parse complains again.
I'm looking for a way to convert HTML entities to a Unicode representation ("\0x1234"), or an HTML decode that does not decode the XML-predefined entities.
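One way to get the second behaviour, sketched here under the assumption that the input only needs its named entities handled: match each named entity with a regular expression, leave the five XML-predefined ones (amp, lt, gt, quot, apos) untouched, and pass everything else to WebUtility.HtmlDecode. The class and method names are hypothetical. Numeric references such as &#252; are already valid XML and are not touched.

```csharp
using System.Net;
using System.Text.RegularExpressions;

static class EntitySketch
{
    // Matches named entities such as &nbsp; or &uuml; (not numeric references like &#252;).
    private static readonly Regex NamedEntity = new Regex(@"&([A-Za-z][A-Za-z0-9]*);");

    public static string DecodeNonXmlEntities(string html)
    {
        return NamedEntity.Replace(html, match =>
        {
            switch (match.Groups[1].Value)
            {
                // The five entities predefined by XML stay encoded so the markup remains parseable.
                case "amp":
                case "lt":
                case "gt":
                case "quot":
                case "apos":
                    return match.Value;
                default:
                    // Everything else becomes its character; unknown names are returned unchanged.
                    return WebUtility.HtmlDecode(match.Value);
            }
        });
    }
}
```

This only addresses entities; HTML that is not well-formed in other ways (unclosed tags, unquoted attributes) will still be rejected by XDocument.Parse.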

When saving XML file with XElement, alignment in file changes as well, how to avoid?

I am using
XElement root = XElement.Load(filepath);
to load XML file, then finding things that I need.
IEnumerable<XElement> commands = from command in MyCommands
where (string) command.Attribute("Number") == Number
select command;
foreach (XElement command in commands)
{
command.SetAttributeValue("Group", GroupFound);
}
When I am done with my changes, I save the file with the following code.
root.Save(filepath);
When the file is saved, all the lines in my XML file are affected. Visual Studio aligns all the lines by default, but I need to keep the original file format.
I cannot alter any part of the document except the Group attribute values set by command.SetAttributeValue("Group", ...).
You would need to do:
XElement root = XElement.Load(filepath, LoadOptions.PreserveWhitespace);
then do:
root.Save(filepath, SaveOptions.DisableFormatting);
This will preserve your original whitespace through the use of LoadOptions and SaveOptions.
The information you're looking to preserve is lost to begin with in XDocument.
XDocument doesn't care whether your elements had tabs or spaces on the line in front of them, whether there are multiple whitespace characters between attributes, and so on. If you want to rely on the Save() method, you have to give up the idea that you can preserve formatting exactly.
To preserve formatting you'll need to add custom processing and figure out precisely where to make changes. Alternatively, you may be able to adjust your save options to match the formatting you have, if your XML is coming from a machine and is not human-edited.
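Both answers can be checked with a small experiment. The sketch below, with a hypothetical helper name and a made-up Command element, parses a string with LoadOptions.PreserveWhitespace and writes it back with SaveOptions.DisableFormatting. Whitespace between elements should survive the round trip, while extra spaces between attributes inside a tag are not part of the LINQ to XML tree and are normalized.

```csharp
using System.Xml.Linq;

static class WhitespaceSketch
{
    // Updates the Group attribute of every <Command> child and returns the XML
    // without letting LINQ to XML re-indent it.
    public static string SetGroup(string xml, string group)
    {
        XElement root = XElement.Parse(xml, LoadOptions.PreserveWhitespace);
        foreach (XElement command in root.Elements("Command"))
        {
            command.SetAttributeValue("Group", group);
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

If whitespace inside tags also has to be kept byte for byte, a text-level edit of just the attribute value is probably the only option, as the second answer suggests.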

Parsing Resx file with C# crashes on relative paths

Out of sheer frustration at having to copy resx data into Word to get the word count, I've started to write my own tool to do so.
That made me run into an issue: I have icons and similar things in the Resources.resx file, and they have paths relative to the project they are used in, which they obviously should.
When I try to parse the resx file in another application to count the words in the Value column, I get errors because the relative paths can't be resolved; they end up pointing to folders that do not exist in my word count application.
Does anyone have an idea how I can fool the app into looking in the right folder when parsing these values?
I'm not quite sure why it is parsing those values to begin with; it should just grab the string, which is all I care about.
I'm using the ResXResourceReader:
ResXResourceReader reader = new ResXResourceReader(filename);
foreach(System.Collections.DictionaryEntry de in reader)
{
if (((string)de.Key).EndsWith(".Text"))
{
System.Diagnostics.Debug.WriteLine(string.Format("{0}: {1}", de.Key, de.Value));
}
}
I found this here: Word Count of .resx files
It errors out on the foreach, for example on ..\common\app.ico.
Does anyone have an idea how to do this?
Alright.
So the solution was a little easier than expected.
I was using an outdated class: I should have been using XDocument instead of XmlDataDocument.
Secondly, LINQ is the bomb.
All I had to do was this:
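try
{
    XDocument xDoc = XDocument.Load(resxFile);
    var result = from item in xDoc.Descendants("data")
                 select new
                 {
                     Name = item.Attribute("name").Value,
                     Value = item.Element("value").Value
                 };
    resxGrid.DataSource = result.ToArray();
}
catch (System.Xml.XmlException ex)
{
    // the file was not well-formed XML
    System.Diagnostics.Debug.WriteLine(ex.Message);
}
You can even allow missing attributes or elements if you cast them to (string) instead of reading .Value; the cast returns null where .Value would throw.
Hope this helps someone!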
Try to use ResXResourceReader for this purpose - see http://msdn.microsoft.com/en-us/library/czdde9sc.aspx
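To tie the XDocument approach back to the original goal of a word count, here is a minimal sketch. It assumes that file references in a .resx file carry a type attribute on the data element (for example System.Resources.ResXFileRef) while plain strings do not, so entries with a type attribute are skipped. ResxWordCountSketch and CountWords are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ResxWordCountSketch
{
    public static int CountWords(XDocument resx)
    {
        return resx.Descendants("data")
            // Assumption: entries with a type attribute are file references or other non-string data.
            .Where(data => data.Attribute("type") == null)
            // The (string) cast yields null for a missing <value> instead of throwing.
            .Select(data => (string)data.Element("value") ?? string.Empty)
            // A null separator array splits on any whitespace.
            .Sum(value => value.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length);
    }
}
```

Because the file is read as plain XML, the relative paths are never resolved, which is what made the reader-based version fail.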

How do I work with an XML tag within a string?

I'm working in Microsoft Visual C# 2008 Express.
Let's say I have a string and the contents of the string is: "This is my <myTag myTagAttrib="colorize">awesome</myTag> string."
I'm telling myself that I want to do something to the word "awesome" - possibly call a function that does something called "colorize".
What is the best way in C# to go about detecting that this tag exists and getting that attribute? I've worked a little with XElements and such in C#, but mostly to do with reading in and out XML files.
Thanks!
-Adeena
Another solution:
var myString = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.";
try
{
var document = XDocument.Parse("<root>" + myString + "</root>");
// Requires "using System.Xml.XPath;". The query runs relative to <root>, so its child tags are matched.
var matches = document.Root.XPathSelectElements("myTag|myTag2");
foreach (var element in matches)
{
switch (element.Name.ToString())
{
case "myTag":
//do something with myTag like lookup attribute values and call other methods
break;
case "myTag2":
//do something else with myTag2
break;
}
}
}
catch (Exception e)
{
//string was not well-formed XML
}
I also took into account your comment to Dabblernl where you want parse multiple attributes on multiple elements.
You can extract the XML with a regular expression, load the extracted xml string in a XElement and go from there:
string text = @"This is my <myTag myTagAttrib='colorize'>awesome</myTag> text.";
// Element names are case-sensitive, so the pattern must use myTag, not MyTag
Match match = Regex.Match(text, @"(<myTag.*</myTag>)");
string xml = match.Value;
XElement element = XElement.Parse(xml);
XAttribute attribute = element.Attribute("myTagAttrib");
if (attribute != null && attribute.Value == "colorize") DoSomethingWith(element.Value); // Value = awesome
This code will throw an exception if no myTag element was found, but that can be remedied by wrapping the parsing in a check:
if (match.Success)
{...}
It gets even more interesting if the string could hold more than just the MyTag Tag...
I'm a little confused about your example, because you switch between the string (text content), tags, and attributes. But I think what you want is XPath.
So if your XML stream looks like this:
<adeena><parent><child x="this is my awesome string">This is another awesome string</child></parent></adeena>
You'd use an XPath expression that looks like this to find the attribute:
//child/@x
and one like this to find the text value under the child tag:
//child
I'm a Java developer, so I don't know what XML libraries you'd use to do this. But you'll need a DOM parser to create a W3C Document class instance for you by reading in the XML file and then using XPath to pluck out the values.
There's a good XPath tutorial from W3Schools if you need it.
UPDATE:
If you're saying that you already have an XML stream as String, then the answer is to not read it from a file but from the String itself. Java has abstractions called InputStream and Reader that handle streams of bytes and chars, respectively. The source can be a file, a string, etc. Check your C# DOM API to see if it has something similar. You'll pass the string to a parser that will give back a DOM object that you can manipulate.
Since the input is not well-formed XML you won't be able to parse it with any of the built in XML libraries. You'd need a regular expression to extract the well-formed piece. You could probably use one of the more forgiving HTML parsers like HtmlAgilityPack on CodePlex.
This is my solution to match any type of xml using Regex:
C# Better way to detect XML?
The XmlTextReader can parse XML fragments with a special constructor which may help in this situation, but I'm not positive about that.
There's an in-depth article here:
http://geekswithblogs.net/kobush/archive/2006/04/20/75717.aspx
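The fragment idea mentioned above can also be sketched with XmlReader directly, assuming the string is well-formed apart from lacking a single root element. Setting ConformanceLevel.Fragment lets the reader accept top-level text mixed with elements, so no wrapper element is needed. The class name, method name and the choice of KeyValuePair for the result are illustrative only.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class TagScannerSketch
{
    // Returns (attribute value, element text) pairs for every occurrence of the given tag.
    public static List<KeyValuePair<string, string>> FindTags(string text, string tagName, string attributeName)
    {
        var found = new List<KeyValuePair<string, string>>();
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == tagName)
                {
                    string attribute = reader.GetAttribute(attributeName);
                    // Reads the text content and moves past the end tag, so no extra Read() here.
                    string content = reader.ReadElementContentAsString();
                    found.Add(new KeyValuePair<string, string>(attribute, content));
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return found;
    }
}
```

Calling FindTags(myString, "myTag", "myTagAttrib") on the string from the question would give one pair whose key can then be used to decide which function (such as a "colorize" routine) to call.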

Unit testing XML Generation [closed]

What unit testing strategies do people recommend for testing that XML is being generated correctly?
My current tests seem a bit primitive, something along the lines of:
[Test]
public void pseudo_test()
{
XmlDocument myDoc = _task.MyMethodToMakeXMLDoc();
// expected value first, actual second; OuterXml is a property, not a method
Assert.AreEqual("big string of XML", myDoc.OuterXml);
}
First, as pretty much everyone is saying, validate the XML if there's a schema defined for it. (If there's not, define one.)
But you can build tests that are a lot more granular than that by executing XPath queries against the document, e.g.:
string xml = "Your xml string here";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
string path = "/doc/element1[@id='key1']/element2[. = 'value2']";
Assert.IsTrue(doc.SelectSingleNode(path) != null);
This lets you test not only whether or not your document is semantically valid, but whether or not the method producing it is populating it with the values that you expect.
Fluent Assertions is an excellent library for expressing test assertions in a fluent, easy to read style. It works with all the major Unit Testing frameworks.
It also has some useful XML functionality (all taken from the examples here), for example:
xElementA.Should().Be(xElementB);
xDocument.Should().HaveRoot("configuration");
xDocument.Should().HaveElement("settings");
xElement.Should().HaveAttribute("age", "36");
xElement.Should().HaveElement("address");
xAttribute.Should().HaveValue("Amsterdam");
Note that this works with LINQ-To-XML rather than the XmlDocument object specified in the original question but personally these days I find I'm using LINQ-To-XML as a first choice.
It is also quite easily extensible, should you want to add further XML assertions to fit your needs.
Another possibility might be to use XmlReader and check for an error count > 0. Something like this:
// Filled by the validation callback, so it must be a field rather than a local in CheckXml
private readonly StringCollection _xmlErrors = new StringCollection();
void CheckXml()
{
    string xmlFile = "this.xml";
    string xsdFile = "schema.xsd";
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ValidationEventHandler += new ValidationEventHandler(this.ValidationEventHandler);
    settings.ValidationType = ValidationType.Schema;
    // the chkIgnore* controls are presumably checkboxes on the form this snippet came from
    settings.IgnoreComments = chkIgnoreComments.Checked;
    settings.IgnoreProcessingInstructions = chkIgnoreProcessingInstructions.Checked;
    settings.IgnoreWhitespace = chkIgnoreWhiteSpace.Checked;
    settings.Schemas.Add(null, XmlReader.Create(xsdFile));
    using (XmlReader reader = XmlReader.Create(xmlFile, settings))
    {
        while (reader.Read())
        {
            // reading the whole document is what triggers validation
        }
    }
    Assert.AreEqual(0, _xmlErrors.Count);
}
void ValidationEventHandler(object sender, ValidationEventArgs args)
{
    _xmlErrors.Add("<" + args.Severity + "> " + args.Message);
}
Validate against an XML schema or DTD, and also check that key nodes have the values you expect.
If you have a standard format that you expect the output to be in, why not create an XML schema or DTD and validate against that? This won't depend on the data, so it will be flexible. Also, defining how the XML can be formed can be helpful when designing your system.
This blog post by marianor gives a lightweight way to compare XElement structures, so I'm going to try that before tackling XMLUnit.
The first thing to do is normalize the two XMLs...using Linq... After both elements were normalized, simply you can compare both strings.
The XML is normalized by sorting the element and attribute names.
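A rough sketch of that normalization idea, not the blog post's actual code: rebuild each element with its attributes and child elements sorted by name, then compare the two trees with XNode.DeepEquals. The XmlNormalizerSketch name is made up. Note the stated trade-offs: sorting child elements discards sibling order, and mixed content (text next to child elements) is dropped, so this only suits documents where neither matters.

```csharp
using System.Linq;
using System.Xml.Linq;

static class XmlNormalizerSketch
{
    // Returns a copy of the element with attributes and child elements sorted by name.
    public static XElement Normalize(XElement element)
    {
        var attributes = element.Attributes().OrderBy(a => a.Name.ToString());
        if (element.HasElements)
        {
            return new XElement(element.Name, attributes,
                element.Elements().OrderBy(e => e.Name.ToString()).Select(Normalize));
        }
        // Leaf element: keep its text value.
        return new XElement(element.Name, attributes, element.Value);
    }

    public static bool AreEquivalent(XElement expected, XElement actual)
    {
        return XNode.DeepEquals(Normalize(expected), Normalize(actual));
    }
}
```

In a test this would be used as Assert.IsTrue(XmlNormalizerSketch.AreEquivalent(expected, actual)), which avoids failures caused purely by attribute order.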
Validate it against an XSD schema using the XmlSchema class. It's found in the System.Xml.Schema namespace.
Another option would be to write a serialization class (XmlSerializer) to deserialize your XML into an object. The gain will be that it will implicitly validate your structure, and after that the values can be easily accessed for testing using the resulting object.
Another reason to use a Schema to validate against is that while XML nodes are explicitly ordered, XML attributes are not.
So your string comparison of:
Assert.AreEqual("big string of XML", myDoc.OuterXml);
would fail if the attributes are in a different order, as could easily happen if one bit of XML was created manually and the other programmatically.
Verify the resulting document is well formed
Verify the resulting document is valid
Verify the resulting document is correct.
Presumably, you are crafting an XML document out of useful data, so you will want to ensure that you have the right coverage of inputs for your tests. The most common problems I see are
Incorrectly escaped elements
Incorrectly escaped attributes
Incorrectly escaped element names
Incorrectly escaped attribute names
So if you haven't already done so, you would need to review the XML spec to see what's allowed in each place.
How much "checking" should happen in each test isn't immediately clear. It will depend a lot on what a unit is in your problem space, I suppose. It seems reasonable that each unit test is checking that one piece of data is correctly expressed in the XML. In this case, I'm in agreement with Robert that a simple check that you find the right data at a single XPath location is best.
For larger automated tests, where you want to check the entire document, what I've found to be effective is to have an Expected results which is also a document, and walk through it node by node, using XPath expressions to find the corresponding node in the actual document, and then applying the correct comparison of the data encoded in the two nodes.
With this approach, you'll normally want to catch all failures at once, rather than aborting on first failure, so you may need to be tricksy about how you track where mismatches occurred.
With a bit more work, you can recognize certain element types as being excused from a test (like a time stamp), or to validate that they are pointers to equivalent nodes, or... whatever sort of custom verification you want.
I plan on using this new Approval Testing library to help with XML testing.
It looks to be perfect for the job, but read it first yourself as I don't have experience using it.
Why not assume that some commercial XML parser is correct and validate your XML against it? Something like:
Assert.IsTrue(myDoc.Xml.ParseOK)
Other than that, and if you want to be thorough, I'd say you would have to build a parser yourself and validate each rule the XML specification requires.
You can use a DTD to check for the validity of the generated xml.
To test for the correct content I would go for XMLUnit.
Asserting xml using XMLUnit:
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreDiffBetweenTextAndCDATA(true);
Diff diff = new Diff(expectedDocument, obtainedDocument);
XMLAssert.assertXMLIdentical("xml invalid", diff, true);
One thing you might come across is the fact that the generated xml might contain changing identifiers (id/uid attributes or alike). This can be solved by using a DifferenceListener when asserting the generated xml.
Example implementation of such DifferenceListener:
public class IgnoreVariableAttributesDifferenceListener implements DifferenceListener {
private final List<String> IGNORE_ATTRS;
private final boolean ignoreAttributeOrder;
public IgnoreVariableAttributesDifferenceListener(List<String> attributesToIgnore, boolean ignoreAttributeOrder) {
this.IGNORE_ATTRS = attributesToIgnore;
this.ignoreAttributeOrder = ignoreAttributeOrder;
}
@Override
public int differenceFound(Difference difference) {
// for attribute value differences, check for ignored attributes
if (difference.getId() == DifferenceConstants.ATTR_VALUE_ID) {
if (IGNORE_ATTRS.contains(difference.getControlNodeDetail().getNode().getNodeName())) {
return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
}
}
// attribute order mismatch (optionally ignored)
else if (difference.getId() == DifferenceConstants.ATTR_SEQUENCE_ID && ignoreAttributeOrder) {
return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
}
// attribute missing / not expected
else if (difference.getId() == DifferenceConstants.ATTR_NAME_NOT_FOUND_ID) {
if (IGNORE_ATTRS.contains(difference.getTestNodeDetail().getValue())) {
return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
}
}
return RETURN_ACCEPT_DIFFERENCE;
}
@Override
public void skippedComparison(Node control, Node test) {
// nothing to do
}
}
using DifferenceListener:
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreDiffBetweenTextAndCDATA(true);
Diff diff = new Diff(expectedDocument, obtainedDocument);
diff.overrideDifferenceListener(new IgnoreVariableAttributesDifferenceListener(Arrays.asList("id", "uid"), true));
XMLAssert.assertXMLIdentical("xml invalid", diff, true);
