I'm trying to build a simple way for non-programmers to define loops and conditional logic within an XML file.
I've decided to use <Loop></Loop> and <Condition></Condition> tags to handle these cases. The idea is that the Loop and Condition tags will be replaced with handlebars.js #each and #if statements within the XML, e.g.:
<Meeting>
<Loop Target="People">
<Person>
<Name>{{Name}}</Name>
<Surname>{{Surname}}</Surname>
</Person>
</Loop>
</Meeting>
This will need to end up as:
<Meeting>
{{#each People}}
<Person>
<Name>{{Name}}</Name>
<Surname>{{Surname}}</Surname>
</Person>
{{/each}}
</Meeting>
And likewise for Condition tags, which should be converted to the handlebars #if statement.
I've attempted to use .NET's XDocument API, but I'm struggling to figure out how to actually accomplish this (the IXmlLineInfo instance doesn't give me enough information).
It seems as if I may need to parse the raw string to get the start and end positions of both the opening and closing tags in order to do a straight text replacement. But I'm not sure how to do this in a way that handles the numerous edge cases that come with editable text.
I was able to accomplish this without raw string manipulation or IXmlLineInfo.
When encountering a Loop element I simply:
//1. Add the opening handlebars code to the parent element, before the Loop element
loopNode.AddBeforeSelf("{{#each " + collection.Value + "}}");
//2. Add all the Loop element's child nodes (elements and text) to the parent element, just after the handlebars code
loopNode.AddBeforeSelf(loopNode.Nodes());
//3. Add the closing handlebars statement to the parent, after the Loop element
loopNode.AddAfterSelf("{{/each}}");
//4. Finally, remove the Loop element from the DOM
loopNode.Remove();
And I'm left with exactly what I need to send into the handlebars renderer.
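A minimal sketch of the same four steps applied to both tag types might look like the following. It assumes (hypothetically) that <Condition> carries its expression in a Target attribute just like <Loop>; the question doesn't say which attribute it uses, and the class and method names are illustrative. It copies Nodes() so that text sitting directly inside a Loop or Condition is kept, and it processes the matches innermost-first: if an outer Loop were handled first, its children would be copied out and the inner Loop still in the list would be the detached original, so rewriting it would not affect the document.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class HandlebarsTemplateConverter
{
    // Rewrites <Loop Target="X"> as {{#each X}}...{{/each}} and
    // <Condition Target="Y"> as {{#if Y}}...{{/if}}.
    // ASSUMPTION: Condition uses a "Target" attribute, like Loop.
    public static string ToHandlebars(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Snapshot the matches and reverse document order, so descendants
        // are rewritten before their ancestors (innermost first).
        var nodes = doc.Descendants()
            .Where(e => e.Name == "Loop" || e.Name == "Condition")
            .Reverse()
            .ToList();

        foreach (XElement node in nodes)
        {
            string helper = node.Name == "Loop" ? "each" : "if";
            string target = (string)node.Attribute("Target") ?? string.Empty;

            // 1. Opening handlebars statement, placed before the element
            node.AddBeforeSelf("{{#" + helper + " " + target + "}}");
            // 2. Copies of all child nodes (elements and text), placed after it
            node.AddBeforeSelf(node.Nodes());
            // 3. Closing handlebars statement, placed after the element
            node.AddAfterSelf("{{/" + helper + "}}");
            // 4. Remove the original element
            node.Remove();
        }

        // Handlebars does not care about whitespace, so write it compactly
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, ToHandlebars applied to the Meeting sample (with its whitespace removed) returns the {{#each People}} version on a single line.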
I am currently facing a problem. I am loading an XML file in C#, removing some nodes from it and appending some others. The problem is that when I remove nodes, some empty lines are created automatically, and I want to remove these lines.
And when I append some nodes to the parent node in the XML, I want a new line after each closing tag.
For example, my XML file is:
<intro id="S0001">
<title>Introduction Title</title>
<para>This is a paragraph. Note that paragraphs can contain other block–level objects, such as lists, as well as directly containing text.</para>
<para>The introduction can contain all of the text objects that a section can contain, except that it cannot be divided into parts, sections and sub–sections.</para>
<para>The introduction can contain tables:</para>
</intro><part>
<no>Part A</no> Article Structure <sup>(Part Title)</sup><section1 id="S0002">
<no>Sect 1</no>
<title>First Section in Part 1 <sup>(Section 1 Title)</sup></title>
<shortsectionhead>Short Section Header</shortsectionhead>
<para>This is a section in the first part of the article.</para>
</section1><section1 id="S0003">
Code:
XmlNode partNnode = xmlDoc.SelectSingleNode("//part");
XmlNode introNode = xmlDoc.SelectSingleNode("//intro");
// Copy <part> into a fragment and insert the copy right after <intro>
XmlDocumentFragment newNode = xmlDoc.CreateDocumentFragment();
newNode.InnerXml = partNnode.OuterXml;
introNode.ParentNode.InsertAfter(newNode, introNode);
// Remove the original <part>
partNnode.ParentNode.RemoveChild(partNnode);

// Append a copy of every <section1> to the (new) <part>
partNnode = xmlDoc.SelectSingleNode("//part");
XmlNodeList nodeList = xmlDoc.SelectNodes("//section1");
foreach (XmlNode refrangeNode in nodeList)
{
    newNode = xmlDoc.CreateDocumentFragment();
    newNode.InnerXml = refrangeNode.OuterXml;
    partNnode.AppendChild(newNode);
}
Please help me
Thanks in advance
If you load and save an XML file with C#, the XML should be formatted correctly (an easy way to tidy up strange-looking XML files is simply to load and save them with some C# code).
If I understand your question correctly, then you are just not happy with the format of the XML file?
That is, you want (A):
</intro><part>
But you get (B):
</intro>
<part>
If that is the question, then in my view you are asking for something unusual, because:
a) Code doesn't care how the XML file is formatted and
b) The format in (B) is the correct one
If, for whatever reason, you still want to change it, then you have to parse the XML file yourself, opening it as a string and checking manually for opening and closing tags.
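For completeness, the load-and-save route mentioned at the start of this answer can be sketched like this, assuming the whole file fits in memory (the class and method names are made up for the example). Loading with PreserveWhitespace = false discards whitespace-only text nodes, which is where the blank lines left behind by removed nodes live, and an indenting XmlWriter then puts each element on its own line. Note that XmlWriter stops indenting inside an element once it writes text into it, so mixed-content elements such as <part> in the sample won't be fully split onto separate lines.

```csharp
using System.Text;
using System.Xml;

public static class XmlTidy
{
    // Reloads the XML without insignificant whitespace and re-serialises it
    // with one element per indented line.
    public static string Reformat(string xml)
    {
        // PreserveWhitespace = false drops whitespace-only text nodes,
        // including blank lines left behind by removed nodes.
        var doc = new XmlDocument { PreserveWhitespace = false };
        doc.LoadXml(xml);

        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            NewLineChars = "\n",
            OmitXmlDeclaration = true
        };

        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            doc.Save(writer);
        }
        return output.ToString();
    }
}
```

To write back to disk instead, the same settings can be passed to XmlWriter.Create with a file path.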
I have an incoming file with data like this:
<root><![CDATA[<defs><elements>
<element><item>aa</item><int>1</int></element>
<element><item>bb</item><int>2</int></element>
<element><item>cc</item><int>3</int></element>
</elements></defs>]]></root>
Writing multiple foreach (XElement x in root.Elements()) loops seems superfluous!
I'm looking for a less verbose method, preferably using C#.
UPDATE: yes, the input is in a CDATA section; rest assured it's not my design and I have ZERO control over it!
Assuming that nasty CDATA section is intentional, and you're only interested in the text content of your leaf elements, you can do something like:
XElement root = XElement.Load(yourFile);
var data = from element in XElement.Parse(root.Value).Descendants("element")
           select new {
               Item = element.Elements("item").First().Value,
               Value = element.Elements("int").First().Value
           };
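A self-contained version of the same two-pass idea, as a sketch: it parses from a string rather than calling XElement.Load(yourFile) so it can be tried directly, and the class name, method name and tuple shape are illustrative. The explicit (string) and (int) conversions on XElement stand in for Elements(...).First().Value and also turn the <int> text into a number.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CdataDefinitions
{
    // Two parser passes: the outer document, then the XML text stored
    // inside the <root> element's CDATA section.
    public static List<(string Item, int Value)> Read(string fileContent)
    {
        XElement root = XElement.Parse(fileContent);

        // root.Value is the CDATA text, which is itself an XML document
        XElement defs = XElement.Parse(root.Value);

        return defs.Descendants("element")
            .Select(e => (Item: (string)e.Element("item"), Value: (int)e.Element("int")))
            .ToList();
    }
}
```

Note that (int)e.Element("int") throws if an <element> has no <int> child; using (int?) instead would yield null for missing values.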
That said, if the code that generates your input file is under your control, consider getting rid of the CDATA section. Storing XML within XML that way is not the way to go most of the time, as it defeats the purpose of the markup language (and requires multiple parser passes, as shown above).
I'm attempting to find complete XML objects in a string. They have been placed in the string by an XmlSerializer, but may or may not be complete. I've toyed with the idea of using a regular expression, because it seems like the kind of thing they were built for, except for the fact that I'm trying to parse XML.
I'm trying to find complete objects in the form:
<?xml version="1.0"?>
<type>
<field>value</field>
...
</type>
My thought was a regex to find <?xml version="1.0"?><type> and </type>, but if a field has the same name as type, it obviously won't work.
There's plenty of documentation on XML parsers, but they seem to all need a complete, fully-formed document to parse. My XML objects can be in a string surrounded by pretty much anything else (including other complete objects).
hw<e>reR#lot$0fr#ndm&nchrs%<?xml version="1.0"?><type><field>...</field>...</type>#ndH#r$omOre!!>nuT6erjc?y!<?xml version="1.0"?><type><field>...</field>...</type>ty!=]
A regex would be able to match a string while excluding the random characters, but not find a complete XML object. I'd like some way to extract an object, parse it with a serializer, then repeat until the string contains no more valid objects.
Can you use a regular expression to search for the "<?xml" piece and assume that's the beginning of an XML object, then use an XmlReader to read and check the remainder of the string until you have parsed one entire element at the root level (and then stop reading from the stream with the XmlReader once the root node has been completely parsed)?
Edit: For more information about using XmlReader, I suggest one of the questions I asked: I can never predict xmlreader behavior, any tips on understanding?
My final solution was to stick with the Read method when parsing XML and to avoid other methods that also read from the stream and advance the current position.
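A minimal sketch of that approach, under a few assumptions: each object starts with an XML declaration as in the question's sample, and the class and method names are hypothetical. It uses string.IndexOf rather than a regex, since "<?xml" is a fixed marker. ReadSubtree is used so that the outer reader stops on the root element's end tag and never tries to parse the junk that follows it; an incomplete object surfaces as an XmlException and is skipped. Nesting is tracked by the reader, so a field with the same name as the type is not a problem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class EmbeddedXmlExtractor
{
    // Returns every complete root element that follows an "<?xml" marker.
    public static List<XElement> Extract(string text)
    {
        var results = new List<XElement>();
        int start = text.IndexOf("<?xml", StringComparison.Ordinal);

        while (start >= 0)
        {
            try
            {
                using (XmlReader reader = XmlReader.Create(new StringReader(text.Substring(start))))
                {
                    // Skips the declaration and lands on the root element
                    reader.MoveToContent();
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        // The subtree reader ends at the root's end tag, so the
                        // outer reader never parses whatever comes after it.
                        using (XmlReader subtree = reader.ReadSubtree())
                        {
                            results.Add(XElement.Load(subtree));
                        }
                    }
                }
            }
            catch (XmlException)
            {
                // Incomplete or malformed object: skip it
            }

            start = text.IndexOf("<?xml", start + 1, StringComparison.Ordinal);
        }
        return results;
    }
}
```

Each returned XElement can then be handed to an XmlSerializer through its CreateReader() method.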
You could try using the Html Agility Pack, which can be used to parse "malformed XML" and make it accessible with a DOM.
It would be necessary to know which element you are looking for (like <type> in your example), because it will be parsing the accidental elements too (like <e> in your example).
I'm using XSLT to transform an XML document into XML in a different format. If an element has no data, it is output as a self-closing tag, e.g. <data />, but I want to output it with a closing tag, like this: <data></data>.
If I change the output method from "xml" to "html", I get <data></data>, but I lose the <?xml version="1.0" encoding="UTF-8"?> declaration at the top of the document. Is this the correct way of doing this?
Many thanks.
Daoming
If you want this because you think that self closing tags are ugly, then get over it.
If you want to pass the output to some non-conformant XML parser that is under your control, then use a better parser, or fix the one you are using.
If it is out of your control, and you must send it to an inadequate XML Parser, then do you really need the prolog? If not, then html output method is fine.
If you do need the XML prolog, then you could use the html output method, and prepend the prolog after transformation, but before sending it to the deficient parser.
Alternatively, you could output it as XML with self-closing tags, and preprocess before sending it to your deficient parser with some kind of custom serialisation, using the DOM. If it can't handle self-closing tags, then I'm sure that isn't the only way in which it fails to parse XML. You might need to do something about namespaces, for example.
You could try adding an empty text node to any empty elements that you are outputting. That might do the trick.
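If the XSLT output is post-processed in C# (the custom-serialisation route described above), one way to do this with LINQ to XML is to give each empty element an empty string as content, which makes the serializer write a start/end tag pair. A minimal sketch follows; the class and method names are illustrative.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ExplicitEndTags
{
    // Gives every empty element an empty string as content. LINQ to XML then
    // writes <data></data> rather than <data />; the XML meaning is unchanged.
    public static XDocument Expand(XDocument doc)
    {
        // ToList() so the tree is not modified while Descendants() is enumerating it
        foreach (XElement element in doc.Descendants().Where(e => e.IsEmpty).ToList())
        {
            element.Value = string.Empty;
        }
        return doc;
    }
}
```

XDocument.Save writes the XML declaration (ToString() does not), so saving the expanded document keeps the prolog.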
Self-closed and explicitly closed elements are exactly the same thing in any regard whatsoever.
Only if somewhere along your processing chain there is a tool that is not XML aware (code that does XML processing with regex, for example), it might make a difference. At which point you should think about changing that part of the processing, instead of the XML generation/serialization part.
I'd like to strip out occurrences of a specific tag, leaving the inner XML intact. I'd like to do this with one pass (rather than searching, replacing, and starting from scratch again). For instance, from the source:
<element>
<RemovalTarget Attribute="Something">
Content Here
</RemovalTarget>
</element>
<element>
More Here
</element>
I'd like the result to be:
<element>
Content Here
</element>
<element>
More Here
</element>
I've tried something like this (forgive me, I'm new to Linq):
var elements = from element in doc.Descendants()
               where element.Name.LocalName == "RemovalTarget"
               select element;
foreach (var element in elements) {
    element.AddAfterSelf(element.Value);
    element.Remove();
}
but on the second time through the loop I get a null reference, presumably because the collection is invalidated by changing it. What is an efficient way to remove these tags from a potentially large document?
You'll have to skip the deferred execution with a call to ToList, which probably won't hurt your performance on large documents, as you're just going to be iterating and replacing at a much lower big-O than the original search. As @jacob_c pointed out, I should be using element.Nodes() to replace it properly, and as @Panos pointed out, I should reverse the list in order to handle nested replacements accurately.
Also, use XElement.ReplaceWith, which is much faster than your current approach on large documents:
var elements = doc.Descendants("RemovalTarget").ToList();
/* List<T>.Reverse() reverses in place and returns void, so it can't be
 * chained after ToList(). It may also be faster than Reverse on the
 * IEnumerable<T>; needs benchmarking, but can't be any slower.
 */
elements.Reverse();
foreach (var element in elements) {
    element.ReplaceWith(element.Nodes());
}
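A self-contained sketch of the answer's approach, with the question's sample wrapped in a <root> element (an assumption, since the sample has two top-level elements and would not parse as a document on its own). The class and method names are illustrative.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RemovalTargetStripper
{
    // Replaces every <RemovalTarget> with its own child nodes.
    public static string Strip(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // ToList() ends deferred execution; reversing it handles nested
        // RemovalTarget elements from the inside out.
        var targets = doc.Descendants("RemovalTarget").ToList();
        targets.Reverse();

        foreach (XElement target in targets)
        {
            target.ReplaceWith(target.Nodes());
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Without the reversal, a nested case like <a><RemovalTarget><RemovalTarget>x</RemovalTarget></RemovalTarget></a> would keep the inner tag, because the outer replacement copies it out before the original inner element is processed.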
One last point: in reviewing what this MAY be used for, I tend to agree with @Trull that XSLT may be what you're actually looking for if, say, you're removing all <b> tags from a document. Otherwise, enjoy this fairly decent and fairly well-performing LINQ to XML implementation.
Have you considered using XSLT? It seems like the perfect solution, as you are doing exactly what XSLT is meant for: transforming one XML document into another. The templating system will delve into nested nastiness for you without problems.
Here is a basic example
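As a sketch of what such an example could look like (not necessarily the one the answer referred to): the usual pattern is an identity transform plus one template that matches RemovalTarget and applies templates to its child nodes only, so the wrapper and its attributes disappear while its content is copied. Here the stylesheet is embedded in C# and run with XslCompiledTransform; the class and method names are illustrative.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class RemovalTargetXslt
{
    // Identity transform, plus a template that drops the RemovalTarget
    // wrapper but keeps processing its child nodes.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='RemovalTarget'>
    <xsl:apply-templates select='node()'/>
  </xsl:template>
</xsl:stylesheet>";

    public static string Strip(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

The more specific RemovalTarget template wins over the identity template because of XSLT's default priorities, so nested RemovalTarget elements are unwrapped too.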
I would recommend XSLT, as Trull suggested, as the best solution.
Or you might look at using a StringBuilder and regex matching to remove the items.
You could look at walking through the document and working with nodes and parent nodes to effectively move the content from inside the node to its parent, but it would be tedious, and very unnecessary given the other potential solutions out there.
A lightweight solution would be to use XmlReader to go through the input document and XmlWriter to write the output.
Note: the XmlReader and XmlWriter classes are abstract; use the derived classes appropriate for your situation (XmlReader.Create and XmlWriter.Create return suitable concrete instances).
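A minimal sketch of the XmlReader/XmlWriter route, assuming the node types handled in the switch are enough (the XML declaration and DOCTYPE are deliberately not copied). It makes a single forward pass and writes every node except the start and end tags of the target element; with file streams instead of the strings used here, it would not need the whole document in memory. The class and method names are hypothetical. If the removed element is the root and has several children, the output has several roots and the writer will throw.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class StreamingTagStripper
{
    // Copies the input node by node, skipping only the start and end tags of
    // elements named tagName; everything between them is written as usual.
    public static string Strip(string xml, string tagName)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                bool isTag = reader.NodeType == XmlNodeType.Element
                          || reader.NodeType == XmlNodeType.EndElement;
                if (isTag && reader.LocalName == tagName)
                    continue; // drop the wrapper tags, keep their content

                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true); // copies the element's attributes
                        if (isEmpty) writer.WriteEndElement();
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    // XmlDeclaration and DocumentType are not copied in this sketch
                }
            }
        }
        return output.ToString();
    }
}
```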
Depending on how you manage your XML, you could use a regular expression to remove the tags.
Here's a simple console application that demonstrates the use of a regex:
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string content = File.ReadAllText(args[0]);
        // Matches both <RemovalTarget ...> and </RemovalTarget>
        Regex openTag = new Regex("<([/]?)RemovalTarget([^>]*)>", RegexOptions.Multiline);
        string cleanContent = openTag.Replace(content, string.Empty);
        File.WriteAllText(args[1], cleanContent);
    }
}
This leaves newline characters in the file, but it shouldn't be too difficult to augment the regular expression.
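One way to augment it, sketched under the assumption that each tag either sits alone on a line or shares a line with other text: first remove whole lines that contain only the tag (with their indentation and line break), then remove any remaining inline tags. The \b stops it from matching names like RemovalTargetX; like any regex over XML, it will still misfire on a > inside an attribute value or on tags inside comments or CDATA sections. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class RegexTagStripper
{
    // Tags that sit alone on a line: remove the whole line, including its
    // indentation and line break.
    private static readonly Regex OwnLineTag = new Regex(
        @"^[ \t]*</?RemovalTarget\b[^>]*>[ \t]*\r?\n", RegexOptions.Multiline);

    // Any remaining tags that share a line with other text.
    private static readonly Regex InlineTag = new Regex(@"</?RemovalTarget\b[^>]*>");

    public static string Strip(string content)
    {
        return InlineTag.Replace(OwnLineTag.Replace(content, string.Empty), string.Empty);
    }
}
```

The content keeps its original indentation, so "Content Here" stays indented one level deeper than before.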