spaces in end tags - c#

I have a problem loading XML in C#.
XmlDocument doc = new XmlDocument();
string xmlText = File.ReadAllText("D:\\webservice_aspnet\\novo2.xml");
doc.PreserveWhitespace = true;
doc.LoadXml(xmlText);
The code above loads the file.
Original file:
<?xml version="1.0" encoding="UTF-8"?>
<teste>
<abc xmlns="xxx"/>
</teste>
When I read doc.InnerXml and write it to a new XML file, it looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<teste>
<abc xmlns="xxx" />
</teste>
See that a space was added here:
<abc xmlns="xxx" />
at the end of the tag. I know this does not alter the structure of the file; however, I have a validation algorithm for that file, and I cannot change anything or add a space.
I do not want to use a string replace to fix this, because the files are huge and could lose information.
Does anyone know how I can generate an identical file?

If you are reading XML using a parser (perhaps a home-brew parser) that can't handle all legal XML syntax, then you are storing up trouble, and your name will be cursed by anyone who inherits your code. Don't do it.
Don't try to fix the XML generation code to generate the subset of XML that your parser can handle. Fix your parser.
One way to fix your parser might be to add an XML canonicalization step as the first thing it does; canonicalization generates a well defined subset of XML that might (if you're lucky) correspond to the subset that your home-brew parser understands.
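A minimal sketch of such a canonicalization step in C#, assuming the System.Security.Cryptography.Xml types are available (built into .NET Framework, a separate package on newer .NET). The class and method names here are hypothetical. Canonical XML writes every empty element as a start/end pair and drops the XML declaration, so both spellings of the empty tag end up identical:

```csharp
using System.IO;
using System.Security.Cryptography.Xml;
using System.Text;
using System.Xml;

public static class XmlCanonicalizer
{
    // Returns the Canonical XML 1.0 (C14N) form of the given document text.
    public static string Canonicalize(string xml)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(xml);

        var transform = new XmlDsigC14NTransform();
        transform.LoadInput(doc);

        // The transform exposes its canonical output as a UTF-8 stream.
        using (var stream = (Stream)transform.GetOutput(typeof(Stream)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If the validator compared canonical forms rather than raw text, the space before "/>" would no longer matter.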

Perhaps you could try doc.Load(file) instead of doc.LoadXml(file).

Please try just using the Replace function:
string yourXml = someXmlContent;
string result = yourXml.Replace(" />", "/>");
Hope this helps!

Related

How to parse an xml that has non-xml data in it

I am working with some xml in C# and am having some issues parsing an xml file due to the format it is in. It has non xml data in the file and I have no control over the format of this file. The file is "test.xml"(see below). I am only concerned with the xml portion of the data, but am unsure the best way to go about accessing it. Any thoughts or recommendations would be greatly appreciated.
Test data -1
Smith, 2234
##*j
Random--
#<?xml version="1.0" encoding="utf-16"?>
<ConfigMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.Test.com/schemas/Test.test.Config">
<Config>
<Version>10</Version>
<Build>00520</Build>
<EnableV>false</EnableV>
<BuildL>22</BuildL>
<BuildP>\\testpath\test</BuildP>
</Config>
</ConfigMessage>
#
Read the whole file into a string and keep everything between the first '<' and the last '>' characters found in the file. Then you can treat it as normal XML from there. If there are random non-XML elements throughout it, though, you will need to add additional logic to detect where each XML "block" starts and stops.
I can suggest this solution: open your pseudo-XML as a plain text file, read the whole text, then use a regex to extract the XML document (the part of the original document that can be converted to XML: [|startTag|any symbols|/endTag|]), load it into an XDocument (in memory), and parse it like a normal XML file.
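The first suggestion could be sketched roughly like this. The class and method names are hypothetical, and the sketch assumes the file holds exactly one XML block. Because the text is parsed from a string, the encoding="utf-16" in the declaration is not checked against the file's actual encoding:

```csharp
using System;
using System.Xml.Linq;

public static class EmbeddedXml
{
    // Cuts out everything from the first '<' to the last '>' and parses it.
    public static XDocument Extract(string text)
    {
        int start = text.IndexOf('<');
        int end = text.LastIndexOf('>');
        if (start < 0 || end <= start)
        {
            throw new FormatException("No XML block found.");
        }
        return XDocument.Parse(text.Substring(start, end - start + 1));
    }
}
```

The elements in the sample file live in a default namespace, so lookups need it, for example:

```csharp
XNamespace ns = "http://www.Test.com/schemas/Test.test.Config";
XDocument doc = EmbeddedXml.Extract(System.IO.File.ReadAllText("test.xml"));
string version = doc.Root.Element(ns + "Config").Element(ns + "Version").Value;
```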

AngleSharp and XHTML round-trip

I'm trying to parse an XHTML file using AngleSharp, make a change, then output it. However, I'm having some issues getting the output to match the input.
If I use the XML parser and either the XmlMarkupFormatter or the HtmlMarkupFormatter, I get no self-closing tags (all are <img></img>) and no XML declaration.
If I use the HTML parser and the HtmlMarkupFormatter, I get tags that are not valid XML (all are simply <img>) and no XML declaration.
If I use the HTML parser and the XmlMarkupFormatter, I get nice self-closing tags (<img />) and the XML declaration. However, the XML declaration is picked up as a comment and output as <!-- <?xml version="1.0" encoding="UTF-8"?> -->
Is there a way around this or do I need to write my own MarkupFormatter?
Simple answer: It sounds like you need to provide your own MarkupFormatter.
There has been some effort to come up with an XhtmlMarkupFormatter, but this component has unfortunately not been realized so far. I imagine such a component could combine the serialization logic of both the existing HTML formatter and the available XML formatter.
Maybe this issue on the AngleSharp repo helps you.

How to embed xml in xml

I need to embed an entire well-formed xml document within another xml document. However, I would rather avoid CDATA (personal distaste) and also I would like to avoid the parser that will receive the whole document from wasting time parsing the embedded xml. The embedded xml could be quite significant, and I would like the code that will receive the whole file to treat the embedded xml as arbitrary data.
The idea that immediately came to mind is to encode the embedded xml in base64, or to zip it. Does this sound ok?
I'm coding in C# by the way.
You could convert the XML to a byte array, then convert it to base64 format. That will allow you to nest it in an element, and not have to use CDATA.
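A minimal sketch of the base64 route, assuming UTF-8 as the byte encoding; the class, method, and element names are hypothetical:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class Base64Embedding
{
    // Wraps the inner document as opaque base64 text inside one element.
    public static XElement Wrap(string elementName, string innerXml)
    {
        string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(innerXml));
        return new XElement(elementName, encoded);
    }

    // Recovers the original document text from the wrapper element.
    public static string Unwrap(XElement element)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(element.Value));
    }
}
```

The receiving parser only sees a single text node, so it never parses the embedded markup. The cost is that base64 grows the payload by about a third.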
The W3C-approved way of doing this is XInclude. There is an implementation for .Net at http://mvp-xml.sourceforge.net/xinclude/
Just a quick note: I have gone the base64 route and it works just fine, but it does come with a stiff performance penalty, especially under heavy usage. We do this with document fragments up to 20MB, and after base64 encoding they can take upwards of 65MB (with tags and data), even with zipping.
However, the bigger issue is that .NET base64 encoding can consume up to 10x the memory when performing the encoding/decoding, and can frequently cause OOM exceptions if done repeatedly and/or on multiple threads.
Someone on a similar question recommended ProtoBuf as an option, and Fast Infoset as another.
Depending on how you construct the XML, one way is to not care about it and let the framework handle it.
XmlDocument doc = new XmlDocument();
doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-8\" ?><helloworld></helloworld>");
string xml = "<how><are><you reply=\"i am fine\">really</you></are></how>";
doc.GetElementsByTagName("helloworld")[0].InnerText = xml;
The output will be something like an HTML-encoded string:
<?xml version="1.0" encoding="utf-8"?>
<helloworld>&lt;how&gt;&lt;are&gt;&lt;you
reply="i am fine"&gt;really&lt;/you&gt;&lt;/are&gt;&lt;/how&gt;
</helloworld>
I would encode it in your favorite way (e.g. base64 or HttpServerUtility::UrlEncode, ...) and then embed it.
If you don't need the XML declaration (the first line of the document), just insert the root element (with all its children) into the tree of the other XML document as a child of an existing element. Use a different namespace to separate the inserted elements.
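With XmlDocument, that insertion could look roughly like the sketch below. The class and method names and the XPath parameter are hypothetical. Nodes belong to the document that created them, so the root has to be imported into the host document before it can be appended:

```csharp
using System;
using System.Xml;

public static class XmlEmbedder
{
    // Copies the root element of 'inner' under the host element selected by
    // 'parentXPath'. The inner document's XML declaration is not carried over.
    public static void EmbedRoot(XmlDocument host, string parentXPath, XmlDocument inner)
    {
        XmlNode parent = host.SelectSingleNode(parentXPath);
        if (parent == null)
        {
            throw new ArgumentException("Parent element not found: " + parentXPath);
        }
        XmlNode imported = host.ImportNode(inner.DocumentElement, true); // deep copy
        parent.AppendChild(imported);
    }
}
```

Note that this does make the receiving parser process the embedded markup, which the question wanted to avoid.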
It seems that serialization is the recommended method.
Can't you use XSLT for this? Perhaps using xsl:copy or xsl:copy-of? This is what XSLT is for.
I use comments for this:
<!-- your xml text -->
[EDITED]
If the embedded XML itself contains comments, replace them with a different syntax.
<?xml version="1.0" encoding="iso-8859-1" ?>
<xml>
<status code="0" msg="" cause="" />
<data>
<order type="07" user="none" attrib="..." >
<xmlembeded >
<!--
<?xml version="1.0" encoding="iso-8859-1" ?>
<xml>
<status ret="000 "/>
<data>
<allxml_here />
<!** embedeb comments **>
</data>
</xml>
-->
</xmlembeded >
</order>
<context sessionid="12345678" scriptname="/from/..." attrib="..." />
</data>
</xml>

XSLT: transfer xml with the closed tags

I'm using XSLT to transform an XML document into a differently formatted XML document. If an element has no data, it is output as a self-closing tag, e.g. <data />, but I want to output it with a closing tag, like this: <data></data>.
If I change the output method from "xml" to "html" then I can get the <data></data>, but I will lose the <?xml version="1.0" encoding="UTF-8"?> on the top of the document. Is this the correct way of doing this?
Many thanks.
Daoming
If you want this because you think that self closing tags are ugly, then get over it.
If you want to pass the output to some non-conformant XML parser that is under your control, then use a better parser, or fix the one you are using.
If it is out of your control, and you must send it to an inadequate XML Parser, then do you really need the prolog? If not, then html output method is fine.
If you do need the XML prolog, then you could use the html output method, and prepend the prolog after transformation, but before sending it to the deficient parser.
Alternatively, you could output it as XML with self-closing tags, and preprocess before sending it to your deficient parser with some kind of custom serialisation, using the DOM. If it can't handle self-closing tags, then I'm sure that isn't the only way in which it fails to parse XML. You might need to do something about namespaces, for example.
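If the transform output happens to be post-processed in .NET (an assumption; the question does not say which platform is used), the DOM preprocessing step could be sketched like this. XmlElement.IsEmpty controls whether an element without children is written in the short or the long form; the class and method names are hypothetical:

```csharp
using System.Linq;
using System.Xml;

public static class EmptyElementExpander
{
    // Rewrites childless elements so they serialize as <data></data>.
    public static string ExpandEmptyElements(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Snapshot the matches first, then flip the tag format on each one.
        var empties = doc.SelectNodes("//*[not(node())]").Cast<XmlElement>().ToList();
        foreach (XmlElement element in empties)
        {
            element.IsEmpty = false;
        }
        return doc.OuterXml;
    }
}
```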
You could try adding an empty text node to any empty elements that you are outputting. That might do the trick.
Self-closed and explicitly closed elements are exactly the same thing in any regard whatsoever.
Only if somewhere along your processing chain there is a tool that is not XML aware (code that does XML processing with regex, for example), it might make a difference. At which point you should think about changing that part of the processing, instead of the XML generation/serialization part.

Problem with node.GetElementsByTagName in C#

I have a really simple XML file that I'm trying to read, but I can't seem to get it working. Here is the XML file:
<?xml version="1.0"?> <Results><One>45364634</One><Two>-1</Two><Three>B</Three></Results>
I am trying to get the contents of Two like this:
XmlNode node = doc.DocumentElement.SelectSingleNode("/Results/Two");
or
XmlNodeList list = doc.GetElementsByTagName("Two");
Neither is working. When I copy paste the XML as a string into the XmlDocument, then it works. However, when I use the string I pull out of the response (where I'm getting the XML from), it doesn't work.
I'm wondering if it's something weird like a character issue or not looking at the correct root, but I can't figure it out. Any ideas?
Thanks!
Check the Xml file encoding ...
Is it ansi? utf-8 or utf-16?
Check if the xml was loaded from the file at all. Check if there is any error, see if the document was populated.
I think the document is not being populated when loading from the file.
By your use of the word "response", I am assuming you are passing the XML via HTTP? If so, try using HttpServerUtility.HtmlDecode( xml ) and see if that works.
Bleh.
Turns out I was returning an XML document within an XML document. That's why printing to the screen looked ok but I couldn't pull it out.
Thanks guys.
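For anyone hitting the same thing, a rough sketch of unwrapping a document that arrives as escaped text inside another document. The wrapper shape (a single root element holding the inner XML as text) is an assumption, and the class and method names are hypothetical:

```csharp
using System.Xml;

public static class NestedXml
{
    // Parses the outer response, then parses its text content as a second document.
    public static string ReadTwo(string response)
    {
        var outer = new XmlDocument();
        outer.LoadXml(response);

        // InnerText returns the unescaped text, i.e. the inner XML markup.
        // Trim it because an XML declaration must be the very first thing parsed.
        var inner = new XmlDocument();
        inner.LoadXml(outer.DocumentElement.InnerText.Trim());

        XmlNode two = inner.SelectSingleNode("/Results/Two");
        return two == null ? null : two.InnerText;
    }
}
```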
