Trouble with hexadecimal chars in XML data - c#

I am trying to create an XML document using LINQ.
XElement element = new XElement("ManufacturerName", supplierName);
XDocument doc = new XDocument(element);
doc.Save("Sample.xml");
The supplierName has a special character at the end whose hexadecimal value is 0x1f.
This prevents the document from being saved.
In this instance it is this value; in other cases it may be a different one.
So is there a way to remove any and all such special characters?
Thanks in advance.
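One possible approach, as a minimal sketch: filter the string before handing it to XElement, keeping only characters that XML 1.0 allows. XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair are real .NET methods (available since .NET 4.0); the XmlTextSanitizer class and RemoveInvalidXmlChars method are hypothetical names made up for this example.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlTextSanitizer
{
    // Hypothetical helper: returns the input without characters that
    // are not legal in an XML 1.0 document (for example 0x1F).
    public static string RemoveInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary legal character: keep it.
                sb.Append(c);
            }
            else if (i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Valid high/low surrogate pair: keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

It could then be used as new XElement("ManufacturerName", XmlTextSanitizer.RemoveInvalidXmlChars(supplierName)). Note that this silently discards data, so it is worth checking whether the stray characters point at a problem upstream.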

Related

C# XPathDocument parsing string to XML with BOM

In some C# code, I am parsing a string to XML using XPathDocument.
The string is retrieved from SDL Trados Studio and, depending on the XML that is being worked on (how it was originally created and loaded for translation), it sometimes has a BOM and sometimes does not.
Edit: The 'xml' is actually parsed from the segments of the source and target text and the structure element. The textual elements are escaped for xml and the markup and text is joined in one string. So if the markup has BOM in the xliff, then the string will have BOM.
I am trying to actually parse any of the xmls, independent of encoding. So at this point my solution is to remove the BOM with Substring.
Here is my code:
//Recreate XML files (extractor returns two string arrays)
string strSourceXML = String.Join("", extractor.TextSrc);
string strTargetXML = String.Join("", extractor.TextTgt);
//strip BOM
strSourceXML = strSourceXML.Substring(strSourceXML.IndexOf("<?"));
strTargetXML = strTargetXML.Substring(strTargetXML.IndexOf("<?"));
//Transform XML with the preview XSL
var xSourceDoc = new XPathDocument(strSourceXML);
var xTargetDoc = new XPathDocument(strTargetXML);
I have searched for a better solution, through several articles, such as these, but I found no better solution yet:
XML - Data At Root Level is Invalid
Parsing XML with C#
Parsing complex XML with C#
Parsing : String to XML
XmlReader breaks on UTF-8 BOM
Any advice to solve this more elegantly?
The constructor of XPathDocument taking a String argument https://msdn.microsoft.com/en-us/library/te0h7f95%28v=vs.110%29.aspx takes a URI with the XML file location. If you have a string with XML markup then use a StringReader over that string e.g.
XPathDocument xSourceDoc;
using (TextReader tr = new StringReader(strSourceXML))
{
xSourceDoc = new XPathDocument(tr);
}
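Combining that with the BOM concern from the question, a minimal sketch could trim a leading U+FEFF character and then load through a StringReader. The XPathDocumentLoader class and FromString method are hypothetical names; whether the trim is strictly needed for your input is an assumption, but it is harmless when no BOM is present.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class XPathDocumentLoader
{
    // Hypothetical helper: builds an XPathDocument from XML markup held
    // in a string, ignoring a leading byte order mark if there is one.
    public static XPathDocument FromString(string xml)
    {
        // TrimStart only removes U+FEFF at the very start of the string.
        string cleaned = xml.TrimStart('\uFEFF');

        // XPathDocument reads the whole input inside its constructor,
        // so the reader can be disposed right afterwards.
        using (TextReader reader = new StringReader(cleaned))
        {
            return new XPathDocument(reader);
        }
    }
}
```

Unlike Substring(IndexOf("<?")), this does not depend on the string containing an XML declaration.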

Load xml file with XmlDocument

I would like to load an XML file with an absolute path. I have tried doing this:
XmlDocument doc = new XmlDocument();
doc.Load(@"C:\Users\Accueil\Desktop\TestEDI\ARTest.xml");
But I get the error:
the character '<', hexadecimal value 0x3c, cannot be included in a name.
You will get this error if you have a use of < other than as the open tag of an xml element.
For example, <my<Element> could give you this error, because the parser finds the second < while it is expecting either part of the tag name for myElement or the closing tag >.
Another example would be that you wanted to use < in the body of some xml text:
<inequality>Here is an example of an inequality: x < 5</inequality>
The way to avoid this is to make sure that all non-opening-tag uses of '<' are encoded as proper XML entities; in this case, that would be &lt;
As Andy has said it looks as though you are using restricted characters in your xml file...
Taken from here...
This gives an error message:
<message>if salary < 1000 then</message>
This is fine:
<message>if salary &lt; 1000 then</message>
There are 5 pre-defined entity references in XML:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is a good habit to replace it.
So replace those illegal characters or consider using CData
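If the file is generated from code, a small sketch of both options using LINQ to XML: XElement escapes text content on its own, and XCData wraps it in a CDATA section. The EscapingDemo class and its method names are hypothetical, used only for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class EscapingDemo
{
    // Text passed as element content is escaped when serialized,
    // so '<' comes out as the &lt; entity.
    public static string Escaped()
    {
        return new XElement("message", "if salary < 1000 then").ToString();
    }

    // XCData keeps the raw text and wraps it in a CDATA section instead.
    public static string AsCData()
    {
        return new XElement("message",
            new XCData("if salary < 1000 then")).ToString();
    }
}
```

Either result can be loaded again with XmlDocument.LoadXml without the 0x3c error, since the bare '<' never appears as markup.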
Try this:
XmlReader xmlFile;
FileStream fs = new FileStream("/*YOUR XML FILE PATH*/.xml", FileMode.Open, FileAccess.Read, FileShare.ReadWrite); // Opens the .xml file for reading; FileShare.ReadWrite lets other processes keep it open for reading or writing
xmlFile = XmlReader.Create(fs, new XmlReaderSettings()); // Wraps the stream in an XmlReader with default settings
DataSet ds = new DataSet();
ds.ReadXml(xmlFile); // Uses your .xml file as a DataSet, which can then be used as a data source for something you need (e.g.: a DataGridView)

Avoid escaping &apos; entity with .NET XmlDocument class

How can I prevent the XmlDocument class from replacing the &apos; entity with the ' character?
For example if I have:
string xml = "<a> &apos; </a>";
After doing
var doc = new XmlDocument();
doc.LoadXml(xml);
string output = doc.OuterXml;
The value of output is
"<a>'</a>"
I need to avoid this because I must load an XML document, make some changes and sign it digitally, so the signed XML must be identical to the one that was loaded.
For your specific requirements, don't use XmlDocument or any other XML parser to parse the original document.
Do use XmlDocument or any other XML-specific classes to create your new document, except put a placeholder where the original document needs to go, like ORIGINAL_DOCUMENT_HERE. Then after you've generated the resulting text XML for your new document, replace ORIGINAL_DOCUMENT_HERE with your original received text, and then sign the result.
Not a normal way to work with XML, but should work for your specific use case.
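A minimal sketch of that placeholder idea, assuming the new document is a simple wrapper around the original text. The Envelope and Payload element names and the PlaceholderWrapper class are hypothetical; the sketch also assumes the original text has no XML declaration of its own, since a declaration is not allowed in the middle of a document.

```csharp
using System;
using System.Xml.Linq;

public static class PlaceholderWrapper
{
    private const string Placeholder = "ORIGINAL_DOCUMENT_HERE";

    // Builds the surrounding XML with a parser-aware API, then splices
    // the untouched original text in with a plain string replace.
    public static string Wrap(string originalXml)
    {
        var wrapper = new XElement("Envelope",
            new XElement("Payload", Placeholder));

        // DisableFormatting avoids adding indentation around the placeholder.
        string shell = wrapper.ToString(SaveOptions.DisableFormatting);

        return shell.Replace(Placeholder, originalXml);
    }
}
```

Because the original text never passes through a parser, entities such as &apos; stay exactly as received.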

How do I preserve whitespace characters when parsing XML from C# LINQ

What do I need to do in either my C# code or my XML document so that the XDocument parser reads literal whitespace for Values of XElements?
Background
I have an XML document, part of which looks like this:
<NewLineString>
</NewLineString>
<IndentString>    </IndentString>
I'm adding the values of each XElement to a data dictionary using a LINQ query; the .ForEach part looks like this:
.ForEach(x => SchemaDictionary.Add(
LogicHelper.GetEnumValue(x.Name.ToString()), x.Value));
To test to see if the whitespace values were preserved, I'm printing out a line of the character numbers of each value item in the data dictionary. In the following code, x represents a KeyValuePair and the Aggregate is simply making a string of the character integer values:
x.Value.ToCharArray()
.Aggregate<char,string>("",(word,c) => word + ((int)c).ToString() + " " )
));
I expected to see 10 13 for the <NewLineString> value and 32 32 32 32 for the <IndentString> value. However, nothing was printed for each value (note: other escaped values in the XML such as &lt; printed their character numbers correctly).
What do I need to do in either my C# code or my XML document so that my parser adds the complete whitespace string to the Data Dictionary?
Try loading your XDocument with the LoadOptions.PreserveWhitespace
Try loading your document this way.
XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.Load("book.xml");
or just modify your input xml to:
<NewLineString>
</NewLineString>
<IndentString xml:space="preserve"> </IndentString>
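For the XDocument route mentioned first, a minimal sketch of passing LoadOptions.PreserveWhitespace to XDocument.Parse (XDocument.Load accepts the same option). The WhitespaceDemo class, the Root wrapper element and the inline sample XML are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class WhitespaceDemo
{
    // Parses a small sample and returns the character codes of the
    // IndentString value, separated by spaces.
    public static string IndentCharCodes()
    {
        string xml =
            "<Root><NewLineString>\r\n</NewLineString>" +
            "<IndentString>    </IndentString></Root>";

        // Without PreserveWhitespace, whitespace-only text is discarded.
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);

        string indent = doc.Root.Element("IndentString").Value;
        return string.Join(" ", indent.Select(c => ((int)c).ToString()));
    }
}
```

One thing to keep in mind: XML parsers normalize line endings, so a CR LF pair in the file may not come back as the two characters 13 and 10.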

Create XML using Linq to XML and arrays

I am using LINQ to XML to create XML that is sent to a third party. I am having difficulty understanding how to create the XML using LINQ when part of the information I want to send in the XML is dynamic.
The dynamic part of the XML is held in a string[,] array. Each row of this two-dimensional array holds 2 values.
I can build the dynamic XML using a StringBuilder and store the values that were in the array in a string variable, but when I try to include this variable in the LINQ expression, it is HTML-encoded rather than included as proper XML.
How would I go about adding in my dynamically built string to the XML being built up by Linq?
For Example:
//string below contains values passed into my class
string[,] AccessoriesSelected;
//I loop through the above array and build up my 'Tag' and store in string called AccessoriesXML
//simple linq to xml example with my AccessoriesXML value passed into it
XDocument RequestDoc = new XDocument(
new XElement("MainTag",
new XAttribute("Innervalue", "2")
),
AccessoriesXML);
'Tag' is an optional extra; it might appear in my XML multiple times or not at all, depending on which checkboxes the user ticks.
Right now when I run my code I see this:
<MainTag> blah blah </MainTag>
&lt;Tag&gt;&lt;InnerTag&gt; option1="valuefromarray0" option2="valuefromarray1" /&gt;&lt;Tag/&gt;
I want to return something like this:
<MainTag> blah blah </MainTag>
<Tag><InnerTag option1="valuefromarray0" option2="valuefromarray1" /></Tag>
<Tag><InnerTag option1="valuefromarray0" option2="valuefromarray1" /></Tag>
Any thoughts or suggestions? I can get this working using XmlDocument but I would like to get this working with Linq if it is possible.
Thanks for your help,
Rich
Building XElements with the ("name", "value") constructor will use the value text as literal text - and escape it if necessary to achieve that.
If you want to create the XElement programmatically from a snippet of XML text that you want to actually be interpreted as XML, you should use XElement.Load(). This will parse the string as actual XML, instead of trying to assign the text of the string as an escaped literal value.
Try this:
XDocument RequestDoc = new XDocument(
new XElement("MainTag",
new XAttribute("Innervalue", "2")
),
XElement.Load(new StringReader(AccessoriesXML)));
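An alternative that skips the intermediate string altogether, as a rough sketch: build the Tag elements straight from the string[,] array with a LINQ projection. An XDocument can hold only one root element, so the sketch puts MainTag and the Tag elements under a hypothetical Request root; that element name and the RequestBuilder class are assumptions, not part of the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RequestBuilder
{
    // Each row of accessoriesSelected is assumed to hold the two values
    // that become option1 and option2 on one InnerTag element.
    public static XDocument Build(string[,] accessoriesSelected)
    {
        var tags = Enumerable.Range(0, accessoriesSelected.GetLength(0))
            .Select(i => new XElement("Tag",
                new XElement("InnerTag",
                    new XAttribute("option1", accessoriesSelected[i, 0]),
                    new XAttribute("option2", accessoriesSelected[i, 1]))));

        return new XDocument(
            new XElement("Request", // hypothetical single root element
                new XElement("MainTag",
                    new XAttribute("Innervalue", "2")),
                tags)); // an empty sequence simply adds no Tag elements
    }
}
```

With no rows in the array the Tag elements are simply absent, which matches the optional nature of 'Tag' described in the question.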
