Do not serialize empty strings to XML

I have a library project in which objects are serialized to XML so that users of an ASP.NET application can download them. I used an XSD to generate the types for serialization. The number of types is very large, and each type is serialized to its own XML document. Some types have string properties, and sometimes those properties contain empty strings. During serialization those properties are written like this:
<propertyName />
These elements then fail XSD validation (they are not required, but they have restrictions such as a minimum string length).
Is there any way to configure XmlSerializer so that it does not serialize empty strings as empty XML elements, for all of the types being serialized?
For serialization I use System.Xml.Serialization.XmlSerializer.

You would need to implement your own XML writer (and possibly reader) for the serialization to work.
The writer would need a conditional: before writing a new XML element and its value, check whether the property is an empty string.
if (string.IsNullOrEmpty(this.testString)) {
continue; // inside a loop over the properties: skip this one. This is just an example;
// the rest of the XmlWriter implementation would be normal.
// Note: the reader might also need a slightly different implementation; I am unsure of that.
}
Reference material:
http://forum.codecall.net/topic/58239-c-tutorial-reading-and-writing-xml-files/
http://www.dotnetperls.com/xmlwriter
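
As a rough illustration of the conditional check described in this answer, here is a minimal sketch that writes name/value pairs with XmlWriter and skips null or empty values. The class name ConditionalXmlWriter and the idea of passing the properties as key/value pairs are assumptions made for the example; they are not part of the question's code or of XmlSerializer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: writes one element per non-empty property.
public static class ConditionalXmlWriter
{
    public static string Write(string rootName,
                               IEnumerable<KeyValuePair<string, string>> properties)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var output = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(output, settings))
            {
                writer.WriteStartElement(rootName);
                foreach (KeyValuePair<string, string> property in properties)
                {
                    // Skip null or empty values so no empty element is emitted.
                    if (string.IsNullOrEmpty(property.Value))
                    {
                        continue;
                    }
                    writer.WriteElementString(property.Key, property.Value);
                }
                writer.WriteEndElement();
            }
            // The writer is disposed (and flushed) before the text is read.
            return output.ToString();
        }
    }
}
```

Calling Write("root", ...) with the pairs ("a", "1") and ("b", "") should produce a root element that contains only the a element.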

I would advise you to go back and read the XML specification carefully. See http://www.w3.org/TR/REC-xml/#sec-starttags
where it says:
[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or
an empty-element tag. [Definition: An empty-element tag takes a special form:]
So this:
<propertyName />
is exactly equivalent to this:
<propertyName></propertyName>
...and any XML processor that treats them differently is not conforming to the specification.
I find that people often confuse the following concepts when dealing with XML and XML Schema:
Tag with empty content:
Either form is acceptable. Empty is not the same as 'null' or 'nil'.
An empty or nil element still counts as an occurrence, so it satisfies minOccurs=1 (whether empty content is valid depends on the element's type, and nil requires nillable="true").
Null value / nil value:
Not the same as empty content. XML Schema defines a specific attribute, xsi:nil, to indicate that the value is 'nil'.
Missing tag:
The tag is entirely omitted from the XML document. Not the same as empty or nil.
This will trigger a validation error if minOccurs=1.
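
To make the three cases concrete on the .NET side, the following sketch shows how XmlSerializer writes an empty string, a null string marked as nillable, and a plain null string. The type ThreeStates and its field names are hypothetical and exist only for this illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type with one field per case.
public class ThreeStates
{
    // Empty content: written as an empty element.
    public string Empty = "";

    // Nil: a null value on a nillable element is written with xsi:nil="true".
    [XmlElement(IsNullable = true)]
    public string Nil = null;

    // Missing: a null value on a non-nillable element is omitted entirely.
    public string Missing = null;
}

public static class ThreeStatesDemo
{
    public static string Serialize()
    {
        var serializer = new XmlSerializer(typeof(ThreeStates));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, new ThreeStates());
            return writer.ToString();
        }
    }
}
```

In the resulting text the Empty element appears with no content, the Nil element carries the xsi:nil attribute, and there is no Missing element at all.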

If you are fetching the data from a database, you can apply a condition like this:
if (string.IsNullOrEmpty(Dbobject.propertyName)) {
XMLObject.propertyName = null;
} else {
XMLObject.propertyName = Dbobject.propertyName;
}
Null string values are not serialized (unless the element is marked nullable, e.g. with [XmlElement(IsNullable = true)]), so the element is skipped during XML serialization.
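
Since the question mentions a large number of generated types, the same idea could be applied generically with reflection rather than property by property. The sketch below is only one possible approach under stated assumptions: the Customer type and the EmptyStringCleaner helper are hypothetical, and the helper handles only top-level public string properties (nested objects and collections are not visited).

```csharp
using System.IO;
using System.Reflection;
using System.Xml.Serialization;

// Hypothetical serializable type standing in for an XSD-generated class.
public class Customer
{
    public string Name { get; set; }
    public string Nickname { get; set; }
}

public static class EmptyStringCleaner
{
    // Sets every public read/write string property that holds "" to null,
    // so that XmlSerializer omits the corresponding element.
    public static void NullOutEmptyStrings(object target)
    {
        foreach (PropertyInfo property in target.GetType().GetProperties())
        {
            bool isPlainString = property.PropertyType == typeof(string)
                && property.CanRead
                && property.CanWrite
                && property.GetIndexParameters().Length == 0;
            if (isPlainString && (string)property.GetValue(target, null) == "")
            {
                property.SetValue(target, null, null);
            }
        }
    }

    public static string Serialize(object target)
    {
        var serializer = new XmlSerializer(target.GetType());
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, target);
            return writer.ToString();
        }
    }
}
```

Calling NullOutEmptyStrings on an object just before Serialize should leave non-empty properties untouched and drop the elements whose values were empty strings.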

Related

Handling \x01 received from Flash's ExternalInterface

I'm receiving data from a Flash component embedded in a Windows Form. Unfortunately, if the data returned from the socket contains any of the following characters, the call to loadXml below fails:
This is the callback method I have to receive data from the socket (via ExternalInterface in the Flash component).
private void player_FlashCall(object sender, _IShockwaveFlashEvents_FlashCallEvent e)
{
String output = e.request;
//output = CleanInvalidXmlChars(output);
XmlDocument document = new XmlDocument();
document.LoadXml(output);
XmlAttributeCollection attributes = document.FirstChild.Attributes;
String command = attributes.Item(0).InnerText;
XmlNodeList list = document.GetElementsByTagName("arguments");
process(list[0].InnerText);
}
I had a method to replace the characters with text (CleanInvalidXmlChars), but I don't think this is the right approach.
How can I load this data as XML? Doing so makes it very easy to separate the method name, parameter names and parameter types that are returned.
Would appreciate any help at all.
Thanks.

If the “XML” contains any U+0001 (aka '\x01') or similar characters, it is not valid XML. There is no way to include those characters in XML (well, in XML 1.0, anyway). See the XML specification. If you need to pass e.g. binary data in XML, you need to convert it to a proper form, e.g. using Base64.
If the data does contain those invalid characters, it is not XML, and therefore cannot be read using standard XML tools (I don’t think any of the standard .NET classes allows you to override that behavior). One option is to replace all those characters prior to use, as you tried. They are basically all control characters (U+0000 through U+001F) except U+0009 (tab), U+000A (LF) and U+000D (CR), plus the noncharacters U+FFFE and U+FFFF. You could devise a safe transformation that does not lose any data: first replace every # character with #0023 (its own code), then replace each invalid character with #xxxx, where xxxx is its hexadecimal code, and after parsing the XML, replace every #xxxx back.
Another option is to drop the XML idea and just process the data as a string. Just for inspiration, see e.g. this piece of code.
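
The reversible transformation described above could be sketched as follows. This is an assumption-laden illustration, not a tested production routine: the class name XmlCharEscaper is made up, '#' is escaped as its own code point (#0023), lone surrogates are not handled, and the input is assumed to contain no numeric character references such as &#x20; (escaping their '#' would break them).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper implementing the #xxxx escaping scheme.
public static class XmlCharEscaper
{
    // Characters covered here: C0 controls other than tab, LF and CR,
    // plus the noncharacters U+FFFE and U+FFFF.
    private static bool IsInvalidXmlChar(char c)
    {
        return (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
            || c == '\uFFFE'
            || c == '\uFFFF';
    }

    // Replaces '#' and every invalid character with '#' + four hex digits.
    public static string Escape(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == '#' || IsInvalidXmlChar(c))
            {
                result.Append('#').Append(((int)c).ToString("X4"));
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }

    // Reverses Escape. Every '#' in escaped text starts a #xxxx sequence,
    // so a single left-to-right pass restores the original text.
    public static string Unescape(string input)
    {
        return Regex.Replace(
            input,
            "#([0-9A-F]{4})",
            match => ((char)Convert.ToInt32(match.Groups[1].Value, 16)).ToString());
    }
}
```

The intended use would be to call Escape on the raw string before LoadXml and Unescape on each value read from the parsed document.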

how to read xml fragment and validate schema

How can I parse an XML document in C# and validate a fragment of it when the document as a whole is not valid, ignoring the binary data at the end? Is it possible to parse only the XML elements enclosed by the root element and ignore the binary data?

You can use the XDocument validation methods (the Validate extension methods in System.Xml.Schema) to validate the document as a whole. As long as you use the overload that embeds the validation information in the XDocument, you can then go back over specific elements and check their validity.
Sorry I don't have any code to hand for this at the moment...
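
A minimal sketch of that approach, assuming the schema and the document are available as strings: Validate is called with addSchemaInfo set to true, and GetSchemaInfo is then queried per element. The class name FragmentValidation and the method InvalidElementNames are hypothetical, and the sketch does not address stripping binary data that trails the root element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper: lists the local names of elements flagged invalid.
public static class FragmentValidation
{
    public static List<string> InvalidElementNames(string xml, string xsd)
    {
        var schemas = new XmlSchemaSet();
        // Passing null uses the target namespace declared in the schema itself.
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));

        XDocument document = XDocument.Parse(xml);

        // The empty handler keeps validation going after the first error;
        // addSchemaInfo = true annotates each element with its validity.
        document.Validate(schemas, (sender, e) => { }, true);

        return document.Descendants()
            .Where(element =>
            {
                IXmlSchemaInfo info = element.GetSchemaInfo();
                return info != null && info.Validity == XmlSchemaValidity.Invalid;
            })
            .Select(element => element.Name.LocalName)
            .ToList();
    }
}
```

Elements whose validity is Valid or NotKnown are left out of the result, so the returned list points at the parts of the document that failed validation.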

Is there a XmlReader method similar to the XmlWriter.WriteRaw method?

XmlWriter.WriteRaw will preserve &apos; and not send an actual apostrophe. Is there a method to read in &apos; and keep it as such?

You need to encode it properly. Let's take for example the following XML:
<root>&apos;</root>
The value of the <root> node is ' no matter which XML parser you use to read this XML.
On the other hand if you have the following XML:
<root>&amp;apos;</root>
the value of the <root> node is &apos;.
So in both cases we have properly encoded XML so that when a standard compliant parser reads it, it is able to correctly retrieve the value.
So be very careful when using the WriteRaw method to generate XML. Since it does not encode its argument, it is your responsibility to ensure that you pass correctly encoded data to it.
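
The two cases can be checked with a small sketch that reads the text of the root element through XmlReader. The helper name EntityReadDemo is an assumption made for the example.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: returns the decoded text content of the root element.
public static class EntityReadDemo
{
    public static string ReadRootText(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                      // position on the root element
            return reader.ReadElementContentAsString();  // entity references are decoded
        }
    }
}
```

With this helper, the markup containing the apostrophe entity yields a single apostrophe character, while the markup in which the ampersand itself is escaped yields the six-character text that spells out the entity.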

How do I stop XElement.Save from escaping characters?

I'm populating an XElement with information and writing it to an XML file using the XElement.Save(path) method. At some point, certain characters in the resulting file are being escaped - for example, > becomes &gt;.
This behaviour is unacceptable, since I need to store information in the XML that includes the > character as part of a password. How can I write the 'raw' content of my XElement object to XML without having these escaped?

Lack of this behavior is unacceptable.
A standalone unescaped > is invalid XML.
XElement is designed to produce valid XML.
If you want to get the unescaped content of the element, use the Value property.
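
A short sketch of what this means in practice: the escaping exists only in the serialized markup, and the Value property returns the original text after parsing. The class name PasswordRoundTrip and the element name password are made up for the illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper showing markup versus value.
public static class PasswordRoundTrip
{
    // Serialized form: special characters in the text are escaped.
    public static string Markup(string password)
    {
        return new XElement("password", password).ToString();
    }

    // Parsed form: Value returns the text with escapes resolved.
    public static string ReadBack(string markup)
    {
        return XElement.Parse(markup).Value;
    }
}
```

For a password such as a>b, Markup produces an element whose text contains the escaped form, and ReadBack on that markup returns a>b again, so nothing is lost by the escaping.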

The XML specification usually allows > to appear unescaped. XDocument plays it safe and escapes it although it appears in places where the escaping is not strictly required.
You can do a replace on the generated XML. Be aware per http://www.w3.org/TR/REC-xml#syntax, if this results in any ]]> sequences, the XML will not conform to the XML specification. Moreover, XDocument.Parse will actually reject such XML with the error "']]>' is not allowed in character data.".
XDocument doc = XDocument.Parse("<test>Test&gt;Data</test>");
// Don't use this if it could result in any ]]> sequences!
string s = doc.ToString().Replace("&gt;", ">");
System.IO.File.WriteAllText(@"c:\path\test.xml", s);
Considering that any spec-compliant XML parser must support &gt;, I'd highly recommend fixing the code that processes the XML output of your program.

How to change character encoding of XmlReader

I have a simple XmlReader:
XmlReader r = XmlReader.Create(fileName);
while (r.Read())
{
Console.WriteLine(r.Value);
}
The problem is, the Xml file has ISO-8859-9 characters in it, which makes XmlReader throw "Invalid character in the given encoding." exception. I can solve this problem with adding <?xml version="1.0" encoding="ISO-8859-9" ?> line in the beginning but I'd like to solve this in another way in case I can't modify the source file. How can I change the encoding of XmlReader?

To force .NET to read the file in as ISO-8859-9, just use one of the many XmlReader.Create overloads, e.g.
using(XmlReader r = XmlReader.Create(new StreamReader(fileName, Encoding.GetEncoding("ISO-8859-9")))) {
while(r.Read()) {
Console.WriteLine(r.Value);
}
}
However, that may not work because, IIRC, the W3C XML standard says something about when the XML declaration line has been read, a compliant parser should immediately switch to the encoding specified in the XML declaration regardless of what encoding it was using before. In your case, if the XML file has no XML declaration, the encoding will be UTF-8 and it will still fail. I may be talking nonsense here so try it and see. :-)
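
For anyone who wants to try this, here is a self-contained sketch of the StreamReader approach that takes the encoding name as a parameter. As far as I know, a reader created from a TextReader works on characters that have already been decoded, so the encoding passed to StreamReader is the one that takes effect. The class name EncodedXmlReader is hypothetical. On .NET Core and later, ISO-8859-9 is available only after registering CodePagesEncodingProvider from the System.Text.Encoding.CodePages package, whereas ISO-8859-1 is built in.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: concatenates all text nodes, decoding the bytes
// with the caller-supplied encoding instead of auto-detection.
public static class EncodedXmlReader
{
    public static string ReadAllText(Stream input, string encodingName)
    {
        var text = new StringBuilder();
        using (var decoded = new StreamReader(input, Encoding.GetEncoding(encodingName)))
        using (XmlReader reader = XmlReader.Create(decoded))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                {
                    text.Append(reader.Value);
                }
            }
        }
        return text.ToString();
    }
}
```

A file on disk can be passed in with File.OpenRead(fileName), together with "ISO-8859-9" as the encoding name.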

The XmlTextReader class (which is what the static Create method is actually returning, since XmlReader is the abstract base class) is designed to automatically detect encoding from the XML file itself - there's no way to set it manually.
Simply ensure that you include the following XML declaration in the file you are reading:
<?xml version="1.0" encoding="ISO-8859-9"?>

If you can't ensure that the input file has the right header, you could look at one of the other 11 overloads to the XmlReader.Create method.
Some of these take an XmlReaderSettings variable or XmlParserContext variable, or both. I haven't investigated these, but there is a possibility that setting the appropriate values might help here.
There is the XmlReaderSettings.CheckCharacters property - the help for this states:
Instructs the reader to check characters and throw an exception if any characters are outside the range of legal XML characters. Character checking includes checking for illegal characters in the document, as well as checking the validity of XML names (for example, an XML name may not start with a numeral).
So setting this to false might help. However, the help also states:
If the XmlReader is processing text data, it always checks that the XML names and text content are valid, regardless of the property setting. Setting CheckCharacters to false turns off character checking for character entity references.
So further investigation is warranted.

Use an XmlTextReader instead of an XmlReader:
System.Text.Encoding.UTF8.GetString(YourXmlTextReader.Encoding.GetBytes(YourXmlTextReader.Value))
