C# validating XML with XSD without loading unvalidated XSD?

A security scan of our C# source reported "Missing XML Validation" as a possible injection flaw. It cited https://cwe.mitre.org/data/definitions/112.html and other sources.
Its recommendation was:
Always enable validation when you parse XML. If enabling validation
causes problems because the rules for defining a well-formed document
are Byzantine or altogether unknown, chances are good that there are
security errors nearby.
Example: The following code demonstrates how to enable validation when using XmlReader.
XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add(schema);
settings.ValidationType = ValidationType.Schema;
StringReader sr = new StringReader(xmlDoc);
XmlReader reader = XmlReader.Create(sr, settings);
I have an XSD schema available for validation. My question is, how do I load the XSD as an XmlSchema without duplicating the error of loading an XML file without validation?
If I read the XSD from the file system, I think I am just duplicating the same error (reading XML without validation). Is there a recommended way to do this?
Our first approach was to read the XSD from the file system, like:
XmlTextReader xsdReader = new XmlTextReader("MySchema.xsd");
XmlSchema schema = XmlSchema.Read(xsdReader, ValidationCallback);
But, I believe this causes the same error, reading the XML (in this case the XSD) without validation.
The approach that we are using now (that I think will pass the security scan) is to load the XSD from an embedded resource.
Stream xsdStream = Assembly.GetAssembly(typeof(MyType))
.GetManifestResourceStream("MyNamespace.MySchema.xsd");
if (xsdStream == null) throw ...
XmlSchema schema = XmlSchema.Read(xsdStream, ValidationCallback);
We have not rescanned yet, but I suspect the embedded resource approach will pass. Still, is there a recommended or best-practice approach to this?

Anyone who can write the phrase "If enabling validation causes problems because the rules for defining a well-formed document are Byzantine" is revealing that they know very little about XML; it seems they don't understand the difference between being valid and being well-formed, which is pretty fundamental. So you're having to find ways of getting around rules that aren't very smart. At this point you have to decide whether your objective is to make the system more secure, or to pass the security tests.
It's very hard to see what security vulnerabilities will be fixed by enabling validation.
Especially as you can write a schema that accepts any document as valid, and I bet your security tool will be happily content that you are obeying the rules even though you haven't increased security one iota.
When a schema processor loads a schema then it automatically validates that it is a valid schema. So there really isn't any risk. But whether your security scanner accepts that there isn't any risk is another matter entirely.
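A minimal sketch of the point above, assuming the XSD comes from a trusted source such as an embedded resource: adding it to an XmlSchemaSet and calling Compile() makes the schema processor check the schema itself, and problems are reported through the ValidationEventHandler. The inline schema, the SchemaLoader class and its method names are hypothetical. The hardened reader settings (DtdProcessing.Prohibit, XmlResolver = null) are an extra precaution that is not taken from the posts above, and whether a given scanner accepts this is not something the sketch can promise.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaLoader
{
    // Hypothetical schema, inlined only to keep the example self-contained.
    public const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='order'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='id' type='xs:int'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Reads and compiles the XSD; schema errors are collected in 'problems'.
    public static XmlSchemaSet LoadSchema(string xsd, List<string> problems)
    {
        var readerSettings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // no DTDs in the schema file
            XmlResolver = null                      // no external lookups
        };
        var set = new XmlSchemaSet { XmlResolver = null };
        set.ValidationEventHandler += (s, e) => problems.Add(e.Message);
        using (XmlReader reader = XmlReader.Create(new StringReader(xsd), readerSettings))
        {
            set.Add(null, reader); // null: use the schema's own targetNamespace
        }
        set.Compile();
        return set;
    }

    // Validates an instance document and returns the validation messages.
    public static List<string> Validate(string xml, XmlSchemaSet schemas)
    {
        var problems = new List<string>();
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas,
            DtdProcessing = DtdProcessing.Prohibit,
            XmlResolver = null
        };
        settings.ValidationEventHandler += (s, e) => problems.Add(e.Message);
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens while reading
        }
        return problems;
    }

    public static void Main()
    {
        var schemaProblems = new List<string>();
        XmlSchemaSet schemas = LoadSchema(Xsd, schemaProblems);
        Console.WriteLine("Schema problems: " + schemaProblems.Count);
        Console.WriteLine("Document problems: " + Validate("<order><id>x</id></order>", schemas).Count);
    }
}
```

In real code the string passed to LoadSchema would be replaced by the stream from GetManifestResourceStream shown in the question.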

Related

Validate XML without XSD

I have an XML file coming in, and it must contain a few specific tags; without them I cannot process the file. How can I check whether those tags are present? I tried XSD validation, but the file format keeps changing and the sender keeps adding tags that I do not need. Those additional tags do not harm my process.
Is there a way to write the XSD in a way that it only looks for a few tags and ignore the others?

You can create an xsd in which you have all of the elements you require. By default an element has minOccurs=1, which would imply that it's required. Then in order to ignore all of the rest you need to add <xs:any processContents="lax" maxOccurs="unbounded"/>, which basically says that the xml may contain any number of additional elements which do not need to be validated.
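A small sketch of that idea, with hypothetical element names (message, id, amount). One assumption differs from the snippet above: the wildcard here also carries minOccurs="0", because xs:any defaults to minOccurs=1 and would otherwise demand at least one extra element.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class LaxWildcardDemo
{
    // Two required elements followed by a lax wildcard for anything else.
    public const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='message'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='id' type='xs:string'/>
        <xs:element name='amount' type='xs:decimal'/>
        <xs:any processContents='lax' minOccurs='0' maxOccurs='unbounded'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns the number of validation errors for the given document.
    public static int CountErrors(string xml)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        int errors = 0;
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        settings.ValidationEventHandler += (s, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        // Extra tags after the required ones are accepted.
        Console.WriteLine(CountErrors(
            "<message><id>a</id><amount>1.5</amount><extra>x</extra><more/></message>"));
        // A missing required tag is still reported.
        Console.WriteLine(CountErrors("<message><id>a</id><extra/></message>"));
    }
}
```

Because the content model is an xs:sequence, the extra elements are only accepted after the required ones. If the sender interleaves them, this schema will reject the document.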

Consider forgoing an XSD and instead writing XPath checks against the XML to test known-invariant properties of your XSD. XSD is better for when you have a known, relatively static grammar. Ad hoc XPath assertions or Schematron would be better for XML that can't be held to a definitive grammar.
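As a rough illustration of the XPath approach, the sketch below checks a list of required paths and reports the ones that are missing. The RequiredTagCheck class and the paths /message/id and /message/amount are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

public static class RequiredTagCheck
{
    // Returns every XPath from 'requiredPaths' that matches no element.
    public static List<string> MissingPaths(string xml, IEnumerable<string> requiredPaths)
    {
        XDocument doc = XDocument.Parse(xml);
        var missing = new List<string>();
        foreach (string path in requiredPaths)
        {
            if (doc.XPathSelectElement(path) == null)
            {
                missing.Add(path);
            }
        }
        return missing;
    }

    public static void Main()
    {
        var required = new[] { "/message/id", "/message/amount" };
        List<string> missing = MissingPaths("<message><id>a</id><extra/></message>", required);
        Console.WriteLine(string.Join(", ", missing));
    }
}
```

Unknown tags are simply never looked at, and the order of elements does not matter, which suits a format that keeps changing.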

What sort of problems yield validation warnings when parsing a XML document using a XSD schema

I am trying to reacquaint myself with XML documents and schemas (XSD), but I am having trouble understanding what sort of XML document issues would yield a XmlSeverityType.Warning.
Taking this as an example, what would I need to change in the XML document (or XSD schema) to force a warning to appear?

As @Evk commented, the documentation for XmlSeverityType.Warning states:
Warning: Indicates that a validation event occurred that is not an error. A warning is typically issued when there is no DTD, or XML Schema to validate a particular element or attribute against. Unlike errors, warnings do not throw an exception if there is no validation event handler.
Therefore, it's not so much what to change in your XML or XSD but what you might change to cause your XSD not to be found for your XML, i.e. perturbing (or deleting) the following line in your source:
settings.Schemas.Add("http://www.contoso.com/books", "contosoBooks.xsd");
It's possible that the .NET XSD processor has opted to present additional diagnostics beyond those required by the W3C XSD Recommendation or W3C XML Recommendation (see also valid vs well-formed), but the only case mentioned in the documentation pertains to a validation request that cannot be fulfilled due to a missing schema.
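To see the missing-schema case in isolation, here is a sketch that requests schema validation without adding any schema at all. It assumes the ReportValidationWarnings flag is needed for warnings to reach the handler, since that flag is not part of the default ValidationFlags. The WarningDemo class is hypothetical, and the sample document only borrows the namespace from the line quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class WarningDemo
{
    // Validates with an empty schema set and records the severity of each event.
    public static List<XmlSeverityType> Severities(string xml)
    {
        var seen = new List<XmlSeverityType>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        // Warnings are only raised when this flag is set.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (s, e) => seen.Add(e.Severity);
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return seen;
    }

    public static void Main()
    {
        string xml = "<bookstore xmlns='http://www.contoso.com/books'><book/></bookstore>";
        foreach (XmlSeverityType severity in Severities(xml))
        {
            Console.WriteLine(severity);
        }
    }
}
```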

Is there a way to use the new Linq to Xml classes but still validate the XML file against a schema file?

Bit stuck trying to use the new lightweight Linq/XML classes in .NET. I can find 'old' code that uses the System.Xml.XmlReader and will load an XML file and allow validation against an XSD schema file, but I can't find any code that shows how to get the newer Linq/XML classes to do the same validation. Seem to be missing a key Lego brick here somewhere.
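One possible piece, sketched under the assumption that the document is already loaded as an XDocument: the System.Xml.Schema namespace provides a Validate extension method for XDocument that takes an XmlSchemaSet and a ValidationEventHandler. The schema, the LinqValidationDemo class and the element names below are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class LinqValidationDemo
{
    // Hypothetical schema: a <person> element with one integer <age> child.
    public const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='person'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='age' type='xs:int'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates an XDocument in memory and returns the messages reported.
    public static List<string> Validate(XDocument doc)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        var messages = new List<string>();
        // Extension method from System.Xml.Schema.Extensions.
        doc.Validate(schemas, (sender, e) => messages.Add(e.Message));
        return messages;
    }

    public static void Main()
    {
        Console.WriteLine(Validate(XDocument.Parse("<person><age>42</age></person>")).Count);
        Console.WriteLine(Validate(XDocument.Parse("<person><age>old</age></person>")).Count);
    }
}
```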

DTD prohibited in xml document exception

I'm getting this error when trying to parse through an XML document in a C# application:
"For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method."
For reference, the exception occurred at the second line of the following code:
using (XmlReader reader = XmlReader.Create(uri))
{
reader.MoveToContent(); //here
while (reader.Read()) //(code to parse xml doc follows).
My knowledge of Xml is pretty limited and I have no idea what DTD processing is nor how to do what the error message suggests. Any help as to what may be causing this and how to fix it? thanks...

First, some background.
What is a DTD?
The document you are trying to parse contains a document type declaration; if you look at the document, you will find near the beginning a sequence of characters beginning with <!DOCTYPE and ending with the corresponding >. Such a declaration allows an XML processor to validate the document against a set of declarations which specify a set of elements and attributes and constrain what values or contents they can have.
Since entities are also declared in DTDs, a DTD allows a processor to know how to expand references to entities. (The entity pubdate might be defined to contain the publication date of a document, like "15 December 2012", and referred to several times in the document as &pubdate; -- since the actual date is given only once, in the entity declaration, this usage makes it easier to keep the various references to publication date in the document consistent with each other.)
What does a DTD mean?
The document type declaration has a purely declarative meaning: a schema for this document type, in the syntax defined in the XML spec, can be found at such and such a location.
Some software written by people with a weak grasp of XML fundamentals suffers from an elementary confusion about the meaning of the declaration; it assumes that the meaning of the document type declaration is not declarative (a schema is over there) but imperative (please validate this document). The parser you are using appears to be such a parser; it assumes that by handing it an XML document that has a document type declaration, you have requested a certain kind of processing. Its authors might benefit from a remedial course on how to accept run-time parameters from the user. (You see how hard it is for some people to understand declarative semantics: even the creators of some XML parsers sometimes fail to understand them and slip into imperative thinking instead. Sigh.)
What are these 'security reasons' they are talking about?
Some security-minded people have decided that DTD processing (validation, or entity expansion without validation) constitutes a security risk. Using entity expansion, it's easy to make a very small XML data stream which expands, when all entities are fully expanded, into a very large document. Search for information on what is called the "billion laughs attack" if you want to read more.
One obvious way to protect against the billion laughs attack is for those who invoke a parser on user-supplied or untrusted data to invoke the parser in an environment which limits the amount of memory or time the parsing process is allowed to consume. Such resource limits have been standard parts of operating systems since the mid-1960s. For reasons that remain obscure to me, however, some security-minded people believe that the correct answer is to run parsers on untrusted input without resource limits, in the apparent belief that this is safe as long as you make it impossible to validate the input against an agreed schema.
This is why your system is telling you that your data has a security issue.
To some people, the idea that DTDs are a security risk sounds more like paranoia than good sense, but I don't believe they are correct. Remember (a) that a healthy paranoia is what security experts need in life, and (b) that anyone really interested in security would insist on the resource limits in any case -- in the presence of resource limits on the parsing process, DTDs are harmless. The banning of DTDs is not paranoia but fetishism.
Now, with that background out of the way ...
How do you fix the problem?
The best solution is to complain bitterly to your vendor that they have been suckered by an old wives' tale about XML security, and tell them that if they care about security they should do a rational security analysis instead of prohibiting DTDs.
Meanwhile, as the message suggests, you can "set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method." If the input is in fact untrusted, you might also look into ways of giving the process appropriate resource limits.
And as a fallback (I do not recommend this) you can comment out the document type declaration in your input.

Note that settings.ProhibitDtd is now obsolete, use DtdProcessing instead: (new options of Ignore, Parse, or Prohibit)
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
and as stated in this post: How does the billion laughs XML DoS attack work?
you should add a limit to the number of characters to avoid DoS attacks:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
settings.MaxCharactersFromEntities = 1024;
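A fuller sketch of those settings in use. It reads a document that declares a small entity, then a nested-entity document in the style of the billion laughs attack, which should be rejected with an XmlException once the expansion passes the limit. The EntityLimitDemo class, its helper names and the sample documents are hypothetical; setting XmlResolver to null is an additional assumption to keep external DTDs from being fetched.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class EntityLimitDemo
{
    // Reads the document with DTD processing on and a cap on entity expansion.
    public static string ReadText(string xml, long maxEntityChars)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            MaxCharactersFromEntities = maxEntityChars,
            XmlResolver = null // do not fetch external DTDs or entities
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return XDocument.Load(reader).Root.Value;
        }
    }

    // Builds a document whose entity &d; expands to 10,000 characters.
    public static string NestedEntities()
    {
        string Ten(string name) => string.Concat(Enumerable.Repeat("&" + name + ";", 10));
        return "<!DOCTYPE doc [" +
               "<!ENTITY a \"aaaaaaaaaa\">" +
               "<!ENTITY b \"" + Ten("a") + "\">" +
               "<!ENTITY c \"" + Ten("b") + "\">" +
               "<!ENTITY d \"" + Ten("c") + "\">" +
               "]><doc>&d;</doc>";
    }

    public static void Main()
    {
        string small = "<!DOCTYPE doc [<!ENTITY pubdate \"15 December 2012\">]>" +
                       "<doc>&pubdate;</doc>";
        Console.WriteLine(ReadText(small, 1024));

        try
        {
            ReadText(NestedEntities(), 1024);
        }
        catch (XmlException e)
        {
            Console.WriteLine("Rejected: " + e.Message);
        }
    }
}
```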

As far as fixing this, with a bit of looking around I found it was as simple as adding:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ProhibitDtd = false;
and passing these settings into the create method.
[UPDATE 3/9/2017]
As some have pointed out, ProhibitDtd is now deprecated. Dr. Aaron Dishno's answer, below, shows the superseding solution.

After trying all of the above answers without success, I changed the service user from service@mydomain.com to service@mydomain.onmicrosoft.com, and now the app works correctly while running in Azure.
Alternatively, if you run into this problem in an environment you have more control over, you can paste the following into your hosts file:
127.0.0.1 msoid.onmicrosoft.com
127.0.0.1 msoid.mydomain.com
127.0.0.1 msoid.mydomain.onmicrosoft.com
127.0.0.1 msoid.*.onmicrosoft.com

Xml Serialization and Schemas in .net (C#)

The following questions are about XML serialization/deserialization and schema validation for a .net library of types which are to be used for data exchange.
First question, if I have a custom xml namespace say "http://mydomain/mynamespace" do I have to add a
[XmlRoot(Namespace = "http://mydomain/mynamespace")]
to every class in my library. Or is there a way to define this namespace as default for the whole assembly?
Second question, is there a reason behind the always added namespaces
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
even if there is no actual reference to any of the namespaces? I just feel they add noise to the resulting xml. Is there a way to remove them and only have the custom namespace in the resulting xml?
Third question, are there tools to support the generation of schema definitions (e.g. for all public [Serializable] classes of an assembly) and the validation of xml against specific schemas available?
If there are, would you recommend XML Schema from W3C or RELAX NG?

Just to add - the "xsi" etc is there to support things like xsi:nil on values later on - a well-known pattern for nullable values. It has to write the stream "forwards only", and it doesn't know (when it writes the first bit) whether it will need nil or not, so it assumes that writing it unnecessarily once is better than having to use the full namespace potentially lots of times.

1) XmlRoot can only be set at the class/struct/interface level (or on return values). So you can't use it on the assembly level. What you're looking for is the XmlnsDefinitionAttribute, but I believe that only is used by the XamlWriter.
2) If you're worried about clutter you should avoid xml. Well-formed xml is full of clutter. I believe there are ways to interact with the xml produced by the serializer, but not directly with the XmlSerializer. You have much more control over the XML produced with the XmlWriter class. Check here for how you can use the XmlWriter to handle namespaces.
3) XSD.exe can be used to generate schemas for POCOs, I believe (I've always written them by hand; I may be using this soon to write up LOTS, tho!).
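On the second question, one commonly used option is to pass an XmlSerializerNamespaces instance to Serialize. When that set contains only the custom namespace, the xsi and xsd declarations are not written. The sketch below assumes a hypothetical Order type; only the namespace URI comes from the question.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("Order", Namespace = "http://mydomain/mynamespace")]
public class Order
{
    public int Id { get; set; }
}

public static class NamespaceDemo
{
    // Serializes with an explicit namespace set instead of the default xsi/xsd pair.
    public static string Serialize(Order order)
    {
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "http://mydomain/mynamespace"); // default namespace only

        var serializer = new XmlSerializer(typeof(Order));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, order, namespaces);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Serialize(new Order { Id = 1 }));
    }
}
```

The trade-off is the one described earlier: without the xsi prefix declared up front, any xsi:nil written later has to declare the namespace where it is used.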

Tools:
- xsd.exe, with a command line like
xsd /c /n:myNamespace.Schema.v2_0 myschema_v2_0.xsd
I put the schema in a separate project.
- Liquid XML, which is useful if there are several schemas, or if you want full support of the schema features (DateTimes with offsets, positive/negative decimals) and cross-platform generation.
