I have a C# method that validates an XML document against an XSD document, as follows:
static bool IsValidXml(string xmlFilePath, string xsdFilePath)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.Schemas.Add(null, xsdFilePath);
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Compile();
    try
    {
        XmlReader xmlRead = XmlReader.Create(xmlFilePath, settings);
        while (xmlRead.Read())
        { }
        xmlRead.Close();
    }
    catch (Exception e)
    {
        return false;
    }
    return true;
}
I put this together after looking at a number of MSDN articles and questions here where this is given as the solution. It correctly detects when the XSD is not well-formed (it returns false if I mess with the file) and when the XML is not well-formed (it also returns false when I mess with it).
I've also tried the following, but it does the exact same thing:
static bool IsValidXml(string xmlFilePath, string xsdFilePath)
{
    XDocument xdoc = XDocument.Load(xmlFilePath);
    XmlSchemaSet schemas = new XmlSchemaSet();
    schemas.Add(null, xsdFilePath);
    try
    {
        xdoc.Validate(schemas, null);
    }
    catch (XmlSchemaValidationException e)
    {
        return false;
    }
    return true;
}
I've even pulled a completely random XSD off the internet and used it with both methods, and the XML still validates with both. What am I missing here?
Using .NET 3.5 within an SSIS job.
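In .NET you have to check yourself whether the validator actually matched your document element to a schema component. If it didn't, no exception is thrown: the validator typically reports only a warning ("Could not find schema information for the element ..."), and warnings only surface if you set ReportValidationWarnings and attach a ValidationEventHandler. So your code will not work as you expect.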
A match means one or both of the following:
- There is a global element in your schema set whose qualified name is the same as your XML document element's qualified name.
- The document element has an xsi:type attribute whose value is a qualified name pointing to a global type in your schema set.
In streaming mode, this check is easy to do. The following should give you an idea (error handling not shown; note that GlobalElements and GlobalTypes are only populated once the schema set has been compiled):
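using (XmlReader reader = XmlReader.Create(xmlfile, settings))
{
    reader.MoveToContent();
    var qn = new XmlQualifiedName(reader.LocalName, reader.NamespaceURI);
    // element test: is the document element a global element in the (compiled) schema set?
    bool matched = settings.Schemas.GlobalElements.Contains(qn);
    // xsi:type test: does the document element carry an xsi:type attribute?
    string xsiType = reader.GetAttribute("type", XmlSchema.InstanceNamespace);
    if (!matched && xsiType != null)
    {
        // resolve the (possibly prefixed) xsi:type value to an XmlQualifiedName
        int colon = xsiType.IndexOf(':');
        string prefix = colon < 0 ? string.Empty : xsiType.Substring(0, colon);
        var typeName = new XmlQualifiedName(xsiType.Substring(colon + 1), reader.LookupNamespace(prefix));
        matched = settings.Schemas.GlobalTypes.Contains(typeName);
    }
    // if all good, keep reading; otherwise 'matched' is your error flag and we stop here
    while (matched && reader.Read()) { }
}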
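A coarser alternative, sketched here on the XmlReader path from the question, is to opt in to warnings and treat them as failures. With ReportValidationWarnings set and a handler attached, the "could not find schema information" warning that an unmatched document element produces reaches the handler, so an XSD that has nothing to do with the document should no longer let it through. The class name is illustrative only, and note that this also fails documents that produce warnings for other reasons, which may or may not suit your case.

```csharp
using System.Xml;
using System.Xml.Schema;

public static class StrictXsdValidator
{
    // Returns true only if both files are well-formed and validation raised
    // no errors AND no warnings.
    public static bool IsValidXml(string xmlFilePath, string xsdFilePath)
    {
        bool valid = true;
        try
        {
            var settings = new XmlReaderSettings();
            settings.Schemas.Add(null, xsdFilePath);
            settings.ValidationType = ValidationType.Schema;
            // Opt in to warnings such as "Could not find schema information for
            // the element ...", which is what an unmatched document element produces.
            settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
            // With a handler attached, errors and warnings are routed here instead
            // of being thrown; treat either one as a failure.
            settings.ValidationEventHandler += (sender, e) => valid = false;

            using (XmlReader reader = XmlReader.Create(xmlFilePath, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            // One of the files is not well-formed XML.
            return false;
        }
        catch (XmlSchemaException)
        {
            // The XSD could not be loaded or compiled as a schema.
            return false;
        }
        return valid;
    }
}
```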
You might also consider XmlNode.SchemaInfo, which represents the post-schema-validation infoset that has been assigned to a node as a result of schema validation. I would test different conditions and see how it works for your scenario. The streaming check on the document element is recommended if you want to reduce the attack surface for DoS attacks, as it is the fastest way to detect completely bogus payloads.
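For the XDocument version in the question, the counterpart of XmlNode.SchemaInfo is the GetSchemaInfo() extension method, which is populated when Validate is called with addSchemaInfo set to true. A minimal sketch, assuming you only need a yes/no answer (the helper name is hypothetical): the document counts as valid only if no error was raised and the document element itself was assessed as Valid. An unmatched document element should come back with a validity other than Valid even though no error was reported for it.

```csharp
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaInfoCheck
{
    // Hypothetical helper: true only if validation raised no errors AND the
    // document element itself was assessed as Valid by the validator.
    public static bool RootIsSchemaValid(XDocument doc, XmlSchemaSet schemas)
    {
        bool hadErrors = false;
        // addSchemaInfo: true stores the post-schema-validation infoset on the
        // nodes, which is what GetSchemaInfo() reads back afterwards.
        doc.Validate(schemas, (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
                hadErrors = true;
        }, true);

        IXmlSchemaInfo info = doc.Root.GetSchemaInfo();
        // An unmatched document element is not assessed as Valid, even though
        // no error was reported for it.
        return !hadErrors && info != null && info.Validity == XmlSchemaValidity.Valid;
    }
}
```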
Related
I would like to know the best approach for detecting XML documents that contain external resources.
I received this finding during Veracode analysis:
Configure the XML parser to disable external entity resolution.
I can set the XmlResolver to null, but we also depend on third-party DLLs. So I would like to check whether the XML contains any external resource and reject the file immediately.
We do not use DTDs for our XML documents.
Here are the two options I could think of. I think they are nearly equivalent, but I want to make sure I'm not missing anything.
// Check for a DTD (DOCTYPE) in the XML; if there is one, reject this document.
public bool IsValid(string xml)
{
    if (xml.Contains("<!DOCTYPE"))
    {
        return false;
    }
    return true;
}
or
public bool IsValid(string xml)
{
    XmlReaderSettings xs = new XmlReaderSettings() { DtdProcessing = DtdProcessing.Prohibit };
    try
    {
        XmlReader.Create(xml, xs);
        return true;
    }
    catch (Exception ex)
    {
        return false;
    }
}
Also, this only detects DTDs. How can we check for other external resources, such as entities and schemas? What is the process for checking for all external entities? Thanks for your help.
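A sketch of the second option that addresses two problems in it, assuming .NET 4 or later: XmlReader.Create(string, ...) treats its string argument as a URI rather than as XML text, and creating a reader does not parse anything, so the DOCTYPE would never be reached. Reading the whole string through a StringReader with DtdProcessing.Prohibit rejects any DOCTYPE, which also rules out entity declarations, since those can only appear in a DTD; setting XmlResolver to null additionally stops the reader from fetching anything external. The class name is illustrative, and as written it also returns false for XML that is not well-formed, which may or may not be what you want.

```csharp
using System.IO;
using System.Xml;

public static class ExternalResourceCheck
{
    // Rejects any document containing a DOCTYPE (and therefore any entity
    // declaration) and never resolves external URIs while parsing.
    public static bool IsValid(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // XmlException on <!DOCTYPE
            XmlResolver = null                      // never fetch external resources
        };
        try
        {
            // Parse the string itself; XmlReader.Create(string, ...) would treat it as a URI.
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Create() alone reads nothing, so read to the end to hit any DOCTYPE.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // DOCTYPE present, or the XML is not well-formed.
            return false;
        }
    }
}
```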
I am using the following code to validate XML against the XSD:
public static bool IsValidXmlOld(string xmlFilePath, string xsdFilePath)
{
    if (File.Exists(xmlFilePath) && File.Exists(xsdFilePath))
    {
        try
        {
            XDocument xdocXml = XDocument.Load(xmlFilePath);
            var schemas = new XmlSchemaSet();
            schemas.Add(null, xsdFilePath);
            Boolean result = true;
            xdocXml.Validate(schemas, (sender, e) =>
            {
                result = false;
            });
            return result;
        }
        catch (Exception ex)
        {
            // Logging logic + error handling logic
            throw new Exception(ex.Message);
        }
    }
    throw new Exception("Either the Schema or the XML file does not exist Please check");
}
For some reason, it always returns true, even if the XML is not valid against the given XSD. I picked up this code from the following link:
Validate XML against XSD in a single method. It seems that result = false is never executed, even if the XML is completely invalid.
I have a valid and an invalid XML file for a particular XSD:
1. Valid XML
2. XSD
3. Invalid XML
If I validate them on this web site, the valid XML passes the validation test but the invalid XML fails it. However, the code above passes both XML files.
At the same time, it does fail validation when I use some basic XML like the following:
XDocument doc2 = new XDocument(
    new XElement("Root",
        new XElement("Child1", "content1"),
        new XElement("Child3", "content1")
    )
);
with the following error:
The 'Root' element is not declared.: {0}
This clearly shows that the code is capable of failing a validation. So what is special about the third file, the invalid XML, that makes the code pass it when this site clearly fails it?
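One thing worth ruling out, since the files themselves are not shown here, is whether the invalid file's document element matches the schema at all. If its namespace or name does not correspond to a global element declaration, the validator may send nothing to the handler, and result stays true. A hypothetical diagnostic helper (the name Describe is illustrative) that returns null when the document element matches a global element and a short description otherwise:

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class RootMatchDiagnostic
{
    // Returns null if the document element matches a global element
    // declaration in the schema set, otherwise a description of the mismatch.
    public static string Describe(XDocument doc, XmlSchemaSet schemas)
    {
        schemas.Compile(); // GlobalElements is only populated after compilation
        XName name = doc.Root.Name;
        if (!schemas.Contains(name.NamespaceName))
            return "No schema in the set has targetNamespace '" + name.NamespaceName + "'";
        var qn = new XmlQualifiedName(name.LocalName, name.NamespaceName);
        if (!schemas.GlobalElements.Contains(qn))
            return "'" + qn + "' is not declared as a global element";
        return null;
    }
}
```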
I have a Web Method (within a SOAP Web Service) with a signature of:
public msgResponse myWebMethod([XmlAnyElement] XmlElement msgRequest)
I chose to use the XmlElement parameter after reading that it would allow me to perform my own XSD validation on the parameter. The problem is that the parameter can be quite large (up to 80 MB of XML), so reading the XmlElement.OuterXml property as suggested in the link isn't very practical.
Is there another way to validate the XmlElement object against an XSD?
More generally, is this an inappropriate approach for implementing a web service that expects large amounts of XML? I've come across some hints about using a SoapExtension to gain direct access to the input stream, but I am not sure this is the right approach for my situation.
Note: Unfortunately, I'm chained to an existing WSDL and XSD that I have no power to alter which is why I went with a non-WCF implementation in the first place.
Here's a quick example. Just pass your XmlElement to this method (XmlElement implements IXPathNavigable):
private static void TheAnswer(IXPathNavigable inputElement)
{
    var schemas = new XmlSchemaSet();
    schemas.Add("http://foo.org/importvalidator.xsd", @"..\..\validator.xsd");
    var settings = new XmlReaderSettings
    {
        Schemas = schemas,
        ValidationFlags =
            XmlSchemaValidationFlags.ProcessIdentityConstraints |
            XmlSchemaValidationFlags.ReportValidationWarnings,
        ValidationType = ValidationType.Schema
    };
    settings.ValidationEventHandler +=
        (sender, e) => Console.WriteLine("{0}: {1}", e.Severity, e.Message);
    // ReadSubtree streams the nodes already in memory; no OuterXml string is built.
    using (XmlReader documentReader = inputElement.CreateNavigator().ReadSubtree())
    {
        using (XmlReader validatingReader = XmlReader.Create(documentReader, settings))
        {
            while (validatingReader.Read())
            {
            }
        }
    }
}
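If you would rather get a result back than print to the console, the same streaming approach can collect the messages instead. A sketch, assuming the caller builds and owns the XmlSchemaSet (the class and method names are illustrative): an empty list means the element validated cleanly. It avoids building an OuterXml string, but it does not avoid holding the large XmlElement itself in memory.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.XPath;

public static class ElementValidator
{
    // Validates an in-memory node (e.g. an XmlElement) against the given schemas
    // and returns every error and warning message; empty means valid.
    public static List<string> Validate(IXPathNavigable input, XmlSchemaSet schemas)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings
        {
            Schemas = schemas,
            ValidationType = ValidationType.Schema,
            ValidationFlags = XmlSchemaValidationFlags.ProcessIdentityConstraints |
                              XmlSchemaValidationFlags.ReportValidationWarnings
        };
        settings.ValidationEventHandler += (sender, e) =>
            messages.Add(e.Severity + ": " + e.Message);

        // Stream the existing nodes through a validating reader.
        using (XmlReader source = input.CreateNavigator().ReadSubtree())
        using (XmlReader validating = XmlReader.Create(source, settings))
        {
            while (validating.Read()) { }
        }
        return messages;
    }
}
```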
I got this strange error when trying to validate my in-memory XML. Is there anything I did wrong?
[Test]
public void ValidationXML()
{
    int errorCount = 0;
    var xmlString = "<?xml version='1.0'?><response><error code='1'> Success</error></response>";
    XmlDocument xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(xmlString);
    xmlDocument.Validate((sender, e) => errorCount++);
    Assert.AreEqual(0, errorCount);
}
The exception was:
failed: System.InvalidOperationException : The XmlSchemaSet on the document is either null or has no schemas in it. Provide schema information before calling Validate.
at System.Xml.XmlDocument.Validate(ValidationEventHandler validationEventHandler, XmlNode nodeToValidate)
at System.Xml.XmlDocument.Validate(ValidationEventHandler validationEventHandler)
You are trying to validate the XmlDocument without assigning a schema to check against.
xmlDocument.Schemas.Add(new XmlSchema());
This tries to validate against an empty schema (as opposed to null) and will fail the validation (instead of throwing an exception), setting errorCount to 1.
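Alternatively, to make the test in the question meaningful, give the document a real schema before calling Validate. A sketch with a small in-memory XSD written for this example; the schema is an assumption about what the response format looks like, with code typed as xs:int, and the class name is illustrative.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class InMemoryValidation
{
    // Assumed schema for <response><error code="...">text</error></response>.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='response'><xs:complexType><xs:sequence>" +
        "<xs:element name='error'><xs:complexType><xs:simpleContent>" +
        "<xs:extension base='xs:string'><xs:attribute name='code' type='xs:int'/>" +
        "</xs:extension></xs:simpleContent></xs:complexType></xs:element>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    public static int CountErrors(string xmlString)
    {
        int errorCount = 0;
        var xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(xmlString);
        // Without at least one schema, Validate throws InvalidOperationException.
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            xmlDocument.Schemas.Add(null, schemaReader);
        }
        xmlDocument.Validate((sender, e) => errorCount++);
        return errorCount;
    }
}
```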
I'm trying to validate an XML file using a DTD, but I get the following error:
'ENTITY' is an unexpected token. The expected token is 'DOCTYPE'. Line 538, position 3.
public static void Validate(string xmlFilename, string schemaFilename)
{
    XmlTextReader r = new XmlTextReader(xmlFilename);
    XmlValidatingReader validator = new XmlValidatingReader(r);
    validator.ValidationType = ValidationType.Schema;
    XmlSchemaCollection schemas = new XmlSchemaCollection();
    schemas.Add(null, schemaFilename);
    validator.ValidationEventHandler += new ValidationEventHandler(ValidationEventHandler);
    try
    {
        while (validator.Read())
        { }
    }
    catch (XmlException err)
    {
        Console.WriteLine(err.Message);
    }
    finally
    {
        validator.Close();
    }
}
The DTD I'm validating against is http://www.editeur.org/onix/2.1/reference/onix-international.dtd
I hope someone can help me. Thanks!
I realise this is a really old question, but for anyone else struggling with this problem, here's what I did.
I gave up trying to validate with the DTD.
Instead, I ended up using the ONIX 2.1 XSD available at http://www.editeur.org/15/Previous-Releases/#R%202.1%20Downloads. I had to set the default namespace:
var nt = new NameTable();
var ns = new XmlNamespaceManager(nt);
ns.AddNamespace(string.Empty, "http://www.editeur.org/onix/2.1/reference");
var context = new XmlParserContext(null, ns, null, XmlSpace.None);
Then, when loading the XML, turn off DTD processing (this uses .NET 4):
var settings = new XmlReaderSettings
{
    ValidationType = System.Xml.ValidationType.Schema,
    DtdProcessing = DtdProcessing.Ignore
};
using (var reader = XmlReader.Create("path to xml file", settings, context)) { ... }
Edit:
Just noticed: your validation type is also set wrong. Try setting it to ValidationType.DTD instead of Schema.
ValidationType at MSDN
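For reference, a sketch of DTD validation with XmlReaderSettings, which replaces the obsolete XmlValidatingReader, assuming .NET 4 or later for DtdProcessing. DTD processing is prohibited by default, so it has to be switched to Parse, and ValidationType has to be DTD rather than Schema. The class name is illustrative; fetching an external DTD such as the ONIX one additionally requires the reader to have an XmlResolver able to reach it, which depends on your runtime and settings.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DtdValidation
{
    // Validates a document against the DTD declared in its DOCTYPE and
    // returns the validation messages; empty means valid.
    public static List<string> Validate(TextReader xml)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // DTDs are prohibited by default
            ValidationType = ValidationType.DTD  // not ValidationType.Schema
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);
        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }
}
```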
--
The error means exactly what it says: the referenced DTD is not well-formed, as DOCTYPE should be present before any other declarations in a DTD.
Document Type Definition (Wikipedia)
Introduction to DTD (w3schools)
You might be able to get around this by downloading a local copy, modifying it to add in the expected root element yourself, and then referencing your edited version in your source.