Using the following MSDN documentation I validate an XML file against a schema: http://msdn.microsoft.com/en-us/library/8f0h7att%28v=vs.100%29.aspx
This works fine as long as the XML contains a reference to the schema location or the inline schema. Is it possible to embed the schema "hard-coded" into the application, i.e. the XSD won't reside as a file and thus the XML does not need to reference it?
I'm talking about something like:
Load XML to be validated (without schema location).
Load XSD as a resource or whatever.
Do the validation.
Try this:
Stream objStream = objFile.PostedFile.InputStream;
// Open XML file
XmlTextReader xtrFile = new XmlTextReader(objStream);
// Create validator
XmlValidatingReader xvrValidator = new XmlValidatingReader(xtrFile);
xvrValidator.ValidationType = ValidationType.Schema;
// Add XSD to validator
XmlSchemaCollection xscSchema = new XmlSchemaCollection();
xscSchema.Add("xxxxx", Server.MapPath(@"/zzz/XSD/yyyyy.xsd"));
xvrValidator.Schemas.Add(xscSchema);
try
{
while (xvrValidator.Read())
{
}
}
catch (Exception ex)
{
// Error on validation
}
You can use the XmlReaderSettings.Schemas property to specify which schema to use. The schema can be loaded from a Stream.
var schemaSet = new XmlSchemaSet();
schemaSet.Add("http://www.contoso.com/books", new XmlTextReader(xsdStream));
var settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema; // the default is None, which skips validation
settings.Schemas = schemaSet;
using (var reader = XmlReader.Create(xmlStream, settings))
{
while (reader.Read());
}
You could declare the XSD as an embedded resource and load it via GetManifestResourceStream as described in this article: How to read embedded resource text file
Yes, this is possible. Read the embedded resource file to string and then create your XmlSchemaSet object adding the schema to it. Use it in your XmlReaderSettings when validating.
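The steps described in the answers above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name EmbeddedSchemaValidator and the resource name in the comment are hypothetical, and the schema is passed in as a Stream so that it can come from GetManifestResourceStream or from anywhere else.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public static class EmbeddedSchemaValidator
{
    // Validates xmlStream against the schema in xsdStream.
    // Returns the collected validation messages; an empty list means "valid".
    public static List<string> Validate(Stream xsdStream, Stream xmlStream)
    {
        var errors = new List<string>();

        var schemaSet = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(xsdStream))
        {
            // Passing null uses the targetNamespace declared inside the XSD.
            schemaSet.Add(null, xsdReader);
        }

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema, // default is None
            Schemas = schemaSet
        };
        // Without a handler, the first validation error is thrown as an exception.
        settings.ValidationEventHandler +=
            (sender, e) => errors.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(xmlStream, settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }
}

// Possible call site (resource and file names are assumptions):
// using (Stream xsd = System.Reflection.Assembly.GetExecutingAssembly()
//            .GetManifestResourceStream("MyApp.Schemas.books.xsd"))
// using (Stream xml = File.OpenRead("books.xml"))
// {
//     List<string> errors = EmbeddedSchemaValidator.Validate(xsd, xml);
// }
```

Note that elements for which no schema is found only produce warnings, and warnings are reported only when XmlSchemaValidationFlags.ReportValidationWarnings is set, so a document in an unexpected namespace may appear to pass.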
Related
I am trying to convert an XML document to another by using XslCompiledTransform. But I am getting an exception with the following error message:
For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.
I already set the DtdProcessing property on XmlReaderSettings to Parse.
However, I am still encountering the same exception with the same error message.
My sample code:
XslCompiledTransform xslt = new XslCompiledTransform(false);
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings() { DtdProcessing = DtdProcessing.Parse, ValidationType = ValidationType.DTD };
xmlReaderSettings.DtdProcessing = DtdProcessing.Parse;
xmlReaderSettings.ValidationType = ValidationType.DTD;
xmlReaderSettings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
using (XmlReader xsltReader = XmlReader.Create(_tesseractSettings.GetXSLTFilePath(), xmlReaderSettings))
{
xslt.Load(_tesseractSettings.GetXSLTFilePath());
xslt.Transform(inputFile, outputFile);
}
ValidationCallBack:
private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
File.WriteAllText(someTxtFilePath, e.Message);
}
If it is relevant here is the document type declarations.
Input XML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XSLT file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:htm="http://www.w3.org/1999/xhtml" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:output method="xml" omit-xml-declaration="yes"/>
I have also tried setting DtdProcessing to DtdProcessing.Ignore and the same exception occurs. I have also tried removing the Document Type declaration element before the transformation. When I do this I no longer get the exception, however the transformation doesn't give me the output I expect. I know the issue isn't with the xslt file because the transformation still works on Oxygen or any online tester.
I have searched the internet, but to no avail.
Any help would be appreciated. Thank you.
DTD stands for Document Type Definition. Its purpose is to define the structure and the legal elements and attributes of an XML document. However, it is occasionally exploited to perform what are known as XXE (XML External Entity) attacks.
So Microsoft provides three options for DtdProcessing to guard against such attacks:
Parse: Allows the XML reader to parse DTD content inside the XML file. To validate against the DTD, we also need to set the validation type and a validation callback that reports any errors or warnings.
Below is an example from the Microsoft docs:
using System;
using System.Xml;
using System.Xml.Schema;
using System.IO;
public class Sample {
public static void Main() {
// Set the validation settings.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationType = ValidationType.DTD;
settings.ValidationEventHandler += new ValidationEventHandler (ValidationCallBack);
// Create the XmlReader object.
XmlReader reader = XmlReader.Create("itemDTD.xml", settings);
// Parse the file.
while (reader.Read());
}
// Display any validation errors.
private static void ValidationCallBack(object sender, ValidationEventArgs e) {
Console.WriteLine("Validation Error: {0}", e.Message);
}
}
Prohibit: The default value of DtdProcessing. Throws an exception when the XML reader encounters any DTD content in the XML file.
Ignore: Instructs the XML reader to skip any DTD content in the XML file and process the rest. As a result, the output is stripped of any DTD content, which can result in loss of data.
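In the question's code, the configured XmlReader is created but never used: xslt.Load and xslt.Transform are both given file paths, so the transform opens the input with default settings (Prohibit). A minimal sketch of passing the configured reader to Transform is shown below. The class name DtdAwareTransform and the use of in-memory strings are assumptions made for illustration, and XmlResolver is set to null here as a cautious choice so that no external DTD is fetched.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class DtdAwareTransform
{
    // Transforms inputXml with stylesheetXml and returns the result as a string.
    public static string Transform(string inputXml, string stylesheetXml)
    {
        var inputSettings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // allow a DOCTYPE in the input
            XmlResolver = null                   // do not fetch external DTDs or entities
        };

        var xslt = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(xsltReader);
        }

        var output = new StringWriter();
        // The settings only take effect because this reader is what Transform reads from.
        using (var inputReader = XmlReader.Create(new StringReader(inputXml), inputSettings))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

With a null resolver, entities that are declared only in an external DTD (for example &nbsp; in XHTML) remain undeclared and will raise an error. Separately, XslCompiledTransform implements XSLT 1.0, so a stylesheet that relies on 2.0 features such as xpath-default-namespace may behave differently than it does in Oxygen.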
var xDoc = XDocument.Load(fileName);
I am using the above code in a function to load an XML file. Functionally it works fine, but a Veracode scan reports the following flaw.
Description
The product processes an XML document that can contain XML entities with URLs that resolve to documents outside
of the intended sphere of control, causing the product to embed incorrect documents into its output. By default, the
XML entity resolver will attempt to resolve and retrieve external references. If attacker-controlled XML can be
submitted to one of these functions, then the attacker could gain access to information about an internal network, local
filesystem, or other sensitive data. This is known as an XML eXternal Entity (XXE) attack.
Recommendations
Configure the XML parser to disable external entity resolution.
What do I need to do to resolve it?
If you are not using external entity references in your XML, you can disable the resolver by setting it to null, from How to prevent XXE attack ( XmlDocument in .net)
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.XmlResolver = null;
xmlDoc.LoadXml(OurOutputXMLString);
If you expect the document to contain entity references, you will need to create a custom resolver and whitelist the references you expect, paying particular attention to any that point to websites you do not control.
Implement a custom XmlResolver and use it for reading the XML. By default, the XmlUrlResolver is used, which automatically downloads the resolved references.
public class CustomResolver : XmlUrlResolver
{
public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
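// Inspect absoluteUri here and reject anything unexpected.
// The base implementation downloads the referenced resource.
return base.GetEntity(absoluteUri, role, ofObjectToReturn);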
}
}
And use it like this:
var settings = new XmlReaderSettings { XmlResolver = new CustomResolver() };
var reader = XmlReader.Create(fileName, settings);
var xDoc = XDocument.Load(reader);
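If the XML never needs a DTD or external entities, a stricter variant is to reject DOCTYPE declarations and disable the resolver entirely. The following is a sketch under that assumption; the class name SafeXml is hypothetical, and whether a given scanner accepts this pattern is not something the source confirms.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SafeXml
{
    // Loads an XDocument while refusing DTDs and external resources.
    public static XDocument Load(TextReader source)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // throw XmlException on any DOCTYPE
            XmlResolver = null                      // never resolve external references
        };
        using (var reader = XmlReader.Create(source, settings))
        {
            return XDocument.Load(reader);
        }
    }
}

// Possible call site for a file on disk:
// using (var file = File.OpenText(fileName))
// {
//     XDocument xDoc = SafeXml.Load(file);
// }
```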
According to the official OWASP documentation you need to do this:
When using XercesDOMParser, do this to prevent XXE:
XercesDOMParser *parser = new XercesDOMParser;
parser->setCreateEntityReferenceNodes(false);
When using SAXParser, do this to prevent XXE:
SAXParser* parser = new SAXParser;
parser->setDisableDefaultEntityResolution(true);
When using SAX2XMLReader, do this to prevent XXE:
SAX2XMLReader* reader = XMLReaderFactory::createXMLReader();
reader->setFeature(XMLUni::fgXercesDisableDefaultEntityResolution, true);
Take a look at this guide: https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
You can try it this way:
XmlDocument doc = new XmlDocument() { XmlResolver = null };
// fileName is a path, so pass it to XmlReader.Create directly;
// a StringReader would treat the path text itself as the XML content.
using (XmlReader reader = XmlReader.Create(fileName, new XmlReaderSettings() { XmlResolver = null }))
{
doc.Load(reader);
}
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// to be compliant, completely disable DOCTYPE declaration:
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// or completely disable external entities declarations:
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// or prohibit the use of all protocols by external entities:
factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
I have a problem validating XML against an XSD when the base XSD imports other XSDs from a website. For example, the following element causes an error.
<link:linkbase xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:link = 'http://www.xbrl.org/2003/linkbase' xmlns:xbrli = 'http://www.xbrl.org/2003/instance' xmlns:xlink = 'http://www.w3.org/1999/xlink' xsi:schemaLocation = 'http://www.xbrl.org/2003/linkbase http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd' >
Is there a way to make the XSD import work with the release build of the DLLs? I am using the following C# code to validate the XML against the XSD. The same code works when I run it from Visual Studio.
var schemas = new XmlSchemaSet();
schemas.Add(null, xsdFilePath);
var readerSettings = new XmlReaderSettings();
readerSettings.ValidationType = ValidationType.Schema;
readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
readerSettings.Schemas.Add(schemas);
using (var xmlReader = XmlReader.Create(xmlFilePath, readerSettings))
{
while (xmlReader.Read())
{
}
}
Obviously, the parser cannot find the schema xbrl-instance-2003-12-31. From the w3 schema specs:
(xsi:schemaLocation) records the author's warrant with pairs of URI references (one for the namespace name, and one for a hint as to the location of a schema document defining names for that namespace name)
That is, each entry in xsi:schemaLocation pairs a namespace name with a hint for where its schema document lives. If the parser doesn't already know where to find the schema for a namespace such as http://www.xbrl.org/2003/instance, you must provide it with the location. Note that xs:import takes a single URI in schemaLocation, not a pair. For example:
<xs:import
namespace='http://www.xbrl.org/2003/instance'
schemaLocation='http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd'/>
I have a webservice that gets specific XML which does not have a schema specified in the file itself.
I do have XSD schemas in my project which will be used to test the obtained XML files against them.
The problem is that whatever I do the validator seems to accept the files even when they aren't valid.
The code I'm using is this (some parts omitted to make it easier):
var schemaReader = XmlReader.Create(new StringReader(xmlSchemeInput));
var xmlSchema = XmlSchema.Read(schemaReader, ValidationHandler);
var xmlReaderSettings = new XmlReaderSettings();
xmlReaderSettings.Schemas.Add(xmlSchema);
xmlReaderSettings.ValidationEventHandler += ValidationHandler;
xmlReaderSettings.ValidationType = ValidationType.Schema;
xmlReaderSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessIdentityConstraints;
xmlReaderSettings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
xmlReaderSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
using(var xmlReader = XmlReader.Create(new StringReader(xmlInput), xmlReaderSettings))
{
while (xmlReader.Read()) { }
}
// return if the xml is valid or not
I've found several solutions with an inline specified schema which work great, but with a schema specified like this (which I assume should work) I can't seem to find any.
Am I doing something wrong? Or am I just wrong in assuming this is how it should work?
Thanks!
Try adding
xmlReaderSettings.Schemas.Compile();
after
xmlReaderSettings.Schemas.Add(xmlSchema);
That worked for me in this situation.
I am processing an XML file (which does not contain any DTD or ENT declarations) in C# that contains entities such as &eacute; and &agrave;. I receive the following exception when attempting to load an XML file...
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(record);
Reference to undeclared entity 'eacute'.
I was able to track down the proper ent file here. How do I tell XmlDocument to use this ent file when loading my XML file?
In versions of the framework prior to .Net 4 you use ProhibitDtd of an XmlReaderSettings instance.
var settings = new XmlReaderSettings();
settings.ProhibitDtd = false;
string DTD = @"<!DOCTYPE doc [
<!ENTITY % iso-lat1 PUBLIC ""ISO 8879:1986//ENTITIES Added Latin 1//EN//XML""
""http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-lat1.ent"">
%iso-lat1;
]> ";
string xml = string.Concat(DTD,"<xml><txt>ren&eacute;</txt></xml>");
XmlDocument xd = new XmlDocument();
xd.Load(XmlReader.Create(new MemoryStream(
UTF8Encoding.UTF8.GetBytes(xml)), settings));
From .Net 4.0 onward use the DtdProcessing property with a value of DtdProcessing.Parse which you set on the XmlTextReader.
XmlDocument xd = new XmlDocument();
using (var rdr = new XmlTextReader(new StringReader(xml)))
{
rdr.DtdProcessing = DtdProcessing.Parse;
xd.Load(rdr);
}
I ran into the same problem, and not wanting to modify my XML (or DTD), I decided to create my own XmlResolver to add entities on the fly.
My implementation actually reads entities from the config file, but this should be enough to do what you're asking for. In this example, I'm converting a right single curly quote into an apostrophe.
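class XmlEntityResolver : XmlUrlResolver { // XmlResolver.GetEntity is abstract, so derive from XmlUrlResolver to have a base implementation to call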
public override object GetEntity(Uri absoluteUri,
string role,
Type ofObjectToReturn)
{
if (absoluteUri.ToString() == "-//MY PUB ID") {
MemoryStream ms = new MemoryStream();
StreamWriter sw = new StreamWriter(ms);
sw.Write("<!ENTITY rsquo \"'\">");
sw.Flush();
ms.Position = 0;
return ms;
}
else {
return base.GetEntity(absoluteUri, role, ofObjectToReturn);
}
}
}
Then, when you declare your XmlDocument, just set the resolver prior to load.
XmlDocument doc = new XmlDocument();
doc.XmlResolver = new XmlEntityResolver();
doc.Load(XML_FILE);
&eacute; is not a valid XML entity by default, whereas it is a valid HTML entity by default.
You would need to define &eacute; as a valid XML entity for XML parsing purposes.
EDIT:
To add a reference to your external ent file you need to do that within the XML file itself. Save the ent file to disk and place it within the same directory as the document being parsed.
<!ENTITY % stuff SYSTEM "iso-lat1.ent">
%stuff;
If you want to go a different route check out the information on ENTITY declaration.
According to this, you have to reference them within the file; you cannot tell LoadXml to do this for you.
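One way to satisfy the "declare it within the file" requirement without editing the files on disk is to prepend an internal DTD subset in memory before loading. This is a sketch only: the class name EntityAwareLoader is hypothetical, just two entities are declared, and it assumes the input string has no XML declaration or DOCTYPE of its own.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityAwareLoader
{
    // Loads xml after prepending a DOCTYPE that declares the named entities it uses.
    // rootName must match the root element of the supplied XML.
    public static XmlDocument Load(string xml, string rootName)
    {
        string doctype =
            "<!DOCTYPE " + rootName + " [" +
            "<!ENTITY eacute \"&#233;\">" +
            "<!ENTITY agrave \"&#224;\">" +
            "]>";

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // needed to read the internal subset
            XmlResolver = null                   // nothing external is fetched
        };

        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(doctype + xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

For the full Latin-1 set, the declarations could be generated from the iso-lat1.ent file mentioned in the question rather than written by hand.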
Your question was answered back in 2004 in an MSDN article. You can find it here:
http://msdn.microsoft.com/en-us/library/aa302289.aspx