I am trying to validate an XML file against a .dtd. I have written this validator:
public bool Validation(XmlDocument xmlDoc)
{
    var xml = XmldocToString(xmlDoc);
    var r = new XmlTextReader(new StringReader(xml));
    var settings = new XmlReaderSettings();
    var sb = new StringBuilder();
    // Reset the flag so a failure from an earlier call does not leak into this one.
    _isValid = true;
    settings.ProhibitDtd = false;
    settings.ValidationType = ValidationType.DTD;
    settings.ValidationEventHandler += (a, e) =>
    {
        sb.AppendLine(e.Message);
        _isValid = false;
    };
    XmlReader validator = XmlReader.Create(r, settings);
    while (validator.Read())
    {
    }
    validator.Close();
    return _isValid;
}
The problem is that I must have the DTD file in the bin directory of the solution. I want to choose a different directory to keep the .dtd file in, and I really can't find out how.
Thank you for your time.
Declare the association with the DTD in the XML file itself.
Example in case the dtd is stored in a remote server:
<!DOCTYPE Catalog PUBLIC "abc/Catalog" "http://xyz.abc.org/dtds/catalog.dtd">
Take a look at that wiki page and at that site for more options and information about associating XML files with a DTD.
Example in case the dtd is placed locally (SYSTEM):
How to reference a DTD from a document:
Assuming the top element of the document is spec and the dtd is placed
in the file mydtd in the subdirectory dtds of the directory from where
the document were loaded:
<!DOCTYPE spec SYSTEM "dtds/mydtd">
Notes:
The system string is actually an URI-Reference (as defined in RFC
2396) so you can use a full URL string indicating the location of your
DTD on the Web. This is a really good thing to do if you want others
to validate your document. It is also possible to associate a PUBLIC
identifier (a magic string) so that the DTD is looked up in catalogs
on the client side without having to locate it on the web. A DTD
contains a set of element and attribute declarations, but they don't
define what the root of the document should be. This is explicitly
told to the parser/validator as the first element of the DOCTYPE
declaration.
(Excerpt from here)
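If the XML is read from a string, as in the question, a relative SYSTEM identifier has nothing obvious to resolve against. One way to point it at another directory is to pass a base URI to XmlReader.Create. The following is only a sketch under assumptions: the class name, the method name, and the placeholder file name "document.xml" are hypothetical, and it assumes the DOCTYPE uses a relative system identifier such as "mydtd.dtd".

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class DtdValidator // hypothetical helper, not from the original post
{
    // Validates xml against the DTD named in its DOCTYPE, resolving a
    // relative system identifier against dtdDirectory.
    public static bool Validate(string xml, string dtdDirectory, StringBuilder errors)
    {
        bool isValid = true;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD,
            // Needed on newer frameworks, where the default resolver is null.
            XmlResolver = new XmlUrlResolver()
        };
        settings.ValidationEventHandler += (sender, e) =>
        {
            errors.AppendLine(e.Message);
            isValid = false;
        };

        // Pretend the document lives in dtdDirectory; "document.xml" is a
        // placeholder name that only serves to build the base URI.
        string pretendPath = Path.Combine(Path.GetFullPath(dtdDirectory), "document.xml");
        string baseUri = new Uri(pretendPath).AbsoluteUri;

        using (var reader = XmlReader.Create(new StringReader(xml), settings, baseUri))
        {
            while (reader.Read()) { }
        }
        return isValid;
    }
}
```

With this approach the DOCTYPE in the document can stay relative, and the directory is chosen by the caller.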
I cannot understand whether validation of XML occurs on Load or on Validate. Here is the code:
XmlDocument doc = null;
try
{
    XmlReaderSettings settings = new XmlReaderSettings( );
    settings.Schemas.Add("http://xxx/customs/DealFile/Common/ReleaseGoodsMessage",
        ConfigurationManager.AppSettings.Get("Schemas"));
    settings.ValidationType = ValidationType.Schema;
    using (XmlReader reader = XmlReader.Create(path, settings)) {
        doc = new XmlDocument( );
        doc.Load(reader);
    }
    ValidationEventHandler eventHandler = new ValidationEventHandler(ValidationEventHandler);
    doc.Validate(eventHandler);
}
catch(XmlSchemaException xmlErr)
{
    // Do something
}
I expect validation to occur on the line doc.Validate(eventHandler);
However, it always occurs on doc.Load(reader); that is where I get an exception if something is wrong with the XML.
XMLHelpers.LoadXML(@"C:\work\Xml2Db\Xml2Db\Data\Tests\BadData\01.xml")
Exception thrown: 'System.Xml.Schema.XmlSchemaValidationException' in System.Xml.dll
xmlErr.Message
"The 'http://xxx/customs/DealFile/Common/ReleaseGoodsMessage:governmentProcedureType' element is invalid -
The value 'a' is invalid according to its datatype 'Int' - The string 'a' is not a valid Int32 value."
And this is the code from Microsoft's example https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocument.validate?view=netcore-3.1
try
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add("http://www.contoso.com/books", "contosoBooks.xsd");
settings.ValidationType = ValidationType.Schema;
XmlReader reader = XmlReader.Create("contosoBooks.xml", settings);
XmlDocument document = new XmlDocument();
document.Load(reader);
ValidationEventHandler eventHandler = new ValidationEventHandler(ValidationEventHandler);
// the following call to Validate succeeds.
document.Validate(eventHandler);
...
It's actually the same.
But pay attention to the comment // the following call to Validate succeeds. They also seem to expect validation to happen on the line document.Validate(eventHandler);
What's going on?
When your block of code sets up the settings object, it adds a schema and sets the validation type to ValidationType.Schema (i.e. validate against the schema).
When you create the XmlReader with those settings, it is set up to validate against the schema too, and that is what raises your schema-based error/exception during Load.
The call to document.Validate(eventHandler); is then redundant: if Load completed without an error, the XML has already been validated, so Validate will succeed. The comment "the following call to Validate succeeds" is correct because the document has already been proved valid.
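If the goal is for validation to happen at the Validate call rather than during Load, one option is to load without validating settings and attach the schema to the document instead. This is a minimal sketch, assuming the schema's target namespace and file path are known; the class and method names are hypothetical.

```csharp
using System.Xml;
using System.Xml.Schema;

public static class DeferredValidation // hypothetical helper
{
    public static XmlDocument LoadThenValidate(
        string xmlPath,
        string targetNamespace,
        string xsdPath,
        ValidationEventHandler onError)
    {
        var doc = new XmlDocument();

        // No validating reader here: Load only checks well-formedness.
        doc.Load(xmlPath);

        // Attach the schema to the document itself.
        doc.Schemas.Add(targetNamespace, xsdPath);

        // Schema validation happens here; problems are reported to onError.
        doc.Validate(onError);
        return doc;
    }
}
```

Either approach validates the document once; the difference is only where errors surface.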
I suspect that you are failing to distinguish between XML that is well-formed and XML that is valid.
A well-formed XML document satisfies all of the rules of the XML specification. If it does not, you should get a well-formedness error from any XML parser.
If you also choose to
a) supply an XSD that describes your XML document and
b) tell your XML processor to validate against that XSD
then the XML processor will also check that the document satisfies the rules in the XML schema (an XML Schema is composed of one or more XSDs).
If you are still not sure, edit your question and supply the error message(s) that you are seeing. You don't need to include any confidential information - the error template is enough to tell which kind of error it is.
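In .NET the two kinds of failure surface as different exception types when no ValidationEventHandler is attached, which gives a quick way to tell them apart. The sketch below assumes a validating XmlReader; the class name, method name, and returned strings are hypothetical.

```csharp
using System.Xml;
using System.Xml.Schema;

public static class XmlChecks // hypothetical helper
{
    // Returns "ok", "invalid" or "not well-formed" for the given file.
    public static string Classify(string xmlPath, string targetNamespace, string xsdPath)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(targetNamespace, xsdPath);
        try
        {
            using (XmlReader reader = XmlReader.Create(xmlPath, settings))
            {
                while (reader.Read()) { }
            }
            return "ok";
        }
        catch (XmlSchemaValidationException)
        {
            // Well-formed XML that breaks a rule in the schema.
            return "invalid";
        }
        catch (XmlException)
        {
            // The parser could not read the document as XML at all.
            return "not well-formed";
        }
    }
}
```

Note that a document whose root element matches no schema in the set may only produce warnings, which this sketch does not report.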
I'm trying to validate my XML using an external DTD file. Here is the XML header:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE CONTEXT SYSTEM "Data.dtd">
<CONTEXT>
...
</CONTEXT>
And here is my code:
// Set the validation settings.
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationType = ValidationType.DTD;
settings.ValidationEventHandler += (sender, args) => Debug.WriteLine(args.Message);
// Create the XmlReader object.
XmlReader reader = XmlReader.Create("Data.xml", settings);
// Parse the file.
while (reader.Read());
After running this code I get a lot of errors that all look like this:
The 'CONTEXT' element is not declared.
I've tried changing the file name in the DOCTYPE to an obviously nonexistent file, but I get the same errors. Please tell me where I have gone wrong.
I could reproduce the problem. As a fix, I would suggest setting
settings.XmlResolver = new XmlUrlResolver();
That way the external DTD file is fetched; otherwise, it seems, it is not. The documentation on MSDN says: "Starting with the .NET Framework 4.5.2, this setting has a default value of null.". So it seems you need to create the resolver explicitly.
I have some code (in C#) that creates a bunch of XML sheets on the fly. At the end of my code I generate XSDs based on those XML sheets. Generating the XSDs works, but I cannot figure out how to save them as files. My code so far is basically taken from the MSDN page on generating XSDs from XML sheets:
Directory.CreateDirectory(directoryName);
string[] directoryFiles = Directory.GetFiles(xmlFilePath);
foreach (string xFile in directoryFiles)
{
    // Dispose the reader so the file handle is released after each iteration.
    using (XmlReader reader = XmlReader.Create(xFile))
    {
        XmlSchemaInference schema = new XmlSchemaInference();
        schema.TypeInference = XmlSchemaInference.InferenceOption.Relaxed;
        XmlSchemaSet schemaSet = schema.InferSchema(reader);
        //insert code here to save the file
        //stored in schemaSet.Schemas()
    }
}
Any help is appreciated. Thanks.
XmlSchemaSet has a method called Schemas() that returns a collection of all the schemas in the set. MSDN has a page that describes how to access these.
Simply access each Schema in the set using the code in the link above and write it using the example here.
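Put together, the two steps might look like the sketch below: iterate over Schemas() and write each XmlSchema with its Write method. The class name, method name, and the output file naming scheme are assumptions made for illustration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaSaver // hypothetical helper
{
    public static void SaveInferredSchemas(string xmlFile, string outputDirectory)
    {
        XmlSchemaSet schemaSet;
        using (XmlReader reader = XmlReader.Create(xmlFile))
        {
            var inference = new XmlSchemaInference();
            schemaSet = inference.InferSchema(reader);
        }

        // One schema is inferred per namespace found in the document,
        // so number the output files (naming scheme is an assumption).
        int index = 0;
        string baseName = Path.GetFileNameWithoutExtension(xmlFile);
        var writerSettings = new XmlWriterSettings { Indent = true };

        foreach (XmlSchema schema in schemaSet.Schemas())
        {
            string target = Path.Combine(outputDirectory, baseName + "_" + index + ".xsd");
            using (XmlWriter writer = XmlWriter.Create(target, writerSettings))
            {
                schema.Write(writer);
            }
            index++;
        }
    }
}
```

In the loop from the question, a call such as SchemaSaver.SaveInferredSchemas(xFile, directoryName) would replace the placeholder comments.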
I want to validate a digitally signed XML document against its schema definition. The schema contains this tag:
<xs:import namespace="http://www.w3.org/2000/09/xmldsig#" schemaLocation="xmldsig-core-schema.xsd" id="schema"/>
Then I tried to load schemas:
XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add(null, "a.xsd");
settings.Schemas.Compile();
I get the following error:
The 'http://www.w3.org/2000/09/xmldsig#:Signature' element is not declared.
You also need to load the imported schema with another call:
settings.Schemas.Add([importednamespace], [pathtoimportedXSD]);
The schema xmldsig-core-schema.xsd is not loaded, for security reasons, because it references a DTD. To validate, load it directly and add it to the set as another schema.
<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSchema 200102//EN" "http://www.w3.org/2001/XMLSchema.dtd"
This works. The solution in C#:
XElement xsdMarkup = XElement.Load("C:\\Proyectos\\WindowService\\Sbif\\Schema\\Schema\\IndicadoresFinancieros-v1.0.xsd");
XElement xsdMarkup2 = XElement.Load("http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd");
XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add(null, xsdMarkup.CreateReader());
schemas.Add(null, xsdMarkup2.CreateReader());
schemas.Compile();
Are you sure the hash is required at the end of the following?
http://www.w3.org/2000/09/xmldsig#
From the error, it would seem the XML Signature schema is not being loaded, despite the import.
Adding the XML Signature schema to the schema set explicitly should confirm that.
The most likely cause is that the schema set's XmlResolver is not finding the file you specify; this could be a current folder/relative path issue.
Using Process Monitor to see where your code is trying to load the XSD file from may also help.
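A sketch of loading the imported schema explicitly from a local copy is shown below. It assumes xmldsig-core-schema.xsd has been saved locally, and it reads that file through an XmlReader that allows its DOCTYPE, since the default settings prohibit DTDs. The class and parameter names are hypothetical, and whether the import in the main schema is then satisfied without a network lookup may depend on the framework version.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SignedSchemaLoader // hypothetical helper
{
    public static XmlSchemaSet Load(string mainXsdPath, string xmldsigXsdPath)
    {
        var readerSettings = new XmlReaderSettings
        {
            // The W3C schema file starts with a DOCTYPE, so DTDs must be allowed.
            DtdProcessing = DtdProcessing.Parse,
            // Assumption: a null resolver keeps the external DTD from being fetched.
            XmlResolver = null
        };

        var schemas = new XmlSchemaSet();
        using (Stream stream = File.OpenRead(xmldsigXsdPath))
        using (XmlReader dsigReader = XmlReader.Create(stream, readerSettings))
        {
            schemas.Add("http://www.w3.org/2000/09/xmldsig#", dsigReader);
        }

        // null means: use the targetNamespace declared in the schema file.
        schemas.Add(null, mainXsdPath);
        schemas.Compile();
        return schemas;
    }
}
```

The returned set can then be assigned to XmlReaderSettings.Schemas before creating the validating reader.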
When using XmlDocument.Load , I am finding that if the document refers to a DTD, a connection is made to the provided URI. Is there any way to prevent this from happening?
After some more digging, maybe you should set the XmlResolver property of the XmlReaderSettings object to null.
'The XmlResolver is used to locate and
open an XML instance document, or to
locate and open any external resources
referenced by the XML instance
document. This can include entities,
DTD, or schemas.'
So the code would look like this:
XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = null;
settings.DtdProcessing = DtdProcessing.Parse;
XmlDocument doc = new XmlDocument();
using (StringReader sr = new StringReader(xml))
using (XmlReader reader = XmlReader.Create(sr, settings))
{
doc.Load(reader);
}
The document being loaded HAS a DTD.
With:
settings.ProhibitDtd = true;
I see the following exception:
Service cannot be started. System.Xml.XmlException: For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.
So, it looks like ProhibitDtd MUST be set to false in this instance.
It looked like ValidationType would do the trick, but with:
settings.ValidationType = ValidationType.None;
I'm still seeing a connection to the DTD uri.
This is actually a flaw in the XML specifications. The W3C is bemoaning that people all hit their servers like mad to load schemas billions of times. Unfortunately just about no standard XML library gets this right, they all hit the servers over and over again.
The problem with DTDs is particularly serious, because DTDs may include general entity declarations (for things like &amp; -> &) which the XML file may actually rely upon. So if your parser chooses to forgo loading the DTD, and the XML makes use of general entity references, parsing may actually fail.
The only solution to this problem would be a transparent caching entity resolver, which would put the downloaded files into some archive in the library search path, so that this archive would be dynamically created and almost automatically bundled with any software distributions made. But even in the Java world there is not one decent such EntityResolver floating about, certainly not one built in to anything from the Apache Foundation.
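In .NET, a much simpler relative of that idea is a custom XmlResolver that serves known DTDs from local copies and refuses everything else. The sketch below is not a transparent cache: it assumes the caller supplies a map from absolute DTD URIs to local file paths, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical resolver: answers from local files only, never from the network.
public sealed class LocalCopyResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> _localCopies;

    // Keys are absolute URIs as they appear after resolution,
    // values are paths to local copies of those resources.
    public LocalCopyResolver(Dictionary<string, string> localCopies)
    {
        _localCopies = localCopies;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string path;
        if (_localCopies.TryGetValue(absoluteUri.AbsoluteUri, out path))
        {
            return File.OpenRead(path);
        }
        // Anything not registered is rejected instead of downloaded.
        throw new XmlException("No local copy registered for " + absoluteUri);
    }
}
```

Assigning an instance to XmlReaderSettings.XmlResolver, together with DtdProcessing.Parse, should let entity declarations in the DTD still be read while keeping the parser off the remote server.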
Try something like this:
XmlDocument doc = new XmlDocument();
using (StringReader sr = new StringReader(xml))
using (XmlReader reader = XmlReader.Create(sr, new XmlReaderSettings()))
{
doc.Load(reader);
}
The thing to note here is that XmlReaderSettings has the ProhibitDtd property set to true by default.
Use an XmlReader to load the document and set the ValidationType property of the reader settings to None.