When using XmlDocument.Load, I find that if the document refers to a DTD, a connection is made to the provided URI. Is there any way to prevent this?
After some more digging, maybe you should set the XmlResolver property of the XmlReaderSettings object to null.
'The XmlResolver is used to locate and open an XML instance document, or to locate and open any external resources referenced by the XML instance document. This can include entities, DTD, or schemas.'
So the code would look like this:
XmlReaderSettings settings = new XmlReaderSettings();
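settings.XmlResolver = null; // do not resolve external resources, so the DTD is never downloaded
settings.DtdProcessing = DtdProcessing.Parse; // accept the DOCTYPE instead of throwing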
XmlDocument doc = new XmlDocument();
using (StringReader sr = new StringReader(xml))
using (XmlReader reader = XmlReader.Create(sr, settings))
{
doc.Load(reader);
}
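A self-contained version of the snippet above, as a sketch you can run offline. The helper name LoadWithoutFetchingDtd and the URL http://example.invalid/never.dtd are made up for illustration. Because XmlResolver is null, the external DTD is never opened, while an entity declared in the document's internal subset still expands.

```csharp
using System;
using System.IO;
using System.Xml;

static class NoDtdFetch
{
    // Hypothetical helper: loads XML that references an external DTD
    // without ever resolving (downloading) that DTD.
    public static XmlDocument LoadWithoutFetchingDtd(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // accept the DOCTYPE instead of throwing
            XmlResolver = null                   // no resolver: the external DTD is never opened
        };
        var doc = new XmlDocument();
        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```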
The document being loaded HAS a DTD.
With:
settings.ProhibitDtd = true;
I see the following exception:
Service cannot be started. System.Xml.XmlException: For security reasons DTD is prohibited in this XML document. To enable DTD processing set the ProhibitDtd property on XmlReaderSettings to false and pass the settings into XmlReader.Create method.
So it looks like ProhibitDtd MUST be set to false in this case.
It looked like ValidationType would do the trick, but with:
settings.ValidationType = ValidationType.None;
I'm still seeing a connection to the DTD uri.
This is actually a flaw in the XML specifications. The W3C has complained that people hit its servers relentlessly, loading the same schemas billions of times. Unfortunately, almost no standard XML library gets this right; they all hit the servers over and over again.
The problem with DTDs is particularly serious, because DTDs may include general entity declarations (for example, named entities such as &nbsp;, which XML itself does not predefine) that the XML file may actually rely on. So if your parser chooses not to load the DTD and the XML uses general entity references, parsing may fail.
The only solution to this problem would be a transparent caching entity resolver, which would put the downloaded files into an archive on the library search path, so that the archive would be created dynamically and bundled almost automatically with any software distribution. But even in the Java world there is not a single decent EntityResolver of this kind, and certainly none built into anything from the Apache Software Foundation.
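A minimal in-memory version of such a caching resolver could look like the sketch below. CachingXmlResolver is a hypothetical class, not part of .NET: it derives from XmlUrlResolver and keeps the bytes of every resource it fetches, keyed by absolute URI, so repeated parses within one process fetch each DTD only once. Unlike the archive idea above, it persists nothing to disk.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml;

// Hypothetical helper (not part of .NET): caches every external resource
// (DTD, external entity, schema) in memory, keyed by its absolute URI.
public class CachingXmlResolver : XmlUrlResolver
{
    private readonly ConcurrentDictionary<Uri, byte[]> cache =
        new ConcurrentDictionary<Uri, byte[]>();

    public int CachedCount => cache.Count;

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Only stream requests are cached; anything else goes to XmlUrlResolver unchanged.
        bool wantsStream = ofObjectToReturn == null
            || ofObjectToReturn == typeof(Stream)
            || ofObjectToReturn == typeof(object);
        if (!wantsStream)
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);

        byte[] bytes;
        if (!cache.TryGetValue(absoluteUri, out bytes))
        {
            bytes = Download(absoluteUri, role);
            cache.TryAdd(absoluteUri, bytes);
        }
        // A fresh read-only stream per request, so each caller can dispose its own.
        return new MemoryStream(bytes, false);
    }

    private byte[] Download(Uri absoluteUri, string role)
    {
        using (var source = (Stream)base.GetEntity(absoluteUri, role, typeof(Stream)))
        using (var copy = new MemoryStream())
        {
            source.CopyTo(copy);
            return copy.ToArray();
        }
    }
}
```

To use it, assign one shared instance to XmlReaderSettings.XmlResolver together with DtdProcessing = DtdProcessing.Parse, and reuse it for every document.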
Try something like this:
XmlDocument doc = new XmlDocument();
using (StringReader sr = new StringReader(xml))
using (XmlReader reader = XmlReader.Create(sr, new XmlReaderSettings()))
{
doc.Load(reader);
}
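The thing to note here is that XmlReaderSettings has the ProhibitDtd property set to true by default. (ProhibitDtd is obsolete as of .NET 4.0; its replacement, DtdProcessing, defaults to DtdProcessing.Prohibit.)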
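Besides Prohibit and Parse, XmlReaderSettings.DtdProcessing (.NET 4.0 and later) also offers DtdProcessing.Ignore: the DOCTYPE is skipped, so there is neither an exception nor a DTD download. As the answer about general entities points out, this only works if the document does not use entities declared in the DTD. A sketch, where LoadIgnoringDtd is a hypothetical helper:

```csharp
using System;
using System.IO;
using System.Xml;

static class IgnoreDtdExample
{
    // Hypothetical helper: loads XML while skipping its DOCTYPE entirely.
    // Entities declared in the DTD will NOT be available.
    public static XmlDocument LoadIgnoringDtd(string xml)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```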
Use an XmlReader to load the document and set the ValidationType property of the reader settings to None.
Related
I can't understand whether validation of XML occurs on Load or on Validate. Here is the code:
XmlDocument doc = null;
try
{
XmlReaderSettings settings = new XmlReaderSettings( );
settings.Schemas.Add("http://xxx/customs/DealFile/Common/ReleaseGoodsMessage",
ConfigurationManager.AppSettings.Get("Schemas"));
settings.ValidationType = ValidationType.Schema;
using (XmlReader reader = XmlReader.Create(path, settings)) {
doc = new XmlDocument( );
doc.Load(reader);
}
ValidationEventHandler eventHandler = new ValidationEventHandler(ValidationEventHandler);
doc.Validate(eventHandler);
}
catch(XmlSchemaException xmlErr)
{
// Do something
}
I expect validation to occur on the line doc.Validate(eventHandler);
However, it always occurs on doc.Load(reader);. I get an exception if something is wrong with the XML:
XMLHelpers.LoadXML(@"C:\work\Xml2Db\Xml2Db\Data\Tests\BadData\01.xml")
Exception thrown: 'System.Xml.Schema.XmlSchemaValidationException' in System.Xml.dll
xmlErr.Message
"The 'http://xxx/customs/DealFile/Common/ReleaseGoodsMessage:governmentProcedureType' element is invalid -
The value 'a' is invalid according to its datatype 'Int' - The string 'a' is not a valid Int32 value."
And this is the code from Microsoft's example https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldocument.validate?view=netcore-3.1
try
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add("http://www.contoso.com/books", "contosoBooks.xsd");
settings.ValidationType = ValidationType.Schema;
XmlReader reader = XmlReader.Create("contosoBooks.xml", settings);
XmlDocument document = new XmlDocument();
document.Load(reader);
ValidationEventHandler eventHandler = new ValidationEventHandler(ValidationEventHandler);
// the following call to Validate succeeds.
document.Validate(eventHandler);
...
It's essentially the same as mine.
But note the comment // the following call to Validate succeeds. They also expect validation to happen on the line document.Validate(eventHandler);
What's going on?
As your block of code sets up the settings object, it adds a schema and sets ValidationType.Schema (i.e. validate against the schema).
When you then create the XmlReader with those settings, the reader validates the document against the schema while reading it, which is what raises your schema-based exception during Load.
The call to document.Validate(eventHandler); is therefore redundant here: the XML has already been validated while loading and nothing has changed since, so the comment "the following call to Validate succeeds" is correct. Validate becomes useful once the in-memory document has been modified after loading.
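For comparison, here is a sketch (LoadThenValidate is a hypothetical helper) where validation really does happen in Validate: the document is loaded with no schema attached, so loading only checks well-formedness, and the schema is assigned to XmlDocument.Schemas afterwards. Passing a ValidationEventHandler makes Validate report errors to the handler instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DeferredValidation
{
    // Hypothetical helper: load without a schema, then validate explicitly.
    public static List<string> LoadThenValidate(string xml, string xsd)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // no schema attached: malformed XML throws here, invalid XML does not

        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(null, xsdReader); // null: take the target namespace from the schema itself
        }
        schemas.Compile();
        doc.Schemas = schemas;

        // With a handler, validation errors are collected here instead of being thrown.
        var errors = new List<string>();
        doc.Validate((sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```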
I suspect that you are failing to distinguish between XML that is well-formed and XML that is valid.
A well-formed XML document satisfies all of the rules of the XML specification. If it does not, you should get a well-formedness error from any XML parser.
If you also choose to
a) supply an XSD that describes your XML document and
b) tell your XML processor to validate against that XSD
then the XML processor will also check that the document satisfies the rules in the XML schema (an XML Schema is composed of one or more XSDs).
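To make the distinction concrete in .NET terms, here is a sketch (Classify is a hypothetical helper): a well-formedness problem surfaces as an XmlException, while a schema violation surfaces as an XmlSchemaValidationException, which does not derive from XmlException.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class XmlCheck
{
    // Hypothetical helper: reads the whole document through a validating reader
    // and reports which kind of rule, if any, it broke.
    public static string Classify(string xml, string xsd)
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, xsdReader);
        }
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return "valid";
        }
        catch (XmlSchemaValidationException e)
        {
            return "well-formed but invalid: " + e.Message; // broke a rule of the XSD
        }
        catch (XmlException e)
        {
            return "not well-formed: " + e.Message; // broke a rule of the XML specification
        }
    }
}
```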
If you are still not sure, edit your question and supply the error message(s) that you are seeing. You don't need to include any confidential information - the error template is enough to tell which kind of error it is.
I have a library with a method that parses XML from a supplied XmlReader. The caller passes me an XmlReader instance (or an instance of any derived class), but I need to make sure whitespace is ignored. I.e. I want to do this:
xmlReader.Settings.IgnoreWhitespace = true;
// Then do my parsing
// Finally, revert to whatever state xmlReader.Settings had prior to calling my method
However, if the caller didn't supply an XmlReaderSettings when creating the XmlReader instance, I don't see how I can fix this myself.
For instance, if the caller used this code:
XmlReader reader = new XmlTextReader(File.OpenRead("file.xml"));
reader.Settings will remain null. This property is read-only so I can't assign it.
I'm not responsible for the caller and I don't force them to use this or that way of getting XmlReader instance and configuring it. I know XmlTextReader is deprecated but it's still available in .NET 4.6 and folks can use it.
Does this mean there is no way to work around this in my library and it's the caller who must supply me already well-configured XmlReader?
You can wrap the provided XmlReader into a new one using XmlReader.Create():
public void ReadMyXml(XmlReader reader)
{
XmlReaderSettings settings = reader.Settings?.Clone() ?? new XmlReaderSettings(); // reader.Settings is read-only, so modify a copy
settings.IgnoreWhitespace = true;
settings.CloseInput = false;
using(XmlReader myReader = XmlReader.Create(reader, settings))
{
// use myReader to read the xml
}
}
Set settings.CloseInput = false if you want to avoid closing the original reader at the end (thanks to Jon Hanna for the comment).
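Note that when the caller's reader was created by XmlReader.Create, reader.Settings returns a read-only XmlReaderSettings, so it should be copied with Clone() before being modified. The sketch below does that and counts the whitespace nodes seen through the wrapper, for both an XmlTextReader (Settings is null) and an XmlReader.Create reader; CountWhitespaceNodes is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Xml;

static class WhitespaceDemo
{
    // Hypothetical helper: reads through a wrapper with IgnoreWhitespace = true and
    // counts the whitespace nodes that still get through (expected: none).
    public static int CountWhitespaceNodes(XmlReader callerReader)
    {
        XmlReaderSettings settings = callerReader.Settings?.Clone() ?? new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        settings.CloseInput = false; // leave the caller's reader open
        int count = 0;
        using (XmlReader myReader = XmlReader.Create(callerReader, settings))
        {
            while (myReader.Read())
            {
                if (myReader.NodeType == XmlNodeType.Whitespace) count++;
            }
        }
        return count;
    }
}
```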
I'm getting an error that I can't solve at the moment.
The code:
var myXslTrans = new XslCompiledTransform();
myXslTrans.Load(stylesheet);
myXslTrans.Transform(sourceFile , outputFile);
The error:
For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.
I have tried setting DtdProcessing to DtdProcessing.Parse on XmlReaderSettings, but I couldn't get it to work.
If the stylesheet document uses a DTD (e.g. has <!DOCTYPE xsl:stylesheet ...>) then load it with
using (XmlReader xr = XmlReader.Create(stylesheet, new XmlReaderSettings() { DtdProcessing = DtdProcessing.Parse }))
{
myXslTrans.Load(xr);
}
If the sourceFile uses a DTD, load it with such an XmlReader and pass that reader as the first argument to the Transform method; you might then need a different overload of that method for the remaining arguments (for example Transform(XmlReader, XmlWriter)).
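Putting both cases together, a sketch (TransformWithDtds is a hypothetical name) that reads the stylesheet and the source document through DTD-enabled readers and writes the result through an XmlWriter created from the stylesheet's own xsl:output settings (XslCompiledTransform.OutputSettings):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class DtdTransform
{
    // Hypothetical helper: both the stylesheet and the source may contain a DOCTYPE.
    public static void TransformWithDtds(string stylesheetPath, string sourcePath, string outputPath)
    {
        var dtdSettings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        var transform = new XslCompiledTransform();
        using (XmlReader xsl = XmlReader.Create(stylesheetPath, dtdSettings))
        {
            transform.Load(xsl);
        }

        using (XmlReader source = XmlReader.Create(sourcePath, dtdSettings))
        using (XmlWriter output = XmlWriter.Create(outputPath, transform.OutputSettings))
        {
            transform.Transform(source, output); // the XmlReader/XmlWriter overload
        }
    }
}
```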
I recently uncovered a memory leak in an application I maintain for work, and I'm confused as to why the code produces a leak. I've pulled out the relevant code (with slight modifications) and provided it below.
In our application, a given XML document could validate against one or more available schema files. Each schema file corresponds to a different version of the XML document as it has changed over time. We only care that the XML document validates against at least one schema. Each schema completely describes the contents of the XML document (they are not nested schema files).
According to the ANTS memory profiler, it looks like the XmlDocument object is hoarding references to the previous schemas, even after the schema set has been cleared. Commenting out the call to Validate(), leaving everything else the same, stops the leak.
I fixed the leak in our application by loading the schemas once at application initialization time, and swapping out which schema file is associated with the XML document until we find one that validates.
The code below produces the memory leak, and I'm not sure why.
class Program
{
private static XmlDocument xmlDocument_ = new XmlDocument();
static void Main(string[] args)
{
using (StreamReader reader = new StreamReader("contents.xml"))
{
xmlDocument_.LoadXml(reader.ReadToEnd());
}
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings();
xmlReaderSettings.CloseInput = true;
while (true)
{
xmlDocument_.Schemas = new XmlSchemaSet();
XmlReader xmlReader = XmlReader.Create("schema.xsd", xmlReaderSettings);
xmlDocument_.Schemas.Add(XmlSchema.Read(xmlReader, null));
xmlReader.Close();
xmlDocument_.Validate(null);
}
}
}
You have the memory leak because your XmlDocument reference is static and because of the SchemaInfo property, which is populated when you validate your XML. Since those properties hold references to objects from your compiled XSDs, you'll have those around for as long as you have the XmlDocument around, which could be quite a while (since it is static).
Some may argue whether this really is a leak: validating another XML document against another set of XSDs will release the previously held resources.
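The fix described in the question (load each schema once, then try them in turn until the document validates) could look roughly like this. MultiVersionValidator and its methods are hypothetical names, and any validation event, error or warning, is treated as a failed match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class MultiVersionValidator
{
    // Compile one XmlSchemaSet per schema version, once (e.g. at application start-up).
    public static List<XmlSchemaSet> LoadSchemaSets(IEnumerable<string> xsdTexts)
    {
        var sets = new List<XmlSchemaSet>();
        foreach (string xsd in xsdTexts)
        {
            var set = new XmlSchemaSet();
            using (var reader = XmlReader.Create(new StringReader(xsd)))
            {
                set.Add(null, reader);
            }
            set.Compile();
            sets.Add(set);
        }
        return sets;
    }

    // Returns the index of the first schema set the document validates against, or -1.
    public static int FindMatchingSchema(XmlDocument doc, IList<XmlSchemaSet> sets)
    {
        for (int i = 0; i < sets.Count; i++)
        {
            bool valid = true;
            doc.Schemas = sets[i]; // reuse the already-compiled set instead of re-reading the XSD
            doc.Validate((sender, e) => { valid = false; });
            if (valid) return i;
        }
        return -1;
    }
}
```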
Try changing the while statement as below. I haven't tested this, but it differs from the original code in that every iteration of the loop disposes of the XmlReader.
The GC does not call Dispose for you; XmlReader implements IDisposable, so code that uses an XmlReader should dispose of it deterministically (garbage collection is non-deterministic). Even if finalization did eventually release the readers, if the loop iterates thousands of times before the GC gets to them, the memory used could bring the system down anyway.
while (true)
{
xmlDocument_.Schemas = new XmlSchemaSet();
using (XmlReader xmlReader = XmlReader.Create("schema.xsd", xmlReaderSettings))
{
xmlDocument_.Schemas.Add(XmlSchema.Read(xmlReader, null));
}
xmlDocument_.Validate(null);
}
EDIT:
I read the MSDN page on XmlDocument.Validate, which provides a code sample that does this differently, using XmlReaderSettings to set validation options. Also, code in the OP assumes that the XML file is always encoded as UTF-8. Here's a rewrite that detects the text encoding and is based on the MSDN sample; this may fix the memory leak. This code is untested.
class Program
{
private static XmlDocument xmlDocument_ = new XmlDocument();
static void Main(string[] args)
{
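XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.CloseInput = true;
// XmlReader detects the file's encoding from its BOM or XML declaration.
using (XmlReader reader = XmlReader.Create("contents.xml", settings))
{
xmlDocument_.Load(reader);
}
while (true)
{
// XmlDocument.Validate uses the document's own Schemas property;
// changing settings.Schemas after the reader was created has no effect on it.
xmlDocument_.Schemas = new XmlSchemaSet();
xmlDocument_.Schemas.Add(null, "schema.xsd");
xmlDocument_.Validate(null);
}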
}
}
You could try ILDASM to see what's inside XmlDocument.Validate.
I have a problem with XslCompiledTransform class.
If I tried to run this code:
string pathToXsltFile, pathToInputFile, pathToOutputFile;
XsltSettings xsltSettings = new XsltSettings(true, true);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
XmlTextReader reader = new XmlTextReader(pathToXsltFile);
myXslTransform.Load(reader, xsltSettings, new XmlUrlResolver());
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
It works fine.
But if I want to create XmlTextReader from a string (text):
MemoryStream mStrm = new System.IO.MemoryStream(Encoding.UTF8.GetBytes(text));
XmlTextReader xmlReader = new XmlTextReader(mStrm);
mStrm.Position = 0;
And try to run:
myXslTransform.Load(xmlReader, xsltSettings, new XmlUrlResolver());
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
I get an exception:
"this operation is not supported for a relative uri"
For various reasons I don't want to create a temporary file and then create the XmlTextReader from its path.
Edit:
Full exception message:
"An error occurred while loading document ''.
See InnerException for a complete description of the error."
InnerException.Message:
"This operation is not supported for a relative URI."
Stack trace:
at System.Xml.Xsl.Runtime.XmlQueryContext.GetDataSource(String uriRelative, String uriBase)
at <xsl:template match=\"gmgml:FeatureCollection\">(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime, XPathNavigator {urn:schemas-microsoft-com:xslt-debug}current)
at <xsl:apply-templates>(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime, XPathNavigator )
at Root(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime)
at Execute(XmlQueryRuntime {urn:schemas-microsoft-com:xslt-debug}runtime)
at System.Xml.Xsl.XmlILCommand.Execute(Object defaultDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlSequenceWriter results)
at System.Xml.Xsl.XmlILCommand.Execute(Object defaultDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlWriter writer, Boolean closeWriter)
at System.Xml.Xsl.XmlILCommand.Execute(XmlReader contextDocument, XmlResolver dataSources, XsltArgumentList argumentList, Stream results)
at System.Xml.Xsl.XslCompiledTransform.Transform(String inputUri, String resultsFile)
at MyNamespace.ApplyXslTransformation1(String input, String output, String xsltFileName)
the statement causing the exception:
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
I will have to ask about the document function tomorrow; I got the XSLT file from another person.
When I created the XmlTextReader from the path to the XSLT file, everything was fine. I also tried:
myXslTransform.Load(pathToXsltFile, xsltSettings, new XmlUrlResolver());
myXslTransform.Transform(pathToInputFile, pathToOutputFile);
And it was also fine.
Now I receive the XSLT encrypted. I decrypt it and want to create the XmlTextReader from the decrypted string. For security reasons I don't want to create a temporary decrypted XSLT file.
I think we need to see the XSLT and any calls to the document function it does. In general you need to be aware that the document function has a second argument that can serve as a base URI to resolve URIs resulting from the first argument. Without the second argument being passed in as in e.g. <xsl:value-of select="document('foo.xml')"/> the stylesheet code itself provides the base URI. If you load the stylesheet code from a string that mechanism might not resolve URIs the same way as it happens with a stylesheet loaded from the file system or a HTTP URI. The solution to that problem depends on the location of the resource you want to load and how that relates to the main input file. If you want to load foo.xml from the same location as the main input document then doing document('foo.xml', /) instead of document('foo.xml') should work.
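When the stylesheet only exists as a string (for example after decrypting it in memory), one workaround worth trying is to give the reader an explicit base URI through the XmlReader.Create(TextReader, XmlReaderSettings, string baseUri) overload, so that relative document() calls resolve as if the stylesheet had been loaded from that location. A sketch, assuming a hypothetical InMemoryXslt helper and that the referenced files sit next to the chosen base URI; no file needs to exist at the base URI itself.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class InMemoryXslt
{
    // Hypothetical helper: compiles a stylesheet held in a string, giving it a base URI
    // so that relative document('...') calls have something to resolve against.
    public static XslCompiledTransform Load(string xsltText, string baseUri)
    {
        var transform = new XslCompiledTransform();
        using (var text = new StringReader(xsltText))
        using (var reader = XmlReader.Create(text, new XmlReaderSettings(), baseUri))
        {
            // enableDocumentFunction: true, enableScript: false
            transform.Load(reader, new XsltSettings(true, false), new XmlUrlResolver());
        }
        return transform;
    }

    // Runs the transform with an explicit resolver for document() at run time.
    public static string Apply(XslCompiledTransform transform, string inputXml)
    {
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            using (var writer = XmlWriter.Create(output, transform.OutputSettings))
            {
                transform.Transform(input, null, writer, new XmlUrlResolver());
            }
            return output.ToString();
        }
    }
}
```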
I think this is caused by your manual setting of the memory stream's position to 0; you're confusing the XmlTextReader.
I tried the above and it works fine for me when I comment that line out.
Is there a particular reason you are setting it to 0?
Assuming this question is about using XslCompiledTransform in a .NET Core application, I found the answer to "This operation is not supported for a relative URI." at https://github.com/dotnet/corefx/issues/31390
The relevant answer (by vcsjones, commented on Jul 26, 2018) is:
"I believe you are running in to a known compatibility change. .NET Core does not allow resolving external URIs for XML by default and is documented here.
As the documentation says, the old behavior can be restored, if you so choose, by putting
AppContext.SetSwitch("Switch.System.Xml.AllowDefaultResolver", true);
In your application. Try placing that at the top of your example program."
When I added
AppContext.SetSwitch("Switch.System.Xml.AllowDefaultResolver", true);
as the top line of
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
in Startup, the error "This operation is not supported for a relative URI" went away. A new error then occurred when calling Load with an XmlReader, related to finding the other files referenced by the XSL file. When I instead passed the file path of the XSL file to Load, everything worked as expected:
var resolver = new XmlUrlResolver {Credentials = CredentialCache.DefaultCredentials};
var transform = new XslCompiledTransform();
transform.Load(XslPath, new XsltSettings(true, true), resolver);
var settings = new XmlWriterSettings {OmitXmlDeclaration = true};
using (var results = new StringWriter())
{
using (var writer = XmlWriter.Create(results, settings))
using (var reader = XmlReader.Create(new StringReader(document)))
{
transform.Transform(reader, writer);
}
// Read the text only after the writer has been disposed (and therefore flushed).
return results.ToString();
}
I'm adding this in the hope that it helps someone else debugging why XslCompiledTransform throws "This operation is not supported for a relative URI." in .NET Core.