I am trying to validate XML using an online XSD. Here is my current code for my controller:
using System;
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;
using Microsoft.AspNetCore.Http; // IFormFile
using Microsoft.AspNetCore.Mvc;
namespace EINV.API.Controllers
{
[Route("api/[controller]")]
[ApiController]
public class XmlController : Controller
{
[HttpPost]
public IActionResult ValidateXml2(IFormFile xmlFile, string xsdUrl)
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = new XmlXsdResolver(); // Need this for resolving include and import
settings.ValidationType = ValidationType.Schema; // This might not be needed, I am using same settings to validate the input xml
//settings.DtdProcessing = DtdProcessing.Parse; // I have an include that is dtd. maybe I should prohibit dtd after I compile the xsd files.
settings.Schemas.Add(null, xsdUrl); // https://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd
settings.Schemas.Compile();
settings.ValidationType = ValidationType.Schema;
XmlReader reader = XmlReader.Create(xmlFile.OpenReadStream(), settings, "https://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/");
XmlDocument document = new XmlDocument();
document.Load(reader);
ValidationEventHandler eventHandler = new ValidationEventHandler(ValidationEventHandler);
// the following call to Validate succeeds.
document.Validate(eventHandler);
// Load the XML file into an XmlDocument
return Ok();
}
protected class XmlXsdResolver : XmlUrlResolver
{
public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
return base.GetEntity(absoluteUri, role, ofObjectToReturn);
}
}
private void ValidationEventHandler(object? sender, ValidationEventArgs? e)
{
if (e?.Severity == XmlSeverityType.Error)
{
throw new Exception("XML validation error: " + e.Message);
}
}
}
}
I have referenced several other posts in trying to resolve this, such as the following:
How can I resolve the schemaLocation attribute of an .XSD when all of my .XSD's are stored as resources?
Compiling two embedded XSDs: error "Cannot resolve 'schemaLocation' attribute
Validating xml against an xsd that has include and import in c#
But I always end up with the same error:
System.Xml.Schema.XmlSchemaValidationException: 'The 'urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2:UBLExtensions' element is not declared.'
The XML that I am using, which I downloaded into a file and upload through my SWAGGER when calling the controller, is located here: https://docs.oasis-open.org/ubl/os-UBL-2.1/xml/UBL-Invoice-2.1-Example.xml
The XSD that I am using is located here: https://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd
I think you need to set settings.Schemas.XmlResolver = new XmlUrlResolver(); as well as the flag settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation; and do both before you add the schema.
That might only get you a little further, as I think some imported schemas (e.g. for signatures) will still not be found. So in the end you will need to make sure you have local copies of those schemas and have your resolver use the local copies.
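A minimal sketch of the "resolver that uses local copies" idea, in case it helps. LocalCopyResolver and SchemaLoader are names I made up for illustration, and the example.test URLs are placeholders; the local copies are kept in a dictionary here, but they could just as well come from files or embedded resources. The resolver is assigned before the schema is added so that imports and includes are followed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Hypothetical resolver: answers known schema URLs from local copies
// and falls back to the normal XmlUrlResolver behaviour for everything else.
public sealed class LocalCopyResolver : XmlUrlResolver
{
    private readonly IDictionary<string, string> _localCopies;

    public LocalCopyResolver(IDictionary<string, string> localCopies)
    {
        _localCopies = localCopies;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (_localCopies.TryGetValue(absoluteUri.AbsoluteUri, out string xsd))
        {
            // Serve the local copy as a stream; no network access happens.
            return new MemoryStream(Encoding.UTF8.GetBytes(xsd));
        }
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

public static class SchemaLoader
{
    // Builds a compiled schema set from the text of the main schema.
    // mainUrl is the base URI used to resolve relative schemaLocation values.
    public static XmlSchemaSet Load(string mainXsd, string mainUrl, XmlResolver resolver)
    {
        // Assign the resolver before Add, otherwise imports are not followed.
        var set = new XmlSchemaSet { XmlResolver = resolver };
        using (var reader = XmlReader.Create(new StringReader(mainXsd), null, mainUrl))
        {
            set.Add(null, reader);
        }
        set.Compile();
        return set;
    }
}
```

The resulting set can then be assigned to settings.Schemas of the XmlReaderSettings used to read the uploaded document.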
I have a similar problem to "C# - Validating xml file against local .xsd security issues", but security is not my main concern. I'm hoping to protect my schema files against a "stupid user" more than against an actual attacker.
Is there a possibility to pack my xsd-files into a dll at compile time and use it from there during runtime (instead of just reading a text file from the file system)?
If it would be inside a dll the "stupid user" wouldn't be able to just edit the files by accident and for an attacker we could even go further and protect the dll with strong-naming and digital signatures.
internal class XmlValidator : Validator
{
    private static XmlSchemaSet _schemas;

    /// <summary>
    /// Initializes a new instance of the <see cref="XmlValidator"/> class.
    /// </summary>
    internal XmlValidator()
    {
        string location = System.Reflection.Assembly.GetExecutingAssembly().Location;
        string path;
        // Assumed condition: the original check was lost when the code was pasted.
        if (!string.IsNullOrEmpty(location))
        {
            path = Path.GetDirectoryName(location);
        }
        else
        {
            path = @".\";
        }
        // Add schemas
        _schemas = new XmlSchemaSet();
        _schemas.Add("http://myschema/schema1", Path.Combine(path, "Schemas", "schema-v1.0.xsd"));
        _schemas.Add("http://myschema/schema11", Path.Combine(path, "Schemas", "schema-v1.1.xsd"));
    }
}
So instead of reading them directly from the file system during initialization I would like to read them as some kind of resource.
Something similar to translation files: created at compile time and unchangeable at runtime.
Sure, this is possible. I protect my schemas the same way.
First, declare them as Embedded Resource (set each file's Build Action to Embedded Resource).
Then use them in code:
public void LoadXsd()
{
string resourceName = "DefaultNamespace.specs.info.IErrorInfo.xsd";
Assembly assembly = Assembly.GetExecutingAssembly();
XmlSchema xsd = XmlSchema.Read(assembly.GetManifestResourceStream(resourceName), _XsdSchema_ValidationEventHandler);
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add(xsd);
}
private void _XsdSchema_ValidationEventHandler(object sender, ValidationEventArgs e)
{
_Logger.Error($"XSD validation error: {e.Message}");
}
It would be also possible to load them all at once:
public void LoadAllXsdFiles()
{
XmlSchemaSet schemaSet = new XmlSchemaSet();
var assembly = Assembly.GetExecutingAssembly();
var allXsdFiles = assembly.GetManifestResourceNames().Where(r => r.EndsWith(".xsd"));
foreach (string xsdFile in allXsdFiles)
{
XmlSchema xsd = XmlSchema.Read(assembly.GetManifestResourceStream(xsdFile), _XsdSchema_ValidationEventHandler);
schemaSet.Add(xsd);
}
}
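One thing that tends to go wrong with the snippets above is the resource name: GetManifestResourceStream returns null when the name does not match exactly, and by default the name is built from the project's root namespace, the folder path and the file name. Here is a small sketch of a lookup by file name that fails with a readable message instead. EmbeddedSchemas, FindResourceName and Open are hypothetical helper names, not framework APIs.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;

public static class EmbeddedSchemas
{
    // Returns the full manifest name of the resource whose name ends with
    // ".<fileName>" (or equals fileName), or null when nothing matches.
    public static string FindResourceName(Assembly assembly, string fileName)
    {
        return assembly.GetManifestResourceNames()
            .FirstOrDefault(n =>
                n.EndsWith("." + fileName, StringComparison.OrdinalIgnoreCase) ||
                n.Equals(fileName, StringComparison.OrdinalIgnoreCase));
    }

    // Opens the embedded file, or throws with a list of what is actually embedded.
    public static Stream Open(Assembly assembly, string fileName)
    {
        string name = FindResourceName(assembly, fileName);
        if (name == null)
        {
            throw new FileNotFoundException(
                "No embedded resource ends with '" + fileName + "'. Embedded: " +
                string.Join(", ", assembly.GetManifestResourceNames()));
        }
        return assembly.GetManifestResourceStream(name);
    }
}
```

The returned stream can be passed to XmlSchema.Read exactly as in LoadXsd above.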
I'm trying to create an XmlSchemaSet against the SAML 2.0 set of schema definitions, starting with the protocol schema here: https://docs.oasis-open.org/security/saml/v2.0/saml-schema-protocol-2.0.xsd
var set = new XmlSchemaSet();
XmlSchema schema;
using (var reader = XmlReader.Create(
"https://docs.oasis-open.org/security/saml/v2.0/saml-schema-protocol-2.0.xsd"))
{
schema = XmlSchema.Read(reader, (sender, e) => Console.WriteLine(e.Message));
}
set.Add(schema);
set.Compile();
When Compile is called, the following exception is thrown:
System.Xml.Schema.XmlSchemaException
Type 'urn:oasis:names:tc:SAML:2.0:assertion:EncryptedElementType' is not declared.
at System.Xml.Schema.XmlSchemaSet.InternalValidationCallback(Object sender, ValidationEventArgs e)
at System.Xml.Schema.BaseProcessor.SendValidationEvent(XmlSchemaException e, XmlSeverityType severity)
at System.Xml.Schema.BaseProcessor.SendValidationEvent(XmlSchemaException e)
at System.Xml.Schema.Compiler.CompileElement(XmlSchemaElement xe)
at System.Xml.Schema.Compiler.Compile()
at System.Xml.Schema.Compiler.Execute(XmlSchemaSet schemaSet, SchemaInfo schemaCompiledInfo)
at System.Xml.Schema.XmlSchemaSet.Compile()
at XSD.Program.Main(String[] args)
The type specified urn:oasis:names:tc:SAML:2.0:assertion:EncryptedElementType appears in the namespace imported at the top of the schema:
<import
namespace="urn:oasis:names:tc:SAML:2.0:assertion"
schemaLocation="saml-schema-assertion-2.0.xsd"/>
Using Fiddler, I can't see the application making any attempts at retrieving the imported schema.
Why don't these import statements appear to be working with the XmlSchemaSet?
The default behaviour of the XmlSchemaSet is to not try to resolve any external schemas. To enable this, the XmlResolver property must be set. The go-to resolver implementation is XmlUrlResolver:
set.XmlResolver = new XmlUrlResolver();
The important thing is to set this property before adding any schemas to the set. The call to Add performs "pre-processing" on the schema, which includes resolving any import statements. Assigning the XmlResolver after calling Add appears to have no effect.
The application code needs to be:
var set = new XmlSchemaSet
{
// Enable resolving of external schemas.
XmlResolver = new XmlUrlResolver()
};
XmlSchema schema;
using (var reader = XmlReader.Create(
"https://docs.oasis-open.org/security/saml/v2.0/saml-schema-protocol-2.0.xsd"))
{
schema = XmlSchema.Read(reader, (sender, e) => Console.WriteLine(e.Message));
}
set.Add(schema);
set.Compile();
NOTE The above code still does not actually produce the desired result due to problems loading the schemas from w3.org, however the imported SAML schema is resolved successfully.
I am using the .NET XmlSerializer class to deserialize GPX files.
There are two versions of the GPX standard:
<gpx xmlns="http://www.topografix.com/GPX/1/0"> ... </gpx>
<gpx xmlns="http://www.topografix.com/GPX/1/1"> ... </gpx>
Also, some GPX files do not specify a default namespace:
<gpx> ... </gpx>
My code needs to handle all three cases, but I can't work out how to get XmlSerializer to do it.
I am sure there must be a simple solution because this is a common scenario; for example, KML has the same issue.
I have done something similar to this a few times before, and this might be of use to you if you only have to deal with a small number of namespaces and you know them all beforehand. Create a simple inheritance hierarchy of classes, and add attributes to the different classes for the different namespaces. See the following code sample. If you run this program it gives the output:
Deserialized, type=XmlSerializerExample.GpxV1, data=1
Deserialized, type=XmlSerializerExample.GpxV2, data=2
Deserialized, type=XmlSerializerExample.Gpx, data=3
Here is the code:
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;
[XmlRoot("gpx")]
public class Gpx {
[XmlElement("data")] public int Data;
}
[XmlRoot("gpx", Namespace = "http://www.topografix.com/GPX/1/0")]
public class GpxV1 : Gpx {}
[XmlRoot("gpx", Namespace = "http://www.topografix.com/GPX/1/1")]
public class GpxV2 : Gpx {}
internal class Program {
private static void Main() {
var xmlExamples = new[] {
"<gpx xmlns='http://www.topografix.com/GPX/1/0'><data>1</data></gpx>",
"<gpx xmlns='http://www.topografix.com/GPX/1/1'><data>2</data></gpx>",
"<gpx><data>3</data></gpx>",
};
var serializers = new[] {
new XmlSerializer(typeof (Gpx)),
new XmlSerializer(typeof (GpxV1)),
new XmlSerializer(typeof (GpxV2)),
};
foreach (var xml in xmlExamples) {
var textReader = new StringReader(xml);
var xmlReader = XmlReader.Create(textReader);
foreach (var serializer in serializers) {
if (serializer.CanDeserialize(xmlReader)) {
var gpx = (Gpx)serializer.Deserialize(xmlReader);
Console.WriteLine("Deserialized, type={0}, data={1}", gpx.GetType(), gpx.Data);
}
}
}
}
}
Here's the solution I came up with before the other suggestions came through:
var settings = new XmlReaderSettings();
settings.IgnoreComments = true;
settings.IgnoreProcessingInstructions = true;
settings.IgnoreWhitespace = true;
using (var reader = XmlReader.Create(filePath, settings))
{
if (reader.IsStartElement("gpx"))
{
string defaultNamespace = reader["xmlns"];
XmlSerializer serializer = new XmlSerializer(typeof(Gpx), defaultNamespace);
gpx = (Gpx)serializer.Deserialize(reader);
}
}
This example accepts any namespace, but you could easily make it filter for a specific list of known namespaces.
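A rough sketch of that filtering, assuming the three namespaces from the question are the only ones you want to accept. GpxFile is a cut-down stand-in for the real Gpx class, and GpxLoader is a made-up name. It reads the namespace from reader.NamespaceURI rather than the xmlns attribute, so it also works when the root element uses a prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the real GPX class.
[XmlRoot("gpx")]
public class GpxFile
{
    [XmlElement("data")] public int Data;
}

public static class GpxLoader
{
    // Namespaces this loader accepts; "" covers files without a default namespace.
    private static readonly HashSet<string> KnownNamespaces = new HashSet<string>
    {
        "",
        "http://www.topografix.com/GPX/1/0",
        "http://www.topografix.com/GPX/1/1",
    };

    public static GpxFile Load(TextReader input)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent(); // position on the root element
            string ns = reader.NamespaceURI;
            if (reader.LocalName != "gpx" || !KnownNamespaces.Contains(ns))
            {
                throw new InvalidDataException(
                    "Unexpected root element or namespace: '" + ns + "'");
            }
            // Use the namespace found in the file as the serializer's default namespace.
            var serializer = new XmlSerializer(typeof(GpxFile), ns);
            return (GpxFile)serializer.Deserialize(reader);
        }
    }
}
```

Usage would be something like GpxLoader.Load(File.OpenText(filePath)).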
Oddly enough you can't solve this nicely. Have a look at the deserialize section in this troubleshooting article. Especially where it states:
Only a few error conditions lead to exceptions during the deserialization process. The most common ones are:
• The name of the root element or its namespace did not match the expected name.
...
The workaround I use for this is to set the first namespace, try/catch the deserialize operation and if it fails because of the namespace I try it with the next one. Only if all namespace options fail do I throw the error.
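Roughly, the workaround looks like the sketch below. GpxDocument and GpxFallbackLoader are illustrative names, and the namespace list is taken from the question. Note that InvalidOperationException is also what XmlSerializer throws for other problems in the document, so this retries in those cases too and only reports the last failure.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the real GPX class.
[XmlRoot("gpx")]
public class GpxDocument
{
    [XmlElement("data")] public int Data;
}

public static class GpxFallbackLoader
{
    // Namespaces to try, in order; "" means "no namespace".
    private static readonly string[] CandidateNamespaces =
    {
        "http://www.topografix.com/GPX/1/1",
        "http://www.topografix.com/GPX/1/0",
        "",
    };

    public static GpxDocument Load(string xml)
    {
        InvalidOperationException lastError = null;
        foreach (string ns in CandidateNamespaces)
        {
            try
            {
                var serializer = new XmlSerializer(typeof(GpxDocument), ns);
                using (var reader = new StringReader(xml))
                {
                    return (GpxDocument)serializer.Deserialize(reader);
                }
            }
            catch (InvalidOperationException ex)
            {
                // Root element or namespace did not match; try the next candidate.
                lastError = ex;
            }
        }
        // Every candidate failed: surface the last error.
        throw new InvalidOperationException("No candidate namespace matched.", lastError);
    }
}
```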
From a really strict point of view you can argue that this behavior is correct, since the type you deserialize to should represent a specific schema/namespace, so it makes no sense for it to also read data from another schema/namespace. In practice this is utterly annoying, though. File extensions rarely change when versions change, so the only way to tell whether a .gpx file is version 1.0 or 1.1 is to read the XML contents, but XmlSerializer won't do that unless you tell it up front which version to expect.
I don't want to do anything fancy, I just want to make sure a document is valid, and print an error message if it is not. Google pointed me to this, but it seems XmlValidatingReader is obsolete (at least, that's what MonoDevelop tells me).
Edit: I'm trying Mehrdad's tip, but I'm having trouble. I think I've got most of it, but I can't find OnValidationEvent anywhere. Where do I get OnValidationEvent from?
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.DTD;
settings.ValidationEventHandler += new ValidationEventHandler(/*trouble is here*/);
XmlReader validatingReader = XmlReader.Create(fileToLoad, settings);
Instead of instantiating the XmlValidatingReader class directly, you should construct an appropriate XmlReaderSettings object and pass it as an argument to the XmlReader.Create method:
var settings = new XmlReaderSettings { ValidationType = ValidationType.DTD };
settings.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);
var reader = XmlReader.Create("file.xml", settings);
The rest is unchanged.
P.S. OnValidationEvent is the name of the method you declare to handle validation events. Obviously, you can remove the line if you don't want to subscribe to validation events raised by the XmlReader.
var messages = new StringBuilder();
var settings = new XmlReaderSettings
{
    ValidationType = ValidationType.DTD,
    DtdProcessing = DtdProcessing.Parse // DTDs are prohibited by default
};
settings.ValidationEventHandler += (sender, args) => messages.AppendLine(args.Message);
using (var reader = XmlReader.Create("file.xml", settings))
{
    while (reader.Read()) { } // validation events are raised while the document is read
}
if (messages.Length > 0)
{
    // Log Validation Errors
    // Throw Exception
    // Etc.
}
ValidationEventHandler
Lambda Expressions
Type Inference
I've referred to this example on DTD validation.
https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmlreadersettings.dtdprocessing?view=netcore-3.1#input
This example has invalid DTD XML, which I've corrected as below.
<!--XML file using a DTD-->
<!DOCTYPE store [
<!ELEMENT store (item)*>
<!ELEMENT item (name,dept,price)>
<!ATTLIST item type CDATA #REQUIRED ISBN CDATA
#REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT dept (#PCDATA)>
<!ELEMENT price (#PCDATA)>]>
<store>
<item type="supplies" ISBN="2-3631-4">
<name>paint</name>
<dept>1</dept>
<price>16.95</price>
</item>
</store>
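For reference, a short sketch of how a document like the one above can be checked against its inline DTD. DtdValidator is a name I picked for the example; the XML is passed in as a string to keep it self-contained. DtdProcessing has to be set to Parse because the default is Prohibit, and the reader has to be read to the end for the validation events to fire.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class DtdValidator
{
    // Returns one message per validation event; an empty list means the
    // document is valid against its DTD.
    public static List<string> Validate(string xml)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,  // the default is Prohibit
            ValidationType = ValidationType.DTD,
        };
        settings.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Validation happens as the reader advances through the document.
            while (reader.Read()) { }
        }
        return messages;
    }
}
```

Calling DtdValidator.Validate with the corrected XML should return no messages, while removing a required child such as price from an item should produce at least one.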
full description:
In Visual Studio .NET, create a new Visual C# Console Application
project named ValidateXml. Add two using statements to the beginning
of Class1.cs as follows:
using System.Xml; // for XmlTextReader and XmlValidatingReader
using System.Xml.Schema; // for XmlSchemaCollection (which is used later)
In Class1.cs, declare a boolean variable named isValid before the
start of the Main method as follows:
private static bool isValid = true; // If a validation error occurs,
// set this flag to false in the
// validation event handler.
Create an XmlTextReader object to read an XML document from a text
file in the Main method, and then create an XmlValidatingReader to
validate this XML data as follows:
XmlTextReader r = new XmlTextReader("C:\\MyFolder\\ProductWithDTD.xml");
XmlValidatingReader v = new XmlValidatingReader(r);
The ValidationType property of the XmlValidatingReader object
indicates the type of validation that is required (DTD, XDR, or
Schema). Set this property to DTD as follows:
v.ValidationType = ValidationType.DTD;
If any validation errors occur, the validating reader generates a
validation event. Add the following code to register a validation
event handler (you will implement the MyValidationEventHandler
method in Step 7):
v.ValidationEventHandler +=
new ValidationEventHandler(MyValidationEventHandler);
Add the following code to read and validate the XML document. If any
validation errors occur, MyValidationEventHandler is called to
address the error. This method sets isValid to false (see Step 8).
You can check the status of isValid after validation to see if the
document is valid or invalid.
while (v.Read())
{
// Can add code here to process the content.
}
v.Close();
// Check whether the document is valid or invalid.
if (isValid)
Console.WriteLine("Document is valid");
else
Console.WriteLine("Document is invalid");
Write the MyValidationEventHandler method after the Main method as
follows:
public static void MyValidationEventHandler(object sender,
ValidationEventArgs args)
{
isValid = false;
Console.WriteLine("Validation event\n" + args.Message);
}
Build and run the application. The application should report that the XML document is valid.
e.g.:
In Visual Studio .NET, modify ProductWithDTD.xml to invalidate it (for example, delete the <AuthorName>M soliman</AuthorName> element).
Run the application again. You should receive the following error message:
Validation event
Element 'Product' has invalid content. Expected 'ProductName'.
An error occurred at file:///C:/MyFolder/ProductWithDTD.xml(4, 5).
Document is invalid
The HR-XML 3.0 spec provides WSDLs to generate their entities. I'm trying to deserialize the example XML from their documentation, but it's not working.
Candidate.CandidateType candidate = null;
string path = "c:\\candidate.xml";
XmlSerializer serializer = new XmlSerializer(typeof(Candidate.CandidateType), "http://www.hr-xml.org/3");
StreamReader reader = null;
reader = new StreamReader(path);
candidate = (Candidate.CandidateType)serializer.Deserialize(reader);
The error I'm getting:
"<Candidate xmlns='http://www.hr-xml.org/3'> was not expected."
Any suggestions?
Update: I tried XmlSerializing a CandidatePerson element and it looks like it uses CandidatePersonType instead of CandidatePerson. I think I'm doing something wrong here though...
first lines of Candidate.CandidateType (all auto-generated):
[System.CodeDom.Compiler.GeneratedCodeAttribute("System.Xml", "2.0.50727.3082")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://www.hr-xml.org/3")]
public partial class CandidateType {
private IdentifierType2 documentIDField;
private IdentifierType2[] alternateDocumentIDField;
private string documentSequenceField;
The following is more of a comment, but it's too long, so I'll put it here.
The CandidateType class is properly decorated with the XmlType attribute. That is an attribute that applies to types, and determines how the type will be emitted in any generated XML Schema. It has nothing to do with the namespace on an element that happens to have the same type.
Consider the following C# code:
public class CandidateType {}
public class Foo
{
CandidateType _candidate1;
CandidateType _candidate2;
}
Note that you can have multiple variables of the same type. In the same way, you could have:
<xs:element name="Candidate1" type="hrx:CandidateType"/>
<xs:element name="Candidate2" type="hrx:CandidateType"/>
These are two elements which will validate against the same type definition, but which are otherwise unrelated. If they are in the same XML Schema, then they will be in the same namespace. But what if they're not? Then you could have an instance document like:
<ns1:Candidate1 xmlns:ns1="namespace1" xmlns="http://www.hr-xml.org/3"> ... </ns1:Candidate1>
<ns2:Candidate2 xmlns:ns2="namespace2" xmlns="http://www.hr-xml.org/3"> ... </ns2:Candidate2>
What you need to do is specify the namespace of the Candidate element to the XML Serializer. The fact that the CandidateType type is in a particular namespace does not determine the namespace of the Candidate element.
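As a small illustration of that last point, here is a sketch with a cut-down, hypothetical CandidateType (the real generated class has many more members) and a made-up CandidateReader helper. The XmlRootAttribute tells the serializer both the name and the namespace of the root element, independently of the XmlType namespace on the class.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical, cut-down stand-in for the generated class.
[XmlType(Namespace = "http://www.hr-xml.org/3")]
public class CandidateType
{
    public string DocumentSequence;
}

public static class CandidateReader
{
    // Root element name and namespace are given explicitly; the XmlType
    // attribute on the class does not supply them.
    private static readonly XmlSerializer Serializer = new XmlSerializer(
        typeof(CandidateType),
        new XmlRootAttribute("Candidate") { Namespace = "http://www.hr-xml.org/3" });

    public static CandidateType Read(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (CandidateType)Serializer.Deserialize(reader);
        }
    }
}
```

The serializer is kept in a static field on purpose: as far as I know, only the XmlSerializer(Type) and XmlSerializer(Type, String) constructors reuse their generated serialization assemblies, so instances created with other overloads are best created once and reused.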
Muahaha I figured it out finally!
John Saunders was right, I needed to specify the default namespace in the XmlSerializer, but in addition to that I have to specify the XmlRootAttribute because the Class that I'm trying to de-serialize to does not have the same name as the root element.
Here is my code for de-serializing the HR-XML ProcessCandidate example:
protected static ImportTest.CandidateService.ProcessCandidateType DeserializeProcessCandidate(string path)
{
    // The root element is named "ProcessCandidate", not after the class.
    XmlRootAttribute root = new XmlRootAttribute("ProcessCandidate");
    XmlSerializer serializer = new XmlSerializer(typeof(CandidateService.ProcessCandidateType), new XmlAttributeOverrides(), new Type[0], root, "http://www.hr-xml.org/3");
    try
    {
        // using closes the reader on both the success and the failure path.
        using (StreamReader reader = new StreamReader(path))
        {
            return (CandidateService.ProcessCandidateType)serializer.Deserialize(reader);
        }
    }
    catch (Exception ex)
    {
        // InnerException can be null (e.g. file not found), so fall back to
        // the outer message and keep the original exception attached.
        throw new Exception(ex.InnerException?.Message ?? ex.Message, ex);
    }
}
Thanks for the help John!