Validate XML against XSD in a single method

I need to implement a C# method that validates an XML document against an external XSD and returns a Boolean result indicating whether it is valid.
public static bool IsValidXml(string xmlFilePath, string xsdFilePath);
I know how to validate using a callback. I would like to know if it can be done in a single method, without using a callback. I need this purely for cosmetic purposes: I need to validate up to a few dozen types of XML documents, so I would like to make it something as simple as the code below.
if (!XmlManager.IsValidXml(
    @"ProjectTypes\ProjectType17.xml",
    @"Schemas\Project.xsd"))
{
    throw new XmlFormatException(
        string.Format(
            "Xml '{0}' is invalid.",
            xmlFilePath));
}

There are a couple of options I can think of, depending on whether or not you want to use exceptions for non-exceptional events.
If you pass null as the validation callback delegate, most of the built-in validation methods will throw an exception if the XML does not validate against the schema, so you can simply catch the exception and return true/false depending on the situation.
public static bool IsValidXml(string xmlFilePath, string xsdFilePath, XNamespace namespaceName)
{
    var xdoc = XDocument.Load(xmlFilePath);
    var schemas = new XmlSchemaSet();
    // XmlSchemaSet.Add expects the target namespace as a string
    schemas.Add(namespaceName.NamespaceName, xsdFilePath);
    try
    {
        // With a null handler, the first validation error is thrown as an exception
        xdoc.Validate(schemas, null);
    }
    catch (XmlSchemaValidationException)
    {
        return false;
    }
    return true;
}
The other option that comes to mind pushes the limits of your "without using a callback" criterion. Instead of passing a pre-defined callback method, you could pass an anonymous method and use it to set a true/false return value.
public static bool IsValidXml(string xmlFilePath, string xsdFilePath, XNamespace namespaceName)
{
    var xdoc = XDocument.Load(xmlFilePath);
    var schemas = new XmlSchemaSet();
    // XmlSchemaSet.Add expects the target namespace as a string
    schemas.Add(namespaceName.NamespaceName, xsdFilePath);
    bool result = true;
    // The lambda runs once per validation event and flips the flag
    xdoc.Validate(schemas, (sender, e) =>
    {
        result = false;
    });
    return result;
}

Related

XML Validation against XSD always returns true

I have a C# script that validates an XML document against an XSD document, as follows:
static bool IsValidXml(string xmlFilePath, string xsdFilePath)
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add(null, xsdFilePath);
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Compile();
try
{
XmlReader xmlRead = XmlReader.Create(xmlFilePath, settings);
while (xmlRead.Read())
{ };
xmlRead.Close();
}
catch (Exception e)
{
return false;
}
return true;
}
I've compiled this after looking at a number of MSDN articles and questions here where this is the solution. It does correctly validate that the XSD is formed well (returns false if I mess with the file) and checks that the XML is formed well (also returns false when messed with).
I've also tried the following, but it does the exact same thing:
static bool IsValidXml(string xmlFilePath, string xsdFilePath)
{
XDocument xdoc = XDocument.Load(xmlFilePath);
XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add(null, xsdFilePath);
try
{
xdoc.Validate(schemas, null);
}
catch (XmlSchemaValidationException e)
{
return false;
}
return true;
}
I've even pulled a completely random XSD off the internet and thrown it into both scripts, and it still validates on both. What am I missing here?
Using .NET 3.5 within an SSIS job.

In .NET you have to check for yourself whether the validator actually matches a schema component; if it doesn't, no exception is thrown, so your code will not work as you expect.
A match means one or both of the following:
- There is a global element in your schema set whose qualified name is the same as your XML document element's qualified name.
- The document element has an xsi:type attribute that is a qualified name pointing to a global type in your schema set.
In streaming mode, you can do this check easily. This pseudocode should give you an idea (error handling not shown, etc.):
using (XmlReader reader = XmlReader.Create(xmlfile, settings))
{
    reader.MoveToContent();
    var qn = new XmlQualifiedName(reader.LocalName, reader.NamespaceURI);
    // element test: schemas.GlobalElements.Contains(qn);
    // check if there's an xsi:type attribute: reader["type", XmlSchema.InstanceNamespace] != null;
    // if exists, resolve the value of the xsi:type attribute to an XmlQualifiedName
    // type test: schemas.GlobalTypes.Contains(qn);
    // if all good, keep reading; otherwise, break here after setting your error flag, etc.
}
You might also consider the XmlNode.SchemaInfo which represents the post schema validation infoset that has been assigned to a node as a result of schema validation. I would test different conditions and see how it works for your scenario. The first method is recommended to reduce the attack surface in DoS attacks, as it is the fastest way to detect completely bogus payloads.
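The root-element check described in the pseudocode could be made concrete roughly as follows. This is only a sketch: the class and method names (RootElementCheck, RootMatchesSchema) are made up for illustration, the prefix handling for xsi:type is deliberately simple, and error handling is still minimal.

```csharp
using System.Xml;
using System.Xml.Schema;

public static class RootElementCheck
{
    // Hypothetical helper: true when the document element is known to the schema set.
    public static bool RootMatchesSchema(XmlReader reader, XmlSchemaSet schemas)
    {
        if (!schemas.IsCompiled)
        {
            schemas.Compile(); // the global tables are populated on compile
        }

        // Skip the XML declaration, comments and whitespace.
        if (reader.MoveToContent() != XmlNodeType.Element)
        {
            return false;
        }

        // Element test: is there a global element with the same qualified name?
        var elementName = new XmlQualifiedName(reader.LocalName, reader.NamespaceURI);
        if (schemas.GlobalElements.Contains(elementName))
        {
            return true;
        }

        // Type test: resolve an xsi:type attribute, if present, to a global type.
        string xsiType = reader.GetAttribute("type", XmlSchema.InstanceNamespace);
        if (string.IsNullOrEmpty(xsiType))
        {
            return false;
        }

        int colon = xsiType.IndexOf(':');
        string prefix = colon < 0 ? string.Empty : xsiType.Substring(0, colon);
        string localName = colon < 0 ? xsiType : xsiType.Substring(colon + 1);
        string ns = reader.LookupNamespace(prefix) ?? string.Empty;
        return schemas.GlobalTypes.Contains(new XmlQualifiedName(localName, ns));
    }
}
```

If the check returns false, the caller can reject the document immediately instead of reading (and "successfully validating") the rest of it.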

Validating string value has the correct XML format

I have a string, and I need to check whether it has the correct XML format, such as consistent start and end tags.
Sorry, I tried to make the string value well formatted but could not :).
string parameter="<HostName>Arasanalu</HostName><AdminUserName>Administrator</AdminUserName><AdminPassword>A1234</AdminPassword><placeNumber>38</PlaceNumber>";
I tried the following check:
public bool IsValidXML(string value)
{
try
{
// Check we actually have a value
if (string.IsNullOrEmpty(value) == false)
{
// Try to load the value into a document
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(parameter);
// If we managed with no exception then this is valid XML!
return true;
}
else
{
// A blank value is not valid xml
return false;
}
}
catch (System.Xml.XmlException)
{
return false;
}
}
It throws an error for correctly formed input as well as for malformed input.
Please let me know how I can proceed.
Regards,
Channa

The content of your string does not actually form a valid XML document.
It's missing a root element.
string parameter="<HostName>Arasanalu</HostName><AdminUserName>Administrator</AdminUserName><AdminPassword>A1234</AdminPassword><PlaceNumber>38</PlaceNumber>";
XmlDocument doc = new XmlDocument();
doc.LoadXml("<root>" + parameter + "</root>"); // this adds a root element and makes it valid
Root Element
[Definition: There is exactly one element, called the root, or document element, no
part of which appears in the content of any other element.] For all
other elements, if the start-tag is in the content of another element,
the end-tag is in the content of the same element. More simply stated,
the elements, delimited by start- and end-tags, nest properly within
each other.

Always put proper tags in the variable: wrap your content in an opening and a closing <Root> tag. Try the code below.
try
{
string unformattedXml = "<Root><HostName>Arasanalu</HostName><AdminUserName>Administrator</AdminUserName><AdminPassword>A1234</AdminPassword><PlaceNumber>38</PlaceNumber></Root>";
string formattedXml = XElement.Parse(unformattedXml).ToString();
return true;
}
catch (Exception e)
{
return false;
}
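As an alternative to wrapping the string in a root element, an XmlReader can be told to accept fragments (several top-level elements) through ConformanceLevel.Fragment. The sketch below assumes that "correct XML format" means well-formed start and end tags only; the class and method names are invented for illustration.

```csharp
using System.IO;
using System.Xml;

public static class XmlFragmentCheck
{
    // Hypothetical helper: true when the string is a well-formed XML fragment.
    public static bool IsWellFormedFragment(string value)
    {
        // A blank value is treated as not valid, as in the question.
        if (string.IsNullOrEmpty(value))
        {
            return false;
        }

        // Fragment mode allows more than one top-level element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(value), settings))
            {
                // Reading to the end forces the parser to check every tag.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // Mismatched or unclosed tags end up here.
            return false;
        }
    }
}
```

Note that element names are case-sensitive, so a pair such as <placeNumber>...</PlaceNumber> is reported as malformed by this check.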

WCF message:How to remove the SOAP Header element?

I am trying to delete the whole SOAP header from a WCF message and leave only the envelope body. Can anybody give me an idea how to do that?
Create a WCF message like this:
string response = "Hello World!";
Message msg = Message.CreateMessage(MessageVersion.Soap11, "*", new TextBodyWriter(response));
msg.Headers.Clear();
The sending SOAP message will be:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header />
<s:Body>
<Binary>Hello World!</Binary>
</s:Body>
</s:Envelope>
But I don't want the SOAP header element; I only need the envelope body. How can I remove the header element from a WCF message?

Option 1: Use basicHttpBinding; it will not add content to the header (when not configured for security).
Option 2: Implement a custom message encoder and strip the header there. Anywhere before that, there is a chance WCF will add the header back. See sample encoder here.

That question is a tricky one: let's take it step by step
Some Context
The Message class writes its headers in its ToString() method. ToString() then calls an internal overload ToString(XmlDictionaryWriter writer) which then starts writing:
// System.ServiceModel.Channels.Message
internal void ToString(XmlDictionaryWriter writer)
{
if (this.IsDisposed)
{
throw TraceUtility.ThrowHelperError(this.CreateMessageDisposedException(), this);
}
if (this.Version.Envelope != EnvelopeVersion.None)
{
this.WriteStartEnvelope(writer);
this.WriteStartHeaders(writer);
MessageHeaders headers = this.Headers;
for (int i = 0; i < headers.Count; i++)
{
headers.WriteHeader(i, writer);
}
writer.WriteEndElement();
MessageDictionary arg_60_0 = XD.MessageDictionary;
this.WriteStartBody(writer);
}
this.BodyToString(writer);
if (this.Version.Envelope != EnvelopeVersion.None)
{
writer.WriteEndElement();
writer.WriteEndElement();
}
}
The this.WriteStartHeaders(writer); code writes the header tag regardless of the number of headers. It is matched by the writer.WriteEndElement() after the for loop. This writer.WriteEndElement() must be matched with the header tag being written, else the Xml document will be invalid.
So there is no way we can override a virtual method to get rid of the headers (WriteStartHeaders calls the virtual method OnWriteStartHeaders, but the tag closing prevents simply shutting it off). We have to change the whole ToString() method in order to remove any header-related structure, to arrive at:
- write start of envelope
- write start of body
- write body
- write end of body
- write end of envelope
Solutions
In the above pseudocode, we have control on everything but the "write body" part. All methods called in the initial ToString(XmlDictionaryWriter writer) are public except BodyToString. So we will need to call it through reflection or whichever method fits your needs. Writing a message without its headers simply becomes:
private void ProcessMessage(Message msg, XmlDictionaryWriter writer)
{
msg.WriteStartEnvelope(writer); // start of envelope
msg.WriteStartBody(writer); // start of body
var bodyToStringMethod = msg.GetType()
.GetMethod("BodyToString", BindingFlags.Instance | BindingFlags.NonPublic);
bodyToStringMethod.Invoke(msg, new object[] {writer}); // write body
writer.WriteEndElement(); // write end of body
writer.WriteEndElement(); // write end of envelope
}
Now we have a way to get our message content without the headers. But how should this method be invoked?
We only want the message without headers as a string
Great, we don't need to care about overriding the ToString() method that then calls the initial writing of the message. Just create a method in your program that takes a Message and an XmlDictionaryWriter and call it to get the message without its headers.
We want the ToString() method to return the message without headers
This one is a bit more complicated. We cannot easily inherit from the Message class because we would need to pull out a lot of dependencies out of the System.ServiceModel assembly. I won't go there in this answer.
What we can do is use the capabilities of some frameworks to create a proxy around an existing object and to intercept some calls to the original object in order to replace/enhance its behavior: I'm used to Castle Dynamic proxy so let's use that.
We want to intercept the ToString() method so we create a proxy around the Message object we are using and add an interceptor to replace the ToString method of the Message with our implementation:
var msg = Message.CreateMessage(MessageVersion.Soap11, "*");
msg.Headers.Clear();
var proxyGenerator = new Castle.DynamicProxy.ProxyGenerator();
var proxiedMessage = proxyGenerator.CreateClassProxyWithTarget(msg, new ProxyGenerationOptions(),
new ToStringInterceptor());
The ToStringInterceptor needs to do almost the same thing as the initial ToString() method, we will however use our ProcessMessage method defined above:
public class ToStringInterceptor : IInterceptor
{
public void Intercept(IInvocation invocation)
{
if (invocation.Method.Name != "ToString")
{
invocation.Proceed();
}
else
{
var result = string.Empty;
var msg = invocation.InvocationTarget as Message;
StringWriter stringWriter = new StringWriter(CultureInfo.InvariantCulture);
XmlDictionaryWriter xmlDictionaryWriter =
XmlDictionaryWriter.CreateDictionaryWriter(new XmlTextWriter(stringWriter));
try
{
ProcessMessage(msg, xmlDictionaryWriter);
xmlDictionaryWriter.Flush();
result = stringWriter.ToString();
}
catch (XmlException ex)
{
result = "ErrorMessage";
}
invocation.ReturnValue = result;
}
}
private void ProcessMessage(Message msg, XmlDictionaryWriter writer)
{
// same method as above
}
}
And here we are: calls to the ToString() method of the message will now return an envelope without headers. We can pass the message to other parts of the framework and know it should mostly work: direct calls to some of the internal plumbing of Message can still produce the initial output, but short of a full reimplementation we cannot control that.
Points of note
This is the shortest way to remove the headers that I found. The fact that the header serialisation in the writer was not handled in a single virtual function was a big problem. The code doesn't give you much wriggle room.
This implementation doesn't use the same XmlWriter as the one used in the original implementation of ToString() in the Message, EncodingFallbackAwareXmlTextWriter. This class is internal in System.ServiceModel and pulling it out is left as an exercise to the reader. As a result, the output differs slightly since the xml is not formatted with the simple XmlTextWriter I use.
The interceptor could simply have parsed the xml returned from the initial ToString() call and removed the headers node before letting the value bubble up. This is another viable solution.
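The parse-and-remove alternative mentioned in the last point might look roughly like the sketch below. It assumes a SOAP 1.1 envelope (the namespace shown in the question) and works on the string form of the message; the class and method names are hypothetical.

```csharp
using System.Xml.Linq;

public static class SoapHeaderStripper
{
    // SOAP 1.1 envelope namespace, as used in the question's sample message.
    private static readonly XNamespace Soap11 = "http://schemas.xmlsoap.org/soap/envelope/";

    // Hypothetical helper: returns the envelope XML without its Header element.
    public static string RemoveHeader(string envelopeXml)
    {
        XDocument doc = XDocument.Parse(envelopeXml);

        // Look for a Header element directly under the Envelope root.
        XElement header = doc.Root == null ? null : doc.Root.Element(Soap11 + "Header");
        if (header != null)
        {
            header.Remove();
        }

        // Keep the output on one line, without added indentation.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

An interceptor could call such a helper on the string returned by the original ToString() and use the result as its return value.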
Raw code
public class ToStringInterceptor : IInterceptor
{
public void Intercept(IInvocation invocation)
{
if (invocation.Method.Name != "ToString")
{
invocation.Proceed();
}
else
{
var result = string.Empty;
var msg = invocation.InvocationTarget as Message;
StringWriter stringWriter = new StringWriter(CultureInfo.InvariantCulture);
XmlDictionaryWriter xmlDictionaryWriter =
XmlDictionaryWriter.CreateDictionaryWriter(new XmlTextWriter(stringWriter));
try
{
ProcessMessage(msg, xmlDictionaryWriter);
xmlDictionaryWriter.Flush();
result = stringWriter.ToString();
}
catch (XmlException ex)
{
result = "ErrorMessage";
}
invocation.ReturnValue = result;
}
}
private void ProcessMessage(Message msg, XmlDictionaryWriter writer)
{
msg.WriteStartEnvelope(writer);
msg.WriteStartBody(writer);
var bodyToStringMethod = msg.GetType()
.GetMethod("BodyToString", BindingFlags.Instance | BindingFlags.NonPublic);
bodyToStringMethod.Invoke(msg, new object[] { writer });
writer.WriteEndElement();
writer.WriteEndElement();
}
}
internal class Program
{
private static void Main(string[] args)
{
var msg = Message.CreateMessage(MessageVersion.Soap11, "*");
msg.Headers.Clear();
var proxyGenerator = new Castle.DynamicProxy.ProxyGenerator();
var proxiedMessage = proxyGenerator.CreateClassProxyWithTarget(msg, new ProxyGenerationOptions(),
new ToStringInterceptor());
var initialResult = msg.ToString();
var proxiedResult = proxiedMessage.ToString();
Console.WriteLine("Initial result");
Console.WriteLine(initialResult);
Console.WriteLine();
Console.WriteLine("Proxied result");
Console.WriteLine(proxiedResult);
Console.ReadLine();
}
}

I did not have your XmlBodyWriter, but you could use the data contract serializer or your own XML body writer.
The trick is to use msg.WriteBody; this will omit the headers.
var response = "Hello";
Message msg = Message.CreateMessage(MessageVersion.Soap11, "*", response, new DataContractSerializer(response.GetType()));
msg.Headers.Clear();
var sb = new StringBuilder();
var xmlWriter = new XmlTextWriter(new StringWriter(sb));
msg.WriteBody(xmlWriter);

Should be something like this:
XmlDocument xml = new XmlDocument();
xml.LoadXml(myXmlString); // suppose that myXmlString contains "<Body>...</Body>"
XmlNodeList xnList = xml.SelectNodes("/Envelope/Body");
foreach (XmlNode xn in xnList)
{
string binary1 = xn["Binary1"].InnerText;
string binary2 = xn["Binary2"].InnerText;
Console.WriteLine("Binary: {0} {1}", binary1 , binary2);
}

XmlDocument.Validate does not fire for multiple errors

I am trying to validate an incoming XmlDocument against an existing XmlSchemaSet. The code is as follows:
public class ValidateSchemas
{
private bool _isValid = true;
public List<string> errorList = new List<string>();
public bool ValidateDocument(XmlDocument businessDocument)
{
XmlSchemaSet schemaSet = SchemaLoader.Loader();
bool isValid = Validate(businessDocument, SchemaLoader._schemaSet);
return isValid;
}
public bool Validate(XmlDocument document, XmlSchemaSet schema)
{
ValidationEventHandler eventHandler = new ValidationEventHandler(HandleValidationError);
document.Schemas = schema;
document.Validate(eventHandler);
return _isValid;
}
private void HandleValidationError(object sender, ValidationEventArgs ve)
{
_isValid = false; errorList.Add(ve.Message);
}
}
The code works fine from a validation perspective. However, errorList captures only the first node error; it does not capture the other node errors. It looks like the event is fired only once. How can I accomplish this? Please note that I am getting an XmlDocument as input, hence I am not using a reader.

That's exactly the expected behavior of the XmlDocument.Validate method. Once it finds a validation error, it stops the validation process and returns the error. So the user has to fix that error and validate again.
This behavior is different from the Visual Studio error list. For example, a single syntax error in code sometimes produces hundreds of errors, even though you only have to fix one thing in one place. So there are pros and cons depending on the circumstances. However, I don't think you can easily get all the validation errors for an XmlDocument; it inherently works in a different way.

.NET, Why must I use *Specified property to force serialization? Is there a way to not do this?

I am using xml-serialization in my project to serialize and deserialize objects based on an xml schema. I used the xsd tool to create classes to use when serializing / deserializing the objects.
When I go to serialize the object before sending, I am forced to set the *Specified property to true in order to force the serializer to serialize all properties that are not of type string.
Is there a way to force the serialization of all properties without having to set the *Specified property to true?

The FooSpecified property is used to control whether the Foo property must be serialized. If you always want to serialize the property, just remove the FooSpecified property.
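To illustrate the convention, here is a small sketch with two hand-written classes (not xsd.exe output; all type names are invented for the example). XmlSerializer skips Foo in the first class unless FooSpecified is true, and always writes it in the second.

```csharp
using System.IO;
using System.Xml.Serialization;

// Foo is only written when FooSpecified is true.
public class WithSpecified
{
    public int Foo { get; set; }

    [XmlIgnore]
    public bool FooSpecified { get; set; }
}

// No companion property: Foo is always written.
public class WithoutSpecified
{
    public int Foo { get; set; }
}

public static class SpecifiedDemo
{
    // Serializes any value to an XML string.
    public static string Serialize<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Calling SpecifiedDemo.Serialize(new WithSpecified { Foo = 5 }) leaves the Foo element out, while the same call with WithoutSpecified includes it.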

I know this is an old question, but none of the other answers (except perhaps the suggestion of using Xsd2Code) really produces an ideal solution when you're generating code as part of your build and your .xsd may change several times during a single release cycle.
An easy way for me to get what I really wanted and still use xsd.exe was to run the generated file through a simple post-processor. The code for the post-processor is as follows:
namespace XsdAutoSpecify
{
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
try
{
if (args.Length != 1)
{
throw new ArgumentException("Specify a file name");
}
string fileName = args[0];
Regex regex = new Regex(".*private bool (?<fieldName>.*)Specified;");
IList<string> result = new List<string>();
IDictionary<string, string> edits = new Dictionary<string, string>();
foreach (string line in File.ReadLines(fileName))
{
result.Add(line);
if (line.Contains("public partial class"))
{
// Don't pollute other classes which may contain like-named fields
edits.Clear();
}
else if (regex.IsMatch(line))
{
// We found a "private bool fooSpecified;" line. Add
// an entry to our edit dictionary.
string fieldName = regex.Match(line).Groups["fieldName"].Value;
string lineToAppend = string.Format("this.{0} = value;", fieldName);
string newLine = string.Format(" this.{0}Specified = true;", fieldName);
edits[lineToAppend] = newLine;
}
else if (edits.ContainsKey(line.Trim()))
{
// Use our edit dictionary to add an autospecifier to the foo setter, as follows:
// set {
// this.fooField = value;
// this.fooFieldSpecified = true;
// }
result.Add(edits[line.Trim()]);
}
}
// Overwrite the result
File.WriteAllLines(fileName, result);
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
Environment.Exit(-1);
}
}
}
}
The result is generated code similar to the following:
[System.Xml.Serialization.XmlAttributeAttribute()]
public barEnum foo {
get {
return this.fooField;
}
set {
this.fooField = value;
this.fooFieldSpecified = true;
}
}

You could add a default value to your schema and then use the DefaultValueAttribute.
For example, you could have the following in your schema:
<xs:element name="color" type="xs:string" default="red"/>
And then the following property for serialization:
[DefaultValue("red")]
public string color { get; set; }
This should force the color property to always serialize as "red" if it has not been explicitly set to something else.

I faced the same issue and ended up setting all *Specified properties to true by reflection, like this:
var customer = new Customer();
foreach (var propertyInfo in typeof(Customer).GetProperties().Where(p => p.Name.EndsWith("Specified")))
{
propertyInfo.SetValue(customer, true);
}

We found that the answer to this question is to make sure that the schema elements are all defined as string data types. This will make sure that the serializer serializes all fields without the use of the corresponding *Specified property.
