XmlDocument oXmlDoc = new XmlDocument();
try
{
oXmlDoc.Load(filePath);
}
catch (Exception ex)
{
// Log Error Here
try
{
Encoding enc = Encoding.GetEncoding("iso-8859-1");
// Dispose the reader so the file handle is released even if parsing fails
using (StreamReader sr = new StreamReader(filePath, enc))
{
String response = sr.ReadToEnd();
oXmlDoc.LoadXml(response);
}
}
catch (Exception innerException)
{
// Log Error Here
return false;
}
}
I receive an XML file from a third party that also references a Document Type Definition (DTD) file right after the XML declaration.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE SoccerMatchPlus SYSTEM "SoccerMatchPlus.dtd">
<SoccerMatchPlus matchid="33226">
<Booking id="13642055" time="47">
<Player id="370927">
<Name firstName="Lasse" initials="L" lastName="Nielsen">L Nielsen</Name>
</Player>
<Team id="26415" name="AæB" homeOrAway="Home"/>
</Booking>
</SoccerMatchPlus>
If I parse the file with the Load method, I get the error: Invalid character in the given encoding. Line 102, position 56. If I catch the exception and retry with LoadXml, the encoding problem goes away, but
I get a different error: Could not find file 'C:\Windows\system32\SoccerMatchPlus.dtd'.
The reference to the Document Type Definition file, SoccerMatchPlus.dtd, is added before the root element by the third party.
With the Load method, the parser looks for the DTD in the location where the XML file resides.
I have placed SoccerMatchPlus.dtd in the same location as the XML file. Can I load SoccerMatchPlus.dtd from a specified location at runtime, or can you suggest a better way to load an XML file that contains characters that are invalid in its declared encoding?
Set the XmlResolver property of the XmlDocument class to null so that the external DTD is not resolved.
XmlDocument oXmlDoc = new XmlDocument();
oXmlDoc.XmlResolver = null;
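Combining this with the question's fallback encoding, here is a minimal sketch of a loader that reads the file through a StreamReader and skips the external DTD. The helper name LoadIgnoringDtd is made up for illustration, and ISO-8859-1 is only an assumption taken from the question's own retry code; the file's real encoding may differ.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class DtdFreeLoader
{
    // Hypothetical helper: loads an XML file with an explicit encoding
    // and without resolving the external DTD named in the DOCTYPE.
    public static XmlDocument LoadIgnoringDtd(string filePath, Encoding encoding)
    {
        var doc = new XmlDocument();
        // Must be set before loading; a null resolver means external
        // resources such as SoccerMatchPlus.dtd are not fetched.
        doc.XmlResolver = null;

        // The StreamReader decodes the bytes, so the encoding named in the
        // XML declaration is not used to decode the file.
        using (var reader = new StreamReader(filePath, encoding))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

It could be called as DtdFreeLoader.LoadIgnoringDtd(filePath, Encoding.GetEncoding("iso-8859-1")). Because the DTD is skipped, any entities or default attribute values it defines will not be available.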
I have a complex schema structure, with multiple schemas using the tag. The issue I was having was accessing the schema files stored in subfolders (something like /xsd/Person/personcommon.xsd), because the folder structure doesn't exist once the files are added as embedded resources. I have written a .dll to validate the XML file and then deserialize it into objects. These objects are used by another application. After searching around, I came up with the following to validate the XML:
public bool ValidateXML(string xmlFilePath)
{
// Set the namespace followed by the folder
string prefix = "MyApp.xsd";
// Variable to determine if file is valid
bool isValid = true;
// Assembly object to find the embedded resources
Assembly myAssembly = Assembly.GetExecutingAssembly();
// Get all the xsd resources
var resourceNames = myAssembly.GetManifestResourceNames().Where(name => name.StartsWith(prefix));
try
{
// Load the XML
XDocument xmlDoc = XDocument.Load(xmlFilePath);
// Create new schema set
XmlSchemaSet schemas = new XmlSchemaSet();
// Iterate through all the resources and add only the xsd's
foreach (var name in resourceNames)
{
using (Stream schemaStream = myAssembly.GetManifestResourceStream(name))
{
using (XmlReader schemaReader = XmlReader.Create(schemaStream))
{
schemas.Add(null, schemaReader);
}
}
}
// Call to the validate method
xmlDoc.Validate(schemas, (o, e) => { this.ErrorMessage = string.Format("File failure: There was an error validating the XML document: {0} ", e.Message); isValid = false; }, true);
return isValid;
}
catch (Exception ex)
{
isValid = false;
this.ErrorMessage = string.Format("File failure: There was an error validating the XML document: {0}", ex.ToString());
return isValid;
}
}
The issue I am now having is in my unit tests. Debugging through the code, I see that exceptions are thrown when schemas.Add() is called because it can't find the referenced XSD files. The weird thing is that it still validates the XML file passed in correctly.
Is this expected behavior? Is the code I have written valid, or is there a better way to access these XSD files?
I appreciate your help.
I have created a web service to receive an XML file. I followed the approach below, published it, and it worked fine for me:
....
XmlDocument xmldoc = new XmlDocument();
try
{
if (HttpContext.Current.Request.InputStream != null)
{
StreamReader stream = new StreamReader(HttpContext.Current.Request.InputStream);
string xmls = stream.ReadToEnd();
xmldoc.LoadXml(xmls);
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
}
}
catch (Exception ex)
{
logger.Log(NLog.LogLevel.Error, ex.Message + ex.StackTrace);
}
...
My XML structure is:
<reports uis="5521452542">
<attribute1>val1</attribute1>
...
</reports>
However, when some friends tested it by calling my web service from a Linux platform, I found the error below in my log file, even though their XML file has been validated.
Note that their XML file did not contain the declaration:
<?xml version="1.0" encoding="UTF-8"?>
Could this cause the error or not?
2014-04-03 03:56:53.7408|Error|Root element is missing.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at WebService.Service1.GetInfoService() in D:\yassine\Mobily\Log\WebService\WebService\WebService\Service1.asmx.cs:line 56
2014-04-03 03:56:53.8032|Error|Root element is missing.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at WebService.Service1.GetInfoService() in D:\yassine\Mobily\Log\WebService\WebService\WebService\Service1.asmx.cs:line 71
Can you please help me find the exact cause of the error?
Thank you
The exception says exactly what's wrong: you are receiving invalid XML that has no root element. Ask your friends to send you the raw XML by mail so you can see what they're sending you.
You can use Altova XMLSpy to verify that the XML is valid.
A very basic but valid xml should be:
<root>
<child></child>
</root>
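On the question of the missing declaration: the XML declaration is optional, so leaving it out should not cause this error by itself. Here is a minimal sketch to check that locally, using a hypothetical helper named TryLoad (not part of the original service code). The asker's sample payload parses without a declaration, while an empty body fails to parse.

```csharp
using System.Xml;

public static class XmlBodyCheck
{
    // Hypothetical helper: returns true when the text parses as XML,
    // otherwise false with the parser's message in 'error'.
    public static bool TryLoad(string body, out string error)
    {
        error = null;
        try
        {
            new XmlDocument().LoadXml(body);
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

For example, TryLoad("<reports uis=\"5521452542\"><attribute1>val1</attribute1></reports>", out error) succeeds, whereas TryLoad("", out error) fails. Logging the raw request body before parsing it would show whether the service actually received any content.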
I have an application that creates XML documents modeled on existing ones, but that's not the point. Today I noticed that an error occurs if the opened file is ANSI-encoded. Until now I had worked with UTF-8 files and this problem did not arise. What should I do, and how?
Fragments of code:
string filepath;
XmlDocument xdoc = new XmlDocument();
XmlElement root;
...............
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
filepath = openFileDialog1.FileName;
textBox1.Text = filepath;
load();
}
...............
public void load()
{
xdoc.Load(filepath);
root = xdoc.DocumentElement;
...............
Error:
An unhandled exception of type 'System.Xml.XmlException' occurred in
System.Xml.dll Additional information: An invalid character for the
specified encoding., Line 35, position 16.
That line contains Cyrillic characters (Russian). If I convert the document to UTF-8 with Notepad++, it loads correctly.
You could use a StreamReader to read the file with the correct encoding and then pass that reader to the XmlDocument.Load overload that accepts a TextReader.
using(var sr = new StreamReader(filepath, myEncoding))
{
xdoc.Load(sr);
}
You can obtain myEncoding via the Encoding.GetEncoding method.
I am getting the error "Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it" while trying to load XML. Both my C# code and the contents of the XML file are given below. The XML declaration is on line 6 of the XML file, hence the error.
I cannot control what is in the XML file, so how can I rewrite it in C# so that the XML declaration comes first, followed by the comments, and the file loads without any error?
// xmlFilepath is the path/name of the XML file passed to this function
// (LoadXmlFile is a placeholder name; the original snippet left the method unnamed)
static void LoadXmlFile(string xmlFilepath)
{
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreComments = true;
readerSettings.IgnoreWhitespace = true;
XmlReader reader = XmlReader.Create(xmlFilepath, readerSettings);
XmlDocument xml = new XmlDocument();
xml.Load(reader);
}
XmlDoc.xml
<!-- Customer ID: 1 -->
<!-- Import file: XmlDoc.xml -->
<!-- Start time: 8/14/12 3:15 AM -->
<!-- End time: 8/14/12 3:18 AM -->
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
-----
As the error states, when an XML declaration is present it must be the very first thing in the document: the file has to start with <?xml, with nothing before it, not even whitespace. No ifs, ands or buts. Comments are allowed before the root element, but they must come after the declaration, so the comments above it in your file make the document malformed.
EDIT: Something like this should be able to rearrange the rows, given the file format from the OP:
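// Requires: using System.Collections.Generic; using System.IO; using System.Text;
// Assumption: the file is encoded as its declaration says (ISO-8859-1 in the OP's sample).
var encoding = Encoding.GetEncoding("ISO-8859-1");
var lines = new List<string>();
using (var fileStream = File.Open(xmlFilePath, FileMode.Open, FileAccess.Read))
using (var reader = new StreamReader(fileStream, encoding))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        lines.Add(line);
}
// Move the XML declaration line to the top, if it exists and is not already first.
var i = lines.FindIndex(s => s.StartsWith("<?xml", StringComparison.Ordinal));
if (i > 0)
{
    var xmlLine = lines[i];
    lines.RemoveAt(i);
    lines.Insert(0, xmlLine);
}
using (var fileStream = File.Open(xmlFilePath, FileMode.Truncate, FileAccess.Write))
using (var writer = new StreamWriter(fileStream, encoding))
{
    // WriteLine keeps one line per entry; Write would run all lines together.
    foreach (var line in lines)
        writer.WriteLine(line);
}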
That is not valid XML.
As the error clearly states, the XML declaration (<?xml ... ?>) must come first.
I'm using the following function to remove whitespace from xml:
public static void DoRemovespace(string strFile)
{
string str = System.IO.File.ReadAllText(strFile);
str = str.Replace("\n", "");
str = str.Replace("\r", "");
Regex regex = new Regex(@">\s*<");
string cleanedXml = regex.Replace(str, "><");
System.IO.File.WriteAllText(strFile, cleanedXml);
}
Don't put any comments in the beginning of your file!
How do I know if my XML file has data besides the name space info?
Some of the files contain this:
<?xml version="1.0" encoding="UTF-8"?>
If I encounter such a file, I want to move it to an error directory.
You could use the XmlReader to avoid the overhead of XmlDocument. In your case, you will receive an exception because the root element is missing.
string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
using (StringReader strReader = new StringReader(xml))
{
//You can replace the StringReader object with the path of your xml file.
//In that case, do not forget to remove the "using" lines above.
using (XmlReader reader = XmlReader.Create(strReader))
{
try
{
while (reader.Read())
{
}
}
catch (XmlException ex)
{
//Catch xml exception
//in your case: root element is missing
}
}
}
You can add a condition inside the while (reader.Read()) loop that stops once the first nodes have been checked. This avoids reading the entire XML file when you only want to know whether the root element is missing.
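The early exit could look something like the sketch below. The helper name HasRootElement is made up for illustration, and it uses the default XmlReader settings; as far as I know those prohibit DTDs, so a file with a DOCTYPE would also be reported as having no root unless the settings are adjusted.

```csharp
using System.IO;
using System.Xml;

public static class XmlRootCheck
{
    // Hypothetical helper: returns true as soon as the first element is seen,
    // so the rest of the document is never read.
    public static bool HasRootElement(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        return true; // root element found, stop reading
                    }
                }
            }
        }
        catch (XmlException)
        {
            // Thrown, for example, when only the declaration is present
            // ("Root element is missing") or the prolog is malformed.
        }
        return false;
    }
}
```

For a file on disk you could pass new StreamReader(path) wrapped in a using block; files for which the method returns false would go to the error directory.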
I think the only way is to catch an exception when you try and load it, like this:
try
{
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load(Server.MapPath("XMLFile.xml"));
}
catch (System.Xml.XmlException xmlEx)
{
if (xmlEx.Message.Contains("Root element is missing"))
{
// Xml file is empty
}
}
Yes, there is some overhead, but you should be performing sanity checks like this anyway. You should never trust input and the only way to reliably verify it is XML is to treat it like XML and see what .NET says about it!
XmlDocument xDoc = new XmlDocument();
// Populate xDoc before checking; a freshly created XmlDocument has no child nodes.
if (xDoc.ChildNodes.Count == 0)
{
    // XML document is empty
}
if (xDoc.ChildNodes.Count == 1)
{
    // the document contains only the declaration node (if you are sure the declaration is always at the beginning)
}
if (xDoc.ChildNodes.Count > 1)
{
    // declaration + n nodes (usually the count is 2: declaration + root node)
}
Haven't tried this...but should work.
try
{
XmlDocument doc = new XmlDocument();
doc.Load("test.xml");
}
catch (XmlException exc)
{
//invalid file
}
EDIT: Based on feedback comments
For large XML documents see Thomas's answer. This approach can have performance issues.
But, if it is a valid xml and the program wants to process it then this approach seems better.
If you aren't worried about validity, just check to see if there is anything after the first ?>. I'm not entirely sure of the C# syntax (it's been too long since I used it), but read the file, look for the first instance of ?>, and see if there is anything after that index.
However, if you want to use the XML later or you want to process the XML later, you should consider PK's answer and load the XML into an XmlDocument object. But if you have large XML documents that you don't need to process, then a solution more like mine, reading the file as text, might have less overhead.
You could check whether the XML document has a node (the root node) and whether that node has inner text or other children.
As long as you aren't concerned with the validity of the XML document, and only want to ensure that it has a tag other than the declaration, you could use simple text processing:
var regEx = new Regex("<[A-Za-z]");
bool foundTags = false;
string curLine = "";
using (var reader = new StreamReader(fileName)) {
while (!reader.EndOfStream) {
curLine = reader.ReadLine();
if (regEx.IsMatch(curLine)) {
foundTags = true;
break;
}
}
}
if (!foundTags) {
// file is bad, copy.
}
Keep in mind that there's a million other reasons that the file may be invalid, and the code above would validate a file consisting only of "<a". If your intent is to validate that the XML document is capable of being read, you should use the XmlDocument approach.