XDocument.Parse throws errors under high load - c#

I am trying to access this web service. The problem is that XDocument.Parse sometimes fails to process the input and throws System.Xml.XmlException: Root element is missing. on the line:
XDocument xmlDoc = XDocument.Parse(xmlData);
This happens even though the XML sent is correct according to my logs.
I was wondering, is it possible that the StreamReader is not working properly?
using (StreamReader reader = new StreamReader(context.Request.InputStream))
{
xmlData = reader.ReadToEnd();
}
XDocument xmlDoc = XDocument.Parse(xmlData);
By the way, this all runs inside a custom HttpHandler.
Can someone please guide me in the right direction?
Thanks

Does it work any more consistently if you use
XDocument.Load(new StreamReader(context.Request.InputStream))
instead of XDocument.Parse?

Your code sample doesn't log what was actually read from the input stream. The problem occurs before this point.
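One way to follow that suggestion is to log the body exactly as it was read and to skip parsing when it is empty, since parsing an empty string is one way to get "Root element is missing". The following is only a minimal sketch: `RequestXml` and `TryParseBody` are hypothetical names, the stream stands in for `context.Request.InputStream`, and the log is a plain `TextWriter` for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class RequestXml
{
    // Hypothetical helper: reads the whole body, logs what was actually read,
    // and only then attempts to parse it. Returns null when parsing is not possible.
    public static XDocument TryParseBody(Stream body, TextWriter log)
    {
        string xmlData;
        using (var reader = new StreamReader(body, Encoding.UTF8))
        {
            xmlData = reader.ReadToEnd();
        }

        // Log what the server read, not what the client claims to have sent.
        log.WriteLine("Read {0} characters: {1}", xmlData.Length, xmlData);

        if (string.IsNullOrWhiteSpace(xmlData))
        {
            // An empty body would make XDocument.Parse throw "Root element is missing".
            log.WriteLine("Request body was empty; skipping parse.");
            return null;
        }

        try
        {
            return XDocument.Parse(xmlData);
        }
        catch (XmlException ex)
        {
            log.WriteLine("Parse failed: {0}", ex.Message);
            return null;
        }
    }
}
```

If the log shows zero characters for the failing requests, the body was empty or already consumed before the handler read it, and the parser itself is not at fault.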

Related

XDocument.Parse: Avoid replacing XXE references

I'm trying to protect against malicious XXE injections in the XMLs processed by my app. Therefore I'm using XDocument instead of XmlDocument.
The XML represents the payload of a web request, so I call XDocument.Parse on its string content. However, I'm seeing the XXE references contained in the XML (&xxe;) being replaced in the result with the actual value of ENTITY xxe.
Is it possible to parse the XML with XDocument without replacing &xxe;?
Thanks
EDIT:
I managed to avoid the replacement of the XXE references in the XML by using XmlResolver = null with XDocument.Load.
Instead of Parse try to use Load with a pre-configured reader:
var xdoc = XDocument.Load(new XmlTextReader(
new StringReader(xmlContent)) { EntityHandling = EntityHandling.ExpandCharEntities });
From MSDN:
When EntityHandling is set to ExpandCharEntities, the reader expands character entities and returns general entities as EntityReference nodes.
Use the following example to stop resolving XXE (schemas and DTD).
Dim objXmlReader As System.Xml.XmlTextReader = Nothing
objXmlReader = New System.Xml.XmlTextReader(_patternFilePath)
objXmlReader.XmlResolver = Nothing
patternDocument = XDocument.Load(objXmlReader)
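A stricter alternative, sketched here in C#, is to reject any document that carries a DTD instead of preserving the entity references. This is not what the answers above do: it uses `XmlReaderSettings` with `DtdProcessing.Prohibit`, so a payload containing a DOCTYPE fails with an `XmlException`. `SafeXml` and `LoadWithoutDtd` are hypothetical names.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SafeXml
{
    // Hypothetical helper: loads XML from a string and refuses any DTD.
    public static XDocument LoadWithoutDtd(string xmlContent)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // throw XmlException on any DOCTYPE
            XmlResolver = null                      // never fetch external resources
        };

        using (var stringReader = new StringReader(xmlContent))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

This fits cases where legitimate requests never contain a DTD. If entity references must survive in the parsed result, the `EntityHandling` approach shown earlier is the one to use.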

Reading an xml file 50 lines at a time

I am currently trying to write a method that reads XML files 50 lines at a time; this number will be increased later to allow larger files to be used in the program.
At the moment I am trying to accomplish this with the following code.
List<dataclass.DataRecord> list = new List<dataclass.DataRecord>();
string filename = "FileLocation"
XmlDocument testing = new XmlDocument();
//using (StreamReader streamreader = new StreamReader(filename))
using (XmlTextReader reader = new XmlTextReader(new StringReader(filename)))
{
while (reader.Read() != null)
{
for (int i = 0; i < 50; i++)
{
testing.Load(reader);
//list.add(line);
Console.WriteLine(testing);
//testing.Load(reader);
}
}
}
The commented-out lines are left over from earlier attempts, and the real filename has been removed because I prefer not to post it online.
Basically, at the moment I keep getting the following error:
Data at the root level is invalid. Line 1, position 1.
So I don't know:
A. Whether I am going about this the right way.
B. Whether the only way to fix this error is to surround the "testing.Load" call with "root" and "/root" tags.
I hope someone can help, thanks.
As I explained in my comment, XML consists of nodes, whereas you are treating it as though it were a flat file with lines.
There are a couple of Stack Overflow questions with answers that match what you are trying to do. The real question is "How can you load a large XML file?" The answer is to use a stream rather than loading the file in one big chunk; from there you can find lots of resources about using XmlReader.
A couple of pointers to other SO questions:
C# and Reading Large XML Files
Reading large XML documents in .net
Hope that helps!
If you are only trying to load XML into an XmlDocument, why not just
XmlDocument testing = new XmlDocument();
testing.Load(filename);
If your XML file is really big, you're better off using some sort of pull parser (which parses tag by tag, attribute by attribute, etc.) rather than a DOM parser (which loads the whole document during parsing and keeps it in memory).
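A note on the error in the question: `new StringReader(filename)` makes the reader parse the text of the path itself as XML, which is why it fails at line 1, position 1. Beyond that, a pull-parser version of "50 at a time" can work in elements rather than lines. The following is only a sketch under assumptions: the repeated element is assumed to be called `record`, and `XmlBatches`, `StreamElements` and `InBatches` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlBatches
{
    // Streams every element with the given name without loading the whole document.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom consumes the element and leaves the reader on the node after it,
                    // so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Groups a stream of elements into lists of at most batchSize items.
    public static IEnumerable<List<XElement>> InBatches(IEnumerable<XElement> elements, int batchSize)
    {
        var batch = new List<XElement>(batchSize);
        foreach (var element in elements)
        {
            batch.Add(element);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<XElement>(batchSize);
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // final, possibly smaller batch
        }
    }
}
```

With a file, the input would be something like `new StreamReader("FileLocation")`, and the caller would loop over `XmlBatches.InBatches(XmlBatches.StreamElements(file, "record"), 50)`. Only one batch of elements is held in memory at a time.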

validate xml string content including encoding using C#

I need to validate a string that contains XML data; no schema validation is required. All I need to do is make sure that the XML is well formed and properly encoded. For example, I want my code to identify this snippet of XML as invalid:
<?xml version="1.0" encoding="utf-8"?>
<parentNode> Positions1 ’</parentNode>
Using the LoadXml method of XmlDocument does not work: no errors are thrown when I load the snippet above.
I know how to do this when the content is in an XML file, as the following snippet of code shows:
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.ConformanceLevel = ConformanceLevel.Document;
readerSettings.CheckCharacters = true;
readerSettings.ValidationType = ValidationType.None;
xmlReader = XmlReader.Create(xmlFileName, readerSettings);
XmlDocument xdoc = new XmlDocument();
xdoc.Load(xmlReader);
So, short of creating a temporary file to write out my XML string content and then creating an XmlReader instance to read it, is there any alternative? I would much appreciate it if someone could guide me in the right direction with this problem.
You have not fully understood what encoding means. A .NET string in memory is no longer "raw data" and therefore has no encoding, which is why LoadXml ignores the encoding declaration, and for good reason. So what you want to do does not make much sense. But if you really want to do it:
You can convert your string into an in-memory stream, so you don't have to write a temporary file. Then you can use that stream instead of xmlFileName in your call to XmlReader.Create.
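A minimal sketch of that suggestion, reusing the reader settings from the question. It assumes the string is turned into UTF-8 bytes to match the declaration in the example; `XmlCheck` and `IsWellFormed` are hypothetical names. As noted above, this only checks well-formedness of the resulting bytes and cannot tell whether the string was decoded with the wrong encoding before it reached this code.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlCheck
{
    // Hypothetical helper: returns true when the string reads through as a well-formed document.
    public static bool IsWellFormed(string xmlContent)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Document,
            CheckCharacters = true
        };

        // Assumption: UTF-8 bytes, matching encoding="utf-8" in the sample declaration.
        byte[] bytes = Encoding.UTF8.GetBytes(xmlContent);

        try
        {
            using (var stream = new MemoryStream(bytes))
            using (var reader = XmlReader.Create(stream, settings))
            {
                // Reading to the end forces the parser to check the whole document.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```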
Achim,
Thanks for your detailed replies. I was finally able to come up with a solution that fits my needs. It involves grabbing the bytes out of the 'unicode' string and then transforming the bytes to UTF-8 encoding.
try
{
byte[] xmlContentInBytes = new System.Text.UnicodeEncoding().GetBytes(xmlContent);
System.Text.UTF8Encoding utf8 = new System.Text.UTF8Encoding(false, true);
utf8.GetChars(xmlContentInBytes);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
return false;
}

How to verify using C# if an XML file is broken

Is there anything built in to determine whether an XML file is valid? One way would be to read the entire content and verify that the string represents valid XML content. Even then, how do I determine whether a string contains valid XML data?
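Create an XmlReader around a StringReader with the XML and read through the reader:
// xml is a placeholder for the string that holds the XML
using (var reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        // nothing to do here; reading through the document is the check
    }
}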
If you don't get any exceptions, the XML is well-formed.
Unlike XDocument or XmlDocument, this will not hold an entire DOM tree in memory, so it will run quickly even on extremely large XML files.
You can try to load the XML into an XmlDocument and catch the exception.
Here is the sample code:
var doc = new XmlDocument();
try {
doc.LoadXml(content);
} catch (XmlException e) {
// put code here that should be executed when the XML is not valid.
}
Hope it helps.
Have a look at this question:
How to check for valid xml in string input before calling .LoadXml()

How to resolve System.OutOfMemoryException when loading large XML file

I have this code in my program that loads files of 500 MB and up.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
reader.Close();
I get an OutOfMemoryException and don't know how to resolve the problem. Please give me some advice.
I would use an XmlReader to parse the document. It provides forward-only access to the data and keeps memory usage low. Of course, it can be much more complex to work with, since you lose the convenience of the XmlDocument class.
This simple sample reads the file node by node through a single XmlReader.
using (var rdr = XmlReader.Create(new StreamReader("File.xml")))
{
    while (rdr.Read())
    {
        // do what you will with the current node
    }
}
See the methods and properties available to you when using the XmlReader at XmlReader Properties (MSDN)
You need something like SAX, but for .NET:
http://sourceforge.net/projects/saxdotnet/ or the XmlReader, which is basically a stream-based parser.
HTH
