How to resolve System.OutOfMemoryException when loading large XML file - c#

I have this code in my program, which loads files of 500 MB and up.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
reader.Close();
I get a System.OutOfMemoryException and don't know how to resolve the problem. Please send me some advice.

I would use an XmlReader to parse the document. It provides forward-only access to the data and keeps memory use low because it never holds the whole document, although the code can be much more complex without the convenience of the XmlDocument class.
This simple sample reads the file node by node (not line by line); on each pass through the loop the reader is positioned on the next node.
using (var rdr = XmlReader.Create(new StreamReader("File.xml")))
{
    while (rdr.Read())
    {
        // do what you will with the current node
    }
}
See the methods and properties available to you when using the XmlReader at XmlReader Properties (MSDN)
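To make the loop above a little more concrete, here is a minimal sketch that counts element start tags by name while streaming. The class and method names (XmlStreamingSketch, CountElements) are made up for illustration and are not part of the original answer; only XmlReader, XmlNodeType and the standard collections are real framework types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class XmlStreamingSketch
{
    // Streams through the input and counts how often each element name occurs.
    // Only the current node is held by the reader, never the whole document.
    public static Dictionary<string, int> CountElements(TextReader input)
    {
        var counts = new Dictionary<string, int>();
        using (var rdr = XmlReader.Create(input))
        {
            while (rdr.Read())
            {
                // Start tags and empty elements both report XmlNodeType.Element.
                if (rdr.NodeType == XmlNodeType.Element)
                {
                    int n;
                    counts.TryGetValue(rdr.Name, out n); // n stays 0 if the name is new
                    counts[rdr.Name] = n + 1;
                }
            }
        }
        return counts;
    }
}
```

For a file on disk this would be called as XmlStreamingSketch.CountElements(new StreamReader("File.xml")), reusing the file name from the sample above.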

You need something like SAX, but for .NET: either
http://sourceforge.net/projects/saxdotnet/ or the XmlReader, which is basically a stream-based parser.
HTH

Related

scraping data from website with a C# console application

I'm trying to learn Spanish and making some flash cards (for my personal use) to help me learn the verbs.
Here is an example, page example. So near the top of the page you will see the past participle: bloqueado & gerund: bloqueando. It is these two values that I wish to obtain in my code and use for my flash cards.
If this is possible I will use a C# console application. I am aware that scraping data from a website is not ideal however this is a once off.
Any guidance on how to start something like this and pitfalls to avoid would be very helpful!
I know this isn't an exact answer, but here is the process I would suggest.
Use Wget (https://www.gnu.org/software/wget/) to mirror the website to a folder. Wget is a web spider and will follow the links on the site until it has downloaded everything. You'll have to run it with a few different parameters until you figure out the correct settings you want.
Use C# to run through each file in the folder and extract the words from <section class="verb-mood-section"> in each file. It's up to you whether to output them to the console or store them in a database or flat file.
Should be that easy, in theory.
Use SgmlReader. SgmlReader is a versatile and robust component that will stream HTML to an XmlReader:
XmlDocument FromHtml(TextReader reader) {
    // setup SgmlReader
    Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
    sgmlReader.DocType = "HTML";
    sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
    sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
    sgmlReader.InputStream = reader;
    // create document
    XmlDocument doc = new XmlDocument();
    doc.PreserveWhitespace = true;
    doc.XmlResolver = null;
    doc.Load(sgmlReader);
    return doc;
}
You can see that you need to create a TextReader first. In practice this would be a StreamReader, as TextReader is an abstract class.
Then you create the XmlDocument over that. Once you've got the page into the XmlDocument you can use the various methods it supports to isolate and extract the nodes you need. I'll leave you to explore that aspect of it.
You might try using the XDocument class instead, as it's a lot easier to handle than XmlDocument, especially if you're a newbie. It also supports LINQ.
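As a rough sketch of the XDocument route, the snippet below pulls the text out of every section element whose class attribute is exactly "verb-mood-section" (the element mentioned in the other answer). The names VerbSectionSketch and SectionTexts are hypothetical, and the exact-match test on the class attribute is an assumption about the page's markup. It takes an already loaded XDocument; since SgmlReader is an XmlReader, XDocument.Load(sgmlReader) should be one way to obtain it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class VerbSectionSketch
{
    // Returns the trimmed text content of each matching <section> element.
    public static List<string> SectionTexts(XDocument doc)
    {
        return doc.Descendants()
                  // Compare the local name so a default XHTML namespace does not matter.
                  .Where(e => e.Name.LocalName == "section"
                           && (string)e.Attribute("class") == "verb-mood-section")
                  .Select(e => e.Value.Trim())
                  .ToList();
    }
}
```

If the page puts several class names on the element, the comparison would need to split the attribute value on whitespace instead.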

Reading an xml file 50 lines at a time

I am currently trying to write a method that reads in XML files 50 lines at a time. That number will be increased later to allow larger files to be used in the program.
At the moment I am trying to accomplish this with the following code.
List<dataclass.DataRecord> list = new List<dataclass.DataRecord>();
string filename = "FileLocation";
XmlDocument testing = new XmlDocument();
//using (StreamReader streamreader = new StreamReader(filename))
using (XmlTextReader reader = new XmlTextReader(new StringReader(filename)))
{
    while (reader.Read() != null)
    {
        for (int i = 0; i < 50; i++)
        {
            testing.Load(reader);
            //list.add(line);
            Console.WriteLine(testing);
            //testing.Load(reader);
        }
    }
}
The commented lines are left over from previous ideas I tried, and the filename has been taken out because I prefer not to post it online.
At the moment I keep getting the following error:
Data at the root level is invalid. Line 1, position 1.
So I don't know:
A. Whether I am going about this the right way.
B. Whether the only way to fix this error is to surround the "testing.Load" input with "root + /root" tags.
Hope someone can help, thanks.
As I explained in my comment, XML consists of nodes, whereas you are looking at it as though it were a flat file with lines.
There are a couple of Stack Overflow questions with answers that match what you are trying to do. The real question is "How can you load a large XML file?". The answer is to use a stream rather than loading the file in one big chunk. Following on from there, you can find lots of resources about using XmlReader.
A couple of pointers to other SO articles:
C# and Reading Large XML Files
Reading large XML documents in .net
Hope that helps!
If you are only trying to load XML into an XmlDocument, why not just:
XmlDocument testing = new XmlDocument();
testing.Load(filename);
If your XML file is really big, you're better off using some sort of pull parser (parses tag-by-tag, attribute-by-attribute, etc) rather than DOM parser (loads whole document during parsing, keeps it in memory).
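One note on the error itself: new StringReader(filename) makes the parser read the characters of the path string as if they were the XML content, which would explain "Data at the root level is invalid. Line 1, position 1."; a StreamReader (or passing the path to XmlReader.Create) reads the file instead. Building on the pull-parser advice, here is a minimal sketch that yields one element at a time, so records can be handled in batches of any size. The names RecordStreamSketch and StreamElements, and the idea that the file consists of repeated record-like elements, are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class RecordStreamSketch
{
    // Lazily yields each element with the given name; only one element
    // subtree is materialised at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A caller could then loop over StreamElements(new StreamReader(filename), "record") and process, say, 50 elements before moving on, where "record" stands in for whatever the repeated element is actually called.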

Read an XML file from http address

I need to read an xml file using c#/.net from a source like so: https://10.1.12.15/xmldata?item=all
That is basically just an xml file.
StreamReader does not like that.
What's the best way to read the contents of that link?
The file looks like so:
- <RIMP>
- <HSI>
<SBSN>CZ325000123</SBSN>
<SPN>ProLiant DL380p Gen8</SPN>
<UUID>BBBBBBGGGGHHHJJJJ</UUID>
<SP>1</SP>
<cUUID>0000-000-222-22222-333333333333</cUUID>
- <VIRTUAL>...
You'll want to use LINQ to XML to process the XML file. The XDocument.Load Method supports loading an XML document from an URI:
var document = XDocument.Load("https://10.1.12.15/xmldata?item=all");
Another way to do this is to use the XmlDocument class. Many servers still run .NET Framework versions older than 3.5, where XDocument is not available, so it's good to know that this class still exists alongside XDocument in case you're developing an application that will run on such a server.
string url = @"https://10.1.12.15/xmldata?item=all";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(url);
Perhaps the best answer starts from the original question of how to read an XML file from a URL (in this case an HTTP address).
I think these simple demos are the best place to start:
(This one uses XmlTextReader, but today you can use XmlReader instead.)
http://support.microsoft.com/en-us/kb/307643
(You could read this documentation alongside it.)
https://msdn.microsoft.com/en-us/library/system.xml.xmlreader(v=vs.110).aspx
Regards

How to Verify using C# if a XML file is broken

Is there anything built in to determine whether an XML file is valid? One way would be to read the entire content and verify that the string represents valid XML content. Even then, how do I determine whether a string contains valid XML data?
Create an XmlReader around a StringReader with the XML and read through the reader:
using (var reader = XmlReader.Create(something))
    while (reader.Read())
        ;
If you don't get any exceptions, the XML is well-formed.
Unlike XDocument or XmlDocument, this will not hold an entire DOM tree in memory, so it will run quickly even on extremely large XML files.
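Wrapped up as a reusable check, the idea might look like the sketch below. The names XmlCheckSketch and IsWellFormed are made up for illustration. Note that it tests well-formedness only, not validity against a schema, and that with default reader settings a document containing a DOCTYPE is also rejected with an XmlException.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class XmlCheckSketch
{
    // Returns true if the string can be read to the end without an XmlException.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml)))
            {
                // Read every node; the reader throws on the first syntax error.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For a large file, the same loop can be run over XmlReader.Create(path) instead of a StringReader so the content never has to be held in a string.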
You can try to load the XML into an XmlDocument and catch the exception.
Here is some sample code:
var doc = new XmlDocument();
try {
    doc.LoadXml(content);
} catch (XmlException e) {
    // put code here that should be executed when the XML is not valid.
}
Hope it helps.
Have a look at this question:
How to check for valid xml in string input before calling .LoadXml()

Under High load XDocument.Parse Creating errors

I am trying to access this web service. The problem is that XDocument.Parse sometimes fails to process the input and throws System.Xml.XmlException: Root element is missing. on the line:
XDocument xmlDoc = XDocument.Parse(xmlData);
This happens even though the XML sent is correct according to my logs.
I was wondering: is it possible that the StreamReader is not working properly?
using (StreamReader reader = new StreamReader(context.Request.InputStream))
{
    xmlData = reader.ReadToEnd();
}
XDocument xmlDoc = XDocument.Parse(xmlData);
By the way, this all runs inside a custom HttpHandler.
Can someone please guide me in the right direction?
Thanks
Does it work any more consistently if you use
XDocument.Load(new StreamReader(context.Request.InputStream))
instead of XDocument.Parse?
Your code sample doesn't include any logging of what was read from the input stream. The problem occurs before this point.
