I have a very large XML file (about 300,000 elements). In my part of the application I only need to know whether the root element is named ApplicationLog and whether it has an attribute called LogId.
To read the XML I use:
XDocument document;
using (StreamReader streamReader = new StreamReader(filePath, true))
{
document = XDocument.Load(streamReader);
}
and to get the information I need:
try
{
if (document.Root != null)
{
if (string.Equals(document.Root.Name.LocalName, "ApplicationLog", StringComparison.InvariantCultureIgnoreCase) &&
document.Root.HasAttributes && (from o in document.Root.Attributes() where string.Equals(o.Name.LocalName, "LogId", StringComparison.InvariantCultureIgnoreCase) select o).Any())
{
isRelevantFile = true;
}
}
}
catch (Exception e)
{
}
This works fine.
The problem is that XDocument.Load takes about 15 seconds to load an XML file of about 20 MB.
I also tried XmlDocument, but it has the same problem.
My first idea was to read the file as text and parse the first lines for the element and attribute I'm looking for, but that doesn't seem very professional.
Does anybody know a better way to do this?
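Use the XmlReader API. It reads forward only as far as you ask it to, so checking the root element does not require loading the whole file: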
using (XmlReader xr = XmlReader.Create(filePath))
{
xr.MoveToContent();
if (xr.LocalName == "ApplicationLog" ...)
}
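A slightly fuller sketch of the same idea, written against the requirements in the question (root named ApplicationLog, a LogId attribute, case-insensitive comparison). The class and method names are illustrative, and it uses OrdinalIgnoreCase rather than the question's InvariantCultureIgnoreCase; treat it as a starting point rather than a drop-in replacement:

```csharp
using System;
using System.IO;
using System.Xml;

public static class LogFileCheck
{
    // Hypothetical helper: true when the root element is ApplicationLog and has a LogId
    // attribute (both compared case-insensitively). Only the prolog and the root start
    // tag are parsed; the rest of the document is never read into an object model.
    public static bool IsApplicationLog(TextReader input)
    {
        try
        {
            using (XmlReader xr = XmlReader.Create(input))
            {
                // Skips the XML declaration, comments and whitespace; stops on the root element.
                if (xr.MoveToContent() != XmlNodeType.Element)
                    return false;

                if (!string.Equals(xr.LocalName, "ApplicationLog", StringComparison.OrdinalIgnoreCase))
                    return false;

                // Walk the root's attributes without reading any child nodes.
                while (xr.MoveToNextAttribute())
                {
                    if (string.Equals(xr.LocalName, "LogId", StringComparison.OrdinalIgnoreCase))
                        return true;
                }
                return false;
            }
        }
        catch (XmlException)
        {
            // Malformed prolog or missing root element.
            return false;
        }
    }

    // Convenience overload for a file path (same StreamReader setup as in the question).
    public static bool IsApplicationLog(string filePath)
    {
        using (var streamReader = new StreamReader(filePath, true))
        {
            return IsApplicationLog(streamReader);
        }
    }
}
```

Because the reader stops at the root start tag, the cost of this check should not grow with the size of the file.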
You can try the solution provided here or use/develop a SAX reader such as this one. You can find more information on SAX here.
I am using XmlReader.ReadInnerXml() to load part of an XML file and save it as an XmlDocument. I ran into an OutOfMemoryException when the inner XML was over 2 GB (an estimate). What is the best way to handle this error? Is there a better way to create a large XML file from an XmlReader? Can I save the content without loading it into memory?
using (XmlReader xmlRdr = XmlReader.Create(file))
{
xmlRdr.MoveToContent();
while (xmlRdr.Read())
{
//when read to XmlNodeType.Element and xmlRdr.Name meets certain criteria
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.PreserveWhitespace = true;
try
{
xmlDoc.LoadXml(xmlRdr.ReadInnerXml());
//get some data from the inner XML and eventually use XmlWriter to save the file
}
catch(Exception e)
{
string content = $"{e.GetType()} {e.Message} {NewLine} {objId}";
//send content to log file and email
}
}
}
As mentioned in one of the comments, maybe try using StreamReader and StreamWriter.
This tutorial might help.
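If the goal is to write each matching element out without ever holding it in memory as a string, another option (a different technique from the StreamReader/StreamWriter suggestion) is XmlWriter.WriteNode, which copies the reader's current node and all its descendants straight to a writer. A minimal sketch; CopyElements and the openOutput callback are hypothetical names, and it copies whole elements (outer XML) rather than the inner XML:

```csharp
using System;
using System.IO;
using System.Xml;

public static class SubtreeCopier
{
    // Copies every element named elementName from input to its own output, node by node,
    // instead of building a string with ReadInnerXml() and loading it into an XmlDocument.
    // openOutput (hypothetical) supplies the destination for the n-th match, e.g. a file.
    public static int CopyElements(TextReader input, string elementName, Func<int, TextWriter> openOutput)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    using (TextWriter target = openOutput(count))
                    using (XmlWriter writer = XmlWriter.Create(target, new XmlWriterSettings { OmitXmlDeclaration = true }))
                    {
                        // Writes the element and its descendants, then leaves the reader on the
                        // node after the element's end tag, so no extra Read() is needed.
                        writer.WriteNode(reader, true);
                    }
                    count++;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return count;
    }
}
```

Any data you need from each element can then be read back from the smaller output files, or inspected before copying, depending on your use case.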
I have a piece of code that works well for normal-sized files, but for really big files it makes the server stop working.
Here it is:
XmlReader reader = null;
try
{
reader = XmlReader.Create(file_name + ".xml");
XDocument xml = XDocument.Load(reader);
XmlNamespaceManager namespaceManager = GetNamespaceManager(reader);
XElement root = xml.Root;
//XAttribute supplier = root.XPathSelectElement("//sh:Receive/sh:Id", namespaceManager).Attribute("Authority");
//string version = root.XPathSelectElement("//sh:DocumentId/sh:Version", namespaceManager).Value;
var nodes = root.XPathSelectElements("//eanucc:msg/eanucc:transact", namespaceManager);
return nodes;
}
catch
{ }
I think this part causes the memory problem on the server. How can I fix it?
It sounds like there's simply too much data to read in one go. You'll have to iterate over the elements one at a time, using XmlReader as a cursor and converting each element to an XElement as you reach it.
public static IEnumerable<XElement> ReadTransactions()
{
using (var reader = XmlReader.Create(file_name + ".xml"))
{
while (reader.ReadToFollowing("transact", eanuccNamespaceUri))
{
using (var subtree = reader.ReadSubtree())
{
yield return XElement.Load(subtree);
}
}
}
}
Note: this assumes there are never "transact" elements at any other level. If there are, you'll need to be more careful with your XmlReader than just calling ReadToFollowing. Also note that you'll need to find the actual namespace URI that the eanucc prefix maps to.
Don't forget that if you try to read all of this information in one go (e.g. by calling ToList()) then you'll still run out of memory. You need to stream the information. (It's not clear what you're trying to do with the elements, but you need to think about it carefully.)
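To make the streaming point concrete, here is a sketch that consumes the same ReadToFollowing/ReadSubtree pattern with a plain foreach, so only one element is materialized at a time. The element name, namespace URI and the "count elements that have a given attribute" task are placeholders for illustration:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class TransactionStream
{
    // Same pattern as ReadTransactions, parameterised for illustration:
    // localName and namespaceUri stand in for "transact" and the real eanucc URI.
    public static IEnumerable<XElement> ReadElements(TextReader input, string localName, string namespaceUri)
    {
        using (var reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing(localName, namespaceUri))
            {
                using (var subtree = reader.ReadSubtree())
                {
                    yield return XElement.Load(subtree);
                }
            }
        }
    }

    // Consumes the sequence lazily: each XElement becomes garbage as soon as the
    // loop moves on, unlike calling ToList() on the whole sequence.
    public static int CountWithAttribute(TextReader input, string localName, string namespaceUri, string attributeName)
    {
        int count = 0;
        foreach (XElement element in ReadElements(input, localName, namespaceUri))
        {
            if (element.Attribute(attributeName) != null)
                count++;
        }
        return count;
    }
}
```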
Try putting the reader in a using(){} clause so it gets disposed of after use.
try
{
using(var reader = XmlReader.Create(file_name + ".xml"))
{
XDocument xml = XDocument.Load(reader);
XmlNamespaceManager namespaceManager = GetNamespaceManager(reader);
XElement root = xml.Root;
var nodes = root.XPathSelectElements("//eanucc:msg/eanucc:transact", namespaceManager);
return nodes;
}
}
catch
{ }
I'm trying to open and edit an XML file in a Silverlight application, but I can't edit it.
My XML file (Customers.xml) looks like this:
<?xml version="1.0"?>
<customers>
<customer>Joe</customer>
<customer>Barrel</customer>
</customers>
And my C# logic:
[...]
XDocument xdoc = XDocument.Load("Customers.xml");
xdoc.Root.Add(new XElement("customer", "Stephano")); //here I wish it to add Stephano as a customer.
using (var file = IsolatedStorageFile.GetUserStoreForApplication())
{
using (var stream = file.OpenFile("Customers.xml", FileMode.Create))
{
xdoc.Save(stream); //and here I wish it to save it to the file
}
}
PopulateCustomersList();
Here is PopulateCustomersList(), the function called above to display the contents of the XML file:
private void PopulateCustomersList()
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = new XmlXapResolver();
XmlReader reader = XmlReader.Create("Customers.xml");
reader.MoveToContent();
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element && reader.Name == "customer")
{
//OutputTextBlock.Text = reader.GetAttribute("first");
customersList.Items.Add(new ListBoxItem()
{
Content = reader.ReadInnerXml()
});
}
if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "customers")
{
break;
}
}
reader.Close();
}
In my xaml file I have
<ListBox x:Name="customersList" />
The list is displayed, but only Joe and Barrel appear. Where is Stephano?
I got this code from various tutorials and forums and I don't fully understand it. I know it may be a strange way to do it, but I can't find out how to do it properly, so I'm trying all sorts of things. The funniest thing is that on many forums I found a way to save the file that looks like this:
xdoc.Save("Customers.xml"); but Visual Studio says the argument is wrong because it is a string. How am I supposed to tell it that it's a file?
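Okay:
Save() writes out the current in-memory XDocument, i.e. the XML you loaded here:
XDocument xdoc = XDocument.Load("Customers.xml");
but the XDocument does not remember where it was loaded from, so Save always needs a destination. In Silverlight there is no overload that takes a file path, which is why Visual Studio rejects the string; save to an isolated storage stream instead. Also note that PopulateCustomersList() re-reads Customers.xml with XmlReader.Create, which in Silverlight resolves against the XAP package rather than the file you saved to isolated storage, so it is simpler to pass the modified document straight to it.
So it should be something like this (written without more knowledge than you provided):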
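XDocument xdoc = XDocument.Load("Customers.xml");
xdoc.Root.Add(new XElement("customer", "Stephano"));
// XDocument has no parameterless Save(); in Silverlight, save through an isolated storage stream
using (var store = IsolatedStorageFile.GetUserStoreForApplication())
using (var stream = store.OpenFile("Customers.xml", FileMode.Create))
{
xdoc.Save(stream);
}
// display the in-memory document instead of re-reading the packaged file
PopulateCustomersList(xdoc);
private void PopulateCustomersList(XDocument xdoc)
{
foreach (XElement element in xdoc.Root.Elements("customer"))
{
customersList.Items.Add(new ListBoxItem()
{
Content = (string)element
});
}
}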
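One way to see why the displayed list must come from the same place you saved to: this console-style sketch (plain .NET, no Silverlight) saves the modified document to a MemoryStream, standing in for the isolated storage stream, and loads it back from that same stream. AddAndReload is a hypothetical helper:

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class CustomerListDemo
{
    // Adds a customer, saves the document to a stream, then reloads it from that
    // same stream and returns the customer names found in the reloaded copy.
    public static string[] AddAndReload(string originalXml, string newCustomer)
    {
        XDocument xdoc = XDocument.Parse(originalXml);
        xdoc.Root.Add(new XElement("customer", newCustomer));

        using (var stream = new MemoryStream())
        {
            xdoc.Save(stream);      // XDocument.Save(Stream) overload
            stream.Position = 0;    // rewind before reading the saved copy back
            XDocument reloaded = XDocument.Load(stream);
            return reloaded.Root.Elements("customer").Select(e => (string)e).ToArray();
        }
    }
}
```

Reading from any other location (such as the original packaged file) would still show only Joe and Barrel.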
How do I know whether my XML file has any data besides the XML declaration?
Some of the files contain only this:
<?xml version="1.0" encoding="UTF-8"?>
If I encounter such a file, I want to move it to an error directory.
You could use XmlReader to avoid the overhead of XmlDocument. In your case, reading the file will throw an exception because the root element is missing.
string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
using (StringReader strReader = new StringReader(xml))
{
//You can replace the StringReader object with the path of your xml file.
//In that case, do not forget to remove the "using" lines above.
using (XmlReader reader = XmlReader.Create(strReader))
{
try
{
while (reader.Read())
{
}
}
catch (XmlException ex)
{
//Catch xml exception
//in your case: root element is missing
}
}
}
You can add a condition inside the while (reader.Read()) loop to stop once you have checked the first nodes. Since you only want to know whether the root element is missing, there is no need to read the entire XML file.
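A sketch of that early exit, assuming you only care whether a root element exists: XmlReader.MoveToContent skips the declaration and other prolog nodes and stops on the first content node, so nothing beyond the root start tag is parsed. HasRootElement is an illustrative name:

```csharp
using System.IO;
using System.Xml;

public static class EmptyXmlCheck
{
    // True if the input contains a root element; reading stops at the root start tag.
    public static bool HasRootElement(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                // Skips the declaration, comments, processing instructions and whitespace.
                return reader.MoveToContent() == XmlNodeType.Element;
            }
        }
        catch (XmlException)
        {
            // Thrown for "Root element is missing" as well as for a malformed prolog.
            return false;
        }
    }
}
```

For a file, pass a StreamReader over the path; files for which this returns false are the ones to move to the error directory.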
I think the only way is to catch an exception when you try and load it, like this:
try
{
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load(Server.MapPath("XMLFile.xml"));
}
catch (System.Xml.XmlException xmlEx)
{
if (xmlEx.Message.Contains("Root element is missing"))
{
// Xml file is empty
}
}
Yes, there is some overhead, but you should be performing sanity checks like this anyway. You should never trust input, and the only reliable way to verify that it is XML is to treat it as XML and see what .NET says about it!
XmlDocument xDoc = new XmlDocument();
xDoc.Load(path); // path: your file (hypothetical name); Load throws XmlException if the root element is missing
if (xDoc.ChildNodes.Count == 0)
{
// xml document is empty
}
if (xDoc.ChildNodes.Count == 1)
{
// only the declaration node is present (if you are sure the declaration is always at the beginning)
}
if (xDoc.ChildNodes.Count > 1)
{
// there is the declaration + n nodes (usually the count is 2: declaration + root node)
}
I haven't tried this, but it should work:
try
{
XmlDocument doc = new XmlDocument();
doc.Load("test.xml");
}
catch (XmlException exc)
{
//invalid file
}
EDIT: Based on feedback comments
For large XML documents see Thomas's answer. This approach can have performance issues.
But if the file is valid XML and the program needs to process it anyway, this approach seems better.
If you aren't worried about validity, just check whether there is anything after the first ?>. I'm not entirely sure of the C# syntax (it's been too long since I used it), but read the file, find the first occurrence of ?>, and see if there is anything after that index.
However, if you want to use or process the XML later, you should consider PK's answer and load the XML into an XmlDocument object. But if you have large XML documents that you don't need to process, a solution more like mine, reading the file as text, might have less overhead.
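The text-only check described above could look like this. It is a heuristic sketch (HasContentAfterDeclaration is a made-up name): it does not parse XML, treats a comment after the declaration as content, and the file wrapper reads the whole file into memory:

```csharp
using System;
using System.IO;

public static class DeclarationOnlyCheck
{
    // True if any non-whitespace character follows the first "?>".
    // If there is no "?>" at all, the whole text is checked instead.
    // Nothing here verifies that the remaining content is well-formed XML.
    public static bool HasContentAfterDeclaration(string text)
    {
        int end = text.IndexOf("?>", StringComparison.Ordinal);
        int start = end < 0 ? 0 : end + 2;
        for (int i = start; i < text.Length; i++)
        {
            if (!char.IsWhiteSpace(text[i]))
                return true;
        }
        return false;
    }

    // Hypothetical file-based wrapper.
    public static bool FileHasContentAfterDeclaration(string path) =>
        HasContentAfterDeclaration(File.ReadAllText(path));
}
```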
You could check whether the XML document has a node (the root node) and whether that node has inner text or other children.
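A sketch of that check using LINQ to XML, assuming the documents are small enough to parse fully. Note that a declaration-only document has no root node at all, so parsing it throws an XmlException, which this sketch reports as "no data". RootHasData is a hypothetical name:

```csharp
using System.Xml;
using System.Xml.Linq;

public static class RootContentCheck
{
    // True if the root element has child elements or non-empty text.
    public static bool RootHasData(string xml)
    {
        try
        {
            XElement root = XDocument.Parse(xml).Root;
            return root != null && (root.HasElements || !string.IsNullOrEmpty(root.Value));
        }
        catch (XmlException)
        {
            // Declaration-only or otherwise malformed input.
            return false;
        }
    }
}
```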
As long as you aren't concerned with the validity of the XML document, and only want to ensure that it has a tag other than the declaration, you could use simple text processing:
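using System.IO;
using System.Text.RegularExpressions;
var regEx = new Regex("<[A-Za-z]");
bool foundTags = false;
string curLine = "";
using (var reader = new StreamReader(fileName)) {
while (!reader.EndOfStream) {
curLine = reader.ReadLine();
// IsMatch returns a bool; Match returns a Match object, which cannot be used as an if condition
if (regEx.IsMatch(curLine)) {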
foundTags = true;
break;
}
}
}
if (!foundTags) {
// file is bad, copy.
}
Keep in mind that there are many other reasons the file may be invalid, and the code above would accept a file consisting only of "<a". If your intent is to validate that the XML document can actually be read, you should use the XmlDocument approach.
How can you do a streaming read on a large XML file that contains an xs:sequence just below the root element, without loading the whole file into an XDocument instance in memory?
Going with a SAX-style element parser and an XmlReader created with XmlReader.Create would be a good idea, yes. Here's a slightly modified code example from CodeGuru:
using System;
using System.Collections;
using System.Xml;
void ParseURL(string strUrl)
{
try
{
using (var reader = XmlReader.Create(strUrl))
{
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
var attributes = new Hashtable();
var strURI = reader.NamespaceURI;
var strName = reader.Name;
if (reader.HasAttributes)
{
for (int i = 0; i < reader.AttributeCount; i++)
{
reader.MoveToAttribute(i);
attributes.Add(reader.Name,reader.Value);
}
}
StartElement(strURI,strName,strName,attributes); // your own SAX-style handler, not shown here
break;
//
//you can handle other cases here
//
//case XmlNodeType.EndElement:
// Todo
//case XmlNodeType.Text:
// Todo
default:
break;
}
}
}
}
catch (XmlException e)
{
Console.WriteLine("error occurred: " + e.Message);
}
}
I can't add a comment since I just signed up, but the code sample posted by Hirvox, currently selected as the answer, has a bug: it should not use the new keyword with the static Create method.
Current:
using (var reader = new XmlReader.Create(strUrl))
Fixed:
using (var reader = XmlReader.Create(strUrl))
I don't think it's possible if you want to use the object model (i.e. XElement/XDocument) to query the XML. Obviously, you can't build an XML object tree without reading the data. However, you can use the XmlReader class.
The XmlReader class reads XML data from a stream or file. It provides non-cached, forward-only, read-only access to XML data.
Here is a how-to: http://support.microsoft.com/kb/301228/en-us. Just remember that you should not use XmlTextReader directly; use XmlReader together with XmlReader.Create instead.
I'm confused by the mention of "xs:sequence"; this is an XML Schema element.
Are you trying to open a large XML Schema file? Are you opening a large XML file that is based on that schema? Or are you trying to open a large XML file and validate it at the same time?
None of these situations should cause you a problem when using the standard XmlReader (or XmlValidatingReader).
Reading XML with XMLReader: http://msdn.microsoft.com/en-us/library/9d83k261(VS.80).aspx
That code sample tries to turn XmlReader-style code into SAX-style code. If you're writing code from scratch, I'd just use XmlReader as it was intended: pull, not push.
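A pull-style version of the streaming read: the reader is advanced in a loop and each repeating element is materialized with XNode.ReadFrom, one at a time, so only the current element lives in memory. The element name is a placeholder for whatever element the xs:sequence repeats, and StreamItems is an illustrative name:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class PullStreaming
{
    // Yields each element named itemName (e.g. the repeating child of the root) as an XElement.
    public static IEnumerable<XElement> StreamItems(TextReader input, string itemName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == itemName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on the
                    // node after its end tag, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Consume the result with foreach; calling ToList() on a huge file would bring back the memory problem.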