Efficient Way to Parse XML - c#

I find it puzzling to determine the best way to parse some XML. It seems there are so many possible ways, and none have really clicked with me.
My current attempt looks something like this:
XElement xelement = XElement.Parse(xmlText);
var name = xelement.Element("Employee").Attribute("name").Value;
So, this works. But it throws an exception if either the "Employee" element or the "name" attribute is missing. I don't want to throw an exception.
Exploring some examples available online, I see code like this:
XElement xelement = XElement.Load("..\\..\\Employees.xml");
IEnumerable<XElement> employees = xelement.Elements();
Console.WriteLine("List of all Employee Names :");
foreach (var employee in employees)
{
Console.WriteLine(employee.Element("Name").Value);
}
This would seem to suffer from the exact same issue. If the "Name" element does not exist, Element() returns null and there is an error calling the Value property.
I need a number of blocks like the first code snippet above. Is there a simple way to have it work and not throw an exception if some data is missing?

You can use the combination of the explicit string conversion from XAttribute to string (which will return null if the operand is null) and the FirstOrDefault method:
var name = xelement.Elements("Employee")
.Select(x => (string) x.Attribute("name"))
.FirstOrDefault();
That will be null if either there's no such element (because the sequence will be empty, and FirstOrDefault() will return null) or there's an element without the attribute (in which case you'll get a sequence with a null element, which FirstOrDefault will return).
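The three cases described above can be sketched as follows. This is only an illustration: the helper name FirstEmployeeName and the sample XML in the usage note are made up for the example.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class EmployeeNameDemo
{
    // Hypothetical helper: returns the "name" attribute of the first
    // <Employee> child, or null when the element or the attribute is missing.
    public static string FirstEmployeeName(string xmlText)
    {
        XElement root = XElement.Parse(xmlText);
        return root.Elements("Employee")
                   .Select(x => (string)x.Attribute("name")) // null when the attribute is absent
                   .FirstOrDefault();                        // null when the sequence is empty
    }
}
```

With this sketch, FirstEmployeeName("<Root><Employee name='Ann'/></Root>") gives "Ann", while both "<Root><Employee/></Root>" and "<Root/>" give null without throwing.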

I often use extension methods in cases like this, as they work even if the reference is null. I use a slightly modified version of the extension methods from Anders Abel's very good blog post from early 2012, 'Null Handling with Extension Methods':
public static class XElementExtension
{
public static string GetValueOrDefault(this XAttribute attribute,
string defaultValue = null)
{
return attribute == null ? defaultValue : attribute.Value;
}
public static string GetAttributeValueOrDefault(this XElement element,
string attributeName,
string defaultValue = null)
{
return element == null ? defaultValue : element.Attribute(attributeName)
.GetValueOrDefault(defaultValue);
}
}
If you want to return 'null' if the element or attribute doesn't exist:
var name = xelement.Element("Employee")
.GetAttributeValueOrDefault("name" );
If you want to return a default value if the element or attribute doesn't exist:
var name = xelement.Element("Employee")
.GetAttributeValueOrDefault("name","this is the default value");
To use in your for loop:
XElement xelement = XElement.Load("..\\..\\Employees.xml");
IEnumerable<XElement> employees = xelement.Elements();
Console.WriteLine("List of all Employee Names :");
foreach (var employee in employees)
{
Console.WriteLine(employee.GetAttributeValueOrDefault("Name"));
}

You could always use XPath:
string name = xelement.XPathEvaluate("string(Employee/@name)") as string;
This will be either the value of the attribute, or an empty string if either Employee or @name does not exist (the XPath string() function returns "" for an empty node-set).
And for the iterative example:
foreach (XElement item in (IEnumerable)xelement.XPathEvaluate("Employee/Name"))
{
Console.WriteLine(item.Value);
}
XPathEvaluate() will only select valid nodes here, so you can be assured that item will always be non-null.
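A minimal sketch of both XPath lookups, assuming made-up helper names (EmployeeNameOrNull, EmployeeNames). It uses XPathSelectElements for the iterative case, which returns XElement items directly, and maps the empty string that XPath's string() yields for a missing node back to null:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathDemo
{
    // Hypothetical helper: the "name" attribute of the first <Employee>, or null.
    // Note: an attribute that is present but empty is also reported as null here.
    public static string EmployeeNameOrNull(XElement root)
    {
        // string(...) of an empty node-set is "", never null.
        var value = (string)root.XPathEvaluate("string(Employee/@name)");
        return value.Length == 0 ? null : value;
    }

    // Hypothetical helper: the text of every Employee/Name element that exists.
    public static List<string> EmployeeNames(XElement root)
    {
        return root.XPathSelectElements("Employee/Name")
                   .Select(e => e.Value)
                   .ToList();
    }
}
```

Employees without a Name child simply contribute nothing to the list, so no null check is needed inside the loop.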

It all depends on what you want to do with the data once you've extracted it from the XML.
You would do well to look at languages that are designed for XML processing, such as XSLT and XQuery, rather than using languages like C#, which aren't (though Linq gives you something of a hybrid). Using C# or Java you're always going to have to do a lot of work to cope with the fact that XML is so flexible.

Use the native XmlReader. If your problem is reading large XML files, then instead of letting XElement build an in-memory object representation, you can build something like a Java SAX parser that only streams the XML.
Ex:
http://www.codeguru.com/csharp/csharp/cs_data/xml/article.php/c4221/Writing-XML-SAX-Parsers-in-C.htm
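As a rough sketch of the streaming approach (not taken from the linked article), the following reads Employee elements one at a time with XmlReader. The helper name ReadEmployeeNames and the element and attribute names are assumptions carried over from the question:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingDemo
{
    // Hypothetical helper: streams through the XML and collects the "name"
    // attribute of every <Employee> element that has one.
    public static List<string> ReadEmployeeNames(TextReader input)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Employee")
                {
                    // GetAttribute returns null when the attribute is absent.
                    string name = reader.GetAttribute("name");
                    if (name != null)
                    {
                        names.Add(name);
                    }
                }
            }
        }
        return names;
    }
}
```

It could be called as StreamingDemo.ReadEmployeeNames(new StringReader(xmlText)), or with a StreamReader over a large file, since the document is never loaded as a whole.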

Related

Check if an elements exist while parsing xml "Error No Sequence element"

I have an XML file with different Set tags. I want to check whether an element exists, because I am getting a "no sequence element" error. How do I check if the tag szSerialNmbr is present and, if it is not, assign a null value or skip the Transaction descendants?
I went through other posts, but I am getting the error "Extension method must be defined in a non-generic static class".
XDocument xDocument = XDocument.Load(file);
foreach (var trans in xDocument.Descendants("Transaction"))
{
var val1 = (string)trans.Descendants("Set").Elements("szSerialNmbr").First();
var val2 = (string)trans.Descendants("Set").Elements("lMediaNmbr").First();
var val3 = (string)trans.Descendants("Set").Elements("lMediaMember").First();
}
You can do a null check by adding a ?. Also, if you are not sure whether your set will contain the object you are looking for in LINQ, you can use FirstOrDefault to allow null as a return value.
var val1 = (string)trans?.Descendants("Set")?.Elements("szSerialNmbr")?.FirstOrDefault()?.Value;

Find elements by attribute name and its value using XDocument

Using .NET 3.5 and XDocument, I am trying to find the <table class='imgcr'> element. I created the code below, but it crashes, of course, because e.Attribute("class") may be null. So do I have to put a null check everywhere? That would mean calling e.Attribute("class") twice, which is not a laconic solution at all.
XElement table =
d.Descendants("table").
SingleOrDefault(e => e.Attribute("class").Value == "imgcr");
If you are sure the exception is thrown because your table element may come without a class attribute, then you could do this instead:
XElement table =
d.Descendants("table").
SingleOrDefault(e => ((string)e.Attribute("class")) == "imgcr");
In that case you are casting a null XAttribute to string, which yields null, so you end up comparing null == "imgcr", which is false.
You can check this MSDN page if you need more information about how to retrieve the value of an attribute. There you will find this statement:
You can cast an XAttribute to the desired type; the explicit
conversion operator then converts the contents of the element or
attribute to the specified type.
I guess this is quite short:
XElement table =
d.Descendants("table").
SingleOrDefault(e => { var x = e.Attribute("class"); return x==null ? false: x.Value == "imgcr";});
This is shorter (but not by much, unless you can reuse the t variable):
XAttribute t = new XAttribute("class","");
XElement table =
d.Descendants("table").
SingleOrDefault(e => (e.Attribute("class") ?? t).Value == "imgcr");

If Element does not exist

I have around a dozen solutions to this, but none seem to fit what I am trying to do. The XML file has elements that may not be in the file each time it is posted.
The trick is, the query is dependent upon a question value to get the answer value. Here is the code:
string otherphone = (
from e in contact.Descendants("DataElement")
where e.Element("QuestionName").Value == "other_phone"
select (string)e.Element("Answer").Value
).FirstOrDefault();
otherphone = (!String.IsNullOrEmpty(otherphone)) ? otherphone.Replace("'", "''") : null;
Under the "contact" collection, there are many elements named "DataElement", each with its own "QuestionName" and "Answer" elements, so I query to find the one where the element's QuestionName value is "other_phone", then I get the Answer value. Of course I will need to do this for each value I am seeking.
How can I code this to ignore the DataElement containing QuestionName with value of "other_phone" if it doesn't exist?
You can use the Any method to check whether or not the element exists:
if(contact.Descendants("DataElement")
.Any(e => (string)e.Element("QuestionName") == "other_phone"))
{
var otherPhone = (string)contact
.Descendants("DataElement")
.First(e => (string)e.Element("QuestionName") == "other_phone")
.Element("Answer");
}
Also, don't use the Value property if you are using the explicit cast. The point of the explicit cast is to avoid the possible exception if the element wasn't found. If you use both, accessing the Value property will throw the exception before the cast is ever applied.
Alternatively, you can also just use the FirstOrDefault method without Any, and perform a null-check:
var element = contact
.Descendants("DataElement")
.FirstOrDefault(e => (string)e.Element("QuestionName") == "other_phone");
if(element != null)
{
var otherPhone = (string)element.Element("Answer");
}
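Since the question mentions repeating this lookup for many values, the cast-based approach can be wrapped in a small helper. This is only a sketch; the class and method names (DataElementHelper, GetAnswer) are made up for illustration:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DataElementHelper
{
    // Hypothetical helper: returns the <Answer> text for the given question
    // name, or null when no matching <DataElement> (or no <Answer>) exists.
    public static string GetAnswer(XElement contact, string questionName)
    {
        return contact.Descendants("DataElement")
                      // The cast yields null for a missing <QuestionName>, so no exception.
                      .Where(e => (string)e.Element("QuestionName") == questionName)
                      .Select(e => (string)e.Element("Answer"))
                      .FirstOrDefault();
    }
}
```

A call such as DataElementHelper.GetAnswer(contact, "other_phone") then returns null when the question is absent, and the existing Replace("'", "''") step can be applied only when the result is non-empty.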
So you want to know if other_phone exists or not?
XElement otherPhone = contact.Descendants("QuestionName")
.FirstOrDefault(qn => ((string)qn) == "other_phone");
if (otherPhone == null)
{
// No question with "other_phone"
}
else
{
string answer = (string)otherPhone.Parent.Element("Answer");
}

How to check multiple XMLNode Attributes for Null Value?

I am trying to read multiple attributes from an xml file using XMLNode, but depending on the element, the attribute might not exist. In the event the attribute does not exist, if I try to read it into memory, it will throw a null exception. I found one way to test if the attribute returns null:
var temp = xn.Attributes["name"].Value;
if (temp == null)
{ txtbxName.Text = ""; }
else
{ txtbxName.Text = temp; }
This seems like it will work for a single instance, but if I am checking 20 attributes that might not exist, I'm hoping there is a way to set up a method I can pass the value to in order to test whether it is null. From what I have read, you can't pass a var as it is locally initialized, but is there a way I could set up a test to pass a potentially null value to be tested, then return the value if it is not null, and return "" if it is null? Is it possible, or would I have to test each value individually as outlined above?
You can create a method like this:
public static string GetText(XmlNode xn, string attrName)
{
var attr = xn.Attributes[attrName];
if (attr == null) // The attribute does not exist at all
return string.Empty;
var temp = attr.Value;
if (temp == null)
return string.Empty;
return temp;
}
And call it like this:
txtbxName.Text = GetText(xn, "name");
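On C# 6 or later, the same checks can be written more compactly with the null-conditional and null-coalescing operators. This is a sketch of an equivalent helper, with XmlNodeHelper as an illustrative class name:

```csharp
using System.Xml;

public static class XmlNodeHelper
{
    // Returns the attribute's value, or "" when the node, its attribute
    // collection, the attribute itself, or its value is null.
    public static string GetText(XmlNode xn, string attrName)
    {
        return xn?.Attributes?[attrName]?.Value ?? string.Empty;
    }
}
```

The call site stays the same shape: txtbxName.Text = XmlNodeHelper.GetText(xn, "name");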
If you use an XDocument you could just use Linq to find all the nodes you want.
var names = (from attr in doc.Document.Descendants().Attributes()
where attr.Name == "name"
select attr).ToList();
If you are using XmlDocument for some reason, you could select the nodes you want using XPath. (My XPath is rusty).
var doc = new XmlDocument();
doc.Load("the file");
var names = doc.SelectNodes("//@name");

Extension method for Null handling not working on linq for xml

I get a NullReferenceException when trying to get the value of an XML tag that is under a subtree that may not be there.
The extension method works great when it can't find a tag in an existing subtree, but it does not seem to handle looking for a tag in a subtree that doesn't exist.
In this case, the subtree is summaryData, which may or may not be there. Trying to get addressLine1 is where it doesn't handle the null, and I get the error
System.NullReferenceException occurred, Message=Object reference not
set to an instance of an object.
Here is the xml, cut down for clarity, but structure is correct:
<record>
<accounts>
<account >
</account >
</accounts>
<summaryData>
<Date>2013-02-04</Date>
<serviceAddress>
<city>Little Rock</city>
<postalCode>00000</postalCode>
<state>AR</state>
<addressLine1>Frank St</addressLine1>
</serviceAddress>
</summaryData>
</record>
My C# code is:
from c in xmlDoc.Descendants("account")
//select (string)c.Element("account") ;
select new
{
//this works fine
Stuffinxml = c.Element("stuffinxml").Value,
//this field may not be there, but the handler handles the missing element correctly here when it has the correct root (account)
otherstuff = CustXmlHelp.GetElementValue(c.Element("otherstuff")),
//this is the problem, where the summaryData root does not exist (or moved somewhere else)
street_address = GetElementValue(c.Element("summaryData").Element("serviceAddress").Element("addressLine1"))
};
My extension method to handle a null is:
public static string GetElementValue(this XElement element)
{
if (element != null)
{
return element.Value;
}
else
{
return string.Empty;
}
}
Any help would be appreciated, as I can't see why it fails when the subtree does not exist.
The summary data may or may not be there
That's why. As you're nesting calls, you'll have to null check them all.
Any one of these could be null:
c.Element("summaryData").Element("serviceAddress").Element("addressLine1")
Without a complex conditional, there's not a nice way around it:
street_address = c.Element("summaryData") != null
? c.Element("summaryData").Element("serviceAddress") != null
? GetElementValue(c.Element("summaryData").Element("serviceAddress").Element("addressLine1"))
: string.Empty
: string.Empty;
If the summaryData element does not exist then
c.Element("summaryData").Element("serviceAddress").Element("addressLine1")
will throw a NullReferenceException because you're trying to call Element() on a null reference (c.Element("summaryData"))
As has been stated, your exception is due to the fact that you are chaining multiple nested queries:
c.Element("summaryData").Element("serviceAddress").Element("addressLine1")
is the equivalent of writing:
var x1 = c.Element("summaryData");
var x2 = x1.Element("serviceAddress");
var x3 = x2.Element("addressLine1");
So if any of c, x1, or x2 is null, you are going to get a NullReferenceException.
One possible alternative to a pure LINQ solution using multiple null checks, is to use XPath to build the expression.
With XPath, rather than doing a null check at every level, you can instead write your expression:
street_address = GetElementValue(c.XPathSelectElement("summaryData/serviceAddress/addressLine1"))
This will evaluate the entire expression relative to c, and if the path does not exist in its entirety, it will return null rather than throw an exception like your pure LINQ query.
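Another option that stays within LINQ to XML is to chain Elements() instead of Element(): each call operates on a (possibly empty) sequence, so a missing level simply produces an empty sequence rather than a null reference. A minimal sketch, with NestedLookup and StreetAddress as made-up names:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NestedLookup
{
    // Hypothetical helper: returns the addressLine1 text under
    // summaryData/serviceAddress, or "" when any level is missing.
    public static string StreetAddress(XElement c)
    {
        return (string)c.Elements("summaryData")
                        .Elements("serviceAddress")
                        .Elements("addressLine1")
                        .FirstOrDefault() ?? string.Empty;
    }
}
```

Inside the query this would read street_address = NestedLookup.StreetAddress(c), with no nested conditionals.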
