Parsing a UN XML file in C#

I'm trying to parse an XML file from the UN website (http://www.un.org/sc/committees/1267/AQList.xml) using C#.
The problem I keep running into is that the number of child tags varies from one <INDIVIDUAL> tag to another. One example is the <FOURTH_NAME> child tag, which is not always present.
I've tried a number of different approaches, but I always get stuck on that same issue.
What I'm trying to achieve is to collect all the tags and their values under one <INDIVIDUAL> tag, and then insert only those I want into my database. If a tag is missing, for example <FOURTH_NAME>, then I need to insert only the first three names into the database and skip the fourth.
I've tried using LINQ to XML, and here are some examples:
XDocument xdoc = XDocument.Load(path);
var tags = (from t in xdoc.Descendants("INDIVIDUALS")
from a in t.Elements("INDIVIDUAL")
select new
{
Tag = a.Name,
val = a.Value
});
foreach (var obj in tags)
{
Console.WriteLine(obj.Tag + " - " + obj.val + "\t");
//insert SQL goes here
}
or this one, which only returns individuals that have a FOURTH_NAME tag:
var q = (from c in xdoc.Descendants("INDIVIDUAL")
from _1 in c.Elements("FIRST_NAME")
from _2 in c.Elements("SECOND_NAME")
from _3 in c.Elements("THIRD_NAME")
from _4 in c.Elements("FOURTH_NAME")
where _1 != null && _2 != null && _3 != null && _4 != null
select new
{
_1 = c.Element("FIRST_NAME").Value,
_2 = c.Element("SECOND_NAME").Value,
_3 = c.Element("THIRD_NAME").Value,
_4 = c.Element("FOURTH_NAME").Value
});
foreach (var obj in q)
{
Console.WriteLine("Person: " + obj._1 + " - " + obj._2 + " - " + obj._3 + " - " + obj._4);
//insert SQL goes here
}
Any ideas??

Instead of calling Value on the element, consider casting it to string. Calling Value on a missing element throws a NullReferenceException, whereas LINQ to XML's explicit string conversion returns null if the element doesn't exist. Try the following:
var data = XElement.Load(@"http://www.un.org/sc/committees/1267/AQList.xml");
var individuals = data.Descendants("INDIVIDUAL")
.Select(i => new {
First = (string)i.Element("FIRST_NAME"),
Middle = (string)i.Element("SECOND_NAME"),
Last = (string)i.Element("THIRD_NAME")
});
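To cover the FOURTH_NAME case from the question, the same cast can be applied to all four name elements and the missing ones skipped. This is only a minimal sketch: NameReader and ReadFullNames are hypothetical names, and it assumes the name elements are direct children of INDIVIDUAL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NameReader
{
    // Hypothetical helper: builds one full name per INDIVIDUAL from whichever name parts exist.
    public static List<string> ReadFullNames(string xml)
    {
        XElement root = XElement.Parse(xml);
        return root.Descendants("INDIVIDUAL")
            .Select(i => new[]
            {
                (string)i.Element("FIRST_NAME"),   // the cast yields null when the element is absent
                (string)i.Element("SECOND_NAME"),
                (string)i.Element("THIRD_NAME"),
                (string)i.Element("FOURTH_NAME")
            })
            // Drop missing or empty parts before joining, so a missing fourth name is simply skipped.
            .Select(parts => string.Join(" ", parts.Where(p => !string.IsNullOrWhiteSpace(p))))
            .ToList();
    }
}
```

Instead of joining the parts, you could pass the individual values (or DBNull for the null ones) to your insert statement.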
If you want to be more flexible and get all of the name fields, you can do something like the following. (I'll leave the process of grouping individuals as an additional homework assignment ;-)
data.Descendants("INDIVIDUAL").Elements()
.Where (i =>i.Name.LocalName.EndsWith("_NAME" ))
.Select(i => new { FieldName= i.Name.LocalName, Value=i.Value});
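For what it's worth, one possible way to do the grouping is to build one dictionary of name fields per INDIVIDUAL. This is a sketch, not the only approach: NameFieldGrouping and ReadNameFields are hypothetical names, and it assumes each *_NAME element occurs at most once directly under an INDIVIDUAL (ToDictionary throws on duplicate keys).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NameFieldGrouping
{
    // Hypothetical helper: one dictionary (field name -> value) per INDIVIDUAL element.
    public static List<Dictionary<string, string>> ReadNameFields(XElement data)
    {
        return data.Descendants("INDIVIDUAL")
            .Select(individual => individual.Elements()
                // Keep only direct children whose name ends in _NAME.
                .Where(e => e.Name.LocalName.EndsWith("_NAME", StringComparison.Ordinal))
                .ToDictionary(e => e.Name.LocalName, e => e.Value))
            .ToList();
    }
}
```

A missing tag then shows up as a missing key, which you can test with ContainsKey or TryGetValue before inserting.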

Why don't you use XmlSerializer and LINQ instead?
As explained in this answer, generate your classes in Visual Studio by pasting the XML into a new CS file:
menu EDIT > Paste Special > Paste XML As Classes.
Then grab your data as easily as follows:
var serializer = new XmlSerializer(typeof (CONSOLIDATED_LIST));
using (FileStream fileStream = File.OpenRead(@"..\..\aqlist.xml"))
{
var list = serializer.Deserialize(fileStream) as CONSOLIDATED_LIST;
if (list != null)
{
var enumerable = list.INDIVIDUALS.Select(s => new
{
FirstName = s.FIRST_NAME,
SecondName = s.SECOND_NAME,
ThirdName = s.THIRD_NAME,
FourthName = s.FOURTH_NAME
});
}
}
You can then specify any predicate that better suits your needs.
Going this route saves a lot of time and is less error-prone: the fields are strongly typed, so there is no need to access them through strings.
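If you would rather not use the generated classes, a hand-written subset also works, since XmlSerializer ignores elements it has no member for and leaves string properties null when their element is missing. The sketch below is an assumption about the file's layout (a CONSOLIDATED_LIST root with an INDIVIDUALS wrapper, as the generated class names suggest); ConsolidatedList, Individual and ListLoader are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("CONSOLIDATED_LIST")]
public class ConsolidatedList
{
    // Maps <INDIVIDUALS><INDIVIDUAL>...</INDIVIDUAL></INDIVIDUALS> to a list.
    [XmlArray("INDIVIDUALS")]
    [XmlArrayItem("INDIVIDUAL")]
    public List<Individual> Individuals { get; set; } = new List<Individual>();
}

public class Individual
{
    [XmlElement("FIRST_NAME")] public string FirstName { get; set; }
    [XmlElement("SECOND_NAME")] public string SecondName { get; set; }
    [XmlElement("THIRD_NAME")] public string ThirdName { get; set; }
    // Stays null when the element is not present in the file.
    [XmlElement("FOURTH_NAME")] public string FourthName { get; set; }
}

public static class ListLoader
{
    // Hypothetical helper: deserializes the XML text into the classes above.
    public static ConsolidatedList Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(ConsolidatedList));
        using (var reader = new StringReader(xml))
        {
            return (ConsolidatedList)serializer.Deserialize(reader);
        }
    }
}
```

A null FourthName then tells you the tag was missing, so you can skip it when inserting.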

Related

Getting list of Variables of map in BPM Metastorm

I'm trying to get the list of variables in a map automatically, from OUTSIDE the program. I know I can find them in the .process file, which has an XML structure.
I also figured out that an "x:object" holding a variable has an "x:Type" ending with "MboField}".
Unfortunately I need to narrow the search criteria further, because I still can't find the main pattern that separates variables from other objects.
This is my current code in C#:
var xdoc = XDocument.Load(patches.ProcessFilePatch);
var xmlns = XNamespace.Get("http://schema.metastorm.com/Metastorm.Common.Markup");
IEnumerable<string> values = from x in xdoc.Descendants(xmlns+"Object")
where x.Attribute(xmlns+"Type").Value.ToString().EndsWith("MboField}")
select x.Attribute(xmlns+"Name").Value.ToString();
VariablesInProcessFile = values.ToList();
Any other ways to find Variables among others?

private void getVariablesInProcessFile()
{
var xdoc = XDocument.Load(patches.ProcessFilePatch);
var xmlns = XNamespace.Get("http://schema.metastorm.com/Metastorm.Common.Markup");
var dane = xdoc.Descendants(xmlns + "Object").Where(x => CheckAttributes(x, xmlns)).ToArray();
IEnumerable<string> valuesE = from x in dane.Descendants(xmlns + "Object")
where x.Attribute(xmlns + "Type").Value.ToString().EndsWith("MboField}")
select x.Attribute(xmlns + "Name").Value.ToString();
VariablesInProcessFile = valuesE.ToList();
}
private bool CheckAttributes(XElement x, XNamespace xmlns)
{
var wynik = x.Attribute(xmlns + "Name");
return wynik != null && (wynik.Value == patches.MapName + "Data" || wynik.Value == patches.altMapName + "Data");
}
Here, "patches" is my own class containing the path to the .process file and the possible names of the group of variables, usually related to the name of the map.

C# Linq get descendants on a subquery

I've been banging my head on the desktop for the past couple of hours trying to decipher this issue.
I'm trying to query an XML file with Linq, the xml has the following format:
<MRLGroups>
<MRLGroup>
<MarketID>6084</MarketID>
<MarketName>European Union</MarketName>
<ActiveIngredientID>28307</ActiveIngredientID>
<ActiveIngredientName>2,4-DB</ActiveIngredientName>
<IndexCommodityID>59916</IndexCommodityID>
<IndexCommodityName>Cucumber</IndexCommodityName>
<ScientificName>Cucumis sativus</ScientificName>
<MRLs>
<MRL>
<PublishedCommodityID>60625</PublishedCommodityID>
<PublishedCommodityName>Cucumbers</PublishedCommodityName>
<MRLTypeID>238</MRLTypeID>
<MRLTypeName>General</MRLTypeName>
<DeferredToMarketID>6084</DeferredToMarketID>
<DeferredToMarketName>European Union</DeferredToMarketName>
<UndefinedCommodityLinkInd>false</UndefinedCommodityLinkInd>
<MRLValueInPPM>0.0100</MRLValueInPPM>
<ResidueDefinition>2,4-DB</ResidueDefinition>
<AdditionalRegulationNotes>Comments.</AdditionalRegulationNotes>
<ExpiryDate xsi:nil="true" />
<PrimaryInd>true</PrimaryInd>
<ExemptInd>false</ExemptInd>
</MRL>
<MRL>
<PublishedCommodityID>60626</PublishedCommodityID>
<PublishedCommodityName>Gherkins</PublishedCommodityName>
<MRLTypeID>238</MRLTypeID>
<MRLTypeName>General</MRLTypeName>
<DeferredToMarketID>6084</DeferredToMarketID>
<DeferredToMarketName>European Union</DeferredToMarketName>
<UndefinedCommodityLinkInd>false</UndefinedCommodityLinkInd>
<MRLValueInPPM>0.0100</MRLValueInPPM>
<ResidueDefinition>2,4-DB</ResidueDefinition>
<AdditionalRegulationNotes>More Comments.</AdditionalRegulationNotes>
<ExpiryDate xsi:nil="true" />
<PrimaryInd>false</PrimaryInd>
<ExemptInd>false</ExemptInd>
</MRL>
</MRLs>
</MRLGroup>
So far I've created classes for the "MRLGroup" section of the file:
var queryMarket = from market in doc.Descendants("MRLGroup")
select new xMarketID
{
MarketID = Convert.ToString(market.Element("MarketID").Value),
MarketName = Convert.ToString(market.Element("MarketName").Value)
};
List<xMarketID> markets = queryMarket.Distinct().ToList();
var queryIngredient = from ingredient in doc.Descendants("MRLGroup")
select new xActiveIngredients
{
ActiveIngredientID = Convert.ToString(ingredient.Element("ActiveIngredientID").Value),
ActiveIngredientName = Convert.ToString(ingredient.Element("ActiveIngredientName").Value)
};
List<xActiveIngredients> ingredientes = queryIngredient.Distinct().ToList();
var queryCommodities = from commodity in doc.Descendants("MRLGroup")
select new xCommodities {
IndexCommodityID = Convert.ToString(commodity.Element("IndexCommodityID").Value),
IndexCommodityName = Convert.ToString(commodity.Element("IndexCommodityName").Value),
ScientificName = Convert.ToString(commodity.Element("ScientificName").Value)
};
List<xCommodities> commodities = queryCommodities.Distinct().ToList();
Once I have these "catalogues", I'm trying to query the document against them to build some sort of "groups", and after that I'm going to send the data to the database. The issue is that the XML files are around 600MB each and I get them every day, so my approach is to create the catalogues and send only the MRLs to the database, joined to a "header" table that contains the catalogue IDs. Here's what I've done so far, which failed miserably:
//markets
foreach (xMarketID market in markets) {
//ingredients
foreach (xActiveIngredients ingredient in ingredientes) {
//commodities
foreach (xCommodities commodity in commodities) {
var mrls = from m in doc.Descendants("MRLGroup")
where Convert.ToString(m.Element("MarketID").Value) == market.MarketID
&& Convert.ToString(m.Element("ActiveIngredientID").Value) == ingredient.ActiveIngredientID
&& Convert.ToString(m.Element("IndexCommodityID").Value) == commodity.IndexCommodityID
select new
{
ms = new List<xMRLIndividial>(from a in m.Element("MRLs").Descendants()
select new xMRLIndividial{
publishedCommodityID = string.IsNullOrEmpty(a.Element("PublishedCommodityID").Value) ? "" : a.Element("PublishedCommodityID").Value,
publishedCommodityName = a.Element("PublishedCommodityName").Value,
mrlTypeId = a.Element("MRLTypeID").Value,
mrlTypeName = a.Element("MRLTypeName").Value,
deferredToMarketId = a.Element("DeferredToMarketID").Value,
deferredToMarketName = a.Element("DeferredToMarketName").Value,
undefinedCommodityLinkId = a.Element("UndefinedCommodityLinkInd").Value,
mrlValueInPPM = a.Element("MRLValueInPPM").Value,
residueDefinition = a.Element("ResidueDefinition").Value,
additionalRegulationNotes = a.Element("AdditionalRegulationNotes").Value,
expiryDate = a.Element("ExpiryDate").Value,
primaryInd = a.Element("PrimaryInd").Value,
exemptInd = a.Element("ExemptInd").Value
})
};
foreach (var item in mrls)
{
Console.WriteLine(item.ToString());
}
}
}
}
As you can see, I'm trying to get just the MRLs descendants, but I get an error:
All I can reach through the "a" variable is the very first node of MRLs->MRL, not all of them. What is going on?
If you guys could lend me a hand, that would be super!
Thanks in advance.

This line...
from a in m.Element("MRLs").Descendants()
...iterates through all sub-elements, including children of children. Hence your error: besides each <MRL>, "a" is also bound to elements such as <PublishedCommodityID>, which have no child elements, so a.Element("PublishedCommodityID") returns null and calling Value on it throws.
Unless you specifically want all child elements at every level, use the Element and Elements axes instead of Descendants:
from a in m.Element("MRLs").Elements()
That should solve your problem.
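To see the difference between the two axes on a small fragment, here is a quick sketch. AxisDemo and its two methods are made-up names for illustration only.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AxisDemo
{
    // Elements() yields only the direct children (here: the MRL elements).
    public static int CountChildren(XElement mrls)
    {
        return mrls.Elements().Count();
    }

    // Descendants() yields every element at every depth below the node
    // (the MRL elements plus all of their children).
    public static int CountDescendants(XElement mrls)
    {
        return mrls.Descendants().Count();
    }
}
```

For an <MRLs> node holding two <MRL> elements with two children each, the first method counts 2 elements and the second counts 6, which is why the original query ended up calling Element on leaf nodes.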
However, your query is also difficult to read with the nested foreach loops and the multiple tests for the IDs. You can simplify it with a combination of LINQ and XPath (XPathSelectElements is an extension method from the System.Xml.XPath namespace):
var mrls =
from market in markets
from ingredient in ingredientes
from commodity in commodities
let xpath = $"/MRLGroups/MRLGroup[MarketID={market.MarketID}]" +
$"[ActiveIngredientID={ingredient.ActiveIngredientID}]" +
$"[IndexCommodityID={commodity.IndexCommodityID}]/MRLs/MRL"
select new {
ms =
(from a in doc.XPathSelectElements(xpath)
select new xMRLIndividial {
publishedCommodityID = string.IsNullOrEmpty(a.Element("PublishedCommodityID").Value) ? "" : a.Element("PublishedCommodityID").Value,
publishedCommodityName = a.Element("PublishedCommodityName").Value,
mrlTypeId = a.Element("MRLTypeID").Value,
mrlTypeName = a.Element("MRLTypeName").Value,
deferredToMarketId = a.Element("DeferredToMarketID").Value,
deferredToMarketName = a.Element("DeferredToMarketName").Value,
undefinedCommodityLinkId = a.Element("UndefinedCommodityLinkInd").Value,
mrlValueInPPM = a.Element("MRLValueInPPM").Value,
residueDefinition = a.Element("ResidueDefinition").Value,
additionalRegulationNotes = a.Element("AdditionalRegulationNotes").Value,
expiryDate = a.Element("ExpiryDate").Value,
primaryInd = a.Element("PrimaryInd").Value,
exemptInd = a.Element("ExemptInd").Value
}).ToList()
};

Best way to put dash for empty field in C#

I populate a PDF from fields that come from XML in C# code.
I have to put a dash (-) in each empty field. Should I check every field individually and put the dash if it is empty, or is there a way to do it for all the fields at once?
What is the best way, given that I have 50 fields to check?
This is the code I have now:
dt.LastName = (dt.LastName == null ? null : (string)individual.XPathSelectElement("AIndividual[@Type='Co-Applicant']/GivenName/LastName"));
if (dt.LastName == null)
dt.LastName = "-";

I presume dt.LastName originally comes from the same document, from another AIndividual element. In this case you could process your document using an array of XPath selectors and property setters. (Mind that the code below is a rough sketch and has not even been compiled):
public class Applicant
{
public string LastName { get; set;}
}
public void Process(XmlElement application, Applicant applicant)
{
var selectors = new[] {
new {
Setter = new Action<Applicant, string>((t,v) => t.LastName = v),
XPath = "GivenName/LastName"
}
};
foreach(var selector in selectors)
{
var node = application.SelectSingleNode("AIndividual[@Type='PrimaryApplicant']/" + selector.XPath) ??
application.SelectSingleNode("AIndividual[@Type='Co-Applicant']/" + selector.XPath);
// XmlNode.Value is null for element nodes, so read the text through InnerText
selector.Setter(applicant, node == null || node.InnerText.Length == 0 ? "-" : node.InnerText);
}
}
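If you stay with LINQ to XML as in the question, a single helper can apply the dash rule to all 50 fields. This is a sketch under assumptions: DashFallback and ValueOrDash are hypothetical names, and "empty" is taken to mean missing or whitespace-only.

```csharp
using System.Xml.Linq;
using System.Xml.XPath;

public static class DashFallback
{
    // Hypothetical helper: returns the element's text, or "-" when the element
    // is missing or contains only whitespace.
    public static string ValueOrDash(XElement root, string xpath)
    {
        XElement element = root.XPathSelectElement(xpath);
        if (element == null || string.IsNullOrWhiteSpace(element.Value))
        {
            return "-";
        }
        return element.Value;
    }
}
```

Each assignment then becomes one line, for example dt.LastName = DashFallback.ValueOrDash(individual, "AIndividual[@Type='Co-Applicant']/GivenName/LastName");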

XML linq need detail info on exception

I am using LINQ to XML in my project. I am dealing with very large XML documents; to keep things easy to understand, here is a small sample:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<StackOverflowReply xmlns="http://xml.stack.com/RRAND01234">
<processStatus>
<statusCode1>P</statusCode1>
<statusCode2>P</statusCode2>
<statusCode3>P</statusCode3>
<statusCode4>P</statusCode4>
</processStatus>
</StackOverflowReply>
</soap:Body>
</soap:Envelope>
The following is my C# LINQ to XML query:
XNamespace x = "http://xml.stack.com/RRAND01234";
var result = from StackOverflowReply in XDocument.Parse(Myxml).Descendants(x + "StackOverflowReply")
select new
{
status1 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode1").Value,
status2 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode2").Value,
status3 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode3").Value,
status4 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode4").Value,
status5 = StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode5").Value,
};
Here I am getting an exception like "Object reference not set to an instance of an object." because the tag
<statusCode5>
is not in my XML. In this case I want to get a detailed exception message like "Missing tag statusCode5". Please guide me on how to get this message from my exception.

There's no easy way (that I'm aware of) to find out exactly which element was missing in a LINQ to XML statement. What you can do, however, is cast the element to string to handle missing elements, though that gets tricky with a chain of elements.
In your current code the cast only protects the last call in the chain:
status5 = (string)StackOverflowReply.Element(x + "processStatus").Element(x + "statusCode5")
This returns null when statusCode5 is missing, but it still throws if processStatus itself is missing, because Element is then called on null.
You could change your LINQ to focus only on the subnodes, like this:
XNamespace x = "http://xml.stack.com/RRAND01234";
var result = from StackOverflowReply in XDocument.Parse(Myxml).Descendants(x + "processStatus")
select new
{
status1 = (string)StackOverflowReply.Element(x + "statusCode1"),
status2 = (string)StackOverflowReply.Element(x + "statusCode2"),
status3 = (string)StackOverflowReply.Element(x + "statusCode3"),
status4 = (string)StackOverflowReply.Element(x + "statusCode4"),
status5 = (string)StackOverflowReply.Element(x + "statusCode5"),
};
However, if your XML is complex and you have different depths (nested elements), you'll need a more robust solution to avoid a bunch of conditional operator checks or multiple queries.
I have something that might help if that is the case - I'll have to dig it up.
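If what you really want is the "Missing tag statusCode5" message, one option is a small extension method that throws a descriptive exception itself instead of letting the NullReferenceException happen. A minimal sketch follows; RequiredElement is a hypothetical name, not part of LINQ to XML.

```csharp
using System;
using System.Xml.Linq;

public static class XElementRequiredExtensions
{
    // Hypothetical helper: like Element(name), but throws a descriptive
    // exception naming the missing tag instead of returning null.
    public static XElement RequiredElement(this XElement parent, XName name)
    {
        if (parent == null)
        {
            throw new ArgumentNullException(nameof(parent));
        }
        XElement child = parent.Element(name);
        if (child == null)
        {
            throw new InvalidOperationException(
                $"Missing tag {name.LocalName} under {parent.Name.LocalName}");
        }
        return child;
    }
}
```

You would then write StackOverflowReply.RequiredElement(x + "processStatus").RequiredElement(x + "statusCode5").Value and catch InvalidOperationException to read the message.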
EDIT For More Complex XML
I've had similar challenges with some XML I have to deal with at work. In lieu of an easy way to determine what node was the offending node, and not wanting to have hideously long ternary operators, I wrote an extension method that worked recursively from the specified starting node down to the one I was looking for.
Here's a somewhat simple and contrived example to demonstrate.
<SomeXML>
<Tag1>
<Tag1Child1>Value1</Tag1Child1>
<Tag1Child2>Value2</Tag1Child2>
<Tag1Child3>Value3</Tag1Child3>
<Tag1Child4>Value4</Tag1Child4>
</Tag1>
<Tag2>
<Tag2Child1>
<Tag2Child1Child1>SomeValue1</Tag2Child1Child1>
<Tag2Child1Child2>SomeValue2</Tag2Child1Child2>
<Tag2Child1Child3>SomeValue3</Tag2Child1Child3>
<Tag2Child1Child4>SomeValue4</Tag2Child1Child4>
</Tag2Child1>
<Tag2Child2>
<Tag2Child2Child1>
<Tag2Child2Child1Child1 />
<Tag2Child2Child1Child2 />
</Tag2Child2Child1>
</Tag2Child2>
</Tag2>
</SomeXML>
In the above XML, I had no way of knowing (prior to parsing) if any of the child elements were empty, so after some searching and fiddling I came up with the following extension method:
public static XElement GetChildFromPath(this XElement currentElement, List<string> elementNames, int position = 0)
{
if (currentElement == null || !currentElement.HasElements)
{
return currentElement;
}
if (position == elementNames.Count - 1)
{
return currentElement.Element(elementNames[position]);
}
else
{
XElement nextElement = currentElement.Element(elementNames[position]);
return GetChildFromPath(nextElement, elementNames, position + 1);
}
}
Basically, the method takes the XElement it's called on, plus a List<string> of the elements in path order, with the one I want as the last one, and a position (index in the list), and then works its way down the path until it finds the element in question or runs out of elements in the path. It's not as elegant as I would like it to be, but I haven't had time to refactor it.
I would use it like this (based on the sample XML above):
MyClass myObj = (from x in XDocument.Parse(myXML).Descendants("SomeXML")
select new MyClass() {
Tag1Child1 = (string)x.GetChildFromPath(new List<string>() {
"Tag1", "Tag1Child1" }),
Tag2Child1Child4 = (string)x.GetChildFromPath(new List<string>() {
"Tag2", "Tag2Child1", "Tag2Child1Child4" }),
Tag2Child2Child1Child2 = (string)x.GetChildFromPath(new List<string>() {
"Tag2", "Tag2Child2", "Tag2Child2Child1",
"Tag2Child2Child1Child2" })
}).SingleOrDefault();
Not as elegant as I'd like it to be, but at least it allows me to parse an XML document that may have missing nodes without blowing chunks. Another option was to do something like:
Tag2Child2Child1Child2 = x.Element("Tag2") == null ?
"" : x.Element("Tag2").Element("Tag2Child2") == null ?
"" : x.Element("Tag2").Element("Tag2Child2").Element("Tag2Child2Child1") == null ?
"" : x.Element("Tag2").Element("Tag2Child2").Element("Tag2Child2Child1").Element("Tag2Child2Child1Child2") == null ?
"" : x.Element("Tag2")
.Element("Tag2Child2")
.Element("Tag2Child2Child1")
.Element("Tag2Child2Child1Child2").Value
That would get really ugly for an object that had dozens of properties.
Anyway, if this is of use to you feel free to use/adapt/modify as you need.
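As a side note, if your compiler supports C# 6 or later, the null-conditional operator (?.) removes most of the need for either approach. The sketch below is an alternative to the recursive method above, not a copy of it; ElementAtPath is a hypothetical name.

```csharp
using System.Xml.Linq;

public static class SafePathExtensions
{
    // Hypothetical helper: walks the given child names in order and returns
    // null as soon as one element on the path is missing.
    public static XElement ElementAtPath(this XElement start, params XName[] names)
    {
        XElement current = start;
        foreach (XName name in names)
        {
            // ?. short-circuits to null once current is null.
            current = current?.Element(name);
        }
        return current;
    }
}
```

Usage would look like (string)x.ElementAtPath("Tag2", "Tag2Child1", "Tag2Child1Child4"), or inline without the helper: (string)x.Element("Tag2")?.Element("Tag2Child1")?.Element("Tag2Child1Child4"). Both give null rather than an exception when a node on the path is missing.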

Parsing XML: NullReferenceException for Variable Elements

I'm getting a text string from a website and parsing it into an XDocument. I'm looking to feed the value of certain elements into a very simple object (named NWSevent). My problem is that the original string changes and the XML tree varies; sometimes there are numerous events, up to 40, sometimes there is only one, and sometimes there is only one that does not have all the characteristics. If there are no alerts, then the "event" element has a title, but no areaDesc, summary, or severity.
I have two constructors for NWSevent, one takes in a single string, the other takes in four string arguments. I'm having trouble getting around a NullReferenceException. The if statement below can't do it because there is no value to compare. I'd appreciate any help.
public static void ParseWeatherData(String xmlString)
{
String ticker = string.Empty;
XDocument root = XDocument.Parse(xmlString);
XNamespace ns = XNamespace.Get("http://www.w3.org/2005/Atom");
XNamespace nsCap = XNamespace.Get("urn:oasis:names:tc:emergency:cap:1.1");
//get list of entry elements, set conditions for title, areaDesc, etc
var xlist = root.Descendants(ns + "entry").Select(elem => new
{ //use FirstOrDefault to deal with the possibility of a null return
Title = elem.Descendants(ns + "title").FirstOrDefault(),
AreaDesc = elem.Descendants(nsCap + "areaDesc").FirstOrDefault(),
Severity = elem.Descendants(nsCap + "severity").FirstOrDefault(),
Summary = elem.Descendants(ns + "summary").FirstOrDefault()
});
foreach (var el in xlist) //need to address null values when no alerts
{
if (el.AreaDesc.Value != null) //causes yellow null ERROR; no value exists for el.areaDesc.value
{
String titleIn = el.Title.Value;
String areaIn = el.AreaDesc.Value;
String severityIn = el.Severity.Value;
String summaryIn = el.Summary.Value;
new Models.NWSevent(titleIn, areaIn, severityIn, summaryIn);
}
else
{
String titleIn = el.Title.Value;
new Models.NWSevent(titleIn);
}
}
}

Embarrassing! Props to Dweeberly for pointing it out. When the element is missing, el.AreaDesc itself is null, so reading its Value throws. I just need to change the if statement from
if (el.AreaDesc.Value != null){}
to
if (el.AreaDesc != null){}
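For anyone landing here later, the same check can be written by casting the elements to string, which gives null for a missing element. This is only a sketch: AlertParser and Describe are made-up names, and it returns plain strings instead of building the NWSevent objects from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AlertParser
{
    private static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";
    private static readonly XNamespace Cap = "urn:oasis:names:tc:emergency:cap:1.1";

    // Hypothetical helper: one line per entry, "title | area" when areaDesc
    // exists, otherwise just the title.
    public static List<string> Describe(string xmlString)
    {
        XDocument doc = XDocument.Parse(xmlString);
        var lines = new List<string>();
        foreach (XElement entry in doc.Descendants(Atom + "entry"))
        {
            // The string cast returns null when FirstOrDefault() found nothing.
            string title = (string)entry.Descendants(Atom + "title").FirstOrDefault();
            string area = (string)entry.Descendants(Cap + "areaDesc").FirstOrDefault();
            lines.Add(area != null ? title + " | " + area : title);
        }
        return lines;
    }
}
```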
