Improving Linq-XML to Object query - c#

I want to use Linq to extract data from an XML document and place it into a list
<Data>
<FlightData DTS="20110216 17:17" flight="1234" origin="CYYZ" dest="CYUL" aircraft="945">
<TLDRequest>
<Airline>ABC</Airline>
<AcReg>C-FABC</AcReg>
<CalcType>T</CalcType>
<OAT>-05</OAT>
<Wind>060/10</Wind>
<Flaps>5</Flaps>
<Switches></Switches>
<Runways>
<Rwy>6L</Rwy>
<Rwy>6R</Rwy>
</Runways>
...
</TLDRequest>
...
</FlightData>
</Data>
My Linq code in C# works - I can get attributes from the FlightData tab, but I think it could be more efficient, especially in the area of getting data from the TLDRequest tag. Can I get some insight on using best practices to get to and grab child tags?
public static List<ACARS_Phase> createAcarsPhaseObject(XDocument xDoc)
{
return (from ao in xDoc.Descendants("FlightData")
select new ACARS_Phase
{
FlightDate = DateTime.ParseExact(ao.Attribute("DTS").Value, "yyyyMMdd HH:mm", new CultureInfo("en-CA")),
FlightNumber = ao.Attribute("flight").Value,
Origin = ao.Attribute("origin").Value,
Destination = ao.Attribute("dest").Value,
InternalFinNumber = ao.Attribute("aircraft").Value,
OperatorCode = ao.Element("TLDRequest").Element("Airline").Value,
RegistrationNumber = ao.Element("TLDRequest").Element("AcReg").Value,
Wind = ao.Element("TLDRequest").Element("Wind").Value,
Flaps = ao.Element("TLDRequest").Element("Flaps").Value,
OAT = ao.Element("TLDRequest").Element("OAT").Value,
}).ToList();
}
Best regards

Your query is fine, generally speaking. If you want to cut down on some of the redundancy, consider using let to get the TLDRequest element once, so you repeat yourself a bit less.
return (from ao in xDoc.Descendants("FlightData")
let request = ao.Element("TLDRequest")
select new AcARS_Phase
{
// stuff
OperatorCode = request.Element("Airline").Value,
RegistrationNumber = request.Element("AcReg").Value,
// etc.
}).ToList();

Related

C# Linq get descendants on a subquery

I've been banging my head on the desktop for the past couple of hours trying to decipher this issue.
I'm trying to query an XML file with Linq, the xml has the following format:
<MRLGroups>
<MRLGroup>
<MarketID>6084</MarketID>
<MarketName>European Union</MarketName>
<ActiveIngredientID>28307</ActiveIngredientID>
<ActiveIngredientName>2,4-DB</ActiveIngredientName>
<IndexCommodityID>59916</IndexCommodityID>
<IndexCommodityName>Cucumber</IndexCommodityName>
<ScientificName>Cucumis sativus</ScientificName>
<MRLs>
<MRL>
<PublishedCommodityID>60625</PublishedCommodityID>
<PublishedCommodityName>Cucumbers</PublishedCommodityName>
<MRLTypeID>238</MRLTypeID>
<MRLTypeName>General</MRLTypeName>
<DeferredToMarketID>6084</DeferredToMarketID>
<DeferredToMarketName>European Union</DeferredToMarketName>
<UndefinedCommodityLinkInd>false</UndefinedCommodityLinkInd>
<MRLValueInPPM>0.0100</MRLValueInPPM>
<ResidueDefinition>2,4-DB</ResidueDefinition>
<AdditionalRegulationNotes>Comments.</AdditionalRegulationNotes>
<ExpiryDate xsi:nil="true" />
<PrimaryInd>true</PrimaryInd>
<ExemptInd>false</ExemptInd>
</MRL>
<MRL>
<PublishedCommodityID>60626</PublishedCommodityID>
<PublishedCommodityName>Gherkins</PublishedCommodityName>
<MRLTypeID>238</MRLTypeID>
<MRLTypeName>General</MRLTypeName>
<DeferredToMarketID>6084</DeferredToMarketID>
<DeferredToMarketName>European Union</DeferredToMarketName>
<UndefinedCommodityLinkInd>false</UndefinedCommodityLinkInd>
<MRLValueInPPM>0.0100</MRLValueInPPM>
<ResidueDefinition>2,4-DB</ResidueDefinition>
<AdditionalRegulationNotes>More Comments.</AdditionalRegulationNotes>
<ExpiryDate xsi:nil="true" />
<PrimaryInd>false</PrimaryInd>
<ExemptInd>false</ExemptInd>
</MRL>
</MRLs>
</MRLGroup>
So far i've created classes for the "MRLGroup" section of the file
var queryMarket = from market in doc.Descendants("MRLGroup")
select new xMarketID
{
MarketID = Convert.ToString(market.Element("MarketID").Value),
MarketName = Convert.ToString(market.Element("MarketName").Value)
};
List<xMarketID> markets = queryMarket.Distinct().ToList();
var queryIngredient = from ingredient in doc.Descendants("MRLGroup")
select new xActiveIngredients
{
ActiveIngredientID = Convert.ToString(ingredient.Element("ActiveIngredientID").Value),
ActiveIngredientName = Convert.ToString(ingredient.Element("ActiveIngredientName").Value)
};
List<xActiveIngredients> ingredientes = queryIngredient.Distinct().ToList();
var queryCommodities = from commodity in doc.Descendants("MRLGroup")
select new xCommodities {
IndexCommodityID = Convert.ToString(commodity.Element("IndexCommodityID").Value),
IndexCommodityName = Convert.ToString(commodity.Element("IndexCommodityName").Value),
ScientificName = Convert.ToString(commodity.Element("ScientificName").Value)
};
List<xCommodities> commodities = queryCommodities.Distinct().ToList();
After i got the "catalogues" I'm trying to query the document against the catalogues to achieve some sort of "groups", after all this, i'm going to send this data to the database, the issue here is that the xml files are around 600MB each and i get the everyday, so my approach is to create catalogues and just send the MRLs to the database joined to the "header" table that contains the Catalogues IDs, here's what i've done so far but failed miserably:
//markets
foreach (xMarketID market in markets) {
//ingredients
foreach (xActiveIngredients ingredient in ingredientes) {
//commodities
foreach (xCommodities commodity in commodities) {
var mrls = from m in doc.Descendants("MRLGroup")
where Convert.ToString(m.Element("MarketID").Value) == market.MarketID
&& Convert.ToString(m.Element("ActiveIngredientID").Value) == ingredient.ActiveIngredientID
&& Convert.ToString(m.Element("IndexCommodityID").Value) == commodity.IndexCommodityID
select new
{
ms = new List<xMRLIndividial>(from a in m.Element("MRLs").Descendants()
select new xMRLIndividial{
publishedCommodityID = string.IsNullOrEmpty(a.Element("PublishedCommodityID").Value) ? "" : a.Element("PublishedCommodityID").Value,
publishedCommodityName = a.Element("PublishedCommodityName").Value,
mrlTypeId = a.Element("MRLTypeID").Value,
mrlTypeName = a.Element("MRLTypeName").Value,
deferredToMarketId = a.Element("DeferredToMarketID").Value,
deferredToMarketName = a.Element("DeferredToMarketName").Value,
undefinedCommodityLinkId = a.Element("UndefinedCommodityLinkInd").Value,
mrlValueInPPM = a.Element("MRLValueInPPM").Value,
residueDefinition = a.Element("ResidueDefinition").Value,
additionalRegulationNotes = a.Element("AdditionalRegulationNotes").Value,
expiryDate = a.Element("ExpiryDate").Value,
primaryInd = a.Element("PrimaryInd").Value,
exemptInd = a.Element("ExemptInd").Value
})
};
foreach (var item in mrls)
{
Console.WriteLine(item.ToString());
}
}
}
}
If you notice i'm trying to get just the MRLs descendants but i got this error:
All i can reach on the "a" variable is the very first node of MRLs->MRL not all of them, what is going on?
If you guys could lend me a hand would be super!
Thanks in advance.
With this line...
from a in m.Element("MRLs").Descendants()
...will iterate through all sub-elements, including children of children. Hence your error, since your <PublishedCommodityID> element does not have a child element.
Unless you want to specifically return all child elements of all levels, always use the Element and Elements axis instead of Descendant and Descendants:
from a in m.Element("MRLs").Elements()
That should solve your problem.
However, your query is also difficult to read with the nested foreach loops and the multiple tests for the IDs. You can simplify it with a combination of LINQ and XPath:
var mrls =
from market in markets
from ingredient in ingredientes
from commodity in commodities
let xpath = $"/MRLGroups/MRLGroup[{market.MarketId}]" +
$"[ActiveIngredientID={ingredient.ActiveIngredientId}]" +
$"[IndexCommodityID={commodity.IndexCommodityID}]/MRLs/MRL"
select new {
ms =
(from a in doc.XPathSelectElements(xpath)
select new xMRLIndividial {
publishedCommodityID = string.IsNullOrEmpty(a.Element("PublishedCommodityID").Value) ? "" : a.Element("PublishedCommodityID").Value,
publishedCommodityName = a.Element("PublishedCommodityName").Value,
mrlTypeId = a.Element("MRLTypeID").Value,
mrlTypeName = a.Element("MRLTypeName").Value,
deferredToMarketId = a.Element("DeferredToMarketID").Value,
deferredToMarketName = a.Element("DeferredToMarketName").Value,
undefinedCommodityLinkId = a.Element("UndefinedCommodityLinkInd").Value,
mrlValueInPPM = a.Element("MRLValueInPPM").Value,
residueDefinition = a.Element("ResidueDefinition").Value,
additionalRegulationNotes = a.Element("AdditionalRegulationNotes").Value,
expiryDate = a.Element("ExpiryDate").Value,
primaryInd = a.Element("PrimaryInd").Value,
exemptInd = a.Element("ExemptInd").Value
}).ToList()
};

Speeding up a linq query with 40,000 rows

In my service, first I generate 40,000 possible combinations of home and host countries, like so (clientLocations contains 200 records, so 200 x 200 is 40,000):
foreach (var homeLocation in clientLocations)
{
foreach (var hostLocation in clientLocations)
{
allLocationCombinations.Add(new AirShipmentRate
{
HomeCountryId = homeLocation.CountryId,
HomeCountry = homeLocation.CountryName,
HostCountryId = hostLocation.CountryId,
HostCountry = hostLocation.CountryName,
HomeLocationId = homeLocation.LocationId,
HomeLocation = homeLocation.LocationName,
HostLocationId = hostLocation.LocationId,
HostLocation = hostLocation.LocationName,
});
}
}
Then, I run the following query to find existing rates for the locations above, but also include empty the missing rates; resulting in a complete recordset of 40,000 rows.
var allLocationRates = (from l in allLocationCombinations
join r in Db.PaymentRates_AirShipment
on new { home = l.HomeLocationId, host = l.HostLocationId }
equals new { home = r.HomeLocationId, host = (Guid?)r.HostLocationId }
into matches
from rate in matches.DefaultIfEmpty(new PaymentRates_AirShipment
{
Id = Guid.NewGuid()
})
select new AirShipmentRate
{
Id = rate.Id,
HomeCountry = l.HomeCountry,
HomeCountryId = l.HomeCountryId,
HomeLocation = l.HomeLocation,
HomeLocationId = l.HomeLocationId,
HostCountry = l.HostCountry,
HostCountryId = l.HostCountryId,
HostLocation = l.HostLocation,
HostLocationId = l.HostLocationId,
AssigneeAirShipmentPlusInsurance = rate.AssigneeAirShipmentPlusInsurance,
DependentAirShipmentPlusInsurance = rate.DependentAirShipmentPlusInsurance,
SmallContainerPlusInsurance = rate.SmallContainerPlusInsurance,
LargeContainerPlusInsurance = rate.LargeContainerPlusInsurance,
CurrencyId = rate.RateCurrencyId
});
I have tried using .AsEnumerable() and .AsNoTracking() and that has sped things up quite a bit. The following code shaves several seconds off of my query:
var allLocationRates = (from l in allLocationCombinations.AsEnumerable()
join r in Db.PaymentRates_AirShipment.AsNoTracking()
But, I am wondering: How can I speed this up even more?
Edit: Can't replicate foreach functionality in linq.
allLocationCombinations = (from homeLocation in clientLocations
from hostLocation in clientLocations
select new AirShipmentRate
{
HomeCountryId = homeLocation.CountryId,
HomeCountry = homeLocation.CountryName,
HostCountryId = hostLocation.CountryId,
HostCountry = hostLocation.CountryName,
HomeLocationId = homeLocation.LocationId,
HomeLocation = homeLocation.LocationName,
HostLocationId = hostLocation.LocationId,
HostLocation = hostLocation.LocationName
});
I get an error on from hostLocation in clientLocations which says "cannot convert type IEnumerable to Generic.List."
The fastest way to query a database is to use the power of the database engine itself.
While Linq is a fantastic technology to use, it still generates a select statement out of the Linq query, and runs this query against the database.
Your best bet is to create a database View, or a stored procedure.
Views and stored procedures can easily be integrated into Linq.
Material Views ( in MS SQL ) can further speed up execution, and missing indexes are by far the most effective tool in speeding up database queries.
How can I speed this up even more?
Optimizing is a bitch.
Your code looks fine to me. Make sure to set the index on your DB schema where it's appropriate. And as already mentioned: Run your Linq against SQL to get a better idea of the performance.
Well, but how to improve performance anyway?
You may want to have a glance at the following link:
10 tips to improve LINQ to SQL Performance
To me, probably the most important points listed (in the link above):
Retrieve Only the Number of Records You Need
Turn off ObjectTrackingEnabled Property of Data Context If Not
Necessary
Filter Data Down to What You Need Using DataLoadOptions.AssociateWith
Use compiled queries when it's needed (please be careful with that one...)

Lambda SQL Query / Manipulate String At Query Or After Result

I'm using C#, EF5, and Lambda style queries against SQL.
I have the usual scenario of binding data to gridviews. Some of the results for my columns may be too long (character count) and so I only want to display the first 'n' characters. Let's say 10 characters for this example. When I truncate a result, I'd like to indicate this by appending "...". So, let's say the following last names are returned:
Mercer, Smith, Garcia-Jones
I'd like them to be returned like this:
Mercer, Smith, Garcia-Jon...
I was doing something like this:
using (var context = new iaiEntityConnection())
{
var query = context.applications.Where(c => c.id == applicationPrimaryKey);
var results = query.ToList();
foreach (var row in results)
{
if (row.employerName.Length > 10)
{
row.employerName = row.employerName.Substring(0, Math.Min(10, row.employerName.ToString().Length)) + "...";
}
if (row.jobTitle.Length > 10)
{
row.jobTitle = row.jobTitle.Substring(0, Math.Min(10, row.jobTitle.ToString().Length)) + "...";
}
}
gdvWorkHistory.DataSource = results;
gdvWorkHistory.DataBind();
However, if I change my query to select specific columns like this:
var query2 = context.applications.Select(c => new
{
c.id,
c.applicationCode,
c.applicationCategoryLong,
c.applicationType,
c.renew_certification.PGI_nameLast,
c.renew_certification.PGI_nameFirst,
c.renew_certification.PAI_homeCity,
c.renew_certification.PAI_homeState,
c.reviewStatusUser,
c.dateTimeSubmittedByUser
})
The result appears to become read-only if specific columns are selected, and I really should be selecting just the columns I need. I'm losing my ability to edit the result set.
So, I'm rethinking the entire approach. There must be away to select the first 'n' characters on select, right? Is there anyway to append the "..." if the length is > 10 on select? That seems trickier. Also, I guess I could parse through the gridview after bind and make this adjustment. Or, perhaps there is a way to maintain my ability to edit the result set when selecting specific columns?
I welcome your thoughts. Thanks!
To quote MSDN
Anonymous types provide a convenient way to encapsulate a set of read-only properties into a single object without having to explicitly define a type first.
So you would have to define a class and select into that if you want read write capability.
e.g.
public class MyClass {
public int id { get; set; }
public string applicationCode {get; set; }
// rest of property defintions.
}
var query2 = context.applications.Select(c => new MyClass {
id = c.id,
applicationCode = c.applicationCode,
// Rest of assignments
};
As to just providing 10 character limit with ... appended. I'm going to assume you mean on the applicationcategoryLog field but you can use the same logic on other fields.
var query2 = context.applications.Select(c => new
{
c.id,
c.applicationCode,
applicationCategoryLong = (c.applicationCategoryLong ?? string.Empty).Length <= 10 ?
c.applicationCategoryLong :
c.applicationCategoryLong.Substring(0,10) + "...",
c.applicationType,
c.renew_certification.PGI_nameLast,
c.renew_certification.PGI_nameFirst,
c.renew_certification.PAI_homeCity,
c.renew_certification.PAI_homeState,
c.reviewStatusUser,
c.dateTimeSubmittedByUser
})

Preventing duplicate element access when reading an XML using LINQ

I got an XML that I'm trying to parse using LINQ to XML and convert it to an anonymous list of objects. To do so, I come up with the following code snippet:
var res = doc
.Root
.Elements("Record")
.Elements("Term")
.Select(term => new
{
LanguageCode = term.Attribute("languageCode").Value,
ConceptNumber = Convert.ToInt32(term.Attribute("conceptNumber").Value),
IsHidden = Convert.ToBoolean(term.Attribute("hidden").Value),
Label = term.Value,
InputDate = DateTime.Parse(term.Parent.Element("InputDate").Value),
LastUpdate = DateTime.Parse(term.Parent.Element("LastUpdated").Value)
}).ToList();
Please pay attention to the InputDate & LastUpdate section. As you see, I've to access the parent node (say, term.Parent) so that I can access those 2 elements and this looks messy to me. Is there any way to declare term.Parent once and using it over and over again to extract those properties?
Here's an excerpt of the XML I'm trying to read:
<Record>
<Term languageCode="Prs" conceptNumber="10" hidden="False">Whatever</Term>
<Status>Approved</Status>
<Frequency>0</Frequency>
<InputDate>12/30/1899</InputDate>
<LastUpdate>10/25/2009</LastUpdate>
</Record>
Thank you
You need to use let clause. It creates a new variable inside query and allows to use it multiply times.
In your case it would be
var res = (from term in doc.Root.Elements("Record").Elements("Term")
let parent = term.Parent
select new
{
LanguageCode = term.Attribute("languageCode").Value,
ConceptNumber = Convert.ToInt32(term.Attribute("conceptNumber").Value),
IsHidden = Convert.ToBoolean(term.Attribute("hidden").Value),
Label = term.Value,
InputDate = parent.Element("InputDate").Value,
LastUpdate = parent.Element("LastUpdated").Value
}).ToList();
Note that its a code on a pure LINQ syntax that allow you to express ideas much clearer than using extension methods like in your question.
You can introduce an anonymous class in a projection (This is similar to what let does under the hood) and then use SelectMany() to flatten using the extra properties:
var results = doc.Elements("Record")
.Select( x => new
{
Terms = x.Elements("Term"),
InputDate = DateTime.Parse(x.Element("InputDate").Value),
LastUpdate = DateTime.Parse(x.Element("LastUpdate").Value)
})
.SelectMany(x => x.Terms, (record,term) => new
{
LanguageCode = term.Attribute("languageCode").Value,
..
InputDate = record.InputDate,
LastUpdate = record.LastUpdate
});

selection using LINQ

I have a sample xml file that looks like this:
<Books>
<Category Genre="Fiction" BookName="book_name" BookPrice="book_price_in_$" />
<Category Genre="Fiction" BookName="book_name" BookPrice="book_price_in_$" />
<Category Genre="NonFiction" BookName="book_name" BookPrice="book_price_in_$" />
<Category Genre="Children" BookName="book_name" BookPrice="book_price_in_$" />
</Books>
I need to collect all book names and book prices and pass to some other method. Right now, i get all book names and book prices seperately into two different List<string> using the following command:
List<string>BookNameList = root.Elements("Category").Select(x => (string)x.Attribute("BookName")).ToList();
List<string>BookPriceList = root.Elements("Category").Select(x => (string)x.Attribute("BookPrice")).ToList();
I create a text file and send this back to the calling function (stroing these results in a text file is a requirement, the text file has two fields bookname and bookprice).
To write to text file is use following code:
for(int i = 0; i < BookNameList.Count; i++)
{
//write BookNameList[i] to file
// Write BookPriceList[i] to file
}
I somehow dont feel good about this approach. suppose due to any reason both lists of not same size. Right now i do not take that into account and i feel using foreach is much more efficient (I maybe wrong). Is it possible to read both the entries into a datastructure (having two attributes name and price) from LINQ? then i can easily iterate over the list of that datastructure with foreach.
I am using C# for programming.
Thanks,
[Edit]: Thanks everyone for the super quick responses, i choose the first answer which I saw.
Selecting:
var books = root.Elements("Category").Select(x => new {
Name = (string)x.Attribute("BookName"),
Price = (string)x.Attribute("BookPrice")
}).ToList();
Looping:
foreach (var book in books)
{
// do something with
// book.Name
// book.Price
}
I think you could make it more tidy by some very simple means.
A somewhat simplified example follows.
First define the type Book:
public class Book
{
public Book(string name, string price)
{
Name = name;
Price = price;
}
public string Name { get; set; }
public string Price { get; set; } // could be decimal if we want a proper type.
}
Then project your XML data into a sequence of Books, like so:
var books = from category in root.Elements("Category")
select new Book((string) x.Attribute("BookName"), (string) x.Attribute("BookPrice"));
If you want better efficiency I would advice using a XmlReader and writing to the file on every encountered Category, but it's quite involved compared to your approach. It depends on your requirements really, I don't think you have to worry about it too much unless speed is essential or the dataset is huge.
The streamed approach would look something like this:
using (var outputFile = OpenOutput())
using (XmlReader xml = OpenInput())
{
try
{
while (xml.ReadToFollowing("Category"))
{
if (xml.IsStartElement())
{
string name = xml.GetAttribute("BookName");
string price = xml.GetAttribute("BookPrice");
outputFile.WriteLine(string.Format("{0} {1}", name, price));
}
}
}
catch (XmlException xe)
{
// Parse error encountered. Would be possible to recover by checking
// ReadState and continue, this would obviously require some
// restructuring of the code.
// Catching parse errors is recommended because they could contain
// sensitive information about the host environment that we don't
// want to bubble up.
throw new XmlException("Uh-oh");
}
}
Bear in mind that if your nodes have XML namespaces you must register those with the XmlReader through a NameTable or it won't recognize the nodes.
You can do this with a single query and a foreach loop.
var namesAndPrices = from category in root.Elements("Category")
select new
{
Name = category.Attribute("BookName").Value,
Price = category.Attribute("BookPrice").Value
};
foreach (var nameAndPrice in namesAndPrices)
{
// TODO: Output to disk
}
To build on Jeff's solution, if you need to pass this collection into another function as an argument you can abuse the KeyValuePair data structure a little bit and do something along the lines of:
var namesAndPrices = from category in root.Elements("Category")
select new KeyValuePair<string, string>(
Name = category.Attribute("BookName").Value,
Price = category.Attribute("BookPrice").Value
);
// looping that happens in another function
// Key = Name
// Value = Price
foreach (var nameAndPrice in namesAndPrices)
{
// TODO: Output to disk
}

Categories

Resources