How do I group these XDocuments? - c#

Problem
I have a collection of XDocument instances. Each document has a repeated element that can take a different value. I want to group them by this value but each element can specify a different value.
<sampledoc>
<value>a</value>
<value>b</value>
<value>c</value>
</sampledoc>
Example
Document A has values a, b, c
Document B has values b, c, d
Document C has values a, b
I want a grouping that is:
group a
Document A
Document C
group b
Document A
Document B
Document C
group c
Document A
Document B
group d
Document B
Question
I'm sure I must be able to do this but I can't see the wood for the trees right now.
docs.GroupBy... won't work on its own (as far as I can tell) because the expression it takes should return a single value, and each document can contain multiple values. My head says a single LINQ query should be possible, but it can't fathom what it would be.
Can this be done using the GroupBy or ToLookup LINQ methods? Is there a way to do this?
I'd prefer examples in C# if anyone would be willing to provide one.
Update
Thanks to the answer from Pavel Minaev and a little inspiration, I solved this as follows:
// Collate all the different values
// (the <value> elements are children of the root, and element names are case-sensitive)
docs.SelectMany(doc => doc.Root.Elements("value")
                          .Select(el => el.Value))
    // Remove duplicate values
    .Distinct()
    // Generate a lookup of specific value to all
    // documents that contain that value
    .ToLookup(v => v, v => docs.Where(doc => doc.Root.Elements("value")
                                                .Any(el => el.Value == v)));

GroupBy won't help you here anyway, because it assigns every element in the sequence to only one group.
var docs = new XDocument[] { docA, docB, docC };
var result = docs
    // Element names are case-sensitive, so match the sample's <value>
    .SelectMany(doc => doc.Root.Elements("value"))
    .Select(el => el.Value)
    .Distinct()
    .Select(key => new {
        Key = key,
        Documents = docs.Where(doc =>
            doc.Root.Elements("value").Any(el => el.Value == key))
    });
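A minimal sketch of the same idea done in one pass, assuming each document has a root element with repeated value children as in the sample XML. The document contents and variable names are made up for illustration. It flattens every (value, document) pair with SelectMany and lets ToLookup build the groups, so the collection is not scanned again for each key:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample documents mirroring the question's example.
var docA = XDocument.Parse("<sampledoc><value>a</value><value>b</value><value>c</value></sampledoc>");
var docB = XDocument.Parse("<sampledoc><value>b</value><value>c</value><value>d</value></sampledoc>");
var docC = XDocument.Parse("<sampledoc><value>a</value><value>b</value></sampledoc>");
var docs = new[] { docA, docB, docC };

// One (value, document) pair per distinct <value> in each document,
// then group the pairs by value.
var lookup = docs
    .SelectMany(doc => doc.Root.Elements("value")
        .Select(el => el.Value)
        .Distinct() // guard against a value repeated inside one document
        .Select(value => new { value, doc }))
    .ToLookup(pair => pair.value, pair => pair.doc);

foreach (var group in lookup)
{
    Console.WriteLine($"group {group.Key}: {group.Count()} document(s)");
}
```

Indexing the lookup, for example lookup["b"], gives the documents for that value directly, and a key that is not present yields an empty sequence.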

Related

Use LINQ to get only the most recent JOINed item for each element

I have a LINQ query:
Elements.Join(ElementStates,
    element => element.ElementID,
    elementState => elementState.ElementID,
    (element, elementState) => new { element, elementState })
OK, so each Element has an ElementState associated to it. However there can be multiple states for each element for historical purposes, marked by a DateModified column. In this query, I would like to return only the most recent ElementState for each Element.
Is such a thing possible, using LINQ?
EDIT:
Credit to Gilad Green for their helpful answer.
I have converted it to Method syntax for anyone else who would like to see this in the future:
Elements.GroupJoin(ElementStates,
    element => element.ElementID,
    elementState => elementState.ElementID,
    (element, elementState) =>
        new { element, elementState = elementState.OrderByDescending(y => y.DateModified).FirstOrDefault() });
You can use GroupJoin instead of Join and then retrieve the first record after ordering the group by the DateModified:
var result = from e in Elements
             join es in ElementStates on e.ElementID equals es.ElementID into esj
             select new {
                 Element = e,
                 State = esj.OrderByDescending(i => i.DateModified).FirstOrDefault()
             };
The same can be implemented with method syntax instead of query syntax, but in my opinion the query syntax is more readable here.
For the difference between simply joining and group joining: Linq to Entities join vs groupjoin
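To see the behaviour on data, here is a small in-memory sketch. The sample rows, the Name and Status properties, and the variable names are all hypothetical; only ElementID and DateModified come from the question. It shows that GroupJoin keeps elements with no states at all, in which case FirstOrDefault returns null:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Elements and ElementStates tables.
var elements = new[]
{
    new { ElementID = 1, Name = "Pump" },
    new { ElementID = 2, Name = "Valve" },
    new { ElementID = 3, Name = "Sensor" }, // has no recorded state
};
var elementStates = new[]
{
    new { ElementID = 1, Status = "Installed", DateModified = new DateTime(2020, 1, 1) },
    new { ElementID = 1, Status = "Repaired", DateModified = new DateTime(2021, 6, 1) },
    new { ElementID = 2, Status = "Installed", DateModified = new DateTime(2019, 3, 1) },
};

var latest = elements.GroupJoin(
    elementStates,
    e => e.ElementID,
    s => s.ElementID,
    (e, states) => new
    {
        Element = e,
        // Newest state, or null when the element has no states.
        State = states.OrderByDescending(s => s.DateModified).FirstOrDefault()
    }).ToList();

foreach (var row in latest)
{
    var status = row.State == null ? "(no state)" : row.State.Status;
    Console.WriteLine($"{row.Element.Name}: {status}");
}
```

Because State can be null, code that consumes the result needs a null check before reading its properties.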

Separate list of strings into a new list depending on its first letter

I have a list of country names
Afghanistan
Albania
Bahamas, The
Bahrain
Cambodia
Cameroon
.
.
.
What I want to do is separate this list into other lists depending on the first letter.
So basically I want to have one list of countries for each starting letter: a, b, c, and so on.
So, you have a collection of strings. You can use LINQ to group them and convert them to a Dictionary.
First, you'll need to group them based on the first letter (in this situation, case matters so a and A will be treated differently) using GroupBy(n => n[0]) where n[0] gets the first character in the string.
Second, you'll want to convert the grouping to something that you can use an indexer with. A Dictionary would be perfect. Use ToDictionary(g => g.Key, g => g).
When you string it together, it'll look like:
var dict = names.GroupBy(n => n[0]).ToDictionary(g => g.Key, g => g);
This allows you to get the grouped names using:
foreach (var n in dict['A'])
{
    // Print out each country starting with 'A'
    Console.WriteLine(n);
}
You can do it like this:
var countriesWithStartingLetterA = countries.Where(x => x.StartsWith("A")).ToList();
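As a variation on the dictionary approach, here is a hedged sketch using ToLookup. The country list is the one from the question; folding the first letter to upper case and skipping empty strings are assumptions added for illustration, not part of the original answers:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var countries = new List<string>
{
    "Afghanistan", "Albania", "Bahamas, The", "Bahrain", "Cambodia", "Cameroon"
};

// Key each name by its upper-cased first letter.
// Empty entries are skipped so that n[0] cannot throw.
var byLetter = countries
    .Where(n => !string.IsNullOrEmpty(n))
    .ToLookup(n => char.ToUpperInvariant(n[0]));

foreach (var name in byLetter['B'])
{
    Console.WriteLine(name);
}

// Unlike a Dictionary indexer, a lookup returns an empty sequence
// for a key that is not present instead of throwing.
Console.WriteLine(byLetter['Z'].Count());
```

The lookup is read-only once built. If the groups need to be modified afterwards, a Dictionary whose values are lists (g => g.ToList()) is the better fit.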

How to add a where clause on a linq join (lambda)?

I have two database tables Contact (Id, Name, ...) and ContactOperationalPlaces (ContactId, MunicipalityId), where a contact can be connected to several ContactOperationalPlaces.
What I'm trying to do is to build a query (ASP.NET, C#) with IQueryable that selects only the contacts that exist in the ContactOperationalPlaces table with a given MunicipalityId.
The sql query looks like this:
select * from Contacts c
right join ContactOperationPlaces cop on c.Id = cop.ContactId
where cop.MunicipalityId = 301;
With linq it would look something like this:
//_ctx is the context
var tmp = (from c in _ctx.Contacts
join cop in _ctx.ContactOperationPlaces on c.Id equals cop.ContactId
where cop.MunicipalityId == 301
select c);
So, I know how to do this if the point was to select all of this at once, unfortunately it's not. I'm building a query based on user input, so I don't know all of the selection at once.
So this is what my code looks like:
IQueryable<Contacts> query = (from c in _ctx.Contacts select c);
//Some other logic....
/*Gets a partial name (string nameStr), and filters the contacts
so that only those with a match on names are selected*/
query = query.Where(c => c.Name.Contains(nameStr));
//Some more logic
//Gets the municipalityId and wants to filter on it! :( how to?
query = query.Where(c => c.ContactOperationalPlaces ...........?);
The difference between the two where statements is that with the first one, each contact has only one name, but with the latter a contact can have several operational places...
I have managed to find one solution, but it gives me an anonymous object that contains both of the tables, and I don't know how to proceed with it.
query.Join(_ctx.ContactOperationPlaces, c => c.Id, cop => cop.ContactId,
    (c, cop) => new {c, cop}).Where(o => o.cop.MunicipalityId == 301);
The object returned from this expression is System.Linq.IQueryable<{c:Contact, cop:ContactOperationalPlace}>, and it can't be cast to Contacts...
So, that's the issue. The answer is probably pretty simple, but I just can't find it...
You create an anonymous type with both objects before your where clause and filter it on ContactOperationPlaces value. You just have to select the Contact after that.
query.Join(_ctx.ContactOperationPlaces, c => c.Id, cop => cop.ContactId,
        (c, cop) => new {c, cop})
    .Where(o => o.cop.MunicipalityId == 301)
    .Select(o => o.c)
    .Distinct();
You don't need to return new objects in the result selector function. The delegate provides both variables so you can choose one or the other, or some other variation (which would require a new object). Because the result is then just the Contact, the MunicipalityId filter has to be applied to ContactOperationPlaces before the join. Try this:
query.Join(_ctx.ContactOperationPlaces.Where(cop => cop.MunicipalityId == 301),
    c => c.Id, cop => cop.ContactId,
    (c, cop) => c);
Can you just assign it to a var and try to use IntelliSense on it?
var myCast = query.Join(_ctx.ContactOperationPlaces, c => c.Id, cop => cop.ContactId,
    (c, cop) => new {c, cop}).Where(o => o.cop.MunicipalityId == 301);
Just a thought.
I think it would be much easier if you start this as two different queries, then combine them. I'm assuming the relation is Contact (1 <-> many) ContactOperationPlaces? And in the end, you will be showing one item per ContactOperationPlaces row, not one item per Contact?
Do it like this:
IQueryable<Contacts> query = (from c in _ctx.Contacts select c);
...
query = query.Where(x => x.Name.ToLower().Contains(nameStr.ToLower()));
...
IQueryable<ContactOperationPlaces> query_2 =
    (from c in _ctx.ContactOperationPlaces
     where query.Any(x => x.Name == c.Contact.Name)
     select c);
//Now query_2 contains all ContactOperationPlaces that have a contact found in query
Alternatively, there is a much easier way to do this, and that's by skipping the first part entirely.
IQueryable<ContactOperationPlaces> query_2 =
    (from c in _ctx.ContactOperationPlaces
     where c.Contact.Name.ToLower().Contains(nameStr.ToLower())
     select c);
If you're using Entity Framework, you don't have to do any joins as long as you defined associations between the tables.
Now that I look at it, my second solution is far more efficient and easier. But if you need to do some other processing in between these commands, solution one works too :)
If you need more explanation, feel free to ask :)
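Another way to finish the question's last Where without a join is Any over the child collection. The sketch below is in-memory and hypothetical: the Places array stands in for a ContactOperationalPlaces navigation collection, which is an assumption since the question does not show the entity classes. With such a navigation property the filter would read c.ContactOperationalPlaces.Any(cop => cop.MunicipalityId == 301).

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data; Places holds the municipality ids of a
// contact's operational places.
var contacts = new[]
{
    new { Id = 1, Name = "Anna Berg", Places = new[] { 301, 302 } },
    new { Id = 2, Name = "Bernt Dahl", Places = new[] { 400 } },
    new { Id = 3, Name = "Annika Lie", Places = new[] { 301 } },
};

var query = contacts.AsQueryable();

// Filters can be appended one at a time, as in the question.
string nameStr = "Ann";
query = query.Where(c => c.Name.Contains(nameStr));

int municipalityId = 301;
// Any keeps a contact when at least one of its places matches,
// so each contact appears at most once and no Distinct is needed.
query = query.Where(c => c.Places.Any(p => p == municipalityId));

foreach (var c in query)
{
    Console.WriteLine(c.Name);
}
```

The result keeps the element type of the original query, so further Where calls can still be chained onto it.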

LINQ: grouping based on property in sublist

I'm trying to use LINQ to create a grouped list of documents based on metadata, which is a list on the document.
Below is how my object structure looks:
List<Document>
--> List<Metadata>
--> Metadata has a name and a value property.
I want to group the documents based on a metadata tag with the name ID, so that documents with the same value for that tag end up in the same group.
I tried it like this:
var x = response.Document
.GroupBy(d => d.Metadata.Where(dc => dc.Name == DocProperty.ID)
.Select(dc => dc.Value));
This results in a list of single documents, not grouped by ID.
I also thought about selecting a distinct list of IDs and then looping through the document list to find documents that match each ID. That seems like a lot of overhead, because for every ID in the distinct list I would have to go through the metadata list again, find the documents, add extra checks for multiple matches, get the property I need, and so on.
Does anyone have a good idea about how to get this working?
var x = from doc in source
        from meta in doc.Metadata
        where meta.Name == DocProperty.ID
        group doc by meta.Value;
Or, as requested in the comments, in fluent notation:
var y = source
    .SelectMany(doc => doc.Metadata, (doc, meta) => new { doc, meta })
    .Where(pair => pair.meta.Name == DocProperty.ID)
    .GroupBy(pair => pair.meta.Value, pair => pair.doc);
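The attempt in the question produced one group per document because its key was an IEnumerable<string>, and two separately created sequences are compared by reference, so no two keys were ever equal. A small in-memory sketch of the flattening approach follows. The documents, the Title property, and the literal "ID" are hypothetical stand-ins for the real Document, Metadata, and DocProperty.ID types:

```csharp
using System;
using System.Linq;

// Hypothetical documents; each carries a list of (Name, Value) metadata entries.
var source = new[]
{
    new { Title = "Doc 1", Metadata = new[] { (Name: "ID", Value: "42"), (Name: "Author", Value: "Kim") } },
    new { Title = "Doc 2", Metadata = new[] { (Name: "ID", Value: "42") } },
    new { Title = "Doc 3", Metadata = new[] { (Name: "ID", Value: "7") } },
};

var groups = source
    // Pair every document with each of its metadata entries.
    .SelectMany(doc => doc.Metadata, (doc, meta) => new { doc, meta })
    // Keep only the ID entries.
    .Where(pair => pair.meta.Name == "ID")
    // The key is now a plain string, so equal IDs land in the same group.
    .GroupBy(pair => pair.meta.Value, pair => pair.doc)
    .ToList();

foreach (var g in groups)
{
    var titles = string.Join(", ", g.Select(d => d.Title));
    Console.WriteLine($"{g.Key}: {titles}");
}
```

Documents without an ID entry drop out of the result, and a document with two ID entries would appear in two groups.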

LINQ: Doing an order by!

I have some LINQ to Entities code like so:
var tablearows = Context.TableB.Include("TableA").Where(c => c.TableBID == 1).Select(c => c.TableA).ToList();
So I'm returning the results of TableA with TableB.TableBID = 1.
That's all good.
Now how can I sort TableA by one of its columns? There is a many-to-many relationship between the two tables.
I tried various ways with no luck, for example:
var tablearows = Context.TableB.Include("TableA").Where(c => c.TableBID == 1).Select(c => c.TableA).OrderBy(p => p.ColumnToSort).ToList();
In the above case, when I type "p." I don't have access to the columns from TableA, presumably because it's a collection of TableA objects, not a single row.
How about using SelectMany instead of Select:
var tablearows = Context.TableB.Include("TableA")
    .Where(c => c.TableBID == 1)
    .SelectMany(c => c.TableA)
    .OrderBy(p => p.ColumnToSort)
    .ToList();
EDIT :
The expression below returns a collection of TableA collections: every element is itself a collection of TableA rows, not a single TableA instance, which is why you can't reach the TableA properties:
var tablearows = Context.TableB.Include("TableA")
    .Where(c => c.TableBID == 1)
    .Select(c => c.TableA);
If we change the Select to SelectMany, we get one concatenated collection whose elements are TableA instances:
var tablearows = Context.TableB.Include("TableA")
    .Where(c => c.TableBID == 1)
    .SelectMany(c => c.TableA);
Okay, so now I've taken on board that there's a many to many relationship, I think Canavar is right - you want a SelectMany.
Again, that's easier to see in a query expression:
var tableARows = from rowB in Context.TableB.Include("TableA")
where rowB.TableBID == 1
from rowA in rowB.TableA
orderby rowA.ColumnToSort
select rowA;
The reason it didn't work is that you've got a different result type. Previously, you were getting a type like:
List<EntitySet<TableA>>
(I don't know the exact type as I'm not a LINQ to Entities guy, but it would be something like that.)
Now we've flattened all those TableA rows into a single list:
List<TableA>
Now you can't order a sequence of sets by a single column within a row - but you can order a sequence of rows by a column. So basically your intuition in the question was right when you said "presumably because it's a collection of TableA objects, not a single row" - but it wasn't quite clear what you meant by "it".
Now, is that flattening actually appropriate for you? It means you no longer know which B contributed any particular A. Is there only actually one B involved here, so it doesn't matter? If so, there's another option which may even perform better (I really don't know, but you might like to look at the SQL generated in each case and profile it):
var tableARows = Context.TableB.Include("TableA")
.Where(b => b.TableBID == 1)
.Single()
.TableA.OrderBy(a => a.ColumnToSort)
.ToList();
Note that this will fail (or at least would in LINQ to Objects; I don't know exactly what will happen in entities) if there isn't a row in table B with an ID of 1. Basically it selects the single row, then selects all As associated with that row, and orders them.
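The difference between the two result shapes can be checked with LINQ to Objects. This is only an in-memory sketch with made-up rows. The TableA array stands in for the navigation collection on a TableB row, and it says nothing about the SQL that Entity Framework would generate:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for TableB and its related TableA rows.
var tableB = new[]
{
    new { TableBID = 1, TableA = new[] { new { ColumnToSort = "pear" }, new { ColumnToSort = "apple" } } },
    new { TableBID = 2, TableA = new[] { new { ColumnToSort = "fig" } } },
};

// Select yields one collection per matching B row: a sequence of sequences.
var nested = tableB
    .Where(b => b.TableBID == 1)
    .Select(b => b.TableA)
    .ToList();

// SelectMany flattens those collections, so each element is a single A row
// and can be ordered by one of its columns.
var flat = tableB
    .Where(b => b.TableBID == 1)
    .SelectMany(b => b.TableA)
    .OrderBy(a => a.ColumnToSort)
    .ToList();

Console.WriteLine(nested.Count); // number of matching B rows
Console.WriteLine(string.Join(", ", flat.Select(a => a.ColumnToSort)));
```

Here nested holds one element (an array of two A rows), while flat holds the two A rows themselves, sorted by ColumnToSort.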
