Merge Complex Object List using Union / Intersect


Consider two lists of complex objects, say:
var first = new List<Record>
{
new Record(1, new List<int> { 2, 3 }),
new Record(4, new List<int> { 5, 6 })
};
var second = new List<Record>
{
new Record(1, new List<int> { 4 })
};
where a Record is defined as below. Nothing fancy, just a class with an Id and a list of SecondaryIdentifiers.
public class Record
{
private readonly IList<int> _secondaryIdentifiers;
private readonly int _id;
public Record(int id, IList<int> secondaryIdentifiers)
{
_id = id;
_secondaryIdentifiers = secondaryIdentifiers;
}
public IList<int> SecondaryIdentifiers
{
get { return _secondaryIdentifiers; }
}
public int Id
{
get { return _id; }
}
}
How can I union / intersect the two lists such that the Union and Intersect operations merge the SecondaryIdentifiers?
var union = first.Union(second);
var intersect = first.Intersect(second);
Union will be
{
new Record(1, new List<int> { 2, 3 , 4 }),
new Record(4, new List<int> { 5, 6 })
};
Intersect will be
{
new Record(1, new List<int> { 2, 3 , 4 }),
};
What I have tried
I tried using a first.Union(second, new EqualityComparer()) where the EqualityComparer extends IEqualityComparer<Record> and merges the two SecondaryIdentifiers if the two items compared are equal, but it seemed a little hacky to me.
Is there a more elegant way of doing this?

Is there a more elegant way of doing this
It is opinion-based, but I would do it as:
var union = first.Concat(second)
.GroupBy(x => x.Id)
.Select(g => new Record(g.Key, g.SelectMany(y => y.SecondaryIdentifiers).ToList()))
.ToList();
// Assumes each Id occurs at most once per list, so a group with more than one item spans both lists
var intersect = first.Concat(second)
.GroupBy(x => x.Id)
.Where(x => x.Count() > 1)
.Select(g => new Record(g.Key, g.SelectMany(y => y.SecondaryIdentifiers).ToList()))
.ToList();
PS: Feel free to remove the trailing .ToList() calls for lazy evaluation.
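As a self-contained sketch of the grouping approach, the following builds Record objects as the result. RecordMerger, MergeUnion and MergeIntersect are hypothetical names, not from the question. The Distinct() call and the HashSet-based intersect are additions of mine: the first drops repeated secondary identifiers, the second avoids relying on Ids being unique within each list.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public Record(int id, IList<int> secondaryIdentifiers)
    {
        Id = id;
        SecondaryIdentifiers = secondaryIdentifiers;
    }

    public int Id { get; private set; }
    public IList<int> SecondaryIdentifiers { get; private set; }
}

// Hypothetical helper class, introduced only for this illustration.
public static class RecordMerger
{
    // Every Id from either list, with the secondary identifiers of equal Ids combined.
    public static List<Record> MergeUnion(IEnumerable<Record> first, IEnumerable<Record> second)
    {
        return first.Concat(second)
            .GroupBy(r => r.Id)
            .Select(g => new Record(
                g.Key,
                g.SelectMany(r => r.SecondaryIdentifiers).Distinct().ToList()))
            .ToList();
    }

    // Only Ids present in both lists, with their secondary identifiers combined.
    public static List<Record> MergeIntersect(IEnumerable<Record> first, IEnumerable<Record> second)
    {
        var idsInBoth = new HashSet<int>(first.Select(r => r.Id));
        idsInBoth.IntersectWith(second.Select(r => r.Id));

        return MergeUnion(first, second)
            .Where(r => idsInBoth.Contains(r.Id))
            .ToList();
    }
}
```

With the lists from the question, MergeUnion(first, second) should give the records 1 -> { 2, 3, 4 } and 4 -> { 5, 6 }, and MergeIntersect(first, second) only 1 -> { 2, 3, 4 }.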

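This should work for the union part, as long as every Id you need appears in first (the group join behaves like a left join, so records found only in second are not returned):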
from a in first
join b in second on a.Id equals b.Id into rGroup
let ids = a.SecondaryIdentifiers.Union(rGroup.SelectMany(r => r.SecondaryIdentifiers))
select new Record(a.Id, ids.ToList())
and the intersect:
from a in first
join b in second on a.Id equals b.Id
select new Record(a.Id, a.SecondaryIdentifiers.Union(b.SecondaryIdentifiers).ToList())

Related

LINQ group by sum not working as expected

I have this class:
public class tempClass
{
public int myKey { get; set; }
public int total { get; set; }
}
Code to group by and sum:
var list = new List<tempClass>();
list.Add(new tempClass { myKey = 1, total = 1 });
list.Add(new tempClass { myKey = 1, total = 2 });
list.Add(new tempClass { myKey = 2, total = 3 });
list.Add(new tempClass { myKey = 2, total = 4 });
list = list
.Select(w => new tempClass { myKey = w.myKey, total = w.total })
.GroupBy(x => new tempClass { myKey = x.myKey })
.Select(y => new tempClass { myKey = y.Key.myKey, total = y.Sum(z => z.total) })
.ToList();
The list count is still 4 after the GroupBy.
Same result for code below:
list = list
.GroupBy(x => new tempClass { myKey = x.myKey })
.Select(y => new tempClass { myKey = y.Key.myKey, total = y.Sum(z => z.total) })
.ToList();

The reason is that you group by a class that doesn't override Equals and GetHashCode, so the implementation from System.Object is used, which only compares references. Since every key is a different reference, you get one group per instance.
You could group by the property itself, or override Equals and GetHashCode to compare that property:
list = list
.Select(w => new tempClass { myKey = w.myKey, total = w.total })
.GroupBy(x => x.myKey)
.Select(y => new tempClass { myKey = y.Key, total = y.Sum(z => z.total) })
.ToList();
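The other option mentioned in this answer, overriding Equals and GetHashCode, might look roughly like the sketch below. It assumes that equality should depend on myKey only; GroupingSketch and SumByKey are hypothetical names. Keep in mind that hashing on a mutable property is fragile if myKey changes while the object is used as a key.

```csharp
using System.Collections.Generic;
using System.Linq;

public class tempClass
{
    public int myKey { get; set; }
    public int total { get; set; }

    // Two instances count as equal when their myKey values match; total is ignored.
    public override bool Equals(object obj)
    {
        var other = obj as tempClass;
        return other != null && other.myKey == myKey;
    }

    // Must be consistent with Equals: equal objects return equal hash codes.
    public override int GetHashCode()
    {
        return myKey.GetHashCode();
    }
}

// Hypothetical helper class, introduced only for this illustration.
public static class GroupingSketch
{
    public static List<tempClass> SumByKey(IEnumerable<tempClass> items)
    {
        // Grouping by a new tempClass now works, because the default
        // equality comparer uses the overrides above.
        return items
            .GroupBy(x => new tempClass { myKey = x.myKey })
            .Select(g => new tempClass { myKey = g.Key.myKey, total = g.Sum(z => z.total) })
            .ToList();
    }
}
```

Grouping by x.myKey directly is still the simpler choice; the overrides only pay off if tempClass instances are compared by key in other places as well.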

You don't need two Select lines, one is enough. And inside GroupBy, just select your key, don't create a new object of your class there:
list = list
.GroupBy(x => x.myKey)
.Select(y => new tempClass { myKey = y.Key, total = y.Sum(z => z.total) })
.ToList();
And here's the declarative-query-syntax version:
list = (from x in list
group x by x.myKey into g
select new tempClass { myKey = g.Key, total = g.Sum(z => z.total) }).ToList();

My, you are creating a lot of new tempClass objects in your LINQ statement, aren't you?
The reason you don't get the correct result is that your GroupBy doesn't make groups of tempClass objects with equal myKey, but groups of equal tempClass objects.
The default EqualityComparer for tempClass declares two tempClass objects equal only if they are the same object, so two tempClass objects are unequal even if they have the same values.
Your query should be:
var result = list
.GroupBy(listItem => listItem.myKey) // make groups with equal myKey
.Select(group => new // from every group make one new item
{
Key = group.Key, // with key the common myKey in the group
// and value the sum of all total values in the group
GrandTotal = group.Sum(groupItem => groupItem.total)
});
I chose not to make the final result a sequence of tempClass objects, because I'm not sure whether you would consider items with this GrandTotal to be tempClass objects. But if you want, you can change the final Select:
.Select(group => new tempClass
{
myKey = group.Key,
total = group.Sum(groupItem => groupItem.total)
});

Make C# ParallelEnumerable.OrderBy stable sort

I'm sorting a list of objects by their integer ids in parallel using OrderBy. I have a few objects with the same id and need the sort to be stable.
According to Microsoft's documentation, the parallelized OrderBy is not stable, but there is an implementation approach to make it stable. However, I cannot find an example of this.
var list = new List<pair>() { new pair("a", 1), new pair("b", 1), new pair("c", 2), new pair("d", 3), new pair("e", 4) };
var newList = list.AsParallel().WithDegreeOfParallelism(4).OrderBy<pair, int>(p => p.order);
private class pair {
private String name;
public int order;
public pair (String name, int order) {
this.name = name;
this.order = order;
}
}

The remarks for the other OrderBy method suggest this approach:
var newList = list
.Select((pair, index) => new { pair, index })
.AsParallel().WithDegreeOfParallelism(4)
.OrderBy(p => p.pair.order)
.ThenBy(p => p.index)
.Select(p => p.pair);
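A self-contained sketch of that approach follows. Pair, StableParallelSort and SortByOrder are hypothetical names chosen for the example. The original index is captured before AsParallel() and used as a tie-breaker, so elements with equal keys keep their input order.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Pair
{
    public Pair(string name, int order)
    {
        Name = name;
        Order = order;
    }

    public string Name { get; private set; }
    public int Order { get; private set; }
}

// Hypothetical helper class, introduced only for this illustration.
public static class StableParallelSort
{
    public static List<Pair> SortByOrder(IList<Pair> items)
    {
        return items
            .Select((item, index) => new { item, index }) // remember the input position
            .AsParallel()
            .OrderBy(x => x.item.Order)                   // primary key
            .ThenBy(x => x.index)                         // tie-breaker restores input order
            .Select(x => x.item)
            .ToList();
    }
}
```

For the input ("x", 2), ("a", 1), ("y", 2), ("b", 1) this should return a, b, x, y: within each order value the elements appear in their original sequence.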

How to sort collection quite specifically by linq

var ids = new int[] { 3, 2, 20, 1 };
var entities = categories.Where(entity => ids.Contains(entity.Id));
I need to sort entities in exactly the same order as the ids array. How can I do that?

This should do the trick (written off the top of my head, so it may contain mistakes):
var ids = new int[] { 3, 2, 20, 1 };
var ordering = ids.Select((id,index) => new {id,index});
var entities =
categories
.Where(entity => ids.Contains(entity.Id))
.AsEnumerable() //line not necessary if 'categories' is a local sequence
.Join(ordering, ent => ent.Id, ord => ord.id, (ent,ord) => new {ent,ord})
.OrderBy(x => x.ord.index)
.Select(x => x.ent);

You could use OrderBy with the index of the Ids in ids.
To get the index of an Id from ids, you could create a map of Id to index. That way you can look up the index in almost constant time, instead of having to call IndexOf and traverse the whole list each time.
Something like this:
var idToIndexMap = ids
.Select((id, index) => new { Id = id, Index = index }) // Select passes the element first, then its index
.ToDictionary(
pair => pair.Id,
pair => pair.Index
);
var sortedEntities = categories
.Where(e => ids.Contains(e.Id))
.ToList() // Isn't necessary if this is Linq-to-Objects instead of entities...
.OrderBy(e => idToIndexMap[e.Id])
;
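The dictionary idea can be wrapped in a small generic helper, sketched below. IdOrdering, OrderByIds and the idSelector parameter are hypothetical names. The sketch assumes the ids are unique (ToDictionary throws on duplicate keys) and that the source is already in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, introduced only for this illustration.
public static class IdOrdering
{
    // Keeps only the elements whose id is listed in ids and returns them in the order of ids.
    public static List<T> OrderByIds<T>(IEnumerable<T> source, IList<int> ids, Func<T, int> idSelector)
    {
        // Map each id to its position; assumes ids contains no duplicates.
        var indexById = ids
            .Select((id, index) => new { id, index })
            .ToDictionary(p => p.id, p => p.index);

        return source
            .Where(e => indexById.ContainsKey(idSelector(e)))
            .OrderBy(e => indexById[idSelector(e)])
            .ToList();
    }
}
```

A call such as IdOrdering.OrderByIds(categories.ToList(), ids, c => c.Id) would then take the place of the Where/OrderBy chain.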

You may have a go with this:
public class Foo
{
public void Bar()
{
int[] idOrder = new int[] { 3, 2, 20, 1 };
var lookup = idOrder.ToDictionary(i => i,
i => Array.IndexOf(idOrder, i));
foreach(var a in idOrder.OrderBy(i => new ByArrayComparable<int>(lookup, i)))
Console.WriteLine(a);
}
}
public class ByArrayComparable<T> : IComparable<ByArrayComparable<T>> where T : IComparable<T>
{
public readonly IDictionary<T, int> order;
public readonly T element;
public ByArrayComparable(IDictionary<T, int> order, T element)
{
this.order = order;
this.element = element;
}
public int CompareTo(ByArrayComparable<T> other)
{
return this.order[this.element].CompareTo(this.order[other.element]);
}
}
This works for unique elements only, but the lookup effort is constant.

how to get an ordered list with default values using linq

I have an ICollection of records (userID, itemID, rating) and an IEnumerable of items.
For a specific userID and each itemID from a set of itemIDs, I need to produce a list of the user's ratings for the items, or 0 if no such record exists. The list should be ordered by the items.
example:
records = [(1,1,2),(1,2,3),(2,3,1)]
items = [3,1]
userID = 1
result = [0,2]
my attempt:
dataset.Where((x) => (x.userID == uID) & items.Contains(x.iID)).Select((x) => x.rating);
It does the job, but it doesn't return 0 as the default value and it isn't ordered...
I'm new to C# and LINQ; a pointer in the right direction would be much appreciated.
Thank you.

This does the job:
var records = new int[][] { new int[] { 1, 1, 2 }, new int[] { 1, 2, 3 }, new int[] { 2, 3, 1 } };
var items = new int[] { 3, 1 };
var userId = 1;
var result = items.Select(i =>
{
// When there's a match
if (records.Any(r => r[0] == userId && r[1] == i))
{
// Return all numbers
return records.Where(r => r[0] == userId && r[1] == i).Select(r => r[2]);
}
else
{
// Just return 0
return new int[] { 0 };
}
}).SelectMany(r => r); // flatten the int[][] to int[]
// output
result.ToList().ForEach(i => Console.Write("{0} ", i));
Console.ReadKey(true);

How about:
dataset.Where((x) => (x.userID == uID)).Select((x) => items.Contains(x.iID) ? x.rating : 0)

This does the job. Whether it's a maintainable/readable solution is a topic for another discussion:
// your example input, expressed as anonymous-type objects
var records = new[]
{
new { UserId = 1, ItemId = 1, Rating = 2 },
new { UserId = 1, ItemId = 2, Rating = 3 },
new { UserId = 2, ItemId = 3, Rating = 1 }
};
var items = new[] { 3, 1 };
var userId = 1;
var output = items
.OrderByDescending(i => i)
.GroupJoin(records,
i => i,
r => r.ItemId,
(i, r) => new { ItemId = i, Records = r })
.Select(g => g.Records.FirstOrDefault(r => r.UserId == userId))
.Select(r => r == null ? 0 : r.Rating);
How this query works:
The ordering is obvious.
The ugly GroupJoin joins every element from items with all records that share the same ItemId into an anonymous type { ItemId, Records }.
Next we select the first record for each entry that matches userId; if none is found, null is returned (thanks to FirstOrDefault).
The last thing we do is check whether we have a value (in which case we select its Rating) or not (in which case we return 0).

How about this? Your question sounds a bit like an outer join from SQL, and you can do that with GroupJoin and SelectMany:
var record1 = new Record() { userID = 1, itemID = 1, rating = 2 };
var record2 = new Record() { userID = 1, itemID = 2, rating = 3 };
var record3 = new Record() { userID = 2, itemID = 3, rating = 1 };
var records = new List<Record> { record1, record2, record3 };
int userID = 1;
var items = new List<int> { 3, 1 };
var results = items
.GroupJoin( records.Where(r => r.userID == userID), item => item, record => record.itemID, (item, record) => new { item, ratings = record.Select(r => r.rating) } )
.OrderBy( itemRating => itemRating.item)
.SelectMany( itemRating => itemRating.ratings.DefaultIfEmpty(), (itemRating, rating) => rating);
To explain what is going on
For each item GroupJoin gets the list of rating (or empty list if no rating) for the specified user
OrderBy is obvious
SelectMany flattens the ratings lists, providing a zero if the ratings list is empty (by DefaultIfEmpty)
Hope this makes sense.
Be aware, if there is more than one rating for an item by a user, they will all appear in the final list.
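If at most one rating per item is wanted and the result should follow the order of items as given (as in the question's example, where items = [3,1] yields [0,2]), a lookup-based variant is one possible sketch. Rating, RatingLookup and RatingsFor are hypothetical names, not taken from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Rating
{
    public int UserId { get; set; }
    public int ItemId { get; set; }
    public int Value { get; set; }
}

// Hypothetical helper class, introduced only for this illustration.
public static class RatingLookup
{
    // One rating per requested item, in the order of items; 0 when the user has no rating.
    public static List<int> RatingsFor(IEnumerable<Rating> records, int userId, IEnumerable<int> items)
    {
        // A lookup returns an empty sequence for a missing key instead of throwing.
        var ratingsByItem = records
            .Where(r => r.UserId == userId)
            .ToLookup(r => r.ItemId, r => r.Value);

        // FirstOrDefault on an empty int sequence yields 0, which is the wanted default.
        return items
            .Select(itemId => ratingsByItem[itemId].FirstOrDefault())
            .ToList();
    }
}
```

If a user has several ratings for the same item, this sketch keeps only the first one encountered.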

How would I use LINQ to find the most frequently occurring value in a data set?

List<int> a = new List<int> { 11, 2, 3, 11, 3, 22, 9, 2 };
//output
11

This may not be the most efficient way, but it will get the job done.
public static int MostFrequent(IEnumerable<int> enumerable)
{
var query = from it in enumerable
group it by it into g
select new {Key = g.Key, Count = g.Count()} ;
return query.OrderByDescending(x => x.Count).First().Key;
}
And the fun single line version ...
public static int MostFrequent(IEnumerable<int> enumerable)
{
return (from it in enumerable
group it by it into g
select new {Key = g.Key, Count = g.Count()}).OrderByDescending(x => x.Count).First().Key;
}

a.GroupBy(item => item).
Select(group => new { Key = group.Key, Count = group.Count() }).
OrderByDescending(pair => pair.Count).
First().
Key;

Another example:
IEnumerable<int> numbers = new[] { 11, 2, 3, 11, 3, 22, 9, 2 };
int most = numbers
.Select(x => new { Number = x, Count = numbers.Count(y => y == x) })
.OrderByDescending(z => z.Count)
.First().Number;
