LINQ Distinct or GroupBy on multiple properties (C#)

How can I use C# and LINQ to get the result shown below from the following list?
var pr = new List<Product>()
{
new Product() {Title="Boots",Color="Red", Price=1},
new Product() {Title="Boots",Color="Green", Price=1},
new Product() {Title="Boots",Color="Black", Price=2},
new Product() {Title="Sword",Color="Gray", Price=2},
new Product() {Title="Sword",Color="Green",Price=2}
};
Result:
{Title="Boots",Color="Red", Price=1},
{Title="Boots",Color="Black", Price=2},
{Title="Sword",Color="Gray", Price=2}
I know that I should use GroupBy or Distinct, but I don't understand how to get what I need:
List<Product> result = pr.GroupBy(g => g.Title, g.Price).ToList(); //not working
List<Product> result = pr.Distinct(...);
Please help

Group by the needed properties and select the first item of each group:
List<Product> result = pr.GroupBy(g => new { g.Title, g.Price })
.Select(g => g.First())
.ToList();
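As a minimal sketch of the answer above, assuming a Product class with Title, Color, and Price properties (the question does not show it, so the decimal Price type and the ProductQueries helper are assumptions), the whole thing might look like this:

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Product class; the question does not show it.
public class Product
{
    public string Title { get; set; } = "";
    public string Color { get; set; } = "";
    public decimal Price { get; set; }
}

public static class ProductQueries
{
    // Keeps the first product seen for each distinct (Title, Price) pair.
    public static List<Product> FirstPerTitleAndPrice(IEnumerable<Product> products)
    {
        return products
            .GroupBy(p => new { p.Title, p.Price }) // anonymous types compare by value
            .Select(g => g.First())                 // first element of each group, in source order
            .ToList();
    }
}
```

Calling ProductQueries.FirstPerTitleAndPrice(pr) on the list from the question should give Boots/Red/1, Boots/Black/2, and Sword/Gray/2, because LINQ to Objects yields groups in the order their keys first appear.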

While a new anonymous type will work, it may be more readable, and easier to consume outside your method, to create your own key type or use a Tuple. (Other times a delimited string may suffice: string.Format("{0}.{1}", g.Title, g.Price).) Note that GroupBy returns groups, not products, so the result cannot be assigned to a List<Product> directly:
// key as a Tuple
var byTuple = pr.GroupBy(g => new Tuple<string, decimal>(g.Title, g.Price))
    .ToList();
// key as a custom type (it must implement Equals and GetHashCode)
var byCustomKey = pr.GroupBy(g => new ProductTitlePriceGroupKey(g.Title, g.Price))
    .ToList();
As for getting the result set you want, the provided answer suggests just returning the first, and perhaps that's OK for your purposes, but ideally you'd need to provide a means by which Color is aggregated or ignored.
For instance, perhaps you'd rather list the colors included, somehow:
List<Product> result = pr
.GroupBy(g => new Tuple<string, decimal>(g.Title, g.Price))
.Select(x => new Product()
{
Title = x.Key.Item1,
Price = x.Key.Item2,
Color = string.Join(", ", x.Select(y => y.Color)) // e.g. "Red, Green"
})
.ToList();
In the case of a simple string property for color, it may make sense to simply concatenate them. If you had another entity there, or simply don't want to abstract away that information, perhaps it would be best to have another entity altogether that has a collection of that entity type. For instance, if you were grouping on title and color, you might want to show the average price, or a range of prices, where simply selecting the first of each group would prevent you from doing so.
List<ProductGroup> result = pr
.GroupBy(g => new Tuple<string, decimal>(g.Title, g.Price))
.Select(x => new ProductGroup()
{
Title = x.Key.Item1,
Price = x.Key.Item2,
Colors = x.Select(y => y.Color)
})
.ToList();
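The point about averages and price ranges can be sketched as follows. This is only an illustration: the PriceSummary type and the SummarizeByTitle helper are hypothetical names, the Product shape is assumed, and the grouping is on Title alone so that each group contains more than one price.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Product class; the question does not show it.
public class Product
{
    public string Title { get; set; } = "";
    public string Color { get; set; } = "";
    public decimal Price { get; set; }
}

// Hypothetical result type holding the aggregated values for one group.
public class PriceSummary
{
    public string Title { get; set; } = "";
    public decimal MinPrice { get; set; }
    public decimal MaxPrice { get; set; }
    public decimal AveragePrice { get; set; }
}

public static class ProductSummaries
{
    // Aggregates every group instead of keeping only its first element.
    public static List<PriceSummary> SummarizeByTitle(IEnumerable<Product> products)
    {
        return products
            .GroupBy(p => p.Title)
            .Select(g => new PriceSummary
            {
                Title = g.Key,
                MinPrice = g.Min(p => p.Price),
                MaxPrice = g.Max(p => p.Price),
                AveragePrice = g.Average(p => p.Price)
            })
            .ToList();
    }
}
```

Selecting g.First() would discard exactly the information that Min, Max, and Average use here.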

If you want to abstract away some of the logic into a reusable extension method, you can add the following:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        // HashSet<T>.Add returns false when the key was already seen,
        // so the key selector runs only once per element
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
This works for both single and composite keys and returns the first matching element for each key:
// distinct by single property
var productsByTitle = pr.DistinctBy(a => a.Title);
// distinct by multiple properties
var productsByTitleAndColor = pr.DistinctBy(a => new { a.Title, a.Color });
One benefit of this approach over GroupBy + First is that the result is yielded lazily, so a later operation (such as Take or First) that needs only some of the elements does not force a pass over the entire collection.
Further Reading: linq query to return distinct field values from a list of objects
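The laziness benefit mentioned above can be observed with a small counting experiment. This is a sketch: the extension is repeated under the hypothetical name DistinctByKey so that it cannot clash with the Enumerable.DistinctBy method built into .NET 6 and later, and ElementsReadForFirstTwo is an illustrative helper, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctDemo
{
    // Same idea as the extension above, under a hypothetical name.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    // Returns how many source elements were pulled before the first
    // two distinct lengths were found.
    public static int ElementsReadForFirstTwo(IEnumerable<string> words)
    {
        int read = 0;
        words
            .Select(w => { read++; return w; }) // count each element as it is pulled
            .DistinctByKey(w => w.Length)
            .Take(2)
            .ToList();
        return read;
    }
}
```

For the input "ab", "cd", "efg", "hi", "jklm", only the first three elements are read, because Take(2) stops as soon as the second distinct length has been yielded. A GroupBy would have to consume all five before producing its first group.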

Related

Separate collection on 2 different NEW collections

I have a collection of elements and an additional small collection that acts as a filter.
I need to split the first collection into 2 new collections using that filter: one containing the elements that match the filter and another containing those that don't.
Every item ends up in exactly one of the 2 new collections.
I did it like this:
var collection1 = baseCollection.Where(r => filterCollection.Contains(r.Property)).ToList();
var collection2 = baseCollection.Where(r => !filterCollection.Contains(r.Property)).ToList();
Is there another, hopefully more elegant, way to split the collection?
To me it looks like I am repeating myself, using almost the same code twice.

You can create a variable for the function - this way you will not "repeat yourself" (wouldn't use in this case, there are better options below, but still an option):
Func<YourClass,bool> filtering = (r) => filterCollection.Contains(r.Property);
var collection1 = baseCollection.Where(r => filtering(r));
var collection2 = baseCollection.Where(r => !filtering(r));
If the element type of your collection overrides Equals and GetHashCode, you can use Except (note that Except also removes duplicates from its result):
var collection1 = baseCollection.Where(r => filterCollection.Contains(r.Property));
var collection2 = baseCollection.Except(collection1);
Using Except with a given IEqualityComparer (see also the first comment for guidelines):
public class Comparer : IEqualityComparer<YourClass>
{
public bool Equals(YourClass x, YourClass y)
{
// Your implementation
}
public int GetHashCode(YourClass obj)
{
// Your implementation
}
}
var collection1 = baseCollection.Where(r => filterCollection.Contains(r.Property));
var collection2 = baseCollection.Except(collection1, new Comparer());
You can also use GroupBy (probably worse performance-wise):
var result = baseCollection.GroupBy(r => filterCollection.Contains(r.Property))
    .ToDictionary(key => key.Key, value => value.ToList());
var collection1 = result[true];
var collection2 = result[false];
Otherwise, another way is simply to use a loop:
List<YourClass> collection1 = new List<YourClass>();
List<YourClass> collection2 = new List<YourClass>();
foreach(var item in baseCollection)
{
if(filterCollection.Contains(item.Property))
{
collection1.Add(item);
}
else
{
collection2.Add(item);
}
}
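A variation on the GroupBy idea is ToLookup, which evaluates the predicate once per element and, unlike a dictionary indexer, returns an empty sequence for a key that has no elements. The sketch below uses hypothetical names (Item, Partitioner.Split) and assumes the filter holds strings compared against Property:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the property the filter is compared against.
public class Item
{
    public string Property { get; set; } = "";
}

public static class Partitioner
{
    // Splits the items into those whose Property is in the filter and the rest.
    public static (List<Item> Matching, List<Item> Rest) Split(
        IEnumerable<Item> baseCollection, IEnumerable<string> filterCollection)
    {
        // A HashSet keeps each Contains check at O(1) expected time.
        var filter = new HashSet<string>(filterCollection);

        // One pass over the source; the key is true for matches, false otherwise.
        var lookup = baseCollection.ToLookup(r => filter.Contains(r.Property));

        // Indexing a lookup with a missing key yields an empty sequence.
        return (lookup[true].ToList(), lookup[false].ToList());
    }
}
```

With this shape, an empty filter simply produces an empty Matching list instead of throwing.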

Sort in-memory list by another in-memory list

Is it possible to sort an in-memory list by another list (the second list would be a reference data source or something similar)?
public class DataItem
{
public string Name { get; set; }
public string Path { get; set; }
}
// a list of Data Items, randomly sorted
List<DataItem> dataItems = GetDataItems();
// the sort order data source with the paths in the correct order
IEnumerable<string> sortOrder = new List<string> {
"A",
"A.A1",
"A.A2",
"A.B1"
};
// is there a way to tell linq to sort the in-memory list of objects
// by the sortOrder "data source"
dataItems = dataItems.OrderBy(p => p.Path == sortOrder).ToList();

First, let's assign an index to each item in sortOrder:
var sortOrderWithIndices = sortOrder.Select((x, i) => new { path = x, index = i });
Next, we join the two lists and sort:
var dataItemsOrdered =
from d in dataItems
join x in sortOrderWithIndices on d.Path equals x.path //pull index by path
orderby x.index //order by index
select d;
This is how you'd do it in SQL as well.

Here is an alternative (and I argue more efficient) approach to the one accepted as answer.
List<DataItem> dataItems = GetDataItems();
IDictionary<string, int> sortOrder = new Dictionary<string, int>()
{
    // lower values sort first
    {"A", 0},
    {"A.A1", 1},
    {"A.A2", 2},
    {"A.B1", 3},
};
dataItems.Sort((di1, di2) => sortOrder[di1.Path].CompareTo(sortOrder[di2.Path]));
Assume Sort() and OrderBy() both take O(n log n), where n is the number of items in dataItems. The solution given here then takes O(n log n) for the sort, with O(1) expected time per dictionary lookup. We assume that creating the dictionary sortOrder costs about the same as creating the IEnumerable in the original post.
Doing a join before sorting adds the cost of the join on top of that, where m is the number of elements in sortOrder: O(nm) if the join is evaluated as a nested loop, giving O(nm + n log n) in total, or roughly O(n + m) when it is evaluated with a hash lookup, as LINQ to Objects does.
So in theory the join approach may boil down to O(n log n) anyway. In practice, though, the join costs extra cycles, and the LINQ approach may also use extra space for the auxiliary collections it builds while processing the query.

If your list of paths is large, you would be better off performing your lookups against a dictionary:
var sortValues = sortOrder.Select((p, i) => new { Path = p, Value = i })
.ToDictionary(x => x.Path, x => x.Value);
dataItems = dataItems.OrderBy(di => sortValues[di.Path]).ToList();

Custom ordering is done with a custom comparer (an implementation of the IComparer<T> interface) that is passed as the second argument to the OrderBy method.
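A comparer for this question might look like the sketch below. PathOrderComparer and DataItemSorting are hypothetical names, and the choices to sort unknown paths last and to require unique paths in sortOrder are assumptions, not something the question specifies.

```csharp
using System.Collections.Generic;
using System.Linq;

public class DataItem
{
    public string Name { get; set; } = "";
    public string Path { get; set; } = "";
}

// Hypothetical comparer: orders paths by their position in a reference list.
public class PathOrderComparer : IComparer<string>
{
    private readonly Dictionary<string, int> _rank;

    public PathOrderComparer(IEnumerable<string> sortOrder)
    {
        // Assumes the paths in sortOrder are unique (ToDictionary throws on duplicates).
        _rank = sortOrder
            .Select((path, index) => new { path, index })
            .ToDictionary(x => x.path, x => x.index);
    }

    public int Compare(string x, string y)
    {
        return Rank(x).CompareTo(Rank(y));
    }

    // Paths missing from the reference list (or null) sort last.
    private int Rank(string path)
    {
        int rank;
        return path != null && _rank.TryGetValue(path, out rank) ? rank : int.MaxValue;
    }
}

public static class DataItemSorting
{
    public static List<DataItem> SortByPath(
        IEnumerable<DataItem> items, IEnumerable<string> sortOrder)
    {
        // The comparer is the second argument to OrderBy.
        return items.OrderBy(d => d.Path, new PathOrderComparer(sortOrder)).ToList();
    }
}
```

Because OrderBy is a stable sort, items that share a path (or that are all unknown) keep their original relative order.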

C# Sorting list by another list

I now have 2 lists:
List<string> names;
List<int> numbers;
and I need to sort names based on the values in numbers.
I've been searching, and most answers use something like x.ID, but I don't really know what that value is, so that didn't work.
Does anyone know what to do, or can someone help me with the ID part?

I assume that the elements in both lists are related through their index:
names.Select((n, index) => new { Name = n, Index = index })
.OrderBy(x => numbers.ElementAtOrDefault(x.Index))
.Select(x => x.Name)
.ToList();
But if the two lists are that closely related, I would use another collection type, such as Dictionary<int, string>, instead.

Maybe this is a task for the Zip method. Something like
names.Zip(numbers, (name, number) => new { name, number, })
will "zip" the two sequences into one. From there you can either order the sequence immediately, like
.OrderBy(a => a.number)
or you can instead create a Dictionary<,>, like
.ToDictionary(a => a.number, a => a.name)
But it sounds like what you really want is a SortedDictionary<,>, not a Dictionary<,> which is organized by hash codes. There's no LINQ method for creating a sorted dictionary, but just say
var sorted = new SortedDictionary<int, string>();
foreach (var a in zipResultSequence)
sorted.Add(a.number, a.name);
Or alternatively, with a SortedDictionary<,>, skip LINQ entirely and write the following (note that Add throws if numbers contains duplicates):
var sorted = new SortedDictionary<int, string>();
for (int idx = 0; idx < numbers.Count; ++idx) // supposing the two list have same Count
sorted.Add(numbers[idx], names[idx]);
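Putting the Zip and OrderBy pieces together gives a complete, if minimal, sketch. The helper name SortNamesByNumbers is hypothetical, and it assumes the two lists are related by index as described above:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListSorting
{
    // Pairs each name with the number at the same index, sorts by the number,
    // and returns only the names.
    public static List<string> SortNamesByNumbers(
        IEnumerable<string> names, IEnumerable<int> numbers)
    {
        return names
            .Zip(numbers, (name, number) => new { name, number })
            .OrderBy(a => a.number)
            .Select(a => a.name)
            .ToList();
    }
}
```

Unlike the dictionary variants, this tolerates duplicate numbers: OrderBy is stable, so names with equal numbers keep their original relative order. If the lists differ in length, Zip stops at the end of the shorter one.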

To complement Tim's answer, you can also use a custom data structure to associate one name with a number.
public class Person
{
public int Number { get; set; } // in this case you could also name it ID
public string Name { get; set; }
}
Then you would have a List<Person> persons, which you can sort by whatever property you like:
List<Person> persons = new List<Person>();
persons.Add(new Person(){Number = 10, Name = "John Doe"});
persons.Add(new Person(){Number = 3, Name = "Max Muster"});
// sort by number
persons = persons.OrderBy(p=>p.Number).ToList();
// alternative sorting method
persons.Sort((a, b) => a.Number.CompareTo(b.Number)); // CompareTo avoids the overflow risk of a.Number - b.Number

I fixed it by using a dictionary; this was the result:
dictionary.OrderBy(kv => kv.Value).Reverse().Select(kv => kv.Key).ToList();

LINQ 'in' clause for child object properties

With the following object hierarchy, I need to confirm whether or not all string Id values are present in Inventories of each SearchResult e.g.
Given a string[] list = { "123", "234", "345" } confirm all list values are present at least once in the array of Inventory elements. I'm curious if I can clean this up using one LINQ statement.
SearchResult
--
Inventory[] Inventories
Inventory
--
String Id
Right now, I'm splitting list e.g.
list.Split(',').ToDictionary(i => i.ToString(), i => false)
And iterating the dictionary, testing each Inventory. Then, I create a new List<SearchResult> and add items if there are no false values left in the dictionary. This feels clunky.
Code
// instock: IEnumerable<SearchResult>
foreach (var result in instock)
{
Dictionary<string, bool> ids = list.Split(',').ToDictionary(i => i.ToString(), i => false);
foreach (var id in ids)
if (result.Inventory.Any(i => i.Id == id.Key))
ids[id.Key] = true;
if (!ids.Any(i => i.Value == false))
// instockFiltered: List<SearchResult>
instockFiltered.Add(result);
}

Here is a bit of code I wrote. Its advantage is that it uses a hash set, so it has theoretically linear complexity.
public static bool ContainsAll<T>(this IEnumerable<T> superset, IEnumerable<T> subset, IEqualityComparer<T> comparer)
{
var set = new HashSet<T>(superset, comparer);
return set.IsSupersetOf(subset);
}
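Applied to the question, the extension could be used roughly as follows. This is a sketch under assumptions: the SearchResult and Inventory classes are reconstructed from the hierarchy in the question (using the Inventories property name), and InventoryFilter.WithAllIds is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes, reconstructed from the hierarchy in the question.
public class Inventory
{
    public string Id { get; set; } = "";
}

public class SearchResult
{
    public Inventory[] Inventories { get; set; } = new Inventory[0];
}

public static class InventoryFilter
{
    // The extension from the answer above, repeated so the sketch is self-contained.
    public static bool ContainsAll<T>(this IEnumerable<T> superset,
        IEnumerable<T> subset, IEqualityComparer<T> comparer)
    {
        var set = new HashSet<T>(superset, comparer);
        return set.IsSupersetOf(subset);
    }

    // Keeps the results whose inventories contain every requested id at least once.
    public static List<SearchResult> WithAllIds(
        IEnumerable<SearchResult> instock, IEnumerable<string> ids)
    {
        var wanted = ids.ToList(); // materialize once; it is checked for every result
        return instock
            .Where(r => r.Inventories != null
                && r.Inventories.Select(i => i.Id).ContainsAll(wanted, StringComparer.Ordinal))
            .ToList();
    }
}
```

This replaces the per-result dictionary of flags with a single set comparison per result.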

This bit of LINQ iterates over the entire stock and, where the inventory is not null, keeps the results whose inventory items all have an Id contained in your list. To require instead that every value in list appears in the inventory, swap the roles: list.All(id => stock.Inventory.Any(i => i.Id == id)).
var matches = instock.Where(stock => stock.Inventory != null && stock.Inventory.All(i => list.Contains(i.Id)));

Average extension method in Linq for default value

Anyone know how I can set a default value for an average? I have a line like this...
dbPlugins = (from p in dbPlugins
select new { Plugin = p, AvgScore = p.DbVersions.Average(x => x.DbRatings.Average(y => y.Score)) })
.OrderByDescending(x => x.AvgScore)
.Select(x => x.Plugin).ToList();
which throws an error because I have no ratings yet. If I have none, I want the average to default to 0. I was thinking this should be an extension method where I could specify what the default value should be.

There is: DefaultIfEmpty.
I'm not sure what your DbVersions and DbRatings are and which collection exactly has zero items, but this is the idea:
var emptyCollection = new List<int>();
var average = emptyCollection.DefaultIfEmpty(0).Average();
Update: (repeating what's said in the comments below to increase visibility)
If you find yourself needing to use DefaultIfEmpty on a collection of class type, remember that you can change the LINQ query to project before aggregating. For example:
class Item
{
public int Value { get; set; }
}
var list = new List<Item>();
var avg = list.Average(item => item.Value); // throws InvalidOperationException: the list is empty
If you don't want to/can not construct a default Item with Value equal to 0, you can project to a collection of ints first and then supply a default:
var avg = list.Select(item => item.Value).DefaultIfEmpty(0).Average();
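For the nested averages in the question, the same project-then-default idea can be applied at both levels. The sketch below runs in memory only; the DbPlugin, DbVersion, and DbRating classes are assumed shapes (with an int Score), and how a database LINQ provider would translate such a query is not covered here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shapes; the question does not show these classes.
public class DbRating
{
    public int Score { get; set; }
}

public class DbVersion
{
    public List<DbRating> DbRatings { get; set; } = new List<DbRating>();
}

public class DbPlugin
{
    public List<DbVersion> DbVersions { get; set; } = new List<DbVersion>();
}

public static class PluginScores
{
    // Average of the per-version averages; empty collections count as 0.
    public static double AverageScore(DbPlugin plugin)
    {
        return plugin.DbVersions
            .Select(v => v.DbRatings
                .Select(r => (double)r.Score)
                .DefaultIfEmpty(0.0)   // version without ratings
                .Average())
            .DefaultIfEmpty(0.0)       // plugin without versions
            .Average();
    }
}
```

One design consequence to be aware of: a version without ratings contributes 0 to the plugin's average rather than being skipped. If unrated versions should be ignored, filter them out with Where before averaging.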

My advice would be to create a reusable solution instead of a solution for this problem only.
Make an extension method AverageOrDefault, similar to FirstOrDefault. See extension methods demystified
public static class MyEnumerableExtensions
{
public static double AverageOrDefault(this IEnumerable<int> source)
{
// TODO: decide what to do if source equals null: exception or return default?
if (source.Any())
return source.Average();
else
return default(int);
}
}
Enumerable.Average has overloads for int, long, float, double, and decimal, plus their nullable counterparts, so you'll need to create an AverageOrDefault for each element type you use. They all look similar.
Usage:
// Get the average order total or default per customer
var averageOrderTotalPerCustomer = myDbContext.Customers
.GroupJoin(myDbContext.Orders,
customer => customer.Id,
order => order.CustomerId,
(customer, ordersOfThisCustomer) => new
{
Id = customer.Id,
Name = customer.Name,
AverageOrder = ordersOfThisCustomer.AverageOrDefault(),
});

I don't think there's a way to select a default, but how about this query:
dbPlugins = (from p in dbPlugins
select new {
Plugin = p, AvgScore =
p.DbVersions.Any(x => x.DbRatings.Any()) ?
p.DbVersions.Average(x => x.DbRatings.Average(y => y.Score)) : 0 })
.OrderByDescending(x => x.AvgScore)
.Select(x => x.Plugin).ToList();
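Essentially the same as yours, but we first check whether there are any ratings before averaging them. If there are none, we return 0. In LINQ to Objects, a version without ratings would still make the inner Average throw if another version has some.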
