Intersect LINQ query

Suppose I have an IEnumerable<ClassA>, where ClassA exposes an ID property of type long.
Is it possible to use a LINQ query to get all instances of ClassA whose ID belongs to a second IEnumerable?
In other words, can this be done?
IEnumerable<ClassA> result = original.Intersect(idsToFind....)?
where original is an IEnumerable<ClassA> and idsToFind is an IEnumerable<long>.

Yes.
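As other people have answered, you can use Where, but it will be extremely inefficient for large sets: if idsToFind is a plain list or a lazy sequence, Contains scans it once for every element of original.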
If performance is a concern, you can call Join:
var results = original.Join(idsToFind, o => o.Id, id => id, (o, id) => o);
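If idsToFind can contain duplicates, you'll need to either call Distinct() on the IDs or on the results, or replace Join with GroupJoin (the parameters to GroupJoin would be the same). Note that GroupJoin also yields the elements of original that have no matching ID (paired with an empty group), so you would still need to filter those out.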
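A minimal sketch of the Join approach, assuming a hypothetical ClassA with a long Id property and a Name property (Name is only there to make the result easy to check). It applies Distinct() to the ids first, one of the two options mentioned above, so a repeated id does not produce a repeated result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ClassA.
public class ClassA
{
    public long Id { get; set; }
    public string Name { get; set; } = "";
}

public static class JoinFilter
{
    // Keeps the elements of `original` whose Id appears in `idsToFind`.
    // Join builds a lookup over the (distinct) ids once, instead of scanning
    // idsToFind for every element the way Where + Contains does.
    // Results come out in the order of `original`.
    public static List<ClassA> FilterByIds(IEnumerable<ClassA> original, IEnumerable<long> idsToFind) =>
        original
            .Join(idsToFind.Distinct(), o => o.Id, id => id, (o, id) => o)
            .ToList();
}
```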

I will post an answer using Intersect.
This is useful if you want to intersect two IEnumerables of the same type.
First we need an equality comparer:
using System;
using System.Collections.Generic;
public class KeyEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> keyExtractor;
    public KeyEqualityComparer(Func<T, object> keyExtractor)
    {
        this.keyExtractor = keyExtractor;
    }
    // Two items are equal when their extracted keys are equal;
    // the static object.Equals also copes with null keys.
    public bool Equals(T x, T y)
    {
        return object.Equals(this.keyExtractor(x), this.keyExtractor(y));
    }
    // Must agree with Equals: equal keys produce equal hash codes.
    public int GetHashCode(T obj)
    {
        return this.keyExtractor(obj)?.GetHashCode() ?? 0;
    }
}
Then pass the KeyEqualityComparer to Intersect:
var list3 = list1.Intersect(list2, new KeyEqualityComparer<ClassToCompare>(s => s.Id));
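If you can target .NET 6 or later (an assumption about your framework), Enumerable.IntersectBy takes a sequence of keys plus a key selector, which is exactly the shape asked for in the question, with no custom comparer. A small sketch with a hypothetical ClassA record:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ClassA.
public record ClassA(long Id, string Name);

public static class IntersectByDemo
{
    // Requires .NET 6+. Keeps the elements of `original` whose Id occurs in
    // `idsToFind`. Like Intersect, it has set semantics: only the first
    // element for each key is returned.
    public static List<ClassA> Find(IEnumerable<ClassA> original, IEnumerable<long> idsToFind) =>
        original.IntersectBy(idsToFind, a => a.Id).ToList();
}
```

Because of the set semantics, if original contains two items with the same ID, only the first one survives; use Join or Where if you need all of them.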

You can do it, but in its current form you'd want to use the Where extension method.
var results = original.Where(x => yourEnumerable.Contains(x.ID));
Intersect, on the other hand, finds elements that are in both IEnumerables. If you are looking for just a list of IDs, you can do the following, which takes advantage of Intersect:
var ids = original.Select(x => x.ID).Intersect(yourEnumerable);
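If you stay with Where + Contains, a common way to address the performance concern raised in the first answer (a swapped-in technique, not part of this answer) is to copy the ids into a HashSet<long> first. A sketch, assuming a hypothetical ClassA record with an ID property:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's ClassA.
public record ClassA(long ID, string Name);

public static class WhereFilter
{
    // Copying the ids into a HashSet<long> once turns each Contains call into
    // a hash lookup, so the filter is roughly O(n + m) instead of O(n * m).
    public static List<ClassA> Find(IEnumerable<ClassA> original, IEnumerable<long> idsToFind)
    {
        var idSet = new HashSet<long>(idsToFind);
        return original.Where(x => idSet.Contains(x.ID)).ToList();
    }
}
```

Unlike Intersect, this keeps duplicates from original and preserves its order.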

A simple way would be:
IEnumerable<ClassA> result = original.Where(a => idsToFind.Contains(a.ID));

Use the Where method to filter the results:
var result = original.Where(o => idsToFind.Contains(o.ID));

Naming things is important. Here is an extension method based on the Join operator (it has to live in a static class):
public static IEnumerable<TSource> IntersectBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    IEnumerable<TKey> keys,
    Func<TSource, TKey> keySelector)
    => source.Join(keys, keySelector, id => id, (o, id) => o);
You can use it like this: var result = items.IntersectBy(ids, item => item.id);

I've been tripping up all morning on Intersect, and on how it doesn't work anymore in EF Core 3, because it would have to be evaluated client side rather than server side.
From a list of items pulled from a database, the user can then choose to display them in a way that requires children to be attached to that original list to get more information.
What used to work was:
itemList = _context.Item
    .Intersect(itemList)
    .Include(i => i.Notes)
    .ToList();
What now seems to work is:
itemList = _context.Item
    .Where(item => itemList.Contains(item))
    .Include(i => i.Notes)
    .ToList();
This seems to be working as expected, without any significant performance difference, and is really no more complicated than the first.

Related

Is there a way to look for a list of ids in a list of parent object type? [duplicate]

I have a list with some identifiers like this:
List<long> docIds = new List<long>() { 6, 1, 4, 7, 2 };
Moreover, I have another list of T items, which are identified by the ids above.
List<T> docs = GetDocsFromDb(...);
I need to keep the same order in both collections, so that the items in List<T> are in the same positions as their ids in the first list (for search-engine scoring reasons). This process cannot be done in the GetDocsFromDb() function.
If necessary, it's possible to change the second list into some other structure (Dictionary<long, T>, for example), but I'd prefer not to.
Is there any simple and efficient way to do this "ordering by a list of IDs" with LINQ?

docs = docs.OrderBy(d => docIds.IndexOf(d.Id)).ToList();
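IndexOf scans docIds once for every document, which is fine for short lists but quadratic overall. One alternative (not from the answer) is to build an id-to-position dictionary once and sort by it. A sketch, with a hypothetical Doc record standing in for T:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's T.
public record Doc(long Id, string Title);

public static class DocOrdering
{
    // Precomputes each id's position once, so every document costs a
    // dictionary lookup instead of an O(m) IndexOf scan.
    // Docs whose Id is not in docIds are placed last (int.MaxValue).
    public static List<Doc> OrderByIds(IEnumerable<Doc> docs, IList<long> docIds)
    {
        var position = new Dictionary<long, int>();
        for (int i = 0; i < docIds.Count; i++)
        {
            // Keep the first position if an id appears more than once.
            position.TryAdd(docIds[i], i);
        }
        return docs
            .OrderBy(d => position.TryGetValue(d.Id, out int p) ? p : int.MaxValue)
            .ToList();
    }
}
```

One behavioral difference: IndexOf returns -1 for ids missing from docIds, so the one-liner above puts those documents first, while this sketch puts them last.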

Since you don't specify T,
public static IEnumerable<T> OrderBySequence<T, TId>(
    this IEnumerable<T> source,
    IEnumerable<TId> order,
    Func<T, TId> idSelector)
{
    // Index the source by id once; ToDictionary throws if two items share an id.
    var lookup = source.ToDictionary(idSelector, t => t);
    foreach (var id in order)
    {
        // Throws KeyNotFoundException if an id in order has no matching item.
        yield return lookup[id];
    }
}
is a generic extension for what you want.
You could use the extension like this, perhaps:
var orderDocs = docs.OrderBySequence(docIds, doc => doc.Id);
A safer version might be
public static IEnumerable<T> OrderBySequence<T, TId>(
    this IEnumerable<T> source,
    IEnumerable<TId> order,
    Func<T, TId> idSelector)
{
    // ToLookup tolerates duplicate ids and returns an empty group for missing ones.
    var lookup = source.ToLookup(idSelector, t => t);
    foreach (var id in order)
    {
        foreach (var t in lookup[id])
        {
            yield return t;
        }
    }
}
which still works if source and order do not match one-to-one: ids without items are skipped, and an id shared by several items yields all of them.

Jodrell's answer is best, but he has actually reimplemented System.Linq.Enumerable.Join. Join also uses a Lookup and preserves the order of the outer sequence (here docIds).
docIds.Join(
docs,
i => i,
d => d.Id,
(i, d) => d);
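A runnable sketch of the Join call above, with a hypothetical Doc record standing in for T:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's T.
public record Doc(long Id, string Title);

public static class JoinOrdering
{
    // Join yields results in the order of the outer sequence (docIds).
    // Ids with no matching doc are skipped; an id that matches several docs
    // yields all of them, in their original relative order.
    public static List<Doc> OrderByIds(IEnumerable<long> docIds, IEnumerable<Doc> docs) =>
        docIds.Join(docs, i => i, d => d.Id, (i, d) => d).ToList();
}
```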

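One simple approach is to zip with the ordering sequence. Be aware that Zip pairs the two sequences by position, not by Id, and OrderBy(x => x.Item2) then sorts by the id values themselves, so in general this does not reproduce the order of docIds: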
List<T> docs = GetDocsFromDb(...).Zip(docIds, Tuple.Create)
.OrderBy(x => x.Item2).Select(x => x.Item1).ToList();

How to avoid OrderByDescending in query linq

I have the linq query:
var ed = db.table
.GroupBy(x => x.Sn)
.Select(g => g.OrderByDescending(x => x.Date).FirstOrDefault());
I need to rewrite this query for server-side evaluation.
My table:
Sn   Value  Date
150  180.3  01/06/2020
150  195.0  01/05/2020
149  13.3   01/06/2020
345  27.5   27/06/2013
....

.Select(g => g.OrderByDescending(x => x.Date).FirstOrDefault())
is probably just:
.Select(g => g.Max(x => x.Date))
which the query provider is more likely to translate. Note that this returns only the latest Date of each group, not the whole row.
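Another shape that is often suggested for this kind of question (a swapped-in technique, not from the answers here) is to keep every row for which no later row with the same Sn exists. It avoids both OrderByDescending and GroupBy, and a provider such as EF Core may be able to translate the inner Any into a correlated subquery, but that is an assumption to verify against your provider. Reading is a hypothetical row type matching the question's table:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the question's table (Sn, Value, Date).
public record Reading(int Sn, double Value, DateTime Date);

public static class LatestPerSn
{
    // Keeps each row for which no other row with the same Sn has a later Date.
    // Written against IQueryable so a database provider could, in principle,
    // run it server side; here it is only exercised in memory.
    // If several rows tie on the latest Date, all of them are kept.
    public static List<Reading> Latest(IQueryable<Reading> table) =>
        table
            .Where(x => !table.Any(y => y.Sn == x.Sn && y.Date > x.Date))
            .ToList();
}
```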

You can try using Aggregate
var ed = db.table
.GroupBy(x => x.Sn)
.Select(x => x.Aggregate((max, cur) => max.Date > cur.Date ? max : cur));
For more background, see "How to use LINQ to select object with minimum or maximum property value". Note that Aggregate is evaluated in memory; EF Core does not translate it to SQL.
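The Aggregate approach can be checked in memory. A sketch with a hypothetical Reading record that mirrors the question's table; against a real database table you would typically need to bring the rows into memory first (for example with AsEnumerable()):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the question's table (Sn, Value, Date).
public record Reading(int Sn, double Value, DateTime Date);

public static class LatestByAggregate
{
    // For each Sn group, keep whichever row has the later Date.
    // Groups produced by GroupBy are never empty, so the seedless
    // Aggregate overload cannot throw here.
    public static List<Reading> Latest(IEnumerable<Reading> rows) =>
        rows
            .GroupBy(x => x.Sn)
            .Select(g => g.Aggregate((max, cur) => max.Date > cur.Date ? max : cur))
            .ToList();
}
```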

It depends on whether db.table is an IQueryable or not.
Normally an IQueryable is executed by a different process, quite often a database management system. If that is the case, you'll have to use OrderBy followed by FirstOrDefault.
Luckily, proper database management systems are highly optimized for sorting. If you are not satisfied with the efficiency of the sort, and you don't change your table too often, consider adding an extra index in DbContext.OnModelCreating:
modelBuilder.Entity<Customer>()
    .HasIndex(customer => customer.Name);
Your database management system knows this extra index, and can immediately return the element that the last item of the index refers to.
Whenever you change the name or add a new Customer, the index has to be updated. So don't do this if you change customer names very often, like 10 times a second.
If db.table is not an IQueryable but an IEnumerable, OrderBy is relatively slow. Alas, Enumerable.Max(selector) only returns the maximum value, not the element that holds it, but you can use one of the overloads of Enumerable.Aggregate.
As you are certain that every group contains at least one element, you can use the overload without a seed:
var result = db.table.GroupBy(x => x.Sn)
    .Select(group => group.Aggregate((maxItem, nextItem) => (nextItem.Date > maxItem.Date) ? nextItem : maxItem));
If you use this quite often, consider creating an extension method. Creating an extension method is quite easy; see extension methods demystified.
public static T MaxOrDefault<T, TProperty>(
    this IEnumerable<T> source,
    Func<T, TProperty> propertySelector)
{
    return MaxOrDefault(source, propertySelector, null);
}
Overload with a comparer; if comparer is null, the default comparer is used:
public static T MaxOrDefault<T, TProperty>(
    this IEnumerable<T> source,
    Func<T, TProperty> propertySelector,
    IComparer<TProperty> comparer)
{
    // TODO: what to do if source == null?
    // TODO: what to do if propertySelector == null?
    if (comparer == null) comparer = Comparer<TProperty>.Default;
    using (var enumerator = source.GetEnumerator())
    {
        if (!enumerator.MoveNext())
        {
            // empty source, return default:
            return default(T);
        }
        // Start with the first element as the current maximum.
        TProperty maxPropertyValue = propertySelector(enumerator.Current);
        T maxValue = enumerator.Current;
        while (enumerator.MoveNext())
        {
            TProperty currentPropertyValue = propertySelector(enumerator.Current);
            if (comparer.Compare(currentPropertyValue, maxPropertyValue) > 0)
            {
                maxPropertyValue = currentPropertyValue;
                maxValue = enumerator.Current;
            }
        }
        return maxValue;
    }
}
Usage (on an in-memory IEnumerable):
var ed = db.table.GroupBy(x => x.Sn)
    .Select(group => group.MaxOrDefault(groupElement => groupElement.Date));
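On .NET 6 and later (an assumption about your target framework), Enumerable.MaxBy already returns the element with the largest key rather than the key itself, so the hand-written extension method above is mainly useful on older frameworks. A sketch, again with a hypothetical Reading record:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the question's table (Sn, Value, Date).
public record Reading(int Sn, double Value, DateTime Date);

public static class LatestByMaxBy
{
    // Requires .NET 6+. MaxBy returns the first element with the largest Date.
    // GroupBy never produces empty groups, so the result is never null here.
    public static List<Reading> Latest(IEnumerable<Reading> rows) =>
        rows
            .GroupBy(x => x.Sn)
            .Select(g => g.MaxBy(x => x.Date)!)
            .ToList();
}
```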

Why doesn't IOrderedEnumerable retain order after where filtering

I've created a simplification of the issue. I have an ordered IEnumerable, and I'm wondering why applying a Where filter could unorder the objects.
This does not compile, although it seems like it should be able to:
IOrderedEnumerable<int> tmp = new List<int>().OrderBy(x => x);
//Error: Cannot implicitly convert IEnumerable<int> to IOrderedEnumerable<int>
tmp = tmp.Where(x => x > 1);
I understand that there would be no guaranteed order if the source were an IQueryable, such as LINQ to some DB provider.
However, when dealing with LINQ to Objects, what scenario could occur that would unorder your objects, or why wasn't this implemented?
EDIT
I understand how to properly order this; that is not the question. My question is more of a design question. A Where filter in LINQ to Objects should enumerate the given enumerable and apply filtering. So why is it that we can only return an IEnumerable instead of an IOrderedEnumerable?
EDIT
To clarify the scenario in which this would be useful: I'm building queries based on conditions in my code, and I want to reuse as much code as possible. I have a function that returns an IOrderedEnumerable, but after applying the additional Where I would have to reorder it, even though it would still be in its original order.

Rene's answer is correct, but could use some additional explanation.
IOrderedEnumerable<T> does not mean "this is a sequence that is ordered". It means "this is a sequence that has had an ordering operation applied to it and you may now follow that up with a ThenBy to impose additional ordering requirements."
The result of Where does not allow you to follow it up with ThenBy, and therefore you may not use it in a context where an IOrderedEnumerable<T> is required.
Make sense?
But of course, as others have said, you almost always want to do the filtering first and then the ordering. That way you are not spending time putting items into order that you are just going to throw away.
There are of course times when you do have to order and then filter; for example, the query "songs in the top ten that were sung by a woman" and the query "the top ten songs that were sung by a woman" are potentially very different! The first one is sort the songs -> take the top ten -> apply the filter. The second is apply the filter -> sort the songs -> take the top ten.
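The difference between the two song queries can be sketched in LINQ to Objects. Song and its Rank and SungByWoman properties are hypothetical, with Rank 1 being the most popular song:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical song data: Rank 1 is the most popular song.
public record Song(string Title, int Rank, bool SungByWoman);

public static class SongQueries
{
    // "Songs in the top ten that were sung by a woman":
    // sort, take the top ten, then filter (may return fewer than ten).
    public static List<Song> InTopTenSungByWoman(IEnumerable<Song> songs) =>
        songs.OrderBy(s => s.Rank).Take(10).Where(s => s.SungByWoman).ToList();

    // "The top ten songs that were sung by a woman":
    // filter, sort, then take the top ten.
    public static List<Song> TopTenSungByWoman(IEnumerable<Song> songs) =>
        songs.Where(s => s.SungByWoman).OrderBy(s => s.Rank).Take(10).ToList();
}
```

With 60 songs where every fourth one is sung by a woman, the first query returns 2 songs and the second returns 10.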

The signature of Where() is this:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
So this method takes an IEnumerable<int> as its first argument. The IOrderedEnumerable<int> returned from OrderBy implements IEnumerable<int>, so this is no problem.
But as you can see, Where returns an IEnumerable<int>, not an IOrderedEnumerable<int>, and the former cannot be converted to the latter.
Anyway, the objects in that sequence will still have the same order. So you could just do it like this:
IEnumerable<int> tmp = new List<int>().OrderBy(x => x).Where(x => x > 1);
and get the sequence you expected.
But of course you should (for performance reasons) filter your objects first and sort them afterwards when there are fewer objects to sort:
IOrderedEnumerable<int> tmp = new List<int>().Where(x => x > 1).OrderBy(x => x);

The tmp variable's type is IOrderedEnumerable.
Where() is a function just like any other with a return type, and that return type is IEnumerable. IEnumerable and IOrderedEnumerable are not the same.
So when you do this:
tmp = tmp.Where(x => x > 1);
You are trying to assign the result of a Where() call, which is an IEnumerable, to the tmp variable, which is an IOrderedEnumerable. They are not directly compatible: there is no implicit conversion, so the compiler reports an error.
The problem is that you are being too specific with the tmp variable's type. One simple change will make this all work: be a little less specific with your tmp variable:
IEnumerable<int> tmp = new List<int>().OrderBy(x => x);
tmp = tmp.Where(x => x > 1);
Because IOrderedEnumerable inherits from IEnumerable, this code will all work. As long as you don't want to call ThenBy() later on, this should give you exactly the same results as you expect without any other loss of ability to use the tmp variable later.
If you really need an IOrderedEnumerable, you can always just call .OrderBy(x => x) again:
IOrderedEnumerable<int> tmp = new List<int>().OrderBy(x => x);
tmp = tmp.Where(x => x > 1).OrderBy(x => x);
And again, in most cases (not all, but most) you want to get your filtering out of the way before you start sorting. In other words, this is even better:
var tmp = new List<int>().Where(x => x > 1).OrderBy(x => x);

why wasn't this implemented?
Most likely because the LINQ designers decided that the effort to implement, test, document, etc. wasn't worth it compared to the potential use cases. In fact, you are the first one I've heard complaining about that.
But if it's so important to you, you can add the missing functionality yourself (similar to Jon Skeet's MoreLINQ extension library). For instance, something like this:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
namespace MyLinq
{
    public static class Extensions
    {
        public static IOrderedEnumerable<T> Where<T>(this IOrderedEnumerable<T> source, Func<T, bool> predicate)
        {
            return new WhereOrderedEnumerable<T>(source, predicate);
        }
        // Filtering does not change the relative order of the remaining items,
        // so the wrapper can keep exposing IOrderedEnumerable<T>.
        class WhereOrderedEnumerable<T> : IOrderedEnumerable<T>
        {
            readonly IOrderedEnumerable<T> source;
            readonly Func<T, bool> predicate;
            public WhereOrderedEnumerable(IOrderedEnumerable<T> source, Func<T, bool> predicate)
            {
                if (source == null) throw new ArgumentNullException(nameof(source));
                if (predicate == null) throw new ArgumentNullException(nameof(predicate));
                this.source = source;
                this.predicate = predicate;
            }
            // ThenBy/ThenByDescending land here: add the secondary ordering to the
            // underlying ordered source, then re-apply the same filter on top.
            public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending) =>
                new WhereOrderedEnumerable<T>(source.CreateOrderedEnumerable(keySelector, comparer, descending), predicate);
            public IEnumerator<T> GetEnumerator() => Enumerable.Where(source, predicate).GetEnumerator();
            IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
        }
    }
}
And putting it into action:
using System;
using System.Collections.Generic;
using System.Linq;
using MyLinq;
var test = Enumerable.Range(0, 100)
.Select(n => new { Foo = 1 + (n / 20), Bar = 1 + n })
.OrderByDescending(e => e.Foo)
.Where(e => (e.Bar % 2) == 0)
.ThenByDescending(e => e.Bar) // Note this compiles:)
.ToList();

Making a list distinct in C#

In C#, I have an object type 'A' that contains a list of key-value pairs.
Each key-value pair is a category string and a value string.
To instantiate object type A, I would have to do the following:
List<KeyValuePair<string, string>> keyValuePairs = new List<KeyValuePair<string, string>>();
keyValuePairs.Add(new KeyValuePair<string, string>("Country", "U.S.A"));
keyValuePairs.Add(new KeyValuePair<string, string>("Name", "Mo"));
keyValuePairs.Add(new KeyValuePair<string, string>("Age", "33"));
A a = new A(keyValuePairs);
Eventually, I will have a List of A objects, and I want to manipulate the list so that I only get unique values, based only on the country name. Therefore, I want the list to be reduced to only one "Country", "U.S.A" entry, even if it appears more than once.
I was looking into LINQ's Distinct, but it does not do what I want, because I can't define any parameters and it doesn't seem to be able to recognize two equivalent objects of type A. I know that I can override the Equals method, but that still doesn't solve my problem, which is to make the list distinct based on ONE of the key-value pairs.

To expand upon Karl Anderson's suggestion of using MoreLINQ, if you're unable (or don't want) to link to another DLL in your project, I implemented this myself a while ago:
public static IEnumerable<T> DistinctBy<T, U>(this IEnumerable<T> source, Func<T, U> selector)
{
    // Remembers every key that has already been returned.
    var contained = new Dictionary<U, bool>();
    foreach (var elem in source)
    {
        U selected = selector(elem);
        bool has;
        if (!contained.TryGetValue(selected, out has))
        {
            // First time this key is seen: record it and yield the element.
            contained[selected] = true;
            yield return elem;
        }
    }
}
Used as follows:
collection.DistinctBy(elem => elem.Property);
In versions of .NET that support it (3.5 and later), you can use a HashSet<U> instead of a Dictionary<U, bool>, since we only care whether a key has already been seen, not what value is stored for it.
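A sketch of the HashSet variant. HashSet<U>.Add returns false for a key that is already present, so the set alone tracks what has been seen. The method gets a hypothetical name, DistinctByKey, because .NET 6 and later already ship Enumerable.DistinctBy with the same shape, and two identical extension methods in scope would make calls ambiguous:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctExtensions
{
    // Yields the first element for each distinct key, in source order.
    // Hypothetical name chosen to avoid clashing with Enumerable.DistinctBy (.NET 6+).
    public static IEnumerable<T> DistinctByKey<T, U>(this IEnumerable<T> source, Func<T, U> selector)
    {
        var seen = new HashSet<U>();
        foreach (var elem in source)
        {
            // Add returns true only the first time a key is inserted.
            if (seen.Add(selector(elem)))
            {
                yield return elem;
            }
        }
    }
}
```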

Check out the DistinctBy method in the MoreLINQ project.
var distinctPairs = keyValuePairs.DistinctBy(k => new { k.Key, k.Value }).ToList();

You need to select the distinct property first.
Because it's a list inside a list, you can use SelectMany, which concatenates the results of the sub-selections.
List<A> listOfA = new List<A>();
listOfA.SelectMany(a => a.KeyValuePairs
        .Where(keyValue => keyValue.Key == "Country")
        .Select(keyValue => keyValue.Value))
    .Distinct();
This should be it. It selects all values whose key is "Country" and concatenates the lists. Finally, Distinct removes the duplicate countries. This assumes that the property KeyValuePairs of class A is at least an IEnumerable<KeyValuePair<string, string>>.
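The SelectMany query above returns only the distinct country names. If you need to keep the A objects themselves, one option (a sketch, not from the answers here) is to group by country and take the first object in each group. The class shape and the Country helper property are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical version of the question's class A.
public class A
{
    public A(List<KeyValuePair<string, string>> keyValuePairs) => KeyValuePairs = keyValuePairs;
    public List<KeyValuePair<string, string>> KeyValuePairs { get; }

    // Value of the "Country" pair, or null if this object has none.
    public string Country =>
        KeyValuePairs.Where(kv => kv.Key == "Country").Select(kv => kv.Value).FirstOrDefault();
}

public static class CountryDistinct
{
    // Keeps the first A for each distinct country, in original order.
    // Objects without a country are grouped together under the null key.
    public static List<A> FirstPerCountry(IEnumerable<A> listOfA) =>
        listOfA.GroupBy(a => a.Country).Select(g => g.First()).ToList();
}
```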

var result = keyValuePairs.GroupBy(x => x.Key)
.SelectMany(g => g.Key == "Country" ? g.Distinct() : g);

You can use the GroupBy operator. From there you can do all kinds of cool stuff:
listOfA.GroupBy(i=>i.Value)
You can group by the value and then sum all the keys, or something else useful.

Pass Func<> to Select

I'm starting with this:
query
.Take(20)
.Select(item => new
{
id = item.SomeField,
value = item.AnotherField
})
.AsEnumerable()
.ToDictionary(item => item.id, item => item.value);
Now, I want to reuse everything except SomeField and AnotherField.
public static Dictionary<int, string> ReusableMethod<T>(
this IQueryable<T> query,
Func<T, int> key,
Func<T, string> value)
{
return query
.Take(20)
.Select(item => new
{
id = key(item),
value = value(item)
})
.AsEnumerable()
.ToDictionary(item => item.id, item => item.value);
}
query.ReusableMethod(item => item.SomeField, item => item.AnotherField);
This works, but the DB query selects more data than required, so I guess that means ReusableMethod is using linq-to-objects.
Is it possible to do this while only selecting the required data? I'll add that Func<> is still part magic for me, so I might be missing something obvious.
Clarification to avoid confusion: the Take(20) is fine, the Select() isn't.

Wrap your funcs with Expression and remove the AsEnumerable call.
public static Dictionary<int, string> ReusableMethod<T>(
this IQueryable<T> query,
Expression<Func<T, int>> key,
Expression<Func<T, string>> value)
An alternative would be to just return the whole row then. No need for Expression in this case.
return query
.Take(20)
.ToDictionary(key, value);
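Changing the parameter types to Expression is only half the job: inside Select you cannot invoke an Expression the way key(item) invokes a Func. One way to finish that first suggestion (a sketch, not tested against a real database provider) is to merge both lambdas into a single projection expression. ParameterReplacer is a hypothetical helper, and KeyValuePair<int, string> is used as the projected type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryExtensions
{
    public static Dictionary<int, string> ReusableMethod<T>(
        this IQueryable<T> query,
        Expression<Func<T, int>> key,
        Expression<Func<T, string>> value)
    {
        // Reuse key's parameter and point value's body at it,
        // so both bodies refer to the same item in one lambda.
        ParameterExpression item = key.Parameters[0];
        Expression valueBody = new ParameterReplacer(value.Parameters[0], item).Visit(value.Body);
        // Build: item => new KeyValuePair<int, string>(<key body>, <value body>)
        var ctor = typeof(KeyValuePair<int, string>).GetConstructor(new[] { typeof(int), typeof(string) });
        var selector = Expression.Lambda<Func<T, KeyValuePair<int, string>>>(
            Expression.New(ctor, key.Body, valueBody), item);
        return query
            .Take(20)
            .Select(selector)   // still an IQueryable projection, visible to the provider
            .AsEnumerable()
            .ToDictionary(p => p.Key, p => p.Value);
    }

    // Hypothetical helper: swaps one ParameterExpression for another.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression source;
        private readonly ParameterExpression target;
        public ParameterReplacer(ParameterExpression source, ParameterExpression target)
        {
            this.source = source;
            this.target = target;
        }
        protected override Expression VisitParameter(ParameterExpression node) =>
            node == source ? target : base.VisitParameter(node);
    }
}
```

Because the provider sees key.Body and value.Body directly instead of opaque delegates, a provider that supports constructor projections should only need to fetch the two selected columns.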

Recently I had the same problem, and here is what I did:
You have some DbEntity (generated by LINQ to SQL or Entity Framework), but you want to query only some fields (I did this to save network bandwidth). You have to create a class derived from DbEntity, because you can't create anonymous types when building expression trees by hand, and you cannot create a new instance of DbEntity in a Select statement. (No need to add any fields, properties, etc.)
public class LocalEntity : DbEntity {}
You need to define a method to generate your select expression tree. It should look like the following, and it will generate an expression tree similar to .Select(db => new LocalEntity() { Property1 = db.Property1, Property2 = db.Property2 })
protected Expression<Func<DbEntity, LocalEntity>> getSelectExpression()
{
    ParameterExpression paramExpr = Expression.Parameter(typeof(DbEntity), "dbRecord");
    var selectLambda = Expression.Lambda<Func<DbEntity, LocalEntity>>(
        Expression.MemberInit(
            Expression.New(typeof(LocalEntity)),
            // one binding per field to load: LocalEntity.X = dbRecord.X
            Expression.Bind(
                typeof(LocalEntity).GetProperty("DbEntityFieldName"),
                Expression.Property(paramExpr, "DbEntityFieldName"))
            // ....
        ),
        paramExpr);
    return selectLambda;
}
Use it like this:
query.Select(getSelectExpression()).ToDictionary();
Consider this more as pseudo-code than C# code, as I had to simplify it a lot and I can't test it, but if you make it work, it will transfer from the DB only the fields you define in getSelectExpression, not the whole row.
