I have two collections, IEnumerable<A> as and IEnumerable<B> bs. I also have a predicate, Func<A, B, bool> predicate.
I would like to join as and bs together to get something equivalent to an IEnumerable<IGrouping<A, B>> joined, such that for each group in joined and for each element b in group, predicate(group.Key, b) holds.
To get such a grouping there is usually the GroupBy extension method, but that can't operate on a custom predicate.
I considered two approaches, one just building a collection with nested loops, the other doing the same with Aggregate. Both look really ugly. Is there a better way to do this?
In this particular case, for each element b in bs there is exactly one A in as for which the predicate holds, and I don't mind relying on that property if that makes for a nicer solution.
As far as I can see, in the general case it can't make for a better asymptotic runtime complexity than O(n * m) where n is the length of as and m is the length of bs. I'm OK with that.
Considering that you have
IEnumerable<A> aEnumerable;
IEnumerable<B> bEnumerable;
and the following restriction:
In this particular case, for each element b in bs there is exactly one A in as for which the predicate holds
You may do the following:
IEnumerable<IGrouping<A, B>> grouping = bEnumerable
    .GroupBy(b => aEnumerable.Single(a => predicate(a, b)));
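Note that Single scans aEnumerable once per element of bs, so this stays at the O(n * m) the question already accepts, and it throws if an element of bs matches zero or several as. For illustration, a sketch with hypothetical Customer/Order types standing in for A and B (assuming the usual System and System.Linq usings):
// Hypothetical types standing in for A and B:
class Customer { public int Id; public string Name; }
class Order { public int CustomerId; public decimal Total; }

var customers = new[] { new Customer { Id = 1, Name = "Ann" }, new Customer { Id = 2, Name = "Bob" } };
var orders = new[] { new Order { CustomerId = 1, Total = 10m }, new Order { CustomerId = 2, Total = 20m }, new Order { CustomerId = 1, Total = 30m } };

Func<Customer, Order, bool> predicate = (c, o) => c.Id == o.CustomerId;

// Each order matches exactly one customer, so Single never throws here.
IEnumerable<IGrouping<Customer, Order>> joined =
    orders.GroupBy(o => customers.Single(c => predicate(c, o)));

foreach (var group in joined)
    Console.WriteLine($"{group.Key.Name}: {group.Count()} order(s)");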
Another option which comes to mind and looks more convenient is a simple dictionary:
IEnumerable<A> aEnumerable;
IEnumerable<B> bEnumerable;
Dictionary<A, B[]> dict = aEnumerable
    .ToDictionary(a => a,
                  a => bEnumerable.Where(b => predicate(a, b)).ToArray());
For every key a in this dictionary, the value array holds exactly those items for which the predicate holds.
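One difference in shape worth noting: the dictionary keeps every a as a key, even those with no matching b (the value is an empty array), whereas the GroupBy approach only yields groups that at least one b mapped to. Lookup is then by key:
// someA is a hypothetical element of aEnumerable
B[] matches = dict[someA];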
I have an
IGrouping<string, MyObj>
I want to transform it into another IGrouping. For argument's sake the key is the same, but MyObj will transform into MyOtherObj, i.e.
IGrouping<string, MyOtherObj>
I am using Linq2Sql, but I can cope with this last bit not being translatable into SQL.
I want it to still be an IGrouping<T,TT> because it is a recognised type and I want the signature and result to be apparent. I also want to be able to do this so I can break my LINQ down a bit and put it into better-labelled methods, i.e.
GetGroupingWhereTheSearchTextAppearsMoreThanOnce()
RetrieveRelatedResultsAndMap()
Bundle up and return encased in an IEnumerable - no doubt as to what is going on.
I have come close by daisy-chaining
IQueryable<IGrouping<string, MyObj>> grouping ....
IQueryable<IGrouping<string, IEnumerable<MyOtherObj>>> testgrouping =
    grouping.GroupBy(gb => gb.Key, contacts => contacts.Select(s => mapper.Map<MyObj, MyOtherObj>(s)));
but I end up with
IGrouping<string, IEnumerable<MyOtherObj>>
I know it is because of how I am accessing the enumerable that the IGrouping represents but I can't figure out how to do it.
You could just flatten the groupings with SelectMany(x => x) then do the GroupBy again, but then you're obviously doing the work twice.
You should be able to do the projection as part of the first GroupBy call instead.
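For example, using the element-selector overload of GroupBy. The key selector below is hypothetical, since the original query isn't shown; also, LINQ to SQL may refuse to translate the mapper call, in which case the grouping needs to run in memory (hence the AsEnumerable):
IEnumerable<IGrouping<string, MyOtherObj>> testgrouping = source
    .AsEnumerable()
    .GroupBy(c => c.SearchText,                      // hypothetical key selector
             c => mapper.Map<MyObj, MyOtherObj>(c)); // project each element while grouping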
Alternatively, you can add your own implementation of IGrouping, as described in What is the implementing class for IGrouping?, then simply do:
groups.Select(g => new MyGrouping(g.Key, g.Select(myObj => Mapper.Map<MyObj,MyOtherObj>(myObj))))
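A minimal sketch of such a class (the name and shape are assumptions, following the linked question):
public class MyGrouping<TKey, TElement> : IGrouping<TKey, TElement>
{
    private readonly IEnumerable<TElement> _elements;

    public MyGrouping(TKey key, IEnumerable<TElement> elements)
    {
        Key = key;
        _elements = elements;
    }

    public TKey Key { get; }

    public IEnumerator<TElement> GetEnumerator() => _elements.GetEnumerator();
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();
}
With this generic version, the Select above becomes new MyGrouping<string, MyOtherObj>(g.Key, ...).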
I find myself needing to do things like this frequently, and I'm just curious if there is a better way.
Suppose I have a class that holds a snapshot of some data:
private List<Person> _people;
In some method I populate that list from a LINQ query using Entity Framework, and perhaps I need to, for example, run a custom IEqualityComparer on it. Since this isn't supported in LINQ to Entities, I end up with something like this:
_people = db.People.Where(...)
.ToList()
.Distinct(new MyCustomComparer())
.ToList();
Another example might be using an extension method, which is also not supported in LINQ to Entities:
_people = db.People.Where(...)
.ToList()
.Select(_ => new { Age = _.DOB.MyExtensionMethod() })
.ToList();
In order to use either of these I have to materialize the database entities into regular in-memory objects with the first ToList(), and then, since I ultimately want a list anyway, I have a final ToList() at the end. This seems inefficient to me, and I'm wondering if there's a better pattern for these types of situations?
You can use AsEnumerable():
_people = db.People.Where(...)
.AsEnumerable()
.Distinct(new MyCustomComparer())
.ToList();
Which is equivalent to:
IEnumerable<Person> _people = db.People.Where(...);
_people = _people.Distinct(new MyCustomComparer()).ToList();
This is not much of an improvement, but at least it doesn't create another List<T>, and it better expresses that you want to switch to the realm of IEnumerable<T> (in-memory).
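Under the hood, AsEnumerable does nothing at runtime; it is conceptually just an identity method whose only job is to change the compile-time type, so that subsequent operators bind to Enumerable (in-memory) instead of Queryable (translated):
public static IEnumerable<TSource> AsEnumerable<TSource>(this IEnumerable<TSource> source)
{
    return source;
}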
See MSDN
I'm not an expert in C# and LINQ.
I have a Dictionary, which I understand is a hash table, that is, keys are not sorted.
dataBase = new Dictionary<string, Record>();
Record is a user-defined class that holds a number of data for a given key string.
I found an interesting example that converts this Dictionary into a sorted dictionary using LINQ:
var sortedDict = (from entry in dataBase orderby entry.Key ascending select entry)
.ToDictionary(pair => pair.Key, pair => pair.Value);
This code works correctly. The resulting sortedDict is sorted by keys.
Question: I found that sortedDict is still a hash table, of type:
System.Collections.Generic.Dictionary<string, Record>
I expected the resulting dictionary should be a sort of map as in C++ STL, which is generally implemented as a (balanced) binary tree to maintain the ordering of the keys. However, the resulting dictionary is still a hash table.
How can sortedDict maintain the ordering? A hash table can't preserve the ordering of the keys. Is C#'s Generic.Dictionary implemented as something other than a typical hash table?
Dictionary maintains two data structures: a flat array that's kept in insertion order for enumeration, and the hash table for retrieval by key.
If you use ToDictionary() on a sorted set, it will be in order when enumerated, but it won't be maintained in order. Any newly inserted items will be added to the back when enumerating.
Edit: If you want to rely on this behaviour, I would recommend looking at the MSDN docs to see if this is guaranteed, or just incidental.
SortedDictionary accepts an existing dictionary in its constructor, so making a SortedDictionary is very easy.
You can also wrap this in an extension method, so that you can write dataBase.ToSortedDictionary():
public static SortedDictionary<K, V> ToSortedDictionary<K, V>(this Dictionary<K, V> existing)
{
    return new SortedDictionary<K, V>(existing);
}
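Usage, with the dataBase dictionary from the question:
SortedDictionary<string, Record> sorted = dataBase.ToSortedDictionary();
// sorted stays ordered by key even as entries are added or removed later.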
The LINQ code looks like it builds a sorted dictionary, but the sorting is done by the LINQ query, not by the dictionary itself, whereas a SortedDictionary maintains the sorting by itself.
To get a sorted dictionary, use new SortedDictionary<string, Record>(yourNormalDictionary);
If you want to make it more accessible, you can write an extension method on IEnumerable:
public static class Extensions
{
    public static SortedDictionary<T1, T2> ToSortedDictionary<T1, T2>(this IEnumerable<T2> source, Func<T2, T1> keySelector)
    {
        return new SortedDictionary<T1, T2>(source.ToDictionary(keySelector));
    }
}
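For example, assuming Record exposes a string Name property to key on:
// records is a hypothetical IEnumerable<Record>
SortedDictionary<string, Record> sorted = records.ToSortedDictionary(r => r.Name);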
I read this question's answers, which explain that the order of LINQ to Objects methods makes a difference. My question is: why?
If I write a LINQ to SQL query, the order of the LINQ methods (projections, for example) doesn't matter:
session.Query<Person>().OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();
The expression tree will be translated into a sensible SQL statement like this:
SELECT *
FROM Persons
WHERE Name = 'gdoron'
ORDER BY Id;
When I run the query, the SQL is built from the expression tree, no matter how odd the order of the methods is.
Why doesn't it work the same with LINQ to Objects?
When I enumerate an IQueryable, all the operators can be arranged in a rational order (e.g. OrderBy after Where), just like a database optimizer does. Why doesn't LINQ to Objects work this way?
LINQ to Objects doesn't use expression trees. The statement is directly turned into a series of method calls, each of which runs as a normal C# method.
As such, the following in LINQ to Objects:
var results = collection.OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();
Gets turned into direct method calls:
var results = Enumerable.ToList(
    Enumerable.Where(
        Enumerable.OrderBy(collection, x => x.Id),
        x => x.Name == "gdoron"
    )
);
By looking at the method calls, you can see why ordering matters. In this case, by placing OrderBy first, you're effectively nesting it into the innermost method call. This means the entire collection will get ordered when the results are enumerated. If you were to switch the order:
var results = collection
.Where(x => x.Name == "gdoron")
.OrderBy(x => x.Id)
.ToList();
Then the resulting method chain switches to:
var results = Enumerable.ToList(
    Enumerable.OrderBy(
        Enumerable.Where(collection, x => x.Name == "gdoron"),
        x => x.Id
    )
);
This, in turn, means that only the filtered results will need to be sorted as OrderBy executes.
LINQ to Objects' deferred execution works differently from LINQ to SQL's (and EF's).
With LINQ to Objects, the method chain is executed in the order the methods are listed; it doesn't use expression trees to store and translate the whole query.
Calling OrderBy then Where with LINQ to Objects will, when you enumerate the results, sort the collection, then filter it. Conversely, filtering with Where before sorting with OrderBy will, when you enumerate, first filter, then sort. The latter can make a massive difference, since you'd potentially be sorting far fewer items.
Because, with LINQ to SQL, the SQL grammar for SELECT mandates that the different clauses occur in a particular sequence. The compiler must generate grammatically correct SQL.
Applying LINQ to Objects to an IEnumerable involves iterating over the IEnumerable and applying a sequence of actions to each object in it. Order matters: some actions may transform the objects (or the stream of objects itself), others may throw objects away (or inject new objects into the stream).
The compiler can't divine your intent. It builds code that does what you said to do in the order in which you said to do it.
It's perfectly legal to use side-effecting operations. Compare:
"crabapple"
.OrderBy(c => { Console.Write(c); return c; })
.Where(c => { Console.Write(c); return c > 'c'; })
.Count();
"crabapple"
.Where(c => { Console.Write(c); return c > 'c'; })
.OrderBy(c => { Console.Write(c); return c; })
.Count();
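To see the cost difference concretely, a small variation (with a hypothetical counter) that counts how often the OrderBy key selector runs:
int keySelectorCalls = 0;
"crabapple"
    .Where(c => c > 'c')
    .OrderBy(c => { keySelectorCalls++; return c; })
    .Count();
Console.WriteLine(keySelectorCalls); // 5 here; 9 if OrderBy came before Where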
LINQ to Objects does not reorder: that would be a run-time step doing something that should be optimized at coding time. The ReSharpers of the world may at some point introduce code-analysis tools to smoke out optimization opportunities like this, but it is definitely not a job for the runtime.
Imagine you have a large dataset that may or may not be filtered by a particular condition of the dataset elements that can be intensive to calculate. In the case where it is not filtered, the elements are grouped by the value of that condition - the condition is calculated once.
However, in the case where the filtering has taken place, although the subsequent code still expects to see an IEnumerable<IGrouping<TKey, TElement>> collection, it doesn't make sense to perform a GroupBy operation that would result in the condition being re-evaluated a second time for each element. Instead, I would like to be able to create an IEnumerable<IGrouping<TKey, TElement>> by wrapping the filtered results appropriately, and thus avoiding yet another evaluation of the condition.
Other than implementing my own class that provides the IGrouping interface, is there any other way I can implement this optimization? Are there existing LINQ methods to support this that would give me the IEnumerable<IGrouping<TKey, TElement>> result? Is there another way that I haven't considered?
the condition is calculated once
I hope those keys are still around somewhere...
If your data was in some structure like this:
public class CustomGroup<T, U>
{
    public T Key { get; set; }
    public IEnumerable<U> GroupMembers { get; set; }
}
You could project such items with a query like this:
var result = customGroups
    .SelectMany(cg => cg.GroupMembers, (cg, z) => new { Key = cg.Key, Value = z })
    .GroupBy(x => x.Key, x => x.Value);
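Because the key travels alongside each member through the SelectMany, the final GroupBy only compares the stored Key values; the expensive condition itself is never re-evaluated. Wrapped up as a hypothetical extension method:
public static IEnumerable<IGrouping<T, U>> AsGroupings<T, U>(this IEnumerable<CustomGroup<T, U>> groups)
{
    return groups
        .SelectMany(cg => cg.GroupMembers, (cg, m) => new { cg.Key, Member = m })
        .GroupBy(x => x.Key, x => x.Member);
}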
Inspired by David B's answer, I have come up with a simple solution. So simple that I have no idea how I missed it.
In order to perform the filtering, I obviously need to know what value of the condition I am filtering by. Therefore, given a condition, c, I can just project the filtered list as:
filteredList.GroupBy(x => c)
This avoids any recalculation of properties on the elements (represented by x).
Another solution I realized would work is to reverse the ordering of my query and perform the grouping before the filtering. This too would mean the condition only gets evaluated once, although it would unnecessarily allocate groupings that I wouldn't subsequently use.
What about putting the result into a Lookup and using that for the rest of the time?
var lookup = data.ToLookup(i => Foo(i));
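ILookup<TKey, TElement> implements IEnumerable<IGrouping<TKey, TElement>>, so the lookup can be handed straight to code that expects groupings, and since ToLookup materializes immediately, Foo runs exactly once per element. Sketched with assumed types (Foo returning a bool condition value for an Item):
ILookup<bool, Item> lookup = data.ToLookup(i => Foo(i)); // Foo evaluated once per element here
IEnumerable<IGrouping<bool, Item>> groups = lookup;      // no re-evaluation when consumed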