Understanding DefaultIfEmpty in LINQ - c#

I don't understand how the DefaultIfEmpty method works. It is usually used to emulate a left outer join in LINQ.
The DefaultIfEmpty() method must be called on a collection.
The DefaultIfEmpty() method cannot be called on a null collection reference.
In the code example below, there are a few points I don't understand:
Does p, which comes before the into keyword, refer to products?
Is ps the group of product objects? I mean a sequence of sequences.
If DefaultIfEmpty() isn't used, doesn't the p in from p in ps.DefaultIfEmpty() reach the select? Why?
#region left-outer-join
string[] categories = {
"Beverages",
"Condiments",
"Vegetables",
"Dairy Products",
"Seafood"
};
List<Product> products = GetProductList();
var q = from c in categories
join p in products on c equals p.Category into ps
from p in ps.DefaultIfEmpty()
select (Category: c, ProductName: p == null ? "(No products)" : p.ProductName);
foreach (var v in q)
{
Console.WriteLine($"{v.ProductName}: {v.Category}");
}
#endregion
Code from 101 Examples of LINQ.

I don't usually answer my own questions, but I think some people might find this one somewhat intricate.
The first step is to figure out how the DefaultIfEmpty method works (by the way, not every LINQ provider supports its overload that takes a default value).
class foo
{
public string Test { get; set; }
}
// list1
var l1 = new List<foo>();
//l1.Add(null); --> try the code too by uncommenting
//list2
var l2 = l1.DefaultIfEmpty();
foreach (var x in l1)
Console.WriteLine((x == null ? "null" : "not null") + " entered l1");
foreach (var x in l2)
Console.WriteLine((x == null ? "null" : "not null") + " entered l2");
When run, it prints "null entered l2" and nothing for l1.
What if l1.Add(null); is uncommented? Try it; the result is not hard to guess.
l2 has a single item, which is null, because foo is a reference type and default(T) of a reference type is null. For a value type such as Int32 or Char the default is the zero value (0 or '\0'). String is also a reference type, so its default is null, not an empty or blank string.
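The defaults described above can be checked with a small sketch. The class and method names here are made up for illustration; only Enumerable.DefaultIfEmpty and its overload with an explicit default value come from the framework.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative helper, not part of the original question.
public static class DefaultIfEmptyDemo
{
    // Empty sequence of a value type: one element, default(int), which is 0.
    public static List<int> EmptyInts() =>
        new List<int>().DefaultIfEmpty().ToList();

    // Empty sequence of a reference type: one element, default(string), which is null.
    public static List<string> EmptyStrings() =>
        new List<string>().DefaultIfEmpty().ToList();

    // The overload with an argument uses that value instead of default(T).
    public static List<string> EmptyStringsWithFallback() =>
        new List<string>().DefaultIfEmpty("(none)").ToList();

    // A non-empty sequence passes through unchanged.
    public static List<int> NonEmptyInts() =>
        new List<int> { 7, 8 }.DefaultIfEmpty().ToList();
}
```

Calling EmptyInts() gives a list holding a single 0, EmptyStrings() a list holding a single null, and NonEmptyInts() the original two values.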
Now let's examine the LINQ statement being mentioned.
Just as a reminder, unless an aggregate operator or a To{collection}() method is applied to a LINQ expression, it is evaluated lazily (deferred execution).
The following image, although not specific to C#, helps to show what that means.
In light of lazy evaluation, we now know that a LINQ query expression is evaluated only when its results are requested, that is, on demand.
So, ps contains the product items for which the equality in the on clause of the join holds, and it holds a different set of products for each category as the query is enumerated. When ps is empty and DefaultIfEmpty() is not used, the second from yields nothing for that category, so select is never reached and no Console.WriteLine($"{productName}: {category}"); output is produced for it. (Please correct me at this point if I'm wrong.)
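To see the difference, here is a minimal sketch that runs the same group join with and without DefaultIfEmpty(). SampleProduct and the two-element data set are hypothetical stand-ins for Product and GetProductList().

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Product class from the question.
public class SampleProduct
{
    public string ProductName { get; set; }
    public string Category { get; set; }
}

public static class LeftJoinDemo
{
    static readonly string[] Categories = { "Beverages", "Seafood" };

    static readonly List<SampleProduct> Products = new List<SampleProduct>
    {
        new SampleProduct { ProductName = "Chai", Category = "Beverages" }
    };

    // Without DefaultIfEmpty: "Seafood" has an empty group, so it produces no rows.
    public static List<string> WithoutDefault() =>
        (from c in Categories
         join p in Products on c equals p.Category into ps
         from p in ps
         select c + ": " + p.ProductName).ToList();

    // With DefaultIfEmpty: an empty group becomes a single null, so the category survives.
    public static List<string> WithDefault() =>
        (from c in Categories
         join p in Products on c equals p.Category into ps
         from p in ps.DefaultIfEmpty()
         select c + ": " + (p == null ? "(No products)" : p.ProductName)).ToList();
}
```

WithoutDefault() returns only "Beverages: Chai", which is inner-join behaviour; WithDefault() also returns "Seafood: (No products)".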

Answers
Does p refer to products after into keyword?
The p in the from clause is a new local variable referring to a single product of one category.
Is ps the group of product objects? I mean a sequence of sequences.
Yes, ps is the group of products for the category c. But it is not a sequence of sequences, just a simple IEnumerable<Product>, just like c is a single category, not all categories in the group join.
In the query you only see the data for one result row, never the whole group join result. Look at the final select: it prints one category and one product it joined with. That product comes from ps, the group of products that this one category joined with.
The query then does the walking over all categories and all their groups of products.
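If DefaultIfEmpty() isn't used, doesn't the p in from p in ps.DefaultIfEmpty() reach the select? Why?
The second from is not a plain Select. It joins each group join result with its own group of products, which the compiler turns into SelectMany. An empty group contributes no rows, so the select is never reached for that category.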
Structure
Taking the query by parts, first the group join:
from c in categories
join p in products on c equals p.Category into ps
After this only c and ps are usable, representing a category and its joined products.
Now note that the whole query is in the same form as:
from car in Cars
from passenger in car.Passengers
select (car, passenger)
Which joins Cars with its own Passengers using Cars.SelectMany(car => car.Passengers, (car, passenger) => (car, passenger));
So in your query
from group_join_result into ps
from p in ps.DefaultIfEmpty()
creates a new join of the previous group join result with its own data (the lists of grouped products), run through DefaultIfEmpty, using SelectMany.
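The equivalence between the two from clauses and SelectMany can be sketched as follows. The Car class and the sample data are assumptions made up for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for this illustration.
public class Car
{
    public string Name { get; set; }
    public List<string> Passengers { get; set; } = new List<string>();
}

public static class SelectManyDemo
{
    public static List<Car> SampleCars() => new List<Car>
    {
        new Car { Name = "A", Passengers = { "Ann", "Bob" } },
        new Car { Name = "B" } // no passengers: contributes no rows
    };

    // Two from clauses in query syntax.
    public static List<(string Car, string Passenger)> QuerySyntax(IEnumerable<Car> cars) =>
        (from car in cars
         from passenger in car.Passengers
         select (Car: car.Name, Passenger: passenger)).ToList();

    // The same query written directly with SelectMany.
    public static List<(string Car, string Passenger)> MethodSyntax(IEnumerable<Car> cars) =>
        cars.SelectMany(car => car.Passengers,
                        (car, passenger) => (Car: car.Name, Passenger: passenger)).ToList();
}
```

Both methods return ("A", "Ann") and ("A", "Bob"). Car "B" disappears because its passenger list is empty, which is exactly the case DefaultIfEmpty() covers in the left outer join.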
Conclusion
In the end the complexity is in the LINQ query and not in the DefaultIfEmpty method. The method is explained simply on the MSDN page I posted in a comment. It turns a collection with no elements into a collection that has 1 element, which is either the default(T) value or the supplied value.
Compiled source
This is approximately the C# code the query gets compiled to:
//Pairs of: (category, the products that joined with the category)
IEnumerable<(string category, IEnumerable<Product> groupedProducts)> groupJoinData = Enumerable.GroupJoin(
categories,
products,
(string c) => c,
(Product p) => p.Category,
(string c, IEnumerable<Product> ps) => (c, ps)
);
//Flattening of the pair collection, calling DefaultIfEmpty on each joined group of products
IEnumerable<(string Category, string ProductName)> q = groupJoinData.SelectMany(
catProdsPair => catProdsPair.groupedProducts.DefaultIfEmpty(),
(catProdsPair, p) => (catProdsPair.category, (p == null) ? "(No products)" : p.ProductName)
);
Done with the help of ILSpy using C# 8.0 view.

Related

Filter items from database based on a List<>

I have a method that accepts two List<int> for which I need to get data from the database based on the List<>s.
So, I receive a List<PersonId> and List<NationalityId> for example, and I need to get a result set where records match the PersonIds and NationalityIds.
public List<PersonDTO> SearchPeople(List<int> persons, List<int> nationalities)
{
var results = (from c in myDbContect.People where .... select c).ToList();
}
Note that I think Lists might be null.
Is there an efficient way?
I was going to try:
where (persons != null && persons.Count > 0 && persons.Contains(c.PersonId))
But this would generate rather inefficient SQL, and as I add more search parameters, the LINQ may get very messy.
Is there an efficient way to achieve this?
The join method may be easy to read, but the issue I face is that IF the input list is empty, then it shouldn't filter. That is, if nationalities is empty, don't filter any out:
var results = (from c in entities.Persons
join p in persons on c.PersonId equals p
join n in nationalities on c.NationalityId equals n
select c).ToList();
This would return no results if any of the lists were empty, which is bad.
If you join an IQueryable with an IEnumerable (in this case, entities.Persons and persons), your filtering will not happen within your query. Instead, your IQueryable is enumerated, retrieving all of your records from the database, while the join is performed in memory using the IEnumerable join method.
To perform your filtering against a list within your query, there are two main options:
Join using an IQueryable on both sides. This might be possible if your list of ids comes from the execution of another query, in which case you can use the underlying query in your join instead of the resulting set of ids.
Use the Contains operator against your list. This is only practical with small lists, because each additional id requires its own query parameter. If you have many ids, you can possibly extend this approach with batching.
If you want to skip filtering when the list is empty, then you might consider using the extension method invocation instead of the LINQ syntax. This allows you to use an if statement:
IQueryable<Person> persons = entities.Persons;
List<int> personIds = new List<int>();
if(personIds.Count > 0)
{
persons = persons.Where(p => personIds.Contains(p.PersonId));
}
var results = persons.ToList();
Note that the Where predicate uses option #2 above, and is only applied if there are any ids in the collection.
If you want to get all the records when a list is empty, and filter by a list only when it is not empty, you can do something like this:
List<int> personsIds = ...;
List<int> nationalitiesIds = ...;
// Treat null like an empty list and decide outside the query whether to filter.
personsIds = personsIds ?? new List<int>();
nationalitiesIds = nationalitiesIds ?? new List<int>();
bool allPersons = personsIds.Count == 0;
bool allNationalities = nationalitiesIds.Count == 0;
var results = (from c in entities.Persons
where (allPersons || personsIds.Contains(c.PersonId))
&& (allNationalities || nationalitiesIds.Contains(c.NationalityId))
select c).ToList();

Linq to SQL brief question

I have the query below. Can anyone point out what "from p" means, and also "var r"?
DataClasses1DataContext db = new DataClasses1DataContext();
var r = from p in db.Products
where p.UnitPrice > 15 // If unit price is greater than 15...
select p; // select entries
r is the composed query - an IQueryable<Product> or similar; note the query has not yet executed - it is just a pending query. var means "compiler, figure out the type of r from the expression on the right". You could have stated it explicitly in this case, but not all. But it wouldn't add any value, so var is fine.
p is a convenience marker for each product; the query is "for each product (p), restricting to those with unit price greater than 15 (where p.UnitPrice > 15), select that product (select p) as a result".
Ultimately this compiles as:
IQueryable<Product> r =
db.Products.Where(p => p.UnitPrice > 15);
(in this case, a final .Select(p => p) is omitted by the compiler, but with a non-trivial projection, or a trivial query, the .Select(...) is retained)
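The "pending query" behaviour can be observed with an in-memory list instead of a database. This is a minimal sketch with made-up prices; it uses LINQ to Objects, so the types are IEnumerable<decimal> rather than IQueryable<Product>.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static (List<decimal> FromQuery, List<decimal> FromMethod) Run()
    {
        var prices = new List<decimal> { 10m, 20m };

        // Neither line runs the filter yet; each only describes it.
        IEnumerable<decimal> query = from p in prices where p > 15m select p;
        IEnumerable<decimal> method = prices.Where(p => p > 15m);

        // Added after the queries were defined, before they are enumerated.
        prices.Add(30m);

        // ToList() enumerates the source now, so 30 is included in both results.
        return (query.ToList(), method.ToList());
    }
}
```

Both lists contain 20 and 30, which shows that the query syntax and the Where call describe the same filter and that it only runs when enumerated.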
The p means each specific item in the collection referenced (db.Products). See from on MSDN.
var is syntactic sugar - it resolves to the type returned from the LINQ query, assigning the type to the variable r. See var on MSDN.
For better understanding of LINQ, I suggest reading through 101 LINQ Samples.
from p refers to each record in db.Products, and var r holds the resulting collection of p.
Overall, the whole statement means: give me all the records (p) from db.Products where p.UnitPrice is greater than 15.
See this question to learn more about var.

Is there an efficient way in LINQ to use a contains match if and only if there is no exact match?

I have an application where I am taking a large number of 'product names' input by a user and retrieving some information about each product. The problem is, the user may input a partial name or even a wrong name, so I want to return the closest matches for further selection.
Essentially if product name A exactly matches a record, return that, otherwise return any contains matches. Otherwise return null.
I have done this with three separate statements, and I was wondering if there was a more efficient way to do this. I am using LINQ to EF, but I materialize the products to a list first for performance reasons.
productNames is a List of product names (input by the user).
products is a List of product 'records'
var directMatches = (from s in productNames
join p in products on s.ToLower() equals p.name.ToLower() into result
from r in result.DefaultIfEmpty()
select new {Key = s, Product = r});
var containsMatches = (from d in directMatches
from p in products
where d.Product == null
&& p.name.ToLower().Contains(d.Key.ToLower())
select new { d.Key, Product = p });
var matches = from d in directMatches
join c in containsMatches on d.Key equals c.Key into result
from r in result.DefaultIfEmpty()
select new {d.Key, Product = d.Product ?? (r != null ? r.Product: null) };
If you have a small to medium-sized list in-memory, take a look at LiquidMetal and for phonetic matches, the Soundex algorithm to rank the closest matches.
If you are using SQL Server, look into Full-Text Search, which is what Stack Overflow uses. Otherwise, here is how I implemented a keyword-based search.
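For the in-memory case, the "exact match first, otherwise contains match" rule from the question can also be written as one pass per input name. This is only a sketch: the method name is made up, it works on plain strings rather than product records, and it returns an empty list instead of null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchDemo
{
    // For each input: exact (case-insensitive) matches if any, otherwise substring matches.
    public static Dictionary<string, List<string>> Match(
        IEnumerable<string> inputs, IReadOnlyCollection<string> productNames)
    {
        // The lookup makes the exact-match check cheap for a large number of inputs.
        var byExactName = productNames.ToLookup(n => n, StringComparer.OrdinalIgnoreCase);

        return inputs.Distinct().ToDictionary(
            input => input,
            input =>
            {
                var exact = byExactName[input].ToList();
                if (exact.Count > 0) return exact;

                // Fall back to a contains match only when there is no exact match.
                return productNames
                    .Where(n => n.IndexOf(input, StringComparison.OrdinalIgnoreCase) >= 0)
                    .ToList();
            });
    }
}
```

With product names "Chai", "Chai Latte" and "Tofu", the input "chai" maps to just "Chai", "lat" maps to "Chai Latte", and "xyz" maps to an empty list.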

C# LINQ equivalent of SQL Count()

I have a fairly complicated join query that I use with my database. Upon running it I end up with results that contain a baseID and a bunch of other fields. I then want to take this baseID and determine how many times it occurs in a table like this:
TableToBeCounted (Many to Many)
{
baseID,
childID
}
How do I perform a linq query that still uses the query I already have and then JOINs the count() with the baseID?
Something like this in untested linq code:
from k in db.Kingdom
join p in db.Phylum on k.KingdomID equals p.KingdomID
where p.PhylumID == "Something"
join c in db.Class on p.PhylumID equals c.PhylumID
select new {c.ClassID, c.Name};
I then want to take that code and count how many orders are nested within each class. I then want to append a column using linq so that my final select looks like this:
select new {c.ClassID, c.Name, o.Count()}//Or something like that.
The entire example is based upon the Biological Classification system.
Update:
Assume for the example that I have multiple tables:
Kingdom
|--Phylum
|--Class
|--Order
Each Phylum has a Phylum ID and a Kingdom ID, meaning that all phyla are a subset of a kingdom. All Orders are subsets of a Class ID. I want to count how many Orders belong to each class.
I hope this is clear now.
Normally this is done with a group. For example:
from k in db.Kingdom
join p in db.Phylum on k.KingdomID equals p.KingdomID
where p.PhylumID == "Something"
join c in db.Class on p.PhylumID equals c.PhylumID
group c by new { c.ClassID, c.Name } into g
select new { Count = g.Count(), g.Key.ClassID, g.Key.Name };
That will basically count how many entries you have for each ClassID/Name pair. However, as Winston says in the comments, you're possibly interested in another table (Order) that you haven't told us about. We can't really give much more information until we know what you're doing here. Do you already have a relationship set up for this in LINQ to SQL? Please tell us about the Order table and how it relates to your other tables.
EDIT: Okay, with the modified question, I suspect we can ignore phylum and kingdom completely, unless I'm missing something. (I also can't see how this relates to a many-to-many mapping...)
I think this would work:
from o in db.Order
group o by o.ClassID into g
join c in db.Class on g.Key equals c.ClassID
select new { c.ClassID, c.Name, Count = g.Count() };
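As a variation, a group join counts the orders per class and also keeps classes that have no orders at all, which the group-then-join form drops. The sketch below runs against in-memory sample data; the tuples, names and ids are invented for the illustration and do not come from the question's schema.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    public static List<(int ClassID, string Name, int Count)> CountOrdersPerClass()
    {
        // Invented sample rows standing in for db.Class and db.Order.
        var classes = new[]
        {
            (ClassID: 1, Name: "Class A"),
            (ClassID: 2, Name: "Class B"),
            (ClassID: 3, Name: "Class C")
        };
        var orders = new[]
        {
            (OrderID: 10, ClassID: 1),
            (OrderID: 11, ClassID: 1),
            (OrderID: 12, ClassID: 2)
        };

        // "into os" gives each class its own group of orders, possibly empty.
        return (from c in classes
                join o in orders on c.ClassID equals o.ClassID into os
                select (c.ClassID, c.Name, Count: os.Count())).ToList();
    }
}
```

The result is (1, "Class A", 2), (2, "Class B", 1) and (3, "Class C", 0).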

nested linq queries, how to get distinct values?

The table data has 2 columns, "category" and "subcategory".
I want to get a collection of "category", [subcategories].
Using the code below I get duplicates. Putting .Distinct() after the outer "from" does not help much. What am I missing?
var rootcategories = (from p in sr.products
orderby p.category
select new
{
category = p.category,
subcategories = (
from p2 in sr.products
where p2.category == p.category
select p2.subcategory).Distinct()
}).Distinct();
sr.products looks like this
category subcategory
----------------------
cat1 subcat1
cat1 subcat2
cat2 subcat3
cat2 subcat3
what i get in results is
cat1, [subcat1,subcat2]
cat1, [subcat1,subcat2]
but i only want one entry
solved my problem with this code:
var rootcategories2 = (from p in sr.products
group p.subcategory by p.category into subcats
select subcats);
now maybe it is time to think of what was the right question.. (-:
thanks everyone
I think you need 2 "Distinct()" calls, one for the main categories and another for the subcategories.
This should work for you:
var mainCategories = (from p in products select p.category).Distinct();
var rootCategories =
from c in mainCategories
select new {
category = c,
subcategories = (from p in products
where p.category == c
select p.subcategory).Distinct()
};
The algorithm behind Distinct() needs a way to tell if 2 objects in the source IEnumerable are equal.
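For in-memory sequences the default is to call Equals and GetHashCode on the elements. For ordinary classes that means comparing 2 objects by their reference. Anonymous types compare property by property, but here every object carries its own subcategories sequence, which is itself compared by reference, so it's likely that no 2 objects are "equal" since you are creating them with the "new" keyword.
What you have to do is to write a custom class which implements IEqualityComparer<T> and pass that to the Distinct() call.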
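A custom equality comparer passed to Distinct() could look like the sketch below. CategoryEntry and CategoryEntryComparer are hypothetical names, and the rows are the sample table from the question held in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type replacing the anonymous type from the question.
public class CategoryEntry
{
    public string Category { get; set; }
    public List<string> Subcategories { get; set; }
}

// Two entries are equal when the category and the subcategory lists match.
public class CategoryEntryComparer : IEqualityComparer<CategoryEntry>
{
    public bool Equals(CategoryEntry x, CategoryEntry y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Category == y.Category
            && x.Subcategories.SequenceEqual(y.Subcategories);
    }

    // Equal entries share a category, so hashing the category alone is consistent.
    public int GetHashCode(CategoryEntry obj) =>
        obj.Category == null ? 0 : obj.Category.GetHashCode();
}

public static class DistinctDemo
{
    public static List<CategoryEntry> Run()
    {
        var rows = new[]
        {
            (Category: "cat1", Sub: "subcat1"),
            (Category: "cat1", Sub: "subcat2"),
            (Category: "cat2", Sub: "subcat3"),
            (Category: "cat2", Sub: "subcat3")
        };

        // One entry per row, as in the question, so duplicates appear first.
        var entries = rows.Select(r => new CategoryEntry
        {
            Category = r.Category,
            Subcategories = rows.Where(r2 => r2.Category == r.Category)
                                .Select(r2 => r2.Sub).Distinct().ToList()
        });

        return entries.Distinct(new CategoryEntryComparer()).ToList();
    }
}
```

Run() returns two entries, cat1 with [subcat1, subcat2] and cat2 with [subcat3]. Grouping by category reaches the same result with less code; the comparer is only needed if the per-row query shape has to stay.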
Your main query is on Products, so you're going to get records for each product. Switch it around so you're querying on Category, but filtering on Product.Category
