I have data something like this:
Id | Customer | CartTotal
-------------------------------
1 | a | 100
2 | a | 50
3 | b | 110
4 | b | 128
I need to order it by CartTotal (descending) and return distinct customers
so that I should have this in my result set:
Id | Customer | CartTotal
-------------------------------
4 | b | 128
1 | a | 100
I believe I need to do an order and a projection. I'm working with a strongly typed IList<> data source. I'm new to LINQ, so any help would be greatly appreciated.
Something like the following should do what you're after:
var filteredPurchases = purchases.OrderByDescending(p => p.CartTotal)
.GroupBy(p => p.Customer)
.Select(g => g.First());
It will return the purchase with the maximum CartTotal for each Customer, giving the desired result.
The answers so far, while correct, are significantly less efficient than needed because they 1) sort before grouping and 2) sort at all, when only the largest element per customer is needed. Sorting first makes the solution O(n * log(n)).
Taking care of number 1, we can do the following:
var query = purchases
.GroupBy(p => p.Customer)
.Select(g => g.OrderByDescending(p => p.CartTotal).First());
This gives a solution of roughly O(n + n * log(n/c)), where c is the number of customers. Assuming the number of orders per customer is roughly constant, that is O(n).
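Now, we can do better by finding the maximum element for each customer and selecting it in one pass. Unfortunately, the Max operator in LINQ makes this more painful than it should be, because it returns the maximum value rather than the element that has it. If you pull down MoreLinq (or target .NET 6 or later, where Enumerable.MaxBy is built in), you can do the following: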
var query = purchases
.GroupBy(p => p.Customer)
.Select(g => g.MaxBy(p => p.CartTotal));
This solution is always O(n), regardless of the distribution of purchases to customers. I would also expect it to be the fastest by far on large data sets.
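If neither MoreLinq nor a built-in MaxBy is available, the same one-pass idea can be sketched with Aggregate. This is only an illustration against the sample data from the question; the tuple shape and the final OrderByDescending (added so the output matches the requested order) are assumptions, not part of the answer above.

```csharp
using System;
using System.Linq;

// Sample data from the question, modelled as named tuples (an assumption;
// the real code uses a strongly typed IList<>).
var purchases = new[]
{
    (Id: 1, Customer: "a", CartTotal: 100m),
    (Id: 2, Customer: "a", CartTotal: 50m),
    (Id: 3, Customer: "b", CartTotal: 110m),
    (Id: 4, Customer: "b", CartTotal: 128m),
};

// For each customer, walk the group once and keep the purchase with the
// largest CartTotal. No per-group sort is involved.
var query = purchases
    .GroupBy(p => p.Customer)
    .Select(g => g.Aggregate((best, next) => next.CartTotal > best.CartTotal ? next : best))
    .OrderByDescending(p => p.CartTotal) // only sorts one row per customer
    .ToList();

foreach (var p in query)
    Console.WriteLine($"{p.Id} | {p.Customer} | {p.CartTotal}");
```

Aggregate without a seed throws on an empty sequence, but a group produced by GroupBy always has at least one element, so that case cannot occur here.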
Here's a query expression version:
var query = from cart in carts
orderby cart.CartTotal descending
group cart by cart.Customer into custCarts
select custCarts.First();
Books Table
Id VendorId ASIN Price
-- -------- ---- ------
1 gold123 123 10
2 sil123 123 11
3 gold456 456 15
4 gold678 678 12
5 sil456 456 12
6 gold980 980 12
I want to write a LINQ query that returns every gold row for which no corresponding sil vendor ID exists. The last three digits of the vendor ID correspond to the ASIN column in that row.
Example: for gold123 a corresponding sil123 exists, so that row will not be returned, but for gold678 and gold980 no corresponding sil row exists, so those rows will be returned.
I tried the following:
var gold = _repository.Query<Books>().Where(x =>
x.VendorId.Contains("gold"))
.OrderBy(x => x.Id).Skip(0).Take(500).ToList();
var asinsForGold = gold.Select(x => x.ASIN).ToList();
var correspondingSilver = _repository.Query<Books>().Where(x =>
x.VendorId.Contains("sil")
&& asinsForGold.Contains(x.ASIN)).ToList();
var correspondingSilverAsins = correspondingSilver.Select(x => x.ASIN).ToList();
var goldWithoutCorrespondingSilver = gold.Where(x =>
!correspondingSilverAsins.Contains(x.ASIN));
Can we use a self join, or some better approach, to get the result in a single query instead of two queries and several intermediate lists?
It's just another predicate, "where a corresponding silver vendor doesn't exist":
var goldWoSilver = _repository.Query<Books>()
    .Where(x => x.VendorId.Contains("gold"))
    // Keep only gold rows with no silver row for the same ASIN.
    .Where(x => !_repository.Query<Books>()
        .Any(s => s.ASIN == x.ASIN
               && s.VendorId.Contains("sil")))
    .OrderBy(x => x.Id).Skip(0).Take(500).ToList();
In many cases this is a successful recipe: start the query with the entity you want to return and only add predicates. In general, joins shouldn't be used for filtering, only to collect related data, although in that case navigation properties should be used which implicitly translate to SQL joins.
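To see the "not exists" predicate in isolation, here is a small in-memory sketch over the Books rows from the question. The tuple array stands in for _repository.Query<Books>() and is an assumption made for illustration; against a real provider the same shape is normally translated to a NOT EXISTS subquery, but that depends on the provider.

```csharp
using System;
using System.Linq;

// Rows from the Books table in the question, as named tuples (hypothetical
// stand-in for the repository query).
var books = new[]
{
    (Id: 1, VendorId: "gold123", ASIN: "123", Price: 10),
    (Id: 2, VendorId: "sil123",  ASIN: "123", Price: 11),
    (Id: 3, VendorId: "gold456", ASIN: "456", Price: 15),
    (Id: 4, VendorId: "gold678", ASIN: "678", Price: 12),
    (Id: 5, VendorId: "sil456",  ASIN: "456", Price: 12),
    (Id: 6, VendorId: "gold980", ASIN: "980", Price: 12),
};

// Start from the rows to return (gold) and filter with one more predicate:
// no silver row shares the same ASIN.
var goldWithoutSilver = books
    .Where(b => b.VendorId.Contains("gold"))
    .Where(b => !books.Any(s => s.ASIN == b.ASIN && s.VendorId.Contains("sil")))
    .OrderBy(b => b.Id)
    .ToList();

foreach (var b in goldWithoutSilver)
    Console.WriteLine($"{b.Id} {b.VendorId} {b.ASIN} {b.Price}");
```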
See if this helps:
var goldWithoutCorrespondingSilver = from b1 in books
join b2 in books on b1.ASIN equals b2.ASIN
where b1.VendorId.Contains("gold")
group b2 by b1.VendorId into g
where !g.Any(x => x.VendorId.Contains("sil"))
select g.FirstOrDefault();
What I have done is:
1. Selected records with a matching ASIN
2. Grouped them by VendorId
3. Selected the groups which do not contain a sil row
I have a usage table which stores daily usage for customers of various products. I want to now return a result which groups results by customerID for total / combined usage of various products.
See below a perfectly illustrated example of the current data structure :)
id | customerID | prod1 | prod2
1 . 123 . 0 . 1
2 . 125 . 5 . 5
3 . 125 . 1 . 1
I am looking to return a result set as such (again, admire my illustrating ability):
customerID | prod1 | prod2
123 . 0 . 1
125 . 6 . 6
Will this kind of calculation be possible using EF? I am trying to avoid a multitude of loops to achieve the same thing so this would greatly help a brother out.
What you need is GroupBy and Sum:
var result = context.Customers
// .Where(filter) // filter if needed
.GroupBy(m => m.CustomerID)
.Select(g => new
{
CustomerID = g.Key, // We grouped according to CustomerID, so key = CustomerID
SumProd1 = g.Sum(m => m.Prod1), // Sum Prod1 of grouped data
SumProd2 = g.Sum(m => m.Prod2) // Sum Prod2 of grouped data
})
.ToList();
Note: ToList() executes the query and retrieves the data. Leave it off if you plan to keep composing on the query.
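As a quick check of the GroupBy and Sum shape, here is the same query run in memory against the three sample rows from the question. The tuple array is an assumed stand-in for the EF set; with EF the grouping would be done by the database instead.

```csharp
using System;
using System.Linq;

// Sample usage rows from the question (hypothetical in-memory stand-in).
var usage = new[]
{
    (Id: 1, CustomerID: 123, Prod1: 0, Prod2: 1),
    (Id: 2, CustomerID: 125, Prod1: 5, Prod2: 5),
    (Id: 3, CustomerID: 125, Prod1: 1, Prod2: 1),
};

// One output row per customer, with each product column summed.
var totals = usage
    .GroupBy(u => u.CustomerID)
    .Select(g => new
    {
        CustomerID = g.Key,
        Prod1 = g.Sum(u => u.Prod1),
        Prod2 = g.Sum(u => u.Prod2),
    })
    .ToList();

foreach (var t in totals)
    Console.WriteLine($"{t.CustomerID} . {t.Prod1} . {t.Prod2}");
```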
Clarified example
I have a database of users that is created by a script that scans through Active Directory. One of the fields it applies is a "ScanDate" field, which indicates when the scan took place. The script scans through multiple Active Directory domains.
GOAL: Obtain an IList from the database that contains the list of users for ALL domains, but where the ScanDate is the MAX(ScanDate) for each domain.
This ensures I get the freshest data for each domain.
A SQL query that appears to work for me:
SELECT *
FROM ADScans a
WHERE a.ScanDate = (SELECT MAX(b.ScanDate) FROM ADScans b WHERE a.Domain = b.Domain) AND Enabled = 1
However, I'm having trouble expressing that in LINQ.
e.g.:
Category | Date
Cat1 | 4/4/16
Cat2 | 4/4/16
Cat3 | 4/4/16
Cat1 | 4/3/16
I would expect a list:
Cat1 | 4/4/16
Cat2 | 4/4/16
Cat3 | 4/4/16
Some clarification
I would expect to have multiple rows returned per category - the MAX(Date) will not just give me one. I am looking to obtain ALL of the rows for the MAX(Date) of each category.
Something like this should work:
var result =
list
//Group by Category
.GroupBy(x => x.Category)
//For each Category, select Category and max Date within the Category
//This would create an anonymous object, you could do a "new Entity" instead if you want
.Select(g => new {Category = g.Key, Date = g.Max(x => x.Date)})
.ToList();
I'm an idiot.
from u in this.db.ADScans
where u.ScanDate ==
(from s in this.db.ADScans where u.Domain == s.Domain select s.ScanDate).Max()
&& u.Enabled
select u;
Rather than using Max(), just order the items in the groups and take the top item: since you ordered the items, it's guaranteed to be the highest one:
var mostRecentScanFromEachDomain =
from u in this.db.ADScans
where u.Enabled
group u by u.Domain into g
select g.OrderByDescending(u => u.ScanDate)
.FirstOrDefault();
You can group by Domain to get the max ScanDate, then keep only the rows with that date.
For a model like this:
class ADScan
{
public int Domain { get; set; }
public DateTime ScanDate { get; set; }
}
You can get the scans doing this:
var result = scans.GroupBy(s => s.Domain)
.SelectMany(g => g.Where(s => s.ScanDate == g.Max(d => d.ScanDate)));
This produces a collection containing the scans with the max ScanDate for its Domain.
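A small in-memory sketch of the same idea, showing that every row sharing the latest date in its group is kept, not just one. The User field and the sample rows are made up for illustration. The maximum is computed once per group here, which uses a statement lambda and therefore suits LINQ to Objects; a query provider such as EF may need the expression form shown above.

```csharp
using System;
using System.Linq;

// Hypothetical scan rows: two users share the latest date for Cat1.
var scans = new[]
{
    (Domain: "Cat1", ScanDate: new DateTime(2016, 4, 4), User: "alice"),
    (Domain: "Cat1", ScanDate: new DateTime(2016, 4, 4), User: "bob"),
    (Domain: "Cat1", ScanDate: new DateTime(2016, 4, 3), User: "carol"),
    (Domain: "Cat2", ScanDate: new DateTime(2016, 4, 4), User: "dave"),
};

var latest = scans
    .GroupBy(s => s.Domain)
    .SelectMany(g =>
    {
        var max = g.Max(s => s.ScanDate);       // latest date in this domain
        return g.Where(s => s.ScanDate == max); // all rows from that scan
    })
    .ToList();

foreach (var s in latest)
    Console.WriteLine($"{s.Domain} | {s.ScanDate:d} | {s.User}");
```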
How can I use LINQ to fetch only the rows that have no duplicates? Example:
ID | STATUS
1 | CHECKIN
2 | CHECKOUT
2 | CHECKOUT
1 | CHECKIN
3 | CHECKOUT <--
I should only retrieve the ID 3 CHECKOUT because he is not duplicated.
Can you help me using linq?
You need to group the rows and keep only the groups whose count is 1:
Dim nonDuplicates = (From x In query Group By x.Id, x.Status Into grp = Group
Where grp.Count = 1)
The other answer will still retrieve all the duplicated items, just removing duplicates of them. If you want to only retrieve non-duplicated items, as you stated in your original question, this will work for you:
IEnumerable<Item> singles = items.Where(i => !items.Any(j => !i.Equals(j) && i.id == j.id));
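For reference, the grouping approach can also be sketched in C# method syntax. This runs against the sample rows from the question, held in memory as tuples (an assumption for illustration), and groups on the Id and Status pair.

```csharp
using System;
using System.Linq;

// Sample rows from the question.
var rows = new[]
{
    (Id: 1, Status: "CHECKIN"),
    (Id: 2, Status: "CHECKOUT"),
    (Id: 2, Status: "CHECKOUT"),
    (Id: 1, Status: "CHECKIN"),
    (Id: 3, Status: "CHECKOUT"),
};

// Group identical (Id, Status) pairs and keep the pairs that occur once.
var nonDuplicates = rows
    .GroupBy(r => (r.Id, r.Status))
    .Where(g => g.Count() == 1)
    .Select(g => g.Key)
    .ToList();

foreach (var r in nonDuplicates)
    Console.WriteLine($"{r.Id} | {r.Status}");
```

Grouping makes one pass over the data, whereas the nested Where/Any form compares every row with every other row.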
I have a table like so:
Code | BuildDate | BuildQuantity
---------------------------------
1 | 2013-04-10 | 4
1 | 2014-09-23 | 1
1 | 2014-08-20 | 2
7 | 2014-02-05 | 4
I want the LINQ query to pick up the LATEST Build date for each Code, pick the BuildQuantity for that Code, and so on for all the records, and sum it up. So for the data above, the result should be 1 + 4 = 5.
This is what I'm trying:
var built = (from tb in db.Builds
orderby tb.BuildDate descending
group tb by tb.Code into tbgrp
select tbgrp.Sum(c => c.BuildQuantity)).First();
This query returns 7... What am I doing wrong?
You are summing every BuildQuantity within each code and then taking the first of those sums. Instead, you want to take the latest build of each code and sum those.
int built = db.Builds
.GroupBy(b => b.Code)
.Sum(g => g.OrderByDescending(b => b.BuildDate).First().BuildQuantity);
You are looking for the sum of the build quantity of the last entry per code. You're currently ordering before you group, which doesn't actually do anything (after the group, the ordering isn't defined).
So first off, you're looking to get the latest element by code. Let's group first. I'm more comfortable writing this with the extension methods:
var grouped = db.Builds.GroupBy(tb => tb.Code);
We now have the elements grouped. From each group, we want to get the first element when ordered descending by build date:
var firsts = grouped.Select(gr => gr.OrderByDescending(tb => tb.BuildDate)
                                    .First());
Finally, we can get the sum:
var sum = firsts.Sum(tb => tb.BuildQuantity);
Putting this all together becomes:
var sum = db.Builds.GroupBy(tb => tb.Code)
    .Select(gr => gr.OrderByDescending(tb => tb.BuildDate).First())
    .Sum(tb => tb.BuildQuantity);
GroupBy has overloads that let you roll the per-group projection into the grouping call itself. The result selector receives the key and the group's elements.
If you like compact code, you could write:
var sum = db.Builds
    .GroupBy(tb => tb.Code,
             tb => tb,
             (code, gr) => gr.OrderByDescending(tb => tb.BuildDate)
                             .First()
                             .BuildQuantity)
    .Sum();
though I wouldn't recommend it from a readability point of view.
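To check the arithmetic, here is the group-then-take-latest query run in memory against the four rows from the question. The tuple array is an assumed stand-in for db.Builds.

```csharp
using System;
using System.Linq;

// Sample rows from the question.
var builds = new[]
{
    (Code: 1, BuildDate: new DateTime(2013, 4, 10), BuildQuantity: 4),
    (Code: 1, BuildDate: new DateTime(2014, 9, 23), BuildQuantity: 1),
    (Code: 1, BuildDate: new DateTime(2014, 8, 20), BuildQuantity: 2),
    (Code: 7, BuildDate: new DateTime(2014, 2, 5),  BuildQuantity: 4),
};

// Latest build per code: code 1 -> quantity 1, code 7 -> quantity 4.
int sum = builds
    .GroupBy(b => b.Code)
    .Select(g => g.OrderByDescending(b => b.BuildDate).First())
    .Sum(b => b.BuildQuantity);

Console.WriteLine(sum); // prints 5
```

The original query printed 7 because it summed all of code 1's quantities (4 + 1 + 2) and then took that first group's total.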