Combining two simple related linq queries - c#

I have two queries and i'm using the result of the first one in the second one like this
var temp = (ObjectTable.Where(o => o.Category == "Y"));
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == temp.Max(x => x.Value))});
Is there a way to combine these into one query?
EDIT:
I cannot just chain them directly because I'm using temp.Max() in the second query.

Why? it would be clearer (and more efficient) to make it three:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
int max = temp.Max(x => x.Value);
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == max)});

You can do it in one statement using query syntax, using the let keyword. It only evaluates the 'max' once, so it just like the three separate statements, just in one line.
var anonymousObjList = from o in ObjectTable
where o.Category == "Y"
let max = ObjectTable.Max(m => m.Value)
select new { o, IsMax = (o.Value == max) };
This is the only time I ever use query syntax. You can't do this using method syntax!
edit: ReSharper suggests
var anonymousObjList = ObjectTable.Where(o => o.Category == "Y")
.Select(o => new {o, max = ObjectTable.Max(m => m.Value)})
.Select(#t => new {#t.o, IsMax = (#t.o.Value == #t.max)});
however this is not optimal. The first Select is projecting a max Property for each item in ObjectTable - the Max function will be evaluated for every item. If you use query syntax it's only evaluated once.
Again, you can only do this with query syntax. I'm not fan of query syntax but this makes it worthwhile, and is the only case in which I use it. ReSharper is wrong.

Possibly the most straightfirward refactoring is to replace all instances of "temp" with the value of temp. Since it appears that this value is immutable, the refactoring should be valid (yet ugly):
var anonymousObjList = ObjectTable.Where(o => o.Category == "Y")
.Select(o => new {o, IsMax = (o.Value == ObjectTable.Where(o => o.Category == "Y").Max(x => x.Value))});
As has already been pointed out, this query really has no advantages over the original, since queries use deffered execution and can be built up. I would actually suggest splitting the query even more:
var temp = (ObjectTable.Where(o => o.Category == "Y"));
var maxValue = temp.Max(x => x.Value);
var anonymousObjList = temp.Select(o => new {o, IsMax = (o.Value == maxValue)});
This is better than the original because every time "Max" is called causes another iteration over the entire dataset. Since it is being called in the Select of the original, Max was being called n times. That makes the original O(n^2)!

Related

LINQ subquery with multiple columns

I'm trying to recreate this SQL query in LINQ:
SELECT *
FROM Policies
WHERE PolicyID IN(SELECT PolicyID
FROM PolicyRegister
WHERE PolicyRegister.StaffNumber = #CurrentUserStaffNo
AND ( PolicyRegister.IsPolicyAccepted = 0
OR PolicyRegister.IsPolicyAccepted IS NULL ))
Relationship Diagram for the two tables:
Here is my attempt so far:
var staffNumber = GetStaffNumber();
var policyRegisterIds = db.PolicyRegisters
.Where(pr => pr.StaffNumber == staffNumber && (pr.IsPolicyAccepted == false || pr.IsPolicyAccepted == null))
.Select(pr => pr.PolicyID)
.ToList();
var policies = db.Policies.Where(p => p.PolicyID.//Appears in PolicyRegisterIdsList)
I think I'm close, will probably make two lists and use Intersect() somehow but I looked at my code this morning and thought there has to be an easier way to do this,. LINQ is supposed to be a more readble database language right?
Any help provided is greatly appreciated.
Just use Contains:
var policies = db.Policies.Where(p => policyRegisterIds.Contains(p.PolicyID));
Also better store policyRegisterIds as a HashSet<T> instead of a list for search in O(1) instead of O(n) of List<T>:
var policyRegisterIds = new HashSet<IdType>(db.PolicyRegisters......);
But better still is to remove the ToList() and let it all happen as one query in database:
var policyRegisterIds = db.PolicyRegisters.Where(pr => pr.StaffNumber == staffNumber &&
(pr.IsPolicyAccepted == false || pr.IsPolicyAccepted == null));
var policies = db.Policies.Where(p => policyRegisterIds.Any(pr => pr.PolicyID == p.PolicyID));

Improving O(n^2) algorithm

I have the following piece of code:
var keywordItems = adwordsService
.ParseReport(report)
.Where(e => e.Keyword.IndexOf('+') == -1);
var keywordTranslations = keywordTranslationService
.GetKeywordTranslationsByClient(id);
model.KeywordItems = keywordItems
.Where(e =>
{
int lastUnderscore = e.CampaignName.LastIndexOf('_');
var identifer = e.CampaignName.Substring(lastUnderscore + 1);
var translation = keywordTranslations
.FirstOrDefault(t => t.translation == e.Keyword &&
t.LocalCombination_id == identifer);
return translation == null;
})
.OrderBy(e => e.Keyword);
It receives an array and then filters each of these element based on whether or not they've already been seen before.
However, this runs pretty slow, as there's a lot of new elements, so I would like it, if someone can point me in the right direction regarding the best algorithm to use in this case.
Simple join will do the job - it uses hashset for matching between collections, which gives you O(1) for search operation:
from k in keywordItems
let identifer = k.CampaignName.Substring(k.CampaignName.LastIndexOf('_') + 1)
join t in keywordTranslations on
new { k.Keyword, Id = identifer } equals
new { Keyword = t.translation, Id = t.LocalCombination_id } into g
where !g.Any()
orderby k.Keyword
select k
To further improve performance you can move identifier extraction directly to the key creation. Thus you will omit introducing new range variable.
I suggest using hashing, e.g. HashSet<T> or Dictionary<T>. Providing that translation as well as LocalCombination_id are of type string:
HashSet<Tuple<string, int>> keywordTranslations =
new HashSet<Tuple<string, string>>(keywordTranslationService
.GetKeywordTranslationsByClient(id)
.Select(t => new Tuple<string, int>(t.translation, t.LocalCombination_id)));
model.KeywordItems = keywordItems
.Where(e => !keywordTranslations.Contains(new Tuple<string, string>(
e.Keyword,
e.CampaignName.Substring(e.CampaignName.LastIndexOf('_') + 1))))
.OrderBy(e => e.Keyword);

Flattening Complex LINQ to SQL

I have a somewhat complex LINQ to SQL query that I'm trying to optimise (no, not prematurely, things are slow), that goes a little bit like this;
IQueryable<SearchListItem> query = DbContext.EquipmentLives
.Where(...)
.Select(e => new SearchListItem {
EquipmentStatusId = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null).Id,
StatusStartDate = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null).DateFrom,
...
});
The where clauses aren't important, they don't filter EquipmentStatuses, happy to include if someone thinks they're required.
This is on quite a large set of tables and returns a fairly details object, there's more references to EquipmentStatuses, but I'm sure you get the idea. The problem is that there's quite obviously two sub-queries and I'm sure that (among some other things) is not ideal, especially since they are exactly the same sub-query each time.
Is it possible to flatten this out a bit? Perhaps it's easier to do a few smaller queries to the database and create the SearchListItem in a foreach loop?
Here's my take given your comments, and with some assumptions I've made
It may look scary, but give it a try, with and without the ToList() before the GroupBy()
If you have LinqPad, check the SQL produced, and the number of queries, or just plug in the SQL Server Profiler
With LinqPad you could even put a Stopwatch to measure things precisely
Enjoy ;)
var query = DbContext.EquipmentLives
.AsNoTracking() // Notice this!!!
.Where(...)
// WARNING: SelectMany is an INNER JOIN
// You won't get EquipmentLive records that don't have EquipmentStatuses
// But your original code would break if such a case existed
.SelectMany(e => e.EquipmentStatuses, (live, status) => new
{
EquipmentLiveId = live.Id, // We'll need this one for grouping
EquipmentStatusId = status.Id,
EquipmentStatusDateTo = status.DateTo,
StatusStartDate = status.DateFrom
//...
})
// WARNING: Again, you won't get EquipmentLive records for which none of their EquipmentStatuses have a DateTo == null
// But your original code would break if such a case existed
.Where(x => x.EquipmentStatusDateTo == null)
// Now You can do a ToList() before the following GroupBy(). It depends on a lot of factors...
// If you only expect one or two EquipmentStatus.DateTo == null per EquipmentLive, doing ToList() before GroupBy may give you a performance boost
// Why? GroupBy sometimes confuses the EF SQL generator and the SQL Optimizer
.GroupBy(x => x.EquipmentLiveId, x => new SearchListItem
{
EquipmentLiveId = x.EquipmentLiveId, // You may or may not need this?
EquipmentStatusId = x.EquipmentStatusId,
StatusStartDate = x.StatusStartDate,
//...
})
// Now you have one group of SearchListItem per EquipmentLive
// Each group has a list of EquipmenStatuses with DateTo == null
// Just select the first one (you could do g.OrderBy... as well)
.Select(g => g.FirstOrDefault())
// Materialize
.ToList();
You don't need to repeat the FirstOrDefault. You can add an intermediate Select to select it once and then reuse it:
IQueryable<SearchListItem> query = DbContext.EquipmentLives
.Where(...)
.Select(e => e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null))
.Select(s => new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
In query syntax (which I find more readable) it would look like this:
var query =
from e in DbContext.EquipmentLives
where ...
let s = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null)
select new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
There is another problem in your query though. If there is no matching EquipmentStatus in your EquipmentLive, FirstOrDefault will return null, which will cause an exception in the last select. So you might need an additional Where:
IQueryable<SearchListItem> query = DbContext.EquipmentLives
.Where(...)
.Select(e => e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null))
.Where(s => s != null)
.Select(s => new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
or
var query =
from e in DbContext.EquipmentLives
where ...
let s = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null)
where s != null
select new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
Given that you don't test for null after calling FirstOrDefault(s => s.DateTo == null) I assume that:
either for each device there is always a status with DateTo == null or
you need to see only devices which have such status
In order to do so you need to join EquipmentLives with EquipmentStatuses to avoid subqueries:
var query = DbContext.EquipmentLives
.Where(l => true)
.Join(DbContext.EquipmentStatuses.Where(s => s.DateTo == null),
eq => eq.Id,
status => status.EquipmentId,
(eq, status) => new SelectListItem
{
EquipmentStatusId = status.Id,
StatusStartDate = status.DateFrom
});
However, if you do want to perform a left join replace DbContext.EquipmentStatuses.Where(s => s.DateTo == null) with DbContext.EquipmentStatuses.Where(s => s.DateTo == null).DefaultIfEmpty().

Fluent nHibernate Query => QueryOver, Join, Distinct

I try to change that query to QueryOver<> to be able to do the Distinct operation yet inside the (generated sql) query
var result = (from x in Session.Query<Events>()
join o in Session.Query<Receivers>() on x.ID equals o.ID
where x.Owner.ID == 1 //the user is the owner of that Event (not null)
||
x.EVType.ID == 123 //(not null)
||
x.Receivers.Count(y => y.User.ID == 1) > 0 //the user is one of the Event Receivers
select x.StartDate)
.Distinct();
I tried something like that
Events x = null;
List<Receivers> t = null;
var result = Session.QueryOver<Events>(() => x)
.JoinAlias(() => x.Receivers, () => t)
.Where(() => x.Owner.ID == 1
||
x.EVType.ID == 123
||
t.Count(y => y.User.ID == 1) > 0)
.TransformUsing(Transformers.DistinctRootEntity)
.Select(a => a.StartDate)
.List();
but then I got the Value can not be null. Parameter name: source exception. Any ideas how can I fix that query ?
edit
thanks to the xanatos' answer, the final SQL query is correct (I used his 2nd approach):
SELECT distinct this_.StartDate as y0_
FROM Events this_
WHERE
(
this_.UserID = ?
or
this_.EventTypeID = ?
or
exists (SELECT this_0_.ID as y0_
FROM Receivers this_0_
WHERE this_0_.UserID = ?)
)
"In QueryOver, aliases are assigned using an empty variable. The variable can be declared anywhere (but should be empty/default at runtime). The compiler can then check the syntax against the variable is used correctly, but at runtime the variable is not evaluated (it's just used as a placeholder for the alias)." http://nhibernate.info/blog/2009/12/17/queryover-in-nh-3-0.html
Setting List<Receivers> t to empty collection as you did (as you have mentioned in comments) means that you check is event id in local empty collection - doesn't have sense at all.
You can try do your query with subquery (should work but i'm not sure, I wrote it without testing, "by hand"):
Receivers receiversSubQueryAlias = null;
var subquery = session.QueryOver<Events>()
.JoinQueryOver<Receivers>(x => x.Receivers, () => receiversSubqueryAlias, JoinType.Inner)
.Where(()=> receiversSubQueryAlias.UserId == 1)
.Select(x => x.Id)
.TransformUsing(Transformers.DistinctRootEntity);
Events eventsAlias = null;
var mainQueryResults = session.QueryOver<Events>(() => eventsAilas)
.Where(Restrictions.Disjunction()
.Add(() => eventAlias.OwnerId == 1)
.Add(() => eventAlias.EVType.Id == 123)
.Add(Subqueries.WhereProperty<Events>(() => eventAlias.Id).In(subquery))
).Select(x => x.StartDate)
.TransformUsing(Transformers.DistinctRootEntity)
.List();
As written by #fex, you can't simply do a new List<Receivers>. The problem is that you can't mix QueryOver with "LINQ" (the t.Count(...) part). The QueryOver "parser" tries to execute "locally" the t.Count(...) instead of executing it in SQL.
As written by someone else, TransformUsing(Transformers.DistinctRootEntity) is client-side. If you want to do a DISTINCT server-side you have to use Projections.Distinct .
You have to make an explicit subquery. Here there are two variants of the query. the first one is more similar to the LINQ query, the second one doesn't use the Count but uses the Exist (in LINQ you could have done the same by changing the Count(...) > 0 with a Any(...)
Note that when you use a .Select() you normally have to explicitly tell the NHibernate the type of the .List<something>()
Events x = null;
Receivers t = null;
// Similar to LINQ, with COUNT
var subquery2 = QueryOver.Of<Receivers>(() => t)
.Where(() => t.SOMETHING == x.SOMETHING) // The JOIN clause between Receivers and Events
.ToRowCountQuery();
var result2 = Session.QueryOver<Events>(() => x)
.Where(Restrictions.Disjunction()
.Add(() => x.Owner.ID == 1)
.Add(() => x.EVType.ID == 123)
.Add(Subqueries.WhereValue(0).Lt(subquery2))
)
.Select(Projections.Distinct(Projections.Property(() => x.StartDate)))
.List<DateTime>();
// With EXIST
var subquery = QueryOver.Of<Receivers>(() => t)
.Where(() => t.SOMETHING == x.SOMETHING) // The JOIN clause between Receivers and Events
.Select(t1 => t1.ID);
var result = Session.QueryOver<Events>(() => x)
.Where(Restrictions.Disjunction()
.Add(() => x.Owner.ID == 1)
.Add(() => x.EVType.ID == 123)
.Add(Subqueries.WhereExists(subquery))
)
.Select(Projections.Distinct(Projections.Property(() => x.StartDate)))
.List<DateTime>();
Note that you'll have to set "manually" the JOIN condition in the subquery.
Hopefully this answer can help others. This error was being caused by declaring
List<Receivers> t = null;
followed by the query expression
t.Count(y => y.User.ID == 1) > 0
The QueryOver documentation states "The variable can be declared anywhere (but should be empty/default at runtime)." Since in this case, the place holder is a List, you must initialize it as an empty list.
List<Receivers> t = new List<Receivers>();
Otherwise, when you try to reference the Count method, or any other method on the placeholder object, the source (t) will be null.
This however still leaves a problem as #fex and #xanatos, in which it makes no sense to reference Count() from the alias List t, as it won't convert into SQL. Instead you should be creating a subquery. See their answers for more comprehensive answer.

Entity framework use already selected value saved in new variable later in select sentance

I wrote some entity framework select:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id) * count ...
});
I have always select total:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id) * a.OtherTable.Where(b => b.id == id).Sum(c => c.value)
});
Is it possible to select it like in my first example, because I have already retrieved the value (and how to do that) or should I select it again?
One possible approach is to use two successive selects:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id)
})
.Select(x => new
{
count = x.count,
total = x.total * x.count
};
You would simple do
var listFromDatabase = context.MyTable;
var query1 = listFromDatabase.Select(a => // do something );
var query2 = listFromDatabase.Select(a => // do something );
Although to be fair, Select requires you to return some information, and you aren't, you're somewhere getting count & total and setting their values. If you want to do that, i would advise:
var listFromDatabase = context.MyTable.ToList();
listFromDatabase.ForEach(x =>
{
count = do_some_counting;
total = do_some_totalling;
});
Note, the ToList() function stops it from being IQueryable and transforms it to a solid list, also the List object allows the Linq ForEach.
If you're going to do complex stuff inside the Select I would always do:
context.MyTable.AsEnumerable()
Because that way you're not trying to still Query from the database.
So to recap: for the top part, my point is get all the table contents into variables, use ToList() to get actual results (do a workload). Second if trying to do it from a straight Query use AsEnumerable to allow more complex functions to be used inside the Select

Categories

Resources