Flattening Complex LINQ to SQL - c#

I have a somewhat complex LINQ to SQL query that I'm trying to optimise (no, not prematurely, things are slow), that goes a little bit like this;
IQueryable<SearchListItem> query = DbContext.EquipmentLives
.Where(...)
.Select(e => new SearchListItem {
EquipmentStatusId = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null).Id,
StatusStartDate = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null).DateFrom,
...
});
The where clauses aren't important, they don't filter EquipmentStatuses, happy to include if someone thinks they're required.
This is on quite a large set of tables and returns a fairly details object, there's more references to EquipmentStatuses, but I'm sure you get the idea. The problem is that there's quite obviously two sub-queries and I'm sure that (among some other things) is not ideal, especially since they are exactly the same sub-query each time.
Is it possible to flatten this out a bit? Perhaps it's easier to do a few smaller queries to the database and create the SearchListItem in a foreach loop?

Here's my take given your comments, and with some assumptions I've made
It may look scary, but give it a try, with and without the ToList() before the GroupBy()
If you have LinqPad, check the SQL produced, and the number of queries, or just plug in the SQL Server Profiler
With LinqPad you could even put a Stopwatch to measure things precisely
Enjoy ;)
var query = DbContext.EquipmentLives
.AsNoTracking() // Notice this!!!
.Where(...)
// WARNING: SelectMany is an INNER JOIN
// You won't get EquipmentLive records that don't have EquipmentStatuses
// But your original code would break if such a case existed
.SelectMany(e => e.EquipmentStatuses, (live, status) => new
{
EquipmentLiveId = live.Id, // We'll need this one for grouping
EquipmentStatusId = status.Id,
EquipmentStatusDateTo = status.DateTo,
StatusStartDate = status.DateFrom
//...
})
// WARNING: Again, you won't get EquipmentLive records for which none of their EquipmentStatuses have a DateTo == null
// But your original code would break if such a case existed
.Where(x => x.EquipmentStatusDateTo == null)
// Now You can do a ToList() before the following GroupBy(). It depends on a lot of factors...
// If you only expect one or two EquipmentStatus.DateTo == null per EquipmentLive, doing ToList() before GroupBy may give you a performance boost
// Why? GroupBy sometimes confuses the EF SQL generator and the SQL Optimizer
.GroupBy(x => x.EquipmentLiveId, x => new SearchListItem
{
EquipmentLiveId = x.EquipmentLiveId, // You may or may not need this?
EquipmentStatusId = x.EquipmentStatusId,
StatusStartDate = x.StatusStartDate,
//...
})
// Now you have one group of SearchListItem per EquipmentLive
// Each group has a list of EquipmenStatuses with DateTo == null
// Just select the first one (you could do g.OrderBy... as well)
.Select(g => g.FirstOrDefault())
// Materialize
.ToList();

You don't need to repeat the FirstOrDefault. You can add an intermediate Select to select it once and then reuse it:
IQueryable<SearchListItem> query = DbContext.EquipmentLives
.Where(...)
.Select(e => e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null))
.Select(s => new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
In query syntax (which I find more readable) it would look like this:
var query =
from e in DbContext.EquipmentLives
where ...
let s = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null)
select new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
There is another problem in your query though. If there is no matching EquipmentStatus in your EquipmentLive, FirstOrDefault will return null, which will cause an exception in the last select. So you might need an additional Where:
IQueryable<SearchListItem> query = DbContext.EquipmentLives
.Where(...)
.Select(e => e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null))
.Where(s => s != null)
.Select(s => new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});
or
var query =
from e in DbContext.EquipmentLives
where ...
let s = e.EquipmentStatuses.FirstOrDefault(s => s.DateTo == null)
where s != null
select new SearchListItem {
EquipmentStatusId = s.Id,
StatusStartDate = s.DateFrom,
...
});

Given that you don't test for null after calling FirstOrDefault(s => s.DateTo == null) I assume that:
either for each device there is always a status with DateTo == null or
you need to see only devices which have such status
In order to do so you need to join EquipmentLives with EquipmentStatuses to avoid subqueries:
var query = DbContext.EquipmentLives
.Where(l => true)
.Join(DbContext.EquipmentStatuses.Where(s => s.DateTo == null),
eq => eq.Id,
status => status.EquipmentId,
(eq, status) => new SelectListItem
{
EquipmentStatusId = status.Id,
StatusStartDate = status.DateFrom
});
However, if you do want to perform a left join replace DbContext.EquipmentStatuses.Where(s => s.DateTo == null) with DbContext.EquipmentStatuses.Where(s => s.DateTo == null).DefaultIfEmpty().

Related

Is it possible to combine small queries into single query?

I have the following queries:
var truckCount = await DbContext.Trucks
.Where(t => t.Departure == null)
.CountAsync();
var firstTruck = await DbContext.Trucks
.Where(t => t.Departure == null)
.MinAsync(t => t.Arrival);
var railcarCount = await DbContext.Railcars
.Where(r => r.Departure == null)
.CountAsync();
var firstRailcar = await DbContext.Railcars
.Where(t => t.Departure == null)
.MinAsync(t => t.Arrival);
Can anyone tell me if it's possible to combine these queries into one so that there is only one round trip to the database?
I'd be looking to generate a query something like this.
select
(select count(*) from Trucks where Departure is null) as TruckCount,
(select min(Arrival) from Trucks where Departure is null) as FirstTruck,
(select count(*) from Railcars where Departure is null) as RailcarCount,
(select min(Arrival) from Railcars where Departure is null) as FirstRailcar
My backend is SQL Server.
You need to use a third party library which enables to execute multiple queries in a single roundtrip to the database. Maybe this extension with it's future queries works for you.
Otherwise you could implement a stored-procedure which encapsulates the queries (as subqueries) and returns the desired information.
Another option might be to just use 2 queries instead of 4:
var truckInfo = await DbContext.Trucks
.GroupBy(t => t.Departure == null)
.Where(g => g.Key == true)
.Select(g => new { Count = g.Count(), FirstTruck = g.Min(t => t.Arrival) })
.FirstOrDefaultAsync() ?? new { Count = 0, FirstTruck = DateTime.MinValue };
// same for Railcars
Not with linq, no. Why? Because of two reasons:
Query syntax has no way to get count and use union to get from one query
Method count, is immediate execution and not deferred, so you can't chain into one query
To be honest, that would be difficult to achieve even with a sql query as the data has different data types and columns.
Just in case, there is EF Core extension linq2db.EntityFrameworkCore (note that I'm one of the creators) which can run this query and almost any SQL ANSI query via LINQ
using var db = DbContext.CreateLinqToDBConnection();
var trucks = DbContext.Trucks
.Where(t => t.Departure == null);
var railcars = DbContext.Railcars
.Where(r => r.Departure == null);
var result = await db.SelectAsync(() => new
{
TruckCount = trucks.Ccount(),
FirstTruck = trucks.Min(t => t.Arrival),
RailcarCount = railcars.Count(),
FirstRailcar = railcars.Min(t => t.Arrival)
});

LINQ efficiency

Consider the following LINQ statements:
var model = getModel();
// apptId is passed in, not the order, so get the related order id
var order = (model.getMyData
.Where(x => x.ApptId == apptId)
.Select(y => y.OrderId));
var orderId = 0;
var orderId = order.LastOrDefault();
// see if more than one appt is associated to the order
var apptOrders = (model.getMyData
.Where(x => x.OrderId == orderId)
.Select(y => new { y.OrderId, y.AppointmentsId }));
This code works as expected, but I could not help but think that there is a more efficient way to accomplish the goal ( one call to the db ).
Is there a way to combine the two LINQ statements above into one? For this question please assume I need to use LINQ.
You can use GroupBy method to group all orders by OrderId. After applying LastOrDefault and ToList will give you the same result which you get from above code.
Here is a sample code:
var apptOrders = model.getMyData
.Where(x => x.ApptId == apptId)
.GroupBy(s => s.OrderId)
.LastOrDefault().ToList();
Entity Framework can't translate LastOrDefault, but it can handle Contains with sub-queries, so lookup the OrderId as a query and filter the orders by that:
// apptId is passed in, not the order, so get the related order id
var orderId = model.getMyData
.Where(x => x.ApptId == apptId)
.Select(y => y.OrderId);
// see if more than one appt is associated to the order
var apptOrders = model.getMyData
.Where(a => orderId.Contains(a.OrderId))
.Select(a => a.ApptId);
It seems like this is all you need:
var apptOrders =
model
.getMyData
.Where(x => x.ApptId == apptId)
.Select(y => new { y.OrderId, y.AppointmentsId });

Why EF code is not selecting a single column?

I have used this to pick just a single column from the collection but it doesn't and throws casting error.
ClientsDAL ClientsDAL = new DAL.ClientsDAL();
var clientsCollection= ClientsDAL.GetClientsCollection();
var projectNum = clientsCollection.Where(p => p.ID == edit.Clients_ID).Select(p => p.ProjectNo).ToString();
Method:
public IEnumerable<Clients> GetClientsCollection(string name = "")
{
IEnumerable<Clients> ClientsCollection;
var query = uow.ClientsRepository.GetQueryable().AsQueryable();
if (!string.IsNullOrEmpty(name))
{
query = query.Where(x => x.Name.Contains(name));
}
ClientsCollection = (IEnumerable<Clients>)query;
return ClientsCollection;
}
As DevilSuichiro said in comments you should not cast to IEnumerable<T> just call .AsEnumerable() it will keep laziness.
But in your case it looks like you do not need that at all because First or FirstOrDefault work with IQueryable too.
To get a single field use this code
clientsCollection
.Where(p => p.ID == edit.Clients_ID)
.Select(p => p.ProjectNo)
.First() // if you sure that at least one item exists
Or (more safe)
var projectNum = clientsCollection
.Where(p => p.ID == edit.Clients_ID)
.Select(p => (int?)p.ProjectNo)
.FirstOrDefault();
if (projectNum != null)
{
// you find that number
}
else
{
// there is no item with such edit.Clients_ID
}
Or even simpler with null propagation
var projectNum = clientsCollection
.FirstOrDefault(p => p.ID == edit.Clients_ID)?.ProjectNo;

Using two Linq query in a single method

As shown in the below code, the API will hit the database two times to perform two Linq Query. Can't I perform the action which I shown below by hitting the database only once?
var IsMailIdAlreadyExist = _Context.UserProfile.Any(e => e.Email == myModelUserProfile.Email);
var IsUserNameAlreadyExist = _Context.UserProfile.Any(x => x.Username == myModelUserProfile.Username);
In order to make one request to database you could first filter for only relevant values and then check again for specific values in the query result:
var selection = _Context.UserProfile
.Where(e => e.Email == myModelUserProfile.Email || e.Username == myModelUserProfile.Username)
.ToList();
var IsMailIdAlreadyExist = selection.Any(x => x.Email == myModelUserProfile.Email);
var IsUserNameAlreadyExist = selection.Any(x => x.Username == myModelUserProfile.Username);
The .ToList() call here will execute the query on database once and return relevant values
Start with
var matches = _Context
.UserProfile
.Where(e => e.Email == myModelUserProfile.Email)
.Select(e => false)
.Take(1)
.Concat(
_Context
.UserProfile
.Where(x => x.Username == myModelUserProfile.Username)
.Select(e => true)
.Take(1)
).ToList();
This gets enough information to distinguish between the four possibilities (no match, email match, username match, both match) with a single query that doesn't return more than two rows at most, and doesn't retrieve unused information. Hence about as small as such a query can be.
With this done:
bool isMailIdAlreadyExist = matches.Any(m => !m);
bool isUserNameAlreadyExist = matches.LastOrDefault();
It's possible with a little hack, which is grouping by a constant:
var presenceData = _Context.UserProfile.GroupBy(x => 0)
.Select(g => new
{
IsMailIdAlreadyExist = g.Any(x => x.Email == myModelUserProfile.Email),
IsUserNameAlreadyExist = g.Any(x => x.Username == myModelUserProfile.Username),
}).First();
The grouping gives you access to 1 group containing all UserProfiles that you can access as often as you want in one query.
Not that I would recommend it just like that. The code is not self-explanatory and to me it seems a premature optimization.
You can do it all in one line, using ValueTuple and LINQ's .Aggregate() method:
(IsMailIdAlreadyExist, IsUserNameAlreadyExist) = _context.UserProfile.Aggregate((Email:false, Username:false), (n, o) => (n.Email || (o.Email == myModelUserProfile.Email ? true : false), n.Username || (o.Username == myModelUserProfile.Username ? true : false)));

Entity framework use already selected value saved in new variable later in select sentance

I wrote some entity framework select:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id) * count ...
});
I have always select total:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id) * a.OtherTable.Where(b => b.id == id).Sum(c => c.value)
});
Is it possible to select it like in my first example, because I have already retrieved the value (and how to do that) or should I select it again?
One possible approach is to use two successive selects:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id)
})
.Select(x => new
{
count = x.count,
total = x.total * x.count
};
You would simple do
var listFromDatabase = context.MyTable;
var query1 = listFromDatabase.Select(a => // do something );
var query2 = listFromDatabase.Select(a => // do something );
Although to be fair, Select requires you to return some information, and you aren't, you're somewhere getting count & total and setting their values. If you want to do that, i would advise:
var listFromDatabase = context.MyTable.ToList();
listFromDatabase.ForEach(x =>
{
count = do_some_counting;
total = do_some_totalling;
});
Note, the ToList() function stops it from being IQueryable and transforms it to a solid list, also the List object allows the Linq ForEach.
If you're going to do complex stuff inside the Select I would always do:
context.MyTable.AsEnumerable()
Because that way you're not trying to still Query from the database.
So to recap: for the top part, my point is get all the table contents into variables, use ToList() to get actual results (do a workload). Second if trying to do it from a straight Query use AsEnumerable to allow more complex functions to be used inside the Select

Categories

Resources