Linq: Performant database query only querying every nth element - c#

I'm working on a personal project where I need some help with a performant LINQ query against a database. The DB in question could have millions of log entries, and through an API (ASP.NET) I want to offer an option to return only a representative subset of those logs to a graphical interface.
Here's the method in question:
public IEnumerable<Log> GetByParameter(int ParameterID, DateTime timeStart, DateTime timeEnd)
{
    return _context.Logs
        .Where(a => a.ParameterID == ParameterID &&
                    DateTime.Compare(a.LogDate, timeStart) > 0 &&
                    DateTime.Compare(a.LogDate, timeEnd) < 0)
        .ToList();
}
Note that the method takes in two DateTimes as parameters, which yield a range of time where the logs should be queried.
I would want to augment this method like so:
public IEnumerable<Log> GetByParameter(int ParameterID,DateTime timeStart, DateTime timeEnd, int limit)
For example, the DB might contain 2 million entries given the parameters passed, and the "limit" of the consumer of the API might be 40 000 entries. Thus:
numberOfEntries/limit = n
2×10⁶ / 4×10⁴ = 50
In this example I would want to return every 50th element to the consumer of the API, with an evenly spaced time interval between the elements.
An easy way would just be to query the entire table given the parameters and then filter afterwards, but that seems messy, a bit antithetical to this approach, and possibly very inefficient as well.
So here is my question: Is there any way to write a query such that it only queries the DB for every Nth row?
Thanks in advance!

You can implement it using SQL Server window functions like row_number:
WITH x AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY LogDate) AS rn, *
    FROM MyTable
    WHERE ParameterID = @ParameterID AND
          LogDate > @StartDate AND
          LogDate < @EndDate
)
SELECT * FROM x WHERE rn % 50 = 0
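If you're on EF Core, one hedged way to run that SQL from the repository is FromSqlInterpolated; a minimal sketch, assuming the Log entity maps to a Logs table, that no further LINQ operators are composed on top of the raw SQL (a CTE can't be wrapped in a subquery), and that EF ignores the extra rn column:

public List<Log> GetEveryNth(int parameterId, DateTime timeStart, DateTime timeEnd, int n)
{
    // Interpolated values become SQL parameters, not string concatenation.
    return _context.Logs
        .FromSqlInterpolated($@"
            WITH x AS
            (
                SELECT ROW_NUMBER() OVER (ORDER BY LogDate) AS rn, *
                FROM Logs
                WHERE ParameterID = {parameterId}
                  AND LogDate > {timeStart}
                  AND LogDate < {timeEnd}
            )
            SELECT * FROM x WHERE rn % {n} = 0")
        .ToList();
}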
In LINQ you can try to use the following clause:
var data = _context.Logs
.Select((x, i) => new { Data = x, Number = i })
.Where(x => x.Number % 50 == 0)
.Select(x => x.Data);
But it's necessary to check the actual execution plan; I suspect it will not be optimal. (In fact, as noted in the last question below, Entity Framework cannot translate this indexed Select overload to SQL at all, so it only works in memory.)
Don't forget to create an index on LogDate.
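A hedged sketch of that index via the EF Core fluent API (the composite key covering both filter columns is my assumption; a plain index on LogDate also helps):

// In your DbContext; covers the ParameterID + LogDate filter above.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Log>()
        .HasIndex(l => new { l.ParameterID, l.LogDate });
}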
Honestly, I'm not sure that SQL Server is a good choice for storing logs; I would rather use something like Elastic.

An approach you could take is using a modulus on some kind of index. If you already have an auto-generated Id, that could be used, but it's not ideal, as you can't rely on it being contiguous.
You could use RANK() to create an index column within a view, but unfortunately you can't use RANK() directly from EF code.
Something like the following:
var interval = 5;
return _context.Logs
    .Where(a =>
        a.ParameterID == ParameterID &&
        DateTime.Compare(a.LogDate, timeStart) > 0 &&
        DateTime.Compare(a.LogDate, timeEnd) < 0 &&
        a.Id % interval == 0) // filter on modulus of an index
    .ToList();
In this instance however I personally would write the query in SQL.
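If you did want the window-function index without raw SQL at every call site, one hedged option is a view exposed to EF as its own entity; the view name, columns, and mapping below are all assumptions:

// Sketch: entity mapped to a view such as
//   CREATE VIEW LogsWithIndex AS
//   SELECT *, ROW_NUMBER() OVER (ORDER BY LogDate) AS RowIndex FROM Logs
// (add PARTITION BY ParameterID if the spacing should be per parameter).
public class IndexedLog
{
    public int Id { get; set; }
    public int ParameterID { get; set; }
    public DateTime LogDate { get; set; }
    public long RowIndex { get; set; }
}

// e.g. EF Core: modelBuilder.Entity<IndexedLog>().ToView("LogsWithIndex");
var everyNth = _context.Set<IndexedLog>()
    .Where(l => l.ParameterID == ParameterID &&
                l.LogDate > timeStart &&
                l.LogDate < timeEnd &&
                l.RowIndex % interval == 0)
    .ToList();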

Related

C# LINQ to Entities - Retrieve all records and position for waiting list records

I have a waitlist table (id, locationid, timeslotid, sessiondate, memberid, dateAdded) which contains a list of people on various waitlists for various appointments.
I am trying to retrieve a list of all waitlist records for a specific user, but I also need to get the position of each record (ordered by dateAdded) so that I can see if they are in position 1, 2, 3, etc. in the queue.
The following code is what I have at the moment for getting all user records, but I am struggling with how to join the count to this query.
db.WaitingLists.Where(x => x.MemberId == member.Id && x.LocationId == locationId && x.SessionDate >= currentLocationDate.Date).ToList();
Some suggestions would be welcomed on how to do this.
Thanks
============= UPDATE ==============
This is the SQL that provides the response I need. I am trying to prevent using a stored Proc and try and use linq to entities where possible.
SELECT
    (
        SELECT COUNT(*) AS RowNr
        FROM (
            SELECT
                ROW_NUMBER() OVER (ORDER BY CreatedDate) AS RowNr,
                MemberId
            FROM WaitingList
            WHERE LocationId = wl.LocationId
              AND TimeSlotId = wl.TimeSlotId
              AND [SessionDate] = wl.SessionDate
        ) sub
    ) AS Position,
    *
FROM WaitingList AS wl
WHERE MemberId = '00000000-0000-0000-0000-000000000000'
I haven't tested this, but it should be pretty close. Filter first by the location and date, then sort it, then use the overload of Select that gives you an index, then filter by the member ID.
db.WaitingLists
    .Where(x => x.LocationId == locationId && x.SessionDate >= currentLocationDate.Date)
    .OrderBy(x => x.DateAdded)
    .Select((x, i) => new { Position = i, Member = x }) // Position is zero-based
    .Where(x => x.Member.MemberId == member.Id)
    .ToList();
This will give you a list of anonymous objects with two properties:
Position, which is the position in the waiting list, and
Member, which is the member details
I can't say what the SQL will actually look like, and if it will be efficient.
This might work for you. I'm assuming that the queue position is across all locations and session dates. If it isn't, insert another Where clause in between db.WaitingLists and the OrderBy.
var temp = db.WaitingLists
    .OrderBy(x => x.DateAdded)
    .Select((r, i) => new { Request = r, QueuePosition = i + 1 })
    .Where(x => x.Request.MemberId == member.Id &&
                x.Request.LocationId == locationId &&
                x.Request.SessionDate >= currentLocationDate.Date);
You now have a list of anonymous-type objects that have the Request (my name for your type; rename it if you want to) and a second property called QueuePosition, based on the position of the requests in the queue, as represented by the WaitingList sorted by dateAdded.
If you need to extract just the Request object, you can get it this way:
var selectedRequests = temp.Select(x => x.Request).ToList();
If you want to get any entries that were in the first 3 positions, you'd do this:
var selectedRequests = temp.Where(x => x.QueuePosition <= 3).Select(x => x.Request).ToList();
Disclaimer: done from memory, not actually tested.
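Since LINQ to Entities cannot translate the indexed Select overload used in both answers above, a correlated Count is one pattern that does translate to SQL; a hedged, untested sketch, with the grouping columns taken from the SQL in the update:

// Position = 1 + the number of earlier entries in the same queue
// (same location, time slot, and session date).
var results = db.WaitingLists
    .Where(x => x.MemberId == member.Id &&
                x.LocationId == locationId &&
                x.SessionDate >= currentLocationDate.Date)
    .Select(x => new
    {
        Request = x,
        Position = db.WaitingLists.Count(w =>
            w.LocationId == x.LocationId &&
            w.TimeSlotId == x.TimeSlotId &&
            w.SessionDate == x.SessionDate &&
            w.DateAdded <= x.DateAdded)
    })
    .ToList();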

Adjust .NET Linq results list so it shows ASC future then DESC past list which can be skipped for a single page

I've read the many related questions but can't find this exactly.
I'm trying to adjust the ordering on an old .NET (4.0) web page so it shows upcoming events first, ascending (closest future date first), followed by past events descending, all in a single list that is skipped and taken for a 'page' of results.
So like this:
Event 1 - tomorrow
Event 2 - in a week
Event 3 - in a month
Event 4 - yesterday
Event 5 - a week ago
Event 6 - a month ago
The current function grabs the list and does a sort, skip and take (a single page):
// Currently this creates a single list ordered by date desc
var numToLoad = 20;
var list = _context.AllItems.Where(i => i.typeTitle == source);
var items = list.OrderByDescending(i => i.dateCreated).Skip((pageNum - 1) * numToLoad).Take(numToLoad);
I have tried making two lists, ordering each appropriately, and then concatenating them, but then I can't do a skip, as that requires a sorted list.
// We need a single list but with upcoming dates first, ascending, then past dates descending
var dateNow = DateTime.Now;
var listFuture = _context.AllItems.Where(i => i.typeTitle == source && i.dateCreated >= dateNow).OrderBy(i => i.dateCreated);
var listPast = _context.AllItems.Where(i => i.typeTitle == source && i.dateCreated < dateNow).OrderByDescending(i => i.dateCreated);
var listAll = listFuture.Concat(listPast);
var itemsAll = listAll.Skip((pageNum - 1) * numToLoad).Take(numToLoad); // <-- this gives an error as it's not sorted
So I don't have to rewrite all the code that handles the returned list (pagination etc) I'd really like to be able to return a single list from the function!
I did see that it might be possible to do conditional sorting, then do the skip and take in a single linq but I just can't get anything like that to work.
Any ideas?
The problem here is that .Concat() of two queries loses the ordering when translated to SQL (you cannot UNION two queries that each have their own ORDER BY).
If you are using MSSQL, you can use conditional ordering like this:
var dateNow = DateTime.Now;
var query = _context.AllItems
    .Where(i => i.typeTitle == source)
    // future items first
    .OrderBy(i => i.dateCreated >= dateNow ? 0 : 1)
    // items closer to now first
    .ThenBy(i => Math.Abs(System.Data.Entity.SqlServer.SqlFunctions.DateDiff("day", dateNow, i.dateCreated).Value));
var list = query.Skip((pageNum - 1) * numToLoad).Take(numToLoad).ToList();

Fetch every nth row with LINQ

We have a table in our SQL database with historical raw data I need to create charts from. We access the DB via Entity Framework and LINQ.
For smaller datetime intervals, I can simply read the data and generate the charts:
var mydata = entity.DataLogSet.Where(dt => dt.DateTime > dateLimit);
But we want to implement a feature where you can quickly "zoom out" from the charts to include larger date intervals (last 5 days, last month, last 6 months, last 10 years and so on and so forth.)
We don't want to chart every single data point for this. We want to use a sample of the data, by which I mean something like this --
Last 5 days: chart every data point in the table
Last month: chart every 10th data point in the table
Last 6 months: chart every 100th data point
The number of data points and chart names are only examples. What I need is a way to pick only the "nth" row from the database.
You can use the Select overload that includes the item index of the enumeration. Something like this should do the trick --
var data = myDataLogEnumeration.
Select((dt,i) => new { DataLog = dt, Index = i }).
Where(x => x.Index % nth == 0).
Select(x => x.DataLog);
If you need to limit the query with a Where or sort with OrderBy, you must do it before the first Select, otherwise the indexes will be all wrong --
var data = myDataLogEnumeration.
Where(dt => dt.DateTime > dateLimit).
OrderBy(dt => dt.SomeField).
Select((dt,i) => new { DataLog = dt, Index = i }).
Where(x => x.Index % nth == 0).
Select(x => x.DataLog);
Unfortunately, as juharr commented, this overload is not supported in Entity Framework. One way to deal with this is to do something like this --
var data = entity.DataLogSet.
Where(dt => dt.DateTime > dateLimit).
OrderBy(dt => dt.SomeField).
ToArray().
Select((dt,i) => new { DataLog = dt, Index = i }).
Where(x => x.Index % nth == 0).
Select(x => x.DataLog);
Note the addition of a ToArray(). This isn't ideal though as it will force loading all the data that matches the initial query before selecting only every nth row.
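A hedged middle ground is AsEnumerable(), which switches to LINQ to Objects at that point in the chain: every matching row still travels over the wire, but the rows are streamed rather than buffered into an array first:

var data = entity.DataLogSet.
    Where(dt => dt.DateTime > dateLimit).
    OrderBy(dt => dt.SomeField).
    AsEnumerable().                 // SQL above this line, in-memory below
    Where((dt, i) => i % nth == 0). // the indexed Where overload is fine in memory
    ToList();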
There is a trick supported by EF that might work for this.
if (step != 0)
query = query.Where(_ => Convert.ToInt32(_.Time.ToString().Substring(14, 2)) % step == 0);
This code converts the date into a string, cuts out the minutes, converts the minutes into an int, and then keeps every xth minute; for example, if the variable step is 5, that's every 5 minutes.
For Postgresql this converts to:
WHERE ((substring(c.time::text, 15, 2)::INT % @__step_1) = 0)
This works best with fixed measurement points, such as one per minute.
However, you can use the same method to group things up, by cutting the string at the hour, the minute, or the first digit of the minute (grouping by 10 minutes), and then applying aggregate functions such as MAX(), AVG(), or SUM(), which may be even more desirable.
For example, this groups up by hour and takes the max of most columns but the average of the CPU load:
using var ef = new DbCdr.Context();
IQueryable<DbCdr.CallStatsLog> query;
query = from calls in ef.Set<DbCdr.CallStatsLog>()
        group calls by calls.Time.ToString().Substring(0, 13)
        into g
        orderby g.Max(_ => _.Time) descending
        select new DbCdr.CallStatsLog()
        {
            Time = g.Min(_ => _.Time),
            ConcurrentCalls = g.Max(_ => _.ConcurrentCalls),
            CpuUsage = (short)g.Average(_ => _.CpuUsage),
            ServerId = 0
        };
var res = query.ToList();
translates to:
SELECT MAX(c.time) AS "Time",
MAX(c.concurrent_calls) AS "ConcurrentCalls",
AVG(c.cpu_usage::INT::double precision)::smallint AS "CpuUsage",
0 AS "ServerId"
FROM call_stats_log AS c
GROUP BY substring(c.time::text, 1, 13)
ORDER BY MAX(c.time) DESC
note: the examples work with postgres and iso datestyle.

Smartest and Most Performant Way to find the First Element Matching a Condition in a TableStorage

Assume I have a huge table storage where customers are stored in. Let's say Partition Key is their Zip-Code, RowKey is their signup-timestamp.
Now, what is the smartest and most efficient way to find the ONE FIRST customer for a given (zip code) area who signed up after a given date (the early bird :-) )? Assume that the entries were not ordered when written to the table storage.
My initial idea was to have a helper method like this (which I need anyway for other purposes):
public IEnumerable<Customer> GetCustomers(string zip, long stampStart, long stampEnd)
{
    if (_table == null) return new List<Customer>();
    return from entry in _table.CreateQuery<Customer>()
           where entry.PartitionKey == zip
              && entry.RowKey.CompareTo(stampStart.ToString()) >= 0
              && entry.RowKey.CompareTo(stampEnd.ToString()) <= 0
           select entry;
}
and then use it to fire a request like this:
public Customer GetEarlyBird(string zip, long stamp)
{
    if (_table == null) return null;
    return GetCustomers(zip, stamp, stamp + 31536000) // covers a one-year period
        .OrderBy(x => x.SignupStamp)
        .FirstOrDefault();
}
And finally call
var zip = //some zip code;
var lookupStamp = //some long timestamp;
var earlyBird = GetEarlyBird(zip, lookupStamp);
However, due to the OrderBy-call, the entire query result must be evaluated which takes forever. On the other hand, without ordering the result of the query, FirstOrDefault does not necessarily return the Customer who signed up closest after stamp, but instead the first in the list (which could be any customer from that area, as they were not necessarily ordered when stored in the table).
What am I missing? What is the smartest way to "outsource" the ordering to the Database instead of doing it in memory? Or has my approach some other major flaws I'm missing?
If you convert the signup timestamp to DateTime.Ticks, subtract that from DateTime.MaxValue.Ticks, and use the result as the row key, the Azure Table Storage service will naturally sort the latest entry to the top, because it will have the smallest row key. So if you query with a specific partition key and Take(1), you will retrieve the latest entry for that partition key. This way there is no partition scan and no filtering, neither in the client nor in the service.
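A minimal sketch of that key scheme (zero-padding to 19 digits keeps the lexicographic row-key order aligned with the numeric order):

// Reverse-tick row key: the newest signup sorts first in the partition.
static string ToReverseTickRowKey(DateTime signupUtc) =>
    (DateTime.MaxValue.Ticks - signupUtc.Ticks).ToString("D19");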
As Dogu Arslan says, you could use DateTime.Ticks for your row key and use the Take method to get the first value.
For more details, you could refer to the code below:
DateTime d1 = new DateTime(2016, 11, 1);
DateTime d2 = new DateTime(2016, 12, 1);
var query = (from ent in query2 // query2: a table query over the Customer entities
             where ent.PartitionKey == "ZIP"
                && ent.RowKey.CompareTo(string.Format("{0:D19}", d1.Ticks)) > 0
                && ent.RowKey.CompareTo(string.Format("{0:D19}", d2.Ticks)) < 0
             select ent).Take(1).FirstOrDefault();
I suggest you also pay attention to the following:
1. If you want to get the customer who signed up first after a given date, you can use the plain DateTime.Ticks value directly, since Azure Table Storage automatically orders entities by partition key and row key ascending, and an earlier time's ticks are smaller than the current time's ticks.
2. You must pad the (reverse) tick value with leading zeroes to ensure the string value sorts as expected.

Multiple LINQ to SQL queries. Enumeration optimisation

I have a list of ~15,000 'teams' that each need an individual LINQ query to return results.
Namely - [Select the last 10 games for 'team']
public IEnumerable<ResultsByDate> SelectLast10Games(DateTime date, string team)
{
    return (from e in db.E0s
            where e.DateFormatted < date &&
                  (e.HomeTeam == team || e.AwayTeam == team)
            orderby e.Date descending
            select new ResultsByDate
            {
                Date = e.Date,
                HomeTeam = e.HomeTeam,
                AwayTeam = e.AwayTeam,
                HomeGoals = e.FTHG,
                AwayGoals = e.FTAG
            }).Take(10);
}
This query is probably fine, it seems fast enough when called 15,000 times.
My real issue is that I have to enumerate each query and this really kills the performance.
For each of these queries I need to run a method on the 10 results and hence the queries need enumerating.
The question is how can I avoid 15,000 enumerations?
I thought about placing each of the results into a big list and then calling .ToList() or whatever's best, but adding to a List enumerates as it goes along so this doesn't seem viable.
Is there a way to combine all 15,000 LINQ queries into one giant LINQ query such as..
public IEnumerable<ResultsByDate> SelectLast10Games(DateTime date, List<string> Teams)
{
    foreach (var team in Teams)
    {
        var query = (from e in db.E0s
                     where e.DateFormatted < date &&
                           (e.HomeTeam == team || e.AwayTeam == team)
                     orderby e.Date descending
                     select new ResultsByDate
                     {
                         Date = e.Date,
                         HomeTeam = e.HomeTeam,
                         AwayTeam = e.AwayTeam,
                         HomeGoals = e.FTHG,
                         AwayGoals = e.FTAG
                     }).Take(10);
        // this is where I'm stuck: the per-team queries are never combined or returned
    }
}
So this would return one huge result set that I can then enumerate in one go and work from there?
I have tried but I can't seem to get the LINQ loop correct ( if it's even possible - and the best way to fix my issue).
The whole program takes ~29 minutes to complete. Without the enumeration it's around 30 seconds, which is not amazing but satisfactory given the criteria.
Thanks!
This can be accomplished by using Teams.Select(team => ...):
// Note: Teams is an in-memory list, so enumerating this still issues
// one deferred query per team.
var query = Teams
    .Select(team =>
        db.E0s
            .Where(e => e.DateFormatted < date && (e.HomeTeam == team || e.AwayTeam == team))
            .OrderByDescending(e => e.Date)
            .Select(e => new ResultsByDate
            {
                Date = e.Date,
                HomeTeam = e.HomeTeam,
                AwayTeam = e.AwayTeam,
                HomeGoals = e.FTHG,
                AwayGoals = e.FTAG
            })
            .Take(10));
If you're looking for the best performance for heavy querying, you should consider writing a SQL stored procedure and calling it using ADO.NET, Dapper, or Entity Framework (the choices are ordered from the most optimal to the most trivial). My recommendation is Dapper. This will speed up your query, especially if the table is indexed correctly.
To feed 15k parameters efficiently into the server, you can use a TVP (table-valued parameter):
http://blog.mikecouturier.com/2010/01/sql-2008-tvp-table-valued-parameters.html
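For comparison, a hedged single-round-trip sketch without a stored procedure: pull the candidate rows for all teams at once, then take each team's latest 10 in memory. Note that Contains over 15k teams produces a very large IN clause, which is exactly what the TVP above addresses, so treat this as an illustration of the batching idea rather than the scalable solution.

var candidates = db.E0s
    .Where(e => e.DateFormatted < date &&
                (Teams.Contains(e.HomeTeam) || Teams.Contains(e.AwayTeam)))
    .Select(e => new ResultsByDate
    {
        Date = e.Date,
        HomeTeam = e.HomeTeam,
        AwayTeam = e.AwayTeam,
        HomeGoals = e.FTHG,
        AwayGoals = e.FTAG
    })
    .ToList(); // one round trip; everything below runs in memory

var last10ByTeam = Teams.ToDictionary(
    team => team,
    team => candidates
        .Where(r => r.HomeTeam == team || r.AwayTeam == team)
        .OrderByDescending(r => r.Date)
        .Take(10)
        .ToList());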
My real issue is that I have to enumerate each query and this really
kills the performance.
Unless you enumerate the result, there is no call to the server, so it is no wonder it is fast without enumeration. But that does not mean that the enumeration itself is the problem.
