Fetch every nth row with LINQ - c#

We have a table in our SQL database with historical raw data I need to create charts from. We access the DB via Entity Framework and LINQ.
For smaller datetime intervals, I can simply read the data and generate the charts:
var mydata = entity.DataLogSet.Where(dt => dt.DateTime > dateLimit);
But we want to implement a feature where you can quickly "zoom out" from the charts to include larger date intervals (last 5 days, last month, last 6 months, last 10 years and so on and so forth.)
We don't want to chart every single data point for this. We want to use a sample of the data, by which I mean something like this --
Last 5 days: chart every data point in the table
Last month: chart every 10th data point in the table
Last 6 months: chart every 100th data point
The number of data points and chart names are only examples. What I need is a way to pick only the "nth" row from the database.

You can use the Select overload that exposes each item's index in the enumeration. Something like this should do the trick --
var data = myDataLogEnumeration
    .Select((dt, i) => new { DataLog = dt, Index = i })
    .Where(x => x.Index % nth == 0)
    .Select(x => x.DataLog);
If you need to limit the query with a Where or sort with OrderBy, you must do it before the first Select, otherwise the indexes will be all wrong --
var data = myDataLogEnumeration
    .Where(dt => dt.DateTime > dateLimit)
    .OrderBy(dt => dt.SomeField)
    .Select((dt, i) => new { DataLog = dt, Index = i })
    .Where(x => x.Index % nth == 0)
    .Select(x => x.DataLog);
Unfortunately, as juharr commented, this overload is not supported in Entity Framework. One way to deal with this is to do something like this --
var data = entity.DataLogSet
    .Where(dt => dt.DateTime > dateLimit)
    .OrderBy(dt => dt.SomeField)
    .ToArray()
    .Select((dt, i) => new { DataLog = dt, Index = i })
    .Where(x => x.Index % nth == 0)
    .Select(x => x.DataLog);
Note the added ToArray(). This isn't ideal, though, as it forces loading all the data that matches the initial query before selecting only every nth row.
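If materializing full entities is too heavy, one mitigation (a sketch, not part of the original answer) is to project down to just the columns the chart needs before leaving the database, then sample in memory; the row-index filter still runs client-side, but far less data crosses the wire. The Value column below is a placeholder for whatever you chart --
// Sketch: project the two chart columns server-side, then sample in memory.
var data = entity.DataLogSet
    .Where(dt => dt.DateTime > dateLimit)
    .OrderBy(dt => dt.SomeField)
    .Select(dt => new { dt.DateTime, dt.Value })  // "Value" is a hypothetical column
    .AsEnumerable()                               // switch to LINQ to Objects; streams rather than buffering
    .Where((dt, i) => i % nth == 0)
    .ToList();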

There is a trick supported by EF that might work for this.
if (step != 0)
    query = query.Where(_ => Convert.ToInt32(_.Time.ToString().Substring(14, 2)) % step == 0);
This code converts the date to a string, cuts out the minutes, converts the minutes to an int, and then keeps every xth minute; for example, if the variable step is 5, it keeps every 5th minute.
For PostgreSQL this translates to:
WHERE ((substring(c.time::text, 15, 2)::INT % @__step_1) = 0)
This works best with fixed measurement points, such as one sample per minute.
However, you can use the same method to group things up, by truncating to the hour, the minute, or the first part of the minute (10-minute buckets), and then use aggregate functions such as max(), average(), and sum(), which might be even more desirable.
For example, this groups by hour and takes the max of most columns but the average of the CPU load:
using var ef = new DbCdr.Context();
IQueryable<DbCdr.CallStatsLog> query;
query = from calls in ef.Set<DbCdr.CallStatsLog>()
        group calls by calls.Time.ToString().Substring(0, 13)
        into g
        orderby g.Max(_ => _.Time) descending
        select new DbCdr.CallStatsLog()
        {
            Time = g.Min(_ => _.Time),
            ConcurrentCalls = g.Max(_ => _.ConcurrentCalls),
            CpuUsage = (short)g.Average(_ => _.CpuUsage),
            ServerId = 0
        };
var res = query.ToList();
var res = query.ToList();
translates to:
SELECT MIN(c.time) AS "Time",
MAX(c.concurrent_calls) AS "ConcurrentCalls",
AVG(c.cpu_usage::INT::double precision)::smallint AS "CpuUsage",
0 AS "ServerId"
FROM call_stats_log AS c
GROUP BY substring(c.time::text, 1, 13)
ORDER BY MAX(c.time) DESC
Note: the examples work with Postgres and the ISO datestyle.
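A hedged alternative to the string slicing: most EF Core providers (Npgsql included) translate DateTime component access, so the hour bucket can be expressed on the Time column directly, avoiding the text cast. A sketch under that assumption; verify the SQL it actually emits --
// Sketch: bucket by hour using DateTime parts instead of substring(time::text).
var query =
    from calls in ef.Set<DbCdr.CallStatsLog>()
    group calls by new { calls.Time.Year, calls.Time.Month, calls.Time.Day, calls.Time.Hour }
    into g
    orderby g.Max(c => c.Time) descending
    select new DbCdr.CallStatsLog()
    {
        Time = g.Min(c => c.Time),
        ConcurrentCalls = g.Max(c => c.ConcurrentCalls),
        CpuUsage = (short)g.Average(c => c.CpuUsage),
        ServerId = 0
    };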

Related

Adjust .NET Linq results list so it shows ASC future then DESC past list which can be skipped for a single page

I've read the many related questions but can't find this exactly.
I'm trying to adjust the ordering on an old .NET (4.0) web page so it shows upcoming events ascending (closest in the future first), followed by past events descending, in a single list that is skipped and taken for a page of results.
So like this:
Event 1 - tomorrow
Event 2 - in a week
Event 3 - in a month
Event 4 - yesterday
Event 5 - a week ago
Event 6 - a month ago
The current function grabs the list and does a sort, skip and take (a single page):
// Currently this creates a single list order by date desc
var numToLoad = 20;
var list = _context.AllItems.Where(i => i.typeTitle == source);
var items = list.OrderByDescending(i => i.dateCreated).Skip((pageNum - 1) * numToLoad).Take(numToLoad);
I have tried making two lists, ordering each appropriately, and then concatenating them, but then I can't do a skip, as that requires a sorted list.
// We need a single list but with upcoming dates first, ascending, then past dates descending
var dateNow = DateTime.Now;
var listFuture = _context.AllItems.Where(i => i.typeTitle == source && i.dateCreated >= dateNow).OrderBy(i => i.dateCreated);
var listPast = _context.AllItems.Where(i => i.typeTitle == source && i.dateCreated < dateNow).OrderByDescending(i => i.dateCreated);
var listAll = listFuture.Concat(listPast);
var itemsAll = listAll.Skip((pageNum - 1) * numToLoad).Take(numToLoad); // <-- this gives an error as it's not sorted
So I don't have to rewrite all the code that handles the returned list (pagination etc) I'd really like to be able to return a single list from the function!
I did see that it might be possible to do conditional sorting, then do the skip and take in a single linq but I just can't get anything like that to work.
Any ideas?
The problem here is that .Concat() of two queries loses the ordering when translated to SQL (you cannot UNION two queries that each carry their own ORDER BY).
If you are using MSSQL, you can use ordering like this:
var dateNow = DateTime.Now;
var query = _context.AllItems
    .Where(i => i.typeTitle == source)
    // future items first
    .OrderBy(i => i.dateCreated >= dateNow ? 0 : 1)
    // items which are closer to now first
    .ThenBy(i => Math.Abs(System.Data.Entity.SqlServer.SqlFunctions.DateDiff("day", dateNow, i.dateCreated).Value));
var list = query.Skip((page - 1) * pageSize).Take(pageSize).ToList();
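If the project is on EF Core rather than EF6, the same shape can be expressed with EF.Functions.DateDiffDay from the SQL Server provider. A minimal sketch, assuming EF Core with Microsoft.EntityFrameworkCore.SqlServer (not from the original answer) --
// Sketch: EF Core equivalent of the EF6 SqlFunctions.DateDiff ordering.
var dateNow = DateTime.Now;
var query = _context.AllItems
    .Where(i => i.typeTitle == source)
    .OrderBy(i => i.dateCreated >= dateNow ? 0 : 1)                            // future block first
    .ThenBy(i => Math.Abs(EF.Functions.DateDiffDay(dateNow, i.dateCreated)));  // nearest date first
var list = query.Skip((page - 1) * pageSize).Take(pageSize).ToList();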

Linq: Performant database query only querying every nth element

I'm working on a personal project where I'm in need of some help with a performant Linq query to a database. The DB in question could have millions of log entries, and through an API (Asp) I want to have an option to only return a representative subset of those logs to a graphical interface.
Here's the method in question:
public IEnumerable<Log> GetByParameter(int ParameterID, DateTime timeStart, DateTime timeEnd)
{
    return _context.Logs.Where
        (a => a.ParameterID == ParameterID &&
        (DateTime.Compare(a.LogDate, timeStart) > 0 && DateTime.Compare(a.LogDate, timeEnd) < 0)).ToList();
}
Note that the method takes in two DateTimes as parameters, which yield a range of time where the logs should be queried.
I would want to augment this method like so:
public IEnumerable<Log> GetByParameter(int ParameterID,DateTime timeStart, DateTime timeEnd, int limit)
For example, the DB might contain 2 million entries given the parameters passed, and the "limit" of the consumer of the API might be 40 000 entries. Thus:
numberOfEntries / limit = n
2*10^6 / 4*10^4 = 50
In this example I would want to return every 50th element to the consumer of the API, with an evenly spaced time interval between the elements.
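Deriving n at runtime amounts to a count query plus a division; a minimal sketch against the method above (the Math.Max guard is an assumption to avoid a zero stride) --
// Sketch: derive the sampling stride from the number of matching rows.
var total = _context.Logs.Count(a => a.ParameterID == ParameterID &&
    DateTime.Compare(a.LogDate, timeStart) > 0 &&
    DateTime.Compare(a.LogDate, timeEnd) < 0);
var n = Math.Max(1, total / limit);  // every nth row keeps roughly `limit` rows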
An easy way would just be to query the entire table given the parameters and then filter afterwards, but that seems messy, a bit antithetical to the approach, and possibly very inefficient as well.
So here is my question: Is there any way to write a query such that it only queries the DB for every Nth row?
Thanks in advance!
You can implement it using SQL Server window functions like row_number:
WITH x AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY LogDate) AS rn, *
    FROM MyTable
    WHERE
        ParameterID = @ParameterID AND
        LogDate > @StartDate AND
        LogDate < @EndDate
)
SELECT * FROM x WHERE rn % 50 = 0
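To run this windowed query through EF, a hedged sketch using EF Core's FromSqlInterpolated (assuming EF Core, a Logs DbSet mapped to MyTable, and that the extra rn column is ignored during materialization; interpolated values are sent as parameters) --
// Sketch: execute the ROW_NUMBER query as raw SQL; don't compose further
// LINQ operators on top of it, since the CTE can't be wrapped in a subquery.
var nth = 50;
var data = _context.Logs
    .FromSqlInterpolated($@"
        WITH x AS
        (
            SELECT ROW_NUMBER() OVER (ORDER BY LogDate) AS rn, *
            FROM MyTable
            WHERE ParameterID = {ParameterID}
              AND LogDate > {timeStart}
              AND LogDate < {timeEnd}
        )
        SELECT * FROM x WHERE rn % {nth} = 0")
    .ToList();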
In LINQ you can try to use the following clause:
var data = _context.Logs
.Select((x, i) => new { Data = x, Number = i })
.Where(x => x.Number % 50 == 0)
.Select(x => x.Data);
But it's necessary to check the actual execution plan; I suspect it will not be optimal.
Don't forget to create an index on LogDate.
Honestly I'm not sure that SQL Server is a good choice to store logs, I would like to use something like Elastic.
An approach you could take is using modulus on some kind of index. If you already have an auto-generated Id, that could be used, but it's not ideal as you can't rely on it being contiguous.
You could use RANK() to create an index column within a view, but unfortunately you can't use RANK() directly from EF code.
Something like the following:
var interval = 5;
return _context.Logs
    .Where(a =>
        a.ParameterID == ParameterID &&
        (
            DateTime.Compare(a.LogDate, timeStart) > 0 &&
            DateTime.Compare(a.LogDate, timeEnd) < 0) &&
        a.Id % interval == 0).ToList(); // filter on modulus of an index
In this instance, however, I personally would write the query in SQL.

Select if criteria is met

Let's say I have a datatable with three columns: timestamp, curveID, and price. I would like to give a time and then select, for each day, the timestamp, curveID, and price, but only if all curveIDs are present.
The problem is that not all the data is present at every time, so at 10:00:00 there might be only data for curveID 1 but nothing for curveID 2, and so forth.
I thought I could do the following to select the first dataset where all curveIDs are there and the time is greater than or equal to my criteria:
dataSet.ReadXml(@"C:\temp\Prices.xml", XmlReadMode.InferTypedSchema);
ds = dataSet.Tables[0];
var dt = ds.Clone();
int criteria = 10;
var list = ds.AsEnumerable().Where(x => x.Field<DateTime>("Timestamp").Hour >= criteria)
    .GroupBy(x => new { Date = x.Field<DateTime>("Timestamp").Date, Curve = x.Field<object>("CurveID") })
    .First().ToList();
However, this returns multiple records on the same day (at different times) for the same curve ID.
I would like to return only a single record for each curveID on each day at a time close to the criteria time where all curveIDs are present.
For clarity: let's say I'm looking for curveIDs 1 and 2. If at 10:00:00 on day 1 only curveID 1 is present and curveID 2 is missing, I need to check whether at 10:01:00 both are there; if yes, I take the two records from that time for that day. This check has to be done for every day in the database.
// criteria is your integer Hour representation
var criteria = 10;
// array of curveIds to look for
var curveIds = new int[] { 1, 2 };
var result = ds.AsEnumerable()
    // grouping by date first
    .GroupBy(x => x.Field<DateTime>("Timestamp").Date,
        (date, items) => new { date, items = items
            // items with the same timestamp go into one group
            .GroupBy(i => i.Field<DateTime>("Timestamp"), (datetime, timestampItems) => new { datetime, timestampItems })
            // filter by criteria
            .Where(dti => dti.datetime.Hour >= criteria)
            // filter by curveIds
            .Where(dti => curveIds.All(cid => dti.timestampItems.Any(tsi => tsi.Field<int>("curveID") == cid)))
            // earliest qualifying timestamp wins
            .OrderBy(dti => dti.datetime)
            .FirstOrDefault() });
In the end you will receive a "per day" result fitting all the mentioned requirements: it occurs at or after the criteria hour, has all curveIds present, and is the earliest such timestamp.
You may want to group by Date first and then by hour using something like
group x by new {
    firstThing = x.Field<DateTime>("TimeStamp").Date,
    secondThing = x.Field<DateTime>("TimeStamp").Hour,
}
My syntax is probably off by a little, but that should get you moving in the right direction.

How can I group a date column to a less precise format while selecting in Entity Framework or LINQ?

I am collecting data every ten seconds (six records a minute).
Using Entity Framework or LINQ, I want to obtain the average of the records within every minute.
Simply put, the Date column is in (%Y.%m.%d %H:%i:%s) format, and I want to group by (%Y.%m.%d %H:%i) format in MySQL using Entity Framework or LINQ.
Assuming you mean that you have a datetime column, you can group by the DateTime properties Year, Month, Day, Hour, and Minute.
var results = from row in db.SomeTable
              group row by new
              {
                  row.Date.Year,
                  row.Date.Month,
                  row.Date.Day,
                  row.Date.Hour,
                  row.Date.Minute
              } into grp
              select new
              {
                  YMDHM = grp.Key,
                  SomeAverage = grp.Average(x => x.SomeValueToAverage)
              };
Then when you iterate the results you can turn the values in YMDHM back into a DateTime. Here's an example where you could turn the results into a Dictionary.
var dictionaryOfAverages = results.ToDictionary(
    x => new DateTime(x.YMDHM.Year,
                      x.YMDHM.Month,
                      x.YMDHM.Day,
                      x.YMDHM.Hour,
                      x.YMDHM.Minute,
                      0),
    x => x.SomeAverage);
On the other hand if you actually mean you are storing the date time in the DB as a formatted string then the following would be what you want
var results = from row in db.SomeTable
              group row by row.Date.Substring(0, 16) into grp
              select new
              {
                  YMDHM = grp.Key,
                  SomeAverage = grp.Average(x => x.SomeValueToAverage)
              };
This is assuming that your string-formatted dates look like "2013.05.08 07:25:33". If the format isn't fixed-width with leading zeros for month, day, and/or hour, you'd have to do something like row.Date.Substring(0, row.Date.Length - 3) instead.

How can I subsample data from a time series with LINQ to SQL?

I have a database table full of time points and experimental values at those time points. I need to retrieve the values for an experiment and create a thumbnail image showing an XY plot of its data. Because the actual data set for each experiment is potentially 100,000 data points and my image is only 100 pixels wide, I want to sample the data by taking every nth time point for my image and ignoring the rest.
My current query (which retrieves all the data without sampling) is something simple like this:
var points = from p in db.DataPoints
             where p.ExperimentId == myExperimentId
             orderby p.Time
             select new {
                 X = p.Time,
                 Y = p.Value
             };
So, how can I best take every nth point from my result set in a LINQ to SQL query?
This will do every nth element:
int nth = 100;
var points = db.DataPoints
.Where(p => p.ExperimentId == myExperimentId)
.OrderBy(p => p.Time)
.Where( (p, index) => index % nth == 0 )
.Select( p => new { X = p.Time, Y = p.Value } );
It works by using the Queryable.Where overload that provides the index of each element in the sequence, so you can filter based on the index.
.Skip(n).Take(1)
This will return one sample point. Call it repeatedly to get more points.
http://msdn.microsoft.com/en-us/library/bb386988.aspx
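Taken literally, "call it repeatedly" looks like the loop below; a sketch assuming an ordered query plus precomputed total and n (hypothetical names). Note it issues one round trip per sample point, so the indexed and SQL approaches above scale much better --
// Sketch: one query per sample point via Skip/Take; only viable for few points.
var samples = new List<DataPoint>();                 // DataPoint stands in for the entity type
for (var i = 0; i < total; i += n)
{
    var point = orderedQuery.Skip(i).Take(1).FirstOrDefault();
    if (point == null) break;                        // ran past the end of the result set
    samples.Add(point);
}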
If performance becomes an issue and you have a primary identity key of type int containing consecutive values, you can try returning all records whose primary key is evenly divisible by your n.
.Where(x => x.PK % n == 0)
You could use
.Skip(n).Take(100)
It skips however many records you want it to skip and takes the next 100 records.
HTH
