I'm trying to find a good way to continually iterate through a DbSet across multiple function calls, looping back to the start once I reach the end.
Essentially I've got a bunch of Ads in the database, and I want to call getNext(count) on the dbset.Ads.
Here's an example:
ID  Text  Other Columns...
1   Ad1   ...
2   Ad2   ...
3   Ad3   ...
4   Ad4   ...
5   Ad5   ...
6   Ad6   ...
Let's say that in my View, I determine I need 2 ads to display for User 1. I want to return Ads 1-2. User 2 then requests 3 ads. I want it to return 3-5. User 3 needs 2 ads, and the function should return ads 6 and 1, looping back to the beginning.
Here's my code that I've been working with (it's in class AdsManager):
Ad NextAd = db.Ads.First();

public IEnumerable<Ad> getAds(int count)
{
    var output = new List<Ad>();
    IEnumerable<Ad> Ads = db.Ads.OrderBy(x => x.Id).SkipWhile(x => x.Id != NextAd.Id);
    output.AddRange(Ads);
    // If we're at the end, handle that case
    if (output.Count != count)
    {
        NextAd = db.Ads.First();
        output.AddRange(getAds(count - output.Count));
    }
    NextAd = output[count - 1];
    return output;
}
The problem is that the query IEnumerable<Ad> Ads = db.Ads.OrderBy(x => x.Id).SkipWhile(x => x.Id != NextAd.Id); throws an error when it is enumerated by AddRange(Ads):
LINQ to Entities does not recognize the method 'System.Linq.IQueryable`1[Domain.Entities.Ad] SkipWhile[Ad](System.Linq.IQueryable`1[Domain.Entities.Ad], System.Linq.Expressions.Expression`1[System.Func`2[Domain.Entities.Ad,System.Boolean]])' method, and this method cannot be translated into a store expression.
I originally loaded the entire DbSet into a Queue and did enqueue/dequeue, but that would not update when a change was made to the database. I got the idea for this algorithm from Get the next and previous sql row by Id and Name, EF?
What call should I be making to the database to get what I want?
UPDATE: Here's the working code:

public IEnumerable<Ad> getAds(int count)
{
    List<Ad> output = new List<Ad>();
    // Take one extra ad: it marks where the next call should start
    output.AddRange(db.Ads.OrderBy(x => x.Id).Where(x => x.Id >= NextAd.Id).Take(count + 1));
    // If we ran off the end of the table, wrap around to the start
    if (output.Count != count + 1)
    {
        NextAd = db.Ads.First();
        output.AddRange(db.Ads.OrderBy(x => x.Id).Where(x => x.Id >= NextAd.Id).Take(count - output.Count + 1));
    }
    // The extra ad becomes the cursor for the next call and is not returned
    NextAd = output[count];
    output.RemoveAt(count);
    return output;
}
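For reference, with the six-ad table above the rotation then behaves as described earlier (a hypothetical usage sketch; AdsManager holds the NextAd cursor between calls):

var manager = new AdsManager();
var first = manager.getAds(2);  // Ads 1-2
var second = manager.getAds(3); // Ads 3-5
var third = manager.getAds(2);  // Ads 6 and 1, wrapped around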
SkipWhile is not supported in EF; it can't be translated into SQL. EF basically works on sets, not sequences (I hope that sentence makes sense).
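For completeness: SkipWhile does work once you switch to LINQ to Objects, but that streams the entire Ads table out of the database first, so it's usually the wrong tool here. A minimal sketch, assuming the same db and NextAd as above:

// Forces client-side evaluation: the whole table is fetched,
// then SkipWhile runs in memory.
var ads = db.Ads
    .AsEnumerable()
    .OrderBy(x => x.Id)
    .SkipWhile(x => x.Id != NextAd.Id)
    .ToList();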
A better workaround is to simply use Where, e.g.:
IEnumerable<Ad> Ads = db.Ads.OrderBy(x => x.Id).Where(x => x.Id >= NextAd.Id);
Maybe you can simplify this into something like the following:

Ad NextAd = db.Ads.First();

public IQueryable<Ad> getAds(int count)
{
    var firstTake = db.Ads
        .OrderBy(x => x.Id)
        .Where(x => x.Id >= NextAd.Id)
        .Take(count);
    var secondTake = db.Ads
        .OrderBy(x => x.Id)
        .Take(count - firstTake.Count());
    var result = firstTake.Concat(secondTake);
    NextAd = result.LastOrDefault();
    return result;
}

Sorry, haven't tested this, but it should work. Note that firstTake.Count() and the later enumeration of result each issue their own query, so you may want to materialize the parts with ToList() if round trips matter.
How do I limit REST API results? I set up a basic API, but the total result set from the database is 800k rows. I was reading a CMS website's documentation where they set a size & offset so that each API call returns at most 5000 rows. How can I do this?
Web API URL call examples:
localhost2343/api/dataset/data?First_Name=Dave
localhost2343/api/dataset/data?Last_Name=Dave&First_Name=Ram
localhost2343/api/dataset/data?size=1&offset=50
localhost2343/api/dataset/data?First_Name=Dave&size=1&offset=50
My Web API:
[Route("data")]
[HttpGet]
public async Task<IQueryable<My_Model>> Get(My_Model? filter)
{
    IQueryable<My_Model> Query = (from x in _context.My_DbSet
                                  select x);
    // add filters
    if (filter.Id != 0)
    {
        Query = Query.Where(x => x.Id == filter.Id);
    }
    if (!string.IsNullOrEmpty(filter.First_Name))
    {
        Query = Query.Where(x => x.First_Name == filter.First_Name);
    }
    ...
    return Query;
}
What I tried: I think I have to do something like the line below, but I'm not sure. I'm also not sure how to pass these values in the URL, because a user may enter zero or two filters, in different orders.
Query = Query.Skip((offset - 1) * size).Take(size);
You can use pagination; it's essential if you're dealing with a lot of data and endpoints. Pagination implies adding an order to the query result, and then you can page with Skip() and Take(). You can do something like this:
[Route("data")]
[HttpGet]
public async Task<IQueryable<My_Model>> Get(My_Model? filter, int pageSize = 50, int page = 1)
{
    IQueryable<My_Model> Query = (from x in _context.My_DbSet
                                  select x);
    // add filters
    if (filter.Id != 0)
    {
        Query = Query.Where(x => x.Id == filter.Id);
    }
    if (!string.IsNullOrEmpty(filter.First_Name))
    {
        Query = Query.Where(x => x.First_Name == filter.First_Name);
    }
    // Skip/Take needs a deterministic order to page reliably; Id is assumed to be the key
    return Query.OrderBy(x => x.Id)
                .Skip((page - 1) * pageSize)
                .Take(pageSize);
}
To call the API, you can use query parameters like the following. Model binding matches them by name, so the order in the URL doesn't matter and omitted filters simply use their defaults:
/data?filter.Id=1&filter.First_Name=John&pageSize=50&page=1
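If the consumer needs the full 800k rows, it can walk the pages until a short page comes back. A hypothetical client-side sketch (HttpClient with System.Net.Http.Json; the base address, port, and My_Model type are assumptions based on the question):

using System.Net.Http;
using System.Net.Http.Json;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:2343") };
var all = new List<My_Model>();
const int pageSize = 5000;

for (int page = 1; ; page++)
{
    var batch = await http.GetFromJsonAsync<List<My_Model>>(
        $"/api/dataset/data?First_Name=Dave&pageSize={pageSize}&page={page}");
    if (batch is null || batch.Count == 0) break;   // no more data
    all.AddRange(batch);
    if (batch.Count < pageSize) break;              // last, partial page
}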
var fdPositions = dbContext.FdPositions.Where(s => s.LastUpdated > DateTime.UtcNow.AddDays(-1));

foreach (JProperty market in markets)
{
    // bunch of logic that is irrelevant here
    var fdPosition = fdPositions.Where(s => s.Id == key).FirstOrDefault();
    if (fdPosition is not null)
    {
        fdPosition.OddsDecimal = oddsDecimal;
        fdPosition.LastUpdated = DateTime.UtcNow;
    }
    else
    {
        // bunch of logic that is irrelevant here
    }
}
await dbContext.SaveChangesAsync();
This block of code makes one database call on the line

var fdPosition = fdPositions.Where(s => s.Id == key).FirstOrDefault();

for each value in the loop, and there will be around 10,000 markets to loop through.
What I thought would happen, and what I would like to happen, is that one database call is made on the line

var fdPositions = dbContext.FdPositions.Where(s => s.LastUpdated > DateTime.UtcNow.AddDays(-1));

and then the loop checks against the local data I thought I had pulled on that first line, while still properly updating the DB objects in this section:
if (fdPosition is not null)
{
    fdPosition.OddsDecimal = oddsDecimal;
    fdPosition.LastUpdated = DateTime.UtcNow;
}
so that my data is properly propagated to the DB when I call

await dbContext.SaveChangesAsync();

How can I update my code so that I make one DB call to get my data rather than 10,000?
Define your fdPositions variable as a Dictionary<int, T>: in your query, do a GroupBy() on Id, then call .ToDictionary(). Now you have a materialized dictionary that lets you index by key quickly.
var fdPositions = dbContext.FdPositions
    .Where(s => s.LastUpdated > DateTime.UtcNow.AddDays(-1))
    .AsEnumerable()   // materialize once, then group in memory (avoids GroupBy translation issues)
    .GroupBy(x => x.Id)
    .ToDictionary(x => x.Key, x => x.First());

// inside the foreach loop:
// bunch of logic that is irrelevant here
bool found = fdPositions.TryGetValue(key, out var item);
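Wired into the loop from the question, it could look like this (a sketch; key and oddsDecimal still come from the omitted market logic):

foreach (JProperty market in markets)
{
    // ... derive key and oddsDecimal from market ...
    if (fdPositions.TryGetValue(key, out var fdPosition))
    {
        // Entities in the dictionary are still tracked by the DbContext,
        // so these changes are saved by SaveChangesAsync in one batch.
        fdPosition.OddsDecimal = oddsDecimal;
        fdPosition.LastUpdated = DateTime.UtcNow;
    }
    else
    {
        // ... insert logic ...
    }
}
await dbContext.SaveChangesAsync();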
I don't have a problem currently, but I want to make sure the performance won't be too shabby for my use case. My search of Microsoft's documentation was without any success.
I have an entity named Reservation. I now want to add some statistics to the program, where I can see some metrics about the reservations (reservations per month and favorite spot/seat in particular).
My first approach was the following:
public async Task<ICollection<StatisticElement<Seat>>> GetSeatUsage(Company company)
{
    var allReservations = await this.reservationService.GetAll(company);
    return await this.FetchGroupedSeatData(allReservations, company);
}

public async Task<ICollection<StatisticElement<DateTime>>> GetMonthlyReservations(Company company)
{
    var allReservations = await this.reservationService.GetAll(company);
    return this.FetchGroupedReservationData(allReservations);
}

private async Task<ICollection<StatisticElement<Seat>>> FetchGroupedSeatData(
    IEnumerable<Reservation> reservations,
    Company company)
{
    var groupedReservations = reservations.GroupBy(r => r.SeatId).ToList();
    var companySeats = await this.seatService.GetAll(company);

    return (from companySeat in companySeats
            let groupedReservation = groupedReservations.FirstOrDefault(s => s.Key == companySeat.Id)
            select new StatisticElement<Seat>()
            {
                Value = companySeat,
                StatisticalCount = groupedReservation?.Count() ?? 0,
            }).OrderByDescending(s => s.StatisticalCount).ToList();
}

private ICollection<StatisticElement<DateTime>> FetchGroupedReservationData(IEnumerable<Reservation> reservations)
{
    var groupedReservations = reservations.GroupBy(r => new { Month = r.Date.Month, Year = r.Date.Year }).ToList();

    return groupedReservations
        .Select(groupedReservation => new StatisticElement<DateTime>()
        {
            Value = new DateTime(groupedReservation.Key.Year, groupedReservation.Key.Month, 1),
            StatisticalCount = groupedReservation.Count(),
        })
        .OrderBy(s => s.Value)
        .ToList();
}
To explain the code a little bit: with GetSeatUsage and GetMonthlyReservations I can get the above-mentioned data for a company. To do that, I first fetch ALL reservations (with reservationService.GetAll) - this is the point where I think performance will become a problem in the future.
Afterwards, I call either FetchGroupedSeatData or FetchGroupedReservationData, which first groups the reservations I previously fetched from the database and then converts them into a format I can use.
As I said, I think the GroupBy after I have read ALL the data from the database MIGHT be a problem, but I cannot find any information regarding performance in the documentation.
My other idea was to create a new method in my ReservationService that already returns the grouped list. But, again, I can't find any information on whether EF adds the GroupBy to the DB query or basically applies it after all of the data has been read from the database. That method would look something like this:
return await this.Context.Set<Reservation>().Where(r => r.User.CompanyId == company.Id).GroupBy(r => r.SeatId).ToListAsync();
Is this already the solution? Where can I check that? Am I missing something completely obvious?
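One way to check, assuming EF Core 5 or later: shape the grouping into an aggregate projection (which EF can translate) and print the SQL with ToQueryString(). A sketch:

// Project each group to a count so EF can translate the GROUP BY,
// then inspect the SQL the provider would actually run.
var query = this.Context.Set<Reservation>()
    .Where(r => r.User.CompanyId == company.Id)
    .GroupBy(r => r.SeatId)
    .Select(g => new { SeatId = g.Key, Count = g.Count() });

Console.WriteLine(query.ToQueryString()); // should show a GROUP BY clause
var grouped = await query.ToListAsync();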
I am trying to make a program that checks if a given sudoku board is valid (solved correctly).
I also want to do it using LINQ; however, I find it hard to come up with a way to get all the 3x3 groups from the board.
I want to get them as an IEnumerable<IEnumerable<int>> because of how I wrote the rest of the code.
Here is my solution so far:
public static bool IsValidSudoku(IEnumerable<IEnumerable<int>> sudokuBoard)
{
    if (sudokuBoard == null)
    {
        throw new ArgumentNullException();
    }

    var columns = Enumerable.Range(0, 9)
        .Select(lineCount => Enumerable.Range(0, 9)
            .Select(columnCount => sudokuBoard
                .ElementAt(columnCount)
                .ElementAt(lineCount)));

    var groups = // this is where I got stuck

    return columns.All(IsValidGroup) &&
           sudokuBoard.All(IsValidGroup) &&
           groups.All(IsValidGroup);
}

static bool IsValidGroup(IEnumerable<int> group)
{
    return group.Distinct().Count() == group.Count() &&
           group.All(x => x <= 9 && x > 0) &&
           group.Count() == 9;
}
Performance is not important here.
Thank you for any advice!
You need two enumerables to choose which 3x3 group you're selecting, and then you can use .Skip and .Take to take runs of three elements to fetch those groups.
var groups = Enumerable.Range(0, 3).SelectMany(gy =>
    Enumerable.Range(0, 3).Select(gx =>
        // We now have gx and gy in 0-2; find the three rows we want
        sudokuBoard.Skip(gy * 3).Take(3).Select(row =>
            // and from each row take the three columns
            row.Skip(gx * 3).Take(3)
        )
    ));
This should give you an IEnumerable of IEnumerable<IEnumerable<int>>s as requested. However, to pass each group to IsValidGroup you'll have to flatten each 3x3 IEnumerable<IEnumerable<int>> into a 9-element IEnumerable<int>, e.g. groups.Select(group => group.SelectMany(n => n)).
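Put together, the block check could look like this (an untested sketch along the lines above, flattening each group up front):

var blocks = Enumerable.Range(0, 3).SelectMany(gy =>
    Enumerable.Range(0, 3).Select(gx =>
        sudokuBoard.Skip(gy * 3).Take(3)
                   .SelectMany(row => row.Skip(gx * 3).Take(3))));

// Each block is now a flat 9-element IEnumerable<int>,
// so it can go straight into IsValidGroup.
bool blocksValid = blocks.All(IsValidGroup);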
I'm creating a report-generating tool that uses custom data types from different sources in our system. The user can create a report schema, and depending on what is asked, the data gets associated based on different index keys, times, time ranges, etc. The project is NOT doing queries against a relational database; it's pure C# code over collections in RAM.
I'm having a huge performance issue; I've been looking at my code for a few days and am struggling to optimize it.
I stripped the code down to a minimal example of what the profiler points to as the problematic algorithm, but the real version is a bit more complex, with more conditions, and works with dates.
In short, this function returns the subset of "values" satisfying conditions that depend on the keys of the values selected from the "index rows".
private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
    var checkContainers = ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
                           values.Any(t => t.ContainerId.HasValue));
    var checkEnterpriseId = ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
                             values.Any(t => t.EnterpriseId.HasValue));

    var ret = new List<LoadedDataSource>();
    foreach (var value in values)
    {
        var valid = true;
        foreach (var index in indexRows)
        {
            // ContainerId
            var indexConservedSource = index.AsEnumerable();
            if (checkContainers && index.CheckContainer && value.ContainerId.HasValue)
            {
                indexConservedSource = indexConservedSource.Where(t => t.ContainerId.HasValue && t.ContainerId.Value == value.ContainerId.Value);
                if (!indexConservedSource.Any())
                {
                    valid = false;
                    break;
                }
            }

            // EnterpriseId
            if (checkEnterpriseId && index.CheckEnterpriseId && value.EnterpriseId.HasValue)
            {
                indexConservedSource = indexConservedSource.Where(t => t.EnterpriseId.HasValue && t.EnterpriseId.Value == value.EnterpriseId.Value);
                if (!indexConservedSource.Any())
                {
                    valid = false;
                    break;
                }
            }
        }

        if (valid)
            ret.Add(value);
    }
    return ret;
}
This works for small samples, but as soon as I have thousands of values and 2-3 index rows with a few dozen values each, it can take hours to generate.
As you can see, I try to break as soon as an index condition fails and move on to the next value.
I could probably do everything in a single values.Where(####).ToList(), but that condition gets complex fast.
I tried generating an IQueryable around indexConservedSource, but it was even worse. I tried using a Parallel.ForEach with a ConcurrentBag for ret, and it was also slower.
What else can be done?
What you are doing, in principle, is calculating the intersection of two sequences. You use two nested loops, and that is slow: the time is O(m*n). You have two other options:
- sort both sequences and merge them
- convert one sequence into a hash table and test the second against it
The second approach seems better for this scenario: it brings the cost down from O(m*n) to roughly O(m+n). Just convert those index lists into HashSets and test the values against them. I added some code for inspiration:
private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
    var ret = values;

    if ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
        ret.Any(t => t.ContainerId.HasValue))
    {
        // One HashSet of container ids per relevant index row
        var indexes = indexRows
            .Where(i => i.CheckContainer)
            .Select(i => new HashSet<int>(i
                .Where(h => h.ContainerId.HasValue)
                .Select(h => h.ContainerId.Value)))
            .ToList();
        ret = ret.Where(v => v.ContainerId == null
                             || indexes.All(i => i.Contains(v.ContainerId.Value)))
                 .ToList();
    }

    if ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
        ret.Any(t => t.EnterpriseId.HasValue))
    {
        var indexes = indexRows
            .Where(i => i.CheckEnterpriseId)
            .Select(i => new HashSet<int>(i
                .Where(h => h.EnterpriseId.HasValue)
                .Select(h => h.EnterpriseId.Value)))
            .ToList();
        ret = ret.Where(v => v.EnterpriseId == null
                             || indexes.All(i => i.Contains(v.EnterpriseId.Value)))
                 .ToList();
    }

    return ret;
}
}