Most recent records with 2 tables and take / skip - c#

What I want to do is basically what this question offers: SQL Server - How to display most recent records based on dates in two tables. The only difference is that I am using LINQ to SQL.
I have two tables:
Assignments
ForumPosts
These are not very similar, but they both have a "LastUpdated" field. I want to get the most recent joined records. However, I also need a take/skip functionality for paging (and no, I don't have SQL 2012).
I don't want to create a new list (with ToList and AddRange) containing ALL my records just so I know the whole set of records and can then order it. That seems extremely inefficient.
My attempt:
Please don't laugh at my inefficient code.. Well ok, a little (both because it's inefficient and... it doesn't do what I want when skip is more than 0).
public List<TempContentPlaceholder> LatestReplies(int take, int skip)
{
using (GKDBDataContext db = new GKDBDataContext())
{
var forumPosts = db.dbForumPosts.OrderBy(c => c.LastUpdated).Skip(skip).Take(take).ToList();
var assignMents = db.dbUploadedAssignments.OrderBy(c => c.LastUpdated).Skip(skip).Take(take).ToList();
List<TempContentPlaceholder> fps =
forumPosts.Select(
c =>
new TempContentPlaceholder()
{
Id = c.PostId,
LastUpdated = c.LastUpdated,
Type = ContentShowingType.ForumPost
}).ToList();
List<TempContentPlaceholder> asm =
assignMents.Select(
c =>
new TempContentPlaceholder()
{
Id = c.UploadAssignmentId,
LastUpdated = c.LastUpdated,
Type = ContentShowingType.ForumPost
}).ToList();
fps.AddRange(asm);
return fps.OrderBy(c=>c.LastUpdated).ToList();
}
}
Any awesome LINQ to SQL people who can throw me a hint? I am sure someone can join their way out of this!

First, you should be using OrderByDescending to get the most recent updates, since later dates have greater values than earlier dates. Second, I think what you are doing will work for the first page, but you also need to take only the top take values from the merged list. That is, if you want the last 20 entries from both tables combined, take the last 20 entries from each, merge them, then take the last 20 entries from the merged list.
The problem comes when you attempt paging, because you would need to know how many elements from each table went into making up the previous pages. I think your best bet is probably to merge them first, then use Skip/Take. I know you don't want to hear that, but other solutions are probably more complex. Alternatively, you could take the top skip+take values from each table, merge them, skip the first skip values and then apply take.
using (GKDBDataContext db = new GKDBDataContext())
{
var fps = db.dbForumPosts.Select(c => new TempContentPlaceholder()
{
Id = c.PostId,
LastUpdated = c.LastUpdated,
Type = ContentShowingType.ForumPost
})
.Concat( db.dbUploadedAssignments.Select(c => new TempContentPlaceholder()
{
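Id = c.UploadAssignmentId, // the assignment key used in the question, not PostId
LastUpdated = c.LastUpdated,
Type = ContentShowingType.ForumPost // as in the question; use the assignment value of the enum if one exists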
}))
.OrderByDescending( c => c.LastUpdated )
.Skip(skip)
.Take(take)
.ToList();
return fps;
}
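The alternative described above (take the top skip+take rows from each table, merge, then skip and take) can be sketched in memory as follows. This is only an illustration, not the answer's code: FeedItem and MergedPagingSketch are hypothetical names, and plain LINQ to Objects sequences stand in for the two tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TempContentPlaceholder, reduced to what paging needs.
public class FeedItem
{
    public int Id { get; set; }
    public DateTime LastUpdated { get; set; }
}

public static class MergedPagingSketch
{
    // Any row that belongs in the merged top (skip + take) must also be in the
    // top (skip + take) of its own source, so that window is enough per source.
    public static List<FeedItem> LatestPage(
        IEnumerable<FeedItem> posts, IEnumerable<FeedItem> assignments, int take, int skip)
    {
        int window = skip + take;
        var newestPosts = posts.OrderByDescending(p => p.LastUpdated).Take(window);
        var newestAssignments = assignments.OrderByDescending(a => a.LastUpdated).Take(window);

        return newestPosts
            .Concat(newestAssignments)
            .OrderByDescending(i => i.LastUpdated)
            .Skip(skip)   // drop the rows that belonged to earlier pages
            .Take(take)   // keep one page
            .ToList();
    }
}
```

The per-source window grows with the page number, so this only pays off for the first few pages; for deep paging, the single merged query is likely the simpler choice.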

Related

c# find start date and end date based on a list of dates?

I have a database table with over 200K records and a column containing a Date (NOT NULL). I am struggling to do a GroupBy on the date: since the table is massive, the query takes very long to process (about 1 minute).
My Theory:
Get the list of all records from that table
From that list find the end date and the start date (basically the oldest date and the newest)
Then take, say, 20 dates at a time to do the GroupBy on, so the query runs over a smaller set of records.
Here is my Model that I have to get the list:
registration.Select(c => new RegistrationViewModel()
{
DateReference = c.DateReference,
MinuteWorked = c.MinuteWorked,
});
DateReference is the database column that I have to work with.
I am not sure how to cycle through my list to get the start and end dates without taking too long.
Any idea on how to do that?
EDIT:
var registrationList = await context.Registration
.Where(c => c.Status == StatusRegistration.Active) // getting all active registrations
.ToRegistrationViewModel() // this is simply a select method
.OrderBy(d => d.DateReference.Date) // this takes long
.ToListAsync();
The GroupBy:
var grpList = registrationList.GroupBy(x => x.DateReference.Date).ToList();
var tempList = new List<List<RegistrationViewModel>>();
foreach (var item in grpList)
{
var selList = item.Select(c => new RegistrationViewModel()
{
RegistrationId = c.RegistrationId,
DateReference = c.DateReference,
MinuteWorked = c.MinuteWorked,
}).ToList();
tempList.Add(selList);
}
This is my SQL table:
This is the ToRegistrationViewModel() function:
return registration.Select(c => new RegistrationViewModel()
{
RegistrationId = c.RegistrationId,
PeopleId = c.PeopleId,
DateReference = c.DateReference,
DateChange = c.DateChange,
UserRef = c.UserRef,
CommissionId = c.CommissionId,
ActivityId = c.ActivityId,
MinuteWorked = c.MinuteWorked,
Activity = new ActivityViewModel()
{
Code = c.Activity.Code,
Description = c.Activity.Description,
},
Commission = new CommissionViewModel()
{
Code = c.Commission.Code,
Description = c.Commission.Description
},
People = new PeopleViewModel()
{
UserId = c.People.UserId,
Code = c.People.Code,
Name = c.People.Name,
Surname = c.People.Surname,
Active = c.People.Active
}
});
There are multiple potential problems here:
Lack of indexes
Your query uses Status and DateReference, and neither appears to have an index. If only a few registrations are active, an index on that column might suffice; otherwise you need an index on the date to speed up sorting. You might also consider a composite index that includes both columns. An appropriate index should solve the sorting issue.
Materializing the query
ToListAsync triggers execution of the SQL query, so every subsequent operation runs on the client. I would also be highly suspicious of ToRegistrationViewModel; I would try changing it to an anonymous type and only convert to an actual type after the query has been materialized. Running things like sorting and grouping on the client is generally considered a bad idea, but you need to consider where the actual bottleneck is: optimizing the grouping will not help if transferring the data takes most of the time.
Transferring data
Fetching a large number of rows will be slow, no matter what. The goal is usually to do as much filtering in the database as possible so you do not need to fetch so many rows. If you have to fetch a large number of records you might use pagination, i.e. combine OrderBy with Skip and Take to fetch smaller chunks of data. This will not save time overall, but it allows for things like progress reporting and showing data continuously.
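As a rough sketch of the pagination idea, the helper below pulls an ordered source one page at a time. ChunkSketch and Pages are hypothetical names, and the example is only exercised against an in-memory IQueryable; with a real provider each iteration would be expected to issue its own query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ChunkSketch
{
    // Yields successive pages of `source`, ordered by `key`, `pageSize` rows at a time.
    // The key should be unique (or extended with a tie-breaker); otherwise rows
    // with equal keys may be repeated or skipped across pages.
    public static IEnumerable<List<T>> Pages<T, TKey>(
        IQueryable<T> source, Expression<Func<T, TKey>> key, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        for (int page = 0; ; page++)
        {
            var chunk = source.OrderBy(key).Skip(page * pageSize).Take(pageSize).ToList();
            if (chunk.Count == 0) yield break;   // no rows left
            yield return chunk;                  // caller can show progress per page
        }
    }
}
```

A caller could loop with foreach over Pages(query, r => r.RegistrationId, 1000) and update a progress indicator after each chunk; RegistrationId is used here only as an assumed unique key.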

How to sort/filter a list in the same way you sort a database result

I am currently coding with .net core 5 preview 3 and I am having an issue with filtering a list of best matched customers.
Given these two different code samples how come they produce different results?
How can I fix the second sample to return the same results as the first sample?
Sample One (this works)
//This properly gives the top 10 best matches from the database
using (var context = new CustomerContext(_contextOptions))
{
customers = await context.vCustomer.Where(c => c.Account_Number__c.Contains(searchTerm))
.Select(c => new
{
vCustomer = c,
MatchEvaluator = searchTerm.Contains(c.Account_Number__c)
})
.OrderByDescending(c => c.MatchEvaluator)
.Select(c => new CustomerModel
{
CustomerId = c.vCustomer.Account_Number__c,
CustomerName = c.vCustomer.Name
})
.Take(10)
.ToListAsync();
}
Customer Id Results from sample one (these are the best results)
247
2470
247105
247109
247110
247111
247112
247113
247116
247117
Sample Two (This doesn't work the same even though it's the same code)
//This takes all customers from the database and puts them in a list so they can be cached and sorted later.
List<CustomerModel> customers = new List<CustomerModel>();
using (var context = new CustomerContext(_contextOptions))
{
customers = await context.vCustomer
.Select(c => new CustomerModel
{
CustomerId = c.Account_Number__c,
CustomerName = c.Name
})
.ToListAsync();
}
//This does not properly give the top 10 best matches from the list that was generated from the database
List<CustomerModel> bestMatchedCustomers = await Task.FromResult(
customers.Where(c => c.CustomerId.Contains(searchTerm))
.Select(c => new
{
Customer = c,
MatchEvaluator = searchTerm.Contains(c.CustomerId)
})
.OrderByDescending(c => c.MatchEvaluator)
.Select(c => new CustomerModel
{
CustomerId = c.Customer.CustomerId,
CustomerName = c.Customer.CustomerName
})
.Take(10)
.ToList()
);
Customer Id Results from sample two
247
1065247
247610
32470
324795
624749
762471
271247
247840
724732
You asked "why are they different". For this you need to appreciate that databases have an optimizer that looks at the query being run and changes its data access strategy according to things like how many records are being selected, whether indexes apply, what sorting is requested, etc.
One of your queries pulls the whole database table into a client-side list and then uses the list to do the filter and sort; the other uses the database to do the filter and the sort. To a database these are very different things. Hitting a table with no filter, you likely get the rows out in the order they're stored on disk, which could be random. With a filter, you might see the database use some indexing strategy where it includes/discounts a large number of rows based on an index, or it might even use the index to retrieve the requested data. How it then sorts the ties, if it does, might be completely different from how the client-side list sorts ties (it does nothing with them, actually). Either way, the important point is that the database plans and executes your two queries differently. It sees different queries because your second version runs the query without a Where or OrderBy.
Couple this with your sort operation being on a column of incredibly low cardinality (cardinality being how unique the values in the column are): your lead result, the one where the record equals the search term, is 1 and EVERYTHING else is 0. This means that one record bubbles to the top, the rest of the records are free to be sorted however the system doing the sorting likes, and then you take a subset of them.
..hence why one looks like X and the other like Y
If you didn't take the subset the two datasets would be in different orders but everything in set 1 would be in set 2 somewhere... it's just that one set is like 1 3 5 7 2 4 6, the other is like 1 7 6 5 4 3 2, you're taking the first three results and asking "why is 1 3 5 different to 1 7 6"
In terms of your code, I think I would have just done something simple that also sorts in a stable fashion (rows in same order because there is no ambiguity/ties) like:
await context.vCustomer
.Where(c => c.Account_Number__c.Contains(searchTerm))
.OrderBy(c => c.Account_Number__c.Length)
.ThenBy(c => c.Account_Number__c) //stable, if unique
.Take(10)
.Select(c => new CustomerModel
{
CustomerId = c.Account_Number__c,
CustomerName = c.Name
}
)
.ToListAsync();
If you sort the results by their length in chars, then 247 is a better match than 2470, which is better than 24711 or 12471, etc.
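The same length-then-value ordering can be tried on the cached list. The sketch below runs in memory only; BestMatchSketch and TopMatches are hypothetical names, and the ordinal comparer is an assumption about how ties should be broken.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BestMatchSketch
{
    // Shorter ids that contain the term rank first; ids of equal length are
    // ordered by value, so no ties are left for the sort to resolve arbitrarily.
    public static List<string> TopMatches(IEnumerable<string> ids, string searchTerm, int count)
    {
        return ids
            .Where(id => id.Contains(searchTerm))
            .OrderBy(id => id.Length)
            .ThenBy(id => id, StringComparer.Ordinal)
            .Take(count)
            .ToList();
    }
}
```

Because every pair of distinct ids differs in length or in value, the result no longer depends on the order in which the rows happened to arrive.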
"Contains" can be quite performance penalising; perhaps consider StartsWith; theoretically at least, an index could still be used for that
ps: calling your var a MatchEvaluator makes things really confusing for people who know regex well btw
You're ordering by the MatchEvaluator value which is either 1 or 0.
If I understood correctly, what you want is to order first by where the match occurs in the id and then by the CustomerId:
List<CustomerModel> bestMatchedCustomers =
await Task.FromResult(
customers.Where(c => c.CustomerId.Contains(searchTerm))
.OrderBy(c => c.CustomerId.IndexOf(searchTerm))
.ThenBy(c => c.CustomerId)
.Select(c => new CustomerModel
{
CustomerId = c.CustomerId,
CustomerName = c.CustomerName
})
.Take(10)
.ToList()
);

Handle Linq queries outputs

I've been struggling with this topic for the last 3 days.
I'm sure I'm doing something wrong, but I need help.
During the load of a form, I'm doing a LINQ query (on a global dataset) to populate fields on that form. As I want to be able to change the views of the form, I want queries that make the data available in a specific format, to avoid having to query every now and then (the dataset is 20,000 lines).
So I came up with this first query:
var results =
from row in Globals.ds.Tables["Song"].AsEnumerable()
group row by (row.Field<int>("year"), row.Field<int>("rating")) into grp
orderby grp.Key
select new
{
year = grp.Key.Item1,
conte = grp.ToList().Count,
rating = grp.Key.Item2,
duree = grp.Sum(r => r.Field<int>("duree"))
};
It works, and I'm pasting the result in the following screenshot (conte is the count)
Result of the query
I have 2 issues:
1/ I really don't know how to handle that result: I would like to filter for a specific year and list all the corresponding ratings (I have from 1 to 6 per year). I tried .ToList() but it only helped to get the count. CopyToDataTable is not available for the query.
2/ I have buttons in the form that will need access to that query, yet the var results is only available in the load handler and I can't manage to declare it at the class level.
Thanks for the help :)
So:
Your first point has been answered by @jdweng: it is possible to use LINQ on collections (e.g. List) too, not only on DB queries.
For your second point, the reason is that the result of the query is an anonymous type, and it can't be declared outside local scope. You must create a new class with the same structure.
public class MyResultClass
{
public int year;
public int conte;
public int rating;
public int duree;
}
Define your field:
List<MyResultClass> data;
And then use both:
var result =
from row in Globals.ds.Tables["Song"].AsEnumerable()
group row by (row.Field<int>("year"), row.Field<int>("rating")) into grp
orderby grp.Key
select new MyResultClass
{
year = grp.Key.Item1,
conte = grp.ToList().Count,
rating = grp.Key.Item2,
duree = grp.Sum(r => r.Field<int>("duree"))
};
data = result.ToList();
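Once the results are stored in a typed list, filtering for one year is an ordinary LINQ to Objects query. A minimal sketch follows; YearRatingSummary is a hypothetical class with the same shape as MyResultClass, and RatingsForYear is an assumed helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class mirroring MyResultClass above.
public class YearRatingSummary
{
    public int year;
    public int conte;
    public int rating;
    public int duree;
}

public static class SummaryFilterSketch
{
    // Returns the rows for one year, ordered by rating.
    public static List<YearRatingSummary> RatingsForYear(
        IEnumerable<YearRatingSummary> data, int year)
    {
        return data
            .Where(r => r.year == year)
            .OrderBy(r => r.rating)
            .ToList();
    }
}
```

A button handler could call SummaryFilterSketch.RatingsForYear(data, 1999) on the class-level field and bind the returned list to whichever control shows the view.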
I hope I was helpful.

Why is linq reversing order in group by

I have a linq query which seems to be reversing one column of several in some rows of an earlier query:
var dataSet = from fb in ds.Feedback_Answers
where fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID == criteriaType
&& fb.UpdatedDate >= dateFeedbackFrom && fb.UpdatedDate <= dateFeedbackTo
select new
{
fb.Feedback_Questions.Feedback_Questionnaires.QuestionnaireID,
fb.QuestionID,
fb.Feedback_Questions.Text,
fb.Answer,
fb.UpdatedBy
};
Gets the first dataset and is confirmed working.
This is then grouped like this:
var groupedSet = from row in dataSet
group row by row.UpdatedBy
into grp
select new
{
Survey = grp.Key,
QuestionID = grp.Select(i => i.QuestionID),
Question = grp.Select(q => q.Text),
Answer = grp.Select(a => a.Answer)
};
While grouping, the resulting returnset (of type: string, list int, list string, list int) sometimes, but not always, turns the question order back to front, without inverting answer or questionID, which throws it off.
i.e. if the set is questionID 1,2,3 and question A,B,C it sometimes returns 1,2,3 and C,B,A
Can anyone advise why it may be doing this? Why only on the one column? Thanks!
edit: Got it thanks all! In case it helps anyone in future, here is the solution used:
var groupedSet = from row in dataSet
group row by row.UpdatedBy
into grp
select new
{
Survey = grp.Key,
QuestionID = grp.OrderBy(x=>x.QuestionID).Select(i => i.QuestionID),
Question = grp.OrderBy(x=>x.QuestionID).Select(q => q.Text),
Answer = grp.OrderBy(x=>x.QuestionID).Select(a => a.Answer)
};
Reversal of a grouped order is a coincidence: IQueryable<T>'s GroupBy returns groups in no particular order. Unlike in-memory GroupBy, which specifies the order of its groups, queries performed in RDBMS depend on implementation:
The query behavior that occurs as a result of executing an expression tree that represents calling GroupBy<TSource,TKey,TElement>(IQueryable<TSource>, Expression<Func<TSource,TKey>>, Expression<Func<TSource,TElement>>) depends on the implementation of the type of the source parameter.
If you would like to have your rows in a specific order, you need to add OrderBy to your query to force it.
How do I do it and maintain the relative list order, rather than apply an order to the resulting set?
One approach is to apply grouping to your data after bringing it into memory. Apply ToList() to dataSet at the end to bring the data into memory. After that, the order of the subsequent GroupBy query will be consistent with dataSet. A drawback is that the grouping is no longer done in the RDBMS.
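To illustrate the in-memory approach, the sketch below groups rows that are already materialized. The tuple shape and the names GroupOrderSketch and GroupByUser are assumptions made for the example; the point is that LINQ to Objects keeps elements within a group in source order, so the projected lists stay aligned.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupOrderSketch
{
    // Groups already-materialized rows by user. Each group's elements keep their
    // source order, so QuestionIDs[i] and Questions[i] refer to the same row.
    public static List<(string Survey, List<int> QuestionIDs, List<string> Questions)> GroupByUser(
        IEnumerable<(string UpdatedBy, int QuestionID, string Text)> rows)
    {
        return rows
            .GroupBy(r => r.UpdatedBy)
            .Select(g => (
                Survey: g.Key,
                QuestionIDs: g.Select(r => r.QuestionID).ToList(),
                Questions: g.Select(r => r.Text).ToList()))
            .ToList();
    }
}
```

Groups themselves come out in the order their keys first appear in the source, which is the documented behaviour of Enumerable.GroupBy.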

LINQ to SQL exception: Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator

I know this is a duplicate on SO, but I can't figure out how to use the contains operator in my specific code:
I have 5 bookings in the database:
ID, Booking, Pren, ReservationCode
1, VisitHere, 1, 1000A
2, VisitHere, 1, 1000A
3, VisitHere, 1, 1000A
4, VisitThere, 2, 2000A
5, VisitThere, 2, 2000A
public int SpecialDelete(DataContext db, IEnumerable<BookingType> bookings) {
var rescodes = (from b in bookings
select b).Distinct().ToArray();
// Code Breaks here
IEnumerable<BookingType> bookingsToDelete = db.GetTable<BookingType>().Where(b => bookings.Any(p => p.Pren == b.Pren && p.ReservationCode == b.ReservationCode));
int deleted = bookingsToDelete.Count();
db.GetTable<BookingType>().DeleteAllOnSubmit(bookingsToDelete);
db.SubmitChanges();
return deleted;
}
When I pass the first record into this method (1, VisitHere, 1, 1000A), I want it to retrieve ids 1,2 and 3, but not 4 and 5.
I can do this by matching Pren and ReservationCode.
How can I do this as the .Any and .All operators are throwing the above exception?
Note: The method must accept a list of bookings because the argument will always be multiple bookings passed into the method, I just used a single booking as an example.
Edit: I basically need LINQ2SQL to generate a bunch of SQL statements like so (let's say I want to delete all records in my DB):
DELETE
FROM Bookings b
WHERE b.ReservationCode = '1000A' AND b.Pren = 1
DELETE
FROM Bookings b
WHERE b.ReservationCode = '2000A' AND b.Pren = 2
The error you are getting is trying to direct you to use the .Contains method, passing in a simple array. By default it translates that array into an IN clause in the format:
Where foo In ("b1", "B2", "B3")
Notice that you can't use a multi-dimensional array in the IN clause (as you would need to do here). Since you can't join server side to a local array, your options become limited as long as you have a composite key relationship.
If you don't need to fetch the rows in order to delete them, it will probably be faster anyway to just use Context's ExecuteCommand to issue your deletes. Just make sure to parameterize your query (see http://www.thinqlinq.com/Post.aspx/Title/Does-LINQ-to-SQL-eliminate-the-possibility-of-SQL-Injection)
string deleteQuery = "DELETE FROM Bookings WHERE ReservationCode = {0} AND Pren = {1}"; // no table alias: T-SQL rejects "DELETE FROM Bookings b"
foreach (var bookingType in bookings)
{
db.ExecuteCommand(deleteQuery, bookingType.ReservationCode, bookingType.Pren);
}
What if you had a quasi temp table on the server? You could put the list values in there.
This is a real problem with ORMs. There is a lot of mismatch between local and remote capabilities.
I have even tried using .Range to generate a remote list to join against, but that doesn't work either.
Essentially you have to rearrange your data islands somehow (i.e. where do the lists of Pren and ReservationCode values come from? Are they on the server somewhere?) or upload one of your local collections to a staging area on the server.
The error message says "except the contains operator." Have you considered using the Contains operator? It should do the same thing.
So from
IEnumerable<BookingType> bookingsToDelete = db.GetTable<BookingType>().Where(b => bookings.Any(p => p.Pren == b.Pren && p.ReservationCode == b.ReservationCode));
to
IEnumerable<BookingType> bookingsToDelete = db.GetTable<BookingType>().Where(b => bookings.Contains(p => p.Pren == b.Pren && p.ReservationCode == b.ReservationCode));
I realise that the list won't contain the same objects, so you may need to do something like:
bookings.Select(booking => booking.PrimaryKeyOfAwesome).Contains(b => b.PrimaryKeyOfAwesome) etc etc.
Edited for clarity
Edit for humility
OK, so after actually recreating the entire setup I realised that my solution doesn't work, because there are two parameters, not just one. Apologies. This is what I came up with in the end, which works, but is genuinely a terrible solution and should not be used. I include it here only for closure ;)
public static int SpecialDelete(DataContext db, IEnumerable<BookingType> bookings)
{
var compositeKeys = bookings.Select(b => b.Pren.ToString() + b.ReservationCode).Distinct();
IEnumerable<BookingType> bookingsToDelete = db.GetTable<BookingType>().Where(b => compositeKeys.Contains(b.Pren.ToString() + b.ReservationCode));
int deleted = bookingsToDelete.Count();
db.GetTable<BookingType>().DeleteAllOnSubmit(bookingsToDelete);
db.SubmitChanges();
return deleted;
}
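One concrete reason the concatenated key is fragile: without a separator, different (Pren, ReservationCode) pairs can produce the same string. The sketch below only demonstrates that collision in memory. CompositeKeySketch is a hypothetical name, the "|" separator is an assumption (it must never occur in a reservation code), and whether LINQ to SQL translates the delimited form has not been checked here.

```csharp
public static class CompositeKeySketch
{
    // Key built as in the method above: number and code glued together.
    public static string Naive(int pren, string reservationCode)
    {
        return pren.ToString() + reservationCode;
    }

    // Same idea with an assumed separator that cannot appear in either part.
    public static string Delimited(int pren, string reservationCode)
    {
        return pren.ToString() + "|" + reservationCode;
    }
}
```

With the naive form, (1, "1000A") and (11, "000A") both yield "11000A", so a delete keyed on it could remove bookings that were never passed in; the delimited keys "1|1000A" and "11|000A" stay distinct.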
