Performance in Entity Framework - C#

Sometimes, when I write LINQ queries and use them inside a loop, performance becomes very slow.
var query1 = from c in db.Classes
             where c.TeacherId.Equals(teacherId)
             select c;
// AnsweredAssignment Query
var query2 = (from c in db.AnsweredAssignments
              where c.AssignmentId == assignmentId && c.Student.Class.TeacherId.Equals(teacherId)
              select c).ToArray();
// Tokens Query
var query3 = (from c in db.Tokens
              where c.AssignmentId == assignmentId && c.Student.Class.TeacherId.Equals(teacherId)
              select c).ToArray();
// OverwrittenScores Query
var query4 = (from os in db.OverwrittenScores
              where os.AssignmentId == assignmentId && os.Student.Class.TeacherId.Equals(teacherId)
              select os).ToArray();
foreach (var c in query1)
{
    foreach (var s in c.Students)
    {
        var aaItems = (from aa in query2
                       where aa.StudentId == s.StudentId
                       select aa).ToArray();
        // Generate scores for objectives
        var id3 = (from aa in aaItems
                   where !aa.IsMakeup
                   orderby aa.Score descending
                   select aa).FirstOrDefault();
        if (id3 != null)
        {
            var aa3 = (from aa in query2
                       where aa.AnsweredAssignmentId == id3.AnsweredAssignmentId
                       select aa).SingleOrDefault();
            ...
        }
        var tokens = (from t in query3
                      where t.StudentId == s.StudentId
                      select new MonitorByGeneralScoreToAnsweredAssignment(AssignmentStatus.Pending)).ToList();
        ...
        // does any overwritten score exist?
        var osItem = query4.Where(os => os.StudentId == s.StudentId).SingleOrDefault();
        ...
    }
}
What I'm doing now is fetching all the records I'm going to use up front, instead of getting them one by one inside the loop. Is this good practice? Sometimes I feel I'm not doing a good job :(
Once I have the records, I keep them in memory and use LINQ to Objects to get each record.

Remember that making calls to a database will always be slow; in fact, it's often the slowest part of most applications. So you should strive to return a lot of data at once, rather than trying to get items one at a time.
Rewrite your queries so that they return as much of the required information in one go as possible. Although you might use up more memory, it's more often than not worth it for the time savings. Connecting to databases is slow!
Secondly, (last I checked) Entity Framework uses reflection to set properties on your objects. Reflection is also very slow, which is why, despite EF's cool factor, I still prefer to write my queries by hand. The performance is just significantly better (but of course that introduces another layer of complication, since you're now dealing not with one language, C#, but with two, C# and SQL, which are conceptually very different).
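On the question's own approach: preloading with ToArray() saves round trips, but filtering those arrays with where aa.StudentId == s.StudentId inside the loop still scans every preloaded row for every student. A minimal sketch, assuming a hypothetical, simplified AnsweredAssignment record that carries only the fields used here, indexes the preloaded rows once with ToLookup so each per-student access is a hash lookup instead of a full scan:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the AnsweredAssignment entity in the question.
public record AnsweredAssignment(int AnsweredAssignmentId, int StudentId, int Score, bool IsMakeup);

public static class ScoreIndex
{
    // Returns the best non-makeup answer per student (null if none), using a lookup
    // built once instead of re-filtering the whole array inside the loop.
    public static Dictionary<int, AnsweredAssignment?> BestNonMakeupByStudent(
        IEnumerable<int> studentIds, AnsweredAssignment[] answered)
    {
        // One pass over the preloaded rows: StudentId -> that student's answers.
        ILookup<int, AnsweredAssignment> byStudent = answered.ToLookup(aa => aa.StudentId);

        var result = new Dictionary<int, AnsweredAssignment?>();
        foreach (int studentId in studentIds)
        {
            // Indexing a lookup with a missing key yields an empty sequence, not an exception.
            result[studentId] = byStudent[studentId]
                .Where(aa => !aa.IsMakeup)
                .OrderByDescending(aa => aa.Score)
                .FirstOrDefault();
        }
        return result;
    }
}
```

The same idea applies to the tokens and overwritten-score arrays: build one lookup (or ToDictionary, when there is at most one row per student) before the loop, then probe it inside.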

Related

How to improve performance when joining List and Linq object

I have one list, read from file:
var lsData = ReadExcelFile<CustomerEntity>(path);
And one list of customers (loaded into memory):
lsCustomer = await CustomerService.GetAll()
    .Where(c => c.isDeleted == null || !c.isDeleted.Value)
    .OrderBy(c => c.Code)
    .ToListAsync();
And the join command:
var lsDuplicateEmail =
    (from imp in lsData
     join cust in lsCustomer
         on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
         into gjoin
     from g in gjoin.DefaultIfEmpty()
     select new
     {
         ImportItem = imp,
         CustomerItem = g,
     }
     into result
     where !string.IsNullOrEmpty(result.ImportItem.Email) && result.CustomerItem != null
         && !ImportHelpers.CompareString(result.ImportItem.Code, result.CustomerItem.Code)
     select result);
var lsDuplicateEmailInSystem = lsDuplicateEmail.Select(c => c.ImportItem.Code).Distinct().ToList();
I tested with about 2,000 records in lsData and about 200k records in lsCustomer.
The Customer Email field is not indexed in the DB.
The join takes about 10 s to execute (even though the result is 0 records), which is too slow.
I've looked around but can't find a way to index the Email field in lsCustomer. I know the slowness is because the complexity is O(n*m).
Is there any way to improve performance?
Try the following code. I have used Join instead of GroupJoin, which isn't needed here, and moved the filters up in the query.
var lsDuplicateEmail =
    from imp in lsData
    where !string.IsNullOrEmpty(imp.Email)
    join cust in lsCustomer
        on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
    where !ImportHelpers.CompareString(imp.Code, cust.Code)
    select new
    {
        ImportItem = imp,
        CustomerItem = cust,
    };
Also, please show the GetPerfectStringWithoutSpace implementation; maybe it is slow.
Another possible solution is to swap lsData and lsCustomer in the query, in case the lookup search is not as fast as expected.
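For context: in LINQ to Objects, Join and GroupJoin already build a hash lookup over the inner sequence, so the join itself is roughly linear rather than O(n*m). The key selector, however, runs once for every element on both sides (here about 202,000 calls), so an expensive normalization helper can dominate the runtime. A minimal sketch, assuming a hypothetical simplified CustomerEntity record, a hypothetical NormalizeEmail stand-in for GetPerfectStringWithoutSpace (strip whitespace, lower-case), and case-insensitive code comparison as a stand-in for CompareString:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for CustomerEntity.
public record CustomerEntity(string Code, string? Email);

public static class DuplicateEmailCheck
{
    // Hypothetical stand-in for ImportHelpers.GetPerfectStringWithoutSpace:
    // removes all whitespace and lower-cases. Keep the real helper cheap too.
    public static string NormalizeEmail(string email) =>
        new string(email.Where(ch => !char.IsWhiteSpace(ch)).ToArray()).ToLowerInvariant();

    // Codes of imported rows whose email already belongs to a customer with a different code.
    public static List<string> FindDuplicateEmailCodes(
        IEnumerable<CustomerEntity> imported, IEnumerable<CustomerEntity> existing)
    {
        // Normalize each existing email exactly once and index by the normalized value.
        ILookup<string, string> codesByEmail = existing
            .Where(c => !string.IsNullOrEmpty(c.Email))
            .ToLookup(c => NormalizeEmail(c.Email!), c => c.Code);

        return imported
            .Where(i => !string.IsNullOrEmpty(i.Email))
            // Assumption: codes are compared case-insensitively (stand-in for CompareString).
            .Where(i => codesByEmail[NormalizeEmail(i.Email!)]
                .Any(code => !string.Equals(code, i.Code, StringComparison.OrdinalIgnoreCase)))
            .Select(i => i.Code)
            .Distinct()
            .ToList();
    }
}
```

If this version is still slow with the real data, timing the normalization helper on its own is a quick way to confirm whether it is the bottleneck.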

Search method joining multiple tables

I need help with a search method that searches the tables for matching text.
This works, except that the joins need to be LEFT OUTER JOINs; otherwise I don't get any results if the pageId is missing from any of the tables.
This solution also takes too long to run, so I would appreciate it if someone could help me out with a better way to handle this task.
public async Task<IEnumerable<Result>> Search(string query)
{
    var temp = await (from page in _context.Pages
                      join pageLocation in _context.PageLocations on page.Id equals pageLocation.PageId
                      join location in _context.Locations on pageLocation.LocationId equals location.Id
                      join pageSpecialty in _context.PageSpecialties on page.Id equals pageSpecialty.PageId
                      join specialty in _context.Specialties on pageSpecialty.SpecialtyId equals specialty.Id
                      where
                          page.Name.ToLower().Contains(query)
                          || location.Name.ToLower().Contains(query)
                          || specialty.Name.ToLower().Contains(query)
                      select new Result
                      {
                          PageId = page.Id,
                          Name = page.Name,
                          Presentation = page.Presentation,
                          Rating = page.Rating
                      }).ToListAsync();
    var results = new List<Result>();
    foreach (var t in temp)
    {
        if (!results.Exists(p => p.PageId == t.PageId))
        {
            t.Locations = GetLocations(t.PageId);
            t.Specialties = GetSpecialties(t.PageId);
            results.Add(t);
        }
    }
    return results;
}
Using navigation properties, the query could look like this:
var temp = await (from page in _context.Pages
                  where page.Name.Contains(query)
                     || page.PageLocation.Any(pl => pl.Location.Name.Contains(query))
                     || page.PageSpecialties.Any(ps => ps.Specialty.Name.Contains(query))
                  select new Result
                  {
                      PageId = page.Id,
                      Name = page.Name,
                      Presentation = page.Presentation,
                      Rating = page.Rating,
                      Locations = page.PageLocation.Select(pl => pl.Location),
                      Specialties = page.PageSpecialties.Select(ps => ps.Specialty)
                  }).ToListAsync();
This has several benefits:
Because there are no joins, the query returns unique Result objects right away, so you don't need to deduplicate them afterwards.
The locations and specialties are loaded in the same query instead of two extra queries per Result (the so-called N+1 problem).
(Likely) ToLower can be removed, because the search is probably not case-sensitive anyway: the query is executed as SQL, and SQL databases often use case-insensitive collations. Removing ToLower avoids wrapping the column in a function, which is what makes a predicate non-sargable. (Contains itself still becomes a leading-wildcard pattern such as LIKE '%...%', which cannot use an index seek, so the gain there is limited.)
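An in-memory LINQ to Objects sketch of the same query shape makes the first benefit visible. It assumes hypothetical Page, Location, Specialty and Result records (the real entity classes aren't shown in the question) and uses StringComparison.OrdinalIgnoreCase as a stand-in for a case-insensitive database collation. Each matching page appears once, and a page with no specialties can still match on a location, which is the behavior the asker wanted from LEFT OUTER JOIN:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model mirroring the question's tables.
public record Location(string Name);
public record Specialty(string Name);
public record Page(int Id, string Name, List<Location> Locations, List<Specialty> Specialties);
public record Result(int PageId, string Name, List<string> Locations, List<string> Specialties);

public static class PageSearch
{
    // Filters on the root entity with Any(...) instead of joining, so each page
    // appears at most once and its child names are projected in the same pass.
    public static List<Result> Search(IEnumerable<Page> pages, string query)
    {
        // In-memory stand-in for a case-insensitive database collation.
        bool Matches(string text) => text.Contains(query, StringComparison.OrdinalIgnoreCase);

        return pages
            .Where(p => Matches(p.Name)
                     || p.Locations.Any(l => Matches(l.Name))
                     || p.Specialties.Any(s => Matches(s.Name)))
            .Select(p => new Result(
                p.Id,
                p.Name,
                p.Locations.Select(l => l.Name).ToList(),
                p.Specialties.Select(s => s.Name).ToList()))
            .ToList();
    }
}
```

In an EF query the comparison would be left to the database collation rather than StringComparison, since the provider translates the expression to SQL.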

Linq To Sql - return table result and count

I'm very new to LINQ to SQL and need a little assistance.
I'm building a message board in C#. I have three database tables; the basic structure is as follows:
FORUMS: forumid, name
THREADS: threadid, forumid, title, userid
POSTS: postid, threadid, text, userid, date
I want to bring back everything I need in one query: list a page of THREADS (for a particular FORUM) and, in each THREAD row, also display the number of POSTS in that thread and when the last POST was made.
At the moment I'm getting back all THREADS, then looping through the result set and calling the POSTS table separately for each thread to get its post count and latest post, but this will obviously cause problems with database load as the message board gets bigger.
My LINQ to SQL so far:
public IList<Thread> ListAll(int forumid)
{
    var threads =
        from t in db.Threads
        where t.forumid == forumid
        select t;
    return threads.ToList();
}
Basically, I now need to get the number of POSTS in each thread and the date of the last post in each thread.
Any help would be most appreciated :)
EDIT
Hi guys. Thanks for your help so far. I'm almost there. However, I left an important part out of my initial question: I also need to retrieve the user name of the person who made the last POST. Therefore I need to join p.userid with u.userid on the USERS table. So far I have the following, but I need to amend it to join the POSTS table with the USERS table:
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
    var threads = (from t in db.Threads
                   where t.forumid == forumid
                   join p in db.Posts on t.threadid equals p.threadid into j
                   select new ThreadWithPostInfo()
                   {
                       thread = t,
                       noReplies = j.Count(),
                       lastUpdate = j.Max(post => post.date)
                   }).ToList();
    return threads;
}
UPDATE:
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
    var threads = (from t in db.Threads
                   from u in db.Users
                   where t.forumid == forumid && t.hide == "No" && t.userid == u.userid
                   join p in db.Posts on t.threadid equals p.threadid into j
                   select new ThreadWithPostInfo()
                   {
                       thread = t,
                       deactivated = u.deactivated,
                       lastPostersName = j.OrderByDescending(post => post.date).FirstOrDefault().User.username,
                       noReplies = j.Count(),
                       lastUpdate = j.Max(post => post.date)
                   }).ToList();
    return threads;
}
I finally figured that part out, thanks to all of you guys :). My only problem now is the search results method. At the moment it looks like this:
public IList<Thread> SearchThreads(string text, int forumid)
{
    var searchResults = (from t in db.Threads
                         from p in db.Posts
                         where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
                             && p.threadid == t.threadid
                             && t.forumid == forumid
                         select t).Distinct();
    return searchResults.ToList();
}
I need to get this where clause into the new LINQ code:
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
that is, incorporate this clause into the new LINQ method. Any help is gratefully received :)
SOLUTION:
I figured out a solution, but I don't know if it's the best or most efficient one. Maybe you guys can tell me, because I'm still getting my head around LINQ. James, I think your answer was the closest and got me near to where I wanted to be, thanks :)
public IList<ThreadWithPostInfo> SearchThreads(string text, int forumid)
{
    var searchResults = (from t in db.Threads
                         from p in db.Posts
                         where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
                             && p.threadid == t.threadid
                             && t.forumid == forumid
                         select t).Distinct();
    //return searchResults.ToList();
    var threads = (from t in searchResults
                   join p in db.Posts on t.threadid equals p.threadid into j
                   select new ThreadWithPostInfo()
                   {
                       thread = t,
                       lastPostersName = j.OrderByDescending(post => post.date).FirstOrDefault().User.username,
                       noReplies = j.Count(),
                       lastUpdate = j.Max(post => post.date)
                   }).ToList();
    return threads;
}
Maybe this is the "Too many database calls per session" problem.
Calling the database, whether to query or to write, is a remote call, and we want to reduce the number of remote calls as much as possible. This warning is raised when the profiler notices that a single session is making an excessive number of calls to the database. This is usually an indication of a potential optimization in the way the session is used.
There are several possible reasons for this:
A large number of queries as a result of a Select N + 1
Calling the database in a loop
Updating (or inserting / deleting) a large number of entities
A large number of (different) queries that we execute to perform our task
For the first reason, see the suggestions for Select N + 1. Select N + 1 is a data access anti-pattern where the database is accessed in a suboptimal way. Take a look at this code sample:
// SELECT * FROM Posts
var postsQuery = from post in blogDataContext.Posts
                 select post;
foreach (Post post in postsQuery)
{
    // lazy loading of comments list causes:
    // SELECT * FROM Comments where PostId = #p0
    foreach (Comment comment in post.Comments)
    {
        // print comment...
    }
}
In this example, we load a list of posts (the first select) and then traverse the object graph. However, we access each post's Comments collection lazily, causing LINQ to SQL to go back to the database with a separate query for every post. This is incredibly inefficient, and the LINQ to SQL Profiler will generate a warning whenever it encounters such a case.
The solution for this example is simple. Force an eager load of the collection using the DataLoadOptions class to specify what pieces of the object model we want to load upfront.
// DataLoadOptions is in the System.Data.Linq namespace
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Post>(p => p.Comments);
blogDataContext.LoadOptions = loadOptions;
// SELECT * FROM Posts JOIN Comments ...
var postsQuery = (from post in blogDataContext.Posts
                  select post);
foreach (Post post in postsQuery)
{
    // comments were loaded together with the posts, so no extra query per post here
    foreach (Comment comment in post.Comments)
    {
        // print comment...
    }
}
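To make the round-trip arithmetic concrete, here is an in-memory sketch with a hypothetical FakeDb class that only counts calls (it is not a LINQ to SQL API; Post and Comment are simplified hypothetical records). It contrasts the lazy pattern (1 + N calls) with a batched second query that fetches all comments for the loaded posts at once (2 calls). Note that batching is a different technique from the single JOIN that DataLoadOptions.LoadWith produces; both avoid the N extra queries:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified entities.
public record Post(int Id, string Title);
public record Comment(int PostId, string Text);

// Hypothetical fake data source that counts "round trips" to make the N+1 pattern visible.
public class FakeDb
{
    private readonly List<Post> _posts;
    private readonly List<Comment> _comments;
    public int RoundTrips { get; private set; }

    public FakeDb(List<Post> posts, List<Comment> comments)
    {
        _posts = posts;
        _comments = comments;
    }

    public List<Post> GetPosts() { RoundTrips++; return _posts.ToList(); }

    // One call per post: this is what lazy loading does behind the scenes.
    public List<Comment> GetCommentsFor(int postId)
    {
        RoundTrips++;
        return _comments.Where(c => c.PostId == postId).ToList();
    }

    // One call for a whole batch of posts.
    public List<Comment> GetCommentsFor(IEnumerable<int> postIds)
    {
        RoundTrips++;
        var ids = new HashSet<int>(postIds);
        return _comments.Where(c => ids.Contains(c.PostId)).ToList();
    }
}

public static class NPlusOneDemo
{
    // One call for the posts plus one per post: 1 + N round trips.
    public static int CountCommentsLazy(FakeDb db) =>
        db.GetPosts().Sum(p => db.GetCommentsFor(p.Id).Count);

    // One call for the posts plus one batched call for all their comments: 2 round trips.
    public static int CountCommentsBatched(FakeDb db)
    {
        List<Post> posts = db.GetPosts();
        ILookup<int, Comment> byPost = db.GetCommentsFor(posts.Select(p => p.Id)).ToLookup(c => c.PostId);
        return posts.Sum(p => byPost[p.Id].Count());
    }
}
```

With three posts the lazy version makes four calls and the batched version two; the gap grows linearly with the number of posts.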
Next, updating a large number of entities is discussed in Use Statement Batching, and can be achieved with the PLINQO project, which is a set of extensions on top of LINQ to SQL. PLINQO can also store items in the cache as a group: when caching a query result, you just tell PLINQO that it belongs to a group and give the group a name. Grouping really pays off when invalidating the cache, because the code that changes the data doesn't need to know which cached queries depend on it; it just invalidates the group. Check out this example:
public ActionResult MyTasks(int userId)
{
    // there will be a separate cache entry for each user id, all grouped under the name "MyTasks"
    var tasks = db.Task
        .ByAssignedId(userId)
        .ByStatus(Status.InProgress)
        .FromCache(CacheManager.GetProfile().WithGroup("MyTasks"));
    return View(tasks);
}
public ActionResult UpdateTask(Task task)
{
    db.Task.Attach(task, true);
    db.SubmitChanges();
    // since we made an update to the tasks table, we expire the MyTasks cache
    CacheManager.InvalidateGroup("MyTasks");
    // the original snippet returned nothing; return whatever result fits your application
    return new EmptyResult();
}
PLINQO supports the notion of query batching, using a feature called futures, which allows you to take several different queries and send them to the database in a single remote call. This can dramatically reduce the number of remote calls that you make and significantly increase your application's performance.
Correct me if I'm wrong ^_^
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
    var threads =
        from t in db.Threads
        where t.forumid == forumid
        select new ThreadWithPostInfo
        {
            thread = t,
            noReplies = t.Post.Count(),
            lastUpdate = t.Post.OrderByDescending(p => p.date).Select(p => p.date).FirstOrDefault()
        };
    return threads.ToList();
}
It should be something like that.
I think what you're really looking for is this:
var threadsWithPostStats = from t in db.Threads
                           where t.forumid == forumid
                           join p in db.Posts on t.threadid equals p.threadid into j
                           select new { Thread = t, PostCount = j.Count(), LatestPost = j.Max(post => post.date) };
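The group join above can be tried out in memory before pointing it at the database. A minimal LINQ to Objects sketch, assuming hypothetical ThreadRow, PostRow, UserRow and ThreadSummary records that mirror the THREADS, POSTS and USERS tables, computes the post count, the latest post date and the latest poster's name per thread. Threads without posts still appear, with a count of 0 and null for the date and name. One caveat: the null-propagating operator (?.) used here is not allowed inside expression trees, so a LINQ to SQL version has to express the null handling differently:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows mirroring the THREADS, POSTS and USERS tables.
public record ThreadRow(int ThreadId, int ForumId, string Title);
public record PostRow(int PostId, int ThreadId, int UserId, DateTime Date);
public record UserRow(int UserId, string Username);
public record ThreadSummary(ThreadRow Thread, int PostCount, DateTime? LastUpdate, string? LastPosterName);

public static class ForumStats
{
    public static List<ThreadSummary> ListAll(
        int forumId, IEnumerable<ThreadRow> threads, IEnumerable<PostRow> posts, IEnumerable<UserRow> users)
    {
        // Assumes every post's UserId exists in users (referential integrity).
        Dictionary<int, string> namesById = users.ToDictionary(u => u.UserId, u => u.Username);

        return (from t in threads
                where t.ForumId == forumId
                // Group join: every thread gets its (possibly empty) set of posts.
                join p in posts on t.ThreadId equals p.ThreadId into threadPosts
                let latest = threadPosts.OrderByDescending(post => post.Date).FirstOrDefault()
                select new ThreadSummary(
                    t,
                    threadPosts.Count(),
                    latest?.Date,
                    latest == null ? null : namesById[latest.UserId]))
               .ToList();
    }
}
```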
Per your comment and updated question, I'm adding this restatement:
var threadsWithPostsUsers = from t in db.Threads
                            where t.forumid == forumid
                            join p in db.Posts on t.threadid equals p.threadid into threadPosts
                            let latestPostDate = threadPosts.Max(post => post.date)
                            join post in db.Posts
                                on new { ThreadID = t.threadid, PostDate = latestPostDate }
                                equals new { ThreadID = post.threadid, PostDate = post.date }
                                into latestThreadPosts
                            let latestThreadPost = latestThreadPosts.First()
                            join u in db.Users on latestThreadPost.userid equals u.userid
                            select new { Thread = t, LatestPost = latestThreadPost, User = u };
It wouldn't hurt to get familiar with group by in LINQ and with aggregates (Max, Min, Count).
Something like this:
var forums = (from t in db.Threads
              group t by t.forumid into g
              select new { forumid = g.Key, MaxDate = g.Max(d => d.ForumCreateDate) }).ToList();
Also check out this article for how to count items in a LINQ query with group by:
LINQ to SQL using GROUP BY and COUNT(DISTINCT)
LINQ aggregates:
LINQ Aggregate with Sub-Aggregates

LINQ-To-SQL - slow query

I'm just wondering if anyone can offer any advice on how to improve my query.
It basically merges two rows into one. The rows differ only by a 'Type' char column ('S' or 'C') and the Value. What I want is to select one row with both the 'S' value and the 'C' value and calculate the difference (S - C).
My query works, but it's pretty slow: it takes around 8 seconds to get the results, which is not ideal for my application. I wish I could change the database structure, but sadly I can't!
Here is my query:
var sales = (from cm in dc.ConsignmentMarginBreakdowns
             join sl in dc.SageAccounts on new { LegacyID = cm.Customer, Customer = true } equals new { LegacyID = sl.LegacyID, Customer = sl.Customer }
             join ss in dc.SageAccounts on sl.ParentAccount equals ss.ID
             join vt in dc.VehicleTypes on cm.ConsignmentTripBreakdown.VehicleType.Trim() equals vt.ID.ToString() into vtg
             where cm.ConsignmentTripBreakdown.DeliveryDate >= dates.FromDate && cm.ConsignmentTripBreakdown.DeliveryDate <= dates.ToDate
             where (customer == null || ss.SageID == customer)
             where cm.BreakdownType == 'S'
             orderby cm.Depot, cm.TripNumber
             select new
             {
                 NTConsignment = cm.NTConsignment,
                 Trip = cm.ConsignmentTripBreakdown,
                 LegacyID = cm.LegacyID,
                 Costs = dc.ConsignmentMarginBreakdowns
                     .Where(a => a.BreakdownType == 'C' && a.NTConsignment == cm.NTConsignment && a.LegacyID == cm.LegacyID
                              && a.TripDate == cm.TripDate && a.Depot == cm.Depot && a.TripNumber == cm.TripNumber)
                     .Single().Value,
                 Sales = cm.Value ?? 0.00m,
                 Customer = cm.Customer,
                 SageID = ss.SageID,
                 CustomerName = ss.ShortName,
                 FullCustomerName = ss.Name,
                 Vehicle = cm.ConsignmentTripBreakdown.Vehicle ?? "None",
                 VehicleType = vtg.FirstOrDefault().VehicleTypeDescription ?? "Subcontractor"
             });
A good place to start when optimizing Linq to SQL queries is the SQL Server Profiler. There you can find what SQL code is being generated by Linq to SQL. From there, you can toy around with the linq query to see if you can get it to write a better query. If that doesn't work, you can always write a stored procedure by hand, and then call it from Linq to SQL.
There really isn't enough information supplied to make an informed opinion. For example, how many rows in each of the tables? What does the generated T-SQL look like?
One thing I would suggest first is to take the generated T-SQL, produce a query plan, and look for table or index scans.
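In the question's query, the Costs value comes from a correlated .Single() lookup that runs for every 'S' row. One common alternative is to pair the 'S' and 'C' rows with a single join on the full composite key. The in-memory sketch below shows only that pairing technique, using hypothetical simplified MarginRow and MarginPair records; whether the equivalent LINQ to SQL query produces a faster plan is something to confirm in the profiler, as suggested above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for a ConsignmentMarginBreakdown row.
public record MarginRow(string NTConsignment, string LegacyID, DateTime TripDate,
                        string Depot, int TripNumber, char BreakdownType, decimal? Value);

public record MarginPair(string NTConsignment, int TripNumber, decimal Sales, decimal Costs)
{
    public decimal Margin => Sales - Costs;
}

public static class MarginPairing
{
    // Joins each 'S' row to its matching 'C' row on the full composite key,
    // instead of running a correlated .Single() lookup per 'S' row.
    public static List<MarginPair> PairSalesAndCosts(IEnumerable<MarginRow> rows)
    {
        var list = rows.ToList();
        return (from s in list.Where(r => r.BreakdownType == 'S')
                join c in list.Where(r => r.BreakdownType == 'C')
                    on new { s.NTConsignment, s.LegacyID, s.TripDate, s.Depot, s.TripNumber }
                    equals new { c.NTConsignment, c.LegacyID, c.TripDate, c.Depot, c.TripNumber }
                select new MarginPair(s.NTConsignment, s.TripNumber, s.Value ?? 0m, c.Value ?? 0m))
               .ToList();
    }
}
```

Unlike .Single(), an inner join silently drops an 'S' row that has no matching 'C' row; a group join with DefaultIfEmpty() keeps such rows if they must appear in the report.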

Why does this additional join increase # of queries?

I'm having trouble coming up with an efficient LINQ-to-SQL query. I am attempting to do something like this:
from x in Items
select new
{
    Name = x.Name,
    TypeARelated = from r in x.Related
                   where r.Type == "A"
                   select r
}
As you might expect, it produces a single query from the "Items" table, with a left join on the "Related" table. Now if I add another few similar lines...
from x in Items
select new
{
    Name = x.Name,
    TypeARelated = from r in x.Related
                   where r.Type == "A"
                   select r,
    TypeBRelated = from r in x.Related
                   where r.Type == "B"
                   select r
}
The result is that a similar query to the first attempt is run, followed by an individual query to the "Related" table for each record in "Items". Is there a way to wrap this all up in a single query? What would be the cause of this? Thanks in advance for any help you can provide.
Written directly in SQL, the above query would look like this (pseudo-code):
SELECT
    X.NAME AS NAME,
    (CASE R.TYPE WHEN 'A' THEN R ELSE NULL END) AS TypeARelated,
    (CASE R.TYPE WHEN 'B' THEN R ELSE NULL END) AS TypeBRelated
FROM Items AS X
JOIN Related AS R ON <some field>
However, LINQ to SQL is not as efficient: from your explanation, it does one join and then goes back to the database for each record individually. A better way would be to use two LINQ queries similar to your first example, which would generate two SQL queries, and then join the results of those two queries in memory, which would not generate any further SQL. This limits the number of SQL queries executed to two.
If the number of conditions (r.Type == "A", etc.) is going to grow over time, or different conditions are going to be added, you're better off using a stored procedure, which would always be a single SQL query.
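The in-memory half of that suggestion can be sketched as follows, assuming the Items and Related rows have already been materialized (for example with ToList()) and using hypothetical Item, Related and ItemView records. The question elides the join field, so ItemId here is a hypothetical name. All related rows are indexed once by (ItemId, Type), and each type becomes a lookup rather than another query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical records; ItemId stands in for the join field the question leaves out.
public record Item(int Id, string Name);
public record Related(int ItemId, string Type, string Value);
public record ItemView(string Name, List<Related> TypeARelated, List<Related> TypeBRelated);

public static class RelatedSplit
{
    public static List<ItemView> Build(IEnumerable<Item> items, IEnumerable<Related> related)
    {
        // Index all related rows once by (ItemId, Type); no further queries are needed.
        var byItemAndType = related.ToLookup(r => (r.ItemId, r.Type));

        return items
            .Select(x => new ItemView(
                x.Name,
                byItemAndType[(x.Id, "A")].ToList(),
                byItemAndType[(x.Id, "B")].ToList()))
            .ToList();
    }
}
```

Adding another type then means one more lookup in the projection rather than one more query per item.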
Hasanain
You can use eager loading to do a single join on the server to see if that helps. Give this a try.
// DataLoadOptions is in the System.Data.Linq namespace
using (MyDataContext context = new MyDataContext())
{
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<Item>(i => i.Related);
    context.LoadOptions = options;
    // Do your query now.
}
