Search method joining multiple tables - c#

I need help with a search method that searches several tables for matching text.
This works, except that the joins need to be LEFT OUTER JOINs; otherwise I don't get any results if the pageId is missing in any of the tables.
This solution also takes too long to run. I would appreciate it if someone could help me with a better way to handle this task.
public async Task<IEnumerable<Result>> Search(string query)
{
var temp = await (from page in _context.Pages
join pageLocation in _context.PageLocations on page.Id equals pageLocation.PageId
join location in _context.Locations on pageLocation.LocationId equals location.Id
join pageSpecialty in _context.PageSpecialties on page.Id equals pageSpecialty.PageId
join specialty in _context.Specialties on pageSpecialty.SpecialtyId equals specialty.Id
where
page.Name.ToLower().Contains(query)
|| location.Name.ToLower().Contains(query)
|| specialty.Name.ToLower().Contains(query)
select new Result
{
PageId = page.Id,
Name = page.Name,
Presentation = page.Presentation,
Rating = page.Rating
}).ToListAsync();
var results = new List<Result>();
foreach (var t in temp)
{
if (!results.Exists(p => p.PageId == t.PageId))
{
t.Locations = GetLocations(t.PageId);
t.Specialties = GetSpecialties(t.PageId);
results.Add(t);
}
}
return results;
}

Using navigation properties, the query could look like:
var temp = await (from page in _context.Pages
where page.Name.Contains(query)
|| page.PageLocation.Any(pl => pl.Location.Name.Contains(query))
|| page.PageSpecialties.Any(pl => pl.Specialty.Name.Contains(query))
select new Result
{
PageId = page.Id,
Name = page.Name,
Presentation = page.Presentation,
Rating = page.Rating,
Locations = page.PageLocation.Select(pl => pl.Location),
Specialties = page.PageSpecialties.Select(pl => pl.Specialty)
}).ToListAsync();
This has several benefits:
Because there are no joins, the query returns unique Result objects right away, so you don't need to deduplicate them afterwards.
The locations and specialties are loaded in the same query instead of two queries per Result (the n+1 problem).
(Likely) ToLower is removed because the search is probably not case sensitive anyway. The query is executed as SQL, and SQL databases often use case-insensitive collations. Removing ToLower also stops the column from being wrapped in a function call, which is one of the things that prevents index use.
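To illustrate the first point, here is a minimal in-memory sketch of why inner joins produce duplicate pages (and drop pages without a location), while an Any() filter yields each page once. The record types and sample data are hypothetical stand-ins for the real entities, and the nested Any() calls stand in for navigation properties. It runs as LINQ to Objects, so Contains is case sensitive here, unlike a typical SQL collation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Pages, Locations and PageLocations tables.
public record Page(int Id, string Name);
public record Location(int Id, string Name);
public record PageLocation(int PageId, int LocationId);

public static class SearchDemo
{
    static readonly List<Page> Pages = new() { new(1, "Clinic North"), new(2, "Solo Practice") };
    static readonly List<Location> Locations = new() { new(10, "North Town"), new(11, "Northfield") };
    static readonly List<PageLocation> PageLocations = new() { new(1, 10), new(1, 11) };

    // Inner joins: one row per matching (page, location) pair,
    // and pages without any location are dropped entirely.
    public static List<int> SearchWithJoins(string query) =>
        (from page in Pages
         join pl in PageLocations on page.Id equals pl.PageId
         join loc in Locations on pl.LocationId equals loc.Id
         where page.Name.Contains(query) || loc.Name.Contains(query)
         select page.Id).ToList();

    // Any(): at most one row per page, with or without related rows.
    public static List<int> SearchWithAny(string query) =>
        (from page in Pages
         where page.Name.Contains(query)
               || PageLocations.Any(pl => pl.PageId == page.Id
                      && Locations.Any(l => l.Id == pl.LocationId && l.Name.Contains(query)))
         select page.Id).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SearchWithJoins("North"))); // 1,1
        Console.WriteLine(string.Join(",", SearchWithAny("North")));   // 1
        Console.WriteLine(string.Join(",", SearchWithJoins("Solo")).Length); // 0 (page 2 has no location)
        Console.WriteLine(string.Join(",", SearchWithAny("Solo")));    // 2
    }
}
```

The last two lines show the behaviour the question complains about: the inner-join version loses page 2 because it has no PageLocation row, while the Any() version still finds it by name.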

Related

How to improve performance when joining List and Linq object

I have one list, read from file:
var lsData = ReadExcelFile<CustomerEntity>(path);
And one Object (loaded into memory):
lsCustomer = await CustomerService.GetAll()
.Where(c => c.isDeleted == null || !c.isDeleted.Value)
.OrderBy(c=> c.Code)
.ToListAsync();
And the join command:
var lsDuplicateEmail =
(from imp in lsData
join cust in lsCustomer
on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
into gjoin
from g in gjoin.DefaultIfEmpty()
select new
{
ImportItem = imp,
CustomerItem = g,
}
into result
where !string.IsNullOrEmpty(result.ImportItem.Email) && result.CustomerItem != null
&& !ImportHelpers.CompareString(result.ImportItem.Code, result.CustomerItem.Code)
select result);
var lsDuplicateEmailInSystem = lsDuplicateEmail.Select(c => c.ImportItem.Code).Distinct().ToList();
I tested with about 2,000 records in lsData and about 200k records in lsCustomer.
The Customer Email field is not indexed in the DB.
The join takes about 10 s to execute (even though the result is 0 records), which is too slow.
I've looked around and can't find a way to index the email field in lsCustomer. I believe the slowness comes from the O(n*m) complexity.
Is there any way to improve performance?
Try the following code. Instead of GroupJoin, which is not needed here, I have used Join. I have also moved the filters up in the query.
var lsDuplicateEmail =
from imp in lsData
where !string.IsNullOrEmpty(imp.Email)
join cust in lsCustomer
on ImportHelpers.GetPerfectStringWithoutSpace(imp.Email) equals ImportHelpers.GetPerfectStringWithoutSpace(cust.Email)
where !ImportHelpers.CompareString(imp.Code, cust.Code)
select new
{
ImportItem = imp,
CustomerItem = cust,
};
Also, please show the GetPerfectStringWithoutSpace implementation; it may be slow.
Another possible solution is to swap lsData and lsCustomer in the query; the lookup search may not be that fast.
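As a rough sketch of the same idea outside query syntax: normalize each customer email exactly once, index the customers with ToLookup, and probe that index for every imported row. Normalize below is a hypothetical stand-in for ImportHelpers.GetPerfectStringWithoutSpace, and the case-insensitive code comparison is an assumption about what ImportHelpers.CompareString does; neither implementation is shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of CustomerEntity.
public record Customer(string Code, string Email);

public static class DuplicateEmailDemo
{
    // Hypothetical stand-in for ImportHelpers.GetPerfectStringWithoutSpace.
    static string Normalize(string s) =>
        string.IsNullOrEmpty(s) ? "" : s.Replace(" ", "").ToLowerInvariant();

    public static List<string> FindDuplicateCodes(
        IEnumerable<Customer> imported, IEnumerable<Customer> existing)
    {
        // Normalize each existing email once and index the customers by it.
        var byEmail = existing
            .Where(c => !string.IsNullOrEmpty(c.Email))
            .ToLookup(c => Normalize(c.Email));

        return imported
            .Where(imp => !string.IsNullOrEmpty(imp.Email))
            // A missing key yields an empty sequence, so Any() is simply false.
            .Where(imp => byEmail[Normalize(imp.Email)]
                .Any(c => !string.Equals(c.Code, imp.Code, StringComparison.OrdinalIgnoreCase)))
            .Select(imp => imp.Code)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        var existing = new[] { new Customer("C1", "a@x.com"), new Customer("C2", "b@x.com") };
        var imported = new[]
        {
            new Customer("C1", "A@X.com "), // same customer, not a duplicate
            new Customer("C9", " b@x.com"), // email already used by C2
            new Customer("C7", "")          // no email, skipped
        };
        Console.WriteLine(string.Join(",", FindDuplicateCodes(imported, existing))); // C9
    }
}
```

Enumerable.Join already builds a hash lookup over its inner sequence, so the join itself should not be O(n*m); if the time is still around 10 s, the normalization helper is the more likely cost, which is why it is worth measuring on its own.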

Linq Join query returning empty dataset

I am using the code below to join two tables on the officeId field. It's returning 0 records.
IQueryable<Usage> usages = this.context.Usage;
usages = usages.Where(usage => usage.OfficeId == officeId);
var agencyList = this.context.Agencies.ToList();
var usage = usages.ToList();
var query = usage.Join(agencyList,
r => r.OfficeId,
a => a.OfficeId,
(r, a) => new UsageAgencyApiModel () {
Id = r.Id,
Product = r.Product,
Chain = a.Chain,
Name = a.Name
}).ToList();
I have 1000+ records in the agencies table and 26 records in the usage table.
I am expecting 26 records as a result, with the chain and name columns from the agency table attached.
It's not returning anything. I am new to .NET; please tell me if I am missing anything.
EDIT
@Tim Schmelter's solution works fine if I use both tables directly from the context when executing the join. But I need to add a filter on the usage table before applying the join:
IQueryable<Usage> usages = this.context.Usage;
usages = usages.Where(usage => usage.OfficeId == officeId);
var query = from a in usages
// works with this.context.usages instead of usages
join u in this.context.Agencies on a.OfficeId equals u.OfficeId
select new
{
Id = a.Id,
Product = a.Product,
Chain = u.Chain,
Name = u.Name
};
return query.ToList();
Attaching screenshot here
The same join query works fine with in-memory data, as you see below.
Both ways work fine if I use in-memory data sources or both data sources directly. It does not work if I filter usages by officeId before applying the join query.
One problem is that you load everything into memory first (ToList()).
With joins I prefer query syntax; it is less verbose:
var query = from a in this.context.Agencies
join u in this.context.Usage on a.OfficeId equals u.OfficeId
select new UsageAgencyApiModel()
{
Id = u.Id,
Product = u.Product,
Chain = a.Chain,
Name = a.Name
};
List<UsageAgencyApiModel> resultList = query.ToList();
Edit: You should be able to apply the Where after the Join. If you still don't get records, there are no matching rows:
var query = from a in this.context.Agencies
join u in this.context.Usage on a.OfficeId equals u.OfficeId
where u.OfficeId == officeId
select new UsageAgencyApiModel{ ... };
The following code gets the output filtered by the office ID value.
I wrote it with lambda (method) syntax.
var officeId = 1;
var query = context.Agencies // your starting point - table in the "from" statement
.Join(context.Usage, // the source table of the inner join
agency => agency.OfficeId, // Select the primary key (the first part of the "on" clause in an sql "join" statement)
usage => usage.OfficeId, // Select the foreign key (the second part of the "on" clause)
(agency, usage) => new { Agency = agency, Usage = usage }) // selection
.Where(x => x.Agency.OfficeId == officeId); // where statement

Is it possible to incrementally add Where Clauses in an EF Query containing a Join?

Aim
I'd like to filter EF queries dynamically based on joined models (/tables in SQL thinking) as I already can with a single model.
Working Code with a Single Model
Given a searchParam I know the type of where clauses I need to do based on its searchId.
var initialSet = db.Files.AsQueryable(); // typed as IQueryable so the Where results below can be assigned back
foreach (var searchParam in searchParams)
{
if (searchParam.SearchId == "Id")
{
var searchContent = (int)Convert.ChangeType(searchParam.SearchContent, typeof(int));
initialSet = initialSet.Where(m => m.Id == searchContent);
}
else if (searchParam.SearchId == "Name")
{
initialSet = initialSet.Where(m => m.Name == searchParam.SearchContent);
}
else if (searchParam.SearchId == "ImportDate")
{
var searchContent = DateTime.Parse(searchParam.SearchContent);
initialSet = initialSet.Where(m => m.ImportDate == searchContent);
}
}
return initialSet;
Problem with Joined Models
Without thinking about clauses for the moment, I have this initial join in mind.
var results = (from r1 in db.Runs
join rf1 in db.RunFiles on r1 equals rf1.Run
select new { r1, rf1 });
However when taking into account where clauses, since the join requires a select to occur, I can't simply join the tables, iterate the for loop like above and then select after. If there is a way to pass multiple prepared where clauses to a single query, I'm unaware of it.
I'd like to handle 0-to-many search parameters with these joined models. Any ideas?
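One possible approach, sketched below under stated assumptions: the join's select does not have to be the last step. If the projection keeps both entities in an anonymous type and the query is held in a var, Where can be applied to it repeatedly, just as in the single-model loop. The Run and RunFile shapes, the RunId key, and the "RunName" / "FileName" search ids are hypothetical; in-memory arrays with AsQueryable() stand in for db.Runs and db.RunFiles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal entity shapes.
public record Run(int Id, string Name);
public record RunFile(int RunId, string FileName);
public record SearchParam(string SearchId, string SearchContent);

public static class JoinedFilterDemo
{
    public static List<string> Search(
        IQueryable<Run> runs, IQueryable<RunFile> runFiles, IEnumerable<SearchParam> searchParams)
    {
        // 'var' keeps the anonymous element type, so Where results can be assigned back.
        var results = from r in runs
                      join rf in runFiles on r.Id equals rf.RunId
                      select new { Run = r, File = rf };

        foreach (var p in searchParams)
        {
            var content = p.SearchContent; // local copy captured by the lambda
            if (p.SearchId == "RunName")
                results = results.Where(x => x.Run.Name == content);
            else if (p.SearchId == "FileName")
                results = results.Where(x => x.File.FileName == content);
        }

        // Nothing executes until the query is enumerated here.
        return results.Select(x => x.Run.Name + "/" + x.File.FileName).ToList();
    }

    public static void Main()
    {
        var runs = new[] { new Run(1, "alpha"), new Run(2, "beta") }.AsQueryable();
        var files = new[] { new RunFile(1, "a.csv"), new RunFile(1, "b.csv"), new RunFile(2, "a.csv") }.AsQueryable();

        var filters = new[] { new SearchParam("RunName", "alpha"), new SearchParam("FileName", "a.csv") };
        Console.WriteLine(string.Join(",", Search(runs, files, filters)));                  // alpha/a.csv
        Console.WriteLine(Search(runs, files, Array.Empty<SearchParam>()).Count);           // 3
    }
}
```

With zero search parameters the loop adds no filters and the plain join is returned, so the 0-to-many case falls out naturally. Against a real EF context the chained Where calls should compose into a single SQL statement, though that is worth confirming in the generated SQL.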

Updating a property of an object using LINQ when value needed is from db

How do you suppose I tackle this? Basically, I have this initial query:
var orders = (from order in _dbContext.Orders
join orderDetail in _dbContext.OrderDetails on order.ID equals orderDetail.OrderID
where order.StoreID == storeID
select new Order
{
ID = order.ID,
No = order.ID,
Type = "", // Notice that this is empty; this one needs updating
Quantity = order.Quantity,
// more properties here
}).AsQueryable();
After this query, I need to loop through the result and update the Type property based on different criteria like this:
string type = "";
foreach (OrderDetailDto order in orders)
{
if (order.UserID != null)
type = "UserOrder";
else if (order.UserID == null)
type = "NonUserOrder";
else if (order.Cook == null && (order.Option == "fiery"))
type = "FieryCook";
else if (check if this has corresponding records in another table) // this part I don't know how to effectively tackle
type = "XXX";
// Update.
order.Type = type;
}
The problem is that one of my criteria requires checking whether there are existing records in the database. I could use a JOIN, but if I have to loop through several hundred or several thousand records and run a JOIN against the db for each one just to get a single value, I think that would be very slow.
I can't do the JOIN in the initial query because I might do a different JOIN based on a different criterion. Any ideas?
You could just join all the lookup tables you might possibly need, in a left-join fashion:
from o in Orders
from c in Cooks.Where(x => x.OrderId == o.OrderId).DefaultIfEmpty()
from u in Users.Where(x => x.OrderId == o.OrderId).DefaultIfEmpty()
select new
{
Order = o,
Cook = c,
User = u
}
Or, depending on your usage patterns, you could build the required tables into local Lookups or Dictionaries for constant-time lookups thereafter:
var userDict = Users.ToDictionary(x => x.UserId);
var userIdDict = Users.Select(x => x.UserId).ToDictionary(x => x);
var cooksLookup = Cooks.ToLookup(x => x.Salary);
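A small sketch of how such a local lookup could replace a per-row database check inside the loop. The Order and User shapes and the "UserOrder" / "NonUserOrder" rule are simplified, hypothetical versions of the question's criteria; the point is only that the related table is read once and then probed in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes for the two tables.
public record Order(int OrderId, string Option);
public record User(int UserId, int OrderId);

public static class LookupDemo
{
    public static Dictionary<int, string> Classify(IEnumerable<Order> orders, IEnumerable<User> users)
    {
        // Built once: one pass over the users, keyed by the order they belong to.
        var usersByOrder = users.ToLookup(u => u.OrderId);

        var types = new Dictionary<int, string>();
        foreach (var order in orders)
        {
            // ILookup.Contains is a hash probe, not a scan and not a database call.
            types[order.OrderId] = usersByOrder.Contains(order.OrderId) ? "UserOrder" : "NonUserOrder";
        }
        return types;
    }

    public static void Main()
    {
        var orders = new[] { new Order(1, "fiery"), new Order(2, "mild") };
        var users = new[] { new User(100, 1) };
        foreach (var kv in Classify(orders, users))
            Console.WriteLine($"{kv.Key}={kv.Value}"); // 1=UserOrder then 2=NonUserOrder
    }
}
```

This trades memory for round trips, so it fits best when the related table is small enough to load or can be restricted to the order ids in play.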

Linq To Sql - return table result and count

I'm very new to LINQ to SQL and in need of a little assistance.
Basically I'm building a message board in C#. I have 3 database tables; the basic info is as follows.
FORUMS: forumid, name
THREADS: threadid, forumid, title, userid
POSTS: postid, threadid, text, userid, date
Basically I want to bring back everything I need in one query. I want to list a page of THREADS (for a particular FORUM) and, in each THREAD row, also display the number of POSTS in that THREAD and when its last POST was made.
At the moment I'm getting back all THREADS, then looping through the result set and making separate calls to the POSTS table for each thread's post count and latest post. Obviously this will cause problems in terms of hitting the database as the message board gets bigger.
My Linq To SQL so far:
public IList<Thread> ListAll(int forumid)
{
var threads =
from t in db.Threads
where t.forumid == forumid
select t;
return threads.ToList();
}
Basically I now need to get the number of POSTS in each thread and the date of the last post in each thread.
Any help would be most appreciated :)
EDIT
Hi guys. Thanks for your help so far. Basically I'm almost there. However, I left an important part out of my initial question: I need to retrieve the user name of the person who made the last POST. Therefore I need to join p.userid with u.userid on the USERS table. So far I have the following, but I just need to amend it to join the POSTS table with the USERS table:
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
var threads = (from t in db.Threads
where t.forumid == forumid
join p in db.Posts on t.threadid equals p.threadid into j
select new ThreadWithPostInfo() { thread = t, noReplies = j.Count(), lastUpdate = j.Max(post => post.date) }).ToList();
return threads;
}
UPDATE:
public IList<ThreadWithPostInfo> ListAll(int forumid)
{
var threads = (from t in db.Threads
from u in db.Users
where t.forumid == forumid && t.hide == "No" && t.userid == u.userid
join p in db.Posts on t.threadid equals p.threadid into j
select new ThreadWithPostInfo() { thread = t, deactivated = u.deactivated, lastPostersName = j.OrderByDescending(post => post.date).FirstOrDefault().User.username, noReplies = j.Count(), lastUpdate = j.Max(post => post.date) }).ToList();
return threads;
}
I finally figured that part out, thanks to all of you guys :). My only problem now is the Search Results method. At the moment it looks like this:
public IList<Thread> SearchThreads(string text, int forumid)
{
var searchResults = (from t in db.Threads
from p in db.Posts
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
&& p.threadid == t.threadid
&& t.forumid == forumid
select t).Distinct();
return searchResults.ToList();
}
Note that I need to get this where clause into the new LINQ code:
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
Any help is gratefully received :)
SOLUTION:
I figured out a solution, but I don't know if it's the best or most efficient one. Maybe you guys can tell me, because I'm still getting my head around LINQ. James, I think your answer was closest and got me near to where I wanted to be - thanks :)
public IList<ThreadWithPostInfo> SearchThreads(string text, int forumid)
{
var searchResults = (from t in db.Threads
from p in db.Posts
where (t.title.Contains(text) || p.text.Contains(text)) && t.hide == "No"
&& p.threadid == t.threadid
&& t.forumid == forumid
select t).Distinct();
//return searchResults.ToList();
var threads = (from t in searchResults
join p in db.Posts on t.threadid equals p.threadid into j
select new ThreadWithPostInfo() { thread = t, lastPostersName = j.OrderByDescending(post => post.date).FirstOrDefault().User.username, noReplies = j.Count(), lastUpdate = j.Max(post => post.date) }).ToList();
return threads;
}
You may be making too many database calls per session.
Calling the database, whether to query or to write, is a remote call, and we want to reduce the number of remote calls as much as possible. This warning is raised when the profiler notices that a single session is making an excessive number of calls to the database. This is usually an indication of a potential optimization in the way the session is used.
There are several reasons why this can be:
A large number of queries as a result of a Select N + 1
Calling the database in a loop
Updating (or inserting / deleting) a large number of entities
A large number of (different) queries that we execute to perform our task
For the first reason, you can see the suggestions for Select N + 1. Select N + 1 is a data access anti-pattern where the database is accessed in a suboptimal way. Take a look at this code sample :
// SELECT * FROM Posts
var postsQuery = from post in blogDataContext.Posts
select post;
foreach (Post post in postsQuery)
{
//lazy loading of comments list causes:
// SELECT * FROM Comments where PostId = @p0
foreach (Comment comment in post.Comments)
{
//print comment...
}
}
In this example, we can see that we are loading a list of posts (the first select) and then traversing the object graph. However, we access the collection in a lazy fashion, causing Linq to Sql to go to the database and bring the results back one row at a time. This is incredibly inefficient, and the Linq to Sql Profiler will generate a warning whenever it encounters such a case.
The solution for this example is simple. Force an eager load of the collection using the DataLoadOptions class to specify what pieces of the object model we want to load upfront.
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Post>(p => p.Comments);
blogDataContext.LoadOptions = loadOptions;
// SELECT * FROM Posts JOIN Comments ...
var postsQuery = (from post in blogDataContext.Posts
select post);
foreach (Post post in postsQuery)
{
// comments were loaded eagerly, so no extra query is issued here
foreach (Comment comment in post.Comments)
{
//print comment...
}
}
Next, updating a large number of entities is discussed in Use Statement Batching, and can be achieved by using the PLINQO project, which is a set of extensions on top of LINQ to SQL.
PLINQO can also store items in the cache as a group. When storing items in the cache, just tell PLINQO that the query result needs to belong to a group and specify the group name. Invalidating the cache is where grouping really pays off: the cache and the actions taken on it are not coupled when they are in a group. Check out this example:
public ActionResult MyTasks(int userId)
{
// will be separate cache for each user id, group all with name MyTasks
var tasks = db.Task
.ByAssignedId(userId)
.ByStatus(Status.InProgress)
.FromCache(CacheManager.GetProfile().WithGroup("MyTasks"));
return View(tasks);
}
public ActionResult UpdateTask(Task task)
{
db.Task.Attach(task, true);
db.SubmitChanges();
// since we made an update to the tasks table, we expire the MyTasks cache
CacheManager.InvalidateGroup("MyTasks");
}
PLinqO supports the notion of query batching, using a feature called futures, which allow you to take several different queries and send them to the database in a single remote call. This can dramatically reduce the number of remote calls that you make and increase your application performance significantly.
cmiiw ^_^
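public IList<ThreadWithPostInfo> ListAll(int forumid)
{
// Assumes Thread exposes a Post collection (navigation property) and
// reuses the ThreadWithPostInfo type from the question's edit.
var threads =
from t in db.Threads
where t.forumid == forumid
select new ThreadWithPostInfo
{
thread = t,
noReplies = t.Post.Count(),
lastUpdate = t.Post.OrderByDescending(p => p.date).Select(p => p.date).FirstOrDefault()
};
return threads.ToList();
}
It should be something like that.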
I think what you're really looking for is this:
var threadsWithPostStats = from t in db.Threads
where t.forumid == forumid
join p in db.Posts on t.threadid equals p.threadid into j
select new { Thread = t, PostCount = j.Count(), LatestPost = j.Max(post => post.date) };
Per your comment and updated question, I'm adding this restatement:
var threadsWithPostsUsers = from t in db.Threads
where t.forumid == forumid
join p in db.Posts on t.threadid equals p.threadid into threadPosts
let latestPostDate = threadPosts.Max(post => post.date)
join post in db.Posts on new { ThreadID = t.threadid, PostDate = latestPostDate } equals new { ThreadID = post.threadid, PostDate = post.date} into latestThreadPosts
let latestThreadPost = latestThreadPosts.First()
join u in db.Users on latestThreadPost.userid equals u.userid
select new { Thread = t, LatestPost = latestThreadPost, User = u };
Wouldn't hurt to get familiar with group by in LINQ and aggregates (Max, Min, Count).
Something like this:
var forums = (from t in db.Threads
group t by t.forumid into g
select new { forumid = g.Key, MaxDate = g.Max(d => d.ForumCreateDate) }).ToList();
Also check out this article for how to count items in a LINQ query with group by:
LINQ to SQL using GROUP BY and COUNT(DISTINCT)
LINQ aggregates:
LINQ Aggregate with Sub-Aggregates
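For reference, a minimal in-memory sketch of group by combined with Count and Max, applied to the thread and post shape from the question. The Post record and the sample dates are hypothetical, and this runs as LINQ to Objects rather than LINQ to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of a POSTS row.
public record Post(int PostId, int ThreadId, DateTime Date);

public static class GroupByDemo
{
    // One result row per thread that has at least one post.
    public static List<(int ThreadId, int PostCount, DateTime LastPost)> Stats(IEnumerable<Post> posts) =>
        (from p in posts
         group p by p.ThreadId into g
         orderby g.Key
         select (ThreadId: g.Key, PostCount: g.Count(), LastPost: g.Max(x => x.Date))).ToList();

    public static void Main()
    {
        var posts = new[]
        {
            new Post(1, 7, new DateTime(2020, 1, 1)),
            new Post(2, 7, new DateTime(2020, 3, 1)),
            new Post(3, 9, new DateTime(2020, 2, 1))
        };
        foreach (var s in Stats(posts))
            Console.WriteLine($"{s.ThreadId}: {s.PostCount} posts, last {s.LastPost:yyyy-MM-dd}");
        // 7: 2 posts, last 2020-03-01
        // 9: 1 posts, last 2020-02-01
    }
}
```

Note that a plain group by over posts never produces a row for a thread with zero posts, which is one reason the answers above use a group join (join ... into j) starting from the threads instead.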
