I need to search through products in a database, and I want to set this up the right way so that I get solid performance when there are a lot of rows (1,000,000). I'm somewhat experienced with LINQ and EF, but I've never written any search algorithms. I've got the following code, but I have a few lingering questions.
context.products.Where(i => i.Name.ToLower().Contains(searchText.ToLower()));
I also need to search the description.
context.products.Where(i => i.Description.ToLower().Contains(searchText.ToLower()));
Is .ToLower() in this context going to reduce performance?
I have a regular index on Name and a full-text index on Description. Is this appropriate, and does a regular index work well with .Contains()?
Should I be using LINQ or some other method?
Is there a way to do this where I can get the number of times the search text occurs in the name/description?
Thank you
I would seriously consider writing a stored procedure to do this, or raw sql in the DbContext using SqlQuery. If I was writing this code, that is what I would do. To me EntityFramework and performant never really went together.
Please do not use ToLower(), since it will significantly impact performance.
Use one of the following:
StringComparison.OrdinalIgnoreCase
StringComparison.CurrentCultureIgnoreCase
Since you're using Contains, which will be translated to LIKE '%text%', it is unlikely that SQL Server will use indexes. If you implement full-text search, you have to use a stored procedure to take advantage of it.
Linq is always slower than a hand-written sql statement. I've seen some performance metrics on dapper.net website
Generally, if you're using the latest entity framework, you should get a pretty good performance since they've had some significant performance improvements in version 5.
In my case I didn't want to use stored procedures. I was using Entity Framework and that's what I wanted to use!
See if this method might help you.
public static IQueryable<T> LikeOr<T>(this IQueryable<T> source, string columnName, string searchTerm)
{
    // Split the phrase into words, ignoring single-character words.
    string[] words = searchTerm
        .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
        .Where(x => x.Length > 1)
        .ToArray();
    var sb = new StringBuilder();
    for (int i = 0; i < words.Length; i++)
    {
        if (i != 0)
            sb.Append(" || ");
        // @0, @1, ... are Dynamic LINQ parameter placeholders.
        sb.Append(string.Format("{0}.Contains(@{1})", columnName, i));
    }
    return source.Where(sb.ToString(), words);
}
All the above method does is build up a predicate string and pass it to a Dynamic LINQ Where method. At a high level, this lets you use a string-based query only where you need it, rather than writing the entire query in SQL.
It's called by something like this:
public List<Book> SearchForBooks(string phrase)
{
    return _db.Books.Include(x => x.Images).LikeOr("Title", phrase).OrderBy(x => x.Title)
        .Take(6).Select(x => x).ToList()
        .OrderByCountDescending("Title", phrase);
}
It's made possible by the Dynamic LINQ library, which was created by Microsoft but is not included in the framework. It will allow you to be more flexible in some areas.
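The question also asks how to get the number of times the search text occurs, and the OrderByCountDescending helper called above is not shown. As a minimal sketch only, assuming the counting happens in memory after the rows are loaded, something like the following could work. The names SearchRanking, CountOccurrences and OrderByMatchCount are hypothetical and are not part of EF or Dynamic LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library mentioned in this thread.
public static class SearchRanking
{
    // Counts non-overlapping, case-insensitive occurrences of term in text.
    public static int CountOccurrences(string text, string term)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(term, index, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            count++;
            index += term.Length; // continue searching after this match
        }
        return count;
    }

    // Orders already-loaded items so that those with the most word matches come first.
    public static List<T> OrderByMatchCount<T>(
        this IEnumerable<T> items, Func<T, string> selector, string phrase)
    {
        string[] words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return items
            .OrderByDescending(item => words.Sum(w => CountOccurrences(selector(item), w)))
            .ToList();
    }
}
```

This runs on the client, so it only makes sense on a small page of results (such as the six rows taken above), not on the full table.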
Related
I need to get the number of records that match a certain filter.
In theory, this statement:
_dbContext.People.Count (w => w.Type == 1);
should generate SQL like:
Select count (*)
from People
Where Type = 1
However, the generated SQL is:
Select Id, Name, Type, DateCreated, DateLastUpdate, Address
from People
Where Type = 1
The query being generated takes much longer to run in a database with many records.
I need to generate the first query.
If I just do this:
_dbContext.People.Count ();
Entity Framework generates the following query:
Select count (*)
from People
.. which runs very fast.
How to generate this second query passing search criteria to the count?
There is not much to answer here. If your ORM tool does not produce the expected SQL query from a simple LINQ query, there is no way you can make it do so by rewriting the query (and you shouldn't be doing that in the first place).
EF Core has a concept of mixed client/database evaluation in LINQ queries which allows them to release EF Core versions with incomplete/very inefficient query processing like in your case.
Excerpt from Features not in EF Core (note the word not) and Roadmap:
Improved translation to enable more queries to successfully execute, with more logic being evaluated in the database (rather than in-memory).
In short, they are planning to improve the query processing, but we don't know when that will happen or to what degree (remember, the mixed mode allows them to consider a query "working").
So what are the options?
First, stay away from EF Core until it becomes really useful. Go back to EF6; it has no such issues.
If you can't use EF6, then stay updated with the latest EF Core version.
For instance, in both v1.0.1 and v1.1.0 your query generates the intended SQL (tested), so you can simply upgrade and the concrete issue will be gone.
But note that along with improvements, new releases introduce bugs/regressions (see, for instance, "EFCore returning too many columns for a simple LEFT OUTER join"), so do that at your own risk (and consider the first option again, i.e. "Which One Is Right for You" :)
Try this lambda expression to make the query execute faster.
_dbContext.People.Select(x => x.Id).Count();
Try this
(from x in _dbContext.People where x.Type == 1 select x).Count();
or you could do the async version of it like:
await (from x in _dbContext.People where x.Type == 1 select x).CountAsync();
and if those don't work out for you, then you could at least make the query more efficient by doing:
(from x in _dbContext.People where x.Type == 1 select x.Id).Count();
or
await (from x in _dbContext.People where x.Type == 1 select x.Id).CountAsync();
If you want to optimize performance and the current EF provider is not (yet) capable of producing the desired query, you can always rely on raw SQL.
Obviously, this is a trade-off as you are using EF to avoid writing SQL directly, but using raw SQL can be useful if the query you want to perform can't be expressed using LINQ, or if using a LINQ query is resulting in inefficient SQL being sent to the database.
A sample raw SQL query would look like this:
var results = _context.People.FromSql("SELECT Id, Name, Type " +
                                      "FROM People " +
                                      "WHERE Type = @p0",
                                      1);
As far as I know, raw SQL queries passed to the FromSql extension method currently require that you return a model type, i.e. returning a scalar result may not yet be supported.
You can however always go back to plain ADO.NET queries:
using (var connection = _context.Database.GetDbConnection())
{
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "SELECT COUNT(*) FROM People WHERE Type = 1";
        var result = command.ExecuteScalar().ToString();
    }
}
It seems that there has been some problem with one of the early releases of Entity Framework Core. Unfortunately you have not specified exact version so I am not able to dig into EF source code to tell what exactly has gone wrong.
To test this scenario, I have installed the latest EF Core package and managed to get correct result.
Here is my test program:
And here is the SQL that gets generated, as captured by SQL Server Profiler:
As you can see it matches all the expectations.
Here is the excerpt from packages.config file:
...
<package id="Microsoft.EntityFrameworkCore" version="1.1.0" targetFramework="net452" />
...
So, in your situation the only solution is to update to the latest package which is 1.1.0 at the time of writing this.
Does this get what you want:
_dbContext.People.Where(w => w.Type == 1).Count();
I am using EFCore 1.1 here.
This can occur if EFCore cannot translate the entire Where clause to SQL. This can be caused by something as simple as DateTime.Now, which you might not even think about.
The following statement results in a SQL query that will surprisingly run a SELECT * and then C# .Count() once it has loaded the entire table!
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > DateTime.Now.AddDays(-7)).Count();
But this query will run an SQL SELECT COUNT(*) as you would expect / hope for:
DateTime earliestDate = DateTime.Now.AddDays(-7);
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template
&& x.SendConfirmedDate > earliestDate).Count();
Crazy but true. Fortunately this also works:
DateTime now = DateTime.Now;
int sentCount = ctx.ScheduledEmail.Where(x => x.template == template &&
x.SendConfirmedDate > now.AddDays(-7)).Count();
sorry for the bump, but...
Probably the reason the query with the where clause is slow is that you didn't provide your database with a fast way to execute it.
In the case of the select count(*) from People query, we don't need to know the actual data for each field, so we can just use a small index that doesn't contain all these fields and avoid spending slow I/O on them. The database software is clever enough to see that the primary key index requires the least I/O to do the count on. The PK IDs require less space than the full row, so you get more entries to count per I/O block and can complete faster.
Now, in the case of the query with the Type, it needs to read the Type to determine its value. You should create an index on Type if you want your query to be fast, or else it will have to do a very slow full table scan, reading all rows. It helps when your values are more discriminating. A column Gender (usually) only has two values and isn't very discriminating; a primary key column where every value is unique is highly discriminating. More discriminating values will result in a shorter index range scan and a faster result for the count.
What I used to count rows using a search query was
_dbContext.People.Where(w => w.Type == 1).Count();
This can also be achieved by
List<People> people = _dbContext.People.Where(w => w.Type == 1).ToList();
int count = people.Count;
This way you will get the people list too if you need it further.
I have a list of strings that are search Queries.
I want to see if a string from the database contains any one of those terms in the query. I'd like to do this in one line of code that doesn't make multiple calls to the database. This should work, but I want it to be more optimized.
var queries = searchQuery.Trim().Split(' ', StringSplitOptions.RemoveEmptyEntries).Distinct();
var query = context.ReadContext.Divisions.AsQueryable();
queries.ForEach(q => {
query = query.Where(d => (d.Company.CompanyCode + "-" + d.Code).Contains(q));
});
Is there a function that can do this better or a more optimal way of writing that?
There are two issues with your proposed solution:
Most LINQ to SQL providers don't understand string.Contains("xyz") so the provider will either throw an exception or fetch all the data to your machine. The right thing to do is to use SqlMethods.Like as explained in Using contains() in LINQ to SQL
Also, the code you show will check whether the division contains all of the specified strings.
To implement the 'any' behavior you need to construct a custom expression, which will not be possible using plain C#. You would need to look at the System.Linq.Expressions namespace: https://msdn.microsoft.com/en-us/library/system.linq.expressions(v=vs.110).aspx
It is possible, but quite involved.
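As a rough sketch of what such a custom expression could look like, the helper below ORs together one Contains call per term using System.Linq.Expressions. AnyTermFilter and ContainsAny are hypothetical names, and whether a given LINQ provider translates the resulting tree to SQL is an assumption that would need checking against the provider in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper; not part of any library mentioned in this thread.
public static class AnyTermFilter
{
    // Builds x => selector(x).Contains(t1) || selector(x).Contains(t2) || ...
    public static Expression<Func<T, bool>> ContainsAny<T>(
        Expression<Func<T, string>> selector, IEnumerable<string> terms)
    {
        var contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });
        Expression body = null;
        foreach (string term in terms)
        {
            // Reuse the selector's body so every call shares the same parameter.
            Expression call = Expression.Call(
                selector.Body, contains, Expression.Constant(term, typeof(string)));
            body = body == null ? call : Expression.OrElse(body, call);
        }
        // With no terms at all, match nothing.
        body = body ?? Expression.Constant(false);
        return Expression.Lambda<Func<T, bool>>(body, selector.Parameters);
    }
}
```

With the question's types, the call might look like query.Where(AnyTermFilter.ContainsAny<Division>(d => d.Company.CompanyCode + "-" + d.Code, queries)), which adds a single Where instead of one per term.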
I would like some suggestions for how to make this simple LINQ code to be as fast and efficient as possible
tbl_WatchList contains 51996 rows
The below test takes 2 secs to run according to VS2012 test explorer
[TestMethod]
public void TestRemoveWatch()
{
    using (var DB = new A4C_2012_devEntities())
    {
        var results = DB.tbl_WatchList.OrderByDescending(x => x.ID).Take(1);
        int WatchID = results.AsEnumerable().First().ID;
        Assert.IsTrue(WatchList.RemoveWatch(WatchID));
    }
}
You don't need to sort whole collection.
int WatchID = DB.tbl_WatchList.Max(wl => wl.ID);
Should be sufficient.
To optimize, do the following:
Use a profiling tool (such as SQL profiler) to see what SQL queries are sent to the database and to see what the real performance is of those queries.
Select the slow-performing queries and analyse their query plans manually, or use an Index Tuning Advisor to see which indexes you are missing.
Add the missing indexes.
I'm learning RavenDB and I am rather confused. As far as I understand, one should create indexes in order to have really efficient queries. However, it is possible to simply make LINQ queries, such as
using(IDocumentSession session = _store.OpenSession())
{
MyDocument doc = session.Query<MyDocument>()
.Where(d => d.Property == value)
.Single();
}
This type of query works perfectly fine. I have, however, never created an index for it (and never reference an index when making the query, of course).
Should I be using this kind of query when working with RavenDB? If not, why is it even available in the API?
There's two things you are asking, here.
Can we use indexes .. which are supposed to be more efficient than dynamic queries?
If we use indexes .. then should we use Linq and chaining?
Indexes
As Matt Warren correctly said, you're not using any indexes in your sample query. Right now, with your sample query, RavenDB is smart enough to create a temp (dynamic) index. If that dynamic index is used enough, it gets auto-promoted to a static / permanent index.
So .. should you use indexes? If you can, then yeah!
Here's your statement again, this time with an index defined.
using(IDocumentSession session = _store.OpenSession())
{
MyDocument doc = session.Query<MyDocument>("ByProperty")
.Where(d => d.Property == value)
.Single();
}
In this case an index called MyDocument_ByProperty was created somewhere. I'm not going to explain the details of indexes .. go and read all about them here.
Linq and chaining
(Not sure if that is the correct terminology ... )
If you create a linq statement (which I did above) with OR without an index .. a query is still generated .. which then is translated into an HTTP RESTful request to the RavenDB Server. If you have an index .. then the query is smart enough to ask to use that. None? Then the server will create a dynamic index .. which means it will also have to go through the motions of indexing first, then retrieving your results.
TL;DR;
Yes use indexes. Yes use Linq chaining.
RavenDb comes with a native support for .Net and Linq.
The Linq provider, under the hood, does normal REST calls to the ravendb server, but for you it's easier to code on it since you can use IQueryable<T> with strongly typed classes.
So yes, you can and you should use linq/lambda to work with RavenDB in a .Net environment.
Something to be aware of that caught me out is that if you include a linq statement such as .Where(d => d.SomeProperty == null) then you might expect that if the document does not have the property then you would return a match. However this is not the case. If the document does not have the property then its value is not considered to be null (or any other value).
I have some trouble with LINQ. In my program I generate a SQL search query like
select * from emp "where empId=1 and empname='abc'"
(where the quoted text is generated in my code). I can pass the generated "where empId..." string text to the SQL query.
I'd like to do the same thing in LINQ - I want to pass this string as the search criteria i.e. something like
var employee=from a in Employee.AsEnumerable()
"where empId=1 and empname='abc'"
select a;
Is this possible? Thanks in advance.
You can take the base query (in your case Employee.AsEnumerable()) and use the logic you use to generate the string to compose a new query. For example:
if (/*your logic for generating the string "where empId=1" here*/)
{
    query = query.Where(a => a.empId == 1);
}
if (/*your logic for generating the string "empname='abc'" here*/)
{
    query = query.Where(a => a.empname == "abc");
}
The resulting query object will have all the operators composed. However as others have said this is not trivial in the general case. It is not trivial with SQL strings either. If all you need to generate are several filters it will work but if you need complex expressions it will be a problem.
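To make the composition concrete, here is a small self-contained sketch that runs against an in-memory list. The Employee class, the EmployeeSearch.Filter method and its optional parameters are assumptions made for illustration; the question does not show the real types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity standing in for the question's emp table.
public class Employee
{
    public int EmpId { get; set; }
    public string EmpName { get; set; }
}

public static class EmployeeSearch
{
    // Each filter is optional; null (or empty) means "do not filter on this field".
    public static IQueryable<Employee> Filter(
        IQueryable<Employee> source, int? empId, string empName)
    {
        IQueryable<Employee> query = source;
        if (empId.HasValue)
        {
            int id = empId.Value;
            query = query.Where(e => e.EmpId == id);
        }
        if (!string.IsNullOrEmpty(empName))
        {
            query = query.Where(e => e.EmpName == empName);
        }
        // Nothing has executed yet; the filters run when the query is enumerated.
        return query;
    }
}
```

Calling EmployeeSearch.Filter(employees.AsQueryable(), 1, "abc") applies both conditions, while passing null for both returns the source unfiltered.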
101 LINQ Samples
It's pretty hard, unless you intend to employ:
Dynamic code compilation, or
You are willing to create a (very complicated) parser to analyze the query and call the respective linq extension methods
I personally have no experience in the latter. As for the former, it is a bit tricky and can go nastily wrong if you don't do proper caching and security checks. Executable code injection is very dangerous.
I think you had better use different methods to filter content using methods like Where() if the number of queries can be predetermined or return to SQL if not. Usually you don't need to do this unless the query is manually entered by the user.