I would like some suggestions for making this simple LINQ code as fast and efficient as possible.
tbl_WatchList contains 51,996 rows.
The test below takes 2 seconds to run according to the VS2012 Test Explorer.
[TestMethod]
public void TestRemoveWatch()
{
    using (var DB = new A4C_2012_devEntities())
    {
        var results = DB.tbl_WatchList.OrderByDescending(x => x.ID).Take(1);
        int WatchID = results.AsEnumerable().First().ID;
        Assert.IsTrue(WatchList.RemoveWatch(WatchID));
    }
}
You don't need to sort the whole collection.
int WatchID = DB.tbl_WatchList.Max(wl => wl.ID);
Should be sufficient.
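For illustration, a minimal sketch of the rewritten test, reusing the question's own context and names; Max should translate to a single aggregate query instead of sorting all 51,996 rows:

[TestMethod]
public void TestRemoveWatch()
{
    using (var DB = new A4C_2012_devEntities())
    {
        // Expected to translate to roughly: SELECT MAX([ID]) FROM tbl_WatchList
        int WatchID = DB.tbl_WatchList.Max(wl => wl.ID);
        Assert.IsTrue(WatchList.RemoveWatch(WatchID));
    }
}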
To optimize, do the following:
Use a profiling tool (such as SQL profiler) to see what SQL queries are sent to the database and to see what the real performance is of those queries.
Select the slow-performing queries and analyse their query plans manually, or use an Index Tuning Advisor to see what indexes you are missing.
Add the missing indexes.
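If you don't have SQL Profiler handy, one quick way to see the SQL a query will generate is to print the query itself. This is a sketch, assuming the EF DbContext API, where ToString() on a DbQuery renders the SQL; on the older ObjectQuery API it's ToTraceString():

var query = DB.tbl_WatchList.OrderByDescending(x => x.ID).Take(1);
Console.WriteLine(query.ToString()); // prints the SELECT statement EF will send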
I'm coding an application with Entity Framework in which I rely heavily on user-defined functions.
I have a question about the best (most optimized) way to limit and page my result sets. Basically I am wondering whether these two options are equivalent, or whether one is preferred performance-wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page: 0, size: 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (@size * @page) ROWS FETCH NEXT @size ROWS ONLY
To me these seem to be producing about the same result, but maybe I am missing some key aspect.
You'll have to be aware of when your LINQ statement is AsEnumerable and when it is AsQueryable. As long as your statement is an IQueryable<...>, the software will try to translate it into SQL and let your database do the query. Once it has really lost the IQueryable and become an implementation of IEnumerable, the data has been brought into local memory, and all further LINQ statements are performed by your process, not by the database.
If you use your debugger, you will see that fn_GetData returns an IEnumerable. This means that the result of fn_GetData is brought into local memory, and your OrderBy etc. is performed by your process.
Usually it is much more efficient to move only the records you will actually use into local memory. Besides: do not fetch the complete records, only the properties you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use.
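A minimal sketch of what that could look like, assuming the underlying table is exposed as an IQueryable set; the set and property names here are hypothetical:

// Stays IQueryable throughout, so the ordering, paging and projection are
// all translated to SQL and only 100 narrow rows travel to local memory.
var result = _DB.Data
    .OrderBy(x => x.Id)
    .Skip(page * 100)
    .Take(100)
    .Select(x => new { x.Id, x.Name }) // only the properties you plan to use
    .ToList();                         // the SQL executes here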
I suggest the second option, because SQL Server can do this faster than C# methods.
With the first option, you pull all of the records in the table and loop through them. With the second option, SQL Server does the work for you and you get back only what you want.
You should push the limiting and WHERE clauses down to the database as far as possible (how much this helps depends on the table's indexes). For the first example:
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from the database into memory, which kills performance and reliability. I strongly advise against it. You should consider adding limitations to filter the records on the database side. So the second option is the better approach in this case.
I'm dumping a table out of MySQL into a DataTable object using MySqlDataAdapter. Database input and output is doing fine, but my application code seems to have a performance issue I was able to track down to a specific LINQ statement.
The goal is simple, search the contents of the DataTable for a column value matching a specific string, just like a traditional WHERE column = 'text' SQL clause.
Simplified code:
foreach (String someValue in someList) {
    String searchCode = OutOfScopeFunction(someValue);
    var results = emoteTable.AsEnumerable()
        .Where(myRow => myRow.Field<String>("code") == searchCode)
        .Take(1);
    if (results.Any()) {
        results.First()["columnname"] = 10;
    }
}
This simplified code is executed thousands of times, once for each entry in someList. When I run Visual Studio Performance Profiler I see that the "results.Any()" line is highlighted as consuming 93.5% of the execution time.
I've tried several different methods for optimizing this code, but none have improved performance while keeping the emoteTable DataTable as the primary source of the data. I can convert emoteTable to a Dictionary<String, DataRow> outside of the foreach, but then I have to keep the DataTable and the Dictionary in sync, which, while still a performance improvement, feels wrong.
Three questions:
Is this the proper way to search for a value in a DataTable (equivalent of a traditional SQL WHERE clause)? If not, how SHOULD it be done?
Addendum to 1, regardless of the proper way, what is the fastest (execution time)?
Why does the results.Any() line consume 90%+ of the resources? In this situation it would make more sense for the var results line to consume the resources; after all, it's the line doing the actual search, right?
Thank you for your time. If I find an answer I shall post it here as well.
Any() is taking 90% of the time because the query is only executed when you call Any(). Before you call Any(), the query has not actually been run.
It would seem the problem is that you first fetch the entire table into memory and then search it. You should instruct your database to do the searching.
Moreover, when you call results.First(), the whole results query is executed again.
With deferred execution in mind, you should write something like
var result = emoteTable.AsEnumerable()
    .Where(myRow => myRow.Field<String>("code") == searchCode)
    .FirstOrDefault();
if (result != null) {
    result["columnname"] = 10;
}
What you have implemented is pretty much a join:
var searchCodes = someList.Select(OutOfScopeFunction);
var emotes = emoteTable.AsEnumerable();
var results = Enumerable.Join(emotes, searchCodes,
    e => e.Field<String>("code"), // key selector for the data rows
    sc => sc,                     // key selector for the search codes
    (e, sc) => e);                // keep the matching row, not the string
foreach (var result in results)
{
    result["columnname"] = 10;
}
Join will probably optimize the access to both lists using some kind of lookup.
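If you'd rather make that lookup explicit, here is a sketch using the question's own names. Note that, unlike the original Take(1), this updates every matching row:

// Build a hash lookup over the DataTable once: O(n) to build, O(1) per probe.
var byCode = emoteTable.AsEnumerable()
    .ToLookup(myRow => myRow.Field<String>("code"));

foreach (String someValue in someList)
{
    foreach (var row in byCode[OutOfScopeFunction(someValue)])
    {
        row["columnname"] = 10;
    }
}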
But the first thing I would do is completely abandon the idea of combining DataTable and LINQ. They are two different technologies, and trying to predict what they might do inside when combined is hard.
Did you try doing raw UPDATE calls? How many items are you expecting to update?
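For reference, a sketch of that raw-UPDATE route, assuming the MySql.Data connector, an already-open connection, and a hypothetical emotes table behind the DataTable:

// using MySql.Data.MySqlClient;
// One parameterized UPDATE per search code, letting MySQL do the matching.
using (var cmd = new MySqlCommand(
    "UPDATE emotes SET columnname = 10 WHERE code = @code", connection))
{
    cmd.Parameters.Add("@code", MySqlDbType.VarChar);
    foreach (String someValue in someList)
    {
        cmd.Parameters["@code"].Value = OutOfScopeFunction(someValue);
        cmd.ExecuteNonQuery();
    }
}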
I need to search through products in a database, and I'm looking to set this up the right way so that I can have solid performance when there are a lot of rows (1,000,000). I'm somewhat experienced with LINQ and EF, but I've never written any search algos, and I've got the following code, but just have a few lingering questions.
context.products.Where(i => i.Name.ToLower().Contains(searchText.ToLower()));
I also need to search the description.
context.products.Where(i => i.Description.ToLower().Contains(searchText.ToLower()));
Is .ToLower() in this context going to reduce performance?
I have a regular index on Name and a FullText index on Description. Is this appropriate, and does a regular index work well with .Contains()?
Should I be using LINQ or some other method?
Is there a way to do this where I can get the number of times the search text occurs in the name/description?
Thank you
I would seriously consider writing a stored procedure to do this, or raw SQL in the DbContext using SqlQuery. If I were writing this code, that is what I would do. To me, Entity Framework and performant never really went together.
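A rough sketch of the SqlQuery route; the table name and the CONTAINS predicate are assumptions, and CONTAINS requires a full-text index on Description:

// EF maps the positional argument to @p0 in the raw SQL.
var matches = context.Database.SqlQuery<Product>(
    "SELECT * FROM Products WHERE CONTAINS(Description, @p0)",
    searchText).ToList();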
Please do not use ToLower, since it will significantly impact performance.
Use one of the following:
StringComparison.OrdinalIgnoreCase
StringComparison.CurrentCultureIgnoreCase
Since you're using Contains, which will be translated to LIKE '%text%', it is unlikely that SQL Server will use indexes. If you implement full-text search, you have to use a stored procedure to take advantage of it.
LINQ is always slower than a hand-written SQL statement. I've seen some performance metrics on the dapper.net website.
Generally, if you're using the latest Entity Framework, you should get pretty good performance, since they've made some significant performance improvements in version 5.
In my case I didn't want to use stored procedures. I was using Entity Framework and that's what I wanted to use!
See if this method might help you.
// Requires the Dynamic LINQ library (System.Linq.Dynamic) for the string-based Where overload.
public static IQueryable<T> LikeOr<T>(this IQueryable<T> source, string columnName, string searchTerm)
{
    string[] words = searchTerm
        .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
        .Where(x => x.Length > 1)
        .ToArray();

    var sb = new StringBuilder();
    for (int i = 0; i < words.Length; i++)
    {
        if (i != 0)
            sb.Append(" || ");
        // Dynamic LINQ parameter placeholders are @0, @1, ...
        sb.Append(string.Format("{0}.Contains(@{1})", columnName, i));
    }
    return source.Where(sb.ToString(), words);
}
All the above method does is build up a Dynamic LINQ predicate string and pass it to the string-based Where method. At a high level, this allows you to use a hand-built predicate only where you need it, rather than writing the entire query in SQL.
It's called by something like this:
public List<Book> SearchForBooks(string phrase)
{
    return _db.Books.Include(x => x.Images)
        .LikeOr("Title", phrase)
        .OrderBy(x => x.Title)
        .Take(6)
        .ToList()
        .OrderByCountDescending("Title", phrase);
}
It's made possible by the Dynamic LINQ DLL, created by Microsoft but not included in the framework. This will allow you to be more flexible in some areas.
I have the following query:
if (idUO > 0)
{
    query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
    query = query.Where(b => b.DependencyId == dependencyId);
}
else
{
    var dependencyIds = dependencies.Select(d => d.Id).ToList();
    query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}

[...] <- Other filters...

if (specialDateId != 0)
{
    query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket, which has 23 properties; 5 of them are relationships (FKs), and I fill 3 of those relationships with lazy loading
When I access the first page it's OK, the query takes less than 1 second, but when I try to access page 30000, the query takes more than 20 seconds. Is there a way to improve the performance of the query in LINQ itself, or only at the database level? And at the database level, what is the best way to improve performance for this kind of query?
There is not much room here, imo, to make things better (at least looking at the code provided).
When you're trying to achieve good performance on such numbers, I would recommend not using LINQ at all, or at least using it only on the parts with smaller data access.
What you can do here is introduce paging of that data at the database level, with a stored procedure, and invoke it from your C# code.
1- Create a view in the DB which orders items by date, including all related relationships, like Products etc.
2- Create a stored procedure querying this view with the related parameters (a rough sketch of the call site follows).
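A sketch of that idea, assuming the DbContext API; the view, procedure and parameter names are made up, and the OFFSET/FETCH pattern is the same one shown in the paging question earlier:

// Hypothetical procedure body, paging the date-ordered view:
//   SELECT * FROM vw_TicketsByDate
//   ORDER BY Date
//   OFFSET (@page * 20) ROWS FETCH NEXT 20 ROWS ONLY
var tickets = context.Database.SqlQuery<Ticket>(
    "EXEC dbo.GetTicketsPage @p0", page).ToList();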
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about indexes that you should add. This has had a great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why:
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open SQL Profiler and view the random mess of SQL it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.
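A sketch of those last two suggestions combined. This is the ObjectContext-era API (MergeOption lives in System.Data.Objects), and the entity and property names are borrowed loosely from the question, so adjust them to your model:

// Read-only paging: NoTracking skips change-tracking overhead entirely.
var tickets = context.Tickets;                // ObjectSet<Ticket>
tickets.MergeOption = MergeOption.NoTracking;

// Explicit join instead of deferred loading of navigation properties:
var pageOfTickets = (from t in tickets
                     join d in context.Dependencies on t.DependencyId equals d.Id
                     orderby t.Date
                     select new { t.Id, t.Date, Dependency = d.Name })
                    .Skip(20 * page)
                    .Take(20)
                    .ToList();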
Can anyone tell me if the following query calls the database multiple times or just once?
var bizItems = new
{
    items = (
        from bl in db.rc_BusinessLocations
        orderby bl.rc_BusinessProfiles.BusinessName
        select new BusinessLocationItem
        {
            BusinessLocation = bl,
            BusinessProfile = bl.rc_BusinessProfiles,
            Products = bl.rc_InventoryProducts_Business
        })
        .Skip(pageSkip).Take(pageSize),
    Count = db.rc_BusinessLocations.Count()
};
I really need to get the Count() out of the query, and I couldn't find another way to do it, so if you have some better-optimized code, feel free to share it!
Thanks in advance!
Gwar
It totally depends on what you do with the bizItems variable, because after you've run just this code, only a COUNT(*) query will have run. This is because the items property contains an IQueryable, which is a description of a query (an intent to run), not the result of the operation. The query will only run when you start iterating it, by using foreach or an operator such as .Count(). Besides this, the BusinessProfile and Products properties will probably also contain IQueryables.
So, let's take a look at what you might do with this code:
foreach (var item in bizItems.items)
{
    Console.WriteLine(item.BusinessLocation.City);
    foreach (var profile in item.BusinessProfile)
    {
        Console.WriteLine(profile.Name);
    }
    foreach (var product in item.Products)
    {
        Console.WriteLine(product.Price);
    }
    Console.WriteLine(bizItems.Count); // Count lives on the anonymous type, not the item
    Console.WriteLine();
}
So, if you ask me again, looking at this code, how many queries will be sent to the database, my answer is: 2 + 2 * the number of items in bizItems.items. So the number of queries will be between 2 and (2 + 2 * pageSize); with a pageSize of 20, for example, that's up to 42 round-trips.
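If you do want the Count() pulled out of the anonymous type, a minimal sketch reusing the question's own names:

// Run the count once, on its own, and keep the page query itself lazy.
var count = db.rc_BusinessLocations.Count();   // query 1: COUNT(*)

var items = (from bl in db.rc_BusinessLocations
             orderby bl.rc_BusinessProfiles.BusinessName
             select new BusinessLocationItem
             {
                 BusinessLocation = bl,
                 BusinessProfile = bl.rc_BusinessProfiles,
                 Products = bl.rc_InventoryProducts_Business
             })
            .Skip(pageSkip).Take(pageSize)
            .ToList();                          // query 2: one page of rows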
You should check out Scott Guthrie's post on using the LINQ to SQL Debug Visualizer. This will allow you to see exactly the SQL that will be generated from your LINQ statement.
This will definitely result in two calls.