I am building a backend (.NET 5 WebApi via REST) for a mobile app.
We have a few million entries in the database (Azure SQL Server) which all have a geolocation.
The app should query them sorted by the current location.
In addition, this should be paged, so e.g. take the first 30 results with the first call, then the next 30, etc.
I cannot come up with a really clever solution.
My current code for the third page of 30 entries looks like that:
data.OrderBy(p => p.Location.Distance(currentLocation)).skip(60).take(30).toListAsync()
The problem is that even if I know that I need only 30 results, the query needs to order the full table.
I know I can boost it with an index, but does anyone have a hint how to optimize this LINQ code?
Thanks a lot!
this part looks suspect: p.Location.Distance(currentLocation). If this is running EF Core 2.x then my guess is this would be triggering client-side evaluation resulting in all data being queried back prior to the sorting and pagination. I would recommend hooking up a profiler to the database and reviewing the SQL that is actually being run.
To better arrange sorting by distance I would consider something like:
var x = currentLocation.X;
var y = currentLocation.Y;
var results = await data.OrderBy(p => Math.Abs(p.Location.X - x) + Math.Abs(p.Location.Y - y))
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToListAsync();
This ensures the sorting is done DB server-side. (Though make sure data is still an IQueryable.) Substitute X/Y with Lat/Long or whatever coordinate fields you are using.
This doesn't give you the distance, but it gives you a value relative to the distance for each point to compare against other points. To get the distance would be Math.Sqrt(Math.Pow(p.Location.X - x,2) + Math.Pow(Location.Y - y,2)). I believe EF will translate that to SQL, at least for SQL Server's provider. It's more math conversion to put into an SQL Search which can't be indexed, but that might be more useful if you want to return the distance with the results.
Related
I'm coding an application with Entity Framework in which I rely heavily on user defined functions.
I have a question about the best way (most optimized way) of how I limit and page my result sets. Basically I am wondering if these two options are the same or one is prefered performance wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => Id).Skip(page *100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page = 0, size = 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (size * page) ROWS FETCH NEXT size ROWS ONLY
To me these seem to be producing about the same result, but maybe I am missing some key aspect.
You'll have to be aware when your LINQ statement is AsEnumerable and when it is AsQueryable. As long as your statement is an IQueryable<...> the software will try to translate it into SQL and let your database do the query. Once it really has lost the IQueryable, and has become an implementation of an IEnumerable, the data has been brought to local memory, and all further LINQ statements will be performed by your process, not by the database.
If you use your debugger, you will see that the return value of your fn_getData returns an IEnumerable. This means that the result of fn_GetData is brought to local memory and your OrderBy etc is performed by your process.
Usually it is much more efficient to only move the records that you will use to local memory. Besides: do not fetch the complete records, but only the properties that you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use
I suggest second option because SQL Server can more faster then C# methods.
In your first option, you take all of the records in table and loop through. But second option, SQL Server do it for you and you get what you want.
You should apply the limiting and where clauses (it depends on table indexes) in the database as far as possible. For first example;
var result1 = _DB.fn_GetData().OrderBy(x => Id).Skip(page *100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from database into in-memory and it kills the performance and reliability. I strongly don't suggest it. You should consider to put some limitations to filter records on the database. So, the second option is better approach in this case.
We are decided to use mongo db in our game for real time database but the performance of the search result is not acceptable. These are the test result with 15.000 documents and 17 fields(strings, int,float)
// 14000 ms
MongoUrl url = new MongoUrl("url-adress");
MongoClient client = new MongoClient(url);
var server = client.GetServer();
var db = server.GetDatabase("myDatabase");
var collection = db.GetCollection<PlayerFields>("Player");
var ranks = collection.FindAll().AsQueryable().OrderByDescending(p=>p.Score).ToList().FindIndex(FindPlayer).Count();
This one is the worst. //.ToList() is for testing purposes. Don't use in production code.
Second test
//9000 ms
var ranks = collection.FindAll().AsQueryable().Where(p=>p.Score < PlayerInfos.Score).Count();
Third test
//2000 ms
var qq = Query. GT("Kupa", player.Score);
var ranks = collection.Find( qq ).Where(pa=>(pa.Win + pa.Lose + pa.Draw) != 0 );
Is there any other way to make fast searches in mongo with C# .Net 2.0. We want to get player's rank according to users score and rank them.
To caveat this, I've not been a .NET dev for a few years now, so if there is a problem with the c# driver then I can't really comment, but I've got a good knowledge of Mongo so hopefully I'll help...
Indexes
Indexes will help you out a lot here. As you are ordering and filtering on fields which aren't indexed, this will only cause you problems as the database gets larger.
Indexes are direction specific (ascending/descending). Meaning that your "Score" field should be indexed descending:
db.player.ensureIndex({'Score': -1}) // -1 indicating descending
Queries
Also, Mongo is really awesome (in my opinion) and it doesn't look like you're using it to be best of it's abilities.
Your first call:
var ranks = collection.FindAll().AsQueryable().OrderByDescending(p=>p.Score).ToList().FindIndex(FindPlayer).Count();
It appears (this is where my .NET knowledge may be letting me down) that you're retrieving the entire collection ToList(), then filtering it in memory (FindPlayer predicate) in order to retrieve a subset of data. I believe that this will be evaluating the entire curser (15.000 documents) into the memory of your application.
You should update your query so that Mongo is doing the work rather than your application.
Given your other queries are filtering on Score, adding the index as described above should drastically increase the performance of these other queries
Profiling
If the call that you're expecting to make when run from the mongo cli is behaving as expected, it could be that the driver is making slightly different queries.
In the mongo CLI, you will first need to set the profiling:
db.setProfilingLevel(2)
You can then query the profile collection to see what queries are actually being made:
db.system.profile.find().limit(5).sort({ts: -1}).pretty()
This will show you the 5 most recent calls.
I have some database ang now it contains a table with about 100 rows. But in future it will have not 100 but 1 000 000+ rows and I have to be careful with my web application I'm developing now.
Problem is next: at web page I need to create paged list what will show records to user. And here is a sample of code that I plan to use
public IQueryable<MyTable> GetRows(int from, int to)
{
var queryRes = (from row in SomeDataContext.MyTable
order by row.id
select row).AsQueriable();
return queryRes.Take(to).Skip(from);
}
It is only sample of code. I did not run it.
But question is what will go on in this case? I see tow scenarios
It will load all rows from database and at server side and records in range from 'from' to 'to' will be returned. Other will be ignored. In this case my application will have big troubles. Imagine load 1 000 000 rows from database every time. It will be disaster.
It will construct SQL request what will return only rows I need without loading others. That's exactly what I need.
I think that it will be 2 scenario but I'm not sure and can't check it. Am I correct?
As a side-note, you don't have to call AsQueryable. It is enough to do
var queryRes = SomeDataContext.MyTable.OrderBy(r => r.Id);
return queryRes.Take(to).Skip(from);
And to answer your question - scenario 2 will be executed. You can always check the generated SQL by using the SQL Server Profiler, but in case you are using Entity Framework, you can even do queryRes.ToString(). And as #Aron correctly pointed out - the query will be actually executed against the database only when enumerating the results (e.g. calling queryRes.ToList()).
These questions address the issue of looking up the SQL code in more detail:
How to view generated SQL from Entity Framework?
exact sql query executed by Entity Framework
Strictly speaking, neither 1 nor 2 is correct. Running the code DOES NOT hit the database. It constructs an expression tree. The calling code can still modify the expression tree further without hitting the database.
With the IQueryable interface no SQL is run. It is at the point when you call IEnumerable.GetEnumerator() that the underlying Linq Provider converts the WHOLE expression into a query. In this case a SQL query, and then run it.
So for example, with this code. You could have
void Main()
{
var foo = from x in GetRows(10, 10)
where x.Id > 1000
select x;
foreach(var f in foo)
{
//Stuff
}
}
The sql that is actually run will actually be closer to
SELECT a,b,c FROM
(SELECT a,b,c, ROW_NUMBER() OVER (ORDER BY ...) as row_number
FROM Table
WHERE id > 1000) t0
WHERE to.row_number BETWEEN 10 and 20;
To be honest you are going about this wrong. You don't need a GetRows method. I would directly call the Linq query when constructing the table itself. You should take a look at the IRepository pattern that MVC scaffolding uses.
Finally if this is meant to be called as a WebQuery for AJAX I would look at the two OData implementations in .net (WCF Data Services and WebAPI OData).
You are right.
The 2. scenario is what will happen. When the query is eventuallty exectuted.
I Would sugges to reverse the Take - Skip, so you start by Skip
queryRes.Skip(from).Take(to)
Debuggen this method will not make any calls to the database. It just returns the query - not the resualt.
If you want to test exactly what will happen, try download LinqPad - it is a great to for demystifying linq queries.
I have the following query:
if (idUO > 0)
{
query = query.Where(b => b.Product.Center.UO.Id == idUO);
}
else if (dependencyId > 0)
{
query = query.Where(b => b.DependencyId == dependencyId );
}
else
{
var dependencyIds = dependencies.Select(d => d.Id).ToList();
query = query.Where(b => dependencyIds.Contains(b.DependencyId.Value));
}
[...] <- Other filters...
if (specialDateId != 0)
{
query = query.Where(b => b.SpecialDateId == specialDateId);
}
So, I have other filters in this query, but at the end, I process the query in the database with:
return query.OrderBy(b => b.Date).Skip(20 * page).Take(20).ToList(); // the returned object is a Ticket object, that has 23 properties, 5 of them are relationships (FKs) and i fill 3 of these relationships with lazy loading
When I access the first page, its OK, the query takes less than one 1 second, but when I try to access the page 30000, the query takes more than 20 seconds. There is a way in the linq query, that I can improve the performance of the query? Or only in the database level? And in the database level, for this kind of query, which is the best way to improve the performance?
There is no much space here, imo, to make things better (at least looking on the code provided).
When you're trying to achieve a good performance on such numbers, I would recommend do not use LINQ at all, or at list use it on the stuff with smaler data access.
What you can do here, is introduce paging of that data on DataBase level, with some stored procedure, and invoke it from your C# code.
1- Create a view in DB which orders items by date including all related relationships, like Products etc.
2- Create a stored procedure querying this view with related parameters.
I would recommend that you pull up SQL Server Profiler, and run a profile on the server while you run the queries (both the fast and the slow).
Once you've done this, you can pull it into the Database Engine Tuning Advisor to get some tips about Indexes that you should add.. This has had great effect for me in the past. Of course, if you know what indexes you need, you can just add them without running the Advisor :)
I think you'll find that the bottleneck is occurring at the database. Here's why;
query.
You have your query, and the criteria. It goes to the database with a pretty ugly, but not too terrible select statement.
.OrderBy(b => b.Date)
Now you're ordering this giant recordset by date, which probably isn't a terrible hit because it's (hopefully) indexed on that field, but that does mean the entire set is going to be brought into memory and sorted before any skipping or taking occurs.
.Skip(20 * page).Take(20)
Ok, here's where it gets rough for the poor database. Entity is pretty awful at this sort of thing for large recordsets. I dare you to open sql profiler and view the random mess of sql it's sending over.
When you start skipping and taking, Entity usually sends queries that coerce the database into scanning the entire giant recordset until it finds what you are looking for. If that's the first ordered records in the recordset, say page 1, it might not take terribly long. By the time you're picking out page 30,000 it could be scanning a lot of data due to the way Entity has prepared your statement.
I highly recommend you take a look at the following link. I know it says 2005, but it's applicable to 2008 as well.
http://www.codeguru.com/csharp/.net/net_data/article.php/c19611/Paging-in-SQL-Server-2005.htm
Once you've read that link, you might want to consider how you can create a stored procedure to accomplish what you're going for. It will be more lightweight, have cached execution plans, and is pretty well guaranteed to return the data much faster for you.
Barring that, if you want to stick with LINQ, read up on Compiled Queries and make sure you're setting MergeOption.NoTracking for read-only operations. You should also try returning an Object Query with explicit Joins instead of an IQueryable with deferred loading, especially if you're iterating through the results and joining to other tables. Deferred Loading can be a real performance killer.
Note: I know there are a number of questions around for issues with Linq's .Include(table) not loading data, I believe I have exhausted the options people have listed, and still had problems.
I have a large Linq2Entities query on an application I'm maintaining. The query is built up as such:
IQueryable<Results> query = context.MyTable
.Where(r =>
r.RelatedTable.ID == 2 &&
r.AnotherRelatedTable.ID == someId);
Then predicates are built up depending on various business logic, such as:
if (sortColumn.Contains("dob "))
{
if (orderByAscending)
query = query.OrderBy(p => p.RelatedTable.OrderByDescending(q => q.ID).FirstOrDefault().FieldName);
else
query = query.OrderByDescending(p => p.RelatedTable.OrderByDescending(q => q.ID).FirstOrDefault().FieldName);
}
Note - there is always a sort order provided.
Originally the included tables were set at the beginning, after reading articles such as the famous Tip 22, so now they are done at the end (which didn't fix the problem):
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
.Include("RelatedTable")
.Include("AnotherRelatedTable")
.Skip((page - 1) * rowsPerPage)
.Take(rowsPerPage);
Seemingly at random (approximately for every 5000 users of the site, this issue happens once) the RelatedTable data won't load. It can be brute forced by calling load on the related table. But even the failure to load isn't consistent, I've run the query in testing and it's worked, but most of the time hasn't, without changing any code or data.
It is fine, when the skip and take aren't included, and the whole dataset is returned, but I would expect the skip and take to be done on the complete dataset - it certainly appears to be from profiling the SQL...
UPDATE 16/11/10: I have profiled the SQL against a problem data set, and I've been able to reproduce the query failing about 9/10 times, but succeeding the rest. The SQL being executed is identical when the query fails or succeeds except, as expected, for the parameters passed to the SQL.
The issue has been solved with the following change, but the question remains as to why this should be.
Failing - get LINQ to handle the rows:
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
.Include("RelatedTable")
.Include("AnotherRelatedTable")
.Skip((page - 1) * rowsPerPage)
.Take(rowsPerPage)
.ToList();
Working - enumerate the data then get the rows:
var resultsList = (query.Select(r => r) as ObjectQuery<Results>)
.Include("RelatedTable")
.Include("AnotherRelatedTable")
.ToList()
.Skip((page - 1) * rowsPerPage)
.Take(rowsPerPage);
Unfortunately the SQL this query creates contains some sensitive schema data so I can't post it, it is also 1400 lines long, so I wouldn't subject the public to it anyway!
The sole effect of Take() is to change the generated SQL. Other than that, the Entity Framework does not care about it at all. Same for .Skip(). It's hard to believe that this would have an effect on query materialization (although stranger things have happened).
So what could be causing this behavior? Off the top of my head:
A bug in your application or mapping which is causing an incorrect query to be generated.
A bug in the Entity Framework which would cause returned data to be materialized into objects incorrectly in certain circumstances.
Bad data in your database.
A bug in your database's SQL parser.
I don't think you're going to get a lot further with this until you can capture the generated SQL and run it yourself. This is actually not terribly hard, as you can set up a SQL profiler with an appropriate filter. If you find that the generated SQL is different in the buggy case, you can work backwards from there. If you find that the generated SQL is identical in the buggy case, the next step would be to look at the rows returned, preferably in the same context as the application ran it.
In short, I think you just have to keep tweaking your SQL profiling until you have the information you need.