MongoDB performance problems in Unity game - C#

We decided to use MongoDB in our game as a real-time database, but the performance of the search queries is not acceptable. These are the test results with 15,000 documents and 17 fields (strings, ints, floats):
// 14000 ms
MongoUrl url = new MongoUrl("url-adress");
MongoClient client = new MongoClient(url);
var server = client.GetServer();
var db = server.GetDatabase("myDatabase");
var collection = db.GetCollection<PlayerFields>("Player");
var ranks = collection.FindAll().AsQueryable().OrderByDescending(p => p.Score).ToList().FindIndex(FindPlayer);
This one is the worst. (The .ToList() is only there for testing purposes; don't use it in production code.)
Second test
//9000 ms
var ranks = collection.FindAll().AsQueryable().Where(p=>p.Score < PlayerInfos.Score).Count();
Third test
//2000 ms
var qq = Query.GT("Kupa", player.Score);
var ranks = collection.Find( qq ).Where(pa=>(pa.Win + pa.Lose + pa.Draw) != 0 );
Is there any other way to make fast searches in MongoDB with C# and .NET 2.0? We want to get each player's rank according to their score and rank them.

To caveat this, I've not been a .NET dev for a few years now, so if there is a problem with the C# driver then I can't really comment, but I've got a good knowledge of Mongo, so hopefully I can help...
Indexes
Indexes will help you out a lot here. As you are ordering and filtering on fields which aren't indexed, this will only cause you problems as the database gets larger.
Indexes are direction-specific (ascending/descending), meaning that your "Score" field should be indexed descending:
db.player.ensureIndex({'Score': -1}) // -1 indicating descending
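From the C# side, assuming the legacy 1.x driver that the question's code uses (GetServer/FindAll), the same index can be created with something like:
// using MongoDB.Driver.Builders;
collection.EnsureIndex(IndexKeys.Descending("Score")); // equivalent of db.player.ensureIndex({'Score': -1})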
Queries
Also, Mongo is really awesome (in my opinion) and it doesn't look like you're using it to the best of its abilities.
Your first call:
var ranks = collection.FindAll().AsQueryable().OrderByDescending(p => p.Score).ToList().FindIndex(FindPlayer);
It appears (this is where my .NET knowledge may be letting me down) that you're retrieving the entire collection with ToList(), then filtering it in memory (the FindPlayer predicate) in order to retrieve a subset of data. I believe this will evaluate the entire cursor (15,000 documents) into the memory of your application.
You should update your query so that Mongo is doing the work rather than your application.
Given that your other queries are filtering on Score, adding the index as described above should drastically increase the performance of those queries as well.
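For example, the rank itself can be computed entirely server-side as a count of players with a higher score; a minimal sketch, assuming the legacy 1.x driver and that "Score" is the indexed field:
// using MongoDB.Driver.Builders;
// Count of documents with a strictly higher score; served by the descending Score index.
long playersAbove = collection.Count(Query.GT("Score", player.Score));
long rank = playersAbove + 1; // 1-based rank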
Profiling
If the call you're expecting to make behaves as expected when run from the mongo CLI, it could be that the driver is making slightly different queries.
In the mongo CLI, you will first need to set the profiling:
db.setProfilingLevel(2)
You can then query the profile collection to see what queries are actually being made:
db.system.profile.find().limit(5).sort({ts: -1}).pretty()
This will show you the 5 most recent calls.

Related

MongoDB C# driver: view MQL BSON query generated from LINQ

Using the latest version (2.14), is there any way to view the BSON query document generated by a specific LINQ query?
I want to do this for two reasons:
debugging queries
copying them to run in another mongo client like compass
I know I can enable profiling, but I can't see any way to guarantee that a query you find in the mongo log was generated by a specific line of code or query. Plus, it's a bit long-winded to do it via profiling.
You have two options to get the MQL query from a LINQ request:
1. Install the recently released query analyzer. As far as I know, it may not be 100% accurate if you use global static serialization configuration.
2. Configure a CommandStartedEvent event subscriber and analyze the Command document. Note that you may need to remove some technical fields like $db (and maybe a few more) that might not be parsed by Compass correctly; if so, you will see it in the exception message.
#dododo's answer is the right one; I'm just adding some code here which works for option 2:
var settings = MongoClientSettings.FromUrl(new MongoUrl("mongodb://localhost"));
settings.ClusterConfigurator = builder =>
{
    builder.Subscribe<CommandStartedEvent>(x =>
    {
        // x.Command is the BSON command document the driver is about to send (the MQL).
        var queryDocument = x.Command;
    });
};
var client = new MongoClient(settings);
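If the goal is to paste the query into Compass or the shell, the subscriber can also dump the command as JSON. A sketch (the $db stripping follows dododo's note; the lsid removal and the command-name filter are assumptions):
// Inside the ClusterConfigurator above; requires "using MongoDB.Bson;" for ToJson().
builder.Subscribe<CommandStartedEvent>(x =>
{
    // Only query-like commands are interesting for debugging.
    if (x.CommandName == "find" || x.CommandName == "aggregate")
    {
        var command = (BsonDocument)x.Command.DeepClone(); // don't mutate the document the driver will send
        command.Remove("$db");   // driver-internal field Compass may reject
        command.Remove("lsid");  // session id, not part of the query itself
        Console.WriteLine(command.ToJson());
    }
});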

Paginating Geo-Data with high performance

I am building a backend (.NET 5 WebApi via REST) for a mobile app.
We have a few million entries in the database (Azure SQL Server) which all have a geolocation.
The app should query them sorted by the current location.
In addition, this should be paged, so e.g. take the first 30 results with the first call, then the next 30, etc.
I cannot come up with a really clever solution.
My current code for the third page of 30 entries looks like this:
data.OrderBy(p => p.Location.Distance(currentLocation)).Skip(60).Take(30).ToListAsync()
The problem is that even if I know that I need only 30 results, the query needs to order the full table.
I know I can boost it with an index, but does anyone have a hint how to optimize this LINQ code?
Thanks a lot!
This part looks suspect: p.Location.Distance(currentLocation). If this is running EF Core 2.x, then my guess is this would be triggering client-side evaluation, resulting in all data being queried back prior to the sorting and pagination. I would recommend hooking up a profiler to the database and reviewing the SQL that is actually being run.
To better arrange sorting by distance I would consider something like:
var x = currentLocation.X;
var y = currentLocation.Y;
var results = await data.OrderBy(p => Math.Abs(p.Location.X - x) + Math.Abs(p.Location.Y - y))
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToListAsync();
This ensures the sorting is done DB server-side. (Though make sure data is still an IQueryable.) Substitute X/Y with Lat/Long or whatever coordinate fields you are using.
This doesn't give you the distance, but it gives you a value relative to the distance for each point to compare against other points. To get the actual distance you would use Math.Sqrt(Math.Pow(p.Location.X - x, 2) + Math.Pow(p.Location.Y - y, 2)). I believe EF will translate that to SQL, at least for SQL Server's provider. It puts more math into the SQL query, which can't be indexed, but it might be more useful if you want to return the distance with the results.
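If the distance should be returned with each page, it can be projected in the same server-side query. A sketch, assuming EF Core with the SQL Server provider and numeric X/Y columns (names are illustrative):
// using Microsoft.EntityFrameworkCore;  (for ToListAsync)
var x = currentLocation.X;
var y = currentLocation.Y;

var page = await data
    .OrderBy(p => Math.Abs(p.Location.X - x) + Math.Abs(p.Location.Y - y))
    .Skip(pageNumber * pageSize)
    .Take(pageSize)
    .Select(p => new
    {
        Entry = p,
        // Translated to POWER/SQRT on SQL Server; only computed for the selected page.
        Distance = Math.Sqrt(Math.Pow(p.Location.X - x, 2) + Math.Pow(p.Location.Y - y, 2))
    })
    .ToListAsync();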

Calling Skip() in code or using TOP in a function

I'm coding an application with Entity Framework in which I rely heavily on user defined functions.
I have a question about the best way (the most optimized way) to limit and page my result sets. Basically I am wondering whether these two options are the same or one is preferred performance-wise.
Option 1.
//C#
var result1 = _DB.fn_GetData().OrderBy(x => Id).Skip(page *100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
Option 2.
//C#
var result2 = _DB.fn_GetData(page: 0, size: 100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
ORDER BY Id
OFFSET (@size * @page) ROWS FETCH NEXT @size ROWS ONLY
To me these seem to be producing about the same result, but maybe I am missing some key aspect.
You'll have to be aware of when your LINQ statement is an IEnumerable and when it is an IQueryable. As long as your statement is an IQueryable<...>, the software will try to translate it into SQL and let your database do the query. Once it has lost the IQueryable and has become an implementation of IEnumerable, the data has been brought into local memory, and all further LINQ statements will be performed by your process, not by the database.
If you use your debugger, you will see that fn_GetData returns an IEnumerable. This means that the result of fn_GetData is brought into local memory and your OrderBy etc. is performed by your process.
Usually it is much more efficient to move only the records that you will actually use into local memory. Besides: do not fetch the complete records, but only the properties that you plan to use. So in this case I guess you'll have to create an extended version of fn_GetData that returns only the values you plan to use, as sketched below.
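A sketch of that idea (hypothetical entity and column names; it only helps while the source is still an IQueryable, e.g. a DbSet or a mapped table-valued function):
var page = _DB.SomeTable                      // IQueryable<SomeEntity>, nothing executed yet
    .OrderBy(x => x.Id)
    .Select(x => new { x.Id, x.Name })        // fetch only the columns you need
    .Skip(pageNumber * pageSize)
    .Take(pageSize)                           // becomes OFFSET ... FETCH NEXT on SQL Server
    .ToList();                                // the query runs here, in the database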
I suggest the second option, because SQL Server can do this faster than C# methods.
In your first option, you take all of the records in the table and loop through them. With the second option, SQL Server does it for you and you get exactly what you want.
You should apply the limiting and WHERE clauses in the database as far as possible (it depends on the table indexes). For the first example:
var result1 = _DB.fn_GetData().OrderBy(x => x.Id).Skip(page * 100).Take(100).ToList();
// SQL in fn_GetData
SELECT * FROM [Data].[Table]
The whole table is retrieved from the database into memory, which kills performance and reliability. I strongly advise against it. You should consider adding some limitations to filter records on the database side. So, the second option is the better approach in this case.

LINQ optimization for searching if an object exists in a list within a list

Currently I have 7,000 video entries and I am having a hard time optimizing the search for Tags and Actresses.
This is the code I am trying to modify. I tried using a HashSet; it is my first time using it, but I don't think I am doing it right.
var dictTag = JsonPairtoDictionary(tagsId, tagsName);
var dictActress = JsonPairtoDictionary(actressId, actressName);
var listVid = new List<VideoItem>(db.VideoItems.ToList());
HashSet<VideoItem> lll = new HashSet<VideoItem>(listVid);
foreach (var tags in dictTag)
{
    lll = new HashSet<VideoItem>(lll.Where(q => q.Tags.Exists(p => p.Id == tags.Key)));
}
foreach (var actress in dictActress)
{
    listVid = listVid.Where(q => q.Actress.Exists(p => p.Id == actress.Key)).ToList();
}
In the first part I get all the videos in the DB by using db.VideoItems.ToList().
Then it goes through a loop to check if a Tag exists.
Each VideoItem has a List<Tags>, and I use Exists to check if a tag matches.
Then the same thing with Actress.
I am not sure if it is because I am in Debug mode with Application Insights active, but it is slow, and I get around 10-15 events per second with baseType:RemoteDependencyData, which I am not sure means it is still connected to the database (it should not be, since I should only be working with a new in-memory list of all videos) or what.
After 7 minutes it is still processing, and that's the longest I have waited.
I am afraid to put this on my live site since it will eat up my resources like candy.
Instead of optimizing the LINQ, you should optimize your database query.
Databases are great at optimized searches and creating subsets, and will most likely be faster than anything you write. If you need to create a subset based on more than one database parameter, I would recommend looking into creating some indexes and using those.
Edit:
Example of a DB query that would eliminate the first foreach loop (which is actually multiple nested loops, and is where the time delay comes from):
select * from videos where tag in [list of tags]
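A rough LINQ-to-Entities equivalent of that query, so the filtering happens in the database instead of over an in-memory list (hypothetical mapping; assumes db.VideoItems is a DbSet and Tags is a mapped navigation collection):
var tagIds = dictTag.Keys.ToList();
var matchingVideos = db.VideoItems
    .Where(v => v.Tags.Any(t => tagIds.Contains(t.Id))) // translated to an EXISTS / IN in SQL
    .ToList();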
Edit2
To make sure this is as efficient as possible, have the database index the tag column. To create the index:
CREATE INDEX video_tags_idx ON videos (tag)
Use 'explain' to see if the index is being used automatically (it should be):
explain select * from videos where tag in [list of tags]
If it doesn't show your index as being used you can look up the syntax to force the use of it.
The problem was not the LINQ optimization; it was how I was utilizing Microsoft SQL Server through my ApplicationDbContext.
I found this out when I came across PredicateBuilder: http://www.albahari.com/nutshell/predicatebuilder.aspx
The problem with the keyword search is that there can be multiple keywords, and the code I wrote above doesn't push the work down to SQL, which caused the long execution time.
Using the predicate builder, it is possible to create dynamic conditions in LINQ.
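For completeness, a minimal sketch of the PredicateBuilder approach (assumes LINQKit and that the Tags collection is mapped so EF can translate it); it keeps a multi-keyword condition translatable to SQL:
// using LinqKit;
var predicate = PredicateBuilder.True<VideoItem>();
foreach (var tagId in dictTag.Keys)
{
    var id = tagId; // capture the loop variable
    predicate = predicate.And(v => v.Tags.Any(t => t.Id == id));
}
var result = db.VideoItems.AsExpandable().Where(predicate).ToList();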

How to use MongoDB as unique/enumeration store

This seems like a common use case... but somehow I cannot get it working.
I'm attempting to use MongoDB as an enumeration store with unique items. I've created a collection with a byte[] Id (the unique ID) and a timestamp (a long, used for enumeration). The store is quite big (terabytes) and distributed among different servers. I am able to re-build the store from scratch currently, since I'm still in the testing phase.
What I want to do is two things:
Create a unique id for each item that I insert. This basically means that if I insert the same ID twice, MongoDB will detect this and give an error. This approach seems to work fine.
Continuously enumerate the store for new items from other processes. The approach I took was to add a second index on InsertId and use a high-precision timestamp for it, along with the server ID and a counter (just to make it unique and ascending).
In the best scenario this would mean that the enumerator keeps track of an index cursor for every server. From what I've learned about MongoDB query processing, I expected this behavior. However, when I try to execute the code (below), it seems to take forever to get anything.
long lastid = 0;
while (true)
{
    DateTime first = DateTime.UtcNow;
    foreach (var item in collection.FindAllAs<ContentItem>().OrderBy((a) => (a.InsertId)).Take(100))
    {
        lastid = item.InsertId;
    }
    Console.WriteLine("Took {0:0.00} for 100", (DateTime.UtcNow - first).TotalSeconds);
}
I've read about cursors, but am unsure if they fulfill the requirements when new items are inserted into the store.
As I said, I'm not bound to any table structure or anything like that... the only thing that is important is that I can get new items over time, without getting duplicate items.
-Stefan.
Somehow I figured it out... more or less...
I created the query manually and ended up with something like this:
db.documents.find({ "InsertId" : { "$gt" : NumberLong("2020374866209304106") } }).limit(10).sort({ "InsertId" : 1 });
The LINQ query I put in the question doesn't generate this query. After some digging in the code I found that it should be this LINQ query:
foreach (var item in collection.AsQueryable().Where((a)=>(a.InsertId > lastid)).OrderBy((a) => (a.InsertId)).Take(100))
The AsQueryable() seems to be the key to getting the LINQ rewritten into MongoDB queries.
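Putting that together with the polling loop from the question, the enumeration step would look roughly like this (a sketch, assuming the legacy 1.x driver, an ascending index on InsertId, and a hypothetical back-off when nothing new arrives):
// using MongoDB.Driver.Linq;  using System.Threading;
long lastid = 0;
while (true)
{
    // Only documents newer than the last one seen; served by the InsertId index.
    var batch = collection.AsQueryable()
                          .Where(a => a.InsertId > lastid)
                          .OrderBy(a => a.InsertId)
                          .Take(100)
                          .ToList();
    foreach (var item in batch)
    {
        lastid = item.InsertId;
    }
    if (batch.Count == 0)
    {
        Thread.Sleep(1000); // nothing new yet; wait before polling again
    }
}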
This gives results, but they still appeared to be slow (4 seconds for 10 results, 30 for 100). However, when I added 'explain()' I noticed '0 millis' for the query execution.
I stopped the process doing bulk inserts and, tada, it works, and fast. In other words: the issues I was having were due to the locking behavior of MongoDB and to the way I interpreted the LINQ implementation. Since the former was the result of the initial bulk-filling of the data store, the problem is solved.
On the 'negative' side of the solution: I would have preferred something that involved serializable cursors or the like... this 'take' solution has to iterate the B-tree over and over again. If someone has an answer for this, please let me know.
-Stefan.
