I have documents with a huge number of fields (7,500 fields in each),
but the field values are simple data (numbers only). When I query the collection it works great (I checked the Mongo profiler and it uses the indexes correctly),
but it takes too long to iterate the cursor (to receive the data).
The result is only ~450 documents, yet it takes about 2 minutes to receive them all.
I already updated MongoDB to the latest version, updated the MongoDB driver (for .NET), and recreated the indexes, but nothing helps.
P.S. The connection is not slow (the DB server is on my local network - 100Base-T/Fast Ethernet).
A code example of the query is below:
var builder = Builders<BsonDocument>.Filter;
var filter = builder.Eq("OrgID", orgID);
filter = filter & builder.Eq("DateDeleted", (DateTime?)null);
var collection = GetCollection("NameOfCollection");
var result = collection.Find(filter);
using (var cursor = result.ToCursor())
{
    while (cursor.MoveNext())
    {
        var batch = cursor.Current;
        foreach (var document in batch)
        {
            yield return document;
        }
    }
}
I have a separate index on each of those fields, and there is also a composite index with both fields in one index.
The same code works great with collections that have far more documents but fewer fields (~20 fields in each document).
Why are you using a cursor? It is my understanding that it fetches each record individually. I bet if you iterated through ToList() instead you'd get better performance, because it would fetch all the data in a single call.
foreach (var document in collection.Find(filter).ToList())
{
    // your other code here
}
Also, you are yielding the results, which means this is wrapped in an IEnumerable, and whatever you are doing in between these retrieval calls could be slowing down the process - but you left that code out, so it's hard to say.
We're planning to add a Redis cache to an existing solution.
We have a core entity which is fetched a lot, several times per session. The entity consists of 13 columns, the majority of which are less than 20 characters. Typically it's retrieved by parent id, but sometimes as a subset fetched by a list of ids. To solve this we're thinking of implementing the solution below, but the question is whether it's a good idea. Typically the list is around 400 items, but in some cases it could be up to 3000 items.
We would store the instances in the list with this key pattern: EntityName:{ParentId}:{ChildId}, where ParentId and ChildId are ints.
Then, to retrieve the list based on ParentId, we would call the method below with EntityName:{ParentId}:* as the value of the pattern argument:
public async Task<List<T>> GetMatches<T>(string pattern)
{
    var keys = _multiPlexer.GetServer(_multiPlexer.GetEndPoints(true)[0]).Keys(pattern: pattern).ToArray();
    var values = await Db.StringGetAsync(keys: keys);

    var result = new List<T>();
    foreach (var value in values.Where(x => x.HasValue))
    {
        result.Add(JsonSerializer.Deserialize<T>(value));
    }

    return result;
}
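Usage would then look something like this (ChildEntity and parentId are just illustrative names):
// Fetch every cached child of a parent via key-pattern matching.
var children = await GetMatches<ChildEntity>($"EntityName:{parentId}:*");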
And to retrieve a specific list of items we would call the below method with a list of exact keys:
public async Task<List<T>> GetList<T>(string[] keys)
{
    var values = await Db.StringGetAsync(keys: keys.Select(x => (RedisKey)x).ToArray());

    var result = new List<T>();
    foreach (var value in values.Where(x => x.HasValue))
    {
        result.Add(JsonSerializer.Deserialize<T>(value));
    }

    return result;
}
The obvious worry here is the number of objects to deserialize and the performance of System.Text.Json.
An alternative to this would be to store the data twice, both as a list and on its own, but that would only help in the case where we're fetching by ParentId. We could also store the data only as a list and retrieve the whole list every time, only to sometimes use a subset.
Is there a better way to tackle this?
All input is greatly appreciated! Thanks!
Edit
I wrote a small console application to load test the alternatives: fetching 2000 items 100 times took 2020ms with the pattern matching, and fetching the list took 1568ms. I think we can live with that difference and go with the pattern matching.
It seems like @Xerillio was right. I did some load testing using hosted services, and there it was almost three times slower to fetch the list using the pattern matching - slower than retrieving the list directly from SQL. So, to answer my own question whether it's a good idea, I would say no, it isn't. The majority of the added time was not due to deserialization but to fetching the keys with the pattern matching.
Here's the result from fetching 2000 items 100 times in a loop:
Fetch directly from db = 8625ms
Fetch using list of exact keys = 5663ms
Fetch using match = 13098ms
Fetch full list = 5352ms
Currently I have 7,000 video entries and I'm having a hard time optimizing the search for Tags and Actresses.
This is the code I am trying to modify. I tried using HashSet; it's my first time using it, but I don't think I am doing it right.
var dictTag = JsonPairtoDictionary(tagsId, tagsName);
var dictActress = JsonPairtoDictionary(actressId, actressName);

var listVid = new List<VideoItem>(db.VideoItems.ToList());

HashSet<VideoItem> lll = new HashSet<VideoItem>(listVid);
foreach (var tags in dictTag)
{
    lll = new HashSet<VideoItem>(lll.Where(q => q.Tags.Exists(p => p.Id == tags.Key)));
}

foreach (var actress in dictActress)
{
    listVid = listVid.Where(q => q.Actress.Exists(p => p.Id == actress.Key)).ToList();
}
In the first part I get all the videos in the DB using db.VideoItems.ToList().
Then it goes through a loop to check whether a Tag exists.
Each VideoItem has a List<Tags>, and I use Exists to check whether a tag matches.
Then the same thing for Actress.
I am not sure if it's because I am in Debug mode and Application Insights is active, but it is slow, and I get around 10-15 events per second with baseType:RemoteDependencyData, which I am not sure means it is still connected to the database (it should not be, since I should only be working with a new in-memory list of all videos) or what.
After 7 minutes it is still processing, and that's the longest I have waited.
I am afraid to put this on my live site since it will eat up my resources like candy.
Instead of optimizing the LINQ you should optimize your database query.
Databases are great at optimized searches and creating subsets and will most likely be faster than anything you write. If you need to create a subset based on more than one database parameter, I would recommend looking into creating some indexes and using those.
Edit:
Example of a db query that would eliminate the first for loop (which is actually multiple nested loops, and is where the time delay comes from):
select * from videos where tag in [list of tags]
Edit2
To make sure this is as efficient as possible, have the database index the tag column. To create the index:
CREATE INDEX video_tags_idx ON videos (tag)
Use EXPLAIN to see if the index is being used automatically (it should be):
explain select * from videos where tag in [list of tags]
If it doesn't show your index as being used you can look up the syntax to force the use of it.
The problem was not optimization; it was that I wasn't making use of Microsoft SQL or my ApplicationDbContext.
I found this out when I came across PredicateBuilder: http://www.albahari.com/nutshell/predicatebuilder.aspx
The problem with keyword search is that there can be multiple keywords, and the code I wrote above doesn't push the work down to SQL, which is what caused the long execution time.
Using the predicate builder, it is possible to create dynamic conditions in LINQ.
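For reference, here is a rough sketch of what that looks like with the PredicateBuilder from the linked article (with Entity Framework you may also need LINQKit's AsExpandable() so the expression can be translated to SQL; names follow the question's code):
// Build an OR over all requested tag ids so the filter is executed by SQL, not in memory.
var tagFilter = PredicateBuilder.False<VideoItem>();
foreach (var tag in dictTag)
{
    var tagId = tag.Key; // copy to a local so the lambda captures the right value
    tagFilter = tagFilter.Or(v => v.Tags.Any(t => t.Id == tagId));
}

var videos = db.VideoItems.AsExpandable().Where(tagFilter).ToList();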
We decided to use MongoDB in our game as a real-time database, but the performance of the search results is not acceptable. These are the test results with 15,000 documents and 17 fields (strings, int, float):
// 14000 ms
MongoUrl url = new MongoUrl("url-adress");
MongoClient client = new MongoClient(url);
var server = client.GetServer();
var db = server.GetDatabase("myDatabase");
var collection = db.GetCollection<PlayerFields>("Player");
var ranks = collection.FindAll().AsQueryable().OrderByDescending(p => p.Score).ToList().FindIndex(FindPlayer);
This one is the worst. (The .ToList() is only for testing purposes; don't use it in production code.)
Second test
//9000 ms
var ranks = collection.FindAll().AsQueryable().Where(p => p.Score < PlayerInfos.Score).Count();
Third test
//2000 ms
var qq = Query.GT("Kupa", player.Score);
var ranks = collection.Find(qq).Where(pa => (pa.Win + pa.Lose + pa.Draw) != 0);
Is there any other way to make fast searches in Mongo with C# .NET 2.0? We want to get a player's rank according to the user's score and rank them.
To caveat this, I've not been a .NET dev for a few years now, so if there is a problem with the C# driver then I can't really comment, but I've got a good knowledge of Mongo, so hopefully I can help...
Indexes
Indexes will help you out a lot here. As you are ordering and filtering on fields which aren't indexed, this will only cause you problems as the database gets larger.
Indexes are direction-specific (ascending/descending), meaning that your "Score" field should be indexed descending:
db.player.ensureIndex({'Score': -1}) // -1 indicating descending
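From C#, with the legacy driver the question is using, creating that index would look roughly like this (a sketch, not verified against your driver version):
// Descending index on Score, equivalent to the shell command above (uses MongoDB.Driver.Builders).
collection.CreateIndex(IndexKeys.Descending("Score"));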
Queries
Also, Mongo is really awesome (in my opinion), but it doesn't look like you're using it to the best of its abilities.
Your first call:
var ranks = collection.FindAll().AsQueryable().OrderByDescending(p => p.Score).ToList().FindIndex(FindPlayer);
It appears (this is where my .NET knowledge may be letting me down) that you're retrieving the entire collection with ToList(), then filtering it in memory (the FindPlayer predicate) in order to retrieve a subset of data. I believe this evaluates the entire cursor (15,000 documents) into the memory of your application.
You should update your query so that Mongo is doing the work rather than your application.
Given your other queries are filtering on Score, adding the index as described above should drastically increase the performance of those other queries.
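For instance (a rough sketch using the legacy driver API already shown in the question; adjust field names as needed), the rank can be computed server-side by counting the players with a higher score instead of pulling the whole collection back:
// Let the server count documents with a higher Score (this can use the descending Score index).
var playersAbove = collection.Count(Query.GT("Score", player.Score));
var rank = playersAbove + 1; // 1-based rank for this player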
Profiling
If the call behaves as expected when you run it from the mongo CLI, it could be that the driver is making slightly different queries.
In the mongo CLI, you will first need to set the profiling:
db.setProfilingLevel(2)
You can then query the profile collection to see what queries are actually being made:
db.system.profile.find().limit(5).sort({ts: -1}).pretty()
This will show you the 5 most recent calls.
I'm dumping a table out of MySQL into a DataTable object using MySqlDataAdapter. Database input and output is doing fine, but my application code seems to have a performance issue I was able to track down to a specific LINQ statement.
The goal is simple, search the contents of the DataTable for a column value matching a specific string, just like a traditional WHERE column = 'text' SQL clause.
Simplified code:
foreach (String someValue in someList) {
    String searchCode = OutOfScopeFunction(someValue);
    var results = emoteTable.AsEnumerable()
        .Where(myRow => myRow.Field<String>("code") == searchCode)
        .Take(1);
    if (results.Any()) {
        results.First()["columnname"] = 10;
    }
}
This simplified code is executed thousands of times, once for each entry in someList. When I run Visual Studio Performance Profiler I see that the "results.Any()" line is highlighted as consuming 93.5% of the execution time.
I've tried several different methods for optimizing this code, but none have improved performance while keeping the emoteTable DataTable as the primary source of the data. I can convert emoteTable to Dictionary<String, DataRow> outside of the foreach, but then I have to keep the DataTable and the Dictionary in sync, which while still a performance improvement, feels wrong.
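For reference, this is roughly what that dictionary approach would look like (assuming the "code" values are unique):
// Build the lookup once, outside the loop; the DataRow references still point into emoteTable,
// so assigning through them updates the DataTable as well.
var rowsByCode = emoteTable.AsEnumerable()
    .ToDictionary(row => row.Field<String>("code"));

foreach (String someValue in someList) {
    String searchCode = OutOfScopeFunction(someValue);
    if (rowsByCode.TryGetValue(searchCode, out var row)) {
        row["columnname"] = 10;
    }
}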
Three questions:
Is this the proper way to search for a value in a DataTable (equivalent of a traditional SQL WHERE clause)? If not, how SHOULD it be done?
Addendum to 1, regardless of the proper way, what is the fastest (execution time)?
Why does the results.Any() line consume 90%+ resources? In this situation it makes more sense that the var results line should consume the resources, after all, it's the line doing the actual search, right?
Thank you for your time. If I find an answer I shall post it here as well.
Any() is taking 90% of the time because the query is only executed when you call Any(); before you call Any(), the query has not actually been run.
It would seem the problem is that you first fetch the entire table into memory and then search it. You should instruct your database to do the search.
Moreover, when you call results.First(), the whole results query is executed again.
With deferred execution in mind, you should write something like
var result = emoteTable.AsEnumerable()
    .Where(myRow => myRow.Field<String>("code") == searchCode)
    .FirstOrDefault();
if (result != null) {
    result["columnname"] = 10;
}
What you have implemented is pretty much a join:
var searchCodes = someList.Select(OutOfScopeFunction);
var emotes = emoteTable.AsEnumerable();
var results = Enumerable.Join(emotes, searchCodes, e => e.Field<String>("code"), sc => sc, (e, sc) => e);
foreach (var result in results)
{
    result["columnname"] = 10;
}
Join will probably optimize the access to both lists using some kind of lookup.
But the first thing I would do is completely abandon the idea of combining DataTable and LINQ. They are two different technologies, and trying to reason about what they might do internally when combined is hard.
Did you try doing raw UPDATE calls? How many items are you expecting to update?
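For what it's worth, a raw UPDATE per code could look roughly like this with MySql.Data (connectionString and the "emotes" table name are assumptions; "code" and "columnname" come from the simplified code above):
// using MySql.Data.MySqlClient;
using (var conn = new MySqlConnection(connectionString)) {
    conn.Open();
    // Parameterized statement reused for every code in someList.
    using (var cmd = new MySqlCommand(
        "UPDATE emotes SET columnname = 10 WHERE code = @code", conn)) {
        var codeParam = cmd.Parameters.Add("@code", MySqlDbType.VarChar);
        foreach (String someValue in someList) {
            codeParam.Value = OutOfScopeFunction(someValue);
            cmd.ExecuteNonQuery();
        }
    }
}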
I've been using Linq-to-SQL for quite a while and it works great. However, lately I've been experimenting with using it to pull really large amounts of data and am running across some issues. (Of course, I understand that L2S may not be the right tool for this particular kind of processing, but that's why I'm experimenting - to find its limits.)
Here's a code sample:
var buf = new StringBuilder();
var dc = new DataContext(AppSettings.ConnectionString);
var records = from a in dc.GetTable<MyReallyBigTable>() where a.State == "OH" select a;
var i = 0;
foreach (var record in records) {
    buf.AppendLine(record.ID.ToString());
    i += 1;
    if (i > 3) {
        break; // Takes forever...
    }
}
Once I start iterating over the data, the query executes as expected. When stepping through the code, I enter the loop right away which is exactly what I hoped for - that means that L2S appears to be using a DataReader behind the scenes instead of pulling all the data first. However, once I get to the break, the query continues to run and pull all the rest of the records. Here are my questions for the SO community:
1.) Is there a way to stop Linq-to-SQL from finishing execution of a really big query in the middle the way you can with a DataReader?
2.) If you execute a large Linq-to-SQL query, is there a way to prevent the DataContext from filling up with change tracking information for every object returned. Basically, instead of filling up memory, can I do a large query with short object lifecycles the way you can with DataReader techniques?
I'm okay if this isn't functionality built-in to the DataContext itself and requires extending the functionality with some customization. I'm just looking to leverage the simplicity and power of Linq for large queries for nightly processing tasks instead of relying on T-SQL for everything.
1.) Is there a way to stop Linq-to-SQL from finishing execution of a really big query in the middle the way you can with a DataReader?
Not quite. Once the query is finally executed, the underlying SQL statement returns a result set of matching records. The query is deferred up to that point, but not during traversal.
For your example you could simply use records.Take(3) but I understand your actual logic to halt the process might be external to SQL or not easily translatable.
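As a minimal sketch, applying Take to the query from the question means the limit happens on the SQL side (LINQ to SQL emits a TOP clause) rather than being cut off mid-traversal:
// Only the first three matching rows come back from SQL Server (SELECT TOP (3) ...).
var records = (from a in dc.GetTable<MyReallyBigTable>()
               where a.State == "OH"
               select a).Take(3);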
You could use a combination approach by building a strongly typed LINQ query then executing it with old fashioned ADO.NET. The downside is you lose the mapping to the class and have to manually deal with the SqlDataReader results. An example of this is shown below:
var query = from c in Customers
            where c.ID < 15
            select c;

using (var command = dc.GetCommand(query))
{
    command.Connection.Open();
    using (var reader = command.ExecuteReader())
    {
        int i = 0;
        while (reader.Read())
        {
            Customer c = new Customer();
            c.ID = reader.GetInt32(reader.GetOrdinal("ID"));
            c.Name = reader.GetString(reader.GetOrdinal("Name"));
            Console.WriteLine("{0}: {1}", c.ID, c.Name);
            i++;
            if (i > 3)
                break;
        }
    }
}
2.) If you execute a large Linq-to-SQL query, is there a way to prevent the DataContext from filling up with change tracking information for every object returned?
If your intention for a particular query is to use it for read-only purposes then you could disable object tracking to increase performance by setting the DataContext.ObjectTrackingEnabled property to false:
using (var dc = new MyDataContext())
{
dc.ObjectTrackingEnabled = false;
// do stuff
}
You can also read this MSDN topic: How to: Retrieve Information As Read-Only (LINQ to SQL).