RavenDB rebuilds all indexes when one document is updated - c#

I can't seem to find an answer for this, even after Googling around.
We are experiencing issues causing our app to lock up. Partly this is because of outstanding WaitForNonStaleResultsAsOfNow calls (we have already removed them and are waiting to release the fix), but it is also being caused by a total rebuild of all indexes. I believe the trigger that causes all indexes to be rebuilt is a change to one (type of) document. For example:
We have a model called "Agency". When our users log in, we use their "AgencyId" in order to provide them with data specific to them. As such, most other documents (such as "Placements", "Invoices" etc) have an "AgencyId" field.
Agency model looks something like:
public class Agency
{
    public string Id { get; set; }
    public string AgencyName { get; set; }
    // ...
}
Example of Placement (and other Agency specific documents)
public class Placement
{
    public string Id { get; set; }
    public string AgencyId { get; set; } // relates to an Agency document
    // ...
}
We have a feature that allows Administrators to upload documents (PDFs) to an Agency's profile. We store the PDF in a DFS and set the "DocumentPath" property on the Agency model to where it's saved.
My question: Would updating the Agency record cause a rebuild of all related documents' indexes? i.e. I know the AgencyIndex would rebuild but would this cause the PlacementIndex (and all other related indexes) to rebuild as well?
More information:
Raven Client Build#: 2.5.2952
Raven Server Build#: 2.5.2952 (RavenHQ)
Also worth noting: We are working on upgrading to RavenDB 3.0 asap but this is a real live problem and I need to understand why it's happening!

Yes, updating a document that many other documents point to will certainly cause the indexes to rebuild.
Some types of operations need the index not to be stale (or force an update on a stale index). You should pass a cutoff to WaitForNonStaleResultsAsOfNow, which can take a TimeSpan parameter, so you wait for the index to become non-stale for at most a predefined time.
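A minimal sketch against the 2.5 client (documentStore and agencyId are assumed to already exist in scope): passing a TimeSpan cutoff bounds how long a query can block on a stale index instead of waiting indefinitely.

using System;
using System.Linq;
using Raven.Client;

using (var session = documentStore.OpenSession())
{
    var placements = session.Query<Placement>()
        // Wait at most 5 seconds for the index to catch up, then return anyway.
        .Customize(x => x.WaitForNonStaleResultsAsOfNow(TimeSpan.FromSeconds(5)))
        .Where(p => p.AgencyId == agencyId)
        .ToList();
}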


How to check this exact class isn't already in DB with another name? (MongoDB)

Using C# and MongoDB, I'm saving a class similar to the following.
public class Zone
{
    public string ZoneName { get; set; }
    public List<string> IncludedCountries { get; set; } = new List<string>();
}
This is filled in by the user and saved in my DB. Currently I check that the zone name isn't duplicated when inserting, like so:
if (All().Any(x => x.ZoneName.ToLower() == zone.ZoneName.ToLower()))
{
    throw new System.Exception($"Zone \"{zone.ZoneName}\" is already in database, please edit the zone");
}
But if the user tries to add the exact same values (so the exact same list of included countries) under a different name, I wouldn't catch it.
I want to catch that too, as I don't want to duplicate the same data in the DB (my actual class has many more properties; this is just an example). I'm aware I could check it the same way I'm checking the name, but given how many properties I have, I'd like to know what the best way is.
Ideally you wouldn't perform a search and then use the result to decide whether to insert. In a collaborative system with potentially multiple users, another user's transaction could run the same code at the same time and end up adding the record just after your check but just before your insert.
It's better, assuming your datastore supports it, to use a uniqueness constraint on some value of the data you're inserting. Here are the docs for MongoDB: https://docs.mongodb.com/manual/core/index-unique/
This means the database will fail the write if you attempt to insert a duplicate. To be fair, there's nothing wrong with doing the "ask-then-tell" check as well, to avoid showing ugly exceptions to users, but if you interrogate the exception details you can catch the failure and show the user some helpful information rather than letting them see an error page.
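A hedged sketch with the MongoDB .NET driver (2.x API; the zones collection variable and zone instance are assumptions, and note the index is case-sensitive unless you add a collation):

using MongoDB.Driver;

// Enforce uniqueness at the database level.
var keys = Builders<Zone>.IndexKeys.Ascending(z => z.ZoneName);
zones.Indexes.CreateOne(new CreateIndexModel<Zone>(keys, new CreateIndexOptions { Unique = true }));

try
{
    zones.InsertOne(zone);
}
catch (MongoWriteException e) when (e.WriteError.Category == ServerErrorCategory.DuplicateKey)
{
    // Translate the database error into a friendly message instead of an error page.
}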
To support your requirement for "has the same list of things" in this way, I'd suggest creating a SHA256 hash value (here's a link: https://stackoverflow.com/a/6839784/26414) for the list, and storing that as a property in its own right. Just make sure it's recalculated if the list changes.
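A hedged sketch of that hashing idea: a stable SHA-256 fingerprint of the zone's contents, stored as its own (unique-indexable) field. The helper name and canonical format are assumptions.

using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ZoneFingerprint
{
    public static string Compute(Zone zone)
    {
        // Sort first so ["DE","FR"] and ["FR","DE"] hash identically.
        var canonical = string.Join("|",
            zone.IncludedCountries.OrderBy(c => c, StringComparer.Ordinal));
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(canonical));
            return BitConverter.ToString(hash).Replace("-", ""); // hex string
        }
    }
}

You could then put a unique index on this fingerprint field, exactly as for ZoneName above.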
One additional thing - technically "class" defines the schema, or shape of a bit of data. When you create an instance of a class at runtime, which has actual values and takes up memory, that's technically an "object". So an "object" is an "instance" of a "class".

Unity/C# Savegame Migration

I've written a SaveLoad class, which contains a Savegame class that has a bunch of ints, doubles, bools but also more complex things like an array of self-written class objects.
That savegame object is being created, serialized and AES encrypted on save and vice versa on load - so far, so good.
The problem I'm facing now is that if a newer version of the game introduces new variables that have to be stored and loaded, the game crashes on load, because the new variables can't be loaded correctly: they are not contained in the old save file. E.g. ints and doubles contain the default 0, while an array is not initialized and is thus null.
My current "solution": For each variable that is being loaded, check if it doesn't contain a specific value (which I set in the Savegame class).
For example: In Savegame I set
public int myInt = int.MinValue;
and when loading, I check:
if (savegame.myInt != int.MinValue)
{
    // load successful
}
else
{
    // load failed
}
This works so far for int and double, but once I hit the first bool I realized that for every variable I'd have to find a value that makes "no sense" (is normally unreachable) to signal a failed load. That's a lousy method for bools.
I could now go ahead and convert all bools to ints, but this is getting ugly...
There must be a cleaner and/or smarter solution to this. Maybe some sort of savegame migrator? If there is a well-made, free plugin for this, that would also be fine for me, but I'd prefer a code solution, which may also be more helpful for other people with a similar problem.
Thanks in advance! :)
Your issue is poor implementation.
If you are going to be having changes like this, you should be following Extend, Deprecate, Delete (EDD).
In this case, you should implement new properties/fields as nullables until you can go through and data-repair your old save files. This way, you can first check whether the loaded field is null or has a value. If it has a value, you're good to go; if it's null, you don't have a value and need to handle that in some way.
e.g.
/* We deprecate the old one by marking it obsolete */
[Obsolete("Use NewSaveGameFile instead")]
public class OldSaveGameFile
{
    public int SomeInt { get; set; }
}

/* We extend by creating a new class with the old one's fields */
/* and the new one's fields as nullables */
public class NewSaveGameFile
{
    public int SomeInt { get; set; }
    public bool? SomeNullableBool { get; set; }
}
public class FileLoader
{
    public NewSaveGameFile LoadMyFile()
    {
        NewSaveGameFile newFile = GetFileFromDatabase(); // Code to load the file
        if (newFile.SomeNullableBool.HasValue)
        {
            // You're good to go
        }
        else
        {
            // It's missing this property, so set it to a default value and save it
            newFile.SomeNullableBool = false;
        }
        return newFile;
    }
}
Then, once everything has been data-repaired, you can fully migrate to the NewSaveGameFile and remove the nullables (this would be the delete step).
One solution would be to store the version of the save-file format in the save file itself, i.e. a property called version.
Then, when initially opening the file, you can call the correct method to load the save game. It could be a different method, a versioned interface, different classes, etc., but you would need one of these per save-file version you have.
After loading the file in its own version, you can then write migration objects/methods that populate the default values as the data is upgraded to a newer version in memory. This is similar to your checks above, but you'd need to know which properties/values have to be set between each pair of versions and apply the defaults. This gives you the ability to migrate forward step by step, so a really old save can be updated to the newest version available.
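A minimal sketch of that dispatch, assuming JSON serialization via Unity's JsonUtility; the Savegame type is from the question, while the header type and per-version loader/migration helpers are hypothetical:

using System;
using UnityEngine;

[Serializable]
public class SaveHeader { public string version; }

public static class SaveLoader
{
    public static Savegame Load(string json)
    {
        // Peek at the version field first, then dispatch to the matching loader.
        var header = JsonUtility.FromJson<SaveHeader>(json);
        switch (header.version)
        {
            case "1": return MigrateV1ToV2(LoadV1(json)); // old layout, then upgrade
            case "2": return LoadV2(json);                // current layout
            default: throw new NotSupportedException("Unknown save version: " + header.version);
        }
    }

    // Each of these would know exactly one concrete save layout.
    static Savegame LoadV1(string json) { /* parse the v1 shape */ return null; }
    static Savegame LoadV2(string json) { return JsonUtility.FromJson<Savegame>(json); }
    static Savegame MigrateV1ToV2(Savegame old) { /* fill defaults for new fields */ return old; }
}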
I'm facing the same problem and trying to build a sustainable solution. Ideally someone should be able to open the game in 10 years and still access their save, even if the game has changed substantially.
I'm having a hard time finding a library that does this for me, so I may build my own (please let me know if you know of one!)
The way that changing schemas is generally handled in the world of web engineering is through migrations: if an old version of a file is found, we run it through sequential schema migrations until it's up to date.
I can think of two ways to do this:
Either you save all save files to the cloud (say, in MongoDB) and migrate their save data server-side whenever you ship an update, or
you run old save data through standardized migrations on the client when the player attempts to load an old version of the save file.
If I wanted to make the client update stale saved states then, every time I need to change the structure of the save file (on a game that's been released):
Create a new SavablePlayerData0_0_0 where 0_0_0 is using semantic versioning
Make sure every SavablePlayerData includes public string version="0_0_0"
We'll maintain a static Dictionary<string, Type> versionToType = new Dictionary<string, Type> { { "0_0_0", typeof(SavablePlayerData0_0_0) } } and a static string currentSavedDataVersion
We'll also maintain a list of migration methods which we NEVER get rid of, something like:
public SavablePlayerData0_0_1 Migration_0_0_0_to_next(SavablePlayerData0_0_0 oldFile)
{
    return new SavablePlayerData0_0_1(attrA: oldFile.attrA, attrB: someDefault);
}
Then you'd figure out which version they were on from the file's version field, and run their save state through sequential migrations until it matches the latest, valid state.
Something like (total pseudocode):
public NewSavedDataVersion MigrateToCurrent(PrevSavedDataVersion savedData)
{
    // Look up the migration for this version and apply it once.
    var nextSavedData = MigrationManager.migrationDict[GetVersion(savedData)](savedData);
    if (GetVersion(nextSavedData) != MigrationManager.currentVersion)
    {
        return MigrateToCurrent(nextSavedData); // keep stepping forward
    }
    return nextSavedData;
}
Finally, you'd want to make sure you use a type alias and [Obsolete] to quickly shift over your codebase to the new save version
It might, all in all, be easier to just work with a save-file-in-the-cloud so you can control migration. If you do this, then when a user tries to open the game with an older version, you must block them and force them to update the game to match the save version stored in the cloud.

How to update all document fields except specified ones in mongodb

I present a simple model:
public class UserDocument
{
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; }
    public string DisplayName { get; set; }
    public List<string> Friends { get; set; }
}
I am using the latest C# driver, which can replace a whole document from a C# object, automatically updating all of its fields. The problem is that I want to update all fields except the user's friends, because that field holds the relations to other documents. Of course I could manually update each of the fields I do want updated, which here are just two.
But this example is kept simple to make the point. In reality there are many more fields, and updating each one individually would be painful: it would take a line per field using the Set operator, and newly added fields would have to be wired up the same way instead of just working automatically.
Is there a way to achieve that, i.e. automatically update all fields while only specifying a list of excluded ones?
There is no way, using the provided builders, to express a "blacklist" update which excludes only specific fields.
You can query the old document, copy the old values of those fields onto the new instance, and then replace the document entirely in the database.
You can also generate such an update command by iterating over the fields using reflection, but the MongoDB driver doesn't offer such a query built in.
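A hedged sketch of that reflection approach with the 2.x driver. It assumes the BSON element names match the property names (adjust if you use [BsonElement] mappings), and the helper name is made up:

using System.Linq;
using MongoDB.Driver;

public static class UpdateBuilders
{
    // Build one combined $set covering every public property except Id and the excluded ones.
    public static UpdateDefinition<T> SetAllExcept<T>(T doc, params string[] excluded)
    {
        var builder = Builders<T>.Update;
        UpdateDefinition<T> update = null;
        foreach (var prop in typeof(T).GetProperties()
                     .Where(p => p.Name != "Id" && !excluded.Contains(p.Name)))
        {
            var set = builder.Set(prop.Name, prop.GetValue(doc));
            update = update == null ? set : builder.Combine(update, set);
        }
        return update;
    }
}

// Usage: update everything on 'user' except the Friends relation.
// collection.UpdateOne(u => u.Id == user.Id,
//     UpdateBuilders.SetAllExcept(user, nameof(UserDocument.Friends)));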
I figured out a way to do this with MongoDB using JavaScript/Node.js, so maybe the logic can translate to C#?
I wanted to update all fields without having to actually explicitly state them (all fields except for one, it turned out).
Attempted update of all document fields:
await examCollection.findOneAndUpdate(
    { _id: new ObjectID(this.examId) },
    { $set: this.data }
)
...except, this.data happened to have _id in it as well, which I didn't want to update. (In fact, it gave me an error, because _id is immutable.)
So, for my workaround, I ended up "deleting" all fields on the object that I didn't want to update (i.e. _id).
Successful update of all non-specified document fields:
// (1) specify fields that I don't want updated (aka get rid of them from object) (similar option in C#?)
delete this.data._id
//delete this.data.anotherField
//delete this.data.anotherField2
//delete this.data.anotherField3
// (2) update MongoDB document
await examCollection.findOneAndUpdate(
    { _id: new ObjectID(this.examId) },
    { $set: this.data }
)
This was much easier than explicitly stating all the fields I did want to update, because there were A LOT, and they could potentially change in the future (new fields added, fields deleted, etc.).
Hopefully this strategy can help!
Note: In reality I did my "field specifying" earlier, in another file, rather than immediately before updating as shown in the example, but the effect is the same.
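A hedged C# equivalent of the trick above (variable names are illustrative; examCollection is assumed to be an IMongoCollection<BsonDocument>, and this runs inside an async method): convert the object to a BsonDocument, remove the fields you don't want touched, and $set the rest.

using MongoDB.Bson;
using MongoDB.Driver;

var data = exam.ToBsonDocument(); // 'exam' is your POCO instance
data.Remove("_id");               // _id is immutable, so never $set it
// data.Remove("AnotherField");   // exclude further fields the same way

await examCollection.UpdateOneAsync(
    Builders<BsonDocument>.Filter.Eq("_id", new ObjectId(examId)),
    new BsonDocument("$set", data));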

MongoDB C# driver - serialization of POCO references?

I'm researching MongoDB at the moment. It's my understanding that the official C# driver can perform serialization and deserialization of POCOs. What I haven't found information on yet is how a reference between two objects is serialized. I'm talking about something that would be represented as two separate documents with ID links, rather than embedded documents.
Can the serialization mechanism handle this kind of situation? (1):
class Thing {
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Thing RelatedThing { get; set; }
}
Or do we have to sacrifice some OOP, and do something like this? (2) :
class Thing {
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Guid RelatedThing_ID { get; set; }
}
UPDATE:
Just a couple of related questions then...
a) If the serializer is able to handle situation (1), what would an example look like that does this without using embedding?
b) If using embedding, would it be possible to query across all 'Things', regardless of whether they are 'parents' or embedded elements? What would such a query look like?
The C# driver can handle serializing the class containing a reference to another instance of itself (1). However:
As you surmised, it will use embedding to represent this
There must be no circular paths in the object graph or a stack overflow will occur
If you want to store it as separate documents you will have to use your second class (2) and do multiple inserts.
Querying across multiple levels is not really possible when the object is stored as one large document with nested embedding. You might want to look at some alternatives like:
https://docs.mongodb.com/manual/applications/data-models-tree-structures/
Yes, that is completely possible.
One thing you must understand about MongoDB and most NoSQL solutions is that objects can be contained within other objects. In the case of MongoDB it's basically: if you can create the object in JSON, then you can create the object in MongoDB.
In general, you should strive to have a "relatively" denormalized database structure. A little bit of duplicated data is ok as long as you're not updating it often.
If you really want a reference to another document, you can use a DBRef. However, there are limitations with references in MongoDB:
you can only query a ref by id
when you get your Thing's document, you'll have to make a second query to get the associated RelatedThing's document, as joins don't exist in MongoDB.
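A small sketch of that second query with the modern C# driver, assuming the (2)-style model with a RelatedThing_ID foreign key (variable names are illustrative):

using MongoDB.Driver;

var thing = things.Find(t => t.Id == thingId).FirstOrDefault();
// No joins in MongoDB, so resolve the reference with a second round-trip.
var related = things.Find(t => t.Id == thing.RelatedThing_ID).FirstOrDefault();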
I've encountered the same issue recently. I usually steer away from schemes like this, but... I'm thinking this could be a good use for a significant numbering system deployed on the Id field.
class Thing {
    public string Id { get; set; }
    public string Name { get; set; }
    public string RelatedThing { get; set; }
}
So, simplifying: if Id were something like "T00001" (or indeed T + GUID), you could easily get the set of Things from Mongo by querying for Ids starting with T, and set up objects for them all (or just for the subset you know contains your reference, if it is a very large set).
You know/expect RelatedThing to be a Thing, but it will just be a string when it comes back from Mongo. If you've set up objects as above, though, you can effectively use the string as if it were an object reference (after all, that is what it really is, done kind of "manually").
It's a 'loose' way of doing it, but it might be workable for you.
Can anyone see any pitfalls with that approach?

Db4O activation depth, Faq, Best Practise for Web Application

Our database contains 4,000,000 records (SQL Server) and its physical size is 550 MB.
Entities in the database are related to each other in a graph structure. When I load an entity from the DB with a depth of 5 levels, there is a problem (all records are loaded).
Is there any mechanism like Entity Framework's Include("MyProperty.ItsProperty")?
What are the best types to use with db4o databases?
Are there any issues with Guid or generic collections?
Is there a best practice for a web application with db4o? Session containers + an embedded db4o DB, or client/server db4o?
Thanks for the help.
Thanks for the good explanation, but let me give my exact problem as a sample:
I have three entities (N-N relationship; B is an intersection entity; concept: graph):
class A
{
    public B[] BList;
    public int Number;
    public R R;
}

class B
{
    public A A;
    public C C;
    public D D;
    public int Number;
}

class C
{
    public B[] BList;
    public E E;
    public F F;
    public int Number;
}
I want to query dbContext.A.Include("BList.C.BList.A").Include("BList.C.E.G").Where(....)
I want to get: A.BList.C.BList.A.R
But I don't want to get: A.R
I want to get: A.BList.C.E.G
But I don't want to get: A.BList.C.F
I want to get: A.BList.C.E.G
But I don't want to get: A.BList.D
Note: these requirements can change from one query to another.
An extra question: is there any possibility to load
A.BList[#Number<120].C.BList.A[#Number>100]? Super syntax :)
Activation: As you said, db4o uses its activation mechanism to control which objects are loaded. To prevent too many objects from being loaded, there are different strategies (a sketch follows below):
Lower the global default activation depth: configuration.Common.ActivationDepth = 2. Then use the strategies below to activate objects on demand.
Use class-specific activation configuration, like cascading activation, minimum and maximum activation depth, etc.
Activate objects explicitly on demand: container.Activate(theObject, 5)
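A hedged sketch of those calls using the Db4objects.Db4o embedded API and the A/C classes from the question (file name and query are illustrative; exact namespaces can vary by version):

using System.Linq;
using Db4objects.Db4o;

var configuration = Db4oEmbedded.NewConfiguration();
configuration.Common.ActivationDepth = 2;                            // global default
configuration.Common.ObjectClass(typeof(A)).CascadeOnActivate(true); // class-specific cascade
configuration.Common.ObjectClass(typeof(C)).MinimumActivationDepth(1);

using (var container = Db4oEmbedded.OpenFile(configuration, "app.db4o"))
{
    var a = container.Query<A>().First(); // activated to the configured depth
    container.Activate(a, 5);             // explicit activation, 5 levels deep
}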
However, all of this is rather painful on complex object graphs. The only strategy to get away from that pain is transparent activation. Create an attribute like TransparentlyActivated, use it to mark your stored classes, and then use the Db4oTool to enhance your classes. Add the Db4oTool command to the post-build events in Visual Studio, like: PathTo\Db4oTool.exe -ta -debug -by-attribute:YourNamespace.TransparentlyActivated $(TargetPath)
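The marker attribute itself can be trivial; a sketch, with the name matching the -by-attribute switch above:

using System;

// Marker attribute consumed by Db4oTool's -ta -by-attribute enhancement step.
[AttributeUsage(AttributeTargets.Class)]
public class TransparentlyActivatedAttribute : Attribute { }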
Guid, generic collections:
No issues (in version 7.12 or 8.0). However, if you store your own structs: those are handled very poorly by db4o.
Web application: I recommend an embedded container, and then a session container for each request.
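A rough sketch of that arrangement (db4o 8.x embedded API; names are assumptions): one container opened once for the application, plus a lightweight session container per web request.

using Db4objects.Db4o;

// Opened once at application startup (e.g. in Application_Start).
var rootContainer = Db4oEmbedded.OpenFile(configuration, "app.db4o");

// Per web request: a session container sharing the root container's storage.
using (var session = rootContainer.Ext().OpenSession())
{
    // request-scoped reads/writes against 'session'
}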
Update for the extended question
In your case, for such a complex activation schema, I would use transparent activation.
I assume you are using properties and not public fields in your real scenario; otherwise transparent persistence doesn't work.
Transparent activation basically loads an object the moment one of its methods/properties is called for the first time. So when you access the property A.R, A itself is loaded, but not the objects it references. Let me walk through a few of your access patterns to show what I mean:
Getting 'A.BList.C.BList.A.R'
A is loaded when you access A.BList. The BList array is filled with unactivated objects.
You keep navigating further to BList.C. At this moment the B element is loaded.
Then you access C.BList. db4o loads the C object.
And so on and so forth.
So when you get 'A.BList.C.BList.A.R', 'A.R' isn't loaded.
An unloaded object is represented by an 'empty' shell object, which has all values set to null or the default value. Arrays are always loaded in full, but initially filled with unactivated objects.
Note that there's no real query syntax for this kind of elaborate load request. You load your start object and then pull things in as you need them.
I also need to mention that this kind of access will perform terribly over the network with db4o.
Yet another hint: if you want to do elaborate work on a graph structure, you should also take a look at graph databases, like Neo4j or Sones GraphDB.
