When attempting to perform an upsert operation in Mongo, I'd like to have it generate a GUID for the ID instead of an ObjectId. In this case, I'm checking to make sure an object with specific properties doesn't already exist, and actually throwing an exception if the update occurs.
Here's a stub of the class definition:
public class Event
{
[BsonId(IdGenerator = typeof(GuidGenerator) )]
[BsonRepresentation(BsonType.String)]
[BsonIgnoreIfDefault]
public Guid Id { get; set; }
// ... more properties and junk
}
And here is how we are performing the upsert operation:
// query to see if there are any pending operations
var keyMatchQuery = Query<Event>.In(r => r.Key, keyList);
var statusMatchQuery = Query<Event>.EQ(r => r.Status, "pending");
var query = Query.And(keyMatchQuery, statusMatchQuery);
var updateQuery = new UpdateBuilder();
var bson = request.ToBsonDocument();
foreach (var item in bson)
{
updateQuery.SetOnInsert(item.Name, item.Value);
}
var fields = Fields<Request>.Include(req => req.Id);
var args = new FindAndModifyArgs()
{
Fields = fields,
Query = query,
Update = updateQuery,
Upsert = true,
VersionReturned = FindAndModifyDocumentVersion.Modified
};
// Perform the upsert
var result = Collection.FindAndModify(args);
Doing it this way will generate the ID as an ObjectID rather than a GUID.
I can definitely get the behavior I want as a two-step operation: performing a .FindOne first and, if it finds nothing, doing a direct insert:
var existingItem = Collection.FindOneAs<Event>(query);
if (existingItem != null)
{
throw new PendingException(string.Format("Event already pending: id={0}", existingItem.Id));
}
var result = Collection.Insert(mongoRequest);
In this case, it correctly sets the GUID for the new item, but the operation is non-atomic. I was searching for a way to set the default ID generation mechanism at the driver level, and thought this would do it:
BsonSerializer.RegisterIdGenerator(typeof(Guid), GuidGenerator.Instance);
...but to no avail; I assume that's because the ID field can't be included in the upsert, so there is no serialization happening and Mongo is doing all of the work. I also looked into implementing a convention, but that didn't make sense, since there are separate generation mechanisms to handle that. Is there a different approach I should be looking at for this, and/or am I just missing something?
I do realize that GUIDs are not always ideal in Mongo, but we are exploring using them due to compatibility with another system.
What's happening is that only the server knows whether the FindAndModify is going to end up being an upsert or not. As currently written, it is the server that automatically generates the _id value, and the server can only assume that the _id value should be an ObjectId (the server knows nothing about your class declarations).
Here's a simplified example using the shell showing your scenario (minus all the C# code...):
> db.test.drop()
> db.test.find()
> var query = { x : 1 }
> var update = { $setOnInsert : { y : 2 } }
> db.test.findAndModify({ query: query, update : update, new : true, upsert : true })
{ "_id" : ObjectId("5346c3e8a8f26cfae50837d6"), "x" : 1, "y" : 2 }
> db.test.find()
{ "_id" : ObjectId("5346c3e8a8f26cfae50837d6"), "x" : 1, "y" : 2 }
>
We know this was an upsert because we ran it on an empty collection. Note that the server used the query as an initial template for the new document (that's where the "x" came from), applied the update specification (that's where the "y" came from), and because the document had no "_id" it generated a new ObjectId for it.
The trick is to generate the _id client side in case it turns out to be needed, but to put it in the update specification in such a way that it only applies if it's a new document. Here's the previous example using $setOnInsert for the _id:
> db.test.drop()
> db.test.find()
> var query = { x : 1 }
> var update = { $setOnInsert : { _id : "E3650127-9B23-4209-9053-1CD989AE62B9", y : 2 } }
> db.test.findAndModify({ query: query, update : update, new : true, upsert : true })
{ "_id" : "E3650127-9B23-4209-9053-1CD989AE62B9", "x" : 1, "y" : 2 }
> db.test.find()
{ "_id" : "E3650127-9B23-4209-9053-1CD989AE62B9", "x" : 1, "y" : 2 }
>
Now we see that the server used the _id we supplied instead of generating an ObjectId.
In terms of your C# code, simply add the following to your updateQuery:
updateQuery.SetOnInsert("_id", Guid.NewGuid().ToString());
You should consider renaming your updateQuery variable to updateSpecification (or just update) because technically it's not a query.
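Putting it together with the rest of your code, the upsert portion would look something like this (a sketch against the legacy 1.x driver, reusing your query and updateQuery variables):
// Generate the candidate _id client side; because it's in $setOnInsert,
// it only takes effect if the FindAndModify results in an insert.
updateQuery.SetOnInsert("_id", Guid.NewGuid().ToString());

var args = new FindAndModifyArgs
{
    Query = query,
    Update = updateQuery,
    Upsert = true,
    VersionReturned = FindAndModifyDocumentVersion.Modified
};
var result = Collection.FindAndModify(args);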
There's a catch though... this technique is only going to work against the current 2.6 version of the server. See: https://jira.mongodb.org/browse/SERVER-9958
You seem to be following the recommended practice for this, but possibly this is bypassed with "upserts" somehow. The general problem seems to be that the operation does not actually know which "class" it is actually dealing with and has no way of knowing that it needs to call the custom Id generator.
Any value that you pass in to MongoDB for the _id field will always be honored in place of generating the default ObjectID. Therefore if that field is included in the update "document" portion of the statement it will be used.
Probably the safest way to do this when expecting "upsert" behavior is to use the $setOnInsert modifier. Anything specified in here will only be set when an insert occurs from a related "upsert" operation. So in general terms:
db.collection.update(
{ "something": "matching" }
{
// Only on insert
"$setOnInsert": {
"_id": 123
},
// Always applied on update
"$set": {
"otherField": "value"
}
},
{ upsert: true }
)
So anything within the $set ( or other valid update operators ) will always be "updated" when the matching "query" condition is found. The $setOnInsert fields will be applied when the "insert" actually occurs due to no match. Naturally any literal conditions used in the query portion to "match" are also set so that future "upserts" will issue an "update" instead.
So as long as you structure your "update" BSON document to include your newly generated GUID in this way then you will always get the correct value in there.
Much of your code is on the right track, but you will need to invoke the method from your generator class and place the value in the $setOnInsert portion of the statement, which you are already using, just without including that _id value yet.
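As a rough sketch with the legacy C# driver's builders (the query field names and the string GUID representation are illustrative, not taken from your model):
var query = Query.EQ("Something", "matching");
var update = Update
    .SetOnInsert("_id", Guid.NewGuid().ToString()) // applied only when the upsert inserts
    .Set("OtherField", "value");                   // applied on every matching update
collection.Update(query, update, UpdateFlags.Upsert);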
Update
The crux of the issue is caused by using a nested field on a property inside the where clause; I've provided a simplified example (which fails with the same "cannot be translated" error) for illustration. For the full picture, read the content under the Original heading.
The following clause (rest of the query was working before I started using this nested property)
where (r.Status.IntVal == (int)StatusEnum.Answered) // IntVal is just an int
gets translated at runtime to
.Where(ti => ti.Outer.Status.IntVal == 1) // right hand side translates as expected
So the question is: How do you write a proper ValueConverter (or what other step do you take) in order to properly read and evaluate a nested field like this in EntityFramework?
Original Post
I recently updated a field on one of my EF models (using EF Core v3.1.3) from an enum to a class which has a System.Enum field on it. Because I want the database / EF to treat that field as the underlying integer value (like it does with enum) I'm using .HasConversion() in the fluent API, but I'm getting the dreaded "could not be translated" error.
The following line is the code in the query which is causing the issue
...
where (r.Status.Value == StatusEnum.Answered || r.Status.Value == StatusEnum.AwaitingResponse)
...
which gets translated to
.Where(ti => (int)ti.Outer.Status.Value == 1 || (int)ti.Outer.Status.Value == 0)
The value it is trying to translate is of this type:
public class MyStatusClass<T> where T : Enum
{
public MyStatusClass(int i)
{
Value = (T)Enum.Parse(typeof(T), i.ToString());
IntVal = i; // This was added to simplify the problem, see update above
}
public T Value { get; internal set; } // This is the field that I need stored in the database
public int IntVal { get; internal set; } // This was added to simplify the problem, see update above
// Some other methods in here; the reason I converted from an enum to this class
}
The conversion in my DbContext ModelBuilder looks like this
entity.Property(x => x.Status)
.HasConversion(
status => status.Value,
status => new MyStatusClass<StatusEnum>((int)status) // This is the part to focus on, as it's the part that tells EF how to read the property from the database
);
Please see the following post for some background: MongoDB C# Driver - Return last modified rows only
After almost two years of running this code, we've been experiencing performance problems lately, and as much as I keep saying that the code is not the issue, Infrastructure insists it's because I'm doing full table scans.
The thing is that the problem is environment specific. Our QA environment runs like a dream all the time, but Dev and Prod are very slow at times and fine at others; it's very erratic. They have the same data and code, but Dev and Prod have another app that also runs against the database.
My data has an Id as well as an _id (or AuditId) - I group the data by Id and then return the last _id for that record where it was not deleted. We have multiple historic records for the same ID and I would like to return the last one (see original post).
So I have the following method:
private static FilterDefinition<T> ForLastAuditIds<T>(IMongoCollection<T> collection) where T : Auditable, IMongoAuditable
{
var pipeline = new[]
{
    new BsonDocument
    {
        { "$group", new BsonDocument
            {
                { "_id", "$Id" },
                { "LastAuditId", new BsonDocument { { "$max", "$_id" } } }
            }
        }
    }
};
var lastAuditIds = collection.Aggregate<Audit>(pipeline).ToListAsync().Result.ToList().Select(_ => _.LastAuditId);
var forLastAuditIds = Builders<T>.Filter.Where(_ => lastAuditIds.Contains(_.AuditId) && _.Status != "DELETE");
return forLastAuditIds;
}
This method is called by the one below, which accepts an Expression that it appends to the FilterDefinition created by ForLastAuditIds.
protected List<T> GetLatest<T>(IMongoCollection<T> collection,
Expression<Func<T, bool>> filter, ProjectionDefinition<T, T> projection = null,
bool disableRoleCheck = false) where T : Auditable, IMongoAuditable
{
var forLastAuditIds = ForLastAuditIds(collection);
var limitedList = (
projection != null
? collection.Find(forLastAuditIds & filter, new FindOptions()).Project(projection)
: collection.Find(forLastAuditIds & filter, new FindOptions())
).ToListAsync().Result.ToList();
return limitedList;
}
Now, all of this works really well and is re-used by all of my code that calls Collections, but this specific collection is a lot bigger than the others and we are getting slowdowns just on that one.
My question is: Is there a way for me to take the aggregate and Filter Builder and combine them to return a single FilterDefinition that I could use without running the full table scan first?
I really hope I am making sense.
Assuming I fully understand what you want, this should be as easy as this:
First, put a descending index on the LastAuditId field:
db.collection.createIndex{ "LastAuditId": -1 /* for sorting */ }
Or even extend the index to cover for other fields that you have in your filter:
db.collection.createIndex{ "Status": 1, "LastAuditId": -1 /* for sorting */ }
Make sure, however, that you understand how indexes can/cannot support certain queries. And always use explain() to see what's really going on.
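For example (field names taken from your snippets; output details vary by server version):
db.collection.find({ "Status": { $ne: "DELETE" } })
    .sort({ "LastAuditId": -1 })
    .explain()
// depending on the server version you want to see IXSCAN (or BtreeCursor on
// older servers) here rather than COLLSCAN (or BasicCursor)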
The next step is to realize that you must always filter as much as possible as the very first step to reduce the amount of sorting required.
So, if you need to e.g. filter by Name then by all means do it as the very first step, if your business requirements permit it. Be careful, however: filtering at the start changes your semantics, in the sense that you will get the last modified documents per each Id that passed the preceding $match stage, as opposed to the last documents per each Id that happen to also pass the following $match stage.
Anyway, most importantly, once you've got a sorted set, you can easily and quickly get the latest full document by using $group with $first which - with the right index in place - will not do a collection scan anymore (it'll be an index scan for now and hence way faster).
Finally, you want to run the equivalent of the following MongoDB query through C# leveraging the $$ROOT variable in order to avoid a second query (I can put the required code together for you once you post your Audit, Auditable and IMongoAuditable types as well as any potential serializers/conventions):
db.getCollection('collection').aggregate({
$match: {
/* some criteria that you currently get in the "Expression<Func<BsonDocument, bool>> filter" */
}
}, {
$sort: {
"ModifiedDate": -1 // this will use the index!
}
}, {
$group: {
"_id": "$Id",
"document": { $first: "$$ROOT" } // no need to do a separate subsequent query or a $max/$min across the entire group because we're sorted!
}
}, {
$match: { // some additional filtering depending on your needs
"document.Status": { $ne: "Delete" }
}
})
Lastly, kindly note that it might be a good idea to move to the latest version of MongoDB because they are currently putting a lot of effort into optimizing aggregation cases like yours, e.g. this one: https://jira.mongodb.org/browse/SERVER-9507
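Not knowing those types yet, here's a rough BsonDocument-based sketch of how that pipeline might look with the 2.x C# driver (ModifiedDate, the contents of the first $match, and the result type are assumptions):
// Sketch only: the first $match stands in for whatever arrives in the
// Expression<Func<T, bool>> filter parameter.
var pipeline = new[]
{
    new BsonDocument("$match", new BsonDocument("SomeField", "someValue")),
    new BsonDocument("$sort", new BsonDocument("ModifiedDate", -1)),
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", "$Id" },
        { "document", new BsonDocument("$first", "$$ROOT") }
    }),
    new BsonDocument("$match",
        new BsonDocument("document.Status", new BsonDocument("$ne", "Delete")))
};
var latest = collection.Aggregate<BsonDocument>(pipeline).ToList();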
I'm using the C# driver (v1.8.3 from NuGet), and having a hard time determining if an $addToSet/upsert operation actually added a NEW item into the given array, or if the item already existed.
Adding a new item could fall into two cases, either the document didn't exist at all and was just created by the upsert, or the document existed but the array didn't exist or didn't contain the given item.
The reason I need to do this, is that I have large sets of data to load into MongoDB, which may (shouldn't, but may) break during processing. If this happens, I need to be able to start back up from the beginning without doing duplicate downstream processing (keep processing idempotent). In my flow, if an item is determined to be newly added, I queue up downstream processing of that given item, if it is determined to already have been added in the doc, then no more downstream work is required. My issue is that the result always returns saying that the call modified one document, even if the item was already existing in the array and nothing was actually modified.
Based on my understanding of the C# driver api, I should be able to make the call with WriteConcern.Acknowledged, and then check the WriteConcernResult.DocumentsAffected to see if it indeed updated a document or not.
My issue is that in all cases, the write concern result is returning back that 1 document was updated. :/
Here is an example document that my code is calling $addToSet on, which may or may not have this specific item in the "items" list to start with:
{
"_id" : "some-id-that-we-know-wont-change",
"items" : [
{
"s" : 4,
"i" : "some-value-we-know-is-static",
}
]
}
My query always uses an _id value which is known based on the processing metadata:
var query = new QueryDocument
{
{"_id", "some-id-that-we-know-wont-change"}
};
My update is as follows:
var result = mongoCollection.Update(query, new UpdateDocument()
{
{
"$addToSet", new BsonDocument()
{
{ "items", new BsonDocument()
{
{ "s", 4 },
{ "i", "some-value-we-know-is-static" }
}
}
}
}
}, new MongoUpdateOptions() { Flags = UpdateFlags.Upsert, WriteConcern = WriteConcern.Acknowledged });
if(result.DocumentsAffected > 0 || result.UpdatedExisting)
{
//DO SOME POST PROCESSING WORK THAT SHOULD ONLY HAPPEN ONCE PER ITEM
}
If I run this code one time on an empty collection, the document is added and the response is as expected (DocumentsAffected = 1, UpdatedExisting = false). If I run it again (any number of times), the document remains unchanged, but the result is now unexpected (DocumentsAffected = 1, UpdatedExisting = true).
Shouldn't this be returning DocumentsAffected = 0 if the document is unchanged?
As we need to do many millions of these calls a day, I'm hesitant to turn this logic into multiple calls per item (first checking if the item exists in the given documents array, and then adding/queuing or just skipping) if at all possible.
Is there some way to get this working in a single call?
What you are doing here is checking the response, which does indicate whether a document was updated, inserted, or in fact whether neither operation happened. That is your best indicator, since for $addToSet to have performed an update, the document must actually have been modified.
The $addToSet operator itself cannot produce duplicates, that is the nature of the operator. But you may indeed have some problems with your logic:
{
"$addToSet", new BsonDocument()
{
{ "items", new BsonDocument()
{
{ "id", item.Id },
{ "v", item.Value }
}
}
}
}
So clearly you are showing that an item in your "set" is composed of two fields, so if that content varies in any way (i.e. same id but different value) then the item is actually a "unique" member of the set and will be added. There is no way, for instance, for the $addToSet operator to reject new values based purely on the "id" as a unique identifier; you would have to enforce that in your own code.
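A quick shell illustration of that point:
> db.test.insert({ "_id": 1, "items": [ { "id": 123, "v": 10 } ] })
> db.test.update({ "_id": 1 }, { "$addToSet": { "items": { "id": 123, "v": 11 } } })
> db.test.findOne()
{ "_id" : 1, "items" : [ { "id" : 123, "v" : 10 }, { "id" : 123, "v" : 11 } ] }
Same "id", different "v": the whole sub-document counts as a new set member and is appended.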
A second possibility here for a form of duplicate is that your query portion is not correctly finding the document that has to be updated. The result of this would be creating a new document that contains only the newly specified member in the "set". So a common usage mistake is something like this:
db.collection.update(
{
"id": ABC,
"items": { "$elemMatch": {
"id": 123, "v": 10
}},
{
"$addToSet": {
"items": {
"id": 123, "v": 10
}
}
},
{ "upsert": true }
)
The result of that sort of operation would always create a new document because the existing document did not contain the specified element in the "set". The correct implementation is to not check for the presence of the "set" member and allow $addToSet to do the work.
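In other words, match on the document identifier only and let $addToSet determine membership:
db.collection.update(
    { "id": "ABC" },
    {
        "$addToSet": {
            "items": { "id": 123, "v": 10 }
        }
    },
    { "upsert": true }
)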
If indeed you do have true duplicate entries occurring in the "set" where all elements of the sub-document are exactly the same, then it has been caused by some other code either present or in the past.
Where you are sure new entries are being created, look through the code for instances of $push, or indeed any array manipulation that seems to be acting on the same field.
But if you are using the operator correctly then $addToSet does exactly what it is intended to do.
After you insert a new document into MongoDB via the official C# driver, how do you immediately read back the generated _id so I can use it as a "foreign" key to other collections? I know in SQL Server I can immediately read back the identity column value for the newly inserted row, so I need similar functionality in MongoDB.
Since the _id generated by Mongo isn't an actual member of the object, I assume you need to do something with the generic BsonDocument?
You can do an upsert with the findAndModify command to achieve this same effect with less work than generating your own ids. (Why bother? There is a very good reason 10gen decided on the scheme that is used -- it enables easy sharding.)
The findAndModify command lets you find or upsert (create if it doesn't exist) a document and return that same document.
The general form is as follows:
db.runCommand( { findAndModify : <collection>, <options> } )
You can read more about it here.
You would want to use the new option in addition to the upsert option so that you get back the newly created object.
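For example (the collection and field names are purely illustrative):
db.runCommand({
    findAndModify: "things",
    query: { "name": "foo" },
    update: { "$set": { "name": "foo" } },
    new: true,
    upsert: true
})
// the "value" field of the result holds the matched-or-created document,
// including its generated _id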
If you need the _id, you can generate it yourself and set it manually on the document.
In MongoDB, ids are (usually) generated on the client side. And you can generate one yourself using the appropriate driver call, put it into the document, and it'll get saved.
I didn't work with C# driver, but Ruby driver does all the work for me.
ruby-1.9.3-p0 :027 > obj = coll.insert({'foo' => 'bar'})
=> BSON::ObjectId('4ef15e7f0ed4c00272000001')
ruby-1.9.3-p0 :030 > coll.find.to_a
=> [{"_id"=>BSON::ObjectId('4ef15e7f0ed4c00272000001'), "foo"=>"bar"}]
This is how I can make a new ID
ruby-1.9.3-p0 :039 > newid = BSON::ObjectId.new
=> BSON::ObjectId('4ef15f030ed4c00272000002')
ruby-1.9.3-p0 :040 > coll.insert({_id: newid, test: 'test'})
=> BSON::ObjectId('4ef15f030ed4c00272000002')
ruby-1.9.3-p0 :041 > coll.find.to_a
=> [{"_id"=>BSON::ObjectId('4ef15e7f0ed4c00272000001'), "foo"=>"bar"}, {"_id"=>BSON::ObjectId('4ef15f030ed4c00272000002'), "test"=>"test"}]
In most drivers the _id field is actually generated on the client side before going to the server. MongoDB does not use an "auto-increment" ID, so you can actually generate a random ID and tell the server "use this".
In C# the code looks like this:
var id = ObjectId.GenerateNewId();
So you can create a BSON document and simply save it:
var toSave = new BsonDocument {
{ "_id", ObjectId.GenerateNewId() },
{ "data", "my data" }
};
db["collection"].Save(toSave);
However, by default, when you .Save() a document the driver will assign the _id field if it is missing. So you can generally just save the BsonDocument (or a BSON-serializable class) and then read it back.
Note that there is a specification called DBRef that helps simplify the implementation of "Foreign Keys". The docs are here, in C# you will want to look at the DBRef class.
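As a minimal sketch with the legacy driver (reusing the toSave document from above; the collection name is illustrative):
// A MongoDBRef is stored as { "$ref" : <collection>, "$id" : <id> }.
var dbRef = new MongoDBRef("collection", toSave["_id"]);
var referenced = db.FetchDBRef(dbRef); // resolves back to the referenced BsonDocument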
Like the other answers here say, IDs are assigned client-side. Something you can do is create a default value convention that generates a new ID during insert if it hasn't been set yet.
public class DefaultValueConvention : MongoDB.Bson.Serialization.Conventions.IDefaultValueConvention
{
public object GetDefaultValue(MemberInfo memberInfo)
{
var type = memberInfo.MemberType == MemberTypes.Property
? ((PropertyInfo) memberInfo).PropertyType
: ((FieldInfo) memberInfo).FieldType;
if (type == typeof(ObjectId)) // ObjectId is a struct, so compare the type directly
return ObjectId.GenerateNewId();
else
return null;
}
}
And setup the driver to use this convention:
var profile = new ConventionProfile();
profile.SetDefaultValueConvention(new DefaultValueConvention());
BsonClassMap.RegisterConventions(profile, x => x.FullName.StartsWith("ConsoleApplication"));
So now you can create an object & persist it in 2 lines:
var animal = new Animal {Name = "Monkey", PercentDeviationFromHumans = 2.01};
db["animals"].Save(animal);
Actually, with the most recent driver you don't even need to set the default value convention, it already has this behavior OOTB. Regardless, conventions are underused in mongo.
I am trying to implement privileges using NHibernate, and what I want to do, is each time there is a Select query, check what the return type is, and if it is a security enabled type (such as invoices) i want to add restrictions to the ICriteria object, to restrict retrieving only certain records (According if the user has read all, or read own privileges).
I managed to implement this kind of privilege for Insert and Update using
NHibernate.Event.IPreUpdateEventListener
NHibernate.Event.IPreInsertEventListener
but unfortunately IPreLoadEventListener is called after the database is queried, and is therefore wasteful, as the filtering would be done locally on the client rather than by the database.
Does anyone know if NHibernate provides some sort of event that is called before a query is executed?
If you're able to use it, check out Rhino.Security. It does exactly what you're trying to do. Even if you're unable to use it, you can study its implementation of this problem.
Can't you achieve this by using Filters?
More information can be found here
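For instance, assuming a filter named "userFilter" with a "userId" parameter has been defined in the mappings (names here are illustrative), enabling it per session is a one-liner:
// Every query issued through this session now carries the filter's
// restriction clause for the current user.
session.EnableFilter("userFilter").SetParameter("userId", currentUserId);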
I've used this in combination with Interceptors in a project of mine:
I have some entities where each user can create instances from, but only the user who has created them, should be able to see / modify those instances. Other users cannot see instances created by user X.
In order to do that, I've created an interface IUserContextAware. Entities that are 'user context aware' implement this interface.
When building my session-factory, I created the necessary filters:
var currentUserFilterParametersType = new Dictionary<string, NHibernate.Type.IType> (1);
currentUserFilterParametersType.Add (CurrentUserContextFilterParameter, NHibernateUtil.Guid);
cfg.AddFilterDefinition (new FilterDefinition (CurrentUserContextFilter,
"(:{0} = UserId or UserId is null)".FormatString (CurrentUserContextFilterParameter),
currentUserFilterParametersType,
false));
When this was done, I needed to define the additional filter criteria:
foreach( var mapping in cfg.ClassMappings )
{
if( typeof (IUserContextAware).IsAssignableFrom (mapping.MappedClass) )
{
// The filter should define the names of the columns that are used in the DB, rather than property names.
// Therefore, we need to have a look at the mapping information.
Property userProperty = mapping.GetProperty ("UserId");
foreach( Column c in userProperty.ColumnIterator )
{
string filterExpression = ":{0} = {1}";
// When the BelongsToUser field is not mandatory, NULL should be taken into consideration as well.
// (For instance: a PrestationGroup instance that is not User-bound (that can be used by any user), will have
// a NULL value in its BelongsToUser field).
if( c.IsNullable )
{
filterExpression = filterExpression + " or {1} is null";
}
mapping.AddFilter (CurrentUserContextFilter, "(" + filterExpression.FormatString (CurrentUserContextFilterParameter, c.Name) + ")");
break;
}
}
}
Now, whenever I instantiate an ISession, I specify that a certain interceptor should be used:
This interceptor makes sure that the parameter in the filter is populated:
internal class ContextAwareInterceptor : EmptyInterceptor
{
public override void SetSession( ISession session )
{
if( AppInstance.Current == null )
{
return;
}
// When a User is logged on, the CurrentUserContextFilter should be enabled.
if( AppInstance.Current.CurrentUser != null )
{
session.EnableFilter (AppInstance.CurrentUserContextFilter)
.SetParameter (AppInstance.CurrentUserContextFilterParameter,
AppInstance.Current.CurrentUser.Id);
}
}
}
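Wiring it up is then just a matter of opening sessions through the interceptor (sessionFactory stands in for your configured ISessionFactory):
// Open the session with the interceptor so the user filter is enabled
// automatically for every query issued through it.
using (var session = sessionFactory.OpenSession(new ContextAwareInterceptor()))
{
    // queries against IUserContextAware entities are now restricted
    // to the current user via the enabled filter
}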