MongoDB + NoRM - Concurrency and collections - C#

Let's say we have the following document structure:
class BlogPost
{
    [MongoIdentifier]
    public Guid Id { get; set; }
    public string Body { get; set; }
    ....
}

class Comment
{
    [MongoIdentifier]
    public Guid Id { get; set; }
    public string Body { get; set; }
}
If we assume that multiple users may post comments on the same post, what would be the best way to model the relation between these?
If Post has a collection of comments, I might get concurrency problems, won't I?
And placing an FK-like attribute on Comment seems too relational, doesn't it?

You basically have two options: 1. Aggregate comments in the post document, or 2. Model post and comment as separate documents.
If you aggregate the comments, you should either a) implement a revision number on the post, allowing you to detect race conditions and handle optimistic concurrency, or b) add new comments with a MongoDB modifier - e.g. something like:
var posts = mongo.GetCollection<Post>();
var crit = new { Id = postId };                         // match the post to modify
var mod = new { Comments = M.Push(new Comment(...)) };  // atomically append the new comment
posts.Update(crit, mod, false, false);
If you model post and comment as separate documents, handling concurrency is probably easier, but you lose the ability to load a post and its comments with a single findOne command.
In my opinion, (1) is by far the most interesting option because it models the post as an aggregate object, which is exactly what it is when you put your OO glasses on :). It's definitely the document-oriented approach, whereas (2) resembles the flat structure of a relational database.
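For option (a), a minimal sketch of the revision-number approach (the Version property, the FindOne call, and replace-on-match Update behaviour are assumptions layered on top of NoRM, not confirmed library behaviour):
var posts = mongo.GetCollection<Post>();
var post = posts.FindOne(new { Id = postId });
var expectedVersion = post.Version;

// Mutate the aggregate in memory
post.Comments.Add(new Comment { Id = Guid.NewGuid(), Body = "..." });
post.Version = expectedVersion + 1;

// The criteria include the version we originally read, so the update only
// matches if no one else modified the post in the meantime; if nothing
// matched, reload the post and retry.
posts.Update(new { Id = postId, Version = expectedVersion }, post, false, false);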

This is one of the canonical NoSQL examples. The standard method is to store the Comments as an array of objects inside the BlogPost.
To avoid concurrency problems, MongoDB provides several atomic operations. In particular, there are several update modifiers that work well with "sub-documents" and "sub-arrays".
For something like "add this comment to the post", you would typically use the $push modifier, which appends the comment to the Post's array of comments.
I see that you're using the "NoRM" drivers. It looks like they have support for atomic commands, as evidenced by their tests. In fact, their tests perform a "push this comment to the blog post".

They sort of give an example of how you'd model it over on the MongoDB page on inserting - I think you'd want a collection of comments exposed as a property on your post. You'd add comments to a given Post entity, which does away with tying a Comment entity back to its parent Post entity - something which, as you are right to question, makes sense in an RDBMS but not so much in a NoSQL solution.
As far as concurrency goes, if you don't trust Mongo to handle that for you, it's probably a big hint that you shouldn't be building an application on top of it.

I've created a test app that spawns 1000 concurrent threads adding "Comments" to the same "Post"; the result is that a lot of comments are lost.
So MongoDB treats child collections as a single value; it does not merge changes by default.
If I have a Comments collection on Post, then I get concurrency problems when two or more users add comments at exactly the same time (unlikely, but possible).
So is it possible to add a comment to the post.Comments collection without updating the entire post object?

Related

Add field to a Mongo document using a string for its name

I'd like to create a new field on an existing document. I'm using the following line to get all the documents out of the database into POCOs:
var collection = _database.GetCollection<Address>(collectionName);
I then make some changes to each and save them back into the database using this line:
collection.ReplaceOne(
    filter: new BsonDocument("_id", a.id),
    options: new UpdateOptions { IsUpsert = true },
    replacement: a);
That all works nicely.
To add a field, I've tried the following:
a.ToBsonDocument().Add("RainfallMM", rainfallMM);
But it doesn't make it to the database. Is there a trick to it?
I asked you in a comment whether you had added the new property to the Address model, and you said you hadn't - you want to add it dynamically. The trick here is that you initialize your collection as a collection of Addresses, and by default Mongo ignores all properties that are not part of the Address model.
If you want to add fields dynamically, you need to change how you open your collection to:
var collection = _database.GetCollection<BsonDocument>("addresses");
Now your collection is not tied to the Address model and you can work with it as you wish. Everything is a BSON document now!
For example you can do this:
var inserted = MongoCollection.Find(x => true).FirstOrDefault();
inserted.Add(new BsonElement("RainfallMM", rainfallMM));
MongoCollection.ReplaceOne(new BsonDocument("_id", inserted["_id"]), inserted);
P.S. There are some other workarounds; if you don't like this one, I can show you the rest =)
Hope it helps! Cheers!
As #Fidel asked me to, I will try to briefly summarize the other solutions. The problem with the accepted answer is that, while it works, it loses its connection to the Address model, and the OP is stuck working with BSON documents.
IMO working with plain BSON documents is a pain.
If he ever wishes to switch back to initializing the collection as a collection of Addresses and tries to get anything from the db, he will encounter an error saying something like:
Missing serialization information for rainfallMM
He can fix that by adding this attribute above his Address class:
[BsonIgnoreExtraElements]
public class Address
{
    ...fluff and stuff...
}
The problem now is that, if he is not careful, he can lose all the info in his dynamically added properties.
Another problem arises if he adds a second property dynamically. Now he has to remember that there are 2 properties which are not in the model, and all hell breaks loose.
Whether he likes it or not, to make his life easier he will probably have to modify the Address model. There are 2 approaches, and their official documentation is great (I think), so I will just link it here:
http://mongodb.github.io/mongo-csharp-driver/2.4/examples/mixing_static_and_dynamic/
If you ask me which one is better, I will honestly tell you it depends on you. From the documentation you will see that if you use an extra BSON document property, you don't have to worry about naming your extra properties.
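For reference, a minimal sketch of that extra-elements approach with the official C# driver's [BsonExtraElements] attribute (the ExtraElements property name is my choice, not prescribed by the driver):
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Address
{
    // ...the existing statically typed properties...

    // Any document fields that don't match a property above are kept here
    // on deserialization instead of being dropped or causing an error.
    [BsonExtraElements]
    public BsonDocument ExtraElements { get; set; }
}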
That would be all I can think of now!
Hope it helps you!
Try with:
a.Add("RainfallMM", rainfallMM);
(This only compiles if a is a BsonDocument rather than a typed Address POCO.)

How do I update all tags on a blog post?

I have a blog website built using C# ASP.NET and Entity Framework. I am setting up a page for creating a blog post which allows the user to add tags; this works fine. However, when it comes to editing the blog post, I am sure I must be missing a trick: I can't work out how to update all the tags attached to the blog post in a single, simple process.
The Blog and Tag entities are setup as many-to-many.
So currently I have:
_blog = blogRepo.GetBlogByID(blog.Id);
_blog.Tags = blog.Tags;
blogRepo.UpdateBlog(_blog);
blogRepo.Save();
Which works fine if I'm adding new tags. However, if I'm removing tags, it only works on the Entity Framework side of things. As soon as the DbContext re-initialises, it picks up from the database that the tag is still attached to the blog.
E.g. I have the tag "test" added to the blog. I edit the blog, remove the tag "test" and save. The blog is returned by the same request with the tag list empty, but if I then make another request for the blog, the tag "test" is back again.
I thought I could just remove all tags from the blog each time and then re-add whichever should be there, but I'm sure there must be a better way. Or is something set up wrong, and this should work in the current setup?
Any help appreciated. Particularly if it points out something stupid which I'm not seeing.
You can't simply assign a new child list to an entity object and expect Entity Framework to figure out all your changes; you have to do that yourself. Here's a simple way to do it. It's not the most efficient, and there are tricks to speed it up, but it works.
First you need to get the existing list of tags - I'm assuming GetBlogByID() does this. Then, rather than assigning a new list of tags, you need to call Remove() on each tag you want removed. Here's an example:
// Generate a list of tags to remove
var tagsToRemove = _blog.Tags.Except(myNewListOfTags).ToArray();
foreach (var toRemove in tagsToRemove)
    _blog.Tags.Remove(toRemove);
// ...save changes
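A fuller sketch of the same idea, syncing both removals and additions (variable names follow the question; Except relies on the Tag instances comparing equal, e.g. via an overridden Equals or because they are the same tracked entities):
// Snapshot the incoming tag list
var newTags = blog.Tags.ToList();

// Remove tags that are no longer present
foreach (var toRemove in _blog.Tags.Except(newTags).ToArray())
    _blog.Tags.Remove(toRemove);

// Add tags that weren't attached before
foreach (var toAdd in newTags.Except(_blog.Tags).ToArray())
    _blog.Tags.Add(toAdd);

blogRepo.Save();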
Now, as an optimization when there are a lot of tags, I sometimes make a direct SQL call to delete all the many-to-many relationships, and then add them all again using Entity Framework, rather than figuring out each individual add and remove operation:
_myDbContext.Database.ExecuteSqlCommand(
    "DELETE FROM BlogTagsManyToManyTable WHERE BlogId = @BlogId",
    new SqlParameter("@BlogId", blogId));
I can then add a new list of Blog Tags without having to do any special work.

Interesting news article/blog post scraping problem

I need to scrape the text of blog posts to build a summary description of them, similar to what techmeme.com does. Not a problem when it's one or a handful of blog posts; however, the set of possible blogs to scrape is variable and unlimited. How would you go about doing this?
I've used the HTML Agility Pack and YQL in the past, but there's nothing built into either of those solutions to handle this requirement.
One thought I had was to search for div ids and attributes named things like content, post, article, etc. and see how that worked - though I'm not really leaning in this direction. Another idea was to find the biggest text node in the HTML document and assume that's the node I want - which could lead to some false positives. The final idea was to create a crowdsourced data repository on Google Apps that would let the community manage (read: create, update, delete) the XPath mappings for the most popular news/blog platforms; you could then query this list by domain or blog platform type and get the requisite XPath - but this seems like a hella undertaking.
Of course, I know some of you have ideas that will work better than any of my hare-brained ones.
What are your thoughts?
The only sure-fire way of doing this is to have a class for each blog. That way you can do what you need in the implementation of each specific class for each specific blog.
So you'll have an abstract base class that processes a blog and returns the data/info you need from a blog.
For example:
public abstract class BlogProcessor
{
    public abstract BlogResult ProcessBlog(string url);
}
Where BlogResult is a type you define that has all the information you'll need from a blog such as title, date, tags, post etc.
Each descendant knows how to extract this information for the blog it is specialized for.
In your calling code you treat these descendant classes polymorphically, like so:
foreach (var url in BlogsToParse)
{
    var blogProcessor = BlogProcessorFactory.CreateInstance(url);
    var blogResult = blogProcessor.ProcessBlog(url);
    /* Do something with blogResult */
}
Does that make sense?
In the implementation of each "ProcessBlog" method you could use HtmlAgilityPack to do the specific parsing.
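A minimal sketch of one such descendant (the XPath expressions and the BlogResult property names are assumptions for a hypothetical blog):
using HtmlAgilityPack;

public class ExampleBlogProcessor : BlogProcessor
{
    public override BlogResult ProcessBlog(string url)
    {
        // Download and parse the page
        var doc = new HtmlWeb().Load(url);

        // XPath specific to this particular blog's markup
        var title = doc.DocumentNode.SelectSingleNode("//h1[@class='entry-title']");
        var body = doc.DocumentNode.SelectSingleNode("//div[@class='entry-content']");

        return new BlogResult
        {
            Title = title == null ? null : title.InnerText.Trim(),
            Post = body == null ? null : body.InnerText.Trim()
        };
    }
}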

NHibernate lazy loading doesn't appear to be working for my domain?

I'm new to NHibernate, but have managed to get it all running fine for my latest project. But now I've reached the inevitable performance problem where I need to get beyond the abstraction to fix it.
I've created an NUnit test to isolate the method that takes a long time. But first, a quick overview of my domain model is probably a good idea:
I have a 'PmqccForm', which is an object that has a 'Project' object (containing Name, Number, etc.) and a 'Questions' object, a class that itself contains properties for various different 'Question' objects. There is a JobVelocityQuestion object, which has an answer and some other properties, and a whole bunch of similar Question objects.
(The original post included two diagrams here: the PmqccForm with its Questions object, and the questions part of the model.)
The key point is that I want to be able to type
form.Questions.JobVelocityQuestion
as there is always exactly one JobVelocityQuestion for each PmqccForm, and the same goes for all the other questions. These are C# properties on the Questions object, which is just a holding place for them.
Now, the method that is causing me issues is this:
public IEnumerable<PmqccForm> GetPmqccFormsByUser(StaffMember staffMember)
{
    ISession session = NHibernateSessionManager.Instance.GetSession();
    ICriteria criteria = session.CreateCriteria(typeof(PmqccForm));
    criteria.CreateAlias("Project", "Project");
    criteria.Add(Expression.Eq("Project.ProjectLeader", staffMember));
    criteria.Add(Expression.Eq("Project.IsArchived", false));
    return criteria.List<PmqccForm>();
}
A look at the console output from the NUnit test that just runs this method shows nearly 2000 SQL queries being processed!
http://rodhowarth.com/otherstorage/queries.txt is the console log.
The thing is, at this stage I just want the form objects; the actual questions can be accessed on a need-to-know basis. I thought NHibernate was meant to be able to do this?
Here is my mapping file:
http://rodhowarth.com/otherstorage/hibernatemapping.txt
Can anyone give me a hint as to what I'm missing, or a way to optimize what I'm doing in relation to NHibernate?
What if I made the questions a collection, and then made the properties loop through that collection and return the correct one? Would that be better optimization from NHibernate's point of view?
Just try adding fetch="subselect" to the mapping file for the Questions component and see if this solves the issue with multiple selects to that table - you should then see a single second select instead of hundreds of separate queries, e.g.
<component name="Questions" insert="true" update="true" class="PmqccDomain.DomainObjects.Questions" fetch="subselect">
See the NHibernate documentation chapter on improving performance for more info.
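To verify the fix, NHibernate's statistics API gives a query count you can assert on in the NUnit test (a sketch: the GetSessionFactory accessor on the session manager is an assumption, and generate_statistics must be enabled in the configuration):
var stats = NHibernateSessionManager.Instance.GetSessionFactory().Statistics;
stats.Clear();

var forms = GetPmqccFormsByUser(someStaffMember);

// With subselect fetching this should drop from ~2000 to a handful
Console.WriteLine(stats.PrepareStatementCount);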

Copying from EntityCollection to EntityCollection impossible?

How would you do this (pseudo-code): product1.Orders.AddRange(product2.Orders);
However, the method "AddRange" does not exist on EntityCollection, so how would you copy all the items in the EntityCollection "Orders" from product2 to product1?
It should be simple, but it is not...
The problem is deeper than you think.
Your foreach attempt fails because when you call product1.Orders.Add, the entity gets removed from product2.Orders, which invalidates the existing enumerator and causes the exception you see.
So why does the entity get removed from product2? It's quite simple: an Order can only belong to one product at a time. The Entity Framework takes care of data integrity by enforcing rules like this.
If I understand correctly, your aim here is to actually copy the orders from one product to another, am I correct?
If so, then you have to explicitly create a copy of each order inside your foreach loop, and then add that copy to product1.
For some reason that is rather obscure to me, there is no automated way to create a copy of an entity. Therefore, you pretty much have to copy all the Order's properties manually, one by one. You can make the code look somewhat neater by incorporating this logic into the Order class itself - create a method named Clone() that copies all the properties. Be sure, though, not to copy the "owner product" reference property, because your whole point is to give the copy another owner product, isn't it?
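A minimal sketch of such a Clone() method (the Order property names here are hypothetical):
public Order Clone()
{
    return new Order
    {
        // Copy the scalar properties (names are illustrative)
        Quantity = this.Quantity,
        UnitPrice = this.UnitPrice,
        OrderDate = this.OrderDate
        // Deliberately do NOT copy the Product reference or the entity key,
        // so the copy can be attached to a different owner product.
    };
}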
Anyway, do not hesitate to ask more questions if something is unclear. And good luck.
Fyodor
Based on the previous two answers, I came up with the following working solution:
public static void AddRange<T>(this EntityCollection<T> destinationEntityCollection,
                               EntityCollection<T> sourceEntityCollection) where T : class
{
    // Snapshot the source first: each Add() below removes the entity from
    // the source collection, which would invalidate a live enumerator.
    var array = new T[sourceEntityCollection.Count()];
    sourceEntityCollection.CopyTo(array, 0);

    foreach (var entity in array)
    {
        destinationEntityCollection.Add(entity);
    }
}
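Usage then matches the original pseudo-code. Note that, for the reasons Fyodor explains above, this moves the orders from product2 to product1 rather than copying them; combine it with a Clone() per entity if you need true copies:
product1.Orders.AddRange(product2.Orders);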
Yes, the usual collection-related methods are not there.
But:
1. Did you check the CopyTo method?
2. Do you see any problem with using the iterator? You know - GetEnumerator, go through the collection, and copy the entities.
The above two can solve your problem, but I'm sure that in .NET 3.0+ there are more compact solutions.
My answers relate to .NET 2.0.
