MongoDB: How to define a dynamic entity in my own domain class?

I'm new to MongoDB and have set up a C# web project in VS 2013.
I need to insert data as documents into MongoDB. The number of key-value pairs can differ from document to document.
For example,
document 1: Id is "1", data is one key-value pair: "order":"shoes"
document 2: Id is "2", data is three key-value pairs: "order":"shoes", "package":"big", "country":"Norway"
The "Getting Started" guide says "because it is so much easier to work with your own domain classes this quick-start will assume that you are going to do that", and suggests making our own class like:
public class Entity
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}
then use it like:
var entity = new Entity { Name = "Tom" };
...
entity.Name = "Dick";
collection.Save(entity);
Well, that defeats the idea of having no fixed columns, right?
So I guess BsonDocument is the model to use. Are there any good samples for beginners?

I'm amazed how often this topic comes up... Essentially, this is more of a 'statically typed language limitation' than a MongoDB issue:
Schemaless doesn't mean you don't have any schema per se, it basically means you don't have to tell the database up front what you're going to store. It's basically "code first" - the code just writes to the database like it would to RAM, with all the flexibility involved.
Of course, the typical application will have some sort of recurring data structure, some classes, some object-oriented paradigm in one way or another. That is also true for the indexes: indexes are (usually) 'static' in the sense that you do have to tell MongoDB up front which field to index.
However, there is also the use case where you don't know what to store. If your data is really that unforeseeable, it makes sense to think "code first": what would you do in C#? Would you use the BsonDocument? Probably not. Maybe an embedded Dictionary does the trick, e.g.
public class Product
{
    public ObjectId Id { get; set; }
    public decimal Price { get; set; }
    public Dictionary<string, string> Attributes { get; set; }
    // ...
}
This solution can also work with multikeys to simulate a large number of indexes to make queries on the attributes reasonably fast (though the lack of static typing makes range queries tricky).
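Applied to the two documents from the question, the dictionary approach could look roughly like the sketch below. It is deliberately driver-free: the Order class, the OrderSamples helper and the plain string Id are assumptions made for illustration, and nothing here shows how the object is actually written to MongoDB.

```csharp
using System.Collections.Generic;

// Hypothetical domain class: fixed fields as properties, everything else in a dictionary.
public class Order
{
    // Assumption: a plain string Id, matching the "1" / "2" ids in the question.
    public string Id { get; set; }

    // Arbitrary key-value pairs; each document may carry a different set.
    public Dictionary<string, string> Attributes { get; set; } = new Dictionary<string, string>();
}

public static class OrderSamples
{
    // Builds the two example documents described in the question.
    public static List<Order> Create()
    {
        return new List<Order>
        {
            new Order { Id = "1", Attributes = { ["order"] = "shoes" } },
            new Order
            {
                Id = "2",
                Attributes =
                {
                    ["order"] = "shoes",
                    ["package"] = "big",
                    ["country"] = "Norway"
                }
            }
        };
    }
}
```

The class stays free of any driver types, so the business model does not depend on the database library.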
It really depends on your needs. If you want to have nested objects and static typing, things get a lot more complicated than this. Then again, the consumer of such a data structure (i.e. the frontend or client application) often needs to make assumptions that make it easy to digest this information, so it's often not possible to make this type safe anyway.
Other options include indeed using the BsonDocument, which I find too invasive in the sense that you make your business models depend on the database driver implementation; or using a common base class like ProductAttributes that can be extended by classes such as ProductAttributesShoes, etc. This question really revolves around the whole system design - do you know the properties at compile time? Do you have dropdowns for the property values in your frontend? Where do they come from?
If you want something reusable and flexible, you could simply use a JSON library, serialize the object to string and store that to the database. In any case, the interaction with such objects will be ugly from the C# side because they're not statically typed.
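As a rough sketch of the "serialize to a string" idea: the answer does not name a JSON library, so System.Text.Json is assumed here, and the AttributeJson helper is a made-up name. The resulting string would then be stored in a single field.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper that turns a loose attribute bag into a JSON string and back.
public static class AttributeJson
{
    public static string ToJson(Dictionary<string, string> attributes)
    {
        return JsonSerializer.Serialize(attributes);
    }

    public static Dictionary<string, string> FromJson(string json)
    {
        return JsonSerializer.Deserialize<Dictionary<string, string>>(json);
    }
}
```

Note the trade-off: once the attributes are an opaque string, the database can no longer index or query individual keys.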

Related

Best practices for my dynamic queries in Entity Framework

I am building a web application that is a recreation of an older system, and I am trying to build it in an architected, yet pragmatic and maintainable way (unlike the old system). Anyways, I am currently designing my queries for my models in my application. The old system allows developers to assign any field through a boolean to be a searchable value from a table, meaning a single view for maintaining some models' records might contain 20 searchable fields in the front-end and doing that only requires ticking a single box.
Now I would like to implement something similar in this new system with C# with a backend using EF as the data mapper, but I am not sure what approach is the most maintainable. In my current approach the filters are sent by the client as a record that (at most) contains all the possible filterable fields e.g
public record GetOrderQuery()
{
    public string OrderReference { get; set; }
    public string OrdererName { get; set; }
    public int ItemCount { get; set; }
    //etc...
}
I am fine with the record limiting which filters can be applied (should I instead have the record contain an iterable property of objects that each have fieldName, fieldValue and queryType?), but I would like to streamline the actual filtering. Basically, if the client sent any of the above fields in the request (as JSON, and none are required), the filtering is applied to those fields. I am currently thinking that I could implement this with reflection: I try to find a property in the actual model whose name is the same as in the record, then I construct the predicate for the Where() by chaining expressions.
I construct an expression for each property that has a value in the query and can be found through reflection (a property with the same name), then I link those together using binary expressions, combining all of the filters into a single expression. I am not sure if this is the best approach, or even what a good way to implement it would be (performance-wise, maintainability-wise or just in general). Are there any other ways to implement this, are there any pitfalls I should look out for, and are there any resources I should read? Thanks!
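The reflection-plus-expressions idea described above could be sketched as follows. This is only one possible shape, not an established pattern from the question: FilterBuilder, Order and OrderFilter are hypothetical names, the filter uses nullable properties so that "not sent" can be told apart from a real value, and only equality comparisons are built.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity and filter types, used only for illustration.
public class Order
{
    public string OrderReference { get; set; }
    public string OrdererName { get; set; }
    public int ItemCount { get; set; }
}

public class OrderFilter
{
    public string OrderReference { get; set; }
    public string OrdererName { get; set; }
    public int? ItemCount { get; set; }   // nullable: null means "do not filter on this"
}

public static class FilterBuilder
{
    // Builds entity => entity.A == filter.A && entity.B == filter.B ...
    // for every non-null filter property that has a same-named entity property.
    public static Expression<Func<TEntity, bool>> Build<TEntity, TFilter>(TFilter filter)
    {
        ParameterExpression entity = Expression.Parameter(typeof(TEntity), "entity");
        Expression body = null;

        foreach (PropertyInfo filterProp in typeof(TFilter).GetProperties())
        {
            var value = filterProp.GetValue(filter);
            if (value == null) continue;                      // field was not sent

            PropertyInfo entityProp = typeof(TEntity).GetProperty(filterProp.Name);
            if (entityProp == null) continue;                 // no matching property
            if (!entityProp.PropertyType.IsInstanceOfType(value)) continue; // type mismatch

            Expression comparison = Expression.Equal(
                Expression.Property(entity, entityProp),
                Expression.Constant(value, entityProp.PropertyType));

            body = body == null ? comparison : Expression.AndAlso(body, comparison);
        }

        // No filters supplied: match everything.
        return Expression.Lambda<Func<TEntity, bool>>(body ?? Expression.Constant(true), entity);
    }
}
```

The resulting expression can be passed to Where() on any IQueryable<Order>, for example query.Where(FilterBuilder.Build<Order, OrderFilter>(filter)). Whether a given provider translates it well, and how to support operators such as "contains" or ranges, would still need to be checked.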

Custom Field Design with C# and RavenDB

I'm facing a key design question related to how to attach custom fields to entities in my system. The entities are represented in C# and persisted in RavenDB. We are roughly following the tenets of Domain Driven Design and our entities are aggregate roots.
[Note: I would like to avoid any debate around the appropriateness of a generic feature like custom fields in a DDD approach. Let's assume we have a legitimate user need to attach and display arbitrary data to our entities. Also, I have made my examples generic for illustrating the design challenges. :)]
My question is concerning how best to lay out the field definitions and the field value instances.
Imagine a domain where we have aggregate roots of Book and Author. We want users to be able to attach arbitrary data attributes to instances of Books and Authors. So, we might define a custom field with a class like this:
public enum CustomFieldType
{
    Text,
    Numeric,
    DateTime,
    SingleSelect,
    MultiSelect
}
public class CustomFieldDefinition
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public CustomFieldType Type { get; set; }
    public Collection<string> Options { get; set; }
}
A CustomFieldDefinition (CFD) that is attached to Book might have values like:
Id: "BookCustomField\1"
Name: "FooCode"
Type: Text
Description: "Foo Corp's special identifier."
Options: null
The first question I'm facing is what to store on each instance of a Book. The choices range from...
the low end:
store just the CFD Id and the instance value
to
the high end:
store the entire CFD along with the value
The "low end" is bad because I cannot display a Book without pulling in the CFD, which is in another document. Also, if I change the CFD in any way, I've changed the meaning of values in historical documents.
The "high end" is bad because there would be a lot of duplication. The CFD could be pretty heavy for select list CFDs because the definition contains all of the selectable options.
The first question is... How much should be stored in the document for each Book? Just enough to display the Book (and I'd have to go back to the CFD to display the options and description if I'm going to allow the user to edit the CF value)?
The second question is... Should I store the entire collection of CFDs for one entity type in one document or keep each CFD in its own document?
Each CFD as a document keeps things simple for each CFD (especially when I start to do things like deactivate definitions), but then I need a way to separate Book CFDs from Author CFDs. This also forces me to load 1 document for each CF attached to the entity whenever I want to edit the entity.
All of the CFDs for a given type in one document allows me to load just one document, but then I'm loading all of the deactivated definitions as well.
Third question... Is there a better way to implement this altogether?
Fourth question... Are there any sample or open source solutions out there so I don't have to reinvent this wheel?

Since you said in comments:
... a Book from a year ago should show the custom fields as of a year ago.
There are only two viable options I can see.
Option 1
Custom field definitions exist in their own documents.
Every book contains a copy of the custom field definitions that apply to that book, along with the selected values for each custom field.
They are copied when the book is first created, but could be copied again as your logic sees fit. Perhaps on edit, you might want to take a new copy, potentially invalidating the current selections.
Advantages: Self-contained, easy to index and manipulate.
Disadvantages: Lots of copies of the custom field definitions. Storage requirements could be very large.
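A minimal sketch of Option 1, reusing the question's CustomFieldDefinition. The Snapshot method, the CustomFieldValue class and the Book.Attach method are hypothetical names introduced here, and no RavenDB API is involved; the point is only that each book holds its own deep copy of the definition.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public enum CustomFieldType { Text, Numeric, DateTime, SingleSelect, MultiSelect }

public class CustomFieldDefinition
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public CustomFieldType Type { get; set; }
    public Collection<string> Options { get; set; }

    // Deep copy, so later edits to the master definition do not leak into books.
    public CustomFieldDefinition Snapshot()
    {
        return new CustomFieldDefinition
        {
            Id = Id,
            Name = Name,
            Description = Description,
            Type = Type,
            // Collection<T>(IList<T>) only wraps the list, so copy it first.
            Options = Options == null ? null : new Collection<string>(Options.ToList())
        };
    }
}

// Hypothetical pairing of a copied definition with the value chosen for one book.
public class CustomFieldValue
{
    public CustomFieldDefinition Definition { get; set; }
    public string Value { get; set; }
}

public class Book
{
    public string Id { get; set; }
    public string Title { get; set; }
    public List<CustomFieldValue> CustomFields { get; } = new List<CustomFieldValue>();

    // Copies the definition at the moment the value is attached.
    public void Attach(CustomFieldDefinition master, string value)
    {
        CustomFields.Add(new CustomFieldValue { Definition = master.Snapshot(), Value = value });
    }
}
```

Because the copy is taken at attach time, a book saved a year ago keeps showing the field as it was defined then, at the cost of the duplication listed under the disadvantages.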
Option 2
Use the Temporal Versioning Bundle (disclaimer: I am its author).
Custom field definitions still exist in their own documents, but they are tracked temporally. This means that revisions to the custom fields will be maintained in a usable history.
Books only contain the selected values. They don't contain copies of the definitions.
Books don't need to be tracked temporally, but they do need some kind of effective date in their data. Perhaps an "entered on" date. Use whatever makes sense for you.
The Book-to-CFD relationship is an Nt:Tx type. You can find another example of this relationship type here. You might want to get an overview of temporal relationships in order to make some sense of this. Beware, this is a tricky subject and gets complicated quickly.
Advantages: Much less storage required, since there are not many duplicate copies of the custom field definition data.
Disadvantages: Learning curve. Complexity of working with temporal data. Requirement to install a custom bundle on your database server.
With either option, I would simply keep a property on the custom field definition that says what type(s) it applies to (Book, Author, etc).

How to properly design a class that should contain dual language information

If my domain object should contain string properties in two languages, should I create two separate properties or create a new type BiLingualString?
For example, in a plant classification application, the plant domain object can contain Plant.LatName and Plant.EngName.
The number of bilingual properties for the whole domain is not big, about 6-8. I only need to support two languages, and the information should be presented to the UI in both languages at the same time (so this is not localization). The requirements will not change during development.
It may look like an easy question, but this decision will have an impact on validation, persistence, object cloning and many other things.
Negative sides I can think of when using a new dualString type:
Validation: if I'm going to use DataAnnotations, the Enterprise Library validation block or FluentValidation, this will require more work; object graph validation is harder than simple property validation.
Persistence: either NH or EF will require more work with complex properties.
OOP: more complex object initialization; I will have to initialize this new type in the constructor before I can use it.
Architecture: converting objects for passing them between layers is harder; auto-mapping tools will require more manual work.

While reading your question I kept wondering why you don't use localization, but since the information should be presented to the UI in both languages at the same time, I think it makes sense to use properties.
In this case I would go for a class with one string per language, like the BiLingualString you mentioned:
public class Names
{
    public string EngName { get; set; }
    public string LatName { get; set; }
}
Then I would use this class in my main Plant class like this:
public class Plant : Names
{
}

If you are 100% sure that it will always be only Latin and English, I would just stick with the simplest solution: two string properties. It is also more flexible in the UI than having a BiLingualString, and you won't have to deal with complex types when persisting.

To help decide, I suggest considering how consistent this behavior will be at all layers. If you expose these as two separate properties on the business object, I would also expect to see them stored as two separate columns in a database record, for example, rather than as two translations of the same property stored in a separate table. It does seem odd to store translations this way, but your justifications sound reasonable, and six properties is not unmanageable. But be sure that you don't intend to add more languages in the future.
If you expect this system to be somewhat dynamic, in that you may need to add another language at some point, it would make more sense to me to implement this differently so that you don't have to alter the schema when a new language needs to be supported.
I guess the thing to balance is this: weigh the likelihood of having to adjust the languages or properties to accommodate a new language against the simplicity you gain by exposing these directly as separate properties rather than having to load translations as a separate level.
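For the "more dynamic" alternative, one possible shape is a small type keyed by language code, so adding a language does not change the class. This is a sketch only: LocalizedText, the indexer and the "la" / "en" codes are assumptions, not something the answer prescribes.

```csharp
using System.Collections.Generic;

// Hypothetical value type holding one text per language code.
public class LocalizedText
{
    private readonly Dictionary<string, string> _byLanguage = new Dictionary<string, string>();

    // Returns null when no text is stored for the requested language.
    public string this[string language]
    {
        get { return _byLanguage.TryGetValue(language, out var text) ? text : null; }
        set { _byLanguage[language] = value; }
    }

    public IEnumerable<string> Languages
    {
        get { return _byLanguage.Keys; }
    }
}

public class Plant
{
    // Initialized inline, so callers never see a null Name.
    public LocalizedText Name { get; } = new LocalizedText();
}
```

Usage would be along the lines of plant.Name["la"] = "Quercus robur"; plant.Name["en"] = "English oak";. The price is exactly what the question lists: validation, persistence and mapping all have to deal with a complex type.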

What Data Structure would you use for a Curriculum of a Department in a University?

For my homework, I'm implementing a course registration system for a university and I implemented a simple class for Curriculum with list of semesters and other properties like name of the department, total credits etc.
But I'm wondering if I can inherit this class from a Graph Data Structure with Edges and vertices.
Anybody done similar things before?
My current design is something like this:
public class Curriculum
{
    public string NameOfDepartment { get; set; }
    public List<Semester> Semesters { get; set; }
    public bool IsProgramDesigned { get; set; }
    public Curriculum()
    {
        IsProgramDesigned = false;
    }
    //
    public string AddSemester(Semester semester)
    {

As an enterprise architect I would absolutely not use a graph structure for this data. This data is a list and nothing more.
For a problem similar to this, the only reason I would ever consider using a graph structure would be to potentially create the relationship of course requirements and prerequisites.
This way you could then use a graph algorithm to determine whether it is valid for a student to register for a class, by making sure it is a valid addition to the tree. The same goes for removing classes: it could be validated to make sure you aren't, for example, dropping a class while staying enrolled in the lab for that class.
Now, if I was going to actually implement this, I would still have an overall list of classes that have a key to the vertex in the graph representation. One thing to keep in mind is that graph algorithms are about the biggest heavy hitter you can throw at a database, so minimizing the amount of work done to pull the graph out is always key. Depending on the size and scope, I would also evaluate whether I could store entire graphs in a serialized form, or use a document database for the same reason.
In this example, that would be the most likely route I would take. I would store the entire object of prerequisites, co-requisites and so on right inline with my course object. Since the graph is a set-it-and-done event, there's no need to do an actual graph traversal, and you're better off storing the pre-calculated graph.
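Storing the requirements inline with the course, as described above, might look like this sketch. Course, Registration.CanRegister and the use of plain course codes as keys are assumptions for illustration; the check is a simple lookup rather than a graph traversal.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical course document with its pre-calculated requirements stored inline.
public class Course
{
    public string Code { get; set; }
    public List<string> Prerequisites { get; set; } = new List<string>();
    public List<string> Corequisites { get; set; } = new List<string>();
}

public static class Registration
{
    // A student may register when every prerequisite is already completed and
    // every co-requisite is either completed or part of the same enrollment.
    public static bool CanRegister(Course course, ISet<string> completed, ISet<string> enrolling)
    {
        bool prerequisitesMet = course.Prerequisites.All(code => completed.Contains(code));
        bool corequisitesMet = course.Corequisites.All(
            code => completed.Contains(code) || enrolling.Contains(code));
        return prerequisitesMet && corequisitesMet;
    }
}
```

This assumes the prerequisite list on each course is already fully expanded when the curriculum is designed, which is what makes the lookup sufficient.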

Yes, you can inherit this class from a Graph data structure. You can make it a subclass of anything you want (except for a sealed class). Whether or not that is a wise design depends entirely on what you want to do. I assume you know how, so comment if you need an example of how to implement inheritance.
If you want to write your own graph algorithms, why not just model it yourself? It would probably be a fun exercise.

Is it ok to use C# Property like this

One of my fellow developers has code similar to the following snippet:
class Data
{
    public string Prop1
    {
        get
        {
            // return the value stored in the database via a query
        }
        set
        {
            // Save the data to local variable
        }
    }
    public void SaveData()
    {
        // Write all the properties to a file
    }
}
class Program
{
    public void SaveData()
    {
        Data d = new Data();
        // Fetch the information from database and fill the local variable
        d.Prop1 = d.Prop1;
        d.SaveData();
    }
}
Here the Data class properties fetch the information from DB dynamically. When there is a need to save the Data to a file the developer creates an instance and fills the property using self assignment. Then finally calls a save. I tried arguing that the usage of property is not correct. But he is not convinced.
These are his points:
There are nearly 20 such properties.
Fetching all the information is not required except for saving.
Instead of self-assignment, writing a utility method to fetch everything would duplicate the code that is already in the properties.
Is this usage correct?

I don't think that another developer who will work with the same code will be happy to see :
d.Prop1 = d.Prop1;
Personally I would never do that.
Also, it is not the best idea to use a property to load data from the DB.
I would have a method that loads the data from the DB into a local variable, and then you can get that data through the property. Also, get and set should logically work with the same data. It is strange to use get for fetching data from the DB but set for working with a local variable.

Properties should really be as lightweight as possible.
When other developers are using properties, they expect them to be intrinsic parts of the object (that is, already loaded and in memory).
The real issue here is that of symmetry - the property get and set should mirror each other, and they don't. This is against what most developers would normally expect.
Having the property load up from database is not recommended - normally one would populate the class via a specific method.

This is pretty terrible, imo.
Properties are supposed to be quick / easy to access; if there's really heavy stuff going on behind a property it should probably be a method instead.
Having two utterly different things going on behind the same property's getter and setter is very confusing. d.Prop1 = d.Prop1 looks like a meaningless self-assignment, not a "Load data from DB" call.
Even if you do have to load twenty different things from a database, doing it this way forces it to be twenty different DB trips; are you sure multiple properties can't be fetched in a single call? That would likely be much better, performance-wise.
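A sketch of the alternative these answers point towards: an explicit load method that fetches everything in one call, with plain properties over in-memory values. IDataSource, FetchAll, InMemorySource and Data.Load are hypothetical names; the real data access code is not shown in the question.

```csharp
using System.Collections.Generic;

// Hypothetical abstraction over the database: one call returns every value.
public interface IDataSource
{
    IDictionary<string, string> FetchAll();
}

// Stand-in source for demonstration; counts how often the "database" is hit.
public class InMemorySource : IDataSource
{
    public int Calls { get; private set; }

    public IDictionary<string, string> FetchAll()
    {
        Calls++;
        return new Dictionary<string, string> { ["Prop1"] = "xyz", ["Prop2"] = "abc" };
    }
}

public class Data
{
    // Plain properties: get and set work on the same in-memory value.
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }

    // The expensive work is visible as a method call and happens once.
    public static Data Load(IDataSource source)
    {
        IDictionary<string, string> row = source.FetchAll();
        return new Data { Prop1 = row["Prop1"], Prop2 = row["Prop2"] };
    }
}
```

With this shape, Program.SaveData would call Data.Load(source) followed by SaveData(), and the self-assignment disappears.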

"Correct" is often in the eye of the beholder. It also depends how far or how brilliant you want your design to be. I'd never go for the design you describe, it'll become a maintenance nightmare to have the CRUD actions on the POCOs.
Your main issue is the absence of separation of concerns, i.e. the data object is also responsible for storing and retrieving (actions that need to be defined only once in the whole system). As a result, you end up with duplicated, bloated and unmaintainable code that may quickly become really slow (try a LINQ query with a join on the getter).
A common scenario with databases is to use small entity classes that only contain the properties, nothing more. A DAO layer takes care of retrieving and filling these POCOs with data from the database and defines the CRUD actions only once (through some generics). I'd suggest NHibernate for the ORM mapping. The basic principle explained here works with other ORM mappers too and is explained here.
The reasons, esp. nr 1, should be a main candidate for refactoring this into something more maintainable. Duplicated code and logic, when encountered, should be reconsidered strongly. If the getter above is really getting the database data (I hope I misunderstand that), get rid of it as quickly as you can.
Overly simplified example of separations of concerns:
class Data
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}
class Dao<T>
{
    public void SaveEntity(T data)
    {
        // use reflection for saving your properties (this is what any ORM does for you)
    }
    public IList<T> GetAll()
    {
        // use reflection to retrieve all data of this type (again, ORM does this for you)
        return new List<T>(); // placeholder so the sketch compiles
    }
}
// usage:
Dao<Data> myDao = new Dao<Data>();
IList<Data> allData = myDao.GetAll();
// modify, query etc using Dao, lazy evaluation and caching is done by the ORM for performance
// but more importantly, this design keeps your code clean, readable and maintainable.
EDIT:
One question you should ask your co-worker: what happens if you have many Data (rows in database), or when a property is a result of a joined query (foreign key table). Have a look at Fluent NHibernate if you want a smooth transition from one situation (unmaintainable) to another (maintainable) that's easy enough to understand by anybody.

If I were you I would write a serialize / deserialize function, then provide properties as lightweight wrappers around the in-memory results.
Take a look at the ISerializable interface: http://msdn.microsoft.com/en-us/library/system.runtime.serialization.iserializable.aspx

This would be very hard to work with.
If you set Prop1 and then get Prop1, you could end up with different results,
e.g.:
//set Prop1 to "abc"
d.Prop1 = "abc";
//if the data source holds "xyz" for Prop1
string myString = d.Prop1;
//myString will equal "xyz"
Reading the code without the comments, you would expect myString to equal "abc", not "xyz", which could be confusing.
This would make working with the properties very difficult, and it would require a save every time you change a property for it to work.

As well as agreeing with what everyone else has said on this example, what happens if there are other fields in the Data class, i.e. Prop2, Prop3 etc.? Do they all go back to the database each time they are accessed in order to "return the value stored in the database via a query"? 10 properties would equal 10 database hits. Setting 10 properties, 10 writes to the database. That's not going to scale.

In my opinion, that's an awful design. Using a property getter to do some "magic" stuff makes the system awkward to maintain. If I joined your team, how would I know about the magic behind those properties?
Create a separate method whose name says what it does.
