Custom Field Design with C# and RavenDB

I'm facing a key design question about how to attach custom fields to entities in my system. The entities are represented in C# and persisted in RavenDB. We are roughly following the tenets of Domain-Driven Design, and our entities are aggregate roots.
[Note: I would like to avoid any debate around the appropriateness of a generic feature like custom fields in a DDD approach. Let's assume we have a legitimate user need to attach and display arbitrary data to our entities. Also, I have made my examples generic for illustrating the design challenges. :)]
My question concerns how best to lay out the field definitions and the field value instances.
Imagine a domain where we have aggregate roots of Book and Author. We want users to be able to attach arbitrary data attributes to instances of Books and Authors. So, we might define a custom field with a class like this:
public enum CustomFieldType
{
    Text,
    Numeric,
    DateTime,
    SingleSelect,
    MultiSelect
}
public class CustomFieldDefinition
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public CustomFieldType Type { get; set; }
    public Collection<string> Options { get; set; }
}
A CustomFieldDefinition (CFD) that is attached to Book might have values like:
Id: "BookCustomField\1"
Name: "FooCode"
Type: Text
Description: "Foo Corp's special identifier."
Options: null
The first question I'm facing is what to store on each instance of a Book. The choices range from...
the low end:
store just the CFD Id and the instance value
to
the high end:
store the entire CFD along with the value
The "low end" is bad because I cannot display a Book without pulling in the CFD, which is in another document. Also, if I change the CFD in any way, I've changed the meaning of values in historical documents.
The "high end" is bad because there would be a lot of duplication. The CFD could be pretty heavy for select list CFDs because the definition contains all of the selectable options.
The first question is... How much should be stored in the document for each Book? Just enough to display the Book (and I'd have to go back to the CFD to display the options and description if I'm going to allow the user to edit the CF value)?
The second question is... Should I store the entire collection of CFDs for one entity type in one document, or keep each CFD in its own document?
Each CFD as a document keeps things simple for each CFD (especially when I start to do things like deactivate definitions), but then I need a way to separate Book CFDs from Author CFDs. This also forces me to load 1 document for each CF attached to the entity whenever I want to edit the entity.
All of the CFDs for a given type in one document allows me to load just one document, but then I'm loading all of the deactivated definitions as well.
Third question... Is there a better way to implement this altogether?
Fourth question... Are there any sample or open source solutions out there so I don't have to reinvent this wheel?

Since you said in comments:
... a Book from a year ago should show the custom fields as of a year ago.
There are only two viable options I can see.
Option 1
Custom field definitions exist in their own documents.
Every book contains a copy of the custom field definitions that apply to that book, along with the selected values for each custom field.
They are copied when the book is first created, but could be copied again as your logic sees fit. Perhaps on edit, you might want to take a new copy, potentially invalidating the current selections.
Advantages: Self-contained, easy to index and manipulate.
Disadvantages: Lots of copies of the custom field definitions. Storage requirements could be very large.
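A minimal sketch of Option 1, assuming a hypothetical CustomFieldValue class and an AttachField method that are not part of the original question. The Book takes its own copy of the definition, so later edits to the shared definition document do not change what an older Book displays.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public enum CustomFieldType { Text, Numeric, DateTime, SingleSelect, MultiSelect }

public class CustomFieldDefinition
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public CustomFieldType Type { get; set; }
    public Collection<string> Options { get; set; }
}

// Hypothetical: a snapshot of the definition plus the value chosen for one book.
public class CustomFieldValue
{
    public CustomFieldDefinition Definition { get; set; }
    public string Value { get; set; }
}

public class Book
{
    public string Id { get; set; }
    public string Title { get; set; }
    public List<CustomFieldValue> CustomFields { get; } = new List<CustomFieldValue>();

    // Copies the definition so that later edits to the original
    // do not alter what this book stores.
    public void AttachField(CustomFieldDefinition definition, string value)
    {
        var snapshot = new CustomFieldDefinition
        {
            Id = definition.Id,
            Name = definition.Name,
            Description = definition.Description,
            Type = definition.Type,
            Options = definition.Options == null
                ? null
                : new Collection<string>(new List<string>(definition.Options))
        };
        CustomFields.Add(new CustomFieldValue { Definition = snapshot, Value = value });
    }
}
```

The storage cost mentioned above is visible here: every Book carries the full Options list of every select-type field attached to it.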
Option 2
Use the Temporal Versioning Bundle (disclaimer: I am its author).
Custom field definitions still exist in their own documents, but they are tracked temporally. This means that revisions to the custom fields will be maintained in a usable history.
Books only contain the selected values. They don't contain copies of the definitions.
Books don't need to be tracked temporally, but they do need some kind of effective date in their data. Perhaps an "entered on" date. Use whatever makes sense for you.
The Book-to-CFD relationship is an Nt:Tx type. You can find another example of this relationship type here. You might want to get an overview of temporal relationships in order to make some sense of this. Beware, this is a tricky subject and gets complicated quickly.
Advantages: Much less storage required, since there are not many duplicate copies of the custom field definition data.
Disadvantages: Learning curve. Complexity of working with temporal data. Requirement to install a custom bundle on your database server.
With either option, I would simply keep a property on the custom field definition that says what type(s) it applies to (Book, Author, etc).
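As a rough illustration of that last point, the sketch below adds a hypothetical AppliesTo property and an IsActive flag (both names are assumptions, not taken from the question) and filters an already-loaded set of definitions in memory. How the definitions are loaded from RavenDB is left out on purpose.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomFieldDefinition
{
    public string Id { get; set; }
    public string Name { get; set; }
    // Hypothetical flag for deactivated definitions.
    public bool IsActive { get; set; } = true;
    // Hypothetical property: names of the entity types this field applies to.
    public List<string> AppliesTo { get; set; } = new List<string>();
}

public static class CustomFieldFilter
{
    // Returns the active definitions that apply to the given entity type.
    public static List<CustomFieldDefinition> ActiveFor(
        IEnumerable<CustomFieldDefinition> definitions, string entityType)
    {
        return definitions
            .Where(d => d.IsActive && d.AppliesTo.Contains(entityType))
            .ToList();
    }
}
```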

Related

MongoDB: How to define a dynamic entity in my own domain class?

New to MongoDB. Set up a C# web project in VS 2013.
I need to insert data as documents into MongoDB. The number of key-value pairs could be different every time.
For example,
document 1: Id is "1", data is one pair key-value: "order":"shoes"
document 2: Id is "2", data is a 3-pair key-value: "order":"shoes", "package":"big", "country":"Norway"
The "Getting Started" guide says that because it is so much easier to work with your own domain classes, this quick-start will assume that you are going to do that. It suggests making our own class like:
public class Entity
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}
then use it like:
var entity = new Entity { Name = "Tom" };
...
entity.Name = "Dick";
collection.Save(entity);
Well, it defeats the idea of no-fixed columns, right?
So, I guess BsonDocument is the model to use. Are there any good samples for beginners?

I'm amazed how often this topic comes up... Essentially, this is more of a 'statically typed language limitation' than a MongoDB issue:
Schemaless doesn't mean you don't have any schema per se, it basically means you don't have to tell the database up front what you're going to store. It's basically "code first" - the code just writes to the database like it would to RAM, with all the flexibility involved.
Of course, the typical application will have some sort of recurring data structure, some classes, some object-oriented paradigm in one way or another. That is also true for the indexes: indexes are (usually) 'static' in the sense that you do have to tell MongoDB up front which fields to index.
However, there is also the use case where you don't know what to store. If your data is really that unforeseeable, it makes sense to think "code first": what would you do in C#? Would you use the BsonDocument? Probably not. Maybe an embedded Dictionary does the trick, e.g.
public class Product {
    public ObjectId Id { get; set; }
    public decimal Price { get; set; }
    public Dictionary<string, string> Attributes { get; set; }
    // ...
}
This solution can also work with multikeys to simulate a large number of indexes to make queries on the attributes reasonably fast (though the lack of static typing makes range queries tricky).
It really depends on your needs. If you want to have nested objects and static typing, things get a lot more complicated than this. Then again, the consumer of such a data structure (i.e. the frontend or client application) often needs to make assumptions that make it easy to digest this information, so it's often not possible to make this type safe anyway.
Other options include indeed using the BsonDocument, which I find too invasive in the sense that you make your business models depend on the database driver implementation; or using a common base class like ProductAttributes that can be extended by classes such as ProductAttributesShoes, etc. This question really revolves around the whole system design - do you know the properties at compile time? Do you have dropdowns for the property values in your frontend? Where do they come from?
If you want something reusable and flexible, you could simply use a JSON library, serialize the object to string and store that to the database. In any case, the interaction with such objects will be ugly from the C# side because they're not statically typed.
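A minimal sketch of the "serialize to a string" idea, assuming System.Text.Json is available (.NET Core 3.0 or later). The AttributesJson property and the two helper methods are hypothetical names, and the database driver is deliberately left out.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Product
{
    public string Id { get; set; }
    public decimal Price { get; set; }

    // Free-form attributes, stored as a single JSON string.
    public string AttributesJson { get; set; }

    public void SetAttributes(Dictionary<string, string> attributes)
    {
        AttributesJson = JsonSerializer.Serialize(attributes);
    }

    public Dictionary<string, string> GetAttributes()
    {
        return string.IsNullOrEmpty(AttributesJson)
            ? new Dictionary<string, string>()
            : JsonSerializer.Deserialize<Dictionary<string, string>>(AttributesJson);
    }
}
```

The trade-off is that the database sees only an opaque string, so the attributes cannot be indexed or queried individually.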

How to properly design a class that should contain dual language information

If my domain object should contain string properties in 2 languages, should I create 2 separate properties or create a new type BiLingualString?
For example in plant classification application, the plant domain object can contain Plant.LatName and Plant.EngName.
The number of bilingual properties in the whole domain is not big, about 6-8. I only need to support two languages, and the information should be presented in the UI in both languages at the same time (so this is not localization). The requirements will not change during development.
It may look like an easy question, but this decision will have an impact on validation, persistence, object cloning and many other things.
Downsides I can think of when using a new dualString type:
Validation: if I'm going to use DataAnnotations, the Enterprise Library validation block, or FluentValidation, this will require more work; object graph validation is harder than simple property validation.
Persistence: either NH or EF will require more work with complex properties.
OOP: more complex object initialization; I will have to initialize this new type in the constructor before I can use it.
Architecture: converting objects to pass them between layers is harder; auto-mapping tools will require more hand work.

While reading your question I was wondering why you would not simply use localization, but when I read that the information should be presented in the UI in both languages at the same time, I think it makes sense to use properties.
In this case I would go for a class with one string for each language, like the BiLingualString you mentioned:
public class Names
{
    public string EngName { get; set; }
    public string LatName { get; set; }
}
Then I would use this class in my main Plant class like this:
public class Plant : Names
{
}
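For comparison, here is a sketch of the same idea using composition rather than inheritance, which allows a Plant to carry more than one bilingual property. The BilingualString type and its member names are assumptions made for illustration, not something from the question.

```csharp
using System;

// Hypothetical immutable value type holding the same text in two languages.
public sealed class BilingualString
{
    public string English { get; }
    public string Latin { get; }

    public BilingualString(string english, string latin)
    {
        English = english ?? throw new ArgumentNullException(nameof(english));
        Latin = latin ?? throw new ArgumentNullException(nameof(latin));
    }

    // Both languages are shown together, as the UI requirement asks.
    public override string ToString() => $"{English} ({Latin})";
}

public class Plant
{
    public BilingualString Name { get; set; }
}
```

Because the type is immutable and validated in its constructor, a Plant never holds a half-initialized name, which addresses part of the initialization concern raised in the question.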

If you are 100% sure that it will always be only Latin and English, I would just stick with the simplest solution: two string properties. It is also more flexible in the UI than having a BiLingualString, and you won't have to deal with complex types when persisting.

To help decide, I suggest considering how consistent this behavior will be at all layers. If you expose these as two separate properties on the business object, I would also expect to see them stored as two separate columns in a database record, for example, rather than two translations for the same property stored in a separate table. It does seem odd to store translations this way, but your justifications sound reasonable, and 6 properties is not unmanageable. But be sure that you don't intend to add more languages in the future.
If you expect this system to be somewhat dynamic, in that you may need to add another language at some point, it would seem to make more sense to me to implement this differently so that you don't have to alter the schema when a new language needs to be supported.
I guess the thing to balance is this: consider the likelihood of having to adjust the languages or properties to accommodate a new language against the advantage (simplicity) you gain by exposing these directly as separate properties rather than having to load translations as a separate level.
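If the more dynamic route is chosen, one possible shape is a small type keyed by language code, so that supporting another language needs no new property or column. This is only a sketch; LocalizedText, its members, and the "en" fallback default are hypothetical.

```csharp
using System.Collections.Generic;

public class LocalizedText
{
    private readonly Dictionary<string, string> _byLanguage = new Dictionary<string, string>();

    // Adds or replaces the text for one language code, e.g. "en" or "la".
    public void Set(string languageCode, string text) => _byLanguage[languageCode] = text;

    // Returns the text for the requested language, the fallback language
    // if that is missing, or null if neither exists.
    public string Get(string languageCode, string fallbackLanguageCode = "en")
    {
        if (_byLanguage.TryGetValue(languageCode, out var text)) return text;
        return _byLanguage.TryGetValue(fallbackLanguageCode, out var fallback) ? fallback : null;
    }
}
```

The cost is the one described above: callers lose the simplicity of plain EngName and LatName properties.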

XML Based Meta Data Driven WPF Application

I am working on a BI application in WPF. I am designing its architecture and am looking for a way to bind controls in the view directly to an XML file that contains the view's metadata. Is this possible, and if so, how? Or is it advisable to read the XML and generate the views accordingly?
Edited
Properties such as the colors of charts, who created a chart, the next chart shown when drilling down, user names and passwords, user group names, etc. are stored in XML files. When a user starts the application, the dashboards he has created should be displayed; this involves retrieving data from the back end and assigning the correct chart colors. So if this data is available in the XML, my question is what the best way is to generate the charts and dashboards upon user request.
Edited
As I explained earlier, the problem is how to store the metadata related to this application in the most efficient and structured way, so that it can be recalled when a user logs in.
Thanks in advance.

I'm not sure I quite understand what you are looking to do. If you just want to bind some UI control properties to data in an XML document, that's entirely possible. I blogged about it years ago here.

I suggest using XAML instead of XML.
XAML not only lets you define the UI; it can also hold your other metadata or configuration, which you can read and write as XAML directly to and from your CLR classes.
The benefits are:
XAML serialization is exactly the same as XML serialization
XAML gives you powerful IntelliSense while editing in Visual Studio (XML can too, but you have to create and update a schema every time you change your configuration format)
With IntelliSense, XAML is better because it automatically reports validation errors
It allows you to use enums
It hides or shows members and classes based on the inheritance hierarchy
You can also load XAML from a string coming from a database
It lets you specify bindings if your object derives from DependencyObject, and you can transfer or reuse those bindings in your UI
For example,
public class ScreenElement
{
    public string Author { get; set; }
    public DateTime DateCreated { get; set; }
}
// XAML cannot deal with generics directly, so this step is
// necessary
public class ScreenElements : ObservableCollection<ScreenElement>
{
}
[ContentProperty("Elements")]
public class Screen
{
    public Screen()
    {
        this.Elements = new ScreenElements();
    }
    public string Title { get; set; }
    public bool ToolbarPresent { get; set; }
    // this attribute is necessary if
    // you want to save Screen to xaml
    [DesignerSerializationVisibility(DesignerSerializationVisibility.Content)]
    public ScreenElements Elements { get; private set; }
}
And your Screen xaml can look like
<Screen xmlns="clr-namespace:MyNamespace"
        Title="Home Screen"
        ToolbarPresent="false"
        >
    <ScreenElement Author="Myself" DateCreated="..."/>
    <ScreenElement Author="Yourself" DateCreated="..."/>
</Screen>
You can create a XAML resource and load it like this:
// XamlReader.Load returns object, so the result has to be cast
Screen s = (Screen)XamlReader.Load(/* .. stream or XmlReader for your XAML resource */);
// and now you can use your "s" loaded with elements to
// populate your UI
foreach (ScreenElement e in s.Elements)
{
    // use attributes of e to populate things..
}

I think the best approach in your case would be to divide all possible data in the system into data classes, or metatypes. Then, in the XML, specify the metatype for each piece of data so that your data always has a metatype. Before creating a view, read the metatypes of the data you intend to display and create screen controls according to those metatypes. After that you can load and display the data. This approach works well in my small program, and I think it would yield good results in your system too.
[EDIT]
OK, your application includes a business domain (your business data, business logic and rules for displaying data). All of these things are spread among three parts: Model, View and ViewModel. If I understand correctly, your question is specifically about the ViewModel.
For example, suppose your hypothetical application contains employee information, and every employee may have three types of information about him or her:
Personal information (name, date of birth, photo, home address, mobile phone number)
Education information (information about education, list of completed training courses)
Professional experience information (list of successfully completed commercial projects)
So we have a domain: employee. This domain may be divided into three metatypes:
Personal metatype
Education metatype
Professional experience metatype
For each metatype we should create a subscreen that displays the metatype's information according to the business rules. I recommend building the metatype subscreens with the MVC pattern, because special editing rules or data validation may apply when the data is edited. Once each subscreen has been created, we are free to display each type of meta information in the system.
For example, suppose your application has loaded employee information. You can then determine which metatypes are present in the loaded data and trigger creation of the appropriate subscreens. The last step is to pass the appropriate data to each subscreen.
That was a fairly vague explanation, and sorry for my English. If you have any questions about what I have explained, feel free to ask.

Class structure for an inventory control system

As an exercise in good OO methods and design, I want to know what is a good way to model inventory control in a company. The problem description is
a. A company can have different types of items, like documents (both electronic and physical), computers etc which may further have their own sub types.
b. The items can be kept in a store, may be circulated to an employee, may be mailed out etc. An electronic document can be emailed to many people at a time.
c. Items may have certain restrictions like a classified document be circulated to only people/places with access (eg, people with classified clearance or a room cleared to store those documents etc)
What is a good class structure that can be used to model this kind of tracking (a pseudo C# class structure or C++ would be helpful), and what kind of design patterns would be good for such a task?

Answering your question would need deep investigation of the problem domain. I don't think there is a universally valid approach.
There are some patterns that are likely to appear, though. One of them (and one of the most difficult to implement, by the way), is the type/instance pattern. Based on my experience, I am assuming that the types of the items that your inventory app must keep track of cannot be fixed, and that users of your system must be able to create and modify types at any time. This means that your system needs to handle two levels of classification rather than the usual one; in other words, your system will have classes (in code), instances of those classes (in run-time) and instances of those instances (in run-time too).
For example, if you create the DocumentType class in code, your users would instantiate it a number of times, creating objects such as Report, Memo or Manual. Then, each individual report that your system manages would be an instance of Report. And each individual memo would be an instance of Memo. And so on.
This is easy to implement if subtypes (Report, Memo, Manual in my example) don't carry their own attributes or their own relationships to other pieces of information. However, if they need specific data structures (attributes and/or associations), then the problem becomes much harder, because you'll need to mimic a complete object-oriented type/instance engine within your system.
It's lots of fun, though!
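A very small sketch of the type/instance pattern described above, covering only the simple case where each user-defined type carries a flat list of attribute names. The class and member names are hypothetical, and a real system would also need relationships, typed values, and persistence.

```csharp
using System;
using System.Collections.Generic;

// A user-defined type such as "Report" or "Memo", created at run time.
public class DocumentType
{
    public string Name { get; }
    public List<string> AttributeNames { get; } = new List<string>();

    public DocumentType(string name, params string[] attributeNames)
    {
        Name = name;
        AttributeNames.AddRange(attributeNames);
    }

    public Document CreateInstance() => new Document(this);
}

// An individual document; its allowed attributes come from its DocumentType.
public class Document
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    public DocumentType Type { get; }

    public Document(DocumentType type) { Type = type; }

    public void Set(string attribute, string value)
    {
        if (!Type.AttributeNames.Contains(attribute))
            throw new ArgumentException($"'{attribute}' is not defined on type '{Type.Name}'.");
        _values[attribute] = value;
    }

    public string Get(string attribute) =>
        _values.TryGetValue(attribute, out var value) ? value : null;
}
```

Here DocumentType is the class in code, a "Report" object is an instance of it, and each Document created from "Report" is an instance of that instance.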

Here's a preliminary suggestion to get you started:
struct Item
{
    // The requirements say everything is an item.
};
struct Document
    : public Item
{
    // There are documents
};
struct Document_Physical
    : public Document
{
    // Physical documents
};
struct Document_Electronic
    : public Document
{
    // Electronic documents
};
struct Computer
    : public Item
{
};
The contents of the items and their methods depend on the interpretation of the requirements document. This is one schema; there may be others.
Some useful design patterns: Visitor, Factory, Producer-Consumer to name a few. Also research the topic "refactoring".

Design pattern for class with upwards of 100 properties

What advice/suggestions/guidance would you provide for designing a class that has upwards of 100 properties?
Background
The class describes an invoice. An invoice can have upwards of 100 attributes describing it, i.e. date, amount, code, etc...
The system we are submitting the invoice to uses each of the 100 attributes, and the invoice is submitted as a single entity (as opposed to various parts being submitted at different times).
The attributes describing the invoice are required as part of the business process. The business process can not be changed.
Suggestions?
What have others done when faced with designing a class that has 100 attributes? i.e., create the class with each of the 100 properties?
Somehow break it up (if so, how)?
Or is this a fairly normal occurrence in your experience?
EDIT
After reading through some great responses and thinking about this further, I don't think there really is any single answer for this question. However, since we ended up modeling our design along the lines of LBrushkin's answer, I have given him credit. Albeit not the most popular answer, LBrushkin's answer helped push us into defining several interfaces which we aggregate and reuse throughout the application, and nudged us into investigating some patterns that may be helpful down the road.

You could try to 'normalize' it like you would a database table. Maybe put all the address related properties in an Address class for example - then have a BillingAddress and MailingAddress property of type Address in your Invoice class. These classes could be reused later on also.

The bad design is obviously in the system you are submitting to: no invoice has 100+ properties that cannot be grouped into a substructure. For example, an invoice will have a customer, and a customer will have an id and an address. The address in turn will have a street, a postal code, and so on. But all these properties should not belong directly to the invoice; an invoice has no customer id or postal code.
If you have to build an invoice class with all these properties directly attached to the invoice, I suggest making a clean design with multiple classes for a customer, an address, and all the other required stuff, and then just wrapping this well-designed object graph with a fat invoice class that has no storage or logic of its own and simply passes all operations to the object graph behind it.
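The wrapping idea could look roughly like the sketch below. FlatInvoice and its property names are hypothetical; only three of the many flat properties are shown.

```csharp
public class Address
{
    public string Street { get; set; }
    public string PostalCode { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public Address Address { get; set; } = new Address();
}

// Flat facade: it has no storage of its own, and every property
// forwards to the object graph behind it.
public class FlatInvoice
{
    private readonly Customer _customer;

    public FlatInvoice(Customer customer) { _customer = customer; }

    public string CustomerId
    {
        get => _customer.Id;
        set => _customer.Id = value;
    }

    public string CustomerStreet
    {
        get => _customer.Address.Street;
        set => _customer.Address.Street = value;
    }

    public string CustomerPostalCode
    {
        get => _customer.Address.PostalCode;
        set => _customer.Address.PostalCode = value;
    }
}
```

The external system sees the flat shape it expects, while the rest of the application works with Customer and Address.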

I would imagine that some of these properties are probably related to each other. I would imagine that there are probably groups of properties that define independent facets of an Invoice that make sense as a group.
You may want to consider creating individual interfaces that model the different facets of an invoice. This may help define the methods and properties that operate on these facets in a more coherent, and easy to understand manner.
You can also choose to combine properties that have a particular meaning (addresses, locations, ranges, etc.) into objects that you aggregate, rather than keeping them as individual properties of a single large class.
Keep in mind, that the abstraction you choose to model a problem and the abstraction you need in order to communicate with some other system (or business process) don't have to be the same. In fact, it's often productive to apply the bridge pattern to allow the separate abstractions to evolve independently.
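As an illustration of the facet idea, the sketch below splits two invented facets out of an invoice. The interface names, their members, and the overdue rule are all assumptions; the point is only that code can depend on a narrow facet instead of the whole class.

```csharp
using System;

// Hypothetical facet: the monetary side of an invoice.
public interface IBillable
{
    decimal Amount { get; }
    string Currency { get; }
}

// Hypothetical facet: the dates attached to an invoice.
public interface IDated
{
    DateTime IssuedOn { get; }
    DateTime DueOn { get; }
}

public class Invoice : IBillable, IDated
{
    public decimal Amount { get; set; }
    public string Currency { get; set; }
    public DateTime IssuedOn { get; set; }
    public DateTime DueOn { get; set; }
}

public static class InvoiceRules
{
    // Depends only on the date facet, so it works for anything that is IDated.
    public static bool IsOverdue(IDated dated, DateTime today) => today > dated.DueOn;
}
```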

Hmmm... Are all of those really relevant specifically, and only to the invoice? Typically what I've seen is something like:
class Customer
    .ID
    .Name
class Address
    .ID
    .Street1
    .Street2
    .City
    .State
    .Zip
class CustomerAddress
    .CustomerID
    .AddressID
    .AddressDescription ("ship","bill",etc)
class Order
    .ID
    .CustomerID
    .DatePlaced
    .DateShipped
    .SubTotal
class OrderDetails
    .OrderID
    .ItemID
    .ItemName
    .ItemDescription
    .Quantity
    .UnitPrice
And tying it all together:
class Invoice
    .OrderID
    .CustomerID
    .DateInvoiced
When printing the invoice, join all of these records together.
If you really must have a single class with 100+ properties, it may be better to use a dictionary
Dictionary<string,object> d = new Dictionary<string,object>();
d.Add("CustomerName","Bob");
d.Add("ShipAddress","1600 Pennsylvania Ave, Suite 0, Washington, DC 00001");
d.Add("ShipDate",DateTime.Now);
....
The idea here is to divide your data into logical units. In the above example, each class corresponds to a table in a database. You could load each of these into a dedicated class in your data access layer, or select with a join from the tables where they are stored when generating your report (invoice).

Unless your code actually uses many of the attributes at many places, I'd go for a dictionary instead.
Having real properties has its advantages (type safety, discoverability/IntelliSense, refactorability), but these don't matter if all the code does is get the values from elsewhere, display them in the UI, send them to a web service, save them to a file, etc.

It would be too many columns when your class / table that you store it in starts to violate the rules of normalization.
In my experience, it has been very hard to get that many columns when you are normalizing properly. Apply the rules of normalization to the wide table / class and I think you will end up with fewer columns per entity.

It's considered bad O-O style, but if all you're doing is populating an object with properties to pass them onward for processing, and the processing only reads the properties (presumably to create some other object or database updates), then perhaps a simple POD object is what you need, having all public members, a default constructor, and no other member methods. You can thus treat it as a container of properties instead of a full-blown object.

I used a Dictionary<string, string> for something like this.
It comes with a whole bunch of functions that can process it, it's easy to convert strings to other structures, easy to store, etc.

You should not be motivated purely by aesthetic considerations.
Per your comments, the object is basically a data transfer object consumed by a legacy system that expects the presence of all the fields.
Unless there is real value in composing this object from parts, what precisely is gained by obscuring its function and purpose?
These would be valid reasons:
1 - You are gathering the information for this object from various systems and the parts are relatively independent. It would make sense to compose the final object in that case based on process considerations.
2 - You have other systems that can consume various sub-sets of the fields of this object. Here reuse is the motivating factor.
3 - There is a very real possibility of a next generation invoicing system based on a more rational design. Here extensibility and evolution of the system are the motivating factor.
If none of these considerations are applicable in your case, then what's the point?

It sounds like for the end result you need to produce an invoice object with around 100 properties. Do you have 100 such properties in every case? Maybe you would want a factory: a class that would produce an invoice given a smaller set of parameters. A different factory method could be added for each scenario, taking only the invoice fields that are relevant to that scenario.
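A sketch of that factory idea follows. Date, amount and code come from the question; the Currency and IsCreditNote properties, the two scenarios, and the "USD" default are invented purely for illustration.

```csharp
using System;

public class Invoice
{
    public string Code { get; set; }
    public decimal Amount { get; set; }
    public DateTime Date { get; set; }
    public string Currency { get; set; }      // hypothetical
    public bool IsCreditNote { get; set; }    // hypothetical
    // ... the remaining properties would follow here
}

public static class InvoiceFactory
{
    // Each method takes only what varies in its scenario and fills in the rest.
    public static Invoice CreateStandard(string code, decimal amount, DateTime date)
    {
        return new Invoice
        {
            Code = code,
            Amount = amount,
            Date = date,
            Currency = "USD",
            IsCreditNote = false
        };
    }

    public static Invoice CreateCreditNote(string code, decimal amount, DateTime date)
    {
        return new Invoice
        {
            Code = code,
            Amount = -Math.Abs(amount), // credit notes carry a negative amount here
            Date = date,
            Currency = "USD",
            IsCreditNote = true
        };
    }
}
```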

If what you're trying to create is a table gateway from a pre-existing 100-column table to this other service, a list or dictionary might be a pretty quick way to get started. However, if you're taking input from a large form or UI wizard, you're probably going to have to validate the contents before submission to your remote service.
A simple DTO might look like this:
class Form
{
    public $stuff = array();
    function add( $key, $value ) {}
}
A table gateway might be more like:
class Form
{
    function findBySubmitId( $id ) {} // look up my form
    function saveRecord() {} // save it for my session
    function toBillingInvoice() {} // export it when done
}
And you could extend that pretty easily depending on if you have variations of the invoice. (Adding a validate() method for each subclass might be appropriate.)
class TPSReport extends Form {
    function validate() {}
}
If you want to separate your DTO from the delivery mechanism, because the delivery mechanism is generic to all your invoices, that could be easy. However, you might be in a situation where there is business logic around the success or failure of the invoice. This is where I'm prolly going off into the weeds, but it's where an OO model can be useful... I'll wager a penny that there will be different invoices and different procedures for different invoices, and if invoice submission barfs, you'll need extra routines :-)
class Form {
    function submitToBilling() {}
    function reportFailedSubmit() {}
    function reportSuccessfulSubmit() {}
}
class TPSReport extends Form {
    function validate() {}
    function reportFailedSubmit() { /* oh this goes to AR */ }
}
Note David Lively's answer: it is a good insight. Often, fields on a form are each their own data structures and have their own validation rules. So you can model composite objects pretty quickly. This would associate each field type with its own validation rules and enforce stricter typing.
If you do have to get further into validation, often business rules are a whole different modelling from the forms or the DTOs that supply them. You could also be faced with logic that is oriented by department and has little to do with the form. Important to keep that out of the validation of the form itself and model submission process(es) separately.
If you are organizing a schema behind these forms, instead of a table with 100 columns, you would probably break down the entries by field identifiers and values, into just a few columns.
table FormSubmissions (
    id int
    formVer int -- fk of FormVersions
    formNum int -- group by form submission
    fieldName int -- fk of FormFields
    fieldValue text
)
table FormFields (
    id int
    fieldName char
)
table FormVersions (
    id
    name
)
select s.*, f.fieldName from FormSubmissions s
left join FormFields f on s.fieldName = f.id
where s.formNum = 12345 ;
I would say this is definitely a case where you're going to want to refactor your way around until you find something comfortable. Hopefully you have some control over things like schema and your object model. (BTW... is that table known as 'normalized'? I've seen variations on that schema, typically organized by data type... good?)

Do you always need all the properties that are returned? Can you use projection with whatever class is consuming the data and only generate the properties you need at the time?

You could try LINQ; it will auto-generate properties for you. If the fields are spread across multiple tables, you could build a view and drag the view over to your designer.

Dictionary? Why not, but not necessarily. I see a C# tag; your language has reflection, good for you. I had a few overly large classes like this in my Python code, and reflection helps a lot:
for attName in ('attr1', 'attr2'):  # ..... (10 other attributes)
    setattr(self, attName, process_attribute(getattr(self, attName)))
When you want to convert 10 string members from some encoding to UNICODE, some other string members shouldn't be touched, you want to apply some numerical processing to other members... convert types... a for loop beats copy-pasting lots of code anytime for cleanliness.
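Since the question is tagged C#, the same loop could be sketched there with System.Reflection. PropertyProcessor, TransformStrings and the SampleInvoice class are hypothetical names introduced for this example.

```csharp
using System;
using System.Reflection;

public static class PropertyProcessor
{
    // Applies 'transform' to each named public string property of 'target'.
    public static void TransformStrings(
        object target, Func<string, string> transform, params string[] propertyNames)
    {
        foreach (var name in propertyNames)
        {
            PropertyInfo property = target.GetType().GetProperty(name);

            // Skip unknown, non-string, or non-read/write properties.
            if (property == null || property.PropertyType != typeof(string)
                || !property.CanRead || !property.CanWrite)
                continue;

            var current = (string)property.GetValue(target);
            if (current != null)
                property.SetValue(target, transform(current));
        }
    }
}

// Hypothetical class used only to demonstrate the helper.
public class SampleInvoice
{
    public string Code { get; set; }
    public string Notes { get; set; }
    public int Quantity { get; set; }
}
```

A call such as PropertyProcessor.TransformStrings(invoice, s => s.Trim(), "Code", "Notes") then touches only the listed members and leaves the others alone.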

If an entity has a hundred unique attributes, then a single class with a hundred properties is the correct thing to do.
It may be possible to split things like addresses into a sub class, but this is because an address is really an entity in itself and easily recognised as such.
A textbook (i.e. oversimplified, not usable in the real world) invoice would look like:
class invoice:
    int id;
    address shipto_address;
    address billing_address;
    date order_date;
    date ship_date;
    .
    .
    .
    line_item invoice_line[999];
class line_item:
    int item_no;
    int product_id;
    amt unit_price;
    int qty;
    amt item_cost;
    .
    .
    .
So I am surprised you don't have at least an array of line_items in there.
Get used to it! In the business world an entity can easily have hundreds and sometimes thousands of unique attributes.

If all else fails, at least split the class into several partial classes for better readability. It'll also make it easier for the team to work in parallel on different parts of this class.
Good luck :)
