MongoDB automatic document relation mapping in C#

Are there any frameworks out there for MongoDB in C# that can automatically map document relations? I'm talking about a model or "schema" that is purely defined by the documents themselves and not by objects in .NET or any other external schema for that matter.
Think dynamic objects / BsonDocuments that can automatically lazy-load relations to other documents.
I have several ideas for how to solve this myself, but if there already exist any frameworks or perhaps BsonDocument extensions (which is how I intended to solve it), this would lessen the need to add complexity to the project I'm working on.
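For illustration, here is a minimal sketch of the kind of BsonDocument extension the question is about (all names are assumed, not from any existing framework): a method that resolves a referenced document on demand instead of eagerly.

    using MongoDB.Bson;
    using MongoDB.Driver;

    public static class BsonDocumentRelationExtensions
    {
        // Follows a manual reference convention: the current document stores the
        // related document's _id in a field such as "authorId".
        public static BsonDocument LoadRelated(
            this BsonDocument document,
            IMongoDatabase database,
            string referenceField,     // e.g. "authorId"
            string targetCollection)   // e.g. "users"
        {
            if (!document.Contains(referenceField))
                return null;

            var filter = Builders<BsonDocument>.Filter.Eq("_id", document[referenceField]);
            return database
                .GetCollection<BsonDocument>(targetCollection)
                .Find(filter)
                .FirstOrDefault();
        }
    }

    // usage: var author = post.LoadRelated(database, "authorId", "users");

A "lazy" variant would simply wrap this in a Lazy<BsonDocument> or cache the result on first access.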

The question is largely off-topic ('are there frameworks'), but I'd like to challenge the idea in itself:
"this would lessen the need to add complexity to the project I'm working on."
I think it would merely hide complexity by moving it to a part of the code that knows nothing about your functional or non-functional requirements. Combined with a database that has no constraints except uniqueness, that doesn't sound like a good idea.
I'd recommend staying away from lazy loading as an almost general rule, because it makes it impossible to tell whether:
- an operation is super costly (a database call) or a mere memory lookup;
- a property's state will be fetched on access or is cached, thus hiding the key aspect of serialization from the user.
In other words: I'd stay away from the idea, or use something like EF with whatever database works for it. If you don't care about your serialization, use a well-tested, commonplace solution.

Related

Schema Migration Scripts in NoSQL Databases

I have an active project that has always used C#, Entity Framework, and SQL Server. However, with the feasibility of NoSQL alternatives increasing daily, I am researching all the implications of switching the project to use MongoDB.
It is obvious that the major transition hurdles would be due to being "schema-less". A good summary of what that implies for languages like C# can be found in the official MongoDB documentation. Here are the most relevant paragraphs:
Just because MongoDB is schema-less does not mean that your code can handle a schema-less document. Most likely, if you are using a statically typed language like C# or VB.NET, then your code is not flexible and needs to be mapped to a known schema.
There are a number of different ways that a schema can change from one version of your application to the next. How you handle these is up to you. There are two different strategies: write an upgrade script, or incrementally update your documents as they are used.
The easiest strategy is to write an upgrade script. There is effectively no difference to this method between a relational database (SQL Server, Oracle) and MongoDB. Identify the documents that need to be changed and update them.
Alternatively, and not supportable in most relational databases, is the incremental upgrade. The idea is that your documents get updated as they are used. Documents that are never used never get updated. Because of this, there are some definite pitfalls you will need to be aware of.
First, queries against a schema where half the documents are version 1 and half the documents are version 2 could go awry. For instance, if you rename an element, then your query will need to test both the old element name and the new element name to get all the results.
Second, any incremental upgrade code must stay in the code-base until all the documents have been upgraded. For instance, if there have been 3 versions of a document [1, 2, and 3] and we remove the upgrade code from version 1 to version 2, any documents that still exist as version 1 are un-upgradeable.
The tooling for managing/creating such initialization or upgrade scripts in the SQL ecosystem is very mature (e.g. Entity Framework Migrations).
While there are similar tools and homemade scripts available for such upgrades in the NoSQL world (though some believe there should not be), there seems to be less consensus on "when" and "how" to run these upgrade scripts. Some suggest running them after deployment. Unfortunately, this approach (when not used in conjunction with incremental updating) can leave the application in an unusable state when it attempts to read existing data for which the C# model has changed.
If
"The easiest strategy is to write an upgrade script."
is truly the easiest/recommended approach for static .NET languages like C#, are there existing tools for code-first schema migration in NoSQL databases for those languages, or is the NoSQL ecosystem not at that point of maturity yet?
If you disagree with MongoDB's suggestion, what is a better implementation, and can you give some reference/examples of where I can see that implementation in use?
Short version
Is "The easiest strategy is to write an upgrade script." is truly the easiest/recommended approach for static .NET languages like C#?
No. You could do that, but that's not the strength of NoSQL. Using C# does not change that.
are there existing tools for code-first schema migration in NoSql Databases for those languages?
Not that I'm aware of.
or is the NoSql ecosystem not to that point of maturity?
It's schemaless. I don't think that's the goal or measurement of maturity.
Warnings
First off, I'm rather skeptical that just pushing an existing relational model into NoSQL would, in the general case, solve more problems than it creates.
SQL is for working with relations and on sets of data; NoSQL is targeted at working with non-relational data: "islands" with few and/or soft relations. Both are good at what they are targeting, but they are good at different things. They are not interchangeable, not without serious effort in data redesign, team mindset and application logic changes, possibly invalidating most previous technical design decisions, with impact reaching up to architectural system properties and possibly the user experience.
Obviously, it may make sense in your case, but definitely do the ROI math before committing.
Dealing with schema change
Assuming you really have good reasons to switch, and schema change management is a key factor in that, I would suggest not fighting the schemaless nature of NoSQL and embracing it instead. Accept that your data will have different schemas.
Don't do upgrade scripts
... unless you know your application data set will never, ever grow or change notably. The other SO post you referenced explains it really well. You just can't rely on being able to do this in the long term, so you need a plan B anyway. You might as well start with it, and only use schema update scripts when that really is the simpler thing to do for a specific case.
I would add to that argument that a good NoSQL-optimized data model is usually optimized for single-item seeks and writes, and mass updates can be significantly heavier than in SQL: to update a single field you may have to rewrite a larger portion of the document, plus handle any denormalizations introduced to reduce the need for lookups in NoSQL (and it may not even be transactional). So what counts as a "large" upgrade in NoSQL may be significantly smaller, and arrive sooner, than you would expect when measuring in upgrade downtime.
Support multiple schemas concurrently
Having different schema versions concurrently "active" is in practice expected, since there is no enforcement anyway; that is the core feature you are buying into by switching to NoSQL in the first place.
Ideally, in the NoSQL mindset, your logic should be able to work with any input data that meets the requirements of the specific process at hand. It should depend on its required input, not on your storage model (which also makes sense universally for dependency management, to reduce complexity). Maybe the logic depends on just a few properties in a single type of document. It should not break if some other fields have changed or some extra data has been added, as long as they are not relevant to the specific work to be done. It definitely should not care if some other model type has changed. This approach usually implies working with soft value bags (JSON/dynamic/dictionary/etc.).
Even if the storage model is schema-less, each business logic process has expectations about its input model (a schema subset), and it should validate that it can work with what it is given. A schema version number persisted alongside the model also helps in trickier cases.
As a C# guy, I personally avoid working with dynamic models directly and prefer creating strongly typed objects to wrap each dynamic storage type. To avoid having to manage N concurrent schema-version models (with minimal differences) and constantly upgrading the logic layer to support new schema versions, I would implement the wrapper as a superset of all currently supported schema versions for a given entity and implement whatever interfaces you need (see the sketch below). Of course you could add N more abstraction layers ;) Once some old schema versions have eventually been phased out of the data, you can simplify your model and get strongly typed support reaching all dependents.
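As a minimal sketch of such a "superset" wrapper (the Customer entity, its fields and the v1-to-v2 rename are all assumed for illustration), using the MongoDB C# driver's mapping attributes:

    using MongoDB.Bson;
    using MongoDB.Bson.Serialization.Attributes;

    // Covers schema versions 1 and 2 of a hypothetical Customer document.
    public class Customer
    {
        [BsonId]
        public ObjectId Id { get; set; }

        [BsonElement("schemaVersion")]
        public int SchemaVersion { get; set; }

        // v1 stored a single "name" field...
        [BsonElement("name"), BsonIgnoreIfNull]
        public string Name { get; set; }

        // ...v2 split it into firstName/lastName.
        [BsonElement("firstName"), BsonIgnoreIfNull]
        public string FirstName { get; set; }

        [BsonElement("lastName"), BsonIgnoreIfNull]
        public string LastName { get; set; }

        // Anything this model does not know about survives a round-trip untouched.
        [BsonExtraElements]
        public BsonDocument ExtraElements { get; set; }

        // Logic depends on what it needs, not on the raw storage shape.
        public string DisplayName =>
            SchemaVersion >= 2 ? $"{FirstName} {LastName}" : Name;
    }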
Also, it's important that the logic layer has a fallback or reaction plan should the input model NOT match the requirements for carrying out the intended logic. It's up to the application when and where you can auto-upgrade, accept a discard or a partial reset, have to route the item to some trickier repair queue (up to a manual fix if no automation can cut it), or have to reject the request outright due to incompatibility.
Yes, there is the problem of querying across sets of models with different versions, so you should always consider those cases as well. You may have to adjust the querying logic to query different versions separately and merge the results (or accept partial results if that is acceptable), as sketched below.
There definitely are tradeoffs to consider, sure.
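A hedged sketch of what "query the versions separately and merge" can look like (the collection, field names and the assumed v1-to-v2 rename of "name" to "fullName" are illustrative only):

    using System.Linq;
    using MongoDB.Bson;
    using MongoDB.Driver;

    var database = new MongoClient("mongodb://localhost").GetDatabase("app");
    var customers = database.GetCollection<BsonDocument>("customers");

    // v1 documents still use "name"; v2 documents use "fullName".
    var v1Matches = customers
        .Find(Builders<BsonDocument>.Filter.Eq("name", "Jane Doe"))
        .ToList();
    var v2Matches = customers
        .Find(Builders<BsonDocument>.Filter.Eq("fullName", "Jane Doe"))
        .ToList();

    // A document only ever carries one of the two fields, so a plain concat suffices here.
    var allMatches = v1Matches.Concat(v2Matches).ToList();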
So, migrations?
A downside (if you care about the availability of migration tool sets) is that you don't have one true schema from which to auto-generate the model or its changes, as the C# model IS the source-of-truth schema you are currently supporting. Actually, this is quite similar to the code-first mindset, but without migrations.
You could implement an incoming-model pipe which auto-upgrades the models as they are read, and hence reduce the number of schema versions you need to support upstream (a sketch follows). I would say this is as close to migrations as you get. I don't know of any tools to do this for you automatically, and I'm not sure I would want one. There are trade-offs to consider; for example, clients consuming the data may get upgraded on different timelines, etc. Upgrading to the latest version may not always be what you want.
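A minimal sketch of such a read-side upgrade pipe, assuming the same hypothetical Customer document as above (the specific v1-to-v2 and v2-to-v3 changes are invented for illustration):

    using System.Linq;
    using MongoDB.Bson;

    public static class CustomerUpgrader
    {
        // Normalizes a document to the latest schema version as it is read,
        // so upstream code only ever sees version 3.
        public static BsonDocument UpgradeToLatest(BsonDocument doc)
        {
            var version = doc.GetValue("schemaVersion", 1).AsInt32;

            if (version < 2)
            {
                // v1 -> v2: split "name" into firstName/lastName (assumed change).
                var parts = doc.GetValue("name", "").AsString.Split(' ');
                doc["firstName"] = parts.FirstOrDefault() ?? "";
                doc["lastName"] = parts.Skip(1).FirstOrDefault() ?? "";
                doc.Remove("name");
            }

            if (version < 3)
            {
                // v2 -> v3: a new field with a sensible default (assumed change).
                doc["isActive"] = true;
            }

            doc["schemaVersion"] = 3;
            return doc;
        }
    }

Whether you also write the upgraded document back immediately, lazily on the next save, or never, is exactly one of the trade-offs mentioned above.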
Conclusion
NoSQL is by definition not SQL. Both are cool, but expecting equivalency or interchangeability is bound for trouble.
You still have to consider and manage schema in NoSQL, but if you want one true enforced & guaranteed schema, then consider SQL instead.
While Imre's answer is really great and I agree with it in every detail, I would like to add to it while trying not to duplicate information.
Short version
If you plan to migrate your existing C#/EF/SQL project to MongoDB, there is a high chance that you shouldn't. It has probably worked quite well for some time, the team knows it, hundreds or more bugs have already been fixed, and users are more or less happy with it. This is real value that you already have. And I mean it. For reasons why you should not replace old code with new code, see here:
https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.
Also, more important than the existence of tools for any given technology is whether it brings value and works as promised (tooling is secondary).
Disclaimers
I do not like the explanation from MongoDB you cited, which claims that a statically typed language is the issue here. That is true, but only on a basic, superficial level. More on this later.
I do not agree that EF Code First Migrations is very mature. It is really great for development and test environments, and it is much, much better than previous .NET database-first approaches, but you still need your own careful approach for production deployments.
Investing in your own tooling should not be a blocker for you. In fact, if the engine you choose is really great, it is worthwhile to write some specific tooling around it. I believe that great teams rarely use tooling "off the shelf". They rather choose technologies wisely and then customize tools to their needs or build new tools around them (probably selling the tool a year or two later).
Where the front line lays
It is not between statically and dynamically typed languages. This difference is highly overrated.
It is more about the problem at hand and the nature of your schema.
Part of the schema is quite static and will play nicely in both the static and the dynamic "world", but another part naturally changes over time, and that part fits dynamically typed languages better, though not in any essential way.
You can easily write code in C# that keeps a list of (key, value) pairs and thus have dynamism under control. What dynamically typed languages give you is the impression that you call properties directly, while in C# you access them by "key". While easier and prettier for the developer, this does not save you from the bigger problems, like deploying schema changes or accessing different versions of schemas.
So the static/dynamic language distinction is not an issue here at all.
It is rather about drawing a line between the data that you want to control from your code (data that is involved in any logic) and the rest that you do not have to control strictly. The second part does not have to be explicitly and minutely expressed in the schema in your code; it can be a list or dictionary rather than named fields/properties, because maintaining such fields costs you without bringing any value. A small sketch follows.
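A minimal sketch of that split in C# (the entity and field names are assumed for illustration): a few fields the application logic actually depends on, plus an open key/value bag for everything else.

    using System.Collections.Generic;

    public class ProductDocument
    {
        // The part the project controls explicitly and uses in logic.
        public string Id { get; set; }
        public string Name { get; set; }

        // The part that may vary per document and per schema version;
        // the code carries it around but does not depend on its shape.
        public Dictionary<string, object> Attributes { get; set; } = new Dictionary<string, object>();
    }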
My Use Case
Once upon a time my team built a project that uses three different databases:
SQL for "usual" configuration and evidence stuff
Graph database to make it natural to build wide network of arbitrarily connected objects
Document database tuned for searching (Elastic Search in fact) to make searching instant and really modern (like dealing with typos or the like)
Of course it is a challenge to deploy such a wide technology stack, but each part of it brings its best to the whole solution.
The aim of the project is to search through a knowledge base of literally anything (projects, people, books, products, documents, simply anything).
That's why SQL is there only to record the list of available "knowledge databases" and the users assigned to them. The schema here is obvious, stable and trivial. There is a low probability of changes in the future.
Next, the graph database allows us to literally "throw" anything into the database from different sources and connect things with each other. The idea, to put it simply, is to have objects accessible by ID.
Next, Elastic Search is there to accumulate IDs and a selected subset of properties to make them searchable instantly. Here the schema contains only an ID and a list of (key, value) pairs.
As the final step, to put it simply, the solution calls Elastic Search, gets IDs and displays details (the schema is irrelevant, as we treat it as a list of key/value pairs, so the GUI is prepared to build screens dynamically).
The way to this solution was really painful, though.
We tested a few graph databases by running proofs of concept, only to find that most of them simply do not work for operations like updating data! (ugh!!!) Finally we found one DB that was good enough.
On the other hand, finding and using Elastic Search was a great pleasure! Great as it is, you have to be aware that under the pressure of massive data uploads it can break, so you have to adjust your tooling to adapt to it
(so no silver bullet here).
Going into more widely used direction
Apart from my use case, which is kind of extreme, you usually have something "in between".
For example, a database for documents.
It can have an almost static "header" of fields like ID, name, author, and so on, which your code can manage "traditionally", while all the other fields are managed in a way that allows them to exist or not and to have differing contents or structure.
"The header" is the part you decided to make relevant to, and controllable by, the project. The rest is accompanying rather than crucial (from the point of view of the project logic).
Different approaches
I would rather recommend learning about the strengths of the particular NoSQL database types: find answers as to why they were created and why they are popular and useful. Then work out in which ways they could bring benefits to your project.
BTW, it is interesting that you pointed specifically at MongoDB; why?
The other way around would be to identify your project's current greatest weaknesses or greatest challenges from a technological point of view, be it performance, the cost of supporting changes, the need to scale significantly, or something else. Then try to answer whether some NoSQL DB would be great at resolving that issue.
Conclusion
I'm sure you can find benefits of NoSQL databases for your project, either by replacing part of it or by bringing new value to users (searching, for example?). Either way, I would prefer a really good technology that delivers what it promises over one chosen for how fully it is supported by the tools around it.
A proof of concept is also a really good way to check a technology in a scenario that is very simple yet meaningful for you. But the approach should be not to play with technologies, but to aggressively and quickly prove or disprove their quality.
There are so many promises and so much advertising around that we should protect ourselves by focusing on the real things that work.

Serializing complex EF model over JSON

I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then blows up with a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clearing them causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: this is an assumption, but it seemed pretty sound. If anyone knows differently, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer which just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post on a page. So write data services and methods which do things that are relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and a message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data like creation time.
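A hedged sketch of that difference in practice (BlogContext, Post and User are hypothetical entities standing in for whatever the real model is); EF translates the projection into SQL, so only the two columns actually needed leave the database and nothing irrelevant gets serialized:

    using System.Collections.Generic;
    using System.Linq;

    public class PostSummary
    {
        public string PosterName { get; set; }
        public string Message { get; set; }
    }

    public class PageOfPosts
    {
        public int PageNumber { get; set; }
        public List<PostSummary> Posts { get; set; }
    }

    public class PostQueryService
    {
        public PageOfPosts GetPostsPage(int pageNumber, int pageSize)
        {
            using (var db = new BlogContext())
            {
                var posts = db.Posts
                    .OrderByDescending(p => p.CreatedUtc)
                    .Skip((pageNumber - 1) * pageSize)
                    .Take(pageSize)
                    .Select(p => new PostSummary
                    {
                        // Only these two columns are queried; nothing is tracked.
                        PosterName = p.User.DisplayName,
                        Message = p.Text
                    })
                    .ToList();

                return new PageOfPosts { PageNumber = pageNumber, Posts = posts };
            }
        }
    }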
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
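A minimal sketch of the first option, assuming EF6 and a hypothetical context (the class, entity and connection-string names are illustrative):

    using System.Data.Entity;

    public class Resource { public int Id { get; set; } public string Name { get; set; } }
    public class User { public int Id { get; set; } public string DisplayName { get; set; } }

    public class AppDbContext : DbContext
    {
        public AppDbContext() : base("name=DefaultConnection")
        {
            // No lazy loading and no dynamic proxies: what you query is what
            // you get, and what you get serializes cleanly.
            Configuration.LazyLoadingEnabled = false;
            Configuration.ProxyCreationEnabled = false;
        }

        public DbSet<Resource> Resources { get; set; }
        public DbSet<User> Users { get; set; }
    }

With this in place, related data is only loaded when you ask for it explicitly, which is usually what you want at a service boundary.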
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had such a project and it was a stressful one... I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables.
My optimizations were:
1. Instead of using EF to load the data, I called old-school stored procedures (and, for further optimization, grouped the results to keep the result tables as small as possible for the charts; e.g. one query returns a table from which multiple chart datasets can be extracted).
2. More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance was notable (it is really fast, but not compatible with complex objects; a simple example would be view models that have lists of models nested inside them, and so on).
Better to read the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. In the relational database design, which is the prime suspect for this problem, it may be good to keep the tables that hold raw data to be processed (most probably for analytics) in a denormalized schema, which saves performance when querying the data.
Also beware of using the model classes generated by the EF designer from the database for reading or selecting data, especially when you want to serialize them (sometimes I think about separating the same schema model into two sets of otherwise identical classes/models for writing and reading, such that the write models get the benefit of the virtual collections coming from foreign keys while the read models ignore them... I am not sure about this).
NOTE: in the case of very, very large data it is better to go deeper and set up In-Memory OLTP for the specific tables that contain facts or raw data; in that case your table acts as a non-relational table, much like NoSQL.
NOTE: for example, in MSSQL you can use the benefits of SQLCLR, which lets you write routines in C#, VB, etc. and call them from T-SQL, in other words handle data processing at the database level.
4. For interactive views which need to load data, I think it is better to consider which information should be processed server-side and which can be handled client-side (sometimes it is better to query data from the client side, though you should consider that data on the client side can be accessed by the user); it is situation-dependent.
5. In the case of large raw-data tables in a view, using datatables.min.js is a good idea, and everyone suggests using server-side paging on tables.
6. In the case of importing and exporting data from big files, OleDb is the best choice, I think.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)
I have fiddled with a similar problem using EF model-first, and found the following solution satisfactory for "one to many" relations:
Include "foreign key properties" in the sub-entities and use these for later look-ups.
Make the get/set modifiers of any "navigation properties" (sub-collections) in your EF entities private.
This will give you an object that does not expose the sub-collections, so only the main properties get serialized. This workaround requires some restructuring of your LINQ queries: ask your table of sub-items directly, with the foreign key property as your filtering option, like this:
var myFitnessClubs = context.FitnessClubs
    .Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
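For reference, a sketch of the entity shapes this workaround assumes (the names are hypothetical and simply mirror the query above): the foreign key stays public for look-ups, while the navigation properties are private and therefore not serialized.

    using System.Collections.Generic;

    public class FitnessClubChain
    {
        public int ID { get; set; }
        public string Name { get; set; }

        // Not publicly exposed, so serializers skip it.
        private ICollection<FitnessClub> FitnessClubs { get; set; }
    }

    public class FitnessClub
    {
        public int ID { get; set; }
        public string Name { get; set; }

        // Public foreign key used for the manual look-up instead.
        public int FitnessClubChainID { get; set; }

        private FitnessClubChain FitnessClubChain { get; set; }
    }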
Note 1:
You may of course choose to implement this solution only partly, affecting only the sub-collections that you strongly do not want serialized.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.

Does setting serialization mode to Unidirectional have any drawback or performance hit?

I want to add a history feature for one of my application's entities. My application uses LINQ to SQL as its O/RM.
The solution I have selected for saving previous versions of the entity is to serialize the old version and save the serialized versions somewhere.
Anyway, I was looking for the best way to serialize LINQ to SQL entities. One of the best solutions I found was simply setting the serialization mode to Unidirectional on the DataContext object and then using the .NET serializer. The method seems to work well.
But my question is: does this method have any side effects on the data context, or any performance hit? For example, does lazy loading of objects still work intact after turning unidirectional serialization mode on?
I've searched a lot but found nothing.
And do you think I should go another way to serialize the objects? Any cheaper method?
PS 1: I don't like using DTOs, because I would have to maintain compatibility between the DataContext objects and the DTOs.
PS 2: Please note that the application I'm working on is relatively large and currently under heavy load, so I have to be careful about changes in O/RM behavior or performance issues, because I cannot review the entire application + online versions.
Thanks a lot :)
Re "and then use .net serializer", the intent of unidirectional mode is to support DataContractSerializer (specifically). The effect is basically just the addition of the [DataContract]/[DataMember] flags, but: based on experience, and despite your reservations, I would strongly advise you to go the DTO route:
ORMs, due to lazy loading of both sub-collections and properties (sometimes) makes it very hard to predict exacty what is going to happen
and at the same time, make it hard to monitor when data load is happening (N+1 in your serialization code can go un-noticed)
you have very little control over the size of the model (how many levels of descendants etc) are going to be serialized
and don't even get me started on parent navigation
the deserialized objects will not have a data-context, which means that their lazy loading won't work - this can lead to unexpected behaviours
you can get into all sorts of problems if you can't version your data model separately from your serialization model
there are some... "peculiarities" that impact LINQ-to-SQL and serialization (from memory, the most "fun" one is a difference in how the IList.Add and IList<T>.Add is implemented, meaning that one triggers data-change flags and one doesn't; nasty)
you are making everything dependent on your data model, which also means that you can never change your data model (at least, not easily)
a lot of scenarios that involve DTO are read-based, not write-based, so you might not even need the ORM overheads in the first place - for example, you might be able to replace some hot code-path pieces with lighter tools (dapper, etc) - you can't do that well if everything
is tied to the data model
I strongly advocate biting the bullet and creating a DTO model that matches what you want to serialize; the DTO model would be incredibly simple and should only take a few minutes to throw together, including the code to map to/from your data model. This gives you total control over what is serialized, while also giving you the freedom to iterate without headaches and to do things like use different serializers.
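A hedged sketch of what that can look like (Order, Customer and OrderLine here are stand-ins for whatever the generated LINQ-to-SQL entities really are): a tiny serialization model plus explicit mapping, so exactly the intended data is serialized and the serializer can never trigger a lazy load.

    using System.Collections.Generic;
    using System.Linq;

    public class OrderDto
    {
        public int Id { get; set; }
        public string CustomerName { get; set; }
        public List<OrderLineDto> Lines { get; set; }

        // Explicit mapping touches exactly the members you intend to serialize:
        // no surprise loads, no parent navigation, no versioning coupling.
        public static OrderDto FromEntity(Order order) => new OrderDto
        {
            Id = order.Id,
            CustomerName = order.Customer.Name,
            Lines = order.Lines
                .Select(l => new OrderLineDto { Product = l.Product, Quantity = l.Quantity })
                .ToList()
        };
    }

    public class OrderLineDto
    {
        public string Product { get; set; }
        public int Quantity { get; set; }
    }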

DAL: DataContext principles understanding

Not sure if there's an "official" name, but by DataContext I mean an object which transparently maintains objects' state, providing change tracking, unit-of-work functionality, concurrency handling and probably many other useful features. (In Entity Framework it's the ObjectContext, in NHibernate the ISession.)
Eventually I've come to the idea that something like that should be implemented in my application (it uses MongoDB as a back-end, and MongoDB's partial updates are fine when we're able to track a particular property change).
So actually, I've got several questions on this subject
Could anyone formulate the requirements for a DataContext? What is your understanding of its tasks and responsibilities? (The most relevant material I've managed to find is Esposito's book, but unfortunately that's at about MSDN-samples level.)
What would you suggest for a change-tracking implementation? (In the simplest case it's possible to track changes "manually" in the entities, but that requires coding and mixes the DAL with business logic, so I'm mostly interested in an "automatic" way that keeps the entities closer to POCO.)
Is there a way to use some existing solution? (I hoped the NHibernate infrastructure would allow plugging in a custom module to work with Mongo behind the scenes, but I'm not sure it allows working with non-SQL DBs at all.)
The DataContext (ObjectContext or DbContext in EF) is nothing other than an implementation of the Unit of Work (UoW)/Repository pattern.
I suggest Fowler's book Patterns of Enterprise Application Architecture, in which he outlines the implementation of several persistence patterns. That might be a help in implementing your own solution.
A DataContext basically needs to fulfil the job of a UoW. It needs to handle the reading and management of objects that are involved in a given lifecycle (e.g. an HTTP request), such that no two objects in memory represent the same record in the DB. Moreover, it needs to provide change tracking for performing partial updates to the DB (as you already mentioned).
As regards change tracking, I fully agree that polluting properties with change events etc. is bad. One of the recent templates introduced in EF 4.1 uses proxies to handle that and to give you the possibility of having plain POCOs.
Answer to question 2: to make POCO classes work, you will need to generate code at run-time, possibly using System.Reflection.
If you analyse Entity Framework, you will see that it requires virtual properties to do change tracking... that is because it needs to create a generated class at run-time that overrides each property and adds code to tell the DataContext when someone changes that property.
Entity Framework also generates code to initialize collections, so that when someone tries to do operations like Add and Remove, the collection object itself knows what to do.
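The proxy approach is involved; as a much simpler illustration of "automatic" change tracking for MongoDB partial updates (not what EF does, and with all names assumed), here is a snapshot-diff sketch against the MongoDB C# driver:

    using System.Collections.Generic;
    using MongoDB.Bson;
    using MongoDB.Driver;

    // Remembers the document as it was loaded and later diffs it against the
    // current state to build a $set containing only the changed top-level fields.
    public class TrackedDocument
    {
        private readonly BsonDocument _original;
        public BsonDocument Current { get; }

        public TrackedDocument(BsonDocument loaded)
        {
            _original = (BsonDocument)loaded.DeepClone();
            Current = loaded;
        }

        // Returns null when nothing changed. Removed fields ($unset) are
        // ignored in this sketch.
        public UpdateDefinition<BsonDocument> BuildPartialUpdate()
        {
            var changes = new List<UpdateDefinition<BsonDocument>>();
            foreach (var element in Current)
            {
                if (!_original.Contains(element.Name) || _original[element.Name] != element.Value)
                    changes.Add(Builders<BsonDocument>.Update.Set(element.Name, element.Value));
            }
            return changes.Count == 0 ? null : Builders<BsonDocument>.Update.Combine(changes);
        }
    }

A real unit of work would additionally keep an identity map (one in-memory object per record) and decide when to flush such updates.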

Rich domain model with ORM

I seem to be missing something and extensive use of google didn't help to improve my understanding...
Here is my problem:
I like to create my domain model in a persistence ignorant manner, for example:
I don't want to add virtual if I don't need it otherwise.
I don't like to add a default constructor, because I like my objects to always be fully constructed. Furthermore, the need for a default constructor is problematic in the context of dependency injection.
I don't want to use overly complicated mappings, because my domain model uses interfaces or other constructs not readily supported by the ORM.
One solution to this would be to have separate domain objects and data entities. Retrieval of the constructed domain objects could easily be solved using the repository pattern and building the domain object from the data entity returned by the ORM. Using AutoMapper, this would be trivial and not too much code overhead.
But I have one big problem with this approach: It seems that I can't really support lazy loading without writing code for it myself. Additionally, I would have quite a lot of classes for the same "thing", especially in the extended context of WCF and UI:
Data entity (mapped to the ORM)
Domain model
WCF DTO
View model
So, my question is: What am I missing? How is this problem generally solved?
UPDATE:
The answers so far suggest what I already feared: It looks like I have two options:
Make compromises on the domain model to match the prerequisites of the ORM and thus have a domain model the ORM leaks into
Create a lot of additional code
UPDATE:
In addition to the accepted answer, please see my answer for concrete information on how I solved those problems for me.
I would question whether matching the prerequisites of an ORM is necessarily "making compromises". However, some of these are fair points from the standpoint of a highly SOLID, loosely coupled architecture.
An ORM framework exists for one sole reason: to take a domain model implemented by you and persist it into a similar DB structure, without you having to implement a large number of bug-prone, near-impossible-to-unit-test SQL strings or stored procedures. ORMs also easily implement concepts like lazy loading: hydrating an object at the last minute, just before it is needed, instead of building a large object graph yourself.
If you want stored procs, or have them and need to use them (whether you want to or not), most ORMs are not the right tool for the job. If you have a very complex domain structure such that the ORM cannot map the relationship between a field and its data source, I would seriously question why you are using that domain and that data source. And if you want 100% POCO objects, with no knowledge of the persistence mechanism behind, then you will likely end up doing an end run around most of the power of an ORM, because if the domain doesn't have virtual members or child collections that can be replaced with proxies, then you are forced to eager-load the entire object graph (which may well be impossible if you have a massive interlinked object graph).
While ORMs do require some knowledge in the domain of the persistence mechanism in terms of domain design, an ORM still results in much more SOLID designs, IMO. Without an ORM, these are your options:
Roll your own Repository that contains a method to produce and persist every type of "top-level" object in your domain (a "God Object" anti-pattern)
Create DAOs that each work on a different object type. These types require you to hard-code the get and set between ADO DataReaders and your objects; in the average case a mapping greatly simplifies the process. The DAOs also have to know about each other; to persist an Invoice you need the DAO for the Invoice, which needs a DAO for the InvoiceLine, Customer and GeneralLedger objects as well. And, there must be a common, abstracted transaction control mechanism built into all of this.
Set up an ActiveRecord pattern where objects persist themselves (and put even more knowledge about the persistence mechanism into your domain)
Overall, the second option is the most SOLID, but more often than not it turns into a beast-and-two-thirds to maintain, especially when dealing with a domain containing backreferences and circular references. For instance, for fast retrieval and/or traversal, an InvoiceLineDetail record (perhaps containing shipping notes or tax information) might refer directly to the Invoice as well as the InvoiceLine to which it belongs. That creates a 3-node circular reference that requires either an O(n^2) algorithm to detect that the object has been handled already, or hard-coded logic concerning a "cascade" behavior for the backreference. I've had to implement "graph walkers" before; trust me, you DO NOT WANT to do this if there is ANY other way of doing the job.
So, in conclusion, my opinion is that ORMs are the least of all evils given a sufficiently complex domain. They encapsulate much of what is not SOLID about persistence mechanisms, and reduce knowledge of the domain about its persistence to very high-level implementation details that break down to simple rules ("all domain objects must have all their public members marked virtual").
In short - it is not solved
All good points.
I don't have an answer (but the comment got too long when I decided to add something about stored procs), except to say that my philosophy seems to be identical to yours, and I code or code-generate.
Things like partial classes make this a lot easier than it used to be in the early .NET days. But ORMs (as a distinct "thing", as opposed to something that just gets done in getting to and from the database) still require a LOT of compromises and they are, frankly, too leaky an abstraction for me. And I'm not big on having a lot of duplicate classes, because my designs tend to have a very long life and change a lot over the years (decades, even).
As far as the database side goes, stored procs are a necessity in my view. I know that ORMs support them, but the tendency of most ORM users is not to use them, and that is a huge negative for me, because they talk about a best practice and then couple to a table-based design even if it is created from a code-first model. It seems to me they should look at an object datastore if they don't want to use a relational database in a way that utilizes its strengths. I believe in Code AND Database first, i.e. model the database and the object model simultaneously, back and forth, and then work inwards from both ends. I'm going to lay it out right here:
If you let your developers code an ORM against your tables, your app is going to have problems living for years. Tables need to change. More and more people are going to want to knock up against those entities, and now they are all using an ORM generated from tables. And you are going to want to refactor your tables over time. In addition, only stored procedures are going to give you any kind of usable role-based manageability without dealing with every table on a per-column GRANT basis, which is super painful. If you program well in OO, you have to understand the benefits of controlled coupling. That's all stored procedures are: USE THEM so your database has a well-defined interface. Or don't use a relational database if you just want a "dumb" datastore.
Have you looked at the Entity Framework 4.1 Code First? IIRC, the domain objects are pure POCOs.
This is what we did on our latest project, and it worked out pretty well:
We used EF 4.1 with the virtual keyword on our business objects and our own custom implementation of the T4 template, wrapping the ObjectContext behind an interface for repository-style data access.
We used AutoMapper to convert between BOs and DTOs.
We used AutoMapper to convert between ViewModels and DTOs.
You would think that view models, DTOs and business objects are the same thing, and they might look the same, but they have a very clear separation in terms of concerns.
View models are about the UI screen, DTOs are about the task you are accomplishing, and business objects are primarily concerned with the domain.
There are some compromises along the way, but if you want EF, the benefits outweigh the things you give up.
Over a year later, I have now solved these problems for myself.
Using NHibernate, I am able to map fairly complex domain models to reasonable database designs that wouldn't make a DBA cringe.
Sometimes it is necessary to create a new implementation of the IUserType interface so that NHibernate can correctly persist a custom type. Thanks to NHibernate's extensible nature, that is no big deal.
I found no way to avoid adding virtual to my properties without losing lazy loading. I still don't particularly like it, especially because of all the warnings from Code Analysis about virtual properties without derived classes overriding them, but out of pragmatism, I can now live with it.
For the default constructor I also found a solution I can live with. I add the constructors I need as public constructors, and I add an obsolete protected constructor for NHibernate to use:
[Obsolete("This constructor exists because of NHibernate. Do not use.")]
protected DataExportForeignKey()
{
}
