pluggable data store architectures - c#

I have a pluggable system management tool. The architecture of this kind of thing is well understood (interfaces, publish/ subscribe, ....). How about the data store though. What do people do?
I need plugins to be able to add new entities, extend existing entities, establish new relationships, etc.
My thoughts (SQL), not necessarily well thought out
each plugin simply extends the schema when they are installed. In the old days changing the schema was a big no-no; now databases are very relaxed about this
plugins have their own tables. If 2 of them have an entity (say) person, then there are 2 tables p1_person and p2_person
plugins have their own database
invent some sort of flexible scheme where the tables are softly typed. Maybe many attributes packed into a single attribute. The ultimate is to have one big table called data, with key of table name & column name and a single data value.
Not SQL
object DB. I have no experience with these. Anybody care to pass on experience. db4o for example. Can I change the 'schema' of objects as the app evolves
NO-SQL
this is 'where its at' at the moment. Most of these seem to be aimed slightly differently than my needs. Anybody want to pass on experience with these
Apologies for the open ended question

My suggestion is go read about the entity framework
a lot of the situations you are describing can be solved (very elegantly) using table inheritance.
Your idea of one big table called data makes the hamsters in my computer cry ;)
The general trend is away from weakly typed schemas because they cannot be debugged at compile time. What you get from something like entity framework is a strongly typed extenislbe schema that you can code against using linq.
Object databases:
like you i havent played with them massivley - however the time when i was considering them was a time when there was no good ORM for .net and writing ado.net code was slowly killing me.
as for NO-SQL these are databases that meet a performance need. SQL performs badly in situations here there are lots of small writes occuring. I say badly tounge in cheek - it performs very well but when you scale to millions of concurrent users everything changes. My understanding of no sql is that it is a non rationalised format designed for lots of small fast writes and reads. The scale of sites that use these is usually very large.
OK - in response
I am currently lucky enough to be on a green field project so i am using EF to generate my schema.
On non greenfield projects I use sql scripts to update my table structures. As for implementing table inheritance in sql its very easy once you know the concept, its essentially a one to many relationship with a constraint that it will only ever be 0-1.
I wouldn't write .net code that updates the database structure ... that sounds like a disaster waiting to happen to me.
Beginning to think i have misunderstood what you are looking for. I find databases to be second nature as I have spent so long with them.
I haven't found a replacement for being meticulous about script management.

Related

Schema Migration Scripts in NoSQL Databases

I have a active project that has always used C#, Entity Framework, and SQL Server. However, with the feasibility of NoSQL alternatives daily increasing, I am researching all the implications of switching the project to use MongoDB.
It is obvious that the major transition hurdles would be due to being "schema-less". A good summary of what that implies for languages like C# is found here in the official MongoDB documentation. Here are the most helpful relevant paragraphs (bold added):
Just because MongoDB is schema-less does not mean that your code can
handle a schema-less document. Most likely, if you are using a
statically typed language like C# or VB.NET, then your code is not
flexible and needs to be mapped to a known schema.
There are a number of different ways that a schema can change from one
version of your application to the next.
How you handle these is up to you. There are two different strategies:
Write an upgrade script. Incrementally update your documents as they
are used. The easiest strategy is to write an upgrade script. There is
effectively no difference to this method between a relational database
(SQL Server, Oracle) and MongoDB. Identify the documents that need to
be changed and update them.
Alternatively, and not supportable in most relational databases, is
the incremental upgrade. The idea is that your documents get updated
as they are used. Documents that are never used never get updated.
Because of this, there are some definite pitfalls you will need to be
aware of.
First, queries against a schema where half the documents are version 1
and half the documents are version 2 could go awry. For instance, if
you rename an element, then your query will need to test both the old
element name and the new element name to get all the results.
Second, any incremental upgrade code must stay in the code-base until
all the documents have been upgraded. For instance, if there have been
3 versions of a document, [1, 2, and 3] and we remove the upgrade code
from version 1 to version 2, any documents that still exist as version
1 are un-upgradeable.
The tooling for managing/creating such an initialization or upgrade scripts in SQL ecosystem is very mature (e.g. Entity Framework Migrations)
While there are similar tools and homemade scripts available for such upgrades in the NoSQL world (though some believe there should not be), there seems to be less consensus on "when" and "how" to run these upgrade scripts. Some suggest after deployment. Unfortunately this approach (when not used in conjunction with incremental updating) can leave the application in an unusable state when attempting to read existing data for which the C# model has changed.
If
"The easiest strategy is to write an upgrade script."
is truly the easiest/recommended approach for static .NET languages like C#, are there existing tools for code-first schema migration in NoSql Databases for those languages? or is the NoSql ecosystem not to that point of maturity?
If you disagree with MongoDB's suggestion, what is a better implementation, and can you give some reference/examples of where I can see that implementation in use?
Short version
Is "The easiest strategy is to write an upgrade script." is truly the easiest/recommended approach for static .NET languages like C#?
No. You could do that, but that's not the strength of NoSQL. Using C# does not change that.
are there existing tools for code-first schema migration in NoSql Databases for those languages?
Not that I'm aware of.
or is the NoSql ecosystem not to that point of maturity?
It's schemaless. I don't think that's the goal or measurement of maturity.
Warnings
First off, I'm rather skeptical that just pushing an existing relational model to NoSql would in a general case solve more problems than it would create.
SQL is for working with relations and on sets of data, noSQL is targeted for working with non-relational data: "islands" with few and/or soft relations. Both are good at what what they are targeting, but they are good at different things. They are not interchangeable. Not without serious effort in data redesign, team mindset and application logic change, possibly invalidating most previous technical design decision and having impact run up to architectural system properties and possibly up to user experience.
Obviously, it may make sense in your case, but definitely do the ROI math before committing.
Dealing with schema change
Assuming you really have good reasons to switch, and schema change management is a key in that, I would suggest to not fight the schemaless nature of NoSQL and embrace it instead. Accept that your data will have different schemas.
Don't do upgrade scripts
.. unless you know your application data set will never-ever grow or change notably. The other SO post you referenced explains it really well. You just can't rely on being able to do this in long term and hence you need a plan B anyway. Might as well start with it and only use schema update scripts if it really is the simpler thing to do for that specific case.
I would maybe add to the argumentation that a good NoSQL-optimized data model is usually optimized for single-item seeks and writes and mass-updates can be significantly heavier compared to SQL, i.e. to update a single field you may have to rewrite a larger portion of the document + maybe handle some denormalizations introduced to reduce the need of lookups in noSQL (and it may not even be transactional). So "large" in NoSql may happen to be significantly smaller and occur faster than you would expect, when measuring in upgrade down-time.
Support multiple schemas concurrently
Having different concurrently "active" schema versions is in practice expected since there is no enforcement anyway and that's the core feature you are buying into by switching to NoSQL in the first place.
Ideally, in noSQL mindset, your logic should be able to work with any input data that meets the requirements a specific process has. It should depend on its required input not your storage model (which also makes universally sense for dependency management to reduce complexity). Maybe logic just depends on a few properties in a single type of document. It should not break if some other fields have changed or there is some extra data added as long as they are not relevant to given specific work to be done. Definitely it should not care if some other model type has had changes. This approach usually implies working on some soft value bags (JSON/dynamic/dictionary/etc).
Even if the storage model is schema-less, then each business logic process has expectations about input model (schema subset) and it should validate it can work with what it's given. Persisted schema version number along model also helps in trickier cases.
As a C# guy, I personally avoid working with dynamic models directly and prefer creating a strongly typed objects to wrap each dynamic storage type. To avoid having to manage N concurrent schema version models (with minimal differences) and constantly upgrade logic layer to support new schema versions, I would implement it as a superset of all currently supported schema versions for given entity and implement any interfaces you need. Of course you could add N more abstraction layers ;) Once some old schema versions have eventually phased out from data, you can simplify your model and get strongly typed support to reach all dependents.
Also, it's important for logic layer should have a fallback or reaction plan should the input model NOT match the requirements for carrying out the intended logic. It's up to app when and where you can auto-upgrade, accept a discard, partial reset or have to direct to some trickier repair queue (up to manual fix if no automatics can cut it) or have to just outright reject the request due to incompatibility.
Yes, there's the problem of querying across sets of models with different versions, so you should always consider those cases as well. You may have to adjust querying logic to query different versions separately and merge results (or accept partial results if acceptable).
There definitely are tradeoffs to consider, sure.
So, migrations?
A downside (if you consider migrations tool set availability) is that you don't have one true schema to auto generate the model or it's changes as the C# model IS the source-of-truth schema you're currently supporting. Actually, quite similar to code-first mindset, but without migrations.
You could implement an incoming model pipe which auto-upgrades the models as they are read and hence reduce the number schema versions you need to support upstream. I would say this is as close to migrations as you get. I don't know any tools to do this for you automatically and I'm not sure I would want it to. There are trade-offs to consider, for example some clients consuming the data may get upgraded with different time-line etc. Upgrade to latest may not always be what you want.
Conclusion
NoSQL is by definition not SQL. Both are cool, but expecting equivalency or interchangeability is bound for trouble.
You still have to consider and manage schema in NoSQL, but if you want one true enforced & guaranteed schema, then consider SQL instead.
While Imre's answer is really great and I agree with it in every detail I would like to add more to it but also trying to not duplicate information.
Short version
If you plan to migrate your existing C#/EF/SQL project to MongoDB it is a high chance that you shouldn't. It probably works quite well for some time, the team knows it and probably hundreds or more bugs have been already fixed and users are more or less happy with it. This is the real value that you already have. And I mean it. For reasons why you should not replace old code with new code see here:
https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.
Also more important than existence of tools for any technology is that it brings value and it works as promised (tooling is secondary).
Disclaimers
I do not like the explanation from mongoDB you cited that claims that statically typed language is an issue here. It is true but only on a basic, superficial level. More on this later.
I do not agree that EF Code First Migration is very mature - though it is really great for development and test environments and it is much, much better than previous .NET database-first approaches but still you have to have your own careful approach for production deployments.
Investing in your own tooling should not be a blocker for you. In fact if the engine you choose would be really great it is worthwhile to write some specific tooling around it. I believe that great teams rarely use tooling "off the shelves". They rather choose technologies wisely and then customize tools to their needs or build new tools around it (probably selling the tool a year or two years later).
Where the front line lays
It is not between statically and dynamically typed languages. This difference is highly overrated.
It is more about problem at hand and nature of your schema.
Part of the schema is quite static and it will play nicely both in static and dynamic "world" but other part can be naturally changing with time and it fits better for dynamically typed languages but not in the essence of it.
You can easily write code in C# that has a list of pairs (key, value) and thus have dynamism under control. What dynamically typed languages gives you is impression that you call properties directly while in C# you access it by "key". While being easier and prettier to use for developer it does not save you from bigger problems like deploy schema changes, access different versions of schemas etc.
So static/dynamic languages case is not an issue here at all.
It is rather drawing a line between data that you want to control from your code (that is involved in any logic) and the other part that you do not have to control strictly. The second part do not have to be explicitly and minutely expressed in schema in your code (it can be rather list or dictionary than named fields/properties because maintaining such fields costs you but does not brings any value).
My Use Case
Once upon a time my team has made a project that uses three different databases:
SQL for "usual" configuration and evidence stuff
Graph database to make it natural to build wide network of arbitrarily connected objects
Document database tuned for searching (Elastic Search in fact) to make searching instant and really modern (like dealing with typos or the like)
Of course it is a challenge to deploy such wide technology stack but each part of it brings its best to the whole solution.
The aim of the project is to search through a knowledge base of literally anything (projects, peoples, books, products, documents, simply anything).
That's why SQL is here only to record a list of available "knowledge databases" and users assigned to them. The schema here is obvious, stable and trivial. There is low probability of changes in the future.
Next, graph database allows to literally "throw" anything into the database from different sources around and connect things with each other. The idea, to put it simply, is to have objects accessible by ID.
Next, Elastic search is here to accumulate IDs and a selected subset of properties to make them searchable in the instant. Here the schema contains only ID and list of pairs (key, value).
As the final step, to put it simply, the solution calls Elastic Search, gets Ids and displays details (schema is irrelevant as we treat it as a list of pairs key x value, so GUI is prepared to build screens dynamically).
Though the way to the solution was really painful.
We tested a few graph databases by running concept proofs to find that most of them simply does not work in operations like updating data! (ugh!!!) Finally we have found one good enough DB.
On the other hand finding and using Elastic Search was a great pleasure! Though being great you have to be aware that under pressure of uploading massive data it can break so you have to adjust your tooling to adapt to it.
(so no silver bullet here).
Going into more widely used direction
Apart from my use case which is kind of extreme usually you have sth "in-between".
For example a database for documents.
It can have almost static "header" of fields like ID, name, author, and so on and your code can manage it "traditionally" but all other fields could be managed in a way that it can exists or not and can have different contents or structure.
"The header" is the part you decided to make it relevant for the project and controllable by the project. The rest is rather accompanying than crucial (from the project logic point of view).
Different approaches
I would rather recommend to learn about strengths of particular NoSQL database types, find answers why were they created, why are they popular and useful. Then answer in which way they can bring benefits to your project.
BTW. This is interesting why you have indicated MongoDB?
The other way around would be to answer what are your project's current greatest weaknesses or greatest challenges from technological point of view - being it performance, cost of support changes, need to scale significantly or other. Then try to answer if some NoSQL DB would be great at resolving the issue.
Conclusion
I'm sure you can find benefits of NoSQL databases to your project either by replacing part of it or by bringing new values to users (searching for example?). Either way I would prefer a really good technology that brings what it promises rather than looking if it is fully supported by tools around it.
And also concept proof is a really good tool to check technologies in a scenario that is very simple but at the same time meaningful for you. But the approach should be not to play with technologies but aggressively and quickly prove or disprove quality of them.
There are so much promises and advertises around that we should protect ourselves by focusing of the real things that works.

Database Design In SQL Server or C#?

Should a database be designed on SQL Server or C#?
I always thought it was more appropriate to design it on SQL Server, but recently I started reading a book (Pro ASP.NET MVC Framework) which, to my understanding, basically says that it's probably a better idea to write it in C# since you will be accessing the model through C#, which does make sense.
I was wondering what everyone else's opinion on this matter was...
I mean, for example, do you consider "correct" having a table that specifies constants (like an AccessLevel table that is always supposed to contain
1 Everyone
2 Developers
3 Administrators
4 Supervisors
5 Restricted
Wouldn't it be more robust and streamlined to just have an enum for that same purpose?
A database schema should be designed on paper or with an ERD tool.
It should be implemented in the database.
Are you thinking about ORMs like Entity Framework that let you use code to generate the database?
Personally, I would rather think through my design on paper before committing it to a DB myself. I would be happy to use an ORM or class generator from this DB later on.
Before VS.NET 2010 I was using SQL Server Management Studio to design my databases, now I am using EF 4.0 designer, for me it's the best way to go.
If your problem domain is complex or its complexity grows as the system evolves you'll soon discover you need some meta data to make life easier. C# can be a good choice as a host language for such stuff as you can utilize its type-system to enforce some invariants (like char-columns length, null/not null restrictions or check-constraints; you can declared it as consts, enums, etc). Unfortunately i don't know utilities (sqlmetal.exe can export some meta but only as xml) that can do it out of the box, although some CASE tools probably can be customized. I'd go for some custom-made generator to produce the db schema from C# (just a few hours work comparing to learning, for example, customization options offered by Sybase PowerDesigner).
ORMs have their place, that place is NOT database design. There are many considerations in designing a database that need to be thought through not automatically generated no matter how appealing the idea of not thinking about design might be. There are often many things that need to be considered that have nothing to do with the application, things like data integrity, reporting, audit tables and data imports. Using an ORM to create a database that looks like an object model may not be the best design for performance and may not have the the things you really need in terms of data integrity. Remember even if you think nothing except the application will touch the database ever, this is not true. At some point the data base will need to have someone do a major data revision (to fix a problem) that is done directly on the database not through the application. At somepoint you are going to need need to import a million records from some other company you just bought and are goping to need an ETL process outside teh application. Putting all your hopes and dreams for the database (as well as your data integrity rules) is short-sighted.

What are some "mental steps" a developer must take to begin moving from SQL to NO-SQL (CouchDB, FathomDB, MongoDB, etc)?

I have my mind firmly wrapped around relational databases and how to code efficiently against them. Most of my experience is with MySQL and SQL. I like many of the things I'm hearing about document-based databases, especially when someone in a recent podcast mentioned huge performance benefits. So, if I'm going to go down that road, what are some of the mental steps I must take to shift from SQL to NO-SQL?
If it makes any difference in your answer, I'm a C# developer primarily (today, anyhow). I'm used to ORM's like EF and Linq to SQL. Before ORMs, I rolled my own objects with generics and datareaders. Maybe that matters, maybe it doesn't.
Here are some more specific:
How do I need to think about joins?
How will I query without a SELECT statement?
What happens to my existing stored objects when I add a property in my code?
(feel free to add questions of your own here)
Firstly, each NoSQL store is different. So it's not like choosing between Oracle or Sql Server or MySQL. The differences between them can be vast.
For example, with CouchDB you cannot execute ad-hoc queries (dynamic queries if you like). It is very good at online - offline scenarios, and is small enough to run on most devices. It has a RESTful interface, so no drivers, no ADO.NET libraries. To query it you use MapReduce (now this is very common across the NoSQL space, but not ubiquitous) to create views, and these are written in a number of languages, though most of the documentation is for Javascript. CouchDB is also designed to crash, which is to say if something goes wrong, it just restarts the process (the Erlang process, or group of linked processes that is, not the entire CouchDB instance typically).
MongoDB is designed to be highly performant, has drivers, and seems like less of a leap for a lot of people in the .NET world because of this. I believe though that in crash situations it is possible to lose data (it doesn't offer the same level of transactional guarantees around writes that CouchDB does).
Now both of these are document databases, and as such they share in common that their data is unstructured. There are no tables, no defined schema - they are schemaless. They are not like a key-value store though, as they do insist that the data you persist is intelligible to them. With CouchDB this means the use of JSON, and with MongoDB this means the use of BSON.
There are many other differences between MongoDB and CouchDB and these are considered in the NoSQL space to be very close in their design!
Other than document databases, their are network oriented solutions like Neo4J, columnar stores (column oriented rather than row oriented in how they persist data), and many others.
Something which is common across most NoSQL solutions, other than MapReduce, is that they are not relational databases, and that the majority do not make use of SQL style syntax. Typcially querying follows an imperative mode of programming rather than the declarative style of SQL.
Another typically common trait is that absolute consistency, as typically provided by relational databases, is traded for eventual models of consistency.
My advice to anyone looking to use a NoSQL solution would be to first really understand the requirements they have, understand the SLAs (what level of latency is required; how consistent must that latency remain as the solutions scales; what scale of load is anticipated; is the load consistent or will it spike; how consistent does a users view of the data need to be, should they always see their own writes when they query, should their writes be immediately visible to all other users; etc...). Understand that you can't have it all, read up on Brewers CAP theorum, which basically says you can't have absolute consistence, 100% availability, and be partition tolerant (cope when nodes can't communicate). Then look into the various NoSQL solutions and start to eliminate those which are not designed to meet your requirements, understand that the move from a relational database is not trivial and has a cost associated with it (I have found the cost of moving an organisation in that direction, in terms of meetings, discussions, etc... itself is very high, preventing focus on other areas of potential benefit). Most of the time you will not need an ORM (the R part of that equation just went missing), sometimes just binary serialisation may be ok (with something like DB4O for example, or a key-value store), things like the Newtonsoft JSON/BSON library may help out, as may automapper. I do find that working with C#3 theere is a definite cost compared to working with a dynamic language like, say Python. With C#4 this may improve a little with things like the ExpandoObject and Dynamic from the DLR.
To look at your 3 specific questions, with all it depends on the NoSQL solution you adopt, so no one answer is possible, however with that caveat, in very general terms:
If persisting the object (or aggregate more likely) as a whole, your joins will typically be in code, though you can do some of this through MapReduce.
Again, it depends, but with Couch you would execute a GET over HTTP against either a specific resource, or against a MapReduce view.
Most likely nothing. Just keep an eye-out for the serialisation, deserialisation scenarios. The difficulty I have found comes in how you manage versions of your code. If the property is purely for pushing to an interface (GUI, web service) then it tends to be less of an issue. If the property is a form of internal state which behaviour will rely on, then this can get more tricky.
Hope it helps, good luck!
Just stop thinking about the database.
Think about modeling your domain. Build your objects to solve the problem at hand following good patterns and practices and don't worry about persistence.

What is a better practice ? Working with Dataset or Database

I have been developing many application and have been into confusion about using dataset.
Till date i dont use dataset and works into my application directly from my database using queries and procedures that runs on Database Engine.
But I would like to know, what is the good practice
Using Dataset ?
or
Working direclty on Database.
Plz try to give me certain cases also when to use dataset along with operation (Insert/Update)
can we set read/write lock on dataset with respect to our database
You should either embrace stored procedures, or make your database dumb. That means that you have no logic whatsoever in your db, only CRUD operations. If you go with the dumb database model, Datasets are bad. You are better off working with real objects so you can add business logic to them. This approach is more complicated than just operating directly on your database with stored procs, but you can manage complexity better as your system grows. If you have large system with lots of little rules, stored procedures become very difficult to manage.
In ye olde times before MVC was a mere twinkle in Haack's eye, it was jolly handy to have DataSet handle sorting, multiple relations and caching and whatnot.
Us real developers didn't care about such trivia as locks on the database. No, we had conflict resolution strategies that generally just stamped all over the most recent edits. User friendliness? < Pshaw >.
But in these days of decent generic collections, a plethora of ORMs and an awareness of separation of concerns they really don't have much place any more. It would be fair to say that whenever I've seen a DataSet recently I've replaced it. And not missed it.
As a rule of thumb, I would put logic that refers to data consistency, integrity etc. as close to that data as possible - i.e. in the database. Also, if I am having to fetch my data in a way that is interdependent (i.e. fetch from tables A, B and C where the relationship between A, B and C's contribution is known at request time), then it makes sense to save on callout overhead and do it one go, via a database object such as a function, procedure (as already pointed out by OMGPonies). For logic that is a level or two removed, it makes sense to have it where dealing with it "procedurally" is a bit more intuitive, such as in a dataset. Having said all that, rules of thumb are sometimes what their acronym infers...ROT!
In past .Net projects I've often done data imports/transformations (e.g. for bank transaction data files) in the database (one callout, all logic is encapsulated in in procedure and is transaction protected), but have "parsed" items from that same data in a second stage, in my .net code using datatables and the like (although these days I would most likely skip the dataset stage and work on them from a higher lever of abstraction, using class objects).
I have seen datasets used in one application very well, but that is in 7 years development on quite a few different applications (at least double figures).
There are so many best practices around these days that point twords developing with Objects rather than datasets for enterprise development. Objects along with an ORM like NHibernate or Entity Framework can be very powerfull and take a lot of the grunt work out of creating CRUD stored procedures. This is the way I favour developing applications as I can seperate business logic nicely this way in a domain layer.
That is not to say that datasets don't have their place, I am sure in certain circumstances they may be a better fit than Objects but for me I would need to be very sure of this before going with them.
I have also been wondering this when I never needed DataSets in my source code for months.
Actually, if your objects are O/R-mapped, and use serialization and generics, you would never need DataSets.
But DataSet has a great use in generating reports.
This is because, reports have no specific structure that can be or should be O/R-mapped.
I only use DataSets in tandem with reporting tools.

Are there good reasons not to use an ORM? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
During my apprenticeship, I have used NHibernate for some smaller projects which I mostly coded and designed on my own. Now, before starting some bigger project, the discussion arose how to design data access and whether or not to use an ORM layer. As I am still in my apprenticeship and still consider myself a beginner in enterprise programming, I did not really try to push in my opinion, which is that using an object relational mapper to the database can ease development quite a lot. The other coders in the development team are much more experienced than me, so I think I will just do what they say. :-)
However, I do not completely understand two of the main reasons for not using NHibernate or a similar project:
One can just build one’s own data access objects with SQL queries and copy those queries out of Microsoft SQL Server Management Studio.
Debugging an ORM can be hard.
So, of course I could just build my data access layer with a lot of SELECTs etc, but here I miss the advantage of automatic joins, lazy-loading proxy classes and a lower maintenance effort if a table gets a new column or a column gets renamed. (Updating numerous SELECT, INSERT and UPDATE queries vs. updating the mapping config and possibly refactoring the business classes and DTOs.)
Also, using NHibernate you can run into unforeseen problems if you do not know the framework very well. That could be, for example, trusting the Table.hbm.xml where you set a string’s length to be automatically validated. However, I can also imagine similar bugs in a “simple” SqlConnection query based data access layer.
Finally, are those arguments mentioned above really a good reason not to utilise an ORM for a non-trivial database based enterprise application? Are there probably other arguments they/I might have missed?
(I should probably add that I think this is like the first “big” .NET/C# based application which will require teamwork. Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now.)
The short answer is yes, there are really good reasons. As a matter of fact there are cases where you just cannot use an ORM.
Case in point, I work for a large enterprise financial institution and we have to follow a lot of security guidelines. To meet the rules and regulations that are put upon us, the only way to pass audits is to keep data access within stored procedures. Now some may say that's just plain stupid, but honestly it isn't. Using an ORM tool means the tool/developer can insert, select, update or delete whatever he or she wants. Stored procedures provide a lot more security, especially in environments when dealing with client data. I think this is the biggest reason to consider. Security.
The sweet spot of ORMs
ORMs are useful for automating the 95%+ of queries where they are applicable. Their particular strength is where you have an application with a strong object model architecture and a database that plays nicely with that object model. If you're doing a new build and have strong modelling skills on your team then you will probably get good results with an ORM.
You may well have a handful of queries that are better done by hand. In this case, don't be afraid to write a few stored procedures to handle this. Even if you intend to port your app across multiple DBMS platforms the database dependent code will be in a minority. Bearing in mind that you will need to test your application on any platform on which you intend to support it, a little bit of extra porting effort for some stored procedures isn't going to make a lot of difference to your TCO. For a first approximation, 98% portable is just as good as 100% portable, and far better than convoluted or poorly performing solutions to work around the limits of an ORM.
I have seen the former approach work well on a very large (100's of staff-years) J2EE project.
Where an ORM may not be the best fit
In other cases there may be approaches that suit the application better than an ORM. Fowler's Patterns of Enterprise Application Architecture has a section on data access patterns that does a fairly good job of cataloguing various approaches to this. Some examples I've seen of situations where an ORM may not be applicable are:
On an application with a substantial legacy code base of stored procedures you may want to use a functionally oriented (not to be confused with functional languages) data access layer to wrap the incumbent sprocs. This re-uses the existing (and therefore tested and debugged) data access layer and database design, which often represents quite a substantial development and testing effort, and saves on having to migrate data to a new database model. It is often quite a good way wrapping Java layers around legacy PL/SQL code bases, or re-targeting rich client VB, Powerbuilder or Delphi apps with web interfaces.
A variation is where you inherit a data model that is not necessarily well suited to O-R mapping. If (for example) you are writing an interface that populates or extracts data from a foreign interface you may be better off working direclty with the database.
Financial applications or other types of systems where cross-system data integrity is important, particularly if you're using complex distributed transactions with two-phase commit. You may need to micromanage your transactions better than an ORM is capable of supporting.
High-performance applications where you want to really tune your database access. In this case, it may be preferable to work at a lower level.
Situations where you're using an incumbent data access mechanism like ADO.Net that's 'good enough' and playing nicely with the platform is of greater benefit than the ORM brings.
Sometimes data is just data - it may be the case (for example) that your application is working with 'transactions' rather than 'objects' and that this is a sensible view of the domain. An example of this might be a financials package where you've got transactions with configurable analysis fields. While the application itself may be built on an O-O platform, it is not tied to a single business domain model and may not be aware of much more than GL codes, accounts, document types and half a dozen analysis fields. In this case the application isn't aware of a business domain model as such and an object model (beyond the ledger structure itself) is not relevant to the application.
First off - using an ORM will not make your code any easier to test, nor will it necessarily provide any advantages in a Continuous Integration scenerio.
In my experience, whilst using an ORM can increase the speed of development, the biggest issues you need to address are:
Testing your code
Maintaining your code
The solutions to these are:
Make your code testable (using SOLID principles)
Write automated tests for as much of the code as possible
Run the automated tests as often as possible
Coming to your question, the two objections you list seem more like ignorance than anything else.
Not being able to write SELECT queries by hand (which, I presume, is why the copy-paste is needed) seems to indicate that there's a urgent need for some SQL training.
There are two reasons why I'd not use an ORM:
It is strictly forbidden by the company's policy (in which case I'd go work somewhere else)
The project is extremely data intensive and using vendor specific solutions (like BulkInsert) makes more sense.
The usual rebuffs about ORMs (NHibernate in particular) are:
Speed
There is no reason why using an ORM would be any slower than hand coded Data Access. In fact, because of the caching and optimisations built into it, it can be quicker.
A good ORM will produce a repeatable set of queries for which you can optimise your schema.
A good ORM will also allow efficient retrieval of associated data using various fetching strategies.
Complexity
With regards to complexity, using an ORM means less code, which generally means less complexity.
Many people using hand-written (or code generated) data access find themselves writing their own framework over "low-level" data access libraries (like writing helper methods for ADO.Net). These equate to more complexity, and, worse yet, they're rarely well documented, or well tested.
If you are looking specifically at NHibernate, then tools like Fluent NHibernate and Linq To NHibernate also soften the learning curve.
The thing that gets me about the whole ORM debate is that the same people who claim that using an ORM will be too hard/slow/whatever are the very same people who are more than happy using Linq To Sql or Typed Datasets. Whilst the Linq To Sql is a big step in the right direction, it's still light years behind where some of the open source ORMs are. However, the frameworks for both Typed Datasets and for Linq To Sql is still hugely complex, and using them to go too far of the (Table=Class) + (basic CRUD) is stupidly difficult.
My advice is that if, at the end of the day, you can't get an ORM, then make sure that your data access is separated from the rest of the code, and that you you follow the Gang Of Four's advice of coding to an interface. Also, get a Dependancy Injection framework to do the wiring up.
(How's that for a rant?)
There are a wide range of common problems for which ORM tools like Hibernate are a god-send, and a few where it is a hindrance. I don't know enough about your project to know which it is.
One of Hibernate's strong points is that you get to say things only 3 times: every property is mentioned in the class, the .hbm.xml file, and the database. With SQL queries, your properties are in the class, the database, the select statements, the insert statements, the update statements, the delete statements, and all the marshalling and unmarshalling code supporting your SQL queries! This can get messy fast. On the other hand, you know how it works. You can debug it. It's all right there in your own persistence layer, not buried in the bowels of a 3rd party tool.
Hibernate could be a poster-child for Spolsky's Law of Leaky Abstractions. Get a little bit off the beaten path, and you need to know deep internal workings of the tool. It can be very annoying when you know you could have fixed the SQL in minutes, but instead you are spending hours trying to cajole your dang tool into generating reasonable SQL. Debugging is sometimes a nightmare, but it's hard to convince people who have not been there.
EDIT: You might want to look into iBatis.NET if they are not going to be turned around about NHibernate and they want control over their SQL queries.
EDIT 2: Here's the big red flag, though: "Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now." So, these "experienced" developers, what are they experienced in developing? Their job security? It sounds like you might be among people who are not particularly interested in the field, so don't let them kill your interest. You need to be the balance. Put up a fight.
There's been an explosion of growth with ORMs in recent years and your more experienced coworkers may still be thinking in the "every database call should be through a stored procedure" mentality.
Why would an ORM make things harder to debug? You'll get the same result whether it comes from a stored proc or from the ORM.
I guess the only real detriment that I can think of with an ORM is that the security model is a little less flexible.
EDIT: I just re-read your question and it looks they are copy and pasting the queries into inline sql. This makes the security model the same as an ORM, so there would be absolutely no advantage over this approach over an ORM. If they are using unparametrized queries then it would actually be a security risk.
I worked on one project where not using an ORM was very successfully. It was a project that
Had to be horizontally scalealbe from the start
Had to be developed quickly
Had a relatively simple domain model
The time that it would have taken to get NHibernate to work in a horizontally partitioned structure would have been much longer than the time that it took to develop a super simple datamapper that was aware of our partitioning scheme...
So, in 90% of projects that I have worked on an ORM has been an invaluable help. But there are some very specific circumstances where I can see not using an ORM as being best.
Let me first say that ORMs can make your development life easier if integrated properly, but there are a handful of problems where the ORM can actually prevent you from achieving your stated requirements and goals.
I have found that when designing systems that have heavy performance requirements that I am often challenged to find ways to make the system more performant. Many times, I end up with a solution that has a heavy write performance profile (meaning we're writing data a lot more than we're reading data). In these cases, I want to take advantage of the facilities the database platform offers to me in order to reach our performance goals (it's OLTP, not OLAP). So if I'm using SQL Server and I know I have a lot of data to write, why wouldn't I use a bulk insert... well, as you may have already discovered, most ORMS (I don't know if even a single one does) do not have the ability to take advantage of platform specific advantages like bulk insert.
You should know that you can blend the ORM and non-ORM techniques. I've just found that there are a handful of edge cases where ORMs can not support your requirements and you have to work around them for those cases.
For a non-trivial database based enterprise application, there really is no justifying not using an ORM.
Features aside:
By not using an ORM, you are solving a problem that has already
solved repeatedly by large communities or companies with significant
resources.
By using an ORM, the core piece of your data access layer benefits
from the debugging efforts of that community or company.
To put some perspective in the argument, consider the advantages of using ADO.NET vs. writing the code to parse the tabular data stream oneself.
I have seen ignorance of how to use an ORM justify a developer's disdain for ORMs For example: eager loading (something I noticed you didn't mention). Imagine you want to retrieve a customer and all of their orders, and for those all of the order detail items. If you rely on lazy loading only, you will walk away from your ORM experience with the opinion: "ORMs are slow." If you learn how to use eager loading, you will do in 2 minutes with 5 lines of code, what your colleagues will take a half a day to implement: one query to the database and binding the results to a hierarchy of objects. Another example would be the pain of manually writing SQL queries to implement paging.
The possible exception to using an ORM would be if that application were an ORM framework designed to apply specialized business logic abstractions, and designed to be reused on multiple projects. Even if that were the case, however, you would get faster adoption by enhancing an existing ORM with those abstractions.
Do not let the experience of your senior team members drag you in the opposite direction of the evolution of computer science. I have been developing professionally for 23 years, and one of the constants is the disdain for the new by the old-school. ORMs are to SQL as the C language was to assembly, and you can bet that the equivalents to C++ and C# are on their way. One line of new-school code equals 20 lines of old-school.
When you need to update 50000000 records. Set a flag or whatever.
Try doing this using an ORM without calling a stored procedure or native SQL commands..
Update 1 : Try also retrieving one record with only a few of its fields. (When you have a very "wide" table). Or a scalar result. ORMs suck at this too.
UPDATE 2 : It seems that EF 5.0 beta promises batch updates but this is very hot news (2012, January)
I think there are many good reasons to not use an ORM. First and foremost, I'm a .NET developer and I like to stick within what the wonderful .NET framework has already provided to me. It does everything I possibly need it to. By doing this, you stay with a more standard approach, and thus there is a much better chance of any other developer working on the same project down the road being able to pick up what's there and run with it. The data access capabilities already provided by Microsoft are quite ample, there's no reason to discard them.
I've been a professional developer for 10 years, lead multiple very successful million+ dollar projects, and I have never once written an application that needed to be able to switch to any database. Why would you ever want a client to do this? Plan carefully, pick the right database for what you need, and stick with it. Personally SQL Server has been able to do anything I've ever needed to do. It's easy and it works great. There's even a free version that supports up to 10GB data. Oh, and it works awesome with .NET.
I have recently had to start working on several projects that use an ORM as the datalayer. I think it's bad, and something extra I had to learn how to use for no reason whatsoever. In the insanely rare circumstance the customer did need to change databases, I could have easily reworked the entire datalayer in less time than I've spent fooling with the ORM providers.
Honestly I think there is one real use for an ORM: If you're building an application like SAP that really does need the ability to run on multiple databases. Otherwise as a solution provider, I tell my clients this application is designed to run on this database and that is how it is. Once again, after 10 years and a countless number of applications, this has never been a problem.
Otherwise I think ORMs are for developers that don't understand less is more, and think the more cool 3rd party tools they use in their app, the better their app will be. I'll leave things like this to the die hard uber geeks while I crank out much more great software in the meantime that any developer can pick up and immediately be productive with.
I think that maybe when you work on bigger systems you can use a code generator tool like CodeSmith instead of a ORM... I recently found this: Cooperator Framework which generates SQL Server Stored Procedures and also generates your business entities, mappers, gateways, lazyload and all that stuff in C#...check it out...it was written by a team here in Argentina...
I think it's in the middle between coding the entire data access layer and use a ORM...
Personally, i have (until recently) opposed to use an ORM, and used to get by with writing a data access layer encapsulating all the SQL commands. The main objection to ORMs was that I didn't trust the ORM implementation to write exactly the right SQL. And, judging by the ORMs i used to see (mostly PHP libraries), i think i was totally right.
Now, most of my web development is using Django, and i found the included ORM really convenient, and since the data model is expressed first in their terms, and only later in SQL, it does work perfectly for my needs. I'm sure it wouldn't be too hard to outgrow it and need to supplement with hand-written SQL; but for CRUD access is more than enough.
I don't know about NHibernate; but i guess it's also "good enough" for most of what you need. But if other coders don't trust it; it will be a prime suspect on every data-related bug, making verification more tedious.
You could try to introduce it gradually in your workplace, focus first on small 'obvious' applications, like simple data access. After a while, it might be used on prototypes, and it might not be replaced...
If it is an OLAP database (e.g. static, read-only data used for reporting/analytics, etc.) then implementing an ORM framework is not appropriate. Instead, using the database's native data access functionality such as stored procedures would be preferable. ORMs are better suited for transactional (OLTP) systems.
Runtime performance is the only real downside I can think of but I think that's more than a fair trade-off for the time ORM saves you developing/testing/etc. And in most cases you should be able to locate data bottlenecks and alter your object structures to be more efficient.
I haven't used Hibernate before but one thing I have noticed with a few "off-the-shelf" ORM solutions is a lack of flexibility. I'm sure this depends on which you go with and what you need to do with it.
There are two aspects of ORMs that are worrisome. First, they are code written by someone else, sometimes closed source, sometimes open source but huge in scope. Second, they copy the data.
The first problem causes two issues. You are relying on outsiders code. We all do this, but the choice to do so should not be taken lightly. And what if it doesn't do what you need? When will you discover this? You live inside the box that your ORM draws for you.
The second problem is one of two phase commit. The relational database is being copied to a object model. You change the object model and it is supposed to update the database. This is a two phase commit and not the easiest thing to debug.
I suggest this reading for a list of the downsides of ORMs.
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
For my self, I've found ORMs very useful for most applications I've written!
/Asger
The experience I've had with Hibernate is that its semantics are subtle, and when there's problems, it's a bit hard to understand what's going wrong under the hood. I've heard from a friend that often one starts with Criteria, then needs a bit more flexibility and needs HQL, and later notices that after all, raw SQL is needed (for example, Hibernate doesn't have union AFAIK).
Also with ORM, people easily tend to overuse existing mappings/models, which leads to that there's an object with lots of attributes that aren't initiliazed. So after the query, inside transaction Hibernate makes additional data fetching, which leads to potential slow down. Also sadly, the hibernate model object is sometimes leaked into the view architecture layer, and then we see LazyInitializationExceptions.
To use ORM, one should really understand it. Unfortunately one gets easily impression that it's easy while it's not.
Not to be an answer per se, I want to rephrase a quote I've heard recently. "A good ORM is like a Yeti, everyone talks about one but no one sees it."
Whenever I put my hands on an ORM, I usually find myself struggling with the problems/limitations of that ORM. At the end, yes it does what I want and it was written somewhere in that lousy documentation but I find myself losing another hour I will never get. Anyone who used nhibernate, then fluent nhibernate on postgresql would understand what I've been thru. Constant feeling of "this code is not under my control" really sucks.
I don't point fingers or say they're bad, but I started thinking of what I'm giving away just to automate CRUD in a single expression. Nowadays I think I should use ORM's less, maybe create or find a solution that enables db operations at minimum. But it's just me. I believe some things are wrong in this ORM arena but I'm not skilled enough to express it what not.
I think that using an ORM is still a good idea. Especially considering the situation you give. It sounds by your post you are the more experienced when it comes to the db access strategies, and I would bring up using an ORM.
There is no argument for #1 as copying and pasting queries and hardcoding in text gives no flexibility, and for #2 most orm's will wrap the original exception, will allow tracing the queries generated, etc, so debugging isnt rocket science either.
As for validation, using an ORM will also usually allow much easier time developing validation strategies, on top of any built in validation.
Writing your own framework can be laborious, and often things get missed.
EDIT: I wanted to make one more point. If your company adopts an ORM strategy, that further enhances its value, as you will develop guidelines and practices for using and implementing and everyone will further enhance their knowledge of the framework chosen, mitigating one of the issues you brought up. Also, you will learn what works and what doesnt when situations arise, and in the end it will save lots of time and effort.
Every ORM, even a "good one", comes saddled with a certain number of assumptions that are related to the underlying mechanics that the software uses to pass data back and forth between your application layer and your data store.
I have found that for moderately sophisticated application, that working around those assumptions usually takes me more time than simply writing a more straightfoward solution such as: query the data, and manually instantiate new entities.
In particular, you are likely to run into hitches as soon as you employ multi-column keys or other moderately-complex relationships that fall just outside the scope of the handy examples that your ORM provided you when you downloaded the code.
I concede that for certain types of applications, particularly those that have a very large number of database tables, or dynamically-generated database tables, that the auto-magic process of an ORM can be useful.
Otherwise, to hell with ORMs. I now consider them to basically be a fad.

Categories

Resources