Ways and techniques to defend against SQL injection - C#

I have a WinForms framework, written in C#, that will later be used to develop simple WinForms applications. The developers who will use it are often beginners and sometimes do not use parameters; they write SQL directly in code. So first I need some protection built into my framework's base classes in C#.
To solve this, one developer suggested using an ORM such as NHibernate, which takes care of this issue for you (and you don't have to write SQL statements yourself most of the time).
So I want to ask: are there general alternatives (other ways and techniques) for defending against SQL injection? Some links or examples would be very nice.

I don't see how there is any means to protect any SQL-based library from developer misuse without crippling its functionality (i.e. never giving direct access to the database).
Even with NHibernate or Linq to SQL it's possible to bypass the mapping layers and directly write a SQL statement.
Personally I think your best option would be to write in BIG BOLD TEXT that people who use your library need to PARAMETERIZE THEIR QUERIES. Failing that, you could try to do some kind of clumsy input sanitization, but that is honestly a flimsy second-rate hack.
Parameterized queries have been around for so long now that there's no excuse for anyone writing code that touches any database not to be aware of them or understand how to use them. The only cure for ignorance is education.
Maybe if we knew more about what this library is supposed to do with respect to data access, we could offer more targeted suggestions...
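For reference, here is a minimal sketch of the difference between concatenated SQL and a parameterized query. It assumes SQL Server and the System.Data.SqlClient provider; the Users table, its columns and the class name are hypothetical, and the commands are only built here, not executed.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class UserQueries
{
    // Unsafe: the user input becomes part of the SQL text itself.
    public static string BuildUnsafeSql(string userName)
    {
        return "SELECT Id, Name FROM Users WHERE Name = '" + userName + "'";
    }

    // Safer: the SQL text is constant and the input travels separately
    // as a typed parameter value, so it is never parsed as SQL.
    public static SqlCommand BuildSafeCommand(string userName)
    {
        var cmd = new SqlCommand("SELECT Id, Name FROM Users WHERE Name = @name");
        cmd.Parameters.Add("@name", SqlDbType.NVarChar, 50).Value = userName;
        return cmd;
    }
}
```

With the input x' OR '1'='1, BuildUnsafeSql produces a statement ending in WHERE Name = 'x' OR '1'='1', which matches every row. BuildSafeCommand leaves the command text untouched and carries the same string only as the value of @name. To run the command you would still assign an open SqlConnection to cmd.Connection.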

Agree with Aaronaught: a framework will not completely prevent the possibility. A framework is never a substitute for stringent validation in the data layer. Also, provide an abstraction layer around your data access and open that up as the API, rather than allowing developers to connect directly to the database.

It sounds like you need to train your developers to use parameter binding instead of looking for a technical solution.
One other alternative would be to keep the database layer in a different project and only allow your SQL-savvy developers to code in it. The GUI can be in a different project. That way the GUI programmers won't mess up your DB.

Security is usually a process, not a product or an API.
It is also an evolving process: we have to adapt or get hacked.
A heavy-handed approach:
You can force everyone to write stored procedures and not allow direct table access from the accounts that are allowed to talk to the database (GRANT EXECUTE ON, etc.).
Then you would need to ensure that nobody writes any fancy stored procedures that take a SQL query as a parameter and evaluate it dynamically.
This tends to slow down development, and I personally would not use it, but I have consulted at several shops that did.
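On the C# side, the stored-procedure-only approach could look roughly like the sketch below. It assumes SQL Server and System.Data.SqlClient; the procedure name dbo.GetOrdersForCustomer and its @CustomerId parameter are hypothetical, and the command is built but not executed.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class OrderCommands
{
    // The application only names a procedure and passes typed parameters;
    // it never sends table-level SQL of its own.
    public static SqlCommand BuildGetOrdersCommand(int customerId)
    {
        var cmd = new SqlCommand("dbo.GetOrdersForCustomer");
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;
        return cmd;
    }
}
```

This only helps if the database account really has no direct table permissions and the procedure itself does not build SQL from its arguments.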

Related

How to create an application supporting multiple databases

I have a situation where I need to create an application which supports multiple databases. That is, the client can use any database, such as Oracle, SQL Server, MySQL or PostgreSQL, to begin with.
I tried using an ORM like NHibernate or MyBatis, but they have their limitations and need expertise to use.
So I decided to use the data providers provided by Microsoft, like ADO.NET, OLEDB, ODP.NET, etc.
Is there any way to keep my database logic the same for all the databases? I have tried IDbConnection, IDbCommand, etc., but they have a problem in the case of Oracle (Ref Cursor).
Is there any way to achieve this? Some link or guide would be appreciated.
Edit:
There is also a problem with the DbType values, because the enums are defined differently by the different data providers.
Well, real-life applications are complicated like that. Before you know it, you want to replace the UI with an App, expose your logic as a WCF service, change the e-mail service with another service provider, test pieces of your code while mocking the DAL and change the database with another one.
The usual way to deal with this is to pass all calls through an interface that separates the implementation from the caller. After that, you can implement the different DALs.
Personally I usually go with this approach:
First create a single DLL that contains all interfaces. Basically the idea is to expose all calls that your UI, App or whatever needs through the interface. From now on, your UI doesn't talk to databases or e-mail providers anymore.
If you need to get access to the interface, you use a factory pattern. Never use 'new'; that will get you in trouble in the long run.
It's not trivial to create this, and needs proper crafting. Usually I begin with a bare minimum version, hack everything else in the UI as a first version, then move everything that touches a DB or a service into the right project while creating interfaces and finally re-engineer everything until I'm 100% satisfied.
Interfaces should be built to last. Sure, changes will happen over time, but you really want to minimize these. Think about what the future will hold, read up on what other people came up with and ensure your interfaces reflect that.
Basically you now have a working piece of software that works with a single database, mail provider, etc. So far so good.
Next, re-engineer the factory. Basically you want to use the configuration settings to pick the right provider (the right DLL that implements your interface) for your data. A simple switch can suffice in most cases.
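A minimal sketch of the interface-plus-factory idea, under stated assumptions: the names ICustomerStore, SqlServerCustomerStore, OracleCustomerStore and CustomerStoreFactory are made up for illustration, the method bodies are placeholders for real queries, and the provider string would normally come from a configuration setting.

```csharp
using System;

// Lives in the shared "interfaces" DLL; the UI only ever sees this type.
public interface ICustomerStore
{
    string ProviderName { get; }
    string GetCustomerName(int id);
}

// Each implementation would normally live in its own provider DLL.
public sealed class SqlServerCustomerStore : ICustomerStore
{
    public string ProviderName { get { return "SqlServer"; } }

    // Placeholder for a real SQL Server query.
    public string GetCustomerName(int id) { return "customer " + id; }
}

public sealed class OracleCustomerStore : ICustomerStore
{
    public string ProviderName { get { return "Oracle"; } }

    // Placeholder for a real Oracle query (ref cursors and all).
    public string GetCustomerName(int id) { return "customer " + id; }
}

public static class CustomerStoreFactory
{
    // A simple switch on a configured provider name.
    public static ICustomerStore Create(string provider)
    {
        switch (provider)
        {
            case "SqlServer": return new SqlServerCustomerStore();
            case "Oracle": return new OracleCustomerStore();
            default:
                throw new ArgumentException("Unknown provider: " + provider, nameof(provider));
        }
    }
}
```

Callers write CustomerStoreFactory.Create(configuredName) and never use new on a concrete store, so adding a database means adding one class and one case.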
At this point I usually make it a habit to make a ton of unit tests for the interfaces.
The last step is to create DLLs for the different database providers. One of these will be loaded at run-time in your application.
I prefer simple Linq to SQL (I also use the library from LinqConnect) because it's pretty fast. I simply start by copy-pasting the other database provider, and then re-engineer it until it works. Personally I don't believe in a magic 'support all SQL databases' solution anymore: in my experience, some databases will handle certain queries much, much faster than other databases, which means that you will probably end up with some custom code for each database anyway.
This is also the point where your unit tests are really going to pay off. Basically, you can just start with copy-paste and give it a test. If you're lucky, everything will run right away with decent performance... if not, you know where to start.
Build to last
Build things to last. Things will change:
Think about updates and test them. Prefer automatic tests.
You don't want to tinker with your Factory every day. Use Reflection, Expressions, Code generation or whatever your poison is to save yourself the trouble of changing code.
Spend time writing tests. Make sure you cover the bulk. I cannot stress this enough; under pressure people usually 'save' time by not writing tests. You'll notice that this time that you 'save' will double back on you as support when you've gone live. Every month.
What about Entity Framework?
I've seen a lot of my customers get into trouble with performance because of this. In the many times that I've tested it, I had the same experience. I noticed customers hacking around EF for a lot of queries to get a bit of decent performance.
To be fair, I gave up a few years ago, and I know they have made considerable performance improvements. Still, I would test it (especially with complex queries) before considering it.
If I were to use EF, I'd implement all the EF stuff in a 'database common DLL', and then derive classes from that. As I said, not all databases are the same with queries, and you might want to implement some hacks that are necessary to get decent performance. Your tests will tell.
Bonuses
Programming through interfaces also has a lot of advantages in combination with proxies. To name a few, you can easily create log sinks, caching, statistics, WCF, etc. by simply implementing the same interface. And if you end up hating your current OR mapper some day, you can just throw it away without touching a single line of your app.
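As an illustration of the proxy idea, here is a small caching wrapper that implements the same interface as the store it wraps. All type names here (ICustomerStore, CountingCustomerStore, CachingCustomerStore) are hypothetical, and the inner store is a stand-in for a real DAL.

```csharp
using System;
using System.Collections.Generic;

public interface ICustomerStore
{
    string GetCustomerName(int id);
}

// Stand-in for a real database-backed store; counts how often it is hit.
public sealed class CountingCustomerStore : ICustomerStore
{
    public int Calls { get; private set; }

    public string GetCustomerName(int id)
    {
        Calls++;
        return "customer " + id;
    }
}

// Proxy: same interface, adds caching, forwards cache misses to the inner store.
public sealed class CachingCustomerStore : ICustomerStore
{
    private readonly ICustomerStore _inner;
    private readonly Dictionary<int, string> _cache = new Dictionary<int, string>();

    public CachingCustomerStore(ICustomerStore inner)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        _inner = inner;
    }

    public string GetCustomerName(int id)
    {
        string name;
        if (!_cache.TryGetValue(id, out name))
        {
            name = _inner.GetCustomerName(id);
            _cache[id] = name;
        }
        return name;
    }
}
```

Because callers only know ICustomerStore, the factory can hand out the wrapped store without any change to the UI code. A logging or statistics proxy follows the same shape.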
I believe Microsoft's Data Access Components would be suitable to you.
https://en.wikipedia.org/wiki/Microsoft_Data_Access_Components
How about writing microservices and connecting them through a REST API?
You (and maybe your team) could provide a core application which handles the logic and the UI. This is still based on your current technology. But instead of adding a database connection directly, you could provide multiple microservices (based on ASP.NET or ASP.NET Core), each exposing the same REST API. You get your data from each database through such a microservice. So you would develop one microservice for, e.g., MySQL and another one for MS SQL, and when a new customer comes along with Oracle you write a new small microservice which implements your expected API.
More info (based on .net core) is here: https://docs.asp.net/en/latest/tutorials/first-web-api.html
I think which technology to use is a decision for the whole team. But today I would recommend writing a microservice. It also makes attaching a new app, e.g. for a mobile device, much easier :)
Yes, it's possible.
Right now I am working on the same scenario, where all my logic-related data (you could call it metadata) resides in one DB and the data resides in another DB.
Here is what you need to do: keep the connection-related parameters in two different files (you can call them prop files). Then write a concrete connection class that takes its parameters from these prop files. Wherever you need a connection, just supply the prop file and it will create the DB connection as desired.

Should I manually code ADO.Net database access?

I'm really late to the .Net game and struggling to learn ADO.Net. I prefer to learn how to do data access the "right way". Somewhere I've picked up on the idea that it's considered superior to manually code your own Connections, Data Adapters, DataSets, DataTables, and even command statements for updating, adding, and deleting rather than using Visual Studio's data wizard. I understand from my reading that there are some things you can only do by writing your own command statements, but it isn't completely clear to me what those might be.
Should I always code my own connections, data adapters, datasets and datatables? What about my update, insert, and delete command statements? How do I know when I should code those manually?
There is no right or wrong way. However I would suggest you first do things the "hard way" in that you write your own code for each of the data access routines you need. Of course that would mean you'll also need to know and understand SQL. Eventually you could use/build tools that generate all of your code just the way you need it.
Preferably you'll use stored procedures instead of SQL statements in code, because stored procedures provide an additional level of abstraction, abstracting your database schema from even your data layer and of course your business layer.
I'd use core ADO.NET (that is, writing your own code for data access and such). I'd use DataSets/DataTables (if you have to) purely as in-memory data structures, without using them to do automatic updates/deletes and the like. Stick to DataReaders to the extent possible, converting them over to DTOs (for data retrieval methods). For data modification methods, your data layer should get DTOs as parameters (or simple data types as parameters if there are just one or two).
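A minimal sketch of the DataReader-to-DTO conversion, written against the provider-neutral IDataReader interface. CustomerDto, CustomerMapper and the Id/Name columns are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Data;

public sealed class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerMapper
{
    // Reads every remaining row from the reader into a list of DTOs.
    public static List<CustomerDto> ReadAll(IDataReader reader)
    {
        var result = new List<CustomerDto>();

        // Look the column positions up once instead of once per row.
        int idCol = reader.GetOrdinal("Id");
        int nameCol = reader.GetOrdinal("Name");

        while (reader.Read())
        {
            result.Add(new CustomerDto
            {
                Id = reader.GetInt32(idCol),
                // Map database NULL to a C# null instead of throwing.
                Name = reader.IsDBNull(nameCol) ? null : reader.GetString(nameCol)
            });
        }
        return result;
    }
}
```

In real code the reader would come from ExecuteReader on a command. To try the mapper without a database, DataTable.CreateDataReader() gives an IDataReader over in-memory rows.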
Personally I use tools to generate data access layer code that uses core ADO.NET (and not EF, LINQ to SQL and such). That is my personal preference and, depending on the size of your application, it goes a very long way towards performance. It also means you need in-depth knowledge of only two things: your database and SQL, and C# code, without also having to learn the nuances of abstraction layers and specialized languages (in some cases).
In large projects (and teams) leaving the database schema and stored procedures to people specialized in that area becomes a necessity and requirement and in those cases using ADO.NET core also becomes a requirement.
On my blog I have posted an article in which I introduce a tool that generates all of the code. The tool and source code are available for download. The tool also generates code for strongly typed DataReaders. That is, under the covers you're using a DataReader, while in code it looks and feels like a DTO in terms of strongly typed properties.
Data Access Layer CodeGen
DataReader Wrappers - TypeSafe
In my own experience, it is preferable to always hand-code rather than use the smart control wizards.
I think you should learn how it's done under the covers first and then pick your own abstraction layer, of which there are many.
LINQ to SQL does a great job of automating common DB tasks. All your basic CRUD (Create, Read, Update, Delete) operations will be much easier to code by using a DataContext dbml file. The code is much easier to write, does not rely on strings, is compatible with other ADO.NET commands (you can execute a direct DbCommand against your DataContext), and is more highly optimized than anything most people will write (especially a beginner!). You will save yourself a whole lot of time by using something like LINQ to SQL or another ORM. Unless your objective is pure learning, you would be best off creating a working DataContext and analyzing the source to see how it works, instead of teaching yourself ADO.NET. The fact that you are at a point where you need to ask this question probably indicates that you will not add value to your application by writing your own boilerplate DB access code.
It looks like a lot of people are recommending that you hard-code your DAL first, before you use an ORM like LINQ to SQL. I would just like to point out that the logic involved in this line of thinking would necessitate that we also learn to code in IL before writing C# code, build a computer before we use one, and sail across the ocean before we take an international airplane.
There's not really going to be a black-and-white answer for this, but in my experience, I've always been better off coding my own stuff. This has largely been because I'm just an anal-retentive obsessive-compulsive control freak, and I just don't trust wizards to write code the way I want it written. I'm sure that many people agree with me, just as I'm sure that many people disagree with me.
The fact that OR/Ms exist is plenty of proof that you don't always need to roll your own code. The fact that they're not mandatory is also proof that you aren't compelled to use them.
Do whatever feels right and meets the needs of your solution and its time and budgetary constraints.

Database Design In SQL Server or C#?

Should a database be designed on SQL Server or C#?
I always thought it was more appropriate to design it on SQL Server, but recently I started reading a book (Pro ASP.NET MVC Framework) which, to my understanding, basically says that it's probably a better idea to write it in C# since you will be accessing the model through C#, which does make sense.
I was wondering what everyone else's opinion on this matter was...
I mean, for example, do you consider it "correct" to have a table that specifies constants, like an AccessLevel table that is always supposed to contain:
1 Everyone
2 Developers
3 Administrators
4 Supervisors
5 Restricted
Wouldn't it be more robust and streamlined to just have an enum for that same purpose?
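For comparison, the enum version of that lookup could look like the sketch below. The numeric values mirror the table rows listed above; the AccessLevels helper and its FromDatabaseValue method are hypothetical names, added only to show how a raw value read from a column might be checked.

```csharp
using System;

public enum AccessLevel
{
    Everyone = 1,
    Developers = 2,
    Administrators = 3,
    Supervisors = 4,
    Restricted = 5
}

public static class AccessLevels
{
    // Converts a raw integer (e.g. read from a column) into the enum,
    // rejecting values the enum does not define.
    public static AccessLevel FromDatabaseValue(int value)
    {
        if (!Enum.IsDefined(typeof(AccessLevel), value))
        {
            throw new ArgumentOutOfRangeException(nameof(value), value, "Unknown access level id.");
        }
        return (AccessLevel)value;
    }
}
```

The enum gives compile-time names in C#, but on its own it does nothing for anything that reads the database directly, such as reports or foreign keys. That is one reason the two are often kept side by side.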
A database schema should be designed on paper or with an ERD tool.
It should be implemented in the database.
Are you thinking about ORMs like Entity Framework that let you use code to generate the database?
Personally, I would rather think through my design on paper before committing it to a DB myself. I would be happy to use an ORM or class generator from this DB later on.
Before VS.NET 2010 I was using SQL Server Management Studio to design my databases. Now I am using the EF 4.0 designer; for me it's the best way to go.
If your problem domain is complex, or its complexity grows as the system evolves, you'll soon discover you need some metadata to make life easier. C# can be a good choice as a host language for such stuff, as you can use its type system to enforce some invariants (like char column lengths, null/not null restrictions or check constraints; you can declare them as consts, enums, etc.). Unfortunately I don't know of utilities that can do it out of the box (sqlmetal.exe can export some metadata, but only as XML), although some CASE tools can probably be customized. I'd go for a custom-made generator to produce the DB schema from C# (just a few hours' work compared to learning, for example, the customization options offered by Sybase PowerDesigner).
ORMs have their place; that place is NOT database design. There are many considerations in designing a database that need to be thought through, not automatically generated, no matter how appealing the idea of not thinking about design might be. There are often many things that need to be considered that have nothing to do with the application: things like data integrity, reporting, audit tables and data imports. Using an ORM to create a database that looks like an object model may not be the best design for performance and may not give you the things you really need in terms of data integrity. Remember, even if you think nothing except the application will ever touch the database, this is not true. At some point someone will need to do a major data revision (to fix a problem) directly on the database, not through the application. At some point you are going to need to import a million records from some other company you just bought, and you are going to need an ETL process outside the application. Putting all your hopes and dreams for the database (as well as your data integrity rules) into the application is short-sighted.

Is everyone here jumping on the ORM band wagon?

Microsoft Linq to SQL, Entity Framework (EF), NHibernate, etc. are all proposing ORMs as the next generation of data mapping technologies, and claim to be lightweight, fast and easy. See, for example, this article that just got published in VS magazine:
http://visualstudiomagazine.com/features/article.aspx?editorialsid=2583
Who is excited about implementing these technologies in their projects? Where is the innovation in these technologies that makes them so much better than their predecessors?
I have written data access layers, persistence components, and even my own ORMs in hundreds of applications over the years (one of my "hobbies"); I have even implemented my own business transaction manager (discussed elsewhere on SO).
ORM tools have been around for a long time on other platforms, such as Java, Python, etc. It appears that there is a new fad now that Microsoft-centric teams have discovered them. Overall, I think that is a good thing--a necessary step in the journey to explore and comprehend the concepts of architecture and design that seem to have been introduced along with the arrival of .NET.
Bottom line: I would always prefer to do my own data access rather than fight some tool that is trying to "help" me. It is never acceptable to give up my control over my destiny, and data access is a critical part of my application's destiny. Some simple principles make data access very manageable.
Use the basic concepts of modularity, abstraction, and encapsulation--so wrap your platform's basic data access API (e.g., ADO.NET) with your own layer that raises the abstraction level closer to your problem space. DO NOT code all your data access DIRECTLY against that API (also discussed elsewhere on SO).
Severely apply the DRY (Don't Repeat Yourself) principle = refactor the daylights out of your data access code. Use code generation when appropriate as a means of refactoring, but seek to eliminate the need for code generation whenever you can. Generally, code generation reveals that something is missing from your environment--language deficiency, designed-in tool limitation, etc.
Meanwhile, learn to use the available API well, particularly regarding performance and robustness, then incorporate those lessons into your own abstracted data access layer. For example, learn to make proper use of parameters in your SQL rather than embedding literal values into SQL strings.
Finally, keep in mind that any application/system that becomes successful will grow to encounter performance problems. Fixing performance problems relies more on designing them out rather than just "tweaking" something in the implementation. That design work will affect the database and the application, which must change in sync. Therefore, seek to be able to make such changes easily (agile) rather than attempt to avoid ever changing the application itself. In part, that eventually means being able to deploy changes without downtime. It is not hard to do, if you don't "design" away from it.
I'm a huge ORM advocate. Code generation with ORM saves my shop about 20-30% on most of our projects.
And we do contract development, so this is a big win.
Chris Lively made an interesting point about having to do a redeploy any time a query gets touched. This may be a problem for some people, but it does not touch us at all. We have rules against making production database changes directly.
Also, you can still rely on traditional sprocs and even views when appropriate... We are not dogmatically 100% ORM, that's for sure.
I have been on the ORM train for the longest time, since the free version of LLBLGen to the latest and greatest commercial product LLBLGen Pro. I think ORMs fit in very well for a lot of the common data input output systems.
That isn't to say they solve all problems, however. An ORM is a tool which can be used where it makes sense to use it. If your database schema is relatively close to how your business objects need to be, ORMs are the best.
It's not a bandwagon to jump on; it's a reaction to a real problem! Object Relational Mapping (ORM) has been around for a long time and it solves a real problem.
Original Object Oriented(OO) languages were all about simulating real world problems using a computer language. It could be argued that if you are really using an OO language to build systems you will be simulating the real world problem domain using a Domain Driven Design (DDD). This logically takes you to a separation of concerns model in order to keep your DDD clean and clear from all the clutter of data persistence and application controls.
When you build systems following a DDD pattern and use a Relational database for persistence then you really need a good ORM or you will be spending too much time building and maintaining database crud (pun intended).
ORM is an old problem and was solved years ago by products like Object Lens and Top Link. Object Lens was a Smalltalk ORM built by ParkPlace in the 90's. Top Link was built by Object People for Smalltalk, then converted for Java, and is currently used by Oracle. Top Link has also been around since the 90's. DDD advocates are now beginning to clearly articulate the case for DDD and gaining mind share. Therefore ORM, by necessity, is becoming mainstream and Microsoft is just reacting as usual.
No. Not everyone is.
Here's the number one big ass elephant in the room with most of the ORM tools (especially LINQ to SQL):
You are guaranteed that ANY data related change will require a full redeployment of your application.
For example, my day job can currently fix most query problems by modifying an existing stored procedure and redeploying just that one piece. With Linq, et al, your data queries are moved into your actual code. Guess what that means?
ORM is a good match for people who get along OK with software that writes software for them; but if you are obsessive about controlling what's happening and why, ORM can be suboptimal, particularly with database optimization. Any abstraction layer has costs and benefits; ORM has both, but the balance isn't right yet IMHO. And ORM, in its current form, ironically adds an abstraction layer that still puts classes and unabstracted database schemas too intimately together.
My experience is that it can help you get a proof-of-concept version together quickly, but can introduce refactoring requirements you may not be familiar with (at least yet.)
Add to that the fact that the tool is evolving, and that best practices and patterns are not well established, nor is there a consensus of the kind that lets other programmers (or future programmers) feel comfortable with your code. So I expect to see higher-than-usual refactoring requirements for a while.
I'll reserve judgment (optimistically) about where it will settle in terms of being mainstream. I wouldn't bet a current project on it at this point. My patterns for dealing with the impedance mismatch are satisfactory for my purposes.
You have to fight with the ORM system once you want to do anything beyond the simplest select, update or delete. And your performance goes into the toilet once you begin doing real stuff.
So no.
I look forward to the day my team starts looking into ORM solutions. Until that day, we are a pure DataSet/Stored Procedure shop and let me tell you that it isn't all biscuits and gravy being "pure".
Firstly, the new Linq to SQL is performing close to that of stored procs and data readers. We use datasets everywhere, so performance would improve. So far so good for ORM.
Secondly, stored procs have the added benefit of being released separate of code, but they also have the detriment of being released separate of code. Pretend for a second that you have a database with more than 5 systems connecting to it and more than 10 people working on it. Now think about managing all those stored procedure changes and dependencies, especially when there is a common code base. It is a nightmare...
Third, it is difficult to debug stored procs. They often result in erroneous behavior for any number of reasons. That is not to say the same could not result from the dynamic SQL being generated by the ORM, but one less problem is one less problem. No permissions issues (though mostly resolved in SQL 2005), and no more multi-step development: writing the query, testing the query, deploying the query, tying it into the code, testing the code, deploying the code. The query is part of the code and I see this as a good thing.
Fourth, you can still use stored procedures. Running some reports that take a long time? Stored procs are a better solution. Why? Query execution plans can be optimized by the database engine. I won't pretend to understand all the workings of the database, but I do know there are currently some limitations to optimizing dynamic SQL, and that is a trade-off we make when going with an ORM. However, stored procs are not ruled out when using an ORM solution.
Really, the biggest reason I see people avoiding an ORM is that they simply don't have experience with one. There will be an obvious learning curve and ignorance stage. However, if it is going to improve development performance and hardly hinder (or in my case improve) performance, it is a trade-off worth making.
I'm a big fan as well, using EF and Linq-to-SQL. My reasons are:
Since LINQ is compiled and type safe, you don't get the problems of typos in "string-based" SQL. I don't know how many hours I've spent of my life tracking down an error in an SP or other SQL where a "tick" or some other keyword was in the wrong place.
The above and other factors make development faster.
Though there certainly is overhead compared to "closer to the metal" methods of database querying, none of us would be using .NET at all, or even C++ if performance was our #1 concern. For most apps, I've been able to get excellent performance from Linq-To-SQL, even without using the stored proc approach (the client-based compiled queries is my usual approach).
Certainly for some applications you still want to do things the old fashioned way though.
I guess what I meant was, what is the innovation that ORMs provide over building your DAL using traditional ADO.NET, SQL and mapping them to objects in code?
Here are the three major pieces of my DAL, which I am comparing with ORMs to see the benefits:
You still have to have a query in an ORM = SQL (SQL is more powerful by far)
Mapping code moves to configuration but is still not eliminated; it just shifts from one paradigm to another
Objects have to be defined and managed in a way that is tightly related to your data schema, unlike in the traditional approach, where I can keep them decoupled.
Am I missing something?
I have been following Fluent-NHibernate very closely as it has some of the most potential I've ever seen in a project.
I am a big ORM guy, get the logic out of the database, use the database only for speed.
I love the speed you can develop an application. The biggest advantage, depending on the ORM, is you can change the back end without having to translate your business logic.
I switched to LINQ and have never looked back. DBLinq is awesome for working with databases other than MSSQL. I have used it with MySQL and it is GREAT.
Not yet; still skeptical. As with most Microsoft products, I wait for SP2 or a year and a half before trusting them in a production environment.
And note that pretty much every new thing introduced by anyone, not just Microsoft, is hailed as "lightweight, fast and easy". Take it with a block of salt. They do not advertise the problems/issues quite as loudly as the benefits/features! That's why we wait for early adopters to discover them.
This is not to disparage ORM or LINQ or anything like that; I'm reserving judgement until
I have time to evaluate them,
some need arises that only they can satisfy,
the technology appears stable and well-supported enough to risk in one of my clients' production environments, and/or
a client requests it
Please note: I've done ORM before, manually, and it worked just fine, so I have high hopes for the newer ORM systems.
If CodeSmith generates code based on your tables, aren't you still tightly coupled to your data schema? I would prefer a solution that decouples my objects from my database schema for more flexibility in the architecture
That's from one of your comments - It's true, CodeSmith tightly couples you to your tables.
NHibernate, on the other hand, has a lot of features that can help with that: you have Components, so that in code you can have a Person with a property Address, where Address is a separate class.
You also have inheritance mapping. So it does a fair job of decoupling your schema from your domain.
We still use a hand rolled, repetitive cut'n'paste DAL where I work. It's extremely painful, complex, and error prone, but it's something all developers can understand. Although it is working for us at the moment, I don't recommend it as it begins to break down quickly on large projects. If someone doesn't want to go to full blown ORM, I'd at least advocate some sort of DAL generation.
I'm actually working on writing an ORM tool in .NET as a side project to keep me entertained.
SQL is dead to me. I hate writing it, especially having it in my code anywhere. Manually writing select/insert/update/delete queries for each object is a waste of time IMO. And don't even get me started on handling NULLs ("where col_1 = ?" vs "where col_1 is null") when dynamically generating queries. The ORM tools can handle that for me.
Also, having one place where SQL may be dynamically generated would hopefully go a long way toward eliminating SQL injection attacks.
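The NULL case mentioned above is the kind of thing a single generation point can handle once. A rough sketch, with hypothetical names (PredicateBuilder, EqualsOrIsNull) and the assumption that column names come from trusted mapping metadata, never from user input:

```csharp
using System;
using System.Collections.Generic;

public static class PredicateBuilder
{
    // Returns "column IS NULL" for a null value; otherwise returns
    // "column = @pN" and records the value under that parameter name.
    public static string EqualsOrIsNull(
        string column,
        object value,
        IList<KeyValuePair<string, object>> parameters)
    {
        if (value == null || value is DBNull)
        {
            return column + " IS NULL";
        }

        // Name the parameter after its position in the list: @p0, @p1, ...
        string name = "@p" + parameters.Count;
        parameters.Add(new KeyValuePair<string, object>(name, value));
        return column + " = " + name;
    }
}
```

Values only ever enter the statement as parameters, so the generated text stays free of user-supplied literals. The collected name/value pairs would then be copied onto the command's parameter collection.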
On the other hand, I've used Hibernate, and I absolutely hate it. In a real-world project, we ran into limitations, unimplemented bits, and bugs every few weeks.
Keeping the query logic DB side (usually as a view or stored procedure) also has the benefit of being available for DBAs to tune. That was a constant battle at my last job, using Hibernate. DBAs: "give us all the possible queries so we can tune" Devs: "uh, I don't know because Hibernate will generate them on the fly. I can give you some HQL and an XML mapping though!" DBAs: "Don't make me punch you in the face!"
I dislike the code generation used in most ORMs. In fact, code generation in general I find to be a weak tool that is usually indicative of using the wrong language in the first place.
In particular with .Net reflection, I don't see any need for code gen for ORM purposes.
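As a rough illustration of mapping by reflection instead of generated code, here is a sketch in Python rather than .NET, using `dataclasses.fields` as a stand-in for .NET reflection. The `Person` entity and both helper functions are hypothetical, and the table name is simply assumed to be the lowercased class name.

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

@dataclass
class Person:  # hypothetical entity; table name assumed to be "person"
    id: int
    name: str

def insert_sql(entity_type):
    """Build an INSERT statement by inspecting the entity's fields at runtime."""
    cols = [f.name for f in fields(entity_type)]
    placeholders = ", ".join("?" for _ in cols)
    table = entity_type.__name__.lower()
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"

def load_all(conn, entity_type):
    """Select every mapped column and instantiate one entity per row."""
    cols = ", ".join(f.name for f in fields(entity_type))
    table = entity_type.__name__.lower()
    cursor = conn.execute(f"SELECT {cols} FROM {table}")
    return [entity_type(*row) for row in cursor]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(insert_sql(Person), astuple(Person(1, "Ann")))
people = load_all(conn, Person)
```

Adding a field to `Person` changes the generated SQL automatically, with no regenerated source files to check in.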
Here's one strong opinion.
No, I dumped ORMs and switched to Smalltalk and an OODB: Gemstone. Better code, less code, faster development.

Are there good reasons not to use an ORM? [closed]

Closed 10 years ago.
During my apprenticeship, I have used NHibernate for some smaller projects which I mostly coded and designed on my own. Now, before starting some bigger project, the discussion arose how to design data access and whether or not to use an ORM layer. As I am still in my apprenticeship and still consider myself a beginner in enterprise programming, I did not really try to push in my opinion, which is that using an object relational mapper to the database can ease development quite a lot. The other coders in the development team are much more experienced than me, so I think I will just do what they say. :-)
However, I do not completely understand two of the main reasons for not using NHibernate or a similar project:
1. One can just build one’s own data access objects with SQL queries and copy those queries out of Microsoft SQL Server Management Studio.
2. Debugging an ORM can be hard.
So, of course I could just build my data access layer with a lot of SELECTs etc, but here I miss the advantage of automatic joins, lazy-loading proxy classes and a lower maintenance effort if a table gets a new column or a column gets renamed. (Updating numerous SELECT, INSERT and UPDATE queries vs. updating the mapping config and possibly refactoring the business classes and DTOs.)
Also, using NHibernate you can run into unforeseen problems if you do not know the framework very well. That could be, for example, trusting the Table.hbm.xml where you set a string’s length to be automatically validated. However, I can also imagine similar bugs in a “simple” SqlConnection query based data access layer.
Finally, are those arguments mentioned above really a good reason not to utilise an ORM for a non-trivial database based enterprise application? Are there probably other arguments they/I might have missed?
(I should probably add that I think this is like the first “big” .NET/C# based application which will require teamwork. Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now.)
The short answer is yes, there are really good reasons. As a matter of fact there are cases where you just cannot use an ORM.
Case in point, I work for a large enterprise financial institution and we have to follow a lot of security guidelines. To meet the rules and regulations that are put upon us, the only way to pass audits is to keep data access within stored procedures. Now some may say that's just plain stupid, but honestly it isn't. Using an ORM tool means the tool/developer can insert, select, update or delete whatever he or she wants. Stored procedures provide a lot more security, especially in environments when dealing with client data. I think this is the biggest reason to consider. Security.
The sweet spot of ORMs
ORMs are useful for automating the 95%+ of queries where they are applicable. Their particular strength is where you have an application with a strong object model architecture and a database that plays nicely with that object model. If you're doing a new build and have strong modelling skills on your team then you will probably get good results with an ORM.
You may well have a handful of queries that are better done by hand. In this case, don't be afraid to write a few stored procedures to handle this. Even if you intend to port your app across multiple DBMS platforms the database dependent code will be in a minority. Bearing in mind that you will need to test your application on any platform on which you intend to support it, a little bit of extra porting effort for some stored procedures isn't going to make a lot of difference to your TCO. For a first approximation, 98% portable is just as good as 100% portable, and far better than convoluted or poorly performing solutions to work around the limits of an ORM.
I have seen the former approach work well on a very large (100's of staff-years) J2EE project.
Where an ORM may not be the best fit
In other cases there may be approaches that suit the application better than an ORM. Fowler's Patterns of Enterprise Application Architecture has a section on data access patterns that does a fairly good job of cataloguing various approaches to this. Some examples I've seen of situations where an ORM may not be applicable are:
On an application with a substantial legacy code base of stored procedures you may want to use a functionally oriented (not to be confused with functional languages) data access layer to wrap the incumbent sprocs. This re-uses the existing (and therefore tested and debugged) data access layer and database design, which often represents quite a substantial development and testing effort, and saves on having to migrate data to a new database model. It is often quite a good way of wrapping Java layers around legacy PL/SQL code bases, or re-targeting rich client VB, PowerBuilder or Delphi apps with web interfaces.
A variation is where you inherit a data model that is not necessarily well suited to O-R mapping. If (for example) you are writing an interface that populates or extracts data from a foreign interface you may be better off working directly with the database.
Financial applications or other types of systems where cross-system data integrity is important, particularly if you're using complex distributed transactions with two-phase commit. You may need to micromanage your transactions better than an ORM is capable of supporting.
High-performance applications where you want to really tune your database access. In this case, it may be preferable to work at a lower level.
Situations where you're using an incumbent data access mechanism like ADO.Net that's 'good enough' and playing nicely with the platform is of greater benefit than the ORM brings.
Sometimes data is just data - it may be the case (for example) that your application is working with 'transactions' rather than 'objects' and that this is a sensible view of the domain. An example of this might be a financials package where you've got transactions with configurable analysis fields. While the application itself may be built on an O-O platform, it is not tied to a single business domain model and may not be aware of much more than GL codes, accounts, document types and half a dozen analysis fields. In this case the application isn't aware of a business domain model as such and an object model (beyond the ledger structure itself) is not relevant to the application.
First off - using an ORM will not make your code any easier to test, nor will it necessarily provide any advantages in a Continuous Integration scenario.
In my experience, whilst using an ORM can increase the speed of development, the biggest issues you need to address are:
Testing your code
Maintaining your code
The solutions to these are:
Make your code testable (using SOLID principles)
Write automated tests for as much of the code as possible
Run the automated tests as often as possible
Coming to your question, the two objections you list seem more like ignorance than anything else.
Not being able to write SELECT queries by hand (which, I presume, is why the copy-paste is needed) seems to indicate that there's an urgent need for some SQL training.
There are two reasons why I'd not use an ORM:
It is strictly forbidden by the company's policy (in which case I'd go work somewhere else)
The project is extremely data intensive and using vendor specific solutions (like BulkInsert) makes more sense.
The usual rebuffs about ORMs (NHibernate in particular) are:
Speed
There is no reason why using an ORM would be any slower than hand coded Data Access. In fact, because of the caching and optimisations built into it, it can be quicker.
A good ORM will produce a repeatable set of queries for which you can optimise your schema.
A good ORM will also allow efficient retrieval of associated data using various fetching strategies.
Complexity
With regards to complexity, using an ORM means less code, which generally means less complexity.
Many people using hand-written (or code generated) data access find themselves writing their own framework over "low-level" data access libraries (like writing helper methods for ADO.Net). These equate to more complexity, and, worse yet, they're rarely well documented, or well tested.
If you are looking specifically at NHibernate, then tools like Fluent NHibernate and Linq To NHibernate also soften the learning curve.
The thing that gets me about the whole ORM debate is that the same people who claim that using an ORM will be too hard/slow/whatever are the very same people who are more than happy using Linq To Sql or Typed Datasets. Whilst Linq To Sql is a big step in the right direction, it's still light years behind where some of the open source ORMs are. However, the frameworks for both Typed Datasets and Linq To Sql are still hugely complex, and using them to go too far off the (Table=Class) + (basic CRUD) path is stupidly difficult.
My advice is that if, at the end of the day, you can't get an ORM, then make sure that your data access is separated from the rest of the code, and that you follow the Gang of Four's advice of coding to an interface. Also, get a Dependency Injection framework to do the wiring up.
(How's that for a rant?)
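For what "coding to an interface" with injected data access might look like, here is a small sketch in Python, with the wiring done by hand rather than by a DI framework. All class names are made up for illustration.

```python
from typing import Optional, Protocol

class CustomerRepository(Protocol):
    """The interface the rest of the code depends on (hypothetical)."""
    def find_name(self, customer_id: int) -> Optional[str]: ...

class InMemoryCustomerRepository:
    """A stand-in implementation; a SQL- or ORM-backed one would expose the same method."""
    def __init__(self, names):
        self._names = dict(names)

    def find_name(self, customer_id):
        return self._names.get(customer_id)

class GreetingService:
    """Business code that only knows about the interface, not the storage."""
    def __init__(self, customers: CustomerRepository):
        self._customers = customers

    def greet(self, customer_id):
        name = self._customers.find_name(customer_id)
        return f"Hello, {name}!" if name else "Hello, stranger!"

# The wiring happens in one place; tests can pass in a fake repository.
service = GreetingService(InMemoryCustomerRepository({1: "Ann"}))
greeting = service.greet(1)
```

Swapping hand-written SQL for an ORM later then only means writing another class with a `find_name` method.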
There are a wide range of common problems for which ORM tools like Hibernate are a god-send, and a few where it is a hindrance. I don't know enough about your project to know which it is.
One of Hibernate's strong points is that you get to say things only 3 times: every property is mentioned in the class, the .hbm.xml file, and the database. With SQL queries, your properties are in the class, the database, the select statements, the insert statements, the update statements, the delete statements, and all the marshalling and unmarshalling code supporting your SQL queries! This can get messy fast. On the other hand, you know how it works. You can debug it. It's all right there in your own persistence layer, not buried in the bowels of a 3rd party tool.
Hibernate could be a poster-child for Spolsky's Law of Leaky Abstractions. Get a little bit off the beaten path, and you need to know deep internal workings of the tool. It can be very annoying when you know you could have fixed the SQL in minutes, but instead you are spending hours trying to cajole your dang tool into generating reasonable SQL. Debugging is sometimes a nightmare, but it's hard to convince people who have not been there.
EDIT: You might want to look into iBatis.NET if they are not going to be turned around about NHibernate and they want control over their SQL queries.
EDIT 2: Here's the big red flag, though: "Good practices, which are seen as pretty normal on Stack Overflow, such as unit testing or continuous integration, are non-existing here up to now." So, these "experienced" developers, what are they experienced in developing? Their job security? It sounds like you might be among people who are not particularly interested in the field, so don't let them kill your interest. You need to be the balance. Put up a fight.
There's been an explosion of growth with ORMs in recent years and your more experienced coworkers may still be thinking in the "every database call should be through a stored procedure" mentality.
Why would an ORM make things harder to debug? You'll get the same result whether it comes from a stored proc or from the ORM.
I guess the only real detriment that I can think of with an ORM is that the security model is a little less flexible.
EDIT: I just re-read your question and it looks they are copy and pasting the queries into inline sql. This makes the security model the same as an ORM, so there would be absolutely no advantage over this approach over an ORM. If they are using unparametrized queries then it would actually be a security risk.
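To show why unparametrized inline SQL is a risk, here is a short sketch using Python's sqlite3 module. The `users` table and the input string are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", 0), ("root", 1)])

# Input as an attacker might type it into a search box.
user_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's logic.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a parameter and only ever treated as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
```

The concatenated version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.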
I worked on one project where not using an ORM was very successful. It was a project that
Had to be horizontally scalable from the start
Had to be developed quickly
Had a relatively simple domain model
The time that it would have taken to get NHibernate to work in a horizontally partitioned structure would have been much longer than the time that it took to develop a super simple datamapper that was aware of our partitioning scheme...
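I can't show the original code, but a "super simple datamapper that is aware of the partitioning scheme" could look roughly like the sketch below. The modulo routing rule, the `users` table and the in-memory SQLite shards are all assumptions made for the example.

```python
import sqlite3

class ShardedUserMapper:
    """Routes each user to one of several databases based on its id (hypothetical scheme)."""

    def __init__(self, shard_count):
        # In-memory databases stand in for separate database servers.
        self._shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self._shards:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def shard_index(self, user_id):
        # Assumed partitioning rule: id modulo the number of shards.
        return user_id % len(self._shards)

    def save(self, user_id, name):
        conn = self._shards[self.shard_index(user_id)]
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )

    def load(self, user_id):
        conn = self._shards[self.shard_index(user_id)]
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

mapper = ShardedUserMapper(shard_count=2)
mapper.save(1, "Ann")
mapper.save(2, "Bob")
```

Callers only see `save` and `load`; the knowledge of which shard holds which row stays inside the mapper.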
So, in 90% of projects that I have worked on an ORM has been an invaluable help. But there are some very specific circumstances where I can see not using an ORM as being best.
Let me first say that ORMs can make your development life easier if integrated properly, but there are a handful of problems where the ORM can actually prevent you from achieving your stated requirements and goals.
I have found that when designing systems with heavy performance requirements, I am often challenged to find ways to make the system more performant. Many times, I end up with a solution that has a heavy write performance profile (meaning we're writing data a lot more than we're reading data). In these cases, I want to take advantage of the facilities the database platform offers to me in order to reach our performance goals (it's OLTP, not OLAP). So if I'm using SQL Server and I know I have a lot of data to write, why wouldn't I use a bulk insert... well, as you may have already discovered, most ORMs (I don't know if even a single one does) do not have the ability to take advantage of platform specific advantages like bulk insert.
You should know that you can blend the ORM and non-ORM techniques. I've just found that there are a handful of edge cases where ORMs can not support your requirements and you have to work around them for those cases.
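As a rough sketch of the write-heavy case above, this is what dropping down to a batched insert might look like. It uses SQLite's `executemany` inside one transaction as a stand-in; on SQL Server the equivalent would be something like BULK INSERT or SqlBulkCopy, which this example does not demonstrate. The `readings` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")

# 10,000 made-up rows to write.
rows = [(i % 10, i * 0.5) for i in range(10_000)]

# One statement, one transaction for the whole batch,
# instead of one object save per row.
with conn:
    conn.executemany(
        "INSERT INTO readings (sensor_id, value) VALUES (?, ?)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

The rest of the application can still go through the ORM; only this one hot path bypasses it.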
For a non-trivial database based enterprise application, there really is no justifying not using an ORM.
Features aside:
By not using an ORM, you are solving a problem that has already been solved repeatedly by large communities or companies with significant resources.
By using an ORM, the core piece of your data access layer benefits from the debugging efforts of that community or company.
To put some perspective in the argument, consider the advantages of using ADO.NET vs. writing the code to parse the tabular data stream oneself.
I have seen ignorance of how to use an ORM justify a developer's disdain for ORMs. For example: eager loading (something I noticed you didn't mention). Imagine you want to retrieve a customer and all of their orders, and for those all of the order detail items. If you rely on lazy loading only, you will walk away from your ORM experience with the opinion: "ORMs are slow." If you learn how to use eager loading, you will do in 2 minutes with 5 lines of code, what your colleagues will take a half a day to implement: one query to the database and binding the results to a hierarchy of objects. Another example would be the pain of manually writing SQL queries to implement paging.
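To illustrate the difference in query counts, here is a hand-written sketch of what lazy and eager loading roughly translate to underneath. It is plain SQL through Python's sqlite3 module, not an ORM API, and the tables and function names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.0), (12, 2, 5.0);
""")

def load_lazy(conn):
    """One query for the customers, then one more query per customer."""
    queries = 1
    result = {}
    customers = conn.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    for customer_id, name in customers:
        orders = conn.execute(
            "SELECT id FROM orders WHERE customer_id = ? ORDER BY id",
            (customer_id,),
        ).fetchall()
        queries += 1
        result[name] = [order_id for (order_id,) in orders]
    return result, queries

def load_eager(conn):
    """A single JOIN, with the flat rows bound back into a hierarchy."""
    result = {}
    rows = conn.execute("""
        SELECT c.name, o.id
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        ORDER BY c.id, o.id
    """)
    for name, order_id in rows:
        result.setdefault(name, [])
        if order_id is not None:  # customers without orders yield a NULL order id
            result[name].append(order_id)
    return result, 1

lazy_result, lazy_queries = load_lazy(conn)
eager_result, eager_queries = load_eager(conn)
```

Both functions build the same hierarchy, but the lazy version issues one query per customer on top of the first one, which is where the "ORMs are slow" impression tends to come from.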
The possible exception to using an ORM would be if that application were an ORM framework designed to apply specialized business logic abstractions, and designed to be reused on multiple projects. Even if that were the case, however, you would get faster adoption by enhancing an existing ORM with those abstractions.
Do not let the experience of your senior team members drag you in the opposite direction of the evolution of computer science. I have been developing professionally for 23 years, and one of the constants is the disdain for the new by the old-school. ORMs are to SQL as the C language was to assembly, and you can bet that the equivalents to C++ and C# are on their way. One line of new-school code equals 20 lines of old-school.
When you need to update 50000000 records. Set a flag or whatever.
Try doing this using an ORM without calling a stored procedure or native SQL commands..
Update 1 : Try also retrieving one record with only a few of its fields. (When you have a very "wide" table). Or a scalar result. ORMs suck at this too.
UPDATE 2 : It seems that EF 5.0 beta promises batch updates but this is very hot news (2012, January)
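For comparison, here is what those operations look like as plain set-based SQL, sketched with Python's sqlite3 module on a small made-up `accounts` table (1,000 rows rather than 50,000,000).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance REAL,
        flagged INTEGER DEFAULT 0,
        notes TEXT
    )
""")
with conn:
    conn.executemany(
        "INSERT INTO accounts (id, balance, notes) VALUES (?, ?, ?)",
        [(i, float(i), "wide column") for i in range(1, 1001)],
    )

# Set a flag on many rows with one statement; no objects are loaded or tracked.
with conn:
    cursor = conn.execute(
        "UPDATE accounts SET flagged = 1 WHERE balance > ?", (500,)
    )
    updated = cursor.rowcount

# Fetch a single field of a single record instead of the whole wide row.
balance_only = conn.execute(
    "SELECT balance FROM accounts WHERE id = ?", (42,)
).fetchone()

# A scalar result.
flagged_total = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE flagged = 1"
).fetchone()[0]
```

Each of the three operations is a single round trip, regardless of how many rows the UPDATE touches.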
I think there are many good reasons to not use an ORM. First and foremost, I'm a .NET developer and I like to stick within what the wonderful .NET framework has already provided to me. It does everything I possibly need it to. By doing this, you stay with a more standard approach, and thus there is a much better chance of any other developer working on the same project down the road being able to pick up what's there and run with it. The data access capabilities already provided by Microsoft are quite ample, there's no reason to discard them.
I've been a professional developer for 10 years, led multiple very successful million+ dollar projects, and I have never once written an application that needed to be able to switch to any database. Why would you ever want a client to do this? Plan carefully, pick the right database for what you need, and stick with it. Personally SQL Server has been able to do anything I've ever needed to do. It's easy and it works great. There's even a free version that supports up to 10GB data. Oh, and it works awesome with .NET.
I have recently had to start working on several projects that use an ORM as the datalayer. I think it's bad, and something extra I had to learn how to use for no reason whatsoever. In the insanely rare circumstance the customer did need to change databases, I could have easily reworked the entire datalayer in less time than I've spent fooling with the ORM providers.
Honestly I think there is one real use for an ORM: If you're building an application like SAP that really does need the ability to run on multiple databases. Otherwise as a solution provider, I tell my clients this application is designed to run on this database and that is how it is. Once again, after 10 years and a countless number of applications, this has never been a problem.
Otherwise I think ORMs are for developers that don't understand less is more, and think the more cool 3rd party tools they use in their app, the better their app will be. I'll leave things like this to the die hard uber geeks while I crank out much more great software in the meantime that any developer can pick up and immediately be productive with.
I think that maybe when you work on bigger systems you can use a code generator tool like CodeSmith instead of an ORM... I recently found this: Cooperator Framework, which generates SQL Server stored procedures and also generates your business entities, mappers, gateways, lazy loading and all that stuff in C#... check it out... it was written by a team here in Argentina...
I think it sits in the middle between coding the entire data access layer and using an ORM...
Personally, I was (until recently) opposed to using an ORM, and used to get by with writing a data access layer encapsulating all the SQL commands. My main objection to ORMs was that I didn't trust the ORM implementation to write exactly the right SQL. And, judging by the ORMs I used to see (mostly PHP libraries), I think I was totally right.
Now, most of my web development is using Django, and I have found the included ORM really convenient. Since the data model is expressed first in its terms, and only later in SQL, it works perfectly for my needs. I'm sure it wouldn't be too hard to outgrow it and need to supplement it with hand-written SQL, but for CRUD access it is more than enough.
I don't know about NHibernate, but I guess it's also "good enough" for most of what you need. But if other coders don't trust it, it will be a prime suspect in every data-related bug, making verification more tedious.
You could try to introduce it gradually in your workplace, focus first on small 'obvious' applications, like simple data access. After a while, it might be used on prototypes, and it might not be replaced...
If it is an OLAP database (e.g. static, read-only data used for reporting/analytics, etc.) then implementing an ORM framework is not appropriate. Instead, using the database's native data access functionality such as stored procedures would be preferable. ORMs are better suited for transactional (OLTP) systems.
Runtime performance is the only real downside I can think of but I think that's more than a fair trade-off for the time ORM saves you developing/testing/etc. And in most cases you should be able to locate data bottlenecks and alter your object structures to be more efficient.
I haven't used Hibernate before but one thing I have noticed with a few "off-the-shelf" ORM solutions is a lack of flexibility. I'm sure this depends on which you go with and what you need to do with it.
There are two aspects of ORMs that are worrisome. First, they are code written by someone else, sometimes closed source, sometimes open source but huge in scope. Second, they copy the data.
The first problem causes two issues. You are relying on outsiders' code. We all do this, but the choice to do so should not be taken lightly. And what if it doesn't do what you need? When will you discover this? You live inside the box that your ORM draws for you.
The second problem is one of two phase commit. The relational database is being copied to an object model. You change the object model and it is supposed to update the database. This is a two phase commit and not the easiest thing to debug.
I suggest this reading for a list of the downsides of ORMs.
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
For myself, I've found ORMs very useful for most applications I've written!
/Asger
The experience I've had with Hibernate is that its semantics are subtle, and when there are problems, it's a bit hard to understand what's going wrong under the hood. I've heard from a friend that often one starts with Criteria, then needs a bit more flexibility and needs HQL, and later notices that after all, raw SQL is needed (for example, Hibernate doesn't have union AFAIK).
Also, with an ORM people tend to overuse existing mappings/models, so you end up with objects that have lots of attributes that aren't initialized. After the query, inside the transaction, Hibernate then does additional data fetching, which can slow things down. Also, sadly, the Hibernate model object is sometimes leaked into the view layer, and then we see LazyInitializationExceptions.
To use an ORM, one should really understand it. Unfortunately, one easily gets the impression that it's easy while it's not.
Not to be an answer per se, I want to rephrase a quote I've heard recently. "A good ORM is like a Yeti, everyone talks about one but no one sees it."
Whenever I put my hands on an ORM, I usually find myself struggling with the problems/limitations of that ORM. At the end, yes it does what I want and it was written somewhere in that lousy documentation but I find myself losing another hour I will never get. Anyone who used nhibernate, then fluent nhibernate on postgresql would understand what I've been thru. Constant feeling of "this code is not under my control" really sucks.
I don't point fingers or say they're bad, but I started thinking of what I'm giving away just to automate CRUD in a single expression. Nowadays I think I should use ORMs less, maybe create or find a solution that enables db operations at minimum. But it's just me. I believe some things are wrong in this ORM arena but I'm not skilled enough to express exactly what.
I think that using an ORM is still a good idea, especially considering the situation you give. It sounds from your post like you are the more experienced when it comes to db access strategies, and I would bring up using an ORM.
There is no argument for #1, as copying and pasting queries and hardcoding them as text gives no flexibility, and for #2 most ORMs will wrap the original exception, allow tracing of the generated queries, etc., so debugging isn't rocket science either.
As for validation, using an ORM will also usually allow much easier time developing validation strategies, on top of any built in validation.
Writing your own framework can be laborious, and often things get missed.
EDIT: I wanted to make one more point. If your company adopts an ORM strategy, that further enhances its value, as you will develop guidelines and practices for using and implementing it, and everyone will further enhance their knowledge of the framework chosen, mitigating one of the issues you brought up. Also, you will learn what works and what doesn't when situations arise, and in the end it will save lots of time and effort.
Every ORM, even a "good one", comes saddled with a certain number of assumptions that are related to the underlying mechanics that the software uses to pass data back and forth between your application layer and your data store.
I have found that for moderately sophisticated applications, working around those assumptions usually takes me more time than simply writing a more straightforward solution such as: query the data, and manually instantiate new entities.
In particular, you are likely to run into hitches as soon as you employ multi-column keys or other moderately-complex relationships that fall just outside the scope of the handy examples that your ORM provided you when you downloaded the code.
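As a sketch of the "query the data, and manually instantiate new entities" approach with a multi-column key, something like the following is all it takes. It uses Python's sqlite3 module, and the `OrderLine` entity and `order_lines` table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class OrderLine:  # hypothetical entity keyed by (order_id, line_no)
    order_id: int
    line_no: int
    sku: str

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_lines (
        order_id INTEGER,
        line_no INTEGER,
        sku TEXT,
        PRIMARY KEY (order_id, line_no)
    )
""")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, 1, "A-100"), (1, 2, "B-200")],
)

def find_line(conn, order_id, line_no):
    """Look up one row by its composite key and build the entity by hand."""
    row = conn.execute(
        "SELECT order_id, line_no, sku FROM order_lines "
        "WHERE order_id = ? AND line_no = ?",
        (order_id, line_no),
    ).fetchone()
    return OrderLine(*row) if row else None

line = find_line(conn, 1, 2)
```

The composite key is just two bound parameters here, with no mapping configuration to get right.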
I concede that for certain types of applications, particularly those that have a very large number of database tables, or dynamically-generated database tables, that the auto-magic process of an ORM can be useful.
Otherwise, to hell with ORMs. I now consider them to basically be a fad.
