We are currently using NHibernate for accessing our database.
We have a database which stores a "configuration". We have the notion of a "revision": every table in our database has a revision, and when we make any change (even a small one), every field in our database gets duplicated (apart from the changed values, which take their new contents).
The goal is to be able to switch easily from one revision to another, and to be able to delete one configuration while still being able to switch to an earlier or later revision.
This implies that when we change the configuration, we do a lot of writing to the database (and other applications will also have to read it).
This step can take a very long time (5-10 minutes; we have a lot of tables), compared to the 10-20 seconds it takes to store the same data in XML.
After spending some time on analysis, we have the impression that NHibernate has to do a lot of reflection to map database rows to C# objects (using our hbm.xml files).
My questions:
How does NHibernate read/write the properties of every object? With reflection, right?
Does it use reflection on every write, or is there some optimization (a cache, ...)?
Is there a way to avoid this reflection, such as having classes generated/compiled at build time (as is possible with Entity Framework)?
If I have a working NHibernate model, is there something I can do to tune the DB access without changing the database?
Thank you very much for your help.
Before assuming it is reflection, I would strongly recommend downloading NHProf (at least as a free trial), or some other database profiler, to see what is taking NHibernate so long.
If I had to guess, I would say it is the number of database updates required here, but I wouldn't guess about performance without getting some metrics first ;)
For instance, it might be that you need to increase your batch size in NHibernate if you are doing a lot of small updates in one session, which can be done in your NHibernate config file:
<property name="adonet.batch_size">300</property>
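If you prefer to tune this per session rather than globally, NHibernate also exposes the batch size in code; a minimal sketch, assuming an already-built ISessionFactory named sessionFactory:

    using (var session = sessionFactory.OpenSession())
    {
        // Override the configured adonet.batch_size for this session only
        session.SetBatchSize(300);
        // ... perform the bulk updates here ...
    }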
You cannot avoid reflection, but it is not a problem. NHibernate uses a lot of reflection up front to prepare and emit dynamic code, and the generated code can be faster than equivalent static code, because MSIL allows you to do more than C# does.
If the built-in reflection usage seems to be your issue, you can extend NHibernate by writing your own bytecode provider.
Personally, I have found that custom proxy generators, user types, and property accessors can be faster than the built-in implementations.
Generally, performance issues caused by NHibernate come down to:
missing fetch strategies (e.g. the N+1 selects problem)
data precision mismatches causing unwanted updates
bad mapping (structure, or simply data types)
bad database/table configuration
bad commit strategy (flush mode, for example; see the sketch below)
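On that last point, here is a minimal sketch of controlling the flush mode explicitly (assuming an already-built ISessionFactory named sessionFactory):

    using (var session = sessionFactory.OpenSession())
    {
        // Flush pending changes only when the transaction commits,
        // instead of before every query (the default, FlushMode.Auto)
        session.FlushMode = FlushMode.Commit;
        using (var tx = session.BeginTransaction())
        {
            // ... many small updates ...
            tx.Commit(); // the single flush happens here
        }
    }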
Related
We have a rather large ASP.NET MVC project using LINQ to SQL which we are in the process of migrating to Windows Azure.
Now we need to serialize objects for storing in the Azure distributed cache, and setting "Serialization Mode" to "Unidirectional" in the .dbml file, thus automatically decorating the generated classes and properties with DataContract and DataMember attributes, seems to be the recommended way. However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing, and saved as null.
What would be the preferred way to proceed, taking into account a couple of things:
As mentioned, it's a rather large project, with the generated *.designer.cs file being close to 1.5 MB.
Disabling lazy loading completely would most likely be a big performance hit, due to many deep class relationships.
Changing ORM tool is something we are considering, but doing that at the same time as switching platforms would probably be a bad idea.
If this boils down to somehow manually specifying which objects and relations to serialize across the whole project, then using something like protobuf-net for some extra performance gains would probably not be a huge step.
However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing, and saved as null.
Yes, this is normal and expected when serializing: you are essentially taking a snapshot of what was available at that time, because lazy loading depends on everything being loaded via a data context. It would be inadvisable for any tool to crawl the entire model looking for things to load, because that could keep going indefinitely, essentially bringing large chunks of unwanted data into play.
Options:
explicitly fetch the data you are interested in before serializing, either pre-emptively via LoadWith or by hitting the appropriate properties (see the sketch after this list)
or, load the data into a completely separate DTO model for serialization. In many ways this is a restatement of the first option, since it will by necessity involve iterating over (projecting) the data you want, but it means you are creating the DTO to suit the exact shape you actually want to send.
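A minimal sketch of the first option, assuming hypothetical Person/PhoneNumber entities and a LINQ to SQL context named MyDataContext:

    using System.Data.Linq;
    using System.Linq;

    var options = new DataLoadOptions();
    // Fetch PhoneNumbers together with each Person, in one shot
    options.LoadWith<Person>(p => p.PhoneNumbers);

    using (var db = new MyDataContext())
    {
        db.LoadOptions = options; // must be set before the first query
        var people = db.Persons.ToList();
        // people[i].PhoneNumbers is now populated, so nothing is lost
        // when the list is serialized for the cache
    }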
I am working on an application with a fairly simple data model: 4 tables, two of them small (around 10 rows each) and two of them bigger (hundreds of rows).
I'm working with C# and currently use an ODBC driver for my data access layer.
I was wondering if there is any difference in terms of performance between this driver and NHibernate.
The application works, but I'd like to know whether installing NHibernate instead of a plain ODBC driver would make it faster, and if so, whether the difference is really worth installing NHibernate for (given that I have never used such a technology).
Thanks!
Short answer: no, NHibernate will actually slow your performance in most cases.
Longer answer: NHibernate uses the basic ADO.NET drivers, including OdbcConnection (if there's nothing better), to perform the actual SQL queries. On top of that, it is using no small amount of reflection to digest queries into SQL, and to turn SQL results into lists of objects. This extra layer, as flexible and powerful as it is, is going to perform more slowly than a hard-coded "firehose" solution based on a DataReader.
Where NHibernate may get you the APPEARANCE of faster performance is in "lazy-loading". Say you have a list of People, who each have a list of PhoneNumbers. You are retrieving People from the database, just to get their names. A naive DataReader-based implementation may involve calling a stored procedure for the People that includes a join to their PhoneNumbers, which you don't need in this case. Instead, NHibernate will retrieve only People, and set a "proxy" into the reference to the list of PhoneNumbers; when the list needs to be evaluated, the proxy object will perform another call. If the phone number is never needed, the proxy is never evaluated, saving you the trouble of pulling phone numbers you don't need.
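In code, that scenario looks roughly like this (a sketch, assuming a hypothetical Person entity mapped with a lazy PhoneNumbers collection):

    var person = session.Get<Person>(personId); // one SELECT, People only
    Console.WriteLine(person.Name);             // no extra query
    int count = person.PhoneNumbers.Count;      // the proxy fires a second SELECT here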
NHibernate isn't about making things faster, and it will always be slower than using the database primitives like you are (it uses them "under the hood").
In my opinion, NHibernate is about making a reusable entity layer that can be applied to different applications, or at the very least reused in multiple areas of one medium-to-large application. Therefore moving your application to NHibernate would be a waste of time (it sounds very small).
You might get better performance by using a driver specific to your database engine.
For the amount of data in your database it won't make any difference. In general, using NHibernate will slow down application performance but increase development speed, and that is generally true of all ORMs.
Some hints: NHibernate is not magic. It sits on top of ADO.NET. Want a faster driver? GET ONE. Why are you using a slow, outdated technology like ODBC anyway? What is your data source? Doesn't it support ANY newer standard, like OLE DB?
I ultimately have two areas with a few questions each about Entity Framework, but let me give a little background so you know what context I am asking for this information in.
At my place of work my team is planning a complete rewrite of our application structure so we can adhere to more modern standards. This rewrite includes a completely new data layer project. In this project most of the team wants to use Entity Framework. I too would like to use it, because I am very familiar with it from using it in personal projects. However, one team member is vehemently opposed to this, stating that Entity Framework uses reflection and kills performance. His other argument is that EF uses generated SQL that is far less efficient than stored procedures. I'm not so familiar with the inner workings of EF, and my searches haven't turned up anything extremely useful.
Here are my questions. I've tried to make them as specific as possible. If you need some clarification please ask.
Issue 1 Questions - Reflection
Is it true that EF uses reflection and that this hurts performance?
Where does EF use reflection, if it does?
Are there any resources that compare performance? Something that I could use to objectively compare technologies for data access in .NET and then present it to my team?
Issue 2 Questions - SQL
What are the implications of the claim that EF's generated SQL is far less efficient than stored procedures?
Is it possible to use stored procedures to populate EF entities?
Again, are there resources comparing the generated queries with stored procedures, and covering what the implications of using stored procedures to populate entities (if you can) would be?
I did some searching on my own but didn't come up with too much about EF under the hood.
Yes, it does, like many other ORMs (NHibernate) and useful frameworks (DI tools). WPF, for example, cannot work without reflection.
While the performance implications of using reflection have not changed much over the course of the last 10 years since .NET 1.0 (although there have been improvements), faster hardware and the general trend towards readability are making it less of a concern now.
Remember that the main performance hit occurs at the time of reflecting, a.k.a. binding, which is reading the type metadata into the xxxInfo types (such as MethodInfo), and this happens at application startup.
Calling a reflected method is definitely slower, but it is not considered much of an issue.
UPDATE
I have used Reflector to look at the source code of EF and I can confirm it heavily uses Reflection.
Answer for Issue 1:
You can take a look at exactly what is output by EF by examining the Foo.Designer.cs file that is generated. You will see that the resulting container does not use reflection, but does make heavy use of generics.
Here are the places that Entity Framework certainly does use reflection:
The Expression<TDelegate> type is used in creating the SQL statements. The extension methods in System.Linq are based around the idea of expression trees, which use the types in System.Reflection to represent function calls, types, etc.
When you use a stored procedure like this: db.ExecuteStoreQuery<TEntity>("GetWorkOrderList @p0, @p1", ...), Entity Framework has to populate the entity, and at the very least has to check that the TEntity type provided is tracked.
Answer for Issue 2:
It is true that the queries are often strange-looking, but that does not indicate that they are less efficient. You would be hard pressed to come up with a query whose actual query plan is worse.
On top of that, you certainly can use stored procedures, or even inline SQL, with Entity Framework, for querying as well as for creating, updating, and deleting.
Aside:
Even if it used reflection everywhere, and didn't let you use stored procedures, why would that be a reason not to use it? I think that you need to have your coworker prove it.
I can comment on Issue 2, the claim that generated EF queries are less efficient than stored procedures.
Basically, yes, sometimes the generated queries are a mess and need some tuning. There are many tools to help you correct this: SQL Profiler, LINQPad, and others. But in the end, the generated queries may look like crap, but they typically run quickly.
Yes, you can map EF entities to procedures. This is possible and will allow you to control some of the nasty generated EF queries. In turn, you could also map views to your entities, allowing you to control how the views select the data.
I cannot think of specific resources, but I must say this: comparing EF with SQL stored procedures is apples to oranges. EF provides a robust way of mapping your database directly to your code. Combined with LINQ to Entities queries, this allows your developers to produce code quickly. EF is an ORM, whereas stored procedures are not.
Entity Framework likely uses reflection, but I would not expect this to hurt performance. High-end libraries that are based on reflection typically use lightweight code generation to mitigate the cost. They inspect each type just once to generate the code, and then use the generated code from that point on. You pay a little when your application starts up, but the cost is negligible from there on out.
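A minimal sketch of that "inspect once, generate code, reuse it" pattern (illustrative only; EF's internals are more involved than this):

    using System;
    using System.Collections.Concurrent;
    using System.Linq.Expressions;
    using System.Reflection;

    static class SetterCache
    {
        // One compiled delegate per property, built on first use
        private static readonly ConcurrentDictionary<PropertyInfo, Action<object, object>> Setters =
            new ConcurrentDictionary<PropertyInfo, Action<object, object>>();

        public static Action<object, object> For(PropertyInfo prop)
        {
            return Setters.GetOrAdd(prop, p =>
            {
                var target = Expression.Parameter(typeof(object), "target");
                var value = Expression.Parameter(typeof(object), "value");
                var assign = Expression.Assign(
                    Expression.Property(Expression.Convert(target, p.DeclaringType), p),
                    Expression.Convert(value, p.PropertyType));
                // The reflection and compilation cost is paid once per property;
                // every subsequent call runs the compiled delegate
                return Expression.Lambda<Action<object, object>>(assign, target, value).Compile();
            });
        }
    }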
As for stored procedures, they are faster than plain old queries, but this benefit is often oversold. The primary advantage is that the database will precompile and store the execution plan for each stored procedure. But the database will also cache the execution plans it creates for plain old SQL queries. So this benefit varies a great deal depending on the number and type of queries your application executes. And yes, you can use stored procedures with Entity Framework if you need to.
I don't know if EF uses reflection (I don't believe it does... I can't think of what information would have to be determined at run-time); but even if it did, so what?
Reflection is used all over the place in .NET (e.g. serializers), some of which are called all of the time.
Reflection really isn't all that slow, especially in the context of this discussion. I imagine the overhead of making a database call, running the query, returning the results, and hydrating the objects dwarfs the performance overhead of reflection.
EDIT
Article by Rick Strahl on reflection performance: .NET Reflection and Performance (old, but still relevant).
Entity Framework's generated SQL queries are fine, even if they are not exactly what your DBA would write by hand.
However, it generates complete rubbish when it comes to queries over base types. If you are planning on using a Table-per-Type inheritance scheme, and you anticipate queries on the base types, then I would recommend proceeding with caution.
For a more detailed explanation of this bizarre shortcoming, see my question here. Take special note of the accepted answer to that question; this is arguably a bug.
As for the issue of reflection -- I think your co-worker is grasping at straws. It's highly unlikely that reflection will be the root cause of any performance problems in your app.
EF uses reflection. I didn't check it, but I think reflection is the heart of the mapping when materializing an entity instance from a database record. You say what the column's name is and what the property's name is, and when a data reader is executed you must somehow fill a property that you know only by its name.
I believe that all these "performance-like" issues are addressed by caching the information needed for the mapping. The performance impact caused by reflection will probably be nothing compared to the network communication with the server, executing the complex query, etc.
If we are talking about EF performance, check these two documents: Part 1 and Part 2. They describe some reasons why people sometimes think EF is very slow.
Are stored procedures faster than EF queries? My answer: I don't think so. The main advantage of LINQ is that you can build your query in the application. This is extremely handy when you need anything like list filtering, paging, and sorting, plus changing the number of displayed columns, loading related entities, etc. Implementing this in stored procedures means either tens of procedures for the different query configurations, or dynamic SQL. Dynamic SQL is exactly what EF uses, but in EF's case the query gets compile-time validation, which plain SQL does not. The only difference in performance is the amount of transferred data: sending the whole query to the server versus sending just a procedure name and its parameters.
It is true that the queries are sometimes very strange; inheritance, especially, sometimes produces bad queries. But this is already solved: you can always use a custom stored procedure or SQL query to return entities, complex types, or custom types. EF will materialize the results for you, so you don't need to bother with data readers, etc.
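For example, with the EF 4 ObjectContext API (a sketch, assuming a hypothetical GetWorkOrderList procedure and a mapped WorkOrder entity; MyEntities is a placeholder context name):

    using (var db = new MyEntities())
    {
        var orders = db.ExecuteStoreQuery<WorkOrder>(
            "EXEC GetWorkOrderList @p0, @p1",
            startDate, endDate).ToList();
        // EF materializes WorkOrder instances from the data reader for you
    }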
If you want to be objective when presenting Entity Framework, you should also mention its cons. Entity Framework doesn't support command batching: if you need to update or insert multiple entities, each SQL command has its own roundtrip to the database. Because of that, you should not use EF for data migrations and the like. Another problem with EF is that it offers almost no hooks for extensibility.
I have a pluggable system management tool. The architecture of this kind of thing is well understood (interfaces, publish/subscribe, ...). What about the data store, though? What do people do?
I need plugins to be able to add new entities, extend existing entities, establish new relationships, etc.
My thoughts (SQL), not necessarily well thought out:
each plugin simply extends the schema when it is installed. In the old days changing the schema was a big no-no; now databases are very relaxed about it
plugins have their own tables. If 2 of them have an entity (say) person, then there are 2 tables p1_person and p2_person
plugins have their own database
invent some sort of flexible scheme where the tables are softly typed, perhaps with many attributes packed into a single attribute. The ultimate version is one big table called data, keyed by table name and column name, holding a single data value.
Not SQL
object DBs. I have no experience with these; anybody care to pass on their experience? db4o, for example. Can I change the 'schema' of objects as the app evolves?
NO-SQL
this is 'where it's at' at the moment. Most of these seem to be aimed slightly away from my needs. Anybody want to pass on experience with these?
Apologies for the open-ended question.
My suggestion is to go read about Entity Framework; a lot of the situations you are describing can be solved (very elegantly) using table inheritance.
Your idea of one big table called data makes the hamsters in my computer cry ;)
The general trend is away from weakly typed schemas, because they cannot be debugged at compile time. What you get from something like Entity Framework is a strongly typed, extensible schema that you can code against using LINQ.
Object databases:
Like you, I haven't played with them massively; however, the time when I was considering them was a time when there was no good ORM for .NET, and writing ADO.NET code was slowly killing me.
As for NO-SQL, these are databases that meet a performance need. SQL performs badly in situations where there are lots of small writes occurring. I say "badly" tongue in cheek: it performs very well, but when you scale to millions of concurrent users everything changes. My understanding of NoSQL is that it is a non-normalised format designed for lots of small, fast writes and reads. The sites that use these are usually very large in scale.
OK, in response:
I am currently lucky enough to be on a greenfield project, so I am using EF to generate my schema.
On non-greenfield projects I use SQL scripts to update my table structures. As for implementing table inheritance in SQL, it is very easy once you know the concept: it is essentially a one-to-many relationship with a constraint that the 'many' side will only ever be 0-1.
I wouldn't write .NET code that updates the database structure... that sounds like a disaster waiting to happen to me.
I'm beginning to think I have misunderstood what you are looking for. I find databases to be second nature, as I have spent so long with them.
I haven't found a replacement for being meticulous about script management.
Currently, my entire website does updating from SQL parameterized queries. It works, we've had no problems with it, but it can occasionally be very slow.
I was wondering if it makes sense to refactor some of these SQL commands into classes so that we would not have to hit the database so often; I understand hitting the database is generally the slowest part of any web application. For example, say we have a class structure like this:
Project (comprised of) Tasks (comprised of) Assignments
Where Project, Task, and Assignment are classes.
At certain points in the site you are only working on one project at a time, so creating a Project class and passing it among pages (using Session, Profile, or something else) might make sense. I imagine this class would have a Save() method to save value changes.
Does it make sense to invest the time into doing this? Under what conditions might it be worth it?
If your site is slow, you need to figure out what the bottleneck is before you randomly start optimizing things.
Caching is certainly a good idea, but you shouldn't assume that this will solve the problem.
Caching is almost always underutilized in ASP.NET applications. Any time you hit your database, you should look for ways to cache the results.
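A minimal cache-aside sketch using the classic ASP.NET Cache (LoadProjectFromDb is a hypothetical data-access helper):

    public Project GetProject(int projectId)
    {
        string key = "project:" + projectId;
        var cached = HttpRuntime.Cache[key] as Project;
        if (cached != null)
            return cached; // served from memory, no database hit

        Project project = LoadProjectFromDb(projectId); // the only DB hit
        HttpRuntime.Cache.Insert(key, project, null,
            DateTime.UtcNow.AddMinutes(5),              // absolute expiry
            System.Web.Caching.Cache.NoSlidingExpiration);
        return project;
    }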
Serializing objects into the session can be costly in itself, but it is most likely still faster than hitting the database every single time. You are already benefiting from execution plan caching in SQL Server, so it's very likely that you're getting optimal performance out of your stored procedures.
One option you might consider for increasing performance is to abstract your data into objects via LINQ to SQL (against your sprocs) and then use AppFabric to cache the objects.
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
As for your updates, you should run those directly against the sprocs, but you will also need to clear out the AppFabric cache for objects affected by the insert/update/delete.
You could also do the same thing using the standard Cache, but AppFabric has some added benefits.
Use the SQL Profiler to identify your slowest queries, and see if you can improve them with some simple index changes (removing unused indexes, adding missing indexes).
You could very easily improve your application performance by an order of magnitude without changing your front-end app at all.
See http://sqlserverpedia.com/wiki/Find_Missing_Indexes
If you have lookup data only, you can store it in the Cache object. This will avoid hits to the DB. Only data that can be used globally should be stored in the Cache.
If this data requires filtering, you can retrieve it from the Cache and filter it before rendering.
Session can be used to store user-specific data, but take care: too many session variables can easily cause performance problems.
This book may be helpful.
http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839