So, I have been using AutoMapper with the IQueryable extensions to select some really simple viewmodels for list views. This lets me avoid loading up an entire Entity Framework object, but I have run into a bit of a less-than-ideal situation where I need to pull a simple viewmodel for a single complex object.
userQuery.Where(u => u.Id == id).ProjectTo<SimpleUserViewModel>().FirstOrDefault();
I could do a normal AutoMapper.Map, but that pulls in the whole object and its child objects, when I may only need a single property off a child, and I don't want to eat the database retrieval cost.
Is there a better way to get a single entity while emitting a SELECT through Entity Framework that grabs only the necessary columns?
It does look inefficient, but it isn't.
Just like many other LINQ methods, most notably the Select which it replaces, ProjectTo<> relies on deferred execution. It won't pull the data until it reaches the point of having to present (or act on) the data.
Common ways to trigger this execution are ToList, First, Single (including the OrDefault and Async variants for all of them). Essentially, any action that requires actual knowledge of the data set itself.
I know the feeling; it feels less elegant not to be able to do something like ProjectToSingle<SimpleUserViewModel>(x => x.Id == id). If it really bothers you, you can write this wrapper method yourself, essentially translating it to a Where/ProjectTo/Single chain.
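For reference, here's a minimal sketch of such a wrapper, assuming a recent AutoMapper version where ProjectTo takes an IConfigurationProvider (the method name and shape are illustrative, not an AutoMapper API):

using System;
using System.Linq;
using System.Linq.Expressions;
using AutoMapper;
using AutoMapper.QueryableExtensions;

public static class ProjectionExtensions
{
    // Filter, project, then materialize a single result in one call.
    public static TDest ProjectToSingleOrDefault<TSource, TDest>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, bool>> predicate,
        IConfigurationProvider config)
    {
        return source
            .Where(predicate)           // translated to SQL, not run in memory
            .ProjectTo<TDest>(config)   // SELECT of only the mapped columns
            .SingleOrDefault();         // triggers execution
    }
}

Usage then becomes userQuery.ProjectToSingleOrDefault<User, SimpleUserViewModel>(u => u.Id == id, mapper.ConfigurationProvider);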
I feel the same way, but I've gotten used to writing Where/ProjectTo/Single and it doesn't feel wrong anymore. It's still a lot better than having to write the include statements.
Also, as an aside: even if you weren't using AutoMapper but still wanted to cut down on the columns you fetch (because you know you won't need all of them) instead of loading the whole entity, you'd still need a Where/Select/Single method chain.
So AutoMapper didn't make the syntax any less elegant than it already was with regular LINQ/EF.
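For comparison, the plain-LINQ equivalent of the original query might look like the following (the viewmodel properties are assumed for illustration); EF emits a SELECT of just those columns either way:

var vm = userQuery
    .Where(u => u.Id == id)
    .Select(u => new SimpleUserViewModel { Id = u.Id, Name = u.Name })   // assumed properties
    .SingleOrDefault();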
Related
I am creating a web application on top of the ASP.NET 6 framework. I am trying to figure out the best ORM to use for this project. I am leaning toward Entity Framework for the following reasons:
I'll be able to use LINQ to write my queries
I'll be able to access my relations easily and directly using native C# models.
Here is where the complication starts. This app will be connecting to a very large database with over 500 tables. Also, the app is going to be broken down into many small logical areas so it's easy for me to maintain it.
If Entity Framework is the way to go, how should I set up the DbContext so I can manage 500+ DbSets and their relations? In other words, should I create a single DbContext for the entire app even when my app is broken down into multiple areas? Or should I create a DbContext per area? But if I do that, what if I need to establish a relation across multiple areas? For example, what if an A model in A-area needs a relation to a B model in B-area and a C model in C-area? I thought about introducing DbContext inheritance, where CAreaDbContext would inherit from BAreaDbContext, which inherits from AAreaDbContext, but that would break down real quick.
Is Entity Framework the right framework for a large database app? If so, how can I manage the DbContext across multiple areas? If not, what would be the alternative that doesn't require writing plain SQL queries?
EF is perfectly fine for large databases. When mapping a large number of tables and relationships there is a one-time startup cost for the very first query, as EF initializes and validates its mapping, but this is a static cost for the application, not one paid each time a DbContext is instantiated.
You can split the application across several DbContexts to help make organizing entities more logical and reduce those initial setup costs. This is generally referred to as using Bounded Contexts if you want to search up examples. These typically organize your application down to aggregate roots or top-level entities with everything else falling under those aggregates or serving as lookups, etc. Entities can be registered with multiple DbContexts, though you should aim to ensure that one aggregate root is nominated for being responsible for editing and creating a given entity.
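As a rough sketch of the bounded context idea (hypothetical EF Core entities; the same Customer type is registered in two contexts, with OrderingContext nominated as the one responsible for editing it):

using Microsoft.EntityFrameworkCore;

public class OrderingContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();
    public DbSet<Customer> Customers => Set<Customer>();   // owns create/edit
}

public class BillingContext : DbContext
{
    public DbSet<Invoice> Invoices => Set<Invoice>();
    public DbSet<Customer> Customers => Set<Customer>();   // read-only lookup here
}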
The most important consideration with EF, both for performance and for avoiding unwanted/unexpected behaviour, is to ensure you generally don't load more data than you need, more often than you need to.
Some general advice would include:
Absolutely AVOID the temptation to use the Generic Repository pattern with EF. Non-generic Repositories are great to facilitate unit testing or centralize important, common rules/validation, but Generic flavours lead to inefficient and expensive, or overly complex code, usually both.
Keep DbContext lifetimes as short as possible. For web applications this should be no longer than the request length (when using an IoC container, for instance) or shorter. Worst case, use using blocks to scope your DbContext (see the sketch below). The longer a DbContext is kept alive, the more entities it tracks, and the more it tracks, the more it needs to sift through looking for references when loading other entities that might have navigation properties, and the slower it gets. Long-lived DbContexts can also get "poisoned" when you have an issue attempting to save entity changes. Those invalid entities will remain tracked by the DbContext and interfere with future unrelated SaveChanges calls until they are removed (detached) or corrected.
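A sketch of that worst case, with a hypothetical AppDbContext:

using (var context = new AppDbContext())
{
    var names = context.Customers
        .Where(c => c.IsActive)
        .Select(c => c.Name)
        .ToList();
}   // disposed here; everything the context tracked is released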
Gain an understanding of projection using Select or AutoMapper's ProjectTo method. Loading entire entity graphs gets expensive, especially if the DbContext is left to track all of those instances. Projecting down to ViewModels/DTOs helps ensure that only as much data as is needed is ever loaded and transmitted, and makes it crystal clear what is being passed around. (As opposed to passing detached entities, or worse, partially filled detached entities.)
Understand IQueryable and everything that LINQ can bring to working with the data. EF's query building is extremely valuable: you can leverage sorting, filtering, pagination, and projection, as well as getting counts and checking existence (.Any()), all without fetching a ton of data via entities. See the first point: Generic Repositories typically bury the IQueryable and fall into exactly this trap. A quick illustration follows below.
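For illustration (hypothetical Order entity), all of these compose against the same IQueryable and run server-side; nothing is materialized except the final page:

IQueryable<Order> query = context.Orders.Where(o => o.Total > 100m);

bool any  = query.Any();     // translated to an EXISTS query; no rows fetched
int count = query.Count();   // COUNT(*) on the server

var page = query
    .OrderBy(o => o.PlacedOn)   // sorted, paged SELECT
    .Skip(20)
    .Take(10)
    .ToList();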
Use ToList/ToListAsync sparingly and be aware that any logic you feed EF in LINQ expressions needs to be translatable down to SQL. Sometimes you will find yourself trying to build a query where EF complains that it cannot evaluate your expression, for instance when calling private methods or unmapped properties. Adding a ToList before the expression will seem like a magic fix, forcing client-side evaluation, but it is an expensive operation: you are effectively fetching (and typically tracking) all entities up to that point and then continuing in memory, which gets expensive for memory use. An example of the trap is sketched below.
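The trap tends to look like this (IsPriority is a hypothetical unmapped method EF can't translate):

// BAD: ToList() here fetches (and typically tracks) every matching order,
// and the rest of the chain runs in memory.
var bad = context.Orders
    .Where(o => o.CustomerId == customerId)
    .ToList()                    // client-side evaluation from this point on
    .Where(o => IsPriority(o))   // compiles, but filters the full set in memory
    .ToList();

The better fix is usually to express the logic in terms EF can translate, rather than forcing early materialization.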
Asynchronous methods are not a silver bullet and do not make queries faster. Awaiting asynchronous EF methods is very useful when you have queries that are going to take a while to run or be called extremely often. My advice is to default to synchronous methods and test-run your code against production-like volumes as early as possible. I use 250ms as a threshold, but pick something acceptable to you and profile your queries. Anything over that threshold would likely benefit from being made asynchronous. Typically things like searches, especially ones involving text matching, are good candidates, as these can be a bit slow and are generally run fairly frequently by several users at a time. The same goes for any operation that might get called a lot by many users at the same time. async/await doesn't make queries faster; it makes them slightly slower, but it does make your server more responsive by not hanging the request until the query finishes. Using it by default makes your code a touch slower and a bit tougher to debug for no real benefit. (As it can easily be introduced where needed.)
Profile your queries. With traditional data access you would create your schema and write your access queries (sprocs etc.), creating indexes as you go. With EF building your queries, indexing becomes more of a reactive process: you might add your typical indexes up front, but you should look at the queries being run in a production-like scenario to refine indexes based on the high-volume queries EF is building. This also provides key insight into other inefficiencies that might creep into your queries, as well as performance problems like lazy loading being tripped. Expensive queries should be investigated and optimized where possible.
Prepare to employ things like queuing for truly expensive queries. Systems will often call for things like reports, data exports, or just really expensive query options. Aim to set reasonable expectations by default; for instance, avoid string Contains() in text searches in favour of StartsWith(), since a leading wildcard defeats any index. Where you do need to support expensive queries, build a mechanism that lets users/processes queue the query details as a request and employ a background worker/pool to pick up and process these requests. The temptation might be to just employ async/await here, but the important thing is to avoid situations where too many of these queries are kicked off at once. Queries like this will "touch" a lot of data, leading to locks and deadlocks in a system. Users have a bad tendency to repeatedly kick off actions when it looks like one isn't responding, which compounds the problem on the back-end.
So I've been developing with Entity + LINQ for a bit now and I'm really starting to wonder about best practices. I'm used to the model of "if I need to get data, reference a stored procedure". Stored procedures can be changed on the fly if needed and don't require code recompiling. I'm finding that my queries in my code are looking like this:
List<int> intList = (from query in context.DBTable
                     where query.ForeignKeyId == fkIdToSearchFor
                     select query.ID).ToList();
and I'm starting to wonder what the difference is between that and this:
List<int> intList = SomeMgrThatDoesSQLExecute.GetResults(
    string.Format(@"SELECT [ID]
                    FROM DBTable
                    WHERE ForeignKeyId = {0}",
                  fkIdToSearchFor));
My concern is that that I'm essentially hard coding the query into the code. Am I missing something? Is that the point of Entity? If I need to do any real query work should I put it in a sproc?
The power of Linq doesn't really make itself apparent until you need more complex queries.
In your example, think about what you would need to do if you wanted to apply a filter of some form to your query. Using string-built SQL you would have to append values with a string builder and protect against SQL injection (or go through the additional effort of preparing the statement with parameters).
Let's say you wanted to add a condition to your LINQ query. Start with:
IQueryable<Table> results = from query in context.Table
                            where query.ForeignKeyId == fldToSearchFor
                            select query;
You can take that and make it:
results = results.Where(r => r.Value > 5);
The resulting SQL would look like:
SELECT * FROM Table WHERE ForeignKeyId = @fldToSearchFor AND Value > 5
Until you enumerate the result set, any extra conditions you want to tack on will be added in for you, resulting in a much more flexible and powerful way to build dynamic queries. I use a method like this to provide filters on a list.
I personally avoid hard-coding SQL queries (as in your second example). Writing LINQ instead of actual SQL gives you:
ease of use (Intellisense, type check...)
power of the LINQ language (which is most of the time simpler than SQL once there is some complexity, multiple joins, etc.)
power of anonymous types
seeing errors right now at compile-time, not during runtime two months later...
better refactoring if you want to rename a table/column/... (you won't forget to rename anything with LINQ because of compile-time checks)
loose coupling between your queries and your database (what if you move from Oracle to SQL Server? With LINQ you won't change your code; with hardcoded queries you'll have to review all of them)
LINQ vs stored procedures: you put the logic in your code, not in your database. See discussion here.
if I need to get data, reference a stored procedure. Stored procedures can be changed on the fly if needed and don't require code recompiling
-> if you need to update your model, you'll probably also have to update your code to take the DB change into account. So I don't think it'll help you avoid a recompilation most of the time.
Is LINQ is hard-coding all your queries into your application? Yes, absolutely.
Let's consider what this means to your application.
If you want to make a change to how you obtain some data, you must make a change to your compiled code; you can't make a "hotfix" to your database.
But, if you're changing a query because of a change in your data model, you're probably going to have to change your domain model to accommodate the change.
Let's assume your model hasn't changed and your query is changing because you need to supply more information to the query to get the right result. This kind of change most certainly requires that you change your application to allow the use of the new parameter to add additional filtering to the query.
Again, let's assume you're happy to use a default value for the new parameter and the application doesn't need to specify it. The query might then include an extra field as part of the result. You don't have to consume this additional field, though, and you can ignore the additional information being sent over the wire. It has introduced a bit of a maintenance problem, in that your SQL is out of step with your application's interpretation of it.
In this very specific scenario where you either aren't making an outward change to the query, or your application ignores the changes, you gain the ability to deploy your SQL-only change without having to touch the application or bring it down for any amount of time (or if you're into desktops, deploy a new version).
Realistically, when it comes to making changes to a system, the majority of your time is going to be spent designing and testing your queries, not deploying them (and if it isn't, then you're in a scary place). The benefit of having your query in LINQ is how much easier it is to write and test them in isolation of other factors, as unit tests or part of other processes.
The only real reason to use Stored Procedures over LINQ is if you want to share your database between several systems using a consistent API at the SQL-layer. It's a pretty horrid situation, and I would prefer to develop a service-layer over the top of the SQL database to get away from this design.
Yes, if you're good at SQL, you can get all that with stored procs, and gain better performance and some maintenance benefits.
On the other hand, LINQ is type-safe, slightly easier to use (as developers are accustomed to it from non-DB scenarios), and can be used with different providers (it translates to provider-specific code). Anything that implements IQueryable can be queried the same way with LINQ.
Additionally, you can pass partially constructed queries around, and they will be lazy evaluated only when needed.
So, yes, you are hard coding them, but, essentially, it's your program's logic, and it's hard coded like any other part of your source code.
I also wondered about that, but the mentality that database interactivity belongs only in the database tier is an outmoded concept. Having a dedicated DBA churn out stored procedures while multiple front-end developers wait is truly an inefficient use of development dollars.
Java shops hold to this paradigm out of necessity, because they do not have a tool such as LINQ. .NET shops find that the need for a dedicated database person is lessened, if not fully removed, because once the structure is in place the data can be manipulated more efficiently (as shown by all the other responders to this post) in the data layer using LINQ than by raw SQL.
If one has the chance to use code-first and create a database model from the ground up, originating the entity classes in code and then letting EF do the magic of actually creating the database, one understands how the database is truly a tool that stores data, and that stored procedures and a dedicated DBA are not needed.
Facts:
I have a desktop app backed by a SQL CE database.
Many of the tables are read-only.
I'm using EF Code First to generate the database and handle crud operations.
I generally keep one DbContext open for the whole session and dispose on app exit. (Note: I've tried creating a new DbContext more frequently and found that performance was actually worse. Also, I don't ever need to roll back changes.)
For several of my entities, I have added some getters to provide necessary data derived from other properties of the entity. A few of these getters actually perform LINQ queries (which I suspect is part of my problem).
Everything is working, but I'm realizing now that I need to do some optimization. Performance is somewhat sluggish in general, but I'm especially concerned about one really giant query users need to run periodically that flattens pretty much the whole database into a single table then outputs it to a delimited text file. This query is taking much longer than I'd like.
One idea I have for speeding things up would be to store anything that is read-only in a List instead of grabbing it from my DbContext each time I need it. The tricky part is that, while I could easily grab all my Car entities from the DbContext and store them in a list, how would I ensure that all the "helper properties", such as public virtual ICollection<Wheel> Wheels { get; set; } or public virtual Engine Engine { get; set; } get stored as well? It doesn't do a lot of good to iterate through all the Car entities if all the associated entities are not iterated as well.
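One possible sketch, assuming EF6-style syntax to match Code First on SQL CE and your Car/Wheel/Engine example: Include pulls the associated entities along with each Car, and AsNoTracking keeps the context from tracking the cached graph.

using System.Data.Entity;   // for the lambda-based Include extension

List<Car> carCache = context.Cars
    .AsNoTracking()
    .Include(c => c.Wheels)   // eager-load the collection navigation
    .Include(c => c.Engine)   // eager-load the reference navigation
    .ToList();

With the graph fully materialized, read-only lookups can go against carCache instead of hitting the DbContext.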
The other idea I have for improving performance is to come up with a way to cache the getters that are the most processor intensive. As I mentioned above, some of my getters actually perform LINQ queries. I know this is sapping performance, but since some of these are dependent on data that can change, caching becomes quite tricky. I have a feeling it would require implementing INotifyPropertyChanged.
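As a minimal sketch of the caching idea, short of full INotifyPropertyChanged plumbing (TotalWeight and its dependencies are hypothetical):

private decimal? _totalWeight;   // null means "needs recomputing"

public decimal TotalWeight
{
    get
    {
        if (_totalWeight == null)
            _totalWeight = Wheels.Sum(w => w.Weight) + Engine.Weight;
        return _totalWeight.Value;
    }
}

// Call this from the setters of any property the calculation depends on.
private void InvalidateTotalWeight()
{
    _totalWeight = null;
}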
What I'd like is a strategy for improving performance without throwing away all the good things that EF is already doing for me. There's no point in me storing something in a list, for example, if EF has already iterated the list and will not iterate the list again unless and until it becomes out of date.
Given my specific situation and the above background, what strategy(ies) do you recommend to speed up my app?
Clarification
As I wrote in comments below...
For the large query, I'm iterating through all my entities (putting each into a List<TEntity>), then doing a giant join, then selecting the actual data I need. For this final selection, some of the data is raw (a basic property of the entity), while some is calculated (a property getter that depends on basic properties of the entity). For the calculated properties, there are sometimes LINQ queries involved. So, a lot of the heavy lifting is done in C# rather than via the database (which is file-based, being SQL CE).
For everything else in my app, I directly query the database when possible and make no attempt to cache anything. If I could figure out a way to completely cache everything that is read-only into memory, I have to believe that would improve performance, but I'm not entirely clear on how to do that given that (1) I have entities that store other entities and (2) the entities have getters, some of which perform queries.
Caching is your friend: http://support.microsoft.com/kb/323290
I'd suggest performance tuning at the database level: creating relevant indexes where applicable.
I've never had to tune a SQLCE database, so a quick google lookup yielded http://www.sql-server-performance.com/2003/sql-server-ce/ and http://msdn.microsoft.com/en-us/library/ms172984(v=sql.90).aspx.
For the join query (your report view), you could introduce materialized views (if such exist in CE).
I'm working on creating a dashboard. I have started to refactor the application so methods relating to querying the database are generic or dynamic.
I'm fairly new to the concept of generics and still an amateur programmer, but I have done some searching and tried to come up with a solution. The problem is not really building the query string dynamically; I'm fine with concatenating string literals and variables, and I don't really need anything more complex. The bigger issue for me is, once I've created this query, getting the data back and assigning it to the correct variables dynamically.
Lets say I have a table of defects, another for test cases and another for test runs. I want to create a method that looks something like:
public void QueryDatabase<T>(ref List<T> Entitylist, List<string> Columns, string query) where T: Defect, new()
Now, this is not perfect, but you get the idea. Not everything about defects, test cases and test runs is the same, but I'm looking for a way to dynamically assign the retrieved columns to their "correct" variables.
If more information is needed I can provide it.
You're re-inventing the wheel. Use an ORM, like Entity Framework or NHibernate. You will find it's much more flexible, and tools like that will continue to evolve over time and add new features and improve performance while you can focus on more important things.
EDIT:
Although I think overall it's important to learn to use tools for something like this (I'm personally a fan of Entity Framework and have used it successfully on several projects now, and used LINQ to SQL before that), it can still be valuable as a learning exercise to understand how to do it yourself. The ORMs I have experience with use XML to define the data model, and use code generation on the XML file to create the class model. LINQ to SQL uses custom attributes on the code-generated classes to define the source table and columns for each class and property, and reflection at run-time to map the data returned from a SqlDataReader to the properties on your class object. Entity Framework can behave differently depending on the version you're using and whether you use the "default" or "POCO" templates, but ultimately it does basically the same thing (using reflection to map the database results to properties on your class); it just may or may not use custom attributes to determine the mapping. I assume NHibernate does it the same way as well.
You are reinventing the wheel, yes, it's true. You are best advised to use an object-relational mapper off the shelf. But I think you also deserve an answer to your question: to assign the query results dynamically to the correct properties, you would use reflection. See the documentation for the System.Reflection namespace if you want more information.
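To make the reflection approach concrete, here is a minimal sketch that maps an IDataReader to objects by matching column names to property names; the name-matching is a simplifying assumption, since real ORMs use attributes or configuration for the mapping:

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SimpleMapper
{
    public static List<T> MapAll<T>(IDataReader reader) where T : new()
    {
        // Index the target type's properties by name once, up front.
        var props = typeof(T).GetProperties().ToDictionary(p => p.Name);
        var results = new List<T>();
        while (reader.Read())
        {
            var item = new T();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                // Assign only when a same-named property exists and the cell isn't NULL.
                if (props.TryGetValue(reader.GetName(i), out var prop) && !reader.IsDBNull(i))
                    prop.SetValue(item, reader.GetValue(i));
            }
            results.Add(item);
        }
        return results;
    }
}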
I have some difficulties implementing the repository and service patterns in my RavenDB project. The major concern is what my repository interface should look like, because in RavenDB I use a couple of indexes for my queries.
Let's say I need to fetch all items where the parentid equals 1. One way is to use the IQueryable List() and get all documents, then add a where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any of RavenDB's index features. So the other approach is to have something like IEnumerable Find(string index, Func predicate) in the repository, but that also seems like a bad idea because it's not generic enough and requires that I reimplement this method if I ever change from RavenDB to a common SQL server.
So how can I implement a generic repository but still get the benefits of indexes in RavenDB?
This post sums it all up very nicely:
http://novuscraft.com/blog/ravendb-and-the-repository-pattern
First off, ask why you want to use the repository pattern?
If you're wanting to use the pattern because you're doing domain driven design, then as another of these answers points out, you need to re-think the intent of your query, and talk about it in terms of your domain - and you can start to model things around this.
In that case, specifications are probably your friend and you should look into them.
HOWEVER, let's look at a single part of your question momentarily before continuing with my answer:
seems like a bad idea because it's not generic enough and requires that I reimplement this method if I ever change from RavenDB to a common SQL server.
You're going about it the wrong way - trying to make your system entirely persistence-agnostic at this level is asking for trouble - if you try hiding the unique features of your datastore from the queries themselves then why bother using RavenDB?
A method I tend to use in simple document-oriented applications (i.e., where I do talk in terms of data, which is what you appear to be doing) is to split up my queries from my commands.
Ask yourself, why do you want to query for your documents by parent ID? Is it to display a list on a page? Why are you trying to model this in terms of documents, then? Why not model it in terms of a view model, and use the most effective method of retrieving this data from RavenDB (a query over an index, dynamic or otherwise)? Stick this in a factory which takes 'some inputs' and generates 'the output', and if you do decide to change your persistence store, you can change these factories; a sketch follows below. (I go one step further in my ASP.NET MVC applications and have single-action controllers, which I don't call controllers, making the query from those in most cases.)
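As a sketch of that factory idea (all types hypothetical; the RavenDB client API varies a little by version):

public class ItemsByParentQuery
{
    private readonly IDocumentSession _session;

    public ItemsByParentQuery(IDocumentSession session)
    {
        _session = session;
    }

    // 'Some inputs' in, view models out - no repository in between.
    public List<ItemListModel> Execute(string parentId)
    {
        return _session.Query<Item, Items_ByParentId>()   // hypothetical static index
            .Where(i => i.ParentId == parentId)
            .Select(i => new ItemListModel { Id = i.Id, Name = i.Name })
            .ToList();
    }
}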
If you want to actually pull out your documents by parent ID in order to update them or run some business logic across them, perhaps you've modelled them wrong: a write operation will typically only involve a change to a single document. In other words, you should be modelling your documents around your transaction boundaries.
TL;DR
Think about what it is you actually want to achieve. Why do you want to use the "Repository pattern" or the "Service pattern"? These words exist as ways of describing a scenario you might end up with if you model your application around your needs, as a common way of expressing the role of a certain object, not as something you need to shoehorn every piece of functionality into.
Let's say I need to fetch all items where the parentid equals 1.
First, stop thinking of your data access needs this way.
You DO NOT need to "fetch all items where the parentid equals 1". It will help to try and stop thinking in such a data-oriented way.
What you need is to fetch all items with a particular parent. This is a concept that exists in your problem space (your application's domain).
The fact that you model this in the database with a foreign key and a field named parentid is an implementation detail. Encapsulate this, do not leak it throughout your application.
One way is to use the IQueryable List() and get all documents, then add a where clause to select the items where the parentid equals 1. This seems like a bad idea because I can't use any of RavenDB's index features. So the other approach is to have something like IEnumerable Find(string index, Func predicate) in the repository, but that also seems like a bad idea because
Both of these are bad ideas. What you are suggesting is requiring the code that calls your repository or query to have knowledge of your schema.
Why should the consumer of your repository care or know that there is a parentid field? If this changes, if the definition of some particular concept in your problem space changes, how many places in your code will have to change?
Every single place that fetches items with a particular parent.
This is bad, it is the antithesis of encapsulation.
My opinion is that you will want to model queries as explicit concepts, not lambdas or strings passed around and used all over.
You can model queries explicitly with the Specification pattern, named query methods on a repository, Query Object pattern, etc.
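For instance, a named query method keeps both the parentid field and the index choice behind a domain-level name (hypothetical types again):

public interface IItemRepository
{
    // Speaks the problem space: "items with a particular parent".
    IList<Item> FindChildrenOf(string parentId);
}

public class RavenItemRepository : IItemRepository
{
    private readonly IDocumentSession _session;

    public RavenItemRepository(IDocumentSession session)
    {
        _session = session;
    }

    public IList<Item> FindChildrenOf(string parentId)
    {
        return _session.Query<Item, Items_ByParentId>()   // the index stays an implementation detail
            .Where(i => i.ParentId == parentId)
            .ToList();
    }
}

If the storage engine ever changes, this class gets re-implemented; the callers never learn about parentid.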
it's not generic enough and requires that I reimplement this method if I ever change from RavenDB to a common SQL server.
Well, that Func is too generic. Again, think about what your consuming code would need to know in order to use such a method of querying; you would be tying the upper layers of your code directly to your DB schema.
Also, if you change from one storage engine to another, you cannot avoid re-implementing queries where performance was enough of a factor to use storage-engine-specific aids (indexes in Raven, for example).
I would actually discourage you from using the repository pattern. In most cases, it is over-architecting and actually makes the code more complicated.
Ayende has made a number of posts to that end recently:
http://ayende.com/Blog/archive/2011/03/16/architecting-in-the-pit-of-doom-the-evils-of-the.aspx
http://ayende.com/Blog/archive/2011/03/18/the-wages-of-sin-over-architecture-in-the-real-world.aspx
http://ayende.com/Blog/archive/2011/03/22/the-wages-of-sin-proper-and-improper-usage-of-abstracting.aspx
I recommend just writing against Raven's native API.
If you feel that my response is too general, list some of the benefits you hope to gain from using another layer of abstraction and we can continue the discussion.