I have an issue, which I assume many professional developers will run into. My workplace has adopted entity framework. We use it, and love it. However, we seem to have run into a very frustrating limitation.
Let's assume you have an object chain as such
A -> B -> C -> D
We are professionals, so these objects have a lot of data, and there are a lot of them in their respective database tables. EF seems to have a terrible time loading anything past object B; the SQL queries it generates are very inefficient. The call would be something like
context.objects.Include("bObjectName.cObjectName.dObjectName").FirstOrDefault(x => x.PK == somePK);
We have gotten around this by explicitly loading objects past that second level with the .Load() command. This works well for single objects. However, when we talk about a collection of objects, we start to run into issues with .Load().
Firstly, there does not seem to be a way to keep proxy tracking of objects in a collection without the virtual keyword. This makes sense, because EF needs to override the property getters and setters. However, virtual also enables lazy loading, and .Load() doesn't fix up navigation properties when lazy loading is enabled, which I find somewhat odd. If you remove the virtual keyword, .Load() does automatically link the loaded objects to the relevant objects in the context.
So here is the crux of my issue: I want proxy tracking, but I also want .Load() to map the navigation properties for me. None of this would be an issue if EF could generate good queries. I understand why it can't; it has to be a one-size-fits-all kind of thing.
So to load the third tier of objects I might create a loader function in my service layer that takes all the primary keys of the second-tier objects and then calls .Load() on them.
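For illustration, a rough sketch of such a loader (MyDbContext, CObjects and BObjectId are placeholder names, not our real types):

// Loads every third-tier child for the given second-tier keys in one query.
// With lazy loading off, EF's relationship fix-up attaches each loaded
// CObject to its already-tracked parent BObject.
// .Load() comes from System.Data.Entity.QueryableExtensions (EF6).
public void LoadThirdTier(MyDbContext context, List<int> bObjectKeys)
{
    context.CObjects
           .Where(c => bObjectKeys.Contains(c.BObjectId))
           .Load();
}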
Does anyone have a solution for this? It seems like EF7 / EF Core 1.0 solves this by:
Removing lazy loading entirely, which we could shut off as well, but it would break a lot of older code.
Adding a new "ThenInclude" feature, which supposedly increases the efficiency of chaining includes massively.
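For reference, the strongly typed EF Core syntax apparently looks something like this (entity and property names here are just illustrative):

var loaded = context.AObjects
    .Include(x => x.BObjects)
        .ThenInclude(b => b.CObjects)
            .ThenInclude(c => c.DObjects)
    .FirstOrDefault(x => x.PK == somePK);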
If turning off lazy loading is the answer, that's fine; I just want to exhaust all options before I spend a lot of time reworking a huge web app's worth of service calls.
Does anyone have any ideas? I'm willing to give anything a shot. We are using EF6.
EDIT: It seems the answer is to shut off lazy loading at a context level, or upgrade to EF7. I'll change this if anyone else manages to find a solution whereby you can have proxy tracking with forced eager loading on a single object for EF6.
You're absolutely right about chained .Include() statements; the performance documentation warns against chaining more than three, as each .Include() adds an outer join or union to the generated SQL. I didn't know about ThenInclude, but it sounds like a game changer!
If you keep your virtual navigation properties but turn off Lazy Loading on the DbContext
context.Configuration.LazyLoadingEnabled = false; // or via the underlying ObjectContext: ((IObjectContextAdapter)context).ObjectContext.ContextOptions.LazyLoadingEnabled = false;
then (as long as Change Tracking is enabled) you can do the following:
var a = context.aObjectNames.First(x => x.Id == whatever);
context.bObjectNames.Where(x => x.AId == a.Id).Load();
This should populate a.bObjects
Related
I am creating a web application on top of the ASP.NET 6 framework. I am trying to figure out the best ORM to use for this project. I am leaning toward Entity Framework for the following reasons:
I'll be able to use LINQ to write my queries
I'll be able to access my relations easily and directly using native C# models.
Here is where the complication starts. This app will be connecting to a very large database with over 500 tables. Also, the app is going to be broken down into many small logical areas so that it is easier to maintain.
If Entity Framework is the way to go, how should I set up the DbContext so I can manage 500+ DbSets and their relations? In other words, should I create a single DbContext for the entire app even when my app is broken down into multiple areas? Or should I create a DbContext for each area? But if I do that, what if I need to establish relations across multiple areas? For example, a model in A-area needs a relation to a model in B-area and another in C-area. I thought about introducing DbContext inheritance, where CAreaDbContext would inherit from BAreaDbContext, which inherits from AAreaDbContext, but that would break down very quickly.
Is Entity Framework the right framework for a large database app? If so, how can I manage the DbContext across multiple areas? If not, what would be the alternative that doesn't require writing plain SQL queries?
EF is perfectly fine for large databases. When mapping a large number of tables and relationships there is a one-time startup cost for the very first query as EF initializes and validates its mapping, but this is a static cost for the application, not something paid each time a DbContext is instantiated.
You can split the application across several DbContexts to help make organizing entities more logical and reduce those initial setup costs. This is generally referred to as using Bounded Contexts if you want to search for examples. These typically organize your application down to aggregate roots or top-level entities, with everything else falling under those aggregates or serving as lookups, etc. Entities can be registered with multiple DbContexts, though you should aim to ensure that one aggregate root is nominated as responsible for editing and creating a given entity.
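A minimal sketch of what that can look like (the entity and context names below are made up for illustration):

// Two bounded contexts in the same application, each exposing only the
// entities its area needs. Order is registered in both, but only
// OrderingContext is used to create or edit orders; ReportingContext
// treats it as read-only.
public class OrderingContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Order> Orders { get; set; }
}

public class ReportingContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<Invoice> Invoices { get; set; }
}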
The most important consideration with EF, in terms of performance and avoiding unwanted/unexpected behaviour, is to ensure you generally don't load more data than you need through the entities, more often than you need to.
Some general advice would include:
Absolutely AVOID the temptation to use the Generic Repository pattern with EF. Non-generic repositories are great for facilitating unit testing or centralizing important, common rules/validation, but generic flavours lead to inefficient and expensive code, or overly complex code, and usually both.
Keep DbContext lifetimes as short as possible. For web applications this should be no longer than the request (when using an IoC container, for instance), or shorter. Worst case, use using blocks to scope your DbContext. The longer a DbContext is kept alive, the more entities it tracks, and the more it tracks, the more it has to sift through looking for references when loading other entities with navigation properties, and the slower it gets. Long-lived DbContexts can also get "poisoned" when you have an issue attempting to save entity changes: those invalid entities remain tracked by the DbContext and interfere with future, unrelated SaveChanges calls until they are removed (detached) or corrected.
Gain an understanding of projection using Select or AutoMapper's ProjectTo method. Loading entire entity graphs gets expensive, especially if the DbContext is left to track all of those instances. Projection down to ViewModels/DTOs helps ensure that only as much data as is needed is ever loaded and transmitted, and makes it crystal clear what is being passed around (as opposed to passing detached entities, or worse, partially filled detached entities). See the first sketch after this list.
Understand IQueryable and everything that LINQ can bring to working with the data. EF's query building is extremely valuable: you can leverage sorting, filtering, pagination, and projection, as well as getting counts and checking existence (.Any()), all without fetching a ton of data via entities. See point #1 to avoid falling into the trap of hiding this behind a generic repository.
Use ToList/ToListAsync sparingly, and be aware that any logic you feed EF in LINQ expressions needs to be translatable down to SQL. Sometimes you will find yourself trying to build a query where EF complains that it cannot evaluate your expression, for example because it calls private methods or unmapped properties. Adding a ToList before the expression will seem like a magic fix, because it forces client-side evaluation, but this is an expensive operation: you are effectively fetching (and typically tracking) all entities up to that point and then continuing in memory, which gets expensive in memory use. See the second sketch after this list.
Asynchronous methods are not a silver bullet and do not make queries faster. Awaiting asynchronous EF methods is very useful when you have queries that are going to take a while to run or be called extremely often. My advice is to default to synchronous methods and test-run your code against production-like volumes as early as possible. I use 250ms as a threshold, but pick something acceptable to you and profile your queries. Anything over that threshold is something that would likely benefit from being made asynchronous. Typically, searches, especially ones involving text matching, are good candidates, as these can be a bit slow and are generally run fairly frequently by several users at a time. The same goes for any operation that might get called a lot by many users at the same time. async/await doesn't make queries faster; if anything it makes them slightly slower, but it makes your server more responsive by not tying up the request until the query finishes. Using it by default makes your code a touch slower and a bit tougher to debug for no real benefit, as it can easily be introduced where needed.
Profile your queries. With traditional data access you would create your schema and write your access queries (sprocs etc.), creating indexes as you go. With EF building your queries, indexing becomes a more reactive process: you might add your typical indexes up front, but you should look at the queries being run in a production-like scenario to refine indexes based on the high-volume queries EF is building. This also provides key insight into other inefficiencies that might creep into your queries, as well as performance problems like lazy loading being tripped. Expensive queries should be investigated and optimized where possible.
Prepare to employ things like queuing for truly expensive queries. Systems will often call for reports, data exports, or just really expensive query options. Aim to set reasonable expectations by default, for instance avoiding string Contains() in text searches in favour of StartsWith(). Where you do need to support expensive queries, build a mechanism that allows users/processes to queue the query details as a request and employ a background worker/pool to pick up and process those requests. The temptation might be to just use async/await here, but the important thing is to avoid situations where too many of these queries are kicked off at once. Queries like this will "touch" a lot of data, leading to locks and deadlocks in a system. Users have a bad tendency to repeatedly kick off actions when it looks like one isn't responding, which compounds the problem on the back end.
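To make the projection and IQueryable points above concrete, here is a minimal sketch; Order, OrderSummary, and AppDbContext are invented names, not a prescribed design:

// Compose the query as IQueryable so filtering, sorting, paging, and
// projection all happen in SQL; only the DTO columns come back.
public List<OrderSummary> GetRecentOrders(int customerId, int page, int pageSize)
{
    using (var context = new AppDbContext()) // short-lived context
    {
        return context.Orders
            .Where(o => o.CustomerId == customerId)
            .OrderByDescending(o => o.CreatedOn)
            .Skip(page * pageSize)
            .Take(pageSize)
            .Select(o => new OrderSummary
            {
                Id = o.Id,
                CreatedOn = o.CreatedOn,
                Total = o.Lines.Sum(l => l.Price * l.Quantity)
            })
            .ToList(); // the single materialization point
    }
}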
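And a short illustration of the ToList trap; CalculateDiscount stands in for any unmapped method EF cannot translate:

// Anti-pattern: ToList() pulls (and tracks) every order into memory just so
// the untranslatable CalculateDiscount call can run client-side.
var discounted = context.Orders
    .ToList()
    .Where(o => CalculateDiscount(o) > 0)
    .ToList();

// Better: keep the predicate translatable so SQL does the filtering,
// then compute the discount on the small, shaped result set.
var candidates = context.Orders
    .Where(o => o.Total > 100)
    .Select(o => new { o.Id, o.Total })
    .ToList();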
I've been learning DDD for the past few days and struggling to understand some core concepts of aggregate roots. Maybe somebody could give me push in the right direction and elaborate what's best practice in this scenario:
To make this example less complex, let's say we have a domain of two entities: restaurant and opening time. Restaurant is defined as aggregate root.
From what I understood in all the online examples (correct me if I'm mistaken), the aggregate root eagerly loads all sub-entities every time I need an instance of it. So whenever I want to call a method on the restaurant, all opening times are loaded (whether they are used or not).
In this example I want to verify that there is no intersection of opening times when a new time gets added to that restaurant. In this case the eager loading of every other opening time makes sense, because I need to compare the new one with the existing ones.
BUT: this is kind of restricting, because every time I add another collection of something (let's say restaurant images), the SQL load gets heavier and heavier, even though most of the methods only require one of the collections.
I could think of two possible solutions:
Lazy loading
Lazy loading opening times / sub-entities through Entity Framework property proxies. The aggregate root can then exist without eager loading them, but they can still be accessed whenever they are needed.
However, everywhere I searched for an answer I've read that lazy loading in aggregate roots is considered bad practice. Maybe somebody could explain why.
Smaller aggregate roots
Of course I could define the opening time itself as an aggregate root, but then I would need to move the business logic (in this case the verification of intersections) outside of the model.
In all the examples above I'm only talking about the command side (not querying or serializing).
Maybe I'm missing some fundamental ideas. How should the aggregate roots be organized in this example and why is lazy loading considered bad practice?
EDIT
Not sure why this question was closed as "opinion based". I'm asking for best practice and why lazy loading is not considered best practice in this case.
However, everywhere I searched for an answer I've read that lazy loading in aggregate roots is considered bad practice.
That is not exactly the case. The actual restriction is on lazy loading other aggregates from an aggregate root via direct references; referring to other aggregates by their identifiers is recommended instead. This restriction has very good reasons behind it.
Direct references to other aggregates unnecessarily increase the application's memory footprint (retrieving an entity that is not going to be used in a transaction); in highly concurrent use cases where locks are in effect, they degrade application performance (unnecessarily locking an entity); and they hamper data partitioning (both aggregates need to be processed on the same data node).
Each aggregate root defines its own consistency boundary; each transaction is meant to ensure one aggregate's consistency. Update operations (or transactions) across aggregates, communicated through domain events, are supposed to be eventually consistent.
If you are holding direct reference to another aggregate, thereby necessitating lazy loading, and performing updates on them, you should rethink your design.
The choice in your scenario, as usual, depends on the business context or the domain it deals with. If you think OpeningTime is a separate aggregate, with its own consistency boundary, you should hold only its id and publish domain events containing that id, which the OpeningTime side will handle by retrieving the appropriate aggregate. If, however, that is not the case (which seems more likely), you may very well hold the reference, lazy load it, and perform updates on it.
This problem classifies as a set validation problem, and there are a few potential solutions depending on the domain needs.
Strong Consistency
Create an AR to protect the entire set. In the above example, if the set has a reasonable length you may create a dedicated RestaurantSchedule AR. You may even partition such an AR further into a RestaurantWeekSchedule, where you'd have one AR for each week. When you want to add/remove opening days it would create/load the AR for the given week (a sketch follows below). Another option could be to tweak the ORM to load only a subset of the collection, e.g. schedule = scheduleRepo.loadForWeek(openingTime.week()); schedule.add(openingTime);. Optimistic locking would still allow you to detect conflicts.
Enforce the rule in the DB. If you have a relational DB for instance you may have OpeningTime as an AR and then use unique constraints to prevent violations. The rule would end up living outside the domain, but it may be acceptable. Uniqueness rules aren't that interesting.
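A minimal sketch of the dedicated-AR option, assuming invented RestaurantWeekSchedule and OpeningTime types with an Overlaps check (not a prescribed design):

// The aggregate owns the set of opening times for one restaurant/week,
// so the no-overlap invariant is enforced in a single place.
public class RestaurantWeekSchedule
{
    private readonly List<OpeningTime> _openingTimes = new List<OpeningTime>();

    public void Add(OpeningTime newTime)
    {
        if (_openingTimes.Any(existing => existing.Overlaps(newTime)))
            throw new InvalidOperationException("Opening times must not overlap.");

        _openingTimes.Add(newTime);
    }
}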
Eventual Consistency
A very common way to deal with set validation problems is to stop trying to prevent the violation and instead focus on detecting and correcting it after the fact, through either manual or automated compensating actions. This can be as simple as an exception report that displays overlapping entries, or more complex, such as a single-threaded listener to OpeningTimeAdded events that checks for conflicts and marks such entries for correction.
As for your question about why lazy loading is considered bad practice:
One issue I have with lazy loading is that you need to be constantly aware of potential performance issues. On the other hand, lazy loading is a decent default behaviour for ORMs. It is great for long-running applications, but it is pointless to implement it on top of your services. I think it depends a lot on the data you are using, so it's neither good nor bad.
We are using AutoMapper 3.1.1.0 in our .NET application.
We have lots of classes which need to be mapped.
The time required to initialize the mappings is almost 22 seconds.
We have almost 1327 DTOs which need to be mapped.
Each DTO has on average 8 properties.
My concern is that for each message we check the list of 1327 mapped DTOs,
and then use
if (MappingManager.MessageMappings.ContainsKey(message.GetType()))
{
    // Map the incoming message to its registered destination type.
    var myMessage = Mapper.Map(message, message.GetType(), MappingManagerFile.MessageMappings[message.GetType()]);
}
So it hurts performance.
Do we need to Dispose anything after use, or does AutoMapper take care of that itself?
In Task Manager, the component which does this conversion is using a lot of memory.
So please suggest what alternatives we should use to improve performance.
Later versions of AutoMapper lazily compile the configuration. There's still some startup time, discovering and mapping types, but compiling the runtime mapping function is done lazily.
I would suggest trying the 5.0 release and comparing the numbers.
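If you do try the instance-based 5.x API, the setup looks roughly like this (SourceMessage/DestMessage are placeholder types; profiles work just as well):

// Build the configuration once at application startup; the individual
// mapping functions are compiled lazily on first use.
var config = new MapperConfiguration(cfg =>
{
    cfg.CreateMap<SourceMessage, DestMessage>();
    // ... register the remaining maps, or pull them in from profiles
});

IMapper mapper = config.CreateMapper();

// Reuse the single mapper instance; there is nothing to Dispose per call.
var myMessage = mapper.Map(message, message.GetType(), typeof(DestMessage));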
Having that many entities mapped with AutoMapper is going to take some time. Are you eager loading your entities or using lazy loading? I have seen these issues in the past when using lazy loading, as AutoMapper triggers a large number of database hits while walking all of the relational data.
Eager loading may be your best bet here, or I would recommend only loading exactly what you need. Seems like a lot of data to load at once.
I want to add a history feature for one of my application's entities. My application uses LINQ to SQL as its O/RM.
The solution I have selected for saving previous versions of an entity is to serialize the old version and store the serialized versions somewhere.
Anyway, I was looking for the best way to serialize LINQ to SQL entities. One of the best solutions I found was simply setting the serialization mode to Unidirectional on the DataContext and then using a .NET serializer. The method seems to work well.
But my question is: does this method have any side effects on the data context, or any performance hit? For example, does lazy loading of objects still work after turning unidirectional serialization mode on?
I've searched a lot but found nothing.
And do you think I should take another approach to serializing the objects? Any cheaper method?
PS 1: I don't like using DTOs, because I would have to maintain compatibility between the DataContext objects and the DTOs.
PS 2: Please note that the application I'm working on is relatively large and currently under heavy load, so I have to be careful about changes in O/RM behavior or performance issues, because I can't re-review the entire application plus the versions already online.
Thanks a lot :)
Re "and then use .net serializer", the intent of unidirectional mode is to support DataContractSerializer (specifically). The effect is basically just the addition of the [DataContract]/[DataMember] flags, but: based on experience, and despite your reservations, I would strongly advise you to go the DTO route:
ORMs, due to lazy loading of both sub-collections and (sometimes) individual properties, make it very hard to predict exactly what is going to happen
and at the same time, make it hard to monitor when data loads are happening (an N+1 in your serialization code can go unnoticed)
you have very little control over how much of the model (how many levels of descendants, etc.) is going to be serialized
and don't even get me started on parent navigation
the deserialized objects will not have a data-context, which means that their lazy loading won't work - this can lead to unexpected behaviours
you can get into all sorts of problems if you can't version your data model separately from your serialization model
there are some... "peculiarities" that impact LINQ-to-SQL and serialization (from memory, the most "fun" one is a difference in how IList.Add and IList<T>.Add are implemented, meaning that one triggers data-change flags and one doesn't; nasty)
you are making everything dependent on your data model, which also means that you can never change your data model (at least, not easily)
a lot of scenarios that involve DTOs are read-based, not write-based, so you might not even need the ORM overheads in the first place - for example, you might be able to replace some hot code-path pieces with lighter tools (dapper, etc) - you can't do that well if everything is tied to the data model
I strongly advocate biting the bullet and creating a DTO model that matches what you want to serialize; the DTO model would be incredibly simple, and should only take a few minutes to throw together, including the code to map to/from your data model. This will give you total control over what is serialized, while also giving you freedom to iterate without headaches, and do things like use different serializers
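As a rough sketch of what that could look like for the history scenario (OrderHistoryDto and the entity properties are invented for illustration; the attributes come from System.Runtime.Serialization):

// A serialization-only DTO: it owns its shape and versioning,
// independent of the LINQ to SQL data model.
[DataContract]
public class OrderHistoryDto
{
    [DataMember(Order = 1)] public int Id { get; set; }
    [DataMember(Order = 2)] public string Status { get; set; }
    [DataMember(Order = 3)] public DateTime ModifiedOn { get; set; }
}

// Explicit mapping from the entity: only what you choose gets serialized.
static OrderHistoryDto ToHistoryDto(Order order)
{
    return new OrderHistoryDto
    {
        Id = order.Id,
        Status = order.Status,
        ModifiedOn = order.ModifiedOn
    };
}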
We have a rather large ASP.NET MVC project using LINQ to SQL which we are in the process of migrating to Windows Azure.
Now we need to serialize objects for storage in the Azure distributed cache, and setting "Serialization Mode" to "Unidirectional" in the .dbml file, which automatically decorates the generated classes and properties with DataContract and DataMember attributes, seems to be the recommended way. However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing and to come back as null.
What would be the prefered way of proceeding, taking into account a couple of things:
As mentioned, it's a rather large project with the generated *.designer.cs file being close to 1.5MB
Disabling lazy loading completely would most likely be a big performance hit due to many deep class relationships.
Changing ORM tool is something we are considering, but doing this at the same time as switching platforms would probably be a bad thing.
If this boils down to manually specifying which objects and relations to serialize across the whole project, then using something like protobuf-net for some extra performance gains would probably not be a huge step.
However, this causes any relationships not yet loaded by LINQ to SQL to be lost when serializing and to come back as null.
Yes, this is normal and expected when serializing: you are essentially taking a snapshot of what was available at that time, because lazy loading depends on the object being attached to a data context. It would be inadvisable for any tool to crawl the entire model looking for things to load, because that could keep going indefinitely, essentially pulling large chunks of unwanted data into play.
Options:
explicitly fetch the data you are interested in before serializing (either pre-emptively via LoadWith, or by hitting the appropriate properties); see the sketch after this list
or, load the data into a completely separate DTO model for serialization - in many ways, this is a re-statement of the first, since it will by necessity involve iterating over (projecting) the data you want, but it means you are creating the DTO to suit the exact shape you actually want to send
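A minimal sketch of the first option using LINQ to SQL's DataLoadOptions (Customer, Order, OrderLines, and MyDataContext are placeholder names; requires System.Data.Linq):

// Tell LINQ to SQL up front which relationships to load eagerly, so they
// are populated before the snapshot is serialized into the cache.
var options = new DataLoadOptions();
options.LoadWith<Customer>(c => c.Orders);
options.LoadWith<Order>(o => o.OrderLines);

using (var db = new MyDataContext())
{
    db.LoadOptions = options; // must be set before the first query
    var customer = db.Customers.First(c => c.Id == customerId);
    // customer.Orders (and their OrderLines) are now loaded and will
    // survive unidirectional serialization.
}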