Methods for breaking apart a large DbContext with many relationships

A project I'm working on has a DbContext that tracks a lot of different entities. Due to the large number of relationships involved, the first query against the context takes a long time while EF generates its views. To reduce the startup time and better organize contexts into functional areas, I'm looking for ways to split it apart.
These are some methods I've tried so far, and problems I've seen with them:
1. Create a new smaller Context with a subset of DbSets from the huge Context.
This doesn't help, since EF seems to crawl through all the navigation properties and include all related entities anyway (according to LINQPad at least, which shows all the entities related to the Context when it's expanded in the connection panel). We have a few top-level entities that are far reaching, so there are very few subsets that can be fully isolated without removing navigation properties and doing a good amount of refactoring.
2. Split Entities into classes that include navigation properties, and ones that are just db fields, like so:
public class PersonLight
{
public int Id { get; set; }
public string Name { get; set; }
public int JobId { get; set; }
}
public class Person : PersonLight
{
public Job Job { get; set; }
}
public class ContextLight : DbContext
{
public virtual DbSet<PersonLight> People { get; set; }
}
No dice here either. Even though Person isn't used at all, EF (or again, possibly just LINQPad) includes Person despite the fact that it can't be used. I assume this is because EF supports inheritance patterns, so it ends up crawling related entities in this direction as well.
3. Do the same as #2, but with PersonLight and Person in different projects (or use partial classes in different projects). This is the best option so far, but it would be nice to have PersonLight right next to Person for easy reference.
So my questions are:
Are there any better ways to do this that I'm missing?
Why, in #3, does putting them in different projects seem to separate them enough that EF doesn't try to include both? I've tried putting them in different namespaces, but that doesn't do the trick.
Thanks.

Options to speed things along:
Generated views
Bounded Contexts
Ironically, an IIS app pool only needs to generate the views once.
A command-line app, based on my tests, generates the views on every run.
I'm not sure what LINQPad does.
BTW, I didn't originally add this link since you tagged the question EF6.
But in case others aren't on EF6 yet: some performance improvements have been reported for it. More information here:
EF6 Ninja edition
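As a rough illustration of the "Bounded Contexts" option, here is a minimal EF6 sketch of a smaller context that tells the model builder not to follow a navigation property. The names PeopleContext and the shape of Job are assumptions made for the example, and whether this shortens view generation in your model is something you would need to measure.

```csharp
using System.Data.Entity;

public class Job
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int JobId { get; set; }
    public Job Job { get; set; }
}

// Hypothetical bounded context that only maps Person.
public class PeopleContext : DbContext
{
    static PeopleContext()
    {
        // The smaller context should not try to create or migrate the database.
        Database.SetInitializer<PeopleContext>(null);
    }

    public DbSet<Person> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Stop EF from crawling Person.Job into this context's model.
        modelBuilder.Entity<Person>().Ignore(p => p.Job);
        modelBuilder.Ignore<Job>();
    }
}
```

The scalar JobId stays mapped, so code using this context can still read and write the foreign key without loading Job.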

Related

EF Core without additional repository pattern

There seems to be a certain movement advocating that when we use EF Core we should avoid creating our own Repository and Unit of Work patterns, because EF Core already implements those two and we can leverage this implicit implementation. That would be great, because implementing those patterns is not always as straightforward as it would seem.
So here's the problem. When implementing a repository the 'classic' way, we have a place to put the code that builds our domain objects. Let me explain with an example: we have Invoice and InvoiceRow entities. Each Invoice has many InvoiceRows. I included only the navigation properties for brevity.
public class Invoice
{
public int Id { get; set; }
public DateTime Date { get; set; }
public decimal Total { get; set; }
public List<InvoiceRow> InvoiceRows { get; }
}
public class InvoiceRow
{
public int Id { get; set; }
public Invoice Invoice { get; set; }
public decimal UnitPrice { get; set; }
public decimal RowPrice { get; set; }
}
Now, my business object is an Invoice with its rows, and this should be the only way to manipulate the invoices.
When using 'explicit' repository we would do something like:
public class InvoicesRepo
{
public AppDbContext AppDbContext { get; private set; }
public Invoice Find(int id)
{
return
AppDbContext.Invoices.Where(invoice => invoice.Id == id)
.Include(nameof(Invoice.InvoiceRows)) // the string must name the navigation property, not the entity type
.First();
}
}
This restricts the access to the Invoice to the method [InvoicesRepo].Find(id) that builds the invoice in the way that is expected by the domain logic code.
Is it possible to achieve this with bare EF Core? Maybe working with visibility of DbSets and/or additional features that I don't know? Since this seems to be quite a fundamental functionality of a full-blown repository, if it's not achievable, have I just destroyed the main argument of experts advocating for no (additional) repository when using EF Core?

Is it possible to achieve this with bare EF Core? Maybe working with visibility of DbSets and/or additional features that I don't know?
Sure. Accepting that the DbContext is your repository doesn't mean you can't make design decisions or that you have to use the default DbContext design.
You can add reusable data access code to your DbContext for convenience and consistency, e.g. methods like:
public Invoice FindInvoice(int id)
{
return this.Invoices.Where(invoice => invoice.Id == id)
.Include(nameof(Invoice.InvoiceRows))
.First();
}
So for code that needs the standard shape of Invoice with InvoiceRows, they call this method. But for code that needs some nonstandard shape, they still can access the DbSets or IQueryable methods and construct a custom query.
You can even eliminate the DbSet properties, to more strongly guide users to use your custom methods, like:
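// Uses Set<Invoice>() rather than this.Invoices, which would refer to this same property and recurse forever.
public IQueryable<Invoice> Invoices => this.Set<Invoice>().Include(nameof(Invoice.InvoiceRows));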
Then to get Invoices without InvoiceRows a consumer would either add a custom projection to this, something like
db.Invoices.Where(i => i.CustomerID == custId).Select(i => new InvoiceDTO(i)).ToList();
or access the DbSet
var invoice = db.Set<Invoice>().Find(invoiceId);
And you can organize the methods on your DbContext by having it implement various interfaces.
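A minimal sketch of that last idea, assuming EF Core and the Invoice/InvoiceRow entities from the question. The interface name IInvoiceStore and the InvoiceService consumer are hypothetical, introduced only to show how calling code can depend on a narrow interface instead of the whole context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Invoice
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public decimal Total { get; set; }
    public List<InvoiceRow> InvoiceRows { get; set; } = new List<InvoiceRow>();
}

public class InvoiceRow
{
    public int Id { get; set; }
    public Invoice Invoice { get; set; }
    public decimal UnitPrice { get; set; }
    public decimal RowPrice { get; set; }
}

// Hypothetical interface: the only invoice operations that domain code should see.
public interface IInvoiceStore
{
    Invoice FindInvoice(int id);
}

public class AppDbContext : DbContext, IInvoiceStore
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Invoice> Invoices { get; set; }

    // Always returns the invoice together with its rows.
    public Invoice FindInvoice(int id) =>
        Invoices.Include(i => i.InvoiceRows).First(i => i.Id == id);
}

// Hypothetical consumer: it receives the interface, not the full DbContext.
public class InvoiceService
{
    private readonly IInvoiceStore _store;

    public InvoiceService(IInvoiceStore store) { _store = store; }

    public decimal TotalOf(int invoiceId) =>
        _store.FindInvoice(invoiceId).InvoiceRows.Sum(r => r.RowPrice);
}
```

With dependency injection, the same AppDbContext instance could be registered for both AppDbContext and IInvoiceStore, so code that needs a nonstandard query shape can still ask for the context itself.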

Ok, this is just a preliminary answer based on many readings all around.
public class InvoiceQuery
{
public AppDbContext AppDbContext { get; private set; }
public int Id { get; set; }
public Invoice Execute()
{
return AppDbContext
.Invoices
.Include(nameof(Invoice.InvoiceRows))
.Where(invoice => invoice.Id == Id)
.FirstOrDefault();
}
}
My problem was that this is not substantially different from what we would have in a repository. That's from a practical point of view; from a theoretical point of view, this gives you the right perspective, while the repository is somewhat misleading.
The reason it's misleading is that there is no 1-1 association between entities and the actions or queries that you can run on the database. Even in this case, Invoice is not just an invoice, but Invoice plus InvoiceRows. (By the way, I think that InvoiceQuery is a good name (and not InvoiceWithRowsQuery) because from a business logic point of view an invoice is a fully loaded invoice; an invoice with 0 rows is an empty invoice, not a partially loaded invoice.)
So a query focuses on what you get, not on the entity you start from, because they can be more than one.
This "query" name is somewhat counter-intuitive, because one would say that as you move towards the business logic, you stop seeing things like queries and start seeing things like parameters. And we do have parameters; "query" is only the name of the container. Maybe we should call it Business Object Query, which would be too long, but that's the meaning. So this query object is just a simple container for an EF query; I will talk about this later on. In this class-based implementation, besides being glorified as an object, there is no additional functionality. Maybe you would expect some additional functionality; instead you have something less. The missing thing is that queries are no longer associated with an entity. This is another counter-intuitive fact that proves that here we are dismantling something: the contrived entity-repository view.
Having such a simple object raises a question: why not a method on some related class? If we put it on the entity, we go back to the repository-like organization, which seems flawed mainly because the missing 1-1 association means that in many cases you couldn't tell which entity to associate a query with. But some authors use a sort of 'container class' that is really no more than a container, just one that is not tied to an entity (OrdersData, for example). If, like me, you like classes that have a clear purpose, this sends shivers down your spine. We started out saying that we have a solution that is better than the repository, and that the repository has so many problems. We end up with a class that has "Data" in its name. How could a better solution be so lame? It feels like the repository alternative is just a bunch of stuff, not as steady and well designed as expected. On the other hand, this is just a way to collect queries (the same queries that you would have in a repository, the very same queries that you would have in a query/command pattern...) under a name that is not that of an entity.
Yes, all this seems to boil down to a naming choice. Advocates of the repository pattern and advocates of no repository fight each other fiercely. Then you look at the code and what it actually does. The code is the same, and the names are all we are talking about. That is: if you look at what repository advocates do in their repositories and what their opponents do in their (not always clearly named) classes, they do exactly the same things. But this is just an impression; I have yet to test it over time.
In the repository model, having a repository per entity with its methods was reassuring. It provided a sort of scaffolding on which to put all of the methods. Unfortunately that scaffolding seems to be too limiting, again because of the lack of a 1-1 relation between entities and queries/commands. Yet we have to say that root entities like Invoice (in DDD terms, Invoice is the aggregate root and InvoiceRows belong to its aggregate) would fit quite well in a repository-style organization.
For this reason, in this query based solution the flatness of the Query collection is not fully satisfying, and this is another reason why I think I want to further elaborate on this topic.

How do I use the UserManager from project A in project B?

Both project A and project B are ASP.NET Core 2.2 apps.
Project B uses Hangfire for background jobs and does very little else, and the fact that it uses Hangfire may not even be important (more on this at the bottom). Project A enqueues jobs on B's Hangfire.
Now, let's say I have my class representing a task, called Job. This is contained in project C, a plain old class library referenced by project B, and which in turn references other projects containing the entities it's working with.
Dependencies are to be injected into this class through the constructor:
public class Job
{
public Job(UserManager<ApplicationUser> userManager,
IThisRepository thisRepository,
IThatRepository thatRepository)
{
}
public void Execute(string userId)
{
// this is where the work gets done
}
}
and for the most part they do get injected: IThisRepository and IThatRepository are injected and they work... mostly.
In project B's Startup.cs, the one that is supposed to run this job, I manually and successfully registered those interfaces, along with the DbContext that they require and some other stuff.
UserManager was quite a bit harder to register manually because of all the parameters its constructor requires, so since I didn't really need it inside my job, I just decided to make a few changes.
Now, an example of the entities I'm working with is as follows:
public class Category
{
[Key]
public int Id { get; set; }
// several other properties of primitive types
public ApplicationUser User { get; set; }
[Required]
public string UserId { get; set; }
}
public class Dish
{
[Key]
public int Id { get; set; }
// several other properties of primitive types
public ApplicationUser User { get; set; }
[Required]
public string UserId { get; set; }
public Category Category { get; set; }
[Required]
public string CategoryId { get; set; }
}
Now the problem is this: inside of Job I try to create a new Dish and associate it with both the user and the category. Since I just have the user id and I don't have access to UserManager, this is what I try to do:
// ...
var category = await categoryRepository.FindByUserAndCode(userId, "ABC");
// this is a category that is guaranteed to exist
var dish = new Dish();
dish.UserId = userId;
// notice there's no dish.User assignment, because I don't have an ApplicationUser object
dish.Category = category;
dishRepository.Upsert(dish); // internally either creates a new entity or updates the existing one, as appropriate
and this is where it all breaks down, because it says that a category with the same Id I'm trying to insert is already present, so I'm trying to duplicate a primary key.
Since the category with code ABC for this user exists in the db, I thought it was odd.
Here's the thing: the instance of Category that the repository returns does have its UserId property populated, but the User property is null.
I think this is what causes my problem: EF probably sees that the property is null and considers this object a new one.
I don't know why it comes up null (and it does even for other entities that all have a property referencing the user), but I tried to backtrack and, instead of using just the user id, I wanted to try to get Hangfire to instantiate Job injecting UserManager<ApplicationUser> into it, so at least I could get an instance of my user by its id.
It's worth noting that this works in other parts of project A, it's just that when I'm executing the background job something goes horribly wrong and I can't for the life of me figure out what it is.
However the dependencies of UserManager are many, and I fear I might be barking up the wrong tree or doing it completely wrong.
I said that the fact I'm using Hangfire might not matter because the assumption under which it operates is: just give me the name of your class, I'll take care of instantiating it as long as all the dependencies have been registered.
Has anyone done this before who can help shed some light?

You've included an absolute ton of information here that is entirely inconsequential to the problem at hand. What your issue boils down to is simply the exception you're getting when attempting to add a dish: "a category with the same Id I'm trying to insert is already present, so I'm trying to duplicate a primary key."
This is most commonly caused by attempting to use a detached entity as a relationship, i.e.:
dish.Category = category;
If category is detached from the context, then EF will attempt to create it because of this assignment, and since it already exists, that creation fails. We can't see what's going on in categoryRepository.FindByUserAndCode, but I'd imagine you're either calling AsNoTracking with the query, or are newing up an instance of Category manually yourself. In either case, that instance, then, is detached from the context. To attach it again, you simply need to do:
context.Attach(category);
However, you don't have direct access to your context here. This is yet one more reason that you should never use the repository pattern with EF. So much pain and suffering has been inflicted on developers throughout the years by either bad advice or erroneously attempting to do things the way they are used to.
EF is an ORM (object relational mapper), which is a fancy way of saying that it is itself a data layer. The DbContext is the unit of work and each DbSet is a repository... already. The repository pattern is for abstracting low-level database access (i.e. all the crud of constructing SQL strings, for example). EF is already a high-level abstraction, trying to cram it into another repository pattern layer only cripples it and leads to problems like what you're experiencing here.
Long and short, the issue is that category is detached. You need to either ensure that it never becomes detached in the first place (i.e. don't use AsNoTracking for example) or find a way to ensure that it's reattached later. However, your best bet here is to throw away all this repository garbage completely and just use the context directly. Choosing to use an ORM like EF is simply choosing to use a third-party DAL, rather than write your own. Writing your own, anyways, on top of that is just wrong. You use the built in routing framework in ASP.NET Core. You use the built in templating engine (i.e. Razor). Do you feel the need to put some abstraction around those? Of course not, so why is a DAL any different? If you simply must create an abstraction, then use a meaningful one such as CQRS, service layer, or microservices patterns.
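To make the "use the context directly" suggestion concrete, here is a minimal EF Core sketch. The entity shapes are simplified assumptions (the real Category and Dish have more properties), and AppDbContext with its Categories and Dishes sets is a hypothetical name for the asker's context.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Category
{
    public int Id { get; set; }
    public string Code { get; set; }
    public string UserId { get; set; }
}

public class Dish
{
    public int Id { get; set; }
    public string UserId { get; set; }
    public int CategoryId { get; set; }
    public Category Category { get; set; }
}

// Hypothetical context name and DbSet names.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Category> Categories { get; set; }
    public DbSet<Dish> Dishes { get; set; }
}

public class Job
{
    private readonly AppDbContext _context;

    public Job(AppDbContext context) { _context = context; }

    public void Execute(string userId)
    {
        // No AsNoTracking here, so the returned category is tracked by this context.
        var category = _context.Categories
            .Single(c => c.UserId == userId && c.Code == "ABC");

        // If the category came from elsewhere (detached), this line would be needed first:
        // _context.Attach(category);

        var dish = new Dish { UserId = userId, Category = category };
        _context.Dishes.Add(dish);

        // Only the dish is inserted; the tracked category is left unchanged.
        _context.SaveChanges();
    }
}
```

Because the category is loaded and saved through the same context instance, EF knows it already exists and does not try to insert it again.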

What is a proper way of writing entity POCO classes in Entity Framework Core?

EF Core has a "code first mentality" by default, i.e. it is supposed to be used in a code-first manner, and even though database-first approach is supported, it is described as nothing more than reverse-engineering the existing database and creating code-first representation of it. What I mean is, the model (POCO classes) created in code "by hand" (code-first), and generated from the database (by Scaffold-DbContext command), should be identical.
Surprisingly, official EF Core docs demonstrate significant differences. Here is an example of creating the model in code: https://ef.readthedocs.io/en/latest/platforms/aspnetcore/new-db.html And here is the example of reverse-engineering it from existing database: https://ef.readthedocs.io/en/latest/platforms/aspnetcore/existing-db.html
This is the entity class in first case:
public class Blog
{
public int BlogId { get; set; }
public string Url { get; set; }
public List<Post> Posts { get; set; }
}
public class Post
{
public int PostId { get; set; }
public string Title { get; set; }
public string Content { get; set; }
public int BlogId { get; set; }
public Blog Blog { get; set; }
}
and this is the entity class in second case:
public partial class Blog
{
public Blog()
{
Post = new HashSet<Post>();
}
public int BlogId { get; set; }
public string Url { get; set; }
public virtual ICollection<Post> Post { get; set; }
}
The first example is a very simple, quite obvious POCO class. It is shown everywhere in the documentation (except for the examples generated from database). The second example though, has some additions:
Class is declared partial (even though no other partial definition of it is anywhere to be seen).
Navigation property is of type ICollection<T>, instead of just List<T>.
Navigation property is initialized to new HashSet<T>() in the constructor. There is no such initialization in code-first example.
Navigation property is declared virtual.
DbSet members in a generated context class are also virtual.
I've tried scaffolding the model from database (latest tooling as of this writing) and it generates entities exactly as shown, so this is not an outdated documentation issue. So the official tooling generates different code, and the official documentation suggests writing different (trivial) code - without partial class, virtual members, construction initialization, etc.
My question is, trying to build the model in code, how should I write my code? I like using ICollection instead of List because it is more generic, but other than that, I'm not sure whether I need to follow docs, or MS tools? Do I need to declare them as virtual? Do I need to initialize them in a constructor? etc...
I know from the old EF times that virtual navigation properties allow lazy loading, but it is not even supported (yet) in EF Core, and I don't know of any other uses. Maybe it affects performance? Maybe the tools try to generate future-proof code, so that when lazy loading is implemented, the POCO classes and context will be able to support it? If so, can I ditch them as I don't need lazy loading (all data querying is encapsulated in a repo)?
In short, please help me understand why there is a difference, and which style I should use when building the model in code.

I'll try to give a short answer to each point you mentioned.
partial classes are especially useful for tool-generated code. Suppose you want to implement a model-only derived property. For code first, you would just do it, wherever you want. For database first, the class file will be rewritten if you update your model. So if you want to keep your extension code, you want to place it in a different file outside the managed model - this is where partial helps you to extend the class without tweaking the auto-generated code by hand.
ICollection is definitely a suitable choice, even for code first. Your database probably won't support a defined order anyway without a sorting statement.
Constructor initialization is a convenience at least... suppose you have either an empty collection database-wise or you didn't load the property at all. Without the constructor you have to handle null cases explicitly at arbitrary points in code. Whether you should go with List or HashSet is something I can't answer right now.
virtual enables proxy creation for the database entities, which can help with two things: Lazy Loading as you already mentioned and change tracking. A proxy object can track changes to virtual properties immediately with the setter, while normal objects in the context need to be inspected on SaveChanges. In some cases, this might be more efficient (not generally).
virtual IDbSet context entries allow easier design of testing-mockup contexts for unit tests. Other use cases might also exist.
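As a small illustration of the ICollection and initialization points above, here is a hand-written version of the Blog/Post entities that combines the two styles. It is plain C# with no EF dependency, so it only shows the null-handling convenience; the property initializer is an assumed equivalent of initializing in the constructor.

```csharp
using System.Collections.Generic;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }

    // Initialized up front, so callers never see a null collection,
    // whether the blog has no posts or the posts were simply not loaded.
    public ICollection<Post> Posts { get; set; } = new HashSet<Post>();
}
```

With this in place, code such as `new Blog().Posts.Add(new Post { Title = "Hello" })` works without a null check.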

How to insert an ObservableCollection property to a local sqlite-net db?

I have a quick question about the sqlite-net library which can be found here : https://github.com/praeclarum/sqlite-net.
The thing is I have no idea how collections, and custom objects will be inserted into the database, and how do I convert them back when querying, if needed.
Take this model for example:
[PrimaryKey, AutoIncrement]
public int Id { get; set; }
private string _name; // The name of the subject. i.e "Physics"
private ObservableCollection<Lesson> _lessons;

Preface: I've not used sqlite-net; rather, I spent some time simply reviewing the source code on the github link posted in the question.
From the first page on the sqlite-net github site, there are two bullet points that should help in some high level understanding:
Very simple methods for executing CRUD operations and queries safely (using parameters) and for retrieving the results of those query in a strongly typed fashion
In other words, sqlite-net will work well with non-complex models; will probably work best with flattened models.
Works with your data model without forcing you to change your classes. (Contains a small reflection-driven ORM layer.)
In other words, sqlite-net will transform/map the result set of the SQL query to your model; again, will probably work best with flattened models.
Looking at the primary source code of SQLite.cs, there is an InsertAll method and a few overloads that will insert a collection.
When querying for data, you should be able to use the Get<T> method and the Table<T> method, and there is also a Query<T> method you could take a look at. Each should map the results to the type parameter.
Finally, take a look at the examples and tests for a more in-depth look at using the framework.

I've worked quite a bit with SQLite-net in the past few months (including this presentation yesterday)
how collections, and custom objects will be inserted into the database
I think the answer is they won't.
While it is a very capable database and ORM, SQLite-net is targeting lightweight mobile apps. Because of this lightweight focus, the classes used are generally very simple flattened objects like:
public class Course
{
public int CourseId { get; set; }
public string Name { get; set; }
}
public class Lesson
{
public int LessonId { get; set; }
public string Name { get; set; }
public int CourseId { get; set; }
}
If you then need to Join these back together and to handle insertion and deletion of related objects, then that's down to you - the app developer - to handle. There's no auto-tracking of related objects like there is in a larger, more complicated ORM stack.
In practice, I've not found this a problem. I find SQLite-net very useful in my mobile apps.
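A minimal sketch of handling the related objects yourself with sqlite-net, using the flattened Course and Lesson classes above. The attributes and the CourseStore helper are assumptions added for the example; the idea is that the lessons collection is never stored as a property, only rebuilt from rows that share a CourseId.

```csharp
using System.Collections.Generic;
using System.Linq;
using SQLite;

public class Course
{
    [PrimaryKey, AutoIncrement]
    public int CourseId { get; set; }
    public string Name { get; set; }
}

public class Lesson
{
    [PrimaryKey, AutoIncrement]
    public int LessonId { get; set; }
    public string Name { get; set; }

    [Indexed]
    public int CourseId { get; set; }
}

// Hypothetical helper that does the "join" work by hand.
public static class CourseStore
{
    public static void SaveCourse(SQLiteConnection db, Course course, List<Lesson> lessons)
    {
        // After Insert, sqlite-net writes the generated key back to course.CourseId.
        db.Insert(course);

        foreach (var lesson in lessons)
        {
            lesson.CourseId = course.CourseId;
        }

        db.InsertAll(lessons);
    }

    public static List<Lesson> LoadLessons(SQLiteConnection db, int courseId)
    {
        return db.Table<Lesson>().Where(l => l.CourseId == courseId).ToList();
    }
}
```

A view model could then wrap the result of LoadLessons in an ObservableCollection for binding, keeping the persisted classes flat.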

What are the Pros and Cons of having one-to-many bi-directional relationships in Entity Framework (4.1)?

I am working on a new project that makes use of Entity Framework. I was in love with relationship inverses: having navigation properties on both types would make my life much easier.
Problem is: our project manager thinks that using this facility may lead to mistakes and thus is advising us not to use it.
Let's say we have Category and Product:
public class Product
{
public int Id { get; set; }
public string Name { get; set; }
public virtual Category Category { get; set; }
}
public class Category
{
public int Id { get; set; }
public string Name { get; set; }
public virtual ICollection<Product> Products { get; set; }
}
He says that one can, by mistake, set productZ.Category to categoryX and then do a categoryY.Products.Add(productZ), for example. Then what relationship will be persisted when we save the context? The last one? Ok, what if the first one was the correct category? And that this kind of error is hard to debug (which I agree).
I don't think this scenario can happen but I really respect his opinion, he's way more experienced than me, but I really want to convince him to let us have bidirectional navigation properties and the only argument I have is: if everyone is doing it then it cannot be this bad. :)
Can you guys help me with some more solid arguments?

one can by mistake ...
One can also write unit tests to validate that the code sets the expected relations. You can always make a mistake, and removing a navigation property is not a solution to that problem. The solution is to validate that your code does what is expected, i.e. testing and code coverage.
What you ask is more about designing your classes than about mistakes. Is it meaningful to have navigation properties on both sides? Are you adding a product to a category or assigning a category to a product? Do you need to offer both ways? Can a product exist without a category? Are you handling products and categories separately, or are they an aggregate? Based on the answers to these questions, you can sometimes find that a navigation property on one side of the relation is not necessary.
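One way to act on these design questions is to offer a single entry point for changing the relationship, so the two sides cannot disagree. The following is a plain C# sketch under that assumption; it is independent of EF mapping (EF 4.1 proxies expect virtual, settable navigation properties, so this shape would need adapting), and AddProduct is a hypothetical method name.

```csharp
using System.Collections.Generic;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Only Category.AddProduct is meant to change this;
    // 'internal' hides the setter from code in other assemblies.
    public Category Category { get; internal set; }
}

public class Category
{
    private readonly List<Product> _products = new List<Product>();

    public int Id { get; set; }
    public string Name { get; set; }

    // Read-only view: products can only be added through AddProduct.
    public IReadOnlyCollection<Product> Products => _products;

    public void AddProduct(Product product)
    {
        if (product.Category == this) return;

        // Remove the product from its previous category, if any,
        // so both sides of the relationship always agree.
        if (product.Category != null)
        {
            product.Category._products.Remove(product);
        }

        product.Category = this;
        _products.Add(product);
    }
}
```

With this shape, the project manager's scenario (productZ assigned to categoryX, then added to categoryY) ends with productZ in categoryY only, and a unit test can assert exactly that.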

I would agree with your project manager, especially when reading "he's way more experienced than me", as it suggests at least some of the team is a bit green (yourself?) and likely to make the type of mistakes he mentions. The problem you are describing is much like storing a value in multiple databases (sometimes a necessary evil) and not setting one of the databases as the master.
While a bi-directional "relationship" might make things a bit easier, it is not so much easier that I would accept the risk of messing up the data through a simple coding mistake. Given that you can navigate without bi-directional support (perhaps not with as simple a statement, but it can be done), I would need strong arguments to use it.
