Let's assume I have two tables, A and B, with a 1-0..1 relation. I use the Adapter approach. I need to load a collection of A entities in one place, and then load all related B entities for those A entities later. The reason not to use a prefetch is that in most cases I won't need the B entities.
I use LINQ everywhere, so I would like to do it the same way.
The code I am trying to write looks like this:
var linqMetadata = new LinqMetaData(adapter)
{
ContextToUse = new Context()
};
var aCollection = linqMetadata.A.Where(/*some filter*/).ToList();
// ...
var bIds = aCollection.Select(x => x.BId);
var bCollection = linqMetadata.B.Where(x => bIds.Contains(x.BId)).ToList();
The problem is that bCollection and aCollection stay unlinked, i.e. all A entities have B = null and vice versa. I want these references to be set, so that the two graphs are merged into one.
I can join the two collections using LINQ to Objects, but that's not elegant, and it might get much more complicated if both collections contain complex graphs with interconnections that need to be established.
I can write a prefetch from B to A, but that's one extra DB query that is completely unnecessary.
Is there an easy way to get these 2 graphs merged?
The object relation graph is built automatically if you link the related entities: assign the related entity on the master side, or use AddRange for detail collections.
Here's a sample:
foreach (var aEntity in aCollection)
    aEntity.BEntity = bCollection.First(x => x.Id == aEntity.BId);
OR
foreach (var bEntity in bCollection)
    bEntity.AEntity.AddRange(aCollection.Where(x => x.BId == bEntity.Id));
After doing this your object graph is complete.
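If the collections are large, a dictionary lookup avoids rescanning bCollection for every A entity. A minimal sketch, assuming BId is non-nullable and using the navigator names from above:

var bById = bCollection.ToDictionary(b => b.Id);
foreach (var aEntity in aCollection)
{
    // link only when a matching B was actually loaded (the relation is 0..1)
    if (bById.TryGetValue(aEntity.BId, out var bEntity))
        aEntity.BEntity = bEntity;
}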
Since you are already using a Context, you can also define a PrefetchPath and use the Context when loading it later; that way your entities are always linked. Then you can run a LINQ to Objects query to get all the B's that are now in memory:
var bs = aCollection.Select(x => x.B).ToList();
I'm struggling to get my head around how I should implement Dapper in my application. I have an n-tier MVC application and some experience with EF. Even though I think EF is good, I haven't gotten past the learning curve where it flows easily and I stop struggling with performance. In the new project we decided to give Dapper a go, mostly to get control over the SQL and hopefully get good performance.
Background
I created a layered application (core) with these layers:
Web - MVC
Service - business layer to handle the business logic
Data - data layer to access the MS SQL Server
I went ahead and started implementing a UnitOfWork and generic repositories in the data layer.
A normal structure in the database would be:
Order
  ref to User
  ref to Address
OrderLine
  ref to Product
And in many cases I want to retrieve multiple orders with all lines and products.
So what I did was put navigation properties on the entity models, as you would in EF, and populate them with Dapper, using either multi-mapping queries or by splitting the result into the different entities and mapping them into the graph.
The problem
The problem I run into is inserts. I have a SQL extension that maps the properties to table columns, but the navigation properties would also be mapped by default. I realize I can decorate them with attributes and read those during mapping, but as I google I'm becoming aware that maybe I should drop the UnitOfWork pattern, and the repository too, making the data layer "super thin" and just exposing the connection.
The service layer would then call Dapper with the correct SQL, much like what I do today, just without going through the repositories.
I would also drop the navigation properties and fetch each entity on its own, combining them in the ViewModel.
My problem with this: if we take the Order table above, I would have to do something like the following to get a full list (normally paged; I've also left out User/Address):
var listModel = new OrderListViewModel();
var orders = orderService.GetAll();
foreach (var order in orders)
{
    var orderModel = new OrderViewModel(); // also map fields
    var orderLines = orderService.GetOrderLinesForOrder(order.OrderId);
    foreach (var orderLine in orderLines)
    {
        var orderLineModel = new OrderLineViewModel(); // also map fields
        var product = productService.GetProduct(orderLine.ProductId);
        orderLineModel.Product = new ProductViewModel(); // also map fields from product
        orderModel.OrderLines.Add(orderLineModel);
    }
    listModel.Orders.Add(orderModel);
}
This will generate a LOT of queries (almost like EF lazy loading). So instead I could batch the lookups:
var listModel = new OrderListViewModel();
var orders = orderService.GetAll();
// get all order lines for all orders in one call
var orderLines = orderService.GetOrderLinesForOrders(orders.Select(o => o.OrderId).ToArray());
// get all products for all order lines in one call
var products = productService.GetProductsForOrderLines(orderLines.Select(l => l.OrderLineId).ToArray());
foreach (var order in orders)
{
    var orderModel = new OrderViewModel(); // also map fields
    var linesForOrder = orderLines.Where(l => l.OrderId == order.OrderId);
    foreach (var orderLine in linesForOrder)
    {
        var orderLineModel = new OrderLineViewModel(); // also map fields
        var product = products.First(p => p.ProductId == orderLine.ProductId);
        orderLineModel.Product = new ProductViewModel(); // also map fields from product
        orderModel.OrderLines.Add(orderLineModel);
    }
    listModel.Orders.Add(orderModel);
}
This generates a lot fewer SQL queries and is close to optimal in performance, I think. I know there can be a problem with more than 2100 (?) parameters, but I don't think that will be an issue in my case.
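If that limit ever did bite, one workaround is batching the keys. A hedged sketch of what such a service method might do internally (the table and column names are assumptions; Dapper expands the ids list into individual parameters):

public List<OrderLine> GetOrderLinesForOrders(int[] orderIds)
{
    const int batchSize = 2000; // stay below SQL Server's ~2100 parameter cap
    var result = new List<OrderLine>();
    for (var i = 0; i < orderIds.Length; i += batchSize)
    {
        var batch = orderIds.Skip(i).Take(batchSize).ToArray();
        result.AddRange(connection.Query<OrderLine>(
            "SELECT * FROM OrderLine WHERE OrderId IN @ids",
            new { ids = batch }));
    }
    return result;
}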
The problem is that many of our tables have different statuses and many relations to other tables, so I would have to write a lot of these queries all the time.
When I first did repositories and navigation properties, I would do it like this:
repo.Get<Order, OrderLines, Product, Order>(sqlThatWouldJoinAllTables);
// split and map the structure into an Order entity and just return that
That way I could just call orderService.GetAll() and retrieve a graph of order, orderlines and products.
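For reference, the split/map step can be implemented with Dapper's multi-mapping. A sketch of what that repo.Get call might do internally (the table layout and splitOn values are assumptions):

var lookup = new Dictionary<int, Order>();
var result = connection.Query<Order, OrderLine, Product, Order>(
    @"SELECT o.*, ol.*, p.*
      FROM [Order] o
      JOIN OrderLine ol ON ol.OrderId = o.OrderId
      JOIN Product p ON p.ProductId = ol.ProductId",
    (order, line, product) =>
    {
        // keep a single Order instance per key so its lines accumulate on it
        if (!lookup.TryGetValue(order.OrderId, out var o))
        {
            o = order;
            o.OrderLines = new List<OrderLine>();
            lookup.Add(o.OrderId, o);
        }
        line.Product = product;
        o.OrderLines.Add(line);
        return o;
    },
    splitOn: "OrderLineId,ProductId") // first column of each joined entity
    .Distinct()
    .ToList();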
I don't know which of the solutions is "best practice". I've tried to find a good open-source project that uses layers and Dapper, to see some real-world usage, but without success.
The approach of removing navigation properties also removes some of the purpose of the service layer, since in a way I'm moving some of the business logic into the MVC controller.
I can't find a good practice to move forward with; please advise.
If the RDBMS you're using supports JSON, I would suggest wrapping everything you need to insert into a JSON document and sending it to a stored procedure in just one call. The same technique can be used to return a graph of related objects in just one call. The unit of work, really a transaction, is taken care of in the stored procedure itself, which IMHO is also the right place to deal with transactions that operate on data.
This helps enormously in reducing round-trips, at the expense of more CPU used on the database. That's usually not a problem unless you expect a really huge load (more than several thousand concurrent queries per second).
I have written extensively about this here:
https://medium.com/dapper-net/one-to-many-mapping-with-dapper-55ae6a65cfd4
and more specifically the "Complex Custom Handling" sample shows exactly what I mentioned.
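A minimal sketch of the insert side with Dapper (the procedure name, its @json parameter, and the use of System.Text.Json are my assumptions; inside the procedure you would OPENJSON the payload and wrap the inserts in a transaction):

// serialize the whole graph: order + lines + products
var json = JsonSerializer.Serialize(order);
connection.Execute(
    "usp_InsertOrder",                       // hypothetical stored procedure
    new { json },                            // maps to the @json parameter
    commandType: CommandType.StoredProcedure);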
My model looks something like this:
Company
-Locations
Locations
-Stores
Stores
-Products
So I want to make a copy of a Company, and all of its associations should also be copied and saved to the database.
How can I do this if I have the Company loaded in memory?
Company company = DbContext.Companies.Find(123);
If it is tricky, I can loop through each association and create a new object. The IDs will be different but everything else should be the same.
I am using EF 6.
Cloning object graphs with EF is a piece of cake:
var company = DbContext.Companies.AsNoTracking()
.Include(c => c.Locations
.Select(l => l.Stores
.Select(s => s.Products)))
.Where(c => c.Id == 123)
.FirstOrDefault();
DbContext.Companies.Add(company);
DbContext.SaveChanges();
A few things to note here.
AsNoTracking() is vital, because the objects you add to the context shouldn't be tracked already.
Now if you Add() the company, all entities in its object graph will be marked as Added as well.
I assume that the database generates new primary key values (identity columns). If so, EF will ignore the current values from the existing objects in the database. If not, you'll have to traverse the object graph and assign new values yourself.
One caveat: this only works well if the associations are 1:0..n. If there is a n:m association, identical entities may get inserted multiple times. If, for example, Store-Product is n:m and product A occurs at store 1 and store 2, product A will be inserted twice. If you want to prevent this, you should fetch the objects by one context, with tracking (i.e. without AsNoTracking), and Add() them in a new context. By enabling tracking, EF keeps track of identical entities and won't duplicate them. In this case, proxy creation should be disabled, otherwise the entities keep a reference to the context they came from.
More details here: Merge identical databases into one
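A sketch of that tracked, two-context variant (the context class name is my placeholder; navigation names follow the question; EF6):

Company company;
using (var source = new MyDbContext())
{
    // proxies would keep a reference to their source context, so disable them
    source.Configuration.ProxyCreationEnabled = false;
    company = source.Companies
        .Include(c => c.Locations.Select(l => l.Stores.Select(s => s.Products)))
        .First(c => c.Id == 123);
}
using (var target = new MyDbContext())
{
    // the tracked fetch yields one instance per entity, so nothing is duplicated
    target.Companies.Add(company);
    target.SaveChanges();
}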
I would add a method to each model that needs to be cloneable this way; I'd recommend an interface for it as well.
It could be done something like this:
//Company.cs
public Company DeepClone()
{
    var clone = new Company();
    clone.Name = this.Name;
    //...more properties (be careful when copying reference types)
    clone.Locations = new List<Location>(this.Locations.Select(l => l.DeepClone()));
    return clone;
}
You should repeat this basic pattern for every class and "child" class that needs to be copyable. This way each object knows how to create a deep clone of itself and passes responsibility for child objects down to the child class, neatly encapsulating everything.
It could be used this way:
Company copyOfCompany123 = DbContext.Companies.Find(123).DeepClone();
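The interface mentioned above could be as simple as this (a sketch; the name is my own suggestion):

public interface IDeepCloneable<T>
{
    T DeepClone();
}

public class Company : IDeepCloneable<Company> { /* DeepClone as above */ }
public class Location : IDeepCloneable<Location> { /* same pattern */ }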
My apologies if there are any errors in the above code; I don't have Visual Studio available at the moment to verify everything, I'm working from memory.
One other really simple and code-efficient way to deep clone an object, using serialization, can be found in this post: How do you do a deep copy an object in .Net (C# specifically)?
public static T DeepClone<T>(T obj)
{
using (var ms = new MemoryStream())
{
var formatter = new BinaryFormatter();
formatter.Serialize(ms, obj);
ms.Position = 0;
return (T) formatter.Deserialize(ms);
}
}
Just be aware that this can have some pretty serious resource and performance issues depending on your object structure. Every class that you want to use it on must also be marked with the [Serializable] attribute.
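A minimal usage sketch, assuming the Company graph from the question; every type reachable from the object must carry the attribute:

[Serializable]
public class Company { /* properties, Locations, ... */ }

// throws at runtime if anything in the graph is not serializable
var clone = DeepClone(company);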
I am currently working with Entity Framework 4 on a project that is using Table Per Hierarchy to represent one set of classes using a single table. The way this works is that this table represents states and different states are associated with different other classes.
So you might imagine it to look like this, ignoring the common fields all states share:
InactiveState
  has a -> StateStore
ActiveState
  has a -> ActivityLocation
CompletedState
  has a -> CompletionDate
Each state has a collection of all the InventoryItems that belong to it.
Now each item in my inventory has many states, where the last one in the history is the current state. To save list scans, I have a shortcut property that points to the current state of my InventoryItem:
public class InventoryItem : Entity
{
// whole bunch of other properties
public virtual ICollection<InventoryState> StateHistory { get; set; }
public virtual InventoryState LastState { get; set; }
}
The first problem I am having is when I want to find, for example, all the InventoryItems which are in the Active state.
It turns out LINQ to Entities doesn't support GetType(), so I can't use a statement like InventoryRepository.Get().Where(x => x.LastState.GetType() == someType). I can use the is operator, but that requires a fixed type, so rather than being able to have a method like this:
public ICollection<InventoryItem> GetInventoryItemsByState( Type state )
{
return inventoryRepository.Get().Where( x => x.LastState is state );
}
I have to run some kind of if statement on the type before I build the LINQ query, which feels wrong. The InventoryItem list is likely to get large, so I need to do this at the EF level; I can't pull the whole list into memory and use GetType(), for example.
I have two questions in this situation, connected closely enough that I think they can be combined as they probably reflect a lack of understanding on my part:
Is it possible to find a list of items that share a child table type using Linq To Entities?
Given that I am not using Lazy Loading, is it possible to Include related items for child table types using TPH so that, for example, if I have an InactiveState as the child of my InventoryItem I can preload the StateStore for that InactiveState?
Is it possible to find a list of items that share a child table type using Linq To Entities?
I don't think it's possible any other way than with an if/switch that checks the type and builds a filter expression using is T or OfType<T>. You could encapsulate this logic in an extension method, for example, to have a single place to maintain and a reusable method:
public static class Extensions
{
public static IQueryable<InventoryItem> WhereIsLastState(
this IQueryable<InventoryItem> query, Type state)
{
if (state == typeof(InactiveState))
return query.Where(i => i.LastState is InactiveState);
if (state == typeof(ActiveState))
return query.Where(i => i.LastState is ActiveState);
if (state == typeof(CompletedState))
return query.Where(i => i.LastState is CompletedState);
throw new InvalidOperationException("Unsupported type...");
}
}
To be used like this:
public ICollection<InventoryItem> GetInventoryItemsByState(Type state)
{
return inventoryRepository.Get().WhereIsLastState(state).ToList();
}
I don't know if it would be possible to build the i => i.LastState is XXX expression manually with the .NET Expression API, based on the Type passed into the method. (It would interest me too, to be honest, but I have almost no clue about expression manipulation to answer that myself.)
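For what it's worth, building the expression itself looks straightforward with Expression.TypeIs; whether LINQ to Entities can translate the resulting TypeBinaryExpression is exactly the open question above, so treat this as an untested sketch:

// requires System.Linq.Expressions
public static IQueryable<InventoryItem> WhereLastStateIs(
    this IQueryable<InventoryItem> query, Type state)
{
    var i = Expression.Parameter(typeof(InventoryItem), "i");
    var lastState = Expression.Property(i, "LastState");
    var test = Expression.TypeIs(lastState, state); // i.LastState is <state>
    var predicate = Expression.Lambda<Func<InventoryItem, bool>>(test, i);
    return query.Where(predicate);
}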
Given that I am not using Lazy Loading, is it possible to Include related items for child table types using TPH so that, for example, if I have an InactiveState as the child of my InventoryItem I can preload the StateStore for that InactiveState?
I am not sure if I understand that correctly, but generally eager loading with Include does not support filtering or any additional operations on specific included children.
One way to circumvent this limitation and still get the result in a single database roundtrip is using a projection which would look like this:
var result = context.InventoryItems
.Select(i => new
{
InventoryItem = i,
LastState = i.LastState,
StateStore = (i.LastState is InactiveState)
? (i.LastState as InactiveState).StateStore
: null
})
.AsEnumerable()
.Select(x => x.InventoryItem)
.ToList();
If the query is a tracked query (which it is in the example above) and the relationships are not many-to-many (they aren't in your example), the context will fix up the relationships when the entities are loaded into it. That is, InventoryItem.LastState and InventoryItem.LastState.StateStore (if LastState is of type InactiveState) will be set to the loaded entities, as if they had been loaded with eager loading.
I wonder if it is possible to eagerly load related entities for a certain subclass of a given class.
Class structure is below
Order has a relation to many base suborder classes (SuborderBase). MySubOrder inherits from SuborderBase. I want to specify a path for Include() so that the entities related to MySubOrder (Customer) are loaded when loading an Order, but I get an error claiming that there is no relation between SuborderBase and Customer. The relation does exist, but between MySubOrder and Customer.
Below is the query that fails:
Context.Orders.Include("SubOrderBases").Include("SubOrderBases.Customers")
How can I specify that explicitly?
Update: the entity scheme is below.
This is a solution which requires only a single roundtrip:
var orders = Context.Orders
.Select(o => new
{
Order = o,
SubOrderBases = o.SubOrderBases.Where(s => !(s is MyOrder)),
MyOrdersWithCustomers = o.SubOrderBases.OfType<MyOrder>()
.Select(m => new
{
MyOrder = m,
Customers = m.Customers
})
})
.ToList() // <- query is executed here, the rest happens in memory
.Select(a =>
{
a.Order.SubOrderBases = new List<SubOrderBase>(
a.SubOrderBases.Concat(
a.MyOrdersWithCustomers.Select(m =>
{
m.MyOrder.Customers = m.Customers;
return m.MyOrder;
})));
return a.Order;
})
.ToList();
It is basically a projection into an anonymous type collection. Afterwards the query result is transformed into entities and navigation properties in memory. (It also works with disabled tracking.)
If you don't need entities you can omit the whole part after the first ToList() and work directly with the result in the anonymous objects.
If you must modify this object graph and need change tracking, I am not sure if this approach is safe, because the navigation properties are not completely set when the data is loaded. For example, MyOrder.Customers is null after the projection, so setting relationship properties in memory later could be detected as a modification (which it isn't) and cause trouble when you call SaveChanges.
Projections are made for read-only scenarios, not for modifications. If you need change tracking, the probably safer way is to load the full entities in multiple roundtrips, as there is no way to use Include in a single roundtrip to load the whole object graph in your situation.
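Such a multiple-roundtrip version could look like this (untested; it assumes your context exposes a set for SuborderBase and relies on relationship fixup between tracked entities; the MyOrder/Customers names follow the code above):

// roundtrip 1: orders plus all suborders, but without customers
var orders = Context.Orders.Include("SubOrderBases").ToList();

// roundtrip 2: re-query the MyOrders together with their customers;
// since both queries are tracked, fixup attaches the customers to the
// suborder instances already loaded in roundtrip 1
var myOrders = Context.SubOrderBases.OfType<MyOrder>()
    .Include("Customers")
    .ToList();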
Suppose you loaded the orders list as lstOrders; try this:
foreach (var order in lstOrders)
    order.SubOrderBases.Load();
and the same for the Customers.
Given an object graph like:
A { IEnum<B> }
B { IEnum<C>, IEnum<D>, IEnum<E>, ... }
C { IEnum<X> }
How can I eagerly load the entire object graph without N+1 issues?
Here is the pseudo code for the queries that I would ultimately like to execute:
var a = Session.Get<A>(1);                                   // query 1
var bIds = a.Bs.Select(b => b.Id).ToList();                  // query 2 (initializes the B collection)
var cs = Session.CreateQuery("from C where B.Id in (:bIds)")
    .SetParameterList("bIds", bIds).Future<C>();             // query 3 (deferred)
var ds = Session.CreateQuery("from D where B.Id in (:bIds)")
    .SetParameterList("bIds", bIds).Future<D>();             // query 3
var es = Session.CreateQuery("from E where B.Id in (:bIds)")
    .SetParameterList("bIds", bIds).Future<E>();             // query 3
// iterate through cs, ds, es; find the correct B parent, add to its collection manually
The problem I have with this approach is that when I go to add the instances of C, D, and E to the corresponding collection on the parent B, the collection is still proxied, and when .Add() is called the proxy initializes itself and executes more queries. I think NHibernate is not capable of seeing that I already have all of the data in the first-level cache, which is understandable.
I've tried to work around this problem by doing something like this in my Add method:
void Add(IEnumerable<C> items)
{
    // replace the proxied instance to prevent initialization
    _collection = new List<C>();
    foreach (var c in items)
        _collection.Add(c);
}
This gave me the optimal query strategy that I wanted, but it caught up with me later when persisting (NHibernate tracks the original collection by reference somewhere, from what I can tell).
So my question is: how can I load a complex graph with children of children without N+1? The only thing I've come across to date is joining B-C, B-D, and B-E, which is not acceptable in my situation.
We are using NH 2.1.2 with Fluent NHibernate for mapping. An upgrade to v3 of NH, or using hbm files/stored procs/whatever, would not be off the table.
UPDATE:
One of the comments references a join approach, and I did come across a blog that demonstrates it. This workaround is not acceptable in our situation, but it may help someone else: Eager fetch multiple child collections in 1 round trip with NHibernate
UPDATE 2:
Jordan's answer led me to the following posts that are related to my question: Similar Question and Ayende's blog. The pending question at this point is "how can you perform the subselects without a round trip per-path".
UPDATE 3:
I've accepted Jordan's answer even though the subselect solution is not optimal.
You can use subselect fetching, which can be set up in the mapping files. This avoids both N+1 and a cartesian product.
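With Fluent NHibernate (mentioned in the question) that would look roughly like this in B's mapping; the class and collection names are placeholders:

public class BMap : ClassMap<B>
{
    public BMap()
    {
        Id(x => x.Id);
        // each collection is loaded with one extra query for the whole
        // set of B's, instead of one query per B
        HasMany(x => x.Cs).Fetch.Subselect();
        HasMany(x => x.Ds).Fetch.Subselect();
        HasMany(x => x.Es).Fetch.Subselect();
    }
}

In hbm mappings the equivalent is fetch="subselect" on the collection element.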
Firstly, you can change your mappings to load these collections eagerly; see item #4 in this section.
Secondly, I believe the reason your collection seems to load twice is that you first fetch it using a query, and then through the collection property.
Meaning: NHibernate distinguishes between queries generated by the user (like the one you use) and queries it generates itself (like the one that occurs when you first read your C collection). They do not mix.
So when you first read your C collection, NHibernate does not recognize that it has already sent the exact same query to the DB (since that was a user query), and sends it again.
The way to avoid this is to retrieve your C collection via your B entity.