I would like to know if the following scenario is possible with Entity Framework:
I want to load several tables with AsNoTracking, since they are essentially static tables that cannot be changed by the user.
Those tables also happen to be navigation properties of others. Up until now I have relied on the AutoMapping feature of Entity Framework, and I don't use the .Include() or lazy-loading functionality.
So instead of:
var result = from x in context.TestTable
.Include("ChildTestTable")
select x;
I am using it like this:
context.ChildTestTable.Load();
context.TestTable.Load();
var result = context.TestTable.Local;
This works smoothly because the application is designed so that the tables in the database are very small; no table exceeds 600 rows (and that's already a pretty high value in my app).
However, my way of loading data doesn't work with .AsNoTracking().
Is there any way to make it working?
So I can write:
context.ChildTestTable.AsNoTracking().ToList();
var result = context.TestTable.AsNoTracking().ToList();
Instead of:
var result = from x in context.TestTable.AsNoTracking()
.Include("ChildTestTable")
select x;
So basically, I want to have one or more tables loaded with the AutoMapping feature on, but without loading them into the Object State Manager. Is that possible?
The simple answer is no. For normal tracking queries, the state manager is used for both identity resolution (finding a previously loaded instance of a given entity and using it instead of creating a new instance) and fixup (connecting navigation properties together). When you use a no-tracking query it means that the entities are not tracked in the state manager. This means that fixup between entities from different queries cannot happen because EF has no way of finding those entities.
If you were to use Include with your no-tracking query then EF would attempt to do some fixup between entities within the query, and this will work a lot of the time. However, some queries can result in referencing the same entity multiple times and in some of those cases EF has no way of knowing that it is the same entity being referenced and hence you may get duplicates.
I guess the thing you don't really say is why you want to use no-tracking. If your tables don't have a lot of data then you're unlikely to see significant perf improvements, although many factors can influence this. (As a digression, using the ObservableCollection returned by .Local could also impact perf and should not be necessary if the data never changes.) Generally speaking you should only use no-tracking if you have an explicit need to do so since otherwise it ends up adding complexity without benefit.
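To illustrate, here is a minimal sketch (assuming a DbContext named MyContext with the TestTable/ChildTestTable model from the question, and the System.Data.Entity and System.Linq namespaces imported):
using (var context = new MyContext())
{
    // Tracked: Load() puts entities in the state manager, so fix-up connects
    // each TestTable entity to the already-loaded ChildTestTable instances.
    context.ChildTestTable.Load();
    context.TestTable.Load();
    var tracked = context.TestTable.Local.First();   // tracked.ChildTestTable is populated

    // No-tracking: both queries materialize entities, but nothing enters the
    // state manager, so no fix-up happens between the two result sets.
    var children = context.ChildTestTable.AsNoTracking().ToList();
    var parents = context.TestTable.AsNoTracking().ToList();
    // parents.First().ChildTestTable stays null unless Include() is used in the query
}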
Related
I want to cache a never-changing aggregate that is accessible only through a root object (all other entities are reachable only via Reference/HasMany properties on the root object).
Should I use NHibernate (which we are already using) second-level-cache or is it better to build some sort of singleton that provides access to all entities in the aggregate?
I found a blog post about getting everything with MultiQuery but my database does not support it.
The 'old way' to do this would be to
Do a select * from all aggregate tables
Loop the entities and set the References and the Collections manually
Something like:
foreach (var e in Entities)
{
e.Parent = loadedParentEntities.SingleOrDefault(pe => e.ParentId == pe.Id);
}
But surely there is a way to tell NHibernate to do this for me?
Update
Currently I have tried simply fetching everything from the db, hoping NHibernate would do all the reference setting. It does not, however :(
var getRoot = Session.Query<RootObject>().ToList();
var getRoot_hasMany = Session.Query<RootObjectCollection>().ToList();
var getRoot_hasMany_ref = Session.Query<RootObjectCollectionReference>().ToList();
var getRoot_hasMany_hasMany = Session.Query<RootObjectCollectionCollection>().ToList();
Domain:
The root objects are getRoot. These have a collection property 'HasMany'. Each HasMany item has a reference back to getRoot, a reference to another entity (getRoot_hasMany_ref), and a collection of its own (getRoot_hasMany_hasMany). If this doesn't make sense, I'll create an ERD, but the actual structure is not really relevant to the question (I think).
This results in 4 queries being executed. (which is good)
However, when accessing properties like getRoot.First().HasMany.First().Ref or getRoot.First().HasMany.First().HasMany().First(), extra queries are still executed, even though everything should already be known to the ISession.
So how do I tell NHibernate to perform those 4 queries and then build the graphs without using any proxy properties, ... so that I have access to everything even after the ISession went out of scope?
I think there are several questions in one.
I have stopped trying to trick NHibernate too much. I wouldn't access entities from multiple threads, because they are usually not thread safe, at least when using lazy loading. Caching lazy entities is therefore asking for trouble.
I would avoid too many queries by using batch-size, which is by far the cleanest and easiest solution and in most cases "good enough". It's fully transparent to the business logic, which makes it so attractive.
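For illustration, a minimal Fluent NHibernate mapping sketch that enables collection batching (the Id property and the batch size of 25 are assumptions, not from the original post):
using FluentNHibernate.Mapping;

public class RootObjectMap : ClassMap<RootObject>
{
    public RootObjectMap()
    {
        Id(x => x.Id);              // assumed key property
        HasMany(x => x.HasMany)
            .BatchSize(25);         // fetch up to 25 pending collections per round-trip
    }
}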
I would:
Consider not caching the entity at all. Use the NH first-level cache (that is: always load it using session.Get()). Make use of lazy loading when only a small part of the data is used in a single transaction.
If there is a proven need to cache the data, consider turning off lazy loading entirely (by making the entities non-lazy and setting all the collections to non-lazy). Load the entity once and cache it. Still consider thread safety when accessing the data while it is loaded.
Should the entities need to be lazy because some instances of the same type are not in the cache, consider using a DTO-like structure as the cache. Copy all the data into a similar class structure that doesn't consist of entities. This may sound like a lot of additional work, but in the end it will avoid many strange problems and save you a lot of time.
Usually, query time is less important than flush time, which NH uses to find which entities changed in a session. To avoid this, make entities read-only if you can.
If the whole object tree never changes (config settings?), then just load it efficiently with all references/collections initialised:
using (var Session = Sessionfactory.OpenSession())
{
var root = Session.Query<RootObject>().FetchMany(x => x.Collection).ToFutureValue();
Session.Query<RootObjectCollection>().Fetch(x => x.Ref).FetchMany(x => x.Collection).ToFuture();
// Do something with root.Value
}
Right now I'm working on a pretty complex database. Our object model is designed to be mapped to the database. We're using EF 5 with POCO classes, manually generated.
Everything is working, but there are some complaints about performance. I've never had performance problems with EF before, so I'm wondering whether this time I just did something terribly wrong or the problem lies somewhere else.
The main query may be composed of dynamic parameters. I have several if and switch blocks that are conceptually like this:
if (parameter != null) { query = query.Where(c => c.Field == parameter); }
Also, for some complex And/Or combinations I'm using LinqKit extensions from Albahari.
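For reference, a hedged sketch of that kind of composition with LinqKit's PredicateBuilder (the customerId/fromDate parameters and the CustomerId/Date properties are placeholders, not from the original post):
var predicate = PredicateBuilder.True<Order>();

if (customerId != null)
    predicate = predicate.And(o => o.CustomerId == customerId);

if (fromDate != null)
    predicate = predicate.And(o => o.Date >= fromDate);

// AsExpandable() (from LinqKit) lets EF translate the composed expression
var query = context.Orders.AsExpandable().Where(predicate);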
The query is against a big table of "Orders", containing years and years of data. The average use is a 2 months range filter though.
Now when the main query is composed, it gets paginated with a Skip/Take combination, where the Take is set to 10 elements.
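As a sketch, the pagination step presumably looks something like this (the OrderDate ordering and the pageIndex variable are assumptions; EF requires an explicit OrderBy before Skip):
var page = query.OrderByDescending(o => o.OrderDate)
                .Skip(pageIndex * 10)
                .Take(10);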
After all this, the IQueryable is sent through layers, reaches the MVC layer where Automapper is employed.
Here, when Automapper starts iterating (and thus the query is really executed) it calls a bunch of navigation properties, which have their own navigation properties and so on. Everything is set to Lazy Loading according to EF recommendations to avoid eager loading if you have more than 3 or 4 distinct entities to include. My scenario is something like this:
Orders (maximum 10)
Many navigation properties under Order
Some of these have other navigation under them (localization entities)
Order details (many order details per order)
Many navigation properties under each Order detail
Some of these have other navigation under them (localization entities)
This easily leads to a total of 300+ queries for a single rendered "page". Each of those queries is very fast, running in a few milliseconds, but still there are 2 main concerns:
The lazy loaded properties are called in sequence and not parallelized, thus taking more time
As a consequence of the previous point, there is some dead time between each query, as the database has to receive the SQL, run it, and return the result, and so on for each query.
Just to see how it went, I tried to make the same query with eager loading, and as I predicted it was a total disaster, with a translated SQL of more than 7K lines (yes, seven thousand) and much slower overall.
Now I'm reluctant to think that EF and Linq are not the right choice for this scenario. Some are saying that if they were to write a stored procedure which fetches all the needed data, it would run tens of times faster. I don't believe that to be true, and we would lose the automatic materialization of all related entities.
I thought of some things I could do to improve, like:
Table splitting to reduce the selected columns
Turn off object tracking, as this scenario is read only (have untracked entities)
With all of this said, the main complaint is that the result page (done in MVC 4) renders too slowly, and after a bit of diagnostics it seems to be all "Server Time" and not "Network Time", taking about 8 to 12 seconds of server time.
From my experience, this should not be happening. I'm wondering if I'm approaching this query need in the wrong way, or if I have to turn my attention to something else (maybe a badly configured IIS server, or something else entirely; I'm really clueless). Needless to say, the database has its indexes in order, checked very carefully by our DBA.
So if anyone has any tip, advice, best practice I'm missing about this, or just can tell me that I'm dead wrong in using EF with Lazy Loading for this scenario... you're all welcome.
For a very complex query that brings up tons of hierarchical data, stored procs won't generally help you performance-wise over LINQ/EF if you take the right approach. As you've noted, the two "out of the box" options with EF (lazy and eager loading) don't work well in this scenario. However, there are still several good ways to optimize this:
(1) Rather than reading a bunch of entities into memory and then mapping via automapper, do the "automapping" directly in the query where possible. For example:
var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
// by deferring the load until here, we can bring only the information we actually need
// into memory with a single query
.ToList();
This approach works really well if you only need a subset of the fields in your complex hierarchy. Also, EF's ability to select hierarchical data makes this much easier than using stored procs if you need to return something more complex than flat tabular data.
(2) Run multiple LINQ queries by hand and assemble the results in memory. For example:
// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
var orderIds = myOrders.Select(o => o.Id);
var myDetails = context.Details.Where(d => orderIds.Contains(d.OrderId)).ToLookup(d => d.OrderId);
// reassemble in memory
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());
This works really well when you need all the data and still want to take advantage of as much EF materialization as possible. Note that in most cases a stored proc approach can do no better than this (it's working with raw SQL, so it has to run multiple tabular queries), but it can't reuse logic you've already written in LINQ.
(3) Use Include() to manually control which associations are eager-loaded. This can be combined with #2 to take advantage of EF loading for some associations while giving you the flexibility to manually load others.
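For example, a sketch combining the two (the Customer navigation property is illustrative, not from the original post; the lambda form of Include() requires the System.Data.Entity namespace):
var myOrders = myOrdersQuery
    .AsNoTracking()
    .Include(o => o.Customer)   // let EF eager-load this association
    .ToList();
// other associations (e.g. Details) can still be loaded separately and
// stitched together in memory as in option (2)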
Try to think of an efficient yet simple sql query to get the data for your views.
Is it even possible?
If not, try to decompose (denormalize) your tables so that fewer joins are required to get the data. Also, are there efficient indexes on the table columns to speed up data retrieval?
If yes, forget EF, write a stored procedure and use it to get the data.
Turning tracking off for selected queries is a must for a read-only scenario. Take a look at my numbers:
http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html
As you can see, the difference between the tracking and no-tracking scenarios is significant.
I would experiment with eager loading, but not everywhere (so you don't end up with a 7K-line query), only in selected subqueries.
One point to consider: EF definitely helps make development time much quicker. However, you must remember that when you're returning lots of data from the DB, EF is using dynamic SQL. This means that 1. EF must create the SQL, and 2. SQL Server then needs to create an execution plan. This happens before the query is run.
When using stored procedures, SQL Server can cache the execution plan (which can be edited for performance), which does make it faster than using EF. BUT... you can always create your stored proc and then execute it from EF. Any complex procedures or queries I would convert to stored procs and then call from EF. Then you can see your performance gain(s) and reevaluate from there.
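For illustration, a hedged sketch of calling an existing stored procedure through EF (the procedure name, parameters and the OrderSummaryDto type are placeholders, not from the original post):
var summaries = context.Database
    .SqlQuery<OrderSummaryDto>(
        "EXEC dbo.GetOrderSummaries @from, @to",
        new SqlParameter("@from", fromDate),
        new SqlParameter("@to", toDate))
    .ToList();
// note: results of Database.SqlQuery are not tracked by the context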
In some cases, you can use Compiled Queries (see MSDN) to improve query performance drastically. The idea is that if you have a common query that is run many times and might generate the same SQL call with different parameters, you compile the query the first time it's run and then pass it around as a delegate, eliminating the overhead of Entity Framework regenerating the SQL for each subsequent call.
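A minimal sketch of the pattern (CompiledQuery works against an ObjectContext-derived context, in the System.Data.Objects namespace up to EF5; with DbContext, EF5 auto-compiles queries instead. MyObjectContext and the Order/CustomerId names are illustrative):
static readonly Func<MyObjectContext, int, IQueryable<Order>> OrdersByCustomer =
    CompiledQuery.Compile((MyObjectContext ctx, int customerId) =>
        ctx.Orders.Where(o => o.CustomerId == customerId));

// usage: the expression tree is translated to SQL only once
var orders = OrdersByCustomer(context, 42).ToList();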
I am new to Entity Framework.
And I have one concern:
I need to walk through quite a big amount of data that is gathered via a LINQ to Entities query that combines a couple of properties from different entities in an anonymous type.
If I need to read the returned items of this query one by one until the end, am I at risk of an OutOfMemory exception because the collection is BIG, or does EF use a SqlDataReader implicitly?
(Or should I use an EntityDataReader to ensure that I am reading the DB sequentially? But then I'd have to generate my query as a string, I guess.)
As I see it there are two things you can do. First, turn off tracking by using .AsNoTracking(); this will in most cases roughly cut your memory footprint in half, which may be enough.
If your set is still too big, use Skip and Take to pull down the result set in chunks. You should use this in conjunction with AsNoTracking() to ensure no memory is consumed by tracking.
EDIT:
For example, you could use something like the following to loop through all items in chunks of 1000. The code below should only hold 1000 items in memory at a time.
int numberOfItems = ctx.MySet.Count();
for (int i = 0; i < numberOfItems; i += 1000)
{
    // Skip requires a stable ordering in EF; an Id key property is assumed here
    foreach (var item in ctx.MySet.AsNoTracking().OrderBy(x => x.Id).Skip(i).Take(1000).AsEnumerable())
    {
        // do stuff with your entity
    }
}
If the amount of data is as big as you say, I would recommend that you don't use EF for this case. EF is great, but sometimes you need to fall back to standard SQL to get better performance.
Take a look at Dapper.NET https://github.com/SamSaffron/dapper-dot-net
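A hedged sketch of the same kind of read with Dapper (the table/column names and connectionString are illustrative, not from the original post):
using (var conn = new SqlConnection(connectionString))
{
    var orders = conn.Query<Order>(
        "select * from Orders where OrderDate >= @from and OrderDate < @to",
        new { from = fromDate, to = toDate });
}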
If you really want to use EF in every case, I would recommend that you use Bounded Contexts (multiple DbContexts).
Splitting your model in multiple smaller contexts will improve the performance as you will use less resources when EF creates an in-memory model of the context.
The larger the context is, the more resources are expended to generate and maintain that in-memory model.
Also, you can Detach and Attach entities when sharing instances between multiple contexts, so you still don't load the whole model.
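A minimal sketch of what such a split into bounded contexts might look like (the context and entity names are illustrative; the entity classes are assumed to exist):
public class OrderingContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderDetail> OrderDetails { get; set; }
}

public class CatalogContext : DbContext
{
    public DbSet<Product> Products { get; set; }
    public DbSet<Category> Categories { get; set; }
}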
I will start by saying I have already looked thoroughly in stack overflow, nhusers and the documentation for a possible solution to my issue.
I need to be able to query only the base class table in parts of my multi/future query when eagerly loading associations (although from the research I have done, I don't think this is possible).
I have started to map an existing schema using fluent nhibernate as a proof of concept. I have mapped an inheritance hierarchy using table per sub class (The mappings all work perfectly fine so I won't paste them all in here). The hierarchy has around 15 sub classes and the base class has some additional associations. E.g.
Base
Dictionary<string, Attribute> Attributes
List<EntityChange> Changes
I need to eagerly load both of the collections as in the given scenario they are required for post processing and lazily loading them causes performance issues. I am eagerly loading them by a multi / future query:
var baseQuery = session.CreateCriteria<Base>("b")
.CreateCriteria("Nested", JoinType.LeftOuterJoin)
.CreateCriteria("Nested2", JoinType.LeftOuterJoin)
.CreateCriteria("Nested2.AdditionalNested", JoinType.LeftOuterJoin);
var logsQuery = session.CreateCriteria<Base>("b").CreateAlias("Changes", "c", JoinType.LeftOuterJoin,
Expression.And(Expression.Ge("c.EntryDate", changesStartDate), Expression.Le("c.EntryDate", changesEndDate)))
.AddOrder(Order.Desc("c.EntryDate"));
var attributesQuery = session.CreateCriteria<Base>("t").SetFetchMode("Attributes", FetchMode.Join);
logsQuery.Future<Base>();
attributesQuery.Future<Base>();
var results = baseQuery.Future<Base>().ToList();
The queries execute and return the correct results. But just to eagerly load the associations in this manner means that the attribute and changes queries have to perform a polymorphic fetch (the addition of about 15 left outer joins per query that aren't required). I know this is required for polymorphic querying but the base query will return the hierarchy that I desire. The other parts of the multi query that are issuing a polymorphic query are redundant.
I haven't yet mapped the whole of the hierarchy, so there will be additional unnecessary joins being performed, and there are also other associations that could be loaded up front. These two combined, even without an increase in volume, will lead to performance issues. The performance of this query is currently about 6 seconds (which admittedly is better than the 20 it was taking), but by messing around a bit with the query and taking out the extra joins I can get it down to about 2 seconds (this is a common query, so getting it as low as possible is beneficial, not just pleasing to me. It will also be run from multiple distributed machines, so I would rather not get into a discussion about caching / 2nd-level caching).
I have tried
Using the class modifier in the query ('class = base'). I initially did this blindly, but I believe it is for discriminator values. Even if it is used for the case statement in the SQL, it will not prevent the extra joins.
Doing everything in a single query. This is slower than splitting it up as above and gives a Cartesian product.
Using 'Polymorphism.Explicit();' in the fluent mappings. This has no effect as I am using ClassMap with SubclassMaps so it is ignored. I tried changing all the maps to ClassMaps and using Join but this didn't give the desired behaviour.
Tried to trick nhibernate into joining the base class table onto itself for loading associations (basically load a more concrete type to prevent the polymorphic query) - create a derived class 'BaseOnlyLoading' which uses the same table and primary key as the base class. This was obviously a hack but I was just trying to see what's possible. NHibernate doesn't allow the class and sub class to use the same table.
Define the BaseOnlyLoadingMap to be a ClassMap with the same associations as the BaseMap, with a join back onto the Base. This was hopeful, as association collections are resolved in the context based on full type name.
Use an interceptor which modifies the SQL before it is executed. I wouldn't use this in production and just tried it out of interest. I passed an interceptor into a local session. This caused issues and I didn't proceed.
The HQL 'Type' query operator, as explained here, although I am not sure this has been implemented in the .NET version, and it might behave similarly to 1.
There is a comment on the highest-rated answer (How to perform a non-polymorphic HQL query in Hibernate?) which suggests overriding IsExplicitPolymorphism on the persister. I had a quick look and, from what I remember, the persister was either global per entity or created in the SessionImpl from a static factory, which would prevent doing this. Even if this was possible, I am not sure what sort of side effects it would have.
I tried using some SQL to load everything but even if I use a stored proc I am not sure how nhibernate will piece the graph back together. Maybe I could specify all the entities and aliases?
Specifying explicit polymorphism per query would be nice. Any suggestions?
Thanks in advance.
Using LINQ to Entities sounds like a great way to query against a database and get actual CLR objects that I can modify, data bind against and so forth. But if I perform the same query a second time do I get back references to the same CLR objects or an entirely new set?
I do not want multiple queries to generate an ever growing number of copies of the same actual data. The problem here is that I could alter the contents of one entity and save it back to the database but another instance of the entity is still in existence elsewhere and holding the old data.
Within the same DataContext, my understanding is that you'll always get the same objects - for queries which return full objects instead of projections.
Different DataContexts will fetch different objects, however - so there's a risk of seeing stale data there, yes.
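A small sketch of what that means in practice (assuming a context class named MyContext with a Customers set; the names are illustrative):
using (var ctx = new MyContext())
{
    var a = ctx.Customers.First(c => c.Id == 1);
    var b = ctx.Customers.First(c => c.Id == 1);
    // ReferenceEquals(a, b) is true: identity resolution within the same context
}

using (var ctx2 = new MyContext())
{
    var c = ctx2.Customers.First(x => x.Id == 1);
    // a different instance than the one above, so it may hold stale data
}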
In the same DataContext you would get the same object if it's queried (DataContext maintains internal cache for this).
Be aware that the objects you deal with are most likely mutable, so instead of one problem (data duplication) you can get another (concurrent access).
Depending on the business case, it may be ok to let the second transaction with stale data fail on commit.
Also, imagine a good old IDataReader/DataSet scenario. Two queries would return two different readers that would fill different datasets. So the data duplication problem isn't ORM specific.
[oops; note that this reply applies to Linq-to-SQL, not Entity Framework.]
I've left it here (rather than delete) because it is partly on-topic, and might be useful.
Further to the other replies, note that the data-context also has the ability to avoid doing a round-trip for simply "by primary key" queries - it will check the cache first.
Unfortunately, it was completely broken in 3.5, and is still half-broken in 3.5SP1, but it works for some queries. This can save a lot of time if you are getting individual objects.
So basically, IIRC you need to use:
// uses object identity cache (IIRC)
var obj = ctx.Single(x=>x.Id == id);
But not:
// causes round-trip (IIRC)
var obj = ctx.Where(x=>x.Id == id).Single();