Does LINQ to Entities reuse instances of objects?

Using LINQ to Entities sounds like a great way to query against a database and get actual CLR objects that I can modify, data bind against and so forth. But if I perform the same query a second time do I get back references to the same CLR objects or an entirely new set?
I do not want multiple queries to generate an ever-growing number of copies of the same actual data. The problem here is that I could alter the contents of one entity and save it back to the database while another instance of the entity still exists elsewhere, holding the old data.

Within the same DataContext, my understanding is that you'll always get the same objects - for queries which return full objects instead of projections.
Different DataContexts will fetch different objects, however - so there's a risk of seeing stale data there, yes.
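A quick sketch of that behaviour (MyContext and its Customers set are hypothetical stand-ins):
using (var context = new MyContext())
{
    // Two full-entity queries in one context: the identity map hands back
    // the same instance when the second query materializes its results.
    var first = context.Customers.Single(c => c.Id == 1);
    var second = context.Customers.Single(c => c.Id == 1);
    Console.WriteLine(ReferenceEquals(first, second)); // True

    // A projection materializes fresh objects and bypasses identity resolution.
    var copy = context.Customers
                      .Where(c => c.Id == 1)
                      .Select(c => new { c.Id, c.Name })
                      .Single();
}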

In the same DataContext you would get the same object when it is queried again (the DataContext maintains an internal identity cache for this).
Be aware that the objects you deal with are most likely mutable, so instead of one problem (data duplication) you may get another (concurrent access).
Depending on the business case, it may be acceptable to let the second transaction, holding stale data, fail on commit.
Also, consider the good old IDataReader/DataSet scenario: two queries would return two different readers that would fill different DataSets, so the data-duplication problem isn't ORM-specific.

[oops; note that this reply applies to Linq-to-SQL, not Entity Framework.]
I've left it here (rather than delete) because it is partly on-topic, and might be useful.
Further to the other replies, note that the data context also has the ability to avoid doing a round-trip for simple "by primary key" queries - it will check the cache first.
Unfortunately, it was completely broken in 3.5, and is still half-broken in 3.5SP1, but it works for some queries. This can save a lot of time if you are getting individual objects.
So basically, IIRC you need to use:
// uses object identity cache (IIRC)
var obj = ctx.Single(x => x.Id == id);
But not:
// causes a round-trip (IIRC)
var obj = ctx.Where(x => x.Id == id).Single();

Related

Load Entities AsNoTracking() with navigation properties, without specifying includes

I would like to know if the following scenario is possible with Entity Framework:
I want to load several tables with the option AsNoTracking, since they are all essentially static tables that cannot be changed by the user.
Those tables also happen to be navigation properties of others. Up till now I have relied on the AutoMapping feature of Entity Framework, and don't use the .Include() or LazyLoading functionality.
So instead of:
var result = from x in context.TestTable
                        .Include("ChildTestTable")
             select x;
I am using it like this:
context.ChildTestTable.Load();
context.TestTable.Load();
var result = context.TestTable.Local;
This works smoothly because the application is designed so that the tables in the database are very small; no table will exceed 600 rows (and that's already a high value in my app).
However, my way of loading data doesn't work with .AsNoTracking().
Is there any way to make it work?
So that I can write:
context.ChildTestTable.AsNoTracking().List();
var result = context.TestTable.AsNoTracking().List();
Instead of:
var result = from x in context.TestTable.AsNoTracking()
                        .Include("ChildTestTable")
             select x;
So basically, I want to have one or more tables loaded with the AutoMapping feature on, but without loading them into the Object State Manager. Is that possible?
The simple answer is no. For normal tracking queries, the state manager is used for both identity resolution (finding a previously loaded instance of a given entity and using it instead of creating a new instance) and fixup (connecting navigation properties together). When you use a no-tracking query it means that the entities are not tracked in the state manager. This means that fixup between entities from different queries cannot happen because EF has no way of finding those entities.
If you were to use Include with your no-tracking query then EF would attempt to do some fixup between entities within the query, and this will work a lot of the time. However, some queries can result in referencing the same entity multiple times and in some of those cases EF has no way of knowing that it is the same entity being referenced and hence you may get duplicates.
I guess the thing you don't really say is why you want to use no-tracking. If your tables don't have a lot of data then you're unlikely to see significant perf improvements, although many factors can influence this. (As a digression, using the ObservableCollection returned by .Local could also impact perf and should not be necessary if the data never changes.) Generally speaking you should only use no-tracking if you have an explicit need to do so since otherwise it ends up adding complexity without benefit.
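To make the difference concrete, here is a sketch using the question's own sets: the tracked path relies on state-manager fixup across separate queries, while the no-tracking path must bring the related rows back within a single query.
// Tracked: Load() puts entities into the state manager, which performs
// identity resolution and connects navigation properties (fixup).
context.ChildTestTable.Load();
context.TestTable.Load();
var tracked = context.TestTable.Local; // navigations wired up by fixup

// No tracking: fixup between separate queries cannot happen, so related
// rows have to come back in the same query via Include.
var untracked = context.TestTable
                       .AsNoTracking()
                       .Include("ChildTestTable")
                       .ToList();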

NHibernate - Eager load graphs of objects with multiple queries

How should I cache a never-changing aggregate that is accessible through a root object only (all other entities are reachable only via Reference/HasMany properties on the root object)?
Should I use NHibernate (which we are already using) second-level-cache or is it better to build some sort of singleton that provides access to all entities in the aggregate?
I found a blog post about getting everything with MultiQuery but my database does not support it.
The 'old way' to do this would be to
Do a select * from all aggregate tables
Loop the entities and set the References and the Collections manually
Something like:
foreach (var e in Entities)
{
    e.Parent = loadedParentEntities.SingleOrDefault(pe => e.ParentId == pe.Id);
}
But surely there is a way to tell NHibernate to do this for me?
Update
Currently I have tried simply fetching everything from the DB, hoping NHibernate does all the reference setting. It does not, however :(
var getRoot = Session.Query<RootObject>().ToList();
var getRoot_hasMany = Session.Query<RootObjectCollection>().ToList();
var getRoot_hasMany_ref = Session.Query<RootObjectCollectionReference>().ToList();
var getRoot_hasMany_hasMany = Session.Query<RootObjectCollectionCollection>().ToList();
Domain:
Root objects are getRoot. These have a collection property HasMany. Each HasMany item has a reference back to the root, a reference to another entity (getRoot_hasMany_ref), and a collection of its own (getRoot_hasMany_hasMany). If this doesn't make sense, I'll create an ERD, but the actual structure is not really relevant to the question (I think).
This results in 4 queries being executed (which is good).
However, when accessing properties like getRoot.First().HasMany.First().Ref or getRoot.First().HasMany.First().HasMany.First(), extra queries are still executed, even though everything should already be known to the ISession.
So how do I tell NHibernate to perform those 4 queries and then build the graphs without using any proxy properties, so that I have access to everything even after the ISession goes out of scope?
I think there are several questions in one.
I stopped trying to trick NHibernate too much. I wouldn't access entities from multiple threads, because they are usually not thread-safe, at least when using lazy loading. Caching lazy entities is therefore something evil.
I would avoid too many queries by using batch-size, which is by far the cleanest and easiest solution and in most cases "good enough". It's fully transparent to the business logic, which makes it so cool.
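As an illustration of the batch-size approach, a minimal mapping-by-code sketch; RootObject and HasMany come from the question, while the mapping class and the FK column name are assumptions:
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class RootObjectMap : ClassMapping<RootObject>
{
    public RootObjectMap()
    {
        Id(x => x.Id);
        Bag(x => x.HasMany,
            m =>
            {
                m.Key(k => k.Column("RootObjectId")); // assumed FK column
                m.Lazy(CollectionLazy.Lazy);
                // Initialize up to 20 pending HasMany collections with one
                // query instead of one query per root object.
                m.BatchSize(20);
            },
            r => r.OneToMany());
    }
}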
I would:
Consider not caching the entity at all. Use NH's first-level cache (that is: always load it using session.Get()). Make use of lazy loading when only a small part of the data is used in a single transaction.
If there is a proven need to cache the data, consider turning off lazy loading altogether (by making the entities non-lazy and setting all the collections to non-lazy). Load the entity once and cache it. Still consider thread safety when accessing the data while it is loaded.
Should the entities need to stay lazy, because some instances of the same type are not in the cache, consider using a DTO-like structure as the cache. Copy all the data into a similar class structure whose classes are not entities. This may sound like a lot of additional work, but in the end it will avoid many strange problems and save you much time.
Usually, query time matters less than flush time, which is the time NH spends finding which entities changed in a session. To avoid it, make entities read-only if you can.
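A short sketch of the read-only suggestion, assuming NHibernate's read-only session APIs and a rootId variable:
using (var session = sessionFactory.OpenSession())
{
    // Read-only entities are skipped by dirty-checking at flush time.
    session.DefaultReadOnly = true; // applies to everything this session loads

    var root = session.Get<RootObject>(rootId);
    // ...or mark individual instances instead:
    // session.SetReadOnly(root, true);
}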
If the whole object tree never changes (config settings?), then just load it efficiently with all references/collections initialised:
using (var session = sessionFactory.OpenSession())
{
    var root = session.Query<RootObject>().FetchMany(x => x.Collection).ToFutureValue();
    session.Query<RootObjectCollection>().Fetch(x => x.Ref).FetchMany(x => x.Collection).ToFuture();
    // Do something with root.Value
}

How does the Entity Framework walk through a collection that is too big?

I am new to Entity Framework, and I have one concern:
I need to walk through quite a large amount of data that is gathered via a LINQ to Entities query that combines a couple of properties from different entities in an anonymous type.
If I need to read the returned items of this query one by one until the end, am I at risk of an OutOfMemory exception because the collection is BIG, or does EF use a SqlDataReader implicitly?
(Or should I use an EntityDataReader to ensure that I am reading the DB in sequential order? But then I would have to generate my query as a string, I guess.)
As I see it there are two things you can do. Firstly, turn off tracking by using .AsNoTracking(); this will in most cases cut your memory set in half, which may be enough.
If your set is still too big, use Skip and Take to pull down the result set in chunks. You should use this in conjunction with AsNoTracking to ensure no memory is consumed by tracking.
EDIT:
For example, you could use something like the following to loop through all items in chunks of 1000. The code below should only hold 1000 items in memory at a time.
int numberOfItems = ctx.MySet.Count();
for (int i = 0; i < numberOfItems; i += 1000)
{
    // EF requires an explicit ordering before Skip/Take; Id is assumed
    // to be the key here.
    foreach (var item in ctx.MySet.AsNoTracking()
                                  .OrderBy(x => x.Id)
                                  .Skip(i)
                                  .Take(1000))
    {
        // do stuff with your entity
    }
}
If the amount of data is as big as you say, I would recommend that you don't use EF for such a case. EF is great, but you sometimes need to fall back to standard SQL to get better performance.
Take a look at Dapper.NET: https://github.com/SamSaffron/dapper-dot-net
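To illustrate the fallback, a minimal Dapper sketch; the table, columns, and POCO are assumptions, and the key detail is buffered: false, which streams rows over the underlying data reader instead of materializing the whole result set:
using System;
using System.Data.SqlClient;
using Dapper;

public class ContractRow // hypothetical shape of one result row
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class StreamingExample
{
    public static void Run(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            var rows = conn.Query<ContractRow>(
                "select Id, Name from Contracts", // assumed table
                buffered: false); // stream instead of buffering everything

            foreach (var row in rows)
            {
                Console.WriteLine(row.Name); // one row in memory at a time
            }
        }
    }
}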
If you really want to use EF in every case, I would recommend that you use Bounded Contexts (multiple DbContexts).
Splitting your model into multiple smaller contexts will improve performance, as you will use fewer resources when EF creates an in-memory model of the context.
The larger the context is, the more resources are expended to generate and maintain that in-memory model.
Also, you can Detach and Attach entities when sharing instances across multiple contexts, so you still don't have to load the whole model.
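A sketch of what bounded contexts might look like; the entity and context names are invented for illustration:
using System.Data.Entity;

public class Contract { public int Id { get; set; } }
public class Advertiser { public int Id { get; set; } }

// Each context maps only the sets one part of the app needs, so the
// in-memory model EF builds for each stays small.
public class OrderingContext : DbContext
{
    public DbSet<Contract> Contracts { get; set; }
    public DbSet<Advertiser> Advertisers { get; set; }
}

public class ReportingContext : DbContext
{
    public DbSet<Contract> Contracts { get; set; }
}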

Entity Framework 4 and caching of query results

Say I have a table or two containing data that will never or rarely change. Is there any point in trying to cache that data, or will the EF context cache it for me when I load it the first time? I was thinking of loading all the data from those tables, keeping it in memory in a static list or something, and querying the in-memory data instead of the tables whenever I need it within the same context. The tables I speak of typically contain a few hundred rows of data.
The EF context will cache "per instance". That is, each instance of the DbContext keeps its own independent cache of objects. You can store the resulting list of objects in a static list and query it all you like without returning to the database. To be safe, make sure you abandon the DbContext after you execute the query.
var dbContext = new YourDbContext();
StaticData.CachedListOfThings = dbContext.ListOfThings.ToList();
You can later use LINQ to query the static list.
var widgets = StaticData.CachedListOfThings.Where(thing => thing.Widget == "Foo");
The query executes against the in-memory collection, not the database.
You can check the EF caching provider, but be aware that caching this way is performed strictly on a per-query basis; you must issue the same query every time to get cached data. If you use another query, it is first executed once to be registered as cached, and only subsequent runs hit the cache. If you want to avoid this and cache data with the ability to run any query over the cached collection, you must roll your own solution (simply load the data into a list and keep it somewhere). When you load entities into a cached list, make sure that you turn off proxy creation (lazy loading and change tracking).
Caching per context instance really works, but using the context itself as a cache is a pretty bad choice; in most scenarios I would call it an EF anti-pattern. Use the context as a unit of work: do not reuse a context for multiple logical operations.
You'll have to roll your own cache for any EF4 LINQ queries, as they are always resolved to SQL and thus will always hit the DB. A simple cache for your couple of tables probably wouldn't be hard to write.
If you're going to query by ID, though, you can use the ObjectContext.GetObjectByKey method, and it will look in the object cache before querying the DB.
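For the by-key route, a small sketch; the entity-set name and key value are assumptions, but EntityKey and TryGetObjectByKey are the ObjectContext members referred to above:
// Builds a key for the (assumed) Contracts entity set and asks the context
// for it; the state manager is consulted before the database is queried.
var key = new EntityKey("MyEntities.Contracts", "Id", 42);

object entity;
if (context.TryGetObjectByKey(key, out entity))
{
    var contract = (Contract)entity;
    // contract came from the object cache or, failing that, the DB
}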

What problems will I have with Entity Framework select specific columns instead of the entire object?

What problems will I have if I change my Entity Framework queries from this:
var contracts = from contract in Context.Contracts select contract;
to this:
var contracts = from contract in Context.Contracts
                select new MyContract
                {
                    Key = contract.Key,
                    Advertiser = new MyAdvertiser { Key = contract.Advertiser.Key }
                };
i.e. changing from selecting a contract to selecting a new object based on the columns of the contract.
In either approach, I am mapping the entities to domain objects after load, and back to entities on save.
There is nothing wrong with this approach. There are a couple of things to keep in mind:
1) Domain objects should always be fully populated, to prevent cases where a domain process attempts to use data that looks like it is there but isn't. If you have some domain processes that don't need the full set of data from a domain object, and you don't want to load the unnecessary data, create a second domain object for use in those processes.
2) Database servers can optimize queries that have identical field lists better than queries which have differing field lists. If performance is critical to this application, make sure you measure the effect that this change has on query performance. It seems like limiting the result set would increase performance, and generally it does, but not always.
I don't think you will have any problems (I'm doing it this way too); just make sure you write a method to do the mapping, so you don't have to repeat it in every query.
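A sketch of such a mapping method, reusing the question's MyContract/MyAdvertiser types (the extension-method name is made up); because it composes over IQueryable, EF can still translate the projection to SQL:
using System.Linq;

public static class ContractProjections
{
    // One place for the projection, so every query shares the same field list.
    public static IQueryable<MyContract> ToMyContracts(this IQueryable<Contract> contracts)
    {
        return contracts.Select(contract => new MyContract
        {
            Key = contract.Key,
            Advertiser = new MyAdvertiser { Key = contract.Advertiser.Key }
        });
    }
}

// Usage:
// var contracts = Context.Contracts.ToMyContracts().ToList();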
