Strange caching issues in Entity Framework 7 - c#

I've come across something quite specific and am wondering whether anyone out there has faced the same issue.
My SQL query (in a stored procedure) is simple. I've trimmed it a little, but here it is:
BEGIN
SELECT DISTINCT
[ANU].[OldUserId] AS [ID]
,[ANU].[Email]
FROM
[dbo].[AspNetUsers] AS [ANU]
INNER JOIN
[dbo].[User] AS [U]
ON
[U].[ID] = [ANU].[OldUserId]
END
Pretty simple, and the SP is fine when run directly through SQL Management Studio.
However, I run it via Entity Framework as such:
[ResponseCache(Duration = 0)] // used this out of desperation
public List<DriverDTO> GetByOrganisation(int organisationId, bool isManager)
{
return _context.Set<DriverDTO>().FromSql("dbo.New_User_List @OrganisationId = {0}, @IsManager = {1}", organisationId, isManager).ToList();
}
DriverDTO:
public class DriverDTO
{
[Key] // tried removing this also
public int ID { get; set; }
public string Email { get; set; }
}
It runs and brings back results, fine. However these results are getting cached. Every call to the SP after the first call brings back the same results, even if I update the records. So, say I edit a User record and change the email - the originally fetched email will always be brought back.
Again, running the SP via SQL Manager brings back the correct results, but my C#/EF side does not. The only logical thing in my head here is that something is somehow being cached under the hood that I desperately need to get around?!

Your loaded entities are cached in the DbContext (in each DbSet's Local collection).
There are several options:
1. Use AsNoTracking for your query:
return _context.Set<DriverDTO>()
.AsNoTracking()
.FromSql("dbo.New_User_List @OrganisationId = {0}, @IsManager = {1}", organisationId, isManager)
.ToList();
This should avoid Entity Framework caching completely for this query.
2. Use a new DbContext instance for each query.
3. Alternatively, detach all cached entities from the context before issuing your query, something like this (untested):
_context.Set<DriverDTO>().Local.ToList().ForEach(x =>
{
_context.Entry(x).State = EntityState.Detached;
});
Note that, contrary to what one might expect, you can't use _context.Set<DriverDTO>().Local.Clear(), since this marks your entities as deleted (so if you call SaveChanges afterwards, it will delete the entities from the database). Be careful if you are experimenting with the local cache.
Unless you need to use a single DbContext or need the entities returned by the SP to be attached to it, I'd go for #2. Otherwise, I'd go for #1. I put #3 there for completeness, but I'd avoid meddling with the local cache unless strictly necessary.

In EF Core, querying with FromSqlInterpolated() method on an entity that has an OwnedEntity doesn't work

I have some complex entities in my context.
DbSet<EntityA> AEntities { get; set; }
DbSet<EntityB> BEntities { get; set; }
where EntityA inherits EntityB (it's a superset of it). There is a column Discriminator that was created by Entity Framework to distinguish between the 2. Up to here, no problem.
EntityA also has an Owned Type:
modelBuilder.Entity<EntityA>().OwnsOne(p => p.OwnedThing,
ot =>
{
// ot......;
ot.ToTable("SectionTrailInfo");
});
As long as I use only EF to query the database, it's still all right.
But then, for performance reasons, I need to query BEntities with a SQL stored procedure.
// ...
return await BEntities.FromSqlInterpolated($"EXEC GetManyRecords @Id = {id}").ToListAsync();
That also worked before I introduced the Owned Entity in EntityA. Now I get a 'FromSqlRaw' or 'FromSqlInterpolated' was called with non-composable SQL and with a query composing over it. error. The error is thrown as soon as I call the .ToQueryString() method.
I have tried modifying the stored procedure to left join the Owned Type, but it doesn't change a thing. In fact, the query is never formed and never sent to the DB server. So I have to tell EF Core one way or another how to deal with the Owned Type (which is not even part of this entity, but of a related one).
My only other option is to redesign my schema completely to remove the Owned Type, but I am hoping I can avoid that.
Thanks in advance.
As kindly pointed out by @IvanStoev, the way to go is to use a table-valued function in SQL instead of a stored procedure.
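For anyone landing here, a minimal sketch of what the table-valued function approach could look like. The function name dbo.GetManyRecordsTvf, the context name AppDbContext and the shape of EntityB are assumptions for illustration, not taken from the question. The idea is that a SELECT over a function is composable, so EF Core can wrap it in a larger query, which it cannot do with EXEC. The function would presumably need to return every column EF maps for EntityB, including the Discriminator column.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Simplified stand-in for the entity in the question (assumption).
public class EntityB
{
    public int Id { get; set; }
}

// Hypothetical context; the real one also maps EntityA and its owned type.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<EntityB> BEntities { get; set; }

    public Task<List<EntityB>> GetManyRecordsAsync(int id)
    {
        // dbo.GetManyRecordsTvf is a hypothetical table-valued function.
        // The interpolated {id} is sent as a SQL parameter, not concatenated text.
        return BEntities
            .FromSqlInterpolated($"SELECT * FROM dbo.GetManyRecordsTvf({id})")
            .ToListAsync();
    }
}
```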

how DbContext.AttachRange() works in this scenario

I saw a book with some code like this:
public class Order
{
public int OrderID { get; set; }
public ICollection<CartLine> Lines { get; set; }
...
}
public class CartLine
{
public int CartLineID { get; set; }
public Product Product { get; set; }
public int Quantity { get; set; }
}
//Product class is just a normal class that has properties such as ProductID, Name etc
and in the order repository, there is a SaveOrder method:
public void SaveOrder(Order order)
{
context.AttachRange(order.Lines.Select(l => l.Product));
if (order.OrderID == 0)
{
context.Orders.Add(order);
}
context.SaveChanges();
}
and the book says:
when store an Order object in the database. When the user’s cart data is deserialized from the session store, the JSON package creates new objects that are not known to Entity Framework Core, which then tries to write all the objects into the database. For the Product objects, this means that Entity Framework Core tries to write objects that have already been stored, which causes an error. To avoid this problem, I notify Entity Framework Core that the objects exist and shouldn’t be stored in the database unless they are modified
I'm confused, and have two questions:
Q1: Why would writing objects that have already been stored cause an error? From the point of view of the underlying database, isn't it just an UPDATE statement that sets every column to its current value? I know it does unnecessary work by rewriting everything without changing anything, but it shouldn't throw an error at the database level.
Q2: Why don't we do the same thing for CartLine, like this:
context.AttachRange(order.Lines.Select(l => l.Product));
context.AttachRange(order.Lines);
to prevent the CartLine objects from being stored in the database, just as we do for the Product objects?
Okay, so this is gonna be a long one:
1st Question:
In Entity Framework (Core or "old" 6), there's this concept of "change tracking". The DbContext class is capable of tracking all the changes you make to your data and then applying them to the DB via SQL statements (INSERT, UPDATE, DELETE). To understand why it throws an error in your case, you first need to understand how the DbContext / change tracking actually works. Let's take your example:
public void SaveOrder(Order order)
{
context.AttachRange(order.Lines.Select(l => l.Product));
if (order.OrderID == 0)
{
context.Orders.Add(order);
}
context.SaveChanges();
}
In this method, you receive an Order instance which contains Lines and Products. Let's assume that this method was called from some web application, meaning you didn't load the Order entity from the DB. This is what's known as the disconnected scenario.
It's "disconnected" in the sense that your DbContext is not aware of these entities' existence. When you call context.AttachRange you are literally telling EF: I'm in control here, and I'm 100% sure these entities already exist in the DB. Please be aware of them from now on!
Let's use your code again: imagine that it's a new Order (so it will enter your if there) and you remove the context.AttachRange part of the code. As soon as the code reaches Add and SaveChanges, these things will happen internally in the DbContext:
The DetectChanges method will be called
It will try to find all the entities (Order, Lines and Products) in its current graph
If it doesn't find them, they will be added to the "pending changes" as new records to be inserted
Then you continue and call SaveChanges and it will fail as the book tells you. Why? Imagine that the Products selected were:
Id: 1, "Macbook Pro"
Id: 2, "Office Chair"
When the DbContext looked at the entities and didn't know about them, it added them to the pending changes with a state of Added. When you call SaveChanges, it issues INSERT statements for these products based on their current state in the model. Since Ids 1 and 2 already exist in the database, the operation fails with a primary key violation.
That's why you have to call Attach (or AttachRange) in this case. This effectively tells EF that the entities exist in the DB and that it should not try to insert them again. They will be added to the context with a state of Unchanged. Attach is often used in cases like this, where you didn't load the entities through the DbContext beforehand.
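If you want to see these states for yourself, a small sketch along these lines can help. It assumes EF Core and a context whose model includes Order, CartLine and Product; the TrackingDemo helper and the trimmed-down classes are made up for illustration and are not from the book. It attaches the products, adds the order, and prints the state the change tracker holds for every entity, without calling SaveChanges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Trimmed-down versions of the classes from the question.
public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
}

public class CartLine
{
    public int CartLineID { get; set; }
    public Product Product { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public ICollection<CartLine> Lines { get; set; } = new List<CartLine>();
}

// Hypothetical helper, not part of the book's code.
public static class TrackingDemo
{
    // 'context' is assumed to be a DbContext whose model contains the three types above.
    public static void PrintStates(DbContext context, Order order)
    {
        // Tell EF the products already exist, so it should not insert them.
        context.AttachRange(order.Lines.Select(l => l.Product));

        // Start tracking the order and the rest of its graph.
        context.Add(order);

        // Inspect what the change tracker decided for each entity.
        foreach (var entry in context.ChangeTracker.Entries())
        {
            Console.WriteLine($"{entry.Entity.GetType().Name}: {entry.State}");
        }
    }
}
```

Running it once with the AttachRange line and once without should make the difference in the Product entries' state visible.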
2nd question:
This is hard for me to assess because I don't know the context/model at that level, but here's my guess:
You don't need to do that with the CartLine because, with every order, you probably want to insert new order lines. Think of buying stuff on Amazon: you put the products in the cart and it generates an Order, then Order Lines, the things that compose that order.
If you were then to update an existing order and add more items to it, you would run into the same issue. You would have to load the existing CartLines before saving them to the db, or call Attach as you did here.
Hope it's a little bit clearer. I have answered a similar question where I gave more details, so maybe reading that also helps more:
How does EF Core Modified Entity State behave?

Unexpected behavior in entity framework

I ran into what I think is a really odd situation with Entity Framework. Basically, if I update a row directly with a SQL command, then when I retrieve that row through LINQ it doesn't have the updated information. Please see the example below for more information.
First I created a simple DB table
CREATE TABLE dbo.Foo (
Id int NOT NULL PRIMARY KEY IDENTITY(1,1),
Name varchar(50) NULL
)
Then I created a console application to add an object to the DB, update it with a sql command and then retrieve the object that was just created. Here it is:
public class FooContext : DbContext
{
public FooContext() : base("FooConnectionString")
{
}
public IDbSet<Foo> Foo { get; set; }
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<Foo>().ToTable("Foo");
base.OnModelCreating(modelBuilder);
}
}
public class Foo
{
[Key]
public int Id { get; set; }
public string Name { get; set; }
}
public class Program
{
static void Main(string[] args)
{
//setup the context
var context = new FooContext();
//add the row
var foo = new Foo()
{
Name = "Before"
};
context.Foo.Add(foo);
context.SaveChanges();
//update the name
context.Database.ExecuteSqlCommand("UPDATE Foo Set Name = 'After' WHERE Id = " + foo.Id);
//get the new foo
var newFoo = context.Foo.FirstOrDefault(x => x.Id == foo.Id);
//I would expect the name to be 'After' but it is 'Before'
Console.WriteLine(string.Format("The new name is: {0}", newFoo.Name));
Console.ReadLine();
}
}
The write line at the bottom prints out "Before" however I would expect that it prints out "After". The odd thing about it is that if I run profiler I see the sql query run and if I run the query in management studio myself, it returns "After" as the name. I am running sql server 2014.
Can someone please help me understand what is going on here?
UPDATE:
It is going to the database on the FirstOrDefault line. Please see the attached screen shot from sql profiler.
So my question really is this:
1) If it is caching, shouldn't it not be going to the DB? Is this a bug in EF?
2) If it is going to the db and spending the resources, shouldn't EF update the object.
FooContext includes change tracking and caching, so the in-memory object that is returned from your query is the same instance that you added earlier. Calling SaveChanges() does not clear the context, and FooContext is not aware of the changes that happened underneath it in the database.
This is usually a good thing -- not making expensive database calls for every operation.
In your sample, try making the same query from a new FooContext, and you should see "After".
update
Responding to your updated question: yes, you are right. I missed before that you were using FirstOrDefault(). If you were using context.Foo.Find(foo.Id), as I wrongly assumed, then there would be no query.
As for why the in-memory object is not updated to reflect the change in the database, I'd need to do some research to do anything more than speculate. That said, here is my speculation:
An instance of the database context cannot return more than one instance of the same entity. Within a unit of work, we must be able to rely on the context to return the same instance of the entity. Otherwise, we might query by different criteria and get 3 objects representing the same conceptual entity. At that point, how can the context deal with changes to any of them? What if the name is changed to a different value on two of them and then SaveChanges() is called -- what should happen?
Given then that the context tracks at most a single instance of each entity, why can't EF just update that entity at the point at which a query is executed? EF could even discard that change if there is a pending in-memory change, since it knows about those changes.
I think one part of the answer is that diffing all the columns on large entities and in large result sets is performance prohibitive.
I think a bigger part of the answer is that executing a simple SELECT statement should not have the potential to cause side effects throughout the system. Entities may be grouped or looped over by the value of some property, and changing the value of that property at an indeterminate time, as a result of a SELECT query, is highly unsound.

Entity Framework: Linq query finds entries by original data but returns reference to changed entry

I just spent some days now to find a bug caused by some strange behavior of the Entity Framework (version 4.4.0.0). For an explanation I wrote a small test program. At the end you'll find some questions I have about that.
Declaration
Here we have a class "Test" which represents our test dataset. It only has an ID (primary key) and a "value" property. In our TestContext we implement a DbSet Tests, which shall handle our "Test" objects as a database table.
public class Test
{
public int ID { get; set; }
public int value { get; set; }
}
public class TestContext : DbContext
{
public DbSet<Test> Tests { get; set; }
}
Initialization
Now, we remove any (if existent) entries from our "Tests" table and add our one and only "Test" object. It has ID=1 (primary key) and value=10.
// Create a new DBContext...
TestContext db = new TestContext();
// Remove all entries...
foreach (Test t in db.Tests) db.Tests.Remove(t);
db.SaveChanges();
// Add one test entry...
db.Tests.Add(new Test { ID = 1, value = 10 });
db.SaveChanges();
Tests
Finally, we run some tests. We select our entry by its original value (=10) and change the "value" of our entry to 4711. BUT we do not call db.SaveChanges()!
// Find our entry by it's value (=10)
var result = from r in db.Tests
where r.value == 10
select r;
Test t2 = result.FirstOrDefault();
// change its value from 10 to 4711...
t2.value = 4711;
Now, we try to find the (old) entry by the original value (=10) and do some tests on the results of that.
// now we try to select it via its old value (==10)
var result2 = from r in db.Tests
where r.value == 10
select r;
// Did we get it?
if (result2.FirstOrDefault() != null && result2.FirstOrDefault().value == 4711)
{
Console.WriteLine("We found the changed entry by it's old value...");
}
When running the program we'll actually see "We found the changed entry by it's old value...". That means we have run a query for r.value == 10 and found something. This would be acceptable. BUT we receive the already changed object (which does not fulfil value == 10)!
Note: You'll get an empty result set for "where r.value == 4711".
In some further testing, we find out, that the Entity Framework always hands out a reference to the same object. If we change the value in one reference, it's changed in the other one too. Well, that's ok... but one should know it happens.
Test t3 = result2.FirstOrDefault();
t3.value = 42;
if (t2.value == 42)
{
Console.WriteLine("Seems as if we have a reference to the same object...");
}
Summary
When running a LINQ query on the same database context (without calling SaveChanges()) we will receive references to the same object if it has the same primary key. The strange thing is: even if we change an object, we will find it (only!) by its old values, but we will receive a reference to the already changed object. This means that the restriction in our query (value == 10) is not guaranteed to hold for any entries that we changed since our last call to SaveChanges().
Questions
Of course, I'll probably have to live with some effects here. But I would like to avoid calling SaveChanges() after every little change, especially because I would like to use it for transaction handling, to be able to revert some changes if something goes wrong.
I would be glad if anyone could answer one or even both of the following questions:
Is it possible to change the behavior of Entity Framework so that it works as if I were communicating with a normal database during a transaction? If so, how?
Where is a good resource for answering "How to use the context of entity framework?" which answers questions like "What can I rely on?" and "How to choose the scope of my DBContext object"?
EDIT #1
Richard just explained how to access the original (unchanged) database values. While this is valuable and helpful I've got the urge to clarify the goal ...
Let's have a look at what happens when using SQL. We setup a table "Tests":
CREATE TABLE Tests (ID INT, value INT, PRIMARY KEY(ID));
INSERT INTO Tests (ID, value) VALUES (1,10);
Then we have a transaction, that first looks for entities whose values are 10. After this, we update the value of these entries and look again for those entries. In SQL we already work on the updated version, so we will not find any results for our second query. After all we do a "rollback", so the value of our entry should be 10 again...
START TRANSACTION;
SELECT ID, value FROM Tests WHERE value=10; {1 result}
UPDATE Tests SET value=4711 WHERE ID=1; {our update}
SELECT ID, value FROM Tests WHERE value=10; {no result, as value is now 4711}
ROLLBACK; { just for testing transactions... }
I would like to have exactly this behavior from the Entity Framework (EF), where db.SaveChanges() is equivalent to "COMMIT", every LINQ query is equivalent to a "SELECT" statement, and every write access to an entity is just like an "UPDATE". I don't care when EF actually issues the UPDATE statement, but it should behave the same way as using a SQL database directly. Of course, if SaveChanges() is called and returns successfully, it should be guaranteed that all data was persisted correctly.
Note: Yes, I could call db.SaveChanges() before every query, but then I would lose the possibility of a rollback.
Regards,
Stefan
As you've discovered, Entity Framework tracks the entities it has loaded, and returns the same reference for each query which accesses the same entity. This means that the data returned from your query matches the current in-memory version of the data, and not necessarily the data in the database.
If you need to access the database values, you have several options:
Use a new DbContext to load the entity;
Use .AsNoTracking() to load an un-tracked copy of your entity;
Use context.Entry(entity).GetDatabaseValues() to load the property values from the database;
If you want to overwrite the properties of the local entity with the values from the database, you'll need to call context.Entry(entity).Reload().
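The options above that stay on the same context can be sketched roughly as follows, reusing the Test and TestContext classes from the question. This is only an illustration against the EF DbContext API (System.Data.Entity); the DatabaseValuesDemo class is a made-up name, and it assumes a row with ID 1 exists.

```csharp
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Linq;

// Same shapes as in the question.
public class Test
{
    public int ID { get; set; }
    public int value { get; set; }
}

public class TestContext : DbContext
{
    public DbSet<Test> Tests { get; set; }
}

// Hypothetical demo class, for illustration only.
public static class DatabaseValuesDemo
{
    public static void Run(TestContext db)
    {
        // Tracked entity with an unsaved, in-memory change.
        Test tracked = db.Tests.First(t => t.ID == 1);
        tracked.value = 4711;

        // Un-tracked copy: materialized from the database row,
        // not from the instance the context is tracking.
        Test untracked = db.Tests.AsNoTracking().First(t => t.ID == 1);

        // Read the stored property values without touching the tracked entity.
        // GetDatabaseValues() returns null if the row no longer exists.
        DbPropertyValues stored = db.Entry(tracked).GetDatabaseValues();
        int storedValue = stored.GetValue<int>("value");

        // Overwrite the tracked entity with the database values;
        // this discards the pending in-memory change.
        db.Entry(tracked).Reload();
    }
}
```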
You can wrap your updates in a transaction to achieve the same result as in your SQL example:
using (var transaction = new TransactionScope())
{
var result = from r in db.Tests
where r.value == 10
select r;
Test t2 = result.FirstOrDefault();
// change its value from 10 to 4711...
t2.value = 4711;
// send UPDATE to the database but don't commit the transaction
db.SaveChanges();
var result2 = from r in db.Tests
where r.value == 10
select r;
// should not return anything
Trace.Assert(result2.Count() == 0);
// This way you can commit the transaction:
// transaction.Complete();
// but we do nothing and after this line, the transaction is rolled back
}
For more information see http://msdn.microsoft.com/en-us/library/bb896325(v=vs.100).aspx
I think the behavior comes from the way the context defers writes. As you already mentioned, the Entity Framework only sends your changes to the database when you call SaveChanges(). When you manipulate something within the context, the change does not happen in the database; it happens in memory. Only when you call SaveChanges() are your actions translated into, let's say, SQL.
When you do a simple select, the database is queried at the moment you access the data. So if you have not called SaveChanges(), it finds the row in the database with SELECT * FROM Test WHERE value = 10, but hands back the instance the context is already tracking, whose value is 4711.
The transaction in EF happens in memory. Everything you do before SaveChanges() is your transaction. For further information, read: MSDN
A really good resource for information about EF, which is probably up to date, is the Microsoft Data Developer Center.

Improving efficiency with Entity Framework

I have been using the Entity Framework with the POCO First approach. I have pretty much followed the pattern described by Steve Sanderson in his book 'Pro ASP.NET MVC 3 Framework', using a DI container and DbContext class to connect to SQL Server.
The underlying tables in SQL server contain very large datasets used by different applications. Because of this I have had to create views for the entities I need in my application:
class RemoteServerContext : DbContext
{
public DbSet<Customer> Customers { get; set; }
public DbSet<Order> Orders { get; set; }
public DbSet<Contact> Contacts { get; set; }
...
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<Customer>().ToTable("vw_Customers");
modelBuilder.Entity<Order>().ToTable("vw_Orders");
...
}
}
and this seems to work fine for most of my needs.
The problem I have is that some of these views have a great deal of data in them so that when I call something like:
var customers = _repository.Customers().Where(c => c.Location == location).Where(...);
it appears to be bringing back the entire data set, which can take some time before the LINQ query reduces the set to those which I need. This seems very inefficient when the criteria is only applicable to a few records and I am getting the entire data set back from SQL server.
I have tried to work around this by using stored procedures, such as
public IEnumerable<Customer> CustomersThatMatchACriteria(string criteria1, string criteria2, ...) //or an object passed in!
{
return Database.SqlQuery<Customer>("Exec pp_GetCustomersForCriteria @crit1 = {0}, @crit2 = {1}...", criteria1, criteria2,...);
}
whilst this is much quicker, the problem here is that it doesn't return a DbSet and so I lose all of the connectivity between my objects, e.g. I can't reference any associated objects such as orders or contacts even if I include their IDs because the return type is a collection of 'Customers' rather than a DbSet of them.
Does anyone have a better way of getting SQL server to do the querying so that I am not passing loads of unused data around?
var customers = _repository.Customers().Where(c => c.Location == location).Where(...
If Customers() returns IQueryable, this statement alone won't actually be 'bringing back' anything at all - calling Where on an IQueryable gives you another IQueryable, and it's not until you do something that causes query execution (such as ToList, or FirstOrDefault) that anything will actually be executed and results returned.
If however this Customers method returns a collection of instantiated objects, then yes, since you are asking for all the objects you're getting them all.
I've never used either code-first or indeed even the repository pattern, so I don't know what to advise, other than staying in the realm of IQueryable for as long as possible and only executing the query once you've applied all relevant filters.
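To make that concrete, here is a rough sketch of a repository that stays in IQueryable. The CustomerRepository and CustomerQueries classes and the Customer properties are assumptions for illustration; only the context and DbSet names come from the question. The point is that the Where call is added to the query before it runs, so the filter should end up in the SQL that EF generates rather than being applied in memory.

```csharp
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Assumed shape of the entity; the real one will have more properties.
public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

public class RemoteServerContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

// Hypothetical repository, for illustration only.
public class CustomerRepository
{
    private readonly RemoteServerContext _context;

    public CustomerRepository(RemoteServerContext context)
    {
        _context = context;
    }

    // Returning IQueryable keeps the query composable; nothing is executed here.
    public IQueryable<Customer> Customers()
    {
        return _context.Customers;
    }
}

public static class CustomerQueries
{
    public static List<Customer> ByLocation(CustomerRepository repository, string location)
    {
        return repository.Customers()
            .Where(c => c.Location == location) // composed into the query, not run in memory
            .OrderBy(c => c.Name)
            .ToList();                          // the query is executed here
    }
}
```

If Customers() returned List<Customer> or called ToList() internally, the same Where would run in memory after the whole view had been loaded, which matches the symptom described in the question.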
What I would have done to return just a set of data would have been the following:
var customers = (from x in Repository.Customers where <boolean statement> &&/|| <boolean statement> select new { variableName = x.Name, ... }).Take(<number of records you need>);
so for instance:
var customers = (from x in _repository.Customers where x.ID == id select new { variableName = x.Name }).Take(1000);
then iterate through the results to get the data (remember, the LINQ statement returns an IQueryable):
foreach (var data in customers)
{
string doSomething = data.variableName; //to get data from your query.
}
Hope this helps. These are not exactly the same methods, but I find this handy in my code.
It's probably because the Customers() method in your repository is doing a GetAll() kind of thing and fetching the entire list first. This prevents LINQ and SQL Server from building an efficient query.
I don't know if there's a good workaround for your repository, but if you would do something like:
using(var db = new RemoteServerContext())
{
var custs = db.Customers.Where(...);
}
I think that will be a lot quicker. If your project is small enough, you can do without a repository. Sure, you'll lose an abstraction layer, but with small projects this may not be a big problem.
On the other hand, you could load all Customers in your repository once and use the resulting collection directly (instead of the method-call that fills the list). Beware of adding, removing and modifying Customers though.
You need the LINQ query to return less data, for example by paging (like TOP in SQL), or you need to query manually using stored procedures. In either case, you need to rewrite your querying mechanism. This is one of the reasons why I didn't use EF: it seems you don't have a lot of control over the code.
