If I cache an entire table:
static List<Table1> table1Cache = context.Table1.ToList();
Then use it to set an association:
var context = new Context();
var t2 = new Table2();
t2.MyTable1Reference = table1Cache.Single(x => x.Id == paramIntId);
context.SaveChanges();
A new row will be inserted into Table1 because of the third line: EF thinks the cached entity is a new one. I know I can do things like always attaching the cached entities when creating the context (I have one context per request), or using MyTable1ReferenceID = table1Cache.Single(x => x.Id == paramIntId).Id;
But that is not safe; I might forget sometimes. Is there a good solution?
Yes, that makes sense: the entity is not currently associated with the current context, so EF thinks it's transient and saves a new instance.
If you are caching across contexts, then you don't want to store the object itself, because it is tied to the context that loaded it. Instead you want to store the data in the cache, essentially serializing and deserializing the entity. You will also need to associate the entity with the current context, so that the next time it's retrieved from the cache you can save changes to both the cache and the database.
If all this sounds like a lot, it is. Keeping two data stores synchronized is not an easy problem to solve. I would take a look at the implementation of the second-level cache for NHibernate.
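As a minimal sketch of the two safer patterns the question mentions (assuming Table2 exposes a scalar foreign-key property alongside the navigation property; the names are taken from the question, not from any library):
using (var context = new Context())
{
    var t2 = new Table2();
    // Option 1: set the scalar FK instead of the navigation property,
    // so the detached cached entity never enters the change tracker.
    t2.MyTable1ReferenceID = table1Cache.Single(x => x.Id == paramIntId).Id;

    // Option 2: attach the cached entity first so EF treats it as existing.
    // var t1 = table1Cache.Single(x => x.Id == paramIntId);
    // context.Table1.Attach(t1);
    // t2.MyTable1Reference = t1;

    context.Table2.Add(t2);
    context.SaveChanges();
}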
I apologise if this has been asked already; I am struggling with the terminology around what I am trying to find out about, as it conflicts with functionality in Entity Framework.
What I am trying to do:
I would like to create an application that, on setup, lets the user start with one database as a "trial"/"startup" database, i.e. a non-production database. This would allow a user to trial the application, but it would not have backups etc.; in no way would this be a "production" database. This could be SQLite, for example.
When the user is then ready, they could click "convert to production" (or similar) and point the application at the new database machine/database. This would be considered the "production" environment, and could be something like MySQL, SQL Server, or whatever else EF connects to these days.
The question:
Does EF support this type of live migration/data transfer? Or would it need another app where you could configure the EF source and destination, which would then run through the process of converting/seeding/populating one data source from the other?
Why I have asked here:
I have tried to search for things around this topic, but "transferring"/"migration" brings up totally unrelated subjects, so any help would be much appreciated.
From what you describe, I don't think there is anything out of the box to support that. You can map a DbContext to either database; then it is a matter of fetching and detaching entities from the evaluation DbContext and attaching them to the production one.
For a relatively simple schema / object graph this would be fairly straightforward to implement:
ICollection<Customer> customers = new List<Customer>();
using(var context = new AppDbContext(evalConnectionString))
{
customers = context.Customers.AsNoTracking().ToList();
}
using(var context = new AppDbContext(productionConnectionString))
{ // Assuming an empty database...
    context.Customers.AddRange(customers);
    context.SaveChanges();
}
For more complex models, though, this could take some work, especially when dealing with things like existing lookups/references. Where you want to move objects that might share a reference to another object, you need to query the destination DbContext for existing relatives and substitute them before saving the "parent" entity:
ICollection<Order> orders = new List<Order>();
using(var context = new AppDbContext(evalConnectionString))
{
orders = context.Orders
.Include(x => x.Customer)
.AsNoTracking()
.ToList();
}
using(var context = new AppDbContext(productionConnectionString))
{
    var customerIds = orders.Select(x => x.Customer.CustomerId)
        .Distinct().ToList();
    var existingCustomers = context.Customers
        .Where(x => customerIds.Contains(x.CustomerId))
        .ToList();
    foreach(var order in orders)
    {   // Assuming all customers were loaded
        var existingCustomer = existingCustomers.SingleOrDefault(x => x.CustomerId == order.Customer.CustomerId);
        if(existingCustomer != null)
            order.Customer = existingCustomer;
        else
            existingCustomers.Add(order.Customer);
        context.Orders.Add(order);
    }
    context.SaveChanges();
}
This is a very simple example to outline how to handle scenarios where you may be inserting data with references that may or may not exist in the target DbContext. If we are copying across Orders and want to deal with their respective Customers, we first need to check whether a tracked customer reference exists and use that reference, to avoid a duplicate row being inserted or an exception being thrown.
Normally, loading the orders and related references from one DbContext ensures that multiple orders referencing the same Customer entity all share the same entity reference. However, when using AsNoTracking() to get detached entities that we can associate with the new DbContext, detached references to the same record will not be the same object reference, so we need to treat them with care.
For example where there are 2 orders for the same customer:
var ordersA = context.Orders.Include(x => x.Customer).ToList();
Assert.AreSame(ordersA[0].Customer, ordersA[1].Customer); // Passes
var ordersB = context.Orders.Include(x => x.Customer).AsNoTracking().ToList();
Assert.AreSame(ordersB[0].Customer, ordersB[1].Customer); // Fails
Even though in the second case both orders are for the same customer, each will have a Customer reference with the same ID but a different object reference, because the DbContext is not tracking the references used. This is one of the several "gotchas" with detached entities and efforts to boost performance. Using tracked references isn't ideal either, since those entities will still think they are associated with another DbContext. We can detach them, but that means diving through the object graph and detaching all references. (Doable, but messy compared to just loading them detached.)
Where it can also get complicated is when migrating data in batches (disposing of a DbContext regularly to avoid performance pitfalls with larger data volumes, as sketched below) or synchronizing data over time. It is generally advisable to first check the destination DbContext for matching records and use those, to avoid duplicate data being inserted (or exceptions being thrown).
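As a rough sketch of that batching idea (assuming a monotonically increasing CustomerId to page on, and reusing the hypothetical AppDbContext and connection strings from above):
const int batchSize = 1000;
int lastId = 0;
while (true)
{
    List<Customer> batch;
    using (var source = new AppDbContext(evalConnectionString))
    {
        // Page through the source in key order, detached.
        batch = source.Customers.AsNoTracking()
            .Where(x => x.CustomerId > lastId)
            .OrderBy(x => x.CustomerId)
            .Take(batchSize)
            .ToList();
    }
    if (batch.Count == 0)
        break;

    // A fresh destination context per batch keeps the change tracker small.
    using (var destination = new AppDbContext(productionConnectionString))
    {
        destination.Customers.AddRange(batch);
        destination.SaveChanges();
    }
    lastId = batch.Last().CustomerId;
}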
So for simple data models this is fairly straightforward. For more complex ones, where there is more data to bring across and more relationships between that data, it's more complicated. For those systems I'd probably look at generating a database-to-database migration, such as creating INSERT statements for the desired target DB from the data in the source database. Then it is just a matter of inserting the data in relational order to comply with the data constraints. (Either using a tool or rolling your own script generation.)
When do I want to have tracking enabled, and when do I want it disabled, in a Web API? It almost seems like I would always want to use this:
context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
with my DbContext, and only when I need to persist an object would I mark it as modified. Could you give me a specific example of when I need tracking enabled and when I want it disabled?
Thanks
First, let's understand what tracking actually is. This is a good read about it, but in short:
Tracking behavior controls if Entity Framework Core will keep information about an entity instance in its change tracker. If an entity is tracked, any changes detected in the entity will be persisted to the database during SaveChanges().
var blog = context.Blogs.SingleOrDefault(b => b.BlogId == 1);
blog.Rating = 5;
context.SaveChanges();
As you can see in the above example, if the query is tracked (the default behavior), you don't even need to mark the object as modified: because the object was retrieved by the context, it is attached to it, and the context will notice any changes performed on it and persist them when SaveChanges() is called.
So, to answer your question: it depends on the scenario. If you are sure that you will not modify the retrieved data and won't need to persist any changes you might make to it, then there is no point in using a tracked query; in fact, using a no-tracking query would benefit performance.
Think of no-tracking queries as read-only data that you just want to retrieve, to display to the user or to extract some info from.
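For example, a hypothetical read-only listing could load its data like this (a sketch reusing the Blog entity from the quoted snippet):
// No-tracking read: the context keeps no change-tracker entries for
// these blogs, so they cannot be saved back without attaching first.
var topBlogs = context.Blogs
    .AsNoTracking()
    .Where(b => b.Rating >= 4)
    .ToList();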
The mentioned article talks about EF Core, but the tracking vs. no-tracking concepts are the same in other ORMs as well.
WebAPI will always have NoTracking.
Tracking is required when you fetch an object and then make changes (updates) to that same fetched object. If you then save that object back to the DB, tracking makes sense.
This is never the case in a Web API.
The only context instances on which I use QueryTrackingBehavior.NoTracking are reporting contexts, not API contexts, unless the API applicable to that context is entirely read-only.
NoTracking will provide a nominal speed boost for data Read operations.
You can use NoTracking for update operations, but you will need a bit of additional code and will incur a nominal penalty for updates. If you are building an append-only system (inserts, no updates), then NoTracking carries no penalty.
Why: when EF loads an entity with tracking, two things happen. First, the reference is loaded into the local cache. Second, a snapshot (or change-tracking proxy) is kept so that updates against fields on the entity can be detected.
Given an update accepting a new Message for a Record entity:
void UpdateMessage(int recordId, string message);
With tracking:
void UpdateMessage(int recordId, string message)
{
using(var context = new AppContext())
{
var record = context.Records.Single(x => x.RecordId == recordId);
record.Message = message;
context.SaveChanges();
}
}
Without tracking:
void UpdateMessage(int recordId, string message)
{
using(var context = new AppContext())
{
var record = context.Records.AsNoTracking().Single(x => x.RecordId == recordId);
record.Message = message;
context.Update(record); // or Attach() and set Modified state.
context.SaveChanges();
}
}
These look very similar on the surface, but there is a distinct difference that will happen under the hood:
In the first case, EF will generate an SQL statement similar to:
UPDATE tblRecords SET Message = @p1 WHERE RecordId = @p0
In the second case, EF will generate:
UPDATE tblRecords SET Message = @p1, SomeField = @p2, SomeOtherField = @p3, CreatedAt = @p4, CreatedBy = @p5 WHERE RecordId = @p0
When taking untracked entities and "updating" them, EF has no idea what changed, so every column is updated. With tracking, only the fields that were actually changed appear in the query. For larger entities this can become noticeable.
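If you know exactly which field changed, one middle ground (a sketch reusing the Record entity above, not a pattern from the original answer) is to attach the detached entity and flag only that property as modified, keeping the UPDATE narrow without a tracked read:
void UpdateMessage(Record detachedRecord, string message)
{
    using (var context = new AppContext())
    {
        detachedRecord.Message = message;
        // Attach tracks the entity as Unchanged; no SELECT is issued.
        context.Records.Attach(detachedRecord);
        // Flag the single changed column so the UPDATE stays narrow.
        context.Entry(detachedRecord).Property(x => x.Message).IsModified = true;
        context.SaveChanges();
    }
}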
Inserts, including those in append-only systems, aren't affected, since they would include all columns anyway.
If you are projecting to view models and/or sending data over the wire, etc., tracking makes no difference to the result and is a slight performance hit.
If you are doing more complicated operations where you pull data into memory and mutate it, then tracking makes more sense, as it allows you to modify an entity and call SaveChanges again.
It's as simple as that.
I'm using Entity Framework 4.0. I perform a lot of read operations on the database (data analysis); no data is ever saved. Currently, despite lazy loading, the number of I/O operations against the database server slows the application down considerably. I have decided to load most of the small tables into memory (.ToList()) and then run the calculations. Is there a way to make the context read the data of a table from the database only on the first reference to it, so that it is not queried again for the lifetime of the context? The idea is that further references to such a table would hit only the application's memory, not the database.
Now, I use this code:
public class cDBReader
{
    private List<RISK_T_MEMBERS> fMembers;
    public List<RISK_T_MEMBERS> Members
    {
        get
        {
            if (fMembers == null)
            {
                using (RiskEntities context = new RiskEntities(TConfiguration.connectionString))
                {
                    // NoTracking: a read-only load with no change-tracking overhead.
                    context.RISK_T_MEMBERS.MergeOption = System.Data.Objects.MergeOption.NoTracking;
                    fMembers = context.RISK_T_MEMBERS.ToList();
                }
            }
            return fMembers;
        }
        set { fMembers = value; }
    }
}
Could you please add some more information? What is wrong with the current implementation?
It seems like what you need is some kind of caching. Here is an example implementation of caching.
An even simpler (less overkill, in my opinion) way to implement caching would be this.
Last but not least is any implementation of your own (like the one you already made). Maybe the singleton pattern could help you here: a singleton object that holds all the data you need globally and implements getters for that data, taking the currently active context as a parameter; if the data is null, it uses the active context to fetch it. Dropping and recreating the context periodically could also speed up the application after a while (and that's the reason I think you should pass the active context to the getData functions). A sketch follows below.
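A minimal sketch of that singleton idea (DataCache and GetMembers are illustrative names; the RiskEntities context and entity set come from the question):
public sealed class DataCache
{
    private static readonly DataCache instance = new DataCache();
    public static DataCache Instance { get { return instance; } }
    private DataCache() { }

    private List<RISK_T_MEMBERS> fMembers;

    // The caller passes the currently active context; the database is
    // only queried the first time, after which the cached list is returned.
    public List<RISK_T_MEMBERS> GetMembers(RiskEntities context)
    {
        if (fMembers == null)
        {
            context.RISK_T_MEMBERS.MergeOption = System.Data.Objects.MergeOption.NoTracking;
            fMembers = context.RISK_T_MEMBERS.ToList();
        }
        return fMembers;
    }
}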
You could first try eager loading instead of lazy loading, to get full control over what EF loads, and just manually preload what you need once. Loaded data stays alive within one context's lifetime; if you need more than that, and are sure the database is read-only, you could store the data somewhere and then attach the preloaded data whenever a new context is created.
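For example, a one-off eager load in EF 4 might look like this (the Include path is a hypothetical navigation property, not from the question):
using (RiskEntities context = new RiskEntities(TConfiguration.connectionString))
{
    // One query loads the members together with a related set, instead
    // of one lazy-load round trip per navigation property access.
    List<RISK_T_MEMBERS> members = context.RISK_T_MEMBERS
        .Include("RISK_T_POSITIONS") // hypothetical navigation property
        .ToList();
}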
I have a Linq object, and I want to make changes to it and save it, like so:
public void DoSomething(MyClass obj) {
    obj.MyProperty = "Changed!";
    MyDataContext dc = new MyDataContext();
    dc.GetTable<MyClass>().Attach(obj, true); // throws exception
    dc.SubmitChanges();
}
The exception is:
System.InvalidOperationException: An entity can only be attached as modified without original state if it declares a version member or does not have an update check policy.
It looks like I have a few choices:
1. Put a version member on every one of my LINQ classes & tables (100+) that I need to use in this way.
2. Find the data context that originally created the object and use that to submit changes.
3. Implement OnLoaded in every class and save a copy of this object that I can pass to Attach() as the baseline object.
4. To hell with concurrency checking; load the DB version just before attaching and use that as the baseline object (NOT!!!)
Option (2) seems the most elegant method, particularly if I can find a way of storing a reference to the data context when the object is created. But how?
Any other ideas?
EDIT
I tried to follow Jason Punyon's advice and created a concurrency field on one table as a test case. I set all the right properties (Time Stamp = true etc.) on the field in the dbml file, and I now have a concurrency field... and a different error:
System.NotSupportedException: An attempt has been made to Attach or Add an entity that is not new, perhaps having been loaded from another DataContext. This is not supported.
So what the heck am I supposed to attach, then, if not an existing entity? If I wanted a new record, I would do an InsertOnSubmit()! So how are you supposed to use Attach()?
Edit - FULL DISCLOSURE
OK, I can see it's time for full disclosure of why all the standard patterns aren't working for me.
I have been trying to be clever and make my interfaces much cleaner by hiding the DataContext from the "consumer" developers. I have done this by creating a base class:
public class LinqedTable<T> where T : LinqedTable<T> {
...
}
... and every single one of my tables has the "other half" of its generated version declared like so:
public partial class MyClass : LinqedTable<MyClass> {
}
Now LinqedTable has a bunch of utility methods, most particularly things like:
public static T Get(long ID) {
// code to load the record with the given ID
// so you can write things like:
// MyClass obj = MyClass.Get(myID);
// instead of:
// MyClass obj = myDataContext.GetTable<MyClass>().Where(o => o.ID == myID).SingleOrDefault();
}
public static Table<T> GetTable() {
// so you can write queries like:
// var q = MyClass.GetTable();
// instead of:
// var q = myDataContext.GetTable<MyClass>();
}
Of course, as you can imagine, this means that LinqedTable must somehow have access to a DataContext. Up until recently I was achieving this by caching the DataContext in a static member. Yes, "up until recently", because that "recently" is when I discovered that you're not really supposed to hang on to a DataContext for longer than a unit of work, otherwise all sorts of gremlins start coming out of the woodwork. Lesson learned.
So now I know that I can't hang on to that data context for too long... which is why I started experimenting with creating a DataContext on demand, cached only on the current LinqedTable instance. This then led to the problem where the newly created DataContext wants nothing to do with my object, because it "knows" that the object is being unfaithful to the DataContext that created it.
Is there any way of pushing the DataContext info onto the LinqedTable at the time of creation or loading?
This really is a poser. I definitely do not want to compromise on all these convenience functions I've put into the LinqedTable base class, and I need to be able to let go of the DataContext when necessary and hang on to it while it's still needed.
Any other ideas?
Updating with LINQ to SQL is, um, interesting.
If the data context is gone (which, in most situations, it should be), then you will need to create a new data context and run a query to retrieve the object you want to update. It's an absolute rule in LINQ to SQL that you must retrieve an object to delete it, and it's just about as iron-clad that you should retrieve an object to update it as well. There are workarounds, but they are ugly and generally have lots more ways to get you in trouble. So just go get the record again and be done with it.
Once you have the re-fetched object, update it with the content of your existing object that has the changes, then call SubmitChanges() on the new data context. That's it! LINQ to SQL will perform a fairly heavy-handed version of optimistic concurrency by comparing every value in the record to the original (re-fetched) values. If any value changed while you had the data, LINQ to SQL will throw a concurrency exception. (So you don't need to go altering all your tables for versioning or timestamps.)
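A minimal sketch of that re-fetch-and-copy approach, using the MyClass type from the question (MyProperty stands in for whichever fields changed):
public void SaveMyClass(MyClass changed)
{
    using (MyDataContext dc = new MyDataContext())
    {
        // Re-fetch the current record on a fresh context...
        MyClass current = dc.GetTable<MyClass>()
                            .Single(o => o.ID == changed.ID);
        // ...copy the modified values across...
        current.MyProperty = changed.MyProperty;
        // ...and let LINQ to SQL's optimistic concurrency check the rest.
        dc.SubmitChanges();
    }
}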
If you have any questions about the generated update statements, you'll have to break out SQL Profiler and watch the updates go to the database. That's actually a good idea until you gain confidence in the generated SQL.
One last note on transactions: the data context will generate a transaction for each SubmitChanges() call if there is no ambient transaction. If you have several items to update and want to run them as one transaction, make sure you use the same data context for all of them, and wait to call SubmitChanges() until you've updated all the object contents.
If that approach to transactions isn't feasible, then look up the TransactionScope object. It will be your friend.
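For instance, a sketch of what that might look like (two separate data contexts whose submits must succeed or fail together; requires a reference to System.Transactions):
using (var scope = new System.Transactions.TransactionScope())
{
    using (var dc = new MyDataContext())
    {
        // ... first batch of changes ...
        dc.SubmitChanges();
    }
    using (var dc2 = new MyDataContext())
    {
        // ... second batch of changes ...
        dc2.SubmitChanges();
    }
    // Both submits commit atomically only when the scope completes.
    scope.Complete();
}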
I think (2) is not the best option. It sounds like you're going to create a single DataContext and keep it alive for the entire lifetime of your program, which is a bad idea. DataContexts are lightweight objects meant to be spun up when you need them. Trying to keep the references around is also likely to tightly couple areas of your program you'd rather keep separate.
Running a hundred ALTER TABLE statements one time, regenerating the context and keeping the architecture simple and decoupled is the elegant answer...
find the data context that originally created the object and use that to submit changes
Where did your data context go? Why is it so hard to find? You're only using one at any given time, right?
So what the heck am I supposed to attach, then, if not an existing entity? If I wanted a new record, I would do an InsertOnSubmit()! So how are you supposed to use Attach()?
You're supposed to attach an instance that represents an existing record, but was not loaded by another data context; you can't have two contexts tracking record state on the same instance. If you produce a new instance (i.e. a clone), you'll be good to go.
You might want to check out this article and its concurrency patterns for update and delete section.
The "An entity can only be attached as modified without original state if it declares a version member" error when attaching an entitity that has a timestamp member will (should) only occur if the entity has not travelled 'over the wire' (read: been serialized and deserialized again). If you're testing with a local test app that is not using WCF or something else that will result in the entities being serialized and deserialized then they will still keep references to the original datacontext through entitysets/entityrefs (associations/nav. properties).
If this is the case, you can work around it by serializing and deserializing it locally before calling the datacontext's .Attach method. E.g.:
// Requires: using System.IO; and using System.Runtime.Serialization;
internal static T CloneEntity<T>(T originalEntity)
{
    Type entityType = typeof(T);
    DataContractSerializer ser = new DataContractSerializer(entityType);
    using (MemoryStream ms = new MemoryStream())
    {
        // Round-tripping through the serializer drops the event registrations
        // that tie the entity to its original DataContext.
        ser.WriteObject(ms, originalEntity);
        ms.Position = 0;
        return (T)ser.ReadObject(ms);
    }
}
Alternatively, you can detach the entity by setting all entity sets/entity refs to null, but that is more error-prone; so, although it is a bit more expensive, I just use the DataContractSerializer method above whenever I want to simulate n-tier behavior locally...
(related thread: http://social.msdn.microsoft.com/Forums/en-US/linqtosql/thread/eeeee9ae-fafb-4627-aa2e-e30570f637ba )
You can reattach to a new DataContext. The only thing that prevents you from doing so under normal circumstances is the property-changed event registrations that occur within the EntitySet<T> and EntityRef<T> classes. To allow the entity to be transferred between contexts, you first have to detach it from the original DataContext by removing these event registrations, and then later reattach it to the new context using the DataContext.Attach() method.
Here's a good example.
When you retrieve the data in the first place, turn off object tracking on the context that does the retrieval. This prevents the object's state from being tracked on the original context. Then, when it's time to save the values, attach the object to the new context, refresh it to set the original values on the object from the database, and then submit changes. The following worked for me when I tested it.
MyClass obj = null;
using (DataContext context = new DataContext())
{
context.ObjectTrackingEnabled = false;
obj = (from p in context.MyClasses
where p.ID == someId
select p).FirstOrDefault();
}
obj.Name += "test";
using (DataContext context2 = new DataContext())
{
context2.MyClasses.Attach(obj);
context2.Refresh(System.Data.Linq.RefreshMode.KeepCurrentValues, obj);
context2.SubmitChanges();
}
I've run into a scenario where I essentially need to write the changes of a child entity of a one-to-many association to the database, but not save any changes made to the parent entity.
The Entity Framework currently deals with database commits at the context scope (EntityContext.SaveChanges()), which makes sense for enforcing relationships, etc. But I'm wondering if there is some best practice or recommended way to do fine-grained database commits on individual entities instead of the entire context.
Best practices? Do you mean, besides, "Don't do it!"?
I don't think there is a best practice for making an ObjectContext different than the state of the database.
If you must do this, I would new up a new ObjectContext and make the changes to the child entity there. That way, both contexts are consistent.
I have a similar need. The solution I am considering is to implement wrapper properties on all entities that store any property changes privately, without affecting the actual entity property. I would then add a SaveChanges() method to the entity, which would write the changes to the entity and then call SaveChanges() on the context.
The problem with this approach is that you need to make all your entities conform to this pattern. But it seems to work pretty well. It does have another downside: if you make a lot of changes to a lot of objects with a lot of data, you end up with extraneous copies in memory.
The only other solution I can think of is, upon saving, to record the states of all changed/added/deleted entities, set them all to unmodified except the one you're saving, save the changes, and then restore the states of the other entities (a rough sketch follows below). But that sounds potentially slow.
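A rough sketch of that state-juggling idea in EF Core terms (SaveOnly is an illustrative helper, not an EF API; it assumes that temporarily flipping Added/Deleted entries to Unchanged and back is acceptable for your model):
void SaveOnly(DbContext context, object targetEntity)
{
    // Remember the state of every other pending entry...
    var others = context.ChangeTracker.Entries()
        .Where(e => e.Entity != targetEntity && e.State != EntityState.Unchanged)
        .ToList();
    var savedStates = others.ToDictionary(e => e, e => e.State);

    // ...park them as Unchanged so only the target entity is written...
    foreach (var entry in others)
        entry.State = EntityState.Unchanged;

    context.SaveChanges();

    // ...then restore their pending states.
    foreach (var pair in savedStates)
        pair.Key.State = pair.Value;
}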
This can be accomplished by using AcceptAllChanges().
Make your changes to the parent entity, call AcceptAllChanges(), then make your changes to the related entities and call SaveChanges(). The changes you made to the parent will not be saved, because they have been "committed" to the entity but not saved to the database.
using (AdventureWorksEntities adv = new AdventureWorksEntities())
{
var completeHeader = (from o in adv.SalesOrderHeader.Include("SalesOrderDetail")
where o.DueDate > System.DateTime.Now
select o).First();
completeHeader.ShipDate = System.DateTime.Now;
adv.AcceptAllChanges();
var details = completeHeader.SalesOrderDetail.Where(x => x.UnitPrice > 10.0m);
foreach (SalesOrderDetail d in details)
{
d.UnitPriceDiscount += 5.0m;
}
adv.SaveChanges();
}
This worked for me. Use the ChangeTracker.Clear() method to clear out changes for other entities.
_contextICH.ChangeTracker.Clear();
var x = _contextICH.UnitOfMeasure.Attach(parameterModel);
x.State = (parameterModel.ID != null) ? Microsoft.EntityFrameworkCore.EntityState.Modified : Microsoft.EntityFrameworkCore.EntityState.Added;
_contextICH.SaveChanges();