I have a data structure which looks something like this: foo 1:* bar 1:* baz
It could look something like this when passed to the client:
{
    id: 1,
    bar: [{
        id: 1,
        baz: []
    },
    {
        id: 2,
        baz: [{
            id: 1
        }]
    }]
}
In my UI, this is represented by a tree structure, where the user can update/add/remove items on all levels.
My question is, when the user has made modifications and I'm sending the altered data back to the server, how should I perform the EF database update? My initial thought was to implement dirty tracking on the client side, and make use of the dirty flag on the server in order to know what to update. Or maybe EF can be smart enough to do an incremental update itself?
Unfortunately, EF provides little if any help for such a scenario.
The change tracker works well in connected scenarios, but working with disconnected entities has been largely left up to the developers using EF. The context methods for manipulating entity state can handle simple scenarios with primitive data, but they do not play well with related data.
The general way to handle it is to load the existing data (including related data) from the database, then detect and apply the adds/updates/deletes yourself. But accounting for all the related data (navigation property) types (one-to-many (owned), many-to-one (associated), many-to-many, etc.), plus how hard it is to work with EF6 metadata, makes a generic solution extremely difficult.
The only attempt to address the issue generically, AFAIK, is the GraphDiff package. Applying the modifications with that package in your scenario is as simple as this:
using RefactorThis.GraphDiff;

IEnumerable<Foo> foos = ...;
using (var db = new YourDbContext())
{
    foreach (var foo in foos)
    {
        db.UpdateGraph(foo, fooMap =>
            fooMap.OwnedCollection(f => f.Bars, barMap =>
                barMap.OwnedCollection(b => b.Bazs)));
    }
    db.SaveChanges();
}
See Introducing GraphDiff for Entity Framework Code First - Allowing automated updates of a graph of detached entities for more info about the problem and how the package is addressing the different aspects of it.
The drawback is that the package is no longer supported by its author, and there is no EF Core support in case you decide to port from EF6 (working with disconnected entities in EF Core has some improvements, but still doesn't offer a general out-of-the-box solution).
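If you do move to EF Core, a rough starting point (a sketch only, not a full replacement for GraphDiff) is ChangeTracker.TrackGraph, which walks a detached graph and lets you decide each entity's state. Note that it cannot detect children deleted on the client, so those still have to be found by comparing against the database:

// Sketch, assuming EF Core and that an unset key means "new entity".
db.ChangeTracker.TrackGraph(foo, node =>
{
    node.Entry.State = node.Entry.IsKeySet
        ? EntityState.Modified   // existing row: issue an UPDATE
        : EntityState.Added;     // new row: issue an INSERT
});
db.SaveChanges();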
But implementing the update correctly by hand, even for a specific model, is a real pain. Just for comparison, the most condensed equivalent of the above UpdateGraph call, for 3 simple entities having only primitive and collection-type navigation properties, looks something like this:
db.Configuration.AutoDetectChangesEnabled = false;

// Load the existing Foos (with their Bars and Bazs) that match the incoming graph
var fooIds = foos.Where(f => f.Id != 0).Select(f => f.Id).ToList();
var oldFoos = db.Foos
    .Where(f => fooIds.Contains(f.Id))
    .Include(f => f.Bars.Select(b => b.Bazs))
    .ToDictionary(f => f.Id);

foreach (var foo in foos)
{
    Foo dbFoo;
    if (!oldFoos.TryGetValue(foo.Id, out dbFoo))
    {
        // New Foo
        dbFoo = db.Foos.Create();
        dbFoo.Bars = new List<Bar>();
        db.Foos.Add(dbFoo);
    }
    db.Entry(dbFoo).CurrentValues.SetValues(foo);

    var oldBars = dbFoo.Bars.ToDictionary(b => b.Id);
    foreach (var bar in foo.Bars)
    {
        Bar dbBar;
        if (!oldBars.TryGetValue(bar.Id, out dbBar))
        {
            // New Bar
            dbBar = db.Bars.Create();
            dbBar.Bazs = new List<Baz>();
            db.Bars.Add(dbBar);
            dbFoo.Bars.Add(dbBar);
        }
        else
        {
            oldBars.Remove(bar.Id);
        }
        db.Entry(dbBar).CurrentValues.SetValues(bar);

        var oldBazs = dbBar.Bazs.ToDictionary(b => b.Id);
        foreach (var baz in bar.Bazs)
        {
            Baz dbBaz;
            if (!oldBazs.TryGetValue(baz.Id, out dbBaz))
            {
                // New Baz
                dbBaz = db.Bazs.Create();
                db.Bazs.Add(dbBaz);
                dbBar.Bazs.Add(dbBaz);
            }
            else
            {
                oldBazs.Remove(baz.Id);
            }
            db.Entry(dbBaz).CurrentValues.SetValues(baz);
        }
        // Whatever is left in oldBazs was removed on the client
        db.Bazs.RemoveRange(oldBazs.Values);
    }
    // Whatever is left in oldBars was removed on the client
    db.Bars.RemoveRange(oldBars.Values);
}
db.Configuration.AutoDetectChangesEnabled = true;
I'm inserting a lot of data, wrapped in a transaction (like 2 million+ rows) at a time, using EF 4.1. Now I'd like to add UPDATE logic. Keep in mind, change-tracking is disabled given the volume of data. Off the top of my head, I'd do something like this:
// Obviously simplified code...
public void AddOrUpdate(Foo foo)
{
    if (!db.Foos.Any(x => someEqualityTest(foo)))
    {
        db.Foos.Add(foo);
    }
    else
    {
        var f = db.Foos.First(x => someEqualityTest(foo));
        f = foo;
    }
    db.SaveChanges();
}
Any ideas on how possibly to improve on this?
I would keep the inserts separate from the updates.
For inserts, I would recommend using SqlBulkCopy to insert all records which don't already exist and it's going to be way faster.
First, the Bulk Insert method in your DbContext:
public class YourDbContext : DbContext
{
    public void BulkInsert<T>(string tableName, IList<T> list)
    {
        // SqlBulkCopy requires a SqlConnection, and the connection must be open
        var connection = (SqlConnection)base.Database.Connection;
        if (connection.State != ConnectionState.Open)
            connection.Open();

        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.BatchSize = list.Count;
            bulkCopy.DestinationTableName = tableName;

            var table = new DataTable();
            var props = TypeDescriptor.GetProperties(typeof(T))
                // Dirty hack to make sure we only have system
                // data types (i.e. filter out the
                // relationships/collections)
                .Cast<PropertyDescriptor>()
                .Where(p => "System" == p.PropertyType.Namespace)
                .ToArray();

            foreach (var prop in props)
            {
                bulkCopy.ColumnMappings.Add(prop.Name, prop.Name);
                var type = Nullable.GetUnderlyingType(prop.PropertyType)
                           ?? prop.PropertyType;
                table.Columns.Add(prop.Name, type);
            }

            var values = new object[props.Length];
            foreach (var item in list)
            {
                for (var i = 0; i < values.Length; i++)
                {
                    values[i] = props[i].GetValue(item);
                }
                table.Rows.Add(values);
            }
            bulkCopy.WriteToServer(table);
        }
    }
}
Then, for your insert/update:
public void AddOrUpdate(IList<Foo> foos)
{
    var foosToUpdate = db.Foos.Where(x => foos.Contains(x)).ToList();
    var foosToInsert = foos.Except(foosToUpdate).ToList();

    foreach (var foo in foosToUpdate)
    {
        var f = db.Foos.First(x => someEqualityTest(x));
        // update the existing foo `f` with values from `foo`
    }

    // Insert the new Foos into the table named "Foos"
    db.BulkInsert("Foos", foosToInsert);
    db.SaveChanges();
}
Your update...
var f = db.Foos.First(x => someEqualityTest(foo));
f = foo;
...won't work because you are not changing the loaded and attached object f at all; you just overwrite the variable f with the detached object foo. The attached object is still in the context, but it has not been changed since it was loaded, and you no longer have a variable pointing to it. SaveChanges will do nothing in this case.
The "standard options" you have are:
var f = db.Foos.First(x => someEqualityTest(foo));
db.Entry(f).State = EntityState.Modified;
or just
db.Entry(foo).State = EntityState.Modified;
// attaches as Modified, no need to load f
This marks ALL properties as modified - no matter if they really changed or not - and will send an UPDATE for each column to the database.
The second option, which will mark only the properties that really changed as modified and send an UPDATE only for those columns, is:
var f = db.Foos.First(x => someEqualityTest(foo));
db.Entry(f).CurrentValues.SetValues(foo);
Now, with 2 million objects to update you don't have a "standard" situation, and it is possible that both options (especially the second, which likely uses reflection internally to match property names of the source and target objects) are too slow.
The best option when it comes to update performance is change tracking proxies. This means that you need to mark EVERY property in your entity class as virtual (not only the navigation properties, but also the scalar properties) and that you don't disable creation of change tracking proxies (it is enabled by default).
When you load your object f from the database, EF will then create a dynamic proxy object (derived from your entity), similar to lazy loading proxies, which has code injected into every property setter to maintain a flag indicating whether the property has been changed or not.
The change tracking provided by proxies is much faster than snapshot-based change tracking (which happens in SaveChanges or DetectChanges).
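For illustration, a minimal sketch of an entity shaped for change tracking proxies (property names assumed, not taken from your model):

public class Foo
{
    // Every property virtual, including scalars; collection navigation
    // properties must be typed as ICollection<T> for proxy creation to work.
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual ICollection<Bar> Bars { get; set; }
}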
I am not sure though if the two options mentioned above are faster if you use change tracking proxies. It is possible that you need manual property assignments to get the best performance:
var f = db.Foos.First(x => someEqualityTest(foo));
f.Property1 = foo.Property1;
f.Property2 = foo.Property2;
// ...
f.PropertyN = foo.PropertyN;
In my experience, in a similar update situation with a few thousand objects, there is no real alternative to change tracking proxies as far as performance is concerned.
For the last few days I have been trying to properly update my POCO entities. More specifically, it's the many-to-many relationship collections.
I've three database tables:
Author - 1..n - AuthorBooks - n..1 - Books.
Translates to two POCO entities:
An Author entity with a Books collection and a Book entity with an Authors collection.
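For reference, the entities look roughly like this (property names inferred from the code below; other scalar properties omitted):

public class Author
{
    public int AuthorId { get; set; }
    public virtual ICollection<Book> Books { get; set; }
}

public class Book
{
    public int BookId { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Author> Authors { get; set; }
}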
Case
When I have one active DbContext, retrieve a Book entity, add an Author and call SaveChanges(), the changes are properly sent to the database. All fine so far.
However, I have a desktop application with a limited DbContext lifetime, as shown in the code fragments below.
public Book GetBook(int id)
{
    using (var context = new LibariesContext())
    {
        return context.Books
            .Include(b => b.Authors)
            .AsNoTracking()
            .Single(b => b.BookId == id);
    }
}

public Author GetAuthor(int id)
{
    using (var context = new LibariesContext())
    {
        return context.Authors
            .AsNoTracking()
            .Single(a => a.AuthorId == id);
    }
}
A simplified example of my various business logic methods wraps it together:
public void BusinessLogicMethods()
{
    Book book = GetBook(id: 1);
    Author author = GetAuthor(id: 1);

    book.Name = "New book title";
    book.Authors.Add(author);

    SaveBook(book);
}

public void SaveBook(Book book)
{
    using (var context = new LibariesContext())
    {
        context.Entry(book).State = System.Data.EntityState.Modified;
        context.SaveChanges();
    }
}
Unfortunately the only thing that is really saved here is the name of the book. The new author was not saved, nor was an exception thrown.
Questions
What's the best way to save the collection of a detached entity?
Any workaround for this issue?
I'm new to EF 4.1 too, and if I understand your question correctly, I think I ran into this nonsense over the weekend. After trying every approach under the sun to get the entries to update, I found a mantra here on SO (I can't find it any more) and turned it into a generic extension method. So now, after the line:
book.Authors.Add(author);
I would add the line:
context.UpdateManyToMany(book, b => b.Authors);
You might need to restructure your code to make this happen.
Anyway... here's the extension method I wrote. Let me know if it works (no guarantees!!!)
public static void UpdateManyToMany<TSingle, TMany>(
    this DbContext ctx,
    TSingle localItem,
    Func<TSingle, ICollection<TMany>> collectionSelector)
    where TSingle : class
    where TMany : class
{
    DbSet<TSingle> localItemDbSet = ctx.Set(typeof(TSingle)).Cast<TSingle>();
    DbSet<TMany> manyItemDbSet = ctx.Set(typeof(TMany)).Cast<TMany>();

    // Use the ObjectContext metadata to find the key property names of TSingle
    ObjectContext objectContext = ((IObjectContextAdapter)ctx).ObjectContext;
    ObjectSet<TSingle> tempSet = objectContext.CreateObjectSet<TSingle>();
    IEnumerable<string> localItemKeyNames = tempSet.EntitySet.ElementType.KeyMembers.Select(k => k.Name);
    var localItemKeysArray = localItemKeyNames.Select(kn => typeof(TSingle).GetProperty(kn).GetValue(localItem, null));

    // Load the database version of the detached item
    localItemDbSet.Load();
    TSingle dbVerOfLocalItem = localItemDbSet.Find(localItemKeysArray.ToArray());

    IEnumerable<TMany> localCol = collectionSelector(localItem) ?? Enumerable.Empty<TMany>();
    ICollection<TMany> dbColl = collectionSelector(dbVerOfLocalItem);

    // Rebuild the db-side collection from the attached versions of the detached items
    dbColl.Clear();
    ObjectSet<TMany> tempSet1 = objectContext.CreateObjectSet<TMany>();
    IEnumerable<string> collectionKeyNames = tempSet1.EntitySet.ElementType.KeyMembers.Select(k => k.Name);
    var selectedDbCats = localCol
        .Select(c => collectionKeyNames.Select(kn => typeof(TMany).GetProperty(kn).GetValue(c, null)).ToArray())
        .Select(manyItemDbSet.Find);
    foreach (TMany xx in selectedDbCats)
    {
        dbColl.Add(xx);
    }

    // Finally copy the scalar values of the detached item onto the attached one
    ctx.Entry(dbVerOfLocalItem).CurrentValues.SetValues(localItem);
}
I came across this question when I was attempting to solve the same problem. In the end I took a different approach, which seems to be working: I ended up exposing a "state" property on the entities and having the calling context set the state of the entities within the object graph.
This reduces the overhead on the web service/data context side of having to determine what's changed, given that the graph could be populated by any number of query permutations. There's also a NuGet package called GraphDiff which might work for you as well (details in the link below).
Anyhow, full details here: http://sanderstechnology.com/2013/solving-the-detached-many-to-many-problem-with-the-entity-framework/12505/
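A rough sketch of that approach (the enum, interface and method names here are my own, not taken from the linked article):

public enum ObjectState { Unchanged, Added, Modified, Deleted }

public interface IObjectWithState
{
    ObjectState State { get; set; }   // set by whoever built/modified the graph
}

// After attaching the root (e.g. context.Books.Add(book)), translate the
// caller-supplied states into EF entity states just before SaveChanges.
static void ApplyStateChanges(DbContext context)
{
    foreach (var entry in context.ChangeTracker.Entries<IObjectWithState>())
    {
        switch (entry.Entity.State)
        {
            case ObjectState.Added:    entry.State = EntityState.Added;     break;
            case ObjectState.Modified: entry.State = EntityState.Modified;  break;
            case ObjectState.Deleted:  entry.State = EntityState.Deleted;   break;
            default:                   entry.State = EntityState.Unchanged; break;
        }
    }
}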
I need to switch the data context for some records. So basically I have db contexts A and B; I fetch records using A, then I switch to B, alter the records, and save them.
When I call Attach on B, I get an exception that the records are being used by multiple data contexts; when I add Detach from A, I get an exception that the records are not attached to A.
So how can I switch the data context?
Example
db_creator is the creator of the db context. Here I fetch the data (corrected version):
using (var db = db_creator.Create())
{
    var coll = db.Mailing.Where(it => !it.mail_IsSent).ToList(); // (*)
    coll.ForEach(it => db.Detach(it));
    return coll;
}
(*) the mistake was caused by refactoring this piece: I had created an extra data context, and then later I tried to detach the records from another one.
Now I would like to switch the data context to a new one, do some computation and modifications, and save the records. coll is a List of the records:
using (var db = db_creator.Create())
{
    coll.ForEach(it => db.Mailing.Attach(it));
    ...
    db.SaveChanges();
}
I recommend changing your design to have ONE context at a time. (Based on your project type this could vary; usually in web apps it's one context per HTTP request.)
For example in a web application, you can do this like below:
protected MyContext Context
{
    get
    {
        var context = HttpContext.Current.Items["MyContext"];
        if (context == null)
        {
            context = new MyContext();
            HttpContext.Current.Items.Add("MyContext", context);
        }
        return context as MyContext;
    }
}
And dispose of it in your Application_EndRequest:
app.EndRequest += (sender, args) =>
{
    var context = HttpContext.Current.Items["MyContext"] as MyContext;
    if (context != null) context.Dispose();
    HttpContext.Current.Items.Remove("MyContext");
};
If you have multiple project types, then consider using an IoC container.
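For example, a minimal sketch using Autofac's MVC integration (an assumption for illustration; any container with a per-request lifetime works similarly):

var builder = new ContainerBuilder();
builder.RegisterType<MyContext>()
       .AsSelf()
       .InstancePerRequest();   // one context per HTTP request, disposed when the request ends
var container = builder.Build();
DependencyResolver.SetResolver(new AutofacDependencyResolver(container));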
But if you still want to use two contexts, you can do it as below (myEntity is the object you want to detach/attach):
if (context1.Entry(myEntity).State != EntityState.Detached)
{
    ((IObjectContextAdapter)context1).ObjectContext.Detach(myEntity);
}
context2.MyEntities.Attach(myEntity);
I have come to the conclusion that it's best (i.e. easier to avoid problems) to use ApplyCurrentValues instead of attaching. That is because when you call Attach there are several things going on that we don't know about, which may surface in one way or another through an exception. I prefer doing things in a way where I have control over what is done.
var myMailings = db_creator.Create().Mailing.Where(it => !it.mail_IsSent).ToList();
... // make modifications and retrieve coll a collection of Mailing objects

using (var db = db_creator.Create())
{
    ... // if you want to further modify the objects in coll you should do this before writing it to the context
    foreach (Mailing it in coll)
    {
        if (it.EntityKey != null) db.GetObjectByKey(it.EntityKey); // load entity
        else db.Mailing.Single(x => x.YourId == it.YourId);        // load the entity when EntityKey is not available, e.g. because it's a POCO
        db.Mailing.ApplyCurrentValues(it);                          // sets the entity state to Modified
    }
    db.SaveChanges();
}
EDIT:
I tested the performance of this vs using Attach. For a simple table with an integer primary key and an int, a float and a string column, updating 1000 entries took 2.6s vs 0.27s, so this approach is significantly slower.
EDIT2:
A similar question was raised here. There the answer warned about using ApplyCurrentValues in conjunction with timestamp columns.
I also compared performance when loading the entity with db.GetObjectByKey(it.EntityKey), and there the performance difference is much smaller: ApplyCurrentValues then takes just 0.44s.
I have been trying to write some generic code to create an XML package of Habanero business objects. The code can currently handle composition relationships, but I need to add the association relationships manually. Is there any way to add association relationships that don't have a composite reverse relationship in a more generic way?
This is how the composition relationships are added:
private static void AddRelatedCompositionObjects(Package package, IBusinessObject businessObject)
{
    businessObject.Relationships
        .Where(rel => rel.RelationshipType == RelationshipType.Composition)
        .Where(rel => rel is IMultipleRelationship)
        .Select(rel => (IMultipleRelationship)rel)
        .ForEach(rel => rel.BusinessObjectCollection
            .AsEnumerable<IBusinessObject>()
            //.ForEach(package.Add));
            .ForEach(bo => BuildPackage(package, bo)));

    businessObject.Relationships
        .Where(rel => rel.RelationshipType == RelationshipType.Composition)
        .Where(rel => rel is ISingleRelationship)
        .Select(rel => (ISingleRelationship)rel)
        //.ForEach(rel => package.Add(rel.GetRelatedObject()));
        .ForEach(rel => BuildPackage(package, rel.GetRelatedObject()));
}
And then I manually add the association relationships
var package = new Package();
foreach (var returnDelivery in returnDeliveries)
{
    package.Add(returnDelivery);
    if (returnDelivery != null)
    {
        var materials = returnDelivery.DeliveryItems.Select(item => item.Material).Distinct();
        materials.ToList().ForEach(material =>
        {
            package.Add(material);
            material.EWTMaterials.ForEach(package.Add);
        });
        package.Add(returnDelivery.Customer);
    }
}
The first thing to realise is that Habanero does not require you to have a reverse relationship defined, although if you are generating your class definitions from Firestarter you will have one.
I have stolen this sample snippet from the ClassDefValidator in Habanero.BO, so it might not be exactly what you want, and it could certainly be generalised into the architecture.
What this code snippet does is get the reverse relationshipDef for a relationshipDef. The code lives in the CheckRelationshipsForAClassDef method of Habanero.BO.ClassDefValidator; if you look there you will see the code that gets the relatedClassDef. It should be pretty easy to convert this into something you need.
If you have any problems then give me a shout.
if (!HasReverseRelationship(relationshipDef)) return;

string reverseRelationshipName = relationshipDef.ReverseRelationshipName;
if (!relatedClassDef.RelationshipDefCol.Contains(reverseRelationshipName))
{
    throw new InvalidXmlDefinitionException
        (string.Format
            ("The relationship '{0}' could not be loaded for because the reverse relationship '{1}' defined for class '{2}' is not defined as a relationship for class '{2}'. Please check your ClassDefs.xml or fix in Firestarter.",
             relationshipDef.RelationshipName, reverseRelationshipName, relatedClassDef.ClassNameFull));
}
var reverseRelationshipDef = relatedClassDef.RelationshipDefCol[reverseRelationshipName];
Brett
Model #1 - This model sits in a database on our Dev Server. (Image: http://content.screencast.com/users/Keith.Barrows/folders/Jing/media/bdb2b000-6e60-4af0-a7a1-2bb6b05d8bc1/Model1.png)
Model #2 - This model sits in a database on our Prod Server and is updated each day by automatic feeds. (Image: http://content.screencast.com/users/Keith.Barrows/folders/Jing/media/4260259f-bce6-43d5-9d2a-017bd9a980d4/Model2.png)
I have written what should be some simple code to sync my feed (Model #2) into my working DB (Model #1). Please note this is prototype code and the models may not be as pretty as they should be. Also, the entry of the feed link data (mainly ClientID) into Model #1 is a manual process at this point, which is why I am writing this simple sync method.
private void SyncFeeds()
{
    var sourceList = from a in _dbFeed.Auto where a.Active == true select a;
    foreach (RivWorks.Model.NegotiationAutos.Auto source in sourceList)
    {
        var targetList = from a in _dbRiv.Product where a.alternateProductID == source.AutoID select a;
        if (targetList.Count() > 0)
        {
            // UPDATE...
            try
            {
                var product = targetList.First();
                product.alternateProductID = source.AutoID;
                product.isFromFeed = true;
                product.isDeleted = false;
                product.SKU = source.StockNumber;
                _dbRiv.SaveChanges();
            }
            catch (Exception ex)
            {
                string m = ex.Message;
            }
        }
        else
        {
            // INSERT...
            try
            {
                long clientID = source.Client.ClientID;
                var companyDetail = (from a in _dbRiv.AutoNegotiationDetails where a.ClientID == clientID select a).First();
                var company = companyDetail.Company;
                switch (companyDetail.FeedSourceTable.ToUpper())
                {
                    case "AUTO":
                        var product = new RivWorks.Model.Negotiation.Product();
                        product.alternateProductID = source.AutoID;
                        product.isFromFeed = true;
                        product.isDeleted = false;
                        product.SKU = source.StockNumber;
                        company.Product.Add(product);
                        break;
                }
                _dbRiv.SaveChanges();
            }
            catch (Exception ex)
            {
                string m = ex.Message;
            }
        }
    }
}
Now for the questions:
In Model #2, the class structure for Auto is missing ClientID (see red circled area). From everything I have learned, EF creates a child class of Client and I should be able to find the ClientID in the child class. Yet, when I run my code, source.Client is a NULL object. Am I expecting something that EF does not do? Is there a way to populate the child class correctly?
Why does EF hide the child entity ID (ClientID in this case) in the parent table? Is there any way to expose it?
What else sticks out like the proverbial sore thumb?
TIA
1) The reason you are seeing a null for source.Client is because related objects are not loaded until you request them, or they are otherwise loaded into the object context. The following will load them explicitly:
if (!source.ClientReference.IsLoaded)
{
    source.ClientReference.Load();
}
However, this is sub-optimal when you have a list of more than one record, as it sends one database query per Load() call. A better alternative is to use the Include() method in your initial query, to instruct the ORM to load the related entities you are interested in, so:
var sourceList = from a in _dbFeed.Auto.Include("Client") where a.Active == true select a;
An alternative third method is to use something called relationship fix-up: if, for instance, the related clients had already been queried, they would still be in your object context. For example:
var clients = (from a in _dbFeed.Client select a).ToList();
The EF will then 'fix up' the relationships so that source.Client would not be null. Obviously this is only something you would do if you required a list of all clients for syncing, so it is not relevant for your specific example.
Always remember that objects are never loaded into the EF unless you request them!
2) The first version of the EF deliberately does not map foreign key fields to observable fields or properties. This is a good rundown on the matter. In EF4.0, I understand foreign keys will be exposed due to popular demand.
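For example, with foreign key associations (EF 4 onwards) the entity could expose the key directly; a hypothetical shape for the Auto class, with names taken from the question's code:

public class Auto
{
    public long AutoID { get; set; }
    public string StockNumber { get; set; }
    public bool Active { get; set; }

    public long ClientID { get; set; }          // foreign key exposed as a scalar property
    public virtual Client Client { get; set; }  // navigation property
}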
3) One issue you may run into is the number of database queries generated by requesting Products or AutoNegotiationContacts one at a time. As an alternative, consider loading them in bulk or with a join in your initial query.
It's also seen as good practice to use an object context for one 'operation' and then dispose of it, rather than persisting it across requests. There is very little overhead in initialising one, so one object context per SyncFeeds() call is more appropriate. ObjectContext implements IDisposable, so you can instantiate it in a using block and wrap the method's contents in that, to ensure everything is cleaned up correctly once your changes are submitted.
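In other words, something along these lines (a sketch only; the context type names are assumed, not taken from your project):

private void SyncFeeds()
{
    using (var dbFeed = new NegotiationAutosEntities())   // hypothetical context types,
    using (var dbRiv = new RivWorksEntities())            // created once per sync operation
    {
        // ... the existing update/insert logic from above, using the local contexts ...
        dbRiv.SaveChanges();
    }   // both contexts (and their connections) are disposed here
}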