I am trying to solve a problem here as well as learn and improve my coding skills.
I am using Entity Framework and am tasked with writing to a SQL table: if a row doesn't exist, insert it; if it does exist, update it if required.
I have two lists. The first is the EF entity type of the table I am writing to. The second is a class built from a SQL query that shares some of the columns of that table; if the shared properties differ, the table should be updated with the differing property values.
foreach (var tbl in Table_List)
{
    foreach (var query in SQL_Query)
    {
        if (tbl.ID == query.ID)
        {
            bool changed = false;
            if (tbl.Prop1 != query.Prop1)
            {
                tbl.Prop1 = query.Prop1;
                changed = true;
            }
            if (tbl.Prop2 != query.Prop2)
            {
                tbl.Prop2 = query.Prop2;
                changed = true;
            }
            if (changed)
                await Context.SaveChangesAsync();
        }
    }
}
There are 10 properties in total in the class, but even if all 10 of them differ I only have to update 2 properties; the rest can stay the same. So to summarize, my question is:
Is there a better way to update these 2 properties, something other than a bulky series of if statements and foreach loops? Any info on straight-up inserting would be appreciated too, thanks very much!
EF uses an internal ChangeTracker. This means that when you change a property of an entity that is being tracked (you queried the lists using a DbSet on the Context), it will be marked as "Changed" in the ChangeTracker. SaveChangesAsync uses this to determine what to do, i.e. insert or update, and which fields need to be updated.
Second, the ChangeTracker is smart enough to detect that you are setting a property to the value it already has; in that case it won't be marked as a change.
Also, with the ChangeTracker there is no need to call SaveChangesAsync after every change. You can call it once at the end of the loop.
foreach (var tbl in Table_List)
{
    foreach (var query in SQL_Query)
    {
        if (tbl.ID == query.ID)
        {
            tbl.Prop1 = query.Prop1;
            tbl.Prop2 = query.Prop2;
        }
    }
}
await Context.SaveChangesAsync();
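For the insert half of the question: rows that exist only in the query results can simply be added to the DbSet, and the same SaveChangesAsync call will insert them. A minimal sketch under the question's assumptions (ID is the key; TableEntity and TableEntities are hypothetical names for the entity class and its DbSet), using a dictionary so each row is matched once instead of re-scanning the inner list:
// Index the tracked rows by ID to avoid the nested loop.
var byId = Table_List.ToDictionary(t => t.ID);
foreach (var query in SQL_Query)
{
    TableEntity tbl;
    if (byId.TryGetValue(query.ID, out tbl))
    {
        // Existing row: the change tracker only marks these properties
        // as modified if the new values actually differ.
        tbl.Prop1 = query.Prop1;
        tbl.Prop2 = query.Prop2;
    }
    else
    {
        // Missing row: Add marks the entity for insertion.
        Context.TableEntities.Add(new TableEntity
        {
            ID = query.ID,
            Prop1 = query.Prop1,
            Prop2 = query.Prop2
        });
    }
}
await Context.SaveChangesAsync();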
We're investigating a performance issue where EF 6.1.3 is being painfully slow, and we cannot figure out what might be causing it.
The database context is initialized with:
Configuration.ProxyCreationEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
We have isolated the performance issue to the following method:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
    string[] changedProperties)
{
    using (var session = sessionFactory.CreateReadWriteSession(false, false))
    {
        var writer = session.Writer<T>();
        writer.Attach(entity);
        await writer.UpdatePropertyAsync(entity, changedProperties.ToArray()).ConfigureAwait(false);
    }
    return entity.Id;
}
There are two names in the changedProperties list, and EF correctly generated an update statement that updates just these two properties.
This method is called repeatedly (to process a collection of data items) and takes about 15-20 seconds to complete.
If we replace the method above with the following, execution time drops to 3-4 seconds:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
    string[] changedProperties)
{
    var sql = $"update {entity.TypeName()}s set";
    var separator = false;
    foreach (var property in changedProperties)
    {
        sql += (separator ? ", " : " ") + property + " = @" + property;
        separator = true;
    }
    sql += " where id = @Id";
    var parameters = (from parameter in changedProperties.Concat(new[] { "Id" })
                      let property = entity.GetProperty(parameter)
                      select ContextManager.CreateSqlParameter(parameter, property.GetValue(entity))).ToArray();
    using (var session = sessionFactory.CreateReadWriteSession(false, false))
    {
        await session.UnderlyingDatabase.ExecuteSqlCommandAsync(sql, parameters).ConfigureAwait(false);
    }
    return entity.Id;
}
The UpdatePropertyAsync method called on the writer (a repository implementation) looks like this:
public virtual async Task UpdatePropertyAsync(T entity, string[] changedPropertyNames, bool save = true)
{
    if (changedPropertyNames == null || changedPropertyNames.Length == 0)
    {
        return;
    }
    Array.ForEach(changedPropertyNames, name => context.Entry(entity).Property(name).IsModified = true);
    if (save)
        await context.SaveChangesAsync().ConfigureAwait(false);
}
What is EF doing that completely kills performance? And is there anything we can do to work around it (short of using another ORM)?
By timing the code I was able to see that the additional time spent by EF was in the call to Attach the object to the context, and not in the actual query to update the database.
By eliminating all object references (setting them to null before attaching the object and restoring them after the update is complete) the EF code runs in "comparable times" (5 seconds, but with lots of logging code) to the hand-written solution.
So it looks like EF has a "bug" (some might call it a feature) causing it to inspect the attached object recursively even though change tracking and validation have been disabled.
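For reference, the workaround looks roughly like this (Parent and Children are hypothetical navigation properties standing in for whatever references the entity actually holds):
// Temporarily clear navigation properties so Attach does not walk the
// whole object graph, then restore them once the update is done.
var parent = entity.Parent;
var children = entity.Children;
entity.Parent = null;
entity.Children = null;

writer.Attach(entity);
await writer.UpdatePropertyAsync(entity, changedProperties).ConfigureAwait(false);

entity.Parent = parent;
entity.Children = children;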
Update: EF 7 appears to have addressed this issue by allowing you to pass in a GraphBehavior enum when calling Attach.
The problem with Entity Framework is that when you call SaveChanges(), insert statements are sent to the database one by one; that's how EF works.
And there are actually two db hits per insert: the first is the insert statement for the record, and the second is a select statement to get the id of the inserted record.
So you have numOfRecords * 2 database trips * the time for one database trip.
Add context.Database.Log = message => Debug.WriteLine(message); to your code to log the generated SQL, and you will see what I am talking about.
You can use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/
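If I remember the library's API correctly (treat this as an assumption and check the project page), usage is a single extension method on the context:
using EntityFramework.BulkInsert.Extensions; // assumed namespace from the package

// One bulk copy operation instead of one INSERT (plus id SELECT) per row.
context.BulkInsert(listOfEntities);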
Seeing as though you have already tried setting:
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
and you are not using ordered lists, I think you are going to have to refactor your code and do some benchmarking.
I believe the bottleneck is the foreach, as the context has to deal with a potentially large amount of bulk data (not sure how much that is in your case).
Try cutting the items in your array down into smaller batches before calling the SaveChanges(); or SaveChangesAsync(); methods, and note the performance deviations as opposed to letting the context grow too large (a sketch follows below).
Also, if you are still not seeing further gains, try disposing of the context after SaveChanges(); and creating a new one; depending on the size of your entities list, flushing out the context may yield even further improvements.
But this all depends on how many entities we are talking about, and may only be noticeable in the hundreds-or-thousands-of-records scenarios.
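A sketch of the batching idea (MyContext, MyEntities, and batchSize are placeholders to adapt and benchmark):
const int batchSize = 100; // tune by benchmarking

for (var skip = 0; ; skip += batchSize)
{
    using (var context = new MyContext())
    {
        // Page through the table with a stable ordering so each batch
        // is tracked by its own short-lived context.
        var batch = context.MyEntities
            .OrderBy(e => e.Id)
            .Skip(skip)
            .Take(batchSize)
            .ToList();
        if (batch.Count == 0) break;

        foreach (var entity in batch)
        {
            // ...make changes to entity...
        }
        context.SaveChanges(); // one save per batch
    }
}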
I'm using Entity Framework 6, Code First approach. I'll try to present my problem with a simple piece of code:
public void ViewEntity(MyEntity Entity) // Want to read properties of my entity
{
    using (var Db = new MyDbContext())
    {
        var DummyList = Db.MyEntities.ToList(); // Iteration on this DbSet
        Db.MyEntities.Attach(Entity); // Exception
    }
}
The exception message is: Attaching an entity of type 'MyProgram.MyEntity' failed because another entity of the same type already has the same primary key value.
From what I've read on MSDN this is expected behaviour. But what I want on that last line is to first check whether an entity with the same key is already attached to the context; if it is, use that one, and only otherwise attach my entity to the context.
But I've failed to find a way to do so. There are many utility methods on the ObjectContext instance (for example GetObjectByKey). I can't test them all because they all ultimately need a qualifiedEntitySetName, and I don't have one in my real implementation, because this method should live on an abstract class and work for all entity types. Calling Db.Entry(this) is no use; there is no EntityKey which would have an EntitySetName.
So all of this became complex really fast. In my terms, I just want to check whether the object is already in the "cache" (the context); if so, use it, otherwise use my object and attach it to the context.
To be clear, I have a detached object from a TreeNode.Tag in the first place, and I just want to use it again, or, if that's impossible (because there already is one in the context), use the attached one instead. Maybe I'm missing some crucial concepts of EF6; I'm just starting out with EF.
I've found a solution for me. As I guessed correctly, the ObjectContext.GetObjectByKey method does what I need, but first I needed to construct a qualifiedEntitySetName, and I found a way to do so. A tad cumbersome (using reflection to iterate the properties of MyDbContext), but it does not compare to the headache of a problem I made out of all this. Just in case, here's the piece of code that is a solution for me:
public SdsAbstractObject GetAttachedToContext()
{
    var ObjContext = (SdsDbContext.Current as IObjectContextAdapter).ObjectContext;
    var ExistingItem = ObjContext.GetObjectByKey(GetEntityKey()) as SdsAbstractObject;
    if (ExistingItem != null)
        return ExistingItem;
    else
    {
        DbSet.Attach(this);
        return this;
    }
}

public EntityKey GetEntityKey()
{
    string DbSetName = "";
    foreach (var Prop in typeof(SdsDbContext).GetProperties())
    {
        if (Prop.PropertyType.IsGenericType
            && Prop.PropertyType.GenericTypeArguments[0] == ObjectContext.GetObjectType(GetType()))
            DbSetName = Prop.Name;
    }
    if (String.IsNullOrWhiteSpace(DbSetName))
        return null;
    else
        return new EntityKey("SdsDbContext." + DbSetName, "Id", Id);
}
An entity can be in one of five states: Added, Unchanged, Modified, Deleted, Detached.
public void ViewEntity(MyEntity entity) // Want to read properties of my entity
{
    using (var Db = new MyDbContext())
    {
        var DummyList = Db.MyEntities.ToList(); // Iteration on this DbSet

        // Attach the detached entity, then set its state to Modified
        // (or write defensive code to check the state before setting it).
        Db.MyEntities.Attach(entity);
        Db.Entry(entity).State = EntityState.Modified;
        Db.SaveChanges();
    }
}
Since EF doesn't know which properties are different from those in the database, it will update them all.
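If you know which properties you changed, you can avoid the full update by marking only those as modified instead of setting the whole entity's state (a sketch; Prop1 and Prop2 stand in for your real property names):
using (var Db = new MyDbContext())
{
    Db.MyEntities.Attach(entity);
    // Only these columns appear in the generated UPDATE statement.
    Db.Entry(entity).Property(e => e.Prop1).IsModified = true;
    Db.Entry(entity).Property(e => e.Prop2).IsModified = true;
    Db.SaveChanges();
}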
I have a table that contains more than half a million records. Each record contains about 60 fields, but we only make changes to three of them.
We make a small modification to each entity based on a calculation and a lookup.
Clearly I can't update each entity in turn and then SaveChanges, as that would take far too long.
So at the end of the whole process I call SaveChanges on the context.
This is causing an Out of Memory error when I call SaveChanges.
I'm using the DataRepository pattern.
// Update code
DataRepository<ExportOrderSKUData> repoExportOrders = new DataRepository<ExportOrderSKUData>();
foreach (ExportOrderSKUData grpDCItem in repoExportOrders.All())
{
    // ..make changes to entity..
}
repoExportOrders.SaveChanges();

// Data repository snip
public DataRepository()
{
    _context = new tomEntities();
    _objectSet = _context.CreateObjectSet<T>();
}

public List<T> All()
{
    return _objectSet.ToList<T>();
}

public void SaveChanges()
{
    _context.SaveChanges();
}
What should I be looking for in this instance?
Making changes to half a million records through EF within one transaction is not the intended use case. Doing it in small batches is a better technical solution. Doing it on the database side through a stored procedure can be an even better solution.
I would start by slightly modifying your code (translate it to your repository API yourself):
using (var readContext = new YourContext())
{
    var set = readContext.CreateObjectSet<ExportOrderSKUData>();
    foreach (var item in set.ToList())
    {
        readContext.Detach(item);
        using (var updateContext = new YourContext())
        {
            updateContext.Attach(item);
            // make your changes
            updateContext.SaveChanges();
        }
    }
}
This code uses a separate context for saving each item, so each save runs in its own transaction. Don't be afraid of that. Even if you try to save more records within one call to SaveChanges, EF will use a separate roundtrip to the database for every updated record. The only difference is whether you want multiple updates in the same transaction (but having half a million updates in a single transaction would cause issues anyway).
Another option may be:
using (var readContext = new YourContext())
{
    var set = readContext.CreateObjectSet<ExportOrderSKUData>();
    set.MergeOption = MergeOption.NoTracking;
    foreach (var item in set)
    {
        using (var updateContext = new YourContext())
        {
            updateContext.Attach(item);
            // make your changes
            updateContext.SaveChanges();
        }
    }
}
This can in theory consume even less memory, because you don't need all entities loaded before the foreach. The first example probably needs to load all entities before enumerating (by calling ToList) to avoid an exception when calling Detach (modifying the collection during enumeration), but I'm not sure whether that really happens.
Modifying those examples to use batches should be easy.
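For instance, the second example can commit a fixed number of items per context instead of one at a time (batchSize is a placeholder; tune it by measuring):
using (var readContext = new YourContext())
{
    var set = readContext.CreateObjectSet<ExportOrderSKUData>();
    set.MergeOption = MergeOption.NoTracking;

    const int batchSize = 100;
    var updateContext = new YourContext();
    var pending = 0;
    try
    {
        foreach (var item in set)
        {
            updateContext.Attach(item);
            // make your changes
            if (++pending == batchSize)
            {
                updateContext.SaveChanges();
                updateContext.Dispose();          // flush the tracked entities
                updateContext = new YourContext();
                pending = 0;
            }
        }
        if (pending > 0) updateContext.SaveChanges(); // final partial batch
    }
    finally
    {
        updateContext.Dispose();
    }
}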
I need to switch the data context for some records. Basically I have db contexts A and B; I fetch records using A, then switch to B, alter the records, and save them.
When I call Attach on B, I get an exception that the records are being used by multiple data contexts; when I add a Detach from A, I get an exception that the records are not attached to A.
So how can I switch the data context?
Example
db_creator creates the db context. Here I fetch the data (corrected version):
using (var db = db_creator.Create())
{
    var coll = db.Mailing.Where(it => !it.mail_IsSent).ToList(); // (*)
    coll.ForEach(it => db.Detach(it));
    return coll;
}
(*) The mistake was caused by refactoring this piece: I had created an extra data context and then tried to detach the records from a different one.
Now I would like to switch to a new data context, do some computation and modifications, and save the records. coll is a List of the records:
using (var db = db_creator.Create())
{
    coll.ForEach(it => db.Mailing.Attach(it));
    ...
    db.SaveChanges();
}
I recommend changing your design to have ONE context at a time. (Based on your project type this can vary; in web apps it's usually one context per HTTP request.)
For example, in a web application you can do it like this:
protected MyContext Context
{
    get
    {
        var context = HttpContext.Current.Items["MyContext"];
        if (context == null)
        {
            context = new MyContext();
            HttpContext.Current.Items.Add("MyContext", context);
        }
        return context as MyContext;
    }
}
And dispose it in your Application_EndRequest:
app.EndRequest += (sender, args) =>
{
    // Dispose the request's context before dropping the reference.
    var ctx = HttpContext.Current.Items["MyContext"] as MyContext;
    if (ctx != null) ctx.Dispose();
    HttpContext.Current.Items.Remove("MyContext");
};
If you have multiple project types, consider using an IoC container.
But if you still want to use two contexts, you can do it as below (myEntity is the object you want to detach/attach):
if (context1.Entry(myEntity).State != EntityState.Detached)
{
    ((IObjectContextAdapter)context1).ObjectContext.Detach(myEntity);
}
context2.MyEntities.Attach(myEntity);
I have come to the conclusion that it's best (i.e. easiest to avoid problems) to use ApplyCurrentValues instead of attaching. When you call Attach, several things go on under the hood that we don't know about but that may surface one way or another through an exception. I prefer to do things in a way where I have control over what is done.
var coll = db_creator.Create().Mailing.Where(it => !it.mail_IsSent).ToList();
... // make modifications; coll is the resulting collection of Mailing objects
using (var db = db_creator.Create())
{
    // if you want to further modify the objects in coll, do it before writing them to the context
    foreach (Mailing it in coll)
    {
        if (it.EntityKey != null)
            db.GetObjectByKey(it.EntityKey); // load the entity into the context
        else
            db.Mailing.Single(x => x.YourId == it.YourId); // load the entity when EntityKey is not available, e.g. because it's a POCO
        db.Mailing.ApplyCurrentValues(it); // sets the entity state to Modified
    }
    db.SaveChanges();
}
EDIT:
I tested the performance of this vs using Attach. It should be noted that, for a simple table with an integer primary key plus an int, a float and a string column, updating 1000 entries took 2.6 s this way vs 0.27 s with Attach, so this is significantly slower.
EDIT2:
A similar question was raised here. There the answer warned about using ApplyCurrentValues in conjunction with timestamp columns.
I also compared performance when loading the entity with db.GetObjectByKey(it.EntityKey), and there the performance difference is much smaller: ApplyCurrentValues then takes just 0.44 s.
I want to update my database using a LINQ2SQL query.
However, for some reason this seems to be a very ugly task compared to the otherwise lovely LINQ code.
The query needs to update two tables.
tbl_subscription
(
    id int,
    sub_name nvarchar(100),
    sub_desc nvarchar(500),
    ... and so on
)

tbl_subscription2tags
(
    sub_id (FK to tbl_subscription),
    tag_id (FK to a table called tbl_subscription_tags)
)
Now, down in my update function, I send a tbl_subscription entity with the tags and everything.
I can't find a pretty way to update my database.
I can only find ugly examples where I suddenly have to map all the attributes.
There must be a smarter way to perform this. Please help.
A C# example if possible.
I have tried this with no effect:
public void UpdateSubscription(tbl_subscription subscription)
{
    db.tbl_subscriptions.Attach(subscription);
    db.Refresh(System.Data.Linq.RefreshMode.OverwriteCurrentValues, subscription);
    db.SubmitChanges(System.Data.Linq.ConflictMode.FailOnFirstConflict);
}
Source for this code is here:
http://skyeyefive.spaces.live.com/blog/cns!6B6EB6E6694659F2!516.entry
Why not just make the changes to the objects and call SubmitChanges on the DataContext?
using (MyDataContext dc = new MyDataContext("ConnectionString"))
{
    foreach (var foo in dc.foo2)
    {
        foo.prop1 = 1;
    }
    dc.SubmitChanges();
}
Otherwise you need to tell us more about the lifecycle of the object you want to manipulate.
Edit: forgot to wrap the using in brackets.
Unless I'm misunderstanding your situation, I think that citronas is right.
The best and easiest way I've found to update database items through LINQ to SQL is the following:
1. Obtain the item you want to change from the data context.
2. Change whatever values you want to update.
3. Call the SubmitChanges() method of the data context.
Sample Code
The sample code below assumes I have a data context named DBDataContext that connects to a database with a Products table that has ID and Price columns. A productID variable contains the ID of the record you want to update.
using (var db = new DBDataContext())
{
    // Step 1 - get the item from the data context
    var product = db.Products.Where(p => p.ID == productID).SingleOrDefault();
    if (product == null) // Error checking
    {
        throw new ArgumentException();
    }

    // Step 2 - change whatever values you want to update
    product.Price = 100;

    // Step 3 - submit the changes
    db.SubmitChanges();
}
I found out that you can use Attach, as seen in my question, to update the main table, but apparently not the sub-tables. So I just used a few Attach calls and it worked, without having to map all the attributes!
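For reference, the attach-as-modified overload that makes this work looks like this (a sketch; LINQ to SQL needs either a timestamp/rowversion column on the table, or UpdateCheck = Never on the members, to build the UPDATE without original values; MyDataContext is a hypothetical context name):
public void UpdateSubscription(tbl_subscription subscription)
{
    using (var db = new MyDataContext())
    {
        // Attach the detached entity as modified so SubmitChanges
        // generates an UPDATE for it.
        db.tbl_subscriptions.Attach(subscription, true);
        db.SubmitChanges();
    }
}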