I have a streets table, which has a combo of two string columns acting as the PK, being postalcode and streetcode.
With EF4.1 and DBContext, I'd like to write a single "Save" method than takes a street (coming in in an unattached state), checks if it already exists in the database. If it does, it issues an UPDATE, and if it doesn't, it issues an INSERT.
FYI, the application that saves these streets, is reading them from a textfile and saves them (there a few tens of thousands of these "streetlines" in that file).
What I've come up with for now is:
public void Save(Street street)
{
var existingStreet = (
from s in streetContext.Streets
where s.PostalCode.Equals(street.PostalCode)
&& s.StreetCode.Equals(street.StreetCode)
select s
).FirstOrDefault();
if (existingStreet != null)
this.streetContext.Entry(street).State = System.Data.EntityState.Modified;
else
this.streetContext.Entry(street).State = System.Data.EntityState.Added;
this.streetContext.SaveChanges();
}
Is this good practice ? How about performance here ? Cause for every street it first does a roundtrip to the db to see if it exists.
Wouldn't it be better performance-wise to try to insert the street (state = added), and catch any PK violations ? In the catch block, I can then change the state to modified and call SaveChanges() again. Or would that not be a good practice ?
Any suggestions ?
Thanks
Select all the streets then make a for each loop that compares and change states. After the loop is done call saveChanges. This way you only make a few calls to the db instead of several thousends
Thanks for the replies but none were really satisfying, so I did some more research and rewrote the method like this, which satisfies my need.
ps: renamed the method to Import() because I find that a more appropriate name for a method that is used for (bulk) importing entities from an outside source (like a textfile in my case)
ps2: I know it's not really best practice to catch an exception and let it die silently by not doing anything with it, but I don't have the need to do anything with it in my particular case. It just serves as a method to find out that the row already exists in the database.
public void Import(Street street)
{
try
{
this.streetContext.Entry(street).State = System.Data.EntityState.Added;
this.streetContext.SaveChanges();
}
catch (System.Data.Entity.Infrastructure.DbUpdateException dbUex)
{
this.streetContext.Entry(street).State = System.Data.EntityState.Modified;
this.streetContext.SaveChanges();
}
finally
{
((IObjectContextAdapter)this.streetContext).ObjectContext.Detach(street);
}
}
Your code must result in exception if street already exists in the database because you will load it from the context and after that you will try to attach another instance with the same primary key to the same context instance.
If you really have to do this use this code instead:
public void Save(Street street)
{
string postalCode = street.PostalCode;
string streetCode = steet.StreetCode;
bool existingStreet = streetContext.Streets.Any(s =>
s.PostalCode == postalCode
&& s.StreetCode = steetCode);
if (existingStreet)
streetContext.Entry(street).State = System.Data.EntityState.Modified;
else
streetContext.Entry(street).State = System.Data.EntityState.Added;
streetContext.SaveChanges();
}
Anyway it is still not reliable solution in highly concurrent system because other thread can insert the same street between you check and subsequent insert.
Related
We're investigating a performance issue where EF 6.1.3 is being painfully slow, and we cannot figure out what might be causing it.
The database context is initialized with:
Configuration.ProxyCreationEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
We have isolated the performance issue to the following method:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
var writer = session.Writer<T>();
writer.Attach(entity);
await writer.UpdatePropertyAsync(entity, changedProperties.ToArray()).ConfigureAwait(false);
}
return entity.Id;
}
There are two names in the changedProperties list, and EF correctly generated an update statement that updates just these two properties.
This method is called repeatedly (to process a collection of data items) and takes about 15-20 seconds to complete.
If we replace the method above with the following, execution time drops to 3-4 seconds:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
var sql = $"update {entity.TypeName()}s set";
var separator = false;
foreach (var property in changedProperties)
{
sql += (separator ? ", " : " ") + property + " = #" + property;
separator = true;
}
sql += " where id = #Id";
var parameters = (from parameter in changedProperties.Concat(new[] { "Id" })
let property = entity.GetProperty(parameter)
select ContextManager.CreateSqlParameter(parameter, property.GetValue(entity))).ToArray();
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
await session.UnderlyingDatabase.ExecuteSqlCommandAsync(sql, parameters).ConfigureAwait(false);
}
return entity.Id;
}
The UpdatePropertiesAsync method called on the writer (a repository implementation) looks like this:
public virtual async Task UpdatePropertyAsync(T entity, string[] changedPropertyNames, bool save = true)
{
if (changedPropertyNames == null || changedPropertyNames.Length == 0)
{
return;
}
Array.ForEach(changedPropertyNames, name => context.Entry(entity).Property(name).IsModified = true);
if (save)
await context.SaveChangesAsync().ConfigureAwait(false);
}
}
What is EF doing that completely kills performance? And is there anything we can do to work around it (short of using another ORM)?
By timing the code I was able to see that the additional time spent by EF was in the call to Attach the object to the context, and not in the actual query to update the database.
By eliminating all object references (setting them to null before attaching the object and restoring them after the update is complete) the EF code runs in "comparable times" (5 seconds, but with lots of logging code) to the hand-written solution.
So it looks like EF has a "bug" (some might call it a feature) causing it to inspect the attached object recursively even though change tracking and validation have been disabled.
Update: EF 7 appears to have addressed this issue by allowing you to pass in a GraphBehavior enum when calling Attach.
The problem with Entity framework is that when you call SaveChanges(), insert statements are sent to database one by one, that's how Entity works.
And actually there are 2 db hits per insert, first db hit is insert statement for a record, and the second one is select statement to get the id of inserted record.
So you have numOfRecords * 2 database trips * time for one database trip.
Write this in your code context.Database.Log = message => Debug.WriteLine(message); to log generated sql to console, and you will see what am I talking about.
You can use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/
Seeing as though you already have tried setting:
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
And you are not using an ordered lists, I think you are going to have to refactor your code and do some benchmarking.
I believe the bottleneck is coming from the foreach as the context is having to deal with a potentially large amounts of bulk data (not sure how many this is in your case).
Try and cut the items contained in your array down into smaller batches before calling the SaveChanges(); or SaveChangesAsync(); methods, and note the performance deviations as apposed to letting the context grow too large.
Also, if you are still not seeing further gains, try disposing of the context post SaveChanges(); and then creating a new one, depending on the size of your entities list, flushing out the context may yield even further improvements.
But this all depends on how many entities we are talking about and may only be noticeable in the hundreds and thousands of record scenarios.
Currently, I am struggling with an issue regarding Entity Framework (LINQ to Entities). Most of the time when I try to execute entity.SaveChanges() everything works fine but at some points entity.SaveChanges() takes too much and timesouts. I searched a lot but was unable to find out the answer.
(According to companies policy, I cannot copy code somewhere else. So, I do not have the exact code but I will try to layout the basic structure. I hope it helps you to figure out the problem but if i doesn't then let me know.)
Task:
My task is to scan the whole network for some specific files. Match content of each file with the content of database and based on the matching either insert or update the database with the content of the file. I have around 3000 files on the network.
Problem:
public void PerformAction()
{
DbTransaction tran = null;
entity.Connection.Open(); //entity is a global variable declared like myDatabaseEntity entity = new myDatabaseEntity();
tran = entity.Connection.BeginTransaction();
foreach(string path in listOfPaths)
{
//returns 1 - Multiple matching in database OR
// 2 - One matching file in database OR
// 3 - No Matching found.
int returnValue = SearchDatabase();
if(returnValue == 1)
DoSomething(); //All inserts/updates work perfectly. Save changes also works correctly.
else if(returnValue == 2)
DoSomething(); //Again, everything ok. SaveChanges works perfectly here.
else
{
//This function uses some XML file to generate all the queries dynamically
//Forexample INSERT INTO TABLEA(1,2,3);
GenerateInsertQueriesFromXML();
ExecuteQueries();
SaveChanges(); <---- Problem here. Sometimes take too much time.
}
//Transaction commit/rollback code here
}
}
public bool ExecuteQueries()
{
int result = 0;
foreach(string query in listOfInsertQueries)
{
result = entity.ExecuteStoreCommand(query); //Execute the insert queries
if(result <=0)
return false;
}
entity.TestEntityA a = new entity.TestEntityA();
a.PropertyA = 123;
a.PropertyB = 345;
//I have around 25 properties here
entity.AddToTestEntityA(a);
return true;
}
Found the issue.
The main table where i was inserting all the data had a trigger on INSERT and DELETE.
So, whenever i inserted some new data in the main table, the trigger was firing in the backend and was taking all the time.
Entity framework is FAST and INNOCENT :D
I have a User entity in my entity model:
Username and Email should be unique but for now EF4 doesn't support this.
So I wrote this code to ensure uniqueness :
public void CreateNewUser(string i_UserName, string i_Email)
{
using (ModelContainer context = new ModelContainer())
{
User usr;
usr = context.UserSet.Where(u => u.Username == i_UserName).SingleOrDefault();
if (usr != null)
throw new Exception("Username not unique");
usr = context.UserSet.Where(u => u.Email == i_Email).SingleOrDefault();
if (usr != null)
throw new Exception("Email not unique");
context.UserSet.AddObject(new User() { Username = i_UserName, Email = i_Email });
context.SaveChanges();
}
}
If this is right approach, do I have way to automatically preform this code whenever context.UserSet.AddObject() is called? Or a more elegant way does exist?
I don't know of a more elegant way. Rather than using SingleOrDefault, I think you want something like:
bool isUniqueUserName = !context.UserSet.Any(u => u.Username == i_UserName);
for performance. Any() stops at the first match and doesn't have to enumerate the entire sequence.
Correct way is defining Unique index in database and catching an exception. These checks in code must be much more complex then simple: say me if the user already exists. First of all this is not only problem of insert but user can also usually modify his email. Another problem is concurrency - in very corner case you can have two users inserting same records in the same time. In both threads test queries can return that user name and email are not registered but during the saving one thread will get an exception (if you have unique index in db) or duplicit record will be created. If you want to avoid it you must lock your database records and lock table for insertion (serializable transaction). Such operation can decrease throughput of your application.
Another fact is that you will do 2 additional queries before each insert and at least one query before each update.
I use the same approach. Just ensure that you have one method that insert it and you should be find. Similar to factory method pattern.
Branching off of #ysrb's answer (having a single method that does the User insert), a more foolproof solution might be to hook into the ObjectContext.SavingChanges event and do your test there if a User entity is being saved. This way you can be sure your logic always fires.
Example:
IEnumerable<ObjectStateEntry> changes = this.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
foreach (ObjectStateEntry stateEntryEntity in changes)
{
if (!stateEntryEntity.IsRelationship && stateEntryEntity.Entity != null)
{
if (stateEntryEntity.Entity is User)
{
//Do you work here
}
}
}
My code is written with C# and the data layer is using LINQ to SQL that fill/load detached object classes.
I have recently changed the code to work with multi threads and i'm pretty sure my DAL isn't thread safe.
Can you tell me if PopCall() and Count() are thread safe and if not how do i fix them?
public class DAL
{
//read one Call item from database and delete same item from database.
public static OCall PopCall()
{
using (var db = new MyDataContext())
{
var fc = (from c in db.Calls where c.Called == false select c).FirstOrDefault();
OCall call = FillOCall(fc);
if (fc != null)
{
db.Calls.DeleteOnSubmit(fc);
db.SubmitChanges();
}
return call;
}
}
public static int Count()
{
using (var db = new MyDataContext())
{
return (from c in db.Calls select c.ID).Count();
}
}
private static OCall FillOCall(Model.Call c)
{
if (c != null)
return new OCall { ID = c.ID, Caller = c.Caller, Called = c.Called };
else return null;
}
}
Detached OCall class:
public class OCall
{
public int ID { get; set; }
public string Caller { get; set; }
public bool Called { get; set; }
}
Individually they are thread-safe, as they use isolated data-contexts etc. However, they are not an atomic unit. So it is not safe to check the count is > 0 and then assume that there is still something there to pop. Any other thread could be mutating the database.
If you need something like this, you can wrap in a TransactionScope which will give you (by default) the serializable isolation level:
using(var tran = new TransactionScope()) {
int count = OCall.Count();
if(count > 0) {
var call = Count.PopCall();
// TODO: something will call, assuming it is non-null
}
}
Of course, this introduces blocking. It is better to simply check the FirstOrDefault().
Note that PopCall could still throw exceptions - if another thread/process deletes the data between you obtaining it and calling SubmitChanges. The good thing about it throwing here is that you shouldn't find that you return the same record twice.
The SubmitChanges is transactional, but the reads aren't, unless spanned by a transaction-scope or similar. To make PopCall atomic without throwing:
public static OCall PopCall()
{
using(var tran = new TrasactionScope())
using (var db = new MyDataContext())
{
var fc = (from c in db.Calls where c.Called == false select c).FirstOrDefault();
OCall call = FillOCall(fc);
if (fc != null)
{
db.Calls.DeleteOnSubmit(fc);
db.SubmitChanges();
}
return call;
}
tran.Complete();
}
}
Now the FirstOrDefault is covered by the serializable isolation-level, so doing the read will take a lock on the data. It would be even better if we could explicitly issue an UPDLOCK here, but LINQ-to-SQL doesn't offer this.
Count() is thread-safe. Calling it twice at the same time, from two different threads will not harm anything. Now, another thread might change the number of items during the call, but so what? Another thread might change the number of item a microsecond after it returns, and there's nothing you can do about it.
PopCall on the other hand, does has a possibility of threading problems. One thread could read fc, then before it reaches the SubmitChanges(), another thread may interceded and do the read & delete, before returning to the first thread, which will attempt to delete the already deleted record. Then both calls will return the same object, even though it was your intension that a row only be returned once.
Unfortunately no amount of Linq-To-Sql trickery, nor SqlClient isolation levels, nor System.Transactions can make the PopCall() thread safe, where 'thread safe' really means 'concurrent safe' (ie. when concurrency occurs on the database server, outside the controll and scope of the client code/process). And neither is any sort of C# locking and synchronization going to help you. You just need to deeply internalize how a relational storage engine works in order to get this doen correctly. Using tables as queues (as you do it here) is notoriously tricky , deadlock prone, and really hard to get it correct.
Even less fortunate, your solution is going to have to be platform specific. I'm only going to explain the right way to do it with SQL Server, and that is to leverage the OUTPUT clause. If you want to get a bit more details why this is the case, read this article Using tables as Queues. Your Pop operation must occur atomically in the database with a calls like this:
WITH cte AS (
SELECT TOP(1) ...
FROM Calls WITH (READPAST)
WHERE Called = 0)
DELETE
FROM cte
OUTPUT DELETED.*;
Not only this, but the Calls table has to be organized with a leftmost clustered key on the Called column. Why this is the case, is again explained in the article I referenced before.
In this context the Count call is basically useless. Your only way to check correctly for an item available is to Pop, asking for Count is just going to put useless stress on the database to return a COUNT() value which means nothing under a concurrent environment.
Model #1 - This model sits in a database on our Dev Server.
Model #1 http://content.screencast.com/users/Keith.Barrows/folders/Jing/media/bdb2b000-6e60-4af0-a7a1-2bb6b05d8bc1/Model1.png
Model #2 - This model sits in a database on our Prod Server and is updated each day by automatic feeds. alt text http://content.screencast.com/users/Keith.Barrows/folders/Jing/media/4260259f-bce6-43d5-9d2a-017bd9a980d4/Model2.png
I have written what should be some simple code to sync my feed (Model #2) into my working DB (Model #1). Please note this is prototype code and the models may not be as pretty as they should. Also, the entry into Model #1 for the feed link data (mainly ClientID) is a manual process at this point which is why I am writing this simple sync method.
private void SyncFeeds()
{
var sourceList = from a in _dbFeed.Auto where a.Active == true select a;
foreach (RivWorks.Model.NegotiationAutos.Auto source in sourceList)
{
var targetList = from a in _dbRiv.Product where a.alternateProductID == source.AutoID select a;
if (targetList.Count() > 0)
{
// UPDATE...
try
{
var product = targetList.First();
product.alternateProductID = source.AutoID;
product.isFromFeed = true;
product.isDeleted = false;
product.SKU = source.StockNumber;
_dbRiv.SaveChanges();
}
catch (Exception ex)
{
string m = ex.Message;
}
}
else
{
// INSERT...
try
{
long clientID = source.Client.ClientID;
var companyDetail = (from a in _dbRiv.AutoNegotiationDetails where a.ClientID == clientID select a).First();
var company = companyDetail.Company;
switch (companyDetail.FeedSourceTable.ToUpper())
{
case "AUTO":
var product = new RivWorks.Model.Negotiation.Product();
product.alternateProductID = source.AutoID;
product.isFromFeed = true;
product.isDeleted = false;
product.SKU = source.StockNumber;
company.Product.Add(product);
break;
}
_dbRiv.SaveChanges();
}
catch (Exception ex)
{
string m = ex.Message;
}
}
}
}
Now for the questions:
In Model #2, the class structure for Auto is missing ClientID (see red circled area). Now, everything I have learned, EF creates a child class of Client and I should be able to find the ClientID in the child class. Yet, when I run my code, source.Client is a NULL object. Am I expecting something that EF does not do? Is there a way to populate the child class correctly?
Why does EF hide the child entity ID (ClientID in this case) in the parent table? Is there any way to expose it?
What else sticks out like the proverbial sore thumb?
TIA
1) The reason you are seeing a null for source.Client is because related objects are not loaded until you request them, or they are otherwise loaded into the object context. The following will load them explicitly:
if (!source.ClientReference.IsLoaded)
{
source.ClientReference.Load();
}
However, this is sub-optimal when you have a list of more than one record, as it sends one database query per Load() call. A better alternative is to the Include() method in your initial query, to instruct the ORM to load the related entities you are interested in, so:
var sourceList = from a in _dbFeed.Auto .Include("Client") where a.Active == true select a;
An alternative third method is to use something call relationship fix-up, where if, in your example for instance, the related clients had been queried previously, they would still be in your object context. For example:
var clients = (from a in _dbFeed.Client select a).ToList();
The EF will then 'fix-up' the relationships so source.Client would not be null. Obviously this is only something you would do if you required a list of all clients for synching, so is not relevant for your specific example.
Always remember that objects are never loaded into the EF unless you request them!
2) The first version of the EF deliberately does not map foreign key fields to observable fields or properties. This is a good rundown on the matter. In EF4.0, I understand foreign keys will be exposed due to popular demand.
3) One issue you may run into is the number of database queries requesting Products or AutoNegotiationContacts may generate. As an alternative, consider loading them in bulk or with a join on your initial query.
It's also seen as good practice to use an object context for one 'operation', then dispose of it, rather than persisting them across requests. There is very little overhead in initialising one, so one object context per SychFeeds() is more appropriate. ObjectContext implements IDisposable, so you can instantiate it in a using block and wrap the method's contents in that, to ensure everything is cleaned up correctly once your changes are submitted.