Currently, I am struggling with an issue regarding Entity Framework (LINQ to Entities). Most of the time when I try to execute entity.SaveChanges() everything works fine but at some points entity.SaveChanges() takes too much and timesouts. I searched a lot but was unable to find out the answer.
(According to companies policy, I cannot copy code somewhere else. So, I do not have the exact code but I will try to layout the basic structure. I hope it helps you to figure out the problem but if i doesn't then let me know.)
Task:
My task is to scan the whole network for some specific files. Match content of each file with the content of database and based on the matching either insert or update the database with the content of the file. I have around 3000 files on the network.
Problem:
public void PerformAction()
{
DbTransaction tran = null;
entity.Connection.Open(); //entity is a global variable declared like myDatabaseEntity entity = new myDatabaseEntity();
tran = entity.Connection.BeginTransaction();
foreach(string path in listOfPaths)
{
//returns 1 - Multiple matching in database OR
// 2 - One matching file in database OR
// 3 - No Matching found.
int returnValue = SearchDatabase();
if(returnValue == 1)
DoSomething(); //All inserts/updates work perfectly. Save changes also works correctly.
else if(returnValue == 2)
DoSomething(); //Again, everything ok. SaveChanges works perfectly here.
else
{
//This function uses some XML file to generate all the queries dynamically
//Forexample INSERT INTO TABLEA(1,2,3);
GenerateInsertQueriesFromXML();
ExecuteQueries();
SaveChanges(); <---- Problem here. Sometimes take too much time.
}
//Transaction commit/rollback code here
}
}
public bool ExecuteQueries()
{
int result = 0;
foreach(string query in listOfInsertQueries)
{
result = entity.ExecuteStoreCommand(query); //Execute the insert queries
if(result <=0)
return false;
}
entity.TestEntityA a = new entity.TestEntityA();
a.PropertyA = 123;
a.PropertyB = 345;
//I have around 25 properties here
entity.AddToTestEntityA(a);
return true;
}
Found the issue.
The main table where i was inserting all the data had a trigger on INSERT and DELETE.
So, whenever i inserted some new data in the main table, the trigger was firing in the backend and was taking all the time.
Entity framework is FAST and INNOCENT :D
Related
In the project, I need to call an external API based on time. So, for one day, I may need to call the API 24 times, one call for one hour period. The API result is a XML file which has 6 fields. I will need to insert these data into a table. Averagely, for each hour, it has about 20,000 rows data.
The table has these 6 columns:
col1, col2, col3, col4, col5, col6
When all 6 columns are the same, we consider the rows are the same, and we should not insert duplications.
I'm using C# and Entity Framework for this:
foreach (XmlNode node in nodes)
{
try
{
count++;
CallData data = new CallData();
...
// get all data and set in 'data'
// check whether in database already
var q = ctx.CallDatas.Where(x => x.col1 == data.col1
&& x.col2 == data.col2
&& x.col3 == data.col3
&& x.col4 == data.col4
&& x.col5 == data.col5
&& x.col6 == data.col6
).Any();
if (q)
{
// exists in database, skip
// log info
}
else
{
string key = $"{data.col1}|{data.col2}|{data.col3}|{data.col4}|{data.col5}|{data.col6}";
// check whether in current chunk already
if (dic.ContainsKey(key))
{
// in current chunk, skip
// log info
}
else
{
// insert
ctx.CallDatas.Add(data);
// update dic
dic.Add(key, true);
}
}
}
catch (Exception ex)
{
// log error
}
}
Logger.InfoFormat("Saving changes ...");
if (ctx.ChangeTracker.HasChanges())
{
await ctx.SaveChangesAsync();
}
Logger.InfoFormat("Saving changes ... Done.");
The code works fine. However, we will need to use this code to run for past several months. The issue is: the code runs slow since for each row it will need to check whether it exists already.
Is there any suggestions to improve the performance?
Thanks
You don't show the code on when the context is created or the life-cycle. I'm inclined to point you to your indexes on the table. If these aren't primary keys then you might see the performance issue there. If you are doing full table scans, it will be progressively slower. With that said, there are two separate ways to handle the
The EF Native way: You can explicitly create a new connection on each interaction (avoiding change tracking for all entries reducing progressive slowdown). Also, your save is async but your *Any statement is sync. Using async for that as well might help take some pressure off the current thread if it's waiting.
// Start your context scope closer to the data call, as if the look is long
// running you could be building up tracked changes in the cache, this prevents
// that situation.
using (YourEntity ctx = new YourEntity())
{
CallData data = new CallData();
if (await ctx.CallDatas.Where(x => x.col1 == data.col1
&& x.col2 == data.col2
&& x.col3 == data.col3
&& x.col4 == data.col4
&& x.col5 == data.col5
&& x.col6 == data.col6
).AnyAsync()
)
{
// exists in database, skip
// log info
}
else
{
string key = $"{data.col1}|{data.col2}|{data.col3}|{data.col4}|{data.col5}|{data.col6}";
// check whether in current chunk already
if (dic.ContainsKey(key))
{
// in current chunk, skip
// log info
}
else
{
// insert
ctx.CallDatas.Add(data);
await ctx.SaveChangesAsync();
// update dic
dic.Add(key, true);
}
}
}
Optional Way: Look into inserting the data using a bulk operation via store procedure. 20k rows is trivial, and you can still use entity framework for that as well. See https://stackoverflow.com/a/9837927/1558178
I have created my own version of this (customized for my specific needs) and have found that it works well and give more control for bulk inserts.
I have used this ideology to insert 100k records at a time. I have my logic in the stored procedure for checking for duplicates which gives me better control as well as reducing the over the wire call to 0 reads and 1 write. This should just take a second or two to execute assuming your stored procedure is optimized.
Different approach:
Save all rows with duplicates - should be very efficient
When you use data from the table use DISTINCT for all fields.
For raw, bulk operations like this I would consider avoiding EF entities and context tracking and merely execute SQL through the context:
var sql = $"IF NOT EXISTS(SELECT 1 FROM CallDates WHERE Col1={data.Col1} AND Col2={data.Col2} AND Col3={data.Col3} AND Col4={data.Col4} AND Col5={data.Col5} AND Col6={data.Col6}) INSERT INTO CallDates(Col1,Col2,Col3,Col4,Col5,Col6) VALUES ({data.Col1},{data.Col2},{data.Col3},{data.Col4},{data.Col5},{data.Col6})";
context.Database.ExeculeSqlCommand(sql);
This does without the extra checks and logging, just effectively raw SQL with duplicate detection.
We're investigating a performance issue where EF 6.1.3 is being painfully slow, and we cannot figure out what might be causing it.
The database context is initialized with:
Configuration.ProxyCreationEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
We have isolated the performance issue to the following method:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
var writer = session.Writer<T>();
writer.Attach(entity);
await writer.UpdatePropertyAsync(entity, changedProperties.ToArray()).ConfigureAwait(false);
}
return entity.Id;
}
There are two names in the changedProperties list, and EF correctly generated an update statement that updates just these two properties.
This method is called repeatedly (to process a collection of data items) and takes about 15-20 seconds to complete.
If we replace the method above with the following, execution time drops to 3-4 seconds:
protected virtual async Task<long> UpdateEntityInStoreAsync(T entity,
string[] changedProperties)
{
var sql = $"update {entity.TypeName()}s set";
var separator = false;
foreach (var property in changedProperties)
{
sql += (separator ? ", " : " ") + property + " = #" + property;
separator = true;
}
sql += " where id = #Id";
var parameters = (from parameter in changedProperties.Concat(new[] { "Id" })
let property = entity.GetProperty(parameter)
select ContextManager.CreateSqlParameter(parameter, property.GetValue(entity))).ToArray();
using (var session = sessionFactory.CreateReadWriteSession(false, false))
{
await session.UnderlyingDatabase.ExecuteSqlCommandAsync(sql, parameters).ConfigureAwait(false);
}
return entity.Id;
}
The UpdatePropertiesAsync method called on the writer (a repository implementation) looks like this:
public virtual async Task UpdatePropertyAsync(T entity, string[] changedPropertyNames, bool save = true)
{
if (changedPropertyNames == null || changedPropertyNames.Length == 0)
{
return;
}
Array.ForEach(changedPropertyNames, name => context.Entry(entity).Property(name).IsModified = true);
if (save)
await context.SaveChangesAsync().ConfigureAwait(false);
}
}
What is EF doing that completely kills performance? And is there anything we can do to work around it (short of using another ORM)?
By timing the code I was able to see that the additional time spent by EF was in the call to Attach the object to the context, and not in the actual query to update the database.
By eliminating all object references (setting them to null before attaching the object and restoring them after the update is complete) the EF code runs in "comparable times" (5 seconds, but with lots of logging code) to the hand-written solution.
So it looks like EF has a "bug" (some might call it a feature) causing it to inspect the attached object recursively even though change tracking and validation have been disabled.
Update: EF 7 appears to have addressed this issue by allowing you to pass in a GraphBehavior enum when calling Attach.
The problem with Entity framework is that when you call SaveChanges(), insert statements are sent to database one by one, that's how Entity works.
And actually there are 2 db hits per insert, first db hit is insert statement for a record, and the second one is select statement to get the id of inserted record.
So you have numOfRecords * 2 database trips * time for one database trip.
Write this in your code context.Database.Log = message => Debug.WriteLine(message); to log generated sql to console, and you will see what am I talking about.
You can use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/
Seeing as though you already have tried setting:
Configuration.AutoDetectChangesEnabled = false;
Configuration.ValidateOnSaveEnabled = false;
And you are not using an ordered lists, I think you are going to have to refactor your code and do some benchmarking.
I believe the bottleneck is coming from the foreach as the context is having to deal with a potentially large amounts of bulk data (not sure how many this is in your case).
Try and cut the items contained in your array down into smaller batches before calling the SaveChanges(); or SaveChangesAsync(); methods, and note the performance deviations as apposed to letting the context grow too large.
Also, if you are still not seeing further gains, try disposing of the context post SaveChanges(); and then creating a new one, depending on the size of your entities list, flushing out the context may yield even further improvements.
But this all depends on how many entities we are talking about and may only be noticeable in the hundreds and thousands of record scenarios.
I'm trying to accomplish 2 things with the below snippet of code (from ApplicationDataService.lsml.cs in the server project of my Lightswitch 2013 solution).
partial void Query1_PreprocessQuery(ref IQueryable<CandidateBasic> query)
{
query = from item in query where item.CreatedBy == this.Application.User.Name select item;
}
partial void CandidateBasics_Validate(CandidateBasic entity, EntitySetValidationResultsBuilder results)
{
var newcandidateCount = this.DataWorkspace.ApplicationData.Details.GetChanges().AddedEntities.OfType<CandidateBasic>().Count();
var databasecandidateCount = this.CandidateBasics.GetQuery().Execute().Count();
const int maxcandidateCount = 1;
if (newcandidateCount + databasecandidateCount > maxcandidateCount)
{
results.AddEntityError("Error: you are only allowed to have one candidate record");
}
}
Firstly, I want to make sure each user can only see things that he has made. This, together with a preprocess query on the table in question, works perfectly.
The next bit is designed to make sure that each user can only create one record in a certain table. Unfortunately, it seems to be looking at the whole table, and not the query I made that shows only the user's own records.
How can I get that second bit of code to limit only the user's own records, and not the global table?
You're not actually calling that query though are you? Your query is called Query1 based on the code provided yet you don't seem to be calling it. I'd do something like:
int count = DataWorkspace.ApplicationData.Query1().Count();
Is there a "best practice" way of handling bulk inserts (via LINQ) but discard records that may already be in the table? Or I am going to have to either do a bulk insert into an import table then delete duplicates, or insert one record at a time?
08/26/2010 - EDIT #1:
I am looking at the Intersect and Except methods right now. I am gathering up data from separate sources, converting into a List, want to "compare" to the target DB then INSERT just the NEW records.
List<DTO.GatherACH> allACHes = new List<DTO.GatherACH>();
State.IState myState = null;
State.Factory factory = State.Factory.Instance;
foreach (DTO.Rule rule in Helpers.Config.Rules)
{
myState = factory.CreateState(rule.StateName);
List<DTO.GatherACH> stateACHes = myState.GatherACH();
allACHes.AddRange(stateACHes);
}
List<Model.ACH> newRecords = new List<Model.ACH>(); // Create a disconnected "record set"...
foreach (DTO.GatherACH record in allACHes)
{
var storeInfo = dbZach.StoreInfoes.Where(a => a.StoreCode == record.StoreCode && (a.TypeID == 2 || a.TypeID == 4)).FirstOrDefault();
Model.ACH insertACH = new Model.ACH
{
StoreInfoID = storeInfo.ID,
SourceDatabaseID = (byte)sourceDB.ID,
LoanID = (long)record.LoanID,
PaymentID = (long)record.PaymentID,
LastName = record.LastName,
FirstName = record.FirstName,
MICR = record.MICR,
Amount = (decimal)record.Amount,
CheckDate = record.CheckDate
};
newRecords.Add(insertACH);
}
The above code builds the newRecords list. Now, I am trying to get the records from this List that are not in the DB by comparing on the 3 field Unique Index:
AchExceptComparer myComparer = new AchExceptComparer();
var validRecords = dbZach.ACHes.Intersect(newRecords, myComparer).ToList();
The comparer looks like:
class AchExceptComparer : IEqualityComparer<Model.ACH>
{
public bool Equals(Model.ACH x, Model.ACH y)
{
return (x.LoanID == y.LoanID && x.PaymentID == y.PaymentID && x.SourceDatabaseID == y.SourceDatabaseID);
}
public int GetHashCode(Model.ACH obj)
{
return base.GetHashCode();
}
}
However, I am getting this error:
LINQ to Entities does not recognize the method 'System.Linq.IQueryable1[MisterMoney.LARS.ZACH.Model.ACH] Intersect[ACH](System.Linq.IQueryable1[MisterMoney.LARS.ZACH.Model.ACH], System.Collections.Generic.IEnumerable1[MisterMoney.LARS.ZACH.Model.ACH], System.Collections.Generic.IEqualityComparer1[MisterMoney.LARS.ZACH.Model.ACH])' method, and this method cannot be translated into a store expression.
Any ideas? And yes, this is completely inline with the original question. :)
You can't do bulk inserts with LINQ to SQL (I presume you were referring to LINQ to SQL when you said "LINQ"). However, based on what you're describing, I'd recommend checking out the new MERGE operator of SQL Server 2008.
Inserting, Updating, and Deleting Data by Using MERGE
Another example here.
I recommend you just write the SQL yourself to do the inserting, I find it is a lot faster and you can get it to work exactly how you want it to. When I did something similar to this (just a one-off program) I just used a Dictionary to hold the ID's I had inserted already, to avoid duplicates.
I find LINQ to SQL is good for one record or a small set that does its entire lifespan in the LINQ to SQL.
Or you can try to use SQL Server 2008's Bulk Insert .
One thing to watch out for is if you queue more than 2000 or so records without calling SubmitChanges() - TSQL has a limit on the number of statements per execution, so you cannot simply queue up every record and then call SubmitChanges() as this will throw an SqlException, you need to periodically clear the queue to avoid this.
I want to update my database using a LINQ2SQL query.
However this seems for some reason to be a very ugly task compared to the otherwise lovely LINQ code.
The query needs to update two tables.
tbl_subscription
(
id int,
sub_name nvarchar(100),
sub_desc nvarchar(500),
and so on.
)
tbl_subscription2tags
(
sub_id (FK to tbl_subscription)
tag_id (FK to a table called tbl_subscription_tags)
)
Now down to my update function a send a tbl_subscription entity with the tags and everything.
I can't find a pretty way to update my database..
I can only find ugly examples where I suddenly have to map all attributes..
There most be a smart way to perform this. Please help.
C# Example if possible.
I have tried this with no effect:
public void UpdateSubscription(tbl_subscription subscription)
{
db.tbl_subscriptions.Attach(subscription);
db.Refresh(System.Data.Linq.RefreshMode.OverwriteCurrentValues, subscription);
db.SubmitChanges(System.Data.Linq.ConflictMode.FailOnFirstConflict);
}
Source for this code is here:
http://skyeyefive.spaces.live.com/blog/cns!6B6EB6E6694659F2!516.entry
Why don't just make the changes to the objects and perform a SubmitChanges to the DataContext?
using(MyDataContext dc = new MyDataContext("ConnectionString"))
{
foreach(var foo in dc.foo2)
{
foo.prop1 = 1;
}
dc.SubmitChanges();
}
Otherwise you need to tell us more about the lifecycle of the object you want to manipulate
edit: forgot to wrap in brackets for using
Unless I'm misunderstanding your situation, I think that citronas is right.
The best and easiest way that I've found to update database items through LINQ to SQL is the following:
Obtain the item you want to change from the data context
Change whatever values you want to update
Call the SubmitChanges() method of the data context.
Sample Code
The sample code below assumes that I have a data context named DBDataContext that connects to a database that has a Products table with ID and Price parameters. Also, a productID variable contains the ID of the record you want to update.
using (var db = new DBDataContext())
{
// Step 1 - get the item from the data context
var product = db.Products.Where(p => p.ID == productID).SingleOrDefault();
if (product == null) //Error checking
{
throw new ArgumentException();
}
// Step 2 - change whatever values you want to update
product.Price = 100;
// Step 3 - submit the changes
db.SubmitChanges();
}
I found out that you can use "Attach" as seen in my question to update a table, but apparently not the sub tables. So I just used a few Attach and it worked without having to run through parameters!