Entity Framework save taking a long time - C#

There are many articles here on EF taking a long time to save, but I've looked through them and used their answers and still seem to get very slow results.
My code looks like so:
using (MarketingEntities1 db = new MarketingEntities1())
{
    //using (var trans = db.Database.BeginTransaction(IsolationLevel.ReadUncommitted))
    //{
    int count = 0;
    db.Configuration.AutoDetectChangesEnabled = false;
    db.Configuration.ValidateOnSaveEnabled = false;
    while (count < ranges.Count)
    {
        if (bgw != null)
        {
            bgw.ReportProgress(0, "Saving count: " + count.ToString());
        }
        db.Set<xGeoIPRanx>().AddRange(ranges.Skip(count).Take(BATCHCOUNT));
        db.SaveChanges();
        count += BATCHCOUNT;
    }
    //trans.Commit();
    //}
}
Each batch takes 30+ seconds to complete. BATCHCOUNT is 1000. I know EF isn't that slow. You can see that I've stopped using a transaction and turned change tracking off; none of it seemed to help.
Some more info:
xGeoIPRanx is an empty table with no PK (I'm not sure how much one would help). I'm trying to insert about 10 million ranges.
Edit:
I feel stupid, but I'm trying to use BulkInsert and I keep getting "this entity doesn't exist" errors. I'm looking at this code:
using (var ctx = GetContext())
{
    using (var transactionScope = new TransactionScope())
    {
        // some stuff in dbcontext
        ctx.BulkInsert(entities);
        ctx.SaveChanges();
        transactionScope.Complete();
    }
}
What is "entities"? I tried a list of my entities, but that doesn't work. What data type is it?
Never mind, it works as expected; it was a strange error caused by how I generated the EDMX file.

Pause the debugger 10 times under load and look at the stack including
external code. Where does it stop most often?
It's taking a long time in .SaveChanges(), judging from some quick tests with ADO.NET code.
That means network latency and server execution time are causing this. For inserts, server execution time is usually not that high. You cannot do anything about network latency with EF because it sends one round trip per insert. (Yes, this is a deficiency of the framework.)
Don't use EF for bulk work. Consider table-valued parameters (sketched below) or SqlBulkCopy, or any other means of bulk inserting, such as Aducci's proposal from the comments.
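As a rough illustration of the table-valued-parameter route, here is a minimal sketch. Everything named here (the table type dbo.GeoIPRangeType, the column names, and the properties on the range objects) is an assumption, not taken from the question:

// using System.Data; using System.Data.SqlClient;
// Assumes a matching user-defined table type has been created once, e.g.:
//   CREATE TYPE dbo.GeoIPRangeType AS TABLE (RangeStart BIGINT, RangeEnd BIGINT);
var table = new DataTable();
table.Columns.Add("RangeStart", typeof(long));
table.Columns.Add("RangeEnd", typeof(long));
foreach (var r in ranges)
    table.Rows.Add(r.RangeStart, r.RangeEnd); // hypothetical property names

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.xGeoIPRanx (RangeStart, RangeEnd) " +
    "SELECT RangeStart, RangeEnd FROM @Ranges", conn))
{
    var p = cmd.Parameters.AddWithValue("@Ranges", table);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.GeoIPRangeType";
    conn.Open();
    cmd.ExecuteNonQuery(); // one round trip for the whole batch
}

For 10 million rows you would still want to send this in chunks (say, 100k rows per call) rather than as one giant parameter.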

Related

How to avoid .NET Connection Pool timeouts when inserting 37k rows

I'm trying to figure out the best way to batch-insert about 37k rows into my SQL Server using Dapper.
My problem is that when I use Parallel.ForEach, the number of connections to the database increases over a short period of time, finally hitting about 100, which gives connection pool errors. If I force the max degree of parallelism, it hits that max number and stays there.
Setting the max degree feels wrong.
It's currently doing about 10-20 inserts a second. This is in a simple console app, so there's no other database activity besides what's happening in my Parallel.ForEach loop.
Is using Parallel.ForEach the wrong choice here because this work is not CPU-bound?
Should I be using async/await? If so, what's stopping this from doing hundreds of db calls in one go?
Sample code, which is basically what I'm doing:
var items = GetItemsFromSomewhere(); // Returns 37K items.
Parallel.ForEach(items, item =>
{
    using (var sqlConnection = new SqlConnection(_connectionString))
    {
        var result = sqlConnection.Execute(myQuery, new { ... } );
    }
});
My (incorrect) understanding of this was that there should only be about 8 or so connections to the db at any time. The connection pool will release the connection (which remains instantiated in the pool, waiting to be used). And if the Execute takes, let's say, even 1 second (the longest-running insert was about 500ms, and that's 1 in every 100 or so), that's ok: that thread is blocked and chills until the Execute completes. Then the scope completes (and Dispose is auto-called) and the connection is closed. With the connection closed, Parallel.ForEach then grabs the next item in the collection, goes to the connection pool and grabs a spare connection (remember, we just closed one a split second ago)... rinse, repeat.
Is this wrong?
Notes:
.NET 4.5
SQL Server 2012
Console app.
Using Dapper.NET for SQL code.
First of all: if it is about performance, use SqlBulkCopy. This works with SQL Server; other database servers may have their own bulk-copy solutions (Oracle has one).
SqlBulkCopy works like a bulk select in reverse: one statement opens one connection and streams all the data. A bulk select streams from the server to the client; a bulk insert streams all the new records from the client to the server.
See: https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
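A minimal sketch of that usage (the destination table, column names, and item properties below are placeholders, not from the question):

// using System.Data; using System.Data.SqlClient;
// Build a DataTable from the items; a List<T> can't be handed to
// SqlBulkCopy directly without an IDataReader adapter.
var table = new DataTable();
table.Columns.Add("Col1", typeof(string));
table.Columns.Add("Col2", typeof(int));
foreach (var item in items)
    table.Rows.Add(item.Col1, item.Col2); // hypothetical properties

using (var bulk = new SqlBulkCopy(_connectionString))
{
    bulk.DestinationTableName = "dbo.MyTable"; // placeholder
    bulk.BatchSize = 5000;
    bulk.WriteToServer(table); // streams all rows to the server
}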
If you insist on using parallelism, you might want to consider the following code:
void BulkInsert<T>(object p)
{
    IEnumerator<T> e = (IEnumerator<T>)p;
    using (var sqlConnection = new SqlConnection(_connectionString))
    {
        sqlConnection.Open(); // open once; this thread reuses the connection for every insert
        while (true)
        {
            T item;
            lock (e)
            {
                if (!e.MoveNext())
                    return;
                item = e.Current;
            }
            var result = sqlConnection.Execute(myQuery, new { ... } ); // bind item's properties here
        }
    }
}
Now create your own threads and invoke this method on them, passing one and the same parameter to each: the enumerator that runs through your collection. Each thread opens its own connection once, starts inserting, and closes the connection after all items are inserted. This solution uses as many connections as you create threads.
PS: Many variants of the above code are possible. You could call it from background threads, from Tasks, etc. I hope you get the point.
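For example, here is a minimal sketch of driving the method above from plain threads (MyItem and the thread count of 4 are placeholders):

// using System.Collections.Generic; using System.Threading;
IEnumerator<MyItem> shared = items.GetEnumerator(); // one enumerator for all threads

var threads = new List<Thread>();
for (int i = 0; i < 4; i++) // 4 threads => 4 connections
{
    var t = new Thread(BulkInsert<MyItem>);
    t.Start(shared); // every thread receives the same enumerator instance
    threads.Add(t);
}
threads.ForEach(t => t.Join()); // wait until the collection is drained

Because every thread pulls items from the single shared enumerator under the lock, the collection is walked exactly once.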
You should use SqlBulkCopy instead of inserting one by one. It's faster and more efficient.
https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
Credits to the owner of this answer:
Sql Bulk Copy/Insert in C#

SagePay (payment gateway) notification return taking a long time in ASP.NET MVC application

We are experiencing performance issues within our website related to high CPU usage. Using a profiler, we have identified a particular method that takes ~35 seconds to return.
This is a call back method when using a payment gateway called SagePay.
I have copied the two methods that are part of this call below:
public void SagePayNotificationReturn()
{
    string vendorTxCode = Request.Form["vendortxcode"];
    var sagePayTransaction = this.sagePayTransactionManager.GetTransactionByVendorTxCode(vendorTxCode);
    if (sagePayTransaction == null)
    {
        // Cannot find the order, so log an error and return error response
        int errorId = this.exceptionManager.LogException(System.Web.HttpContext.Current.Request, new Exception(string.Format("Could not find SagePay transaction for order {0}.", vendorTxCode)));
        ReturnResponse(System.Web.HttpContext.Current, StatusEnum.ERROR, string.Format("{0}home/error/{1}", GlobalSettings.SiteURL, errorId), string.Format("Received notification for {0} but the transaction was not found.", vendorTxCode));
    }
    else
    {
        // Store the response and respond immediately to SagePay
        sagePayTransaction.NotificationValues = sagePayTransactionManager.FormValuesToQueryString(Request.Form);
        this.sagePayTransactionManager.Save(sagePayTransaction);
        ReturnResponse(System.Web.HttpContext.Current, StatusEnum.OK, string.Format("{0}payment/processtransaction/{1}", GlobalSettings.SiteURL, vendorTxCode), string.Empty);
    }
}

private void ReturnResponse(HttpContext context, StatusEnum status, string redirectUrl, string statusDetail)
{
    context.Response.Clear();
    context.Response.ContentEncoding = Encoding.UTF8;
    using (StreamWriter streamWriter = new StreamWriter(context.Response.OutputStream))
    {
        streamWriter.WriteLine(string.Concat("Status=", status.ToString()));
        streamWriter.WriteLine(string.Concat("RedirectURL=", redirectUrl));
        streamWriter.WriteLine(string.Concat("StatusDetail=", HttpUtility.HtmlEncode(statusDetail)));
        streamWriter.Flush();
        streamWriter.Close();
    }
    context.ApplicationInstance.CompleteRequest();
}
The GetTransactionByVendorTxCode method is a simple Entity Framework call, so I've ruled that out.
Does anybody have any experience of this, or can you see anything glaringly wrong with the code that could cause such an issue?
EDIT: Looking at the breakdown table provided by the profiler, it says that 99.6% of the time is spent in System.Web.Mvc.MvcHandler.BeginProcessRequest().
EDIT: Using the profiling tool New Relic, it says that 22% of all processing time is spent in the this.sagePayTransactionManager.GetTransactionByVendorTxCode(vendorTxCode) method. This simply contains an EF6 call through a repository. The call does take a predicate parameter, though, rather than a pre-defined condition. Could it be that the query is not being pre-compiled?
Here's my first step in getting to the solution:
Start a timer before this statement, then stop it when it completes. Tell us the timespan.
var sagePayTransaction = this.sagePayTransactionManager.GetTransactionByVendorTxCode(vendorTxCode);
Put another timer around this code block, and tell us its time relative to the statement above.
using (StreamWriter streamWriter = new StreamWriter(context.Response.OutputStream))
{
    streamWriter.WriteLine(string.Concat("Status=", status.ToString()));
    streamWriter.WriteLine(string.Concat("RedirectURL=", redirectUrl));
    streamWriter.WriteLine(string.Concat("StatusDetail=", HttpUtility.HtmlEncode(statusDetail)));
    streamWriter.Flush();
    streamWriter.Close();
}
Finally put another timer here:
context.ApplicationInstance.CompleteRequest();
Post the information back to us and I'll guide you to the next step. What we are doing above is gathering metrics that span both local and remote work so we can find the major problem; we'll pick that off first and then progress further if needed. Just tell us what the measurements are, as in the sketch below.
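A minimal sketch of such a timer, using System.Diagnostics.Stopwatch (the logging call is a placeholder for whatever you use):

var sw = System.Diagnostics.Stopwatch.StartNew();
var sagePayTransaction = this.sagePayTransactionManager.GetTransactionByVendorTxCode(vendorTxCode);
sw.Stop();
System.Diagnostics.Trace.WriteLine("GetTransactionByVendorTxCode took " + sw.Elapsed);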
There are a number of things you may need to consider here.
If GetTransactionByVendorTxCode is responsible for 22% of total processing time, then you will need to streamline everything that method touches, but then still go on to fish out other bottlenecks in the overall processing pipeline.
You say that method abstracts a call to EF6 and that it passes in a predicate expression that gets used in the Where clause to build up the final query.
If the query is a complex one, have you considered delegating to a stored procedure? Since you are returning an entity, you can hang it off the DbSet, as sketched below. (In the case of a DTO, it would hang off the Database property of the DbContext.)
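As a sketch of what that could look like in EF6 (the DbSet name SagePayTransactions and the procedure name are assumptions):

// Returns tracked entities because the query hangs off the DbSet.
var sagePayTransaction = ctx.SagePayTransactions
    .SqlQuery("EXEC dbo.GetTransactionByVendorTxCode @code",
              new SqlParameter("@code", vendorTxCode))
    .FirstOrDefault();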
Also, you will need to look at the indices on the columns used in your predicate. What is the current record count? Are your queries resulting in seeks or scans? Look at the resulting query plans; if you're using SQL Server, run your queries through the Database Engine Tuning Advisor.
Perhaps a little more detail about your current setup will assist in providing better guidance.

SqlBulkCopy.WriteToServer hangs and Thread.Abort is called, but not sure why

Given:
A BenchMark class that lets me know when something has completed.
A very large XML file (~120MB) that has been parsed into multiple Lists
Some code:
SqlConnection con = null;
SqlTransaction transaction = null;
try
{
    con = getCon(); // gets a new connection object
    con.Open();
    transaction = con.BeginTransaction();
    var bulkCopy = new SqlBulkCopy(con, SqlBulkCopyOptions.Default, transaction)
    {
        BatchSize = 1000,
        DestinationTableName = "Table1"
    };
    // assume that the BenchMark class is working
    b = new BenchMark("Table1");
    bulkCopy.WriteToServer(_insertTable1s.AsDataReader()); // _insertTable1s is a List<Table1>
    b.Complete();
    LogHelper.WriteLogItem(b);

    b = new BenchMark("Table2");
    bulkCopy.DestinationTableName = "Table2";
    bulkCopy.WriteToServer(_insertTable2s.AsDataReader()); // _insertTable2s is a List<Table2>
    b.Complete();
    LogHelper.WriteLogItem(b);

    // etc... this code does a batch insert into about 7 tables, each receiving about 40,000 records.
    b = new BenchMark("Transaction Commit");
    transaction.Commit();
    b.Complete();
}
catch (Exception e)
{
    transaction.Rollback();
    LogHelper.WriteLogItem(
        LogLevel.Critical,
        LogType.DataProcessing,
        e.ToString());
}
finally
{
    con.Close();
}
The Problem:
On my local development environment, everything is fine. It's when I run this operation in the cloud that it hangs. Using the LogHelper.WriteLogItem method, I can watch the progress of this process, and I observe it hanging randomly on a particular table. No exception is thrown, so the transaction isn't rolled back. Say it hangs on the Table2 bulk insert: using MS SQL Management Studio, I can run queries on Table3, Table2 and Table1 with no issue (does this mean that the transaction was aborted?).
Since it hangs, I'll go rerun the process. This time it hangs sooner, so I might get logs like this:
7755 Benchmark LoadXML took 00:00:04.2432816
7756 Benchmark Table1 took 00:00:06.3961230
7757 Benchmark Table2 took 00:00:05.2566890
7758 Benchmark Table3 took 00:00:08.4900921
7759 Benchmark Table4 took 00:00:02.0000123
... it hangs on Table5 (because the BenchMark never completed). I go to run it again and the rest of the log looks like:
7780 Benchmark LoadXML took 00:00:04.1203923
... and it hangs here now.
I'm using Rackspace cloud hosting, if that helps. I have been able to fix this in the past by deleting all the tables from my DBML file and re-adding them, but this time that's not working. I'm wondering if the amount of data being processed is causing the problem?
EDIT: The code in this example is run on an asynchronous thread. I've found out that the thread is aborting for an unknown reason, and I need to find out why in order to solve this problem.
If you have admin rights on your server or database, you can run
SELECT * FROM sys.dm_tran_session_transactions
to see what transactions are currently active (from Pinal).
Additionally, you can run sp_lock to make sure there isn't something blocking your transaction.
Because this process is done asynchronously (i.e. a thread is kicked off to handle it), the thread hits a problem which aborts it, and that is why I get strange behavior where the code stalls at different places. I've solved this by completing the task synchronously (it works, but it's not ideal).
I guess the real issue is why my thread is aborting, since I'm not aborting it anywhere in my code. I believe it's due to the amount of data being processed, but I could be wrong.
Either way, I've solved my problem.

Database file is inexplicably locked during SQLite commit

I'm performing a large number of INSERTs into a SQLite database. I'm using just one thread. I batch the writes to improve performance and to have a bit of security in case of a crash. Basically I cache up a bunch of data in memory and then, when I deem it appropriate, I loop over all of that data and perform the INSERTs. The code for this is shown below:
public void Commit()
{
    using (SQLiteConnection conn = new SQLiteConnection(this.connString))
    {
        conn.Open();
        using (SQLiteTransaction trans = conn.BeginTransaction())
        {
            using (SQLiteCommand command = conn.CreateCommand())
            {
                command.CommandText = "INSERT OR IGNORE INTO [MY_TABLE] (col1, col2) VALUES (?,?)";
                command.Parameters.Add(this.col1Param);
                command.Parameters.Add(this.col2Param);
                foreach (Data o in this.dataTemp)
                {
                    this.col1Param.Value = o.Col1Prop;
                    this.col2Param.Value = o.Col2Prop;
                    command.ExecuteNonQuery();
                }
            }
            this.TryHandleCommit(trans);
        }
        conn.Close();
    }
}
I now employ the following gimmick to get the thing to eventually work:
private void TryHandleCommit(SQLiteTransaction trans)
{
    try
    {
        trans.Commit();
    }
    catch (Exception e)
    {
        Console.WriteLine("Trying again...");
        this.TryHandleCommit(trans);
    }
}
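A bounded version of that retry (sketch only: it caps the attempts and backs off briefly, but it does not fix the underlying lock) would look like:

private void TryHandleCommit(SQLiteTransaction trans)
{
    const int maxAttempts = 5;
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            trans.Commit();
            return;
        }
        catch (SQLiteException)
        {
            if (attempt == maxAttempts)
                throw; // give up and surface the error
            Console.WriteLine("Trying again...");
            System.Threading.Thread.Sleep(100 * attempt); // crude backoff
        }
    }
}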
I create my DB like so:
public DataBase(String path)
{
    // build connection string
    SQLiteConnectionStringBuilder connString = new SQLiteConnectionStringBuilder();
    connString.DataSource = path;
    connString.Version = 3;
    connString.DefaultTimeout = 5;
    connString.JournalMode = SQLiteJournalModeEnum.Persist;
    connString.UseUTF16Encoding = true;
    using (connection = new SQLiteConnection(connString.ToString()))
    {
        // check for existence of db
        FileInfo f = new FileInfo(path);
        if (!f.Exists) // build new blank db
        {
            SQLiteConnection.CreateFile(path);
            connection.Open();
            using (SQLiteTransaction trans = connection.BeginTransaction())
            {
                using (SQLiteCommand command = connection.CreateCommand())
                {
                    command.CommandText = DataBase.CREATE_MATCHES;
                    command.ExecuteNonQuery();
                    command.CommandText = DataBase.CREATE_STRING_DATA;
                    command.ExecuteNonQuery();
                    // TODO add logging
                }
                trans.Commit();
            }
            connection.Close();
        }
    }
}
I then export the connection string and use it to obtain new connections in different parts of the program.
At seemingly random intervals, though at far too great a rate to ignore or otherwise work around, I get an unhandled SQLiteException: Database file is locked. This occurs when I attempt to commit the transaction; no errors seem to occur before then. It does not always happen. Sometimes the whole thing runs without a hitch.
No reads are being performed on these files before the commits finish.
I have the very latest SQLite binary.
I'm compiling for .NET 2.0.
I'm using VS 2008.
The db is a local file.
All of this activity is encapsulated within one thread / process.
Virus protection is off (though I think that was only relevant if you were connecting over a network?).
As per Scotsman's post I have implemented the following changes:
Journal Mode set to Persist
DB files stored in C:\Docs + Settings\ApplicationData via System.Windows.Forms.Application.AppData windows call
No inner exception
Witnessed on two distinct machines (albeit very similar hardware and software)
Have been running Process Monitor - no extraneous processes are attaching themselves to the DB files - the problem is definitely in my code...
Does anyone have any idea whats going on here?
I know I just dropped a whole mess of code, but I've been trying to figure this out for way too long. My thanks to anyone who makes it to the end of this question!
brian
UPDATES:
Thanks for the suggestions so far! I've implemented many of the suggested changes. I feel that we are getting closer to the answer...however...
The code above technically works, but it is non-deterministic! It is not guaranteed to do anything aside from spinning in neutral forever. In practice it seems to work somewhere between the 1st and 10th iteration. If I batch my commits at a reasonable interval, the damage is mitigated, but I really do not want to leave things in this state...
More suggestions welcome!
It looks like you failed to link the command with the transaction you've created.
Instead of:
using (SQLiteCommand command = conn.CreateCommand())
You should use:
using (SQLiteCommand command = new SQLiteCommand("<INSERT statement here>", conn, trans))
Or you can set its Transaction property after its construction.
While we are at it - your handling of failures is incorrect:
The command's ExecuteNonQuery method can also fail and you are not really protected. You should change the code to something like:
public void Commit()
{
    using (SQLiteConnection conn = new SQLiteConnection(this.connString))
    {
        conn.Open();
        SQLiteTransaction trans = conn.BeginTransaction();
        try
        {
            using (SQLiteCommand command = conn.CreateCommand())
            {
                command.Transaction = trans; // Now the command is linked to the transaction and doesn't try to create a new one (which is probably why your database gets locked)
                command.CommandText = "INSERT OR IGNORE INTO [MY_TABLE] (col1, col2) VALUES (?,?)";
                command.Parameters.Add(this.col1Param);
                command.Parameters.Add(this.col2Param);
                foreach (Data o in this.dataTemp)
                {
                    this.col1Param.Value = o.Col1Prop;
                    this.col2Param.Value = o.Col2Prop;
                    command.ExecuteNonQuery();
                }
            }
            trans.Commit();
        }
        catch (SQLiteException ex)
        {
            // You need to roll back in case something went wrong in command.ExecuteNonQuery() ...
            trans.Rollback();
            throw;
        }
    }
}
Another thing is that you don't need to cache anything in memory. You can depend on SQLite's journaling mechanism to store incomplete transaction state.
Run Sysinternals Process Monitor and filter on the filename while running your program, to rule out any other process doing anything to it and to see exactly what your program is doing to the file. A long shot, but it might give a clue.
We had a very similar problem using nested transactions with the TransactionScope class. We thought all database actions occurred on the same thread... however, we were caught out by the transaction mechanism, more specifically the ambient transaction.
Basically there was a transaction higher up the chain which, by the magic of ADO, the connection automatically enlisted in. The result was that, even though we thought we were writing to the database on a single thread, the write didn't really happen until the topmost transaction was committed. At this 'indeterminate' point the database was written to, causing it to be locked outside of our control.
The solution was to ensure that the sqlite database did not directly take part in the ambient transaction by ensuring we used something like:
using (TransactionScope scope = new TransactionScope(TransactionScopeOption.RequiresNew))
{
    ...
    scope.Complete();
}
Things to watch for:
don't use connections across multiple threads/processes.
I've seen it happen when a virus scanner would detect changes to the file and try to scan it. It would lock the file for a short interval and cause havoc.
I started facing this same problem today: I'm studying asp.net mvc, building my first application completely from scratch. Sometimes, when I'd write to the database, I'd get the same exception, saying the database file was locked.
I found it really strange, since I was completely sure that there was just one connection open at that time (based on process explorer's listing of active file handles).
I've also built the whole data access layer from scratch, using System.Data.SQLite .Net provider, and, when I planned it, I took special care with connections and transactions, in order to ensure no connection or transaction was left hanging around.
The tricky part was that setting a breakpoint on ExecuteNonQuery() command and running the application in debug mode would make the error disappear!
Googling, I found something interesting on this site: http://www.softperfect.com/board/read.php?8,5775. There, someone replied to the thread suggesting the author put the database path on the anti-virus ignore list.
I added the database file to the ignore list of my anti-virus (Microsoft Security Essentials) and it solved my problem. No more database locked errors!
Is your database file on the same machine as the app, or is it stored on a server?
You should create a new connection in every thread. I would simplify the creation of a connection and use the same form everywhere: connection = new SQLiteConnection(connString.ToString());
Also use a database file on the same machine as the app, and test again.
Why the two different ways of creating a connection?
These guys were having similar problems (mostly, it appears, with the journal file being locked, maybe TortoiseSVN interactions... check the referenced articles).
They came up with a set of recommendations (correct directories, changing journaling types from delete to persist, etc.): http://sqlite.phxsoftware.com/forums/p/689/5445.aspx#5445
The journal mode options are discussed here: http://www.sqlite.org/pragma.html. You could try TRUNCATE.
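For instance, with System.Data.SQLite you could switch the mode on an open connection via a PRAGMA (a sketch; assumes a SQLiteConnection named connection):

using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = "PRAGMA journal_mode = TRUNCATE;";
    cmd.ExecuteNonQuery();
}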
Is there a stack trace in the exception from SQLite?
You indicate you "batch my commits at a reasonable interval". What is the interval?
I would always use a Connection, Transaction and Command in a using block. You did in your first code listing, but not in your third (creating the tables). I suggest you do that there too, because (who knows?) maybe the commands that create the table somehow continue to lock the file. Long shot... but worth a shot?
Do you have Google Desktop Search (or another file indexer) running? As previously mentioned, Sysinternals Process Monitor can help you track it down.
Also, what is the filename of the database? From PerformanceTuningWindows:
Be VERY, VERY careful what you name your database, especially the extension
For example, if you give all your databases the extension .sdb (SQLite Database, nice name, hey? I thought so when I chose it, anyway...) you discover that the SDB extension is already associated with APPFIX PACKAGES.
Now, here is the cute part: APPFIX is an executable/package that Windows XP recognizes, and it will (emphasis mine) ADD THE DATABASE TO THE SYSTEM RESTORE FUNCTIONALITY.
This means, stay with me here, every time you write ANYTHING to the database, Windows XP thinks a bloody executable has changed and copies your ENTIRE 800 meg database to the system restore directory....
I recommend something like DB or DAT.
While the lock is reported on the COMMIT, the lock is on the INSERT/UPDATE command. Check for record locks not being released earlier in your code.

Unable to enlist in a distributed transaction with NHibernate

I'm seeing a problem in a unit test where Oracle throws an exception with the message "Unable to enlist in a distributed transaction". We're using ODP.NET and NHibernate. The issue comes up after making a certain number of commits to the database inside nested transactions. Annoyingly, this fails on the continuous integration server (Windows Server 2003 R2 SP1) and not on my dev machine (XP SP2).
This is a small(ish) repro of the issue:
using (new TransactionScope())
{
    for (int j = 0; j < 15; j++)
    {
        using (var transactionScope = new TransactionScope(TransactionScopeOption.Required))
        using (var session = sessionFactory.OpenSession())
        {
            for (int i = 0; i < 200; i++)
            {
                var obj = [create new NHibernate mapped obj]
                session.Save(obj);
            }
            session.Flush();
            transactionScope.Complete();
        }
    }
}
The connection string we're using is:
Data Source=server;User Id=user;Password=password;Enlist=true;
Obviously this looks like a heavy-handed thing to be doing, but the product code is more complex (the outer transaction loop and the inner transaction loop are far apart).
On the build server it reliably bombs out on the fifth iteration of the outer loop (j). Seeing as it passes on my local machine, I'm wondering if this is hitting some kind of configured limit on transactions or connections?
Anyone got any hunches I can try out? The obvious way to fix it is to change the code to better handle this situation, but I'd just like to understand why it works on one machine and not on another. Thanks!
It seems to me this has to do with your Oracle database configuration.
Do you use the same database server in both environments? (I assume not.)
Which version of the database do you use? (I'll assume 10g.)
Here is what I could find based on these assumptions:
Check Tuning Microsoft Transaction Server Performance. The default value for the ORAMTS_NET_CACHE_MAXFREE parameter is set to 5, which may be related to your problem. Read the whole page before taking any action, though (you could try to increase the SESSIONS and PROCESSES parameters too).
You could enable tracing on Oracle MTS to see what is really happening there.
If still stuck, I guess you could enable tracing on MSDTC to try to get more insight.
