I've written a small tool for archiving data from my Entity Framework code-first driven database.
I'm testing it thoroughly, and I've reached the point of trying it with large amounts of data, which is where some problems start to appear. For example, I sometimes get timeouts or exceptions like this:
Deadlock found when trying to get lock; try restarting transaction.
I know what transactions are, and I assume Entity Framework creates one for all the changes in a DbContext, so that if any of them (or the whole thing) fails when SaveChanges() is called, nothing is actually changed. (Short side question: can I then simply run SaveChanges() again?)
What I want to know is this: since I need to delete different batches of information throughout my database (after exporting them), I'm constantly creating a new DbContext for each of those batches.
Should I create transactions manually for every batch and commit them all at once at the very end?
I'm studying informatics and am learning about transactional information systems in one of my courses. Is it possible with Entity Framework to create a meta-transaction around all my single transactions when deleting batches of data, so that the data spread throughout the database is only really deleted once everything has worked, like this:
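Something like this rough sketch is what I have in mind. I'm only guessing that TransactionScope is the right tool here, and MyDbContext, GetBatches() and DeleteBatch() are placeholders for my real code (I also realise that spanning several connections in one scope might escalate to a distributed transaction, depending on the provider):

using System.Transactions;

// MyDbContext, GetBatches() and DeleteBatch() stand in for my real code.
public void ArchiveAndDelete()
{
    // One ambient "meta" transaction around all the per-batch contexts,
    // so nothing is really committed until Complete() is called.
    using (var scope = new TransactionScope())
    {
        foreach (var batch in GetBatches())
        {
            using (var db = new MyDbContext())
            {
                DeleteBatch(db, batch);  // per-batch delete logic
                db.SaveChanges();        // should still be provisional here
            }
        }

        // Only now should all the deletes become permanent; if anything
        // throws before this line, everything rolls back.
        scope.Complete();
    }
}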
Or is there a better way to solve the entire thing?
I've got Entity Framework 4.1 with .NET 4.5 running on ASP.NET on Windows Server 2008 R2. I'm using EF code-first to connect to SQL Server 2008 R2 and executing a fairly complex LINQ query that results in just a Count().
I've reproduced the problem on two different web servers but only one database (production of course). It recently started happening with no application, database structure, or server changes on the web or database side.
My problem is that executing the query under certain circumstances takes a ridiculous amount of time (close to 4 minutes). I can take the actual query, pulled from SQL Profiler, and execute it in SSMS in about 1 second. This is consistent and reproducible for me, but if I change the value of one of the parameters (a "Date after 2015-01-22" parameter) to something earlier, like 2015-01-01, or later, like 2015-02-01, it works fine in EF. But I put it back to 2015-01-22 and it's slow again. I can repeat this over and over again.
I can then run a similar but unrelated query in EF, then come back to the original, and it runs fine this time - same exact query as before. But if I open a new browser, the cycle starts over again. That part also makes no sense - we're not doing anything to retain the data context in a user session, so I have no clue whatsoever why that comes into play.
But this all tells me that the data itself is fine.
In Profiler, when the query runs properly, it takes about a second or two, and shows about 2,000,000 in reads and about 2,000 in CPU. When it runs slowly, it takes 3.5 minutes, and the values are 300,000,000 and 200,000 - so reads are about 150 times higher and CPU is 100 times higher. Again, for the identical SQL statement.
Any suggestions on what EF might be doing differently that wouldn't show up in the query text? Is there some kind of hidden connection property which might cause a different execution plan in certain circumstances?
EDIT
The query that EF builds is one of the ones where it embeds the parameter value directly in the query text instead of passing it as a SQL parameter:
exec sp_executesql
N'SELECT [GroupBy1].[A1] AS [C1]
FROM (
SELECT COUNT(1) AS [A1]
...
AND ([Extent1].[Added_Time] >= convert(datetime2, ''2015-01-22 00:00:00.0000000'', 121))
...
) AS [GroupBy1]'
EDIT
I'm not adding this as an answer since it doesn't actually address the underlying issue, but this did end up getting resolved by rebuilding indexes and recomputing statistics. That hadn't been done in longer than usual, and it seems to have cleared up whatever caused the issue.
I'll keep reading up on some of the links here in case this happens again, but since it's all working now and unreproducible, I don't know if I'll ever know for sure exactly what it was doing.
Thanks for all the ideas.
I recently had a very similar scenario: a query would run very fast when executed directly against the database, but had terrible performance through EF (version 5, in my case). It wasn't a network issue; the difference was from 4 ms to 10 minutes.
The problem ended up being a mapping problem. I had a column mapped to NVARCHAR, while it was VARCHAR in the database. That seems harmless, but it resulted in an implicit conversion in the database, which completely ruined the performance.
I'm not entirely sure why this happens, but from the tests I ran, it caused the database to do an Index Scan instead of an Index Seek, and the two are very different performance-wise.
I blogged about this here (disclaimer: it is in Portuguese), but later I found that Jimmy Bogard described this exact problem in a post from 2012; I suggest you check it out.
Since you do have a convert in your query, I would say start there. Double-check all your column mappings and look for differences between your table's columns and your entity's properties. Avoid implicit conversions in your query.
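As a rough illustration of the kind of mapping fix I mean (the entity and property names here are made up), telling EF the column is non-Unicode keeps the generated parameter as varchar, so the index can still be seeked:

using System.Data.Entity;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }   // varchar(100) in the database
}

public class MyContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Map the property as varchar so EF does not send an nvarchar
        // parameter and force an implicit conversion on the column.
        // (.IsUnicode(false) achieves the same thing.)
        modelBuilder.Entity<Customer>()
            .Property(c => c.Name)
            .HasColumnType("varchar")
            .HasMaxLength(100);
    }
}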
If you can, check your execution plan for inconsistencies, and watch for the yellow warning triangle that can indicate problems like this kind of implicit conversion.
I hope this helps you somehow; it was a really difficult problem for us to track down, but it made sense in the end.
Just to put this out there since it has not been addressed as a possibility:
Given that you are using Entity Framework (EF), if you are using Lazy Loading of entities, then EF requires Multiple Active Result Sets (MARS) to be enabled via the connection string. While it might seem entirely unrelated, MARS does sometimes produce this exact behavior of something running quickly in SSMS but horribly slow (seconds become several minutes) via EF.
One way to test this is to turn off Lazy Loading and either remove MultipleActiveResultSets=True; (the default is "false") or at least change it to be MultipleActiveResultSets=False;.
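For example, something along these lines should be enough for the test (MyDbContext and the connection string name are just placeholders):

using System.Data.Entity;

public class MyDbContext : DbContext
{
    // "MyConnection" should point at a connection string WITHOUT
    // MultipleActiveResultSets=True (or with it explicitly set to False).
    public MyDbContext() : base("name=MyConnection")
    {
        // Turn off lazy loading (and proxies) so EF no longer needs MARS.
        this.Configuration.LazyLoadingEnabled = false;
        this.Configuration.ProxyCreationEnabled = false;
    }
}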
As far as I know, there is unfortunately no work-around or fix (currently) for this behavior.
Here is an instance of this issue: Same query with the same query plan takes ~10x longer when executed from ADO.NET vs. SMSS
There is an excellent article about Entity Framework performance considerations here.
I would like to draw your attention to the section on Cold vs. Warm Query Execution:
The very first time any query is made against a given model, the Entity Framework does a lot of work behind the scenes to load and validate the model. We frequently refer to this first query as a "cold" query. Further queries against an already loaded model are known as "warm" queries, and are much faster.
During LINQ query execution, the "Metadata loading" step has a high impact on performance for cold query execution. However, once loaded, the metadata is cached and future queries run much faster. The metadata is cached outside of the DbContext and remains reusable for as long as the application pool lives.
In order to improve performance, consider the following actions:
use pre-generated views
use query plan caching
use no tracking queries (only for read-only access; see the sketch below)
create a native image of Entity Framework (only relevant if using EF 6 or later)
All those points are well documented in the link provided above. You can also find more information about creating a native image of Entity Framework here.
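As a small illustration of the "no tracking queries" point (the BlogContext and Post types here are invented for the example):

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public DateTime PublishedOn { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts { get; set; }
}

public static class ReadOnlyQueries
{
    public static List<string> RecentTitles()
    {
        using (var db = new BlogContext())
        {
            // AsNoTracking() skips the change tracker, which is a cheap win
            // for read-only queries over larger result sets.
            return db.Posts
                .AsNoTracking()
                .Where(p => p.PublishedOn >= new DateTime(2015, 1, 1))
                .Select(p => p.Title)
                .ToList();
        }
    }
}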
I don't have a specific answer as to WHY this is happening, but it certainly looks related to how the query is handled rather than to the query itself. Since you don't have any issues running the same generated query from SSMS, the query itself isn't the problem.
A workaround you can try: a stored procedure. EF handles them very well, and it's an ideal way to deal with potentially complicated or expensive queries.
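A rough sketch of what that could look like from your context (the procedure name and parameter are made up here, but Database.SqlQuery works fine for this kind of read):

using System;
using System.Data.SqlClient;
using System.Linq;

// ... inside your data access code, using your existing context type:
using (var db = new MyDbContext())
{
    var addedAfter = new SqlParameter("@AddedAfter", new DateTime(2015, 1, 22));

    // Let SQL Server own the query plan instead of EF's generated SQL.
    int count = db.Database
        .SqlQuery<int>("EXEC dbo.usp_CountRecentItems @AddedAfter", addedAfter)
        .Single();
}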
Seeing that you are using Entity Framework 4.1, I would suggest you upgrade to Entity Framework 6.
There have been a lot of performance improvements, and EF 6 is much faster than EF 4.1.
The MSDN article about Entity Framework performance considerations mentioned in my other answer also has a comparison between EF 4.1 and EF 6.
There might be a bit of refactoring needed as a result, but the improvement in performance should be worth it (and that would reduce the technical debt at the same time).
I'm working on an application which will be updating multiple databases (SQL Server 2008 and Oracle 11g). TransactionScope seemed like the logical way to ensure updates were always committed correctly, but it seems that installing MSDTC is not going to be an option. In the future, it's also possible this application could be using data sources which don't support distributed transactions.
I've spent many hours trying to come up with another solution but nothing seems like it will work. All searches point back to TransactionScope and distributed transactions.
The application is written in C#, using the Entity Framework. Anyone have any suggestions, which won't require being escalated to distributed transactions? Here's a list of ideas I've had which have gone nowhere.
+TransactionScope: Can't use MSDTC. Future data sources may not support distributed transactions.
+Manually track and rollback transactions: I haven't found a good way to do this within Entity Framework.
+Queue/log failures so they can be re-committed by another process: Can't come up with a good way to store the failed commits generically. Also need to make sure the re-commit doesn't overwrite newer data.
@ThinkJet: that related link is an interesting opinion. In my case a small failure, like what is described, would not be a huge deal. We currently have other stuff in place which tries to keep all these systems in sync (not always successfully). If one or two transactions did fail, they should be picked up by those processes.
After reading through these comments, I might try to have this library write the data to its own database and then sync those changes to the other sources so that the other applications can see them. It would cause a slight delay in some updates, but even that would be better than what we have now.
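In very rough terms I'm picturing something like this (all the names here are invented, and the apply steps would have to be idempotent or tracked per target so a retry can't double-apply a change):

using System;
using System.Data.Entity;
using System.Linq;

public class PendingChange
{
    public int Id { get; set; }
    public string EntityType { get; set; }
    public string Payload { get; set; }      // e.g. a serialized entity
    public DateTime CreatedOn { get; set; }
    public bool Applied { get; set; }
}

public class LocalContext : DbContext
{
    public DbSet<PendingChange> PendingChanges { get; set; }
}

public class SyncWorker
{
    public void SyncOnce()
    {
        using (var local = new LocalContext())
        {
            var pending = local.PendingChanges
                .Where(c => !c.Applied)
                .OrderBy(c => c.CreatedOn)
                .ToList();

            foreach (var change in pending)
            {
                // Each target commits on its own; if one fails, the change
                // stays unapplied and gets retried on the next pass.
                // (The Apply* steps need to be idempotent so a retry
                // can't double-apply a change that partially succeeded.)
                ApplyToSqlServer(change);
                ApplyToOracle(change);

                change.Applied = true;
                local.SaveChanges();
            }
        }
    }

    private void ApplyToSqlServer(PendingChange change) { /* target-specific code */ }
    private void ApplyToOracle(PendingChange change) { /* target-specific code */ }
}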
This is quite a long one, but I'd very much appreciate your thoughts and suggestions.
We are busy rebuilding a legacy system that was written in PHP and MySQL and replacing its components with ASP.NET MVC in C# and SQL Server. The legacy architecture leaves much to be desired: there is a serious problem with spaghetti code, no referential integrity in the DB, unused code and database fields, and just generally bad coding.
As much as I'd love to, we can't just rip out all of the old code and replace it. The company needs to stay functional during the development process, so we will need to build new functionality while using the old databases to ensure that their data is accurate at all times. The level of data accuracy isn't real-time, but if we had 2 systems, they would have to be in sync 100% of the time. The old system uses 6 different MySQL databases, all on the same server, running Linux. We will be running Windows 2008 R2 on the new server for the new system and we are planning to use the latest version of SQL Server.
The problem I'm having to solve is: I need to somehow map all of these databases into a consolidated model that we can use through C# to develop the new system on. Once we have moved all the functionality over to C#, we need to port the data into a DB that matches our code model. This DB will be running on SQL Server. I'm not too worried about the migration just yet; my current issue is finding an ORM tool that will allow me to map these 6 MySQL databases into a single, well planned out and designed model that we can use for the new development.
The new model might have additional fields that we would have to store in a new MySQL database until we port the data across at some stage, so the ORM should support easily building entities that span multiple tables and databases.
Is what I'm trying to do possible? Is it viable in terms of effort? Is there an ORM that can do all of this? And is there another way to maintain the company's operational capacity while actively developing the system?
I have looked at these ORM options:
SubSonic (great, but I think too lightweight for what we are trying to do)
Entity Framework (looks like I might be able to use this if I use very dirty models with tons of stored procedures for inserts, updates and deletes)
NHibernate (the client does not want us to use this due to bad experiences in the past)
LLBLGen (seems like it can do what we need it to, but long term support could be a concern with the client)
Anything else I should look at? Is there a different approach I could try?
ORMs aren't designed to solve the problem you have. That said, a quality ORM will get you some percentage of the way toward a solution.
NHibernate is the easy choice. LLBLGen would be my second choice. I wouldn't even bother with EF or SubSonic as they are very feature poor compared to the other two and you need decent feature support in your scenario.
You'll likely have to invest a lot of time in writing custom code around your migration requirements. Your use case is not a standard, well traveled path.
For Entity Framework: if you're prepared to maintain one complete set of stored procedures with a static interface (i.e. the same signatures), you could implement them all in Transact-SQL on the SQL Server box, with linked servers (to the MySQL farm).
When the time comes, you could migrate the data into SQL Server and update your stored procedures.
Basically, design a nice model with nice stored procedures, and as a temporary solution implement any ugliness inside the stored procedures. Once MySQL is out of the way, you can replace the stored procedures with better ones.
SQL Server has a tendency to retrieve the entire remote table when you're running queries against a linked server, so if performance is a concern it may end up that all your stored procedures are wrappers around OPENROWSET (see Example A for running a query on a remote server).
Can anyone give me a bit of a steer in the right direction?
I'm currently trying to write a web interface using ASP.NET MVC 3 which provides a single view over two systems with backend databases in MSSQL and DB2.
Being new to Entity Framework, I've attempted to connect to each of the databases in isolation and can pull data back successfully.
The next logical step is to attempt to join the databases together to gain some leverage over the data. This is where I've hit a bit of a stumbling block.
Looking at Entity Framework, it doesn't appear to support cross-database joins when the databases sit on different physical servers. Have I missed something obvious here? I can't seem to find any reference to this.
As a fallback option, I thought about using linked servers and wrapping the SQL in a view, which would theoretically allow me to run the types of queries I need. Has anyone done this?
I thought about using linked servers and wrapping the SQL in a view, which would theoretically allow me to run the types of queries I need. Has anyone done this?
I have done this approach many times. I have only needed one or two tables from the other database, so creating the views is easy.
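On the EF side, the view can then be mapped much like a read-only table. A rough sketch, assuming a SQL Server view called vw_RemoteOrders that wraps the linked-server query (the view and entity names are made up):

using System.Data.Entity;

// Shape of the rows exposed by the cross-server view.
public class RemoteOrder
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
}

public class ReportingContext : DbContext
{
    public DbSet<RemoteOrder> RemoteOrders { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Point the entity at the view; treat it as read-only and
        // don't call SaveChanges() against it.
        modelBuilder.Entity<RemoteOrder>().ToTable("vw_RemoteOrders");
    }
}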