EF holding onto old values in some cases - c#

When using Entity Framework, reading from some tables/views seems to return old data, by which I mean data that an external process has since changed.
When running my code, I can see (using the profiler) EF build and run a SQL query to retrieve the data, but the old values still end up in the object.
What is more confusing to me is that this does not happen for all tables/views, but for the tables/views it does affect, it is consistent.
If I restart IIS I get the correct result, so clearly the values are being held somewhere.
What is causing this selective caching of data, and how do I influence it?

This is normal when you use the same instance of ObjectContext for too long. Keep its lifetime as short as possible; an instance per request should be fine.
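As a minimal sketch of that pattern (the MyEntities context, Customers set, and Customer type are placeholder names, not from the question), create the context per unit of work and dispose it right away, so each request reads fresh values instead of previously tracked entities:

using System.Linq;

public class CustomerRepository
{
    // Hypothetical repository method: the context lives only for this call,
    // so stale entities tracked by an earlier context cannot be handed back.
    public Customer GetCustomer(int id)
    {
        using (var context = new MyEntities())
        {
            return context.Customers.Single(c => c.Id == id);
        }
    }
}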

Related

SQL CLR Web Service Call: Limiting Overhead

I'm attempting to improve query performance for an application and I'm logically stuck.
So the application is proprietary and thus we're unable to alter application-side code. We have, however, received permission to work with the underlying database (surprisingly enough). The application calls a SQL Server database, so the current idea we're running with is to create a view with the same name as the table and rename the underlying table. When the application hits the view, the view calls one of two SQL CLR functions, which both do nothing more than call a web service we've put together. The web service performs all the logic, and contains an API call to an external, proprietary API that performs some additional logic and then returns the result.
This all works, however, we're having serious performance issues when scaling up to large data sets (100,000+ rows). The pretty clear source of this is the fact we're having to work on one row at a time with the web service, which includes the API call, which makes for a lot of latency overhead.
The obvious solution to this is to figure out a way to limit the number of times that the web service has to be hit per query, but this is where I'm stuck. I've read about a few different ways out there for potentially handling scenarios like this, but as a total database novice I'm having difficulty getting a grasp on what would be appropriate in this situation.
If there are any ideas or recommendations out there, I'd be very appreciative.
There are probably a few things to look at here:
Is your SQLCLR TVF streaming the results out? That is, are you adding to a collection and returning that collection at the end, or are you releasing each row as it is completed, either with yield return or by building out a full Enumerator? If you are not streaming, you should, as it allows the rows to be consumed immediately instead of waiting for the entire process to finish.
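A minimal sketch of the streaming shape, assuming a SQLCLR function written in C# (the column layout, the Tuple element type, and the CallWebService helper are placeholders for whatever your function actually returns):

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class StreamingTvf
{
    // Each element is yielded as soon as it is produced, so SQL Server can start
    // consuming rows before the web service has been called for every item.
    [SqlFunction(FillRowMethodName = "FillRow",
                 TableDefinition = "Id INT, Value NVARCHAR(4000)")]
    public static IEnumerable GetRows()
    {
        foreach (var item in CallWebService())   // placeholder for the real per-row call
        {
            yield return item;
        }
    }

    // Maps one yielded element onto the declared output columns.
    public static void FillRow(object obj, out SqlInt32 id, out SqlString value)
    {
        var item = (Tuple<int, string>)obj;
        id = item.Item1;
        value = item.Item2;
    }

    private static IEnumerable<Tuple<int, string>> CallWebService()
    {
        // placeholder data source standing in for the web-service call
        yield return Tuple.Create(1, "example");
    }
}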
Since you are replacing a Table with a View that is sourced by a TVF, you are naturally going to have performance degradation since TVFs:
don't report their actual number of rows. T-SQL Multi-statement TVFs always appear to return 1 row, and SQLCLR TVFs always appear to return 1000 rows.
don't maintain column statistics. When selecting from a Table, SQL Server will automatically create statistics for columns referenced in WHERE and JOIN conditions.
Because of these two things, the Query Optimizer is not going to have an easy time generating an appropriate plan if the actual number of rows is 100k.
How many SELECTs, etc. are hitting this View concurrently? Since the View is hitting the same URI each time, you are bound by the concurrent connection limit imposed by ServicePointManager (ServicePointManager.DefaultConnectionLimit), and the default limit is a whopping 2! Meaning, all additional requests to that URI, while there are already 2 active/open HttpWebRequests, will wait in line, patiently. You can increase this by setting the .ServicePoint.ConnectionLimit property of the HttpWebRequest object.
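For example, a minimal sketch of raising that limit per request (the connection limit of 20 is an arbitrary value to tune for your own concurrency):

using System.Net;

static class ServiceCaller
{
    // Raise the per-URI connection limit before issuing the request, so concurrent
    // calls are not throttled to ServicePointManager's default of 2.
    public static HttpWebResponse Call(string uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.ServicePoint.ConnectionLimit = 20;   // or set ServicePointManager.DefaultConnectionLimit once at startup
        return (HttpWebResponse)request.GetResponse();
    }
}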
How often does the underlying data change? Since you switched to a View that doesn't take any parameters, you are always returning everything. This opens the door for doing some caching, and there are (at least) two options:
cache the data in the Web Service, and if it hasn't reached a particular time limit, return the cached data, else get fresh data, cache it, and return it (a sketch of this follows below).
go back to using a real Table. Create a SQL Server Agent job that will, every few minutes (or maybe longer if the data doesn't change that often): start a transaction, delete the current data, repopulate via the SQLCLR TVF, and commit the transaction. This requires that extra piece of the SQL Agent job, but you are then back to having more accurate statistics!!
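For the first option, a minimal sketch of a time-based cache inside the web service (the Result type, the FetchFromApi call, and the five-minute window are all placeholders):

using System;
using System.Collections.Generic;

public static class ResultCache
{
    private static readonly object _lock = new object();
    private static List<Result> _cached;
    private static DateTime _loadedAtUtc;
    private static readonly TimeSpan MaxAge = TimeSpan.FromMinutes(5);  // tune to how often the data changes

    public static List<Result> Get()
    {
        lock (_lock)
        {
            if (_cached == null || DateTime.UtcNow - _loadedAtUtc > MaxAge)
            {
                _cached = FetchFromApi();        // the expensive external call happens at most once per window
                _loadedAtUtc = DateTime.UtcNow;
            }
            return _cached;
        }
    }

    private static List<Result> FetchFromApi()
    {
        // placeholder for the real API/web-service call
        return new List<Result>();
    }
}

public class Result { /* placeholder row shape */ }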
For more info on working with SQLCLR in general, please visit: SQLCLR Info

Very slow T-SQL stored procedure sped up by dropping and recreating

I have a simple stored procedure in T-SQL that is instant when run from SQL Server Management Studio, and has a simple execution plan. It's used in a C# web front-end, where it is usually quick, but occasionally seems to get itself into a state where it sits there and times out. It then does this consistently from any web server. The only way to fix it that I've found is to drop and recreate it. It only happens with a single stored procedure, out of a couple of hundred similar procedures that are used in the application.
I’m looking for an answer that’s better than making a service to test it every n minutes and dropping and recreating on timeout.
As pointed out by other responses, the reasons could be many, varying from execution plan, to the actual SP code, to anything else. However, in my past experience, I faced a similar problem due to 'parameter sniffing'. Google for it and take a look, it might help. Basically, you should use local variables in your SP instead of the parameters passed in.
Not sure if my situation is too uncommon to be useful to others (it involved the use of table variables inside the stored proc), but here is the story anyway.
I was working on an issue where a stored proc would take 10 seconds in most cases, but 3-4 minutes every now and then. After a little digging around, I found a pattern in the issue:
This being a stored proc that takes in a start date and an end date, if I ran it for a year's worth of data (which is what people normally do), it ran in 10 seconds. However, when the query plan cache was cleared out and someone ran it for a single day (an uncommon use case), all further calls for a one-year range would take 3-4 minutes, until I did a DBCC FREEPROCCACHE.
The following two things were what fixed the problem:
My first suspect was parameter sniffing. I fixed it immediately using the local-variable approach. This, however, improved performance only by a small percentage (<10%).
In a clutching-at-straws approach, I changed the table variables that the original developer had used in this stored proc to temp tables. This was what finally fixed the issue. Now that I know that this was the problem, I am doing some reading online, and have come across a few links such as
http://www.sqlbadpractices.com/using-table-variable-for-large-table-vs-temporary-table/
which seem to correspond with the issue I am seeing.
Happy coding!!
It's hard to say for sure without seeing SP code.
Some suggestions.
SQL Server by default reuses the execution plan for a stored procedure. The plan is generated upon the first execution. That may cause a problem: for example, the first time you provide input with very high selectivity, SQL Server generates the plan with that in mind; the next time you pass low-selectivity input, the SP reuses the old plan, causing very slow execution.
Having different execution paths in the SP causes the same problem.
Try creating the procedure with the WITH RECOMPILE option to prevent plan caching.
Hope that helps.
Run SQL Profiler and execute it from the web site until it happens again. When it pauses or times out, check to see what is happening on the SQL Server itself.
There are lots of possibilities here depending on what the s'proc actually does. For example, if it is inserting records then you may have issues where the database server needs to expand the database and/or log file size to accept new data. If it's happening on the log file and you have slow drives or are nearing the max of your drive space then it could timeout.
If it's a select, then those tables might be locked for a period of time due to other inserts happening... Or it might be reusing a bad execution plan.
The drop/recreate dance may only be delaying the execution to the point that the SQL Server can catch up, or it might be causing a recompile.
My original thought was that it was an index, but on further reflection I don't think that dropping and recreating the stored proc would help.
It is most probably your cached execution plan that is causing this.
Try using DBCC FREEPROCCACHE to clear your plan cache the next time this happens. Read more here: http://msdn.microsoft.com/en-us/library/ms174283.aspx
Even so, this is a reactive step; it does not really solve the issue.
I suggest you execute the procedure in SSMS and check out the actual execution plan to figure out what is causing the delay (in the menu, go to [Query] and then [Include Actual Execution Plan]).
Let me just suggest that this might be unrelated to the procedure itself, but to the actual operation you are trying to do on the database.
I'm no MS SQL expert, but I wouldn't be surprised if it behaves similarly to Oracle when two concurrent transactions try to delete the same row: the transaction that reaches the deletion first locks the row, and the second transaction is then blocked until the first one either commits or rolls back. If that happened inside your procedure, it might appear "stuck" (until the locking transaction finishes).
Do you have any long-running transactions that might lock rows that your procedure is accessing?

Can a Snapshot transaction fail and only partially commit in a TransactionScope?

Greetings
I stumbled onto a problem today that seems sort of impossible to me, but it's happening... I'm calling some database code in C# that looks something like this:
using (var tran = MyDataLayer.Transaction())
{
    MyDataLayer.ExecSproc(new SprocTheFirst(arg1, arg2));
    MyDataLayer.CallSomethingThatEventuallyDoesLinqToSql(arg1, argEtc);
    tran.Commit();
}
I've simplified this a bit for posting, but what's going on is that MyDataLayer.Transaction() makes a TransactionScope with the IsolationLevel set to Snapshot and TransactionScopeOption set to Required. This code gets called hundreds of times a day, and almost always works perfectly. However, after reviewing some data I discovered there are a handful of records created by "SprocTheFirst" but no corresponding data from "CallSomethingThatEventuallyDoesLinqToSql". The only way that records should exist in the tables I'm looking at is from SprocTheFirst, and it's only ever called in this one function, so if it's called and succeeds then I would expect CallSomethingThatEventuallyDoesLinqToSql to be called and succeed as well, because it's all in the same TransactionScope. It's theoretically possible that some other dev mucked around in the DB, but I don't think they have. We also log all exceptions, and I can find nothing unusual happening around the time that the records from SprocTheFirst were created.
So, is it possible that a transaction, or more properly a declarative TransactionScope, with Snapshot isolation level can fail somehow and only partially commit?
We have spotted the same issue. I have recreated it here - https://github.com/DavidBetteridge/MSMQStressTest
For us we see the issue when reading from the queue rather than writing to it. Our solution was to change the isolation level of the first read in the subscriber to Serializable.
No, but the Snapshot isolation level isn't the same as Serializable.
Snapshotted row versions are stored in tempdb until the row commits,
so some other transaction can read the old data just fine.
At least that's how I understood your problem; if not, please provide more info, like a graph of the timeline or something similar.
Can you verify that CallSomethingThatEventuallyDoesLinqToSQL is using the same Connection as the first call? Does the second call read data that the first call filed into the db... and if it is unable to "see" that data, would that cause the second call to skip a few steps and not do its job?
Just because you have it wrapped in a .NET transaction doesn't mean the data as seen in the db is the same between connections. You could, for instance, have connections to two different databases and want to roll back both if one failed, or file data to a DB and post a message to MSMQ... if the MSMQ operation failed it would roll back the DB operation too. The .NET transaction takes care of this multi-technology feature for you.
I do remember a problem in early versions of ADO.NET (maybe 3.0) where the pooled connection code would allocate a new db connection rather than use the current one when a .NET-level TransactionScope was used. I believe it was fully implemented with 3.5 (I may have my versions wrong... might be 3.5 and 3.5.1). It could also be caused by MyDataLayer and how it allocates connections.
Use SQL Profiler to trace these operations and make sure the work is being done on the same spid.
It sounds like your connection may not be enlisted in the transaction. When do you create your connection object? If it is created before the TransactionScope then it will not be enlisted in the transaction.
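A minimal sketch of the ordering that matters here (connectionString is a placeholder, and the body stands in for the sproc and LINQ to SQL calls): a connection opened inside the TransactionScope auto-enlists in it, while one created and opened before the scope does not.

using System.Data.SqlClient;
using System.Transactions;

static class EnlistmentExample
{
    public static void DoWork(string connectionString)
    {
        var options = new TransactionOptions { IsolationLevel = IsolationLevel.Snapshot };
        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();   // enlists in the ambient transaction at this point
            // ... run SprocTheFirst and the LINQ to SQL work on this connection ...
            scope.Complete();
        }
    }
}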

Load SQL query result data into cache in advance

I have the following situation:
.net 3.5 WinForm client app accessing SQL Server 2008
Some queries returning relatively big amount of data are used quite often by a form
Users are using local SQL Express and restarting their machines at least daily
Other users are working remotely over slow network connections
The problem is that after a restart, the first time users open this form the queries are extremely slow and take more or less 15s on a fast machine to execute. Afterwards the same queries take only 3s. Of course this comes from the fact that no data is cached and must be loaded from disk first.
My question:
Would it be possible to force the loading of the required data in advance into SQL Server cache?
Note
My first idea was to execute the queries in a background worker when the application starts, so that when the user starts the form the queries will already be cached and execute fast directly. I however don't want to load the result of the queries over to the client as some users are working remotely or have otherwise slow networks.
So I thought of just executing the queries from a stored procedure and putting the results into temporary tables, so that nothing would be returned to the client.
It turned out that some of the result sets use dynamic columns, so I couldn't create the corresponding temp tables, and thus this isn't a solution.
Do you happen to have any other idea?
Are you sure this is the execution plan being created, or is it server memory caching that's going on? Maybe the first query loads quite a bit of data, but subsequent queries can use the already-cached data, and so run much quicker. I've never seen an execution plan take more than a second to generate, so I'd suspect the plan itself isn't the cause.
Have you tried running the index tuning wizard on your query? If it is the plan that's causing problems, maybe some statistics or an additional index will help you out, and the optimizer is pretty good at recommending things.
I'm not sure how you are executing your queries, but you could do:
SqlCommand Command = /* your command */
Command.ExecuteReader(CommandBehavior.SchemaOnly).Dispose();
Executing your command with the schema-only command behavior will add SET FMTONLY ON to the query and cause SQL Server to get metadata about the result set (requiring generation of the plan), but will not actually execute the command.
To narrow down the source of the problem you can always use the SQL Server Objects in perfmon to get a general idea of how the local instance of SQL Server Express is performing.
In this case you would most likely see a lower Buffer Cache Hit Ratio on the first request and a higher number on subsequent requests.
Also you may want to check out http://msdn.microsoft.com/en-us/library/ms191129.aspx It describes how you can set a sproc to run automatically when the SQL Server service starts up.
If you retrieve the data you need with that sproc, then maybe the data will remain cached and improve performance the first time it is retrieved by the end user via your form.
In the end I still used the approach I tried first: executing the queries from a stored procedure and putting the results into temporary tables so that nothing would be returned. This 'caching' stored procedure is executed in the background whenever the application starts.
It just took some time to write the temporary tables as the result sets are dynamic.
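A minimal sketch of that background warm-up call (the usp_WarmCache procedure name and connectionString are placeholders): it runs the caching stored procedure on a background thread at application start, so the UI is not blocked and no result set travels back over the slow connection.

using System.Data;
using System.Data.SqlClient;
using System.Threading;

public static class CacheWarmer
{
    public static void Start(string connectionString)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand("usp_WarmCache", connection))
                {
                    command.CommandType = CommandType.StoredProcedure;
                    command.CommandTimeout = 120;   // the first, uncached run may be slow
                    connection.Open();
                    command.ExecuteNonQuery();      // results stay in temp tables on the server
                }
            }
            catch
            {
                // warming the cache is best-effort; ignore failures here
            }
        });
    }
}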
Thanks to all of you for your really fast help on the issue!

ChangeConflictException in Linq to Sql non-concurrent updates

I'm using LINQ to SQL, and having a bit of an issue incrementing a view counter cross-connection.
The teeny bit of code I'm using is:
t = this.AppManager.ForumManager.GetThread(id);
t.Views = t.Views + 1;
this.AppManager.DB.SubmitChanges();
Now in my tests, I am running this multiple times, non-concurrently. There are a total of 4 copies of the object performing this test.
That is to say, there is no locking issue or anything like that, but there are 4 data contexts.
Now, I would expect this to work like this: fetch a row, modify a field, update the row. However, this is throwing a ChangeConflictException.
Why would the change be conflicted if none of the copies of this are running concurrently?
Is there a way to ignore change conflicts on a certain table?
EDIT: Found the answer:
You can set "UpdateCheck = Never" on all columns on a table to create a last-in-wins style of update. This is what the application was using before I ported it to LINQ, so that is what I will use for now.
EDIT2: While my fix above did indeed prevent the exception from being thrown, it did not fix the underlying issue:
Since I have more than one data context, there ends up being more than one cached copy of each object. Should I be recreating my data context with every page load?
I would rather instruct the data context to forget everything. Is this possible?
I believe DataContext is intended to be relatively lightweight and short-lived. IMO, you should not cache data loaded with a DataContext longer than necessary. When it's short-lived, it remains relatively small because (as I understand it) the DataContext's memory usage is primarily associated with tracking the changes you make to objects managed by it (i.e. retrieved by it).
In the application I work on, we create the context, display data on the UI, wait for user updates and then update the data. However, that is necessary mainly because we want the update to be based on what the user is looking at (otherwise we could retrieve the data and update all at once when the user hits update). If your updates are relatively free-standing, I think it would be sensible to retrieve the row immediately before updating it.
You can also use System.Data.Linq.DataContext.Refresh() to re-sync already-retrieved data with data in the database to help with this problem.
To respond to your last comment about making the context forget everything, I don't think there's a way to do that, but I suspect that's because all there is to a context is the tracked changes (and the connection), and it's just as well that you create a new context (remember to dispose of the old one) because really you want to throw away everything that the context is.
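A minimal sketch of the context-per-operation shape for the view counter (MyDataContext, its Threads table, and connectionString are placeholders): the row is fetched and updated inside one short-lived context, so no stale cached copy is involved and the context is thrown away afterwards.

using System.Linq;

public static class ViewCounter
{
    public static void IncrementViews(int threadId, string connectionString)
    {
        using (var db = new MyDataContext(connectionString))   // new context per operation
        {
            var thread = db.Threads.Single(t => t.Id == threadId);
            thread.Views += 1;
            db.SubmitChanges();   // context is disposed right after, discarding its cache
        }
    }
}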
