I have two scenarios (examples below). Both are perfectly legitimate ways of making a database request, but I'm not sure which is best.
Example One - This is the method we generally use when building new applications.
private readonly IInterfaceName _repositoryInterface;
public ControllerName()
{
_repositoryInterface = new Repository(Context);
}
public JsonResult MethodName(string someParameter)
{
var data = _repositoryInterface.ReturnData(someParameter);
return data;
}
protected override void Dispose(bool disposing)
{
Context.Dispose();
base.Dispose(disposing);
}
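public IEnumerable<ModelName> ReturnData(string filter)
{
// Build the predicate and pass it (not the raw filter string) to the repository's Get method.
Expression<Func<ModelName, bool>> query = q => q.ParameterName.ToUpper().Contains(filter);
return Get(query);
}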
Example Two - I've recently started seeing this more frequently
using (SqlConnection connection = new SqlConnection(
ConfigurationManager.ConnectionStrings["ConnectionName"].ToString()))
{
var storedProcedureName = GetStoredProcedureName();
using (SqlCommand command = new SqlCommand(storedProcedureName, connection))
{
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add("@Start", SqlDbType.Int).Value = start;
// The connection must be open before the command is executed.
connection.Open();
using (SqlDataReader reader = command.ExecuteReader())
{
// DATA IS READ AND PARSED
}
}
}
Both examples use Entity Framework in some form (the first more so than the second), and there are Model and Mapping files for every table which could be interrogated. The main thing the second example does that the first does not (regarding EF) is use Migrations as part of the stored procedure code generation. In addition, both implement the Repository pattern, similar to the one in the second link below.
Code First - MSDN
Contoso University - Tutorial
My understanding of Example One is that the repository and context are instantiated once the Controller is called. When making the call to the repository it returns the data but leaves the context intact until it is disposed of at the end of the method. Example Two on the other hand will call Dispose as soon as the database call is finished with (unless forced into memory, e.g. using .ToList() on an IEnumerable). If my understanding is not correct, please correct me where appropriate.
So my main question is: what are the advantages and disadvantages of using one over the other? For example, is there a larger performance overhead in going with Example Two compared to Example One?
FYI: I've tried to search for an answer to the above but have been unsuccessful, so if you know of a similar question please feel free to point me in that direction.
You seem to be making a comparison like this:
Is it better to build a house or to install plumbing in the bathroom?
You can have both. You could have a repository (house) that uses data connections (plumbing) so it's not an "OR" situation.
There is no reason why the call to ReturnData couldn't use a SqlCommand under the hood.
Now, the really important difference worth considering is whether the repository holds a resource (memory, connection, pipe, file, etc.) open for its whole lifetime, or only for each data call.
The advantage of using a using is that resources are only opened for the duration of the call. This helps immensely with scaling of the app.
On the other hand there's an overhead to opening connections, so it's better - particularly for single threaded apps - to open a connection, do several tasks, and then close it.
So it really boils down to what type of app you're writing as to which approach you use.
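The trade-off between opening a resource per call and holding it open across several calls can be sketched without a database. In the sketch below, FakeConnection and ConnectionStrategies are hypothetical names invented for illustration; FakeConnection only counts how often it is created and disposed. With real SqlConnection objects the cost of the per-call approach is usually softened by ADO.NET connection pooling, which is enabled by default.

```csharp
using System;

// FakeConnection is a hypothetical stand-in for a real connection; it only counts opens.
public sealed class FakeConnection : IDisposable
{
    public static int OpenCount;    // connections currently open
    public static int TotalOpened;  // connections ever opened
    private bool _disposed;

    public FakeConnection() { OpenCount++; TotalOpened++; }

    public void Dispose()
    {
        if (_disposed) return;      // Dispose should be safe to call twice
        _disposed = true;
        OpenCount--;
    }

    public string Query(string sql) { return "result of " + sql; }
}

public static class ConnectionStrategies
{
    // Per call: the resource exists only for the duration of each query.
    public static void PerCall(int queries)
    {
        for (int i = 0; i < queries; i++)
        {
            using (var conn = new FakeConnection())
            {
                conn.Query("SELECT " + i);
            }
        }
    }

    // Shared: open once, run several queries, then release.
    public static void Shared(int queries)
    {
        using (var conn = new FakeConnection())
        {
            for (int i = 0; i < queries; i++)
            {
                conn.Query("SELECT " + i);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        ConnectionStrategies.PerCall(3);
        Console.WriteLine("Opened by PerCall: " + FakeConnection.TotalOpened);
        ConnectionStrategies.Shared(3);
        Console.WriteLine("Opened in total: " + FakeConnection.TotalOpened);
        Console.WriteLine("Still open: " + FakeConnection.OpenCount);
    }
}
```

Both strategies leave nothing open afterwards; they differ in how many times the resource is opened and how long each one is held.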
Your second example isn't using Entity Framework. It seems you may have two different approaches to data access here, although it is hard to tell from the repository snippet, as it quite rightly hides the data access implementation. The second example correctly uses a "using" statement, as you should with any object that implements IDisposable. It means you don't have to worry about calling Dispose yourself. This is pure ADO.NET, which is what Entity Framework uses under the hood.
If the first example is using Entity Framework, you most likely have lazy loading in play, in which case you need the DbContext to remain alive until the query has been executed. Entity Framework is an ORM tool. It too uses ADO.NET under the hood to connect to the database, but it also offers you a lot more on top. A good book on both subjects should help you.
I found that learning ADO.NET first helps a lot in understanding how Entity Framework retrieves data from the database.
The using statement is good practice wherever you find an object that implements IDisposable. You can read more about that here: IDisposable the right way
In response to the change to the question - the answer still on the whole remains the same. In terms of performance - how fast are the queries returned? Does the performance of one work better than the other? Only your current system and set up can tell you that. Both approaches seem to be doing things the correct way.
I haven't worked with Migrations, so I'm not sure why you are getting ADO.NET type queries integrating with your EF models, but I wouldn't be surprised by this functionality. Entity Framework, as I have experienced it, creates the queries for you and then executes them using the ADO.NET objects from your second example. The key point is that you want a "using" block for both the SqlConnection and the SqlCommand objects. Nesting them as you have done is the usual way to do that, because each "using" block disposes only the object it declares, not everything created inside it.
There is nothing stopping you from putting a "using" block around the context in your repository, but when it comes to lazily loading the related entities you will get an error, as the context will already have been disposed. If you need to make this change, you can include the relevant elements in your query and do away with the lazy loading approach. There are performance gains in certain situations from doing this, but again you need to balance this against how your system is performing.
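The disposed-context problem can be reproduced with plain deferred execution, no Entity Framework required. This is a minimal sketch under stated assumptions: FakeContext and Repository are hypothetical names, and FakeContext merely mimics a context that refuses to produce rows once disposed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// FakeContext is a hypothetical stand-in for a DbContext; it is not Entity Framework.
public sealed class FakeContext : IDisposable
{
    private bool _disposed;

    public void Dispose() { _disposed = true; }

    // Rows are produced lazily, so the disposed check runs at enumeration time.
    public IEnumerable<string> Customers()
    {
        foreach (var name in new[] { "Ann", "Bob" })
        {
            if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
            yield return name;
        }
    }
}

public static class Repository
{
    // Returns a lazy sequence; nothing has been read by the time the context is disposed.
    public static IEnumerable<string> GetLazy()
    {
        using (var ctx = new FakeContext())
        {
            return ctx.Customers();
        }
    }

    // Materializes inside the using block, so the data outlives the context.
    public static List<string> GetMaterialized()
    {
        using (var ctx = new FakeContext())
        {
            return ctx.Customers().ToList();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("Materialized rows: " + Repository.GetMaterialized().Count);
        try
        {
            Repository.GetLazy().ToList();
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("Lazy enumeration failed: the context was already disposed.");
        }
    }
}
```

Forcing the query to run inside the "using" block (here with ToList) is the in-memory analogue of eagerly including related entities instead of relying on lazy loading.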
I'm going through old projects at work trying to make them faster. I'm currently looking at some web APIs. One API is running particularly slowly, and the problem is in the data service it calls. Specifically, it is in a lambda that maps a stored procedure result to a domain model. Here is a simple version of the code:
public IEnumerable<DomainModelResult> GetData()
{
return this.EntityFrameworkDB.GetDataSproc().ToList()
.Select(sprocResults=>sprocResults.ToDomainModelResult())
.AsEnumerable();
}
This is a simplified version, but after profiling it I found the major hang-up is in the lambda function. I am assuming this is because the EF context is still open and some goofy Entity Framework stuff is happening.
The problem is that I'm relatively new to Entity Framework (I'm an intern) and pretty ignorant of its inner workings. Could someone explain why this is so slow? I feel it should be very fast. The DomainModelResult is a POCO and only setters are being used in ToDomainModelResult.
Edit:
I thought ToList() would do that but started to doubt myself because I couldn't think of another explanation. All the ToDomainModelResult() stuff is extremely simple. Something like.
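// Declared as an extension method (inside a static class) so it can be called as sprocResults.ToDomainModelResult().
public static DomainModelResult ToDomainModelResult(this SprocResult source)
{
return new DomainModelResult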
{
FirstName = source.description,
MiddleName = source._middlename,
LastName = source.lastname,
UserName = source.expr2,
Address = source.uglyName
};
}
It's just a bunch of simple setters; I think the model causing problems has 17 properties. The reason this is being done is that the project is old and database-first, and the stored procedures have ugly names that aren't descriptive at all. It also means that switching the stored procedures in the data services is easy and doesn't break the rest of the project.
Edit 2: For some reason, using ToArray and breaking apart the LINQ statements makes the assignment from procedure result to domain model result extremely fast. Now the whole data service method is faster, which is odd; I don't know where the rest of the time went.
This might be a more esoteric question than I originally thought. My question hasn't been answered but the problem is no longer there. Thanks to all who replied. I'm keeping this as unanswered for now.
Edit 3: Please flag this question for removal, as I can't remove it myself. I found the problem, but it is totally unrelated to my original question. I misunderstood the problem when I asked the question. I'm chalking the increase in speed up to compiler optimization and to running the code in the profiler. The real issue wasn't in my lambda but in a dynamic lambda called by Entity Framework: when the context is closed or an object is accessed, it was doing data validation. GetString, GetInt32, and IsDBNull were eating up the most time. So I'm assuming Microsoft has already optimized these methods, and the only way to speed this up is possibly to make some variables non-nullable in the procedure. This question is misleading and so esoteric that I don't think it belongs here and will just confuse people. Sorry.
You should split the code and check which one is taking time.
public IEnumerable<DomainModelResult> GetData()
{
var lst = this.EntityFrameworkDB.GetDataSproc().ToList();
return lst
.Select(sprocResults=>sprocResults.ToDomainModelResult())
.AsEnumerable();
}
I am pretty sure the GetDataSproc procedure is taking most of your time. If so, you need to optimize the stored procedure code.
Update
If possible, it is better to do more work on the SQL side instead of retrieving 60,000 rows into memory. A few possible solutions:
If you need to display this information, do paging (top and skip).
If you are doing any filtering, calculating, or grouping after you retrieve the rows into memory, do it in your stored proc instead.
On the .NET side, as you are returning IEnumerable, you may be able to use yield in the second part, depending on your architecture.
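The paging and yield suggestions above can be sketched in memory. This is only an illustration under stated assumptions: RowMapping, MapLazily, and Page are hypothetical names, and plain integers stand in for stored procedure rows. With a real stored procedure the paging would ideally happen on the SQL side so that the unwanted rows never leave the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowMapping
{
    // Streams mapped rows one at a time instead of building a second list in memory.
    public static IEnumerable<string> MapLazily(IEnumerable<int> rows)
    {
        foreach (var row in rows)
        {
            yield return "row-" + row;
        }
    }

    // In-memory paging with Skip/Take; pageIndex is zero-based.
    public static List<int> Page(IEnumerable<int> rows, int pageIndex, int pageSize)
    {
        return rows.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        IEnumerable<int> rows = Enumerable.Range(1, 60000);

        // Only the requested page is materialized.
        List<int> thirdPage = RowMapping.Page(rows, 2, 10);
        Console.WriteLine("Third page starts at " + thirdPage[0]);

        // Only the first element is mapped; the remaining rows are never touched.
        Console.WriteLine(RowMapping.MapLazily(rows).First());
    }
}
```

The caller decides how much of the lazily mapped sequence to consume, which is the main benefit of yield over an eager ToList followed by Select.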
I am using the .xsd dataset thingies (which I hate) to auto-generate TableAdapter classes for some backend code.
I have not really used these before, tending to favour manual commands and stored procs whenever possible (for various speed-related reasons: those xsds play hell with dynamic tables and really large numbers of columns), and I am finding myself instantiating a TableAdapter in a large number of my methods. So my question is this:
Will the auto-generated code streamline itself so that a full adapter class is not created on each instantiation, and instead share some static data (such as connection information)? If not, would it be better for me to have some sort of singleton/static provider class that gives me access to the adapters' methods when needed, without the overhead of creating a new adapter every time I want to get some information?
Cheers, Ed
If you're concerned about the performance you could always run a benchmark to see what the performance hit, if any, is.
Sorry you didn't find my answer useful.
My point was that while you had received responses they all seemed to be subjective and not based on hard data. So if you had some reason to be concerned that there was a performance hit in your particular application you should measure it.
There is no reason to refactor one area for performance unless there is an actual problem.
I actually tend to instantiate a very low number of adapters (usually only one of each type). I have never tried using them as local variables (instantiated when needed), so I never ran into your question, but I understand your concern.
From what I know, the adapters themselves may be quite heavyweight to instantiate, but the real killer is the connection. What I do is mark the adapter's Connection modifier as Public in the .xsd designer so I can assign the property whatever I need it to use, and maintain a tight grip on the opening and closing of connections:
void Load() {
using (SqlConnection conn = ...) {
conn.Open();
invoicesAdapter.Connection = conn;
customersAdapter.Connection = conn;
invoicesAdapter.Fill(dataSet.Invoices);
customersAdapter.Fill(dataSet.Customers);
}
}
void Save() {
using (SqlConnection conn = ...) {
conn.Open();
invoicesAdapter.Connection = conn;
customersAdapter.Connection = conn;
invoicesAdapter.Update(dataSet);
customersAdapter.Update(dataSet);
}
}
I omitted transaction control and error handling for brevity.
From a performance perspective, is it better to wrap each statement that utilizes LINQ in a using() statement, or to declare a class-level instance and use in each method?
For instance:
public void UpdateSomeRecord(Record recordToUpdate)
{
using(var entities = new RecordDBEntities())
{
// logic here...
}
}
private RecordDBEntities entities = new RecordDBEntities();
public void UpdateSomeRecord(Record recordToUpdate)
{
// logic here...
}
Or does it not matter either way?
Thanks!
The using statement may hurt performance in the sense that it will take longer to run but this shouldn't be your concern in cases like this. If a type implements IDisposable it really ought to be wrapped in a using statement so that it can clean up after itself.
This cleanup code will take longer to run than no cleanup code of course so that is why I say that the using statement will take longer to run. But this does not mean that you shouldn't have the using statement. I think that you should use the using statement even though it may take longer to run.
I guess what I am trying to say is that you are comparing apples to oranges here, as performance comparisons only make sense when the code being compared creates identical output and identical side effects. Your examples do not, so that is why I don't think this is a performance issue.
The best practice in this situation is to use the using statement on types that implement IDisposable regardless of the fact that the using statement will make the method run longer. If you need to know how much longer it will run then you should employ a profiler to identify if the code in question is creating a bottleneck.
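The cleanup cost being discussed is easier to reason about once you see that a using statement is essentially a try/finally that calls Dispose. The sketch below uses a hypothetical Tracked class that records its lifecycle in a list; the two methods are intended to behave the same way, including when an exception is thrown inside the block.

```csharp
using System;
using System.Collections.Generic;

// Tracked is a hypothetical disposable that records what happens to it.
public sealed class Tracked : IDisposable
{
    private readonly List<string> _log;

    public Tracked(List<string> log) { _log = log; _log.Add("open"); }

    public void Dispose() { _log.Add("dispose"); }
}

public static class UsingDemo
{
    // Cleanup handled by the using statement.
    public static List<string> WithUsing()
    {
        var log = new List<string>();
        try
        {
            using (var tracked = new Tracked(log))
            {
                log.Add("work");
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }

    // Roughly what the compiler generates for the using statement above.
    public static List<string> WithTryFinally()
    {
        var log = new List<string>();
        try
        {
            Tracked tracked = new Tracked(log);
            try
            {
                log.Add("work");
                throw new InvalidOperationException("simulated failure");
            }
            finally
            {
                if (tracked != null) tracked.Dispose();
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");
        }
        return log;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(string.Join(",", UsingDemo.WithUsing()));
        Console.WriteLine(string.Join(",", UsingDemo.WithTryFinally()));
    }
}
```

In both versions Dispose runs even though the block throws, which is the guarantee you give up when you skip the using statement to save a little time.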
In fact your question is about the lifetime management of the LINQ DataContext.
You may wish to look at the following article: Linq to SQL DataContext Lifetime Management
I have a method that I want to be "transactional" in the abstract sense. It calls two methods that happen to do stuff with the database, but this method doesn't know that.
public void DoOperation()
{
using (var tx = new TransactionScope())
{
Method1();
Method2();
tx.Complete();
}
}
public void Method1()
{
using (var connection = new DbConnectionScope())
{
// Write some data here
}
}
public void Method2()
{
using (var connection = new DbConnectionScope())
{
// Update some data here
}
}
Because in real terms the TransactionScope means that a database transaction will be used, we have an issue where it could well be promoted to a Distributed Transaction, if we get two different connections from the pool.
I could fix this by wrapping the DoOperation() method in a ConnectionScope:
public void DoOperation()
{
using (var tx = new TransactionScope())
using (var connection = new DbConnectionScope())
{
Method1();
Method2();
tx.Complete();
}
}
I made DbConnectionScope myself for just such a purpose, so that I don't have to pass connection objects to sub-methods (this is a more contrived example than my real issue). I got the idea from this article: http://msdn.microsoft.com/en-us/magazine/cc300805.aspx
However I don't like this workaround as it means DoOperation now has knowledge that the methods it's calling may use a connection (and possibly a different connection each). How could I refactor this to resolve the issue?
One idea I'm thinking of is creating a more general OperationScope, so that, when teamed up with a custom Castle Windsor lifestyle that I'll write, any component requested from the container with OperationScopeLifestyle will always get the same instance of that component. This does solve the problem, because OperationScope is more ambiguous than DbConnectionScope.
I'm seeing conflicting requirements here.
On the one hand, you don't want DoOperation to have any awareness of the fact that a database connection is being used for its sub-operations.
On the other hand, it clearly is aware of this fact because it uses a TransactionScope.
I can sort of understand what you're getting at when you say you want it to be transactional in the abstract sense, but my take on this is that it's virtually impossible (no, scratch that - completely impossible) to describe a transaction in such abstract terms. Let's just say you have a class like this:
class ConvolutedBusinessLogic
{
public void Splork(MyWidget widget)
{
if (widget.Validate())
{
widgetRepository.Save(widget);
widget.LastSaved = DateTime.Now;
OnSaved(new WidgetSavedEventArgs(widget));
}
else
{
Log.Error("Could not save MyWidget due to a validation error.");
SendEmailAlert(new WidgetValidationAlert(widget));
}
}
}
This class is doing at least two things that probably can't be rolled back (setting the property of a class and executing an event handler, which might for example cascade-update some controls on a form), and at least two more things that definitely can't be rolled back (appending to a log file somewhere and sending out an e-mail alert).
Perhaps this seems like a contrived example, but that is actually my point; you can't treat a TransactionScope as a "black box". The scope is in fact a dependency like any other; TransactionScope just provides a convenient abstraction for a unit of work that may not always be appropriate because it doesn't actually wrap a database connection and can't predict the future. In particular, it's normally not appropriate when a single logical operation needs to span more than one database connection, whether those connections are to the same database or different ones. It tries to handle this case of course, but as you've already learned, the result is sub-optimal.
The way I see it, you have a few different options:
Make explicit the fact that Method1 and Method2 require a connection by having them take a connection parameter, or by refactoring them into a class that takes a connection dependency (constructor or property). This way, the connection becomes part of the contract, so Method1 no longer knows too much - it knows exactly what it's supposed to know according to the design.
Accept that your DoOperation method does have an awareness of what Method1 and Method2 do. In fact, there is nothing wrong with this! It's true that you don't want to be relying on implementation details of some future call, but forward dependencies in the abstraction are generally considered OK; it's reverse dependencies you need to be concerned about, like when some class deep in the domain model tries to update a UI control that it has no business knowing about in the first place.
Use a more robust Unit of Work pattern. This is getting to be more popular and it is, by and large, the direction Microsoft has gone in with Linq to SQL and EF (the DataContext/ObjectContext are basically UOW implementations). This fits in well with a DI framework and essentially relieves you of the need to worry about when transactions start and end and how the data access has to occur (the term is "persistence ignorance"). This would probably require significant rework of your design, but pound for pound it's going to be the easiest to maintain long-term.
Hope one of those helps you.
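As a rough illustration of the first and third options above, here is a minimal in-memory sketch in which the unit of work is an explicit constructor dependency. Every name in it (IUnitOfWork, InMemoryUnitOfWork, WidgetService, Operations) is hypothetical and invented for illustration; a real implementation would wrap a connection and transaction, or an EF context, rather than a list.

```csharp
using System;
using System.Collections.Generic;

// IUnitOfWork is a hypothetical abstraction: changes are collected, then committed together.
public interface IUnitOfWork : IDisposable
{
    void Register(string change);
    void Commit();
}

public sealed class InMemoryUnitOfWork : IUnitOfWork
{
    private readonly List<string> _pending = new List<string>();
    private readonly List<string> _store;

    public InMemoryUnitOfWork(List<string> store) { _store = store; }

    public void Register(string change) { _pending.Add(change); }

    // All pending changes reach the store together.
    public void Commit()
    {
        _store.AddRange(_pending);
        _pending.Clear();
    }

    // Anything not committed is discarded, which plays the role of a rollback.
    public void Dispose() { _pending.Clear(); }
}

// The dependency is part of the contract: the service is handed its unit of work.
public sealed class WidgetService
{
    private readonly IUnitOfWork _unitOfWork;

    public WidgetService(IUnitOfWork unitOfWork) { _unitOfWork = unitOfWork; }

    public void Method1() { _unitOfWork.Register("write some data"); }
    public void Method2() { _unitOfWork.Register("update some data"); }
}

public static class Operations
{
    // The fail flag simulates an error between the two steps.
    public static void DoOperation(List<string> store, bool fail)
    {
        using (var unitOfWork = new InMemoryUnitOfWork(store))
        {
            var service = new WidgetService(unitOfWork);
            service.Method1();
            if (fail) throw new InvalidOperationException("simulated failure");
            service.Method2();
            unitOfWork.Commit();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new List<string>();
        Operations.DoOperation(store, false);
        Console.WriteLine("Committed changes: " + store.Count);
    }
}
```

DoOperation knows only that its collaborators need a unit of work, not which connections they use, and in practice a DI container would supply the shared instance.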
We have a SQL utility class that takes the name of a stored procedure and its input parameters, and returns the results in a DataTable. The reasoning behind this is that we don't have to worry about forgetting to close connections and causing connection leaks. It also reduces code, since we don't have to recreate data adapters and data readers in our data access layers.
The problem I have with this is that we're populating a DataTable only so that we can loop through it to create our objects, so we're basically using it like a data reader. I've read about classes that return a data reader or data adapter. But the problem with this is that either the client has to open and close connections, or you have to close the connection in a Finalize method. It seems that you wouldn't want garbage collection to be responsible for closing your database connections.
To sum up, we want to have a class so that we can reduce code by not having to create datareaders for every query and so that we can ensure database connections are closed.
What is the best way of handling this?
UPDATE: Still thinking about this, but so far it seems that the best practice is still to return a data reader, use CommandBehavior.CloseConnection, and then trust whoever uses the class to call dr.Close()?
Have you considered the Microsoft Enterprise Library?
public List<User> GetUsers()
{
List<User> result = new List<User>();
Database db = new
Microsoft.Practices.EnterpriseLibrary.Data.Sql.SqlDatabase(this.connectionString);
DbCommand cmd = db.GetStoredProcCommand("GetUsers");
using (IDataReader rdr = db.ExecuteReader(cmd))
{
while (rdr.Read())
{
User user = new User();
FillUser(rdr, user);
result.Add(user);
}
}
return result;
}
We use something like this and it performs very well under high volume.
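public SqlDataReader ExecuteReader(string command, SqlParameter[] parameters)
{
// The connection is deliberately not wrapped in a using block: disposing it here
// would close the reader before the caller could read from it.
// connectionString is assumed to be a field of this class.
SqlConnection conn = new SqlConnection(connectionString);
try
{
using (SqlCommand cmd = conn.CreateCommand())
{
cmd.CommandText = command;
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.AddRange(parameters);
conn.Open();
// CloseConnection closes the connection when the caller closes or disposes the reader.
return cmd.ExecuteReader(CommandBehavior.CloseConnection);
}
}
catch
{
// No reader was handed out, so the connection must be cleaned up here.
conn.Dispose();
throw;
}
}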
DataTables are not considered best practice for several reasons including their bloat and lack of type safety.
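One way to get type safety without making every caller manage a reader is a small helper that maps each row to an object and disposes the reader itself. This is a minimal sketch, not anyone's actual utility class: ReaderMapper, ReadAll, and User are hypothetical names, and DataTable.CreateDataReader is used only so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public sealed class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ReaderMapper
{
    // Reads every row, maps it to T, and disposes the reader even if mapping throws.
    public static List<T> ReadAll<T>(IDataReader reader, Func<IDataRecord, T> map)
    {
        var result = new List<T>();
        using (reader)
        {
            while (reader.Read())
            {
                result.Add(map(reader));
            }
        }
        return result;
    }
}

public static class Program
{
    // Builds an in-memory table that stands in for a query result.
    public static DataTable SampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Ann");
        table.Rows.Add(2, "Bob");
        return table;
    }

    public static void Main()
    {
        List<User> users = ReaderMapper.ReadAll(
            SampleTable().CreateDataReader(),
            record => new User { Id = record.GetInt32(0), Name = record.GetString(1) });

        foreach (var user in users)
        {
            Console.WriteLine(user.Id + ": " + user.Name);
        }
    }
}
```

With a reader opened using CommandBehavior.CloseConnection, disposing the reader inside the helper would also close the connection, so callers only ever see a typed list.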
I have the same structure - utility classes with methods that fetch the data and return filled DataTables (or fill/update a DataTable passed in to them) - for exactly the same reasons: keeping the database connections separate from the rest of the code and ensuring they are opened only when required and closed asap. Especially since the data is stored in various back-end systems, and I want to present only one interface to my application and not have it worry about the details.
There is one difference to your situation: We don't (in general) create objects from the rows in the DataTables, but rather work directly on the data in the rows. I find working with DataTables simple and efficient.
Other than that, I personally don't see anything wrong with this approach and find that it works very well for our purposes.
Returning a data reader doesn't work in a lot of scenarios. In a lot of places, direct connections to the database from the client machine are not allowed in production (for good reason), so you have to serialize the objects you are retrieving. I can think of designs that would allow you to persist a data reader in whatever class you use for remoting/serialization on the server side, but returning items across HTTP or net.tcp in row-by-agonizing-row fashion likely does not offer much benefit.
Are you serializing these objects? If so, your choices boil down to DataTable, DataSet, or custom objects. Custom objects, if written well, perform and serialize the best, but you have to write concurrency handling in addition to a bunch of other functionality.
IMO, since ADO.Net 2.0, datatables can perform well even in large scale remoting situations. They provide a special binary remoting format and are simple to work with. Throw in some compression and you're not even using a lot of bandwidth for your large data sets.
Well, if you plan to use this class inside web pages, you can register the utility class with the page's Unload event. In the event sink you can write your logic to close the database connection. Check out this tip on CodeProject for more ideas.
However, this solution won't work inside web methods (web services). I suppose you'd have to adapt the technique for web service use. The last line in the web method should be an event call. So when you write your web service class, define an event called WebMethodCompleted. You'd probably get a reference to the instance of the web service via the technique mentioned in the article. Once you have a reference, you can register the event in your utility class. Just remember to invoke the event in the web method.
Happy programming.