Why isn't "CreateCommand()" part of C# (or at least .NET)? - c#

After successfully working through the initial stages of learning C# in tandem with SQL Server, I discovered that the various tutorials I used simply got it wrong by declaring global SqlConnection, SqlDataAdapter and even DataSet variables.
As a result, code that works great in a single-threaded application doesn't work so well in a multi-threaded environment. In my research for a solution, I discovered that both MSDN and this educational answer recommend wrapping the "atomic" parts of a SQL transaction in a using/try block:
private static void CreateCommand(string queryString, string connectionString)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        try
        {
            SqlCommand command = new SqlCommand(queryString, connection);
            command.Connection.Open();
            command.ExecuteNonQuery();
        }
        catch (InvalidOperationException)
        {
            //log and/or rethrow or ignore
        }
        catch (SqlException)
        {
            //log and/or rethrow or ignore
        }
        catch (ArgumentException)
        {
            //log and/or rethrow or ignore
        }
    }
}
So, what I am going to do now is convert my entire codebase to use wrappers like this. But before proceeding I would like to understand the tradeoffs of this approach. In my experience, there is usually a good reason why a large team of designers/engineers decides not to include certain defensive features. This is especially interesting when, from my point of view as a C/C++ programmer, the entire value proposition of C# is "defensiveness" (where the tradeoff is the well-known CLR performance hit).
To summarize my question(s):
What are the tradeoffs of encapsulating every transaction in my code as described above?
Are there any caveats I should be looking for?

The reason comes down to flexibility. Does the developer want to include the command in a transaction? Do they want to retry on a given error, and if so, how many times? Do they want a connection from the connection pool or a new connection each time (with a performance overhead)? Do they want a SqlConnection or a more generic DbConnection? And so on.
However, MS have provided the Enterprise Library, a suite of functionality which wraps up a lot of common approaches to things in an open-source library. Take a look at the Data Access block:
http://msdn.microsoft.com/en-us/library/ff632023.aspx

There is no such method built in because:
Connecting to and disconnecting from the database for each command is not economical. If you execute more than one command at a given point in the code, you want to reuse the same connection for them instead of repeatedly opening and closing it.
The method can't know what you want to do about each kind of exception, so the only thing it could do is rethrow them, and then there is no point in catching the exceptions in the first place.
So, almost everything that the method does would be specific to each situation.
Besides, the method would have to do more to be generally useful. It would have to take parameters for the command type and the command parameters. Otherwise it could only be used for text queries, and would encourage people to build SQL strings dynamically instead of using stored procedures and/or parameterised queries, and that is not something that a general library would want to encourage.
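To illustrate the point, a genuinely general-purpose helper would already need to look something like the sketch below (the helper's name and signature are my own assumptions, not an existing API), and even then it says nothing about transactions, retries, or what to do with each kind of exception:
// Hypothetical sketch only (requires System.Data and System.Data.SqlClient):
// a "general" helper quickly accumulates parameters for everything a caller might want to vary.
private static int ExecuteNonQuery(
    string connectionString,
    string commandText,
    CommandType commandType,              // CommandType.Text or CommandType.StoredProcedure
    params SqlParameter[] parameters)     // parameterised to avoid dynamic SQL strings
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand(commandText, connection))
    {
        command.CommandType = commandType;
        if (parameters != null)
            command.Parameters.AddRange(parameters);
        connection.Open();
        return command.ExecuteNonQuery();  // exception handling is left to the caller
    }
}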

1 - There are no real tradeoffs; it's pretty standard.
2 - Your code is OK for sending commands as strings to be executed as SQL queries, but it lacks quite a bit of flexibility:
You can't use parameterized queries (command.Parameters.AddWithValue(...)), which will be mandatory once you start using stored procedures
You can't use output parameters this way
You can't do anything with whatever would be queried
I prefer to use something like this:
private static void CallProc(string storedProcName, Action<SqlCommand> fillParams, Action postAction, Action onError)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        using (SqlCommand command = new SqlCommand(String.Format("[dbo].[{0}]", storedProcName), connection))
        {
            try
            {
                if (fillParams != null)
                    fillParams(command);
                command.Connection.Open();
                command.ExecuteNonQuery();
                if (postAction != null)
                    postAction();
            }
            catch (InvalidOperationException)
            {
                //log and/or rethrow or ignore
                throw;
            }
            catch (SqlException)
            {
                //log and/or rethrow or ignore
                throw;
            }
            catch (ArgumentException)
            {
                //log and/or rethrow or ignore
                throw;
            }
            catch
            {
                if (onError != null)
                    onError();
            }
        }
    }
}
You can then make variants to handle return values, output parameters, etc.
And you call it like:
CallProc("myStoredProc",
command =>
{
command.Parameters.AddWithValue("#paramNameOne", "its value here");
// More parameters for the stored proc...
},
null,
null);

As long as you encapsulate the functionality in a "bottleneck" method like the static method you've posted, so that all your database accesses go through one easy-to-change piece of shared code, there is often no trade-off at all, because you can change the implementation later without having to rewrite vast tracts of code.
By creating a new connection every time, the risk is that you incur an expensive open/close overhead for every command. However, the connections should be pooled, in which case the overhead may not be very large and the performance hit may be minimal.
The other approach would be to create a single connection and hold it open, sharing it for all your queries. This is undoubtedly more efficient because you're minimising the per-query overheads; however, the performance gain may also be minimal.
In both cases there will be additional threading issues (multiple simultaneous queries) to resolve unless you make sure that all database queries operate on a single thread. The performance implications all depend on how many queries you're firing off per second - and of course it doesn't matter how efficient your connection approach is if you are using grossly inefficient queries; you need to focus your "optimisation" time on the worst performance issues.
So I'd suggest keeping it simple for now and avoiding premature optimisation, but try to keep the implementation of the database access code in a separate layer, so that your main codebase simply issues commands to the access layer, and has minimal database-specific code in it. The less it "knows" about the database the better. This will make it much easier to change the underlying implementation or port your code to use a different database engine in future.
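As a rough illustration of that separation (the interface, class and table names below are my own hypothetical examples, not anything from the question), the main codebase could depend on something as small as this, with all the ADO.NET details confined to the implementing class:
// Hypothetical sketch of a thin data-access boundary (requires System.Data.SqlClient).
// The rest of the application sees only the interface; the implementation owns
// the connection string, the SqlConnection lifetime and the SQL itself.
public interface ICustomerStore
{
    void AddCustomer(string name, string email);
}

public sealed class SqlCustomerStore : ICustomerStore
{
    private readonly string _connectionString;

    public SqlCustomerStore(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void AddCustomer(string name, string email)
    {
        using (SqlConnection connection = new SqlConnection(_connectionString))
        using (SqlCommand command = new SqlCommand(
            "INSERT INTO dbo.Customers (Name, Email) VALUES (@name, @email)", connection))
        {
            command.Parameters.AddWithValue("@name", name);
            command.Parameters.AddWithValue("@email", email);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}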
Another approach that can help with this is to encapsulate queries in stored procedures. This means your program knows the name of the procedure and the parameters for it, but the actual tables/columns that are accessed are hidden inside the database. Your code then knows as little as possible of the low-level structure of the database, which improves its flexibility, maintainability, and portability. Stored procedure calls can also be more efficient than sending generic SQL commands.

Related

best way to catch database constraint errors

I am calling a stored procedure that inserts data into a SQL Server database from C#. I have a number of constraints on the table, such as unique columns. At present I have the following code:
try
{
    // insert data
}
catch (SqlException ex)
{
    if (ex.Message.ToLower().Contains("duplicate key"))
    {
        if (ex.Message.ToLower().Contains("url"))
        {
            return 1;
        }
        if (ex.Message.ToLower().Contains("email"))
        {
            return 2;
        }
    }
    return 3;
}
Is it better practice to check whether a column value is unique etc. before inserting the data in C#, or in the stored procedure, or to let the exception occur and handle it as above? I am not a fan of the above, but I am looking for best practice in this area.
I view database constraints as a last resort kind of thing. (I.e. by all means they should be present in your schema as a backup way of maintaining data integrity.) But I'd say the data should really be valid before you try to save it in the database. If for no other reason, then because providing feedback about invalid input is a UI concern, and a data validity error really shouldn't bubble up and down the entire tier stack every single time.
Furthermore, there are many sorts of assertions you want to make about the shape of your data that can't easily be expressed using constraints. (E.g. state transitions of an order: "An order can only go to SHIPPED from PAID", or more complex scenarios.) That is, you'd need to resort to procedural-language based checks, ones that duplicate even more of your business logic, then have those report some sort of error code as well, and include yet more complexity in your app just for the sake of doing all your data validation in the schema definition.
Validation is inherently hard to place in an app since it concerns both the UI and is coupled to the model schema, but I err on the side of doing it near the UI.
I see two questions here, and here's my take...
Are database constraints good? For large systems they're indispensable. Most large systems have more than one front end, and not always in compatible languages where middle-tier or UI data-checking logic can be shared. They may also have batch processes in Transact-SQL or PL/SQL only. It's fine to duplicate the checking on the front end, but in a multi-user app the only way to truly check uniqueness is to insert the record and see what the database says. The same goes for foreign key constraints - you don't truly know until you try to insert/update/delete.
Should exceptions be allowed to throw, or should return values be substituted? Here's the code from the question:
try
{
    // insert data
}
catch (SqlException ex)
{
    if (ex.Message.ToLower().Contains("duplicate key"))
    {
        if (ex.Message.ToLower().Contains("url"))
        {
            return 1; // Sure, that's one good way to do it
        }
        if (ex.Message.ToLower().Contains("email"))
        {
            return 2; // Sure, that's one good way to do it
        }
    }
    return 3; // EVIL! Or at least quasi-evil :)
}
If you can guarantee that the calling program will actually act on the return value, I think the return 1 and return 2 are best left to your judgement. I prefer to rethrow a custom exception for cases like this (for example DuplicateEmailException), but that's just me - the return values will do the trick too. After all, consumer classes can ignore exceptions just as easily as they can ignore return values.
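A minimal sketch of that custom-exception approach, assuming the hypothetical DuplicateEmailException mentioned above (error numbers 2601 and 2627 are SQL Server's duplicate-key/unique-constraint violation codes):
// Hypothetical custom exception for a duplicate e-mail address.
public class DuplicateEmailException : Exception
{
    public DuplicateEmailException(string message, Exception inner)
        : base(message, inner) { }
}

try
{
    // insert data
}
catch (SqlException ex)
{
    // 2601 = duplicate key in a unique index, 2627 = unique constraint violation
    if ((ex.Number == 2601 || ex.Number == 2627) &&
        ex.Message.ToLower().Contains("email"))
    {
        throw new DuplicateEmailException("That e-mail address is already in use.", ex);
    }
    throw; // anything unexpected: log it and let it bubble up
}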
I'm against the return 3. This means there was an unexpected exception (database down, bad connection, whatever). Here you have an unspecified error, and the only diagnostic information you have is this: "3". Imagine posting a question on SO that says I tried to insert a row but the system said '3'. Please advise. It would be closed within seconds :)
If you don't know how to handle an exception in the data class, there's no way a consumer of the data class can handle it. At this point you're pretty much hosed so I say log the error, then exit as gracefully as possible with an "Unexpected error" message.
I know I ranted a bit about the unexpected exception, but I've handled too many support incidents where the programmer just squelched database exceptions, and when something unexpected came up the app either failed silently or failed downstream, leaving zero diagnostic information. Very naughty.
I would prefer a stored procedure that checks for potential violations before just throwing the data at SQL Server and letting the constraint bubble up an error. The reasons for this are performance-related:
Performance impact of different error handling techniques
Checking for potential constraint violations before entering SQL Server TRY and CATCH logic
Some people will advocate that constraints at the database layer are unnecessary since your program can do everything. The reason I wouldn't rely solely on your C# program to detect duplicates is that people will find ways to affect the data without going through your C# program. You may introduce other programs later. You may have people writing their own scripts or interacting with the database directly. Do you really want to leave the table unprotected because they don't honor your business rules? And I don't think the C# program should just throw data at the table and hope for the best, either.
If your business rules change, do you really want to have to re-compile your app (or all of multiple apps)? I guess that depends on how well-protected your database is and how likely/often your business rules are to change.
I did something like this:
public class SqlExceptionHelper
{
    public SqlExceptionHelper(SqlException sqlException)
    {
        // Do Nothing.
    }

    public static string GetSqlDescription(SqlException sqlException)
    {
        switch (sqlException.Number)
        {
            case 21:
                return "Fatal Error Occurred: Error Code 21.";
            case 53:
                return "Error in Establishing a Database Connection: 53.";
            default:
                return "Unexpected Error: " + sqlException.Message;
        }
    }
}
Which allows it to be reusable, and it will allow you to get the Error Codes from SQL.
Then just implement:
public class SiteHandler : ISiteHandler
{
    public string InsertDataToDatabase(Handler siteInfo)
    {
        try
        {
            // Open Database Connection, Run Commands, Some additional Checks.
            return "Success"; // placeholder result for the happy path
        }
        catch (SqlException exception)
        {
            return SqlExceptionHelper.GetSqlDescription(exception);
        }
    }
}
Then it provides some specific errors for common occurrences. But as mentioned above, you really should ensure that you've validated your data before you insert it into your database, so that no constraint violations surface in the first place.
Hope it points you in a good direction.
Depends on what you're trying to do. Some things to think about:
Where do you want to handle your error? I would recommend as close to the data as possible.
Who do you want to know about the error? Does your user need to know that 'you've already used that ID'...?
etc.
Also -- constraints can be good. I don't 100% agree with millimoose's answer on that point -- I mean, I do agree with the "it should be this way / better performance" ideal -- but practically speaking, if you don't have control over your developers/QC, and especially when it comes to enforcing rules whose violation could blow your database up (or otherwise break dependent objects like reports if a duplicate key were to turn up somewhere), you need some barrier against, for example, the duplicate key entry.

Command.Prepare() Causing Memory Leakage?

I've sort of inherited some code on this scientific modelling project, and my colleagues and I are getting stumped by this problem. The guy who wrote this is now gone, so we can't ask him (go figure).
Inside the data access layer, there is this insert() method. This does what it sounds like -- it inserts records into a database. It is used by the various objects being modeled to tell the database about themselves during the course of the simulation.
However, we noticed that during longer simulations after a fair number of database inserts, we eventually got connection timeouts. So we upped the timeout limits, and then we started getting "out of memory" errors from PostgreSQL. We eventually pinpointed the problem to a line where an IDbCommand object uses Prepare(). Leaving it in causes memory usage to indefinitely go up. Commenting out this line causes the code to work just fine, and eliminates all the memory problems. What does Prepare() do that causes this? I can't find anything in the documentation to explain this.
A compressed version of the code follows.
public virtual void insert(DomainObjects.EntityObject obj)
{
    lock (DataBaseProvider.DataBase.Connection)
    {
        IDbCommand cmd = null;
        IDataReader noInsertIdReader = null;
        IDataReader reader = null;
        try
        {
            if (DataBaseProvider.DataBase.Validate)
            { ... }

            // create and prepare the insert command
            cmd = createQuery(".toInsert", obj);
            cmd.Prepare(); // This is what is screwing things up

            // get the query to retrieve the sequence number
            SqlStatement lastInsertIdSql = DAOLayer...getStatement(this.GetType().ToString() + ".toGetLastInsertId");

            // if the obj insert does not use a sequence, execute the insert command and return
            if (lastInsertIdSql == null)
            {
                noInsertIdReader = cmd.ExecuteReader();
                noInsertIdReader.Close();
                return;
            }

            // append the sequence query to the end of the insert statement
            cmd.CommandText += ";" + lastInsertIdSql.Statement;
            reader = cmd.ExecuteReader();

            // read the sequence number and set the object's id
            ...
        }
        // deal with some specific exceptions
        ...
    }
}
EDIT: (In response to the first given answer) All the database objects do get disposed in a finally block. I just cut that part out here to save space. We've played with that a bit and that didn't make any difference, so I don't think that's the problem.
You'll notice that IDbCommand and IDataReader both implement IDisposable. Whenever you create an instance of an IDisposable object, you should either wrap it in a using statement or call Dispose once you're finished. If you don't, you'll end up leaking resources (sometimes resources other than just memory).
Try this in your code
using (IDbCommand cmd = createQuery(".toInsert", obj))
{
    cmd.Prepare(); // This is what is screwing things up
    ...
    // the rest of your example code
    ...
}
EDIT to talk specifically about Prepare
I can see from the code that you're preparing the command and then never reusing it.
The idea behind preparing a command is that it costs extra overhead up front, but then each subsequent execution of the command will be more efficient than a non-prepared statement. This is good if you've got a command that you're going to reuse a lot; it's a trade-off between the preparation overhead and the per-execution performance gain.
So in the code you've shown us you are preparing the command (paying all of the overhead) and getting no benefit because you then immediately throw the command away!
I would either recycle the prepared command, or just ditch the call to the prepare statement.
I have no idea why the prepared commands are leaking, but you shouldn't be preparing so many commands in the first place (especially single use commands).
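A rough sketch of the "recycle the prepared command" option (the table, column and helper names here are placeholders of my own, not from the project in the question):
// Hypothetical sketch: prepare once, then execute many times with new parameter values.
using (IDbConnection conn = OpenConnection())      // assumed helper returning an open connection
using (IDbCommand cmd = conn.CreateCommand())
{
    cmd.CommandText = "INSERT INTO results (run_id, value) VALUES (@runId, @value)";

    IDbDataParameter runId = cmd.CreateParameter();
    runId.ParameterName = "@runId";
    runId.DbType = DbType.Int32;
    cmd.Parameters.Add(runId);

    IDbDataParameter value = cmd.CreateParameter();
    value.ParameterName = "@value";
    value.DbType = DbType.Double;
    cmd.Parameters.Add(value);

    cmd.Prepare();                                  // pay the preparation cost once

    foreach (Result r in results)                   // hypothetical collection of model results
    {
        runId.Value = r.RunId;
        value.Value = r.Value;
        cmd.ExecuteNonQuery();                      // reuse the prepared statement for each row
    }
}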
The Prepare() method was designed to make the query run more efficiently. It is entirely up to the provider to implement this. A typical one creates a temporary stored procedure, giving the server an opportunity to pre-parse and optimize the query.
There's a couple of ways code like this could leak memory. One is a typical .NET detail: a practical implementation of an IDbCommand class always has a Dispose() method to release resources explicitly before the finalizer thread does it. I don't see it being used in your snippet. But that's pretty unlikely to be the cause here; it is very hard to consume all memory without ever running the garbage collector. You can tell from Perfmon.exe by observing the performance counters for the garbage collector.
The next candidate is more insidious: you are using a big chunk of native code. Database providers are not that simple. The FOSS kind tends to be designed to let you get the bugs out of them; source code is available for a reason. Perfmon.exe again to diagnose it: seeing the managed heaps staying within bounds while private bytes explode is a dead giveaway.
If you don't feel much like debugging the provider, you could just comment out the statement.

When using auto-generated TableAdapters, what is the suggested way to deal with repeated instantiation?

I am using the .xsd dataset thingies (which I hate) to auto-generate TableAdapter classes for some backend code.
I have not really used these before, tending to favour manual commands and stored procs whenever possible (for various speed-induced reasons: those xsds play hell with dynamic tables and really large numbers of columns), and I am finding myself instantiating a TableAdapter in a large number of my methods, so my question is this:
Will the auto-generated code automatically streamline itself so that a full adapter class is not created on each instantiation, and instead share some static data (such as connection information)? If not, would it be better for me to have some sort of singleton/static class provider that can give me access to their methods when needed, without the overhead of creating a new adapter every time I want to get some information?
Cheers, Ed
If you're concerned about the performance you could always run a benchmark to see what the performance hit, if any, is.
Sorry you didn't find my answer useful.
My point was that while you had received responses they all seemed to be subjective and not based on hard data. So if you had some reason to be concerned that there was a performance hit in your particular application you should measure it.
There is no reason to refactor one area for performance unless there is an actual problem.
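If it helps, a quick-and-dirty benchmark along these lines would show whether instantiation cost matters at all (CustomersTableAdapter is just a stand-in for whichever generated adapter you use):
// Hypothetical micro-benchmark using System.Diagnostics.Stopwatch:
// how expensive is it to new up a TableAdapter?
const int iterations = 10000;
Stopwatch watch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    CustomersTableAdapter adapter = new CustomersTableAdapter(); // replace with your adapter type
    // no Fill/Update calls: we're measuring construction cost only
}
watch.Stop();
Console.WriteLine("Average construction time: {0:F4} ms",
    watch.Elapsed.TotalMilliseconds / iterations);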
I actually tend to instantiate a very low number of adapters (usually only one of each type). I have never tried using them as on-the-stack variables (instantiated when needed), so I never ran into your question, but I understand your concern.
From what I know the adapters themselves may be quite heavyweight to instantiate, but the real killer is the connection. What I do is mark the adapter's Connection modifier as Public in the .xsd designer so I can assign the property whatever connection I need it to use, and maintain a tight grip on the opening and closing of connections:
void Load() {
    using (SqlConnection conn = ...) {
        conn.Open();
        invoicesAdapter.Connection = conn;
        customersAdapter.Connection = conn;
        invoicesAdapter.Fill(dataSet.Invoices);
        customersAdapter.Fill(dataSet.Customers);
    }
}
void Save() {
    using (SqlConnection conn = ...) {
        conn.Open();
        invoicesAdapter.Connection = conn;
        customersAdapter.Connection = conn;
        invoicesAdapter.Update(dataSet);
        customersAdapter.Update(dataSet);
    }
}
I omitted transaction control and error handling for brevity.

How to safely and effectively cache ADO.NET commands?

I want to use this pattern:
SqlCommand com = new SqlCommand(sql, con);
com.CommandType = CommandType.StoredProcedure; //um
com.CommandTimeout = 120;
//com.Connection = con; //EDIT: per suggestions below
SqlParameter par;
par = new SqlParameter("@id", SqlDbType.Int);
par.Direction = ParameterDirection.Input;
com.Parameters.Add(par);
HttpContext.Current.Cache["mycommand"] = com;
Obviously I don't want to run into odd problems like person A retrieving this from the cache and updating param1, person B getting it from the cache and updating param2, and each of them running the command with a blend of the two.
And cloning the command taken out of the cache is likely more expensive than creating a new one from scratch.
How thread safe is the ASP.NET Cache? Am I missing any other potential pitfalls? Would this technique work for parameterless commands despite threading issues?
Clarification: If I want to metaphorically shoot myself in the foot, how do I aim? Is there a way to lock access to objects in the cache so that access is serialized?
Quite simply: don't. If you can't see that this is wrong, you need to read up more on ADO.NET. There is plenty of literature that explains the right way to do it: just create connections and commands when you need them, and make sure you dispose them properly.
The Cache itself is thread-safe, but that doesn't confer thread-safety on the objects that you place within it. The SqlCommand object is not thread-safe and is therefore not the sort of thing you would want to cache.
The most important thing in this scenario is the caching of the connection, which is handled for you and which you should not attempt to look after yourself.
The creation of a command object (even one with many parameters) is still going to be peanuts compared with its execution. Unless you have evidence to the contrary, do not attempt to cache them.
The biggest risk to your project is premature optimisation.
As others have stated, this is just an all around bad idea. There are a number of reasons why it is a bad idea.
More than anything, if you are in a high-load situation, storing the command for each and every user is going to fill up the cache very quickly, and depending on priorities etc. it will start to push out other items that really should still be there.
With ADO.NET you really should be creating, using, then disposing of your commands and connections as you use them. Performance-wise I have never had to change this approach, and I have not really heard of many others that have either.
Also, as others mentioned about your code sample, the connection, which is needed to actually execute the command, would be lost anyway.
Why would you ever cache the command? The overhead of creating a command is minuscule: you're just newing up a couple of objects and setting some properties. I can't ever see that being a bottleneck.
You want to cache the results of the command, as actually executing the command is (relatively) expensive. And, in general, you want to treat a shared cache as read-only so that you don't have to worry about locking and synchronizing access. Caching the results achieves that.
I should have asked how to lock an item in ASP.NET cache, instead of saying what I was intending to put in the cache.
lock (Cache)
{
    // do something with cache that otherwise wouldn't be threadsafe
}
Reference: http://www.codeguru.com/csharp/.net/net_asp/article.php/c5363
Cache your results, and only create the connection (and command) if the result cache is null:
Pseudocode:
result = getResultFromCache(cacheKey)
if (result == null)
{
    result = getResultFromDB();
    insertIntoCache(result, cacheKey);
}
return result;
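Made concrete, that pattern might look roughly like this in ASP.NET (User, GetUsersFromDatabase and the cache key are assumptions for the sake of the example; the data-access call is expected to open and close its own connection):
// Hypothetical cache-aside sketch: cache the query results, never the command.
public List<User> GetUsers()
{
    const string cacheKey = "users-list";

    List<User> users = HttpContext.Current.Cache[cacheKey] as List<User>;
    if (users == null)
    {
        // Cache miss: hit the database, then cache the results for a short time.
        users = GetUsersFromDatabase();
        HttpContext.Current.Cache.Insert(
            cacheKey,
            users,
            null,                                          // no cache dependency
            DateTime.Now.AddMinutes(5),                    // absolute expiration
            System.Web.Caching.Cache.NoSlidingExpiration); // no sliding expiration
    }
    return users;
}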

How to manage SQL Connections with a utility class?

We have a SQL utility class that takes the name of a stored procedure and its input parameters, and returns the results in a DataTable. The reasoning behind this is that we don't have to worry about forgetting to close connections and causing connection leaks. It also reduces code, because we don't have to recreate data adapters and data readers in our data access layers.
The problem I have with this is that we're populating a DataTable just so we can loop through it to create our objects, so we're basically using it like a DataReader. I've read about classes that will return a DataReader or DataAdapter. But the problem with those is that either the client has to open and close connections, or you have to close the connection in a Finalize method. It seems that you wouldn't want garbage collection to be responsible for closing your database connections.
To sum up, we want a class so that we can reduce code by not having to create DataReaders for every query, and so that we can ensure database connections are closed.
What is the best way of handling this?
UPDATE: Still thinking about this, but so far it seems that the best practice is still to return a DataReader, use CommandBehavior.CloseConnection, and then trust whoever uses the class to call dr.Close()?
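If you do go that route, the contract with callers is easiest to keep when they wrap the reader in a using block; because the reader was created with CommandBehavior.CloseConnection, disposing it also closes the underlying connection. A sketch of the caller side (SqlHelper is a placeholder name for whatever class hosts your utility method):
// Hypothetical caller-side usage of a utility method that returns a reader
// created with CommandBehavior.CloseConnection.
using (SqlDataReader dr = SqlHelper.ExecuteReader("GetUsers", new SqlParameter[0]))
{
    while (dr.Read())
    {
        // materialise objects from the current row
        string name = dr.GetString(0);
    }
} // disposing the reader closes both the reader and the connection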
Have you considered the Microsoft Enterprise Library?
public List<User> GetUsers()
{
    List<User> result = new List<User>();
    Database db = new Microsoft.Practices.EnterpriseLibrary.Data.Sql.SqlDatabase(this.connectionString);
    DbCommand cmd = db.GetStoredProcCommand("GetUsers");
    using (IDataReader rdr = db.ExecuteReader(cmd))
    {
        while (rdr.Read())
        {
            User user = new User();
            FillUser(rdr, user);
            result.Add(user);
        }
    }
    return result;
}
We use something like this and it performs very well under high volume.
public SqlDataReader ExecuteReader(string command, SqlParameter[] parameters)
{
    // The connection is deliberately not wrapped in a using block here:
    // CommandBehavior.CloseConnection closes it when the caller disposes the reader.
    SqlConnection conn = new SqlConnection();
    try
    {
        conn.Open();
        using (SqlCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText = command;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddRange(parameters);
            return cmd.ExecuteReader(CommandBehavior.CloseConnection);
        }
    }
    catch
    {
        conn.Dispose(); // only dispose on failure; otherwise the reader owns the connection
        throw;
    }
}
DataTables are not considered best practice for several reasons including their bloat and lack of type safety.
I have the same structure - utility classes with methods that fetch the data and return filled DataTables (or fill/update a DataTable passed in to them) - for exactly the same reasons: keeping the database connections separate from the rest of the code and ensuring they are opened only when required and closed asap. Especially since the data is stored in various back-end systems, and I want to present only one interface to my application and not have it worry about the details.
There is one difference to your situation: We don't (in general) create objects from the rows in the DataTables, but rather work directly on the data in the rows. I find working with DataTables simple and efficient.
Other than that, I personally don't see anything wrong with this approach and find that it works very well for our purposes.
Returning a DataReader doesn't work in a lot of scenarios. In a lot of places, direct connections to the database from the client machine are not allowed in production (for good reason), so you have to serialize the objects you are retrieving. I can think of designs that would allow you to persist a DataReader in whatever class you use for remoting/serialization on the server side, but returning items across http or net.tcp in row-by-agonizing-row fashion likely does not offer much benefit.
Are you serializing these objects? If so, your choices boil down to DataTable, DataSet, or custom objects. Custom objects, if written well, perform and serialize the best, but you have to write concurrency handling in addition to a bunch of other functionality.
IMO, since ADO.NET 2.0, DataTables can perform well even in large-scale remoting situations. They provide a special binary remoting format and are simple to work with. Throw in some compression and you're not even using a lot of bandwidth for your large data sets.
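For reference, that binary remoting format is opt-in; a small sketch of turning it on (the table name is arbitrary, and the setting only takes effect when the object is serialized with a binary formatter):
// Since ADO.NET 2.0, DataTables and DataSets can be serialized in a compact
// binary format instead of XML.
DataTable table = new DataTable("Results");
table.RemotingFormat = SerializationFormat.Binary;

DataSet ds = new DataSet();
ds.RemotingFormat = SerializationFormat.Binary;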
Well, if you plan to use this class inside web pages, you can register the utility class with the page's Unload event. In the event sink you can write your logic to close the database connection. Check out this tip on CodeProject for more ideas.
However, this solution won't work for use inside web methods (web services). I suppose you'd have to adapt the technique for web service use. The last line in your web method should be an event call. So when you write your web service class, define an event called WebMethodCompleted. You'd probably get a reference to the instance of the web service via the technique mentioned in the article. Once you have a reference you can register the event in your utility class. Just remember to invoke the event in the web method.
Happy programming.
