writing SQL queries - methods to write complicated ones! - c#

Hey guys, does anyone have any suggestions that might help with the following?
I am rewriting some software that I originally built as a prototype for where I work, turning it into a more OOP-compliant program :)
I have just written a custom database handler class to deal with my connections, my queries etc. The idea is that this database handler does everything needed to deal with the DB and only returns the result set of the query being run.
Anyways, I have just written a few methods which write my SQL queries for me - the idea being that I pass it some arguments in the form of an Array and the class writes the SQL String needed to query, which removes SQL injection problems.
The problem I have is this: normal selects (with where arguments and order/group by), inserts and updates all work fine. But I may want to pass a query that has a join, a multi-table join, a where-clause that contains a LIKE, or a sub-select in the where (this last one might be doable by running the select method twice!).
I can't work out how to get the method to produce these queries. Does anyone have any suggestions? - Might have to build custom ones where there is no way around not writing the query myself.
The other thought is that I am overcomplicating things and should just perform a call that removes slashes contained in the passed string.
Thanks in advance,
btw if it doesn't make much sense, I've been coding since 7 this morning, brain dying slowly! :)

I would encourage you to consider writing stored procedures for the functionality you need rather than trying to write some kind of generic query building mechanism.

You could just use SqlCommand with parameters instead: http://www.csharp-station.com/Tutorials/AdoDotNet/Lesson06.aspx
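For the LIKE case raised in the question, parameters still work: the SQL text stays constant and the search term travels as a parameter value. Here is a minimal sketch, assuming T-SQL LIKE syntax with an ESCAPE clause. The class LikePattern, the method Contains, and the Orders/Customers tables are hypothetical names made up for illustration.

```csharp
using System;
using System.Text;

public static class LikePattern
{
    // The SQL text is a constant, join included; only the parameter value varies.
    // Table and column names are illustrative assumptions.
    public const string Sql =
        "SELECT o.Id, c.Name FROM Orders o " +
        "JOIN Customers c ON c.Id = o.CustomerId " +
        "WHERE c.Name LIKE @name ESCAPE '\\'";

    // Escapes the LIKE wildcards in user input, then wraps the term in % ... %.
    // Quotes are left alone because the value is sent as a parameter, not as SQL text.
    public static string Contains(string userInput)
    {
        if (userInput == null) throw new ArgumentNullException(nameof(userInput));
        var sb = new StringBuilder("%");
        foreach (char c in userInput)
        {
            // Backslash is the escape character named in the ESCAPE clause,
            // so it has to be escaped along with %, _ and [.
            if (c == '\\' || c == '%' || c == '_' || c == '[') sb.Append('\\');
            sb.Append(c);
        }
        return sb.Append('%').ToString();
    }
}
```

With SqlCommand you would then set the command text to LikePattern.Sql and call cmd.Parameters.AddWithValue("@name", LikePattern.Contains(term)).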

If you need a complex query-building mechanism, consider one of the many ORM frameworks already developed, such as NHibernate or MSEF. These allow you to create some pretty complex queries using Linq (compiler-checked; gotta love it) that then are translated to SQL.

If you have the option on the table you could look into doing a LINQ to SQL datalayer. That will let you work with your tables and query results as classes.
It's very easy to get started making a .dbml of your database; check out a walkthrough on MSDN.
It also leaves the dynamic SQL authoring to the MS Language team, so that's nice.


How to escape a string for a SQL query when you really CAN'T use parameters

I've seen this question more than a few times obviously but haven't seen a good example of when parameterized queries truly aren't an option…but I think I have one.
I'm working with the Cisco Call Manager AXL API. Its backend is an Informix DB. Usually, and whenever possible, I use the provided SOAP methods to get results. Since I'm using a WSDL-created interface class and passing parameters in actual object properties, the SOAP libraries take care of any necessary escaping.
There are a few things I have to use direct SQL calls against the DB for, and the API provides a method where you can pass in an SQL query (as a string) and get back rows of results. Unfortunately this method doesn't provide any facility for parameterized queries. So, yes I am actually required to do my own escaping.
Well then, of course I could make my own regex, but A: I could easily miss something, and B: Really? There's not a utility class for this? Can I somehow use the SQL parameterization engine to spit back the escaped query? Obviously I know you have to deal with ', but I've read about the backspace-character injection method and I'm sure there are others that I don't yet know about…surely someone else has already written a pretty secure version?
I'm interested in solutions that use off-the-shelf libraries, preferably a built-in one.
If I have to write my own, I can use the examples in the link above and elsewhere, but I really don't want to write my own, so let's try to refrain from telling me how to do that.
No, I can't connect directly to the Informix DB and use an Informix driver with parameterized query support. That would be a good answer, but it's ruled out in this scenario.
You may be able to get away with using the EscapeSequence class from Microsoft.SqlServer.Management.SqlParser.dll if the MsSql escaping is close enough to what your database back end uses.
You can find more information about it here. http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.management.sqlparser.parser.escapesequence.aspx
The ADO.NET abstraction does not process quotes, it simply passes them to the underlying provider. So if there is an off-the-shelf library, it would be an Informix DB-specific one, but I doubt you'll find one for .NET since everyone's happy with ADO.NET or an even higher abstraction.
Even PHP, known for its mysql_real_escape_string function, does not seem to have an equivalent for Informix DB.
I'm tempted to try to help solve your problem, but given your scope that's all I can say.

How is using Entity + LINQ not just essentially hard coding my queries?

So I've been developing with Entity + LINQ for a bit now and I'm really starting to wonder about best practices. I'm used to the model of "if I need to get data, reference a stored procedure". Stored procedures can be changed on the fly if needed and don't require code recompiling. I'm finding that my queries in my code are looking like this:
List<int> intList = (from query in context.DBTable
where query.ForeignKeyId == fkIdToSearchFor
select query.ID).ToList();
and I'm starting to wonder what the difference is between that and this:
List<int> intList = SomeMgrThatDoesSQLExecute.GetResults(
    string.Format("SELECT [ID] FROM DBTable WHERE ForeignKeyId = {0}",
        fkIdToSearchFor));
My concern is that I'm essentially hard coding the query into the code. Am I missing something? Is that the point of Entity? If I need to do any real query work should I put it in a sproc?
The power of Linq doesn't really make itself apparent until you need more complex queries.
In your example, think about what you would need to do if you wanted to apply a filter of some form to your query. Using the string built SQL you would have to append values into a string builder, protected against SQL injection (or go through the additional effort of preparing the statement with parameters).
Let's say you wanted to add a statement to your LINQ query:
IQueryable<Table> results = from query in context.Table
                            where query.ForeignKeyId == fldToSearchFor
                            select query;
You can take that and make it:
results = results.Where(r => r.Value > 5);
The resulting SQL would look like:
SELECT * FROM Table WHERE ForeignKeyId = fldToSearchFor AND Value > 5
Until you enumerate the result set, any extra conditions you want to decorate will get added in for you, resulting in a much more flexible and powerful way to make dynamic queries. I use a method like this to provide filters on a list.
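The filter-on-a-list idea can be sketched roughly as follows. This is only an illustration: Row, RowQueries and SampleTable are made-up names, and an in-memory list stands in for context.Table so the code runs without a database. Against a real LINQ provider the same composition would be translated into a single SQL statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, shaped like the table in the example above.
public class Row
{
    public int ID { get; set; }
    public int ForeignKeyId { get; set; }
    public int Value { get; set; }
}

public static class RowQueries
{
    // Adds only the filters the caller asked for. Nothing is executed here;
    // the query runs when the caller enumerates it (ToList, foreach, Count...).
    public static IQueryable<Row> Filter(IQueryable<Row> source, int foreignKeyId, int? minValue)
    {
        IQueryable<Row> results = source.Where(r => r.ForeignKeyId == foreignKeyId);
        if (minValue.HasValue)
        {
            int min = minValue.Value;
            results = results.Where(r => r.Value > min);
        }
        return results;
    }

    // In-memory stand-in for context.Table (an assumption for this sketch).
    public static IQueryable<Row> SampleTable()
    {
        return new List<Row>
        {
            new Row { ID = 1, ForeignKeyId = 7, Value = 3 },
            new Row { ID = 2, ForeignKeyId = 7, Value = 9 },
            new Row { ID = 3, ForeignKeyId = 8, Value = 9 },
        }.AsQueryable();
    }
}
```

Calling RowQueries.Filter(RowQueries.SampleTable(), 7, 5).ToList() applies both conditions; passing null for minValue leaves the second condition out of the query entirely.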
I personally avoid hard-coding SQL requests (as in your second example). Writing LINQ instead of actual SQL gives you:
ease of use (Intellisense, type checking...)
the power of the LINQ language (which is usually simpler than SQL once there is some complexity, multiple joins, etc.)
the power of anonymous types
errors visible right now at compile time, not at runtime two months later...
better refactoring if you want to rename a table/column/... (you won't forget to rename anything with LINQ because of compile-time checks)
loose coupling between your requests and your database (what if you move from Oracle to SQL Server? With LINQ you won't change your code; with hard-coded requests you'll have to review all of your requests)
LINQ vs stored procedures: you put the logic in your code, not in your database. See discussion here.
"if I need to get data, reference a stored procedure. Stored procedures can be changed on the fly if needed and don't require code recompiling"
-> If you need to update your model, you'll probably also have to update your code to take the update of the DB into account. So I don't think it'll help you avoid a recompilation most of the time.
Is LINQ hard-coding all your queries into your application? Yes, absolutely.
Let's consider what this means to your application.
If you want to make a change to how you obtain some data, you must make a change to your compiled code; you can't make a "hotfix" to your database.
But, if you're changing a query because of a change in your data model, you're probably going to have to change your domain model to accommodate the change.
Let's assume your model hasn't changed and your query is changing because you need to supply more information to the query to get the right result. This kind of change most certainly requires that you change your application to allow the use of the new parameter to add additional filtering to the query.
Again, let's assume you're happy to use a default value for the new parameter and the application doesn't need to specify it. The query might include an additional field as part of the result. You don't have to consume this additional field though, and you can ignore the additional information being sent over the wire. This does introduce a bit of a maintenance problem, in that your SQL is out of step with your application's interpretation of it.
In this very specific scenario where you either aren't making an outward change to the query, or your application ignores the changes, you gain the ability to deploy your SQL-only change without having to touch the application or bring it down for any amount of time (or if you're into desktops, deploy a new version).
Realistically, when it comes to making changes to a system, the majority of your time is going to be spent designing and testing your queries, not deploying them (and if it isn't, then you're in a scary place). The benefit of having your query in LINQ is how much easier it is to write and test them in isolation of other factors, as unit tests or part of other processes.
The only real reason to use Stored Procedures over LINQ is if you want to share your database between several systems using a consistent API at the SQL-layer. It's a pretty horrid situation, and I would prefer to develop a service-layer over the top of the SQL database to get away from this design.
Yes, if you're good at SQL, you can get all that with stored procs, and benefit from better performance and some maintenance benefits.
On the other hand, LINQ is type-safe, slightly easier to use (as developers are accustomed to it from non-db scenarios), and can be used with different providers (it can translate to provider-specific code). Anything that implements IQueryable can be used the same way with LINQ.
Additionally, you can pass partially constructed queries around, and they will be lazily evaluated only when needed.
So, yes, you are hard coding them, but, essentially, it's your program's logic, and it's hard coded like any other part of your source code.
I also wondered about that, but the mentality that database interactivity belongs only in the database tier is an outmoded concept. Having a dedicated DBA churn out stored procedures while multiple front-end developers wait is truly an inefficient use of development dollars.
Java shops hold to this paradigm out of necessity, because they do not have a tool such as LINQ. Shops using databases with .NET find that the need for a database person is lessened, if not fully removed, because once the structure is in place the data can be manipulated more efficiently (as shown by all the other responders to this post) in the data layer using LINQ than with raw SQL.
If one has a chance to use code first and create a database model from the ground up, by originating the entity classes in code and then running EF to do the magic of actually creating the database, one understands that the database is really just a tool that stores data, and that stored procedures and a dedicated DBA are not needed.

Best option for dynamic queries?

I'm working on porting an old application from WebForms to MVC, and part of that process is tearing out the existing data layer and moving the logic from stored procedures to code. As I had initially only worked with basic C# SQL functions (System.Data.SqlClient), I went with a lightweight pseudo-ORM (PetaPoco), which just takes a SQL statement as a string and executes it. Building dynamic queries would work about the same as in SQL: lots of conditionals that add and remove additional code (the average query has ~30 filters).
So after looking around a bit, I found some choices:
A bunch of strings and conditionals that add bits of the query as they are needed. Really nasty, especially when queries get complex, and not something I want to pursue if a better solution exists.
A bunch of conditionals using L2E. Looks more elegant, but when I tested L2E I found it too bloated, and in general it was an awful experience. Could I do the same thing in L2S? If so, is L2S going to stick around for the next 5-10 years?
Use a PredicateBuilder. Still looking into this, same questions regarding L2S.
EDIT: I can also just stick to the existing stored procedure model, but I have to rewrite them anyway, so it can't hurt to look at other options as I'm still going to have to do the leg work.
Are there any other options out there? Can anyone weigh in with some experience on any of the mentioned methods - mainly, did the method you choose make you want to build a time machine and kill past you for implementing it?
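For reference, the PredicateBuilder idea amounts to combining small filter expressions into one. Below is a minimal hand-rolled sketch of that technique using only System.Linq.Expressions. It is not the actual PredicateBuilder library code, and PredicateSketch and ParameterSwap are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateSketch
{
    // Rewrites one lambda's parameter so both bodies share a single parameter.
    private sealed class ParameterSwap : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterSwap(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }

    // Combines two predicates with a logical AND into one expression tree,
    // which a LINQ provider can then translate as a single WHERE clause.
    public static Expression<Func<T, bool>> And<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        Expression rightBody =
            new ParameterSwap(right.Parameters[0], left.Parameters[0]).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, rightBody), left.Parameters);
    }
}
```

With ~30 optional filters, you would start from a predicate such as x => true and call And once for each filter the user actually filled in, then pass the result to a single Where.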
I'd look at LLBLGen. The code that it generates is quite good and customizable. They also provide a robust linq provider which may help with your queries. I used it for a couple large projects and was quite happy.
In my opinion, neither L2S nor L2E can generate efficient SQL code, especially when it comes to complex queries. Even in some relatively simple cases generating queries via either of the two methods would yield inefficient SQL code, here's an example: Why does this additional join increase # of queries?
That being said, if you're using SQL Server, L2S is the better option: L2E is meant to handle any database, and because of that it will generate inefficient SQL code. Another point to keep in mind is that neither L2S nor L2E will leverage tempdb, i.e. generate temp tables, table variables or CTEs.
I would re-write the stored procedures, optimizing them as much as possible, and use L2S/L2E for simple queries, that would generate one round-trip (this should be as low as possible) to the server, and also ensure that the execution plan SQL Server uses is the most efficient (i.e. uses indexes etc).
Not really an answer, but too long for a comment:
I have built a mid-sized web app using the 'concatenate pieces of SQL' method, and am currently in the process of doing a similar job but using L2E.
I found that with some self-control, the concatenate-pieces-of-SQL method is not that bad. Of course, use parameterized queries; don't try to stick user input into the SQL directly.
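As a rough illustration of what "concatenate pieces, but parameterize" can look like, here is a small sketch. FilterBuilder and AddIf are hypothetical names, not taken from any library. The column and operator strings are meant to come from your own code, while user-supplied values only ever end up in the parameter dictionary.

```csharp
using System;
using System.Collections.Generic;

public sealed class FilterBuilder
{
    private readonly List<string> _clauses = new List<string>();

    // Parameter name -> value, to be copied onto the command before executing it.
    public Dictionary<string, object> Parameters { get; } = new Dictionary<string, object>();

    // Appends "column op @pN" only when the filter is active.
    // column and op must be literals from code, never user input.
    public FilterBuilder AddIf(bool condition, string column, string op, object value)
    {
        if (!condition) return this;
        string name = "@p" + Parameters.Count;
        Parameters.Add(name, value);
        _clauses.Add(column + " " + op + " " + name);
        return this;
    }

    // Joins the active clauses with AND; with no active filters the WHERE is omitted.
    public string Build(string selectFrom)
    {
        if (_clauses.Count == 0) return selectFrom;
        return selectFrom + " WHERE " + string.Join(" AND ", _clauses);
    }
}
```

Inactive filters simply never make it into the SQL text, so there are no "don't care" branches in the final statement.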
I have been slowly growing an appreciation for the L2E method though. It gives you type safety, though you do have to do some things "backwards" from how you might do it with SQL -- such as WHERE X IN (...) constructs. But so far I haven't hit anything that L2E can't handle.
I feel like the L2E approach would be a little easier to maintain if other people were to be heavily involved.
Do you have actual use cases where the "bloat" of L2E is a problem? Or is it just a general sense of malaise where you feel the framework is doing too much behind the scenes?
I definitely had that feeling at first (ok, still do), and certainly don't like reading the generated SQL (esp. compared to my handwritten SQL from the previous project), but so far have found L2E pretty good with regard to only hitting the DB when it is actually necessary.
Another concern is what DB you're using, and how up-to-date its L2E bindings are. If you're using SQL Server, then no problem. MySql might be more flaky though. A chunk of L2E's slickness comes from its nice integration with VStudio, and VStudio's ability to build entity models from your DB automagically. Not sure how good the support is for non-MS DB backends.

Methods of pulling data from a database

I'm getting ready to start a C# web application project and just wanted some opinions regarding pulling data from a database. As far as I can tell, I can either use C# code to access the database from the code behind (i.e. LINQ) of my web app or I can call a stored procedure that will collect all the data and then read it with a few lines of code in my code behind. I'm curious to know which of these two approaches, or any other approach, would be the most efficient, elegant, future proof and easiest to test.
The most future proof way to write your application would be to have an abstraction between you and your database. To do that you would want to use an ORM of some sort. I would recommend using either NHibernate or Entity Framework.
This would give you the advantage of only having to write your queries once instead of multiple times (for example, if you decide to change your database, moving from MSSQL to MySQL or vice versa). It also gives you the advantage of having all of your data in objects, which are much easier to work with than raw ADO DataTables or DataReaders.
Most developers like to introduce at least one layer between the code behind and the Database.
Additionally, there are many data access strategies that people use within that layer: ADO.NET, Entity Framework, Enterprise Library, NHibernate, LINQ, etc.
In all of those you can use SQL Queries or Stored Procedures. I prefer Stored Procedures because they are easy for me to write and deploy. Others prefer to use Parameterized queries.
When you have so many options, it's usually indicative that there really isn't a clear winner. This means you can probably just pick a direction and go with it and you'll be fine.
But you really shouldn't use non-parameterized queries, and you shouldn't do it in the code behind but instead in separate classes.
Using LINQ to SQL to access your data is probably the worst choice right now. Microsoft has said that they will no longer be improving LINQ to SQL in favor of Entity Framework. Also, you can use LINQ with your EF if you should choose to go that route.
I would recommend using an ORM like NHibernate or Entity Framework instead of a sproc/ADO approach. Between the two ORMs, I would probably suggest EF for you, since you are just getting the hang of this. EF isn't QUITE as powerful as NHibernate, but it has a shorter learning curve and is pretty robust.

Can we convert all SQL scripts to Linq-to-SQL expressions or there is any limitation?

I want to convert all of my db stored procedures to LINQ to SQL expressions. Are there any limitations to doing this? Note that there are some complicated queries in my db.
Several features of SQL Server are not supported by Linq to SQL:
Batch updates (unless you use non-standard extensions);
Table-Valued Parameters;
CLR types, including spatial types and hierarchyid;
DDL statements (I'm thinking specifically of table variables and temporary tables);
The OUTPUT INTO clause;
The MERGE statement;
Recursive Common Table Expressions, i.e. hierarchical queries on a nested set;
Optimized paging queries using SET ROWCOUNT (ROW_NUMBER is not the most efficient);
Certain windowing functions like DENSE_RANK and NTILE;
Cursors - although these should obviously be avoided, sometimes you really do need them;
Analytical queries using ROLLUP, CUBE, COMPUTE, etc.
Statistical aggregates such as STDEV, VAR, etc.
PIVOT and UNPIVOT queries;
XML columns and integrated XPath;
...and so on...
With some of these things you could technically write your own extension methods, parse the expression trees and actually generate the correct SQL, but that won't work for all of the above, and even when it is a viable option, it will often simply be easier to write the SQL and invoke the command or stored procedure. There's a reason that the DataContext gives you the ExecuteCommand, ExecuteQuery and ExecuteMethodCall methods.
As I've stated in the past, ORMs such as Linq to SQL are great tools, but they are not silver bullets. I've found that for larger, database-heavy projects, L2S can typically handle about 95% of the tasks, but for that other 5% you need to write UDFs or Stored Procedures, and sometimes even bypass the DataContext altogether (object tracking does not play nice with server triggers).
For smaller/simpler projects it is highly probable that you could do everything in Linq to SQL. Whether or not you should is a different question entirely, and one that I'm not going to try to answer here.
I've found that in almost all cases where I've done a new project with L2S, I've completely removed the need for stored procedures. In fact, many of the cases where I would have been forced to use a stored proc, multivariable filters for instance, I've found that by building the query dynamically in LINQ, I've actually gotten better queries in the vast majority of cases since I don't need to include those parts of the query that get translated to "don't care" in the stored proc. So, from my perspective, yes -- you should be able to translate your stored procs to LINQ.
A better question, though, might be: should you translate your stored procs to LINQ? The answer to that, I think, depends on the state of the project, your relative expertise with C#/VB and LINQ vs SQL, the size of the conversion, etc. On an existing project I'd only make the effort if it improves the maintainability or extensibility of the code base, or if I was making significant changes and the new code would benefit. In the latter case you may choose to incrementally move your code to pure LINQ as you touch it to make changes. You can use stored procs with LINQ, so you may not need to change them to make use of LINQ.
I'm not a fan of this approach. This is a major architectural change, because you are now removing a major interface layer you previously put in place to gain a decoupling advantage.
With stored procedures, you have already chosen the interface your database exposes. You will now need to grant users SELECT privileges on all the underlying tables/views instead of EXECUTE on just the application stored procedures and potentially you will need to restrict column read rights at the column level in the tables/views. Now you will need to re-implement at a lower level every explicit underlying table/view/column rights which your stored procedure was previously implementing with a single implicit EXECUTE right.
Whereas before the services expected from the database could be enumerated by an appropriate inventory of stored procedures, now the potential database operations are limited to the exposed tables/views/columns, vastly increasing the coupling and potential for difficulty in estimating scope changes for database refactorings and feature implementations.
Unless there are specific cases where the stored procedure interface is difficult to create/maintain, I see little benefit of changing a working SP-based architecture en masse. In cases where LINQ generates a better implementation because of application-level data coupling (for instance joining native collections to database), it can be appropriate. Even then, you might want to LINQ to the stored procedure on the database side.
If you chose LINQ from the start, you would obviously have done a certain amount of work up front in determining column/view/table permissions and limiting the scope of application code affecting database implementation details.
What does this mean? Does this mean you want to use L2S to call your stored procedures, or do you want to convert all the T-SQL statements in your stored procs to L2S? If it's the latter, you should not have too many problems doing this. Most T-SQL statements can be represented in LINQ without problem.
I might suggest you investigate a tool like Linqer to help you with your T-SQL conversion. It will convert almost any T-SQL statement into LINQ. It has saved me quite a bit of time in converting some of my queries.
There are many constructs in T-SQL which have no parallel in LINQ to SQL. Starting with flow control, ability to return multiple row sets, recursive queries.
You will need to approach this on a case-by-case basis. Remember that wherever the SP does significant filtering work on the database, much of that filtering may end up on the client, which means moving far more data from server to client.
If you already have tested and working stored procedures, why convert them at all? That's just making work for no reason.
If you were starting a new product from scratch and were wondering whether to use stored procedures or not, that would be an entirely different question.

