Better way to store long SQL strings in C#

I have searched around for a while for this and have not found anything.
I am storing some pretty long SQL select strings (a shorter one like this:)
string mySelectQuery = "select distribution_stop_information.unique_id_no as stop_unique_id_no,distribution_line_items.unique_id_no as line_unique_id_no, stop_name, stop_address,route_code AS RouteCode, customer_reference," +
"distribution_line_items.datetime_created, rma_number from distribution_stop_information join distribution_line_items on " +
"distribution_line_items.unique_id_no = distribution_stop_information.unique_id_no " +
"where distribution_line_items.datetime_created > '2/22/2017' and customer_no = '91000'";
then passing them by
var Ourstops = (List<stop_data>)db.Query<stop_data>(mySelectQuery);
This is cumbersome and produces hard to read/debug code.
What are some better ways of doing this?
Just a point of clarification: on this project I am not allowed to create any sprocs. It is strictly query-only, and it uses PostgreSQL (not that that matters much here).

This is my preferred formatting, just one guy's opinion:
string mySelectQuery = @"
select
distribution_stop_information.unique_id_no as stop_unique_id_no
,distribution_line_items.unique_id_no as line_unique_id_no, stop_name
,stop_address,route_code AS RouteCode, customer_reference
,distribution_line_items.datetime_created, rma_number
from
distribution_stop_information
join distribution_line_items on distribution_line_items.unique_id_no = distribution_stop_information.unique_id_no
where
distribution_line_items.datetime_created > '2/22/2017' and customer_no = '91000'
";
Benefits
Copy and paste right into sql management studio
Using the @ (verbatim literal) eliminates all the quotes and concatenations
Easy to also use $ for string interpolation
commas in front make commenting lines out easier
But do be mindful of SQL Injection and use parameters as much as possible- which is pretty much always. (Edit from comments)
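A minimal sketch of how the verbatim literal combines with parameters, using only the `System.Data` interfaces. The names `StopQueries`, `BuildStopsCommand`, `@since` and `@customerNo` are assumptions made for illustration, and the placeholder syntax your provider expects may differ.

```csharp
using System;
using System.Data;

public static class StopQueries
{
    // Verbatim literal: can be pasted into a SQL tool once the parameters are declared there.
    // The two literal values from the question are replaced by named placeholders.
    public const string StopsSince = @"
select
    distribution_stop_information.unique_id_no as stop_unique_id_no
    ,distribution_line_items.unique_id_no as line_unique_id_no
    ,stop_name
    ,stop_address
from
    distribution_stop_information
    join distribution_line_items on distribution_line_items.unique_id_no = distribution_stop_information.unique_id_no
where
    distribution_line_items.datetime_created > @since
    and customer_no = @customerNo
";

    // Hypothetical helper: builds a command on any ADO.NET connection.
    public static IDbCommand BuildStopsCommand(IDbConnection connection, DateTime since, string customerNo)
    {
        IDbCommand command = connection.CreateCommand();
        command.CommandText = StopsSince;
        AddParameter(command, "@since", since);
        AddParameter(command, "@customerNo", customerNo);
        return command;
    }

    private static void AddParameter(IDbCommand command, string name, object value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```

If you already use a micro ORM with a `Query<T>(sql, parameters)` style call, the same constant can be passed to it with an anonymous object instead of building the command by hand.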

Stored procs or views (SQL specific)
Resources
Use of @ to allow line breaks
Configuration files
Content management system (if you have one, and even then not too sure about that)
Entity Framework so you don't have SQL (I'm not a fan BTW. I'd go SPs)

My favourite question! Like you perhaps, I find SQL in string literals the weirdest thing in programming. So weird that I went off and wrote QueryFirst, a visual studio extension for working intelligently with SQL. Your SQL lives in a .sql file, like god intended. You edit it with the marvellous TSQL editor, connected to the database with intellisense for tables and columns and syntax validation. Every time you save the file, QueryFirst checks that the query runs, then (re)generates the C# wrapper that lets you use it. Behind the scenes, QueryFirst compiles your SQL into the binary, and accesses it with GetManifestResourceStream. All your data access is continually integration tested and working, and you have no more SQL hanging around in string literals.
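For readers who want the "SQL in a .sql file, compiled into the binary" part without a tool, here is a rough sketch of reading an embedded resource by hand. This is not QueryFirst's generated code; the class name `EmbeddedSql` and the resource name in the comment are hypothetical, and the .sql file is assumed to be marked as an embedded resource in the project.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedSql
{
    // resourceName is hypothetical, e.g. "MyApp.Queries.GetStops.sql"
    // (default namespace + folder path + file name).
    public static string Load(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            // GetManifestResourceStream returns null when the name does not match.
            if (stream == null)
            {
                throw new FileNotFoundException("Embedded SQL resource not found: " + resourceName);
            }

            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling `assembly.GetManifestResourceNames()` is a quick way to find the exact name when the lookup fails.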

Have you considered getting rid of the strings by using a query builder? There are heaps around eg. https://github.com/cdroulers/awesome-sql-builder.
I have never used it so cannot comment on it specifically.
And even if you don't want to go the heavy EF ORM route you might also look at something like https://github.com/StackExchange/Dapper and avoid constructing SQL by hand.
Aside from the potential security risk, the hassle of getting the quoting right makes a micro ORM worth the time and effort.

Related

Is it safe to not parameterize an SQL query when the parameter is not a string?

In terms of SQL injection, I completely understand the necessity to parameterize a string parameter; that's one of the oldest tricks in the book. But when can it be justified to not parameterize an SqlCommand? Are any data types considered "safe" to not parameterize?
For example: I don't consider myself anywhere near an expert in SQL, but I can't think of any cases where it would be potentially vulnerable to SQL injection to accept a bool or an int and just concatenate it right into the query.
Is my assumption correct, or could that potentially leave a huge security vulnerability in my program?
For clarification, this question is tagged c# which is a strongly-typed language; when I say "parameter," think something like public int Query(int id).
I think it's safe... technically, but it's a terrible habit to get into. Do you really want to be writing queries like this?
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE IsAlive = " + isAlive +
    " AND FirstName = @firstName");
sqlCommand.Parameters.AddWithValue("firstName", "Rob");
It also leaves you vulnerable in the situation where a type changes from an integer to a string (Think employee number which, despite its name - may contain letters).
So, we've changed the type of EmployeeNumber from int to string, but forgot to update our sql queries. Oops.
When using a strongly-typed platform on a computer you control (like a web server), you can prevent code injection for queries with only bool, DateTime, or int (and other numeric) values. What is a concern are performance issues caused by forcing sql server to re-compile every query, and by preventing it from getting good statistics on what queries are run with what frequency (which hurts cache management).
But that "on a computer you control" part is important, because otherwise a user can change the behavior used by the system for generating strings from those values to include arbitrary text.
I also like to think long-term. What happens when today's old-and-busted strongly-typed code base gets ported via automatic translation to the new-hotness dynamic language, and you suddenly lose the type checking, but don't have all the right unit tests yet for the dynamic code?
Really, there's no good reason not to use query parameters for these values. It's the right way to go about this. Go ahead and hard-code values into the sql string when they really are constants, but otherwise, why not just use a parameter? It's not like it's hard.
Ultimately, I wouldn't call this a bug, per se, but I would call it a smell: something that falls just short of a bug by itself, but is a strong indication that bugs are nearby, or will be eventually. Good code avoids leaving smells, and any good static analysis tool will flag this.
I'll add that this is not, unfortunately, the kind of argument you can win straight up. It sounds like a situation where being "right" is no longer enough, and stepping on your co-workers' toes to fix this issue on your own isn't likely to promote good team dynamics; it could ultimately hurt more than it helps. A better approach in this case may be to promote the use of a static analysis tool. That would give legitimacy and credibility to efforts aimed at going back and fixing existing code.
In some cases, it IS possible to perform SQL injection attack with non-parametrized (concatenated) variables other than string values - see this article by Jon: http://codeblog.jonskeet.uk/2014/08/08/the-bobbytables-culture/ .
Thing is that when ToString is called, some custom culture provider can transform a non-string parameter into its string representation which injects some SQL into the query.
This is not safe even for non-string types. Always use parameters. Period.
Consider the following code example:
var utcNow = DateTime.UtcNow;
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE created_on <= '" + utcNow + "'");
At first glance the code looks safe, but everything changes if you alter the Windows Regional Settings and put an injection into the short date format:
Now the resulting command text looks like this:
SELECT * FROM People WHERE created_on <= '26.09.2015' OR '1'<>' 21:21:43'
The same can be done for the int type, as a user can define a custom negative sign that can easily be turned into an SQL injection.
One could argue that invariant culture should be used instead of current culture, but I have seen string concatenations like this so many times and it is quite easy to miss when concatenating strings with objects using +.
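The negative-sign case can be reproduced in-process with a small sketch. The class and method names are made up for illustration; the hostile culture is created in code here, whereas in the scenario above it would come from the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class CultureConcatDemo
{
    // Builds a culture whose negative sign carries an SQL fragment (illustration only).
    public static CultureInfo CreateHostileCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
        return culture;
    }

    // Concatenates an int with "+", which formats it using the current culture.
    public static string ConcatenateUnderCulture(int id, CultureInfo culture)
    {
        CultureInfo previous = CultureInfo.CurrentCulture;
        CultureInfo.CurrentCulture = culture;
        try
        {
            return "SELECT * FROM People WHERE Id=" + id;
        }
        finally
        {
            CultureInfo.CurrentCulture = previous;
        }
    }

    // Same query text, but the number is formatted with the invariant culture.
    public static string ConcatenateInvariant(int id)
    {
        return "SELECT * FROM People WHERE Id=" + id.ToString(CultureInfo.InvariantCulture);
    }
}
```

With the hostile culture, `ConcatenateUnderCulture(-5, CreateHostileCulture())` yields a query ending in `Id=1 OR 1=1 OR 1=5`, while `ConcatenateInvariant(-5)` ends in `Id=-5`. A parameter avoids the formatting step altogether.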
"SELECT * FROM Table1 WHERE Id=" + intVariable.ToString()
Security
It is OK.
Attackers can not inject anything in your typed int variable.
Performance
Not OK.
It's better to use parameters, so the query will be compiled once and cached for next usage. Next time even with different parameter values, query is cached and doesn't need to compile in database server.
Coding Style
Poor practice.
Parameters are more readable
It may get you used to writing queries without parameters; then one day you make a mistake and use a string value this way, and you can probably say goodbye to your data. Bad habit!
"SELECT * FROM Product WHERE Id=" + TextBox1.Text
Although it is not your question, this may be useful for future readers:
Security
Disaster!
Even when the Id field is an integer, your query may be subject to SQL Injection. Suppose you have a query in your application "SELECT * FROM Table1 WHERE Id=" + TextBox1.Text. An attacker can insert into text box 1; DELETE Table1 and the query will be:
SELECT * FROM Table1 WHERE Id=1; DELETE Table1
If you don't want to use a parametrized query here, you should use typed values:
string.Format("SELECT * FROM Table1 WHERE Id={0}", int.Parse(TextBox1.Text))
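A slightly more defensive variant of the typed-value idea, as a sketch: `int.TryParse` with the invariant culture rejects the malicious input instead of throwing, and the number is formatted back with the invariant culture as well. `TypedInput` and `TryBuildQuery` are hypothetical names, and a parameterized query remains the preferable fix.

```csharp
using System;
using System.Globalization;

public static class TypedInput
{
    // Returns false (and no SQL) unless rawInput is a plain integer.
    public static bool TryBuildQuery(string rawInput, out string sql)
    {
        int id;
        if (int.TryParse(rawInput, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out id))
        {
            // Format with the invariant culture so regional settings cannot alter the text.
            sql = "SELECT * FROM Table1 WHERE Id=" + id.ToString(CultureInfo.InvariantCulture);
            return true;
        }

        sql = null;
        return false;
    }
}
```

For the input `1; DELETE Table1` the method returns false, so the caller can reject the request before anything reaches the database.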
Your Question
My question arose because a coworker wrote a bunch of queries
concatenating integer values, and I was wondering whether it was a
waste of my time to go through and fix all of them.
I think changing that code is not a waste of time; indeed, the change is recommended!
If your coworker uses int variables there is no security risk, but the change is still worth making: it makes the code more readable and more maintainable, and makes execution faster.
There are actually two questions in one. And question from the title has very little to do with concerns expressed by the OP in the comments afterwards.
Although I realize that for the OP it is their particular case that matters, for the readers coming from Google, it is important to answer to the more general question, that can be phrased as "is concatenation as safe as prepared statements if I made sure that every literal I am concatenating is safe?". So, I would like to concentrate on this latter one. And the answer is
Definitely NO.
The explanation is not as direct as most readers would like, but I'll try my best.
I have been pondering the matter for a while, which resulted in an article (based on the PHP environment) where I tried to sum everything up. It occurred to me that the question of protection from SQL injection often drifts toward related but narrower topics, like string escaping, type casting and such. Although some of these measures can be considered safe when taken by themselves, there is no system and no simple rule to follow. That makes it very slippery ground, relying too much on the developer's attention and experience.
The question of SQL injection cannot be reduced to a matter of some particular syntax issue. It is wider than the average developer tends to think. It is a methodological question as well: not only "which particular formatting do we have to apply", but also "how does it have to be done".
(From this point of view, the Jon Skeet article cited in the other answer does more harm than good, as it again nitpicks on an edge case, concentrating on a particular syntax issue and failing to address the problem as a whole.)
When you try to address the question of protection not as a whole but as a set of separate syntax issues, you face a multitude of problems:
The list of possible formatting choices is really huge, which means one can easily overlook some, or confuse them (by using string escaping for an identifier, for example).
Concatenation means that all protection measures have to be applied by the programmer, not the program. This issue alone leads to several consequences:
Such formatting is manual. Manual means extremely error-prone: one could simply forget to apply it.
Moreover, there is a temptation to move the formatting procedures into some centralized function, which messes things up even more and spoils data that is not going to the database.
When more than one developer is involved, the problems multiply by a factor of ten.
When concatenation is used, one cannot spot a potentially dangerous query at a glance: they are all potentially dangerous!
Unlike that mess, prepared statements are indeed the Holy Grail:
They can be expressed in the form of one simple rule that is easy to follow.
They are an essentially undetachable measure, meaning the developer cannot interfere and, willingly or unwillingly, spoil the process.
Protection from injection is really only a side effect of prepared statements, whose real purpose is to produce a syntactically correct statement. And a syntactically correct statement is 100% injection-proof. We need our syntax to be correct regardless of any injection possibility.
If used everywhere, they protect the application regardless of the developer's experience. Say, there is a thing called second-order injection, and a very strong delusion that reads "in order to protect, escape all user-supplied input". Combined, they lead to injection if a developer takes the liberty of deciding what needs to be protected and what does not.
(Thinking further, I discovered that the current set of placeholders is not enough for real-life needs and has to be extended, both for complex data structures like arrays and even for SQL keywords or identifiers, which sometimes have to be added to the query dynamically too. A developer is left unarmed in such a case and forced to fall back to string concatenation, but that is a matter for another question.)
Interestingly, this question's controversy is provoked by the controversial nature of Stack Overflow itself. The site's idea is to use particular questions from users who ask directly to build a database of general-purpose answers suitable for users who come from search. The idea is not bad per se, but it fails in a situation like this: a user asks a very narrow question, specifically to win an argument with a colleague (or to decide whether it is worth refactoring the code), while most experienced participants try to write an answer with the mission of Stack Overflow as a whole in mind, making it useful for as many readers as possible, not only the OP.
Let's not just think about security or type-safe considerations.
The reason you use parametrized queries is to improve performance at the database level. From a database perspective, a parametrized query is one query in the SQL buffer (to use Oracle's terminology although I imagine all databases have a similar concept internally). So, the database can hold a certain amount of queries in memory, prepared and ready to execute. These queries do not need to be parsed and will be quicker. Frequently run queries will usually be in the buffer and will not need parsing every time they are used.
UNLESS
Somebody doesn't use parametrized queries. In this case, the buffer gets continually flushed through by a stream of nearly identical queries each of which needs to be parsed and run by the database engine and performance suffers all-round as even frequently run queries end up being re-parsed many times a day. I have tuned databases for a living and this has been one of the biggest sources of low-hanging fruit.
NOW
To answer your question, IF your query has a small number of distinct numeric values, you will probably not be causing issues and may in fact improve performance infinitesimally. IF however there are potentially hundreds of values and the query gets called a lot, you are going to affect the performance of your system so don't do it.
Yes you can increase the SQL buffer but it's always ultimately at the expense of other more critical uses for memory like caching Indexes or Data. Moral, use parametrized queries pretty religiously so you can optimize your database and use more server memory for the stuff that matters...
To add some info to Maciek's answer:
It is easy to alter the culture info of a .NET third party app by calling the main-function of the assembly by reflection:
using System;
using System.Globalization;
using System.Reflection;
using System.Threading;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the third-party app and invoke its (public) Main method.
            Assembly asm = Assembly.LoadFile(@"C:\BobbysApp.exe");
            MethodInfo mi = asm.GetType("Test").GetMethod("Main");
            mi.Invoke(null, null);
            Console.ReadLine();
        }

        // The static constructor runs before Main, so the culture is in place first.
        static Program()
        {
            InstallBobbyTablesCulture();
        }

        static void InstallBobbyTablesCulture()
        {
            CultureInfo bobby = (CultureInfo)CultureInfo.InvariantCulture.Clone();
            bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
            bobby.DateTimeFormat.LongTimePattern = "";
            bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
            Thread.CurrentThread.CurrentCulture = bobby;
        }
    }
}
This only works if the Main function of BobbysApp is public. If Main is not public, there might be other public functions you might call.
In my opinion, if you can guarantee that the parameter you are working with will never contain a string, it is safe, but I would not do it in any case. Also, you will see a slight performance drop due to the fact that you are performing concatenation. The question I would ask you is: why don't you want to use parameters?
It is OK but never safe, and security always depends on the inputs. For example, if the input comes from a TextBox, attackers can do something tricky, since a text box accepts any string, so you have to add some kind of validation or conversion to keep users from entering the wrong input. But the thing is, it is not safe. As simple as that.
No, you can get an SQL injection attack that way. I wrote an old article in Turkish which shows how. The example in the article uses PHP and MySQL, but the concept works the same in C# and SQL Server.
Basically, the attack goes as follows. Let's say you have a page which shows information according to an integer id value, and you do not parameterize this value, like below.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=24
Okay, I assume you are using MySQL, and I attack in the following way.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII((SELECT%20DATABASE()))
Note that the injected value here is not a string. We are changing a char value to an int using the ASCII function. You can accomplish the same thing in SQL Server using "CAST(YourVarcharCol AS INT)".
After that I use length and substring functions to find out your database name.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=LEN((SELECT%20DATABASE()))
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII(SUBSTR(SELECT%20DATABASE(),1,1))
Then, using the database name, you start to get the table names in the database.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII(SUBSTR((SELECT table_name FROM INFORMATION_SCHEMA.TABLES LIMIT 1),1,1))
Of course you have to automate this process, since you only get ONE character per query, but you can easily automate it. My article shows one example in Watir. Using only one page and one non-parameterized ID value, I can learn every table name in your database. After that I can look for important tables. It will take time, but it is doable.

Use SQL to return a JSON string

This is a "best practice" question. We are having internal discussions on this topic and want to get input from a wider audience.
I need to store my data in a traditional MS SQL Server table with normal columns and rows. I sometimes need to return a DataTable to my web application, and other times I need to return a JSON string.
Currently, I return the table to the middle layer and parse it into a JSON string. This seems to work well for the most part, but does occasionally take a while on large datasets (parsing the data, not returning the table).
I am considering revising the stored procedures to selectively return a DataTable or a JSON string. I would simply add an @isJson bit parameter to the SP.
If the user wanted the string instead of the table the SP would execute a query like this:
DECLARE @result varchar(MAX)
SELECT @result = COALESCE(@result + ',', '') + '{id:"' + colId + '",name:"' + colName + '"}'
FROM MyTable
SELECT @result
This produces something like the following:
{id:"1342",name:"row1"},{id:"3424",name:"row2"}
Of course, the user can also get the table by passing false to the @isJson parameter.
I want to be clear that the data storage isn't affected, nor are any of the existing views and other processes. This is a change to ONLY the results of some stored procedures.
My questions are:
Has anyone tried this in a large application? If so, what was the result?
What issues have you seen/would you expect with this approach?
Is there a better faster way to go from table to JSON in SQL Server other than modifying the stored procedure in this way or parsing the string in the middle tier?
I personally think the best place for this kind of string manipulation is in program code in a fully expressive language that has functions and can be compiled. Doing this in T-SQL is not good. Program code can have fast functions that do proper escaping.
Let's think about things a bit:
When you deploy new versions of the parts and pieces of your application, where is the best place for this functionality to be?
If you have to restore your database (and all its stored procedures) will that negatively affect anything? If you are deploying a new version of your web front end, will the JSON conversion being tied into the database cause problems?
How will you escape characters properly? Are you sending any dates through? What format will date strings be in and how will they get converted to actual Date objects on the other end (if that is needed)?
How will you unit test it (and with automated tests!) to prove it is working correctly? How will you regression test it?
SQL Server UDFs can be very slow. Are you content to use a slow function, or for speed hack into your SQL code things like Replace(Replace(Replace(Replace(Value, '\', '\\'), '"', '\"'), '''', '\'''), Char(13), '\n')? What about Unicode, \u and \x escaping? How about splitting '</script>' into '<' + '/script>'? (Maybe that doesn't apply, but maybe it does, depending on how you use your JSON.) Is your T-SQL procedure going to do all this, and be reusable for different recordsets, or will you rewrite it each time into each SP that you need to return JSON?
You may only have one SP that needs to return JSON. For now. Some day, you might have more. Then if you find a bug, you have to fix it in two places. Or five. Or more.
It may seem like you are making things more complicated by having the middle layer do the translation, but I promise you it is going to be better in the long run. What if your product scales out and starts going massively parallel—you can always throw more web servers at it cheaply, but you can't so easily fix database server resource saturation! So don't make the DB do more work than it should. It is a data access layer, not a presentation layer. Make it do the minimum amount of work possible. Write code for everything else. You will be glad you did.
Speed Tips for String Handling in a Web Application
Make sure your web string concatenation code doesn't suffer from Schlemiel the Painter's Algorithm. Either directly write to the output buffer as JSON is generated (Response.Write), or use a proper StringBuilder object, or write the parts of the JSON to an array and Join() it later. Don't do plain vanilla concatenation to a longer and longer string over and over.
Dereference objects as little as possible. I don't know your server-side language, but if it happens to be ASP Classic, don't use field names--either get a reference to each field in a variable or at the very least use integer field indexes. Dereferencing a field based on its name inside a loop is (much) worse performance.
Use pre-built libraries. Don't roll your own when you can use a tried and true library. Performance should be equal or better to your own and (most importantly) it will be tested and correct.
If you're going to spend the time doing this, make it abstract enough to handle converting any recordset, not just the one you have now.
Use compiled code. You can always get the fastest code when it is compiled, not interpreted. If you identify that the JSON-conversion routines are truly the bottleneck (and you MUST prove this for real, do not guess) then get the code into something that is compiled.
Reduce string lengths. This is not a big one, but if at all possible use one-letter json names instead of many-letter. For a giant recordset this will add up to savings on both ends.
Ensure it is GZipped. This is not so much a server-side improvement, but I couldn't mention JSON performance without being complete.
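The tips about using a pre-built library and handling any recordset can be sketched together in C#. The answer does not name a library, so `System.Text.Json` is an assumption here, as is the class name `RecordsetJson`; the routine works on any `IDataReader` and leaves escaping and date formatting to the serializer.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text.Json;

public static class RecordsetJson
{
    // Converts every row of any data reader into a JSON array of objects,
    // using the column names as property names.
    public static string ToJson(IDataReader reader)
    {
        var rows = new List<Dictionary<string, object>>();
        while (reader.Read())
        {
            var row = new Dictionary<string, object>(reader.FieldCount);
            for (int i = 0; i < reader.FieldCount; i++)
            {
                // Database NULL becomes JSON null.
                row[reader.GetName(i)] = reader.IsDBNull(i) ? null : reader.GetValue(i);
            }
            rows.Add(row);
        }

        // The serializer handles string escaping and date formatting.
        return JsonSerializer.Serialize(rows);
    }
}
```

This buffers all rows in memory before serializing, which is fine for moderate result sets; for very large ones a streaming writer would be the next thing to measure.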
Passing Dates in JSON
What I recommend is to use a separate JSON schema (itself in JSON, defining the structure of the virtual recordset to follow). This schema can be sent as a header to the "recordset" to follow, or it can be already loaded in the page (included in the base javascript files) so it doesn't have to be sent each time. Then, in your JSON parse callback (or post-callback on the final resultant object) look in the schema for the current column and do conversions as necessary. You might consider using ISO format since in ECMAScript 5 strict mode there is supposed to be better date support and your code can be simplified without having to change the data format (and a simple object detect can let you use this code for any browser that supports it):
Date
Dates are now capable of both parsing and outputting ISO-formatted dates.
The Date constructor now attempts to parse the date as if it was ISO-formatted, first, then moves on to the other inputs that it accepts.
Additionally, date objects now have a new .toISOString() method that outputs the date in an ISO format.
var date = new Date("2009-05-21T16:06:05.000Z");
print( date.toISOString() );
// 2009-05-21T16:06:05.000Z
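On the server side, producing that same ISO format could look like the sketch below, assuming the middle tier is written in C# (the question does not say). `IsoDates` and its method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class IsoDates
{
    // Formats a DateTime as an ISO 8601 UTC string with millisecond precision.
    // Note: a value with Kind = Unspecified is treated as local time by ToUniversalTime.
    public static string ToIsoUtc(DateTime value)
    {
        return value.ToUniversalTime()
            .ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture);
    }

    // Parses an ISO 8601 string back into a UTC DateTime.
    public static DateTime ParseIsoUtc(string text)
    {
        return DateTime.Parse(
            text,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal | DateTimeStyles.AssumeUniversal);
    }
}
```

Using the invariant culture on both sides keeps the wire format independent of the server's regional settings.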
I wouldn't do it the way you are doing it (concatenating).
You can try creating a CLR SQL function that uses JSON.net and returns a varchar.
See here how to create SQL CLR Functions:
http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx
Something like this (untested code)
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString MyFunctionName(int id) {
    // Put your code here (maybe find the object you want to serialize using the id passed?)
    using (var cn = new SqlConnection("context connection=true") ) {
        //get your data into an object
        var myObject = new {Name = "My Name"};
        return new SqlString(Newtonsoft.Json.JsonConvert.SerializeObject(myObject));
    }
}

Fixing SQL injection forms in a big asp.net C# web application

I have to fix a project that is vulnerable to SQL injection.
None of the forms on any page of the project use parameterized queries; they simply build query strings.
For example, I have the search page, and looking at the code-behind I see that there is a method CreateQuery() that creates the query based on the text fields, for example:
string sQuery = "";
sQuery += "b.name like '%" + txtName.Text + "%'";
Then in the btnSearch_Click() I have the method that does the query:
query = CreateQuery();
var totalList = GetAllBlaBla(query);
My question is:
Since I have hundreds of forms and thousands of formText and values to FIX, is there a "quick" solution to implement like
A global function that parametrizes the query or handle the situation in some way?
Since in every class the query is executed in the SubmitButton_Click() code behind method, can I handle the situation here, of course in every class?
Should I modify every form and every entry in the form codebehind to parametrize the SQL string, that is gonna take one million of years?
(Edit) What about encoding/decoding the input values? So that the example above becomes:
string sQuery = "";
var txt = HttpUtility.HtmlEncode(txtName.Text);
sQuery += "b.name like '%" + txt + "%'";
Is this a possible temporary patch?
5- (Edit) Is this a possible solution, or it simply does not change anything?
cmd.Parameters.Add("@txtNameParameter", SqlDbType.VarChar);
cmd.Parameters["@txtNameParameter"].Value = txtName.Text;
sQuery += "b.name like '%" + (string)cmd.Parameters["@txtNameParameter"].Value + "%'";
The problem is that I have to return a string because the logic that handles the query is defined in another business class that takes a string as a query, I cannot give it a CommandType or SqlDataAdapter...
Suggestion?
Thanks in advance.
You already know there is a problem; IMO, any "quick" fix here is likely to reduce the attack surface, but is not likely to prevent determined abuse; simply, blacklisting is truly hard and there are some really bizarre inputs readily available on black-hat (and, as samples, on white-hat) sites. These are not always easily recognizable as abusive. It isn't all ' drop table Customers -- ;p
WHATEVER you do, I would advise doing it properly; parameters. Tools like dapper might reduce the code you need, though:
sQuery += "b.name like '%'+@text+'%'"
...
conn.Execute(sQuery, new {text=txtName.Text});
(which is easier than handling all the parameters etc manually)
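Since the asker's business class only accepts a query string, one possible way to apply this is a small builder that produces the SQL text and collects the parameter values next to it. This is a sketch under the assumption that the business class can be extended to accept the parameter dictionary as well; `WhereBuilder` and `AndContains` are hypothetical names, not part of Dapper.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public sealed class WhereBuilder
{
    private readonly StringBuilder _sql = new StringBuilder();
    private readonly Dictionary<string, object> _parameters = new Dictionary<string, object>();

    // SQL text containing only placeholders, never user input.
    public string Sql { get { return _sql.ToString(); } }

    // Placeholder name -> user-supplied value, to be bound by the data layer.
    public IReadOnlyDictionary<string, object> Parameters { get { return _parameters; } }

    // column must be a hard-coded identifier from your own code, never user input.
    public WhereBuilder AndContains(string column, string userText)
    {
        string name = "@p" + _parameters.Count.ToString(CultureInfo.InvariantCulture);
        if (_sql.Length > 0)
        {
            _sql.Append(" AND ");
        }
        _sql.Append(column).Append(" LIKE '%' + ").Append(name).Append(" + '%'");
        _parameters[name] = userText;
        return this;
    }
}
```

A `CreateQuery()` method could then return `new WhereBuilder().AndContains("b.name", txtName.Text)`, and the data layer would bind `Parameters` when it executes `Sql`. LIKE wildcards typed by the user (`%`, `_`) still act as wildcards, which is a matching concern rather than an injection one.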
Modify every query + validate every input.
Takes time but think about the guy who will maintain and add features to that "big web application"
(it may be you).
<customErrors mode="On"/>
This will prevent users from seeing the errors, so a potential hacker would have very few clues about how to exploit this security hole based on the error messages he or she sees.
Add Elmah to log errors.
Rewrite every query to use Parameters or use an ORM.
Any javascript based solution is useless, since a "hacker" for sure knows how to disable it.
The Plain Way
Estimate the number of replacements by searching the project for string sQuery = ". Multiply it by the time you plan spending on fixing a single query (e.g. one fix in 5 minutes, plus a coffee break each 10 fixes).
Add time for testing the whole website.
Then tell management the fix is going to be huge, give them the estimate and just do it.
Surely you'll have headaches for a couple of days but at least you'll have the thing working.
The Creative Way
If you literally mean hundreds and thousands of such forms with nearly identical code (e.g. query always created in CreateQuery, executed in SubmitButton_Click), I would consider learning to use Visual Studio regular expression syntax and crafting a couple of very accurate search & replace patterns.
This saved me hours of work in one project, but you'll need to be really precise with regexps and make sure you understand what you're doing.
Another option, again, when you're sure it is worth it, is to write a tool that will rewrite C# sources.
If all you need is a simple transform like the one Marc mentioned, it could take a couple hours of work.
But you can fail miserably here so it's a risky route.
Reduce the permissions of the database account that your application uses to access data. Hopefully it doesn't have sysadmin. Remove permissions to drop tables. If the account is used only for data retrieval, remove all permissions to update data. You might even consider setting up views, locking them down, and using them instead of direct table access.
Turn on ASP.NET Request Validation, described here. This automatically checks all traffic for malicious character sequences.
If possible, consider adding an event handler to Global for OnBeginRequest that inspects the incoming data and performs white list checks on all inputs. Not sure how well this maps to your problem, but the nice thing is you only have to do it in one place and it will affect the whole site.
Ensure that all of your pages call Page.Validate to ensure that the client-side validation is also enforced on the server side.
Begin the long hard work of adding field-specific white list validation to every control, and figure out a long term plan to move onto parameterized database calls.
Find all textbox controls on each page during page load and disable handling of special key presses.

Execute multiple SQL commands in one round trip

I am building an application and I want to batch multiple queries into a single round-trip to the database. For example, let's say a single page needs to display a list of users, a list of groups and a list of permissions.
So I have stored procs (or just simple sql commands like "select * from Users"), and I want to execute three of them. However, to populate this one page I have to make 3 round trips.
Now I could write a single stored proc ("getUsersTeamsAndPermissions") or execute a single SQL command "select * from Users;exec getTeams;select * from Permissions".
But I was wondering if there was a better way to specify to do 3 operations in a single round trip. Benefits include being easier to unit test, and allowing the database engine to parallelize the queries.
I'm using C# 3.5 and SQL Server 2008.
Something like this. The example is probably not very good as it doesn't properly dispose objects but you get the idea. Here's a cleaned up version:
using (var connection = new SqlConnection(ConnectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    command.CommandText = "select id from test1; select id from test2";
    using (var reader = command.ExecuteReader())
    {
        do
        {
            while (reader.Read())
            {
                Console.WriteLine(reader.GetInt32(0));
            }
            Console.WriteLine("--next command--");
        } while (reader.NextResult());
    }
}
The single multi-part command and the stored procedure options that you mention are the two options. You can't do them in such a way that they are "parallelized" on the db. However, both of those options do result in a single round trip, so you're good there. There's no way to send them more efficiently. In sql server 2005 onwards, a multi-part command that is fully parameterized is very efficient.
Edit: adding information on why cram into a single call.
Although you don't want to care too much about reducing calls, there can be legitimate reasons for this.
I once was limited to a crummy ODBC driver against a mainframe, and there was a 1.2 second overhead on each call! I'm serious. There were times when I crammed a little extra into my db calls. Not pretty.
You also might find yourself in a situation where you have to configure your sql queries somewhere, and you can't just make 3 calls: it has to be one. It shouldn't be that way, bad design, but it is. You do what you gotta do!
Sometimes of course it can be very good to encapsulate multiple steps in a stored procedure. Usually not for saving round trips though, but for tighter transactions, getting ID for new records, constraining for permissions, providing encapsulation, blah blah blah.
Making one round-trip vs three will be more efficient indeed. The question is whether it is worth the trouble. The entire ADO.Net and C# 3.5 toolset and framework opposes what you try to do. TableAdapters, Linq2SQL, EF, all these like to deal with simple one-call==one-resultset semantics. So you may lose some serious productivity by trying to beat the Framework into submission.
I would say that unless you have some serious measurements showing that you need to reduce the number of roundtrips, abstain. If you do end up requiring this, then use a stored procedure to at least give an API kind of semantics.
But if your query really is what you posted (i.e. select all users, all teams and all permissions) then you obviously have much bigger fish to fry before reducing the round-trips... reduce the resultsets first.
I think this link might be helpful.
Consider at least reusing the same open connection; according to what it says here, opening a connection is close to the top performance cost in Entity Framework.
Firstly, 3 round trips isn't really a big deal. If you were talking about 300 round trips then that would be another matter, but for just 3 round trips I would consider this to definitely be a case of premature optimisation.
That said, the way I'd do this would probably be to execute the 3 stored procedures using SQL:
exec dbo.p_myproc_1 @param_1 = @in_param_1, @param_2 = @in_param_2
exec dbo.p_myproc_2
exec dbo.p_myproc_3
You can then iterate through the returned results sets as you would if you directly executed multiple rowsets.
Build a temp-table? Insert all results into the temp table and then select * from #temp-table
as in,
#temptable=....
select #temptable.field=mytable.field from mytable
select #temptable.field2=mytable2.field2 from mytable2
etc... Only one trip to the database, though I'm not sure it is actually more efficient.

Avoiding SQL injection without parameters

We are having another discussion here at work about using parametrized sql queries in our code. We have two sides in the discussion: Me and some others that say we should always use parameters to safeguard against sql injections and the other guys that don't think it is necessary. Instead they want to replace single apostrophes with two apostrophes in all strings to avoid sql injections. Our databases are all running Sql Server 2005 or 2008 and our code base is running on .NET framework 2.0.
Let me give you a simple example in C#:
I want us to use this:
string sql = "SELECT * FROM Users WHERE Name=@name";
SqlCommand getUser = new SqlCommand(sql, connection);
getUser.Parameters.AddWithValue("@name", userName);
//... blabla - do something here, this is safe
While the other guys want to do this:
string sql = "SELECT * FROM Users WHERE Name=" + SafeDBString(name);
SqlCommand getUser = new SqlCommand(sql, connection);
//... blabla - are we safe now?
Where the SafeDBString function is defined as follows:
string SafeDBString(string inputValue)
{
return "'" + inputValue.Replace("'", "''") + "'";
}
Now, as long as we use SafeDBString on all string values in our queries we should be safe. Right?
There are two reasons to use the SafeDBString function. First, it is the way it has been done since the stone ages, and second, it is easier to debug the sql statements since you see the exact query that is run on the database.
So then. My question is whether it really is enough to use the SafeDBString function to avoid sql injection attacks. I have been trying to find examples of code that breaks this safety measure, but I can't find any examples of it.
Is there anybody out there that can break this? How would you do it?
EDIT:
To summarize the replies so far:
Nobody has found a way to get around the SafeDBString on Sql Server 2005 or 2008 yet. That is good, I think?
Several replies pointed out that you get a performance gain when using parametrized queries. The reason is that the query plans can be reused.
We also agree that using parametrized queries gives more readable code that is easier to maintain.
Further it is easier to always use parameters than to use various versions of SafeDBString, string to number conversions and string to date conversions.
Using parameters you get automatic type conversion, something that is especially useful when we are working with dates or decimal numbers.
And finally: Don't try to do security yourself as JulianR wrote. The database vendors spend lots of time and money on security. There is no way we can do better and no reason we should try to do their job.
So while nobody was able to break the simple security of the SafeDBString function I got lots of other good arguments. Thanks!
I think the correct answer is:
Don't try to do security yourself. Use whatever trusted, industry standard library there is available for what you're trying to do, rather than trying to do it yourself. Whatever assumptions you make about security, might be incorrect. As secure as your own approach may look (and it looks shaky at best), there's a risk you're overlooking something and do you really want to take that chance when it comes to security?
Use parameters.
And then somebody goes and uses " instead of '. Parameters are, IMO, the only safe way to go.
It also avoids a lot of i18n issues with dates/numbers; what date is 01/02/03? How much is 123,456? Do your servers (app-server and db-server) agree with each-other?
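To make the date/number ambiguity concrete, here is a minimal C# sketch. The class and method names are hypothetical, and it assumes the runtime has normal culture data available (not invariant-globalization mode). It parses the same text under two cultures, which is roughly what happens when a value travels as a string between an app server and a db server that disagree on regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class: shows that identical text means different
// values depending on which culture interprets it.
public static class CultureAmbiguityDemo
{
    // Parses a date string under the named culture and returns it in ISO form.
    public static string ParseDate(string text, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return DateTime.Parse(text, culture)
                       .ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    // Parses a number string under the named culture.
    public static decimal ParseNumber(string text, string cultureName)
    {
        return decimal.Parse(text, CultureInfo.GetCultureInfo(cultureName));
    }
}
```

Calling ParseDate("01/02/03", "en-US") and ParseDate("01/02/03", "en-GB") should give January 2 and February 1 respectively, and "123,456" reads as a six-digit integer under en-US but as a value just above 123 under de-DE. A typed parameter (a DateTime or decimal value) never goes through this text round trip.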
If the risk factor isn't convincing to them, how about performance? The RDBMS can re-use the query plan if you use parameters, helping performance. It can't do this with just the string.
The argument is a no-win. If you do manage to find a vulnerability, your co-workers will just change the SafeDBString function to account for it and then ask you to prove that it's unsafe all over again.
Given that parametrized queries are an undisputed programming best practice, the burden of proof should be on them to state why they aren't using a method that is both safer and better performing.
If the issue is rewriting all the legacy code, the easy compromise would be to use parametrized queries in all new code, and refactor old code to use them when working on that code.
My guess is the actual issue is pride and stubbornness, and there's not much more you can do about that.
First of all, your sample for the "Replace" version is wrong. You need to put apostrophes around the text:
string sql = "SELECT * FROM Users WHERE Name='" + SafeDBString(name) + "'";
SqlCommand getUser = new SqlCommand(sql, connection);
So that's one other thing parameters do for you: you don't need to worry about whether or not a value needs to be enclosed in quotes. Of course, you could build that into the function, but then you need to add a lot of complexity to the function: how to know the difference between 'NULL' as null and 'NULL' as just a string, or between a number and a string that just happens to contain a lot of digits. It's just another source for bugs.
Another thing is performance: parameterized query plans are often cached better than concatenated plans, thus perhaps saving the server a step when running the query.
Additionally, escaping single quotes isn't good enough. Many DB products allow alternate methods for escaping characters that an attacker could take advantage of. In MySQL, for example, you can also escape a single quote with a backslash. And so the following "name" value would blow up MySQL with just the SafeDBString() function, because when you double the single quote the first one is still escaped by the backslash, leaving the 2nd one "active":
x\' OR 1=1;--
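As a rough illustration of what that value turns into, the sketch below runs it through the question's SafeDBString and the question's concatenation. The wrapper class and BuildQuery are hypothetical names, and nothing here talks to a database; it only shows the resulting text.

```csharp
using System;

// Hypothetical demo class wrapping the helper from the question.
public static class EscapeDemo
{
    // The quote-doubling helper exactly as defined in the question.
    public static string SafeDBString(string inputValue)
    {
        return "'" + inputValue.Replace("'", "''") + "'";
    }

    // Concatenates the statement the way the question's second example does.
    public static string BuildQuery(string name)
    {
        return "SELECT * FROM Users WHERE Name=" + SafeDBString(name);
    }
}
```

BuildQuery(@"x\' OR 1=1;--") produces the text SELECT * FROM Users WHERE Name='x\'' OR 1=1;--'. On a server that treats a backslash as an escape character, \' is read as a literal quote, so the next quote ends the string and the rest of the input is parsed as SQL rather than data.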
Also, JulianR brings up a good point below: NEVER try to do security work yourself. It's so easy to get security programming wrong in subtle ways that appear to work, even with thorough testing. Then time passes and a year later you find out your system was cracked six months ago and you never even knew it until just then.
Always rely as much as possible on the security libraries provided for your platform. They will be written by people who do security code for a living, much better tested than what you can manage, and serviced by the vendor if a vulnerability is found.
So I'd say:
1) Why are you trying to re-implement something that's built in? it's there, readily available, easy to use and already debugged on a global scale. If future bugs are found in it, they'll be fixed and available to everyone very quickly without you having to do anything.
2) What processes are in place to guarantee that you never miss a call to SafeDBString? Missing it in just 1 place could open up a whole host of issues. How much are you going to eyeball these things, and consider how much wasted that effort is when the accepted correct answer is so easy to reach.
3) How certain are you that you've covered off every attack vector that Microsoft (the author of the DB and the access library) knows about in your SafeDBString implementation ...
4) How easy is it to read the structure of the sql? The example uses + concatenation, parameters are very like string.Format, which is more readable.
Also, there are 2 ways of working out what was actually run - roll your own LogCommand function, a simple function with no security concerns, or even look at an sql trace to work out what the database thinks is really going on.
Our LogCommand function is simply:
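// Needs: using System; using System.Text; using System.Data.SqlClient;
string LogCommand(SqlCommand cmd)
{
    StringBuilder sb = new StringBuilder();
    sb.AppendLine(cmd.CommandText);
    foreach (SqlParameter param in cmd.Parameters)
    {
        // Log the parameter name explicitly.
        sb.Append(param.ParameterName);
        sb.Append(" = \"");
        // A parameter with no value (null or DBNull) is logged as NULL instead of throwing.
        sb.Append(param.Value == null || param.Value == DBNull.Value ? "NULL" : param.Value.ToString());
        sb.AppendLine("\"");
    }
    return sb.ToString();
}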
Right or wrong, it gives us the information we need without security issues.
With parameterised queries you get more than protection against sql injection. You also get better execution plan caching potential. If you use the sql server query profiler you can still see the 'exact sql that is run on the database' so you're not really losing anything in terms of debugging your sql statements either.
I have used both approaches to avoid SQL injection attacks and definitely prefer parametrized queries. When I have used concatenated queries I have used a library function to escape the variables (like mysql_real_escape_string) and wouldn't be confident I have covered everything in a proprietary implementation (as it seems you are too).
You aren't able to easily do any type checking of the user input without using parameters.
If you use the SqlCommand and SqlParameter classes to make your DB calls, you can still see the SQL query that's being executed. Look at the SqlCommand's CommandText property.
I'm always a little suspicious of the roll-your-own approach to preventing SQL injection when parameterized queries are so easy to use. Second, just because "it's always been done that way" doesn't mean it's the right way to do it.
This is only safe if you're guaranteed that you're going to pass in a string.
What if you're not passing in a string at some point? What if you pass just a number?
http://www.mywebsite.com/profile/?id=7;DROP DATABASE DB
Would ultimately become:
SELECT * FROM DB WHERE Id = 7;DROP DATABASE DB
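A small sketch of why the numeric case slips past quote-doubling: there is no quote in the input to double. The class and method names below are hypothetical, and the TryParse check is only an illustration of type checking; a typed parameter gives the same guarantee without any string building.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class for values concatenated in a numeric context.
public static class NumericContextDemo
{
    // Unsafe: the raw query-string value is appended as-is.
    public static string BuildByConcatenation(string id)
    {
        return "SELECT * FROM DB WHERE Id = " + id;
    }

    // Accepts the value only if it consists purely of digits that fit in an int.
    public static bool TryBuildChecked(string id, out string sql)
    {
        int parsed;
        if (int.TryParse(id, NumberStyles.None, CultureInfo.InvariantCulture, out parsed))
        {
            sql = "SELECT * FROM DB WHERE Id = " + parsed.ToString(CultureInfo.InvariantCulture);
            return true;
        }
        sql = null;
        return false;
    }
}
```

BuildByConcatenation("7;DROP DATABASE DB") reproduces the statement above verbatim, while TryBuildChecked rejects the same input and accepts "7".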
I'd use stored procedures or functions for everything, so the question wouldn't arise.
Where I have to put SQL into code, I use parameters, which is the only thing that makes sense. Remind the dissenters that there are hackers smarter than they are, and with better incentive to break the code that's trying to outsmart them. Using parameters, it's simply not possible, and it's not like it's difficult.
Agree hugely on the security issues.
Another reason to use parameters is for efficiency.
Databases will always compile your query and cache it, then re-use the cached query (which is obviously faster for subsequent requests).
If you use parameters then even if you use different parameters the database will re-use your cached query as it matches based on the SQL string before binding the parameters.
If however you don't bind parameters then the SQL string changes on every request (that has different parameters) and it will never match what's in your cache.
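The effect can be pictured with a toy model. This is not how any real database implements its plan cache; it only assumes, as described above, that cached plans are looked up by statement text. The class name and the Users query are illustrative.

```csharp
using System.Collections.Generic;

// Toy model: a set of statement texts stands in for a plan cache keyed by SQL text.
public static class PlanCacheToyModel
{
    // Returns how many distinct statement texts the "cache" would have to hold
    // after one lookup per name.
    public static int CountDistinctStatements(IEnumerable<string> names, bool parameterized)
    {
        var cacheKeys = new HashSet<string>();
        foreach (string name in names)
        {
            string sql = parameterized
                // Same text every time; the value would travel separately as a parameter.
                ? "SELECT * FROM Users WHERE Name=@name"
                // Text changes with every distinct value.
                : "SELECT * FROM Users WHERE Name='" + name.Replace("'", "''") + "'";
            cacheKeys.Add(sql);
        }
        return cacheKeys.Count;
    }
}
```

For three different names the concatenated form yields three distinct texts, while the parameterized form yields one.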
For the reasons already given, parameters are a very good idea. But we hate using them because creating the param and assigning its name to a variable for later use in a query is a triple indirection head wreck.
The following class wraps the StringBuilder that you will commonly use for building SQL requests. It lets you write parameterized queries without ever having to create a parameter, so you can concentrate on the SQL. Your code will look like this...
var bldr = new SqlBuilder( myCommand );
bldr.Append("SELECT * FROM CUSTOMERS WHERE ID = ").Value(myId, SqlDbType.Int);
//or
bldr.Append("SELECT * FROM CUSTOMERS WHERE NAME LIKE ").FuzzyValue(myName, SqlDbType.NVarChar);
myCommand.CommandText = bldr.ToString();
Code readability, I hope you agree, is greatly improved, and the output is a proper parameterized query.
The class looks like this...
using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;
namespace myNamespace
{
    /// <summary>
    /// For comfort and happiness, this class replaces StringBuilder for building
    /// SQL queries, with the advantage that it handles parameter creation via the
    /// Value() method.
    /// </summary>
    public class SqlBuilder
    {
        private StringBuilder _rq;
        private SqlCommand _cmd;
        private int _seq;
        public SqlBuilder(SqlCommand cmd)
        {
            _rq = new StringBuilder();
            _cmd = cmd;
            _seq = 0;
        }
        // The other StringBuilder overloads can be implemented here in the same way, as needed.
        public SqlBuilder Append(String str)
        {
            _rq.Append(str);
            return this;
        }
        /// <summary>
        /// Adds a runtime value to the query, via a parameter.
        /// </summary>
        /// <param name="value">The value to supply in the query</param>
        /// <param name="type">The SqlDbType to use when creating the parameter. Refer to the type of the target column.</param>
        public SqlBuilder Value(Object value, SqlDbType type)
        {
            //get param name
            string paramName = "@SqlBuilderParam" + _seq++;
            //append condition to query
            _rq.Append(paramName);
            _cmd.Parameters.Add(paramName, type).Value = value;
            return this;
        }
        public SqlBuilder FuzzyValue(Object value, SqlDbType type)
        {
            //get param name
            string paramName = "@SqlBuilderParam" + _seq++;
            //append condition to query
            _rq.Append("'%' + " + paramName + " + '%'");
            _cmd.Parameters.Add(paramName, type).Value = value;
            return this;
        }
        public override string ToString()
        {
            return _rq.ToString();
        }
    }
}
From the very short time I've had to investigate SQL injection problems, I can see that making a value 'safe' also means that you're shutting the door to situations where you might actually want apostrophes in your data - what about someone's name, eg O'Reilly.
That leaves parameters and stored procedures.
And yes, you should always try to implement code in the best way you know now - not just how it's always been done.
Here are a couple of articles that you might find helpful in convincing your co-workers.
http://www.sommarskog.se/dynamic_sql.html
http://unixwiz.net/techtips/sql-injection.html
Personally I prefer to never allow any dynamic code to touch my database, requiring all contact to be through sps (and not ones which use dynamic SQL). This means nothing except what I have given users permission to do can be done and that internal users (except the very few with production access for admin purposes) cannot directly access my tables and create havoc, steal data or commit fraud. If you run a financial application, this is the safest way to go.
It can be broken, however the means depends on exact versions/patches etc.
One that has already been brought up is the overflow/truncation bug that can be exploited.
Another future means would be finding bugs similar to other databases - for example the MySQL/PHP stack suffered an escaping problem because certain UTF8 sequences could be used to manipulate the replace function - the replace function would be tricked into introducing the injection characters.
At the end of the day, the replacement security mechanism relies on expected but not intended functionality. Since the functionality was not the intended purpose of the code, there is a high probability that some discovered quirk will break your expected functionality.
If you have a lot of legacy code, the replace method could be used as a stopgap to avoid lengthy rewriting and testing. If you are writing new code, there is no excuse.
Always use parameterized queries where possible. Sometimes even a simple input without any weird characters can already create an SQL injection if it's not identified as an input for a field in the database.
So just let the database do its work of identifying the input itself, not to mention it also saves a lot of trouble when you need to actually insert weird characters that otherwise would be escaped or changed. It can even save some valuable runtime in the end by not having to process the input yourself.
I did not see any other answers address this side of the 'why doing it yourself is bad', but consider a SQL Truncation attack.
There is also the QUOTENAME T-SQL function that can be helpful if you can't convince them to use params. It catches a lot (all?) of the escaped quote concerns.
2 years later, I recidivated... Anyone who finds parameters a pain is welcome to try my VS Extension, QueryFirst. You edit your request in a real .sql file (Validation, Intellisense). To add a parameter, you just type it directly into your SQL, starting with the '@'. When you save the file, QueryFirst will generate wrapper classes to let you run the query and access the results. It will look up the DB type of your parameter and map it to a .net type, which you will find as an input to the generated Execute() methods. Could not be simpler. Doing it the right way is radically quicker and easier than doing it any other way, and creating a sql injection vulnerability becomes impossible, or at least perversely difficult. There are other killer advantages, like being able to delete columns in your DB and immediately see compile errors in your application.
legal disclaimer : I wrote QueryFirst
Here are a few reasons to use parameterized queries:
Security - The database access layer knows how to remove or escape items that are not allowed in data.
Separation of concerns - My code is not responsible for transforming the data into a format that the database likes.
No redundancy - I don't need to include an assembly or class in every project that does this database formatting/escaping; it's built in to the class library.
There were a few vulnerabilities (I can't remember which database it was) related to buffer overflow of the SQL statement.
What I want to say is, SQL injection is more than just "escape the quote", and you have no idea what will come next.
Another important consideration is keeping track of escaped and unescaped data. There are tons and tons of applications, Web and otherwise, that don't seem to properly keep track of when data is raw-Unicode, &-encoded, formatted HTML, et cetera. It's obvious that it will become difficult to keep track of which strings are ''–encoded and which aren't.
It's also a problem when you end up changing the type of some variable — perhaps it used to be an integer, but now it's a string. Now you have a problem.
