Use SQL to return a JSON string - c#

This is a "best practice" question. We are having internal discussions on this topic and want to get input from a wider audience.
I need to store my data in a traditional MS SQL Server table with normal columns and rows. I sometimes need to return a DataTable to my web application, and other times I need to return a JSON string.
Currently, I return the table to the middle layer and parse it into a JSON string. This seems to work well for the most part, but does occasionally take a while on large datasets (parsing the data, not returning the table).
I am considering revising the stored procedures to selectively return a DataTable or a JSON string. I would simply add an @isJson bit parameter to the SP.
If the user wanted the string instead of the table the SP would execute a query like this:
DECLARE @result varchar(MAX)
SELECT @result = COALESCE(@result + ',', '') + '{id:"' + colId + '",name:"' + colName + '"}'
FROM MyTable
SELECT @result
This produces something like the following:
{id:"1342",name:"row1"},{id:"3424",name:"row2"}
Of course, the user can also get the table by passing false to the @isJson parameter.
I want to be clear that the data storage isn't affected, nor are any of the existing views and other processes. This is a change to ONLY the results of some stored procedures.
My questions are:
Has anyone tried this in a large application? If so, what was the result?
What issues have you seen/would you expect with this approach?
Is there a better, faster way to go from table to JSON in SQL Server than modifying the stored procedure in this way or parsing the string in the middle tier?

I personally think the best place for this kind of string manipulation is in program code in a fully expressive language that has functions and can be compiled. Doing this in T-SQL is not good. Program code can have fast functions that do proper escaping.
Let's think about things a bit:
When you deploy new versions of the parts and pieces of your application, where is the best place for this functionality to be?
If you have to restore your database (and all its stored procedures) will that negatively affect anything? If you are deploying a new version of your web front end, will the JSON conversion being tied into the database cause problems?
How will you escape characters properly? Are you sending any dates through? What format will date strings be in and how will they get converted to actual Date objects on the other end (if that is needed)?
How will you unit test it (and with automated tests!) to prove it is working correctly? How will you regression test it?
SQL Server UDFs can be very slow. Are you content to use a slow function, or for speed hack into your SQL code things like Replace(Replace(Replace(Replace(Value, '\', '\\'), '"', '\"'), '''', '\'''), Char(13), '\n')? What about Unicode, \u and \x escaping? How about splitting '</script>' into '<' + '/script>'? (Maybe that doesn't apply, but maybe it does, depending on how you use your JSON.) Is your T-SQL procedure going to do all this, and be reusable for different recordsets, or will you rewrite it each time into each SP that you need to return JSON?
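Instead of nested REPLACE calls, program code can lean on a serializer that already handles these cases. A minimal sketch, assuming .NET Core 3.0 or later where System.Text.Json ships in the box (the answer does not name a library; this is a swap-in), and where JsonEscapingDemo is a hypothetical name. The default encoder escapes quotes, backslashes and control characters, and also escapes HTML-sensitive characters such as < and >, so "</script>" comes out as \u003C/script\u003E.

```csharp
using System.Text.Json;

public static class JsonEscapingDemo
{
    // Serializes a raw string as a JSON string literal, including the quotes.
    // Quotes, backslashes, CR/LF and other control characters are escaped,
    // and '<' / '>' are written as \u003C / \u003E by the default encoder.
    public static string ToJsonLiteral(string value) => JsonSerializer.Serialize(value);

    // Parses the literal back, to show the escaping is lossless.
    public static string FromJsonLiteral(string json) =>
        JsonSerializer.Deserialize<string>(json) ?? string.Empty;
}
```

Because the same routine serves every recordset, a bug found in it is fixed once rather than in every stored procedure.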
You may only have one SP that needs to return JSON. For now. Some day, you might have more. Then if you find a bug, you have to fix it in two places. Or five. Or more.
It may seem like you are making things more complicated by having the middle layer do the translation, but I promise you it is going to be better in the long run. What if your product scales out and starts going massively parallel—you can always throw more web servers at it cheaply, but you can't so easily fix database server resource saturation! So don't make the DB do more work than it should. It is a data access layer, not a presentation layer. Make it do the minimum amount of work possible. Write code for everything else. You will be glad you did.
Speed Tips for String Handling in a Web Application
Make sure your web string concatenation code doesn't suffer from Schlemiel the Painter's Algorithm. Either directly write to the output buffer as JSON is generated (Response.Write), or use a proper StringBuilder object, or write the parts of the JSON to an array and Join() it later. Don't do plain vanilla concatenation to a longer and longer string over and over.
Dereference objects as little as possible. I don't know your server-side language, but if it happens to be ASP Classic, don't use field names--either get a reference to each field in a variable or at the very least use integer field indexes. Dereferencing a field based on its name inside a loop is (much) worse performance.
Use pre-built libraries. Don't roll your own when you can use a tried and true library. Performance should be equal or better to your own and (most importantly) it will be tested and correct.
If you're going to spend the time doing this, make it abstract enough to handle converting any recordset, not just the one you have now.
Use compiled code. You can always get the fastest code when it is compiled, not interpreted. If you identify that the JSON-conversion routines are truly the bottleneck (and you MUST prove this for real, do not guess) then get the code into something that is compiled.
Reduce string lengths. This is not a big one, but if at all possible use one-letter json names instead of many-letter. For a giant recordset this will add up to savings on both ends.
Ensure it is GZipped. This is not so much a server-side improvement, but I couldn't discuss JSON performance without mentioning it for completeness.
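Several of these tips (write to an output stream instead of concatenating, use a tested library, one routine for any recordset, access fields by index) can be combined in a sketch like the following. It assumes .NET Core 3.0+ for System.Text.Json and that the middle tier already holds a DataTable; DataTableJson and its methods are hypothetical names, and the type mapping covers only common column types. Measure before and after, as advised above, rather than assuming it is faster.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

public static class DataTableJson
{
    // Writes every row of any DataTable as a JSON array of objects,
    // streaming into the output instead of building one growing string.
    public static void WriteJson(DataTable table, Stream output)
    {
        using var writer = new Utf8JsonWriter(output);
        writer.WriteStartArray();
        foreach (DataRow row in table.Rows)
        {
            writer.WriteStartObject();
            // Columns are read by integer index, not by name, inside the loop.
            for (int i = 0; i < table.Columns.Count; i++)
            {
                writer.WritePropertyName(table.Columns[i].ColumnName);
                switch (row[i])
                {
                    case DBNull _: writer.WriteNullValue(); break;
                    case int n: writer.WriteNumberValue(n); break;
                    case long n: writer.WriteNumberValue(n); break;
                    case decimal d: writer.WriteNumberValue(d); break;
                    case double d: writer.WriteNumberValue(d); break;
                    case bool b: writer.WriteBooleanValue(b); break;
                    case DateTime dt: writer.WriteStringValue(dt); break; // ISO 8601
                    case object other:
                        writer.WriteStringValue(Convert.ToString(other, CultureInfo.InvariantCulture));
                        break;
                }
            }
            writer.WriteEndObject();
        }
        writer.WriteEndArray();
        writer.Flush();
    }

    // Convenience wrapper for callers that really do need a string.
    public static string ToJsonString(DataTable table)
    {
        using var buffer = new MemoryStream();
        WriteJson(table, buffer);
        return Encoding.UTF8.GetString(buffer.ToArray());
    }
}
```

For the two rows from the question this produces [{"id":1342,"name":"row1"},{"id":3424,"name":"row2"}]. In ASP.NET Core you would likely want an async variant (FlushAsync) when writing directly to the response body, since synchronous writes there are disabled by default.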
Passing Dates in JSON
What I recommend is to use a separate JSON schema (itself in JSON, defining the structure of the virtual recordset to follow). This schema can be sent as a header to the "recordset" to follow, or it can be already loaded in the page (included in the base javascript files) so it doesn't have to be sent each time. Then, in your JSON parse callback (or post-callback on the final resultant object) look in the schema for the current column and do conversions as necessary. You might consider using ISO format since in ECMAScript 5 strict mode there is supposed to be better date support and your code can be simplified without having to change the data format (and a simple object detect can let you use this code for any browser that supports it):
Date
Dates are now capable of both parsing and outputting ISO-formatted dates.
The Date constructor now attempts to parse the date as if it was ISO-formatted, first, then moves on to the other inputs that it accepts.
Additionally, date objects now have a new .toISOString() method that outputs the date in an ISO format.
var date = new Date("2009-05-21T16:06:05.000Z");
print( date.toISOString() );
// 2009-05-21T16:06:05.000Z
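If the server side is C#, a sketch of emitting and reading back that same ISO 8601 shape, assuming values are kept in UTC; IsoDates is a hypothetical helper name. A custom format string is used rather than the round-trip "o" specifier because "o" emits seven fractional digits, while toISOString() emits three.

```csharp
using System;
using System.Globalization;

public static class IsoDates
{
    // Formats a DateTime in the same shape as JavaScript's toISOString(),
    // independent of the server's regional settings.
    public static string ToIsoUtc(DateTime value) =>
        value.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture);

    // Parses such a string back into a DateTime with Kind = Utc.
    public static DateTime FromIsoUtc(string iso) =>
        DateTime.Parse(iso, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}
```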

I wouldn't do it the way you are doing it (concatenating).
You can try creating a CLR SQL function that uses JSON.net and returns a varchar.
See here how to create SQL CLR Functions:
http://msdn.microsoft.com/en-us/library/w2kae45k(v=vs.80).aspx
Something like this (untested code)
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
    [SqlFunction(DataAccess = DataAccessKind.Read)]
    public static SqlString MyFunctionName(int id)
    {
        // Reading through the context connection requires DataAccess = Read.
        using (var cn = new SqlConnection("context connection=true"))
        {
            cn.Open();
            // Get your data into an object (maybe find it using the id passed in).
            var myObject = new { Name = "My Name" };
            return new SqlString(Newtonsoft.Json.JsonConvert.SerializeObject(myObject));
        }
    }
}

Related

Store c# object in sql server database

I would like to store a c# object in SQL server. I thought about the following options:
Read the object as a byte memory stream and save it into the database (but not readable in SQL)
JSON: readable and easy to convert, but what data type? (JSON support only exists from SQL 2016)
XML, a bit less readable, easy to convert, there is an XML dataType
What's the best practice to store a C# object in a sql column and why?
I am using SQL 2014, so I think option 3 is the best?
Edit:
Note: it's not data to query. I just want to load an object which I have cached into a C# object in memory, and perform some logic on that in C#. It takes a while to get the data from another database, so I save all my data in a custom object. Therefore I don't think I should use an ORM.
If it's just to throw in a database to read back at some point later by a key, then go with (2) and just use an nvarchar(max) field type.
If it's data to query, then you should probably design a schema to match and use an ORM.
If you are more positive towards option 2 (JSON), you can store the JSON-serialized string of any object [or datatype] in SQL Server as an NVARCHAR(MAX) field.
And when you want to read it, you can easily deserialize that string back into its original form.
e.g.
using Newtonsoft.Json;
Demo d1 = new Demo();
// Serialize and store this JSON string in the database (NVARCHAR(MAX) column).
string json = JsonConvert.SerializeObject(d1);
// Now, after reading the string back from the db:
Demo d2 = JsonConvert.DeserializeObject<Demo>(json);
I'd go for JSON serialisation. It's just text, so when storing things like "user profile settings" or other types of structured data you're covered, as you can read and write JSON in any language. SQL Server has caught on to this too: much like the XML support that was such a hype 8-10 years ago, you can now store JSON with a good deal of T-SQL support for those who need to update the data, like when you need to fix a value for all users where...
Anyway, have a look at the article "JSON in SQL Server 2016-2017".
When going to and from JSON you should test your properties, as some data types might not round-trip cleanly depending on region-specific settings, e.g. for date and decimal values.
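Such a round-trip test is cheap to automate. A sketch using System.Text.Json (a swap-in for the JsonConvert calls above, chosen because it ships with .NET Core 3.0+; Newtonsoft.Json can be tested the same way), where CachedSnapshot and SnapshotStore are hypothetical names. System.Text.Json writes numbers culture-invariantly and DateTime values as ISO 8601, which sidesteps the regional-settings problem.

```csharp
using System;
using System.Text.Json;

// Hypothetical cached object; any class with public get/set properties works.
public sealed class CachedSnapshot
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
    public DateTime TakenUtc { get; set; }
}

public static class SnapshotStore
{
    // The string that would be written to an NVARCHAR(MAX) column.
    public static string Save(CachedSnapshot snapshot) => JsonSerializer.Serialize(snapshot);

    // Rebuilds the object from the string read back from that column.
    public static CachedSnapshot Load(string json) =>
        JsonSerializer.Deserialize<CachedSnapshot>(json)
            ?? throw new InvalidOperationException("Stored JSON was null.");
}
```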

How can I access a Properties.Settings.Default property if I have its name as a string?

I have a number of properties in Properties.Settings.Default whose names all start with "store" and an integer number. These numbers follow in sequence, and what I would like to do after the method is fired off is to increase the number in the property name, i.e. from "store1" to "store2".
I keep getting an "identifier expected" error. I'm rather new at programming, so any help would be appreciated.
public void store()
{
storename1.ForeColor = Color.Orange;
if (File.Exists(Filedestination))
{
File.Delete(Filedestination);
}
NumberOfScales = Properties.Settings.Default.("store"+ Convert.ToString(storeNumber) + "NrOfScales");
StartRange = EndRange - Properties.Settings.Default.DegrendelNrOfScales;
IPRange = Properties.Settings.Default.DegrendelIPRange;
CurrentRange = StartRange;
PingScales();
}
I don't even know how I can read a property with the name ("store" + Convert.ToString(storeNumber) + "NrOfScales"). If I knew how to do that, it would shorten the code by at least 9/10ths as I would not have to redo this for every single instance of all the stores that I have. Is there any way I can get this to work?
At first glance, it seems like you possibly chose the wrong place to store your data. Is there any particular reason why you are using Windows Forms' application settings (Settings) to store data?
If you really want to do it that way, IIRC you can access a setting by its name using Properties.Settings.Default["PropertyName"] (where you can substitute "PropertyName" with any expression that yields a string, e.g. "store" + Convert.ToString(storeNumber) + "NrOfScales", or more succinctly in Visual Studio 2015 or later, $"store{storeNumber}NrOfScales"). You will get back an object that you'll have to cast to whatever type of values you stored in there, e.g.:
var numberOfScales = (int)Properties.Settings.Default[$"store{storeNumber:D}NrOfScales"];
Some hints about syntax used here:
The [] syntax is called an "indexer".
$"…" is for string interpolation. It often allows for neater concatenation of strings than by using +.
The D (decimal) format specifier used in $"…{…:D}…" makes sure that storeNumber will be formatted as a decimal without any thousands/decimal separators.
Now, back to my initial question, if you're open to other means of storing data, let me point out a few alternatives:
If you only need the data during one single execution of your program, i.e. the data does not need to be persisted from one run of the program to the next, then a Dictionary<string, int> might be sufficient. Dictionaries allow you to associate int values with string keys and look them up by these strings.
If your data is actually user content / business data, then don't store it as "application settings". At the least, store the data to a simple file (possibly to Isolated Storage) using the facilities under System.IO (File.Create, File.Open, StreamWriter, etc.). If you want to store structured data, you could make use of relational databases (see e.g. SQLite, SQL Server Compact, or SQL Server) or document databases.
If the data you're storing is in fact data that influences the setup / configuration of your application, then your current use of application settings might be fine.
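For the first alternative, a sketch of an in-memory Dictionary<string, int> keyed by the same names the settings used. StoreSettings and its methods are hypothetical, and nothing here is persisted between runs.

```csharp
using System.Collections.Generic;

public static class StoreSettings
{
    // Keys mirror the setting names, e.g. "store1NrOfScales", "store2NrOfScales".
    private static readonly Dictionary<string, int> NrOfScales = new Dictionary<string, int>();

    // Builds the key with string interpolation and the D format specifier.
    public static string KeyFor(int storeNumber) => $"store{storeNumber:D}NrOfScales";

    public static void SetNrOfScales(int storeNumber, int value) =>
        NrOfScales[KeyFor(storeNumber)] = value;

    // TryGetValue avoids an exception when a store has not been configured.
    public static bool TryGetNrOfScales(int storeNumber, out int value) =>
        NrOfScales.TryGetValue(KeyFor(storeNumber), out value);
}
```

A loop such as for (int store = 1; store <= 9; store++) can then replace one hand-written block per store.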

Is it safe to not parameterize an SQL query when the parameter is not a string?

In terms of SQL injection, I completely understand the necessity to parameterize a string parameter; that's one of the oldest tricks in the book. But when can it be justified to not parameterize an SqlCommand? Are any data types considered "safe" to not parameterize?
For example: I don't consider myself anywhere near an expert in SQL, but I can't think of any cases where it would be potentially vulnerable to SQL injection to accept a bool or an int and just concatenate it right into the query.
Is my assumption correct, or could that potentially leave a huge security vulnerability in my program?
For clarification, this question is tagged c# which is a strongly-typed language; when I say "parameter," think something like public int Query(int id).
I think it's safe... technically, but it's a terrible habit to get into. Do you really want to be writing queries like this?
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE IsAlive = " + isAlive +
" AND FirstName = @firstName");
sqlCommand.Parameters.AddWithValue("@firstName", "Rob");
It also leaves you vulnerable in the situation where a type changes from an integer to a string (think of an employee number which, despite its name, may contain letters).
So, we've changed the type of EmployeeNumber from int to string, but forgot to update our sql queries. Oops.
When using a strongly-typed platform on a computer you control (like a web server), you can prevent code injection for queries with only bool, DateTime, or int (and other numeric) values. What is a concern is performance: forcing SQL Server to re-compile every query, and preventing it from getting good statistics on which queries are run with what frequency (which hurts cache management).
But that "on a computer you control" part is important, because otherwise a user can change the behavior used by the system for generating strings from those values to include arbitrary text.
I also like to think long-term. What happens when today's old-and-busted strongly-typed code base gets ported via automatic translation to the new-hotness dynamic language, and you suddenly lose the type checking, but don't have all the right unit tests yet for the dynamic code?
Really, there's no good reason not to use query parameters for these values. It's the right way to go about this. Go ahead and hard-code values into the sql string when they really are constants, but otherwise, why not just use a parameter? It's not like it's hard.
Ultimately, I wouldn't call this a bug, per se, but I would call it a smell: something that falls just short of a bug by itself, but is a strong indication that bugs are nearby, or will be eventually. Good code avoids leaving smells, and any good static analysis tool will flag this.
I'll add that this is not, unfortunately, the kind of argument you can win straight up. It sounds like a situation where being "right" is no longer enough, and stepping on your co-workers' toes to fix this issue on your own isn't likely to promote good team dynamics; it could ultimately hurt more than it helps. A better approach in this case may be to promote the use of a static analysis tool. That would give legitimacy and credibility to efforts aimed at going back and fixing existing code.
In some cases, it IS possible to perform an SQL injection attack with non-parameterized (concatenated) variables other than string values; see this article by Jon Skeet: http://codeblog.jonskeet.uk/2014/08/08/the-bobbytables-culture/ .
The thing is that when ToString is called, a custom culture provider can transform a non-string parameter into a string representation that injects some SQL into the query.
This is not safe even for non-string types. Always use parameters. Period.
Consider the following code example:
var utcNow = DateTime.UtcNow;
var sqlCommand = new SqlCommand("SELECT * FROM People WHERE created_on <= '" + utcNow + "'");
At first glance the code looks safe, but everything changes if you make some changes in Windows Regional Settings and add an injection to the short date format:
Now resulting command text looks like this:
SELECT * FROM People WHERE created_on <= '26.09.2015' OR '1'<>' 21:21:43'
The same can be done for the int type, as a user can define a custom negative sign which can easily be turned into SQL injection.
One could argue that invariant culture should be used instead of current culture, but I have seen string concatenations like this so many times and it is quite easy to miss when concatenating strings with objects using +.
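A small reproduction of the negative-sign case, assuming .NET Core or later; CultureInjectionDemo is a hypothetical name. It only builds strings and never touches a database. The invariant-culture call shows the mitigation mentioned above, although parameters remain the recommended fix.

```csharp
using System;
using System.Globalization;

public static class CultureInjectionDemo
{
    // A writable copy of the invariant culture whose negative sign contains SQL.
    public static CultureInfo HostileCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
        return culture;
    }

    // The risky pattern: concatenating an int formatted with some culture.
    public static string BuildQuery(int id, IFormatProvider culture) =>
        "SELECT * FROM People WHERE Id = " + id.ToString(culture);
}
```

With the hostile culture, an id of -5 yields SELECT * FROM People WHERE Id = 1 OR 1=1 OR 1=5, which matches every row; with CultureInfo.InvariantCulture it yields Id = -5.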
"SELECT * FROM Table1 WHERE Id=" + intVariable.ToString()
Security
It is OK.
Attackers can not inject anything in your typed int variable.
Performance
Not OK.
It's better to use parameters, so the query is compiled once and cached for later use. Next time, even with different parameter values, the cached query is reused and doesn't need to be compiled again on the database server.
Coding Style
Poor practice.
Parameters are more readable.
Skipping parameters may get you used to queries without them; then one day you might make a mistake and use a string value this way, and you can probably say goodbye to your data. Bad habit!
"SELECT * FROM Product WHERE Id=" + TextBox1.Text
It is not your question, but it may be useful for future readers:
Security
Disaster!
Even when the Id field is an integer, your query may be subject to SQL injection. Suppose you have a query in your application "SELECT * FROM Table1 WHERE Id=" + TextBox1.Text. An attacker can enter 1; DELETE Table1 into the text box, and the query will be:
SELECT * FROM Table1 WHERE Id=1; DELETE Table1
If you don't want to use a parameterized query here, you should use typed values, formatted with the invariant culture:
string.Format(CultureInfo.InvariantCulture, "SELECT * FROM Table1 WHERE Id={0}", int.Parse(TextBox1.Text))
Your Question
My question arose because a coworker wrote a bunch of queries concatenating integer values, and I was wondering whether it was a waste of my time to go through and fix all of them.
I think changing that code is not a waste of time; indeed, the change is recommended! If your coworker uses int variables there is no security risk, but changing that code makes it more readable, more maintainable, and faster to execute.
There are actually two questions in one. And question from the title has very little to do with concerns expressed by the OP in the comments afterwards.
Although I realize that for the OP it is their particular case that matters, for readers coming from Google it is important to answer the more general question, which can be phrased as "is concatenation as safe as prepared statements if I made sure that every literal I am concatenating is safe?". So, I would like to concentrate on this latter one. And the answer is
Definitely NO.
The explanation is not as direct as most readers would like, but I'll try my best.
I have been pondering the matter for a while, which resulted in an article (though based on the PHP environment) where I tried to sum everything up. It occurred to me that the question of protection from SQL injection often drifts toward related but narrower topics, like string escaping, type casting and such. Although some of these measures can be considered safe when taken by themselves, there is no system, nor a simple rule to follow. That makes it very slippery ground, putting too much weight on the developer's attention and experience.
The question of SQL injection cannot be simplified to a matter of some particular syntax issue. It is wider than the average developer tends to think. It's a methodological question as well: it is not only "Which particular formatting do we have to apply?" but "How does it have to be done?" as well.
(From this point of view, the article from Jon Skeet cited in another answer does more harm than good, as it again nitpicks on an edge case, concentrating on a particular syntax issue and failing to address the problem as a whole.)
When you try to address protection not as a whole but as a set of different syntax issues, you face a multitude of problems.
the list of possible formatting choices is really huge, which means one can easily overlook some, or confuse them (for example, by using string escaping for an identifier).
Concatenation means that all protection measures have to be done by the programmer, not the program. This issue alone leads to several consequences:
such formatting is manual, and manual means extremely error-prone. One could simply forget to apply it.
moreover, there is a temptation to move formatting procedures into some centralized function, messing things up even more and spoiling data that is not going to the database.
when more than one developer is involved, problems multiply by a factor of ten.
when concatenation is used, one cannot tell a potentially dangerous query at a glance: they are all potentially dangerous!
Unlike that mess, prepared statements are indeed The Holy Grail:
it can be expressed in the form of one simple rule that is easy to follow.
it is essentially an undetachable measure, which means the developer cannot interfere and, willingly or unwillingly, spoil the process.
protection from injection is really only a side effect of prepared statements, whose real purpose is to produce a syntactically correct statement. And a syntactically correct statement is 100% injection-proof. Yet we need our syntax to be correct regardless of any injection possibility.
if used all the way around, it protects the application regardless of the developer's experience. Say, there is a thing called second order injection. And a very strong delusion that reads "in order to protect, Escape All User Supplied Input". Combined together, they lead to injection, if a developer takes the liberty to decide, what needs to be protected and what not.
(Thinking further, I discovered that the current set of placeholders is not enough for real-life needs and has to be extended, both for complex data structures like arrays and for SQL keywords or identifiers, which sometimes have to be added to the query dynamically too. A developer is left unarmed for such cases and forced to fall back to string concatenation, but that's a matter for another question.)
Interestingly, this question's controversy is provoked by the very controversial nature of Stack Overflow. The site's idea is to use particular questions from users who ask directly in order to build a database of general-purpose answers suitable for users who come from search. The idea is not bad per se, but it fails in a situation like this: a user asks a very narrow question, particularly to win a dispute with a colleague (or to decide whether it is worth refactoring the code), while most experienced participants are trying to write an answer with the mission of Stack Overflow as a whole in mind, making it good for as many readers as possible, not the OP only.
Let's not just think about security or type-safe considerations.
The reason you use parametrized queries is to improve performance at the database level. From a database perspective, a parametrized query is one query in the SQL buffer (to use Oracle's terminology although I imagine all databases have a similar concept internally). So, the database can hold a certain amount of queries in memory, prepared and ready to execute. These queries do not need to be parsed and will be quicker. Frequently run queries will usually be in the buffer and will not need parsing every time they are used.
UNLESS
Somebody doesn't use parametrized queries. In this case, the buffer gets continually flushed through by a stream of nearly identical queries each of which needs to be parsed and run by the database engine and performance suffers all-round as even frequently run queries end up being re-parsed many times a day. I have tuned databases for a living and this has been one of the biggest sources of low-hanging fruit.
NOW
To answer your question, IF your query has a small number of distinct numeric values, you will probably not be causing issues and may in fact improve performance infinitesimally. IF however there are potentially hundreds of values and the query gets called a lot, you are going to affect the performance of your system so don't do it.
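To make the buffer argument concrete, a tiny sketch that only counts distinct statement texts; it does not talk to a database or model a real cache. The Orders table, the @customerId placeholder (SQL Server syntax) and the PlanCacheIllustration name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlanCacheIllustration
{
    // Concatenation: every distinct id produces a distinct statement text,
    // and each one has to be parsed and cached separately.
    public static IEnumerable<string> ConcatenatedTexts(IEnumerable<int> ids) =>
        ids.Select(id => "SELECT * FROM Orders WHERE CustomerId = " + id);

    // Parameters: the text never changes; only the bound value does.
    public static IEnumerable<string> ParameterizedTexts(IEnumerable<int> ids) =>
        ids.Select(_ => "SELECT * FROM Orders WHERE CustomerId = @customerId");

    public static int DistinctCount(IEnumerable<string> texts) => texts.Distinct().Count();
}
```

For 500 different customer ids, the concatenated version produces 500 distinct texts competing for the buffer, while the parameterized version produces one.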
Yes you can increase the SQL buffer but it's always ultimately at the expense of other more critical uses for memory like caching Indexes or Data. Moral, use parametrized queries pretty religiously so you can optimize your database and use more server memory for the stuff that matters...
To add some info to Maciek's answer:
It is easy to alter the culture info of a .NET third party app by calling the main-function of the assembly by reflection:
using System;
using System.Globalization;
using System.Reflection;
using System.Threading;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
Assembly asm = Assembly.LoadFile(@"C:\BobbysApp.exe");
MethodInfo mi = asm.GetType("Test").GetMethod("Main");
mi.Invoke(null, null);
Console.ReadLine();
}
static Program()
{
InstallBobbyTablesCulture();
}
static void InstallBobbyTablesCulture()
{
CultureInfo bobby = (CultureInfo)CultureInfo.InvariantCulture.Clone();
bobby.DateTimeFormat.ShortDatePattern = @"yyyy-MM-dd'' OR ' '=''";
bobby.DateTimeFormat.LongTimePattern = "";
bobby.NumberFormat.NegativeSign = "1 OR 1=1 OR 1=";
Thread.CurrentThread.CurrentCulture = bobby;
}
}
}
This only works if the Main function of BobbysApp is public. If Main is not public, there might be other public functions you might call.
In my opinion, if you can guarantee that the parameter you are working with will never contain a string, it is safe, but I would not do it in any case. Also, you will see a slight performance drop due to the fact that you are performing concatenation. The question I would ask you is: why don't you want to use parameters?
It is OK but never safe. Security always depends on the inputs: for example, if the input comes from a TextBox, attackers can do something tricky since the text box accepts any string, so you have to add some kind of validation/conversion to prevent wrong input. But the thing is, it is not safe. As simple as that.
No, you can get an SQL injection attack that way. I have written an old article in Turkish which shows how. The article's example uses PHP and MySQL, but the concept works the same in C# and SQL Server.
Basically, the attack works as follows. Let's say you have a page which shows information according to an integer id value, and you do not parameterize this value, like below.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=24
Okay, assuming you are using MySQL, I attack it the following way.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII((SELECT%20DATABASE()))
Note that here the injected value is not a string. We are converting a char value to an int using the ASCII function. You can accomplish the same thing in SQL Server using "CAST(YourVarcharCol AS INT)".
After that, I use length and substring functions to find out your database name.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=LEN((SELECT%20DATABASE()))
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII(SUBSTR(SELECT%20DATABASE(),1,1))
Then using database name, you start to get table names in database.
http://localhost/sqlEnjeksiyon//instructors.aspx?id=ASCII(SUBSTR((SELECT table_name FROM INFORMATION_SCHEMA.TABLES LIMIT 1),1,1))
Of course, you have to automate this process, since you only get ONE character per query. But you can easily automate it; my article shows one example in Watir. Using only one page and a non-parameterized ID value, I can learn every table name in your database. After that I can look for important tables. It will take time, but it is doable.

Using c# to generate and insert 100000s of records into a postgres database

I am programming a project in C# where many records are generated and need to be stored in a database. At the moment what I do (which is VERY slow) is store all of these results as a list of structs, then at the end iterate through this list and add all of the records to an SQL query string. The issue with this is that it takes ages to iterate through a list when it contains 100000s of items. Inserts of a similar size need to be performed several times in the simulation. I've considered building the string from the start, putting each record into the string directly rather than storing the records in a list. Another option might be storing them in a temporary file and using SQL COPY. I don't have much experience dealing with this amount of data, so your feedback will be appreciated.
Thanks in advance
What you should try is populating a file with your data then using the built in COPY command. This is the recommended method of populating a database.
http://www.postgresql.org/docs/8.3/interactive/sql-copy.html
When building the CSV temp file, take care to follow the CSV spec. If your column data contains newlines (\n \r), commas (,) or quotes ("), then escape the quotes (") with quotes:
data=data.Replace("\"", "\"\"");
and surround the data with quotes
data="\""+data+"\"";
Something like
public String CSVEscape(String columnData)
{
if(columnData.Contains("\n") || columnData.Contains("\r") || columnData.Contains("\"") || columnData.Contains(","))
{
return "\"" + columnData.Replace("\"", "\"\"") + "\"";
}
return columnData;
}
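A self-contained version that applies the same escaping rule and writes one line per record to any TextWriter, instead of building one huge string in memory. CsvExport is a hypothetical name, and every field is treated as non-null text; note that in PostgreSQL's CSV format an unquoted empty string means NULL by default, so check the COPY documentation for your version before relying on empty values.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvExport
{
    // Same rule as CSVEscape: quote a field only when it contains a newline,
    // carriage return, double quote or comma, doubling any inner quotes.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { '\n', '\r', '"', ',' }) >= 0)
            return "\"" + field.Replace("\"", "\"\"") + "\"";
        return field;
    }

    // Streams rows to a file (or any TextWriter) one line at a time.
    public static void WriteRows(TextWriter writer, IEnumerable<IEnumerable<string>> rows)
    {
        foreach (var row in rows)
        {
            writer.Write(string.Join(",", row.Select(Escape)));
            writer.Write('\n');
        }
    }
}
```

The resulting file can then be loaded with a single COPY ... WITH CSV statement.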
If I'm reading your question correctly, you're sending the PostgreSQL server a string that looks something like this:
INSERT INTO mytable (x, y, z) VALUES (1,2,3), (4,5,6), ...
What you should do instead is
start a transaction
prepare the statement INSERT INTO mytable (x, y, z) VALUES ($1, $2, $3)
for each struct in your list, execute the prepared statement with the appropriate fields
commit the transaction
(Sorry, no code, because I don't know C#'s DB APIs.)
I wouldn't bother figuring out COPY IN unless the approach I described above is still way too slow. I get nervous when inserting data into a database requires any sort of text munging on my part.
If you get low performance using an OOP approach (structs/classes), the first thing to do is to measure and optimize the code as much as possible.
If, even after optimization, the performance is still not good enough in your specific context, I would leave the OOP approach and switch to raw SQL.
One solution could be, as you said in your post, to append the string for every single entity to a big file as soon as it is generated, so that at the end of generation you have the complete, huge SQL string. The problem here is the testability of the solution.
But, you know, somewhere you need to "pay". You cannot have both comfort and performance at the same time on such a scale.

How do YOU parse user input?

Consider the following scenario:
http://www.yourdomain.com/Default.aspx?p=2
Now we of course want to check that the querystring parameter p doesn't contain errors.
I now have this setup:
1) Check if p exists
2) Filter out html from p's value
3) htmlencode p's value
4) check if p is integer
5) check if p's integer exists in db
This is how I usually do it, though step 5 is of course a performance hit.
Kind regards,
Mark
My view: Generally a querystring parameter of this kind isn't really "entered" by users but is submitted as a link. So over-complex slow validation isn't really necessary.
So I would just pass this through to the persistence / data layer and handle any errors that come back as a regular 404 Not Found or 500 Internal Server Error depending on the kind of system I'm working with.
If your intent is to use the parameter to retrieve something from the database, why filter out HTML or encode it? It's not like you're going to store it in the database or display it on the front end. Just immediately pass it to the DAL if it exists. Your DAL should be smart enough to tell you if it failed to retrieve a record with that ID, or if the ID couldn't be parsed, etc.
If you are going to convert the input to an integer anyway, then steps 2 and 3 are not needed; just use int.TryParse to see what you have. I would encode and test the input for HTML only if you are expecting a string which you will use in a dynamic SQL statement, or will be displaying on your site.
What about:
int p = 0;
if(!Int32.TryParse(Request.QueryString["p"], out p))
throw new ArgumentOutOfRangeException("p");
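A slightly stricter variant of the same idea, assuming page numbers are positive integers; QueryStringParsing.TryGetPageNumber is a hypothetical helper. NumberStyles.None with the invariant culture rejects signs, whitespace, and culture-specific formatting, so steps 1 through 4 of the question collapse into one call, and step 5 can be left to the query that loads the page. It takes the raw string (e.g. Request.QueryString["p"]) so it can be tested without a web request.

```csharp
using System.Globalization;

public static class QueryStringParsing
{
    // Accepts digits only: null, "", " 2", "2<b>", "2.5", "-1" and "0" are rejected.
    // No HTML filtering or encoding is needed because the value is never used as text.
    public static bool TryGetPageNumber(string raw, out int page) =>
        int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out page) && page > 0;
}
```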
Quite simple. For most data types (integers, decimals, doubles, dates and booleans) there is a very strict format. If the value does not parse under the strict format, it's an error.
Strings sometimes have a strict format, like an email address or a phone number. Those can be validated with a simple regexp. If it conforms, use it, otherwise it's an error.
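A sketch of such a strict check, using a US ZIP code as a hypothetical example format; StrictFormats is a hypothetical name. The pattern uses [0-9] rather than \d because \d in .NET also matches non-ASCII Unicode digits, and \A...\z rather than ^...$ because $ also matches before a trailing newline.

```csharp
using System.Text.RegularExpressions;

public static class StrictFormats
{
    // Hypothetical strict format: "12345" or "12345-6789", and nothing else.
    private static readonly Regex ZipCode =
        new Regex(@"\A[0-9]{5}(-[0-9]{4})?\z", RegexOptions.CultureInvariant);

    public static bool IsZipCode(string value) => value != null && ZipCode.IsMatch(value);
}
```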
Most of the time, however, strings simply need to be persisted to the DB and later displayed again. In that case no processing is needed, aside from escaping when inserting into the DB (unnecessary as well if you use parameterized queries), and HTML-encoding when rendering to the display.
This way any and all data is validated, and there is no risk of any injections whatsoever.
The rare exception of a loose format for a string is, well... rare. I can't think of any right now. For that you can afford some more extensive parsing and processing.
Added: Oh, yes, checking whether IDs (or other values) are valid with respect to a DB. You're doing it right, but think about whether you always need it. Quite often you can put the check into some other query that you have to do anyway. Like when you select data based on the ID, you don't need to explicitly check that it exists; just be ready for your query to return no data.
Sometimes you don't need to use the value at all, then you can simply ignore it.
But, of course, there are other times, like when inserting/updating data, that you indeed need to explicitly check whether the data exists and is valid in the current context.
