Get dependencies from SQL query - c#

I'm trying to do some SQL query validation programmatically in C# (without invoking the actual database). Essentially, I'd like a user to be able to enter a view, UDF, or SP and have its dependencies validated immediately. The user would be entering these into a custom tool for defining database objects.
Thus, if a user entered:
CREATE VIEW someView AS SELECT name, address FROM users
I could pull out the dependency of "users" and then check against my database object collections that are stored in memory (e.g., Tables, Views, etc...) to make sure that dependency exists in one of them. Keep in mind the actual views/UDFs/SPs entered into my custom app are very complex and parsing them myself is not desirable.
I'm currently trying to do this using Microsoft.Data.Schema.ScriptDom.Sql.TSql100Parser. This provides a parse method which returns a DOM representation of the query. However, this is a terribly complex DOM and I'm essentially having to write an entire parser just for it.
Any ideas/suggestions? Thanks!

There is a stored procedure named sp_depends that you can use. However, it is not failsafe because of deferred name resolution. There really is no safe way to do this; it has gotten a little better in the latest versions, but it is still a pain in the neck.
Read "Do you depend on sp_depends (no pun intended)" to see what I mean.
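For the in-memory check the question describes, walking the DOM by hand may not be necessary. The sketch below assumes the newer Microsoft.SqlServer.TransactSql.ScriptDom NuGet package, whose TSqlFragmentVisitor base class visits every node for you; the older Microsoft.Data.Schema.ScriptDom assembly named in the question may differ. The names TableReferenceCollector, DependencyChecker, and FindMissing are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom; // assumption: NuGet package of the same name

// Collects the name of every table or view referenced in a parsed script.
public sealed class TableReferenceCollector : TSqlFragmentVisitor
{
    public HashSet<string> Names { get; } =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Called once for each named table reference anywhere in the tree.
    public override void Visit(NamedTableReference node)
    {
        Identifier name = node.SchemaObject?.BaseIdentifier;
        if (name != null) Names.Add(name.Value);
    }
}

public static class DependencyChecker
{
    // Returns referenced names that are absent from the in-memory collection.
    public static List<string> FindMissing(string sql, ISet<string> knownObjects)
    {
        var parser = new TSql100Parser(true); // true = QUOTED_IDENTIFIER on
        TSqlFragment tree = parser.Parse(new StringReader(sql), out IList<ParseError> errors);
        if (errors.Count > 0)
            throw new FormatException("SQL did not parse: " + errors[0].Message);

        var collector = new TableReferenceCollector();
        tree.Accept(collector);

        var missing = new List<string>();
        foreach (string name in collector.Names)
            if (!knownObjects.Contains(name)) missing.Add(name);
        return missing;
    }
}
```

Calling DependencyChecker.FindMissing with the CREATE VIEW statement from the question and a set containing "users" should return an empty list. Note that CTE names and temp tables also show up as named table references, so a real tool would need to filter those out.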

Related

Is it a good practice to allow users to embed raw sql queries in an application?

I tried my best to come up with an appropriate title; apologies if it doesn't make much sense. I hope to explain it better below. We have an application that is based on the .NET Framework and uses SQL as data storage. As with any application, this application needs to support extensibility, for example to support additional data transformations & validations.
Example: Think of the application as a tool that provides a set of input tables the user can import data into using access or excel (there is UI & grids already) to allow data import. Once the data is imported the tool creates an intermediate model and performs some calculations on the input data and later throws the results in pre-defined format. The input table schema and intermediate model schema is fixed, no changes will happen. The extensibility is needed in the stage where the intermediate model is derived from the input data. Allow users to be able to change the way the data gets derived, eg instead of grouping by one field allow grouping by multiple fields etc.
To support this kind of flexibility, I see two basic approaches:
Option 1: Create a business model that maps to the sql data model and expose the business model to the user to allow them to override transformations & create new transformations (e.g., using LINQ or plain C#)
Option 2: Expose the entire sql data model and allow users to embed raw SQL queries to perform the transformations & validations.
My personal preference is to use Option 1, since I am not a big fan of allowing users to play with underlying data tables directly. I prefer more controlled access. However, this approach requires the user to know a programming language (C# or VB). On the other hand, Option 2 may just need someone who has knowledge of SQL programming to create raw queries and directly plug them into the application. But I think this is a bad approach.
The product management team is inclined to go with Option 2, as they feel it is more flexible and easy to implement from resource point of view.
So, I am trying to come up with the pros and cons of both approaches to better support my inclination for Option 1, that is, doing the work in a programming language such as C# or VB.NET rather than just using plain SQL queries.
Kindly, share your thoughts and opinions.
Neither option is a good one.
End users are (theoretically) proficient in the business logic - they shouldn't need to be proficient in a programming language to do their job.
What you should do is create a framework for extending things using standard controls - combo boxes, text input, grids, etc. It's hard to give you specific advice without specifics of what kind of extensibility you're looking for, but I'll give you an example from my projects:
Our users need a way to add arbitrary tags to products, for filtering on our website. We created a data grid where they could type in the name of a type, specify whether it's a "True/False", integer, decimal, or a value off of a list, and then set the items for said list. Then, on each product, they are presented with the list of all the applicable types and they need to fill in values - the "True/False" produces a checkbox, the integer and decimal produce text fields which validate, and the list produces a combo box of all the options they specified. Whenever they want a new property, they can go add it themselves, but they don't need to think about how those properties work at all, because the website operates based on the type.
Ok, based on your example, I would suggest this:
Provide a form which lists every column of the data and has a combo box next to it to specify what action to take. The actions can be chosen from a list which contains things like:
Ignore data
Use data as-is
Format data
Look up value elsewhere
Perform calculation
Based on the user's choice in this dropdown, you would:
Not include the column in the output
Use the data as-is.
Provide a button to open a form where they can format the data (probably with a subset of the String.Format options) based on the type of the data. You'd show a key on the bottom to show what values are supported.
Provide a button to specify what "elsewhere" is. This is probably where extensibility will be most useful, so I'll address it again below
Provide a way to enter an appropriate calculation.
As for the "elsewhere" lookup, this would probably be mostly listings of values and strings to transform it with. Those values can be specified in a configuration section of the application, and stored in a table somewhere. You can create arbitrary groupings of them to represent various "lists" of options. Alternatively, you could list some existing data tables which you would want the users to reference, and let them specify a transformation (using the calculation screen) to convert their value to a lookup on the other table.
Is this making sense?

Returning Good Errors from UpdateDataSet

EDIT: Solution (kind of)
So, what I did had very little in common with what I originally wanted to do, but my application now works much faster (DataSets that took upward of 15 minutes to process now go through in 30-40 seconds tops). Here's roughly what I did:
- Read spreadsheet & populate DataTable/DataSet normally
- [HACK WARNING] Instead of using UpdateDataSet, I generate my own SQL queries, mostly by having a skeleton string for each type of update (e.g. String skeleton = "UPDATE ... SET ... WHERE ..."). I then consult the template database and replace the placeholder ... with the appropriate entries.
- [MORE HACK WARNING] The way I dealt with errors was by manually checking whether those errors will occur. So if I know I am about to do an insert, I'll run an error-checking command before the actual insert; what the error checker will do is construct a JOIN statement, checking whether any of the entries in the user's DataSet already exist in the database. Just by executing the JOIN command, I get back a DataSet with the results, so I know that if there is anything there, it's the errors. Then I can proceed to print them.
If anyone needs more details, I'll be happy to provide them. It's a fairly specific question, so I should probably keep this outline fairly high level.
Original Question
For (good) reasons outside of my control, I need to use the Database.UpdateDataSet() method from Microsoft's Enterprise Library. The way my project will work, I am letting the user make changes to the database (multiple databases, multiple schemas, multiple tables, but always only one at a time) by uploading Excel spreadsheets to a web application. The spreadsheets follow a design/template specified by me (usually). I am at a stage where I read the spreadsheet, turn it into a DataTable/DataSet, and use (dynamically generated) prepared statements to make the appropriate changes to the database. Here's the problem:
Each spreadsheet only allows for one type of change (insert/update/delete). I want to make it so if the user uploads an insert spreadsheet, but several (let's say 10) of the entries are already in the database, I not only return with an error, but also tell them which entries (DataRows) violated the primary key constraint.
The ideal solution would be to get a DataSet with the list of errors back, but I don't see how I can do that. Perhaps there is a way to construct the prepared statements in such a way that if a DataRow is to be inserted (following the example from above), it proceeds normally; however, if it attempts to update or delete, it skips it and adds it to an error collection of some sort?
Note that I am trying to avoid using stored procedures. Since the number of different templates will grow extremely quickly after deployment, it is important that I stay away from manually written code and close to database-driven model as much as possible.

How do I handle user submitted data to ensure best practice programming?

I would like to know the best programming practices for handling data that users submit through a web form to a website.
I am particularly interested in any C# or VB.NET commands that should be used throughout the process, from the moment the user hits the submit button until the data hits the database.
I have been reading about why you may want to take precautions against attacks such as SQL injection.
Avoiding SQL injection is quite simple - just use parameterized queries, or an ORM such as LINQ to SQL or NHibernate (which all use parameters under the hood). The library takes care of everything for you, and has been thoroughly vetted.
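As a minimal sketch of a parameterized query, assuming the SQL Server ADO.NET provider (System.Data.SqlClient) and a hypothetical Users table with Id, Name, and Email columns:

```csharp
using System.Data;
using System.Data.SqlClient; // assumption: SQL Server provider; other ADO.NET providers work similarly

public static class UserQueries
{
    // Builds a command whose text never contains the user's input.
    // The table and column names are hypothetical.
    public static SqlCommand FindByEmail(SqlConnection connection, string email)
    {
        var command = new SqlCommand(
            "SELECT Id, Name FROM Users WHERE Email = @email", connection);
        // The value travels separately from the SQL text, so quotes or
        // semicolons inside it are treated as data, not as SQL.
        command.Parameters.Add("@email", SqlDbType.NVarChar, 256).Value = email;
        return command;
    }
}
```

Whatever the user types, the command text stays the same fixed string; only the parameter value changes.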
After that, you're safe until it's time to write the data back out to other users. You always want to store the data as close to the original user input as possible. Another way to say this is - don't store a scrubbed version (unless you also store the original alongside it). Scrubbing is a one-way process - it destroys information. It's always easy to scrub again if you need to, but you can't un-scrub something.
However, storing the original format means you do need to make sure you encode the output before you write it to the browser. This prevents users from putting malicious cross-site scripts and other things into your data that might be rendered on other users' pages.
At the highest level, just keep in mind that all the work should be done as late as possible. Be liberal in what you accept (do only what is necessary to protect yourself) and strict in what you send (encode everything, scrub the hell out of it, transform it, etc). You want to have a "pure" copy which is altered to conform to the target output.
If you are serious about it read this book: 19 Deadly Sins of Software Security
Using linq2sql you get protection from SQL injection. Alternatively use .Parameters with parametrized queries.
When you send the data back on the page, you have to prevent the data from running JavaScript by encoding it. Use http://msdn.microsoft.com/en-us/library/w3te6wfz.aspx
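A small sketch of encoding on output, using System.Net.WebUtility.HtmlEncode as one option (it is an assumption that this fits your setup; the page linked above may describe a different HtmlEncode overload). The OutputEncoding class name is made up for illustration.

```csharp
using System.Net;

public static class OutputEncoding
{
    // Stored text is kept as the user typed it; encoding happens only on the way out.
    public static string ToHtml(string storedText)
    {
        return WebUtility.HtmlEncode(storedText ?? string.Empty);
    }
}
```

With this, markup characters such as < and > in stored text are written to the page as entities rather than as live tags.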
And overall, consider any use of that data a chance for attack and look for ways to prevent it. For example, using user data as a filename to access/save something can mean access to unintended resources (by adding ..\).
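One way to guard against the ..\ case is to resolve the full path and confirm it still sits under the intended folder. This is a sketch only; SafePaths and Resolve are hypothetical names, and it does not cover symbolic links.

```csharp
using System;
using System.IO;

public static class SafePaths
{
    // Resolves a user-supplied name under baseDirectory, or throws if the
    // result would land outside it (for example via "..").
    public static string Resolve(string baseDirectory, string userSuppliedName)
    {
        string root = Path.GetFullPath(baseDirectory);
        if (root[root.Length - 1] != Path.DirectorySeparatorChar)
            root += Path.DirectorySeparatorChar;

        // GetFullPath collapses any ".." segments before the comparison.
        string candidate = Path.GetFullPath(Path.Combine(root, userSuppliedName));
        if (!candidate.StartsWith(root, StringComparison.Ordinal))
            throw new UnauthorizedAccessException("Path escapes the base directory.");
        return candidate;
    }
}
```

An absolute path supplied by the user is rejected too, because Path.Combine returns it unchanged and it then fails the prefix check.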
You can't go wrong with the following general rules
Validate everywhere! Where you validate determines the quality of the user experience. The closer to the user, the less safe it is but more responsive. The farther away, the safer but tends to give worse error messages.
Validate at the front-end to give the user a responsive error.
Validate in the middle to give the user nicer error messages.
Validate in the database (constraints and such) to keep your database sane.
Use parameters early, and use them often! Find those square pegs early.
Coerce data into the correct types as quickly as possible. (This is a form of validation.) If something is an int, don't handle it like a string.
Don't throw away errors when checking parameters. If your regex doesn't match, or your try { parse } catch { } gets triggered it's important you know why and don't continue!
Whether you use LINQ or roll-your-own SQL: do not build SQL statements with user-supplied data. EVER. Use parameterized queries and/or stored procedure calls. If you must piece-together SQL as strings, don't do it with user data. Get the "untrustworthy" data stored and manipulate it as needed later, in a separate query.
Encode all data passed to the user. The bad data may not be their fault, don't trash their world.
Assume that anything they pass you is full of JavaScript and HTML. Assume that "binary" data will find its way in. Someone will run your web page on something other than a "browser" eventually. Your "phone number" field will be used to store an .EXE.
Return all data encoded and harmless. Don't assume that "because it's in the database" (or that it's an int, or that it's just a 1 character string) that it's harmless.
Assume that eventually your database will fail you somehow. A developer will drop in "test" data, you'll miss an edge case above, or something may run amok and insert all-purpose crap. This crap has to be passed to the user safely.
Nobody's perfect: especially you. Plan for that.
While pretty much all of the guidelines on the Open Web Application Security Project (OWASP) site are useful, here are their guidelines on data validation.
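To illustrate the rules above about coercing input to the right type early and not discarding parse errors, here is a minimal sketch. The ParseQuantity name and the 1 to 1000 range are hypothetical.

```csharp
using System;
using System.Globalization;

public static class InputCoercion
{
    // Converts a raw form value to an int, or fails loudly with the reason.
    public static int ParseQuantity(string rawValue)
    {
        int quantity;
        if (!int.TryParse(rawValue, NumberStyles.Integer, CultureInfo.InvariantCulture, out quantity))
            throw new FormatException("Quantity is not a whole number: '" + rawValue + "'");
        if (quantity < 1 || quantity > 1000) // hypothetical business limits
            throw new ArgumentOutOfRangeException(nameof(rawValue), "Quantity must be between 1 and 1000.");
        return quantity;
    }
}
```

From this point on the rest of the code handles an int, never the original string, and a failed parse stops processing instead of being silently ignored.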

Why is the ASP.NET profile designed in such a horrible way?

In the current project I'm working on, we are using the ASP.NET profile to store information about users, such as their involvement in a mailing list.
Now, in order to get a list of all the users in the mailing list, I cannot simply do a database query, as the ASP.NET profile table is, simply put, awful.
For those who do not know, the profile table has two main columns, the 'keys' column, and 'values' column, and they are organised as so:
Keys:
Key1:dataType:startIndex:endIndex:key2:dataType . . etc.
Values:
value1value2value3...
This is pretty much impossible to query using SQL, so the only option to find users that have a specific property is to load up a list of ALL the users and loop through it.
In a site with over 150k members, this is understandably very slow!
Are there specific reasons why the Profile was designed like this, or is it just a terrible way of doing dynamically-generated data?
I agree that it's a pretty bad way to store profile data, but I suspect the use case was just to get the profile data for a user with a single query but in such a way that it can be extended to handle any number of different profile properties. If you don't like it, you can always write your own, custom profile provider that separates each value out into its own column. Having implemented various membership and role providers, I don't think that this would be too complicated a task. The number of methods doesn't look too large.
The whole point of the Provider model is that it abstracts away the data source. The idea is that, as a developer, you don't need to know how the data is stored or in what format - you just have a common set of methods for accessing it. This means you can swap providers without changing a single line of code. It also means that you specifically do not try and access data direct from the data source (eg. going straight to the database) by bypassing the provider methods - that defeats the whole point.
The default ASP.NET profile provider is actually very powerful, as it can not only store simple value types (strings, ints etc.) but it can also store complex objects and entire collections in a single field. Try doing that in a relational database! However, the downside of this generic-ism is that it comes at a cost of efficiency. Which is why, if you have a specific need, then you are supposed to implement your own provider. For example, see SearchableSqlProfileProvider - The Searchable SQL Profile Provider.
Of course, your third option is to simply not use the profile provider - nobody is forcing you to! You could implement your own classes/database entirely, as you would have had to do in other frameworks.
I have implemented various custom providers (membership/Sitemap/Roles etc.) and haven't really looked at the ASP.NET Profile Provider after seeing that kind of thing (name/value pairs or XML data). I am not sure, but I think the Profile is primarily created for user preferences/settings where the settings are only required for a specific user. I don't think the Profile is meant for user "data" that can be queried.
Note: This is an assumption based on what I think I know; please comment if it is wrong.

How to prevent Sql-Injection on User-Generated Sql Queries

I have a project (private, ASP.net website, password protected with https) where one of the requirements is that the user be able to enter Sql queries that will directly query the database. I need to be able to allow these queries, while preventing them from doing damage to the database itself, and from accessing or updating data that they shouldn't be able to access/update.
I have come up with the following rules for implementation:
Use a db user that only has permission for Select Table/View and Update Table (thus any other commands like drop/alter/truncate/insert/delete will just not run).
Verify that the statement begins with the words "Select" or "Update"
Verify (using Regex) that there are no instances of semi-colons in the statement that are not surrounded by single-quotes, white space and letters. (The thought here is that the only way that they could include a second query would be to end the first with a semi-colon that is not part of an input string).
Verify (using Regex) that the user has permission to access the tables being queried/updated, included in joins, etc. This includes any subqueries. (Part of the way that this will be accomplished is that the user will be using a set of table names that do not actually exist in the database, part of the query parsing will be to substitute in the correct corresponding table names into the query).
Am I missing anything?
The goal is that the users be able to query/update tables to which they have access in any way that they see fit, and to prevent any accidental or malicious attempts to damage the db. (And since a requirement is that the user generate the sql, I have no way to parametrize the query or sanitize it using any built-in tools that I know of).
This is a bad idea, and not just from an injection-prevention perspective. It's really easy for a user that doesn't know any better to accidentally run a query that will hog all your database resources (memory/cpu), effectively resulting in a denial of service attack.
If you must allow this, it's best to keep a completely separate server for these queries, and use replication to keep it pretty close to an exact mirror of your production system. Of course, that won't work with your UPDATE requirement.
But I want to say again: this just won't work. You can't protect your database if users can run ad hoc queries.
What about this kind of thing? Just imagine the SELECT is an EXEC:
select convert(varchar(50),0x64726F70207461626C652061)
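To see what that hex literal hides, it can be decoded byte by byte. The helper below is only an illustration (HexPayload is a made-up name); it shows why scanning the query text for keywords is not enough.

```csharp
using System;
using System.Text;

public static class HexPayload
{
    // Decodes a hex literal such as 0x64726F70... into the ASCII text it hides.
    public static string Decode(string hexLiteral)
    {
        string digits = hexLiteral.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            ? hexLiteral.Substring(2) : hexLiteral;
        if (digits.Length % 2 != 0)
            throw new FormatException("Hex literal must have an even number of digits.");

        var bytes = new byte[digits.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(digits.Substring(i * 2, 2), 16);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

HexPayload.Decode("0x64726F70207461626C652061") returns "drop table a", a statement that no keyword filter on the original text would have spotted.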
My gut reaction is that you should focus on setting the account privileges and grants as tightly as possible. Look at your RDBMS security documentation thoroughly, there may well be features you are not familiar with that would prove helpful (e.g. Oracle's Virtual Private Database, I believe, may be useful in this kind of scenario).
In particular, your idea to "Verify (using Regex) that the user has permission to access the tables being queried/updated, included in joins, etc." sounds like you would be trying to re-implement security functionality already built into the database.
Well, you already have enough people telling you "don't do this", so if they aren't able to dissuade you, here are some ideas:
INCLUDE the Good, Don't try to EXCLUDE the bad
(I think the proper terminology is Whitelisting vs Blacklisting )
By that, I mean don't look for evil or invalid stuff to toss out (there are too many ways it could be written or disguised), instead look for valid stuff to include and toss out everything else.
You already mentioned in another comment that you are looking for a list of user-friendly table names, and substituting the actual schema table names. This is what I'm talking about--if you are going to do this, then do it with field names, too.
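A minimal sketch of that include-only approach for table names follows. The friendly names and the real table names are invented for illustration; the same pattern would apply to field names.

```csharp
using System;
using System.Collections.Generic;

public sealed class TableAliasMap
{
    // Friendly name shown to users -> real table name. Entries are hypothetical.
    private readonly Dictionary<string, string> _allowed =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Customers", "dbo.tbl_Customer" },
            { "Orders", "dbo.tbl_Order" }
        };

    // Returns the real table name, or throws if the name is not on the list.
    public string Resolve(string friendlyName)
    {
        string realName;
        if (friendlyName == null || !_allowed.TryGetValue(friendlyName.Trim(), out realName))
            throw new UnauthorizedAccessException("Unknown table: " + friendlyName);
        return realName;
    }
}
```

Anything not in the dictionary is rejected outright, so there is no list of "bad" names to keep up to date.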
I'm still leaning toward a graphical UI of some sort, though: select tables to view here, select fields you want to see here, use some drop-downs to build a where clause, etc. A pain, but still probably easier.
What you're missing is the ingenuity of an attacker finding holes in your application.
I can virtually guarantee you that you won't be able to close all the holes if you allow this. There might even be bugs in the database engine that you don't know about but attackers do, which allow a SQL statement you deem safe to wreak havoc in your system.
In short: This is a monumentally bad idea!
As the others indicate, letting end-users do this is not a good idea. I suspect the requirement isn't really that the user needs ad-hoc SQL, but rather a way to get and update data in ways not initially foreseen. To allow queries, do as Joel suggests and keep a "read only" database, but use a reporting application such as Microsoft Reporting Services or Data Dynamics Active Reports to allow users to design and run ad-hoc reports. Both, I believe, have ways to present users with a filtered view on "their" data.
For the updates, it is more tricky- I don't know of existing tools to do this. One option may be to design your application so that developers can quickly write plugins to expose new forms for updating data. The plugin would need to expose a UI form, code for checking that the current user can execute it, and code for executing it. Your application would load all plugins and expose the forms that a user has access to.
Even seemingly secure technology like Dynamic LINQ is not safe from code injection issues, and you are talking about providing low-level access.
No matter how hard you sanitize queries and tune permissions, it probably will still be possible to freeze your DB by sending over some CPU-intensive query.
So one of the "protection options" is to show a message box stating that all queries accessing restricted objects or causing bad side-effects will be logged against the user's account and reported to the admins immediately.
Another option - just try to look for a better alternative (i.e. if you really need to process & update data, why not expose an API to do this safely?)
One (maybe overkill) option could be to use a compiler for a reduced SQL language. Something like using JavaCC with a modified SQL grammar that only allows SELECT statements; then you receive the query, compile it, and if it compiles you can run it.
For C#, I know of Irony but have never used it.
You can do a huge amount of damage with an update statement.
I had a project similar to this, and our solution was to walk the user through a very annoying wizard allowing them to make the choices, but the query itself is constructed behind the scenes by the application code. Very laborious to create, but at least we were in control of the code that finally executed.
The question is, do you trust your users? If your users have had to log into the system, you are using HTTPS & taken precautions against XSS attacks then SQL Injection is a smaller issue. Running the queries under a restricted account ought to be enough if you trust the legitimate users. For years I've been running MyLittleAdmin on the web and have yet to have a problem.
If you run under a properly restricted SQL Account select convert(varchar(50),0x64726F70207461626C652061) won't get very far and you can defend against resource hogging queries by setting a short timeout on your database requests. People could still do incorrect updates, but then that just comes back to do you trust your users?
You are always taking a managed risk attaching any database to the web, but then that's what backups are for.
If they don't have to perform really advanced queries, you could provide a UI that only allows certain choices, like a drop-down list with "update, delete, select", after which the next drop-down would automatically populate with a list of available tables, etc., similar to the query builder in SQL Server Management Studio.
Then, in your server-side code, you would convert these groups of UI elements into SQL statements and use a parameterized query to stop malicious content.
This is a terribly bad practice. I would create a handful of stored procedures to handle everything you'd want to do, even the more advanced queries. Present them to the user, let them pick the one they want, and pass your parameters.
The answer above mine is also extremely good.
Although I agree with Joel Coehoorn and SQLMenace, some of us do have "requirements". Instead of having them send ad hoc queries, why not create a visual query builder, like the ones found in the MS sample applications found at asp.net, or try this link.
I am not against the points made by Joel. He is correct. Having users (remember we are talking users here, they couldn't care less about what you want to enforce) throw queries is like an app without a "Business Logic Layer", not to mention the additional questions to be answered when certain results do not match other supporting application results.
Here is another example.
The hacker doesn't need to know the real table name; he/she can run undocumented procs like this:
sp_msforeachtable 'print ''?'''
Just replace the print with a drop.
Plenty of answers say that it's a bad idea, but sometimes that's what the requirements insist on. There is one gotcha that I haven't spotted mentioned in the "If you have to do it anyway" suggestions though:
Make sure that any update statements include a WHERE clause. It's all too easy to run
UPDATE ImportantTable
SET VitalColumn = NULL
and miss out the important
WHERE UserID = #USER_NAME
If an update is required across the whole table then it's easy enough to add
WHERE 1 = 1
Requiring the where clause doesn't stop a malicious user from doing bad things but it should reduce accidental whole table changes.
