How to handle invalid user input table name

I am writing a C# WinForms program which includes a user input textbox whose value will be used to create a table. I have been thinking about the best way to handle invalid T-SQL table names (though this can be extended to many other situations). Currently the only method I can think of is to check the input string for each possible violation of the table-name rules individually, but this seems long-winded and could miss certain characters, for example due to my own ignorance of what is a violation and what is not.
I feel like there should be a better way of doing this but have been unable to find anything in my search so far. Can anyone help point me in the right direction?

As I already told you in a comment, you should not do this...
You might use something like this:
USE master;
GO
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
CREATE TABLE UserTables(ID INT IDENTITY CONSTRAINT PK_UserTables PRIMARY KEY
,UserInput NVARCHAR(500) NOT NULL CONSTRAINT UQ_UserInput UNIQUE);
GO
INSERT INTO UserTables VALUES(N'blah')
,(N'invalid !%$& <<& >< $')
,(N'silly 💖');
GO
SELECT * FROM UserTables;
/*
ID UserInput
1 blah
2 invalid !%$& <<& >< $
3 silly 💖
*/
GO
USE master;
GO
DROP DATABASE dbTest;
GO
You would then create your tables as Table1, Table2 and so on.
Whenever a user enters his string, you visit the table, pick the ID and create the table's name by concatenating the word Table with the ID.
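A minimal sketch of that mapping, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The UserTables layout follows the T-SQL above. The helper names `create_user_table` and `table_for` and the `Payload` column are hypothetical. The point is that the user's text is only ever stored as data, and the identifier is built from the integer ID alone:

```python
import sqlite3

def create_user_table(conn: sqlite3.Connection, user_input: str) -> str:
    """Store the raw string as data and create a table named Table<ID>."""
    # The user text is a bound parameter, never part of an identifier.
    cur = conn.execute("INSERT INTO UserTables (UserInput) VALUES (?)", (user_input,))
    table_name = f"Table{cur.lastrowid}"  # built only from an integer ID
    # Hypothetical column layout; the real columns depend on your application.
    conn.execute(f"CREATE TABLE {table_name} (ID INTEGER PRIMARY KEY, Payload TEXT)")
    return table_name

def table_for(conn: sqlite3.Connection, user_input: str):
    """Look up the generated table name for a user string (None if unknown)."""
    row = conn.execute(
        "SELECT ID FROM UserTables WHERE UserInput = ?", (user_input,)
    ).fetchone()
    return None if row is None else f"Table{row[0]}"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserTables ("
    " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " UserInput TEXT NOT NULL UNIQUE)"
)
for text in ["blah", "invalid !%$& <<& >< $", "silly 💖"]:
    print(create_user_table(conn, text))  # Table1, Table2, Table3
```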
There are better approaches!
But you should think of a fixed schema. You will have to define columns (how many, which types, how to name them?). You will be in hell when you have to query this: there is nothing to rely on...
One approach is a classical n:m mapping:
A User table (UserID, Name, ...)
A test table (TestID, TestName, TestType, ...)
The mapping table (ID, UserID, TestID, Result VARCHAR(MAX))
Depending on what you need, you might add:
A question table (QuestionID, QuestionText, ...)
Then use a mapping to bind questions to tests and another mapping to bind answers to such mapped questions.
Another approach is to store the result in a generic container (XML or JSON). This keeps your tables slim, but you need to know the XML's structure in order to query it.
Many ways to skin a rabbit...
UPDATE
You ask for an explanation...
The main advantage of a relational database is the pre-known structure.
Precompiled queries, cached results, statistics and indexes all depend on known structures.
Data integrity is ensured with constraints, foreign keys and so on. All this depends on known names, known types(!) and known relations.
User-specific table names, and even worse, generically defined structures, do not allow for any join or other typical RDBMS operation. The only approach is to create each and every statement dynamically (string building).
The rule of thumb is: whenever you think you have to create several objects for the same thing, differing only by name, you should rethink the design. It is bad to store Phone1, Phone2 and Phone3. It is better to have a side table with a UserID and a Phone column (classical 1:n). It is bad to have SalesCustomerA and SalesCustomerB; better use a Customer table and bind its ID into a general Sales table as an FK.
You see what I mean? What belongs together should live in one single table. If you need separation, add columns to your table and use them for grouping and filtering.
Just imagine you want to do some statistical evaluation of all your user test tables. How would you gather the data into one big pool, if you cannot rely on some structure in common?
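To illustrate the point, here is a small sketch with one mapping table shared by all users, again using Python's sqlite3 as a stand-in. Unlike the VARCHAR(MAX) column above, it assumes a numeric Result column. The table names and sample data are hypothetical. Pooling every user's results becomes a single GROUP BY instead of one query per user table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Tests (TestID INTEGER PRIMARY KEY, TestName TEXT NOT NULL);
-- One mapping table holds every user's result for every test.
CREATE TABLE UserTestResults (
    ID INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES Users(UserID),
    TestID INTEGER NOT NULL REFERENCES Tests(TestID),
    Result REAL
);
INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Tests VALUES (1, 'Math'), (2, 'Spelling');
INSERT INTO UserTestResults (UserID, TestID, Result) VALUES
    (1, 1, 80), (2, 1, 60), (1, 2, 90);
""")

# Statistics over all users: one query, because the structure is shared.
stats = conn.execute("""
    SELECT t.TestName, COUNT(*) AS Attempts, AVG(r.Result) AS AvgResult
    FROM UserTestResults r
    JOIN Tests t ON t.TestID = r.TestID
    GROUP BY t.TestName
    ORDER BY t.TestName
""").fetchall()
print(stats)  # [('Math', 2, 70.0), ('Spelling', 1, 90.0)]
```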
I hope this makes it clear...
If you still want to stick to your idea, you should give my code sample a try. It allows you to map any silly string to a secure and easy-to-handle table name.

Lots can go wrong with users entering table names. A bunch of whacked-out names is a maintenance nightmare. A user should not even be aware of the table name. It is also a security risk, as the program now has to have database owner authority. You want to limit users to the minimum authority.
Similar to Shnugo's answer, but with a composite primary key. This prevents duplicate (userID, testID) pairs.
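CREATE TABLE [User] (
  ID INT IDENTITY CONSTRAINT PK_User PRIMARY KEY,
  fName VARCHAR(100),
  lName VARCHAR(100)
);
CREATE TABLE Test (
  ID INT IDENTITY CONSTRAINT PK_Test PRIMARY KEY,
  Name VARCHAR(100)
);
-- composite primary key: one score per user per test
CREATE TABLE UserTest (
  userID INT NOT NULL CONSTRAINT FK_UserTest_User REFERENCES [User](ID),
  testID INT NOT NULL CONSTRAINT FK_UserTest_Test REFERENCES Test(ID),
  score INT,
  CONSTRAINT PK_UserTest PRIMARY KEY (userID, testID)
);
SELECT t.Name, u.fName, u.lName, ut.score
FROM UserTest ut
JOIN Test t ON t.ID = ut.testID
JOIN [User] u ON u.ID = ut.userID
ORDER BY t.Name, u.lName, u.fName;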
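A quick way to see the composite key at work, sketched with Python's sqlite3 instead of SQL Server. The user table is renamed Users here, and the `record_score` helper is hypothetical. A second score for the same (userID, testID) pair is rejected by the database itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Users (ID INTEGER PRIMARY KEY, fName TEXT, lName TEXT);
CREATE TABLE Test (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE UserTest (
    userID INTEGER NOT NULL REFERENCES Users(ID),
    testID INTEGER NOT NULL REFERENCES Test(ID),
    score INTEGER,
    PRIMARY KEY (userID, testID)   -- composite key: one row per user per test
);
INSERT INTO Users VALUES (1, 'Ann', 'Lee');
INSERT INTO Test VALUES (1, 'Math');
INSERT INTO UserTest VALUES (1, 1, 80);
""")

def record_score(user_id: int, test_id: int, score: int) -> bool:
    """Return True if stored; False if the key already exists or an FK fails."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO UserTest VALUES (?, ?, ?)", (user_id, test_id, score))
        return True
    except sqlite3.IntegrityError:
        return False

print(record_score(1, 1, 95))  # False: (1, 1) already has a score
```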

User should be able to reset autoincrement (identity) column: possible solutions

In the app there should be a functionality for the user to reset orderNumber whenever needed. We are using SQL Server for db, .NET Core, Entity Framework, etc. I was wondering what is the most elegant way to achieve this?
Thought about making orderNumber int, identity(1,1), and I've searched for DBCC CHECKIDENT('tableName', RESEED, 0), but the latter introduces some permissions concerns (the user has to own the schema, be sysadmin, etc.).
EDIT: orderNumber is NOT a primary key, and duplicate values are not the problem. We should just let the user (probably once a year) reset the numbering of their orders to start from 1 again.
Any advice?

An identity column is used to auto-generate incremental values, so if you're relying on this column as the primary key or some unique identifier for rows, updating it can cause issues with duplicates.
It's difficult to recommend the best solution without knowing more about your use case, but I would consider (1) whether this orderNumber should be the PK at all, or whether some other key like (customerId, locationId, date) makes sense and allows you to update orderNumber more freely without impacting data integrity, or (2) whether keeping orderNumber as an identity makes sense, with a separate data model or table that maps multiple rows in this table to the same "order", allowing you to keep the key on this base table.

It seems that orderNumber is a business layer concern - therefore I recommend a non-SQL solution. You need C# code that generates the number for storage in your "Order" entity. I wouldn't use IDENTITY() to implement/solve this.
The customer isn't going to reset anything in the DB; your code will do this. You need a "take a number" service in your business layer and a place in the UI to reset it (presumably per customer).
SQL Server has SEQUENCE objects. My only concern about using one is partitioning per customer (an assumed requirement). Will you have multiple customers? If so, you probably can't have a single number generator. That is why I suggest a C# implementation (you'll want to save the state as numbers are handed out, of course).
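The answer suggests doing this in C#. Purely for illustration, here is the same "take a number" idea sketched in Python with an in-memory SQLite table holding the state. `OrderNumberService` and the `OrderCounters` table are hypothetical names. There is one counter row per customer, and `reset` is what a UI button would call:

```python
import sqlite3

class OrderNumberService:
    """Hypothetical per-customer order-number generator with persisted state."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS OrderCounters ("
            " CustomerID INTEGER PRIMARY KEY,"
            " LastNumber INTEGER NOT NULL)"
        )
        conn.commit()

    def next_number(self, customer_id: int) -> int:
        # One transaction: create the counter row if missing, bump it, read it back.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO OrderCounters (CustomerID, LastNumber) VALUES (?, 0)",
                (customer_id,),
            )
            self.conn.execute(
                "UPDATE OrderCounters SET LastNumber = LastNumber + 1 WHERE CustomerID = ?",
                (customer_id,),
            )
            (number,) = self.conn.execute(
                "SELECT LastNumber FROM OrderCounters WHERE CustomerID = ?",
                (customer_id,),
            ).fetchone()
        return number

    def reset(self, customer_id: int) -> None:
        # e.g. once a year from the UI: the next order for this customer is 1 again.
        with self.conn:
            self.conn.execute(
                "UPDATE OrderCounters SET LastNumber = 0 WHERE CustomerID = ?",
                (customer_id,),
            )

svc = OrderNumberService(sqlite3.connect(":memory:"))
print([svc.next_number(7) for _ in range(3)])  # [1, 2, 3]
svc.reset(7)
print(svc.next_number(7))  # 1
```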

Identity should not be used in the way you're suggesting. Presumably you don't want a customer to get two different orders with the same order number (i.e., the order number is unique within a customer). If you don't care whether customers get discontinuous order numbers, you can use a sequence. But if you want continuous order numbers, you would need to create a separate sequence for each customer, which is not a good solution either. I suggest you set the order number to max([order number]) over(partition by [customer id]) + 1 on insert. That will automatically give you the next order number for a particular customer.
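Here is a sketch of the "max + 1 per customer" insert using Python's sqlite3. The `Orders` table and the `add_order` helper are hypothetical. For a single customer, a plain `MAX(...)` filtered with `WHERE CustomerID = ?` gives the same value as the windowed `max(...) over(partition by ...)` form. The UNIQUE constraint is an added assumption: if two concurrent inserts compute the same number, it turns the collision into an error instead of a silent duplicate. This sketch does not cover the yearly reset from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    " OrderID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER NOT NULL,"
    " OrderNumber INTEGER NOT NULL,"
    " UNIQUE (CustomerID, OrderNumber))"
)

def add_order(customer_id: int) -> int:
    """Insert an order numbered MAX(this customer's numbers) + 1; return the number."""
    with conn:
        # An aggregate without GROUP BY always yields one row, so the first order gets 1.
        cur = conn.execute(
            "INSERT INTO Orders (CustomerID, OrderNumber) "
            "SELECT ?, COALESCE(MAX(OrderNumber), 0) + 1 "
            "FROM Orders WHERE CustomerID = ?",
            (customer_id, customer_id),
        )
        (number,) = conn.execute(
            "SELECT OrderNumber FROM Orders WHERE OrderID = ?", (cur.lastrowid,)
        ).fetchone()
    return number

print([add_order(1), add_order(1), add_order(2)])  # [1, 2, 1]
```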

Database displaying/allowing different users certain information based on their role or ID

I need a system for a business that has sales, project management and service management. Each type of employee should be able to see all information from other employees, but only be able to modify the information specific to their job (for example, sales can't modify the project management information but can view it). To twist this up a little more, they have multiple locations, and each location should only be able to access its own data.
Since I am most versed in C# I will be using this as the language to tie it all in together.
I have a few different ideas on how to do this and have no idea what would be the best solution to get this all done.
My current idea is for the database...
CREATE TABLE Employee
(
Employee_ID INT, --PK
Employee_Role INT, --FK
Location_ID INT --FK
);
Then, each query in the code would be specific to their Employee_Role and Location_ID. I am assuming I would manually create each user (they only have about 15 employees), assign them the values and then have the back-end code tie the queries together.
Difficulty with this is that these system queries are going to get pretty nasty (at least for me, I am not a DBA and have never done something like this).
So, for example, if I wanted to display, let's say, the Prospective_Clients table for a specific location, this is roughly my idea:
SELECT *
FROM Prospective_Clients
INNER JOIN Employee on Prospective_Clients.Location_ID = Employee.Location_ID;
Or something to that effect. Then, to update or insert data, the insert statements would require that Employee_Role equals whatever value I assign it. Of course, that poses some questions...
Would an employee have to add the Location_ID every time they insert client information? I imagine I could just have a drop-down where they select the location name, which maps to the Location_ID... but then there is room for error on the users' part. Or I suppose I could set up the query so that the client's Location_ID equals whatever the employee's Location_ID is.
Am I going about this the right way? Does all of this need to be controlled in the back-end code, or is there something on the database end that makes it easier? In other words, does the way I am thinking make sense to anyone, or should I just go find a DBA to hire?

Use a separate table for permissions. Also base permissions on roles, not on individual employees; that's more comfortable for both the end user and the developer.
CREATE TABLE Role_Permission
(
RoleID int, -- fk
LocationID int, -- fk
CanAdd bit,
CanRead bit,
CanUpdate bit ,
CanDelete bit ,
Active bit
)
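A sketch of how the back-end code might consult such a table, using Python's sqlite3 as a stand-in. The composite primary key on (RoleID, LocationID), the sample roles and locations, and the `can` / `ALLOWED` names are all hypothetical. The lookup joins the employee to the flags for their role at their own location, so a request for another location simply finds no row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Role_Permission (
    RoleID INTEGER, LocationID INTEGER,
    CanAdd INTEGER, CanRead INTEGER, CanUpdate INTEGER, CanDelete INTEGER,
    Active INTEGER,
    PRIMARY KEY (RoleID, LocationID)
);
CREATE TABLE Employee (Employee_ID INTEGER PRIMARY KEY, Employee_Role INTEGER, Location_ID INTEGER);
-- Hypothetical data: role 1 = sales, role 2 = project management, both at location 10.
INSERT INTO Role_Permission VALUES (1, 10, 1, 1, 1, 0, 1), (2, 10, 0, 1, 0, 0, 1);
INSERT INTO Employee VALUES (100, 1, 10), (200, 2, 10);
""")

# Whitelist of permission columns: the column name never comes from user input.
ALLOWED = {"add": "CanAdd", "read": "CanRead", "update": "CanUpdate", "delete": "CanDelete"}

def can(employee_id: int, action: str, location_id: int) -> bool:
    """True if the employee's role has the flag set at that (their own) location."""
    column = ALLOWED[action]
    row = conn.execute(
        f"SELECT p.{column} FROM Employee e "
        "JOIN Role_Permission p ON p.RoleID = e.Employee_Role "
        "AND p.LocationID = e.Location_ID "
        "WHERE e.Employee_ID = ? AND p.LocationID = ? AND p.Active = 1",
        (employee_id, location_id),
    ).fetchone()
    return bool(row and row[0])

print(can(100, "update", 10), can(200, "update", 10))  # True False
```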

Should column names be unique across all tables?

In most of the databases I have created, I've always named my columns by prepending the table name. For example:
Person Table
- PersonID
- PersonName
- PersonSurName
- PersonTimestamp
Customer Table
- CustomerID
- CustomerName
- CustomerSurName
- CustomerTimestamp
As opposed to having
Person Table
- ID
- Name
- SurName
- Timestamp
Customer Table
- ID
- Name
- SurName
- Timestamp
But I was wondering if that's really the best, most convenient and most self-explanatory approach later down the road. Maybe some columns like Timestamp are better left as Timestamp in all tables? Are there any general good practices for this? I'm using these databases in C# / WinForms.

I don't like either example. I would prefer:
Person Table
- PersonID -- not ID, since this is likely to be referenced in other tables
- FirstName -- why is a first name just a "name"?
- LastName -- why not use the form more common than surname?
- ModifiedDate -- what is a "persontimestamp"?
I am adamantly against naming primary keys that will occur in other tables in the model with generic things like "ID" - I don't ever want to see ON p.ID = SomeOtherTable.PersonID. An entity that is common to more than one table in the model should be named consistently throughout the model. Other attributes like FirstName belong only to that table - even if another table has a FirstName, it's not the same "entity", so to speak. And even if you ever have joins between these two tables for whatever reason, you're always going to be differentiating between them as Person.FirstName and Customer.FirstName, so adding the Person or Customer prefix is just redundant and annoying for anyone who has to write such a query.
Also Timestamp is a horrible name for a column (or as a column suffix) in a SQL Server model because TIMESTAMP is a data type that has nothing to do with date or time, and the name might imply other usage to your peers.
Of course all of this is quite subjective. It's like asking 100 people what car you should drive. You're going to get a handful of answers, and now it'll be up to you to wade through the crap and figure out what makes sense for you, your team and your environment. :-)

In my experience the more standard naming convention for columns is not to include the table name for the following reasons:
It is unnecessary repetition
It is harder to maintain because if the table name changes in the future all the columns would need renaming.
The convention may not be clear to another implementer further down the line
When you perform a query you can always alias the columns at that point if you need the table name in them.
I would only use the table name in a column if the column is a foreign key to another table. If you use this convention it makes it relatively easy to identify the foreign key columns within a table without the use of a relational diagram.

Uggghhh, I guess I'm just too lazy, so I would do Department.ID instead of Department.DepartmentID. I have enough redundant work as it is.

Your first example is better in my opinion. (The second one is confusing when you build queries and often requires the AS sql keyword)
However in my shop we use a little different convention.
PrimaryKey - this column should be named starting with ID and followed by tablename (IDPerson)
ForeignKey - these columns should be named starting with ID and followed by the name of the external table (IDDepartment)
OtherColumns - they should have a meaningful name for the data contained. It's required to repeat the tablename only for those fields that will happen to have the same name in different tables.
When using parameters to call stored procedures, you should reverse the convention (@personID, @departmentID)

Some columns might be repeated or hold the same value; in that case I generally use the same name, e.g. for timestamps. I work with financial data and would like to name the columns after the stocks themselves, rather than giving them different names in different tables.

I have never seen the same rule used by different vendors. I don't know of any standard, and I don't think there is one!
Personally, I like your second option. Repeating the table name in the column name is unnecessary repetition.

I like to try to keep column names unique throughout my database. I do that by using prefixes for tables and columns so that even foreign keys are uniquely named. This makes doing joins simpler as I (usually) don't need to reference the tables in the joins:
--------------------
pe_people
--------------------
pe_personID (PK)
pe_firstName
pe_lastName
pe_timeStamp
--------------------
--------------------
ac_accounts
--------------------
ac_accountID (PK)
ac_personID (FK)
ac_accountName
ac_accountBalance
--------------------
SELECT pe_firstName, pe_lastName, ac_accountName, ac_accountBalance
FROM pe_people
INNER JOIN ac_accounts ON (ac_personID = pe_personID)
WHERE pe_timeStamp > '2016-01-01';
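The prefixed schema and the (corrected) query can be tried out with Python's sqlite3. The column types and sample rows are hypothetical. Because every column name is unique across the database, none of the column references needs a table qualifier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pe_people (
    pe_personID INTEGER PRIMARY KEY,
    pe_firstName TEXT, pe_lastName TEXT, pe_timeStamp TEXT
);
CREATE TABLE ac_accounts (
    ac_accountID INTEGER PRIMARY KEY,
    ac_personID INTEGER REFERENCES pe_people(pe_personID),
    ac_accountName TEXT, ac_accountBalance REAL
);
INSERT INTO pe_people VALUES (1, 'Ann', 'Lee', '2016-03-01'), (2, 'Bob', 'Roe', '2015-12-31');
INSERT INTO ac_accounts VALUES (1, 1, 'Savings', 250.0), (2, 2, 'Checking', 10.0);
""")

# Unqualified column names are unambiguous thanks to the per-table prefixes.
rows = conn.execute("""
    SELECT pe_firstName, pe_lastName, ac_accountName, ac_accountBalance
    FROM pe_people
    INNER JOIN ac_accounts ON (ac_personID = pe_personID)
    WHERE pe_timeStamp > '2016-01-01'
""").fetchall()
print(rows)  # [('Ann', 'Lee', 'Savings', 250.0)]
```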

SQL Server - formatted identity column

I would like to have a primary key column in a table that is formatted as FOO-BAR-[identity number], for example:
FOO-BAR-1
FOO-BAR-2
FOO-BAR-3
FOO-BAR-4
FOO-BAR-5
Can SQL Server do this? Or do I have to use C# to manage the sequence? If so, how can I get the next [identity number] part using Entity Framework?
Thanks
EDIT:
I need to do this because this column represents a unique identifier of a notice sent out to customers.
FOO will be a constant string
BAR will be different depending on the type of the notice (either Detection, Warning or Enforcement)
So is it better to have just an int identity column and append the values in Business Logic Layer in C#?

If you want this 'composite' field in your reports, I suggest that you:
Use INT IDENTITY field as PK in table
Create view for this table. In this view you can additionally generate the field that you want using your strings and types.
Use this view in your reports.
But I still think there is a BIG problem with the DB design. I hope you'll try to redesign it using normalization.
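A small sketch of the view idea, using Python's sqlite3 in place of SQL Server, where you would concatenate with CONCAT instead of `||`. The `Notices` table, `NoticeType` column and `NoticeReport` view are hypothetical names, and "BAR" is taken to be the notice type from the question's edit. The table keeps a plain integer key, and the formatted identifier is only derived when read. Note that the numbering is shared across all types, not restarted per type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Notices (
    NoticeID INTEGER PRIMARY KEY AUTOINCREMENT,  -- plain integer key
    NoticeType TEXT NOT NULL
        CHECK (NoticeType IN ('Detection', 'Warning', 'Enforcement'))
);
-- The formatted identifier is computed on read; it is never stored.
CREATE VIEW NoticeReport AS
SELECT NoticeID,
       'FOO-' || NoticeType || '-' || NoticeID AS NoticeCode,
       NoticeType
FROM Notices;
INSERT INTO Notices (NoticeType) VALUES ('Warning'), ('Detection'), ('Warning');
""")
codes = conn.execute("SELECT NoticeCode FROM NoticeReport ORDER BY NoticeID").fetchall()
print(codes)  # [('FOO-Warning-1',), ('FOO-Detection-2',), ('FOO-Warning-3',)]
```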

You can set anything as the PK in a table. But in this instance I would set IDENTITY on just an auto-incrementing int and append FOO-BAR- to it manually in the SQL, BLL, or UI, depending on why it's being used. If there is a business reason for FOO and BAR, then you should also store these as values in your DB row. You can then create a key in the DB across the two or three columns, depending on why you're actually using the values.
But IMO there is never a real reason to concatenate an ID in such a fashion and store it like that in the DB. But then again, I only ever use ints as my IDs.

Another option would be to use what an old team I used to be on called a codes and value table. We didn't use it for precisely this (we used it in lieu of auto-incrementing identities to prevent environment mismatches for some key tables), but what you could do is this:
Create a table that has a row for each of your categories. Two (or more) columns in the row - minimum of category name and next number.
When you insert a record in the other table, you'll run a stored proc to get the next available identity number for that category, increment the number in the codes and values table by 1, and concatenate the category and number together for your insert.
However, if your main table is a high-volume table with lots of inserts, it's possible you could wind up with numbers out of sequence.
In any event, even if it's not high volume, I think you'd be better off to reexamine why you want to do this, and see if there's another, better way to do it (such as having the business layer or UI do it, as others have suggested).
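A sketch of the codes-and-values approach. The answer uses a stored procedure; this version is written in Python with sqlite3 instead, and `CodesAndValues`, `Notices` and `insert_notice` are hypothetical names. Reading the next number, incrementing it and inserting the record all happen in one transaction started with BEGIN IMMEDIATE. That takes SQLite's write lock up front, which is one way to avoid two callers reading the same number (the out-of-sequence risk noted in this answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions managed explicitly
conn.executescript("""
CREATE TABLE CodesAndValues (Category TEXT PRIMARY KEY, NextNumber INTEGER NOT NULL);
INSERT INTO CodesAndValues VALUES ('Detection', 1), ('Warning', 1), ('Enforcement', 1);
CREATE TABLE Notices (NoticeKey TEXT PRIMARY KEY, Body TEXT);
""")

def insert_notice(category: str, body: str) -> str:
    """Take the next number for the category, bump it, and insert the notice."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        row = conn.execute(
            "SELECT NextNumber FROM CodesAndValues WHERE Category = ?", (category,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown category: {category}")
        key = f"FOO-{category}-{row[0]}"
        conn.execute(
            "UPDATE CodesAndValues SET NextNumber = NextNumber + 1 WHERE Category = ?",
            (category,),
        )
        conn.execute("INSERT INTO Notices VALUES (?, ?)", (key, body))
        conn.execute("COMMIT")
        return key
    except Exception:
        conn.execute("ROLLBACK")
        raise

print(insert_notice("Warning", "first"))  # FOO-Warning-1
```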

It is quite possible by using computed column like this:
CREATE TABLE #test (
id INT IDENTITY UNIQUE CLUSTERED,
pk AS CONCAT('FOO-BAR-', id) PERSISTED PRIMARY KEY NONCLUSTERED,
name NVARCHAR(20)
)
INSERT INTO #test (name) VALUES (N'one'), (N'two'), (N'three')
SELECT id, pk, name FROM #test
DROP TABLE #test
Note that pk is set to NONCLUSTERED on purpose because it is of VARCHAR type, while the IDENTITY field, which will be unique anyway, is set to UNIQUE CLUSTERED.

How to fetch lots of database table records by primary key?

Using the ADO.NET MySQL Connector, what is a good way to fetch lots of records (1000+) by primary key?
I have a table with just a few small columns, and a VARCHAR(128) primary key. Currently it has about 100k entries, but this will become more in the future.
In the beginning, I thought I would use the SQL IN operator:
SELECT * FROM `table` WHERE `id` IN ('key1', 'key2', [...], 'key1000')
But with this the query could become very long, and I would also have to manually escape quote characters in the keys, etc.
Now I use a MySQL MEMORY table (tempid INT, id VARCHAR(128)) to first upload all the keys with prepared INSERT statements. Then I make a join to select all the existing keys, after which I clean up the mess in the memory table.
Is there a better way to do this?
Note: OK, maybe it's not the best idea to have a string as the primary key, but the question would be the same if the VARCHAR column were a normal index.
Temporary table: So far it seems the solution is to put the data into a temporary table, and then JOIN, which is basically what I currently do (see above).
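The temporary-table-plus-JOIN pattern can be sketched with Python's sqlite3. In MySQL the equivalent would be a CREATE TEMPORARY TABLE on the same connection. The `items` table, its `payload` column and the `fetch_by_keys` helper are hypothetical. The keys are loaded with a parameterized `executemany`, so quotes inside them need no manual escaping:

```python
import sqlite3

def fetch_by_keys(conn: sqlite3.Connection, keys):
    """Load keys into a temp table with bound parameters, then JOIN against it."""
    with conn:  # one transaction, committed on exit
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_keys (id TEXT PRIMARY KEY)")
        conn.execute("DELETE FROM wanted_keys")
        # INSERT OR IGNORE drops duplicate keys instead of failing.
        conn.executemany(
            "INSERT OR IGNORE INTO wanted_keys (id) VALUES (?)", ((k,) for k in keys)
        )
        rows = conn.execute(
            "SELECT t.id, t.payload FROM items t JOIN wanted_keys w ON w.id = t.id"
        ).fetchall()
        conn.execute("DELETE FROM wanted_keys")  # clean up for the next call
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id VARCHAR(128) PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(f"key{i}", f"value {i}") for i in range(10_000)] + [("o'brien", "quoted")],
)
conn.commit()
rows = fetch_by_keys(conn, [f"key{i}" for i in range(0, 2000, 2)] + ["o'brien", "missing"])
print(len(rows))  # 1001
```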

I've dealt with a similar situation in a payroll system where the user needed to generate reports based on a selection of employees (e.g. employees X, Y, Z... or employees that work in certain offices). I built a filter window with all the employees and all the attributes that could be used as filter criteria, and had that window save the selected employee IDs in a filter table in the database. I did this because:
Generating SELECT queries with a dynamically generated IN filter is just ugly and highly impractical.
I could join that table in all my queries that needed to use the filter window.
It might not be the best solution out there, but it served me, and still serves me, very well.

If your primary keys follow some pattern, you can select where key like 'abc%'.
If you want to get out 1000 at a time, in some kind of sequence, you may want to have another int column in your data table with a clustered index. This would do the same job as your current memory table - allow you to select by int range.
What is the nature of the primary key? Is it anything meaningful?

If you're concerned about performance, I definitely wouldn't recommend an IN clause. It's much better to do an INNER JOIN if you can.
You can either first insert all the values into a temporary table and join to that or do a sub-select. Best is to actually profile the changes and figure out what works best for you.

Why not consider using a table-valued parameter (a SQL Server feature; MySQL has no direct equivalent) to push the keys in the form of a DataTable and fetch the matching records back?
Or
Simply write a private method that concatenates all the key codes from a provided collection into a single string, and pass that string to the query.
I think it may solve your problem.
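A safer variant of the concatenation idea is to build only the placeholder list dynamically and bind the keys as parameters. This swaps in a different technique from concatenating the key values themselves, and it avoids the escaping problem raised in the question. The sketch uses Python's sqlite3. The `items` table and the `fetch_in_chunks` helper are hypothetical. The chunk size of 500 is an assumed value meant to stay below the database's limit on bound parameters per statement.

```python
import sqlite3

def fetch_in_chunks(conn: sqlite3.Connection, keys, chunk_size: int = 500):
    """Query 'IN (?, ?, ...)' one chunk at a time, binding the keys as parameters."""
    keys = list(keys)
    results = []
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))  # e.g. "?, ?, ?"
        results.extend(conn.execute(
            f"SELECT id, payload FROM items WHERE id IN ({placeholders})", chunk
        ).fetchall())
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(f"key{i}", str(i)) for i in range(3000)] + [("o'brien", "q")],
)
rows = fetch_in_chunks(conn, [f"key{i}" for i in range(1200)] + ["o'brien", "nope"])
print(len(rows))  # 1201
```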
