Should column names be unique across all tables?

In most of the databases I have created, I have named columns by prefixing them with the table name. For example:
Person Table
- PersonID
- PersonName
- PersonSurName
- PersonTimestamp
Customer Table
- CustomerID
- CustomerName
- CustomerSurName
- CustomerTimestamp
As opposed to having
Person Table
- ID
- Name
- SurName
- Timestamp
Customer Table
- ID
- Name
- SurName
- Timestamp
But I was wondering whether this is really the best, most convenient and most self-explanatory approach later down the road. Maybe some columns, like the timestamp, would be better left as Timestamp in all tables? Is there any generally accepted good practice for this? I'm using these databases from C# / WinForms.

I don't like either example. I would prefer:
Person Table
- PersonID -- not ID, since this is likely to be referenced in other tables
- FirstName -- why is a first name just a "name"?
- LastName -- why not use the form more common than surname?
- ModifiedDate -- what is a "persontimestamp"?
I am adamantly against primary keys, which will occur in other tables in the model, to be named with generic things like "ID" - I don't ever want to see ON p.ID = SomeOtherTable.PersonID - an entity that is common to more than one table in the model should be named consistently throughout the model. Other aspects like FirstName belong only to that table - even if another table has a FirstName, it's not the same "entity" so to speak. And even if you ever have joins between these two tables for whatever reason, you're always going to be differentiating between them as Person.FirstName and Customer.FirstName - so adding the Person or Customer prefix is just redundant and annoying for anyone who has to write such a query.
Also Timestamp is a horrible name for a column (or as a column suffix) in a SQL Server model because TIMESTAMP is a data type that has nothing to do with date or time, and the name might imply other usage to your peers.
Of course all of this is quite subjective. It's like asking 100 people what car you should drive. You're going to get a handful of answers, and now it'll be up to you to wade through the crap and figure out what makes sense for you, your team and your environment. :-)
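As a rough illustration of the naming preference in this answer, the sketch below keeps the key name (PersonID) identical in both tables while leaving the other columns unprefixed. It uses Python's built-in sqlite3 module in place of SQL Server so that it can be run as is; the Orders table, its Total column and the sample row are hypothetical and only there to have something to join to.

```python
import sqlite3

def names_with_orders():
    """Join two tables whose shared key has the same name in both."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (
            PersonID     INTEGER PRIMARY KEY,  -- same name wherever it is referenced
            FirstName    TEXT NOT NULL,
            LastName     TEXT NOT NULL,
            ModifiedDate TEXT NOT NULL
        );
        -- Hypothetical child table, for illustration only
        CREATE TABLE Orders (
            OrderID  INTEGER PRIMARY KEY,
            PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
            Total    REAL NOT NULL
        );
        INSERT INTO Person VALUES (1, 'Ada', 'Lovelace', '2016-01-01');
        INSERT INTO Orders VALUES (10, 1, 25.0);
    """)
    rows = conn.execute("""
        SELECT p.FirstName, p.LastName, o.Total
        FROM Person AS p
        JOIN Orders AS o ON o.PersonID = p.PersonID
    """).fetchall()
    conn.close()
    return rows
```

The join condition reads o.PersonID = p.PersonID, so the same entity carries the same name on both sides, which is the point the answer argues for.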

In my experience the more standard naming convention for columns is not to include the table name for the following reasons:
It is unnecessary repetition
It is harder to maintain because if the table name changes in the future all the columns would need renaming.
The convention may not be clear to another implementer further down the line
When you perform a query you can always alias the columns at that point if you need the table name in them.
I would only use the table name in a column if the column is a foreign key to another table. If you use this convention it makes it relatively easy to identify the foreign key columns within a table without the use of a relational diagram.
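To make the aliasing point concrete, here is a small sketch, assuming unprefixed column names and a foreign key that does carry the referenced table's name. It runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and the tables and rows are made up for the example.

```python
import sqlite3

def aliased_columns():
    """Return the result column names and the first row of an aliased join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person   (ID INTEGER PRIMARY KEY, Name TEXT);
        -- PersonID is a foreign key, so it keeps the referenced table's name
        CREATE TABLE Customer (
            ID       INTEGER PRIMARY KEY,
            Name     TEXT,
            PersonID INTEGER REFERENCES Person(ID)
        );
        INSERT INTO Person   VALUES (1, 'Ada');
        INSERT INTO Customer VALUES (7, 'Acme', 1);
    """)
    cur = conn.execute("""
        SELECT p.Name AS PersonName, c.Name AS CustomerName
        FROM Person AS p
        JOIN Customer AS c ON c.PersonID = p.ID
    """)
    names = [col[0] for col in cur.description]  # aliases become the column names
    row = cur.fetchone()
    conn.close()
    return names, row
```

The table name only appears in the result set, where the query author asked for it, and not in the stored schema.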

Ugh, I guess I'm just too lazy, so I would use Department.ID instead of Department.DepartmentID. I have enough redundant work as it is.

Your first example is better in my opinion. (The second one is confusing when you build queries and often requires the AS SQL keyword.)
However, in my shop we use a slightly different convention.
PrimaryKey - this column should be named starting with ID followed by the table name (IDPerson)
ForeignKey - these columns should be named starting with ID followed by the name of the referenced table (IDDepartment)
OtherColumns - they should have a meaningful name for the data they contain. Repeating the table name is required only for fields that would otherwise have the same name in different tables.
When using parameters to call a stored procedure you should reverse the convention (@personID, @departmentID)

Some columns might be repeated or hold the same value across tables; in that case (timestamps, for example) I generally use the same name. I work with financial data and prefer to name such columns after the stock itself rather than giving them a different name in each table.

I have never seen the same rule used by different vendors. I do not know whether there is a standard, but I think not!
Personally, I like your second option. Repeating the table name in the column name is unnecessary repetition.

I like to try to keep column names unique throughout my database. I do that by using prefixes for tables and columns so that even foreign keys are uniquely named. This makes doing joins simpler as I (usually) don't need to reference the tables in the joins:
--------------------
pe_people
--------------------
pe_personID (PK)
pe_firstName
pe_lastName
pe_timeStamp
--------------------
--------------------
ac_accounts
--------------------
ac_accountID (PK)
ac_personID (FK)
ac_accountName
ac_accountBalance
--------------------
SELECT pe_firstName, pe_lastName, ac_accountName, ac_accountBalance
FROM pe_people
INNER JOIN ac_accounts ON (ac_personID = pe_personID)
WHERE pe_timeStamp > '2016-01-01';

Related

How to handle invalid user input table name

I am writing a C# WinForms program which includes a user input textbox, the value of which will be used to create a table. I have been thinking about what the best way to handle invalid T-SQL table names is (though this can be extended to many other situations). Currently the only method I can think of would be to check the input string for any violations of valid table names individually, though this seems long winded and could be prone to missing certain characters for example due to my own ignorance of what is a violation and what is not.
I feel like there should be a better way of doing this but have been unable to find anything in my search so far. Can anyone help point me in the right direction?

As I already told you in a comment, you should not do this...
You might use something like this
USE master;
GO
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
CREATE TABLE UserTables(ID INT IDENTITY CONSTRAINT PK_UserTables PRIMARY KEY
,UserInput NVARCHAR(500) NOT NULL CONSTRAINT UQ_UserInput UNIQUE);
GO
INSERT INTO UserTables VALUES(N'blah')
,(N'invalid !%$& <<& >< $')
,(N'silly 💖');
GO
SELECT * FROM UserTables;
/*
ID UserInput
1 blah
2 invalid !%$& <<& >< $
3 silly 💖
*/
GO
USE master;
GO
DROP DATABASE dbTest;
GO
You would then create your tables as Table1, Table2 and so on.
Whenever a user enters his string, you visit the table, pick the ID and create the table's name by concatenating the word Table with the ID.
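A minimal sketch of that lookup step, written with Python's sqlite3 module instead of SQL Server so it can be run directly. The function name register_user_table and the single Result column of the generated tables are assumptions made for the example. The user's text is only ever passed as a bound parameter, and the table name is built purely from the integer ID.

```python
import sqlite3

def register_user_table(conn, user_input):
    """Store the raw user text and create a table named after its generated ID."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS UserTables (
            ID        INTEGER PRIMARY KEY AUTOINCREMENT,
            UserInput TEXT NOT NULL UNIQUE
        )
    """)
    # The user's string is bound as a parameter, never spliced into SQL text
    cur = conn.execute("INSERT INTO UserTables (UserInput) VALUES (?)", (user_input,))
    table_name = f"Table{cur.lastrowid}"
    # Safe to interpolate: the name consists of a fixed word plus an integer.
    # The Result column is a placeholder for whatever the real table would hold.
    conn.execute(f"CREATE TABLE {table_name} (Result TEXT)")
    return table_name

conn = sqlite3.connect(":memory:")
generated = [
    register_user_table(conn, text)
    for text in ("blah", "invalid !%$& <<& >< $", "silly 💖")
]
```

Because UserInput is declared UNIQUE, entering the same string twice raises sqlite3.IntegrityError instead of silently creating a second table.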
There are better approaches!
But you should think about a fixed schema. You will have to define columns (how many, which type, how to name them?). Querying this will be hell. Nothing to rely on...
One approach is a classical n:m mapping
A User table (UserID, Name, ...)
A test table (TestID, TestName, TestType, ...)
The mapping table (ID, UserID, TestID, Result VARCHAR(MAX))
Depending on what you need you might add a table
question table (QuestionID, QuestionText ...)
Then use a mapping to bind questions to tests and another mapping to bind answers to such mapped questions.
Another approach is to store the result in a generic container (XML or JSON). This keeps your tables slim, but you need to know the XML's structure in order to query it.
Many ways to skin a rabbit...
UPDATE
You ask for an explanation...
The main advantage of a relational database is the pre-known structure.
Precompiled queries, cached results, statistics and indexes all demand known structures.
Data integrity is ensured with constraints, foreign keys and so on. All this demands known names, known types(!) and known relations.
User-specific table names, and even worse: generically defined structures, do not allow for any join or other typical RDBMS operation. The only approach is to build each and every statement dynamically (string building).
The rule of thumb is: whenever you think you have to create several objects of the same kind that differ only in name, you should rethink the design. It is bad to store Phone1, Phone2 and Phone3. It is better to have a side table with a UserID and a Phone column (classical 1:n). It is bad to have SalesCustomerA, SalesCustomerB; better use a Customer table and bind its ID into a general Sales table as FK.
You see what I mean? What belongs together should live in one single table. If you need separation, add columns to your table and use them for grouping and filtering.
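The Phone1/Phone2/Phone3 advice could be sketched like this, again with sqlite3 standing in for SQL Server. The table names AppUser and UserPhone and the sample numbers are assumptions for the example.

```python
import sqlite3

def phones_per_user():
    """Count phone numbers per user from a 1:n side table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE AppUser (UserID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        -- One row per phone number instead of Phone1, Phone2, Phone3 columns
        CREATE TABLE UserPhone (
            UserID INTEGER NOT NULL REFERENCES AppUser(UserID),
            Phone  TEXT NOT NULL,
            PRIMARY KEY (UserID, Phone)
        );
        INSERT INTO AppUser VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO UserPhone VALUES
            (1, '555-0100'), (1, '555-0101'), (2, '555-0199');
    """)
    rows = conn.execute("""
        SELECT u.Name, COUNT(*) AS PhoneCount
        FROM AppUser AS u
        JOIN UserPhone AS ph ON ph.UserID = u.UserID
        GROUP BY u.UserID, u.Name
        ORDER BY u.Name
    """).fetchall()
    conn.close()
    return rows
```

Adding a fourth number for a user is then an INSERT, not a schema change, and the same query keeps working.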
Just imagine you want to do some statistical evaluation of all your user test tables. How would you gather the data into one big pool, if you cannot rely on some structure in common?
I hope this makes it clear...
If you still want to stick to your idea, you should give my code sample a try. It allows you to map any silly string to a secure and easy-to-handle table name.

Lots can go wrong with users entering table names. A bunch of whacked out names is a maintenance nightmare. A user should not even be aware of table names. It is a security risk, as the program now has to have database owner authority. You want to limit users to minimum authority.
Similar to Shnugo's answer, but with a composite primary key. This will prevent duplicate (userID, testID) pairs.
user
ID int identity PK
varchar(100) fName
varchar(100) lName
test
ID int identity PK
varchar(100) name
userTest
int userID PK FK to User
int testID PK FK to Test
int score
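-- user has fName and lName (no Name column); USER is a reserved word in T-SQL, hence the brackets
select t.name, u.fName, u.lName, ut.score
from userTest ut
join test t
on t.ID = ut.testID
join [user] u
on u.ID = ut.userID
order by t.name, u.lName, u.fName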
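To show what the composite primary key buys you, the following sketch tries to insert a second score for the same user and test. It uses sqlite3 in place of SQL Server; the user table is renamed AppUser here simply to sidestep the reserved word, and the sample rows are invented.

```python
import sqlite3

def try_duplicate_score():
    """Return (was the duplicate rejected?, rows left in userTest)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE AppUser (ID INTEGER PRIMARY KEY, fName TEXT, lName TEXT);
        CREATE TABLE Test    (ID INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE userTest (
            userID INTEGER NOT NULL REFERENCES AppUser(ID),
            testID INTEGER NOT NULL REFERENCES Test(ID),
            score  INTEGER,
            PRIMARY KEY (userID, testID)   -- one row per user per test
        );
        INSERT INTO AppUser  VALUES (1, 'Ann', 'Lee');
        INSERT INTO Test     VALUES (1, 'Algebra');
        INSERT INTO userTest VALUES (1, 1, 90);
    """)
    try:
        conn.execute("INSERT INTO userTest VALUES (1, 1, 75)")
        rejected = False
    except sqlite3.IntegrityError:
        # The (userID, testID) pair already exists
        rejected = True
    count = conn.execute("SELECT COUNT(*) FROM userTest").fetchone()[0]
    conn.close()
    return rejected, count
```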

insert a row in a table that links with PK autoincrement of another table

I have two tables
contact table
contactID (PK auto increment)
FirstName
LastName
Address
etc..
Patient table
PatientID
contactID (FK)
How can I add the contact info for Patient first, then link that contactID to Patient table
when the contactID is autoincrement (therefore not known until after the row is created)
I also have other tables
-Doctor, nurse etc
that also links to contact table..
Teacher table
TeacherID
contactID (FK)
So therefore all the contact details are located in one table.
Is this a good database design?
or is it better to put contact info for each entity in its own table..
So like this..
Patient table
PatientID (PK auto increment)
FirstName
LastName
Address
Doctor table
DoctorID (PK auto increment)
FirstName
LastName
Address
In terms of programming, it is easier to just have one insert statement.
eg.
INSERT INTO Patient VALUES(Id, @Firstname, @lastname, @Address)
But I do like the contact table separated (since it normalizes the data), but then there is the issue of not knowing what the contactID is until after the row is inserted, and also probably needing to do two insert statements (which I am not sure how to do)
=======
Reply to EDIT 4
With the login table, would you still have a userid(int PK) column?
E.g
Login table
UserId (int PK), Username, Password..
Username should be unique

You must first create the Contact and then once you know its primary key then create the Patient and reference the contact with the PK you now know. Or if the FK in the Patient table is nullable you can create the Patient first with NULL as the ContactId, create the contact and then update the Patient but I wouldn't do it like this.
The idea of foreign key constraints is that the row being referenced MUST exist therefore the row being referenced must exist BEFORE the row referencing it.
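The "parent first, then child" order described here might look like the sketch below. It is written against SQLite through Python's sqlite3 module, where cursor.lastrowid plays the role of reading back the generated key; the add_patient helper and the sample values are assumptions for the example.

```python
import sqlite3

def add_patient(conn, first_name, last_name, address):
    """Insert the contact, read back its generated key, then insert the patient."""
    with conn:  # both inserts are committed together, or rolled back together
        cur = conn.execute(
            "INSERT INTO Contact (FirstName, LastName, Address) VALUES (?, ?, ?)",
            (first_name, last_name, address),
        )
        contact_id = cur.lastrowid  # key generated for the row just inserted
        cur = conn.execute(
            "INSERT INTO Patient (ContactID) VALUES (?)", (contact_id,)
        )
        return cur.lastrowid, contact_id

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Contact (
        ContactID INTEGER PRIMARY KEY AUTOINCREMENT,
        FirstName TEXT, LastName TEXT, Address TEXT
    );
    CREATE TABLE Patient (
        PatientID INTEGER PRIMARY KEY AUTOINCREMENT,
        ContactID INTEGER NOT NULL REFERENCES Contact(ContactID)
    );
""")
patient_id, contact_id = add_patient(conn, "Ann", "Lee", "1 Main St")
```

Wrapping both statements in one transaction means a failure on the second insert does not leave an orphaned contact row behind.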
If you really need to be able to have the same Contact for multiple Patients then I think it's good db design. If the relationship is actually one-to-one, then you don't need to separate them into two tables. Given your examples, it might be that what you need is a Person table where you can put all the common properties of Doctors, Teachers and Patients.
EDIT:
I think inheritance is what you are really after. There are a few styles of implementing inheritance in a relational db, but here's one example.
Person database design
PersonId in Nurse and Doctor are foreign keys referencing Person table but they are also the primary keys of those tables.
To insert a Nurse row, you could do it like this (SQL Server):
INSERT INTO Person(FirstName) VALUES('Test nurse')
-- same batch as the insert above, so SCOPE_IDENTITY() returns the PersonId just generated
INSERT INTO Nurse(PersonId, IsRegistered) VALUES(SCOPE_IDENTITY(), 1)
GO
EDIT2:
The MySQL equivalent of SCOPE_IDENTITY() is LAST_INSERT_ID() (see the MySQL documentation).
EDIT3:
I wouldn't separate doctors and nurses into their own tables so that columns are duplicated. Doing a select without inner joins would probably be more efficient, but performance shouldn't be the only criterion, especially if the performance difference isn't that notable. There will be many occasions when you just need the common person data, so you don't always have to do the joins anyway. Having each person in the same table gives the possibility to look for a person in a single table. Having common properties in a single table also allows you to have a doctor who is also a patient without duplicating any data. Later, if you want to have more common attributes, you'd need to add them to each "derived" table too, and I assure you that one day you or someone else will forget to add the properties to one of the tables.
If for some reason you are still worried about performance and are willing to sacrifice normalization to gain performance, another possibility is to have all person columns in the same table, with a type column to distinguish them and a lot of null columns, so that all the nurse columns are null for doctors and so on. You can read about inheritance implementation strategies to get an idea, even though you aren't using Entity Framework.
EDIT4:
Even if you don't have any nurse-specific columns at the moment, I would still create a table for them if it's even slightly possible that there will be some in the future. Doing an inner join is a pretty good way to find the nurses, or you could do it in the WHERE clause (there are probably a billion ways to do this). You could have a type column in the Person table, but that would prevent the same person from being a doctor and a patient at the same time. Also, in my opinion separate tables are more "strict" and clearer for (future) developers.
I would probably make PersonId nullable in the User table, since you might have users that are not actual people in the organization, for example administrators or similar service users. Think about it in terms of real world entities (forget about foreign keys and nullables): is every user absolutely part of the organization? But all this is up to you and the requirements of the software. Database design should begin with an entity relationship design where you figure out the real world relationships without considering how they will be mapped to a relational database. This helps you to figure out what the actual requirements are.
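The separate-tables layout discussed in this answer can be sketched roughly as follows, using sqlite3 in place of SQL Server. Only Person.FirstName and Nurse.IsRegistered come from the answer's own insert example; the empty Doctor and Patient tables and the sample rows are assumptions.

```python
import sqlite3

def load_people():
    """Return (nurses, people who are both doctor and patient)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, FirstName TEXT NOT NULL);
        -- In each derived table, PersonId is both the primary key and a foreign key
        CREATE TABLE Nurse (
            PersonId     INTEGER PRIMARY KEY REFERENCES Person(PersonId),
            IsRegistered INTEGER NOT NULL
        );
        CREATE TABLE Doctor  (PersonId INTEGER PRIMARY KEY REFERENCES Person(PersonId));
        CREATE TABLE Patient (PersonId INTEGER PRIMARY KEY REFERENCES Person(PersonId));
        INSERT INTO Person  VALUES (1, 'Test nurse'), (2, 'Test doctor');
        INSERT INTO Nurse   VALUES (1, 1);
        INSERT INTO Doctor  VALUES (2);
        INSERT INTO Patient VALUES (2);  -- the doctor is also a patient
    """)
    nurses = conn.execute("""
        SELECT p.FirstName
        FROM Person AS p
        JOIN Nurse AS n ON n.PersonId = p.PersonId
    """).fetchall()
    doctor_patients = conn.execute("""
        SELECT p.FirstName
        FROM Person AS p
        JOIN Doctor  AS d  ON d.PersonId  = p.PersonId
        JOIN Patient AS pt ON pt.PersonId = p.PersonId
    """).fetchall()
    conn.close()
    return nurses, doctor_patients
```

With a single type column on Person, the second query could not return anything, which is the limitation the answer points out.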

selecting a column from ms-access database which has linked tables

I am building a Windows Forms C# app and I use OleDb to connect an Access database to it. The problem is that my Access database has two tables (students, courseCodes), and one column of the "students" table (courseName) is linked to one in the "courseCodes" table. The "courseCodes" table contains course codes; for example, course code 1 is Statics, and I use code 1 in the "students" table to display Statics. Now when I want to select the column containing Statics using
"SELECT DISTINCT courseName FROM students";
I get "1" instead of "Statics". Is there any way to retrieve "Statics" instead of "1"?

I'd say your naming convention is misleading and confusing. The column should be courseIndex, not courseName.
Do a JOIN, of course (no pun intended). This query will return the distinct course names that a given student has signed up for.
select distinct courseCode.courseName
from student
join courseCode
on student.courseId = courseCode.id
where student.id = ?
Please adjust for your schema details.
Personally I think this is a poor design. A student can sign up for many courses, and a course can have many students. This is a many-to-many relationship. You need a join table; sounds like you only have a foreign key one-to-many relationship here.
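A sketch of the many-to-many layout suggested here, with a join table between students and courses. It uses sqlite3 rather than Access/OleDb so it can run standalone; the studentCourse table, its column names and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courseCode (id INTEGER PRIMARY KEY, courseName TEXT NOT NULL);
    -- Join table: one row per enrolment, so both sides can have many
    CREATE TABLE studentCourse (
        studentId INTEGER NOT NULL REFERENCES student(id),
        courseId  INTEGER NOT NULL REFERENCES courseCode(id),
        PRIMARY KEY (studentId, courseId)
    );
    INSERT INTO student    VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO courseCode VALUES (1, 'Statics'), (2, 'Dynamics');
    INSERT INTO studentCourse VALUES (1, 1), (1, 2), (2, 1);
""")

def course_names_for(student_id):
    """Return the names (not the codes) of the courses a student takes."""
    rows = conn.execute("""
        SELECT DISTINCT c.courseName
        FROM studentCourse AS sc
        JOIN courseCode AS c ON c.id = sc.courseId
        WHERE sc.studentId = ?
        ORDER BY c.courseName
    """, (student_id,))
    return [name for (name,) in rows]
```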

Database Table Schema and Aggregate Roots

The application is single-user, 1-tier (1 PC), with a SqlCE database. The DataService layer will be (I think) a Repository returning domain objects and querying the database with LinqToSql (dbml). There are obviously a lot more columns; this is a simplified view.
LogTime in separate table: http://i53.tinypic.com/9h8cb4.png
LogTime in ItemTimeLog table (as Time): http://i51.tinypic.com/4dvv4.png
This is my first attempt at creating a database with more than 2 tables. I think the table schema makes sense, but I need some reassurance or criticism, because the table relations look quite scary to be honest. I'm hoping you could:
Look at the table schema and respond if there are clear signs of trouble or errors that you spot right away. And if you have time,
Look at Program Summary/Questions, and see if the table layout makes sense with respect to those points.
Please be brutal, I will try to defend :)
Program summary:
a) A set of categories, each having a set of strategies (1:m)
b) Each day a number of items will be produced. And each strategy MAY reference it.
(So there can be 50 items, and a strategy may reference 23 of them)
c) An item can be referenced by more than one strategy. So I think it's an m:m relation.
d) Status values will be logged at fixed time-fractions through the day, for:
- .... each Strategy.....each StrategyItem....each item
e) An action on an item may be executed by a strategy that reference it.
- This is logged as ItemAction (Could have called it StrategyItemAction)
User Requests
b) -> e) describe the main activity mode of the program: working with only today's DayLog, for each category. The second-priority activity is retrieval of history, which typically will be: from all categories, from day x to day y, get all StrategyDailyLog.
Questions
First, does the overall layout look sound? I'm worried to see that there are so many relationships in all directions, connecting everything. Is this normal, or does it look like trouble?
StrategyItem is made to represent an m:m relationship. Is it correct as I noted 1:m / 1:1 (marked red) ?
StrategyItemTimeLog and ItemTimeLog log values that both need to be retrieved together when retrieving a StrategyItem. The reason I separated them is that the first one is strategy-specific, and several strategies can reference the same item. So I thought not to duplicate those values that are not dependent on the strategy, but only on the item. Hence I also pulled out the LogTime, as it seems to be the only parameter that unites the logs. But this all looks quite disturbing with those 3 tables. Does it make sense at all? Or do you have a suggestion?
Pink circles show my vague attempt at Aggregate Root Paths. I've been thinking in terms of "what entity is responsible for delete", though I'm unsure about the actual root. I think it's Category. Does that make sense in relation to the User Requests described above?
EDIT1:
(Updated schema, showing typical number of hierarchy items for the first few relations, for 365 days, and additional explanations)
1:1 relation: Sorry. I made a mistake. The StrategyDailyLog should be 1:m. See updated schema. It is one per Strategy, per day.
DayLog / StrategyDailyLog: I’ve been pondering whether DayLog should be a part of the hierarchy like this or not. The purpose of the DayLog table is to hold “sum values” derived from all the StrategyDailyLog rows for the same day, like performance values for this day. It also holds the date value, which allows me to omit a date value in the StrategyDailyLog (which I feel would be a kind of duplicate modeling of the date field); instead, the reference to DayLog exists to “find” the date. I’m not sure if this is an abuse/misconception of normalization.
Null value: I hadn’t thought about this. I believe I found 2, as now marked in StrategyDailyLog and ItemAction. They cannot be null on creation, but they can be set to null if one needs to delete either a Strategy or a StrategyItem. That should not require a delete of the StrategyDailyLog and the ItemAction. Hence they can be set to null.
All Id –columns: My idea was to have ID (autogenerated Integer) as PK for all my tables. I believed that also would be sufficient as candidate key. Is this not a proper way to make PKs? It’s the only way any table of mine can be identified. I asked a question before if that was ok, maybe I misunderstood, but thought that was a good approach.
m:m relation: This is what I have attempted to do: StrategyItem is the m:m table of StrategyDailyLog / DailyItem.

Ok. Here is me being brutal. I do not understand the model.
So instead of trying to comment on that so much, here are some thoughts that came to my mind when I looked at it.
I think you should have a look at your 1:1 relationships (all of them). Why are DayLog and StrategyDailyLog separated into two tables? Probably because you will always have at least one DayLog item, but not all DayLog items have a StrategyDailyLog item. If that is the case, you can have a StrategyID FK in the DayLog table that allows nulls.
It would help to understand the model if you could show which fields are required and which fields accept null as a value.
All your tables have their own id column. That can be quite confusing when doing 1:1 relations and m:m relations. For a 1:1 relation, the relation between the two tables is usually made on the primary key in both tables. If you do not do that, you have to create a candidate key on the foreign key column. In your case that means that StrategyDailyLog should have a candidate key on DayLogID.
An m:m relation between two tables is usually solved by adding a new table in between, with the primary keys from both tables. Those fields together are the primary key of the table in the middle.
Let's say, for example, that you have an m:m relationship between Category and Strategy. You should then create a table called CategoryStrategy with two fields, CategoryID and StrategyID, that together are the primary key of table CategoryStrategy.
I hope my comments make sense and that they are useful to you.
EDIT 2011-01-17
I do not think that you should make it a principle to use an IDENTITY column as primary key in all tables. An m:m relation does not need it, so you should not do it. I also think that you have misunderstood what I meant by a candidate key. A candidate key is a key that could have been used as the primary key. In MS SQL Server you define a UNIQUE CONSTRAINT for your candidate key.
Ex: Table StrategyItem has id as PK, but the combination of StrategyID and DailyItemID is the candidate key. It would be better to remove id and use StrategyID+DailyItemID as PK.
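The two options for StrategyItem can be compared with a short sketch: a surrogate id plus a UNIQUE constraint on the candidate key, versus the composite primary key recommended here. It runs on SQLite via Python's sqlite3 module rather than on SQL Server, and the helper name duplicate_pair_rejected is made up for the example.

```python
import sqlite3

# Variant 1: keep the surrogate id, declare the candidate key as UNIQUE
SURROGATE_WITH_UNIQUE = """
    CREATE TABLE StrategyItem (
        id          INTEGER PRIMARY KEY,
        StrategyID  INTEGER NOT NULL,
        DailyItemID INTEGER NOT NULL,
        UNIQUE (StrategyID, DailyItemID)
    )
"""

# Variant 2: drop id and promote the candidate key to primary key
COMPOSITE_PK = """
    CREATE TABLE StrategyItem (
        StrategyID  INTEGER NOT NULL,
        DailyItemID INTEGER NOT NULL,
        PRIMARY KEY (StrategyID, DailyItemID)
    )
"""

def duplicate_pair_rejected(ddl):
    """Create the table from ddl and try to insert the same pair twice."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        insert = "INSERT INTO StrategyItem (StrategyID, DailyItemID) VALUES (1, 1)"
        conn.execute(insert)
        try:
            conn.execute(insert)
        except sqlite3.IntegrityError:
            return True
        return False
    finally:
        conn.close()
```

Both variants reject the duplicate pair; the second simply avoids carrying a column that identifies nothing the pair does not already identify.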
Below is the schema that I would have built with your description. I might have missed something important because I do not know everything about what you want to do.
You should not think so much about query performance and building aggregates when designing the schema. That can be handled by creating indexes on columns and using sum, count and group by in your queries. An index on the Created column in the model below would be necessary for your queries on a date or date interval. In MS SQL Server there is something called the clustered index. By default the PK of a table is the clustered index, but in this case I would make the index on the Created column the clustered index.
A Category has 0, 1 or more Strategies.
LogItem has one Category and optionally one Strategy.
LogItem.Created holds date and time.
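As a sketch of "index the date column and aggregate in the query", the example below stores raw LogItem rows and computes per-category counts and sums for one day on demand. It uses sqlite3 instead of SQL Server, so the clustered-index choice is not shown; the Value column, the index name and the sample rows are assumptions.

```python
import sqlite3

def daily_totals(day_start, day_end):
    """Count and sum LogItem values per category for Created in [day_start, day_end)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE LogItem (
            LogItemID  INTEGER PRIMARY KEY,
            CategoryID INTEGER NOT NULL,
            StrategyID INTEGER,            -- optional, so NULL is allowed
            Created    TEXT NOT NULL,      -- date and time, ISO 8601 text
            Value      REAL NOT NULL       -- hypothetical logged value
        );
        CREATE INDEX IX_LogItem_Created ON LogItem (Created);
        INSERT INTO LogItem VALUES
            (1, 1, NULL, '2011-01-16 09:00:00', 1.5),
            (2, 1, 5,    '2011-01-17 09:00:00', 2.0),
            (3, 1, 5,    '2011-01-17 10:00:00', 3.0),
            (4, 2, NULL, '2011-01-17 11:00:00', 4.0);
    """)
    rows = conn.execute("""
        SELECT CategoryID, COUNT(*) AS Items, SUM(Value) AS Total
        FROM LogItem
        WHERE Created >= ? AND Created < ?
        GROUP BY CategoryID
        ORDER BY CategoryID
    """, (day_start, day_end)).fetchall()
    conn.close()
    return rows
```

Calling daily_totals('2011-01-17', '2011-01-18') aggregates only that day's rows, so no separate table of precomputed "sum values" is needed for this kind of question.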

How to Update the primary key of table which is referenced as foreign key in another table?

Suppose a
Table "Person" having
"SSN",
"Name",
"Address"
and another
Table "Contacts" having
"Contact_ID",
"Contact_Type",
"SSN" (primary key of Person)
similarly
Table "Records" having
"Record_ID",
"Record_Type",
"SSN" (primary key of Person)
Now I want a change or update of SSN in the Person table to be applied accordingly in the other 2 tables.
Can anyone help me with a trigger for that,
or with how to set up the foreign key constraints for these tables?

Just add ON UPDATE CASCADE to the foreign key constraint.

Preferably the primary key of a table should never change. If you expect the SSN to change you should use a different primary key and have the SSN as a normal data column in the person table. If it's already too late to make this change, you can add ON UPDATE CASCADE to the foreign key constraint.
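A small sketch of ON UPDATE CASCADE in action, using SQLite through Python's sqlite3 module as a stand-in for whichever engine the question targets. The SSN values are obviously fake placeholders, and the function name cascade_demo is made up.

```python
import sqlite3

def cascade_demo():
    """Change a Person's SSN and return the SSNs now stored in the child tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE Person (SSN TEXT PRIMARY KEY, Name TEXT, Address TEXT);
        CREATE TABLE Contacts (
            Contact_ID   INTEGER PRIMARY KEY,
            Contact_Type TEXT,
            SSN TEXT NOT NULL REFERENCES Person(SSN) ON UPDATE CASCADE
        );
        CREATE TABLE Records (
            Record_ID   INTEGER PRIMARY KEY,
            Record_Type TEXT,
            SSN TEXT NOT NULL REFERENCES Person(SSN) ON UPDATE CASCADE
        );
        INSERT INTO Person   VALUES ('000-00-0001', 'Ann', '1 Main St');
        INSERT INTO Contacts VALUES (1, 'phone', '000-00-0001');
        INSERT INTO Records  VALUES (1, 'visit', '000-00-0001');
    """)
    # Only the parent row is updated; the engine rewrites the referencing rows
    conn.execute(
        "UPDATE Person SET SSN = ? WHERE SSN = ?", ("000-00-0002", "000-00-0001")
    )
    contacts = [ssn for (ssn,) in conn.execute("SELECT SSN FROM Contacts")]
    records = [ssn for (ssn,) in conn.execute("SELECT SSN FROM Records")]
    conn.close()
    return contacts, records
```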

If you have PKs that change, you need to look at the table design and use a surrogate PK, like an identity.
In your question you have a Person table whose PK could be referenced as an FK by many, many tables. In that case an ON UPDATE CASCADE could have some serious problems. The database I'm working on has well over 300 references (FK) to our equivalent table; we track all the various work that a person does in each different table. If I insert a row into our Person table and then try to delete it back out again (it will not be used in any other tables, it is new), the delete will fail with a Msg 8621, Level 17, State 2, Line 1 The query processor ran out of stack space during query optimization. Please simplify the query. As a result I can't imagine an ON UPDATE CASCADE would work either when you get many FKs on your PK.
I would never make sensitive data like a SSN a PK. Health care companies used to do this and had a painful switch because of privacy. I hope you don't have a web app and have a GET or POST variable called SSN with the actual value in it!! Or display the SSN on every report, or will you shred all old printed reports and limit access to who views each report., etc.

Well, assuming the SSN is the primary key of the Person table, I would just (in a transaction of course):
create a brand new row with the new SSN, copying all other details from the old row.
update the columns in the other tables to point to the new row.
delete the old row.
Now this is actually a good example of why you shouldn't use real data as table cross-references, if that data can change. If you'd used an artificial column to tie them together (and only stored the SSN in one place), you wouldn't have the problem.
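The three steps in this answer could be sketched as one transaction, shown here with sqlite3 instead of the asker's database. The change_ssn helper and the fake SSN values are assumptions; no cascade is declared, so the foreign keys themselves enforce that the steps happen in a workable order.

```python
import sqlite3

def change_ssn(conn, old_ssn, new_ssn):
    """Copy the person under the new SSN, re-point children, delete the old row."""
    with conn:  # all four statements commit together or not at all
        conn.execute(
            "INSERT INTO Person (SSN, Name, Address) "
            "SELECT ?, Name, Address FROM Person WHERE SSN = ?",
            (new_ssn, old_ssn),
        )
        conn.execute("UPDATE Contacts SET SSN = ? WHERE SSN = ?", (new_ssn, old_ssn))
        conn.execute("UPDATE Records  SET SSN = ? WHERE SSN = ?", (new_ssn, old_ssn))
        conn.execute("DELETE FROM Person WHERE SSN = ?", (old_ssn,))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Person (SSN TEXT PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Contacts (
        Contact_ID INTEGER PRIMARY KEY, Contact_Type TEXT,
        SSN TEXT NOT NULL REFERENCES Person(SSN)
    );
    CREATE TABLE Records (
        Record_ID INTEGER PRIMARY KEY, Record_Type TEXT,
        SSN TEXT NOT NULL REFERENCES Person(SSN)
    );
    INSERT INTO Person   VALUES ('000-00-0001', 'Ann', '1 Main St');
    INSERT INTO Contacts VALUES (1, 'phone', '000-00-0001');
    INSERT INTO Records  VALUES (1, 'visit', '000-00-0001');
""")
change_ssn(conn, "000-00-0001", "000-00-0002")
```

With a surrogate key, the whole function would collapse into a single UPDATE of one column in Person.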

Cascade update and delete are very dangerous to use. If you have a million child records, you could end up with a serious locking problem. You should code the updates and deletes instead.
You should never use a PK with the potential to change if it can be avoided. Nor should you ever use SSN as a PK, because it should never be stored unencrypted in your database. Never, unless your company likes to be sued when it is the cause of an identity theft incident. This is not a design flaw to shrug off with "this is legacy, we don't have time to fix it". This is a design flaw that could bankrupt your company if someone steals your backup tapes or gets the SSNs out of the system in another manner (most of these types of thefts are internal, BTW). This is an urgent, must-fix-now design flaw.
SSN is also a bad candidate because it changes (people change them when they are victims of identity theft for instance.) Plus an integer PK will have faster performance than a nine-digit PK.
