I have to create a database structure. I have a question about foreing keys and good practice:
I have a table which must have a field that can be two different string values, either "A" or "B".
It cannot be anything else (therefore, i cannot use a string type field).
What is the best way to design this table:
1) create an int field which is a foreign key to another table with just two records, one for the string "A" and one for the string "B"
2) create an int field then, in my application, create an enumeration such as this
public enum StringAllowedValues
{
A = 1,
B
}
3) ???
In advance, thanks for your time.
Edit: 13 minutes later and I get all this awesome feedback. Thank you all for the ideas and insight.
Many database engines support enumerations as a data type. And there are, indeed, cases where an enumeration is the right design solution.
However...
There are two requirements which may decide that a foreign key to a separate table is better.
The first is: it may be necessary to increase the number of valid options in that column. In most cases, you want to do this without a software deployment; enumerations are "baked in", so in this case, a table into which you can write new data is much more efficient.
The second is: the application needs to reason about the values in this column, in ways that may go beyond "A" or "B". For instance, "A" may be greater/older/more expensive than "B", or there is some other attribute to A that you want to present to the end user, or A is short-hand for something.
In this case, it is much better to explicitly model this as columns in a table, instead of baking this knowledge into your queries.
In 30 years of working with databases, I personally have never found a case where an enumeration was the right decision....
Create a secondary table with the meanings of these integer codes. There's nothing that compels you to JOIN that in, but if you need to that data is there. Within your C# code you can still use an enum to look things up but try to keep that in sync with what's in the database, or vice-versa. One of those should be authoritative.
In practice you'll often find that short strings are easier to work with than rigid enums. In the 1990s when computers were slow and disk space scarce you had to do things like this to get reasonable performance. Now it's not really an issue even on tables with hundreds of millions of rows.
Related
I have a table that has Constant Value...Is it better that I have this table in my Database(that is SQL)or have an Enum in my code and delete my table?
my table has only 2 Columns and maximum 20 rows that these rows are fixed and get filled once,first time that i run application.
I would suggest to create an Enum for your case. Since the values are fixed(and I am assuming that the table is not going to change very often) you can use Enum. Creating a table in database will require an unnecessary hit to the database and will require a database connection which could be skipped if you are using Enum.
Also a lot may depend on how much operation you are going to do with your values. For example: its tedious to query your Enum values to get distinct values from your table. Whereas if you will use table approach then it would be a simple select distinct. So you may have to look into your need and the operations which you will perform on these values.
As far as the performance is concerned you can look at: Enum Fields VS Varchar VS Int + Joined table: What is Faster?
As you can see, ENUM and VARCHAR results are almost the same, but join
query performance is 30% lower. Also note the times themselves –
traversing about same amount of rows full table scan performs about 25
times better than accessing rows via index (for the case when data
fits in memory!)
So, if you have an application and you need to have some table field
with a small set of possible values, I’d still suggest you to use
ENUM, but now we can see that performance hit may not be as large as
you expect. Though again a lot depends on your data and queries.
That depends on your needs.
You may want to translate the Enum Values (if you are showing it in GUI) and order a set of record based on translated values. For example: imagine you have a Employees table and a Position column. If the record set is big, and you want to sort or order by translated position column, then you have to keep the enum values + translations in database.
Otherwise KISS and have it in code. You will spare time on asking database for values.
I depends on character of that constants.
If they are some low level system constants that never should be change (like pi=3.1415) then it is better to keep them only in code part in some config file. And also if performance is critical parameter and you use them very often (on almost each request) it is better to keep them in code part.
If they are some constants (may be business constants) that can change in future it is Ok to put them in table - then you have more flexibility to change them (for instance from admin panel).
It really depends on what you actually need.
With Enum
It is faster to access
Bound to that certain application. (although you can share by making it as reference, but it just does not look as good as using DB)
You can use in switch statement
Enum usually does not care about value and it is limited to int.
With DB
It is slower, because you have to make connection and query.
The data can be shared widely.
You can set the value to be anything (any type any value).
So, if you will use it only on certain application, Enum is good enough. But if several applications are going to use it, then DB would be better option.
I am migrating an old database (oracle) and there are few tables like CountryCode, DeptCode and RoleCodes, their primary key is string (Codes) and i am thinking about adding Number column as a primary key because it would work fast with joins. These tables are not really big.
I am wondering if primary key for those tables should start from number '1' or it can be started from 100 just to differentiate b/w tables PK although i don't think i would be showing them on reports.
For sequence-generated IDs, I would suggest starting at different values if it's easy to do (depends on your database etc). You shouldn't be using this to differentiate between them in code, but it can make testing more reasonable.
Before now, I've had a situation where I've accidentally used a foreign key one table as if it were the foreign key for another table. The tests passed as the IDs were coincidentally the same. After we discovered the problem, we changed the initial seed and found the tests were a lot clearer.
You shouldn't do it to differentiate between tables. That is just not practical.
Not all primary keys have to start at 1, as in the case of an order number.
The rationale you're using to switch to an integer primary key doesn't seem valid: the performance gain you'd see using an INT rather than the original codes (which I assume are strings) will be negligable. The PK is always indexed, and indexes for strings or numerics are as good as instant. So unless you really need an INT, I'd be tempted to stick with the original data-type and work with the original data - simplifies data migration (which is something that should be considered whilst doing any work).
It is very common for example in ERP systems to define number ranges that
represent a certain group of items.
This can be both as position in a bigger number, e.g.
1234567890
| |
index 4 - 6 represents region code
index 7 - 8 represents dept code...
or, as I suspect in your case, parts at the same place, like
1000 - 1999 Region codes
2000 - 2999 DeptCode
3000 - 3999 RoleCode
Therefore: No, it not necessarily starts with 1.
Bigger ERP Systems have even configuration sections for number ranges!
Now, from a database point of view:
Yes, your tables should always have a primary key!
Having one will tremendously improve performance on average cases.
(but in most database systems, if you do not provide one, one will be
set by the DBMS which you do not see and can not handle. Some DBMS even
create indices, but thats another story)
I think it does not matter the start number or the start value that will hold the primary key .
What is important is that they will be represented in the FK of the join tables with the same values that are in the PK of the MAIN table .
A surrogate key can have any values, as long as they are unique. That's what makes it "surrogate" after all - values have no intrinsic meaning on their own, and shouldn't generally even be shown to the user. That being said, you could think about using different seeds, just for testing purposes, as Jon Skeet suggested.
That being said, do you really need to introduce a new (surrogate) key? The existing natural key could actually lead to less1 JOINS, and may be useful for clustering. While there are legitimate uses for surrogate keys, don't do it just becaus it is "fashionable" - always be aware of the tradeoffs you are making and pick the right balance for you concrete needs.
1 It is automatically "propagated" down foreign keys, so you don't need to JOIN the child table to the parent just to get the natural key - natural key is already in the child.
Doesn't matter what int the primary key starts from.
Assuming the codes aren't updated regularly, I don't believe that int will be any faster. It more heavily depends on it being a varchar or of a known size.
I personally always have an field names "Id" as a primary key to a table, defined as an int or a bigInt if necessary.
If the table matches up to an enumerated type then I make sure the Id matches the EnumeratedType id which can be any number - so no it doesn't need to start from 1.
If it doesn't match an enumerated type, then I will usually use an auto-incrementing key starting from 1 but this is not always needed.
Note - that if the number of rows is small, then the difference between indexing on a number and on a varchar will be negligible.
yes, it does'nt matter what integer it start from, it main use is define row uniquely and relationship among other table.
I'm going to be creating competitions on the current site I'm working on. Each competition is not going to be the same and may have a varying number of input fields that a user must enter to be part of the competition eg.
Competition 1 might just require a firstname
Competition 2 might require a firstname, lastname and email address.
I will also be building a tool to observe these entries so that I can look at each individual entry.
My question is what is the best way to store an arbitrary number of fields? I was thinking of two options, one being to write each entry to a CSV file containing all the entries of the competition, the other being to have a db table with a varchar field in the database that just stores an entire entry as text. Both of these methods seem messy, is there any common practice for this sort of task?
I could in theory create a db table with a column for every possible field, but it won't work when the competition has specific requirements such as "Tell us in 100 words why..." or "Enter your 5 favourite things that.."
ANSWERED:
I have decided to use the method described below where there are multiple generic columns that can be utilized for different purposes per competition.
Initially I was going to use EAV, and I still think it might be slightly more appropriate for this specific scenario. But it is generally recommended against because of it's poor scalability and complicated querying, and I wouldn't want to get into a habit of using it. Both answers worked absolutely fine in my tests.
I think you are right to be cautious about EAV as it will make your code a bit more complex, and it will be a bit more difficult to do ad-hoc queries against the table.
I've seen many enterprise apps simply adopt something like the following schema -
t_Comp_Data
-----------
CompId
Name
Surname
Email
Field1
Field2
Field3
...
Fieldn
In this instance, the generic fields (Field1 etc) mean different things for the different competitions. For ease of querying, you might create a different view for each competition, with the proper field names aliased in.
I'm usually hesitant to use it, but this looks like a good situation for the Entity-attribute-value model if you use a database.
Basically, you have a CompetitionEntry (entity) table with the standard fields which make up every entry (Competition_id, maybe dates, etc), and then a CompetitionEntryAttribute table with CompetitionEntry_id, Attribute and Value.You probably also want another table with template attributes for each competition for creating new entries.
Unfortunately you will only be able to store one datatype, which will likely have to be a large nvarchar.
Another disadvantage is the difficulty to query against EAV databases.
Another option is to create one table per competition (possibly in code as part of the competition creation), but depending on the number of competitions this may be impractcal.
Suppose i have one table that holds Blogs.
The schema looks like :
ID (int)| Title (varchar 50) | Value (longtext) | Images (longtext)| ....
In the field Images i store an XML Serialized List of images that are associated with the blog.
Should i use another table for this purpose?
Yes, you should put the images in another table. Having several values in the same field indicates denormalized data and makes it hard to work with the database.
As with all rules, there are exceptions where it makes sense to put XML with multiple values in one field in the database. The first rule is that:
The data should always read/written together. No need to read or update just one of the values.
If that is fulfilled, there can be a number of reasons to put the data together in one field:
Storage efficiency, if space has proved to be a problem.
Retrieval efficiency, if performance has proved to be a problem.
Schema flexilibity; where one XML field can eliminate tens or hundreds of different tables.
I would certainly use another table. If you use XML, what happens when you need to go through and update the references to all images? (Would you just rather do an Update blog_images Set ..., or parse through the XML for each row, make the update, then re-generate the updated XML for each?
Well, it is a bit "inner platform", but it will work. A separate table would allow better image querying, although on some RDBMS platforms this could also be achieved via an XML-type column and SQL/XML.
If this data only has to be opaque storage, then maybe. However, keep in mind you'll generally have to bring back the entire XML to the app-tier to do anything interesting with it (or: depending on platform, use SQL/XML, but I advise against this, as the DB isn't the place to do such processing in most cases).
My advice in all other cases: separate table.
That depends on whether you'd need to query on the actual image data itself. If you see a possible need to query on certain images, or images with certain attributes, then it would probably be best to store that image data in a different way.
Otherwise, leave it the way it is.
But remember, only include the fields in your SELECT when you need them.
Should i use another table for this purpose?
Not necessarily. You just have to ensure that you are not selecting the images field in your queries when you don't need it. But if you wanted to denormalize your schema you could use another table and when you need the images perform a join.
I'm creating a data-entry application where users are allowed to create the entry schema.
My first version of this just created a single table per entry schema with each entry spanning a single or multiple columns (for complex types) with the appropriate data type. This allowed for "fast" querying (on small datasets as I didn't index all columns) and simple synchronization where the data-entry was distributed on several databases.
I'm not quite happy with this solution though; the only positive thing is the simplicity...
I can only store a fixed number of columns. I need to create indexes on all columns. I need to recreate the table on schema changes.
Some of my key design criterias are:
Very fast querying (Using a simple domain specific query language)
Writes doesn't have to be fast
Many concurrent users
Schemas will change often
Schemas might contain many thousand columns
The data-entries might be distributed and needs syncronization.
Preferable MySQL and SQLite - Databases like DB2 and Oracle is out of the question.
Using .Net/Mono
I've been thinking of a couple of possible designs, but none of them seems like a good choice.
Solution 1: Union like table containing a Type column and one nullable column per type.
This avoids joins, but will definitly use a lot of space.
Solution 2: Key/value store. All values are stored as string and converted when needed.
Also use a lot of space, and of course, I hate having to convert everything to string.
Solution 3: Use an xml database or store values as xml.
Without any experience I would think this is quite slow (at least for the relational model unless there is some very good xpath support).
I also would like to avoid an xml database as other parts of the application fits better as a relational model, and being able to join the data is helpful.
I cannot help to think that someone has solved (some of) this already, but I'm unable to find anything. Not quite sure what to search for either...
I know market research is doing something like this for their questionnaires, but there are few open source implementations, and the ones I've found doesn't quite fit the bill.
PSPP has much of the logic I'm thinking of; primitive column types, many columns, many rows, fast querying and merging. Too bad it doesn't work against a database.. And of course... I don't need 99% of the provided functionality, but a lot of stuff not included.
I'm not sure this is the right place to ask such a design related question, but I hope someone here has some tips, know of any existing work, or can point me to a better place to ask such a question.
Thanks in advance!
Have you already considered the most trivial solution: having one table for each of your datatypes and storing the schema of your dataset in the database as well. Most simple solution:
DATASET Table (Virtual "table")
ID - primary key
Name - Name for the dataset/table
COLUMNSCHEMA Table (specifies the columns for one "dataset")
DATASETID - int (reference to Dataset-table)
COLID - smallint (unique # of the column)
Name - varchar
DataType - ("varchar", "int", whatever)
Row Table
DATASETID
ID - Unique id for the "row"
ColumnData Table (one for each datatype)
ROWID - int (reference to Row-table)
COLID - smallint
DATA - (varchar/int/whatever)
To query a dataset (a virtual table), you must then dynamically construct a SQL statement using the schema information in COLUMNSCHEMA table.