I'm going to be creating competitions on the site I'm currently working on. The competitions will not all be the same and may have a varying number of input fields that a user must fill in to take part, e.g.:
Competition 1 might just require a firstname
Competition 2 might require a firstname, lastname and email address.
I will also be building a tool to observe these entries so that I can look at each individual entry.
My question is what is the best way to store an arbitrary number of fields? I was thinking of two options, one being to write each entry to a CSV file containing all the entries of the competition, the other being to have a db table with a varchar field in the database that just stores an entire entry as text. Both of these methods seem messy, is there any common practice for this sort of task?
I could in theory create a db table with a column for every possible field, but it won't work when the competition has specific requirements such as "Tell us in 100 words why..." or "Enter your 5 favourite things that.."
ANSWERED:
I have decided to use the method described below where there are multiple generic columns that can be utilized for different purposes per competition.
Initially I was going to use EAV, and I still think it might be slightly more appropriate for this specific scenario. But it is generally recommended against because of its poor scalability and complicated querying, and I wouldn't want to get into the habit of using it. Both answers worked absolutely fine in my tests.
I think you are right to be cautious about EAV as it will make your code a bit more complex, and it will be a bit more difficult to do ad-hoc queries against the table.
I've seen many enterprise apps simply adopt something like the following schema -
t_Comp_Data
-----------
CompId
Name
Surname
Email
Field1
Field2
Field3
...
Fieldn
In this instance, the generic fields (Field1 etc) mean different things for the different competitions. For ease of querying, you might create a different view for each competition, with the proper field names aliased in.
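A minimal sketch of the generic-column table plus a per-competition view, written in Python against an in-memory SQLite database. The EntryId key, the three-field limit, the view name v_Comp2_Entries and the alias WhyIn100Words are assumptions made up for illustration; only the t_Comp_Data column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_Comp_Data (
    EntryId INTEGER PRIMARY KEY,   -- assumed surrogate key, not in the schema above
    CompId  INTEGER NOT NULL,
    Name    TEXT,
    Surname TEXT,
    Email   TEXT,
    Field1  TEXT,                  -- generic columns: meaning depends on CompId
    Field2  TEXT,
    Field3  TEXT
);

-- Hypothetical competition 2: Field1 holds the "Tell us in 100 words" answer.
CREATE VIEW v_Comp2_Entries AS
SELECT EntryId, Name, Surname, Email, Field1 AS WhyIn100Words
FROM t_Comp_Data
WHERE CompId = 2;
""")

# Competition 1 only needs a first name; competition 2 uses more columns.
conn.execute("INSERT INTO t_Comp_Data (CompId, Name) VALUES (1, 'Ann')")
conn.execute(
    "INSERT INTO t_Comp_Data (CompId, Name, Surname, Email, Field1) "
    "VALUES (?, ?, ?, ?, ?)",
    (2, "Bob", "Smith", "bob@example.com", "Because I like competitions."),
)

# The viewing tool can query the view and get meaningful column names back.
cur = conn.execute("SELECT * FROM v_Comp2_Entries")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```

The view keeps the aliasing in one place, so ad-hoc queries for a given competition never need to know which FieldN was used for what.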
I'm usually hesitant to use it, but this looks like a good situation for the Entity-attribute-value model if you use a database.
Basically, you have a CompetitionEntry (entity) table with the standard fields which make up every entry (Competition_id, maybe dates, etc), and then a CompetitionEntryAttribute table with CompetitionEntry_id, Attribute and Value. You probably also want another table with template attributes for each competition, used when creating new entries.
Unfortunately you will only be able to store one datatype, which will likely have to be a large nvarchar.
Another disadvantage is the difficulty of querying EAV databases.
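To make the trade-off concrete, here is a rough sketch of the EAV layout in Python with SQLite. The table and column names follow the description above; the Id column, the add_entry helper and the sample attributes are assumptions for illustration. The pivot query at the end shows the kind of boilerplate needed just to get one row per entry back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CompetitionEntry (
    Id             INTEGER PRIMARY KEY,   -- assumed key name
    Competition_id INTEGER NOT NULL
);
CREATE TABLE CompetitionEntryAttribute (
    CompetitionEntry_id INTEGER NOT NULL REFERENCES CompetitionEntry(Id),
    Attribute           TEXT NOT NULL,
    Value               TEXT,             -- single datatype for every value
    PRIMARY KEY (CompetitionEntry_id, Attribute)
);
""")


def add_entry(competition_id, attributes):
    """Hypothetical helper: store one entry and its attribute/value pairs."""
    cur = conn.execute(
        "INSERT INTO CompetitionEntry (Competition_id) VALUES (?)",
        (competition_id,),
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO CompetitionEntryAttribute VALUES (?, ?, ?)",
        [(entry_id, name, value) for name, value in attributes.items()],
    )
    return entry_id


add_entry(1, {"firstname": "Ann"})
add_entry(2, {"firstname": "Bob", "lastname": "Smith", "email": "bob@example.com"})

# Turning attribute rows back into columns needs one CASE expression per attribute.
pivot_rows = conn.execute("""
SELECT e.Id,
       MAX(CASE WHEN a.Attribute = 'firstname' THEN a.Value END) AS firstname,
       MAX(CASE WHEN a.Attribute = 'lastname'  THEN a.Value END) AS lastname,
       MAX(CASE WHEN a.Attribute = 'email'     THEN a.Value END) AS email
FROM CompetitionEntry e
JOIN CompetitionEntryAttribute a ON a.CompetitionEntry_id = e.Id
WHERE e.Competition_id = 2
GROUP BY e.Id
""").fetchall()
print(pivot_rows)
```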
Another option is to create one table per competition (possibly in code as part of the competition creation), but depending on the number of competitions this may be impractical.
I have to create a database structure. I have a question about foreign keys and good practice:
I have a table which must have a field that can be two different string values, either "A" or "B".
It cannot be anything else (therefore, I cannot use a string type field).
What is the best way to design this table:
1) create an int field which is a foreign key to another table with just two records, one for the string "A" and one for the string "B"
2) create an int field then, in my application, create an enumeration such as this
public enum StringAllowedValues
{
A = 1,
B
}
3) ???
In advance, thanks for your time.
Edit: 13 minutes later and I get all this awesome feedback. Thank you all for the ideas and insight.
Many database engines support enumerations as a data type. And there are, indeed, cases where an enumeration is the right design solution.
However...
There are two requirements which may decide that a foreign key to a separate table is better.
The first is: it may be necessary to increase the number of valid options in that column. In most cases, you want to do this without a software deployment; enumerations are "baked in", so in this case, a table into which you can write new data is much more efficient.
The second is: the application needs to reason about the values in this column, in ways that may go beyond "A" or "B". For instance, "A" may be greater/older/more expensive than "B", or there is some other attribute to A that you want to present to the end user, or A is short-hand for something.
In this case, it is much better to explicitly model this as columns in a table, instead of baking this knowledge into your queries.
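A small sketch of option 1 with both requirements in mind, using Python and SQLite. The table names AllowedValue and Item and the SortOrder column are assumptions for illustration; SortOrder stands in for any extra attribute the application might need to reason about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE AllowedValue (
    Id        INTEGER PRIMARY KEY,
    Code      TEXT NOT NULL UNIQUE,    -- 'A' or 'B'
    SortOrder INTEGER NOT NULL         -- assumed extra attribute, e.g. for ranking
);
CREATE TABLE Item (
    Id             INTEGER PRIMARY KEY,
    AllowedValueId INTEGER NOT NULL REFERENCES AllowedValue(Id)
);
INSERT INTO AllowedValue (Id, Code, SortOrder) VALUES (1, 'A', 10), (2, 'B', 20);
""")

conn.execute("INSERT INTO Item (AllowedValueId) VALUES (1)")

# A value that is not in the lookup table is rejected by the database itself.
try:
    conn.execute("INSERT INTO Item (AllowedValueId) VALUES (3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Adding a new valid option is a data change, not a software deployment.
conn.execute("INSERT INTO AllowedValue (Id, Code, SortOrder) VALUES (3, 'C', 30)")
conn.execute("INSERT INTO Item (AllowedValueId) VALUES (3)")

codes = [
    row[0]
    for row in conn.execute(
        "SELECT v.Code FROM Item i "
        "JOIN AllowedValue v ON v.Id = i.AllowedValueId "
        "ORDER BY v.SortOrder"
    )
]
print(rejected, codes)
```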
In 30 years of working with databases, I personally have never found a case where an enumeration was the right decision....
Create a secondary table with the meanings of these integer codes. There's nothing that compels you to JOIN that in, but if you need to that data is there. Within your C# code you can still use an enum to look things up but try to keep that in sync with what's in the database, or vice-versa. One of those should be authoritative.
In practice you'll often find that short strings are easier to work with than rigid enums. In the 1990s when computers were slow and disk space scarce you had to do things like this to get reasonable performance. Now it's not really an issue even on tables with hundreds of millions of rows.
Suppose I have one table that holds blogs.
The schema looks like:
ID (int)| Title (varchar 50) | Value (longtext) | Images (longtext)| ....
In the Images field I store an XML-serialized list of the images that are associated with the blog.
Should I use another table for this purpose?
Yes, you should put the images in another table. Having several values in the same field indicates denormalized data and makes it hard to work with the database.
As with all rules, there are exceptions where it makes sense to put XML with multiple values in one field in the database. The first rule is that:
The data should always be read and written together. There is no need to read or update just one of the values.
If that is fulfilled, there can be a number of reasons to put the data together in one field:
Storage efficiency, if space has proved to be a problem.
Retrieval efficiency, if performance has proved to be a problem.
Schema flexibility, where one XML field can eliminate tens or hundreds of different tables.
I would certainly use another table. If you use XML, what happens when you need to go through and update the references to all images? Would you rather just do an Update blog_images Set ..., or parse through the XML for each row, make the update, then regenerate the updated XML for each?
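For illustration, a minimal sketch of the separate-table layout in Python with SQLite. Only the blog_images name comes from the answer above; the blogs table, the url column and the sample URLs are assumptions. The point is that rewriting every image reference becomes a single UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blogs (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    value TEXT
);
CREATE TABLE blog_images (
    id      INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL REFERENCES blogs(id),
    url     TEXT NOT NULL              -- assumed: one row per image reference
);
INSERT INTO blogs (id, title) VALUES (1, 'First post'), (2, 'Second post');
INSERT INTO blog_images (blog_id, url) VALUES
    (1, 'http://old.example.com/a.png'),
    (1, 'http://old.example.com/b.png'),
    (2, 'http://old.example.com/c.png');
""")

# One statement updates every image reference; no XML parsing per row.
cur = conn.execute(
    "UPDATE blog_images "
    "SET url = REPLACE(url, 'http://old.example.com/', 'https://cdn.example.com/')"
)
updated = cur.rowcount

urls = [row[0] for row in conn.execute("SELECT url FROM blog_images ORDER BY id")]
print(updated, urls)
```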
Well, it is a bit "inner platform", but it will work. A separate table would allow better image querying, although on some RDBMS platforms this could also be achieved via an XML-type column and SQL/XML.
If this data only has to be opaque storage, then maybe. However, keep in mind you'll generally have to bring back the entire XML to the app-tier to do anything interesting with it (or: depending on platform, use SQL/XML, but I advise against this, as the DB isn't the place to do such processing in most cases).
My advice in all other cases: separate table.
That depends on whether you'd need to query on the actual image data itself. If you see a possible need to query on certain images, or images with certain attributes, then it would probably be best to store that image data in a different way.
Otherwise, leave it the way it is.
But remember, only include the fields in your SELECT when you need them.
Should I use another table for this purpose?
Not necessarily. You just have to ensure that you are not selecting the images field in your queries when you don't need it. But if you wanted to normalize your schema, you could use another table and perform a join when you need the images.
I'm writing an application that I will use to keep up with my monthly budget. This will be a C# .NET 4.0 Winforms application.
Essentially my budget will be a matrix of data if you look at it visually. The columns are the "dates" at which that budget item will be spent. For example, I have 1 column for every Friday in the month. The Y axis is the name of the budget item (Car payment, house payment, eating out, etc). There are also categories, which are used to group the budget item names that are similar. For example, a category called "Housing" would have budget items called Mortgage, Rent, Electricity, Home Insurance, etc.
I need a good way to store this data from a code design perspective. Basically I've thought of two approaches:
One, I can have a "BudgetItem" class that has a "Category", "Value", and "Date". I would have a linear list of these items and each time I wanted to find a value by either date or category, I iterate this list in some form or fashion to find the value. I could probably use LINQ for this.
Second, I could use a 2D array which is indexed first by column (date) and second by row. I'd have to maintain categories and budget item names in a separate list and join the data together when I do my lookups somehow.
What is the best way to store this data in code? I'm leaning more towards the first solution but I wanted to see what you guys think. Later on when I implement my data persistence, I want to be able to persist this data to SQL server OR to an XML file (one file per monthly budget).
While your first approach looks nicer, the second could obviously be faster (depending on how you implement it). However, for a desktop application that is not performance critical, your first idea is definitely better, especially because it will help you a lot when maintaining your code. Also remember that the Entity Framework could be really nice in this situation.
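The first approach is easy to sketch. The example below uses Python rather than C# purely for brevity; in C# the filtering would be a LINQ Where/Sum over a List of BudgetItem. The field names, the total_for helper and the sample figures are assumptions, not part of the question.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class BudgetItem:
    name: str          # e.g. "Mortgage"
    category: str      # e.g. "Housing"
    value: Decimal     # planned amount
    spend_date: date   # the column (Friday) the item falls under


items = [
    BudgetItem("Mortgage", "Housing", Decimal("1200.00"), date(2024, 3, 1)),
    BudgetItem("Electricity", "Housing", Decimal("90.50"), date(2024, 3, 8)),
    BudgetItem("Eating out", "Food", Decimal("60.00"), date(2024, 3, 8)),
]


def total_for(budget_items, *, category=None, on=None):
    """Sum the values of items matching the given category and/or date."""
    return sum(
        (
            item.value
            for item in budget_items
            if (category is None or item.category == category)
            and (on is None or item.spend_date == on)
        ),
        Decimal("0"),
    )


housing_total = total_for(items, category="Housing")
friday_total = total_for(items, on=date(2024, 3, 8))
print(housing_total, friday_total)
```

A flat list like this also maps directly onto one SQL table or one XML element per item, which keeps the later persistence step simple.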
Finally, if you know how to work with XML, I think it is the better fit for this type of project. A database is only really needed when you have a fair number of tables; as you explained, you will only have two (budgetitem and category), so I don't think you need a database for such a simple thing.
I'm creating a data-entry application where users are allowed to create the entry schema.
My first version of this just created a single table per entry schema with each entry spanning a single or multiple columns (for complex types) with the appropriate data type. This allowed for "fast" querying (on small datasets as I didn't index all columns) and simple synchronization where the data-entry was distributed on several databases.
I'm not quite happy with this solution though; the only positive thing is the simplicity...
I can only store a fixed number of columns. I need to create indexes on all columns. I need to recreate the table on schema changes.
Some of my key design criteria are:
Very fast querying (Using a simple domain specific query language)
Writes don't have to be fast
Many concurrent users
Schemas will change often
Schemas might contain many thousand columns
The data entries might be distributed and need synchronization.
Preferably MySQL and SQLite - databases like DB2 and Oracle are out of the question.
Using .Net/Mono
I've been thinking of a couple of possible designs, but none of them seems like a good choice.
Solution 1: Union like table containing a Type column and one nullable column per type.
This avoids joins, but will definitely use a lot of space.
Solution 2: Key/value store. All values are stored as string and converted when needed.
This also uses a lot of space, and of course, I hate having to convert everything to string.
Solution 3: Use an xml database or store values as xml.
Without any experience I would think this is quite slow (at least for the relational model unless there is some very good xpath support).
I would also like to avoid an xml database, as other parts of the application fit a relational model better, and being able to join the data is helpful.
I cannot help to think that someone has solved (some of) this already, but I'm unable to find anything. Not quite sure what to search for either...
I know market research is doing something like this for their questionnaires, but there are few open source implementations, and the ones I've found don't quite fit the bill.
PSPP has much of the logic I'm thinking of; primitive column types, many columns, many rows, fast querying and merging. Too bad it doesn't work against a database.. And of course... I don't need 99% of the provided functionality, but a lot of stuff not included.
I'm not sure this is the right place to ask such a design-related question, but I hope someone here has some tips, knows of any existing work, or can point me to a better place to ask such a question.
Thanks in advance!
Have you already considered the most trivial solution: having one table for each of your datatypes and storing the schema of your dataset in the database as well? The simplest version:
DATASET Table (Virtual "table")
ID - primary key
Name - Name for the dataset/table
COLUMNSCHEMA Table (specifies the columns for one "dataset")
DATASETID - int (reference to Dataset-table)
COLID - smallint (unique # of the column)
Name - varchar
DataType - ("varchar", "int", whatever)
Row Table
DATASETID
ID - Unique id for the "row"
ColumnData Table (one for each datatype)
ROWID - int (reference to Row-table)
COLID - smallint
DATA - (varchar/int/whatever)
To query a dataset (a virtual table), you must then dynamically construct a SQL statement using the schema information in COLUMNSCHEMA table.
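As a rough sketch of that last step, the Python/SQLite example below builds the SELECT for one dataset from the rows in COLUMNSCHEMA. Several names are assumptions: the Row table is called DATAROW and its reference column ROW_ID (to stay clear of SQLite's built-in rowid), the per-type tables are COLUMNDATA_VARCHAR and COLUMNDATA_INT, and build_select is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DATASET (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE COLUMNSCHEMA (
    DATASETID INTEGER NOT NULL REFERENCES DATASET(ID),
    COLID     INTEGER NOT NULL,
    Name      TEXT NOT NULL,
    DataType  TEXT NOT NULL,
    PRIMARY KEY (DATASETID, COLID)
);
CREATE TABLE DATAROW (ID INTEGER PRIMARY KEY, DATASETID INTEGER NOT NULL);
-- One ColumnData table per datatype.
CREATE TABLE COLUMNDATA_VARCHAR (ROW_ID INTEGER, COLID INTEGER, DATA TEXT);
CREATE TABLE COLUMNDATA_INT     (ROW_ID INTEGER, COLID INTEGER, DATA INTEGER);

INSERT INTO DATASET VALUES (1, 'survey');
INSERT INTO COLUMNSCHEMA VALUES (1, 1, 'firstname', 'varchar'), (1, 2, 'age', 'int');
INSERT INTO DATAROW (ID, DATASETID) VALUES (1, 1), (2, 1);
INSERT INTO COLUMNDATA_VARCHAR VALUES (1, 1, 'Ann'), (2, 1, 'Bob');
INSERT INTO COLUMNDATA_INT VALUES (1, 2, 31);
""")

# Whitelist mapping from declared datatype to its storage table.
TYPE_TABLES = {"varchar": "COLUMNDATA_VARCHAR", "int": "COLUMNDATA_INT"}


def build_select(connection, dataset_id):
    """Build a SELECT that presents the virtual table as ordinary columns."""
    schema = connection.execute(
        "SELECT COLID, Name, DataType FROM COLUMNSCHEMA "
        "WHERE DATASETID = ? ORDER BY COLID",
        (dataset_id,),
    ).fetchall()
    select_parts = ["r.ID AS row_id"]
    join_parts = []
    for colid, name, datatype in schema:
        table = TYPE_TABLES[datatype]                    # KeyError on unknown types
        alias = f"c{int(colid)}"
        quoted = '"' + name.replace('"', '""') + '"'     # quote user-defined names
        select_parts.append(f"{alias}.DATA AS {quoted}")
        join_parts.append(
            f"LEFT JOIN {table} {alias} "
            f"ON {alias}.ROW_ID = r.ID AND {alias}.COLID = {int(colid)}"
        )
    return (
        f"SELECT {', '.join(select_parts)} FROM DATAROW r "
        + " ".join(join_parts)
        + " WHERE r.DATASETID = ? ORDER BY r.ID"
    )


sql = build_select(conn, 1)
cur = conn.execute(sql, (1,))
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```

This shape uses one join per column, so a dataset with very many columns would likely need a different query strategy, such as fetching the ColumnData rows and assembling them in application code.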
I have an application in which I need to query life tables (for insurance calculations).
I was thinking about using XML to store the data, but thought it was a little big, but maybe a little small for using a full-fledged database. So I chose to use SQLite.
In my application, I have enums defining a few different things. For example, GENDER.Male and GENDER.Female, or JOBTYPE.BlueCollar and JOBTYPE.WhiteCollar, etc.
I have some methods that look like this: (example)
FindLifeExpectancy(int age, GENDER gender);
FindDeathRate(int age, JOBTYPE jobType);
So my question is: How do you model enums in a database? I don't think it is best practice to use 0 or 1 in the database to store JOBTYPE because that would be meaningless to anyone looking at it. But if you used nvarchar, to store "BlueCollar", there would be a lot of duplicate data.
I don't think GENDER or JOBTYPE should have an entire class, or be a part of the entity model, because of how little information they provide.
How is this normally done?
Thanks.
I prefer to statically map my enums in my program to a lookup table in my database. I rarely actually use the lookup table to do a join. As an example I might have the following tables:
Gender
GenderID  Name
1         Male
2         Female
Accounts
AccountID  GenderID  FirstName  LastName
1          1         Andrew     Siemer
2          2         Jessica    Siemer
And in code I would then have my enum defined with the appropriate mapping
public enum Gender
{
Male = 1,
Female = 2
}
Then I can use my enum in code and when I need to use the enum in a LINQ to SQL query I just get its physical value like this
int genderValue = (int)Gender.Male;
This method may make some folks out there a bit queasy though, given that you have just coupled your code to values in your database! But this method makes working with your code and the data that backs that code much easier. Generally, if someone swaps out the ID of a lookup table, you are going to be hosed in some way or another anyhow, given that it is mapped across your database! I prefer the readability and ubiquitous nature of this design though.
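Since this approach couples code to database values, a startup check that the two agree can soften the risk. The sketch below is in Python with SQLite for illustration; the enum_matches_table helper and the convention of comparing upper-cased names are assumptions, while the Gender and Accounts tables mirror the example above.

```python
import sqlite3
from enum import IntEnum


class Gender(IntEnum):
    MALE = 1
    FEMALE = 2


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gender (
    GenderID INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL UNIQUE
);
CREATE TABLE Accounts (
    AccountID INTEGER PRIMARY KEY,
    GenderID  INTEGER NOT NULL REFERENCES Gender(GenderID),
    FirstName TEXT,
    LastName  TEXT
);
INSERT INTO Gender VALUES (1, 'Male'), (2, 'Female');
INSERT INTO Accounts VALUES (1, 1, 'Andrew', 'Siemer'), (2, 2, 'Jessica', 'Siemer');
""")


def enum_matches_table(connection):
    """Hypothetical check: does the enum agree with the lookup table?"""
    in_db = {
        row[0]: row[1].upper()
        for row in connection.execute("SELECT GenderID, Name FROM Gender")
    }
    in_code = {int(member): member.name for member in Gender}
    return in_db == in_code


in_sync = enum_matches_table(conn)

# Queries use the enum's integer value; no join against the lookup table needed.
names = [
    row[0]
    for row in conn.execute(
        "SELECT FirstName FROM Accounts WHERE GenderID = ?",
        (int(Gender.FEMALE),),
    )
]
print(in_sync, names)
```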
While it's unlikely that you will be adding a new gender, I wouldn't be so sure about the jobtype enum. I'd use a separate table for both, with foreign keys to it everywhere I need to reference them. The schema will be extensible, and the database will automatically check that only valid values are saved in the referencing tables.
The SQL equivalent of 'enums' are lookup tables. These are tables with two (sometimes more) columns:
a code, typically short, numeric or character (ex: 'R', 'S', 'M'...)
a text definition (ex: 'Retired', 'Student', 'Military'...)
extra columns can be used to store definitions or alternate versions of the text (for example, a short abbreviation for columnar reports)
The short code is the type of value stored in the database, avoiding the replication you mentioned. For relatively established categories (say Male/Female), you may just use a code, without 'documenting' it in a lookup table.
If you have very many different codes, it may be preferable to keep their lookups in a single SQL table, rather than having a proliferation of dozens of tables. You can simply add a column that is the "category", which itself is a code, designating the nature of the group of codes defined in this category ("marital status", "employment", "education"...)
The info from the lookup tables can be used to populate drop-downs and such in the UI, whereby the end user sees the clear text but the application can use the code to query the database. It is also used in the reverse direction, to produce the clear text for codes found in the database, for displaying result lists and such.
A JOIN construct at the level of SQL is a convenient way to relate the lookup table and the main table. For example:
SELECT Name, Dob, M.MaritalStatus
FROM tblCustomers C
LEFT OUTER JOIN tblMaritalLkup M ON C.MStatus = M.Code
WHERE ...
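Putting the single shared lookup table and the JOIN together, here is a small runnable sketch in Python with SQLite. The tblLookup table, its Category/Code/Description columns and the sample rows are assumptions for illustration; tblCustomers and MStatus follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One lookup table for all code sets, distinguished by Category.
CREATE TABLE tblLookup (
    Category    TEXT NOT NULL,
    Code        TEXT NOT NULL,
    Description TEXT NOT NULL,
    PRIMARY KEY (Category, Code)
);
CREATE TABLE tblCustomers (
    Name    TEXT NOT NULL,
    Dob     TEXT,
    MStatus TEXT                      -- short code, e.g. 'S' or 'M'
);
INSERT INTO tblLookup VALUES
    ('marital',    'S', 'Single'),
    ('marital',    'M', 'Married'),
    ('employment', 'S', 'Student'),
    ('employment', 'R', 'Retired');
INSERT INTO tblCustomers VALUES
    ('Ann', '1980-05-01', 'M'),
    ('Bob', '1992-11-23', 'S'),
    ('Cy',  '1975-02-14', NULL);
""")

# The category filter sits in the ON clause so customers without a code
# still appear, which is the point of the LEFT OUTER JOIN.
rows = conn.execute("""
SELECT C.Name, C.Dob, L.Description AS MaritalStatus
FROM tblCustomers C
LEFT OUTER JOIN tblLookup L
       ON L.Category = 'marital' AND L.Code = C.MStatus
ORDER BY C.Name
""").fetchall()

# The same table can feed a drop-down: code for the app, text for the user.
employment_options = conn.execute(
    "SELECT Code, Description FROM tblLookup "
    "WHERE Category = 'employment' ORDER BY Description"
).fetchall()
print(rows)
print(employment_options)
```

Note that the code 'S' means different things in the two categories, which is why the composite key and the category filter matter in a shared table.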