Suppose i have one table that holds Blogs.
The schema looks like :
ID (int)| Title (varchar 50) | Value (longtext) | Images (longtext)| ....
In the field Images i store an XML Serialized List of images that are associated with the blog.
Should i use another table for this purpose?
Yes, you should put the images in another table. Having several values in the same field indicates denormalized data and makes it hard to work with the database.
As with all rules, there are exceptions where it makes sense to put XML with multiple values in one field in the database. The first rule is that:
The data should always read/written together. No need to read or update just one of the values.
If that is fulfilled, there can be a number of reasons to put the data together in one field:
Storage efficiency, if space has proved to be a problem.
Retrieval efficiency, if performance has proved to be a problem.
Schema flexilibity; where one XML field can eliminate tens or hundreds of different tables.
I would certainly use another table. If you use XML, what happens when you need to go through and update the references to all images? (Would you just rather do an Update blog_images Set ..., or parse through the XML for each row, make the update, then re-generate the updated XML for each?
Well, it is a bit "inner platform", but it will work. A separate table would allow better image querying, although on some RDBMS platforms this could also be achieved via an XML-type column and SQL/XML.
If this data only has to be opaque storage, then maybe. However, keep in mind you'll generally have to bring back the entire XML to the app-tier to do anything interesting with it (or: depending on platform, use SQL/XML, but I advise against this, as the DB isn't the place to do such processing in most cases).
My advice in all other cases: separate table.
That depends on whether you'd need to query on the actual image data itself. If you see a possible need to query on certain images, or images with certain attributes, then it would probably be best to store that image data in a different way.
Otherwise, leave it the way it is.
But remember, only include the fields in your SELECT when you need them.
Should i use another table for this purpose?
Not necessarily. You just have to ensure that you are not selecting the images field in your queries when you don't need it. But if you wanted to denormalize your schema you could use another table and when you need the images perform a join.
Related
So for example I have Collection of Documents like this:
{
hotField1 : 0,
hotField2 : "",
coldField1 : 0,
...
coldFieldN : ""
}
In this scope cold properties are written once, and accessed sometimes, hot properties are written and then fairly often accessed\updated (but in different use cases, it is not same sub-document or parts of same object).
Amount of documents is fairly huge (1M and more), size of hot data is at least ten times less than cold.
Since partial update is still most wanted yet not implemented feature, only way to update hotField1 is:
Request full document
Change either hotField1 or hotField2
Write back whole document
This is costly in terms of RUs, and doesn't scale so well.
So the question is how to organize such data&calls in DocumentDB to minimize costs?
Discovered alternatives:
Obviously best: retrieve one property; change; update - not yet.
Separate on two Collections, use stored procedures to retrieve from Main Collection then from Dictionary?
Put hotFields1-2 as subdocument ({ sub: {hf1:0, hf2:""}}) and somehow only update it? (I'm not sure if it is possible)
PS. C# in tags for client library we use. If it lacks smth its ok to use REST interface instead.
While there's no exact "best" answer:
Your #2 choice will not work with stored procedures, since stored procedures are scoped to a collection.
Updating a subdocument (#3 choice) is no different than updating top-level properties - you are still retrieving, and re-writing, a document (a subdocument is just another property on the document).
While it may or may not reduce RU (you'd need to benchmark, as Larry pointed out in comments), you may choose to store your hot properties in a separate (smaller) document (or multiple smaller documents). With less properties, there would be less bandwidth consumed during updates, and less index updating. However, since you'd now be retrieving more than one document (possibly across multiple calls), you may find that this activity negates any RU savings from storing in a single document.
Note: There's nothing stopping you from storing these separate documents in the same collection (which then lets you approach the problem with a stored procedure, as you suggested in your #2 choice). You'll just need to create some type of property to help you identify different document types.
NoSQL based on Documents replace the document once you change one or all properties.
In terms of cost, it is based on per collection basis.
So, if you have a DB with two collections in it and each with a performance tier of S1 i.e., $25/month.
$25 x 2 = $50
Case you need a better performance, and change one to S2 you'll be charged:
$50 + $25 = $75
I have to create a database structure. I have a question about foreing keys and good practice:
I have a table which must have a field that can be two different string values, either "A" or "B".
It cannot be anything else (therefore, i cannot use a string type field).
What is the best way to design this table:
1) create an int field which is a foreign key to another table with just two records, one for the string "A" and one for the string "B"
2) create an int field then, in my application, create an enumeration such as this
public enum StringAllowedValues
{
A = 1,
B
}
3) ???
In advance, thanks for your time.
Edit: 13 minutes later and I get all this awesome feedback. Thank you all for the ideas and insight.
Many database engines support enumerations as a data type. And there are, indeed, cases where an enumeration is the right design solution.
However...
There are two requirements which may decide that a foreign key to a separate table is better.
The first is: it may be necessary to increase the number of valid options in that column. In most cases, you want to do this without a software deployment; enumerations are "baked in", so in this case, a table into which you can write new data is much more efficient.
The second is: the application needs to reason about the values in this column, in ways that may go beyond "A" or "B". For instance, "A" may be greater/older/more expensive than "B", or there is some other attribute to A that you want to present to the end user, or A is short-hand for something.
In this case, it is much better to explicitly model this as columns in a table, instead of baking this knowledge into your queries.
In 30 years of working with databases, I personally have never found a case where an enumeration was the right decision....
Create a secondary table with the meanings of these integer codes. There's nothing that compels you to JOIN that in, but if you need to that data is there. Within your C# code you can still use an enum to look things up but try to keep that in sync with what's in the database, or vice-versa. One of those should be authoritative.
In practice you'll often find that short strings are easier to work with than rigid enums. In the 1990s when computers were slow and disk space scarce you had to do things like this to get reasonable performance. Now it's not really an issue even on tables with hundreds of millions of rows.
I have a table that has Constant Value...Is it better that I have this table in my Database(that is SQL)or have an Enum in my code and delete my table?
my table has only 2 Columns and maximum 20 rows that these rows are fixed and get filled once,first time that i run application.
I would suggest to create an Enum for your case. Since the values are fixed(and I am assuming that the table is not going to change very often) you can use Enum. Creating a table in database will require an unnecessary hit to the database and will require a database connection which could be skipped if you are using Enum.
Also a lot may depend on how much operation you are going to do with your values. For example: its tedious to query your Enum values to get distinct values from your table. Whereas if you will use table approach then it would be a simple select distinct. So you may have to look into your need and the operations which you will perform on these values.
As far as the performance is concerned you can look at: Enum Fields VS Varchar VS Int + Joined table: What is Faster?
As you can see, ENUM and VARCHAR results are almost the same, but join
query performance is 30% lower. Also note the times themselves –
traversing about same amount of rows full table scan performs about 25
times better than accessing rows via index (for the case when data
fits in memory!)
So, if you have an application and you need to have some table field
with a small set of possible values, I’d still suggest you to use
ENUM, but now we can see that performance hit may not be as large as
you expect. Though again a lot depends on your data and queries.
That depends on your needs.
You may want to translate the Enum Values (if you are showing it in GUI) and order a set of record based on translated values. For example: imagine you have a Employees table and a Position column. If the record set is big, and you want to sort or order by translated position column, then you have to keep the enum values + translations in database.
Otherwise KISS and have it in code. You will spare time on asking database for values.
I depends on character of that constants.
If they are some low level system constants that never should be change (like pi=3.1415) then it is better to keep them only in code part in some config file. And also if performance is critical parameter and you use them very often (on almost each request) it is better to keep them in code part.
If they are some constants (may be business constants) that can change in future it is Ok to put them in table - then you have more flexibility to change them (for instance from admin panel).
It really depends on what you actually need.
With Enum
It is faster to access
Bound to that certain application. (although you can share by making it as reference, but it just does not look as good as using DB)
You can use in switch statement
Enum usually does not care about value and it is limited to int.
With DB
It is slower, because you have to make connection and query.
The data can be shared widely.
You can set the value to be anything (any type any value).
So, if you will use it only on certain application, Enum is good enough. But if several applications are going to use it, then DB would be better option.
What is the best way to retrieve a "X" number of random records using Entity Framework (EF5 if it's relevant). The value of "X" will be set based on where this will be used.
Is there a method for doing this built into EF or is best to pull down a result set and then use a C# random number function to pull the records. Or is there a method that I'm not thinking of?
On the off chance that it's relevant I have a table that stores images that I use for different usages (there is a FK to an image type table). The images that I use in my carousel on the homepage is what I'm wanting to add some variety to...consequently how "random" it is doesn't matter to me much. I'm just trying to get away from the same six or so pictures always being displayed. (Also, I'm not really interested in debating/discussing storing images in a table vs local storage.)
The solution needs to be one using EF via a LINQ statement. If this isn't directly possibly I may end up doing something SIMILAR to what #cmd has recommended in the comments. This would most likely be a matter of retrieving a record count...testing the PK to make sure the resulting object wasn't null and building a LIST of the X number of object's PKs to pass to front end. The carousel lazy loads the images so I don't actually need the image when I'm building the list that will be used by the carousel.
Can you just add an ORDER BY RAND() clause to your query?
See this related question: MySQL: Alternatives to ORDER BY RAND()
I'm going to be creating competitions on the current site I'm working on. Each competition is not going to be the same and may have a varying number of input fields that a user must enter to be part of the competition eg.
Competition 1 might just require a firstname
Competition 2 might require a firstname, lastname and email address.
I will also be building a tool to observe these entries so that I can look at each individual entry.
My question is what is the best way to store an arbitrary number of fields? I was thinking of two options, one being to write each entry to a CSV file containing all the entries of the competition, the other being to have a db table with a varchar field in the database that just stores an entire entry as text. Both of these methods seem messy, is there any common practice for this sort of task?
I could in theory create a db table with a column for every possible field, but it won't work when the competition has specific requirements such as "Tell us in 100 words why..." or "Enter your 5 favourite things that.."
ANSWERED:
I have decided to use the method described below where there are multiple generic columns that can be utilized for different purposes per competition.
Initially I was going to use EAV, and I still think it might be slightly more appropriate for this specific scenario. But it is generally recommended against because of it's poor scalability and complicated querying, and I wouldn't want to get into a habit of using it. Both answers worked absolutely fine in my tests.
I think you are right to be cautious about EAV as it will make your code a bit more complex, and it will be a bit more difficult to do ad-hoc queries against the table.
I've seen many enterprise apps simply adopt something like the following schema -
t_Comp_Data
-----------
CompId
Name
Surname
Email
Field1
Field2
Field3
...
Fieldn
In this instance, the generic fields (Field1 etc) mean different things for the different competitions. For ease of querying, you might create a different view for each competition, with the proper field names aliased in.
I'm usually hesitant to use it, but this looks like a good situation for the Entity-attribute-value model if you use a database.
Basically, you have a CompetitionEntry (entity) table with the standard fields which make up every entry (Competition_id, maybe dates, etc), and then a CompetitionEntryAttribute table with CompetitionEntry_id, Attribute and Value.You probably also want another table with template attributes for each competition for creating new entries.
Unfortunately you will only be able to store one datatype, which will likely have to be a large nvarchar.
Another disadvantage is the difficulty to query against EAV databases.
Another option is to create one table per competition (possibly in code as part of the competition creation), but depending on the number of competitions this may be impractcal.