Database Table Schema and Aggregate Roots

Database Table Schema and Aggregate Roots - c#

Applicaiton is single user, 1-tier(1 pc), database SqlCE. DataService layer will be (I think) : Repository returning domain objects and quering database with LinqToSql (dbml). There are obviously a lot more columns, this is simplified view.
LogTime in separate table: http://i53.tinypic.com/9h8cb4.png
LogTime in ItemTimeLog table (as Time): http://i51.tinypic.com/4dvv4.png
alt text http://i53.tinypic.com/9h8cb4.png
This is my first attempt of creating a >2 tables database. I think the table schema makes sense, but I need some reassurance or critics. Because the table relations looks quite scary to be honest. I'm hoping you could;
Look at the table schema and respond if there are clear signs of troubles or errors that you spot right away.. And if you have time,
Look at Program Summary/Questions, and see if the table layout makes makes sense to those points.
Please be brutal, I will try to defend :)
Program summary:
a) A set of categories, each having a set of strategies (1:m)
b) Each day a number of items will be produced. And each strategy MAY reference it.
(So there can be 50 items, and a strategy may reference 23 of them)
c) An item can be referenced by more than one strategy. So I think it's an m:m relation.
d) Status values will be logged at fixed time-fractions through the day, for:
- .... each Strategy.....each StrategyItem....each item
e) An action on an item may be executed by a strategy that reference it.
- This is logged as ItemAction (Could have called it StrategyItemAction)
User Requsts
b) -> e) described the main activity mode of the program. To work with only today's DayLog , for each category. 2nd priority activity is retrieval of history, which typically will be From all categories, from day x to day y; Get all StrategyDailyLog.
Questions
First, does the overall layout look sound? I'm worried to see that there are so many relationships in all directions, connecting everything. Is this normal, or does it look like trouble?
StrategyItem is made to represent an m:m relationship. Is it correct as I noted 1:m / 1:1 (marked red) ?
StrategyItemTimeLog and ItemTimeLog; Logs values that both need to be retrieved together, when retreiving a StrategyItem. Reason I separated is that the first one is strategy-specific, and several strategies can reference same item. So I thought not to duplicate those values that are not dependent no strategy, but only on the item. Hence I also dragged out the LogTime, as it seems to be the only parameter to unite the logs. But this all looks quite disturbing with those 3 tables. Does it make sense at all? Or you have suggestion?
Pink circles shows my vague attempt of Aggregate Root Paths. I've been thinking in terms of "what entity is responsible for delete". Though I'm unsure about the actual root. I think it's Category. Does it make sense related to User Requests described above?
EDIT1:
(Updated schema, showing typical number of hierarchy items for the first few relations, for 365 days, and additional explanations)
1:1 relation: Sorry. I made a mistake. The StrategyDailyLog should be 1:m. See updated schema. It is one per Strategy, per day.
DayLog / StrategyDailyLog: I’ve been pondering over wether DayLog shall be a part of the hierarchy like this or not. The purpose of the DayLog table is to hold “sum values” derived from all the StrategyDailyLog tables for the same day. Like performance values for this day. It also holds the date value. Which allows me to omit a date value in the StrategyDailyLog (Which I feel would kind of be a duplicate modeling of the date-field), but instead the reference to DayLog exist to “find” the date. I’m not sure if this is an abuse/misconception of normalization.
Null value: I haden’t thought about this. I believe I found 2, as now marked in StrategyDailyLog and ItemAction. They can not be null on creation, but they can be set to null if one need to delete either a Strategy, or a StrategyItem. That should not require a delete of the StrategyDailyLog and the ItemAction. Hence they can be set to null.
All Id –columns: My idea was to have ID (autogenerated Integer) as PK for all my tables. I believed that also would be sufficient as candidate key. Is this not a proper way to make PKs? It’s the only way any table of mine can be identified. I asked a question before if that was ok, maybe I misunderstood, but thought that was a good approach.
m:m relation: This is what I have attempted to do: StrategyItem is the m:m table of StrategyDailyLog / DailyItem.

Ok. Here is me being brutal. I do not understand the model.
So instead of trying to comment on that so much, here are some thoughts that came to my mind when I looked at it.
I think you should have look at your 1:1 relationships (all of them). Why is DayLog and StrategyDailyLog separated in two tables? Probably because you will always have at least one DayLog item but not all DayLog items have a StrategyDailyLog item. If that is the case you can have a StrategyID FK in DayLog table with allow nulls option.
It would help to understand the model if you could show which fields are required and which fields accept null as a value.
All your tables have its own id column. That can be quite confusing when doing 1:1 relations and m:m relations. For a 1:1 relation, usually the relation between the two tables is made on the primary key in both tables. If you do not do that you have to create a candidate key on the foreign key column. In your case that means that StrategyDailyLog should have a candidate key on DayLogID.
A m:m relation between two tables is usually solved by adding a new table in between, with the primary keys from both tables. Those fields together is the primary key for the table in the middle.
Lets say for example that you should have a m:m relationship between Category and Strategy. You should then create a table called CategoryStrategy with two fields CategoryID and StrategyID that together is the primary key for table CategoryStrategy.
I hope my comments makes sense and that they are useful to you.
EDIT 2011-01-17
I do not think that you should have as a principle to use a IDENTITY column as primary key in all tables. A m:m relation does not need it so you should not do it. I also think that you have misunderstood what I meant with a candidate key. A candidate key is a key that could have been used as the primary key. In MS SQL Server you define a UNIQUE CONSTRAINT for your candidate key.
Ex: Table StrategyItem have id as PK but the combination of StrategyID and DailyItemID is the candidate key. Better would be to remove id and use StrategyID+DailyItemID as PK.
Below is the schema that I would have built with your description. I might have missed something important because I do not know everything about what you want to do.
You should not think so much about query performance and building aggregates when designing the schema. That can be handled by creating indexes on columns and using sum, count and group by in your queries. An index on column Created in the model below would be necessary for your queries on a date or date interval. In MS SQL Server there is something called the clustered index. Default the PK of a table is the clustered index but in this case I would make the index on Created column the clustered index.
A Category has 0,1 or more Strategy.
LogItem have on Category and optionally one Strategy
LogItem.Created holds date and time.

Related

User should be able to reset autoincrement (identity) column: possible solutions

In the app there should be a functionality for the user to reset orderNumber whenever needed. We are using SQL Server for db, .NET Core, Entity Framework, etc. I was wondering what is the most elegant way to achieve this?
Thought about making orderNumber int, identity(1,1), and I've searched for DBCC CHECKIDENT('tableName', RESEED, 0), but the latter introduces some permissions concerns (the user has to own the schema, be sysadmin, etc.).
EDIT: orderNumber is NOT a primary key, and duplicate values are not the problem. We should just let the user (once a year probably) reset the numbering of their orders to start from 1 again..
Any advice?

An identity column is used to auto-generate incremental values, so if you're relying on this column as the primary key or some unique identifer for rows, updating this can cause issues with duplicates.
It's difficult to recommend the best solution without knowing more about your use case, but I would consider (1) if this orderNumber should be the PK or would some surrogate key like (customerId, locationId, date) makes sense and allows you to more freely update orderNumber without impacts on data integrity, or (2) if keeping orderNumber as an identity make sense, but you could build a data model or table that maps multiple rows in this table to the same "order" allowing you to maintain the key on this base table.

It seems that orderNumber is a business layer concern - therefore I recommend a non-SQL solution. You need C# code that generates the number for storage in your "Order" entity. I wouldn't use IDENTITY() to implement/solve this.
The customer isn't going to reset anything in the DB, your code will do this. You need a "take a number" service in your business layer and a place in the UI to reset it (presumable Per Customer).
Sql Server has Sequence. My only concern regarding using it is partitioning per customer (an assumed requirement). Will you have multiple customers? If so, you probably can't have a single number generator. Hence why I suggest a C# implementation (sure, you'll want to save the state as numbers are handed out).

Identity should not be used in the way you're suggesting. Presumably you don't want a customer to get two different orders with the same order number (i.e., order number is unique within customer). If you don't care if customers get discontinuous order numbers, then you can use a sequence, but if you want continuous order numbers, then you would need to create an separate sequence for each customer, which is not a good solution either. I suggest you set the order number to max([order number]) over(partition by [customer id]) + 1 on the insert. That will automatically give you the next order number for a particular customer.

Data modeling for Same tables with same columns

I have many tables that have same number of columns and names because they are all lookup tables.
For example, there are LabelType and TaskType tables. LabelType and TaskType tables have TypeID and TypeName columns. They will be used as a foreign key in other tables such as LabelType table with shippingLog table and TaskType table with EmployeeTask Table.
LabelType Table
TypeID TypeName
1 Fedex
2 UPS
3 USPS
TaskType Table
TypeID TypeName
1 Receiving
2 Pickup
3 Shipping
So far, I have more than 20 tables and I am expecting it is going to be keep increasing.
I have no problem with it , but I am just wondering whether there is any better or smarter way of using tables or not. I was even thinking to consolidate all those tables as one lookup Type Table and differentiate them by adding a foreign key from lookup table. The lookup table may have data like Label, Task, and etc. Then I just need one or two tables for all those lookup data.
Please, advise me if you have any better or smarter way of data modeling.

Just because data has similar structure doesn't mean it has the same meaning or same constraints. Keep your lookup tables separate. This keeps foreign keys separate, so the database can protect itself from referencing the wrong kind of lookup data.1
I wish relational DBMSes supported inheritance, where you could define the basic structure in the parent table and just add specific FKs in the child tables. As it stands now, you'll need to endure some repetition in your DDL...
NOTE: One exception from "keep lookup tables separate" rule might be when your system needs to be dynamic (i.e. be able to add new kinds of lookup data without actually creating new physical tables in the database), but it doesn't look that way from your question.
1 With one big lookup table, FKs alone won't stop (for example) the ShippingLog table from referencing a row meant for the EmployeeTask table. By using identifying relationships and migrating PKs, you can protect yourself from this, but not without introducing some redundancies and needing some careful constraining. It's cleaner and probably more performant to simply do the right thing and keep lookup tables separate.

Keep your lookup tables separate. It's faster at lookup time, and you will do millions of lookups between times when you add a new lookup table.
A lot of tables is not a big problem.

Insert many rows to one table OR insert rows separately to many table?

I have two DataBase Table (SQL CE). A Teacher table and a A Class table. The two tables have One-to-Many relationship where one teacher has many classes (i.e. Class has a foreign key teacher_id). Number of teachers (rows) is inserted (or generated) through C# code in run time, so as classes
Which of the following is faster in INSERT and SELECT?
Each time a new teacher is INSERTed, a new Class Table is created (e.g. Class_teacher001) to store whichever classes the teacher has. In this case, each Class Table doesn't have to be so large and foreign key is not needed because table name would identify itself. But there will be one Teacher table and many Class_xxx Tables
Only one Teacher table and one Class table. Each class row has a foreign key pointing at the Teacher table. Only one Class table, but it will get very long. I worry searching and reading wil be slow

Regardless of which is faster, (2) is the way to go....simply create indexes to support your searches. This is how almost all relational databases are used.
The nightmare of maintaining option (1) makes me shudder

OK, where to start. First, the relationship between Teacher and Class is potentially many-to-many, but as described by you is at least one-to-many.
The first option is absolutely the wrong way to go. Never dynamically create tables. The second option is how this sort of thing is handled. Databases are powerful, written by very smart people (usually), and can handle many more rows than all the students at a given school.
As long as you properly index your tables, they can easily support hundreds of millions of records.

I also agree with Mitch Wheat. Because when you create an index your table physically sort according to our Teacher Be creating Combined Index of (Teacher_Id ,Class_Id).
Though its will Help to get fast retrieval Of Select Statment.

Unless you are already having performance problems, I would not worry about them. There are many things that can cause performance problems other than the number of rows, and they should be dealt with differently depending on what they are. You have to worry more about the number of columns in a table affecting performance than you do about the number of rows. Also the number of concurrent connections to the database. One million rows in a table is not that many, it is the other two items in conjunction with that many rows that will make a database slow. You should use the second option.

Setting IsPrimaryKey=true on column in table with no primary key

I'm writing a quick app using LINQ to SQL to populate a db with some test data and had a problem because one of the tables had no primary key as described by this bloke Can't Update because table has no primary key.
Taking the top answer I added the IsPrimaryKey attribute to an appropriate column and the app worked even though the I haven't changed the db table itself (i.e. there is still no primary key).
I expect it will be ok for my current intentions but are there any side effects which may come from having a table without a primary key seen as having one by the LINQ object?
(I can only think it might be a problem if I tried to read from a table (or populate to a table) with data where the 'primary key' column has the same value in more than one row).

When using an ORM framework, you can simulate keys and foreign keys at ORM level, thus "hiding and overriding" the database defined ones.
That said, that's a practice that I wouldn't recommend. Even if the model is more important than the database itself, the logical structure should always match. It is ok doing what you did if you're forced to work with a legacy database and you don't have the possibility to fix it (like adding the PK on the table). But try to walk the righteous path everytime you can :)
Tables without a PK = Pure Evil.

Basically if all the table updates go through the LINQ object you should be fine. If you have a DBA that decides to modify data directly though SQL then you can quickly run into issues if he duplicates a row with the same PK value.

How to Update the primary key of table which is referenced as foreign key in another table?

Suppose a
Table "Person" having
"SSN",
"Name",
"Address"
and another
Table "Contacts" having
"Contact_ID",
"Contact_Type",
"SSN" (primary key of Person)
similarly
Table "Records" having
"Record_ID",
"Record_Type",
"SSN" (primary key of Person)
Now i want that when i change or update SSN in person table that accordingly changes in other 2 tables.
If anyone can help me with a trigger for that
Or how to pass foreign key constraints for tables

Just add ON UPDATE CASCADE to the foreign key constraint.

Preferably the primary key of a table should never change. If you expect the SSN to change you should use a different primary key and have the SSN as a normal data column in the person table. If it's already too late to make this change, you can add ON UPDATE CASCADE to the foreign key constraint.

If you have PKs that change, you need to look at the table design, use an surrogate PK, like an identity.
In your question you have a Person table, which could be a FK to many many tables. In that case a ON UPDATE CASCADE could have some serious problems. The database I'm working on has well over 300 references (FK) to our equivalent table, we track all the various work that a person does in each different table. If I insert a row into our Person table and then try to delete it back out again (it will not be used in any other tables, it is new) the delete will fail with a Msg 8621, Level 17, State 2, Line 1 The query processor ran out of stack space during query optimization. Please simplify the query. As a result I can't imagine an ON UPDATE CASCADE would work either when you get many FKs on your PK.
I would never make sensitive data like a SSN a PK. Health care companies used to do this and had a painful switch because of privacy. I hope you don't have a web app and have a GET or POST variable called SSN with the actual value in it!! Or display the SSN on every report, or will you shred all old printed reports and limit access to who views each report., etc.

Well, assuming the SSN is the primary key of the Person table, I would just (in a transaction of course):
create a brand new row with the new SSN, copying all other details from the old row.
update the columns in the other tables to point to the new row.
delete the old row.
Now this is actually a good example of why you shouldn't use real data as table cross-references, if that data can change. If you'd used an artificial column to tie them together (and only stored the SSN in one place), you wouldn't have the problem.

Cascade update and delete are very dangerous to use. If you have a million child records, you could end up with a serious locking problem. You should code the updates and deletes instead.
You should never use a PK with the potential to change if it can be avoided. Nor should you ever use SSN as a PK because it should never be stored unencrypted in your database. Never, unless your company likes to be sued when they are the cause of an indentity theft incident. This is not a design flaw to shrug off as this is legacy, we don't have time to fix. This is a design flaw that could bankrupt your company if someone steals your backup tapes or gets the ssns out of the sytem in another manner (most of these types of thefts are internal BTW). This is an urgent - must fix now design flaw.
SSN is also a bad candidate because it changes (people change them when they are victims of identity theft for instance.) Plus an integer PK will have faster performance than a nine-digit PK.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.