Data modeling for tables with the same columns - C#

I have many tables with the same number of columns and the same column names, because they are all lookup tables.
For example, there are LabelType and TaskType tables. Both have TypeID and TypeName columns. They are used as foreign keys in other tables: LabelType is referenced by the ShippingLog table and TaskType by the EmployeeTask table.
LabelType table
TypeID  TypeName
1       Fedex
2       UPS
3       USPS
TaskType table
TypeID  TypeName
1       Receiving
2       Pickup
3       Shipping
So far I have more than 20 such tables, and I expect the number to keep increasing.
I have no problem with that, but I am wondering whether there is a better or smarter way to organize these tables. I was even thinking of consolidating them all into one lookup Type table and differentiating the rows with a foreign key to another lookup table, which would hold values like Label, Task, etc. Then I would need only one or two tables for all the lookup data.
Please advise me if you know a better or smarter way of modeling this data.

Just because data has similar structure doesn't mean it has the same meaning or same constraints. Keep your lookup tables separate. This keeps foreign keys separate, so the database can protect itself from referencing the wrong kind of lookup data. [1]
I wish relational DBMSes supported inheritance, where you could define the basic structure in the parent table and just add specific FKs in the child tables. As it stands now, you'll need to endure some repetition in your DDL...
NOTE: One exception from "keep lookup tables separate" rule might be when your system needs to be dynamic (i.e. be able to add new kinds of lookup data without actually creating new physical tables in the database), but it doesn't look that way from your question.
[1] With one big lookup table, FKs alone won't stop (for example) the ShippingLog table from referencing a row meant for the EmployeeTask table. By using identifying relationships and migrating PKs, you can protect yourself from this, but not without introducing some redundancies and needing some careful constraining. It's cleaner and probably more performant to simply do the right thing and keep lookup tables separate.
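The protection that separate lookup tables give can be sketched as follows. This is a minimal illustration, not the asker's actual schema: it uses Python's built-in sqlite3 module as a stand-in for the real database, the columns of ShippingLog and EmployeeTask are assumptions, and the fourth task type ('Packing') is a hypothetical row added only so that an ID exists in TaskType but not in LabelType.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE LabelType (TypeID INTEGER PRIMARY KEY, TypeName TEXT NOT NULL UNIQUE);
CREATE TABLE TaskType  (TypeID INTEGER PRIMARY KEY, TypeName TEXT NOT NULL UNIQUE);
-- Referencing tables: column names here are assumptions for illustration.
CREATE TABLE ShippingLog (
    LogID       INTEGER PRIMARY KEY,
    LabelTypeID INTEGER NOT NULL REFERENCES LabelType(TypeID)
);
CREATE TABLE EmployeeTask (
    TaskID     INTEGER PRIMARY KEY,
    TaskTypeID INTEGER NOT NULL REFERENCES TaskType(TypeID)
);
INSERT INTO LabelType VALUES (1, 'Fedex'), (2, 'UPS'), (3, 'USPS');
-- 'Packing' is a hypothetical extra row, not from the question.
INSERT INTO TaskType VALUES (1, 'Receiving'), (2, 'Pickup'), (3, 'Shipping'), (4, 'Packing');
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# TypeID 2 exists in LabelType, so this row is accepted.
accepted = try_insert("INSERT INTO ShippingLog (LabelTypeID) VALUES (?)", (2,))
# TypeID 4 exists only in TaskType, so the FK to LabelType rejects it.
rejected = not try_insert("INSERT INTO ShippingLog (LabelTypeID) VALUES (?)", (4,))
```

With a single consolidated lookup table, the second insert would succeed, because ID 4 would be a valid row of that table regardless of which kind of lookup it represents.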

Keep your lookup tables separate. It's faster at lookup time, and you will do millions of lookups between times when you add a new lookup table.
A lot of tables is not a big problem.

Related

One table for different products with same fields but with different data range?

I want to store many different products in my database (and in one table). Using inheritance (Table per Concrete Type), I keep all the common fields (date, customer, orderID) in a parent table and have made one child table for the products.
That one child table holds many different products, which have some fields in common and some that differ:
ProductOne = {A,B,C}
ProductTwo = {A,B,D}
ProductThree = {A,B,F}
So I made TableAllProduct, whose fields are {A,B,C,D,F}.
The reason I chose this design is future products. For example, if we get a new product that uses only the existing fields {A,B,C,D,F}, we should be able to store its data in TableAllProduct without any software upgrade (instead of creating a new table, as the inheritance approach would require, which needs new code).
TableAllProduct can hold three different products: ProductOne = {A,B,C}, ProductTwo = {A,B,D}, ProductThree = {A,B,F}.
The next step is storing data in TableAllProduct.
In this scenario, ProductOne and ProductTwo share the fields {A,B}, but field A stores data for ProductOne as well as for ProductTwo.
ProductOne has the following options: {data__A_1, data__A_2, data__A_3}
ProductTwo has the following options: {data__B_1, data__B_2}
which I bring in from another table (many-to-many).
Here we break the rules of an RDBMS, because I need multiple foreign keys on one column, which an RDBMS doesn't support. The foreign key's delete/edit responsibilities can be handled with a DELETE trigger (which will check the record in the Category table).
In this way, I can store multiple products in one table, now and in the future.
What are the disadvantages of this approach?
Is there any other, better way to solve this problem? (I know about the Entity-attribute-value model, but in our situation products don't change on a daily or weekly basis, and EAV is too complex to maintain.) Thanks
You need to normalize your data.
The model you've described can work. You need to have the AllProducts table only contain the attributes(columns) in common for all of the products. Attributes like name and SKU, and maybe a reference to the vendor/supplier.
Once you have identified the common attributes, the remaining attributes can be moved into a table specific to each product. The SpecificProduct table can use the PK of the AllProducts table as a PK and FK. Every record in SpecificProduct will also have a record in the AllProducts table. The complete data for a specific product consists of the attributes from the AllProducts table joined to the columns for the specific product table.
This strategy helps to keep the AllProducts table narrow when a varied subset of attributes relates to only a small subset of the records in the table. By reusing the AllProducts PK as the PK/FK of the specific product tables, you ensure join performance will be good as well.
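The shared-key layout described in this answer might look like the sketch below. It uses Python's sqlite3 module as a stand-in database; the Name and SKU columns follow the answer's examples, while the ProductOne/ProductTwo column types and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Common attributes shared by every product.
CREATE TABLE AllProducts (
    ProductID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    SKU       TEXT NOT NULL UNIQUE
);
-- Product-specific tables: the PK is also an FK to AllProducts.
CREATE TABLE ProductOne (
    ProductID INTEGER PRIMARY KEY REFERENCES AllProducts(ProductID),
    C         TEXT NOT NULL
);
CREATE TABLE ProductTwo (
    ProductID INTEGER PRIMARY KEY REFERENCES AllProducts(ProductID),
    D         TEXT NOT NULL
);
-- Sample rows are made up for this sketch.
INSERT INTO AllProducts VALUES (1, 'Widget', 'SKU-1'), (2, 'Gadget', 'SKU-2');
INSERT INTO ProductOne VALUES (1, 'c-value');
INSERT INTO ProductTwo VALUES (2, 'd-value');
""")

# Complete data for ProductOne rows = common columns joined to specific columns.
product_one_rows = conn.execute("""
    SELECT p.ProductID, p.Name, p.SKU, s.C
    FROM AllProducts AS p
    JOIN ProductOne AS s ON s.ProductID = p.ProductID
""").fetchall()

# A specific row cannot exist without its AllProducts row.
try:
    conn.execute("INSERT INTO ProductOne VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Because the join runs on the primary key of both tables, each lookup is a key lookup rather than a scan.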

Entity-Framework: Sort on Many-to-Many

I have two entities with a many-to-many relationship and I'm looking for a way to be able to sort the result from the tables.
In other words, when I get a row from table1 and all the corresponding records from table2 I want to be able to have a stored sort order for table2 that's specific for that row in table1.
My first thought was to add a sort column to the table that represents the relation, but to my knowledge there is no way of accessing the new column in the relation.
Does anybody have any suggestions on how to accomplish this?
As Ladislav Mrnka states, if you add the new column to the junction table, there will be a new entity "in the middle" that will make navigation much harder.
If you want to avoid this, but still be able to make the navigation as usual, you can keep the junction table and add a new table, just like the junction, with the order column added. When you need the order info, you can just join this table to get it and use it.
This new table will, of course, require some maintenance. For example, you can create an ON DELETE CASCADE foreign key from the junction+order table to the junction table, and use a trigger (ooops, that's not good!) to create a new row with a default order for each newly created relation. So it would be much more advisable to handle this in your business logic.
I know it's too tricky, but there's no magic solution... just choose what is more comfortable to you.
You can add a new column to the junction table, but the table will then become a new entity, so your model will consist of three entities and two one-to-many relations instead of two entities and a single many-to-many relation.
Due to your requirement of sorting table2 results per table1 row and not globally, you have three non-elegant solutions:
1. The approach Ladislav suggested (with the bad looking model) - add an order column, add a bridge entity.
2. The approach JotaBe suggested (with the bad looking schema) - add an additional table and maintain both.
3. If the context is used only for reading (no need to change relationships) and you don't mind changing the EDMX manually after every update from DB, then you could hack the EDMX and change the SSDL definition of the relationship table to an SQL query, e.g.
<EntitySet Name="AS_TO_BS" EntityType="BlaBla.Store.AS_TO_BS">
<DefiningQuery>
SELECT ID1, ID2
FROM AS_TO_BS
ORDER BY ORDERVALUE
</DefiningQuery>
</EntitySet>
Instead of:
<EntitySet Name="AS_TO_BS" EntityType="BlaBla.Store.AS_TO_BS"
store:Type="Tables" Schema="MY_SCHEMA" />
See if you can relax your requirements, if not then settle on one of the three solutions.
Edit:
Another idea:
Use a view to duplicate the relationship table, then map the relationship to the view (as read only) and the order entity to the table (writable).
Thank you all for the good answers to my question. I now feel more confident about the pros and cons of the different solutions.
What I ended up doing was this: As it turns out, just adding a sort column to the relation-table doesn't affect the model, update from DB still works and the table still gets mapped as a many-to-many relation. Then I created a stored procedure that fetches the sort column from the relation-table and another stored procedure to update the sort-index of a specified record.
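The idea of a sort column on the relation table, read and updated through two small routines, can be sketched like this. It is an illustration only, using Python's sqlite3 module instead of SQL Server stored procedures: the AS_TO_BS, ID1, ID2 and ORDERVALUE names are borrowed from the EDMX example above, while tables A and B, the two function names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE B (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Junction table with a per-relation sort column.
CREATE TABLE AS_TO_BS (
    ID1        INTEGER NOT NULL REFERENCES A(ID),
    ID2        INTEGER NOT NULL REFERENCES B(ID),
    ORDERVALUE INTEGER NOT NULL,
    PRIMARY KEY (ID1, ID2)
);
INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
INSERT INTO B VALUES (10, 'x'), (20, 'y'), (30, 'z');
INSERT INTO AS_TO_BS VALUES (1, 10, 2), (1, 20, 1), (1, 30, 3), (2, 10, 1), (2, 30, 0);
""")


def sorted_bs(a_id):
    """Names of the B rows related to one A row, in that row's stored order."""
    rows = conn.execute(
        "SELECT B.Name FROM AS_TO_BS "
        "JOIN B ON B.ID = AS_TO_BS.ID2 "
        "WHERE AS_TO_BS.ID1 = ? "
        "ORDER BY AS_TO_BS.ORDERVALUE",
        (a_id,),
    )
    return [name for (name,) in rows]


def set_sort_index(a_id, b_id, value):
    """Change the sort position of one relation without touching the others."""
    conn.execute(
        "UPDATE AS_TO_BS SET ORDERVALUE = ? WHERE ID1 = ? AND ID2 = ?",
        (value, a_id, b_id),
    )
```

The same B row ('x') sits at a different position for each A row, which is the per-row ordering the question asks for.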

De-normalize table or use Joins in Entity Framework

I need to know what are the tradeoffs of using a denormalized table vs using two separate tables and accessing the data using joins. I am using Entity Framework 4.
In my case I have two tables Order and OrderCategoryDetails.
I am wondering whether merging these two tables into a single table would be better.
If denormalized, the added columns (OrderCategory and OrderSubCategory) will be sparse (they could be 100% empty, and will always be at least 50% empty).
On the other hand, if I keep it as it is, I am worried about frequent join operations being executed (i.e. whenever I query for a specific Order, I need information from OrderCategoryDetails too).
At present, I have normalized tables and use navigational properties:
To access Order Category information from OrderItem instance
OrderItem orderItem = _context.OrderItems.Where(...).FirstOrDefault();
if(2 == orderItem.SalesOrder.Category.OrderCategory){ ...}
To access Order Category information from Order instance
Order order = _context.Orders.Where(...).FirstOrDefault();
if(2 == order.Category.OrderCategory){ ...}
This is my schema:
Table: Order
  ID (Primary Key)
  Date
  Amount
  ItemCount
  OrderCategoryInfo (FK - join with OrderCategoryDetails on OrderCategoryDetails.ID)
Table: OrderCategoryDetails
  ID (Primary Key)
  OrderCategory
  OrderSubCategory
Table: OrderItem
  OrderItem ID (Primary key)
  Order ID (FK - Join with Order)
Database used: SQL Server 2008 R2
My general advice would be to ask yourself the following question: does every single row from the first table require a row from the second table? If the answer is yes, then you might be better off de-normalising the data. If the answer is no, you're probably better off keeping it as a separate table.
As long as you set up your foreign key association between the two tables you shouldn't concern yourself with performance implications of performing a join. It will only become an issue in pathological situations.
Based upon your answers in the comments thread, I'd recommend that you should keep the tables separate and set up a foreign key relationship between the two.
If you do get any performance problems further down the line, run a profiler on the problematic SQL and add any indexes that the profiler recommends, but only do this for queries that are used frequently. Indexes are great for speeding up queries but come at the cost of insert performance, so take care with them.
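The recommended layout, separate tables joined through a nullable foreign key, could be sketched as below. This uses Python's sqlite3 module rather than SQL Server 2008 R2 and Entity Framework; the table and column names follow the schema in the question, while the column types, the sample values and the function name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE OrderCategoryDetails (
    ID               INTEGER PRIMARY KEY,
    OrderCategory    INTEGER NOT NULL,
    OrderSubCategory INTEGER
);
-- "Order" is a reserved word, so the table name is quoted.
CREATE TABLE "Order" (
    ID                INTEGER PRIMARY KEY,
    Amount            REAL NOT NULL,
    OrderCategoryInfo INTEGER REFERENCES OrderCategoryDetails(ID)  -- NULL when no category
);
-- Sample rows are made up for this sketch.
INSERT INTO OrderCategoryDetails VALUES (1, 2, 7);
INSERT INTO "Order" VALUES (100, 25.0, 1), (101, 40.0, NULL);
""")


def order_with_category(order_id):
    """Fetch one order plus its category columns (None when it has no category)."""
    return conn.execute(
        """
        SELECT o.ID, o.Amount, c.OrderCategory, c.OrderSubCategory
        FROM "Order" AS o
        LEFT JOIN OrderCategoryDetails AS c ON c.ID = o.OrderCategoryInfo
        WHERE o.ID = ?
        """,
        (order_id,),
    ).fetchone()
```

Orders without a category simply carry a NULL in the FK column, so no sparse category columns are stored on the Order table, and the join resolves through the primary key of OrderCategoryDetails.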

Insert many rows to one table OR insert rows separately to many table?

I have two database tables (SQL CE): a Teacher table and a Class table. The two tables have a one-to-many relationship where one teacher has many classes (i.e. Class has a foreign key teacher_id). Teachers (rows) are inserted (or generated) through C# code at run time, and so are classes.
Which of the following is faster for INSERT and SELECT?
1. Each time a new teacher is INSERTed, a new Class table is created (e.g. Class_teacher001) to store whichever classes the teacher has. In this case, each Class table doesn't have to be so large, and a foreign key is not needed because the table name identifies the teacher. But there will be one Teacher table and many Class_xxx tables.
2. Only one Teacher table and one Class table. Each class row has a foreign key pointing at the Teacher table. There is only one Class table, but it will get very long. I worry that searching and reading will be slow.
Regardless of which is faster, (2) is the way to go....simply create indexes to support your searches. This is how almost all relational databases are used.
The nightmare of maintaining option (1) makes me shudder
OK, where to start. First, the relationship between Teacher and Class is potentially many-to-many, but as described by you is at least one-to-many.
The first option is absolutely the wrong way to go. Never dynamically create tables. The second option is how this sort of thing is handled. Databases are powerful, written by very smart people (usually), and can handle many more rows than all the students at a given school.
As long as you properly index your tables, they can easily support hundreds of millions of records.
I also agree with Mitch Wheat. When you create an index, the rows are ordered by the indexed columns, so create a combined index on (Teacher_Id, Class_Id).
It will help SELECT statements retrieve rows quickly.
Unless you are already having performance problems, I would not worry about them. There are many things that can cause performance problems other than the number of rows, and they should be dealt with differently depending on what they are. You have to worry more about the number of columns in a table affecting performance than you do about the number of rows. Also the number of concurrent connections to the database. One million rows in a table is not that many, it is the other two items in conjunction with that many rows that will make a database slow. You should use the second option.
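Option 2 with a supporting index might look like the sketch below. It uses Python's sqlite3 module as a stand-in for SQL CE, so the query plan output is SQLite's and not what SQL CE would report; the title column, the index name and the generated sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Teacher (teacher_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Class (
    class_id   INTEGER PRIMARY KEY,
    teacher_id INTEGER NOT NULL REFERENCES Teacher(teacher_id),
    title      TEXT NOT NULL
);
-- Combined index so "classes of one teacher" does not scan the whole table.
CREATE INDEX IX_Class_teacher ON Class(teacher_id, class_id);
""")

# 100 teachers with 20 classes each, all in the single Class table.
with conn:
    conn.executemany(
        "INSERT INTO Teacher VALUES (?, ?)",
        [(t, f"teacher{t:03d}") for t in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO Class (teacher_id, title) VALUES (?, ?)",
        [(t, f"class {t}-{c}") for t in range(1, 101) for c in range(20)],
    )


def classes_for(teacher_id):
    """Titles of one teacher's classes, in class_id order."""
    rows = conn.execute(
        "SELECT title FROM Class WHERE teacher_id = ? ORDER BY class_id",
        (teacher_id,),
    )
    return [title for (title,) in rows]


# Ask SQLite how it would run the lookup; the last column holds the description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM Class WHERE teacher_id = ?", (1,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

Adding a teacher is then an ordinary INSERT, with no table creation at run time, and the plan text names the index rather than a full table scan.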

Database Table Schema and Aggregate Roots

Application is single user, 1-tier (1 PC), database SqlCE. The DataService layer will be (I think): a Repository returning domain objects and querying the database with LinqToSql (dbml). There are obviously a lot more columns; this is a simplified view.
LogTime in separate table: http://i53.tinypic.com/9h8cb4.png
LogTime in ItemTimeLog table (as Time): http://i51.tinypic.com/4dvv4.png
This is my first attempt at creating a database with more than 2 tables. I think the table schema makes sense, but I need some reassurance or criticism, because the table relations look quite scary, to be honest. I'm hoping you could:
Look at the table schema and respond if there are clear signs of trouble or errors that you spot right away. And if you have time,
Look at Program Summary/Questions, and see if the table layout makes sense with respect to those points.
Please be brutal, I will try to defend :)
Program summary:
a) A set of categories, each having a set of strategies (1:m)
b) Each day a number of items will be produced. And each strategy MAY reference it.
(So there can be 50 items, and a strategy may reference 23 of them)
c) An item can be referenced by more than one strategy. So I think it's an m:m relation.
d) Status values will be logged at fixed time-fractions through the day, for:
- .... each Strategy.....each StrategyItem....each item
e) An action on an item may be executed by a strategy that reference it.
- This is logged as ItemAction (Could have called it StrategyItemAction)
User Requests
b) -> e) describe the main activity mode of the program: working with only today's DayLog, for each category. The second-priority activity is retrieval of history, which will typically be: from all categories, from day x to day y, get all StrategyDailyLog.
Questions
First, does the overall layout look sound? I'm worried to see that there are so many relationships in all directions, connecting everything. Is this normal, or does it look like trouble?
StrategyItem is made to represent an m:m relationship. Is it correct as I noted 1:m / 1:1 (marked red) ?
StrategyItemTimeLog and ItemTimeLog: these log values that both need to be retrieved together when retrieving a StrategyItem. The reason I separated them is that the first one is strategy-specific, and several strategies can reference the same item. So I thought I should not duplicate the values that depend only on the item and not on the strategy. Hence I also pulled out the LogTime, as it seems to be the only parameter that unites the logs. But this all looks quite disturbing with those 3 tables. Does it make sense at all? Or do you have a suggestion?
Pink circles shows my vague attempt of Aggregate Root Paths. I've been thinking in terms of "what entity is responsible for delete". Though I'm unsure about the actual root. I think it's Category. Does it make sense related to User Requests described above?
EDIT1:
(Updated schema, showing typical number of hierarchy items for the first few relations, for 365 days, and additional explanations)
1:1 relation: Sorry. I made a mistake. The StrategyDailyLog should be 1:m. See updated schema. It is one per Strategy, per day.
DayLog / StrategyDailyLog: I’ve been pondering whether DayLog should be part of the hierarchy like this or not. The purpose of the DayLog table is to hold “sum values” derived from all the StrategyDailyLog tables for the same day, like performance values for this day. It also holds the date value, which allows me to omit a date value in the StrategyDailyLog (which I feel would be a kind of duplicate modeling of the date field); instead, the reference to DayLog exists to “find” the date. I’m not sure if this is an abuse/misconception of normalization.
Null value: I hadn’t thought about this. I believe I found 2, now marked in StrategyDailyLog and ItemAction. They cannot be null on creation, but they can be set to null if one needs to delete either a Strategy or a StrategyItem. That should not require deleting the StrategyDailyLog and the ItemAction. Hence they can be set to null.
All Id –columns: My idea was to have ID (autogenerated Integer) as PK for all my tables. I believed that also would be sufficient as candidate key. Is this not a proper way to make PKs? It’s the only way any table of mine can be identified. I asked a question before if that was ok, maybe I misunderstood, but thought that was a good approach.
m:m relation: This is what I have attempted to do: StrategyItem is the m:m table of StrategyDailyLog / DailyItem.
Ok. Here is me being brutal. I do not understand the model.
So instead of trying to comment on that so much, here are some thoughts that came to my mind when I looked at it.
I think you should have a look at your 1:1 relationships (all of them). Why are DayLog and StrategyDailyLog separated into two tables? Probably because you will always have at least one DayLog item but not all DayLog items have a StrategyDailyLog item. If that is the case, you can have a StrategyID FK in the DayLog table with the allow nulls option.
It would help to understand the model if you could show which fields are required and which fields accept null as a value.
Each of your tables has its own id column. That can be quite confusing when doing 1:1 relations and m:m relations. For a 1:1 relation, the relation between the two tables is usually made on the primary key in both tables. If you do not do that, you have to create a candidate key on the foreign key column. In your case that means that StrategyDailyLog should have a candidate key on DayLogID.
An m:m relation between two tables is usually solved by adding a new table in between, with the primary keys from both tables. Those fields together are the primary key for the table in the middle.
Let's say, for example, that you should have an m:m relationship between Category and Strategy. You should then create a table called CategoryStrategy with two fields, CategoryID and StrategyID, that together are the primary key for table CategoryStrategy.
I hope my comments make sense and that they are useful to you.
EDIT 2011-01-17
I do not think that you should have as a principle to use an IDENTITY column as primary key in all tables. An m:m relation does not need it, so you should not do it. I also think that you have misunderstood what I meant by a candidate key. A candidate key is a key that could have been used as the primary key. In MS SQL Server you define a UNIQUE CONSTRAINT for your candidate key.
Ex: Table StrategyItem has id as PK, but the combination of StrategyID and DailyItemID is the candidate key. Better would be to remove id and use StrategyID+DailyItemID as PK.
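The two ways of declaring the StrategyItem key can be compared in a small sketch. It uses Python's sqlite3 module rather than MS SQL Server; the table names StrategyItemA and StrategyItemB and the helper function are hypothetical, and the foreign keys to the parent tables are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: surrogate id as PK, candidate key declared with UNIQUE.
CREATE TABLE StrategyItemA (
    id          INTEGER PRIMARY KEY,
    StrategyID  INTEGER NOT NULL,
    DailyItemID INTEGER NOT NULL,
    UNIQUE (StrategyID, DailyItemID)
);
-- Variant B: the candidate key itself is the PK, no surrogate id.
CREATE TABLE StrategyItemB (
    StrategyID  INTEGER NOT NULL,
    DailyItemID INTEGER NOT NULL,
    PRIMARY KEY (StrategyID, DailyItemID)
);
""")


def insert_pair(table, strategy_id, item_id):
    """Insert one relation; return False if the key constraint rejects it.

    `table` is one of the two fixed names above, never user input.
    """
    try:
        conn.execute(
            f"INSERT INTO {table} (StrategyID, DailyItemID) VALUES (?, ?)",
            (strategy_id, item_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Third insert repeats the first pair and should be rejected in both variants.
results = {
    table: [insert_pair(table, 1, 10), insert_pair(table, 1, 11), insert_pair(table, 1, 10)]
    for table in ("StrategyItemA", "StrategyItemB")
}
```

Both variants reject the duplicate pair; variant A only does so because of the UNIQUE constraint, since the surrogate id alone would accept it.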
Below is the schema that I would have built with your description. I might have missed something important because I do not know everything about what you want to do.
You should not think so much about query performance and building aggregates when designing the schema. That can be handled by creating indexes on columns and using sum, count and group by in your queries. An index on column Created in the model below would be necessary for your queries on a date or date interval. In MS SQL Server there is something called the clustered index. By default the PK of a table is the clustered index, but in this case I would make the index on the Created column the clustered index.
A Category has 0, 1 or more Strategies.
A LogItem has one Category and optionally one Strategy.
LogItem.Created holds the date and time.
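A rough sketch of this three-table model, with an index on Created and a date-interval aggregate, is shown below. It uses Python's sqlite3 module, so the SQL Server clustered index choice is not represented; the Name and Value columns, the index name, the sample rows and the function name are all assumptions, since the original schema diagram is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Strategy (
    StrategyID INTEGER PRIMARY KEY,
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID)
);
CREATE TABLE LogItem (
    LogItemID  INTEGER PRIMARY KEY,
    CategoryID INTEGER NOT NULL REFERENCES Category(CategoryID),
    StrategyID INTEGER REFERENCES Strategy(StrategyID),  -- optional
    Created    TEXT NOT NULL,                            -- ISO-8601 date and time
    Value      REAL NOT NULL                             -- hypothetical logged value
);
-- Supports queries on a date or date interval.
CREATE INDEX IX_LogItem_Created ON LogItem(Created);
INSERT INTO Category VALUES (1, 'cat1'), (2, 'cat2');
INSERT INTO Strategy VALUES (1, 1);
INSERT INTO LogItem (CategoryID, StrategyID, Created, Value) VALUES
    (1, 1,    '2011-01-15 09:00:00', 1.5),
    (1, NULL, '2011-01-15 10:00:00', 2.5),
    (2, NULL, '2011-01-16 09:00:00', 4.0),
    (1, 1,    '2011-01-18 09:00:00', 8.0);
""")


def totals_by_category(start, end):
    """Count and sum of Value per category for start <= Created < end."""
    return conn.execute(
        """
        SELECT CategoryID, COUNT(*), SUM(Value)
        FROM LogItem
        WHERE Created >= ? AND Created < ?
        GROUP BY CategoryID
        ORDER BY CategoryID
        """,
        (start, end),
    ).fetchall()
```

The daily "sum values" are computed by the query when needed instead of being stored in a separate table.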
