I'm working with Entity Framework Core (6.0.7) on a legacy database that isn't under my control, so unfortunately I can't change any of the database structure.
The database structure that I've inherited has quite a few tables in the following format (this isn't the actual structure; I've simplified it for this question):
id int (pk)
name varchar
item_model varchar
item_id int
...
It's probably irrelevant but I'm using MySQL.
Here item_model contains the name of a table, and item_id contains the primary key value of a row in that table, so item_id can point to many different tables.
Although I don't like this pattern, it is used throughout the DB, and in some cases, with only one or two distinct item_models.
Some example data might be
id name item_model item_id
1 Item1 ships 5
2 Item2 cruises 2
3 Item3 ships 3
...
I would prefer the table to look like the following, as I would then be able to define the foreign keys correctly and the navigation properties would work:
id name ship_id cruise_id
1 Item1 5 null
2 Item2 null 2
3 Item3 3 null
...
But as I mentioned before, I can't change the DB structure.
How can I tell EF that if item_model=ships then the item_id refers to a ship_id on the ships table?
I was thinking of creating a virtual field on the model which has either a C# or SQL case statement on it, but I can't see how I can do this, i.e. something like case when item_model='ships' then item_id else null end as ship_id
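To make the CASE idea concrete, here is a minimal sketch of the projection it would produce. It uses Python with an in-memory SQLite database purely as a stand-in for MySQL; the table name items is made up for the example, and the sketch does not show how (or whether) EF Core can be pointed at such a projection.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# "items" is a hypothetical name for the simplified table from the question.
conn.executescript("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        name TEXT,
        item_model TEXT,
        item_id INTEGER
    );
    INSERT INTO items VALUES (1, 'Item1', 'ships', 5),
                             (2, 'Item2', 'cruises', 2),
                             (3, 'Item3', 'ships', 3);
""")

# One CASE expression per target table splits item_id into typed columns;
# a CASE without ELSE yields NULL when the condition does not match.
SPLIT_QUERY = """
    SELECT id, name,
           CASE WHEN item_model = 'ships'   THEN item_id END AS ship_id,
           CASE WHEN item_model = 'cruises' THEN item_id END AS cruise_id
    FROM items
    ORDER BY id
"""

rows = conn.execute(SPLIT_QUERY).fetchall()
for row in rows:
    print(row)
```

Each row then carries at most one non-null key, which is the shape of the preferred table shown above.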
I guess the bottom line is that I want to keep the logic as simple as possible for anybody utilising the model/repository, so that they can just specify tab.Cruises.CruiseLaunchDate rather than having to add logic to each query in the repository.
ps. I'm relatively new to EF, so please forgive me if I've missed something obvious.
Related
I am writing a C# WinForms program which includes a user input textbox, the value of which will be used to create a table. I have been thinking about the best way to handle invalid T-SQL table names (though this can be extended to many other situations). Currently the only method I can think of is to check the input string for each violation of valid table names individually, though this seems long-winded and could easily miss certain characters, for example due to my own ignorance of what is a violation and what is not.
I feel like there should be a better way of doing this but have been unable to find anything in my search so far. Can anyone help point me in the right direction?
As I told you in a comment already, you should not do this...
You might use something like this
USE master;
GO
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
CREATE TABLE UserTables(ID INT IDENTITY CONSTRAINT PK_UserTables PRIMARY KEY
,UserInput NVARCHAR(500) NOT NULL CONSTRAINT UQ_UserInput UNIQUE);
GO
INSERT INTO UserTables VALUES(N'blah')
,(N'invalid !%$& <<& >< $')
,(N'silly 💖');
GO
SELECT * FROM UserTables;
/*
ID UserInput
1 blah
2 invalid !%$& <<& >< $
3 silly 💖
*/
GO
USE master;
GO
DROP DATABASE dbTest;
GO
You would then create your tables as Table1, Table2 and so on.
Whenever a user enters a string, you look it up in this table, pick the ID and build the table's name by concatenating the word Table with the ID.
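The lookup step could be sketched like this. It is only an illustration in Python with an in-memory SQLite database standing in for SQL Server; the helper name table_name_for is made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserTables (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        UserInput TEXT NOT NULL UNIQUE
    )
""")

def table_name_for(user_input: str) -> str:
    """Hypothetical helper: map an arbitrary user string to a safe name Table<ID>."""
    # The user's text is only ever passed as a bound parameter, never spliced into SQL.
    conn.execute(
        "INSERT OR IGNORE INTO UserTables (UserInput) VALUES (?)", (user_input,)
    )
    (table_id,) = conn.execute(
        "SELECT ID FROM UserTables WHERE UserInput = ?", (user_input,)
    ).fetchone()
    return f"Table{table_id}"

first = table_name_for("blah")
second = table_name_for("invalid !%$& <<& >< $")
again = table_name_for("blah")  # same input maps to the same table name
print(first, second, again)
```

The generated name contains only the fixed word Table and an integer, so it is always a valid identifier no matter what the user typed.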
There are better approaches!
But you should think of a fixed schema. You will have to define columns (how many, which type, how to name them?). You will be in hell when you have to query this. Nothing to rely on...
One approach is a classical n:m mapping
A User table (UserID, Name, ...)
A test table (TestID, TestName, TestType, ...)
The mapping table (ID, UserID, TestID, Result VARCHAR(MAX))
Depending on what you need you might add a table
question table (QuestionID, QuestionText ...)
Then use a mapping to bind questions to tests and another mapping to bind answers to such mapped questions.
Another approach is to store the result in a generic container (XML or JSON). This keeps your tables slim, but you need to know the XML's structure in order to query it.
Many ways to skin a rabbit...
UPDATE
You ask for an explanation...
The main advantage of a relational database is the pre-known structure.
Precompiled queries, cached results, statistics and indexes all demand known structures.
Data integrity is ensured with constraints, foreign keys and so on. All this demands known names, known types(!) and known relations.
User-specific table names, and even worse, generically defined structures, do not allow for any join or other typical RDBMS operation. The only approach is to create each and every statement dynamically (string building).
The rule of thumb is: whenever you think you have to create several objects for the same thing, only with different names, you should rethink the design. It is bad to store Phone1, Phone2 and Phone3. It is better to have a side table with a UserID and a Phone column (classical 1:n). It is bad to have SalesCustomerA and SalesCustomerB; better to use a Customer table and bind its ID into a general Sales table as FK.
You see what I mean? What belongs together should live in one single table. If you need separation, add columns to your table and use them for grouping and filtering.
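A minimal sketch of "one table plus grouping columns", again in Python with an in-memory SQLite database as a stand-in; the TestResult table and its sample rows are invented for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical single table holding every user's results, instead of one table per user.
conn.executescript("""
    CREATE TABLE TestResult (
        UserID INTEGER NOT NULL,
        TestID INTEGER NOT NULL,
        Score  INTEGER NOT NULL
    );
    INSERT INTO TestResult VALUES (1, 10, 70), (1, 11, 90), (2, 10, 50);
""")

# Grouping by the UserID column gives the per-user view.
per_user = conn.execute(
    "SELECT UserID, COUNT(*), AVG(Score) FROM TestResult "
    "GROUP BY UserID ORDER BY UserID"
).fetchall()

# The same table also answers questions across all users at once.
per_test = conn.execute(
    "SELECT TestID, MAX(Score) FROM TestResult "
    "GROUP BY TestID ORDER BY TestID"
).fetchall()

print(per_user)
print(per_test)
```

Both queries are fixed strings; nothing has to be built dynamically per user.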
Just imagine you want to do some statistical evaluation of all your user test tables. How would you gather the data into one big pool, if you cannot rely on some structure in common?
I hope this makes it clear...
If you still want to stick to your idea, you should give my code sample a try. It allows you to map any silly string to a secure and easy-to-handle table name.
Lots can go wrong with users entering table names. A bunch of whacked-out names is a maintenance nightmare. A user should not even be aware of the table name. It is a security risk, as the program now has to have database owner authority. You want to limit users to minimum authority.
Similar to Shnugo but with composite primary key. This will prevent duplicate userID, testID.
user
ID int identity PK
varchar(100) fName
varchar(100) lName
test
ID int identity PK
varchar(100) name
userTest
int userID PK FK to User
int testID PK FK to Test
int score
select t.name, u.fName, u.lName, ut.score
from userTest ut
join test t
on t.ID = ut.testID
join [user] u
on u.ID = ut.userID
order by t.name, u.lName, u.fName
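How the composite primary key rejects a duplicate (userID, testID) pair can be sketched as follows. This uses Python with an in-memory SQLite database as a stand-in, leaves out the foreign keys for brevity, and the helper try_insert is made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userTest (
        userID INTEGER NOT NULL,
        testID INTEGER NOT NULL,
        score  INTEGER,
        PRIMARY KEY (userID, testID)
    );
    INSERT INTO userTest VALUES (1, 1, 80);
""")

def try_insert(user_id: int, test_id: int, score: int) -> bool:
    """Hypothetical helper: return False if the (userID, testID) pair already exists."""
    try:
        conn.execute(
            "INSERT INTO userTest VALUES (?, ?, ?)", (user_id, test_id, score)
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when the composite primary key would be violated.
        return False

same_pair = try_insert(1, 1, 95)   # duplicate pair is rejected
other_test = try_insert(1, 2, 60)  # same user, different test is accepted
print(same_pair, other_test)
```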
In most of the databases I have created, I've always named my columns by prefixing them with the table name. For example:
Person Table
- PersonID
- PersonName
- PersonSurName
- PersonTimestamp
Customer Table
- CustomerID
- CustomerName
- CustomerSurName
- CustomerTimestamp
As opposed to having
Person Table
- ID
- Name
- SurName
- Timestamp
Customer Table
- ID
- Name
- SurName
- Timestamp
But I was wondering if it's really the best, most convenient and most self-explanatory approach further down the road. Maybe some columns like timestamp would be better left as Timestamp in all tables? Is there any general good practice for this? I'm using these databases in C# / WinForms.
I don't like either example. I would prefer:
Person Table
- PersonID -- not ID, since this is likely to be referenced in other tables
- FirstName -- why is a first name just a "name"?
- LastName -- why not use the form more common than surname?
- ModifiedDate -- what is a "persontimestamp"?
I am adamantly against primary keys, which will occur in other tables in the model, to be named with generic things like "ID" - I don't ever want to see ON p.ID = SomeOtherTable.PersonID - an entity that is common to more than one table in the model should be named consistently throughout the model. Other aspects like FirstName belong only to that table - even if another table has a FirstName, it's not the same "entity" so to speak. And even if you ever have joins between these two tables for whatever reason, you're always going to be differentiating between them as Person.FirstName and Customer.FirstName - so adding the Person or Customer prefix is just redundant and annoying for anyone who has to write such a query.
Also Timestamp is a horrible name for a column (or as a column suffix) in a SQL Server model because TIMESTAMP is a data type that has nothing to do with date or time, and the name might imply other usage to your peers.
Of course all of this is quite subjective. It's like asking 100 people what car you should drive. You're going to get a handful of answers, and now it'll be up to you to wade through the crap and figure out what makes sense for you, your team and your environment. :-)
In my experience the more standard naming convention for columns is not to include the table name for the following reasons:
It is unnecessary repetition
It is harder to maintain because if the table name changes in the future all the columns would need renaming.
The convention may not be clear to another implementer further down the line
When you perform a query you can always alias the columns at that point if you need the table name in them.
I would only use the table name in a column if the column is a foreign key to another table. If you use this convention it makes it relatively easy to identify the foreign key columns within a table without the use of a relational diagram.
Uggghhh, I guess I'm just too lazy, so I would do Department.ID instead of Department.DepartmentID. I have enough redundant work as it is.
Your first example is better in my opinion. (The second one is confusing when you build queries and often requires the AS sql keyword)
However in my shop we use a little different convention.
PrimaryKey - this column should be named starting with ID and followed by tablename (IDPerson)
ForeignKey - this column should be named starting with ID and followed by the name of the external table (IDDepartment)
OtherColumns - they should have a meaningful name for the data contained. It's required to repeat the tablename only for those fields that will happen to have the same name in different tables.
When using parameters to call stored procedure you should reverse the convention (#personID, #departmentID)
Some columns might be repeated or have the same value, I generally use the same name in that case for timestamps etc. I work with financial data and would like to name the columns as the stocks name itself, rather than giving different name in different tables.
I have never seen the same rule across different vendors. I do not know whether there is a standard, but I think not!
Personally, I like your second option. Repeating the table name in the column name is unnecessary repetition.
I like to try to keep column names unique throughout my database. I do that by using prefixes for tables and columns so that even foreign keys are uniquely named. This makes doing joins simpler as I (usually) don't need to reference the tables in the joins:
--------------------
pe_people
--------------------
pe_personID (PK)
pe_firstName
pe_lastName
pe_timeStamp
--------------------
--------------------
ac_accounts
--------------------
ac_accountID (PK)
ac_personID (FK)
ac_accountName
ac_accountBalance
--------------------
SELECT pe_firstName, pe_lastName, ac_accountName, ac_accountBalance
FROM pe_people
INNER JOIN ac_accounts ON (ac_personID = pe_personID)
WHERE pe_timeStamp > '2016-01-01';
I have a table that hold records of part numbers for a specific project like so:
create table ProjectParts
(
PartNumber varchar(20) not null,
ProjectID int not null,
Description varchar(max) not null,
primary key nonclustered (PartNumber, ProjectID)
);
I have a view that will collect inventory information from multiple places, but for now I basically have a skeleton:
create view ProjectQuantities as
select distinct
pp.PartNumber,
pp.ProjectID,
0 as QtyOnHand,
0 as QtyOnOrder,
0 as QtyCommitted
from
ProjectParts pp;
So far, so good. I go into EF designer in Visual Studio (I already had an object model using the ProjectParts table) and update the model from the database. I select the ProjectQuantities view, click ok.
EF tries to divine the key on the table as a combination of all columns, but I fix that so the key for the object is the PartNumber and ProjectID columns. I check to make sure this validates, and it does.
Next, I add a 1:1 association between the ProjectPart object and the ProjectQuantity object in the EF UI and click OK. Now, when I try validating, I get the message Error 11008: Association 'ProjectQuantityProjectPart' is not mapped. Seriously? It can't figure this out? Alright, I select the link, go to the Mapping Details, and add the ProjectParts table. It adds both tables and meshes up the key relationships. My job is done. I run the validation.
No luck for me. Now I get the error Error 3021: Problem in mapping fragments starting at line (line number): Each of the following columns in table ProjectParts is mapped to multiple conceptual side properties. The message lists the ProjectID and the PartNumber columns and their references to the association I just created.
Well duh! Of course there are multiple references! It's a 1:1 compound key, it has to have multiple references!
This is stopping me from getting stuff done. Does anyone know a simple way to fix this so I can collect Quantity information when I'm collecting data about a project and its parts?
Thanks!
You may find this article useful http://blogs.u2u.be/diederik/post/2011/01/31/Building-an-Entity-Framework-40-model-on-views-practical-tips.aspx
The application is single-user, 1-tier (1 PC), with a SqlCE database. The DataService layer will be (I think) a Repository returning domain objects and querying the database with LinqToSql (dbml). There are obviously a lot more columns; this is a simplified view.
LogTime in separate table: http://i53.tinypic.com/9h8cb4.png
LogTime in ItemTimeLog table (as Time): http://i51.tinypic.com/4dvv4.png
This is my first attempt at creating a database with more than two tables. I think the table schema makes sense, but I need some reassurance or criticism, because the table relations look quite scary to be honest. I'm hoping you could:
Look at the table schema and respond if there are clear signs of trouble or errors that you spot right away. And if you have time,
Look at Program Summary/Questions, and see if the table layout makes sense for those points.
Please be brutal, I will try to defend :)
Program summary:
a) A set of categories, each having a set of strategies (1:m)
b) Each day a number of items will be produced. And each strategy MAY reference it.
(So there can be 50 items, and a strategy may reference 23 of them)
c) An item can be referenced by more than one strategy. So I think it's an m:m relation.
d) Status values will be logged at fixed time-fractions through the day, for:
- .... each Strategy.....each StrategyItem....each item
e) An action on an item may be executed by a strategy that reference it.
- This is logged as ItemAction (Could have called it StrategyItemAction)
User Requests
b) -> e) describe the main activity mode of the program: working with only today's DayLog, for each category. The second-priority activity is retrieval of history, which typically will be: from all categories, from day x to day y, get all StrategyDailyLog.
Questions
First, does the overall layout look sound? I'm worried to see that there are so many relationships in all directions, connecting everything. Is this normal, or does it look like trouble?
StrategyItem is made to represent an m:m relationship. Is it correct as I noted 1:m / 1:1 (marked red) ?
StrategyItemTimeLog and ItemTimeLog: these log values that both need to be retrieved together when retrieving a StrategyItem. The reason I separated them is that the first one is strategy-specific, and several strategies can reference the same item. So I thought not to duplicate those values that are not dependent on the strategy, but only on the item. Hence I also pulled out the LogTime, as it seems to be the only parameter that unites the logs. But this all looks quite disturbing with those 3 tables. Does it make sense at all? Or do you have a suggestion?
Pink circles show my vague attempt at Aggregate Root Paths. I've been thinking in terms of "what entity is responsible for delete". Though I'm unsure about the actual root. I think it's Category. Does it make sense in relation to the User Requests described above?
EDIT1:
(Updated schema, showing typical number of hierarchy items for the first few relations, for 365 days, and additional explanations)
1:1 relation: Sorry. I made a mistake. The StrategyDailyLog should be 1:m. See updated schema. It is one per Strategy, per day.
DayLog / StrategyDailyLog: I’ve been pondering over whether DayLog should be a part of the hierarchy like this or not. The purpose of the DayLog table is to hold “sum values” derived from all the StrategyDailyLog rows for the same day, like performance values for this day. It also holds the date value, which allows me to omit a date value in the StrategyDailyLog (which I feel would be a kind of duplicate modeling of the date field); instead the reference to DayLog exists to “find” the date. I’m not sure if this is an abuse/misconception of normalization.
Null value: I hadn’t thought about this. I believe I found 2, as now marked in StrategyDailyLog and ItemAction. They cannot be null on creation, but they can be set to null if one needs to delete either a Strategy or a StrategyItem. That should not require a delete of the StrategyDailyLog and the ItemAction. Hence they can be set to null.
All Id –columns: My idea was to have ID (autogenerated Integer) as PK for all my tables. I believed that also would be sufficient as candidate key. Is this not a proper way to make PKs? It’s the only way any table of mine can be identified. I asked a question before if that was ok, maybe I misunderstood, but thought that was a good approach.
m:m relation: This is what I have attempted to do: StrategyItem is the m:m table of StrategyDailyLog / DailyItem.
Ok. Here is me being brutal. I do not understand the model.
So instead of trying to comment on that so much, here are some thoughts that came to my mind when I looked at it.
I think you should have a look at your 1:1 relationships (all of them). Why are DayLog and StrategyDailyLog separated into two tables? Probably because you will always have at least one DayLog item but not all DayLog items have a StrategyDailyLog item. If that is the case you can have a StrategyID FK in the DayLog table that allows nulls.
It would help to understand the model if you could show which fields are required and which fields accept null as a value.
Each of your tables has its own id column. That can be quite confusing when doing 1:1 relations and m:m relations. For a 1:1 relation, the relation between the two tables is usually made on the primary key in both tables. If you do not do that, you have to create a candidate key on the foreign key column. In your case that means that StrategyDailyLog should have a candidate key on DayLogID.
An m:m relation between two tables is usually solved by adding a new table in between, with the primary keys from both tables. Those fields together form the primary key of the table in the middle.
Let's say for example that you have an m:m relationship between Category and Strategy. You would then create a table called CategoryStrategy with two fields, CategoryID and StrategyID, that together form the primary key of CategoryStrategy.
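The CategoryStrategy example could look like this as a runnable sketch. It uses Python with an in-memory SQLite database as a stand-in for the real database, and the Name columns and sample rows are assumptions made for the illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Strategy (StrategyID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- The table in the middle: no id column of its own, both FKs form the PK.
    CREATE TABLE CategoryStrategy (
        CategoryID INTEGER NOT NULL REFERENCES Category (CategoryID),
        StrategyID INTEGER NOT NULL REFERENCES Strategy (StrategyID),
        PRIMARY KEY (CategoryID, StrategyID)
    );

    INSERT INTO Category VALUES (1, 'A'), (2, 'B');
    INSERT INTO Strategy VALUES (1, 'S1'), (2, 'S2');
    INSERT INTO CategoryStrategy VALUES (1, 1), (1, 2), (2, 2);
""")

# Resolve the m:m relation by joining through the middle table.
pairs = conn.execute("""
    SELECT c.Name, s.Name
    FROM CategoryStrategy cs
    JOIN Category c ON c.CategoryID = cs.CategoryID
    JOIN Strategy s ON s.StrategyID = cs.StrategyID
    ORDER BY c.Name, s.Name
""").fetchall()
print(pairs)
```

Strategy S2 appears under both categories, and category A has two strategies, which is exactly what a 1:m relation could not express.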
I hope my comments make sense and that they are useful to you.
EDIT 2011-01-17
I do not think that you should make it a principle to use an IDENTITY column as the primary key in all tables. An m:m relation does not need it, so you should not do it. I also think that you have misunderstood what I meant by a candidate key. A candidate key is a key that could have been used as the primary key. In MS SQL Server you define a UNIQUE CONSTRAINT for your candidate key.
Ex: Table StrategyItem has id as PK, but the combination of StrategyID and DailyItemID is the candidate key. It would be better to remove id and use StrategyID+DailyItemID as PK.
Below is the schema that I would have built with your description. I might have missed something important because I do not know everything about what you want to do.
You should not think so much about query performance and building aggregates when designing the schema. That can be handled by creating indexes on columns and using sum, count and group by in your queries. An index on the Created column in the model below would be necessary for your queries on a date or date interval. In MS SQL Server there is something called the clustered index. By default the PK of a table is the clustered index, but in this case I would make the index on the Created column the clustered index.
A Category has zero, one or more Strategies.
A LogItem has one Category and optionally one Strategy.
LogItem.Created holds date and time.
I created a foreign key relationship in my database so that I could access another table as a property in this case House.Area
Now if I create a House object and have Area as null I get the following exception on SubmitChanges():
An attempt was made to remove a relationship between a Area and a House. However, one of the relationship's foreign keys (House.AreaID) cannot be set to null.
OK, I made the example above a little simpler, but a comment below makes me think I should give a better example.
"House" Table has a column called CityID mapping to a "City" Table connecting on CityID as FK and CityID can't be null
"House" Table also has a column called AreaID mapping to a "Area" Table connecting on CityID as well as AreaID, but AreaID can be null.
House needs to have 1 city always. 1 city can have many houses.
House can be in zero or 1 area. 1 area might have zero or many houses.
I update the
House.City = new City(....);
House.CityID does get a value <- checked
Error Update
An attempt was made to remove a relationship between a Area and a House. However, one of the relationship's foreign keys (House.CityID, House.AreaID) cannot be set to null.
Unless updating the Area is overriding the value of CityID which can explain the error. Comments please. Any way around?
It looks like your database does not allow the field to be null. Load up the House table in SQL Server Enterprise Manager (or whatever database tool you're using) and check that the AreaID field allows nulls.
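The difference that column setting makes can be sketched as follows. This is a simplified illustration in Python with an in-memory SQLite database, not the asker's actual schema: the two House variants and the helper insert_ok are made up, and the City/Area foreign keys are left out.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Two hypothetical variants of the House table, differing only in AreaID nullability.
conn.executescript("""
    CREATE TABLE HouseStrict   (HouseID INTEGER PRIMARY KEY, AreaID INTEGER NOT NULL);
    CREATE TABLE HouseNullable (HouseID INTEGER PRIMARY KEY, AreaID INTEGER);
""")

def insert_ok(sql: str) -> bool:
    """Hypothetical helper: run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL constraint rejects the row.
        return False

strict_ok = insert_ok("INSERT INTO HouseStrict (HouseID, AreaID) VALUES (1, NULL)")
nullable_ok = insert_ok("INSERT INTO HouseNullable (HouseID, AreaID) VALUES (1, NULL)")
print(strict_ok, nullable_ok)
```

A house without an area can only be stored when the column itself allows NULL, whatever the mapping layer on top is configured to do.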
    / \_
   /   \|
  /     \
 /       \
 ---------
 | house |
 ---------
Therefore a house must contain an area.
I agree with Orion's response. This error message is not a LINQ error but a database provider error. Remember LINQ is simply a query language layer on top of a backend provider.