I have two tables in MySQL, let's say:
Product
id (unique)
name
Purchases
id (unique)
fk_productid (references Product.id)
buyerName
I am using InnoDB and created a foreign key from purchases.fk_productid to product.id.
Assume we have no products in the table so far and no purchases. Now someone purchases the product with id 10 which will cause a failure if I simply try this query:
INSERT INTO purchases (fk_productid, buyerName) VALUES (10, "Andreas")
What can I do to insert the purchase anyway? I can think of two things:
Add NULL as fk_productid, but how do I do that directly in the query if it fails?
Add a dummy entry in the product table, but how would I do that? How can I automatically add
INSERT INTO products (id, name) VALUES (10, "???")
before the other query?
Should this be done with triggers, procedures or is there even an easier way I don't know?
Well, first I would say you're probably safer to refuse the order and tell the user to come back, because you should be doing things like checking stock, price, etc. in your product table before verifying a purchase.
Anyway, this should solve your immediate problem:
INSERT IGNORE INTO products (id, name, isDummy) VALUES (10, "TBD", 1); -- does nothing if product id 10 already exists
INSERT INTO purchases (fk_productid, buyerName) VALUES (10, "Andreas");
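Note that isDummy is not a column in your original schema; you would need to add it first, something like this (a sketch, the column name and type are my suggestion):
ALTER TABLE products ADD COLUMN isDummy TINYINT(1) NOT NULL DEFAULT 0; -- flags placeholder rows created on behalf of purchases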
I decided to go with this solution:
INSERT INTO purchases (fk_productid, buyerName) VALUES ((SELECT id FROM product WHERE id=10 LIMIT 1), "Andreas");
I agree that in production this case should never happen - don't worry, it won't. This was more of a simple description of a larger problem I had. ;-)
Thanks for your ideas before!
I am writing a C# WinForms program which includes a user input textbox, the value of which will be used to create a table. I have been thinking about the best way to handle invalid T-SQL table names (though this can be extended to many other situations). Currently the only method I can think of is to check the input string for violations of valid table names individually, but this seems long-winded and prone to missing certain characters, for example due to my own ignorance of what is and is not a violation.
I feel like there should be a better way of doing this but have been unable to find anything in my search so far. Can anyone help point me in the right direction?
As I told you in a comment already, you should not do this...
You might use something like this:
USE master;
GO
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
CREATE TABLE UserTables(ID INT IDENTITY CONSTRAINT PK_UserTables PRIMARY KEY
,UserInput NVARCHAR(500) NOT NULL CONSTRAINT UQ_UserInput UNIQUE);
GO
INSERT INTO UserTables VALUES(N'blah')
,(N'invalid !%$& <<& >< $')
,(N'silly 💖');
GO
SELECT * FROM UserTables;
/*
ID UserInput
1 blah
2 invalid !%$& <<& >< $
3 silly 💖
*/
GO
USE master;
GO
DROP DATABASE dbTest;
GO
You would then create your tables as Table1, Table2 and so on.
Whenever a user enters his string, you visit the table, pick the ID and create the table's name by concatenating the word Table with the ID.
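A rough sketch of that step (dynamic SQL; the single fixed column in the created table is just an assumed example):
DECLARE @UserInput NVARCHAR(500) = N'invalid !%$& <<& >< $';
DECLARE @Id INT, @sql NVARCHAR(MAX);
-- Map the raw input to an ID (insert it if it is new)
IF NOT EXISTS (SELECT 1 FROM UserTables WHERE UserInput = @UserInput)
    INSERT INTO UserTables(UserInput) VALUES(@UserInput);
SELECT @Id = ID FROM UserTables WHERE UserInput = @UserInput;
-- Build and run the CREATE TABLE with the safe, generated name
SET @sql = N'CREATE TABLE Table' + CAST(@Id AS NVARCHAR(10))
         + N'(ID INT IDENTITY PRIMARY KEY, SomeValue NVARCHAR(100));';
EXEC sp_executesql @sql;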
There are better approaches!
But you should think of a fixed schema. Otherwise you will have to define columns on the fly (how many, which types, what names?), and you will be in hell when you have to query this. Nothing to rely on...
One approach is a classical n:m mapping
A User table (UserID, Name, ...)
A test table (TestID, TestName, TestType, ...)
The mapping table (ID, UserID, TestID, Result VARCHAR(MAX))
Depending on what you need you might add a table
question table (QuestionID, QuestionText ...)
Then use a mapping to bind questions to tests and another mapping to bind answers to such mapped questions.
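A minimal sketch of the core n:m mapping (table names and data types are assumptions):
CREATE TABLE Users(UserID INT IDENTITY PRIMARY KEY, Name NVARCHAR(100) NOT NULL);
CREATE TABLE Tests(TestID INT IDENTITY PRIMARY KEY, TestName NVARCHAR(100) NOT NULL, TestType NVARCHAR(50) NULL);
CREATE TABLE UserTestResults
(
    ID INT IDENTITY PRIMARY KEY,
    UserID INT NOT NULL REFERENCES Users(UserID),
    TestID INT NOT NULL REFERENCES Tests(TestID),
    Result VARCHAR(MAX) NULL
);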
Another approach would be to store the result as a generic container (XML or JSON). This keeps your tables slim, but you need to know the XML's structure in order to query it.
Many ways to skin a rabbit...
UPDATE
You ask for an explanation...
The main advantage of a relational database is the pre-known structure.
Precompiled queries, cached results, statistics and indexes all demand known structures.
Data integrity is ensured with constraints, foreign keys and so on. All this demands known names, known types(!) and known relations.
User-specific table names, and even worse, generically defined structures, do not allow for joins or any other typical RDBMS operation. The only approach is to create each and every statement dynamically (string building).
The rule of thumb is: whenever you think you have to create several objects for the same thing, just with different names, you should rethink the design. It is bad to store Phone1, Phone2 and Phone3. It is better to have a side table with a UserID and a Phone column (classical 1:n). It is bad to have SalesCustomerA and SalesCustomerB; better use a Customer table and bind its ID into a general Sales table as an FK.
You see what I mean? What belongs together should live in one single table. If you need separation, add columns to your table and use them for grouping and filtering.
Just imagine you want to do some statistical evaluation of all your user test tables. How would you gather the data into one big pool, if you cannot rely on some structure in common?
I hope this makes it clear...
If you still want to stick to your idea, you should give my code sample a try. It allows you to map any silly string to a secure and easy-to-handle table name.
Lots can go wrong with users entering table names. A bunch of whacked-out names is a maintenance nightmare. A user should not even be aware of table names. It is also a security risk, as the program then has to have database owner authority. You want to limit users to minimum authority.
Similar to Shnugo's, but with a composite primary key. This will prevent duplicate (userID, testID) pairs.
user
ID int identity PK
varchar(100) fName
varchar(100) lName
test
ID int identity PK
varchar(100) name
userTest
int userID PK FK to User
int testID PK FK to Test
int score
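Declared in T-SQL, that composite key could look like this (a sketch; types follow the outline above):
CREATE TABLE userTest
(
    userID INT NOT NULL REFERENCES [user](ID),
    testID INT NOT NULL REFERENCES test(ID),
    score INT NULL,
    CONSTRAINT PK_userTest PRIMARY KEY (userID, testID)
);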
select t.Name, u.Name, ut.score
from userTest ut
join Test t
on t.ID = ut.testID
join [User] u
on u.ID = ut.userID
order by t.Name, u.Name
I have a DataGridView pulling some items from my database. What I want to achieve is to be able to edit the pack size or the bar_code fields. I am aware of how to update values in a database, but how would I go about doing it if the data is the same? Meaning, in many instances a bar code would have multiple pack sizes related to the one bar code number. Let's say I have the below screenshot. A data entry error was made and the bar_code and PackSize columns are the exact same. I want to change the first bar code to "1234". How would I achieve this? I can't say update barcode to 'textBox1.Text' where bar_code = '771313166386' because it would then change both rows. How do I go about only focusing on one row of data at a time?
You can try using this query to update only the first row:
UPDATE TOP (1) my_table
SET bar_code = '1234'
WHERE bar_code = '771313166386'
You should have an auto-increment id column or a Primary key in your table.
I'd suggest you handle the duplicate-data logic at the backend rather than pulling the rows into the grid and handling it there.
The following query will help you retrieve the duplicate records based on the mentioned columns. You can change it to UPDATE or DELETE as per your requirement.
-- Using cte and ranking function
;With CTE
As
(
Select
Product,
Description,
BarCode,
PackSize,
Row_Number() Over(Partition By Product, BarCode, PackSize Order By Product) As RowNum
From YourTable
)
Select * From CTE
-- Where RowNum > 1;
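If you want to remove the duplicates instead of just listing them, a sketch with the same CTE could be:
-- Keep the first row of each duplicate group, delete the rest
;With CTE As
(
    Select *,
        Row_Number() Over(Partition By Product, BarCode, PackSize Order By Product) As RowNum
    From YourTable
)
Delete From CTE Where RowNum > 1;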
Hope this is helpful :)
This might not directly answer your question, but it is important to mention that your table design is incorrect. You should ensure data integrity by creating a primary key in your table.
So when you need to update a product you have only one row to update.
Then you can add more tables and use foreign key references between them.
You need to uniquely represent the products. As per your sample data, I guess that there isn't any primary key on your table.
What you can do is either specify a unique constraint on a set of columns to ensure that this type of data entry cannot happen,
or, if you cannot come up with a list of columns that uniquely identifies the rows, use a surrogate key by adding an identity column and then, when updating, always put a constraint where thisIdentityColumn = value.
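For example (a sketch; the column and table names here are assumptions):
-- Add a surrogate key so each row can be addressed individually
ALTER TABLE YourTable ADD Id INT IDENTITY(1,1) NOT NULL;

-- Then update exactly one row by its identity value
UPDATE YourTable
SET bar_code = '1234'
WHERE Id = 42; -- the Id of the row selected in the grid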
"A data entry error was made and the bar_code and PackSize columns are the exact same"
I think this is the key. Essentially, the exact duplicates are unintentional, and the rows should be unique. Further it looks like bar_code + pack_size is your primary key (subject to data being entered correctly).
So, when you do an update, simply update the first row found that matches a bar_code and a pack_size. If it isn't unique, then the update should ensure that you are one step closer to unique rows in the database.
If you need a non-verbal answer, let me know.
I have been given an assignment to complete the following task:
I will be using C# and SQL Server to solve the above. However, I need a heads-up on how many tables I will need, since I am completely new to this. I have given this a try; if someone can solve my query that's fine, or give me a better alternative solution altogether.
This is what I have tried up till now.
I have made 3 tables so far, as shown in the image below:
Now, if you notice in the second image, I could make the Order Number applicable to only one party. However, the issue I am still facing is that when one Party orders more than one type of Product, I will have to generate 2 PO Numbers in the Orders Table.
What is the solution to my issue here? How do I normalize it further?
P.S. This might sound like a simple question because this is my first attempt at normalization.
Maybe you can use this design. Observe below that there is only 1 PO number per party per order. This assumes you want to manually supply the PO Number; otherwise, you can use the OrderID as a convenient autogenerated PO Number.
I would suggest adding another table called "Orders". This table will contain the information that will be the same for the entire order, such as PONumber, PODate, RefDate, PartyID. Then your OrdersDetail table will contain the information for each product ordered, such as ProductId, Quantity, Rate, Amount, OrderId (FK into the new Order table).
Also, don't make all the data types Text. Consider using a data type appropriate for the information being stored. I would also consider either not including Amount or making it a calculated field since it is calculated from other information in the same record (Quantity * Rate).
Further, you may consider using a value other than PONumber as the primary key. As a general rule, the primary key should have no purpose other than internally identifying the record. I would suggest adding an OrderDetailsId and making that the primary key.
Edit: (I have added additional information to answer Lohit's question below)
If I understand what you are stating in your question, the Party can have multiple Orders; on each order the party can purchase multiple products. Therefore there would be a one-to-many relationship between PartyDetails and Order, a one-to-many relationship between Order and OrderDetail, and a one-to-one relationship between ProductDetails and OrderDetails. The Party table stores the information about the person placing the order. The Order table stores the information about each order the person places. The OrderDetails table stores the information about each product the person purchases for each order. And the ProductDetails table stores a list of all products.
Here is a diagram of the data structure as I see it....Mind you it does not have every detail in it. But hopefully it will give you enough to get started.
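A sketch of the two central tables (names and data types are my assumptions, following the description above):
CREATE TABLE Orders
(
    OrderId INT IDENTITY(1,1) PRIMARY KEY,
    PONumber VARCHAR(20) NOT NULL,
    PODate DATE NOT NULL,
    RefDate DATE NULL,
    PartyId INT NOT NULL -- FK to the Party table
);

CREATE TABLE OrderDetails
(
    OrderDetailsId INT IDENTITY(1,1) PRIMARY KEY,
    OrderId INT NOT NULL REFERENCES Orders(OrderId),
    ProductId INT NOT NULL, -- FK to ProductDetails
    Quantity DECIMAL(18,2) NOT NULL,
    Rate DECIMAL(18,2) NOT NULL,
    Amount AS (Quantity * Rate) -- computed, as suggested above
);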
4 tables:
Partyinformation: id, name, address
Productinformation: id, name, price
Orderinformation: id: dates etc
Orderline: orderID, ProductID, Amount
where OrderID and ProductID are the foreign keys into Orderinformation and Productinformation respectively
Adding products to the (created) order just involves adding a ProductID and OrderID into Orderline, incrementing the amount when the same product is entered twice.
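A sketch of that "insert or increment" step in T-SQL (the parameter names are assumptions):
-- Add a product to an order; if the (OrderID, ProductID) pair already exists,
-- increment Amount instead of inserting a second row.
MERGE Orderline AS target
USING (SELECT @OrderID AS OrderID, @ProductID AS ProductID) AS src
    ON target.OrderID = src.OrderID AND target.ProductID = src.ProductID
WHEN MATCHED THEN
    UPDATE SET Amount = target.Amount + 1
WHEN NOT MATCHED THEN
    INSERT (OrderID, ProductID, Amount) VALUES (src.OrderID, src.ProductID, 1);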
I am designing a database and C# app in which records get saved to the database. Now say we have three salespeople, and each should be assigned records in strict rotation so they get to work on an equal number of records.
What I have done so far is to create one table called Records and one called SalesPerson; Records has the salesperson id as a foreign key and another column that says which agent it is assigned to, and I increment this column.
Do you think this is a good design, if not can you give any ideas?
To do this I would use the analytical functions ROW_NUMBER and NTILE (assuming your RDBMS supports them). This way you can allocate each available sales person a pseudo id incrementing upwards from 1, then randomly allocate each unassigned record one of these pseudo ids to assign them equally between sales people. Using pseudo ids rather than actual ids allows for the SalesPersonID field not being continuous. e.g.
-- CREATE SOME SAMPLE DATA
DECLARE @SalesPerson TABLE (SalesPersonID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, Active BIT NOT NULL)
DECLARE @Record TABLE (RecordID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, SalesPersonFK INT NULL, SomeOtherInfo VARCHAR(100))
INSERT @SalesPerson VALUES ('TEST1', 1), ('TEST2', 0), ('TEST3', 1), ('TEST4', 1);
INSERT @Record (SomeOtherInfo)
SELECT Name
FROM Sys.all_Objects
With this sample data the first step is to find the number of available sales people to allocate records to:
DECLARE @Count INT = (SELECT COUNT(*) FROM @SalesPerson WHERE Active = 1)
Next using CTEs to contain the window functions (as they can't be used in join clauses)
;WITH Records AS
( SELECT *,
NTILE(@Count) OVER(ORDER BY NEWID()) [PseudoSalesPersonID]
FROM @Record
WHERE SalesPersonFK IS NULL -- UNALLOCATED RECORDS
), SalesPeople AS
( SELECT SalesPersonID,
ROW_NUMBER() OVER (ORDER BY SalesPersonID) [RowNumber]
FROM @SalesPerson
WHERE Active = 1 -- ACTIVE SALES PEOPLE
)
Finally update the records CTE with the actual sales personID rather than a pseudo id
UPDATE Records
SET SalesPersonFK = SalesPeople.SalesPersonID
FROM Records
INNER JOIN SalesPeople
ON PseudoSalesPersonID = RowNumber
ALL COMBINED IN AN SQL FIDDLE
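For reference, the pieces above combined into a single statement look roughly like this:
;WITH Records AS
( SELECT *,
    NTILE(@Count) OVER(ORDER BY NEWID()) [PseudoSalesPersonID]
  FROM @Record
  WHERE SalesPersonFK IS NULL
), SalesPeople AS
( SELECT SalesPersonID,
    ROW_NUMBER() OVER (ORDER BY SalesPersonID) [RowNumber]
  FROM @SalesPerson
  WHERE Active = 1
)
UPDATE Records
SET SalesPersonFK = SalesPeople.SalesPersonID
FROM Records
INNER JOIN SalesPeople
    ON PseudoSalesPersonID = RowNumber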
This is quite confusing, as I suspect you're using the database term 'record' as well as an object/entity 'Record'.
The simple concept of having a unique identifier in one table that also features as a foreign key in another table is fine though, yes. It avoids redundancy.
Basics of normalisation
It's mostly as DeeMac said. But if your Record is an object (i.e. it has all the work details, or it's a sale or a transaction) then you need to separate that table. Have a table Record with all the details of that particular object. Have another table Salesman with all the details about the sales person. (In a good design, you would only add the business-related attributes of the position to this table. All the personal details would go in a different table.)
Now for your problem, you can build two separate tables. One would be Record_Assignment where you will assign a Record to a Salesman. This table will hold all the active jobs. Another table will be Archived_Record_Assignment which will hold all the past jobs. You move all the completed jobs here.
For equal assignment of work, you said you want circular assignment. I am not sure if you want to spread work amongst all sales persons available or only a certain number. Usually assignments are given by team. Create a table (say, SalesTeam) with the Salesman ids of the sales persons you want to assign the jobs to (add a team id if you have multiple teams working on their own assigned work areas or customers; that's usually the case). When you want to assign a new job, query the Record_Assignment table for the last record, get the Salesman id and assign the job to the next salesman in the SalesTeam table. The assignment will be done through business logic (coding).
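A sketch of that lookup in T-SQL (the table and column names here are assumptions):
-- Find the salesman who comes after the most recently assigned one,
-- wrapping around to the lowest id when the end of the team is reached.
DECLARE @LastSalesmanId INT =
    (SELECT TOP (1) SalesmanId
     FROM Record_Assignment
     ORDER BY AssignedAt DESC); -- hypothetical assignment timestamp

SELECT TOP (1) SalesmanId
FROM SalesTeam
ORDER BY CASE WHEN SalesmanId > ISNULL(@LastSalesmanId, 0) THEN 0 ELSE 1 END,
         SalesmanId;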
I am not fully aware of your scenario. These are all my speculations so if you see something off according to your scenario, let me know.
Good Luck!
I am receiving a large list of current account numbers daily, and storing them in a database. My task is to find added and released accounts from each file. Right now, I have 4 SQL tables, (AccountsCurrent, AccountsNew, AccountsAdded, AccountsRemoved). When I receive a file, I am adding it entirely to AccountsNew. Then running the below queries to find which we added and removed.
INSERT AccountsAdded(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew WHERE AccountNum NOT IN (SELECT AccountNum FROM AccountsCurrent)
INSERT AccountsRemoved(AccountNum, Name) SELECT AccountNum, Name FROM AccountsCurrent WHERE AccountNum NOT IN (SELECT AccountNum FROM AccountsNew)
TRUNCATE TABLE AccountsCurrent
INSERT AccountsCurrent(AccountNum, Name) SELECT AccountNum, Name FROM AccountsNew
TRUNCATE TABLE AccountsNew
Right now, I am differencing about 250,000 accounts, but this is going to keep growing. Is this the best method, do you have any other ideas?
EDIT:
This is an MSSQL 2000 database. I'm using C# to process the file.
The only data I am focused on is the accounts that were added and removed between the last and current files. The AccountsCurrent, is only used to determine what accounts were added or removed.
To be honest, I think that I'd follow something like your approach. One thing is that you could remove the truncate, do a rename of the "new" to "current" and re-create "new".
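A sketch of that rename approach (the column definitions are assumptions):
-- assumes AccountsCurrent can be dropped once its rows have been diffed
DROP TABLE AccountsCurrent;
EXEC sp_rename 'AccountsNew', 'AccountsCurrent';
CREATE TABLE AccountsNew
(
    AccountNum VARCHAR(50) NOT NULL,
    Name VARCHAR(100) NULL
);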
Sounds like a history/audit process that might be better done using triggers. Have a separate history table that captures changes (e.g., timestamp, operation, who performed the change, etc.)
New and deleted accounts are easy to understand. "Current" accounts implies that there's an intermediate state between being new and deleted. I don't see any difference between "new" and "added".
I wouldn't have four tables. I'd have a STATUS table that would have the different possible states, and ACCOUNTS or the HISTORY table would have a foreign key to it.
Using IN clauses on long lists can be slow.
If the tables are indexed, using a LEFT JOIN can prove to be faster...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
LEFT JOIN
[table2]
ON [join condition]
WHERE
[table2].[id] IS NULL
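Applied to the tables in the question, the "added accounts" query might look like this (a sketch):
INSERT INTO AccountsAdded (AccountNum, Name)
SELECT n.AccountNum, n.Name
FROM AccountsNew n
LEFT JOIN AccountsCurrent c
    ON c.AccountNum = n.AccountNum
WHERE c.AccountNum IS NULL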
This assumes 1:1 relationships and not 1:many. If you have 1:many you can do any of...
1. SELECT DISTINCT
2. Use a GROUP BY clause
3. Use a different query, see below...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
WHERE
NOT EXISTS (SELECT * FROM [table2] WHERE [condition to match tables 1 and 2])
-- This is quick provided that all fields used to match the two tables are
-- indexed in both tables. It should then be much faster than the IN clause.
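Again applied to the question's tables, the "removed accounts" query might be (a sketch):
INSERT INTO AccountsRemoved (AccountNum, Name)
SELECT c.AccountNum, c.Name
FROM AccountsCurrent c
WHERE NOT EXISTS (SELECT * FROM AccountsNew n WHERE n.AccountNum = c.AccountNum)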
You could also subtract the intersection to get the differences in one table.
If the initial file is ordered in a sensible and consistent way (big IF!), it would run considerably faster as a C# program that logically compares the files.