I cringe to ask this... as usual, I'm stuck with a legacy design beyond my control.
Incoming datafeed table (keyed to datafeed, line number, and file date):
DATAFEED_USER"
(
"DATAFEED_NM" VARCHAR2(32) NOT NULL ENABLE,
"LINE_NBR" NUMBER(6,0) NOT NULL ENABLE,
"FILE_DT" DATE NOT NULL ENABLE,
"NETWORK_ID" VARCHAR2(10),
"PERNR" VARCHAR2(10),
"COMPANY_CODE" VARCHAR2(5),
"LOCAL_COMPANY_ID" VARCHAR2(12),
-- end identifier fields, begin user data
"USER_TYPE" VARCHAR2(16),
"LAST_NM" VARCHAR2(40),
"FIRST_NM" VARCHAR2(40),
...
)
Table in our system (keyed to network ID):
USER_DESC
(
"NETWORK_ID" VARCHAR2(30) NOT NULL ENABLE,
"PERNR" CHAR(8),
"COMPANY_CODE" VARCHAR2(4),
"LOCAL_COMPANY_ID" VARCHAR2(12),
-- end identifier fields, begin user data
"LAST_NM" VARCHAR2(40),
"FIRST_NM" VARCHAR2(40),
...
)
I need the datafeed_user entities to have collections of matching user_desc records - records can match by network ID, PerNr, or Company Code + Local ID. There's no FK relationship, because (1) new users will come in on the datafeed before we have a record of them, and (2) only the network ID is a PK in our system.
The relationships:
Network ID: Many datafeeds can send the same network ID; there will only be one match in our system - Many to 1.
PerNr: Only one datafeed uses PerNrs, but they can match multiple Network IDs in our system - 1 to Many
Company Code/Local ID: Each datafeed sends one Local ID, but they can match to multiple Network IDs in our system - 1 to Many
The datafeed processing looks at all potential matches to the database, chooses one using a set of business rules, and updates the matching record. Users sent on more than one feed are flagged and not processed. There are 250k+ records in our database, so I only want to pull down the matching records to update. (I'd love to only pull down one, but then I'd have to push the business logic for matching records to the database)
How do I define an association/navigation property in the EF designer so that I can easily work with the related records?
I understand that what I'm after isn't a true relationship in db terms, so I'm open to alternatives. The code I'm migrating uses typed datasets, which were extended to have custom properties (such as a collection of matched records). I can't cleanly do that in the EF, because the .cs file is auto-generated. The requirements are:
For each datafeed record, look at all records that match in our system by one of the three unique identifiers
Update the matching record in our system
Don't pull all 250k records down to update one (and definitely don't do that 1,000 times, once for each datafeed record)
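To make the matching concrete, the set of candidates for a single datafeed row boils down to a query like this (bind values come from the datafeed record; this is the logic I'd rather not hand-code against the full table):
SELECT *
FROM USER_DESC u
WHERE u.NETWORK_ID = :network_id
OR u.PERNR = :pernr
OR (u.COMPANY_CODE = :company_code AND u.LOCAL_COMPANY_ID = :local_company_id)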
Suggestions?
The following C# code worked correctly with MySQL to get the MAX value of a column from a table while also adding 1 to that value, like this:
SqlDataReader dr = new SqlCommand("SELECT (MAX(Consec) +1) AS NextSampleID FROM Samples", Connection).ExecuteReader();
while (dr.Read())
{ //in case of Maximum value of Consec = 555, the expected result: A556
txtSampleID.Text = "A" + dr["NextSampleID"].ToString();
}
However, this code does not work anymore after migrating the DB from MySQL to SQL Server: if MAX(Consec) = 555, the result after running the query is A555; it does not add 1 like it did with MySQL.
Question: What is the correct query to get the MAX value of Consec and how to add "1" to the result of MAX in the same query?
The MySQL query is wrong and won't work except in trivial applications with only a single user, no deletions and no relations:
Concurrent calls will produce the same MAX value and thus result in the same, duplicate next value
Deleting records will reduce the MAX value, resulting in previous ID values getting assigned to new rows. If that ID value is used in another table, the new record will end up associated with rows it has no real relation to. This can be very bad. Imagine one patient's test samples getting mixed with another's.
Calculating the MAX requires locking the entire table or index, thus blocking or getting blocked by others. Given MySQL's MVCC isolation though, that wouldn't prevent duplicates as concurrent SELECT MAX queries wouldn't block each other.
It's possible MAX+1 would work in a POS application with only one terminal generating invoice numbers, but as soon as you added two POS terminals you'd risk generating duplicate invoices.
In an e-Commerce application on the other hand, it's almost guaranteed that even if only two orders are placed per month, they'll happen at the exact same moment, resulting in duplicates.
Correct MySQL solution and equivalent
The correct solution in MySQL is to use the AUTO_INCREMENT attribute:
CREATE TABLE Samples (
Consec INT NOT NULL AUTO_INCREMENT,
...
);
If you want the invoice number to contain other data, use a calculated column to combine the incrementing number and that other data.
The equivalent in SQL Server is the IDENTITY property:
CREATE TABLE Samples (
Consec INT NOT NULL IDENTITY,
...
);
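If, as in the question, the value shown to users should be the number with an "A" prefix, a computed column can derive it from the identity value instead of concatenating in application code. A sketch (the extra TakenAt column is only illustrative):
CREATE TABLE Samples (
Consec INT NOT NULL IDENTITY PRIMARY KEY,
TakenAt DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
SampleID AS ('A' + CAST(Consec AS VARCHAR(11)))
);
-- the generated value, eg A556, comes back without a separate SELECT MAX
INSERT INTO Samples OUTPUT INSERTED.SampleID DEFAULT VALUES;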
Sequences
Another option available in SQL Server and other databases is the SEQUENCE object. A SEQUENCE can be used to generate incrementing numbers that aren't tied to a table. It can also be reset, making it ideal for accounting applications where invoice numbers are reset after a specific period (eg every year).
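Creating such a sequence, and resetting it later, is a one-liner each (a sketch; seq_InvoiceNumber is just an illustrative name, reused in the examples below):
CREATE SEQUENCE seq_InvoiceNumber AS INT START WITH 1 INCREMENT BY 1;
-- reset the numbering, eg at the start of a new accounting year
ALTER SEQUENCE seq_InvoiceNumber RESTART WITH 1;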
Since a SEQUENCE is an independent object, you can increment it and receive the new value before inserting any data in the database, with NEXT VALUE FOR, eg:
SELECT NEXT VALUE FOR seq_InvoiceNumber;
NEXT VALUE FOR can be used as a default constraint for a table column, the same way IDENTITY or AUTO_INCREMENT are used:
CREATE TABLE MyTable (
...
Consec INT NOT NULL DEFAULT (NEXT VALUE FOR seq_ThatSequence)
);
Multi-table sequences
The same sequence can be used in multiple tables. One case where that's useful is assigning a Document ID to data imported from multiple sources, stored in different tables, eg payments.
Payment providers (credit cards, banks etc) send statements using different formats. Obviously you can't lose any information there so you need to use different tables per provider, but still be able to handle payments the same way no matter where they came from.
If you used an IDENTITY on each table you'd end up with conflicting IDs for payments coming from different providers. In, eg, the OrderPayments table you'd have to record both the provider name and the ID, and a single view over all payments would end up with ID values that can't be used by themselves.
By using a single SEQUENCE though, each record would get its own ID, no matter the table.
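A minimal sketch of that pattern (the sequence and table names are made up for illustration):
CREATE SEQUENCE seq_DocumentID AS BIGINT START WITH 1;
CREATE TABLE CardPayments (
DocumentID BIGINT NOT NULL DEFAULT (NEXT VALUE FOR seq_DocumentID),
CardReference NVARCHAR(50) NOT NULL,
Amount DECIMAL(12,2) NOT NULL,
CONSTRAINT PK_CardPayments PRIMARY KEY (DocumentID)
);
CREATE TABLE BankPayments (
DocumentID BIGINT NOT NULL DEFAULT (NEXT VALUE FOR seq_DocumentID),
Iban NVARCHAR(34) NOT NULL,
Amount DECIMAL(12,2) NOT NULL,
CONSTRAINT PK_BankPayments PRIMARY KEY (DocumentID)
);
-- every payment gets a DocumentID that is unique across both tables,
-- so a combined view of payments can be keyed by DocumentID alone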
In the app there should be functionality for the user to reset orderNumber whenever needed. We are using SQL Server for the DB, .NET Core, Entity Framework, etc. I was wondering: what is the most elegant way to achieve this?
I thought about making orderNumber an int identity(1,1) column, and I've come across DBCC CHECKIDENT('tableName', RESEED, 0), but the latter introduces some permission concerns (the user has to own the schema, be sysadmin, etc.).
EDIT: orderNumber is NOT a primary key, and duplicate values are not the problem. We should just let the user (once a year, probably) reset the numbering of their orders to start from 1 again.
Any advice?
An identity column is used to auto-generate incremental values, so if you're relying on this column as the primary key or some unique identifier for rows, updating it can cause issues with duplicates.
It's difficult to recommend the best solution without knowing more about your use case, but I would consider (1) whether this orderNumber should be the PK, or whether some surrogate key like (customerId, locationId, date) would make sense and allow you to more freely update orderNumber without impacting data integrity, or (2) whether keeping orderNumber as an identity makes sense, in which case you could build a data model or table that maps multiple rows in this table to the same "order", allowing you to maintain the key on this base table.
It seems that orderNumber is a business layer concern - therefore I recommend a non-SQL solution. You need C# code that generates the number for storage in your "Order" entity. I wouldn't use IDENTITY() to implement/solve this.
The customer isn't going to reset anything in the DB; your code will do this. You need a "take a number" service in your business layer and a place in the UI to reset it (presumably per customer).
SQL Server has SEQUENCE. My only concern about using it is partitioning per customer (an assumed requirement). Will you have multiple customers? If so, you probably can't have a single number generator, hence my suggestion of a C# implementation (sure, you'll want to save the state as numbers are handed out).
Identity should not be used in the way you're suggesting. Presumably you don't want a customer to get two different orders with the same order number (i.e., order number is unique within customer). If you don't care whether customers get discontinuous order numbers, you can use a sequence, but if you want continuous order numbers you would need to create a separate sequence for each customer, which is not a good solution either. I suggest you set the order number to max([order number]) over (partition by [customer id]) + 1 on the insert, as in the sketch below. That will automatically give you the next order number for a particular customer.
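A sketch of such an insert (table and column names are assumed; when inserting a single order, a MAX filtered on the customer is equivalent to the window expression, and a unique constraint on (CustomerId, OrderNumber) is still advisable to catch concurrent collisions):
DECLARE @CustomerId INT = 42; -- illustrative
INSERT INTO Orders (CustomerId, OrderNumber, OrderDate)
SELECT @CustomerId, COALESCE(MAX(OrderNumber), 0) + 1, SYSDATETIME()
FROM Orders
WHERE CustomerId = @CustomerId;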
I am writing a C# WinForms program which includes a user input textbox, the value of which will be used to create a table. I have been thinking about the best way to handle invalid T-SQL table names (though this can be extended to many other situations). Currently the only method I can think of would be to check the input string for any violations of valid table names individually, though this seems long-winded and could be prone to missing certain characters, for example due to my own ignorance of what is a violation and what is not.
I feel like there should be a better way of doing this but have been unable to find anything in my search so far. Can anyone help point me in the right direction?
As I told you in a comment already, you should not do this...
You might use something like this:
USE master;
GO
CREATE DATABASE dbTest;
GO
USE dbTest;
GO
CREATE TABLE UserTables(ID INT IDENTITY CONSTRAINT PK_UserTables PRIMARY KEY
,UserInput NVARCHAR(500) NOT NULL CONSTRAINT UQ_UserInput UNIQUE);
GO
INSERT INTO UserTables VALUES(N'blah')
,(N'invalid !%$& <<& >< $')
,(N'silly 💖');
GO
SELECT * FROM UserTables;
/*
ID UserInput
1 blah
2 invalid !%$& <<& >< $
3 silly 💖
*/
GO
USE master;
GO
DROP DATABASE dbTest;
GO
You would then create your tables as Table1, Table2 and so on.
Whenever a user enters a string, you look it up in this table, pick the ID, and build the table's name by concatenating the word Table with the ID.
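A sketch of that step, assuming a UserTables table like the one above (the column list of the generated table is made up; SCOPE_IDENTITY and QUOTENAME keep the final name predictable and safe):
DECLARE @UserInput NVARCHAR(500) = N'invalid !%$& <<& >< $';
INSERT INTO UserTables (UserInput) VALUES (@UserInput);
DECLARE @TableName SYSNAME = N'Table' + CAST(SCOPE_IDENTITY() AS NVARCHAR(10));
DECLARE @Sql NVARCHAR(MAX) =
N'CREATE TABLE ' + QUOTENAME(@TableName) +
N' (ID INT IDENTITY PRIMARY KEY, SomeValue NVARCHAR(100));';
EXEC (@Sql); -- creates eg Table4, no matter how silly the user's string was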
There are better approaches!
But you should think of a fixed schema. You will have to define columns (how many, which types, how to name them?). You will feel like you're in hell when you have to query this. Nothing to rely on...
One approach is a classical n:m mapping
A User table (UserID, Name, ...)
A test table (TestID, TestName, TestType, ...)
The mapping table (ID, UserID, TestID, Result VARCHAR(MAX))
Depending on what you need, you might add a question table (QuestionID, QuestionText, ...)
Then use a mapping to bind questions to tests and another mapping to bind answers to such mapped questions.
Another approach would be to store the result as a generic container (XML or JSON). This keeps your tables slim, but you need to know the XML's structure in order to query it.
Many ways to skin a rabbit...
UPDATE
You ask for an explanation...
The main advantage of a relational database is the pre-known structure.
Precompiled queries, cached results, statistics and indexes all demand known structures.
Data integrity is ensured with constraints, foreign keys and so on. All this demands known names, known types(!) and known relations.
User-specific table names, and even worse, generically defined structures, do not allow for any join or other typical RDBMS operation. The only approach is to create each and every statement dynamically (string building).
The rule of thumb is: whenever you think you have to create several objects for the same thing, but with different names, you should think about the design. It is bad to store Phone1, Phone2 and Phone3; it is better to have a side table with a UserID and a Phone column (classical 1:n). It is bad to have SalesCustomerA, SalesCustomerB; better to use a Customer table and bind its ID into a general Sales table as an FK.
You see what I mean? What belongs together should live in one single table. If you need separation, add columns to your table and use them for grouping and filtering.
Just imagine you want to do some statistical evaluation of all your user test tables. How would you gather the data into one big pool, if you cannot rely on some structure in common?
I hope this makes it clear...
If you still want to stick to your idea, you should give my code sample a try. It allows you to map any silly string to a secure and easy-to-handle table name.
Lots can go wrong with users entering table names. A bunch of whacked-out names is a maintenance nightmare. A user should not even be aware of table names. It is also a security risk, as the program would then have to run with database owner authority; you want to limit users to minimum authority.
Similar to Shnugo's, but with a composite primary key. This will prevent duplicate (userID, testID) pairs.
CREATE TABLE [User] (
ID INT IDENTITY PRIMARY KEY,
fName VARCHAR(100),
lName VARCHAR(100)
);
CREATE TABLE Test (
ID INT IDENTITY PRIMARY KEY,
Name VARCHAR(100)
);
CREATE TABLE UserTest (
UserID INT NOT NULL REFERENCES [User](ID),
TestID INT NOT NULL REFERENCES Test(ID),
Score INT,
CONSTRAINT PK_UserTest PRIMARY KEY (UserID, TestID)
);
select t.Name, u.lName, u.fName, ut.Score
from UserTest ut
join Test t
on t.ID = ut.TestID
join [User] u
on u.ID = ut.UserID
order by t.Name, u.lName
I am working on a dynamic loader. Based on a database table that defines the flat text files, I can read a single file with multiple record types and load it into database tables. The tables are related and use identity primary keys. Everything is currently working but runs really slowly, as would be expected given that it is all accomplished with single insert statements. I am working on optimizing the process and can't find an 'easy' or 'best practice' answer on the web.
My current project deals with 8 tables but to simplify I will use a customers / orders example.
Let's look at the two customers below; the data would repeat for each set of customers and orders in the data file. Parent records always come before child records. The first field is the record type, and each record type has a different definition of the fields that follow. This is all specified in the control tables.
CUST|Joe Green|123 Main St
ORD|Pancakes|5
ORD|Nails|2
CUST|John Deere|456 Park Pl
ORD|Tires|4
Current code will:
Insert customer Joe Green and return an ID (using OUTPUT Inserted.Id in the insert statement).
Insert orders Pancakes and Nails, attaching the returned ID.
Insert customer John Deere and return an ID.
Insert order Tires with the returned ID.
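For context, each of those inserts currently looks roughly like this (column names are illustrative):
INSERT INTO Customers (Name, Address)
OUTPUT INSERTED.Id
VALUES (@Name, @Address);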
This runs painfully slowly. If this could be optimized without my having to change much code, that would be ideal, but I can't think of how.
So the solution? I was thinking datatables... Here is what I am thinking of so far.
Create transaction
Lock all tables that are part of the 'file definition', in this case Customers and Orders
Get the max ID for each table and increment by one to have starting IDs for all tables
Create a datatable for each table
Execute as currently set up, but instead of issuing insert statements, add to the datatables
After the data is read, bulk upload the tables in the correct order based on relationships
Unlock tables
End transaction
I was wondering, before I go down this path, whether anyone has worked out a better solution. I am also considering a custom script component in SSIS. I have seen posts and blogs about holding off on committing a transaction, but each parent record has only a few child records and the tree can get up to 4 deep (think order details and products). Because I need the parent record ID, I need to commit the insert of parent records. I have also considered managing the IDs myself rather than using Identity, but I do not want to add that extra management if I can avoid it.
UPDATE based on answer, for clarification / context.
A typical text file has
- one file header record
- 5 facility records that relate to the file header
- 7,000 customers(account)
- 5 - 10 notes per customer
- 1-5 payments at the account level
- 1-5 adjustments at the account level
- 5 - 20 orders per customer
- 5 - 20 order details per order
- 1-5 payments at the order level
- 1-5 adjustments at the order level
- one file trailer record related to the file header
Keys
- File Header -> Facility -> Customer (Account)
- File Header -> FileTrailer
- Customer -> Notes
- Customer -> Payments
- Customer -> Adjustments
- Customer -> Orders
- Order -> OrderDetails
- Order -> Payments
- Order -> Adjustments
There are a few more tables involved but this should give an idea of the overall context.
Data sample (... = more fields, .... = more records):
HEADER|F1|F2|...
FACILITY|F1|F2|..
CUSTOMER|F1|F2|...
NOTE|F1|F2|....
....
ORDER|F1|F2|...
ORDERDETAIL|F1|F2|...
.... ORDER DETAILS
ORDERPYMT|F1|F2|...
....
ORDERADJ|F1|F2|...
....
CUSTOMERPYMT|F1|F2|...
....
CUSTOMERADJ|F1|F2|...
....
(The structure repeats for each facility)
TRAILER|F1|F2|...
Inserting related tables with low data volumes should normally not be a problem. If they are slow, we will need more context to answer your question.
If you are encountering problems because you have many records to insert, you will probably have to look at SqlBulkCopy.
If you prefer not managing your ids yourself, the cleanest way I know of is working with temporary placeholder id columns.
Create and fill datatables with your data and a tempId column you fill yourself, leaving the foreign keys blank
SqlBulkCopy the primary table
Update the secondary datatable with the generated foreign keys by finding the primary keys from the previously inserted table through your tempId column
Upload the secondary table
Repeat until done
Remove the temporary id columns (optional)
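A sketch of the round trip, assuming the parent table has been given a TempId helper column as described (names are illustrative):
-- after SqlBulkCopy has written the customer rows (TempId filled, Id generated),
-- read the mapping back and use it to fill Orders.CustomerId in the DataTable:
SELECT TempId, Id
FROM Customers
WHERE TempId IS NOT NULL;
-- once the child rows are patched and bulk copied, the helper column can go:
ALTER TABLE Customers DROP COLUMN TempId;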
I am making an invoicing system with support for multiple subsidiaries, each of which has its own set of invoice numbers; therefore I have a table with a primary key of (Subsidiary, InvoiceNo).
I cannot use a MySQL auto increment field, as it would keep incrementing the same count across all subsidiaries.
I don't want to make separate tables for each subsidiary, as new subsidiaries will be added as need be...
I am currently using "Select Max (ID) Where Subsidiary = X", from my table and adding the invoice according to this.
I am using nHibernate, and the Invoice insert, comes before the InvoiceItem insert, therefore if Invoice insert fails, InvoiceItem will not be carried out. But instead i will catch the exception, re-retrieve the Max(ID) and try again.
What is the problem with this approach? And if any, what is an alternative?
The reason for asking is that I read one of the answers on this question: Nhibernate Criteria: 'select max(id)'
This is a very bad idea to use when generating primary keys. My advice is as follows:
Do not give primary keys a business meaning (synthetic keys);
Use a secondary mechanism for generating the invoice numbers.
This will make your life a lot easier. The mechanism for generating invoice numbers can then e.g. be a table that looks something like:
Subsidiary;
NextInvoiceNumber.
This will separate the internal numbering from how the database works.
With such a mechanism, you will be able to use auto increment fields again, or even better, GUID's.
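A sketch of such a mechanism in MySQL (table and column names are illustrative; LAST_INSERT_ID(expr) makes the claim atomic per connection):
CREATE TABLE SubsidiaryInvoiceNumber (
Subsidiary INT NOT NULL PRIMARY KEY,
NextInvoiceNumber INT NOT NULL
);
-- advance the counter for one subsidiary and remember the new value for this connection
UPDATE SubsidiaryInvoiceNumber
SET NextInvoiceNumber = LAST_INSERT_ID(NextInvoiceNumber + 1)
WHERE Subsidiary = 42;
SELECT LAST_INSERT_ID(); -- the freshly reserved invoice number for subsidiary 42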
Some links with reading material:
http://fabiomaulo.blogspot.com/2008/12/identity-never-ending-story.html
http://nhforge.org/blogs/nhibernate/archive/2009/02/09/nh2-1-0-new-generators.aspx
As you say, the problem with this approach is that multiple sessions might try to insert the same invoice ID. You get a unique constraint violation, have to try again, that might fail as well, and so on.
I solve such problems by locking the subsidiary during the creation of new invoices. However, don't lock the table: (a) if you are using InnoDB, a LOCK TABLES command by default commits the current transaction; (b) there is no reason why invoices for two different subsidiaries shouldn't be added at the same time, as they have independent invoice numbers.
What I would do in your situation is:
Open a transaction and make sure your tables are InnoDB.
Lock the subsidiary with a SELECT ... FOR UPDATE command. This can be done using LockMode.UPGRADE in NHibernate.
Find the max id using the MAX(..) function and do the insert.
Commit the transaction
This serializes all invoice inserts for one subsidiary (i.e. only one session can do such an insert at a time; any second attempt will wait until the first is complete or has rolled back), but that's what you want. You don't want holes in your invoice numbers (e.g. if you insert invoice id 3485 and it then fails, there would be invoices 3484 and 3486 but no 3485).
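In plain SQL, the steps above look roughly like this (table and column names are assumed from the question):
START TRANSACTION;
-- lock this subsidiary's row so no other session can create its invoices concurrently
SELECT *
FROM Subsidiary
WHERE SubsidiaryId = 42
FOR UPDATE;
-- now it is safe to compute the next number and insert
INSERT INTO Invoice (Subsidiary, InvoiceNo, InvoiceDate)
SELECT 42, COALESCE(MAX(InvoiceNo), 0) + 1, CURDATE()
FROM Invoice
WHERE Subsidiary = 42;
COMMIT;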