I have a database table as follows:
CREATE TABLE some_table
(
price FLOAT NOT NULL,
size FLOAT NOT NULL,
retrieved DATETIME2 DEFAULT SYSUTCDATETIME(),
runner_id INT NOT NULL,
FOREIGN KEY (runner_id) REFERENCES runner(id),
PRIMARY KEY (retrieved, price, size, runner_id)
);
CREATE INDEX some_table_index ON some_table (runner_id);
This table is populated by sets of price/size data retrieved from a web service; the data is essentially time-series in nature. As far as I can tell (and I have put some comparison logic in my code to make sure), price and size are never duplicated within a single set of entries retrieved from the web service. They may, however, be duplicated across subsequent requests for price/size data related to the same runner.
I am getting intermittent primary key constraint duplicate key exceptions, even though I am forming my key from a high-resolution date-time value as well as the rest of the table columns. At this stage I am considering dropping the composite key in favor of an auto-generated primary key. Can anyone suggest why this might be happening based on the table schema? I consider it unlikely that I am inserting two separate sets of price/size data with duplicate values simultaneously, given the nature of the code and the resolution of the date-time value. I guess it is possible, though - I am using asynchronous methods to interact with the database and web service.
Thanks
Is each runner_id inserting multiple rows into the table in bulk? It's possible for the same price and size to be processed within the same 100-nanosecond clock tick, which would make the key values non-unique.
SQL Server obtains the date and time values by using the GetSystemTimeAsFileTime() Windows API. The accuracy depends on the computer hardware and version of Windows on which the instance of SQL Server is running. The precision of this API is fixed at 100 nanoseconds. The accuracy can be determined by using the GetSystemTimeAdjustment() Windows API.
https://msdn.microsoft.com/en-us/library/bb630387.aspx
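A quick way to see this for yourself (a sketch using a scratch temp table; names are made up): SYSUTCDATETIME() is a run-time constant, typically evaluated once per statement, and the underlying clock ticks far more coarsely than the DATETIME2 type can represent, so rows inserted in one statement, or in quick succession, routinely share the same timestamp.
CREATE TABLE #t
(
    retrieved DATETIME2 DEFAULT SYSUTCDATETIME(),
    n INT
);

-- Insert 1000 rows in a single statement; the default is evaluated
-- once per statement, so the rows share one timestamp.
INSERT INTO #t (n)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;

-- Expect at least one large group of duplicate timestamps:
SELECT retrieved, COUNT(*) AS rows_sharing_timestamp
FROM #t
GROUP BY retrieved
HAVING COUNT(*) > 1;

DROP TABLE #t;
If duplicate timestamps are unavoidable, a surrogate key (an IDENTITY column, say) with a plain index on retrieved sidesteps the constraint violations entirely.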
The following C# code worked correctly with MySQL Server to get the MAX value of a column from a table while, in the same query, adding 1 to the value, like this:
SqlDataReader dr = new SqlCommand("SELECT (MAX(Consec) +1) AS NextSampleID FROM Samples", Connection).ExecuteReader();
while (dr.Read())
{ //in case of Maximum value of Consec = 555, the expected result: A556
txtSampleID.Text = "A" + dr["NextSampleID"].ToString();
}
However, this code no longer works after migrating the DB from MySQL to SQL Server: if MAX(Consec) = 555, the result after running the query is A555; it does not add 1 like before with MySQL Server.
Question: What is the correct query to get the MAX value of Consec and how to add "1" to the result of MAX in the same query?
The MySQL query is wrong and won't work except in trivial applications with only a single user, no deletions, and no relations:
Concurrent calls will produce the same MAX value so result in the same, duplicate next value
Deleting records will reduce the MAX value, resulting in previous ID values getting assigned to new rows. If such an ID value is used in another table, the new record will end up associated with rows to which it has no real relation. This can be very bad. Imagine one patient's test samples getting mixed up with another's.
Calculating the MAX requires locking the entire table or index, thus blocking or being blocked by others. And given MySQL's MVCC isolation, even that wouldn't prevent duplicates, as concurrent SELECT MAX queries wouldn't block each other.
It's possible MAX+1 would work in a POS application with only one terminal generating invoice numbers, but as soon as you added two POS terminals you'd risk generating duplicate invoices.
In an e-Commerce application on the other hand, it's almost guaranteed that even if only two orders are placed per month, they'll happen at the exact same moment, resulting in duplicates.
Correct MySQL solution and equivalent
The correct solution in MySQL is to use the AUTO_INCREMENT attribute:
CREATE TABLE Samples (
Consec INT NOT NULL AUTO_INCREMENT,
...
);
If you want the invoice number to contain other data, use a calculated column to combine the incrementing number and that other data.
The equivalent in SQL Server is the IDENTITY property:
CREATE TABLE Samples (
Consec INT NOT NULL IDENTITY,
...
);
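Applied to the original question, you then stop computing MAX+1 in the client entirely: insert the row and read back the generated value, e.g. with SCOPE_IDENTITY(). A minimal sketch (the SampleName column is assumed for illustration):
-- Insert; the database assigns Consec atomically:
INSERT INTO Samples (SampleName) VALUES (N'test');

-- SCOPE_IDENTITY() returns the value generated by the INSERT above,
-- scoped to this session and scope, so it is safe under concurrency:
SELECT 'A' + CAST(SCOPE_IDENTITY() AS VARCHAR(20)) AS SampleID;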
Sequences
Another option available in SQL Server and other databases is the SEQUENCE object. A SEQUENCE can be used to generate incrementing numbers that aren't tied to a table. It can also be reset, making it ideal for accounting applications where invoice numbers are reset after a specific period (eg every year).
Since a SEQUENCE is an independent object, you can increment it and receive the new value before inserting any data into the database, using NEXT VALUE FOR, e.g.:
CREATE SEQUENCE seq_InvoiceNumber AS INT START WITH 1 INCREMENT BY 1;
SELECT NEXT VALUE FOR seq_InvoiceNumber;
NEXT VALUE FOR can be used as a default constraint for a table column, the same way IDENTITY or AUTO_INCREMENT are used:
CREATE TABLE MyTable (
...
Consec INT NOT NULL DEFAULT (NEXT VALUE FOR seq_InvoiceNumber)
);
Multi-table sequences
The same sequence can be used in multiple tables. One case where that's useful is assigning a Document ID to data imported from multiple sources, stored in different tables, eg payments.
Payment providers (credit cards, banks etc) send statements using different formats. Obviously you can't lose any information there so you need to use different tables per provider, but still be able to handle payments the same way no matter where they came from.
If you used an IDENTITY on each table, you'd end up with conflicting IDs for payments coming from different providers. On the OrderPayments table, for example, you'd have to record both the provider name and the ID. Generating a single view of payments would end up with ID values that can't be used by themselves.
By using a single SEQUENCE though, each record would get its own ID, no matter the table.
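A minimal sketch of the shared-sequence idea (all table and column names are assumed for illustration):
CREATE SEQUENCE seq_PaymentID AS INT START WITH 1 INCREMENT BY 1;

CREATE TABLE CardPayments
(
    PaymentID INT NOT NULL DEFAULT (NEXT VALUE FOR seq_PaymentID) PRIMARY KEY,
    CardReference NVARCHAR(50) NOT NULL
);

CREATE TABLE BankPayments
(
    PaymentID INT NOT NULL DEFAULT (NEXT VALUE FOR seq_PaymentID) PRIMARY KEY,
    BankReference NVARCHAR(50) NOT NULL
);

-- IDs drawn from the same sequence never collide across the two tables,
-- so a UNION view of payments still has a usable, unique PaymentID.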
In the app there should be functionality for the user to reset orderNumber whenever needed. We are using SQL Server for the DB, .NET Core, Entity Framework, etc. I was wondering: what is the most elegant way to achieve this?
I thought about making orderNumber an int with identity(1,1), and I've come across DBCC CHECKIDENT('tableName', RESEED, 0), but the latter introduces some permission concerns (the user has to own the schema, be sysadmin, etc.).
EDIT: orderNumber is NOT a primary key, and duplicate values are not the problem. We should just let the user (once a year, probably) reset the numbering of their orders to start from 1 again.
Any advice?
An identity column is used to auto-generate incremental values, so if you're relying on this column as the primary key or some unique identifier for rows, updating it can cause issues with duplicates.
It's difficult to recommend the best solution without knowing more about your use case, but I would consider (1) whether orderNumber should be the PK at all, or whether a natural composite key like (customerId, locationId, date) makes sense and lets you update orderNumber more freely without impact on data integrity, or (2) whether keeping orderNumber as an identity makes sense, combined with a data model or table that maps multiple rows in this table to the same "order", allowing you to maintain the key on the base table.
It seems that orderNumber is a business-layer concern, so I recommend a non-SQL solution. You need C# code that generates the number for storage in your "Order" entity. I wouldn't use IDENTITY() to implement this.
The customer isn't going to reset anything in the DB; your code will do this. You need a "take a number" service in your business layer and a place in the UI to reset it (presumably per customer).
SQL Server has SEQUENCE. My only concern with using it is partitioning per customer (an assumed requirement). Will you have multiple customers? If so, you probably can't have a single number generator, which is why I suggest a C# implementation (sure, you'll want to save the state as numbers are handed out).
Identity should not be used in the way you're suggesting. Presumably you don't want a customer to get two different orders with the same order number (i.e., order numbers are unique within a customer). If you don't care whether customers get discontinuous order numbers, you can use a sequence; but if you want continuous order numbers, you would need to create a separate sequence for each customer, which is not a good solution either. I suggest you set the order number to MAX([order number]) OVER (PARTITION BY [customer id]) + 1 on insert; that will automatically give you the next order number for the particular customer.
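A minimal sketch of that approach (table and column names assumed); note that the concurrency caveats raised earlier against bare MAX+1 still apply, so the read and the insert are serialized here with locking hints inside a transaction:
DECLARE @CustomerId INT = 42;  -- assumed parameter

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK keep two sessions from reading the same MAX
-- for the same customer at the same time:
INSERT INTO Orders (CustomerId, OrderNumber)
SELECT @CustomerId, COALESCE(MAX(OrderNumber), 0) + 1
FROM Orders WITH (UPDLOCK, HOLDLOCK)
WHERE CustomerId = @CustomerId;

COMMIT;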
Currently, my primary key data type is an int, not null, and is defined as following from the application:
public virtual int Id { get; set; }
I am a bit worried, though, since int has a limited range: at some point it will not be possible to add new rows because the primary key will have run out of values.
How should this problem be approached? I was thinking about a) simply changing the data type to long, etc., or b) if possible, removing the primary key, since it is not used at any time in the application.
Don't remove your primary key, you need it to identify records in your database. If you expect to have more than int can handle, you can:
Make it bigint
Make it uniqueidentifier
Make it a composite key (made up of two or more fields)
Unless you're dealing with very small tables (10 rows), you never want to go without a primary key. The primary key dramatically affects performance, as it provides your initial unique and clustered index. Unique indexes play a vital role in maintaining data integrity. Clustered indexes play a vital role in allowing SQL Server to perform index and row seeks instead of scans. Basically: does it have to load one row, or all of the rows?
Changing the data type will affect your primary index size and row size, as well as the size of any index placed on the table. Unless you're worried about exceeding 2,147,483,647 rows in the near future, I would stick with an INT. Every data type has a limited range.
Do you really think you'll get above 2,147,483,647 rows? I doubt it. I wouldn't worry about it.
If you, at some point, begin to reach the limit, it should be trivial to change it to a bigint.
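Something like the following, assuming a table MyTable with a PK constraint named PK_MyTable; keep in mind that on a large table ALTER COLUMN is a size-of-data operation, and any foreign key columns referencing the key must be widened too:
-- Drop the PK constraint, widen the column, re-create the PK:
ALTER TABLE MyTable DROP CONSTRAINT PK_MyTable;

ALTER TABLE MyTable ALTER COLUMN Id BIGINT NOT NULL;

ALTER TABLE MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY (Id);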
It depends on how big you're expecting this table to become - take a look at the reference page for SQL Server for supported ranges and you can answer the question about the need to change the data type of the PK for yourself.
If the key is really never used (not even as a foreign key), then removing it is entirely reasonable.
You should always have a primary key, so I wouldn't remove it. However, do you really think you're going to exceed the limit of 2,147,483,647 rows in your table?
If it's really a concern, you could just change the data type to a bigint.
Here is also a limits sheet on what SQL Server can handle; it may help you get a fix on what you need to plan for.
Is there a way of editing the primary key in MVC3 if the table only contains a primary key field? For example, I have a console table in which the console name is the primary key, and I want to be able to edit it, change it, and save the edited value.
If there is any more info you require please let me know.
As a general rule, you should never edit primary keys. The primary key in SQL Server typically has a clustered unique index on it, so editing the primary key means you may have to rebuild your indexes (maybe not every time, depending on the skew).
Instead, I would create a surrogate primary key, such as an IDENTITY column in SQL Server, and put a UNIQUE constraint on the Name column. If your table grows large, retrieving rows by an int column will also be faster than retrieving by a varchar() column.
Update:
Since I was told I didn't answer the question (even though this is the accepted answer): it is possible to change primary key values in SQL Server, but it is not technically an edit operation, since referential integrity may prevent a true in-place edit (I haven't tried, so feel free to conduct your own experiment!).
The operation would go something like this:
Add a new row to the primary table, using the new PK value
Run an update operation to change all FK values to the new PK value
Delete the old PK row
I'd run all that in a transaction, too. But I will state again for the record, I do not recommend taking this approach.
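For concreteness, a sketch of those three steps (table and column names are invented: a Consoles table keyed by Name, referenced by Games.ConsoleName):
BEGIN TRANSACTION;

-- 1. Add a new row under the new PK value
INSERT INTO Consoles (Name) VALUES (N'NewConsoleName');

-- 2. Repoint all FK values at the new PK value
UPDATE Games
SET ConsoleName = N'NewConsoleName'
WHERE ConsoleName = N'OldConsoleName';

-- 3. Delete the old PK row
DELETE FROM Consoles WHERE Name = N'OldConsoleName';

COMMIT;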
As aKzenT pointed out, it is best to always use an auto-number/IDENTITY (or a SEQUENCE in Oracle) when defining primary keys. It is much more efficient for b-tree processing to find and join on numeric keys, especially when textual ones are longer than a few bytes. Smaller keys also result in fewer b-tree pages that need to be searched.
Another important reason is that auto-generated keys never need to be modified. When using modifiable textual keys, foreign keys must employ ON UPDATE CASCADE, which many RDBMSs (e.g. Oracle, DB2) do not support declaratively; it must instead be implemented with triggers, which is very complicated.
In your case, replacing the textual key with an auto-generated primary key will eliminate the problem.
The application I completed has gone live, and we are facing some very specific problems with response time on certain tables.
In short, response times on some tables with only 5k rows are very poor, and these tables will grow in size.
Some of these tables (e.g. the Order Header table) have a uniqueidentifier as the PK. We suspect this may be the reason for the poor response times.
On studying the situation, we have come up with the following options:
Convert the index of the primary key in the table OrderHeader to a non-clustered one.
Use newsequentialid() as the default value for the PK instead of newid()
Convert the PK to a bigint
We feel that option 2 is ideal, since option 3 would require big-ticket changes.
But to implement that, we need to move some of our processing from the insert stored procedures to triggers. This is because we need to capture the PK from the OrderHeader table, and there is no way we can use
SELECT @OrderID = NEWSEQUENTIALID()
within the insert stored procedure.
Whereas if we move the processing to a trigger we can use
select OrderID from inserted
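Roughly, the trigger we have in mind would look like this (a sketch; OrderAudit is a made-up destination standing in for whatever processing needs the new IDs):
CREATE TRIGGER trg_OrderHeader_Insert ON OrderHeader
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- inserted exposes the NEWSEQUENTIALID() values assigned by the default:
    INSERT INTO OrderAudit (OrderID)
    SELECT OrderID FROM inserted;
END;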
Now for the questions:
Will converting the PK from newid() to newsequentialid() result in performance gain?
Will converting the index of the PK to a non-clustered one, while retaining uniqueidentifier as the data type and newid() for generating the PK, solve our problems?
If you have faced a similar situation, please do provide helpful advice.
Thanks a ton in advance, people.
Romi
Convert the index of the primary key in the table OrderHeader to a non-clustered one.
That seems like a good thing to do regardless of what else you do. If your table is clustered on your primary key and that key is a UUID, it means you're constantly writing somewhere in the middle of the table instead of appending new rows to the end of it. That alone will result in a performance hit.
Prefer to cluster your table using an index that's actually useful for sorting; ideally something on a date field, less ideally (but still very useful) a title/name, etc.
Move the clustered index off the GUID column and onto some other combination of columns (the range search you run most often, for instance).
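A minimal sketch of that change (constraint and column names assumed):
-- Keep the GUID as the primary key, but make it nonclustered...
ALTER TABLE OrderHeader DROP CONSTRAINT PK_OrderHeader;

ALTER TABLE OrderHeader ADD CONSTRAINT PK_OrderHeader
    PRIMARY KEY NONCLUSTERED (OrderID);

-- ...and cluster on something with a useful sort order, e.g. a date:
CREATE CLUSTERED INDEX CIX_OrderHeader_OrderDate
    ON OrderHeader (OrderDate);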
Please post your table structure, index definitions, and problem query(s).
Before you make any changes: you need to measure and determine where your actual bottleneck is.
One common reason for a GUID primary key is generating the IDs in a client layer, but you don't mention doing this.
Also, are your statistics up to date? Do you rebuild indexes regularly?