I have tried searching this before asking but every result I have found mentions GUIDs as a PK which is not the case here.
I have a database that's using INT as the PK on all tables. However, the data is accessed via API calls, and a requirement was that the INT value not be returned or used in any API. Therefore I thought of having an extra column on the tables containing a GUID.
Now my question is, if I index the GUID column, what kind of performance impact will this have? Would it be positive or negative? Bear in mind the GUID is NOT a PK or FK.
I think you are on the right track, but don't take it from me...
In the comments section on one of Kimberly Tripp's articles, she responds to a comment that advocates the opposite of your position, and she disagrees and argues for the same solution you are proposing (nonclustered indexed guid with a clustered int/bigint primary key).
Herman:
If the GUID is the intrinsic identifier for the entity being modelled (i.e. used by selects) then it should be the clustered primary key without question. The reason is that adding a surrogate identity key (of int or bigint) and demoting the GUID primary key to a column with an index/unique constraint requires 2 indexes to be maintained and slows down, in my experience, by a factor of 2.
Kimberly Tripp
Hey there Herman – Actually, I disagree. For point-based queries using a nonclustered index does not add a significant amount of costly IOs. And, the maintenance of a nonclustered index that’s heavily fragmented is a lot cheaper than the required maintenance on a heavily fragmented clustered index. Additionally, the GUID might make your nonclustered indexes unnecessarily wide – making them take: more log space, more disk space, more cache as well as adding time on insert and access (especially in larger queries/joins). So, while you might not feel like an arbitrary/surrogate key is useful (because you never directly query against it) it can be incredibly efficient to use indirectly through your nonclustered indexes. There’s definitely an element of “it depends” here but if you have even just a few nonclustered indexes then it’s likely to be more beneficial than negative and often significantly so.
Cheers,
kt ~ GUIDs as PRIMARY KEYs and/or the clustering key - Kimberly L. Tripp
This should be fine. Of course, you have the normal impact of any index and any column taking up more space. So, data modifications will be a bit slower. The use of a GUID to locate a record versus an integer is slightly slower. Unless you have a very high throughput application, these are probably not important considerations.
One key point is that the GUID column should not be clustered. This is very important because GUIDs are random, but primary keys are ordered. If a GUID were used for a clustered index, almost every insert would go between two existing records, requiring a lot of movement of data. By contrast, an identity column as a clustered index always inserts at the end of the data.
I am guessing that your references on GUIDs have discussed this issue.
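For illustration, a minimal sketch of the layout both answers describe (all table and column names here are assumptions, not taken from the question):
-- Narrow, ever-increasing identity column as the clustered PK;
-- the GUID is exposed to the API and gets its own nonclustered unique index.
CREATE TABLE dbo.Customer (
    Id int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
    PublicId uniqueidentifier NOT NULL
        CONSTRAINT DF_Customer_PublicId DEFAULT NEWID(),
    Name varchar(100) NOT NULL
);

-- Point lookups by GUID become index seeks rather than scans.
CREATE UNIQUE NONCLUSTERED INDEX UX_Customer_PublicId
    ON dbo.Customer (PublicId);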
I am developing an API in .NET Core where I have to save an integer number in a SQL table. The number must be 9 digits long (anything from 000000001 upward), it must never repeat, and I also want the number available in memory because I have to use it for some other purposes. After doing some searching, one of the most common solutions is using DateTime.Now.Ticks and trimming its length, but the problem is that when concurrent HTTP requests come in, the ticks value might be the same.
One solution is to apply a lock on the method and release it once the data is saved in the database, but this will slow the application down, as locks are expensive.
A second solution is to introduce a new table with an initial counter value of 1: on every HTTP request, first apply the UnitOfWork, read the value from the table, increment it by one, save it, and then process the next request. But again there is a performance hit, so this is not an optimal solution.
So, is there any other solution that is faster and less expensive?
Thanks in advance
I think you can create a computed column in combination with an auto-increment ID. The identity column gives you a unique number, and the computed column pads it to the specific length you need.
An example:
CREATE TABLE [dbo].[EmployeeMaster](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [PreFix] [varchar](50) NOT NULL,
    -- Pad with enough zeros that even a single-digit ID yields 9 digits.
    [EmployeeNo] AS ([PreFix] + RIGHT('000000000' + CAST([ID] AS VARCHAR(9)), 9)) PERSISTED,
    [EmployeeName] VARCHAR(50),
    CONSTRAINT [PK_AutoInc] PRIMARY KEY ([ID] ASC)
)
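A quick usage sketch (the sample values are made up):
-- ID comes from the identity; EmployeeNo is computed from it.
INSERT INTO [dbo].[EmployeeMaster] ([PreFix], [EmployeeName])
VALUES ('EMP', 'Jane Doe');

-- Returns e.g. 'EMP000000001' for the first row.
SELECT [ID], [EmployeeNo]
FROM [dbo].[EmployeeMaster]
WHERE [ID] = SCOPE_IDENTITY();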
Currently, my primary key data type is an int, not null, and is defined as follows in the application:
public virtual int Id { get; set; }
I am a bit worried, though, since int has a limited range; at some point it will not be possible to add new rows because the primary key will have run out of values.
How should this problem be approached? I was thinking of either a) simply changing the data type to long, or b) if possible, removing the primary key, since it is not used anywhere in the application.
Don't remove your primary key, you need it to identify records in your database. If you expect to have more than int can handle, you can:
Make it bigint
Make it uniqueidentifier
Make it a composite key (made up of two or more fields)
Unless you're dealing with very small tables (10 rows), you never want to go without a primary key. The primary key dramatically affects performance as it provides your initial unique and clustered index. Unique indexes play a vital role in maintaining data integrity. Clustered indexes play a vital role in allowing SQL Server to perform index and row seeks instead of scans. Basically, does it have to load one row or all of the rows.
Changing the data type will affect your primary index size, row size, as well as the size of any index placed on the table. Unless you're worried about exceeding 2,147,483,647 rows in the near future, I would stick with an INT. Every data type has a restricted row count.
Do you really think you'll get above 2,147,483,647 rows? I doubt it. I wouldn't worry about it.
If you, at some point, begin to reach the limit, it should be trivial to change it to a bigint.
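If that day ever comes, the change is a few statements rather than one (the table, column, and constraint names here are assumptions; any foreign keys referencing the column need the same treatment, and a large table will need a maintenance window):
-- The PK constraint must come off before the column type can change.
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;
ALTER TABLE dbo.MyTable ALTER COLUMN Id bigint NOT NULL;
ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY (Id);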
It depends on how big you're expecting this table to become - take a look at the reference page for SQL Server for supported ranges and you can answer the question about the need to change the data type of the PK for yourself.
If the key is really never used (not even as a foreign key), then removing it is entirely reasonable.
You should always have a primary key, so I wouldn't remove that. However, do you really think you're going to exceed the limit of 2,147,483,647 rows in your table?
If it's really a concern, you could just change your data type to a bigint.
Here is also a limits sheet on what SQL Server can handle - that may help you get a fix on what you need to plan for.
In an SQL Server database, I created a unique constraint to ensure that one of its tables contains only unique pairs of values.
The problem now is that the order of records I get is different. The records are sorted, but I want them to come in original order, just as they exist in the table, without any sorting.
I've checked everywhere, but couldn't find a way to create a unique constraint without sort order. Is this supported at all?
The records are sorted, but I want them to come in original order, just as they exist in the table, without any sorting.
Ah, the old sort issue - SQL for beginners.
TABLES have a sort order that is the order of the clustered index. Without one, the order is undefined.
RESULTS have NO ORDER UNLESS DEFINED. SQL can change the order if it thinks it can process a query better. This is FUNDAMENTAL - you deal with data sets, and data sets per se are not ordered.
So, if you want an order, ASK FOR IT.
but couldn't find a way to create a unique constraint without sort order.
Why would you need an order for a unique constraint? A unique index should suffice, shouldn't it? I would NOT make uniqueness a constraint but would, as standard, put a unique index on the fields - especially as the index is needed anyway to validate that the values are unique.
If you want to get your records in the "original" order, you should use a field that marks this order, such as an identity sequence / primary key (probably the best option you have), or a creation date, or anything else.
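For example, a minimal sketch (the table and the ordering column Id are assumptions; the other column names are placeholders):
-- Rows come back in insertion order only because we ask for it explicitly.
SELECT Id, ColA, ColB
FROM dbo.MyTable
ORDER BY Id;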
The rows in your table (physically, in the file) are actually sorted in a particular order only when you use a clustered index. However, even in that case, there is no guarantee whatsoever that this or any order will be preserved when you select rows from that table without an ORDER BY clause.
Usually, with a clustered table, you'll get the results in the order of the clustered index, but this is not something you can rely on; wherever order is important, you should provide ORDER BY in your query.
Using ROW_NUMBER() you can assign an explicit position to each row without relying on any stored sort order. I hope it helps.
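A short sketch of what that might look like (the ordering column Id and the other column names are assumptions):
-- Number the rows by the identity column instead of trusting storage order.
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RowNo, ColA, ColB
FROM dbo.MyTable;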
Is there a way of editing the primary key in MVC3 if the table only contains a primary key field? For example, I have a console table, and within it I have the console name as the primary key. I want to be able to edit it, change it, and save the edited value.
If there is any more info you require please let me know.
As a general rule, you should never edit primary keys. The primary key in SQL Server typically has a clustered unique index on it, so editing the primary key means you potentially have to rebuild your indexes (maybe not every time, but depending on the skew).
Instead I would create a fake primary key, such as an IDENTITY column in SQL Server, and put a UNIQUE constraint on the Name column. If your table grows large, retrieving items on an int column will also be faster than retrieving on a varchar() column.
Update:
Since I was told I didn't answer the question (even though this is the accepted answer), it is possible to change the primary key values in SQL Server. But it is not technically an edit operation, since referential integrity may prevent a true edit (I haven't tried, so feel free to conduct your own experiment!)
The operation would go something like this:
Add a new row to the primary table, using the new PK value
Run an update operation to change all FK values to the new PK value
Delete the old PK row
I'd run all that in a transaction, too. But I will state again for the record, I do not recommend taking this approach.
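For what it's worth, a hedged sketch of those three steps, assuming a Consoles table keyed on Name and a referencing Games table (all names here are illustrative):
BEGIN TRANSACTION;

-- 1. Add a new row under the new key value.
INSERT INTO dbo.Consoles (Name) VALUES ('NewName');

-- 2. Repoint every referencing foreign key row at the new value.
UPDATE dbo.Games SET ConsoleName = 'NewName' WHERE ConsoleName = 'OldName';

-- 3. Remove the row holding the old key value.
DELETE FROM dbo.Consoles WHERE Name = 'OldName';

COMMIT TRANSACTION;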
As aKzenT pointed out, it is best to always use an Auto-Number/Identity or Sequence (Oracle) when defining primary keys. It is much more efficient for b-tree processors to find and join numeric keys, especially when textual ones are longer than a few bytes. Smaller keys also result in fewer b-tree pages that need to be searched.
Another important reason is that auto-generated keys cannot be modified. When using modifiable textual keys, foreign keys must employ CASCADE UPDATE, which many RDBMSs (e.g. Oracle, DB2) do not support declaratively; it must instead be implemented with triggers, which is very complicated.
In your case, replacing the textual key with an auto-generated primary key will eliminate the problem.
The application I have completed has gone live and we are facing some very specific problems as far as response time is concerned in specific tables.
In short, response time on some of the tables that have 5k rows is very poor. And these tables will grow in size.
Some of these tables (e.g. Order Header table) have a uniqueidentifier as the P.K. We figure that this may be the reason for the low response time.
On studying the situation we have decided the following options
Convert the index of the primary key in the table OrderHeader to a non-clustered one.
Use newsequentialid() as the default value for the PK instead of newid()
Convert the PK to a bigint
We feel that option number 2 is ideal since option number 3 will require big ticket changes.
But to implement that, we need to move some of our processing from the insert stored procedures to triggers. This is because we need to capture the PK from the OrderHeader table, and there is no way we can use
SELECT @OrderID = NEWSEQUENTIALID() within the insert stored procedure.
Whereas if we move the processing to a trigger we can use
select OrderID from inserted
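A minimal sketch of that trigger-based arrangement, assuming a simplified OrderHeader schema (the column and object names are illustrative):
-- NEWSEQUENTIALID() is only valid inside a DEFAULT constraint.
CREATE TABLE dbo.OrderHeader (
    OrderID uniqueidentifier NOT NULL
        CONSTRAINT DF_OrderHeader_OrderID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_OrderHeader PRIMARY KEY,
    OrderDate datetime NOT NULL
);
GO

-- The generated key is visible in the inserted pseudo-table.
CREATE TRIGGER trg_OrderHeader_Insert ON dbo.OrderHeader
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @OrderID uniqueidentifier;
    SELECT @OrderID = OrderID FROM inserted; -- single-row insert assumed
    -- ...further processing that needs @OrderID goes here...
END;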
Now for the questions:
Will converting the PK from newid() to newsequentialid() result in performance gain?
Will converting the index of the PK to a non-clustered one and retaining both uniqueidentifier as the data type for PK and newid() for generating the PK solve our problems?
If you have faced a similar sort of situation, please do provide helpful advice.
Thanks a ton in advance, people
Romi
Convert the index of the primary key in the table OrderHeader to a non-clustered one.
Seems like a good idea regardless of what else you do. If your table is clustered on your pkey and the latter is a UUID, you're constantly writing somewhere in the middle of the table instead of appending new rows at the end. That alone will result in a performance hit.
Prefer to cluster your table using an index that's actually useful for sorting; ideally something on a date field, less ideally (but still very useful) a title/name, etc.
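A sketch of what that conversion could look like (the constraint, column, and index names are assumptions about the schema, and any foreign keys referencing the PK would have to be dropped and recreated around it):
-- Re-create the GUID PK as nonclustered...
ALTER TABLE dbo.OrderHeader DROP CONSTRAINT PK_OrderHeader;
ALTER TABLE dbo.OrderHeader
    ADD CONSTRAINT PK_OrderHeader PRIMARY KEY NONCLUSTERED (OrderID);

-- ...then cluster on something sequential and useful for range queries.
CREATE CLUSTERED INDEX CX_OrderHeader_OrderDate
    ON dbo.OrderHeader (OrderDate);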
Move the clustered index off the GUID column and onto some other combination of columns (your most often run range search, for instance)
Please post your table structure and index definitions, and problem query(s)
Before you make any changes: you need to measure and determine where your actual bottleneck is.
One of the common reasons for a GUID primary key is generating these IDs in a client layer, but you do not mention this.
Also, are your statistics up to date? Do you rebuild indexes regularly?