Modification in Database due to use of GUID (uniqueidentifier) - c#

The application I built has gone live, and we are facing some very specific response-time problems in particular tables.
In short, response time on some tables with only 5k rows is very poor, and these tables will grow in size.
Some of these tables (e.g. the OrderHeader table) have a uniqueidentifier as the PK. We suspect this may be the reason for the slow response.
On studying the situation we have identified the following options:
1. Convert the index of the primary key in the table OrderHeader to a non-clustered one.
2. Use newsequentialid() as the default value for the PK instead of newid().
3. Convert the PK to a bigint.
We feel that option number 2 is ideal, since option number 3 would require big-ticket changes.
But to implement that we need to move some of our processing from the insert stored procedures to triggers, because we need to trap the PK from the OrderHeader table and there is no way we can use
Select @OrderID = newsequentialid()
within the insert stored procedure.
Whereas if we move the processing to a trigger we can use
select OrderID from inserted
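For illustration, a minimal sketch of such a trigger (names follow the question; the downstream processing is elided):

create trigger trg_OrderHeader_Insert on OrderHeader
after insert
as
begin
    set nocount on;
    -- "inserted" exposes the new rows, defaults such as newsequentialid() included;
    -- a trigger fires once per statement, so this also covers multi-row inserts
    select OrderID from inserted;  -- hand these keys to the follow-up processing
end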
Now for the questions:
Will converting the PK from newid() to newsequentialid() result in a performance gain?
Will converting the index of the PK to a non-clustered one and retaining both uniqueidentifier as the data type for PK and newid() for generating the PK solve our problems?
If you have faced a similar situation, please do provide helpful advice.
Thanks a ton in advance, people.
Romi

Convert the index of the primary key in the table OrderHeader to a non-clustered one.
Seems like a good option regardless of what else you do. If your table is clustered on your pkey and the latter is a UUID, it means you're constantly writing somewhere in the middle of the table instead of appending new rows to the end of it. That alone will result in a performance hit.
Prefer to cluster your table using an index that's actually useful for sorting; ideally something on a date field, less ideally (but still very useful) a title/name, etc.
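A hedged sketch of that restructuring, assuming the existing constraint is named PK_OrderHeader and an OrderDate column exists (adjust to your schema):

-- drop the clustered PK, re-cluster on a date, re-add the PK as nonclustered
alter table dbo.OrderHeader drop constraint PK_OrderHeader;
create clustered index IX_OrderHeader_OrderDate on dbo.OrderHeader (OrderDate);
alter table dbo.OrderHeader
    add constraint PK_OrderHeader primary key nonclustered (OrderID);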

Move the clustered index off the GUID column and onto some other combination of columns (your most often run range search, for instance)
Please post your table structure and index definitions, and problem query(s)
Before you make any changes: you need to measure and determine where your actual bottleneck is.
One of the common reasons for a GUID primary key is generating the IDs in a client layer, but you do not mention this.
Also, are your statistics up to date? Do you rebuild indexes regularly?
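If in doubt, refreshing both on the affected table is cheap to try (the object name is a placeholder):

update statistics dbo.OrderHeader;          -- bring optimizer statistics up to date
alter index all on dbo.OrderHeader rebuild; -- rebuild (and defragment) all indexes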

Related

User should be able to reset autoincrement (identity) column: possible solutions

The app should provide functionality for the user to reset orderNumber whenever needed. We are using SQL Server for the DB, .NET Core, Entity Framework, etc. I was wondering what is the most elegant way to achieve this?
Thought about making orderNumber int, identity(1,1), and I've searched for DBCC CHECKIDENT('tableName', RESEED, 0), but the latter introduces some permissions concerns (the user has to own the schema, be sysadmin, etc.).
EDIT: orderNumber is NOT a primary key, and duplicate values are not the problem. We should just let the user (once a year, probably) reset the numbering of their orders to start from 1 again.
Any advice?
An identity column is used to auto-generate incremental values, so if you're relying on this column as the primary key or some unique identifier for rows, updating it can cause issues with duplicates.
It's difficult to recommend the best solution without knowing more about your use case, but I would consider (1) whether this orderNumber should be the PK at all, or whether some surrogate key like (customerId, locationId, date) makes sense and allows you to more freely update orderNumber without impact on data integrity, or (2) whether keeping orderNumber as an identity makes sense, in which case you could build a data model or table that maps multiple rows in this table to the same "order", allowing you to maintain the key on the base table.
It seems that orderNumber is a business layer concern - therefore I recommend a non-SQL solution. You need C# code that generates the number for storage in your "Order" entity. I wouldn't use IDENTITY() to implement/solve this.
The customer isn't going to reset anything in the DB; your code will do this. You need a "take a number" service in your business layer and a place in the UI to reset it (presumably per customer).
SQL Server has SEQUENCE objects. My only concern with using one is partitioning per customer (an assumed requirement). Will you have multiple customers? If so, you probably can't have a single number generator, hence my suggestion of a C# implementation (sure, you'll want to save the state as numbers are handed out).
Identity should not be used in the way you're suggesting. Presumably you don't want a customer to get two different orders with the same order number (i.e., order number is unique within customer). If you don't care whether customers get discontinuous order numbers, then you can use a sequence, but if you want continuous order numbers, then you would need to create a separate sequence for each customer, which is not a good solution either. I suggest you set the order number to max([order number]) over (partition by [customer id]) + 1 on the insert. That will automatically give you the next order number for a particular customer.
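One hedged way to implement that suggestion on insert (table and column names are hypothetical; the locking hints stop two concurrent inserts from grabbing the same number):

insert into dbo.Orders (CustomerId, OrderNumber, OrderDate)
select @CustomerId,
       coalesce(max(OrderNumber), 0) + 1,   -- next number for this customer
       getdate()
from dbo.Orders with (updlock, serializable)
where CustomerId = @CustomerId;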

Size of a PK int in SQL Server / ASP.NET MVC 4

Currently, my primary key data type is an int, not null, and is defined as following from the application:
public virtual int Id { get; set; }
I am a bit worried, though, since int is bounded; at some point it will not be possible to add new rows because the primary key has run out of values.
How should this problem be approached? I was thinking about a) simply changing the data type to long, etc. or, b) if possible, remove the primary key since it is not used at any time in the application?
Don't remove your primary key, you need it to identify records in your database. If you expect to have more than int can handle, you can:
Make it bigint
Make it uniqueidentifier
Make it a composite key (made up of two or more fields)
Unless you're dealing with very small tables (10 rows), you never want to go without a primary key. The primary key dramatically affects performance as it provides your initial unique and clustered index. Unique indexes play a vital role in maintaining data integrity. Clustered indexes play a vital role in allowing SQL Server to perform index and row seeks instead of scans. Basically, does it have to load one row or all of the rows.
Changing the data type will affect your primary index size, row size, as well as the size of any index placed on the table. Unless you're worried about exceeding 2,147,483,647 rows in the near future, I would stick with an INT. Every data type has a restricted row count.
Do you really think you'll get above 2,147,483,647 rows? I doubt it. I wouldn't worry about it.
If you, at some point, begin to reach the limit, it should be trivial to change it to a bigint.
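A sketch of that change (table and constraint names are assumptions; the PK constraint has to be dropped before the column type can change, and any referencing foreign keys need the same treatment):

alter table dbo.Items drop constraint PK_Items;
alter table dbo.Items alter column Id bigint not null;
alter table dbo.Items add constraint PK_Items primary key clustered (Id);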
It depends on how big you're expecting this table to become - take a look at the reference page for SQL Server for supported ranges and you can answer the question about the need to change the data type of the PK for yourself.
If the key is really never used (not even as a foreign key) then culling it is entirely reasonable.
You should always have a primary key, so I wouldn't remove it. However, do you really think you're going to exceed the limit of 2,147,483,647 rows in your table?
If it's really a concern, you could just change your data type to a bigint.
Here is also a limits sheet on what SQL server can handle - that may help you get a fix on what you need to plan for.

SQL Server - formatted identity column

I would like to have a primary key column in a table that is formatted as FOO-BAR-[identity number], for example:
FOO-BAR-1
FOO-BAR-2
FOO-BAR-3
FOO-BAR-4
FOO-BAR-5
Can SQL Server do this? Or do I have to use C# to manage the sequence? If that's the case, how can I get the next [identity number] part using Entity Framework?
Thanks
EDIT:
I need to do this because this column represents the unique identifier of a notice sent out to customers.
FOO will be a constant string
BAR will be different depending on the type of the notice (either Detection, Warning or Enforcement)
So is it better to have just an int identity column and append the values in Business Logic Layer in C#?
If you want this 'composited' field in your reports, I propose that you:
Use an INT IDENTITY field as the PK in the table
Create a view for this table. In this view you can additionally generate the field that you want using your strings and types.
Use this view in your reports.
But I still think there is a BIG problem with the DB design. I hope you'll try to redesign it using normalization.
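A minimal sketch of the view approach, assuming a Notices table with an int identity Id and a NoticeType column (all names are illustrative):

create view dbo.NoticesWithRef
as
select Id,
       NoticeType,
       concat('FOO-', NoticeType, '-', Id) as NoticeRef  -- the formatted identifier
from dbo.Notices;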
You can set anything as the PK in a table. But in this instance I would set IDENTITY to just an auto-incrementing int and manually append FOO-BAR- to it in the SQL, BLL, or UI, depending on why it's being used. If there is a business reason for FOO and BAR then you should also store these as values in your DB row. You can then create a key in the DB between the two or three columns, depending on why you're actually using the values.
But IMO I really don't think there is ever a real reason to concatenate an ID in such a fashion and store it as such in the DB. But then again, I really only use an int as my IDs.
Another option would be to use what an old team I used to be on called a codes and value table. We didn't use it for precisely this (we used it in lieu of auto-incrementing identities to prevent environment mismatches for some key tables), but what you could do is this:
Create a table that has a row for each of your categories. Two (or more) columns in the row - minimum of category name and next number.
When you insert a record in the other table, you'll run a stored proc to get the next available identity number for that category, increment the number in the codes and values table by 1, and concatenate the category and number together for your insert.
However, if your main table is a high-volume table with lots of inserts, it's possible you could wind up with numbers out of sequence.
In any event, even if it's not high volume, I think you'd be better off to reexamine why you want to do this, and see if there's another, better way to do it (such as having the business layer or UI do it, as others have suggested).
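For what it's worth, a sketch of that codes-and-values mechanism (all names are illustrative):

create table dbo.CategoryCounters (
    Category   varchar(20) not null primary key,
    NextNumber int         not null
);
go
create procedure dbo.GetNextRef @Category varchar(20), @Ref varchar(40) output
as
begin
    declare @n int;
    -- read and increment in a single statement so concurrent callers can't collide
    update dbo.CategoryCounters
    set @n = NextNumber, NextNumber = NextNumber + 1
    where Category = @Category;
    set @Ref = concat(@Category, '-', @n);
end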
It is quite possible by using a computed column, like this:
CREATE TABLE #test (
    id INT IDENTITY UNIQUE CLUSTERED,
    pk AS CONCAT('FOO-BAR-', id) PERSISTED PRIMARY KEY NONCLUSTERED,
    name NVARCHAR(20)
)

INSERT INTO #test (name) VALUES (N'one'), (N'two'), (N'three')
SELECT id, pk, name FROM #test
DROP TABLE #test
Note that pk is set to NONCLUSTERED on purpose because it is of VARCHAR type, while the IDENTITY field, which will be unique anyway, is set to UNIQUE CLUSTERED.

T-SQL query timeout / performance issue

I have a table having around 1 million records. Table structure is shown below. The UID column is a primary key and uniqueidentifier type.
Table_A (contains a million records)

UID                                   Name
------------------------------------  -----------
E8CDD244-B8E4-4807-B04D-FE6FDB71F995  DummyRecord
I also have a function called fn_Split('Guid_1,Guid_2,Guid_3,....,Guid_n') which accepts a list of comma-separated GUIDs and gives back a table variable containing them.
From my application code I am passing a SQL query to get the new GUIDs [keys that exist in the application code but not in the database table]:
var sb = new StringBuilder();
sb
.Append(" SELECT NewKey ")
.AppendFormat(" FROM fn_Split ('{0}') ", keyList)
.Append(" EXCEPT ")
.Append("SELECT UID from Table_A");
The first time this command is executed it times out on quite a few occasions. I am trying to figure out what would be a better approach here to avoid such timeouts and/or improve performance.
Thanks.
Firstly, add an index on table_a.uid if there isn't one, but I assume there is.
Some alternate queries to try:
select newkey
from fn_split(blah)
left outer join table_a on newkey = uid
where uid is null

select newkey
from fn_split(blah)
where newkey not in (select uid from table_a)

select newkey
from fn_split(blah) f
where not exists (select uid from table_a a where f.newkey = a.uid)
There is plenty of info around here as to why you should not use a GUID for your primary key, especially if it is unordered. That would be the first thing to fix. As far as your query goes you might try what Paul or Tim suggested, but as far as I know EXCEPT and NOT IN will use the same execution plan, though the OUTER JOIN may be more efficient in some cases.
If you're using MS SQL 2008 then you can/should use table-valued parameters. Essentially you'd send in your GUIDs in the form of a DataTable to your stored procedure.
Then inside your stored procedure you can use the parameters as a "table" and do a join or EXCEPT or what have you to get your results.
This method is faster than using a function to split because functions in MS SQL server are really slow.
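A rough sketch of the TVP approach (type, procedure, and parameter names are assumptions):

create type dbo.GuidList as table (Id uniqueidentifier not null primary key);
go
create procedure dbo.GetNewKeys @Keys dbo.GuidList readonly
as
    select Id from @Keys    -- keys held by the application
    except
    select UID from Table_A;

On the C# side you pass a DataTable as a parameter of SqlDbType.Structured.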
But my guess is that the time is being taken by the massive disk I/O this query requires. Since you're searching on your UID column and the values are "random", no index is going to help here. The engine will have to resort to a table scan, which means you'll need some serious disk I/O performance to get the results in good time.
Using the uniqueidentifier data type in an index is not recommended. However, it may not make a difference in your case. But let me ask you this:
The GUIDs that you send in from your app - are they just a random list of GUIDs, or is there some business or entity relationship here? It's possible that your data model is not correct for what you are trying to do. So how do you determine which GUIDs you have to search on?
However, for argument's sake, let's assume your GUIDs are just a random selection; then there is no index that can really be used, since the database engine will have to do a table scan to pick out each of the required GUIDs/records from the million records you have. In a situation like this the only way to speed things up is at the physical database level, that is, how your data is physically stored on the hard drives etc.
For example:
Having faster drives will improve performance
If this kind of query is being fired over and over then more memory on the box will help because the engine can cache the data in memory and it won't need to do physical reads
If you partition your table then the engine can parallelize the seek operation and get you results faster.
If your table contains a lot of other fields that you don't always need, then splitting the table in two, where table1 contains the GUID and the bare minimum set of fields and table2 contains the rest, will speed up the query quite a bit because the disk I/O demands are lower.
Lots of other things to look at here.
Also note that when you send in adhoc SQL statements that don't have parameters the engine has to create a plan each time you execute it. In this case it's not a big deal but keep in mind that each plan will be cached in memory thus pushing out any data that might have been cached.
Lastly, you can always increase the CommandTimeout property in this case to get past the timeout issues.
How much time does it take now, and what kind of improvement are you looking or hoping to get?
If I understand your question correctly, in your client code you have a comma-delimited string of (string) GUIDs. These GUIDS are usable by the client only if they don't already exist in TableA. Could you invoke a SP which creates a temporary table on the server containing the potentially usable GUIDS, and then do this:
select guid from #myTempTable as temp
where not exists
(
select uid from TABLEA where uid = temp.guid
)
You could pass your string of GUIDS to the SP; it would populate the temp table using your function; and then return an ADO.NET DataTable to the client. This should be very easy to test before you even bother to write the SP.
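A sketch of that SP, reusing fn_Split and the names above:

create procedure dbo.GetUsableGuids @KeyList varchar(max)
as
begin
    select NewKey as guid
    into #myTempTable
    from fn_Split(@KeyList);

    select guid from #myTempTable as temp
    where not exists
    (
        select uid from TABLEA where uid = temp.guid
    );
end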
I am questioning what you do with this information.
If you insert the keys into this table afterwards, you could simply try to insert them in the first place - that's much faster and more robust in a multi-user environment than query-first-insert-later:
create procedure TryToInsert @GUID uniqueidentifier, @Name varchar(n) as
begin try
    insert into Table_A (UID, Name)
    values (@GUID, @Name);
    return 0;
end try
begin catch
    return 1;
end catch;
In all cases you can split the KeyList at the client to get faster results - and you could query the keys that are not valid:
select UID
from Table_A
where UID in ('new guid','new guid',...);
If the GUIDs are random, you should use newsequentialid() with your clustered primary key:
create table Table_A (
    UID uniqueidentifier default newsequentialid() primary key,
    Name varchar(n) not null
);
With this you can insert and query your newly inserted data in one step:
insert into Table_A (Name)
output inserted.*
values (@Name);
... just my two cents
In any case, are not GUIDs intrinsically engineered to be, for all intents and purposes, unique? (i.e. universally unique -- doesn't matter where generated). I wouldn't even bother to do the test beforehand; just insert your row with the GUID PK and if the insert should fail, discard the GUID. But it should not fail, unless these are not truly GUIDs.
http://en.wikipedia.org/wiki/GUID
http://msdn.microsoft.com/en-us/library/ms190215.aspx
It seems you are doing a lot of unnecessary work, but perhaps I don't grasp your application requirement.

How to fetch lots of database table records by primary key?

Using the ADO.NET MySQL Connector, what is a good way to fetch lots of records (1000+) by primary key?
I have a table with just a few small columns, and a VARCHAR(128) primary key. Currently it has about 100k entries, but this will become more in the future.
In the beginning, I thought I would use the SQL IN statement:
SELECT * FROM `table` WHERE `id` IN ('key1', 'key2', [...], 'key1000')
But with this the query could become very long, and I would also have to manually escape quote characters in the keys, etc.
Now I use a MySQL MEMORY table (tempid INT, id VARCHAR(128)) to first upload all the keys with prepared INSERT statements. Then I make a join to select all the existing keys, after which I clean up the mess in the memory table.
Is there a better way to do this?
Note: OK, maybe it's not the best idea to have a string as primary key, but the question would be the same if the VARCHAR column were a normal index.
Temporary table: So far it seems the solution is to put the data into a temporary table, and then JOIN, which is basically what I currently do (see above).
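For reference, a minimal MySQL sketch of that approach (the staging table name is illustrative):

CREATE TEMPORARY TABLE lookup_keys (id VARCHAR(128) PRIMARY KEY);
-- populate lookup_keys with prepared, batched INSERTs, then:
SELECT t.* FROM `table` AS t
INNER JOIN lookup_keys AS k ON k.id = t.id;
DROP TEMPORARY TABLE lookup_keys;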
I've dealt with a similar situation in a Payroll system where the user needed to generate reports based on a selection of employees (eg. employees X,Y,Z... or employees that work in certain offices). I've built a filter window with all the employees and all the attributes that could be considered as a filter criteria, and had that window save selected employee id's in a filter table from the database. I did this because:
Generating SELECT queries with a dynamically generated IN filter is just ugly and highly impractical.
I could join that table in all my queries that needed to use the filter window.
Might not be the best solution out there but served, and still serves me very well.
If your primary keys follow some pattern, you can select where key like 'abc%'.
If you want to get out 1000 at a time, in some kind of sequence, you may want to have another int column in your data table with a clustered index. This would do the same job as your current memory table - allow you to select by int range.
What is the nature of the primary key? Is it anything meaningful?
If you're concerned about performance I definitely wouldn't recommend an IN clause. It's much better to do an INNER JOIN if you can.
You can either first insert all the values into a temporary table and join to that or do a sub-select. Best is to actually profile the changes and figure out what works best for you.
Why not consider using a table-valued parameter to push the keys in the form of a DataTable and fetch the matching records back?
Or
simply write a private method that concatenates all the key codes from a provided collection into a single string, and pass that string to the query.
I think either may solve your problem.
