Generating Unique & Random codes in SQL Server and .NET - c#

I have a requirement to generate a semi-random code in C#/ASP.NET that has to be unique in the SQL Server database.
These codes need to be generated in batches of up to 100 codes per run.
Given the requirements, I'm not sure how I can do this without generating a code and then checking the database to see if it exists, which seems like a horrible way of doing it.
Here are the requirements:
Maximum 10 characters long (alpha-numeric only)
Must not be case sensitive
User can specify an optional 3 character prefix for the code
Must not violate 2 column unique constraint in the database, i.e. must be a unique "code text" within the "category" (CONSTRAINT ucCodes UNIQUE (ColumnCodeText, ColumnCategoryId))
So, given the 10 character limit, GUIDs are not an option. Given the case insensitivity requirement, the mathematical probability of database collisions is fairly high, I think.
At the same time, there are enough possible combinations that a straight look-up table in the DB would be prohibitive, I believe.
Is there a reasonably performant way of generating codes with these requirements that doesn't involve saving them to the DB one code at a time and waiting for a unique key violation to see if it goes through?

You have two options here.
Generate a new ID and insert it. If the insert throws a duplicate unique key exception, try again until you succeed, or bail out if you run out of IDs. Performance will be poor if most of the IDs are already used up.
Pregenerate all the possible IDs and store them in a table. Whenever you need one, remove the row at a random index and use it as the ID. The database takes care of concurrency for you, so the ID is guaranteed to be unique. If the first three letters are given, you can simply add a WHERE clause to restrict the rows to those matching the prefix.
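The first option can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python with an in-memory SQLite table standing in for SQL Server, the table name Codes and the helper names are hypothetical, and only the constraint columns come from the question. Codes are upper-cased so that they are case-insensitive. With a 3 character prefix there are still 36^7 = 78,364,164,096 possible suffixes, so retries should be rare until a category fills up.

```python
import secrets
import sqlite3

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # upper-case only: case-insensitive codes


def new_code(prefix="", length=10):
    """Return the optional prefix followed by random characters, `length` in total."""
    prefix = prefix.upper()
    return prefix + "".join(secrets.choice(ALPHABET) for _ in range(length - len(prefix)))


def insert_batch(conn, category_id, count, prefix="", max_attempts=1000):
    """Insert `count` codes, drawing again only when the unique constraint rejects one."""
    created = []
    attempts = 0
    while len(created) < count:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("too many collisions; the key space may be exhausted")
        code = new_code(prefix)
        try:
            conn.execute(
                "INSERT INTO Codes (ColumnCodeText, ColumnCategoryId) VALUES (?, ?)",
                (code, category_id),
            )
        except sqlite3.IntegrityError:
            continue  # duplicate within this category: try another code
        created.append(code)
    conn.commit()
    return created


# Hypothetical table mirroring the two-column constraint from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Codes (ColumnCodeText TEXT, ColumnCategoryId INTEGER, "
    "CONSTRAINT ucCodes UNIQUE (ColumnCodeText, ColumnCategoryId))"
)
batch = insert_batch(conn, category_id=1, count=100, prefix="abc")
```

The whole batch goes through one connection and one commit, so the cost is roughly 100 inserts plus the occasional retry rather than a lookup followed by an insert for every code.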

Related

Should primary key always start from 1?

I am migrating an old database (Oracle) and there are a few tables like CountryCode, DeptCode and RoleCodes. Their primary key is a string (the code), and I am thinking about adding a number column as the primary key because it would be faster in joins. These tables are not really big.
I am wondering whether the primary key for those tables should start from 1, or whether it can start from 100 just to differentiate between the tables' PKs, although I don't think I would be showing them on reports.
For sequence-generated IDs, I would suggest starting at different values if it's easy to do (depends on your database etc). You shouldn't be using this to differentiate between them in code, but it can make testing more reasonable.
Before now, I've had a situation where I accidentally used a foreign key for one table as if it were the foreign key for another table. The tests passed because the IDs were coincidentally the same. After we discovered the problem, we changed the initial seed and found the tests were a lot clearer.
You shouldn't do it to differentiate between tables. That is just not practical.
Not all primary keys have to start at 1, as in the case of an order number.
The rationale you're using to switch to an integer primary key doesn't seem valid: the performance gain you'd see using an INT rather than the original codes (which I assume are strings) will be negligible. The PK is always indexed, and index lookups on strings or numerics are as good as instant. So unless you really need an INT, I'd be tempted to stick with the original data type and work with the original data. That simplifies data migration (which is something that should be considered whilst doing any work).
It is very common for example in ERP systems to define number ranges that
represent a certain group of items.
This can be both as position in a bigger number, e.g.
1234567890
| |
index 4 - 6 represents region code
index 7 - 8 represents dept code...
or, as I suspect in your case, parts at the same place, like
1000 - 1999 Region codes
2000 - 2999 DeptCode
3000 - 3999 RoleCode
Therefore: No, it does not necessarily start with 1.
Bigger ERP Systems have even configuration sections for number ranges!
Now, from a database point of view:
Yes, your tables should always have a primary key!
Having one will tremendously improve performance on average cases.
(But in most database systems, if you do not provide one, one will be
set by the DBMS, which you cannot see and cannot manage. Some DBMSs even
create indices, but that's another story.)
I think the start number or start value of the primary key does not matter.
What is important is that the FK columns of the joined tables hold the same values as the PK of the main table.
A surrogate key can have any values, as long as they are unique. That's what makes it "surrogate" after all - values have no intrinsic meaning on their own, and shouldn't generally even be shown to the user. That being said, you could think about using different seeds, just for testing purposes, as Jon Skeet suggested.
That being said, do you really need to introduce a new (surrogate) key? The existing natural key could actually lead to fewer JOINs [1], and may be useful for clustering. While there are legitimate uses for surrogate keys, don't do it just because it is "fashionable". Always be aware of the tradeoffs you are making and pick the right balance for your concrete needs.
[1] It is automatically "propagated" down foreign keys, so you don't need to JOIN the child table to the parent just to get the natural key: the natural key is already in the child.
Doesn't matter what int the primary key starts from.
Assuming the codes aren't updated regularly, I don't believe that int will be any faster. It more heavily depends on it being a varchar or of a known size.
I personally always have a field named "Id" as the primary key of a table, defined as an int, or a bigint if necessary.
If the table matches up to an enumerated type then I make sure the Id matches the EnumeratedType id which can be any number - so no it doesn't need to start from 1.
If it doesn't match an enumerated type, then I will usually use an auto-incrementing key starting from 1 but this is not always needed.
Note - that if the number of rows is small, then the difference between indexing on a number and on a varchar will be negligible.
Yes, it doesn't matter what integer it starts from. Its main use is to identify each row uniquely and to define relationships with other tables.

How to provide Unique ID in Offline mode?

In a client-server accounting application, when a user saves an invoice in the invoice form, it gets an invoice number like 90134 from the server and is saved with that number. The invoice number is needed for the customer.
So in offline mode (for example, when the network has dropped), how can I provide a unique ID?
Is it good to use a string ID following this pattern: client + incremental number?
I don't want to use GUIDs.
If you know in advance how many invoice numbers you will generate per client during an offline period, would you be able to pre-allocate invoice numbers? e.g. if each client is likely only to generate 4 invoices per offline period, you could allocate a block of 4 numbers to each client. This may involve an extra column in your DB to store a value indicating whether the number is an invoice already created, or a preallocation of a number. Depending on the structure and constraints within your DB, you may also need to store some dummy data to enforce referential integrity.
The downsides would be that your block of numbers may not get used sequentially, or indeed at all, so your invoice numbers would not be in chronological order. Also, you would run into problems if the pool of available numbers is used up.
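The pre-allocation idea can be sketched roughly like this. The class and method names are hypothetical, and a real system would persist the allocations in the database (with the extra status column mentioned above) rather than keep them in memory.

```python
class InvoiceNumberServer:
    """Hands out contiguous blocks of invoice numbers (hypothetical server side)."""

    def __init__(self, next_number=90134):
        self._next = next_number

    def allocate_block(self, size):
        # Reserve `size` numbers; nobody else will ever receive them.
        block = range(self._next, self._next + size)
        self._next += size
        return block


class OfflineClient:
    """Consumes numbers from its pre-allocated block while disconnected."""

    def __init__(self, block):
        self._pool = list(block)

    def next_invoice_number(self):
        if not self._pool:
            # The downside noted above: the pool can run dry while offline.
            raise RuntimeError("pre-allocated invoice numbers used up")
        return self._pool.pop(0)


server = InvoiceNumberServer()
client_a = OfflineClient(server.allocate_block(4))
client_b = OfflineClient(server.allocate_block(4))
numbers_a = [client_a.next_invoice_number() for _ in range(4)]
numbers_b = [client_b.next_invoice_number() for _ in range(2)]
```

Here client_b leaves two of its numbers unused, which illustrates why the resulting invoice numbers may have gaps and may not be in chronological order.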
You can use Guid:
var myUniqueID = Guid.NewGuid();
In SQL Server the corresponding type is uniqueidentifier.
In general, a Guid is a 128-bit number.
You can read more about Guids here:
http://en.wikipedia.org/wiki/Globally_unique_identifier
http://msdn.microsoft.com/en-us/library/system.guid.aspx
I suppose the invoice number (integer) is incremental: in this case, since you have no way of knowing the last invoice number, you could save the invoice in a local db/cache/xml without the invoice Number and wait for the network connection to insert the new records in the DB (the invoice number would be generated then)
You could start your numbers for each client at a different range... e.g.:
client 1: 1,000,000
client 2: 2,000,000
client 3: 3,000,000
Update them every now and then when there is a connection to avoid overlaps.
It's not 100% bulletproof, but at least it's better than nothing.
My favorite would still be a GUID for this, since they're always unique.
There is a workaround, but it is merely a "dirty hack". You should seriously reconsider accepting new data entries while offline, especially when dealing with unique IDs to be inserted in many tables.
Say you have an "orders" table and another "orderDetails" table in your local dataset:
1- add a tmpID of type integer in your "orders" table to temporarily identify each unique order.
2- use the tmpID of your newly created order in the rest of the process (say for adding products to the current order in the orderDetails table)
--> once you are connected to the server, in a single transaction do the following
1- insert the first order in the "orders" table
2- get its uniqueID generated on your SQL server
3- search for each line in "orderDetails" that have a tmpID of currentOrder.tmpID and insert them in the "orderDetails" table on your server
4- commit the transaction and continue to the following row.
Keep in mind that this is very bad coding and that it can get real dirty and hard to maintain.
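For illustration, the remapping step might look like the sketch below. It uses plain in-memory lists instead of a dataset and a real server transaction, and the function name, the field names other than tmpID, and the starting value 501 are assumptions made up for the example.

```python
import itertools


def sync_orders(local_orders, local_details, server_id_source):
    """Replace temporary order IDs with server-generated IDs, one order at a time."""
    synced_orders, synced_details = [], []
    for order in local_orders:
        # Stands in for inserting the order and reading back the server's generated ID.
        server_id = next(server_id_source)
        synced_orders.append({**order, "id": server_id})
        # Re-point every detail line that referenced this order's temporary ID.
        for line in local_details:
            if line["tmpID"] == order["tmpID"]:
                synced_details.append({"orderId": server_id, "product": line["product"]})
    return synced_orders, synced_details


local_orders = [{"tmpID": 1, "customer": "A"}, {"tmpID": 2, "customer": "B"}]
local_details = [
    {"tmpID": 1, "product": "pen"},
    {"tmpID": 2, "product": "ink"},
    {"tmpID": 1, "product": "paper"},
]
orders, details = sync_orders(local_orders, local_details, itertools.count(501))
```

In a real implementation each order and its detail lines would be written inside one transaction, as described in the steps above.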
It looks impossible to create unique numbers on two different systems that are both offline when the numbers must be chronological and without gaps.
IMHO, if the last number (on the server) was 10, there is no way to know whether I should return 11 or 12; I would have to know whether 11 was already used by another person.
The only thing I can imagine is using a temporary number and renumbering later on, but if the invoices are printed and the number cannot be changed, I don't know how you could accomplish such a solution.

.Net to SQL, best way to compare unique values

I have an import process where I take a CSV file in .NET and write it to two user tables in the database. It's a pretty complex process; it takes several minutes to process about five hundred users at a time.
In that process, I need to generate a random string that will be unique to each user, as it gives the user access to some promotions. I can't use GUIDs because it has to be a simple string for the user to type into a splash screen.
What I need to know is the best way to check that each newly generated key doesn't repeat any key already created for the thousands of pre-existing users.
I don't want to add a new query for each inserted row, asking if the string is already there.
Thanks, I hope I was clear enough.
How many users are there compared to the number of possible unique keys?
If there are many more possible keys than there are users then I'd just add a unique constraint on the key column, and generate a new key if you hit a constraint violation.
If you're likely to get a lot of collisions with the above technique then there are a few options open to you:
Pre-generate sets of unique keys, store them in a table somewhere and take one when needed.
Add some uniqueness to the keys: do the users have a unique id that could be incorporated?
You can store each string as a key in a Dictionary. If a string is repeated, adding it will throw an error; you can handle that error and generate a new string for the user.
Hope it will help for you.
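A rough sketch of that in-memory check, written in Python with a set instead of a .NET Dictionary. It assumes the existing keys have been loaded once with a single query before the import starts; the alphabet, the key length and the function name are made up for the example.

```python
import secrets

# Assumption: leave out look-alike characters such as O/0 and I/1 to ease typing.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def generate_keys(existing_keys, count, length=8):
    """Return `count` new keys that clash neither with existing keys nor with each other."""
    seen = set(existing_keys)  # loaded once, instead of one query per inserted row
    fresh = []
    while len(fresh) < count:
        key = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


existing = {"AAAAAAAA", "BBBBBBBB"}
new_keys = generate_keys(existing, 500)
```

This only protects a single import run. If two imports can run at the same time, a unique constraint on the key column is still needed as the final safeguard.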
One simple way could be to use part of an MD5 for the customer + the customerid encoded in hex.
The customerid part ensures uniqueness and the MD5 part ensures that you cannot guess another user's key.
Depending on how short a string you can handle, you can use just the first 6-10 chars of the MD5. If you need to shorten it further, re-encode it using something more compact than hex, like base-64. If you cannot handle mixed case, make your own alphabet: A-Z + 0-9 and maybe some special chars, so that its size is a power of 2 that is easy to map from hex.
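A minimal sketch of that scheme, under the assumption that the hash input includes a secret known only to the server (without one, anybody who knows a customer id could recompute the hash part). The function name, the secret and the separator are hypothetical.

```python
import hashlib


def promo_key(customer_id, secret, hash_chars=6):
    """Fixed-length MD5 prefix (hard to guess) followed by the customer id in hex (unique)."""
    digest = hashlib.md5(f"{secret}:{customer_id}".encode("utf-8")).hexdigest()
    return (digest[:hash_chars] + format(customer_id, "x")).upper()


key_for_255 = promo_key(255, secret="example-secret")
```

Because the hash part has a fixed length, two different customer ids can never produce the same key: the trailing hex digits always differ.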

Auto generation of ID

I need to generate an id with the
following features:
Id must be unique
Id consist of two parts 'type' and 'auto incremented' number
'type' is integer and value can be 1, 2 or 3
'auto incremented' number starts with 10001 and incremented each time id
is generated.
type is selected from a web form and auto incremented number
is from the database.
Example: if type is selected 2 and auto incremented number is 10001
then the generated id is = 210001
There may be hundreds of users generating IDs. Now my question is:
Can this be done without a stored procedure, so that there is no ID conflict?
I am using ASP.Net(C#), Oracle, NHibernate
As you use Oracle, you can use a Sequence for that.
Each time you call your_sequence.NEXTVAL, a unique number is returned.
Why isn't the NHibernate implementation of Hi-Lo acceptable?
What’s the Hi/Lo Algorithm
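For readers unfamiliar with it, the general Hi/Lo idea can be sketched as below. This is a simplified illustration and not NHibernate's actual implementation: each generator fetches a "hi" value from a shared source (one database round trip) and then hands out a whole block of IDs locally. The class name and the callable fetch_next_hi are hypothetical.

```python
import itertools


class HiLoGenerator:
    def __init__(self, fetch_next_hi, max_lo=100):
        self._fetch_next_hi = fetch_next_hi  # called once per block of max_lo IDs
        self._max_lo = max_lo
        self._hi = None
        self._lo = max_lo  # forces a fetch on first use

    def next_id(self):
        if self._lo >= self._max_lo:
            # Block exhausted: reserve a new one from the shared source.
            self._hi = self._fetch_next_hi()
            self._lo = 0
        value = self._hi * self._max_lo + self._lo
        self._lo += 1
        return value


shared_hi = itertools.count(1)  # stands in for a hi value stored in the database
gen_a = HiLoGenerator(lambda: next(shared_hi), max_lo=3)
gen_b = HiLoGenerator(lambda: next(shared_hi), max_lo=3)
ids_a = [gen_a.next_id() for _ in range(4)]
ids_b = [gen_b.next_id() for _ in range(2)]
```

Two generators never overlap because each hi value is handed out only once, at the cost of gaps when a block is not fully used.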
What's the point in having the first digit of the ID to define the type? You should use a separate column for this, and then just use a plain auto-incrementing primary key for the actual ID.
The cleanest way is - as Scott Anderson also said - to use two columns. Each attribute should be atomic, i.e. have only one meaning. With a multi-valued column you'll have to apply functions (substr) to reveal for example the type. Constraints will be harder to define. Nothing beats a simple "check (integer_type in (1,2,3))" or "check (id > 10000)".
As for defining your second attribute - let's call it "id" - the number starting from 10001, you have two good strategies:
1) use one sequence, start with 1, and for display use the expression "10000 + row_number() over (partition by integer_type order by id)", to let the users see the number they want.
2) use three sequences, one for each integer_type, and let them have a start with clause of 10001.
The reason why you should definitely use sequences, is scalability. If you don't use sequences, you'll have to store the current value in some table, and serialize access to that table. And that's not good in a multi user system. With sequences you can set the cache property to reduce almost all contention issues.
Hope this helps.
Regards,
Rob.
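The second strategy (one sequence per type, each starting at 10001) can be illustrated with in-memory counters standing in for the Oracle sequences. The function name is hypothetical; the type and the number are kept as separate values, and the combined ID is built only for display.

```python
import itertools

# One counter per type, standing in for three database sequences that start with 10001.
sequences = {id_type: itertools.count(10001) for id_type in (1, 2, 3)}


def next_display_id(id_type):
    """Return (type, number, display id); only the first two would be stored."""
    if id_type not in sequences:
        raise ValueError("type must be 1, 2 or 3")
    number = next(sequences[id_type])
    return id_type, number, f"{id_type}{number}"


first = next_display_id(2)
```

With real sequences the database guarantees that concurrent callers never receive the same number, which these in-process counters do not.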
If you can't use auto incrementing types such as sequences, have a table containing each type and keeping score of its current value. Be careful to control access to this table and use it to generate new numbers. It is likely it will be a hot spot in your db though.

TSQL: Generate human readable ids

We have a large database of enquiries, and each enquiry is referenced using a Guid. The Guid isn't very customer friendly, so we want to add an additional 5 digit "human id" (this is OK as we very likely won't have more than 99999 enquiries active at any time, and it's OK if a human id references multiple enquiries, as the ids aren't used for anything important).
1) Is there any way to have an IDENTITY column reset to 1 after 99999?
My current workaround is to use an INT IDENTITY(1,1) NOT NULL column and, when presenting a HumanId, take HumanId % 100000.
2) Is there any way to automatically "randomly distribute" the ids over [0..99999] so that two enquiries created one after the other don't get adjacent ids? I guess I'm looking for a two-way one-to-one hash function?
... Ideally I'd like to do this in T-SQL, automatically creating these ids when an enquiry is created.
If performance and concurrency isn't too much of an issue, you can use triggers and the MAX() function to calculate a 'next human ID' value. You probably would want to keep your IDENTITY column as is, and have the 'human ID' in a separate column.
EDIT: On a side note, this sounds like a 'presentation layer' issue, which shouldn't be in your database. Your presentation layer of your application should have the code to worry about presenting a record in a human readable manner. Just a thought...
If you absolutely need to do this in the database, then why not derive your human-friendly value directly from the GUID column?
-- human_id doesn't have to be calculated when you retrieve the data
-- you could create a computed column on the table itself if you prefer
SELECT (CAST(your_guid_column AS BINARY(3)) % 100000) AS human_id
FROM your_table
This will give you a random-ish value between 0 and 99999, derived from the first 3 bytes of the GUID. If you want a larger, or smaller, range then adjust the divisor accordingly.
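Values derived from a GUID can collide. If a true two-way, one-to-one mapping over [0..99999] is wanted, as the question asks, one simple option is an affine map on the identity value. The sketch below is only an illustration: the multiplier and offset are arbitrary assumptions, and the result merely scatters consecutive identities, it is not random or secret.

```python
MODULUS = 100_000
MULTIPLIER = 61_803  # assumption: any odd number not divisible by 5 is coprime to 100000
OFFSET = 12_345      # assumption: arbitrary constant
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse, requires Python 3.8+


def to_human_id(identity_value):
    """Map an identity value to a human id in [0, 99999]; one-to-one per 100000 values."""
    return (identity_value * MULTIPLIER + OFFSET) % MODULUS


def from_human_id(human_id):
    """Invert to_human_id, returning the identity value modulo 100000."""
    return ((human_id - OFFSET) * INVERSE) % MODULUS
```

Because the multiplier is coprime to 100000, every block of 100000 consecutive identity values maps onto all 100000 human ids exactly once, and consecutive enquiries get ids that are 61803 apart (modulo 100000) rather than adjacent.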
I would strongly recommend relooking at your logic. Your approach has a few dangers, including:
It is always a bad idea to re-use ID's, even if the original record has become "obsolete" - do you lose anything by continuing to grow ID's beyond 99999? The problem here is more likely to be with long term maintenance, especially if there is any danger of the system developing over time. Another thing to consider - is there any chance a user will take this reference number, and use it to reference your system at some stage in the future?
When manually assigning a generated / random ID, you will need to ensure that multiple records are not assigned the same ID. There are a few ways to do this (for example, using transactions); however, you should ensure that the scope of the transactions does not leave you open to problems with concurrent transactions being blocked, which may hurt performance, among other things. You may be best served by generating your ID externally (as SQL does not do random especially well), and then enforcing a unique constraint on your DB, perhaps in the way suggested by Firoz Ansari.
If you still want to reset the identity column, this can be done with the DBCC CHECKIDENT command.
An example of generating random seeds in SQL server can be found here:
http://weblogs.sqlteam.com/jeffs/archive/2004/11/22/2927.aspx
You can create composite primary key with two columns, say..BatchId and HumanId.
Records in these columns will look like this:
BatchId, HumanId
1, 1
1, 2
1, 3
.
.
1, 99998
1, 99999
2, 1
2, 2
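2, 3
Use MAX or ORDER BY DESC to get the last assigned HumanId in the latest BatchId, then derive the next one:
DECLARE @NextHumanId INT;
-- Read the most recently assigned HumanId in the latest batch
SELECT TOP 1 @NextHumanId = HumanId
FROM [THAT_TABLE]
ORDER BY BatchId DESC, HumanId DESC;
-- Wrap around to 1 (and move on to the next BatchId) after 99999
SET @NextHumanId = CASE WHEN @NextHumanId >= 99999 THEN 1 ELSE ISNULL(@NextHumanId, 0) + 1 END;
Hope this helps.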
You could have a table of available HUMANIDs, each time you add an enquiry you could randomly pull a HUMANID from the table (and DELETE it), and each time you delete the enquiry you could add it back (by INSERTing).
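The pool idea can be sketched in memory as follows. In the database the take step would be a random SELECT plus DELETE inside one transaction; here a list stands in for the table, and the class name is hypothetical.

```python
import random


class HumanIdPool:
    """Pool of free human ids: take one at random, give it back when the enquiry is deleted."""

    def __init__(self, size=100_000, rng=None):
        self._free = list(range(size))
        self._rng = rng or random.Random()

    def take(self):
        if not self._free:
            raise RuntimeError("no free human ids left")
        # Swap a random entry to the end so that removing it is O(1).
        i = self._rng.randrange(len(self._free))
        self._free[i], self._free[-1] = self._free[-1], self._free[i]
        return self._free.pop()

    def release(self, human_id):
        self._free.append(human_id)


pool = HumanIdPool(size=10)
taken = [pool.take() for _ in range(10)]
```

Unlike the modulo approaches, this never hands the same id to two active enquiries, as long as the pool is at least as large as the number of active enquiries.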
