Create random string and check its uniqueness in DB - c#

In the code below, RandomMan.MyRandomString(64) generates a random string of 64 characters.
I want to check whether this random string is unique in the database, using an Entity Framework query like the one below. If the string is not unique, the do loop continues until it finds a unique random string. Am I doing this correctly, or is there a better way?
string randstr;
do {
    randstr = RandomMan.MyRandomString(64);
} while (DataCtx.StorageFiles.Any(x => x.AwsUniqueFileName == randstr));

I cannot tell whether you are doing it correctly, but if the row already exists in the DB, I would suggest appending its identity field (id) to the generated string. That guarantees the result is unique in the DB, provided MyRandomString produces only letters (or at least never ends in a digit).
For example, if the generated string is abc and the id of the row you are updating is 53, the final unique string is abc53.

The standard approach for this is to just generate a GUID:
Console.WriteLine(Guid.NewGuid());
It's designed to be unique, and two identical GUIDs are highly unlikely even when many instances generate them at the same time, so you don't need to worry much about the atomicity of this operation.
The probability of a collision is so low that you can skip handling it altogether. To be safe, you can put a unique key on this column and treat a violation as an exception; there is certainly no need for a loop.
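A minimal sketch of the GUID suggestion, assuming the AwsUniqueFileName column can hold a 32 character key instead of the original 64. The names UniqueNames and NewFileKey are made up for illustration.

```csharp
using System;

public static class UniqueNames
{
    // Format "N" yields 32 hexadecimal digits with no hyphens or braces.
    public static string NewFileKey()
    {
        return Guid.NewGuid().ToString("N");
    }
}
```

With a unique index on the column, a duplicate would surface as an exception when the row is saved (in Entity Framework this is typically a DbUpdateException from SaveChanges), so the do/while lookup is not needed.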

Related

Generating Unique & Random codes in SQL Server and .NET

I have a requirement to generate a semi-random code in C#/ASP.NET that has to be unique in the SQL Server database.
These codes need to be generated in batches of up to 100 codes per run.
Given the requirements, I'm not sure how I can do this without generating a code and then checking the database to see if it exists, which seems like a horrible way of doing it.
Here are the requirements:
Maximum 10 characters long (alpha-numeric only)
Must not be case sensitive
User can specify an optional 3 character prefix for the code
Must not violate 2 column unique constraint in the database, i.e. must be a unique "code text" within the "category" (CONSTRAINT ucCodes UNIQUE (ColumnCodeText, ColumnCategoryId))
So, given the 10 character limit, GUIDs are not an option. Given the case insensitivity requirement, I think the mathematical probability of database collisions is fairly high.
At the same time, there are enough possible combinations that a straight look-up table in the DB would be prohibitive, I believe.
Is there a reasonably performant way of generating codes with these requirements that doesn't involve saving them to the DB one code at a time and waiting for a unique key violation to see if it goes through?
You have two options here.
1) Generate a new ID and insert it. If the insert throws a duplicate unique key exception, try again until you succeed, or bail out if you run out of IDs. Performance will be poor if most of the IDs are already used up.
2) Pregenerate all the possible IDs and store them in a table. Whenever you need one, remove a row at a random index and use that as the ID. The database takes care of concurrency for you, so the ID is guaranteed to be unique. If the first three letters are given, simply add a WHERE clause to restrict the rows to those matching the prefix.
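The second option can be sketched in memory as follows. This is only an illustration: the class CodePool and its members are hypothetical, the list stands in for the database table, and a real implementation would do the pick-and-delete inside a transaction so the database handles concurrency.

```csharp
using System;
using System.Collections.Generic;

public sealed class CodePool
{
    private readonly List<string> _free;
    private readonly Random _rng;

    public CodePool(IEnumerable<string> allCodes, int seed)
    {
        _free = new List<string>(allCodes);
        _rng = new Random(seed);
    }

    public int Remaining { get { return _free.Count; } }

    // Removes a random unused code whose start matches the prefix
    // (case-insensitive). Returns false when no such code is left.
    public bool TryTake(out string code, string prefix = "")
    {
        var candidates = new List<int>();
        for (int i = 0; i < _free.Count; i++)
        {
            if (_free[i].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                candidates.Add(i);
        }

        if (candidates.Count == 0)
        {
            code = "";
            return false;
        }

        int pick = candidates[_rng.Next(candidates.Count)];
        code = _free[pick];

        // Swap the last element into the hole, then drop the last slot.
        _free[pick] = _free[_free.Count - 1];
        _free.RemoveAt(_free.Count - 1);
        return true;
    }
}
```

Because a code leaves the pool the moment it is handed out, no retry loop is needed; the cost is storing every possible code up front.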

Reduce Number of Primary Keys

I am working on a project which reads information from CSV files posted twice a day and stores the info into a database. Each CSV file may contain rows from previous files. Unfortunately, to get a unique row in the CSV files, you have to assign 8 columns as the primary key. I feel that this is ridiculous to work with. So, I really want to reduce the number down to one. So far, the only idea I have is to create a hash of all of the primary key columns or just append them all into one string. Before I do this, I'd like to know if there might be a better way to reduce the 8 primary keys down to one.
PK columns are defined as:
// ....
table.Columns.Add("plantNumber",typeof(string)); //e.g. 341
table.Columns.Add("shipLocation",typeof(string)); //e.g. 11000047
table.Columns.Add("shipDate",typeof(DateTime)); //e.g. 2017/04/18 00:00
table.Columns.Add("releaseNumber",typeof(string)); //e.g. VH6516128
table.Columns.Add("releaseDate",typeof(DateTime)); //e.g. 2017/04/14
table.Columns.Add("orderNumber",typeof(string)); //e.g. 216967
table.Columns.Add("orderLine",typeof(string)); //e.g. 0011
table.Columns.Add("sequence",typeof(string)); //e.g. 044
// ....
table.PrimaryKey = new DataColumn[]
{
    table.Columns["plantNumber"],
    table.Columns["shipLocation"],
    table.Columns["shipDate"],
    table.Columns["releaseDate"],
    table.Columns["releaseNumber"],
    table.Columns["orderNumber"],
    table.Columns["orderLine"],
    table.Columns["sequence"],
};
Note: the reason many of the seemingly numeric fields are treated as a string instead of an int is that they are quoted in the CSV file and may begin with zeros, which I need to preserve. I also cannot be 100% certain that they will never contain letters.
UPDATE:
I don't consider an auto-incremental number to be a good solution, because I still need to ensure, not only within the SQL DB but also within the DataTable itself, that the combination of the 8 columns is unique. The individual columns are not unique by themselves; only the combination is.
To me that is not a primary key. The primary key isn't 'the only thing unique' in your row. A unique index can do the same for you.
A primary key (in my opinion) should just be a single, often numerical, value that technically identifies the row. Functionally, something else can define a row as unique, as in your sample here, but I wouldn't make that the primary key for that reason alone.
There is nothing wrong with a compound index. That's how relational databases work. If you really have to, you could concatenate or hash the 8 values that make up the unique key into a single column, but that has the adverse effect of making your data static: whenever one of the 8 values changes, the hash or concatenated value has to be rebuilt.
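If you do go the hash route, here is a minimal sketch. It assumes every column is first converted to a string in a fixed format (for example dates as yyyy-MM-dd) and that the separator character U+001F never occurs in the data; CompositeKey and Hash are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class CompositeKey
{
    // Joins the parts with a separator so that ("ab", "c") and ("a", "bc")
    // produce different inputs, then returns the SHA-256 digest as
    // 64 uppercase hexadecimal characters.
    public static string Hash(params string[] parts)
    {
        string joined = string.Join("\u001F", parts);
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(joined));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

Usage would look like CompositeKey.Hash("341", "11000047", "2017-04-18", "VH6516128", "2017-04-14", "216967", "0011", "044"). The separator matters: plain concatenation without one can map two different rows to the same string.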

Find a best way to search data structure string compare

I have a question about which data structure is best for a particular situation.
We have one string, "AAAAAAAAAAA", and we want to know whether a database column contains it.
For example, the table below has two columns, ID and Name:
ID     Name
1      A
2      B
3      C
.....
49581  AAAAAAAAAAA
If there is a match, return true; otherwise return false.
I know I can use a List<string>, but I don't think it is the best way to search.
I want to know which data structure is best for searching in this case.
A HashSet<string> would be faster to search than a List<string> if you only need to know whether the string exists.
HashSet<T> Class
..or if you feel adventurous, creating a "ternary search tree" or a "trie" may be an option:
http://www.drdobbs.com/database/ternary-search-trees/184410528
Similar to another answer, but note that if you have a hash table then for each hashed string in the column you can store the row number(s) that have that string in the hash table position for the string. So hashing is not just limited to determining whether the string exists in your column or not.
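A small sketch of the hash table idea from the last answer, assuming the rows have already been loaded from the database as (ID, Name) pairs; NameIndex and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class NameIndex
{
    // Maps each name to the IDs of all rows that contain it.
    public static Dictionary<string, List<int>> Build(IEnumerable<KeyValuePair<int, string>> rows)
    {
        var index = new Dictionary<string, List<int>>(StringComparer.Ordinal);
        foreach (var row in rows)
        {
            List<int> ids;
            if (!index.TryGetValue(row.Value, out ids))
            {
                ids = new List<int>();
                index[row.Value] = ids;
            }
            ids.Add(row.Key);
        }
        return index;
    }
}
```

index.ContainsKey("AAAAAAAAAAA") then answers the true/false question, and index["AAAAAAAAAAA"] gives the matching row IDs. If only the yes/no answer is needed, a plain HashSet<string> with Contains is enough.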

.Net to SQL, best way to compare unique values

I have an import process where I take a CSV file in .NET and write it to two user tables in the database. It's a pretty complex process; it takes several minutes to process about five hundred users at a time.
In that process, I need to generate a random string that is unique to each user, as it gives them access to some promotions. I can't use GUIDs because it has to be a simple string that the user can type into a splash screen.
What I need to know is the best way to check that each newly generated key doesn't repeat any key already created for the thousands of pre-existing users.
I don't want to add a new query for each inserted row, asking whether the string is already there.
Thanks, I hope I was clear enough.
How many users are there compared to the number of possible unique keys?
If there are many more possible keys than there are users then I'd just add a unique constraint on the key column, and generate a new key if you hit a constraint violation.
If you're likely to get a lot of collisions with the above technique then there are a few options open to you:
Pre-generate sets of unique keys, store them in a table somewhere and take one when needed.
Add some uniqueness to the keys: do the users have a unique id that could be incorporated?
You can store each string as a key in a Dictionary. If a string is repeated, adding it throws an error (Dictionary.Add throws an ArgumentException for a duplicate key); you can handle that error and generate a new string for the user.
Hope this helps.
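A sketch of that in-memory check, swapping the Dictionary for a HashSet<string>, whose Add simply returns false on a duplicate, so no exception handling is needed. It assumes the existing keys are loaded once into the set before the import, and that the key space is much larger than the number of users (otherwise the loop could spin for a long time). PromoKeys, Generate, and the alphabet are illustrative choices, and System.Random is not suitable if the keys must be hard to guess.

```csharp
using System;
using System.Collections.Generic;

public static class PromoKeys
{
    // 32 symbols; 0, O, 1 and I are left out because they are easy to mistype.
    private const string Alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    // Generates `count` keys that are new with respect to `taken`,
    // and records each of them in `taken`.
    public static List<string> Generate(int count, int length, ISet<string> taken, Random rng)
    {
        var result = new List<string>(count);
        var buffer = new char[length];
        while (result.Count < count)
        {
            for (int i = 0; i < length; i++)
                buffer[i] = Alphabet[rng.Next(Alphabet.Length)];

            string key = new string(buffer);
            if (taken.Add(key))   // false means the key already exists
                result.Add(key);
        }
        return result;
    }
}
```

Create the set with StringComparer.OrdinalIgnoreCase if keys should be compared without regard to case. A unique constraint on the column is still worth keeping as the final safeguard.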
One simple way could be to use part of an MD5 for the customer + the customerid encoded in hex.
The customerid part ensures uniqueness and the MD5 part ensures that you cannot guess another user's key.
Depending on how short a string you can handle, you can use just the first 6-10 chars of the MD5. If you need to shorten it further, re-encode it using something more compact than hex, like base-64. If you cannot handle mixed case, make your own alphabet, A-Z + 0-9 and maybe some special chars, so that the number of symbols is a power of 2, which is easy to map from hex.
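Roughly, that could look like the sketch below. The answer does not say what exactly gets hashed, so this assumes a server-side secret combined with the customer id, and non-negative ids; CustomerKey, Create, and the secret parameter are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class CustomerKey
{
    // Key = first `hashChars` hex digits of MD5(secret + ":" + id)
    //       followed by the customer id in uppercase hex.
    // The fixed-length prefix means the suffix alone identifies the customer.
    public static string Create(int customerId, string secret, int hashChars)
    {
        if (hashChars < 1 || hashChars > 32)
            throw new ArgumentOutOfRangeException("hashChars");

        string input = secret + ":" + customerId.ToString(CultureInfo.InvariantCulture);
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            string hex = BitConverter.ToString(digest).Replace("-", "");
            return hex.Substring(0, hashChars) + customerId.ToString("X");
        }
    }
}
```

For example, CustomerKey.Create(255, "s3cret", 6) gives 6 hex digits followed by FF. Since the id part is unique per customer, no database lookup is needed to guarantee uniqueness.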

Auto generation of ID

I need to generate an id with the following features:
Id must be unique
Id consists of two parts: 'type' and an 'auto incremented' number
'type' is an integer and its value can be 1, 2 or 3
The 'auto incremented' number starts at 10001 and is incremented each time an id is generated.
type is selected from a web form and the auto incremented number comes from the database.
Example: if the selected type is 2 and the auto incremented number is 10001, then the generated id is 210001
There may be hundreds of users generating ids. Now my question is: can this be done without a stored procedure, so that there is no id conflict?
I am using ASP.Net (C#), Oracle, NHibernate
As you use Oracle, you can use a Sequence for that.
Each time you call your_sequence.NEXTVAL, a unique number is returned.
Why isn't the NHibernate implementation of Hi-Lo acceptable?
What’s the Hi/Lo Algorithm
What's the point in having the first digit of the ID to define the type? You should use a separate column for this, and then just use a plain auto-incrementing primary key for the actual ID.
The cleanest way is - as Scott Anderson also said - to use two columns. Each attribute should be atomic, i.e. have only one meaning. With a multi-valued column you'll have to apply functions (substr) to reveal for example the type. Constraints will be harder to define. Nothing beats a simple "check (integer_type in (1,2,3))" or "check (id > 10000)".
As for defining your second attribute - let's call it "id" - the number starting from 10001, you have two good strategies:
1) use one sequence, start with 1, and for display use the expression "10000 + row_number() over (partition by integer_type order by id)", to let the users see the number they want.
2) use three sequences, one for each integer_type, and let them have a start with clause of 10001.
The reason why you should definitely use sequences, is scalability. If you don't use sequences, you'll have to store the current value in some table, and serialize access to that table. And that's not good in a multi user system. With sequences you can set the cache property to reduce almost all contention issues.
Hope this helps.
Regards,
Rob.
If you can't use auto incrementing types such as sequences, have a table containing each type and keeping score of its current value. Be careful to control access to this table and use it to generate new numbers. It is likely it will be a hot spot in your db though.
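To make the "one counter per type, starting at 10001" idea concrete, here is an in-process sketch. It is only a stand-in: it is safe across threads in a single process, but with several web servers the numbers must come from the database (for example the Oracle sequences suggested above). TypedIdGenerator and Next are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class TypedIdGenerator
{
    // One counter per type (index 0 is unused). Each starts at 10000,
    // so the first number handed out is 10001.
    private static readonly int[] Counters = { 0, 10000, 10000, 10000 };

    public static string Next(int type)
    {
        if (type < 1 || type > 3)
            throw new ArgumentOutOfRangeException("type");

        // Atomic increment: two threads can never receive the same number.
        int number = Interlocked.Increment(ref Counters[type]);

        return type.ToString(CultureInfo.InvariantCulture)
             + number.ToString(CultureInfo.InvariantCulture);
    }
}
```

The first call to TypedIdGenerator.Next(2) returns 210001, matching the example in the question.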
