I have a dataset with a column "person_code".
This column contains the following data:
"Carra"
"Carra " -> one trailing space
Now if I set the primary key of the dataset to the column "person_code", I get the following error:
"These columns don't currently have unique values."
Any way around this? The best I can think of is to add a new column "primary_key" and then replace the ending/starting spaces with another sign. This will cause some extra problems: if I replace them with _ and there's already a Carra_ in the database...
Is there a better way?
This is a database schema question, and the answer to your question is that as you have described it the person_code column is not suitable as a primary key, as its values are not unique.
Your primary key should be 100% unique - usually an incrementally generated number is a suitable choice.
Primary keys are ALWAYS unique, so if two people have the same person code, this will indeed cause the problem.
Now, it seems that they are not entirely identical (as you say, one has a trailing space). My guess is that your code removes the trailing space by default. You might want to put a symbol at the start and at the end, like 'Carra' and 'Carra ', to make them different.
You may need to pick a symbol other than that, though. Try one that you know will never be in your data.
Creating a new primary key field, preferably with an int or uniqueidentifier for the data type is the suggested solution for your problem and long term maintenance of your database.
Trailing spaces in a value are, in general, a bad idea, both from a UI perspective and for data integrity. Consider cleansing the data by stripping leading and trailing spaces, special characters, etc.
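Before cleansing, it helps to know which values would collide once the spaces are gone. A minimal sketch in Python (the question does not show its loading code, so the helper name `find_collisions_after_trim` and the sample values are only illustrative):

```python
from collections import defaultdict

def find_collisions_after_trim(values):
    """Group raw values by their trimmed form; keep groups with more than one member."""
    groups = defaultdict(list)
    for raw in values:
        groups[raw.strip()].append(raw)
    return {key: raws for key, raws in groups.items() if len(raws) > 1}

codes = ["Carra", "Carra ", "Smith"]
collisions = find_collisions_after_trim(codes)
print(collisions)
```

Any group it reports has to be merged or renamed by hand before the trimmed column can serve as a key.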
Related
I am working on a project which reads information from CSV files posted twice a day and stores the info into a database. Each CSV file may contain rows from previous files. Unfortunately, to get a unique row in the CSV files, you have to assign 8 columns as the primary key. I feel that this is ridiculous to work with. So, I really want to reduce the number down to one. So far, the only idea I have is to create a hash of all of the primary key columns or just append them all into one string. Before I do this, I'd like to know if there might be a better way to reduce the 8 primary keys down to one.
PK columns are defined as:
// ....
table.Columns.Add("plantNumber",typeof(string)); //e.g. 341
table.Columns.Add("shipLocation",typeof(string)); //e.g. 11000047
table.Columns.Add("shipDate",typeof(DateTime)); //e.g. 2017/04/18 00:00
table.Columns.Add("releaseNumber",typeof(string)); //e.g. VH6516128
table.Columns.Add("releaseDate",typeof(DateTime)); //e.g. 2017/04/14
table.Columns.Add("orderNumber",typeof(string)); //e.g. 216967
table.Columns.Add("orderLine",typeof(string)); //e.g. 0011
table.Columns.Add("sequence",typeof(string)); //e.g. 044
// ....
table.PrimaryKey = new DataColumn[]
{
table.Columns["plantNumber"],
table.Columns["shipLocation"],
table.Columns["shipDate"],
table.Columns["releaseDate"],
table.Columns["releaseNumber"],
table.Columns["orderNumber"],
table.Columns["orderLine"],
table.Columns["sequence"],
};
Note: the reason many of the seemingly numeric fields are treated as a string instead of an int is that they are quoted in the CSV file and may begin with zeros, which I need to preserve. I also do not know with 100% certainty that they won't ever contain letters.
UPDATE:
I don't consider an auto-incremental number to be a good solution, because I still need to ensure, not only within the SQL DB but also within the DataTable itself, that the combination of the 8 columns is unique. The individual columns by themselves are not unique; only the combination of the columns is.
To me that is not a primary key. The primary key isn't 'the only thing unique' in your row. A unique index can do the same for you.
A primary key (in my opinion), should just be a single (often) numerical value to technically represent the data as unique. Functionally something else can define a row as unique, as you have in your sample here, but I wouldn't make that the primary key just for that reason only.
There is nothing wrong with a compound index; that's how relational databases work. If you really have to, you could concatenate or hash the 8 values that make up the unique key into a single column, but that has the adverse effect of making your data static unless you rebuild the hash/concatenation whenever one of the values changes.
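If you do go the hash route, here is a rough sketch in Python of what it could look like. The separator choice, the use of SHA-256, and the pre-formatted date strings are all assumptions on my part, not something from the question. The separator matters: without it, ("ab", "c") and ("a", "bc") would produce the same key.

```python
import hashlib

SEPARATOR = "\x1f"  # ASCII unit separator; assumed never to occur in the data

def row_key(parts):
    """Derive a single fixed-length key from the ordered key columns."""
    joined = SEPARATOR.join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# The eight key columns of one row, already converted to strings (dates in a fixed format).
row = ["341", "11000047", "2017-04-18T00:00", "VH6516128",
       "2017-04-14", "216967", "0011", "044"]
key = row_key(row)
print(key)
```

The key has to be recomputed whenever any of the eight values changes, which is the "static data" drawback mentioned above.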
I am pulling data from SQL Server into my C# project. I have some textboxes that update from page to page. One textbox in particular is set up to accept only two characters, and its column is defined in the database as char(2). If I delete those two characters and click my button to update the database and go to the next page, it stores two spaces. I need it to be empty, with no spaces. This issue does not occur in my other textboxes. The database allows the column to be null. I am able to manually enter "null" in the database, but I need that to happen when I erase the two chars and update.
A column declared as CHAR(2) may contain one of the following:
2 characters
NULL.
A column declared as CHAR(2) may not contain any of the following:
0 characters
1 character
3 characters
etc.
When you try to store anything other than either 2 characters or NULL in the column, your database will troll you in the name of some ill-conceived notion of convenience: instead of generating an error, it will store something other than what you gave it to store.
(Amusingly enough, receiving an error when doing something wrong is, and historically has been, regarded as an inconvenience by a surprisingly large portion of programmers. But that's okay, that's how we get stackoverflow questions to answer.)
Specifically, your database will pad the value you are storing with spaces, to match the length of the column. So, if you try to store just one character, it will add one space. If you try to store zero characters, it will add two spaces.
Possible Solutions:
If you have the freedom to change the type of the column:
Declare it as VARCHAR(2) instead of CHAR(2), so that it will contain exactly what you store in it.
If you do not have the freedom to change the type of the column:
You always have to check manually whether you are about to store an empty string in it and, if so, store NULL instead.
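The manual check can be as small as one helper applied to the value before it goes into the parameterized statement. A sketch in Python with SQLite standing in for the real database (SQLite does not pad CHAR columns, so it only demonstrates the NULL substitution; the table, column, and helper names are made up). In ADO.NET the equivalent is to pass DBNull.Value as the parameter value.

```python
import sqlite3

def empty_to_null(text):
    """Return None for empty or whitespace-only input, otherwise the text unchanged."""
    if text is None or text.strip() == "":
        return None
    return text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, state_code CHAR(2))")
# An erased textbox yields "", which is stored as NULL rather than as spaces.
conn.execute("INSERT INTO person (id, state_code) VALUES (?, ?)", (1, empty_to_null("")))
conn.execute("INSERT INTO person (id, state_code) VALUES (?, ?)", (2, empty_to_null("NY")))
stored = conn.execute("SELECT id, state_code FROM person ORDER BY id").fetchall()
print(stored)
```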
Note about Oracle
The Oracle RDBMS before version 11g (and perhaps also in more recent versions; I am not sure, so if someone knows, please leave a comment) will do that last conversion for you: if you try to store an empty string, it will store NULL instead. This is extremely treacherous for the following reasons:
It is yet one more example of the database system trolling you by storing something very different from what you gave it to store.
They apply the same rule to all types of character columns, even VARCHAR, which means that you cannot have an empty string even in columns that could accommodate one; you store either NULL or an empty string, you always get NULL back.
This behavior is completely different from the behavior of any other RDBMS.
The behavior is fixed, there is no way to configure Oracle to quit doing that.
I am migrating an old database (Oracle) and there are a few tables, like CountryCode, DeptCode and RoleCodes, whose primary key is a string (the code). I am thinking about adding a number column as the primary key because it would be faster for joins. These tables are not really big.
I am wondering whether the primary key for those tables should start from 1, or whether it can start from 100 just to differentiate between the tables' PKs, although I don't think I would be showing them on reports.
For sequence-generated IDs, I would suggest starting at different values if it's easy to do (depends on your database etc). You shouldn't be using this to differentiate between them in code, but it can make testing more reasonable.
Before now, I've had a situation where I accidentally used a foreign key for one table as if it were the foreign key for another table. The tests passed because the IDs were coincidentally the same. After we discovered the problem, we changed the initial seed and found the tests were a lot clearer.
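To make that concrete, here is a toy illustration in Python with plain dictionaries standing in for tables (all names and seed values are invented for the example):

```python
def make_table(names, seed):
    """Assign consecutive surrogate ids to names, starting at seed."""
    return {seed + offset: name for offset, name in enumerate(names)}

# Same seed: a department id happens to be a valid role id, so the bug is silent.
departments = make_table(["Sales", "Finance"], seed=1)
roles = make_table(["Admin", "Clerk"], seed=1)
dept_id = 1
wrong_lookup = roles.get(dept_id)   # should have been departments.get(dept_id)

# Different seeds: the same mistake finds nothing, so a test fails immediately.
roles_offset = make_table(["Admin", "Clerk"], seed=1000)
caught = roles_offset.get(dept_id)

print(wrong_lookup, caught)
```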
You shouldn't do it to differentiate between tables. That is just not practical.
Not all primary keys have to start at 1, as in the case of an order number.
The rationale you're using to switch to an integer primary key doesn't seem valid: the performance gain you'd see using an INT rather than the original codes (which I assume are strings) will be negligible. The PK is always indexed, and index lookups on strings or numerics are as good as instant. So unless you really need an INT, I'd be tempted to stick with the original data type and work with the original data; it simplifies data migration (which is something that should be considered whilst doing any work).
It is very common, for example in ERP systems, to define number ranges that represent a certain group of items.
This can be either as a position within a bigger number, e.g.
1234567890
index 4 - 6 represents region code
index 7 - 8 represents dept code...
or, as I suspect in your case, parts at the same place, like
1000 - 1999 Region codes
2000 - 2999 DeptCode
3000 - 3999 RoleCode
Therefore: no, it does not necessarily start with 1.
Bigger ERP Systems have even configuration sections for number ranges!
Now, from a database point of view:
Yes, your tables should always have a primary key!
Having one will tremendously improve performance on average cases.
(But in most database systems, if you do not provide one, one will be set by the DBMS, which you do not see and cannot handle. Some DBMSs even create indices, but that's another story.)
I don't think the starting number or starting value of the primary key matters.
What is important is that the FKs of the joined tables hold the same values as the PK of the MAIN table.
A surrogate key can have any values, as long as they are unique. That's what makes it "surrogate" after all - values have no intrinsic meaning on their own, and shouldn't generally even be shown to the user. That being said, you could think about using different seeds, just for testing purposes, as Jon Skeet suggested.
That being said, do you really need to introduce a new (surrogate) key? The existing natural key could actually lead to fewer JOINs [1], and may be useful for clustering. While there are legitimate uses for surrogate keys, don't do it just because it is "fashionable": always be aware of the tradeoffs you are making and pick the right balance for your concrete needs.
[1] It is automatically "propagated" down foreign keys, so you don't need to JOIN the child table to the parent just to get the natural key; the natural key is already in the child.
Doesn't matter what int the primary key starts from.
Assuming the codes aren't updated regularly, I don't believe that int will be any faster. It more heavily depends on it being a varchar or of a known size.
I personally always have a field named "Id" as the primary key of a table, defined as an int, or a bigint if necessary.
If the table matches up to an enumerated type then I make sure the Id matches the EnumeratedType id which can be any number - so no it doesn't need to start from 1.
If it doesn't match an enumerated type, then I will usually use an auto-incrementing key starting from 1 but this is not always needed.
Note that if the number of rows is small, the difference between indexing on a number and on a varchar will be negligible.
Yes, it doesn't matter what integer it starts from. Its main use is to identify each row uniquely and to define relationships with other tables.
I have an import process where I take a CSV file in .NET and write to two user tables in the database. It's a pretty complex process; it takes several minutes to process about five hundred users at a time.
In that process, i need to generate a string, random string, that will be unique to each user, as it gives him access to some promotions. I can't use GUIDs because it has to be a simple string for the user to input in a splash screen.
What I need to know is, what is the best way to check if each newly generated key doesn't repeat any already created in thousands of pre-existing users.
I don't want to add a new query in each inserted row, asking if the string is already there.
Thanks, I hope I was clear enough.
How many users are there compared to the number of possible unique keys?
If there are many more possible keys than there are users then I'd just add a unique constraint on the key column, and generate a new key if you hit a constraint violation.
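A rough sketch of that retry loop, in Python with SQLite so it runs anywhere. The table layout, the alphabet, and the function names are my assumptions; the point is only that the unique constraint does the checking, so no extra SELECT is needed per row.

```python
import secrets
import sqlite3

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # assumed: skips easily confused characters

def random_code(length=8):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def insert_user_with_code(conn, name, make_code=random_code, max_attempts=10):
    """Insert a user, regenerating the code whenever the unique constraint rejects it."""
    for _ in range(max_attempts):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO users (name, promo_code) VALUES (?, ?)", (name, code)
                )
            return code
        except sqlite3.IntegrityError:
            continue  # collision: try another code
    raise RuntimeError("could not find a free code")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, promo_code TEXT UNIQUE)")
print(insert_user_with_code(conn, "alice"))
```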
If you're likely to get a lot of collisions with the above technique then there are a few options open to you:
Pre-generate sets of unique keys, store them in a table somewhere and take one when needed.
Add some uniqueness to the keys: do the users have a unique id that could be incorporated?
You can store each string as a key in a Dictionary. If a string is repeated, adding it will raise an error; you can handle that error and generate a new string for the user.
Hope this helps.
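The same in-memory idea, sketched in Python with a set instead of a dictionary (the function name, code length, and alphabet are assumptions). The existing codes are loaded once up front, so there is no query per inserted row:

```python
import secrets

def generate_unique_codes(count, existing, length=8,
                          alphabet="ABCDEFGHJKLMNPQRSTUVWXYZ23456789"):
    """Generate `count` codes that differ from each other and from `existing`."""
    seen = set(existing)
    fresh = []
    while len(fresh) < count:
        code = "".join(secrets.choice(alphabet) for _ in range(length))
        if code in seen:
            continue  # duplicate: discard and draw again
        seen.add(code)
        fresh.append(code)
    return fresh

existing_codes = {"ABCD2345", "WXYZ6789"}  # in practice, loaded once from the database
new_codes = generate_unique_codes(500, existing_codes)
print(new_codes[:3])
```

A unique constraint on the column is still worth having as a safety net in case two imports run at the same time.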
One simple way could be to use part of an MD5 for the customer + the customerid encoded in hex.
The customerid part ensures uniqueness and the MD5 part ensures that you cannot guess another user's key.
Depending on how short a string you can handle, you can use just the first 6-10 characters of the MD5. If you need to shorten it further, re-encode it using something more compact than hex, like base-64; or, if you cannot handle mixed case, make your own selection: A-Z + 0-9 and maybe some special characters, to get an alphabet whose size is a power of 2 and is easy to map from hex.
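A minimal sketch of the hex variant in Python. The `SECRET` value and the exact string being hashed are my additions: without some secret input, anyone who knows the scheme could recompute the MD5 part for another customer.

```python
import hashlib

SECRET = "change-me"  # hypothetical secret salt, not part of the original suggestion

def promo_key(customer_id, customer_name, hash_chars=6):
    """Fixed-length MD5 prefix followed by the customer id in hex."""
    text = f"{SECRET}:{customer_id}:{customer_name}"
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    # The prefix has a fixed length, so the id part is unambiguous and keeps keys unique.
    return f"{digest[:hash_chars]}{customer_id:x}".upper()

key = promo_key(255, "Ann")
print(key)
```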
I need to generate an id with the following features:
Id must be unique
Id consists of two parts: 'type' and an 'auto incremented' number
'type' is an integer and its value can be 1, 2 or 3
The 'auto incremented' number starts at 10001 and is incremented each time an id is generated.
The type is selected from a web form and the auto incremented number comes from the database.
Example: if the selected type is 2 and the auto incremented number is 10001, then the generated id is 210001.
There may be hundreds of users generating ids. Now my question is: can this be done without a stored procedure so that there is no id conflict?
I am using ASP.Net (C#), Oracle, NHibernate
As you use Oracle, you can use a Sequence for that.
Each time you call your_sequence.NEXTVAL, a unique number is returned.
Why isn't the NHibernate implementation of Hi-Lo acceptable?
What’s the Hi/Lo Algorithm
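For readers who have not met it, the general idea of Hi/Lo can be sketched in a few lines of Python. This is a generic illustration, not NHibernate's actual implementation; the class name, the `max_lo` value, and the in-memory counter standing in for the database are all assumptions.

```python
import itertools

class HiLoGenerator:
    """Hands out ids from a locally reserved block; fetches a new 'hi' only when the block runs out."""

    def __init__(self, next_hi, max_lo=99):
        self._next_hi = next_hi   # callable standing in for the database round trip
        self._max_lo = max_lo
        self._hi = None
        self._lo = max_lo + 1     # forces a fetch on first use

    def next_id(self):
        if self._lo > self._max_lo:
            self._hi = self._next_hi()
            self._lo = 0
        value = self._hi * (self._max_lo + 1) + self._lo
        self._lo += 1
        return value

db_counter = itertools.count(1)  # stand-in for a counter shared by all clients

def shared_next_hi():
    return next(db_counter)

client_a = HiLoGenerator(shared_next_hi, max_lo=2)
client_b = HiLoGenerator(shared_next_hi, max_lo=2)
ids_a = [client_a.next_id() for _ in range(4)]
ids_b = [client_b.next_id() for _ in range(2)]
print(ids_a, ids_b)
```

Each client only talks to the shared counter once per block, and blocks never overlap, so ids stay unique without a round trip per id.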
What's the point in having the first digit of the ID to define the type? You should use a separate column for this, and then just use a plain auto-incrementing primary key for the actual ID.
The cleanest way is - as Scott Anderson also said - to use two columns. Each attribute should be atomic, i.e. have only one meaning. With a multi-valued column you'll have to apply functions (substr) to reveal for example the type. Constraints will be harder to define. Nothing beats a simple "check (integer_type in (1,2,3))" or "check (id > 10000)".
As for defining your second attribute - let's call it "id" - the number starting from 10001, you have two good strategies:
1) use one sequence, start with 1, and for display use the expression "10000 + row_number() over (partition by integer_type order by id)", to let the users see the number they want.
2) use three sequences, one for each integer_type, and let them have a start with clause of 10001.
The reason you should definitely use sequences is scalability. If you don't use sequences, you'll have to store the current value in some table and serialize access to that table, and that's not good in a multi-user system. With sequences you can set the cache property to remove almost all contention issues.
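To show the shape of strategy 2 with the type kept in its own attribute, here is a small Python sketch. The in-process counters merely stand in for three database sequences and are not safe across processes; the function names are invented.

```python
import itertools

VALID_TYPES = (1, 2, 3)
# One independent counter per type, each starting at 10001 (stand-ins for three sequences).
counters = {t: itertools.count(10001) for t in VALID_TYPES}

def next_record(integer_type):
    """Return (integer_type, id) as two separate attributes."""
    if integer_type not in VALID_TYPES:
        raise ValueError(f"integer_type must be one of {VALID_TYPES}")
    return integer_type, next(counters[integer_type])

def display_id(integer_type, number):
    """Combine the two attributes only for display."""
    return f"{integer_type}{number}"

first = next_record(2)
print(display_id(*first))
```

The combined number exists only at display time, so constraints and queries can work on the two plain columns.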
Hope this helps.
Regards,
Rob.
If you can't use auto incrementing types such as sequences, have a table containing each type and keeping score of its current value. Be careful to control access to this table and use it to generate new numbers. It is likely it will be a hot spot in your db though.
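A sketch of such a counter table, in Python with SQLite so it is runnable as-is (table and function names are assumptions). The increment and the read happen inside one write transaction, which is what serializes access; on Oracle you would typically lock the row with SELECT ... FOR UPDATE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions are managed explicitly
conn.execute(
    "CREATE TABLE id_counter (integer_type INTEGER PRIMARY KEY, current_value INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO id_counter VALUES (?, ?)", [(1, 10000), (2, 10000), (3, 10000)])

def next_number(conn, integer_type):
    """Increment and read the counter for one type inside a single write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the row
    try:
        conn.execute(
            "UPDATE id_counter SET current_value = current_value + 1 WHERE integer_type = ?",
            (integer_type,),
        )
        row = conn.execute(
            "SELECT current_value FROM id_counter WHERE integer_type = ?", (integer_type,)
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    if row is None:
        raise ValueError(f"unknown integer_type {integer_type}")
    return row[0]

print(next_number(conn, 2))
```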