I need to be able to change the primary keys in a table. The problem is that some of the keys will be changing to existing key values, e.g. record1.ID 3=>4 and record2.ID 4=>5. I need to keep these as primary keys, as they are set as foreign keys (which cascade on update). Is there a reasonable way to accomplish this, or am I attempting SQL heresy?
As for the why: I have data from one set of tables, linked by this primary key, that is getting inserted/updated into another set of similarly structured tables. The insertion happens in parts, as it is part of a deduping process, and if I could simply update all of the tables that are to be inserted with the new primary key, life would be easier.
One solution is to start the identity seed on the destination table higher than the incoming table's row count will ever reach (the incoming table gets re-indexed every time), but I'd still like to know whether the above is otherwise possible.
TIA
You are attempting SQL heresy. I'm actually pretty open-minded and know that a lot of the time one must do things that seem crazy. It annoys me when people arrogantly answer with "you should do that differently" when they have no idea what the situation is. However, I must tell you that you should do this differently. Heh heh.
No, there is no way to do this elegantly with SQL or a DataAdapter. You could do it through ADO.NET with a series of T-SQL commands. You would have to, every time, turn on identity-overwrite mode (SET IDENTITY_INSERT theTable ON), shift all the key values on that table up by one, and then turn identity-overwrite mode off again. Note that even with IDENTITY_INSERT on, an identity column cannot be UPDATEd directly, so the shift really means inserting copies of the rows under the new keys and deleting the originals. But then you would need to increment all the other tables that use this as a foreign key. But wait, it gets worse:
You would need to do all of this in a transaction, because nothing else can be allowed to touch these tables during this time, and because if there were a failure you would most definitely need to roll back. This could be a good-sized chunk of processing; your tables would be locked for a good bit.
If you have any foreign key constraints between these tables, you would need to disable them before you do this and re-enable them afterwards.
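For the morbidly curious, here is a minimal ADO.NET sketch of those steps, assuming a parent table theTable(ID identity PK, Name) and one child table childTable(TheTableID) with a constraint FK_childTable_theTable; all of those names, and the connectionString variable, are placeholders. Because the identity column cannot be UPDATEd, the rows are first parked at a high offset and then brought back down to their shifted positions:

using Microsoft.Data.SqlClient;

const int offset = 1_000_000; // assumed to be above any existing ID

using var conn = new SqlConnection(connectionString);
conn.Open();
using var tx = conn.BeginTransaction();
try
{
    void Exec(string sql)
    {
        using var cmd = new SqlCommand(sql, conn, tx);
        cmd.ExecuteNonQuery();
    }

    // Suspend the FK check so child rows can be repointed freely.
    Exec("ALTER TABLE childTable NOCHECK CONSTRAINT FK_childTable_theTable;");
    Exec("SET IDENTITY_INSERT theTable ON;");

    // Phase 1: park every row far above the live key range.
    Exec($"INSERT INTO theTable (ID, Name) SELECT ID + {offset}, Name FROM theTable;");
    Exec($"UPDATE childTable SET TheTableID = TheTableID + {offset};");
    Exec($"DELETE FROM theTable WHERE ID < {offset};");

    // Phase 2: bring the rows back down to their final, shifted positions.
    Exec($"INSERT INTO theTable (ID, Name) SELECT ID - {offset} + 1, Name FROM theTable;");
    Exec($"UPDATE childTable SET TheTableID = TheTableID - {offset} + 1;");
    Exec($"DELETE FROM theTable WHERE ID >= {offset};");

    Exec("SET IDENTITY_INSERT theTable OFF;");
    Exec("ALTER TABLE childTable WITH CHECK CHECK CONSTRAINT FK_childTable_theTable;");

    tx.Commit();
}
catch
{
    tx.Rollback();
    throw;
}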
If you find yourself starting to think about updating primary key values, alarm bells should start ringing.
It may seem easier, but I'd class it as more of a hack than a solution. Personally, I'd have a rethink and try to address the real problem: it may seem harder now, but it will be much easier to maintain and will head off some potentially horrible issues down the line.
I have a table that contains a non-primary-key column RequestID. When I do a bulk insert, all the records must have the same RequestID. But if I do another bulk insert, the next inserted rows must have the RequestID incremented:
NewRequestID = PreviousRequestID + 1
The only solution I have found so far (and I don't like it, by the way) is to get the last record every time before inserting the new records.
Why don't I like this approach? Because the database is supposed to be relational, which means there is "no specific order". Besides, I don't have primary keys or dates to order by.
What is the best way to implement this?
(I've added the C# tag because I am using EF, in case there is an easy solution with EF.)
You could take a number of different approaches:
Are you guaranteed that your RequestIDs are always incremented? If so, you could query the table for the largest RequestID, which should represent the last one inserted.
You could track state somewhere in your application, but this is likely dangerous in scenarios where the service fails/restarts (unless the state is tracked externally).
Assuming you have control over the schema, and if you don't want to touch the schema of the particular table you are speaking of, you could create another table to track the last RequestID used and retrieve it from there (which would protect you against service restarts/failures).
Those are a few that come to mind; a sketch of the tracking-table option follows.
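Here is one possible shape for that third option, assuming a one-row table RequestSequence(LastRequestId INT) and an open SqlConnection; all names here are hypothetical. The UPDATE ... OUTPUT increments and reads the counter in a single atomic statement, so it is safe under concurrency and survives restarts:

using Microsoft.Data.SqlClient;

static int NextRequestId(SqlConnection conn)
{
    const string sql =
        "UPDATE RequestSequence " +
        "SET LastRequestId = LastRequestId + 1 " +
        "OUTPUT inserted.LastRequestId;";
    using var cmd = new SqlCommand(sql, conn);
    return (int)cmd.ExecuteScalar();
}

// Usage: stamp the whole batch with one freshly incremented RequestID,
// then bulk insert the batch as before.
// int requestId = NextRequestId(conn);
// foreach (var row in batch) row.RequestID = requestId;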
UPDATE:
Assuming RequestID isn't a particular type of identifier, you could use a timestamp, which will always increase when you do a new batch. However, if you need it to always be incremented by exactly 1, that would preclude this approach.
I have to create a database structure. I have a question about foreign keys and good practice:
I have a table which must have a field that can hold one of two string values, either "A" or "B".
It cannot be anything else (therefore, I cannot use a plain string field).
What is the best way to design this table:
1) create an int field which is a foreign key to another table with just two records, one for the string "A" and one for the string "B"
2) create an int field and then, in my application, create an enumeration such as this:
public enum StringAllowedValues
{
    A = 1,
    B
}
3) ???
In advance, thanks for your time.
Edit: 13 minutes later and I get all this awesome feedback. Thank you all for the ideas and insight.
Many database engines support enumerations as a data type. And there are, indeed, cases where an enumeration is the right design solution.
However...
There are two requirements which may make a foreign key to a separate table the better choice.
The first is: it may be necessary to increase the number of valid options in that column. In most cases, you want to do this without a software deployment; enumerations are "baked in", so in this case a table into which you can simply write new rows is much more flexible.
The second is: the application needs to reason about the values in this column, in ways that may go beyond "A" or "B". For instance, "A" may be greater/older/more expensive than "B", or there is some other attribute to A that you want to present to the end user, or A is short-hand for something.
In this case, it is much better to explicitly model this as columns in a table, instead of baking this knowledge into your queries.
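As a hedged sketch of what that lookup table might look like on SQL Server, with the attributes from the previous paragraph modelled as columns (all names, attributes, and the open SqlConnection conn are made up for illustration):

using Microsoft.Data.SqlClient;

using var cmd = new SqlCommand(@"
    CREATE TABLE OptionType (
        Id        INT           PRIMARY KEY,
        Code      VARCHAR(10)   NOT NULL UNIQUE,  -- 'A' or 'B' today, extensible later
        SortOrder INT           NOT NULL,         -- 'A is greater/older than B' lives here
        Price     DECIMAL(10,2) NOT NULL          -- '...or more expensive'
    );
    INSERT INTO OptionType (Id, Code, SortOrder, Price)
    VALUES (1, 'A', 1, 10.00), (2, 'B', 2, 5.00);

    -- The main table references it, so only 'A' or 'B' (or rows added later) are possible:
    CREATE TABLE MainTable (
        Id           INT IDENTITY PRIMARY KEY,
        OptionTypeId INT NOT NULL REFERENCES OptionType(Id)
    );", conn);
cmd.ExecuteNonQuery();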
In 30 years of working with databases, I personally have never found a case where an enumeration was the right decision....
Create a secondary table with the meanings of these integer codes. There's nothing that compels you to JOIN that in, but if you need to, that data is there. Within your C# code you can still use an enum to look things up, but try to keep that in sync with what's in the database, or vice versa; one of those should be authoritative.
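One possible shape for that sync check, assuming the database is the authority and a lookup table StringAllowedValues(Id, Code) exists (both hypothetical): assert at startup that every enum member has a matching row.

using System;
using System.Collections.Generic;
using Microsoft.Data.SqlClient;

public enum StringAllowedValues { A = 1, B = 2 }

static void VerifyEnumMatchesLookup(SqlConnection conn)
{
    var dbValues = new Dictionary<int, string>();
    using var cmd = new SqlCommand("SELECT Id, Code FROM StringAllowedValues;", conn);
    using var reader = cmd.ExecuteReader();
    while (reader.Read())
        dbValues[reader.GetInt32(0)] = reader.GetString(1);

    foreach (StringAllowedValues v in Enum.GetValues<StringAllowedValues>())
        if (!dbValues.TryGetValue((int)v, out var code) || code != v.ToString())
            throw new InvalidOperationException($"Enum/database mismatch for {v}");
}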
In practice you'll often find that short strings are easier to work with than rigid enums. In the 1990s when computers were slow and disk space scarce you had to do things like this to get reasonable performance. Now it's not really an issue even on tables with hundreds of millions of rows.
In Azure we have four shards, and I want to remove two of them, as we do not need them anymore. The data should be merged into the other two shards.
I use a list shard map with GUIDs as keys to identify the shards (in our application this is the UserId).
In the tutorials I only found samples for merging shards of the range type.
Is there a faster way to merge this type of shard, or do I have to write my own tool for this?
If the merge is performed automatically, what will happen in the following case, for example:
The GUID identifying the shard is the UserId, and this data is moved from shard A to shard B. There is another table called Comments, which has the UserId as a foreign key. The primary key in this table is a classic numeric auto-increment value. What will happen to those values when they are moved from shard A to shard B? Will they be inserted and assigned new IDs, or will this not work at all?
Also, there is some local file storage involved which uses the IDs in the path, so I think I will have to write my own tool anyway.
For that, I took a look at the ShardMapManager but did not fully understand how it works. In the ShardMappingsGlobal table there is a column called MappingId, but this is not the Guid/UserId which is stored in the shard database. How do I get the actual GUID which is used to identify the shard, in my case the UserId?
I also did not find methods to move data between shards.
What I would do now is transfer the data between the shards with a tool of my own and then use the ListShardMap.UpdateMapping method to set a new shard for the value.
At the end of the operation I would use ListShardMap.DeleteShard. Or is there a better way to do this?
EDIT:
I wrote my own tool to merge the shards, but now I get a strange exception. Here is some code:
Guid userKey = Guid.Parse(userId);
ListShardMap<Guid> map = GetUserShardMap<Guid>();
try
{
    PointMapping<Guid> currentMapping = map.GetMappingForKey(userKey);
    PointMapping<Guid> mappingOffline = map.UpdateMapping(currentMapping, new PointMappingUpdate()
    {
        Status = MappingStatus.Offline
    });
}
catch (ShardManagementException ex)
{
    Console.WriteLine(ex.Message); // the store error below surfaces here
}
The UpdateMapping call causes the following exception:
Store Error: Error 515, Level 16, State 2, Procedure __ShardManagement.spBulkOperationShardMappingsLocal, Line 98, Message: Cannot insert the value NULL into column 'LockOwnerId', table __ShardManagement.ShardMappingsLocal
I do not understand why there is even an INSERT. I checked for the MappingId in the local and global shard-mapping tables, and the mapping is there, so in my opinion no insert should be required. I also took a look at the code of the mentioned stored procedure spBulkOperationShardMappingsLocal here: https://github.com/Azure/elastic-db-tools/blob/master/Src/ElasticScale.Client/ShardManagement/Scripts/UpgradeShardMapManagerLocalFrom1.1To1.2.sql
In the INSERT statement, the LockOwnerId is not passed as a parameter, so it can only fail.
Currently I am working with a test setup, because of course I do not want to experiment on the production system. Maybe I made a mistake there, but everything looks good to me. I would be very grateful for any hint regarding this error.
In the tutorials I only found samples for merging shards of the range type. Is there a faster way to merge this type of shard, or do I have to write my own tool for this?
Yes, the Split-Merge tool can move data in both range and list shard maps. For a list shard map you can issue shardlet move requests for each key. The Split-Merge tool unfortunately requires some complicated setup; last time it took me around an hour to configure. I know this is not great, so I'll leave it up to you to determine whether it would take more or less time to write your own custom version.
There is another table called Comments, which has the UserId as a foreign key. The primary key in this table is a classic numeric auto-increment value. What will happen to those values when they are moved from shard A to shard B? Will they be inserted and assigned new IDs, or will this not work at all?
The values of auto-increment columns are not copied over; they will be regenerated at the destination. So yes, new IDs will be assigned to these rows.
For that, I took a look at the ShardMapManager but did not fully understand how it works. In the ShardMappingsGlobal table there is a column called MappingId, but this is not the Guid/UserId which is stored in the shard database. How do I get the actual GUID which is used to identify the shard, in my case the UserId?
I would strongly suggest not trying to edit the ShardMapManager tables on your own; it's very easy to mess things up. Editing the ShardMapManager tables is precisely what the Elastic Database Tools library is designed to do for you.
You can update the metadata for a mapping by using the ListShardMap.UpdateMapping method. Just to be clear, this only updates the ShardMapManager tables' knowledge of where the data should be for the key. Actually moving the data must be done by a higher layer.
This is a high-level summary of what the Split-Merge service does (a code sketch follows the list):
Lock the mapping to prevent a concurrent update from another shard-map management operation.
Mark the mapping offline with ListShardMap.UpdateMapping. This prevents data-directed routing with OpenConnectionForKey from being allowed to access data for that key. It also kills all current sessions on the shard to force them to reconnect; this ensures that there are no active connections operating on data with the now-offline key.
Move the underlying data, using the shard map's SchemaInfo to determine which tables need to be moved.
Update the mapping and mark it online with ListShardMap.UpdateMapping.
Unlock the mapping
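Here is a rough sketch of those steps with the Elastic Database client library, built around the same UpdateMapping call as your code. Treat it as a sketch: the lock-token overloads shown here may differ slightly between library versions, and MoveUserData is a hypothetical stub for your own copy logic.

using System;
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;

static void MoveShardlet(ListShardMap<Guid> map, Guid userKey, Shard target)
{
    PointMapping<Guid> mapping = map.GetMappingForKey(userKey);

    // 1. Lock the mapping so no concurrent shard-map operation touches it.
    MappingLockToken token = MappingLockToken.Create();
    map.LockMapping(mapping, token);

    // 2. Take it offline; OpenConnectionForKey will now refuse this key.
    mapping = map.UpdateMapping(mapping,
        new PointMappingUpdate { Status = MappingStatus.Offline }, token);

    // 3. Move the rows for this key yourself; the library does not copy data.
    MoveUserData(userKey, mapping.Shard, target);

    // 4. Repoint the (still offline) mapping at the target shard,
    //    then bring it back online.
    mapping = map.UpdateMapping(mapping,
        new PointMappingUpdate { Shard = target }, token);
    mapping = map.UpdateMapping(mapping,
        new PointMappingUpdate { Status = MappingStatus.Online }, token);

    // 5. Release the lock.
    map.UnlockMapping(mapping, token);
}

static void MoveUserData(Guid userKey, Shard source, Shard target)
{
    // Hypothetical stub: copy this user's rows from source to target,
    // remembering that auto-increment values are regenerated at the destination.
}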
In the table, there's an identity column that increments by one each time we add a row (the lowest value is 1). Now there's a new requirement: we need to implement soft deletion.
So my idea was to simply multiply the ID of any softly deleted row by -1. That preserves uniqueness and clearly draws a line between active and softly deleted items.
update Things set Id = -101
where Id = 101
Wouldn't you know, the stupid computer doesn't let me do that. The suggested workaround is to:
alter the column (dropping the identity property),
perform the update,
alter the column back,
and to me that seems like a quick-and-dirty hack. However, the only alternative I can see is to add a new column carrying the deletion status.
I'm using EF to perform the work, with the extra quirk that when I changed the value of the identity and stored it, the software was kind enough to think for me and actually created a new row (with an incrementally set identity that was neither the original one nor the negative of it).
How should I tackle this issue?
I would strongly discourage you from overloading your identity column with any additional meaning. Someone looking at your database table for the first time has no way of knowing that a negative ID means "deleted".
Introducing a new column Deleted BIT NOT NULL DEFAULT 0 does not have this disadvantage. It is self-explanatory. And it costs almost nothing: in the age of Big Data, an additional BIT column isn't going to fill your hard disk.
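Since you mention EF, here is a minimal sketch of that flag, assuming EF Core (older EF versions lack query filters) and a hypothetical Thing entity. The global query filter hides soft-deleted rows from normal queries, so the rest of your code doesn't have to change its WHERE clauses:

using Microsoft.EntityFrameworkCore;

public class Thing
{
    public int Id { get; set; }        // the identity column stays untouched
    public bool Deleted { get; set; }  // maps to BIT NOT NULL DEFAULT 0
}

public class AppDbContext : DbContext
{
    public DbSet<Thing> Things => Set<Thing>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Thing>()
            .Property(t => t.Deleted)
            .HasDefaultValue(false);

        // Soft-deleted rows disappear from all queries unless you opt back in
        // with IgnoreQueryFilters().
        modelBuilder.Entity<Thing>()
            .HasQueryFilter(t => !t.Deleted);
    }
}

// "Deleting" then becomes an update:
// thing.Deleted = true;
// context.SaveChanges();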
All of that being said, if you still want to do it, you could try SET IDENTITY_INSERT dbo.Things ON before you UPDATE. (I cannot verify that this would work; IDENTITY_INSERT is documented to affect only INSERTs, not UPDATEs, so expect it to fail.)
I am migrating an old database (Oracle), and there are a few tables like CountryCode, DeptCode and RoleCodes whose primary key is a string (the codes). I am thinking about adding a number column as the primary key because it would make joins faster. These tables are not really big.
I am wondering whether the primary key for those tables should start from 1, or whether it can start from, say, 100 just to differentiate between the tables' PKs, although I don't think I would be showing them on reports.
For sequence-generated IDs, I would suggest starting at different values if it's easy to do (this depends on your database, etc.). You shouldn't be using this to differentiate between them in code, but it can make testing more reasonable.
Before now, I've had a situation where I accidentally used a foreign key from one table as if it were the foreign key for another table. The tests passed because the IDs were coincidentally the same. After we discovered the problem, we changed the initial seeds and found the tests were a lot clearer.
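On SQL Server, giving each table a different seed is just the arguments to IDENTITY(seed, increment). A small sketch with the tables from the question (assuming SQL Server as the migration target and an open SqlConnection conn):

using Microsoft.Data.SqlClient;

using var cmd = new SqlCommand(@"
    CREATE TABLE CountryCode (Id INT IDENTITY(1000,1) PRIMARY KEY, Code VARCHAR(10) NOT NULL);
    CREATE TABLE DeptCode    (Id INT IDENTITY(2000,1) PRIMARY KEY, Code VARCHAR(10) NOT NULL);
    CREATE TABLE RoleCodes   (Id INT IDENTITY(3000,1) PRIMARY KEY, Code VARCHAR(10) NOT NULL);",
    conn);
cmd.ExecuteNonQuery();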
You shouldn't do it to differentiate between tables; that is just not practical.
Not all primary keys have to start at 1, as in the case of an order number.
The rationale you're using to switch to an integer primary key doesn't seem valid: the performance gain you'd see using an INT rather than the original codes (which I assume are strings) will be negligible. The PK is always indexed, and index lookups on short strings or numerics are as good as instant. So unless you really need an INT, I'd be tempted to stick with the original data type and work with the original data, which also simplifies data migration (something that should be considered when doing any work).
It is very common, for example in ERP systems, to define number ranges that represent a certain group of items.
This can be a position within a bigger number, e.g. in
1234567890
digits 4 - 6 represent the region code and
digits 7 - 8 represent the dept code...
or, as I suspect in your case, separate ranges in the same position, like
1000 - 1999 Region codes
2000 - 2999 DeptCode
3000 - 3999 RoleCode
Therefore: no, it does not necessarily start with 1.
Bigger ERP systems even have configuration sections for number ranges!
Now, from a database point of view:
Yes, your tables should always have a primary key!
Having one will tremendously improve performance in the average case.
(But in most database systems, if you do not provide one, one will be set by the DBMS, which you cannot see or handle. Some DBMSs even create indices for it, but that's another story.)
I think it does not matter what start value the primary key holds.
What is important is that the FKs of the joined tables hold the same values as the PK of the main table.
A surrogate key can have any values, as long as they are unique. That's what makes it "surrogate", after all: the values have no intrinsic meaning of their own and generally shouldn't even be shown to the user. That being said, you could think about using different seeds, just for testing purposes, as Jon Skeet suggested.
More importantly, do you really need to introduce a new (surrogate) key? The existing natural key could actually lead to fewer¹ JOINs, and may be useful for clustering. While there are legitimate uses for surrogate keys, don't add one just because it is "fashionable"; always be aware of the tradeoffs you are making and pick the right balance for your concrete needs.
¹ The natural key is automatically "propagated" down foreign keys, so you don't need to JOIN the child table to the parent just to get the natural key; it is already in the child.
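To illustrate that footnote with hypothetical tables: with a surrogate key you must JOIN just to reach the code, while with the natural key the child table already carries it.

// With a surrogate key on DeptCode, filtering employees by department
// requires a JOIN to reach the code:
const string withSurrogateKey = @"
    SELECT e.*
    FROM Employee e
    JOIN DeptCode d ON d.Id = e.DeptId
    WHERE d.Code = 'HR';";

// With the natural key, the code itself is the FK in Employee:
const string withNaturalKey = @"
    SELECT e.*
    FROM Employee e
    WHERE e.DeptCode = 'HR';";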
It doesn't matter what integer the primary key starts from.
Assuming the codes aren't updated regularly, I don't believe an INT will be any faster; it depends more on whether the column is a VARCHAR or of a known, fixed size.
I personally always have a field named "Id" as the primary key of a table, defined as an INT (or a BIGINT if necessary).
If the table matches up to an enumerated type, then I make sure the Id matches the enumerated type's id, which can be any number; so no, it doesn't need to start from 1.
If it doesn't match an enumerated type, then I will usually use an auto-incrementing key starting from 1, but this is not always needed.
Note that if the number of rows is small, the difference between indexing on a number and on a varchar will be negligible.
Yes, it doesn't matter what integer it starts from; its main use is to identify rows uniquely and to form relationships with other tables.