Azure Elastic Database: Merge GUID Key Shards - C#

In Azure we have four shards and I want to remove two of them because we do not need them anymore. The data should be merged into the other two shards.
I use a ListShardMap with GUIDs as keys to identify the shard (in our application this is the UserId).
In the tutorials I only found samples for merging shards of the Range type.
Is there a faster way to merge this type of shard, or do I have to write my own tool for it?
If the merge is performed automatically, what will happen in the following case, for example:
The GUID identifying the shard is the UserId, and this data is moved from shard A to shard B. There is another table called Comments which has the UserId as a foreign key. The primary key in this table is a classic numeric auto-increment value. What will happen to those values when they are moved from shard A to shard B? Will they be inserted with new IDs assigned, or will this not work at all?
There is also some local file storage involved which uses the IDs in the path, so I think I will have to write my own tool anyway.
For that I took a look at the ShardMapManager but did not fully understand how it works. The ShardMappingsGlobal table has a column called MappingId, but this is not the Guid/UserId that is stored in the shard database. How do I get the actual Guid that is used to identify the shard, in my case the UserId?
I also did not find methods for moving data between shards.
What I would do now is transfer the data between the shards with my own tool and then use the ListShardMap.UpdateMapping method to assign a new shard to the value.
At the end of the operation I would use ListShardMap.DeleteShard. Or is there a better way to do this?
EDIT:
I wrote my own tool to merge the shards, but now I get a strange exception. Here is some code:
Guid userKey = Guid.Parse(userId);
ListShardMap<Guid> map = GetUserShardMap<Guid>();
try
{
    // Look up the current point mapping for this user key ...
    PointMapping<Guid> currentMapping = map.GetMappingForKey(userKey);

    // ... then try to take the mapping offline before moving the data.
    PointMapping<Guid> mappingOffline = map.UpdateMapping(currentMapping, new PointMappingUpdate()
    {
        Status = MappingStatus.Offline
    });
}
catch (Exception ex)
{
    // The UpdateMapping call above throws the store error shown below.
    Console.WriteLine(ex.Message);
}
The UpdateMapping call causes the following exception:
Store Error: Error 515, Level 16, State 2, Procedure __ShardManagement.spBulkOperationShardMappingsLocal, Line 98, Message: Cannot insert the value NULL into column 'LockOwnerId', table __ShardManagement.ShardMappingsLocal
I do not understand why there is even an insert. I checked for the MappingId in the local and global shard mapping tables and the mapping is there, so in my opinion no insert should be required. I also took a look at the code of the mentioned stored procedure spBulkOperationShardMappingsLocal here: https://github.com/Azure/elastic-db-tools/blob/master/Src/ElasticScale.Client/ShardManagement/Scripts/UpgradeShardMapManagerLocalFrom1.1To1.2.sql
In the insert statement the LockOwnerId is not passed as a parameter, so it can only fail.
Currently I am working with a test setup because of course I do not want to experiment on the production system. Maybe I made a mistake there, but to me everything looks good. I would be very grateful for any hint regarding this error.

In the tutorials I only found samples for merging shards of the Range type. Is there a faster way to merge this type of shard, or do I have to write my own tool for this?
Yes, the Split-Merge tool can move data for both range and list shard maps. For a list shard map you can issue shardlet move requests for each key. The Split-Merge tool unfortunately has a somewhat complicated setup; last time it took me around an hour to configure. I know this is not great, so I'll leave it up to you to decide whether it would take more or less time to write your own custom version.
There is another table called Comments which has the UserId as a foreign key. The primary key in this table is a classic numeric auto-increment value. What will happen to those values if they are moved from shard A to shard B? Will they be inserted with new IDs assigned, or will this not work at all?
The values of auto-increment columns are not copied over; they will be regenerated at the destination, so new IDs will be assigned to these rows.
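If you end up writing your own move tool and other data (such as your file-storage paths) depends on the original Comments IDs, one way to keep them is to copy the identity values explicitly. Here is a rough sketch using SqlBulkCopy with the KeepIdentity option; the connection strings, method name, and SELECT are placeholders for your own setup:
using System;
using System.Data.SqlClient;

static void CopyComments(string sourceConnectionString, string targetConnectionString, Guid userId)
{
    using (var source = new SqlConnection(sourceConnectionString))
    using (var selectCmd = new SqlCommand(
        "SELECT * FROM dbo.Comments WHERE UserId = @userId", source))
    {
        selectCmd.Parameters.AddWithValue("@userId", userId);
        source.Open();

        using (var reader = selectCmd.ExecuteReader())
        // KeepIdentity writes the existing auto-increment values instead of
        // letting the destination shard generate new ones.
        using (var bulk = new SqlBulkCopy(targetConnectionString, SqlBulkCopyOptions.KeepIdentity))
        {
            bulk.DestinationTableName = "dbo.Comments";
            bulk.WriteToServer(reader);
        }
    }
}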
For that I took a look at the ShardMapManager but did not fully understand how it works. The ShardMappingsGlobal table has a column called MappingId, but this is not the Guid/UserId that is stored in the shard database. How do I get the actual Guid that is used to identify the shard, in my case the UserId?
I would strongly suggest not trying to edit the ShardMapManager tables on your own; it's very easy to mess things up. Editing the ShardMapManager tables is precisely what the Elastic Database Tools library is designed to do for you.
You can update the metadata for a mapping by using the ListShardMap.UpdateMapping method. Just to be clear, this only updates the ShardMapManager tables' knowledge of where the data for the key should be. Actually moving the data must be done by a higher layer.
This is a high-level summary of what the Split-Merge service does (a rough code sketch of the same sequence follows the list):
Lock the mapping to prevent concurrent updates from another shard map management operation.
Mark the mapping offline with ListShardMap.UpdateMapping. This prevents data-directed routing with OpenConnectionForKey from accessing data for that key. It also kills all current sessions on the shard to force them to reconnect, which ensures that there are no active connections operating on data for the now-offline key.
Move the underlying data, using the Shard Map's SchemaInfo to determine which tables need to be moved.
Update the mapping to the target shard and mark it online with ListShardMap.UpdateMapping.
Unlock the mapping.
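For reference, here is a minimal sketch of that sequence with the Elastic Database Tools client, assuming the GetUserShardMap helper and userKey from the question, a targetShard you have already looked up (for example with ShardMap.GetShard), and a MoveUserData placeholder for your own data-transfer code:
// Sketch only: MoveUserData and targetShard are placeholders.
ListShardMap<Guid> map = GetUserShardMap<Guid>();
PointMapping<Guid> mapping = map.GetMappingForKey(userKey);

// 1. Lock the mapping so no other shard map operation can touch it.
MappingLockToken token = MappingLockToken.Create();
map.LockMapping(mapping, token);
try
{
    // 2. Take the mapping offline; OpenConnectionForKey now rejects this key.
    mapping = map.UpdateMapping(mapping,
        new PointMappingUpdate { Status = MappingStatus.Offline }, token);

    // 3. Move the rows for this key with your own tool (placeholder).
    MoveUserData(userKey, targetShard);

    // 4. Point the mapping at the target shard and bring it back online.
    mapping = map.UpdateMapping(mapping,
        new PointMappingUpdate { Shard = targetShard, Status = MappingStatus.Online },
        token);
}
finally
{
    // 5. Release the lock.
    map.UnlockMapping(mapping, token);
}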

Related

What's causing a Concurrency Violation when using a MySQL linked Dataset via TableAdapters?

The more I read on this, the more confused I get, so I hope someone can help. I have a complex database setup which sometimes produces this error on update:
"Concurrency violation: the UpdateCommand affected 0 of the expected 1 records"
I say sometimes because I cannot recreate the conditions to trigger it consistently. I have a remote MySQL database connected to my app through the DataSource Wizard, which produces the dataset, tables, and linked DataTableAdapters.
My reading suggests that this error is meant to occur when there is more than one open connection to the database trying to update the same record. This shouldn't happen in my case, as the only updates are sequential ones from my app.
I am wondering whether it has something to do with running the update from a BackgroundWorker. I have my table updates in one, for example like this:
Gi_gamethemeTableAdapter.Update(dbDS.gi_gametheme)
Gi_gameplaystyleTableAdapter.Update(dbDS.gi_gameplaystyle)
Gi_gameTableAdapter.Update(dbDS.gi_game)
These run serially in the BackgroundWorker, however, so I'm unsure about this. The main thread also waits for it to finish, and there are no other DB operations going on before or after it starts.
I did read about going into the dataset designer view, choosing "configure" in the datatableadapter > advanced options and setting "Use optimistic concurrency" to false. This might have worked (hard to say because of the seemingly random nature of the error), however, there are drawbacks to this that I want to avoid:
I have around 60 tables. I don't want to do this for each one.
I sometimes have to re-import the mysql schema into the dataset designer, or delete a table and re-add it. This would obviously lose this setting and I would have to remember to do it on all of them again, potentially. I also can't find a way to do this automatically in code.
I'm afraid I'm not at code level in terms of the database updates etc, relying on the Visual Studio wizards. It's a bit late to change the stack as well (e.g. can't change to Entity Framework etc).
So my question is:
What is causing the error, and how can I find out?
What can I do about it?
thanks
When you have TableAdapters that download data into DataTables, they can be configured for optimistic concurrency.
This means that for a table like:
Person
ID Name
1 John
They might generate an UPDATE query like:
UPDATE Person SET Name = #newName WHERE ID = #oldID AND Name = #oldName
(In reality they are more complex than this but this will suffice)
DataTables track original values and current values; you download 1/"John" and then change the name to "Jane"; you (or the TableAdapter) can ask the DataTable what the original value was and it will say "John".
The DataTable can also feed this value into the UPDATE query, and that's how we detect "something else changed the row in the time we had it", i.e. a concurrency violation.
The row was "John" when we downloaded it, we edited it to "Jane", and went to save... but someone else had been in and changed it to "Joe". Our update will fail because Name is no longer the "John" it was (and that we still think it is) when we downloaded it. Because the TableAdapter has an update query that says AND Name = #oldName, and sets the #oldName parameter to the original value someDataRow["Name", DataRowVersion.Original] (i.e. "John"), we cause the update to fail. This is a useful thing; mostly updates will succeed, so we can opportunistically hope our users can update our DB without needing to lock rows while they have them open in some UI.
Resolving the cases where it doesn't work is usually a case of coding up some strategy:
My changes win - don't use an optimistic query that features old values, just UPDATE and erase their changes
Their changes win - cancel your attempts
Re-download the latest DB state and choose what to do - auto merge it somehow (maybe the other person changed fields you didn't), or show the user so they can pick and choose what to keep etc (if both people edited the same fields)
Now you're probably sitting there saying "but no one else changes my DB". We can still get this, though, if the database has changed some values upon a save and you don't have the latest ones in your dataset.
There's another option in the TableAdapter wizard, "refresh the dataset"; it's supposed to run a SELECT after a modification to import the latest database-calculated values (such as auto-increment primary keys, or values set by triggers/defaults/etc.). A query like INSERT INTO Person(Name) VALUES(#name) is supposed to silently have a SELECT * FROM Person WHERE ID = LAST_INSERT_ID() tagged on the end of it to retrieve the latest values.
Except "refresh the dataset" doesn't work :/
So, while I can't tell you exactly why you're getting your concurrency violation exception, I hope that explaining why they occur, and pointing out that there are sometimes bugs that cause them (insert a new record, the calculated ID is not retrieved, edit this recent record, the update fails because the data wasn't fresh), will arm you with what you need to find the problem. When you get one, keep the app stopped on the breakpoint and inspect the DataRow: take a look at the query being run and what original/current values are being supplied as parameters, inspect the original and current values held by the row using the overload of the Item indexer that lets you state the version you want, and look in the DB.
Somewhere in all of that there will be the mismatch that explains why 0 records were updated: the DB has "Joe" as the name or 174354325 as the ID, your DataRow has "John" as the original name or -1 as the ID (it never refreshed), and the WHERE clause finds 0 records as a result.
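To make that inspection easier, here is a small debugging helper in C#, assuming row is the offending DataRow (for example one returned by DataTable.GetErrors()):
using System;
using System.Data;

// Dump each column's Original vs Current value for a row that failed to
// update, so the mismatched column stands out when compared with the DB.
static void DumpRowVersions(DataRow row)
{
    foreach (DataColumn col in row.Table.Columns)
    {
        object original = row.HasVersion(DataRowVersion.Original)
            ? row[col, DataRowVersion.Original]
            : "(no original version)";
        object current = row[col, DataRowVersion.Current];
        Console.WriteLine("{0}: original = {1}, current = {2}",
            col.ColumnName, original, current);
    }
}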
Some of your tables will contain a field that is marked as a [ConcurrencyCheck] or [Timestamp] concurrency token.
When you update a record, the SQL generated will include a WHERE [ConcurrencyField]='Whatever the value was when the record was retrieved'.
If that record was updated by another thread or process or something other than the current thread, then your UPDATE will return 0 records updated, rather than the 1 (or more) that was expected.
What can you do about it? Firstly, put a try/catch (DBConcurrencyException) around your code. Then you can re-read the offending record and try to update it again.
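A minimal sketch of that retry pattern, written in C# (the question's code looks like VB, but the pattern is the same) and reusing the adapter and dataset names from the question:
using System.Data;

try
{
    Gi_gameTableAdapter.Update(dbDS.gi_game);
}
catch (DBConcurrencyException ex)
{
    // ex.Row is the row whose UPDATE matched 0 records.
    // Re-read the current database state for the table...
    dbDS.gi_game.Clear();
    Gi_gameTableAdapter.Fill(dbDS.gi_game);

    // ...then decide: reapply your edits and call Update again
    // ("my changes win"), or discard them ("their changes win").
}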

How to implement Identity for a group of records?

I have a table that contains a non-primary-key column RequestID. When I do a bulk insert, all the records must have the same RequestID. But if I do another bulk insert, the newly inserted rows must have the RequestID incremented:
NewRequestID = PreviousRequestID + 1
The only solution I have found so far (and I don't like it, by the way) is to get the last record every time before inserting the new records.
Why don't I like this approach? Because the database is supposed to be relational, which means there is "no specific order". Besides, I don't have primary keys or dates to order by.
What is the best way to implement this?
(I've added the c# tag because I am using EF, in case there is an easy solution with EF.)
You could take a number of different approaches:
Are you guaranteed that your RequestIDs are always incremented? If so, you could query the table for the largest RequestID, and that should represent the "last one inserted".
You could track state somewhere in your application, but this is likely dangerous in scenarios where the service fails/restarts (unless the state is tracked externally).
Assuming you have control over the schema, if you don't want to update the schema of the particular table you are speaking of, you could create another table to track the last RequestID used and retrieve it from there (which would protect you against service restarts/failures); see the sketch after this list.
Those are a few that come to mind.
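For the third option, here is a rough sketch, assuming a one-row helper table (here called dbo.RequestCounter with a LastRequestID column; both names are hypothetical). The single UPDATE ... OUTPUT statement makes the increment atomic, so two concurrent bulk inserts cannot get the same RequestID:
using System.Data.SqlClient;

// Returns the next RequestID, incrementing the counter atomically.
static int GetNextRequestId(string connectionString)
{
    const string sql =
        "UPDATE dbo.RequestCounter " +
        "SET LastRequestID = LastRequestID + 1 " +
        "OUTPUT inserted.LastRequestID;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        return (int)cmd.ExecuteScalar();
    }
}
You would call this once per bulk insert and stamp every row in that batch with the returned value.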
UPDATE:
Assuming RequestID isn't a particular type of identifier, you could use a timestamp, which will always increase when you do a new batch; however, I'm not sure whether you need it to always be incremented by exactly 1, which would preclude this approach.

SQL Merge Replication Issue

I have an issue regarding merge replication. I have a table SETTINGS in which I store the settings of my software.
The schema of the table is ID (PK), Description, Value.
Suppose I have 15 rows in this table on my server.
Now I have applied a filter on this table saying only the first 10 rows should replicate.
With this setting, when I sync for the first time, I receive the 10 rows on my client (which holds the subscription).
Then I add the remaining 5 on my client.
Now when I sync again it gives me a conflict saying:
A row insert at 'ClientServer.ClientDatabaseName' could not be
propagated to 'MyServer.ServerDatabaseName'. This failure can be
caused by a constraint violation. Violation of PRIMARY KEY constraint
'PK_SETTINGS'. Cannot insert duplicate key in object 'dbo.SETTINGS'.
The duplicate key value is (11).
What I don't understand is why it is trying to replicate a row which is outside the subset filter applied to that table. Please help, guys.
Is this scenario not possible with merge replication?
This link suggests that it is possible, but I am confused: https://msdn.microsoft.com/en-us/library/ms151775.aspx
Filters created for a merge article are evaluated only at the publisher. Changes made at the subscriber will always be propagated back to the publisher, even if they are outside the filter criteria. However, if the changes from one subscriber do not meet the filtering criteria, they will sit on the publisher but not be replicated to the other subscribers.
Is this a production scenario, or are you playing around with replication? If you do static filtering, which is what you have above, it is typically done on read-only types of tables. For example, a salesperson in the field may only need prices for products in their region. They are not expected to update this table. If you do dynamic filtering, for example filtering based on HOST_NAME(), then you would only get data specific to that user. For example, a salesperson in the field would receive only their own customer information. Thus, any updates to that information, unless it's shared across multiple salespersons, would propagate back up and not flow to anyone else.
In your case, I would not recommend updating tables on the subscriber that have static filters, so I suggest re-evaluating your filtering design to ensure you have the right filtering model for your scenario.

How to query an SQLite db in batches

I am using C# with .NET 4.5. I am making a scraper which collects specific data. Each time a value is scraped, I need to make sure it hasn't already been added to the SQLite db.
To do this, I am making a call each time a value is scraped to query against the db to check if it contains the value, and if not, I make another call to insert the value into the db.
Since I am scraping multiple values per second, this gets to be very IO-intensive, with constant calls to the db.
My question is, is there any better way to do this? Perhaps I could queue the values scraped and then run a batch query at once? Is that possible?
I see three approaches:
Use INSERT OR IGNORE, which will reject an entry if it is already present (based on the primary key and unique fields). Or use plain INSERT (or its equivalent INSERT OR ABORT), which will return SQLITE_CONSTRAINT, a value you will have to catch and handle if you want to count failed insertions.
Accumulate, outside the database, the updates you want to make. When you have accumulated enough of them (or all of them), start a transaction (BEGIN;), do your insertions (you can use INSERT OR IGNORE here as well), and commit the transaction (COMMIT;); see the sketch after this list.
You could pre-fetch a list of the items you already have, depending on your situation, and check against that list, if your data model allows it.
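A rough sketch of the second approach combined with INSERT OR IGNORE, using the System.Data.SQLite provider (Microsoft.Data.Sqlite is similar). The table and column names are placeholders, and the table is assumed to have a PRIMARY KEY or UNIQUE constraint on Value:
using System.Collections.Generic;
using System.Data.SQLite;

// Insert a batch of scraped values in one transaction, skipping duplicates.
// Assumes something like: CREATE TABLE ScrapedValues (Value TEXT PRIMARY KEY);
static void SaveBatch(string connectionString, IEnumerable<string> values)
{
    using (var conn = new SQLiteConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SQLiteCommand(
            "INSERT OR IGNORE INTO ScrapedValues (Value) VALUES (@value);", conn, tx))
        {
            var p = cmd.Parameters.Add("@value", System.Data.DbType.String);
            foreach (var value in values)
            {
                p.Value = value;
                cmd.ExecuteNonQuery();   // IGNORE silently skips existing values
            }
            tx.Commit();
        }
    }
}
Wrapping the inserts in one transaction is what removes most of the I/O cost, since SQLite otherwise syncs to disk after every single INSERT.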

How do I programmatically verify, create, and update SQL table structure?

Scenario:
I have an application (C#) that expects a SQL database and login, which are set by a user. Once connected, it checks for the existence of several tables and creates them if not found.
I'd like to expand on this by having the program be capable of adding columns to those tables if I release a new version of the program which relies upon the new columns.
Question:
What is the best way to programmatically check the structure of an existing SQL table and create or update it to match an expected structure?
I am planning to iterate through the list of required columns and alter the existing table whenever it does not contain the new column. I can't help but wonder if there's an approach that is different or better.
Criteria:
Here are some of my expectations and self-imposed rules:
Newer versions of the program might no longer use certain columns, but they would be retained for data logging purposes. In other words, no columns will be removed.
Existing data in the table must be preserved, so the table cannot simply be dropped and recreated.
In all cases, newly added columns would allow null data, so the population of old records is taken care of by having default null values.
Example:
Here is a sample table (because visual examples help!):
id datetime sensor_name sensor_status x1 x2 x3 x4
1 20100513T151907 na019 OK 0.01 0.21 1.41 1.22
2 20100513T152907 na019 OK 0.02 0.23 1.45 1.52
Then, in a new version, I may want to add the column x5. The "x-columns" are all data-storage columns that accept null.
Edit:
I updated the sample table above. It is more of a log and not a parent table. So the sensors will repeatedly show up in this logging table with the values logged. A separate parent table contains the geographic and other logistical information about the sensor, making the table I wish to modify a child table.
This is a very troublesome feature that you're thinking about implementing. I would advise against it and instead suggest scripting changes using a third-party tool such as Red Gate's SQL Compare: http://www.red-gate.com/products/SQL_Compare/index.htm
If you're in doubt, consider downloading the trial version of the software and performing a structure diff script on two databases with some non-trivial differences. You'll see from the result that the considerations for such operations are far from simple.
The other way around this type of issue is to redesign your database using the EAV model: http://en.wikipedia.org/wiki/Entity-attribute-value_model (it pivots to dynamically add rows, thus never changing the structure; it has its own issues, but it's very flexible).
(To utilize a diff tool you would have to keep a copy of all of your DB versions and create diff scripts which would go out and get executed with new releases and upgrades. That's a huge mess of its own to maintain. EAV is the way to go for a thing like this. It wrongfully gets a lot of flak for not being as performant as a traditional DB structure, but I've used it a number of times with great success. In fact, I have a HIPAA-compliant EAV database (SQL Server 2000) that's been in production for over six years with several of the EAV tables containing tens of millions of rows, and it's still going strong with no big slowdown. Of course we don't do heavy reporting against that database; for reports we have an export that flattens the data into a relational structure.)
The common solution I see would be to store version information somewhere in your database. Maybe have a really small table:
CREATE TABLE DB_PROPERTIES ([key] varchar(100), value varchar(100));
Then you could add a row:
key | value
version | 12
Then you could just create a SQL update script (or a set of scripts) which upgrades the db from version 12 to version 13.
DECLARE @v varchar(100);
SELECT @v = value FROM DB_PROPERTIES WHERE [key] = 'version';
IF @v = '12'
    -- do upgrade from 12 to 13
ELSE IF @v = '11'
    -- do upgrade from 11 to 13
-- ...and so on
Depending on what upgrade paths you want to support, you could add more cases. You could also obviously move this upgrade logic into C#, or whatever design works for you (see the sketch below). But having the DB version information stored in the database will make it much easier to figure out what is already there, rather than querying for all the DB structures individually.
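A minimal C# sketch of that idea, assuming the DB_PROPERTIES table above and a placeholder RunUpgradeScript method that applies your own ALTER TABLE / data-migration scripts for each step:
using System.Data.SqlClient;

// Placeholder: applies your script that upgrades the schema fromVersion -> toVersion.
static void RunUpgradeScript(SqlConnection conn, int fromVersion, int toVersion)
{
    // ... execute the corresponding ALTER TABLE / data-migration script ...
}

static void UpgradeDatabase(string connectionString, int targetVersion)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        int version;
        using (var read = new SqlCommand(
            "SELECT value FROM DB_PROPERTIES WHERE [key] = 'version';", conn))
        {
            version = int.Parse((string)read.ExecuteScalar());
        }

        // Apply each migration in order until the schema is current.
        for (; version < targetVersion; version++)
        {
            RunUpgradeScript(conn, version, version + 1);
        }

        using (var write = new SqlCommand(
            "UPDATE DB_PROPERTIES SET value = @v WHERE [key] = 'version';", conn))
        {
            write.Parameters.AddWithValue("@v", version.ToString());
            write.ExecuteNonQuery();
        }
    }
}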
If you have to build something in such a way as to rely on the application making table changes, your design is flawed. You should have a related table for the sensor values (x1, x2, etc.), then you can just add another record rather than having to create a new column.
Suggested child table structure
READINGS
ID int
Reading_type varchar (10)
Reading_Value int
Then data in the table would read:
ID Reading_type Reading_value
1 x1 2
1 x2 3
1 x3 1
2 x1 7
Try Microsoft.SqlServer.Management.Smo
These are a set of C# classes that provide an API to SQL Server database objects.
The Microsoft.SqlServer.Management.Smo.Table class has a Columns collection that will allow you to query and manipulate the columns.
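For example, a minimal sketch (the server, database, and table names are placeholders; x5 matches the example column in the question) that adds a nullable x5 column only if it is missing:
using Microsoft.SqlServer.Management.Smo;

var server = new Server("MyServer");                                   // placeholder server name
Table table = server.Databases["MyDatabase"].Tables["SensorLog"];      // placeholder names

if (!table.Columns.Contains("x5"))
{
    var x5 = new Column(table, "x5", DataType.Float) { Nullable = true };
    table.Columns.Add(x5);
    table.Alter();   // emits and runs the ALTER TABLE ... ADD statement
}
You can get a similar effect without the SMO dependency by running a plain ALTER TABLE ... ADD guarded by a check against INFORMATION_SCHEMA.COLUMNS.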
Have fun.
