I have an issue regarding merge replication. I have a table SETTINGS in which I store the settings of my software.
The schema of the table is ID (PK), Description, Value.
Suppose I have 15 rows in this table on my server.
Now I have applied a filter on this table saying only the first 10 rows should replicate.
With this setting, when I sync for the first time, I receive the 10 rows on my client (which holds the subscription).
Then I add the remaining 5 rows on my client.
Now when I sync again, it gives me a conflict saying:
A row insert at 'ClientServer.ClientDatabaseName' could not be
propagated to 'MyServer.ServerDatabaseName'. This failure can be
caused by a constraint violation. Violation of PRIMARY KEY constraint
'PK_SETTINGS'. Cannot insert duplicate key in object 'dbo.SETTINGS'.
The duplicate key value is (11).
What I don't understand is why it is trying to replicate a row that is outside the subset filter applied to that table. Please help.
Is this scenario not possible with merge replication?
This link suggests that it is possible, but I'm confused: https://msdn.microsoft.com/en-us/library/ms151775.aspx
Filters created for a merge article are evaluated only at the publisher. Changes made at the subscriber will always be propagated back to the publisher, even if they are outside the filter criteria. However, if the changes from one subscriber do not meet the filtering criteria, they will sit on the publisher but will not be replicated to the other subscribers.
Is this a production scenario, or are you just experimenting with replication? If you do static filtering, which is what you have above, it is typically done on read-only tables. For example, a salesperson in the field may only need prices for products in their region; they are not expected to update this table. If you do dynamic filtering, for example filtering based on HOST_NAME(), then you would only get data specific to that user. For example, a salesperson in the field would receive only their own customer information. Thus, any updates to that information, unless it is shared across multiple salespersons, would propagate back up and not flow to anyone else.
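As an illustration, here is a hedged sketch of how the two filter styles differ when the article is defined; the filter text is what sp_addmergearticle's @subset_filterclause parameter expects, and the publication name and the SalesPersonHost column are hypothetical:

-- Static row filter: every subscriber receives the same fixed subset,
-- e.g. the first ten SETTINGS rows from the question.
EXEC sp_addmergearticle
    @publication         = N'MyPublication',      -- hypothetical name
    @article             = N'SETTINGS',
    @source_object       = N'SETTINGS',
    @subset_filterclause = N'ID <= 10';

-- Dynamic row filter: the subset is evaluated per subscriber via HOST_NAME(),
-- so each salesperson receives only their own rows.
EXEC sp_addmergearticle
    @publication         = N'MyPublication',
    @article             = N'Customer',
    @source_object       = N'Customer',
    @subset_filterclause = N'SalesPersonHost = HOST_NAME()';   -- hypothetical column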
In your case, I would not recommend updating tables on the subscriber that have static filters, so I suggest re-evaluating your filtering design to ensure you have the right filtering model for your scenario.
I'm building an app where I need to store invoices from customers so we can track who has paid and who has not, and if not, see how much they owe in total. Right now my schema looks something like this:
Customer
- Id
- Name
Invoice
- Id
- CreatedOn
- PaidOn
- CustomerId
InvoiceItem
- Id
- Amount
- InvoiceId
Normally I'd fetch all the data using Entity Framework and calculate everything in my C# service (or even do the calculation on SQL Server), something like this:
var amountOwed = Invoice.Where(i => i.CustomerId == customer.Id)
.SelectMany(i => i.InvoiceItems)
.Select(ii => ii.Amount)
.Sum();
But calculating everything every time I need to generate a report doesn't feel like the right approach this time, because down the line I'll have to generate reports that calculate what all the customers owe (and sometimes go even higher up the hierarchy).
For this scenario I was thinking of adding an Amount field on my Invoice table and possibly an AmountOwed on my Customer table which will be updated or populated via the InvoiceService whenever I insert/update/delete an InvoiceItem. This should be safe enough and make the report querying much faster.
But I've also done some searching on this subject, and another recommended approach is using triggers on my database. I like this method best because even if I were to modify a value directly using SQL rather than the app services, the other tables would update automatically.
My question is:
How do I add a trigger to update all the parent tables whenever an InvoiceItem is changed?
And from your experience, is this the best (safest, least error-prone) solution to this problem, or am I missing something?
There are many examples of triggers that you can find on the web. Unfortunately, many are poorly written. And for future reference, post DDL for your tables, not some abbreviated list; no one should need to ask about the constraints and relationships you have (or should have) defined.
To start, how would you write a query to calculate the total amount at the invoice level? Presumably you know the T-SQL to do that. So write it, test it, verify it. Then add your amount column to the invoice table. Now how would you write an update statement to set that new amount column to the sum of the associated item rows? Again: write it, test it, verify it. At this point you have all the code you need to implement your trigger.
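For illustration, a minimal sketch of those two statements, assuming the schema from the question plus the new Amount column on Invoice:

-- Total per invoice.
SELECT InvoiceId, SUM(Amount) AS Total
FROM dbo.InvoiceItem
GROUP BY InvoiceId;

-- Set the new Invoice.Amount column from those totals.
UPDATE inv
SET inv.Amount = t.Total
FROM dbo.Invoice AS inv
JOIN (SELECT InvoiceId, SUM(Amount) AS Total
      FROM dbo.InvoiceItem
      GROUP BY InvoiceId) AS t
    ON t.InvoiceId = inv.Id;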
Since this process involves changes to the item table, you will need to write triggers to handle all three types of DML statements: insert, update, and delete. Write a separate trigger for each to simplify your learning and debugging. Triggers have access to special tables (inserted and deleted); go learn about them. And go learn about the false assumption that a trigger works with a single row; it doesn't. Triggers must be written to work correctly whether 0 (yes, zero), 1, or many rows are affected.
In an insert statement, the inserted table will hold all the rows inserted by the statement that caused the trigger to execute. So you merely sum the values (using the appropriate grouping logic) and update the appropriate rows in the invoice table. Having written the update statement mentioned in the previous paragraph, this should be a relatively simple change to that query. But since you can insert a new row for an old invoice, you must remember to add the summed amount to the value already stored in the invoice table. This should be enough direction for you to start.
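A minimal sketch of what such an insert trigger could look like, assuming the schema above; this illustrates the approach rather than being a finished, production-ready trigger:

CREATE TRIGGER trg_InvoiceItem_Insert
ON dbo.InvoiceItem
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- "inserted" may hold 0, 1, or many rows, so group per invoice
    -- and add the batch totals to whatever is already stored.
    UPDATE inv
    SET inv.Amount = inv.Amount + i.Total
    FROM dbo.Invoice AS inv
    JOIN (SELECT InvoiceId, SUM(Amount) AS Total
          FROM inserted
          GROUP BY InvoiceId) AS i
        ON i.InvoiceId = inv.Id;
END;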
And to answer your second question: the safest and easiest way is to calculate the value every time. I fear you are trying to solve a problem that you do not have and may never have. Generally speaking, no one cares about invoices of "significant" age. You might care about unpaid invoices for a period of time, but eventually you write these things off (especially if the amounts are not significant). Another relatively easy approach is to create an indexed view to calculate and materialize the total amount. But remember: nothing is free. An indexed view must be maintained, and it adds extra processing for DML statements affecting the item table. Indexed views do have limitations, which are documented.
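For reference, a sketch of such an indexed view over the question's schema; indexed views require SCHEMABINDING, COUNT_BIG(*) when aggregating, and a unique clustered index:

CREATE VIEW dbo.vInvoiceTotals
WITH SCHEMABINDING
AS
SELECT InvoiceId,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS ItemCount   -- required when the view uses GROUP BY
FROM dbo.InvoiceItem
GROUP BY InvoiceId;
GO

CREATE UNIQUE CLUSTERED INDEX IX_vInvoiceTotals
ON dbo.vInvoiceTotals (InvoiceId);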
And one last comment: I would strongly hesitate to maintain a total amount at any level higher than invoice. Above that level, one frequently wants to filter the results in various ways: date, location, type, customer, etc. At that point you are approaching data warehouse functionality, which is not appropriate for an OLTP system.
First of all, never use triggers for business logic. Triggers are tricky and easily forgotten, which makes such an application hard to maintain.
In most cases you can easily populate your reporting data via Entity Framework or a SQL query. But if it requires lots of joins, consider using staging tables, because reporting benefits from denormalized data. To populate the staging tables you can use SQL jobs or another scheduling mechanism (Azure Scheduler, perhaps). This way you won't need to work with lots of joins, and your reports will populate faster.
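A minimal sketch of such a scheduled refresh, assuming a hypothetical denormalized reporting table dbo.rpt_CustomerBalance and the schema from the question:

-- Rebuild the hypothetical reporting table from scratch on each scheduled run.
TRUNCATE TABLE dbo.rpt_CustomerBalance;

INSERT INTO dbo.rpt_CustomerBalance (CustomerId, AmountOwed)
SELECT c.Id, SUM(ii.Amount)
FROM dbo.Customer AS c
JOIN dbo.Invoice AS inv    ON inv.CustomerId = c.Id
JOIN dbo.InvoiceItem AS ii ON ii.InvoiceId = inv.Id
WHERE inv.PaidOn IS NULL          -- only unpaid invoices count toward the balance
GROUP BY c.Id;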
I have a table that contains a non-primary-key column RequestID. When I do a bulk insert, all the records must have the same RequestID. But if I do another bulk insert, the newly inserted rows must have the RequestID incremented:
NewRequestID = PreviousRequestID + 1
The only solution I have found so far (and I don't like it, by the way) is to get the last record every time before inserting the new records.
Why don't I like this approach? Because the database is supposed to be relational, which means there is no specific order. Besides, I don't have primary keys or dates to order by.
What is the best way to implement this?
(I've added the C# tag because I am using EF, in case there is an easy solution with EF.)
You could take a number of different approaches:
Are you guaranteed that your RequestIDs are always incremented? If so, you could query the table for the largest RequestID, which should represent the last one inserted.
You could track state somewhere in your application, but this is likely dangerous in scenarios where the service fails/restarts (unless the state is tracked externally).
Assuming you have control over the schema, if you don't want to change the schema of the particular table you mentioned, you could create another table to track the last RequestID used and retrieve it from there (which would protect you against service restarts/failures); see the sketch below.
Those are a few that come to mind.
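As a sketch of the third option above, a single-row tracking table can hand out the next RequestID atomically (table and column names are hypothetical):

-- One row holds the last RequestID that was handed out.
CREATE TABLE dbo.RequestIdTracker (LastRequestId INT NOT NULL);
INSERT INTO dbo.RequestIdTracker (LastRequestId) VALUES (0);

-- Reserve the next RequestID for a new bulk insert in a single atomic statement.
DECLARE @NextRequestId INT;
UPDATE dbo.RequestIdTracker
SET @NextRequestId = LastRequestId = LastRequestId + 1;

-- Use @NextRequestId for every row of the batch being inserted.
SELECT @NextRequestId AS NextRequestId;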
UPDATE:
Assuming RequestID isn't a particular type of identifier, you could use a timestamp, which will always increase when you do a new batch. However, I'm not sure whether you need it to always be incremented by exactly 1, which would preclude this approach.
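A rough sketch of that idea, assuming a hypothetical dbo.Requests table: stamp every row of a batch with the same value taken once per bulk insert, and order batches by that stamp rather than by an exact +1 increment.

-- Taken once per bulk insert, then written to every row of the batch.
DECLARE @BatchStamp DATETIME2 = SYSUTCDATETIME();

INSERT INTO dbo.Requests (BatchCreatedOn, Payload)
SELECT @BatchStamp, Payload
FROM #IncomingRows;   -- hypothetical staging table holding the incoming batch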
In Azure we have four shards, and I want to remove two of them because we do not need them anymore. The data should be merged into the other two shards.
I use a list shard map with GUIDs as keys to identify the shard (in our application this is the UserId).
In the tutorials I only found samples for merging shards of the range type.
Is there a faster way to merge this type of shard, or do I have to write my own tool for this?
If the merge is performed automatically, what will happen, for example, in the following case?
The GUID identifying the shard is the UserId; now this data is moved from shard A to shard B. There is another table called Comments which has the UserId as a foreign key. The primary key in this table is a classic numeric auto-increment value. What will happen to those values when they are moved from shard A to shard B? Will they be inserted and assigned new IDs, or will this not work at all?
There is also some local file storage involved which uses IDs in the path, so I think I will have to write my own tool anyway.
For that I took a look at the ShardMapManager but did not fully understand how it works. The ShardMappingsGlobal table has a column called MappingId, but this is not the GUID/UserId that is stored in the shard database. How do I get the actual GUID used to identify the shard, in my case the UserId?
I also did not find methods to move data between shards.
What I would do now is transfer the data between the shards with a tool of my own and then use the ListShardMap.UpdateMapping method to set a new shard for the value.
At the end of the operation I would use ListShardMap.DeleteShard, or is there a better way to do this?
EDIT:
I wrote my own tool to merge the shards, but now I get a strange exception. Here is some code:
Guid userKey = Guid.Parse(userId);
ListShardMap<Guid> map = GetUserShardMap<Guid>();

try
{
    PointMapping<Guid> currentMapping = map.GetMappingForKey(userKey);
    PointMapping<Guid> mappingOffline = map.UpdateMapping(currentMapping, new PointMappingUpdate()
    {
        Status = MappingStatus.Offline
    });
}
The UpdateMapping call causes the following exception:
Store Error: Error 515, Level 16, State 2, Procedure __ShardManagement.spBulkOperationShardMappingsLocal, Line 98, Message: Cannot insert the value NULL into column 'LockOwnerId', table __ShardManagement.ShardMappingsLocal
I do not understand why there is even an insert. I checked for the MappingId in the local and global shard mapping tables, and the mapping is there, so in my opinion no insert should be required. I also took a look at the code of the mentioned stored procedure spBulkOperationShardMappingsLocal here: https://github.com/Azure/elastic-db-tools/blob/master/Src/ElasticScale.Client/ShardManagement/Scripts/UpgradeShardMapManagerLocalFrom1.1To1.2.sql
In the INSERT statement the LockOwnerId is not passed as a parameter, so it can only fail.
Currently I am working with a test setup because, of course, I do not want to experiment on the production system. Maybe I made a mistake there, but to me everything looks good. I would be very grateful for any hint regarding this error.
In the tutorials I only found samples for merging shards of the range type. Is there a faster way to merge this type of shard, or do I have to write my own tool for this?
Yes, the Split-Merge tool can move data for both range and list shard maps. For a list shard map you can issue shardlet move requests for each key. The Split-Merge tool unfortunately has a somewhat complicated setup; the last time, it took me around an hour to configure. I know this is not great; I'll leave it up to you to determine whether it would take more or less time to write your own custom version.
There is another table called Comments which has the UserId as a foreign key. The primary key in this table is a classic numeric auto-increment value. What will happen to those values when they are moved from shard A to shard B? Will they be inserted and assigned new IDs, or will this not work at all?
The values of auto-increment columns are not copied over; they will be regenerated at the destination, so new IDs will be assigned to these rows.
For that I took a look at the ShardMapManager but did not fully understand how it works. The ShardMappingsGlobal table has a column called MappingId, but this is not the GUID/UserId that is stored in the shard database. How do I get the actual GUID used to identify the shard, in my case the UserId?
I would strongly suggest not trying to edit the ShardMapManager tables on your own, it's very easy to mess up. Editing ShardMapManager tables is precisely what the Elastic Database Tools library is designed to do.
You can update the metadata for a mapping by using the ListShardMap.UpdatePointMapping method. Just to be clear, this only updates the ShardMapManager tables' knowledge of where the data should be for that key; actually moving the data must be done by a higher layer.
This is a high-level summary of what the Split-Merge service does:
Lock the mapping to prevent concurrent update from another shard map management operation
Mark the mapping offline with ListShardMap.UpdatePointMapping. This prevents data-directed routing with OpenConnectionForKey from accessing data with that key. It also kills all current sessions on the shard to force them to reconnect; this ensures that there are no active connections operating on data with the now-offline key
Move the underlying data, using the Shard Map's SchemaInfo to determine which tables need to be moved
Update the mapping and mark it online with ListShardMap.UpdatePointMapping
Unlock the mapping
I have a problem concerning application performance: I have many tables, each with millions of records. I am performing SELECT statements over them using joins, WHERE clauses, and ORDER BY on different criteria (specified by the user at runtime). I want to get my records paged, but no matter what I do with my SQL statements I cannot match the performance of getting my pages directly from memory. Basically, the problem arises when I have to filter my records using dynamic criteria specified at runtime. I have tried everything, such as the ROW_NUMBER() function combined with a "WHERE RowNo BETWEEN" clause, CTEs, temp tables, etc. Those SQL solutions perform well only if I don't include filtering. Keep in mind also that I want my solution to be as generic as possible (imagine that I have several lists in my app that virtually present millions of paged records, and those records are constructed with very complex SQL statements).
All my tables have a primary key of type INT.
So I came up with an idea: why not create a "server" only for SELECT statements? The server first loads all records from all tables and stores them in HashSet<T> collections, where each T has an Id property, GetHashCode() returns that Id, and Equals is implemented so that two records are "equal" only if their Ids are equal (don't scream; you will see later why I am not using all record data for hashing and comparisons).
So far so good, but there's a problem: how can I sync my in-memory collections with the database records? The idea is that I must find a solution where I load only differential changes. So I invented a changelog table for each table that I want to cache. In this changelog I perform only inserts that mark dirty rows (updates or deletes) and also record newly inserted IDs, with the whole mechanism implemented using triggers. Whenever an in-memory select comes in, I first check whether I must sync something (by interrogating the changelog). If something must be applied, I load the changelog, apply those changes in memory, and finally clear that changelog (or maybe remember the highest changelog ID that I've applied...).
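A minimal sketch of one such changelog trigger, assuming a hypothetical cached table dbo.Products with an INT primary key Id; the same pattern would be repeated for every cached table:

CREATE TABLE dbo.Products_ChangeLog
(
    LogId      INT IDENTITY(1,1) PRIMARY KEY,
    RecordId   INT NOT NULL,
    ChangeType CHAR(1) NOT NULL   -- 'I' = insert, 'U' = update, 'D' = delete
);
GO

CREATE TRIGGER trg_Products_ChangeLog
ON dbo.Products
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows present in "inserted": brand new rows ('I') or updated rows ('U').
    INSERT INTO dbo.Products_ChangeLog (RecordId, ChangeType)
    SELECT i.Id, CASE WHEN d.Id IS NULL THEN 'I' ELSE 'U' END
    FROM inserted AS i
    LEFT JOIN deleted AS d ON d.Id = i.Id;

    -- Rows present only in "deleted": deletes ('D').
    INSERT INTO dbo.Products_ChangeLog (RecordId, ChangeType)
    SELECT d.Id, 'D'
    FROM deleted AS d
    LEFT JOIN inserted AS i ON i.Id = d.Id
    WHERE i.Id IS NULL;
END;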
In order to apply the changelog in O(N), where N is the changelog size, I am using this algorithm:
For each log entry:
Identify the in-memory Dictionary<int, T> where the key is the primary key.
If it's a delete log, call dictionary.Remove(id) (O(1)).
If it's an update log, also call dictionary.Remove(id) (O(1)) and move this ID into a "to be inserted" collection.
If it's an insert log, move this ID into the "to be inserted" collection.
Finally, refresh the cache by selecting all data from the corresponding table where Id is in the "to be inserted" collection.
For filtering, I am compiling expression trees into Func<T, List<FilterCriterias>, bool> functors. Using this mechanism I am performing much faster than SQL.
I know that SQL Server 2012 has caching support and the upcoming SQL Server version will support even more, but my client has SQL Server 2005, so I can't benefit from any of this.
My question: what do you think? Is this a bad idea? Is there a better approach?
The developers of SQL Server did a very good job; I think it is nearly impossible to outsmart their work.
Unless your data has some kind of implicit structure which might help to speed things up and which the optimizer cannot be aware of, such "I'll do my own speedy trick" approaches normally won't help.
Performance problems should always be solved first where they occur:
the tables structures and relations
indexes and statistics
quality of SQL statements
Even many millions of rows are no problem if the design and the queries are good.
If your queries do a lot of computation, or you need to retrieve data out of tricky structures (nested lists with recursive reads, XML, ...), I'd go down the data-warehouse path and write some denormalized tables for quick selects. Of course, you will have to deal with the fact that you are reading "old" data. If your data does not change much, you could trigger all changes into a denormalized structure immediately. But this depends on your actual situation.
If you want, you could post one of your poorly performing queries together with the relevant structure details and ask for a review. There are dedicated sites on Stack Exchange, such as Code Review. If it's not too big, you might try it here as well.
I have a DataTable that I am binding to a GridView on my ASP.NET page. I also allow editing and insertion.
Upon saving/insertion, I need to determine whether there is a duplicate description in the GridView.
How can I accomplish this?
In any case, the data you are binding will have a unique ID.
So after binding, check whether that ID already exists in the DataTable. We can't say more than this unless you explain the situation further.
We may need some more information on what kind of database you are using to give you the right answer, but I'll take a swing anyway.
First, you need to have a PRIMARY KEY on your database table for several reasons, including a default index and ensuring uniqueness. Second, you can configure the table to have a UNIQUE INDEX on the description column. This will prevent the insertion of duplicate data at the database level. But once you do that, you will likely get some kind of exception or error in your client application that you will need to catch and handle.
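For example, assuming a hypothetical table dbo.Items with an Id column and a Description column, the constraints could look like this:

-- Primary key for uniqueness and a default (clustered) index.
ALTER TABLE dbo.Items
ADD CONSTRAINT PK_Items PRIMARY KEY (Id);

-- Reject duplicate descriptions at the database level; the client
-- must catch and handle the resulting error on insert.
CREATE UNIQUE INDEX UX_Items_Description
ON dbo.Items (Description);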
Also, you could create an AJAX function to filter the data as the user types in the new row and show them records that are similar. I did this on an app where the users would put in the same request but use slightly different wording.