Deleting database records non-permanently (soft delete) - c#

The Story
I'm going to write some code to manage the deleted items in my application, but I'm going to soft-delete them so I can restore them when I need to. I have a hierarchy to respect in my application's logic when it comes to hiding or deleting items.
I logically place my items in containers: country, city, district, and brand.
Each item should belong to a country, a city, a district, and a brand.
Now, if I delete a country it should delete the cities, districts, brands, and items that belong to that country, and if I delete a city it should likewise delete everything under it (districts, brands, etc.).
A Note
When I delete a country and delete the associated brands, I should take care that a brand might have items in more than one country.
The Question
Do you suggest that I:
1. Flag the items (whether country, city, item, etc.) as deleted? This will require a lot of code to check, every time an item is loaded from the database, whether it is deleted or not, plus some extra fields to mark whether the city it belongs to is deleted, whether the country it belongs to is deleted, and so on.
2. Move the deleted records each to a specific table (DeletedCountries, DeletedCities, etc.) and save the IDs of the items they were associated with, so I can insert them back into their original tables later? This would spare my application all the code needed to check deleted items and make sure the whole hierarchy is deleted.
Maybe you have a better approach/advice/idea about achieving such a thing!

For argument's sake, one advantage of solution #2 (moving deleted items to their own tables) is that if you have lots and lots of records, you do not have to worry about indexing records with respect to their "deleted" state.
With that said, if I were going to "move" data from table to table (via delete followed by insert) I would make sure to do it in 1 transaction.
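As a minimal sketch of that move-in-one-transaction idea using Dapper - the Countries/DeletedCountries tables and the CountryId/Name columns are made-up names for illustration, not anything from the question:
using System.Data.SqlClient;
using Dapper;

public static class CountryArchiver
{
    // Copies the row into the archive table and removes the original in a
    // single transaction, so a failure part-way through leaves the data intact.
    public static void ArchiveCountry(string connectionString, int countryId)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                connection.Execute(
                    @"INSERT INTO DeletedCountries (CountryId, Name, DeletedOn)
                      SELECT CountryId, Name, GETDATE() FROM Countries WHERE CountryId = @Id",
                    new { Id = countryId }, transaction);

                connection.Execute(
                    "DELETE FROM Countries WHERE CountryId = @Id",
                    new { Id = countryId }, transaction);

                transaction.Commit();
            }
        }
    }
}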

I'm using a technique right now where we store a 'DeleteDate' on every user-maintained table in our database. The DeleteDate field is a smalldatetime data type with a default value of 6/1/2079.
Coupled with an index on the DeleteDate field, we are able to use a standard View or User-Defined-Function to return only the 'current' records (that is, those records with a delete date in the future). All queries route through this index when looking for current data, and deletes become a trivial update query.
There are some additional logic checks that need to be done for related tables, but that is part of the price of never having to worry about a user 'accidentally' deleting valuable data.
In the future, when these tables are excessively large and there are a lot of deleted records present, we can partition the table first on the DeleteDate. This will move all 'deleted' records away from the 'live' records.
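As a rough sketch of the pattern (assuming a hypothetical Items table; the real table, view, and index names will differ), a delete becomes a trivial update and "current" reads filter on DeleteDate:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

public class Item
{
    public int ItemId { get; set; }
    public string Name { get; set; }
    public DateTime DeleteDate { get; set; }
}

public static class ItemRepository
{
    // A "delete" is just an update of the DeleteDate column.
    public static void SoftDelete(SqlConnection connection, int itemId)
    {
        connection.Execute(
            "UPDATE Items SET DeleteDate = GETDATE() WHERE ItemId = @Id",
            new { Id = itemId });
    }

    // Current rows are those whose DeleteDate is still in the future;
    // an index on DeleteDate keeps this query cheap.
    public static IEnumerable<Item> GetCurrent(SqlConnection connection)
    {
        return connection.Query<Item>(
            "SELECT ItemId, Name, DeleteDate FROM Items WHERE DeleteDate > GETDATE()");
    }
}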

Flagging an item as deleted really complicates information retrieval, and you also need to handle cascading removal yourself.
I would choose the "mailbox" approach, which moves deleted records to a different table. I have done a project that used soft delete, and I ended up putting all delete calls into a stored procedure and handling the copy and removal there.

You should manage your hierarchy by tagging all subitems as deleted. That way, if e.g. a product belongs to a brand, you only need to check whether the brand is deleted. You should also put this logic on the data-retrieval side, to avoid unnecessarily fetching deleted information.
SELECT
    *
FROM
    products p
    JOIN category c ON p.catId = c.Id
WHERE
    c.Deleted = 0
And above all, the Deleted flag on category should be indexed.
ALTER TABLE category ADD CONSTRAINT PK_category PRIMARY KEY (Id)
CREATE INDEX IX_category_Deleted ON category (Deleted)
or
CREATE INDEX IX_category_Id_Deleted ON category (Id, Deleted)

I think flagging the item is the best approach, and I have also used the mailbox approach for soft deletes.
Yes, it requires a lot of things to take into account and manage, but I haven't found any other way. I just add one extra column, Status, with a bit data type, to each and every table.
Thanks

How complex a delete technique are you asking for?
With just one date field and no audit log, you can have an instant deleted flag. If the date field is null, the row has not been deleted. You can then use that date field in an index (if the index allows nulls).
If you want something more complex, you could use extra tables. Will you allow an item to be deleted, undeleted, redeleted, and maintain a record of each of those? If so, keep a separate table for action logging and keep only the one record with a boolean field (actually, a join on that table might be faster; it depends on the data).
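A small sketch of the single-date-field idea, assuming a hypothetical Orders table with a nullable DeletedOn column (names are made up); on SQL Server a filtered index covers only the live rows:
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

public static class OrderQueries
{
    // One-time setup: a filtered index that covers only rows that have not
    // been deleted, so queries for live data stay fast as the table grows.
    public static void CreateLiveRowsIndex(SqlConnection connection)
    {
        connection.Execute(
            @"CREATE INDEX IX_Orders_Live ON Orders (CustomerId)
              WHERE DeletedOn IS NULL");
    }

    // A NULL DeletedOn means the row has not been deleted.
    public static IEnumerable<int> GetLiveOrderIds(SqlConnection connection, int customerId)
    {
        return connection.Query<int>(
            "SELECT OrderId FROM Orders WHERE CustomerId = @CustomerId AND DeletedOn IS NULL",
            new { CustomerId = customerId });
    }
}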

If you often reconstitute the items, flagging is the preferable approach, but you end up having to alter your data access to avoid showing the flagged items, which can be rather painful if you have already written a lot of code accessing your data; so moving may be better if you have a lot of "legacy" code accessing the data. If restoring is rare, and you are also interested in a history log, moving to another database table works well.
One easy way to achieve either is to use a trigger that intercepts the delete and performs the operation. If you actually do need to delete items, however, the flag option becomes a royal PITA compared to moving them. The reason a trigger is easier in many cases is that you capture every delete, not just those initiated by your code.
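For the "move" flavour, a hedged sketch of such a trigger on a hypothetical Items/DeletedItems pair (all names invented for illustration) - an INSTEAD OF DELETE trigger catches deletes from any source, including ad-hoc statements run in SSMS:
using System.Data.SqlClient;
using Dapper;

public static class ArchiveTriggerSetup
{
    // Deployed once. The trigger fires for every DELETE against Items,
    // copies the affected rows into DeletedItems, then performs the real delete.
    private const string CreateTriggerSql = @"
        CREATE TRIGGER trg_Items_ArchiveOnDelete ON Items
        INSTEAD OF DELETE
        AS
        BEGIN
            SET NOCOUNT ON;
            INSERT INTO DeletedItems (ItemId, Name, DeletedOn)
            SELECT ItemId, Name, GETDATE() FROM deleted;

            DELETE FROM Items WHERE ItemId IN (SELECT ItemId FROM deleted);
        END";

    public static void Install(SqlConnection connection)
    {
        connection.Execute(CreateTriggerSql);
    }
}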

Related

How to use Dapper with Change Tracking to save an altered list?

I am looking at Dapper as the ORM for our next project, but something is not clear to me.
In this question there are answers on how to do inserts, updates and deletes.
Since this question is already a bit older, maybe there are better ways nowadays.
But my biggest concern is how to do an ApplyUpdates on a list.
Suppose you have a List<Customer> that is built as shown here
And suppose you show this list in a DataGridView.
Now, the user will
alter the data of a few rows,
insert a few new rows,
delete a few rows
And when he clicks on the save button, at that time you want to save all these changes in this List<Customer> to your database, using Dapper.
How can I go about that ?
If I have to loop through the list and for each row call an insert, update or delete statement, then how can I determine what operation to use ? The deleted rows will be gone from the list.
I also want to make sure that if one statement fails, all will be rollbacked.
And I need the primary key for all new rows returned and filled in the DataGridView.
In other words, all that ADO DataAdapter/DataTable does for you.
What is the best way to do this using Dapper ?
EDIT
The best way I can think of now is to keep 3 lists in memory and, when the user alters some data, add a row to the update list, and do the same for the insert list and the delete list, so I can run through these 3 lists on the button click.
But I am hoping there is a better alternative built into Dapper for this kind of situation.
You need to handle this yourself, as Dapper doesn't manage it. There are several theories for how to do it.
1. Delete all items and then add them again.
- Easy to implement.
- Bad for DB performance - effectively 2 DB writes per row.
2. Loop through the items and update without checking for changes.
- Not too difficult to implement.
- DB performance better than option 1, but not ideal.
- Adds and deletes are more complex to detect than updates.
3. Loop through the items and update only if there are differences.
- More difficult to implement.
- Requires reading from the DB first to compare values (an extra DB read).
4. Store changes in a separate list.
- Even more difficult to implement, as you need to "wrap" list updates in another class (a first-class collection?) and store the changes.
- Most efficient for the DB, as you execute only the minimum on each DB item.
In the end, you might select different approaches for different Entities depending on how you need to optimise. e.g. Option 1 is fine if you know you will only have a few entities and not many updates.
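To make that concrete, here is a rough sketch of the "separate lists" idea from the question's edit, run inside one transaction so everything rolls back together. The Customers table, its Name column, and the SCOPE_IDENTITY() key retrieval are assumptions for illustration; Dapper itself does not track any of this for you.
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerWriter
{
    // Applies the three in-memory lists (inserted, updated, deleted) in one
    // transaction; if any statement throws, nothing is committed.
    public static void SaveChanges(string connectionString,
        List<Customer> inserted, List<Customer> updated, List<int> deletedIds)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                foreach (var customer in inserted)
                {
                    // Return the generated key so the grid can display it afterwards.
                    customer.Id = connection.QuerySingle<int>(
                        @"INSERT INTO Customers (Name) VALUES (@Name);
                          SELECT CAST(SCOPE_IDENTITY() AS int);",
                        customer, transaction);
                }

                foreach (var customer in updated)
                {
                    connection.Execute(
                        "UPDATE Customers SET Name = @Name WHERE Id = @Id",
                        customer, transaction);
                }

                if (deletedIds.Count > 0)
                {
                    // Dapper expands the list parameter into IN (@p1, @p2, ...).
                    connection.Execute(
                        "DELETE FROM Customers WHERE Id IN @Ids",
                        new { Ids = deletedIds }, transaction);
                }

                transaction.Commit();
            }
        }
    }
}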

How to lock and unlock a SQL SERVER table?

At the risk of over-explaining my question, I'm going to err on the side of too much information.
I am creating a bulk upload process that inserts data into two tables. The two tables look roughly as follows; the first is a self-referencing table that allows N levels of reference.
Parts (self-referencing table)
--------
PartId (PK Int Non-Auto-Incrementing)
DescriptionId (Fk)
ParentPartId
HierarchyNode (HierarchyId)
SourcePartId (VARCHAR(500), a unique Part Id from the source)
(other columns)
Description
--------
DescriptionId (PK Int Non-Auto-Incrementing)
Language (PK either 'EN' or 'JA')
DescriptionText (varchar(max))
(I should note too that there are other tables that will reference our PartID that I'm leaving out of this for now.)
In Description, the combination of DescriptionText and Language will be unique, but a given DescriptionId will always have at least two instances.
Now, for the bulk upload process, I created two staging tables that look a lot like Parts and Description but don't have any PK's, Indexes, etc. They are Parts_Staging and Description_Staging.
In Parts_Staging there is an extra column that contains a Hierarchy Node String, which is the HierarchyNode in this kind of format: /1/2/3/ etc. Then when data is copied from the _Staging table to the actual table, I use a CAST(Source.Column AS hierarchyid).
Because of the complexity of the IDs shared across the two tables, the self-referencing IDs and the hierarchyid in Parts, and the number of rows to be inserted (possibly in the 100,000s), I decided to 100% compile ALL of the data in a C# model first, including the PK IDs. So the process looks like this in C#:
1. Query the two tables for the MAX ID.
2. Using those MAX IDs, compile a complete model of all the data for both tables (including the hierarchyid /1/2/3/).
3. Do a bulk insert into both _Staging tables.
4. Trigger a SP that copies non-duplicate data from the two _Staging tables into the actual tables. (This is where the CAST(Source.Column AS hierarchyid) happens.)
We are importing lots of parts books, and a single part may be replicated across multiple books. We need to remove the duplicates. In step 4, duplicates are weeded out by checking the SourcePartId in the Parts table and the Description in the DescriptionText in the Description table.
That entire process works beautifully! And best of all, it's really fast. But, if you are reading this carefully (and I thank you if you are), then you have already noticed one glaring, obvious problem.
If multiple processes are happening at the same time (and that absolutely WILL happen!) then there is a very real risk of getting the ID's mixed up and the data becoming really corrupted. Process1 could do the GET MAX ID query and before it manages to finish, Process2 could also do a GET MAX ID query, and because Process1 hasn't actually written to the tables yet, it would get the same ID's.
My original thought was to use a SEQUENCE object. And at first, that plan seemed to be brilliant. But it fell apart in testing because it's entirely possible that the same data will be processed more than once and eventually ignored when the copy happens from the _Staging tables to the final tables. And in that case, the SEQUENCE numbers will already be claimed and used, resulting in giant gaps in the ID's. Not that this is a fatal flaw, but it's an issue we would rather avoid.
So... that was a LOT of background info to ask this actual question. What I'm thinking of doing is this:
Lock both of the tables in question
Steps 1-4 as outlined above
Unlock both of the tables.
The lock would need to be a READ lock (which I think is an Exclusive lock?) so that if another process attempts to do the GET MAX ID query, it will have to wait.
My question is: 1) Is this the best approach? And 2) How does one place an Exclusive lock on a table?
Thanks!
I'm not sure what the best approach is, but in terms of placing an 'exclusive' lock on a table, simply using WITH (TABLOCKX) in your query will put one on the table.
If you wish to learn more about it:
https://msdn.microsoft.com/en-GB/library/ms187373.aspx
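For example, a hedged sketch of how step 1 could take the lock from the C# side, using the Parts and Description tables named in the question; the lock is only useful if the MAX query, the bulk insert, and the staging copy all run on the same transaction, since the exclusive lock is released when it commits.
using System.Data.SqlClient;
using Dapper;

public static class IdReservation
{
    // WITH (TABLOCKX) requests an exclusive lock on the whole table; any other
    // process running the same query blocks here until this transaction ends.
    public static (int maxPartId, int maxDescriptionId) GetMaxIds(
        SqlConnection connection, SqlTransaction transaction)
    {
        var maxPartId = connection.ExecuteScalar<int>(
            "SELECT ISNULL(MAX(PartId), 0) FROM Parts WITH (TABLOCKX)",
            transaction: transaction);

        var maxDescriptionId = connection.ExecuteScalar<int>(
            "SELECT ISNULL(MAX(DescriptionId), 0) FROM Description WITH (TABLOCKX)",
            transaction: transaction);

        return (maxPartId, maxDescriptionId);
    }
}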

Best approach to track Amount field on Invoice table when InvoiceItem items change?

I'm building an app where I need to store invoices from customers so we can track who has paid and who has not, and if not, see how much they owe in total. Right now my schema looks something like this:
Customer
- Id
- Name
Invoice
- Id
- CreatedOn
- PaidOn
- CustomerId
InvoiceItem
- Id
- Amount
- InvoiceId
Normally I'd fetch all the data using Entity Framework and calculate everything in my C# service (or even do the calculation on SQL Server), something like so:
var amountOwed = Invoice.Where(i => i.CustomerId == customer.Id)
                        .SelectMany(i => i.InvoiceItems)
                        .Select(ii => ii.Amount)
                        .Sum();
But calculating everything every time I need to generate a report doesn't feel like the right approach this time, because down the line I'll have to generate reports that calculate what all the customers owe (and sometimes go even higher up the hierarchy).
For this scenario I was thinking of adding an Amount field on my Invoice table and possibly an AmountOwed on my Customer table which will be updated or populated via the InvoiceService whenever I insert/update/delete an InvoiceItem. This should be safe enough and make the report querying much faster.
But I've also been searching some on this subject and another recommended approach is using triggers on my database. I like this method best because even if I were to directly modify a value using SQL and not the app services, the other tables would automatically update.
My question is:
How do I add a trigger to update all the parent tables whenever an InvoiceItem is changed?
And from your experience, is this the best (safer, less error-prone) solution to this problem, or am I missing something?
There are many examples of triggers that you can find on the web. Many are poorly written unfortunately. And for future reference, post DDL for your tables, not some abbreviated list. No one should need to ask about the constraints and relationships you have (or should have) defined.
To start, how would you write a query to calculate the total amount at the invoice level? Presumably you know the tsql to do that. So write it, test it, verify it. Then add your amount column to the invoice table. Now how would you write an update statement to set that new amount column to the sum of the associated item rows? Again - write it, test it, verify it. At this point you have all the code you need to implement your trigger.
Since this process involves changes to the item table, you will need to write triggers to handle all three types of dml statements - insert, update, and delete. Write a trigger for each to simplify your learning and debugging. Triggers have access to special tables - go learn about them. And go learn about the false assumption that a trigger works with a single row - it doesn't. Triggers must be written to work correctly if 0 (yes, zero), 1, or many rows are affected.
In an insert statement, the inserted table will hold all the rows inserted by the statement that caused the trigger to execute. So you merely sum the values (using the appropriate grouping logic) and update the appropriate rows in the invoice table. Having written the update statement mentioned in the previous paragraphs, this should be a relatively simple change to that query. But since you can insert a new row for an old invoice, you must remember to add the summed amount to the value already stored in the invoice table. This should be enough direction for you to start.
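As a hedged sketch only, following the column names in the question plus the proposed Amount column on Invoice (and assuming Amount defaults to 0), the insert-side trigger could look roughly like this; the update and delete triggers follow the same pattern using the deleted table:
using System.Data.SqlClient;
using Dapper;

public static class InvoiceTriggerSetup
{
    // Handles 0, 1, or many inserted rows by grouping per invoice and adding
    // the new item amounts to the total already stored on the Invoice row.
    private const string CreateInsertTriggerSql = @"
        CREATE TRIGGER trg_InvoiceItem_Insert ON InvoiceItem
        AFTER INSERT
        AS
        BEGIN
            SET NOCOUNT ON;
            UPDATE inv
            SET inv.Amount = inv.Amount + agg.AddedAmount
            FROM Invoice AS inv
            JOIN (SELECT InvoiceId, SUM(Amount) AS AddedAmount
                  FROM inserted
                  GROUP BY InvoiceId) AS agg
              ON agg.InvoiceId = inv.Id;
        END";

    public static void Install(SqlConnection connection)
    {
        connection.Execute(CreateInsertTriggerSql);
    }
}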
And to answer your second question - the safest and easiest way is to calculate the value every time. I fear you are trying to solve a problem that you do not have and that you may never have. Generally speaking, no one cares about invoices that are of "significant" age. You might care about unpaid invoices for a period of time, but eventually you write these things off (especially if the amounts are not significant). Another relatively easy approach is to create an indexed view to calculate and materialize the total amount. But remember - nothing is free. An indexed view must be maintained and it will add extra processing for DML statements affecting the item table. Indexed views do have limitations - which are documented.
And one last comment. I would strongly hesitate to maintain a total amount at any level higher than the invoice. Above that level one frequently wants to filter the results in various ways - date, location, type, customer, etc. At that point you are approaching data warehouse functionality, which is not appropriate for an OLTP system.
First of all, never use triggers for business logic. Triggers are tricky and easily forgotten, and they make an application hard to maintain.
In most cases you can easily populate your reporting data via Entity Framework or a SQL query. But if it requires lots of joins, you need to consider using staging tables, because reporting requires data denormalization. To populate staging tables you can use SQL jobs or another scheduling mechanism (Azure Scheduler, maybe). This way you won't need to work with lots of joins and your reports will populate faster.

SQL Merge Replication Issue

I have an issue regarding merge replication. I have a table SETTINGS in which I store the settings of my software.
The schema of the table is ID (PK), Description, Value.
Suppose I have 15 rows in this table on my server.
Now I have applied a filter on this table saying that only the first 10 rows should replicate.
With these settings, when I sync for the first time, I receive the 10 rows on my client (which holds the subscription).
Then I add the remaining 5 rows on my client.
Now when I sync again it gives me a conflict saying that
A row insert at 'ClientServer.ClientDatabaseName' could not be
propagated to 'MyServer.ServerDatabaseName'. This failure can be
caused by a constraint violation. Violation of PRIMARY KEY constraint
'PK_SETTINGS'. Cannot insert duplicate key in object 'dbo.SETTINGS'.
The duplicate key value is (11).
What I don't understand is why it is trying to replicate a row which is outside the subset filter applied on that table? Please help, guys.
Is this scenario not possible with merge replication?
https://msdn.microsoft.com/en-us/library/ms151775.aspx - the link suggests that this is possible, but I am confused.
Filters created for a merge article are evaluated only at the publisher. Changes made at the subscriber will always be propagated back to the publisher, even if they are outside the filter criteria. However, if the changes from one subscriber do not meet the filtering criteria, they will sit on the publisher but not be replicated to all the other subscribers.
Is this a production scenario, or are you playing around with replication? If you do static filtering, which is what you have above, it is typically done on read-only types of tables. For example, a salesperson in the field may only need prices for products in their region. They are not expected to update this table. If you do dynamic filtering, for example, filtering based on HOSTNAME(), then you would only get data specific to that user. For example, a salesperson in the field would receive only their customer information. Thus, any updates to that information, unless it's shared across multiple salespersons, would propagate back up, and not flow to anyone else.
In your case, I would not recommend updating tables on the subscriber that have static filters; I suggest re-evaluating your filtering design to ensure you have the right filtering model for your scenario.

Keep info about deleted entities

In our project we need the ability to say who deleted an entity and when.
After some investigation I've found the following solutions:
Add IsDeleted and DeletedBy columns to every table and set them before deletion (using the delete event of NH). But there is a drawback to this solution: we have many SQL views which should work only with non-deleted data, so to achieve this we would have to write a view over each table acting as a filter (WHERE IsDeleted = 0).
Serialize each entity to XML before deletion and store it in a single separate table with the following structure: Id | XML | DeletedBy
From your point of view, which of these solutions is preferred, or are there other solutions I didn't mention above?
P.S. The deleted rows should be excluded from queries (both NHibernate and SQL).
I see three options:
1. Hard delete. The rows do not exist.
2. Soft delete. As you describe. Yep, you'll have to tack on IsSoftDeleted checks EVERYWHERE. EVERYWHERE. EVERYWHERE. It's a total pain.
3. Archive table. Create a table that is an exact replica of the existing table, and do the move (to the archive table) and the delete (from the original table) in a transaction.
I've worked with #2 and #3. I prefer #3 because you avoid the EVERYWHERE additional clauses.
With #2, you may also have to figure out constraints that allow for one non-soft-deleted row (based on the unique constraint) but also allow duplicates of soft-deleted rows that would otherwise violate the unique constraint. Yep, good times.
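On SQL Server, one hedged way around that last constraint problem is a filtered unique index, which enforces uniqueness only for rows that are not soft-deleted; the Customers table and Email column here are just for illustration:
using System.Data.SqlClient;
using Dapper;

public static class SoftDeleteConstraints
{
    // Uniqueness applies only to live rows, so any number of soft-deleted
    // rows may share the same Email value without violating the index.
    public static void CreateActiveUniqueIndex(SqlConnection connection)
    {
        connection.Execute(
            @"CREATE UNIQUE INDEX UX_Customers_Email_Active
              ON Customers (Email)
              WHERE IsSoftDeleted = 0");
    }
}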
