I'm deleting data from a database that is about 1.8 GB big (through a C# app).
The same operation on smaller databases (~600 MB) runs without problems, but on the big one I'm getting:
Lock wait timeout exceeded; try restarting transaction.
Will innodb_lock_wait_timeout fix the problem, or is there another way?
I don't think that optimizing queries is a solution, because there is no way to make them simpler.
I'm deleting parts of the data on conditions and relations, not all the data.
You can split the delete statement into smaller parts that won't time out.
For example, delete the rows with ids 1 to 10,000, execute and commit, then do the same for ids 10,000 to 20,000, and so on.
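A minimal C# sketch of that idea, assuming the MySql.Data connector and a hypothetical table mytable with a numeric id column (the connection string, table and column names are placeholders; add your real filter conditions to the WHERE clause):

    using System;
    using MySql.Data.MySqlClient;

    static void DeleteInBatches(string connectionString)
    {
        // Delete in chunks so each statement holds locks only briefly
        // and never runs into the lock wait timeout.
        const int batchSize = 10000;
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();

            long maxId;
            using (var max = new MySqlCommand("SELECT COALESCE(MAX(id), 0) FROM mytable", conn))
                maxId = Convert.ToInt64(max.ExecuteScalar());

            for (long start = 0; start <= maxId; start += batchSize)
            {
                // Add your real conditions/joins to this WHERE clause.
                using (var del = new MySqlCommand(
                    "DELETE FROM mytable WHERE id >= @from AND id < @to", conn))
                {
                    del.Parameters.AddWithValue("@from", start);
                    del.Parameters.AddWithValue("@to", start + batchSize);
                    del.CommandTimeout = 300; // seconds
                    del.ExecuteNonQuery();    // with autocommit on, each batch commits on its own
                }
            }
        }
    }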
You mentioned that you were '...deleting parts of the data based on some conditions and relations, not all the data'. I would check that there are appropriate indexes on all the keys you are using to filter the data to delete.
If you were to show us your schema and where clause we could suggest ones that may help.
You should also consider splitting your delete into multiple batches of smaller numbers of rows.
Another alternative would be to do a SELECT INTO another table with only the data you want to keep, drop the original table, then rename the new one.
Right-click the table --> Script Table As --> Create To --> New Query, and save the query.
Right-click the table --> Delete.
Refresh your database (and IntelliSense) so it forgets the table, then run the saved script, which will recreate the table; that leaves you with an empty table.
Or you can simply increase the setting for innodb_lock_wait_timeout (or table_lock_wait_timeout, not sure which) if you don't want to delete all the info in the table.
If you're deleting all the rows in the table, use
Truncate table *tablename*
The DELETE command logs every removed row as it works, while TRUNCATE effectively drops and recreates the table, so it is much faster; the trade-off is that it cannot be filtered with a WHERE clause.
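If you do go the route of raising innodb_lock_wait_timeout suggested above, note that it can be changed for just the current session (the default is 50 seconds), so a small sketch from the C# side, again assuming MySql.Data, would be:

    using (var conn = new MySqlConnection(connectionString))
    {
        conn.Open();

        // Raise the lock wait timeout for this connection only.
        using (var cmd = new MySqlCommand("SET SESSION innodb_lock_wait_timeout = 300", conn))
            cmd.ExecuteNonQuery();

        // ... run the big DELETE on this same connection ...
    }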
I am looking at Dapper as ORM for our next project, but something is not clear to me.
In this question there are answers on how to do inserts, updates and deletes.
Since this question is already a bit older, maybe there are better ways nowadays.
But my biggest concern is how to do an ApplyUpdates on a list.
Suppose you have a List<Customer> that is built as shown here.
And suppose you show this list in a DataGridView.
Now, the user will
alter the data of a few rows,
insert a few new rows,
delete a few rows
And when he clicks on the save button, at that time you want to save all these changes in this List<Customer> to your database, using Dapper.
How can I go about that?
If I have to loop through the list and call an insert, update or delete statement for each row, then how can I determine which operation to use? The deleted rows will be gone from the list.
I also want to make sure that if one statement fails, all will be rollbacked.
And I need the primary key for all new rows returned and filled in the DataGridView.
In other words, all that ADO DataAdapter/DataTable does for you.
What is the best way to do this using Dapper?
EDIT
The best way I can think of now is to keep three lists in memory: when the user alters some data, add the row to an update list, and do the same for an insert list and a delete list, so I can run through these three lists on the button click.
But I am hoping there is a better alternative built into Dapper for this kind of situation.
You need to handle this yourself, as Dapper doesn't manage it. There are several approaches you can take.
Delete all items and then add them again.
Easy to implement.
Bad for DB performance: it is effectively two DB writes per row.
Loop through the items and update without checking for changes
Not too difficult to implement.
DB performance better than option 1, but not ideal.
Adds and deletes are more complex to detect than updates.
Loop through the items and update only if there are differences
More difficult to implement.
Requires reading from the DB first to compare values (extra DB action)
Store changes in a separate list
Even more difficult to implement, as you need to "wrap" List updates into another class (first class collection?) and store changes
Most efficient for DB, as you execute only the minimum on each DB item.
In the end, you might select different approaches for different Entities depending on how you need to optimise. e.g. Option 1 is fine if you know you will only have a few entities and not many updates.
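A minimal sketch of option 4 with plain Dapper, assuming a hypothetical Customer class with Id, Name and City properties and the three change-tracking lists from the question (all names and the connection string are placeholders). Everything runs in one transaction, so a failure rolls every statement back, and SCOPE_IDENTITY() puts the generated keys back on the inserted objects for the DataGridView:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using Dapper;

    public void Save(string connectionString,
                     List<Customer> inserted, List<Customer> updated, List<Customer> deleted)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                foreach (var c in inserted)
                {
                    // Return the identity so it can be shown in the grid.
                    c.Id = conn.ExecuteScalar<int>(
                        "INSERT INTO Customer (Name, City) VALUES (@Name, @City); " +
                        "SELECT CAST(SCOPE_IDENTITY() AS int);",
                        c, tx);
                }

                // Dapper executes the statement once per element of the list passed as the parameter.
                conn.Execute("UPDATE Customer SET Name = @Name, City = @City WHERE Id = @Id", updated, tx);
                conn.Execute("DELETE FROM Customer WHERE Id = @Id", deleted, tx);

                tx.Commit(); // any exception before this line means nothing is persisted
            }
        }
    }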
I have a SharePoint list on a site that I want to update nightly from a SQL Server DB, preferably using C#. Here is the catch: I do not know whether any records were removed or added, or whether any field in any record was updated. The simplest thing, then, would seem to be to remove the data from the list and replace it with the new data. But is there any simple way to do this? I would hate to remove 3000+ items from the list line by line and then add the 3000+ records back one at a time.
It depends on your environment. If you don't have that much load on the systems at night, I would prefer one of the following approaches:
1) Build a timer job: delete the list (not the items one by one, because that is slow), recreate it and import the items from the DB. When we are talking about 3,000-5,000 elements, that is not that much, and I think it would be done in under 10 minutes.
2) Loop through the SharePoint list items and check field by field whether each one was updated in the DB; if so, update it.
I would prefer to delete the list and import the complete table, because we are not talking about that much data.
Another way, which is a good idea, is to use BCS or BDC. Then you would always have the data in place and synced with the DB. Look at:
https://msdn.microsoft.com/en-us/library/office/jj163782.aspx
https://msdn.microsoft.com/de-de/library/ee231515(v=vs.110).aspx
Unfortunately there is no "easy" and/or elegant way to delete all the items in a list, like the delete statement in SQL. You can either delete the entire list and recreate it if the list can be easily created from a list definition or, if your concern is performance, since SP 2007 the SPWeb Class has a method called ProcessBatchData. You can use it to batch process commands to avoid the performance penalty of issuing 6000 separate commands to the server. However, it still requires you to pass an ugly XML that contains a list of all the items to be deleted or added.
The ideal way is to enumerate all the rows from the database and check whether each row already exists in the SharePoint list, using a primary field value. If it already exists, simply update it[1]. Otherwise, add a new item.
[1] - Optionally, while updating, compare the list item field values with the database column values and update the item only if any field has actually changed; otherwise skip it.
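A rough sketch of that enumerate-and-compare approach using the server-side object model (the site URL, list name, key field, the compared columns and the databaseTable variable are all placeholders; error handling is omitted):

    using System;
    using System.Collections.Generic;
    using System.Data;
    using Microsoft.SharePoint;

    static void SyncList(DataTable databaseTable)
    {
        using (SPSite site = new SPSite("http://server/site"))
        using (SPWeb web = site.OpenWeb())
        {
            SPList list = web.Lists["MyList"];

            // Index existing list items by the primary key column.
            var existing = new Dictionary<string, SPListItem>();
            foreach (SPListItem item in list.Items)
                existing[Convert.ToString(item["KeyField"])] = item;

            foreach (DataRow row in databaseTable.Rows)
            {
                string key = row["KeyField"].ToString();
                SPListItem item;
                if (existing.TryGetValue(key, out item))
                {
                    // Only write back when something actually changed, to avoid needless updates.
                    if (!Equals(item["Title"], row["Title"]))
                    {
                        item["Title"] = row["Title"];
                        item.Update();
                    }
                    existing.Remove(key); // this key is still present in the DB
                }
                else
                {
                    SPListItem newItem = list.Items.Add();
                    newItem["KeyField"] = key;
                    newItem["Title"] = row["Title"];
                    newItem.Update();
                }
            }

            // Whatever is left was not found in the DB anymore, so remove it from the list.
            foreach (SPListItem orphan in existing.Values)
                orphan.Delete();
        }
    }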
I am using C# with .NET 4.5. I am making a scraper which collects specific data. Each time a value is scraped, I need to make sure it hasn't already been added to the SQLite db.
To do this, I am making a call each time a value is scraped to query against the db to check if it contains the value, and if not, I make another call to insert the value into the db.
Since I am scraping multiple values per second, this gets to be very IO-intensive, with constant calls to the db.
My question is, is there any better way to do this? Perhaps I could queue the values scraped and then run a batch query at once? Is that possible?
I see three approaches:
Use INSERT OR IGNORE, which will skip an entry if it is already present (based on the primary key and unique constraints). Or use a plain INSERT (or its equivalent, INSERT OR ABORT), which will return SQLITE_CONSTRAINT, an error you will have to catch and handle if you want to count failed insertions.
Accumulate the updates you want to make outside the database. When you have accumulated enough (or all of them), start a transaction (BEGIN;), do your insertions (you can use INSERT OR IGNORE here as well), then commit the transaction (COMMIT;).
If your data model allows it, you could pre-fetch the list of items you already have and check new values against that list in memory.
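A sketch combining the first two ideas, assuming the System.Data.SQLite provider and a made-up table values_seen with a UNIQUE column value: the scraper queues values in memory, and a size threshold (or timer) flushes them in a single transaction.

    using System.Collections.Generic;
    using System.Data.SQLite;

    static void Flush(Queue<string> pending, SQLiteConnection conn)
    {
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SQLiteCommand(
            "INSERT OR IGNORE INTO values_seen (value) VALUES (@value)", conn, tx))
        {
            var p = cmd.Parameters.Add("@value", System.Data.DbType.String);
            while (pending.Count > 0)
            {
                p.Value = pending.Dequeue();
                cmd.ExecuteNonQuery(); // duplicates are silently skipped by the UNIQUE constraint
            }
            tx.Commit(); // one transaction = one disk sync instead of one per value
        }
    }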
I have two mirrored DataTables (same structure, with a two-column primary key):
DataTable_A ---> bound to a datagridView
DataTable_B ---> filled from a database
Since DataTable_B is refilled by a database query every 2 seconds, I need DataTable_A to mirror DataTable_B without filling DataTable_A directly. When a record disappears from DataTable_B, I need to delete that record from DataTable_A as well. What is the best way to do this?
Right now I am running a for loop over each row of DataTable_B, and if the row doesn't exist in DataTable_A, I delete it.
Is there a better way to do it?
The best way may be not to have a TableA at all but to use a DataView on TableB. That would solve all the problems at once. Can you elaborate on why you need the copy?
Otherwise, you would want to handle the RowChanged, TableNewRow and RowDeleted events of TableB.
A more general idea, after seeing your comments: if you can add a timestamp column to the table in the database, you can run a much more efficient query, and the DataTable.Merge method would do the rest.
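A minimal illustration of those suggestions (variable names are the ones from the question; this assumes both tables have their PrimaryKey columns set so Merge can match rows, and that System.Data and System.Windows.Forms are referenced):

    // Option 1: skip DataTable_A entirely and bind the grid straight to a view of DataTable_B.
    dataGridView1.DataSource = new DataView(dataTable_B);

    // Option 2: react to changes on B as they happen.
    dataTable_B.TableNewRow += (s, e) => { /* add the matching row to DataTable_A */ };
    dataTable_B.RowChanged  += (s, e) => { /* update the matching row in DataTable_A */ };
    dataTable_B.RowDeleted  += (s, e) => { /* remove the matching row from DataTable_A */ };

    // Option 3: let Merge pull added/changed rows from B into A (matched on the primary key).
    // Rows deleted from B still have to be removed from A separately.
    dataTable_A.Merge(dataTable_B);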
My goal is to maximise performance. The basics of the scenario are:
I read some data from SQL Server 2005 into a DataTable (1000 records x 10 columns)
I do some processing in .NET of the data, all records have at least 1 field changed in the DataTable, but potentially all 10 fields could be changed
I also add some new records in to the DataTable
I do a SqlDataAdapter.Update(myDataTable.GetChanges()) to persist the updates (and inserts) back to the db using an InsertCommand and UpdateCommand I defined at the start
Assume table being updated contains 10s of millions of records
This is fine. However, if a row has changed in the DataTable then ALL columns for that record are updated in the database even if only 1 out of 9 columns has actually changed value. This means unnecessary work, particularly if indexes are involved. I don't believe SQL Server optimises this scenario?
I think, if I was able to only update the columns that had actually changed for any given record, that I should see a noticeable performance improvement (esp. as cumulatively I will be dealing with millions of rows).
I found this article: http://netcode.ru/dotnet/?lang=&katID=30&skatID=253&artID=6635
But don't like the idea of doing multiple UPDATEs within the sproc.
Short of creating individual UPDATE statements for each changed DataRow and then firing them off somehow in a batch, I'm looking for other people's experiences/suggestions.
(Please assume I can't use triggers)
Thanks in advance
Edit: Any way to get SqlDataAdapter to send UPDATE statements specific to each changed DataRow (only to update the actual changed columns in that row) rather than giving a general .UpdateCommand that updates all columns?
Isn't it possible to implement your own IDataAdapter in which you implement this functionality?
Of course, the DataAdapter only fires the appropriate SqlCommand, which is determined by the RowState of each DataRow.
So this means that you would have to generate the SQL command that has to be executed for each situation...
But, I wonder if it is worth the effort. How much performance will you gain ?
I think that - if it is really necessary - I would disable all my indexes and constraints, do the update using the regular SqlDataAdapter, and afterwards enable the indexes and constraints.
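If you do decide to generate per-row statements yourself, the DataRow already keeps both versions of every value, so a sketch of building an UPDATE that touches only the changed columns might look like this (the table name MyTable and key column Id are placeholders):

    using System.Data;
    using System.Data.SqlClient;
    using System.Text;

    static SqlCommand BuildUpdate(DataRow row, SqlConnection conn)
    {
        var sql = new StringBuilder("UPDATE MyTable SET ");
        var cmd = new SqlCommand { Connection = conn };
        bool first = true;

        foreach (DataColumn col in row.Table.Columns)
        {
            // Compare the value originally loaded from the DB with the current one.
            if (!Equals(row[col, DataRowVersion.Original], row[col, DataRowVersion.Current]))
            {
                if (!first) sql.Append(", ");
                sql.AppendFormat("[{0}] = @{0}", col.ColumnName);
                cmd.Parameters.AddWithValue("@" + col.ColumnName, row[col]);
                first = false;
            }
        }

        if (first) return null; // the row is marked Modified but no value actually differs

        sql.Append(" WHERE [Id] = @OldId");
        cmd.Parameters.AddWithValue("@OldId", row["Id", DataRowVersion.Original]);
        cmd.CommandText = sql.ToString();
        return cmd;
    }

Keep in mind that each combination of changed columns still yields a different statement text, so the plan cache will end up holding many variants.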
You might try creating an XML document of your changed dataset, passing it as a parameter to a sproc, and then doing a single update by using the SQL nodes() function to translate the XML into tabular form.
You should never try to update a clustered index; if you do, it's time to rethink your DB schema.
I would VERY much suggest that you do this with a stored procedure.
Let's say that you have 10 million records to update, and let's say that each record is 100 bytes (for 10 columns this could be too small, but let's be conservative). That amounts to roughly 1 GB of data that must be transferred from the database (network traffic), held in memory, and then returned to the database in the form of UPDATE or INSERT statements, which are much more verbose to transfer.
I expect that SP would perform much better.
Then again, you could divide your work into smaller SPs (called from the main SP) that update just the necessary fields, and gain additional performance that way.
Disabling indexes/constraints is also an option.
EDIT:
Another thing you must consider is the potential number of different UPDATE statements. With 10 fields per row, each field can either stay the same or change, so if you construct your UPDATE statements to reflect this you could end up with 2^10 = 1024 different statements, and each of them must be parsed by SQL Server, have an execution plan calculated, and have the parsed statement stored in the plan cache. There is a price to pay for this.