Split a .NET DataSet? - c#

I am working with a webservice that accepts a ADO.NET DataSet. The webservice will reject a submission if over 1000 rows are changed across all of the ten or so tables in the dataset.
I need to take my dataset and break it apart into chunks of less than 1000 changed rows. I can use DataSet.GetChanges() to produce a reduced dataset, but that still may exceed the changed row limit. Often a single table will have more than 1000 changes.
Right now, I think I need to: Create an empty copy of the dataset
Iterate over the DataTableCollection and .Add rows individually to the new tables until I get to the limit. Start a new dataset, and repeat until I've gone through everything.
Am I missing a simpler approach to this?

This is asking for trouble. Often changes to one table are dependent on changes to another. You don't want to split those up or bad things which may be difficult to debug problems will occur unless you are very very careful about this. Most likely, the "right" thing to do here is to submit changes to the webservice on a more frequent basis instead of batching them up so much.

Related

How to use Dapper with Change Tracking to save an altered list?

I am looking at Dapper as ORM for our next project, but something is not clear to me.
In this question there are answers on how to do inserts, updates and deletes.
Since this question is already a bit older, maybe there are better ways now a days..
But my biggest concern is how to do an ApplyUpdates on a list.
Suppose you have a List<Customer> that is build like shown here
And suppose you show this list in a DataGridView.
Now, the user will
alter the data of a few rows,
insert a few new rows,
delete a few rows
And when he clicks on the save button, at that time you want to save all these changes in this List<Customer> to your database, using Dapper.
How can I go about that ?
If I have to loop through the list and for each row call an insert, update or delete statement, then how can I determine what operation to use ? The deleted rows will be gone from the list.
I also want to make sure that if one statement fails, all will be rollbacked.
And I need the primary key for all new rows returned and filled in the DataGridView.
In other words, all that ADO DataAdapter/DataTable does for you.
What is the best way to do this using Dapper ?
EDIT
The best way I can think of now is to keep 3 list in memory and when the user alters some data, add a row in the update list, same for insert list and deleted list so I can run through these 3 list on the button click.
But I am hoping there is a better alternative build in Dapper for this kind of situation.
You need to handle this yourself, as Dapper doesn't manage it. There are several theories for how to do it.
Delete all items and then add them again.
Easy to implement.
Bad for DB performance, which is effectively 2 DB writes per row.
Loop through the items and update without checking for changes
Not too difficult to implement.
DB performance better than option 1, but not ideal.
Add and deletes are more complex to detect than updates.
Loop through the items and update only if there are differences
More difficult to implement.
Requires reading from the DB first to compare values (extra DB action)
Store changes in a separate list
Even more difficult to implement, as you need to "wrap" List updates into another class (first class collection?) and store changes
Most efficient for DB, as you execute only the minimum on each DB item.
In the end, you might select different approaches for different Entities depending on how you need to optimise. e.g. Option 1 is fine if you know you will only have a few entities and not many updates.

storing dataset of entire table and doing query on copy then updating GridView with results of query

I'm new to n-tier enterprise development. I just got quite a tutorial just reading threw the 'questions that may already have your answer' but didn't find what I was looking for. I'm doing a geneology site that starts off with the first guy that came over on the boat, you click on his name and the grid gets populated with all his children, then click on one of his kids that has kids and the grid gets populated with his kids and so forth. Each record has an ID and a ParentID. When you choose any given person, the ID is stored and then used in a search for all records that match the ParentID which returns all the kids. The data is never changed (at least by the user) so I want to just do one database access, fill all fields into one datatable and then do a requery of it each time to get the records to display. In the DAL I put all the records into a List which, in the ObjectDataSource the function that fills the GridView just returns the List of all entries. What I want to do is requery the datatable, fill the list back up with the new query and display in the GridView. My code is in 3 files here
(I can't get the backticks to show my code in this window) All I need is to figure out how to make a new query on the existing DataTable and copy it to a new DataTable. Hope this explains it well enough.
[edit: It would be easier to just do a new query from the database each time and it would be less resource intensive (in the future if the database gets too large) to store in memory, but I just want to know if I can do it this way - that is, working from 1 copy of the entire table] Any ideas...
Your data represents a tree structure by nature.
A grid to display it may not be my first choice...
Querying all data in one query can be done by using a complex SP.
But you are already considering performance. Thats always a good thing to keep in mind when coming up with a design. But creating something, improve it and only then start to optimize seems a better to go.
Since relational databases are not real good on hierarchical data, consider a nosql (graph)database. As you mentioned there are almost no writes to the DB, nosql shines here.

Paging through a datatable in codebehind

I need to handle very large datatables (2 million rows+) that comes from databases (SQL, Oracle, Access, MySQL, Sharepoint etc) outside of my control: Currently I loop through every row and column building a string object, but I run out of memory at about 100k rows.
The only solution I may take is to break the datatable into smaller pieces and persisting each block before starting on the next block of rows.
Since I cannot add ROW_NUMBER() or anything similar, I have to handle the populated datatable.
How can I easily (keep performance in mind) break the populated datatable into smaller datatables like paging?
PS there is no visual component to this functionality.
Are you using string concatenation? like this string += string.
Change that to StringBuilder and you should not have problems, at least not for 20k rows.
If you are talking about filling a DataTable object (which loads the results of your calls into memory before processing), you will likely be better off using a datareader for each of the mentioned providers so then you can process each row as it is read from the database instead of storing the DataTable in memory...
A great answer to another question lists the pro/cons of datareaders/datatables
If you're already using datareaders- ignore this. But your memory problem might be from also storing the retrieved results...

DataTable Update Problem

What is the best method for saving thousands of rows and after doing something, updating them.
Currently, I use a datatable, filling it, when done inserting by
MyDataAdapter.Update(MyDataTable)
After doing some change on MyDataTable, I again use MyDataAdapter.Update(MyDataTable) method.
Edit:
I am sorry for not providing more info.
There may be up to 200.000 rows which will be created from an XML file. There rows will be saved to the database. After than there will be some process for each row. And I will need to update each row in database.
Instead of updating row by row, I decided to update the datatable and using the same dataadapter to update the rows.
This is the best of me.
I think that there may be a smarter approach.
In Reacting to your comments:
An DataAdapter.Update() will Udate (and Insert/Delete) row by row. If you have individual changes there really is no faster way. If you have systematic changes, like SET Price = Price+ 2 WHERE SelByDate < '1/1/2010' you are better of by running a DbCommand against the database.
But maybe you should worry about transactions and error handling before performance.
If I understand correctly you are doing two separate operations: loading rows to a database, and then updating those rows.
If the rows you are inserting come from another ADO.NET supported datasource then you can use SqlBulkCopy to insert the rows in batches, which will be more efficient than using a datatable.
Once the rows are in the database I would assume you would be better off executing a SQLCommand to modify their values.
If you can provide more details about what--and why--you're asking the question then perhaps we can better tailor an answer for it.

Minimise database updates from changes in DataTable/SqlDataAdapter

My goal is to maximise performance. The basics of the scenario are:
I read some data from SQL Server 2005 into a DataTable (1000 records x 10 columns)
I do some processing in .NET of the data, all records have at least 1 field changed in the DataTable, but potentially all 10 fields could be changed
I also add some new records in to the DataTable
I do a SqlDataAdapter.Update(myDataTable.GetChanges()) to persist the updates (an inserts) back to the db using a InsertCommand and UpdateCommand I defined at the start
Assume table being updated contains 10s of millions of records
This is fine. However, if a row has changed in the DataTable then ALL columns for that record are updated in the database even if only 1 out of 9 columns has actually changed value. This means unnecessary work, particularly if indexes are involved. I don't believe SQL Server optimises this scenario?
I think, if I was able to only update the columns that had actually changed for any given record, that I should see a noticeable performance improvement (esp. as cumulatively I will be dealing with millions of rows).
I found this article: http://netcode.ru/dotnet/?lang=&katID=30&skatID=253&artID=6635
But don't like the idea of doing multiple UPDATEs within the sproc.
Short of creating individual UPDATE statements for each changed DataRow and then firing them in somehow in a batch, I'm looking for other people's experiences/suggestions.
(Please assume I can't use triggers)
Thanks in advance
Edit: Any way to get SqlDataAdapter to send UPDATE statements specific to each changed DataRow (only to update the actual changed columns in that row) rather than giving a general .UpdateCommand that updates all columns?
Isn't it possible to implement your own IDataAdapter where you implement this functionality ?
Offcourse, the DataAdapter only fires the correct SqlCommand, which is determined by the RowState of each DataRow.
So, this means that you would have to generate the SQL command that has to be executed for each situation ...
But, I wonder if it is worth the effort. How much performance will you gain ?
I think that - if it is really necessary - I would disable all my indexes and constraints, do the update using the regular SqlDataAdapter, and afterwards enable the indexes and constraints.
you might try is do create an XML of your changed dataset, pass it as a parameter ot a sproc and the do a single update by using sql nodes() function to translate the xml into a tabular form.
you should never try to update a clustered index. if you do it's time to rethink your db schema.
I would VERY much suggest that you do this with a stored procedure.
Lets say that you have 10 million records you have to update. And lets say that each record has 100 bytes (for 10 columns this could be too small, but lets be conservative). This amounts to cca 100 MB of data that must be transferred from database (network traffic), stored in memory and than returned to database in form of UPDATE or INSERT that are much more verbose for transfer to database.
I expect that SP would perform much better.
Than again you could divide you work into smaller SP (that are called from main SP) that would update just the necessary fields and that way gain additional performance.
Disabling indexes/constraints is also an option.
EDIT:
Another thing you must consider is potential number of different update statements. In case of 10 fields per row any field could stay the same or change. So if you construct your UPDATE statement to reflect this you could potentially get 10^2 = 1024 different UPDATE statements and any of those must be parsed by SQL Server, execution plan calculated and parsed statement stored in some area. There is a price to do this.

Categories

Resources