I have a SharePoint list on a site that I want to update nightly from a SQL Server DB, preferably using C#. Here is the catch: I do not know whether any records were removed or added, or whether any field in any record has been updated. The simplest thing to do, then, would be to remove the data from the list and replace it with the new data. But is there any simple way to do this? I would hate to remove 3000+ items from the list one by one and then add the 3000+ records one at a time.
It depends on your environment. If you don't have much load on the systems at night, I would prefer one of the following approaches:
1) Build a timer job: delete the list (not the items one by one, because that is slow), recreate it, and import the items from the DB. When we are talking about 3,000-5,000 elements, that is not that much, and I think it would be done in under 10 minutes.
2) Loop through the SharePoint list items and check field by field whether each one was updated in the DB; if so, update it.
I would prefer to delete the list and import the complete table, because we are talking about not that much data.
Another way, which is a good idea, is to use BCS or BDC. Then you would always have the data in place and synced with the DB. Look at
https://msdn.microsoft.com/en-us/library/office/jj163782.aspx
https://msdn.microsoft.com/de-de/library/ee231515(v=vs.110).aspx
Unfortunately, there is no "easy" or elegant way to delete all the items in a list, like a DELETE statement in SQL. You can either delete the entire list and recreate it, if the list can easily be created from a list definition, or, if your concern is performance, use ProcessBatchData: since SP 2007 the SPWeb class has had this method, which lets you batch commands to avoid the performance penalty of issuing 6000 separate commands to the server. However, it still requires you to pass an ugly XML string that lists all the items to be deleted or added.
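For illustration, a rough sketch of a batch delete with ProcessBatchData could look like the code below (server-side object model, so it needs the Microsoft.SharePoint and System.Text namespaces; the site URL and list name are placeholders, and the Batch XML should be double-checked against your SharePoint version):

using (SPSite site = new SPSite("http://server/site"))
using (SPWeb web = site.OpenWeb())
{
    SPList list = web.Lists["MyList"];
    StringBuilder sb = new StringBuilder();
    sb.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>");
    foreach (SPListItem item in list.Items)
    {
        // one Method element per item, each issuing a Delete command by item ID
        sb.AppendFormat(
            "<Method ID=\"{0}\"><SetList>{1}</SetList>" +
            "<SetVar Name=\"Cmd\">Delete</SetVar>" +
            "<SetVar Name=\"ID\">{0}</SetVar></Method>",
            item.ID, list.ID);
    }
    sb.Append("</Batch>");
    web.ProcessBatchData(sb.ToString());   // one round trip instead of one call per item
}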
The ideal way is to enumerate all the rows from the database and check whether each row already exists in the SharePoint list, using a primary field value. If it already exists, simply update it[1]; otherwise, add a new item.
[1] - Optionally, while updating, compare the list item field values with the database column values and only update if any field has actually changed; otherwise skip the item.
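A rough sketch of that upsert loop, assuming a text field named PrimaryKey on the list that mirrors the database key (field and column names are placeholders):

foreach (DataRow row in sourceTable.Rows)
{
    SPQuery query = new SPQuery
    {
        Query = "<Where><Eq><FieldRef Name='PrimaryKey'/>" +
                "<Value Type='Text'>" + row["Id"] + "</Value></Eq></Where>",
        RowLimit = 1
    };
    SPListItemCollection matches = list.GetItems(query);

    // update the existing item or add a new one
    SPListItem item = matches.Count > 0 ? matches[0] : list.Items.Add();
    item["PrimaryKey"] = row["Id"].ToString();
    item["Title"] = row["Name"].ToString();
    // [1] optionally compare old and new values here and skip Update() when nothing changed
    item.Update();
}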
I am looking at Dapper as an ORM for our next project, but something is not clear to me.
In this question there are answers on how to do inserts, updates and deletes.
Since this question is already a bit older, maybe there are better ways nowadays.
But my biggest concern is how to do an ApplyUpdates on a list.
Suppose you have a List<Customer> that is built as shown here.
And suppose you show this list in a DataGridView.
Now, the user will
alter the data of a few rows,
insert a few new rows,
delete a few rows
And when he clicks the save button, you want to save all these changes in this List<Customer> to your database, using Dapper.
How can I go about that?
If I have to loop through the list and call an insert, update or delete statement for each row, then how can I determine which operation to use? The deleted rows will be gone from the list.
I also want to make sure that if one statement fails, all will be rollbacked.
And I need the primary key for all new rows returned and filled in the DataGridView.
In other words, all that ADO DataAdapter/DataTable does for you.
What is the best way to do this using Dapper?
EDIT
The best way I can think of now is to keep 3 lists in memory and, when the user alters some data, add a row to the update list (and the same for the insert and delete lists), so I can run through these 3 lists on the button click.
But I am hoping there is a better alternative built into Dapper for this kind of situation.
You need to handle this yourself, as Dapper doesn't manage it. There are several approaches:
1) Delete all items and then add them again.
- Easy to implement.
- Bad for DB performance: effectively 2 DB writes per row.
2) Loop through the items and update without checking for changes.
- Not too difficult to implement.
- DB performance better than option 1, but not ideal.
- Adds and deletes are more complex to detect than updates.
3) Loop through the items and update only if there are differences.
- More difficult to implement.
- Requires reading from the DB first to compare values (an extra DB action).
4) Store changes in a separate list.
- Even more difficult to implement, as you need to "wrap" list updates in another class (a first-class collection?) and store the changes.
- Most efficient for the DB, as you execute only the minimum for each item.
In the end, you might select different approaches for different entities depending on how you need to optimise. For example, option 1 is fine if you know you will only have a few entities and not many updates.
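If you go with the separate-lists idea from your edit, a rough sketch with Dapper could look like the code below (SqlConnection/SQL Server syntax and the Customer columns are assumptions; it needs the Dapper and System.Data.SqlClient namespaces):

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        try
        {
            // inserts: read the new identity back so the grid can show it
            foreach (var customer in insertedCustomers)
            {
                customer.Id = connection.ExecuteScalar<int>(
                    "INSERT INTO Customer (Name) VALUES (@Name); " +
                    "SELECT CAST(SCOPE_IDENTITY() AS int);",
                    customer, transaction);
            }

            // Dapper executes the statement once per element of the list
            connection.Execute(
                "UPDATE Customer SET Name = @Name WHERE Id = @Id",
                updatedCustomers, transaction);

            connection.Execute(
                "DELETE FROM Customer WHERE Id = @Id",
                deletedCustomers, transaction);

            transaction.Commit();   // all statements succeed or none do
        }
        catch
        {
            transaction.Rollback();
            throw;
        }
    }
}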
It's a struggle with any application that provides select fields populated from a certain datasource: everything works fine at first, but once the application ages, some older entries might be deleted, leading to the problem that previously saved selections can no longer resolve the entity in question.
Opening a view where a select points to an already deleted data row will (in the best case) show an empty string.
We designed our system in such a way that deletions are not real delete operations, but only set a deleted flag. (So all the information is still there.)
However, when using data binding with C# (or even without), the most obvious use case is still not covered by the general mechanics (I assume):
A select field should show all NOT-deleted entities while creating a new object that references the entity in question.
A select field (populated the very same way) should show the "deleted" entity if it was selected days/months/years ago.
Is there a "handy" solution to this?
Currently we are using a "proxy method" for every datasource, which reloads the data of the deleted entity if it is not in the "available data" collection - but it is hard to believe there is no better way to deal with this, as the problem applies to almost every language out there.
In a normalized database you would have a constraint with ON DELETE NO ACTION/RESTRICT that prevents removal of a referenced element from the list. It would force you to decide what is to be done with the referencing rows.
With your manually controlled deletions this could have been covered by a trigger. As neither of these was implemented, you are left with only one thing to do: updating the dropdown with the selected option before rendering the UI. My approach (in Java, I'm not good at C#):
List<String> options = getNonDeletedWhatever();
if (!options.contains(currentEntity.getWhatever())) {
    options.add(currentEntity.getWhatever()); // This optionally inserts an outdated value
}
or simply:
Set<String> options = getNonDeletedWhatever();
options.add(currentEntity.getWhatever()); // This optionally inserts an outdated value
I solve it by creating a list of available (non-deleted) items, and if the selected item is a deleted one, I add that item to the list.
This list becomes the data source for my dropdown.
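In C# the same idea might look roughly like this (GetNonDeletedWhatever, currentEntity and dropDown are placeholders mirroring the Java snippet):

List<string> options = GetNonDeletedWhatever();
if (!options.Contains(currentEntity.Whatever))
{
    options.Add(currentEntity.Whatever);   // optionally inserts an outdated value
}
dropDown.DataSource = options;             // bind the patched list to the select field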
My problem is the following: I have a list of items and want to index them with Elasticsearch. I have a running Elasticsearch instance, and this instance has an index called "default".
So I'm running the following code:
var items = GetAListOfItem();
var response = Client.IndexMany(items);
I also tried it with Client.IndexManyAsync(items), but that didn't do anything.
Only 1 item of the list gets indexed, nothing more. I think it is the last item that gets indexed.
I thought it could be an issue with IEnumerable and multiple enumeration, but I passed it as a List<Item>.
Another question is about best practice with Elasticsearch: is it common to use an index per model? So if I'm gathering data from, for example, Exchange and another system, would I create 2 indices?
ExchangeIndex
OtherSystemIndex
Thank you for your help.
Update: I saw that my Client.Index makes all those calls successfully, but all those objects got the same ID from NEST. Normally it should increment by itself, shouldn't it?
Update 2: I fixed the indexing problem. I had set an empty ID field.
But I still have the question about best practice with Elasticsearch.
If you upload all the data with the same ID, it will not increment the ID; it will update the record with that ID, and you will end up with only one record. So upload the data without an ID, or give each record a unique ID so it can be identified.
The other common problem is that your records do not match the mapping that you defined for the index.
About the other question: in an index you store the information that is relevant to you, even if it contains content from many models. The only thing you have to avoid is mixing unrelated information; if you have an index for server logs, don't mix it with user activities, for example.
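For completeness, a rough sketch of bulk indexing with distinct IDs and one index per source system could look like this (NEST; the Item class, its Id property and the index names are assumptions):

var items = GetAListOfItem();   // each item should carry its own unique Id

// one index per source system, e.g. Exchange data goes into its own index
var response = Client.IndexMany(items, "exchangeindex");

if (!response.IsValid)
{
    foreach (var itemWithError in response.ItemsWithErrors)
    {
        // shows which documents were rejected and why
        Console.WriteLine("Failed to index {0}: {1}", itemWithError.Id, itemWithError.Error);
    }
}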
I am using C# with .NET 4.5. I am making a scraper which collects specific data. Each time a value is scraped, I need to make sure it hasn't already been added to the SQLite db.
To do this, each time a value is scraped I query the DB to check whether it already contains the value, and if not, I make another call to insert the value into the DB.
Since I am scraping multiple values per second, this gets to be very IO-intensive, with constant calls to the db.
My question is, is there any better way to do this? Perhaps I could queue the values scraped and then run a batch query at once? Is that possible?
I see three approaches:
1) Use INSERT OR IGNORE, which will reject an entry if it is already present (based on the primary key and unique fields). Or use a plain INSERT (or its equivalent, INSERT OR ABORT), which will return SQLITE_CONSTRAINT, a value you will have to catch and handle if you want to count failed insertions.
2) Accumulate the updates you want to make outside the database. When you have accumulated enough (or all of them), start a transaction (BEGIN;), do your insertions (you can use INSERT OR IGNORE here as well), and commit the transaction (COMMIT;).
3) You could pre-fetch a list of the items you already have and check against that list, if your data model allows it.
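A rough sketch combining the first two approaches, assuming System.Data.SQLite and a UNIQUE constraint on the Value column (table and column names are placeholders):

// accumulate scraped values in memory, then flush them in one transaction
using (var connection = new SQLiteConnection("Data Source=scrape.db"))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    using (var command = connection.CreateCommand())
    {
        command.Transaction = transaction;
        command.CommandText = "INSERT OR IGNORE INTO ScrapedValues (Value) VALUES (@value)";
        var parameter = command.Parameters.Add("@value", DbType.String);

        foreach (string value in queuedValues)
        {
            parameter.Value = value;
            command.ExecuteNonQuery();   // duplicates are silently skipped
        }

        transaction.Commit();            // one disk sync for the whole batch
    }
}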
I'm deleting data from a database that is about 1.8 GB big (through a C# app).
The same operation on smaller databases (~600 MB) runs without problems, but on the big one I'm getting:
Lock wait timeout exceeded; try restarting transaction.
Will increasing innodb_lock_wait_timeout fix the problem, or is there another way?
I don't think that optimizing the queries is a solution, because there is no way to make them simpler.
I'm deleting parts of the data based on conditions and relations, not all the data.
You can split the delete statement into smaller parts that won't time out.
For example, delete the rows with IDs 1 to 10,000, execute and commit, then do the same for IDs 10,000 to 20,000, and so on.
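A rough sketch of such a chunked delete from C# (MySql.Data/MySqlConnector; the table name, WHERE condition and chunk size are placeholders for your actual conditions and relations):

using (var connection = new MySqlConnection(connectionString))
{
    connection.Open();
    int affected;
    do
    {
        using (var command = new MySqlCommand(
            "DELETE FROM log_entries WHERE created_at < @cutoff LIMIT 10000",
            connection))
        {
            command.Parameters.AddWithValue("@cutoff", cutoffDate);
            affected = command.ExecuteNonQuery();   // each chunk commits on its own
        }
    } while (affected > 0);                          // repeat until no matching rows remain
}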
You mentioned that you were '...deleting parts of the data based on some conditions and relations, not all the data'. I would check that there are appropriate indexes on all the keys you are using to filter the data to delete.
If you were to show us your schema and where clause we could suggest ones that may help.
You should also consider splitting your delete into multiple batches of smaller numbers of rows.
Another alternative would be to do a SELECT INTO another table with only the data you want to keep, drop the original, then rename the new table.
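Since the error message looks like MySQL/InnoDB, the MySQL flavour of that keep-and-swap idea could look roughly like this (table names and the keep-condition are placeholders; note that CREATE TABLE ... AS SELECT does not copy indexes or constraints, so recreate them before swapping):

using (var connection = new MySqlConnection(connectionString))
{
    connection.Open();
    string[] statements =
    {
        "CREATE TABLE big_table_new AS SELECT * FROM big_table WHERE keep_flag = 1",
        "RENAME TABLE big_table TO big_table_old, big_table_new TO big_table",   // atomic swap
        "DROP TABLE big_table_old"
    };
    foreach (string sql in statements)
    {
        using (var command = new MySqlCommand(sql, connection))
        {
            command.ExecuteNonQuery();
        }
    }
}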
Right-click the table --> Script Table As --> Create To --> New Query... and save the script.
Right-click the table --> Delete.
Refresh your database and IntelliSense so the table is forgotten, then run the script, which will recreate the table. That's how you get an empty table.
Or you can simply increase the setting for innodb_lock_wait_timeout (or table_lock_wait_timeout, not sure which) if you don't want to delete all the info in the table.
If you're deleting all the rows in the table, use
TRUNCATE TABLE tablename
The DELETE command logs every row in the transaction log as it runs, whereas TRUNCATE deallocates the data pages with minimal logging.