How can I handle these two scenarios?
Currently I am doing something like this:
public class Repository
{
    private LinqtoSqlContext dbcontext = new LinqtoSqlContext();

    public void Update()
    {
        // find record
        // update record
        // save record ( dbcontext.SubmitChanges() )
    }

    public void Insert()
    {
        // make a database table object ( ie ProductTable t = new ProductTable() { ProductName = "something" } )
        // insert record ( dbcontext.ProductTable.InsertOnSubmit(t) )
        // dbcontext.SubmitChanges();
    }
}
So now I am trying to load an XML file that has tons of records. First I validate the records one at a time. Then I want to insert them into the database, but instead of calling SubmitChanges() after each record I want to do a mass submit at the end.
So I have something like this:
public class Repository
{
    private LinqtoSqlContext dbcontext = new LinqtoSqlContext();

    public void Update()
    {
        // find record
        // update record
    }

    public void Insert()
    {
        // make a database table object ( ie ProductTable t = new ProductTable() { ProductName = "something" } )
        // insert record ( dbcontext.ProductTable.InsertOnSubmit(t) )
    }

    public void SaveToDb()
    {
        dbcontext.SubmitChanges();
    }
}
Then in my service layer I would do something like:
for (int i = 0; i < 100; i++)
{
    bool valid = Validate();
    if (valid)
    {
        Update();
        Insert();
    }
}
SaveToDb();
So pretend my for loop has a count of all the records found in the XML file. I first validate each record. If it is valid, I have to update a table before I insert the record, and then I insert the record.
After that I want to save everything in one go.
I am not sure if I can do a mass save when updating, or if that has to happen every time, or what.
But I thought it would work for sure for the insert case.
Nothing seems to crash, and I am not sure how to check whether the records are being added to the dbcontext.
The simple answer is: you do not. Linq2Sql is a lot of things - it is not a replacement for bulk upload / bulk copy. You will be a LOT more efficient using the ETL route:
Generate a flat file (csv etc.) with the new data
Load it into the database using bulk load mechanisms
If the data is updating existing rows etc., load it into temporary tables and use the MERGE command to merge it into the main table.
Linq2Sql will by design always suck in mass insert scenarios. ORMs just are not ETL tools.
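A rough sketch of that route (the table, column, and staging-table names here are made up for illustration): bulk copy the parsed XML rows into a staging table, then MERGE them into the main table:

using System.Data;
using System.Data.SqlClient;

// Assumes a staging table dbo.ProductStaging with the same columns as dbo.Product.
static void BulkLoadAndMerge(DataTable parsedRows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // 1. Bulk copy the rows parsed from the XML file into the staging table.
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.ProductStaging";
            bulkCopy.WriteToServer(parsedRows);
        }

        // 2. Merge the staging rows into the main table (insert new, update existing).
        const string mergeSql = @"
            MERGE dbo.Product AS target
            USING dbo.ProductStaging AS source
                ON target.ProductName = source.ProductName
            WHEN MATCHED THEN
                UPDATE SET target.Price = source.Price
            WHEN NOT MATCHED THEN
                INSERT (ProductName, Price) VALUES (source.ProductName, source.Price);";

        using (var command = new SqlCommand(mergeSql, connection))
        {
            command.ExecuteNonQuery();
        }
    }
}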
Linq2SQL (as has been noted) does not handle this well by default, but luckily there are some solutions out there. Here's one I used for a website when I wanted to do some bulk deletes. It worked well for me, and due to its use of extension methods it was basically indistinguishable from regular Linq2SQL methods.
I haven't really "released" this project yet, but it's a T4-based repository system that extends Linq To SQL and implements a bunch of batch operations (delete, update, create csv, etc.): http://code.google.com/p/grim-repo/. You can check out the source code and implement it however you see fit.
Also, this link has some great source code for batch operations: http://www.aneyfamily.com/terryandann/post/2008/04/Batch-Updates-and-Deletes-with-LINQ-to-SQL.aspx
And, also, I know it's tempting, but don't crap on the elderly. Try performing batch operations with DataAdapters/ADO.net: http://davidhayden.com/blog/dave/archive/2006/01/05/2665.aspx. It's faster, but inevitably hairier.
Finally, if you have an XML file, you can create a stored procedure that takes advantage of SQL Server's built-in sproc, sp_xml_preparedocument. Check out how to use it here: http://msdn.microsoft.com/en-us/library/ms187367.aspx
Even when you add multiple records to the DataContext before calling SubmitChanges, LINQ2SQL will loop through and insert them one by one. You can verify this by implementing one of the partial methods on an entity class ("InsertMyObject(MyObject instance)"). It will be called for each pending row individually.
I don't see anything wrong with your plan -- you say it works, but you just don't know how to verify it? Can't you simply look in the database to check if the records got added?
Another way to see which records are pending in the DataContext but have not yet been submitted is to call GetChangeSet() on the data context and then look at the Inserts property of the returned object to get a list of rows that will be inserted when SubmitChanges is called.
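For example, something like this just before committing (assuming your repository exposes the DataContext, or you put the check inside SaveToDb):

// Inspect what LINQ to SQL is about to do before committing.
var pending = dbcontext.GetChangeSet();
Console.WriteLine("Pending inserts: " + pending.Inserts.Count);
Console.WriteLine("Pending updates: " + pending.Updates.Count);

// If the counts look right, commit everything in one call.
dbcontext.SubmitChanges();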
Related
Just a bit of an outline of what I am trying to accomplish.
We keep a local copy of a remote database (3rd party) within our application. To download the information we use an API.
We currently download the information on a schedule, which then either inserts new records into the local database or updates the existing records.
Here is how it currently works:
public void ProcessApiData(List<Account> apiData)
{
    // get the existing accounts from the local database
    List<Account> existingAccounts = _accountRepository.GetAllList();

    foreach (var account in apiData)
    {
        // check if it already exists in the local database
        var existingAccount = existingAccounts.SingleOrDefault(a => a.AccountId == account.AccountId);

        // if it's null then it's a new record
        if (existingAccount == null)
        {
            _accountRepository.Insert(account);
            continue;
        }

        // otherwise it's an existing record, so it needs updating
        existingAccount.AccountName = account.AccountName;
        // ... continue updating the rest of the properties
    }

    CurrentUnitOfWork.SaveChanges();
}
This works fine; however, it feels like it could be improved.
There is one of these methods per entity, and they all do the same thing (just updating different properties, or inserting a different entity). Would there be any way to make this more generic?
It also seems like a lot of database calls; would there be any way to do this in "bulk"? I've had a look at this package, which I have seen mentioned in a few other posts: https://github.com/loresoft/EntityFramework.Extended
But it seems to focus on bulk updating a single property with the same value, as far as I can tell.
Any suggestions on how I can improve this would be brilliant. I'm still fairly new to C#, so I'm still searching for the best way to do things.
I'm using .NET 4.5.2 and Entity Framework 6.1.3 with MSSQL 2014 as the backend database.
For EFCore you can use this library:
https://github.com/borisdj/EFCore.BulkExtensions
Note: I'm the author of this one.
And for EF 6 this one:
https://github.com/TomaszMierzejowski/EntityFramework.BulkExtensions
Both extend DbContext with bulk operations and have the same call syntax:
context.BulkInsert(entitiesList);
context.BulkUpdate(entitiesList);
context.BulkDelete(entitiesList);
The EFCore version additionally has a BulkInsertOrUpdate method.
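If you went with that, the insert-or-update flow from the question could look roughly like this (assuming AccountId is the configured primary key and context is your DbContext):

// Inserts new accounts and updates existing ones in one bulk operation,
// matching rows on the primary key.
context.BulkInsertOrUpdate(apiData);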
Assuming that the classes in apiData are the same as your entities, you should be able to use Attach(newAccount, originalAccount) to update an existing entity.
For bulk inserts I use AddRange(listOfNewEntities). If you have a lot of entities to insert, it is advisable to batch them. Also, you may want to dispose and recreate the DbContext on each batch so that it's not using too much memory.
var accounts = new List<Account>();
var context = new YourDbContext();
context.Configuration.AutoDetectChangesEnabled = false;

foreach (var account in apiData)
{
    accounts.Add(account);

    // Play with this batch size to see what works best
    if (accounts.Count % 1000 == 0)
    {
        context.Set<Account>().AddRange(accounts);
        accounts = new List<Account>();
        context.ChangeTracker.DetectChanges();
        context.SaveChanges();

        // Dispose and recreate the context so it doesn't accumulate tracked entities
        context.Dispose();
        context = new YourDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;
    }
}

// Save whatever is left over from the last partial batch
context.Set<Account>().AddRange(accounts);
context.ChangeTracker.DetectChanges();
context.SaveChanges();
context.Dispose();
For bulk updates, there's nothing built into LINQ to SQL. There are, however, libraries and solutions to address this. See e.g. here for a solution using expression trees.
List vs. Dictionary
You search the list every time to check whether the entity exists, which is slow. You should create a dictionary instead to improve performance.

var existingAccounts = _accountRepository.GetAllList().ToDictionary(x => x.AccountId);

Account existingAccount;
if (existingAccounts.TryGetValue(account.AccountId, out existingAccount))
{
    // ...code....
}
Add vs. AddRange
You should be aware of Add vs. AddRange performance when you add multiple records.
Add: DetectChanges is called after every record is added
AddRange: DetectChanges is called once after all records are added
So at 10,000 entities, the Add method can take around 875x more time simply to add the entities to the context.
To fix it:
CREATE a list
ADD entity to the list
USE AddRange with the list
SaveChanges
Done!
In your case, you will need to add an InsertRange method to your repository.
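A minimal sketch of what that repository method might look like, assuming the repository wraps an EF DbContext (the class and property names here are illustrative):

using System.Collections.Generic;

public class AccountRepository
{
    private readonly MyDbContext _context; // hypothetical DbContext exposing DbSet<Account> Accounts

    public AccountRepository(MyDbContext context)
    {
        _context = context;
    }

    // AddRange triggers DetectChanges once for the whole batch instead of once per entity.
    public void InsertRange(IEnumerable<Account> accounts)
    {
        _context.Accounts.AddRange(accounts);
    }
}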
EF Extended
You are right. This library updates all data with the same value. That is not what you are looking for.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library may be a perfect fit for your scenario if you want to improve your performance dramatically.
You can easily perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
Example:
public void ProcessApiData(List<Account> apiData)
{
    // Insert or update using the primary key (AccountID)
    CurrentUnitOfWork.BulkMerge(apiData);
}
I am building a class library for Windows Phone. As the title suggests, all I want to do is to update a record in a SQL Server CE table. All the methods I came across involve fetching the record from database, manually updating the values and then flushing changes back to the database.
The issue here is that I'm building a library, so I don't know beforehand which columns are going to be there. So manually updating values is out of the question. I want something simpler, like an UpdateOnSubmit method which could be used as follows:
Contacts.UpdateOnSubmit(contactToUpdate);
Where Contacts is a table. Is this somehow possible? Apparently SQLite for Windows Phone, Windows Azure, etc. have update methods, and their style is pretty close to LINQ. How do I update a record without manually modifying values?
Update 1: I just ran the following code for testing. It didn't work either!
// Get updated items from local to put them in the cloud
var newLocalItems = (from TEntity p in localTable
                     where p.LastModified > currentTimeStamp
                     select p).ToList();

foreach (var localItem in newLocalItems)
{
    if (localItem.RemoteId == 0)
    {
        localItem.LastSynchronized = DateTime.Now;
        connection.SubmitChanges();
    }
}
Even after modifying the value manually, the rows in the database are unmodified. Does the update work only when a single item is fetched?
Here's how I get the connection:
public async void ConfigureSQLCE<T>(T mainDataContext) where T : DataContextBase
{
    MainDB = mainDataContext; // MainDB is a class-level dynamic
}
And here's the GetConnection method:
private async Task<DataContextBase> GetConnection()
{
    if (!MainDB.DatabaseExists())
    {
        MainDB.CreateDatabase();
    }
    return MainDB;
}
Then I obtain the connection with var connection = await GetConnection();
The table mentioned above doesn't exist in the DataContextBase class; rather, it exists in a class derived from it. How do I go about this situation now?
Update 2: Manually setting a property like localItem.LastSynchronized = DateTime.Now magically started to work after I implemented INotifyPropertyChanged in the model.
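For anyone hitting the same thing: LINQ to SQL's change tracking relies on the entity raising change notifications, which is why the manual property set went unnoticed before. A minimal sketch of such a property (the entity and property names here are illustrative):

using System;
using System.ComponentModel;

public partial class SyncItem : INotifyPropertyChanging, INotifyPropertyChanged
{
    private DateTime _lastSynchronized;

    public event PropertyChangingEventHandler PropertyChanging;
    public event PropertyChangedEventHandler PropertyChanged;

    public DateTime LastSynchronized
    {
        get { return _lastSynchronized; }
        set
        {
            if (_lastSynchronized != value)
            {
                // Lets the DataContext snapshot the old value before it changes.
                if (PropertyChanging != null)
                    PropertyChanging(this, new PropertyChangingEventArgs("LastSynchronized"));

                _lastSynchronized = value;

                // Tells the DataContext the row is now dirty and needs an UPDATE.
                if (PropertyChanged != null)
                    PropertyChanged(this, new PropertyChangedEventArgs("LastSynchronized"));
            }
        }
    }
}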
Still, the original question stands: how do I update an item?
I am working on a project which involves fetching sales data from a website and storing it in a DB. I am using LINQ to SQL. Is there a way to update the value in the database only if there is a change in the new data I fetch?
This is what I tried:
foreach (var SalesResult in oDailySalesResult)
{
    if (SalesResult.DailySalesResultsID == 0)
    {
        Dc.DailySalesResults.InsertOnSubmit(SalesResult);
        Dc.SubmitChanges();
    }
    else
    {
        Dc.DailySalesResults.Attach(SalesResult);
        Dc.Refresh(RefreshMode.KeepCurrentValues, SalesResult);
        Dc.SubmitChanges();
    }
}
But this just updates the record even though it has the same data that is already in the database. Or do you have any other solution for this? Thanks in advance.
If you use Attach with one parameter, it will assume the entity is dirty and needs updating. Because: how can it know otherwise?
There are two other forms of Attach (both sketched below):
one takes a bool to indicate clean vs. dirty: if you attach it as clean, changes made after the attach will be tracked and committed appropriately
one takes two instances, representing the old values and the new values: here again, the differences will be computed and subsequent changes tracked appropriately
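A rough sketch of both overloads, reusing the DataContext from the question (the property name and the "original" snapshot are illustrative):

// Overload 1: attach as clean (asModified = false); only changes made
// after the attach are tracked and written on SubmitChanges.
Dc.DailySalesResults.Attach(salesResult, false);
salesResult.TotalSales = newTotal; // illustrative property; this change will be committed

// Overload 2: attach with an original snapshot; LINQ to SQL diffs the two
// instances and only updates the columns that actually differ.
Dc.DailySalesResults.Attach(updatedSalesResult, originalSalesResult);

Dc.SubmitChanges();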
In my seed data file, I'm first deleting all entries in a particular entity and then adding new ones. However, I feel like the code for deleting data can be improved (or cleaned up).
Currently I'm doing it like this:
var oldCertCat = context.CertCategoryValues.ToList();
oldCertCat.ForEach(cat => context.CertCategoryValues.Remove(cat));
Next entity:
var oldCertLevel = context.CertLevelValues.ToList();
oldCertLevel.ForEach(certLevel => context.CertLevelValues.Remove(certLevel));
I'm thinking of creating a helper function like:
void DeleteData("EntityName")
{
    var oldData = context."EntityName".ToList();
    oldData.ForEach(item => context."EntityName".Remove(item));
}
It would be cleaner this way. Any suggestions?
Deleting a lot of data with EF is very inefficient. It first loads all the entities from the database and then deletes them one by one.
Use raw SQL instead:
void DeleteData(string tableName)
{
    // Table names cannot be passed as SQL parameters, so tableName must come from trusted code.
    context.Database.ExecuteSqlCommand(string.Format("DELETE FROM [{0}]", tableName));
}
It will make just one call to the database and let the DB do the entire deletion in one step, which is much more efficient. If the tables involved are large, it might be worth using TRUNCATE TABLE instead to gain some performance.
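If you go the TRUNCATE TABLE route, the call is similar (note that TRUNCATE requires elevated permissions and fails on tables referenced by foreign keys):

void TruncateData(string tableName)
{
    // As above, tableName must come from trusted code, not user input.
    context.Database.ExecuteSqlCommand(string.Format("TRUNCATE TABLE [{0}]", tableName));
}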
I would suggest using SQL here. The way you are doing it is data intensive, since EF loads each of those entities just to delete them. Very expensive.
Just write a SQL "DELETE FROM CertLevelValues" or truncate the table.
Significantly faster, especially with a large data set.
You can do it with a generic:
void DeleteData<T>() where T : class
{
    var oldData = context.Set<T>().ToList();
    oldData.ForEach(item => context.Set<T>().Remove(item));
    context.SaveChanges();
}
You can then call it like:
DeleteData<User>()
if you want to delete all users.
Is there any way to insert multiple records without a loop if you already have a collection? Granted, there is little performance benefit if you commit outside the loop.
I have a context and an ObjectSet<Type>. I also have a collection of Type that I want to swap in, concatenate, intersect, or what have you with what's in the table. In other words, I don't want to proceed as below:
foreach (Type r in Collection)
{
    Context.ObjectSet.Add(r);
}
Context.SaveChanges();
You must always use a loop. There cannot be any performance benefit in methods like AddCollection or AddRange, because such methods usually rely on an optimization where the internal collection is grown once for the whole range and the items are copied in, instead of growing the collection on each Add call. AddObject does much more than pass data to some internal collection, so it still has to process entities one by one.
If you want to optimize the performance of the database inserts themselves, you must move to another solution, because EF doesn't have any batch or bulk data modifications. Each record is sent as a single insert in a separate roundtrip to the database.
You could try creating an extension method to do the looping and adding; see below.
public static class AddHelper
{
    public static void AddAll<T>(this ObjectSet<T> objectSet, IEnumerable<T> items) where T : class
    {
        foreach (var item in items)
        {
            objectSet.AddObject(item);
        }
    }
}
Then you could use the code below.
IEnumerable<EntityName> list = GetEntities(); //this would typically return your list
Context.ObjectSet.AddAll(list);
I'm looking for the same type of solution.
I have yet to find a way to do it directly with EF, but there is a way to do it. If you create a stored procedure that accepts a table-valued parameter, you can execute the proc from your entity context and pass a DataTable as the parameter.
When you're dealing with thousands or millions of records, this is a lot better than looping through each record to pass it to the DB.
Passing Table Valued Parameters to SQL Server Stored Procedures.
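A rough sketch of that approach with plain ADO.NET (the table type, stored procedure, and parameter names are made up for illustration):

using System.Data;
using System.Data.SqlClient;

// Assumes a user-defined table type dbo.ProductTableType and a stored procedure
// dbo.InsertProducts(@Products dbo.ProductTableType READONLY) already exist in the database.
static void BulkInsertViaTvp(DataTable products, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.InsertProducts", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        // Pass the whole DataTable as a single table-valued parameter.
        SqlParameter tvp = command.Parameters.AddWithValue("@Products", products);
        tvp.SqlDbType = SqlDbType.Structured;
        tvp.TypeName = "dbo.ProductTableType";

        connection.Open();
        command.ExecuteNonQuery(); // one round trip for the entire collection
    }
}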