Helper function to remove/delete all entity data, EF - C#

In my seed data file, I'm first deleting all entries in a particular entity and then adding new ones. However, I feel like the code for deleting data can be improved (or cleaned up).
Currently I'm doing it like this:
var oldCertCat = context.CertCategoryValues.ToList();
oldCertCat.ForEach(cat => context.CertCategoryValues.Remove(cat));
Next entity:
var oldCertLevel = context.CertLevelValues.ToList();
oldCertLevel.ForEach(certLevel => context.CertLevelValues.Remove(certLevel));
I'm thinking of creating a helper function like:
void DeleteData("EntityName")
{
    var oldData = context."EntityName".ToList();
    oldData.ForEach(item => context."EntityName".Remove(item));
}
It would be cleaner this way. Any suggestions?

Deleting a lot of data with EF is very inefficient. It first loads all the entities from the database and then deletes them one by one.
Use raw SQL instead:
void DeleteData(string tableName)
{
    // Table names cannot be passed as SQL parameters (a {0} placeholder would be
    // sent as a quoted string literal), so build the statement directly and only
    // call this with trusted table names.
    context.Database.ExecuteSqlCommand("DELETE FROM [" + tableName + "]");
}
It will make just one call to the database and let the DB do the entire deletion in one step, which is much more efficient. If the tables involved are large, it might be worth using TRUNCATE TABLE instead to gain some performance.
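A minimal sketch of the TRUNCATE variant (note that TRUNCATE TABLE requires ALTER permission and fails if the table is referenced by a foreign key):
void TruncateData(string tableName)
{
    // TRUNCATE deallocates whole data pages instead of logging each row delete;
    // as above, identifiers cannot be parameterized, so only pass trusted names
    context.Database.ExecuteSqlCommand("TRUNCATE TABLE [" + tableName + "]");
}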

I would suggest using SQL here. The way you are doing it is data intensive, since EF loads each of those entities just to delete them. Very expensive.
Just write a SQL "delete from CertLevelValues" or truncate the table.
Significantly faster, especially with a large data set.

You can do it with a generic:
void DeleteData<T>() where T : class
{
    var oldData = context.Set<T>().ToList();
    oldData.ForEach(item => context.Set<T>().Remove(item));
    context.SaveChanges();
}
You can then call it like:
DeleteData<User>();
if you want to delete all users.
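On EF6 you can also let RemoveRange replace the per-entity Remove calls; a sketch (note it still loads the entities first, so the raw SQL approach above remains much faster for large tables):
void DeleteData<T>() where T : class
{
    var set = context.Set<T>();
    // One RemoveRange call marks the whole batch as Deleted,
    // instead of one Remove (and one DetectChanges pass) per entity
    set.RemoveRange(set.ToList());
    context.SaveChanges();
}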

Related

Entity Framework update/insert multiple entities

Just a bit of an outline of what I am trying to accomplish.
We keep a local copy of a remote (3rd party) database within our application. To download the information we use an API.
We currently download the information on a schedule, which then either inserts new records into the local database or updates the existing records.
Here is how it currently works:
public void ProcessApiData(List<Account> apiData)
{
    // get the existing accounts from the local database
    List<Account> existingAccounts = _accountRepository.GetAllList();

    foreach (var account in apiData)
    {
        // check if it already exists in the local database
        var existingAccount = existingAccounts.SingleOrDefault(a => a.AccountId == account.AccountId);

        // if it's null then it's a new record
        if (existingAccount == null)
        {
            _accountRepository.Insert(account);
            continue;
        }

        // otherwise it's an existing record, so it needs updating
        existingAccount.AccountName = account.AccountName;
        // ... continue updating the rest of the properties
    }

    CurrentUnitOfWork.SaveChanges();
}
This works fine, however it feels like it could be improved.
There is one of these methods per entity, and they all do the same thing (just updating different properties, or inserting a different entity). Would there be any way to make this more generic?
It also seems like a lot of database calls; would there be any way to do this in bulk? I've had a look at this package, which I have seen mentioned on a few other posts: https://github.com/loresoft/EntityFramework.Extended
But it seems to focus on bulk updating a single property with the same value, as far as I can tell.
Any suggestions on how I can improve this would be brilliant. I'm still fairly new to C#, so I'm still searching for the best way to do things.
I'm using .NET 4.5.2 and Entity Framework 6.1.3 with MSSQL 2014 as the backend database.
For EF Core you can use this library:
https://github.com/borisdj/EFCore.BulkExtensions
Note: I'm the author of this one.
And for EF 6 this one:
https://github.com/TomaszMierzejowski/EntityFramework.BulkExtensions
Both extend DbContext with bulk operations and have the same calling syntax:
context.BulkInsert(entitiesList);
context.BulkUpdate(entitiesList);
context.BulkDelete(entitiesList);
The EF Core version additionally has a BulkInsertOrUpdate method.
Assuming that the classes in apiData are the same as your entities, you should be able to attach the incoming entity and mark it as Modified to update an existing row, rather than copying properties across one by one.
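A sketch of that update path in EF6, assuming AccountId is the primary key and the incoming object carries values for every mapped column:
// Attach the detached entity, then flag it so EF issues an UPDATE
context.Set<Account>().Attach(account);
context.Entry(account).State = EntityState.Modified; // all properties included in the UPDATE
context.SaveChanges();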
For bulk inserts I use AddRange(listOfNewEntities). If you have a lot of entities to insert, it is advisable to batch them. Also, you may want to dispose and recreate the DbContext on each batch so that it's not using too much memory.
var accounts = new List<Account>();
var context = new YourDbContext();
context.Configuration.AutoDetectChangesEnabled = false;

foreach (var account in apiData)
{
    accounts.Add(account);

    if (accounts.Count % 1000 == 0) // play with the batch size to see what works best
    {
        context.Set<Account>().AddRange(accounts);
        accounts = new List<Account>();
        context.ChangeTracker.DetectChanges();
        context.SaveChanges();
        context.Dispose();

        // Recreate the context so the change tracker doesn't keep growing;
        // change detection must be disabled again on the new instance
        context = new YourDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;
    }
}

context.Set<Account>().AddRange(accounts);
context.ChangeTracker.DetectChanges();
context.SaveChanges();
context.Dispose();
For bulk updates, there's nothing built in. There are, however, libraries and solutions to address this; see e.g. here for a solution using expression trees.
List vs. Dictionary
You check the list every time to see whether the entity exists, which is an O(n) scan per record. Build a dictionary instead to improve performance:
var existingAccounts = _accountRepository.GetAllList().ToDictionary(x => x.AccountId);

Account existingAccount;
if (existingAccounts.TryGetValue(account.AccountId, out existingAccount))
{
    // ...code....
}
Add vs. AddRange
You should be aware of the Add vs. AddRange performance difference when you add multiple records.
Add: calls DetectChanges after every record is added
AddRange: calls DetectChanges once, after all records are added
At 10,000 entities, for example, the Add method has been measured taking around 875x more time simply to add the entities to the context.
To fix it:
CREATE a list
ADD entities to the list
USE AddRange with the list
SaveChanges
Done!
In your case, you will need to add an InsertRange method to your repository, as sketched below.
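A minimal sketch of such a method, assuming the repository wraps a DbContext exposed here as _context:
public void InsertRange(IEnumerable<Account> accounts)
{
    // One AddRange call means a single DetectChanges pass for the whole batch
    _context.Set<Account>().AddRange(accounts);
    _context.SaveChanges();
}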
EF Extended
You are right. This library updates all data with the same value. That is not what you are looking for.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library may be a perfect fit for your project if you want to improve your performance dramatically.
You can easily perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
Example:
public void ProcessApiData(List<Account> apiData)
{
    // Insert or update using the primary key (AccountId)
    CurrentUnitOfWork.BulkMerge(apiData);
}

Entity Framework, Bulk Inserts, and Maintaining Relationships

I have what would seem to be a common problem, yet I cannot figure out how to achieve the desired outcome. I have a nested entity with navigation properties defined on it: a MapLayer contains a collection of MapLines, and each MapLine contains a collection of MapPoints.
The map points collection can potentially be quite large for a given MapLine and there can be quite a large number of MapLines for a MapLayer.
The question here is what is the best approach for getting a MapLayer object inserted into the database using Entity Framework and still maintain the relationships that are defined by the navigation properties?
A standard Entity Framework implementation
dbContext.MapLayers.Add(mapLayer);
dbContext.SaveChanges();
causes a large memory spike and pretty poor return times.
I have tried implementing the EntityFramework.BulkInsert package but it does not honor the relationships of the objects.
This seems like it would be a problem that someone has run into before, but I can't seem to find any resources that explain how to accomplish this task.
Update
I have tried to implement the suggestion provided by Richard, but I am not understanding how I would go about this for a nested entity such as the one I have described. I am running under the assumption that I need to insert the MapLayer object, then the MapLines, then the MapPoints, to honor the PK/FK relationships in the database. I am currently trying the following code, but this does not appear to be correct.
dbContext.MapLayers.Add(mapLayer);
dbContext.SaveChanges();

List<MapLine> mapLines = new List<MapLine>();
List<MapPoint> mapPoints = new List<MapPoint>();

foreach (MapLine mapLine in mapLayer.MapLines)
{
    // Update each MapPoint's MapLine property to reference the current line object
    foreach (MapPoint mapPoint in mapLine.MapPoints)
    {
        mapPoint.MapLine = mapLine;
    }
    mapLines.Add(mapLine);
}

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var entityToInsert in mapLines)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}
Update 2
After having tried multiple different ways to achieve this, I finally gave up and just inserted the MapLayer as an entity and stored the MapLines => MapPoints relationship as a raw JSON string in a byte array on the MapLayer entity (as I am not querying against those structures, this works for me).
As the saying goes, "It ain't pretty, but it works".
I did have some success with the BulkInsert package and managing the relationships outside of EF, but again ran into a memory problem when trying to use EF to pull the data back into the system. It seems that currently, EF is not capable of handling large datasets and complex relationships efficiently.
I had a bad experience with huge context saves. All those recommendations about saving in iterations of 100 rows or 1000 rows, then disposing the context, or clearing the list and detaching objects, assigning null to everything, etc. etc. - it is all bullshit. We had a requirement to insert millions of rows daily into many tables. One should definitely not use Entity Framework in these conditions: you will be fighting memory leaks and a decrease in insertion speed as the iterations proceed.
Our first improvement was creating stored procedures and adding them to the model. It is 100 times faster than Context.SaveChanges(), and there are no leaks and no decrease in speed over time.
But it was not sufficient for us, and we decided to use SqlBulkCopy. It is super fast: 1000 times faster than using stored procedures.
So my suggestion will be:
if you have many rows to insert, but the count is under something like 50,000 rows, use stored procedures imported into the model;
if you have hundreds of thousands of rows, go and try SqlBulkCopy.
Here is some code:
// Unwrap the raw SqlConnection that EF is using (ObjectContext API shown;
// with a DbContext, use context.Database.Connection instead)
EntityConnection ec = (EntityConnection)Context.Connection;
SqlConnection sc = (SqlConnection)ec.StoreConnection;

var copy = new SqlBulkCopy(sc, SqlBulkCopyOptions.CheckConstraints, null);
copy.DestinationTableName = "TableName";
copy.ColumnMappings.Add("SourceColumn", "DBColumn");
copy.WriteToServer(dataTable);
copy.Close();
If you use DbTransaction with context, you can manage to bulk insert using that transaction as well, but it needs some hacks.
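A sketch of one way to do that with the EF6 transaction API (assuming the store connection is a SqlConnection; "TableName" and dataTable are stand-ins):
using (DbContextTransaction transaction = context.Database.BeginTransaction())
{
    var connection = (SqlConnection)context.Database.Connection;
    var sqlTransaction = (SqlTransaction)transaction.UnderlyingTransaction;

    // Enlist the bulk copy in the context's own transaction
    using (var copy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, sqlTransaction))
    {
        copy.DestinationTableName = "TableName";
        copy.WriteToServer(dataTable);
    }

    context.SaveChanges();   // tracked changes commit atomically with the bulk copy
    transaction.Commit();
}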
Bulk Insert is not the only way of efficiently adding data using Entity Framework - a number of alternatives are detailed in this answer. You can use the optimisations suggested there (disabling change tracking) then you can just add things as normal.
Note that as you are adding many items at once, you'll need to recreate your context fairly frequently to stop the memory leak and slowdown that you'll get.

Improving efficiency with Entity Framework

I have been using the Entity Framework with the POCO First approach. I have pretty much followed the pattern described by Steve Sanderson in his book 'Pro ASP.NET MVC 3 Framework', using a DI container and DbContext class to connect to SQL Server.
The underlying tables in SQL server contain very large datasets used by different applications. Because of this I have had to create views for the entities I need in my application:
class RemoteServerContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Order> Orders { get; set; }
    public DbSet<Contact> Contacts { get; set; }
    ...

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>().ToTable("vw_Customers");
        modelBuilder.Entity<Order>().ToTable("vw_Orders");
        ...
    }
}
and this seems to work fine for most of my needs.
The problem I have is that some of these views have a great deal of data in them so that when I call something like:
var customers = _repository.Customers().Where(c => c.Location == location).Where(...);
it appears to be bringing back the entire data set, which can take some time before the LINQ query reduces the set to the records I need. This seems very inefficient when the criteria apply to only a few records and I am getting the entire data set back from SQL Server.
I have tried to work around this by using stored procedures, such as
public IEnumerable<Customer> CustomersThatMatchACriteria(string criteria1, string criteria2, ...) // or an object passed in!
{
    return Database.SqlQuery<Customer>("Exec pp_GetCustomersForCriteria @crit1 = {0}, @crit2 = {1}...", criteria1, criteria2, ...);
}
Whilst this is much quicker, the problem here is that it doesn't return a DbSet, so I lose all of the connectivity between my objects; e.g. I can't reference any associated objects such as orders or contacts, even if I include their IDs, because the return type is a collection of Customers rather than a DbSet of them.
Does anyone have a better way of getting SQL Server to do the querying, so that I am not passing loads of unused data around?
var customers = _repository.Customers().Where(c => c.Location == location).Where(...
If Customers() returns IQueryable, this statement alone won't actually be 'bringing back' anything at all - calling Where on an IQueryable gives you another IQueryable, and it's not until you do something that causes query execution (such as ToList, or FirstOrDefault) that anything will actually be executed and results returned.
If however this Customers method returns a collection of instantiated objects, then yes, since you are asking for all the objects you're getting them all.
I've never used either code-first or indeed even the repository pattern, so I don't know what to advise, other than staying in the realm of IQueryable for as long as possible and only executing the query once you've applied all relevant filters.
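To illustrate the deferred execution being described, a small sketch (the Customers set is from the question; IsActive is a made-up property for illustration):
// Nothing is sent to SQL Server while the query is being composed
IQueryable<Customer> query = db.Customers
    .Where(c => c.Location == location)   // still just building an expression tree
    .Where(c => c.IsActive);              // still nothing executed

// The SQL is generated and run here, returning only the filtered rows
List<Customer> results = query.ToList();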
What I would have done to return just a set of data would have been the following:
var customers = (from x in _repository.Customers where <boolean statement> &&/|| <boolean statement> select new { variableName = x.Name, ... }).Take(<number of records you need>);
so for instance:
var customers = (from x in _repository.Customers where x.ID == id select new { variableName = x.Name }).Take(1000);
Then iterate through the results to get the data (remember, the LINQ statement returns an IQueryable):
foreach (var data in customers)
{
    string doSomething = data.variableName; // to get data from your query
}
Hope this helps; not exactly the same methods, but I find this handy in my code.
Probably it's because your Customers() method in your repository is doing a GetAll() kind of thing and fetching the entire list first. This prevents LINQ and your SQL Server from creating smart queries.
I don't know if there's a good workaround for your repository, but if you would do something like:
using (var db = new RemoteServerContext())
{
    var custs = db.Customers.Where(...);
}
I think that will be a lot quicker. If your project is small enough, you can do without a repository. Sure, you'll lose an abstraction layer, but with small projects this may not be a big problem.
On the other hand, you could load all Customers in your repository once and use the resulting collection directly (instead of the method-call that fills the list). Beware of adding, removing and modifying Customers though.
You need the LINQ query to return less data, e.g. via SQL-style paging (like the TOP function in SQL), or do manual querying using stored procedures. In either case, you need to rewrite your querying mechanism. This is one of the reasons why I didn't use EF: you don't have a lot of control over the code it generates, it seems. A paging sketch follows below.
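A sketch of that paging approach with IQueryable, which EF translates into SQL so only one page of rows crosses the wire (Id, pageIndex and pageSize are stand-ins):
var page = db.Customers
    .Where(c => c.Location == location)
    .OrderBy(c => c.Id)            // Skip/Take require a stable ordering
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();                     // only this page is materialized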

Inserting multiple records through ObjectSet

Is there any way to insert multiple records without a loop if you already have a collection? Granted there is little performance benefit if you commit outside the loop.
I have a context and an ObjectSet<Type>. I also have a collection of Type which I want to swap in, concatenate, intersect, or what have you, with what's in the table. In other words, I don't want to proceed like the below:
foreach (Type r in Collection)
{
    Context.ObjectSet.Add(r);
}
Context.SaveChanges();
You must always use a loop. There cannot be any performance benefit from methods like AddCollection or AddRange, because such methods usually rely on an internal optimization where the backing collection is extended once for the whole range and the items are copied in bulk, instead of growing the collection on each Add call. AddObject does much more than pass data to some internal collection, so it still has to process entities one by one.
If you want to optimize the performance of the database inserts themselves, you must move to another solution, because EF doesn't have any batch or bulk data modification: each record is sent as a single INSERT in a separate roundtrip to the database.
You could try creating an extension method to do the looping and adding; see below.
public static class AddHelper
{
    public static void AddAll<T>(this ObjectSet<T> objectSet, IEnumerable<T> items)
        where T : class
    {
        foreach (var item in items)
        {
            objectSet.AddObject(item);
        }
    }
}
Then you could use the code below.
IEnumerable<EntityName> list = GetEntities(); //this would typically return your list
Context.ObjectSet.AddAll(list);
I'm looking for the same type of solution.
I have yet to find a way to do it directly with EF, but there is a way to do it. If you create a stored procedure that accepts a DataTable as a parameter, you can execute the proc with the Entity context and pass the table as a parameter (see the sketch after the link below).
When you're dealing with thousands or millions of records, this is a lot better than looping through each record to be passed to the DB.
Passing Table Valued Parameters to SQL Server Stored Procedures.
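A minimal sketch of that route; the table type, procedure, and column names here are all hypothetical, and with an ObjectContext you would call ExecuteStoreCommand instead of Database.ExecuteSqlCommand:
// SQL side (created once):
//   CREATE TYPE dbo.ProductType AS TABLE (ProductId INT, ProductName NVARCHAR(100));
//   CREATE PROCEDURE dbo.BulkUpsertProducts @Products dbo.ProductType READONLY AS ...

var table = new DataTable();
table.Columns.Add("ProductId", typeof(int));
table.Columns.Add("ProductName", typeof(string));
foreach (var p in products)
    table.Rows.Add(p.ProductId, p.ProductName);

// SqlDbType.Structured marks the parameter as a table-valued parameter
var parameter = new SqlParameter("@Products", SqlDbType.Structured)
{
    TypeName = "dbo.ProductType",
    Value = table
};
context.Database.ExecuteSqlCommand("EXEC dbo.BulkUpsertProducts @Products", parameter);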

How to mass insert/update in LINQ to SQL?

How can I do these two scenarios?
Currently I am doing something like this:
public class Repository
{
    private LinqtoSqlContext dbcontext = new LinqtoSqlContext();

    public void Update()
    {
        // find record
        // update record
        // save record ( dbcontext.SubmitChanges() )
    }

    public void Insert()
    {
        // make a database table object ( ie ProductTable t = new ProductTable() { productname = "something" } )
        // insert record ( dbcontext.ProductTable.InsertOnSubmit(t) )
        // dbcontext.SubmitChanges();
    }
}
So now I am trying to load an XML file that has tons of records. First I validate the records one at a time. Then I want to insert them into the database, but instead of doing SubmitChanges() after each record, I want to do a mass submit at the end.
So I have something like this:
public class Repository
{
    private LinqtoSqlContext dbcontext = new LinqtoSqlContext();

    public void Update()
    {
        // find record
        // update record
    }

    public void Insert()
    {
        // make a database table object ( ie ProductTable t = new ProductTable() { productname = "something" } )
        // insert record ( dbcontext.ProductTable.InsertOnSubmit(t) )
    }

    public void SaveToDb()
    {
        dbcontext.SubmitChanges();
    }
}
Then in my service layer I would do something like:
for (int i = 0; i < 100; i++)
{
    validate();
    if (valid == true)
    {
        update();
        insert();
    }
}
SaveToDb();
So pretend my for loop has a count of all the records found in the XML file. I first validate each one. If it is valid, then I have to update a table before I insert the record. I then insert the record.
After that I want to save everything in one go.
I am not sure if I can do a mass save when updating, or if that has to happen after every record.
But I thought it would work for sure for the inserts.
Nothing seems to crash, and I am not sure how to check whether the records are being added to the dbcontext.
The simple answer is: you do not. Linq2Sql is a lot of things, but it is not a replacement for bulk upload / bulk copy. You will be a LOT more efficient using the ETL route:
Generate a flat file (csv etc.) with the new data
Load it into the database using bulk load mechanisms
If the data is updating etc., load it into temporary tables and use the MERGE command to merge it into the main table (a sketch follows after this list).
Linq2Sql will by design always suck in mass insert scenarios. ORMs just are not ETL tools.
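A sketch of that staging-plus-MERGE step driven from C# (all table and column names are hypothetical):
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // 1. Bulk load the new data into a staging table
    using (var copy = new SqlBulkCopy(connection))
    {
        copy.DestinationTableName = "dbo.Product_Staging";
        copy.WriteToServer(dataTable); // flat data parsed from the XML file
    }

    // 2. Merge staging into the main table in one set-based statement
    const string merge = @"
        MERGE dbo.Product AS target
        USING dbo.Product_Staging AS source ON target.ProductId = source.ProductId
        WHEN MATCHED THEN UPDATE SET target.ProductName = source.ProductName
        WHEN NOT MATCHED THEN INSERT (ProductId, ProductName)
                              VALUES (source.ProductId, source.ProductName);";
    using (var command = new SqlCommand(merge, connection))
    {
        command.ExecuteNonQuery();
    }
}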
Linq2SQL (as has been noted) does not handle this well by default, but luckily there are some solutions out there. Here's one I used for a website when I wanted to do some bulk deletes. It worked well for me, and due to its use of extension methods it was basically indistinguishable from regular Linq2SQL methods.
I haven't really "released" this project yet, but it's a T4-based repository system that extends Linq To SQL and implements a bunch of batch operations (delete, update, create csv, etc.): http://code.google.com/p/grim-repo/. You can check out the source code and implement it however you see fit.
Also, this link has some great source code for batch operations: http://www.aneyfamily.com/terryandann/post/2008/04/Batch-Updates-and-Deletes-with-LINQ-to-SQL.aspx
And, also, I know it's tempting, but don't crap on the elderly: try performing batch operations with DataAdapters/ADO.NET: http://davidhayden.com/blog/dave/archive/2006/01/05/2665.aspx. It's faster, but inevitably hairier.
Finally, if you have an XML file, you can create a stored procedure that takes advantage of SQL Server's built-in sproc sp_xml_preparedocument. Check out how to use it here: http://msdn.microsoft.com/en-us/library/ms187367.aspx
Even when you add multiple records to the DataContext before calling SubmitChanges, LINQ2SQL will loop through and insert them one by one. You can verify this by implementing one of the partial methods on an entity class ("InsertMyObject(MyObject instance)"). It will be called for each pending row individually.
I don't see anything wrong with your plan -- you say it works, but you just don't know how to verify it? Can't you simply look in the database to check if the records got added?
Another way to see which records are pending in the DataContext and have not yet been submitted is to call GetChangeSet() on the data context and then refer to the Inserts property of the returned object to get a list of rows that will be inserted when SubmitChanges is called.
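A quick sketch of that check using the LINQ to SQL DataContext API:
ChangeSet pending = dbcontext.GetChangeSet();
Console.WriteLine("Pending inserts: " + pending.Inserts.Count);
foreach (object row in pending.Inserts)
{
    // each entry is an entity that will be INSERTed on SubmitChanges()
    Console.WriteLine(row);
}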
