Just a bit of an outline of what I am trying to accomplish.
We keep a local copy of a remote (third-party) database within our application. To download the information we use an API.
We currently download the information on a schedule, which then either inserts new records into the local database or updates the existing records.
Here is how it currently works:
public void ProcessApiData(List<Account> apiData)
{
    // get the existing accounts from the local database
    List<Account> existingAccounts = _accountRepository.GetAllList();

    foreach (var account in apiData)
    {
        // check if it already exists in the local database
        var existingAccount = existingAccounts.SingleOrDefault(a => a.AccountId == account.AccountId);

        // if it's null then it's a new record
        if (existingAccount == null)
        {
            _accountRepository.Insert(account);
            continue;
        }

        // otherwise it's an existing record, so it needs updating
        existingAccount.AccountName = account.AccountName;
        // ... continue updating the rest of the properties
    }

    CurrentUnitOfWork.SaveChanges();
}
This works fine, but it feels like it could be improved.
There is one of these methods per entity, and they all do the same thing (just updating different properties, or inserting a different entity). Would there be any way to make this more generic?
It also seems like a lot of database calls; would there be any way to do this in "bulk"? I've had a look at this package, which I have seen mentioned in a few other posts: https://github.com/loresoft/EntityFramework.Extended
But it seems to focus on bulk-updating a single property with the same value, as far as I can tell.
Any suggestions on how I can improve this would be brilliant. I'm still fairly new to C#, so I'm still searching for the best way to do things.
I'm using .NET 4.5.2 and Entity Framework 6.1.3 with SQL Server 2014 as the backend database.
For EFCore you can use this library:
https://github.com/borisdj/EFCore.BulkExtensions
Note: I'm the author of this one.
And for EF 6 this one:
https://github.com/TomaszMierzejowski/EntityFramework.BulkExtensions
Both extend DbContext with bulk operations and have the same call syntax:
context.BulkInsert(entitiesList);
context.BulkUpdate(entitiesList);
context.BulkDelete(entitiesList);
The EF Core version additionally has a BulkInsertOrUpdate method.
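For the insert-or-update synchronization described in the question, that maps onto a single call (EF Core only):

context.BulkInsertOrUpdate(entitiesList);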
Assuming that the classes in apiData are the same as your entities, you should be able to attach the incoming entity and mark it as modified to update an existing record (LINQ to SQL exposes this as Attach(newAccount, originalAccount); in EF you attach the entity and set its state to Modified).
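A minimal sketch of that in EF 6 terms, assuming the incoming accounts correspond to rows that already exist (YourDbContext is a placeholder name; requires using System.Data.Entity;):

using (var context = new YourDbContext())
{
    foreach (var account in apiData)
    {
        // Attaches the detached entity and marks all of its properties as changed;
        // SaveChanges then issues one UPDATE per attached entity.
        context.Entry(account).State = EntityState.Modified;
    }
    // Note: if a row does not exist yet this will throw, so split new
    // records out first, as the question's code already does.
    context.SaveChanges();
}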
For bulk inserts I use AddRange(listOfNewEntities). If you have a lot of entities to insert, it is advisable to batch them. Also, you may want to dispose and recreate the DbContext for each batch so that it doesn't use too much memory.
var accounts = new List<Account>();
var context = new YourDbContext();
context.Configuration.AutoDetectChangesEnabled = false;

foreach (var account in apiData)
{
    accounts.Add(account);
    if (accounts.Count % 1000 == 0) // Play with this batch size to see what works best
    {
        context.Set<Account>().AddRange(accounts);
        accounts = new List<Account>();
        context.ChangeTracker.DetectChanges();
        context.SaveChanges();
        context.Dispose();

        // Recreate the context and switch automatic change detection off again,
        // since the setting does not carry over to the new instance.
        context = new YourDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;
    }
}

context.Set<Account>().AddRange(accounts);
context.ChangeTracker.DetectChanges();
context.SaveChanges();
context.Dispose();
For bulk updates, there's nothing built into LINQ to SQL. There are, however, libraries and solutions to address this. See e.g. here for a solution using expression trees.
List vs. Dictionary
You check the list every time to see whether the entity exists, which is an O(n) scan per lookup and is bad for performance. You should build a dictionary instead, making each lookup O(1).

var existingAccounts = _accountRepository.GetAllList().ToDictionary(x => x.AccountId);

Account existingAccount;
if (existingAccounts.TryGetValue(account.AccountId, out existingAccount))
{
    // ...code....
}
Add vs. AddRange
You should be aware of the performance difference between Add and AddRange when you add multiple records.
Add: calls DetectChanges after every record is added
AddRange: calls DetectChanges once, after all records are added
At 10,000 entities, the Add method took 875x more time simply to add the entities to the context.
To fix it:
CREATE a list
ADD entities to the list
USE AddRange with the list
Call SaveChanges
Done!
In your case, you will need to add an InsertRange method to your repository.
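A minimal sketch of such a method, assuming the repository wraps an EF 6 DbContext exposed as a _context field (a placeholder name):

public void InsertRange(IEnumerable<Account> accounts)
{
    // One DetectChanges call for the whole batch instead of one per entity.
    _context.Set<Account>().AddRange(accounts);
}

SaveChanges is then still called once by the unit of work at the end, as in the question's code.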
EF Extended
You are right. This library updates all data with the same value. That is not what you are looking for.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library may be a perfect fit for your enterprise if you want to improve your performance dramatically.
You can easily perform:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
Example:
public void ProcessApiData(List<Account> apiData)
{
    // Insert or update using the primary key (AccountId)
    CurrentUnitOfWork.BulkMerge(apiData);
}
Related
I apologise if this has been asked already; I am struggling greatly with the terminology of what I am trying to find out about, as it conflicts with functionality in Entity Framework.
What I am trying to do:
I would like to create an application that, on setup, gives the user the option to use one database as a "trial"/"startup" database, i.e. a non-production database. This would allow a user to trial the application, but it would not have backups etc.; in no way would this be a "production" database. This could be SQLite, for example.
When the user is then ready, they could click "convert to production" (or similar) and give it the target of the new database machine/database. This would be considered the "production" environment. This could be something like MySQL, SQL Server or... whatever else EF connects to these days.
The question:
Does EF support this type of migration/data transfer live? Would it need another app where you could configure the EF source and EF destination for it to then run through the process of conversion/seeding/population of the data source to another data source?
Why I have asked here:
I have tried to search for things around this topic, but transferring/migration brings up totally unrelated subjects, so any help would be much appreciated.
From what you describe I don't think there is anything out of the box to support that. You can map a DbContext to either database, then it would be a matter of fetching and detaching entities from the evaluation DbContext and attaching them to the production one.
For a relatively simple schema / object graph this would be fairly straightforward to implement.
ICollection<Customer> customers = new List<Customer>();

using (var context = new AppDbContext(evalConnectionString))
{
    customers = context.Customers.AsNoTracking().ToList();
}

using (var context = new AppDbContext(productionConnectionString))
{
    // Assuming an empty database...
    context.Customers.AddRange(customers);
    context.SaveChanges(); // without this, nothing is written
}
Though for more complex models this could take some work, especially when dealing with things like existing lookups/references. Where you want to move objects that might share the same reference to another object you would need to query the destination DbContext for existing relatives and substitute them before saving the "parent" entity.
ICollection<Order> orders = new List<Order>();

using (var context = new AppDbContext(evalConnectionString))
{
    orders = context.Orders
        .Include(x => x.Customer)
        .AsNoTracking()
        .ToList();
}

using (var context = new AppDbContext(productionConnectionString))
{
    var customerIds = orders.Select(x => x.Customer.CustomerId)
        .Distinct().ToList();
    var existingCustomers = context.Customers
        .Where(x => customerIds.Contains(x.CustomerId))
        .ToList();

    foreach (var order in orders)
    {
        // Substitute any customer that already exists in the target database
        // so a duplicate row is not inserted.
        var existingCustomer = existingCustomers.SingleOrDefault(x => x.CustomerId == order.Customer.CustomerId);
        if (existingCustomer != null)
            order.Customer = existingCustomer;
        else
            existingCustomers.Add(order.Customer);
        context.Orders.Add(order);
    }

    context.SaveChanges(); // persist the orders and any new customers
}
This is a very simple example to outline how to handle scenarios where you may be inserting data with references that may or may not exist in the target DbContext. If we are copying across Orders and want to deal with their respective Customers, we first need to check whether a tracked customer reference exists, and use that reference to avoid a duplicate row being inserted or an exception being thrown.
Normally, loading the orders and related references from one DbContext ensures that multiple orders referencing the same Customer entity all share the same entity reference. However, to get detached entities that we can associate with the new DbContext we use AsNoTracking(), and detached references to the same record will not be the same object reference, so we need to treat these with care.
For example where there are 2 orders for the same customer:
var ordersA = context.Orders.Include(x => x.Customer).ToList();
Assert.AreSame(ordersA[0].Customer, ordersA[1].Customer); // Passes

var ordersB = context.Orders.Include(x => x.Customer).AsNoTracking().ToList();
Assert.AreSame(ordersB[0].Customer, ordersB[1].Customer); // Fails
Even though in the 2nd example both orders are for the same customer, each will have a Customer reference with the same ID but a different object reference, because the DbContext is not tracking the references used. This is one of the several "gotchas" with detached entities and efforts to boost performance. Using tracked references isn't ideal either, since those entities will still think they are associated with another DbContext. We can detach them, but that means diving through the object graph and detaching all references. (Doable, but messy compared to just loading them detached.)
Where it can also get complicated is when migrating data in batches (disposing of a DbContext regularly to avoid performance pitfalls with larger data volumes) or synchronizing data over time. It is generally advisable to first check the destination DbContext for matching records and use those, to avoid inserting duplicate data (or throwing exceptions).
So for simple data models this is fairly straightforward. For more complex ones, where there is more data to bring across and more relationships between that data, it's more complicated. For those systems I'd probably look at generating a database-to-database migration, such as creating INSERT statements for the desired target DB from the data in the source database. Then it is just a matter of inserting the data in relational order to comply with the data constraints. (Either using a tool or rolling your own script generation.)
I am creating a WPF app and I have an existing DB that I would like to use and NOT recreate. I will if I have to, but I would rather not. The DB is SQLite, and when I add it to my data layer and create a data model based on the DB, I get the model and the DB context; however, there are no methods created for CRUD, or for instance .ToList() so I can return all of the items in the table.
Do I need to create all of these manually or is there a way to do it like the way that MVC can scaffold?
I am using VS 2017, WPF, EF6 and SQLite installed with NuGet.
To answer the question in the title:
No.
There is no click-a-button method of scaffolding out UI like you get with MVC.
If you just deal with a table at a time, then you could build a generic repository that returns a List for a given table. That won't save you much coding, but you could do it.
If you made that return an IQueryable rather than just a List, then you could "chain" such a query. LINQ queries aren't turned into SQL until you force iteration, so you can base one query on another, adding criteria, what to select, etc. for flexibility.
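A minimal sketch of that idea, assuming an EF6 DbContext (the type and property names here are illustrative, not generated for you):

public class Repository<TEntity> where TEntity : class
{
    private readonly DbContext _context;

    public Repository(DbContext context)
    {
        _context = context;
    }

    // Returning IQueryable lets callers chain Where/OrderBy/Select;
    // nothing is sent to the database until the query is iterated.
    public IQueryable<TEntity> Query()
    {
        return _context.Set<TEntity>();
    }
}

// Usage: criteria are composed first, SQL runs only at ToList().
var repo = new Repository<Customer>(db);
var active = repo.Query()
    .Where(c => c.IsActive)
    .OrderBy(c => c.Name)
    .ToList();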
In the body of your post you ask about methods to read and write data. This seems to be almost totally unrelated from the other question because it's data access rather than UI.
"there are no methods created for CRUD or for instance .ToList() so I can return all of the items on the table."
There are methods available in the form of LINQ extension methods.
ToList() is one of these, though it is more usual to use async/await and ToListAsync.
Where and Select are other extension methods.
You would still be writing whatever model layer exposes the results of those queries, though.
I'm not clear whether you are just unaware of LINQ or not, but here's an example query:
var customers = await (from c in db.Customers
                       orderby c.CustomerName
                       select c)
                      .Include(x => x.Orders) // .Include("Orders") is an alternate syntax
                      .ToListAsync();
EF uses "lazy loading" of related entities by default; that Include makes it read the Orders for each customer as part of the query.
Entity Framework is an Object Relational Mapper
Which means it will map your C# objects to tables.
Whenever you create a model from the DB, it will create a context class which inherits from DbContext. In this class you will find all the tables as DbSet<TableName> TableName { get; set; } properties. Conceptually, each DbSet holds the table's rows; operations performed on it are applied to the DB when SaveChanges is called.
Example for CRUD:
public DbSet<Student> Students { get; set; }

// Create
using (var context = new YourDataContext())
{
    var std = new Student()
    {
        Name = "Avinash"
    };
    context.Students.Add(std);
    context.SaveChanges();
} // Saving will add a row to the Student table with the Name field set to "Avinash"

// Delete
using (var context = new YourDataContext())
{
    var currentStudent = context.Students.FirstOrDefault(x => x.Name == "Avinash");
    if (currentStudent != null)
    {
        context.Students.Remove(currentStudent);
        context.SaveChanges();
    }
}
Note: the changes are only reflected in the DB when SaveChanges is called.
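For completeness, the read and update operations follow the same pattern (a sketch using the same YourDataContext):

// Read
using (var context = new YourDataContext())
{
    var students = context.Students.Where(x => x.Name == "Avinash").ToList();
}

// Update
using (var context = new YourDataContext())
{
    var student = context.Students.FirstOrDefault(x => x.Name == "Avinash");
    if (student != null)
    {
        student.Name = "Avinash Kumar"; // the context tracks this change
        context.SaveChanges();          // issues the UPDATE
    }
}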
I have used Entity Framework to insert data into SQL tables.
For larger numbers of records, instead of Add(), I have used AddRange() and called SaveChanges() afterwards.
It's still taking too much time to insert records - are there any solutions to increase the speed?
_Repository.InsertMultiple(deviceDataList);
await _Repository.SaveAsync();
public void InsertMultiple(List<Device> deviceDataList)
{
    context.Devices.AddRange(deviceDataList);
}
Using AddRange over Add is already a great improvement; it fixes the part that's slow on the application side.
However, SaveChanges still takes a lot of time, because one database round-trip is made for every entity you save. So if you have 10k entities to insert, 10,000 database round-trips will be made, which is INSANELY slow.
Disclaimer: I'm the owner of Entity Framework Extensions
This library is not free but allows you to perform all bulk operations including BulkSaveChanges and BulkInsert:
Bulk SaveChanges
Bulk Insert
Bulk Delete
Bulk Update
Bulk Merge
Example
// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform bulk operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize bulk operations
context.BulkInsert(customers, options => {
    options.IncludeGraph = true;
});
context.BulkMerge(customers, options => {
    options.ColumnPrimaryKeyExpression =
        customer => customer.Code;
});
It doesn't matter whether you use Add in a foreach or AddRange; the problem lies in the SaveChanges method, as I believe it saves the changes to the tracked entities one by one. There are libraries out there that allow a real bulk insert of entities, using SqlBulkCopy under the hood.
Link to EF Core library: EFCore.BulkExtensions
EDIT:
For EF6 I found this NuGet package: EntityFramework6.BulkInsert, but I haven't personally used it so I can't say anything about it.
EDIT 2: To simplify what I said above: using AddRange over Add will improve the time taken to add entities to the change tracker, but SaveChanges can still take a very long time, so on its own it's not a solution.
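For reference, a minimal sketch of what such libraries do under the hood with SqlBulkCopy (the table name, column name, and device property here are hypothetical):

using System.Data;
using System.Data.SqlClient;

// Build an in-memory table whose columns match the target table.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
foreach (var device in deviceDataList)
    table.Rows.Add(device.Name);

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.Devices"; // hypothetical table
    bulk.BatchSize = 1000;                     // rows per round-trip
    bulk.WriteToServer(table);                 // one bulk operation instead of per-row INSERTs
}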
I have a model-first, entity framework design like this (version 4.4)
When I load it using code like this:
PriceSnapshotSummary snapshot = db.PriceSnapshotSummaries.FirstOrDefault(pss => pss.Id == snapshotId);
the snapshot has loaded everything (that is, SnapshotPart, Quote, QuoteType) except the DataInfo. Looking at the SQL, this appears to be because Quote has no FK to DataInfo, due to the 0..1 relationship.
However, I would have expected that the navigation property 'DataInfo' on Quote would still go off to the database to fetch it.
My current workaround is this:

foreach (var quote in snapshot.ComponentQuotes)
{
    var dataInfo = db.DataInfoes.FirstOrDefault(di => di.Quote.Id == quote.InstrumentQuote.Id);
    quote.InstrumentQuote.DataInfo = dataInfo;
}
Is there a better way to achieve this? I thought EF would automatically load the reference?
This problem has to do with how the basic LINQ building blocks interact with Entity Framework.
Take the following (pseudo)code:

IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000);
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This looks OK, but it will throw a runtime error, because of an interface called IQueryable.
IQueryable implements IEnumerable and adds information about an expression and a provider. This basically allows it to build and execute SQL statements against a database, rather than loading whole tables and iterating over them as you would with a plain IEnumerable.
Because LINQ defers execution of the expression until it's used, it compiles the IQueryable expression into SQL and executes the database query only right before the result is needed. This speeds things up a lot and allows for expression chaining without going to the database every time a Where() or Select() is executed. The side effect is that if the object is used outside the scope of db, the SQL statement is executed after db has been disposed of.
To force LINQ to execute, you can use ToList, like this:

List<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will force LINQ to execute the expression against db and get all addresses with a number greater than a thousand. This is all good if you only need to access a field in the Addresses table, but since we want to get the name of a city (a 1..1 relationship similar to yours), we'll hit another bump before it can run: lazy loading.
Entity Framework lazy loads entities by default, so nothing is fetched from the database until it is needed. Again, this speeds things up considerably, since without it every call to the database could potentially bring the whole database into memory; but it has the problem of depending on the context still being available.
You could set EF to eager load (in your model, go to properties and set 'Lazy Loading Enabled' to False), but that would bring in a lot of info you probably don't use.
The best fix for this problem is to execute everything inside db's scope:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000);
    foreach (var addr in addresses)
        Console.WriteLine(addr.City.Name);
}
I know this is a really simple example, but in the real world you can use a DI container like Ninject to handle your dependencies and have your db available to you throughout the execution of the app.
This leaves us with Include. Include will make the IQueryable include all the specified relation paths when building the SQL statement:
List<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Addresses.Include("City").Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will work, and it's a nice compromise between having to load the whole database and having to refactor an entire project to support DI.
Another thing you can do is map multiple tables to a single entity. In your case, since the relationship is 1-0..1, you shouldn't have a problem doing it.
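For reference, a sketch of that mapping in EF code-first terms, known as entity splitting (the question is model-first, where the equivalent is configured in the EDMX mapping details; the property names below are hypothetical):

using System.Data.Entity;

public class Quote
{
    public int Id { get; set; }
    public decimal Price { get; set; }    // hypothetical column from the Quote table
    public string DataField { get; set; } // hypothetical column from the DataInfo table
}

public class SnapshotContext : DbContext
{
    public DbSet<Quote> Quotes { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Entity splitting: one entity whose columns come from two tables
        // that share the same primary key.
        modelBuilder.Entity<Quote>()
            .Map(m =>
            {
                m.Properties(q => new { q.Id, q.Price });
                m.ToTable("Quote");
            })
            .Map(m =>
            {
                m.Properties(q => new { q.Id, q.DataField });
                m.ToTable("DataInfo");
            });
    }
}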
As I've mentioned in a couple of other questions, I'm currently trying to replace a home-grown ORM with Entity Framework, now that our database can support it.
Currently, we have certain objects set up such that they are mapped to a table in our internal database and a table in the database that runs our website (which is not even in the same state, let alone on the same server). So, for example:
Part p = new Part(12345);
p.Name = "Renamed part";
p.Update();
will update both the internal and the web databases simultaneously to reflect that the part with ID 12345 is now named "Renamed part". This logic only needs to go one direction (internal -> web) for the time being. We access the web database through a LINQ-to-SQL DBML and its objects.
I think my question has two parts, although it's possible I'm not asking the right question in the first place.
Is there any kind of "OnUpdate()" event/method that I can use to trigger validation of "Should this be pushed to the web?" and then do the pushing? If there isn't anything by default, is there any other way I can insert logic between .SaveChanges() and when it hits the database?
Is there any way that I can specify for each object which DBML object it maps to, and for each EF auto-generated property which property on the L2S object to map to? The names often match up, but not always so I can't rely on that. Alternatively, can I modify the L2S objects in a generic way so that they can populate themselves from the EF object?
Sounds like a job for SQL Server replication.
You don't need to interconnect the two as you seem to suggest in question 2.
Just have the two separate databases with their own EF or L2S models, and abstract them away using repositories with domain objects.
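A minimal sketch of that shape (all type names here are hypothetical, not from the question; requires using System.Data.Entity;):

// Domain-facing abstraction; callers never see EF or L2S types.
public interface IPartRepository
{
    Part Get(int id);
    void Update(Part part);
}

// EF-backed implementation for the internal database.
public class InternalPartRepository : IPartRepository
{
    private readonly InternalDbContext _context; // hypothetical EF context

    public InternalPartRepository(InternalDbContext context)
    {
        _context = context;
    }

    public Part Get(int id)
    {
        return _context.Parts.Find(id);
    }

    public void Update(Part part)
    {
        _context.Entry(part).State = EntityState.Modified;
        _context.SaveChanges();
    }
}

// A WebPartRepository would implement the same interface against the L2S DataContext.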
This is the solution I ended up going with. Note that the implementation of IAdvantageWebTable is inherited from the existing base class, so nothing special needed to be done for EF-based classes, once the T4 template was modified to inherit correctly.
public partial class EntityContext
{
    public override int SaveChanges(System.Data.Objects.SaveOptions options)
    {
        // Materialize the list of changed entries before SaveChanges resets their state.
        var modified = this.ObjectStateManager
            .GetObjectStateEntries(EntityState.Modified | EntityState.Added)
            .ToList();
        var result = base.SaveChanges(options); // Call the base SaveChanges, which clears that list.
        using (var context = new WebDataContext()) // This is the second database context.
        {
            foreach (var obj in modified)
            {
                var table = obj.Entity as IAdvantageWebTable;
                if (table != null)
                {
                    table.UpdateWeb(context); // This is IAdvantageWebTable.UpdateWeb(), which calls all the existing logic I've had in place for years.
                }
            }
            context.SubmitChanges();
        }
        return result;
    }
}