I have a C# program that loads a list of products from a database into a list of Product objects. The user can add new products, edit products, and delete products through my program's interface. Pretty standard stuff. My question relates to tracking those changes and saving them back to the database. Before I get to the details, I know that using something like Entity Framework or NHibernate would solve my problem of tracking adds and edits, but I don't think it would solve my problem of tracking deletes. In addition to wanting an alternative to converting a large codebase to Entity Framework or NHibernate, I also want to know the answer to this question for my own curiosity.
In order to track edits, I'm doing something like this on the Product class where I set the IsDirty flag any time a property is changed:
class Product
{
public bool IsDirty { get; set; }
public bool IsNew { get; set; }
// If the description is changed, set the IsDirty property
public string Description
{
get
{
return _description;
}
set
{
if (value != _description)
{
this.IsDirty = true;
_description = value;
}
}
}
private string _description;
// ... and so on
}
When I create a new Product object, I set its IsNew flag, so the program knows to write it to the database the next time the user saves. Once I write a product to the database successfully, I clear its IsNew and IsDirty flags.
In order to track deletes, I made a List class that tracks deleted items:
class EntityList<T> : List<T>
{
    public List<T> DeletedItems { get; private set; }

    public EntityList()
    {
        this.DeletedItems = new List<T>();
    }

    // When an item is removed, track it in the DeletedItems list
    public new bool Remove(T item)
    {
        bool removed = base.Remove(item);
        if (removed)
        {
            this.DeletedItems.Add(item);
        }
        return removed;
    }
}
// ...
// When I work with a list of products, I use an EntityList instead of a List
EntityList<Product> products = myRepository.SelectProducts();
Each time I save a list of products to the database, I iterate through all of the products in the EntityList.DeletedItems property and delete those products from the database. Once the list is saved successfully, I clear the DeletedItems list.
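In rough outline, the save routine ends up looking something like this (a sketch; the repository method names are just illustrative):

public void SaveProducts(EntityList<Product> products)
{
    foreach (var product in products)
    {
        if (product.IsNew)
            myRepository.InsertProduct(product);   // illustrative insert call
        else if (product.IsDirty)
            myRepository.UpdateProduct(product);   // illustrative update call

        product.IsNew = false;
        product.IsDirty = false;
    }

    foreach (var deleted in products.DeletedItems)
        myRepository.DeleteProduct(deleted);       // illustrative delete call

    products.DeletedItems.Clear();
}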
All of this works, but it seems like I may be doing too much work, especially to track deleted items and to remember to set the IsNew flag every time I create a new Product object. I can't set the IsNew flag in Product's constructor because I don't want that flag set if I'm loading a Product object from the database. I'm also not thrilled with the fact that I have to use my EntityList class everywhere instead of using List.
It seems like this scenario is extremely common, but I haven't been able to find an elegant way of doing it through my research. So I have two questions:
1) Assuming that I'm not using something like Entity Framework, is there a better way to track adds, edits, and deletes and then persist those changes to the database?
2) Am I correct in saying that even when using Entity Framework or NHibernate, I'd still have to write some additional code to track my deleted items?
In EF the DbContext object contains all of the logic to track changes to objects that it knows about. When you call SaveChanges it figures out which changes have happened and performs the appropriate actions to commit those changes to the database. You don't need to do anything specific with your object state other than inform the DbContext when you want to add or remove records.
Updates:
When you query a DbSet the objects you get are tracked internally by EF. During SaveChanges the current state of those objects is compared against their original state, and those that have changed are queued to be updated in the database.
Inserts:
When you add a new object to the relevant DbSet it is flagged for insertion during the SaveChanges call. The object is enrolled in the change tracking, its DB-generated fields (auto-increment IDs for instance) are updated, etc.
Deletes:
To delete a record from the database you call Remove on the relevant DbSet and EF will perform that action during the next SaveChanges call.
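Putting those three together, a minimal sketch of the round trip (the context and entity names are illustrative, not taken from your code):

using (var context = new ShopContext())   // hypothetical DbContext
{
    // Update: the queried entity is already tracked, so changing a property is enough
    var existing = context.Products.First(p => p.ProductID == 1);
    existing.Description = "New description";

    // Insert: Add flags the new entity for an INSERT
    context.Products.Add(new Product { Description = "Brand new product" });

    // Delete: Remove flags the entity for a DELETE
    var obsolete = context.Products.First(p => p.ProductID == 2);
    context.Products.Remove(obsolete);

    // One UPDATE, one INSERT and one DELETE are generated here
    context.SaveChanges();
}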
So you don't need to worry about tracking those changes for the sake of the database, it's all handled for you. You might still need to know for your own benefit - it's sometimes nice to be able to color changed records, for instance.
The above is also true for Linq2SQL and probably other ORMs and database interface layers, since their main purpose is to allow you to access data without having to write reams of code for doing things that can be abstracted out.
is there a better way to track adds, edits, and deletes and then persist those changes to the database?
Both Entity Framework and NHibernate chose not to make entities themselves responsible for notifying or tracking their changes*. So this can't be a bad choice. It certainly is a good choice from a design-pattern point of view (single responsibility).
They store snapshots of the data as they are loaded from the database in the context or session, respectively. Also, these snapshots have states telling whether they are new, updated, deleted or unchanged. And there are processes to compare actual values and the snapshots and update the entity states. When it's time to save changes, the states are evaluated and appropriate CRUD statements are generated.
This is all pretty complex to implement by yourself. And I didn't even mention integrity of entity states and their mutual associations. But of course it's doable, once you decide to follow the same pattern. The advantage of the data layer notifying/tracking changes (and not the entities themselves) is that the DAL knows which changes are relevant for the data store. Not all properties are mapped to database tables, but the entities don't know that.
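(If you're curious, you can watch those states at run time through the change tracker; a tiny sketch that works in both EF 6 and EF Core:)

foreach (var entry in context.ChangeTracker.Entries())
{
    // State is one of Added, Modified, Deleted, Unchanged or Detached
    Console.WriteLine(entry.Entity.GetType().Name + ": " + entry.State);
}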
I'd still have to write some additional code to track my deleted items?
No. Both OR mappers have a concept of persistence ignorance. You basically just work with objects in memory, which may encompass removing them from a list (either nested in an owner entity or a list representing a database table) and the ORM knows how to sync the in-memory state of the entities with the database.
*Entity Framework used to have self-tracking entities, but they were deprecated.
I saw a book with some code like this:
public class Order
{
public int OrderID { get; set; }
public ICollection<CartLine> Lines { get; set; }
...
}
public class CartLine
{
public int CartLineID { get; set; }
public Product Product { get; set; }
public int Quantity { get; set; }
}
//Product class is just a normal class that has properties such as ProductID, Name etc
and in the order repository, there is a SaveOrder method:
public void SaveOrder(Order order)
{
context.AttachRange(order.Lines.Select(l => l.Product));
if (order.OrderID == 0)
{
context.Orders.Add(order);
}
context.SaveChanges();
}
and the book says:
...when storing an Order object in the database. When the user's cart data is deserialized from the session store, the JSON package creates new objects that are not known to Entity Framework Core, which then tries to write all the objects into the database. For the Product objects, this means that Entity Framework Core tries to write objects that have already been stored, which causes an error. To avoid this problem, I notify Entity Framework Core that the objects exist and shouldn't be stored in the database unless they are modified.
I'm confused, and have two questions:
Q1: Why would writing objects that have already been stored cause an error? From the point of view of the underlying database, isn't it just an UPDATE statement that sets all columns to their current values? I know that's unnecessary work (changing nothing and rewriting everything), but it shouldn't throw any error at the database level, should it?
Q2: Why don't we do the same thing for CartLine, as in:
context.AttachRange(order.Lines.Select(l => l.Product));
context.AttachRange(order.Lines);
to prevent the CartLine objects from being stored in the database, just as we do for the Product objects?
Okay, so this is gonna be a long one:
1st Question:
In Entity Framework (Core or the "old" EF 6), there's this concept of "change tracking". The DbContext class is capable of tracking all the changes you make to your data, and then applying them to the DB via SQL statements (INSERT, UPDATE, DELETE). To understand why it throws an error in your case, you first need to understand how the DbContext / change tracking actually works. Let's take your example:
public void SaveOrder(Order order)
{
context.AttachRange(order.Lines.Select(l => l.Product));
if (order.OrderID == 0)
{
context.Orders.Add(order);
}
context.SaveChanges();
}
In this method, you receive an Order instance which contains Lines and Products. Let's assume that this method was called from some web application, meaning you didn't load the Order entity from the DB. This is what's known as the disconnected scenario.
It's "disconnected" in the sense that your DbContext is not aware of their existence. When you do context.AttachRange you are literally telling EF: I'm in control here, and I'm 100% sure these entities already exist in the DB. Please be aware of them for now on!,
Let's use your code again: imagine that it's a new Order (so it will enter the if block) and that you remove the context.AttachRange part of the code. As soon as the code reaches Add and SaveChanges, these things happen internally in the DbContext:
The DetectChanges method will be called
It will try to find all the entities Order, Lines and Products in its current graph
If it doesn't find them, they will be added to the "pending changes" as new records to be inserted
Then you continue and call SaveChanges and it will fail as the book tells you. Why? Imagine that the Products selected were:
Id: 1, "Macbook Pro"
Id: 2, "Office Chair"
When the DbContext looked at the entities and didn't know about them, it added them to the pending changes with a state of Added. When you call SaveChanges, it issues INSERT statements for these products based on their current state in the model. Since IDs 1 and 2 already exist in the database, the operation fails with a primary key violation.
That's why you have to call Attach (or AttachRange) in this case. This effectively tells EF that the entities exist in the DB, and it should not try to insert them again. They will be added to the context with a state of Unchanged. Attach is often used in these cases where you didn't load the entities from the dbContext before.
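A small sketch of the effect (the values are made up):

var product = new Product { ProductID = 1, Name = "Macbook Pro" }; // deserialized, not loaded by this context
context.Attach(product);
Console.WriteLine(context.Entry(product).State); // Unchanged - EF will not try to INSERT it
context.SaveChanges();                           // no SQL is issued for this product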
2nd question:
This is hard for me to assess because I don't know the context/model at that level, but here's my guess:
You don't need to do that with the CartLine because with every order you probably want to insert new order lines. Think of buying stuff at Amazon: you put the products in the cart, and that generates an Order, then the Order Lines that compose that order.
If you were then to update an existing order and add more items to it, you would run into the same issue. You would have to load the existing CartLines prior to saving them to the DB, or call Attach as you did here.
Hope that's a little clearer. I have answered a similar question where I gave more details, so maybe reading that helps too:
How does EF Core Modified Entity State behave?
Rather than deleting an entry from the database, I am planning on using a boolean column like isActive in every table and manage its true/false state.
Normally when you delete a record from the database,
referential integrity is maintained, which means you cannot delete it before deleting its dependencies.
when you query a deleted record, it returns null
How can I achieve the same results in an automated way using Entity Framework? Because manually checking the isActive field for every entity in every query seems like too much work, which will be error-prone. And the same holds true for marking the dependencies as isActive=false.
EDIT:
My purpose is not limited to point-in-time queries. Let me give an example. UserA posted a photo and UserB wrote a comment on it. Then UserB wanted to delete his account. But the comment has its poster FK pointing at UserB. So, rather than deleting UserB, I want to deactivate its account but keep the record in order not to break dependencies.
And I want to extend this logic to every table in the database. Is that wrong?
As kind of a side answer to this question: instead of querying all of the tables directly, why not use views and then query the views? You can place a filter in the view so that it only returns the IsActive = true records; that way you don't have to worry about including the condition manually in every query (something you mention is error-prone).
Because manually checking the isActive field for every entity in every query seems like too much work, which will be error-prone
It is error prone. But you may not always want only the active records (admin page?). You may also not want to soft delete ALL records, as not everything makes sense to keep around (in my experience). You could use an Expression to help you out / wire it up for certain methods / repositories and build dynamic queries.
Expression<Func<MyModel, bool>> IsActive = x => x.IsActive;
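That expression can then be reused wherever you build a query (context here being whatever DbContext the repository holds):

// same filter everywhere, without retyping the lambda
var activeRecords = context.Set<MyModel>().Where(IsActive).ToList();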
And the same holds true for marking the dependencies as isActive=false
A base repository could handle the delete for all your repositories, which would set the status to false (where the BaseModel would have an IsActive property).
public int Delete<TEntity>(long id) where TEntity : BaseModel
{
    using (var context = GetContext())
    {
        var dbEntity = context.Set<TEntity>().Find(id);
        if (dbEntity == null)
            return 0;               // nothing to soft-delete

        dbEntity.IsActive = false;  // soft delete instead of removing the row
        return context.SaveChanges();
    }
}
There is an OSS tool called EF Filters that can achieve what you are looking for: https://github.com/jbogard/EntityFramework.Filters
It lets you set global filters like an IsActive field and would certainly work for queries.
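(If EF Core is an option, the same idea is built in as a global query filter; a minimal sketch, not tied to that library, with Comment standing in for your entities:)

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // every query against Comments now gets "WHERE IsActive = 1" appended automatically
    modelBuilder.Entity<Comment>().HasQueryFilter(c => c.IsActive);
}

// opt out for a specific query (e.g. an admin page):
var everything = context.Comments.IgnoreQueryFilters().ToList();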
We've been using the Entity Framework code-first approach and the Fluent API, and have this requirement: an entity with multiple navigation properties and the possibility of numerous entries.
This entity reflects the data of a process and a field captures whether the entity is active in the process. I've provided an example for this.
public class ProcessEntity
{
//Other properties and Navigation properties
public bool IsInProcess { get; set; }
}
What I've been trying to do is have another table, maybe a mapping table or something, that will contain only the ProcessEntity items whose IsInProcess property is set to true, i.e., this table provides the ProcessEntities that are active in the process.
The whole idea behind this segregation is that a lot of queries and reports are generated only for the items that are still in process, and querying the whole table every time with a Where clause would be a performance bottleneck. Please correct me if I'm wrong.
I thought of having a mapping table but the entries have to be manually added and removed based on the condition.
Is there any other solution or alternative design ideas for this requirement?
Consider using an index.
Your second table is what an index would do.
Let the DB do its job.
Given that a boolean isn't a great differentiator, a date or similar as part of the index may also be useful.
e.g. How to create index in Entity Framework 6.2 with code first
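A hedged sketch of what that could look like with the fluent API in EF 6.2+ (the date column is an assumption, not from your model):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // index on the flag (plus a date) so "currently in process" queries don't scan the whole table
    modelBuilder.Entity<ProcessEntity>()
        .HasIndex(p => new { p.IsInProcess, p.StartedOn }); // StartedOn is a hypothetical date property
}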
We have an ASP.NET project with Entity Framework and SQL Azure.
A big part of our data only needs to be updated a few times a day, other data is very volatile.
The data that barely changes we cache in memory at startup, detach from the context, and then use mainly for reading, drastically lowering the number of database requests we have to make.
The volatile data is requested every time via a DbContext per HTTP request.
When we do an update to the cached data, we send a message to all instances to fetch a fresh version of all the data from the SQL server.
So far, so good.
Until we introduced a bug that linked one of these 'cached' objects to the 'volatile' data, and did a SaveChanges.
Well, that was quite a mess.
The whole data tree was added again and again on every update, corrupting the database with a lot of duplicated data.
As a complete hack I added a completely arbitrary column with a UniqueConstraint and some gibberish data on one of the root tables; hopefully failing the SaveChanges() next time we introduce such a bug because it will violate the Unique Constraint.
But it is of course hacky, and I'm still pretty scared ;P
Are there any better ways to prevent whole trees of cached objects from ending up in the database?
More information
Project is ASP.NET MVC
I cache this data because it is mainly read-only, and this saves a ton of extra database calls per HTTP request
This is in a high-traffic website, with a lot of personal customized views. Having the POCO data in memory works really well for what I want. Except for the problem I mentioned.
It is a bit more complicated, but a simplified version is that I cache the objects via a singleton, e.g.:
EntityCache.Instance.LolCats = new DbContext().LolCats.AsNoTracking().ToList();
This cache I dependency-inject into my controllers.
You can solve it like this:
1) Create an interface like this:
public interface IIsReadOnly
{
bool IsReadOnly { get; set; }
}
2) Implement this interface in all of the entities that can be cached. When you read and cache them, set the IsReadOnly property to true. This flag will be used when SaveChanges is invoked. Remember to decorate this property with the [NotMapped] attribute, or use any other means to make EF ignore it.
public class ACacheableEntitySample
: IIsReadOnly
{
[NotMapped]
public bool IsReadOnly { get; set; }
// define the "regular" entity properties
}
NOTE: you can include the property directly in the class definition (if using Code First), or use partial classes (for Db First, Model First, or Code First).
NOTE: alternatively you can make EF ignore the IsReadOnly property using the Fluent API, or even better a custom convention (EF 6+)
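For example, the Fluent API version would be along these lines:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // keep IsReadOnly out of the database mapping
    modelBuilder.Entity<ACacheableEntitySample>().Ignore(e => e.IsReadOnly);
}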
3) Override your inherited DbContext.SaveChanges method. In the overridden method, review all the entries with pending changes, and if they are read-only, change their state to Unchanged:
public override int SaveChanges()
{
    foreach (var entry in ChangeTracker.Entries())
    {
        var readOnly = entry.Entity as IIsReadOnly;   // is it a cacheable entity?
        if (readOnly != null && readOnly.IsReadOnly)  // and was it marked as read-only when caching?
        {
            // change the entry state to Unchanged, so that it's not inserted/updated/deleted
            entry.State = EntityState.Unchanged;
        }
    }
    return base.SaveChanges();
}
NOTE: This is sample code to explain what you need to do. In your final implementation you can do it with a simple LINQ statement that gets all the IIsReadOnly entities that have IsReadOnly set to true and sets their state to Unchanged.
You can use the IIsReadOnly entities in another DbContext and manipulate them in the usual way. For example, if you get one of these entities, update it, and call SaveChanges, the changes will be saved because IsReadOnly will have the default false value. But you'll easily avoid accidentally saving changes to cached data, simply by setting the IsReadOnly property to true when caching.
Original answer deleted because it was a waste of time.
Your post and proceeding comments are a perfect example of the XY Problem.
You say:
I really need a solution for the problem, not for the architecture
What if the architecture is the problem?
The problem you presented
A caching solution you implemented that violates at least a half dozen best practices has (surprise!) blown up in your face. You've managed to stop it from blowing up again via a spectacular (not in a good way) hack, but you want to know how to do it in a way that won't require such a spectacular hack.
The problem you had
You needed to cache some data because it was getting too expensive to hit the database for every request.
The answers that were offered
Use foreign keys instead of navigation properties
This is a perfectly valid answer and, surprise, a best practice. Navigation properties can change any time you regenerate the code in your Entity Data Model and are often ambiguous. With a bit of effort you could have used this and never had to worry about EF's handling of object relationships again.
Cache models instead of Entity objects
Another valid answer, and one that requires the least amount of actual work. MVC applications usually require some redundancy between viewmodels and entity objects and if you ever write a proper multi-tier application you'll practically drown in redundant objects. And nobody will accidentally add these objects to a DbContext ever again - because they can't.
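A hedged sketch of that, reusing the LolCats example from the question (the model's shape is invented):

public class LolCatModel   // plain model with only what the views need, no EF baggage
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// at cache-refresh time, project straight into models so no entity object ever leaves the context
EntityCache.Instance.LolCats = new DbContext().LolCats
    .Select(c => new LolCatModel { Id = c.Id, Name = c.Name })
    .ToList();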
Criticism
You have offered up very little useful information. From what I can tell your approach from the get-go was wrong.
Firstly, dumping whole tables into memory at App_Start is at best a temporary solution. If the table was too big to hit on every request, it's too big to hit on App_Start. What happens if something important breaks while people are using your application and you need to deploy a bug fix ASAP? What happens when your tables get really big and you start getting timeouts from EF while trying to dump them into memory? What happens if 95% of your users only really ever need 10% of that big table you've dumped into memory? Is the memory on your web/cache server going to be enough to accommodate the increasing size of your tables? For how long?
Secondly, no Entity object should remain anywhere after its originating DbContext is disposed. Entity objects behave in a convenient way while their DbContext is in scope and become troublesome POCOs when it's out of scope. I say troublesome because the 'magic' DbContext does with change tracking tends to fool people unfamiliar with the inner workings of EF into thinking that an Entity object is directly connected to a table row in the database. The problem you had illustrates this point perfectly.
Thirdly, it looks like you need to delete and re-dump a whole table to memory, even if you only update a single column in a single row. That's immensely wasteful to both the memory and CPU on your web server, and to your Azure SQL instance(s). What happens when a small bit of data comes in wrong and needs to be updated in a hurry? What if one of your nightly update jobs fails but you need fresh data in the morning?
You may not worry about any of this stuff now, but your solution blowing up in your face should have at the very least raised some red flags. I've had to deal with a lot of caching in projects I've worked on in the past few years, and everything I say here comes from experience.
Proposed solution - On-demand caching
If you've put a little effort into organizing your code, all of your CRUD operations on the database should be in specialized helper classes which I call repositories. Your controller calls its specialized repository (StuffController - StuffRepository), receives a model and binds that model to a view, kinda like this:
public class StuffController : Controller
{
private MyDbContext _db;
private StuffRepository _repo;
public StuffController()
{
_db = new MyDbContext();
_repo = new StuffRepository(_db);
}
// ...
public ActionResult Details(int id)
{
var model = _repo.ReadDetails(id);
// ...
return View(model);
}
protected override void Dispose(bool disposing)
{
_db.Dispose();
base.Dispose(disposing);
}
}
What on-demand caching would do is wrap that call to the repository in such a way that if the result of that method was already in the cache and it was not stale, it would return it from the cache. Otherwise it would hit the database.
Here's a simplified (and probably nonfunctional) example of a CacheWrapper class so you can understand what it does, using HttpRuntime.Cache:
public static class CacheWrapper
{
private static List<string> _keys = new List<string>();
public static List<string> Keys
{
get { lock(_keys) { return _keys.ToList(); } }
}
public static T Fetch<T>(string key, Func<T> dlgt, bool refresh = false) where T : class
{
    var result = HttpRuntime.Cache.Get(key) as T;
    if(result != null && !refresh) return result;

    lock(HttpRuntime.Cache)
    {
        // re-check inside the lock so only one thread ends up hitting the database
        result = HttpRuntime.Cache.Get(key) as T;
        if(result != null && !refresh) return result;

        result = dlgt();
        HttpRuntime.Cache.Add(key, result, /* some other params */);

        lock(_keys)
        {
            if(!_keys.Contains(key)) _keys.Add(key);
        }
    }
    return result;
}
}
And the new way to call things from the controller:
public ActionResult Details(int id)
{
var model = CacheWrapper.Fetch("StuffDetails_" + id, () => _repo.ReadDetails(id));
// ...
return View(model);
}
A slightly more complex version of this is in production on a public web application as we speak and working quite well.
I want to learn how others cope with the following scenario.
This is not homework or an assignment of any kind. The example classes have been created to better illustrate my question; however, it does reflect a real-life scenario on which we would like feedback.
We retrieve all data from the database and place it into an object. An object represents a single record, and if multiple records exist in the database, we place the data into a List<> of the record objects.
Let's say we have the following classes:
public class Employee
{
public bool _Modified;
public string _FirstName;
public string _LastName;
public List<Employee_Address> _Address;
}
public class Employee_Address
{
public bool _Modified;
public string _Address;
public string _City;
public string _State;
}
Please note that the getters and setters have been omitted from the classes for the sake of clarity. Before any code police accuse me of not using them, please note that they have been left out for this example only.
The database has a table for Employees and another for Employee Addresses.
Conceptually, what we do is to create a List object that represents the data in the database tables. We do a deep clone of this object which we then bind to controls on the front end. We then have two objects (Orig and Final) representing data from the database.
The user then makes changes to the "Final" object by creating, modifying, deleting records. We then want to persist these changes to the database.
Obviously we want to be as elegant as possible, only editing, creating, deleting those records that require it.
We ultimately want to compare the two List objects so that we can;
See what properties have changed so that the changes can be persisted to the database.
See what properties (records) no longer exist in the second List<> so that these records can be deleted from the database.
See what new properties exist in the new List<> so that we can create these in the database.
Who wants to get the ball rolling on how we can best achieve this? Keep in mind that we also need to drill down into the Employee_Address list to check for any changes, not just the top-level properties.
I hope I have made myself clear and look forward to any suggestions.
Add a nullable ObjectID field to your layer's base type. Pass it to the front end and back to see whether a particular instance is already persisted in the database.
It also has many other uses even if you don't have any kind of Identity Map.
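A minimal sketch of the idea (the names are illustrative):

public abstract class EntityBase
{
    // null = never saved; a value = the record exists in the database
    public int? ObjectID { get; set; }
}

// at save time
if (employee.ObjectID == null)
    InsertEmployee(employee);   // hypothetical insert helper
else
    UpdateEmployee(employee);   // hypothetical update helper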
I would do exactly the same thing .NET does in its Data classes, that is, keep the record state (System.Data.DataRowState comes to mind) and all associated versions together in one object.
This way:
You can tell at a glance whether it has been modified, inserted, deleted, or is still the original record.
You can quickly find what has been changed by querying the new vs old versions, without having to dig in another collection to find the old version.
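A rough sketch of that shape, borrowing the Employee class from the question and reusing System.Data.DataRowState directly:

using System.Data;

public class TrackedRecord<T>
{
    public DataRowState State;  // Unchanged, Added, Modified or Deleted
    public T Current;           // the version the user edits
    public T Original;          // snapshot taken at load time, kept for comparison
}

// e.g. a List<TrackedRecord<Employee>> instead of two parallel List<Employee> copies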
You should investigate the use of the Identity Map pattern. Coupled with Unit of Work, this allows you to maintain an object "cache" of sorts from which you can check which objects need saving to the database, and when reading, to return objects from the identity map rather than creating new objects and returning those.
Why would you want to compare two list objects? You will potentially be using a lot of memory for what is essentially duplicate data.
I'd suggest having a status property for each object that can tell you if that particular object is New, Deleted, or Changed. If you want to go further than making the property an enum, you can make it an object that contains some sort of Dictionary of the changes to apply, though that will most likely apply only in the case of the Changed status.
After you've added such a property, it should be easy to go through your list, add the New objects, remove the Deleted objects etc.
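The save pass then becomes a simple switch over that status (the enum and repository calls here are placeholders):

public enum ObjectStatus { Unchanged, New, Changed, Deleted }

foreach (var employee in employees)
{
    switch (employee.Status)                // the suggested status property on each object
    {
        case ObjectStatus.New:     repository.Insert(employee); break;  // hypothetical insert
        case ObjectStatus.Changed: repository.Update(employee); break;  // hypothetical update
        case ObjectStatus.Deleted: repository.Delete(employee); break;  // hypothetical delete
    }
}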
You may want to check how the Entity Framework does this sort of thing as well.