How to implement GetHashCode for NHibernate entity with identity column?

I came to a conclusion that it is impossible to properly implement GetHashCode() for an NHibernate entity with an identity column. The only working solution I found is to return a constant. See below for explanation.
This, obviously, is terrible: all dictionary searches effectively become linear. Am I wrong? Is there a workaround I missed?
Explanation
Let's suppose we have an Order entity that refers to one or more Product entities like this:
class Product
{
public virtual int Id { get; set; } // auto; assigned by the database upon insertion
public virtual string Name { get; set; }
public virtual Order Order { get; set; } // foreign key into the Orders table
}
"Id" is what is called an IDENTITY column in SQL Server terms: an integer key that is automatically generated by the database when the record is inserted.
Now, what options do I have for implementing Product.GetHashCode()? I can base it on
Id value
Name value
Identity of the product object (default behavior)
None of these ideas works. If I base my hash code on Id, it will change when the object is inserted into the database. The following was experimentally shown to break, at least in the presence of NHibernate.SetForNet4:
/* add product to order */
var product = new Product { Name = "Sushi" }; // Id is zero
order.Products.Add(product); // GetHashCode() is calculated based on Id of zero
session.SaveOrUpdate(order);
// product.Id is now changed to an automatically generated value from DB
// product.GetHashCode() value changes accordingly
// order.Products collection does not like it; it assumes GetHashCode() does not change
bool isAdded = order.Products.Contains(product);
// isAdded is false;
// the collection is looking up the product by its new hash code and not finding it
Basing GetHashCode() on the object identity (i.e. leaving Product with the default implementation) does not work well either; this was covered on Stack Overflow before. Basing GetHashCode() on Name is obviously not a good idea if Name is mutable.
So, what is left? The only thing that worked for me was
class Product
{
...
public override int GetHashCode() { return 42; }
}
Thanks for reading through this long question.
Do you have any ideas on how to make it better?
PS. Please keep in mind that this is an NHibernate question, not a collections question. The collection type and the order of operations are not arbitrary. They are tied to the way NHibernate works. For instance, I cannot simply change Order.Products to something like IList. That would have important implications such as requiring an index/order column, etc.

I would base the hashcode (and equality, obviously) on the Id, that's the right thing to do. Your problem stems from the fact that you modify Id while the object is in the Dictionary. Objects should be immutable in terms of hashcode and equality while they are inside a dictionary or hashset.
You have two options -
Don't populate dictionaries or hashsets before storing items in DB
Before saving an object to the DB, remove it from the dictionaries. Save it to the DB and then add it again to the dictionary.
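The failure and the second option can be reproduced without NHibernate at all. The following is a minimal sketch that uses a plain HashSet<Product> as a stand-in for the mapped collection; the simplified Product class, the HashCodeDemo helper, and the Id value 7 are assumptions made for illustration, and assigning Id by hand merely simulates the database generating the identity value.

```csharp
using System;
using System.Collections.Generic;

// Simplified, hypothetical version of the entity: equality and hash code use Id only.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Product;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class HashCodeDemo
{
    // Changing Id while the object is in the set leaves it filed under the
    // old hash code, so a lookup with the new hash code misses it.
    public static bool ContainsAfterMutation()
    {
        var product = new Product { Name = "Sushi" };      // Id is 0
        var products = new HashSet<Product> { product };
        product.Id = 7;                                     // simulates the identity value arriving from the DB
        return products.Contains(product);
    }

    // Second option from the answer: remove, save, then add again.
    public static bool ContainsAfterRemoveAndReAdd()
    {
        var product = new Product { Name = "Sushi" };
        var products = new HashSet<Product> { product };
        products.Remove(product);   // remove while the hash code is still the old one
        product.Id = 7;             // simulates the save assigning the identity value
        products.Add(product);      // re-add under the new hash code
        return products.Contains(product);
    }
}
```

Here ContainsAfterMutation() returns false and ContainsAfterRemoveAndReAdd() returns true. Note that the removal has to happen before the Id changes; afterwards Remove fails for the same reason Contains does.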
Update
The problem can also be solved by using other mappings:
You can use a bag mapping - it will be mapped to an IList and should work OK for you. No need to use HashSets or Dictionaries.
If the DB schema is under your control, you may wish to consider adding an index column and making the relation ordered. This will again be mapped to an IList but will have a List mapping.
There are differences in performance, depending on your mappings and scenarios (see http://nhibernate.info/doc/nh/en/#performance-collections-mostefficientupdate)

Related

Is there a faster way to search for the existence of an instance within a large collection than using the Contains method?

I have a C# console application that saves data to 2 db tables, an entity table and a Relation table. Each entity has Many-To-Many relationships with other entities. The Relations table stores a pair of IDs, which in turn are the primary key of the entity table.
Data in both tables should be unique. Initially I checked for this in a database stored procedure, prior to INSERTing new individual records. When the numbers started getting larger in both tables (>50k in the entity table and >100k in the Relations table) I noticed that the performance really began to suffer.
I figured that going to the db to carry out checks for duplicate records was not helping performance due to added I/O costs, so I refactored my code to first read both tables into memory and then carry out the checks there instead. This has increased performance, although I suspect that it may still not be ideal. Here's how it looks now:
private IEnumerable<long> _existingUsers = dao.GetUserIds();
private IEnumerable<Relations> _existingRelations = dao.GetRelations();
if (!_existingUsers.Contains(inputModel.ID))
{
// db code to create the new Entity record
}
Relations rel = new Relations { Node = inputModel.Node, Follower = inputModel.ID };
if (!_existingRelations.Contains(rel))
{
// db code to create the new Relation entry
}
Relations class:
public class Relations : IEquatable<Relations>
{
public long Node { get; set; }
public long Follower { get; set; }
public bool Equals(Relations other)
{
return (other.Node == this.Node) && (other.Follower == this.Follower);
}
}
I can see via the Debugger that the majority of the time is now spent determining if the _existingRelations collection in memory Contains the "rel" instance. That in turn repeatedly hits the Equals method of the Relations class.
I suspect there may be a more efficient way to do this but I don't know what that is.

This depends on the concrete implementation of the IEnumerable.
If the collection behaves like a list, Contains has to iterate over the elements until it finds a match, so there is no faster way to find one in a list.
If you call Contains on a HashSet instead (https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1.contains?view=netcore-3.1), the lookup is O(1), as is the case for dictionaries. For that to work, the element type needs a GetHashCode() that is consistent with its Equals().
On the downside the hashset is not ordered.
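As a rough sketch of what that could look like for the Relations class from the question: the class gains Equals(object) and GetHashCode() overrides, and the existing rows are loaded into a HashSet once. The DuplicateCheck helper and its method names are made up for this example, and the multiplier 397 is just a conventional choice for mixing two hash codes.

```csharp
using System;
using System.Collections.Generic;

public class Relations : IEquatable<Relations>
{
    public long Node { get; set; }
    public long Follower { get; set; }

    public bool Equals(Relations other)
    {
        return other != null && other.Node == Node && other.Follower == Follower;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Relations);
    }

    // Must agree with Equals: equal pairs produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (Node.GetHashCode() * 397) ^ Follower.GetHashCode();
        }
    }
}

// Hypothetical helper, not part of the original code.
public static class DuplicateCheck
{
    // Build the in-memory index once from the rows read from the database.
    public static HashSet<Relations> BuildIndex(IEnumerable<Relations> existing)
    {
        return new HashSet<Relations>(existing);
    }

    // Returns true if the pair was not known yet (and records it),
    // false if it is a duplicate. HashSet.Add does both in one hashed lookup.
    public static bool TryRegister(HashSet<Relations> index, long node, long follower)
    {
        return index.Add(new Relations { Node = node, Follower = follower });
    }
}
```

The same idea applies to the user IDs: a HashSet<long> built from dao.GetUserIds() gives hashed lookups without any custom code, since long already has a suitable hash code.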

DDD with EF: Collection of Value objects

In a studied domain, a Car may have many Tires and according to DDD concepts, Car is an aggregate root while Tire is a Value Object.
Consider the following model:
class Car
{
public int Id {get;set;}
public virtual ICollection<Tire> Tires {get;set;}
}
[ComplexType]
class Tire
{
public string Manufacturer {get;set;}
public int Diameter {get;set;}
}
I'm afraid EF 6.0 + cannot implement this model. Am I right? Any way to implement Collection of Value Objects?

Complex types, according to MSDN documentation, cannot participate in associations and cannot contain navigation properties so this is not the right way.
With EF the only way is to have 2 tables (with an Id on the Tires table). You can also hide Id of the Tires table, you can insert a unique index on the foreign key of the Cars table but when you check if two tires are equal you need to check if both properties are equal.

This is a common issue with normalized persistence (which includes the SQL server which you access through EF). Entity framework makes it more difficult by not allowing you to have a protected key though.
One way is to have a Tires table which has an id, which will form part of a foreign key relationship with car. However, the idea of a unique key violates the fact that value objects should not rely on ids and should be compared by value. Being diligent, overriding Equals, and only comparing by value will allow you to work with this solution; it does not matter if the actual objects are different as long as the equality comparison would return true if two tires matched. It is not pretty, I agree, but with EF it seems to be the only solution. If I am wrong, please someone correct me. If you go this route, remember to map your domain data to your DTOs in a way that the Id is removed. That way you keep the fact that tires have an id isolated.
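A minimal sketch of that approach, assuming Tire becomes a regular class with a surrogate Id that exists only for persistence; the Id property and the choice of ordinal string comparison are assumptions of this example, not something the question prescribes.

```csharp
using System;

public class Tire
{
    // Surrogate key needed by the Tires table; deliberately ignored by equality.
    public int Id { get; set; }

    public string Manufacturer { get; set; }
    public int Diameter { get; set; }

    // Value semantics: two tires are equal when all value properties match.
    public override bool Equals(object obj)
    {
        var other = obj as Tire;
        return other != null
            && string.Equals(Manufacturer, other.Manufacturer, StringComparison.Ordinal)
            && Diameter == other.Diameter;
    }

    // Must be derived from the same properties as Equals, never from Id.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Manufacturer == null ? 0 : Manufacturer.GetHashCode();
            return (hash * 397) ^ Diameter;
        }
    }
}
```

With this in place, two Tire rows with different Id values but the same Manufacturer and Diameter compare as equal, which is the behavior a value object is expected to have.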
Another solution is to serialize the tires using Json before being sent to SQL db (and then deserialize it back on read), but that is not something that I would personally suggest if you need to query on information in the tires (for example give me all cars that use this kind of tyre).
PS: Vaughn Vernon discusses this particular issue with Hibernate for Java in this book: http://www.amazon.co.uk/Implementing-Domain-Driven-Design-Vaughn-Vernon/dp/0321834577 Please read, it will solve a lot of the issues or questions you may have on the subject.

Entity framework 6 code first: what is the best implementation for a baseobject with 10 childobjects

We have a baseobject with 10 childobjects and EF6 code first.
Of those 10 childobjects, 5 have only a few (extra) properties, and 5 have multiple properties (5 to 20).
We implemented this as table-per-type, so we have one table for the base and 1 per child (total 10).
This, however, creates HUGE select queries with select case and unions all over the place, which also takes the EF 6 seconds to generate (the first time).
I read about this issue, and that the same issue holds in the table-per-concrete type scenario.
So what we are left with is table-per-hierarchy, but that creates a table with a large number of properties, which doesn't sound great either.
Is there another solution for this?
I thought about maybe skipping the inheritance and creating a union view for when I want to get all the items from all the child objects/records.
Any other thoughts?

Another solution would be to implement some kind of CQRS pattern where you have separate databases for writing (command) and reading (query). You could even de-normalize the data in the read database so it is very fast.
Assuming you need at least one normalized model with referential integrity, I think your decision really comes down to Table per Hierarchy and Table per Type. TPH is reported by Alex James from the EF team and more recently on Microsoft's Data Development site to have better performance.
Advantages of TPT and why they're not as important as performance:
Greater flexibility, which means the ability to add types without affecting any existing table. Not too much of a concern because EF migrations make it trivial to generate the required SQL to update existing databases without affecting data.
Database validation on account of having fewer nullable fields. Not a massive concern because EF validates data according to the application model. If data is being added by other means it is not too difficult to run a background script to validate data. Also, TPT and TPC are actually worse for validation when it comes to primary keys because two sub-class tables could potentially contain the same primary key. You are left with the problem of validation by other means.
Storage space is reduced on account of not needing to store all the null fields. This is only a very trivial concern, especially if the DBMS has a good strategy for handling 'sparse' columns.
Design and gut-feel. Having one very large table does feel a bit wrong, but that is probably because most db designers have spent many hours normalizing data and drawing ERDs. Having one large table seems to go against the basic principles of database design. This is probably the biggest barrier to TPH. See this article for a particularly impassioned argument.
That article summarizes the core argument against TPH as:
It's not normalized even in a trivial sense, it makes it impossible to enforce integrity on the data, and what's most "awesome:" it is virtually guaranteed to perform badly at a large scale for any non-trivial set of data.
These are mostly wrong. Performance and integrity are mentioned above, and TPH does not necessarily mean denormalized. There are just many (nullable) foreign key columns that are self-referential. So we can go on designing and normalizing the data exactly as we would with a TPT. In a current database I have many relationships between sub-types and have created an ERD as if it were a TPT inheritance structure. This actually reflects the implementation in code-first Entity Framework. For example here is my Expenditure class, which inherits from Relationship which inherits from Content:
public class Expenditure : Relationship
{
/// <summary>
/// Inherits from Content: Id, Handle, Description, Parent (is context of expenditure and usually
/// a Project)
/// Inherits from Relationship: Source (the Principal), SourceId, Target (the Supplier), TargetId,
///
/// </summary>
[Required, InverseProperty("Expenditures"), ForeignKey("ProductId")]
public Product Product { get; set; }
public Guid ProductId { get; set; }
public string Unit { get; set; }
public double Qty { get; set; }
public string Currency { get; set; }
public double TotalCost { get; set; }
}
The InversePropertyAttribute and the ForeignKeyAttribute provide EF with the information required to make the required self joins in the single database.
The Product type also maps to the same table (also inheriting from Content). Each Product has its own row in the table and rows that contain Expenditures will include data in the ProductId column, which is null for rows containing all other types. So the data is normalized, just placed in a single table.
The beauty of using EF code first is we design the database in exactly the same way and we implement it in (almost) exactly the same way regardless of using TPH or TPT. To change the implementation from TPH to TPT we simply need to add an annotation to each sub-class, mapping them to new tables. So, the good news for you is it doesn't really matter which one you choose. Just build it, generate a stack of test data, test it, change strategy, test it again. I reckon you'll find TPH the winner.
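As a rough illustration of that last step: in EF6 code first, a derived class without a table mapping shares the base table (TPH), while giving it its own table with the Table attribute switches it to TPT. The classes below are cut-down, hypothetical versions of the ones described above, and the table name "Expenditures" is an assumption; the attribute itself lives in System.ComponentModel.DataAnnotations.Schema.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;

// Cut-down base class; the real Content type has more members.
public abstract class Content
{
    public Guid Id { get; set; }
    public string Handle { get; set; }
    public string Description { get; set; }
}

// No table mapping: under EF6 conventions this type is stored in the
// same table as Content, with a discriminator column (TPH).
public class Product : Content
{
}

// With a table mapping: EF6 stores the extra columns of this type in
// their own table, joined to the base table by key (TPT).
// The table name is an assumption for this sketch.
[Table("Expenditures")]
public class Expenditure : Content
{
    public string Unit { get; set; }
    public double Qty { get; set; }
    public string Currency { get; set; }
    public double TotalCost { get; set; }
}
```

Removing the attribute again returns Expenditure to the shared table, which is what makes it cheap to test both strategies against the same model.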

Having experienced similar problems myself, I have a few suggestions. I'm also open to improvements on these suggestions, as it's a complex topic and I don't have it all worked out.
Entity framework can be very slow when dealing with non-trivial queries on complex entities - ie those with multiple levels of child collections. In some performance tests I've tried it does sit there an awful long time compiling the query. In theory EF 5 and onwards should cache compiled queries (even if the context gets disposed and re-instantiated) without you having to do anything, but I'm not convinced that this is always the case.
I've read some suggestions that you should create multiple DataContexts with only smaller subsets of your database entities for a complex database. If this is practical for you give it a try! But I imagine there would be maintenance issues with this approach.
1) I know this is obvious but worth saying anyway - make sure you have the right foreign keys set up in your database for related entities, as then Entity Framework will keep track of these relationships and be much quicker generating queries where you need to join using the foreign key.
2) Don't retrieve more than you need. One-size fits all methods to get a complex object are rarely optimal. Say you are getting a list of base objects (to put in a list) and you only need to display the name and ID of these objects in the list of the base object. Just retrieve only the base object - any navigation properties that aren't specifically needed should not be retrieved.
3) If the child objects are not collections, or they are collections but you only need 1 item (or an aggregate value such as the count) from them I would absolutely implement a View in the database and query that instead. It is MUCH quicker. EF doesn't have to do any work - its all done in the database, which is better equipped for this type of operation.
4) Be careful with .Include() and this goes back to point #2 above. If you are getting a single object + a child collection property you are best not using .Include() as then when the child collection is retrieved this will be done as a separate query. (so not getting all the base object columns for every row in the child collection)
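Point 2 can be sketched as a projection: select only the columns the list needs instead of materializing whole entities. The BaseItem, ChildItem, and ListRow types and the ItemQueries helper are invented for this example. The method takes an IQueryable so it can be tried against an in-memory list; the assumption is that in a real application you would pass the context's set, so that the Select becomes part of the generated SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with more data than a list view needs.
public class BaseItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string LongDescription { get; set; }
    public List<ChildItem> Children { get; set; }
}

public class ChildItem
{
    public int Id { get; set; }
}

// Small shape holding only what the list displays.
public class ListRow
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ItemQueries
{
    // Projects to Id and Name only; no navigation properties are touched.
    public static List<ListRow> GetListRows(IQueryable<BaseItem> items)
    {
        return items
            .OrderBy(i => i.Name)
            .Select(i => new ListRow { Id = i.Id, Name = i.Name })
            .ToList();
    }
}
```

For a quick check without a database, call ItemQueries.GetListRows(someList.AsQueryable()).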
EDIT
Following comments here's some further thoughts.
As we are dealing with an inheritance hierarchy it makes logical sense to store separate tables for the additional properties of the inheriting classes + a table for the base class. How to make Entity Framework perform well with that, though, is still up for debate.
I've used EF for a similar scenario (but fewer children), (Database first), but in this case I didn't use the actual Entity framework generated classes as the business objects. The EF objects directly related to the DB tables.
I created separate business classes for the base and inheriting classes, and a set of Mappers that would convert to them. A query would look something like
public static List<BaseClass> GetAllItems()
{
using (var db = new MyDbEntities())
{
var q1 = db.InheritedClass1.Include("BaseClass").ToList()
.ConvertAll(x => (BaseClass)InheritedClass1Mapper.MapFromContext(x));
var q2 = db.InheritedClass2.Include("BaseClass").ToList()
.ConvertAll(x => (BaseClass)InheritedClass2Mapper.MapFromContext(x));
return q1.Union(q2).ToList();
}
}
Not saying this is the best approach, but it might be a starting point?
The queries are certainly quick to compile in this case!
Comments welcome!

With Table per Hierarchy you end up with only one table, so obviously your CRUD operations will be faster and this table is abstracted out by your domain layer anyway. The disadvantage is that you lose the ability to use NOT NULL constraints, so this needs to be handled properly by your business layer to avoid potential data integrity problems. Also, adding or removing entities means that the table changes; but that's also something that is manageable.
With Table per type you have the problem that the more classes in the hierarchy you have, the slower your CRUD operations will become.
All in all, as performance is probably the most important consideration here and you have a lot of classes, I think Table per Hierarchy is a winner in terms of both performance and simplicity and taking into account your number of classes.
Also look at this article, more specifically at chapter 7.1.1 (Avoiding TPT in Model First or Code First applications), where they state: "when creating an application using Model First or Code First, you should avoid TPT inheritance for performance concerns."

The EF6 CodeFirst model I'm working on uses generics and an abstract base class called "BaseEntity". I also use generics and a base class for the EntityTypeConfiguration class.
In the event that I need to reuse a couple of properties "columns" on some tables and it doesn't make sense for them to be on BaseEntity or BaseEntityWithMetaData, I make an interface for them.
E.g. I have one for addresses I haven't finished yet. So if an entity has address information it will implement IAddressInfo. Casting an entity to IAddressInfo will give me an object with just the AddressInfo on it.
Originally I had my metadata columns as their own table. But like others have mentioned, the queries were horrendous, and it was slower than slow. So I thought, why don't I just use multiple inheritance paths to support what I want to do, so the columns are on every table that needs them, and not on the ones that don't. Also I am using MySQL, which has a column limit of 4096. SQL Server 2008 has 1024. Even at 1024, I don't see realistic scenarios for going over that on one table.
And none of my objects inherit in such a way that they have columns they don't need. When that need arises I create a new base class at a level to prevent the extra columns.
Here are enough snippets from my code to understand how I have my inheritance set up. So far it works really well for me. I haven't really produced a scenario I couldn't model with this setup.
public class BaseEntityConfig<T> : EntityTypeConfiguration<T> where T : BaseEntity<T>, new()
{
}
public class BaseEntity<T> where T : BaseEntity<T>, new()
{
//shared properties here
}
public class BaseEntityWithMetaDataConfig<T> : BaseEntityConfig<T> where T: BaseEntityWithMetaData<T>, new()
{
public BaseEntityWithMetaDataConfig()
{
this.HasOptional(e => e.RecCreatedBy).WithMany().HasForeignKey(p => p.RecCreatedByUserId);
this.HasOptional(e => e.RecLastModifiedBy).WithMany().HasForeignKey(p => p.RecLastModifiedByUserId);
}
}
public class BaseEntityWithMetaData<T> : BaseEntity<T> where T: BaseEntityWithMetaData<T>, new()
{
#region Entity Properties
public DateTime? DateRecCreated { get; set; }
public DateTime? DateRecModified { get; set; }
public long? RecCreatedByUserId { get; set; }
public virtual User RecCreatedBy { get; set; }
public virtual User RecLastModifiedBy { get; set; }
public long? RecLastModifiedByUserId { get; set; }
public DateTime? RecDateDeleted { get; set; }
#endregion
}
public class PersonConfig : BaseEntityWithMetaDataConfig<Person>
{
public PersonConfig()
{
this.ToTable("people");
this.HasKey(e => e.PersonId);
this.HasOptional(e => e.User).WithRequired(p => p.Person).WillCascadeOnDelete(true);
this.HasOptional(p => p.Employee).WithRequired(p => p.Person).WillCascadeOnDelete(true);
this.HasMany(e => e.EmailAddresses).WithRequired(p => p.Person).WillCascadeOnDelete(true);
this.Property(e => e.FirstName).IsRequired().HasMaxLength(128);
this.Property(e => e.MiddleName).IsOptional().HasMaxLength(128);
this.Property(e => e.LastName).IsRequired().HasMaxLength(128);
}
}
//I have to use this pattern to allow other classes to inherit from Person; they have to inherit from BasePerson<T>
public class Person : BasePerson<Person>
{
//Just a dummy class to expose BasePerson as it is.
}
public class BasePerson<T> : BaseEntityWithMetaData<T> where T: BasePerson<T>, new()
{
#region Entity Properties
public long PersonId { get; set; }
public virtual User User { get; set; }
public string FirstName { get; set; }
public string MiddleName { get; set; }
public string LastName { get; set; }
public virtual Employee Employee { get; set; }
public virtual ICollection<PersonEmail> EmailAddresses { get; set; }
#endregion
#region Entity Helper Properties
[NotMapped]
public PersonEmail PrimaryPersonalEmail
{
get
{
PersonEmail ret = null;
if (this.EmailAddresses != null)
ret = (from e in this.EmailAddresses where e.EmailAddressType == EmailAddressType.Personal_Primary select e).FirstOrDefault();
return ret;
}
}
[NotMapped]
public PersonEmail PrimaryWorkEmail
{
get
{
PersonEmail ret = null;
if (this.EmailAddresses != null)
ret = (from e in this.EmailAddresses where e.EmailAddressType == EmailAddressType.Work_Primary select e).FirstOrDefault();
return ret;
}
}
private string _DefaultEmailAddress = null;
[NotMapped]
public string DefaultEmailAddress
{
get
{
if (string.IsNullOrEmpty(_DefaultEmailAddress))
{
PersonEmail personalEmail = this.PrimaryPersonalEmail;
if (personalEmail != null && !string.IsNullOrEmpty(personalEmail.EmailAddress))
_DefaultEmailAddress = personalEmail.EmailAddress;
else
{
PersonEmail workEmail = this.PrimaryWorkEmail;
if (workEmail != null && !string.IsNullOrEmpty(workEmail.EmailAddress))
_DefaultEmailAddress = workEmail.EmailAddress;
}
}
return _DefaultEmailAddress;
}
}
#endregion
#region Constructor
static BasePerson()
{
}
public BasePerson()
{
this.User = null;
this.EmailAddresses = new HashSet<PersonEmail>();
}
public BasePerson(string firstName, string lastName)
{
this.FirstName = firstName;
this.LastName = lastName;
}
#endregion
}
Now, the code in the context's OnModelCreating looks like this:
//Config
modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();
//Initialize configuration; each line is responsible for telling Entity Framework how to create relationships between the different tables in the database,
//such as table names, foreign key constraints, unique constraints, all relations etc.
modelBuilder.Configurations.Add(new PersonConfig());
modelBuilder.Configurations.Add(new PersonEmailConfig());
modelBuilder.Configurations.Add(new UserConfig());
modelBuilder.Configurations.Add(new LoginSessionConfig());
modelBuilder.Configurations.Add(new AccountConfig());
modelBuilder.Configurations.Add(new EmployeeConfig());
modelBuilder.Configurations.Add(new ContactConfig());
modelBuilder.Configurations.Add(new ConfigEntryCategoryConfig());
modelBuilder.Configurations.Add(new ConfigEntryConfig());
modelBuilder.Configurations.Add(new SecurityQuestionConfig());
modelBuilder.Configurations.Add(new SecurityQuestionAnswerConfig());
The reason I created base classes for the configuration of my entities was that when I started down this path I ran into an annoying problem. I had to configure the shared properties for every derived class over and over again. And if I updated one of the fluent API mappings, I had to update code in every derived class.
But by using this inheritance method on the configuration classes, the two properties are configured in one place and inherited by the configuration class for derived entities.
So when PersonConfig is configured, it runs the logic in the BaseEntityWithMetaDataConfig class to configure the two properties, and again when UserConfig runs, etc.

Three different approaches have different names in M. Fowler's language:
Single Table inheritance - whole inheritance hierarchy held in one table. No joins, optional columns for child types. You need to distinguish which child type it is.
Concrete Table inheritance - you have one table for each concrete type. Joins, no optional columns. In this case, base type table is needed only if the base type requires to have its own mapping (instance can be created).
Class Table inheritance - you have base type table, and child tables - each adding only additional columns to the base's columns. Joins, no optional columns. In this case, base type table always contains row for each child; however, you can retrieve common columns only if no child-specific columns are needed (rest comes with lazy loading maybe?).
All approaches are workable - it only depends on the amount and structure of data you have, so you can measure performance differences first.
Choice will be based on the number of joins vs. data distribution vs. optional columns.
If you don't have (and are not going to have) many child types, I would go with class table inheritance, since that stays close to the domain and will be easy to translate/map.
If you have many child tables to work with at the same time, and anticipate bottleneck in joins - go with single table inheritance.
If joins are not needed at all and you are going to work with one concrete type at a time - go with concrete table inheritance.

Although Table per Hierarchy (TPH) is a better approach for fast CRUD operations, in that case it is impossible to avoid a single table with very many columns in the generated database. The case and union clauses that you mentioned are created because the resulting query is effectively requesting a polymorphic result set that includes multiple types.
However, when EF returns a flattened table that includes the data for all the types, it does extra work to ensure that null values are returned for columns that may be irrelevant for a particular type. Technically, this extra validation using case and union is not necessary.
The issue below is a performance glitch in Microsoft EF6, and they are aiming to deliver a fix in a future release.
The below query:
SELECT
[Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Name] AS [Name],
[Extent1].[Address] AS [Address],
[Extent1].[City] AS [City],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[State] END AS [C2],
CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))) AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)) THEN [UnionAll1].[Zip] END AS [C3],
FROM [dbo].[Customers] AS [Extent1]
can be safely replaced by:
SELECT
[Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Name] AS [Name],
[Extent1].[Address] AS [Address],
[Extent1].[City] AS [City],
[UnionAll1].[State] AS [C2],
[UnionAll1].[Zip] AS [C3],
FROM [dbo].[Customers] AS [Extent1]
So that is the problem, and a flaw of the current Entity Framework 6 release. Your options are to use either a Model First approach or a TPH approach.

DbContext corrupts attached entities: why?

I've got some code like this:
Activity[] GetAllActivities() {
using (ScheduleContext ctx = new ScheduleContext())
return ctx.Activities.AsNoTracking().ToArray();
}
The aim is to have a very simple in-memory cache of some data: Activities is mapped to a db view which summarizes everything I need.
If I omit AsNoTracking the returned objects are non-deterministically corrupted: properties on the returned objects aren't set correctly, and frequently one object's property value is duplicated in other objects' properties. There's no warning or exception; neither on EF4.3.1 nor EF5rc2. Both CLR 4 and the 4.5 release candidate exhibit the same behavior.
The Activity objects are very simple; consisting solely of non-virtual properties of basic type (int, string, etc.) and have no key nor a relationship with any other entity.
Is this expected behavior? Where can I find documentation about this?
I understand that obviously change tracking cannot work once the relevant DbContext is gone, but I'm surprised the materialized properties are corrupted without warning. I'm mostly worried that I'll forget AsNoTracking somewhere in a more complex scenario and get somewhat plausible but wrong results.
Edit: The entity looks as follows. Thanks Jonathan & Kristof; there is indeed a column that is inferred as the ID!
public class Activity
{
public string ActivityHostKey { get; set; }
public int ActivityDuration { get; set; }
public int ActivityLastedChanged { get; set; }
public string ActivityId { get; set; }//!!!
public string ModuleHostKey { get; set; }
public string ModuleName { get; set; }
...

I think "frequently one object's property value is duplicated in other objects' properties" and that the Activity objects "and have no key" are the key pieces of information here (no pun intended).
When importing a View (which obviously doesn't have a primary key), EF guesses at what the primary key is. If tracking is then enabled, it uses that primary key to make sure only a single copy of each entity is created in memory. This means if you load two rows with the same values for the field EF guessed was the PK, the values for the second row will overwrite the first.
As for the data being "non-deterministically corrupted", that's probably because the database doesn't guarantee the order the rows are returned in, and it's a "last-in-wins" process in EF, so if the order of the records changes from the DB, the record that gets to keep its values changes too.
Try marking more columns as part of the primary key, or modifying the view (or the DefiningQuery in the EDMX) to contain a column based on the ROW_NUMBER function so you can use it as the primary key.

Generics and database - a design issue

The situation is that I have a table that models an entity. This entity has a number of properties (each identified by a column in the table). The thing is that in the future I'd need to add new properties or remove some properties. The problem is how to model both the database and the corresponding code (using C#) so that when such an occasion appears it would be very easy to just "have" a new property.
In the beginning there was only one property so I had one column. I defined the corresponding property in the class, with the appropriate type and name, then created stored procedures to read it and update it. Then came the second property, quickly copy-pasted, changed name and type and a bit of SQL and there it was. Obviously this is not a suitable model going forward. By this time some of you might suggest an ORM (EF or another) because this will generate the SQL and code automatically but for now this is not an option for me.
I thought of having only one procedure for reading one property (by property name) and another one to update it (by name and value) then some general procedures for reading a bunch or all properties for an entity in the same statement. This may sound easy in C# if you consider using generics but the database doesn't know generics so it's not possible to have a strong typed solution.
I would like to have a solution that's "as strongly-typed as possible" so I don't need to do a lot of casting and parsing. I would define the available properties in code so you don't go guessing what you have available and use magic strings and the like. Then the process of adding a new property in the system would only mean adding a new column to the table and adding a new property "definition" in code (e.g. in an enum).

It sounds like you want to do this:
MyObj x = new MyObj();
x.SomeProperty = 10;
You have a table created for that, but you don't want to keep altering that table when you add
x.AnotherProperty = "Some String";
You need to normalize the table data like so:
-> BaseTable
RecordId, Col1, Col2, Col3
-> BaseTableProperty
PropertyId, Name
-> BaseTableValue
ValueId, RecordId, PropertyId, Value
Your class would look like so:
public class MyObj
{
public int Id { get; set; }
public int SomeProperty { get; set; }
public string AnotherProperty { get; set; }
}
When you create your object from your DL, you enumerate the record set. You then write code once that looks for a property with the same name as your configuration (BaseTableProperty.Name == MyObj.<PropertyName>) and then attempts the type cast to that property's type as you enumerate the record set.
Then, you simply add another property to your object, another record to the database in BaseTableProperty, and then you can store values for that guy in BaseTableValue.
Example:
RecordId
========
1
PropertyId Name
========== ====
1 SomeProperty
ValueId RecordId PropertyId Value
======= ======== ========== =====
1 1 1 100
You have two result sets, one for basic data, and one joined from the Property and Value tables. As you enumerate each record, you see a Name of SomeProperty - does typeof(MyObj).GetProperty("SomeProperty") exist? Yes? What is its data type? int? OK, then try to convert "100" to int by setting the property:
propertyInfo.SetValue(myNewObjInstance, Convert.ChangeType(dbValue, propertyInfo.PropertyType), null);
For each property.
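Put together, the loop might look roughly like the sketch below. The PropertyBagMapper class is a made-up name, and the name/value pairs are passed in as strings on the assumption that the Value column is stored as text; reading them from the joined result set is left out. Unknown property names are skipped rather than treated as errors, which is a design choice of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

public class MyObj
{
    public int Id { get; set; }
    public int SomeProperty { get; set; }
    public string AnotherProperty { get; set; }
}

// Hypothetical helper: copies name/value rows onto matching properties.
public static class PropertyBagMapper
{
    public static T Map<T>(IEnumerable<KeyValuePair<string, string>> values) where T : new()
    {
        var target = new T();
        foreach (var pair in values)
        {
            // Look up a public property whose name matches BaseTableProperty.Name.
            PropertyInfo property = typeof(T).GetProperty(pair.Key);
            if (property == null || !property.CanWrite)
            {
                continue; // no such property on the class: ignore the row
            }

            // Convert the stored text to the property's declared type.
            object converted = Convert.ChangeType(
                pair.Value, property.PropertyType, CultureInfo.InvariantCulture);
            property.SetValue(target, converted, null);
        }
        return target;
    }
}
```

Called with the pairs ("SomeProperty", "100") and ("AnotherProperty", "Some String"), Map<MyObj> yields an object whose SomeProperty is the int 100. Convert.ChangeType throws if the text cannot be converted, and it does not handle Nullable<T> property types directly, so those would need extra handling.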

Even if you said you cannot use them, that is what most ORM do. Depending on which one you use (or even create if it's a learning experience), they will greatly vary in complexity and performance. If you prefer a light weight ORM, check Dapper.Net. It makes use of generics as well, so you can check the code, see how it works, and create your own solution if needed.
