Should I Index my TenantId column?


I'm using Entity Framework to implement multi-tenancy (same database, same schema):
    public class Report : ITenantEntity
    {
        public Report() { }

        [Key]
        public int ReportId { get; set; }

        [Required]
        public int TenantId { get; set; }

        public DateTime DateSent { get; set; }
    }
I don't have a Tenant POCO, just static IDs.
Should I index on TenantId somehow? All queries now involve filtering on TenantId, so I want to make sure I'm not killing performance by not having proper indexes.

Yes. Since every query filters on the tenant, an appropriate index on TenantId is critical for performance. Depending on database load and your maintenance procedures, further optimizations may be worth considering: making one of the indexes clustered, partitioning tables, using federated database servers, and so on.
At some point you might isolate tenants at the database level, and you can use parameterized views and stored procedures to separate the physical and presentation levels of the database. In any case, tuning a production database goes well beyond the schema that EF generates automatically.
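A minimal sketch of declaring such an index from code, assuming EF Core (the question does not say which Entity Framework version is in use). The context name `AppDbContext`, the shape of `ITenantEntity`, and the choice of `DateSent` as the second index column are assumptions for illustration, not something the question specifies.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

// Assumed shape of the interface; the question does not show it.
public interface ITenantEntity
{
    int TenantId { get; set; }
}

public class Report : ITenantEntity
{
    [Key]
    public int ReportId { get; set; }

    [Required]
    public int TenantId { get; set; }

    public DateTime DateSent { get; set; }
}

// Hypothetical context name.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Report> Reports => Set<Report>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // TenantId comes first because every query filters on it.
        // DateSent as the second column is an assumption about a
        // common secondary filter or sort.
        modelBuilder.Entity<Report>()
            .HasIndex(r => new { r.TenantId, r.DateSent });
    }
}
```

If the project is on EF 6.1 or later rather than EF Core, the `[Index]` attribute on the property plays a similar role. Either way, it is worth checking the actual query plans before settling on the column order.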

Related

Entity design when principal entity state is an aggregate of its dependant entities

Let's define Term as the principal entity and Course as its dependent in a one-to-many relationship.
    public class Term
    {
        public int Id { get; set; }
        public List<Course> Courses { get; set; } = new List<Course>();
    }

    public class Course
    {
        public int Id { get; set; }
        public DateTime EndDate { get; set; }
    }
A common query criterion for our Term entity is whether all of its courses are finished, which in turn decides whether the Term itself counts as finished. This implicit state shows up in many places: in our business logic, in simple queries that populate view models, and so on.
With an ORM such as Entity Framework Core, this query can pop up in a lot of places.
    Terms.Where(t => t.Courses.All(c => c.EndDate > DateTime.Now))
    Terms.Count(t => t.Courses.All(c => c.EndDate > DateTime.Now))
Other examples of this that come to mind are a product and its current inventory count, posts that only contain unconfirmed comments, etc.
What can we consider as best practice if we are to capture these implicit states and make them directly accessible in our principal entity without the need to rehydrate our dependant entity from the database too?
Some solutions that come to mind:
Using a computed column to do a subquery and map it to a property on the principal entity e.g. Term.IsFinished
Defining a normal property on our entity and either updating its value on a schedule (not acceptable in many cases, because the value is stale between runs) or using domain events and reacting to them to update the property on the principal entity

Create a view, with the two tables joined and aggregated per principal entity.
Use the view directly in Entity Framework instead of the base table.
For bonus points:
In SQL Server you can create a clustered index on the view, and it will be automatically maintained for you. Oracle has a similar concept.
In other RDBMSs, you would need to create a separate table, and maintain it yourself with triggers.
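To make the shape of such a view concrete, here is a rough sketch of the same aggregation written as LINQ over in-memory objects. The `TermStatus` row type, the `TermId` foreign key on `Course`, and the choice to expose the latest end date instead of a stored boolean are all assumptions for illustration. A real view would do the equivalent GROUP BY in SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Course
{
    public int Id { get; set; }
    public int TermId { get; set; }        // assumed foreign key column
    public DateTime EndDate { get; set; }
}

// Hypothetical row type: one row per Term, as the view would return it.
public class TermStatus
{
    public int TermId { get; set; }
    public int CourseCount { get; set; }
    public DateTime LastEndDate { get; set; }

    // A term is finished once its latest course end date has passed.
    public bool IsFinishedAt(DateTime now) => LastEndDate <= now;
}

public static class TermStatusProjection
{
    // Mirrors: SELECT TermId, COUNT(*), MAX(EndDate) ... GROUP BY TermId.
    // Terms without any course produce no row here.
    public static List<TermStatus> Build(IEnumerable<Course> courses) =>
        courses
            .GroupBy(c => c.TermId)
            .Select(g => new TermStatus
            {
                TermId = g.Key,
                CourseCount = g.Count(),
                LastEndDate = g.Max(c => c.EndDate)
            })
            .OrderBy(s => s.TermId)
            .ToList();
}
```

Exposing the latest end date keeps the current time out of the aggregated data, so the "finished" check can be applied at query time.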

Want to know about database design and best practice of table relationship

First, I want to give an example. Here I use the code-first approach to create database tables and their relationships. Please look at the classes below (C#).
    public class Blog
    {
        public int BlogId { get; set; }
        public string Name { get; set; }
        public virtual List<Post> Posts { get; set; }
    }

    public class Post
    {
        public int PostId { get; set; }
        public string Title { get; set; }
        public string Content { get; set; }
        public int BlogId { get; set; }
        public virtual Blog Blog { get; set; }
    }
You’ll notice that I'm making the two navigation properties (Blog.Posts and Post.Blog) virtual. This enables the Lazy Loading feature of Entity Framework. Lazy Loading means that the contents of these properties will be automatically loaded from the database when you try to access them.
Now here is my question.
I want to create a database like the one below. The table names will be:
    tblCompany
    tblSite    // A Site is created under a Company (a Company has one or more Sites).
    tblLine    // A Line is created under a Site (a Site has one or more Lines).
    tblMachine // A Machine is created under a Line (a Line has one or more Machines).
So I will create a Company table with a Company_Id.
Then I will create a Site table with Site_Id and Company_Id, to relate the Site table to the Company table.
After that, when I create the Line table, should I use both Company_Id and Site_Id?
I know I can use only Site_Id and get the Company a Site belongs to with a query. But what is the best practice? Should each table carry the Id of every table above it, or only the Id of its immediate parent?
Please also provide the classes if you can.

No, you shouldn't have every table in a hierarchy having every ID from every table above it, because we can use joins to link the tables together in the entire hierarchy chain.
There may be a very limited number of situations where it is genuinely advantageous for a lower-level table to embed the ID of a table much further up the hierarchy, but it is typically a developer convenience: "I can't be bothered joining these 27 tables together every time I want to know which machine belongs to which company. I'll just put a companyid in the machine table, and I promise I'll keep it updated by some complicated mechanism."
Don't do it. When you sell a site to another company, you have to remember to transfer all of its machines too: not just by reassigning the site, but by visiting every machine and updating its company ID. Otherwise the hierarchy gets out of sync.
What's the alternative if your front-end app asks a million times a second which machines belong to which company, and you don't want the database to join 27 tables a million times a second to find out? Caching: a separate system where you maintain a transient list of machines and companies. Every time you sell something or make a transfer, you invalidate the cache when you update that part of the database hierarchy. On the next query the cache misses and is rebuilt with the new information. The database only occasionally has to join 27 tables.
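The invalidate-and-rebuild idea can be sketched roughly as follows. `MachineCompanyCache` and its loader delegate are hypothetical names. The loader stands in for whatever multi-join query returns the machine-to-company mapping, and a real system might well use a distributed cache instead of an in-process dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache of machineId -> companyId.
public class MachineCompanyCache
{
    private readonly Func<IDictionary<int, int>> _loadFromDatabase;
    private readonly object _gate = new object();
    private IDictionary<int, int> _snapshot;   // null means "invalidated"

    public MachineCompanyCache(Func<IDictionary<int, int>> loadFromDatabase)
    {
        _loadFromDatabase = loadFromDatabase;
    }

    public int GetCompanyId(int machineId)
    {
        lock (_gate)
        {
            // Cache miss: run the expensive join once and keep the result.
            if (_snapshot == null)
            {
                _snapshot = _loadFromDatabase();
            }
            return _snapshot[machineId];
        }
    }

    // Call this in the same code path that moves a site, line or machine.
    public void Invalidate()
    {
        lock (_gate)
        {
            _snapshot = null;
        }
    }
}
```

The important part is that `Invalidate` is called from every code path that changes the hierarchy. Until it is called, readers keep seeing the old mapping.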
This is starting to head into an opinion piece, and hence heading out of scope of a SO question/answer, but if you come up against specific problems as you implement your system, feel free to post them up
PS: don't prefix your tables with tbl; it's obvious what they are. The days of having to give everything a name that included its type are thankfully long gone.
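As for the classes the question asks for, a minimal code-first sketch in which each entity carries only its immediate parent's ID might look like the following. The property names and the `OwningCompany` helper are assumptions for illustration. With an ORM, the navigation chain in the helper would translate into joins.

```csharp
using System.Collections.Generic;

public class Company
{
    public int CompanyId { get; set; }
    public string Name { get; set; }
    public virtual List<Site> Sites { get; set; } = new List<Site>();
}

public class Site
{
    public int SiteId { get; set; }
    public int CompanyId { get; set; }            // immediate parent only
    public virtual Company Company { get; set; }
    public virtual List<Line> Lines { get; set; } = new List<Line>();
}

public class Line
{
    public int LineId { get; set; }
    public int SiteId { get; set; }               // no CompanyId here
    public virtual Site Site { get; set; }
    public virtual List<Machine> Machines { get; set; } = new List<Machine>();
}

public class Machine
{
    public int MachineId { get; set; }
    public int LineId { get; set; }               // no SiteId or CompanyId here
    public virtual Line Line { get; set; }
}

public static class MachineExtensions
{
    // Hypothetical helper: walks up the hierarchy instead of storing a copy.
    public static Company OwningCompany(this Machine machine) =>
        machine.Line.Site.Company;
}
```

Because the owning company is derived by walking up the chain, reassigning a site to another company changes the answer for all of its machines without touching any machine row.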

Should I validate if related entity exists before insert?

    class Customer
    {
        public int Id { get; set; }
    }

    class Sale
    {
        public int Id { get; set; }
        public int CustomerId { get; set; }
    }

    class SaleService
    {
        public void NewSale(Sale sale)
        {
            // Should I validate that the Customer exists (by sale.CustomerId) before calling save?
            saleRepository.InsertOrUpdate(sale);
        }
    }
I'm using Domain Driven Design and Entity Framework.
Should I validate that the Customer exists (by sale.CustomerId) before calling save?

I usually don't do that. Normally this information comes from the client, which loaded the customer beforehand (so it existed). However, there are cases where the customer behind CustomerId is missing by the time you update your db:
- Concurrency, when many users access the system at the same time. This case should be handled selectively, using optimistic concurrency control (versioning). We usually don't try to handle concurrency everywhere, as that is almost impossible and has side effects such as performance issues and added complexity. We focus only on the critical code in the system where a concurrency issue would cause real problems.
- A client tries to hack the system by sending an inappropriate CustomerId. But this is a separate issue that should be handled by authorization checks or something similar.
In most cases, I think a foreign key constraint in the db is enough.
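A rough sketch of relying on the constraint instead of a pre-check is shown below. Everything here is hypothetical: `ISaleRepository`, `ForeignKeyViolationException` and the in-memory repository only stand in for the real repository and for whatever exception the data layer raises when the database rejects the row. The `TryNewSale` method is an illustrative variant of `NewSale`.

```csharp
using System;
using System.Collections.Generic;

public class Sale
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
}

// Hypothetical: represents the error surfaced when the FK constraint fails.
public class ForeignKeyViolationException : Exception
{
    public ForeignKeyViolationException(string message) : base(message) { }
}

public interface ISaleRepository
{
    void InsertOrUpdate(Sale sale);
}

public class SaleService
{
    private readonly ISaleRepository _sales;

    public SaleService(ISaleRepository sales) { _sales = sales; }

    // No existence check up front: save, and translate a constraint failure.
    public bool TryNewSale(Sale sale)
    {
        try
        {
            _sales.InsertOrUpdate(sale);
            return true;
        }
        catch (ForeignKeyViolationException)
        {
            return false;
        }
    }
}

// In-memory stand-in for a database that enforces the foreign key.
public class InMemorySaleRepository : ISaleRepository
{
    private readonly HashSet<int> _customerIds;
    public List<Sale> Saved { get; } = new List<Sale>();

    public InMemorySaleRepository(IEnumerable<int> existingCustomerIds)
    {
        _customerIds = new HashSet<int>(existingCustomerIds);
    }

    public void InsertOrUpdate(Sale sale)
    {
        if (!_customerIds.Contains(sale.CustomerId))
        {
            throw new ForeignKeyViolationException("Unknown customer " + sale.CustomerId);
        }
        Saved.Add(sale);
    }
}
```

This avoids an extra round trip per sale. The trade-off is that the failure is detected only at save time.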

Entity Framework and Stored Procedures Mismatched Entities/Models

I'm really looking for advice on best practices here, so I will explain the situation. We have a fairly large application built on top of POCOs and EF 4, with a complicated database. While we have been happy with Entity Framework, there are definite performance improvements to be made, for example in the following (quite simplified) scenario.
We have a table called News, which has a collection of users that have added it to their favourites and a collection of ratings (1 - 5) by users, for example:
    public class News
    {
        public virtual int NewsId { get; set; }
        public virtual string Title { get; set; }
        // .......etc....
        public virtual ICollection<User> UserFavourites { get; set; }
        public virtual ICollection<Rating> Ratings { get; set; }
    }
We have written a stored procedure that returns news for a user, including whether each item is a favourite, whether the requesting user has already rated it, and its current rating, rather than having EF build this data from the ICollections. We end up with an object like the one below.
    public class NewsDataModel
    {
        public int NewsId { get; set; }
        public string Title { get; set; }
        // .......etc....
        public bool IsFavourite { get; set; }
        public bool IsRated { get; set; }
        public double Rating { get; set; }
    }
The stored procedure is much faster and needs a single database hit, whereas EF with lazy loading could make multiple calls. However, the data returned by the sproc does not match the News POCO class above.
We have been trying to work out the best way forward, as we have an INewsRepository that can return either the Entity Framework class or the custom DataModel class that we populate with a stored procedure and ADO.NET. This doesn't feel right, and I would appreciate any advice or insight from others' experience about the best way to handle these scenarios, where you want a single object built from multiple tables and a sproc would be a lot faster than an Entity Framework call with lazy loading enabled.
Many thanks for any help

There is nothing wrong with a new method on your repository returning instances of NewsDataModel. It is still within the scope of your INewsRepository, because it is a data class constructed from news information. Otherwise you would need a repository for every data model you define.
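One possible shape for such a repository is sketched below. The method names `GetById` and `GetNewsForUser` are assumptions, and the in-memory implementation only stands in for the real pair of implementations (EF for the entity, the stored procedure via ADO.NET for the data model).

```csharp
using System.Collections.Generic;
using System.Linq;

public class News
{
    public int NewsId { get; set; }
    public string Title { get; set; }
}

public class NewsDataModel
{
    public int NewsId { get; set; }
    public string Title { get; set; }
    public bool IsFavourite { get; set; }
    public bool IsRated { get; set; }
    public double Rating { get; set; }
}

public interface INewsRepository
{
    News GetById(int newsId);                         // entity, e.g. loaded through EF
    IList<NewsDataModel> GetNewsForUser(int userId);  // read model, e.g. from the sproc
}

// In-memory stand-in that computes what the stored procedure would return.
public class InMemoryNewsRepository : INewsRepository
{
    private readonly List<News> _news;
    private readonly HashSet<(int UserId, int NewsId)> _favourites;
    private readonly List<(int UserId, int NewsId, int Stars)> _ratings;

    public InMemoryNewsRepository(
        IEnumerable<News> news,
        IEnumerable<(int UserId, int NewsId)> favourites,
        IEnumerable<(int UserId, int NewsId, int Stars)> ratings)
    {
        _news = news.ToList();
        _favourites = new HashSet<(int UserId, int NewsId)>(favourites);
        _ratings = ratings.ToList();
    }

    public News GetById(int newsId) =>
        _news.FirstOrDefault(n => n.NewsId == newsId);

    public IList<NewsDataModel> GetNewsForUser(int userId) =>
        _news.Select(n =>
        {
            var ratings = _ratings.Where(r => r.NewsId == n.NewsId).ToList();
            return new NewsDataModel
            {
                NewsId = n.NewsId,
                Title = n.Title,
                IsFavourite = _favourites.Contains((userId, n.NewsId)),
                IsRated = ratings.Any(r => r.UserId == userId),
                Rating = ratings.Count == 0 ? 0 : ratings.Average(r => r.Stars)
            };
        }).ToList();
}
```

Callers that need to modify news work with the entity, while list screens consume the flattened read model through the same interface.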

NHibernate and Large Collections

As a way to learn NHibernate, I came up with a small project that includes a typical users & groups authentication system. It got me thinking about how this would be done. I quickly put together the following classes and mapped them to the database, which worked after a lot of trial and error. I ended up with a three-table database schema with a many to many association table between the User and Group tables.
    public class User
    {
        public virtual string Username { get; set; }
        public virtual byte[] PasswordHash { get; set; }
        public virtual IList<Group> Groups { get; set; }
    }

    public class Group
    {
        public virtual string Name { get; set; }
        public virtual IList<User> Users { get; set; }
    }
My question is regarding the scalability and potential performance of this sort of class design. If this was in a production system with tens of thousands of users, even with lazy-loading on a Group's Users collection, any call to the Groups property could set off a potentially HUGE data retrieval.
How would NHibernate cope with such a scenario and how might I improve upon my design?

Don't create these as properties. Add functions to these classes which will allow you to fine tune your queries (through the use of parameters) to retrieve the specific data sets you require.
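A minimal sketch of such a parameterized query method, written against `IQueryable<User>` so it does not depend on a particular session API. With NHibernate the queryable would typically come from `session.Query<User>()`. The names `GroupQueries` and `GetUsersPage`, and the paging parameters, are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class User
{
    public virtual string Username { get; set; }
    public virtual byte[] PasswordHash { get; set; }
    public virtual IList<Group> Groups { get; set; }
}

public class Group
{
    public virtual string Name { get; set; }
}

public static class GroupQueries
{
    // Returns one page of a group's members instead of the whole collection.
    public static IList<User> GetUsersPage(
        IQueryable<User> users, string groupName, int pageIndex, int pageSize)
    {
        return users
            .Where(u => u.Groups.Any(g => g.Name == groupName))
            .OrderBy(u => u.Username)      // stable order so paging is repeatable
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

Here `Group` has no `Users` collection at all, so nothing can accidentally trigger a load of every member. Membership is only reachable through the query method.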

I know this question is old, but just happened to stumble upon it. You state "even with lazy-loading on a Group's Users collection, any call to the Groups property could set off a potentially HUGE data retrieval." Why? Presumably the number of groups is not tens of thousands, and accessing the Groups property on User would only load the Groups collection, not the Users collection within the Groups collection (unless Users wasn't marked to lazy-load). The huge data retrieval would only occur if you accessed the Users collection in Group, in which case I would recommend not having that relationship accessible from the Group mapping.
