Database schema for multiple companies - c#

I am working on an inventory app using C# and the Entity Framework Code First approach.
One of the design requirements is that a user should be able to create multiple companies, and each company should have a full set of inventory master tables.
For example, each company should have its own stock journal and list of items. There would also be a way to combine these companies in the future to form something like a 'group' company, essentially merging the data.
Using a file-based RDBMS like SQLite, it's very simple: I would just need to create a separate SQLite database for each company and then a master database to tie it all together. However, how should I go about doing this in a single database file, rather than in multiple database files?
I do not want to have a 'company' column on every table!
The idea that I had, given my limited knowledge of databases, is to separate the companies using different schemas: one schema for each company with the same set of tables in each, plus a separate schema holding the common tables and the tables that tie the other schemas together. Is that a good approach? I ask because I am having a hard time finding a way to 'dynamically' create schemas using EF Code First.
Edit #1
To give an idea of the number of companies: one enterprise has about 4-5 companies, and each financial year the old companies are closed off and a fresh set of companies is created. It would be good to maintain data for multiple years in the same file, but it is not required as long as I can provide a separate module that loads data for several years from several of the db files to facilitate year-on-year analysis.
As far as size of individual companies data, it can hit the GB mark per company.
Schema changes quite frequently at least on the table level as it will be completely customizable by the user.
I guess one aspect that drives my question is the implementation of this design. If it is an app with a discrete desktop interface and I have my own RDBMS server like SQL Server, the number of databases does not matter that much. However, for a web-based UI hosted by a third party and using their database server, the number of databases available will be limited. The only solution to that would be to use a serverless database like SQLite.
But as far as general advice goes, SQLite is not advised for large enterprise class databases.

You've provided viable solutions, and even some design requirements, but it's difficult to advise "what's best" without knowing the base requirements like:
How many companies now - and can be reasonably expected in the future
How many tables per instance
How many records per 'large' table, per company
How frequently things are likely to change, schema-wise
With that in mind, on to some general opinions on your solutions. First off, considering the design requirements, it would make sense to consider using separate databases per company. This would separate your data and allow, for example, roles and security to be defined quite easily at the database level. Since you explicitly mention that you could "make it simple" using this approach, you could just create a database (file) per company. With your data access layer going through Entity Framework, you could also easily switch connection strings between databases, and even merge data from A=>B this way. I see no particular reason, besides a possible risk in maintaining and updating different instances, why this shouldn't be a solution to consider.
On the other hand, the one-big-database-for-all approach isn't bad by definition either. The domain of maintenance becomes more compact and easily approachable. One way to separate data is to use different database schemas, as you suggest yourself. However, database schemas are primarily intended to separate accessibility on a role basis. For example, a user in the back-office employee role should only communicate with the "financial" schema, whilst the dbo can talk to pretty much anything. You could extend this approach to a per-company basis, seeing a company as a "user", but think of the number of tables you would get as you create more and more companies. This would make your database huge. Therefore, in my opinion, it is not the best approach.
Finally, I'm intrigued by your statement "I do not want to have a 'company' column on every table". In my opinion, you should consider this as well. A discriminator property, like a companyId column on several tables, is pretty easy to abstract using Entity Framework (or any ORM for that matter). This is what the concept of foreign keys is all about. It would also give you the advantage of being able to index this column for performance. Your only consideration in this approach would be to make sure you provide this 'company discriminator' on all relevant tables.
The latter would be quite simple to enforce using EF Code First if you use a contract for each separate data class to implement:
interface IMyTableName {
    // An interface cannot declare a field, so the discriminator is a property.
    int CompanyId { get; set; }
}
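As a minimal sketch of how such a contract could be used: the names `ICompanyOwned`, `StockItem` and `CompanyFilter.ForCompany` below are hypothetical, and plain LINQ to Objects stands in for an EF query. The point is only that one generic filter can apply the company discriminator to every class that implements the contract.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical contract: every company-specific entity exposes its owner.
public interface ICompanyOwned
{
    int CompanyId { get; set; }
}

// Hypothetical entity, used only for illustration.
public class StockItem : ICompanyOwned
{
    public int CompanyId { get; set; }
    public string Name { get; set; } = "";
}

public static class CompanyFilter
{
    // Keeps only the rows that belong to the given company.
    public static IEnumerable<T> ForCompany<T>(IEnumerable<T> rows, int companyId)
        where T : ICompanyOwned
    {
        return rows.Where(r => r.CompanyId == companyId);
    }
}
```

Calling `CompanyFilter.ForCompany(items, 1)` on a mixed list returns only the items whose `CompanyId` is 1, so the filter is written once rather than repeated in every query.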
Just my quick thoughts, though.

I agree with Moriarty for the most part. Our company chose the one database per company approach, and we're paying for it every time we want to do a schema change. Since our deployments are automated, they should all be the same, but there are small differences each time. Moreover, these databases are really independent, so it's hard to keep our backups in sync as well.
It has been painful working with all these databases. The only plus side is that we can spread them out over multiple servers to increase performance. So I'm going to cast my vote for the one big database design.

Related

Is this a good approach to select the database based on the user?

I'm developing a system using ASP.NET MVC + WebAPI + AngularJS which has the following property: users can log in, and different users have totally different data. The reason is simple: the system allows management of data, and although the schema is the same for everyone, the data is totally disconnected between users. For reasons of organization, consistency and security alone, each user would need a separate database.
This gives rise to a problem: although every database should be the same, i.e., the same tables and columns and hence the same EF Data Context, the connections are different. This confuses me because I'm used to specifying the connection string in the XML config file, and that couldn't be done here since the connection string would be dynamic.
I've then thought about a solution, though I don't know if it's the best one: I create one repository, whose constructor receives the username of the logged-in user. The repository then goes to the system's own database and looks up the connection data for that user (this data would be supplied when the user registers). Finally, the repository builds the connection string and feeds it into the DbContext.
Is this a good approach to this problem? Or are there more recommended ways to deal with this kind of thing? Security is an important concern here, and because of that I'm unsure of my approach.
Each Data Context in an Entity Framework solution has a constructor overload that allows you to specify a connection string. You can find out how to build and use that connection string at the link below.
Reference
How to: Build an EntityConnection Connection String
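A minimal sketch of the lookup step described in the question, under stated assumptions: `TenantConnectionResolver` is a hypothetical name, the user-to-database map is held in memory instead of in the system database, and the SQL Server style connection string is only illustrative. In real code a connection string builder class would be a safer way to assemble the string than concatenation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup from a logged-in user name to that user's database.
public class TenantConnectionResolver
{
    private readonly string _server;
    private readonly Dictionary<string, string> _databaseByUser;

    public TenantConnectionResolver(string server, Dictionary<string, string> databaseByUser)
    {
        _server = server;
        _databaseByUser = databaseByUser;
    }

    // Returns the connection string for a known user and throws for an unknown
    // one, so a request can never fall through to somebody else's database.
    public string Resolve(string userName)
    {
        if (!_databaseByUser.TryGetValue(userName, out var database))
            throw new KeyNotFoundException("No database registered for user: " + userName);

        return "Server=" + _server + ";Database=" + database + ";Integrated Security=true;";
    }
}
```

The string returned by `Resolve` is what would be handed to the data context's constructor overload that accepts a connection string.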
That said, unless you have very special requirements, it's much better from a maintenance and operational standpoint to simply put a UserID in the appropriate tables, and filter on the currently logged in User ID.
One database per user seems like a crazy solution to me.
Include a user_id column in tables that contain per user data and filter on it appropriately.
I think this depends on how complex your per-user database will be. If we are talking about 5-10 tables, then it is easier to add, manage and query an additional ID column on all tables than it is to manage multiple databases. But the more complex the model gets and the more tables there are in the database, the easier it becomes to have one database per user, compared to adding one more column to each table and adding the user checks to all your queries, especially the more complicated ones.
Same goes for performance. If you are expecting user databases to grow large in volume of data, then having separate databases could allow you to scale horizontally by putting different databases into different servers.
A shared database will also pose a problem if a user requests raw access to their data. This can happen, for example, when they want to migrate the data.

Application DAL design

Hello and thanks for looking.
I have a DAL question for an application I'm working on. The app is going to extract some data from 5-6 tables from a production RDBMS that serves a much more critical role in the org. What the app has to do is use the data in these tables, analyze, apply some business logic/rules and then present.
The restrictions are that since the storage model is critical in nature to the org, I need to restrict how the app will request the data. Since the tables are relatively small, I created my data access to use DataTables to load the entirety of the db tables on a fixed interval using a timer.
My questions are really around my current design and the potential use of EF or LINQ to SQL:
Can EF/LINQ to SQL work within the restrictions of the RDBMS? In most tutorials I've seen, the storage exists solely for the application. Can access to the storage be controlled, and/or can EF use DataTables rather than an RDBMS?
Since the entirety of the tables is going to be loaded, is there a best practice for creating classes to consume the data within these tables? I will have to do in-memory joins and querying/logic to get at the actual data I need.
Sorry if I'm being generic. I'm more just looking for thoughts and opinions as opposed to a solution to my problem. Please don't hesitate to share your thoughts. Thanks.
For your first question: yes, Entity Framework can use an existing DB as its source. The term to search for when looking for Entity Framework tutorials on this topic is "Database First".
For your second question, let me first preface it with a warning: many ORMs are not designed for loading an entire table and doing bulk operations on it, especially if you will be modifying the result set and pushing the data back to the server in large quantities. The updates will be row-based, not set-based, because you did the modifications in C# code and not in a T-SQL query. Most ORMs are built around the expectation that you will be doing CRUD operations at the row level, not ETL operations or set-level CRUD operations (except for Read, which most ORMs will do as a set operation).
If you will not be updating the data, only pulling it out using Entity Framework and building reports and whatnot off of the data, you should be fine. If you are bulk inserting into the database, things get more problematic. See this SO question for more information.
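For the read-only case, the in-memory joins mentioned in the question can be sketched with LINQ to Objects. The row types and the `Reports.TotalsByCustomer` helper are hypothetical stand-ins for whatever the loaded tables actually contain; the same query shape would apply to rows materialized by an ORM.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types standing in for two of the loaded tables.
public class CustomerRow
{
    public int CustomerId { get; set; }
    public string Name { get; set; } = "";
}

public class OrderRow
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

public static class Reports
{
    // Inner-joins the two in-memory lists on CustomerId and sums the order
    // totals per customer name. Customers without orders are left out.
    public static Dictionary<string, decimal> TotalsByCustomer(
        IEnumerable<CustomerRow> customers, IEnumerable<OrderRow> orders)
    {
        return customers
            .Join(orders,
                  c => c.CustomerId,
                  o => o.CustomerId,
                  (c, o) => new { c.Name, o.Total })
            .GroupBy(x => x.Name)
            .ToDictionary(g => g.Key, g => g.Sum(x => x.Total));
    }
}
```

Because everything runs against data already in memory, the production database is only touched when the tables are reloaded.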

Inheritance within generated LINQ classes?

I have a data warehouse system that relies on LINQ to SQL for its database abstraction.
To cut a long story short, I have a 2011 database which contains many records for this year, and I also have a database for each of 2009 and 2010.
These are all located on different servers, but it does not seem to be a problem to have classes within my dbml from different servers.
The problem I have is that there is table overlap between them; for example, there is a list of customers in each database. I don't want to have several customer classes in my LINQ to SQL generated code, but would much rather have some sort of inheritance.
I'm struggling to explain the problem maybe, can someone offer any help with how I can have a single class representing multiple tables? I would like to stick to DRY principles.
AK
I think if both databases have the same table structure, you don't need two classes; the connection string alone decides which database the class is read from.

Database Design In SQL Server or C#?

Should a database be designed on SQL Server or C#?
I always thought it was more appropriate to design it on SQL Server, but recently I started reading a book (Pro ASP.NET MVC Framework) which, to my understanding, basically says that it's probably a better idea to write it in C# since you will be accessing the model through C#, which does make sense.
I was wondering what everyone else's opinion on this matter was...
I mean, for example, do you consider it "correct" to have a table that specifies constants, like an AccessLevel table that is always supposed to contain the following rows?
1 Everyone
2 Developers
3 Administrators
4 Supervisors
5 Restricted
Wouldn't it be more robust and streamlined to just have an enum for that same purpose?
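The enum alternative raised here could be sketched as follows. The `AccessLevels.FromId` helper is a hypothetical addition; the numeric values are kept explicit so that they would still match the rows of a lookup table if one is kept alongside the enum.

```csharp
using System;

// Values mirror the rows of the AccessLevel table quoted in the question.
public enum AccessLevel
{
    Everyone = 1,
    Developers = 2,
    Administrators = 3,
    Supervisors = 4,
    Restricted = 5
}

public static class AccessLevels
{
    // Converts a stored integer back to the enum, rejecting unknown ids.
    public static AccessLevel FromId(int id)
    {
        if (!Enum.IsDefined(typeof(AccessLevel), id))
            throw new ArgumentOutOfRangeException(nameof(id), "Unknown access level id");
        return (AccessLevel)id;
    }
}
```

The enum gives compile-time checking inside the application, while anything that reads the database directly only sees the integers.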
A database schema should be designed on paper or with an ERD tool.
It should be implemented in the database.
Are you thinking about ORMs like Entity Framework that let you use code to generate the database?
Personally, I would rather think through my design on paper before committing it to a DB myself. I would be happy to use an ORM or class generator from this DB later on.
Before VS.NET 2010 I was using SQL Server Management Studio to design my databases. Now I am using the EF 4.0 designer; for me it's the best way to go.
If your problem domain is complex, or its complexity grows as the system evolves, you'll soon discover you need some metadata to make life easier. C# can be a good choice as a host language for such stuff, as you can utilize its type system to enforce some invariants (like char-column lengths, null/not null restrictions or check constraints; you can declare them as consts, enums, etc.). Unfortunately I don't know of utilities that can do it out of the box (sqlmetal.exe can export some metadata, but only as XML), although some CASE tools can probably be customized. I'd go for a custom-made generator to produce the db schema from C# (just a few hours' work compared to learning, for example, the customization options offered by Sybase PowerDesigner).
ORMs have their place; that place is NOT database design. There are many considerations in designing a database that need to be thought through, not automatically generated, no matter how appealing the idea of not thinking about design might be. There are often many things to consider that have nothing to do with the application, such as data integrity, reporting, audit tables and data imports. Using an ORM to create a database that looks like an object model may not be the best design for performance, and may not have the things you really need in terms of data integrity. Remember, even if you think nothing except the application will ever touch the database, this is not true. At some point the database will need someone to do a major data revision (to fix a problem) that is done directly on the database, not through the application. At some point you are going to need to import a million records from some other company you just bought, and you are going to need an ETL process outside the application. Putting all your hopes and dreams for the database (as well as your data integrity rules) into the application is short-sighted.

pluggable data store architectures

I have a pluggable system management tool. The architecture of this kind of thing is well understood (interfaces, publish/subscribe, ...). What about the data store, though? What do people do?
I need plugins to be able to add new entities, extend existing entities, establish new relationships, etc.
My thoughts (SQL), not necessarily well thought out
each plugin simply extends the schema when they are installed. In the old days changing the schema was a big no-no; now databases are very relaxed about this
plugins have their own tables. If 2 of them have an entity (say) person, then there are 2 tables p1_person and p2_person
plugins have their own database
invent some sort of flexible scheme where the tables are softly typed. Maybe many attributes packed into a single attribute. The ultimate is to have one big table called data, with key of table name & column name and a single data value.
Not SQL
object DB. I have no experience with these. Would anybody care to pass on their experience, with db4o for example? Can I change the 'schema' of objects as the app evolves?
NO-SQL
this is 'where it's at' at the moment. Most of these seem to be aimed at needs slightly different from mine. Does anybody want to pass on their experience with these?
Apologies for the open ended question
My suggestion is to go and read about Entity Framework.
A lot of the situations you are describing can be solved (very elegantly) using table inheritance.
Your idea of one big table called data makes the hamsters in my computer cry ;)
The general trend is away from weakly typed schemas because they cannot be checked at compile time. What you get from something like Entity Framework is a strongly typed, extensible schema that you can code against using LINQ.
Object databases:
Like you, I haven't played with them massively. However, the time when I was considering them was a time when there was no good ORM for .NET and writing ADO.NET code was slowly killing me.
As for NoSQL, these are databases that meet a performance need. SQL performs badly in situations where there are lots of small writes occurring. I say "badly" tongue in cheek: it performs very well, but when you scale to millions of concurrent users everything changes. My understanding of NoSQL is that it is a non-rationalised format designed for lots of small, fast writes and reads. The sites that use these are usually very large in scale.
OK - in response
I am currently lucky enough to be on a greenfield project, so I am using EF to generate my schema.
On non-greenfield projects I use SQL scripts to update my table structures. As for implementing table inheritance in SQL, it's very easy once you know the concept: it's essentially a one-to-many relationship with a constraint that it will only ever be 0-1.
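The 0-1 constraint can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `PersonRow`, `EmployeeRow` and `PersonStore` are hypothetical names, and dictionaries keyed by `PersonId` stand in for a primary key on the base table and a combined primary/foreign key on the subtype table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base "table": one row per person.
public class PersonRow
{
    public int PersonId { get; set; }
    public string Name { get; set; } = "";
}

// Hypothetical subtype "table": PersonId is both its key and a reference to PersonRow.
public class EmployeeRow
{
    public int PersonId { get; set; }
    public decimal Salary { get; set; }
}

public class PersonStore
{
    private readonly Dictionary<int, PersonRow> _people = new Dictionary<int, PersonRow>();
    private readonly Dictionary<int, EmployeeRow> _employees = new Dictionary<int, EmployeeRow>();

    public void AddPerson(PersonRow person) => _people.Add(person.PersonId, person);

    // The base row must exist (the foreign key) and can have at most one
    // employee row (the unique key), which gives the 0-1 relationship.
    public void AddEmployee(EmployeeRow employee)
    {
        if (!_people.ContainsKey(employee.PersonId))
            throw new InvalidOperationException("No matching person row");
        _employees.Add(employee.PersonId, employee); // throws ArgumentException on a duplicate key
    }

    public bool IsEmployee(int personId) => _employees.ContainsKey(personId);
}
```

In a real database the same two rules would be declared as constraints on the subtype table rather than checked in application code.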
I wouldn't write .net code that updates the database structure ... that sounds like a disaster waiting to happen to me.
Beginning to think i have misunderstood what you are looking for. I find databases to be second nature as I have spent so long with them.
I haven't found a replacement for being meticulous about script management.
