I need to design a Windows application which represents multiple "customers" in SQL Server. Each customer has the same data model, but the data is independent.
What are the pros/cons of using multiple databases vs. a single database?
Which is the best way to do this? And if going for a single database, what are the steps to do that?
Edited:
One other thing: the database will be hosted in a cloud (Rackspace) account.
Do not store data from multiple customers in the same database -- I have known companies that had to spend a lot of time/effort/money fixing this mistake. I have even known clients to balk at sharing a database computer even though the databases are separate - on the plus side, these clients are generally willing to pay for the extra hardware.
The problems with security alone should prevent you from ever doing this. You will lose large customers because of this.
If you have some customers that are unwilling to upgrade their software, it can be very difficult if you share a single database. Separate databases allow customers to continue using the old database structure until they are ready to upgrade.
You are artificially limiting a natural data partition that could provide significant scalability to your solution. Multiple small customers can still share a database server, they just see their own databases/catalogs, or they can run on separate database servers / instances.
You are complicating your database design because you will have to distinguish customer data that would otherwise be naturally separated, i.e., having to supply CustomerID in every WHERE clause (see the sketch after this list).
You are making your database slower by having more rows in all tables. You will use up database memory more rapidly because CustomerID is now part of every index, and fewer records can be stored in each index node. Your database is also slower due to the loss of the inherent advantage of locality of reference.
Data rollback for 1 customer can be very difficult, maybe even essentially impossible as the database grows - you will need custom procedures to do this that are much slower and resource intensive than a simple and standard restore from backup.
Large databases can be very difficult to backup / restore in a timely manner, possibly requiring additional spending on hardware to make it fast enough.
Your application(s) that use the database will be harder to maintain and test.
Any mistakes can be much more destructive, as a single mistake can mess up all of your clients at once.
You prevent the possible performance enhancement of low latency by forcing your database to a single location. E.g., overseas customers will be using slow, high-latency networks all the time.
You will be known as the stupid DBA, or the unemployed DBA, or maybe both.
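To make the CustomerID burden mentioned above concrete, here is a minimal sketch; the Order entity and query are hypothetical illustrations, not from the question:

using System.Collections.Generic;
using System.Linq;

// Hypothetical shared-database model: every tenant's rows live in the
// same Orders table, distinguished only by CustomerId.
public enum OrderStatus { Open, Closed }

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }   // tenant discriminator, repeated on every table
    public OrderStatus Status { get; set; }
}

public static class OrderQueries
{
    // The CustomerId predicate is mandatory on EVERY query; omitting it
    // even once silently returns other customers' data.
    public static List<Order> GetOpenOrders(IQueryable<Order> orders, int customerId)
    {
        return orders
            .Where(o => o.CustomerId == customerId)
            .Where(o => o.Status == OrderStatus.Open)
            .ToList();
    }
}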
There are some advantages to a shared database design though.
Common table schemas, code tables, stored procs, etc. need only be maintained and stored in 1 location.
Licensing costs may be reduced in some cases.
Some maintenance is easier, although maintenance is almost certainly worse overall with the combined approach.
If all/most of your clients are very small, keeping each on its own server means low resource utilization (i.e., a relatively high cost). You can mitigate the high cost by combining clients with their permission and explicit understanding, but still use separate databases for larger clients. You definitely need to be explicit and up-front with your clients in this situation.
Except for the server cost sharing, this is still a very bad idea - but cost can be a very important aspect too. This is really the only justification for this approach - avoid it if at all reasonable. Maybe you would be better off charging a little more for your product, or simply not supporting tiny customers at a cheap price.
Reading an analysis of the recent Atlassian outage reveals that this mistake is precisely why they are having such trouble recovering.
There is a problem, though:
Atlassian can, indeed, restore all data to a checkpoint in a matter of hours. However, if they did this, while the impacted ~400 companies would get back all their data, everyone else would lose all data committed since that point.
So now each customer’s data needs to be selectively restored.
Atlassian has no tools to do this in bulk.
The article also makes it clear that some customers are already migrating away from Atlassian for their OpsGenie product, and will certainly lose future business too. At a minimum, this will be a large problem for their business.
They also messed up big-time by ignoring the customer during this outage.
I'm assuming that by multiple customers you're not just storing customer information, you're hosting databases for an application for the customers, like CRM systems.
If so, then I would absolutely not store everything in the same database.
Reasons:
Backup: when one customer calls and says he needs a backup restored because an intern managed to clean out the production database instead of the test database, you do not want to have to deal with all the other customers at the same time.
Security: even with a bug in the application, it won't be able to reach data for other customers. Also, consider a customer that is a bit too relaxed in their own security practices and leaks passwords or whatnot to the system: if hackers discover a way into that customer's database, consider the fallout if that also includes all the other customers you're hosting for.
Politics: some customers will not allow mixing their data with other customers' even if you can 100% guarantee that access to their data won't be (accidentally) given to other customers.
So bottom line: separate databases.
One day your developer will screw up something and one customer will access another customer's info. You will lose your customers as a result. This alone should tell you that multiple customers can't be in one database. No one will want to be your customer if they know this.
Do I really have to go over all the issues that will eventually happen if this is the case? The answer is simple here - NO. You don't want to have information from multiple customers in the same database.
The only time this makes sense is if you have a multiplexer database to keep track of customer logons, sessions, etc. But the data used and stored by customers should be in a dedicated database per customer (see the sketch below).
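A minimal sketch of that multiplexer idea, assuming a hypothetical catalog keyed by customer (in practice the dictionary below would be a table in the shared logon database):

using System.Collections.Generic;

// Hypothetical sketch: the shared "multiplexer" database only maps a
// customer to its dedicated database; all customer data lives there.
public class CustomerCatalog
{
    // In practice this would be a table in the shared catalog database.
    private readonly Dictionary<string, string> _connectionStrings =
        new Dictionary<string, string>
        {
            ["acme"]   = "Server=db1;Database=Acme_Prod;Integrated Security=true;",
            ["globex"] = "Server=db2;Database=Globex_Prod;Integrated Security=true;"
        };

    // The application opens the customer's dedicated database with this.
    public string GetConnectionString(string customerKey)
    {
        return _connectionStrings[customerKey];
    }
}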
Some of the advantages to each approach to be considered are:
Single Database
Relating data from different services can be bound together by foreign key constraints
Analytic extracts are simpler to write and faster to execute
In the event of a disaster, restoring the platform to a consistent state is easier
For data that is referenced by multiple services, data cached by one service is likely to be used soon after by another service
Administration and monitoring are simpler and cheaper up front
Multiple Databases
Maintenance work, hardware problems, security breaches and so forth do not necessarily impact the whole platform
Assuming each database is on separate hardware, scaling up multiple machines yields more performance benefits than scaling up one big one
I am assessing the feasibility of mapping data from a proprietary DB (a "Case Management System") to a database that serves as the data source for an automated online form filling product I have created. One proprietary case management system I am targeting is written in Advantage Database Server, from what I read a very old product; the other is in MS Access. My product is written in C#.
There are a plethora of issues, and more than one person has advised me it is not feasible. My goal would be to offer my form filling product to work with the client's existing DB. Replacing the customer's DB would be easier, of course, but these are systems clients have paid a lot of money for and learned how to use, and I would expect the odds of getting them to discontinue them for my DB to be close to zero. Like I said, a plethora of issues that include:
1. Is the ability to query the data in the proprietary product "locked down" - how difficult is it to work around?
2. The fact that the customer might potentially be violating the existing license by allowing data to flow to another "product".
3. The possibility that the existing proprietary DB does not include the fields/data I need to complete the online forms.
4. Getting a prospective customer to let me poke around their DB.
Any help in thinking this through would be MOST appreciated.
Here are some things that I would think about on a project like this.
1. Is ability to query the data in the proprietary product "locked down" - how difficult is it to work around
If the underlying database engines are Advantage Database Server and MS Access, there really isn't much difficulty in getting at the data from a technical perspective. You would just need a user account for the database (probably read-only) and the ability to see the database server from wherever you are accessing the data (see the sketch below).
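As a hedged sketch of what that could look like for the MS Access case from C# (file path, table, and column names are hypothetical; Advantage ships its own ADO.NET/ODBC providers that work along similar lines):

using System;
using System.Data.OleDb;

// Hedged sketch: reading the customer's MS Access file read-only from C#.
class AccessReader
{
    static void Main()
    {
        var connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;" +
                      @"Data Source=C:\CaseData\cases.accdb;Mode=Read;";
        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand("SELECT CaseId, ClientName FROM Cases", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader["CaseId"], reader["ClientName"]);
            }
        }
    }
}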
I’d see the difficulty more on the network and data security side of the problem. Questions to consider are:
Where is your software being run and how will it get access to those databases?
If it is something that is installed on their local network it would be less of a problem. If it is intended to be external, there would need to be considerations made for the network security policies of your client.
Is any of the data that is being queried getting stored somewhere else?
If so, there are considerations for chain of custody of the data depending on if it gets stored somewhere else, and where that other storage location is.
2. Fact that customer might be potentially violating the existing license by allowing data to flow to another "product"
Not sure here. Really dependent on the specific software being run, and its license.
3. Possibility that existing proprietary DB does not include the fields/data I need to complete the online forms.
Is the database schema proprietary to the customer, or some other vendor that the customer has bought the product from? If it is made by another vendor you may be able to install a test version of the software, or may be able to find the documentation of the schema (though probably not likely). Not sure if this would be within the rules of the software license though.
4. Getting prospective customer to let me poke around their DB.
This is really dependent on what rules they have on the data that is hosted in the proprietary database. It might be fine if they already have a process for consultants that help with cases. I would wonder what implications there are if any of the data has special rules associated with it (like HIPAA).
Background
I am developing a C# winforms application - currently up to about 11,000 LOC; the UI and logic are about 75% done but there is no persistence yet. There are hundreds of attributes on the forms. There are 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection so we need a solution that maintains a database locally and keeps it in synch with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other user's records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data synch, each entity would have an integer primary key ID. Additionally it would have a secondary ID column which relates to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts. (A sketch of the two-ID layout follows.)
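A minimal sketch of the proposed two-ID layout (entity and property names are hypothetical):

// Hypothetical entity illustrating the two-ID scheme described above.
public class Customer
{
    // Primary key within whichever database this row lives in.
    public int Id { get; set; }

    // Secondary ID relating a local row to the central database.
    // Populated in the local databases; always null in the central one.
    public int? CentralId { get; set; }

    public string Name { get; set; }
}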
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data synchronisation, and if so, would they work with the Entity Framework?
Synching data between relational databases is a pain. Your best course of action is probably dependent on: how many users will there be? How probable are conflicts (i.e., that the users will work offline on the same data)? Also, possibly, what kind of manpower do you have (do you have proper DBAs/SQL Server devs standing by to assist with the SQL part, or are you just .NET devs)?
I don't envy you this task, it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out, what is the correct order of operations to perform on the remote db.
One thought - it might be very wrong and crazy, but just the thing that popped into my mind - would be to use JSON Patch on the entities, be it actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements, then applied to the local db, and when the user is online - submitted - with timestamps! - to a merge provider. The JSON Patch statements from different clients could be grouped by the entity id and sorted by timestamp, and the user could get feedback on what other operations from different users are queued - and manually make amends to it. Those grouped statements could even be stored in files in a git repo. Then at some pre-defined intervals, or triggered manually, the update would be performed by a server-side app and saved to the remote db. After this the users' local copies would be refreshed from the server.
It's just a rough idea, but I think that you need something with similar capability - it doesn't have to be JSON Patch + Git; you can do it in probably hundreds of ways. I don't think, though, that you will get away with just going through the local/remote db and making updates/merges. Imagine the scenario where one user updates some data (let's say, 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the synch process do? Apply the earlier and then the later changes? I'm fairly certain that both users would be furious, because their input was 'atomic' - either everything is changed, or nothing is. The later 'commit' must either be rejected, or users should have an option to amend it in respect of the new data. That highly depends on what your data is, and as I said - on the number/behaviour of users. Duh, even time-zones become important here - if your users are all in one time-zone you might get away with having predefined times of day when the system synchs - but no way you'll convince people with many different business hours that the 'synch session' will happen at e.g. 11 AM, when they are usually giving a presentation to management or sth ;)
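As a hedged sketch of the JSON Patch idea, assuming the Microsoft.AspNetCore.JsonPatch package (the Customer type and values are hypothetical, not from the question):

using Microsoft.AspNetCore.JsonPatch;   // NuGet: Microsoft.AspNetCore.JsonPatch
using Newtonsoft.Json;

public class Customer
{
    public int Id { get; set; }
    public string Email { get; set; }
}

public static class PatchDemo
{
    public static void Main()
    {
        // Record the user's offline edit as a patch, not as a row update.
        var patch = new JsonPatchDocument<Customer>();
        patch.Replace(c => c.Email, "new@example.com");

        // The serialised patch can be queued with a timestamp and entity id,
        // then shipped to the merge provider when the user is online.
        string queued = JsonConvert.SerializeObject(patch);

        // Server side: replay the patch against the central copy.
        var central = new Customer { Id = 1, Email = "old@example.com" };
        JsonConvert.DeserializeObject<JsonPatchDocument<Customer>>(queued)
                   .ApplyTo(central);
    }
}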
I am working on an inventory app using C# and the Entity Framework code-first approach.
One of the design requirements is that user should be able to create multiple companies and each company should have a full set of inventory master tables.
For example each company should have its own stock journal and list of items. There would also be a way to combine these companies in future to form like a 'group' company, essentially merging the data.
Using one of the file-based RDBMSs like SQLite, it's very simple: I would just need to create a separate SQLite database for each company and then a master database to tie it all together. However, how should I go about doing it in a single database file, not multiple file databases?
I do not want to have a 'company' column on every table!
The idea that I had, given my limited knowledge of DBs, is to separate using different schemas: one schema for each company with the same set of tables in each schema, with a separate schema holding the common tables and the tables that tie the other schemas together. Is that a good approach? Because I am having a hard time finding a way to 'dynamically' create schemas using EF and code first.
Edit #1
To get an idea of the number of companies, one enterprise has about 4-5 companies, and each financial year the old companies are closed off and a fresh set of companies created. It is essentially good to maintain data for multiple years in the same file but it is not required as long as I can provide a separate module to load data for several years, from several of the db files to facilitate year on year analysis.
As far as size of individual companies data, it can hit the GB mark per company.
The schema changes quite frequently, at least at the table level, as it will be completely customizable by the user.
I guess one aspect that drives my question is the implementation of this design. If it is an app with a discrete desktop interface and implementation, and I have my own RDBMS server like SQL Server, the number of databases does not matter that much. However, for a web-based UI hosted by a third party and using their database server, the number of databases available will be limited. The only solution to that would be to use a serverless database like SQLite.
But as far as general advice goes, SQLite is not advised for large enterprise class databases.
You've provided viable solutions, and even some design requirements, but it's difficult to advise "what's best" without knowing the base requirements like:
How many companies now - and can be reasonably expected in the future
How many tables per instance
How many records per 'large' table, per company
How likely things are to change frequently, data schema-wise
With that in mind, off to some general opinion on your solutions. First off, considering the design requirements, it would make sense to consider using separate databases per company. This would separate your data and allow, for example, roles and security quite easily to be defined on a database level. Considering you explicitly mention you could "make it simple" using this approach, you could just create a database (file) per company. With your data access layer going through Entity Framework you could also easily change connection strings between databases, and even merge data from A=>B using this. I see no particular reason, besides a possible risk in maintaining and updating different instances, why this shouldn't be a solution to consider.
On the other hand, the one-big-database-for-all approach isn't bad by definition either. The domain of maintenance becomes more compact and easily approachable. One way to separate data is to use different database schemas, as you suggest yourself. However, database schemas are primarily intended to separate accessibility on a role basis. For example, a backoffice employee (a user role) should only communicate with the "financial" schema, whilst the dbo can talk to pretty much anything. You could extend this approach to a company basis, seeing a company as a "user", but think of the number of tables you would get as you create more and more companies. This would make your database huge. Therefore, in my opinion, not the best approach.
Finally, I'm intrigued by your statement "I do not want to have a 'company' column on every table". In my opinion, you should consider this as well. A discriminator property like a companyId column on several tables is pretty easy to abstract using Entity Framework (or any ORM for that matter). This is what the concept of foreign keys is all about. Also, it would give you the advantage of indexing this column for performance. Your only consideration in this approach would be to make sure you provide this 'company discriminator' on all relevant tables.
The latter would be quite simple to enforce using EF Code First if you use a contract for each separate data class to implement:
interface IMyTableName
{
    int CompanyId { get; set; }
}
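Going a step beyond the answer itself, here is a hedged sketch assuming EF Core's global query filters (HasQueryFilter), which apply the discriminator once instead of in every query; the entity and context are hypothetical:

using Microsoft.EntityFrameworkCore;

public class StockJournal : IMyTableName
{
    public int Id { get; set; }
    public int CompanyId { get; set; }
}

public class InventoryContext : DbContext
{
    public int CurrentCompanyId { get; set; }

    public DbSet<StockJournal> StockJournals { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Every query on StockJournal is automatically filtered by company.
        modelBuilder.Entity<StockJournal>()
                    .HasQueryFilter(s => s.CompanyId == CurrentCompanyId);

        // Index the discriminator for performance, as suggested above.
        modelBuilder.Entity<StockJournal>()
                    .HasIndex(s => s.CompanyId);
    }
}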
Just my quick thoughts, though.
I agree with Moriarty for the most part. Our company chose the one database per company approach, and we're paying for it every time we want to do a schema change. Since our deployments are automated, they should all be the same, but there are small differences each time. Moreover, these databases are really independent, so it's hard to keep our backups in sync as well.
It has been painful working with all these databases. The only plus side is that we can spread them out over multiple servers to increase performance. So I'm going to cast my vote for the one big database design.
I can't decide whether to keep the help desk application in the same database as the rest of the corporate applications or completely separate it.
The help desk application can log support requests from a phone call, email, or the website.
We can get questions sent to us from registered customers and non-registered customers.
The only reason to keep the help desk application in the same database is so that we can share the user base. But then again we can have the user create a new account for support or sync the user accounts with the help desk application.
If we separate the help desk application, our database backup will be smaller. Or we can just keep the help desk application in the same database, which makes development/integration a lot easier overall, having only one database to back up. (Maybe larger, but still one database with everything.)
What to do?
I think this is a subjective answer, but I would keep the help desk system as a separate entity, unless there is a good business reason to use the same user base.
This is mostly based on what I've seen in professional helpdesk call logging/ticket software, but I do have another compelling reason - security. The logic is as follows:
Generally, a helpdesk ticketing system needs less sensitive information than other business systems (accounting, shopping, CRM, etc.). Your technicians will likely need to know how to contact a customer, but probably won't need to store full addresses, birth dates, etc. All of the following is based on an assumption: that your existing customer data contains sensitive or personally identifiable data that would not be needed by your ticketing system.
Principle 1: Reducing the attack surface area by limiting the stored data. Generally, I subscribe to the principle that you should ONLY collect the data you absolutely need. Having less sensitive information available means less that an attacker can steal.
Principle 2: Reducing the surface area by minimizing avenues of attack into existing sensitive data. Assuming you already have a large user base, and assuming that you're already storing potentially useful data about your customers, adding another application with hooks into that data is just adding further avenues of attack into the existing customer base. This leads me to...
Principle 3: Least privilege. The user you set up for the helpdesk software database should have access ONLY to the data absolutely needed by your helpdesk analysts. Accomplishing this is easier if you design your database with a specific set of needs in mind. It's a lot more difficult from a maintenance standpoint to have to set up views and stored procedures over sensitive data in order to only allow access to the non-sensitive data than it is to have a database designed to have only the data that you need.
Of course, I may be over-thinking it. And there are other compelling reasons for going either route. I'm just trying to give you something to think about.
This will definitely be a subjective answer based upon your environment. You have to weigh the benefits/drawbacks of one choice with the benefits/drawbacks of the other choice. However, my opinion would be that the best benefits will be found in separating the two databases. I really don't like to have one database with two purposes. Instead look to create a database with one purpose only. Here are the benefits I see to doing this:
Portability - if you decide to move the helpdesk to a different server, you can without issue. The same is true if you want to move the corporate database somewhere else
Separation of concerns - each database is designed for its own purpose. The security of one won't interfere with the security of the other.
Backup policies - Currently, you can only have one backup policy for both systems since they are in the same database. If you split them, you could back up one more often than the other (and the backup would be smaller/faster).
The drawbacks I see (not being able to access the corporate data as easily) actually come out as a positive in my mind. Accessing the data from the corporate database sounds good but it can be a security issue (also a maintainability issue). Instead, this way you can limit how much access (and what type of access) is granted to the helpdesk system. Databases can access each other fairly easily so it won't be that inconvenient and it will allow you to add a nice security barrier between your corporate data and your helpdesk data.
I have a C# application that allows one user to enter information about customers and job sites. The information is very basic.
Customer: Name, number, address, email, associated job site.
Job Site: Name, location.
Here are my specs I need for this program.
No limit on amount of data entered.
Single user per application. No concurrent activity or multiple users.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
Allows for user queries to display customers based on different combinations of customer information/job site information.
The data will never be viewed or manipulated outside of the application.
The program will be running almost always, minimized to the task bar.
Startup time is not very important, however I would like the queries to be considerably fast.
This all seems to point me towards a database, but a very lightweight one. However I also need it to have no limitations as far as data storage. If you agree I should use a database, please let me know what would be best suited for my needs. If you don't think I should use a database, please make some other suggestions on what you think would be best.
My suggestion would be to use SQLite. You can find it here: http://sqlite.org/. And you can find the C# wrapper version here: http://sqlite.phxsoftware.com/
SQLite is very lightweight and has some pretty powerful stuff for such a lightweight engine. Another option you can look into is Microsoft Access.
You're asking the wrong question again :)
The better question is "how do I build an application that lets me change the data storage implementation?"
If you apply the repository pattern and properly interface it, you can build interchangeable persistence layers. So you could start with one implementation and change it as needed without re-engineering the business or application layers.
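For illustration, a minimal sketch of such a repository interface (the entity and members are hypothetical, based on the customer/job-site data described in the question):

using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string JobSite { get; set; }
}

// The application codes against this interface only; an XML-backed,
// SQLite-backed, or SQL Express-backed class can each implement it.
public interface ICustomerRepository
{
    Customer GetById(int id);
    IEnumerable<Customer> FindByJobSite(string jobSite);
    void Save(Customer customer);
}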
Once you have a repository interface you could try implementations in a lot of different approaches:
Flat File - You could persist the data as XML, and provided that it's not a lot of data you could store the full contents in-memory (just read the file at startup, write the file at shutdown). With in-memory XML you can get very high throughput without concern for database indexes, etc.
Distributable DB - SQLite or SQL Compact work great; they offer many DB benefits, and require no installation
Local DB - SQL Express is a good middle-ground between a lightweight and full-featured DB. Access, when used carefully, can suffice. The main benefit is that it's included with MS Office (although not installed by default), and some IT groups are more comfortable having Access installed on machines than SQL Express.
Full DB - MySql, SQL Server, PostGreSQL, et al.
Given your specific requirements I would advise you towards an XML-based flat file--with the only condition being that you are OK with the memory-usage of the application directly correlating to the size of the file (since your data is text, even with the weight of XML, this would take a lot of entries to become very large).
Here's the pros/cons--listed by your requirements:
Cons
No limit on amount of data entered.
Using in-memory XML would mean your application would not scale. It could easily handle a 10 MB data file, and 100 MB shouldn't be an issue (unless your system is low on RAM); above that you have to seriously question "can I afford this much memory?".
Pros
Single user per application. No concurrent activity or multiple users.
XML can be read into memory and held by the process (AppDomain, really). It's perfectly suited for single-user scenarios where concurrency is a very narrow concern.
Allow user entries/data to be exported to an external file that can be easily shared between applications/users.
XML is perfect for exporting, and also easy to import to Excel, databases, etc...
Allows for user queries to display customers based on different combinations of customer information/job site information.
Linq-to-XML is your friend :D (see the sketch after this list)
The data will never be viewed or manipulated outside of the application.
...then holding it entirely in-memory doesn't cause any issues
The program will be running almost always, minimized to the task bar.
so loading the XML at startup, and writing at shutdown, will be acceptable (if the file is very large it could take a while)
Startup time is not very important, however I would like the queries to be considerably fast
Reading the XML would be relatively slow at startup; but when it's loaded in-memory it will be hard to beat. Any given DB will require that the DB engine be started, that interop/cross-process/cross-network calls be made, that the results be loaded from disk (if not cached by the engine), etc...
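To illustrate the in-memory XML approach above, a hedged sketch (file name, element, and attribute names are hypothetical):

using System.IO;
using System.Linq;
using System.Xml.Linq;

class CustomerStore
{
    private readonly string _path;
    private readonly XDocument _doc;

    public CustomerStore(string path)
    {
        _path = path;
        _doc = File.Exists(path)
            ? XDocument.Load(path)                        // read once at startup
            : new XDocument(new XElement("Customers"));
    }

    public void Add(string name, string jobSite)
    {
        _doc.Root.Add(new XElement("Customer",
            new XAttribute("Name", name),
            new XAttribute("JobSite", jobSite)));
    }

    // Linq-to-XML query: names of all customers at a given job site.
    public string[] ByJobSite(string jobSite)
    {
        return _doc.Root.Elements("Customer")
            .Where(c => (string)c.Attribute("JobSite") == jobSite)
            .Select(c => (string)c.Attribute("Name"))
            .ToArray();
    }

    public void Save()
    {
        _doc.Save(_path);                                 // write at shutdown
    }
}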
It sounds to me like a database is 100% what you need. It offers both the data storage, data retrieval (including queries) and the ability to export data to a standard format (either direct from the database, or through your application.)
For a light database, I suggest SQLite (pronounced 'SQL Lite' ;) ). You can google for tutorials on how to set it up, and then how to interface with it via your C# code. I also found a reference to this C# wrapper for SQLite, which may be able to do much of the work for you!
How about SQLite? It sounds like it is a good fit for your application.
You can use System.Data.SQLite as the .NET wrapper.
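A hedged sketch of basic System.Data.SQLite usage (table and column names are hypothetical):

using System.Data.SQLite;   // NuGet: System.Data.SQLite

class SqliteDemo
{
    static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=customers.db"))
        {
            conn.Open();

            using (var create = new SQLiteCommand(
                "CREATE TABLE IF NOT EXISTS Customer (Id INTEGER PRIMARY KEY, Name TEXT)",
                conn))
            {
                create.ExecuteNonQuery();
            }

            using (var insert = new SQLiteCommand(
                "INSERT INTO Customer (Name) VALUES (@name)", conn))
            {
                insert.Parameters.AddWithValue("@name", "Acme");
                insert.ExecuteNonQuery();
            }
        }
    }
}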
You can get SQL Server Express for free. I would say the question is not so much why should you use a database, more why shouldn't you? This type of problem is exactly what databases are for, and SQL Server is a very powerful and widely used database, so if you are going to go for some other solution you need to provide a good reason why you wouldn't go with a database.
A database would be a good fit. SQLite is good as others have mentioned.
You could also use a local instance of SQL Server Express to take advantage of improved integration with other pieces of the Microsoft development stack (since you mention C#).
A third option is a document database like Raven, which may fit from the sound of your data.
edit
A fourth option would be to try Lightswitch when the beta comes out in a few days. (8-23-2010)
/edit
There is always going to be a limitation on data storage (the empty space of the hard disk). According to Wikipedia, SQL Server Express 2008 R2 is limited to 10 GB per database.