How to implement a C# Winforms database application with synchronisation? - c#

Background
I am developing a C# WinForms application - currently up to about 11000 LOC; the UI and logic are about 75% done, but there is no persistence yet. There are hundreds of attributes on the forms. There are 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection so we need a solution that maintains a database locally and keeps it in synch with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other users' records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data synch, each entity would have an integer primary key ID. Additionally it would have a secondary ID column which relates to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts.
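For illustration, a minimal sketch of what such an entity might look like (the property names are mine, not a prescription):

// Sketch of an entity carrying both a local primary key and the ID it maps to
// in the central database. CentralId stays null in the central DB itself and is
// populated in the local copies during synchronisation.
public class Customer
{
    public int Id { get; set; }          // local integer primary key
    public int? CentralId { get; set; }  // ID of the corresponding central record
    public string Name { get; set; }
    // ... the other attributes mapped by Entity Framework
}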
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data synchronisation, and if so, would these work with the Entity Framework?

Synching data between relational databases is a pain. Your best course of action is probably dependent on: how many users will there be? How probable are conflicts (i.e. that the users will work offline on the same data)? Also, possibly, what kind of manpower do you have (do you have proper DBAs/SQL Server devs standing by to assist with the SQL part, or are you just .NET devs)?
I don't envy you this task, it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out the correct order of operations to perform on the remote db.
One thought - it might be very wrong and crazy, but just the thing that popped into my mind - would be to use JSON Patch on the entities, be it actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements, then applied to the local db, and when the user is online - submitted - with timestamps! - to a merge provider. The JSON Patch statements from different clients could be grouped by the entity id and sorted by timestamp, and the user could get feedback on what other operations from different users are queued - and manually make amends to it. Those grouped statements could even be stored in files in a git repo. Then at some pre-defined intervals, or triggered manually, the update would be performed by a server-side app and saved to the remote db. After this the users' local copies would be refreshed from the server.
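To make that a bit more concrete, here is a very rough sketch of the recording side of the idea (the ChangeRecord/ChangeLog names and the plain-JSON patch strings are just illustrations, not a prescription):

using System;
using System.Collections.Generic;
using System.Linq;

// Every offline edit becomes a record: which entity, what changed (as an RFC 6902
// JSON Patch document), and when - so the merge provider can order operations later.
public class ChangeRecord
{
    public Guid EntityId { get; set; }         // which entity was touched
    public string EntityType { get; set; }     // e.g. "Account"
    public string PatchJson { get; set; }      // e.g. [{"op":"replace","path":"/Name","value":"New"}]
    public DateTime TimestampUtc { get; set; } // used to order merges
}

public class ChangeLog
{
    private readonly List<ChangeRecord> _pending = new List<ChangeRecord>();

    // Called whenever the user edits an entity while offline.
    public void Record(string entityType, Guid id, string patchJson)
    {
        _pending.Add(new ChangeRecord
        {
            EntityType = entityType,
            EntityId = id,
            PatchJson = patchJson,
            TimestampUtc = DateTime.UtcNow
        });
    }

    // Called when the client comes online: hand pending changes, oldest first,
    // to whatever submits them to the merge provider.
    public IEnumerable<ChangeRecord> Drain()
    {
        var copy = _pending.OrderBy(c => c.TimestampUtc).ToList();
        _pending.Clear();
        return copy;
    }
}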
It's just a rough idea, but I think that you need something with similar capability - it doesn't have to be JSON Patch + Git, you can do it in probably hundreds of ways. I don't think, though, that you will get away with just going through the local/remote db and making updates/merges. Imagine the scenario where a user updates some data (let's say, 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the synch process do? Apply the earlier and then the later changes? I'm fairly certain that both users would be furious, because their input was 'atomic' - either everything is changed, or nothing is. The latter 'commit' must be either rejected, or users should have an option to amend it in respect of the new data. That highly depends on what your data is, and as I said - what the number/behaviour of users will be. Duh, even time-zones become important here - if you have users all in one time-zone you might get away with having predefined times of day when the system synchs - but no way you'll convince people with many different business hours that the 'synch session' will happen at e.g. 11 AM, when they are usually giving a presentation to management or sth ;)

Related

Connecting to a horizontally scaled SQL database in different scenarios

I have built a web service in C#/WebApi2. It is completely REST based, and scales horizontally very easily with a load balancer in front of it since it has no state itself.
However, I'm looking for info/solutions on how to handle database scalability, and I would like to start without focusing on any particular technology; more specifically, I would like to use the Dapper ORM in combination with multiple DBs if possible.
For example, I can connect to PostgreSQL using Dapper and the Npgsql ADO.NET driver, but are there components which handle the case of having one master PostgreSQL database and four slaves to read from? Are there existing C# components that manage connections to all of these DBs and, depending on the operation, choose either the master for write actions or a slave for reads, load-balancing over the slaves? (Since the number of reads will be significantly higher than the writes, this would be a fairly good solution.)
What if I have a master - master situation? And what about similar setups with other DBs, such as MS SQL with AlwaysOn for example, or MySQL Cluster and its variations? Are there any components to handle this kind of thing, and if not, does anybody have pointers to documentation/lectures/blogs/tutorials on this topic? I cannot imagine I'm the first one to encounter this, and writing a completely custom-made connection pool might just be re-inventing the wheel...
I know it is a general question, but I have the feeling there should already be work done on this topic; I just can't find it. I know that in cloud scenarios (Azure and AWS) you have solutions for this as specific load balancers, but I would need this for an on-premise solution as well. Any info would be appreciated.
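For what it's worth, the kind of wrapper being asked about could be sketched roughly like this with Dapper and Npgsql (the connection strings and the naive round-robin policy are purely illustrative):

using System;
using System.Collections.Generic;
using System.Threading;
using Dapper;
using Npgsql;

// Routes writes to the master and spreads reads over the replicas.
public class ReadWriteRouter
{
    private readonly string _masterConnString;
    private readonly IReadOnlyList<string> _replicaConnStrings;
    private int _next;

    public ReadWriteRouter(string master, IReadOnlyList<string> replicas)
    {
        _masterConnString = master;
        _replicaConnStrings = replicas;
    }

    // Writes always go to the master.
    public int Execute(string sql, object param = null)
    {
        using (var conn = new NpgsqlConnection(_masterConnString))
        {
            return conn.Execute(sql, param);
        }
    }

    // Reads are load-balanced over the replicas with a simple round-robin.
    public IEnumerable<T> Query<T>(string sql, object param = null)
    {
        int index = Math.Abs(Interlocked.Increment(ref _next)) % _replicaConnStrings.Count;
        using (var conn = new NpgsqlConnection(_replicaConnStrings[index]))
        {
            return conn.Query<T>(sql, param);
        }
    }
}

This only covers routing; replication lag, failover and retry policies are exactly the parts a ready-made component would have to solve.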
One way to scale a database horizontally is to split your database into multiple databases - each having a different set of data. Something like this:
Meta database (that has info on user, etc)
- Database 1 (has data for first 100000 users)
- Database 2 (has data for next 100000 users)
- Database 3 (has data for next 100000 users)
Your API requests would route each query to the respective database based on info from the meta database.
This provides scalability but not availability. Many multi-tenant SaaS apps use this structure.
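A rough sketch of the routing step, assuming a hypothetical Users table in the meta database that stores each user's shard connection string:

using System.Data.SqlClient;

public static class ShardRouter
{
    // Look up which shard holds this user's data. Table and column names are hypothetical.
    public static string GetShardConnectionString(string metaConnString, int userId)
    {
        using (var meta = new SqlConnection(metaConnString))
        using (var cmd = new SqlCommand(
            "SELECT ShardConnectionString FROM Users WHERE UserId = @id", meta))
        {
            cmd.Parameters.AddWithValue("@id", userId);
            meta.Open();
            return (string)cmd.ExecuteScalar();
        }
    }
}

The caller then opens a connection to the returned shard and runs the actual query there.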
Some references:
http://jamesgolick.com/2010/3/30/what-does-scalable-database-mean.html
https://developer.salesforce.com/page/Multi_Tenant_Architecture

Pros/Cons Using multiple databases vs using single database

I need to design a Windows application which represents multiple "customers" in SQL Server. Each customer has the same data model, but each is independent.
What are the pros/cons of using multiple databases vs using a single database?
Which is the best way to do this? If going for a single database, what are the steps to do that?
Edited:
One thing: the database will be hosted in a cloud (Rackspace) account.
Do not store data from multiple customers in the same database -- I have known companies that had to spend a lot of time/effort/money fixing this mistake. I have even known clients to balk at sharing a database computer even though the databases are separate - on the plus side, these clients are generally willing to pay for the extra hardware.
The problems with security alone should prevent you from ever doing this. You will lose large customers because of this.
If you have some customers that are unwilling to upgrade their software, it can be very difficult if you share a single database. Separate databases allow customers to continue using the old database structure until they are ready to upgrade.
You are artificially limiting a natural data partition that could provide significant scalability to your solution. Multiple small customers can still share a database server, they just see their own databases/catalogs, or they can run on separate database servers / instances.
You are complicating your database design because you will have to distinguish customer data that would otherwise be naturally separated, i.e., having to supply CustomerID in every WHERE clause.
You are making your database slower by having more rows in all tables. You will use up database memory more rapidly because CustomerID is now part of every index, and fewer records can be stored in each index node. Your database is also slower due to the loss of the inherent advantage of locality of reference.
Data rollback for 1 customer can be very difficult, maybe even essentially impossible as the database grows - you will need custom procedures to do this that are much slower and resource intensive than a simple and standard restore from backup.
Large databases can be very difficult to backup / restore in a timely manner, possibly requiring additional spending on hardware to make it fast enough.
Your application(s) that use the database will be harder to maintain and test.
Any mistakes can be much more destructive as you can mess up all of your clients by a single mistake.
You prevent the possible performance enhancement of low latency by forcing your database to a single location. E.g., overseas customers will be using slow, high-latency networks all the time.
You will be known as the stupid DBA, or the unemployed DBA, or maybe both.
There are some advantages to a shared database design though.
Common table schemas, code tables, stored procs, etc. need only be maintained and stored in 1 location.
Licensing costs may be reduced in some cases.
Some maintenance is easier, although it is almost certainly worse overall with a combined approach.
If all/most of your clients are very small, you can end up with low resource utilization by not combining servers (i.e., a relatively high cost per client). You can mitigate the high cost by combining clients with their permission and explicit understanding, but still use separate databases for larger clients. You definitely need to be explicit and up-front with your clients in this situation.
Except for the server cost sharing, this is still a very bad idea - but cost can be a very important aspect too. This is really the only justification for this approach - avoid it if at all reasonable. Maybe you would be better off charging a little more for your product, or just not supporting tiny customers at a cheap price.
Reading an analysis of the recent Atlassian outage reveals that this mistake is precisely why they are having such trouble recovering.
There is a problem, though:
Atlassian can, indeed, restore all data to a checkpoint in a matter of hours.
However, if they did this, while the impacted ~400 companies would get back all their data, everyone else would lose all data committed since that point.
So now each customer’s data needs to be selectively restored.
Atlassian has no tools to do this in bulk.
The article also makes it clear that some customers are already migrating away from Atlassian for their OpsGenie product, and will certainly lose future business too. At a minimum, this will be a large problem for their business.
They also messed up big-time by ignoring the customer during this outage.
I'm assuming that by multiple customers you're not just storing customer information, you're hosting databases for an application for the customers, like CRM systems.
If so, then I would absolutely not store everything in the same database.
Reasons:
Backup: when one customer calls and says that he needs to restore a backup because an intern managed to clean out the production database instead of the test database, you do not want to have to deal with all the other customers at the same time.
Security: even with a bug in the application, it won't be able to reach data for other customers. Also, consider if one customer is a bit too relaxed in their own security and leaks passwords or whatnot for the system; if hackers discover a way into that customer's database, consider the fallout if that also includes all the other customers you're hosting for.
Politics: some customers will not allow mixing their data with other customers' data, even if you can 100% guarantee that access to their data won't be (accidentally) given to other customers.
So bottom line: separate databases.
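To illustrate what "separate databases" means on the application side, a sketch (the customer names, connection strings and lookup source are all assumptions - in practice this would come from configuration or a small meta database):

using System.Collections.Generic;
using System.Data.SqlClient;

// With one database per customer, the application only has to resolve the right
// connection string up front; the queries themselves need no CustomerID filtering.
public static class CustomerDatabases
{
    private static readonly Dictionary<string, string> ConnectionStrings =
        new Dictionary<string, string>
        {
            ["AcmeCorp"] = "Server=sql01;Database=Crm_AcmeCorp;Integrated Security=true",
            ["BetaLtd"]  = "Server=sql02;Database=Crm_BetaLtd;Integrated Security=true"
        };

    public static SqlConnection OpenFor(string customer)
    {
        var conn = new SqlConnection(ConnectionStrings[customer]);
        conn.Open();
        return conn;
    }
}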
One day your developer will screw something up and one customer will access another customer's info. You will lose your customers as a result. This alone should tell you that multiple customers can't be in one database; no one will want to be your customer if they know this.
Do I really have to go over all the issues that will eventually happen if this is the case? The answer is simple here - NO. You don't want to have information of multiple customers in the same database.
The only time this is acceptable is if you have a multiplexer database to keep track of customer logons, sessions, etc. But data used and stored by customers should be in a dedicated database.
Some of the advantages to each approach to be considered are:
Single Database
Relating data from different services can be bound together by foreign key constraints
Analytic extracts are simpler to write and faster to execute
In the event of a disaster, restoring the platform to a consistent state is easier
For data that is referenced by multiple services, data cached by one service is likely to be used soon after by another service
Administration and monitoring are simpler and cheaper up front
Multiple Databases
Maintenance work, hardware problems, security breaches and so forth do not necessarily impact the whole platform
Assuming each database is on separate hardware, scaling up multiple machines yields more performance benefits than scaling up one big one
Source

Spreading of business logic between DB and client

OK guys, another of my questions seems to be very widely asked and generic. For instance, I have an accounts table in my DB. On the client (a desktop WinForms app) I have the appropriate functionality to add a new account. Let's say in the UI it's a couple of textboxes and one button.
Another requirement is account uniqueness, so I can't add two identical accounts. My question is: should I check for the account's existence on the client (making some query and looking at the result), or make a stored procedure for adding a new account and check for the account's existence there? As for me, it seems better to just make a stored proc; there I can make any needed checks and, after all checks pass, add the new account. But there are pros and cons to that approach. For example, it will be very difficult to manage the language of messages that the stored proc should produce.
POST EDIT
I already have the database constraints, etc. The issue is how to handle the situation where a user is adding an account that already exists.
POST EDIT 2
Account uniqueness is exposed as just a tiny, simple example of business logic. My question is more about handling complicated business logic in that accounts domain.
So, how can I manage this misunderstanding?
I believe that my question is basic and has a proven solution. My tools are C#, .NET Framework 2.0. Thanks in advance, guys!
If the application is to be multi-user (i.e. not just a single desktop app with a single user, but a centralised DB with the app acting as clients, maybe on many workstations), then it is not safe to rely on the client (app) to check for things such as uniqueness, existence, free numbers etc., as there is a distinct possibility of change happening between calls (unless read locking is used, but this often becomes more of an issue than a help!).
There is the ability of course to pre-check and then re-check (pre at app level, re at DB level), but of course this gives extra DB traffic, so it depends on whether that is a problem for you.
When I write SPROCs that will return to an app, I always use the same framework - I include parameters for a return code and message and always populate them. Then I can use standard routines to call them and even add in the parameters automatically. I can then either display the message directly on failure, or use the return code to localise it as required (or automate a response). I know some DBs (like SQL Server) will return Return_Code parameters, but I implement my own so I can leave the inbuilt ones for serious system-based errors and unexpected failures. This also allows me to have my own numbering system for return codes (i.e. grouping them to match enums in the code and/or grouping by severity).
On web apps I have also used a different concept at times. For example, sometimes a request is made for a new account but multiple pages are required (a profile, for example). Here I often use a header table that generates a hidden user ID against the requested unique username, a timestamp and some way of recognising them (IP address etc). If after x hours it is not used, the header table deletes the row, freeing up the number (depending on the DB the number may never become usable again - this doesn't really matter as it is just used to keep the user data unique until the application is submitted) and the username. If completed correctly, then the records are simply copied across to the proper active tables.
//Edit - To Add:
Good point. But account uniqueness is just a very tiny simple sample.
What about more complex requirements for accounts in business logic?
For example, if I implement in just in client code (in winforms app) I
will go ok, but if I want another (say console version of my app or a
website) kind of my app work with this accounts I should do all this
logic again in new app! So, I'm looking some method to hold data right
from two sides (server db site and client side). – kseen yesterday
If the requirement is ever for multi-use, then it is best to separate it. Putting it into a separate Class Library project allows the DLL to be used by your WinForm, console program, service, etc. Although I would still prefer rock-face validation (DB level) as it is the closest point in time to any action and least likely to be gazumped.
The usual way is to separate into three projects. A Display Layer [DL] (your WinForms project/console/service/etc.), a Business Application Layer [BAL] (which holds all the business rules and calls to the DAL - it knows nothing about the display medium nor about the database technology), and finally the Data Access Layer [DAL] (this has all the database calls - this can be very basic with a method for insert/update/select/delete at SQL and SPROC level and maybe some classes for passing data back and forth). The DL references only the BAL, which references the DAL. The DAL can be swapped for each technology (say change from SQL Server to MySQL) without affecting the rest of the application, and business rules can be changed and set in the BAL with no effect on the DAL (the DL may be affected if new methods are added or display requirements change due to data changes etc.). This framework can then be used again and again across all your apps and is easy to make quite drastic changes to (like DB topology).
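A bare-bones sketch of that layering, with hypothetical names - the point is only the direction of the references (DL -> BAL -> DAL):

using System;

// Data Access Layer: knows about the database technology, nothing else.
public interface IAccountDal
{
    bool Exists(string accountCode);
    void Insert(string accountCode, string name);
}

// Business Application Layer: holds the rules; knows nothing about WinForms or SQL.
public class AccountService
{
    private readonly IAccountDal _dal;
    public AccountService(IAccountDal dal) { _dal = dal; }

    public void AddAccount(string accountCode, string name)
    {
        if (string.IsNullOrEmpty(accountCode))
            throw new ArgumentException("Account code is required.");
        if (_dal.Exists(accountCode))
            throw new InvalidOperationException("Account already exists.");
        _dal.Insert(accountCode, name);
    }
}

// Display Layer (WinForms, console, website...) references only the BAL,
// so the same rules are reused by every kind of front end.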
This type of logic is usually kept in code for easier maintenance (which includes testing). However, if this is just a personal throwaway application, do what is most simple for you. If it's something that is going to grow, it's better to put good practices in place now, to ease maintenance/change later.
I'd have an AccountsRepository class (for example) with an AddAccount method that did the insert / called the stored procedure. Using database constraints (as HaLaBi mentioned), it would fail on trying to insert a duplicate. You would then determine how to handle this issue (passing a message back to the UI that it couldn't add) in the code. This would allow you to put tests around all of this. The only change you make in the DB is to add the constraint.
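A sketch of such a repository (the table, column names and duplicate handling are assumptions; 2627 and 2601 are the SQL Server error numbers for unique constraint/index violations):

using System.Data.SqlClient;

public class AccountsRepository
{
    private readonly string _connString;
    public AccountsRepository(string connString) { _connString = connString; }

    // Returns false if the unique constraint rejected a duplicate account.
    public bool AddAccount(string accountCode, string name)
    {
        using (SqlConnection conn = new SqlConnection(_connString))
        using (SqlCommand cmd = new SqlCommand(
            "INSERT INTO Accounts (AccountCode, Name) VALUES (@code, @name)", conn))
        {
            cmd.Parameters.AddWithValue("@code", accountCode);
            cmd.Parameters.AddWithValue("@name", name);
            conn.Open();
            try
            {
                cmd.ExecuteNonQuery();
                return true;
            }
            catch (SqlException ex)
            {
                if (ex.Number == 2627 || ex.Number == 2601)
                    return false; // duplicate - let the UI show a friendly message
                throw;
            }
        }
    }
}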
Just my 2 cents on a Thursday morning (before my cup of green tea). :)
I think the answer - like many - is 'it depends'.
For sure it is a good thing to push logic as deeply as possible towards the database. This prevents bad data no matter how the user tries to get it in there.
This, in simple terms, results in applications that TRY - FAIL - RECOVER when attempting an invalid transaction. You need to check each call (stored proc, or triggered insert etc.) and IF something bad happens, recover from that condition. Usually something like telling the user an issue occurred, resetting the form or something, and letting them try again.
I think at a minimum, this needs to happen.
But, in addition, to make a really nice experience for the user, the app should also preemptively check certain data conditions ahead of time, and simply prevent the user from making bad inserts in the first place.
This is of course harder, and sometimes means double coding of business rules (one in the app, and one in the DB constraints), but it can make for a dramatically better user experience.
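For example, a hypothetical pre-check the UI could call before submitting - the DB constraint remains the real safety net for the race where someone else inserts the same code in between:

using System.Data.SqlClient;

public static class AccountChecks
{
    // Pre-check used by the UI to warn early; the unique constraint still
    // catches the case where another user inserts the same code concurrently.
    public static bool AccountCodeInUse(string connString, string accountCode)
    {
        using (SqlConnection conn = new SqlConnection(connString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT COUNT(*) FROM Accounts WHERE AccountCode = @code", conn))
        {
            cmd.Parameters.AddWithValue("@code", accountCode);
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }
}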
The solution is more of being methodical than technical:
Implement - "Defensive Programming" & "Design by Contract"
If the chances of a business rule being changed over time are very low, then apply the constraint at database level
Create a "validation or rules & aggregation layer (or class)" that will manage such conditions/constraints for entity and/or specific property
A much smarter way to do this would be to make a user-control for the entity and/or specific property (in your case the "Account-Code"), which would internally use the "validation or rules & aggregation layer (or class)"
This will allow you to ensure a "systematic-way-of-development" or a more "scalable & maintainable" application-architecture
If your application is a website then along with placing the validation on the client-side it is always better to have validation even in the business-layer or C# code as well
When ever a validation would fail you could implement & use a "custom-error-message" library, to ensure message-content is standard across the application
If the errors are raised from database itself (i.e., from stored-procedures), you could use the same "custom-error-message" class for converting the SQL Exception to the fixed or standardized message format
I know that this is all a bit too much, but it will always be good for the future.
Hope this helps.
As you should not depend on a specific storage provider (DB [MySQL, MSSQL, ...], flat file, XML, binary, cloud, ...) in a professional project, all constraints should be checked in the business logic (model).
The model shouldn't have to know anything about the storage provider.
Uncle Bob said something about architecture and databases: http://blog.8thlight.com/uncle-bob/2011/11/22/Clean-Architecture.html

SaaS application needs to export/backup data to individual customer sites

We have a cloud based SaaS application and many of our customers (school systems) require that a backup of their data be stored on-site for them.
All of our application data is stored in a single MS SQL database. At the very top of the "hierarchy" we have an "Organization". This organization represents a single customer in our system. Each organization has many child tables/objects/data. Each having FK relationships that ultimately end at "Organization".
We need a way to extract a SINGLE customer's data from the database and bundle it in some way so that it can be downloaded to the customers site. Preferably in a SQL Express, SQLite or an access database.
For example: Organization -> Skill Area -> Program -> Target -> Target Data are all tables in the system. Each one linking back to the parent by a FK. I need to get all the target data, targets, programs and skill areas per organization and export that data.
Does anyone have any suggestions about how to do this within SQL Server, a C# service, or a 3rd-party tool?
I need this solution to be easy to replicate for each customer who wants this feature "turned on"
Ideas?
I'm a big fan of using messaging to propagate data at the moment, so here's a message based solution that will allow external customers to keep a local, in sync copy of the data which you provide on the web.
The basic architecture would be an online, password secured and user specific list of changes which have occurred in the system.
At the server side this list would be appended to any time there was a change to an entity which is relevant to the specific customer.
At the client side, an application would run which checks the list of changes for any it hasn't yet received and then applies them to its local database (in the order they occurred).
There are a bunch of different ways of doing the list-based component of the system, but my gut feeling is that you would be best to use something like RSS to do this.
Below is a practical scenario of how this could work:
A new skill area is created for organisation "my org"
The skill is added to the central database and associated with the "my org" record
A SkillAreaExists event is also added at the same time to the "my org" RSS with JSON or XML data specifying the properties of the new skill area
A new program is added to the skill area that was just created
The program is added to the central database and associated with the skill area
A ProgramExists event is also added at the same time to the "my org" RSS with JSON or XML data specifying the properties of the new program
A SkillAreaHasProgram event is also added at the same time to the "my org" RSS with JSON or XML data specifying an identifier for the skill area and program
The client agent checks the RSS feed and sees the new messages and processes them in order
When the SkillAreaExists event is processed a new Skill area is added to the local DB
When the ProgramExists event is processed a new Program is added to the local DB
When the SkillAreaHasProgram event is processed the program is linked to the skill area
This approach has a whole bunch of benefits over traditional point-in-time replication:
It's online; a consumer of this can get real-time updates if required.
Consistency is maintained by order; at any point in the event stream, if you stop receiving events you still have a local DB which accurately reflects the central DB as at some point in time.
It's diff-based; you only need to receive changes.
It's auditable; you can see what has actually happened, not just the current state.
It's easily recoverable; if there's a data consistency issue you can rebuild the entire DB by replaying the event stream.
It allows for multiple consumers; lots of individual copies of the clients' info can exist and function autonomously.
We have had a great deal of success with these techniques for replicating data between sites especially when they are only sometimes online.
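A minimal sketch of the client-side loop, using hypothetical event names matching the scenario above (the feed reader and the local-DB calls are left as placeholders):

using System;
using System.Collections.Generic;

// Hypothetical shape of an entry in the per-organisation change feed.
public class ChangeEvent
{
    public long Sequence { get; set; }      // position in the feed
    public string EventType { get; set; }   // e.g. "SkillAreaExists", "ProgramExists"
    public string PayloadJson { get; set; } // properties of the new entity
}

public class FeedConsumer
{
    private long _lastApplied;              // persisted locally between runs

    public void ApplyNewEvents(IEnumerable<ChangeEvent> feed)
    {
        foreach (ChangeEvent evt in feed)
        {
            if (evt.Sequence <= _lastApplied)
                continue;                   // already applied on a previous run

            switch (evt.EventType)
            {
                case "SkillAreaExists":
                    // insert a skill area row in the local DB from PayloadJson
                    break;
                case "ProgramExists":
                    // insert a program row in the local DB
                    break;
                case "SkillAreaHasProgram":
                    // link the program to the skill area locally
                    break;
            }
            _lastApplied = evt.Sequence;
        }
    }
}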
While there are some very interesting enterprise solutions that have been suggested, I think my approach would be to develop a plain old scheduled backup solution that simply exports the data for each organisation with a stored procedure or just a number of select statements.
Admittedly you'll have to keep this up to date as your database schema changes, but if this is a production application I can't imagine that happens very drastically.
There are any number of technologies available to do this, be it SSIS, a custom windows service, or even something as rudimentary as a scheduled task that kicks off a stored procedure from the command line.
The format you choose to export to is entirely up to you and should probably be driven by how the backup is intended to be used. I might consider writing data to a number of CSV files and zipping the result such that it could be imported into other platforms should the need arise.
Other options might be to copy data across to a scratch database and then simply create a SQL backup of that database.
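To make the CSV idea concrete, a rough sketch that dumps one table per organisation (it assumes each exported table, or a view over it, carries an OrganizationId column; the names and the naive CSV escaping are illustrative):

using System.Data.SqlClient;
using System.IO;

public static class OrgExporter
{
    public static void ExportTable(string connString, int organizationId,
                                   string tableName, string outputFolder)
    {
        // NOTE: tableName comes from our own fixed list, never from user input.
        string sql = "SELECT * FROM " + tableName + " WHERE OrganizationId = @org";

        using (SqlConnection conn = new SqlConnection(connString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@org", organizationId);
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (StreamWriter writer = new StreamWriter(
                Path.Combine(outputFolder, tableName + ".csv")))
            {
                // header row
                for (int i = 0; i < reader.FieldCount; i++)
                    writer.Write((i > 0 ? "," : "") + reader.GetName(i));
                writer.WriteLine();

                // data rows (naive CSV escaping: quote fields containing commas or quotes)
                while (reader.Read())
                {
                    for (int i = 0; i < reader.FieldCount; i++)
                    {
                        string value = reader.IsDBNull(i) ? "" : reader.GetValue(i).ToString();
                        if (value.Contains(",") || value.Contains("\""))
                            value = "\"" + value.Replace("\"", "\"\"") + "\"";
                        writer.Write((i > 0 ? "," : "") + value);
                    }
                    writer.WriteLine();
                }
            }
        }
    }
}

The resulting folder of CSV files can then be zipped and handed to the customer, as suggested above.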
However you choose to go about it, I would encourage you to ensure that the process is well documented and has as much automated installation and setup as possible. Systems with loosely coupled dependencies such as common file locations or scheduled tasks are prone to getting tweaked and changed over time. Without those tweaks and changes being recorded, you can end up with a system that works but can't be replicated. Soon no one wants to touch it and no one remembers exactly how it works. When it eventually needs changing, or worse breaks, you have to start reverse engineering before you can fix it.
In a cloud based environment this is especially important because you want to be able to deploy as quickly as possible. If there is a lot of configuration that needs to be done you're likely to make mistakes or just be inconsistent. By creating a nuke-and-repave deployment you have a single point that you can change installation and configuration, safe in the knowledge that the change will be consistent across any deployment.
From what I understand, you have one large database for all the clients, you use relations that lead back to the Organization table to know which data belongs to which client, and you want to back up the data per client => organization.
To backup the data you can use one of the following methods:
As per the comments from @Phil and @Kris, you can use SSIS for automated backups; check this link for structure backup, and this link for how to export a query result to a file using SSIS - and instead of a file, write it to an Access or SQL Server database.
Build an application/service using C# to select the data and export it manually; it needs time, but customization has no limits.
Have you looked at StreamInsight?
http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/complex-event-processing.aspx
When I've had to deal with backups of relational data in the past (in MySQL, which isn't super different in terms of capability from the MSSQL you're running), my approach was to create a backup "package" file which is essentially a zip file with a different file extension so that Windows won't let users open it.
If you really want to get fancy, encrypt the file after zipping it and change the extension. I presume you're using ASP for your SaaS and since I'm a PHP-geek, I can't help too much with the code side of things, but the way I've handled this before was for a script that would package an entire Joomla site and Database for migration to a new server.
//open the MySQL connection
$dbc = mysql_connect($cfg->host,$cfg->user,$cfg->password);
//select the database
mysql_select_db($cfg->db,$dbc);
output( 'Getting database tables
');
//get all the tables in the database
$tables = array();
$result = mysql_query('SHOW TABLES',$dbc);
while($row = mysql_fetch_row($result)) {
$tables[] = $row[0];
}
output( 'Found '.count($tables).' tables to be migrated.
Exporting tables:
');
$return = "";
//cycle through the tables and get their create statements and data
foreach($tables as $table) {
$result = mysql_query('SELECT * FROM '.$table);
$num_fields = mysql_num_fields($result);
$return.= 'DROP TABLE IF EXISTS '.$table.";\n";
$row2 = mysql_fetch_row(mysql_query('SHOW CREATE TABLE '.$table));
$return.= $row2[1].";\n";
while($row = mysql_fetch_row($result)) {
$return.= 'INSERT INTO '.$table.' VALUES(';
for($j=0; $j<$num_fields; $j++) {
$row[$j] = mysql_escape_string($row[$j]);
$row[$j] = ereg_replace("\n","\\n",$row[$j]);
if (isset($row[$j])) {
$return.= "'".$row[$j]."'" ;
} else {
$return.= "NULL";
}
if ($j<($num_fields-1)) {
$return.= ',';
}
}
$return.= ");\n";
}
}
That's the relevant portion of the code in PHP that loops through the database structure and stores the recreation script in $return, which can then be output to a file.
In your case, you don't want to recreate the databases, but rather the data itself. You've compounded the issue slightly since you have a SaaS that is prone to possible data structure changes which you'll need to be able to account for. My suggestion would be this then:
Use a similar system to the above to dump the relevant data from the individual tables. I'm simply pulling all the data, but you could pull only the parts that pertain to the individual user by using JOIN statements and whatnot. Dump the contents of each table's insert/replace statements into a file named after the table. Create a file called manifest.xml or something of that sort and populate it with the current version of your SaaS application, name/information, unique ID, etc of the client exporting the data.
Package all those files into a ZIP file, change the extension to whatever you want, encrypt it if you desire, etc. Let them download that backup file and you're set.
In your import script, you will need to read the version number of the exported data and compare it to some algorithm that can handle remapping the data based on revisions you make later on. This way if you need to re-import one of their backups later, you can correctly handle transitioning the data from when they pulled the backup to the current structure of the data in that table now.
Hopefully that helps ;)
Because you keep all the data in just one database, it will always be difficult to export/back up data on a per-customer basis.
Even if you implement such scenario now, you will end up with two different places you need to maintain/change/test every time you change the database schema (fixing bugs, adding new features, optimization, etc).
I would recommend you to partition the data, say, by using a database per organization. Then you change your application just once (mainly around building a connection string for the specified organization), and then you can safely export/backup each database separately in a way you want it.
It also gives you a lot of extra benefits "for free", such as scalability and the ability to dedicate resources on a per-organization basis (if that is needed in the future).
Say, you have a set of small and low priority (from a business point of view) organizations, and a big and high priority one. So you will be able to keep a set of small low priority databases on one server, but dedicate another one for that specific important big one.
Or if your current DB server is overloaded (perhaps you have A LOT of data and A LOT of requests to the database), you can simply get another cheap server and move half of the load without any changes in your system...
You still need to write something in order to split the existing big database into several small ones, but you do it just once, and after it is done this "migration tool" can be thrown away so you don't need to support it anymore.
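Once each organization has its own database, the export side can be as simple as a native backup per organization. A sketch, assuming a naming convention such as "App_<OrgCode>" for the per-organization databases (the convention and the backup path are assumptions):

using System.Data.SqlClient;
using System.IO;

public static class OrgBackup
{
    public static void BackupOrganization(string serverConnString, string orgCode, string backupFolder)
    {
        // orgCode comes from our own organization list, never from user input.
        string dbName = "App_" + orgCode;
        string backupFile = Path.Combine(backupFolder, dbName + ".bak");

        using (SqlConnection conn = new SqlConnection(serverConnString))
        using (SqlCommand cmd = new SqlCommand(
            "BACKUP DATABASE [" + dbName + "] TO DISK = @file WITH INIT", conn))
        {
            cmd.Parameters.AddWithValue("@file", backupFile);
            cmd.CommandTimeout = 0; // backups can take a while
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

The resulting .bak file can be handed directly to the customer or converted to another format as needed.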
Have you tried SyncFramework?
Have a look at this article!
It explains how to sync filtered data between databases using Sync Framework.
You can sync to the customer's database or sync to your own empty db and then export it as a file.
Did you think about using an ORM? (Object-Relational Mapper)
I know, and use, LLBLGen Pro (so I can talk only about the feature of this specific ORM)
Anyway, with LLBLGen you can reverse-engineer the DB and create a hierarchy of classes that map the tables and relations of your DB.
Now, if all the data of a customer is reachable via relations, I can tell my ORM framework to load a single customer (1 row of a specific table) and then load all the related data in the related tables.
If the data is not too complex, it should be possible.
If you have hundreds of self-referenced tables or strange relations, it may be undoable; it depends upon your data.
If all the data of a single customer is, say, 10'000 rows in 100 tables, it will probably work.
If all the data is 100'000 rows in 1000 tables, it "may" work if you have some time and a lot of memory.
If all the data is 10'000'000 rows, you probably can't load it all at once, and you'll need a more efficient way.
Anyway, if you can load all the data at once, then you'll have a nice "in memory" graph with all the data of a single customer, and then you can serialize this data, or project it onto a DataSet (obtaining a set of DataTables/relations) and then serialize the DataSet.
Using an ORM to load and export all the data of a single customer as explained is probably not the most efficient way of doing things, but when doable it's a simple and cheap way.
Naturally, with or without an ORM, you can find hundreds of different ways to export this data :-)
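The same idea sketched with Entity Framework instead of LLBLGen, using the entity names from the question (the context class and navigation properties are hypothetical):

using System.Collections.Generic;
using System.Data.Entity;   // Entity Framework 6; the answer above used LLBLGen
using System.Linq;

// Hypothetical entities mirroring Organization -> Skill Area -> Program -> Target
public class Organization { public int Id { get; set; } public List<SkillArea> SkillAreas { get; set; } }
public class SkillArea    { public int Id { get; set; } public List<Program> Programs { get; set; } }
public class Program      { public int Id { get; set; } public List<Target> Targets { get; set; } }
public class Target       { public int Id { get; set; } }

public class AppContext : DbContext
{
    public DbSet<Organization> Organizations { get; set; }
}

public static class GraphExporter
{
    // Eagerly load one customer's whole graph, then serialize it however you like.
    public static Organization LoadGraph(AppContext db, int organizationId)
    {
        return db.Organizations
                 .Include(o => o.SkillAreas.Select(sa => sa.Programs.Select(p => p.Targets)))
                 .Single(o => o.Id == organizationId);
    }
}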
For your design, you should have sharded your database per customer.
However, as you have already developed the database design, I suggest you create a temp database and create the new tables in this temp database following the FK relations.
For this, you need to sort the tables based on the FK relationships and create them in the temp database in that order.
Then, select the table data from the source database and insert it into the temp database.
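Roughly, the copy step could look like this (the table list, the FK order, the temp database name and the OrganizationId filter are all assumptions):

using System.Data.SqlClient;

public static class TempDbExporter
{
    // Tables listed in dependency (FK) order so parents are copied before children.
    private static readonly string[] TablesInFkOrder =
        { "Organization", "SkillArea", "Program", "Target", "TargetData" };

    public static void CopyOrganization(string connString, int organizationId)
    {
        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            foreach (string table in TablesInFkOrder)
            {
                // Assumes the temp database already contains empty copies of the tables
                // and that each source table exposes an OrganizationId (directly or via a view).
                string sql = "INSERT INTO TempExport.dbo." + table +
                             " SELECT * FROM dbo." + table +
                             " WHERE OrganizationId = @org";
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@org", organizationId);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}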
You can also use this technique to shard your database and revamp your database design.
Aravind

C# and Access 2000

I have developed a network application that has been in use in my company for the last few years.
At the start it managed information about users, rights, etc.
Over the time it grew with other functionality. It grew to the point that I have tables with, let's say 10-20 columns and even 20,000 - 40,000 records.
I keep hearing that Access in not good for multi-user environments.
Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client.
It happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to the SQL Server but unfortunately it cannot be done in this case.
I was wondering if there is a more reliable solution for me than using an Access database while still staying with a single-file database system.
We have quite a huge system using dBase IV.
As far as I know it is a fully multiuser database system.
Maybe it will be good to use it instead of Access?
What makes me not sure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
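From the C# side, adding an index to a Jet/Access back end is plain Jet SQL over OleDb; a sketch (the provider string, table and column names are illustrative):

using System.Data.OleDb;

public static class JetIndexing
{
    public static void EnsureIndex(string mdbPath)
    {
        string connString =
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + mdbPath;

        using (OleDbConnection conn = new OleDbConnection(connString))
        {
            conn.Open();
            // Index the column used in WHERE clauses so Jet only pulls the index
            // pages plus the matching data pages over the network, not the whole table.
            using (OleDbCommand cmd = new OleDbCommand(
                "CREATE INDEX idxUsersLastName ON Users (LastName)", conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}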
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that Access will choke in multi-user scenarios--at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So...you CAN use Access as your data store. Then if you can get beyond the company politics controlling your network, perhaps then you could begin moving toward upfitting your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into front end : back end is sort of a key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access Database shared amongst your users will be the same with any file based database.
A read will pull a lot of data into memory and writes are guarded with some type of file lock. Under your environment it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually no. This is a common misstatement spread by folks who do not understand the nature of how Jet, the database engine inside Access, works. Pulling down all the records, or excessive number of records, happens because you don't have all the fields used in the selection criteria or sorting in the index. We've also found that indexing yes/no aka boolean fields can also make a huge difference in some queries.
What really happens is that Jet brings down only the index pages and data pages which are required. While this is more data than a server-based database engine would return, it is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on all at the same time before without any issues. As long as they don't access the same record then I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance I have a Quoting form that contains 13 different tabs and 10-20 fields on each tab. Users are typically in a single record for minutes editing and looking for information. To avoid any write conflicts I call the below function any time a field is changed. As long as it is not a new record being entered, then it updates.
Function funSaveTheRecord()
If ([chkNewRecord].value = False And Me.Dirty) Then
'To save the record, turn off the form's Dirty property
Me.Dirty = False
End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front End, stored on the users' machines. Every user has their own copy. Links to tables found in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle on/off the shift key bypass. Only I have access to it.
PDC_be.mdb <-- Back End, stored on the server. Contains all data. The only form and VBA it contains is to toggle on/off the shift key bypass. Only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on the users desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later -- or it WILL be later), just use your existing database system.
Basically with Access, it cannot handle more than 5 people connected at the same time, or it will corrupt on you.
