My question is about the best, tried-and-tested (and new?) methods out there for a fairly common requirement in most companies.
Every company has customers. Let's say a company A has about 10 different systems for its business needs. Customer data is critical to all of them.
Customers can be maintained in any of the systems independently, but if they fall out of sync then that's a problem. I know the ideal is to keep one master place/system for the customer record and have all other systems take that information from that single location/system.
How do you build something like this? SOA? ETLs? Web services? Any other ideas out there that are new, and not forgetting the old methods.
We are an MS/.NET shop. This is mostly for my knowledge and learning; please point me in the right direction, as I want to be aware of all my options.
Ideally all your different systems would share the same database, in which case that database would be the master. However that's almost never the case.
So the most common method I've seen is to have yet another system (let's call it a data warehouse) that takes feeds from your 10 different systems, aggregates them together, and forms a "master" view of a customer.
I have not done anything like this, but playing with the idea here are my thoughts. Perhaps something will be helpful.
This is a difficult question, and I'd say it mainly depends on what development ability and interfaces you have available in each of the 10 systems. You may need a data warehouse manager piece of software, working as my next paragraph describes, with various plugins for all the different types of interfaces in the 10 systems involved.
Thinking from the data warehouse idea: ideally each Customer in each system would have a LastModified field, although that is probably unlikely. So you'd almost need to serialize the Customer record from each source and store it in your data warehouse database along with the last time the program updated that record. This would allow you to know exactly which record is the newest any time anything changes in any of the 10 systems, and update fields based on that. This is about the best you could do if you're not developing some of the systems and are only able to read from some fashion of interface.
If you are developing all the systems, then I'd imagine WCF interfaces (I mention WCF because it offers more connection options than web services in general) to propagate updates to all the other systems (probably via a master hub application) might be the simplest option: pass in the new values and the date of the update, either from an event on the save button, or by checking a LastModified field every hour/day.
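To make the hub idea a bit more concrete, here is a minimal, illustrative sketch of what such a WCF contract might look like - the service name and the shape of the change message are assumptions, not a prescription:
using System;
using System.Runtime.Serialization;
using System.ServiceModel;

// Hypothetical contract: each system pushes customer changes to a central hub,
// which decides (e.g. by LastModified) whether to forward them to the others.
[ServiceContract]
public interface ICustomerSyncHub
{
    [OperationContract]
    void PublishCustomerChange(CustomerChange change);
}

[DataContract]
public class CustomerChange
{
    [DataMember] public string SourceSystem { get; set; }       // which of the 10 systems raised it
    [DataMember] public string CustomerKey { get; set; }        // key shared (or mapped) across systems
    [DataMember] public DateTime LastModifiedUtc { get; set; }  // used to pick the newest record
    [DataMember] public string SerializedCustomer { get; set; } // serialized record, per the warehouse idea
}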
Another difficulty is what happens if one Customer object has an Address field and another does not: will updates between those two overwrite each other in some cases? Or what if one has a CustomerName and another has CustomerFirstname and CustomerLastname?
NoSQL ideas of variable data structure and the ability to mark cached values as dirty also come to mind somewhat, though I'm not sure how much benefit those concepts would really add.
Background
I am developing a C# WinForms application - currently up to about 11000 LOC. The UI and logic are about 75% done, but there is no persistence yet. There are hundreds of attributes on the forms and 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection, so we need a solution that maintains a database locally and keeps it in sync with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other users' records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data synch, each entity would have an integer primary key ID. Additionally it would have a secondary ID column which relates to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts.
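As a rough illustration of the ID scheme described above (the entity and property names are made up, and the LastModified column is an extra assumption that would help with conflict handling):
using System;

// Local copy of an entity: Id is the local primary key; CentralId is null in the
// central database and filled in on the clients once a record has been synced.
public class Customer
{
    public int Id { get; set; }                    // local integer primary key
    public int? CentralId { get; set; }            // maps the row to the central database
    public string Name { get; set; }
    public DateTime LastModifiedUtc { get; set; }  // assumption: useful for detecting conflicts
}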
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data synchronisation, and if so, would these work with the Entity Framework?
Syncing data between relational databases is a pain. Your best course of action probably depends on: how many users there will be; how probable conflicts are (i.e. users working offline on the same data); and possibly what kind of manpower you have (do you have proper DBAs/SQL Server devs standing by to assist with the SQL part, or are you just .NET devs?).
I don't envy you this task; it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out the correct order of operations to perform on the remote db.
One thought - it might be very wrong and crazy, but it's just the thing that popped into my mind - would be to use JSON Patch on the entities, be they actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements, then applied to the local db, and when the user is online they are submitted - with timestamps! - to a merge provider. The JSON Patch statements from different clients could be grouped by entity id and sorted by timestamp, and the user could get feedback on what other operations from different users are queued - and manually make amendments to them. Those grouped statements could even be stored in files in a git repo. Then, at some pre-defined intervals or triggered manually, the update would be performed by a server-side app and saved to the remote db. After this the users' local copies would be refreshed from the server.
It's just a rough idea, but I think that you need something with similar capability - it doesn't have to be JSON Patch + Git, you can do it in probably hundreds of ways. I don't think, though, that you will get away with just going through the local/remote db and making updates/merges.
Imagine the scenario where one user updates some data (let's say, 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the sync process do? Apply the earlier and then the later changes? I'm fairly certain that both users would be furious, because their input was 'atomic' - either everything is changed, or nothing is. The later 'commit' must either be rejected, or users should have an option to amend it in light of the new data. That highly depends on what your data is and, as I said, on the number/behaviour of the users. Duh, even time zones become important here - if all your users are in one time zone you might get away with having predefined times of day when the system syncs, but there's no way you'll convince people with many different business hours that the 'sync session' will happen at e.g. 11 AM, when they are usually giving a presentation to management or something ;)
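As a very rough sketch of the change-recording idea above - this hand-rolls a tiny patch structure instead of using a real JSON Patch library, and every name here is invented purely for illustration:
using System;
using System.Collections.Generic;
using System.Linq;

// One recorded change: which entity, which field, the new value, who made it and when.
public class FieldChange
{
    public string EntityType { get; set; }   // e.g. "Customer"
    public int EntityId { get; set; }
    public string Path { get; set; }         // e.g. "/Address/City"
    public object NewValue { get; set; }
    public DateTime TimestampUtc { get; set; }
    public string UserId { get; set; }
}

public class MergeProvider
{
    // Group the queued changes per entity and order them by timestamp so the
    // server-side process can apply, reject or present them for manual amendment.
    public IEnumerable<IGrouping<int, FieldChange>> GroupForMerge(IEnumerable<FieldChange> queued)
    {
        return queued
            .OrderBy(c => c.TimestampUtc)
            .GroupBy(c => c.EntityId);
    }
}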
Love the site - it has been very informative throughout my studies. I just finished a quarter of intro C#, and one of the projects was to design a financial "Account Manager" app that keeps a balance and updates it when withdrawals and deposits are made. The project was fairly simple and I didn't have any problems. Unfortunately, my next quarter doesn't include any programming classes :(, so I'm using the time to expand my knowledge by beefing up my Account Manager app.
First thing I wanted to do, was to enable multiple users. So far, I've included a CreateNewUser class that prohibits duplicate user names, checks new passwords for specific formatting requirements, salts and hashes it, and saves it to an "Accounts" table with the username (email address) and an auto-incremented user id. Simple enough.
So now I'm stuck: not sure what would be best practice. I don't think that the user should be using the same table as other users, so I'm thinking that each user should have their own table. Am I being "too paranoid", or is my thinking along the lines of common programming security practices? The truth is that nobody will probably ever use this app, but I'm trying to learn what I can apply in the real world when I grow up.
Using the same table only requires loading the DataSet with a query of matching userIDs, so that wouldn't be a big deal. If I should use separate tables, then I would need to create a new table dynamically when the new user is created, and I was going to just name the table with the user id, which would simulate the account number in the real world, I'm assuming.
Anyway, I couldn't find another question that covered this, so I thought I'd ask ya'll for your thoughts.
Thanks,
Deadeddie
Think of it this way: if you were going to keep physical versions of these tables, for example in notebooks, would you rather have a lot of small notebooks or one big notebook that you can refer to?
As long as your code is written to pull only the correct data (in this case, matching userIDs), there isn't a big security issue, because your code handles access to the data - provided your database and code have the correct permissions set on them as well.
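For example (purely illustrative - the table and column names are assumptions), the client only ever asks for its own rows via a parameterized query:
using System.Data;
using System.Data.SqlClient;

public class TransactionData
{
    // Pull only the signed-in user's rows; the filter lives in the query, not in the UI.
    public DataTable LoadTransactions(string connectionString, int currentUserId)
    {
        using (SqlConnection cn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT TransactionId, Amount, PostedOn FROM Transactions WHERE UserId = @userId", cn))
        {
            cmd.Parameters.AddWithValue("@userId", currentUserId);
            DataTable table = new DataTable();
            new SqlDataAdapter(cmd).Fill(table);   // Fill opens and closes the connection itself
            return table;
        }
    }
}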
So far, I've included a CreateNewUser class that prohibits duplicate user names, checks new passwords for specific formatting requirements, salts and hashes it, and saves it to an "Accounts" table with the username (email address) and an auto-incremented user id. Simple enough.
Already bad. It should be a Users table - Account in an application dealing with financial information has a very specific financial meaning, and you may want to have multiple accounts per user and / or an account shared by users.
Also, unless you write PowerShell cmdlets (where one class per command is the pattern), a CreateNewUser class is as bad as going out and burning cars. User is a class, some sort of repository is OK, but CREATE NEW is, if anything, a FUNCTION on the class. It is definitely not a complete class - you totally botch the concept of object orientation if you turn every method into a class.
I don't think that the user should be using the same table as other users,
Again, a total beginner mistake. Why not? Put in proper fields referencing the account and/or user as appropriate and you'll be fine.
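For illustration only (names assumed), the usual shape is one Users table and one Accounts table with a foreign key, rather than a table per user:
// Every account row carries the owning user's id; queries simply filter on it.
public class User
{
    public int UserId { get; set; }          // auto-incremented key
    public string Email { get; set; }        // the username
    public string PasswordHash { get; set; }
    public string PasswordSalt { get; set; }
}

public class Account
{
    public int AccountId { get; set; }
    public int UserId { get; set; }          // foreign key to User - no dynamic per-user tables
    public decimal Balance { get; set; }
}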
then I would need to create a new table dynamically when the new user is created,
Did you ever think about what you are doing here? Maintenance-wise, every change means writing a program that finds out what user tables exist and then modifies them. Tooling support goes out of the window. I once saw an application written like that - invoice management. It had one invoice details table PER INVOICE (and an invoice table per invoice, coded by invoice number) because the programmer never understood what databases are.
Am I being "too paranoid", or is my thinking along the lines of common programming security practices?
They are along the lines of "you are fired; learn how databases work".
Using the same table only requires loading the DataSet with a query of matching userID's
;) So DataSets are still around? Is there a reason you're doing programming archaeology, following the worst practices of the last 30 years at Microsoft, instead of using an ORM, which Microsoft has provided for some time now (Linq2SQL, Entity Framework) and which would make your application a lot - ah - more - ah - object-oriented?
May I suggest reading a decent book? Look up "Building Object Applications That Work" by Scott Ambler. And no, it is not written for C# - interestingly enough, the concepts of good architecture are 99% language agnostic.
OK guys, another question of mine that seems to be very widely asked and generic. For instance, I have an accounts table in my db. On the client (a desktop WinForms app) I have the functionality to add a new account. Let's say in the UI it's a couple of textboxes and one button.
Another requirement is account uniqueness, so I can't add two identical accounts. My question is: should I check for the account's existence on the client (making some query and looking at the result), or make a stored procedure for adding a new account and check for its existence there? To me, it seems better to make just a stored proc: there I can make any needed checks and, after all checks pass, add the new account. But there are pros and cons to that approach. For example, it will be very difficult to manage the language of the messages that the stored proc should produce.
POST EDIT
I already have the database constraints, etc. The issue is how to handle the situation where a user tries to add an account that already exists.
POST EDIT 2
Account uniqueness is exposed here as just a simple, tiny example of business logic. My question is more about handling complicated business logic in that accounts domain.
So, how can I manage this misunderstanding?
I believe that my question is basic and has a proven solution. My tools are C# and .NET Framework 2.0. Thanks in advance, guys!
If the application is to be multi-user (i.e. not just a single desktop app with a single user, but a centralised DB with the app acting as a client, perhaps on many workstations), then it is not safe to rely on the client (app) to check for things such as uniqueness, existence, free numbers etc., as there is a distinct possibility of change happening between calls (unless read locking is used, but this often becomes more of an issue than a help!).
There is of course the option to pre-check and then re-check (pre at app level, re at DB level), but this means extra DB traffic, so it depends on whether that is a problem for you.
When I write SPROCs that will return to an app, I always use the same framework - I include parameters for a return code and message and always populate them. Then I can use standard routines to call them and even add in the parameters automatically. I can then either display the message directly on failure, or use the return code to localize it as required (or automate a response). I know some DBs (like SQL Server) will return Return_Code parameters, but I implement my own so I can leave the inbuilt ones for serious system-based errors and unexpected failures. It also allows me to have my own numbering system for return codes (i.e. grouping them to match enums in the code and/or grouping by severity).
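A hedged sketch of that convention from the calling side - the procedure name, parameter names and sizes are all made up for illustration:
using System.Data;
using System.Data.SqlClient;

public class AccountCaller
{
    // Call a stored procedure that reports its outcome via output parameters.
    // Returns null on success, otherwise the message supplied by the procedure.
    public string AddAccount(string connectionString, string accountCode)
    {
        using (SqlConnection cn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.AddAccount", cn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@AccountCode", accountCode);

            SqlParameter rc = cmd.Parameters.Add("@ReturnCode", SqlDbType.Int);
            SqlParameter msg = cmd.Parameters.Add("@ReturnMessage", SqlDbType.NVarChar, 512);
            rc.Direction = ParameterDirection.Output;
            msg.Direction = ParameterDirection.Output;

            cn.Open();
            cmd.ExecuteNonQuery();

            int code = (int)rc.Value;                     // map to an enum / localised text in the app
            return code == 0 ? null : (string)msg.Value;  // or display the message directly on failure
        }
    }
}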
On web apps I have also used a different concept at times. For example, sometimes a request is made for a new account but multiple pages are required (a profile, for example). Here I often use a header table that generates a hidden user ID against the requested unique username, a timestamp and some way of recognising them (IP address etc.). If after x hours it is not used, the header table deletes the row, freeing up the number (depending on the DB the number may never become usable again - this doesn't really matter as it is just used to keep the user data unique until the application is submitted) and the username. If completed correctly, then the records are simply copied across to the proper active tables.
//Edit - To Add:
Good point. But account uniqueness is just a very tiny simple sample. What about more complex requirements for accounts in business logic? For example, if I implement it just in client code (in the winforms app) I will go ok, but if I want another (say console version of my app or a website) kind of my app to work with these accounts I should do all this logic again in the new app! So, I'm looking for some method to hold data right from two sides (server db side and client side). – kseen yesterday
If the requirement is ever for multi-use, then it is best to separate it. Putting it into a separate Class Library project allows the DLL to be used by your WinForms app, console program, service, etc. Although I would still prefer rock-face validation (DB level), as it is the closest point in time to any action and the least likely to be gazumped.
The usual way is to separate into three projects: a display layer [DL] (your WinForms project/console/service/etc.), a Business Application Layer [BAL] (which holds all the business rules and calls to the DAL - it knows nothing about the display medium nor about the database technology), and finally the Data Access Layer [DAL] (this has all the database calls - it can be very basic, with a method for insert/update/select/delete at SQL and SPROC level and maybe some classes for passing data back and forth). The DL references only the BAL, which references the DAL. The DAL can be swapped for each technology (say a change from SQL Server to MySQL) without affecting the rest of the application, and business rules can be changed and set in the BAL with no effect on the DAL (the DL may be affected if new methods are added or display requirements change due to data changes etc.). This framework can then be used again and again across all your apps and is easy to make quite drastic changes to (like DB topology).
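A bare-bones illustration of that split (all names invented): the display layer calls the BAL, which calls the DAL through an interface that can be swapped per database technology:
using System;

public interface IAccountDal                     // Data Access Layer contract
{
    bool AccountExists(string accountCode);
    void InsertAccount(string accountCode, string name);
}

public class AccountService                      // Business Application Layer
{
    private readonly IAccountDal _dal;
    public AccountService(IAccountDal dal) { _dal = dal; }

    public void AddAccount(string accountCode, string name)
    {
        if (_dal.AccountExists(accountCode))     // business rule lives here, not in the UI
            throw new InvalidOperationException("An account with this code already exists.");
        _dal.InsertAccount(accountCode, name);
    }
}

// A WinForms form, a console app or a website (the DL) all call AccountService the same way,
// and a SQL Server or MySQL implementation of IAccountDal can be swapped in underneath.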
This type of logic is usually kept in code for easier maintenance (which includes testing). However, if this is just a personal throwaway application, do what is simplest for you. If it's something that is going to grow, it's better to put good practices in place now to ease maintenance/change later.
I'd have an AccountsRepository class (for example) with an AddAccount method that does the insert/calls the stored procedure. Using database constraints (as HaLaBi mentioned), it would fail on trying to insert a duplicate. You would then determine in the code how to handle this issue (passing a message back to the UI that it couldn't add the record). This would allow you to put tests around all of this. The only change you make in the db is to add the constraint.
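Something along these lines, as a sketch only - the table and column names are invented, and the error numbers checked are SQL Server's unique constraint/index violations:
using System.Data.SqlClient;

public class AccountsRepository
{
    // Let the database constraint enforce uniqueness; translate the failure for the UI.
    public bool AddAccount(string connectionString, string accountCode, out string error)
    {
        try
        {
            using (SqlConnection cn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand("INSERT INTO Accounts (AccountCode) VALUES (@code)", cn))
            {
                cmd.Parameters.AddWithValue("@code", accountCode);
                cn.Open();
                cmd.ExecuteNonQuery();
            }
            error = null;
            return true;
        }
        catch (SqlException ex)
        {
            if (ex.Number != 2627 && ex.Number != 2601) throw;   // 2627/2601: unique key violations
            error = "An account with this code already exists.";
            return false;
        }
    }
}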
Just my 2 cents on a Thursday morning (before my cup of green tea). :)
I think the answer - like many - is "it depends".
For sure it is a good thing to push logic as deeply as possible towards the database. This prevents bad data no matter how the user tries to get it in there.
This, in simple terms, results in applications that TRY - FAIL - RECOVER when attempting an invalid transaction. You need to check each call (stored proc, triggered insert, etc.) and IF something bad happens, recover from that condition - usually something like telling the user an issue occurred, resetting the form, and letting them try again.
I think, at a minimum, this needs to happen.
But in addition, to make a really nice experience for the user, the app should also preemptively check certain data conditions ahead of time and simply prevent the user from making bad inserts in the first place.
This is of course harder, and sometimes means double coding of business rules (once in the app and once in the DB constraints), but it can make for a dramatically better user experience.
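For instance (illustrative names only), a WinForms client might do a quick availability check to give early feedback, while still relying on the database constraint as the final gatekeeper:
using System.Data.SqlClient;

public class AccountChecks
{
    // Early, best-effort check for the UI; it can still race with another user,
    // so the insert itself must handle a duplicate failure as well.
    public bool AccountCodeLooksAvailable(string connectionString, string accountCode)
    {
        using (SqlConnection cn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM Accounts WHERE AccountCode = @code", cn))
        {
            cmd.Parameters.AddWithValue("@code", accountCode);
            cn.Open();
            return (int)cmd.ExecuteScalar() == 0;
        }
    }
}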
The solution is more about being methodical than technical:
Implement "Defensive Programming" & "Design by Contract"
If the chances of a business rule changing over time are very low, then apply the constraint at the database level
Create a "validation or rules & aggregation" layer (or class) that will manage such conditions/constraints for an entity and/or a specific property
A much smarter way to do this would be to make a user control for the entity and/or specific property (in your case the "Account-Code"), which would internally use the "validation or rules & aggregation" layer (or class)
This will allow you to ensure a systematic way of development and a more scalable & maintainable application architecture
If your application is a website, then along with the client-side validation it is always better to have validation in the business layer or C# code as well
Whenever a validation fails you could implement & use a "custom-error-message" library, to ensure the message content is standard across the application
If the errors are raised from the database itself (i.e., from stored procedures), you could use the same "custom-error-message" class to convert the SQL exception to the fixed or standardized message format
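As a small sketch of the "custom-error-message" idea (the error codes and texts are invented, and the SQL error numbers assume SQL Server), raw SQL errors get translated into one standard format before they reach the user:
using System.Data.SqlClient;

public static class ErrorMessages
{
    public const string DuplicateAccount = "ERR-1001: An account with this code already exists.";
    public const string Unexpected = "ERR-9999: An unexpected error occurred. Please try again.";

    // Convert a raw SqlException into the application's standardised message format.
    public static string FromSqlException(SqlException ex)
    {
        if (ex.Number == 2627 || ex.Number == 2601)   // unique constraint / unique index violation
            return DuplicateAccount;
        return Unexpected;
    }
}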
I know that this is all a bit too much, but it will always be good for the future.
Hope this helps.
As you should not depend on a specific storage provider (DB [MySQL, MSSQL, ...], flat file, XML, binary, cloud, ...) in a professional project, all constraints should be checked in the business logic (model).
The model shouldn't have to know anything about the storage provider.
Uncle Bob said something about architecture and databases: http://blog.8thlight.com/uncle-bob/2011/11/22/Clean-Architecture.html
We've developed a simple CRM application in ASP.NET MVC. It's for a single organization with a few user accounts.
I'm looking for an easy way to make it work with many organizations. Could I use the ApplicationId from the Membership provider for this? Every organization would have its own ApplicationId.
But this means that every row in the database would have to have an ApplicationId too, right?
Please give your suggestions. Maybe there is a better way?
Unfortunately, for the "easy way" you have already missed the bus, since the easy way would have been to make this possible by design. It would not have been much of a burden to include an OwnerId in the data in the first phase and make your business logic work according to that.
Currently the "easy way" is to refactor all your data and business logic to include the OwnerId. And while doing it, look ahead. Think of the situations "what if we need to support this and that in the future" and leave some room for the future by design. You don't need to fully implement everything right now, but you'll find out how easy it is to make your application scale if it was designed to scale.
As for the ApplicationId, that's an internal ID for your membership provider to scope your membership data per application. I would stay away from bleeding that logic into the whole of your application. Keep in mind that authenticating your web users, assigning them roles and giving them rights through roles is a totally different concern than ownership of data.
In ASP.NET MVC, you would use the [Authorize] attribute to make sure that certain actions can or cannot be performed by certain users or groups, while determining which data belongs to whom should be implemented in the data itself. Even if you ran two or more instances of your application, it would still be the same application, so ApplicationId doesn't work here for scoping your data.
But let's say your CRM turns out not to be so small after all, and it becomes apparent that either your initial organization or one of the later ones would like to allow their customers to log on and check their data. Then you would need to build another application for the customers to log onto. That site would use a different ApplicationId than your CRM. Your client organization could then map the user accounts to their CRM records so that their customers could review them.
So, since your CRM is (still) small, the easiest way is to design a good schema for your clients to be stored in and then mark all your CRM data with an OwnerId. And that OwnerId cannot come from the users table, the membership table or anywhere near there. It has to come from the table that lists the legal owners of the data, whether you want to call them Organizations, Companies, Clients or whatever. It cannot be a userId, roleId, applicationId etc., since users might leave an owning organization, roles are shared between organizations (at least the ones used to determine access to certain MVC actions), and applicationIds are meant for scoping membership and roles between different kinds of client applications.
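To illustrate (all names assumed), the ownership table and the OwnerId column might look like this, with every query scoped by the owner rather than by ApplicationId:
using System.Linq;

public class Organization                  // the legal owner of the data
{
    public int OrganizationId { get; set; }
    public string Name { get; set; }
}

public class CrmRecord                     // any customer, note, deal, etc.
{
    public int Id { get; set; }
    public int OwnerId { get; set; }       // FK to Organization - not a userId, roleId or applicationId
}

public class CrmRecordRepository
{
    private readonly IQueryable<CrmRecord> _records;
    public CrmRecordRepository(IQueryable<CrmRecord> records) { _records = records; }

    // Every read is scoped to the calling user's organization.
    public IQueryable<CrmRecord> ForOrganization(int organizationId)
    {
        return _records.Where(r => r.OwnerId == organizationId);
    }
}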
So what you are missing here are the tables describing the owners of the CRM records and the mapping of all the data to its owners. And for that there's no easy way. You went on developing the CRM thinking "this is just a simple one-organization CRM, so let's make things easy". Now you have a "simple multi-organization CRM" and are asking for an easy way to recover from that initial lack of design. The next step would be asking how to make your "not so simple multi-organization CRM" easily do something you didn't take into account in the first place.
The easy solution is to design your application to be scalable, doing "just a little" extra to support future growth. It'll be much easier in the long run than spending a lot of extra effort rewriting your application twice a year. Also keep in mind: it's a CRM after all, and you can't go ahead and tell whoever is using it in their business to take a day off while you fix some stuff in the CRM.
I'm not patronizing you here. I'm telling anyone who might be reading this to stop searching for easy solutions to recover from inadequate planning. There aren't any, and seeking one is making the same mistake twice.
Instead, grab pen and paper, plan a workable design and make it work. Put some extra effort into the early stages of software design and development and you'll find that work saving you countless hours later in the process. That way, whoever is using your CRM will stay happy using it. It'll become easier to talk to your users about future changes when you don't have to think "I don't want to do that since it'd break the application again". Instead, you can enjoy brainstorming the next cool step together. Some of the ideas will be left for later, but some room for implementation will already be designed at this stage, so that the actual implementation a year later will come in smoothly and be enjoyable for all parties involved.
That's my easy solution. I have 15 years of development behind me, and the fact that I'm still enjoying it backs the above up. That's mainly because I take every challenge (well, most of them anyway) as an opportunity to design the code better instead of trying to dodge the inevitable process. We have an old saying in Finland: "Either you'll do it or you'll cry doing it", and it fits the bill here perfectly. It's up to you whether you like crying so much that you take "the easy way" now.
This is a bit of a difficult question to ask, but any feedback at all is welcome.
I'll start with the background: I am a university student studying software engineering. Last year we covered C#, and I got myself a job working in a software house coding prototype software in C# (their main language is C++ using Qt). After producing the prototype, it was given to some clients, who have all passed back positive feedback.
Now I am looking at the app and thinking I could use this as a showcase with my CV, especially as the clients who used the software have said that they will sign something to reference it.
So if I am going to do that, then I had better get it right and do it as well as I possibly can. I have started to look at it and think about where I can improve it, and one of the areas I think I can is the way it handles the database connections and the data that goes with them.
The app itself runs alongside a MySQL server, and there are 6 different schemas which it gets its data from.
I have written a class (I called it databaseHandler) which holds the MySqlConnection (one question is whether the connection should remain open the whole time the app is running, or whether to open it, fire a query, then close it, etc.). Inside this class I have written a method which takes some arguments and builds its query string, then does the whole MySqlDataReader = cmd.ExecuteReader() bit and returns the reader back to wherever it was called from.
After speaking to a friend, he mentioned that it might be nice if the method returned the raw data and not the reader, thereby keeping all the database "stuff" away from the main app.
After playing around, I managed to find a couple of tutorials on putting the reader data into arrays and ArrayLists and passing them back. I also had a go at passing back an ArrayList of Hashtables - these methods obviously mean that the dev must know the column names in order to find the correct data.
Then I stumbled across a page which talked about creating a class whose attributes match the column names, and building a list which you could pull your data from:
http://zensoftware.org/archives/248 is the link
So this made me think: in order to use this method, would I need to create 6 classes with attributes for the columns of my tables (a couple of tables have up to 10-15 columns)? Or is there a better way for me to handle my data?
I am not really clued up on these things but if pointed in the right direction I am a very fast learner :)
Again, I thank you for any input whatsoever.
Vade
You have a lot of ideas that are very close, and these are pretty common problems - it's good that you are actively thinking about how to handle them!
On the question about leaving the connection open for the whole program or only having it open during the actual query: the common (and proper) way is to only have the connection open for as long as you need it, so:
MySqlConnection cn = new MySqlConnection(yourConnectionString);
cn.Open();
// Execute your queries here
cn.Close();
This is better since you don't risk leaving connections open, or having transaction issues tying up databases and resources.
Having just the data returned, and not the actual data reader, is a good idea, but by returning the data as an ArrayList or whatever, you are losing the structure of the data a little.
A good way to do this would be to either have your class take the data reader to populate its data, OR have the data layer return an instance of your class after reading the data.
I believe that it would be an excellent approach if your data access class returned a custom class populated with data from the database. That would be object-oriented. Instead of, say, returning a DataSet or an array containing customer information, you would create a Customer class with properties. Then, when you retrieve the data from the database, you populate an instance of the Customer class with the data, and return it to the calling code.
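A minimal sketch of that approach, assuming MySQL (via the MySql.Data provider) and made-up table/column names:
using MySql.Data.MySqlClient;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class CustomerDataAccess
{
    private readonly string _connectionString;
    public CustomerDataAccess(string connectionString) { _connectionString = connectionString; }

    // The data layer reads the row and hands back a populated Customer,
    // so the rest of the app never sees a reader or column names.
    public Customer GetCustomer(int id)
    {
        using (var cn = new MySqlConnection(_connectionString))
        using (var cmd = new MySqlCommand("SELECT id, name, email FROM customers WHERE id = @id", cn))
        {
            cmd.Parameters.AddWithValue("@id", id);
            cn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                if (!reader.Read()) return null;
                return new Customer
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1),
                    Email = reader.GetString(2)
                };
            }
        }
    }
}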
A lot of the newer Microsoft technologies are focusing on making this task easier. Quite often, there are many more than 6 classes needed, and writing all that code can seem like drudgery. I would suggest that, if you are interested in learning about those newer approaches, and possibly adapting them to your own code, you can check out Linq to SQL and Entity Framework.
one question was about if the connection should remain open the whole time the app is running, or open it fire a query then close it etc
You want to keep the connection open for as little time as possible. Therefore you should open it on each data request and close it as soon as you are done. You should also dispose of it, but if your database stuff is inside a C# using statement that happens automatically.
As far as the larger question on how to return the data to your application you are on the right track. You typically want to hide the raw database from the rest of your application and mapping the raw data to other intermediate classes is the correct thing to do.
Now, how you do this mapping is a very large topic. Ideally you don't want to create classes that map one-to-one to your tables/columns, but rather provide your app a more app-friendly representation of the data (e.g. business objects rather than database tables). For example, if your employee data is split into two or three tables for normalization purposes, you can hide this complexity and present the information as a single Employee class that binds the data from the other tables together.
Abstracting away your data access code using objects is known as Object/Relational mapping. It's actually a much more complex task than it appears at first sight. There are several libraries, even in the framework itself, that already do very well what you're trying to implement.
If your needs are very simple, look into typed DataSets. They let you create the table classes in a designer and also generate objects that will do the loading and saving for you (given certain limitations).
If your needs are less simple, but still pretty simple, I recommend you take a look at Linq To SQL to see if it covers your needs, as it does table-class mapping in a very straightforward way and uses a more modern usage pattern than DataSets.
There are also more complex ORMs that allow you to define more complex mappings, like Entity Framework or nHibernate, but very often their complexity is not necessary.
Details like connection lifetime will then depend on your specific needs. Sometimes it's best to keep the connection open, if you have a lot of queries caused by user interaction, like is usually the case with a desktop app. Other times it's best to keep them as short as possible to avoid congestion, like the case of web apps.
Whichever technology you choose will likely end up guiding you onto a good set of practices for it, and the best you can do is try things out and see which works best for you.