In my application there are incoming messages, and I would like to sample some of them; for simplicity, let's say 1 in every 10. I have a settings file with the following properties:
MaxPerHour
MaxPerDay
MaxAllTime
It's not an option to keep the counts in the current class, so I need to store them somewhere else (in a database or in memory).
Another important thing is that there are multiple collectors, so I need to be able to know how many items Collector1 has collected in the last hour / today. It is also an async environment.
I am out of ideas, as I suspect that storing this data in a database would not be very performant.
It would be valuable to know what your hosting environment is. I think a distributed cache may turn out to be your holy grail. You can share one instance across all your collectors and easily read/write/invalidate data in that shared instance.
For example, if you run your system in Azure, Azure Cache for Redis is a great fit for this problem; then again, your infrastructure is key to the correct answer.
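To make the distributed-cache idea concrete, here is a minimal sketch of per-collector counters in Redis, assuming the StackExchange.Redis client; the key naming and limit handling are illustrative, not taken from your settings file:

// Sketch: per-collector hourly counter in Redis (StackExchange.Redis assumed).
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class CollectionQuota
{
    private readonly IDatabase _redis;

    public CollectionQuota(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    // Returns true if the collector is still under its hourly limit and records the hit.
    public async Task<bool> TryConsumeHourlyAsync(string collectorName, int maxPerHour)
    {
        // One key per collector per hour, e.g. "quota:Collector1:hour:2024060112".
        string key = $"quota:{collectorName}:hour:{DateTime.UtcNow:yyyyMMddHH}";
        long count = await _redis.StringIncrementAsync(key);
        if (count == 1)
        {
            // First hit this hour: let the key expire on its own so old counters clean up.
            await _redis.KeyExpireAsync(key, TimeSpan.FromHours(1));
        }
        return count <= maxPerHour;
    }
}

The same pattern covers MaxPerDay (a key per day) and MaxAllTime (a key with no expiry); because INCR is atomic on the Redis server, multiple collectors and async callers can share the counters safely.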
Let's say I have a REST API app through which items can be inserted into a database. I want many people to be able to call that REST API at the same time to insert objects, but I also want to be sure that no more than x items of a given type are ever inserted.
What would be the optimal strategy for that? Let's say there are two options: the app is hosted on a single node, or it is distributed across multiple nodes on different servers but sharing the same database.
If it is hosted on one machine, I assume I can use a shared semaphore; that's easy, but requests will block each other all the time.
For the distributed option, I assume there would have to be a transaction at the database level, but what could that look like if performance is important?
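To be concrete, the kind of thing I imagine for the database-level option is a conditional insert that counts and inserts inside one transaction. Here is a rough sketch, assuming SQL Server and a hypothetical Items(ItemType) table; the locking hints are just one way to serialize concurrent inserts per type:

// Sketch: capped insert for the distributed case (SQL Server, hypothetical Items table).
using System.Data.SqlClient;

public static class CappedInsert
{
    public static bool TryInsertItem(string connectionString, string itemType, int maxPerType)
    {
        const string sql = @"
            SET XACT_ABORT ON;
            BEGIN TRANSACTION;
            IF (SELECT COUNT(*) FROM Items WITH (UPDLOCK, HOLDLOCK)
                WHERE ItemType = @type) < @max
            BEGIN
                INSERT INTO Items (ItemType) VALUES (@type);
                COMMIT TRANSACTION;
                SELECT 1;  -- inserted
            END
            ELSE
            BEGIN
                ROLLBACK TRANSACTION;
                SELECT 0;  -- cap reached
            END";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@type", itemType);
            command.Parameters.AddWithValue("@max", maxPerType);
            connection.Open();
            return (int)command.ExecuteScalar() == 1;
        }
    }
}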
I assume this is a generic, well-known class of problem, so maybe you can give me some hints on where I can read about it?
Thanks for the help!
Background
I am developing a C# WinForms application, currently at about 11,000 LOC. The UI and logic are about 75% done, but there is no persistence yet. There are hundreds of attributes on the forms, and there are 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection, so we need a solution that maintains a database locally and keeps it in sync with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other users' records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data sync, each entity would have an integer primary key ID. Additionally, it would have a secondary ID column which relates it to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts.
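To be concrete, here is a minimal sketch of what one of my entities might look like under that dual-ID scheme (the names, and the optional RowVersion column for conflict detection, are illustrative rather than settled design):

// Sketch: hypothetical entity with a local key plus a reference to the central row.
using System.ComponentModel.DataAnnotations;

public class Customer
{
    [Key]
    public int Id { get; set; }            // local auto-increment primary key

    // Null in the central database; on a local copy it holds the ID of the
    // matching row in the central database once the record has been synced.
    public int? CentralId { get; set; }

    [Timestamp]
    public byte[] RowVersion { get; set; } // optional: helps detect conflicting edits

    public string Name { get; set; }
}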
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data synchronisation, and if so, would these work with the Entity Framework?
Syncing data between relational databases is a pain. Your best course of action probably depends on: how many users will there be? How probable are conflicts (i.e. that users will work offline on the same data)? Also, possibly, what kind of manpower you have (do you have proper DBAs/SQL Server devs standing by to assist with the SQL part, or are you just .NET devs)?
I don't envy you this task, it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out what the correct order of operations to perform on the remote db is.
One thought - it might be very wrong and crazy, but it's just the thing that popped into my mind - would be to use JSON Patch on the entities, be they actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements, then applied to the local db, and when the user is online they are submitted - with timestamps! - to a merge provider. The JSON Patch statements from different clients could be grouped by entity id and sorted by timestamp, and the user could get feedback on what other operations from different users are queued - and manually make amendments. Those grouped statements could even be stored in files in a git repo. Then at some pre-defined intervals, or triggered manually, the update would be performed by a server-side app and saved to the remote db. After this the users' local copies would be refreshed from the server.
It's just a rough idea, but I think you need something with similar capability - it doesn't have to be JSON Patch + Git, you could do it in probably hundreds of ways. I don't think, though, that you will get away with just walking the local/remote db and making updates/merges. Imagine the scenario where one user updates some data (let's say 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the sync process do? Apply the earlier and then the later changes? I'm fairly certain both users would be furious, because their input was 'atomic': either everything is changed, or nothing is. The later 'commit' must either be rejected, or users should have an option to amend it in light of the new data. That highly depends on what your data is and, as I said, on the number/behaviour of users. Even time zones become important here - if all your users are in one time zone you might get away with predefined times of day when the system syncs, but there is no way you'll convince people with many different business hours that the 'sync session' will happen at e.g. 11 AM, when they are usually giving a presentation to management or something ;)
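To make the patch-log idea a little more concrete, here is a rough sketch of recording timestamped change entries and grouping them per entity for the merge step. It hand-rolls the patch record with System.Text.Json rather than committing to a particular JSON Patch library, and all names are illustrative:

// Sketch: offline edits recorded as timestamped patch entries, grouped for merging.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public record PatchEntry(Guid EntityId, string Path, string NewValue, string UserId, DateTimeOffset Timestamp);

public static class MergeProvider
{
    // Serialize the local change log before sending it to the server.
    public static string Serialize(IEnumerable<PatchEntry> entries) =>
        JsonSerializer.Serialize(entries);

    // Group incoming patches by entity and order them by timestamp - the order in
    // which the merge layer would try to apply them, or show them to the user.
    public static Dictionary<Guid, List<PatchEntry>> GroupForMerge(IEnumerable<PatchEntry> entries) =>
        entries.GroupBy(e => e.EntityId)
               .ToDictionary(g => g.Key,
                             g => g.OrderBy(e => e.Timestamp).ToList());
}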
My question is about the best, tried and tested (and new?) methods out there for a fairly common requirement in most companies.
Every company has customers. Let's say a company A has about 10 different systems for its business needs. Customer data is critical to all of them.
A customer can be maintained in any of the systems independently, but if they fall out of sync then it's not good. I know it's ideal to keep one master place/system for the customer record and have all other systems take that information from that single location/system.
How do you build something like this? SOA? ETLs? Web services? Any other ideas out there that are new, not forgetting the old methods?
We are an MS/.NET shop. This is mostly for my knowledge and learning; please point me in the right direction, as I want to be aware of all my options.
Ideally all your different systems would share the same database, in which case that database would be the master. However that's almost never the case.
So the most common method I've seen is to have yet another system (let's call it a data warehouse) that takes feeds from your 10 different systems, aggregates them together, and forms a "master" view of a customer.
I have not done anything like this, but playing with the idea, here are my thoughts. Perhaps something will be helpful.
This is a difficult question, and I'd say it mainly depends on what development ability and interfaces you have available in each of the 10 systems. You may need a data-warehouse-manager piece of software, working as my next paragraph describes, with various plugins for all the different types of interfaces in the 10 systems involved.
Thinking from the data warehouse idea: Ideally each Customer in each system would have a LastModified field, although that is probably unlikely. So you'd almost need to serialize the Customer record from each source and store it in your data warehouse database along with the last time the program updated that record. This would allow you to know exactly which record is the newest any time anything changes in any of the 10 systems, and update fields based on that. This is about the best you could do if you're not developing some of the systems and are only able to read from some fashion of an interface.
If you are developing all the systems, then I'd imagine WCF interfaces (I mention WCF because it has more connection options than web services in general) to propagate updates to all the other systems (probably via a master hub application) might be the simplest option: passing in the new values and the date of the update, either from an event on the save button or by checking a LastModified field every hour/day.
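As a rough illustration of the hub idea, a WCF contract for propagating customer updates might look something like this; all names here are hypothetical, not a recommendation of a specific schema:

// Sketch: WCF contract a master hub could expose for customer updates.
using System;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class CustomerUpdate
{
    [DataMember] public string CustomerId { get; set; }
    [DataMember] public string SourceSystem { get; set; }
    [DataMember] public DateTime LastModifiedUtc { get; set; }
    [DataMember] public string Name { get; set; }
    [DataMember] public string Address { get; set; }
}

[ServiceContract]
public interface ICustomerHub
{
    // Each of the 10 systems pushes its change here; the hub decides, based on
    // LastModifiedUtc, whether to propagate it to the other systems.
    [OperationContract]
    void PublishCustomerUpdate(CustomerUpdate update);
}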
Another difficulty is what happens if one Customer object has an Address field and another does not: will the updates between those two overwrite each other in some cases? Or what if one has a CustomerName and another has CustomerFirstname and CustomerLastname?
NoSQL ideas of variable data structures and the ability to mark cached values as dirty also somewhat come to mind; I'm not sure how much benefit those concepts would really add, though.
We have a huge ASP.NET web application which needs to be deployed to LIVE with zero or nearly zero downtime. Let me point out that I've read the following question/answers but unfortunately it doesn't solve our problems as our architecture is a little bit more complicated.
Let's say that currently we have two IIS servers responding to requests and both are connected to the same MSSQL server. The solution seems like a piece of cake, but it isn't, because of the major schema changes we have to apply from time to time. Because of its huge size, a simple database backup takes around 8 minutes, which has become unacceptable, but it is a must before every new deploy for security reasons.
I would like to ask your help to get this deployment time down as much as possible. If you have any great ideas for a different architecture or maybe you've used tools which can help us here then please do not be shy and share the info.
Currently the best idea we have come up with is buying another SQL server which would be set up as a replica of the original DB. From the load balancer we would route all new traffic to one of the two IIS webservers. When the second webserver is free of running sessions, we can deploy the new code to it. Now comes the hard part. At this point we would go offline with the website and take down the replication between the two SQL servers, so we directly have a snapshot of the database in a hopefully consistent state (this saves us 7.5 of the 8 minutes). Finally we would update the database schema on the main SQL server, and route all traffic via the updated webserver while we upgrade the second webserver to the new version.
Please also share your thoughts regarding this solution. Can we somehow manage to eliminate the need for going offline with the website? How do bluechip companies with mammoth web applications do deployment?
Every idea or suggestion is more than welcome! Buying new hardware or software is really not a problem - we just miss the breaking idea. Thanks in advance for your help!
Edit 1 (2010.01.12):
Another requirement is to eliminate manual intervention, so in fact we are looking for a way which can be applied in an automated way.
Let me just remind you the requirement list:
1. Backup of database
2a. Deploy of website
2b. Update of database schema
3. Change to updated website
4 (optional): easy way of reverting to the old website if something goes very wrong.
First off, you are likely unaware of the "point in time restore" concept. The long and short of it is that if you're properly backing up your transaction logs, it doesn't matter how long your backups take -- you always have the ability to restore back to any point in time. You just restore your last backup and reapply the transaction logs since then, and you can get a restore right up to the point of deployment.
What I would tend to recommend would be reinstalling the website on a different Web Site definition with a "dead" host header configured -- this is your staging site. Make a script which runs your db changes all at once (in a transaction) and then flips the host headers between the live site and the staging site.
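A sketch of the host-header flip, assuming IIS 7+ and the Microsoft.Web.Administration API; the site names and bindings are placeholders, and in practice this would run at the end of the same script that applies the database changes:

// Sketch: swap host headers between the live and staging site definitions in IIS.
using Microsoft.Web.Administration;

public static class HostHeaderSwap
{
    public static void Swap()
    {
        using (var manager = new ServerManager())
        {
            var live = manager.Sites["LiveSite"];
            var staging = manager.Sites["StagingSite"];

            // After the db script has run, staging takes over the public host header
            // and the old live site gets the "dead" one.
            Rebind(staging, "*:80:www.example.com");
            Rebind(live, "*:80:old.example.internal");

            manager.CommitChanges();
        }
    }

    private static void Rebind(Site site, string bindingInformation)
    {
        site.Bindings.Clear();
        site.Bindings.Add(bindingInformation, "http");
    }
}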
Environment:
Current live web site(s)
Current live database
New version of web site(s)
New version of database
Approach:
Set up a feed (e.g. replication, a stored procedure, etc.) so that the current live database server sends data updates to the new version of the database.
Change your router so that the new requests get pointed to the new version of the website until the old sites are no longer serving requests.
Take down the old site and database.
In this approach there is zero downtime because both the old site and the new site (and their respective databases) are permitted to serve requests side-by-side. The only problem scenario is clients who have one request go to the new server and a subsequent request go to the old server. In that scenario, they will not see the new data that might have been created on the new site. A solution to that is to configure your router to temporarily use sticky sessions and ensure that new sessions all go to the new web server.
One possibility would be to use versioning in your database.
So you have a global setting which defines the current version of all stored procedures to use.
When you come to do a release you do the following:
1. Change database schema, ensuring no stored procedures of the previous version are broken.
2. Release the next version of stored procedures.
3. Change the global setting, which switches the application to use the next set of stored procedures/new schema.
The tricky portion is ensuring you don't break anything when you change the database schema.
If you need to make fundamental changes, you'll need to either use 'temporary' tables, which are used for one version before moving to the schema you want in the next version, or modify the previous version's stored procedures to be more flexible.
That should mean almost zero downtime, if you can get it right.
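A minimal sketch of how the application side might pick the stored procedure version at runtime, assuming a hypothetical Settings table and a _v1/_v2 naming convention for the procedures (neither of which comes from the answer above):

// Sketch: choose the stored procedure version based on a global setting.
using System.Data;
using System.Data.SqlClient;

public static class VersionedProcedures
{
    // Assumes 'connection' is already open and a Settings(Name, Value) table exists.
    public static int GetCurrentVersion(SqlConnection connection)
    {
        using (var command = new SqlCommand(
            "SELECT Value FROM Settings WHERE Name = 'SchemaVersion'", connection))
        {
            return (int)command.ExecuteScalar();
        }
    }

    public static SqlCommand GetCustomerCommand(SqlConnection connection, int customerId)
    {
        int version = GetCurrentVersion(connection);
        var command = new SqlCommand("dbo.GetCustomer_v" + version, connection)
        {
            CommandType = CommandType.StoredProcedure
        };
        command.Parameters.AddWithValue("@CustomerId", customerId);
        return command;
    }
}

Releasing a new version then comes down to deploying the _v2 procedures alongside the _v1 ones and updating the single SchemaVersion row, which is what makes the switchover effectively instantaneous.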
Firstly - do regular, small changes - I've worked as a freelance developer in several major Investment Banks on various 24/7 live trading systems and the best, smoothest deployment model I ever saw was regular (monthly) deployments with a well defined rollback strategy each time.
In this way, all changes are kept to a minimum, bugs get fixed in a timely manner, development doesn't feature creep, and because it's happening so often, EVERYONE is motivated to get the deployment process as automatic and hiccup free as possible.
But inevitably, big schema changes come along that make a rollback very difficult (although it's still important to know - and test - how you'll rollback in case you have to).
For these big schema changes we worked a model of 'bridging the gap'. That is to say that we would implement a database transformation layer which would run in near real-time, updating a live copy of the new style schema data in a second database, based on the live data in the currently deployed system.
We would copy this a couple of times a day to a UAT system and use it as the basis for testing (hence testers always have a realistic dataset to test, and the transformation layer is being tested as part of that).
So the change in database is continuously running live, and the deployment of the new system then is simply a case of:
Freeze everyone out
Switch off the transformation layer
Turn on the new application layer
Switch users over to the new application layer
Unfreeze everything
This is where rollback becomes something of an issue though. If the new system has run for an hour, rolling back to the old system is not easy. A reverse transformation layer would be the ideal but I don't think we ever got anyone to buy into the idea of spending the time on it.
In the end we'd deploy during the quietest period possible and get everyone to agree that rollback would take us to the point of switchover and anything missing would have to be manually re-keyed. Mind you - that motivates people to test stuff properly :)
Finally - how to do the transformation layer - in some of the simpler cases we used triggers in the database itself. Only once I think we grafted code into a previous release that did 'double updates', the original update to the current system, and another update to the new style schema. The intention was to release the new system at the next release, but testing revealed the need for tweaks to the database and the 'transformation layer' was in production at that point, so that process got messy.
The model we used most often for the transformation layer was simply another server process running, watching the database and updating the new database based on any changes. This worked well as that code is running outside of production, can be changed at will without affecting the production system (well - if you run on a replication of the production database you can, but otherwise you have to watch out for not tying the production database up with some suicidal queries - just put the best most conscientious guys on this part of the code!)
Anyway - sorry for the long ramble - hope I put the idea over - continuously do your database deployment as a 'live, running' deployment to a second database, then all you've got to do to deploy the new system is deploy the application layer and pipe everything to it.
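For what it's worth, a bare-bones version of that 'second server process' transformation layer could look something like the sketch below. It assumes a LastModified column on the source table and invents all table, column, and schema names purely for illustration:

// Sketch: polling transformation layer copying changed rows into the new-schema database.
using System;
using System.Data.SqlClient;
using System.Threading;

public static class TransformationLayer
{
    public static void Run(string liveConnectionString, string newSchemaConnectionString)
    {
        // SQL Server's datetime type cannot go below 1753, so start there.
        DateTime highWaterMark = new DateTime(1753, 1, 1);

        while (true)
        {
            using (var source = new SqlConnection(liveConnectionString))
            using (var target = new SqlConnection(newSchemaConnectionString))
            {
                source.Open();
                target.Open();

                // Pull only what changed since the last pass.
                var read = new SqlCommand(
                    "SELECT Id, Name, LastModified FROM dbo.Customer WHERE LastModified > @since",
                    source);
                read.Parameters.AddWithValue("@since", highWaterMark);

                using (var reader = read.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Transform the old-schema row into the new shape and upsert it.
                        var write = new SqlCommand(
                            @"MERGE dbo.CustomerV2 AS t
                              USING (SELECT @id AS Id) AS s ON t.Id = s.Id
                              WHEN MATCHED THEN UPDATE SET FullName = @name
                              WHEN NOT MATCHED THEN INSERT (Id, FullName) VALUES (@id, @name);",
                            target);
                        write.Parameters.AddWithValue("@id", reader.GetInt32(0));
                        write.Parameters.AddWithValue("@name", reader.GetString(1));
                        write.ExecuteNonQuery();

                        DateTime modified = reader.GetDateTime(2);
                        if (modified > highWaterMark) highWaterMark = modified;
                    }
                }
            }

            Thread.Sleep(TimeSpan.FromSeconds(5)); // near real-time, not instantaneous
        }
    }
}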
I saw this post a while ago, but have never used it, so can't vouch for ease of use/suitability, but MS have a free web farm deployment framework that may suit you:
http://weblogs.asp.net/scottgu/archive/2010/09/08/introducing-the-microsoft-web-farm-framework.aspx
See my answer here: How to deploy an ASP.NET Application with zero downtime
My approach is to use a combination of polling AppDomains and a named mutex to create an atomic deployment agent.
I would recommend using Analysis Services instead of the database engine for your reporting needs. Then you could process your cubes, move your database, change a connection string, reprocess your cubes, and thus have zero downtime.
Dead serious... There isn't a better product in the world than Analysis Services for this type of thing.
As you say you don't have a problem buying new servers, I suggest the best way would be to get a new server and deploy your application there first. Follow the steps below:
1. Add any certificates required to the new server and test your application with the new settings.
2. Shut down your old server and assign its IP to the new server; the downtime is only as long as it takes to shut down the old server and assign its IP to the new one.
3. If you see the new deployment is not working, you can always revert by repeating step 2 in the other direction.
Regarding your database backup, you would have to set up a backup schedule.
I just answered a similar question here: Deploy ASP.NET web site and Update MSSQL database with zero downtime
It discusses how to update the database and IIS website during a deployment with zero downtime, mainly by ensuring your database is always backwards compatible (but just to the last application release).
I have developed a network application that has been in use in my company for the last few years.
At the start it managed information about users, rights, etc.
Over time it grew to include other functionality, to the point that I now have tables with, say, 10-20 columns and even 20,000-40,000 records.
I keep hearing that Access is not good for multi-user environments.
The second thing is that when I try to read some records from a table over the network, the whole table has to be pulled to the client.
This happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to SQL Server, but unfortunately that cannot be done in this case.
I was wondering if there is a more reliable solution for me than using an Access database while still staying with a single-file database system.
We have quite a huge system using dBase IV.
As far as I know, it is a fully multiuser database system.
Maybe it would be good to use it instead of Access?
What makes me not sure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
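If your front end is a .NET application talking to the Jet/ACE file over OLE DB, adding a missing index can be as simple as executing a CREATE INDEX statement against the back end; here is a sketch with a hypothetical file path, table, and column:

// Sketch: add an index to the Jet/ACE back end so selective queries only pull
// the relevant index and data pages over the network.
using System.Data.OleDb;

public static class BackEndIndexing
{
    public static void CreateCustomerNameIndex(string backEndPath)
    {
        string connectionString =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + backEndPath;

        using (var connection = new OleDbConnection(connectionString))
        using (var command = new OleDbCommand(
            "CREATE INDEX idxCustomerName ON Customers (CustomerName)", connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}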
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that Access will choke in multi-user scenarios--at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So...you CAN use access as your data store. Then if you can get beyond the company politics controlling your network, perhaps then you could begin moving toward upfitting your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into a front end and a back end is sort of the key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access Database shared amongst your users will be the same with any file based database.
A read will pull a lot of data into memory and writes are guarded with some type of file lock. Under your environment it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually, no. This is a common misstatement spread by folks who do not understand the nature of how Jet, the database engine inside Access, works. Pulling down all the records, or an excessive number of records, happens because you don't have all the fields used in the selection criteria or sorting covered by an index. We've found that indexing yes/no (boolean) fields can also make a huge difference in some queries.
What really happens is that Jet brings down only the index pages and data pages which are required. While this is more data than a server-side database engine would return, it is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on it all at the same time before without any issues. As long as they don't access the same record, I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance, I have a Quoting form that contains 13 different tabs and 10-20 fields on each tab. Users are typically in a single record for minutes, editing and looking for information. To avoid any write conflicts, I call the function below any time a field is changed. As long as it is not a new record being entered, it updates.
Function funSaveTheRecord()
    If ([chkNewRecord].Value = False And Me.Dirty) Then
        'To save the record, turn off the form's Dirty property
        Me.Dirty = False
    End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front end, stored on the user's machine. Every user has their own copy. Links to tables found in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle the shift key bypass on/off. Only I have access to it.
PDC_be.mdb <-- Back end, stored on the server. Contains all data. The only form and VBA it contains is for toggling the shift key bypass on/off. Only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on the users desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later -- or it WILL be later), just use your existing database system.
Basically, Access cannot handle more than 5 people connected at the same time, or it will corrupt on you.