I have a question about bidirectional data sync.
The scenario is that we have ERP software running on a local network, developed in PowerBuilder, with a SQL Anywhere 16 database. We also have our cloud software, developed in .NET 6, with an Azure SQL database. In addition, we have middleware developed in .NET that interacts with our API and the local DB. After an operation such as invoice generation, we need the quantity of a product to be accurate and identical in the local DB and the cloud DB, whether the operation happened in the cloud or on the local network. Please share your thoughts.
The approach would depend on whether you are willing to sacrifice consistency or concurrency.
If you are willing to sacrifice consistency, which I believe would be acceptable in your case for some use cases such as syncing local invoices to the cloud, your middleware could asynchronously bring both databases into sync in the background.
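The asynchronous option can be sketched roughly as follows. The sketch assumes that each completed transaction (for example an invoice) is published as a message, and that a background consumer replays it on the other database. `queue.Queue` stands in for a real message broker, and the dicts `local_db`/`cloud_db` stand in for the SQL Anywhere and Azure SQL databases. All names here are hypothetical.

```python
import queue
import threading

# Hypothetical stand-ins for the two databases: product_id -> quantity on hand.
local_db = {"P1": 10, "P2": 5}
cloud_db = {"P1": 10, "P2": 5}

# Each message describes a stock change that has already happened on one side.
sync_queue = queue.Queue()

def record_invoice(db, origin, product_id, qty_sold):
    """Apply the invoice on the side where it happened, then enqueue it for the other side."""
    db[product_id] -= qty_sold
    sync_queue.put({"origin": origin, "product_id": product_id, "delta": -qty_sold})

def sync_worker():
    """Background consumer: replays each change on the database it did not originate from."""
    while True:
        msg = sync_queue.get()
        if msg is None:  # sentinel used to stop the worker
            sync_queue.task_done()
            break
        target = cloud_db if msg["origin"] == "local" else local_db
        target[msg["product_id"]] += msg["delta"]
        sync_queue.task_done()

worker = threading.Thread(target=sync_worker, daemon=True)
worker.start()

record_invoice(local_db, "local", "P1", 3)  # invoice raised on the local network
record_invoice(cloud_db, "cloud", "P2", 2)  # invoice raised in the cloud
sync_queue.join()     # wait until every queued change has been applied
sync_queue.put(None)  # stop the worker
worker.join()
```

Until the queue is drained, the two databases disagree. That window is the consistency being sacrificed. A real setup would also need durable messages and idempotent replay so that a crash does not lose or double-apply a change.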
If you are willing to sacrifice concurrency, which would be needed to ensure the quantity of a product is accurate before checking out, you would essentially use a lock to ensure the product is available and is not being checked out by someone else. This has the downside of slowing down the system, since multiple requests would be waiting while previous requests are processed.
As for the actual implementation: for the first option, you could publish each transaction to a queue that the middleware consumes and syncs to the other database. For the second option, you would need some kind of distributed lock per product that your API/middleware must acquire before committing changes.
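The pessimistic option can be sketched with one lock per product: acquire it, check availability, decrement, release. The sketch uses in-process `threading.Lock` objects purely to illustrate the control flow. Across the middleware and the cloud API you would need a genuinely distributed lock, for example a lease row in a shared database or a lock service, which this sketch does not implement. `ProductLocks` and `checkout` are hypothetical names.

```python
import threading
from collections import defaultdict

class ProductLocks:
    """One lock per product ID, created on first use."""
    def __init__(self):
        self._guard = threading.Lock()             # protects the dict of locks itself
        self._locks = defaultdict(threading.Lock)

    def for_product(self, product_id):
        with self._guard:
            return self._locks[product_id]

locks = ProductLocks()
stock = {"P1": 5}

def checkout(product_id, qty):
    """Check availability and decrement while holding that product's lock."""
    with locks.for_product(product_id):
        if stock[product_id] < qty:
            return False  # not enough stock: reject instead of overselling
        stock[product_id] -= qty
        return True

# Ten concurrent requests for one unit each, against five units of stock.
results = []
results_lock = threading.Lock()

def attempt():
    ok = checkout("P1", 1)
    with results_lock:
        results.append(ok)

threads = [threading.Thread(target=attempt) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly five requests succeed, and the rest wait their turn and are rejected. Locking per product rather than globally keeps checkouts of different products from blocking each other.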
Depending on the scale and business KPIs of your application, you would have to decide how to approach the second option. Here is another SO thread which has a decent discussion on various approaches based on what you are willing to sacrifice.
I am fairly new to asynchronous programming, so I need help.
What I need to do is create a Windows service that constantly checks the database for menu updates (inserts/updates), table updates (inserts/updates), menu category updates (inserts/updates) and so on. If any change is detected, the service then needs to POST those changes to separate APIs one by one. Keep in mind that the service will be used only for this purpose, and the database I need to check for updates is SQL Server.
So, how do I approach this scenario efficiently? Do I create new Tasks (System.Threading.Tasks) or new Threads (System.Threading.Thread) for each piece, such as UpdateMenu (which checks for menu updates and uploads them to the API), UpdateTable, UpdateDishes and so on? And how do I go about posting to the API: do I create a new Task for each and every API call? I want the application to be as efficient as possible and to pick up changes and post them to the API as soon as possible.
Thanks in advance.
It seems that you are worried about the overhead of the mechanism you are going to use to fetch data from the database and post it to the APIs. You are thinking that maybe Threads are fast and Tasks are slower, or vice versa. In fact, choosing between these two mechanisms is likely to have no measurable impact on your service's demand for CPU, memory or other system resources.
What is likely to be impactful is the pattern of communication between your service and the database and the APIs. For example, if your threads/tasks are not coordinated with each other and query the database all at the same time, the database might be slow to respond and might consume larger amounts of memory while preparing the response. That's not because your threads/tasks are slow. It's because your service is querying the database with a pattern that makes it harder for the database to respond. The same might be true for the pattern of communication with the APIs. If your workers are not coordinated, the network connectivity might become a bottleneck, or the remote machines that host the APIs might suffer.
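Here is one way "coordinated" could look. The sketch uses Python's asyncio rather than .NET Tasks, though the shape carries over. It runs one poller per entity type, and semaphores cap how many database queries and API calls are in flight at once. `fetch_changes` and `post_change` are hypothetical placeholders for the SQL Server change query and the HTTP POST. The limits of 1 and 2 are arbitrary.

```python
import asyncio

posted = []  # records what was "sent", for illustration only

async def fetch_changes(entity):
    """Hypothetical placeholder for a query returning rows changed since the last poll."""
    await asyncio.sleep(0.01)  # simulated database latency
    return [f"{entity}-change-{i}" for i in range(3)]

async def post_change(entity, change):
    """Hypothetical placeholder for POSTing one change to that entity's API."""
    await asyncio.sleep(0.01)  # simulated network latency
    posted.append((entity, change))

async def poll_once(entity, db_limit, api_limit):
    async with db_limit:  # only one poller queries the database at a time
        changes = await fetch_changes(entity)
    for change in changes:  # one entity's changes are posted one by one, in order
        async with api_limit:  # cap concurrent API calls across all pollers
            await post_change(entity, change)

async def run(rounds=1):
    db_limit = asyncio.Semaphore(1)
    api_limit = asyncio.Semaphore(2)
    entities = ["menu", "table", "menu_category"]
    for _ in range(rounds):
        await asyncio.gather(*(poll_once(e, db_limit, api_limit) for e in entities))
        # A real service would loop until stopped, sleeping between polls.

asyncio.run(run())
```

The limits are the point of the example. Tuning how often and how concurrently the service touches SQL Server and the APIs is where the real efficiency comes from.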
So my advice is to focus on the usability of the mechanisms, not on their supposed difference in performance. If you are comfortable and familiar with threads and know nothing about tasks, use threads. If you are familiar with both threads and tasks, use tasks, because they are generally easier to use. You'd do better to invest your time optimizing the communication pattern between your service and its dependencies than benchmarking mechanisms that, for all intents and purposes, are equally efficient.
This is more of a programming strategy and direction question, than the actual code itself.
I am programming in C-Sharp.
I have an application that remotely starts processes on many different clients on the network, could be up to 1000 clients in theory.
It then monitors the status of the remote processes by reading a log file on each client.
I currently do this by running one thread that loops through all of the clients in a list, and reading the log file. It works fine for 10 or 20 machines, but 1000 would probably be untenable.
There are several problems with this approach:
First, if the thread doesn’t finish reading all of the client statuses before it’s called again, the client statuses at the end of the list might not be read and updated.
Secondly, if any client in the list goes offline during this period, the updating hangs until that client is back online.
So I require a different approach, and have thought up a few possible ways to resolve this.
1. Spawn a separate thread for each client to read its log file and update its progress.
   a. However, I’m not sure whether having 1000 threads running on my machine would be acceptable.
2. Test the connection to each machine first, before trying to read the file. If it cannot connect, ignore it for that iteration and move on to the next client in the list.
   a. This still has the same problem of not getting through the list before the next call. It also adds more delay, since it tests the connection via a port first. With 1000 clients, this would be noticeable.
3. Have each client send its data to the machine running the application whenever there is an update.
   a. This could create a lot of chatter, with 1000 machines repeatedly sending data.
So I’m trying to figure out whether there is another, more efficient and reliable method that I haven’t considered, or which of these would be the best.
Right now I’m leaning towards having the clients send updates to the application, instead of having the application pulling the data.
Looking for thoughts, concerns, ideas and recommendations.
In my opinion, you are doing this (monitoring) the wrong way. Instead of keeping all logs in text files, you would be better off storing them in a central data repository, which can be of any kind. Since you are monitoring the performance of those systems, your design and the mechanism behind it must not degrade the performance of the target systems. With this design, the disk and CPU could be so heavily involved in certain cases that the monitoring itself becomes a performance issue.
I recommend creating a log repository server using a fast in-memory database such as Redis, and sending logged data directly to that server. Keep in mind that this database must run on a different virtual machine. You can then tune Redis to save received data to physical disk once a certain number of writes is reached or a certain interval elapses. The in-memory feature is advantageous here, as you may need to query information a lot in a monitoring application like this. Moreover, Redis performs well enough to handle millions of entries efficiently.
The blueprint is:
1- Centralize all log data in a single repository.
2- Configure clients to send monitored information to the centralized repository.
3- Have the main server (the monitoring system) read the data from the centralized repository when required.
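The three steps above can be sketched with an in-memory dict standing in for the central store (Redis or Elasticsearch in a real deployment). Clients push status updates, and the monitor reads a snapshot on demand. An offline client simply shows up as stale instead of making the monitor hang. `CentralRepository`, `push_status` and the 60-second staleness threshold are all hypothetical.

```python
import threading
import time

class CentralRepository:
    """Stand-in for a central store such as Redis: the latest status per client."""
    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}  # client_id -> (timestamp, status)

    def push_status(self, client_id, status, now=None):
        """Step 2: called by each client whenever its process reports progress."""
        ts = time.time() if now is None else now
        with self._lock:
            self._latest[client_id] = (ts, status)

    def snapshot(self, stale_after=60.0, now=None):
        """Step 3: called by the monitor; never waits on an offline client."""
        now = time.time() if now is None else now
        with self._lock:
            items = dict(self._latest)
        return {
            cid: (status if now - ts <= stale_after else "STALE")
            for cid, (ts, status) in items.items()
        }

repo = CentralRepository()
repo.push_status("client-001", "running 40%", now=1000.0)
repo.push_status("client-002", "done", now=1050.0)
snap = repo.snapshot(now=1070.0)  # client-001 has been silent for 70 s, so it is flagged
```

Reading cost now depends on the size of the store rather than on 1000 network round trips. A client that goes offline only affects its own entry.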
I'm not trying to advertise a particular tool here; I'm only sharing my own experience. There are many other tools you can use for this purpose, such as Elasticsearch.
My ASP.NET MVC 4 project uses EF5 code-first, and some of the domain objects contain non-persisted counter properties that are updated according to incoming requests. These requests come very frequently, so a scenario in which multiple request sessions modify these counters concurrently is quite probable.
My question is, is there a best practice, not necessarily related to ASP.NET or to EF, to handle this scenario? I think (but I'm not sure) that for the sake of this discussion, we can treat the domain objects as simple POCOs (which they are).
EDIT: As requested, following is the actual scenario:
The system is a subscriber and content management system. Peer servers are issuing requests which my system either authorizes or denies. Authorized requests result in opening sessions in peer servers. When a session is closed in the peer server, it issues a request notifying that the session has been closed.
My system needs to provide statistics - for example, the number of currently open sessions for each content item (one of the domain entities) - and provide real-time figures as well as per-minute, hourly, daily, weekly etc. figures.
These figures can't be extracted by means of querying the database due to performance issues, so I've decided to implement the basic counters in-memory, persist them every minute to the database and take the hourly, daily etc. figures from there.
The issue above results from the fact that each peer server request updates these "counters".
I hope it's clearer now.
Sounds like your scenario still requires a solid persistence strategy.
Your counter objects can be stored in HttpRuntime.Cache.
Dan Watson has an exceptional writeup here:
http://www.dotnetguy.co.uk/post/2010/03/29/c-httpruntime-simple-cache/
Be sure to use CacheItemPriority.NotRemovable so that the items keep their state during memory reclamation. The cache is maintained within the scope of the app domain. You could retrieve and update counters in the cache and query their status from, presumably, a stats page or some other option. The cache itself is thread-safe, although incrementing a counter object held in it still needs Interlocked or a lock. However, if the data needs to be persisted beyond the lifetime of the app domain, the strategy you're already using is sufficient.
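Keeping counters in a cache does not by itself make incrementing them atomic. The sketch below is in Python rather than .NET, where `Interlocked.Increment` or a `lock` statement would play the same role. It is a counter store in which every increment, and the once-a-minute "snapshot and reset", happens under one lock. The open-sessions figure is a gauge that is never reset. The per-minute figure is a window that is cleared on each flush. `SessionCounters` and the flush cadence are hypothetical.

```python
import threading
from collections import Counter

class SessionCounters:
    def __init__(self):
        self._lock = threading.Lock()
        self._open = Counter()           # currently open sessions per content item (gauge)
        self._opened_window = Counter()  # sessions opened since the last flush

    def session_opened(self, content_id):
        with self._lock:
            self._open[content_id] += 1
            self._opened_window[content_id] += 1

    def session_closed(self, content_id):
        with self._lock:
            self._open[content_id] -= 1

    def open_now(self, content_id):
        """Real-time figure, e.g. for a stats page."""
        with self._lock:
            return self._open[content_id]

    def flush(self):
        """Snapshot for the per-minute database write; resets only the windowed counter."""
        with self._lock:
            window = dict(self._opened_window)
            self._opened_window.clear()
            gauge = dict(self._open)
        return gauge, window

counters = SessionCounters()

def simulate(n):
    for _ in range(n):
        counters.session_opened("item-1")

threads = [threading.Thread(target=simulate, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
counters.session_closed("item-1")
gauge, window = counters.flush()
```

A timer would call `flush()` every minute and write both dictionaries to the database. The hourly, daily and weekly figures are then aggregated from those rows, as described in the question.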
Actually, I don't think you need to worry too much about performance until you have enough information from tests and profiling tools.
But since you're working with EF, you are dealing with a DbContext, which implements the Unit of Work pattern described by Martin Fowler in his book. The main idea of this pattern is to reduce the number of requests to the database and to operate on the data in memory as much as possible until you commit all your changes. So my short advice is to use your EF entities in the standard way, but instead of committing changes every time the data updates, commit them at intervals (for example, after every 100 changes). Between requests, store the data in Session, Application state, Cache or somewhere else. The only things you need to take care of are using the proper DbContext object each time and disposing of it when you no longer need it.
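The "commit every N changes" idea can be sketched as a small buffer. `commit_batch` is a hypothetical callback standing in for something like `SaveChanges()` on a fresh context. The threshold of 100 comes from the answer. The lock is there because many request threads would share the same buffer.

```python
import threading

class BatchingUnitOfWork:
    def __init__(self, commit_batch, threshold=100):
        self._commit_batch = commit_batch  # hypothetical: persists a list of changes in one go
        self._threshold = threshold
        self._pending = []
        self._lock = threading.Lock()

    def record(self, change):
        batch = None
        with self._lock:
            self._pending.append(change)
            if len(self._pending) >= self._threshold:
                batch, self._pending = self._pending, []
        if batch:
            self._commit_batch(batch)  # commit outside the lock so other requests are not blocked

    def flush(self):
        """Commit whatever is left, e.g. on a timer or at application shutdown."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._commit_batch(batch)

commits = []
uow = BatchingUnitOfWork(commits.append, threshold=100)
for i in range(250):
    uow.record({"counter": "item-1", "delta": 1})
uow.flush()
```

The trade-off is that anything still buffered is lost if the process dies before the next commit. That is why a periodic `flush()` on a timer is worth having alongside the size threshold.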
Following on from this question...
What to do when you’ve really screwed up the design of a distributed system?
... the client has reluctantly asked me to quote for option 3 (the expensive one), so they can compare prices to a company in India.
So, they want me to quote (hmm). To make this as accurate as possible, I need to decide how I'm actually going to do it. Here are 3 scenarios...
Scenarios
Split the database
My original idea (perhaps the trickiest) will yield the best speed on both the website and the desktop application. However, it may require some synchronising between the two databases, as the two "systems" are so heavily connected. I've learnt that synchronisation, if not done properly and tested thoroughly, can be hell on earth.
Implement caching on the smallest system
To side-step the sync option (which I'm not fond of), I figured it may be more productive (and cheaper) to move the entire central database and web service to their office (i.e. in-house), and have the website (still on the hosted server) download data from the central office and store it in a small database (acting as a cache)...
Set up a new server in the customer's office (in-house).
Move the central database and web service to the new in-house server.
Keep the web site on the hosted server, but alter the web service URL so that it points to the office server.
Implement a simple cache system for images and most frequently accessed data (such as product information).
... the down-side is that when the end-user in the office updates something, their customers will effectively be downloading the data from a 60KB/s upload connection (albeit once, as it will be cached).
Not all data can be cached either, for example when a customer updates their order. Connection redundancy also becomes a huge factor here: what if the office connection is offline? There would be nothing to do but show an error message to the customers, which is nasty but a necessary evil.
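The "simple cache system" step could be a read-through cache with a time-to-live. The sketch assumes the website wraps calls to the office web service this way. Each item then crosses the slow office uplink once per TTL period, while writes such as order updates would bypass the cache entirely. `fetch_from_office` and the 300-second TTL are hypothetical, and the injectable clock exists only to make expiry easy to demonstrate.

```python
import time

class ReadThroughCache:
    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader    # hypothetical: calls the office web service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # fresh hit: no trip over the office uplink
        value = self._loader(key)  # miss or expired: fetch once, then reuse
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call when the office updates the item, so the next read fetches it again."""
        self._entries.pop(key, None)

calls = []

def fetch_from_office(key):
    calls.append(key)
    return {"product": key}

fake_now = [0.0]
cache = ReadThroughCache(fetch_from_office, ttl_seconds=300.0, clock=lambda: fake_now[0])
cache.get("P1")
cache.get("P1")      # served from the cache
fake_now[0] = 301.0
cache.get("P1")      # expired, so it is fetched again
```

The same structure is a natural place to decide what happens when the office is unreachable. For example, you could serve the stale entry instead of an error for read-only data such as product information.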
Mystery option number 3
Suggestions welcome!
SQL replication
I had considered MSSQL replication, but I have no experience with it, so I'm worried about how conflicts are handled, etc. Is this an option, considering there are physical files involved and so on? Also, I believe we'd need to upgrade from SQL Server Express to a paid edition and buy two licenses.
Technical
Components
ASP.Net website
ASP.net web service
.Net desktop application
MSSQL 2008 express database
Connections
Office connection: 8 Mbit/s down and 1 Mbit/s up, contended line (50:1)
Hosted virtual server: Windows 2008 with a 10 Mbit/s line
Having just read your original question on this for the first time, I'd say that you may have already laid the foundation for resolving the problem, simply because you communicate with the database through a web service.
This web service may well be the saving grace as it allows you to split the communications without affecting the client.
A good while back I was involved in designing just such a system.
The first thing we identified was data that rarely changes, and we immediately excluded all of it from distribution. A manual administration process through the web server was the only way to change this data.
The second thing we identified was data that should be owned locally. By this I mean data that only one person or location at a time would need to update, but that may need to be viewed at other locations. We fixed all of the keys on the related tables to ensure that duplication could never occur, and we used no auto-incrementing fields.
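The answer does not say which key scheme they used. One common way to make locally owned keys collision-free without auto-incrementing columns is to combine a fixed site ID with a site-local sequence. The sketch below illustrates that; GUIDs would be the other usual choice. `SiteKeyGenerator` and the site codes are hypothetical.

```python
import itertools
import threading

class SiteKeyGenerator:
    """Issues keys of the form (site_id, sequence); two sites can never produce the same key."""
    def __init__(self, site_id, start=1):
        self.site_id = site_id
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_key(self):
        with self._lock:
            return (self.site_id, next(self._counter))

office = SiteKeyGenerator("OFFICE")
web = SiteKeyGenerator("WEB")
keys = [office.next_key(), office.next_key(), web.next_key()]
```

In a real system the sequence would have to be persisted per site so that it survives restarts. The site ID would typically be part of a composite primary key on the related tables.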
The third item was the tables that were truly shared. Although we worried a lot about these during stages 1 and 2, in our case this part was straightforward.
When I talk about a server here, I mean a DB server with a set of web services that communicate with each other.
As designed, our architecture had one designated 'master' server, which was the final authority for resolving conflicts.
The rest of the servers were, in the first instance, a large cache of everything covered by item 1. In fact it wasn't so much a cache as a duplicate of the database, but you get the idea.
The second function of each non-master server was to coordinate changes with the master. This involved a very simple process: most of the work was passed through transparently to the master server.
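The non-master role described above can be sketched as follows, assuming reads of the rarely changing reference data (item 1) come from the local copy while every write is passed straight through to the master, the single authority. `MasterServer`, `NonMasterServer` and the dict-based storage are hypothetical stand-ins for the DB servers and the web service calls between them.

```python
class MasterServer:
    """The single authority: every write lands here."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        return value

class NonMasterServer:
    def __init__(self, master, reference_data):
        self._master = master
        self._local = dict(reference_data)  # local duplicate of rarely changing data

    def read(self, key):
        if key in self._local:
            return self._local[key]         # served locally, no round trip
        return self._master.data.get(key)   # stands in for a web service call to the master

    def write(self, key, value):
        return self._master.write(key, value)  # passed through transparently

master = MasterServer()
branch = NonMasterServer(master, {"vat_rate": 0.2})
branch.write("order-17", "shipped")
```

Because the master is the only place writes are applied, conflict resolution stays in one spot. The non-master servers mostly save bandwidth on reads.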
We spent a lot of time designing and optimising all of the above, only to discover that the single biggest performance improvement came from simply compressing the web service requests to reduce bandwidth (although we were on a single-channel ISDN line, which probably made the most difference).
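A quick way to estimate what compression might save is to gzip a representative payload. The payload below is a hypothetical, repetitive XML-like message, and real savings depend on your actual messages, though text formats such as XML typically compress well.

```python
import gzip

# Hypothetical payload: 200 similar records, as a web service response might contain.
payload = "".join(
    f"<Product><Id>{i}</Id><Name>Widget {i}</Name><Price>9.99</Price></Product>"
    for i in range(200)
).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")

# The receiving side reverses it losslessly.
assert gzip.decompress(compressed) == payload
```

Running this against captured real requests and responses gives a concrete number to weigh against the other options.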
The point is that having a web service gives you greater flexibility in how you implement this.
I'd probably start by investigating the feasibility of implementing one of the SQL Server replication methods.
Usual disclaimers apply.
Splitting the database will not help much, but it will add a lot of headaches. IMO, you should first try to optimize the database: update some indexes or maybe add a few more, optimize some queries, and so on. For database performance tuning, I recommend reading some of the articles on simple-talk.com.
Also, to save bandwidth, you can add bulk processing to your Windows client and add compression (zipping) to your web service.
You should probably also upgrade to MS SQL 2008 Express; it's free as well.
It's hard to recommend a good solution to your problem with the information I have, because it's not clear where the bottleneck is. I strongly recommend profiling your application to find exactly where the bottleneck is (e.g. in the database, or in a fully saturated channel, and so on) and adding a description of it to the question.
EDIT 01/03:
If the bottleneck is the upstream connection, you can only do the following:
1. Add compression (archiving) of messages to the service and the client
2. Implement bulk operations and use them
3. Try to reduce the number of operations per use case for the most frequent cases
4. Add a local database for the Windows clients, perform all operations against it, and synchronize the local DB with the main one on a timer
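Items 2 and 4 of the list above combine naturally into an "outbox" pattern. The client writes every change to its local database plus an outbox table in one local transaction. A timer then pushes the pending rows to the main database in a single bulk call. In the sketch, SQLite stands in for the client's local database, `send_bulk` is a hypothetical bulk web service call, and the table names are illustrative.

```python
import sqlite3

def open_local_db():
    db = sqlite3.connect(":memory:")  # a file path in a real client
    db.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, id TEXT, name TEXT)")
    return db

def save_customer(db, cid, name):
    """All client operations hit the local database; the change is also queued in the outbox."""
    with db:  # one local transaction covers both writes
        db.execute("INSERT OR REPLACE INTO customer VALUES (?, ?)", (cid, name))
        db.execute("INSERT INTO outbox (id, name) VALUES (?, ?)", (cid, name))

def sync_once(db, send_bulk):
    """Run on a timer: push all pending changes to the main database in one bulk call."""
    rows = db.execute("SELECT seq, id, name FROM outbox ORDER BY seq").fetchall()
    if not rows:
        return 0
    send_bulk([(cid, name) for _, cid, name in rows])  # hypothetical bulk service call
    with db:  # only clear what was actually sent
        db.execute("DELETE FROM outbox WHERE seq <= ?", (rows[-1][0],))
    return len(rows)

sent = []
local = open_local_db()
save_customer(local, "C1", "Acme")
save_customer(local, "C2", "Globex")
save_customer(local, "C1", "Acme Ltd")
n = sync_once(local, sent.append)
```

If `send_bulk` raises, the outbox rows stay in place and are retried on the next tick. The main database must therefore tolerate receiving the same change twice.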
SQL replication will not help much in this case. The fastest and cheapest solution is to increase the upstream bandwidth, because all the other options (except the first one) will take a lot of time.
If you choose to rewrite the service to support batching, I recommend having a look at Agatha Project.
Actually, hearing how many people they have on that one connection, it may be time to increase the bandwidth at the office (not at all my normal response). If you factor out the CRM system, what else is a top user of the bandwidth? It may be that they have simply reached the point of needing more bandwidth, period.
But I am still curious how much of the information you are passing is actually getting used. Make sure you are transferring data efficiently. Is there any chance you could add some quick, easy measurements to see how much data people actually consume when looking at it?
I have a WCF service which has two methods exposed:
Note: the WCF service and SQL Server are deployed on the same machine.
SQL Server has one table, called employee, which maintains employee information.
Read(): retrieves all employees from SQL Server.
Write(): writes (adds, updates, deletes) employee information in the employee table in SQL Server.
Now I have developed a desktop application through which any client can query, add, update and delete employee information by consuming the web service.
Question:
How can I handle the scenario where multiple clients want to update employee information at the same time? Does SQL Server itself handle this by using database locks?
Please suggest the best approach!
Generally, in a disconnected environment optimistic concurrency with a rowversion/timestamp is the preferred approach. WCF does support distributed transactions, but that is a great way to introduce lengthy blocking into the system. Most ORM tools will support rowversion/timestamp out-of-the-box.
Of course, at the server you might want to use transactions (either connection-based or TransactionScope) to make individual repository methods "ACID", but I would try to avoid transactions on the wire as far as possible.
Re comments: sorry about that, I honestly didn't see those comments; sometimes Stack Overflow doesn't make this easy if you get a lot of comments at once. There are two different concepts here. The waiting is a symptom of blocking, but if you have 100 clients updating the same record, it is entirely appropriate to block during each transaction. To keep things simple: unless I can demonstrate a bottleneck (requiring extra work), I would start with a serializable transaction around the update operations (TransactionScope uses this by default). That way, yes, you get appropriate blocking (ACID etc.) for most scenarios.
However; the second issue is concurrency: if you get 100 updates for the same record, how do you know which to trust? Most systems will let the first update in, and discard the rest as they are operating on stale assumptions about the data. This is where the timestamp/rowversion come in. By enforcing "the timestamp/rowversion must match" on the UPDATE statement, you ensure that people can only update data that hasn't changed since they took their snapshot. For this purpose, it is common to keep the rowversion alongside any interesting data you are updating.
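The "timestamp/rowversion must match" rule can be sketched in SQL driven from Python. SQLite has no rowversion type, so an integer `version` column that the UPDATE increments stands in for it. In SQL Server, a rowversion column is updated automatically on every change. The table and function names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)")
db.execute("INSERT INTO employee VALUES (1, 'Alice', 1)")
db.commit()

def read_employee(emp_id):
    """Return the data plus the version the client saw (its snapshot)."""
    return db.execute("SELECT id, name, version FROM employee WHERE id = ?", (emp_id,)).fetchone()

def update_employee(emp_id, new_name, expected_version):
    """Succeeds only if nobody changed the row since expected_version was read."""
    with db:
        cur = db.execute(
            "UPDATE employee SET name = ?, version = version + 1 WHERE id = ? AND version = ?",
            (new_name, emp_id, expected_version),
        )
    return cur.rowcount == 1  # 0 rows: a concurrent update won; the caller must re-read

# Two clients read the same row, then both try to save.
_, _, version_a = read_employee(1)
_, _, version_b = read_employee(1)
first = update_employee(1, "Alice Smith", version_a)
second = update_employee(1, "Alice Jones", version_b)
```

The first save wins and bumps the version. The second matches zero rows, so the service can report a conflict instead of silently overwriting. That is why the client keeps the version alongside the data it edits.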
Another alternative is to instantiate the WCF service as a singleton (InstanceContextMode.Single), which means only one instance of it ever runs. You could then keep a simple object in memory for update locking and lock on that object in your update method. When update calls come in from other sessions, they will have to wait until the lock is released.
Regards,
Steve