We have an application that reads and writes to a third party data storage.
The code of that data storage is closed source; we know nothing about its internals and cannot change it.
There is only a slim API that allows reading and writing to it.
A pessimistic offline lock helps to span transactions and lets concurrent applications work with it. That will work fine, I believe.
But now we have the problem that other software will also read from and write to that storage,
and our application must update when changes in that data storage happen. The data storage itself does not provide any notification. The third party software will not change some global state that indicates that something has changed.
Is there any kind of pattern or best practise to "observe" that data storage and
publish events to update all clients (of our software)?
I really do not want to periodically read, compare and publish events if it is not
absolutely the last resort. Perhaps someone has a better idea here?
A non-System implemented Pessimistic Offline Lock requires cooperation/participation/enforcement among all possible modifiers of the data. This is generally not possible and is one of the two reasons that this approach is rarely taken in modern software. To do anything remotely like this (i.e., with multiple heterogeneous writers) in a useful way requires some kind of help/assistance from the System facilities themselves. (The second reason is the issue of determining and resolving abandoned locks, which is very problematic.)
As for possible solutions, from a purely design viewpoint you can either use optimistic offline locks, which still need some System help, but much less, or avoid the issue altogether through more detailed state-progression/control in your data model.
My approach, however, would be to set aside the design question (initially), recognizing that this is primarily an issue of the data-store's capabilities, and start there, looking to use System-provided lock/transaction control (which both 1: usually works and 2: is how it is usually done).
AFAIK, issues of synchronizing multi-writer access always have to start with "What tools/controls/facilities are available to constrain, divert and/or track the out-of-application writers?" What you can accomplish is practically limited by those facilities.
For instance, if you can force all access through a service of your own, then you can do almost anything. But if all you have is the OS's file-locking and file-modification-dates, then you are a lot more constrained. And if you don't have even that, then there's not much you can do.
In fact I do not have direct access to the data store, it is hosted on
some server and I have no control over the other applications that
read and write to it. Right now, the best I can think of is having a
service as a proxy which periodically queries the store, compares it
to an older state and fires update events to my clients if some other
application has altered it (and fire some other event if my
application alters it to notify my own clients, leaving the other
applications with their own problems). It does not sound very good to me,
but it probably does the job.
Yep, that's about all you can do, and that only supports Optimistic Concurrency (partially), not Pessimistic. You might get improvements by adding some kind of checksum/hash to your stored data, but that's only an optimization.
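For what it's worth, the poll/compare/publish proxy with the hash optimization could look roughly like the sketch below. Everything named here is hypothetical: `read_all` stands in for whatever the store's slim API offers for reading records, and `publish` for however you push events to your own clients. It keeps only one hash per record instead of a full copy of the old state.

```python
import hashlib
import json
from typing import Callable, Dict


def fingerprint(record: dict) -> str:
    """Stable hash of one record; sort_keys makes key order irrelevant."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class StorePoller:
    """Compares the store against the last seen snapshot and publishes the differences."""

    def __init__(
        self,
        read_all: Callable[[], Dict[str, dict]],   # hypothetical wrapper around the store API
        publish: Callable[[str, str], None],       # hypothetical event sink: (kind, key)
    ) -> None:
        self._read_all = read_all
        self._publish = publish
        self._known: Dict[str, str] = {}           # key -> hash from the previous poll

    def poll_once(self) -> None:
        current = {key: fingerprint(rec) for key, rec in self._read_all().items()}
        for key, digest in current.items():
            if key not in self._known:
                # Note: the very first poll reports every record as created.
                self._publish("created", key)
            elif self._known[key] != digest:
                self._publish("updated", key)
        for key in self._known.keys() - current.keys():
            self._publish("deleted", key)
        self._known = current
```

You would call `poll_once()` from a timer in the proxy service. A change that is made and reverted between two polls goes unnoticed, which is part of why this only partially supports optimistic concurrency.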
Related
I have this scenario, and I don't really know where to start. Suppose there's a Web service-like app (it might be an API, though) hosted on a server. That app receives a request to process some data (through some method we will call processData(data theData)).
On the other side, there's a robot (it might be installed on the same server) that processes the data. So, the web service inserts the request into a common database (both programs have access to it), and it's supposed to wait for that row to change and send the results back.
The robot periodically checks the database for new rows, processes the data and sets some sort of flag on that row, indicating that the data was processed.
So the main problem here is: what should the method processData(..) do to check for changes to the data row?
I know one way to do it: I can build an iteration block that checks the row every x secs. But I don't want to do that. What I want to do is build some sort of event listener that triggers when the row changes. I know it might involve some asynchronous programming.
I might be dreaming, but is that even possible in a web environment?
I've been reading about a SqlDependency class, Async and AWait classes, etc..
Depending on how much control you have over design of this distributed system, it might be better for its architecture if you take a step back and try to think outside the domain of solutions you have narrowed the problem down to so far. You have identified the "main problem" to be finding a way for the distributed services to communicate with each other through the common database. Maybe that is a thought you should challenge.
There are many potential ways for these components to communicate, and if your design goal is to reduce latency and thus avoid polling, it might in fact be right for the service that needs to know about completion of this work item to be informed of it right away. However, if in the future the throughput of this system has to increase, processing work items in bulk and polling for the information instead might become the only feasible option. This is also why I have chosen to word my answer a bit more generically and discuss the design of this distributed system more abstractly.
If after this consideration your answer remains the same and you do want immediate notification, consider having the component that processes a work item notify the component(s) that need to be notified. As a general design principle for distributed systems, it is best to have the component that is most authoritative for a given set of data also be the component that answers requests about that data. In this case, the data you have is the completion status of your work items, so the best component to act on this would be the component completing the work items. It might be better for that component to inform calling clients and components of that completion. Here it's also important to know if you only write this data to the database for the sake of communication between components or if those rows have any value beyond the completion of a given work item, such as for reporting purposes or performance indicators (KPIs).
I think there can be valid reasons, though, why you would not want to have such a call, such as reducing coupling between components or lack of access to communicate with the other component in a direct manner. There are many communication primitives that allow such notification, such as MSMQ under Windows, or Queues in Windows Azure. There are also reasons against it, such as dependency on a third component for communication within your system, which could reduce the availability of your system and lead to outages. The questions you might want to ask yourself here are: "How much work can my component do when everything around it goes down?" and "What are my design priorities for this system in terms of reliability and availability?"
So I think the main problem you might want to really try to solve first is a bit more abstract: what should the interface through which components of this distributed system communicate look like?
If after all of this you remain set on having the interface of communication between those components be the SQL database, you could explore using INSERT and UPDATE triggers in SQL. You can easily look up the syntax of those commands and specify Stored Procedures that then get executed. In those stored procedures you would want to check the completion flag of any new rows and possibly restrict the number of rows you check by date or keep an ID for the last processed work item. To then notify the other component, you could go as far as using the built-in stored procedure xp_cmdshell to execute command lines under Windows. The command you execute could be a simple tool that pings your service for completion of the task.
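To make the trigger idea a bit more concrete, here is a small sketch. It uses SQLite through Python's sqlite3 module purely because that runs anywhere; SQL Server's trigger syntax differs (it works with the inserted/deleted pseudo-tables rather than NEW/OLD), and the table, column and trigger names are made up for illustration. The trigger records each completion in a separate table, and the reader remembers the last ID it has seen so it only looks at new entries.

```python
import sqlite3

# Hypothetical schema: work_items is written by the web service and the robot,
# completions is filled only by the trigger.
SCHEMA = """
CREATE TABLE work_items (
    id        INTEGER PRIMARY KEY,
    data      TEXT,
    processed INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE completions (
    id           INTEGER PRIMARY KEY,
    work_item_id INTEGER NOT NULL
);
CREATE TRIGGER work_item_done
AFTER UPDATE OF processed ON work_items
WHEN NEW.processed = 1 AND OLD.processed = 0
BEGIN
    INSERT INTO completions (work_item_id) VALUES (NEW.id);
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and trigger above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def newly_completed(conn: sqlite3.Connection, after_id: int = 0) -> list:
    """Return (completion_id, work_item_id) pairs recorded after the given completion ID."""
    rows = conn.execute(
        "SELECT id, work_item_id FROM completions WHERE id > ? ORDER BY id",
        (after_id,),
    )
    return rows.fetchall()
```

This only covers the "record the change inside the database" half; actively calling out to another process from the trigger is the part that would need something like xp_cmdshell on SQL Server.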
I'm sorry to have initially overlooked your suggestion to use SQL Query Notifications. That is also a feasible way and works through the Service Broker component. You would define a SqlCommand, as if normally querying your database, pass this to an instance of SqlDependency and then subscribe to the event called OnChange. Once you execute the SqlCommand, you should get calls to the event handler you added to OnChange.
I am not sure, however, how to get the exact changes to the database out of the SqlNotificationEventArgs object that will be passed to your event handler, so your query might need to be specific enough for the application to tell that the work item has completed whenever the query changes, or you might have to do another round-trip to the database from your application every time you are notified to be able to tell what exactly has changed.
Are you referring to a Message Queue? The .Net framework already provides this facility. I would say let the web service manage an application-level queue. The robot will ask the same web service for things to do. Assuming that the data needed for the jobs is small, you can keep the whole thing in memory. I would rather not involve a database, if you don't already have one.
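A rough sketch of that application-level queue, written in Python rather than .Net just to keep it short; `JobBroker`, `Job` and `work_one` are names I made up. The service side submits a job and blocks on it until it is done, while the robot side pulls the next job and completes it, so nobody has to poll a database row.

```python
import queue
import threading


class Job:
    """One unit of work plus a way for the submitter to wait for its result."""

    def __init__(self, data):
        self.data = data
        self.result = None
        self._done = threading.Event()

    def complete(self, result) -> None:
        self.result = result
        self._done.set()          # wakes up anyone blocked in wait()

    def wait(self, timeout=None) -> bool:
        """Block until the job is completed; returns False on timeout."""
        return self._done.wait(timeout)


class JobBroker:
    """In-memory queue owned by the web service; jobs are lost if the process dies."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Job]" = queue.Queue()

    def submit(self, data) -> Job:
        job = Job(data)
        self._pending.put(job)
        return job

    def work_one(self, handler, timeout=None) -> bool:
        """Called by the robot: take one job, process it, signal completion."""
        try:
            job = self._pending.get(timeout=timeout)
        except queue.Empty:
            return False
        job.complete(handler(job.data))
        return True
```

In processData you would call `submit(...)` and then `wait(...)` with a sensible timeout; error handling for a failing handler is left out here.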
I have many objects like Pg, Layout, SitePart etc.,
Multiple users can edit these objects and save them back to the database.
However, only one user at a time can save changes to the DB; if two users are updating the same objects, the others must wait until the first user completes his job.
I have functionality for building objects and saving them to the DB.
However, how do I implement this locking, and how will the other user know when the object is released?
Please share some thoughts on how to proceed.
Thank you in advance..
The type of behaviour you are describing is called pessimistic concurrency. You haven't said if you need this lock to get locked within a single web request or across multiple requests. Rather than reinventing the wheel, you should use standard concurrency techniques, and read up on how you implement those for .net.
Typically web applications use optimistic concurrency; if you need pessimistic concurrency it gets very hard very quickly. ASP.NET does not offer out of the box support for pessimistic concurrency.
You haven't said how you access your database (e.g. if you are using ADO.NET, or EF), but EF also has concurrency control. Ultimately it comes down to using transaction objects such as SqlTransaction to coordinate the updates across tables, being able to check to see if another user beat you to the update, and if they did then deciding what to do.
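As an illustration of "check to see if another user beat you to the update", here is a minimal version-column sketch. It uses Python with SQLite rather than ADO.NET or EF, and the `pages` table and `save_page` function are hypothetical; the idea carries over to any database that reports the number of affected rows.

```python
import sqlite3


def save_page(conn: sqlite3.Connection, page_id: int, new_title: str,
              expected_version: int) -> bool:
    """Write the change only if nobody else has saved since we loaded the row.

    Returns True when the update was applied, False when another user
    got there first (the stored version no longer matches).
    """
    cur = conn.execute(
        "UPDATE pages SET title = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_title, page_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

When `save_page` returns False the caller decides what to do: reload and show the conflict, merge, or overwrite.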
With pessimistic concurrency you have a whole lot more to worry about - where to put your global lock (e.g. in the code) what happens if that goes wrong (e.g. recycling of application pools in IIS can mean that two users don't lock the same object if your lock is in a code-based singleton) and how to deal with timeouts if you record locks in your database. If you want to see another SO question related to pessimistic concurrency, see: How do I implement "pessimistic locking" in an asp.net application?
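If you do go down the route of recording locks in the database, the timeout handling might look something like this sketch (Python and SQLite for brevity; the `locks` table, the function names and the five-minute lease are all assumptions on my part). A lock is a row with an expiry time, so a lock abandoned by a closed browser eventually frees itself.

```python
import sqlite3
import time

LOCK_TTL_SECONDS = 300  # assumed lease length; tune to how long an edit may take


def init_locks(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locks ("
        "object_id TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )


def try_acquire(conn: sqlite3.Connection, object_id: str, owner: str, now=None) -> bool:
    """Take the lock if it is free or its lease has expired; never blocks."""
    now = time.time() if now is None else now
    with conn:  # one transaction: commit on success, roll back on an uncaught error
        # Clear out an abandoned lock whose lease has run out.
        conn.execute(
            "DELETE FROM locks WHERE object_id = ? AND expires_at <= ?",
            (object_id, now),
        )
        try:
            conn.execute(
                "INSERT INTO locks (object_id, owner, expires_at) VALUES (?, ?, ?)",
                (object_id, owner, now + LOCK_TTL_SECONDS),
            )
        except sqlite3.IntegrityError:
            return False  # primary key clash: somebody else still holds the lock
    return True


def release(conn: sqlite3.Connection, object_id: str, owner: str) -> bool:
    """Release the lock, but only if this owner actually holds it."""
    with conn:
        cur = conn.execute(
            "DELETE FROM locks WHERE object_id = ? AND owner = ?",
            (object_id, owner),
        )
    return cur.rowcount == 1
```

Because the lock lives in the database and not in a code-based singleton, it survives an application pool recycle. The waiting user still has to retry `try_acquire` or be told the object is busy.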
Edit. I should also have mentioned that if you are already building logic for building objects and saving them to the db then you should be aware of the Repository and Unit of Work patterns. If not, then you should read about those as well. You are solving a standard problem that has standard patterns to implement those solutions in most languages.
I am new to web application development and I have a task to do. It will probably be some kind of service (probably WCF, at least that is my idea) responsible for locking and unlocking records in the db. I'm searching for some kind of best practices and/or tools which will do that. By tools I mean open-source solutions or something like that. The question is what to do when, for example, a user closes the browser, or when one user is editing a record and another one edits the same record. I hope it is understandable what I want to accomplish. From what I know, the problem with locks is that they are stateless, so this is some kind of an issue, but I don't know what kind :) Thank you in advance for your help and time :)
ps. I've tried to search for this on Stack Overflow, but all I get is the lock keyword in C#, and on Google there are solutions but not quite what I am looking for. Maybe I'm typing in the wrong keywords... I don't know.
I'm searching for some kind of best practices
Don't do this. Do not write applications that explicitly lock and unlock data in the database. There are absolutely 0 (zero) valid scenarios for this.
I recommend you read about optimistic concurrency control.
Also read Entity Framework Optimistic Concurrency Patterns and Anti-Pattern #3: Mishandled Concurrency.
On the whole, locking records in a database is a really dangerous thing to do - especially through a service that isn't related to the actual data manipulation process. If other programs encounter that locked record and want to write to it, they tend to have to deal with exotic synchronisation issues - do they wait? Do they discard the changes they wanted to write?
In most database engines, the process that's been locked just waits - before you know it, you can have dozens or hundreds of suspended database tasks, all waiting for the lock to be released.
As Remus Rusanu writes, you should read up on optimistic concurrency control - this is the best practice for transactional web applications. It's supported by the MS Entity Framework (assuming your app is built using .Net); code example here.
I'm trying to design a system which reports activity events to a database via a web service. The web service and database have already been built (COTS software) - all I have to do is provide the event source.
The catch, though, is that the event source needs to be fault tolerant. We have multiple replicated databases that I can talk to, so if the web service or database I'm talking to goes down, the software can quickly switch to another one that's up.
What I need help with though is the case when all the databases are down. I've already designed a queue that will hold on to the events as they pile in (and burst them out once the connection is restored), but the queue is an in-memory structure: if my app crashes in this state, or if power is lost, etc., then all the events in the queue are lost. This is unacceptable. What I need is a way to persist the events so that when a database comes back online I can send a burst of queued-up events, even in the event of power loss or crash.
I know that I don't want to re-implement the queue itself to use the file system as a backing store. This would work (and I've tried it) - but that method slows the system down dramatically as the hard drive becomes a bottleneck. Aside from this though, I can't think of a single way to design this system such that all the events are safely stored on the hard drive only when access to the database isn't available.
Does anyone have any ideas? =)
When I need messaging with fault tolerance (and/or guaranteed delivery, which based on your description I am guessing you also need), I usually turn to MSMQ. It provides both fault tolerance (messages are stored on disk in case of machine restart) and guaranteed delivery (messages will automatically and continually resend until they are received), as well as transactional sends and receives, message journaling, poison message handling, and other features.
I have been able to achieve a throughput of several thousand messages per second using MSMQ. Frankly, I am not sure that you will get too much better than that while still being fault tolerant.
MSMQ. I think you could also take a look at the notion of a Job object.
I would agree with the others that it is better to use an out-of-the-box system like MSMQ with a set of messaging patterns in hand.
Anyway, if you have to do it yourself, you can use an in-memory database instead of serializing the data yourself; I believe it should be fast enough.
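For the do-it-yourself route, here is a sketch of a store-and-forward outbox. Note the assumption: it uses an embedded database written to a file (SQLite, my choice, not something named in this thread) so that queued events survive a crash or power loss; `DurableOutbox` and the `send` callback are hypothetical names. The `send` function is expected to return True once the web service has accepted the event.

```python
import sqlite3
from typing import Callable


class DurableOutbox:
    """Events are committed to disk on enqueue and deleted only after a successful send."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._conn.commit()

    def enqueue(self, payload: str) -> None:
        with self._conn:  # commits before returning
            self._conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

    def flush(self, send: Callable[[str], bool]) -> int:
        """Send queued events oldest first; stop at the first failure. Returns the count sent."""
        rows = self._conn.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            if not send(payload):
                break  # still offline; keep this event and everything after it
            with self._conn:
                self._conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent

    def close(self) -> None:
        self._conn.close()
```

This gives at-least-once delivery: if the process dies between a successful send and the delete, the event is sent again after restart, so the receiving side should tolerate duplicates. You would only call `enqueue` while no database is reachable, which keeps the disk out of the normal path.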
I have a CRUD winform App that uses Merge Replication to allow "disconnected" functionality. My question is: if I am doing all initializing and synchronizing programmatically with RMO (like HERE), does it matter if it is a Push or Pull?
What would be a difference?
I understand the differences between the two (see HERE), but it seems that if I am only interacting through RMO the differences become a little fuzzy. It seems that, even though Pull is favored for Merge Replication, I would want to use Push, if I can, to make the Server bear the brunt and to make management easier.
Also, due to our environment, I do not need "real-time" updates. Syncing, in either case, will be fired from a UI event.
Does anyone have any experience with this?
Thanks!
We use merge replication via RMO on 20+ client systems that are occasionally connected. As far as I know, you should go with pull subscriptions. I don't know if you could make it work with push subscriptions but I don't advise trying. As you say, the client system will be requesting the sync, which fits the definition of a pull subscription.
The "Use When" section in your second link is pretty clear in its recommendation for pull in this case:
Data will typically be synchronized on demand or on a schedule rather than continuously.
The publication has a large number of Subscribers, and/or it would be too resource-intensive to run all the agents at the Distributor.
Subscribers are autonomous, disconnected, and/or mobile.
Subscribers will determine when they will connect and synchronize changes.
Most often used with merge replication.