We are in the process of redesigning a disjointed realtime OPC system which has proven to be cumbersome. Our technology stack is C#, .NET 4 and SQL Server 2008 R2, hosted on 32-bit Windows Server 2003. Physical architecture currently dictates that all tiers are to be hosted on a single server, although with sufficient motivation (read: ROI) this could be increased to 2.
The basic existing architecture is:
External OPC devices call our web service to populate SQL with realtime data, approximately 300 events per second. We have no control over volume or batch processing here, although in the rewrite I would like to implement batching in the web service to spare SQL from 300 insert statements per second.
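The batching idea above could look roughly like the sketch below. It is written in Python for brevity rather than C#, and `EventBatcher`, `flush_fn`, `max_size` and `max_age_s` are hypothetical names, not part of the existing system. Here `flush_fn` stands in for whatever writes a whole batch to SQL in one round trip.

```python
import threading
import time


class EventBatcher:
    """Buffers incoming events and hands them to flush_fn in batches."""

    def __init__(self, flush_fn, max_size=300, max_age_s=1.0, clock=time.monotonic):
        self._flush_fn = flush_fn      # hypothetical: performs one bulk write
        self._max_size = max_size      # flush once this many events are buffered
        self._max_age_s = max_age_s    # ...or once the oldest event is this old
        self._clock = clock
        self._lock = threading.Lock()
        self._buffer = []
        self._oldest = None

    def add(self, event):
        with self._lock:
            if not self._buffer:
                self._oldest = self._clock()
            self._buffer.append(event)
            full = len(self._buffer) >= self._max_size
            stale = self._clock() - self._oldest >= self._max_age_s
            batch = self._take() if (full or stale) else None
        if batch:
            # Write outside the lock so callers are not blocked by the database.
            self._flush_fn(batch)

    def flush(self):
        """Force out whatever is buffered, e.g. on shutdown."""
        with self._lock:
            batch = self._take()
        if batch:
            self._flush_fn(batch)

    def _take(self):
        batch, self._buffer, self._oldest = self._buffer, [], None
        return batch
```

Note that this sketch only checks the age limit when a new event arrives; a real service would probably also call `flush()` from a timer so that a quiet period does not leave events sitting in memory.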
SQL is used as a central resource for various components (about 9 in total, all to be redesigned) that perform tasks ranging from alarms to reporting. This is currently the biggest problem with the existing design, in that there is no single BLL or even DAL through which all these components consume/manipulate data or govern behavior.
Components range from Windows Services to Web Services to Windows Applications. The biggest consumer of CPU time and SQL connections is a Windows Forms application that monitors all realtime data and raises alarms as required. It also does realtime trend graphs, which is quite expensive to run.
For the rewrite there is a strong push towards WPF with which, apart from the learning curve, I have no problem. My question is more concerned with the underlying architecture:
I am currently doing some research on how to implement a single DAL and BLL. For the DAL I am leaning towards EF or nHibernate, with Linq-to-SQL also a possibility.
For the BLL I only have experience in CSLA.NET, and I fear this may be a bit over the top for a system in which speed and resource consumption is crucial.
Does anybody have any experience with a similar system, and is anyone willing to share a few lessons or design guidelines?
I have some exposure to acquiring data from OPC servers, though I believe the applications I implemented were not as large-scale as yours. My application had a messaging layer based on a publish-subscribe architecture. Based on that experience, my suggestions would be:
1) For your real-time data acquisition you would need something based on a publish-subscribe mechanism. BizTalk Server is the Microsoft answer to an ESB, so I would look at this.
2) Does your Windows Forms application need to look at the database directly? Could it instead talk to an intermediate layer that queries the database for historical data, or subscribes to the real-time feed if all it cares about is real-time information?
I'm not sure I like the idea of having a SQL Server as the central point in the system. This server is going to get hammered: every time data changes on a device, it will write to the database. Every client will then continually refresh at a constant rate to detect whether there are any changes. This is going to be a lot of work for the SQL Server.
The OPC protocol involves clients subscribing to a server, so they can then be notified on change of any data. Using SQL in the middle prevents this.
Would you not be far better off using/creating an OPC server to retrieve all data from the device, then allowing each client to connect to this? That way they are only receiving data when it changes, rather than having to constantly check for updates.
If you need to log for historical reasons, you can always make an extra client which then logs the data to an SQL database.
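To illustrate the subscribe-on-change idea described above, here is a minimal in-process sketch in Python. It is not an OPC client and uses no OPC library; `TagHub`, `subscribe` and `publish` are hypothetical names chosen for the example. The point is only that subscribers (an alarm checker, a historical logger) are called when a value actually changes, instead of polling a database.

```python
from collections import defaultdict

_MISSING = object()  # sentinel meaning "no value seen yet for this tag"


class TagHub:
    """Notifies subscribers only when a tag's value changes."""

    def __init__(self):
        self._values = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, tag, callback):
        self._subscribers[tag].append(callback)

    def publish(self, tag, value):
        # Unchanged values are dropped, so clients never see redundant updates.
        if self._values.get(tag, _MISSING) == value:
            return False
        self._values[tag] = value
        for callback in list(self._subscribers[tag]):
            callback(tag, value)
        return True


# Usage: the historical logger is just one more subscriber.
hub = TagHub()
history = []
alarms = []
hub.subscribe("pump1.temp", lambda tag, value: history.append((tag, value)))
hub.subscribe("pump1.temp", lambda tag, value: alarms.append(value) if value > 90 else None)
hub.publish("pump1.temp", 70)
hub.publish("pump1.temp", 70)   # ignored: no change
hub.publish("pump1.temp", 95)   # logged, and raises an alarm
```

The tag name and the threshold of 90 are made up for the example.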
I need a function on my website so that it can update sports data, for example the result of a sports game, in real time. I have seen some websites do that, but I don't know how to monitor that data. Any suggestions or help?
There are 4 possible approaches for displaying real-time data on website:
1. Refresh page at periodic intervals
Obsolete method. Not recommended for modern apps.
2. AJAX calls from browser to pull data at periodic intervals
This is the most popular method used currently by many websites.
Can be done with least development effort.
3. Websocket
Modern method. Used extensively in financial services domain.
Good for bi-directional communication between client and server.
Adds an unnecessary overhead for simple updates by server (Example: match score)
4. SSE (Server-Sent-Events)
Most modern of all methods. Quickly gaining adoption.
Least overhead.
Most preferred for near real-time update from server to client.
More information on SSE:
https://developers.facebook.com/docs/graph-api/server-sent-events/
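For a sense of how little is involved in SSE, the sketch below builds the text frames a server writes to a response served with the `text/event-stream` content type. The field names (`id`, `event`, `data`) and the blank-line terminator come from the SSE wire format; the helper name `format_sse` and the "score" event are assumptions for illustration.

```python
def format_sse(data, event=None, event_id=None):
    """Build one Server-Sent Events frame as text."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


frame = format_sse("Home 2 - 1 Away", event="score", event_id=7)
```

In the browser, an `EventSource` pointed at the streaming URL would receive this as a "score" event whose data is the score string.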
I have SQL Server database with information for files - I'm talking about custom properties. These are categories and description for each file.
The Windows Forms application is for the user. But I will also make a Windows Service that will track any changes to the files. If a change happens (renamed, moved, deleted), the service has to update that same database accordingly. And I think it should do it right away, without any delay.
Now this is going to be my first time making a Windows Service, plus the first time I will have to handle concurrency (theoretically I know about threads and so on).
So:
First of all, is it OK if one process is updating a database that another process may be using at the same time? Do you need to handle that situation in the first place? (Probably; for example, in our daily "user lives" we can't modify a file when it's being used by another process.)
Is it a good idea for these two to share one data source?
If it is, then how do I handle the concurrency? I can use WCF for the messages between the two, but does the solution have anything to do with WCF? I'm going to be using that for the first time as well :D.
Any help is appreciated. Thanks in advance for your time!
Since MS SQL is transactional, this is no big deal. You just have to watch out for data which might be read and then updated by one process; there it can be necessary to use a TransactionScope (that's a .NET class ;)).
From the software architecture point of view you should consider using a three-tier rather than a two-tier application:
Two Tier:
Essentially your system, with the persistence layer (DB) communicating with the clients directly
Three Tier:
Persistence layer <--> Logic layer (e.g. a WCF service handling the app logic) <--> Clients (Service and Forms, triggering app logic and showing results)
When it comes to concurrency it's going to be really straightforward. The MSSQL database engine handles just about all of it (e.g. locking and sharing). Further, if you leverage the SqlCommandBuilder to build your statements, the statements will automatically use optimistic concurrency.
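Optimistic concurrency means an update only succeeds if the row is still in the state the writer last read. The sketch below shows one common way to do that, a version column, using Python's built-in sqlite3 module so it is self-contained. The `files` table, its columns, and `update_description` are assumptions for illustration; this is not necessarily the exact statement SqlCommandBuilder generates.

```python
import sqlite3


def update_description(conn, file_id, new_description, expected_version):
    """Update a row only if nobody else changed it since we read it."""
    cur = conn.execute(
        "UPDATE files SET description = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_description, file_id, expected_version),
    )
    conn.commit()
    # Zero rows affected means another writer got there first.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, description TEXT, version INTEGER)"
)
conn.execute("INSERT INTO files VALUES (1, 'draft', 1)")

# Both the Forms app and the service read the row at version 1.
service_ok = update_description(conn, 1, "moved to /archive", expected_version=1)
forms_ok = update_description(conn, 1, "edited by user", expected_version=1)
```

Here the second writer is told its update did not apply, and can re-read the row and decide what to do rather than silently overwriting the first change.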
As for the Windows service and how it gets notified, use a FileSystemWatcher; it's going to be more efficient and you won't have to publish a service port on the local box.
I'd normally give you some good code examples but I'm answering this from my phone.
I have multi-layered application architecture that has 4 parts:
A networking server/client layer
An intermediate data layer to handle interactions between processes
A monitoring layer
A client layer made up of n number of instances
Client/Server layer:
The client/server layer handles asynchronous network communications with another computer implemented using a custom Layer 2 protocol. Due to design constraints built into the communications, it needs to remain independent and able to poll/push data to the data layer asynchronously.
Intermediate Layer:
The intermediate layer is currently implemented using a database. One table holds all of the possible labels that can be called on (about 120,000). A second table holds an intermediate cache of the first table containing only the values in use, this requires constant updates and gets flushed when a new collection of items is requested. The third table is where collection updates are sent and only contains data when a request is pending.
The Monitor Layer:
The monitor layer is a multi-threaded monolithic application. It spawns n number of client instances based on how many monitors are attached. It manages global state between all client instances because one or more of them may share similar/identical state. It creates a unique listing of values needed, manages sending update requests when the clients need a different set of labels, and manages recurring updates.
Obviously, this isn't ideal. If one instance goes down it can take the rest down with it. What I'd like to do is remove the intermediate layer, replace it with the monitor layer, and make everything spawn as subprocesses of the monitor process so they can be respawned at will if something goes awry (ex. comms heartbeat stops, client crashes, etc).
The database just seems too heavy and not specialized enough to handle the IPC (Inter Process Communications). The program was written under extreme time constraints so utilizing a database was the 'easy solution' with the expectation that it would change in the future. I'm a big fan of the robustness of Google Chrome's multi-process architecture but I know little about how they tie all the processes together (pipes, tcp, ?).
So:
Could I expect a significant performance improvement from using IPC over a database for the intermediate layer?
What form of IPC would be ideal on a Windows system?
Is there a cross platform (read Linux) alternative solution available that could be used in its place if development were moved to Mono?
Where can I find resources/examples to help get a start?
Note: I understand that the architecture of this system seems unnecessarily complex but it exists as a front-end for a much larger system. This application is also mission critical so stability trumps efficiency.
Update:
I forgot to mention in the initial question. The database data/index is loaded directly from a ramdisk on boot. The database itself has been indexed for optimal performance. Tables or values that require frequent writes are not indexed but the rest of the data is.
I'm looking for an alternative to measure against because optimization of the db has been taken to its limit and I think there's still a lot of room for improvement.
I will upload some diagrams of the architecture as soon as I get some time to draw them up.
Yes. The database most likely involves the hard drive, and the hard drive is the slowest part of any computer, so switching away from using the hard drive will probably have performance benefits.
I would go with ZeroMQ (zmq). It's a message-oriented framework that supports several communication patterns, for instance PUB/SUB or REQ/REP. More examples here
zmq is cross-platform and it's amazingly fast.
Some C# examples on github
I would consider looking into an Actor Model based solution, such as Akka.NET.
Following on from this question...
What to do when you’ve really screwed up the design of a distributed system?
... the client has reluctantly asked me to quote for option 3 (the expensive one), so they can compare prices to a company in India.
So, they want me to quote (hmm). In order for me to get this as accurate as possible, I will need to decide how I'm actually going to do it. Here's 3 scenarios...
Scenarios
Split the database
My original idea (perhaps the most tricky) will yield the best speed on both the website and the desktop application. However, it may require some synchronising between the two databases, as the two "systems" are so heavily connected. If not done properly and not tested thoroughly, I've learnt that synchronisation can be hell on earth.
Implement caching on the smallest system
To side-step the sync option (which I'm not fond of), I figured it may be more productive (and cheaper) to move the entire central database and web service to their office (i.e. in-house), and have the website (still on the hosted server) download data from the central office and store it in a small database (acting as a cache)...
Set up a new server in the customer's office (in-house).
Move the central database and web service to the new in-house server.
Keep the web site on the hosted server, but alter the web service URL so that it points to the office server.
Implement a simple cache system for images and most frequently accessed data (such as product information).
... the down-side is that when the end-user in the office updates something, their customers will effectively be downloading the data from a 60KB/s upload connection (albeit once, as it will be cached).
Also, not all data can be cached, for example when a customer updates their order. Also, connection redundancy becomes a huge factor here; what if the office connection is offline? Nothing to do but show an error message to the customers, which is nasty, but a necessary evil.
Mystery option number 3
Suggestions welcome!
SQL replication
I had considered MSSQL replication. But I have no experience with it, so I'm worried about how conflicts are handled, etc. Is this an option? Considering there are physical files involved, and so on. Also, I believe we'd need to upgrade from SQL express to SQL non-free, and buy two licenses.
Technical
Components
ASP.Net website
ASP.net web service
.Net desktop application
MSSQL 2008 express database
Connections
Office connection: 8 mbit down and 1 mbit up contended line (50:1)
Hosted virtual server: Windows 2008 with 10 megabit line
Having just read for the first time your original question related to this I'd say that you may have laid the foundation for resolving the problem simply because you are communicating with the database by a web service.
This web service may well be the saving grace as it allows you to split the communications without affecting the client.
A good while back I was involved in designing just such a system.
The first thing that we identified was the data which rarely changes, and we immediately locked all of this out of consideration for distribution. A manual process for administering it using the web server was the only way to change this data.
The second thing we identified was the data that should be owned locally. By this I mean data that only one person or location at a time would need to update, but that may need to be viewed at other locations. We fixed all of the keys on the related tables to ensure that duplication could never occur and that no auto-incrementing fields were used.
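The answer does not say how the keys were fixed, so the following is only a sketch of two common ways to get keys that cannot collide across locations without a shared auto-increment column: a composite key that includes the owning site, or a random UUID. All names here (`RecordKey`, `SiteKeyGenerator`, `new_global_id`, the site codes) are hypothetical.

```python
import itertools
import uuid
from typing import NamedTuple


class RecordKey(NamedTuple):
    """Composite key: the owning site plus a counter local to that site."""
    site_id: str
    local_seq: int


class SiteKeyGenerator:
    def __init__(self, site_id, start=1):
        self._site_id = site_id
        self._counter = itertools.count(start)

    def next_key(self):
        return RecordKey(self._site_id, next(self._counter))


def new_global_id():
    """Alternative: a random UUID, usable without any coordination."""
    return str(uuid.uuid4())


office = SiteKeyGenerator("OFFICE")
hosted = SiteKeyGenerator("HOSTED")
office_key = office.next_key()
hosted_key = hosted.next_key()
```

Both sites hand out sequence number 1 here, yet the keys differ because the site is part of the key, so rows created independently can later be merged without renumbering.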
The third item was the tables that were truly shared, and although we worried a lot about these during stages 1 & 2, in our case this part was straightforward.
When I'm talking about a server here I mean a DB Server with a set of web services that communicate between themselves.
As designed, our architecture had 1 designated 'master' server. This was the definitive source for resolving conflicts.
The rest of the servers were in the first instance a large cache of anything covered by item 1. In fact it wasn't a large cache but a database duplication, but you get the idea.
The second function of each non-master server was to coordinate changes with the master. This involved a very simplistic process of actually passing through most of the work transparently to the master server.
We spent a lot of time designing and optimising all of the above - to finally discover that the single best performance improvement came from simply compressing the web service requests to reduce bandwidth (but it was over a single channel ISDN, which probably made the most difference).
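As a rough illustration of why compressing requests helps on a slow link, the sketch below gzips a JSON payload before sending and restores it on the other side. The payload shape and the `pack`/`unpack` names are made up; the actual gain depends entirely on how repetitive the real messages are.

```python
import gzip
import json


def pack(payload):
    """Serialize to compact JSON and gzip it for the wire."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack(blob):
    """Reverse of pack(): gunzip and parse."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical product list with lots of repeated field names and values.
products = [{"product_id": i, "name": "Widget", "in_stock": True} for i in range(500)]
uncompressed_size = len(json.dumps(products).encode("utf-8"))
compressed_size = len(pack(products))
```

Comparing `compressed_size` with `uncompressed_size` for real traffic is a cheap way to find out whether this is worth doing before touching the architecture.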
The fact is that if you do have a web service then this will give you greater flexibility about how you implement this.
I'd probably start by investigating the feasibility of implementing one of the SQL Server replication methods.
Usual disclaimers apply:
Splitting the database will not help a lot, but it will add a lot of nightmares. IMO, you should first try to optimize the database: update some indexes or maybe add several more, optimize some queries and so on. For database performance tuning I recommend reading some articles from simple-talk.com.
Also, in order to save bandwidth, you can add bulk processing to your Windows client and add zipping (archiving) to your web service.
And you should probably upgrade to MS SQL 2008 Express; it's also free.
It's hard to recommend a good solution for your problem using the information I have. It's not clear where the bottleneck is. I strongly recommend that you profile your application to find the exact place of the bottleneck (e.g. is it in the database, or in a fully used-up channel, and so on) and add a description of it to the question.
EDIT 01/03:
When the bottleneck is the upload connection, you can only do the following:
1. Add archiving (compression) of messages to the service and client
2. Implement bulk operations and use them
3. Try to reduce the operation count per use case for the most frequent cases
4. Add a local database for Windows clients, perform all operations against it, and synchronize the local db with the main one on a timer.
And SQL replication will not help you a lot in this case. The fastest and cheapest solution is to increase the upload bandwidth, because all other ways (except the first one) will take a lot of time.
If you choose to rewrite the service to support bulk operations, I recommend having a look at Agatha Project
Actually, hearing how many they have on that one connection, it may be time to up the bandwidth at the office (not at all my normal response). If you factor out the CRM system, what else is a top user of the bandwidth? It may be that they have reached the point of needing more bandwidth, period.
But I am still curious to see how much of the information you are passing is actually getting used. Make sure you are transferring efficiently. Is there any chance you could add some quick, easy measurements to see how much people are actually consuming when looking at the data?
Say I need to design an in-memory service because of a very high load read/write system. I want to dump the results of the objects every 2 minutes. How would I access the in-memory objects/data from within a web application?
(I was thinking a Windows service would be running in the background handling the in-memory service etc.)
I want the fastest possible solution, and I would guess most people would say use a web service? What other options would I have? I just don't understand how I could hook into the Windows service's objects etc.
(Please don't ask why I would want to do this, maybe you're right and it's a bad idea but I am also curious if this type of architecture is possible.)
Update
I was looking at this site, swoopo.com, which I would think gets a lot of hits near the end of auctions. Since the auction keeps resetting, the hits to the database would be just crazy, so I was thinking they might do it in memory and then dump to the db every x minutes...
What you're describing is called a cache, with a facade front-end.
You write a facade to which you commit your changes and acquire your datasets. The facade queues up reads and writes and commits when the queue is full or after a certain amount of time has passed. Your web application has a single point of access to the data (the facade), and the facade is structured in such a way to avoid writing and reading from storage too often.
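A minimal sketch of such a facade, in Python, might look like this. It assumes a dict-like `store` standing in for the real storage, and the names `WriteBehindFacade`, `max_pending` and `max_age_s` are invented for the example; the 120-second default mirrors the "every 2 minutes" in the question.

```python
import time


class WriteBehindFacade:
    """Single access point: serves reads from memory, defers writes to storage."""

    def __init__(self, store, max_pending=100, max_age_s=120.0, clock=time.monotonic):
        self._store = store            # dict-like stand-in for the database
        self._cache = {}
        self._dirty = set()            # keys changed since the last flush
        self._max_pending = max_pending
        self._max_age_s = max_age_s
        self._clock = clock
        self._last_flush = clock()

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self._store[key]
        return self._cache[key]

    def write(self, key, value):
        self._cache[key] = value
        self._dirty.add(key)
        too_many = len(self._dirty) >= self._max_pending
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Commit all pending changes to storage in one pass."""
        for key in self._dirty:
            self._store[key] = self._cache[key]
        self._dirty.clear()
        self._last_flush = self._clock()
```

This sketch is not thread-safe and only checks the age limit on writes; a real service would add locking and a timer, and would have to accept that anything not yet flushed is lost if the process dies.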
Most relational database management systems do this for you. They do this kind of optimization and queuing internally so writing another layer on top of it would only slow things down. So don't write a cache if you're using an RDBMS.
Regarding the specifics of accessing such a facade, you can treat it as just an object, and implement it however you want (its own thread, a thread pool, a Web service, a Windows service, whatever).
Any remoting technology would work such as sockets, pipes and the like.
Check out: www.remobjects.com
You could use a Windows message queue (MSMQ), a service bus, or even .NET Remoting.
See http://www.nservicebus.com/, or http://code.google.com/p/masstransit/.
You could hook into the Windows Services objects by using Remoting or WCF, both offer very fast interprocess communication. Sockets are fast too but are more cumbersome to program compared to WCF. There is a ton of WCF documentation and support online.
Databases provide a level of caching for you. The advantage of an in-memory golden copy such as the one you propose is that it never has to read from disk when a request comes in, and if you host it on the same machine as your IIS (provided you have enough RAM for both) there is no extra network hop, making it much faster than querying a db. However, the downside to this approach is that it does not scale as well if you need to add machines to load balance.
Third party messaging providers such as TIBCO are also worth looking at.