Read real-time data from other websites - C#

I need a function on my website that can update sports data in real time, for example the result of a sports game. I have seen some websites do that, but I don't know how to monitor that data. Any suggestions or help?

There are 4 possible approaches for displaying real-time data on a website:
1. Refresh the page at periodic intervals
Obsolete method. Not recommended for modern apps.
2. AJAX calls from the browser to pull data at periodic intervals
This is the most popular method currently used by many websites.
Can be done with the least development effort.
3. WebSockets
Modern method. Used extensively in the financial services domain.
Good for bi-directional communication between client and server.
Adds unnecessary overhead for simple updates from the server (example: a match score).
4. SSE (Server-Sent Events)
The most modern of all the methods. Quickly gaining adoption.
Least overhead.
Most preferred for near real-time updates from server to client.
More information on SSE:
https://developers.facebook.com/docs/graph-api/server-sent-events/
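To make option 4 concrete, here is a minimal Server-Sent Events endpoint sketch, assuming ASP.NET Core; the ScoreController and IScoreFeed names are hypothetical stand-ins for whatever actually produces the score:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

// Hypothetical feed abstraction; replace with your real score source.
public interface IScoreFeed
{
    Task<string> GetLatestScoreAsync(CancellationToken ct); // e.g. "HOME 2 - 1 AWAY"
}

[Route("scores")]
public class ScoreController : Controller
{
    private readonly IScoreFeed _feed;

    public ScoreController(IScoreFeed feed) => _feed = feed;

    [HttpGet("stream")]
    public async Task Stream(CancellationToken ct)
    {
        Response.ContentType = "text/event-stream";
        Response.Headers["Cache-Control"] = "no-cache";

        while (!ct.IsCancellationRequested)
        {
            var score = await _feed.GetLatestScoreAsync(ct);
            // Each SSE message is "data: <payload>" followed by a blank line.
            await Response.WriteAsync($"data: {score}\n\n", ct);
            await Response.Body.FlushAsync(ct);
            await Task.Delay(TimeSpan.FromSeconds(5), ct);
        }
    }
}

On the browser side this pairs with the standard EventSource API, which reconnects automatically if the connection drops.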

Related

Pass Info to Client without Page Refresh in Asp.net MVC Application

Today we see some sites that pass data or notifications to the client without a page refresh; these are called real-time or interactive applications.
Some well-known examples are:
Stackoverflow: notifications
Freelancer: pushes project and professional counts asynchronously as numbers
Google Mail: shows the total mail storage used by users
and so on.
I have tried and researched some tools like SignalR. Basically, SignalR is designed for creating chat applications. But is there a direct way, without any extension, in Microsoft technologies to meet our purpose? For example, suppose we want a simple counter like Freelancer's; do we have no option except using extensions like SignalR?
You can look at a technique called polling (which SignalR falls back to when support for other methods is not present). Basically, the concept is that every x seconds you send a request to the server to check for an update, for example (using jQuery):
setInterval(function () {
    $.get("/Messages/GetCount", function (data) {
        // do something with the data ...
    });
}, 30000);
Every 30 seconds, check the message count and perform an action accordingly. Here is a good article on polling and long polling (it mentions a SignalR alternative called Socket.IO).
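For completeness, a sketch of the server-side action such a poll could call, assuming classic ASP.NET MVC; the MessagesController name and the count lookup are hypothetical:

using System.Web.Mvc;

public class MessagesController : Controller
{
    [HttpGet]
    public JsonResult GetCount()
    {
        // Hypothetical lookup; replace with a real query for the unread count.
        int unreadCount = GetUnreadCountFromStore();
        return Json(unreadCount, JsonRequestBehavior.AllowGet);
    }

    private int GetUnreadCountFromStore()
    {
        // Placeholder standing in for a database or cache read.
        return 0;
    }
}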
Having said all that, I'd seriously just go with SignalR; its authors have tested all kinds of corner cases, performance, etc.
Use a JavaScript timer on the client side to make periodic asynchronous requests for updated information. The updated information can then be used to refresh the client-side view, or to prompt further requests for more details.
This solution can work in situations where you do not need to receive updates immediately whenever they become available on the server side (and can instead wait for the timer interval). It may also present some scaling issues and can waste bandwidth and client/server time on unnecessary calls.
To overcome either of these, it would be best to use a library like SignalR (which can do much more than just chat applications - check out this blog post for a real world implementation that has nothing to do with chat).
Use Microsoft's ASP.NET AJAX implementation or jQuery:
Microsoft Ajax Overview

Rapid number updates on a website

I wonder how to update rapidly changing numbers on a website.
I have a machine that generates a lot of output, and I need to show it online. However, my problem is that the update frequency is high, and I am not sure how to handle it.
It would be nice to show the last N numbers, say ten. The numbers are updated at 30 Hz. That might be too much for the human eye, but the human eye is only used for monitoring here.
I wonder how to do this. A page reload would keep the browser continuously loading the page, and the web page would need to show more than just these numbers.
I might write a raw web server that serves the numbers on a specific IP address and port, but even then I wonder whether page reloading would be too slow, giving users a strange experience.
How should I deal with such an extreme update rate of data on a website? Usually websites are not like that.
In the tags for this question I named the languages that I understand. In the end I will probably write it in C#.
a) WebSockets in conjunction with AJAX to update only parts of the site would work. Disadvantage: the client's infrastructure (proxies) must support them, which is currently not the case 99% of the time.
b) With existing infrastructure the approach is long polling. You make an XmlHttpRequest using JavaScript. If no data is available, the request is blocked on the server side for, say, 5 to 10 seconds. If data is available, you answer the request immediately. The client then immediately sends a new request. I managed to get >500 updates per second using a Java client connecting via a proxy, over HTTP, to a web server (displaying real-time stock data).
You need to bundle several updates with each request in order to get enough throughput.
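A rough server-side long-polling sketch along those lines, assuming ASP.NET Core; the IUpdateQueue abstraction is hypothetical and stands in for whatever buffers the machine output:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical buffer of pending updates.
public interface IUpdateQueue
{
    // Returns pending updates, or an empty list if nothing arrived before the timeout.
    Task<IReadOnlyList<double>> WaitForUpdatesAsync(TimeSpan timeout, CancellationToken ct);
}

[Route("updates")]
public class UpdatesController : Controller
{
    private readonly IUpdateQueue _queue;

    public UpdatesController(IUpdateQueue queue) => _queue = queue;

    [HttpGet]
    public async Task<IActionResult> Poll(CancellationToken ct)
    {
        // Hold the request open for up to 10 seconds waiting for new data,
        // then answer; the client immediately issues the next request.
        var updates = await _queue.WaitForUpdatesAsync(TimeSpan.FromSeconds(10), ct);
        return Json(updates); // several updates are bundled into one response
    }
}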
You don't have to use a page reload. You can use WebSockets to establish open two-way communication between the browser (via JavaScript) and your server.
Python Tornado has support for this built-in. Additionally, there are a couple of other Python servers that support it. Socket.IO is a great JavaScript library, with fallback, to facilitate the client side.
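That answer uses Python Tornado and Socket.IO; since the question will probably be answered in C#, here is a heavily simplified, hedged sketch of the same idea using ASP.NET Core's built-in WebSocket support (assuming a .NET 6+ minimal-API project; the /numbers path and the random number source are placeholders for the real machine output):

using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

var app = WebApplication.CreateBuilder(args).Build();

app.UseWebSockets();

app.Map("/numbers", async (HttpContext context) =>
{
    if (!context.WebSockets.IsWebSocketRequest)
    {
        context.Response.StatusCode = 400;
        return;
    }

    using var socket = await context.WebSockets.AcceptWebSocketAsync();
    var random = new Random(); // placeholder for the real machine output

    while (socket.State == WebSocketState.Open)
    {
        var payload = Encoding.UTF8.GetBytes(random.NextDouble().ToString("F4"));
        await socket.SendAsync(new ArraySegment<byte>(payload),
            WebSocketMessageType.Text, true, CancellationToken.None);
        await Task.Delay(33); // roughly 30 updates per second
    }
});

app.Run();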
On the back end you can use Redis or a NewSQL database like VoltDB for fast in-memory database updates. Caching helps a lot with high-latency components (especially in a write-heavy application).
On the front end you can look into WebSockets and the Comet web application model: http://en.wikipedia.org/wiki/Comet_%28programming%29
Many gaming companies have to deal with fast counter updates and displays - it might be worth looking into. Zynga uses a protocol called AMF: http://highscalability.com/blog/2010/3/10/how-farmville-scales-the-follow-up.html

Design considerations for realtime OPC system

We are in the process of redesigning a disjointed realtime OPC system which has proven to be cumbersome. Our technology stack is C#, .NET 4 and SQL Server 2008 R2, hosted on 32-bit Windows Server 2003. Physical architecture currently dictates that all tiers are hosted on a single server, although with sufficient motivation (read: ROI) this could be increased to two.
The basic existing architecture is:
External OPC devices call our web service to populate SQL with real-time data, approximately 300 events per second. We have no control over volume or batch processing here, although in the rewrite I would like to implement batching in the web service to spare SQL from 300 insert statements per second (a rough batching sketch follows this list).
SQL is used as a central resource for various components (about 9 in total, all to be redesigned) that perform tasks ranging from alarms to reporting. This is currently the biggest problem with the existing design, in that there is no single BLL or even DAL through which all these components consume/manipulate data or govern behavior.
Components range from Windows Services to Web Services to Windows Applications. The biggest consumer of CPU time and SQL connections is a Windows Forms application that monitors all realtime data and raises alarms as required. It also does realtime trend graphs, which is quite expensive to run.
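A rough sketch of the batching mentioned above: buffer incoming readings in the web service and flush them to SQL Server once per second with SqlBulkCopy, so ~300 events per second become one bulk insert of ~300 rows. The Reading type, table name and connection handling are hypothetical placeholders:

using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading;

public class Reading
{
    public string Tag { get; set; }
    public double Value { get; set; }
    public DateTime Timestamp { get; set; }
}

public class ReadingBatcher : IDisposable
{
    private readonly ConcurrentQueue<Reading> _buffer = new ConcurrentQueue<Reading>();
    private readonly Timer _flushTimer;
    private readonly string _connectionString;

    public ReadingBatcher(string connectionString)
    {
        _connectionString = connectionString;
        // Flush once per second.
        _flushTimer = new Timer(_ => Flush(), null, 1000, 1000);
    }

    // Called by the web service for each incoming OPC event.
    public void Add(Reading reading) => _buffer.Enqueue(reading);

    private void Flush()
    {
        var table = new DataTable();
        table.Columns.Add("Tag", typeof(string));
        table.Columns.Add("Value", typeof(double));
        table.Columns.Add("Timestamp", typeof(DateTime));

        while (_buffer.TryDequeue(out var r))
            table.Rows.Add(r.Tag, r.Value, r.Timestamp);

        if (table.Rows.Count == 0) return;

        using (var connection = new SqlConnection(_connectionString))
        using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.Readings" })
        {
            connection.Open();
            bulk.WriteToServer(table);
        }
    }

    public void Dispose() => _flushTimer.Dispose();
}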
For the rewrite there is a strong push towards WPF with which, apart from the learning curve, I have no problem. My question is more concerned with the underlying architecture:
I am currently doing some research on how to implement a single DAL and BLL. For the DAL I am leaning towards EF or nHibernate, with Linq-to-SQL also a possibility.
For the BLL I only have experience in CSLA.NET, and I fear this may be a bit over the top for a system in which speed and resource consumption is crucial.
Does anybody have any experience with a similar system, and are you willing to share a few lessons or design guidelines?
I have some exposure to acquiring data from OPC servers, though the applications I implemented were, I believe, not as large-scale as yours. For my application I had a messaging layer based on a publish-subscribe architecture. Based on that experience, my suggestions would be:
1) For your real-time data acquisition you would need something based on a publish-subscribe mechanism. BizTalk Server is Microsoft's answer to an ESB, so I would look at this.
2) Does your Windows Forms application need to look at the database directly? Could it instead talk to an intermediary that reads the database for historical purposes, or subscribe to the real-time feed if all it cares about is real-time information?
I'm not sure I like the idea of having a SQL Server as a central point in the system. This server is going to get hammered: every time data changes on a device, it will write to the database, and every client will then continually refresh at a constant rate to detect whether there are any changes. That is going to be a lot of work for the SQL Server.
The OPC protocol involves clients subscribing to a server, so they can then be notified on change of any data. Using SQL in the middle prevents this.
Would you not be far better off using/creating an OPC server to retrieve all data from the device, then allowing each client to connect to this? That way they only receive data when it changes, rather than having to constantly check for updates.
If you need to log for historical reasons, you can always make an extra client which then logs the data to an SQL database.

What is the most cost-effective way to break up a centralised database?

Following on from this question...
What to do when you’ve really screwed up the design of a distributed system?
... the client has reluctantly asked me to quote for option 3 (the expensive one), so they can compare prices to a company in India.
So, they want me to quote (hmm). In order to make this as accurate as possible, I need to decide how I'm actually going to do it. Here are 3 scenarios...
Scenarios
Split the database
My original idea (perhaps the trickiest) would yield the best speed for both the website and the desktop application. However, it may require some synchronisation between the two databases, as the two "systems" are so heavily connected. If not done properly and not tested thoroughly, I've learnt that synchronisation can be hell on earth.
Implement caching on the smallest system
To side-step the sync option (which I'm not fond of), I figured it may be more productive (and cheaper) to move the entire central database and web service to their office (i.e. in-house), and have the website (still on the hosted server) download data from the central office and store it in a small database (acting as a cache)...
Set up a new server in the customer's office (in-house).
Move the central database and web service to the new in-house server.
Keep the web site on the hosted server, but alter the web service URL so that it points to the office server.
Implement a simple cache system for images and the most frequently accessed data, such as product information (a minimal sketch of such a cache follows below).
... the down-side is that when the end-user in the office updates something, their customers will effectively be downloading the data over a 60 KB/s upload connection (albeit only once, as it will then be cached).
Also, not all data can be cached, for example when a customer updates their order. Connection redundancy also becomes a huge factor here: what if the office connection is offline? Nothing to do but show an error message to the customers, which is nasty, but a necessary evil.
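As a reference point, a minimal sketch of the kind of cache step 4 describes, assuming System.Runtime.Caching is available on the hosted server; the ProductCache name and the loader delegate are hypothetical:

using System;
using System.Runtime.Caching;

public static class ProductCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static T GetOrLoad<T>(string key, Func<T> loadFromCentralOffice, TimeSpan ttl)
    {
        var cached = Cache.Get(key);
        if (cached != null)
            return (T)cached;

        // Miss: pull once over the slow office link, then serve from the local cache.
        var value = loadFromCentralOffice();
        Cache.Set(key, value, DateTimeOffset.Now.Add(ttl));
        return value;
    }
}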
Mystery option number 3
Suggestions welcome!
SQL replication
I had considered MSSQL replication, but I have no experience with it, so I'm worried about how conflicts are handled, etc. Is this an option, considering there are physical files involved, and so on? Also, I believe we'd need to upgrade from SQL Server Express to a paid SQL Server edition and buy two licences.
Technical
Components
ASP.Net website
ASP.net web service
.Net desktop application
MSSQL 2008 express database
Connections
Office connection: 8 Mbit down and 1 Mbit up, contended line (50:1)
Hosted virtual server: Windows 2008 with a 10 Mbit line
Having just read your original question related to this for the first time, I'd say that you may have laid the foundation for resolving the problem simply because you are communicating with the database through a web service.
This web service may well be the saving grace as it allows you to split the communications without affecting the client.
A good while back I was involved in designing just such a system.
The first thing we identified was the data that rarely changes, and we immediately locked all of this out of consideration for distribution. A manual process of administering it via the web server was the only way to change this data.
The second thing we identified was the data that should be owned locally. By this I mean data that only one person or location at a time would need to update, but that may need to be viewed at other locations. We fixed all of the keys on the related tables to ensure that duplication could never occur and that no auto-incrementing fields were used.
The third item was the tables that were truly shared, and although we worried a lot about these during stages 1 and 2, in our case this part was straightforward.
When I'm talking about a server here I mean a DB Server with a set of web services that communicate between themselves.
As designed, our architecture had one designated 'master' server. This was the definitive source for resolving conflicts.
The rest of the servers were, in the first instance, a large cache of anything covered by item 1. In fact it wasn't so much a cache as a database duplication, but you get the idea.
The second function of each non-master server was to coordinate changes with the master. This involved a very simplistic process of passing most of the work through transparently to the master server.
We spent a lot of time designing and optimising all of the above, only to finally discover that the single best performance improvement came from simply compressing the web service requests to reduce bandwidth (but it was over a single-channel ISDN line, which probably made the most difference).
The fact is that if you do have a web service then this will give you greater flexibility about how you implement this.
I'd probably start by investigating the feasibility of implementing one of the SQL Server replication methods.
Usual disclaimers apply:
Splitting the database will not help a lot, but it will add a lot of headaches. IMO, you should first try to optimize the database: update some indexes or maybe add several more, optimize some queries, and so on. For database performance tuning I recommend reading some articles from simple-talk.com.
Also, in order to save bandwidth, you can add bulk processing to your Windows client and add zipping (archiving) to your web service.
And you should probably upgrade to MS SQL 2008 Express; it's also free.
It's hard to recommend a good solution for your problem with the information I have. It's not clear where the bottleneck is. I strongly recommend you profile your application to find the exact location of the bottleneck (e.g. is it in the database, or in a saturated channel, and so on) and add a description of it to the question.
EDIT 01/03:
If the bottleneck is the upload connection, then you can only do the following:
1. Add archiving (compression) of messages on the service and the client
2. Implement bulk operations and use them
3. Try to reduce the number of operations per use case for the most frequent cases
4. Add a local database for the Windows clients, perform all operations against it, and synchronize the local DB with the main one on a timer.
SQL replication will not help you much in this case. The fastest and cheapest solution is to increase the upload bandwidth, because all the other options (except the first one) will take a lot of time.
If you choose to rewrite the service to support bulking, I recommend you have a look at the Agatha Project.
Actually, hearing how many users they have on that one connection, it may be time to increase the bandwidth at the office (not at all my normal response). If you factor out the CRM system, what else is a top user of the bandwidth? It may be that they have simply reached the point of needing more bandwidth.
But I am still curious to see how much of the information you are passing is actually getting used. Make sure you are transferring it efficiently; if there is any chance, add some quick and easy measurements to see how much data people actually consume when looking at it.

What is the best way to scale out work to multiple machines?

We're developing a .NET app that must make up to tens of thousands of small web service calls to a 3rd-party web service. We would prefer a more 'chunky' call, but the 3rd party does not support it. We've designed the client to use a configurable number of worker threads, and through testing have code that is fairly well optimized for one multicore machine. However, we still want to improve the speed, and are looking at spreading the work across multiple machines. We're well versed in typical client/server/database apps, but new to designing for multiple machines. So, a few questions related to that:
Is there any other client-side optimization, besides multithreading, that we should look at that could improve the speed of an HTTP request/response? (I should note this is a non-standard web service, so it is implemented using WebClient, not a WCF or SOAP client.)
Our current thinking is to use WCF to publish chunks of work to MSMQ, and run clients on one or more machines to pull work off of the queue. We have experience with WCF + MSMQ, but want to be sure we're not missing better options. Are there other, better ways to do this today?
I've seen some 3rd party tools like DigiPede and Microsoft's HPC offerings, but these seem like overkill. Any experience with those products or reasons we should consider them over roll-our-own?
Sounds like your goal is to execute all these web service calls as quickly as you can, and get the results tabulated. Given that, your greatest efficiency control is going to be through scaling the number of concurrent requests you can make.
Be sure to look at your client-side connection limits. By default, I think the system limit is 2 connections per host. I haven't tried this myself, but by upping the number of connections with this property, you should theoretically see a multiplier effect in terms of generating more requests from a single machine. There's more info on the MS forums.
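The answer's link is not preserved here, but assuming the property in question is System.Net.ServicePointManager.DefaultConnectionLimit (the classic .NET setting for concurrent outbound connections per host), a minimal sketch of raising it before starting the worker threads would be:

using System.Net;

public static class HttpTuning
{
    public static void RaiseConnectionLimit(int maxConnectionsPerHost)
    {
        // The default is famously low (2 per host for desktop apps), which throttles
        // parallel calls to the same 3rd-party service.
        ServicePointManager.DefaultConnectionLimit = maxConnectionsPerHost;
    }
}

The same limit can also be set declaratively via the connectionManagement element under system.net in app.config.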
The MSMQ option works well. I'm running that configuration myself. ActiveMQ is also a fine solution, but MSMQ is already on the server.
You have a good starting point. Get that in operation, then move on to performance and throughput.
At CodeMash this year, Wesley Faler did an interesting presentation on this sort of problem. His solution was to store "jobs" in a DB, then use clients to pull down work and mark status when complete.
He then pushed the whole infrastructure up to Amazon's EC2.
Here are his slides from the presentation - they should give you the basic idea:
I've done something similar with multiple PCs locally - the basics of managing the workload were similar to Faler's approach.
If you have optimized the code, you could look into optimizing the network side to minimize the number of packets sent:
reuse HTTP sessions (i.e. multiple transactions over one session by keeping the connection open, which reduces TCP overhead)
reduce the number of HTTP headers in the request to the minimum to save bandwidth
if supported by the server, use gzip to compress the body of the request (you need to balance the CPU cost of the compression against the bandwidth you save)
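A hedged sketch of the first and third points above using HttpWebRequest (the question mentions WebClient; HttpWebRequest exposes these knobs more directly). The URL and payload are placeholders, and a gzip-compressed request body only works if the 3rd-party server actually accepts it:

using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

public static class RequestSketch
{
    public static string Post(string url, string body)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.KeepAlive = true;                       // reuse the TCP connection across calls
        request.Headers["Content-Encoding"] = "gzip";   // only if the server accepts gzipped bodies
        request.AutomaticDecompression = DecompressionMethods.GZip; // accept compressed responses

        using (var requestStream = request.GetRequestStream())
        using (var gzip = new GZipStream(requestStream, CompressionMode.Compress))
        {
            var bytes = Encoding.UTF8.GetBytes(body);
            gzip.Write(bytes, 0, bytes.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}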
You might want to consider Rhino Service Bus instead of MSMQ. The source is available here.
