What is the best way to scale out work to multiple machines? - c#

We're developing a .NET app that must make up to tens of thousands of small webservice calls to a 3rd party webservice. We would prefer a more 'chunky' call, but the 3rd party does not support it. We've designed the client to use a configurable number of worker threads, and through testing have code that is fairly well optimized for one multicore machine. However, we still want to improve the speed, and are looking at spreading the work across multiple machines. We're well versed in typical client/server/database apps, but new to designing for multiple machines. So, a few questions related to that:
Is there any other client-side optimization, besides multithreading, that we should look at that could improve the speed of an HTTP request/response? (I should note this is a non-standard webservice, so it is implemented using WebClient, not a WCF or SOAP client)
Our current thinking is to use WCF to publish chunks of work to MSMQ, and run clients on one or more machines to pull work off of the queue. We have experience with WCF + MSMQ, but want to be sure we're not missing better options. Are there other, better ways to do this today?
I've seen some 3rd party tools like DigiPede and Microsoft's HPC offerings, but these seem like overkill. Any experience with those products or reasons we should consider them over roll-our-own?

Sounds like your goal is to execute all these web service calls as quickly as you can, and get the results tabulated. Given that, your greatest efficiency control is going to be through scaling the number of concurrent requests you can make.
Be sure to look at your client-side connection limits. By default, I believe the limit is 2 connections per host. I haven't tried this myself, but by upping the number of connections with this property you should theoretically see a multiplier effect: more connections from a single machine means more requests in flight. There's more info on the MS forums.
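The property being referred to is most likely ServicePointManager.DefaultConnectionLimit (or the equivalent connectionManagement element in app.config). A minimal, hedged sketch - the value of 50 is purely illustrative and should be tuned through testing:

```csharp
using System.Net;

static class ConnectionTuning
{
    static void Configure()
    {
        // Client apps default to 2 concurrent connections per host, which caps how many
        // requests can be in flight regardless of how many worker threads you spin up.
        ServicePointManager.DefaultConnectionLimit = 50;
    }
}
```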
The MSMQ option works well. I'm running that configuration myself. ActiveMQ is also a fine solution, but MSMQ is already on the server.
You have a good starting point. Get that in operation, then move on to performance and throughput.

At CodeMash this year, Wesley Faler did an interesting presentation on this sort of problem. His solution was to store "jobs" in a DB, then use clients to pull down work and mark status when complete.
He then pushed the whole infrastructure up to Amazon's EC2.
Here are his slides from the presentation - they should give you the basic idea:
I've done something similar with multiple PCs locally - the basics of managing the workload were similar to Faler's approach.

If you have optimized the code, you could look into optimizing the network side to minimize the number of packets sent:
reuse HTTP sessions (i.e.: multiple transactions into one session by keeping the connection open, reduces TCP overhead)
reduce the number of HTTP headers to the minimum in the request to save bandwidth
if supported by server, use gzip to compress the body of the request (need to balance CPU usage to do the compression, and the bandwidth you save)
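A minimal sketch of those three tweaks using HttpWebRequest, assuming the endpoint accepts gzip-compressed request bodies (the URL and payload handling are placeholders):

```csharp
using System.IO;
using System.IO.Compression;
using System.Net;

static class CompactHttp
{
    static string PostCompressed(string url, byte[] body)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.KeepAlive = true;                                    // reuse the TCP connection across calls
        request.Headers["Content-Encoding"] = "gzip";                // only valid if the server supports gzipped requests
        request.AutomaticDecompression = DecompressionMethods.GZip;  // accept compressed responses too

        using (var requestStream = request.GetRequestStream())
        using (var gzip = new GZipStream(requestStream, CompressionMode.Compress))
            gzip.Write(body, 0, body.Length);

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```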

You might want to consider Rhino Service Bus instead of MSMQ. The source is available here.

Related

Azure - C# Concurrency - Best Practices

We are scraping a Web-based API using Microsoft Azure. The issue is that there is SO much data to retrieve (there are combinations/permutations involved).
If we use a standard Web Job approach, we calculated it would take about 200 years to process all the data we want to get - and we would like our data to be refreshed every week.
Each request/response from the API takes about 0.5-1.0 seconds to process. Request size is on average 20000 bytes and the average response is 35000 bytes. I believe the total number of requests is in the millions.
Another way to think about this question would be: how would you use Azure to Web scrape - and make sure you don't overload (in terms of memory + network) the VM it's running on? (I don't think you need too much CPU processing in this case).
What we have tried so far:
Used Service Bus Queues/Worker Roles scaled to 8 small VMs - but this caused a lot of network errors to occur (there must be some network limit to how much EACH worker role VM can handle).
Used Service Bus Queues/Continuous Web Job scaled to 8 small VMs - but this seems to work slower - and even scaled, doesn't give us too much control on what's happening behind the scenes. (We don't REALLY know how many VMs are up).
It seems that these things are built for CPU calculation - not for Web/API scraping.
Just to clarify: I throw my requests into a queue, which then get picked up by my multiple VMs for processing to get the responses. That's how I was using the queues. Each VM was using the ServiceBusTrigger class as prescribed by Microsoft.
Is it better to have a lot of small VMs or a few massive VMs?
What C# classes should we be looking at?
What are the technical best practices when trying to do something like this on Azure?
Actually, a web scraper is something I have had up and running in Azure for quite some time now :-)
AFAIK there is no 'magic bullet'. Scraping a lot of sources with deadlines is quite hard.
How it works (the most important things):
I use worker roles and C# code for the code itself.
For scheduling, I use queue storage. I put crawling tasks on the queue with a timeout (e.g. 'when to crawl them') and have the scraper pull them off (see the sketch after these points). You can put triggers on the queue size to ensure you meet deadlines in terms of speed -- personally I don't need them.
SQL Azure is slow, so I don't use that. Instead, I only use table storage for storing the scraped items. Note that updating data might be quite complex.
Don't use too much threading; instead, use async IO for all network traffic.
Also you might have to consider that extra threads require extra memory (parse trees can become quite big) - so there's a trade-off there... I do recall using some threads, but it's really just a few.
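A minimal sketch of the queue-driven scheduling and async IO described above, assuming the classic WindowsAzure.Storage SDK and .NET 4.5; the queue name, URL handling and delays are illustrative, not my production code:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class ScraperLoop
{
    static async Task RunAsync(string connectionString)
    {
        var queue = CloudStorageAccount.Parse(connectionString)
                                       .CreateCloudQueueClient()
                                       .GetQueueReference("crawl-tasks");
        queue.CreateIfNotExists();

        // Schedule a crawl: the initial visibility delay keeps the message hidden until it is due.
        queue.AddMessage(new CloudQueueMessage("http://example.org/feed"),
                         null /* timeToLive */,
                         TimeSpan.FromMinutes(30) /* initialVisibilityDelay */);

        using (var http = new HttpClient())
        {
            while (true)
            {
                // Pull the next due task; the visibility timeout hides it from other workers meanwhile.
                var msg = queue.GetMessage(TimeSpan.FromMinutes(5));
                if (msg == null) { await Task.Delay(TimeSpan.FromSeconds(10)); continue; }

                // Async IO: no thread sits blocked while the response is in flight.
                string html = await http.GetStringAsync(msg.AsString);
                // ... parse and write the items to table storage here ...

                queue.DeleteMessage(msg);
            }
        }
    }
}
```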
Note that this probably requires you to re-design and re-implement your complete web scraper if you're now using a threaded approach... then again, there are some benefits:
Table storage and queue storage are cheap.
I currently use a single Extra Small VM to scrape well over a thousand web sources.
Inbound network traffic is free.
As such, the result is quite cheap as well; I'm sure it's much less than the alternatives.
As for classes that I use... well, that's a bit of a long list. I'm using HttpWebRequest for the async HTTP requests and the Azure SDK -- but all the rest is hand crafted (and not open source).
P.S.: This doesn't just hold for Azure; most of this also holds for on-premise scrapers.
I have some experience with scraping so I will share my thoughts.
It seems that these things are built for CPU calculation - not for Web/API scraping.
They are built for dynamic scaling which given your task is not something you really need.
How to make sure you don't overload the VM?
Measure the response times and error rates and tune your code to lower them.
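As a rough illustration of what to measure: per-request latency and error rate, tracked with lock-free counters so the measuring itself stays cheap (names and thresholds are up to you):

```csharp
using System.Diagnostics;
using System.Threading;

static class RequestStats
{
    static long _requests, _errors, _totalMilliseconds;

    public static T Measure<T>(System.Func<T> call)
    {
        var sw = Stopwatch.StartNew();
        try { return call(); }
        catch { Interlocked.Increment(ref _errors); throw; }
        finally
        {
            Interlocked.Increment(ref _requests);
            Interlocked.Add(ref _totalMilliseconds, sw.ElapsedMilliseconds);
        }
    }

    // Watch these while tuning: back off concurrency when either starts climbing.
    public static double AverageMs
    {
        get { var n = Interlocked.Read(ref _requests); return n == 0 ? 0 : (double)Interlocked.Read(ref _totalMilliseconds) / n; }
    }

    public static double ErrorRate
    {
        get { var n = Interlocked.Read(ref _requests); return n == 0 ? 0 : (double)Interlocked.Read(ref _errors) / n; }
    }
}
```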
I don't think you need too much CPU processing in this case.
Depends on how much data is coming in each second and what you are doing with it. More complex parsing on quickly incoming data (if you decide to do it on the same machine) will eat up CPU pretty quickly.
8 small VMs caused a lot of network errors to occur (there must be some network limit)
The smaller the VMs the less shared resources they get. There are throughput limits and then there is an issue with your neighbors sharing the actual hardware with you. Often, the smaller your instance size the more trouble you run into.
Is it better to have a lot of small VMs or a few massive VMs?
In my experience, smaller VMs are too crippled. However, your mileage may vary and it all depends on the particular task and its solution implementation. Really, you have to measure yourself in your environment.
What C# classes should we be looking at?
What are the technical best practices when trying to do something like this on Azure?
With high throughput scraping you should be looking at infrastructure. You will have different latency in different Azure datacenters, and different experience with network latency/sustained throughput at different VM sizes, and depending on who in particular is sharing the hardware with you. The best practice is to try and find what works best for you - change datacenters, VM sizes and otherwise experiment.
Azure may not be the best solution to this problem (unless you are on a spending spree). 8 small VMs cost about $450 a month. That is enough to pay for an unmanaged dedicated server with 256 GB of RAM, 40 hardware threads and 500 Mbps - 1 Gbps (or even up to several Gbps in bursts) of quality network bandwidth without latency issues.
For your budget, you will have a dedicated server that you cannot overload. You will have more than enough RAM to deal with async pinning (if you decide to go async), or enough hardware threads for multi-threaded synchronous IO, which gives the best throughput (if you choose to go synchronous with a fixed-size thread pool).
On a side note, depending on the API specifics, it might turn out that your main issue will be the API owner simply throttling you down to a crawl when you start to put too much pressure on the API endpoints.

WCF Alternative for big messages

I have a WCF service deployed in Azure. I have a client consuming this service which runs on Windows Phone 7. Everything worked fine, but when I tried to send the server some larger files or enumerables with lots of items, errors occurred. I found out that the max message size, max array length, etc. can be configured in the configuration file. So I added a few zeros to the default values and it worked. However, I am not happy with this solution because it is dirty.
my question is:
1. What exactly are the disadvantages of mindlessly increasing message size limits, and how does it affect the service?
2. What is the alternative for me instead of increasing the message size?
In particular, I need to send the server a GPS track which consists of some metadata and a huge amount of location points.
If I understand the concept correctly, by default WCF uses SOAP, which is XML based. So objects sent are encoded as XML (similar to XML serialization in .NET?). Can it somehow be switched to some binary mode to send BLOBs, or to upload large objects through streams? Or is my only option to bypass the WCF service completely and upload directly to server storage (like SQL Azure or the Azure Blob Service), which exposes an API to do so?
Thank you.
As Peretz mentioned in a comment, that's what is supposed to happen.
The defaults are just that--defaults. Not "recommended" settings, nor pseudo-max sizes. They're available to alter based on your needs (and should be).
You could use net.tcp binding (if you're not already) which handles data a little better (with regards to serializing), but what you're doing is well within the boundary of WCF and its abilities.
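For illustration, the same quotas the poster bumped in config can be set deliberately in code; the binding type and numbers below are placeholders sized to the data, not recommendations:

```csharp
using System.ServiceModel;

static class BindingFactory
{
    public static BasicHttpBinding CreateLargeMessageBinding()
    {
        var binding = new BasicHttpBinding
        {
            // Sized consciously for the largest expected GPS track, not "default plus zeros".
            MaxReceivedMessageSize = 4 * 1024 * 1024,
            MaxBufferSize = 4 * 1024 * 1024
        };
        binding.ReaderQuotas.MaxArrayLength = 512 * 1024; // cap any single array (e.g. the point list)
        return binding;
    }
}
```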
I had much the same problem with huge GPS tracks. I really suggest not using SOAP for this kind of task. You can use WebHttpBinding and implement streaming from your storage, or use something like ASP.NET Web API (it will ship with MVC4 and can be hosted outside of IIS) for low-level plumbing of streams to the client. This will allow you to implement multiple download streams and everything else you might need for this kind of task.
As for the overall design concept of such systems, try not to think that one tool can solve all your problems. Just use the right tool for the task. If you have business tasks, implement transactional WS-* based services for them. To transfer huge amounts of data, use something like REST services. This will also help with querying tracks.
For example: Tracks/{deviceid}/{trackDate}.{format}.
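A rough sketch of that REST-style, streamed track download using WebHttpBinding; the contract, paths and content types are illustrative, and the {format} variable is given its own URI segment here because, as far as I recall, WCF's UriTemplate prefers one variable per path segment:

```csharp
using System.IO;
using System.ServiceModel;
using System.ServiceModel.Web;

[ServiceContract]
public interface ITrackService
{
    [OperationContract]
    [WebGet(UriTemplate = "Tracks/{deviceId}/{trackDate}/{format}")]
    Stream GetTrack(string deviceId, string trackDate, string format);
}

public class TrackService : ITrackService
{
    public Stream GetTrack(string deviceId, string trackDate, string format)
    {
        // Stream the stored track straight back rather than building one huge SOAP message.
        WebOperationContext.Current.OutgoingResponse.ContentType =
            format == "json" ? "application/json" : "application/xml";
        return File.OpenRead(Path.Combine(@"D:\tracks", deviceId, trackDate + "." + format));
    }
}

// Host it with streaming enabled so large tracks are not buffered in memory:
// var binding = new WebHttpBinding { TransferMode = TransferMode.Streamed };
```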
Feel free to ask=)
You should not arbitrarily increase message sizes by chucking 0's on the end of these settings. Yes, they are defaults which can be changed, but increasing message sizes whenever a limit is hit is not something you should always resort to. One of the reasons for having such small sizes is security: it prevents clients from flooding servers with huge messages and taking them down. It also encourages clients to send small messages, which helps with scalability.
There are encodings that you can use: it depends on the bindings used. I thought that WCF encodes SOAP as binary anyway...but I may be wrong, I haven't touched WCF for 6 months now.
In previous projects, whenever we hit size limits we looked at cutting our data into smaller chunks. One of the best things we ever did was implement pageable grids in our GUIs which only got 10-20 or so records from the server at a time. Entity Framework was great in allowing us to write a single generic skip/take query to do this for ALL of our grids.
Just increasing sizes is an easy fix... until you can't increase any further. It's a brittle and broken approach.
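Something along these lines (a generic skip/take helper over IQueryable; the names and usage are illustrative):

```csharp
using System.Linq;

public static class PagingExtensions
{
    // Translates to a paged SQL query, so only one page of records crosses the wire.
    public static IQueryable<T> GetPage<T, TKey>(
        this IQueryable<T> source,
        System.Linq.Expressions.Expression<System.Func<T, TKey>> orderBy,
        int pageIndex,
        int pageSize)
    {
        return source.OrderBy(orderBy).Skip(pageIndex * pageSize).Take(pageSize);
    }
}

// Usage (hypothetical context/entity): context.TrackPoints.GetPage(p => p.Timestamp, 0, 20);
```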

Best practice to implement a low latency live financial data feed using WCF?

I have a .NET service which needs to feed live financial data to its clients. The output rate for this feed might get intense and I am looking for the best architecture to implement this type of service with low latency and high performance.
I was thinking of using some kind of stream data provider, like the ones used for audio or video, but sending feed updates instead.
Would appreciate any thoughts on this subject, or any real-world examples.
Update:
I don't have to use WCF, that was only my first approach since it is the current technology. Any other implementation in C# is welcome.
Full Disclosure: I work for Informatica (formerly 29West) and am on the engineering team responsible for their messaging products. I am biased. I do, however, have a pretty good grasp of low-latency messaging in the financial market.
If your message rates are about 60 messages/sec. (as stated in a comment on Will Dean's answer), and they're being delivered to a GUI with a human sitting in front of it and reacting to the market at human speed, it honestly doesn't matter a whole lot what software you use from a latency perspective. You might even be able to get away with using WCF (though I'd still recommend against it; we considered supporting it once and prototyped an adapter for it, and it bloated latencies up by an order of magnitude - we decided not to bother with it at the time).
Now, Informatica's messaging software can pass messages between processes on the same machine in well under a microsecond, and if you want to buy some nice 10 gig-E NICs with kernel bypass or InfiniBand gear, you can pass millions of messages per second between machines with single-digit microseconds of latency. We'll also soon be releasing a new data serialization library that's supported in C/C++, Java, and .NET as part of the messaging product that in some cases is actually faster than Protocol Buffers (although Protocol Buffers are widely used and also a very good choice). Our .NET and Java APIs both have a feature called "ZOD" for "Zero Object Delivery", which is a kinda funny way of saying they generate no new objects during message delivery, meaning no garbage collection pauses & associated latency spikes/outliers. We've got another product called UMDS that's specifically designed to fan out high-speed backbone traffic to slower desktop apps without slowing down the backbone or other clients.
I could go on and on about how great Informatica's messaging software is and I do think it's worth checking out, but this already looks like a straight-up ad, and I'm an engineer, not a sales person. So here's a few pieces of more general advice:
If you have a lot of clients receiving the same data, you'll want some flavor of UDP multicast. You'll often want a reliable multicast transport of some kind - the well-known (and free) reliable multicast protocol is PGM. Windows includes an implementation of PGM that's usable in C#; I'll refer you to Mike Rettig's excellent blog post on how to use it if you want to try it out. (I happen to know Mike - he's a smart guy.) Protocol choice is an area in which you get what you pay for; Informatica's messaging includes a reliable multicast protocol loosely based off of PGM (our architect who designed it co-wrote the PGM RFC a long while back), but with a lot of major improvements. Plain PGM might be fine for what you need, though.
You want to go with a brokerless/serverless architecture. Have the apps communicate peer-to-peer with nothing in the middle. Avoid extra hops in the message path (which usually means avoid most JMS implementations, avoid almost anything with "queue" in the name somewhere, etc.).
Be mindful of how your system behaves when one individual client misbehaves. Can one slow consumer slow down everyone else?
There are a lot of OS tuning and BIOS tuning options that can benefit any sort of low-latency messaging, homegrown or bought - things like interrupt coalescing, tying NIC interrupts to a particular CPU core, receive-side scaling (which has historically been terrible when used with UDP on Windows, but should be getting much better in the future), disabling certain CPU power states, etc.
Resist the temptation to use built-in object serialization in .NET to send whole objects over the wire - it is orders of magnitude slower than using a simple binary format (like Protocol Buffers, or Informatica's serialization library, or your own binary format, etc.).
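To make the multicast and binary-format points above concrete, here is a bare-bones sketch using plain UdpClient multicast and a fixed binary layout. Note this is unreliable UDP, not PGM or a commercial reliable transport, and the group address, port and tick layout are all illustrative:

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

class TickMulticast
{
    static readonly IPAddress Group = IPAddress.Parse("239.1.2.3");
    const int Port = 5000;

    static void Publish(int instrumentId, double bid, double ask)
    {
        // Fixed binary layout: 4-byte id, two 8-byte doubles, 8-byte timestamp ticks.
        byte[] payload;
        using (var ms = new MemoryStream(28))
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(instrumentId);
            writer.Write(bid);
            writer.Write(ask);
            writer.Write(DateTime.UtcNow.Ticks);
            payload = ms.ToArray();
        }

        using (var sender = new UdpClient())
            sender.Send(payload, payload.Length, new IPEndPoint(Group, Port));
    }

    static void Subscribe()
    {
        using (var client = new UdpClient(Port))
        {
            client.JoinMulticastGroup(Group);
            var remote = new IPEndPoint(IPAddress.Any, 0);
            byte[] datagram = client.Receive(ref remote);
            using (var reader = new BinaryReader(new MemoryStream(datagram)))
            {
                int instrumentId = reader.ReadInt32();
                double bid = reader.ReadDouble();
                double ask = reader.ReadDouble();
                long timestampTicks = reader.ReadInt64();
            }
        }
    }
}
```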
If you have more specific questions or need more detail on any of my advice, just let me know!
How low is 'low latency' and how busy is 'intense'? You need to have some idea of what you're aiming for to choose the right approach.
I could supply you with some hardware which would respond to 100% of all requests within, say, 20us up to the full capacity of your network hardware, but it would not use WCF much at all.
To a very broad approximation, I would say that things like WCF are very high-level and trade-off ease-of-use and abstraction-for-the-benefit-of-the-programmer against performance (latency/throughput). Whether they trade it off too much for your application needs real numbers.
The lowest-latency, lowest-overhead IP-based protocol in widespread use is UDP - that's why it's used for things like DNS and NTP. It's very scalable at the server, because the server doesn't need to keep any state, and it's very simple to implement on almost any platform. But you do need to be thinking in terms of network packets rather than .NET objects. Do you get to supply the client-end software too?
Live financial data? Never rely on WCF for that. Instead, go with what the industry already uses, e.g. NASDAQ uses Real-Time Innovations' Data Distribution Service to deliver live stock ticks to users. They provide C/C++/C# APIs for their communication libraries, which are extremely easy to set up and use (compared to WCF).
In general, this sort of real-time data feed uses a publish/subscribe paradigm, which helps make sure that the communication happens with minimal overhead. This approach is the main idea behind message-oriented middleware, and it is exactly what financial services use for real-time work.
On a side note, you can deliver real-time audio/video packets using the RTI-DDS library; as far as I know, unmanned aerial vehicles like the MQ-9 use this same library to deliver live video and geo-location information to ground control stations.
There are also free data distribution service libraries, but I have no experience with them. You just need to google for them.
Edit: I'm currently prototyping some HMI (human-machine interface) software which uses the aforementioned RTI-DDS library along with two other libraries that have such message-oriented architectures, and they have worked a treat so far for all my real-time communication needs. Here is a demo: http://epics.codeplex.com/ (It will be used for remotely controlling the equipment in our brand new nuclear research facility.)
The more assumptions you make and features you cut out the faster you can make your system. The more robust and flexible you attempt to make things, the more your performance will suffer. I would suggest a few basic must haves:
A binary data serialization format. Don't use XML or any other human-readable method of passing your data.
A data serialization format robust enough to support cross-architecture, cross-language endpoints. BER comes to mind - C# seems to have support for it.
A transport protocol with guaranteed delivery and data integrity. If any type of financial algorithm will be using this data, even missing one tick could mean the difference between an order being triggered or missing out on a price. Even if you are going to aggregate ticks in your server, you still want control over how the information is presented to your clients. TCP works for distributed systems. However, there are much faster alternatives if your clients are on the same machine as your server. UDP won't even guarantee order, which can be problematic (though not insurmountable).
With regard to internal processing:
Avoid strings and other classes that add significant overhead to simple tasks. Use basic character arrays instead. I'm not sure what options you have in C# or if you even have lightweight alternatives. If so, use them. This applies to data structures as well.
Be aware of double/float comparison errors. Use comparisons that only check for the necessary level of precision. If possible convert everything to integers internally and provide enough metadata to convert back on the other end.
Use something similar to pooled allocators in C++. My lack of knowledge of C# prevents me from being more specific. Again, C# probably isn't your best choice here. The bottom line is that you are going to be creating and destroying a lot of tick objects, and there is no reason to ask the OS for the memory every time (a small pool sketch follows these points).
Only send out deltas, don't send information that your clients already have. This assumes you are using a transport with guaranteed delivery. If not you could end up displaying stale data for a long time.
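A small sketch of the pooling idea in C#: reuse tick objects via a concurrent bag so the GC isn't hammered; the class shape is illustrative:

```csharp
using System.Collections.Concurrent;

public class Tick
{
    public int InstrumentId;
    public double Bid;
    public double Ask;
    public long TimestampTicks;
}

public class TickPool
{
    private readonly ConcurrentBag<Tick> _pool = new ConcurrentBag<Tick>();

    public Tick Rent()
    {
        // Reuse an existing instance when one is available; allocate only as a fallback.
        Tick tick;
        return _pool.TryTake(out tick) ? tick : new Tick();
    }

    public void Return(Tick tick)
    {
        // Hand the instance back instead of letting it become garbage.
        _pool.Add(tick);
    }
}
```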
This might be of interest although it's specific to gaming ... Lowest Latency small size data Internet transfer protocol? c#
Here is a tutorial on UDP connection http://www.winsocketdotnetworkprogramming.com/clientserversocketnetworkcommunication8r.html
Another Article on UDP
http://msdn.microsoft.com/en-us/magazine/cc163648.aspx
You ask specifically about a "low latency user feed". What do you really need low latency for? If it is 'feed only' (and especially if it does not generate revenue), could the users wait a second? That is not low latency.
If you want to trade FAST then you need to physically move across the street from the Exchange (or nearby with an Optical Link). Next you need to 'Trade on the Card'; the Ethernet Card is 'smart' and is fed 'Trade Formulas' that program the Network Card to make a preprogrammed trade based on Data received (without pestering your Computer).
See: http://intelligenttradingtechnology.com/article/groundbreaking-results-high-performance-trading-fpga-and-x86-technologies
Learning to manipulate that Environment will buy you more than reinventing the wheel.
Ultra low latency is costly, but billions are at stake; your stakes (and pursuit of lower latency) will be throttled by $.
In the past I've used TIBCO RV or raw sockets for streaming prices/rates where high-frequency updates are expected. In this situation it is often the client (or in fact the user) who is the limitation (as there are only so many updates a user can process), and this is therefore an example of where you can 'lose' data. In this situation a client-side service broker can be used to throttle updates.
If the system is used for automated trading or HFT, then products like 29West's LatencyBuster have been proven to work well and offer guaranteed messaging.

Should I use WCF for a simple textual wire protocol?

I need to write a program that will communicate with other .NET programs ... but also a legacy VFP program over TCP. I need to choose a fairly simple TCP message format that the VFP programmer can use. It could be as simple as a sequence of small XML blobs delimited by... I dunno, a null character? Whatever.
I need to choose between TcpListener/TcpClient and WCF. I started researching WCF but its architecture seems opaque and built-in Visual Studio templates are heavily biased toward making "web services" that act like a sort of RPC mechanism, but require a special "host" or web server that is external to the application. And Microsoft's 6-stage tutorial makes WCF sound pretty cumbersome (involving code generators, command-line and XML crap just to remotely subtract or multiply two numbers).
I want a self-contained app (no "host"), I want control of the wire protocol, and I want to understand how it works. WCF doesn't seem to facilitate these things, so I abandoned it in favor of TcpListener/TcpClient.
However, the program is to serve as an intermediary between a single (VFP) server and many (.NET) clients, and there will be communication in both directions and across different connections. Using TcpListener and TcpClient, the work of juggling the connections and threads is getting a bit messy, I have no experience with IAsyncResult, and I'm just not confident in my code quality.
So I would like to solicit opinions again: should I still consider WCF instead? If yes, can you point me toward answers to the following questions?
Where in the web is a good explanation of WCF's architecture? Or do I need a book?
How is bi-directional communication done in WCF, where either side (of a single TCP connection) can send a message at any time?
How can I get past all the web-services and RPC mumbo-jumbo, and control the wire protocol?
In WCF, how do I shut down the app cleanly, closing all connections in parallel without hacky Thread.Abort() commands?
If no, how can I set up my code (that uses TcpListener/TcpClient/NetworkStream) so that I can read a message from a NetworkStream, but also accept requests from other connections, shut down cleanly at any time, and avoid wasting CPU time to poll queues and NetworkStreams that are inactive?
The short answer: go with WCF. While there's a good amount of tooling and code-generation and other bells and whistles around it, there's nothing that is preventing you from setting up everything in code (you can define your contracts, set the endpoints up, etc. all in code).
For your specific questions:
WCF Architecture - This is pretty basic, and it should get you up and running relatively quickly.
What you are looking for is duplex services. The NetTcpBinding allows for duplex services out-of-the-box (although you can do it with HTTP, you need a specific binding); a minimal callback-contract sketch follows this list.
If you want to control the wire format, you will want to create a custom encoder. However, I have to strongly recommend against it. You want to create an XML file with a null character to delineate separate messages? There's no need for that: the nature of XML is that you can create child elements to perform the appropriate grouping; there's no limit to how many elements you can nest.
Simply shut down the ServiceHost by calling Close; this will allow all outstanding requests to complete and then shut down gracefully. If you really want to tear down without concern, then call Abort.
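For illustration, a minimal duplex (callback) contract hosted entirely in code over NetTcpBinding - all names here are made up:

```csharp
using System.ServiceModel;

[ServiceContract(CallbackContract = typeof(IClientCallback))]
public interface IRelayService
{
    [OperationContract(IsOneWay = true)]
    void Send(string message);
}

public interface IClientCallback
{
    [OperationContract(IsOneWay = true)]
    void Receive(string message);
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerSession)]
public class RelayService : IRelayService
{
    public void Send(string message)
    {
        // Either side can talk at any time: grab the callback channel and push to the client.
        var callback = OperationContext.Current.GetCallbackChannel<IClientCallback>();
        callback.Receive("echo: " + message);
    }
}

// Self-hosted in code, no external "host" process:
// var host = new ServiceHost(typeof(RelayService), new System.Uri("net.tcp://localhost:9000"));
// host.AddServiceEndpoint(typeof(IRelayService), new NetTcpBinding(), "relay");
// host.Open();   // ... later, host.Close() waits for outstanding calls to finish.
```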
In the end, I'd strongly recommend that you not use the NetTcpBinding; VFP will have a difficult time consuming the protocol. However, if you use an HTTP-based protocol, there are always tools that VFP can easily use to make the call and consume the contents (assuming you stick with XML).
Just to tack on a comment on using DCOM: VFP can utilize DCOM, but it needs to be done with CreateObjectEx()... the only big difference is that you need to know the GUID of the class instance you are connecting to on whatever server it is connecting to, AND the machine name it's going to connect to.
Then the remote object does its work via exposed functions, but VFP calling it from some other machine on the network treats it as if the function was being performed locally and gets whatever the return values are.
I've done DCOM with VFP even as far back as 10 yrs ago for an insurance company...

How do you access in-memory services from web applications?

Say I need to design an in-memory service because of a very high load read/write system. I want to dump the results of the objects every 2 minutes. How would I access the in-memory objects/data from within a web application?
(I was thinking a Windows service would be running in the background handling the in-memory service etc.)
I want the fastest possible solution, and I would guess most people would say use a web service? What other options would I have? I just don't understand how I could hook into the Windows service's objects etc.
(Please don't ask why I would want to do this, maybe you're right and it's a bad idea but I am also curious if this type of architecture is possible.)
Update
I was looking at the site swoopo.com, which I would think gets a lot of hits near the end of auctions, but since the auction keeps resetting, the hits to the database would be just crazy, so I was thinking they might do it in memory and then dump to the db every x minutes...
What you're describing is called a cache, with a facade front-end.
You write a facade to which you commit your changes and acquire your datasets. The facade queues up reads and writes and commits when the queue is full or after a certain amount of time has passed. Your web application has a single point of access to the data (the facade), and the facade is structured in such a way to avoid writing and reading from storage too often.
Most relational database management systems do this for you. They do this kind of optimization and queuing internally so writing another layer on top of it would only slow things down. So don't write a cache if you're using an RDBMS.
Regarding the specifics of accessing such a facade, you can treat it as just an object, and implement it however you want (its own thread, a thread pool, a Web service, a Windows service, whatever).
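For illustration only, a minimal write-behind facade of the kind described above - reads hit memory, writes are queued and flushed on a timer; the names and the flush policy are placeholders:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public class BidFacade
{
    private readonly ConcurrentDictionary<int, decimal> _currentBids = new ConcurrentDictionary<int, decimal>();
    private readonly ConcurrentQueue<KeyValuePair<int, decimal>> _pendingWrites = new ConcurrentQueue<KeyValuePair<int, decimal>>();
    private readonly Timer _flushTimer;

    public BidFacade(TimeSpan flushInterval)
    {
        // Periodically dump the queued writes (e.g. every 2 minutes).
        _flushTimer = new Timer(_ => Flush(), null, flushInterval, flushInterval);
    }

    // Reads come straight from memory.
    public decimal? GetBid(int auctionId)
    {
        decimal bid;
        return _currentBids.TryGetValue(auctionId, out bid) ? bid : (decimal?)null;
    }

    // Writes update memory immediately and are queued for the periodic dump.
    public void PlaceBid(int auctionId, decimal amount)
    {
        _currentBids[auctionId] = amount;
        _pendingWrites.Enqueue(new KeyValuePair<int, decimal>(auctionId, amount));
    }

    private void Flush()
    {
        var batch = new List<KeyValuePair<int, decimal>>();
        KeyValuePair<int, decimal> item;
        while (_pendingWrites.TryDequeue(out item)) batch.Add(item);
        // Persist the batch to the database here (e.g. a bulk insert/update).
    }
}
```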
Any remoting technology would work such as sockets, pipes and the like.
Check out: www.remobjects.com
You could use Windows Message Queuing, a service bus, or even .NET Remoting.
See http://www.nservicebus.com/, or http://code.google.com/p/masstransit/.
You could hook into the Windows service's objects by using Remoting or WCF; both offer very fast interprocess communication. Sockets are fast too, but are more cumbersome to program with compared to WCF. There is a ton of WCF documentation and support online.
Databases provide a level of caching for you. The advantage of an in-memory golden copy such as the one you propose is that it never has to read from disk when a request comes in, and if you host it on the same machine as your IIS (provided you have enough RAM for both) there is no extra network hop, making it much faster than querying a db. However, the downside to this approach is that it does not scale as well if you need to add machines to load balance.
Third party messaging providers such as TIBCO are also worth looking at.
