I'm currently thinking about how to organize the architecture of a system. The system will consist of a web site, where users can upload documents and later get them back processed, and a background daemon with a queue of tasks that processes the uploaded documents.
My question is:
Should I implement the daemon described above as a WCF service exposed only over named pipes (no network access to this service is needed)?
Any suggestions/tips/advice on that?
The data a user can provide is just a bunch of XML files. The ASP.NET web site will expose functionality to receive these XML files and should then somehow be able to pass them to the daemon.
Could you please point me to some articles on this topic?
POST EDIT
After spending a few hours exploring MSMQ, which several people suggested here, my impression is that it is aimed more at distributed architectures (processing nodes located on separate machines, exchanging messages over the network).
At the moment, splitting the system across independent machines is not needed. There will be just one machine, hosting both the ASP.NET web site and the processing program.
Is using MSMQ really necessary in that case?
POST EDIT #2
As I am using the .NET Framework here, please suggest only options that are compatible with .NET. There is really no alternative on that point.
If your deployment will be on a single server, your initial idea of a WCF service is probably the way to go - see MSDN for a discussion regarding hosting in IIS or in a Windows Service.
As @JeffWatkins said, a good pattern to follow when calling the service is to simply pass it the location on disk of the file that needs processing. This will be much more efficient when dealing with large files.
I think the precise approach will depend on the nature of the files you receive from users. For quite small files you may find it more efficient to stream them from your web site to your service so that they never touch the disk. In that case, your service would expose an additional method used for small files.
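As a rough illustration rather than a definitive design, a contract along these lines would support both patterns; the names are hypothetical:

    using System.IO;
    using System.ServiceModel;

    // Hypothetical contract: large files are saved by the web site and handed
    // over by path; small files can be streamed so they never touch the disk.
    [ServiceContract]
    public interface IDocumentProcessor
    {
        [OperationContract(IsOneWay = true)]
        void ProcessFile(string pathOnDisk);

        [OperationContract]
        void ProcessSmallDocument(Stream xmlContent);
    }

The web site would then host a client proxy over a NetNamedPipeBinding endpoint and simply call ProcessFile(savedPath) after saving the upload.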
Edit
Introducing a condition where the file may be streamed is probably a good idea, but it would be valuable for you to do some testing so you can figure out:
Whether it is worth doing
What the optimal size is for streaming versus writing to disk
My answer was based on the assumption that you were deploying to a single machine. If you want something more scalable, then yes, MSMQ would be a good way to scale your application.
See MSDN for some sample code for building a WCF/MSMQ demo app.
I've designed something similar. We used a WCF service as the connection point and RabbitMQ for queuing up the messages. A separate service then works on the items in the queue, sending an async callback when a task is finished and thereby completing the WCF call (WCF has many built-in features for dealing with this).
You can set up timeouts on each side, or you can even choose to drop the WCF connection and use the async callback to notify the user that processing is finished.
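As a rough sketch of the WCF side of that (the names are made up, and a duplex contract is only one of the ways WCF supports callbacks), the service can capture a callback channel when the work is queued and signal completion through it later:

    using System.ServiceModel;

    // Callback interface implemented by the caller (the web site).
    public interface IProcessingCallback
    {
        [OperationContract(IsOneWay = true)]
        void ProcessingFinished(string documentId, bool success);
    }

    // Service contract: the service queues the work (e.g. in RabbitMQ) and returns
    // immediately; the worker later signals completion through the callback channel.
    // Requires a duplex-capable binding such as NetTcpBinding.
    [ServiceContract(CallbackContract = typeof(IProcessingCallback))]
    public interface IDocumentQueue
    {
        [OperationContract(IsOneWay = true)]
        void Enqueue(string documentId, string pathOnDisk);
    }

    public class DocumentQueueService : IDocumentQueue
    {
        public void Enqueue(string documentId, string pathOnDisk)
        {
            // Capture the callback channel for use when the background task completes.
            var callback = OperationContext.Current.GetCallbackChannel<IProcessingCallback>();
            // ... publish (documentId, pathOnDisk, callback) to the worker queue ...
        }
    }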
I had much better luck with RabbitMQ than MSMQ, FYI.
I don't have any links for you, as this is something our team came up with, and it has worked very well (1,000 TPS with a 4-server pool, 100% stateless) - just an idea.
I would give a serious look to ServiceStack. This functionality is built-in, and you will have minimal programming to do. In addition, ServiceStack's architecture is very good and easy to debug if you do run into any issues.
https://github.com/ServiceStack/ServiceStack/wiki/Messaging-and-redis
On a related note, my company does a lot of asynchronous background processing with a web-based REST api front end (the REST service uses ServiceStack). We do use multiple machines and have implemented a RabbitMQ backend; however, the RabbitMQ .NET library is very poorly-designed and unnecessarily cumbersome. I did a redesign of the core classes to fix this issue, but have not been able to publish them to the community yet as we have not released our project to production.
Have a look at http://www.devx.com/dotnet/Article/27560
It's a little dated, but it can give you a head start and a basic understanding.
My question concerns an application that will use a host application, located where the database server is, and a few clients that will be sending information to that host.
Basically the host application receives some data from the clients, performs some calculations and error checking, and then (if all goes well) stores the information in the database. The data received could easily be serialized, or character-separated, into a string of fewer than 50 characters.
I know my default option for developing this communication layer is WCF, and I have worked with it before, but my concerns for this particular case are the following:
The host and the clients will most of the time be connected to the internet through wireless USB modems, which, as we all know, do not provide the most reliable connection.
There will be many clients all sending information to the host at the same time, each having its own identification ID, since that ID determines the type of the data received and what it represents.
Due to the unreliable connections, I would like to know whether a packet has been sent successfully and, if not, to keep retrying until the transmission completes.
New data will be sent from each client every couple of minutes, and if, let's say, the connection fails for 5 minutes, I would like to be able to send all unsent information when the connection is restored.
Lastly, I'm trying to figure out how the clients would know where to contact the host, as the USB modems do not have a static IP and the address could change from time to time.
My thought is either to establish communication through WCF services, where the clients send all information directly to the host, or to consider serializing the data from the clients to XML, uploading it to a third server that is available all the time, and then having the host application try to synchronize with the information on that third server every minute.
I hope this lengthy post makes it reasonably clear what I'm trying to accomplish, and I would really appreciate your thoughts on a project like this.
Instead of starting a discussion, I'll try to give you an answer.
I have implemented a system like the one you're describing. Based on that experience, I can tell you that you will want to look at a message-based system to handle the communication between your clients and host(s).
A message-based system allows you to handle the communication transparently. It lets you resend a message in case it failed to transmit.
To keep it short, there are various message-based frameworks available to the .NET community. To name a few: NServiceBus, MassTransit, Rhino Service Bus, or perhaps the more lightweight Agatha RRSL.
The point is, there are quite a few. It's up to you to research them and find out which one suits your needs best.
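Whatever framework you pick, the core idea for surviving a flaky modem link is store-and-forward: persist every message locally, keep retrying, and delete it only once the host acknowledges it. A minimal hand-rolled sketch of that pattern (the send delegate is a placeholder; no specific bus API is assumed):

    using System;
    using System.IO;

    // Minimal store-and-forward outbox: each unsent message lives as a file on disk
    // and is deleted only after a successful transmission, so a 5-minute outage
    // simply means the files pile up and drain when the link comes back.
    public class OutboxQueue
    {
        private readonly string _folder;
        private readonly Func<string, bool> _sendToHost; // returns true on acknowledged delivery

        public OutboxQueue(string folder, Func<string, bool> sendToHost)
        {
            _folder = folder;
            _sendToHost = sendToHost;
            Directory.CreateDirectory(folder);
        }

        public void Enqueue(string payload)
        {
            File.WriteAllText(Path.Combine(_folder, Guid.NewGuid() + ".msg"), payload);
        }

        // Call periodically, e.g. from a timer every minute.
        public void Flush()
        {
            foreach (var file in Directory.GetFiles(_folder, "*.msg"))
            {
                try
                {
                    if (_sendToHost(File.ReadAllText(file)))
                        File.Delete(file);   // acknowledged: safe to forget
                    else
                        break;               // host unreachable: try again later
                }
                catch (Exception)
                {
                    break;                   // connection dropped mid-send: retry next flush
                }
            }
        }
    }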
I've got a for loop I want to parallelize with something like Parallel.ForEach() from the TPL, or PLINQ.
The key here is that the C++ library I'm calling to do the computation is decidedly not thread-safe; therefore, any plan to parallelize this needs to do so across multiple processes.
I was thinking about using WCF to create a "distributor" process to which both the "client" and multiple "calculators" could connect, adding and removing items to and from a queue; each "calculator" would then send its results directly back to the client, which could update the GUI as results arrive. This architecture would let me bring as many "calculators" online as I have processors and, as I see it, even bring them up across multiple computers, creating a potential farm of processing power that all the clients could share.
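To make the idea a bit more concrete, here is a rough sketch of the kind of WCF contract I have in mind for the distributor (all names are placeholders):

    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    public class WorkItem
    {
        [DataMember] public int Id { get; set; }
        [DataMember] public double[] Inputs { get; set; }
    }

    [DataContract]
    public class WorkResult
    {
        [DataMember] public int Id { get; set; }
        [DataMember] public double Value { get; set; }
    }

    [ServiceContract]
    public interface IDistributor
    {
        [OperationContract]
        void Submit(WorkItem item);        // the client adds work to the queue

        [OperationContract]
        WorkItem TakeNext();               // a calculator process pulls the next item (null if empty)

        [OperationContract]
        void Complete(WorkResult result);  // a calculator reports a finished result
    }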
I'm just wondering if anyone has experience doing this, and whether there are existing application blocks or frameworks I could use to build it. PLINQ does this within a single process; is there something like a DPLINQ (distributed) equivalent?
Also, if that doesn't exist, does anybody want to give an opinion on my proposed architecture? Any obvious pitfalls? Do you think it will work?
Sounds like you could be looking for Dryad. It's a Microsoft research project right now, but they do have an "academic release" available. My understanding is that they are also in the process of better productizing it (probably some kind of integration with Azure) for RTM sometime near the end of 2011. Mary Jo Foley covers more about this here.
A long-time standard for controlling/dispatching distributed work is MPI. I've only ever used it from C++, but implementations exist for many languages. A quick Google search suggests that MPI.NET could be a good implementation for .NET.
I'm looking to write a small web service to run on a small Linux box. I prefer to code in C#, so I'm looking to use Mono.
I don't want the overhead of running a full web server or Mono's version of ASP.NET. I'm thinking of having a single process with a thread dealing with each client connection. Shared memory between threads instead of a database.
I've read a little on Microsoft's version of HttpListener and how it works with the Http.sys driver. Alas, Mono's documentation on this class is just the automated class interface with no discussion of how it works under the hood. (Linux doesn't have Http.sys, so I imagine it's implemented substantially differently.)
Could anyone point me towards some resources discussing this module please?
(A little background to my question for the interested.)
Some time ago, I asked this question, interested in keeping a long conversation open with lots of back-and-forth. I had settled on designing my own ad-hoc protocol, but people I spoke to really wanted a REST interface, even at the cost of the "Okay, send your command now" signal.
So, I wondered about running ASP.NET on a Linux/Mono server, but stumbled upon HttpListener. This seemed ideal, as each "conversation" could run in a separate thread. The thread that calls HttpListener in a loop can work out which thread each incoming connection belongs to and pass the connection on to that thread.
The alternative, for an ASP.NET-driven service, would be to have the ASPX code pick up the state from a database and write back the new state when it finishes. Yes, it would work, but that's a lot of overhead.
The HttpListener class in Mono works without much of a problem. I think the most significant difference between its use in a Microsoft environment and a Linux environment is that port 80 cannot be bound to without root/su/sudo privileges. Other ports do not have this restriction. For instance, if you specify the prefix http://localhost:1234/, the HttpListener works as expected. However, if you add the prefix http://localhost/, which you would expect to listen on port 80, it fails silently. If you explicitly attempt to bind to port 80 (http://localhost:80/), an exception is thrown. If you invoke your application as a superuser or root, you can explicitly bind to port 80 (http://localhost:80/).
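For illustration, a minimal listener loop like the following (just a sketch, with error handling omitted) is enough to reproduce the behaviour described above:

    using System.Net;
    using System.Text;

    class MinimalListener
    {
        static void Main()
        {
            var listener = new HttpListener();
            // A non-privileged port works the same on Mono/Linux as on Windows;
            // "http://localhost/" (port 80) would require root.
            listener.Prefixes.Add("http://localhost:1234/");
            listener.Start();

            while (true)
            {
                HttpListenerContext context = listener.GetContext();   // blocks until a request arrives
                byte[] body = Encoding.UTF8.GetBytes("Hello from HttpListener");
                context.Response.ContentType = "text/plain";
                context.Response.ContentLength64 = body.Length;
                context.Response.OutputStream.Write(body, 0, body.Length);
                context.Response.Close();
            }
        }
    }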
I have not yet explored the rest of the HttpListener members in enough detail to make any useful comments about how well it operates in a linux environment. However, if there is interest, I will continue to post my observations.
I am not sure why you want to look so deep under the hood. Even on the Microsoft side, the documentation about http.sys may not give you much valuable information if you are using the .NET Framework.
To find out whether something works well enough on Mono, the usual approach is to download its VMware or VPC image and test your applications on it.
http://www.go-mono.com/mono-downloads/download.html
Though Mono is much more mature than it was a few years ago, we cannot say it has been exercised by as many real-world applications as Microsoft .NET. So please test your applications and submit any issues you find to the Mono team.
Based on my experience, minor issues are fixed within a few days, while major issues take longer. But with the Mono source code available, you can fix things yourself or find good workarounds most of the time.
We're developing a .NET app that must make up to tens of thousands of small web service calls to a 3rd party web service. We would prefer a more "chunky" call, but the 3rd party does not support it. We've designed the client to use a configurable number of worker threads, and through testing have code that is fairly well optimized for one multicore machine. However, we still want to improve the speed, and are looking at spreading the work across multiple machines. We're well versed in typical client/server/database apps, but new to designing for multiple machines. So, a few questions related to that:
Is there any other client-side optimization, besides multithreading, that we should look at to improve the speed of an HTTP request/response? (I should note this is a non-standard web service, so it is implemented using WebClient, not a WCF or SOAP client.)
Our current thinking is to use WCF to publish chunks of work to MSMQ, and run clients on one or more machines to pull work off of the queue. We have experience with WCF + MSMQ, but want to be sure we're not missing better options. Are there other, better ways to do this today?
I've seen some 3rd party tools like DigiPede and Microsoft's HPC offerings, but these seem like overkill. Any experience with those products or reasons we should consider them over roll-our-own?
Sounds like your goal is to execute all these web service calls as quickly as you can, and get the results tabulated. Given that, your greatest efficiency control is going to be through scaling the number of concurrent requests you can make.
Be sure to look at your client-side connection limits. By default, I think the limit is 2 connections per host. I haven't tried this myself, but by raising the connection limit you should theoretically see a multiplier effect: more connections from a single machine means more concurrent requests. There's more info on the MS forums.
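The property being referred to is presumably ServicePointManager.DefaultConnectionLimit (the same limit can also be set through the connectionManagement section of app.config); a minimal sketch of raising it before the worker threads start:

    using System.Net;

    static class ConnectionTuning
    {
        public static void Apply()
        {
            // The default is 2 connections per host for desktop apps; raise it before
            // the worker threads start issuing requests. The right value is something to measure.
            ServicePointManager.DefaultConnectionLimit = 50;
        }
    }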
The MSMQ option works well. I'm running that configuration myself. ActiveMQ is also a fine solution, but MSMQ is already on the server.
You have a good starting point. Get that in operation, then move on to performance and throughput.
At CodeMash this year, Wesley Faler did an interesting presentation on this sort of problem. His solution was to store "jobs" in a DB, then use clients to pull down work and mark status when complete.
He then pushed the whole infrastructure up to Amazon's EC2.
Here are his slides from the presentation - they should give you the basic idea:
I've done something similar with multiple PCs locally - the basics of managing the workload were similar to Faler's approach.
If you have already optimized the code, you could look into optimizing the network side to minimize the number of packets sent (a short sketch follows the list):
reuse HTTP sessions (i.e., send multiple transactions over one session by keeping the connection open; this reduces TCP overhead)
reduce the number of HTTP headers to the minimum in the request to save bandwidth
if supported by the server, use gzip to compress the body of the request (you need to balance the CPU cost of compression against the bandwidth you save)
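A hedged sketch of the first and third points using HttpWebRequest (keep-alive is actually on by default, and the gzip request body only helps if the third-party service really accepts it, which you would need to confirm):

    using System.IO;
    using System.IO.Compression;
    using System.Net;
    using System.Text;

    static class CompressedPost
    {
        public static HttpWebResponse Post(string url, string body)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "POST";
            request.KeepAlive = true;                                   // reuse the TCP connection across calls
            request.Headers["Content-Encoding"] = "gzip";               // only if the server supports it
            request.AutomaticDecompression = DecompressionMethods.GZip; // accept gzip-compressed responses too

            using (var gzip = new GZipStream(request.GetRequestStream(), CompressionMode.Compress))
            {
                byte[] payload = Encoding.UTF8.GetBytes(body);
                gzip.Write(payload, 0, payload.Length);
            }

            return (HttpWebResponse)request.GetResponse();
        }
    }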
You might want to consider Rhino Service Bus instead of MSMQ. The source is available here.
It seems both EnyimMemcached (https://github.com/enyim/EnyimMemcached) and BeITMemcached (http://code.google.com/p/beitmemcached/) are popular .NET memcached libraries. Both are reasonably active projects under development and have over a thousand downloads. I'm trying to figure out which one to use, but have found competing claims! I did read another related post, but I still want to ask more people before making a decision.
EnyimMemcached claims on its project homepage (https://github.com/enyim/EnyimMemcached) that
based on our non-disclosed specially handcrafted in-house performance test we're the fastest C# client ever, using negative amount of system resources, be it memory or CPU time
and
we follow memcached's protocol specification as strictly as no one else: even the memcached guys ask us if they don't understand something
While BeITMemcached claims on its project wiki page (http://code.google.com/p/beitmemcached/wiki/Features) that
We have performed extensive functional testing and performance testing of the BeIT Memcached client and we are satisifed that it is working as it should. When we compared the performance against two other clients, the java port and the Enyim memcached client, our client consumed the least resources and had the best performance. It is also following the memcached protocol specifications more strictly, has the most memcached features, and is still much smaller in actual code size.
So, for those who have experience with these or anything similar: which client did you choose, and why?
We tested both and found Enyim to perform the best for our expected usage scenario: many (though not millions of) cached objects, and millions of cache-get requests (average web site concurrency load = 16-20 requests).
Our performance metric was the time from making the request to having the object initialized in memory on the calling server. Both libraries would have handled the job, but the Enyim client came out ahead in our testing.
There is a comparison between Enyim and BeIT at sysdot.wordpress.com/2011/03/08/memcached-clients-which-ones-best/
I have found Enyim to work the best.
It is easy to use, reliable and fast :)
The Enyim client's Store() sometimes does not work correctly. It happens when the key is not present in the cache, in most cases after a memcached service restart. This construction:
    T val = _client.Get<T>(key);
    if (val == null)
    {
        // ... filling the val variable ...
        var result = _client.Store(StoreMode.Add, key, val);
        // ... result can be false, sometimes ...
    }
works only about half the time. The T entity is marked [Serializable].
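One possible workaround (only a sketch; whether it is appropriate depends on why the Add fails in your environment) is to fall back to StoreMode.Set when the Add is rejected:

    T val = _client.Get<T>(key);
    if (val == null)
    {
        // ... filling the val variable ...
        if (!_client.Store(StoreMode.Add, key, val))
        {
            // Add can return false even when the key is absent (e.g. right after a
            // memcached restart); Set overwrites unconditionally, so retry with it.
            _client.Store(StoreMode.Set, key, val);
        }
    }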