Which option for building this polling Windows service is better? - C#

My team is writing a Windows service which needs to poll data from a 3rd party system (the 3rd party system provides a web service which we use).
The process will look something like this:
1. Call the 3rd party WS
2. Save the received raw data to our DB
3. Process the raw data
4. Save the processed data to our DB
5. Repeat
The team agrees that we actually have 2 different logical operations:
1. Getting and saving the raw data
2. Processing the raw data and saving the results
We are trying to decide which of the following design options is better:
Option 1: Perform both operations in the same Windows service, each operation on its own thread
Option 2: Perform the first operation in the Windows service, and make an asynchronous one-way call to a WCF service for the second operation
In your opinion, which option is better?
If you have another alternative you think is better, please share it.
Thanks.

It depends.
Given that you have an apparently sequential process, why use separate threads to read and then process the data? The simplest approach would be a single thread that loops: read the data, process it, and presumably wait at some point so you aren't rate limited by the third party.
If however the processing takes a long time you might want to split the work between a single polling thread and a set of workers that process the data.
The simplest option is usually the right one for your initial implementation. Adding threads and WCF service calls before you need them is rarely the right thing to do.
To give a better answer you really need to supply more information: does the third party service limit how many calls you can make at once, or how quickly you can make them? How long does the processing take? How often do you need to poll?

Based on your comment, I would say you should have a thread that polls the 3rd party service once a second and starts two tasks.
Task 1 would store the raw data to database.
Task 2 would process the raw data and store the result in database.
If the polling thread retrieves 1000 entries it should poll again without delay.
You can either use System.Threading.ThreadPool or System.Threading.Tasks.Task.
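A minimal sketch of that loop using `Task`. The names here (`Poller`, the `fetch` delegate, the two `List` "tables") are hypothetical stand-ins for your actual web service client and database code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Sketch of the loop above: each poll starts Task 1 (store the raw data)
// and Task 2 (process it and store the result). The fetch delegate stands
// in for the real 3rd-party web service call; the two List fields stand
// in for your database tables.
public class Poller
{
    private readonly Func<List<string>> fetch;   // hypothetical WS call
    public List<string> RawTable = new List<string>();
    public List<string> ProcessedTable = new List<string>();

    public Poller(Func<List<string>> fetch) { this.fetch = fetch; }

    // Returns true if the batch was full and we should poll again immediately.
    public async Task<bool> PollOnceAsync(int fullBatchSize = 1000)
    {
        List<string> batch = fetch();

        // Task 1: save the raw payload as-is.
        Task t1 = Task.Run(() => { lock (RawTable) RawTable.AddRange(batch); });

        // Task 2: process the raw data and save the result
        // (ToUpperInvariant is a placeholder for real processing).
        Task t2 = Task.Run(() =>
        {
            var processed = batch.Select(x => x.ToUpperInvariant()).ToList();
            lock (ProcessedTable) ProcessedTable.AddRange(processed);
        });

        await Task.WhenAll(t1, t2);
        return batch.Count >= fullBatchSize;   // full batch: skip the delay
    }
}
```

A host loop would call `PollOnceAsync` and then `Task.Delay(TimeSpan.FromSeconds(1))` only when it returns false, giving you the "poll again without delay on a full batch" behavior.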

Related

RESTFUL web service v Message queue when using Scatter Gatherer

Say I have a scatter gather setup like this:
1) Web app
2) RabbitMQ
3) Scatter gather API 1
4) Scatter gather API 2
5) Scatter gather API x
Say each scatter gather (and any new ones added in the future) needs to supply an image to the web app, or update one, so that when the web app displays the results on screen it also displays the image. What is the best way to do this?
1) RESTFUL call from each API to web app adding/updating an image where necessary
2) Use message queue to send the image
I believe option two is best because I am using a microservices architecture. However, this would mean that the image could be processed by the web app after requests are made (if competing consumers are used). Therefore the image could be missing from the webpage?
The problem with option 1 is that the scatter gather APIs are tightly coupled with the web app.
What is the appropriate way to approach this?
The short answer: There is no right way to do this.
The long answer: Because there's no right way to do this, there's a danger that any answer I give you will be an opinion. Rather than do that, I'm going to help clarify the ramifications of each option you've proposed.
First thing to note: Unless there is already an image available at the time of the HTTP request, then your HTTP response will not be able to include an image. This means that your front-end will need to be updated after the HTTP request/response cycle has concluded. There are two ways to do this: polling via AJAX requests, or pushing via sockets.
The advantage of polling is that it is probably easier to integrate into an existing web app. The advantage of pushing the image to the client via sockets is that the client won't need to spam your server with polling requests.
Second thing to note: Reporting back the image from the scatter/gather workers could happen either via an HTTP endpoint, or via the message queue, as you suggest.
The advantage of the HTTP endpoint is that it would likely be simpler to set up. The advantage of the message queue is that the worker would not have to wait for the HTTP response (which could take a while if you're writing a large image file to disk) before moving on to the next job.
One more thing to note: If you choose to use an HTTP endpoint to create/update the images, it is possible that multiple scatter/gather workers will be trying to do this at the same time. You'll need to handle this to prevent multiple workers from trying to write to the same file at the same time. You could handle this by using a mutex to lock the file while one process is writing to it. If you choose to use a message queue, you'll have several options for dealing with this: you could use a mutex, or you could use a FIFO queue that guarantees the order of execution, or you could limit the number of workers on the queue to one, to prevent concurrency.
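For the HTTP-endpoint case, the cross-process file lock can be a named system `Mutex`. A minimal sketch, with the mutex-name scheme and method names as hypothetical choices:

```csharp
using System;
using System.IO;
using System.Threading;

// Sketch: serialize writes to one image file across multiple worker
// processes using a named (system-wide) Mutex. The mutex name is derived
// from the image id so that unrelated images don't block each other.
public static class ImageWriter
{
    public static void WriteImage(string imageId, string path, byte[] bytes)
    {
        using (var mutex = new Mutex(false, "image-write-" + imageId))
        {
            bool acquired = false;
            try
            {
                acquired = mutex.WaitOne(TimeSpan.FromSeconds(30));
                if (!acquired) throw new TimeoutException("image lock busy");
                File.WriteAllBytes(path, bytes);   // only one writer at a time
            }
            finally
            {
                if (acquired) mutex.ReleaseMutex();
            }
        }
    }
}
```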
I do have experience with a similar system. My team and I chose to use a message queue. It worked well for us, given our constraints. But, ultimately, you'll need to decide which will work better for you given your constraints.
EDIT
The constraints we considered in choosing a message queue over HTTP included:
Not wanting to add private endpoints to a public facing web app
Not wanting to hold up a worker to wait on an HTTP request/response
Not wanting to make synchronous that which was asynchronous
There may have been other reasons. Those are the ones I remember off the top of my head.

What is the best method for monitoring a large number of clients reliably with good performance?

This is more of a programming strategy and direction question, than the actual code itself.
I am programming in C#.
I have an application that remotely starts processes on many different clients on the network, could be up to 1000 clients in theory.
It then monitors the status of the remote processes by reading a log file on each client.
I currently do this by running one thread that loops through all of the clients in a list, and reading the log file. It works fine for 10 or 20 machines, but 1000 would probably be untenable.
There are several problems with this approach:
First, if the thread doesn’t finish reading all of the client statuses before it’s called again, the client statuses at the end of the list might not be read and updated.
Secondly, if any client in the list goes offline during this period, the updating hangs, until that client is back online again.
So I require a different approach, and have thought up a few possible ways to resolve this.
1. Spawn a separate thread for each client, to read its log file and update its progress.
a. However, I'm not sure that having 1000 threads running on my machine would be acceptable.
2. Test the connection to each machine first, before trying to read the file; if it cannot connect, ignore it for that iteration and move on to the next client in the list.
a. This still has the same problem of not getting through the list before the next call, and it adds more delay because it tests the connection via a port first. With 1000 clients, this would be noticeable.
3. Have each client send the data to the machine running the application whenever there is an update.
a. This could create a lot of chatter with 1000 machines trying to send data repeatedly.
So I’m trying to figure if there is another more efficient and reliable method, that I haven’t considered, or which one of these would be the best.
Right now I’m leaning towards having the clients send updates to the application, instead of having the application pulling the data.
Looking for thoughts, concerns, ideas and recommendations.
In my opinion, you are approaching monitoring the wrong way. Instead of keeping the logs in a text file on each client, you'd be better off preserving them in a central data repository of some kind. Given that you are monitoring the performance of those systems, your design and the mechanism behind it must not negatively impact the target systems; with the current design, the disk and CPU can be involved so heavily in certain cases that the monitoring becomes a performance issue itself.
I recommend creating a log repository server using a fast in-memory database such as Redis, and sending logged data directly to that server. Keep in mind that this database should run on a different machine. You can then tune Redis to persist received data to physical disk once a particular number of entries is reached or a particular interval elapses. The in-memory aspect is advantageous because a monitoring application like this queries the data a lot. On the other hand, Redis's performance is high enough that it can handle millions of entries efficiently.
The blueprint is:
1. Centralize all log data in a single repository.
2. Configure clients to send monitored information to the centralized repository.
3. Have the main server (the monitoring system) read the data from the centralized repository when required.
I'm not trying to advertise a particular tool here; I'm only sharing my own experience. There are many other tools you can use for this purpose, such as ElasticSearch.

How to reduce the execution time in C# while calling an API?

I am creating a Windows application (using Windows Forms) which calls a web service to fetch data. I have to fetch information for 200+ clients and, for each client, information for all of its users; a client can have 50 to 100 users. So I am calling the web service in a loop (after getting the full client list), once per client, to fetch the user listing. This is a long process, and I want to reduce its execution time, which is currently 40-50 minutes for one data fetch. Please suggest an approach that will reduce the execution time, whether multithreading or anything else best suited to my application.
Thanks in advance.
If you are in control of the web service, add a method that returns all the clients at once instead of one by one, to avoid round trips, as Michael suggested.
If not, make as many requests as possible at the same time (not in sequence) to avoid as much latency as possible. Each request costs at least one round trip (at least your ping's worth of delay): if you make 150 requests in sequence, you'll wait your ping to the server × 150 of "just waiting on the network". If you split those requests into 4 bunches and run the bunches in parallel, you'll only wait about 150/4 × ping time. So the more requests you make concurrently, the less you wait.
I suggest you avoid calling the service in a loop for every user to get the details; instead, do that loop on the server and return all the data in one shot. Otherwise you will suffer a lot of useless latency caused by the thousands of calls, and not just because of the server time or data-transfer time.
This is also a pattern, called Remote Facade, described by Martin Fowler (and related to the Gang of Four's Facade pattern):
any object that's intended to be used as a remote object needs a coarse-grained interface that minimizes the number of calls needed to get something done [...] Rather than ask for an order and its order lines individually, you need to access and update the order and order lines in a single call.
In case you're not in control of the web service, you could try to use a Parallel.ForEach loop instead of a ForEach loop to query the web service.
The MSDN has a tutorial on how to use it: http://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
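A sketch of what that parallel loop could look like. `FetchAll` and the `getUsers` delegate are hypothetical names; the delegate stands in for your existing per-client web service call:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Sketch: fetch the user list for every client in parallel, with the
// degree of parallelism capped so we don't flood the web service.
public static class UserFetcher
{
    public static Dictionary<int, List<string>> FetchAll(
        IEnumerable<int> clientIds,
        Func<int, List<string>> getUsers,   // stand-in for the real WS call
        int maxParallel = 8)
    {
        var results = new ConcurrentDictionary<int, List<string>>();
        Parallel.ForEach(
            clientIds,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallel },
            id => results[id] = getUsers(id));
        return new Dictionary<int, List<string>>(results);
    }
}
```

Capping `MaxDegreeOfParallelism` matters here: the bottleneck is network latency, not CPU, and an unbounded fan-out can itself overload the remote service.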

WCF discard duplicate data

I'm writing a gateway between 2 services, one of the services is a very slow webservice, and gets overloaded quickly, the other is super quick and frequently sends the same data.
I'd like to have my service discard (at the earliest point possible) data that I've received that is equal to previous objects.
What is the best way to do this?
The best way I know (which I doubt is best) is to compare received objects after deserialization with the set of objects I've already received (a cache, in other words).
I care more that I discard as much as is computationally easy to discard, than making sure I discard all duplicate data.
FYI, the data has, among other things, geolocational information that is frequently the same.
Clarification:
Situation:
Service 1 is fast and frequently sends updates that have no new data.
Service 2 is slow
I want to send data from Service 1 to Service 2 (with some slight modifications), but only if I haven't already sent the same data.
Dale
It's hard to say what the best way is without a little more info, but it sounds like you could benefit from a relatively simple cache. I'm not sure if you're in a write heavy or read heavy scenario, but you should be able to make it work either way.
I.e., the quick service is called and checks for results in the cache before calling the slow service.
I think Kenneth provided a good idea.
Just to frame the problem, it sounds to me like you have something like this situation (clarify your question if this isn't correct)...
[Service 1] -> (Calls) -> [Service 2]
Service 1 - Faster and overloads Service 2.
Service 1 - Sends repeat data to Service 2, so much of it can be ignored.
If this is the case, as Kenneth suggested, you may want to implement a caching mechanism of requests that have already been sent from Service 1, and store the answers as received from Service 2. So, in pseudocode, something like this:
Service 1 checks in some common storage to see if a request it is about to send has already been sent.
If it has, it checks to see the answer that was sent back from Service 2.
If it hasn't, it sends the request and adds that request to the list of requests that are sent.
When the answer is provided, that is stored in the cache along with the original request.
This will also have the benefit of making your lookups faster via Service 1.
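The check-then-call flow above can be sketched as a small request/answer cache. `RequestCache` and the `callService2` delegate are hypothetical names standing in for your common storage and the real call to Service 2:

```csharp
using System;
using System.Collections.Generic;

// Sketch: before forwarding a request to Service 2, look it up in a cache
// of requests already sent; only call Service 2 on a miss, and store the
// answer alongside the request when it comes back.
public class RequestCache
{
    private readonly Dictionary<string, string> answers =
        new Dictionary<string, string>();
    public int ServiceCalls;   // visibility: how often Service 2 was hit

    public string GetOrCall(string request, Func<string, string> callService2)
    {
        if (answers.TryGetValue(request, out string cached))
            return cached;                 // already sent: reuse the answer

        ServiceCalls++;
        string answer = callService2(request);
        answers[request] = answer;         // remember request + answer
        return answer;
    }
}
```

In the real gateway the "common storage" would need to be shared and expiring (a distributed cache rather than an in-process dictionary), but the lookup-before-send shape is the same.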
The earliest possible point to detect the duplicate data is in the sender of the data, not in the receiver.
Add a memory cache there. If the object is duplicate, then no need to serialize it and waste any time sending it.
By the way, if you want a good .NET memory cache, I've had good luck with Couchbase's MemcacheD.
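A sketch of the sender-side discard. The unbounded `HashSet` here is a stand-in for whatever expiring cache (MemoryCache, memcached) you'd use in production:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Sketch: hash each outgoing payload and skip the send when the hash has
// been seen before. Hashing keeps memory bounded by digest size rather
// than payload size; a real implementation would also expire old entries.
public class DuplicateFilter
{
    private readonly HashSet<string> seen = new HashSet<string>();

    public bool ShouldSend(string payload)
    {
        using (var sha = SHA256.Create())
        {
            string digest = Convert.ToBase64String(
                sha.ComputeHash(Encoding.UTF8.GetBytes(payload)));
            return seen.Add(digest);   // true only the first time
        }
    }
}
```

Since the question says much of the data (e.g. the geolocation fields) repeats exactly, hashing the serialized payload before it ever reaches the slow service discards duplicates at the cheapest possible point.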

What is the best solution for collecting vehicle-tracking GPS data in C#?

I have several vehicles that send data to a server each minute. The server should listen for and decode the data to store it in the database. There will be thousands of entries per minute. What is the best approach to solve this problem?
My personal favorite: a WCF or web service farm pumps the data into a Microsoft Message Queue (MSMQ), and an application server (one or more) converts the data and puts it into the DB.
As you get deeper (if you ever need to), you can use the features of MSMQ to handle timeouts, load buffering, 'dead-letters', server failures, whatever. Consider this article.
On the web-facing side, because it is stateless and thin, you can easily scale this layer out without thinking about complex load balancing. You can use DNS load balancing to start and then move to a better solution when you need it.
As a further note, by using MSMQ you can also see how far 'behind' the system is by looking at how many messages are in the queue. If that number is near 0, you're good. If that number keeps rising non-stop, you need more performance (add another application server).
We're doing exactly what Jason says, except using a direct TCP/UDP socket listener with a custom payload for higher performance.
How long do you expect each operation to take? From what you're saying, it seems you can write the data straight to the DB after processing, so you don't need to synchronize your threads at all (the DB should take care of that for you).
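The queue-between-listener-and-writers idea can be sketched in-process with a `BlockingCollection`; MSMQ plays the same role across machines. All names here (`Ingestor`, the uppercase "decode") are hypothetical placeholders:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch: the listener enqueues raw GPS payloads; a pool of workers
// dequeues, decodes, and "stores" them. The queue depth is the same
// "how far behind are we" signal described for MSMQ above.
public class Ingestor
{
    private readonly BlockingCollection<string> queue =
        new BlockingCollection<string>(boundedCapacity: 10000);
    public ConcurrentBag<string> Stored = new ConcurrentBag<string>();
    private readonly Task[] workers;

    public Ingestor(int workerCount = 4)
    {
        workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
            workers[i] = Task.Run(() =>
            {
                foreach (string raw in queue.GetConsumingEnumerable())
                    Stored.Add(raw.ToUpperInvariant());   // "decode" + store
            });
    }

    public void Enqueue(string raw) => queue.Add(raw);
    public int Backlog => queue.Count;   // near 0 means we're keeping up

    public void Shutdown()
    {
        queue.CompleteAdding();
        Task.WaitAll(workers);
    }
}
```

The bounded capacity gives you back-pressure: when the writers fall behind, `Enqueue` blocks instead of letting memory grow without limit.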
