I'm writing a gateway between two services: one is a very slow web service that gets overloaded quickly; the other is super quick and frequently sends the same data.
I'd like my service to discard, at the earliest point possible, incoming data that is equal to objects I've already received.
What is the best way to do this?
The best way I know of (and I doubt it is the best) is to compare received objects, after deserialization, against the set of objects I've already received (a cache, in other words).
I care more about discarding as much as is computationally cheap to discard than about making sure I discard all duplicate data.
FYI, the data contains, among other things, geolocation information that is frequently the same.
Clarification:
Situation:
Service 1 is fast and frequently sends updates that have no new data.
Service 2 is slow.
I want to send data from Service 1 to Service 2 (with some slight modifications), but only if I haven't already sent the same data.
It's hard to say what the best way is without a little more info, but it sounds like you could benefit from a relatively simple cache. I'm not sure whether you're in a write-heavy or read-heavy scenario, but you should be able to make it work either way.
That is, the quick service is called and checks for results in the cache before calling the slow service.
I think Kenneth provided a good idea.
Just to frame the problem, it sounds to me like you have something like this situation (clarify your question if this isn't correct)...
[Service 1] -> (Calls) -> [Service 2]
Service 1 - Faster and overloads Service 2.
Service 1 - Sends repeat data to Service 2, so much of it can be ignored.
If this is the case, as Kenneth suggested, you may want to implement a caching mechanism of requests that have already been sent from Service 1, and store the answers as received from Service 2. So, in pseudocode, something like this:
Service 1 checks some common storage to see if the request it is about to send has already been sent.
If it has, it uses the answer that was sent back from Service 2.
If it hasn't, it sends the request and adds it to the list of requests that have been sent.
When the answer arrives, it is stored in the cache along with the original request.
This will also have the benefit of making your lookups faster via Service 1.
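In C#, a minimal sketch of that idea could look like this (ISlowService and the string request/response types are stand-ins for whatever Service 2 actually exposes):

    using System.Collections.Concurrent;

    // Stand-in for the slow Service 2 proxy.
    public interface ISlowService
    {
        string Call(string request);
    }

    public class CachingGateway
    {
        // Maps each request already sent to the answer Service 2 returned.
        private readonly ConcurrentDictionary<string, string> _cache =
            new ConcurrentDictionary<string, string>();
        private readonly ISlowService _service2;

        public CachingGateway(ISlowService service2)
        {
            _service2 = service2;
        }

        public string Send(string request)
        {
            // Only calls Service 2 if this exact request hasn't been sent
            // before; otherwise the stored answer is returned with no round trip.
            return _cache.GetOrAdd(request, r => _service2.Call(r));
        }
    }

A real version would also need some eviction policy (a TTL or a size cap) so the cache doesn't grow without bound.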
The earliest possible point to detect the duplicate data is in the sender, not in the receiver.
Add a memory cache there. If the object is a duplicate, there is no need to serialize it and waste any time sending it.
By the way, if you want a good .NET memory cache, I've had good luck with Couchbase's Memcached.
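One way to illustrate the idea: hash each raw payload before doing any other work (deserializing or sending on) and skip anything already seen. The SHA-256 choice and the type names here are my assumptions, not something from the question:

    using System;
    using System.Collections.Concurrent;
    using System.Security.Cryptography;

    public class DuplicateFilter
    {
        // Hashes of payloads already handled; lives in the sender's memory.
        private readonly ConcurrentDictionary<string, byte> _seen =
            new ConcurrentDictionary<string, byte>();

        // Returns true the first time a payload is seen, false for duplicates.
        public bool IsNew(byte[] rawPayload)
        {
            using (var sha = SHA256.Create())
            {
                string key = Convert.ToBase64String(sha.ComputeHash(rawPayload));
                return _seen.TryAdd(key, 0);
            }
        }
    }

Because it works on the raw bytes, a duplicate is discarded before any deserialization happens; again, in practice the set of seen hashes would need a bound (LRU or TTL).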
I am working on a Web API project; my Web API calls a repository, and the repository calls a third-party data source to perform CRUD. Calling the data source is very costly, and it only gets updated weekly.
So I thought I would implement caching. I have seen a few output-caching packages, but they do not fulfill my requirement, because:
If I output-cache the Get method, I am not able to use the same cached output in the GetById method, or the same cached data for some other operation like a Find operation. I also have to manually update the cache whenever any update/POST happens.
One more thing I am confused about in this scenario: should I remove the cache entry or update it whenever a PUT or POST operation happens?
I am totally confused about how to complete this requirement. Please suggest how to fulfill it; I searched the web but have not found anything like this.
I am a novice both on SO and Web API, so pardon me if the question doesn't meet the standards.
If I output-cache the Get method, I am not able to use the same cached output in the GetById method, or the same cached data for some other operation like a Find operation. I also have to manually update the cache whenever any update/POST happens.
To use the cached data for different operations like GetById and Find, you need to store the data in suitable data structures. Caches like Redis support hash maps of objects, which GetById can use. What kind of data structure you need depends entirely on your case.
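As a rough illustration in C# (Product and the in-memory dictionary are placeholders; with Redis you would use a hash keyed by id instead), a single id-keyed structure can back Get, GetById and Find at the same time:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class ProductCache
    {
        // One structure keyed by id serves Get, GetById and Find alike.
        private readonly ConcurrentDictionary<int, Product> _byId =
            new ConcurrentDictionary<int, Product>();

        public void Load(IEnumerable<Product> all)
        {
            foreach (var p in all)
                _byId[p.Id] = p;
        }

        public Product GetById(int id)
        {
            Product p;
            return _byId.TryGetValue(id, out p) ? p : null;
        }

        public IEnumerable<Product> Find(Func<Product, bool> predicate)
        {
            return _byId.Values.Where(predicate);
        }
    }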
One more thing I am confused about in this scenario: should I remove the cache entry or update it whenever a PUT or POST operation happens?
To answer the second part of your first question and this one: you need to choose between a write-back and a write-through cache. You can read more about WB and WT caches in this article. Basically, there are a few approaches:
1. Each cache entry has some TTL, and after it expires you fetch the data from the data source again. The pro is that your POST and PUT operations will be faster, since they don't need to update the cache; the con is that the data might be stale for some time.
2. Invalidate the appropriate entry in the cache whenever a POST or PUT operation happens.
3. Update the cache entry at the time of the POST or PUT.
In terms of write/update latency, option 1 is the fastest but carries the risk of stale data. Option 2 slows down both GET and PUT/POST operations, while option 3 slows down only the writes.
Your choice should depend on the ratio of read to write operations in the system. If the system you are designing is read-heavy, then option 3 is better than option 2.
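A sketch of how those options might look with System.Runtime.Caching.MemoryCache, assuming the hypothetical Product type from the sketch above and a placeholder repository interface (the 12-hour TTL is also just an assumption):

    using System;
    using System.Runtime.Caching;  // MemoryCache, .NET Framework

    public interface IProductRepository  // stand-in for the costly data source
    {
        Product GetById(int id);
        void Update(int id, Product updated);
    }

    public class ProductService
    {
        private readonly MemoryCache _cache = MemoryCache.Default;
        private readonly IProductRepository _repository;

        public ProductService(IProductRepository repository)
        {
            _repository = repository;
        }

        public Product Get(int id)
        {
            string key = "product:" + id;
            var cached = (Product)_cache.Get(key);
            if (cached != null)
                return cached;

            var fresh = _repository.GetById(id);
            // Option 1: rely on a TTL; since the source changes weekly,
            // even a generous expiration keeps staleness bounded.
            _cache.Set(key, fresh, DateTimeOffset.Now.AddHours(12));
            return fresh;
        }

        public void Put(int id, Product updated)
        {
            _repository.Update(id, updated);
            // Option 2: invalidate so the next Get repopulates from the source...
            _cache.Remove("product:" + id);
            // ...or option 3: write through instead:
            // _cache.Set("product:" + id, updated, DateTimeOffset.Now.AddHours(12));
        }
    }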
I am creating a Windows application (Windows Forms) which calls a web service to fetch data. I have to fetch information for 200+ clients and, for each client, the information of all its users; a client can have 50 to 100 users. So, after getting the full client list, I call the web service in a loop, once per client, to fetch that client's users. This is a long process, currently 40-50 minutes for one full data fetch, and I want to reduce the execution time. Please suggest which approach, multithreading or anything else, is best suited to my application.
Thanks in advance.
If you are in control of the web service, have a method that returns all the clients' data at once instead of one by one, to avoid round trips, as Michael suggested.
If not, make as many requests at the same time (not in sequence) as possible, to avoid as much latency as you can. Each request costs at least one round trip (so at least your ping's worth of delay): if you make 150 requests in sequence, you'll wait 150 times your ping to the server just on the network. If you split those requests into 4 bunches and run each bunch in parallel, you'll only wait about (150/4) times your ping. So the more requests you make concurrently, the less you wait.
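A hedged sketch of that "bunches in parallel" idea using Task.WhenAll (the batch size of 4 and the GetUsersAsync proxy call are assumptions):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    public static class BatchedCalls
    {
        // Runs the per-client calls 'batchSize' at a time instead of one by one.
        public static async Task<List<TResult>> RunAsync<T, TResult>(
            IList<T> items, Func<T, Task<TResult>> call, int batchSize)
        {
            var results = new List<TResult>();
            for (int i = 0; i < items.Count; i += batchSize)
            {
                // Start the whole batch, then wait for it as a unit.
                var batch = items.Skip(i).Take(batchSize).Select(call).ToList();
                results.AddRange(await Task.WhenAll(batch));
            }
            return results;
        }
    }

    // Usage, assuming a hypothetical GetUsersAsync proxy method:
    // var all = await BatchedCalls.RunAsync(clients, c => GetUsersAsync(c.Id), 4);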
I suggest you avoid calling the service in a loop for every client to get the details; instead, do that loop on the server and return all the data in one shot. Otherwise you will suffer a lot of useless latency caused by the thousands of calls, and not just because of the server time or the data-transfer time.
This is also a pattern, called Remote Facade, explained by Martin Fowler (it is related to the Facade pattern from the Gang of Four):
any object that's intended to be used as a remote object needs a coarse-grained interface that minimizes the number of calls needed to get something done [...] Rather than ask for an order and its order lines individually, you need to access and update the order and order lines in a single call.
In case you're not in control of the web service, you could try using a Parallel.ForEach loop instead of a plain foreach loop to query it.
MSDN has a tutorial on how to use it: http://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
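Something along these lines, where Client, User and IUserService stand in for your own DTOs and service proxy:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    public class Client { public int Id { get; set; } }
    public class User { public string Name { get; set; } }

    public interface IUserService  // stand-in for the web service proxy
    {
        IEnumerable<User> GetUsers(int clientId);
    }

    public static class UserFetcher
    {
        public static List<User> FetchAll(IEnumerable<Client> clients,
                                          IUserService service)
        {
            // Filled from multiple threads, so it must be thread-safe.
            var users = new ConcurrentBag<User>();

            Parallel.ForEach(
                clients,
                new ParallelOptions { MaxDegreeOfParallelism = 8 },  // don't flood the service
                client =>
                {
                    foreach (var user in service.GetUsers(client.Id))
                        users.Add(user);
                });

            return new List<User>(users);
        }
    }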
My team is writing a windows service which needs to poll data from a 3rd party system (the 3rd party system provides a web service which we use).
The process will look something like this:
1. Call the 3rd party WS
2. Save the received raw data to our DB
3. Process the raw data
4. Save the processed data to our DB
5. Repeat
The team agrees that we actually have 2 different logical operations:
1. Getting and saving the raw data
2. Processing the raw data and saving the results
We are trying to decide which of the following design options is better:
Option 1: Perform both operations in the same Windows service, each operation on its own thread
Option 2: Perform the first operation in the Windows service, and make an async/one-way call to a WCF service for the second operation
In your opinion, which option is better?
If you have another alternative you think is better, please share it.
Thanks.
It depends.
Given that you have an apparently sequential process, why use separate threads to read and then process the data? The simplest approach would be a single thread that loops around reading, processing it, and presumably waiting at some point so you aren't rate limited by the third party.
If however the processing takes a long time you might want to split the work between a single polling thread and a set of workers that process the data.
The simplest option is usually the right one for your initial implementation. Adding threads and WCF service calls before you need them is rarely the right thing to do.
To give a better answer you really need to supply more information: does the third party service limit how many calls you can make at once or how quickly you can make them, how long does the processing take, how often do you need to poll, ...
According to your comment, I would say you should have a thread that polls the 3rd party service once a second and starts two tasks:
Task 1 would store the raw data to database.
Task 2 would process the raw data and store the result in database.
If the polling thread retrieves 1000 entries it should poll again without delay.
You can either use System.Threading.ThreadPool or System.Threading.Tasks.Task.
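In sketch form, with the service proxy and database calls abstracted behind delegates (all of the names and the 1000-entry threshold from the comment are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    public class PollingLoop
    {
        private readonly Func<IReadOnlyList<string>> _pollThirdParty;
        private readonly Action<IReadOnlyList<string>> _saveRaw;
        private readonly Action<IReadOnlyList<string>> _processAndSave;

        public PollingLoop(Func<IReadOnlyList<string>> poll,
                           Action<IReadOnlyList<string>> saveRaw,
                           Action<IReadOnlyList<string>> processAndSave)
        {
            _pollThirdParty = poll;
            _saveRaw = saveRaw;
            _processAndSave = processAndSave;
        }

        public void Run(CancellationToken token)
        {
            while (!token.IsCancellationRequested)
            {
                var raw = _pollThirdParty();

                // The two logical operations run as independent tasks.
                Task.Run(() => _saveRaw(raw));
                Task.Run(() => _processAndSave(raw));

                // A full batch of 1000 suggests more data is waiting: poll
                // again at once; otherwise wait out the one-second interval.
                if (raw.Count < 1000)
                    Thread.Sleep(TimeSpan.FromSeconds(1));
            }
        }
    }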
I have a smart client (WPF) that makes calls to the server via services (WCF). The screen I am working on holds a list of objects that it loads when the constructor is called. I am able to add, edit and delete records in the list.
Typically, after every add or delete I reload the entire model from the service again. There are a number of reasons for this, including the fact that the data may have changed on the server between calls.
This approach has proved to be a big performance hit, because I am loading everything and sending the whole list up and down the wire on every add and edit.
What other options are open to me? Should I send only the required information to the server, and how would I go about not reloading all the data every time an add or delete is performed?
The optimal way of doing what you're describing (I'm going to assume you already know that client/server I/O is the bottleneck) is to send only changes, in both directions, once the client is populated.
This can be straightforward if you've adopted a journaling model for updates to the data. In order for any process to make a change to the shared data, it has to create a time-stamped transaction that gets added to a journal. The update to the data is made by a method that applies the transaction to the data.
Once your data model supports transaction journals, you have a straightforward way of keeping the client and server in sync with a minimum of network traffic: to update the client, the server sends all of the journal entries that have been created since the last time the client was updated.
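In outline, a journal could look something like this (the entry shape and method names are assumptions, not a prescription):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class JournalEntry
    {
        public DateTime Timestamp { get; set; }
        public string Operation { get; set; }  // e.g. "Add", "Edit", "Delete"
        public string Payload { get; set; }    // the serialized record
    }

    public class Journal
    {
        private readonly List<JournalEntry> _entries = new List<JournalEntry>();

        // Every change to the shared data goes through here.
        public void Append(string operation, string payload)
        {
            _entries.Add(new JournalEntry
            {
                Timestamp = DateTime.UtcNow,
                Operation = operation,
                Payload = payload
            });
        }

        // The sync call: the client reports when it last synced and gets back
        // only what changed since then, instead of the whole list.
        public IList<JournalEntry> EntriesSince(DateTime lastSync)
        {
            return _entries.Where(e => e.Timestamp > lastSync).ToList();
        }
    }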
This can be a considerable amount of work to retrofit into an existing design. Before you go down this road, you want to be sure that the problem you're trying to fix is in fact the problem that you have.
Make sure this functionality is well-encapsulated so you can play with it without having to touch other components.
Have your source under version control and check in often.
I highly recommend having a suite of automated unit tests to verify that everything works as expected before refactoring and continues to work as you perform each change.
If the performance hit is on the server-to-client transfer of data, more so than on the querying, processing and disk I/O on the server, you could consider devising a hash of a given collection or graph of objects and passing that hash to a service method on the server, which would query the db, calculate the hash on its side, compare the two, and return true or false. Only if the result is false would you then reload the data. This works when changes are unlikely or infrequent, because when the data has changed it costs two calls to get it. If changes in the db are a concern, you might not want to check only when the user modifies or adds something; the check might be a completely separate action based on a timer, for example. Your concurrency strategy really depends on your data, the number of users, the likelihood of more than one user wanting to change the same data at the same time, and so on.
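A rough sketch of the hash check (the SHA-256 choice and the serialization scheme are my assumptions; both sides must serialize and order the records identically for the comparison to mean anything):

    using System;
    using System.Collections.Generic;
    using System.Security.Cryptography;
    using System.Text;

    public static class CollectionHash
    {
        // Client and server compute this over the same records in the same
        // order; equal hashes mean the client can skip the full reload.
        public static string Compute(IEnumerable<string> serializedRecords)
        {
            using (var sha = SHA256.Create())
            {
                string joined = string.Join("|", serializedRecords);
                return Convert.ToBase64String(
                    sha.ComputeHash(Encoding.UTF8.GetBytes(joined)));
            }
        }
    }

    // Client side, with a hypothetical service proxy:
    // if (!service.HashMatches(CollectionHash.Compute(localRecords)))
    //     localRecords = service.LoadAll();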
I have several vehicles that send data to a server each minute. The server should listen for and decode the data, then store it in the database. There will be thousands of entries per minute. What is the best approach to solving this problem?
My personal favorite: a WCF or web service farm pumps the data into a Microsoft Message Queue (MSMQ), and one or more application servers convert the data and put it into the DB.
As you get deeper (if you ever need to), you can use the features of MSMQ to handle timeouts, load buffering, 'dead-letters', server failures, whatever. Consider this article.
On the web facing side of this, because it is stateless and thin you can easily scale out this layer without thinking about complex load balancing. You can use DNS load balancing to start and then move to a better solution when you need it.
As a further note, by using MSMQ you can also see how far "behind" the system is by looking at how many messages are in the queue. If that number is near 0, you're good. If the number keeps rising non-stop, you need more processing power (add another application server).
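A rough System.Messaging sketch of the two halves (the queue path and the plain-string payload are assumptions):

    using System.Messaging;  // reference System.Messaging.dll (.NET Framework)

    public static class VehicleQueue
    {
        private const string Path = @".\private$\vehicleData";  // assumed local private queue

        // Web-facing side: enqueue and return immediately; no DB work here.
        public static void Enqueue(string rawPayload)
        {
            using (var queue = new MessageQueue(Path))
            {
                queue.Send(rawPayload);
            }
        }

        // Application server side: drain the queue, decode, write to the DB.
        public static void ProcessOne()
        {
            using (var queue = new MessageQueue(Path))
            {
                queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
                var message = queue.Receive();  // blocks until a message arrives
                string payload = (string)message.Body;
                // Decode 'payload' and insert it into the database here.
            }
        }
    }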
We're doing exactly what Jason says, except using a direct TCP/UDP socket listener with a custom payload for higher performance.
How long do you expect each operation to take? From what you're saying, it seems you can just write the data straight to the db after processing, so you don't have to synchronize your threads at all (the db should take care of that for you).