Can I use asynchronous programming to fetch data from a server? - c#

This is my situation:
I have an account (user ID/password) to communicate with an airline central reservation system through their API.
The API provides connect, disconnect, signin, signout, sendcommand and getdatareturn methods.
These are the steps I perform sequentially to get the data I want:
1. Connect to the host.
2. Sign in to the system.
3. Send a command to get the list of passengers on a flight on a specified date from one city to another (an LD command with parameters such as flight number, flight date, and the origin/destination city pair). In this step the host returns only part of the full list (for example, only 20 passengers, with a # character at the end to signal that there are more). To get the full list, I must send another command (the MD command) to move down, and repeat until the end of the list (signalled by the END string). The passenger list contains each passenger's name, class and a PNR code; based on these PNR codes, I must send further commands to get detailed passenger information such as full name, itinerary and contact information, and then process it (which takes some time). Within these details, I can send various commands to get even more information.
4. Sign out of the system.
5. Disconnect from the host.
Can I use multithreading or parallel technology for step 3 to get the data from the server?

Depends on the type of connection. How do you connect, and do you remain connected?
If it's a pair of sockets that keep communicating (i.e. stateful), you could try to create another connection, log in again, and request the data you want. If it's stateless (over HTTP, for example), using some kind of session ID to correlate subsequent requests, you could simply issue multiple requests simultaneously with the same session ID and see if that works.
So through your initial connection you request the list of PNRs, and then use that connection plus the new connections to request passenger data for multiple passengers at a time, until you have all the data for all passengers on the list.
If neither option works and you're stuck with a single connection, I'm afraid there is no other solution. Couldn't you contact them to ask whether this is possible?
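If the host does allow parallel sessions, a minimal sketch of fanning out the per-passenger detail lookups could look like this (CreateSession and GetPassengerDetails are hypothetical wrappers around the vendor's connect/signin/sendcommand methods, and this must run inside an async method):

var detailTasks = pnrCodes.Select(pnr => Task.Run(() =>
{
    // Each task opens its own connection and signs in independently.
    using (var session = CreateSession(userId, password))
    {
        return session.GetPassengerDetails(pnr);
    }
}));
var details = await Task.WhenAll(detailTasks);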

I'm afraid my answer is "it depends". I see no problem with parallel queries from the client side, and some of the information (such as per-passenger detail) could probably be fetched in a separate, parallel query, but getting the full list sounds as if it should be done in a single thread/connection.
Why: I don't know the system you're querying, but it sounds as if it saves the state of your query (what you are asking for, how far down the list you currently are) and so would probably not handle "give me parts 1, 2, and 3" of the list very well, especially if part 3 doesn't exist (and you don't know that until you see the "#" at the end of part 2, which depends on part 1...).

Can I use multithreading or parallel technology for step 3 to get the data from the server?
What purpose would it serve?
You cannot sign out of the system until the data has been returned, and the last two steps are certainly not resource intensive or dependent on your user interface.
What you actually want to do, sending the command more than once, is not really a task for multiple threads. You simply want to keep sending the command until you detect the symbol that indicates there is no more data.
If you are not already doing this, that simply means you should.
This is no different than reading user input within a console application.
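In other words, something along these lines (a rough sketch; api, SendCommand and GetDataReturn stand in for the vendor's sendcommand/getdatareturn methods):

var fullList = new StringBuilder();
api.SendCommand("LD ...");            // request the first page of the passenger list
string page = api.GetDataReturn();
fullList.Append(page);
while (page.Contains("#"))            // '#' signals that there is more data
{
    api.SendCommand("MD");            // move down to the next page
    page = api.GetDataReturn();
    fullList.Append(page);
}
// the final page carries the END string instead of '#'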

Well, you could perhaps use parallel operations to allow users to issue more than one query at the same time, assuming that the CRS allows multiple connections from the same IP. The user might, for example, have more than one 'CRS' form on their screen and so be handling more than one query at a time, e.g. for different dates, airports, flights or passengers.
As noted by other posters, if the user is only processing one query at a time, there is not much point in parallelizing anything (except perhaps the UI and client protocol, so that the UI does not lock up and queries can be cancelled).
That said, given a requirement like this, I would normally design in such a way that multiple queries are the default behaviour anyway. I would have the CRS query form host everything needed to interact with the CRS so that, if necessary/possible, two instances of the form would allow two concurrent queries if supported by the server. This is more flexible than the alternative of running two processes.

Related

What is the best Method for monitoring a large number of clients reliably with good performance

This is more of a programming strategy and direction question, than the actual code itself.
I am programming in C-Sharp.
I have an application that remotely starts processes on many different clients on the network, could be up to 1000 clients in theory.
It then monitors the status of the remote processes by reading a log file on each client.
I currently do this by running one thread that loops through all of the clients in a list, and reading the log file. It works fine for 10 or 20 machines, but 1000 would probably be untenable.
There are several problems with this approach:
First, if the thread doesn't finish reading all of the client statuses before it's called again, the statuses at the end of the list might not be read and updated.
Second, if any client in the list goes offline during this period, the updating hangs until that client is back online again.
So I require a different approach, and have thought up a few possible ways to resolve this.
1. Spawn a separate thread for each client to read its log file and update its progress.
a. However, I'm not sure whether having 1000 threads running on my machine would be acceptable.
2. Test the connection to each machine first, before trying to read the file; if it cannot connect, ignore it for that iteration and move on to the next client in the list.
a. This still has the same problem of not getting through the list before the next call, and it adds more delay because it tests the connection via a port first. With 1000 clients, this would be noticeable.
3. Have each client send its data to the machine running the application whenever there is an update.
a. This could create a lot of chatter, with 1000 machines trying to send data repeatedly.
So I'm trying to figure out whether there is another, more efficient and reliable method that I haven't considered, or which one of these would be best.
Right now I’m leaning towards having the clients send updates to the application, instead of having the application pulling the data.
Looking for thoughts, concerns, ideas and recommendations.
In my opinion, you are doing this monitoring the wrong way. Instead of keeping all logs in text files, you'd be better off preserving them in a central data repository, which can be of any kind. Given that you are monitoring the performance of those systems, your design and the mechanism behind it must not impact the performance of the target systems negatively, and with the current design the disk and CPU can become so involved in certain cases that it results in a performance issue itself.
I recommend creating a log repository server using a fast in-memory database such as Redis, and sending the logged data directly to that server. Keep in mind that this database must run on a different (virtual) machine. You can then tune Redis to store the received data on physical disk once a particular number of entries is reached or a particular interval elapses. The in-memory nature is advantageous here, as you may need to query this information a lot in a monitoring application like this. On the other hand, Redis is fast enough to handle millions of entries efficiently.
The blueprint for you is:
1- Centralize all log data in a single repository.
2- Configure clients to send monitored information to the centralized repository.
3- Read the data from the centralized repository by the main server (monitoring system) when required.
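For step 2, a minimal sketch of a client pushing its status into a central Redis list (using the StackExchange.Redis client; the server name, key and message format are illustrative):

using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("log-server:6379");
var db = redis.GetDatabase();
// Each client appends its own updates; the monitoring application reads
// these lists instead of polling 1000 machines for log files.
db.ListRightPush("logs:" + Environment.MachineName, "STARTED|pid=1234|...");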
I'm not trying to advertise a particular tool here; I'm only sharing my own experience. There are many more tools that you can use for this purpose, such as ElasticSearch.

Is there any way to see how many actors an ActorSystem has, and what their names are?

I am thinking about building a system that pre-fetches queries for the user using information in a currently returned query (the user queries the DB for a user ID, which returns a list of phone calls the user has been on). I want to pre-fetch the queries for those phone calls (things like the date, duration, and location of each recorded phone call). The user may never query those things, but they generally do.
Currently, each query is taking ~10 seconds (there's some optimization that needs to happen in the code, but the big bottleneck is the DB and that's out of my hands).
So, I want to do the prefetching with actors. I'll eventually figure out a way for the actors to kill themselves if they go unused for x time, but first I'd like a way to see what actors I have running.
Is there any way to do that in Akka.NET?
How you could model this is by having a parent actor that manages a certain set of prefetching actors. Whether you have one actor that manages all your prefetchers, or multiple ones where each manages a specialized set of prefetchers, is up to you.
But the point is that you will end up with well-defined top-level actor(s) to which you can send a message asking how many active prefetchers they have.
Update:
You can use ActorSelection and wildcards to send a message to multiple actors.
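For example (the path and message type here are illustrative):

// Sends StatusRequest to every actor directly under /user/prefetchers.
Context.ActorSelection("/user/prefetchers/*").Tell(new StatusRequest());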
Regarding handling timeouts, I'd use a passivate pattern. A child actor can use the ReceiveTimeout mechanism to monitor its own activity, and send a passivate message to its parent once it detects it has been idle for x amount of time.
Once the parent receives the passivate message, it stops routing messages to the child, buffering them instead, and stops the actor. Once the actor's termination is confirmed, it checks whether there are any new messages for that actor; if so, it recreates the actor and flushes the buffered messages to it.
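A minimal sketch of the child's side of that pattern, assuming a hypothetical Passivate message type:

public class PrefetchActor : ReceiveActor
{
    public PrefetchActor()
    {
        // Akka.NET will send us a ReceiveTimeout message after 10 idle minutes.
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(10));

        Receive<ReceiveTimeout>(_ =>
        {
            // Ask the parent to stop routing to us and shut us down;
            // the parent buffers any messages that arrive in the meantime.
            Context.Parent.Tell(new Passivate());
        });

        // ... Receive<> handlers for the actual prefetch work go here ...
    }
}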

Akka.Net ConsistentHashing router as a load balancer for in memory data object

I have a huge in-memory object and I am wondering whether I can create a ConsistentHashing router and split the load across the underlying actors.
The main problem is that I need to populate the actors with data after creation, so I'm wondering how I can achieve that.
The master object has a good number of records, which will be grouped by an Id field.
These are my ideas:
when an actor is created, can I get its hash pool to retrieve the required ids from the master object?
when an actor is created, I can wrap the ingestion message in a ConsistentHashableEnvelope and then, when querying, use the same wrapper to ask for the data. Will this work?
As per the comments on your question, I understand better what you're trying to do.
I've done something similar, but slightly different in the way the actors have to start up: a caching system where I use a user's ID in a ConsistentHashableEnvelope to route each request to the actor that should handle that user's requests. If the user's data isn't available, it is loaded into memory from a third-party service. All following requests then operate on this data in memory.
In your case I'd tackle the problem in this manner:
Set up your ConsistentHashing router with actors that start up in a state in which they can Receive<> the individual entries they need.
Simply Tell() the router all the individual entries of the large object you want to slice up, using a ConsistentHashableEnvelope as a wrapper, and the router will send each entry to its correct destination.
In the actors, Receive<> the entries and use some method to merge the received data into the existing internal structure. This means that when an actor first starts up it will simply store its slices, and if it receives entries later on you can decide what to do (replace, update, whatever you wish).
When handing off subsequent messages to the router, always make sure you're using the correct key in the ConsistentHashableEnvelope, otherwise the message will be routed to an actor that doesn't have the data!
From your post I don't know much about the rest of your project requirements, but if you need to be able to first populate the actors with their slices before handling other requests that depend on this data, you may want to start the routees in an AwaitingData state using Become(), and then move them to a Ready state once their data has been received.
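Under those assumptions, a rough sketch of the ingest-then-query flow (EntryActor, GetEntry, masterObject and EntryData are illustrative names, and the Ask must run inside an async method):

var router = system.ActorOf(
    Props.Create<EntryActor>().WithRouter(new ConsistentHashingPool(5)),
    "entries");

// Ingest: the same hash key always lands on the same routee.
foreach (var entry in masterObject.Entries)
    router.Tell(new ConsistentHashableEnvelope(entry, entry.Id));

// Query: reuse the key so the request reaches the routee holding that slice.
var result = await router.Ask<EntryData>(
    new ConsistentHashableEnvelope(new GetEntry(id), id));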
If you add more info about your project I may be able to help you some more.

How to implement locking across a network

I have a desktop application. In this application there are many records that users can open and work on. If a user clicks on a record, the program locks the record so no one else can use it. If the record is already locked, the user may still view it, but it will be read-only. Many users on our local network can open and work on records.
My first thought is to use the database to manage locks on records. But I am not sure how or if this is the best approach. Is there any programming patterns or ready made solutions I can use?
I've implemented a similar system for a WPF application accessing a database; however, since I no longer have access to the source code, I'll try to explain it here. The route I took was somewhat different from using the database. Using a duplex WCF service, you can host a service somewhere (e.g. on the database server) to which clients connect. Key things to understand:
You can make this service generic by having some kind of data-type identifier and by making sure each row type has the same type of primary key (e.g. a long). In that case, you could have a signature similar to bool AcquireLock(string dataType, long id), or replace the bool/long with bool[] and long[] if users frequently modify a larger number of rows.
On the server side, you must be able to respond to this request quickly. Consider storing the data in something along the lines of a Dictionary<string, Dictionary<User, HashSet<long>>>, where the outer key is the data type.
When someone connects, he can receive a list of all locks for a given data type (e.g. when a screen opens that locks that type of records), while also registering to receive updates for a given data type.
The socket connection between the client and the server defines whether the user is 'connected'. If the socket closes, the server releases all locks for that user, immediately notifying others that the user has lost his locks, making the records available again for editing. (This covers scenarios such as a user disconnecting or killing the process.)
To avoid concurrency issues, make sure a user has acquired the lock before allowing him to make any changes (e.g. on BeginEdit, check with the server first, by implementing IEditableObject on your view model).
When a lock is released, the client tells the server if he made changes to the row, so that other clients can update the respective data. When the socket disconnects, assume no changes.
Nice feature to add: when providing users with a list / update of locks, also provide the user id, so that people can see who is working on what.
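Putting the key pieces together, a sketch of what the duplex contract might look like (the names are illustrative, not the original code):

[ServiceContract(CallbackContract = typeof(ILockCallback))]
public interface ILockService
{
    [OperationContract]
    bool AcquireLock(string dataType, long id);

    [OperationContract]
    void ReleaseLock(string dataType, long id, bool rowChanged);

    [OperationContract]
    long[] GetLockedIds(string dataType); // also subscribes the caller to updates
}

public interface ILockCallback
{
    // Pushed to every connected client when a lock is taken or released.
    [OperationContract(IsOneWay = true)]
    void LockChanged(string dataType, long id, string userId, bool locked);
}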
This form of 'real time concurrency' provides a much better user experience than providing a way to handle optimistic concurrency problems, and might also be technically easier to implement, depending on your scenario.

Scaling out Windows Services

I am looking for some input on how to scale out a Windows Service that is currently running at my company. We are using .NET 4.0 (can and will be upgraded to 4.5 at some point in the future) and running this on Windows Server 2012.
About the service
The service's job is to query for new rows in a logging table (We're working with an Oracle database), process the information, create and/or update a bunch of rows in 5 other tables (let's call them Tracking tables), update the logging table and repeat.
The logging table has large amounts of XML (can go up to 20 MB per row) which needs to be selected and saved in the other 5 Tracking tables. New rows are added all the time at the maximum rate of 500,000 rows an hour.
The Tracking tables' traffic is much higher, ranging from 90,000 new rows in the smallest one to potentially millions of rows in the largest table, each hour. Not to mention that there are Update operations on those tables as well.
About the data being processed
I feel this bit is important for finding a solution based on how these objects are grouped and processed. The data structure looks like this:
public class Report
{
    public long Id { get; set; }
    public DateTime CreateTime { get; set; }
    public Guid MessageId { get; set; }
    public string XmlData { get; set; }
}

public class Message
{
    public Guid Id { get; set; }
}
Report is the logging data I need to select and process
For every Message there are on average 5 Reports. This can vary between 1 and hundreds in some cases.
Message has a bunch of other collections and other relations, but they are irrelevant to the question.
Today the Windows Service we have barely manages the load on a 16-core server (I don't remember the full specs, but it's safe to say this machine is a beast). I have been tasked with finding a way to scale out and add more machines that will process all this data and not interfere with the other instances.
Currently each Message gets its own Thread and handles the relevant reports. We handle reports in batches, grouped by their MessageId, to keep the number of DB queries to a minimum when processing the data.
Limitations
At this stage I am allowed to re-write this service from scratch using any architecture I see fit.
Should an instance crash, the other instances need to be able to pick up where the crashed one left. No data can be lost.
This processing needs to be as close to real-time as possible from the reports being inserted into the database.
I'm looking for any input or advice on how to build such a project. I assume the services will need to be stateless, or is there a way to synchronize caches across all the instances somehow? How should I coordinate between all the instances and make sure they're not processing the same data? How can I distribute the load equally between them? And of course, how do I handle an instance crashing and not completing its work?
EDIT
Removed irrelevant information
For your work items, Windows Workflow is probably your quickest means to refactor your service.
Windows Workflow Foundation # MSDN
The most useful thing you'll get out of WF is workflow persistence, where a properly designed workflow may resume from a Persist point, should anything happen to the workflow from the last point at which it was saved.
Workflow Persistence # MSDN
This includes the ability for a workflow to be recovered from another process should any other process crash while processing the workflow. The resuming process doesn't need to be on the same machine if you use the shared workflow store. Note that all recoverable workflows require the use of the workflow store.
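For instance, a minimal sketch of hosting a workflow with the SQL instance store (the activity type and connection string are illustrative):

using System.Activities;
using System.Activities.DurableInstancing;

var app = new WorkflowApplication(new ProcessReportWorkflow()) // hypothetical activity
{
    InstanceStore = new SqlWorkflowInstanceStore(
        "Server=.;Database=WFInstanceStore;Integrated Security=True"),
    // Persist and unload on idle, so any surviving process can resume
    // the instance from the shared store if this one crashes.
    PersistableIdle = e => PersistableIdleAction.Unload
};
app.Run();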
For work distribution, you have a couple options.
A service to produce messages combined with host-based load balancing via workflow invocation using WCF endpoints via the WorkflowService class. Note that you'll probably want to use the design-mode editor here to construct entry methods rather than manually setting up Receive and corresponding SendReply handlers (these map to WCF methods). You would likely call the service for every Message, and perhaps also for every Report. Note that the CanCreateInstance property is important here: every invocation tied to it will create a running instance that runs independently.
~
WorkflowService Class (System.ServiceModel.Activities) # MSDN
Receive Class (System.ServiceModel.Activities) # MSDN
Receive.CanCreateInstance Property (System.ServiceModel.Activities) # MSDN
SendReply Class (System.ServiceModel.Activities) # MSDN
Use a service bus that has queue support. At a minimum, you want something that can accept input from any number of clients, and whose outputs can be uniquely identified and handled exactly once. A few that come to mind are NServiceBus, MSMQ, RabbitMQ, and ZeroMQ. Of the items mentioned here, NServiceBus is the only one that is .NET-ready out-of-the-box. In a cloud context, your options also include platform-specific offerings such as Azure Service Bus and Amazon SQS.
~
NServiceBus
MSMQ # MSDN
RabbitMQ
ZeroMQ
Azure Service Bus # MSDN
Amazon SQS # Amazon AWS
~
Note that the service bus is just the glue between a producer that will initiate Messages and a consumer that can exist on any number of machines to read from the queue. Similarly, you can use this indirection for Report generation. Your consumer will create workflow instances that may then use workflow persistence.
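As an illustration of that glue using MSMQ (System.Messaging; the queue path, message shape and StartReportWorkflow helper are assumptions):

using System.Messaging;

const string path = @".\Private$\reports";
if (!MessageQueue.Exists(path))
    MessageQueue.Create(path, transactional: true);

// Producer: enqueue the id of each new logging row.
using (var queue = new MessageQueue(path))
using (var tx = new MessageQueueTransaction())
{
    tx.Begin();
    queue.Send(reportId.ToString(), tx);
    tx.Commit();
}

// Consumer (one of many): receive exactly once inside a transaction,
// then start the workflow instance that processes the report.
using (var queue = new MessageQueue(path))
using (var tx = new MessageQueueTransaction())
{
    queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    tx.Begin();
    var msg = queue.Receive(tx);
    StartReportWorkflow(long.Parse((string)msg.Body));
    tx.Commit();
}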
Windows AppFabric may be used to host workflows, allowing you to use many techniques that apply to IIS load balancing to distribute your work. I don't personally have any experience with it, so there's not much I can say for it other than it has good monitoring support out-of-the-box.
~
How to: Host a Workflow Service with Windows App Fabric # MSDN
I solved this by coding all this scalability and redundancy stuff on my own. I will explain what I did and how I did it, should anyone ever need this.
I created a few background routines in each instance to keep track of the other instances and to know which records the particular instance may process. On start-up, the instance registers itself in the database (if it isn't registered already) in a table called Instances. This table has the following columns:
Id Number
MachineName Varchar2
LastActive Timestamp
IsMaster Number(1)
After registering (creating a row in this table if the instance's MachineName wasn't found), the instance starts pinging this table every second from a separate thread, updating its LastActive column. It then selects all the rows from the table and makes sure that the master instance (more on that later) is still alive, meaning that its LastActive time falls within the last 10 seconds. If the master instance has stopped responding, the instance assumes control and sets itself as master. On the next iteration it makes sure that there is only one master (in case another instance decided to assume control simultaneously), and if not, it yields to the instance with the lowest Id.
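In pseudo-C#, the pinging thread does something like this (the helper methods are illustrative, not the actual code):

while (running)
{
    UpdateLastActive(myId);                  // UPDATE Instances SET LastActive = ...
    var instances = LoadInstances();         // SELECT * FROM Instances
    var master = instances.FirstOrDefault(i => i.IsMaster);
    if (master == null ||
        (DateTime.UtcNow - master.LastActive).TotalSeconds > 10)
    {
        TryBecomeMaster(myId);               // set IsMaster = 1 on our own row
    }
    // A later pass yields mastership if a lower-Id instance claimed it too.
    Thread.Sleep(1000);
}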
What is the master instance?
The service's job is to scan a logging table and process the data so people can filter and read through it easily. I didn't state this in my question, but it might be relevant here. We have a bunch of ESB servers writing multiple records to the logging table per request, and my service's job is to keep track of them in near real-time. Since the servers write their logs asynchronously, I could potentially get the "finished processing request A" entry before the "started processing request A" entry in the log. So I have some code that sorts those records and makes sure my service processes the data in the correct order. Because I needed to scale this service out, only one instance may execute this logic, to avoid lots of unnecessary DB queries and possibly insane bugs.
This is where the master instance comes in. Only the master executes this sorting logic, and it temporarily saves the log record Ids in another table called ReportAssignment. This table's job is to keep track of which records have been processed and by whom. Once processing is complete, the record is deleted. The table looks like this:
RecordId Number
InstanceId Number Nullable
The master instance sorts the log entries and inserts their Ids here. All my service instances check this table at 1-second intervals for new records that aren't being processed by anyone, or that are being processed by an inactive instance, and for which [record's Id] % [number of instances] == [index of the current instance in a sorted array of all the active instances] (that array being acquired during the pinging process). The query looks somewhat like this:
SELECT * FROM ReportAssignment
WHERE (InstanceId IS NULL OR InstanceId NOT IN (1, 2, 3)) -- 1, 2, 3 are the active instances
  AND MOD(RecordId, 3) = 0 -- 0 is the index of the current instance in the list of active instances
Why do I need to do this?
The other two instances would query for MOD(RecordId, 3) = 1 and MOD(RecordId, 3) = 2.
MOD(RecordId, instanceCount) = indexOfCurrentInstance ensures that the records are distributed evenly between all instances.
InstanceId NOT IN (1, 2, 3) allows an instance to take over records that were being processed by an instance that crashed, and prevents it from processing the records of already-active instances when a new instance is added.
Once an instance has queried for these records, it executes an update command, setting the InstanceId to its own, and then queries the logging table for the records with those Ids. When processing is complete, it deletes the records from ReportAssignment.
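The claim itself can be a guarded update, so two instances can never grab the same record (a sketch in plain ADO.NET; the parameter wiring is omitted and the names are illustrative):

using (var cmd = connection.CreateCommand())
{
    cmd.CommandText =
        "UPDATE ReportAssignment SET InstanceId = :me " +
        "WHERE RecordId = :id " +
        "AND (InstanceId IS NULL OR InstanceId NOT IN (1, 2, 3))";
    // bind :me and :id, then:
    bool claimed = cmd.ExecuteNonQuery() == 1; // 0 rows => another instance got it
}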
Overall I am very pleased with this. It scales nicely, ensures that no data is lost should the instance go down, and there were nearly no alterations to the existing code we have.
