Current situation: an existing SQL Server stored procedure that I have no control over returns 10 large strings in separate result sets in about 30 seconds (~3 seconds per result set). The existing ASP.NET Web API controller method that collects these strings only returns a response once all strings have been obtained from the stored procedure. When the client receives the response, it takes another 30 seconds to process the strings and display the results, for a total of one minute from request initiation to operation completion.
Contemplated improvement: somehow transmit the strings to the client as soon as each is obtained from the SqlDataReader, so the client can work on interpreting each string while receiving the subsequent ones. The total time from request initiation to completion would thus roughly be halved.
I have considered the WebClient events at my disposal, such as DownloadStringCompleted and DownloadProgressChanged, but none seems viable, and I generally think I am on the wrong track, hence this question. I have all kinds of ideas, such as saving the strings to temporary files on the server and sending each file name to the client through a parallel SignalR channel for the client to request in parallel, but I feel I would both waste my time and miss your opportunity to enlighten me.
I would not resort to inverting the standard client/server relationship using a "server push" approach. All you need is some kind of intermediary dataset. It could be a singleton object (or multiple objects, one per client) on your server, or another table in an actual database (perhaps NoSQL).
The point is that the client will not directly access the slow data flow you're dealing with. Instead, the client will only access the intermediary dataset. On the first request, you will start off the process of migrating data from the slow source to the intermediary dataset, and the client will have to wait until the first batch is ready.
The client will then make additional requests as he processes each result on his end. If more intermediary results are already available, he will get them immediately; otherwise he will have to wait, as he did on the first request.
Meanwhile, the server is continuously reading from the slow data source and adding to the intermediary dataset. You will need a way of marking intermediary data as having been sent to the client or not. You will probably want to spawn a separate thread for the code that moves data from the slow data source to the intermediary one.
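A minimal sketch of that layout on a Web API-era .NET stack; the job id, the ResultBuffer class, and dbo.SlowProcedure are all hypothetical names, and error handling is omitted:

using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

public static class ResultBuffer
{
    // One intermediary queue per client-supplied job id.
    private static readonly ConcurrentDictionary<string, BlockingCollection<string>> Jobs =
        new ConcurrentDictionary<string, BlockingCollection<string>>();

    // First request: kick off a background reader that drains the slow
    // stored procedure into the intermediary queue as each result set arrives.
    public static void StartJob(string jobId, string connectionString)
    {
        var queue = Jobs.GetOrAdd(jobId, _ => new BlockingCollection<string>());
        Task.Run(() =>
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.SlowProcedure", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    do
                    {
                        while (reader.Read())
                            queue.Add(reader.GetString(0)); // one large string per result set
                    }
                    while (reader.NextResult());
                }
            }
            queue.CompleteAdding(); // no more data: lets the client stop polling
        });
    }

    // Later requests: block until the next string is ready;
    // returns null once the job has completed and the queue is drained.
    public static string TakeNext(string jobId)
    {
        string item;
        return Jobs[jobId].TryTake(out item, Timeout.Infinite) ? item : null;
    }
}

The controller's first action calls StartJob and then, like every later request, returns TakeNext's result. Marking data as already sent falls out naturally here, because taking an item removes it from the queue.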
Related
I have an array of websites that (asynchronously) send event analytics into an ASP.NET website, which then should send the events into an Azure EventHubs instance.
The challenge I'm facing is that with requests exceeding 50,000 per second, I've noticed that my response times are in the multi-second range, affecting total load times for the originating website. I have scaled up all parts, but I recognize that sending one event per request is not very efficient, due to the overhead of opening an AMQP connection to Event Hubs and sending off the payload.
As a solution I've been trying to batch the event data that gets sent to my Event Hubs instance, but I've been running into some problems with synchronization.
With each request, I add the event data to a static EventDataBatch created via EventHubClient.CreateBatch() using eventHubData.TryAdd(), then I check whether the quantity of events is within a predefined threshold, and if so, I send the events asynchronously via EventHubClient.SendAsync(). The challenge this creates is that since this is an ASP.NET application, many threads could be serving requests at any given instant, any of which could be calling eventHubData.TryAdd() or EventHubClient.SendAsync() at the same point in time. As a poor attempt to resolve this I have tried calling lock(batch) prior to eventHubData.TryAdd(), but this does not resolve the issue, since I cannot also lock the asynchronous method EventHubClient.SendAsync().
What is the best way to implement this so that each request does not require its own call to Event Hubs and can take advantage of batching, while also preserving the integrity of the batch and not running into any deadlock issues?
Have a look at the source code for the Application Insights SDK to see how they have solved this problem; you can reuse the key parts to achieve the same thing with Event Hubs over AMQP.
The pattern is:
1) Buffer data. Define a buffer with a maximum size that you will share among threads. Multiple threads write data into the buffer:
https://github.com/Microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Channel/TelemetryBuffer.cs
2) Prepare a transmission. You can transmit the items in the buffer either when the buffer is full or when some interval elapses, whichever happens first. Take all the items from the buffer to send:
https://github.com/Microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Channel/InMemoryTransmitter.cs
3) Do the transmission. Send all the items as multiple data points in a single Event Hub message:
https://github.com/Microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Channel/Transmission.cs
Those are the three classes that combine to achieve this using HTTP to post to the Application Insights collection endpoint; you can see how the same pattern can be applied to collect, amalgamate, and transmit to Event Hubs.
You'll need to control the maximum message size, which is 256 KB per Event Hub message; you could do that by capping the buffer size. That's up to your client logic to manage.
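A minimal sketch of how those three pieces might collapse into one class against Event Hubs. EventBuffer is my own name; this assumes the Microsoft.Azure.EventHubs client, and error handling on the fire-and-forget send is omitted:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

public class EventBuffer
{
    private readonly EventHubClient _client;
    private readonly object _gate = new object();
    private readonly int _maxItems;
    private readonly Timer _timer;
    private List<EventData> _items = new List<EventData>();

    public EventBuffer(EventHubClient client, int maxItems, TimeSpan interval)
    {
        _client = client;
        _maxItems = maxItems;
        // Also flush on an interval, so a quiet period still drains the buffer.
        _timer = new Timer(_ => FlushAsync(), null, interval, interval);
    }

    public void Add(EventData item)
    {
        List<EventData> toSend = null;
        lock (_gate)
        {
            _items.Add(item);
            if (_items.Count >= _maxItems)
            {
                toSend = _items;                 // hand off the full batch...
                _items = new List<EventData>();  // ...and start a fresh one
            }
        }
        // I/O happens outside the lock, so request threads never block on sends.
        if (toSend != null)
            _client.SendAsync(toSend); // fire-and-forget; add error handling in real code
    }

    public Task FlushAsync()
    {
        List<EventData> toSend;
        lock (_gate)
        {
            if (_items.Count == 0) return Task.CompletedTask;
            toSend = _items;
            _items = new List<EventData>();
        }
        return _client.SendAsync(toSend);
    }
}

Choosing a _maxItems value small enough to stay under the 256 KB limit is still your client logic's job.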
I'm trying to write an asynchronous socket application that transfers complex objects between the two sides.
I used the example here...
Everything is fine until I try to send multi-packet data. When the transferred data requires multiple packets, the server application hangs and goes out of control without any errors...
After many hours I found a solution: if I close the client's sender socket after each EndSend callback, the problem goes away. But I couldn't understand why this is necessary. Is there any other solution for this situation?
My two projects are the same as the example above; I only changed the EndSend callback method as follows:
public void EndSendCallback(IAsyncResult result)
{
    Status status = (Status)result.AsyncState;
    int size = status.Socket.EndSend(result);
    status.Socket.Close(); // <--------------- This line solved the situation
    Console.Out.WriteLine("Send data: " + size + " bytes.");
    Console.ReadLine();
    allDone.Set();
}
Thanks..
This is because the given example code does not handle multiple packets (and is broken).
A few observations:
The server can only handle 1 client at a time.
The server simply checks whether a single read returns less data than was requested and, if so, assumes that it's the last part.
The server then ignores the client socket while leaving the connection open. This puts the responsibility of closing the connection on the client side, which can be confusing, and which will waste resources on the server.
Now the first observation is an implementation detail and not really relevant in your case. The second observation is relevant for you, since it will likely result in unexplained bugs: probably not in development, but when this code is actually running somewhere in a real scenario. TCP sockets are stream-oriented, not message-oriented. When the client sends 1000 bytes, this might require one call to Read on the server, or ten; a call to Read simply returns as soon as there is some data available. What you need to do is implement some sort of protocol that communicates either how much data is being sent or when all the data has been sent (see the sketch after these observations). I really recommend just sticking with the HTTP protocol, since it is a well-tested and well-supported protocol that suits most scenarios.
The third observation might also cause bugs where the server runs out of resources, since it leaves all connections open.
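To make the framing idea concrete, here is a minimal length-prefix sketch (my own helper names, not part of the original example): the 4-byte prefix tells the receiver exactly how many bytes to expect, so it can loop on Read until the whole message has arrived.

using System;
using System.IO;
using System.Net.Sockets;

static class Framing
{
    public static void SendFrame(NetworkStream stream, byte[] payload)
    {
        byte[] prefix = BitConverter.GetBytes(payload.Length); // 4-byte length
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    public static byte[] ReadFrame(NetworkStream stream)
    {
        int length = BitConverter.ToInt32(ReadExactly(stream, 4), 0);
        return ReadExactly(stream, length);
    }

    // A single Read may return fewer bytes than requested, so loop until done.
    private static byte[] ReadExactly(NetworkStream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Connection closed mid-frame.");
            offset += read;
        }
        return buffer;
    }
}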
In my app I have a map, and when the user does any operation on the map, a request is sent to the server asking for the map for the new bounding box. The problem is that if a user zooms in fast or pans the map continuously, we end up sending many requests to the server, which sends results back to the client for all of them.
Now I want to handle this more gracefully at both the server end and the client end. I have thought of ways to handle it at the client end, but I need a way to do the same gracefully at the server end. What I mean is that I don't want to end up processing stale requests that my client doesn't expect a response from anyway. Is there a way I can achieve this?
I am using MVC architecture in .NET Framework.
Thanks in advance.
P.S. All these queries are obviously AJAX queries.
Multiple ways to do this:
First way:
On the server side, where you receive the request from the client for the new bounding-window information, have the server operation wait for a small fraction of time (the duration can be fine-tuned later) before it starts processing. If a new request arrives from the same client (for the same zoom operation) within this wait time, discard the old request. When no new request arrives from the client before the wait elapses, the server interprets the current request as the final one and processes it. To minimize the delay appearing on the client side, the server can prepare the resources necessary to process the request that do not depend upon the exact zoom parameters.
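A minimal sketch of this first approach in an MVC controller (the 200 ms delay, the clientId parameter, and RenderMap are all assumptions to adapt; async actions need MVC 4 or later):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Web.Mvc;

public class MapController : Controller
{
    // Remembers the newest request ticket per client.
    private static readonly ConcurrentDictionary<string, long> Latest =
        new ConcurrentDictionary<string, long>();

    public async Task<ActionResult> GetMap(string clientId, string boundingBox)
    {
        long myTicket = DateTime.UtcNow.Ticks;
        Latest[clientId] = myTicket;

        // Wait a tunable moment; a newer request from the same client
        // arriving in the meantime marks this one as stale.
        await Task.Delay(200);
        long newest;
        if (Latest.TryGetValue(clientId, out newest) && newest != myTicket)
            return new HttpStatusCodeResult(204); // stale: discard without processing

        return Json(RenderMap(boundingBox), JsonRequestBehavior.AllowGet);
    }

    private object RenderMap(string boundingBox) { /* hypothetical map logic */ return null; }
}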
Second way (can be used along with the first approach):
If the server is capable of it, use multiple threads for processing client requests. This way, you can safely discard the stale results and still avoid any zooming delay appearing on the client.
I've got a server-side protocol that controls a telephony system. I've already implemented a client library that communicates with it, which is in production now; however, there are some problems with the system I have at the moment, so I am considering rewriting it.
My client library is currently written in Java, but I am thinking of rewriting it in both C# and Java to allow different clients to have access to the same back end.
The messages start with a keyword, have a number of bytes of metadata, and then some data. The messages are always terminated by an end-of-message character.
Communication is duplex between the client and the server, usually taking the form of a request from the client that provokes several responses from the server, but messages can also be notifications.
The messages are marked as being one of:
C: Command
P: Pending (server is still handling the request)
D: Data (data sent as a response to a command)
R: Response
B: Busy (server is too busy to handle the request at the moment)
N: Notification
My current architecture has each message being parsed and a thread spawned to handle it; however, I'm finding that some of the notifications are processed out of order, which is causing me trouble, as they have to be handled in the same order they arrive.
The duplex messages tend to take the following message format:
Client -> Server: Command
Server -> Client: Pending (Optional)
Server -> Client: Data (optional)
Server -> Client: Response (2nd entry in message data denotes whether this is an error or not)
I've been using the protocol for over a year and I've never seen a Busy message, but that doesn't mean they don't happen.
The server can also send notifications to the client, and there are a few Response messages that are auto triggered by events on the server so they are sent without a corresponding Command being issued.
Some notification messages will arrive as part of a sequence of related messages, for example:
NotificationName M00001
NotificationName M00001
NotificationName M00000
The string M0000X indicates either that there is more data to come (M00001) or that this is the end of the sequence (M00000).
At present the TCP client is fairly dumb. It just spawns a thread that notifies an event on a subscriber that the message has been received; the event is specific to the message keyword and the type of message (so Data, Response, and Notification messages are handled separately). This works fairly effectively for Data and Response messages, but falls over with the notification messages, as they seem to arrive in rapid sequence, and a race condition sometimes causes the message end to be processed before the messages that carry the data, leading to lost message data.
Given this really badly written description of how the system works, how would you go about writing the client-side transport code?
The metadata does not include a message number, and I have no control over the underlying protocol, as it's provided by a vendor.
The requirement that messages must be processed in the order in which they're received almost forces a producer/consumer design, where the listener gets requests from the client, parses them, and then places the parsed request into a queue. A separate thread (the consumer) takes each message from the queue in order, processes it, and sends a response to the client.
Alternatively, the consumer could put the result into a queue so that another thread (perhaps the listener thread?) can send the result to the client. In that case you'd have two producer/consumer relationships:
Listener -> event queue -> processing thread -> output queue -> output thread
In .NET, this kind of thing is pretty easy to implement using BlockingCollection to handle the queues. I don't know if there is something similar in Java.
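A minimal sketch of that pipeline with BlockingCollection; ParsedMessage, Parse, Process, and Send are hypothetical stand-ins for the real protocol logic:

using System.Collections.Concurrent;
using System.Net.Sockets;

class MessagePump
{
    // Bounded so a flood of messages applies back-pressure to the listener.
    private readonly BlockingCollection<ParsedMessage> _queue =
        new BlockingCollection<ParsedMessage>(1000);

    // Listener side (producer): parse and enqueue in arrival order.
    public void OnMessageReceived(byte[] raw, Socket connection)
    {
        ParsedMessage msg = Parse(raw);
        msg.Connection = connection; // so the output side knows where to reply
        _queue.Add(msg);
    }

    // Consumer side: a single thread dequeues strictly in order.
    public void ConsumeLoop()
    {
        foreach (ParsedMessage msg in _queue.GetConsumingEnumerable())
        {
            byte[] response = Process(msg);
            Send(msg.Connection, response);
        }
    }

    // Hypothetical helpers standing in for the real protocol logic.
    private ParsedMessage Parse(byte[] raw) { return new ParsedMessage(); }
    private byte[] Process(ParsedMessage msg) { return new byte[0]; }
    private void Send(Socket connection, byte[] response) { }
}

class ParsedMessage { public Socket Connection; }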
The possibility of a multi-message request complicates things a little bit, as it seems like the listener will have to buffer messages until the last part of the request comes in before placing the entire thing into the queue.
To me, the beauty of the producer/consumer design is that it forces a hard separation between different parts of the program, making each much easier to debug and minimizing the possibility of shared state causing problems. The only slightly complicated part here is that you'll have to include the connection (socket or whatever) as part of the message that gets shared in the queues so that the output thread knows where to send the response.
It's not clear to me if you have to process all messages in the order they're received or if you just need to process messages for any particular client in the proper order. For example, if you have:
Client 1 message A
Client 1 message B
Client 2 message A
Is it okay to process the first message from Client 2 before you process the second message from Client 1? If so, then you can increase throughput by using what is logically multiple queues--one per client. Your "consumer" then becomes multiple threads. You just have to make sure that only one message per client is being processed at any time.
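A sketch of that per-client variant, assuming each parsed message exposes some ClientId; every client gets its own queue and one dedicated consumer, so order holds within a client while different clients proceed in parallel:

using System.Collections.Concurrent;
using System.Threading.Tasks;

class PerClientDispatcher
{
    private readonly ConcurrentDictionary<string, BlockingCollection<ParsedMessage>> _queues =
        new ConcurrentDictionary<string, BlockingCollection<ParsedMessage>>();

    public void Enqueue(ParsedMessage msg)
    {
        var queue = _queues.GetOrAdd(msg.ClientId, id =>
        {
            var created = new BlockingCollection<ParsedMessage>();
            // One long-running consumer per client preserves that client's order.
            // (Note: GetOrAdd can race and run this factory twice; a real
            // version should guard against starting an orphaned consumer.)
            Task.Factory.StartNew(() =>
            {
                foreach (var m in created.GetConsumingEnumerable())
                    Process(m);
            }, TaskCreationOptions.LongRunning);
            return created;
        });
        queue.Add(msg);
    }

    private void Process(ParsedMessage msg) { /* hypothetical handler */ }
}

class ParsedMessage { public string ClientId; /* plus payload fields */ }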
I would have one thread per client do the parsing and processing. That way the processing happens in the order messages are sent/arrive.
As you have stated, the tasks cannot be performed in parallel safely. Performing the parsing and processing in different threads is likely to add as much overhead as you might save.
If your processing is relatively simple and doesn't depend on external systems, a single thread should be able to handle 1K to 20K messages per second.
Are there any other issues you would want to fix?
I can only recommend a Java-based solution.
I would use some already mature transport framework. By "some" I mean the only one I have worked with until now: Apache MINA. It works, and it's very flexible.
Regarding processing messages out of order: for messages which must be processed in the order they were received, you could build queues and put such messages into them.
To limit the number of queues, you could instantiate, say, 4 queues, and route each incoming message to a particular queue depending on the last 2 bits (indices 0-3) of the hash of the ordering part of the message (for example, the client_id contained in the message).
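The routing idea itself is language-agnostic; here is a short sketch (in C#, to match the other examples on this page, though the same shape works with Java's BlockingQueue) of picking one of 4 queues from the low 2 bits of the key's hash:

using System.Collections.Concurrent;

class HashRouter
{
    private readonly BlockingCollection<Message>[] _queues;

    public HashRouter()
    {
        _queues = new BlockingCollection<Message>[4];
        for (int i = 0; i < _queues.Length; i++)
            _queues[i] = new BlockingCollection<Message>();
    }

    // Same key -> same bucket, so per-key order is preserved by each
    // bucket's single consumer thread (not shown).
    public void Route(Message msg)
    {
        int hash = msg.ClientId.GetHashCode() & 0x7FFFFFFF; // force non-negative
        _queues[hash & 0x3].Add(msg); // last 2 bits: indices 0-3
    }
}

class Message { public string ClientId; /* plus payload fields */ }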
If you have more concrete questions, I can update my answer appropriately.
I need your suggestions about data processing.
My server is a data server (using SQL Server 2005), and my client gets data from the server and displays it in its windows.
The server and client communicate over the internet (not a LAN), so the time for the client to get data depends on the data's size and the internet speed.
Assume the SQL Server has a table with two columns (Value and Change). The client will get data from this table (stored in a DataTable) and display it in a DataGridView with three columns: Value, Change, and ChangePercent.
Note: ChangePercent = Change/Value;
My question: should the data in the ChangePercent field be calculated at the server or at the client?
If I do it at the server, the server will be overloaded if there are a lot of clients. Moreover, the data returned to clients is larger (three fields instead of two).
If I do it at the client, the client will only get data with two fields (Value and Change), and the data in the ChangePercent column will be calculated at the client.
P.S.: the connection between client and server is via .NET Remoting. The client is a C# 2.0 WinForms app.
Thanks.
Go with calculation on the client.
Almost certainly the calculation will be faster than getting the extra field over the line, apart from the fact that business logic shouldn't be calculated on a database server anyway.
Assuming that all variables are of the same type, you needlessly increase your data transfer by 33% when calculating on the server. This matters only for large result sets, obviously.
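For what it's worth, with a DataTable the client-side calculation can even be a computed column, so no hand-written loop is needed (GetValueAndChangeFromServer is a hypothetical stand-in for your remoting call):

using System.Data;

// Only Value and Change travel over the wire; the third column is
// computed locally by the DataTable's expression engine.
DataTable table = GetValueAndChangeFromServer(); // hypothetical remoting call
table.Columns.Add("ChangePercent", typeof(decimal), "Change / Value");
dataGridView.DataSource = table; // the DataGridView now shows all three columns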
I don't think it matters where you do it; a division operation won't be much overhead for either the server or the client. But consider that you have to write code on the client to handle a very simple operation that could be handled on the server.
EDIT: you can make a test table with, say, 1,000,000 records and see the actual execution time with the division and without it.
I would suggest using method #2: send two fields and let the third be calculated by the client.
The relative amount of calculation is very small for the client.