I'm sure this question is going to prove my ignorance, but I'm having a hard time understanding this. I'm willing to ask a dumb question to get a good answer.
All of the posts I've read about async streams do a good job of showing off the feature, but they don't explain why it's an improvement over the alternative.
Or, perhaps, when should one use async streams over good old client-server communication?
I can see where streaming the contents of a large file might be a good use for async streams, but many of the examples I've seen use async streams to transmit small bits of sensor data (temperature, for example). It seems like an IoT device with a temperature sensor could just HTTP POST the data to a server, and the server could respond. Why would the server implement async streams in that case?
I can already feel your pain as you struggle to make sense of those words, but please have mercy on me. :)
As requested, here are some examples I've come across that confused me. I'll post more as I find them, but I wanted to go ahead and get you started:
The first half of the .NET Conf keynote was a massive async stream demo... I couldn't understand why they were using async streams here: https://www.youtube.com/watch?v=1xQE2bWkwjo&list=PLReL099Y5nRd04p81Q7p5TtyjCrj9tz1t&index=4&t=
Here's another example that confused me
I wanted to write a professional response but the crude one is probably needed too:
Forget you ever heard about async streams. What were they thinking?
Call it await foreach, or async enumerables or async iterators. It has nothing to do with IO and streams.
The term is used because it exists in other languages, not because it has anything to do with IO. In Java for example, streams are Java's implementation of C#'s IEnumerable. So, to ease adoption by future Android devs, C# adopted Java's bad idea.
We can look at the language design meetings for the actual justification for this term I guess.
Serious original answer
There's no vs. It's like contrasting automatic gear boxes and cars. Cars can have automatic gear boxes; the gear boxes aren't used instead of cars.
Async streams are purely a programming concept that allows the creation of async iterators. It's the feature that lets us write this to make HTTP calls in a loop and process the results as they arrive:
await foreach(var someValue in someAsyncIterator(5))
{
...
}
async IAsyncEnumerable<string> someAsyncIterator(int max)
{
for(int i=0;i<max;i++)
{
var response=await httpClient.GetStringAsync($"{baseUrl}/{i}");
yield return response;
}
}
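For readers who want something they can run without a server, here is a minimal sketch of the same pattern. It assumes .NET 6 or later with top-level statements, and `FetchAsync` is a made-up stand-in that uses `Task.Delay` in place of the HTTP call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for httpClient.GetStringAsync: Task.Delay simulates latency.
async Task<string> FetchAsync(int i)
{
    await Task.Delay(100);
    return $"response {i}";
}

// An async iterator: each value is handed to the caller as soon as it is ready.
async IAsyncEnumerable<string> SomeAsyncIterator(int max)
{
    for (int i = 0; i < max; i++)
    {
        yield return await FetchAsync(i);
    }
}

var received = new List<string>();
await foreach (var value in SomeAsyncIterator(3))
{
    // This body runs after each simulated call completes, not after all of them.
    Console.WriteLine(value);
    received.Add(value);
}
```

The loop body sees the first value roughly 100 ms in, rather than waiting for all three calls to finish.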
When they appear as action results, it's only to allow the ASP.NET Core middleware to start processing results as they are produced; they don't affect the contents of the HTTP response itself.
gRPC's streams, on the other hand, allow the server to send individual responses to the client asynchronously. Laurent Kempe in "gRPC and C# 8 Async stream" and Steve Gordon in "Server Streaming with GRPC and .NET Core" show how these can be used together.
Copying from Steve Gordon's samples, let's say we have a weather service that sends forecasts to the client, whose proto file contains:
service WeatherForecasts {
rpc GetWeather (google.protobuf.Empty) returns (WeatherReply);
rpc GetWeatherStream (google.protobuf.Empty) returns (stream WeatherData);
rpc GetTownWeatherStream (stream TownWeatherRequest) returns (stream TownWeatherForecast);
}
Before C# 8, the client would have to wait until it had received all the responses before processing them:
using var channel = GrpcChannel.ForAddress("https://localhost:5005");
var client = new WeatherForecastsClient(channel);
var reply = await client.GetWeatherAsync(new Empty());
foreach (var forecast in reply.WeatherData)
{
//Do something with the data
}
In C# 8 though, the responses can be received and processed as they arrive:
using var replies = client.GetWeatherStream(new Empty(), cancellationToken: cts.Token);
await foreach (var weatherData in replies.ResponseStream.ReadAllAsync(cancellationToken: cts.Token))
{
//Do something with the data
}
I am having some severe performance issues in a project I'm working on. It's a standard web application project - users send requests to an API which trigger some form of computation in various handlers.
The problem right now is that pretty much any request will drive the CPU usage of the server up significantly, regardless of what internal computation the corresponding function is supposed to do. For example, we have an endpoint to display a game from the database - the user sends a request containing an ID and the server responds with a JSON object. While this request is being processed, the CPU usage goes from 5% (with the app just running) to 25-30%. Several concurrent requests will tank the server, with .NET Core using 60-70% of the CPU.
The request chain looks like:
(Controller)
[HttpGet("game/{Id}")]
public async Task<IActionResult> GetPerson(string Id)
{
try
{
var response = await _GameService.GetGameAsync(Id);
return Ok(new FilteredResponse(response, 200));
}
Service
public async Task<PlayerFilteredGameState> GetGameAsync(string gameId, string apiKey)
{
var response = await _ironmanDataHandler.GetGameAsync(gameId);
var filteredGame = _responseFilterHelper.FilterForPlayer(response, apiKey);
return filteredGame;
}
Data handler
public async Task<GameState> GetGameAsync(string gameStateId)
{
using (var db = _dbContextFactory.Create())
{
var specifiedGame = await db.GameStateIronMan.FirstOrDefaultAsync(a => a.gameId == gameStateId);
if (specifiedGame == null)
{
throw new ApiException("There is no game with that ID.", 404);
}
var deserializedGame = JsonConvert.DeserializeObject<GameState>(specifiedGame.GameState);
return deserializedGame;
}
}
I've tried mocking all function return values and database accesses, replacing all computed values with null/new Game() and so on, but it doesn't improve the performance. I've spent lots of time with different performance analysis tools, but there isn't a single function that uses more than 0.5-1% of the CPU.
After a lot of investigation, the only "conclusion" I've reached is that it seems to have something to do with the internal functionality of async/await and the way we use it in our project, because it doesn't matter what we do in the called functions - as soon as we call a function, the performance takes a huge hit.
I also tried making the functions synchronous just to see if there was something wrong with my system; however, performance is massively reduced if I do that (which is good, I suppose).
I really am at a loss here because we aren't really doing anything out of the ordinary and we're still having large issues.
UPDATE
I've performed a performance analysis in ANTS. I'm not really sure how to present the results, so I took a picture of what the call stack looks like.
If your gamestate is a large object, deserializing it can be quite taxing.
You could create a test where you just deserialize a saved game state, and do some profiling with various game states (a fresh start, after some time, ...) to see if there are differences.
If you find that deserializing takes a lot of CPU no matter what, you could look into changing the structure to reduce the amount of data that is saved.
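The isolated deserialization test suggested above could be sketched roughly as follows. This is only an illustration: it uses System.Text.Json rather than the Json.NET `JsonConvert` call from the question so that it has no package dependency, and the payload shape (`MakeState`, a list of integer arrays) is invented; a real test would load actual saved game states. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.Json;

// Build a synthetic "saved game state" of a given size (invented shape for the sketch).
string MakeState(int rows) =>
    JsonSerializer.Serialize(
        Enumerable.Range(0, rows).Select(r => Enumerable.Range(r, 50).ToArray()).ToList());

// Deserialize the same payload repeatedly and return the mean time in milliseconds.
double MeasureMs(string json, int iterations)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        JsonSerializer.Deserialize<int[][]>(json);
    }
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds / iterations;
}

// Compare a small, a medium and a large state.
foreach (int rows in new[] { 10, 1_000, 10_000 })
{
    string json = MakeState(rows);
    Console.WriteLine($"{json.Length,10} chars: {MeasureMs(json, 20):F3} ms per deserialize");
}
```

If the time per call grows sharply with the size of the state, that points at the stored JSON rather than at async/await.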
I want to extend my experience with the .NET framework and want to build a client/server application.
Actually, the client/server is a small Point Of Sale system but first, I want to focus on the communication between server and client.
In the future, I want to make it a WPF application but for now, I simply started with a console application.
Two functionalities:
The client(s) receive a dataset and, every 15/30 minutes, an update with changed prices/new products
(so the code will be in an async method with a Thread.Sleep for 15/30 minutes).
When the client application closes, it sends a kind of report (for example, an XML file).
On the internet, I found lots of examples, but I can't decide which one is the best/safest/most performant way of working, so I need some advice on which techniques I should implement.
CLIENT/SERVER
I want one server application that handles at most 6 clients. I read that threads use a lot of memory, so maybe tasks with async/await functionality would be a better way.
Example with ASYNC/AWAIT
http://bsmadhu.wordpress.com/2012/09/29/simplify-asynchronous-programming-with-c-5-asyncawait/
Example with THREADS
mikeadev.net/2012/07/multi-threaded-tcp-server-in-csharp/
Example with SOCKETS
codereview.stackexchange.com/questions/5306/tcp-socket-server
This seems to be a great example of sockets; however, the revised code doesn't work completely because not all the classes are included.
msdn.microsoft.com/en-us/library/fx6588te(v=vs.110).aspx
This MSDN example does a lot more with buffer sizes and a signal for the end of a message. I don't know if this is just an "old way" of doing it, because in my previous examples they just send a string from the client to the server and that's it.
.NET FRAMEWORK REMOTING/ WCF
I also found something about the remoting part of .NET and WCF, but I don't know if I need to implement this, because I think the example with async/await isn't bad.
SERIALIZED OBJECTS / DATASET / XML
What is the best way to send data between them? Just an XML serializer, or binary?
Example with Dataset -> XML
stackoverflow.com/questions/8384014/convert-dataset-to-xml
Example with Remoting
akadia.com/services/dotnet_dataset_remoting.html
If I should use the async/await method, is it right to do something like this in the server application:
while(true)
{
string input = Console.ReadLine();
if(input == "products")
SendProductToClients(port);
if(input == "rapport")
{
string Example = Console.ReadLine();
}
}
Here are several things anyone writing a client/server application should consider:
Application layer packets may span multiple TCP packets.
Multiple application layer packets may be contained within a single TCP packet.
Encryption.
Authentication.
Lost and unresponsive clients.
Data serialization format.
Thread based or asynchronous socket readers.
Retrieving packets properly requires a wrapper protocol around your data. The protocol can be very simple. For example, it may be as simple as an integer that specifies the payload length. The snippet I have provided below was taken directly from the open source client/server application framework project DotNetOpenServer available on GitHub. Note this code is used by both the client and the server:
private byte[] buffer = new byte[8192];
private int payloadLength;
private int payloadPosition;
private MemoryStream packet = new MemoryStream();
private PacketReadTypes readState;
private Stream stream;
private void ReadCallback(IAsyncResult ar)
{
try
{
int available = stream.EndRead(ar);
int position = 0;
while (available > 0)
{
int lengthToRead;
if (readState == PacketReadTypes.Header)
{
lengthToRead = (int)packet.Position + available >= SessionLayerProtocol.HEADER_LENGTH ?
SessionLayerProtocol.HEADER_LENGTH - (int)packet.Position :
available;
packet.Write(buffer, position, lengthToRead);
position += lengthToRead;
available -= lengthToRead;
if (packet.Position >= SessionLayerProtocol.HEADER_LENGTH)
readState = PacketReadTypes.HeaderComplete;
}
if (readState == PacketReadTypes.HeaderComplete)
{
packet.Seek(0, SeekOrigin.Begin);
BinaryReader br = new BinaryReader(packet, Encoding.UTF8);
ushort protocolId = br.ReadUInt16();
if (protocolId != SessionLayerProtocol.PROTOCAL_IDENTIFIER)
throw new Exception(ErrorTypes.INVALID_PROTOCOL);
payloadLength = br.ReadInt32();
readState = PacketReadTypes.Payload;
}
if (readState == PacketReadTypes.Payload)
{
lengthToRead = available >= payloadLength - payloadPosition ?
payloadLength - payloadPosition :
available;
packet.Write(buffer, position, lengthToRead);
position += lengthToRead;
available -= lengthToRead;
payloadPosition += lengthToRead;
if (packet.Position >= SessionLayerProtocol.HEADER_LENGTH + payloadLength)
{
if (Logger.LogPackets)
Log(Level.Debug, "RECV: " + ToHexString(packet.ToArray(), 0, (int)packet.Length));
MemoryStream handlerMS = new MemoryStream(packet.ToArray());
handlerMS.Seek(SessionLayerProtocol.HEADER_LENGTH, SeekOrigin.Begin);
BinaryReader br = new BinaryReader(handlerMS, Encoding.UTF8);
if (!ThreadPool.QueueUserWorkItem(OnPacketReceivedThreadPoolCallback, br))
throw new Exception(ErrorTypes.NO_MORE_THREADS_AVAILABLE);
Reset();
}
}
}
stream.BeginRead(buffer, 0, buffer.Length, new AsyncCallback(ReadCallback), null);
}
catch (ObjectDisposedException)
{
Close();
}
catch (Exception ex)
{
ConnectionLost(ex);
}
}
private void Reset()
{
readState = PacketReadTypes.Header;
packet = new MemoryStream();
payloadLength = 0;
payloadPosition = 0;
}
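For comparison, the same length-prefix idea can be sketched with async/await instead of Begin/End callbacks. This is not DotNetOpenServer's protocol: the 4-byte big-endian header, the `MaxFrameLength` limit and the function names are assumptions made for the illustration, and a `MemoryStream` stands in for the network stream. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;
using System.Threading.Tasks;

const int MaxFrameLength = 1 << 20;   // arbitrary sanity limit for this sketch

// Write one frame: a 4-byte big-endian length followed by the payload.
async Task WriteFrameAsync(Stream stream, byte[] payload)
{
    var header = new byte[4];
    BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);
    await stream.WriteAsync(header, 0, header.Length);
    await stream.WriteAsync(payload, 0, payload.Length);
}

// Read exactly count bytes, looping because a single read may return fewer.
async Task<byte[]> ReadExactAsync(Stream stream, int count)
{
    var buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = await stream.ReadAsync(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException("Connection closed mid-frame.");
        offset += read;
    }
    return buffer;
}

// Read one frame: the header first, then the number of bytes it announces.
async Task<byte[]> ReadFrameAsync(Stream stream)
{
    byte[] header = await ReadExactAsync(stream, 4);
    int length = BinaryPrimitives.ReadInt32BigEndian(header);
    if (length < 0 || length > MaxFrameLength) throw new InvalidDataException("Bad frame length.");
    return await ReadExactAsync(stream, length);
}

// Demo: two frames written back to back are read back as two separate messages.
var ms = new MemoryStream();
await WriteFrameAsync(ms, Encoding.UTF8.GetBytes("<prices/>"));
await WriteFrameAsync(ms, Encoding.UTF8.GetBytes("<report/>"));
ms.Position = 0;
string first = Encoding.UTF8.GetString(await ReadFrameAsync(ms));
string second = Encoding.UTF8.GetString(await ReadFrameAsync(ms));
Console.WriteLine($"{first} {second}");
```

The read loop in `ReadExactAsync` is what handles both cases from the list above: a message split across several TCP packets, and several messages arriving in one packet.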
If you're transmitting point of sale information, it should be encrypted. I suggest TLS, which is easily enabled through .NET. The code is very simple and there are quite a few samples out there, so for brevity I'm not going to show it here. If you are interested, you can find an example implementation in DotNetOpenServer.
All connections should be authenticated. There are many ways to accomplish this. I've used Windows Authentication (NTLM) as well as Basic. Although NTLM is powerful as well as automatic, it is limited to specific platforms. Basic authentication simply passes a username and password after the socket has been encrypted. Basic authentication can still, however, authenticate the username/password combination against the local server or domain controller, essentially impersonating NTLM. The latter method enables developers to easily create non-Windows client applications that run on iOS, Mac, Unix/Linux flavors as well as Java platforms (although some Java implementations support NTLM). Your server implementation should never allow application data to be transferred until after the session has been authenticated.
There are only a few things we can count on: taxes, networks failing and client applications hanging. It's just the nature of things. Your server should implement a method to clean up both lost and hung client sessions. I've accomplished this in many client/server frameworks through a keep-alive (AKA heartbeat) protocol. On the server side I implement a timer that is reset every time a client sends a packet, any packet. If the server doesn't receive a packet within the timeout, the session is closed. The keep-alive protocol is used to send packets when other application layer protocols are idle. Since your application only sends XML once every 15 minutes, sending a keep-alive packet once a minute would enable the server side to issue an alert to the administrator when a connection is lost prior to the 15 minute interval, possibly enabling the IT department to resolve a network issue in a more timely fashion.
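A minimal sketch of that server-side idle timer, assuming .NET 6 or later with top-level statements. The names (`OnPacketReceived`, `sessionClosed`) and the short timings are invented for the demo; in the scenario above the timeout would be a few minutes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var idleTimeout = TimeSpan.FromMilliseconds(500);
bool sessionClosed = false;

// One-shot timer: the callback fires only if no packet arrives within idleTimeout.
using var idleTimer = new Timer(_ =>
{
    sessionClosed = true;
    Console.WriteLine("No packet within timeout: closing session and alerting the administrator.");
}, null, idleTimeout, Timeout.InfiniteTimeSpan);

// Call this for every received packet, keep-alive or application data: it restarts the countdown.
void OnPacketReceived() => idleTimer.Change(idleTimeout, Timeout.InfiniteTimeSpan);

// Simulate a client that sends five heartbeats and then goes silent.
for (int i = 0; i < 5; i++)
{
    await Task.Delay(50);
    OnPacketReceived();
}
bool closedWhileAlive = sessionClosed;   // still false: heartbeats kept the session open

await Task.Delay(1500);                  // silence longer than the timeout
Console.WriteLine($"Closed while heartbeats flowed: {closedWhileAlive}, closed after silence: {sessionClosed}");
```

A real server would keep one such timer per session and dispose of it when the session closes.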
Next, data format. In your case XML is great. XML enables you to change up the payload however you want whenever you want. If you really need speed, then binary will always trump the bloated nature of string represented data.
Finally, as @NSFW already stated, threads or asynchronous doesn't really matter in your case. I've written servers that scale to 10000 connections based on threads as well as asynchronous callbacks. It's all really the same thing when it comes down to it. As @NSFW said, most of us are using asynchronous callbacks now and the latest server implementation I've written follows that model as well.
Threads are not terribly expensive, considering the amount of RAM available on modern systems, so I don't think it's helpful to optimize for a low thread count. Especially if we're talking about a difference between 1 thread and 2-5 threads. (With hundreds or thousands of threads, the cost of a thread starts to matter.)
But you do want to optimize for minimal blocking of whatever threads you do have. So for example instead of using Thread.Sleep to do work on 15 minute intervals, just set a timer, let the thread return, and trust the system to invoke your code 15 minutes later. And instead of blocking operations for reading or writing information over the network, use non-blocking operations.
The async/await pattern is the new hotness for asynchronous programming on .Net, and it is a big improvement over the Begin/End pattern that dates back to .Net 1.0. Code written with async/await is still using threads, it is just using features of C# and .Net to hide a lot of the complexity of threads from you - and for the most part, it hides the stuff that should be hidden, so that you can focus your attention on your application's features rather than the details of multi-threaded programming.
So my advice is to use the async/await approach for all of your IO (network and disk) and use timers for periodic chores like sending those updates you mentioned.
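One way to run the periodic chore without parking a thread in Thread.Sleep is an awaited delay loop, sketched below. Note that this swaps in `Task.Delay` (which is timer-based internally) for an explicit timer object, and that `SendPriceUpdatesAsync` and `RunPeriodicallyAsync` are hypothetical names. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical chore; the real application would push changed prices to the clients here.
int updatesSent = 0;
Task SendPriceUpdatesAsync()
{
    Interlocked.Increment(ref updatesSent);
    Console.WriteLine($"{DateTime.Now:T} sending price update");
    return Task.CompletedTask;
}

// Await-based periodic loop: no thread is held while waiting between runs.
async Task RunPeriodicallyAsync(TimeSpan interval, Func<Task> chore, CancellationToken token)
{
    try
    {
        while (true)
        {
            await Task.Delay(interval, token);
            await chore();
        }
    }
    catch (OperationCanceledException)
    {
        // Normal shutdown path: the token was cancelled during the delay.
    }
}

// The interval would be TimeSpan.FromMinutes(15) in production; shortened for the demo.
using var cts = new CancellationTokenSource();
Task loop = RunPeriodicallyAsync(TimeSpan.FromMilliseconds(50), SendPriceUpdatesAsync, cts.Token);
await Task.Delay(400);
cts.Cancel();
await loop;
Console.WriteLine($"Updates sent: {updatesSent}");
```

The cancellation token gives the application a clean way to stop the loop on shutdown, which a sleeping thread does not.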
And about serialization...
One of the biggest advantages of XML over binary formats is that you can save your XML transmissions to disk and open them up using readily-available tools to confirm that the payload really contains the data that you thought would be in there. So I tend to avoid binary formats unless bandwidth is scarce - and even then, it's useful to develop most of the app using a text-friendly format like XML, and then switch to binary after the basic mechanism of sending and receiving data have been fleshed out.
So my vote is for XML.
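As a rough illustration of how little code an XML payload needs, here is a sketch using LINQ to XML. The element and attribute names (`priceUpdate`, `product`, `id`, `price`) are invented for the example, and it assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sender side: build a price update as XML text.
var update = new XElement("priceUpdate",
    new XElement("product", new XAttribute("id", 1), new XAttribute("price", 2.50m)),
    new XElement("product", new XAttribute("id", 2), new XAttribute("price", 0.99m)));
string xml = update.ToString();

// Human-readable, so it can be logged or saved to disk and inspected with any text editor.
Console.WriteLine(xml);

// Receiver side: parse the text back into typed values.
var prices = XElement.Parse(xml)
    .Elements("product")
    .ToDictionary(p => (int)p.Attribute("id"), p => (decimal)p.Attribute("price"));
Console.WriteLine($"{prices.Count} prices received");
```

The string in `xml` is what would travel over the socket, which is exactly why it is easy to capture and check during development.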
And regarding your code example, well, there's no async/await in it...
But first, note that a typical simple TCP server will have a small loop that listens for incoming connections and starts a thread to handle each new connection. The code for the connection thread will then listen for incoming data, process it, and send an appropriate response. So the listen-for-new-connections code and the handle-a-single-connection code are completely separate.
So anyway, the connection thread code might look similar to what you wrote, but instead of just calling ReadLine you'd do something like "string line = await ReadLine();" The await keyword is approximately where your code lets one thread exit (after invoking ReadLine) and then resumes on another thread (when the result of ReadLine is available). Except that awaitable methods should have a name that ends with Async, for example ReadLineAsync. Reading a line of text from the network is not a bad idea, but you'll have to write ReadLineAsync yourself, building upon the existing network API.
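A sketch of that split between the accept loop and the per-connection handler is shown below. One possible shortcut, rather than writing ReadLineAsync by hand, is to wrap the network stream in a `StreamReader`, which has had a `ReadLineAsync` method since .NET 4.5. The command strings and function names are made up, and the demo talks to itself over the loopback interface. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Handle one connection: read lines until the client disconnects and answer each one.
async Task HandleClientAsync(TcpClient client)
{
    using (client)
    using (var stream = client.GetStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream) { AutoFlush = true })
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            await writer.WriteLineAsync(line == "products" ? "sending products" : "unknown command");
        }
    }
}

// Accept loop: kept separate from the per-connection code.
async Task AcceptLoopAsync(TcpListener server)
{
    while (true)
    {
        TcpClient client;
        try { client = await server.AcceptTcpClientAsync(); }
        catch (ObjectDisposedException) { return; }   // listener was stopped
        catch (SocketException) { return; }           // listener was stopped
        _ = HandleClientAsync(client);                // not awaited, so accepting continues
    }
}

// Demo on the loopback interface; port 0 lets the OS pick a free port.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
Task acceptLoop = AcceptLoopAsync(listener);

using var probe = new TcpClient();
await probe.ConnectAsync(IPAddress.Loopback, ((IPEndPoint)listener.LocalEndpoint).Port);
using var probeStream = probe.GetStream();
using var probeReader = new StreamReader(probeStream);
using var probeWriter = new StreamWriter(probeStream) { AutoFlush = true };
await probeWriter.WriteLineAsync("products");
string reply = await probeReader.ReadLineAsync();
Console.WriteLine(reply);
listener.Stop();
```

With only six clients, a thread per connection would work equally well; the awaited version simply avoids holding a thread while a connection is idle.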
I hope this helps.
I have been reading a lot about ThreadPools, Tasks, and Threads. After a while I got pretty confused with the whole thing. Lots of people saying negative/positive things about each... Maybe someone can help me find a solution for my problem. I created a simple diagram here to get my point across better.
Basically on the left is a list of 5 strings (URLs) that need to be processed. In the center is just my idea of a handler that has 2 events to track progress. Inside that handler it takes all 5 URLs and creates separate tasks for them, shown in blue. Once each one completes, I want it to return the webpage results to the handler. When they have all returned a value, I want OnComplete to be called and all this information passed back to the main thread.
Hopefully you can understand what I am trying to do. Thanks in advance for anyone who would like to help!
Update
I have taken your suggestions and put them to use. But I still have a few questions. Here is the code I have built, mind it is not build proof, just a concept to see if I'm going in the right direction. Please read the comments, I had included my questions on how to proceed in there. Thank you for all who took interest in my question so far.
public List<String> ProcessList (string[] URLs)
{
List<string> data = new List<string>();
for(int i = 0; i < URLs.Length; i++)
{
//not sure how to do this now??
//I want only 10 HttpWebRequest running at once.
//Also I want this method to block until all the URL data has been returned.
}
return data;
}
private async Task<string> GetURLData(string URL)
{
//First setup out web client
HttpWebRequest Request = GetWebRequest(URL);
//
//Check if the client holds a value. (There were no errors)
if (Request != null)
{
//GetCouponsAsync will return to the calling function and resumes
//here when GetResponse is complete.
WebResponse Response = await Request.GetResponseAsync();
//
//Setup our Stream to read the reply
Stream ResponseStream = Response.GetResponseStream();
//return the reply string here...
}
}
As @fendorio and @ps2goat pointed out, async/await is perfect for your scenario. Here is another MSDN article:
http://msdn.microsoft.com/en-us/library/hh300224.aspx
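To address the "only 10 requests at once" comment in the question, one common approach is to gate the tasks with a `SemaphoreSlim` and wait for all of them with `Task.WhenAll`. The sketch below is one possible shape, not the only one: `ProcessListAsync` and `FakeFetchAsync` are made-up names, and the fake fetch replaces the real download so the example runs without a network. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Run fetch for every URL with at most maxConcurrent calls in flight.
// Results come back in the same order as the input.
async Task<string[]> ProcessListAsync(
    IEnumerable<string> urls, Func<string, Task<string>> fetch, int maxConcurrent)
{
    using var gate = new SemaphoreSlim(maxConcurrent);
    var tasks = urls.Select(async url =>
    {
        await gate.WaitAsync();          // wait for a free slot
        try { return await fetch(url); }
        finally { gate.Release(); }      // free the slot even if fetch throws
    }).ToList();
    return await Task.WhenAll(tasks);
}

// Stand-in for the real download; it also records the peak number of concurrent calls.
var sync = new object();
int inFlight = 0, peak = 0;
async Task<string> FakeFetchAsync(string url)
{
    lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(50);
    lock (sync) { inFlight--; }
    return $"content of {url}";
}

var urls = Enumerable.Range(1, 25).Select(i => $"http://example.com/{i}").ToArray();
string[] pages = await ProcessListAsync(urls, FakeFetchAsync, maxConcurrent: 10);
Console.WriteLine($"{pages.Length} pages fetched, peak concurrency {peak}");
```

Awaiting `ProcessListAsync` gives the "wait until all the URL data has been returned" behaviour without blocking a thread; the caller resumes once every task has finished.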
It seems to me that you are trying to replicate a webserver within a webserver.
Each web request starts its own thread in a webserver. As these requests can originate from anywhere that has access to the server, nothing but the server itself has access or the ability to manage them (in a clean way).
If you would like to handle requests and keep track of them like I believe you are asking, AJAX requests would be the best way to do this. This way you can leave the server to manage the threads and requests as it does best, but you can manage their progress and monitor them via JSON return results.
Look into jQuery.ajax for some ideas on how to do this.
To achieve the above mentioned functionality in a simple way, I would prefer calling a BackgroundWorker for each of the tasks. You can keep track of the progress plus you get a notification upon task completion.
Another reason to choose this is that the mentioned tasks look like a back-end job and not tightly coupled with the UI.
Here's a MSDN link and this is the link for a cool tutorial.
Both C# and Scala have adopted frameworks for simplifying doing asynchronous/parallel computation, but in different ways. The latest C# (5.0, still in beta) has decided on an async/await framework (using continuation-passing under the hood, but in an easier-to-use way), while Scala instead uses the concept of "actors", and has recently taken the actors implementation in Akka and incorporated it into the base library.
Here's a task to consider: We receive a series of requests to do various operations -- e.g. from user input, requests to a server, etc. Some operations are fast, but some take a while. For the slow ones, we'd like to asynchronously do the operation (in another thread) and process it when the thread is done, while still being free to process new requests.
A simple synchronous loop might be (pseudo-code):
while (1) {
val request = waitForAnything(user_request, server_request)
val result = do_request(request)
if (result needs to be sent back)
send_back_result(result)
}
In a basic fork/join framework, you might do something like this (pseudo-code):
val threads: Set[Thread]
while (1) {
val request = waitForAnything(user_request, server_request, termination of thread)
if (request is thread_terminate) {
threads.delete(request.terminated_thread)
val result = request.thread_result
if (result needs to be sent back)
send_back_result(result)
} else if (request is slow) {
val thread = new Thread(() => do_request(request))
threads.add(thread)
thread.start()
} else {
val result = do_request(request)
if (result needs to be sent back)
send_back_result(result)
}
}
How would this look expressed using async/await and using actors, and more generally what are the advantages/disadvantages of these approaches?
Please consider mine a partial answer: "old" Scala actors have been replaced by Akka actors, which are much more than a simple async/await library.
Akka actors are "message-handlers" which are organized into a hierarchy which can be running on one or more JVMs and even distributed across a network.
When you realize your asynchronous processing requires actors (read on to see why this is not strictly necessary), Akka helps you put in place the best patterns in terms of failure handling, dispatching and routing.
Akka comes with different transport layers and other ready-to-use facilities such as explicit Finite State Machines, Dataflow concurrency and others.
Akka comes with Futures, which correspond more closely to the async/await framework in C# 5.0.
You can read more about Akka futures on the Akka website or on this post:
Parallel file processing in Scala
I can't speak for Scala, but the C# version would look something like this (I don't have an IDE handy, so pardon any errors/typos):
public async Task<int> GetResult()
{
while (true)
{
var request = await waitForAnything(user_request, server_request);
var result = await do_request(request);
if (isValid(result))
return result;
}
}
And you would call it something like so:
public async Task RunTheProgram()
{
int result = await GetResult();
Console.WriteLine("Result: {0}", result);
}
In short, the actual code would look very much like the pseudo code, i.e. very much like how a normal human being would think about the problem. That's the real beauty of C#'s async/await.
Scala has also implemented the async/await paradigm, which can simplify some algorithms.
Here is the proposal:
http://docs.scala-lang.org/sips/pending/async.html
Here is the implementation:
https://github.com/scala/async
Background
Hi.
I am writing a program that analyzes packets for specific words contained in them. I need to analyze outgoing email, Jabber and ICQ traffic. If the words are found, the packet is blocked. That part works, but I have a problem with files and with email sent through the web.
Problems
Simple code:
while (Ndisapi.ReadPacket(hNdisapi, ref Request))
{
// some work
switch (protocol)
{
//....
case "HTTP":
// parse packet(byte[])
HTTP.HttpField field = HTTP.ParseHttp(ret);
if (field != null && field.Method == HTTP.HttpMethod.POST)
{
// analyze packet and drop if needed
DoWork();
}
break;
}
}
The problem is the following. For example, I attach a 500 KB file to an email. The file will be split into approximately 340 packets. In the code above, DoWork() will be executed only for the first packet.
OK, so I need to reassemble the session completely and pass the whole session to DoWork(). I did that. But I can't wait until the session is finished, because all other packets (HTTP, ARP, everything) are held up in the meantime (and after a couple of minutes the Internet connection drops).
Therefore, the first question:
How can I solve this problem (perhaps with advice on the program's design)?
Now for the email. Suppose we have this code:
switch (protocol)
{
//....
case "HTTP":
// parse packet(byte[])
var httpMimeMessage = Mime.Parse(ret);
// analyze packet and drop if needed
DoSomeWork();
break;
}
For example, we are looking for the word "Finance". Then, if we open any website that contains the word "finance", the packet is blocked.
Second question: how do I determine that a packet belongs to an e-mail?
Thanks and sorry for my English.
To analyze more than one packet/stream at the same time, you'll need to refactor your solution to use threading or some other form of multitasking. Since your task appears to be both compute- and I/O-intensive, you'll probably also want to take a hard look at how to leverage event handling at the operating system level (select, epoll, or the equivalent for your target platform).
And to answer your second question regarding email, you'll need to be able to identify and track the TCP session used to deliver email messages from client to server, assuming the session hasn't been encrypted.
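One possible shape for that refactoring, sketched in C# to match the question: the capture loop only enqueues packets, and a separate worker reassembles and inspects each session. Everything here is an assumption for illustration. Packets are modelled as `(Session, Payload)` tuples, the session key would really come from the TCP addresses and ports, and the Ndisapi calls are left out. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Channels;
using System.Threading.Tasks;

// Queue between the capture loop (producer) and the analysis worker (consumer).
var channel = Channel.CreateUnbounded<(string Session, byte[] Payload)>();
var blocked = new List<string>();

// Worker: appends each packet to its session's text and inspects the result.
// Re-scanning the whole text per packet is quadratic; acceptable only for a sketch.
async Task AnalyzeAsync()
{
    var sessions = new Dictionary<string, StringBuilder>();
    await foreach (var (session, payload) in channel.Reader.ReadAllAsync())
    {
        if (!sessions.TryGetValue(session, out var text))
            sessions[session] = text = new StringBuilder();
        text.Append(Encoding.ASCII.GetString(payload));
        if (text.ToString().Contains("Finance") && !blocked.Contains(session))
            blocked.Add(session);
    }
}

Task worker = AnalyzeAsync();

// Capture loop stand-in: it only enqueues, so it never waits for the analysis.
channel.Writer.TryWrite(("A", Encoding.ASCII.GetBytes("Quarterly Fin")));
channel.Writer.TryWrite(("B", Encoding.ASCII.GetBytes("hello")));
channel.Writer.TryWrite(("A", Encoding.ASCII.GetBytes("ance report")));
channel.Writer.Complete();

await worker;
Console.WriteLine(string.Join(",", blocked));   // prints "A"
```

The word is split across two packets of session A and is still found, while session B is untouched. The trade-off is that packets already forwarded cannot be recalled, so a design like this can only act on the rest of the session (for example by dropping its later packets).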
As I'm sure you already know, the problem you're trying to solve is a very complicated one, requiring very specialized skills like realtime programming, deep knowledge of networking protocols, etc.
Of course, there are several "deep packet inspection" solutions out there already that do all of this for you, (typically used by public companies to fulfill regulatory requirements like Sarbanes-Oxley), but they are quite expensive.