I have a C# library that is surfaced through a webpage UI. There's of course the UI layer, which makes calls through a business logic layer, which in turn hits the data access layer to retrieve the data. The data is actually retrieved via a web service which has the capability to query data asynchronously.
I'm starting to learn how to deal with requests asynchronously to both speed up page time (grabbing multiple items from the web service as soon as I can with multiple threads), and for scalability by threads not being tied up all the time.
Given the three-tier architecture of my code, would this mean that pages asynchronously make calls through the business layer (and therefore speed up by making concurrent requests), and the web service in the data layer in turn would also make its requests asynchronously? The logic of juggling all the callbacks is hurting my brain. Are there any resources out there for this type of approach?
If you can stand to introduce a new language, F# is terribly good at writing asynchronous code, one of its main strengths, IMHO, aside from its conciseness. Writing async code looks almost exactly like linear non-async code!
Related links:
Beyond Foundations of F# - Asynchronous Workflows
An introduction to F# (video)
Concurrency in F# (video - excellent short case study on speeding up an existing real-world C# insurance processing system with selective introduction of F# replacement modules)
If you don't want to introduce a new language, here is a technique for using iterators to simplify your code:
Asynchronous Programming in C# using Iterators
There is a somewhat useful concept called Future<T>, which represents a unit of asynchronous work to be done in the future.
In simple terms, at the beginning of some action (like app start or page load) you define which values you will need later and let background threads compute them for you.
When your user demands a specific value, you just ask the corresponding Future<T> for it. If it's already done, you get it immediately; otherwise you'll have to block your main thread or somehow inform the user that the value is not ready yet.
Some discussion of that concept you can find here.
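In current .NET the closest built-in equivalent of Future<T> is Task<T>, so the idea above can be sketched roughly like this. This is only a minimal sketch: LoadOrderCount is a made-up stand-in for a slow web service call, and the helper names are mine, not part of any library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FutureSketch
{
    // Hypothetical stand-in for a slow web service call.
    public static int LoadOrderCount(int customerId)
    {
        Thread.Sleep(50); // simulate latency
        return customerId * 2;
    }

    // Start the work early (e.g. at page load). The returned Task<int>
    // plays the role of Future<int>.
    public static Task<int> StartLoadingOrderCount(int customerId)
    {
        return Task.Run(() => LoadOrderCount(customerId));
    }

    // Later: take the value if it is ready, otherwise use a fallback
    // (e.g. show "still loading") instead of blocking the caller.
    public static int ValueOrFallback(Task<int> future, int fallback)
    {
        return future.Status == TaskStatus.RanToCompletion ? future.Result : fallback;
    }
}
```

Reading future.Result directly is the "block the main thread" option; inside an async method, await future waits without tying up the thread.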
I guess you have two areas to handle callbacks from:
Data Layer -> Business Objects
Business Objects -> Presentation Layer
For the first one, I would use code generation to automatically create the async and callback logic for all the methods. If you have a lot of classes that are pretty close to being the same, then this works quite well.
For the second area, I tend to use databinding to 'automatically' update the UI as the objects underneath are populating.
I am fairly new to asynchronous programming so I need help.
What I need to do is create a Windows service that constantly checks the database for menu updates (insert/updates), table updates (insert/updates), menu category updates (insert/updates) and so on. If any change is detected, the service then needs to POST those changes to separate APIs one by one. Keep in mind that the service will be used for just this purpose, and the database that I need to check for updates is SQL Server.
So, how do I approach this scenario efficiently? Do I create new Tasks (System.Threading.Tasks) or new Threads (System.Threading.Thread) for each piece, like UpdateMenu (which checks the menu updates and uploads them to the API), UpdateTable, UpdateDishes and so on? And how do I go about the posting-to-the-API part: do I create a new Task for each and every API call? I want the application to be as efficient as possible and to pick up the changes and post them to the API as soon as possible.
Thanks in advance.
It seems that you are worried about the overhead of the mechanism that you are going to use in order to fetch data from the database and post this data to the APIs. You are thinking that maybe Threads are fast and Tasks are slower, or vice versa. In fact, choosing between these two mechanisms is likely to have no measurable impact on your service's demand for CPU, memory or other system resources.
What is likely to be impactful, is the pattern of communication of your service with the database and the APIs. For example if your threads/tasks are not coordinated with each other, and query the database all at the same time, the database might be slow to respond, and might consume larger amounts of memory while preparing the response. That's not because your threads/tasks are slow. It's because your service is querying the database with a pattern that makes it harder for the database to respond. The same might be true for the pattern of communication with the APIs. If your workers are not coordinated, the network connectivity might become a bottleneck, or the remote machines that host the APIs might suffer.
So my advice is to focus on the usability of the mechanisms, and not on their supposed difference in performance. If you are comfortable and familiar with threads, and know nothing about tasks, use threads. If you are familiar with both threads and tasks, use tasks because they are generally easier to use. You'd do better to invest your time in optimizing the communication pattern between your service and its dependencies than in running benchmarks to find the best of two mechanisms that, for all intents and purposes, are equally efficient.
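As an illustration of what "coordinated" can mean in practice, here is a minimal sketch that caps how many API posts are in flight at once using SemaphoreSlim. The class name, the postAsync delegate and the limit are placeholders I made up; the real posting code and a sensible limit depend on your APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledPoster
{
    // Runs postAsync for every change, but never more than
    // maxConcurrent calls at the same time.
    public static async Task PostAllAsync<T>(
        IEnumerable<T> changes, Func<T, Task> postAsync, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var pending = changes.Select(async change =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try { await postAsync(change); }
                finally { gate.Release(); }  // free the slot even on failure
            }).ToList();

            // Completes only after every post has finished.
            await Task.WhenAll(pending);
        }
    }
}
```

The same gate could be shared between the menu, table and category workers so that together they never exceed what the database or the remote APIs can comfortably handle.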
We already have parallel fan-out working in our code (using ParallelEnumerable) which is currently running on a 12-core, 64G RAM server. But we would like to convert the code to use Rx so that we can have better flexibility over our downstream pipeline.
Current Workflow:
1. We read millions of records from a database (in a streaming fashion).
2. On the client side, we then use a custom OrderablePartitioner<T> class to group the database records into groups. Let’s call an instance of this class: partioner.
3. We then use partioner.AsParallel().WithDegreeOfParallelism(5).ForAll(group => ProcessGroupOfRecordsAsync(group));
Note: this could be read as “Process all the groups, 5 at a time in parallel.” (I.e. parallel fan-out).
4. ProcessGroupOfRecordsAsync() – loops through all the records in the group and turns them into hundreds or even thousands of POCO objects for further processing (i.e. serial fan-out or better yet, expand).
Depending on the client’s needs:
5. This new serial stream of POCO objects is evaluated, sorted, ranked, transformed, filtered, filtered by manual process, and possibly fanned out further, in parallel and/or serially, throughout the rest of the pipeline.
6. The end of the pipeline may end up storing new records into the database, displaying the POCO objects in a form, or displaying them in various graphs.
The process currently works just fine, except that point #5 and #6 aren’t as flexible as we would like. We need the ability to swap in and out various downstream workflows. So, our first attempt was to use a Func<Tin, Tout> like so:
partioner.AsParallel()
    .WithDegreeOfParallelism(5)
    .ForAll(group => ProcessGroupOfRecordsAsync(group, singleRecord =>
        NextTaskInWorkFlow(singleRecord)));
And that works okay, but the more we fleshed out our needs the more we realized we are just re-implementing Rx.
Therefore, we would like to do something like the following in Rx:
IObservable<recordGroup> rg = dbContext.QueryRecords(inputArgs)
    .AsParallel().WithDegreeOfParallelism(5)
    .ProcessGroupOfRecordsInParallel();
if (client1)
    rg.AnalizeRecordsForClient1().ShowResults();
if (client2)
    rg.AnalizeRecordsForClient2()
        .AsParallel()
        .WithDegreeOfParallelism(3)
        .MoreProcessingInParallel()
        .DisplayGraph()
        .GetUserFeedBack()
        .Where(data => data.SaveToDatabase)
        .Select(data => data.NewRecords)
        .SaveToDatabase(Table2);
...
using (rg.Subscribe(groupId => LogToScreen("Group {0} finished.", groupId)));
It sounds like you might want to investigate Dataflows in the Task Parallel Library - This might be a better fit than Rx for dealing with part 5, and could be extended to handle the whole problem.
In general, I don't like the idea of trying to use Rx for parallelization of CPU-bound tasks; it's usually not a good fit. If you are not careful, you can inadvertently introduce inefficiencies. Dataflows can give you a nice way to parallelize only where it makes the most sense.
From MSDN:
The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks. The dataflow components build on the types and scheduling infrastructure of the TPL and integrate with the C#, Visual Basic, and F# language support for asynchronous programming. These dataflow components are useful when you have multiple operations that must communicate with one another asynchronously or when you want to process data as it becomes available. For example, consider an application that processes image data from a web camera. By using the dataflow model, the application can process image frames as they become available. If the application enhances image frames, for example, by performing light correction or red-eye reduction, you can create a pipeline of dataflow components. Each stage of the pipeline might use more coarse-grained parallelism functionality, such as the functionality that is provided by the TPL, to transform the image.
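To give a feel for how the question's pipeline might map onto Dataflow blocks, here is a small sketch. It assumes the System.Threading.Tasks.Dataflow NuGet package; ExpandGroup, the int[]/string types and the degree of parallelism are hypothetical placeholders standing in for the real record groups and POCO objects.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

public static class DataflowSketch
{
    // Hypothetical expansion step: one group of records becomes many items.
    private static IEnumerable<string> ExpandGroup(int[] recordGroup)
    {
        return recordGroup.Select(record => "poco-" + record);
    }

    public static async Task<List<string>> RunAsync(IEnumerable<int[]> groups)
    {
        var results = new List<string>();

        // Parallel fan-out: up to 5 groups are expanded at the same time.
        var expand = new TransformManyBlock<int[], string>(
            g => ExpandGroup(g),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5, BoundedCapacity = 100 });

        // Swappable downstream stage; handles one item at a time by default.
        var downstream = new ActionBlock<string>(item => results.Add(item));

        expand.LinkTo(downstream, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var g in groups)
            await expand.SendAsync(g); // waits while the block is full (back-pressure)

        expand.Complete();
        await downstream.Completion;
        return results;
    }
}
```

Swapping workflows per client then comes down to linking the expand block to a different downstream block, and each block can be given its own degree of parallelism.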
Kaboo!
As no one has provided anything definite, I'll point out that the source code can be browsed on GitHub at Rx. Taking a quick tour around, it looks like at least some of the processing (all of it?) is done on the thread pool already. So maybe it's not possible to explicitly control the degree of parallelism short of implementing your own scheduler (e.g. Rx TestScheduler), but it happens nevertheless. See also the links below: judging from the answers (especially the one provided by James in the first link), the observable tasks are queued and processed serially by design -- but one can provide multiple streams for Rx to process.
See also the other questions that are related and visible on the left side (by default). In particular it looks like this one, Reactive Extensions: Concurrency within the subscriber, could provide some answers to your question. Or maybe Run methods in Parallel using Reactive.
Edit: Just a note that if storing objects to the database becomes a problem, the Rx stream could push the save operations to, say, a ConcurrentQueue, which would then be processed separately. Another option would be to let Rx queue items, with a proper combination of elapsed time and number of items, and push them to the database by bulk insert.
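The "time and number of items" batching could look roughly like this with the Buffer operator. This is a sketch that assumes the System.Reactive NuGet package; the Batch helper name and the limits are arbitrary placeholders, and the actual bulk insert is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq; // NuGet package: System.Reactive

public static class BatchingSketch
{
    // Emits a batch when maxItems have arrived or maxWait has elapsed,
    // whichever comes first. Empty batches (quiet periods) are dropped.
    public static IObservable<IList<T>> Batch<T>(
        IObservable<T> source, TimeSpan maxWait, int maxItems)
    {
        return source
            .Buffer(maxWait, maxItems)
            .Where(batch => batch.Count > 0);
    }
}
```

A subscriber would then receive whole batches and could hand each one to a bulk insert routine instead of saving items one by one.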
Most people seem to build a listener socket and will include "events" to be invoked for processing. EG: SocketConnected, DataReceived. The programmer initializes a listener and binds to the "events" methods to receive socket events to build the service.
I feel that in a large-scale implementation it would be more efficient to avoid delegates in the listener and to complete all the processing in the callback methods, possibly using different callbacks for receiving data based on knowing which command is coming next. (This is part of my Message Frame Structure.)
I have looked around for highly scalable examples, but I only find the standard MSDN implementations for asynchronous sockets or variations from other programmers that replicate the MSDN example.
Does anyone have any good experience that could point me in the right direction?
Note: The service will hold thousands of clients. In most cases the clients stay connected, and updates received by the service will be sent out to all other connected clients. It is a synchronized P2P-type system for an object-oriented database.
The difference between an event call and a callback is negligible. A callback is just the invocation of a delegate (or a function pointer). You can't do asynchronous operation without some sort of callback and expect to get results of any kind.
Events can be multicast. This means multiple callback calls--so that would be more costly because you are calling multiple methods. But if you're doing that, you probably need to do it--the alternative is to have multiple delegates and call them manually. So there'd be no real benefit. Events can often include sender/eventargs, so you've got that extra object and the creation of the eventargs instance; but I've never seen a situation where that affected performance.
Personally, I don't use the event-based asynchronous pattern. Prior to .NET 4.5, I found the asynchronous programming model to be more ubiquitous. In .NET 4.5 I much prefer the task-based asynchronous pattern (single methods that end in Async instead of two methods, one starting with Begin and one starting with End) because such methods can be used with async/await and are less wordy.
Now, if the question is the difference between new AsyncCallback(Async_Send_Receive.Read_Callback) e.g.:
s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0,
new AsyncCallback(Async_Send_Receive.Read_Callback), so);
and just Async_Send_Receive.Read_Callback e.g.:
s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0,
Async_Send_Receive.Read_Callback, so);
The second is just a short-hand of the first; the AsyncCallback delegate is still created under the covers.
But, as with most things, even if it's generally accepted not to be noticeably different in performance, test and measure. If one way has more benefits (including performance) than another, use that one.
My only advice to you is this: Go with the style that provides the most clarity.
Eliminating an entire language feature because of an unmeasured speed difference would be premature. The cost of method calls/delegate invocations is highly unlikely to be the bottleneck in your code. Sure, you could benchmark the relative cost of one versus another, but if your program is only spending 1% of its time setting up method invocations, then even huge differences won't really affect your program.
My best advice, if you really want to juice your server: make sure that all your IO happens asynchronously, and never run long-running tasks on the thread pool. .NET 4.5 async/await really simplifies all of this; consider using it for more maintainable code.
I have worked with live betting systems using sockets and with two-way active messaging. It really is easier to work with a framework that handles the socket layer, like WCF P2P. It handles all the connection problems for you, so you can concentrate on your business logic.
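To show how little ceremony the async/await style needs compared with Begin/End callbacks, here is a minimal receive loop. It is only a sketch: it works on any Stream (for a socket you would pass a NetworkStream), and the onData delegate is a placeholder for your own message-frame handling.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncReadLoop
{
    // Reads until the end of the stream, handing every chunk to onData.
    // No thread is blocked while a read is pending. Returns total bytes read.
    public static async Task<long> PumpAsync(
        Stream stream, Func<byte[], int, Task> onData, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await onData(buffer, read); // only the first 'read' bytes are valid
            total += read;
        }
        return total;
    }
}
```

One such loop per connected client costs a small state machine object rather than a dedicated thread, which is what matters with thousands of mostly idle connections.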
I have a C# service application which interacts with a database. It was recently migrated from .NET 2.0 to .NET 4.0 so there are plenty of new tools we could use.
I'm looking for pointers to programming approaches or tools/libraries to handle defining tasks, configuring which tasks they depend on, queueing, prioritizing, cancelling, etc.
There are various types of services:
Data (for retrieving and updating)
Calculation (populate some table with the results of a calculation on the data)
Reporting
These services often depend on one another and are triggered on demand, i.e., a Reporting task, will probably have code within it such as
if (IsSomeDependentCalculationRequired())
PerformDependentCalculation(); // which may trigger further calculations
GenerateRequestedReport();
Also, any Data modification is likely to set the Required flag on some of the Calculation or Reporting services, (so the report could be out of date before it's finished generating). The tasks vary in length from a few seconds to a couple of minutes and are performed within transactions.
This has worked OK up until now, but it is not scaling well. There are fundamental design problems and I am looking to rewrite this part of the code. For instance, if two users request the same report at similar times, the dependent tasks will be executed twice. Also, there's currently no way to cancel a task in progress. It's hard to maintain the dependent tasks, etc.
I'm NOT looking for suggestions on how to implement a fix. Rather I'm looking for pointers to what tools/libraries I would be using for this sort of requirement if I were starting in .NET 4 from scratch. Would this be a good candidate for Windows Workflow? Is this what Futures are for? Are there any other libraries I should look at or books or blog posts I should read?
Edit: What about Rx Reactive Extensions?
I don't think your requirements fit into any of the built-in stuff. Your requirements are too specific for that.
I'd recommend that you build a task queueing infrastructure around a SQL database. Your tasks are pretty long-running (seconds) so you don't need particularly high throughput in the task scheduler. This means you won't encounter performance hurdles. It will actually be a pretty manageable task from a programming perspective.
Probably you should build a windows service or some other process that is continuously polling the database for new tasks or requests. This service can then enforce arbitrary rules on the requested tasks. For example it can detect that a reporting task is already running and not schedule a new computation.
My main point is that your requirements are so specific that you need to use C# code to encode them. You cannot make an existing tool fit your needs. You need the Turing completeness of a programming language to do this yourself.
Edit: You should probably separate a task-request from a task-execution. This allows multiple parties to request a refresh of some reports while at the same time only one actual computation is running. Once this single computation is completed all task-requests are marked as completed. When a request is cancelled the execution does not need to be cancelled. Only when the last request is cancelled the task-execution is cancelled as well.
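A rough in-memory sketch of that request/execution split follows. The class and member names are made up for illustration, the result type is fixed to string for brevity, and a real version would persist requests in the database as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SharedExecutions
{
    private sealed class Execution
    {
        public Task<string> Work;
        public CancellationTokenSource Cts;
        public int Requests;
    }

    private readonly object _gate = new object();
    private readonly Dictionary<string, Execution> _running = new Dictionary<string, Execution>();

    // Joins the running execution for this key, or starts one if none exists.
    public Task<string> Request(string key, Func<CancellationToken, Task<string>> work)
    {
        lock (_gate)
        {
            Execution exec;
            if (!_running.TryGetValue(key, out exec))
            {
                var cts = new CancellationTokenSource();
                exec = new Execution { Cts = cts, Work = Task.Run(() => work(cts.Token)) };
                _running[key] = exec;
                var started = exec;
                // Forget the execution once it finishes, whatever the outcome.
                exec.Work.ContinueWith(_ => Forget(key, started), TaskScheduler.Default);
            }
            exec.Requests++;
            return exec.Work;
        }
    }

    // Withdraws one request; the execution is cancelled only when
    // the last outstanding request has been withdrawn.
    public void CancelRequest(string key)
    {
        CancellationTokenSource toCancel = null;
        lock (_gate)
        {
            Execution exec;
            if (_running.TryGetValue(key, out exec) && --exec.Requests == 0)
            {
                _running.Remove(key);
                toCancel = exec.Cts;
            }
        }
        if (toCancel != null) toCancel.Cancel(); // cancel outside the lock
    }

    private void Forget(string key, Execution finished)
    {
        lock (_gate)
        {
            Execution current;
            if (_running.TryGetValue(key, out current) && ReferenceEquals(current, finished))
                _running.Remove(key);
        }
    }
}
```

Two users requesting the same report key get the same Task back, so the computation runs once; cancelling one of the two requests leaves it running for the other.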
Edit 2: I don't think workflows are the solution. Workflows usually operate separately from each other. But you don't want that. You want to have rules which span multiple tasks/workflows. You would be working against the system with a workflow based model.
Edit 3: A few words about the TPL (Task Parallel Library). You mentioned it ("Futures"). If you want some inspiration on how tasks could work together, how dependencies could be created and how tasks could be composed, look at the Task Parallel Library (in particular the Task and TaskFactory classes). You will find some nice design patterns there because it is very well designed. Here is how you model a sequence of tasks: you call Task.ContinueWith, which registers a continuation function as a new task. Here is how you model dependencies: TaskFactory.ContinueWhenAll(Task[], ...) starts a task that only runs when all its input tasks are completed (.NET 4.5 adds Task.WhenAll for the same purpose).
BUT: The TPL itself is probably not well suited for you because its tasks cannot be saved to disk. When you reboot your server or deploy new code, all existing tasks are cancelled and the process is aborted. This is likely to be unacceptable. Please just use the TPL as inspiration. Learn from it what a "task/future" is and how they can be composed. Then implement your own form of tasks.
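For reference, this is roughly what the two composition patterns look like using only .NET 4.0 APIs (ContinueWith for a sequence, TaskFactory.ContinueWhenAll for "run when all inputs are done"). The calculations are trivial placeholders.

```csharp
using System;
using System.Threading.Tasks;

public static class CompositionSketch
{
    public static Task<string> BuildReport()
    {
        // Two independent calculations (hypothetical placeholders).
        Task<int> calcA = Task.Factory.StartNew(() => 40);
        Task<int> calcB = Task.Factory.StartNew(() => 2);

        // Dependency: runs only after both calculations have completed.
        Task<int> total = Task.Factory.ContinueWhenAll<int, int>(
            new[] { calcA, calcB },
            done => done[0].Result + done[1].Result);

        // Sequence: the continuation is registered as a new task.
        return total.ContinueWith(t => "Report total: " + t.Result);
    }
}
```

Your own persistent task model could mirror this shape: a task row plus a list of the task rows it depends on.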
Does this help?
I would try to use the state machine package stateless to model the workflow. Using a package will provide a consistent way to advance the state of the workflow across the various services. Each of your services would hold an internal state machine implementation and expose methods for advancing it. Stateless will be responsible for triggering actions based on the state of the workflow, and it forces you to explicitly set up the various states that it can be in. This will be particularly useful for maintenance, and it will probably help you understand the domain better.
If you want to solve this fundamental problem properly and in a scalable way, you should probably look at the SOA architecture style.
Your services will receive commands and generate events, which you can handle in order to react to things that happen in your system.
And, yes, there are tools for it. For example NServiceBus is a wonderful tool to build SOA systems.
You can set up a SQL agent job to run SQL queries at timed intervals. Beyond that, it looks like you have to write the application yourself: a long-running program that checks the time and does something. I don't think there are clear-cut tools out there for what you are trying to do. Write a C# application or a WCF service; the data automation can be done in SQL itself.
If I understand you correctly, you want to cache the generated reports and not do the work again. As other commenters have pointed out, this can be solved elegantly with multiple producer/consumer queues and some caches.
First you enqueue your report request. Based on the report generation parameters, you can check the cache first to see whether a previously generated report is already available, and simply return that one. If the report becomes obsolete due to changes in the database, you need to take care that the cache is invalidated in a reliable manner.
Now, if the report has not been generated yet, you need to schedule it for generation. The report scheduler needs to check whether the same report is already being generated. If so, register an event to notify you when it is completed, and return the report once it is finished. Make sure that you do not access the data via the caching layer, since it could produce races (report is generated, data is changed, and the finished report would be immediately discarded by the cache, leaving nothing for you to return).
Or, if you do want to avoid returning outdated reports, you can let the caching layer become your main data provider, which keeps producing reports until one is generated that is not already outdated. But be aware that if your database changes constantly you might enter an endless loop here, constantly generating invalid reports, if the report generation time is longer than the average time between two changes to your db.
As you can see, you have plenty of options here without actually talking about .NET, TPL or SQL Server. First you need to set your goals for how fast, scalable and reliable your system should be; then you need to choose the appropriate architecture/design, as described above, for your particular problem domain. I cannot do it for you because I do not have your full domain know-how about what is acceptable and what is not.
The tricky part is the handover part between different queues with the proper reliability and correctness guarantees. Depending on your specific report generation needs you can put this logic into the cloud or use a single thread by putting all work into the proper queues and work on them concurrently or one by one or something in between.
TPL and SQL Server can help there for sure, but they are only tools. If they are used wrongly due to insufficient experience with one or the other, it might turn out that a different approach (like using only in-memory queues and reports persisted in the file system) is better suited to your problem.
From my current understanding, I would not misuse SQL Server as a cache, but if you want a database I would use something like RavenDB or RaportDB, which look stable and much more lightweight compared to a full-blown SQL Server.
But if you already have a SQL server running then go ahead and use it.
I am not sure if I understood you correctly, but you might want to have a look at JAMS Scheduler: http://www.jamsscheduler.com/. It's non-free, but a very good system for scheduling dependent tasks and reporting. I have used it with success at my previous company. It's written in .NET and there is a .NET API for it, so you can write your own apps communicating with JAMS. They also have very good support and are eager to implement new features.
I might inherit a somewhat complex multithreaded application, which currently has several files with 2+k loc, lots of global variables accessed from everywhere and other practices that I would consider quite smelly.
Before I start adding new features with the current patterns, I'd like to try and see if I can make the basic architecture of the application better. Here's a short description :
App has in memory lists of data, listA, listB
App has local copy of the data (for offline functionality) dataFileA, dataFileB
App has threads tA1, tB1 which update dirty data from client to server
Threads tA2, tB2 update dirty data from server to client
Threads tA3, tB3 update dirty data from in memory lists to local files
I'm having trouble working out which patterns, strategies, programming practices etc. I should look into in order to have the knowledge to make the best decisions on this.
Here's some goals I've invented for myself:
Keep the app as stable as possible
Make it easy for Generic Intern to add new features (big no-no to 50 lines of boilerplate code in each new EditRecordX.cs )
Reduce complexity
Thanks for any keywords or other tips which could help me on this project.
To Quibblesome's excellent suggestions, I might also add that using immutable objects is often an effective way to reduce the risk of threading problems. (Immutable objects, like strings in .NET and Java, cannot be modified once they are created.)
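As a small illustration of the immutable-object idea, here is a sketch of a record type whose instances can be shared between the sync threads without locks. The type and its members are invented for the example and are not taken from the application in question.

```csharp
using System;

public sealed class CustomerRecord
{
    public int Id { get; }
    public string Name { get; }
    public bool IsDirty { get; }

    public CustomerRecord(int id, string name, bool isDirty)
    {
        Id = id;
        Name = name;
        IsDirty = isDirty;
    }

    // "Modifying" returns a new instance; the original never changes,
    // so a thread that still holds it cannot see a half-updated record.
    public CustomerRecord WithName(string name)
    {
        return new CustomerRecord(Id, name, true);
    }

    public CustomerRecord MarkClean()
    {
        return new CustomerRecord(Id, Name, false);
    }
}
```

The only shared mutable state left is the reference in the list that points at the current version of each record, and that swap is a much smaller thing to protect than every field of every record.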
I'd suggest another goal would be to remove/reduce global state and keep information on the stack as often as possible to reduce the possibility of race conditions and weird threading issues.
Perhaps it might be worth seeing if you can incorporate tA2, tB2, tA3 and tB3 into the same threads to kill a few. If that isn't possible consider putting them behind a facade (a thread that concerns itself with moving data requests between the UI and the service that is talking to the server). This is so the "user facing" code only has to deal with one client as opposed to two. (I don't count the backup as a client as this sounds like a one-way process).
If the threads (UI and facade) wait for one another to finish their requests then this should prevent a "pull update" happening at the same time as a "push update".
For making these kinds of changes in general, you will want to look at Martin Fowler's Refactoring: Improving the Design of Existing Code (much of which is on the refactoring website) and also Refactoring to Patterns. You might also find Working Effectively with Legacy Code useful in supporting safe changes. None of this helps specifically with multithreading, except in that simpler code is easier to handle in a multithreaded environment.
I think you should take a look at this: http://msdn.microsoft.com/en-us/concurrency/default.aspx
And this blog entry: http://blogs.msdn.com/pfxteam/
And this: http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx
Hope it helps.