How to run parallel code for multiple search methods - c#

I'm working on some code that I would like to improve. It is a search method: based on an input, I would like to search for this value in multiple tables of my database.
public async Task<IEnumerable<SearchResponseModel>> Search(string input)
{
var listOfSearchResponse = new List<SearchResponseModel>();
listOfSearchResponse.AddRange(await SearchOrder(input));
listOfSearchResponse.AddRange(await SearchJob(input));
listOfSearchResponse.AddRange(await SearchClient(input));
listOfSearchResponse.AddRange(await SearchItem(input));
listOfSearchResponse.AddRange(await SearchProduction(input));
return listOfSearchResponse;
}
I use the word await because every search is defined like this one:
public async Task<IEnumerable<SearchResponseModel>> SearchOrder(string input) {...}
My five search methods do not yet run concurrently: each one starts only after the previous one has finished. What should I do from here to make them run in parallel?

I would think that something like this should work, in theory:
var tasks = new[]
{
SearchOrder(input),
SearchJob(input),
SearchClient(input),
SearchItem(input),
SearchProduction(input)
};
await Task.WhenAll(tasks);
//var listOfSearchResponse = tasks.Select(t => t.Result).ToList();
var listOfSearchResponse = tasks.SelectMany(t => t.Result).ToList();
In practice, it's hard to know how much benefit you'll see.
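To make the pattern above concrete, here is a minimal, self-contained sketch. FakeSearch is a hypothetical stand-in for the real SearchOrder, SearchJob, etc.; it only waits and returns one string, and the delays are made up. The point it tries to show: the searches overlap only if each method actually awaits something asynchronous, because an async method runs synchronously up to its first await of an incomplete task.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSearchSketch
{
    // Hypothetical stand-in for SearchOrder, SearchJob, etc.
    private static async Task<IEnumerable<string>> FakeSearch(string table, string input, int delayMs)
    {
        await Task.Delay(delayMs); // simulates an awaited database call
        return new[] { $"{table}:{input}" };
    }

    public static async Task<List<string>> Search(string input)
    {
        // Calling the methods starts every search before any of them is awaited.
        var tasks = new[]
        {
            FakeSearch("Order", input, 300),
            FakeSearch("Job", input, 200),
            FakeSearch("Client", input, 100),
        };

        // Task.WhenAll returns the results in the same order as the tasks array.
        IEnumerable<string>[] results = await Task.WhenAll(tasks);
        return results.SelectMany(r => r).ToList();
    }
}
```

With these made-up delays the whole call should take roughly as long as the slowest search rather than the sum of all three. If the real search methods block instead of awaiting, they will still run one after another.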

It's worth considering Microsoft's Reactive Framework (aka Rx): install the System.Reactive NuGet package and add using System.Reactive.Linq; then you can do this:
public IObservable<SearchResponseModel> SearchObservable(string input) =>
Observable.Defer<SearchResponseModel>(() =>
new []
{
Observable.FromAsync(() => SearchOrder(input)),
Observable.FromAsync(() => SearchJob(input)),
Observable.FromAsync(() => SearchClient(input)),
Observable.FromAsync(() => SearchItem(input)),
Observable.FromAsync(() => SearchProduction(input)),
}
.Merge()
.SelectMany(x => x));
The advantage here is that as each search completes you get the partial results through from the observable - there's no need to wait until all the tasks have finished.
Observables signal each value as it is produced, and they signal a final completion so you know when all of the results are through.

Related

rx.net locking up from use of ToEnumerable

I am trying to convert the below statement so that I can get the key alongside the selected list:
var feed = new Subject<TradeExecuted>();
feed
.GroupByUntil(x => (x.Execution.Contract.Symbol, x.Execution.AccountId, x.Tenant, x.UserId), x => Observable.Timer(TimeSpan.FromSeconds(5)))
.SelectMany(x => x.ToList())
.Select(trades => Observable.FromAsync(() => Mediator.Publish(trades, cts.Token)))
.Concat() // Ensure that the results are serialized.
.Subscribe(cts.Token); // Check status of calls.
The above works, whereas the below does not - when I try to iterate over the list, it locks up.
feed
.GroupByUntil(x => (x.Execution.Contract.Symbol, x.Execution.AccountId, x.Tenant, x.UserId), x => Observable.Timer(timespan))
.Select(x => Observable.FromAsync(() =>
{
var list = x.ToEnumerable(); // <---- LOCK UP if we use list.First() etc
var aggregate = AggregateTrades(x.Key.Symbol, x.Key.AccountId, x.Key.Tenant, list);
return Mediator.Publish(aggregate, cts.Token);
}))
.Concat()
.Subscribe(cts.Token); // Check status of calls.
I am clearly doing something wrong and probably horrific!
Going back to the original code, how can I get the Key alongside the enumerable list (and avoiding the hack below)?
As a side note, the below code works, but it is a nasty hack where I get the keys from the first list item:
feed
.GroupByUntil(x => (x.Execution.Contract.Symbol, x.Execution.AccountId, x.Tenant, x.UserId), x => Observable.Timer(TimeSpan.FromSeconds(5)))
.SelectMany(x => x.ToList())
.Select(trades => Observable.FromAsync(() =>
{
var firstTrade = trades.First();
var aggregate = AggregateTrades(firstTrade.Execution.Contract.Symbol, firstTrade.Execution.AccountId, firstTrade.Tenant, trades);
return Mediator.Publish(aggregate, cts.Token);
}))
.Concat() // Ensure that the results are serialized.
.Subscribe(cts.Token); // Check status of calls.
All versions of your code suffer from trying to eagerly evaluate the grouped sub-observable. Since in v1 and v3 your group observable will run a maximum of 5 seconds, that isn't horrible/awful, but it's still not great. In v2, I don't know what timespan is, but assuming it's 5 seconds, you have the same problem: Trying to turn the grouped sub-observable into a list or an enumerable means waiting for the sub-observable to complete, blocking the thread (or the task).
You can fix this by using the Buffer operator to lazily evaluate the grouped sub-observable:
var timespan = TimeSpan.FromSeconds(5);
feed
.GroupByUntil(x => (x.Execution.Contract.Symbol, x.Execution.AccountId, x.Tenant, x.UserId), x => Observable.Timer(timespan))
.SelectMany(x => x
.Buffer(timespan)
.Select(list => Observable.FromAsync(() =>
{
var aggregate = AggregateTrades(x.Key.Symbol, x.Key.AccountId, x.Key.Tenant, list);
return Mediator.Publish(aggregate, cts.Token);
}))
)
.Concat() // Ensure that the results are serialized.
.Subscribe(cts.Token); // Check status of calls.
This essentially means that until timespan is up, the items in each group gather in a list inside Buffer. Once timespan is up, they're released as a list, and the mediator publish happens.

Rx.Net - process groups asynchronously and in parallel with a constrained concurrency

I'm playing with System.Reactive, trying to solve the following task -
Break an incoming stream of strings into groups
Items in each group must be processed asynchronously and sequentially
Groups must be processed in parallel
No more than N groups must be processed at the same time
Ideally, w/o using sync primitives
Here is the best I've figured out so far -
TaskFactory taskFactory = new (new LimitedConcurrencyLevelTaskScheduler(2));
TaskPoolScheduler scheduler = new (taskFactory);
source
.GroupBy(item => item)
.SelectMany(g => g.Select(item => Observable.FromAsync(() => onNextAsync(item))).ObserveOn(scheduler).Concat())
.Subscribe();
Any idea how to achieve it w/o a scheduler? Couldn't make it work via Merge()
The easiest way to enforce the "No more than N groups must be processed at the same time" limitation, is probably to use a SemaphoreSlim. So instead of this:
.SelectMany(g => g.Select(item => Observable.FromAsync(() => onNextAsync(item))).Concat())
...you can do this:
var semaphore = new SemaphoreSlim(N, N);
//...
.SelectMany(g => g.Select(item => Observable.FromAsync(async () =>
{
await semaphore.WaitAsync();
try { return await onNextAsync(item); }
finally { semaphore.Release(); }
})).Merge(1))
Btw in the current Rx version (5.0.0) I don't trust the Concat operator, and I prefer to use the Merge(1) instead.
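For readers who want to see the semaphore idea in isolation, here is a minimal sketch in plain Task code rather than Rx. It assumes a finite in-memory collection instead of an observable stream, so it is not a replacement for the operator chain above. GroupThrottleSketch, RunAsync and processAsync are hypothetical names; the peak counter exists only so the limit can be observed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class GroupThrottleSketch
{
    // Processes items group by group; returns the highest number of
    // items that were being processed at the same moment.
    public static async Task<int> RunAsync(
        IEnumerable<string> items, int maxConcurrency, Func<string, Task> processAsync)
    {
        var gate = new object();
        int running = 0, peak = 0;

        using (var semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency))
        {
            List<Task> groupTasks = items
                .GroupBy(item => item)
                .Select(async group =>
                {
                    // Items of one group are awaited one after another.
                    foreach (var item in group)
                    {
                        await semaphore.WaitAsync(); // global limit across groups
                        try
                        {
                            lock (gate) { running++; peak = Math.Max(peak, running); }
                            await processAsync(item);
                        }
                        finally
                        {
                            lock (gate) { running--; }
                            semaphore.Release();
                        }
                    }
                })
                .ToList(); // ToList starts one task per group right away

            await Task.WhenAll(groupTasks);
        }
        return peak;
    }
}
```

Because every item holds the semaphore while it is processed, at most maxConcurrency items, and therefore at most maxConcurrency groups, are active at any time.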
To solve this problem using exclusively Rx tools, ideally you would like to have something like this:
source
.GroupBy(item => item.Key)
.Select(group => group.Select(
item => Observable.FromAsync(() => ProcessAsync(item))).Merge(1))
.Merge(maxConcurrent: N)
.Wait();
The inner Merge(1) would enforce the exclusive processing within each group, and the outer Merge(N) would enforce the global maximum concurrency policy. Unfortunately this doesn't work because the outer Merge(N) restricts the subscriptions to the inner sequences (the IGroupedObservable<T>s), not to their individual elements. This is not what you want. The result will be that only the first N groups will be processed, and the elements of all other groups will be ignored. The GroupBy operator creates hot subsequences, and if you don't subscribe to them immediately you'll lose elements.
In order for the outer Merge(N) to work as desired, you'll have to merge freely all the inner sequences that are produced by the Observable.FromAsync, and have some other mechanism to serialize the processing of each group. One idea is to implement a special Select operator that emits an Observable.FromAsync only after the previous one is completed. Below is such an implementation, based on the Zip operator. The Zip operator maintains internally two hidden buffers, so that it can produce pairs from two sequences that might emit elements with different frequencies. This buffering is exactly what we need in order to avoid losing elements.
private static IObservable<IObservable<TResult>> SelectOneByOne<TSource, TResult>(
this IObservable<TSource> source,
Func<TSource, IObservable<TResult>> selector)
{
var subject = new BehaviorSubject<Unit>(default);
var synchronizedSubject = Observer.Synchronize(subject);
return source
.Zip(subject, (item, _) => item)
.Select(item => selector(item).Do(
_ => { },
_ => synchronizedSubject.OnNext(default),
() => synchronizedSubject.OnNext(default)));
}
The BehaviorSubject<T> contains initially one element, so the first pair will be produced immediately. The second pair will not be produced before the first element has been processed. The same with the third pair and second element, etc.
You could then use this operator to solve the problem like this:
source
.GroupBy(item => item.Key)
.SelectMany(group => group.SelectOneByOne(
item => Observable.FromAsync(() => ProcessAsync(item))))
.Merge(maxConcurrent: N)
.Wait();
The above solution is presented only for the purpose of answering the question. I don't think that I would trust it in a production environment.

How can I get the first async response fastest (and don't perform the remainder)?

Here's the setup: There is a federal remote service which returns whether a particular value is correct or not correct. We can send requests as we like, with up to 50 values per request to the remote service.
Since we need to only use the correct value, and the set of possible values is small (~700), we can just send 15 or so batch requests of 50 and the correct value will be part of the result set. As such, I've used the following code:
Observable
.Range(0, requests.Count)
.Select(i => Observable.FromAsync(async () =>
{
responses.Add(await client.FederalService.VerifyAsync(requests[i]));
Console.Write(".");
}))
.Merge(8)
.Wait();
But - what I don't like about this is that if one of the earlier requests has the correct value, I still run all the possibilities through the service wasting time. I'm trying to make this run as fast as possible. I know the exit condition (response code is from 1 to 99, any response code within 50-59 indicates the value is "correct").
Is there a way to make this code a little smarter, so we minimize the number of requests? Unfortunately, the value we are verifying is distributed evenly so sorting the requests does nothing (that I'm aware of).
You should consider usage of the FirstAsync method here:
The secret in our example is the FirstAsync method. We are actually awaiting the first result returned by our observable and don’t care about any further results.
So your code could be like this:
await Observable
.Range(0, requests.Count)
.Select(i => Observable.FromAsync(async () =>
{
responses.Add(await client.FederalService.VerifyAsync(requests[i]));
Console.Write(".");
}))
.FirstAsync()
.Subscribe(Console.WriteLine);
> System.Reactive.Linq.ObservableImpl.Defer`1[System.Reactive.Unit]
The article Rx and Await: Some Notes provides some tricks with similar methods. For example, FirstAsync has an overload that takes a predicate, like LINQ's First method:
await Observable
.Range(0, requests.Count)
.Select(i => Observable.FromAsync(async () =>
{
responses.Add(await client.FederalService.VerifyAsync(requests[i]));
Console.Write(".");
}))
.FirstAsync(r => /* do the check here */)
.Subscribe(Console.WriteLine);
You're pretty close. Change your observable to this:
Observable
.Range(0, requests.Count)
.Select(i => Observable.FromAsync(async () =>
{
var response = await Task.FromResult(i); //replace with client.FederalService.VerifyAsync(requests[i])
responses.Add(response);
Console.Write($"{i}.");
var responseCode = response; //replace with however you get the response code.
return responseCode >= 50 && responseCode <= 59;
}))
.Merge(8)
.Where(b => b)
.Take(1)
.Wait();
This way your observable continues to emit values, so you can continue acting on it.
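The same "stop at the first correct response" idea can be sketched without Rx, using a SemaphoreSlim for the limit of concurrent requests and a CancellationTokenSource to stop requests that have not started yet. This is a swapped-in technique for comparison, not the Merge/Where/Take chain above. FirstMatchSketch, FindFirstCorrectAsync and verifyAsync are hypothetical names; verifyAsync is assumed to map a request index to a response code.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FirstMatchSketch
{
    // Returns the index of a request whose response code is in 50-59, or -1 if none is.
    public static async Task<int> FindFirstCorrectAsync(
        int requestCount, Func<int, Task<int>> verifyAsync, int maxConcurrent)
    {
        int found = -1;
        using (var cts = new CancellationTokenSource())
        using (var throttle = new SemaphoreSlim(maxConcurrent, maxConcurrent))
        {
            var workers = Enumerable.Range(0, requestCount).Select(async i =>
            {
                try { await throttle.WaitAsync(cts.Token); }
                catch (OperationCanceledException) { return; } // a match was already found

                try
                {
                    int code = await verifyAsync(i);
                    if (code >= 50 && code <= 59)
                    {
                        Interlocked.CompareExchange(ref found, i, -1); // keep the first match only
                        cts.Cancel(); // requests still waiting for a slot never start
                    }
                }
                finally { throttle.Release(); }
            }).ToList();

            await Task.WhenAll(workers);
        }
        return found;
    }
}
```

Requests that are already in flight when the match arrives still run to completion in this sketch; only the ones that have not acquired a slot are skipped.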

Rx extensions Parallel.ForEach throttling

I'm following the answer to this question: Rx extensions: Where is Parallel.ForEach? in order to run a number of operations in parallel using Rx.
The problem I'm running into is that it seems to be allocating a new thread for every request, whereas Parallel.ForEach used considerably fewer.
The processes I'm running in parallel are quite memory intensive, so if I'm trying to process hundreds of items at once the answer provided to the linked question quickly sees me running out of memory.
Is there a way I can modify that answer to throttle the number of items being done at any given time?
I've taken a look at the Window and Buffer operations, my code looks like this:
return inputs.Select(i => new AccountViewModel(i))
.ToObservable()
.ObserveOn(RxApp.MainThreadScheduler)
.ToList()
.Do(l =>
{
using (Accounts.SuppressChangeNotifications())
{
Accounts.AddRange(l);
}
})
.SelectMany(x => x)
.SelectMany(acc => Observable.StartAsync(async () =>
{
var res = await acc.ProcessAsync(config, m, outputPath);
processed++;
var prog = ((double) processed/inputs.Count())*100.0;
OverallProgress.Message.OnNext(string.Format("Processing Accounts ({0:000}%)", prog));
OverallProgress.Progress.OnNext(prog);
return res;
}))
.All(x => x);
Ideally I want to be able to batch it up into chunks of account view models, that I then call the ProcessAsync method on, and only once all of that batch are done move on.
Ideally I'd like it so that if even only one of the batch finished, it moved on, but only ever kept the same batch size.
So if I've got a batch of 5 and 1 finishes, I'd like another to start, but only one until more space is available.
As usual, Paul Betts has answered a similar question in a way that solves my problem.
The question "Reactive Extensions Parallel processing based on specific number" has some information on using Observable.Defer and then merging into batches. Using that, I've modified my previous code like so:
return inputs.Select(i => new AccountViewModel(i))
.ToObservable()
.ObserveOn(RxApp.MainThreadScheduler)
.ToList()
.Do(l =>
{
using (Accounts.SuppressChangeNotifications())
{
Accounts.AddRange(l);
}
})
.SelectMany(x => x)
.Select(x => Observable.DeferAsync(async _ =>
{
var res = await x.ProcessAsync(config, m, outputPath);
processed++;
var prog = ((double) processed/inputs.Count())*100.0;
OverallProgress.Message.OnNext(string.Format("Processing Accounts ({0:000}%)", prog));
OverallProgress.Progress.OnNext(prog);
return Observable.Return(res);
}))
.Merge(5)
.All(x => x);
And sure enough, I get the rolling completion behaviour (e.g. if 1/5 finish then just one starts).
Clearly I've got a few more fundamentals to grasp, but this is brilliant!

Rx extensions: Where is Parallel.ForEach?

I have a piece of code which is using Parallel.ForEach, probably based on an old version of Rx extensions or the Task Parallel Library. I installed a current version of Rx extensions but cannot find Parallel.ForEach. I'm not using any other fancy stuff of the library and just want to process some data in parallel like this:
Parallel.ForEach(records, ProcessRecord);
I found this question, but I would not like to depend on an old version of Rx. I was not able to find something similar for Rx, so what's the current and most straightforward way to do that using a current Rx version? The project is using .NET 3.5.
No need to do all this silly goosery if you have Rx:
records.ToObservable()
.SelectMany(x => Observable.Start(() => ProcessRecord(x), Scheduler.ThreadPoolScheduler))
.ToList()
.First();
(Or, if you want the order of the items maintained at the cost of efficiency):
records.ToObservable()
.Select(x => Observable.Start(() => ProcessRecord(x), Scheduler.ThreadPoolScheduler))
.Concat()
.ToList()
.First();
Or if you want to limit how many items at the same time:
records.ToObservable()
.Select(x => Observable.Defer(() => Observable.Start(() => ProcessRecord(x), Scheduler.ThreadPoolScheduler)))
.Merge(5 /* at a time */)
.ToList()
.First();
Here's a simple replacement:
class Parallel
{
public static void ForEach<T>(IEnumerable<T> source, Action<T> body)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (body == null)
{
throw new ArgumentNullException("body");
}
var items = new List<T>(source);
var countdown = new CountdownEvent(items.Count);
WaitCallback callback = state =>
{
try
{
body((T)state);
}
finally
{
countdown.Signal();
}
};
foreach (var item in items)
{
ThreadPool.QueueUserWorkItem(callback, item);
}
countdown.Wait();
}
}
In case someone finds this thread nowadays, the updated version of the answer would be:
records.AsParallel().WithDegreeOfParallelism(5).ForAll(x => ProcessRecord(x));
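A small self-contained sketch of that PLINQ call follows. PlinqSketch and SquareAll are hypothetical names, and squaring an integer stands in for ProcessRecord. Note that PLINQ ships with .NET Framework 4 and later, so it does not apply to the .NET 3.5 project from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class PlinqSketch
{
    public static int[] SquareAll(int[] records)
    {
        // ForAll may run the delegate on several threads, so collect into a thread-safe bag.
        var results = new ConcurrentBag<int>();

        records.AsParallel()
               .WithDegreeOfParallelism(5) // use at most 5 concurrent workers
               .ForAll(x => results.Add(x * x));

        // ForAll gives no ordering guarantee, so sort before returning.
        return results.OrderBy(x => x).ToArray();
    }
}
```

ForAll blocks until every element has been processed, which matches the behaviour of Parallel.ForEach(records, ProcessRecord).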
