I'm new to parallel programming, but I've searched a lot of blogs and other sites (including SO) trying to achieve the following: I need to call an external web service (whose code I can't access) multiple times. Every call takes about 10 seconds to process, so I decided to use Parallel to gain performance.
public BlockingCollection<TransacaoResponse> ExecutarAsync(TransacaoRequest request)
{
    BlockingCollection<TransacaoResponse> listResponse = new BlockingCollection<TransacaoResponse>();
    Parallel.ForEach<Item>(request.Itens, new ParallelOptions() { MaxDegreeOfParallelism = 10 }, (c) =>
    {
        TransacaoResponse responseInner = new TransacaoResponse();
        int result = _operadora.EnviarTransacao(request.Codigo);
        responseInner.Status = result;
        listResponse.Add(responseInner);
    });
    return listResponse;
}
Even using Parallel.ForEach and BlockingCollection, it takes 50 seconds to return the result (I tested calling this method 5 times), the same time as a standard foreach. What am I missing? Is there a bottleneck within this code? Thanks.
Related
I want to increase the performance of a procedure which invokes a web service multiple times sequentially and stores the results in a list.
Because a single call to the WS lasts 1 second and I need to make about 300 calls to the web service, doing the job sequentially takes 300 seconds. That's why I changed the procedure implementation to multithreading using the following piece of code:
List<WCFResult> resultList = new List<WCFResult>();
List<Task> tasks = new List<Task>(); // declaration was missing from the original snippet
using (var ws = new WCFService(binding, endpoint))
{
    foreach (var singleElement in listOfelements)
    {
        Action action = () =>
        {
            var singleResult = ws.Call(singleElement);
            resultList.Add(singleResult);
        };
        tasks.Add(Task.Factory.StartNew(action, TaskCreationOptions.LongRunning));
    }
}
Task.WaitAll(tasks.ToArray());
//Do other stuff with the resultList...
With this code I save only about 0.1 seconds per element, which is less than I expected. Do you know any further optimization I can do? Or can you share an alternative?
Using the following code, all the requests are handled in half the time:
ParallelOptions ops = new ParallelOptions();
ops.MaxDegreeOfParallelism = 16;
ConcurrentBag<WCFResult> resultList = new ConcurrentBag<WCFResult>();
Parallel.ForEach(allItems, ops, item =>
{
    // Each iteration uses its own client instance
    var ws = new WCFClient(binding, endpoint);
    var result = ws.Call(item);
    ws.Close();
    resultList.Add(result);
});
//Do other stuff with the resultList...
Mission accomplished. I also changed the result list to a ConcurrentBag instead of a List, since List<T> is not safe for concurrent writes.
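To illustrate the ConcurrentBag change, here is a minimal sketch. FakeCall is a hypothetical stand-in for ws.Call (it just sleeps and doubles its input), and the class and method names are made up for the example:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentResultsDemo
{
    // Hypothetical stand-in for ws.Call(item): blocks briefly, then returns a value.
    private static int FakeCall(int item)
    {
        Thread.Sleep(10);
        return item * 2;
    }

    // Calls FakeCall for every item in parallel and collects the results thread-safely.
    public static int[] CallAll(int[] items, int maxParallelism)
    {
        var results = new ConcurrentBag<int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        Parallel.ForEach(items, options, item => results.Add(FakeCall(item)));
        // ConcurrentBag does not preserve insertion order, so sort before returning.
        return results.OrderBy(r => r).ToArray();
    }

    public static void Main()
    {
        int[] results = CallAll(Enumerable.Range(1, 50).ToArray(), 16);
        Console.WriteLine(results.Length); // one result per input item
    }
}
```

With a plain List<int> in place of the bag, concurrent Add calls could lose items or throw, which is why the collection type matters here.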
I have 1000 elements in a TPL Dataflow block. Each element will call an external web service. The web service supports a maximum of 10 simultaneous calls, which is easily achieved using:
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
...
}
The web service requires each call to pass a unique id which distinguishes it from the other simultaneous calls.
In theory this should be a GUID, but in practice the 11th GUID will fail, because the throttling mechanism on the server is slow to recognise that the first call has finished.
The vendor suggests we recycle the GUIDs, keeping 10 in active use.
I intend to have an array of GUIDs; each task will use (Interlocked.Increment(ref COUNTER) % 10) as the array index.
EDIT:
I just realised this won't work!
It assumes tasks will complete in order, which they may not.
I could implement this as a queue of IDs where each task borrows and returns one, but the question still stands: is there an easier, pre-built, thread-safe way to do this?
(there will never be enough calls for COUNTER to overflow)
But I've been surprised a number of times in C# (I'm new to .NET) to find that I was implementing something that already exists.
Is there a better thread-safe way for each task to recycle from a pool of ids?
Creating resource pools is exactly the situation System.Collections.Concurrent.ConcurrentBag<T> is useful for. Wrap it in a BlockingCollection<T> to make the code easier.
class Example
{
    private readonly BlockingCollection<Guid> _guidPool;
    private readonly TransformBlock<Foo, Bar> _transform;
    public Example(int concurrentLimit)
    {
        // Bounded pool that holds exactly concurrentLimit ids
        _guidPool = new BlockingCollection<Guid>(new ConcurrentBag<Guid>(), concurrentLimit);
        for (int i = 0; i < concurrentLimit; i++)
        {
            _guidPool.Add(Guid.NewGuid());
        }
        _transform = new TransformBlock<Foo, Bar>(foo => SomeAction(foo),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = concurrentLimit
                //...
            });
        //...
    }
    private async Task<Bar> SomeAction(Foo foo)
    {
        // Borrow an id; Take() blocks until one is available
        var id = _guidPool.Take();
        try
        {
            //...
        }
        finally
        {
            // Return the id to the pool even if the call throws
            _guidPool.Add(id);
        }
    }
}
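The borrow-and-return pattern above can be tried without Dataflow or a real web service. The following is only a sketch under assumptions: Parallel.For replaces the TransformBlock, Thread.Sleep stands in for the web service call, and GuidPoolDemo and Run are hypothetical names. It counts how many distinct ids were used and whether any id was ever held by two calls at once:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class GuidPoolDemo
{
    // Runs callCount simulated calls against a pool of poolSize ids.
    // Returns the number of distinct ids used; collisions counts the times
    // an id was handed out while another call was still holding it.
    public static int Run(int poolSize, int callCount, out int collisions)
    {
        var pool = new BlockingCollection<Guid>(new ConcurrentBag<Guid>(), poolSize);
        for (int i = 0; i < poolSize; i++)
        {
            pool.Add(Guid.NewGuid());
        }

        var used = new ConcurrentDictionary<Guid, int>();
        var inUse = new ConcurrentDictionary<Guid, bool>();
        int collisionCount = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = poolSize };

        Parallel.For(0, callCount, options, i =>
        {
            Guid id = pool.Take(); // blocks until an id is free
            try
            {
                if (!inUse.TryAdd(id, true))
                {
                    Interlocked.Increment(ref collisionCount);
                }
                used.AddOrUpdate(id, 1, (key, count) => count + 1);
                Thread.Sleep(5); // hypothetical stand-in for the web service call
            }
            finally
            {
                inUse.TryRemove(id, out _);
                pool.Add(id); // hand the id back for the next call
            }
        });

        collisions = collisionCount;
        return used.Count;
    }

    public static void Main()
    {
        int distinct = Run(10, 200, out int collisions);
        Console.WriteLine(distinct + " distinct ids, " + collisions + " collisions");
    }
}
```

Because an id is only added back after it has been taken, the pool never exceeds its bound and no more than poolSize ids are ever in circulation.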
I have an API that must call 4 HttpClients in parallel while supporting 500 concurrent users per second (all of them calling the API at the same time).
There must be a strict timeout so that the API returns a result even if not all the HttpClient calls have returned a value.
The endpoints are external third-party APIs; I have no control over them and don't know their code.
I did extensive research on the matter, but even though many solutions work, I need the one that consumes as little CPU as possible, since I have a low server budget.
So far I came up with this:
var conn0 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var conn1 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var conn2 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var conn3 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var list = new List<HttpClient>() { conn0, conn1, conn2, conn3 };
var timeout = TimeSpan.FromMilliseconds(1000);
var allTasks = new List<Task<Task>>();
//the async DoCall method just calls the HttpClient endpoint and returns a MyResponse object
foreach (var call in list)
{
    allTasks.Add(Task.WhenAny(DoCall(call), Task.Delay(timeout)));
}
var completedTasks = await Task.WhenAll(allTasks);
var allResults = completedTasks.OfType<Task<MyResponse>>().Select(task => task.Result).ToList();
return allResults;
I use WhenAny with two tasks, one for the call and one for the timeout. If the call task is late, the timeout task returns anyway.
Now, this code works perfectly and everything is async, but I wonder if there is a better way of achieving this.
Every single call to this API creates a lot of threads, and with 500 concurrent users it needs an average of 8 (eight) D3_V2 Azure 4-core machines, resulting in crazy expenses; the higher the timeout, the higher the CPU use.
Is there a better way to do this without using so many CPU resources (maybe Parallel LINQ is a better choice than this)?
Is the HttpClient timeout alone sufficient to stop the call and return if the endpoint does not reply in time, without having to use the second task in WhenAny?
UPDATE:
The endpoints are third-party APIs; I don't know the code or have any control over them. The calls are made in JSON and return JSON or a string.
Some of them occasionally reply after 10+ seconds, or get stuck and are extremely slow, so the timeout is there to free the threads and return, even with partial data, from the others that returned in time.
Caching is possible but only partially, since the data changes all the time, like stocks and real-time forex currency trading.
Your approach of using two tasks just for the timeout does work, but you can do better: use a CancellationToken for the task and for getting the answers from the server:
var cts = new CancellationTokenSource();
// set the timeout equal to the 1 second
cts.CancelAfter(1000);
// provide the token for your request
var response = await client.GetAsync(url, cts.Token);
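After that, you can simply filter the tasks that completed successfully:
var allResults = completedTasks
    // IsCompleted is also true for cancelled or faulted tasks, so check the status instead
    .Where(t => t.Status == TaskStatus.RanToCompletion)
    .Select(task => task.Result).ToList();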
This approach at least halves the number of tasks you create and reduces the overhead on your server. It also gives you a simple way to cancel part of the processing, or all of it. If your tasks are completely independent of each other, you may use Parallel.ForEach to call the HTTP clients, while still using the token to cancel the operation:
// note: if DoCall is async, the loop does not await the tasks it returns
ParallelLoopResult result = Parallel.ForEach(list, call => DoCall(call, cts.Token));
// handle the result of the parallel tasks
Or, using PLINQ:
var results = list
.AsParallel()
.Select(call => DoCall(call, cts.Token))
.ToList();
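As a rough end-to-end sketch of the token-based timeout: FakeCall below is a hypothetical stand-in for DoCall that uses Task.Delay instead of a real HttpClient request, and the latencies and timeout values are made up. Calls that miss the deadline are cancelled, and only those that ran to completion contribute a result:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Hypothetical stand-in for DoCall: completes after latencyMs unless cancelled first.
    private static async Task<string> FakeCall(int latencyMs, CancellationToken token)
    {
        await Task.Delay(latencyMs, token);
        return "reply after " + latencyMs + " ms";
    }

    // Starts one call per latency and returns the replies that arrived before the timeout.
    public static async Task<List<string>> CallAllAsync(int[] latenciesMs, int timeoutMs)
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(timeoutMs);
            Task<string>[] calls = latenciesMs
                .Select(ms => FakeCall(ms, cts.Token))
                .ToArray();
            try
            {
                await Task.WhenAll(calls);
            }
            catch (OperationCanceledException)
            {
                // At least one call hit the timeout; keep whatever finished in time.
            }
            return calls
                .Where(t => t.Status == TaskStatus.RanToCompletion)
                .Select(t => t.Result)
                .ToList();
        }
    }

    public static void Main()
    {
        List<string> replies = CallAllAsync(new[] { 10, 20, 5000, 5000 }, 500)
            .GetAwaiter().GetResult();
        foreach (string reply in replies)
        {
            Console.WriteLine(reply);
        }
    }
}
```

One token source covers all four calls, so there is no extra Task.Delay per call. The try/catch is needed because awaiting Task.WhenAll throws once any of the tasks has been cancelled.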
I have a huge collection, over which I have to perform a specific task (which involves calling a WCF service). I want to control the number of threads instead of using Parallel.ForEach directly. Here I have 2 options.
I am using the code below to partition the data:
List<MyCollectionObject> MyCollection = new List<MyCollectionObject>();
public static IEnumerable<List<T>> PartitionMyData<T>(this IList<T> source, Int32 size)
{
    for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
    {
        yield return new List<T>(source.Skip(size * i).Take(size));
    }
}
Option 1:
MyCollection.PartitionMyData(AutoEnrollRequests.Count()/threadValue).AsParallel().AsOrdered()
    .Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
    foreach(MyCollectionObject obj in requests)
    {
        //Do Something
    }
}
Option 2:
MyCollection.PartitionMyData(threadValue).AsOrdered()
    .Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
    Action<MyCollectionObject> dosomething = obj =>
    {
        //Do Something
    };
    Parallel.ForEach(requests, dosomething);
}
If I have 16 objects in my collection, then as far as I know Option 1 will launch 4 threads, each processing its 4 objects synchronously.
Option 2 will launch 4 threads with 1 object each, process them, and then launch 4 threads again.
Can anyone please suggest which option is better?
P.S.
I understand the .NET Framework does thread pooling and we need not control the number of threads, but due to a design decision we want to control it.
Thanks In Advance,
Rohit
I want to control the number of threads instead of using Parallel.ForEach directly
You can control the number of threads in Parallel.ForEach if you call it with a ParallelOptions object:
Parallel.ForEach(requests,
    new ParallelOptions(){MaxDegreeOfParallelism = 4}, //change here
    dosomething);
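One way to convince yourself that the limit is honoured is to record how many loop bodies run at the same time. This is just a sketch: the Thread.Sleep is a hypothetical stand-in for the real work, and MaxParallelismDemo and MeasurePeak are made-up names:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MaxParallelismDemo
{
    // Returns the highest number of loop bodies observed running simultaneously.
    public static int MeasurePeak(int itemCount, int maxParallelism)
    {
        int current = 0;
        int peak = 0;
        object gate = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, item =>
        {
            int now = Interlocked.Increment(ref current);
            lock (gate)
            {
                if (now > peak)
                {
                    peak = now;
                }
            }
            Thread.Sleep(20); // hypothetical stand-in for the real work
            Interlocked.Decrement(ref current);
        });

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine(MeasurePeak(40, 4)); // never more than 4
    }
}
```

MaxDegreeOfParallelism is an upper bound, so the observed peak can be lower than the configured value but should never exceed it.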
It's impossible to give an A or B answer here. It depends on too many unknowns.
I will assume you want the fastest approach. To see which is better, run both on the target environment (or closest approximation you can get) and see which one completes fastest.
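A simple way to run such a comparison is a Stopwatch around each candidate. The sketch below makes several assumptions: Work is a hypothetical stand-in for the real per-item task, the two candidates only loosely mirror the options in the question, and the item count and chunk size are arbitrary:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TimingDemo
{
    // Runs the action once and returns how long it took.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Hypothetical stand-in for the real per-item work (e.g. a WCF call).
    private static void Work(int item)
    {
        Thread.Sleep(10);
    }

    public static void Main()
    {
        int[] items = Enumerable.Range(0, 16).ToArray();

        // Candidate A: chunks of 4 processed in parallel, items within a chunk sequentially.
        List<List<int>> chunks = items
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / 4, x => x.item)
            .Select(g => g.ToList())
            .ToList();
        TimeSpan a = Measure(() => chunks.AsParallel()
            .WithDegreeOfParallelism(4)
            .ForAll(chunk => { foreach (int item in chunk) Work(item); }));

        // Candidate B: a single parallel loop over all items, capped at 4 threads.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        TimeSpan b = Measure(() => Parallel.ForEach(items, options, item => Work(item)));

        Console.WriteLine("A: " + a.TotalMilliseconds + " ms");
        Console.WriteLine("B: " + b.TotalMilliseconds + " ms");
    }
}
```

A single run like this is noisy. Repeat the measurement several times against the real service before drawing conclusions.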
I am new to parallel programming and in fact this is the first time I am trying it. I am currently doing a project in .NET 4 and would prefer to have 4 or 5 parallel executions.
I see some options: Task.Factory.StartNew, Parallel.For, Parallel.ForEach, etc.
What I am going to do is post to a website and fetch the responses for about 200 URLs.
When I used Parallel.ForEach I didn't find a way to control the number of threads; the application ended up using 130+ threads and the website became unresponsive :)
I am interested in using Task.Factory.StartNew within a for loop and dividing the URLs into 4 or 5 tasks.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 5; i++)
{
    List<string> UrlForTask = GetUrlsForTask(i, 5); //Let's say this returns something like 1 of 5 of the list of URLs
    int j = i;
    var t = Task.Factory.StartNew(() =>
    {
        List<PageSummary> pageSummaries = GetSummary(UrlForTask);
        Summary.AddRange(pageSummaries); //Summary is a public variable
    });
    tasks.Add(t);
}
I believe that these Tasks more or less boil down to threads. So if I make Summary a List<PageSummary>, will it be thread safe (I understand there are issues when multiple threads access a shared variable)?
Is this where we should use ConcurrentQueue<T>?
Do you know of a good resource for learning about accessing and updating a shared variable from multiple tasks?
What do you think is the best approach for this type of task?
Parallel.ForEach has overloads that take a ParallelOptions instance. The MaxDegreeOfParallelism property of that class is what you need to use.
List<MyRequest> requests = ...;
BlockingCollection<MyResponse> responses = ...;
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(
        requests,
        new ParallelOptions { MaxDegreeOfParallelism = 4 },
        request => responses.Add(MyDownload(request)));
    responses.CompleteAdding();
});
foreach (var response in responses.GetConsumingEnumerable())
{
    Console.WriteLine(response.MyMessage);
}