I want to improve the performance of a procedure that invokes a web service multiple times sequentially and stores the results in a list.
A single call to the web service takes about 1 second, and I need to make roughly 300 calls, so doing the job sequentially takes about 300 seconds. That's why I changed the implementation to use multithreading, with the following piece of code:
List<WCFResult> resultList = new List<WCFResult>();
List<Task> tasks = new List<Task>();
using (var ws = new WCFService(binding, endpoint))
{
    foreach (var singleElement in listOfelements)
    {
        Action action = () =>
        {
            var singleResult = ws.Call(singleElement);
            resultList.Add(singleResult);
        };
        tasks.Add(Task.Factory.StartNew(action, TaskCreationOptions.LongRunning));
    }
}
Task.WaitAll(tasks.ToArray());
//Do other stuff with the resultList...
Using this code I only manage to save about 0.1 seconds per element, which is less than I expected. Do you know of any further optimization I can make, or can you share an alternative?
Using the following code, all the requests are handled in about half the time:
ParallelOptions ops = new ParallelOptions();
ops.MaxDegreeOfParallelism = 16;
ConcurrentBag<WCFResult> resultList = new ConcurrentBag<WCFResult>();
Parallel.ForEach(allItems, ops, item =>
{
    var ws = new WCFClient(binding, endpoint);
    var result = ws.Call(item);
    ws.Close();
    resultList.Add(result);
});
//Do other stuff with the resultList...
Mission accomplished. I also changed the result list from a List to a ConcurrentBag so it can be written to safely from multiple threads.
Related
I have an API that must call 4 HttpClients in parallel, supporting a concurrency of 500 users per second (all of them calling the API at the same time).
There must be a strict timeout so that the API returns a result even if not all the HttpClient calls have returned a value.
The endpoints are external third-party APIs; I don't have any control over them or know their code.
I did extensive research on the matter, but even though many solutions work, I need the one that consumes as little CPU as possible, since I have a low server budget.
So far I came up with this:
var conn0 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var conn1 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var conn2 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var conn3 = new HttpClient
{
    Timeout = TimeSpan.FromMilliseconds(1000),
    BaseAddress = new Uri("http://endpoint")
};
var list = new List<HttpClient>() { conn0, conn1, conn2, conn3 };
var timeout = TimeSpan.FromMilliseconds(1000);
var allTasks = new List<Task<Task>>();
// the async DoCall method just calls the HttpClient endpoint and returns a MyResponse object
foreach (var call in list)
{
    allTasks.Add(Task.WhenAny(DoCall(call), Task.Delay(timeout)));
}
var completedTasks = await Task.WhenAll(allTasks);
var allResults = completedTasks.OfType<Task<MyResponse>>().Select(task => task.Result).ToList();
return allResults;
I use WhenAny and two tasks: one for the call, one for the timeout. If the call task is late, the timeout task returns anyway.
Now, this code works perfectly and everything is async, but I wonder if there is a better way of achieving this.
Every single call to this API creates a lot of threads, and with 500 concurrent users it needs an average of 8 (eight) D3_V2 Azure 4-core machines, resulting in crazy expenses; and the higher the timeout is, the higher the CPU use is.
Is there a better way to do this without using so many CPU resources (maybe Parallel LINQ would be a better choice than this)?
Is the HttpClient timeout alone sufficient to stop the call and return if the endpoint does not reply in time, without having to use the second task in WhenAny?
UPDATE:
The endpoints are third-party APIs; I don't know the code or have any control over them. The calls are made in JSON and return JSON or a string.
Some of them reply after 10+ seconds once in a while, or get stuck and are extremely slow, so the timeout is there to free the threads and return, even with only partial data from the endpoints that replied in time.
Caching is possible, but only partially, since the data changes all the time, like stocks and forex real-time currency trading.
Your approach of using two tasks just for the timeout does work, but you can do better: use a CancellationToken for the task and for getting the answer from the server:
var cts = new CancellationTokenSource();
// set the timeout to 1 second
cts.CancelAfter(1000);
// provide the token for your request
var response = await client.GetAsync(url, cts.Token);
After that, you can simply keep the tasks that actually ran to completion (a cancelled task also counts as "completed", but reading its Result would throw):
var allResults = completedTasks
    .Where(t => t.Status == TaskStatus.RanToCompletion)
    .Select(task => task.Result)
    .ToList();
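A minimal sketch of how the pieces might fit together, assuming the same DoCall method and MyResponse type from the question (not taken verbatim from the original answer):
// One CancellationTokenSource acts as the shared timeout for all four calls.
var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(1000));

// Start all calls; each one observes the same token.
var callTasks = list.Select(client => DoCall(client, cts.Token)).ToList();

try
{
    // Wait for everything; calls that exceed the timeout get cancelled.
    await Task.WhenAll(callTasks);
}
catch (OperationCanceledException)
{
    // Ignore: timed-out calls are simply missing from the final result.
}

// Keep only the calls that actually produced a response.
var allResults = callTasks
    .Where(t => t.Status == TaskStatus.RanToCompletion)
    .Select(t => t.Result)
    .ToList();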
This approach will cut the number of tasks you create at least in half and will decrease the overhead on your server. It also gives you a simple way to cancel part of the handling, or even all of it. If your tasks are completely independent from each other, you may use Parallel.ForEach for calling the HTTP clients, while still using the token to cancel the operation:
ParallelLoopResult result = Parallel.ForEach(list, call => DoCall(call, cts.Token));
// handle the result of the parallel tasks
or, using PLINQ:
var results = list
    .AsParallel()
    .Select(call => DoCall(call, cts.Token))
    .ToList();
I'm new to parallel programming, but I've searched a lot of blogs and other sites (including SO) to achieve the following: I need to call an external web service (to which I don't have code access) multiple times. Every call takes about 10 seconds to process, so I decided to use Parallel to gain performance.
public BlockingCollection<TransacaoResponse> ExecutarAsync(TransacaoRequest request)
{
    BlockingCollection<TransacaoResponse> listResponse = new BlockingCollection<TransacaoResponse>();
    Parallel.ForEach<Item>(request.Itens, new ParallelOptions() { MaxDegreeOfParallelism = 10 }, (c) =>
    {
        TransacaoResponse responseInner = new TransacaoResponse();
        int result = _operadora.EnviarTransacao(request.Codigo);
        responseInner.Status = result;
        listResponse.Add(responseInner);
    });
    return listResponse;
}
Even using Parallel.ForEach and BlockingCollection, it takes 50 seconds to return the result (I tested by calling this method 5 times), which is the same time as a standard foreach. What am I missing? Is there a bottleneck in this code? Thanks.
I have a list of 100 URLs. I need to fetch the HTML content of those URLs. Let's say I don't use the async version of DownloadString and instead do the following.
var task1 = Task.Factory.StartNew(() => new WebClient().DownloadString("url1"));
What I want to achieve is to fetch the HTML for at most 4 URLs at a time.
I start 4 tasks for the first four URLs. Assume the 2nd URL completes; I want to immediately start a 5th task for the 5th URL, and so on. This way at most 4 URLs are downloaded at any moment, and for all practical purposes there will always be 4 downloads in flight until all 100 are processed.
I can't seem to visualize how I would actually achieve this. There must be an established pattern for doing this. Thoughts?
EDIT:
Following up on @Damien_The_Unbeliever's comment to use Parallel.ForEach, I wrote the following:
var urls = new List<string>();
var results = new Dictionary<string, string>();
var lockObj = new object();
Parallel.ForEach(urls,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    url =>
    {
        var str = new WebClient().DownloadString(url);
        lock (lockObj)
        {
            results[url] = str;
        }
    });
I think the above reads better than creating individual tasks and using a semaphore to limit concurrency (shown below). That said, having never used or worked with Parallel.ForEach, I am unsure whether it correctly does what I need.
// Semaphore-based alternative: at most 4 downloads run at the same time.
SemaphoreSlim sem = new SemaphoreSlim(4);
foreach (var url in urls)
{
    sem.Wait(); // blocks until one of the 4 slots is free
    Task.Factory.StartNew(() => new WebClient().DownloadString(url))
        .ContinueWith(t => sem.Release()); // free the slot when the download finishes
}
Actually, Task.WaitAny is much better for what you're trying to achieve than ContinueWith:
int tasksPerformedCount = 0;
Task[] tasks = //initial 4 tasks
while (tasksPerformedCount < 100)
{
    //returns the index of the first task to complete, as soon as it completes
    int index = Task.WaitAny(tasks);
    tasksPerformedCount++;
    //replace it with a new one
    tasks[index] = //new task
}
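A rough sketch of how that loop could be filled in for the URL scenario above, assuming `urls` holds the 100 addresses (the names are placeholders, not from the original answer). Completed slots are removed and refilled so a finished task is never counted twice:
// Start the first 4 downloads.
var pending = urls.Take(4)
    .Select(u => Task.Factory.StartNew(() => new WebClient().DownloadString(u)))
    .ToList();
int nextUrl = 4;

while (pending.Count > 0)
{
    // Wait for whichever download finishes first.
    int index = Task.WaitAny(pending.ToArray());
    string html = pending[index].Result;
    // ... use html ...
    pending.RemoveAt(index);

    // Refill the free slot while there are URLs left.
    if (nextUrl < urls.Count)
    {
        string url = urls[nextUrl++];
        pending.Add(Task.Factory.StartNew(() => new WebClient().DownloadString(url)));
    }
}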
Edit:
Another example of Task.WaitAny from http://www.amazon.co.uk/Exam-Ref-70-483-Programming-In/dp/0735676828/ref=sr_1_1?ie=UTF8&qid=1378105711&sr=8-1&keywords=exam+ref+70-483+programming+in+c
namespace Chapter1 {
    public static class Program {
        public static void Main() {
            Task<int>[] tasks = new Task<int>[3];
            tasks[0] = Task.Run(() => { Thread.Sleep(2000); return 1; });
            tasks[1] = Task.Run(() => { Thread.Sleep(1000); return 2; });
            tasks[2] = Task.Run(() => { Thread.Sleep(3000); return 3; });
            while (tasks.Length > 0)
            {
                int i = Task.WaitAny(tasks);
                Task<int> completedTask = tasks[i];
                Console.WriteLine(completedTask.Result);
                var temp = tasks.ToList();
                temp.RemoveAt(i);
                tasks = temp.ToArray();
            }
        }
    }
}
Attempting to write an HTML crawler using the Async CTP, I have gotten stuck as to how to write a recursion-free method for accomplishing this.
This is the code I have so far.
private readonly ConcurrentStack<LinkItem> _LinkStack;
private readonly Int32 _MaxStackSize;
private readonly WebClient client = new WebClient();

Func<string, string, Task<List<LinkItem>>> DownloadFromLink = async (BaseURL, uri) =>
{
    string html = await client.DownloadStringTaskAsync(uri);
    return LinkFinder.Find(html, BaseURL);
};

Action<LinkItem> DownloadAndPush = async (o) =>
{
    List<LinkItem> result = await DownloadFromLink(o.BaseURL, o.Href);
    if (this._LinkStack.Count() + result.Count <= this._MaxStackSize)
    {
        this._LinkStack.PushRange(result.ToArray());
        o.Processed = true;
    }
};

Parallel.ForEach(this._LinkStack, (o) =>
{
    DownloadAndPush(o);
});
But obviously this doesn't work as I would hope, because at the time Parallel.ForEach executes the first (and only) iteration I have only 1 item. The simplest approach I can think of is to make the ForEach recursive, but I can't (I don't think) do this, as I would quickly run out of stack space.
Could anyone please guide me as to how I can restructure this code to create what I would describe as a recursive continuation that adds items until either the MaxStackSize is reached or the system runs out of memory?
I think the best way to do something like this using C# 5/.NET 4.5 is to use TPL Dataflow. There is even a walkthrough on how to implement a web crawler using it.
Basically, you create one "block" that takes care of downloading one URL and getting the links from it:
var cts = new CancellationTokenSource();

Func<LinkItem, Task<IEnumerable<LinkItem>>> downloadFromLink =
    async link =>
    {
        // WebClient is not guaranteed to be thread-safe,
        // so we shouldn't use one shared instance
        var client = new WebClient();
        string html = await client.DownloadStringTaskAsync(link.Href);
        return LinkFinder.Find(html, link.BaseURL);
    };

var linkFinderBlock = new TransformManyBlock<LinkItem, LinkItem>(
    downloadFromLink,
    new ExecutionDataflowBlockOptions
    { MaxDegreeOfParallelism = 4, CancellationToken = cts.Token });
You can set MaxDegreeOfParallelism to any value you want. It says at most how many URLs can be downloaded concurrently. If you don't want to limit it at all, you can set it to DataflowBlockOptions.Unbounded.
Then you create one block that processes all the downloaded links somehow, like storing them all in a list. It can also decide when to cancel downloading:
var links = new List<LinkItem>();

var storeBlock = new ActionBlock<LinkItem>(
    linkItem =>
    {
        links.Add(linkItem);
        if (links.Count == maxSize)
            cts.Cancel();
    });
Since we didn't set MaxDegreeOfParallelism, it defaults to 1. That means using a collection that is not thread-safe should be okay here.
We create one more block: it will take a link from linkFinderBlock, and pass it both to storeBlock and back to linkFinderBlock.
var broadcastBlock = new BroadcastBlock<LinkItem>(li => li);
The lambda in its constructor is a "cloning function". You can use it to create a clone of the item if you want to, but it shouldn't be necessary here, since we don't modify the LinkItem after creation.
Now we can connect the blocks together:
linkFinderBlock.LinkTo(broadcastBlock);
broadcastBlock.LinkTo(storeBlock);
broadcastBlock.LinkTo(linkFinderBlock);
Then we can start processing by giving the first item to linkFinderBlock (or broadcastBlock, if you want to also send it to storeBlock):
linkFinderBlock.Post(firstItem);
And finally wait until the processing is complete:
try
{
    linkFinderBlock.Completion.Wait();
}
catch (AggregateException ex)
{
    if (!(ex.InnerException is TaskCanceledException))
        throw;
}
Our company has a web service to which I want to send XML files (stored on my drive) via my own HttpWebRequest client in C#. This already works. The web service supports 5 synchronous requests at the same time (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes for each request.
Throwing too many requests (> 5) at it results in timeouts for my client. It can also lead to errors on the server side and incoherent data. Making changes on the server side is not an option (it's from a different vendor).
Right now, my web request client sends the XML and waits for the response using result.AsyncWaitHandle.WaitOne();
However, this way only one request can be processed at a time, although the web service supports 5. I tried using a BackgroundWorker and the ThreadPool, but they create too many requests at once, which makes them useless to me. Any suggestion how one could solve this problem? Create my own thread pool with exactly 5 threads? Any suggestions on how to implement this?
The easy way is to create 5 threads (aside: that's an odd number!) that consume the XML files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();

for (int i = 0; i < 5; i++)
{
    new Thread(() =>
    {
        foreach (var xml in bc.GetConsumingEnumerable())
        {
            // do work
        }
    }).Start();
}

bc.Add(xml_1);
bc.Add(xml_2);
...
bc.CompleteAdding(); // threads will end when queue is exhausted
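If the caller also needs to block until every file has been handled, one option (an addition, not part of the original snippet) is to keep the thread references and join them after CompleteAdding:
// Sketch: same consumer loop, but the worker threads are kept so the
// caller can wait for the queue to be fully drained.
var workers = new List<Thread>();
for (int i = 0; i < 5; i++)
{
    var worker = new Thread(() =>
    {
        foreach (var xml in bc.GetConsumingEnumerable())
        {
            // do work
        }
    });
    worker.Start();
    workers.Add(worker);
}

// ... add the files ...
bc.CompleteAdding();
workers.ForEach(w => w.Join()); // returns once every queued file has been processed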
If you're on .NET 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which means you are guaranteed that no more than that many items are processed at one time.
Parallel.ForEach(items,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
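For example, a minimal sketch of the lambda variant (UploadXmlAndWait is a hypothetical helper, not from the original answer, that posts one XML file and blocks until the service responds):
// At most 5 items are sent to the web service at the same time.
Parallel.ForEach(items,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    item =>
    {
        var response = UploadXmlAndWait(item); // hypothetical blocking upload call
        // handle the response for this item
    });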
Creating your own thread pool of five threads isn't tricky: just create a concurrent queue of objects describing the requests to make, and have five threads that loop, performing the work as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling; a rough sketch of that idea follows.
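A minimal sketch of that hand-rolled pool, assuming a RequestObject type that describes one request (the names here are placeholders, not from the original answer):
// Five workers pull request descriptions off a shared queue; the event keeps
// them asleep while the queue is empty.
var queue = new ConcurrentQueue<RequestObject>();
var workAvailable = new AutoResetEvent(false);

for (int i = 0; i < 5; i++)
{
    new Thread(() =>
    {
        while (true)
        {
            RequestObject req;
            if (queue.TryDequeue(out req))
            {
                // perform the web request described by req
            }
            else
            {
                // nothing queued: wait until a producer signals new work
                workAvailable.WaitOne();
            }
        }
    }) { IsBackground = true }.Start();
}

// Producer side: enqueue a request and wake one sleeping worker.
queue.Enqueue(new RequestObject());
workAvailable.Set();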
It can, though, be tricky to return the response to the correct calling thread. If that matters for how the rest of your code works, I'd take a different approach and create a limiter that acts a bit like a monitor, but allows 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
    private static AutoResetEvent _are = new AutoResetEvent(false);
    private static int _reqCnt = 0;

    public static ResponseObject DoRequest(RequestObject req)
    {
        for (;;)
        {
            if (Interlocked.Increment(ref _reqCnt) <= 5)
            {
                //code to create response object "resp".
                Interlocked.Decrement(ref _reqCnt);
                _are.Set();
                return resp;
            }
            else
            {
                // test so we don't end up waiting due to a race on decrementing from a finished thread
                if (Interlocked.Decrement(ref _reqCnt) >= 5)
                    _are.WaitOne();
            }
        }
    }
}
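Usage would then look something like this (RequestObject and ResponseObject are the placeholder types from the sketch above):
// Each calling thread just invokes the limiter; at most five requests
// are in flight against the service at any one time.
ResponseObject response = RequestLimiter.DoRequest(new RequestObject());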
You could write a little helper method that blocks the current thread until all the threads have finished executing the given action delegate:
static void SpawnThreads(int count, Action action)
{
    var countdown = new CountdownEvent(count);
    for (int i = 0; i < count; i++)
    {
        new Thread(() =>
        {
            action();
            countdown.Signal();
        }).Start();
    }
    countdown.Wait();
}
And then use a BlockingCollection<string> (a thread-safe collection) to keep track of your XML files. Using the helper method above, you could write something like:
static void Main(string[] args)
{
    var xmlFiles = new BlockingCollection<string>();
    // Add some xml files....

    SpawnThreads(5, () =>
    {
        using (var web = new WebClient())
        {
            // serviceUrl is a placeholder for the web service address
            web.UploadFile(serviceUrl, xmlFiles.Take());
        }
    });

    Console.WriteLine("Done");
    Console.ReadKey();
}
Update
An even better approach would be to upload the files asynchronously, so that you don't waste threads on an IO-bound task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
    var countdown = new CountdownEvent(count);
    for (int i = 0; i < count; i++)
    {
        action(countdown);
    }
    countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
    var urlXML = new BlockingCollection<Tuple<string, string>>();
    urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
    // Add some more to collection...

    SpawnAsyncs(5, c =>
    {
        var web = new WebClient();
        var current = urlXML.Take();
        web.UploadFileCompleted += (s, e) =>
        {
            // some code to mess with e.Result (response)
            web.Dispose(); // dispose only once the async upload has completed
            c.Signal();
        };
        web.UploadFileAsync(new Uri(current.Item1), current.Item2);
    });

    Console.WriteLine("Done");
    Console.ReadKey();
}