HttpClient crawling results in memory leak - c#

I am working on a WebCrawler implementation but am facing a strange memory leak in ASP.NET Web API's HttpClient.
So the cut down version is here:
[UPDATE 2]
I found the problem and it is not HttpClient that is leaking. See my answer.
[UPDATE 1]
I have added dispose with no effect:
static void Main(string[] args)
{
    int waiting = 0;
    const int MaxWaiting = 100;
    var httpClient = new HttpClient();
    foreach (var link in File.ReadAllLines("links.txt"))
    {
        while (waiting >= MaxWaiting)
        {
            Thread.Sleep(1000);
            Console.WriteLine("Waiting ...");
        }
        httpClient.GetAsync(link)
            .ContinueWith(t =>
            {
                try
                {
                    var httpResponseMessage = t.Result;
                    if (httpResponseMessage.IsSuccessStatusCode)
                        httpResponseMessage.Content.LoadIntoBufferAsync()
                            .ContinueWith(t2 =>
                            {
                                if (t2.IsFaulted)
                                {
                                    httpResponseMessage.Dispose();
                                    Console.ForegroundColor = ConsoleColor.Magenta;
                                    Console.WriteLine(t2.Exception);
                                }
                                else
                                {
                                    httpResponseMessage.Content
                                        .ReadAsStringAsync()
                                        .ContinueWith(t3 =>
                                        {
                                            Interlocked.Decrement(ref waiting);
                                            try
                                            {
                                                Console.ForegroundColor = ConsoleColor.White;
                                                Console.WriteLine(httpResponseMessage.RequestMessage.RequestUri);
                                                string s = t3.Result;
                                            }
                                            catch (Exception ex3)
                                            {
                                                Console.ForegroundColor = ConsoleColor.Yellow;
                                                Console.WriteLine(ex3);
                                            }
                                            httpResponseMessage.Dispose();
                                        });
                                }
                            });
                }
                catch (Exception e)
                {
                    Interlocked.Decrement(ref waiting);
                    Console.ForegroundColor = ConsoleColor.Red;
                    Console.WriteLine(e);
                }
            });
        Interlocked.Increment(ref waiting);
    }
    Console.Read();
}
The file containing links is available here.
This results in constantly rising memory usage. Memory analysis shows many bytes held, possibly by the AsyncCallback. I have done plenty of memory leak analysis before, but this one seems to be at the HttpClient level.
I am using C# 4.0, so there is no async/await here; only TPL 4.0 is used.
The code above works but is not optimised and sometimes throws a tantrum, yet it is enough to reproduce the effect. The point is that I cannot find anything that could cause memory to leak.

OK, I got to the bottom of this. Thanks to @Tugberk, @Darrel and @youssef for spending time on this.
Basically, the initial problem was that I was spawning too many tasks. This started to take its toll, so I had to cut back and keep some state to make sure the number of concurrent tasks is limited. This is a big challenge when writing processes that use the TPL to schedule tasks: we can control the threads in the thread pool, but we also need to control the number of tasks we create, and no amount of async/await will help with that.
I managed to reproduce the leak only a couple of times with this code; other times, after growing, memory would suddenly drop. I know that the GC was revamped in 4.5, so perhaps the issue is that the GC did not kick in often enough, although I have been looking at the perf counters for GC generation 0, 1 and 2 collections.
So the take-away here is that re-using HttpClient does NOT cause a memory leak.
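Limiting the number of concurrent tasks can be sketched as follows. This is only an illustration, not the code I ended up with: it assumes .NET 4.5 or later (async/await and SemaphoreSlim.WaitAsync), and the names ThrottledCrawler, CrawlAsync, fetch and maxConcurrency are made up for the example. The fetch delegate stands in for the real HttpClient call so the sketch runs without a network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledCrawler
{
    // Runs fetch for every link, but never more than maxConcurrency at once.
    // Returns the highest number of fetches that were in flight together.
    public static async Task<int> CrawlAsync(
        IEnumerable<string> links, Func<string, Task> fetch, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency);
        var sync = new object();
        int inFlight = 0, peak = 0;
        var started = new List<Task>();

        Func<string, Task> runOne = async link =>
        {
            lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
            try
            {
                await fetch(link).ConfigureAwait(false);
            }
            finally
            {
                lock (sync) { inFlight--; }
                gate.Release(); // free the slot for the next link
            }
        };

        foreach (var link in links)
        {
            // Wait for a free slot before creating the next task,
            // so pending work never piles up beyond maxConcurrency.
            await gate.WaitAsync().ConfigureAwait(false);
            started.Add(runOne(link));
        }

        await Task.WhenAll(started).ConfigureAwait(false);
        return peak;
    }

    public static void Main()
    {
        // Hypothetical links; Task.Delay simulates the HTTP round trip.
        var links = Enumerable.Range(0, 50).Select(i => "http://example.invalid/" + i);
        int peak = CrawlAsync(links, _ => Task.Delay(20), 5).GetAwaiter().GetResult();
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

The important detail is that the slot is acquired before the task is created, which is the same idea as the waiting counter in the question, just without the polling loop.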

I'm no good at diagnosing memory issues, but I gave it a try with the following code. It targets .NET 4.5 and uses the async/await feature of C#. It seems to keep memory usage around 10 - 15 MB for the entire process (not sure whether you would consider this better memory usage, though). But if you watch the # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections perf counters, they are pretty high with the code below.
If you remove the GC.Collect calls below, memory goes back and forth between 30 MB and 50 MB for the entire process. The interesting part is that when I run your code on my 4-core machine, I don't see abnormal memory usage by the process either. I have .NET 4.5 installed on my machine; if you don't, the problem might be related to the CLR internals of .NET 4.0, and I am sure that the TPL's resource usage has improved a lot in .NET 4.5.
class Program {
static void Main(string[] args) {
ServicePointManager.DefaultConnectionLimit = 500;
CrawlAsync().ContinueWith(task => Console.WriteLine("***DONE!"));
Console.ReadLine();
}
private static async Task CrawlAsync() {
int numberOfCores = Environment.ProcessorCount;
List<string> requestUris = File.ReadAllLines(@"C:\Users\Tugberk\Downloads\links.txt").ToList();
ConcurrentDictionary<int, Tuple<Task, HttpRequestMessage>> tasks = new ConcurrentDictionary<int, Tuple<Task, HttpRequestMessage>>();
List<HttpRequestMessage> requestsToDispose = new List<HttpRequestMessage>();
var httpClient = new HttpClient();
for (int i = 0; i < numberOfCores; i++) {
string requestUri = requestUris.First();
var requestMessage = new HttpRequestMessage(HttpMethod.Get, requestUri);
Task task = MakeCall(httpClient, requestMessage);
tasks.AddOrUpdate(task.Id, Tuple.Create(task, requestMessage), (index, t) => t);
requestUris.RemoveAt(0);
}
while (tasks.Values.Count > 0) {
Task task = await Task.WhenAny(tasks.Values.Select(x => x.Item1));
Tuple<Task, HttpRequestMessage> removedTask;
tasks.TryRemove(task.Id, out removedTask);
removedTask.Item1.Dispose();
removedTask.Item2.Dispose();
if (requestUris.Count > 0) {
var requestUri = requestUris.First();
var requestMessage = new HttpRequestMessage(HttpMethod.Get, requestUri);
Task newTask = MakeCall(httpClient, requestMessage);
tasks.AddOrUpdate(newTask.Id, Tuple.Create(newTask, requestMessage), (index, t) => t);
requestUris.RemoveAt(0);
}
GC.Collect(0);
GC.Collect(1);
GC.Collect(2);
}
httpClient.Dispose();
}
private static async Task MakeCall(HttpClient httpClient, HttpRequestMessage requestMessage) {
Console.WriteLine("**Starting new request for {0}!", requestMessage.RequestUri);
var response = await httpClient.SendAsync(requestMessage).ConfigureAwait(false);
Console.WriteLine("**Request is completed for {0}! Status Code: {1}", requestMessage.RequestUri, response.StatusCode);
using (response) {
if (response.IsSuccessStatusCode){
using (response.Content) {
Console.WriteLine("**Getting the HTML for {0}!", requestMessage.RequestUri);
string html = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
Console.WriteLine("**Got the HTML for {0}! Length: {1}", requestMessage.RequestUri, html.Length);
}
}
else if (response.Content != null) {
response.Content.Dispose();
}
}
}
}

A recent reported "Memory Leak" in our QA environment taught us this:
Consider the TCP Stack
Don't assume the TCP stack can do what is asked in the time "thought appropriate for the application". Sure, we can spin off Tasks at will and we just love async, but...
Watch the TCP Stack
Run NETSTAT when you think you have a memory leak. If you see residual sessions or half-baked states, you may want to rethink your design along the lines of HttpClient reuse and limiting the amount of concurrent work being spun up. You may also need to consider load balancing across multiple machines.
Half-baked sessions show up in NETSTAT as FIN_WAIT_1 or FIN_WAIT_2, TIME_WAIT or CLOSE_WAIT. Even "ESTABLISHED" sessions can be virtually dead, just waiting for time-outs to fire.
The Stack and .NET are most likely not broken
Overloading the stack puts the machine to sleep. Recovery takes time and 99% of the time the stack will recover. Remember also that .NET will not release resources before their time and that no user has full control of GC.
If you kill the app and it takes 5 minutes for NETSTAT to settle down, that's a pretty good sign the system is overwhelmed. It's also a good show of how the stack is independent of the application.
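The NETSTAT check can also be done from inside a .NET process. The sketch below is an illustration, not something taken from our QA investigation: it uses IPGlobalProperties.GetActiveTcpConnections from System.Net.NetworkInformation to count connections per TCP state. The names TcpStateSummary and CountByState are made up for the example, and, like NETSTAT, it reports connections for the whole machine rather than for a single process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.NetworkInformation;

public static class TcpStateSummary
{
    // Counts the machine's active TCP connections, grouped by state
    // (Established, TimeWait, FinWait1, FinWait2, CloseWait, ...).
    public static Dictionary<TcpState, int> CountByState()
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .GroupBy(connection => connection.State)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        // Print the busiest states first.
        foreach (var pair in CountByState().OrderByDescending(p => p.Value))
            Console.WriteLine("{0,-12} {1}", pair.Key, pair.Value);
    }
}
```

Logging this summary periodically while the load runs makes it easier to see whether TimeWait or FinWait counts grow alongside memory.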

The default HttpClient leaks when you use it as a short-lived object and create new HttpClients per request.
Here is a reproduction of this behavior.
As a workaround, I was able to keep using HttpClient as a short-lived object by using the following Nuget package instead of the built-in System.Net.Http assembly:
https://www.nuget.org/packages/HttpClient
Not sure what the origin of this package is, however, as soon as I referenced it the memory leak disappeared. Make sure that you remove the reference to the built-in .NET System.Net.Http library and use the Nuget package instead.

Related

Insert Multiple records in AWS keyspace using C#

Hello, I have just started with Cassandra and am not very familiar with it. Can you please let me know what the error is here?
I am trying to insert 16000 records using the code below:
public async Task AddSprintsStories(List<SprintStories> sprintStories)
{
var tasks = new List<Task>();
try
{
if (sprintStories.Count > 0)
{
foreach (var item in sprintStories)
{
SprintStories sprintStoryData = new SprintStories();
sprintStoryData.Id = item.Id;
sprintStoryData.ProjectId = item.ProjectId;
sprintStoryData.SprintId = item.SprintId;
tasks.Add(mapper.InsertAsync<SprintStories>(sprintStoryData, new CqlQueryOptions().SetConsistencyLevel(ConsistencyLevel.LocalQuorum)));
}
await Task.WhenAll(tasks);
}
}
catch (Exception e)
{
}
}
but I am facing the error: Server timeout during write query at consistency LOCALQUORUM (0 peer(s) acknowledged the write over 2 required)
Can anyone please help me out here?
How does the Cassandra cluster look during this load? Is CPU or disk I/O maxed out? Without knowing that, my guess is that those 16000 writes are happening faster than your cluster can process them, creating write back pressure. Eventually it just can't process any more, so they start failing.
For a possible solution, try limiting the number of active threads. Something like this should do it.
int maxActiveThreads = 20;
int activeThreads = 0;
foreach (var item in sprintStories)
{
    ...
    tasks.Add(mapper.InsertAsync<SprintStories>(sprintStoryData, new CqlQueryOptions().SetConsistencyLevel(ConsistencyLevel.LocalQuorum)));
    activeThreads++;
    if (activeThreads >= maxActiveThreads)
    {
        // Wait for the current batch to finish before starting more writes.
        await Task.WhenAll(tasks);
        tasks.Clear();
        activeThreads = 0;
    }
}
// Wait for the final, partially filled batch.
await Task.WhenAll(tasks);
With this code, at most 20 writes will be competing for Cassandra cluster resources at any given time. Note that I'm just using 20 as an example; adjust that number to something that meets your requirements for performance and stability.
Ryan Svihla wrote a great blog post on this topic- Cassandra: Batch Loading Without the BATCH - The Nuanced Edition
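To try the batching idea without a cluster, here is a self-contained sketch. It is only an approximation of the approach above: BatchRunner and RunInBatchesAsync are hypothetical names, and Task.Delay stands in for the real mapper.InsertAsync call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Starts at most batchSize operations, waits for all of them to finish,
    // then starts the next batch. Returns the number of batches awaited.
    public static async Task<int> RunInBatchesAsync<T>(
        IEnumerable<T> items, int batchSize, Func<T, Task> operation)
    {
        var batch = new List<Task>(batchSize);
        int batches = 0;
        foreach (var item in items)
        {
            batch.Add(operation(item));
            if (batch.Count >= batchSize)
            {
                await Task.WhenAll(batch).ConfigureAwait(false);
                batch.Clear();
                batches++;
            }
        }
        if (batch.Count > 0)
        {
            // Final, partially filled batch.
            await Task.WhenAll(batch).ConfigureAwait(false);
            batches++;
        }
        return batches;
    }

    public static void Main()
    {
        // 45 simulated inserts in batches of 20: two full batches and one of 5.
        int batches = RunInBatchesAsync(Enumerable.Range(0, 45), 20, _ => Task.Delay(10))
            .GetAwaiter().GetResult();
        Console.WriteLine("Batches: " + batches); // prints "Batches: 3"
    }
}
```

One trade-off to keep in mind: each batch waits for its slowest write, so throughput is lower than with a sliding window (for example a SemaphoreSlim) that starts a new write as soon as any one finishes.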

How to use partitions in order to parallel consume one topic in kafka with .NET Core C#?

We are using the .NET Kafka client to consume messages from one topic in a C# code.
However, it seems to be a wee bit too slow.
Wondering if we could parallelize the process a bit, so I checked this answer there: Kafka how to consume one topic parallel
But I don't really see how to implement this partition thing with the .NET Kafka client in my example below:
var consumerBuilder = new ConsumerBuilder<Ignore, string>(GetConfig())
.SetErrorHandler((_, e) => _logger.LogError("Kafka consumer error on Revenue response. {@KafkaConsumerError}", e));
using (var consumer = consumerBuilder.Build())
{
consumer.Subscribe(RevenueResponseTopicName);
try
{
while (!stoppingToken.IsCancellationRequested)
{
var consumeResult = consumer.Consume(stoppingToken);
RevenueTopicResponseModel revenueResponse;
try
{
revenueResponse = JsonConvert.DeserializeObject<RevenueTopicResponseModel>(consumeResult.Value);
}
catch
{
_logger.LogCritical("Impossible to deserialize the response. {@RevenueConsumeResult}", consumeResult);
continue;
}
_logger.LogInformation("Revenue response received from Kafka. {RevenueTopicResponse}",
consumeResult.Value);
await _revenueService.RevenueResultReceivedAsync(revenueResponse);
}
}
catch (OperationCanceledException)
{
_logger.LogInformation($"Operation canceled. Closing {nameof(RevenueResponseConsumer)}.");
consumer.Close();
}
catch (Exception e)
{
_logger.LogCritical(e, $"Unhandled exception during {nameof(RevenueResponseConsumer)}.");
}
}
You need to create the topic with multiple partitions, let's say 10.
In your code, create 10 consumers with the same consumer group; the brokers will distribute the topic's partitions, and therefore its messages, among your consumers.
Basically, just put your code inside a for loop:
for (int i = 0; i < 10; i++)
{
var consumerBuilder = new ConsumerBuilder<Ignore, string>(GetConfig())
.SetErrorHandler((_, e) => _logger.LogError("Kafka consumer error on Revenue response. {@KafkaConsumerError}", e));
using (var consumer = consumerBuilder.Build())
{
// your processing here
}
}
To answer this question correctly, we need to know the reason behind the requirement for partitioning.
If your topic doesn't have many messages to process, partitioning is not the right tool. If the issue is that processing a single message takes too much time and you want to parallelize the work, you could add the consumed messages to a Channel and have as many consumers of that channel as needed running in the background.
Basically, you should still use a single consumer per process, since a consumer already uses threads in the background.
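The Channel approach could look roughly like the sketch below. It is a simplified illustration under a few assumptions: it uses System.Threading.Channels (built into .NET Core 3.0 and later, a NuGet package on older targets), the names ChannelFanOut and ProcessAsync are invented for the example, and a plain loop stands in for the Kafka consume loop so that it runs without a broker.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelFanOut
{
    // Drains the channel with workerCount concurrent workers.
    // Returns the number of messages that were handled.
    public static async Task<int> ProcessAsync(
        ChannelReader<string> reader, int workerCount, Func<string, Task> handle)
    {
        int processed = 0;
        var workers = Enumerable.Range(0, workerCount).Select(async _ =>
        {
            // WaitToReadAsync returns false once the writer has completed
            // and the channel is empty.
            while (await reader.WaitToReadAsync().ConfigureAwait(false))
            {
                while (reader.TryRead(out var message))
                {
                    await handle(message).ConfigureAwait(false);
                    Interlocked.Increment(ref processed);
                }
            }
        }).ToArray();

        await Task.WhenAll(workers).ConfigureAwait(false);
        return processed;
    }

    public static async Task Main()
    {
        // Bounded, so that slow handlers apply back-pressure to the producer.
        var channel = Channel.CreateBounded<string>(100);

        // Stand-in for the single Kafka consume loop.
        var producer = Task.Run(async () =>
        {
            for (int i = 0; i < 20; i++)
                await channel.Writer.WriteAsync("message " + i);
            channel.Writer.Complete();
        });

        int count = await ProcessAsync(channel.Reader, 4, _ => Task.Delay(10));
        await producer;
        Console.WriteLine("Processed " + count + " messages"); // prints "Processed 20 messages"
    }
}
```

Keep in mind that once messages are fanned out like this, they are no longer processed in partition order, so offset commits need extra care.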
You may also find my considerations about the Kafka consumer in C# in the article.
If you have any questions, please feel free to ask! I'll be glad to help you
You can commit after a set of offsets instead of committing on each offset, which could give you some performance benefit.
if (result.Offset.Value % 5 == 0)
{
    consumer.Commit(result);
}
This assumes EnableAutoCommit = false.

How to measure performance of Response.Redirect(url) and Response.Redirect(url,false)?

Many people[1][2] say Response.Redirect(url) is bad and that we should use Response.Redirect(url, false) instead, because the former throws an exception and kills the thread, and thus has scalability issues.
So I want to know the performance difference between the two, in numbers.
I created an ASP.NET web page whose only code is Response.Redirect.
Then I wrote this console application to issue requests to the page.
private const int concurrentRequests = 800;
static void Main(string[] args)
{
Console.WriteLine("Type in the URL:");
var url = Console.ReadLine();
Console.WriteLine($"concurrentRequests={concurrentRequests}");
ServicePointManager.DefaultConnectionLimit = concurrentRequests;
List<Task> tasks = new List<Task>(concurrentRequests);
Stopwatch watch = new Stopwatch();
watch.Start();
for (int i = 0; i < concurrentRequests; i++)
{
Task t = new Task(o => GetResponse((string)o), url);
tasks.Add(t);
t.Start();
}
Task.WaitAll(tasks.ToArray());
watch.Stop();
Console.WriteLine($"Execution time: {watch.ElapsedMilliseconds}");
Console.ReadKey();
}
static void GetResponse(string url)
{
var request =(HttpWebRequest) WebRequest.Create(url);
request.AllowAutoRedirect = false;
var response = request.GetResponse();
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
var content = sr.ReadToEnd();
}
}
I also reduced asp.net threads to 4 in machine.config.
However, it turns out Response.Redirect(url) takes 350ms to execute while Response.Redirect(url,false) takes 415ms.
Why doesn't the result conform to the theory in the articles?
There is one more important issue here, and that is security. But first, let me explain why one is faster than the other.
The slow and insecure case: Response.Redirect(url, endResponse: false)
It is slower because, with the false parameter, you allow the full page life cycle to run and the page to be rendered and sent to the client together with the redirect command. The page runs normally and is almost fully rendered, and for that reason it is slower.
The other case, the one that throws an exception, stops the rest of the processing and any further rendering of the page, leaving only the redirect command and whatever had been rendered up to that point.
In this question and answer of mine, I analyse and demonstrate that if the client ignores the redirect, it can see the full output of the page when you have called Redirect with the false option:
Redirect to a page with endResponse to true VS CompleteRequest and security thread
To close: I always use the redirect that stops the rest of the page processing. If I do use the other one, which allows the rest of the processing to run, I keep all of the above in mind: the page is rendered, and that takes extra time.

c# How to load test a webservice

I need to test if there's any memory leak in our application and monitor to see if memory usage increases too much while processing the requests.
I'm trying to develop some code to make multiple simultaneous calls to our api/webservice method. This api method is not asynchronous and takes some time to complete its operation.
I've done a lot of research about Tasks, Threads and Parallelism, but so far I've had no luck. The problem is that, even after trying all the solutions below, the result is always the same: it appears to be processing only two requests at a time.
Tried:
-> Creating tasks inside a simple for loop and starting them with and without setting them with TaskCreationOptions.LongRunning
-> Creating threads inside a simple for loop and starting them with and without high priority
-> Creating a list of actions on a simple for loop and starting them using
Parallel.Foreach(list, options, item => item.Invoke)
-> Running directly inside a Parallel.For loop (below)
-> Running TPL methods with and without Options and TaskScheduler
-> Tried with different values for MaxParallelism and maximum threads
-> Checked this post too, but it didn't help either. (Could I be missing something?)
-> Checked some other posts here in Stackoverflow, but with F# solutions that I don't know how to properly translate them to C#. (I never used F#...)
(Task Scheduler class taken from msdn)
Here's the basic structure that I have:
public class Test
{
Data _data;
String _url;
public Test(Data data, string url)
{
_data = data;
_url = url;
}
public ReturnData Execute()
{
ReturnData returnData;
using(var ws = new WebService())
{
ws.Url = _url;
ws.Timeout = 600000;
var wsReturn = ws.LongRunningMethod(_data);
// Basically convert wsReturn to my method return, with some logic if/else etc
}
return returnData;
}
}
sealed class ThreadTaskScheduler : TaskScheduler, IDisposable
{
// The runtime decides how many tasks to create for the given set of iterations, loop options, and scheduler's max concurrency level.
// Tasks will be queued in this collection
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
// Maintain an array of threads. (Feel free to bump up _n.)
private readonly int _n = 100;
private Thread[] _threads;
public ThreadTaskScheduler()
{
_threads = new Thread[_n];
// Create unstarted threads based on the same inline delegate
for (int i = 0; i < _n; i++)
{
_threads[i] = new Thread(() =>
{
// The following loop blocks until items become available in the blocking collection.
// Then one thread is unblocked to consume that item.
foreach (var task in _tasks.GetConsumingEnumerable())
{
TryExecuteTask(task);
}
});
// Start each thread
_threads[i].IsBackground = true;
_threads[i].Start();
}
}
// This method is invoked by the runtime to schedule a task
protected override void QueueTask(Task task)
{
_tasks.Add(task);
}
// The runtime will probe if a task can be executed in the current thread.
// By returning false, we direct all tasks to be queued up.
protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
{
return false;
}
public override int MaximumConcurrencyLevel { get { return _n; } }
protected override IEnumerable<Task> GetScheduledTasks()
{
return _tasks.ToArray();
}
// Dispose is not thread-safe with other members.
// It may only be used when no more tasks will be queued
// to the scheduler. This implementation will block
// until all previously queued tasks have completed.
public void Dispose()
{
if (_threads != null)
{
_tasks.CompleteAdding();
for (int i = 0; i < _n; i++)
{
_threads[i].Join();
_threads[i] = null;
}
_threads = null;
_tasks.Dispose();
_tasks = null;
}
}
}
And the test code itself:
private void button2_Click(object sender, EventArgs e)
{
var maximum = 100;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = 100,
TaskScheduler = new ThreadTaskScheduler()
};
// To prevent UI blocking
Task.Factory.StartNew(() =>
{
Parallel.For(0, maximum, options, i =>
{
var data = new Data();
// Fill data
var test = new Test(data, _url); //_url is pre-defined
var ret = test.Execute();
// Check return and display on screen
var now = DateTime.Now.ToString("HH:mm:ss");
var newText = $"{Environment.NewLine}[{now}] - ({ret.ReturnId}) {ret.ReturnDescription}";
AppendTextBox(newText, ref resultTextBox);
});
});
}
public void AppendTextBox(string value, ref TextBox textBox)
{
if (InvokeRequired)
{
this.Invoke(new ActionRef<string, TextBox>(AppendTextBox), value, textBox);
return;
}
textBox.Text += value;
}
And the result that I get is basically this:
[10:08:56] - (0) OK
[10:08:56] - (0) OK
[10:09:23] - (0) OK
[10:09:23] - (0) OK
[10:09:49] - (0) OK
[10:09:50] - (0) OK
[10:10:15] - (0) OK
[10:10:16] - (0) OK
etc
As far as I know there's no limitation on the server side. I'm relatively new to the Parallel/Multitasking world. Is there any other way to do this? Am I missing something?
(I simplified all the code for clearness and I believe that the provided code is enough to picture the mentioned scenarios. I also didn't post the application code, but it's a simple WinForms screen just to call and show results. If any code is somehow relevant, please let me know, I can edit and post it too.)
Thanks in advance!
EDIT1: I checked on the server logs that it's receiving the requests two by two, so it's indeed something related to sending them, not receiving.
Could it be a network problem/limitation related to how the framework manages the requests/connections? Or something with the network at all (unrelated to .net)?
EDIT2: Forgot to mention, it's a SOAP webservice.
EDIT3: One of the properties that I send (inside data) needs to change for each request.
EDIT4: I noticed that there's always an interval of ~25 secs between each pair of request, if it's relevant.
I would recommend not to reinvent the wheel and just use one of the existing solutions:
Most obvious choice: if your Visual Studio license allows it, you can use the MS Load Testing Framework; most likely you won't even have to write a single line of code: How to: Create a Web Service Test
SoapUI is a free and open source web services testing tool, it has some limited load testing capabilities
If for some reasons SoapUI is not suitable (i.e. you need to run load tests in clustered mode from several hosts or you need more enhanced reporting) you can use Apache JMeter - free and open source multiprotocol load testing tool which supports web services load testing as well.
A good way to create load tests without writing your own project is to use this service: https://loader.io/targets
It is free for small tests, you can POST parameters, headers, ... and you get nice reporting.
Isn't the "two requests at a time" behaviour the result of the default maxconnection=2 limit in connectionManagement?
<configuration>
<system.net>
<connectionManagement>
<add address = "http://www.contoso.com" maxconnection = "4" />
<add address = "*" maxconnection = "2" />
</connectionManagement>
</system.net>
</configuration>
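If editing the config file is not an option, the same limit can be raised in code. A minimal sketch, assuming the .NET Framework HTTP stack (HttpWebRequest and generated SOAP proxies) that the question uses; the value 100 and the names ConnectionLimitSetup and Raise are arbitrary choices for the example:

```csharp
using System;
using System.Net;

public static class ConnectionLimitSetup
{
    // Sets the default number of concurrent connections allowed per host
    // and returns the value now in effect.
    public static int Raise(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }

    public static void Main()
    {
        Console.WriteLine("Before: " + ServicePointManager.DefaultConnectionLimit);
        Console.WriteLine("After:  " + Raise(100));
    }
}
```

Call it once at startup, before the first request is sent, since the default only applies to connections to hosts that have not been contacted yet. On newer .NET versions ServicePointManager is marked obsolete and, as far as I know, does not govern HttpClient, so treat this as specific to the older stack.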
My favorite load testing library is NBomber. It has an easy and powerful API, realistic user simulations, and provides you with nice HTML reports about latency and requests per second.
I used it to test my API and wrote an article about how I did it.

Closing WCF Service from Async method?

I have a service layer project in an ASP.NET MVC 5 application I am creating on .NET 4.5.2, which calls out to an external third-party WCF service to get information asynchronously. The original method for calling the external service is shown below. There are three of these in total, all similar, which I call in order from my GetInfoFromExternalService method (it isn't actually called that; I'm just naming it for illustration).
private async Task<string> GetTokenIdForCarsAsync(Car[] cars)
{
try
{
if (_externalpServiceClient == null)
{
_externalpServiceClient = new ExternalServiceClient("WSHttpBinding_IExternalService");
}
string tokenId = await _externalpServiceClient.GetInfoForCarsAsync(cars).ConfigureAwait(false);
return tokenId;
}
catch (Exception ex)
{
//TODO plug in log 4 net
throw new Exception("Failed" + ex.Message);
}
finally
{
CloseExternalServiceClient(_externalpServiceClient);
_externalpServiceClient= null;
}
}
So that meant that when each async call had completed, the finally block ran: the WCF client was closed and set to null, and then newed up again when another request was made. This was working fine until a change was needed whereby, if the number of cars passed in by the user exceeds 1000, I split them with a Split function and then call my GetInfoFromExternalService method in a WhenAll for each batch of 1000, as below:
if (cars.Count > 1000)
{
const int packageSize = 1000;
var packages = SplitCarss(cars, packageSize);
//kick off the number of split packages we got above in Parallel and await until they all complete
await Task.WhenAll(packages.Select(GetInfoFromExternalService));
}
However, this now falls over: if I have 3000 cars, the call to GetTokenIdForCarsAsync news up the WCF client, but the finally block closes it, so the second batch of 1000 that is attempting to run throws an exception. If I remove the finally block the code works OK, but it is obviously not good practice to leave the WCF client unclosed.
I tried closing it after my if/else block where cars.Count is evaluated. But if a user uploads, e.g., 2000 cars and that takes, say, 1 minute to complete, then in the meantime, since the user has control of the web page, they could upload another 2000, or another user could upload, and again it falls over with an exception.
Can anyone see a good way to correctly close the external service client?
Based on the related question of yours, your "split" logic doesn't seem to give you what you're trying to achieve. WhenAll still executes requests in parallel, so you may end up running more than 1000 requests at any given moment of time. Use SemaphoreSlim to throttle the number of simultaneously active requests and limit that number to 1000. This way, you don't need to do any splits.
Another issue might be in how you handle the creation/disposal of the ExternalServiceClient client. I suspect there might be a race condition there.
Lastly, when you re-throw from the catch block, you should at least include a reference to the original exception.
Here's how to address these issues (untested, but should give you the idea):
const int MAX_PARALLEL = 1000;
SemaphoreSlim _semaphoreSlim = new SemaphoreSlim(MAX_PARALLEL);
volatile int _activeClients = 0;
readonly object _lock = new Object();
ExternalServiceClient _externalpServiceClient = null;
ExternalServiceClient GetClient()
{
lock (_lock)
{
if (_activeClients == 0)
_externalpServiceClient = new ExternalServiceClient("WSHttpBinding_IExternalService");
_activeClients++;
return _externalpServiceClient;
}
}
void ReleaseClient()
{
lock (_lock)
{
_activeClients--;
if (_activeClients == 0)
{
_externalpServiceClient.Close();
_externalpServiceClient = null;
}
}
}
private async Task<string> GetTokenIdForCarsAsync(Car[] cars)
{
var client = GetClient();
try
{
await _semaphoreSlim.WaitAsync().ConfigureAwait(false);
try
{
string tokenId = await client.GetInfoForCarsAsync(cars).ConfigureAwait(false);
return tokenId;
}
catch (Exception ex)
{
//TODO plug in log 4 net
throw new Exception("Failed" + ex.Message, ex);
}
finally
{
_semaphoreSlim.Release();
}
}
finally
{
ReleaseClient();
}
}
Updated based on the comment:
the External WebService company can accept me passing up to 5000 car objects in one call - though they recommend splitting into batches of 1000 and run up to 5 in parallel at one time - so when I mention 7000 - I dont mean GetTokenIdForCarAsync would be called 7000 times - with my code currently it should be called 7 times - i.e giving me back 7 token ids - I am wondering can I use your semaphore slim to run first 5 in parallel and then 2
The changes are minimal (but untested). First:
const int MAX_PARALLEL = 5;
Then, using Marc Gravell's ChunkExtension.Chunkify, we introduce GetAllTokenIdForCarsAsync, which in turn will be calling GetTokenIdForCarsAsync from above:
private async Task<string[]> GetAllTokenIdForCarsAsync(Car[] cars)
{
var results = new List<string>();
var chunks = cars.Chunkify(1000);
var tasks = chunks.Select(chunk => GetTokenIdForCarsAsync(chunk)).ToArray();
await Task.WhenAll(tasks);
return tasks.Select(task => task.Result).ToArray();
}
Now you can pass all 7000 cars into GetAllTokenIdForCarsAsync. This is a skeleton, it can be improved with some retry logic if any of the batch requests has failed (I'm leaving that up to you).
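Chunkify above refers to an extension method that is not shown here. As a stand-in, a chunking helper could look roughly like this. It is a hypothetical sketch written for this example (the names ChunkSketch and InChunksOf are invented), not Marc Gravell's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    // Splits source into consecutive arrays of at most size elements.
    public static IEnumerable<T[]> InChunksOf<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        return Iterate(source, size);
    }

    // Separate iterator so that the argument checks above run eagerly.
    private static IEnumerable<T[]> Iterate<T>(IEnumerable<T> source, int size)
    {
        var buffer = new List<T>(size);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                yield return buffer.ToArray();
                buffer.Clear();
            }
        }
        // Emit the last, partially filled chunk, if any.
        if (buffer.Count > 0) yield return buffer.ToArray();
    }

    public static void Main()
    {
        // 7000 items in chunks of 1000, matching the scenario in the comment.
        var chunks = Enumerable.Range(0, 7000).InChunksOf(1000).ToList();
        Console.WriteLine(chunks.Count); // prints 7
    }
}
```

On .NET 6 and later, the built-in Enumerable.Chunk does the same job.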
