Inconsistent outcome when trying to fetch all items from a paginated URL - C#

I am having an issue fetching all the forms currently hosted in HubSpot.
I first tried a simple for loop that made one request at a time and fetched one form at a time. That worked, but it was very slow.
I then thought it might work better to create a separate task for each request and have each task store its results in one common list.
The problem is that I expect the list to contain 2000 items, but I never seem to get that; the number of items I get is inconsistent.
How come?
This is how I have set up my fetching scheme:
private static async Task<IEnumerable<HubspotModel>> GetForms(
    string hubspotPath, int pageSize)
{
    int totalResults;
    int offset = 0;
    List<HubspotModel> output = new();
    List<Task> tasks = new();
    using (var client = new HttpClient())
    {
        System.Net.Http.Headers.HttpResponseHeaders requestHeader = client
            .GetAsync($"https://api.hubapi.com{hubspotPath}?" +
                $"hapikey={HubspotConfiguration.ApiKey}&limit={1}&offset={0}")
            .Result.Headers;
        totalResults = int.Parse(requestHeader.GetValues("x-total-count").First());
        do
        {
            tasks.Add(Task.Run(() =>
            {
                int scopedOffset = offset;
                IEnumerable<HubspotModel> forms = GetFormsFromHubspot(hubspotPath,
                    pageSize, offset, client);
                output.AddRange(forms);
            }).ContinueWith(requestReponse =>
            {
                if (requestReponse.IsFaulted)
                {
                    Console.WriteLine("it failed");
                }
            }));
            offset += pageSize;
        }
        while (totalResults > offset);
        await Task.WhenAll(tasks);
    }
    return output;
}

private static IEnumerable<HubspotModel> GetFormsFromHubspot(string hubspotPath,
    int pageSize, int offset, HttpClient client)
{
    HttpResponseMessage request = client
        .GetAsync($"https://api.hubapi.com{hubspotPath}?" +
            $"hapikey={HubspotConfiguration.ApiKey}&limit={pageSize}&offset={offset}")
        .Result;
    request.EnsureSuccessStatusCode();
    string content = request.Content.ReadAsStringAsync().Result;
    IEnumerable<Dictionary<string, object>> jsonResponse = JsonSerializer
        .Deserialize<IEnumerable<Dictionary<string, object>>>(content,
            new JsonSerializerOptions() { });
    var guid = Guid.Parse(jsonResponse.First()["guid"].ToString());
    var forms = jsonResponse.Select(x => new HubspotModel()
    {
        id = Guid.Parse(x["guid"].ToString()),
        FormName = x["name"].ToString(),
        Form = x
    });
    return forms;
}

First of all, I'd suggest making GetFormsFromHubspot async as well and using await client.GetAsync(...) and await request.Content.ReadAsStringAsync() instead of client.GetAsync(...).Result and ReadAsStringAsync().Result respectively, because .Result blocks the current thread and thus throws away the advantages of async tasks.
But the main cause of the problem should be the following:
GetFormsFromHubspot(hubspotPath, pageSize, offset, client);
Here you are calling GetFormsFromHubspot with an offset parameter from an outer scope (and that value keeps changing), so the task will not use the value offset had when you created the task, but the value it has when that particular part of the code is actually executed. So the value used as an offset is fairly random. You already tried to create a
int scopedOffset = offset;
but you don't use it, and you create it in the wrong place. Create scopedOffset outside of the task, but inside the loop's body. That way it is created at the creation time of the task, and because it's inside the loop's body, a new variable is created for each task.
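As a minimal, self-contained illustration of the capture problem (not HubSpot-specific): every lambda below captures the same loop variable i, so each task reads whatever value i has when it finally runs.

var tasks = new List<Task>();
for (int i = 0; i < 3; i++)
    tasks.Add(Task.Run(() => Console.WriteLine(i))); // may print "3 3 3" instead of "0 1 2"
await Task.WhenAll(tasks);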
The following should do the trick (after you refactor GetFormsFromHubspot to be async):
do {
    int scopedOffset = offset;
    tasks.Add(Task.Run(async () => {
        IEnumerable<HubspotModel> forms = await GetFormsFromHubspot(hubspotPath, pageSize, scopedOffset, client);
        output.AddRange(forms);
    })
    .ContinueWith(...));
    offset += pageSize;
} while (totalResults > offset);

The main problem with your code is that the List<T> is not thread-safe. When multiple threads Add to a list concurrently without synchronization, its behavior becomes undefined (it throws exceptions, becomes corrupted etc.). There are several ways to solve this problem:
Synchronize access to the list with the lock statement: lock (output) output.AddRange(forms);.
Use a concurrent collection instead of the List<T>, for example a ConcurrentQueue<T> (see the sketch after this list).
Avoid collecting the output manually altogether. Instead of storing your tasks in a List<Task>, you can store them in a List<Task<HubspotModel[]>>, meaning that each task will be a generic Task<TResult>, with the TResult being an array of HubspotModel instances. You then get all the output at once when you await the Task.WhenAll.
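As a rough sketch of the second option (assuming the rest of GetForms stays as it is, and noting that ConcurrentQueue<T> has no AddRange):

// requires using System.Collections.Concurrent;
ConcurrentQueue<HubspotModel> output = new(); // thread-safe, no lock needed
// ...inside each task, enqueue the fetched forms one by one:
foreach (HubspotModel form in forms)
    output.Enqueue(form);
// ...and after await Task.WhenAll(tasks):
return output.ToList();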
Below is an implementation of the third idea. Notice that I have avoided creating an HttpClient instance, because the recommendation is to instantiate this class only once and reuse it throughout the life of the application.
private static async Task<HubspotModel[]> GetFormsAsync(HttpClient client,
    string hubspotPath, int pageSize)
{
    string url = $"https://api.hubapi.com{hubspotPath}?hapikey=" +
        $"{HubspotConfiguration.ApiKey}&limit={1}&offset={0}";
    HttpResponseMessage response = await client.GetAsync(url)
        .ConfigureAwait(false);
    response.EnsureSuccessStatusCode();
    int totalCount = Int32.Parse(response.Headers
        .GetValues("x-total-count").First());
    List<int> offsets = new();
    for (int offset = 0; offset < totalCount; offset += pageSize)
        offsets.Add(offset);
    Task<HubspotModel[]>[] tasks = offsets.Select(offset => Task.Run(async () =>
    {
        HubspotModel[] forms = await GetFormsAsync(client,
            hubspotPath, pageSize, offset).ConfigureAwait(false);
        return forms;
    })).ToArray();
    HubspotModel[][] results = await Task.WhenAll(tasks).ConfigureAwait(false);
    return results.SelectMany(x => x).ToArray();
}
private async static Task<HubspotModel[]> GetFormsAsync(HttpClient client,
    string hubspotPath, int pageSize, int offset)
{
    string url = $"https://api.hubapi.com{hubspotPath}?hapikey=" +
        $"{HubspotConfiguration.ApiKey}&limit={pageSize}&offset={offset}";
    HttpResponseMessage response = await client.GetAsync(url)
        .ConfigureAwait(false);
    response.EnsureSuccessStatusCode();
    string content = await response.Content.ReadAsStringAsync()
        .ConfigureAwait(false);
    IEnumerable<Dictionary<string, object>> jsonResponse = JsonSerializer
        .Deserialize<IEnumerable<Dictionary<string, object>>>(content,
            new JsonSerializerOptions() { });
    Guid guid = Guid.Parse(jsonResponse.First()["guid"].ToString());
    HubspotModel[] forms = jsonResponse.Select(x => new HubspotModel()
    {
        Id = Guid.Parse(x["guid"].ToString()),
        FormName = x["name"].ToString(),
        Form = x
    }).ToArray();
    return forms;
}
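A hypothetical usage example (the path "/forms/v2/forms" and the page size are made up for illustration; check HubSpot's documentation for the actual endpoint):

private static readonly HttpClient SharedClient = new();
// ...
HubspotModel[] allForms = await GetFormsAsync(SharedClient, "/forms/v2/forms", 50);
Console.WriteLine($"Fetched {allForms.Length} forms.");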
One more improvement that you could consider is switching from Task.WhenAll to the new (.NET 6) API Parallel.ForEachAsync. The advantage is that you get control over the degree of parallelism, so you can reduce the parallelization in case the remote server can't keep up with the pressure. Unfortunately the Parallel.ForEachAsync method does not return the results like Task.WhenAll does, so you'll be back to your original problem. You can find a solution about this here: ForEachAsync with Result.
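A rough sketch of that workaround, replacing the Task.Run/WhenAll section of GetFormsAsync above with Parallel.ForEachAsync plus a thread-safe ConcurrentBag (the degree of parallelism 4 is an arbitrary choice):

// requires using System.Collections.Concurrent;
ParallelOptions options = new() { MaxDegreeOfParallelism = 4 };
ConcurrentBag<HubspotModel> bag = new();
await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
{
    HubspotModel[] forms = await GetFormsAsync(client,
        hubspotPath, pageSize, offset).ConfigureAwait(false);
    foreach (HubspotModel form in forms) bag.Add(form);
});
return bag.ToArray();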

Asynchronously download and compile list of JsonDocument

I'm a little new (returning after a couple of decades) to C# and to the async/await model of programming. I'm looking for a little guidance, since I received an understandable warning CS1998 that the asynchronous method lacks 'await' operators and will run synchronously.
The code below is, I think, straightforward: the server API returns data in pages of 25 items. I'm using a continuation to add each page of 25 to a List of JsonDocuments. Calling code will handle the parsing as needed. I'm not sure how I could reasonably leverage anything further in this, but I'm looking for any suggestions/guidance.
internal static async Task<List<JsonDocument>> Get_All_Data(HttpClient client, string endpoint)
{
    Console.WriteLine("Downloading all data from {0}{1}", client.BaseAddress, endpoint);
    var all_pages = new List<JsonDocument>();
    // Get first page to determine total number of pages
    HttpResponseMessage response = client.GetAsync(endpoint).Result;
    Console.WriteLine("Initial download complete - parsing headers to determine total pages");
    //int items_per_page;
    if (int.TryParse(Get_Header_Value("X-Per-Page", response.Headers), out int items_per_page) == false)
        // throw new Exception("Response missing X-Per-Page in header");
        items_per_page = 25;
    if (int.TryParse(Get_Header_Value("X-Total-Count", response.Headers), out int total_items) == false)
        //throw new Exception("Response missing X-Total-Count in header");
        total_items = 1;
    // Division returns number of complete pages, add 1 for partial IF total_items is not an exact multiple of items_per_page
    var total_pages = total_items / items_per_page;
    if ((total_items % items_per_page) != 0) total_pages++;
    Console.WriteLine("{0} pages to be downloaded", total_pages);
    var http_tasks = new Task[total_pages];
    for (int i = 1; i <= total_pages; i++)
    {
        Console.WriteLine("Downloading page {0}", i);
        var paged_endpoint = endpoint + "?page=" + i;
        response = client.GetAsync(paged_endpoint).Result;
        http_tasks[i - 1] = response.Content.ReadAsStringAsync()
            .ContinueWith((_content) => { all_pages.Add(JsonDocument.Parse(_content.Result)); });
        //http_tasks[i].ContinueWith((_content) => { all_pages.Add(JsonDocument.Parse_List(_content.Result)); });
    }
    System.Threading.Tasks.Task.WaitAll(http_tasks); // wait for all of the downloads and parsing to complete
    return all_pages;
}
Thanks for your help
My suggestion is to await all asynchronous operations, and use the Parallel.ForEachAsync method to parallelize the downloading of the JSON documents, while maintaining control of the degree of parallelism:
static async Task<JsonDocument[]> GetAllData(HttpClient client, string endpoint)
{
    HttpResponseMessage response = await client.GetAsync(endpoint);
    response.EnsureSuccessStatusCode();
    if (!Int32.TryParse(GetHeaderValue(response, "X-Total-Count"),
            out int totalItems) || totalItems < 0)
        totalItems = 1;
    if (!Int32.TryParse(GetHeaderValue(response, "X-Per-Page"),
            out int itemsPerPage) || itemsPerPage < 1)
        itemsPerPage = 25;
    int totalPages = ((totalItems - 1) / itemsPerPage) + 1;
    JsonDocument[] results = new JsonDocument[totalPages];
    ParallelOptions options = new() { MaxDegreeOfParallelism = 5 };
    await Parallel.ForEachAsync(Enumerable.Range(1, totalPages), options,
        async (page, ct) =>
        {
            string pageEndpoint = endpoint + "?page=" + page;
            HttpResponseMessage pageResponse = await client
                .GetAsync(pageEndpoint, ct);
            pageResponse.EnsureSuccessStatusCode();
            string pageContent = await pageResponse.Content.ReadAsStringAsync(ct);
            JsonDocument result = JsonDocument.Parse(pageContent);
            results[page - 1] = result;
        });
    return results;
}
static string GetHeaderValue(HttpResponseMessage response, string name)
=> response.Headers.TryGetValues(name, out var values) ?
values.FirstOrDefault() : null;
The MaxDegreeOfParallelism is configured to the value 5 for demonstration purposes. You can find the optimal degree of parallelism by experimenting with your API. Setting the value too low might result in mediocre performance. Setting the value too high might overburden the target server, and potentially trigger an anti-DoS-attack mechanism.
If you are not familiar with Enumerable.Range, it is a LINQ method that returns an incrementing sequence of integers that starts from start and contains count elements; Enumerable.Range(1, totalPages) here yields 1, 2, ..., totalPages.
GetAllData is an asynchronous method, and it is supposed to be awaited. If you call it without await and your application is a UI application like WinForms or WPF, you are at risk of a deadlock. Don't panic: it happens consistently, so you'll observe it during testing. One way to prevent it is to append .ConfigureAwait(false) to all awaited operations inside the GetAllData method.
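For example (a minimal sketch), the first await inside GetAllData would become the following, and the awaits inside the ForEachAsync body would get the same treatment:

HttpResponseMessage response = await client.GetAsync(endpoint).ConfigureAwait(false);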

Calling HttpClient and getting identical results from paged requests - is it me or the service?

I am sending five HttpClient requests to the same URL, but with a varying page number parameter. They all fire async, and then I wait for them all to finish using Task.WaitAll(). My requests use System.Net.Http.HttpClient.
This mostly works fine, and I get five distinct results representing each page of the data about 99% of the time.
But every so often, and I have not dug into deep analysis yet, I get the exact same response for each task. Each task does indeed instantiate its own HttpClient. When I was reusing one client instance, I got this problem, but since I started instantiating a new client for every call, the problem went away.
I am calling a 3rd-party web service over which I have no control. So before nagging their team too much about this, I want to know if I may be doing something wrong here, or if there is some aspect of HttpClient or Task that I'm missing.
Here is the calling code:
for (int i = 1; i <= 5; i++)
{
    page = load_made + i;
    var t_page = page;
    var t_url = url;
    var task = new Task<List<T>>(() => DoPagedLoad<T>(t_page, per_page, t_url));
    task.Run();
    tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
Here is the code in DoPagedLoad, which returns a Task:
var client = new HttpClient();
var response = client.GetAsync(url).Result;
var results = response.Content.ReadAsStringAsync().Result;
I would appreciate any help from folks familiar with the possible quirks of Task and HttpClient.
NOTE: Run is an extension method to help with async exceptions.
public static Task Run(this Task task)
{
    task.Start();
    task.ContinueWith(t =>
    {
        if (t.Exception != null)
            Log.Error(t.Exception.Flatten().ToString());
    });
    return task;
}
It's hard to give a definitive answer because we don't have all the details, but here's a sample implementation of how you could fire off HTTP requests. Notice that all async operations are awaited: Result and Wait / WaitAll are not used. You should almost never need any of those; they block synchronously and can create problems.
Also notice that there are no global cookie containers, default headers, etc. defined for the HTTP client. If you need any of that stuff, just create individual HttpRequestMessage objects and add whatever headers you need. Don't use the global properties; it's a lot cleaner to set per-request properties.
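For example, a minimal sketch of a per-request header (the header value and uri here are placeholders):

using var request = new HttpRequestMessage(HttpMethod.Get, uri);
request.Headers.Add("Accept", "application/json");
HttpResponseMessage response = await _httpClient.SendAsync(request);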
// Globally defined HTTP client.
private static readonly HttpClient _httpClient = new HttpClient();

// Other stuff here...

private async Task SomeFunctionToGetContent()
{
    var requestTasks = new List<Task<HttpResponseMessage>>();
    var responseTasks = new List<Task>();
    for (var i = 0; i < 5; i++)
    {
        // Fake URI but still based on the counter (or other
        // variable, similar to page in the question)
        var uri = new Uri($"https://.../{i}.html");
        requestTasks.Add(_httpClient.GetAsync(uri));
    }
    await (Task.WhenAll(requestTasks));
    for (var i = 0; i < 5; i++)
    {
        var response = await (requestTasks[i]);
        responseTasks.Add(HandleResponse(response));
    }
    await (Task.WhenAll(responseTasks));
}

private async Task HandleResponse(HttpResponseMessage response)
{
    try
    {
        if (response.Content != null)
        {
            var content = await (response.Content.ReadAsStringAsync());
            // do something with content here; check IsSuccessStatusCode to
            // see if the request failed or succeeded
        }
        else
        {
            // Do something when no content
        }
    }
    finally
    {
        response.Dispose();
    }
}

Parallel.For and httpclient crash the application C#

I want to avoid the application crash caused by the parallel for loop and HttpClient, but I am unable to apply the solutions provided elsewhere on the web due to my limited programming knowledge. My code is pasted below.
class Program
{
    public static List<string> words = new List<string>();
    public static int count = 0;
    public static string output = "";
    private static HttpClient Client = new HttpClient();

    public static void Main(string[] args)
    {
        //input path strings...
        List<string> links = new List<string>();
        links.AddRange(File.ReadAllLines(input));
        List<string> longList = new List<string>(File.ReadAllLines(@"a.txt"));
        words.AddRange(File.ReadAllLines(output1));
        System.Net.ServicePointManager.DefaultConnectionLimit = 8;
        count = longList.Count;
        //for (int i = 0; i < longList.Count; i++)
        Task.Run(() => Parallel.For(0, longList.Count, new ParallelOptions { MaxDegreeOfParallelism = 5 }, (i, loopState) =>
        {
            Console.WriteLine(i);
            string link = @"some link" + longList[i] + "/";
            try
            {
                if (!links.Contains(link))
                {
                    Task.Run(async () => { await Download(link); }).Wait();
                }
            }
            catch (System.Exception e)
            {
            }
        }));
        //}
    }

    public static async Task Download(string link)
    {
        HtmlAgilityPack.HtmlDocument document = new HtmlDocument();
        document.LoadHtml(await getURL(link));
        //...stuff with html agility pack
    }

    public static async Task<string> getURL(string link)
    {
        string result = "";
        HttpResponseMessage response = await Client.GetAsync(link);
        Console.WriteLine(response.StatusCode);
        if (response.IsSuccessStatusCode)
        {
            HttpContent content = response.Content;
            var bytes = await response.Content.ReadAsByteArrayAsync();
            result = Encoding.UTF8.GetString(bytes);
        }
        return result;
    }
}
There are solutions, for example this one, but I don't know how to put the await keyword in my Main method, and currently the program simply exits before the work started by Task.Run() completes, because nothing awaits it. As you can see, I have already applied a workaround in the async Download() method so I can call it from Main.
I also have doubts about using the same instance of HttpClient in different parallel threads. Please advise whether I should create a new instance of HttpClient each time.
You're right that you have to block somewhere in a console application; otherwise the program will just exit before it's complete. But you're doing this more than you need to. Aim to block only the main thread and delegate the rest to an async method. A good practice is to create a method with a signature like private async Task MainAsync(string[] args), put the "guts" of your program logic there, and call it from Main like this:
MainAsync(args).Wait();
In your example, move everything from Main to MainAsync. Then you're free to use await as much as you want. Task.Run and Parallel.For explicitly consume new threads, which is unnecessary for I/O-bound work in the async world. Use Task.WhenAll instead. The last part of your MainAsync method should end up looking something like this:
await Task.WhenAll(longList.Select(async s =>
{
    Console.WriteLine(s);
    string link = @"some link" + s + "/";
    try
    {
        if (!links.Contains(link))
        {
            await Download(link);
        }
    }
    catch (System.Exception e)
    {
    }
}));
There is one little wrinkle here though: your example throttles the parallelism at 5. If you find you still need this, TPL Dataflow is a great library for throttled parallelism in the async world.
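Here's a simple, hedged sketch (it reuses Download and longList from the question; ActionBlock's MaxDegreeOfParallelism plays the throttling role):

// requires the System.Threading.Tasks.Dataflow package
var block = new ActionBlock<string>(
    link => Download(link),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

foreach (string s in longList)
    block.Post(@"some link" + s + "/");

block.Complete();
await block.Completion; // completes once every posted link has been processed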
Regarding HttpClient, using a single instance across threads is completely safe and highly encouraged.

Can this c# async process be more performant?

I am working on a program which makes multiple JSON calls to retrieve its data.
The data, however, is pretty big, and when running it without async it takes 17 hours to fully process.
The fetching of the data goes as follows:
A call to a service with a page number (2000 pages in total to be processed), which returns 200 records per page.
For each record it returns, another service needs to be called to receive the data for the current record.
I'm new to the whole async functionality; I've made an attempt using async and await and already got a performance boost, but I'm wondering whether this is the correct way of using it and whether there are any other ways to increase performance.
This is the code I currently have:
static void Main(string[] args)
{
    MainAsyncCall().Wait();
    Console.ReadKey();
}

public static async Task MainAsyncCall()
{
    ServicePointManager.DefaultConnectionLimit = 999999;
    List<Task> allPages = new List<Task>();
    for (int i = 0; i <= 10; i++)
    {
        var page = i;
        allPages.Add(Task.Factory.StartNew(() => processPage(page)));
    }
    Task.WaitAll(allPages.ToArray());
    Console.WriteLine("Finished all pages");
}

public static async Task processPage(Int32 page)
{
    List<Task> players = new List<Task>();
    using (var client = new HttpClient())
    {
        string url = "<Request URL>";
        var response = client.GetAsync(url).Result;
        var content = response.Content.ReadAsStringAsync().Result;
        dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
        dynamic data = item.data;
        var localPage = page;
        Console.WriteLine($"Processing Page: {localPage}");
        foreach (dynamic d in data)
        {
            players.Add(Task.Factory.StartNew(() => processPlayer(d, localPage)));
        }
    }
    Task.WaitAll(players.ToArray());
    Console.WriteLine($"Finished Page: {page}");
}

public static async Task processPlayer(dynamic player, int page)
{
    using (var client = new HttpClient())
    {
        string url = "<Request URL>";
        HttpResponseMessage response = null;
        response = client.GetAsync(url).Result;
        var content = await response.Content.ReadAsStringAsync();
        dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
        Console.WriteLine($"{page}: Processed {item.name}");
    }
}
Any suggestion is welcome!
This is what it should look like to me:
static void Main(string[] args)
{
    // it's okay here to use Wait because we're at the root of the application
    new AsyncServerCalls().MainAsyncCall().Wait();
    Console.ReadKey();
}

public class AsyncServerCalls
{
    // don't use static async methods
    public async Task MainAsyncCall()
    {
        ServicePointManager.DefaultConnectionLimit = 999999;
        List<Task> allPages = new List<Task>();
        for (int i = 0; i <= 10; i++)
        {
            var page = i;
            allPages.Add(processPage(page));
        }
        await Task.WhenAll(allPages.ToArray());
        Console.WriteLine("Finished all pages");
    }

    public async Task processPage(Int32 page)
    {
        List<Task> players = new List<Task>();
        using (var client = new HttpClient())
        {
            string url = "<Request URL>";
            var response = await client.GetAsync(url); // nope: .Result
            var content = await response.Content.ReadAsStringAsync(); // again, never use .Result
            dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
            dynamic data = item.data;
            var localPage = page;
            Console.WriteLine($"Processing Page: {localPage}");
            foreach (dynamic d in data)
            {
                players.Add(processPlayer(d, localPage)); // no need to put the task unnecessarily on a different thread, let the current SynchronisationContext deal with that
            }
        }
        await Task.WhenAll(players.ToArray()); // always await a task in an async method
        Console.WriteLine($"Finished Page: {page}");
    }

    public async Task processPlayer(dynamic player, int page)
    {
        using (var client = new HttpClient())
        {
            string url = "<Request URL>";
            HttpResponseMessage response = null;
            response = await client.GetAsync(url); // don't use .Result
            var content = await response.Content.ReadAsStringAsync();
            dynamic item = Newtonsoft.Json.JsonConvert.DeserializeObject(content);
            Console.WriteLine($"{page}: Processed {item.name}");
        }
    }
}
So basically the point here is to make sure you let the SynchronisationContext do its job. Inside a console program it should use TaskScheduler.Default, which is a thread-pool SynchronisationContext. You can always force this by doing:
static void Main(string[] args)
{
    Task.Run(() => new AsyncServerCalls().MainAsyncCall()).Wait();
    Console.ReadKey();
}
Reference to Task.Run forcing Default
One thing you need to remember, which I got into trouble with last week, is that you can fire-hose the thread pool, i.e. spawn so many tasks that your process just dies with insane CPU and memory usage. So you may need to use a semaphore to limit the number of tasks that run concurrently.
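For example, a minimal sketch of that throttling (processPage is the method from above; the limit of 10 concurrent pages is an arbitrary choice):

var gate = new SemaphoreSlim(10); // at most 10 pages in flight at once
var throttled = Enumerable.Range(0, 11).Select(async page =>
{
    await gate.WaitAsync();
    try { await processPage(page); }
    finally { gate.Release(); }
});
await Task.WhenAll(throttled);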
I created a solution, Parallel Read, that processes a single file in multiple parts at the same time; it is still being worked on, but it shows the use of async.
Just to clarify the parallelism: when you take a reference to all those tasks:
allPages.Add(processPage(page));
they will all be started. When you then do:
await Task.WhenAll(allPages);
this will block the current method's execution until all those page processes have finished (it won't block the current thread though; don't get these confused).
Danger Zone
If you don't want to block method execution on Task.WhenAll, you can run all the page processes in parallel and add each Task to an overall List<Task>. However, the danger with this is the fire-hosing: you are going to have to limit the number of tasks you execute at some point, and where that happens is up to you; just remember that it will have to happen at some point.

Throttling asynchronous tasks in asp .net, with a limit on N successful tasks

I'm using Asp .Net 4.5.1.
I have tasks to run which call a web service, and some might fail. I need to run N successful tasks that perform some light CPU work and mainly call a web service, then stop, and I want to throttle.
For example, assume we have 300 URLs in some collection. We need to run a function Task<bool> CheckUrlAsync(url) on each of them, with throttling, meaning, for example, only 5 running "at the same time" (in other words, a maximum of 5 connections in use at any given time). Also, we only want to perform N (say 100) successful operations, and then stop.
I've read this and this and still I'm not sure what would be the correct way to do it.
How would you do it?
Assume ASP .Net
Assume an IO call (an HTTP call to a web service), no heavy CPU operations.
Use SemaphoreSlim.
var semaphore = new SemaphoreSlim(5);
var tasks = urlCollection.Select(async url =>
{
    await semaphore.WaitAsync();
    try
    {
        return await CheckUrlAsync(url);
    }
    finally
    {
        semaphore.Release();
    }
}).ToList();

while (tasks.Count(t => t.IsCompleted) < 100)
{
    await Task.WhenAny(tasks);
}
Although I would prefer to use Rx.Net to produce some better code.
using (var semaphore = new SemaphoreSlim(5))
{
    var results = urlCollection.ToObservable()
        .Select(async url =>
        {
            await semaphore.WaitAsync();
            try
            {
                return await CheckUrlAsync(url);
            }
            finally
            {
                semaphore.Release();
            }
        }).Take(100).ToList();
}
Okay...this is going to be fun.
public static class SemaphoreHelper
{
    public static Task<T> ContinueWith<T>(
        this SemaphoreSlim semaphore,
        Func<Task<T>> action)
    {
        var ret = semaphore.WaitAsync()
            .ContinueWith(_ => action()).Unwrap();
        ret.ContinueWith(_ => semaphore.Release(), TaskContinuationOptions.None);
        return ret;
    }
}

var semaphore = new SemaphoreSlim(5);
var results = urlCollection.Select(
    url => semaphore.ContinueWith(() => CheckUrlAsync(url))).ToList();
I do need to add that the code as it stands will still run all 300 URLs; it just returns quicker, that's all. You would need to pass a cancellation token into semaphore.WaitAsync(token) to cancel the queued work. Again, I suggest using Rx.Net for that; it's just easier to get the cancellation token to work with .Take(100) in Rx.Net.
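A rough sketch of that cancellation idea (it reuses the semaphore pattern from above; note that tasks still waiting when the token fires end up cancelled rather than returning a result, so awaiting them afterwards needs to tolerate OperationCanceledException):

using var cts = new CancellationTokenSource();
int successes = 0;
var tasks = urlCollection.Select(async url =>
{
    await semaphore.WaitAsync(cts.Token); // queued work is abandoned once cancelled
    try
    {
        bool ok = await CheckUrlAsync(url);
        if (ok && Interlocked.Increment(ref successes) >= 100)
            cts.Cancel(); // 100 successes reached, stop releasing queued work
        return ok;
    }
    finally
    {
        semaphore.Release();
    }
}).ToList();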
Try something like this?
private const int u_limit = 100;
private const int c_limit = 5;

List<Task> tasks = new List<Task>();
int totalRun = 0;
while (totalRun < u_limit)
{
    for (int i = 0; i < c_limit; i++)
    {
        tasks.Add(Task.Run(() =>
        {
            // Your code here.
        }));
    }
    Task.WaitAll(tasks.ToArray());
    totalRun += c_limit;
}
