AFAIK some methods in the .Net library are able to do I/O jobs asynchronously without consuming a thread from the pool.
If my information are correct the WebClient *Async methods do that.
I'd like to verify it by checking that effectively threads from the pool are not used during a download.
So my general question is : how can I monitor the current state of the thread-pool?
number of threads
number of busy threads
Is there some API (GetAvailableThreads?) or performance counters that would give this information?
EDIT: here are some more details
I'm writing a simple benchmark for educational purposes:
string[] urls = Enumerable.Repeat("http://google.com", 32).ToArray();
/*{
"http://google.com",
"http://yahoo.com",
"http://microsoft.com",
"http://wikipedia.com",
"http://cnn.com",
"http://facebook.com",
"http://youtube.com",
"http://twitter.com"
};*/
/*Task.Run(() =>
{
while (true)
{
int wt, cpt;
ThreadPool.GetAvailableThreads(out wt, out cpt);
Console.WriteLine("{0} / {1}", wt, cpt);
Thread.Sleep(100);
}
});*/
WebClient webClient = new WebClient();
Stopwatch stopwatch = Stopwatch.StartNew();
foreach (string url in urls)
{
webClient.DownloadString(url);
Console.WriteLine("Got '{0}'", url);
}
stopwatch.Stop();
TimeSpan sequentialTime = stopwatch.Elapsed;
stopwatch.Restart();
CountdownEvent cde = new CountdownEvent(1);
foreach (string url in urls)
{
cde.AddCount();
webClient = new WebClient();
webClient.DownloadStringCompleted += (_, __) =>
{
Console.WriteLine("Got '{0}'", __.UserState);
cde.Signal();
};
webClient.DownloadStringAsync(new Uri(url), url);
}
cde.Signal();
cde.Wait();
stopwatch.Stop();
TimeSpan asyncTime = stopwatch.Elapsed;
stopwatch.Restart();
ThreadLocal<WebClient> threadWebClient = new ThreadLocal<WebClient>(() => new WebClient());
urls.AsParallel().WithDegreeOfParallelism(urls.Length).ForAll(url => threadWebClient.Value.DownloadString(url));
stopwatch.Stop();
TimeSpan PLinqTime = stopwatch.Elapsed;
Console.WriteLine("Sequential time: {0}.", sequentialTime);
Console.WriteLine("PLinq time: {0}.", PLinqTime);
Console.WriteLine("Async time: {0}.", asyncTime);
I'm comparing :
naive sequential loop
PLINQ loop
async I/Os
The interesting part are the last two.
I expect and try to prove that async I/Os are:
faster because they will create less pressure on the pool (less threads need to be created...)
lighter because they will consume less thread of the pool
My "benchmark" shows that it's faster and I guess that's because the pool does not need to allocate new threads for each request whereas with PLINQ each parallel request will block one thread.
Now I'd like to check the numbers about thread consumption.
The commented task was a poor attempt to monitor the pool. It may be the good starting point but until now the result are not really consistent with what I expect: it never displays that more than 3/4 threads are consumed, whereas I expect something like 32 threads busy.
I'm open to any idea to enhance it or better any other use-case that would clearly highlight the differences between the two approaches.
Hope this is clearer now, and sorry for not having provided the details sooner. :)
The ThreadPool class provides the GetAvailableThreads method which "Retrieves the difference between the maximum number of thread pool threads returned by the GetMaxThreads method, and the number currently active." [1]: http://msdn.microsoft.com/en-us/library/system.threading.threadpool.getavailablethreads%28v=vs.110%29.aspx
You can capture the ratio thustly:
int workerThreads;
int completionPortThreads;
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
Console.WriteLine("{0} of {1} threads available", workerThreads, completionPortThreads);
Related
I have read a few stackoverflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it right.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs Asynchronous tasks, but running synchronously in the loop using a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped to be running Asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run Asynchronously in Multi-thread:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
Stop watch results interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong or is that more or less what I should be expecting? I suppose writing to files would be a better to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GerTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
await new SynchronizationContextRemover();
methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
if (arguments.Length % 2 == 0)
{
throw new ArgumentException("Must pass function name and then name and value of each argument");
}
string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
string cacheKey = methodName;
for (int i = 1; i < arguments.Length;)
{
cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
}
if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
{
return (await cache.Get<T>(cacheKey, async () =>
{
T innerResult = await method();
return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
})).Value;
}
else
{
return await method();
}
}
At first it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies to values existing in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and the Task.WhenAll. Since you've already measured this method and haven't observed any performance improvements, I am assuming that there something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
In general, when you do not see an increase by multi threading, it is because your task is not CPU limited or large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
This can hae 2 reasons:
1: Looking up the ticker is brutally fast and it should not be an async to start with. I.e. when you look it up in a dictionary.
2: This is running via a http connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. It is particularly not the case in a codel ike this:
await selectedApi.GetTickerAsync(symbol);
Where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis) {
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
} }
This is linear non threaded code using an async interface to not block the current thread while the (likely expensive IO) operation is in place. It starts one, THEN WAITS FOR THE RESULT. No 2 queries ever start at the same time.
If you want a possible (just as example) more scalable way:
In the foreach, do not await but add the task to a list of tasks.
Then start await once all the tasks have started. Likein a 2nd loop.
WAY not perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - an item possibly not relevant in this case and definitely not measured in your test.
I have two versions of my program that submit ~3000 HTTP GET requests to a web server.
The first version is based off of what I read here. That solution makes sense to me because making web requests is I/O bound work, and the use of async/await along with Task.WhenAll or Task.WaitAll means that you can submit 100 requests all at once and then wait for them all to finish before submitting the next 100 requests so that you don't bog down the web server. I was surprised to see that this version completed all of the work in ~12 minutes - way slower than I expected.
The second version submits all 3000 HTTP GET requests inside a Parallel.ForEach loop. I use .Result to wait for each request to finish before the rest of the logic within that iteration of the loop can execute. I thought that this would be a far less efficient solution, since using threads to perform tasks in parallel is usually better suited for performing CPU bound work, but I was surprised to see that the this version completed all of the work within ~3 minutes!
My question is why is the Parallel.ForEach version faster? This came as an extra surprise because when I applied the same two techniques against a different API/web server, version 1 of my code was actually faster than version 2 by about 6 minutes - which is what I expected. Could performance of the two different versions have something to do with how the web server handles the traffic?
You can see a simplified version of my code below:
private async Task<ObjectDetails> TryDeserializeResponse(HttpResponseMessage response)
{
try
{
using (Stream stream = await response.Content.ReadAsStreamAsync())
using (StreamReader readStream = new StreamReader(stream, Encoding.UTF8))
using (JsonTextReader jsonTextReader = new JsonTextReader(readStream))
{
JsonSerializer serializer = new JsonSerializer();
ObjectDetails objectDetails = serializer.Deserialize<ObjectDetails>(
jsonTextReader);
return objectDetails;
}
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<HttpResponseMessage> TryGetResponse(string urlStr)
{
try
{
HttpResponseMessage response = await httpClient.GetAsync(urlStr)
.ConfigureAwait(false);
if (response.StatusCode != HttpStatusCode.OK)
{
throw new WebException("Response code is "
+ response.StatusCode.ToString() + "... not 200 OK.");
}
return response;
}
catch (Exception e)
{
// Log exception
return null;
}
}
private async Task<ListOfObjects> GetObjectDetailsAsync(string baseUrl, int id)
{
string urlStr = baseUrl + #"objects/id/" + id + "/details";
HttpResponseMessage response = await TryGetResponse(urlStr);
ObjectDetails objectDetails = await TryDeserializeResponse(response);
return objectDetails;
}
// With ~3000 objects to retrieve, this code will create 100 API calls
// in parallel, wait for all 100 to finish, and then repeat that process
// ~30 times. In other words, there will be ~30 batches of 100 parallel
// API calls.
private Dictionary<int, Task<ObjectDetails>> GetAllObjectDetailsInBatches(
string baseUrl, Dictionary<int, MyObject> incompleteObjects)
{
int batchSize = 100;
int numberOfBatches = (int)Math.Ceiling(
(double)incompleteObjects.Count / batchSize);
Dictionary<int, Task<ObjectDetails>> objectTaskDict
= new Dictionary<int, Task<ObjectDetails>>(incompleteObjects.Count);
var orderedIncompleteObjects = incompleteObjects.OrderBy(pair => pair.Key);
for (int i = 0; i < 1; i++)
{
var batchOfObjects = orderedIncompleteObjects.Skip(i * batchSize)
.Take(batchSize);
var batchObjectsTaskList = batchOfObjects.Select(
pair => GetObjectDetailsAsync(baseUrl, pair.Key));
Task.WaitAll(batchObjectsTaskList.ToArray());
foreach (var objTask in batchObjectsTaskList)
objectTaskDict.Add(objTask.Result.id, objTask);
}
return objectTaskDict;
}
public void GetObjectsVersion1()
{
string baseUrl = #"https://mywebserver.com:/api";
// GetIncompleteObjects is not shown, but it is not relevant to
// the question
Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
Dictionary<int, Task<ObjectDetails>> objectTaskDict
= GetAllObjectDetailsInBatches(baseUrl, incompleteObjects);
foreach (KeyValuePair<int, MyObject> pair in incompleteObjects)
{
ObjectDetails objectDetails = objectTaskDict[pair.Key].Result
.objectDetails;
// Code here that copies fields from objectDetails to pair.Value
// (the incompleteObject)
AllObjects.Add(pair.Value);
};
}
public void GetObjectsVersion2()
{
string baseUrl = #"https://mywebserver.com:/api";
// GetIncompleteObjects is not shown, but it is not relevant to
// the question
Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
Parallel.ForEach(incompleteHosts, pair =>
{
ObjectDetails objectDetails = GetObjectDetailsAsync(
baseUrl, pair.Key).Result.objectDetails;
// Code here that copies fields from objectDetails to pair.Value
// (the incompleteObject)
AllObjects.Add(pair.Value);
});
}
A possible reason why Parallel.ForEach may run faster is because it creates the side-effect of throttling. Initially x threads are processing the first x elements (where x in the number of the available cores), and progressively more threads may be added depending on internal heuristics. Throttling IO operations is a good thing because it protects the network and the server that handles the requests from becoming overburdened. Your alternative improvised method of throttling, by making requests in batches of 100, is far from ideal for many reasons, one of them being that 100 concurrent requests are a lot of requests! Another one is that a single long running operation may delay the completion of the batch until long after the completion of the other 99 operations.
Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netframework-4.8
Basically the parralel foreach allows iterations to run in parallel so you are not constraining the iteration to run in serial, on a host that is not thread constrained this will tend to lead to improved throughput
In short:
Parallel.Foreach() is most useful for CPU bound tasks.
Task.WaitAll() is more useful for IO bound tasks.
So in your case, you are getting information from webservers, which is IO. If the async methods are implemented correctly, it won't block any thread. (It will use IO Completion ports to wait on) This way the threads can do other stuff.
By running the async methods GetObjectDetailsAsync(baseUrl, pair.Key).Result synchroniced, it will block a thread. So the threadpool will be flood by waiting threads.
So I think the Task solution will have a better fit.
I was noticing an initial slow down in my process and upon taking multiple hangdumps, I was able to isolate the issue and reproduce the scenario using the following code. I am using a library that has locks and what not, which eventually calls the user side implementation of certain methods. These methods make async calls using httpclient. These async calls are made from within these locks inside the library.
Now, my theory as to what is happening (do correct me if I am wrong):
The tasks that get spun try to acquire the lock and hold on to the threads fast enough such that the first PingAsync method needs to wait for the default task scheduler to spin up a new thread for it to run on, which is 0.5 s based on the default .net scheduling algorithm. This is why I think I notice delays for total tasks greater than 32, which also increases linearly with increasing total tasks count.
The workaround:
Increase the minthreads count, which I think is treating the symptom and not the actual problem.
Another way is to have a limited concurrency to control the number of tasks fired. But these are tasks spun by a webserver for incoming httprequests and typically we will not have control over it (or will we?)
I understand that combining asyc and non-async is bad design and using sempahores' async calls would be a better way to go. Assuming I do not have control over this library, how does one go about mitigating this problem?
const int ParallelCount = 16;
const int TotalTasks = 33;
static object _lockObj = new object();
static HttpClient _httpClient = new HttpClient();
static int count = 0;
static void Main(string[] args)
{
ThreadPool.GetMinThreads(out int workerThreads, out int ioThreads);
Console.WriteLine($"Min threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
ThreadPool.GetMaxThreads(out workerThreads, out ioThreads);
Console.WriteLine($"Max threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
//var done = ThreadPool.SetMaxThreads(1024, 1000);
//ThreadPool.GetMaxThreads(out workerThreads, out ioThreads);
//Console.WriteLine($"Set Max Threads success? {done}.");
//Console.WriteLine($"Max threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
//var done = ThreadPool.SetMinThreads(1024, 1000);
//ThreadPool.GetMinThreads(out workerThreads, out ioThreads);
//Console.WriteLine($"Set Min Threads success? {done}.");
//Console.WriteLine($"Min threads count. Worker: {workerThreads}. IoThreads: {ioThreads}");
var startTime = DateTime.UtcNow;
var tasks = new List<Task>();
for (int i = 0; i < TotalTasks; i++)
{
tasks.Add(Task.Run(() => LibraryMethod()));
//while (tasks.Count > ParallelCount)
//{
// var task = Task.WhenAny(tasks.ToArray()).GetAwaiter().GetResult();
// if (task.IsFaulted)
// {
// throw task.Exception;
// }
// tasks.Remove(task);
//}
}
Task.WaitAll(tasks.ToArray());
//while (tasks.Count > 0)
//{
// var task = Task.WhenAny(tasks.ToArray()).GetAwaiter().GetResult();
// if (task.IsFaulted)
// {
// throw task.Exception;
// }
// tasks.Remove(task);
// Console.Write(".");
//}
Console.Write($"\nDone in {(DateTime.UtcNow-startTime).TotalMilliseconds}");
Console.ReadLine();
}
Assuming this is the part where library methods are called,
public static void LibraryMethod()
{
lock (_lockObj)
{
SimpleNonAsync();
}
}
Eventually, the user implementation of this method gets called which is async.
public static void SimpleNonAsync()
{
//PingAsync().Result;
//PingAsync().ConfigureAwaiter(false).Wait();
PingAsync().Wait();
}
private static async Task PingAsync()
{
Console.Write($"{Interlocked.Increment(ref count)}.");
await _httpClient.SendAsync(new HttpRequestMessage
{
RequestUri = new Uri($#"http://127.0.0.1"),
Method = HttpMethod.Get
});
}
These async calls are made from within these locks inside the library.
This is a design flaw. No one should call arbitrary code while under a lock.
That said, the locks have nothing to do with the problem you're seeing.
I understand that combining asyc and non-async is bad design and using sempahores' async calls would be a better way to go. Assuming I do not have control over this library, how does one go about mitigating this problem?
The problem is that the library is forcing your code to be synchronous. This means one thread is being blocked for every download; there's no way around that as long as the library's callbacks are synchronous.
Increase the minthreads count, which I think is treating the symptom and not the actual problem.
If you can't modify the library, then you must use one thread per request, and this becomes a viable workaround. You have to treat the symptom because you can't fix the problem (i.e., the library).
Another way is to have a limited concurrency to control the number of tasks fired. But these are tasks spun by a webserver for incoming httprequests and typically we will not have control over it (or will we?)
No; the tasks causing problems are the ones you're spinning up yourself using Task.Run. The tasks on the server are completely independent; your code can't influence or even detect them.
If you want higher concurrency without waiting for thread injection, then you'll need to increase min threads, and you'll also probably need to increase ServicePointManager.DefaultConnectionLimit. You can then continue to use Task.Run, or (as I would prefer) Parallel or Parallel LINQ to do parallel processing. One nice aspect of Parallel / Parallel LINQ is that it has built-in support for throttling, if that is also desired.
I create dynamic threads in C# and I need to get the status of those running threads.
List<string>[] list;
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
}
for (int i = 0; i < list[0].Count; i++)
{
// here how can i get list of running thread here.
}
How can you get list of running threads?
On Threads
I would avoid explicitly creating threads on your own.
It is much more preferable to use the ThreadPool.QueueUserWorkItem or if you do can use .Net 4.0 you get the much more powerful Task parallel library which also allows you to use a ThreadPool threads in a much more powerful way (Task.Factory.StartNew is worth a look)
What if we choose to go by the approach of explicitly creating threads?
Let's suppose that your list[0].Count returns 1000 items. Let's also assume that you are performing this on a high-end (at the time of this writing) 16core machine. The immediate effect is that we have 1000 threads competing for these limited resources (the 16 cores).
The larger the number of tasks and the longer each of them runs, the more time will be spent in context switching. In addition, creating threads is expensive, this overhead creating each thread explicitly could be avoided if an approach of reusing existing threads is used.
So while the initial intent of multithreading may be to increase speed, as we can see it can have quite the opposite effect.
How do we overcome 'over'-threading?
This is where the ThreadPool comes into play.
A thread pool is a collection of threads that can be used to perform a number of tasks in the background.
How do they work:
Once a thread in the pool completes its task, it is returned to a queue of waiting threads, where it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are placed in queue until they can be serviced as threads become available.
So we can see that by using a thread pool threads we are more efficient both
in terms of maximizing the actual work getting done. Since we are not over saturating the processors with threads, less time is spent switching between threads and more time actually executing the code that a thread is supposed to do.
Faster thread startup: Each threadpool thread is readily available as opposed to waiting until a new thread gets constructed.
in terms of minimising memory consumption, the threadpool will limit the number of threads to the threadpool size enqueuing any requests that are beyond the threadpool size limit. (see ThreadPool.GetMaxThreads). The primary reason behind this design choice, is of course so that we don't over-saturate the limited number of cores with too many thread requests keeping context switching to lower levels.
Too much Theory, let's put all this theory to the test!
Right, it's nice to know all this in theory, but let's put it to practice and see what
the numbers tell us, with a simplified crude version of the application that can give us a coarse indication of the difference in orders of magnitude. We will do a comparison between new Thread, ThreadPool and Task Parallel Library (TPL)
new Thread
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Thread thread = new Thread(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
thread.Name = "SID" + iCopy; // you can also use i here.
thread.Start();
}
Console.ReadKey();
Console.WriteLine(GC.GetTotalMemory(false) - initialMemoryFootPrint);
Console.ReadKey();
}
Result:
ThreadPool.EnqueueUserWorkItem
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
ThreadPool.QueueUserWorkItem((w) =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0)
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result:
Task Parallel Library (TPL)
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Task.Factory.StartNew(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Result:
So we can see that:
+--------+------------+------------+--------+
| | new Thread | ThreadPool | TPL |
+--------+------------+------------+--------+
| Time | 6749 | 228ms | 222ms |
| Memory | ≈300kb | ≈103kb | ≈123kb |
+--------+------------+------------+--------+
The above falls nicely inline to what we anticipated in theory. High memory for new Thread as well as slower overall performance when compared to ThreadPool. ThreadPool and TPL have equivalent performance with TPL having a slightly higher memory footprint than a pure thread pool but it's probably a price worth paying given the added flexibility Tasks provide (such as cancellation, waiting for completion querying status of task)
At this point, we have proven that using ThreadPool threads is the preferable option in terms of speed and memory.
Still, we have not answered your question. How to track the state of the threads running.
To answer your question
Given the insights we have gathered, this is how I would approach it:
List<string>[] list = listdbConnect.Select()
int itemCount = list[0].Count;
Task[] tasks = new Task[itemCount];
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
tasks[i] = Task.Factory.StartNew(() =>
{
// NOTE: Do not use i in here as it is not thread safe to do so!
sendMessage(list[0]['1']);
//calling callback function
});
}
// if required you can wait for all tasks to complete
Task.WaitAll(tasks);
// or for any task you can check its state with properties such as:
tasks[1].IsCanceled
tasks[1].IsCompleted
tasks[1].IsFaulted
tasks[1].Status
As a final note, you can not use the variable i in your Thread.Start, since it would create a closure over a changing variable which would effectively be shared amongst all Threads. To get around this (assuming you need to access i), simply create a copy of the variable and pass the copy in, this would make one closure per thread which would make it thread safe.
Good luck!
Use Process.Threads:
var currentProcess = Process.GetCurrentProcess();
var threads = currentProcess.Threads;
Note: any threads owned by the current process will show up here, including those not explicitly created by you.
If you only want the threads that you created, well, why don't you just keep track of them when you create them?
Create a List<Thread> and store each new thread in your first for loop in it.
List<string>[] list;
List<Thread> threads = new List<Thread>();
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
threads.add(th)
}
for (int i = 0; i < list[0].Count; i++)
{
threads[i].DoStuff()
}
However if you don't need i you can make the second loop a foreach instead of a for
As a side note, if your sendMessage function does not take very long to execute you should somthing lighter weight then a full Thread, use a ThreadPool.QueueUserWorkItem or if it is available to you, a Task
Process.GetCurrentProcess().Threads
This gives you a list of all threads running in the current process, but beware that there are threads other than those you started yourself.
Use Process.Threads to iterate through your threads.
Given this code:
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
DoSomething(someString);
});
Will all 1000 threads spawn almost simultaneously?
No, it won't start 1000 threads - yes, it will limit how many threads are used. Parallel Extensions uses an appropriate number of cores, based on how many you physically have and how many are already busy. It allocates work for each core and then uses a technique called work stealing to let each thread process its own queue efficiently and only need to do any expensive cross-thread access when it really needs to.
Have a look at the PFX Team Blog for loads of information about how it allocates work and all kinds of other topics.
Note that in some cases you can specify the degree of parallelism you want, too.
On a single core machine... Parallel.ForEach partitions (chunks) of the collection it's working on between a number of threads, but that number is calculated based on an algorithm that takes into account and appears to continually monitor the work done by the threads it's allocating to the ForEach. So if the body part of the ForEach calls out to long running IO-bound/blocking functions which would leave the thread waiting around, the algorithm will spawn up more threads and repartition the collection between them. If the threads complete quickly and don't block on IO threads for example, such as simply calculating some numbers, the algorithm will ramp up (or indeed down) the number of threads to a point where the algorithm considers optimum for throughput (average completion time of each iteration).
Basically the thread pool behind all the various Parallel library functions, will work out an optimum number of threads to use. The number of physical processor cores forms only part of the equation. There is NOT a simple one to one relationship between the number of cores and the number of threads spawned.
I don't find the documentation around the cancellation and handling of synchronizing threads very helpful. Hopefully MS can supply better examples in MSDN.
Don't forget, the body code must be written to run on multiple threads, along with all the usual thread safety considerations, the framework does not abstract that factor... yet.
Great question. In your example, the level of parallelization is pretty low even on a quad core processor, but with some waiting the level of parallelization can get quite high.
// Max concurrency: 5
[Test]
public void Memory_Operations()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
monitor.Add(monitor.Count);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
Now look what happens when a waiting operation is added to simulate an HTTP request.
// Max concurrency: 34
[Test]
public void Waiting_Operations()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
monitor.Add(monitor.Count);
System.Threading.Thread.Sleep(1000);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
I haven't made any changes yet and the level of concurrency/parallelization has jumped drammatically. Concurrency can have its limit increased with ParallelOptions.MaxDegreeOfParallelism.
// Max concurrency: 43
[Test]
public void Test()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
Parallel.ForEach<string>(arrayStrings, options, someString =>
{
monitor.Add(monitor.Count);
System.Threading.Thread.Sleep(1000);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
// Max concurrency: 391
[Test]
public void Test()
{
ConcurrentBag<int> monitor = new ConcurrentBag<int>();
ConcurrentBag<int> monitorOut = new ConcurrentBag<int>();
var arrayStrings = new string[1000];
var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
Parallel.ForEach<string>(arrayStrings, options, someString =>
{
monitor.Add(monitor.Count);
System.Threading.Thread.Sleep(100000);
monitor.TryTake(out int result);
monitorOut.Add(result);
});
Console.WriteLine("Max concurrency: " + monitorOut.OrderByDescending(x => x).First());
}
I reccommend setting ParallelOptions.MaxDegreeOfParallelism. It will not necessarily increase the number of threads in use, but it will ensure you only start a sane number of threads, which seems to be your concern.
Lastly to answer your question, no you will not get all threads to start at once. Use Parallel.Invoke if you are looking to invoke in parallel perfectly e.g. testing race conditions.
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623363344
// 636462943623368346
// 636462943623368346
// 636462943623373351
// 636462943623393364
// 636462943623393364
[Test]
public void Test()
{
ConcurrentBag<string> monitor = new ConcurrentBag<string>();
ConcurrentBag<string> monitorOut = new ConcurrentBag<string>();
var arrayStrings = new string[1000];
var options = new ParallelOptions {MaxDegreeOfParallelism = int.MaxValue};
Parallel.ForEach<string>(arrayStrings, options, someString =>
{
monitor.Add(DateTime.UtcNow.Ticks.ToString());
monitor.TryTake(out string result);
monitorOut.Add(result);
});
var startTimes = monitorOut.OrderBy(x => x.ToString()).ToList();
Console.WriteLine(string.Join(Environment.NewLine, startTimes.Take(10)));
}
It works out an optimal number of threads based on the number of processors/cores. They will not all spawn at once.
See Does Parallel.For use one Task per iteration? for an idea of a "mental model" to use. However the author does state that "At the end of the day, it’s important to remember that implementation details may change at any time."
Does Parallel.ForEach limit the number of active threads?
If you don't configure the Parallel.ForEach with a positive MaxDegreeOfParallelism, the answer is no. With the default configuration, and assuming a source sequence with sufficiently large size, the Parallel.ForEach will use all the threads that are immediately available in the ThreadPool, and will continuously ask for more. By itself the Parallel.ForEach imposes zero restrictions to the number of threads. It is only limited by the capabilities of the associated TaskScheduler.
The default ParallelOptions.MaxDegreeOfParallelism is -1, which means unlimited parallelism.
The default ParallelOptions.TaskScheduler is the ThreadPoolTaskScheduler, better known as TaskScheduler.Default.
So if you want to understand the behavior of an unconfigured Parallel.ForEach, you have to know the behavior of the ThreadPool. Which is simple enough so that it can be described in a single paragraph. The ThreadPool satisfies immediately all requests for work, by starting new threads, up to a soft limit which be default is Environment.ProcessorCount. Upon reaching this limit, further requests are queued, and new threads are created with a rate of one new thread per second to satisfy the demand¹. There is also a hard limit on the number of threads, which in my machine is 32,767 threads. The soft limit is configurable with the ThreadPool.SetMinThreads method. The ThreadPool also retires threads, in case there are too many and there is no queued work, at about the same rate (1 per second).
Below is an experimental demonstration that the Parallel.ForEach uses all the available threads in the ThreadPool. The number of available threads
is configured up front with the ThreadPool.SetMinThreads, and then the Parallel.ForEach kicks in and takes all of them:
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 10);
HashSet<Thread> threads = new();
int concurrency = 0;
int maxConcurrency = 0;
Parallel.ForEach(Enumerable.Range(1, 1500), n =>
{
lock (threads) maxConcurrency = Math.Max(maxConcurrency, ++concurrency);
lock (threads) threads.Add(Thread.CurrentThread);
// Simulate a CPU-bound operation that takes 200 msec
Stopwatch stopwatch = Stopwatch.StartNew();
while (stopwatch.ElapsedMilliseconds < 200) { }
lock (threads) --concurrency;
});
Console.WriteLine($"Number of unique threads: {threads.Count}");
Console.WriteLine($"Maximum concurrency: {maxConcurrency}");
Output (after waiting for ~5 seconds):
Number of unique threads: 102
Maximum concurrency: 102
Online demo.
The number of completionPortThreads is irrelevant for this test. The Parallel.ForEach uses the threads designated as "workerThreads". The 102 threads are:
The 100 ThreadPool threads that were created immediately on demand.
One more ThreadPool thread that was injected after 1 sec delay.
The main thread of the console application.
¹ The injection rate of the ThreadPool is not documented. As of .NET 7 it is one thread per second, but this could change in future .NET versions.