Most efficient way to execute several threads - c#

You may skip this part:
I am creating an application where the client needs to find the server on the same network.
The server:
public static void StartListening(Int32 port)
{
TcpListener server = new TcpListener(IP.GetCurrentIP(), port);
server.Start();
Thread t = new Thread(new ThreadStart(() =>
{
while (true)
{
// wait for connection
TcpClient client = server.AcceptTcpClient();
if (stopListening)
{
break;
}
}
}));
t.IsBackground = true;
t.Start();
}
Let's say the server is listening on port 12345
then the client:
get the current ip address of the client let's say it is 192.168.5.88
create a list of all possible IP addresses. The server's IP address will probably be related to the client's IP if they are on the same local network, so I construct the list as:
192.168.5.0
192.168.5.1
192.168.5.2
192.168.5.3
.....etc
.....
192.168.0.88
192.168.1.88
192.168.2.88
192.168.3.88
...etc
192.0.5.88
192.1.5.88
192.2.5.88
192.3.5.88
192.4.5.88
..... etc
0.168.5.88
1.168.5.88
2.168.5.88
3.168.5.88
4.168.5.88
.... etc
Then I try to connect with every possible ip and port 12345. If one connection is successful then that means that I found the address of the server.
Now my question is:
Now I have done this in two ways. I know just the basics about threads and I don't know if this is dangerous but it works really fast.
// first way
foreach (var ip in ListOfIps)
{
new Thread(new ThreadStart(() =>
{
TryConnect(ip);
})).Start();
}
The second way, which I believe is safer but takes much more time:
// second way
foreach (var ip in ListOfIps)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(TryConnect), ip);
}
I have to call the TryConnect method about 1000 times and each time it takes about 2 seconds (I set the connection timeout to 2 seconds). What will be the most efficient and secure way of calling it 1000 times?
EDIT 2
Here are the results using different techniques:
1) Using threadpool
..
..
var now = DateTime.Now;
foreach (var item in allIps)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), item);
}
ThreadPool.QueueUserWorkItem(new WaitCallback(PrintTimeDifference), now);
}
static void PrintTimeDifference(object startTime)
{
Console.WriteLine("------------------Done!----------------------");
var s = (DateTime)startTime;
Console.WriteLine((DateTime.Now-s).Seconds);
}
It took 37 seconds to complete
2) Using threads:
..
..
var now = DateTime.Now;
foreach (var item in allIps)
{
new Thread(new ThreadStart(() =>
{
DoWork(item);
})).Start();
}
ThreadPool.QueueUserWorkItem(new WaitCallback(PrintTimeDifference), now);
It took 12 seconds to complete
3) Using tasks:
..
..
var now = DateTime.Now;
foreach (var item in allIps)
{
var t = Task.Factory.StartNew(() =>
DoWork(item)
);
}
ThreadPool.QueueUserWorkItem(new WaitCallback(PrintTimeDifference), now);
}
static void PrintTimeDifference(object startTime)
{
Console.WriteLine("------------------Done!----------------------");
var s = (DateTime)startTime;
Console.WriteLine((DateTime.Now-s).Seconds);
}
It took 8 seconds!!

In this case I would prefer the solution with the ThreadPool-Threads, because creating 1000 Threads is a heavy operation (when you think of the memory each thread gets).
But since .NET 4 there is another solution with the class Task.
Tasks are workloads which can be executed in parallel. You can define and run them like this:
var t = Task.Factory.StartNew(() => DoAction());
You don't have to care about the number of threads used because the runtime environment handles that. So if you have the possibility to split your workload into smaller packages which can be executed in parallel I would use Tasks to do the work.

Both methods run the risk of creating way too many Threads.
A thread is expensive in the time it takes to be created and in memory consumption.
It does look like your 2nd approach, using the ThreadPool, should work better. Because of the long timeout (2 sec) it will still create many threads, but far fewer than 1000.
The better approach (requires .NET Framework 4) would be to use Parallel.ForEach(...). But that too may require some tuning.
And a really good solution would use a broadcast (UDP) protocol to discover services.
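As a rough illustration of the kind of tuning Parallel.ForEach may need, here is a minimal sketch that caps the number of probes in flight with ParallelOptions.MaxDegreeOfParallelism. The names BoundedScan, FindResponders and the probe delegate are hypothetical and stand in for the question's TryConnect; the cap is only an upper bound, and the thread pool may still take time to ramp up to it when the work blocks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedScan
{
    // Runs `probe` for every candidate with at most `maxParallel` probes in flight
    // and returns the candidates for which the probe returned true.
    // `probe` is a placeholder for a blocking connect attempt (assumption).
    public static List<string> FindResponders(
        IEnumerable<string> candidates, Func<string, bool> probe, int maxParallel)
    {
        var found = new ConcurrentBag<string>(); // safe for concurrent Add
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(candidates, options, ip =>
        {
            if (probe(ip))
            {
                found.Add(ip);
            }
        });

        return new List<string>(found);
    }
}
```

A call such as `BoundedScan.FindResponders(ListOfIps, TryConnectReturnsBool, 50)` would then try every address with no more than 50 attempts running at once, assuming a bool-returning variant of TryConnect exists.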

Now I made my own benchmark.
Here is the code:
class Program {
private static long parallelIterations = 100;
private static long taskIterations = 100000000;
static void Main(string[] args) {
Console.WriteLine("Parallel Iterations: {0:n0}", parallelIterations);
Console.WriteLine("Task Iterations: {0:n0}", taskIterations);
Analyse("Simple Threads", ExecuteWorkWithSimpleThreads);
Analyse("ThreadPool Threads", ExecuteWorkWithThreadPoolThreads);
Analyse("Tasks", ExecuteWorkWithTasks);
Analyse("Parallel For", ExecuteWorkWithParallelFor);
Analyse("Async Delegates", ExecuteWorkWithAsyncDelegates);
}
private static void Analyse(string name, Action action) {
Stopwatch watch = new Stopwatch();
watch.Start();
action();
watch.Stop();
Console.WriteLine("{0}: {1} seconds", name.PadRight(20), watch.Elapsed.TotalSeconds);
}
private static void ExecuteWorkWithSimpleThreads() {
Thread[] threads = new Thread[parallelIterations];
for (long i = 0; i < parallelIterations; i++) {
threads[i] = new Thread(DoWork);
threads[i].Start();
}
for (long i = 0; i < parallelIterations; i++) {
threads[i].Join();
}
}
private static void ExecuteWorkWithThreadPoolThreads() {
object locker = new object();
EventWaitHandle waitHandle = new ManualResetEvent(false);
int finished = 0;
for (long i = 0; i < parallelIterations; i++) {
ThreadPool.QueueUserWorkItem((threadContext) => {
DoWork();
lock (locker) {
finished++;
if (finished == parallelIterations)
waitHandle.Set();
}
});
}
waitHandle.WaitOne();
}
private static void ExecuteWorkWithTasks() {
Task[] tasks = new Task[parallelIterations];
for (long i = 0; i < parallelIterations; i++) {
tasks[i] = Task.Factory.StartNew(DoWork);
}
Task.WaitAll(tasks);
}
private static void ExecuteWorkWithParallelFor() {
Parallel.For(0, parallelIterations, (n) => DoWork());
}
private static void ExecuteWorkWithAsyncDelegates() {
Action[] actions = new Action[parallelIterations];
IAsyncResult[] results = new IAsyncResult[parallelIterations];
for (long i = 0; i < parallelIterations; i++) {
actions[i] = DoWork;
results[i] = actions[i].BeginInvoke((result) => { }, null);
}
for (long i = 0; i < parallelIterations; i++) {
results[i].AsyncWaitHandle.WaitOne();
results[i].AsyncWaitHandle.Close();
}
}
private static void DoWork() {
//Thread.Sleep(TimeSpan.FromMilliseconds(taskDuration));
for (long i = 0; i < taskIterations; i++ ) { }
}
}
Here is the result with different settings:
Parallel Iterations: 100.000
Task Iterations: 100
Simple Threads : 13,4589412 seconds
ThreadPool Threads : 0,0682997 seconds
Tasks : 0,1327014 seconds
Parallel For : 0,0066053 seconds
Async Delegates : 2,3844015 seconds
Parallel Iterations: 100
Task Iterations: 100.000.000
Simple Threads : 5,6415113 seconds
ThreadPool Threads : 5,5798242 seconds
Tasks : 5,6261562 seconds
Parallel For : 5,8721274 seconds
Async Delegates : 5,6041608 seconds
As you can see, simple threads are not efficient when there are too many of them.
But with only a few of them they are very efficient, because there is little overhead (e.g. synchronization).

Well there are pros and cons to this approach:
Using an individual thread per connection will (in theory) let you make all connections in parallel, since this is a blocking I/O operation all threads will be suspended until the respective connection succeeds. However, creating 1000 threads is a bit of an overkill on the system.
Using the thread pool gives you the benefit of reusing threads, but only a limited number of connection tasks can be active at one time. For example, if the thread pool has 4 threads, then 4 connections will be attempted, then another 4, and so on. This is light on resources but may take too long because, as you said, a single connection needs about 2 seconds.
So I would advise a trade-off: create a thread-pool with about 50 threads (using the SetMaxThreads method) and queue all the connections. That way, it will be lighter on resources than 1000 threads, and still process connections reasonably fast.
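The trade-off above can be put into numbers. The following is a back-of-the-envelope sketch, assuming the worst case where every attempt blocks for the full 2-second timeout; the class and method names are made up for illustration.

```csharp
using System;

public static class ScanEstimate
{
    // Worst case: every attempt blocks for the full timeout, so the work
    // proceeds in ceil(attempts / concurrency) rounds of `timeoutSeconds` each.
    public static double WorstCaseSeconds(int attempts, double timeoutSeconds, int concurrency)
    {
        int rounds = (attempts + concurrency - 1) / concurrency; // integer ceiling
        return rounds * timeoutSeconds;
    }
}
```

With the question's figures, `WorstCaseSeconds(1000, 2, 4)` gives 500 seconds, `WorstCaseSeconds(1000, 2, 50)` gives 40 seconds, and `WorstCaseSeconds(1000, 2, 1000)` gives 2 seconds, which is why a pool of about 50 sits between the two extremes.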

Related

I am trying to call a method in a loop. It should be called only 20 times in 10 seconds. I am using a semaphore, as in the code below.

With the code below, first, some of the calls are not made: say, out of 250 calls, 238 are made and the rest are not. Second, I am not sure whether the calls are made at the rate of 20 calls per 10 seconds.
public List<ShowData> GetAllShowAndTheirCast()
{
ShowResponse allShows = GetAllShows();
ShowCasts showCast = new ShowCasts();
showCast.showCastList = new List<ShowData>();
using (Semaphore pool = new Semaphore(20, 20))
{
for (int i = 0; i < allShows.Shows.Length; i++)
{
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((taskId) =>
{
showCast.showCastList.Add(MapResponse(allShows.Shows[i]));
}));
pool.Release();
t.Start(i);
}
}
//for (int i = 0; i < allShows.Shows.Length; i++)
//{
// showCast.showCastList.Add(MapResponse(allShows.Shows[i]));
//}
return showCast.showCastList;
}
public ShowData MapResponse(Show s)
{
CastResponse castres = new CastResponse();
castres.CastlistResponse = (GetShowCast(s.id)).CastlistResponse;
ShowData sd = new ShowData();
sd.id = s.id;
sd.name = s.name;
if (castres.CastlistResponse != null && castres.CastlistResponse.Any())
{
sd.cast = new List<CastData>();
foreach (var item in castres.CastlistResponse)
{
CastData cd = new CastData();
cd.birthday = item.person.birthday;
cd.id = item.person.id;
cd.name = item.person.name;
sd.cast.Add(cd);
}
}
return sd;
}
public ShowResponse GetAllShows()
{
ShowResponse response = new ShowResponse();
string showUrl = ClientAPIUtils.apiUrl + "shows";
response.Shows = JsonConvert.DeserializeObject<Show[]>(ClientAPIUtils.GetDataFromUrl(showUrl));
return response;
}
public CastResponse GetShowCast(int showid)
{
CastResponse res = new CastResponse();
string castUrl = ClientAPIUtils.apiUrl + "shows/" + showid + "/cast";
res.CastlistResponse = JsonConvert.DeserializeObject<List<Cast>>(ClientAPIUtils.GetDataFromUrl(castUrl));
return res;
}
All the calls should be made, but I am not sure where they are getting aborted. Please also let me know how to check the rate at which the calls are being made.
I'm assuming that your goal is to process the data for all shows, but no more than 20 at once.
For that kind of task you should probably use the ThreadPool and limit the maximum number of concurrent threads using SetMaxThreads.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool?view=netframework-4.7.2
You have to make sure that collection that you are using to store your results is thread-safe.
showCast.showCastList = new List<ShowData>();
The standard List<T> is not thread-safe for concurrent writes. ConcurrentBag is a thread-safe collection (there are others as well). You can make a standard list thread-safe, but it requires more code. Once you are done processing and need the results in a list or array, you can convert the collection to the desired type.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1?view=netframework-4.7.2
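To make the point about thread-safe collections concrete, here is a small sketch that fills a ConcurrentBag from many threads at once and converts it to a List afterwards. SafeCollect and Collect are illustrative names only, and the squared integers stand in for the real ShowData results.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SafeCollect
{
    // Adds `count` items from many threads at once and returns them as a List.
    public static List<int> Collect(int count)
    {
        var bag = new ConcurrentBag<int>();

        // ConcurrentBag.Add may be called concurrently without extra locking.
        Parallel.For(0, count, i => bag.Add(i * i));

        // Convert once all work is done; the order of items is not guaranteed.
        return bag.ToList();
    }
}
```

Doing the same with a plain List<int> and no lock can lose items or throw, which matches the missing calls described in the question.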
Now to the usage of the semaphore. What your semaphore does is ensure that at most 20 threads can be created at once. Assuming that this loop runs in your app's main thread, the semaphore serves no purpose. To make it work you need to release the semaphore once the thread has completed; but you are calling the thread's Start() after calling Release(). As a result, the thread executes outside the "critical area".
using (Semaphore pool = new Semaphore(20, 20)) {
for (int i = 0; i < allShows.Shows.Length; i++) {
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((taskId) =>
{
showCast.showCastList.Add(MapResponse(allShows.Shows[i]));
pool.Release();
}));
t.Start(i);
}
}
I did not test this solution; additional problems might arise.
Another issue with this program is that it does not wait for all threads to complete. Once all threads have been started, the method returns. It is possible (and in your case I'm sure) that not all threads have completed their work by then; this is why only ~240 results are present when the program finishes.
thread.Join();
But if called right after Start() it will stop main thread until it is completed so to keep program concurrent you need to create a list of threads and Join() them at the end of program. It is not the best solution. How to wait on all threads that program can add to ThreadPool
Wait until all threads finished their work in ThreadPool
As a final note, you cannot access the loop counter like that. The lambda reads the counter when the thread runs, not when the thread is created; in a test I ran, the code tended to process odd records twice and skip even ones. This happens because the loop increments the counter before the previous thread has executed, which can also lead to accessing elements outside the bounds of the array.
A possible solution is to create the thread in a separate method. Passing allShows.Shows[i] as an argument evaluates it to a Show before the next loop pass.
public void CreateAndStartThread(Show show, Semaphore pool, ShowCasts showCast)
{
pool.WaitOne();
Thread t = new Thread(new ParameterizedThreadStart((s) => {
showCast.showCastList.Add(MapResponse((Show)s));
pool.Release();
}));
t.Start(show);
}
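The loop-counter problem can be reproduced without any threads. The sketch below (CaptureDemo is a made-up name) stores lambdas during a for loop and only invokes them afterwards: the first version captures the shared counter, the second copies it into a per-iteration local, which is an alternative to moving the thread creation into a separate method.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Every lambda captures the single shared variable `i`,
    // so all of them see its final value once the loop has ended.
    public static List<int> SharedCounter(int n)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < n; i++)
        {
            funcs.Add(() => i);
        }
        return funcs.ConvertAll(f => f());
    }

    // Each iteration declares a fresh `copy`, so every lambda keeps its own value.
    public static List<int> CopiedCounter(int n)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < n; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs.ConvertAll(f => f());
    }
}
```

`CaptureDemo.SharedCounter(3)` returns 3, 3, 3, while `CaptureDemo.CopiedCounter(3)` returns 0, 1, 2. With real threads the shared version reads whatever value the counter has at the moment each thread happens to run, hence the duplicated and skipped records.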
Concurrent programming is tricky and I would highly recommend to do some exercises with examples on common pitfalls. Books on C# programming are sure to have a chapter or two on the topic. There are plenty of online courses and tutorials on this topic to learn from.
Edit:
Working solution. Still might have some issues.
public ShowCasts GetAllShowAndTheirCast()
{
ShowResponse allShows = GetAllShows();
ConcurrentBag<ShowData> result = new ConcurrentBag<ShowData>();
using (var countdownEvent = new CountdownEvent(allShows.Shows.Length))
{
using (Semaphore pool = new Semaphore(20, 20))
{
for (int i = 0; i < allShows.Shows.Length; i++)
{
CreateAndStartThread(allShows.Shows[i], pool, result, countdownEvent);
}
countdownEvent.Wait();
}
}
return new ShowCasts() { showCastList = result.ToList() };
}
public void CreateAndStartThread(Show show, Semaphore pool, ConcurrentBag<ShowData> result, CountdownEvent countdownEvent)
{
pool.WaitOne();
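Thread t = new Thread(new ParameterizedThreadStart((s) =>
{
try
{
result.Add(MapResponse((Show)s));
}
finally
{
// Always free the slot and signal completion, even if MapResponse throws;
// otherwise countdownEvent.Wait() could block forever.
pool.Release();
countdownEvent.Signal();
}
}));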
t.Start(show);
}

Thread vs Parallel.For performance

I am struggling to understand the difference between threads and Parallel.For. I created two functions, one used Parallel.For other invoked threads. Invoking 10 threads would appear to be faster, can anyone please explain? Would threads use multiple processors available in the system (to get executed in parallel) or does it just do time slicing in reference to CLR?
public static bool ParallelProcess()
{
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(0, 10, x =>
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
});
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10; i++)
{
Thread t = new Thread(new ThreadStart(Thread1));
t.Start();
if (i == 9)
t.Join();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1()
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", 0,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
}
When I called the methods below, Parallel.For took twice as long as the threads.
Algo.ParallelThread(); //took 3 secs
Algo.ParallelProcess(); //took 6 secs
Parallel utilizes however many threads the underlying scheduler provides, which would be the minimum number of threadpool threads to start with.
The number of minimum threadpool threads is by default set to the number of processors. As time goes on and based on many different factors, e.g. all current threads being busy, the scheduler might decide to spawn more threads and go higher than the minimum count.
All of that is managed for you to stop unnecessary resource usage. Your second example circumvents all that by spawning threads manually. If you explicitly set the number of threadpool threads e.g. ThreadPool.SetMinThreads(100, 100), you'll see even the Parallel one takes 3 seconds as it immediately has more threads available to use.
You've got a bunch of things here that are going wrong.
(1) Don't use sw.Elapsed.Seconds. This value is an int and (obviously) truncates the fractional part of the time. Worse, though, if you have a process that takes 61 seconds to complete, it will report 1, since it works like the second hand on a clock. You should instead use sw.Elapsed.TotalSeconds, which is a double and reports the total number of seconds regardless of how many minutes, hours, etc. have passed.
(2) Parallel.For uses the thread-pool. This significantly reduces (or even eliminates) the overhead for creating threads. Each time you call new Thread(() => ...) you are allocating over 1MB of RAM and chewing up valuable resources before any processing can take place.
(3) You're artificially loading up the threads with Thread.Sleep(3000); and this means you are overshadowing the actual time it takes to create threads with a massive sleep.
(4) Parallel.For is, by default, limited by the number of cores on your CPU. So when you run 10 threads the work is being cut in to two steps - meaning that the Thread.Sleep(3000); is being run twice in series, hence the 6 seconds that it's running. The new Thread approach is running all of the threads in one go meaning that it takes just over 3 seconds, but again, the Thread.Sleep(3000); is swamping the thread start up time.
(5) You're also dealing with a CLR JIT issue. The first time you run your code the start-up costs are enormous. Let's change the code to remove the sleeps and to properly join the threads:
public static bool ParallelProcess()
{
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(0, 10, x =>
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x, Thread.CurrentThread.ManagedThreadId));
});
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));
return true;
}
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var threads = Enumerable.Range(0, 10).Select(x => new Thread(new ThreadStart(Thread1))).ToList();
foreach (var thread in threads) thread.Start();
foreach (var thread in threads) thread.Join();
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.TotalMilliseconds));
return true;
}
private static void Thread1()
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", 0, Thread.CurrentThread.ManagedThreadId));
}
Now, to get rid of the CLR/JIT start up time, let's run the code like this:
ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();
ParallelProcess();
ParallelThread();
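The times we get are like this (the values are milliseconds, since the code prints sw.Elapsed.TotalMilliseconds under the "secs" label):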
Time in secs 3.8617
Time in secs 4.7719
Time in secs 0.3633
Time in secs 1.6332
Time in secs 0.3551
Time in secs 1.6148
The starting run times are terrible compared to the second and third runs that are far more consistent.
The result is that running Parallel.For is 4 to 5 times faster than calling new Thread.
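Point (1) about Seconds versus TotalSeconds is easy to check in isolation. This is a small sketch using a fixed TimeSpan instead of a real Stopwatch; ElapsedDemo and Describe are illustrative names.

```csharp
using System;
using System.Globalization;

public static class ElapsedDemo
{
    // Formats both properties side by side so the difference is visible.
    public static string Describe(TimeSpan elapsed)
    {
        // Seconds is only the seconds component (0-59) of the time span;
        // TotalSeconds is the whole span expressed in seconds, as a double.
        return string.Format(CultureInfo.InvariantCulture,
            "Seconds={0}, TotalSeconds={1}", elapsed.Seconds, elapsed.TotalSeconds);
    }
}
```

For example, `ElapsedDemo.Describe(TimeSpan.FromSeconds(61.5))` yields "Seconds=1, TotalSeconds=61.5".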
Your snippets are not equivalent. Here is a version of ParallelThread that would do the same as ParallelProcess but starting new threads:
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var threads = new Thread[10];
for (int i = 0; i < 10; i++)
{
int x = i;
threads[i] = new Thread(() => Thread1(x));
threads[i].Start();
}
for (int i = 0; i < 10; i++)
{
threads[i].Join();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
}
Here, I am making sure to wait for all the threads. I am also making sure to match the console output. These are things the OP's code does not do.
However, the time difference is still there.
Let me tell you what makes the difference, at least in my tests: the order. Run ParallelProcess before ParallelThread and they should both take 3 seconds to complete (ignoring the initial runs, which will take longer because of compilation). I cannot really explain it.
We could modify the above code further to use the ThreadPool, and that also resulted in ParallelProcess completing in 3 seconds (even though I did not modify that version). This is the version of ParallelThread with the ThreadPool that I came up with:
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var events = new ManualResetEvent[10];
for (int i = 0; i < 10; i++)
{
int x = i;
events[x] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem
(
_ =>
{
Thread1(x);
events[x].Set();
}
);
}
for (int i = 0; i < 10; i++)
{
events[i].WaitOne();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(3000);
}
Note: We could use WaitAll on the events, but that would fail on a STAThread.
You have Thread.Sleep(3000) which are the 3 seconds we see. Meaning that we are not really measuring the overhead of any of these methods.
So, I decided to study this further, and to do so I went up one order of magnitude (from 10 to 100) and removed the Console.WriteLine (which introduces synchronization anyway).
This is my code listing:
void Main()
{
ParallelThread();
ParallelProcess();
}
public static bool ParallelProcess()
{
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(0, 100, x =>
{
/*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));*/
Thread.Sleep(3000);
});
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var events = new ManualResetEvent[100];
for (int i = 0; i < 100; i++)
{
int x = i;
events[x] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem
(
_ =>
{
Thread1(x);
events[x].Set();
}
);
}
for (int i = 0; i < 100; i++)
{
events[i].WaitOne();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
/*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));*/
Thread.Sleep(3000);
}
I am getting 6 seconds for ParallelThread and 9 seconds for ParallelProcess. This remains true even after reversing the order. Which makes me much more confident that this is a real measure of the overhead.
Adding ThreadPool.SetMinThreads(100, 100); brings the time back down to 3 seconds for both ParallelThread (remember that this version is using the ThreadPool) and ParallelProcess, meaning that this overhead comes from the thread pool. Now, I can go back to the version that spawns new threads (modified to spawn 100 and with Console.WriteLine commented out):
public static bool ParallelThread()
{
Stopwatch sw = new Stopwatch();
sw.Start();
var threads = new Thread[100];
for (int i = 0; i < 100; i++)
{
int x = i;
threads[i] = new Thread(() => Thread1(x));
threads[i].Start();
}
for (int i = 0; i < 100; i++)
{
threads[i].Join();
}
sw.Stop();
Console.WriteLine(string.Format("Time in secs {0}", sw.Elapsed.Seconds));
return true;
}
private static void Thread1(int x)
{
/*Console.WriteLine(string.Format("Printing {0} thread = {1}", x,
Thread.CurrentThread.ManagedThreadId));*/
Thread.Sleep(3000);
}
I get a consistent 3 seconds from this version (meaning the time overhead is negligible, since, as I said earlier, Thread.Sleep(3000) accounts for 3 seconds). However, I want to note that it leaves more garbage to collect than using the ThreadPool or Parallel.For. On the other hand, Parallel.For remains tied to the ThreadPool. By the way, if you want to degrade its performance, reducing the minimum number of threads is not enough; you have to reduce the maximum number of threads too (e.g. ThreadPool.SetMaxThreads(1, 1);).
All in all, please notice that Parallel.For is easier to use, and harder to get wrong.
Invoking 10 threads would appear to be faster, can anyone please explain?
Spawning threads is fast, although it will lead to more garbage. Also, note that your test is not great.
Would threads use multiple processors available in the system (to get executed in parallel) or does it just do time slicing in reference to CLR?
Yes, they would. They map to the underlying operating system threads, can be preempted by it, and will run on any core according to their affinity (see ProcessThread.ProcessorAffinity). To be clear, they are neither fibers nor coroutines.
Put as simply as possible: using the Thread class guarantees that a thread is created at the operating-system level, whereas with Parallel.For the runtime decides whether new OS-level threads are needed. If it judges that creating a thread is worthwhile, it does so; otherwise it uses the threads already available in the thread pool. The TPL is written to be optimized for multi-core environments.

Using Task Parallel Library to handle frequent URL requests

I am using .Net to build a stock quote updater. Suppose there are X number of stock symbols to be updated during market hours. in order to keep the updating at a pace not exceeding data provider's limit (e.g. Yahoo finance), I will try to limit the number of requests/sec by using a mechanism similar to thread pool. Let's say I want to allow only 5 requests/sec, that corresponds to a pool of 5 threads.
I have heard about the TPL and would like to use it, although I am inexperienced with it. How can I specify the number of threads in the pool that Task uses implicitly? Here is a loop to schedule the requests, where requestFunc(url) is the function that updates quotes. I would like some comments or suggestions from the experts on how to do it properly:
// X is a number much bigger than 5
List<Task> tasks = new List<Task>();
for (int i=0; i<X; i++)
{
Task t = Task.Factory.StartNew(() => { requestFunc(url); }, TaskCreationOptions.None);
t.Wait(100); //slow down 100 ms. I am not sure if this is the right thing to do
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray()); // WaitAll expects a Task[], not a List<Task>
OK, I added an outer loop to make it run continuously. After I made some changes to @steve16351's code, it only loops once. Why?
static void Main(string[] args)
{
LimitedExecutionRateTaskScheduler scheduler = new LimitedExecutionRateTaskScheduler(5);
TaskFactory factory = new TaskFactory(scheduler);
List<string> symbolsToCheck = new List<string>() { "GOOG", "AAPL", "MSFT", "AGIO", "MNK", "SPY", "EBAY", "INTC" };
while (true)
{
List<Task> tasks = new List<Task>();
Console.WriteLine("Starting...");
foreach (string symbol in symbolsToCheck)
{
Task t = factory.StartNew(() => { write(symbol); },
CancellationToken.None, TaskCreationOptions.None, scheduler);
tasks.Add(t);
}
//Task.WhenAll(tasks);
Console.WriteLine("Ending...");
Console.Read();
}
//Console.Read();
}
public static void write (string symbol)
{
DateTime dateValue = DateTime.Now;
//Console.WriteLine("[{0:HH:mm:ss}] Doing {1}..", DateTime.Now, symbol);
Console.WriteLine("Date and Time with Milliseconds: {0} doing {1}..",
dateValue.ToString("MM/dd/yyyy hh:mm:ss.fff tt"), symbol);
}
If you want to have a flow of url requests while limiting to no more than 5 concurrent operations you should use TPL Dataflow's ActionBlock:
var block = new ActionBlock<string>(
url => requestFunc(url),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (var url in urls)
{
block.Post(url);
}
block.Complete();
await block.Completion;
You Post to it the urls and for each of them it would perform the request while making sure there are no more than MaxDegreeOfParallelism requests at a time.
When you are done, you can call Complete to signal the block for completion and await the Completion task to asynchronously wait until the block actually completes.
Do not worry about the amount of threads; just make sure that you are not exceeding the number of requests per sec. Use a single timer to signal a ManualResetEvent every 200 ms and have the tasks wait for that ManualResetEvent inside a loop.
To create a timer and make it signal the ManualResetEvent every 200 ms:
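resetEvent = new ManualResetEvent(false);
// System.Threading.Timer takes (callback, state, dueTime, period): start immediately, then fire every 200 ms.
// Note: a ManualResetEvent stays signaled until Reset() is called.
timer = new Timer(state => resetEvent.Set(), null, 0, 200);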
Make sure you clean up the timer (call Dispose) when you do not need it anymore.
Let the number of threads be determined by the run-time.
This would be a poor implementation if you create a single task per stock because you do not know when a stock will be updated.
So you could just put all the stocks in a list and have a single task update each stock one after another.
By giving another list of stocks to another task you could give that task a higher priority by setting its timer to every 250 ms and the low priority to every 1000 ms. That would add up to 5 times a second and the high priority list would be updated 4 times more often than the low priority.
You could use a custom task scheduler which limits the rate at which tasks can start.
In the below, tasks are queued up, and dequeued with a timer set to the frequency of your maximum allowed rate. So if 5 requests a second, the timer is set to 200ms. On the tick, a task is then dequeued and executed from those that are pending.
EDIT: In addition to the request rate, you can also extend to control the maximum number of executing threads as well.
static void Main(string[] args)
{
TaskFactory factory = new TaskFactory(new LimitedExecutionRateTaskScheduler(5, 5)); // 5 per second, 5 max executing
List<string> symbolsToCheck = new List<string>() { "GOOG", "AAPL", "MSFT" };
for (int i = 0; i < 5; i++)
symbolsToCheck.AddRange(symbolsToCheck);
foreach (string symbol in symbolsToCheck)
{
factory.StartNew(() =>
{
Console.WriteLine("[{0:HH:mm:ss}] [{1}] Doing {2}..", DateTime.Now, Thread.CurrentThread.ManagedThreadId, symbol);
Thread.Sleep(5000);
Console.WriteLine("[{0:HH:mm:ss}] [{1}] {2} is done", DateTime.Now, Thread.CurrentThread.ManagedThreadId, symbol);
});
}
Console.Read();
}
public class LimitedExecutionRateTaskScheduler : TaskScheduler
{
private ConcurrentQueue<Task> _pendingTasks = new ConcurrentQueue<Task>();
private readonly object _taskLocker = new object();
private List<Task> _executingTasks = new List<Task>();
private readonly int _maximumConcurrencyLevel = 5;
private Timer _doWork = null;
public LimitedExecutionRateTaskScheduler(double requestsPerSecond, int maximumDegreeOfParallelism)
{
_maximumConcurrencyLevel = maximumDegreeOfParallelism;
long frequency = (long)(1000.0 / requestsPerSecond);
_doWork = new Timer(ExecuteRequests, null, frequency, frequency);
}
public override int MaximumConcurrencyLevel
{
get
{
return _maximumConcurrencyLevel;
}
}
protected override bool TryDequeue(Task task)
{
return base.TryDequeue(task);
}
protected override void QueueTask(Task task)
{
_pendingTasks.Enqueue(task);
}
private void ExecuteRequests(object state)
{
Task queuedTask = null;
int currentlyExecutingTasks = 0;
lock (_taskLocker)
{
for (int i = 0; i < _executingTasks.Count; i++)
if (_executingTasks[i].IsCompleted)
_executingTasks.RemoveAt(i--);
currentlyExecutingTasks = _executingTasks.Count;
}
if (currentlyExecutingTasks == MaximumConcurrencyLevel)
return;
if (_pendingTasks.TryDequeue(out queuedTask) == false)
return; // no work to do
lock (_taskLocker)
_executingTasks.Add(queuedTask);
base.TryExecuteTask(queuedTask);
}
protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
{
return false; // not properly implemented just to complete the class
}
protected override IEnumerable<Task> GetScheduledTasks()
{
return new List<Task>(); // not properly implemented just to complete the class
}
}
You could use a while loop with a Task.Delay to control when your requests are issued. Using an async void method to make the requests means you don't get blocked by a slow or failing request.
Async void is fire and forget, which some devs don't like, but I think it would work as a possible solution in this case.
I also think Erno de Weerd makes a great suggestion about prioritising calls to more important stocks.
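As a minimal sketch of that idea (not the answerer's actual code), the loop below issues one request per interval with Task.Delay and hands each request to an async void method, so a slow or failing call does not hold up the next one. FetchQuoteAsync, FireAndForget and RunAsync are hypothetical names, and the 50 ms delay is only a stand-in for the real request.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PacedRequests
{
    // Hypothetical stand-in for the real request; replace with the actual call.
    private static async Task FetchQuoteAsync(string symbol)
    {
        await Task.Delay(50);
        Console.WriteLine("[{0:HH:mm:ss.fff}] {1} done", DateTime.Now, symbol);
    }

    // Fire and forget. Exceptions are caught here because an unhandled
    // exception in an async void method can take down the process.
    private static async void FireAndForget(string symbol)
    {
        try
        {
            await FetchQuoteAsync(symbol);
        }
        catch (Exception ex)
        {
            Console.WriteLine("{0} failed: {1}", symbol, ex.Message);
        }
    }

    // Issues totalRequests requests, spaced to roughly requestsPerSecond,
    // cycling through the symbols. Returns the number of requests issued.
    public static async Task<int> RunAsync(
        IReadOnlyList<string> symbols, int requestsPerSecond, int totalRequests)
    {
        TimeSpan gap = TimeSpan.FromMilliseconds(1000.0 / requestsPerSecond);
        int issued = 0;
        while (issued < totalRequests)
        {
            FireAndForget(symbols[issued % symbols.Count]);
            issued++;
            await Task.Delay(gap); // paces the loop without blocking a thread
        }
        return issued;
    }
}
```

Calling PacedRequests.RunAsync(symbolsToCheck, 5, 40).Wait() from Main would issue roughly five requests per second. The pacing is approximate because Task.Delay is only as precise as the system timer.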
Thanks @steve16351! It works like this:
static void Main(string[] args)
{
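// The constructor shown above takes (requestsPerSecond, maximumDegreeOfParallelism); the second value here is an assumption
LimitedExecutionRateTaskScheduler scheduler = new LimitedExecutionRateTaskScheduler(5, 5);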
TaskFactory factory = new TaskFactory(scheduler);
List<string> symbolsToCheck = new List<string>() { "GOOG", "AAPL", "MSFT", "AGIO", "MNK", "SPY", "EBAY", "INTC" };
while (true)
{
List<Task> tasks = new List<Task>();
foreach (string symbol in symbolsToCheck)
{
Task t = factory.StartNew(() =>
{
write(symbol);
}, CancellationToken.None,
TaskCreationOptions.None, scheduler);
tasks.Add(t);
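}
// Wait for this batch to finish before queuing the next one, otherwise the loop queues tasks without bound
Task.WaitAll(tasks.ToArray());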
}
}
public static void write (string symbol)
{
DateTime dateValue = DateTime.Now;
Console.WriteLine("Date and Time with Milliseconds: {0} doing {1}..",
dateValue.ToString("MM/dd/yyyy hh:mm:ss.fff tt"), symbol);
}

How can I use AutoResetEvent to signal the main thread function to start threads again once the first set of worker threads is done processing

Requirement: At any given point in time, only 4 threads should be calling four different functions. As soon as these threads complete, the next available thread should call the same functions.
Current code: This seems to be the worst possible way to achieve something like this. The while (true) loop causes unnecessary CPU spikes, and I could see CPU usage rise to 70% when running the following code.
Question: How can I use AutoResetEvent to signal the main thread's Process() function to start the next 4 threads once the first 4 worker threads are done processing, without wasting CPU cycles? Please suggest.
public class Demo
{
object protect = new object();
private int counter;
public void Process()
{
int maxthread = 4;
while (true)
{
if (counter <= maxthread)
{
counter++;
Thread t = new Thread(new ThreadStart(DoSomething));
t.Start();
}
}
}
private void DoSomething()
{
try
{
Thread.Sleep(50000); //simulate long running process
}
finally
{
lock (protect)
{
counter--;
}
}
}
}
You can use the TPL to achieve what you want in a simpler way. If you run the code below, you'll notice that an entry is written after each task finishes, and the "Finished batch" entry is written only after all four tasks have finished.
This sample uses Task.WaitAll to wait for the completion of all tasks. The code uses an infinite loop for illustration purposes only; you should calculate the hasPendingWork condition based on your requirements so that you only start a new batch of tasks if required.
For example:
private static void Main(string[] args)
{
bool hasPendingWork = true;
do
{
var tasks = InitiateTasks();
Task.WaitAll(tasks);
Console.WriteLine("Finished batch...");
} while (hasPendingWork);
}
private static Task[] InitiateTasks()
{
var tasks = new Task[4];
for (int i = 0; i < tasks.Length; i++)
{
int wait = 1000*i;
tasks[i] = Task.Factory.StartNew(() =>
{
Thread.Sleep(wait);
Console.WriteLine("Finished waiting: {0}", wait);
});
}
return tasks;
}
One other thing: from the textual requirement section of your question, I'm led to believe that a batch of four new threads should only start after all four previous threads have completed. However, the code you posted is not compatible with that requirement, since it starts a new thread immediately after a previous thread terminates. You should clarify exactly what your requirement is.
UPDATE:
If you want to start a thread immediately after one of the four threads terminates, you can still use the TPL instead of starting new threads explicitly, and limit the number of running tasks to four by using a SemaphoreSlim. For example:
private static SemaphoreSlim TaskController = new SemaphoreSlim(4);
private static void Main(string[] args)
{
var random = new Random(570);
while (true)
{
// Blocks thread without wasting CPU
// if the number of resources (4) is exhausted
TaskController.Wait();
// Random is not thread safe, so pick the delay here on the main thread
int delay = random.Next(1000, 3000);
Task.Factory.StartNew(() =>
{
try
{
Console.WriteLine("Started");
Thread.Sleep(delay);
Console.WriteLine("Completed");
}
finally
{
// Releases a resource meaning TaskController.Wait will unblock,
// even if the work above throws
TaskController.Release();
}
});
}
}
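Since the question asked specifically about AutoResetEvent, here is a rough sketch of how the same "at most four at a time" rule could be written with one. It is an illustration under assumptions, not code from either poster. BoundedThreadRunner, Run and WaitForFreeSlot are hypothetical names. The launcher re-checks the counter under a lock after every wake-up because an AutoResetEvent collapses several Set() calls into a single signal.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class BoundedThreadRunner
{
    private readonly object _gate = new object();
    private readonly AutoResetEvent _slotFreed = new AutoResetEvent(false);
    private int _running;
    private int _peak;

    // Runs totalJobs copies of job on dedicated threads, never more than
    // maxThreads at once, and returns the highest concurrency that was reached.
    public int Run(int maxThreads, int totalJobs, Action job)
    {
        var threads = new List<Thread>();
        for (int j = 0; j < totalJobs; j++)
        {
            WaitForFreeSlot(maxThreads);
            var worker = new Thread(() =>
            {
                try
                {
                    job();
                }
                finally
                {
                    lock (_gate) { _running--; }
                    _slotFreed.Set(); // wake the launcher if it is waiting
                }
            });
            threads.Add(worker);
            worker.Start();
        }
        foreach (Thread t in threads)
            t.Join();
        return _peak;
    }

    private void WaitForFreeSlot(int maxThreads)
    {
        while (true)
        {
            lock (_gate)
            {
                if (_running < maxThreads)
                {
                    _running++;
                    if (_running > _peak) _peak = _running;
                    return;
                }
            }
            // Blocks without spinning until some worker calls Set().
            _slotFreed.WaitOne();
        }
    }
}
```

For example, new BoundedThreadRunner().Run(4, 20, DoSomething) would keep four workers busy until all twenty jobs have run. Compared with the SemaphoreSlim version this needs a separate counter and lock, which is why a semaphore is usually the simpler fit for counting slots.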

How do you get list of running threads in C#?

I create dynamic threads in C# and I need to get the status of those running threads.
List<string>[] list;
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
}
for (int i = 0; i < list[0].Count; i++)
{
// here how can i get list of running thread here.
}
How can you get list of running threads?
On Threads
I would avoid explicitly creating threads on your own.
It is preferable to use ThreadPool.QueueUserWorkItem or, if you can use .NET 4.0, the much more powerful Task Parallel Library, which also lets you use ThreadPool threads in a more flexible way (Task.Factory.StartNew is worth a look).
What if we choose to go by the approach of explicitly creating threads?
Let's suppose that your list[0].Count returns 1000 items. Let's also assume that you are running this on a high-end (at the time of this writing) 16-core machine. The immediate effect is that we have 1000 threads competing for these limited resources (the 16 cores).
The larger the number of threads and the longer each of them runs, the more time is spent context switching. In addition, creating threads is expensive; this per-thread creation overhead can be avoided by reusing existing threads.
So while the initial intent of multithreading may be to increase speed, as we can see it can have quite the opposite effect.
How do we overcome 'over'-threading?
This is where the ThreadPool comes into play.
A thread pool is a collection of threads that can be used to perform a number of tasks in the background.
How do they work:
Once a thread in the pool completes its task, it is returned to a queue of waiting threads, where it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are placed in queue until they can be serviced as threads become available.
So we can see that by using thread pool threads we are more efficient in several ways:
In terms of maximizing the actual work getting done: since we are not oversaturating the processors with threads, less time is spent switching between threads and more time is spent executing the code that a thread is supposed to run.
Faster thread startup: each thread pool thread is readily available, as opposed to waiting until a new thread gets constructed.
In terms of minimising memory consumption: the thread pool limits the number of threads to the pool size and queues any requests beyond that limit (see ThreadPool.GetMaxThreads). The primary reason behind this design choice is, of course, to avoid oversaturating the limited number of cores with too many threads, which keeps context switching low.
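The pool limits mentioned above can be read at runtime. The small helper below is only a sketch (PoolLimits and its method names are made up for illustration); it wraps ThreadPool.GetMinThreads and ThreadPool.GetMaxThreads. The actual numbers depend on the runtime version and the machine.

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Upper bound on worker threads the pool may create.
    public static int MaxWorkerThreads()
    {
        int workerThreads;
        int completionPortThreads;
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        return workerThreads;
    }

    // Number of worker threads the pool creates on demand before it
    // starts throttling the creation of additional threads.
    public static int MinWorkerThreads()
    {
        int workerThreads;
        int completionPortThreads;
        ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
        return workerThreads;
    }

    public static string Describe()
    {
        return string.Format("worker threads: min {0}, max {1}",
            MinWorkerThreads(), MaxWorkerThreads());
    }
}
```

Printing PoolLimits.Describe() at startup is a quick way to see how many queued work items can actually run at the same time on a given machine.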
Too much theory, let's put all this to the test!
Right, it's nice to know all this in theory, but let's put it into practice and see what the numbers tell us, with a simplified, crude version of the application that gives a coarse indication of the difference in orders of magnitude. We will compare new Thread, ThreadPool and the Task Parallel Library (TPL).
new Thread
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Thread thread = new Thread(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
thread.Name = "SID" + iCopy; // you can also use i here.
thread.Start();
}
Console.ReadKey();
Console.WriteLine(GC.GetTotalMemory(false) - initialMemoryFootPrint);
Console.ReadKey();
}
ThreadPool.QueueUserWorkItem
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
ThreadPool.QueueUserWorkItem((w) =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0)
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
Task Parallel Library (TPL)
static void Main(string[] args)
{
int itemCount = 1000;
Stopwatch stopwatch = new Stopwatch();
long initialMemoryFootPrint = GC.GetTotalMemory(true);
stopwatch.Start();
for (int i = 0; i < itemCount; i++)
{
int iCopy = i; // You should not use 'i' directly in the thread start as it creates a closure over a changing value which is not thread safe. You should create a copy that will be used for that specific variable.
Task.Factory.StartNew(() =>
{
// lets simulate something that takes a while
int k = 0;
while (true)
{
if (k++ > 100000)
break;
}
if ((iCopy + 1) % 200 == 0) // By the way, what does your sendMessage(list[0]['1']); mean? what is this '1'? if it is i you are not thread safe.
Console.WriteLine(iCopy + " - Time elapsed: (ms)" + stopwatch.ElapsedMilliseconds);
});
}
Console.ReadKey();
Console.WriteLine("Memory usage: " + (GC.GetTotalMemory(false) - initialMemoryFootPrint));
Console.ReadKey();
}
So we can see that:
+--------+------------+------------+--------+
|        | new Thread | ThreadPool | TPL    |
+--------+------------+------------+--------+
| Time   | 6749ms     | 228ms      | 222ms  |
| Memory | ≈300kb     | ≈103kb     | ≈123kb |
+--------+------------+------------+--------+
The above falls nicely in line with what we anticipated in theory: higher memory use for new Thread, as well as slower overall performance when compared to ThreadPool. ThreadPool and TPL have equivalent performance, with TPL having a slightly higher memory footprint than a pure thread pool, but that is probably a price worth paying given the added flexibility Tasks provide (such as cancellation, waiting for completion, and querying the status of a task).
At this point, we have seen that using ThreadPool threads is the preferable option in terms of speed and memory.
Still, we have not answered your question: how to track the state of the running threads.
To answer your question
Given the insights we have gathered, this is how I would approach it:
List<string>[] list = dbConnect.Select();
int itemCount = list[0].Count;
Task[] tasks = new Task[itemCount];
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < itemCount; i++)
{
tasks[i] = Task.Factory.StartNew(() =>
{
// NOTE: Do not use i in here as it is not thread safe to do so!
sendMessage(list[0]['1']);
//calling callback function
});
}
// if required you can wait for all tasks to complete
Task.WaitAll(tasks);
// or for any task you can check its state with properties such as:
bool isCanceled = tasks[1].IsCanceled;
bool isCompleted = tasks[1].IsCompleted;
bool isFaulted = tasks[1].IsFaulted;
TaskStatus status = tasks[1].Status;
As a final note, you should not use the loop variable i inside the thread or task delegate, since that creates a closure over a changing variable that is effectively shared by all threads. To get around this (assuming you need to access i), create a copy of the variable inside the loop and use the copy in the delegate; this gives each thread its own captured value, which makes it thread safe.
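To make the copy-per-iteration advice concrete, here is a small sketch (ClosureCopyDemo and RunWithCopies are hypothetical names). Each task captures its own iCopy, so every task writes to its own slot of the array no matter when it gets scheduled.

```csharp
using System;
using System.Threading.Tasks;

public static class ClosureCopyDemo
{
    // Starts count tasks; task number n stores n * 10 in slot n.
    public static int[] RunWithCopies(int count)
    {
        int[] seen = new int[count];
        Task[] tasks = new Task[count];
        for (int i = 0; i < count; i++)
        {
            int iCopy = i; // fresh variable per iteration, captured by this task only
            tasks[i] = Task.Factory.StartNew(() =>
            {
                seen[iCopy] = iCopy * 10;
            });
        }
        Task.WaitAll(tasks);
        return seen;
    }
}
```

If the lambda captured i directly, a task that runs after the loop has finished would see i == count and index past the end of the array. In C# 5 and later a foreach iteration variable is already scoped per iteration, but the variable of a for loop is still shared across iterations.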
Good luck!
Use Process.Threads:
var currentProcess = Process.GetCurrentProcess();
var threads = currentProcess.Threads;
Note: any threads owned by the current process will show up here, including those not explicitly created by you.
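As a short sketch of what iterating Process.Threads looks like (ProcessThreadLister and Describe are hypothetical names): the collection contains ProcessThread objects, which describe operating system threads. Their Id is the OS thread id, not the ManagedThreadId or Name of a managed Thread, so this list cannot be matched directly to the "SID" names from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ProcessThreadLister
{
    // Returns one line per OS thread currently owned by this process.
    public static List<string> Describe()
    {
        var lines = new List<string>();
        using (Process current = Process.GetCurrentProcess())
        {
            foreach (ProcessThread osThread in current.Threads)
            {
                lines.Add("OS thread id " + osThread.Id);
            }
        }
        return lines;
    }
}
```

The list is a snapshot taken when Threads is read, and it typically includes runtime threads such as the finalizer and thread pool workers.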
If you only want the threads that you created, well, why don't you just keep track of them when you create them?
Create a List<Thread> and store each new thread in your first for loop in it.
List<string>[] list;
List<Thread> threads = new List<Thread>();
list = dbConnect.Select();
for (int i = 0; i < list[0].Count; i++)
{
Thread th = new Thread(() =>{
sendMessage(list[0]['1']);
//calling callback function
});
th.Name = "SID"+i;
th.Start();
threads.Add(th);
}
for (int i = 0; i < threads.Count; i++)
{
// inspect threads[i] here, for example threads[i].IsAlive or threads[i].ThreadState
}
However, if you don't need i, you can make the second loop a foreach instead of a for.
As a side note, if your sendMessage function does not take very long to execute, you should use something lighter weight than a full Thread: use ThreadPool.QueueUserWorkItem or, if it is available to you, a Task.
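Putting the "keep track of them yourself" approach together, a minimal sketch could look like the following. ThreadTracker, StartAll and RunningNames are hypothetical names, and the work delegate stands in for the sendMessage call from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadTracker
{
    // Starts count named threads and returns them so the caller can inspect them later.
    public static List<Thread> StartAll(int count, Action<int> work)
    {
        var threads = new List<Thread>();
        for (int i = 0; i < count; i++)
        {
            int id = i; // per-iteration copy for the closure
            var th = new Thread(() => work(id));
            th.Name = "SID" + id;
            th.IsBackground = true;
            threads.Add(th);
            th.Start();
        }
        return threads;
    }

    // Names of the tracked threads that have started and not yet finished.
    public static List<string> RunningNames(IEnumerable<Thread> threads)
    {
        var names = new List<string>();
        foreach (Thread th in threads)
        {
            if (th.IsAlive)
                names.Add(th.Name);
        }
        return names;
    }
}
```

RunningNames only gives a snapshot: a thread can finish between the IsAlive check and the moment the caller reads the list. Use Join when you need to know for certain that a thread has finished.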
Process.GetCurrentProcess().Threads
This gives you a list of all threads running in the current process, but beware that there are threads other than those you started yourself.
Use Process.Threads to iterate through your threads.
