Run a given number of the same process concurrently - c#

My situation is simple in concept, but I am stuck on the implementation. I am writing a program that needs to execute an external process 1100 times, 4 at a time. The application is a Windows Forms application, and I am using BackgroundWorker to run the tasks asynchronously.
For example, I have a list of 1100 different strings, and I want to run the process once per string, but only 4 at a time, and then move on to the next 4.
Any help would be appreciated.

Consider the following code:
private async void CodeOnUiThread()
{
    //do ui stuff before starting
    await ExecuteProcesses();
    //do ui stuff after completing.
}
private async Task ExecuteProcesses()
{
    await Task.Factory.StartNew(() =>
    {
        List<string> myStrings = GetMyStrings(); //or whatever you need
        Parallel.ForEach(myStrings,
            new ParallelOptions()
            {
                MaxDegreeOfParallelism = 4
            }, (s) =>
            {
                var process = new Process();
                process.StartInfo = new ProcessStartInfo("myProcess.exe", s);
                process.Start();
                process.WaitForExit();
            });
    });
}
This allows a maximum of 4 threads to run simultaneously, so no more than 4 processes execute at the same time.
Update:
You can also use Environment.ProcessorCount to get the number of cores. However the Parallel.ForEach call will handle this correctly by default.
Update 2
Parallel.ForEach blocks the calling thread until it finishes, so the code above now runs it inside a task to keep the UI thread free.
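If tying up four thread-pool threads in WaitForExit is a concern, the same limit can be expressed with a SemaphoreSlim. This is only a sketch of an alternative, not part of the answer above: `Throttle.ForEachAsync` is a made-up helper name, and the `work` delegate stands in for whatever starts `myProcess.exe` and waits for it without blocking.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Hypothetical helper: runs `work` once per item, with at most
    // `maxConcurrency` invocations in flight at any moment.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        var running = new List<Task>();

        foreach (var item in items)
        {
            await gate.WaitAsync();           // waits while maxConcurrency items are in flight
            running.Add(RunOneAsync(item));   // start the item, do not await it here
        }

        await Task.WhenAll(running);          // all slots have been released once this completes

        async Task RunOneAsync(T item)
        {
            try { await work(item); }
            finally { gate.Release(); }       // free the slot even if work throws
        }
    }
}
```

From the form this could be called as `await Throttle.ForEachAsync(myStrings, 4, s => RunProcessAsync(s));`, where `RunProcessAsync` is a method you would have to write yourself (on .NET 5 and later, `Process.WaitForExitAsync` is one way to wait without blocking a thread). Unlike "4, then the next 4", a new item starts as soon as any slot frees up.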

How about this (complete example):
class Program
{
static void Main(string[] args)
{
List<StringContainer> strContainers = new List<StringContainer>();
for (int i = 0; i < 1100; i++)
{
strContainers.Add(new StringContainer() { str = "string" + i });
}
Parallel.ForEach(
strContainers,
new ParallelOptions() { MaxDegreeOfParallelism = 4 },
x => ProcessString(x));
foreach (var item in strContainers)
{
Console.WriteLine(item.str);
}
Console.ReadKey();
}
private static void ProcessString(StringContainer strContainer)
{
strContainer.str += "_processed";
}
}
public class StringContainer
{
public string str;
}

Related

How to make parallel async and await use 100% of the cpu?

I have tried a lot of code samples that claim to use multiple cores with parallel async, but none worked. They always stay on a single core.
Is it possible?
This uses 100% of the CPU:
class Program
{
static void Main(string[] args)
{
Parallel.For(0, 100000, p =>
{
while (true) { }
});
}
}
Using parallel async, I don't get more than 12%:
class Program
{
static void Main(string[] args)
{
Task.Run(async () =>
{
var tasks = Enumerable.Range(0, 1000000).Select(i =>
{
while (true) { }
return Task.FromResult(0);
});
await Task.WhenAll(tasks);
}).GetAwaiter().GetResult();
}
}
This version reaches 100% CPU. Thanks to all who helped rather than mocking or trying to close the question:
class Program
{
static void Main(string[] args)
{
Task.Run(async () =>
{
var tasks = new List<Task>();
for (int i = 0; i < 100; i++)
tasks.Add(Task.Run(() =>
{
while (true) { }
}));
await Task.WhenAll(tasks);
}).GetAwaiter().GetResult();
}
}
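Your code only ever does work on one thread. Select is synchronous, so its lambda, including the while loop, runs inline on the thread that enumerates the query. Task.FromResult merely wraps a value in an already-completed task, and it comes after the loop, so it is never reached. Even if the loop terminated, the 1000000 iterations would run one after another.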
You can change your code to this:
class Program
{
static void Main(string[] args)
{
var tasks = Enumerable.Range(0, 1000000).Select(i =>
{
return Task.Run(() =>
{
while (true) { }
return 0;
});
});
Task.WhenAll(tasks).GetAwaiter().GetResult();
}
}
This uses 100% of the CPU (tested).
When WhenAll asks for the first task, it pulls it from your query, tasks. The query uses the selector to translate the first result of Range (0) into a Task. Since your selector contains an infinite loop, it never returns. The only work that ever happens is waiting for the first task to be generated, which never happens.
Beyond that, your main thread does nothing but ask a single thread pool thread to do all the work. You never start any additional tasks or threads, and so have no opportunity to use multiple cores of your CPU.
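The claim that the Select lambda runs on the enumerating thread can be checked directly. The following is a small sketch (the class and method names are invented for illustration) that records which threads execute the selector, next to a Task.Run variant for contrast:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SelectDemo
{
    // Returns the ids of the threads on which the Select lambda executed.
    public static HashSet<int> ThreadsUsedBySelector(int count)
    {
        var ids = new List<int>();
        var tasks = Enumerable.Range(0, count).Select(i =>
        {
            ids.Add(Environment.CurrentManagedThreadId); // runs inline while WhenAll enumerates
            return Task.FromResult(i);                   // already completed, no thread involved
        });
        Task.WhenAll(tasks).GetAwaiter().GetResult();
        return new HashSet<int>(ids);
    }

    // Same shape, but each item is handed to the thread pool with Task.Run.
    // Returns how many distinct threads did the work (machine dependent).
    public static int DistinctThreadsUsedByTaskRun(int count)
    {
        var ids = new ConcurrentDictionary<int, bool>();
        var tasks = Enumerable.Range(0, count)
            .Select(i => Task.Run(() =>
            {
                ids[Environment.CurrentManagedThreadId] = true;
                Thread.SpinWait(1_000_000);              // a little CPU work
            }))
            .ToArray();
        Task.WaitAll(tasks);
        return ids.Count;
    }
}
```

The first method reports a single thread id, the caller's, however large `count` is. The second typically reports several on a multi-core machine, which is the difference between the 12% and the 100% versions in the question.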

Executing N number of threads in parallel and in a sequential manner

I have an application that splits one large file into 1000+ small parts.
I have to upload at most 16 parts at a time.
I used the Task Parallel Library (TPL) of .NET.
I used Parallel.For to iterate over the parts, assigned one method to be executed for each part, and set MaxDegreeOfParallelism to 16.
Afterwards I need to call one method with the checksum values generated by the individual part uploads, so I need a mechanism that waits until all parts (say 1000) have been uploaded.
The issue I am facing with the TPL is that it picks the 16 parts to execute in seemingly random order out of the 1000.
I want a mechanism that starts the first 16 parts and, as soon as any one of those 16 completes, starts the 17th part, and so on.
How can I achieve this?
One possible candidate for this is TPL Dataflow. This is a demonstration which takes in a stream of integers and prints them out to the console. Set MaxDegreeOfParallelism to the number of items you want processed in parallel:
void Main()
{
    var actionBlock = new ActionBlock<int>(
        i => Console.WriteLine(i),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 16 });
    foreach (var i in Enumerable.Range(0, 200))
    {
        actionBlock.Post(i);
    }
    actionBlock.Complete();        // signal that no more items are coming
    actionBlock.Completion.Wait(); // block until every posted item has been processed
}
This can also scale well if you want to have multiple producer/consumers.
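Applied to the upload scenario in the question, the same block can collect a checksum per part and hand them over once everything is done. This is a sketch under assumptions: `UploadPartAsync` is a placeholder that fakes an upload, `PartUploader` is an invented name, and on .NET Framework the Dataflow types come from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PartUploader
{
    // Placeholder for a real upload; returns a fake "checksum" for the part.
    private static async Task<int> UploadPartAsync(int partNumber)
    {
        await Task.Delay(10);
        return partNumber * 31;
    }

    // Uploads parts 0..partCount-1 with at most maxParallel uploads in flight
    // and returns the checksum of every part, keyed by part number.
    public static async Task<ConcurrentDictionary<int, int>> UploadAllAsync(
        int partCount, int maxParallel)
    {
        var checksums = new ConcurrentDictionary<int, int>();
        var block = new ActionBlock<int>(
            async part => { checksums[part] = await UploadPartAsync(part); },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        for (int part = 0; part < partCount; part++)
            block.Post(part);      // queued in this order; a new part starts when a slot frees up

        block.Complete();          // no more parts will be posted
        await block.Completion;    // finishes only after every posted part was processed
        return checksums;
    }
}
```

After `await PartUploader.UploadAllAsync(1000, 16)` returns, all checksums are available, so the final method that needs them can be called at that point.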
Here is the manual way of doing this.
You need a queue. The queue is the sequence of pending tasks. Dequeue items from it and put them in a list of working tasks. Whenever a task finishes, remove it from the working list and take another item from the queue. The main thread controls this process. Here is a sample of how to do this.
For the test I used a list of integers, but it should work for other types because it uses generics.
private static void Main()
{
Random r = new Random();
var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
ParallelQueue(items, DoWork);
}
private static void ParallelQueue<T>(List<T> items, Action<T> action)
{
Queue<T> pending = new Queue<T>(items); // generic queue: no boxing, items keep their type
List<Task> working = new List<Task>();
while (pending.Count + working.Count != 0)
{
if (pending.Count != 0 && working.Count < 16) // Maximum tasks
{
var item = pending.Dequeue(); // get item from queue
working.Add(Task.Run(() => action((T)item))); // run task
}
else
{
Task.WaitAny(working.ToArray());
working.RemoveAll(x => x.IsCompleted); // remove finished tasks
}
}
}
private static void DoWork(int i) // do your work here.
{
// this is just an example
Task.Delay(i).Wait();
Console.WriteLine(i);
}
Let me know if you run into problems implementing DoWork yourself; if you change the method signature you may need to make some other changes.
Update
You can also do this with async await without blocking the main thread.
private static void Main()
{
Random r = new Random();
var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
Task t = ParallelQueue(items, DoWork);
// able to do other things.
t.Wait();
}
private static async Task ParallelQueue<T>(List<T> items, Func<T, Task> func)
{
Queue<T> pending = new Queue<T>(items); // generic queue: no boxing, items keep their type
List<Task> working = new List<Task>();
while (pending.Count + working.Count != 0)
{
if (working.Count < 16 && pending.Count != 0)
{
var item = pending.Dequeue();
working.Add(Task.Run(async () => await func((T)item)));
}
else
{
await Task.WhenAny(working);
working.RemoveAll(x => x.IsCompleted);
}
}
}
private static async Task DoWork(int i)
{
await Task.Delay(i);
}
var workitems = ... /*e.g. Enumerable.Range(0, 1000000)*/;
SingleItemPartitioner.Create(workitems)
.AsParallel()
.AsOrdered()
.WithDegreeOfParallelism(16)
.WithMergeOptions(ParallelMergeOptions.NotBuffered)
.ForAll(i => { Thread.Sleep(1000); Console.WriteLine(i); });
This should be all you need. I forgot how the methods are named exactly... Look at the documentation.
Test this by printing to the console after sleeping for 1sec (which this sample code does).
Another option would be to use a BlockingCollection<T> as a queue between your file reader thread and your 16 uploader threads. Each uploader thread would just loop around consuming the blocking collection until it is complete.
And, if you want to limit memory consumption in the queue you can set an upper limit on the blocking collection such that the file-reader thread will pause when the buffer has reached capacity. This is particularly useful in a server environment where you may need to limit memory used per user/API call.
// Create a buffer of 4 chunks between the file reader and the senders
BlockingCollection<Chunk> queue = new BlockingCollection<Chunk>(4);
// Create a cancellation token source so you can stop this gracefully
CancellationTokenSource cts = ...
File reader thread
...
queue.Add(chunk, cts.Token);
...
queue.CompleteAdding();
Sending threads
for(int i = 0; i < 16; i++)
{
Task.Run(() => {
foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
{
// ... do the upload
}
});
}
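Put together, the reader/uploader arrangement described above might look like the following sketch. Integers stand in for the Chunk type, `Thread.Sleep` stands in for the upload, and `ChunkPipeline` is an invented name; cancellation is left out to keep it short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkPipeline
{
    // Pushes chunkCount integers (stand-ins for file chunks) through a bounded
    // queue to consumerCount workers and returns how many chunks were handled.
    public static int Run(int chunkCount, int consumerCount, int bufferSize)
    {
        int handled = 0;
        using var queue = new BlockingCollection<int>(bufferSize);

        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Factory.StartNew(() =>
            {
                // Blocks while the queue is empty, ends after CompleteAdding
                // once the remaining chunks have been taken.
                foreach (int chunk in queue.GetConsumingEnumerable())
                {
                    Thread.Sleep(1);                    // placeholder for the upload
                    Interlocked.Increment(ref handled);
                }
            }, TaskCreationOptions.LongRunning);
        }

        for (int chunk = 0; chunk < chunkCount; chunk++)
            queue.Add(chunk);       // blocks while bufferSize chunks are already waiting

        queue.CompleteAdding();     // lets the consumer loops finish
        Task.WaitAll(consumers);
        return handled;
    }
}
```

`ChunkPipeline.Run(1000, 16, 4)` would correspond to the numbers used in this thread: 16 uploaders fed through a buffer of 4 chunks.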

Using Task Parallel Library to handle frequent URL requests

I am using .NET to build a stock quote updater. Suppose there are X stock symbols to be updated during market hours. To keep the update rate within the data provider's limit (e.g. Yahoo Finance), I want to limit the number of requests per second using a mechanism similar to a thread pool. Say I want to allow only 5 requests/sec; I assume that corresponds to a pool of 5 threads.
I have heard about the TPL and would like to use it, although I have no experience with it. How can I specify the number of threads in the pool that Task uses implicitly? Here is a loop to schedule the requests, where requestFunc(url) is the function that updates quotes. I would like comments or suggestions on how to do this properly:
// X is a number much bigger than 5
List<Task> tasks = new List<Task>();
for (int i = 0; i < X; i++)
{
    Task t = Task.Factory.StartNew(() => { requestFunc(url); }, TaskCreationOptions.None);
    t.Wait(100); //slow down 100 ms. I am not sure if this is the right thing to do
    tasks.Add(t);
}
Task.WaitAll(tasks.ToArray()); // WaitAll expects a Task[] (params array), so convert the list
OK, I added an outer loop to make it run continuously. When I make some changes to @steve16351's code, it only loops once. Why?
static void Main(string[] args)
{
LimitedExecutionRateTaskScheduler scheduler = new LimitedExecutionRateTaskScheduler(5);
TaskFactory factory = new TaskFactory(scheduler);
List<string> symbolsToCheck = new List<string>() { "GOOG", "AAPL", "MSFT", "AGIO", "MNK", "SPY", "EBAY", "INTC" };
while (true)
{
List<Task> tasks = new List<Task>();
Console.WriteLine("Starting...");
foreach (string symbol in symbolsToCheck)
{
Task t = factory.StartNew(() => { write(symbol); },
CancellationToken.None, TaskCreationOptions.None, scheduler);
tasks.Add(t);
}
//Task.WhenAll(tasks);
Console.WriteLine("Ending...");
Console.Read();
}
//Console.Read();
}
public static void write (string symbol)
{
DateTime dateValue = DateTime.Now;
//Console.WriteLine("[{0:HH:mm:ss}] Doing {1}..", DateTime.Now, symbol);
Console.WriteLine("Date and Time with Milliseconds: {0} doing {1}..",
dateValue.ToString("MM/dd/yyyy hh:mm:ss.fff tt"), symbol);
}
If you want to have a flow of url requests while limiting to no more than 5 concurrent operations you should use TPL Dataflow's ActionBlock:
var block = new ActionBlock<string>(
url => requestFunc(url),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (var url in urls)
{
block.Post(url);
}
block.Complete();
await block.Completion;
You Post to it the urls and for each of them it would perform the request while making sure there are no more than MaxDegreeOfParallelism requests at a time.
When you are done, you can call Complete to signal the block for completion and await the Completion task to asynchronously wait until the block actually completes.
Do not worry about the number of threads; just make sure that you do not exceed the number of requests per second. Use a single timer to signal a ManualResetEvent every 200 ms and have the tasks wait for that ManualResetEvent inside a loop.
To create a timer and make it signal the ManualResetEvent every 200 ms:
resetEvent = new ManualResetEvent(false); // stays signaled after Set() until Reset() is called
timer = new Timer((state) => resetEvent.Set(), null, 0, 200); // state: null, first tick immediately, then every 200 ms
Make sure you clean up the timer (call Dispose) when you do not need it anymore.
Let the number of threads be determined by the run-time.
This would be a poor implementation if you create a single task per stock because you do not know when a stock will be updated.
So you could just put all the stocks in a list and have a single task update each stock one after another.
By giving another list of stocks to another task you could give that task a higher priority by setting its timer to every 250 ms and the low priority to every 1000 ms. That would add up to 5 times a second and the high priority list would be updated 4 times more often than the low priority.
You could use a custom task scheduler which limits the rate at which tasks can start.
In the code below, tasks are queued up and then dequeued by a timer whose interval matches your maximum allowed rate. So for 5 requests a second, the timer is set to 200 ms. On each tick, one pending task is dequeued and executed.
EDIT: In addition to the request rate, you can also extend to control the maximum number of executing threads as well.
static void Main(string[] args)
{
TaskFactory factory = new TaskFactory(new LimitedExecutionRateTaskScheduler(5, 5)); // 5 per second, 5 max executing
List<string> symbolsToCheck = new List<string>() { "GOOG", "AAPL", "MSFT" };
for (int i = 0; i < 5; i++)
symbolsToCheck.AddRange(symbolsToCheck);
foreach (string symbol in symbolsToCheck)
{
factory.StartNew(() =>
{
Console.WriteLine("[{0:HH:mm:ss}] [{1}] Doing {2}..", DateTime.Now, Thread.CurrentThread.ManagedThreadId, symbol);
Thread.Sleep(5000);
Console.WriteLine("[{0:HH:mm:ss}] [{1}] {2} is done", DateTime.Now, Thread.CurrentThread.ManagedThreadId, symbol);
});
}
Console.Read();
}
public class LimitedExecutionRateTaskScheduler : TaskScheduler
{
private ConcurrentQueue<Task> _pendingTasks = new ConcurrentQueue<Task>();
private readonly object _taskLocker = new object();
private List<Task> _executingTasks = new List<Task>();
private readonly int _maximumConcurrencyLevel = 5;
private Timer _doWork = null;
public LimitedExecutionRateTaskScheduler(double requestsPerSecond, int maximumDegreeOfParallelism)
{
_maximumConcurrencyLevel = maximumDegreeOfParallelism;
long frequency = (long)(1000.0 / requestsPerSecond);
_doWork = new Timer(ExecuteRequests, null, frequency, frequency);
}
public override int MaximumConcurrencyLevel
{
get
{
return _maximumConcurrencyLevel;
}
}
protected override bool TryDequeue(Task task)
{
return base.TryDequeue(task);
}
protected override void QueueTask(Task task)
{
_pendingTasks.Enqueue(task);
}
private void ExecuteRequests(object state)
{
Task queuedTask = null;
int currentlyExecutingTasks = 0;
lock (_taskLocker)
{
for (int i = 0; i < _executingTasks.Count; i++)
if (_executingTasks[i].IsCompleted)
_executingTasks.RemoveAt(i--);
currentlyExecutingTasks = _executingTasks.Count;
}
if (currentlyExecutingTasks == MaximumConcurrencyLevel)
return;
if (_pendingTasks.TryDequeue(out queuedTask) == false)
return; // no work to do
lock (_taskLocker)
_executingTasks.Add(queuedTask);
base.TryExecuteTask(queuedTask);
}
protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
{
return false; // not properly implemented just to complete the class
}
protected override IEnumerable<Task> GetScheduledTasks()
{
return new List<Task>(); // not properly implemented just to complete the class
}
}
You could use a while loop with a task delay to control when your requests are issued. Using an async void method to make your requests means you don't get blocked by a failing request.
Async void is fire-and-forget, which some developers don't like, but I think it would work as a possible solution in this case.
I also think Erno de Weerd makes a great suggestion about prioritising calls to more important stocks.
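A sketch of the delay-loop idea follows. It deliberately departs from the answer in one respect: instead of async void it keeps the started tasks so that failures are observed at the end. `PacedRequests` is an invented name, and the `request` delegate is an assumed stand-in for requestFunc(url).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PacedRequests
{
    // Starts one request per symbol, pausing `interval` between starts,
    // then waits for all of them to finish.
    public static async Task RunAsync(
        IEnumerable<string> symbols, TimeSpan interval, Func<string, Task> request)
    {
        var started = new List<Task>();
        foreach (string symbol in symbols)
        {
            started.Add(request(symbol));  // not awaited here, so a slow request does not stall the loop
            await Task.Delay(interval);    // 200 ms gives at most 5 starts per second
        }
        await Task.WhenAll(started);       // surfaces failures instead of silently dropping them
    }
}
```

This limits how often requests start, not how many are in flight at once. If the provider also caps concurrent connections, it would need to be combined with one of the concurrency limits shown elsewhere in this thread.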
Thanks @steve16351! It works like this:
static void Main(string[] args)
{
LimitedExecutionRateTaskScheduler scheduler = new LimitedExecutionRateTaskScheduler(5);
TaskFactory factory = new TaskFactory(scheduler);
List<string> symbolsToCheck = new List<string>() { "GOOG", "AAPL", "MSFT", "AGIO", "MNK", "SPY", "EBAY", "INTC" };
while (true)
{
List<Task> tasks = new List<Task>();
foreach (string symbol in symbolsToCheck)
{
Task t = factory.StartNew(() =>
{
write(symbol);
}, CancellationToken.None,
TaskCreationOptions.None, scheduler);
tasks.Add(t);
}
}
}
public static void write (string symbol)
{
DateTime dateValue = DateTime.Now;
Console.WriteLine("Date and Time with Milliseconds: {0} doing {1}..",
dateValue.ToString("MM/dd/yyyy hh:mm:ss.fff tt"), symbol);
}

a static variable to limit the number of processes

I have multiple instances of a class with a function that runs a process lasting more than an hour. I need to allow at most 2 of these processes to run at a time across all instances; if 2 are already running, a new one has to wait until the number of running processes drops below 2. I came up with this:
public class SomeClass
{
private static int _ProcessesRunningCount=0;
public int ProcessesRunningCount
{
get {return Interlocked.CompareExchange(ref _ProcessesRunningCount, 0, 0); }
}
public void StartProcessing()
{
if (ProcessesRunningCount < 2)
{
Interlocked.Increment(ref _ProcessesRunningCount);
Task.Factory.StartNew(() => Process());
}
else
{
//wait and start after _ProcessesRunningCount gets to less than 2
}
}
private void Process()
{
//Do the processing
System.Threading.Thread.Sleep(100000);
Interlocked.Decrement(ref _ProcessesRunningCount);
}
}
However, I am not sure how to implement the waiting part, or whether this is a good approach at all, but I don't want to create a manager class that handles everything.
Example:
var A = new SomeClass();
var B = new SomeClass();
var C = new SomeClass();
var D = new SomeClass();
A.StartProcessing(); //process will start
B.StartProcessing(); //process will start
C.StartProcessing(); //process will wait until _ProcessesRunningCount goes under 2
D.StartProcessing(); //process will wait until _ProcessesRunningCount goes under 2
You can use a semaphore to limit the number of processes you spin up. There's an example on MSDN that should fit right into your current design. A semaphore is similar to a mutex (lock), but it allows more than 1 thread to access the critical section. The Thread in the example will start a Process and should block until the process exits.
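As a rough sketch of how that could look in the class from the question, the counter can be replaced by a static SemaphoreSlim shared by all instances. `MaxObserved`, `_running` and `RecordMax` are additions for demonstration only, and the sleep is a placeholder for the real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class SomeClass
{
    // One semaphore shared by every instance: at most 2 holders at a time.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(2, 2);

    // Demo-only bookkeeping to observe how many ran together.
    public static int MaxObserved;
    private static int _running;

    public Task StartProcessing()
    {
        return Task.Run(async () =>
        {
            await Gate.WaitAsync();          // waits here while 2 are already running
            try { Process(); }
            finally { Gate.Release(); }      // always give the slot back
        });
    }

    private void Process()
    {
        RecordMax(Interlocked.Increment(ref _running));
        Thread.Sleep(50);                    // placeholder for the long-running work
        Interlocked.Decrement(ref _running);
    }

    // Raises MaxObserved to value if value is larger (lock-free).
    private static void RecordMax(int value)
    {
        int seen;
        while (value > (seen = Volatile.Read(ref MaxObserved)) &&
               Interlocked.CompareExchange(ref MaxObserved, value, seen) != seen) { }
    }
}
```

With this, A, B, C and D from the example can all call StartProcessing() immediately; C and D simply wait inside their tasks until a slot is released. The check-then-increment race in the original counter version also disappears, because acquiring the semaphore is a single atomic step.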

Run x number of web requests simultaneously

Our company has a web service to which I want to send XML files (stored on my drive) using my own HttpWebRequest client in C#. This already works. The web service supports 5 simultaneous requests (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes per request.
Sending too many requests (> 5) results in timeouts for my client. It can also lead to errors on the server side and inconsistent data. Making changes on the server side is not an option (it comes from a different vendor).
Right now, my web request client sends the XML and waits for the response using result.AsyncWaitHandle.WaitOne();
However, this way only one request can be processed at a time, although the web service supports 5. I tried using a BackgroundWorker and the ThreadPool, but they create too many requests at once, which makes them useless to me. How could one solve this problem? Should I create my own thread pool with exactly 5 threads? Any suggestions on how to implement this?
The easy way is to create 5 threads ( aside: that's an odd number! ) that consume the xml files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();
for ( int i = 0 ; i < 5 ; i++ )
{
new Thread( () =>
{
foreach ( var xml in bc.GetConsumingEnumerable() )
{
// do work
}
}
).Start();
}
bc.Add( xml_1 );
bc.Add( xml_2 );
...
bc.CompleteAdding(); // threads will end when queue is exhausted
If you're on .NET 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which guarantees that no more than that many items are processed at one time.
Parallel.ForEach(items,
new ParallelOptions { MaxDegreeOfParallelism = 5 },
ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
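To see the limit in action without a real server, the blocking request can be replaced by a short sleep while counting how many run together. This is a sketch only; `ParallelLimitDemo` is an invented name and the sleep stands in for ProcessItem.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLimitDemo
{
    // Runs a blocking placeholder "request" for each file and returns the
    // highest number of requests that were ever in progress together.
    public static int MaxConcurrentRequests(IEnumerable<string> files, int limit)
    {
        int inFlight = 0, maxSeen = 0;
        object sync = new object();

        Parallel.ForEach(files,
            new ParallelOptions { MaxDegreeOfParallelism = limit },
            file =>
            {
                lock (sync) { inFlight++; if (inFlight > maxSeen) maxSeen = inFlight; }
                Thread.Sleep(20);              // stands in for the blocking web request
                lock (sync) { inFlight--; }
            });

        return maxSeen;
    }
}
```

Called with a limit of 5, the returned value never exceeds 5, which is the property the web service in the question needs. Parallel.ForEach itself returns only after every item has been processed.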
Creating your own threadpool of five threads isn't tricky - Just create a concurrent queue of objects describing the request to make, and have five threads that loop through performing the task as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling.
It can though be tricky to return the response to the correct calling thread. If this is the case for how the rest of your code works, I'd take a different approach and create a limiter that acts a bit like a monitor but allowing 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
private static AutoResetEvent _are = new AutoResetEvent(false);
private static int _reqCnt = 0;
public static ResponseObject DoRequest(RequestObject req) // must be static: the class is static
{
for(;;)
{
if(Interlocked.Increment(ref _reqCnt) <= 5)
{
//code to create response object "resp".
Interlocked.Decrement(ref _reqCnt);
_are.Set();
return resp;
}
else
{
if(Interlocked.Decrement(ref _reqCnt) >= 5)//test so we don't end up waiting due to race on decrementing from finished thread.
_are.WaitOne();
}
}
}
}
You could write a little helper method, that would block the current thread until all the threads have finished executing the given action delegate.
static void SpawnThreads(int count, Action action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
new Thread(() =>
{
action();
countdown.Signal();
}).Start();
}
countdown.Wait();
}
And then use a BlockingCollection<string> (thread-safe collection), to keep track of your xml files. By using the helper method above, you could write something like:
static void Main(string[] args)
{
var xmlFiles = new BlockingCollection<string>();
// Add some xml files....
SpawnThreads(5, () =>
{
using (var web = new WebClient())
{
web.UploadFile(xmlFiles.Take());
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
Update
An even better approach would be to upload the files async, so that you don't waste resources on using threads for an IO task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
action(countdown);
}
countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
var urlXML = new BlockingCollection<Tuple<string, string>>();
urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
// Add some more to collection...
SpawnAsyncs(5, c =>
{
using (var web = new WebClient())
{
var current = urlXML.Take();
web.UploadFileCompleted += (s, e) =>
{
// some code to mess with e.Result (response)
c.Signal();
};
web.UploadFileAsync(new Uri(current.Item1), current.Item2);
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
