Net 6 ConsoleApp multiple BlockingCollection<T> huge CPU consumption - c#

I have a .NET 6 console app in which I use several BlockingCollections to process files that are dropped in a folder. I watch the folder using .NET's FileSystemWatcher.
In the Created event, I use a Channel to handle the processing, which is done in two phases. After each phase the resulting item is moved to a BlockingCollection, which is then consumed by the next phase.
Program.cs
public static async Task Main(string[] args)
{
BlockingCollection<FileMetadata> _fileMetaDataQueue = new BlockingCollection<FileMetadata>();
var channel = Channel.CreateUnbounded<FileSystemEventArgs>();
// Start a task to monitor the channel and process notifications
var notificationProcessor = Task.Run(() => ProcessNotifications(channel, _fileMetaDataQueue));
Task fileCopyingTask = Task.Run(() => fileCopyThread.Start()); //injected using DI
Task processMovedFile = Task.Run(() => ProcessDestinationThread.Start()); //injected using DI
Task retryOnErrorTask = Task.Run(() => RetryOnErrorThread.Start()); //injected using DI
using var watcher = new FileSystemWatcher(sourceFolder); //C:\temp
// other fw related config
watcher.Created += (sender, e) => channel.Writer.WriteAsync(e);
}
private async Task ProcessNotifications(Channel<FileSystemEventArgs> channel, BlockingCollection<FileMetadata> queue)
{
await foreach (var e in channel.Reader.ReadAllAsync())
{
Thread.Sleep(300); // So the file is released after it is dropped
try
{
// Process the file and add its name and extension to the queue
FileMetadata fileInfo = ExtractFileMetadata(e.FullPath); //processing method
queue.Add(fileInfo);
}
catch (Exception ex)
{
// logging etc
}
}
}
The BlockingCollection queue is then consumed in the FileCopyThread class, with the Start() method exposed (and called)
FileCopyThread.cs
BlockingCollection<FileMetadata> resultQueue = new();
BlockingCollection<FileMetadata> retryQueue = new();
public async Task Start()
{
await Task.Run(() => {
ProcessQueue();
});
}
private void ProcessQueue()
{
// Since IsCompleted is never set, it will always run
while (!fileMetadataQueue.IsCompleted)
{
// Try to remove an item from the queue
if (fileMetadataQueue.TryTake(out FileMetadata result))
{
// Copy the file to a new location
var newFileLocation = processorOps.MoveFile(result); // move file to other path
// Add the new file location to the result queue
if (newFileLocation != String.Empty)
{
result.newFileLocation = newFileLocation;
resultQueue.Add(result);
}
else {
retryQueue.Add(result);
}
}
}
}
The ProcessDestinationThread and RetryOnErrorThread work in exactly the same way, but do some different processing, and consume the resultQueue and the retryQueue, respectively.
Now when I run this app, it works fine and everything gets processed as expected, but my CPU and power usage sits between 85% and 95%, which is huge, IMO, and it stays there even when the app is idle and not processing anything. I figured this is because of all the infinite loops, but how can I remedy this?
Bird's-eye view: what I would like is that when the FileSystemWatcher's Created event is not firing (i.e. no files are dropped), all the queues after it sit idle, so to speak. There is no need for constant checking then.
I thought about calling CompleteAdding() on the BlockingCollection<T>, but it seems that I cannot reverse that. And the app is supposed to run indefinitely: even if the drop folder is empty, it might receive new files at any time.
Is there a way to reduce the CPU usage of my application?
PS: I am aware that this code is not a fully working example. The real code is far more complex than this, and I had to remove a lot of distracting stuff. If you think any relevant pieces of code are missing, I can provide them. I hope this code at least makes clear what I am trying to achieve.

private void ProcessQueue()
{
while (!fileMetadataQueue.IsCompleted)
{
if (fileMetadataQueue.TryTake(out FileMetadata result))
{
//...
}
}
}
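This pattern for consuming a BlockingCollection<T> is incorrect. It creates a tight loop that unproductively burns a CPU core: TryTake returns immediately when the collection is empty, so the while loop spins without ever blocking. The correct pattern is to use the GetConsumingEnumerable method, which blocks until an item is available: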
private void ProcessQueue()
{
foreach (FileMetadata result in fileMetadataQueue.GetConsumingEnumerable())
{
//...
}
}
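For an app that is meant to run indefinitely, one possible way to combine the blocking consumer with a clean shutdown is to pass a CancellationToken to GetConsumingEnumerable. The following is only a minimal sketch: the class name BlockingConsumerDemo, the file names, and the CountdownEvent (used here just so the demo knows when both items were handled) are assumptions for illustration and not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; names are illustrative only.
public static class BlockingConsumerDemo
{
    public static List<string> Run()
    {
        using var queue = new BlockingCollection<string>();
        using var cts = new CancellationTokenSource();
        using var bothHandled = new CountdownEvent(2);
        var processed = new List<string>();

        Task consumer = Task.Run(() =>
        {
            try
            {
                // Blocks (no spinning) while the queue is empty.
                foreach (string path in queue.GetConsumingEnumerable(cts.Token))
                {
                    processed.Add(path);
                    bothHandled.Signal();
                }
            }
            catch (OperationCanceledException)
            {
                // Expected when the token is cancelled during shutdown.
            }
        });

        queue.Add("a.txt");
        queue.Add("b.txt");

        bothHandled.Wait(); // wait until the consumer has handled both items
        cts.Cancel();       // wake the blocked consumer and end its loop
        consumer.Wait();
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "a.txt, b.txt"
    }
}
```

While no items arrive, the consumer thread is parked inside the enumerator, so it should use essentially no CPU. CompleteAdding is never called, so the collection stays open for new files.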

Related

Enforce Observable Subscribers to only write to the stream one at a time

I currently am using observables to manage messages being generated on bus which are being pushed over various streams.
All works well, but it's possible for the system to try to write multiple messages to the stream at once (i.e. messages coming in from multiple threads), or for messages to be published quicker than they can be written to the stream. As you can imagine, this causes issues when writing.
Hence I'm trying to figure out how I can organize things so that only one incoming message is processed at a time. Any thoughts?
public class MessageStreamResource : IResourceStartup
{
private readonly IBus _bus;
private readonly ISubject<string> _sender;
public MessageStreamResource(IBus bus)
{
_bus = bus;
_sender = new Subject<string>();
//`All` can publish messages at the same time as it's
//collecting data being generated from different threads
_bus.All.Subscribe(message => Observable.Start(() => ProcessMessage(message), TaskPoolScheduler.Default));
//Note the above hops off the calls context so that the
//writing to the stream wont slow down the caller.
}
public void Configure(IAppBuilder app)
{
app.Map("/stream", async context =>
{
...
await context.Response.WriteAsync("Lets party!\n");
await context.Response.Body.FlushAsync();
var unSubscribe = _sender.Subscribe(async t =>
{
//PROBLEM HERE
//I only want this callback to be executed
//one at a time...
await context.Response.WriteAsync($"{t}\n");
await context.Response.Body.FlushAsync();
});
...
await HoldOpenTask;
});
}
private void ProcessMessage(IMessage message)
{
_sender.OnNext(message.Payload);
}
}
If I understood the question correctly, this possibly can be done with SemaphoreSlim:
// ...
var semaphore = new SemaphoreSlim(initialCount: 1);
var unSubscribe = _sender.Subscribe(async t =>
{
//PROBLEM HERE
//I only want this callback to be executed
//one at a time...
await semaphore.WaitAsync();
try
{
await context.Response.WriteAsync($"{t}\n");
await context.Response.Body.FlushAsync();
}
finally
{
semaphore.Release();
}
});
SemaphoreSlim is IDisposable, make sure to dispose of it when appropriate.
Update: on a second look, MapExtensions.Map accepts an Action<IAppBuilder>, so you're passing an async void lambda, essentially creating a bunch of fire-and-forget asynchronous operations. The Map call will return to the caller while they may still be lingering around. This is most likely not what you want, is it?
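To check that a SemaphoreSlim really lets only one asynchronous callback through at a time, a small stand-alone experiment can help. This is a sketch under assumptions: SerializedWriterDemo and its members are hypothetical names, and Task.Delay stands in for the WriteAsync/FlushAsync calls on the response stream.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; Task.Delay stands in for the real stream writes.
public static class SerializedWriterDemo
{
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
    private static int _active;
    private static int _maxActive;

    private static async Task WriteSerializedAsync(string message)
    {
        await Gate.WaitAsync();
        try
        {
            int now = Interlocked.Increment(ref _active);
            _maxActive = Math.Max(_maxActive, now); // only one caller is inside the gate
            await Task.Delay(5);                    // simulated write + flush
            Interlocked.Decrement(ref _active);
        }
        finally
        {
            Gate.Release();
        }
    }

    // Fires 20 concurrent "messages" and returns the highest number of
    // writers that were ever inside the guarded section at once.
    public static async Task<int> RunAsync()
    {
        var writes = Enumerable.Range(0, 20)
            .Select(i => Task.Run(() => WriteSerializedAsync($"message {i}")));
        await Task.WhenAll(writes);
        return _maxActive;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync()); // prints 1
    }
}
```

Note that this serializes the writes but, as far as I know, does not promise that messages are written in the order they were published.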

Is Looping inside a Task Recommended?

Is Looping inside a task really recommended?
example code:
public void doTask(){
Task.Factory.StartNew(() => {
do{
// do tasks here.... call webservice
}while(true till cancelled)
});
}
Any answers would be great! :)
I ask because this is how I call my web service right now, and the memory consumption goes out of control.
So, is looping inside a task fine, or is it not recommended at all?
As requested by SLC, here's the code:
CancellationTokenSource tokenSrc;
Task myTask;
private void btnStart_Click(object sender, EventArgs e)
{
isPressed = !isPressed;
if(isPressed)
{
tokenSrc = new CancellationTokenSource();
myTask = Task.Factory.StartNew(() =>
{
do{
checkMatches(tokenSrc.Token);
}while(tokenSrc.IsCancellationRequested != true);
}, tokenSrc.Token);
}
else {
try{
tokenSrc.Cancel();
// Log to notepad
}
catch(Exception err){
// Log to notepad
}
finally {
if(myTask.IsCanceled || myTask.IsCompleted || myTask.IsFaulted) {
myTask.Dispose();
}
}
}
}
private void checkMatches(CancellationToken token)
{
try
{
if(!token.IsCancellationRequested)
{
//Create Endpoint...
//Bypass ServCertValidation for test purposes
ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(delegate {return true;});
using(WebServiceAsmx.SoapClient client = new....)
{
client.CheckResp response = client.chkMatch();
// if's here for the response then put to logs
}
}
}
catch(Exception err)
{
// err.toLogs
}
}
It's perfectly fine to do this, especially if your task runs constantly, for example picking up a message queue.
while (not shutting down)
get next email to send
if exists next email to send
send
else
wait for 10 seconds
wend
Ensure that you have a way to get out if you need to cancel it, like you've done with a flag, and you should be fine.
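In C#, the pseudocode loop above could look roughly like the sketch below. The names PollingLoopDemo, PollAsync and the outbox queue are assumptions made for illustration. The idle wait uses Task.Delay with the cancellation token, so a cancel request also interrupts the wait instead of waiting out the full delay.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; the "emails" are plain strings.
public static class PollingLoopDemo
{
    // Handles work when there is some; otherwise waits without spinning.
    public static async Task<int> PollAsync(
        ConcurrentQueue<string> outbox, TimeSpan idleDelay, CancellationToken token)
    {
        int sent = 0;
        try
        {
            while (!token.IsCancellationRequested)
            {
                if (outbox.TryDequeue(out _))
                {
                    sent++; // stand-in for actually sending the email
                }
                else
                {
                    // Nothing to do: sleep, but wake up early on cancellation.
                    await Task.Delay(idleDelay, token);
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Cancelled while idle; fall through and report what was sent.
        }
        return sent;
    }

    public static async Task<int> RunAsync()
    {
        var outbox = new ConcurrentQueue<string>(
            new[] { "a@example.com", "b@example.com", "c@example.com" });
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
        return await PollAsync(outbox, TimeSpan.FromMilliseconds(20), cts.Token);
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync()); // prints 3
    }
}
```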
Regarding webservices:
You should have no problem calling the webservice repeatedly, nor should it cause any memory spikes. However, you should make sure your initialisation code is not inside the loop:
BAD
while (notShuttingDown)
make a new connection
initialise
make a call to the service()
wend
GOOD
make a new connection
initialise
while (notShuttingDown)
make a call to the service
wend
Depending on your web service, a batch operation might be more efficient. For example, if your service is HTTP, hitting it repeatedly involves a lot of overhead, because each call may create and destroy a lot of objects. A persistent TCP connection might be better.
For example
slow, lots of overhead:
myRecords = { cat, dog, mouse }
foreach record in myRecords
webservice check record
endforeach
faster:
myRecords = { cat, dog, mouse }
webservice check [myRecords] // array of records is passed instead of one by one
Debugging: The most likely risk is that somehow the task is not being disposed correctly - can you add this to your method to debug?
myTask = Task.Factory.StartNew(() =>
{
Console.WriteLine("Task Started");
do{
checkMatches(tokenSrc.Token);
Thread.Sleep(10); // Some pause to stop your code from going as fast as it possibly can and putting your CPU usage to 100% (or 100/number of cores%)
}while(tokenSrc.IsCancellationRequested != true);
Console.WriteLine("Task Stopped");
}, tokenSrc.Token);
You might have to change that so it writes to a file or similar depending on if you have a console.
Then run it and make sure that only 1 task is being created.

One of multiple Tasks acquires a lock in Mutex much longer than other Tasks do

SITUATION
Currently in my project I have 3 Workers that have a working loop inside, and one CommonWork class object, which contains Work methods (DoFirstTask, DoSecondTask, DoThirdTask) that Workers can call. Each Work method must be executed mutually exclusively in respect to each other method. Each of methods spawn more nested Tasks that are waited until they are finished.
PROBLEM
When all 3 Workers are started, either 2 Workers perform at roughly the same speed while the 3rd lags behind, or the 1st Worker is super-fast, the 2nd a bit slower and the 3rd very slow; it depends on real-world conditions.
BIZARRENESS
When only 2 Workers are working, they share the work nicely too, and perform at the same speed.
What's more interesting is that even though the 3rd Worker calls fewer CommonWork methods, and so has the potential to perform more loop cycles, it does not. I tried to simulate that in the code below with the condition:
if (Task.CurrentId.Value < 3)
When debugging, I found out that the 3rd Worker was waiting to acquire the lock on the Mutex substantially longer than the other Workers. Sometimes the other two Workers just take turns, and the 3rd keeps waiting on Mutex.WaitOne(), I guess without ever really entering it, while the other Workers have no problem acquiring that lock!
WHAT I TRIED ALREADY
I tried starting the Worker Tasks with TaskCreationOptions.LongRunning, but nothing changed. I also tried making the nested Tasks child Tasks by specifying TaskCreationOptions.AttachedToParent, thinking it might be related to local queues and scheduling, but apparently it is not.
SIMPLIFIED CODE
Below is the simplified code of my real-world application. Sad to say, I could not reproduce this situation in this simple example:
class Program
{
public class CommonWork
{
private Mutex _mutex;
public CommonWork() { this._mutex = new Mutex(false); }
private void Lock() { this._mutex.WaitOne(); }
private void Unlock() { this._mutex.ReleaseMutex(); }
public void DoFirstTask(int taskId)
{
this.Lock();
try
{
// imitating sync work from 3rd Party lib, that I need to make async
var t = Task.Run(() => {
Thread.Sleep(500); // sync work
});
... // doing some work here
t.Wait();
Console.WriteLine("Task {0}: DoFirstTask - complete", taskId);
}
finally { this.Unlock(); }
}
public void DoSecondTask(int taskId)
{
this.Lock();
try
{
// imitating sync work from 3rd Party lib, that I need to make async
var t = Task.Run(() => {
Thread.Sleep(500); // sync work
});
... // doing some work here
t.Wait();
Console.WriteLine("Task {0}: DoSecondTask - complete", taskId);
}
finally { this.Unlock(); }
}
public void DoThirdTask(int taskId)
{
this.Lock();
try
{
// imitating sync work from 3rd Party lib, that I need to make async
var t = Task.Run(() => {
Thread.Sleep(500); // sync work
});
... // doing some work here
t.Wait();
Console.WriteLine("Task {0}: DoThirdTask - complete", taskId);
}
finally { this.Unlock(); }
}
}
// Worker class
public class Worker
{
private CommonWork CommonWork { get; set; }
public Worker(CommonWork commonWork)
{ this.CommonWork = commonWork; }
private void Loop()
{
while (true)
{
this.CommonWork.DoFirstTask(Task.CurrentId.Value);
if (Task.CurrentId.Value < 3)
{
this.CommonWork.DoSecondTask(Task.CurrentId.Value);
this.CommonWork.DoThirdTask(Task.CurrentId.Value);
}
}
}
public Task Start()
{
return Task.Run(() => this.Loop());
}
}
static void Main(string[] args)
{
var work = new CommonWork();
var client1 = new Worker(work);
var client2 = new Worker(work);
var client3 = new Worker(work);
client1.Start();
client2.Start();
client3.Start();
Console.ReadKey();
}
} // end of Program
The solution was to use new SemaphoreSlim(1) instead of Mutex (or a simple lock, or Monitor). Only SemaphoreSlim made the thread scheduling round-robin, so that no Thread/Task was "special" with respect to the others. Thanks, I3arnon.
If someone could comment why it is so, I would appreciate it.
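For reference, a SemaphoreSlim-based version of CommonWork might look like the sketch below. It is not the original production code: the single DoTaskAsync method, the WorkerDemo class and the cycle count are assumptions chosen to keep the example short. One practical difference from Mutex is that SemaphoreSlim has no thread affinity, so the lock can be held across an await and released on a different thread. The sketch only shows mutual exclusion; it does not prove any particular fairness or ordering between waiters.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical rewrite of CommonWork using SemaphoreSlim instead of Mutex.
public sealed class CommonWork
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _completed;

    public int Completed => Volatile.Read(ref _completed);

    public async Task DoTaskAsync(int workerId, string name)
    {
        await _gate.WaitAsync();
        try
        {
            // Imitates synchronous third-party work, as in the original example.
            await Task.Run(() => Thread.Sleep(10));
            _completed++; // safe: only one caller holds the gate at a time
            Console.WriteLine("Task {0}: {1} - complete", workerId, name);
        }
        finally
        {
            _gate.Release();
        }
    }
}

public static class WorkerDemo
{
    private static async Task LoopAsync(CommonWork work, int workerId, int cycles)
    {
        for (int i = 0; i < cycles; i++)
        {
            await work.DoTaskAsync(workerId, "DoFirstTask");
            await work.DoTaskAsync(workerId, "DoSecondTask");
            await work.DoTaskAsync(workerId, "DoThirdTask");
        }
    }

    // Three workers, two cycles each, three calls per cycle.
    public static async Task<int> RunAsync()
    {
        var work = new CommonWork();
        await Task.WhenAll(Enumerable.Range(1, 3).Select(id => LoopAsync(work, id, 2)));
        return work.Completed;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync()); // last line printed is 18
    }
}
```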

c# do the equivalent of restarting a Task with some parameter

The main idea here is to fetch some data from somewhere, when it's fetched start writing it, and then prepare the next batch of data to be written, while waiting for the previous write to be complete.
I know that a Task cannot be restarted or reused (nor should it be), although I am trying to find a way to do something like this :
//The "WriteTargetData" method should take the "data" variable
//created in the loop below as a parameter
//WriteData basically does a shedload of mongodb upserts in a separate thread,
//it takes approx. 20-30 secs to run
var task = new Task(() => WriteData(somedata));
//GetData also takes some time.
foreach (var data in queries.Select(GetData))
{
if (task.Status != TaskStatus.Running)
{
//start task with "data" as a parameter
//continue the loop to prepare the next batch of data to be written
}
else
{
//wait for task to be completed
//"restart" task
//continue the loop to prepare the next batch of data to be written
}
}
Any suggestion appreciated ! Thanks. I don't necessarily want to use Task, I just think it might be the way to go.
This may be oversimplifying your requirements, but would simply "waiting" for the previous task to complete work for you? You can use Task.WaitAny and Task.WaitAll to wait for previous operations to complete.
pseudo code:
// Method that makes calls to fetch and write data.
public async Task DoStuff()
{
Task currTask = null;
object somedata = await FetchData();
while (somedata != null)
{
// Wait for previous task.
if (currTask != null)
Task.WaitAny(currTask);
currTask = WriteData(somedata);
somedata = await FetchData();
}
}
// Whatever method fetches data.
public Task<object> FetchData()
{
var data = new object();
return Task.FromResult(data);
}
// Whatever method writes data.
public Task WriteData(object somedata)
{
return Task.Factory.StartNew(() => { /* write data */});
}
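Inside an async method, the same idea can also be written with await instead of the blocking Task.WaitAny, so no thread is blocked while the previous write finishes. The sketch below is one possible shape, with assumptions: FetchWritePipeline, GetDataAsync and WriteDataAsync are hypothetical stand-ins for the real fetch and MongoDB upsert code, and the delays only simulate work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pipeline: fetch the next batch while the previous write runs.
public static class FetchWritePipeline
{
    private static async Task<string> GetDataAsync(string query)
    {
        await Task.Delay(10);            // simulated fetch
        return query.ToUpperInvariant();
    }

    private static Task WriteDataAsync(string data, List<string> sink)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(30);            // simulated slow write
            sink.Add(data);              // only one write runs at a time
        });
    }

    public static async Task<List<string>> RunAsync(IEnumerable<string> queries)
    {
        var written = new List<string>();
        Task previousWrite = Task.CompletedTask;

        foreach (string query in queries)
        {
            string data = await GetDataAsync(query); // overlaps with the previous write
            await previousWrite;                     // at most one write in flight
            previousWrite = WriteDataAsync(data, written);
        }

        await previousWrite;                         // wait for the final write
        return written;
    }

    public static async Task Main()
    {
        var result = await RunAsync(new[] { "a", "b", "c" });
        Console.WriteLine(string.Join(",", result)); // prints "A,B,C"
    }
}
```

Each batch gets a fresh Task, so nothing is ever "restarted"; the loop just keeps a reference to the most recent write.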
The Task class is not designed to be restarted, so you need to create a new task and run the same body with the new parameters. Also, I do not see where you start the task that has the WriteData function in its body. Starting a new task per batch would probably eliminate the check if (task.Status != TaskStatus.Running). AFAIK there are only the classes Task and Thread, where a Task is only the abstraction of an action that is scheduled by the TaskScheduler and executed on different threads (when we are talking about the common task scheduler, the one you get from TaskFactory.Scheduler), and the number of those threads is tied to the number of processor cores.
As for your business app: why do you wait for the execution of WriteData? Would it not be a lot easier to gather all the data and then submit it in one big write?
Something like this?
public void Do()
{
var task = StartTask(500);
var array = new[] {1000, 2000, 3000};
foreach (var data in array)
{
if (task.IsCompleted)
{
task = StartTask(data);
}
else
{
task.Wait();
task = StartTask(data);
}
}
}
private Task StartTask(int data)
{
var task = new Task(DoSmth, data);
task.Start();
return task;
}
private void DoSmth(object time)
{
Thread.Sleep((int) time);
}
You can use a thread and an AutoResetEvent. I have code like this for several different threads in my program:
These are variable declarations that belong to the main program.
public AutoResetEvent StartTask = new AutoResetEvent(false);
public bool IsStopping = false;
public Thread RepeatingTaskThread;
Somewhere in your initialization code:
RepeatingTaskThread = new Thread( new ThreadStart( RepeatingTaskProcessor ) ) { IsBackground = true };
RepeatingTaskThread.Start();
Then the method that runs the repeating task would look something like this:
private void RepeatingTaskProcessor() {
// Keep looping until the program is going down.
while (!IsStopping) {
// Wait to receive notification that there's something to process.
StartTask.WaitOne();
// Exit if the program is stopping now.
if (IsStopping) return;
// Execute your task
PerformTask();
}
}
If there are several different tasks you want to run, you can add a variable that would indicate which one to process and modify the logic in PerformTask to pick which one to run.
I know that it doesn't use the Task class, but there's more than one way to skin a cat & this will work.

How to detect completion with unknown concurrent Task pushing & pulling ConcurrentQueue<T>

A few days ago I tried to perform a fast search on my disks to do a few things, like checking attributes and extensions, performing changes inside files, etc.
The idea was to do it with very few limitations/locks in order to avoid "latency" for big files, or directories with lots of files inside, etc.
I know it's far from "best practices", since I'm not using things like "MaxDegreeOfParallelism" and I'm using a polling loop with "while(true)".
Even so, the code runs quite fast since we have the architecture to support it.
I moved the code to a dummy console project in case anybody would like to check what's going on.
class Program
{
static ConcurrentQueue<String> dirToCheck;
static ConcurrentQueue<String> fileToCheck;
static int fileCount; //
static void Main(string[] args)
{
Initialize();
Task.Factory.StartNew(() => ScanDirectories(), TaskCreationOptions.LongRunning);
Task.Factory.StartNew(() => ScanFiles(), TaskCreationOptions.LongRunning);
Console.ReadLine();
}
static void Initialize()
{
//Instantiate caches
dirToCheck = new ConcurrentQueue<string>();
fileToCheck = new ConcurrentQueue<string>();
//Enqueue Directory to Scan here
//Avoid enqueuing nested/sub directories, or else they are going to be scanned at least twice
dirToCheck.Enqueue(@"C:\");
//Initialize counters
fileCount = 0;
}
static void ScanDirectories()
{
String dirToScan = null;
while (true)
{
if (dirToCheck.TryDequeue(out dirToScan))
{
ExtractDirectories(dirToScan);
ExtractFiles(dirToScan);
}
//Just here as a visual tracker to have some kind an idea about what's going on and where's the load
Console.WriteLine(dirToCheck.Count + "\t\t" + fileToCheck.Count + "\t\t" + fileCount);
}
}
static void ScanFiles()
{
while (true)
{
String fileToScan = null;
if (fileToCheck.TryDequeue(out fileToScan))
{
CheckFileAsync(fileToScan);
}
}
}
private static Task ExtractDirectories(string dirToScan)
{
Task worker = Task.Factory.StartNew(() =>
{
try
{
Parallel.ForEach<String>(Directory.EnumerateDirectories(dirToScan), (dirPath) =>
{
dirToCheck.Enqueue(dirPath);
});
}
catch (UnauthorizedAccessException) { }
}, TaskCreationOptions.AttachedToParent);
return worker;
}
private static Task ExtractFiles(string dirToScan)
{
Task worker = Task.Factory.StartNew(() =>
{
try
{
Parallel.ForEach<String>(Directory.EnumerateFiles(dirToScan), (filePath) =>
{
fileToCheck.Enqueue(filePath);
});
}
catch (UnauthorizedAccessException) { }
}, TaskCreationOptions.AttachedToParent);
return worker;
}
static Task CheckFileAsync(String filePath)
{
Task worker = Task.Factory.StartNew(() =>
{
//Add statement to play along with the file here
Interlocked.Increment(ref fileCount);
//WARNING !!! If your file fullname is too long this code may not be executed or may just crash
//I just put a simple check 'cause i found 2 or 3 different error message between the framework & msdn documentation
//"Full paths must not exceed 260 characters to maintain compatibility with Windows operating systems. For more information about this restriction, see the entry Long Paths in .NET in the BCL Team blog"
if (filePath.Length > 260)
return;
FileInfo fi = new FileInfo(filePath);
//Add statement here to use FileInfo
}, TaskCreationOptions.AttachedToParent);
return worker;
}
}
Problems:
How can I detect that I'm done with ScanDirectories?
Once it's done, I can enqueue an empty string or whatever to the file queue to make it exit.
I know that if I use "AttachedToParent" I can have a completion state on the parent Task, and then for example do something like "ContinueWith(()=> { /*SomeCode to notice the end*/ })".
But the parent task is still polling and is stuck in a kind of infinite loop, and each sub-statement begins a new Task.
On the other hand, I cannot simply test "Count" on each queue, because I might have flushed the file list and the directory list while another task is still about to call "EnumerateDirectories()".
I'm trying to find some kind of "reactive" solution and avoid an "if()" inside the loop that would be checked 80% of the time for nothing, since it's a simple while(true){} with an async call.
PS: I know I could use TPL Dataflow; I'm not because I'm stuck on .NET 4.0 for now. Anyway, even on .NET 4.5 without Dataflow, since there are a few improvements in the TPL, I'm still curious about it.
Instead of ConcurrentQueue<T>, you could use BlockingCollection<T>.
BlockingCollection<T> is designed specifically for producer/consumer scenarios such as this, and provides a CompleteAdding method so the producer can notify the consumers that it has finished adding work.
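A minimal sketch of that producer/consumer shape follows. It makes simplifying assumptions: there is a single producer, the paths come from an in-memory list rather than a real directory scan, and the names CompletionDemo and Run are made up for illustration. With several producers (for example recursive directory scans), CompleteAdding would have to be called only after the last of them has finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo: one producer, one consumer, explicit completion.
public static class CompletionDemo
{
    public static int Run(IEnumerable<string> paths)
    {
        using var files = new BlockingCollection<string>(boundedCapacity: 100);

        Task producer = Task.Run(() =>
        {
            try
            {
                foreach (string path in paths)
                {
                    files.Add(path); // blocks if the consumer falls 100 items behind
                }
            }
            finally
            {
                files.CompleteAdding(); // tells the consumer no more items will come
            }
        });

        int checkedFiles = 0;
        Task consumer = Task.Run(() =>
        {
            // Ends by itself once adding is complete and the collection is empty.
            foreach (string path in files.GetConsumingEnumerable())
            {
                checkedFiles++; // stand-in for the real per-file work
            }
        });

        Task.WaitAll(producer, consumer);
        return checkedFiles;
    }

    public static void Main()
    {
        Console.WriteLine(Run(new[] { "a.txt", "b.txt", "c.txt" })); // prints 3
    }
}
```

The consumer needs no while(true) loop and no Count checks: completion of the consumer task is itself the signal that all queued files were handled.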
