I have a windows service with a thread that runs every 2 minutes.
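while (true)
{
    try
    {
        repNeg.willExecuteLoopWithTasks(param1, param2, param3);
        Thread.Sleep(20000); // 20,000 ms = 20 seconds between passes
    }
    catch (Exception)
    {
        // exception handling was not shown in the original post
    }
}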
Inside this I have a loop with tasks:
foreach (RepModel repModelo in listaRep)
{
Task t = new Task(() => { this.coletaFunc(repModelo.EndIp, user, tipoFilial); });
t.Start();
}
But I think this implementation is wrong. I need to only run one task for every element in the list,
and, when a specific task finishes, wait a minute and start again.
To clarify, I have two constraints here.
1 - I can't wait for all tasks to finish, because some tasks can take more than 2 hours while others take only 27 seconds.
2 - My list of tasks can change. That's why I have a thread: every 2 minutes the thread gets the list of tasks to execute and then starts a loop.
But sometimes a task has not finished yet when the thread starts it again, and then strange things show up in my log.
I tried to use a Dictionary to solve my problem, but after some time of execution (sometimes it takes days) my log shows:
"System.IndexOutOfRangeException"
Here is what I would do...
Create a new class that stores the following (as properties):
a RepModel ID (something unique)
a DateTime for the last time ran
an int for the frequency the task should run in seconds
a bool to determine if the task is in progress or not
Then you need a global list of the class somewhere, say called "JobList".
Your main app should have a Timer, which runs every couple of minutes. The job of this timer is to check for new RepModels (assume these can change over time, e.g. a database list). When it ticks, it loops over the list and adds any new ones (different ID) to JobList. You may also want to remove any that are no longer required (e.g. removed from the DB list).
Then you have a second timer, which runs every second. Its job is to check all items in the JobList and compare the last run time with the current time (and ensure they are not already in progress). If the interval has elapsed, kick off the task. Once the task is complete, update the last run time so it can run again next time, making sure to change the "in progress" flag as you go.
This is all theory and you will need to give it a try yourself, but I think it covers what you are actually trying to achieve.
Some sample code (may or may not compile/work):
class Job
{
    public int ID { get; set; }
    public DateTime? LastRun { get; set; }
    public int Frequency { get; set; } // seconds between runs
    public bool InProgress { get; set; }
}

List<Job> JobList = new List<Job>();
// Both timers touch JobList, so guard it with a lock.
readonly object jobLock = new object();

// Every 2 minutes (or whatever).
void timerMain_Tick()
{
    lock (jobLock)
    {
        foreach (RepModel repModelo in listaRep)
        {
            if (!JobList.Any(x => x.ID == repModelo.ID))
            {
                JobList.Add(new Job() { ID = repModelo.ID, Frequency = 120 });
            }
        }
    }
}

// Every 10 seconds (or whatever).
void timerTask_Tick()
{
    lock (jobLock)
    {
        DateTime now = DateTime.Now;
        foreach (var job in JobList.Where(x => !x.InProgress &&
            (x.LastRun == null || x.LastRun.Value.AddSeconds(x.Frequency) < now)))
        {
            job.InProgress = true;
            // Task.Run starts the work; a continuation task must not be started manually.
            Task.Run(() =>
            {
                // Do task.
            }).ContinueWith(task =>
            {
                lock (jobLock)
                {
                    job.LastRun = DateTime.Now;
                    job.InProgress = false;
                }
            });
        }
    }
}
So what you really need here is a class that has two operations: it needs to be able to start processing one of your models, and it needs to be able to end processing of one of your models. Separating it from the list will make this easier.
When you start processing a model you'll want to create a CancellationTokenSource to associate with it so that you can stop processing it later. Processing it, in your case, means having a loop, while not cancelled, that runs an operation and then waits a while. Ending the operation is as easy as cancelling the token source.
public class Foo
{
private ConcurrentDictionary<RepModel, CancellationTokenSource> tokenLookup =
new ConcurrentDictionary<RepModel, CancellationTokenSource>();
public async Task Start(RepModel model)
{
var cts = new CancellationTokenSource();
tokenLookup[model] = cts;
while (!cts.IsCancellationRequested)
{
await Task.Run(() => model.DoWork());
await Task.Delay(TimeSpan.FromMinutes(1));
}
}
public void End(RepModel model)
{
CancellationTokenSource cts;
if (tokenLookup.TryRemove(model, out cts))
cts.Cancel();
}
}
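One detail worth sketching: in the class above, Task.Delay is not given the token, so a cancelled loop would only stop after the current pause runs out. The variant below is a minimal sketch, not the answerer's code. PeriodicRunner, its TKey parameter and the work/pause arguments are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one repeating loop per key and lets callers stop it.
public class PeriodicRunner<TKey>
{
    private readonly ConcurrentDictionary<TKey, CancellationTokenSource> tokenLookup =
        new ConcurrentDictionary<TKey, CancellationTokenSource>();

    // Runs 'work' repeatedly, pausing 'pause' after each run, until End(key) is called.
    public async Task Start(TKey key, Action work, TimeSpan pause)
    {
        var cts = new CancellationTokenSource();
        if (!tokenLookup.TryAdd(key, cts))
            return; // a loop for this key is already registered

        try
        {
            while (!cts.IsCancellationRequested)
            {
                await Task.Run(work);
                // Passing the token lets End() interrupt the pause immediately.
                await Task.Delay(pause, cts.Token);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when End() cancels during the pause.
        }
    }

    // Stops the loop registered for 'key', if there is one.
    public void End(TKey key)
    {
        CancellationTokenSource cts;
        if (tokenLookup.TryRemove(key, out cts))
            cts.Cancel();
    }
}
```

Because TryAdd refuses a second registration, calling Start twice for the same key cannot produce two overlapping loops, which is the situation the question describes.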
If you are using .NET Framework 4.0 or later, you may benefit from
Parallel.ForEach
Executes a foreach operation in which iterations may run in parallel.
Parallel code may look like this:
Parallel.ForEach(listaRep , repModelo => {
this.coletaFunc(repModelo.EndIp, user, tipoFilial);
});
This will run on multiple cores (if available), and you don't need a dedicated task scheduler, because Parallel.ForEach blocks until every iteration inside the loop has finished. Afterwards you can call the same function again if the condition is met.
In my application I need to continually process some piece(s) of Work on some set interval(s). I had originally written a Task to continually check a given Task.Delay to see if it was completed; if so, the Work that corresponded to that Task.Delay would be processed. The drawback of this method is that the Task that checks these Task.Delays would be in a pseudo-infinite loop when no Task.Delay is completed.
To solve this problem I found that I could create a "recursive Task" (I am not sure what the jargon for this would be) that processes the work at the given interval as needed.
// New Recurring Work can be added by simply creating
// the Task below and adding an entry into this Dictionary.
// Recurring Work can be removed/stopped by looking
// it up in this Dictionary and calling its CTS.Cancel method.
private readonly object _LockRecurWork = new object();
private Dictionary<Work, Tuple<Task, CancellationTokenSource>> RecurringWork { get; set; }
...
private Task CreateRecurringWorkTask(Work workToDo, CancellationTokenSource taskTokenSource)
{
return Task.Run(async () =>
{
// Do the Work, then wait the prescribed amount of time before doing it again
DoWork(workToDo);
await Task.Delay(workToDo.RecurRate, taskTokenSource.Token);
// If this Work's CancellationTokenSource is not
// cancelled then "schedule" the next Work execution
if (!taskTokenSource.IsCancellationRequested)
{
lock(_LockRecurWork)
{
RecurringWork[workToDo] = new Tuple<Task, CancellationTokenSource>
(CreateRecurringWorkTask(workToDo, taskTokenSource), taskTokenSource);
}
}
}, taskTokenSource.Token);
}
Should/Could this be represented with a chain of Task.ContinueWith? Would there be any benefit to such an implementation? Is there anything majorly wrong with the current implementation?
Yes!
Calling ContinueWith tells the Task to call your code as soon as it finishes. This is far faster than manually polling it.
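A minimal sketch of the difference, assuming nothing beyond the standard Task API: the continuation below is invoked when the first task finishes, and no code polls for completion. ContinuationDemo and ComputeThenDouble are hypothetical names used only for this illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Returns a task that completes after both the work and its continuation have run.
    public static Task<int> ComputeThenDouble(int input)
    {
        Task<int> work = Task.Run(() => input + 1);
        // The scheduler invokes the continuation once 'work' finishes; nothing polls.
        return work.ContinueWith(
            t => t.Result * 2,
            TaskContinuationOptions.OnlyOnRanToCompletion);
    }
}
```

In code that can use async/await, awaiting the first task and then running the follow-up step expresses the same chain more readably.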
I have an IEnumerable<customClass> object that has roughly 10-15 entries, so not a lot, but I'm running into a System.IO.FileNotFoundException when I try and do
Parallel.Foreach(..some linq query.., object => { ...stuff....});
with the enumerable. Here is the code I have that sometimes works, other times doesn't:
IEnumerable<UserIdentifier> userIds = script.Entries.Select(x => x.UserIdentifier).Distinct();
await Task.Factory.StartNew(() =>
{
Parallel.ForEach(userIds, async userId =>
{
Stopwatch watch = new Stopwatch();
watch.Start();
_Log.InfoFormat("user identifier: {0}", userId);
await Task.Factory.StartNew(() =>
{
foreach (ScriptEntry se in script.Entries.Where(x => x.UserIdentifier.Equals(userId)))
{
// // Run the script //
_Log.InfoFormat("waiting {0}", se.Delay);
Task.Delay(se.Delay);
_Log.InfoFormat("running SelectionInformation{0}", se.SelectionInformation);
ExecuteSingleEntry(se);
_Log.InfoFormat("[====== SelectionInformation {1} ELAPSED TIME: {0} ======]", watch.Elapsed,
se.SelectionInformation.Verb);
}
});
watch.Stop();
_Log.InfoFormat("[====== TOTAL ELAPSED TIME: {0} ======]", watch.Elapsed);
});
});
When the function ExecuteSingleEntry is run, there is a function a few calls deep within it that creates a temp directory and files. It seems to me that when I run the Parallel.ForEach, the function is getting slammed by numerous calls at once (I'm testing 5 at once currently but need to handle about 10) and isn't creating some of the files I need. But if I hit a breakpoint in the file creation function and just press F5 every time it gets hit, I don't get any file not found exception.
So, my question is, how can I achieve running a subset of my scripts.Entries in parallel based on the user id within the script entries with a delay of 1 second between each different user id entries being started?
and a script entry is like:
UserIdentifier: 141, SelectionInformation: class of stuff, Ids: list of EntryIds, Names: list of Entry Names
And each user identifier can appear 1 or more times in the array. I want to start all the different user identifiers, more or less, at once. Then Task out the different SelectionInformation's tied to a script entry.
scripts.Entries is an array of ScriptEntry, which is as follows:
[DataMember]
public TimeSpan Delay { get; set; }
[DataMember]
public SelectionInformation Selection { get; set; }
[DataMember]
public long[] Ids { get; set; }
[DataMember]
public string Names { get; set; }
[DataMember]
public long UserIdentifier { get; set; }
I referenced: Parallel.ForEach vs Task.Factory.StartNew to obtain the
Task.Factory.StartNew(() => Parallel.Foreach({ }) ) so my UI doesn't lock up on me
There are a few principles to apply:
Prefer Task.Run over Task.Factory.StartNew. I describe on my blog why StartNew is dangerous; Run is a much safer, more modern alternative.
Don't pass an async lambda to Parallel.ForEach. It doesn't make sense, and it won't work right.
Task.Delay doesn't do anything by itself. You either have to await it or use the synchronous version (Thread.Sleep).
(In fact, in your case, the internal StartNew is meaningless; it's already parallel, and the code - running on a thread pool thread - is trying to start a new operation on a thread pool thread and immediately asynchronously await it???)
After applying these principles:
await Task.Run(() =>
{
Parallel.ForEach(userIds, userId =>
{
Stopwatch watch = new Stopwatch();
watch.Start();
_Log.InfoFormat("user identifier: {0}", userId);
foreach (ScriptEntry se in script.Entries.Where(x => x.UserIdentifier.Equals(userId)))
{
// // Run the script //
_Log.InfoFormat("waiting {0}", se.Delay);
Thread.Sleep(se.Delay);
_Log.InfoFormat("running SelectionInformation{0}", se.SelectionInformation);
ExecuteSingleEntry(se);
_Log.InfoFormat("[====== SelectionInformation {1} ELAPSED TIME: {0} ======]", watch.Elapsed,
se.SelectionInformation.Verb);
}
watch.Stop();
_Log.InfoFormat("[====== TOTAL ELAPSED TIME: {0} ======]", watch.Elapsed);
});
});
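The third principle is easy to check in isolation. The sketch below (DelayDemo is a hypothetical name, and the 500 ms value is arbitrary) records whether the delay task has completed right after the call and again after awaiting it.

```csharp
using System;
using System.Threading.Tasks;

public static class DelayDemo
{
    // Item1: was the delay complete right after calling Task.Delay?
    // Item2: was it complete after awaiting it?
    public static async Task<Tuple<bool, bool>> Run()
    {
        Task delay = Task.Delay(TimeSpan.FromMilliseconds(500));
        // The call returned at once; the delay itself is still pending.
        bool doneWithoutAwait = delay.IsCompleted;
        // Only awaiting (or blocking on) the task actually pauses this method.
        await delay;
        bool doneAfterAwait = delay.IsCompleted;
        return Tuple.Create(doneWithoutAwait, doneAfterAwait);
    }
}
```

Inside a synchronous Parallel.ForEach body there is nothing to await, which is why the rewritten code uses Thread.Sleep instead.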
The main idea here is to fetch some data from somewhere, when it's fetched start writing it, and then prepare the next batch of data to be written, while waiting for the previous write to be complete.
I know that a Task cannot be restarted or reused (nor should it be), although I am trying to find a way to do something like this :
//The "WriteTargetData" method should take the "data" variable
//created in the loop below as a parameter
//WriteData basically does a shedload of mongodb upserts in a separate thread,
//it takes approx. 20-30 secs to run
var task = new Task(() => WriteData(somedata));
//GetData also takes some time.
foreach (var data in queries.Select(GetData))
{
if (task.Status != TaskStatus.Running)
{
//start task with "data" as a parameter
//continue the loop to prepare the next batch of data to be written
}
else
{
//wait for task to be completed
//"restart" task
//continue the loop to prepare the next batch of data to be written
}
}
Any suggestion appreciated ! Thanks. I don't necessarily want to use Task, I just think it might be the way to go.
This may be over simplifying your requirements, but would simply "waiting" for the previous task to complete work for you? You can use Task.WaitAny and Task.WaitAll to wait for previous operations to complete.
pseudo code:
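// Method that makes calls to fetch and write data.
public async Task DoStuff()
{
    Task currTask = null;
    object somedata = await FetchData();
    while (somedata != null)
    {
        // Wait for the previous write. Task.WaitAny(currTask) also works,
        // but it blocks the thread; awaiting does not.
        if (currTask != null)
            await currTask;
        currTask = WriteData(somedata);
        // The next fetch overlaps with the write that was just started.
        somedata = await FetchData();
    }
    // Let the final write finish before returning.
    if (currTask != null)
        await currTask;
}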
// Whatever method fetches data.
public Task<object> FetchData()
{
var data = new object();
return Task.FromResult(data);
}
// Whatever method writes data.
public Task WriteData(object somedata)
{
return Task.Factory.StartNew(() => { /* write data */});
}
The Task class is not designed to be restarted, so you need to create a new task and run the body with the same parameters. Also, I do not see where you start the task that has the WriteData function in its body. Doing so will probably eliminate the need for the check if (task.Status != TaskStatus.Running). AFAIK there are only the Task and Thread classes, where a Task is just an abstraction of an action that is scheduled by the TaskScheduler and executed on different threads (when we are talking about the default task scheduler, the one you get from TaskFactory.Scheduler). That scheduler uses the thread pool, which starts with roughly one worker thread per processor core and adds more as needed.
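The "cannot be restarted" point can be shown in a few lines. This is a minimal sketch; RestartDemo and its method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class RestartDemo
{
    // Tries to start the same Task twice; returns true if the second Start was rejected.
    public static bool SecondStartThrows()
    {
        var task = new Task(() => { });
        task.Start();
        task.Wait();
        try
        {
            task.Start(); // a task that has already run cannot be started again
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // The usual workaround: keep the body in a delegate and create a fresh Task per run.
    public static Task RunAgain(Action body)
    {
        return Task.Run(body);
    }
}
```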
As for your business logic: why do you wait for WriteData to finish? Wouldn't it be much easier to gather all the data and then submit it in one big write?
Something like this?
public void Do()
{
var task = StartTask(500);
var array = new[] {1000, 2000, 3000};
foreach (var data in array)
{
if (task.IsCompleted)
{
task = StartTask(data);
}
else
{
task.Wait();
task = StartTask(data);
}
}
}
private Task StartTask(int data)
{
var task = new Task(DoSmth, data);
task.Start();
return task;
}
private void DoSmth(object time)
{
Thread.Sleep((int) time);
}
You can use a thread and an AutoResetEvent. I have code like this for several different threads in my program:
These are variable declarations that belong to the main program.
public AutoResetEvent StartTask = new AutoResetEvent(false);
public bool IsStopping = false;
public Thread RepeatingTaskThread;
Somewhere in your initialization code:
RepeatingTaskThread = new Thread( new ThreadStart( RepeatingTaskProcessor ) ) { IsBackground = true };
RepeatingTaskThread.Start();
Then the method that runs the repeating task would look something like this:
private void RepeatingTaskProcessor() {
// Keep looping until the program is going down.
while (!IsStopping) {
// Wait to receive notification that there's something to process.
StartTask.WaitOne();
// Exit if the program is stopping now.
if (IsStopping) return;
// Execute your task
PerformTask();
}
}
If there are several different tasks you want to run, you can add a variable that would indicate which one to process and modify the logic in PerformTask to pick which one to run.
I know that it doesn't use the Task class, but there's more than one way to skin a cat & this will work.
The code below continues to create threads even when the queue is empty, until eventually an OutOfMemory exception occurs. If I replace the Parallel.ForEach with a regular foreach, this does not happen. Does anyone know why this may happen?
public delegate void DataChangedDelegate(DataItem obj);
public class Consumer
{
public DataChangedDelegate OnCustomerChanged;
public DataChangedDelegate OnOrdersChanged;
private CancellationTokenSource cts;
private CancellationToken ct;
private BlockingCollection<DataItem> queue;
public Consumer(BlockingCollection<DataItem> queue) {
this.queue = queue;
Start();
}
private void Start() {
cts = new CancellationTokenSource();
ct = cts.Token;
Task.Factory.StartNew(() => DoWork(), ct);
}
private void DoWork() {
Parallel.ForEach(queue.GetConsumingPartitioner(), item => {
if (item.DataType == DataTypes.Customer) {
OnCustomerChanged(item);
} else if(item.DataType == DataTypes.Order) {
OnOrdersChanged(item);
}
});
}
}
I think Parallel.ForEach() was made primarily for processing bounded collections. And it doesn't expect collections like the one returned by GetConsumingPartitioner(), where MoveNext() blocks for a long time.
The problem is that Parallel.ForEach() tries to find the best degree of parallelism, so it starts as many Tasks as the TaskScheduler lets it run. But the TaskScheduler sees there are many Tasks that take a very long time to finish, and that they're not doing anything (they block) so it keeps on starting new ones.
I think the best solution is to set the MaxDegreeOfParallelism.
As an alternative, you could use TPL Dataflow's ActionBlock. The main difference in this case is that ActionBlock doesn't block any threads when there are no items to process, so the number of threads wouldn't get anywhere near the limit.
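A minimal sketch of the MaxDegreeOfParallelism suggestion, under some assumptions: a plain Enumerable.Range stands in for the blocking queue, Thread.Sleep stands in for slow work, and BoundedParallelDemo is a hypothetical name. The method reports the highest number of loop bodies seen running at once, which should never exceed the configured limit.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelDemo
{
    // Processes 'itemCount' dummy items and returns the highest number of
    // loop bodies that were observed running at the same time.
    public static int MaxObservedConcurrency(int itemCount, int limit)
    {
        int running = 0;
        int maxSeen = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = limit };

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, item =>
        {
            int now = Interlocked.Increment(ref running);
            // Record the peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref maxSeen)) &&
                   Interlocked.CompareExchange(ref maxSeen, now, seen) != seen)
            {
            }
            Thread.Sleep(50); // stand-in for slow or blocking work
            Interlocked.Decrement(ref running);
        });

        return maxSeen;
    }
}
```

With the limit in place, blocked loop bodies can no longer cause the scheduler to keep injecting additional workers for this loop.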
The Producer/Consumer pattern is mainly used when there is just one Producer and one Consumer.
However, what you are trying to achieve (multiple consumers) more neatly fits in the Worklist pattern. The following code was taken from a slide for unit2 slide "2c - Shared Memory Patterns" from a parallel programming class taught at the University of Utah, which is available in the download at http://ppcp.codeplex.com/
BlockingCollection<Item> worklist;
CancellationTokenSource cts;
int itemcount;

public void Run()
{
    int num_workers = 4;
    //create worklist, filled with initial work
    worklist = new BlockingCollection<Item>(
        new ConcurrentQueue<Item>(GetInitialWork()));
    cts = new CancellationTokenSource();
    itemcount = worklist.Count;
    for (int i = 0; i < num_workers; i++)
        Task.Factory.StartNew(RunWorker);
}

IEnumerable<Item> GetInitialWork() { ... }

public void RunWorker()
{
    try
    {
        do
        {
            Item i = worklist.Take(cts.Token);
            //blocks until item available or cancelled
            Process(i);
            //exit loop if no more items left
        } while (Interlocked.Decrement(ref itemcount) > 0);
    }
    finally
    {
        if (!cts.IsCancellationRequested)
            cts.Cancel();
    }
}

public void AddWork(Item item)
{
    Interlocked.Increment(ref itemcount);
    worklist.Add(item);
}

public void Process(Item i)
{
    //Do what you want to the work item here.
}
The preceding code allows you to add worklist items to the queue, and lets you set an arbitrary number of workers (in this case, four) to pull items out of the queue and process them.
Another great resource for the Parallelism on .Net 4.0 is the book "Parallel Programming with Microsoft .Net" which is freely available at: http://msdn.microsoft.com/en-us/library/ff963553
Internally in the Task Parallel Library, Parallel.For and Parallel.ForEach follow a hill-climbing algorithm to determine how much parallelism should be used for the operation.
More or less, they start by running the body on one task, move to two, and so on, until a break-point is reached and they need to reduce the number of tasks.
This works quite well for method bodies that complete quickly, but if the body takes a long time to run, it may take a long time before it realizes it needs to decrease the amount of parallelism. Until that point, it continues adding tasks, and possibly crashes the computer.
I learned the above during a lecture given by one of the developers of the Task Parallel Library.
Specifying the MaxDegreeOfParallelism is probably the easiest way to go.
I'm currently working on a project, where we have the challenge to process items in parallel. So far not a big deal ;)
Now to the problem. We have a list of IDs, and periodically (every 2 seconds) we want to call a stored procedure for each ID.
The 2 seconds need to be tracked for each item individually, as items are added and removed during runtime.
In addition we want to configure the maximum degree of parallelism, as the DB should not be flooded with 300 threads concurrently.
An item which is being processed should not be rescheduled for processing until it has finished with the previous execution. Reason is that we want to prevent queueing up a lot of items, in case of delays on the DB.
Right now we are using a self-developed component, that has a main thread, which periodically checks what items need to scheduled for processing. Once it has the list, it's dropping those on a custom IOCP-based thread pool, and then uses waithandles to wait for the items being processed. Then the next iteration starts. IOCP because of the work-stealing it provides.
I would like to replace this custom implementation with a TPL/.NET 4 version, and I would like to know how you would solve it (ideally simple and nicely readable/maintainable).
I know about this article: http://msdn.microsoft.com/en-us/library/ee789351.aspx, but it's just limiting the amount of threads being used. Leaves work stealing, periodically executing the items ....
Ideally it will become a generic component that can be used for all the tasks that need to be done periodically for a list of items.
any input welcome,
tia
Martin
I don't think you actually need to get down and dirty with direct TPL Tasks for this. For starters I would set up a BlockingCollection around a ConcurrentQueue (the default) with no BoundedCapacity set on the BlockingCollection to store the IDs that need to be processed.
// Setup the blocking collection somewhere when your process starts up (OnStart for a Windows service)
BlockingCollection<string> idsToProcess = new BlockingCollection<string>();
From there I would just use Parallel::ForEach on the enumeration returned from BlockingCollection::GetConsumingEnumerable. In the ForEach call you will set ParallelOptions::MaxDegreeOfParallelism. Inside the body of the ForEach you will execute your stored procedure.
Now, once the stored procedure execution completes, you're saying you don't want to re-schedule the execution for at least two seconds. No problem, schedule a System.Threading.Timer with a callback which will simply add the ID back to the BlockingCollection in the supplied callback.
Parallel.ForEach(
    idsToProcess.GetConsumingEnumerable(),
    new ParallelOptions
    {
        MaxDegreeOfParallelism = 4 // read this from config
    },
    (id) =>
    {
        // ... execute sproc ...

        // Need to declare/assign this before the delegate so that we can dispose of it inside
        Timer timer = null;
        timer = new Timer(
            _ =>
            {
                // Add the id back to the collection so it will be processed again
                idsToProcess.Add(id);
                // Cleanup the timer
                timer.Dispose();
            },
            null, // no state, the id we need is "captured" in the anonymous delegate
            2000, // probably should read this from config
            Timeout.Infinite);
    });
Finally, when the process is shutting down you would call BlockingCollection::CompleteAdding so that the enumerable being processed will stop blocking and complete, and the Parallel::ForEach will exit. If this were a Windows service, for example, you would do this in OnStop.
// When ready to shutdown you just signal you're done adding
idsToProcess.CompleteAdding();
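The shutdown behaviour can be seen in isolation with a single consumer. This is a minimal sketch: ShutdownDemo is a hypothetical name, and a plain foreach on one task replaces the Parallel::ForEach to keep the example short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ShutdownDemo
{
    // Adds a few ids, signals completion, and returns how many the consumer processed.
    public static int Run()
    {
        var idsToProcess = new BlockingCollection<string>();
        int processed = 0;

        // The consumer blocks inside GetConsumingEnumerable while the collection is empty.
        Task consumer = Task.Run(() =>
        {
            foreach (string id in idsToProcess.GetConsumingEnumerable())
                processed++;
        });

        idsToProcess.Add("a");
        idsToProcess.Add("b");
        idsToProcess.Add("c");

        // No more items will arrive: the foreach ends once the queue is drained.
        idsToProcess.CompleteAdding();
        consumer.Wait();
        return processed;
    }
}
```

Without the CompleteAdding call, consumer.Wait() would never return, because the enumeration keeps waiting for further items.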
Update
You raised a valid concern in your comment that you might be processing a large amount of IDs at any given point and fear that there would be too much overhead in a timer per ID. I would absolutely agree with that. So in the case that you are dealing with a large list of IDs concurrently, I would change from using a timer-per-ID to using another queue to hold the "sleeping" IDs which is monitored by a single short interval timer instead. First you'll need a ConcurrentQueue onto which to place the IDs that are asleep:
ConcurrentQueue<Tuple<string, DateTime>> sleepingIds = new ConcurrentQueue<Tuple<string, DateTime>>();
Now, I'm using a two-part Tuple here for illustration purposes, but you may want to create a more strongly typed struct for it (or at least alias it with a using statement) for better readability. The tuple has the id and a DateTime which represents when it was put on the queue.
Now you'll also want to setup the timer that will monitor this queue:
Timer wakeSleepingIdsTimer = new Timer(
    _ =>
    {
        DateTime utcNow = DateTime.UtcNow;
        Tuple<string, DateTime> entry;
        // Pull all items from the sleeping queue that have been there for at least 2 seconds.
        // The queue is in insertion order, so we can stop at the first entry that is not due yet.
        while (sleepingIds.TryPeek(out entry) && (utcNow - entry.Item2).TotalSeconds >= 2)
        {
            // Add this id back to the processing queue
            if (sleepingIds.TryDequeue(out entry))
                idsToProcess.Add(entry.Item1);
        }
    },
    null, // no state
    0,    // start right away (Timeout.Infinite here would keep the timer from ever firing)
    100   // wake up every 100ms, probably should read this from config
    );
Then you would simply change the Parallel::ForEach to do the following instead of setting up a timer for each one:
(id) =>
{
// ... execute sproc ...
sleepingIds.Enqueue(Tuple.Create(id, DateTime.UtcNow));
}
This is pretty similar to the approach you said you already had in your question, but it does so with TPL tasks. A task just adds itself back to a list of things to schedule when it's done.
The use of locking on a plain list is fairly ugly in this example; you would probably want a better collection to hold the list of things to schedule.
// Fill the idsToSchedule
for (int id = 0; id < 5; id++)
{
idsToSchedule.Add(Tuple.Create(DateTime.MinValue, id));
}
// LongRunning will tell TPL to create a new thread to run this on
Task.Factory.StartNew(SchedulingLoop, TaskCreationOptions.LongRunning);
That starts up the SchedulingLoop, which actually checks whether it's been two seconds since something ran.
// Tuple of the last time an id was processed and the id of the thing to schedule
static List<Tuple<DateTime, int>> idsToSchedule = new List<Tuple<DateTime, int>>();
static int currentlyProcessing = 0;
const int ProcessingLimit = 3;
// An event loop that performs the scheduling
public static void SchedulingLoop()
{
while (true)
{
lock (idsToSchedule)
{
DateTime currentTime = DateTime.Now;
for (int index = idsToSchedule.Count - 1; index >= 0; index--)
{
var scheduleItem = idsToSchedule[index];
var timeSincePreviousRun = (currentTime - scheduleItem.Item1).TotalSeconds;
// start it executing in a background task
if (timeSincePreviousRun > 2 && currentlyProcessing < ProcessingLimit)
{
Interlocked.Increment(ref currentlyProcessing);
Console.WriteLine("Scheduling {0} after {1} seconds", scheduleItem.Item2, timeSincePreviousRun);
// Schedule this task to be processed
Task.Factory.StartNew(() =>
{
Console.WriteLine("Executing {0}", scheduleItem.Item2);
// simulate the time taken to call this procedure
Thread.Sleep(new Random((int)DateTime.Now.Ticks).Next(0, 5000) + 500);
lock (idsToSchedule)
{
idsToSchedule.Add(Tuple.Create(DateTime.Now, scheduleItem.Item2));
}
Console.WriteLine("Done Executing {0}", scheduleItem.Item2);
Interlocked.Decrement(ref currentlyProcessing);
});
// remove this from the list of things to schedule
idsToSchedule.RemoveAt(index);
}
}
}
Thread.Sleep(100);
}
}