Context
I am trying to implement a REST API web service that "wraps" an existing C program.
Problem / Goal
Given that the C program has slow initialisation time and high RAM usage when I tell it to open a specific folder (assume this cannot be improved), I am thinking of caching the C handle/object, so the next time a GET request hits the same folder, I can use the existing handle.
What I've tried
First declare a static dictionary mapping from folder path to handle:
static ConcurrentDictionary<string, IHandle> handles = new ConcurrentDictionary<string, IHandle>();
In my GET function:
IHandle theHandle = handles.GetOrAdd(dir.Name, x => {
return new Handle(x); //this is the slow and memory-intensive function
});
This way, whenever a specific folder has been GET'd before, it will already have a handle ready for me to use.
Why it's not good
So now I run the risk of running out of memory if too many folders are cached simultaneously. How might I add a GC-like background process that calls TryRemove() and IHandle.Dispose() on old handles, perhaps using a Least Recently Used or Least Frequently Used policy? Ideally it should start evicting only when available physical memory is low.
I have tried adding the following statement to the GET function, but it seems too hacky and is very limited: it only works if I always want handles to expire after 10 seconds, and it does not restart the timer when a subsequent request comes in within those 10 seconds.
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
System.Threading.Thread.Sleep(10000);
if (handles.TryRemove(dir.Name, out var handle2))
handle2.Dispose();
});
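A sliding-expiration variant seems possible with System.Runtime.Caching.MemoryCache (rough, untested sketch below; the cache name, the 10-second window and the physicalMemoryLimitPercentage value are placeholders). It at least restarts the timer on every hit and can start trimming near a physical-memory threshold, but I'm not sure it gives me the explicit LRU/LFU control I'm after:
using System;
using System.Collections.Specialized;
using System.Runtime.Caching;

// Rough sketch only: cache handles with a sliding expiration and dispose them when evicted.
static readonly MemoryCache handleCache = new MemoryCache("handles",
    new NameValueCollection { { "physicalMemoryLimitPercentage", "90" } });

IHandle GetOrCreateHandle(string dirName)
{
    var cached = (IHandle)handleCache.Get(dirName);
    if (cached != null)
        return cached;

    var handle = new Handle(dirName); // the slow, memory-intensive call
    var policy = new CacheItemPolicy
    {
        SlidingExpiration = TimeSpan.FromSeconds(10), // restarts on every Get
        RemovedCallback = args => ((IHandle)args.CacheItem.Value).Dispose()
    };
    // AddOrGetExisting returns the already-cached entry, or null if ours was added.
    var raced = (IHandle)handleCache.AddOrGetExisting(dirName, handle, policy);
    if (raced != null)
    {
        handle.Dispose(); // another request created it first
        return raced;
    }
    return handle;
}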
What this question is not
I don't think caching the output is the solution here. After I return the result of this GET request (it's just the metadata of the folder contents), there might be another GET request for more in-depth data, which requires calling Handle's methods.
I hope my question is clear enough!
Closing handles on low memory.
ConcurrentQueue<(string, IHandle)> handles = new ConcurrentQueue<(string, IHandle)>();
void CheckMemory_OptionallyReleaseOldHandles()
{
var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
while (performance.NextValue() <= YOUR_THRESHOLD)
{
if (handles.TryDequeue(out ValueTuple<string, IHandle> value))
{
value.Item2.Dispose();
}
}
}
Your Get method.
IHandle GetHandle()
{
IHandle theHandle = handles.FirstOrDefault(v => v.Item1 == dir.Name).Item2;
if (theHandle == null)
{
theHandle = new Handle(dir.Name);
handles.Enqueue((dir.Name, theHandle));
}
return theHandle;
}
Your background task.
void SetupMemoryCheck()
{
Action<CancellationToken> BeCheckingTheMemory = ct =>
{
for(;;)
{
if (ct.IsCancellationRequested)
{
break;
}
CheckMemory_OptionallyReleaseOldHandles();
Thread.Sleep(500);
}
};
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
var tf = new TaskFactory(ct, TaskCreationOptions.LongRunning, TaskContinuationOptions.None, TaskScheduler.Current);
tf.StartNew(() => BeCheckingTheMemory(ct));
});
}
I suppose the collection will have few elements, so there is no need for a dictionary.
I didn't catch your LRU/LFU requirement the first time. Here is a hybrid LRU/LFU cache model you can look at.
Closing handles on low memory.
/*
* string – handle name,
* IHandle – the handle,
* int – hit count,
*/
ConcurrentDictionary<string, (IHandle, int)> handles = new ConcurrentDictionary<string, (IHandle, int)>();
void FreeResources()
{
if (handles.Count == 0)
{
return;
}
var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
while (performance.NextValue() <= YOUR_THRESHOLD)
{
int maxIndex = (int)Math.Ceiling(handles.Count / 2.0d);
KeyValuePair<string, (IHandle, int)> candidate = handles.First();
for (int index = 1; index < maxIndex; index++)
{
KeyValuePair<string, (IHandle, int)> item = handles.ElementAt(index);
if(item.Value.Item2 < candidate.Value.Item2)
{
candidate = item;
}
}
candidate.Value.Item1.Dispose();
handles.TryRemove(candidate.Key, out _);
}
}
Get method.
IHandle GetHandle(Dir dir, int handleOpenAttempts = 1)
{
if (handles.TryGetValue(dir.Name, out (IHandle, int) handle))
{
// Update the hit count stored in the dictionary; incrementing only the local tuple copy would be lost.
handles[dir.Name] = (handle.Item1, handle.Item2 + 1);
}
else
{
if (new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes").NextValue() < YOUR_THRESHOLD)
{
FreeResources();
}
try
{
handle.Item1 = new Handle(dir.Name);
}
catch (OutOfMemoryException)
{
if (handleOpenAttempts == 2)
{
return null;
}
FreeResources();
return GetHandle(dir, handleOpenAttempts + 1); // pass the incremented count; a post-increment would pass the old value
}
catch (Exception)
{
// Your handling.
}
handle.Item2 = 1;
handles.TryAdd(dir.Name, handle);
}
return handle.Item1;
}
Background task.
void SetupMemoryCheck()
{
Action<CancellationToken> BeCheckingTheMemory = ct =>
{
for (;;)
{
if (ct.IsCancellationRequested) break;
FreeResources();
Thread.Sleep(500);
}
};
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
new Task(() => BeCheckingTheMemory(ct), TaskCreationOptions.LongRunning).Start();
});
}
If you expect a big collection, the for loop could be optimised, for example as sketched below.
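For example (sketch only; same tuple layout as above, with Item1 the handle and Item2 the hit count), the least-used entry can be found in a single pass over a snapshot instead of repeated ElementAt calls:
// Sketch: snapshot once, scan once for the minimum hit count (requires System.Linq).
var snapshot = handles.ToArray();
if (snapshot.Length > 0)
{
    var candidate = snapshot.Aggregate((min, kv) => kv.Value.Item2 < min.Value.Item2 ? kv : min);
    candidate.Value.Item1.Dispose();
    handles.TryRemove(candidate.Key, out _);
}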
Related
I have 16 tasks doing the same job, and each of them returns an array. I want to combine the results in pairs and do the same job again until I have only one task. I don't know what the best way to do this is.
public static IComparatorNetwork[] Prune(IComparatorNetwork[] nets, int numTasks)
{
var tasks = new Task[numTasks];
var netsPerTask = nets.Length/numTasks;
var start = 0;
var concurrentSet = new ConcurrentBag<IComparatorNetwork>();
for(var i = 0; i < numTasks; i++)
{
IComparatorNetwork[] taskNets;
if (i == numTasks - 1)
{
taskNets = nets.Skip(start).ToArray();
}
else
{
taskNets = nets.Skip(start).Take(netsPerTask).ToArray();
}
start += netsPerTask;
tasks[i] = Task.Factory.StartNew(() =>
{
var pruner = new Pruner();
concurrentSet.AddRange(pruner.Prune(taskNets));
});
}
Task.WaitAll(tasks.ToArray());
if(numTasks > 1)
{
return Prune(concurrentSet.ToArray(), numTasks/2);
}
return concurrentSet.ToArray();
}
Right now I am waiting for all tasks to complete, then I repeat with half of the tasks until I have only one. I would like to avoid waiting for all of them on each iteration. I am very new to parallel programming, so the approach is probably bad.
The code I am trying to parallelize is the following:
public IComparatorNetwork[] Prune(IComparatorNetwork[] nets)
{
var result = new List<IComparatorNetwork>();
for (var i = 0; i < nets.Length; i++)
{
var isSubsumed = false;
for (var index = result.Count - 1; index >= 0; index--)
{
var n = result[index];
if (nets[i].IsSubsumed(n))
{
isSubsumed = true;
break;
}
if (n.IsSubsumed(nets[i]))
{
result.Remove(n);
}
}
if (!isSubsumed)
{
result.Add(nets[i]);
}
}
return result.ToArray();
}
So what you're fundamentally doing here is aggregating values, but in parallel. Fortunately, PLINQ already has an implementation of Aggregate that works in parallel. So in your case you can simply wrap each element of the original array in its own one-element array, and then your Prune operation can combine any two arrays of nets into a single new array.
public static IComparatorNetwork[] Prune(IComparatorNetwork[] nets)
{
return nets.Select(net => new[] { net })
.AsParallel()
.Aggregate((a, b) => new Pruner().Prune(a.Concat(b).ToArray()));
}
I'm not super knowledgeable about the internals of their Aggregate method, but I would imagine it's pretty good and doesn't spend a lot of time waiting unnecessarily. But if you want to write your own, so that you can be sure the workers are always pulling in new work as soon as there is new work, here is my own implementation. Feel free to compare the two in your specific situation to see which performs best for your needs. Note that PLINQ is configurable in many ways; feel free to experiment with other configurations to see what works best for your situation.
public static T AggregateInParallel<T>(this IEnumerable<T> values, Func<T, T, T> function, int numTasks)
{
Queue<T> queue = new Queue<T>();
foreach (var value in values)
queue.Enqueue(value);
if (!queue.Any())
return default(T); //Consider throwing or doing something else here if the sequence is empty
(T, T)? GetFromQueue()
{
lock (queue)
{
if (queue.Count >= 2)
{
return (queue.Dequeue(), queue.Dequeue());
}
else
{
return null;
}
}
}
var tasks = Enumerable.Range(0, numTasks)
.Select(_ => Task.Run(() =>
{
var pair = GetFromQueue();
while (pair != null)
{
var result = function(pair.Value.Item1, pair.Value.Item2);
lock (queue)
{
queue.Enqueue(result);
}
pair = GetFromQueue();
}
}))
.ToArray();
Task.WaitAll(tasks);
return queue.Dequeue();
}
And the calling code for this version would look like:
public static IComparatorNetwork[] Prune2(IComparatorNetwork[] nets)
{
return nets.Select(net => new[] { net })
.AggregateInParallel((a, b) => new Pruner().Prune(a.Concat(b).ToArray()), nets.Length / 2);
}
As mentioned in comments, you can make the pruner's Prune method much more efficient by having it accept two collections, not just one, and only comparing items from each collection with the other, knowing that all items from the same collection will not subsume any others from that collection. This makes the method not only much shorter, simpler, and easier to understand, but also removes a sizeable portion of the expensive comparisons. A few minor adaptations can also greatly reduce the number of intermediate collections created.
public static IReadOnlyList<IComparatorNetwork> Prune(IReadOnlyList<IComparatorNetwork> first, IReadOnlyList<IComparatorNetwork> second)
{
var firstItemsNotSubsumed = first.Where(outerNet => !second.Any(innerNet => outerNet.IsSubsumed(innerNet)));
var secondItemsNotSubsumed = second.Where(outerNet => !first.Any(innerNet => outerNet.IsSubsumed(innerNet)));
return firstItemsNotSubsumed.Concat(secondItemsNotSubsumed).ToList();
}
With that, the calling code just needs minor adaptations to ensure the types match up and that you pass in both collections rather than concatenating them first.
public static IReadOnlyList<IComparatorNetwork> Prune(IReadOnlyList<IComparatorNetwork> nets)
{
return nets.Select(net => (IReadOnlyList<IComparatorNetwork>)new[] { net })
.AggregateInParallel((a, b) => Pruner.Prune(a, b), nets.Count / 2);
}
I have a problem during serialization to a JSON file when using Newtonsoft.Json.
In a loop I am firing tasks on various threads:
List<Task> jockeysTasks = new List<Task>();
for (int i = 1; i < 1100; i++)
{
int j = i;
Task task = Task.Run(async () =>
{
LoadedJockey jockey = new LoadedJockey();
jockey = await Task.Run(() => _scrapServices.ScrapSingleJockeyPL(j));
if (jockey.Name != null)
{
_allJockeys.Add(jockey);
}
UpdateStatusBar = j * 100 / 1100;
if (j % 100 == 0)
{
await Task.Run(() => _dataServices.SaveAllJockeys(_allJockeys)); //saves everything to JSON file
}
});
jockeysTasks.Add(task);
}
await Task.WhenAll(jockeysTasks);
And if (j % 100 == 0), it is trying to save the collection _allJockeys to the file (I will add some counter to make it more reliable, but that is not the point):
public void SaveAllJockeys(List<LoadedJockey> allJockeys)
{
if (allJockeys.Count != 0)
{
if (File.Exists(_jockeysFileName)) File.Delete(_jockeysFileName);
try
{
using (StreamWriter file = File.CreateText(_jockeysFileName))
{
JsonSerializer serializer = new JsonSerializer();
serializer.Serialize(file, allJockeys);
}
}
catch (Exception e)
{
dialog.ShowDialog("Could not save the results, " + e.ToString(), "Error");
}
}
}
During that time, I believe, other tasks are adding new items to the collection, and it throws the exception:
Collection was modified; enumeration operation may not execute.
As I was reading in THE ARTICLE, you can change the type of iteration to avoid the exception. As far as I know, I cannot modify how the Newtonsoft.Json package does the iteration.
Thank you in advance for any tips on how to avoid the exception and save the collection without unexpected changes.
You should probably inherit from List and use a ReaderWriterLock (https://learn.microsoft.com/en-us/dotnet/api/system.threading.readerwriterlock?view=netframework-4.8)
i.e. (not tested pseudo C#)
public class MyJockeys: List<LoadedJockey>
{
System.Threading.ReaderWriterLock _rw_lock = new System.Threading.ReaderWriterLock();
public new void Add(LoadedJockey j)
{
try
{
_rw_lock.AcquireWriterLock(5000); // or whatever you deem an acceptable timeout
base.Add(j);
}
finally
{
_rw_lock.ReleaseWriterLock();
}
}
public string ToJSON()
{
try
{
_rw_lock.AcquireReaderLock(5000); // or whatever you deem an acceptable timeout
string s = ""; // Serialize here using Newtonsoft
return s;
}
finally
{
_rw_lock.ReleaseReaderLock();
}
}
// And override Remove and anything else you need
}
Get the idea?
Hope this helps.
Regards,
Adam.
I tried to use ToList() on the collection, which creates a copy of the list, with positive effect.
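For reference, a slightly safer variant of that snapshot idea (a sketch only, reusing the names from the code above): ToList() itself enumerates the list, so it can still race with a concurrent Add; taking both the Add and the copy under one small lock removes that window.
// Sketch: guard both the writer and the snapshot with the same lock,
// then serialize the private copy without touching shared state.
private static readonly object _jockeysLock = new object();

// where a jockey is added:
lock (_jockeysLock)
{
    _allJockeys.Add(jockey);
}

// where the periodic save happens:
List<LoadedJockey> snapshot;
lock (_jockeysLock)
{
    snapshot = _allJockeys.ToList();
}
await Task.Run(() => _dataServices.SaveAllJockeys(snapshot));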
We have some legacy code that tests thread safety on a number of classes. A recent hardware upgrade (from 2 to 4 cores) is presenting random failures, with an exception when accessing an item from a List<>.
[Test]
public void CheckThreadSafeInThreadPool()
{
Console.WriteLine("Initialised ThreadLocalDataContextStore...");
var container = new ContextContainerTest();
Console.WriteLine("Starting...");
container.StartPool();
while (container.ThreadNumber < 5)
{
Thread.Sleep(1000);
}
foreach (var message in container.Messages)
{
Console.WriteLine(message);
if (message.Contains("A supposedly new thread is able to see the old value"))
{
Assert.Fail("Thread leaked values - not thread safe");
}
}
Console.WriteLine("Complete");
}
public class ContextContainerTest
{
private ThreadLocalDataContextStore store;
public int ThreadNumber;
public List<string> Messages;
public void StartPool()
{
Messages = new List<string>();
store = new ThreadLocalDataContextStore();
store.ClearContext();
var msoContext = new MsoContext();
msoContext.Principal = new GenericPrincipal(new GenericIdentity("0"), null);
store.StoreContext(msoContext);
for (var counter = 0; counter < 5; counter++)
{
Messages.Add(string.Format("Assigning work item {0}", counter));
ThreadPool.QueueUserWorkItem(ExecuteMe, counter);
}
}
public void ExecuteMe(object input)
{
string hashCode = Thread.CurrentThread.GetHashCode().ToString();
if (store.GetContext() == null || store.GetContext().Principal == null)
{
Messages.Add(string.Format("[{0}] A New Thread", hashCode));
var msoContext = new MsoContext();
msoContext.Principal = new GenericPrincipal(new GenericIdentity("2"), null);
store.StoreContext(msoContext);
}
else if (store.GetContext().Principal.Identity.Name == "1")
{
Messages.Add(string.Format("[{0}] Thread reused", hashCode));
}
else
{
Messages.Add(string.Format("[{0}] A supposedly new thread is able to see the old value {1}"
, hashCode, store.GetContext().GetDiagnosticInformation()));
}
Messages.Add(string.Format("[{0}] Context at starting: {1}", hashCode, store.GetContext().GetDiagnosticInformation()));
store.GetContext().SetAsCurrent(new GenericPrincipal(new GenericIdentity("99"), null));
Messages.Add(string.Format("[{0}] Context at End: {1}", hashCode, store.GetContext().GetDiagnosticInformation()));
store.GetContext().SetAsCurrent(new GenericPrincipal(new GenericIdentity("1"), null));
Thread.Sleep(80);
ThreadNumber++;
}
}
The failure is random, and occurs at the following section of code within the test itself:
foreach (var message in container.Messages)
{
Console.WriteLine(message);
if (message.Contains("A supposedly new thread is able to see the old value"))
{
Assert.Fail("Thread leaked values - not thread safe");
}
}
A subtle change resolves the issue, but someone is niggling that we should not need to do that: why is the message null if Messages is not, and why does it work most of the time but not always?
if (message != null && message.Contains("A supposedly new thread is able to see the old value"))
{
}
Another solution was to change the List to a thread-safe collection, but that doesn't answer why the issue arose in the first place.
List<T> is not thread safe. If you are using .NET 4 or above you can use ConcurrentBag<T> from System.Collections.Concurrent; on older versions you have to implement one yourself. See this, it might help.
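A minimal sketch of that change, keeping the names from the test. One likely explanation for the null entries is that two unsynchronized List<T>.Add calls can race during an internal resize and leave a default (null) slot behind, which is why the enumeration later sees a null message.
using System.Collections.Concurrent;
using System.Threading;

public class ContextContainerTest
{
    // Sketch: thread-safe replacements for the two members shared across work items.
    // ConcurrentQueue preserves insertion order for the final read-out; ConcurrentBag also works.
    public ConcurrentQueue<string> Messages = new ConcurrentQueue<string>();

    private int threadNumber;
    public int ThreadNumber => Volatile.Read(ref threadNumber);

    // Inside ExecuteMe, use:
    //     Messages.Enqueue(string.Format("[{0}] A New Thread", hashCode));
    //     Interlocked.Increment(ref threadNumber);   // instead of ThreadNumber++
}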
Hope I was helpful.
In my Producer-Consumer scenario, I have multiple consumers, and each of the consumers send an action to external hardware, which may take some time. My Pipeline looks somewhat like this:
BatchBlock --> TransformBlock --> BufferBlock --> (Several) ActionBlocks
I have assigned BoundedCapacity of my ActionBlocks to 1.
What I want in theory is this: I want to trigger the BatchBlock to send a group of items to the TransformBlock only when one of my ActionBlocks is available for operation. Until then the BatchBlock should just keep buffering elements and not pass them on to the TransformBlock. My batch sizes are variable. As a batch size is mandatory, I do have a really high upper limit for the BatchBlock batch size; however, I really don't wish to reach that limit. I would like to trigger my batches depending on the availability of the ActionBlocks performing the said task.
I have achieved this with the help of the TriggerBatch() method. I am calling BatchBlock.TriggerBatch() as the last action in my ActionBlock. However, interestingly, after several days of working properly the pipeline has come to a halt. Upon checking I found out that sometimes the inputs to the BatchBlock come in after the ActionBlocks are done with their work. In this case the ActionBlocks do actually call TriggerBatch at the end of their work, however since at this point there is no input to the BatchBlock at all, the call to TriggerBatch is fruitless. And after a while, when inputs do flow into the BatchBlock, there is no one left to call TriggerBatch and restart the pipeline. I was looking for something where I could just check if something is in fact present in the input buffer of the BatchBlock, however there is no such feature available; I could also not find a way to check whether the TriggerBatch call was fruitful.
Could anyone suggest a possible solution to my problem? Unfortunately, using a Timer to trigger batches is not an option for me. Except for the start of the pipeline, the throttling should be governed only by the availability of one of the ActionBlocks.
The example code is here:
static BatchBlock<int> _groupReadTags;
static void Main(string[] args)
{
_groupReadTags = new BatchBlock<int>(1000);
var bufferOptions = new DataflowBlockOptions{BoundedCapacity = 2};
BufferBlock<int> _frameBuffer = new BufferBlock<int>(bufferOptions);
var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1};
int batchNo = 1;
TransformBlock<int[], int> _workingBlock = new TransformBlock<int[], int>(list =>
{
Console.WriteLine("\n\nWorking on Batch Number {0}", batchNo);
//_groupReadTags.TriggerBatch();
int sum = 0;
foreach (int item in list)
{
Console.WriteLine("Elements in batch {0} :: {1}", batchNo, item);
sum += item;
}
batchNo++;
return sum;
});
ActionBlock<int> _worker1 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from ONE :{0}",x);
await Task.Delay(500);
Console.WriteLine("BatchBlock Output Count : {0}", _groupReadTags.OutputCount);
_groupReadTags.TriggerBatch();
},consumerOptions);
ActionBlock<int> _worker2 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from TWO :{0}", x);
await Task.Delay(2000);
_groupReadTags.TriggerBatch();
}, consumerOptions);
_groupReadTags.LinkTo(_workingBlock);
_workingBlock.LinkTo(_frameBuffer);
_frameBuffer.LinkTo(_worker1);
_frameBuffer.LinkTo(_worker2);
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Task postingTask = new Task(() => PostStuff());
postingTask.Start();
Console.ReadLine();
}
static void PostStuff()
{
for (int i = 0; i < 10; i++)
{
_groupReadTags.Post(i);
Thread.Sleep(100);
}
Parallel.Invoke(
() => _groupReadTags.Post(100),
() => _groupReadTags.Post(200),
() => _groupReadTags.Post(300),
() => _groupReadTags.Post(400),
() => _groupReadTags.Post(500),
() => _groupReadTags.Post(600),
() => _groupReadTags.Post(700),
() => _groupReadTags.Post(800)
);
}
Here is an alternative BatchBlock implementation with some extra features. It includes a TriggerBatch method with this signature:
public int TriggerBatch(int nextMinBatchSizeIfEmpty);
Invoking this method will trigger a batch immediately if the input queue is not empty; otherwise it will set a temporary MinBatchSize that affects only the next batch. You could invoke this method with a small value for nextMinBatchSizeIfEmpty to ensure that, in case a batch cannot currently be produced, the next batch will occur sooner than the BatchSize configured in the block's constructor.
This method returns the size of the batch produced. It returns 0 if the input queue is empty, the output queue is full, or the block has completed.
public class BatchBlockEx<T> : ITargetBlock<T>, ISourceBlock<T[]>
{
private readonly ITargetBlock<T> _input;
private readonly IPropagatorBlock<T[], T[]> _output;
private readonly Queue<T> _queue;
private readonly object _locker = new object();
private int _nextMinBatchSize = Int32.MaxValue;
public Task Completion { get; }
public int InputCount { get { lock (_locker) return _queue.Count; } }
public int OutputCount => ((BufferBlock<T[]>)_output).Count;
public int BatchSize { get; }
public BatchBlockEx(int batchSize, DataflowBlockOptions dataflowBlockOptions = null)
{
if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
dataflowBlockOptions = dataflowBlockOptions ?? new DataflowBlockOptions();
if (dataflowBlockOptions.BoundedCapacity != DataflowBlockOptions.Unbounded &&
dataflowBlockOptions.BoundedCapacity < batchSize)
throw new ArgumentOutOfRangeException(nameof(batchSize),
"Number must be no greater than the value specified in BoundedCapacity.");
this.BatchSize = batchSize;
_output = new BufferBlock<T[]>(dataflowBlockOptions);
_queue = new Queue<T>(batchSize);
_input = new ActionBlock<T>(async item =>
{
T[] batch = null;
lock (_locker)
{
_queue.Enqueue(item);
if (_queue.Count == batchSize || _queue.Count >= _nextMinBatchSize)
{
batch = _queue.ToArray(); _queue.Clear();
_nextMinBatchSize = Int32.MaxValue;
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}, new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 1,
CancellationToken = dataflowBlockOptions.CancellationToken
});
var inputContinuation = _input.Completion.ContinueWith(async t =>
{
try
{
T[] batch = null;
lock (_locker)
{
if (_queue.Count > 0)
{
batch = _queue.ToArray(); _queue.Clear();
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}
finally
{
if (t.IsFaulted)
{
_output.Fault(t.Exception.InnerException);
}
else
{
_output.Complete();
}
}
}, TaskScheduler.Default).Unwrap();
this.Completion = Task.WhenAll(inputContinuation, _output.Completion);
}
public void Complete() => _input.Complete();
void IDataflowBlock.Fault(Exception ex) => _input.Fault(ex);
public int TriggerBatch(Func<T[], bool> condition, int nextMinBatchSizeIfEmpty)
{
if (nextMinBatchSizeIfEmpty < 1)
throw new ArgumentOutOfRangeException(nameof(nextMinBatchSizeIfEmpty));
int count = 0;
lock (_locker)
{
if (_queue.Count > 0)
{
T[] batch = _queue.ToArray();
if (condition == null || condition(batch))
{
bool accepted = _output.Post(batch);
if (accepted) { _queue.Clear(); count = batch.Length; }
}
_nextMinBatchSize = Int32.MaxValue;
}
else
{
_nextMinBatchSize = nextMinBatchSizeIfEmpty;
}
}
return count;
}
public int TriggerBatch(Func<T[], bool> condition)
=> TriggerBatch(condition, Int32.MaxValue);
public int TriggerBatch(int nextMinBatchSizeIfEmpty)
=> TriggerBatch(null, nextMinBatchSizeIfEmpty);
public int TriggerBatch() => TriggerBatch(null, Int32.MaxValue);
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
{
return _input.OfferMessage(messageHeader, messageValue, source,
consumeToAccept);
}
T[] ISourceBlock<T[]>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target, out bool messageConsumed)
{
return _output.ConsumeMessage(messageHeader, target, out messageConsumed);
}
bool ISourceBlock<T[]>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
return _output.ReserveMessage(messageHeader, target);
}
void ISourceBlock<T[]>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
_output.ReleaseReservation(messageHeader, target);
}
IDisposable ISourceBlock<T[]>.LinkTo(ITargetBlock<T[]> target,
DataflowLinkOptions linkOptions)
{
return _output.LinkTo(target, linkOptions);
}
}
Another overload of the TriggerBatch method allows you to examine the batch that could currently be produced, and to decide whether it should be triggered or not:
public int TriggerBatch(Func<T[], bool> condition);
The BatchBlockEx class does not support the Greedy and MaxNumberOfGroups options of the built-in BatchBlock.
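For example, wiring it into the question's pipeline could look roughly like this (a sketch only; it assumes the BatchBlockEx<T> class above, using System.Threading.Tasks.Dataflow, and the 1000/500 values are taken from the question's example):
// Usage sketch: the worker flushes whatever is queued when it finishes, or arranges
// for the next batch to fire as soon as a single item arrives.
var groupReadTags = new BatchBlockEx<int>(1000);
var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1 };

var worker1 = new ActionBlock<int>(async x =>
{
    Console.WriteLine("Number from ONE : {0}", x);
    await Task.Delay(500);                                   // stands in for the slow hardware call
    groupReadTags.TriggerBatch(nextMinBatchSizeIfEmpty: 1);  // flush now, or make the next batch fire early
}, consumerOptions);

// groupReadTags.LinkTo(_workingBlock); ...the rest of the pipeline stays as in the question.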
I have found that using TriggerBatch in this way is unreliable:
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Apparently TriggerBatch is intended to be used inside the block, not outside it like this. I have seen this result in odd timing issues, like items from the next batch being included in the current batch, even though TriggerBatch was called first.
Please see my answer to this question for an alternative using DataflowBlock.Encapsulate: BatchBlock produces batch with elements sent after TriggerBatch()
I am trying to poll an API as fast and as efficiently as possible to get market data. The API allows you to get market data from batchSize markets per request. The API allows you to have 3 concurrent requests but no more (or throws errors).
I may be requesting data from many more than batchSize different markets.
I continuously loop through all of the markets, requesting the data in batches, one batch per thread and 3 threads at any time.
The total number of markets (and hence batches) can change at any time.
I'm using the following code:
private static object lockObj = new object();
private void PollMarkets()
{
const int NumberOfConcurrentRequests = 3;
for (int i = 0; i < NumberOfConcurrentRequests; i++)
{
int batch = 0;
Task.Factory.StartNew(async () =>
{
while (true)
{
if (markets.Count > 0)
{
List<string> batchMarketIds;
lock (lockObj)
{
var numBatches = (int)Math.Ceiling((double)markets.Count / batchSize);
batchMarketIds = markets.Keys.Skip(batch*batchSize).Take(batchSize).ToList();
batch = (batch + 1) % numBatches;
}
var marketData = await GetMarketData(batchMarketIds);
// Do something with marketData
}
else
{
await Task.Delay(1000); // wait for some markets to be added.
}
}
}
});
}
}
Even though there is a lock in the critical section, each thread starts with batch = 0 (each thread is often polling for duplicate data).
If I change batch to a private volatile field the above code works as I want it to (volatile and lock).
So for some reason my lock doesn't work? I feel like it's something obvious but I'm missing it.
I believe it is best here to use a lock instead of a volatile field; is this also correct?
Thanks
The issue was that you were defining the batch variable inside the for loop. That meant that the threads were using their own variable instead of sharing it.
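A minimal sketch of that fix, keeping the question's names (markets, batchSize and GetMarketData are assumed from the original code; the wait for markets to appear is omitted for brevity):
private static readonly object lockObj = new object();
private int batch = 0; // declared once, shared by all polling tasks

private void PollMarkets()
{
    const int NumberOfConcurrentRequests = 3;
    for (int i = 0; i < NumberOfConcurrentRequests; i++)
    {
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                List<string> batchMarketIds;
                lock (lockObj)
                {
                    var numBatches = (int)Math.Ceiling((double)markets.Count / batchSize);
                    batchMarketIds = markets.Keys.Skip(batch * batchSize).Take(batchSize).ToList();
                    batch = (batch + 1) % numBatches;
                }
                var marketData = await GetMarketData(batchMarketIds);
                // Do something with marketData
            }
        });
    }
}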
In my mind you should use Queue<> to create a jobs pipeline.
Something like this
private int batchSize = 10;
private Queue<int> queue = new Queue<int>();
private void AddMarket(params int[] marketIDs)
{
lock (queue)
{
foreach (var marketID in marketIDs)
{
queue.Enqueue(marketID);
}
if (queue.Count >= batchSize)
{
Monitor.Pulse(queue);
}
}
}
private void Start()
{
for (var tid = 0; tid < 3; tid++)
{
Task.Run(async () =>
{
while (true)
{
List<int> toProcess;
lock (queue)
{
if (queue.Count < batchSize)
{
Monitor.Wait(queue);
continue;
}
toProcess = new List<int>(batchSize);
for (var count = 0; count < batchSize; count++)
{
toProcess.Add(queue.Dequeue());
}
if (queue.Count >= batchSize)
{
Monitor.Pulse(queue);
}
}
var marketData = await GetMarketData(toProcess);
}
});
}
}
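Usage would look roughly like this (a sketch; GetMarketData is the question's method). Note that IDs are only processed once at least batchSize of them are queued, so a trailing partial batch stays in the queue until more arrive:
// Sketch: start the three consumers, then feed market IDs as they become known.
Start();
AddMarket(101, 102, 103, 104, 105, 106, 107, 108, 109, 110); // >= batchSize, wakes one consumer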