Porting a simple algorithm to TPL with thread-local data - C#

I have a very simple algorithm that clusters blobs based on their x and y distance to each other. I ported it to Parallel.For with thread-local data, but the results were incorrect. In other words, I may not have used synchronization properly to isolate each thread.
I simply cannot figure out why the results of the two implementations differ. Any thoughts would be appreciated.
I wanted to post fully compilable code, but the objects used are too tightly integrated into the project context. Since the algorithm is very simple, hopefully that will not get in the way.
Class-level declarations:
/// <summary>
/// Contains the master blob collection to be clustered.
/// </summary>
public List<Blob> Blobs { get; private set; }
/// <summary>
/// List of clusters to be computed.
/// </summary>
public List<Cluster> Clusters { get; private set; }
Linear Example (Works fine):
Cluster cluster = null;
for (int i = 0; i < this.Blobs.Count; i++)
{
    cluster = new Cluster();
    cluster.Id = i;
    if (this.Blobs[i].ClusterId == 0)
    {
        cluster.Blobs.Add(this.Blobs[i], i);
        for (int j = 0; j < this.Blobs.Count; j++)
        {
            if (this.Blobs[j].ClusterId == 0)
            {
                if (this.Blobs[i].Rectangle.IntersectsWith(this.Blobs[j].Rectangle))
                {
                    cluster.Blobs.Add(this.Blobs[j], i);
                }
                else if (this.Blobs[i].Rectangle.IsCloseTo(this.Blobs[j].Rectangle, distanceThreshold))
                {
                    cluster.Blobs.Add(this.Blobs[j], i);
                }
            }
        }
    }
    if (cluster.Blobs.Count > 2)
    {
        this.Clusters.Add(cluster);
    }
}
Parallel Port (Incorrect clusters):
System.Threading.Tasks.Parallel.For<Cluster>
(
    0,
    this.Blobs.Count,
    new ParallelOptions() { MaxDegreeOfParallelism = degreeOfParallelism },
    () => new Cluster(),
    (i, loop, cluster) =>
    {
        cluster.Id = i;
        if (this.Blobs[i].ClusterId == 0)
        {
            cluster.Blobs.Add(this.Blobs[i], i);
            for (int j = 0; j < this.Blobs.Count; j++)
            {
                if (this.Blobs[j].ClusterId == 0)
                {
                    if (this.Blobs[i].Rectangle.IntersectsWith(this.Blobs[j].Rectangle))
                    {
                        cluster.Blobs.Add(this.Blobs[j], i);
                    }
                    else if (this.Blobs[i].Rectangle.IsCloseTo(this.Blobs[j].Rectangle, distanceThreshold))
                    {
                        cluster.Blobs.Add(this.Blobs[j], i);
                    }
                }
            }
        }
        return cluster;
    },
    (cluster) =>
    {
        lock (this.Clusters)
        {
            if (cluster.Blobs.Count > 2)
            {
                this.Clusters.Add(cluster);
            }
        }
    }
);

I think your problem is a misunderstanding of what "thread-local data" means. According to the documentation of Parallel.For(), it is:
[…] some local state that may be shared amongst iterations that execute on the same thread.
What this means is that some iterations of your loop will share the same Cluster object, which is what causes your incorrect results. If localInit and localFinally executed for each iteration, they would be useless, because you could achieve exactly the same thing by moving their code to the beginning and end of the loop body.
The delegates are there so that you can use them for optimization: with them, you don't have to access shared state (in your case this.Clusters) as often, which can improve performance.
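To see this concretely, here is a small sketch (mine, not from the original post) that counts how many times localInit runs; the count typically comes out close to the number of worker tasks, not the number of iterations:
int initCount = 0;
System.Threading.Tasks.Parallel.For(0, 1000,
    () => System.Threading.Interlocked.Increment(ref initCount), // localInit: once per worker task
    (i, loop, local) => local,                                   // body: reuses the task's local value
    local => { });                                               // localFinally: once per worker task
Console.WriteLine(initCount); // usually a small number (roughly the worker count), not 1000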
If you don't need this optimization, don't use the two delegates and instead write the body of your loop like this:
i =>
{
    var cluster = new Cluster { Id = i };
    // rest of the loop here
    if (cluster.Blobs.Count > 2)
    {
        lock (this.Clusters)
        {
            this.Clusters.Add(cluster);
        }
    }
}
(In the above code, I also swapped the lock and the if as an optimization, so the lock is only taken when there is actually something to add.)
If you think the optimization using thread-local data would be useful for you (i.e. it would actually speed things up), you can use it. But the data in question would have to be a list of Clusters, not just a single Cluster. Something like:
() => new List<Cluster>(),
(i, loop, clusters) =>
{
    var cluster = new Cluster { Id = i };
    // rest of the loop here
    if (cluster.Blobs.Count > 2)
        clusters.Add(cluster);
    return clusters;
},
clusters =>
{
    lock (this.Clusters)
    {
        this.Clusters.AddRange(clusters);
    }
}
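Wired into your original call, the whole thing might look like this (a sketch reusing your fields; the inner clustering loop is unchanged from your sequential version, so it is elided here):
System.Threading.Tasks.Parallel.For<List<Cluster>>
(
    0,
    this.Blobs.Count,
    new ParallelOptions() { MaxDegreeOfParallelism = degreeOfParallelism },
    () => new List<Cluster>(),                  // localInit: one list per worker task
    (i, loop, clusters) =>
    {
        var cluster = new Cluster { Id = i };
        // ... same inner clustering loop as the sequential version ...
        if (cluster.Blobs.Count > 2)
            clusters.Add(cluster);
        return clusters;
    },
    clusters =>                                 // localFinally: merge each task's list once
    {
        lock (this.Clusters)
        {
            this.Clusters.AddRange(clusters);
        }
    }
);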

Related

Tasks combine result and continue

I have 16 tasks doing the same job, and each of them returns an array. I want to combine the results in pairs and do the same job again until I have only one task. I don't know the best way to do this.
public static IComparatorNetwork[] Prune(IComparatorNetwork[] nets, int numTasks)
{
    var tasks = new Task[numTasks];
    var netsPerTask = nets.Length / numTasks;
    var start = 0;
    var concurrentSet = new ConcurrentBag<IComparatorNetwork>();
    for (var i = 0; i < numTasks; i++)
    {
        IComparatorNetwork[] taskNets;
        if (i == numTasks - 1)
        {
            taskNets = nets.Skip(start).ToArray();
        }
        else
        {
            taskNets = nets.Skip(start).Take(netsPerTask).ToArray();
        }
        start += netsPerTask;
        tasks[i] = Task.Factory.StartNew(() =>
        {
            var pruner = new Pruner();
            // Note: ConcurrentBag<T> has no built-in AddRange; this assumes a custom extension method.
            concurrentSet.AddRange(pruner.Prune(taskNets));
        });
    }
    Task.WaitAll(tasks);
    if (numTasks > 1)
    {
        return Prune(concurrentSet.ToArray(), numTasks / 2);
    }
    return concurrentSet.ToArray();
}
Right now I am waiting for all tasks to complete, then I repeat with half as many tasks until I have only one. I would like not to have to wait for all of them on each iteration. I am very new to parallel programming, so the approach is probably bad.
The code I am trying to parallelize is the following:
public IComparatorNetwork[] Prune(IComparatorNetwork[] nets)
{
    var result = new List<IComparatorNetwork>();
    for (var i = 0; i < nets.Length; i++)
    {
        var isSubsumed = false;
        for (var index = result.Count - 1; index >= 0; index--)
        {
            var n = result[index];
            if (nets[i].IsSubsumed(n))
            {
                isSubsumed = true;
                break;
            }
            if (n.IsSubsumed(nets[i]))
            {
                result.Remove(n);
            }
        }
        if (!isSubsumed)
        {
            result.Add(nets[i]);
        }
    }
    return result.ToArray();
}
So what you're fundamentally doing here is aggregating values, but in parallel. Fortunately, PLINQ already has an implementation of Aggregate that works in parallel. So in your case you can simply wrap each element of the original array in its own one-element array, and then your Prune operation is able to combine any two arrays of nets into a new single array.
public static IComparatorNetwork[] Prune(IComparatorNetwork[] nets)
{
    return nets.Select(net => new[] { net })
        .AsParallel()
        .Aggregate((a, b) => new Pruner().Prune(a.Concat(b).ToArray()));
}
I'm not super knowledgeable about the internals of PLINQ's Aggregate, but I would imagine it's pretty good and doesn't spend a lot of time waiting unnecessarily. But if you want to write your own, so that you can be sure the workers always pull in new work as soon as there is new work, here is my own implementation. Feel free to compare the two in your specific situation to see which performs best for your needs. Note that PLINQ is configurable in many ways; feel free to experiment with other configurations to see what works best for your situation.
public static T AggregateInParallel<T>(this IEnumerable<T> values, Func<T, T, T> function, int numTasks)
{
    Queue<T> queue = new Queue<T>();
    foreach (var value in values)
        queue.Enqueue(value);
    if (!queue.Any())
        return default(T); // Consider throwing or doing something else here if the sequence is empty

    (T, T)? GetFromQueue()
    {
        lock (queue)
        {
            if (queue.Count >= 2)
            {
                return (queue.Dequeue(), queue.Dequeue());
            }
            else
            {
                return null;
            }
        }
    }

    var tasks = Enumerable.Range(0, numTasks)
        .Select(_ => Task.Run(() =>
        {
            var pair = GetFromQueue();
            while (pair != null)
            {
                var result = function(pair.Value.Item1, pair.Value.Item2);
                lock (queue)
                {
                    queue.Enqueue(result);
                }
                pair = GetFromQueue();
            }
        }))
        .ToArray();
    Task.WaitAll(tasks);
    return queue.Dequeue();
}
And the calling code for this version would look like:
public static IComparatorNetwork[] Prune2(IComparatorNetwork[] nets)
{
    return nets.Select(net => new[] { net })
        .AggregateInParallel((a, b) => new Pruner().Prune(a.Concat(b).ToArray()), nets.Length / 2);
}
As mentioned in the comments, you can make the pruner's Prune method much more efficient by having it accept two collections, not just one, and only comparing items from each collection with the other, knowing that items from the same collection will not subsume one another. This makes the method not only much shorter, simpler, and easier to understand, but also removes a sizeable portion of the expensive comparisons. A few minor adaptations can also greatly reduce the number of intermediate collections created.
public static IReadOnlyList<IComparatorNetwork> Prune(IReadOnlyList<IComparatorNetwork> first, IReadOnlyList<IComparatorNetwork> second)
{
    var firstItemsNotSubsumed = first.Where(outerNet => !second.Any(innerNet => outerNet.IsSubsumed(innerNet)));
    var secondItemsNotSubsumed = second.Where(outerNet => !first.Any(innerNet => outerNet.IsSubsumed(innerNet)));
    return firstItemsNotSubsumed.Concat(secondItemsNotSubsumed).ToList();
}
With this, the calling code just needs minor adaptations to ensure the types match up and to pass in both collections rather than concatenating them first.
public static IReadOnlyList<IComparatorNetwork> Prune(IReadOnlyList<IComparatorNetwork> nets)
{
    return nets.Select(net => (IReadOnlyList<IComparatorNetwork>)new[] { net })
        .AggregateInParallel((a, b) => Pruner.Prune(a, b), nets.Count / 2);
}

How to cache slow resource initialisation from C# Web API REST Server?

Context
I am trying to implement a REST API web service that "wraps" an existing C program.
Problem / Goal
Given that the C program has slow initialisation time and high RAM usage when I tell it to open a specific folder (assume this cannot be improved), I am thinking of caching the C handle/object, so the next time a GET request hits the same folder, I can use the existing handle.
What I've tried
First declare a static dictionary mapping from folder path to handle:
static ConcurrentDictionary<string, IHandle> handles = new ConcurrentDictionary<string, IHandle>();
In my GET function:
IHandle theHandle = handles.GetOrAdd(dir.Name, x =>
{
    return new Handle(x); // this is the slow and memory-intensive function
});
This way, whenever a specific folder has been GET'd before, it will already have a handle ready for me to use.
Why it's not good
So now I run the risk of running out of memory if too many folders are cached simultaneously. How might I add a GC-like background process to TryRemove() and call IHandle.Dispose() on old handles, perhaps with a Least Recently Used or Least Frequently Used policy? Ideally it should trigger only when available physical memory is low.
I have tried adding the following statement in the GET function, but it seems too hacky and is very limited in function. This way works OK only if I always want handles to expire after 10 seconds, and it does not restart the timer if a subsequent request comes in within 10 seconds.
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
    System.Threading.Thread.Sleep(10000);
    if (handles.TryRemove(dir.Name, out var handle2))
        handle2.Dispose();
});
What this question is not
I don't think caching the output is the solution here. After I return the result of this GET request (it's just the metadata of the folder contents), there might be another GET request for more in-depth data, which requires calling Handle's methods.
I hope my question is clear enough!
Handles closing on low memory.
ConcurrentQueue<(string, IHandle)> handles = new ConcurrentQueue<(string, IHandle)>();

void CheckMemory_OptionallyReleaseOldHandles()
{
    var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
    while (performance.NextValue() <= YOUR_THRESHOLD)
    {
        // Oldest handle first; stop if the queue empties before memory recovers.
        if (!handles.TryDequeue(out ValueTuple<string, IHandle> value))
        {
            break;
        }
        value.Item2.Dispose();
    }
}
Your Get method.
IHandle GetHandle()
{
    IHandle theHandle = handles.FirstOrDefault(v => v.Item1 == dir.Name).Item2;
    if (theHandle == null)
    {
        theHandle = new Handle(dir.Name);
        handles.Enqueue((dir.Name, theHandle));
    }
    return theHandle;
}
Your background task.
void SetupMemoryCheck()
{
    Action<CancellationToken> BeCheckingTheMemory = ct =>
    {
        for (;;)
        {
            if (ct.IsCancellationRequested)
            {
                break;
            }
            CheckMemory_OptionallyReleaseOldHandles();
            Thread.Sleep(500);
        }
    };
    HostingEnvironment.QueueBackgroundWorkItem(ct =>
    {
        var tf = new TaskFactory(ct, TaskCreationOptions.LongRunning, TaskContinuationOptions.None, TaskScheduler.Current);
        tf.StartNew(() => BeCheckingTheMemory(ct));
    });
}
I assume the collection will have few elements, so there is no need for a dictionary.
I didn't catch your LRU/LFU requirement the first time. Here is a hybrid LRU/LFU cache model you can check.
Handles closing on low memory.
/*
 * string – handle name,
 * IHandle – the handle,
 * int – hit count.
 */
ConcurrentDictionary<string, (IHandle, int)> handles = new ConcurrentDictionary<string, (IHandle, int)>();

void FreeResources()
{
    if (handles.Count == 0)
    {
        return;
    }
    var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
    while (performance.NextValue() <= YOUR_THRESHOLD)
    {
        if (handles.IsEmpty)
        {
            break; // nothing left to evict
        }
        // Scan the first half of the entries and evict the one with the lowest hit count.
        int maxIndex = (int)Math.Ceiling(handles.Count / 2.0d);
        KeyValuePair<string, (IHandle, int)> candidate = handles.First();
        for (int index = 1; index < maxIndex; index++)
        {
            KeyValuePair<string, (IHandle, int)> item = handles.ElementAt(index);
            if (item.Value.Item2 < candidate.Value.Item2)
            {
                candidate = item;
            }
        }
        candidate.Value.Item1.Dispose();
        handles.TryRemove(candidate.Key, out _);
    }
}
Get method.
IHandle GetHandle(Dir dir, int handleOpenAttemps = 1)
{
    if (handles.TryGetValue(dir.Name, out (IHandle, int) handle))
    {
        // The tuple is a value type, so write the incremented hit count back to the
        // dictionary; incrementing only the local copy would be lost.
        handles.AddOrUpdate(dir.Name, handle, (_, existing) => (existing.Item1, existing.Item2 + 1));
    }
    else
    {
        if (new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes").NextValue() < YOUR_THRESHOLD)
        {
            FreeResources();
        }
        try
        {
            handle.Item1 = new Handle(dir.Name);
        }
        catch (OutOfMemoryException)
        {
            if (handleOpenAttemps == 2)
            {
                return null;
            }
            FreeResources();
            return GetHandle(dir, handleOpenAttemps + 1); // + 1, not ++: the post-increment would pass the old value and retry forever
        }
        catch (Exception)
        {
            // Your handling.
        }
        handle.Item2 = 1;
        handles.TryAdd(dir.Name, handle);
    }
    return handle.Item1;
}
Background task.
void SetupMemoryCheck()
{
    Action<CancellationToken> BeCheckingTheMemory = ct =>
    {
        for (;;)
        {
            if (ct.IsCancellationRequested) break;
            FreeResources();
            Thread.Sleep(500);
        }
    };
    HostingEnvironment.QueueBackgroundWorkItem(ct =>
    {
        new Task(() => BeCheckingTheMemory(ct), TaskCreationOptions.LongRunning).Start();
    });
}
If you expect a big collection, the for loop could be optimised.
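For example (a sketch reusing the same handles dictionary and the YOUR_THRESHOLD placeholder), you could snapshot the entries once and evict in ascending hit-count order instead of rescanning half the collection on every pass:
void FreeResourcesLargeCollection()
{
    var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
    // One snapshot, sorted by hit count ascending, so the least-used handles go first.
    foreach (var entry in handles.ToArray().OrderBy(e => e.Value.Item2))
    {
        if (performance.NextValue() > YOUR_THRESHOLD)
        {
            break; // enough memory reclaimed
        }
        if (handles.TryRemove(entry.Key, out (IHandle, int) removed))
        {
            removed.Item1.Dispose();
        }
    }
}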

Processing records speed

I realise this is a non-specific code question, but I suspect the people with answers are on this forum.
I am receiving a large number of records of < 100 bytes each via TCP, at a rate of 10 per millisecond.
I have to parse and process the data, and that takes me 100 microseconds - so I am pretty maxed out.
Does 100 microseconds seem large?
Here is an example of the kind of processing I do with LINQ. It is really convenient - but is it inherently slow?
public void Process()
{
    try
    {
        int ptr = PayloadOffset + 1;
        var cPair = MessageData.GetString(ref ptr, 7);
        var orderID = MessageData.GetString(ref ptr, 15);
        if (Book.CPairs.ContainsKey(cPair))
        {
            var cPairGroup = Book.CPairs[cPair];
            if (cPairGroup.BPrices != null)
            {
                cPairGroup.BPrices.ForEach(x => { x.BOrders.RemoveAll(y => y.OrderID.Equals(orderID)); });
                cPairGroup.BPrices.RemoveAll(x => x.BOrders.Count == 0);
            }
        }
    }
    catch
    {
        // catch block elided in the original post
    }
}
public class BOrderGroup
{
    public double Amount;
    public string OrderID;
}

public class BPriceGroup
{
    public double BPrice;
    public List<BOrderGroup> BOrders;
}

public class CPairGroup
{
    public List<BPriceGroup> BPrices;
}

public static Dictionary<string, CPairGroup> CPairs;
As others have mentioned, LINQ is not inherently slow. But it can be slower than equivalent non-LINQ code (this is why the Roslyn team has an "Avoid LINQ" guide in its coding conventions).
If this is your hot path and you need every microsecond, then you should probably implement the logic like this:
public void Process()
{
    try
    {
        int ptr = PayloadOffset + 1;
        var cPair = MessageData.GetString(ref ptr, 7);
        var orderID = MessageData.GetString(ref ptr, 15);
        if (Book.CPairs.TryGetValue(cPair, out CPairGroup cPairGroup) && cPairGroup != null)
        {
            for (int i = cPairGroup.BPrices.Count - 1; i >= 0; i--)
            {
                var x = cPairGroup.BPrices[i];
                for (int j = x.BOrders.Count - 1; j >= 0; j--)
                {
                    var y = x.BOrders[j];
                    if (y.OrderID.Equals(orderID))
                    {
                        x.BOrders.RemoveAt(j);
                    }
                }
                if (x.BOrders.Count == 0)
                {
                    cPairGroup.BPrices.RemoveAt(i);
                }
            }
        }
    }
    catch
    {
        // catch block elided, as in the question
    }
}
Main points:
Avoid double dictionary lookup by using TryGetValue
Single iteration over cPairGroup.BPrices
In place modification of structures by iterating backwards
This code should not contain any additional heap allocations
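On the backwards-iteration point: removing by index while walking forward skips the element that slides into the freed slot, whereas walking backwards only shifts indices you have already visited. A tiny illustration (mine, not from the question):
var orders = new List<int> { 1, 2, 2, 3 };
for (int i = orders.Count - 1; i >= 0; i--)
{
    if (orders[i] == 2)
    {
        orders.RemoveAt(i); // safe: only indices above i shift, and those are already processed
    }
}
// orders is now { 1, 3 }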

LibGit2Sharp implementation of showbranch independent

I'm trying to use LibGit2Sharp to recreate the functionality of git show-branch --independent which, according to the docs, does this: Among the <reference>s given, display only the ones that cannot be reached from any other <reference>.
My best attempt so far is the following:
List<Commit> GetIndependent(IRepository repo, IEnumerable<Commit> commits)
{
    var indep = new List<Commit>();
    foreach (var commit in commits)
    {
        if (repo.Commits.QueryBy(new CommitFilter
        {
            FirstParentOnly = false,
            IncludeReachableFrom = commit,
            ExcludeReachableFrom = commits.Where(x => x.Equals(commit) == false)
        }).Any())
        {
            indep.Add(commit);
        }
    }
    return indep;
}
Unfortunately, this becomes astronomically slow as the amount of history increases. It's actually much faster for me to exec git directly, parse the output, and have LibGit2Sharp lookup the resulting SHAs than to use the above code. I assume this has to do with some optimization that Git has but LibGit2 does not. Is this even doing what I want? If so, is there a better way to achieve this in LibGit2Sharp?
I finally found a better way that utilizes merge bases, thanks to this question pointing me in the right direction.
Here's the new code:
/// <summary>
/// Implementation of `git show-branch --independent`
///
/// "Among the <reference>s given, display only the ones that cannot be reached from any other <reference>"
/// </summary>
/// <param name="commitsToCheck"></param>
/// <returns></returns>
private List<Commit> GetIndependent(IRepository repo, IEnumerable<Commit> commitsToCheck)
{
    var commitList = commitsToCheck.ToList();
    for (var i = commitList.Count - 1; i > 0; --i)
    {
        var first = commitList[i];
        for (var j = commitList.Count - 1; j >= 0; --j)
        {
            if (i == j) continue;
            var second = commitList[j];
            var mergeBase = repo.ObjectDatabase.FindMergeBase(first, second);
            if (first.Equals(mergeBase))
            {
                // First commit (i) is reachable from second (j), so drop i
                commitList.RemoveAt(i);
                // No reason to check anything else against this commit
                j = -1;
            }
            else if (second.Equals(mergeBase))
            {
                // Second (j) is reachable from first, so drop j
                commitList.RemoveAt(j);
                // If this was at a lower index than i, decrement i since we shifted one down
                if (j < i)
                {
                    --i;
                }
            }
        }
    }
    return commitList;
}
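For reference, calling it over the local branch tips might look like this (a sketch; the repository path is a placeholder):
using (var repo = new Repository("/path/to/repo"))
{
    var tips = repo.Branches
        .Where(b => !b.IsRemote)
        .Select(b => b.Tip)
        .ToList();
    foreach (var commit in GetIndependent(repo, tips))
    {
        Console.WriteLine(commit.Sha);
    }
}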

Using Parallel Linq Extensions to union two sequences, how can one yield the fastest results first?

Let's say I have two sequences returning integers 1 to 5.
The first returns 1, 2 and 3 very fast, but 4 and 5 take 200ms each.
public static IEnumerable<int> FastFirst()
{
    for (int i = 1; i < 6; i++)
    {
        if (i > 3) Thread.Sleep(200);
        yield return i;
    }
}
}
The second returns 1, 2 and 3 with a 200ms delay, but 4 and 5 are returned fast.
public static IEnumerable<int> SlowFirst()
{
    for (int i = 1; i < 6; i++)
    {
        if (i < 4) Thread.Sleep(200);
        yield return i;
    }
}
Unioning both of these sequences gives me just the numbers 1 to 5.
FastFirst().Union(SlowFirst());
I cannot guarantee which of the two methods has delays at what point, so the order of execution cannot guarantee a solution for me. Therefore, I would like to parallelise the union in order to minimise the (artificial) delay in my example.
A real-world scenario: I have a cache that returns some entities, and a datasource that returns all entities. I'd like to be able to return an iterator from a method that internally parallelises the request to both the cache and the datasource so that the cached results yield as fast as possible.
Note 1: I realise this is still wasting CPU cycles; I'm not asking how I can prevent the sequences from iterating over their slow elements, just how I can union them as fast as possible.
Update 1: I've tailored achitaka-san's great response to accept multiple producers, and to use ContinueWhenAll to set the BlockingCollection's CompleteAdding just once. I'm putting it here since it would get lost in the comments due to formatting limits. Any further feedback would be great!
public static IEnumerable<TResult> SelectAsync<TResult>(
    params IEnumerable<TResult>[] producers)
{
    var resultsQueue = new BlockingCollection<TResult>();
    var taskList = new HashSet<Task>();
    foreach (var producer in producers)
    {
        taskList.Add(
            Task.Factory.StartNew(
                () =>
                {
                    foreach (var product in producer)
                    {
                        resultsQueue.Add(product);
                    }
                }));
    }
    Task.Factory.ContinueWhenAll(taskList.ToArray(), x => resultsQueue.CompleteAdding());
    return resultsQueue.GetConsumingEnumerable();
}
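Consuming it with the example sequences could then look like this; Distinct() streams lazily, so unique items are yielded as soon as either producer delivers them:
foreach (var n in SelectAsync(FastFirst(), SlowFirst()).Distinct())
{
    Console.Write("{0} ", n);
}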
Take a look at this.
The first method just returns everything in the order results arrive. The second checks uniqueness. If you chain them, you will get the result you want, I think; a usage sketch follows the code.
public static class Class1
{
    public static IEnumerable<TResult> SelectAsync<TResult>(
        IEnumerable<TResult> producer1,
        IEnumerable<TResult> producer2,
        int capacity)
    {
        var resultsQueue = new BlockingCollection<TResult>(capacity);
        var producer1Done = false;
        var producer2Done = false;
        Task.Factory.StartNew(() =>
        {
            foreach (var product in producer1)
            {
                resultsQueue.Add(product);
            }
            producer1Done = true;
            if (producer1Done && producer2Done) { resultsQueue.CompleteAdding(); }
        });
        Task.Factory.StartNew(() =>
        {
            foreach (var product in producer2)
            {
                resultsQueue.Add(product);
            }
            producer2Done = true;
            if (producer1Done && producer2Done) { resultsQueue.CompleteAdding(); }
        });
        return resultsQueue.GetConsumingEnumerable();
    }

    public static IEnumerable<TResult> SelectAsyncUnique<TResult>(this IEnumerable<TResult> source)
    {
        HashSet<TResult> knownResults = new HashSet<TResult>();
        foreach (TResult result in source)
        {
            if (knownResults.Contains(result)) { continue; }
            knownResults.Add(result);
            yield return result;
        }
    }
}
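Chained together as suggested, the call might look like this (the capacity value is just an arbitrary bound for the blocking collection):
foreach (var n in Class1.SelectAsync(FastFirst(), SlowFirst(), capacity: 10).SelectAsyncUnique())
{
    Console.Write("{0} ", n);
}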
The cache would be nearly instant compared to fetching from the database, so you could read from the cache first and return those items, then read from the database and return the items except those that were found in the cache.
If you try to parallelise this, you will add a lot of complexity but get quite a small gain.
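A minimal sketch of that sequential cache-first idea (hypothetical method and parameter names):
public static IEnumerable<TItem> CacheThenSource<TItem>(
    IEnumerable<TItem> cache, IEnumerable<TItem> source)
{
    var seen = new HashSet<TItem>();
    foreach (var item in cache)
    {
        seen.Add(item);
        yield return item;     // cached items stream out immediately
    }
    foreach (var item in source)
    {
        if (seen.Add(item))
            yield return item; // only items the cache did not already yield
    }
}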
Edit:
If there is no predictable difference in the speed of the sources, you could run them in threads and use a synchronised hash set to keep track of which items you have already got, put the new items in a queue, and let the main thread read from the queue:
public static IEnumerable<TItem> GetParallel<TItem, TKey>(Func<TItem, TKey> getKey, params IEnumerable<TItem>[] sources)
{
    HashSet<TKey> found = new HashSet<TKey>();
    List<TItem> queue = new List<TItem>();
    object sync = new object();
    int alive = 0;
    object aliveSync = new object();
    foreach (IEnumerable<TItem> source in sources)
    {
        lock (aliveSync)
        {
            alive++;
        }
        new Thread(s =>
        {
            foreach (TItem item in s as IEnumerable<TItem>)
            {
                TKey key = getKey(item);
                lock (sync)
                {
                    if (found.Add(key))
                    {
                        queue.Add(item);
                    }
                }
            }
            lock (aliveSync)
            {
                alive--;
            }
        }).Start(source);
    }
    while (true)
    {
        // Check liveness before draining, so items enqueued just before the
        // last producer exited are still returned on the final pass.
        bool done;
        lock (aliveSync)
        {
            done = alive == 0;
        }
        lock (sync)
        {
            if (queue.Count > 0)
            {
                foreach (TItem item in queue)
                {
                    yield return item;
                }
                queue.Clear();
            }
        }
        if (done) break;
        Thread.Sleep(100);
    }
}
Test stream:
public static IEnumerable<int> SlowRandomFeed(Random rnd)
{
    // Inside-out Fisher-Yates shuffle of the values 0..99.
    int[] values = new int[100];
    for (int i = 0; i < 100; i++)
    {
        int pos = rnd.Next(i + 1);
        values[i] = i;
        int temp = values[pos];
        values[pos] = values[i];
        values[i] = temp;
    }
    foreach (int value in values)
    {
        yield return value;
        Thread.Sleep(rnd.Next(200));
    }
}
Test:
Random rnd = new Random();
foreach (int item in GetParallel(n => n, SlowRandomFeed(rnd), SlowRandomFeed(rnd), SlowRandomFeed(rnd), SlowRandomFeed(rnd)))
{
    Console.Write("{0:0000 }", item);
}
