I would like to make the following code thread-safe. Unfortunately, I have tried locking at various levels within this code with no success. The only way I seem to be able to achieve thread safety is to place a lock around the entire loop, which effectively makes the Parallel.ForEach no faster (and possibly even slower) than just using foreach. With no locking the code is almost, but not quite, safe: about once in every 20 or so executions the sums stored under the [-1] keys of the geneTokens and gtCandidates values show slight variations.
I realize that Dictionary is not thread-safe. However, I cannot change that particular object to a ConcurrentDictionary without taking a major performance hit downstream; I would rather run this part of the code with a regular foreach than change it. I am, however, using a ConcurrentDictionary to hold the individual Dictionary objects. I have also tried making that change, and it does not solve my race issue.
Here are my Class level variables:
//Holds all tokens derived from each sequence chunk
public static ConcurrentBag<sequenceItem> tokenBag = new ConcurrentBag<sequenceItem>();
public BlockingCollection<sequenceItem> sequenceTokens = new BlockingCollection<sequenceItem>(tokenBag);
public ConcurrentDictionary<string, int> categories = new ConcurrentDictionary<string, int>();
public ConcurrentDictionary<int, Dictionary<int, int>> gtStartingFrequencies = new ConcurrentDictionary<int, Dictionary<int, int>>();
public ConcurrentDictionary<string, Dictionary<int, int>> gtCandidates = new ConcurrentDictionary<string, Dictionary<int, int>>();
public ConcurrentDictionary<string, Dictionary<int, int>> geneTokens = new ConcurrentDictionary<string, Dictionary<int, int>>();
Here is the Parallel.ForEach:
Parallel.ForEach(sequenceTokens.GetConsumingEnumerable(), seqToken =>
{
lock (locker)
{
//Check to see if the Sequence Token is a Gene Token
Dictionary<int, int> geneTokenFreqs;
if (geneTokens.TryGetValue(seqToken.text, out geneTokenFreqs))
{ //The Sequence Token is a Gene Token
*****************Race Issue Seems To Occur Here****************************
//Increment or create category frequencies for each category provided
int frequency;
foreach (int category in seqToken.categories)
{
if (geneTokenFreqs.TryGetValue(category, out frequency))
{ //increment the category frequency, if it already exists
frequency++;
geneTokenFreqs[category] = frequency;
}
else
{ //Create the category frequency, if it does not exist
geneTokenFreqs.Add(category, 1);
}
}
//Update the frequencies total [-1] by the total # of categories incremented.
geneTokenFreqs[-1] += seqToken.categories.Length;
******************************************************************************
}
else
{ //The Sequence Token is NOT yet a Gene Token
//Check to see if the Sequence Token is a Gene Token Candidate yet
Dictionary<int, int> candidateTokenFreqs;
if (gtCandidates.TryGetValue(seqToken.text, out candidateTokenFreqs))
{
*****************Race Issue Seems To Occur Here****************************
//Increment or create category frequencies for each category provided
int frequency;
foreach (int category in seqToken.categories)
{
if (candidateTokenFreqs.TryGetValue(category, out frequency))
{ //increment the category frequency, if it already exists
frequency++;
candidateTokenFreqs[category] = frequency;
}
else
{ //Create the category frequency, if it does not exist
candidateTokenFreqs.Add(category, 1);
}
}
//Update the frequencies total [-1] by the total # of categories incremented.
candidateTokenFreqs[-1] += seqToken.categories.Length;
*****************************************************************************
//Only update the candidate sequence count once per sequence
if (candidateTokenFreqs[-3] != seqToken.sequenceId)
{
candidateTokenFreqs[-3] = seqToken.sequenceId;
candidateTokenFreqs[-2]++;
//Promote the Token Candidate to a Gene Token, if it has been found >=
//the user defined candidateThreshold
if (candidateTokenFreqs[-2] >= candidateThreshold)
{
Dictionary<int, int> deletedCandidate;
gtCandidates.TryRemove(seqToken.text, out deletedCandidate);
geneTokens.TryAdd(seqToken.text, candidateTokenFreqs);
}
}
}
else
{
//create a new token candidate frequencies dictionary by making
//a copy of the default starting frequencies from gtStartingFrequencies
gtCandidates.TryAdd(seqToken.text, new
Dictionary<int, int>(gtStartingFrequencies[seqToken.sequenceId]));
}
}
}
});
Clearly one data race comes from the fact that some threads will be adding items here:
geneTokens.TryAdd(seqToken.text, candidateTokenFreqs);
and others will be reading here:
if (geneTokens.TryGetValue(seqToken.text, out geneTokenFreqs))
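For reference, here is a minimal per-entry-locking sketch of just the candidate-update path that I could try instead of the one global lock. It reuses seqToken, gtCandidates and gtStartingFrequencies from above, and it does not by itself make the promotion into geneTokens race-free:
//Sketch only: take the lock on the inner Dictionary instance instead of one global lock
Dictionary<int, int> freqs = gtCandidates.GetOrAdd(
    seqToken.text,
    _ => new Dictionary<int, int>(gtStartingFrequencies[seqToken.sequenceId]));
lock (freqs)
{
    int frequency;
    foreach (int category in seqToken.categories)
    {
        freqs[category] = freqs.TryGetValue(category, out frequency) ? frequency + 1 : 1;
    }
    freqs[-1] += seqToken.categories.Length;
}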
How I have used ConcurrentDictionary in my project:
I put a flag in the dictionary and check from another thread whether the flag is there or not. If the flag is present, I do my task accordingly.
To do this I:
1) Declare the ConcurrentDictionary
2) Add the flag using the TryAdd method
3) Retrieve the flag using the TryGetValue method.
1) Declaring
Dim cd As ConcurrentDictionary(Of Integer, [String]) = New ConcurrentDictionary(Of Integer, String)()
2) Adding
If cd.TryAdd(1, "uno") Then
Console.WriteLine("CD.TryAdd() succeeded when it should have failed")
numFailures += 1
End If
3) Retrieving
If cd.TryGetValue(1, "uno") Then
Console.WriteLine("CD.TryAdd() succeeded when it should have failed")
numFailures += 1
End If
Related
If I have the following code:
var dictionary = new ConcurrentDictionary<int, HashSet<string>>();
foreach (var user in users)
{
if (!dictionary.ContainsKey(user.GroupId))
{
dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
dictionary[user.GroupId].Add(user.Id.ToString());
}
Is the act of adding an item into the HashSet inherently thread safe because HashSet is a value property of the concurrent dictionary?
No. Putting a container in a thread-safe container does not make the inner container thread safe.
dictionary[user.GroupId].Add(user.Id.ToString());
is calling HashSet's add after retrieving it from the ConcurrentDictionary. If this GroupId is looked up from two threads at once this would break your code with strange failure modes. I saw the result of one of my teammates making the mistake of not locking his sets, and it wasn't pretty.
This is a plausible solution. I'd do something different myself but this is closer to your code.
if (!dictionary.ContainsKey(user.GroupId))
{
dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
var groups = dictionary[user.GroupId];
lock(groups)
{
groups.Add(user.Id.ToString());
}
No, the collection (the dictionary itself) is thread-safe, not whatever you put in it. You have a couple of options:
Use AddOrUpdate, as @TheGeneral mentioned:
dictionary.AddOrUpdate(
    user.GroupId,
    new HashSet<string> { user.Id.ToString() },
    (k, v) => { v.Add(user.Id.ToString()); return v; });
Use a concurrent collection, like the ConcurrentBag<T>:
ConcurrentDictionary<int, ConcurrentBag<string>>
Whenever you are building up the dictionary, as in your code, you are better off accessing it as little as possible. Think of something like this:
var dictionary = new ConcurrentDictionary<int, ConcurrentBag<string>>();
var grouppedUsers = users.GroupBy(u => u.GroupId);
foreach (var group in grouppedUsers)
{
// get the bag from the dictionary or create it if it doesn't exist
var currentBag = dictionary.GetOrAdd(group.Key, new ConcurrentBag<string>());
// load it with the users required
foreach (var user in group)
{
if (!currentBag.Contains(user.Id.ToString()))
{
currentBag.Add(user.Id.ToString());
}
}
}
If you actually want a built-in concurrent HashSet-like collection, you'd need to use ConcurrentDictionary<int, ConcurrentDictionary<string, string>>, and care either about the key or the value from the inner one.
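For example, a minimal sketch of that approach, using the users collection from the question and treating the inner dictionary purely as a set, might look like this (the byte value is a throwaway placeholder):
var groups = new ConcurrentDictionary<int, ConcurrentDictionary<string, byte>>();
Parallel.ForEach(users, user =>
{
    var set = groups.GetOrAdd(user.GroupId, _ => new ConcurrentDictionary<string, byte>());
    set.TryAdd(user.Id.ToString(), 0); // duplicate ids are simply ignored
});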
I've created this normal for loop:
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
Dictionary<string, Dictionary<string, bool>> filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
foreach (var item in files)
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
}
return filesAnalyzed;
}
The loop just checks if each file that is in the variable "files" has all the dependencies specified in the variable "dependencies".
the "files" variable should only have unique elements because it is used as the key for the result, a dictionary, but I check this before calling the method.
The for loop works correctly and all elements are processed in single thread, so I wanted to increase the performance by changing to a parallel for loop, the problem is that not all the elements that come from the "files" variable are being processed in the parallel for (in my test case I get 30 elements instead of 53).
I've tried to increase the timespan, or to remove all the "Monitor.TryEnter" code and use just a lock(filesAnalyzed) but still got the same result
I'm not very familiar with the paraller for, so it might be something in the syntax that I'm using.
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
Parallel.For<KeyValuePair<string, Dictionary<string, bool>>>(
//start index
0,
//end index
files.Count(),
// initialization?
()=>new KeyValuePair<string, Dictionary<string, bool>>(),
(index, loop, result) =>
{
var temp = new KeyValuePair<string, Dictionary<string, bool>>(
files.ElementAt(index),
AnalyzeFile(files.ElementAt(index), dependencies));
return temp;
}
,
//finally
(x) =>
{
if (Monitor.TryEnter(filesAnalyzed, new TimeSpan(0, 0, 30)))
{
try
{
filesAnalyzed.Add(x.Key, x.Value);
}
finally
{
Monitor.Exit(filesAnalyzed);
}
}
}
);
return filesAnalyzed;
}
any feedback is appreciated
Assuming the code inside AnalyzeFile and dependencies is thread safe, how about something like this:
var filesAnalyzed = files
    .AsParallel()
    .Select(x => new { Item = x, File = AnalyzeFile(x, dependencies) })
    .ToDictionary(x => x.Item, x => x.File);
Rewrite your normal loop this way:
Parallel.ForEach(files, item =>
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
You should also use a ConcurrentDictionary instead of a Dictionary to make the whole process thread-safe.
You can simplify your code a lot if you use Parallel LINQ instead :
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = (from item in files.AsParallel()
                     let result = AnalyzeFile(item, dependencies)
                     select (Item: item, Result: result)
                    ).ToDictionary(it => it.Item, it => it.Result);
return filesAnalyzed;
}
I used tuple syntax in this case to avoid noise. It also cuts down on allocations.
Using method syntax, the same can be written as :
var filesAnalyzed = files.AsParallel()
                         .Select(item => (Item: item, Result: AnalyzeFile(item, dependencies)))
                         .ToDictionary(it => it.Item, it => it.Result);
Dictionary<> isn't thread-safe for modification. If you wanted to use Parallel.ForEach without locking, you'd have to use ConcurrentDictionary
var filesAnalyzed = new ConcurrentDictionary<string, Dictionary<string, bool>>();
Parallel.ForEach(files, file => {
    filesAnalyzed[file] = AnalyzeFile(file, dependencies);
});
In this case at least, there is no benefit in using Parallel over PLINQ.
Hard to say exactly what is going wrong without debugging the code. Just looking at it, though, I would have used a ConcurrentDictionary for the filesAnalyzed variable instead of a normal Dictionary and gotten rid of the Monitor.
I would also check whether the same key already exists in filesAnalyzed; it could be that you are trying to add a KeyValuePair with a key that has already been added to the dictionary.
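For instance, a minimal sketch of that suggestion (assuming AnalyzeFile is thread-safe) could look like this, with TryAdd turning a duplicate key into a no-op instead of an exception:
var filesAnalyzed = new ConcurrentDictionary<string, Dictionary<string, bool>>();
Parallel.ForEach(files, file =>
{
    // TryAdd silently skips a file that has already been added rather than throwing
    filesAnalyzed.TryAdd(file, AnalyzeFile(file, dependencies));
});
return filesAnalyzed.ToDictionary(kv => kv.Key, kv => kv.Value);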
I have a scenario where I have to keep reference counted object for given key in the ConcurrentDictionary, if reference count reaches 0, I want to delete the key. This has to be thread safe hence I am planning to use the ConcurrentDictionary.
Sample program as follows. In the concurrent dictionary, I have key and value , the value is KeyValuePair which holds my custom object and reference count.
ConcurrentDictionary<string, KeyValuePair<object, int>> ccd =
new ConcurrentDictionary<string, KeyValuePair<object, int>>();
// following code adds the key, if not exists with reference
// count for my custom object to 1
// if the key already exists it increments the reference count
var addOrUpdateValue = ccd.AddOrUpdate("mykey",
new KeyValuePair<object, int>(new object(), 1),
(k, pair) => new KeyValuePair<object, int>(pair.Key, pair.Value + 1));
Now I want a way to remove the key when the reference count reaches 0. I was thinking of a remove method on ConcurrentDictionary that takes a key and a predicate and removes the key if the predicate returns true. Example:
ConcurrentDictionary.remove(TKey, Predicate<TValue> ).
There is no such method on ConcurrentDictionary; the question is how to do the same in a thread-safe way.
.NET doesn't expose a RemoveIf directly, but it does expose the building blocks necessary to make it work without doing your own locking.
ConcurrentDictionary implements ICollection<T>, which has a Remove that takes and tests for a full KeyValuePair instead of just a key. Despite being hidden, this Remove is still thread-safe and we'll use it to implement this. One caveat for this to work is that Remove uses EqualityComparer<T>.Default to test the value, so it must be equality comparable. Your current one is not, so we'll re-implement that as such:
struct ObjectCount : IEquatable<ObjectCount>
{
public object Object { get; }
public int Count { get; }
public ObjectCount(object o, int c)
{
Object = o;
Count = c;
}
public bool Equals(ObjectCount o) =>
object.Equals(Object, o.Object) && Count == o.Count;
public override bool Equals(object o) =>
(o as ObjectCount?)?.Equals(this) == true;
// this hash combining will work but you can do better.
// it is not actually used by any of this code.
public override int GetHashCode() =>
(Object?.GetHashCode() ?? 0) ^ Count.GetHashCode();
}
And finally, we'll define a method to increment/decrement counts from your dictionary:
void UpdateCounts(ConcurrentDictionary<string, ObjectCount> dict, string key, int toAdd)
{
var addOrUpdateValue = dict.AddOrUpdate(key,
new ObjectCount(new object(), 1),
(k, pair) => new ObjectCount(pair.Object, pair.Count + toAdd));
if(addOrUpdateValue.Count == 0)
{
((ICollection<KeyValuePair<string, ObjectCount>>)dict).Remove(
new KeyValuePair<string, ObjectCount>(key, addOrUpdateValue));
}
}
The value for that key might be changed between the calls of AddOrUpdate and Remove, but that doesn't matter to us: because Remove tests the full KeyValuePair, it will only remove it if the value hasn't changed since the update.
This is the common lock-free pattern of setting up a change and then using a final thread-safe op to safely "commit" the change only if our data structure hasn't been updated in the meantime.
You can't use a ConcurrentDictionary for this, because it does not expose its internal locking. Your increment must occur under the same lock that controls the add (a simple interlocked add is not enough, as a concurrent thread may remove the object before you increment the count). Similarly, the decrement must acquire the lock to be able to safely remove the entry if the count reaches 0. What this spells out is that you must use a dictionary for which you control the locking explicitly.
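As a rough illustration of that approach, a sketch with a plain Dictionary guarded by a single lock might look like this; the type and member names are purely illustrative, not from the question:
class RefCountStore
{
    private readonly Dictionary<string, KeyValuePair<object, int>> _items =
        new Dictionary<string, KeyValuePair<object, int>>();
    private readonly object _gate = new object();

    public void AddRef(string key)
    {
        lock (_gate)
        {
            KeyValuePair<object, int> entry;
            _items[key] = _items.TryGetValue(key, out entry)
                ? new KeyValuePair<object, int>(entry.Key, entry.Value + 1)
                : new KeyValuePair<object, int>(new object(), 1);
        }
    }

    public void Release(string key)
    {
        lock (_gate)
        {
            KeyValuePair<object, int> entry;
            if (!_items.TryGetValue(key, out entry)) return;
            if (entry.Value <= 1) _items.Remove(key); // last reference: drop the key
            else _items[key] = new KeyValuePair<object, int>(entry.Key, entry.Value - 1);
        }
    }
}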
I had a similar issue: in a multi-threaded piece of code I needed to count the number of times I accessed a certain type of resource. In other words, I needed to find the distribution of accesses across different resource types.
The way I've solved it:
Create a store for your counts:
ConcurrentDictionary<string, StrongBox<int>> _counts = new ConcurrentDictionary<string, StrongBox<int>>();
When the resource is accessed, increment access count:
Interlocked.Increment(ref _counts.GetOrAdd(_resourceType, new StrongBox<int>(0)).Value);
In your case you'll have to take care of decrement as well.
I know it's not a full solution to the problem you've presented, and it's not a direct answer to it, but I hope it can be useful to someone.
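For completeness, the decrement side could look like the sketch below. Note that removing the key when the count hits zero still needs one of the compare-and-remove techniques from the other answers, since Interlocked alone cannot make decrement-then-remove atomic:
StrongBox<int> box;
if (_counts.TryGetValue(_resourceType, out box))
{
    // StrongBox<int>.Value is a public field, so it can be passed by ref
    Interlocked.Decrement(ref box.Value);
}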
Currently (.NET 6) the ConcurrentDictionary<K,V> class has no API that allows updating or removing a key based on a user-supplied delegate. This functionality can be achieved, laboriously, by using the methods TryGetValue, TryUpdate and TryRemove in a loop:
string key = "mykey";
while (true)
{
if (!ccd.TryGetValue(key, out KeyValuePair<object, int> existingValue))
break; // The key was not found
// Create new value based on the existing value
KeyValuePair<object, int> newValue = KeyValuePair
.Create(existingValue.Key, existingValue.Value - 1);
if (newValue.Value > 0)
{
if (ccd.TryUpdate(key, newValue, existingValue))
break; // The key was updated successfully
}
else
{
if (ccd.TryRemove(KeyValuePair.Create(key, existingValue)))
break; // The key was removed successfully
}
// We lost the race to either TryUpdate or TryRemove. Try again.
}
In case there is no contention, the loop will exit after a single iteration.
I have made a proposal on GitHub, for an API TryUpdateOrRemove that would fill this void. In case this proposal is accepted, the above code could be reduced to this:
ccd.TryUpdateOrRemove(key, (_, existingValue) =>
{
KeyValuePair<object, int> newValue = KeyValuePair
.Create(existingValue.Key, existingValue.Value - 1);
if (newValue.Value > 0) return (UpdateRemoveResult.Update, newValue);
return (UpdateRemoveResult.Remove, default);
});
If you like this proposal, make sure to give it an upvote on GitHub. Not only is it less code, it should also be more efficient, because the key would be hashed only once.
This will give you a dictionary that tracks the count of an item while it is non-zero and removes the entry when it reaches 0. The increment and decrement are fairly straightforward. RemoveEmptyNode looks odd, but it preserves an accurate count even if adds and removes come in out of order. The decrement's initial value of -1 is, again, there to handle calls arriving out of order.
Concurrent programming is weird sometimes.
private void Increment(string key)
{
var result = ccd.AddOrUpdate(key,
    new KeyValuePair<object, int>(new object(), 1),
    (k, pair) => new KeyValuePair<object, int>(pair.Key, pair.Value + 1));
RemoveEmptyNode(key, result);
}
private void Decrement(string key)
{
var result = ccd.AddOrUpdate(key,
    new KeyValuePair<object, int>(new object(), -1),
    (k, pair) => new KeyValuePair<object, int>(pair.Key, pair.Value - 1));
RemoveEmptyNode(key, result);
}
private void RemoveEmptyNode(string key, KeyValuePair<object, int> result)
{
if (result.Value == 0)
{
KeyValuePair<object, int> removedKeyValuePair;
if (ccd.TryRemove(key, out removedKeyValuePair))
{
if (removedKeyValuePair.Value != 0)
{
ccd.AddOrUpdate(key, removedKeyValuePair,
(k, pair) => new KeyValuePair<object, int>(pair.Key, pair.Value + removedKeyValuePair.Value));
}
}
}
}
Right now I have this complex function. I have a list of MainObjects (call them MO), and for each of these objects I have to loop over a list of objects (call them C) with a title, a status and a list of sub-objects (call them E). The function loops over these sub-objects (E) and uses their title and quantity properties.
The goal of the function is to create a dictionary (D1) whose key is a C(title) and whose value is another dictionary (D2), whose key is E(title) and whose value is yet another dictionary (D3), whose key is C(status) and whose value is E(quantity).
So in the end I will have all (unique) C(title)s, within which I can see all (unique) E(title)s, within which I can see all the different C(status)es and the E(quantity) of those statuses (with the extra challenge that if two E(quantity)s have the same status with the same title on the same course, they should be added together and that sum stored as the value).
I made this all work fine.
However. The function is big and hard to understand, so I'm looking for a more approachable way of dealing with this problem.
One of these ways was supposed to be LINQ. However, I have little to no knowledge of it, and for a function as massively complex as this I can hardly see how to approach it in LINQ.
I'm also concerned about performance, since this WPF project depends heavily on a good user experience. So I'm not sure whether LINQ would actually make things faster, slower, or about the same.
Here is where you guys come in.
Is LINQ a better way to deal with this problem?
Is the performance similar to the one of my function?
Is the LINQ query more understandable?
Is there an alternative way of dealing with this complex function other than the two methods I'm describing?
Below you will find the function I wrote to deal with this problem my way.
It is done in 3 steps:
Step 1: Loop the MO's, C's and E's and create a list of dictionaries.
Step 2: Join the duplicate keys of the result of step 1 and create a first-stage dictionary.
Step 3: Split the deeper dictionaries so that we can use the E object as intended.
Result: stored in the 'final' object. It is a dictionary with C(title) as keys and, as values, a list of dictionaries. Those dictionaries have E(title) as keys and a Dictionary as values. That Dictionary has C(status) as keys and E(quantity) as values, where each E(quantity) is the combined quantity of every E with the same C(status) for the same C.
//DateTime start = DateTime.Now; //start performance test
//start -> step 1
List<Dictionary<string/*C(title)*/, Dictionary<int/*C(status)*/, List<E>>>> firstResultList = new List<Dictionary<string, Dictionary<int, List<E>>>>();
foreach(MO mo in listOfMOs)
{
foreach (C c in mo.listOfCs)
{
Dictionary<string, Dictionary<int, List<E>>> D1 = new Dictionary<string, Dictionary<int, List<E>>>();
int cStatus = c.status;
Dictionary<int, List<E>> D2 = new Dictionary<int, List<E>>();
List<E> eList = new List<E>();
foreach (E e in c.listOfEs)
{
eList.Add(e);
}
D2.Add(cStatus, eList);
D1.Add(c.Title, D2);
firstResultList.Add(D1);
}
}
//firstResultList = step1 results
//Console.WriteLine(firstResultList.ToString());
//
//step1 -> step2
Dictionary<string/*C(title)*/, List<Dictionary<int/*C(status)*/, List<E>>>> groupedDict = new Dictionary<string, List<Dictionary<int, List<E>>>>();
foreach (Dictionary<string, Dictionary<int, List<E>>> dict in firstResultList)
{
List<Dictionary<int, List<E>>> listje;
if(groupedDict.ContainsKey(dict.Keys.ElementAt(0)))
{
listje = groupedDict[dict.Keys.ElementAt(0)];
}
else
{
listje = new List<Dictionary<int, List<E>>>();
}
listje.Add(dict[dict.Keys.ElementAt(0)]);
groupedDict[dict.Keys.ElementAt(0)] = listje;
}
//groupedDict = step2 results
//Console.WriteLine(groupedDict.ToString());
//
//step2 -> step3
Dictionary<string/*C(title)*/, List<Dictionary<string/*E(title)*/, Dictionary<int/*C(status)*/, int/*E(quantity)*/>>>> final = new Dictionary<string, List<Dictionary<string, Dictionary<int, int>>>>();
int index = 0;
foreach (List<Dictionary<int, List<E>>> list in groupedDict.Values)
{
//Within one unique C
List<Dictionary<string, Dictionary<int, int>>> eStatusQuantityList = new List<Dictionary<string, Dictionary<int, int>>>();
foreach (Dictionary<int, List<E>> dict in list)
{
foreach (List<E> eList in dict.Values)
{
foreach(E e in eList)
{
if (eStatusQuantityList.Count > 0)
{
foreach (Dictionary<string, Dictionary<int, int>> dict2 in eStatusQuantityList)
{
Dictionary<int, int> statusQuantityDict;
if (dict2.ContainsKey(e.Title))
{
statusQuantityDict = dict2[e.Title];
//int quantity = statusQuantityDict.value//statusQuantityDict[dict.Keys.ElementAt(0)];
int quantity = 0;
int value;
bool hasValue = statusQuantityDict.TryGetValue(dict.Keys.ElementAt(0), out value);
if (hasValue) {
quantity = value;
} else {
// do something when the value is not there
}
statusQuantityDict[dict.Keys.ElementAt(0)] = quantity + e.Quantity;
dict2[e.Title] = statusQuantityDict;
}
else
{
statusQuantityDict = new Dictionary<int, int>();
statusQuantityDict.Add(dict.Keys.ElementAt(0), e.Quantity);
dict2.Add(e.Title, statusQuantityDict);
}
}
}
else
{
Dictionary<string, Dictionary<int, int>> test = new Dictionary<string, Dictionary<int, int>>();
Dictionary<int, int> test2 = new Dictionary<int, int>();
test2.Add(dict.Keys.ElementAt(0), e.Quantity);
test.Add(e.Title, test2);
eStatusQuantityList.Add(test);
}
}
}
}
//ending
string key = groupedDict.Keys.ElementAt(index);
final[key] = eStatusQuantityList;
index++;
//
}
//final contains step3 results
//Console.WriteLine(final.ToString());
/*
for (int i = 0; i<final.Keys.Count; i++)
{
Console.WriteLine(final.Keys.ElementAt(i));
}
for (int i = 0; i < final.Values.Count; i++)
{
Console.WriteLine(final.Values.ElementAt(i));
}
*/
//
//TimeSpan duration = DateTime.Now - start; //end performance test
//Console.WriteLine("That took " + duration.TotalMilliseconds + " ms"); //performance test results //60.006 is fine, 600.006 is OOM. //Our range of objects is max. 300 MO's though
As you can see this is a hell of a function, but it works fine (2-5 ms, avg. 2.5, for our maximum target of MO's). However, I can see people (other than myself) messing up when they have to adjust this function for some reason, so any improvement in maintainability or readability would be welcome.
Is LINQ a better way to deal with this problem?
Better is subjective. Better looking? Better performance? Better (as in easier) understanding?
Is the performance similar to the one of my function?
LINQ performance is usually not quite as good as doing it manually; however, there is a trade-off, because LINQ can be (though not always) easier to understand.
Is the LINQ query more understandable?
It can be. But if you have ever let ReSharper look at your code and offer to turn it into a LINQ query, you'll know that sometimes that makes it less understandable.
Is there an alternative way of dealing with this complex function other than the two methods I'm describing?
Mix-n-match? You can hand-code performance critical parts and leave the rest in LINQ. But to find the performance critical parts you should use a profiler rather than just guessing.
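For what it's worth, a hedged LINQ sketch of the grouping described in the question might look like the following. It assumes MO exposes listOfCs, C exposes Title, status and listOfEs, and E exposes Title and Quantity, and it produces the D1/D2/D3 shape described in the question rather than the list-of-dictionaries shape of the original code:
// Dictionary<string, Dictionary<string, Dictionary<int, int>>>:
// C(title) -> E(title) -> C(status) -> summed E(quantity)
var final = listOfMOs
    .SelectMany(mo => mo.listOfCs)
    .GroupBy(c => c.Title)
    .ToDictionary(
        byTitle => byTitle.Key,
        byTitle => byTitle
            .SelectMany(c => c.listOfEs, (c, e) => new { c.status, e.Title, e.Quantity })
            .GroupBy(x => x.Title)
            .ToDictionary(
                byE => byE.Key,
                byE => byE.GroupBy(x => x.status)
                          .ToDictionary(byStatus => byStatus.Key,
                                        byStatus => byStatus.Sum(x => x.Quantity))));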
I want to implement an LRU cache where the least recently used elements will be evicted asynchronously. My current idea is to use a Dictionary to store the <key, value> pairs and, to keep track of the access times of the objects, a SortedDictionary<key, timestamp>. The idea is for the async thread to get the LRU items from the SortedDictionary and remove them from the cache. But for this to work, the SortedDictionary would need to sort by value, which it does not.
I could use a separate SortedList instead of the SortedDictionary to keep the {key, timestamp} pairs sorted on the timestamp, but then I would have to do a linear lookup to find the key in the list (when I have to UPDATE the timestamp because the same key is accessed again). I am looking for something better than linear if possible. Can someone share ideas for dealing with this problem?
So, my problem boils down to this:
I have to look up keys in <= log n time for UPDATING the timestamp, while at the same time being able to get the keys sorted by timestamp.
One way I thought of was to keep a SortedDictionary of <{key, timestamp}, null>, which orders the keys based on the timestamp part of {key, timestamp}. While this is fine, the problem is that GetHashCode() would just have to return key.GetHashCode() (for the lookup while updating the timestamp), while Equals() should also use the timestamp. So Equals() and GetHashCode() are in conflict, which is why I felt that this is not a good idea.
What you should do is keep two dictionaries, one sorted by time and one by keys.
Remember that dictionaries are only holding references to your actual objects, so which dictionary you use to update the object doesn't matter.
To update the object create a function that will update both the dictionaries
var oldObj = keyedObject[key];
timedObjects.Remove(oldObj.LastUpdateTime);
timedObjects.Add(myUpdatedObject.LastUpdateTime,myUpdatedObject);
keyedObject[key] = myUpdatedObject;
Now you can track the same object by both time and key.
I am keeping only one reference to an object in timedObjects. This helps while removing.
You can keep trimming your timedObjects dictionary as required.
Of course, while trimming you must bear in mind that there is another dictionary, keyedObject, that holds a reference to the same object. Merely calling Remove on one dictionary will not be enough.
Your remove code will have to be like this:
removeObject = timedObjects[timeToRemove];
timedObjects.Remove(timeToRemove);
keyedObject.Remove(removeObject.key);
timeToRemove will mostly come from a for loop, where you decide which object to remove
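A consolidated sketch of this two-dictionary idea might look like the following. The member names mirror the ones above but the class is illustrative, and it assumes timestamps are unique (in practice you would need a tie-breaker, e.g. ticks plus a counter):
// requires System.Linq for First()
class TwoIndexCache<TKey, TValue>
{
    // key -> (last access time, value)
    private readonly Dictionary<TKey, KeyValuePair<DateTime, TValue>> keyedObject =
        new Dictionary<TKey, KeyValuePair<DateTime, TValue>>();
    // last access time -> key, kept sorted so the oldest entry comes first
    private readonly SortedDictionary<DateTime, TKey> timedObjects =
        new SortedDictionary<DateTime, TKey>();

    public void Touch(TKey key, TValue value)
    {
        KeyValuePair<DateTime, TValue> old;
        if (keyedObject.TryGetValue(key, out old))
            timedObjects.Remove(old.Key);      // drop the stale timestamp entry
        DateTime now = DateTime.UtcNow;
        keyedObject[key] = new KeyValuePair<DateTime, TValue>(now, value);
        timedObjects[now] = key;
    }

    public void EvictOldest()
    {
        if (timedObjects.Count == 0) return;
        var oldest = timedObjects.First();     // smallest timestamp
        timedObjects.Remove(oldest.Key);
        keyedObject.Remove(oldest.Value);
    }
}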
The type of map you're looking for is (at least in Java) called a LinkedHashMap.
From the javadoc:
Hash table and linked list implementation of the Map interface, with
predictable iteration order. This implementation differs from HashMap
in that it maintains a doubly-linked list running through all of its
entries. This linked list defines the iteration ordering, which is
normally the order in which keys were inserted into the map
(insertion-order).
A special constructor is provided to create a linked hash map whose
order of iteration is the order in which its entries were last
accessed, from least-recently accessed to most-recently
(access-order). This kind of map is well-suited to building LRU
caches.
Source for LinkedHashMap from the OpenJDK
AFAIK, there are no existing implementations of a LinkedHashMap in C#. That being said, it shouldn't be terribly difficult to write one.
Instead of a SortedDictionary, write your own linked list and have the Dictionary point to its nodes as values. It will always be sorted by timestamp, and updating a timestamp or removing the least recently used element will be O(1).
Here is an implementation of an LRU cache in C#. It is efficient, O(1), but not thread-safe:
static void Main(string[] args)
{
var cache = new LruCache(3);
cache.Put(1, 1);
cache.Put(2, 2);
Console.WriteLine(cache.Get(1)); // returns 1
cache.Put(3, 3); // evicts key 2
Console.WriteLine(cache.Get(2)); // returns -1 (not found)
cache.Put(4, 4); // evicts key 1
Console.WriteLine(cache.Get(1)); // returns -1 (not found)
Console.WriteLine(cache.Get(3)); // returns 3
Console.WriteLine(cache.Get(4)); // returns 4
}
public class DoubleLinkedList
{
public int key;
public int value;
public DoubleLinkedList next;
public DoubleLinkedList prev;
public DoubleLinkedList(int k, int v)
{
key = k;
value = v;
}
}
public class LruCache
{
private int size;
private int capacity;
private Dictionary<int, DoubleLinkedList> map;
private DoubleLinkedList head;
private DoubleLinkedList tail;
public LruCache(int cap)
{
capacity = cap;
map = new Dictionary<int, DoubleLinkedList>();
// head and tail are sentinel nodes; the least recently used entry sits right after head
head = new DoubleLinkedList(0, 0);
tail = new DoubleLinkedList(0, 0);
head.next = tail;
tail.prev = head;
}
// Returns the value and marks the key as most recently used; -1 if the key is absent
public int Get(int key)
{
if (map.ContainsKey(key))
{
if (tail.prev.key != key)
{
var node = map[key];
RemoveNode(node);
AddToEnd(node);
}
return map[key].value;
}
return -1;
}
// Links the node in just before the tail sentinel (the most recently used position)
private void AddToEnd(DoubleLinkedList node)
{
var beforeTail = tail.prev;
node.prev = beforeTail;
beforeTail.next = node;
tail.prev = node;
node.next = tail;
}
// Unlinks the node from the doubly linked list
private void RemoveNode(DoubleLinkedList node)
{
var before = node.prev;
before.next = node.next;
node.next.prev = before;
}
// Adds or updates a key, evicting the least recently used entry when capacity is exceeded
public void Put(int key, int value)
{
if (map.ContainsKey(key))
{
map[key].value = value;
var node = map[key];
RemoveNode(node);
AddToEnd(node);
}
else
{
size++;
if (size > capacity)
{
var node = head.next;
RemoveNode(node);
map.Remove(node.key);
size--;
}
var newNode = new DoubleLinkedList(key, value);
AddToEnd(newNode);
map.Add(key, newNode);
}
}
}
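Since the answer notes the cache is not thread-safe, a simple (coarse-grained) way to use it from multiple threads is to serialize all access behind one lock. The wrapper below is only a sketch:
public class SynchronizedLruCache
{
    private readonly LruCache inner;
    private readonly object gate = new object();

    public SynchronizedLruCache(int capacity)
    {
        inner = new LruCache(capacity);
    }

    public int Get(int key)
    {
        lock (gate) { return inner.Get(key); }
    }

    public void Put(int key, int value)
    {
        lock (gate) { inner.Put(key, value); }
    }
}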