I'm using the code below to either increment or insert a value in a dictionary. If the key I'm incrementing doesn't exist I'd like to set its value to 1.
public void IncrementCount(Dictionary<int, int> someDictionary, int id)
{
int currentCount;
if (someDictionary.TryGetValue(id, out currentCount))
{
someDictionary[id] = currentCount + 1;
}
else
{
someDictionary[id] = 1;
}
}
Is this an appropriate way of doing so?
Your code is fine. But here's a way to simplify it so that it doesn't require branching in your code:
int currentCount;
// currentCount will be zero if the key id doesn't exist.
someDictionary.TryGetValue(id, out currentCount);
someDictionary[id] = currentCount + 1;
This relies on the fact that the TryGetValue method sets value to the default value of its type if the key doesn't exist. In your case, the default value of int is 0, which is exactly what you want.
Update: Starting with C# 7.0, this snippet can be shortened using out variables:
// declare variable right where it's passed
someDictionary.TryGetValue(id, out var currentCount);
someDictionary[id] = currentCount + 1;
As it turns out, it made sense to use ConcurrentDictionary, which has the handy upsert method AddOrUpdate.
So, I just used:
someDictionary.AddOrUpdate(id, 1, (key, count) => count + 1);
Here's a nice extension method:
public static void Increment<T>(this Dictionary<T, int> dictionary, T key)
{
int count;
dictionary.TryGetValue(key, out count);
dictionary[key] = count + 1;
}
Usage:
var dictionary = new Dictionary<string, int>();
dictionary.Increment("hello");
dictionary.Increment("hello");
dictionary.Increment("world");
Assert.AreEqual(2, dictionary["hello"]);
Assert.AreEqual(1, dictionary["world"]);
It is readable and the intent is clear. I think this is fine. No need to invent something smarter or shorter if it doesn't keep the intent just as clear as your initial version :-)
That being said, here is a slightly shorter version:
public void IncrementCount(Dictionary<int, int> someDictionary, int id)
{
if (!someDictionary.ContainsKey(id))
someDictionary[id] = 0;
someDictionary[id]++;
}
If you have concurrent access to the dictionary, remember to synchronize access to it.
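For example, a minimal sketch of that synchronization (the lock object and method name here are illustrative, not part of the original code):
private static readonly object _sync = new object();

public void IncrementCountThreadSafe(Dictionary<int, int> someDictionary, int id)
{
    // Serialize every access to the dictionary through one shared lock object.
    lock (_sync)
    {
        someDictionary.TryGetValue(id, out int currentCount);
        someDictionary[id] = currentCount + 1;
    }
}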
Just some measurements on .NET 4 for integer keys.
It's not quite an answer to your question, but for the sake of completeness I've measured the behavior of various classes useful for incrementing integers based on integer keys: a simple Array, Dictionary (@Ani's approach), Dictionary (simple approach), SortedDictionary (@Ani's approach) and ConcurrentDictionary.AddOrUpdate.
Here are the results, adjusted by 2.5 ns for wrapping with classes instead of direct usage:
Array 2.5 ns/inc
Dictionary (@Ani) 27.5 ns/inc
Dictionary (Simple) 37.4 ns/inc
SortedDictionary 192.5 ns/inc
ConcurrentDictionary 79.7 ns/inc
Note that ConcurrentDictionary.AddOrUpdate is three times slower than Dictionary's TryGetValue + indexer's setter. And the latter is ten times slower than Array.
So I would use an array if I know the range of keys is small and a combined approach otherwise.
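For illustration, the array approach amounts to something like this (a sketch assuming the keys are known to fall in a small range [0, maxKey]):
int maxKey = 1000;                  // illustrative upper bound on the key range
int[] counts = new int[maxKey + 1]; // one slot per possible key

void Increment(int id) => counts[id]++; // no hashing or lookup, just an index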
Here is a handy unit test for you to play with, concerning ConcurrentDictionary and how to keep the values thread-safe:
ConcurrentDictionary<string, int> TestDict = new ConcurrentDictionary<string,int>();
[TestMethod]
public void WorkingWithConcurrentDictionary()
{
//If Test doesn't exist in the dictionary it will be added with a value of 0
TestDict.AddOrUpdate("Test", 0, (OldKey, OldValue) => OldValue+1);
//This will increment the test key value by 1
TestDict.AddOrUpdate("Test", 0, (OldKey, OldValue) => OldValue+1);
Assert.IsTrue(TestDict["Test"] == 1);
//This will increment it again
TestDict.AddOrUpdate("Test", 0, (OldKey, OldValue) => OldValue+1);
Assert.IsTrue(TestDict["Test"] == 2);
//This is a handy way of getting a value from the dictionary in a thread safe manner
//It would set the Test key to 0 if it didn't already exist in the dictionary
Assert.IsTrue(TestDict.GetOrAdd("Test", 0) == 2);
//This will decrement the Test key by one
TestDict.AddOrUpdate("Test", 0, (OldKey, OldValue) => OldValue-1);
Assert.IsTrue(TestDict["Test"] == 1);
}
Related
I am performing two updates on a value I get from TryGetValue. I would like to know which of these is better:
Option 1: Locking only the out value?
if (HubMemory.AppUsers.TryGetValue(ConID, out OnlineInfo onlineinfo))
{
lock (onlineinfo)
{
onlineinfo.SessionRequestId = 0;
onlineinfo.AudioSessionRequestId = 0;
onlineinfo.VideoSessionRequestId = 0;
}
}
Option 2: Locking whole dictionary?
if (HubMemory.AppUsers.TryGetValue(ConID, out OnlineInfo onlineinfo))
{
lock (HubMemory.AppUsers)
{
onlineinfo.SessionRequestId = 0;
onlineinfo.AudioSessionRequestId = 0;
onlineinfo.VideoSessionRequestId = 0;
}
}
I'm going to suggest something different.
Firstly, you should be storing immutable types in the dictionary to avoid a lot of threading issues. As it is, any code could modify the contents of any items in the dictionary just by retrieving an item from it and changing its properties.
Secondly, ConcurrentDictionary provides the TryUpdate() method to allow you to update values in the dictionary without having to implement explicit locking.
TryUpdate() requires three parameters: The key of the item to update, the updated item and the original item that you got from the dictionary and then updated.
TryUpdate() then checks that the original has NOT been updated by comparing the value currently in the dictionary with the original that you pass to it. Only if it is the SAME does it actually update it with the new value and return true. Otherwise it returns false without updating it.
This allows you to detect and respond appropriately to cases where some other thread has changed the value of the item you're updating while you were updating it. You can either ignore this (in which case, the first change gets priority) or try again until you succeed (in which case, the last change gets priority). What you do depends on your situation.
Note that this requires that your type implements IEquatable<T>, since that is used by the ConcurrentDictionary to compare values.
Here's a sample console app that demonstrates this:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
sealed class Test: IEquatable<Test>
{
public Test(int value1, int value2, int value3)
{
Value1 = value1;
Value2 = value2;
Value3 = value3;
}
public Test(Test other) // Copy ctor.
{
Value1 = other.Value1;
Value2 = other.Value2;
Value3 = other.Value3;
}
public int Value1 { get; }
public int Value2 { get; }
public int Value3 { get; }
#region IEquatable<Test> implementation (generated using Resharper)
public bool Equals(Test other)
{
if (other is null)
return false;
if (ReferenceEquals(this, other))
return true;
return Value1 == other.Value1 && Value2 == other.Value2 && Value3 == other.Value3;
}
public override bool Equals(object obj)
{
return ReferenceEquals(this, obj) || obj is Test other && Equals(other);
}
public override int GetHashCode()
{
unchecked
{
return (Value1 * 397) ^ Value2;
}
}
public static bool operator ==(Test left, Test right)
{
return Equals(left, right);
}
public static bool operator !=(Test left, Test right)
{
return !Equals(left, right);
}
#endregion
}
static class Program
{
static void Main()
{
var dict = new ConcurrentDictionary<int, Test>();
dict.TryAdd(0, new Test(1000, 2000, 3000));
dict.TryAdd(1, new Test(4000, 5000, 6000));
dict.TryAdd(2, new Test(7000, 8000, 9000));
Parallel.Invoke(() => update(dict), () => update(dict));
}
static void update(ConcurrentDictionary<int, Test> dict)
{
for (int i = 0; i < 100000; ++i)
{
for (int attempt = 0 ;; ++attempt)
{
var original = dict[1]; // Read the same entry that TryUpdate() below targets.
var modified = new Test(original.Value1 + 1, original.Value2 + 1, original.Value3 + 1);
var updatedOk = dict.TryUpdate(1, modified, original);
if (updatedOk) // Updated OK so don't try again.
break; // In some cases you might not care, so you would never try again.
Console.WriteLine($"dict.TryUpdate() returned false in iteration {i} attempt {attempt} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
}
}
}
There's a lot of boilerplate code there to support the IEquatable<T> implementation and also to support the immutability.
Fortunately, C# 9 has introduced the record type, which makes immutable types much easier to implement. Here's the same sample console app using a record instead. Note that record types are immutable and also implement IEquatable<T> for you:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace System.Runtime.CompilerServices // Remove this if compiling with .Net 5
{ // This is to allow earlier versions of .Net to use records.
class IsExternalInit {}
}
namespace Demo
{
record Test(int Value1, int Value2, int Value3);
static class Program
{
static void Main()
{
var dict = new ConcurrentDictionary<int, Test>();
dict.TryAdd(0, new Test(1000, 2000, 3000));
dict.TryAdd(1, new Test(4000, 5000, 6000));
dict.TryAdd(2, new Test(7000, 8000, 9000));
Parallel.Invoke(() => update(dict), () => update(dict));
}
static void update(ConcurrentDictionary<int, Test> dict)
{
for (int i = 0; i < 100000; ++i)
{
for (int attempt = 0 ;; ++attempt)
{
var original = dict[1];
var modified = original with
{
Value1 = original.Value1 + 1,
Value2 = original.Value2 + 1,
Value3 = original.Value3 + 1
};
var updatedOk = dict.TryUpdate(1, modified, original);
if (updatedOk) // Updated OK so don't try again.
break; // In some cases you might not care, so you would never try again.
Console.WriteLine($"dict.TryUpdate() returned false in iteration {i} attempt {attempt} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
}
}
}
Note how much shorter record Test is compared to class Test, even though it provides the same functionality. (Also note that I added class IsExternalInit to allow records to be used with .Net versions prior to .Net 5. If you're using .Net 5, you don't need that.)
Finally, note that you don't need to make your class immutable. The code I posted for the first example will work perfectly well if your class is mutable; it just won't stop other code from breaking things.
Addendum 1:
You may look at the output and wonder why so many retry attempts are made when the TryUpdate() fails. You might expect it to only need to retry a few times (depending on how many threads are concurrently attempting to modify the data). The answer to this is simply that the Console.WriteLine() takes so long that it's much more likely that some other thread changed the value in the dictionary again while we were writing to the console.
We can change the code slightly to only print the number of attempts OUTSIDE the loop like so (modifying the second example):
static void update(ConcurrentDictionary<int, Test> dict)
{
for (int i = 0; i < 100000; ++i)
{
int attempt = 0;
while (true)
{
var original = dict[1];
var modified = original with
{
Value1 = original.Value1 + 1,
Value2 = original.Value2 + 1,
Value3 = original.Value3 + 1
};
var updatedOk = dict.TryUpdate(1, modified, original);
if (updatedOk) // Updated OK so don't try again.
break; // In some cases you might not care, so you would never try again.
++attempt;
}
if (attempt > 0)
Console.WriteLine($"dict.TryUpdate() took {attempt} retries in iteration {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
With this change, we see that the number of retry attempts drops significantly. This shows the importance of minimising the amount of time spent in code between TryUpdate() attempts.
Addendum 2:
As noted by Theodor Zoulias below, you could also use ConcurrentDictionary<TKey,TValue>.AddOrUpdate(), as the example below shows. This is probably a better approach, but it is slightly harder to understand:
static void update(ConcurrentDictionary<int, Test> dict)
{
for (int i = 0; i < 100000; ++i)
{
int attempt = 0;
dict.AddOrUpdate(
1, // Key to update.
key => new Test(1, 2, 3), // Create new element; won't actually be called for this example.
(key, existing) => // Update existing element. Key not needed for this example.
{
++attempt;
return existing with
{
Value1 = existing.Value1 + 1,
Value2 = existing.Value2 + 1,
Value3 = existing.Value3 + 1
};
}
);
if (attempt > 1)
Console.WriteLine($"dict.TryUpdate() took {attempt-1} retries in iteration {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
If you just need to lock the dictionary value (for instance, to make sure the three values are set at the same time), then it doesn't really matter what reference type you lock on, just as long as it is a reference type, it's the same instance, and everything else that needs to read or modify those values also locks on the same instance.
You can read more on how the Microsoft CLR implementation deals with locking, and how and why locks work with reference types, here:
Why Do Locks Require Instances In C#?
If, on the other hand, you are trying to maintain consistency between the dictionary and the value (that is, to protect not only the internal consistency of the dictionary but also the setting and reading of the objects stored in it), then your lock is not appropriate at all.
You would need to place a lock around the entire statement (including the TryGetValue) and every other place where you add to the dictionary or read/modify the value. Once again, the object you lock on is not important, just as long as it's consistent.
Note 1: it is normal to lock on a dedicated instance (i.e. some object instantiated just for that purpose), either static or an instance member depending on your needs, as there is less chance of shooting yourself in the foot; see the sketch after these notes.
Note 2: there are many more ways to implement thread safety here, depending on your needs: whether you are happy with stale values, whether you need every ounce of performance, and how experienced you are with minimal-lock coding. How much effort and innate safety you want to bake in is entirely up to you and your solution.
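As a sketch of the dedicated lock object from Note 1 (OnlineInfo and its properties come from the question; everything else is illustrative):
static class OnlineInfoSync
{
    // A private, dedicated lock object: no other code can accidentally lock on it.
    private static readonly object Gate = new object();

    public static void ResetSessions(OnlineInfo info)
    {
        lock (Gate)
        {
            info.SessionRequestId = 0;
            info.AudioSessionRequestId = 0;
            info.VideoSessionRequestId = 0;
        }
    }
}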
The first option (locking on the entry of the dictionary) is more efficient because it is unlikely to create significant contention for the lock. For that to happen, two threads would have to try to update the same entry at the same time. The second option (locking on the entire dictionary) is quite likely to create contention under heavy usage, because two threads will be synchronized even if they try to update different entries concurrently.
The first option is also more in the spirit of using a ConcurrentDictionary<K,V> in the first place. If you are going to lock on the entire dictionary, you might as well use a normal Dictionary<K,V> instead. Regarding this dilemma, you may find this question interesting: When should I use ConcurrentDictionary and Dictionary?
I need to use a list of numbers (longs) as a Dictionary key in order to do some group calculations on them.
When using the long array as a key directly, I get a lot of collisions. If I use string.Join(",", myLongs) as a key, it works as I would expect it to, but that's much, much slower (because the hash is more complicated, I assume).
Here's an example demonstrating my problem:
Console.WriteLine("Int32");
Console.WriteLine(new[] { 1, 2, 3, 0}.GetHashCode());
Console.WriteLine(new[] { 1, 2, 3, 0 }.GetHashCode());
Console.WriteLine("String");
Console.WriteLine(string.Join(",", new[] { 1, 2, 3, 0}).GetHashCode());
Console.WriteLine(string.Join(",", new[] { 1, 2, 3, 0 }).GetHashCode());
Output:
Int32
43124074
51601393
String
406954194
406954194
As you can see, the arrays return a different hash.
Is there any way of getting the performance of the long array hash, but the uniqueness of the string hash?
See my own answer below for a performance comparison of all the suggestions.
About the potential duplicate -- that question has a lot of useful information, but as this question was primarily about finding high performance alternatives, I think it still provides some useful solutions that are not mentioned there.
That the first pair of hash codes differs is actually expected. Arrays are a reference type, and int[] does not override GetHashCode, so the hash is based on the object's identity (effectively the reference), not on the array's contents. Two array instances with identical contents therefore return different hash codes; only assigning the same instance to a new reference variable would give you the same hash back.
In the second case you hash the string "1,2,3,0" that string.Join builds from the elements. Both calls build the same string, and string overrides GetHashCode to compute it from the characters ("compares like a value type"), so the hashes are the same.
Another alternative is to leverage the lesser-known IEqualityComparer<T> to implement your own hash and equality comparisons. There are some notes you'll need to observe about building good hashes, and it's generally not good practice to have editable data in your keys, as it'll introduce instability should the keys ever change, but it would certainly be more performant than using string joins.
public class ArrayKeyComparer : IEqualityComparer<int[]>
{
public bool Equals(int[] x, int[] y)
{
return x == null || y == null
? x == null && y == null
: x.SequenceEqual(y);
}
public int GetHashCode(int[] obj)
{
var seed = 0;
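// Note: seed starts at 0, and 0 % x is always 0, so this hash is
// effectively constant; see the DavidEquality benchmark result below.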
if(obj != null)
foreach (int i in obj)
seed %= i.GetHashCode();
return seed;
}
}
Note that this still may not be as performant as a tuple, since it's still iterating the array rather than being able to take a more constant expression.
Your strings are returning the same hash codes for the same strings correctly because string.GetHashCode() is implemented that way.
int[] doesn't override GetHashCode(), so the default implementation based on object identity is used, and arrays with identical contents will nevertheless return different hash codes.
So that's why your arrays with identical contents are returning different hash codes.
Rather than using an array directly as a key, you should consider writing a wrapper class for an array that will provide a proper hash code.
The main disadvantage with this is that it will be an O(N) operation to compute the hash code (it has to be - otherwise it wouldn't represent all the data in the array).
Fortunately you can cache the hash code so it's only computed once.
Another major problem with using a mutable array for a hash code is that if you change the contents of the array after using it for the key of a hashing container such as Dictionary, you will break the container.
Ideally you would only use this kind of hashing for arrays that are never changed.
Bearing all that in mind, a simple wrapper would look like this:
public sealed class IntArrayKey
{
public IntArrayKey(int[] array)
{
Array = array;
_hashCode = hashCode();
}
public int[] Array { get; }
public override int GetHashCode()
{
return _hashCode;
}
int hashCode()
{
int result = 17;
unchecked
{
foreach (var i in Array)
{
result = result * 23 + i;
}
}
return result;
}
readonly int _hashCode;
}
You can use that in place of the actual arrays for more sensible hash code generation.
As per the comments below, here's a version of the class that:
Makes a defensive copy of the array so that it cannot be modified.
Implements equality operators.
Exposes the underlying array as a read-only list, so callers can access its contents but cannot break its hash code.
Code:
public sealed class IntArrayKey: IEquatable<IntArrayKey>
{
public IntArrayKey(IEnumerable<int> sequence)
{
_array = sequence.ToArray();
_hashCode = hashCode();
Array = new ReadOnlyCollection<int>(_array);
}
public bool Equals(IntArrayKey other)
{
if (other is null)
return false;
if (ReferenceEquals(this, other))
return true;
return _hashCode == other._hashCode && equals(other.Array);
}
public override bool Equals(object obj)
{
return ReferenceEquals(this, obj) || obj is IntArrayKey other && Equals(other);
}
public static bool operator == (IntArrayKey left, IntArrayKey right)
{
return Equals(left, right);
}
public static bool operator != (IntArrayKey left, IntArrayKey right)
{
return !Equals(left, right);
}
public IReadOnlyList<int> Array { get; }
public override int GetHashCode()
{
return _hashCode;
}
bool equals(IReadOnlyList<int> other) // other cannot be null.
{
if (_array.Length != other.Count)
return false;
for (int i = 0; i < _array.Length; ++i)
if (_array[i] != other[i])
return false;
return true;
}
int hashCode()
{
int result = 17;
unchecked
{
foreach (var i in _array)
{
result = result * 23 + i;
}
}
return result;
}
readonly int _hashCode;
readonly int[] _array;
}
If you wanted to use the above class without the overhead of making a defensive copy of the array, you can change the constructor to:
public IntArrayKey(int[] array)
{
_array = array;
_hashCode = hashCode();
Array = new ReadOnlyCollection<int>(_array);
}
If you know the length of the arrays you're using, you could use a Tuple.
Console.WriteLine("Tuple");
Console.WriteLine(Tuple.Create(1, 2, 3, 0).GetHashCode());
Console.WriteLine(Tuple.Create(1, 2, 3, 0).GetHashCode());
Outputs
Tuple
1248
1248
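Likewise, on C# 7.0 or later, a ValueTuple hashes by its contents without allocating a Tuple object; this is the ValueTuple variant measured in the benchmark answer below:
Console.WriteLine("ValueTuple");
Console.WriteLine((1, 2, 3, 0).GetHashCode());
Console.WriteLine((1, 2, 3, 0).GetHashCode());
// Both calls print the same hash code.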
I took all the suggestions from this question and the similar byte[].GetHashCode() question, and made a simple performance test.
The suggestions are as follows:
int[] as key (original attempt -- does not work at all, included as a benchmark)
string as key (original solution -- works, but slow)
Tuple as key (suggested by David)
ValueTuple as key (inspired by the Tuple)
Direct int[] hash as key
IntArrayKey (suggested by Matthew Watson)
int[] as key with Skeet's IEqualityComparer
int[] as key with David's IEqualityComparer
I generated a List containing one million int[]-arrays of length 7 containing random numbers between 100 000 and 999 999 (which is an approximation of my current use case). Then I duplicated the first 100 000 of these arrays, so that there are 900 000 unique arrays, and 100 000 that are listed twice (to force collisions).
For each solution, I enumerated the list, and added the keys to a Dictionary, OR incremented the Value if the key already existed. Then I printed how many keys had a Value more than 1**, and how much time it took.
The results are as follows (ordered from best to worst):
Algorithm Works? Time usage
NonGenericSkeetEquality YES 392 ms
SkeetEquality YES 422 ms
ValueTuple YES 521 ms
QuickIntArrayKey YES 747 ms
IntArrayKey YES 972 ms
Tuple YES 1 609 ms
string YES 2 291 ms
DavidEquality YES 1 139 200 ms ***
int[] NO 336 ms
IntHash NO 386 ms
The Skeet IEqualityComparer is only slightly slower than using the int[] as key directly, with the huge advantage that it actually works, so I'll use that.
** I'm aware that this is not a completely foolproof check, as I could theoretically get the expected number of collisions without them actually being the collisions I expected, but having run the test a lot of times, I'm fairly certain of the results.
*** Did not finish, probably due to poor hashing algorithm and a lot of equality checks. Had to reduce the number of arrays to 10 000, then multiply the time usage by 100 to compare with the others.
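The exact benchmarked comparers are not reproduced in this post, but the "Skeet" approach referred to above is the classic multiply-and-add hash; a minimal sketch of that kind of comparer:
using System.Collections.Generic;

public sealed class IntArrayEqualityComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (x[i] != y[i]) return false;
        return true;
    }

    public int GetHashCode(int[] obj)
    {
        unchecked
        {
            int hash = 17;
            foreach (int i in obj)
                hash = hash * 31 + i; // multiply-and-add over the contents
            return hash;
        }
    }
}
Usage: var dict = new Dictionary<int[], int>(new IntArrayEqualityComparer());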
Suppose the following code:
if (myDictionary.ContainsKey(aKey))
myDictionary[aKey] = aValue;
else
myDictionary.Add(aKey, aValue);
This code accesses the dictionary two times: once to determine whether aKey exists, and another time to update (if it exists) or add (if it does not). I guess the performance of this method is "acceptable" when this code is executed only a few times. However, in my application similar code is executed roughly 500K times. I profiled my code, and it shows 80% of CPU time is spent on this section, so this motivates an improvement.
Note that the dictionary's values are lambdas.
First workaround is simply:
myDictionary[aKey] = aValue;
If aKey exists, its value is replaced with aValue; if it does not exist, a KeyValuePair with aKey as key and aValue as value is added to myDictionary. However, this method has two drawbacks:
First, you don't know whether aKey already existed, which prevents you from adding logic based on that. For instance, you cannot rewrite the following code based on this workaround:
int addCounter = 0, updateCounter = 0;
if (myDictionary.ContainsKey(aKey))
{
myDictionary[aKey] = aValue;
updateCounter++;
}
else
{
myDictionary.Add(aKey, aValue);
addCounter++;
}
Second, the update cannot be a function of the old value. For instance, you cannot implement logic similar to:
if (myDictionary.ContainsKey(aKey))
myDictionary[aKey] = (myDictionary[aKey] * 2) + aValue;
else
myDictionary.Add(aKey, aValue);
The second workaround is to use ConcurrentDictionary. It's clear that by using delegates we can solve the second aforementioned issue; however, it is still not clear to me how we can address the first one.
Just to remind you, my concern is to speed up. Given that there is only one thread using this procedure, I don't think the penalty of concurrency (with locks) for only one thread is worth using ConcurrentDictionary.
Am I missing a point? Does anyone have a better suggestion?
If you really want an AddOrUpdate method like the one on ConcurrentDictionary, but without the performance implications of using one, you will have to implement such a Dictionary yourself.
The good news is that since CoreCLR is open source, you can take the actual .Net Dictionary source from the CoreCLR repository and apply your own modification. It seems it would not be too hard; take a look at the private Insert method there.
One possible implementation would be (untested):
public void AddOrUpdate(TKey key, Func<TKey, TValue> adder, Func<TKey, TValue, TValue> updater) {
if( key == null ) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (buckets == null) Initialize(0);
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int targetBucket = hashCode % buckets.Length;
for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
entries[i].value = updater(key, entries[i].value);
version++;
return;
}
}
int index;
if (freeCount > 0) {
index = freeList;
freeList = entries[index].next;
freeCount--;
}
else {
if (count == entries.Length)
{
Resize();
targetBucket = hashCode % buckets.Length;
}
index = count;
count++;
}
entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = adder(key);
buckets[targetBucket] = index;
version++;
}
Since .NET 6 there is a new method CollectionsMarshal.GetValueRefOrAddDefault to do just that.
Sample usage:
Dictionary<string, string> dictionary = new Dictionary<string, string>();
ref string? dictionaryValue = ref CollectionsMarshal.GetValueRefOrAddDefault(dictionary, "key", out bool exists);
//variable 'exists' is true if key was present, and false if it had to be added
if (exists)
{
//Update the value of dictionaryValue variable
dictionaryValue = dictionaryValue?.ToLowerInvariant();
}
else
{
//assign new value
dictionaryValue = "test";
}
The only drawback is that you cannot decide not to add the new value after invoking this method. It always creates a placeholder default value in your dictionary if the key is missing. You basically have to assign the new value, or you are left with a default entry in your dictionary.
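Applied to the increment-a-counter scenario from the top of this page, this collapses the add-or-update to a single lookup (a sketch for .NET 6 or later):
using System.Collections.Generic;
using System.Runtime.InteropServices;

static void IncrementCount(Dictionary<int, int> someDictionary, int id)
{
    // One hash lookup: the entry is created with default(int) == 0 if missing.
    ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(someDictionary, id, out _);
    count++;
}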
My Problem:
I want to convert my randomBloodType() method to a static method that can take any enum type. I want my method to take any type of enum whether it be BloodType, DaysOfTheWeek, etc. and perform the operations shown below.
Some Background on what the method does:
The method currently chooses a random element from the BloodType enum based on the values assigned to each element. An element with a higher value has a higher probability to be picked.
Code:
public enum BloodType
{
// BloodType = Probability
ONeg = 4,
OPos = 36,
ANeg = 3,
APos = 28,
BNeg = 1,
BPos = 20,
ABNeg = 1,
ABPos = 5
};
public BloodType randomBloodType()
{
// Get the values of the BloodType enum and store it in a array
BloodType[] bloodTypeValues = (BloodType[])Enum.GetValues(typeof(BloodType));
List<BloodType> bloodTypeList = new List<BloodType>();
// Create a list where each element occurs the approximate number of
// times defined as its value(probability)
foreach (BloodType val in bloodTypeValues)
{
for(int i = 0; i < (int)val; i++)
{
bloodTypeList.Add(val);
}
}
// Sum the values
int sum = 0;
foreach (BloodType val in bloodTypeValues)
{
sum += (int)val;
}
//Get Random value
Random rand = new Random();
int randomValue = rand.Next(sum);
return bloodTypeList[randomValue];
}
What I have tried so far:
I have tried to use generics. They worked out for the most part, but I was unable to cast my enum elements to int values. I included an example of a section of code that was giving me problems below.
foreach (T val in bloodTypeValues)
{
sum += (int)val; // This line is the problem.
}
I have also tried using Enum e as a method parameter. I was unable to declare the type of my array of enum elements using this method.
(Note: My apologies in advance for the lengthy answer. My actual proposed solution is not all that long, but there are a number of problems with the proposed solutions so far and I want to try to address those thoroughly, to provide context for my own proposed solution).
In my opinion, while you have in fact accepted one answer and might be tempted to use either one, neither of the answers provided so far is correct or useful.
Commenter Ben Voigt has already pointed out two major flaws with your specifications as stated, both related to the fact that you are encoding the enum value's weight in the value itself:
You are tying the enum's underlying type to the code that then must interpret that type.
Two enum values that have the same weight are indistinguishable from each other.
Both of these issues can be addressed. Indeed, while the answer you accepted (why?) fails to address the first issue, the one provided by Dweeberly does address this through the use of Convert.ToInt32() (which can convert from long to int just fine, as long as the values are small enough).
But the second issue is much harder to address. The answer from Asad attempts to address this by starting with the enum names and parsing them to their values. And this does indeed result in the final indexed array containing the corresponding entries for each name separately. But the code actually using the enum has no way to distinguish the two; it's really as if those two names are a single enum value, and that single enum value's probability weight is the sum of the value used for the two different names.
I.e. in your example, while the enum entries for e.g. BNeg and ABNeg will be selected separately, the code that receives the randomly selected value has no way to know whether it was BNeg or ABNeg that was selected. As far as it knows, those are just two different names for the same value.
Now, even this problem can be addressed (but not in the way that Asad attempts to…his answer is still broken). If you were, for example, to encode the probabilities in the value while still ensuring unique values for each name, you could decode those probabilities while doing the random selection and that would work. For example:
enum BloodType
{
// BloodType = Probability
ONeg = 4 * 100 + 0,
OPos = 36 * 100 + 1,
ANeg = 3 * 100 + 2,
APos = 28 * 100 + 3,
BNeg = 1 * 100 + 4,
BPos = 20 * 100 + 5,
ABNeg = 1 * 100 + 6,
ABPos = 5 * 100 + 7,
};
Having declared your enum values that way, you can then, in your selection code, divide the enum value by 100 to obtain its probability weight, which can be used as seen in the various examples. At the same time, each enum name has a unique value.
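For example, the selection code could decode the two parts like this:
BloodType value = BloodType.OPos;    // 36 * 100 + 1 == 3601
int weight = (int)value / 100;       // 36: the probability weight
int uniqueSuffix = (int)value % 100; // 1: keeps each name's value distinct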
But even solving that problem, you are still left with problems related to the choice of encoding and representation of the probabilities. For example, in the above you cannot have an enum that has more than 100 values, nor one with weights larger than (2^31 - 1) / 100; if you want an enum that has more than 100 values, you need a larger multiplier but that would limit your weight values even more.
In many scenarios (maybe all the ones you care about) this won't be an issue. The numbers are small enough that they all fit. But that seems like a serious limitation in what seems like a situation where you want a solution that is as general as possible.
And that's not all. Even if the encoding stays within reasonable limits, you have another significant limit to deal with: the random selection process requires an array large enough to contain, for each enum value, as many instances of that value as its weight. Again, if the values are small, maybe this is not a big problem. But it does severely limit the ability of your implementation to generalize.
So, what to do?
I understand the temptation to try to keep each enum type self-contained; there are some obvious advantages to doing so. But there are also some serious disadvantages that result from that, and if you truly ever try to use this in a generalized way, the changes to the solutions proposed so far will tie your code together in ways that IMHO negate most if not all of the advantage of keeping the enum types self-contained (primarily: if you find you need to modify the implementation to accommodate some new enum type, you will have to go back and edit all of the other enum types you're using…i.e. while each type looks self-contained, in reality they are all tightly coupled with each other).
In my opinion, a much better approach would be to abandon the idea that the enum type itself will encode the probability weights. Just accept that this will be declared separately somehow.
Also, IMHO it would be better to avoid the memory-intensive approach proposed in your original question and mirrored in the other two answers. Yes, it is fine for the small values you're dealing with here. But it's an unnecessary limitation, making only one small part of the logic simpler while complicating and restricting it in other ways.
I propose the following solution, in which the enum values can be whatever you want, the enum's underlying type can be whatever you want, and the algorithm uses memory proportionally only to the number of unique enum values, rather than in proportion to the sum of all of the probability weights.
In this solution, I also address possible performance concerns, by caching the invariant data structures used to select the random values. This may or may not be useful in your case, depending on how frequently you will be generating these random values. But IMHO it is a good idea regardless; the up-front cost of generating these data structures is so high that if the values are selected with any regularity at all, it will begin to dominate the run-time cost of your code. Even if it works fine today, why take the risk? (Again, especially given that you seem to want a generalized solution).
Here is the basic solution:
static T NextRandomEnumValue<T>()
{
KeyValuePair<T, int>[] aggregatedWeights = GetWeightsForEnum<T>();
int weightedValue =
_random.Next(aggregatedWeights[aggregatedWeights.Length - 1].Value),
index = Array.BinarySearch(aggregatedWeights,
new KeyValuePair<T, int>(default(T), weightedValue),
KvpValueComparer<T, int>.Instance);
return aggregatedWeights[index < 0 ? ~index : index + 1].Key;
}
static KeyValuePair<T, int>[] GetWeightsForEnum<T>()
{
object temp;
if (_typeToAggregatedWeights.TryGetValue(typeof(T), out temp))
{
return (KeyValuePair<T, int>[])temp;
}
if (!_typeToWeightMap.TryGetValue(typeof(T), out temp))
{
throw new ArgumentException("Unsupported enum type");
}
KeyValuePair<T, int>[] weightMap = (KeyValuePair<T, int>[])temp;
KeyValuePair<T, int>[] aggregatedWeights =
new KeyValuePair<T, int>[weightMap.Length];
int sum = 0;
for (int i = 0; i < weightMap.Length; i++)
{
sum += weightMap[i].Value;
aggregatedWeights[i] = new KeyValuePair<T,int>(weightMap[i].Key, sum);
}
_typeToAggregatedWeights[typeof(T)] = aggregatedWeights;
return aggregatedWeights;
}
readonly static Random _random = new Random();
// Helper method to reduce verbosity in the enum-to-weight array declarations
static KeyValuePair<T1, T2> CreateKvp<T1, T2>(T1 t1, T2 t2)
{
return new KeyValuePair<T1, T2>(t1, t2);
}
readonly static KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
{
CreateKvp(BloodType.ONeg, 4),
CreateKvp(BloodType.OPos, 36),
CreateKvp(BloodType.ANeg, 3),
CreateKvp(BloodType.APos, 28),
CreateKvp(BloodType.BNeg, 1),
CreateKvp(BloodType.BPos, 20),
CreateKvp(BloodType.ABNeg, 1),
CreateKvp(BloodType.ABPos, 5),
};
readonly static Dictionary<Type, object> _typeToWeightMap =
new Dictionary<Type, object>()
{
{ typeof(BloodType), _bloodTypeToWeight },
};
readonly static Dictionary<Type, object> _typeToAggregatedWeights =
new Dictionary<Type, object>();
Note that the work of actually selecting a random value is simply a matter of choosing a non-negative random integer less than the sum of the weights, and then using a binary search to find the appropriate enum value.
Once per enum type, the code will build the table of values and weight-sums that will be used for the binary search. This result is stored in a cache dictionary, _typeToAggregatedWeights.
There are also the objects that have to be declared and which will be used at run-time to build this table. Note that the _typeToWeightMap is just in support of making this method 100% generic. If you wanted to write a different named method for each specific type you wanted to support, that could still use a single generic method to implement the initialization and selection, but the named method would know the correct object (e.g. _bloodTypeToWeight) to use for initialization.
Alternatively, another way to avoid the _typeToWeightMap while still keeping the method 100% generic would be to have the _typeToAggregatedWeights be of type Dictionary<Type, Lazy<object>>, and have the values of the dictionary (the Lazy<object> objects) explicitly reference the appropriate weight array for the type.
In other words, there are lots of variations on this theme that would work fine. But they will all have essentially the same structure as above; semantics would be the same and performance differences would be negligible.
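For instance, the Lazy<object> variation might replace the _typeToAggregatedWeights declaration above with something like this (BuildAggregatedWeights is a hypothetical helper containing the aggregation loop from GetWeightsForEnum):
readonly static Dictionary<Type, Lazy<object>> _typeToAggregatedWeights =
    new Dictionary<Type, Lazy<object>>()
    {
        // Each entry lazily builds its own aggregated-weight table on first use.
        { typeof(BloodType),
          new Lazy<object>(() => BuildAggregatedWeights(_bloodTypeToWeight)) },
    };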
One thing you'll notice is that the binary search requires a custom IComparer<T> implementation. That is here:
class KvpValueComparer<TKey, TValue> :
IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
public readonly static KvpValueComparer<TKey, TValue> Instance =
new KvpValueComparer<TKey, TValue>();
private KvpValueComparer() { }
public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
{
return x.Value.CompareTo(y.Value);
}
}
This allows the Array.BinarySearch() method to correctly compare the array elements, allowing a single array to contain both the enum values and their aggregated weights, while limiting the binary search comparison to just the weights.
Assuming your enum values are all of type int (you can adjust this accordingly if they're long, short, or whatever):
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof (TEnum), curr);
return agg.Concat(Enumerable.Repeat((TEnum)value,(int)value)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
Here's how you would use it:
var rng = new Random();
var randomBloodType = RandomEnumValue<BloodType>(rng);
People seem to have their knickers in a knot about multiple indistinguishable enum values in the input enum (for which I still think the above code provides expected behavior). Note that there is no answer here, not even Peter Duniho's, that will allow you to distinguish enum entries when they have the same value, so I'm not sure why this is being considered as a metric for any potential solutions.
Nevertheless, an alternative approach that doesn't use the enum values as probabilities is to use an attribute to specify the probability:
public enum BloodType
{
[P(4)]
ONeg,
[P(36)]
OPos,
[P(3)]
ANeg,
[P(28)]
APos,
[P(1)]
BNeg,
[P(20)]
BPos,
[P(1)]
ABNeg,
[P(5)]
ABPos
}
Here is what the attribute used above looks like:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class PAttribute : Attribute
{
public int Weight { get; private set; }
public PAttribute(int weight)
{
Weight = weight;
}
}
And finally, this is what the method to get a random enum value would look like (note that GetCustomAttribute requires a using System.Reflection; directive):
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof(TEnum), curr);
FieldInfo fi = typeof (TEnum).GetField(curr);
var weight = ((PAttribute)fi.GetCustomAttribute(typeof(PAttribute), false)).Weight;
return agg.Concat(Enumerable.Repeat((TEnum)value, weight)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
(Note: if this code is performance critical, you might need to tweak this and add caching for the reflection data).
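One possible (illustrative) shape for that caching, reusing the attribute-reading code from above so the reflection work runs only once per enum type:
using System.Collections.Concurrent; // plus System, System.Linq and System.Reflection as above

static readonly ConcurrentDictionary<Type, Array> _weightedValuesCache =
    new ConcurrentDictionary<Type, Array>();

static TEnum RandomEnumValueCached<TEnum>(Random rng)
{
    // The reflection and attribute reads run once per enum type;
    // afterwards, picking a value is just an array index.
    var vals = (TEnum[])_weightedValuesCache.GetOrAdd(typeof(TEnum), _ =>
        Enum.GetNames(typeof(TEnum))
            .Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
            {
                var value = Enum.Parse(typeof(TEnum), curr);
                var fi = typeof(TEnum).GetField(curr);
                var weight = ((PAttribute)fi.GetCustomAttribute(typeof(PAttribute), false)).Weight;
                return agg.Concat(Enumerable.Repeat((TEnum)value, weight));
            })
            .ToArray());
    return vals[rng.Next(vals.Length)];
}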
Some of this you can do and some of it isn't so easy. I believe the following extension method will do what you describe.
static public class Util {
static Random rnd = new Random();
static public int PriorityPickEnum(this Enum e) {
// The approved types for an enum are byte, sbyte, short, ushort, int, uint, long, or ulong
// However, Random only supports an int (or double) as a max value. Either way
// it doesn't have the range for uint, long and ulong.
//
// sum enum
int sum = 0;
foreach (var x in Enum.GetValues(e.GetType())) {
sum += Convert.ToInt32(x);
}
var i = rnd.Next(sum); // get a random value, it will form a ratio i / sum
// enums may not have a uniform (incremented) value range (think about flags)
// therefore we have to step through to get to the range we want,
// this is due to the requirement that return value have a probability
// proportional to its value. Note enum values must be sorted for this to work.
foreach (var x in Enum.GetValues(e.GetType()).OfType<Enum>().OrderBy(a => a)) {
i -= Convert.ToInt32(x);
if (i <= 0) return Convert.ToInt32(x);
}
throw new Exception("This doesn't seem right");
}
}
Here is an example of using this extension:
BloodType bt = BloodType.ABNeg;
for (int i = 0; i < 100; i++) {
var v = (BloodType) bt.PriorityPickEnum();
Console.WriteLine("{0}: {1}({2})", i, v, (int) v);
}
This should work pretty well for enums of type byte, sbyte, ushort, short and int. Once you get beyond int (uint, long, ulong), the problem is the Random class. You can adjust the code to use doubles generated by Random, which would cover uint, but the Random class just doesn't have the range to cover long and ulong. Of course you could use/find/write a different Random class if this is important.
I wanted to count the number of repeated characters in a text file.
I wrote this code
foreach(char c in File.ReadAllText(path))
{
if(dic.ContainsKey(c))
{
int i=dic[c];
dic[c]=i++;
}
else
{
dic.Add(c,1);
}
}
It's adding all the unique characters, but it's showing the value for all keys as 1 even if there are repeated characters!
I think you want:
dic[c] = i + 1;
Or possibly, although IMHO this just adds complexity since you don't use i after:
dic[c] = ++i;
Explanation:
i++ is a post-increment operation. This means it assigns the current value of i to dic[c] and then increments i. So in summary, you're always reading in i=1, putting the i=1 back into the dictionary, then incrementing i to 2 before sending it to the void.
Addendum:
You don't really need to go through a temporary variable at all. You can simply read and assign the value back in one operation with dic[c] += 1; or even increment it with dic[c]++;.
i++ will add one to the value of i, but return the value of i before the increment. You don't want that; you just want to store the value of i incremented by one. To do this, just write:
dic[c] = i+1;
On a side note, you could do the whole thing using LINQ instead:
var dic = File.ReadAllText(path).GroupBy(c => c)
.ToDictionary(group => group.Key, group => group.Count());
You want dic[c] = i + 1; or dic[c] += 1 or dic[c]++. In your code the post increment operator is incrementing i after assignment takes place so it has no effect on the value of dic[c].
dic[c] = i++; translates to:
dic[c] = i;
i = i + 1;
i isn't a reference value and thus the value of i placed inside the dictionary will not change after you increment it outside the dictionary.
Use dic[c]++; instead.
This is because i gets incremented after being assigned to dic[c]. Try this instead:
if(dic.ContainsKey(c))
{
dic[c] += 1;
}
Dictionary<char, int> LetterCount(string textPath)
{
var dic = new Dictionary<char, int>();
foreach (char c in System.IO.File.ReadAllText(textPath))
{
if (dic.ContainsKey(c))
dic[c]++;
else
dic.Add(c, 1);
}
return dic;
}
Then use like this:
var letters = LetterCount(@"C:\Text.txt");
// letters will contain the result