Find min and max of cumulative sum in Linq - c#

I have the following function, which finds the smallest and largest values reached by a running (cumulative) sum. It works:
public class CumulativeTotal
{
[Test]
public void CalculatesTerminalValue()
{
IEnumerable<decimal> sequence = new decimal[] { 10, 20, 20, -20, -50, 10 };
var values = FindTerminalValues(sequence);
Assert.That(values.Item1, Is.EqualTo(-20));
Assert.That(values.Item2, Is.EqualTo(50));
Assert.Pass();
}
public static Tuple<decimal,decimal> FindTerminalValues(IEnumerable<decimal> values)
{
decimal largest = 0;
decimal smallest = 0;
decimal current = 0;
foreach (var value in values)
{
current += value;
if (current > largest)
largest = current;
else if (current < smallest)
smallest = current;
}
return new Tuple<decimal, decimal>(smallest,largest);
}
}
However, in the interests of learning, how could I implement this with LINQ?
I can see there is a package called MoreLinq, but I'm not sure where to start!

You can try standard Linq Aggregate method:
// Return a named tuple: min and max are more readable
// than the current .Item1 and .Item2
public static (decimal min, decimal max) FindTerminalValues(IEnumerable<decimal> values) {
//public method arguments validation
if (values is null)
throw new ArgumentNullException(nameof(values));
(var min, var max, _) = values
.Aggregate((min: decimal.MaxValue, max: decimal.MinValue, curr: 0m),
(s, a) => (Math.Min(s.min, s.curr + a),
Math.Max(s.max, s.curr + a),
s.curr + a));
return (min, max);
}

Yes, you can use MoreLinq for this; it has a Scan method.
public static Tuple<decimal, decimal> FindTerminalValues(IEnumerable<decimal> values)
{
var cumulativeSum = values.Scan((acc, x) => acc + x).ToList();
decimal min = cumulativeSum.Min();
decimal max = cumulativeSum.Max();
return new Tuple<decimal, decimal>(min, max);
}
The Scan extension method generates a new sequence by applying a function to each element of the input sequence, carrying the previous result forward as the accumulator. Here the function is simply addition, so Scan produces the running (cumulative) sums of the input sequence.
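If you would rather not add a package just to see what the running sums look like, here is a minimal sketch that uses only the standard library. RunningSums and MinMaxOfRunningSums are hypothetical names of my own, not MoreLinq APIs, and the sketch assumes the input is non-empty (Min and Max throw on an empty list).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningSumSketch
{
    // Hypothetical helper (not part of MoreLinq): yields the cumulative
    // sum after each element, e.g. 10, 20, 20 -> 10, 30, 50.
    public static IEnumerable<decimal> RunningSums(IEnumerable<decimal> values)
    {
        decimal sum = 0;
        foreach (var value in values)
        {
            sum += value;
            yield return sum;
        }
    }

    // Smallest and largest running sum; throws InvalidOperationException
    // if the input is empty.
    public static (decimal min, decimal max) MinMaxOfRunningSums(IEnumerable<decimal> values)
    {
        // Materialize once so the source is only enumerated a single time.
        var sums = RunningSums(values).ToList();
        return (sums.Min(), sums.Max());
    }
}
```

For the sequence in the question (10, 20, 20, -20, -50, 10) the running sums are 10, 30, 50, 30, -20, -10, so the result is (-20, 50).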

The major flaw in the code you've presented is that if the running sum of the sequence stays below zero or above zero the whole time, the algorithm incorrectly returns zero as one of the terminal values.
Take this:
IEnumerable<decimal> sequence = new decimal[] { 10, 20, };
Your current algorithm returns (0, 30) when it should be (10, 30).
To correct that you must start with the first value of the sequence as the default minimum and maximum.
Here's an implementation that does that:
public static (decimal min, decimal max) FindTerminalValues(IEnumerable<decimal> values)
{
if (!values.Any())
throw new System.ArgumentException("no values");
decimal first = values.First();
IEnumerable<decimal> scan = values.Scan((x, y) => x + y);
return scan.Aggregate(
(min: first, max: first),
(a, x) =>
(
min: x < a.min ? x : a.min,
max: x > a.max ? x : a.max)
);
}
It uses System.Interactive to get the Scan operator (but you could use MoreLinq instead).
However, the one downside to this approach is that IEnumerable<decimal> is not guaranteed to return the same values every time. You either need to (1) pass in a decimal[], List<decimal>, or other structure that will always return the same sequence, or (2) ensure you only iterate the IEnumerable<decimal> once.
Here's how to do (2):
public static (decimal min, decimal max) FindTerminalValues(IEnumerable<decimal> values)
{
var e = values.GetEnumerator();
if (!e.MoveNext())
throw new System.ArgumentException("no values");
var terminal = (min: e.Current, max: e.Current);
decimal value = e.Current;
while (e.MoveNext())
{
value += e.Current;
terminal = (Math.Min(value, terminal.min), Math.Max(value, terminal.max));
}
return terminal;
}

You can use the Aggregate method in LINQ to achieve this.
The Aggregate method applies a function to each element in a sequence and returns the accumulated result. It takes as a parameter an initial accumulator object, which here keeps track of the running sum together with the smallest and largest values seen so far.
public static Tuple<decimal,decimal> FindTerminalValues(IEnumerable<decimal> values)
{
var result = values.Aggregate(
// Initial accumulator value: (running sum, smallest, largest)
Tuple.Create(0m, 0m, 0m),
// Accumulation function:
(acc, value) =>
{
// Add the current value to the running sum:
var current = acc.Item1 + value;
// Update the smallest and largest accumulated values:
var smallest = Math.Min(current, acc.Item2);
var largest = Math.Max(current, acc.Item3);
// Return the updated accumulator value:
return Tuple.Create(current, smallest, largest);
});
// Drop the running sum and return only (smallest, largest):
return Tuple.Create(result.Item2, result.Item3);
}

Related

Return the next whole number

I want to pass a number and have the next whole number returned,
I've tried Math.Ceiling(3) , but it returns 3.
Desired output :
double val = 9.1 => 10
double val = 3 => 4
Thanks
There are two ways I would suggest doing this:
Using Math.Floor():
return Math.Floor(input + 1);
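Using casting (this truncates the fractional part; note that negative values are truncated toward zero, so -1.5 gives 0 rather than -1):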
return (int)input + 1;
Fiddle here
Using just the floor or ceiling won't give you the next whole number in every case.
For example, if you input negative numbers. A better way is to create a function that does that.
public class Test{
public int NextWholeNumber(double n)
{
if(n < 0)
return 0;
else
return Convert.ToInt32(Math.Floor(n)+1);
}
// Main method
static public void Main()
{
Test o = new Test();
Console.WriteLine(o.NextWholeNumber(1.254));
}
}
Usually "whole number" refers to non-negative integers only. But if you need negative integers as well, you can try this; the code will return 3.0 => 4, -1.0 => 0, -1.1 => -1
double doubleValue = double.Parse(Console.ReadLine());
int wholeNumber = 0;
if ((doubleValue - Math.Floor(doubleValue) > 0))
{
// Fractional input: round up to the next integer.
wholeNumber = (int)Math.Ceiling(doubleValue);
}
else
{
// Already integral: the next whole number is one higher.
wholeNumber = (int)doubleValue + 1;
}
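For what it's worth, the two branches above can be collapsed into a single expression. A minimal sketch, assuming "next whole number" means the smallest integer strictly greater than the input; NextWhole is a hypothetical name of my own, and it returns a double so that very large inputs do not overflow an int.

```csharp
using System;

public static class NextWholeSketch
{
    // Hypothetical helper: smallest integer strictly greater than the input.
    // Math.Floor rounds toward negative infinity, so this also behaves
    // consistently for negative values.
    public static double NextWhole(double input)
    {
        return Math.Floor(input) + 1;
    }
}
```

With this, 9.1 gives 10, 3 gives 4, -1.0 gives 0 and -1.1 gives -1, which matches the outputs listed above.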

Combination Algorithm

Length = input Long(can be 2550, 2880, 2568, etc)
List<long> = {618, 350, 308, 300, 250, 232, 200, 128}
The program takes a long value and has to find a combination of values from the list above that add up to the input (the same value can be used more than once). A difference of +/- 30 is allowed.
The largest numbers have to be used the most.
Ex:Length = 868
For this combinations can be
Combination 1 = 618 + 250
Combination 2 = 308 + 232 + 200 +128
Correct Combination would be Combination 1
But there should also be different combinations.
public static void Main(string[] args)
{
//subtotal list
List<int> totals = new List<int>(new int[] { 618, 350, 308, 300, 250, 232, 200, 128 });
// get matches
List<int[]> results = KnapSack.MatchTotal(2682, totals);
// print results
foreach (var result in results)
{
Console.WriteLine(string.Join(",", result));
}
Console.WriteLine("Done.");
}
internal static List<int[]> MatchTotal(int theTotal, List<int> subTotals)
{
List<int[]> results = new List<int[]>();
while (subTotals.Contains(theTotal))
{
results.Add(new int[1] { theTotal });
subTotals.Remove(theTotal);
}
if (subTotals.Count == 0)
return results;
subTotals.Sort();
double mostNegativeNumber = subTotals[0];
if (mostNegativeNumber > 0)
mostNegativeNumber = 0;
if (mostNegativeNumber == 0)
subTotals.RemoveAll(d => d > theTotal);
for (int choose = 0; choose <= subTotals.Count; choose++)
{
IEnumerable<IEnumerable<int>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);
results.AddRange(from combo in combos where combo.Sum() == theTotal select combo.ToArray());
}
return results;
}
public static class Combination
{
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
{
return choose == 0 ?
new[] { new T[0] } :
elements.SelectMany((element, i) =>
elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
}
}
I have used the above code. Can it be simplified? It also only produces combinations of unique values, whereas a value may be used any number of times, and the largest number has to be given the highest priority.
I also have a validation to check whether the sum is greater than the input value, and the logic fails there too.
The algorithm you have shown assumes that the list is sorted in ascending order. If it is not, you will first have to sort the list in O(n log n) time and then execute the algorithm.
Also, it assumes that you are only considering combinations of pairs and you exit on the first match.
If you want to find all combinations, then instead of "break", just output the combination and increment startIndex or decrement endIndex.
Moreover, you should check for ranges (targetSum - 30 to targetSum + 30) rather than just the exact value, because the problem says that a margin of error is allowed.
In my opinion this is the best solution, because its complexity is O(n log n + n) including the sorting.
V4 - Recursive Method, using Stack structure instead of stack frames on thread
It works (tested in VS), but there could be some bugs remaining.
static int Threshold = 30;
private static Stack<long> RecursiveMethod(long target)
{
Stack<long> Combination = new Stack<long>(establishedValues.Count); //Can grow bigger, as big as (target / min(establishedValues)) values
Stack<int> Index = new Stack<int>(establishedValues.Count); //Can grow bigger
int lowerBound = 0;
int dimensionIndex = lowerBound;
long fail = -1 * Threshold;
while (true)
{
long thisVal = establishedValues[dimensionIndex];
dimensionIndex++;
long afterApplied = target - thisVal;
if (afterApplied < fail)
lowerBound = dimensionIndex;
else
{
target = afterApplied;
Combination.Push(thisVal);
if (target <= Threshold)
return Combination;
Index.Push(dimensionIndex);
dimensionIndex = lowerBound;
}
if (dimensionIndex >= establishedValues.Count)
{
if (Index.Count == 0)
return null; //No possible combinations
dimensionIndex = Index.Pop();
lowerBound = dimensionIndex;
target += Combination.Pop();
}
}
}
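For comparison, here is a minimal sketch of a largest-first search written with ordinary recursion rather than explicit stacks. It is not a line-by-line translation of V4. LargestFirstSearch, Find and Search are hypothetical names of my own, and the sketch assumes every value in the list is positive (otherwise the recursion would not terminate). It returns the first combination found, trying larger values first and allowing a value to be reused; the worst case is still exponential.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LargestFirstSearch
{
    // Returns the first combination whose sum is within `tolerance` of
    // `target`, or null if there is none. Assumes all values are positive.
    public static List<long> Find(long target, IEnumerable<long> values, long tolerance = 30)
    {
        var sorted = values.OrderByDescending(v => v).ToArray();
        var picked = new List<long>();
        return Search(target, sorted, 0, tolerance, picked) ? picked : null;
    }

    private static bool Search(long remaining, long[] sorted, int start, long tolerance, List<long> picked)
    {
        // Close enough to the target (and at least one value used): done.
        if (picked.Count > 0 && Math.Abs(remaining) <= tolerance)
            return true;

        for (int i = start; i < sorted.Length; i++)
        {
            // Skip values that would overshoot the target by more than the tolerance.
            if (remaining - sorted[i] < -tolerance)
                continue;

            picked.Add(sorted[i]);
            // Recurse from i (not i + 1) so the same value can be reused.
            if (Search(remaining - sorted[i], sorted, i, tolerance, picked))
                return true;
            picked.RemoveAt(picked.Count - 1); // backtrack
        }
        return false;
    }
}
```

With the list from the question and a target of 868, this picks 618 first and then 250, which is Combination 1.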
Maybe V3 - Suggestion for Ordered solution trying every combination
Although this isn't chosen as the answer for the related question, I believe this is a good approach: https://stackoverflow.com/a/17258033/887092 (otherwise you could try the chosen answer, although its output is only 2 items in the set being summed, rather than up to n items). It will enumerate every option, including multiples of the same value. V2 works but would be slightly less efficient than an ordered solution, as the same failing attempt will likely be tried multiple times.
V2 - Random Selection - Will be able to reuse the same number twice
I'm a fan of using random for "intelligence", allowing the computer to brute force the solution. It's also easy to distribute - as there is no state dependence between two threads trying at the same time for example.
static int Threshold = 30;
public static List<long> RandomMethod(long Target)
{
List<long> Combinations = new List<long>();
Random rnd = new Random();
//Assuming establishedValues is sorted
int LowerBound = 0;
long runningSum = Target;
while (true)
{
int newLowerBound = FindLowerBound(LowerBound, runningSum);
if (newLowerBound == -1)
{
//No more beneficial values to work with, reset
runningSum = Target;
Combinations.Clear();
LowerBound = 0;
continue;
}
LowerBound = newLowerBound;
int rIndex = rnd.Next(LowerBound, establishedValues.Count);
long val = establishedValues[rIndex];
runningSum -= val;
Combinations.Add(val);
if (Math.Abs(runningSum) <= Threshold)
return Combinations;
}
}
static int FindLowerBound(int currentLowerBound, long runningSum)
{
//Adjust lower bound, so we're not randomly trying a number that's too high
for (int i = currentLowerBound; i < establishedValues.Count; i++)
{
//Factor in the threshold, because an end aggregate which exceeds by 20 is better than underperforming by 21.
if ((establishedValues[i] - Threshold) < runningSum)
{
return i;
}
}
return -1;
}
V1 - Ordered selection - Will not be able to reuse the same number twice
Add this very handy extension function (uses a binary algorithm to find all combinations):
//Make sure you put this in a static class inside System namespace
public static IEnumerable<List<T>> EachCombination<T>(this List<T> allValues)
{
var collection = new List<List<T>>();
for (int counter = 0; counter < (1 << allValues.Count); ++counter)
{
List<T> combination = new List<T>();
for (int i = 0; i < allValues.Count; ++i)
{
if ((counter & (1 << i)) == 0)
combination.Add(allValues[i]);
}
if (combination.Count == 0)
continue;
yield return combination;
}
}
Use the function
static List<long> establishedValues = new List<long>() {618, 350, 308, 300, 250, 232, 200, 128, 180, 118, 155};
//Return is a list of the values which sum to equal the target. Null if not found.
List<long> FindFirstCombination(long target)
{
foreach (var combination in establishedValues.EachCombination())
{
//if (combination.Sum() == target)
if (Math.Abs(combination.Sum() - target) <= 30) //Plus or minus tolerance for difference
return combination;
}
return null; //Or you could throw an exception
}
Test the solution
var target = 858;
var result = FindFirstCombination(target);
bool success = (result != null && Math.Abs(result.Sum() - target) <= 30); //Same tolerance as FindFirstCombination
//TODO: for loop with random selection of numbers from the establishedValues, Sum and test through FindFirstCombination

Extract the k maximum elements of a list

Let's say I have a collection of some type, e.g.
IEnumerable<double> values;
Now I need to extract the k highest values from that collection, for some parameter k. This is a very simple way to do this:
values.OrderByDescending(x => x).Take(k)
However, this (if I understand correctly) first sorts the entire list and then picks the first k elements. If the list is very large and k is comparatively small (smaller than log n), this is not very efficient: the sort is O(n log n), but I figure selecting the k highest values from a list should be more like O(nk).
So, does anyone have any suggestion for a better, more efficient way to do this?
This gives a bit of a performance increase. Note that it's ascending rather than descending but you should be able to repurpose it (see comments):
static IEnumerable<double> TopNSorted(this IEnumerable<double> source, int n)
{
List<double> top = new List<double>(n + 1);
using (var e = source.GetEnumerator())
{
for (int i = 0; i < n; i++)
{
if (e.MoveNext())
top.Add(e.Current);
else
throw new InvalidOperationException("Not enough elements");
}
top.Sort();
while (e.MoveNext())
{
double c = e.Current;
int index = top.BinarySearch(c);
if (index < 0) index = ~index;
if (index < n) // if (index != 0)
{
top.Insert(index, c);
top.RemoveAt(n); // top.RemoveAt(0)
}
}
}
return top; // return ((IEnumerable<double>)top).Reverse();
}
Consider the below method:
static IEnumerable<double> GetTopValues(this IEnumerable<double> values, int count)
{
var maxSet = new List<double>(Enumerable.Repeat(double.MinValue, count));
var currentMin = double.MinValue;
foreach (var t in values)
{
if (t <= currentMin) continue;
maxSet.Remove(currentMin);
maxSet.Add(t);
currentMin = maxSet.Min();
}
return maxSet.OrderByDescending(i => i);
}
And the test program:
static void Main()
{
const int SIZE = 1000000;
const int K = 10;
var random = new Random();
var values = new double[SIZE];
for (var i = 0; i < SIZE; i++)
values[i] = random.NextDouble();
// Test values
values[SIZE/2] = 2.0;
values[SIZE/4] = 3.0;
values[SIZE/8] = 4.0;
IEnumerable<double> result;
var stopwatch = new Stopwatch();
stopwatch.Start();
result = values.OrderByDescending(x => x).Take(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
result = values.GetTopValues(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
}
On my machine results are 1002 and 14.
Another way of doing this (haven't been around C# for years, so pseudo-code it is, sorry) would be:
highestList = [] // kept sorted in descending order, holds at most k items
for every item in the list
if(highestList.length < k) {
insert item into highestList with binarysearch
} else if(item > highestList[highestList.length - 1]) {
delete highestList[highestList.length - 1] from list
insert item into highestList with binarysearch
}
I wouldn't state anything about performance without profiling. In this answer I'll just try to implement the O(n*k) approach that takes one enumeration per max value. Personally I think the ordering approach is superior. Anyway:
public static IEnumerable<double> GetMaxElements(this IEnumerable<double> source)
{
var usedIndices = new HashSet<int>();
while (true)
{
var enumerator = source.GetEnumerator();
int index = 0;
int maxIndex = 0;
double? maxValue = null;
while(enumerator.MoveNext())
{
if((!maxValue.HasValue||enumerator.Current>maxValue)&&!usedIndices.Contains(index))
{
maxValue = enumerator.Current;
maxIndex = index;
}
index++;
}
usedIndices.Add(maxIndex);
if (!maxValue.HasValue) break;
yield return maxValue.Value;
}
}
Usage:
var biggestElements = values.GetMaxElements().Take(3);
Downsides:
Method assumes that source IEnumerable has an order
Method uses additional memory/operations to save used indices.
Advantage:
You can be sure that it takes one enumeration to get next max value.
See it running
Here is a Linqy TopN operator for enumerable sequences, based on the PriorityQueue<TElement, TPriority> collection:
/// <summary>
/// Selects the top N elements from the source sequence. The selected elements
/// are returned in descending order.
/// </summary>
public static IEnumerable<T> TopN<T>(this IEnumerable<T> source, int n,
IComparer<T> comparer = default)
{
ArgumentNullException.ThrowIfNull(source);
if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
PriorityQueue<bool, T> top = new(comparer);
foreach (var item in source)
{
if (top.Count < n)
top.Enqueue(default, item);
else
top.EnqueueDequeue(default, item);
}
List<T> topList = new(top.Count);
while (top.TryDequeue(out _, out var item)) topList.Add(item);
for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}
Usage example:
IEnumerable<double> topValues = values.TopN(k);
The topValues sequence contains the k maximum values in the values, in descending order. In case there are duplicate values in the topValues, the order of the equal values is undefined (non-stable sort).
For a SortedSet<T>-based implementation that compiles on .NET versions earlier than .NET 6, you could look at the 5th revision of this answer.
An operator PartialSort with similar functionality exists in the MoreLinq package. It's not implemented optimally though (source code). It performs invariably a binary search for each item, instead of comparing it with the smallest item in the top list, resulting in many more comparisons than necessary.
Surprisingly, LINQ itself is well optimized for the OrderByDescending+Take combination, resulting in excellent performance. It's only slightly slower than the TopN operator above. This applies to all versions of .NET Core and later (.NET 5 and .NET 6). It doesn't apply to the .NET Framework platform, where the complexity is O(n*log n) as expected.
A demo that compares 4 different approaches can be found here. It compares:
values.OrderByDescending(x => x).Take(k).
values.OrderByDescending(x => x).HideIdentity().Take(k), where HideIdentity is a trivial LINQ propagator that hides the identity of the underlying enumerable, and so it effectively disables the LINQ optimizations.
values.PartialSort(k, MoreLinq.OrderByDirection.Descending) (MoreLinq).
values.TopN(k)
Below is a typical output of the demo, running in Release mode on .NET 6:
.NET 6.0.0-rtm.21522.10
Extract the 100 maximum elements from 2,000,000 random values, and calculate the sum.
OrderByDescending+Take Duration: 156 msec, Comparisons: 3,129,640, Sum: 99.997344
OrderByDescending+HideIdentity+Take Duration: 1,415 msec, Comparisons: 48,602,298, Sum: 99.997344
MoreLinq.PartialSort Duration: 277 msec, Comparisons: 13,999,582, Sum: 99.997344
TopN Duration: 62 msec, Comparisons: 2,013,207, Sum: 99.997344
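The demo's HideIdentity operator is not shown in this answer, so here is a minimal sketch of what such a propagator could look like. This is my assumption of its shape, not the demo's actual code: an iterator method returns a compiler-generated enumerable, so the operator that follows it no longer sees the concrete type returned by OrderByDescending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HideIdentitySketch
{
    // Assumed implementation: re-yields every item unchanged. The only
    // effect is that downstream operators see a plain iterator instead
    // of the original enumerable.
    public static IEnumerable<T> HideIdentity<T>(this IEnumerable<T> source)
    {
        foreach (var item in source)
            yield return item;
    }
}
```

Used as values.OrderByDescending(x => x).HideIdentity().Take(k), it yields the same elements as the version without HideIdentity.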

Best way to distribute different outcomes?

I have a method called "GetValue()" which is supposed to return the value "A", "B", "C" or "D" on each method call.
I want this method to return the value "A" in 30% of the method calls and the value "B" in 14% of the method calls, the value "C" 31%.. and so on...
Which is the best way to distribute these values smoothly? I do not want the method to return the value "A" xxx times in a row because the value "A" is farthest from its requested outcome percentage.
All answers are appreciated.
You can use the Random class to achieve this:
private static Random Generator = new Random();
public string GetValue()
{
var next = Generator.Next(100);
if (next < 30) return "A";
if (next < 44) return "B";
if (next < 75) return "C";
return "D";
}
Update
For a more generic random weighted value store, the following may be a good starting point:
public class WeightedValueStore<T> : IDisposable
{
private static readonly Random Generator = new Random();
private readonly List<Tuple<int, T>> _values = new List<Tuple<int, T>>();
private readonly ReaderWriterLockSlim _valueLock = new ReaderWriterLockSlim();
public void AddValue(int weight, T value)
{
_valueLock.EnterWriteLock();
try
{
_values.Add(Tuple.Create(weight, value));
}
finally
{
_valueLock.ExitWriteLock();
}
}
public T GetValue()
{
_valueLock.EnterReadLock();
try
{
var totalWeight = _values.Sum(t => t.Item1);
var next = Generator.Next(totalWeight);
foreach (var tuple in _values)
{
next -= tuple.Item1;
if (next < 0) return tuple.Item2;
}
return default(T); // Or throw exception here - only reachable if _values has no elements.
}
finally
{
_valueLock.ExitReadLock();
}
}
public void Dispose()
{
_valueLock.Dispose();
}
}
Which would then be useable like so:
public string GetValue()
{
using (var valueStore = new WeightedValueStore<string>())
{
valueStore.AddValue(30, "A");
valueStore.AddValue(14, "B");
valueStore.AddValue(31, "C");
valueStore.AddValue(25, "D");
return valueStore.GetValue();
}
}
Use Random.
Take care of the seed. See this link.
Example:
// You can provide a seed as a parameter of the Random() class.
private static Random RandomGenerator = new Random();
private static string Generate()
{
int value = RandomGenerator.Next(100);
if (value < 30)
{
return "A";
}
else if (value < 44)
{
return "B";
}
else if (value < 75)
{
return "C";
}
else
{
return "D";
}
}
If you want that distribution by average, you can just pick a random number and check it.
Random rnd = new Random();
int value = rnd.Next(100); // get a number in the range 0 - 99
if (value < 30) return "A";
if (value < 30+14) return "B";
if (value < 30+14+31) return "C";
return "D";
Note that you should create the random generator once, and reuse it for subsequent calls. If you create a new one each time, they will be initialised with the same random sequence if two method calls come too close in time.
If you want exactly that distribution for 100 items, then you would create an array with 100 items, where 30 are "A", 14 are "B", and so on. Shuffle the array (look up Fisher-Yates), and return one item from the array for each method call.
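A minimal sketch of that "array of 100 items" idea, assuming string outcomes and integer counts. ShuffleBag is a hypothetical class name of my own. It fills an array with each value repeated according to its count, shuffles it with Fisher-Yates, hands out one item per call, and reshuffles once the array is exhausted.

```csharp
using System;
using System.Collections.Generic;

public sealed class ShuffleBag
{
    private readonly string[] _items;
    private readonly Random _random;
    private int _position;

    public ShuffleBag(IEnumerable<KeyValuePair<string, int>> counts, Random random)
    {
        var items = new List<string>();
        foreach (var pair in counts)
            for (int i = 0; i < pair.Value; i++)
                items.Add(pair.Key);
        if (items.Count == 0)
            throw new ArgumentException("At least one item is required.", nameof(counts));
        _items = items.ToArray();
        _random = random;
        Shuffle();
    }

    // Returns the next item; reshuffles once every item has been handed out.
    public string Next()
    {
        if (_position == _items.Length)
            Shuffle();
        return _items[_position++];
    }

    // Fisher-Yates shuffle: swap each slot with a random slot at or below it.
    private void Shuffle()
    {
        for (int i = _items.Length - 1; i > 0; i--)
        {
            int j = _random.Next(i + 1); // 0 <= j <= i
            (_items[i], _items[j]) = (_items[j], _items[i]);
        }
        _position = 0;
    }
}
```

Every block of 100 consecutive calls (starting from the first) then contains exactly 30 "A", 14 "B", 31 "C" and 25 "D", although runs of the same value are still possible within a block.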
Let's say you have the arrays
String[] possibleOutcomes = new String[] { "A", "B", "C", "D" };
and
int[] possibleOutcomeProbabilities = new int[] { 30, 14, 31, 25 };
You can use the following strategy whenever you are required to output one of the outcomes:
Find the sum of all elements in possibleOutcomeProbabilities. Let's call this sum totalProbability.
Generate a random number between 1 and totalProbability. Let's call this randomly generated number outcomeBucket.
Iterate over possibleOutcomeProbabilities to determine which outcome outcomeBucket corresponds to. You then pick the corresponding outcome from possibleOutcomes.
This strategy will certainly not give you first 30% outcomes as A, next 14% as B, etc. However, as probability works, over a sufficiently large number of outcomes, this strategy will ensure that your possible outcomes are distributed as per their expected probabilities. This strategy gives you the advantage that outcome probabilities are not required to add up to 100%. You can even specify relative probabilities, such as, 1:2:3:4, etc.
If you are really worried about the fastest possible implementation for the strategy, you can tweak it as follows:
a. Calculate totalProbability only once, or when the probabilities are changed.
b. Before calculating totalProbability, see if the elements in possibleOutcomeProbabilities have any common divisors and eliminate those. This will give you a smaller probability space to traverse each time.
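The strategy above can be sketched roughly as follows. The array names are the ones used in this answer; OutcomePicker, OutcomeForBucket and Next are hypothetical names of my own. The bucket lookup is kept separate from the random draw so that it can be checked deterministically.

```csharp
using System;
using System.Linq;

public static class OutcomePicker
{
    static readonly string[] possibleOutcomes = { "A", "B", "C", "D" };
    static readonly int[] possibleOutcomeProbabilities = { 30, 14, 31, 25 };

    // Tweak (a): computed once, not on every call.
    static readonly int totalProbability = possibleOutcomeProbabilities.Sum();

    // Maps a bucket number in 1..totalProbability to its outcome by
    // walking the cumulative upper bounds 30, 44, 75, 100.
    public static string OutcomeForBucket(int outcomeBucket)
    {
        if (outcomeBucket < 1)
            throw new ArgumentOutOfRangeException(nameof(outcomeBucket));
        int upperBound = 0;
        for (int i = 0; i < possibleOutcomes.Length; i++)
        {
            upperBound += possibleOutcomeProbabilities[i];
            if (outcomeBucket <= upperBound)
                return possibleOutcomes[i];
        }
        throw new ArgumentOutOfRangeException(nameof(outcomeBucket));
    }

    // Random.Next(min, max) excludes max, hence the + 1.
    public static string Next(Random random)
    {
        return OutcomeForBucket(random.Next(1, totalProbability + 1));
    }
}
```

Buckets 1 to 30 map to "A", 31 to 44 to "B", 45 to 75 to "C" and 76 to 100 to "D". The Random instance is passed in so that one generator can be created once and reused across calls.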
Try this:
Random r = new Random();
private string GetValue()
{
double d = r.NextDouble(); // NextDouble() returns a value in [0.0, 1.0); Next() returns an int
if(d < 0.3)
return "A";
else if(d < 0.44) // 0.30 + 0.14
return "B";
...etc.
}
EDIT: just make sure that the Random variable is created outside the function or you'll get the same value each time.
I would not recommend any hard-coded approach (it is hard to maintain and it's bad practice). I'd prefer a more generic solution instead.
enum PossibleOutcome { A, B, C, D, Undefined }
// sample data: possible outcome vs its probability
static readonly Dictionary<PossibleOutcome, double> probabilities = new Dictionary<PossibleOutcome, double>()
{
{PossibleOutcome.A, 0.31},
{PossibleOutcome.B, 0.14},
{PossibleOutcome.C, 0.30},
{PossibleOutcome.D, 0.25}
};
static Random random = new Random();
static PossibleOutcome GetValue()
{
var result = random.NextDouble();
var sum = 0.0;
foreach (var probability in probabilities)
{
sum += probability.Value;
if (result <= sum)
{
return probability.Key;
}
}
return PossibleOutcome.Undefined; // it shouldn't happen
}
static void Main(string[] args)
{
if (Math.Abs(probabilities.Sum(pair => pair.Value) - 1.0) > 1e-9) // compare with a tolerance: floating-point sums are rarely exact
{
throw new ApplicationException("Probabilities must add up to 100%!");
}
for (var i = 0; i < 100; i++)
{
Console.WriteLine(GetValue().ToString());
}
Console.ReadLine();
}

Good way to get the key of the highest value of a Dictionary in C#

I'm trying to get the key of the maximum value in the Dictionary<string, double> results.
This is what I have so far:
double max = results.Max(kvp => kvp.Value);
return results.Where(kvp => kvp.Value == max).Select(kvp => kvp.Key).First();
However, since this seems a little inefficient, I was wondering whether there was a better way to do this.
edit: .NET 6 introduced a new method
var max = results.MaxBy(kvp => kvp.Value).Key;
You should probably use that if you can.
I think this is the most readable O(n) answer using standard LINQ.
var max = results.Aggregate((l, r) => l.Value > r.Value ? l : r).Key;
edit: explanation for CoffeeAddict
Aggregate is the LINQ name for the commonly known functional concept Fold
It loops over each element of the set and applies whatever function you provide.
Here, the function I provide is a comparison function that returns the bigger value.
While looping, Aggregate remembers the return result from the last time it called my function. It feeds this into my comparison function as variable l. The variable r is the currently selected element.
So after aggregate has looped over the entire set, it returns the result from the very last time it called my comparison function. Then I read the .Key member from it because I know it's a dictionary entry
Here is a different way to look at it [I don't guarantee that this compiles ;) ]
var l = results.First();
foreach (var r in results.Skip(1))
{
if (r.Value > l.Value)
l = r;
}
var max = l.Key;
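To make the fold concrete, here is a small self-contained sketch with sample data of my own. MaxKeySketch and KeyOfMax are hypothetical names. Note that Aggregate without a seed throws InvalidOperationException on an empty dictionary, and that with tied values the key you get depends on the dictionary's enumeration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxKeySketch
{
    public static string KeyOfMax(Dictionary<string, double> results)
    {
        // l is the best entry seen so far, r is the entry being examined.
        return results.Aggregate((l, r) => l.Value > r.Value ? l : r).Key;
    }
}
```

For { "a": 1.5, "b": 3.2, "c": 2.1 } this returns "b". Because the fold starts from the first entry rather than from zero, it also works when every value is negative.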
After reading various suggestions, I decided to benchmark them and share the results.
The code tested:
// TEST 1
for (int i = 0; i < 999999; i++)
{
KeyValuePair<GameMove, int> bestMove1 = possibleMoves.First();
foreach (KeyValuePair<GameMove, int> move in possibleMoves)
{
if (move.Value > bestMove1.Value) bestMove1 = move;
}
}
// TEST 2
for (int i = 0; i < 999999; i++)
{
KeyValuePair<GameMove, int> bestMove2 = possibleMoves.Aggregate((a, b) => a.Value > b.Value ? a : b);
}
// TEST 3
for (int i = 0; i < 999999; i++)
{
KeyValuePair<GameMove, int> bestMove3 = (from move in possibleMoves orderby move.Value descending select move).First();
}
// TEST 4
for (int i = 0; i < 999999; i++)
{
KeyValuePair<GameMove, int> bestMove4 = possibleMoves.OrderByDescending(entry => entry.Value).First();
}
The results:
Average Seconds Test 1 = 2.6
Average Seconds Test 2 = 4.4
Average Seconds Test 3 = 11.2
Average Seconds Test 4 = 11.2
This is just to give an idea of their relative performance.
If you're optimizing, 'foreach' is fastest, but LINQ is compact and flexible.
Maybe this isn't a good use for LINQ. I see 2 full scans of the dictionary using the LINQ solution (1 to get the max, then another to find the kvp to return the string).
You could do it in 1 pass with an "old fashioned" foreach:
KeyValuePair<string, double> max = results.First(); // start from a real entry so all-negative values work; throws if results is empty
foreach (var kvp in results)
{
if (kvp.Value > max.Value)
max = kvp;
}
return max.Key;
You can sort the dictionary with OrderBy (to find the min value) or OrderByDescending (for the max value) and then take the first element. This also helps when you need the second max/min element.
Get dictionary key by max value:
string max = results.OrderByDescending(x => x.Value).First().Key;
Get dictionary key by min value:
string min = results.OrderBy(x => x.Value).First().Key;
Get dictionary key by second max value:
string secondMax = results.OrderByDescending(x => x.Value).Skip(1).First().Key;
Get dictionary key by second min value:
string secondMin = results.OrderBy(x => x.Value).Skip(1).First().Key;
This is a fast method. It is O(n), which is optimal. The only problem I see is that it iterates over the dictionary twice instead of just once.
You can do it iterating over the dictionary once by using MaxBy from morelinq.
results.MaxBy(kvp => kvp.Value).Key;
Little extension method:
public static KeyValuePair<K, V> GetMaxValuePair<K,V>(this Dictionary<K, V> source)
where V : IComparable
{
KeyValuePair<K, V> maxPair = source.First();
foreach (KeyValuePair<K, V> pair in source)
{
if (pair.Value.CompareTo(maxPair.Value) > 0)
maxPair = pair;
}
return maxPair;
}
Then:
int keyOfMax = myDictionary.GetMaxValuePair().Key;
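Check this out (it returns every key that shares the maximum value):
var max = result.Values.Max(); // compute once rather than once per entry
result.Where(x=>x.Value==max).Select(x=>x.Key).ToList()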
My version based off the current Enumerable.Max implementation with an optional comparer:
public static TSource MaxValue<TSource, TConversionResult>(this IEnumerable<TSource> source, Func<TSource, TConversionResult> function, IComparer<TConversionResult> comparer = null)
{
comparer = comparer ?? Comparer<TConversionResult>.Default;
if (source == null) throw new ArgumentNullException(nameof(source));
TSource max = default;
TConversionResult maxFx = default;
if ( (object)maxFx == null) //nullable stuff
{
foreach (var x in source)
{
var fx = function(x);
if (fx == null || (maxFx != null && comparer.Compare(fx, maxFx) <= 0)) continue;
maxFx = fx;
max = x;
}
return max;
}
//valuetypes
var notFirst = false;
foreach (var x in source)
{
var fx = function(x);
if (notFirst)
{
if (comparer.Compare(fx, maxFx) <= 0) continue;
maxFx = fx;
max = x;
}
else
{
maxFx = fx;
max = x;
notFirst = true;
}
}
if (notFirst)
return max;
throw new InvalidOperationException("Sequence contains no elements");
}
Example usage:
class Wrapper
{
public int Value { get; set; }
}
[TestMethod]
public void TestMaxValue()
{
var dictionary = new Dictionary<string, Wrapper>();
for (var i = 0; i < 19; i++)
{
dictionary[$"s:{i}"] = new Wrapper{Value = (i % 10) * 10 } ;
}
var m = dictionary.Keys.MaxValue(x => dictionary[x].Value);
Assert.AreEqual(m, "s:9");
}
How about doing it in parallel using Interlocked.Exchange for thread safety :) Keep in mind that Interlocked.Exchange will only work with a reference type (i.e. a struct or key value pair, unless wrapped in a class, will not work to hold the max value).
Here's an example from my own code:
//Parallel O(n) solution for finding max kvp in a dictionary...
ClassificationResult maxValue = new ClassificationResult(-1,-1,double.MinValue);
Parallel.ForEach(pTotals, pTotal =>
{
if(pTotal.Value > maxValue.score)
{
Interlocked.Exchange(ref maxValue, new
ClassificationResult(mhSet.sequenceId,pTotal.Key,pTotal.Value));
}
});
EDIT (Updated code to avoid possible race condition above):
Here's a more robust pattern which also shows selecting a min value in parallel. I think this addresses the concerns mentioned in the comments below regarding a possible race condition:
int minVal = int.MaxValue;
Parallel.ForEach(dictionary.Values, curVal =>
{
int oldVal = Volatile.Read(ref minVal);
//val can equal anything but the oldVal
int val = ~oldVal;
//Keep trying the atomic update until we are sure that either:
//1. CompareExchange successfully changed the value.
//2. Another thread has updated minVal with a smaller number than curVal.
// (in the case of #2, the update is no longer needed)
while (oldVal > curVal && oldVal != val)
{
val = oldVal;
oldVal = Interlocked.CompareExchange(ref minVal, curVal, oldVal);
}
});
I think using the standard LINQ Libraries this is as fast as you can go.
