I am working on some C# code dealing with problems like moving averages, where I often need to take a List / IEnumerable and work on chunks of consecutive data. The F# Seq module has a great function, windowed, which taking in a Sequence, returns a sequence of chunks of consecutive elements.
Does C# have an equivalent function out-of-the-box with LINQ?
You can always just call SeqModule.Windowed from C#, you just need to reference FSharp.Core.Dll. The function names are also slightly mangled, so you call Windowed rather than windowed, so that it fits with the C# capitalisation conventions
You could always roll your own (or translate the one from F# core):
let windowed windowSize (source: seq<_>) =
checkNonNull "source" source
if windowSize <= 0 then invalidArg "windowSize" (SR.GetString(SR.inputMustBeNonNegative))
seq { let arr = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked windowSize
let r = ref (windowSize-1)
let i = ref 0
use e = source.GetEnumerator()
while e.MoveNext() do
arr.[!i] <- e.Current
i := (!i + 1) % windowSize
if !r = 0 then
yield Array.init windowSize (fun j -> arr.[(!i+j) % windowSize])
else
r := (!r - 1) }
My attempt looks like this, it's way slower than just calling F# directly (as suggested by John Palmer). I'm guessing it's because of F# using an Unchecked array.:
public static IEnumerable<T[]> Windowed<T>(this IEnumerable<T> list, int windowSize)
{
//Checks elided
var arr = new T[windowSize];
int r = windowSize - 1, i = 0;
using(var e = list.GetEnumerator())
{
while(e.MoveNext())
{
arr[i] = e.Current;
i = (i + 1) % windowSize;
if(r == 0)
yield return ArrayInit<T>(windowSize, j => arr[(i + j) % windowSize]);
else
r = r - 1;
}
}
}
public static T[] ArrayInit<T>(int size, Func<int, T> func)
{
var output = new T[size];
for(var i = 0; i < size; i++) output[i] = func(i);
return output;
}
John Palmer's answer is great, here is an example based on his answer.
var numbers = new[] {1, 2, 3, 4, 5};
var windowed = SeqModule.Windowed(2, numbers);
You may (or not) want to add ToArray() to the end, without ToArray, the return type is still in F# world (Sequence). With ToArray, it is back to C# world (Array).
The Reactive Extensions have a few operators to help with this, such as Buffer and Window. The Interactive Extensions, which can be found in the experimental branch, add these and a significant number of additional operators to LINQ.
Related
if there are N values, each value can be drawn from a sub-set of values, say (1,2,3) for example, then how to derive the possible combinations? note that each one will contain N values, not the subsets.
for example, let's say N = 4, the possible outputs could be:
1,1,1,1
1,2,1,3
2,1,1,3
...
If you have M different values and want to generate N-element combinations, consider these combinations as N-digit numbers in M-ary numeral system. There are M^N such combinations. Pseudocode:
for i = 0 to Power(M, N) - 1 do
represent i in M-ary system:
tmp = i
for k = 0 to N - 1 do
digit[k] = tmp % M //integer modulo
tmp = tmp / M //integer division
Example: if N = 3, M = 3, there are 27 combinations , and at 11-th step we have
11(dec) = 102 (trinary), combination is (1,0,2) if set is zero-based, or (2,1,3) if it is 1-based
This is another solution, using a static utility class:
static class SequencesCalculation
{
public static List<int[]> Calculate(int[] availableValues, int digitsCount)
{
var combIndexes = CalculateRecursive(new List<int[]>(), availableValues.Length, new int[digitsCount], digitsCount - 1);
var result = combIndexes.Select(x => x.Select(i => availableValues[i]).ToArray()).ToList();
return result;
}
static List<int[]> CalculateRecursive(List<int[]> doneCombinations, int valuesCount, int[] array, int i)
{
doneCombinations.Add((int[])array.Clone());
//base case
if (array.All(x => x == valuesCount - 1))
return doneCombinations;
NextCombination(array, valuesCount, i);
return CalculateRecursive(doneCombinations, valuesCount, array, i);
}
static void NextCombination(int[] array, int valuesCount, int i)
{
array[i] = (array[i] + 1) % valuesCount;
if (i == 0)
return;
if (array[i] == 0)
NextCombination(array, valuesCount, i - 1);
}
}
I think the #MBo solution is far more elegant than mine. I don't know which is faster. His solution has lot of arithmetic divisions, but mine has much stack allocation forward and backward because of all the recursive method calls.
Anyway, I think that his solution should be checked as the correct answer.
I am trying to make my own multithreaded mergesort algorithm by using an ArrayList. I am familiar with this method in Java but trying to bring it over to c# is not working as planned. I get the following error when trying to compare two ArrayList items Error 1 Operator '<' cannot be applied to operands of type 'object' and 'object'. I know you cannot directly compare two objects like that, in Java you can use compareTo or something like that, is there any equivalent for c#?
Here is the code causing the error in case you need it, bear in mind that I copied this from one of my Java programs that worked with integer arrays.
int size = (last - first) + 1;
ArrayList temp = new ArrayList();
int mid = (first + last) / 2;
int i1 = 0;
int i2 = first;
int i3 = mid + 1;
while(i2 <= mid && i3 <= last)
{
if(list[i2] < list[i3])
temp[i1++] = list[i2++];
else temp[i1++] = list[i3++];
}
while(i2 <= mid)
temp[i1++] = list[i2++];
while(i3 <= last)
temp[i1++] = list[i3++];
i3 = first;
for(i1 = 0; i1 < temp.Count; i1++, i3++)
list[i3] = temp[i1];
I think just use a SortedList of ints.
var sl = new SortedList();
sl.Add(15, 15);
sl.Add(443, 443);
sl.Add(2, 2);
sl.Add(934, 934);
sl.Add(55, 55);
foreach (var item in sl.Values)
{
Console.WriteLine(item); // Outputs 2, 15, 55, 443, 934
}
Or else a generic List and call Sort (better perf I think).
var list = new List<int>();
list.Add(5);
list.Add(1);
list.Add(59);
list.Add(4);
list.Sort();
foreach (var element in list)
{
Console.WriteLine(element); // Outputs 1, 4, 5, 59
}
I would suggest looking into the IComparer<T> interface. You can make a version of your MergeSort algorithm that takes an IComparer<T> which can be used to compare the objects for sorting. It would probably give you similar functionality to what you are used to.
You could do this in addition to defining a version of MergeSort that restricts the type to IComparable<T>. This way, between both versions of the function, you can handle objects that implement the interface already and also allow your users to provide a comparison for objects that don't implement it.
You could put the MergeSort in as an Extension Method on the IList<T> interface as such.
public static class MergeSortExtension
{
public static IList<T> MergeSort<T>(this IList<T> list) where T : IComparable<T>
{
return list.MergeSort(Comparer<T>.Default);
}
public static IList<T> MergeSort<T>(this IList<T> list, IComparer<T> comparer)
{
// Sort code.
}
}
You could cast each item to IComparable or do an as IComparable and check for null (casting will throw an exception if the object instance doesn't implement the interface). But what #DavidG L suggested probably the way to go. But make a constraint on T that it must implement IComparable.
Taking all of your suggestions into account I have come up with the following solution:
public static void merge(List<T> list , int first, int last) {
int size = (last - first) + 1;
List<T> temp = new List<T>();
IEnumerable<IComparable> sorter = (IEnumerable<IComparable>)list;
int mid = (first + last) / 2;
int i1 = 0;
int i2 = first;
int i3 = mid + 1;
while(i2 <= mid && i3 <= last)
{
if (sorter.ElementAt(i2).CompareTo(sorter.ElementAt(i3)) < 0)
temp[i1++] = list[i2++];
else temp[i1++] = list[i3++];
}
while(i2 <= mid)
temp[i1++] = list[i2++];
while(i3 <= last)
temp[i1++] = list[i3++];
i3 = first;
for(i1 = 0; i1 < temp.Count; i1++, i3++)
list[i3] = temp[i1];
}
Thank you for all the help, I do not get any errors anymore.
The problem is that ArrayList is not a generic collection, so it allows to call only object's methods on any item. You can use LINQ to cast to generic IEnumerable<int>, then you will be able to call methods on int, including comparing and ordering:
ArrayList al = new ArrayList();
al.Add(1);
al.Add(2);
IEnumerable<int> coll = al.Cast<int>();
if (coll.ElementAt(0) < coll.ElementAt(1))
// ...
or:
var ordered = coll.OrderBy(n => n).ToList();
If your ArrayList contains objects of different types, you should use OfType<int> to filetr out ints, but the proper way would be to use a typed collection like List<int> instead of ArrayList.
Let's say I have a collection of some type, e.g.
IEnumerable<double> values;
Now I need to extract the k highest values from that collection, for some parameter k. This is a very simple way to do this:
values.OrderByDescending(x => x).Take(k)
However, this (if I understand this correctly) first sorts the entire list, then picks the first k elements. But if the list is very large, and k is comparatively small (smaller than log n), this is not very efficient - the list is sorted in O(nlog n), but I figure selecting the k highest values from a list should be more like O(nk).
So, does anyone have any suggestion for a better, more efficient way to do this?
This gives a bit of a performance increase. Note that it's ascending rather than descending but you should be able to repurpose it (see comments):
static IEnumerable<double> TopNSorted(this IEnumerable<double> source, int n)
{
List<double> top = new List<double>(n + 1);
using (var e = source.GetEnumerator())
{
for (int i = 0; i < n; i++)
{
if (e.MoveNext())
top.Add(e.Current);
else
throw new InvalidOperationException("Not enough elements");
}
top.Sort();
while (e.MoveNext())
{
double c = e.Current;
int index = top.BinarySearch(c);
if (index < 0) index = ~index;
if (index < n) // if (index != 0)
{
top.Insert(index, c);
top.RemoveAt(n); // top.RemoveAt(0)
}
}
}
return top; // return ((IEnumerable<double>)top).Reverse();
}
Consider the below method:
static IEnumerable<double> GetTopValues(this IEnumerable<double> values, int count)
{
var maxSet = new List<double>(Enumerable.Repeat(double.MinValue, count));
var currentMin = double.MinValue;
foreach (var t in values)
{
if (t <= currentMin) continue;
maxSet.Remove(currentMin);
maxSet.Add(t);
currentMin = maxSet.Min();
}
return maxSet.OrderByDescending(i => i);
}
And the test program:
static void Main()
{
const int SIZE = 1000000;
const int K = 10;
var random = new Random();
var values = new double[SIZE];
for (var i = 0; i < SIZE; i++)
values[i] = random.NextDouble();
// Test values
values[SIZE/2] = 2.0;
values[SIZE/4] = 3.0;
values[SIZE/8] = 4.0;
IEnumerable<double> result;
var stopwatch = new Stopwatch();
stopwatch.Start();
result = values.OrderByDescending(x => x).Take(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
result = values.GetTopValues(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
}
On my machine results are 1002 and 14.
Another way of doing this (haven't been around C# for years, so pseudo-code it is, sorry) would be:
highestList = []
lowestValueOfHigh = 0
for every item in the list
if(lowestValueOfHigh > item) {
delete highestList[highestList.length - 1] from list
do insert into list with binarysearch
if(highestList[highestList.length - 1] > lowestValueOfHigh)
lowestValueOfHigh = highestList[highestList.length - 1]
}
I wouldn't state anything about performance without profiling. In this answer I'll just try to implement O(n*k) take-one-enumeration-for-one-max-value approach. Personally I think that ordering approach is superior. Anyway:
public static IEnumerable<double> GetMaxElements(this IEnumerable<double> source)
{
var usedIndices = new HashSet<int>();
while (true)
{
var enumerator = source.GetEnumerator();
int index = 0;
int maxIndex = 0;
double? maxValue = null;
while(enumerator.MoveNext())
{
if((!maxValue.HasValue||enumerator.Current>maxValue)&&!usedIndices.Contains(index))
{
maxValue = enumerator.Current;
maxIndex = index;
}
index++;
}
usedIndices.Add(maxIndex);
if (!maxValue.HasValue) break;
yield return maxValue.Value;
}
}
Usage:
var biggestElements = values.GetMaxElements().Take(3);
Downsides:
Method assumes that source IEnumerable has an order
Method uses additional memory/operations to save used indices.
Advantage:
You can be sure that it takes one enumeration to get next max value.
See it running
Here is a Linqy TopN operator for enumerable sequences, based on the PriorityQueue<TElement, TPriority> collection:
/// <summary>
/// Selects the top N elements from the source sequence. The selected elements
/// are returned in descending order.
/// </summary>
public static IEnumerable<T> TopN<T>(this IEnumerable<T> source, int n,
IComparer<T> comparer = default)
{
ArgumentNullException.ThrowIfNull(source);
if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
PriorityQueue<bool, T> top = new(comparer);
foreach (var item in source)
{
if (top.Count < n)
top.Enqueue(default, item);
else
top.EnqueueDequeue(default, item);
}
List<T> topList = new(top.Count);
while (top.TryDequeue(out _, out var item)) topList.Add(item);
for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}
Usage example:
IEnumerable<double> topValues = values.TopN(k);
The topValues sequence contains the k maximum values in the values, in descending order. In case there are duplicate values in the topValues, the order of the equal values is undefined (non-stable sort).
For a SortedSet<T>-based implementation that compiles on .NET versions earlier than .NET 6, you could look at the 5th revision of this answer.
An operator PartialSort with similar functionality exists in the MoreLinq package. It's not implemented optimally though (source code). It performs invariably a binary search for each item, instead of comparing it with the smallest item in the top list, resulting in many more comparisons than necessary.
Surprisingly the LINQ itself is well optimized for the OrderByDescending+Take combination, resulting in excellent performance. It's only slightly slower than the TopN operator above. This applies to all versions of the .NET Core and later (.NET 5 and .NET 6). It doesn't apply to the .NET Framework platform, where the complexity is O(n*log n) as expected.
A demo that compares 4 different approaches can be found here. It compares:
values.OrderByDescending(x => x).Take(k).
values.OrderByDescending(x => x).HideIdentity().Take(k), where HideIdentity is a trivial LINQ propagator that hides the identity of the underlying enumerable, and so it effectively disables the LINQ optimizations.
values.PartialSort(k, MoreLinq.OrderByDirection.Descending) (MoreLinq).
values.TopN(k)
Below is a typical output of the demo, running in Release mode on .NET 6:
.NET 6.0.0-rtm.21522.10
Extract the 100 maximum elements from 2,000,000 random values, and calculate the sum.
OrderByDescending+Take Duration: 156 msec, Comparisons: 3,129,640, Sum: 99.997344
OrderByDescending+HideIdentity+Take Duration: 1,415 msec, Comparisons: 48,602,298, Sum: 99.997344
MoreLinq.PartialSort Duration: 277 msec, Comparisons: 13,999,582, Sum: 99.997344
TopN Duration: 62 msec, Comparisons: 2,013,207, Sum: 99.997344
Consider this List<string>
List<string> data = new List<string>();
data.Add("Text1");
data.Add("Text2");
data.Add("Text3");
data.Add("Text4");
The problem I had was: how can I get every combination of a subset of the list?
Kinda like this:
#Subset Dimension 4
Text1;Text2;Text3;Text4
#Subset Dimension 3
Text1;Text2;Text3;
Text1;Text2;Text4;
Text1;Text3;Text4;
Text2;Text3;Text4;
#Subset Dimension 2
Text1;Text2;
Text1;Text3;
Text1;Text4;
Text2;Text3;
Text2;Text4;
#Subset Dimension 1
Text1;
Text2;
Text3;
Text4;
I came up with a decent solution which a think is worth to share here.
Similar logic as Abaco's answer, different implementation....
foreach (var ss in data.SubSets_LB())
{
Console.WriteLine(String.Join("; ",ss));
}
public static class SO_EXTENSIONS
{
public static IEnumerable<IEnumerable<T>> SubSets_LB<T>(
this IEnumerable<T> enumerable)
{
List<T> list = enumerable.ToList();
ulong upper = (ulong)1 << list.Count;
for (ulong i = 0; i < upper; i++)
{
List<T> l = new List<T>(list.Count);
for (int j = 0; j < sizeof(ulong) * 8; j++)
{
if (((ulong)1 << j) >= upper) break;
if (((i >> j) & 1) == 1)
{
l.Add(list[j]);
}
}
yield return l;
}
}
}
I think, the answers in this question need some performance tests. I'll give it a go. It is community wiki, feel free to update it.
void PerfTest()
{
var list = Enumerable.Range(0, 21).ToList();
var t1 = GetDurationInMs(list.SubSets_LB);
var t2 = GetDurationInMs(list.SubSets_Jodrell2);
var t3 = GetDurationInMs(() => list.CalcCombinations(20));
Console.WriteLine("{0}\n{1}\n{2}", t1, t2, t3);
}
long GetDurationInMs(Func<IEnumerable<IEnumerable<int>>> fxn)
{
fxn(); //JIT???
var count = 0;
var sw = Stopwatch.StartNew();
foreach (var ss in fxn())
{
count = ss.Sum();
}
return sw.ElapsedMilliseconds;
}
OUTPUT:
1281
1604 (_Jodrell not _Jodrell2)
6817
Jodrell's Update
I've built in release mode, i.e. optimizations on. When I run via Visual Studio I don't get a consistent bias between 1 or 2, but after repeated runs LB's answer wins, I get answers approaching something like,
1190
1260
more
but if I run the test harness from the command line, not via Visual Studio, I get results more like this
987
879
still more
EDIT
I've accepted the performance gauntlet, what follows is my amalgamation that takes the best of all answers. In my testing, it seems to have the best performance yet.
public static IEnumerable<IEnumerable<T>> SubSets_Jodrell2<T>(
this IEnumerable<T> source)
{
var list = source.ToList();
var limit = (ulong)(1 << list.Count);
for (var i = limit; i > 0; i--)
{
yield return list.SubSet(i);
}
}
private static IEnumerable<T> SubSet<T>(
this IList<T> source, ulong bits)
{
for (var i = 0; i < source.Count; i++)
{
if (((bits >> i) & 1) == 1)
{
yield return source[i];
}
}
}
Same idea again, almost the same as L.B's answer but my own interpretation.
I avoid the use of an internal List and Math.Pow.
public static IEnumerable<IEnumerable<T>> SubSets_Jodrell(
this IEnumerable<T> source)
{
var count = source.Count();
if (count > 64)
{
throw new OverflowException("Not Supported ...");
}
var limit = (ulong)(1 << count) - 2;
for (var i = limit; i > 0; i--)
{
yield return source.SubSet(i);
}
}
private static IEnumerable<T> SubSet<T>(
this IEnumerable<T> source,
ulong bits)
{
var check = (ulong)1;
foreach (var t in source)
{
if ((bits & check) > 0)
{
yield return t;
}
check <<= 1;
}
}
You'll note that these methods don't work with more than 64 elements in the intial set but it starts to take a while then anyhow.
I developed a simple ExtensionMethod for lists:
/// <summary>
/// Obtain all the combinations of the elements contained in a list
/// </summary>
/// <param name="subsetDimension">Subset Dimension</param>
/// <returns>IEnumerable containing all the differents subsets</returns>
public static IEnumerable<List<T>> CalcCombinations<T>(this List<T> list, int subsetDimension)
{
//First of all we will create a binary matrix. The dimension of a single row
//must be the dimension of list
//on which we are working (we need a 0 or a 1 for every single element) so row
//dimension is to obtain a row-length = list.count we have to
//populate the matrix with the first 2^list.Count binary numbers
int rowDimension = Convert.ToInt32(Math.Pow(2, list.Count));
//Now we start counting! We will fill our matrix with every number from 1
//(0 is meaningless) to rowDimension
//we are creating binary mask, hence the name
List<int[]> combinationMasks = new List<int[]>();
for (int i = 1; i < rowDimension; i++)
{
//I'll grab the binary rapresentation of the number
string binaryString = Convert.ToString(i, 2);
//I'll initialize an array of the apropriate dimension
int[] mask = new int[list.Count];
//Now, we have to convert our string in a array of 0 and 1, so first we
//obtain an array of int then we have to copy it inside our mask
//(which have the appropriate dimension), the Reverse()
//is used because of the behaviour of CopyTo()
binaryString.Select(x => x == '0' ? 0 : 1).Reverse().ToArray().CopyTo(mask, 0);
//Why should we keep masks of a dimension which isn't the one of the subset?
// We have to filter it then!
if (mask.Sum() == subsetDimension) combinationMasks.Add(mask);
}
//And now we apply the matrix to our list
foreach (int[] mask in combinationMasks)
{
List<T> temporaryList = new List<T>(list);
//Executes the cycle in reverse order to avoid index out of bound
for (int iter = mask.Length - 1; iter >= 0; iter--)
{
//Whenever a 0 is found the correspondent item is removed from the list
if (mask[iter] == 0)
temporaryList.RemoveAt(iter);
}
yield return temporaryList;
}
}
}
So considering the example in the question:
# Row Dimension of 4 (list.Count)
Binary Numbers to 2^4
# Binary Matrix
0 0 0 1 => skip
0 0 1 0 => skip
[...]
0 1 1 1 => added // Text2;Text3;Text4
[...]
1 0 1 1 => added // Text1;Text3;Text4
1 1 0 0 => skip
1 1 0 1 => added // Text1;Text2;Text4
1 1 1 0 => added // Text1;Text2;Text3
1 1 1 1 => skip
Hope this can help someone :)
If you need clarification or you want to contribute feel free to add answers or comments (which one is more appropriate).
Is it easier to use linq to do the same thing as this code? (check and see how many values are equal to the value following):
int[] values = {1,2,3,3,5,6,7};
int counter=0;
for (int f =0; f< values.Length-1; f++)
{
if(values[f]==values[f+1])
{
counter++;
}
}
Yes, you can do this quite easily with Zip in .NET 4:
var count = values.Zip(values.Skip(1), (x, y) => new { x, y })
.Count(pair => pair.x == pair.y);
The trick of combining Zip and Skip(1) takes a little bit of getting your head round, but it's a really neat one. Basically you start with a sequence of n values, and the result is a sequence of n - 1 pairs, each of which contains a value and its successor.
From there, it's just a matter of counting the pairs which are the same :)
Note that the sequence in question will be evaluated twice, so you wouldn't want to do this for anything which was lazily evaluated, or which wouldn't give the same results when evaluated twice.
No I don't think it's easier. The code you have is easy to understand and concise, I wouldn't refactor it with linq. Make sure you test it a bit more though, as you might get an out of bounds error on the last loop.
There exists a very neat solution:
var list = new[] { 1, 2, 3, 3, 5, 6, 7, 7 };
var pairs = SeqModule.Pairwise(list);
var count = pairs.Count(p => p.Item1 == p.Item2);
This requires that you reference the assembly FSharp.Core and you use using Microsoft.FSharp.Collections;. Alternatively, you can implement the Pairwise method as a extension method and thereby avoid using another assembly.
For anyone who might be interested in F#, here is a solution:
let lst = [1;2;3;3;5;6;7;7]
let count = lst |> Seq.pairwise
|> Seq.filter (fun (x, y) -> x = y)
|> Seq.length
Try this:
var set = values.Where((value, i) => i < (values.Length - 1) ? values[i] == values[i + 1] : false);
EDIT:
Sorry, forgot to add set.Count() to get the result :-)
Given that the values are in an array, you can do this:
int duplicates = values.Skip(1).Where((n, i) => n == values[i]).Count();
You could say something like:
private static int CountDoubles( IEnumerable<int> Xs )
{
int? prev = null ;
int count = Xs.Count( curr => {
bool isMatch = prev.HasValue && prev == curr ;
prev = curr ;
return isMatch ;
} ) ;
return count ;
}
but that's neither simpler nor cleaner than your original version. I'd tweak yours slightly, though::
public static int CountDoubles( int[] Xs )
{
int n = 0 ;
for ( int i = 1 ; i < Xs.Length ; ++i )
{
n += ( Xs[i-1] == Xs[i] ? 1 : 0 ) ;
}
return n ;
}