I was watching some YouTube videos of programmer interviews. One of the questions was to write a function that returns the n-th smallest element of an array.
In the video I watched, a lady tried to code that in C++ with some recursion, but I thought that, well, in C# it's a one-liner:
var nth = vals.OrderBy(x => x).Take(i).Last();
Then I realised that I have no idea whether this is in fact a good solution, since the next question was about time complexity. I went to the docs, and all I found is that the object returned by OrderBy has all the information needed to perform a full deferred sort when it is enumerated.
So I decided to test it and wrote a class MyComparable : IComparable<MyComparable> with a single value and a static counter in the CompareTo method:
class MyComparable : IComparable<MyComparable>
{
    public MyComparable(int val)
    {
        Val = val;
    }

    public static int CompareCount { get; set; }

    public int Val { get; set; }

    public int CompareTo(MyComparable other)
    {
        ++CompareCount;
        if (ReferenceEquals(this, other)) return 0;
        if (ReferenceEquals(null, other)) return 1;
        return Val.CompareTo(other.Val);
    }
}
Then I wrote a loop that finds the n-th element in an array:
static void Main(string[] args)
{
    var rand = new Random();
    var vals = Enumerable.Range(0, 10000)
        // .Reverse()                 // pessimistic scenario
        // .OrderBy(x => rand.Next()) // average scenario
        .Select(x => new MyComparable(x))
        .ToArray();

    for (int i = 1; i < 100; i++)
    {
        var nth = vals.OrderBy(x => x).Take(i).Last();
        Console.WriteLine($"i: {i,5}, OrderBy: {MyComparable.CompareCount,10}, value {nth.Val}");
        MyComparable.CompareCount = 0;

        var my_nth = vals.OrderByInsertion().Take(i).Last();
        Console.WriteLine($"i: {i,5}, Insert : {MyComparable.CompareCount,10}, value {my_nth.Val}");
        MyComparable.CompareCount = 0;

        my_nth = vals.OrderByInsertionWithIndex().Take(i).Last();
        Console.WriteLine($"i: {i,5}, Index : {MyComparable.CompareCount,10}, value {my_nth.Val}");
        MyComparable.CompareCount = 0;

        Console.WriteLine();
        Console.WriteLine();
    }
}
I also wrote two "different" implementations of finding the min element, returning it and removing it from a list:
public static IEnumerable<T> OrderByInsertion<T>(this IEnumerable<T> input) where T : IComparable<T>
{
    var list = input.ToList();
    while (list.Any())
    {
        var min = list.Min();
        yield return min;
        list.Remove(min);
    }
}

public static IEnumerable<T> OrderByInsertionWithIndex<T>(this IEnumerable<T> input) where T : IComparable<T>
{
    var list = input.ToList();
    while (list.Any())
    {
        var minIndex = 0;
        for (int i = 1; i < list.Count; i++)
        {
            if (list[i].CompareTo(list[minIndex]) < 0)
            {
                minIndex = i;
            }
        }
        yield return list[minIndex];
        list.RemoveAt(minIndex);
    }
}
The results really were a surprise to me:
i: 1, OrderBy: 19969, value 0
i: 1, Insert : 9999, value 0
i: 1, Index : 9999, value 0
i: 2, OrderBy: 19969, value 1
i: 2, Insert : 19997, value 1
i: 2, Index : 19997, value 1
i: 3, OrderBy: 19969, value 2
i: 3, Insert : 29994, value 2
i: 3, Index : 29994, value 2
i: 4, OrderBy: 19969, value 3
i: 4, Insert : 39990, value 3
i: 4, Index : 39990, value 3
i: 5, OrderBy: 19970, value 4
i: 5, Insert : 49985, value 4
i: 5, Index : 49985, value 4
...
i: 71, OrderBy: 19973, value 70
i: 71, Insert : 707444, value 70
i: 71, Index : 707444, value 70
...
i: 99, OrderBy: 19972, value 98
i: 99, Insert : 985050, value 98
i: 99, Index : 985050, value 98
Just using LINQ OrderBy().Take(n) is by far the most efficient and fastest, which is what I anticipated, but I would never have guessed the gap would be several orders of magnitude.
So, my question is mostly to interviewers: how would you grade such an answer?
Code:
var nth = vals.OrderBy(x => x).Take(i).Last();
Time complexity:
I don't know the details, but OrderBy probably uses some kind of quicksort, so O(n log n), no matter which n-th element we want.
Would you ask me to implement my own methods like those or would it be enough to use the framework?
EDIT:
So, it turns out that, as the suggested answer below says, OrderedEnumerable uses a variation of QuickSelect to sort only the elements up to the one you ask for. On the plus side, it caches the order.
While you are able to find the n-th element a little faster, it's not a complexity class faster, just some percentage faster. Also, each and every C# programmer will understand your one-liner.
I think that my answer during the interview would end up somewhere along the lines of "I will use OrderBy, because it is fast enough and writing it takes 10 seconds. If it turns out to be too slow, we can gain some time with QuickSelect, but implementing it nicely takes a lot of time."
Thanks everyone who decided to participate in this and sorry to everyone who wasted their time thinking this was something else :)
Okay, let's start with the low-hanging fruit:
Your implementation is wrong. You need to take index + 1 elements from the sequence. To understand this, consider index = 0 and reread the documentation for Take.
Your "benchmark comparison" only works because calling OrderBy() on an IEnumerable does not modify the underlying collection. For what we're going to do it's easier to just allow modifications to the underlying array. As such I've taken the liberty to change your code to generate the values from scratch in every iteration.
Additionally, Take(i + 1).Last() is equivalent to ElementAt(i). You should really be using that.
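To spell that out, the corrected one-liner from the question, with a zero-based index, would then simply be:

var nth = vals.OrderBy(x => x).ElementAt(index);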
Oh, and your benchmark is not very useful, because the more elements of the range you consume with Take, the closer these algorithms should come to each other. As far as I can tell, your runtime analysis of O(n log n) is correct.
There is a solution that has a time-complexity of O(n) though (not O(log n) as I incorrectly claimed earlier). This is the solution the interviewer was expecting.
For what it's worth, the code you've written there cannot be moved to that solution, because you don't have any control over the sort process.
If you had, you could implement Quickselect (which is the goal here), resulting in a theoretical improvement over the LINQ query you propose here (especially for high indices). The following is a translation of the pseudocode from the Wikipedia article on quickselect, based on your code:
static readonly Random rand = new Random();   // shared Random instance used for pivot selection

static T SelectK<T>(T[] values, int left, int right, int index)
    where T : IComparable<T>
{
    if (left == right) { return values[left]; }

    // could select pivot deterministically through median of 3 or something
    var pivotIndex = rand.Next(left, right + 1);
    pivotIndex = Partition(values, left, right, pivotIndex);

    if (index == pivotIndex) {
        return values[index];
    } else if (index < pivotIndex) {
        return SelectK(values, left, pivotIndex - 1, index);
    } else {
        return SelectK(values, pivotIndex + 1, right, index);
    }
}

static int Partition<T>(T[] values, int left, int right, int pivot)
    where T : IComparable<T>
{
    var pivotValue = values[pivot];
    Swap(values, pivot, right);

    var storeIndex = left;
    for (var i = left; i < right; i++) {
        if (values[i].CompareTo(pivotValue) < 0)
        {
            Swap(values, storeIndex, i);
            storeIndex++;
        }
    }

    Swap(values, right, storeIndex);
    return storeIndex;
}

static void Swap<T>(T[] values, int i, int j)   // swap two array elements in place
{
    var tmp = values[i];
    values[i] = values[j];
    values[j] = tmp;
}
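As a quick usage sketch (note that Partition reorders the array in place, so pass a copy if the original order matters), the zero-based index-th smallest element would be obtained with something like:

var kth = SelectK(values, 0, values.Length - 1, index);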
A non-representative subsample of a test I've run gives the output:
i: 6724, OrderBy: 52365, value 6723
i: 6724, SelectK: 40014, value 6724
i: 395, OrderBy: 14436, value 394
i: 395, SelectK: 26106, value 395
i: 7933, OrderBy: 32523, value 7932
i: 7933, SelectK: 17712, value 7933
i: 6730, OrderBy: 46076, value 6729
i: 6730, SelectK: 34367, value 6730
i: 6536, OrderBy: 53458, value 6535
i: 6536, SelectK: 18341, value 6536
Since my SelectK implementation uses a random pivot element, there is quite some variation in its output (see for example the second run). It's also considerably worse than the highly optimized sorting algorithm implemented in the standard library.
Even then, there are cases where SelectK straight up outperforms the standard library, even though I didn't put much effort in.
Now, replacing the random pivot with a median of 3 [1] (which is a pretty bad pivot selector), we can obtain a slightly different SelectK and race it against OrderBy and the original SelectK.
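A median-of-3 pivot selector along these lines could replace the random pivotIndex above; this is only a sketch of the idea (reusing the Swap helper from the SelectK listing), not necessarily the exact implementation referenced in [1]:

// Sketch: pick the median of the first, middle and last elements as the pivot,
// costing three comparisons per call.
static int MedianOf3<T>(T[] values, int left, int right)
    where T : IComparable<T>
{
    int mid = left + (right - left) / 2;
    if (values[mid].CompareTo(values[left]) < 0) Swap(values, left, mid);
    if (values[right].CompareTo(values[left]) < 0) Swap(values, left, right);
    if (values[right].CompareTo(values[mid]) < 0) Swap(values, mid, right);
    return mid; // values[mid] now holds the median of the three
}

In SelectK, the line var pivotIndex = rand.Next(left, right + 1); would then become var pivotIndex = MedianOf3(values, left, right);.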
I've been racing these three horses with 1m elements in the array, using the random sort you already had and requesting an index in the last 20% of the array, and got results like the following:
Winning counts: OrderBy 32, SelectK 32, MedianOf3 35
Winning counts: OrderBy 26, SelectK 35, MedianOf3 38
Winning counts: OrderBy 25, SelectK 41, MedianOf3 33
Even for 100k elements and without restricting the index to the end of the array this pattern seems to persist, though not quite as pronounced:
--- 100k elements
Winning counts: OrderBy 24, SelectK 34, MedianOf3 41
Winning counts: OrderBy 33, SelectK 33, MedianOf3 33
Winning counts: OrderBy 32, SelectK 38, MedianOf3 29
--- 1m elements
Winning counts: OrderBy 27, SelectK 32, MedianOf3 40
Winning counts: OrderBy 32, SelectK 38, MedianOf3 29
Winning counts: OrderBy 35, SelectK 31, MedianOf3 33
Generally speaking, a sloppily implemented quickselect outperforms your suggestion in the average case two thirds of the time... I'd say that's a pretty strong indicator that it's the better algorithm to use, if you want to get into the nitty-gritty details.
Of course your implementation is significantly easier to understand :)
[1] - Implementation taken from this SO answer, incurring 3 comparisons per recursion depth step
Related
Given a data structure of:
class TheClass
{
    int NodeID;
    double Cost;
    List<int> NodeIDs;
}
And a List with data:
27 -- 10.0 -- 1, 5, 27
27 -- 10.0 -- 1, 5, 27
27 -- 10.0 -- 1, 5, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
27 -- 10.0 -- 1, 4, 25, 26, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
35 -- 10.0 -- 1, 4, 13, 14, 35
I want to reduce it to the unique NodeIDs lists
27 -- 10.0 -- 1, 5, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
27 -- 10.0 -- 1, 4, 25, 26, 27
35 -- 10.0 -- 1, 4, 13, 14, 35
Then I'll be summing the Cost column (Node 27 total cost: 10.0 + 15.5 + 10.0 = 35.5) -- that part is straightforward.
What is the fastest way to remove the duplicate rows / find uniques?
The production data set will have NodeIDs lists of 100 to 200 IDs, with about 1,500 items in the List and around 500 of them unique.
I'm 100% focused on speed -- if adding some other data would help, I'm happy to (I've tried hashing the lists into a SHA value, but that turned out slower than my current grunt exhaustive search).
.GroupBy(x => string.Join(",", x.NodeIDs)).Select(x => x.First())
That should be faster for big data than Distinct.
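If you want to go straight to the per-node cost totals, that same key can drive the whole query. A rough sketch, assuming the list is called nodes and the NodeID, Cost and NodeIDs members are accessible:

var totals = nodes
    .GroupBy(x => string.Join(",", x.NodeIDs))   // collapse duplicate NodeIDs lists
    .Select(g => g.First())                      // keep one row per unique list
    .GroupBy(n => n.NodeID)                      // then total the remaining costs per node
    .Select(g => new { NodeID = g.Key, TotalCost = g.Sum(n => n.Cost) });

With the sample data above, node 27 ends up with 10.0 + 15.5 + 10.0 = 35.5.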
If you want to remove duplicate objects according to equal lists, you could create a custom IEqualityComparer<T> for lists and use that for Enumerable.GroupBy. Then you just need to create new instances of your class for each group and sum up Cost.
Here is a possible implementation:
public class ListEqualityComparer<T> : IEqualityComparer<List<T>>
{
    public bool Equals(List<T> lhs, List<T> rhs)
    {
        return lhs.SequenceEqual(rhs);
    }

    public int GetHashCode(List<T> list)
    {
        unchecked
        {
            int hash = 23;
            foreach (T item in list)
            {
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}
and here is a query that selects one (unique) instance per group:
var nodes = new List<TheClass>(); // fill ....
var uniqueAndSummedNodes = nodes
    .GroupBy(n => n.NodeIDs, new ListEqualityComparer<int>())
    .Select(grp => new TheClass
    {
        NodeID = grp.First().NodeID, // just use the first, change accordingly
        Cost = grp.Sum(n => n.Cost),
        NodeIDs = grp.Key
    });
nodes = uniqueAndSummedNodes.ToList();
This implementation uses SequenceEqual, which takes the order and the number of occurrences of each number in the list into account.
Edit: I've only just seen that you don't want to sum up each group's Costs but to sum up all groups' Cost; that's simple:
double totalCost = nodes.Sum(n => n.Cost);
If you don't want to sum up the group itself, replace
...
Cost = grp.Sum(n => n.Cost),
with
...
Cost = grp.First().Cost, // presumes that all are the same
I have a sequence of objects that each have a sequence number going from 0 to ushort.MaxValue (0-65535). I have at most about 10,000 items in my sequence, so there should not be any duplicates, and the items are mostly sorted due to the way they are loaded. I only need to access the data sequentially; I don't need them in a list, if that helps. It is also something that is done quite frequently, so it cannot have too high a Big-O.
What is the best way to sort this list?
An example sequence could be (in this example, assume the sequence number is a single byte and wraps at 255):
240 241 242 243 244 250 251 245 246 248 247 249 252 253 0 1 2 254 255 3 4 5 6
The correct order would then be
240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 0 1 2 3 4 5 6
I have a few different approaches in mind, including making an array of ushort.MaxValue size and just incrementing the position, but that seems like a very inefficient way, and I have some problems when the data I receive has a jump in sequence. However, it's O(1) in performance.
Another approach is to order the items normally, then find the split (6-240), and move the first items to the end. But I'm not sure if that is a good idea.
My third idea is to loop the sequence, until I find a wrong sequence number, look ahead until I find the correct one, and move it to its correct position. However, this can potentially be quite slow if there is a wrong sequence number early on.
Is this what you are looking for?
var groups = ints.GroupBy(x => x < 255 / 2)
.OrderByDescending(list => list.ElementAt(0))
.Select(x => x.OrderBy(u => u))
.SelectMany(i => i).ToList();
Example
In:
int[] ints = new int[] { 88, 89, 90, 91, 92, 0, 1, 2, 3, 92, 93, 94, 95, 96, 97, 4, 5, 6, 7, 8, 99, 100, 9, 10, 11, 12, 13 };
Out:
88 89 90 91 92 92 93 94 95 96 97 99 100 0 1 2 3 4 5 6 7 8 9 10 11 12 13
I realise this is an old question, but I also needed to do this and would have liked an answer, so...
Use a SortedSet<FileData> with a custom comparer;
where FileData contains information about the files you are working with
e.g.
struct FileData
{
    public ushort SequenceNumber;
    ...
}

internal class Sequencer : IComparer<FileData>
{
    public int Compare(FileData x, FileData y)
    {
        ushort comparer = (ushort)(x.SequenceNumber - y.SequenceNumber);
        if (comparer == 0) return 0;
        if (comparer < ushort.MaxValue / 2) return 1;
        return -1;
    }
}
As you read file information from disk, add it to your SortedSet.
When you read the items back out of the SortedSet, they are now in the correct order.
Note that SortedSet uses a red-black tree internally, which should give you a nice balance between performance and memory:
Insertion is O(log n)
Traversal is O(n)
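A minimal usage sketch (the sequence numbers below are made up for illustration):

var files = new SortedSet<FileData>(new Sequencer());

// add each item as it is read from disk
files.Add(new FileData { SequenceNumber = 65534 });
files.Add(new FileData { SequenceNumber = 2 });
files.Add(new FileData { SequenceNumber = 65535 });

// enumerating yields the items in wrap-around order: 65534, 65535, 2
foreach (var file in files)
    Console.WriteLine(file.SequenceNumber);

Keep in mind that a SortedSet discards items that compare as equal, so duplicate sequence numbers would be dropped; that matches the question's no-duplicates assumption.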
Possible Duplicate:
Finding all possible combinations of numbers to reach a given sum
I have to create a method that selects numbers from an array whose sum is exactly the required one or, if no such combination exists, the minimal greater one.
What would be the algorithm of this function?
public int[] selectExactSum(int[] X, int SUM) {
}
example:
The numbers are {5, 2, 8, 4, 6} and the required sum is 12.
The result would be: {2, 4, 6}
If the required sum is 13, the result would be {2, 8, 4} - so the sum in this case would be 14, the first minimal greater one.
If the required sum is 15, the possible results would be {5, 2, 8} or {5, 4, 6}. In this case return the one of your choice - probably the first one you get.
What would be the algorithm for custom numbers and sum?
Thanks,
Simon
This is a generalized case of the problem called subset sum. It's an NP-complete problem, thus the best known algorithm is pseudo-polynomial.
If you understand the above linked algorithm, you can deduce the modification necessary to solve your problem.
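As a rough illustration of how the pseudo-polynomial approach can be adapted to "exact sum, or else the minimal greater one", here is a sketch; it assumes non-negative values and a modest total, and the method name just mirrors the question's signature:

// Sketch: mark every reachable subset sum up to the total, remember which
// element first reached each sum, then pick the smallest reachable sum >= the
// target and walk the choices back to recover one subset.
public static int[] SelectExactSum(int[] x, int sum)
{
    int total = x.Sum();
    if (sum < 0) sum = 0;
    if (total < sum) return new int[0];          // no subset can reach the target

    var reachable = new bool[total + 1];
    var from = new int[total + 1];               // from[s] = element index that first reached s
    reachable[0] = true;

    for (int i = 0; i < x.Length; i++)
        for (int s = total; s >= x[i]; s--)      // downwards, so each element is used at most once
            if (reachable[s - x[i]] && !reachable[s])
            {
                reachable[s] = true;
                from[s] = i;
            }

    int best = sum;
    while (!reachable[best]) best++;             // smallest reachable sum >= the required one

    var result = new List<int>();
    for (int s = best; s > 0; s -= x[from[s]])
        result.Add(x[from[s]]);
    return result.ToArray();
}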
How about recursively?
public static int[] SelectExactSum(int[] x, int sum) {
    int[]
        rest = x.Skip(1).ToArray(),
        with = x.Length == 1 ? x : x.Take(1).Concat(SelectExactSum(rest, sum - x[0])).ToArray(),
        without = x.Length == 1 ? new int[0] : SelectExactSum(rest, sum);
    int withSum = with.Sum(), withoutSum = without.Sum();
    return
        withSum >= sum ? (withoutSum >= sum ? (withSum < withoutSum ? with : without) : with) :
        withoutSum >= sum ? without : new int[0];
}
Note: Calling SelectExactSum(new int[] {5,2,8,4,6}, 13) doesn't return {2,8,4} as stated in the question, but {5,8} as that actually sums up to 13.
It took me about 15 minutes to make; you can see it running here:
http://jesuso.net/projects/selectExactSum/index.php?numbers=5%2C2%2C8%2C4%2C6&reqSum=15
And here's the code:
http://jesuso.net/projects/selectExactSum/selectExactSum.txt
I made it as simple as possible, but it's written in PHP; let me know if you need some help translating it to C#.
I have the following collection:
-3, -2, -1, 0, 1, 2, 3
How can I in a single order by statement sort them in the following form:
The negative numbers come first, sorted by their absolute value, followed by the positive numbers.
-1, -2, -3, 0, 1, 2, 3
Combination sorting, first by the sign, then by the absolute value:
list.OrderBy(x => Math.Sign(x)).ThenBy(x => Math.Abs(x));
or:
from x in list
orderby Math.Sign(x), Math.Abs(x)
select x;
This is conceptually similar to the SQL statement:
SELECT x
FROM list
ORDER BY SIGN(x), ABS(x)
In LINQ-to-Objects, the sort is performed only once, not twice.
WARNING: Math.Abs(x) will fail if x == int.MinValue. If this marginal case is important, then you have to handle it separately.
var numbers = new[] { -3, -2, -1, 0, 1, 2, 3 };
var customSorted = numbers.OrderBy(n => n < 0 ? int.MinValue - n : n);
The idea here is to compare non-negative numbers by the value they have, and to compare negative numbers by the value int.MinValue - n, which is -2147483648 - n; because n is negative, the higher the negative number, the lower the resulting key will be.
It doesn't work when the list itself contains the number int.MinValue, because int.MinValue - int.MinValue evaluates to 0, which would be equal to 0 itself. As Richard proposes, it could be done with longs if you need the full range, but performance will be slightly impaired by this.
Try something like (VB.Net example)
OrderBy(Function(x) IIf(x < 0, Math.Abs(x), x * 1000))
...if the values are <1000
You could express it in LINQ, but if I were reading the code two years later, I'd prefer to see something like:
list.OrderBy(i=>i, new NegativeThenPositiveByAscendingAbsoluteValueComparer());
You will need to implement IComparer.
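A sketch of what such a comparer could look like (using the name suggested above; the cast to long sidesteps the int.MinValue overflow of Math.Abs):

// Negatives first, ordered by ascending absolute value, then non-negatives ascending.
class NegativeThenPositiveByAscendingAbsoluteValueComparer : IComparer<int>
{
    public int Compare(int x, int y)
    {
        bool xNeg = x < 0, yNeg = y < 0;
        if (xNeg != yNeg) return xNeg ? -1 : 1;            // negatives sort before non-negatives
        long ax = Math.Abs((long)x), ay = Math.Abs((long)y);
        return ax.CompareTo(ay);                           // within each group: by absolute value
    }
}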
For example, I have an array of floating point numbers:
float[] numbers = new float[] { 1, 34, 65, 23, 56, 8, 5, 3, 234 };
If I use:
Array.Sort(numbers);
Then the array is sorted by the size of the number.
I want to sort the numbers by another criteria, so element A should go before element B if f(A) < f(B), rather than the usual of A < B.
So, for example, if I want to sort them according to their value modulo 5, the array would become:
5, 65, 1, 56, 3, 8, 23, 34, 234
I think it can be done through LINQ, but I'm not sure how.
I want to sort the numbers by another criteria, so element A should go before element B if f(A) < f(B)
numbers.OrderBy(f);
You can use the Comparison<T> overload of Array.Sort:
Array.Sort(numbers, (a,b) => (a % 5).CompareTo(b % 5));
Comparison<T> is just a delegate, so you can use lambdas / anonymous methods. It's not LINQ, but I think it's what you meant.
Using LINQ:
var result = from n in numbers orderby n % 5, n select n;
var sortedNumbers = result.ToArray();
Alternately:
var result = numbers.OrderBy(n => n % 5).ThenBy(n => n);
Ordering by mod 5, then by the number yields the results in the order you specified.
Have a look at IComparer ;-) and let a List sort the elements for you with your custom comparer.
Even more brute: I suggest you make an array of f(x) values and then sort on that!
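A sketch of that key-array idea, using the Array.Sort overload that sorts one array by keys held in a second array (reusing the modulo-5 example from the question). Note this sort is not stable, so ties between keys are not guaranteed any particular order, unlike the OrderBy/ThenBy answer:

float[] numbers = new float[] { 1, 34, 65, 23, 56, 8, 5, 3, 234 };

// build the key array f(x), then sort "numbers" in tandem, ordered by the keys
float[] keys = numbers.Select(n => n % 5).ToArray();
Array.Sort(keys, numbers);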