In the .NET Framework in PresentationCore.dll, there is a generic PriorityQueue<T> class whose code can be found here.
I wrote a short program to test the sorting, and the results weren't great:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using MS.Internal;

namespace ConsoleTest {
    public static class ConsoleTest {
        public static void Main() {
            PriorityQueue<int> values = new PriorityQueue<int>(6, Comparer<int>.Default);
            Random random = new Random(88);

            for (int i = 0; i < 6; i++)
                values.Push(random.Next(0, 10000000));

            int lastValue = int.MinValue;
            int temp;

            while (values.Count != 0) {
                temp = values.Top;
                values.Pop();
                if (temp >= lastValue)
                    lastValue = temp;
                else
                    Console.WriteLine("found sorting error");
                Console.WriteLine(temp);
            }
            Console.ReadLine();
        }
    }
}
Results:
2789658
3411390
4618917
6996709
found sorting error
6381637
9367782
There is a sorting error, and if the sample size is increased, the number of sorting errors grows roughly in proportion.
Have I done something wrong? If not, where exactly is the bug in the code of the PriorityQueue class?
The behavior can be reproduced using the initialization vector [0, 1, 2, 4, 5, 3]. The result is:
[0, 1, 2, 4, 3, 5]
(we can see that 3 is incorrectly placed)
The Push algorithm is correct. It builds a min-heap in a straightforward way (see the sketch after the tree below):
Start from the bottom-right position.
If the value is greater than the parent node, insert it there and return.
Otherwise, move the parent down into the bottom-right position, then try inserting the value at the parent's place (and keep swapping up the tree until the right place has been found).
The resulting tree is:
     0
    / \
   /   \
  1     2
 / \   /
4   5 3
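A minimal sketch of that sift-up insertion (not the exact WPF code; the _heap, _count and _comparer fields are assumed to match the linked source, and array growth is omitted):
internal void Push(T value)
{
    int index = _count++;

    // Walk up from the new bottom-right slot, moving parents down
    // until the parent is no longer greater than the new value.
    while (index > 0)
    {
        int parent = (index - 1) / 2; // HeapParent(index)
        if (_comparer.Compare(value, _heap[parent]) >= 0)
            break;

        _heap[index] = _heap[parent];
        index = parent;
    }

    _heap[index] = value; // the remaining gap is the right place
}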
The issue is with the Pop method. It starts by considering the top node as a "gap" to fill (since we popped it):
     *
    / \
   /   \
  1     2
 / \   /
4   5 3
To fill it, it searches for the smallest immediate child (in this case: 1). It then moves that value up to fill the gap (and the child's position becomes the new gap):
     1
    / \
   /   \
  *     2
 / \   /
4   5 3
It then does the exact same thing with the new gap, so the gap moves down again:
     1
    / \
   /   \
  4     2
 / \   /
*   5 3
When the gap has reached the bottom, the algorithm... takes the bottom-rightmost value of the tree and uses it to fill the gap:
     1
    / \
   /   \
  4     2
 / \   /
3   5 *
Now that the gap is at the bottom-rightmost node, it decrements _count to remove the gap from the tree:
     1
    / \
   /   \
  4     2
 / \
3   5
And we end up with... A broken heap.
To be perfectly honest, I don't understand what the author was trying to do, so I can't fix the existing code. At best, I can replace it with a working version (shamelessly adapted from Wikipedia):
internal void Pop2()
{
    if (_count > 0)
    {
        _count--;
        _heap[0] = _heap[_count];
        Heapify(0);
    }
}

internal void Heapify(int i)
{
    int left = (2 * i) + 1;
    int right = left + 1;
    int smallest = i;

    if (left < _count && _comparer.Compare(_heap[left], _heap[smallest]) < 0)
    {
        smallest = left;
    }

    if (right < _count && _comparer.Compare(_heap[right], _heap[smallest]) < 0)
    {
        smallest = right;
    }

    if (smallest != i)
    {
        var pivot = _heap[i];
        _heap[i] = _heap[smallest];
        _heap[smallest] = pivot;
        Heapify(smallest);
    }
}
The main issue with that code is the recursive implementation, which could be a concern for very large element counts (the recursion depth is only logarithmic in the heap size, and the call is tail-recursive, so it is easy to make iterative, as sketched below). I strongly recommend using an optimized third-party library instead.
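For reference, a minimal iterative rewrite of that Heapify (same behavior, assuming the same _heap, _count and _comparer fields):
internal void Heapify(int i)
{
    while (true)
    {
        int left = (2 * i) + 1;
        int right = left + 1;
        int smallest = i;

        // Find the smallest among the node and its in-range children.
        if (left < _count && _comparer.Compare(_heap[left], _heap[smallest]) < 0)
            smallest = left;
        if (right < _count && _comparer.Compare(_heap[right], _heap[smallest]) < 0)
            smallest = right;

        if (smallest == i)
            return; // the heap property holds here; done

        // Swap the node down one level and continue from there.
        var pivot = _heap[i];
        _heap[i] = _heap[smallest];
        _heap[smallest] = pivot;
        i = smallest;
    }
}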
Edit: I think I found out what is missing. After taking the bottom-rightmost node, the author just forgot to rebalance the heap:
internal void Pop()
{
    Debug.Assert(_count != 0);

    if (_count > 1)
    {
        // Loop invariants:
        //
        // 1. parent is the index of a gap in the logical tree
        // 2. leftChild is
        //    (a) the index of parent's left child if it has one, or
        //    (b) a value >= _count if parent is a leaf node
        //
        int parent = 0;
        int leftChild = HeapLeftChild(parent);

        while (leftChild < _count)
        {
            int rightChild = HeapRightFromLeft(leftChild);
            int bestChild =
                (rightChild < _count && _comparer.Compare(_heap[rightChild], _heap[leftChild]) < 0) ?
                rightChild : leftChild;

            // Promote bestChild to fill the gap left by parent.
            _heap[parent] = _heap[bestChild];

            // Restore invariants, i.e., let parent point to the gap.
            parent = bestChild;
            leftChild = HeapLeftChild(parent);
        }

        // Fill the last gap by moving the last (i.e., bottom-rightmost) node.
        _heap[parent] = _heap[_count - 1];

        // FIX: Rebalance the heap
        int index = parent;
        var value = _heap[parent];

        while (index > 0)
        {
            int parentIndex = HeapParent(index);
            if (_comparer.Compare(value, _heap[parentIndex]) < 0)
            {
                // value is a better match than the parent node so exchange
                // places to preserve the "heap" property.
                var pivot = _heap[index];
                _heap[index] = _heap[parentIndex];
                _heap[parentIndex] = pivot;
                index = parentIndex;
            }
            else
            {
                // Heap is balanced
                break;
            }
        }
    }

    _count--;
}
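As a quick sanity check (assuming the internals are reachable, as in the question's test program), the [0, 1, 2, 4, 5, 3] repro from above now comes out sorted:
var values = new PriorityQueue<int>(6, Comparer<int>.Default);
foreach (int v in new[] { 0, 1, 2, 4, 5, 3 })
    values.Push(v);

while (values.Count != 0) {
    Console.Write(values.Top + " "); // prints: 0 1 2 3 4 5
    values.Pop();
}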
Kevin Gosse's answer identifies the problem. Although his re-balancing of the heap will work, it's not necessary if you fix the fundamental problem in the original removal loop.
As he pointed out, the idea is to replace the item at the top of the heap with the lowest, right-most item, and then sift it down to the proper location. It's a simple modification of the original loop:
internal void Pop()
{
    Debug.Assert(_count != 0);

    if (_count > 0)
    {
        --_count;

        // Logically, we're moving the last item (lowest, right-most)
        // to the root and then sifting it down.
        int ix = 0;
        while (ix < _count/2)
        {
            // find the smallest child
            int smallestChild = HeapLeftChild(ix);
            int rightChild = HeapRightFromLeft(smallestChild);
            if (rightChild < _count && _comparer.Compare(_heap[rightChild], _heap[smallestChild]) < 0)
            {
                smallestChild = rightChild;
            }

            // If the item is less than or equal to the smallest child item,
            // then we're done.
            if (_comparer.Compare(_heap[_count], _heap[smallestChild]) <= 0)
            {
                break;
            }

            // Otherwise, move the child up
            _heap[ix] = _heap[smallestChild];

            // and adjust the index
            ix = smallestChild;
        }

        // Place the item where it belongs
        _heap[ix] = _heap[_count];

        // and clear the position it used to occupy
        _heap[_count] = default(T);
    }
}
Note also that the code as written has a memory leak. This bit of code:
// Fill the last gap by moving the last (i.e., bottom-rightmost) node.
_heap[parent] = _heap[_count - 1];
does not clear the value from _heap[_count - 1]. If the heap is storing reference types, the references remain in the array and cannot be garbage collected until the memory for the heap itself is collected. I don't know where this heap is used, but if it's large and lives for any significant amount of time, it could cause excess memory consumption. The answer is to clear the item after it's copied:
_heap[_count - 1] = default(T);
My replacement code incorporates that fix.
Not reproducible in .NET Framework 4.8
Trying to reproduce this issue in 2020 with the .NET Framework 4.8 implementation of the PriorityQueue<T> as linked in the question using the following XUnit test ...
public class PriorityQueueTests
{
    [Fact]
    public void PriorityQueueTest()
    {
        Random random = new Random();

        // Run 1 million tests:
        for (int i = 0; i < 1000000; i++)
        {
            // Initialize PriorityQueue with default size of 20 using default comparer.
            PriorityQueue<int> priorityQueue = new PriorityQueue<int>(20, Comparer<int>.Default);

            // Using 200 entries per priority queue ensures possible edge cases with duplicate entries...
            for (int j = 0; j < 200; j++)
            {
                // Populate queue with test data
                priorityQueue.Push(random.Next(0, 100));
            }

            int prev = -1;
            while (priorityQueue.Count > 0)
            {
                // Assert that previous element is less than or equal to current element...
                Assert.True(prev <= priorityQueue.Top);
                prev = priorityQueue.Top;

                // remove top element
                priorityQueue.Pop();
            }
        }
    }
}
... succeeds in all 1 million test cases.
So it seems like Microsoft fixed the bug in their implementation:
internal void Pop()
{
    Debug.Assert(_count != 0);

    if (!_isHeap)
    {
        Heapify();
    }

    if (_count > 0)
    {
        --_count;

        // discarding the root creates a gap at position 0. We fill the
        // gap with the item x from the last position, after first sifting
        // the gap to a position where inserting x will maintain the
        // heap property. This is done in two phases - SiftDown and SiftUp.
        //
        // The one-phase method found in many textbooks does 2 comparisons
        // per level, while this method does only 1. The one-phase method
        // examines fewer levels than the two-phase method, but it does
        // more comparisons unless x ends up in the top 2/3 of the tree.
        // That accounts for only n^(2/3) items, and x is even more likely
        // to end up near the bottom since it came from the bottom in the
        // first place. Overall, the two-phase method is noticeably better.

        T x = _heap[_count];        // lift item x out from the last position
        int index = SiftDown(0);    // sift the gap at the root down to the bottom
        SiftUp(index, ref x, 0);    // sift the gap up, and insert x in its rightful position
        _heap[_count] = default(T); // don't leak x
    }
}
As the link in the question only points to the most recent version of Microsoft's source code (currently .NET Framework 4.8), it's hard to say exactly what was changed, but most notably there is now an explicit comment about not leaking memory, so we can assume the memory leak mentioned in Jim Mischel's answer has been addressed as well. This can be confirmed using the Visual Studio diagnostic tools: if there were a memory leak, we would see the memory usage change after a couple of million Pop() operations.
Related
I have data that consists of about 2 million records. I am trying to find the single record closest to a given timestamp. The list is ordered, and each record is represented by the following class:
public class DataPoint
{
    public long OpenTimeTs;
}
I have implemented 3 methods that do the same job and produce the same results, and I have some questions about why one of the approaches performs faster than the others.
Method 1
Uses a binary search within the list of longs.
private DataPoint BinaryFindClosest(List<DataPoint> candles, List<long> times, long dateToFindMs)
{
    int index = times.BinarySearch(dateToFindMs);

    if (index >= 0)
        return candles[index];

    // If not found, List.BinarySearch returns the complement
    // of the index where the element should have been.
    index = ~index;

    // The date searched for is larger than any in the list.
    if (index == times.Count)
        return candles[index - 1];

    // The date searched for is smaller than any in the list.
    if (index == 0)
        return candles[0];

    if (Math.Abs(dateToFindMs - times[index - 1]) < Math.Abs(dateToFindMs - times[index]))
        return candles[index - 1];
    else
        return candles[index];
}
Method 2
Almost the same as Method 1, except it uses a custom object comparer.
private DataPoint BinaryFindClosest2(List<DataPoint> candles, DataPoint toFind)
{
    var comparer = Comparer<DataPoint>.Create((x, y) => x.OpenTimeTs > y.OpenTimeTs ? 1 : x.OpenTimeTs < y.OpenTimeTs ? -1 : 0);

    int index = candles.BinarySearch(toFind, comparer);

    if (index >= 0)
        return candles[index];

    // If not found, List.BinarySearch returns the complement
    // of the index where the element should have been.
    index = ~index;

    // The date searched for is larger than any in the list.
    if (index == candles.Count)
        return candles[index - 1];

    // The date searched for is smaller than any in the list.
    if (index == 0)
        return candles[0];

    if (Math.Abs(toFind.OpenTimeTs - candles[index - 1].OpenTimeTs) < Math.Abs(toFind.OpenTimeTs - candles[index].OpenTimeTs))
        return candles[index - 1];
    else
        return candles[index];
}
Method 3
Finally, this is the method I had been using before discovering the BinarySearch approach on Stack Overflow in some other topic.
private DataPoint FindClosest(List<DataPoint> candles, DataPoint toFind)
{
    long timeToFind = toFind.OpenTimeTs;
    int smallestDistanceIdx = -1;
    long smallestDistance = long.MaxValue;

    for (int i = 0; i < candles.Count(); i++)
    {
        var candle = candles[i];
        var distance = Math.Abs(candle.OpenTimeTs - timeToFind);

        if (distance <= smallestDistance)
        {
            smallestDistance = distance;
            smallestDistanceIdx = i;
        }
        else
        {
            break;
        }
    }

    return candles[smallestDistanceIdx];
}
Question
Now here comes the problem. After running some benchmarks, it came to my attention that the second method (the one using the custom comparer) is the fastest of the three.
I would like to know why the approach with the custom comparer performs faster than the approach that binary-searches within the list of longs.
I am using the following code to test the methods:
var candles = AppState.GetLoadSymbolData();
var times = candles.Select(s => s.OpenTimeTs).ToList();

var dateToFindMs = candles[candles.Count / 2].OpenTimeTs;
var candleToFind = new DataPoint() { OpenTimeTs = dateToFindMs };

var numberOfFinds = 100_000;

var sw = Stopwatch.StartNew();
for (int i = 0; i < numberOfFinds; i++)
{
    var foundCandle = BinaryFindClosest(candles, times, dateToFindMs);
}
sw.Stop();
var elapsed1 = sw.ElapsedMilliseconds;

sw.Restart();
for (int i = 0; i < numberOfFinds; i++)
{
    var foundCandle = BinaryFindClosest2(candles, candleToFind);
}
sw.Stop();
var elapsed2 = sw.ElapsedMilliseconds;

sw.Restart();
for (int i = 0; i < numberOfFinds; i++)
{
    var foundCandle = FindClosest(candles, candleToFind);
}
sw.Stop();
var elapsed3 = sw.ElapsedMilliseconds;

Console.WriteLine($"Elapsed 1: {elapsed1} ms");
Console.WriteLine($"Elapsed 2: {elapsed2} ms");
Console.WriteLine($"Elapsed 3: {elapsed3} ms");
In release mode, the results are as follows:
Elapsed 1: 19 ms
Elapsed 2: 1 ms
Elapsed 3: 60678 ms
Logically I would assume that it should be faster to compare the list of longs, but this is not the case. I tried profiling the code, but it only points to the BinarySearch method's slow execution, nothing else, so there must be some internal processes slowing things down for longs.
Edit: After following the advice, I have implemented a proper benchmark using BenchmarkDotNet, and here are the results:
|             Method |     N |            Mean |         Error |        StdDev |   Gen0 | Allocated |
|------------------- |------ |----------------:|--------------:|--------------:|-------:|----------:|
|  BinaryFindClosest | 10000 |        28.31 ns |      0.409 ns |      0.362 ns |      - |         - |
| BinaryFindClosest2 | 10000 |        75.85 ns |      0.865 ns |      0.722 ns | 0.0014 |      24 B |
|        FindClosest | 10000 | 3,363,223.68 ns | 63,300.072 ns | 52,858.427 ns |      - |       2 B |
It does look like the order in which the methods were executed messed up my initial results. Now it looks like the first method works faster (and it should). The slowest is of course my own implementation. I have tuned it a bit, but it is still the slowest method:
public static DataPoint FindClosest(List<DataPoint> candles, List<long> times, DataPoint toFind)
{
    long timeToFind = toFind.OpenTimeTs;
    int smallestDistanceIdx = -1;
    long smallestDistance = long.MaxValue;

    var count = candles.Count();
    for (int i = 0; i < count; i++)
    {
        var diff = times[i] - timeToFind;
        var distance = diff < 0 ? -diff : diff;

        if (distance < smallestDistance)
        {
            smallestDistance = distance;
            smallestDistanceIdx = i;
        }
        else
        {
            break;
        }
    }

    return candles[smallestDistanceIdx];
}
To make a long story short: use a proper benchmarking tool.
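For reference, a minimal BenchmarkDotNet harness along those lines (a sketch: the synthetic data setup is an assumption, and the three methods from the question are assumed to be exposed as statics on a Benchmarks class):
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class ClosestBenchmarks
{
    [Params(10000)]
    public int N;

    private List<DataPoint> candles;
    private List<long> times;
    private DataPoint toFind;

    [GlobalSetup]
    public void Setup()
    {
        // Ordered synthetic data standing in for the real records.
        candles = Enumerable.Range(0, N)
                            .Select(i => new DataPoint { OpenTimeTs = i * 60_000L })
                            .ToList();
        times = candles.Select(c => c.OpenTimeTs).ToList();
        toFind = new DataPoint { OpenTimeTs = candles[N / 2].OpenTimeTs };
    }

    [Benchmark]
    public DataPoint Binary() => Benchmarks.BinaryFindClosest(candles, times, toFind.OpenTimeTs);

    [Benchmark]
    public DataPoint BinaryComparer() => Benchmarks.BinaryFindClosest2(candles, toFind);

    [Benchmark]
    public DataPoint Linear() => Benchmarks.FindClosest(candles, toFind);
}

public class Program
{
    // BenchmarkRunner handles the warmup, iteration counts and statistics
    // that the hand-rolled Stopwatch loop above gets wrong.
    public static void Main() => BenchmarkRunner.Run<ClosestBenchmarks>();
}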
Take a look at the IL that Methods 1 and 2 generate. It is likely an invalid test; they should produce almost the same machine code.
First: I don't see where you guarantee the ordering, but suppose it is there somehow. A binary search will find the most hidden number in at most 20 to 25 steps (log2(2,000,000) ≈ 21). This test smells weird.
Second: where is the definition of BinaryFindClosestCandle(candles, times, dateToFindMs)? Why is it receiving both the class instances and the list of longs? Why not return the index from the binary search on the long list and use it to index the original list of candles? (If you create the list of longs with Select, the 1:1 relation between the lists is preserved.)
Third: the data you are using is a class, so all elements live on the heap. You are boxing an array of 2 million long numbers in Method 2; it is almost a crime. Dereferencing data from the heap will cost much more than the comparison itself. I still think that the lists are not ordered.
Create a working copy to apply the search algorithm on, as you did with times, but convert it to an array with .ToArray() instead, so the longs sit in one flat, contiguous block (see the sketch below). I don't think there is anything better on the market than the default comparer of long value types.
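A minimal sketch of that suggestion (variable names borrowed from the question):
// Search the raw long[] directly; Array.BinarySearch on a long[] does
// plain value-type comparisons with no comparer object in the loop.
long[] timesArray = candles.Select(c => c.OpenTimeTs).ToArray();

int index = Array.BinarySearch(timesArray, dateToFindMs);
if (index < 0)
    index = ~index; // complement of the insertion point, as with List<T>.BinarySearch

// Clamp to the bounds, then pick the nearer neighbour,
// exactly as BinaryFindClosest does above.
if (index == timesArray.Length)
    index--;
else if (index > 0 && dateToFindMs - timesArray[index - 1] < timesArray[index] - dateToFindMs)
    index--;

DataPoint closest = candles[index];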
EDIT FOR SOLUTION HINT:
Depending on how many insertions you do per lookup of the minimum value, I would go for the following:
if (insertions / lookups > 300,000)
{
    a. Store the index of the minimum (and the minimum value) in a dedicated field, plus an IsUpdated flag that turns false at the first deletion from the list.
    b. Spawn a parallel thread to refresh that index and the minimum value every now and then (depending on how often you do the lookups) while IsUpdated is false, or refresh lazily when you start a lookup with IsUpdated == false.
}
else
{
    Use a dictionary with the long as the key (I suppose two entities with the same long value are likely to be considered equal).
}
I'm trying to solve a problem on Codewars, and the unit tests provided make absolutely no sense...
The problem is as follows, and it sounds simple enough that you'd have something working in 5 minutes:
Consider a sequence u where u is defined as follows:
The number u(0) = 1 is the first one in u.
For each x in u, then y = 2 * x + 1 and z = 3 * x + 1 must be in u too.
There are no other numbers in u.
Ex: u = [1, 3, 4, 7, 9, 10, 13, 15, 19, 21, 22, 27, ...]
1 gives 3 and 4, then 3 gives 7 and 10, 4 gives 9 and 13, then 7 gives 15 and 22 and so on...
Task:
Given parameter n the function dbl_linear (or dblLinear...) returns the element u(n) of the ordered (with <) sequence u.
Example:
dbl_linear(10) should return 22
At first I used a SortedSet with a LINQ query, as I didn't really care about efficiency; I quickly learned that this operation would have to handle ranges where n could equal ~100000 in under 12 seconds.
So this abomination was born, then butchered time and time again, since a for loop would generate issues for some reason. It was then "upgraded" to a while loop, which got slightly more unit tests passing (4 -> 8).
public class DoubleLinear {
    public static int DblLinear(int n) {
        ListSet<int> table = new ListSet<int> {1};
        for (int i = 0; i < n; i++) {
            table.Put(Y(table[i]));
            table.Put(Z(table[i]));
        }
        table.Sort();
        return table[n];
    }

    private static int Y(int y) {
        return 2 * y + 1;
    }

    private static int Z(int z) {
        return 3 * z + 1;
    }
}

public class ListSet<T> : List<T> {
    public void Put(T item) {
        if (!this.Contains(item))
            this.Add(item);
    }
}
With this code it still fails (times out) on calculations in excess of n = 75000, but it passes up to 8 tests.
I've checked that other people have passed this, so it is possible; however, I cannot see what they wrote in order to learn from it.
Can anyone provide insight into what could be wrong here? I'm sure the answer is blatantly obvious and I'm being dumb.
Also, is using a custom list in this way a bad idea? Is there a better way?
ListSet is slow for sorting, and you constantly trigger memory reallocation as you build the set. I would start by allocating the table at its full size up front, though honestly a bare-bones array of the size you need is best for performance.
If you know you need n = 75,000+, allocate a ListSet (or an array!) of that size up front. If the unit tests take you into the stratosphere, there is a binary segmentation technique we can discuss, but that's a bit involved and logically tougher to build.
I don't see anything logically wrong with the code; the numbers it generates are correct from where I'm standing.
EDIT: Since you know 3x + 1 > 2x + 1 for the same x, you can generate u in order by merging its two "children" streams. Besides the sequence built so far, you only have to maintain:
Target index in u
Read index for y (which element of u is the next to be doubled)
Read index for z (which element of u is the next to be tripled)
Current val for y
Current val for z
public static int DblLinear(int target) {
    var u = new List<int>(target + 1) { 1 };
    int ind_y = 0;   // next element of u to feed into y = 2x + 1
    int ind_z = 0;   // next element of u to feed into z = 3x + 1

    while (u.Count <= target) {
        int val_y = 2 * u[ind_y] + 1;
        int val_z = 3 * u[ind_z] + 1;
        int next = Math.Min(val_y, val_z);
        u.Add(next);

        // Advance whichever pointer produced the value; on a tie both
        // advance, so duplicates are never added.
        if (val_y == next) ind_y++;
        if (val_z == next) ind_z++;
    }

    return u[target];
}
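A quick check against the example from the question:
Console.WriteLine(DblLinear(10)); // prints 22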
You could modify the advancing ifs to be while loops (a more efficient critical path) if you either widen the branch to two conditions or implement a back-step for when you blow past your target index.
Keeping allocations out of the hot loop will definitely speed your calculations up, even if people want to (incorrectly) bellyache about branch prediction in such an easily predictable case.
Also, did you turn optimization on in your Visual Studio project? If you're submitting a binary and not a code file, that can also shave quite a bit of time.
I want to solve the knapsack problem recursively in C#. This is my code:
public int f(int n, int remain)
{
    if (n < 0) return 0;

    if (w[n] > remain)
    {
        // Thread.VolatileWrite(ref check[n], 0);
        check[n] = 0;
        return f(n - 1, remain);
    }
    else
    {
        int a = f(n - 1, remain);
        int b = p[n] + f(n - 1, remain - w[n]);

        if (a >= b)
        {
            // Thread.VolatileWrite(ref check[n], 0);
            check[n] = 0;
            return a;
        }
        else
        {
            // Thread.VolatileWrite(ref check[n], 1);
            check[n] = 1;
            return b;
        }
    }
}
w is an array that holds the weights and p is an array that holds the prices; n is the index of the item under consideration and remain is the remaining weight capacity (initially the maximum weight).
My problem is with the check array. I use it to record which items end up in the bag, but it does not always work: sometimes the solution is right and sometimes it is not. I have tried everything but could not figure it out. How can I solve this?
The usage of the check array is wrong: it records only the last assignment made for each item, which is not necessarily the assignment along the path that was actually chosen.
Here is a counterexample that explains why it does not work.
Assume:
weights = [1,2]
values = [2,1]
w = 2
Now, let's examine what will happen:
f(1,2):
    f(0,2):
        f(-1,2) = 0
        a = 0
        f(-1,1) = 0
        b = 2 + 0 = 2
        b > a -> check[0] = 1
        return f(0,2) = 2
    a = 2
    f(0,0):
        w[0] > 0: check[0] = 0
        return f(-1,0) = 0
    return f(0,0) = 0
    b = 1 + 0 = 1
    a > b: check[1] = 0
return f(1,2) = 2
So the optimal solution to this problem is 2 (choosing the first element), but your solution chose no element (check = [0,0]).
This happens because the changes to check are global rather than local to each call, so assignments made at deeper levels of the recursion do not reflect the choices made at higher levels.
To handle it you can either:
Make your list not global, so each recursive call has its own instance. The "parent" call then chooses not only which value to take but, according to that choice, which child's list to keep, appending its own choice before passing the list up to its own caller.
Switch to a DP solution, or mimic one, and then use the table you created to figure out which elements to choose, as I described in this thread: How to find which elements are in the bag, using Knapsack Algorithm [and not only the bag's value]? (a sketch of this option follows below)
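A minimal sketch of that DP option (my names, not the question's): build the classic table, then walk it backwards to recover which items were taken.
using System;
using System.Collections.Generic;

static class Knapsack
{
    // dp[i, r] = best value achievable with the first i items and capacity r.
    public static int[,] BuildTable(int[] w, int[] p, int capacity)
    {
        int n = w.Length;
        var dp = new int[n + 1, capacity + 1];

        for (int i = 1; i <= n; i++)
            for (int r = 0; r <= capacity; r++)
            {
                dp[i, r] = dp[i - 1, r]; // skip item i-1
                if (w[i - 1] <= r)       // or take it, if it fits
                    dp[i, r] = Math.Max(dp[i, r], p[i - 1] + dp[i - 1, r - w[i - 1]]);
            }

        return dp;
    }

    // If the best value changed when item i-1 became available, it was taken.
    public static List<int> ChosenItems(int[,] dp, int[] w, int capacity)
    {
        var chosen = new List<int>();

        for (int i = dp.GetLength(0) - 1, r = capacity; i > 0; i--)
            if (dp[i, r] != dp[i - 1, r])
            {
                chosen.Add(i - 1);
                r -= w[i - 1];
            }

        return chosen;
    }
}
On the counterexample above (weights = [1,2], values = [2,1], w = 2) this returns a value of 2 with item 0 chosen, as expected.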
I have a List that contains these values: {1, 2, 3, 4, 5, 6, 7}, and I want to be able to retrieve unique combinations of three. The result should be like this:
{1,2,3}
{1,2,4}
{1,2,5}
{1,2,6}
{1,2,7}
{2,3,4}
{2,3,5}
{2,3,6}
{2,3,7}
{3,4,5}
{3,4,6}
{3,4,7}
{3,4,1}
{4,5,6}
{4,5,7}
{4,5,1}
{4,5,2}
{5,6,7}
{5,6,1}
{5,6,2}
{5,6,3}
I already have 2 for loops that are able to do this:
for (int first = 0; first < test.Count - 2; first++)
{
    int second = first + 1;
    for (int offset = 1; offset < test.Count; offset++)
    {
        int third = (second + offset) % test.Count;
        if (Math.Abs(first - third) < 2)
            continue;

        List<int> temp = new List<int>();
        temp.Add(test[first]);
        temp.Add(test[second]);
        temp.Add(test[third]);
        result.Add(temp);
    }
}
But since I'm learning LINQ, I wonder if there is a smarter way to do this?
UPDATE: I used this question as the subject of a series of articles starting here; I'll go through two slightly different algorithms in that series. Thanks for the great question!
The two solutions posted so far are correct but inefficient when the numbers get large. They both use the same algorithm: first enumerate all the possibilities:
{1, 1, 1 }
{1, 1, 2 },
{1, 1, 3 },
...
{7, 7, 7}
And while doing so, filter out any where the second is not larger than the first, and the third is not larger than the second. This performs 7 x 7 x 7 filtering operations, which is not that many, but if you were trying to get, say, combinations of ten elements from thirty, that's 30 x 30 x 30 x 30 x 30 x 30 x 30 x 30 x 30 x 30, which is rather a lot. You can do better than that.
I would solve this problem as follows. First, produce a data structure which is an efficient immutable set. Let me be very clear what an immutable set is, because you are likely not familiar with them. You normally think of a set as something you add items and remove items from. An immutable set has an Add operation but it does not change the set; it gives you back a new set which has the added item. The same for removal.
Here is an implementation of an immutable set where the elements are integers from 0 to 31:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// A super-cheap immutable set of integers from 0 to 31;
// just a convenient wrapper around bit operations on an int.
internal struct BitSet : IEnumerable<int>
{
    public static BitSet Empty { get { return default(BitSet); } }
    private readonly int bits;
    private BitSet(int bits) { this.bits = bits; }

    public bool Contains(int item)
    {
        Debug.Assert(0 <= item && item <= 31);
        return (bits & (1 << item)) != 0;
    }

    public BitSet Add(int item)
    {
        Debug.Assert(0 <= item && item <= 31);
        return new BitSet(this.bits | (1 << item));
    }

    public BitSet Remove(int item)
    {
        Debug.Assert(0 <= item && item <= 31);
        return new BitSet(this.bits & ~(1 << item));
    }

    IEnumerator IEnumerable.GetEnumerator() { return this.GetEnumerator(); }

    public IEnumerator<int> GetEnumerator()
    {
        for (int item = 0; item < 32; ++item)
            if (this.Contains(item))
                yield return item;
    }

    public override string ToString()
    {
        return string.Join(",", this);
    }
}
Read this code carefully to understand how it works. Again, always remember that adding an element to this set does not change the set. It produces a new set that has the added item.
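For instance, a quick illustration of that immutability:
BitSet s = BitSet.Empty.Add(3).Add(5);
BitSet t = s.Add(7); // s is untouched; t is a new set

Console.WriteLine(s); // 3,5
Console.WriteLine(t); // 3,5,7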
OK, now that we've got that, let's consider a more efficient algorithm for producing your combinations.
We will solve the problem recursively. A recursive solution always has the same structure:
Can we solve a trivial problem? If so, solve it.
If not, break the problem down into a number of smaller problems and solve each one.
Let's start with the trivial problems.
Suppose you have a set and you wish to choose zero items from it. The answer is clear: there is only one possible combination with zero elements, and that is the empty set.
Suppose you have a set with n elements in it and you want to choose more than n elements. Clearly there is no solution, not even the empty set.
We have now taken care of the cases where the set is empty or the number of elements chosen is more than the number of elements total, so we must be choosing at least one thing from a set that has at least one thing.
Of the possible combinations, some of them have the first element in them and some of them do not. Find all the ones that have the first element in them and yield them; we do this by recursing to choose one fewer element on the set that is missing the first element.
The ones that do not have the first element in them we find by enumerating the combinations of the set without the first element.
static class Extensions
{
    public static IEnumerable<BitSet> Choose(this BitSet b, int choose)
    {
        if (choose < 0) throw new InvalidOperationException();

        if (choose == 0)
        {
            // Choosing zero elements from any set gives the empty set.
            yield return BitSet.Empty;
        }
        else if (b.Count() >= choose)
        {
            // We are choosing at least one element from a set that has
            // a first element. Get the first element, and the set
            // lacking the first element.
            int first = b.First();
            BitSet rest = b.Remove(first);

            // These are the combinations that contain the first element:
            foreach (BitSet r in rest.Choose(choose - 1))
                yield return r.Add(first);

            // These are the combinations that do not contain the first element:
            foreach (BitSet r in rest.Choose(choose))
                yield return r;
        }
    }
}
Now we can ask the question that you need the answer to:
class Program
{
    static void Main()
    {
        BitSet b = BitSet.Empty.Add(1).Add(2).Add(3).Add(4).Add(5).Add(6).Add(7);
        foreach (BitSet result in b.Choose(3))
            Console.WriteLine(result);
    }
}
And we're done. We have generated only as many sequences as we actually need. (Though we have done a lot of set operations to get there, but set operations are cheap.) The point here is that understanding how this algorithm works is extremely instructive. Recursive programming on immutable structures is a powerful tool that many professional programmers do not have in their toolbox.
You can do it like this:
var data = Enumerable.Range(1, 7);
var r = from a in data
        from b in data
        from c in data
        where a < b && b < c
        select new {a, b, c};

foreach (var x in r) {
    Console.WriteLine("{0} {1} {2}", x.a, x.b, x.c);
}
Edit: Thanks Eric Lippert for simplifying the answer!
var ints = new int[] { 1, 2, 3, 4, 5, 6, 7 };

var combinations = ints.SelectMany(a => ints.Where(b => b > a)
                       .SelectMany(b => ints.Where(c => c > b)
                       .Select(c => new { a, b, c })));
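Enumerating it works just like the query-syntax version above:
foreach (var x in combinations) {
    Console.WriteLine("{0} {1} {2}", x.a, x.b, x.c);
}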
I need an algorithm that searches for a specific int in an array of ints. That number must appear >= arraySize/2 times.
Example: [4, 4, 3, 5, 5, 5, 5, 5, 5, 6]
Array size: 10
The number 5 occurs 6 times, so it is the result of the algorithm.
But I need to do this without additional memory and in O(n) time, in one pass.
Is this even possible? Any suggestions on how to start?
It is indeed possible; the task is known as the "Dominant Element" problem and is used in interviews and as homework. Read the article below for a proper analysis; the solution itself is simple but not easy: proving that it indeed does what it promises is not quite trivial (unless, of course, you know the answer).
http://www.cse.iitk.ac.in/users/sbaswana/Courses/ESO211/problem.pdf
element x;
int count ← 0;

For (i = 0 to n − 1)
{
    if (count == 0) { x ← A[i]; count++; }
    else if (A[i] == x) count++;
    else count--;
}

Check if x is a dominant element by scanning array A.
Note that although the time is O(n), as far as I'm aware it is not possible to do it in one pass unless you know for sure there is a dominant element.
As for additional memory, you need memory for i, the loop counter; x, the candidate element; and count, the size of the imaginary working set. That's O(1), which is usually considered acceptable for such problems.
Moore describes the solution to this problem on his web site (with an example here).
Edit: Here is some Java code demonstrating the algorithm as described:
public class Majority
{
    public static void main(String[] args)
    {
        int[] a = new int[]{4, 4, 3, 5, 5, 5, 5, 5, 5, 6};

        int count = 0;
        int candidateIndex = 0;
        for (int i = 0; i < a.length; i++)
        {
            if (count == 0)
            {
                candidateIndex = i;
                count++;
            }
            else
            {
                if (a[i] == a[candidateIndex])
                    count++;
                else
                    count--;
            }
        }

        System.out.println("Majority element: " + a[candidateIndex]);
    }
}
After you get your candidateIndex, you can iterate through the array again to verify that the candidate indeed occurs more than N / 2 times.
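A minimal sketch of that verification pass (in C#, like the rest of this page; the array and candidate are assumed to come from the scan above):
// Second pass: the scan only produces a candidate, so confirm that it
// really does occur more than N / 2 times before trusting it.
int[] a = { 4, 4, 3, 5, 5, 5, 5, 5, 5, 6 };
int candidate = 5; // the value found by the first pass

int occurrences = 0;
foreach (int value in a)
    if (value == candidate)
        occurrences++;

if (occurrences > a.Length / 2)
    Console.WriteLine("Majority element: " + candidate);
else
    Console.WriteLine("No majority element");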