Priority Queue remove items with same priority first one entered

I have a working priority queue that inserts items in order and removes them in order. When two items have the same priority, it removes the one that was entered first.
But when three items have the same priority, it does not remove the first one entered. How would I go about fixing this, or is this the expected behaviour?
Dequeue function:
public void deQueue(Animal item)
{
item = items.elements[0];
items.elements[0] = items.elements[numItems - 1];
numItems--;
items.ReheapDown(0, numItems - 1);
}
ReheapDown Function:
public void ReheapDown(int root, int bottom)
{
int maxchild, rightchild, leftchild;
leftchild = root * 2 + 1;
rightchild = root * 2 + 2;
if (leftchild <= bottom)
{
if (leftchild == bottom)
maxchild = leftchild;
else
{
if (elements[leftchild].priority <= elements[rightchild].priority)
maxchild = rightchild;
else
maxchild = leftchild;
}
if (elements[root].priority < elements[maxchild].priority)
{
Swap(elements, root, maxchild);
ReheapDown(maxchild, bottom);
}
}
}

In this line
if (elements[leftchild].priority <= elements[rightchild].priority)
you pick the right child whenever the two children have equal priority. So let's say you enter the numbers [1, 2, 2, 3], in that order. Let's call the second 2 "2*", to differentiate it from the first one. The resulting heap is:
    1
   / \
  2   2*
 /
3
Now, you remove 1. So then you replace the 1 with 3:
  3
 / \
2   2*
In your ReheapDown method, the parent has two children, and you're selecting the smallest child. When you compare the two 2's, you have this code:
if (elements[leftchild].priority <= elements[rightchild].priority)
maxchild = rightchild;
else
maxchild = leftchild;
Since 2 == 2, it sets maxchild = rightchild, so the new root becomes 2*--the second 2 that was entered. Your heap now looks like this:
  2*
 / \
2   3
And the next thing to be removed will be 2*.
You might think, then, that if you change that <= to <, it'll solve your problem. But it won't.
When you consider all the different ways that the heap can mutate, it's impossible to guarantee that equal items will be removed in the same order that they were inserted, unless you supply additional information. Consider what happens if you enter items in the order [1, 3, 2, 2*]. The resulting heap is:
    1
   / \
  2*  2
 /
3
If you remove 1, you end up with:
  3
 / \
2*  2
In this case, the <= would help you out. But in the previous case, it wouldn't.
The only way to guarantee the removal order of equal items is to add a second condition to your comparison: basically, you have to make those equal items unequal. You need to add either a timestamp or a sequential number to the key so that you can identify the insertion order.
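The sequential-number idea can be sketched as follows. This is a minimal illustration, not the asker's class: StablePriorityQueue, ComesBefore and the tuple layout are hypothetical names, and it assumes (as the question's ReheapDown suggests) that a larger priority value is removed first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stable max-priority queue: ties are broken by insertion order.
public class StablePriorityQueue<T>
{
    // Each entry carries its priority plus a sequence number assigned on insert.
    private readonly List<(int Priority, long Sequence, T Item)> _heap =
        new List<(int Priority, long Sequence, T Item)>();
    private long _nextSequence;

    public int Count => _heap.Count;

    // True when entry a must be removed before entry b:
    // higher priority wins; on equal priority the earlier insert wins.
    private bool ComesBefore(int a, int b)
    {
        if (_heap[a].Priority != _heap[b].Priority)
            return _heap[a].Priority > _heap[b].Priority;
        return _heap[a].Sequence < _heap[b].Sequence;
    }

    private void Swap(int a, int b)
    {
        var tmp = _heap[a];
        _heap[a] = _heap[b];
        _heap[b] = tmp;
    }

    public void Enqueue(T item, int priority)
    {
        _heap.Add((priority, _nextSequence++, item));
        int child = _heap.Count - 1;
        // Sift up while the new entry outranks its parent.
        while (child > 0)
        {
            int parent = (child - 1) / 2;
            if (!ComesBefore(child, parent)) break;
            Swap(child, parent);
            child = parent;
        }
    }

    public T Dequeue()
    {
        if (_heap.Count == 0) throw new InvalidOperationException("Queue is empty.");
        T result = _heap[0].Item;
        int last = _heap.Count - 1;
        _heap[0] = _heap[last];
        _heap.RemoveAt(last);
        // Sift down: swap with whichever child should come out first.
        int root = 0;
        while (true)
        {
            int left = 2 * root + 1, right = left + 1, best = root;
            if (left < _heap.Count && ComesBefore(left, best)) best = left;
            if (right < _heap.Count && ComesBefore(right, best)) best = right;
            if (best == root) break;
            Swap(root, best);
            root = best;
        }
        return result;
    }
}
```

Because the (priority, sequence) pair is never equal for two entries, the heap can rearrange itself freely and equal-priority items still come out in the order they went in.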

Related

Algorithm to find max occurrences of a substring with value of a given function

I have to find max(s.length * s.count) for any substring s of a given string t, where s.length is the length of the substring and s.count is the number of times s occurs within t. Substrings may overlap within t.
Example:
For the string aaaaaa, the substring aaa has the max (occurrences * length), substrings and occurrences are:
a: 6
aa: 5
aaa: 4
aaaa : 3
aaaaa: 2
aaaaaa: 1
So aaa is our winner: 4 occurrences * length 3 is 12. Yes, aaaa also has a score of 12, but aaa comes first.
I have tried the only means I know or can figure out, but I have an input string of 100,000 length, and just finding all the substrings is O(n^2), and this hangs my program:
var theSet = new HashSet<string>();
for (int i = 1; i < source.Length; i++)
{
for (int start = 0; start <= source.Length - i; start++)
{
var sub = source.Substring(start, i);
if (!theSet.Contains(sub))
{
theSet.Add(sub);
}
}
}
...
// Some not-noteworthy benchmark related code
...
int maxVal = 0;
foreach (var sub in subs)
{
var count = 0;
for (var i = 0; i < source.Length - sub.Length + 1; i++)
{
if (source.Substring(i, sub.Length).Equals(sub)) count++;
}
if (sub.Length * count > maxVal)
{
maxVal = sub.Length * count;
}
}
I know I am looking for a relatively unknown algorithm and/or data structure here, as Google yields no results that closely match the problem. In fact, the only things I found on Google were the costly algorithms I have attempted to use in the above code.

Edit: Just realized that the problem has a solution on GFG: https://www.geeksforgeeks.org/substring-highest-frequency-length-product/
This can be solved in O(n) time by applying three well-known algorithms: Suffix Array, LCP Array and Largest Rectangular Area in a Histogram.
I will not provide any code as implementations of these algorithms can easily be found on the Internet. I will assume the input string is "banana" and try to explain the steps and how they work.
1. Run Suffix Array - O(n)
The Suffix Array algorithm sorts the suffixes of the string alphabetically. For the input "banana", the output is going to be the array [5, 3, 1, 0, 4, 2], where 5 corresponds to the suffix starting at position 5 ("a"), 3 corresponds to the suffix starting at position 3 ("ana"), 1 corresponds to the suffix starting at position 1 ("anana"), etc. After we compute this array, it becomes much easier to count the occurrences of a substring because the equal substrings are placed consecutively:
a
ana
anana
banana
na
nana
For example, we can immediately see that the substring "ana" occurs twice by looking at the 2nd and the 3rd suffixes in the above list. Similarly, we can say the substring "n" also occurs twice by looking at the 5th and the 6th.
2. Run LCP Array - O(n)
The LCP algorithm computes the length of the longest common prefix between every consecutive pair of suffixes in the suffix array. The output is going to be [1, 3, 0, 0, 2] for "banana":
a
ana // "a" and "ana" share the prefix "a", which is of length 1
anana // "ana" and "anana" share the prefix "ana", which is of length 3
banana // "anana" and "banana" share no prefix, so 0
na // "banana" and "na" share no prefix, so 0
nana // "na" and "nana" share the prefix "na", which is of length 2
Now if we plot the output of the LCP algorithm as a histogram:
 x
 x  x
xx  x
-----
01234
-----
aaabnn
 nnaaa
 aan n
  na a
  an
   a
Now, here is the main observation: every rectangle in the histogram that rests on the x axis corresponds to a substring and its occurrences: the rectangle's width is equal to s.count - 1 and its height is equal to s.length.
For example consider this rectangle in the lower left corner, that corresponds to the substring "a".
xx
--
01
The rectangle is of height 1, which is "a".length and of width 2, which is "a".count - 1. And the value we need (s.count * s.length) is almost the area of the rectangle.
3. Find the largest rectangle in the histogram - O(n)
Now all we need to do is to find the largest rectangle in the histogram to find the answer to the problem, with the simple nuance that while calculating the area of the rectangle we need to add 1 to its width. This can be done by simply adding a + 1 in the area calculation logic in the algorithm.
For the "banana" example, the largest rectangle is the following (considering we added +1 to every rectangle's width):
x
x
x
-
1
We add one to its width and calculate its area as 2 * 3 = 6, which equals the number of times the substring "ana" occurs times its length.
Each of the 3 steps takes O(n) time, for an overall time complexity of O(n).
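The three steps can be sketched in C# as below. This is a rough illustration under stated assumptions, not the linked article's code: SubstringScore and MaxLengthTimesCount are hypothetical names, and step 1 uses a plain comparison sort of the suffixes instead of a linear-time suffix array construction, so this sketch is not O(n) overall. It also starts the best score at the length of the whole string, because a substring that occurs only once never shows up as a rectangle; that starting value is an addition not spelled out in the explanation above.

```csharp
using System;
using System.Collections.Generic;

public static class SubstringScore
{
    // Returns max(s.Length * number of occurrences of s in t) over all substrings s of t.
    public static long MaxLengthTimesCount(string t)
    {
        int n = t.Length;
        if (n == 0) return 0;

        // Step 1: suffix array via a simple comparison sort (NOT the O(n) construction).
        int[] sa = new int[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        Array.Sort(sa, (a, b) => string.CompareOrdinal(t, a, t, b, n));

        // Step 2: LCP array (Kasai-style). lcp[k] is the common prefix length
        // of the suffixes at sa[k] and sa[k + 1].
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) rank[sa[i]] = i;
        int[] lcp = new int[n - 1];
        int h = 0;
        for (int i = 0; i < n; i++)
        {
            if (rank[i] == n - 1) { h = 0; continue; }
            int j = sa[rank[i] + 1];
            while (i + h < n && j + h < n && t[i + h] == t[j + h]) h++;
            lcp[rank[i]] = h;
            if (h > 0) h--;
        }

        // Step 3: largest rectangle in the LCP histogram, counting width + 1.
        long best = n; // the whole string: length n, one occurrence
        var stack = new Stack<int>(); // indices of bars with non-decreasing heights
        for (int i = 0; i <= lcp.Length; i++)
        {
            int height = i < lcp.Length ? lcp[i] : 0; // trailing 0 flushes the stack
            while (stack.Count > 0 && lcp[stack.Peek()] >= height)
            {
                int barHeight = lcp[stack.Pop()];
                int left = stack.Count > 0 ? stack.Peek() + 1 : 0;
                int width = i - left; // number of adjacent LCP bars covered
                best = Math.Max(best, (long)barHeight * (width + 1));
            }
            stack.Push(i);
        }
        return best;
    }
}
```

For the 100,000-character input mentioned in the question, the sort in step 1 would probably need to be replaced by a faster suffix array construction; steps 2 and 3 are already linear.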

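This does the trick for counting the overlapping occurrences of one substring, although it is not very efficient: content.Substring(1) copies the rest of the string on every iteration, so a single count costs roughly O(n^2) rather than O(n). I can't think of a more efficient way, though...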
static void TestRegexes()
{
    var n = CountSubs("aaaaaa", "a");
    var nn = CountSubs("aaaaaa", "aa");
    var nnn = CountSubs("aaaaaa", "aaa");
    var nnnn = CountSubs("aaaaaa", "aaaa");
    var nnnnn = CountSubs("aaaaaa", "aaaaa");
    var nnnnnn = CountSubs("aaaaaa", "aaaaaa");
}
private static int CountSubs(string content, string needle)
{
    int count = 0;
    // Check every suffix of content, so overlapping matches are counted.
    while (content.Length >= needle.Length)
    {
        if (content.StartsWith(needle))
        {
            count++;
        }
        // Drop the first character (this copies the rest of the string).
        content = content.Substring(1);
    }
    return count;
}

Finding the index of the first gap in a sequence of numbers

I have a problem. It's a bit difficult to explain, but I am going to try. I have a few buttons, which I want to give a sequence. I created a dictionary with the buttons and the SequenceNum. Now when I am in the sequence select screen, I can click a button and give it a number. Now I have this code for getting the next highest number:
foreach (KeyValuePair<string, SelectedHexagonRegistryObject> row in SelectedHexagonRegistry.ToList())
{
if (row.Value.SequenceNum >= NextSequenceNum)
{
NextSequenceNum = row.Value.SequenceNum + 1;
}
}
Now NextSequenceNum always becomes the next highest number, but there is a problem with this. Say I have given 6 buttons a sequence number and I click button 3 again: its number gets reset. That is supposed to happen. But when I want to swap numbers, for example, I also click the button with SequenceNum 4, so now there are 2 buttons with no sequence number. If I click one of those 2 buttons, the next number is 7. The problem is that there is now a gap, because the numbers 3 and 4 were reset, so I want the next number to be the lowest number in the gap. How can I create something like that?
Example:
I have 6 buttons, which I want to give a sequence number.
When I click the first button it will get SequenceNum 1
Then when I click the rest, each one gets the highest given number + 1.
At the end I have 6 buttons, each with a sequence number from 1 to 6.
When I click, for example, the button with SequenceNum=4 again, I unset its SequenceNum to 0 (which stands for null).
Now when I click that same button another time, I want it to get the number of the gap that I created, so SequenceNum 4.
Now the problem is that it gets number 7, because it takes the highest + 1.
I need code to fill the gaps and I can't seem to figure out how I can make that!

Something like this should work, unless I am mistaken:
static int GetNextSequenceNo(Dictionary<string, SelectedHexagonRegistryObject> registry)
{
// order the assigned values ascending; unset entries (SequenceNum == 0) are skipped
var vals = registry.Values.Where(s => s.SequenceNum > 0).OrderBy(s => s.SequenceNum).ToList();
// find the first value where .SequenceNum is different from (idx + 1)
var firstGap = vals.TakeWhile((s, idx) => s.SequenceNum == idx + 1).Count();
// take the sequenceNum from the previous item and increment
if (firstGap > 0)
return vals[firstGap - 1].SequenceNum + 1;
else
return 1;
}

In general, if you have a list of integers and you want to find the next lowest available number, you can sort them (if they aren't already sorted) and then walk the list one item at a time, comparing the current item to the next item. As soon as you find one where the next item is greater than the current item by more than 1, you have found a gap and can return currentItem + 1.
If you get to the end of the list, then you just return the next number. For example:
private static int GetNextAvailableNumber(IReadOnlyCollection<int> numberSequence)
{
// If the list is null or empty, return the first number (this uses 1, modify as needed)
if (numberSequence == null || !numberSequence.Any()) return 1;
var orderedNumbers = numberSequence.OrderBy(n => n).ToList();
for (var i = 0; i < orderedNumbers.Count - 1; i++)
{
var thisNumber = orderedNumbers[i];
var nextNumber = orderedNumbers[i + 1];
if (nextNumber - thisNumber > 1) return thisNumber + 1;
}
return orderedNumbers.Last() + 1;
}
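Note that the pairwise walk above only sees gaps between neighbouring values, so for an input such as [2, 3] it returns 4 rather than 1. A set-based variant is sketched below as one possible alternative; SequenceGaps and FirstFreeNumber are hypothetical names, and treating values of 0 or less as "unset" is an assumption taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceGaps
{
    // Returns the smallest integer >= 1 that is not in use.
    // Values <= 0 are treated as "unset" and ignored (assumption from the question).
    public static int FirstFreeNumber(IEnumerable<int> usedNumbers)
    {
        var used = new HashSet<int>();
        foreach (int n in usedNumbers)
        {
            if (n > 0) used.Add(n);
        }

        // Count upward from 1 until a number is not taken.
        int candidate = 1;
        while (used.Contains(candidate)) candidate++;
        return candidate;
    }
}
```

With the registry from the question this could be called as FirstFreeNumber(SelectedHexagonRegistry.Values.Select(v => v.SequenceNum)), which needs System.Linq for the Select call.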

Find remaining elements in the sequence

I have this small task to do:
There are two sequences of numbers:
A[0], A[1], ... , A[n].
B[0], B[1], ... , B[m].
Perform the following operations on the sequence A:
Remove the items whose indices are divisible by B[0].
From the remaining items, remove those whose indices are divisible by B[1].
Repeat this process up to B[m].
Output the items that finally remain.
Input is like this: (where -1 is delimiter for two sequences A and B)
1 2 4 3 6 5 -1 2 -1
Here goes my code (explanation done via comments):
List<int> result = new List<int>(); // list for sequence A
List<int> values = new List<int>(); // list for holding value to remove
var input = Console.ReadLine().Split().Select(int.Parse).ToArray();
var len = Array.IndexOf(input, -1); // getting index of the first -1 (delimiter)
result = input.ToList(); // converting input array to List
result.RemoveRange(len, input.Length - len); // and deleting everything beyond first delimiter (including it)
for (var i = len + 1; i < input.Length - 1; i++) // for the number of elements in the sequence B
{
for (var j = 0; j < result.Count; j++) // going through all elmnts in sequence A
{
if (j % input[i] == 0) // if index is divisible by B[i]
{
values.Add(result[j]); // adding associated value to List<int> values
}
}
foreach (var value in values) // after all elements in sequence A have been looked upon, now deleting those who apply to criteria
{
result.Remove(value);
}
}
But the problem is that I'm only passing 5/11 test cases. About 25% fail with 'Wrong result' and another 25% with 'Timed out'. I understand that my code is probably very badly written, but I really can't figure out how to improve it.
So, if someone more experienced could clarify the following points for me, it would be very cool:
1. Am I parsing the console input correctly? I feel like it could be done in a more elegant/efficient way.
2. Is my approach of collecting the values that match the criterion and deleting them later efficient in terms of performance? Or is there another way to do it?
3. Why is this code not passing all test cases, and how would you change it in order to pass all of them?

I'm writing the answer once again, since I completely misunderstood the problem. Undoubtedly, the problem in your code is the removal of elements. Let's try to avoid that by making a new array C, where you store all the numbers that should remain in A after each removal pass. So if index id is not divisible by B[i], you add A[id] to C. Then, after checking all the indices against B[i], you replace A with C and do the same for B[i + 1]. Repeat until you reach the end of B.
The algorithm:
1. For each value in B:
2. For each id from 1 to length(A):
3. If id % value != 0, add A[id] to C
4. A = C
5. Return A.
EDIT: Be sure to make a new array C for each iteration of the 1. loop (or clear C after replacing A with it)
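The steps above can be sketched as follows. SequenceFilter, RemoveByIndex and the firstIndex parameter are hypothetical: the task statement does not say whether indices start at 0 (as in the question's code) or at 1 (as in the steps above), so the sketch leaves that choice to the caller. Skipping a divisor of 0 is likewise an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceFilter
{
    // firstIndex is a hypothetical parameter: 1 if items are numbered from 1
    // (as in the steps above), 0 if they are numbered from 0 (as in the question's code).
    public static List<int> RemoveByIndex(IReadOnlyList<int> a, IReadOnlyList<int> b, int firstIndex = 1)
    {
        var current = new List<int>(a);
        foreach (int divisor in b)
        {
            if (divisor == 0) continue; // assumption: skip a zero divisor instead of throwing

            // A fresh list C for every pass; nothing is removed in place.
            var kept = new List<int>(current.Count);
            for (int i = 0; i < current.Count; i++)
            {
                int index = i + firstIndex;
                if (index % divisor != 0) kept.Add(current[i]);
            }
            current = kept; // A = C
        }
        return current;
    }
}
```

Each pass is a single linear scan, which avoids both the repeated List.Remove calls and the by-value removal (which deletes the first matching value rather than the item at the matching index) in the question's code.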

Bug in Microsoft's internal PriorityQueue<T>?

In the .NET Framework in PresentationCore.dll, there is a generic PriorityQueue<T> class whose code can be found here.
I wrote a short program to test the sorting, and the results weren't great:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using MS.Internal;
namespace ConsoleTest {
public static class ConsoleTest {
public static void Main() {
PriorityQueue<int> values = new PriorityQueue<int>(6, Comparer<int>.Default);
Random random = new Random(88);
for (int i = 0; i < 6; i++)
values.Push(random.Next(0, 10000000));
int lastValue = int.MinValue;
int temp;
while (values.Count != 0) {
temp = values.Top;
values.Pop();
if (temp >= lastValue)
lastValue = temp;
else
Console.WriteLine("found sorting error");
Console.WriteLine(temp);
}
Console.ReadLine();
}
}
}
Results:
2789658
3411390
4618917
6996709
found sorting error
6381637
9367782
There is a sorting error, and if the sample size is increased, the number of sorting errors increases somewhat proportionally.
Have I done something wrong? If not, where is the bug in the code of the PriorityQueue class located exactly?

The behavior can be reproduced using the initialization vector [0, 1, 2, 4, 5, 3]. The result is:
[0, 1, 2, 4, 3, 5]
(we can see that 3 is incorrectly placed)
The Push algorithm is correct. It builds a min-heap in a straightforward way:
Start from the bottom right
If the value is greater than the parent node then insert it and return
Otherwise, put instead the parent in the bottom right position, then try inserting the value at the parent place (and keep swapping up the tree until the right place has been found)
The resulting tree is:
      0
    /   \
   /     \
  1       2
 / \     /
4   5   3
The issue is with the Pop method. It starts by considering the top node as a "gap" to fill (since we popped it):
      *
    /   \
   /     \
  1       2
 / \     /
4   5   3
To fill it, it searches for the lowest immediate child (in this case: 1). It then moves the value up to fill the gap (and the child is now the new gap):
      1
    /   \
   /     \
  *       2
 / \     /
4   5   3
It then does the exact same thing with the new gap, so the gap moves down again:
      1
    /   \
   /     \
  4       2
 / \     /
*   5   3
When the gap has reached the bottom, the algorithm... takes the bottom-rightmost value of the tree and uses it to fill the gap:
      1
    /   \
   /     \
  4       2
 / \     /
3   5   *
Now that the gap is at the bottom-rightmost node, it decrements _count to remove the gap from the tree:
      1
    /   \
   /     \
  4       2
 / \
3   5
And we end up with... A broken heap.
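The broken result can be checked mechanically. The helper below is a small sketch, not part of Microsoft's class: HeapCheck and IsMinHeap are hypothetical names, and it assumes the usual array layout where the parent of index i sits at (i - 1) / 2.

```csharp
using System;
using System.Collections.Generic;

public static class HeapCheck
{
    // True when every parent in the first `count` slots is <= each of its children.
    public static bool IsMinHeap<T>(IReadOnlyList<T> heap, int count, IComparer<T> comparer)
    {
        for (int child = 1; child < count; child++)
        {
            int parent = (child - 1) / 2;
            if (comparer.Compare(heap[parent], heap[child]) > 0) return false;
        }
        return true;
    }
}
```

The array after Push, [0, 1, 2, 4, 5, 3], passes this check, while the array after Pop, [1, 4, 2, 3, 5], fails it because 4 is the parent of 3.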
To be perfectly honest, I don't understand what the author was trying to do, so I can't fix the existing code. At most, I can swap it with a working version (shamelessly copied from Wikipedia):
internal void Pop2()
{
if (_count > 0)
{
_count--;
_heap[0] = _heap[_count];
Heapify(0);
}
}
internal void Heapify(int i)
{
int left = (2 * i) + 1;
int right = left + 1;
int smallest = i;
if (left < _count && _comparer.Compare(_heap[left], _heap[smallest]) < 0)
{
smallest = left;
}
if (right < _count && _comparer.Compare(_heap[right], _heap[smallest]) < 0)
{
smallest = right;
}
if (smallest != i)
{
var pivot = _heap[i];
_heap[i] = _heap[smallest];
_heap[smallest] = pivot;
Heapify(smallest);
}
}
The main issue with that code is the recursive implementation (the recursion depth grows with the height of the heap, roughly log2 of the number of elements). I strongly recommend using an optimized third-party library instead.
Edit: I think I found out what is missing. After taking the bottom-rightmost node, the author just forgot to rebalance the heap:
internal void Pop()
{
Debug.Assert(_count != 0);
if (_count > 1)
{
// Loop invariants:
//
// 1. parent is the index of a gap in the logical tree
// 2. leftChild is
// (a) the index of parent's left child if it has one, or
// (b) a value >= _count if parent is a leaf node
//
int parent = 0;
int leftChild = HeapLeftChild(parent);
while (leftChild < _count)
{
int rightChild = HeapRightFromLeft(leftChild);
int bestChild =
(rightChild < _count && _comparer.Compare(_heap[rightChild], _heap[leftChild]) < 0) ?
rightChild : leftChild;
// Promote bestChild to fill the gap left by parent.
_heap[parent] = _heap[bestChild];
// Restore invariants, i.e., let parent point to the gap.
parent = bestChild;
leftChild = HeapLeftChild(parent);
}
// Fill the last gap by moving the last (i.e., bottom-rightmost) node.
_heap[parent] = _heap[_count - 1];
// FIX: Rebalance the heap
int index = parent;
var value = _heap[parent];
while (index > 0)
{
int parentIndex = HeapParent(index);
if (_comparer.Compare(value, _heap[parentIndex]) < 0)
{
// value is a better match than the parent node so exchange
// places to preserve the "heap" property.
var pivot = _heap[index];
_heap[index] = _heap[parentIndex];
_heap[parentIndex] = pivot;
index = parentIndex;
}
else
{
// Heap is balanced
break;
}
}
}
_count--;
}

Kevin Gosse's answer identifies the problem. Although his re-balancing of the heap will work, it's not necessary if you fix the fundamental problem in the original removal loop.
As he pointed out, the idea is to replace the item at the top of the heap with the lowest, right-most item, and then sift it down to the proper location. It's a simple modification of the original loop:
internal void Pop()
{
Debug.Assert(_count != 0);
if (_count > 0)
{
--_count;
// Logically, we're moving the last item (lowest, right-most)
// to the root and then sifting it down.
int ix = 0;
while (ix < _count/2)
{
// find the smallest child
int smallestChild = HeapLeftChild(ix);
int rightChild = HeapRightFromLeft(smallestChild);
if (rightChild < _count && _comparer.Compare(_heap[rightChild], _heap[smallestChild]) < 0)
{
smallestChild = rightChild;
}
// If the item is less than or equal to the smallest child item,
// then we're done.
if (_comparer.Compare(_heap[_count], _heap[smallestChild]) <= 0)
{
break;
}
// Otherwise, move the child up
_heap[ix] = _heap[smallestChild];
// and adjust the index
ix = smallestChild;
}
// Place the item where it belongs
_heap[ix] = _heap[_count];
// and clear the position it used to occupy
_heap[_count] = default(T);
}
}
Note also that the code as written has a memory leak. This bit of code:
// Fill the last gap by moving the last (i.e., bottom-rightmost) node.
_heap[parent] = _heap[_count - 1];
Does not clear the value from _heap[_count - 1]. If the heap is storing reference types, then the references remain in the heap and cannot be garbage collected until the memory for the heap is garbage collected. I don't know where this heap is used, but if it's large and lives for any significant amount of time, it could cause excess memory consumption. The answer is to clear the item after it's copied:
_heap[_count - 1] = default(T);
My replacement code incorporates that fix.

Not reproducible in .NET Framework 4.8
Trying to reproduce this issue in 2020 with the .NET Framework 4.8 implementation of the PriorityQueue<T> as linked in the question using the following XUnit test ...
public class PriorityQueueTests
{
[Fact]
public void PriorityQueueTest()
{
Random random = new Random();
// Run 1 million tests:
for (int i = 0; i < 1000000; i++)
{
// Initialize PriorityQueue with default size of 20 using default comparer.
PriorityQueue<int> priorityQueue = new PriorityQueue<int>(20, Comparer<int>.Default);
// Using 200 entries per priority queue ensures possible edge cases with duplicate entries...
for (int j = 0; j < 200; j++)
{
// Populate queue with test data
priorityQueue.Push(random.Next(0, 100));
}
int prev = -1;
while (priorityQueue.Count > 0)
{
// Assert that previous element is less than or equal to current element...
Assert.True(prev <= priorityQueue.Top);
prev = priorityQueue.Top;
// remove top element
priorityQueue.Pop();
}
}
}
}
... succeeds in all 1 million test cases.
So it seems like Microsoft fixed the bug in their implementation:
internal void Pop()
{
Debug.Assert(_count != 0);
if (!_isHeap)
{
Heapify();
}
if (_count > 0)
{
--_count;
// discarding the root creates a gap at position 0. We fill the
// gap with the item x from the last position, after first sifting
// the gap to a position where inserting x will maintain the
// heap property. This is done in two phases - SiftDown and SiftUp.
//
// The one-phase method found in many textbooks does 2 comparisons
// per level, while this method does only 1. The one-phase method
// examines fewer levels than the two-phase method, but it does
// more comparisons unless x ends up in the top 2/3 of the tree.
// That accounts for only n^(2/3) items, and x is even more likely
// to end up near the bottom since it came from the bottom in the
// first place. Overall, the two-phase method is noticeably better.
T x = _heap[_count]; // lift item x out from the last position
int index = SiftDown(0); // sift the gap at the root down to the bottom
SiftUp(index, ref x, 0); // sift the gap up, and insert x in its rightful position
_heap[_count] = default(T); // don't leak x
}
}
As the link in the question only points to the most recent version of Microsoft's source code (currently .NET Framework 4.8), it's hard to say what exactly was changed in the code. Most notably, there is now an explicit comment about not leaking memory, so we can assume the memory leak mentioned in @JimMischel's answer has been addressed as well, which can be confirmed using the Visual Studio Diagnostic Tools:
If there was a memory leak we'd see some changes here after a couple of million Pop() operations...

find items in knapsack bag

I want to solve the knapsack problem recursively in C#. This is my code:
public int f(int n, int remain)
{
if (n < 0) return 0;
if (w[n] > remain)
{
// Thread.VolatileWrite(ref check[n], 0);
check[n] = 0;
return f(n - 1, remain);
}
else
{
int a = f(n - 1, remain);
int b = p[n] + f(n - 1, remain - w[n]);
if (a >= b)
{
// Thread.VolatileWrite(ref check[n], 0);
check[n] = 0;
return a;
}
else
{
// Thread.VolatileWrite(ref check[n], 1);
check[n] = 1;
return b;
}
}
}
w is an array that holds weights and p is an array that holds prices. n is the number of items and remain is the maximum weight.
My problem is with the check array. I use this array to record which items end up in the bag, but it does not always work: sometimes the solution is right and sometimes not. I have tried everything but could not figure it out. How can I solve this?

The usage of the check array is wrong, since it indicates the last assignment, and it does not have to be the one chosen.
Here is a counter example that explains why it does not work.
Assume:
weights = [1,2]
values = [2,1]
w = 2
Now, let's examine what will happen:
f(1,2):
  f(0,2):
    f(-1,2) = 0
    a = 0
    f(-1,1) = 0
    b = 2 + 0 = 2
    b>a -> check[0] = 1
  return f(0,2) = 2
  a = 2
  f(0,0):
    w[0] > 0: check[0] = 0
    return f(-1,0) = 0
  return f(0,0) = 0
  b = 1 + 0 = 1
  a > b: check[1] = 0
return f(1,2) = 2
So, the optimal solution to this problem is 2 (choosing the 1st element, index 0), but your solution chose no element (check = [0,0]).
This happens because check is modified globally, not locally to each call. Specifically, the assignments made at deep levels do not depend on the choices you made at higher levels.
To handle it you can either:
1. Make your list not global, so that each recursive call has its own instance of a list. The "parent" call then chooses not only which value to take but, according to that choice, also which list to use, and appends its own choice to it before passing it up to its parent.
2. Switch to a DP solution, or mimic the DP solution, and then use the table you created to figure out which elements to choose, as I described in this thread: How to find which elements are in the bag, using Knapsack Algorithm [and not only the bag's value]?
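The first option could look roughly like this. It is a sketch, not the asker's final code: the Knapsack class and the Solve signature are hypothetical, and w and p are passed as parameters instead of being fields. Each call returns its own list together with the value, so a choice made in a branch that is later discarded cannot leak into the result.

```csharp
using System;
using System.Collections.Generic;

public static class Knapsack
{
    // Returns the best total price and the indices of the items that achieve it,
    // considering items 0..n with `remain` capacity left.
    public static (int Value, List<int> Items) Solve(int[] w, int[] p, int n, int remain)
    {
        if (n < 0) return (0, new List<int>());

        // Option a: leave item n out.
        var skip = Solve(w, p, n - 1, remain);
        if (w[n] > remain) return skip;

        // Option b: put item n in the bag.
        var take = Solve(w, p, n - 1, remain - w[n]);
        int takeValue = p[n] + take.Value;
        if (skip.Value >= takeValue) return skip;

        take.Items.Add(n); // append this call's own choice before handing the list to the parent
        return (takeValue, take.Items);
    }
}
```

For the counterexample above (weights [1,2], values [2,1], capacity 2), Solve(w, p, 1, 2) returns the value 2 with the item list [0]. The running time is still exponential, as in the original recursion.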
