I'm trying to implement the Hungarian algorithm. Everything is fine except for when the matrix isn't square. All the methods I've found say that I should make it square by adding dummy rows/columns and filling them with the maximum number in the matrix. My question is: won't this affect the final result? Shouldn't the dummy row/column be filled with at least max+1?
The dummy values should all be zero. The point is that it doesn't matter which one you choose, you're going to ignore those choices in the end because they weren't in the original data. By making them zero (at the start), your algorithm won't have to work as hard to find a value you're not going to use.
The main idea of the Hungarian algorithm is built upon the fact that the optimal assignment of jobs remains the same if a number is added to or subtracted from all entries of any row or column of the matrix. Therefore, it does not matter whether the dummy value is max, max+1, or 0. It can be any number, and 0 is the best choice (as Yay295 said, the algorithm has less work to do if the entries are already 0).
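To make this concrete, here is a small sketch (in Python, using SciPy's linear_sum_assignment; neither the question nor the answers name a library, so treat that choice and the cost values as assumptions) showing that a zero-filled dummy row leaves the assignment of the real rows unchanged:

```python
# Pad a non-square cost matrix with a zero dummy row; the real rows' assignment
# is the same as for the original rectangular problem (minimization assumed).
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0]])            # 2 workers, 3 jobs: not square

padded = np.vstack([cost, np.zeros((1, cost.shape[1]))])  # add a dummy row of zeros

rows, cols = linear_sum_assignment(padded)
for r, c in zip(rows, cols):
    if r < cost.shape[0]:                     # drop the dummy row from the result
        print(f"worker {r} -> job {c}, cost {cost[r, c]}")
```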
I've got two lists of points, let's call them L1( P1(x1, y1), ... Pn(xn, yn)) and L2(P'1(x'1, y'1), ... P'n(x'n, y'n)).
My task is to find the best match between their points for minimizing the sum of their distances.
Any pointers to a suitable algorithm? The two lists contain approx. 200-300 points.
Thanks and best regards.
If your use case involves matching every point in list L1 with a point in list L2, then the Hungarian algorithm is a perfect fit.
The weights in your Hungarian cost matrix would be the distance between the point corresponding to the row and the point corresponding to the column. The overall runtime of the optimized Hungarian algorithm is O(n^3), which comfortably fits your given constraint of n = 300.
A pretty nice tutorial covering the ideas and implementation of the Hungarian algorithm is https://www.topcoder.com/community/competitive-programming/tutorials/assignment-problem-and-hungarian-algorithm/
If you don't want to use the Hungarian algorithm, you can also morph the given problem into a min-cost max-flow problem; I'll omit the details for now but can discuss them if required.
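As a concrete, hedged sketch of the matrix approach (Python with SciPy; the answer itself doesn't prescribe a library, and the points below are placeholders), you build the pairwise distance matrix and hand it to an assignment solver:

```python
# Match two equally sized point lists so that the summed pairwise distance is minimal.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

pts_a = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])   # L1: P1..Pn
pts_b = np.array([[0.5, 0.5], [2.9, 1.1], [1.1, 1.9]])   # L2: P'1..P'n

dist = cdist(pts_a, pts_b)                   # dist[i, j] = Euclidean distance |Pi - P'j|
rows, cols = linear_sum_assignment(dist)     # optimal one-to-one matching
print(list(zip(rows, cols)))                 # matched index pairs
print(dist[rows, cols].sum())                # total (minimal) distance
```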
I am writing a Shannon-Fano implementation, and I am struggling to find a mistake in my program. My program works for the examples I managed to find on the internet.
This is my example with 10 characters, where it assigns longer codes to characters with lower probabilities:
On the left are the byte values, in the middle the probabilities, and on the right the generated codes. Why are 65's and 226's codes longer than those of 0, 3, and 32? Can anybody see a bug in my code?
EDIT: code hidden, because this question was about a school assignment
This is probably not a bug in your code but rather illustrates an inherent weakness in Shannon-Fano codes compared to, say, Huffman compression.
As you know, the Shannon-Fano technique is to sort the list of symbol frequencies in descending order and then assign a binary digit (zero or one) to each half of the frequency range. This process is repeated recursively as long as there is more than one element in a segment.
This has a weakness, though: while it's true that the more frequent symbols, when grouped together, will have a shorter encoding on average than the less frequent symbols, it is not guaranteed that each and every more frequent symbol gets a shorter encoding than every less frequent one.
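To illustrate the technique (this is not the poster's hidden code, just a generic sketch in Python with made-up frequencies), the recursive frequency split looks roughly like this; comparing the resulting code lengths with the symbol frequencies is how you would check for the effect described above:

```python
# Generic Shannon-Fano sketch: sort by frequency, split into two halves with
# roughly equal total frequency, prefix one half with '0' and the other with '1'.
def shannon_fano(freqs):
    """freqs: list of (symbol, frequency) pairs; returns {symbol: code string}."""
    codes = {sym: "" for sym, _ in freqs}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(f for _, f in group)
        running, best_i, best_diff = 0, 1, float("inf")
        for i in range(1, len(group)):            # find the most balanced split point
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, best_i = diff, i
        left, right = group[:best_i], group[best_i:]
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        split(left)
        split(right)

    split(sorted(freqs, key=lambda x: -x[1]))     # most frequent symbols first
    return codes

print(shannon_fano([("a", 15), ("b", 7), ("c", 6), ("d", 6), ("e", 5)]))
# -> {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
```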
For more information, see a question I posted a while back over on Computer Science about this very issue.
I am having trouble adding this to my current pathfinding algorithm.
Currently I have Dijkstra's algorithm written and it works like it should, but I need to go a step further and add a limit (range). I can explain better with an image:
Let's say I have a range of 80. I want to go from A to E. My current algorithm works as it should and results in A->B->E.
However, I can only travel on edges whose weight is not more than the range of 80, which means that A->B->E is no longer an option; instead the route is A->C->D->B->E (considering that the range/limit resets at every stop).
So far, I have implemented a bool named Possible which returns, for a single leg of the path (e.g. A->B), whether it is possible given my limit/range.
My main problem is that I do not know where/how to start. My only idea was to see where Possible is false (A->B on the total route A->B->E) and run the algorithm from A to E again, excluding the stop/vertex B.
Is this a good approach? As far as I understand it, it would double my big-O complexity.
I see two ways of doing this:
1. Create a new graph G' that contains only the edges with weight at most 80, and look for the shortest path there. The reduction takes O(V+E) time and O(V+E) additional memory.
2. Modify Dijkstra's algorithm to ignore edges heavier than 80: simply skip such edges when relaxing neighbour vertices. The complexity and memory usage stay the same in this case; a minimal sketch follows below.
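Here is a minimal sketch of option 2 (in Python, with an illustrative adjacency-list graph and the limit of 80 from the question; it is not the asker's code):

```python
# Ordinary Dijkstra where the only change is skipping edges heavier than the limit.
import heapq

def dijkstra_with_limit(graph, start, limit):
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                              # stale heap entry
        for v, w in graph.get(u, []):
            if w > limit:                         # the only change: ignore long edges
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {
    "A": [("B", 90), ("C", 40)],
    "B": [("E", 30)],
    "C": [("D", 50)],
    "D": [("B", 60)],
}
print(dijkstra_with_limit(graph, "A", 80))        # the 90-weight edge A->B is never used
```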
Create a temporary version of your graph, and set all weights above the threshold to infinity. Then run the ordinary Dijkstra algorithm on it.
Complexity will increase or not, depending on your version of the algorithm:
if you have O(V^2) then it will increase to O(E + V^2)
if you have the O(ElogV) version then it will increase to O(E + ElogV)
if you have the O(E + VlogV) version it will remain the same
As noted by ArsenMkrt, you can also remove these edges, which makes even more sense but will make the complexity a bit worse. Modifying the algorithm to just skip those edges, as he suggested in his answer, seems to be the best option though.
Imagine I want to, say, compute the first one million terms of the Fibonacci sequence using the GPU. (I realize this will exceed the precision limit of a 32-bit data type - just used as an example)
Given a GPU with 40 shaders/stream processors, and cheating by using a reference book, I can break up the million terms into 40 blocks of 25,000 terms each, and seed each shader with the two start values:
unit 0: 1,1 (which then calculates 2,3,5,8,blah blah blah)
unit 1: 25,000th term
unit 2: 50,000th term
...
How, if possible, could I go about ensuring that pixels are processed in order? If the first few pixels in the input texture have values (with RGBA for simplicity)
0,0,0,1 // initial condition
0,0,0,1 // initial condition
0,0,0,2
0,0,0,3
0,0,0,5
...
How can I ensure that I don't try to calculate the 5th term before the first four are ready?
I realize this could be done in multiple passes by setting a "ready" bit whenever a value is calculated, but that seems incredibly inefficient and sort of eliminates the benefit of performing this type of calculation on the GPU.
OpenCL/CUDA/etc probably provide nice ways to do this, but I'm trying (for my own edification) to get this to work with XNA/HLSL.
Links or examples are appreciated.
Update/Simplification
Is it possible to write a shader that uses values from one pixel to influence the values from a neighboring pixel?
You cannot determine the order in which the pixels are processed. If you could, that would break the massive pixel throughput of the shader pipelines. What you can do is calculate the Fibonacci sequence using the non-recursive formula.
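For context, the non-recursive (closed-form) route means each term can be computed independently of its neighbours. A quick sketch in Python of Binet's formula (which loses precision for large n, much like the 32-bit caveat in the question):

```python
# Closed-form Fibonacci: F(n) = round(phi^n / sqrt(5)); no previous terms needed.
import math

def fib_closed_form(n):
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)     # the (1 - phi)^n term vanishes under rounding

print([fib_closed_form(n) for n in range(1, 11)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```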
In your question, you are actually trying to serialize the shader units to run one after another. You might as well use the CPU right away; it will be much faster.
By the way, multiple passes aren't as slow as you might think, but they won't help you in your case. You cannot really calculate any next value without knowing the previous ones, thus killing any parallelization.
I know there are quite a few questions out there on generating combinations of elements, but I think this one has a certain twist that makes it worth a new question:
For a pet project of mine I have to pre-compute a lot of state to improve the runtime behavior of the application later. One of the steps I struggle with is this:
Given N tuples of two integers (let's call them points from here on, although they aren't points in my use case; they are roughly X/Y related, though), I need to compute all valid combinations for a given rule.
The rule might be something like
"Every point included excludes every other point with the same X coordinate"
"Every point included excludes every other point with an odd X coordinate"
I hope and expect that this fact leads to an improvement in the selection process, but my math skills are just being resurrected as I type and I'm unable to come up with an elegant algorithm.
The set of points (N) starts small, but soon outgrows 64 (which rules out the "use a long as a bitmask" solutions).
I'm doing this in C#, but solutions in any language should be fine as long as they explain the underlying idea.
Thanks.
Update in response to Vlad's answer:
Maybe my idea to generalize the question was a bad one. My rules above were invented on the fly and are just placeholders. One realistic rule would look like this:
"Every point included excludes every other point in the triagle above the chosen point"
By that rule and by choosing (2,1) I'd exclude
(2,2) - directly above
(1,3) (2,3) (3,3) - next line
and so on
So the rules are fixed, not general. They are unfortunately more complex than the X/Y samples I initially gave.
How about "the X coordinate of every point included is the exact sum of some subset of the Y coordinates of the other included points"? If you can come up with a fast algorithm for that simply-stated constraint problem then you will become very famous indeed.
My point is that the problem as stated is so vague as to admit NP-complete or NP-hard problems. Constraint optimization problems are incredibly hard; if you cannot put extremely tight bounds on the problem then it very rapidly becomes intractable for machines to analyze in polynomial time.
For some special rule types your task seems to be simple. For example, for your example rule #1 you need to choose a subset of all possible values of X, and then for each value from the subset assign an arbitrary Y.
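As a rough sketch of that idea for rule #1 (Python, with arbitrary example points; the grouping key and the inclusion of the empty combination are my own assumptions):

```python
# Rule #1: points with the same X exclude each other, so a valid combination
# picks at most one point from each X-group.
from itertools import groupby, product

points = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2)]

groups = [list(g) for _, g in groupby(sorted(points), key=lambda p: p[0])]

# Each group contributes either no point (None) or exactly one of its points.
valid = [
    [p for p in choice if p is not None]
    for choice in product(*([None] + g for g in groups))
]

print(len(valid))   # product of (len(group) + 1) over all groups, here 3 * 3 * 2 = 18
```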
For generic rules I doubt that it's possible to build an efficient algorithm without any AI.
My understanding of the problem is: given a method bool property( Point x ) const, find all points in the set for which property() is true. Is that reasonable?
The brute-force approach is to run all the points through property() and store the ones for which it returns true. The time complexity of this would be O(N), where N is the total number of points, assuming the property() method is O(1). I guess you are looking for improvements over O(N). Is that right?
For certain kinds of properties, it is possible to improve on O(N), provided a suitable data structure is used to store the points and suitable pre-computation (e.g. sorting) is done. However, this may not be true for an arbitrary property.
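For completeness, a trivial sketch of the brute-force pass described above (Python; the predicate below is a stand-in for whatever the real property is):

```python
# O(N) filter: test every point once and keep the ones satisfying the property.
def property_holds(point):
    x, y = point
    return x % 2 == 0          # placeholder rule: even X coordinate

points = [(1, 1), (2, 3), (4, 2), (5, 5)]
selected = [p for p in points if property_holds(p)]
print(selected)                # [(2, 3), (4, 2)]
```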